πfs – A data-free filesystem (github.com/philipl)
98 points by tosh on Sept 29, 2021 | 30 comments


Previous discussions

PiFS – The Data-Free Filesystem (February 20, 2021; 1 point, 1 comment) - https://news.ycombinator.com/item?id=26208704

Πfs: Never worry about data again (October 25, 2019; 3 points, 1 comment) - https://news.ycombinator.com/item?id=21359338

The π Filesystem for FUSE: Store Your Data in π (February 21, 2019; 1 point, 1 comment) - https://news.ycombinator.com/item?id=19223032

pifs - Avoid disk space usage by saving your files in the digits of Pi (December 14, 2018; 3 points, 1 comment) - https://news.ycombinator.com/item?id=18687275

πfs – A data-free filesystem (March 14, 2017; 285 points, 105 comments) - https://news.ycombinator.com/item?id=13869691

Πfs: Stores your data in π (January 6, 2016; 2 points, 1 comment) - https://news.ycombinator.com/item?id=10856108

Πfs: Never worry about data again (January 5, 2016; 5 points, 1 comment) - https://news.ycombinator.com/item?id=10847693


I love that they ran with this far enough to get it working. We need a graph of the average number of bits to store an offset into pi versus size of stored data.


Well, based on this sentence:

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

I'd say best-case scenario, you're looking at 1:1 offset storage size vs. stored data size :)


How would the location search slow down with increased block size?

And is the algorithm to do so faster on a quantum computer?


That's way worse than 1:1 unless all integers between 0 and 255 occur in the first 256 digits of pi, which I'm 99.pi% sure is not the case.
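For anyone who wants to check the offset sizes themselves, here is a minimal sketch. It assumes offsets point at hexadecimal digits of π's fractional part and that a byte is any two consecutive hex digits (unaligned). The digits are computed with Machin's formula in integer arithmetic, which is a stand-in chosen for brevity and not the project's own digit-extraction code. All function names are hypothetical.

```python
def arctan_inv(x, scale):
    """Approximate arctan(1/x) * scale with the Taylor series, using integers only."""
    power = scale // x              # scale / x^(2k+1), starting at k = 0
    total = power
    x2 = x * x
    k = 1
    sign = -1
    while power:
        power //= x2
        total += sign * (power // (2 * k + 1))
        sign = -sign
        k += 1
    return total


def pi_hex_digits(count, guard=8):
    """Return the first `count` hex digits of pi's fractional part as a string."""
    scale = 16 ** (count + guard)   # guard digits absorb truncation error
    # Machin: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    frac = pi_scaled - 3 * scale
    return format(frac >> (4 * guard), "x").zfill(count)


def build_byte_table(hex_digits):
    """Map each byte value to the first offset where its two hex digits occur."""
    table = {}
    for offset in range(len(hex_digits) - 1):
        value = int(hex_digits[offset:offset + 2], 16)
        table.setdefault(value, offset)
    return table


def encode(data, table):
    """Replace every byte with its offset (KeyError if a byte was never found)."""
    return [table[b] for b in data]


def decode(offsets, hex_digits):
    """Recover the bytes by reading two hex digits at each offset."""
    return bytes(int(hex_digits[o:o + 2], 16) for o in offsets)


HEX = pi_hex_digits(4096)
TABLE = build_byte_table(HEX)

if __name__ == "__main__":
    worst = max(TABLE.values())
    print("byte values found:", len(TABLE), "of 256")
    print("largest first offset:", worst)
    print("bits needed for a fixed-width offset:", worst.bit_length())
```

A 256-digit prefix has only 255 two-digit windows, so it cannot contain all 256 byte values; any fixed-width offset that covers every byte therefore needs more than 8 bits under these assumptions.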


Wait until people find out all the CSAM is stored on there. They can't ban π soon enough. It's worse than bitcoin. /s


I had to look up the acronym. Are we not allowed to say "kiddie porn" on here?


Child sexual abuse material, because that's what it is: sexual abuse. Calling it porn sends the message that it's legitimate; it isn't.


If you apply the same interpretation to "kiddie porn" as to "gay porn", it would mean something completely different: porn performed (acted) by kids and possibly targeted at kids. That doesn't make much sense. They are not acting, they are being abused, and people should name it as such.


It's the politically-correct version now, apparently. Hadn't heard it before the Apple phone-scanner debacle.


Love it, the tone of the readme is amazing.


> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

I don't get it. We simply replace one byte (the data) with another byte, or even more than that (the index into pi). What am I missing?

Besides, why do you have to "search" pi? Why not just build a table mapping every possible 2-, 3-, or 4-byte sequence (256^2, 256^3, or 256^4 combinations) to its corresponding position in pi, so that every subsequent compression runs much more efficiently?

BTW, it is very easy to show that simple Huffman-code-based compression yields a better compression ratio than this method.


Looks like "NFS" in the hacker news font.


I wonder if this concept can be used for Πcoin.


I wonder how hard it would be to find offsets into pi that contain surprisingly legit bit sequences.


This. Given a few million digits, I wonder what the hit rate is.


Assuming your data is much shorter than the number of digits to search, and that overlapping repeats do not occur often enough to matter, the expected number of hits is just numberOfDigitsToSearch / pow(10, numberOfDigitsInData). Same idea for any other base (as long as you count digits in that base rather than base-10 digits).

That is, the odds of finding a 6-digit datum in a million digits are fairly good (about one expected hit). Finding longer data becomes exceedingly unlikely very fast.
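As a rough sketch of this kind of estimate, assuming the digits behave like independent uniform random digits (which is unproven for π): a k-digit string is expected to occur about N / 10^k times in the first N digits, so the chance of at least one hit is roughly 1 − exp(−N / 10^k). The code below computes that prediction and compares it with a direct count over real decimal digits. The digits come from Machin's formula in integer arithmetic, and all function names are hypothetical.

```python
import math


def arctan_inv(x, scale):
    """Approximate arctan(1/x) * scale with the Taylor series, using integers only."""
    power = scale // x
    total = power
    x2 = x * x
    k = 1
    sign = -1
    while power:
        power //= x2
        total += sign * (power // (2 * k + 1))
        sign = -sign
        k += 1
    return total


def pi_decimals(count, guard=10):
    """Return the first `count` decimal digits after the point, as a string."""
    scale = 10 ** (count + guard)   # guard digits absorb truncation error
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    frac = pi_scaled - 3 * scale
    return str(frac // 10 ** guard).zfill(count)


def expected_hits(n_digits, k):
    """Expected occurrences of one fixed k-digit string in n_digits random digits."""
    return (n_digits - k + 1) / 10 ** k


def p_at_least_one(n_digits, k):
    """Poisson-style approximation of the chance of at least one occurrence."""
    return 1.0 - math.exp(-expected_hits(n_digits, k))


def empirical_hit_rate(digits, k):
    """Fraction of all 10^k possible k-digit strings that occur in `digits`."""
    seen = {digits[i:i + k] for i in range(len(digits) - k + 1)}
    return len(seen) / 10 ** k


DIGITS = pi_decimals(10_000)

if __name__ == "__main__":
    for k in range(1, 7):
        predicted = p_at_least_one(len(DIGITS), k)
        measured = empirical_hit_rate(DIGITS, k)
        print(f"k={k}: predicted {predicted:.4f}, measured {measured:.4f}")
```

The prediction drops by roughly a factor of ten for every extra digit once 10^k exceeds N, which is the "exceedingly unlikely very fast" part.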


I think I'm asking the inverse question - instead of having a known-good datum, given the fact that Pi isn't random, I wonder what coincidentally you could stumble onto, given a broad enough heuristic to "discover" interesting sequences.


For all we know, Pi is random (i.e. normal, although we haven't been able to prove it). That would mean any sequence appears eventually, with uniform odds. Hence any ("interesting") data you'd want to store does appear at some point, and I gave the odds of any (interesting or not) digit string appearing in the first x digits.


This is seriously a very interesting concept. Sounds like the Tower of Babel, but somehow much more useful for its obvious purpose.


Do you mean the Library of Babel?


No, the Tower of Babel.[0] With this revolutionary technology, we can keep track of information using only metadata; in the information age, such a “digital Tower of Babel” could let us attain ever-increasing heights of Knowledge, if we are not scattered as a result.

[0]: https://xkcd.com/496/


I don't get how the internet secretary thing relates to the tower of babel... Did you mean https://xkcd.com/2421/?


“You mean the fifth?”

“No, the third.”


Yes.


The last commit was made 5 years ago.


I doubt Pi has changed much…


To be fair, this is Hacker News.


pi was abandoned in favor of pi 2


All hail Tau



