HACKER Q&A
📣 max_

Computational feasibility of the Pied Piper compression algorithm?


What are the limits of compression algorithms?

I am not asking about what exists today, but about what is possible from a theoretical computer science perspective (computational complexity).

What is the best we can theoretically do? And why doesn't Pied Piper already exist in real life?

By Pied Piper I mean the fictional startup in the HBO comedy Silicon Valley.


  👤 emrah Accepted Answer ✓
One can think of compression as an attempt to increase the entropy density of the representation. The less predictable the data is, the more entropy it has. Compression exploits patterns (predictability) in the data to pack the same information into fewer, less predictable bits.

So technically we shouldn't talk about compression level as an absolute. The input file has some level of entropy, and the real questions are: what is the maximum entropy density it could have, and how close to that can a given compression algorithm get?

For example, encrypted data is hard to compress because it already has high entropy.

"Better" usually means "is able to identify more or longer strings of duplicate data (or other patterns) in the file".

Reference: https://en.m.wikipedia.org/wiki/Entropy_(information_theory)
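The entropy argument above can be sketched numerically. This is a small illustration (not from the original post, and `shannon_entropy` is a name I made up): it estimates the empirical Shannon entropy of a byte string, then compares repetitive data against random bytes, which behave like encrypted data.

```python
import math
import os
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per byte (maximum is 8.0)."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

patterned = b"abcabcabc" * 1000      # repetitive -> low entropy
random_bytes = os.urandom(1 << 16)   # high entropy, similar to encrypted data

print(shannon_entropy(patterned))     # ~1.58 bits/byte (3 equally likely symbols)
print(shannon_entropy(random_bytes))  # close to 8 bits/byte

# Compression reflects the difference: the repetitive input shrinks
# dramatically, while the random input gets no smaller.
print(len(zlib.compress(patterned)))
print(len(zlib.compress(random_bytes)))
```

The low-entropy input compresses to a tiny fraction of its size, while the high-entropy input cannot be compressed at all, matching the point about encrypted data above.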


👤 sigmaprimus
Here is a short and sweet article that might answer your question. It starts out a bit heady, but if you can chew through the initial paragraphs, the pigeonhole explanation is easily digestible.

http://matt.might.net/articles/why-infinite-or-guaranteed-fi...
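The pigeonhole argument that article makes can be checked by simple counting: there are 2^n bit strings of length exactly n, but only 2^n − 1 strings that are strictly shorter, so no lossless compressor can shrink every input. A minimal sketch (the helper name `count_strings` is mine, not from the article):

```python
# Pigeonhole principle for lossless compression:
# 2**n strings of length exactly n, but only
# 2**0 + 2**1 + ... + 2**(n-1) = 2**n - 1 strictly shorter strings.
def count_strings(n: int) -> tuple[int, int]:
    exactly_n = 2 ** n
    strictly_shorter = sum(2 ** k for k in range(n))
    return exactly_n, strictly_shorter

for n in range(1, 17):
    exact, shorter = count_strings(n)
    # There is always one more input than there are shorter outputs,
    # so some length-n input must map to an output that is NOT shorter.
    assert shorter == exact - 1
```

This is why a "compresses everything" algorithm like the fictional Pied Piper is impossible for lossless compression: any algorithm that shrinks some inputs must expand (or leave unchanged) others.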