On Multiple Representations and Efficiency

Here's an interesting point about encoders:

If more than one representation exists for a particular string of information, then the encoder is inefficient.
Why? Well, it's simple: say we look at all the 2-bit codewords and find that two of them decode to the same string, A. We could save space by keeping only one representation for A and moving a string B, currently encoded in 3 bits, into the 2-bit slot that A used to occupy. This allows a cascade of moves from longer codes to shorter ones, so the average code length shrinks. The resulting encoder has exactly one representation for any string. Such an encoder is efficient.

Example:
inefficient encoder:
00 = Z
01 = A
10 = A
110 = B
1110 = C
1111 = D

efficient encoder:
00 = Z
01 = A
10 = B
110 = C
111 = D

We've obviously saved space by removing redundant representations.
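As a quick sanity check, here's a sketch in Python that computes the average codeword length of each table above (the uniform distribution over symbols is an assumption for illustration):

```python
# Decoder tables from the example above: codeword -> symbol.
inefficient = {"00": "Z", "01": "A", "10": "A", "110": "B", "1110": "C", "1111": "D"}
efficient   = {"00": "Z", "01": "A", "10": "B", "110": "C", "111": "D"}

def avg_len(decoder):
    # For each symbol, a sensible encoder uses its shortest codeword;
    # average over symbols, assuming all five are equally likely.
    best = {}
    for code, sym in decoder.items():
        if sym not in best or len(code) < len(best[sym]):
            best[sym] = code
    return sum(len(c) for c in best.values()) / len(best)

print(avg_len(inefficient))  # 3.0 bits per symbol
print(avg_len(efficient))    # 2.4 bits per symbol
```

Even though the inefficient encoder never has to use its duplicate code for A, the duplicate still wastes codeword space that the efficient table hands to B, C, and D.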

Examples of efficient encoders:
  • Fixed prefix codes like the one above (e.g. Huffman codes)
  • Prediction encoders driven by a deterministic probability model

Examples of inefficient encoders:
  • LZW and most of its derivatives
  • Any general purpose programming language
  • Turing Machines and Universal Computers

Now, let's think about Kolmogorov Complexity.

The Kolmogorov Complexity (KC) of a string is the size of the smallest representation of that string in a given language. The language is the method of representation or encoding.

Let's define a decoder as a machine, virtual or real, which accepts an input and returns a corresponding output. We can construct a decoder for any encoder, simply by using the same language in each. So, the size of the smallest input which produces a given string as the output is the KC of that string under that encoder/decoder.
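As a sketch of this definition, here is a toy brute-force search (all names are hypothetical) that finds the KC of a string under a small decoder by trying inputs in order of increasing length:

```python
from itertools import product

# Prefix-code decoder for the efficient table from the example above.
table = {"00": "Z", "01": "A", "10": "B", "110": "C", "111": "D"}

def decode(bits):
    # Consume bits left to right; emit a symbol whenever the buffer
    # matches a codeword. Return None if bits are left over (invalid input).
    out, buf = "", ""
    for b in bits:
        buf += b
        if buf in table:
            out += table[buf]
            buf = ""
    return out if buf == "" else None

def kc(decoder, target, max_bits=12):
    # The length of the first (i.e. shortest) input that decodes to
    # `target` is the KC of `target` under this decoder.
    for n in range(1, max_bits + 1):
        for bits in product("01", repeat=n):
            if decoder("".join(bits)) == target:
                return n
    return None

print(kc(decode, "AB"))  # 4, via the input "0110"
```

The search is hopelessly slow for real decoders, but it makes the definition concrete: KC is just "smallest input that produces the output."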

That's obvious. Now, let's think about efficient encoders. They have exactly one input which produces a given output, so that input must be the smallest. Since it is the smallest input which produces that output, its size is the KC of that output under the encoder/decoder.
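Under an efficient decoder, no search is needed at all; a sketch, reusing the efficient table from the example above (the helper names are hypothetical):

```python
# Each codeword maps to exactly one symbol and vice versa, so the
# decoder table can be inverted into an encoder table.
decode_table = {"00": "Z", "01": "A", "10": "B", "110": "C", "111": "D"}
encode_table = {sym: code for code, sym in decode_table.items()}

def kc_direct(s):
    # The one and only encoding of s is, trivially, its smallest encoding,
    # so its length is the KC of s under this decoder.
    return sum(len(encode_table[sym]) for sym in s)

print(kc_direct("DAB"))  # 3 + 2 + 2 = 7 bits
```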

In other words, we hit Kolmogorov Complexity whenever we use an efficient encoder.

Not only that: efficient encoders are, by definition, more efficient than inefficient ones, so the KCs they assign are smaller on average. For instance, a Turing Machine has infinitely many representations (programs) for any given string, so it yields substantially larger average KCs than efficient coders do. Therefore, the size of a string compressed with a prediction encoder using an appropriate probability model is very often less than the Kolmogorov Complexity of the same string under, say, a Turing Machine.

So enough of this Kolmogorov Complexity being the Holy Grail of compression! We hit KC every day, and we can do better by finding good probability models than we can by making Kolmogorov encoders for Intel chips or whatever.