On Decoder Size and Kolmogorov Complexity

I am going to say something that many people are not going to like. When measuring information content, from a theoretical point of view, decoder size is totally irrelevant. The main reason is this: the decoder is itself a transform, and therefore must be represented in some way. The size and speed of the decoder are thus entirely dependent on the method of its representation and on the probability model that representation implies.

Now I'm going to talk about Kolmogorov complexity for a bit.

The Kolmogorov complexity of a stream of information is the size of the smallest program that generates that information. This is intended to be an ideal measure, because the decoder's size appears to be zero and so does not affect the measured information content. But the decoder's size is not zero; only the software portion of it is. The rest of the decoder is hardware, and that hardware, along with the instruction set it exposes, has its own implicit probability model and therefore completely determines the measured information content.
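A minimal sketch of the idea, taking "Python on this machine" as the fixed decoder: two byte strings of identical length can have wildly different Kolmogorov complexity, and the length of any program that generates a string is an upper bound on that string's complexity relative to that decoder. The variable names here are illustrative, not from any standard.

```python
# Two strings of the same length, very different Kolmogorov complexity
# (relative to the Python interpreter and the hardware it runs on).
import os

zeros = b"\x00" * 1_000_000           # highly regular: a short program suffices
random_bytes = os.urandom(1_000_000)  # almost certainly incompressible

# A tiny Python "program" (expression) that regenerates `zeros`.
# Its length is an upper bound on the complexity of `zeros` in this system.
program = 'b"\\x00" * 1_000_000'

print(len(zeros), len(random_bytes))   # same data length
print(len(program))                    # but a ~20-byte description of `zeros`
print(eval(program) == zeros)          # the short program really does generate it
```

Note that the bound only means anything once the decoder (here, the Python interpreter plus its hardware) is fixed, which is exactly the point the text is making.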

So, Kolmogorov complexity is indeed an ideal measure of information content on a particular piece of hardware.

Knowing this, we might try to come up with some theoretically ideal hardware. But, even Turing machines and universal computers have an instruction set of sorts, and thus have an implicit probability model of their own, so they're out. We may go further and reduce hardware down to logic gates and define information content in terms of the smallest arrangement of logic gates capable of producing the information. But then we need to represent the arrangements of gates somehow, and that representation has its own implicit model. Even if we go down to the lowest level of known physics, the arrangement of stuff still has to be represented.

In other words, there is no way of avoiding the use of some implicit probability model when representing information.
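This is easy to observe in practice. The sketch below feeds the same bytes to three stock compressors from the Python standard library; each embodies a different implicit probability model, and each reports a different "information content" for the same input. The test string is arbitrary.

```python
# The same bytes measure differently under different decoders: each
# compressor carries its own implicit probability model.
import bz2
import lzma
import zlib

data = ("the quick brown fox jumps over the lazy dog\n" * 200).encode()

sizes = {
    "zlib": len(zlib.compress(data, 9)),
    "bz2":  len(bz2.compress(data, 9)),
    "lzma": len(lzma.compress(data)),
}
print(len(data), sizes)  # one input, several different measured sizes
```

None of these sizes is "the" information content of the data; each is the content as measured against one particular decoder.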

Since Kolmogorov complexity is completely dependent on the hardware used, we can define arbitrary hardware to give us any decoder we want, making the decoder's size effectively zero. Since adding zero does not affect the measure, the decoder size is completely irrelevant in the general case, i.e. when no specific hardware is assumed.

So, we can concentrate on making better probability models and predictors, knowing that they determine the measured information content completely.

See the companion discussion on prediction and its arbitrary qualities.
See the companion discussion on multiple representations and efficiency.

The Practical Case

Now let's get practical. In a practical case, the decoder, i.e. the hardware and operating system, has already been decided for us, along with its implicit model. So a Kolmogorov measure exists for any string of information on that machine. This is the limit of compression.

Now, if we want a multi-platform compressed file format, we have to create a virtual decoder that can be emulated on all of the desired platforms. This virtual decoder has its own Kolmogorov measure, independent of all the real platforms. In fact, for some data sets it can be made to have a smaller measure than the real platforms do.

This is not a contradiction: the virtual decoder's software becomes part of the encoded information when it runs on a particular real machine, so the total size on that machine is still equal to or greater than that machine's Kolmogorov measure.

So, for multi-platform use, we want a virtual machine that can be implemented on each desired platform in a small enough amount of memory that it makes up only a small portion of the total encoded (and usually transmitted) information, and that runs fast enough to be useful on those platforms.
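The accounting above can be sketched in a few lines. Here the "virtual decoder" is an assumed stand-in: a tiny Python stub that decompresses a zlib payload (a real portable format would ship a real VM, not this one-liner). What matters is the arithmetic: the receiver pays for decoder plus payload, so the stub only wins when it is small relative to what it decodes.

```python
# Shipping a virtual decoder alongside the payload: the decoder's size
# counts toward the total on the receiving machine.
import zlib

payload = ("some highly repetitive log line\n" * 10_000).encode()
compressed = zlib.compress(payload, 9)

# Hypothetical "virtual decoder": a minimal stub the receiver would run.
decoder_stub = (
    b"import sys,zlib;"
    b"sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
)

total_transmitted = len(decoder_stub) + len(compressed)
print(len(payload), len(compressed), len(decoder_stub), total_transmitted)

# Sanity check: the virtual decoder's transform really inverts the encoding.
print(zlib.decompress(compressed) == payload)
```

For this repetitive payload the stub plus compressed data is far smaller than the raw data, so carrying the decoder is worthwhile; for a tiny payload the same stub could dominate the transmission.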

Some examples:
  • PKZip
  • LHA
  • GZip