Created 24 March 2004.

The amazing world of library incompatibility

One of the things that annoy me about Linux and other Unix-like operating systems is the needless duplication of effort. One of the fruits of this is the library hell problem.

Have you ever wondered why almost all Linux distros are slightly incompatible with each other, even though they are all built from the same sources? Or, more specifically, why Red Hat, Debian, Gentoo, SUSE, Mandrake and a zillion other distributions package their files in (most of the time) non-interchangeable ways? Does this seem like a pointless waste of effort to you? For those of you who answered yes, I'm going to explain one of the reasons for the current state of affairs. But first you have to learn about dynamic libraries and two related acronyms: API and ABI.

What are dynamic libraries?

Around the 1960s people realized that coding every single project from scratch might not be such a good idea. The engineers of the time came up with the idea of collecting useful functions into libraries, which any program could then call transparently. A dynamic (or shared) library takes this one step further: instead of being copied into every executable, it is loaded at run time, so a single copy on disk serves every program that uses it. This method became very popular and is used widely in all modern operating systems.

In order to use a dynamic library, one must know two things about it. The first is the application programming interface, or API. That tells what functions and whatnot the library contains and what they are supposed to do. A common way of defining an API is with C header files. Basically, the API is what the programmer needs to know in order to take advantage of the library. The second is the application binary interface, or ABI, which tells how the executable and the dynamic linker map those calls to actual symbols and memory addresses.
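To make the distinction concrete, here is a minimal sketch of what an API looks like in C. The library and function names (frobnicate.h, frob_reverse) are made up for illustration.

    /* frobnicate.h -- public API of a hypothetical library.
     * These declarations are the API: they tell the programmer what
     * can be called, with what arguments, and what comes back. */

    #ifndef FROBNICATE_H
    #define FROBNICATE_H

    #include <stddef.h>

    /* Reverse the string in place and return its length. */
    size_t frob_reverse(char *str);

    #endif

The ABI is the compiled-level counterpart of this: the exported symbol name frob_reverse, the calling convention, the size of size_t on the platform, and so on. The header says nothing about those details, which is exactly why two systems can share an API and still have different ABIs.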

This is all well and good, but what happens when bugs are fixed in the library or new features are added? These changes alter the contents of the library, so what happens to programs that were built against the old version? There are three different kinds of changes that can occur.

Changes that preserve ABI
These maintain an ABI identical to that of the previous version. Most bug fixes belong to this class. Such changes are fully backwards compatible, unless the program relied on the incorrect behaviour of the bug that got fixed.
Backwards compatible changes to ABI
These change the ABI somehow, but programs linked against old versions still work with the new library. An example is adding a totally new function.
Backwards incompatible changes to ABI
The ABI is changed in a way that prevents programs linked against an old version of the library from working. Removing a function or changing its parameters are examples of this kind of change.
When a library's ABI is changed in a backwards-incompatible way, it is called breaking binary compatibility. In addition to the above-mentioned ways, there are several subtle and surprising ways to introduce binary incompatibilities into a library. This is especially the case with C++, which didn't even have a standardized ABI until recently. In C++, adding certain kinds of functions to a library or even just compiling it with different compiler flags might change the ABI (or it might not; only the gurus know for sure).
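As a rough sketch of what these three classes look like in plain C, here is how the hypothetical frob_reverse from earlier could evolve across releases. The version numbers and the extra function are made up.

    #include <stddef.h>

    /* Version 1.0 ships this declaration. */
    size_t frob_reverse(char *str);

    /* 1.0.1: a bug is fixed inside the .c file only. Nothing visible
     * to callers or the dynamic linker changes, so the ABI is
     * preserved. */

    /* 1.1: a new function is added next to the old one. The ABI grows,
     * but old binaries never reference the new symbol, so the change
     * is backwards compatible. */
    size_t frob_reverse_n(char *str, size_t max_len);

    /* 2.0: the original function gains a parameter. The symbol is
     * still there, but binaries built against 1.x call it with the old
     * argument list, so binary compatibility is broken:
     *
     *     size_t frob_reverse(char *str, int direction);
     */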

The problems of maintaining ABI compatibility

One of the cornerstones of commercial software development has been to never, ever break the ABI. A well-known example of this philosophy is Windows. You can run programs made for Win95 on W2k or WinXP without problems. The same goes for most device drivers. Compare this to Linux, where binaries from two years ago most likely won't run.

Windows' compatibility does have its dark side, though. Several of Windows' ugliest deformities are directly caused by maintaining a backwards-compatible ABI, in some cases all the way back to DOS. All libraries have some kind of design limitations. Working around them without modifying the original ABI can be exceedingly hard and/or ugly. The ABI becomes a virtual straitjacket for the programmer.
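One common (and decidedly unglamorous) workaround is to leave the old entry point untouched forever and add a new, extended one beside it; the old function becomes a thin wrapper. The sketch below reuses the hypothetical frob_reverse from earlier; the extended variant and its flags parameter are invented for illustration.

    #include <string.h>

    /* New entry point: the real implementation, with room to grow. */
    size_t frob_reverse_ex(char *str, unsigned flags)
    {
        size_t len = strlen(str);
        (void)flags;                      /* reserved for future options */
        for (size_t i = 0; i < len / 2; i++) {
            char tmp = str[i];
            str[i] = str[len - 1 - i];
            str[len - 1 - i] = tmp;
        }
        return len;
    }

    /* Old entry point: kept exactly as it was in 1.0, forever. */
    size_t frob_reverse(char *str)
    {
        return frob_reverse_ex(str, 0);
    }

The ABI survives, but every such workaround leaves another pair of near-duplicate functions in the API, which is precisely the kind of deformity described above.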

Why binary compatibility is not strictly enforced

When a misfeature is found in a library, the elegant and "correct" way to fix it is to ignore any compatibility problems. The programmer is allowed to fix the problem in any way he deems best. This results in higher-quality code, since the shackles of history are not present. When the new version is deployed, applications are recompiled against it and they just work.

This has been the traditional Unix way. Since Unix has been a programmer's system, this has been seen as a good thing. It gets you higher quality code faster, since you don't have to suffer from the brain farts of yesteryear. Breaking backwards ABI compatibility is not a problem, since Unix comes with the sources for the entire system.

The downside to this is obvious: getting a simple bug fix may mean recompiling half of your system. Fortunately this is rare. Another problem, though not as severe, is that library interfaces might not be as thoroughly designed. The programmer might just throw the first version out to people and fix the misfeatures in it once enough people complain about them.

There are two more reasons for the ABI breakage that are specific to the free software world. One of them is money. Maintaining binary compatibility is time-consuming and not at all fun. Coders working for companies get paid quite a lot of money for their work, so they are expected to deal with these problems. Most free software developers aren't paid, so it is quite understandable that they concentrate their efforts on the interesting parts.

The other reason is the bogeyman known as proprietary software. Letting the ABIs drift is not a huge problem for people working on open source software, but it is quite a strain on closed-source companies, who have to re-certify their products on new releases, offer support for a myriad of different configurations and so on.

If the above seems petty to you, here's some food for thought. The basic functions needed by almost all programs are kept in a library called libc. Features provided by libc include file manipulation, string processing and so on. Linux uses the Free Software Foundation's implementation, which is called Glibc. This is one of the reasons why FSF people want to call the system GNU/Linux. Now try to remember who the most prominent figure of the FSF is. Then try to remember his views on anything even slightly non-free. Now ask yourself: is it really that surprising that Glibc has broken binary compatibility several times? (If you don't believe me on the ABI breakage part, check Ryan Gordon's comment in this story.)

Conclusions

You now know one of the (many) reasons why binaries from different Linux distributions are not always interchangeable, even if they were built from the same source code. Most programs link against several dynamic libraries. If even one of these has, for one reason or another, a different ABI, the binary does not work. The only way to work around this is to rebuild the program from source on the system in question.

There is a kind of middle ground between these two extreme views: break binary compatibility only between major releases. Suppose there is some fancy library called libfrob, which has just released version 1.0. In the optimal case, programs built against 1.0 work flawlessly with libfrob versions 1.0.2, 1.1, 1.2, 1.6 and 1.9.9. Only at version 2.0 could there be ABI-breaking changes. This is something that a lot of projects are working towards, but the results have been less than perfect. It takes a lot of discipline to maintain strict ABI compatibility. Let's hope that the people programming the various libraries make the effort to treat binary compatibility as a top priority.
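One piece of that discipline is keeping the library's data structures opaque: callers only ever hold a pointer, and the struct itself is defined inside the library, so fields can be added in 1.1 or 1.2 without changing anything a 1.0 binary can see. The sketch below imagines what such a header for libfrob might look like; the function names are made up.

    /* frob.h -- public header of the hypothetical libfrob 1.0. */

    #ifndef FROB_H
    #define FROB_H

    /* Opaque type: the real struct frob_ctx is defined only inside
     * the library, so its size and layout are free to change between
     * 1.x releases without breaking the ABI. */
    typedef struct frob_ctx frob_ctx;

    frob_ctx *frob_new(void);
    void      frob_set_verbose(frob_ctx *ctx, int verbose);
    void      frob_free(frob_ctx *ctx);

    #endif

On ELF systems this release policy also tends to show up in the library's file name: the soname encodes the major version (libfrob.so.1 versus libfrob.so.2), so the dynamic linker simply refuses to pair a program with a library whose ABI it was not built against.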

Found errors?

I tried my best to ensure that there are no factual errors on this page. If you do find one, send me mail so I can rectify any bugs. Note, though, that this is not a technical document, but rather an informal introduction. Slight inaccuracies in details are not life-threatening. The mail address can be found on the front page.


(C) 2004 by Mr Shrap. All rights reserved.
