Using the C preprocessor with Java

Oct 25, 2004
Last updated: March 23, 2005
For comments: tzvetanmi@yahoo.com

Table of Contents:

  1. The Problem
  2. My Solution
  3. Further Improvement

The Problem

The fact that Java doesn't have a preprocessor is usually considered an advantage. Perhaps the motivation behind that idea is that programmers or IDEs like Eclipse could get confused. Analyzing code containing C-style macros for purposes like class browsing or code completion is impossible in the general case (although in practice good style requires macros not to break the observable language syntax), so this has some merit. Nevertheless, I am sure that any professional programmer will agree that there are cases when macros can be a tremendous help.

A very obvious example is Eclipse itself, or SWT in particular. SWT attempts to be a very thin layer on top of the native GUI subsystem, and for that it often has to pass actual C pointers around in the JVM. On a 32-bit system a pointer fits in a Java int, but on a 64-bit system it requires a Java long. Since Java doesn't have the equivalent of C's "typedef", and using a separate object for each pointer would be performance suicide, different sources must be compiled for 32-bit and 64-bit SWT! Absurd, isn't it?

As far as I know, the SWT developers had to resort to a preprocessing tool which, as part of the regular build process, scans all of the sources for specially marked ints and replaces them with longs for 64-bit builds. Sad, really.
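To make the idea concrete, here is a minimal sketch of what such a build-time rewriting step could look like. It is not the SWT tool itself: the marker convention (an `int` followed by the comment `/*long*/`) and the class and method names are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Sketch of a build-time source rewriter for 64-bit builds.
// ASSUMPTION: pointer-sized ints are marked as "int /*long*/" in the sources.
class PointerWidthRewriter {
    // Matches the word "int", optional whitespace, then the marker comment.
    private static final Pattern MARKED_INT =
        Pattern.compile("\\bint\\s*/\\*long\\*/");

    // Returns the source text with every marked int widened to long.
    // Unmarked ints are left untouched.
    static String rewriteFor64Bit(String source) {
        return MARKED_INT.matcher(source).replaceAll("long");
    }

    public static void main(String[] args) {
        String source = "int /*long*/ handle;\nint count;\n";
        System.out.print(rewriteFor64Bit(source));
    }
}
```

With a real preprocessor the same effect would need only a single macro definition selected by a build flag, and no custom tool.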

This is one case where a standard preprocessor would obviously have helped. There are others. Ironically, Java needs a preprocessor more than C++; C++ has language features that remove the need for a preprocessor in many cases (typedef, templates).

Not having a preprocessor has led to other, less obvious problems with Java. It doesn't lend itself easily to tools which generate source code, like yacc or lex. The problem has several roots:

Preprocessing becomes a real burden. The preprocessor must either be run on the entire source tree in advance, or it must be integrated into the compiler itself.

My Solution

To address these issues and as a basis for further experimentation, I have developed an experimental patch for Jikes. It invokes CPP before every file is processed and augments Jikes to understand "#line" directives. This allows full usage of C-style macros in Java programs and enables code generation techniques to be used as a part of the build process. Compile-time errors and debugging information refer to locations in the original file, as they should. There are some limitations though:

I have integrated this in my build process and so far it seems to be working well (Jikes is so much faster than Sun's javac that even with the overhead of invoking CPP for every file, my compilation times still went way down).
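The role of the "#line" directives can be illustrated with a small sketch. This is not the code from the patch; it only shows, under my own assumptions, how a tool can map a line of preprocessed output back to its original location. It accepts both the standard `#line N "file"` form and the `# N "file"` linemarkers that GNU cpp writes in its output. The class name `LineMap` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: maps lines of preprocessed text back to original file:line pairs.
// ASSUMPTION: names and structure are illustrative, not taken from the Jikes patch.
class LineMap {
    // Group 1 is the line number, group 2 the optional quoted file name.
    private static final Pattern DIRECTIVE =
        Pattern.compile("^#\\s*(?:line\\s+)?(\\d+)(?:\\s+\"([^\"]*)\")?.*$");

    // locations.get(i) is the origin of preprocessed line i + 1,
    // or null when that line is itself a directive.
    private final List<String> locations = new ArrayList<>();

    LineMap(String preprocessed, String initialFile) {
        String file = initialFile;
        int line = 1;
        for (String text : preprocessed.split("\n", -1)) {
            Matcher m = DIRECTIVE.matcher(text);
            if (m.matches()) {
                // The directive states the number of the line that follows it.
                line = Integer.parseInt(m.group(1));
                if (m.group(2) != null) {
                    file = m.group(2);
                }
                locations.add(null);
            } else {
                locations.add(file + ":" + line);
                line++;
            }
        }
    }

    // preprocessedLine is 1-based, like compiler diagnostics.
    String originalLocation(int preprocessedLine) {
        return locations.get(preprocessedLine - 1);
    }
}
```

A compiler that keeps such a mapping while scanning can report every diagnostic against the file the programmer actually edited rather than against the preprocessor's output.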

This is the patch against jikes-1.22: jikes-1.22-cpp.patch.gz. The modified compiler needs GNU's cpp in the executable path. I have only compiled and used it under Win32+Cygwin, but it should work on other platforms. If you decide to try it and experience any problems, please let me know.

To apply the patch:

Further Improvement

The C preprocessor is not particularly powerful or convenient. Sometimes it would be nice to be able to do more meaningful things in the preprocessing phase - for example change the case of identifiers, generate code conditionally, etc.

There is nothing preventing the usage of a more advanced pre-processor like M4. The problem with M4 is that it has awkward syntax and even though it can be very powerful, it is not easy to harness its power (kinda like C++ templates).

Ideally I would like to experiment with a preprocessor implementing a complete programming language. The possibilities that would open up are endless - imagine a parser generator executing as a preprocessor macro. There is really no reason why it should be a separate build step. A more practical example is a regular expression compiler: static regular expressions can be compiled into optimized matchers at compile time, and there is no reason why that should happen at runtime.
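As a rough illustration of the regular expression case, the sketch below contrasts the usual runtime approach with the kind of code a compile-time macro might emit. The pattern `-?[0-9]+` and all names are my own choices for the example, and the "generated" method is written by hand here; no such preprocessor exists in the patch.

```java
import java.util.regex.Pattern;

// Sketch: a static regular expression handled at runtime versus
// the straight-line matcher a preprocessor could generate for it.
// ASSUMPTION: the pattern and method names are illustrative only.
class SignedIntegerMatcher {
    // Conventional approach: the pattern is parsed and compiled when the class loads.
    private static final Pattern RUNTIME = Pattern.compile("-?[0-9]+");

    static boolean matchesRuntime(String s) {
        return RUNTIME.matcher(s).matches();
    }

    // What a compile-time expansion of -?[0-9]+ might look like:
    // no pattern parsing and no matcher objects, just a loop.
    static boolean matchesGenerated(String s) {
        int i = 0;
        int n = s.length();
        if (i < n && s.charAt(i) == '-') {
            i++; // optional leading minus
        }
        int firstDigit = i;
        while (i < n && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++; // one or more digits
        }
        // At least one digit was consumed and nothing is left over.
        return i > firstDigit && i == n;
    }
}
```

Both methods accept exactly the same strings; the difference is only in when the work of interpreting the pattern is done.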

Some people tell me that such a preprocessor already exists and it is called Lisp :-) While that is true, Lisp programming isn't very practical these days and to be completely honest I am not a big fan of Lisp syntax (or lack thereof).

Further, I see extensive use of preprocessing as a tool to improve execution efficiency by moving as much computation as possible to compile time. That should be applicable to conventional programming languages like Java.

Of course extrapolating this idea to its logical conclusion eventually leads us to a preprocessor which essentially is able to do compile-time reflection and even AST manipulation. That would be the ultimate tool and it doesn't even seem that hard to implement. Imagine the possibilities: lots of things that currently can only be achieved with runtime bytecode instrumentation (which is ultimately error-prone and inefficient) could happen transparently at compile time. It could change the face of frameworks like Tapestry and Hibernate.
