
INTRODUCTION TO CLASS FILES


When a ".java" file is compiled by a Java compiler, it produces one or more ".class" files. Each class file contains exactly one Java type, which may be a class or an interface. In this section we are going to discuss the Java Virtual Machine (JVM) ".class" file format.

A class file is a stream of 8-bit bytes. 16-bit, 32-bit and 64-bit quantities are constructed by reading 2, 4 or 8 consecutive bytes. Multi-byte quantities are stored in big-endian format, with the high byte first. Intel microprocessors are little-endian, so a program that reads a class file on such a machine must assemble each value from its individual bytes rather than copy the bytes straight into an integer.
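The byte assembly described above can be sketched as follows. This is a minimal sketch; the helper names read_u2 and read_u4 are our own and do not come from any library.

```cpp
#include <cstdint>

// Hypothetical helpers: build big-endian values one byte at a time.
// Because the bytes are combined with shifts, the result is the same
// on little-endian (Intel) and big-endian hosts.
inline std::uint16_t read_u2(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t read_u4(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```

For example, the two bytes 00 12 passed to read_u2 give 18, whichever microprocessor the program runs on.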

A class file structure is given below.

    class file 
     {
         magic  -  4 bytes.
         minor version   - 2 bytes.
         major version   - 2 bytes.
         constant pool count  -  2 bytes.
         cp_info constant pool [constant pool count]
         access flags  -  2 bytes.
         this class       -  2 bytes.
         super class    - 2 bytes. 
         interface count - 2 bytes.
         interfaces[interface count] 
         field count    - 2 bytes.
         field_info fields[field count]
         method count     - 2 bytes.
         method_info methods[method count]
         attribute count     - 2 bytes.
         attribute_info attributes[attribute count]
     }

The best way to understand the class file format is by understanding the actual bits and bytes.

Here we have a deliberately trivial Java program. Let's call the source file "zzz.java". As usual we have to compile this file with a Java compiler (e.g. Sun's javac). After successful compilation we get a ".class" file. This file will be named "aaa.class", because a class file takes its name from the class and not from the source file. The "zzz.java" file is not important as it is just a text file, but the "aaa.class" file is important as it contains the actual bytes.

        class aaa
        {
           public void abc(int i)
            {
              int j;
              j=2;
            }
        }

What is this aaa.class file and what does it contain?

Intel and Motorola microprocessors each have their own machine language, i.e. a particular code may mean one thing to an Intel microprocessor and something else to a Motorola microprocessor. So, to make Java machine independent, a hypothetical microprocessor was invented which has its own machine language. This hypothetical microprocessor is called the JAVA VIRTUAL MACHINE or the JVM. The JVM knows nothing about the Java programming language; it only understands the class file format. Every JVM instruction starts with a one-byte operation code (opcode), which may be followed by operand bytes, and these instructions are called BYTECODES. The "aaa.class" file contains these bytecodes, which the JVM executes in the same way whether the underlying microprocessor is an Intel or a Motorola. As an opcode is one byte large (8 bits), there can be at most 256 different opcodes. So, all that Java stands on is these opcodes and something more which will be discussed later.

So, for the time being, let's assume that with this vast knowledge of class files we can proceed to understand the actual format of the bytecodes generated in the class file by the Java compiler.

First, just try "edit aaa.class" from the DOS prompt and look at the contents of the class file. You will notice that the file is binary: apart from a few readable strings it shows mostly unprintable characters, from which no one can figure out the format of the instructions (bytecodes).

To make this simpler we will write a small ".CPP" program which reads the class file byte by byte and displays each byte in decimal, hexadecimal and character format. Let us call this program "bytes.cpp".


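#include <stdio.h>
#include <stdlib.h>

/* Write one byte to the output file as decimal..hex..character. */
void abc(FILE *out, int ch)
{
    fprintf(out, "%d..%x..%c\n", ch, ch, ch);
}

int main(int argc, char *argv[])
{
    FILE *fp, *out;
    int ch;

    if (argc < 2) {                     /* the class file name is required */
        printf("Usage: bytes <file>\n");
        return 1;
    }
    fp = fopen(argv[1], "rb");          /* binary mode: no newline translation */
    if (fp == NULL) {
        printf("Error reading file\n");
        return 1;
    }
    out = fopen("z.txt", "a");          /* opened once, not once per byte */
    if (out == NULL) {
        printf("Error opening z.txt\n");
        fclose(fp);
        return 1;
    }
    while ((ch = fgetc(fp)) != EOF)     /* fgetc returns EOF at end of file */
        abc(out, ch);
    printf("Over");
    fclose(out);
    fclose(fp);
    return 0;
}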


Now compile bytes.cpp and run the resulting program with aaa.class as its argument. Note that this program works under DOS, where a file extension can have at most 3 characters, so the extension of the class file has to be shortened from 5 characters to 3, i.e. from aaa.class to aaa.cla. As specified in the program, we will have to look for the output in the file "z.txt". So, edit z.txt and go through the actual bytes.

The output will be in the form:

       dec      hex      char

       202..    ca..     ascii character.

The first value is the byte as a decimal number, the second is the same byte as a hexadecimal number and the third is the corresponding character. Going through the bits and bytes of z.txt, one can figure out how the Java program is converted into bits and bytes in a class file.

Now let us see the actual bytes of aaa.class.

The first four bytes are

    HEX     DEC
    CA      202
    FE      254
    BA      186
    BE      190

These bytes are called the magic number. The JVM, including the one built into Netscape, Internet Explorer or any other browser, uses it to identify the file as a class file. If these bytes are changed, the file is no longer recognised as a .class file.

The next 4 bytes give the version number of the class file format, which depends on the Java compiler that produced the file.

    HEX     DEC
    00      00
    03      03
    00      00
    2d      45

00 03 is the minor version number and 00 2d (decimal 45) is the major version number. Thus the version number is 45.3. Note that the high byte comes first, as the bytes are stored in big-endian format.
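The first eight bytes discussed so far can be checked with a small routine. What follows is only a sketch; the names ClassHeader and parse_header are our own assumptions, not part of any existing API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical structure for the first 8 bytes of a class file.
struct ClassHeader {
    std::uint32_t magic;
    std::uint16_t minor_version;
    std::uint16_t major_version;
};

// Fills `out` from the first 8 bytes of `b` (high byte first).
// Returns true only when enough bytes are present and the magic
// number is CA FE BA BE.
inline bool parse_header(const unsigned char* b, std::size_t n, ClassHeader& out) {
    if (n < 8) return false;
    out.magic = (static_cast<std::uint32_t>(b[0]) << 24) |
                (static_cast<std::uint32_t>(b[1]) << 16) |
                (static_cast<std::uint32_t>(b[2]) << 8)  |
                 static_cast<std::uint32_t>(b[3]);
    out.minor_version = static_cast<std::uint16_t>((b[4] << 8) | b[5]);
    out.major_version = static_cast<std::uint16_t>((b[6] << 8) | b[7]);
    return out.magic == 0xCAFEBABEu;
}
```

Fed the bytes CA FE BA BE 00 03 00 2D from aaa.class, this gives minor version 3 and major version 45.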

The next 2 bytes give the constant pool count.

    HEX     DEC
    00      00
    12      18

This means the constant pool count is 18. The constant pool is an array of variable-length structures, which would be numbered 0 to 17. In reality the 0th structure is reserved by the JVM for internal use and does not appear in the file. Thus we have 17 variable-length structures, numbered 1 to 17, which give information about the string constants, class names, field names and other constants that are referred to within the class file structure and its substructures.

After this the constant pool begins. The constant pool has the following general format.

      cp_info
       {
           tag;
           info[];
       }

The constant pool is an array of cp_info structures. Each structure starts with a one-byte tag, and the value of the tag determines the size of the info[] array.

The following table lists the tags used in .class files for the different constant types.

          constant type           value of tag

          CONSTANT_Class                  7  
          CONSTANT_Fieldref               9
          CONSTANT_Methodref             10
          CONSTANT_InterfaceMethodref    11
          CONSTANT_String                 8
          CONSTANT_Integer                3
          CONSTANT_Float                  4
          CONSTANT_Long                   5
          CONSTANT_Double                 6
          CONSTANT_NameAndType           12
          CONSTANT_Utf8                   1     
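When reading the output of bytes.cpp it helps to translate a tag byte back into its name. A minimal sketch built from the table above follows; the function name tag_name is our own.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: name of a constant pool tag, taken from the table above.
inline std::string tag_name(std::uint8_t tag) {
    switch (tag) {
        case 1:  return "CONSTANT_Utf8";
        case 3:  return "CONSTANT_Integer";
        case 4:  return "CONSTANT_Float";
        case 5:  return "CONSTANT_Long";
        case 6:  return "CONSTANT_Double";
        case 7:  return "CONSTANT_Class";
        case 8:  return "CONSTANT_String";
        case 9:  return "CONSTANT_Fieldref";
        case 10: return "CONSTANT_Methodref";
        case 11: return "CONSTANT_InterfaceMethodref";
        case 12: return "CONSTANT_NameAndType";
        default: return "unknown";   // tag 2 and anything else not in the table
    }
}
```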

Coming back to where we left off, the next byte after the constant pool count is 7, which is the tag of the first structure. This means that the next 2 bytes give an index into the constant pool where the name of the class is stored. Here the next two bytes are 00 0f (decimal 15), which means that the 15th structure holds the class name, i.e. aaa. Similarly, every following structure starts with a tag which tells us what kind of information it holds, such as the name of the Java file, the super class name, etc.

Thus when we get tag 7 we will consider the structure as

      CONSTANT_Class_info
			{
				tag;
				name_index;
			}

Here name_index is the number of the constant pool entry (a CONSTANT_Utf8_info structure) where the name of the class is stored.

The third structure starts with tag 10, which indicates the type CONSTANT_Methodref. The next 4 bytes belong to it. The first 2 bytes are the class_index, which gives the number of the entry where we get the CONSTANT_Class_info structure of the class the method belongs to. In our case the class_index bytes are 00 02; read high byte first, they give the number 2, where a CONSTANT_Class_info structure is stored. The next 2 bytes are 00 04, which is the name_and_type_index. Array number 4 holds a CONSTANT_NameAndType_info structure, which has the name and descriptor of the method. When we get tag 10 we will consider the following structure.

      CONSTANT_Methodref_info
			  {
				tag;
				class_index;
				name_and_type_index;
			  }


The fourth structure starts with tag 12, which indicates the type CONSTANT_NameAndType. This structure gives information about a method without indicating which class it belongs to. Of the next 4 bytes, the first 2 are the name index, which points to the entry where a CONSTANT_Utf8_info structure gives the Java method name. In our case these bytes are 00 07, and in pool entry 7 we get the method name.

The next 2 bytes give the descriptor index. It points to the entry in the constant pool where another CONSTANT_Utf8_info structure resides. In our case these two bytes are 00 05, and in pool entry 5 we get the descriptor (signature) "()V". This describes a method that takes no arguments and returns void.

When we get the tag 12 we will consider the following structure.

  
      CONSTANT_NameAndType_info
				{
					tag;
					name_index;
					descriptor index;
				}

After that, the 5th to 17th structures all have tag 1, which indicates the type CONSTANT_Utf8. The next 2 bytes give the length of the string that follows. Take the 5th structure: the two bytes after tag 1 are 00 03, which gives a string length of 3. After those bytes the string follows, and it is "()V". When we get tag 1 we will consider the following structure.

     CONSTANT_Utf8_info
			{ 
				tag ;
				length;
				byte[length];
            }
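Since every structure has a different size, the only way to reach the bytes after the constant pool is to walk through it entry by entry. The sketch below shows one way to do this. The function name skip_constant_pool is our own, and the sizes for tags that do not occur in aaa.class (for example Long and Double, which also occupy two pool indexes) are taken from the JVM specification rather than from the bytes examined here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper. `pos` is the offset of the 2-byte constant pool count.
// Returns the offset of the first byte after the constant pool,
// or 0 if the data is truncated or contains an unknown tag.
inline std::size_t skip_constant_pool(const unsigned char* b, std::size_t n, std::size_t pos) {
    if (pos + 2 > n) return 0;
    unsigned count = (b[pos] << 8) | b[pos + 1];
    pos += 2;
    for (unsigned i = 1; i < count; ++i) {          // entries are numbered 1..count-1
        if (pos >= n) return 0;
        std::uint8_t tag = b[pos++];
        std::size_t size = 0;                       // size of info[] for this tag
        switch (tag) {
            case 1:                                 // Utf8: 2-byte length + bytes
                if (pos + 2 > n) return 0;
                size = 2 + static_cast<std::size_t>((b[pos] << 8) | b[pos + 1]);
                break;
            case 7: case 8:                         // Class, String: one index
                size = 2; break;
            case 3: case 4:                         // Integer, Float
            case 9: case 10: case 11: case 12:      // the *ref types, NameAndType
                size = 4; break;
            case 5: case 6:                         // Long, Double take two indexes
                size = 8; ++i; break;
            default:
                return 0;
        }
        if (pos + size > n) return 0;
        pos += size;
    }
    return pos;
}
```

Called on the bytes of aaa.class with pos = 8, the returned offset would be where the access flags begin.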

Besides these, depending on the code in the Java file we may get other tags, and for each of those tags we have to follow the corresponding structure. For more information refer to "THE JAVA VIRTUAL MACHINE SPECIFICATION" (by Tim Lindholm & Frank Yellin). At present we have not considered the other tags and their structures.

After the constant pool, two bytes give the access flags. They tell us about the class or interface declaration. The lowest bit (0x0001) is set when the class is declared public.

       HEX DEC
       00  00
       20  32

Here the last byte, i.e. 32, can be written in bitwise format as 0010 0000. The lowest bit is clear, which indicates it's not a public class. If we write "public class aaa", then the lowest bit will be set, indicating it's a public class. The bit that is set here, 0x0020, is the ACC_SUPER flag. The other bits of the access flags tell us whether the type is final (0x0010), an interface (0x0200) or abstract (0x0400).

Access flags are followed by This class (2 bytes). The value is an index into the constant pool where a CONSTANT_Class_info structure is stored. In our case these bytes are
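The individual bits can be tested with a mask. The following sketch assumes the flag values defined for classes in the JVM specification (ACC_PUBLIC 0x0001, ACC_FINAL 0x0010, ACC_SUPER 0x0020, ACC_INTERFACE 0x0200, ACC_ABSTRACT 0x0400); the function name class_flags_to_string is our own.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: lists the names of the class access flags that are set.
inline std::string class_flags_to_string(std::uint16_t flags) {
    struct Flag { std::uint16_t mask; const char* name; };
    static const Flag table[] = {
        {0x0001, "ACC_PUBLIC"},
        {0x0010, "ACC_FINAL"},
        {0x0020, "ACC_SUPER"},
        {0x0200, "ACC_INTERFACE"},
        {0x0400, "ACC_ABSTRACT"},
    };
    std::string out;
    for (const Flag& f : table) {
        if (flags & f.mask) {               // bit set?
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out;
}
```

For the bytes 00 20 of aaa.class this gives "ACC_SUPER"; for "public class aaa" the bytes would be 00 21 and the result "ACC_PUBLIC ACC_SUPER".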

       HEX DEC
       00  00
       01  01 

In the array no.1 in constant pool the bytes are

       HEX DEC
       07  07
       00  00 
       0f  15

And the corresponding bytes in array no. 15 in constant pool are

       HEX DEC CHAR
       01  01
       00  00  
       03  03      
       61  97    a
       61  97    a
       61  97    a

The value of This class is 1, which tells us to go to constant pool no. 1 (or array no. 1 in the constant pool). In that entry, tag 7 tells us to read the following two bytes and then go to that particular constant pool entry. Here they give the no. 15. In the 15th constant pool entry we get the string "aaa". This shows that the name of This class is aaa.
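The two-step lookup just described (class index to CONSTANT_Class_info, then name_index to CONSTANT_Utf8_info) can be sketched as below. The PoolEntry structure is a simplified stand-in of our own for an already parsed constant pool, and class_name is likewise a hypothetical name.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified constant pool entry.
struct PoolEntry {
    std::uint8_t  tag = 0;     // 7 = CONSTANT_Class, 1 = CONSTANT_Utf8
    std::uint16_t index = 0;   // name_index when tag is 7
    std::string   text;        // string bytes when tag is 1
};

// pool[0] is unused, matching the numbering in the class file.
// Returns an empty string when the chain of indexes is broken.
inline std::string class_name(const std::vector<PoolEntry>& pool, std::uint16_t class_index) {
    if (class_index == 0 || class_index >= pool.size()) return std::string();
    const PoolEntry& cls = pool[class_index];
    if (cls.tag != 7 || cls.index == 0 || cls.index >= pool.size()) return std::string();
    const PoolEntry& name = pool[cls.index];
    if (name.tag != 1) return std::string();
    return name.text;
}
```

With entry 1 holding tag 7 and name_index 15, and entry 15 holding the Utf8 string "aaa", class_name(pool, 1) gives "aaa". The same lookup applies to the super class index.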

The next two bytes indicate the super class. The bytes are

       HEX DEC
       00  00
       02  02 

In the array no.2 in constant pool the bytes are

       HEX DEC
       07  07
       00  00 
       11  17

And the corresponding bytes in array no. 17 in constant pool are

       HEX DEC CHAR
       01  01
       00  00  
       10  16
                j
                a
                v
                a
                /
                l
                a
                n
                g
                /
                O
                b
                j
                e
                c
                t            

Thus the class is derived from java/lang/Object. Every class in Java is ultimately derived from java/lang/Object. If we write the code as "class aaa extends Applet", then the super class comes to be java/applet/Applet. The next two bytes give the number of interfaces.

       HEX DEC
       00  00
       00  00

This tells us that the class implements no interfaces. The next two bytes give the number of fields, called the field count.

       HEX DEC
       00  00
       00  00

In the present ".java" file we haven't included any fields. The following two bytes give the method count.

       HEX DEC
       00  00
       02  02  

There are two methods in our class file. But if we look carefully at our Java program, only one method is present, named abc(). This is because the Java compiler supplies a constructor for a class that declares none. The bytes following the method count are stored in method_info structures.

method_info
	{  
		access flags;
		name index;
		descriptor index;
		attribute count;
		attribute info attribute[ attribute count ];
		} 

In our case there are 2 methods. Therefore there are two method_info structures. The first two bytes are the access flags, which are 00 01, indicating that the method is a public method. The next two bytes are the name index.

       HEX DEC
       00  00
       10  16  

And the corresponding bytes in array no. 16 in constant pool are

       HEX DEC CHAR
       01  01
       00  00  
       03  03      
       61  97  a
       62  98  b
       63  99  c

So the name of the first method is 'abc'. After this we have the descriptor index (2 bytes).

       HEX DEC
       00  00
       06  06

And the corresponding bytes in array no. 06 in constant pool are

       HEX DEC CHAR
       01  01
       00  00  
       04  04      
       28  40  (
       49  73  I
       29  41  )
       56  86  V

This is the signature of the method abc(). It looks like (I)V, where the 'I' within the brackets indicates that an integer argument is passed to the method, and the V after the closing bracket is the return type void. Now we have the attribute count (2 bytes).

       HEX DEC
       00  00
       01  01
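Before going on to the attributes, the descriptor strings seen so far, (I)V and ()V, can be turned into a readable form with a small decoder. This is a sketch under our own naming (decode_type, describe_method). The type letters other than I and V do not occur in aaa.class and are taken from the JVM specification.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes one type starting at d[pos] and advances pos.
// Returns "?" on malformed input.
inline std::string decode_type(const std::string& d, std::size_t& pos) {
    if (pos >= d.size()) return "?";
    char c = d[pos++];
    switch (c) {
        case 'B': return "byte";
        case 'C': return "char";
        case 'D': return "double";
        case 'F': return "float";
        case 'I': return "int";
        case 'J': return "long";
        case 'S': return "short";
        case 'Z': return "boolean";
        case 'V': return "void";
        case '[': return decode_type(d, pos) + "[]";     // array of the next type
        case 'L': {                                      // class name up to ';'
            std::size_t end = d.find(';', pos);
            if (end == std::string::npos) { pos = d.size(); return "?"; }
            std::string name = d.substr(pos, end - pos);
            pos = end + 1;
            return name;
        }
        default:  return "?";
    }
}

// Hypothetical helper: "(I)V" becomes "void (int)".
inline std::string describe_method(const std::string& d) {
    if (d.empty() || d[0] != '(') return "?";
    std::size_t pos = 1;
    std::string params;
    while (pos < d.size() && d[pos] != ')') {
        if (!params.empty()) params += ", ";
        params += decode_type(d, pos);
    }
    if (pos >= d.size()) return "?";                     // missing ')'
    ++pos;                                               // skip ')'
    return decode_type(d, pos) + " (" + params + ")";
}
```

Here describe_method("(I)V") gives "void (int)", which matches the declaration public void abc(int i).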

Attributes describe the properties of the method. The attribute count is followed by the attribute name index (2 bytes).

       HEX DEC
       00  00
       08  08

And the corresponding bytes in array no. 08 in constant pool are

       HEX DEC CHAR
       01  01
       00  00  
       04  04     
       43  67    C
       6f  111   o
       64  100   d
       65  101   e

This means the attribute of the method is Code. In this class file each method has only one attribute, and it is Code. The Code attribute structure is as follows:

      code_attribute {
         attribute name index ;
         attribute length;
         max stack;
         max locals;
         code length;
         code [ code length ];
         exception table length;
           { 
            start pc;
            end pc;
            handler pc;
            catch type; 
           }exception table [exception table length ]
         attribute count;
         attribute info attribute[ attribute count ];
       }

We have already discussed the attribute name index (2 bytes). The attribute length (4 bytes) gives the length in bytes of the rest of the Code attribute.

       HEX DEC
       00  00
       00  00 
       00  00
       1f  31

Thus the attribute length is 31. Next, the max stack (2 bytes) bytes are

       HEX DEC
       00  00
       01  01

Max stack gives the maximum number of words on the operand stack at any point during execution of the method. After that, 2 bytes give max locals, the number of local variables used by this method, including its parameters.

       HEX DEC
       00  00
       03  03

The following 4 bytes give the code length.

       HEX DEC
       00  00
       00  00
       00  00
       03  03

After that the code[] array follows, which consists of as many bytes as the code length. From the bytes that follow, the exception table length (2 bytes) comes to zero. The next 2 bytes are the attribute count of the Code attribute.

       HEX DEC
       00  00
       01  01

After this, 2 bytes of attribute name index follow.

       HEX DEC
       00  00
       0b  11

At constant pool no.11 the bytes are

       HEX DEC CHAR
       01  01
       00  00
       0f  15
                L
                i
                n
                e
                N
                u
                m
                b
                e 
                r
                T
                a
                b
                l
                e

Thus the attribute name in code attribute is LineNumberTable .

The LineNumberTable has following structure.

         {
           attribute name index ;
           attribute length;
           line number table length ;
            { 
                start pc;
                line number; 
            } line number table[line number table length ];
         }

The attribute name index (2 bytes) points to the entry in the constant pool where a CONSTANT_Utf8_info structure is stored. This structure represents the string "LineNumberTable".

This is followed by the attribute length (4 bytes), which is equal to 10 in our case.
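With this length known, the Code attribute length of 31 seen earlier can be cross-checked by adding up the sizes of the fields in the code_attribute structure. The sketch below does the arithmetic; the function name code_attribute_length is our own, and the figure of 8 bytes per exception table entry follows from its four 2-byte fields.

```cpp
#include <cstdint>

// Hypothetical helper: the value expected in the attribute length field of a
// Code attribute. `nested_attribute_bytes` is the total size of the attributes
// nested inside it, each counted as 6 header bytes plus its own length.
inline std::uint32_t code_attribute_length(std::uint32_t code_length,
                                           std::uint32_t exception_entries,
                                           std::uint32_t nested_attribute_bytes) {
    return 2                        // max stack
         + 2                        // max locals
         + 4                        // code length field
         + code_length              // code[] bytes
         + 2                        // exception table length field
         + 8 * exception_entries    // start pc, end pc, handler pc, catch type
         + 2                        // attribute count field
         + nested_attribute_bytes;
}
```

For the method abc() we have a code length of 3, no exception table entries and one LineNumberTable attribute of 6 + 10 bytes, so code_attribute_length(3, 0, 16) gives 2 + 2 + 4 + 3 + 2 + 0 + 2 + 16 = 31.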

Line number table length gives the no. of entries in the line number table array. From the actual bytes it comes out to be 2.

In the first entry, start pc (2 bytes) = 0 and line number (2 bytes) = 6. In the second entry, start pc (2 bytes) = 2 and line number (2 bytes) = 3.
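Reading these entries is a matter of taking the 2-byte count and then pairs of 2-byte values. A minimal sketch follows, with LineEntry and parse_line_numbers as names of our own choosing. It expects a pointer to the bytes that come right after the attribute length field.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical structure for one line number table entry.
struct LineEntry {
    std::uint16_t start_pc;
    std::uint16_t line_number;
};

// `body` points just past the attribute length field and `n` is the
// attribute length. Returns an empty vector if the data is too short.
inline std::vector<LineEntry> parse_line_numbers(const unsigned char* body, std::size_t n) {
    std::vector<LineEntry> out;
    if (n < 2) return out;
    std::size_t count = static_cast<std::size_t>((body[0] << 8) | body[1]);
    if (2 + count * 4 > n) return out;          // each entry is 4 bytes
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char* e = body + 2 + i * 4;
        LineEntry entry;
        entry.start_pc    = static_cast<std::uint16_t>((e[0] << 8) | e[1]);
        entry.line_number = static_cast<std::uint16_t>((e[2] << 8) | e[3]);
        out.push_back(entry);
    }
    return out;
}
```

The 10 bytes of our attribute, 00 02 followed by 00 00 00 06 and 00 02 00 03, give the two entries described above.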

Then the 2nd method starts, where we get

                 method name  - <init>
                 signature or descriptor -  ()V
                 attribute name - Code, which is followed by the code bytes

After this the LineNumberTable attribute of the second method follows.

Depending on the Java code, different attributes can appear in a class file. These include the Exceptions, LocalVariableTable, LineNumberTable, ConstantValue and Code attributes.

After the 2 methods, the attribute count of the class (2 bytes) follows. In our case it is 1.

Then the SourceFile attribute structure follows, which has the following format.

         {
            attribute name index ;
            attribute length;
            source file index;
         }

In our case attribute name index bytes are

       HEX DEC
       00  00
       0d  13

At that constant pool no. 13 we have

       HEX DEC CHAR
       01  01
       00  00
       0a  10
                S
                o
                u
                r
                c
                e
                F
                i
                l
                e 

Thus we find that the attribute is SourceFile.

The next 4 bytes give the attribute length. In our case it is 2.

       HEX DEC
       00  00
       00  00
       00  00
       02  02

This is followed by the source file index(2 bytes).

       HEX DEC
       00  00
       0e  14

At that constant pool 14 the bytes are

       HEX DEC CHAR
       01  01
       00  00
       06  06
                a
                .
                j
                a
                v
                a

Thus the source file index points to the name of the source file and gives the ".java" file name, i.e. a.java.

This is how the bytes are stored in a particular format by the Java compiler.