Java Institute DreamsCity 2000

USING CHECKSUMS

These tips were developed using Java(tm) 2 SDK, Standard Edition, 
v 1.2.2.


In the computer software field, a "checksum" is a value computed
from a stream of bytes. The checksum acts as a signature for the
bytes: it is produced by combining the bytes using some algorithm.
What's important is that changes or corruption in the byte stream
can be detected with a high degree of probability.

An example of checksum use is found in data transmission. An
application might transmit 100 bytes of information to another
application across a network. The sending application appends to
the bytes a 32-bit checksum that is computed from the values of the
bytes. On the receiving end of the transmission, the checksum is
computed again from the 100 bytes that were received. If the
checksum computed at the receiving end is different from the one
computed at the transmitting end, then the data (or the checksum
itself) has been corrupted in some way.
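
The transmission scenario above can be sketched in a few lines. This
is only an illustration, not a real network protocol: the class name
FrameChecksum, the method names addChecksum and isIntact, and the
choice to store the checksum as four big-endian bytes after the
payload are all assumptions made for the example. It uses
java.util.zip.CRC32, the class this tip goes on to describe, and
assumes a JDK that provides java.nio.ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class FrameChecksum {

    // Sender side: return the payload followed by its 32-bit CRC
    // (hypothetical layout: checksum stored as 4 big-endian bytes).
    static byte[] addChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt((int) crc.getValue()); // keep the low 32 bits of the long
        return buf.array();
    }

    // Receiver side: recompute the CRC over the payload bytes and
    // compare it with the checksum that arrived with them.
    static boolean isIntact(byte[] framed) {
        if (framed.length < 4) {
            return false; // too short to contain a checksum
        }
        int payloadLength = framed.length - 4;
        CRC32 crc = new CRC32();
        crc.update(framed, 0, payloadLength);
        int sent = ByteBuffer.wrap(framed, payloadLength, 4).getInt();
        return sent == (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        byte[] framed = addChecksum(payload);
        System.out.println("intact data accepted:    " + isIntact(framed));

        framed[42] ^= 0x01; // simulate a single-bit transmission error
        System.out.println("corrupted data accepted: " + isIntact(framed));
    }
}
```

The first line reports true and the second false, because CRC-32
always detects a single flipped bit.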

A checksum is typically much smaller than the data it's calculated
on, so many different byte sequences share the same checksum value.
A checksum therefore catches most, but not all, errors in the data.
Checksums closely resemble hash codes, in that an algorithm is
applied in each case to compute a number from a sequence of bytes.
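
As a small illustration of computing a number from a sequence of
bytes, the sketch below applies CRC-32 to two strings that differ in
one character. The class name ChecksumSize and the helper crcOf are
hypothetical names chosen for the example, and the choice of UTF-8
(via java.nio.charset.StandardCharsets, which needs Java 7 or later)
is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChecksumSize {

    // Compute the CRC-32 of a string's UTF-8 bytes. The result always
    // fits in 32 bits, however long the input is.
    static long crcOf(String text) {
        CRC32 crc = new CRC32();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        String original = "Tom Garcia";
        String altered = "Tim Garcia"; // one character changed

        System.out.println(Long.toHexString(crcOf(original)));
        System.out.println(Long.toHexString(crcOf(altered)));
        System.out.println("values differ: "
            + (crcOf(original) != crcOf(altered)));
    }
}
```

A change confined to a single byte, as here, is always detected by
CRC-32; larger changes are detected with high probability.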

The class java.util.zip.CRC32 implements one of the standard
checksum algorithms: CRC-32. To see how you might use
checksums, consider the following application: you're writing some 
strings to a text file, and you'd like to know whether the string 
list has been modified after writing. For example, you'd like to 
find out if someone used a text editor to edit the file. Here are 
two programs that make up the application. The first program 
writes a set of strings to a file, and computes a running checksum 
from the bytes of the string characters:

    import java.io.*;
    import java.util.zip.CRC32;

    public class Checksum1 {

        // list of names to write to a file

        static final String namelist[] = {
            "Jane Jones",
            "Tom Garcia",
            "Sally Smith",
            "Richard Robinson",
            "Jennifer Williams"
        };

        public static void main(String args[]) throws IOException {
            FileWriter fw = new FileWriter("out.txt");
            BufferedWriter bw = new BufferedWriter(fw);
            CRC32 checksum = new CRC32();

            // write the length of the list

            bw.write(Integer.toString(namelist.length));
            bw.newLine();

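            // write each name and update the checksum

            for (int i = 0; i < namelist.length; i++) {
                String name = namelist[i];
                bw.write(name);
                bw.newLine();
                // getBytes() uses the platform default encoding,
                // as does FileWriter; the reader must use the same one
                checksum.update(name.getBytes());
            }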

            // write the checksum

            bw.write(Long.toString(checksum.getValue()));
            bw.newLine();

            bw.close();
        }
    }

Running this program produces a file "out.txt" with these
contents:

    5
    Jane Jones
    Tom Garcia
    Sally Smith
    Richard Robinson
    Jennifer Williams
    4113203990

The number on the last line is a checksum computed from the bytes
of all five names. The count on the first line and the line
separators are not part of the checksum.

The second program reads the file, recomputes the checksum from the
names it finds, and compares the result with the stored value:

    import java.io.*;
    import java.util.zip.CRC32;

    public class Checksum2 {
        public static void main(String args[]) throws IOException {
            FileReader fr = new FileReader("out.txt");
            BufferedReader br = new BufferedReader(fr);
            CRC32 checksum = new CRC32();

            // read the number of names from the file

            int len = Integer.parseInt(br.readLine());

            // read each name from the file and update the checksum

            String namelist[] = new String[len];
            for (int i = 0; i < len; i++) {
                namelist[i] = br.readLine();
                checksum.update(namelist[i].getBytes());
            }

            // read the checksum

            long cs = Long.parseLong(br.readLine());

            br.close();

            // if checksum doesn't match, give error,
            // else display the list of names

            if (cs != checksum.getValue()) {
                System.err.println("*** bad checksum ***");
            }
            else {
                for (int i = 0; i < len; i++) {
                    System.out.println(namelist[i]);
                }
            }
        }
    }

This program reads the list of names from the file and displays the
names. If you edit "out.txt" with a text editor and change one of
the names, for example by changing "Tom" to "Thomas", the program
computes a different checksum and displays a checksum error message.

Now, you might think that a person could maliciously change the 
text file, compute a new checksum, and change that as well. This
is possible. CRC-32 is a published algorithm, so although a casual
user is unlikely to know how to calculate the new checksum value,
anyone with a CRC-32 implementation can do it. A checksum guards
against accidental changes, not deliberate tampering; to protect
against tampering, use a keyed message digest or a digital signature.

Another way of using checksums is through the CheckedInputStream and 
CheckedOutputStream classes in java.util.zip. These classes support 
computation of a running checksum on an I/O stream.
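
Here is a minimal sketch of how these two classes might be used. The
class name CheckedStreams and the helpers writeChecked and
readChecked are hypothetical, in-memory byte array streams stand in
for real files or sockets, and the try-with-resources statements
assume Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

class CheckedStreams {

    // Write the data through a CheckedOutputStream and return the
    // running checksum of everything written. Closes the sink.
    static long writeChecked(byte[] data, OutputStream sink)
            throws IOException {
        try (CheckedOutputStream out =
                 new CheckedOutputStream(sink, new CRC32())) {
            out.write(data);
            return out.getChecksum().getValue();
        }
    }

    // Read the source to the end through a CheckedInputStream and
    // return the running checksum of everything read.
    static long readChecked(InputStream source) throws IOException {
        try (CheckedInputStream in =
                 new CheckedInputStream(source, new CRC32())) {
            byte[] buf = new byte[64];
            while (in.read(buf) != -1) {
                // the stream updates its checksum on every read
            }
            return in.getChecksum().getValue();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Jane Jones\nTom Garcia\n"
            .getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = writeChecked(data, sink);
        long read = readChecked(
            new ByteArrayInputStream(sink.toByteArray()));

        System.out.println("checksums match: " + (written == read));
    }
}
```

With this approach the checksum covers every byte that passes through
the stream, including line separators, so there is no need to call
update() separately for each piece of data.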

Any comments? email to:
richard@dreamscity.net