USING CHECKSUMS
These tips were developed using Java(tm) 2 SDK, Standard Edition,
v 1.2.2.
In software, a "checksum" is a value computed from a stream of
bytes. The checksum acts as a signature for the bytes: an algorithm
combines them into a single number. What's important is that a
change to, or corruption of, the byte stream can be detected with
a high degree of probability.
An example of checksum use is found in data transmission. An
application might transmit 100 bytes of information to another
application across a network. The application appends to the bytes
a 32-bit checksum that is computed from the values of the bytes.
On the receiving end of the transmission, the checksum is computed
again based on the 100 bytes that were received. If the checksum
computed at the receiving end differs from the one computed at the
transmitting end, then the data (or the transmitted checksum) has
been corrupted in some way.
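The transmission scenario above can be sketched in a few lines. This
is only an illustration, not a real network protocol: the class name
FrameDemo, the methods frame and verify, and the choice to store the
checksum as four bytes, high byte first, after the payload are all
assumptions made for the example. It uses the java.util.zip.CRC32
class for the 32-bit checksum.

```java
import java.util.zip.CRC32;

// Hypothetical helper: appends a CRC-32 to a payload and checks it later.
class FrameDemo {
    // Return the payload followed by its CRC-32 as 4 bytes, high byte first.
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        long value = crc.getValue();   // always fits in 32 bits
        byte[] out = new byte[payload.length + 4];
        System.arraycopy(payload, 0, out, 0, payload.length);
        for (int i = 0; i < 4; i++) {
            out[payload.length + i] = (byte) (value >>> (24 - 8 * i));
        }
        return out;
    }

    // Recompute the CRC-32 over the payload part and compare it
    // with the 4 trailing bytes that were sent along.
    static boolean verify(byte[] framed) {
        int n = framed.length - 4;
        if (n < 0) {
            return false;              // too short to hold a checksum
        }
        CRC32 crc = new CRC32();
        crc.update(framed, 0, n);
        long sent = 0;
        for (int i = 0; i < 4; i++) {
            sent = (sent << 8) | (framed[n + i] & 0xFF);
        }
        return sent == crc.getValue();
    }

    public static void main(String args[]) {
        byte[] payload = new byte[100];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        byte[] framed = frame(payload);
        System.out.println("intact:    " + verify(framed));
        framed[42] ^= 0x01;            // simulate a one-bit error in transit
        System.out.println("corrupted: " + verify(framed));
    }
}
```

Running main should report the intact frame as valid and the frame
with the flipped bit as invalid.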
A checksum is typically much smaller than the data it's calculated
on, so different byte sequences can produce the same checksum. A
checksum therefore catches most, but not all, errors in the data.
Checksums closely resemble hash codes, in that an algorithm is
applied in each case to compute a number from a sequence of bytes.
The class java.util.zip.CRC32 implements one of the standard
checksum algorithms: CRC-32. To see how you might use
checksums, consider the following application: you're writing some
strings to a text file, and you'd like to know whether the string
list has been modified after writing. For example, you'd like to
find out if someone used a text editor to edit the file. Here are
two programs that make up the application. The first program
writes a set of strings to a file, and computes a running checksum
from the bytes of the string characters:
import java.io.*;
import java.util.zip.CRC32;

public class Checksum1 {

    // list of names to write to a file
    static final String namelist[] = {
        "Jane Jones",
        "Tom Garcia",
        "Sally Smith",
        "Richard Robinson",
        "Jennifer Williams"
    };

    public static void main(String args[]) throws IOException {
        FileWriter fw = new FileWriter("out.txt");
        BufferedWriter bw = new BufferedWriter(fw);
        CRC32 checksum = new CRC32();

        // write the length of the list
        bw.write(Integer.toString(namelist.length));
        bw.newLine();

        // write each name and update the checksum
        for (int i = 0; i < namelist.length; i++) {
            String name = namelist[i];
            bw.write(name);
            bw.newLine();
            checksum.update(name.getBytes());
        }

        // write the checksum
        bw.write(Long.toString(checksum.getValue()));
        bw.newLine();
        bw.close();
    }
}
The output of running this program is in a file "out.txt", with
contents:
5
Jane Jones
Tom Garcia
Sally Smith
Richard Robinson
Jennifer Williams
4113203990
The number on the last line is a checksum computed by combining all
the bytes found in the characters of the five names. The count on
the first line and the line separators are not part of the checksum.
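The term "running checksum" means that each call to update continues
from where the previous call left off. The following sketch, with
the made-up class name RunningChecksumDemo and helper methods
crcOfPieces and crcOfWhole, suggests one way to see this: feeding
the names one at a time gives the same value as feeding their
concatenation in a single call. It uses the default character
encoding, as the program above does.

```java
import java.util.zip.CRC32;

// Hypothetical demo class: compares piecewise and one-shot checksums.
class RunningChecksumDemo {
    // Update one CRC32 object with each piece in turn.
    static long crcOfPieces(String pieces[]) {
        CRC32 crc = new CRC32();
        for (int i = 0; i < pieces.length; i++) {
            crc.update(pieces[i].getBytes());
        }
        return crc.getValue();
    }

    // Compute the checksum of a whole string in a single update call.
    static long crcOfWhole(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes());
        return crc.getValue();
    }

    public static void main(String args[]) {
        String names[] = { "Jane Jones", "Tom Garcia" };
        long piecewise = crcOfPieces(names);
        long oneShot = crcOfWhole("Jane JonesTom Garcia");
        System.out.println(piecewise == oneShot);   // prints "true"
    }
}
```

One consequence is that the checksum in this application does not
record where one name ends and the next begins, because only the
name bytes are fed to update.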
The second program reads the file:
import java.io.*;
import java.util.zip.CRC32;

public class Checksum2 {
    public static void main(String args[]) throws IOException {
        FileReader fr = new FileReader("out.txt");
        BufferedReader br = new BufferedReader(fr);
        CRC32 checksum = new CRC32();

        // read the number of names from the file
        int len = Integer.parseInt(br.readLine());

        // read each name from the file and update the checksum
        String namelist[] = new String[len];
        for (int i = 0; i < len; i++) {
            namelist[i] = br.readLine();
            checksum.update(namelist[i].getBytes());
        }

        // read the checksum
        long cs = Long.parseLong(br.readLine());
        br.close();

        // if checksum doesn't match, give error,
        // else display the list of names
        if (cs != checksum.getValue()) {
            System.err.println("*** bad checksum ***");
        }
        else {
            for (int i = 0; i < len; i++) {
                System.out.println(namelist[i]);
            }
        }
    }
}
This program reads the list of names from the file and displays the
names. If you edit "out.txt" with a text editor, and change one of
the names, for example changing "Tom" to "Thomas", the program will
compute a different checksum, and display a checksum error message.
Now, you might think that a person could maliciously change the
text file, compute a new checksum, and change that as well. This
is possible. A casual user is unlikely to do it, but CRC-32 is a
published algorithm, so anyone who knows how the file was written
can compute a matching value. A checksum of this kind guards
against accidental changes, not against deliberate tampering.
Another way of using checksums is through the CheckedInputStream and
CheckedOutputStream classes in java.util.zip. These classes support
computation of a running checksum on an I/O stream.
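Here is a minimal sketch of the two stream classes. The class name
CheckedStreamDemo and the helper methods writeChecked and readChecked
are invented for the example, and in-memory byte array streams stand
in for a file or network connection. Each checked stream wraps
another stream and updates a CRC32 object as bytes pass through it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical demo class: checksums computed by the streams themselves.
class CheckedStreamDemo {
    // Write the data through a CheckedOutputStream and
    // return the checksum of everything written.
    static long writeChecked(byte[] data, OutputStream sink)
            throws IOException {
        CheckedOutputStream cos =
            new CheckedOutputStream(sink, new CRC32());
        cos.write(data);
        cos.flush();
        return cos.getChecksum().getValue();
    }

    // Read the source to its end through a CheckedInputStream and
    // return the checksum of everything read.
    static long readChecked(InputStream source) throws IOException {
        CheckedInputStream cis =
            new CheckedInputStream(source, new CRC32());
        byte[] buf = new byte[64];
        while (cis.read(buf) != -1) {
            // nothing to do: the stream updates the checksum itself
        }
        return cis.getChecksum().getValue();
    }

    public static void main(String args[]) throws IOException {
        byte[] data = "Jane Jones".getBytes();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = writeChecked(data, sink);
        long read = readChecked(
            new ByteArrayInputStream(sink.toByteArray()));
        System.out.println(written == read);   // prints "true"
    }
}
```

With this approach the checksum covers every byte that goes through
the stream, so the code doing the reading or writing does not need
to call update itself.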