IMPROVING SERIALIZATION
PERFORMANCE WITH
EXTERNALIZABLE
With serialization, you can customize how an object's fields are
mapped to a stream, and even recover when you encounter a stream
that has different fields from the ones you expect. This
flexibility is a benefit of the serialization format; the format
includes not just your object's field values, but also metadata
about the version of your class and its field names and types.
However, flexibility comes at the price of lower performance.
This is certainly true for serialization. This tip shows you how
to improve the performance of serialization by turning off the
standard serialization format. You do this by making your objects
externalizable. Let's start the tip with a programming example
that uses serializable objects:
import java.io.*;

class Employee implements Serializable {
    String lastName;
    String firstName;
    String ssn;
    int salary;
    int level;

    public Employee(String lastName, String firstName, String ssn,
                    int salary, int level)
    {
        this.lastName = lastName;
        this.firstName = firstName;
        this.ssn = ssn;
        this.salary = salary;
        this.level = level;
    }
}

public class TestSerialization {
    public static final int tests=5;
    public static final int count=5000;

    public static void appMain(String[] args) throws Exception {
        Employee[] emps = new Employee[count];
        for (int n=0; n<count; n++) {
            emps[n] = new Employee("LastName" + n, "FirstName" + n,
                                   "222-33-" + n, 34000 + n,
                                   n % 10);
        }
        for (int outer=0; outer<tests; outer++) {
            ObjectOutputStream oos = null;
            FileOutputStream fos = null;
            BufferedOutputStream bos = null;
            long start = System.currentTimeMillis();
            try {
                fos = new FileOutputStream("TestSerialization");
                bos = new BufferedOutputStream(fos);
                oos = new ObjectOutputStream(bos);
                for (int n=0; n<count; n++) {
                    oos.writeObject(emps[n]);
                }
                long end = System.currentTimeMillis();
                System.out.println("Serialization of " + count +
                    " objects took " + (end-start) + " ms.");
            }
            finally {
                if (oos != null) oos.close();
                if (bos != null) bos.close();
                if (fos != null) fos.close();
            }
            new File("TestSerialization").delete();
        }
    }

    public static void main(String[] args)
    {
        try {
            appMain(args);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The TestSerialization class is a simple benchmark that measures
how long it takes to write Employees into an OutputStream. It
creates 5000 fictitious employees, then writes them all into
a file. The test runs five times. If you run TestSerialization,
you should see output that looks something like this (your times
might differ substantially depending on environmental factors
such as your processor speed and other applications running in
your system):
Serialization of 5000 objects took 438 ms.
Serialization of 5000 objects took 203 ms.
Serialization of 5000 objects took 234 ms.
Serialization of 5000 objects took 188 ms.
Serialization of 5000 objects took 219 ms.
These results indicate that it was a good idea to run the test
more than once because the first run was so different from the
others. Ignoring the first run, which probably incurred some
one-time startup overhead, the results range from approximately
190-235 ms to write 5000 objects to a file.
The Employee class takes advantage of the simplest flavor of
serialization by implementing the marker interface Serializable;
this indicates to the Java(tm) virtual machine* that you want to
use the default serialization mechanism. Implementing the
Serializable interface allows you to serialize the Employee
objects by passing them to the writeObject() method of
ObjectOutputStream. ObjectOutputStream automates the process of
writing the Employee class metadata and instance fields to the
stream. In other words, it does all the serialization work for you.
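To see the class metadata that the default mechanism writes, you
can serialize one object into memory and look inside the bytes.
The following is a minimal sketch, not part of the benchmark. The
names SampleRecord and StreamMetadataPeek, and the field values,
are hypothetical and chosen only for illustration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical class used only for this illustration.
class SampleRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    String lastName = "Doe";
    int salary = 34000;
}

class StreamMetadataPeek {
    // Serializes the object in memory and returns the raw bytes as text.
    static String serializedAsText(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(o);
        }
        // ISO-8859-1 maps each byte to one char, so ASCII names stay searchable.
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String text = serializedAsText(new SampleRecord());
        System.out.println("stream contains 'lastName': " + text.contains("lastName"));
        System.out.println("stream contains 'salary': " + text.contains("salary"));

        // The same metadata is available through the class descriptor.
        ObjectStreamClass desc = ObjectStreamClass.lookup(SampleRecord.class);
        System.out.println("serialVersionUID: " + desc.getSerialVersionUID());
        for (ObjectStreamField f : desc.getFields()) {
            System.out.println(f.getType().getSimpleName() + " " + f.getName());
        }
    }
}
```

The field names appear in the stream because ObjectOutputStream
writes a class descriptor before it writes the field values.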
Though the work is automated, you might want faster results. How
do you improve the results? The answer is you need to write some
custom code. Begin by declaring that the Employee class implements
Externalizable instead of Serializable. You also need to declare
a public no-argument constructor for the Employee class.
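The no-argument constructor matters because the stream creates the
object with it before calling readExternal. The sketch below shows
what can happen when it is missing: writing succeeds, but reading
the object back fails with an InvalidClassException. The class
names NoDefaultCtor and NoArgConstructorCheck are hypothetical.

```java
import java.io.*;

// Hypothetical class: Externalizable, but lacking the public
// no-argument constructor that deserialization needs.
class NoDefaultCtor implements Externalizable {
    String name;

    public NoDefaultCtor(String name) {
        this.name = name;
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);
    }

    public void readExternal(ObjectInput in) throws IOException {
        name = in.readUTF();
    }
}

class NoArgConstructorCheck {
    // Returns true if reading the object back fails with
    // InvalidClassException.
    static boolean readFails() throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(new NoDefaultCtor("LastName0")); // writing works
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            ois.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Read failed without no-arg constructor: "
            + readFails());
    }
}
```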
When you declare that an object is Externalizable you assume full
responsibility for writing the object's state to the stream.
ObjectOutputStream no longer automates the process of writing your
class's metadata and instance fields to the stream. Instead, you
manipulate the stream directly using the methods readExternal and
writeExternal. Here is the code you need to add to the Employee
class:
public void readExternal(java.io.ObjectInput s)
    throws ClassNotFoundException, IOException
{
    lastName = s.readUTF();
    firstName = s.readUTF();
    ssn = s.readUTF();
    salary = s.readInt();
    level = s.readInt();
}

public void writeExternal(java.io.ObjectOutput s)
    throws IOException
{
    s.writeUTF(lastName);
    s.writeUTF(firstName);
    s.writeUTF(ssn);
    s.writeInt(salary);
    s.writeInt(level);
}
The ObjectInput and ObjectOutput interfaces extend the DataInput
and DataOutput interfaces, respectively. This gives you the
methods you need to use the stream. Through methods inherited from
DataInput and DataOutput, you can read and write primitive types
using methods such as readInt() and writeInt(), and read and write
strings using methods such as readUTF() and writeUTF(). (Java uses
a UTF-8 variant to encode Unicode strings; see RFC 2279 and the Java
Virtual Machine Specification for details.)
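Putting the pieces together, here is one way the complete
externalizable class might look, along with an in-memory round trip
to check that readExternal mirrors writeExternal. This is a sketch
under assumptions: the names ExtEmployee and ExternalizableRoundTrip
are hypothetical, and the round trip uses byte array streams rather
than the file used by the benchmark.

```java
import java.io.*;

// Hypothetical externalizable variant of the Employee class.
class ExtEmployee implements Externalizable {
    String lastName;
    String firstName;
    String ssn;
    int salary;
    int level;

    // Required: the stream calls this before readExternal.
    public ExtEmployee() {
    }

    public ExtEmployee(String lastName, String firstName, String ssn,
                       int salary, int level) {
        this.lastName = lastName;
        this.firstName = firstName;
        this.ssn = ssn;
        this.salary = salary;
        this.level = level;
    }

    public void writeExternal(ObjectOutput s) throws IOException {
        s.writeUTF(lastName);
        s.writeUTF(firstName);
        s.writeUTF(ssn);
        s.writeInt(salary);
        s.writeInt(level);
    }

    // Fields are read in exactly the order they were written.
    public void readExternal(ObjectInput s) throws IOException {
        lastName = s.readUTF();
        firstName = s.readUTF();
        ssn = s.readUTF();
        salary = s.readInt();
        level = s.readInt();
    }
}

class ExternalizableRoundTrip {
    // Writes the employee to a byte array and reads it back.
    static ExtEmployee roundTrip(ExtEmployee e) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(e);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            return (ExtEmployee) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ExtEmployee copy = roundTrip(new ExtEmployee(
            "LastName1", "FirstName1", "222-33-1", 34001, 1));
        System.out.println(copy.lastName + ", " + copy.firstName + ", "
            + copy.ssn + ", " + copy.salary + ", " + copy.level);
    }
}
```

Note that writeUTF() throws a NullPointerException for a null
string, so a class with optional fields would need extra handling.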
Try running the example again with the Externalizable version of
Employee. You should see better performance, for example:
Serialization of 5000 objects took 266 ms.
Serialization of 5000 objects took 125 ms.
Serialization of 5000 objects took 110 ms.
Serialization of 5000 objects took 156 ms.
Serialization of 5000 objects took 109 ms.
Again ignoring the first run, this gives a range of 109-156 ms,
which is about 35-40% faster than the serializable version.
Does this kind of performance advantage imply that you should
make all of your classes externalizable? Absolutely not. As you
can see, making a class externalizable requires writing more code.
And more code means more possible bugs. If you forget to write
a field, or read fields in a different order than you wrote them,
externalization will break. With the Serializable interface, these
problems are handled by the ObjectOutputStream. Probably the worst
disadvantage of externalizable objects is that you must have the
class in order to interpret the stream. This is because the stream
format is opaque binary data. With normal serializable classes the
stream format includes field names and types. So it is possible to
reconstruct the state of an object even without the object's class
file. Unfortunately, the Java serialization mechanism doesn't
include any code to do the reconstruction, so you will have to write
your own code to do that. (See the ObjectStreamWalker class at
http://staff.develop.com/halloway/JavaTools.html for sample code
to get you started.)
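The ordering pitfall is easy to reproduce. The sketch below uses
plain DataOutputStream and DataInputStream, which provide the same
DataOutput and DataInput methods you call inside writeExternal and
readExternal. The class name FieldOrderMismatch and its helper
methods are hypothetical. Because the stream carries no field names
or types, reading in the wrong order silently yields a wrong value
instead of reporting an error.

```java
import java.io.*;

class FieldOrderMismatch {
    // Writes lastName then salary, as a writeExternal method might.
    static byte[] write(String lastName, int salary) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(lastName);
            out.writeInt(salary);
        }
        return buf.toByteArray();
    }

    // Correct reader: same order as the writer.
    static int readSalaryInOrder(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data))) {
            in.readUTF();
            return in.readInt();
        }
    }

    // Buggy reader: asks for the int first, so it interprets the
    // string's length prefix and first characters as a number.
    static int readSalaryOutOfOrder(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write("LastName0", 34000);
        System.out.println("in order:     " + readSalaryInOrder(data));
        System.out.println("out of order: " + readSalaryOutOfOrder(data));
    }
}
```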
However, if performance is your primary concern, it's a good idea
to use externalizable objects. If your code manages a large number
of events in a Local Area Network and you need near real-time
performance, you will probably want to model the events as
externalizable objects.