Sosnoski Software Solutions, Inc.


XML Stream Performance

The results below were obtained with a test program included in the XML Stream (XMLS) download. The test first builds a dom4j or JDOM representation of the XML document, then outputs that representation repeatedly using Java serialization, text, or XMLS, and finally uses a copy of the output to reconstruct the representation repeatedly. The reported times cover the last 10 of 11 total passes for each operation.
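The timing approach described above (11 passes per operation, with only the last 10 counted) can be sketched roughly as follows. This is not the test program from the XMLS download: the class `PassTimer`, its methods, and the stand-in document (a plain list of strings rather than a dom4j or JDOM tree) are all assumptions made for illustration, and only Java serialization from the standard library is exercised. It also uses `System.nanoTime()` and lambdas, so it needs a much newer JVM than the one used for the original tests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class PassTimer {

    /** An operation that may throw, so I/O code can be timed directly. */
    public interface Operation {
        void run() throws Exception;
    }

    /**
     * Runs op totalPasses times and returns the elapsed nanoseconds summed
     * over every pass except the first skippedPasses.
     */
    public static long timeLastPasses(Operation op, int totalPasses, int skippedPasses)
            throws Exception {
        long total = 0;
        for (int pass = 0; pass < totalPasses; pass++) {
            long start = System.nanoTime();
            op.run();
            long elapsed = System.nanoTime() - start;
            if (pass >= skippedPasses) {
                total += elapsed;
            }
        }
        return total;
    }

    /** Output step: writes the in-memory representation to a byte array. */
    public static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Input step: rebuilds the representation from a copy of the output. */
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a parsed document representation.
        List<String> doc = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            doc.add("element-" + i);
        }
        byte[] copy = serialize(doc);

        long outputNanos = timeLastPasses(() -> serialize(doc), 11, 1);
        long inputNanos = timeLastPasses(() -> deserialize(copy), 11, 1);

        System.out.println("Output time (ms): " + outputNanos / 1_000_000.0);
        System.out.println("Input time (ms): " + inputNanos / 1_000_000.0);
        System.out.println("Output size (bytes): " + copy.length);
    }
}
```

Leaving the first pass out of the total keeps one-time costs such as class loading from dominating the measurement, which matters most for small inputs.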

The documents tested are intended to be representative of a wide range of applications:

  • much_ado.xml, the Shakespeare play marked up as XML. No attributes and a fairly flat structure, heavy text content (202K bytes).
  • periodic.xml, periodic table of the elements in XML. Some attributes, also fairly flat, relatively low text (117K bytes).
  • soap2.xml, generated list of values in SOAP document form. Heavy on namespaces and attributes (134K bytes).
  • xml.xml, the XML specification, with the DTD reference removed and all entities defined inline. Text-style markup with heavy mixed content, some attributes (160K bytes).
  • build.xml, the Ant build file for this project. Lots of attributes, low text (5K bytes).

Several of these documents contained non-significant whitespace. With the default options, XMLS is likely to encode this type of whitespace using handles, so it may have received more of an advantage than it otherwise would. The next set of tests will include documents with non-significant whitespace removed to see how much this affects the results.
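One way to prepare such whitespace-stripped documents is to drop every whitespace-only text node before running the tests. The sketch below uses the W3C DOM API from the standard Java library rather than dom4j or JDOM, and the class `WhitespaceStripper` and its methods are hypothetical names, not part of XMLS. It assumes that all whitespace-only text is non-significant, which holds for data-style documents such as build.xml but not necessarily for mixed content such as xml.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceStripper {

    /**
     * Recursively removes text nodes that contain only whitespace.
     * Returns the number of nodes removed.
     */
    public static int stripWhitespace(Node node) {
        int removed = 0;
        Node child = node.getFirstChild();
        while (child != null) {
            // Fetch the sibling first, since removing the child detaches it.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
                removed++;
            } else {
                removed += stripWhitespace(child);
            }
            child = next;
        }
        return removed;
    }

    /** Parses an XML string into a DOM document without validation. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Small made-up sample in the style of an Ant build file.
        String xml = "<project>\n  <target name=\"build\">\n    <javac/>\n  </target>\n</project>";
        Document doc = parse(xml);
        int removed = stripWhitespace(doc);
        System.out.println("Removed " + removed + " whitespace-only text nodes");
    }
}
```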

In the XMLS test runs the same adapter instances were used for each test pass but were reset between passes, so that each pass started from scratch without any retained information. Changing the code to create a new instance of the adapters for each test pass did not significantly change the results.

The timings shown are from tests using Sun Microsystems Java version 1.3.1, Java HotSpot(TM) Client VM 1.3.1-b24, on a 1 GHz Athlon system with 256 MB of RAM, running Red Hat Linux 7.1 with the default memory settings.

Figure 1. Output time

Figure 2. Input time

Figure 3. Roundtrip time

Figure 4. Output size

As you can see from these results, XMLS gives dramatic performance improvements over the standard text XML document format, which itself is much faster than Java serialization of the document representations.

The size reduction for XMLS is not as great overall as the time reduction, but still very good considering that the emphasis is on speed. The different document types make more of a difference in this area. Heavy text documents (much_ado.xml and xml.xml in this test) get much less benefit from the XMLS encoding than more structure-intensive documents such as soap2.xml and periodic.xml.

The build.xml file results are difficult to see on the scale of these graphs, but worth mentioning. This small file was included in the tests out of concern that the handle approach used by XMLS might not perform as well for small files as for larger ones. In terms of output size, the text output was about 50 percent larger than the XMLS output and the Java serialized output was more than 3 times the size, which is not that much different from the other test results.

The time differences were much more pronounced, though. With dom4j, the roundtrip time for text was more than 5 times that of XMLS, and for Java serialization more than 12 times. With JDOM, the XMLS time was about 50 percent longer than with dom4j, as with the other documents, but the text time was about 17 times that of XMLS and the Java serialization time about 8 times.

It appears from these results that text and Java serialization may have high startup overhead for relatively small documents. Contrary to initial expectations, it looks like XMLS may be even better as an alternative to these formats for small documents than for larger ones. Beyond this, using XMLS for streams of documents of the same type is likely to provide even greater performance improvements.


Copyright 1998-2002, Sosnoski Software Solutions, Inc.
All rights reserved.
