USING HPROF TO TUNE PERFORMANCE
How fast is the Java(tm) platform? For many applications, the
answer is "fast enough"-- if you make careful choices in your
design and make good use of the language. But while design
documents and coding standards might encourage efficient code,
the only way to know for sure is by profiling, that is, obtaining
method timing and other information pertinent to performance.
Fortunately, the tools that you need to do profiling are part of
the Java(tm) 2 SDK. This tip will get you started with HPROF,
the Java Profiler Agent, and present an example where a simple code
snippet is improved to run 100 times faster.
To see HPROF's options, enter the following at a command prompt:
java -Xrunhprof:help
One specification you can make for HPROF is cpu=samples. This
setting enables you to profile by sampling. With sampling, the
virtual machine regularly pauses execution and checks to see which
method call is active. With enough samples (and a decent
sampling rate), you can pinpoint where your code spends its time.
For example, consider the following example:
package com.develop.demos;
import java.io.IOException;
public class TestHprof {
public static String cat = null;
public final static int loop=5000;
public static void makeString() {
cat = new String();
for (int n=0; n<loop; n++) {
addToCat("more");
}
}
public static void addToCat(String more) {
cat = cat + more;
}
public static void makeStringInline() {
cat = new String();
for (int n=0; n<loop; n++) {
cat = cat + "more";
}
}
public static void makeStringWithLocal() {
String tmp = new String();
for (int n=0; n<loop; n++) {
tmp = tmp + "more";
}
cat = tmp;
}
public static void makeStringWithBuffer() {
StringBuffer sb = new StringBuffer();
for (int n=0; n<loop; n++) {
sb.append("more");
}
cat = sb.toString();
}
public static void main(String[] args) {
long begin = System.currentTimeMillis();
if (null != System.getProperty("WaitForProfiler")) {
System.out.println("Start your profiler, then press any key to begin...");
try {
System.in.read();
}
catch (IOException ioe) {
}
}
makeString();
makeStringInline();
makeStringWithLocal();
makeStringWithBuffer();
long end = System.currentTimeMillis();
System.out.println("Total run time of " + (end - begin) + " milliseconds");
}
}
A call to makeString simply builds up a long string by repeated
concatenation. This is definitely a slow way to build the string,
but how can it be made faster? One possibility is to eliminate the
overhead of a function call by putting the addToCat method inline,
as in makeStringInLine.
Another possible optimization is illustrated in makeStringWithLocal.
This method uses a temporary local variable; a local variable might
be accessed more quickly than the static cat.
Still another possibility is to use the StringBuffer class
instead of String, since the intermediate results don't have
to be stored in Strings. This is demonstrated in
makeStringWithBuffer.
Which of these four implementations is fastest? Let's run HPROF
against the program. Running HPROF will help determine which
optimizations are worthwhile.
java -Xrunhprof:cpu=samples,depth=6 com.develop.demos.TestHprof
Notice the depth=6 specification. This indicates a stack trace
depth of 6. Note too that by default the profiler output goes
to java.hprof.txt. The interesting part of this file is the table
at the bottom which lists the percentage of time spent in each
different stack trace:
CPU SAMPLES BEGIN (total = 7131) Wed Jan 12 13:12:40 2000
rank self accum count trace method
1 20.57% 20.57% 1467 47 demos/TestHprof.makeStringInline
2 20.40% 40.98% 1455 39 demos/TestHprof.addToCat
3 20.28% 61.25% 1446 53 demos/TestHprof.makeStringWithLocal
4 11.85% 73.10% 845 55 java/lang/String.getChars
5 11.75% 84.85% 838 42 java/lang/String.getChars
6 11.72% 96.58% 836 50 java/lang/String.getChars
(remaining entries less than 1% each, omitted for brevity)
The self column is an estimate of the percentage of time
a particular stack trace is active. In this case, you want to
time four methods (makeString, makeStringInline,
makeStringWithLocal, and makeStringWithBuffer); let's call these
top-level methods. You cannot simply add the times for these
methods, because a sample that was not in a top-level method
might still have that top-level method somewhere in its call stack.
So, for each entry in the table, you need to crawl back up the
stack to find the associated top-level method. The trace column
is a pointer to the needed information, which is higher up in
the HPROF output file. For example, the 4th ranked sample is
trace 55:
TRACE 55:
java/lang/String.getChars(:Compiled method)
java/lang/StringBuffer.append(:Compiled method)
com/develop/demos/TestHprof.makeStringWithLocal \
(TestHprof.java:Compiled method)
com/develop/demos/TestHprof.main(TestHprof.java:57)
Ahah! Trace 55 leads back to makeStringWithLocal, so its time
should be added to the time for 3rd-ranked Trace 53, which is a
direct invocation of makeStringWithLocal. If necessary, you could
continue this process, and cross-reference all the call stacks
and the CPU samples by hand. Alternately, you could use a tool
that interprets the profiling output. In this simple example,
the top six samples are enough. Three of the top-level methods
(makeString, makeStringInline, and makeStringWithLocal) each have
two entries in the top six. Adding up each method's entries leads
to a tie. Each of the three contributed over 30% of the total
running time. On the other hand, makeStringBuffer's stack traces
are way down the list, and total less than 0.3% of the runtime of
the application. In other words, putting the function inline and
using a local variable didn't help, but switching from String to
StringBuffer caused the code to execute over 100 times faster.
Vive la HPROF!
This example only scratches the surface of Java profiling.
Profiling data can often be used not only to time methods, but
also to explain why one implementation is faster than another.
With HPROF's cpu=timings flag, you can profile by explicitly
timing methods. This gives more accurate results than sampling,
but is slower and more intrusive. The heap options can be used to
track memory problems.
Profiling is an important arrow in any Java developer's quiver,
and HPROF is a free way to get started. Whatever tools you choose
to use, make sure that you profile your code to find and prove
performance gains.
|