Java Institute DreamsCity 2000

USING HPROF TO TUNE PERFORMANCE

How fast is the Java(tm) platform? For many applications, the 
answer is "fast enough" -- provided you make careful choices in 
your design and good use of the language. But while design 
documents and coding standards might encourage efficient code, 
the only way to know for sure is to profile, that is, to obtain 
method timings and other information pertinent to performance. 
Fortunately, the tools you need for profiling ship with the 
Java(tm) 2 SDK. This tip will get you started with HPROF, 
the Java Profiler Agent, and present an example in which a simple 
code snippet is made to run 100 times faster.  

To see HPROF's options, enter the following at a command prompt: 

java -Xrunhprof:help

One specification you can make for HPROF is cpu=samples. This 
setting enables you to profile by sampling. With sampling, the 
virtual machine regularly pauses execution and checks to see which 
method call is active. With enough samples (and a decent 
sampling rate), you can pinpoint where your code spends its time.  
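The idea behind sampling can be sketched in a few lines. The toy sampler below is my own illustration, not part of HPROF, and it relies on the Thread.getStackTrace API from later JDK releases: it periodically inspects a worker thread's stack and tallies the topmost method, which is essentially what cpu=samples does inside the virtual machine.

```java
import java.util.*;

public class MiniSampler {
    // Toy sampling profiler: periodically inspect a thread's stack and
    // count which method is on top. A real sampling profiler does the
    // same across all threads, from inside the VM.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMs)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                StackTraceElement top = stack[0];
                counts.merge(top.getClassName() + "." + top.getMethodName(),
                             1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                long x = 0;
                while (!Thread.currentThread().isInterrupted()) {
                    x = (x * 31 + 7) % 1000003;  // busy work to sample
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        // With enough samples, the tally shows where the thread spends time.
        System.out.println(sample(worker, 50, 2));
    }
}
```

The sample counts are only an estimate, which is exactly the trade-off HPROF's sampling mode makes: low overhead in exchange for statistical rather than exact timings.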
For example, consider the following program: 
 
package com.develop.demos;
 
import java.io.IOException;

public class TestHprof {
    public static String cat = null;
    public final static int loop=5000;
    
    public static void makeString() {
        cat = new String();
        for (int n=0; n<loop; n++) {
            addToCat("more");
        }    
    }
    
    public static void addToCat(String more) {
        cat = cat + more;
    }
    
    public static void makeStringInline() {
        cat = new String();
        for (int n=0; n<loop; n++) {
            cat = cat + "more";
        }
    }

    public static void makeStringWithLocal() {
        String tmp = new String();
        for (int n=0; n<loop; n++) {
           tmp = tmp + "more";
        }
        cat = tmp;
    }
    
    public static void makeStringWithBuffer() {
        StringBuffer sb = new StringBuffer();
        for (int n=0; n<loop; n++) {
            sb.append("more");
        }
        cat = sb.toString();
    }
    public static void main(String[] args) {
        long begin = System.currentTimeMillis();

        if (null != System.getProperty("WaitForProfiler")) {
            System.out.println("Start your profiler, then press any key to begin...");
            try {
                System.in.read();
            }
            catch (IOException ioe) {
            }
        }

        makeString();
        makeStringInline();
        makeStringWithLocal();
        makeStringWithBuffer();

        long end = System.currentTimeMillis();
        System.out.println("Total run time of " + (end - begin) + " milliseconds");
    }
}
 

A call to makeString simply builds up a long string by repeated 
concatenation. This is definitely a slow way to build the string, 
but how can it be made faster?  One possibility is to eliminate the 
overhead of a method call by putting the body of addToCat inline, 
as in makeStringInline. 
    
Another possible optimization is illustrated in makeStringWithLocal.
This method uses a temporary local variable; a local variable might 
be accessed more quickly than the static cat. 
    
Still another possibility is to use the StringBuffer class 
instead of String, since the intermediate results don't have 
to be stored in Strings. This is demonstrated in 
makeStringWithBuffer. 
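To see why the intermediate Strings matter, it helps to look at what string concatenation compiles to. On compilers of this era, each `+` on Strings was translated (roughly) into a fresh StringBuffer plus a new String for the result -- the desugaring below is my illustration of that translation, not decompiled output:

```java
public class ConcatCost {
    public static void main(String[] args) {
        String cat = "start";
        String more = "more";

        // What cat = cat + more compiles to, roughly: a temporary
        // StringBuffer per concatenation, then toString() copies the
        // characters into yet another new String. Done in a loop, the
        // repeated copying is what makes makeString quadratic.
        String desugared = new StringBuffer().append(cat).append(more).toString();

        System.out.println(desugared.equals(cat + more)); // prints true
    }
}
```

Using one explicit StringBuffer for the whole loop, as makeStringWithBuffer does, pays that copying cost only once at the end.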
    
Which of these four implementations is fastest? To find out which 
optimizations are worthwhile, let's run HPROF against the program: 

java -Xrunhprof:cpu=samples,depth=6 com.develop.demos.TestHprof

Notice the depth=6 specification, which sets the stack trace 
depth to 6. Note too that by default the profiler output goes 
to the file java.hprof.txt. The interesting part of this file is 
the table at the bottom, which lists the percentage of time spent 
in each distinct stack trace: 

CPU SAMPLES BEGIN (total = 7131) Wed Jan 12 13:12:40 2000
rank   self  accum   count trace method
   1 20.57% 20.57%    1467    47 demos/TestHprof.makeStringInline
   2 20.40% 40.98%    1455    39 demos/TestHprof.addToCat
   3 20.28% 61.25%    1446    53 demos/TestHprof.makeStringWithLocal
   4 11.85% 73.10%     845    55 java/lang/String.getChars
   5 11.75% 84.85%     838    42 java/lang/String.getChars
   6 11.72% 96.58%     836    50 java/lang/String.getChars
   (remaining entries less than 1% each, omitted for brevity)
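As a quick cross-check on the table, each percentage in the self column is simply that trace's sample count divided by the total sample count (7131 here); for instance, the 4th-ranked entry:

```java
public class SelfPercent {
    public static void main(String[] args) {
        // self = count / total samples, for the 4th-ranked entry
        double self = 845.0 / 7131 * 100;
        System.out.printf("%.2f%%%n", self); // prints 11.85%, matching the table
    }
}
```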

The self column is an estimate of the percentage of time 
a particular stack trace is active. In this case, you want to 
time four methods (makeString, makeStringInline, 
makeStringWithLocal, and makeStringWithBuffer); let's call these 
top-level methods. You cannot simply add the times for these 
methods, because a sample that was not in a top-level method 
might still have that top-level method somewhere in its call stack. 
So, for each entry in the table, you need to crawl back up the 
stack to find the associated top-level method. The trace column 
is a pointer to the needed information, which is higher up in 
the HPROF output file. For example, the 4th ranked sample is 
trace 55: 

TRACE 55:
java/lang/String.getChars(:Compiled method)
java/lang/StringBuffer.append(:Compiled method)
com/develop/demos/TestHprof.makeStringWithLocal(TestHprof.java:Compiled method)
com/develop/demos/TestHprof.main(TestHprof.java:57)

Aha! Trace 55 leads back to makeStringWithLocal, so its time 
should be added to the time for 3rd-ranked trace 53, which is a 
direct invocation of makeStringWithLocal. If necessary, you could 
continue this process and cross-reference all the call stacks 
and CPU samples by hand. Alternatively, you could use a tool 
that interprets the profiling output. In this simple example, 
the top six samples are enough. Three of the top-level methods 
(makeString, makeStringInline, and makeStringWithLocal) each have 
two entries in the top six. Adding up each method's entries leads 
to a three-way tie: each contributed over 30% of the total 
running time. On the other hand, makeStringWithBuffer's stack 
traces are way down the list, totaling less than 0.3% of the 
application's runtime. In other words, inlining the method call and 
using a local variable didn't help, but switching from String to 
StringBuffer made the code run over 100 times faster.  
Vive la HPROF! 
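If you do want to automate the cross-referencing, here is a rough sketch of such a tool. It assumes the java.hprof.txt layout shown above (TRACE blocks separated by blank lines, then the CPU SAMPLES table); the exact format varies between SDK releases, so treat this as a starting point rather than a robust parser:

```java
import java.util.*;

public class HprofSummary {
    // Sum CPU sample counts by top-level method: for each row of the
    // samples table, look up its TRACE block and credit every top-level
    // method that appears anywhere in that call stack.
    public static Map<String, Long> sumByTopLevel(String text, List<String> topLevel) {
        Map<Integer, List<String>> traces = new HashMap<>();
        List<long[]> rows = new ArrayList<>();   // {count, traceId}
        Integer current = null;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("TRACE ") && line.endsWith(":")) {
                current = Integer.valueOf(line.substring(6, line.length() - 1));
                traces.put(current, new ArrayList<String>());
            } else if (line.isEmpty()) {
                current = null;                  // blank line ends a TRACE block
            } else if (current != null) {
                traces.get(current).add(line);   // one stack frame per line
            } else {
                // Table rows look like: rank self% accum% count trace method
                String[] f = line.split("\\s+");
                if (f.length >= 6 && f[1].endsWith("%")) {
                    try {
                        rows.add(new long[] { Long.parseLong(f[3]), Long.parseLong(f[4]) });
                    } catch (NumberFormatException ignored) { }
                }
            }
        }
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String m : topLevel) totals.put(m, 0L);
        for (long[] row : rows) {
            List<String> frames = traces.get((int) row[1]);
            if (frames == null) continue;
            for (String m : topLevel) {
                for (String frame : frames) {
                    // ".name(" avoids matching makeString inside makeStringInline
                    if (frame.contains("." + m + "(")) {
                        totals.put(m, totals.get(m) + row[0]);
                        break;
                    }
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String sample =
            "TRACE 39:\n" +
            "com/develop/demos/TestHprof.addToCat(TestHprof.java:Compiled method)\n" +
            "com/develop/demos/TestHprof.makeString(TestHprof.java:Compiled method)\n" +
            "\n" +
            "   2 20.40% 40.98%    1455    39 demos/TestHprof.addToCat\n";
        System.out.println(sumByTopLevel(sample, Arrays.asList("makeString")));
        // prints {makeString=1455}: the addToCat samples are credited to
        // makeString, since makeString appears in trace 39's call stack
    }
}
```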

This example only scratches the surface of Java profiling.  
Profiling data can often be used not only to time methods, but 
also to explain why one implementation is faster than another. 
With HPROF's cpu=times option, you can profile by explicitly 
timing methods. This gives more exact results than sampling, 
but is slower and more intrusive. The heap=sites and heap=dump 
options can be used to track down memory problems. 

Profiling is an important arrow in any Java developer's quiver, 
and HPROF is a free way to get started. Whatever tools you choose 
to use, make sure that you profile your code to find and prove 
performance gains.

Any comments? email to:
richard@dreamscity.net