Thoughts on Software Tools (the list of articles used in this paper appears in section E).

While the quality of any software project is ultimately determined by the abilities of the people working on it, the tools made available throughout development and testing have a lot to say about the final outcome. With this idea in mind, this paper presents abstracts of four papers written about software tools. In each section the authors present a tool and the workings behind it, along with some justification for why the tool is (in their estimation) successful. My own reflections on each tool, and how it affects my work, conclude each section.

Leading off our discussion on tools is Susan Horwitz and Thomas Reps' "The Use of Program Dependence Graphs in Software Engineering." The authors' main aim is to introduce the reader to the idea of a program dependence graph and to show how, when used in conjunction with other techniques, it helps address problems that occur in software design.

Program dependence graphs (PDGs), while suited to numerous problems, were designed with three tasks in mind: understanding how a program works, understanding the differences between versions of a program, and constructing new programs from pieces of old ones. To visualize a PDG, consider a program P whose PDG representation is written P(G). P(G) "is a directed graph whose vertices are connected by several kinds of edges"; the vertices of P(G) represent assignment statements and predicates. A PDG also contains a special entry vertex, as well as an initial-definition vertex (representing the assignment of a variable at its initial declaration) for each variable in P. The edges of a PDG are either control dependence or data dependence edges. Control dependence edges emanate from the entry vertex or from predicate vertices and are labeled true or false; data dependence edges connect a vertex that assigns a value to a variable with a vertex that may use that value.
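To make the structure concrete, here is a small sketch of my own (not an example from the paper) showing, in comments, the vertices and edges the PDG of a tiny C program would contain:

    #include <stdio.h>

    int main(void) {
        int sum = 0;          /* initial-definition vertex for sum                  */
        int i = 1;            /* initial-definition vertex for i                    */
        while (i < 11) {      /* predicate vertex; control edges labeled "true"     */
                              /* go to the two statements in the loop body          */
            sum = sum + i;    /* data dependence edges arrive from the definitions  */
                              /* of sum (initial and here) and of i                 */
            i = i + 1;        /* data dependence edge from the previous value of i  */
        }
        printf("%d\n", sum);  /* uses sum: data dependence edges from the loop body */
        return 0;
    }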

PDG’s have a limiting factor though—they can only be used to address single procedure programs with no function calls. To that end the authors extend the concept of PDG’s to system dependence graphs, or SDG’s. SDG’s are basically a collection of PDG’s with one significant requirement: all parameters are passed by value (the authors note additional papers that deal with SDG’s and passing by reference). To handle this added functionality SDG’s have two additional resources: a call vertex and a summary edge. A call vertex is simply a way of representing a procedure call while a summary edge represents dependencies that arise from calls. The authors also mention a third but limited permutation of PDG’s—program representation graphs (or PRG’s) which represent programs without procedures.

The rest of the paper discusses using PDGs to tackle the three problems mentioned earlier. An area of particular interest is the authors' discussion of program slicing (determining which parts of a program might affect the values computed at a particular point p, or might be affected by the values computed at p) as an additional technique for using PDGs effectively. The authors conclude by briefly mentioning the Wisconsin Program Integration System, a project on source control and integration that uses PDGs and the various techniques presented.
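As an illustration of my own (not the paper's example), a backward slice keeps only the statements that can influence the chosen point:

    #include <stdio.h>

    int main(void) {
        int i, sum = 0, product = 1;
        for (i = 1; i <= 5; i++) {
            sum = sum + i;          /* NOT in the slice on product below     */
            product = product * i;  /* in the slice                          */
        }
        printf("%d\n", product);    /* slicing criterion: the use of product */
        return 0;
    }

Slicing on the use of product at the printf discards sum and its updates entirely, which is exactly the kind of reduction that helps with program understanding and with comparing versions.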

The concepts behind PDGs play an important role in my profession. In my group (and throughout the company) we use a source control utility to manage multiple developers working on one project. This utility, while handling backup and archival, manages changes between versions and can tell the difference between two versions of a particular piece of code, all techniques mentioned by the authors. While our utility lacks some of the benefits provided by program slicing (it is up to the developer to figure out what effect a particular change may have), understanding differences between programs, and to some extent program understanding, are definitely there. While I do not know the specifics behind our utility (it may very well use PDGs), the concepts behind them are clearly in use in various forms throughout the company.

Continuing with tools, Robert O'Callahan and Daniel Jackson introduce the value of type inference in "Lackwit: A Program Understanding Tool Based on Type Inference." By finding places in a program where (sets of) variables share a common representation, one can detect problems such as dead code, simple operation errors, and numerous others, all through type inference.

"Many interesting properties of programs can be described in terms of constraints on the underlying representation of data values." By extending this idea with a new type system in which representation information is encoded in the types, these constraints can be expressed as type constraints. Solving them requires type inference, or more specifically a program that performs it. Developed by the authors, Lackwit automates type inference by computing new types for program expressions based upon their actual usage. The authors then list numerous reasons why type inference (and therefore Lackwit) is an attractive basis for analysis, a primary one being its ability to handle complex languages such as C.
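To give the intuition with a sketch of my own (this is not Lackwit's actual output): although every variable below is declared int, only the ones whose values actually flow together would share an inferred representation type.

    int total_cost(int price, int quantity) {
        int cents = price;         /* cents and price flow together: same inferred type */
        int total = 0;
        int i;
        for (i = 0; i < quantity; i++)
            total = total + cents; /* total joins the price/cents group                 */
        /* i and quantity never mix with the money values, so an analysis like          */
        /* Lackwit's can keep them in a separate "counter" representation.              */
        return total;
    }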

After a brief example of type inference, the authors discuss the inference system itself. Using a standard inference algorithm (a reference is provided for more information), the authors assign types to variables and unify type expressions, with each language construct handled by an inference rule (numerous examples are given in C). The result is a direct mapping from the variables and functions in the source code to type signatures created by Lackwit. These signatures are then used to create graphs summarizing information about a single component of a variable.

The rest of the paper discusses a case study in which Lackwit was used to analyze Morphin, a robot control program consisting of approximately 17,000 lines of C code, to find unused code. While the authors give some interesting statistics, an item of particular interest is how Lackwit was extended to detect memory leaks, as well as memory that is never read, simply by attaching additional parameters to the types used. The authors then note that there are still improvements to be made to Lackwit's performance and graph generation, and that experimentation with various type systems will continue. The paper concludes with a brief discussion of related work on type inference.

From a professional standpoint I like how extensible Lackwit is; of particular interest to me was how it was used to detect memory leaks. The low overhead required is also a plus, as anything that requires substantial ramp-up time is difficult when your schedule is based on "Internet time." While the tool was still undergoing development and improvement at the time of the paper, I would encourage the authors to tackle a larger project and see whether the same successes occur as in the Morphin example. The authors note some problems with large code bases (such as graphing large, highly interconnected systems), and I would be hesitant to use the tool before receiving assurances otherwise.

David Evans adds his thoughts on tool usage with "Static Detection of Dynamic Memory Errors." Many classes of bugs arise from invalid assumptions about the results of functions and the states of variables. To combat this, Evans introduces a technique in which assumptions are made explicit through annotations in the source, allowing a tool, LCLint, to detect a broad range of errors.

At a high level, LCLint does not use full interprocedural analysis (that would be too expensive to run on large programs); instead, each procedure is checked independently of the others. To add more power, LCLint associates states with each function result and variable (a definition state, an allocation state, and a null state, each with several possible values). This checking is done at compile time (with some simplifying assumptions for efficiency), and LCLint reports any potential problems to the user. Evans acknowledges that LCLint will sometimes incorrectly identify problems, but defends this choice: an analysis based on approximations of likely scenarios, rather than on worst-case assumptions, delivers more value. This, he argues, lets LCLint be more ambitious in its checks, detecting more errors at the cost of an occasional incorrect message.

For added flexibility, LCLint allows special annotations that give it extra hints about the interface assumptions it should make. The annotations can either be supplied when running LCLint or placed directly in the source code; they are syntactically similar to C type qualifiers and can be used in conjunction with one another, although certain combinations are illegal. Evans then proceeds with a more in-depth discussion of common annotations and their usage, and provides the reader with a full appendix and various examples.
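The flavor is roughly the following. This is a sketch of my own, and the exact annotation spellings (null and only, written as stylized comments) are my recollection of LCLint's syntax rather than a quotation from the paper:

    #include <stdlib.h>
    #include <string.h>

    /* The result may be NULL, and the caller becomes the sole owner of it. */
    /*@only@*/ /*@null@*/ char *copy_string(const char *s) {
        char *t = malloc(strlen(s) + 1);
        if (t == NULL)
            return NULL;
        strcpy(t, s);
        return t;
    }

    void client(void) {
        char *p = copy_string("hello");
        if (p != NULL) {     /* omitting this check should draw a possible-null warning */
            p[0] = 'H';
            free(p);         /* releases the storage the "only" annotation made us own  */
        }
    }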

The author then discusses a case study in which LCLint was run on approximately 1000 lines of source code, noting two important takeaways. The first is that LCLint provides immediate results (it can be run on un-annotated source) without time being lost adding annotations or wading through messages. The second is that with LCLint you get what you pay for: in the example, the author discusses the additional bugs that were found as various annotations were added. The author concludes with some of his experiences developing the tool and leaves the reader with a final thought: while LCLint cannot detect all errors, using it in conjunction with run-time testing produces more reliable code with less effort.

I found this paper extremely interesting. I know that a tool like this has possibilities in the software industry because we already use variations of it in our development process (and other groups may be using LCLint itself without my knowing). A tool with almost no initial cost is a huge advantage. Also, the annotation examples the author provides are by no means difficult, and I would be interested in learning to use them effectively to find some of the nasty memory errors he describes. I agree with the author's final assessment, however, that LCLint is not a replacement for software testing; it saves time and lets us concentrate our testing efforts elsewhere.

Concluding the series on tools is "Dynamically Discovering Likely Program Invariants to Support Program Evolution" by Michael Ernst et al. While similar in spirit to the approach put forth by Evans, Ernst and his co-authors propose an alternative to annotation: automatically inferring invariants from the program itself. The authors discuss methods for discovering invariants and report on experiments that use the technique.

The approach to discovering invariants is unusual. Instead of working at compile time, the authors run the target program numerous times. During each execution the values of variables, procedure arguments, return values, and derived variables (quantities that may not be explicitly referred to during execution, such as the size of an array or its first and last elements) are logged to disk. At the end of these runs an invariant detector (developed by the authors) looks for patterns in the logs (e.g., a constant or a range of values, sequences, linear relationships), ultimately deriving invariants from them.
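As a sketch of my own (the properties in the trailing comment are the kind of output one would expect from such a detector, not the authors' actual report), the idea looks like this:

    #include <stddef.h>

    /* Sums the first n elements of a. */
    long sum_array(const int *a, size_t n) {
        long total = 0;
        size_t i;
        for (i = 0; i < n; i++)
            total += a[i];
        return total;
    }

    /* After logging a, n, i, total, and the return value over many runs, a
       detector could infer likely invariants such as:
         at entry:  a != NULL,  n >= 0
         at exit:   i == n,  return == total
       along with derived-variable facts such as n matching the size of a
       whenever that holds in the traces. */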

The authors then illustrate the use and benefits of invariants through a small example (a replace program consisting of 563 lines of C). One item the authors point out concerns a constant value in the source. To understand more about the program, the authors redefined the value and looked at the traces produced by runs of the new version. Through these traces they were able to infer invariants on a particular loop in the source, ultimately finding an array-out-of-bounds error that occurs when the loop is not entered. Another use of invariants is verifying that they continue to hold as the source is modified; in the context of the replace example, the authors compared traces from before and after changes to the source and verified that all the expected invariants held in the new code.

The authors then assess their experiment in inferring invariants. Discussing one example in which the invariants gave them insight into areas of the source about which they had previously drawn incorrect conclusions, the authors note that their tool requires only some insight into a given program, not a full understanding of it. Also addressed is the issue of scalability, as the authors note that invariant detection is not affected by overall code size. There is also a brief discussion of future improvements to invariant detection; the authors specifically mention enhancements to performance and to the supporting user interface. They conclude by noting the benefits that invariants provide, most notably that they aid bug finding and test case generation and that they influence programming by surfacing situations that might not otherwise appear or be considered.

I like the concept put forth here. I can definitely see how a tool like this would have a positive effect on the development and testing of my product. I also like the low entry cost, as well as the fact that additional source annotation is not always required. Where I have concerns, though, is in the scalability issues noted by the authors. More specifically, the code samples discussed in the paper are far smaller and less complex than our current code base (Win32 C++). I would be interested in seeing performance numbers on a code base of, for example, 100,000 to one million lines of code, as well as how many runs were required before invariants could be properly inferred.

This paper has presented abstracts of four papers that each address a particular tool designed to aid software development and testing. While each author concludes that their tool achieves some measure of success, each has its own strengths and weaknesses. While it is ultimately up to the reader to decide which tool(s) to use, one conclusion can be drawn from all of them: the appropriate tools, whether those mentioned here or others, can make our lives easier.