Item #1: A product is passed from one maintenance developer to the next. Each new developer discovers that the product's design documentation is out of date and that the build process is broken. After a month of analysis, each pronounces it to be poorly engineered and insists on rewriting large portions of the code. After several more months, each quits or is reassigned and the cycle repeats.
Item #2: A product is rushed through development without sufficient understanding of the problems that it's supposed to solve. Many months after it is delivered, a review discovers that it costs more to operate and maintain the system than it would have cost to perform the process by hand.
Item #3: $100,000 is spent on a set of modern integrated development tools. It is soon determined that the tools are not powerful, portable, or reliable enough to serve a large-scale development effort. After nearly two years of effort to make them work, they are abandoned.
Item #4: Software is written to automate a set of business tasks. But the tasks change so much that the project gets far behind schedule and the output of the system is unreliable. Periodically, the development staff is pulled off the project in order to help perform the tasks by hand, which makes them fall even further behind on the software.
Item #5: A program consisting of many hundreds of nearly independent functions is put into service with only rudimentary testing. Just prior to delivery, a large proportion of the functions are deactivated as part of debugging. Almost a year passes before anyone notices that those functions are missing.
These are vignettes from my own experience, but I bet they sound familiar. It's a truism that most software projects fail, and for good reason: from the outside, software seems so simple, but the devil is in the details, isn't it? Seasoned software engineers approach each new project with a wary eye and a skeptical mind.
Test automation is hard, too. Look again at the five examples above. They aren't from product development projects. Rather, each of them was an effort to automate testing. In the eight years I spent managing test teams and working with test automation (at some of the hippest and richest companies in the software business, mind you), the most important insight I gained was that test software projects are as susceptible to failure as any other software project. In fact, in my experience they fail much more often, mainly because most organizations don't apply the same care and professionalism to their testware as they do to their shipping products.
Strange, then, that almost all testing pundits, practicing testers, test managers, and of course, companies that sell test tools unreservedly recommend test automation. Well, perhaps "strange" is not the right word. After all, CASE tools were a big fad for a while, and test tools are just another species of CASE. From object-orientation to "programmerless" programming, starry-eyed advocacy is nothing new to our industry. So maybe the poor quality of public information and analysis about test automation is less strange than it is simply a sign of the immaturity of the field. As a community, perhaps we're still in the phase of admiring the cool idea of test automation, and not yet to the point of recognizing its pitfalls and gotchas.
Having said that, let me hasten to agree that test automation
is a very cool idea. Most full-time testers and probably all developers
dream of pressing a big green button and letting a lab full of
loyal robots do the hard work of testing, freeing themselves for
more enlightened pursuits, such as playing head-to-head Doom.
However, if we are to achieve this Shangri-La, we must proceed
with caution. This article is a critical analysis of the "script
and playback" style of automation for system-level regression
testing of GUI applications.
"Automated tests execute a sequence of actions without human intervention. This approach helps eliminate human error, and provides faster results. Since most products require tests to be run many times, automated testing generally leads to significant labor cost savings over time. Typically a company will pass the break-even point for labor costs after just two or three runs of an automated test."
The quote above is from a white paper on test automation published by a leading vendor of test tools. Similar statements can be found in advertisements and documentation for most commercial regression test tools. Sometimes they are accompanied by impressive graphs, too. Notice how the same argument could be applied to the idea of using software to automate any repetitive human activity. The idea boils down to just this: computers are faster, cheaper, and more reliable than humans. Therefore, automate.
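To see what that argument implies, here is a minimal sketch of the break-even arithmetic behind the vendor's claim; every dollar figure is a hypothetical placeholder, not data from any study.

    # Minimal sketch of the naive break-even model implied by the vendor's claim.
    # All figures are hypothetical placeholders.

    cost_manual_pass = 1000.0   # labor cost of one manual pass (assumed)
    cost_to_automate = 3000.0   # one-time cost to script the same tests (assumed)
    cost_automated_run = 50.0   # labor to launch and skim one automated run (assumed)

    runs = 0
    while runs * (cost_manual_pass - cost_automated_run) < cost_to_automate:
        runs += 1

    print(f"Break-even after {runs} runs")   # 4 runs under these assumptions

Notice what the model quietly leaves out: maintenance, diagnosis of failures, and the other hidden costs taken up later in this article.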
At least with regard to system testing of Windows applications,
this line of reasoning rests on many questionable assumptions:
1. Everything important that people do when manually testing can
be mapped to a definable "sequence of actions."
Most skilled hand testing is not pre-planned in detail, but is
rather a guided exploration and reasoning process. This process
is therefore not a sequence of actions, but rather a heuristic
search. Sure, this process can be projected into a sequence of
very specific test cases with very specific pass/fail criteria,
but the resulting hundreds or thousands of test cases would be,
at best, an adjunct to skilled hand testing, not a replacement.
2. That sequence of actions is useful to repeat many times.
Once a specific test case is executed a single time, and no bug
is found, there's only a remote chance that same test, executed
again, will find a bug, unless a new bug is introduced into the
system. If there is variation in the test cases, though, as there
usually is when tests are executed by hand, there is a greater
likelihood of revealing problems both new and old. Variability
is one of the great advantages of hand testing over script and
playback testing. When I was at Borland, the spreadsheet group
used to track whether bugs were found through automation or manual
testing; consistently, over 80% of bugs were found manually, despite
several years of investment in automation. Their theory was that
hand tests were more variable and more directed at new features
and specific areas of change where bugs were more likely to be
found.
3. That sequence can be automated.
Some tasks that are easy for people are hard for computers. Probably
the hardest part of automation is interpreting test results. For
GUI software, it is very hard to automate that judgment: noticing
every category of significant problem while ignoring the
insignificant ones.
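To make the difficulty concrete, here is a hypothetical sketch of the kind of golden-master screen comparison at the heart of many script-and-playback verifications; the library, file names, and tolerance are my own assumptions, not features of any particular tool.

    # Hypothetical golden-master screen check. It flags any pixel drift (a new
    # default font, a repaint that lands a moment late) as a failure, yet says
    # nothing about problems that never show up on screen, such as bad data
    # quietly written to disk.

    from PIL import Image, ImageChops   # assumes the Pillow imaging library

    def screens_match(master_path, captured_path, tolerance=0):
        """True if the captured screen differs from the master by no more
        than `tolerance` on any pixel channel."""
        master = Image.open(master_path).convert("RGB")
        captured = Image.open(captured_path).convert("RGB")
        if master.size != captured.size:
            return False
        diff = ImageChops.difference(master, captured)
        worst = max(high for low, high in diff.getextrema())
        return worst <= tolerance

Set the tolerance to zero and every cosmetic change breaks the suite; loosen it and real defects start slipping through, and either way the comparison never notices the wrong number saved to a file.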
The problem of automatability is compounded by the high degree of uncertainty and change in a typical innovative software project. In market-driven software projects, it's common to use an incremental development approach, which pretty much guarantees that the product will change, in fundamental ways, until quite late in the project. This fact, coupled with the typical absence of complete and accurate product specifications, makes automation development something like driving through a trackless forest in the family sedan: you can do it, but you'll have to go slow, do a lot of backtracking, and you might get stuck.
Even if we have a particular sequence of operations that can in principle be automated, we can only do so if we have an appropriate tool for the job. Information about tools is hard to come by, though, and the most critical aspects of a regression test tool are impossible to evaluate unless we create or review an industrial-size test suite using the tool. The factors that matter most when selecting a tool simply can't be judged by perusing the user's manual or watching a trade show demo.
4. Once automated, the process will go faster, because it does
not require human intervention.
All automated test suites require human intervention, if only
to diagnose the results and fix broken tests. It can also be surprisingly
hard to make a complex test suite run without a hitch. Common
culprits are changes to the software being tested, memory problems,
file system problems, network glitches, and bugs in the test tool
itself.
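As a sketch of how much "unattended" execution still leans on people, consider a hypothetical runner that retries environmental failures but can only set everything else aside for a human to diagnose; the exception type and retry policy are illustrative assumptions.

    # Hypothetical unattended runner. It can ride out a few environmental
    # hiccups on its own, but every remaining failure still lands in a pile
    # that a person must sort into product bug, broken test, or glitch.

    import time

    class EnvironmentGlitch(Exception):   # network drop, full disk, etc. (assumed)
        pass

    def run_suite(tests, retries=2):
        needs_human = []
        for test in tests:              # each test is just a callable here
            for attempt in range(retries + 1):
                try:
                    test()
                    break
                except EnvironmentGlitch:
                    time.sleep(30)      # wait out the glitch and try again
                except Exception as failure:
                    needs_human.append((test.__name__, failure))
                    break
            else:
                needs_human.append((test.__name__, "environment never recovered"))
        return needs_human              # the next morning's diagnosis workload

Even this much robustness takes real engineering, and it only postpones the human work; it does not remove it.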
5. Once automated, human error is eliminated.
Yes, some errors are eliminated. Namely, the ones that humans
make when they are asked to carry out a long list of mundane mental
and tactile activities. But other errors are amplified. Any bug
that goes unnoticed when the master compare files are generated
will go systematically unnoticed every time the suite is executed.
Or an oversight during debugging could accidentally deactivate
hundreds of tests. The dBase team at Borland once discovered that
3,000 tests in their suite were hard-coded to report success,
no matter what problems were actually in the product. To mitigate
these problems, the automation should be tested or reviewed on
a regular basis. Corresponding lapses in a hand testing strategy,
on the other hand, are much easier to spot using basic test management
documents, reports, and practices.
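The dBase story is easy to recreate in miniature. Below is a hypothetical illustration of the hard-coded-success failure mode, together with the kind of crude audit, running the suite against a deliberately broken product stub, that a regular review might include; none of it is drawn from the actual Borland suite.

    # Hypothetical illustration of the "hard-coded success" failure mode, plus a
    # crude audit that would catch it. Nothing here reflects the real dBase suite.

    def test_save_file(app):
        # During a frantic debugging session someone stubbed out the comparison
        # and never restored it; the test now passes no matter what the product does.
        # expected = open("masters/save_file.out").read()
        # actual = app.save_and_dump()
        # return expected == actual
        return True

    def audit_suite(test_functions, broken_app):
        """Run every test against a product stub that is known to be broken;
        any test that still passes cannot be checking anything real."""
        return [t.__name__ for t in test_functions if t(broken_app)]

    print(audit_suite([test_save_file], broken_app=object()))
    # ['test_save_file']  (it passes even against a stub, so it checks nothing)

A hand tester who stopped checking results would be noticed within days; a test script that stops checking them can go unnoticed for years.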
6. It is possible to measure the relative costs and benefits of
manual testing versus automated testing.
The truth is, hand testing and automated testing are really two
different processes, rather than two different ways to execute
the same process. Their dynamics are different, and the bugs they
tend to reveal are different. Therefore, direct comparison of
them in terms of dollar cost or number of bugs found is meaningless.
Besides, there are so many particulars and hidden factors involved
in a genuine comparison that the best way to evaluate the issue
is in the context of a series of real software projects. That's
why I recommend treating test automation as one part of a multifaceted
pursuit of an excellent test strategy, rather than an activity
that dominates the process, or stands on its own.
7. The value of automated testing will duplicate or surpass that
of manual testing.
This is true only if all of the previous assumptions are true,
and if there is also no value in having testers spend time actually
using the product.
8. The cost to automate the testing will be less than three times
the cost of a single, manual, pass through the same process.
This loosey-goosey estimate may have come from field data or from
the fertile mind of a marketing wonk. In any case, the cost to
automate is contingent on many factors, including the technology
being tested, the test tools used, the skill of the test developers,
and the quality of the test suite. Writing a single test script
is not necessarily a lot of effort, but constructing a suitable
test harness can take weeks or months, as can deciding which tool
to buy, which tests to automate, and how to trace the automation
to the rest of the test process, not to mention learning how to
use the tool and then actually writing the test programs. A careful
approach to this process (i.e. one that results in a useful product,
rather than gobbledygook) usually takes months of full-time effort,
and longer if the automation developer is inexperienced with either
the problem of test automation or the particulars of the tools
and technology.
9. For an individual test cycle, the cost of operating the automated
tests, plus the cost of maintaining the automation, plus the cost
of any other new tasks necessitated by the automation, plus the
cost of any remaining manual testing, will be significantly less
than the cost of a comparable purely manual test pass.
This is yet another reckless assumption. Most analyses of the
cost of test automation completely ignore the special new tasks
that exist only because of the automation: test cases must be
documented carefully, the test code itself must be tested and
maintained, and someone must review the results of every run to
separate real product failures from broken tests and environmental
glitches.
These new tasks make a significant dent in a tester's day. Every group I ever worked in that tested GUI software tried at one point or another to make all testers do part-time automation, and every group eventually abandoned that idea in favor of a dedicated automation engineer or team. Writing test code and performing interactive hand testing are such different activities that a person assigned to both duties will tend to focus on one to the exclusion of the other. Also, since automation development is software development, it requires a certain amount of development talent. Some testers aren't up to it. One way or another, companies with a serious attitude about automation usually end up with full-time staff to do it, and that must be figured into the cost of the overall strategy.
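Pulling the pieces of assumption 9 together, here is a hedged sketch of the per-cycle comparison those glossy analyses skip; every figure is a placeholder chosen for illustration, not a measurement.

    # Hypothetical per-cycle cost model for assumption 9. The numbers are
    # placeholders; the point is the shape of the sum, not its value.

    automation_cycle_cost = sum([
        300,    # operating the automated tests (launching runs, triaging results)
        800,    # maintaining scripts broken by changes in the product
        400,    # new tasks created by the automation (documentation, reviews)
        1500,   # manual testing that still has to happen (new features, exploration)
    ])

    manual_cycle_cost = 2500    # a comparable purely manual pass (assumed)

    print(automation_cycle_cost, manual_cycle_cost)   # 3000 versus 2500 here

Whether automation wins depends entirely on which placeholders fit your project, which is exactly why the blanket assumption is reckless.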
I've left for last the most thorny of all the problems that we face in pursuing an automation strategy: it's always dangerous to automate something that we don't understand. It's vital to get the test strategy clearly outlined and documented, not to mention the specification of the product to be tested, before introducing automation. Otherwise the result will be a large mass of test code that no one fully understands. As the original developers of the suite drift away to other assignments, and others take over maintenance, the suite gains a kind of citizenship in the test team. The maintainers are afraid to throw any old tests out, even if they look trivial, because they might actually be important. It continues to accrete new tests, becoming an increasingly mysterious oracle, like some old Himalayan guru or talking oak tree. No one knows what the suite actually tests, and the bigger it gets, the less likely anyone will go to the trouble to find out.
This situation has happened to me personally (more than once, before I learned my lesson), and I have seen and heard of it happening to many other test managers. Most don't even realize that it's a problem until one day a development manager asks what the test suite covers and what it doesn't, and no one is able to give an answer. Or until the day it's needed most, when the whole test system breaks down and there's no manual process to back it up. The irony of the situation is that an honest attempt to do testing more professionally can end up ensuring that it's done blindly and ignorantly.
A manual testing strategy can suffer from confusion too, but when
tests are created dynamically from a relatively small set of principles
or documents, it's much easier to review and adjust the strategy.
It's a slower testing method, yes, but it's much more visible,
reviewable, and flexible, and it can cope with the chaos of
incomplete and changing products and specs.
Despite the concerns raised in this article, I do believe in test automation. Just as there can be quality software, there can be quality test automation. To create good test automation, though, we have to be careful: the path is strewn with pitfalls, as a story from my own experience illustrates.
One day, a few years ago, there was a blackout during a fierce evening storm, right in the middle of the unattended execution of our wonderful test suite. When my team arrived at work the next morning, we found that our suite had automatically rebooted itself, reset the network, picked up where it left off, and finished the testing. It took a lot of work to make our suite that bulletproof, and we were delighted. The thing is, we later found, during a review of the test scripts in the suite, that out of about 450 tests, only about 18 of them were truly useful. It's a long story how that came to pass (basically the talking oak tree scenario), but the upshot of it was that we had a test suite that could, with high reliability, discover nothing important about the software we were testing. I've told this story to other test managers, who shrug it off. They don't think this could happen to them. Well, it can happen if the machinery of testing distracts you from the craft of testing.
Make no mistake. Automation is a great idea. To make it a good investment, as well, the secret is to think about testing first and automation second. If testing is a means to the end of understanding the quality of the software, automation is just a means to a means. You wouldn't know it from the advertisements, but it's only one of many strategies that support effective software testing.
Copyright (c) 1996 ST Labs, Inc.