Robert Martin's Agile PPP Critique

Critique of Robert C. Martin's "Agile Principles, Patterns, and Practices"

A Study of Martin's Payroll Example and Database Comments

DRAFT - 04/19/2007

This is a critique of the payroll example in Robert C. Martin's book, Agile Principles, Patterns, and Practices in C# (ISBN: 0-13-185725-8, et al.). It is based on a very similar work for C++ and Java that is a few years older. If you have the C++/Java version, you should have no trouble following along if you are familiar with the contents of the book.

I chose the payroll example because most of the other examples deal with embedded software, device drivers, and physical modeling. My primary concern and familiarity is with business applications, and something that works for one domain may not work well for another. My criticisms of OOP thus may not be relevant outside of the business domain.

Because Martin's arguments rely heavily on database issues, I will address the database issues first, and then move into application-specific design issues.

Page 350: "We could begin by generating the database schema... It would be easy to design a workable schema and then start building some queries. However, this approach will generate an application for which the database is the central concern"

What is wrong with that? Databases are good modeling and abstraction tools. I did just that. See the bottom for the schema. At least he admits that database design is relatively easy. Martin continues to bash databases:

Page 351: Databases are implementation details. Consideration of the database should be deferred as long as possible. Far too many applications were designed with the database in mind from the beginning and so are inextricably tied to those databases. Remember the definition of abstraction: "the amplification of the essential and the elimination of the irrelevant". At this stage of the project, the database is irrelevant; it is merely a technique used for storing and accessing data, nothing more."

I have to take a big-time exception to this on many levels. Databases are powerful tools if you know how to use them properly. They are not "merely a technique for storing". As mentioned on my main OO blog page, that is like saying that cars are merely for storing gasoline. Databases are a powerful and feature-rich attribute management system. They are a great high-level abstraction tool also. But many OO'ers just want to use them as a dumb filing system and "wrap away" the database behind classes because they don't like databases. However, this excessive wrapping creates several problems.

First, it creates two things that have to be updated when certain database changes are made: the tables and the class. Suppose you want to add a new field, such as employee start-date. If you talk directly to the database, then you only have to update one set, reducing monotonous busy-work that does not add business value. But if you wrap the database, you have to edit the class also to mirror additions to the table and/or SQL. You have two things which have to change in lock-step. This is poor repetition factoring. It wastes effort translating from one high-level concept (relational) to another (OOP) rather than add real business value.

Second, it exchanges database ties for application ties. OO'ers like to talk about how one can switch database vendors if they wrap all the SQL in classes. However, it just makes one dependent on OOP classes instead of the database. Switching application languages happens about as often as switching database vendors. Plus, SQL is more similar to other brands than application languages so changing it is less work than switching app languages. OO proponents rarely talk about the mirror problem of switching.

Plus, in a database-centric design, it is easier for outsiders to come in and immediately start working after briefly studying the schema. (That is, if it is a good schema. But, we are not comparing good OO to bad schemas.) OO designs are cluttered with repetitious set/gets and other repetitious manual reinventions of common collection-oriented behaviors.

Once you learn SQL, you don't have to learn every shop's different conventions for adding, changing, deleting, sorting, cross-referencing, etc. (SQL is not the ideal relational language, but it still beats the OO equivalents and is a partial standard.) OO fails to factor, consolidate, and standardize such common behaviors, wasting your money and mine on reinvention of collection-oriented idioms and related interfaces.

Look how much of Martin's code and descriptions are devoted to mere adding, changing, deleting, etc. I would rather a software designer focus on domain-specific issues; the real business logic specific to this system or project. By downplaying the database, Martin ends up being mired in petty collection management. Databases standardize and commoditize much of that such that it does not end up as part of the primary domain model.

Plus, different application languages and tools can work on the same data. The Employee Information screen could be written in Visual Basic, the batch paycheck generator in Java, the union dues calculations in COBOL, etc. The database becomes the common interface, not some specific application language. If C# classes are the only "proper" way to talk to employee data, then you are limiting your app language choice. As a manager, do you really want to box yourself in?

Although Martin does not do it here, OO'ers often claim that wrapping the database "encapsulates" it so that you can intercept any behavior done to the database if need be. However, database triggers and stored procedures can give one this also. Plus, they work no matter which application language you use them with.

Note that I am not against utilities that may simplify certain aspects of dealing with a database or SQL. But generally I use "helper" utilities instead of wrappers. They assist with database-related issues rather than try to entirely hide/wrap it away. But not attempting the Jupiterian task of wrapping all of SQL and RDBMS, they are nimbler, simpler, and easier to abandon or rework if they don't fit real needs. Sometimes strait-forward SQL is the simplest solution. However, if some statements or clauses grow tedious or repetitious, then I have utilities for those segments.

Page 351, second paragraph: "Instead of starting with the data of the system, let's start by considering the behavior of the system. After all, it is the system's behavior that we are being paid to create."

This is misleading. One is paid to make working software that is easy to maintain. Databases and relational algebra (behavior!) can provide a large amount of application behavior if you use them well. Databases can provide a nice mix of attributes and behavior to define, manage, and run the application; each doing what it does best. This is another case of unjustified relational bashing.

Now that we got the database complaints over with, we can get back to payroll application design.

Page 352 and 353: Figure 26-1 and Figure 26-2

Martin commits a common OO mistake of making a hierarchy of "employee types", or at least "payment types" (hourly, commissioned, and salaried). "Types" are often a poor modeling decision in business applications because they cannot handle orthogonal traits very well. Martin gives tons of lip service about making applications easier to change without lots of code overhaul. However, hierarchical OO classes are expensive to de-hierarchy.

In fact, I've witnessed first-hand an actual scenario that busts Martin's hierarchy. I've worked at a place where it was decided that any employee can receive sales bonus. If they sold the company's products to their neighbors or friends, they qualified for commissions. We live in a heavily sells-oriented capitalist society, so preparing for such a situation should be a no-brainer.

If you look at my schema below, you will notice that there is nothing that forces a mutually-exclusive choice of "payment types". The equivalent of Martin's 3 classes is driven mostly by the "empRates" table. If they are an hourly employee, then they have a corresponding hourly rate row in the table. If they are salaried, then they have a monthly rate (per specifications on page 350). If they get a sales commission(s), then the commission rate(s) are also there. We are making the design assumption that they are not necessarily mutually exclusive.

You may ask what keeps somebody from being both an hourly and a salaried (monthly) employee. If such combinations are not allowed, then it can be prevented via data entry validation and/or database triggers. The point is that we are not hard-wiring the allowed or disallowed combinations into our basic design to give us feature flexibility. Validating such combinations is treated also as mere data configuration issue, and not a large-scale code feature. Validation is generally done by using basic Boolean logic expressions with IF statements, table-driven set management, or a combination depending on our complexity needs.

A combination of set theory and IF statements works best for larger-scale feature combination management in my experience. The tables influenced by set theory will handle roughly about 90% of the combination validation rules, and the last 10% can be done via explicit IF statements. However, meta-tizing (table-izing) the combination validation rules is probably overkill for this particular example. Placing an IF statement in the data entry screen and/or using database triggers is probably sufficient here. Also, I have witnessed small and very large companies being burned by hierarchies multiple times. Hierarchies are a thorn to category or classification system scalability.

I will agree that non-tree solutions based on set-oriented thinking are often a hard sell to both users and programmers. People are used to trees but not set-oriented thinking. Powerful ideas are not always the easiest to learn, as quantum physics shows. However, I've yet to hear an inheritance proponent argue trees trump sets only because "sets are too hard to learn".

We can also reuse our rate table concept to have different rates for different kinds of sales. For example, there may be one commission rate for sales of one set of products and another rate for another set.

It is also possible that we want to package the rates such that we don't have to explicitly add a rate for each employee, but have all employees who have a given setting receive the same rate. Thus, an employee belonging to "commission group R" may all receive the same sales commission rates. We could handle this by adding another "commissionGroup" table, or just use the existing "empRates" table and ignore the "rate" value for those groups that don't need a rate. Sales commission can grow pretty complicated, and is beyond the scope of this example.

But we are discussing paycheck generation, not validation. In my opinion, it is not the job of the paycheck generation process to perform such validation. We may even assume that salaried and hourly are not necessarily mutually-exclusive. That way if the laws change in the future, our system can process it. It is the burden of the data-entry or data collection system to weed out invalid feature combinations.

In practice it may not hurt to check, especially if the organization or system is so large that you don't know what you can assume about the data. They may not tell you what the rules are or not let you see the database triggers that check such. I've been a contractor to companies that habitually tell the contractors as little as possible for political or budget reasons.

Thus, to cover your rear end, you may want to perhaps generate a warning message instead of an outright stoppage. For example, in the "PayStub" table you may use the "hidden" line-type that tells it not to print a given line on the pay-stub. This may be a good place to put warning messages intended for payroll clerks. The use of warnings in business software is under-utilized in my opinion. People seem to have the idea that error messages should be all or nothing.

Page 351: Affiliations

I don't know why Martin has a concept dedicated to "affiliations" in the payroll system. We are not building an affiliate management system here. It is not really concerned with affiliations, but just their impact on payroll. For my version, periodic deductions are dealt with in the "empRates" table and non-periodic ones are dealt with via the "deductions" table.

Actually, there is a third kind of charge not addressed in Martin's requirements: periodic fixed amounts. For example, an employee may always have 30.00 deducted for parking every month. This is not based on a percentage of an employee's total paycheck, but a fixed amount. Being that there are many other real-world issues that Martin's example is not addressing to keep the book digestible, I will not consider them here either, except possibly to illustrate potential future requirements needs and how a design would weather such addition.

Actual Source Code NEW!

Summary

A data-centric design of the payroll system is simpler and cleaner for these reasons:

Database design is fairly well understood such that there are fewer different ways to do it resulting in less "creativity". Less creativity creates consistency, and consistency results in designs that are easier to learn and understand. Consistency and clear design decision rules is a big bugaboo of OO designs. With data centric designs you basically have just tasks and schemas. Both task division and data normalization was more or less mastered in the 70's (although effective combining of the two didn't happen until the 80's). Sometimes I think people reject them because they are predictable enough that it bores them. OO's spaghetti is a new challenge that may keep some awake, but is costing companies billions. It would be more logical to wait until it is solved.
Collection-oriented issues are not a key concern of our design, making it less cluttered, as already described. For example, the person implementing the payroll calculations does not have to concern themselves with how the employee and rate tables got filled up or get emptied out. That is not their concern. However, somebody reading Martin's design has to sift through all the Employee class methods concerned with adding, changing, deleting, searching, etc. for employees. The data-centric developers don't even have to know what programming language filled up the tables. This helps one break big projects down into more manageable ones.
It would be less bloated. Martin's code spends too much time creating and allocating employee objects, transaction objects, classification objects, etc. This is unnecessary red tape. If there is a clear reason for all this extra stuff, it is not stated anywhere. In many cases a simple SQL "Join" will bring in the necessary associated information. Martin's design spends too much code on opening and closing cabinets instead of using what's inside.

At the very least, a data-centric solution is not objectively worse. Paradigm preference may be largely subjective. I am tired of OOP being forced on us developers without a fair trial.

My Database Schema

(This is now obsolete due to the coded example created since)

"//" denote comments

 employee   // table name
 -----------
 empID    // employee ID
 payDateStretegy*  // weekly, monthly, etc...
 checkDeliveryStrategy*   // mail, deposit, etc...
 firstName
 etc...

 timeCard
 ----------
 empRef    // Employee ID (foreign key)
 cardDate
 payCateg*  // payment category (regular, overtime, sick, etc.)
 hours

 salesCard
 ----------
 empRef    // Employee ID (foreign key)
 cardDate
 saleCateg*
 units

 empRates  // pay and deduction rates
 ----------
 empRef    // Employee ID (foreign key)
 rateCateg*   // hourly-rate, monthly-rate, union-rate, etc.
 rate    

 deductions   // for non-periodic dues, fees, etc.
 -------------
 empRef    // Employee ID (foreign key)
 orgRef   // Organization ID (affiliation)
 deductType*    // "union", "taxes", etc...
 amount
 applyDate    // date deduction is applicable

 payStub    // paycheck and commission line-item description
 ---------------------
 empRef
 payDate
 lineType*   // (regular, hidden, no-num, group-heading)
 sequenceNum   // line-item sequence or ordering
 payCategory*   // (union, sales, tax, medical, etc.)
 description
 amount
 etc...

 constants  // constants or "codes" used in various tables 
 -----------   
 groupName    // (see note)
 attribName
 attribDescript
 attribNote

Notes:

*Columns marked with asterisk are tied to the "constants" table. The constants table is used to document and validate various "codes" used in the tables. For example, we may have a constant group called "deduct_type", and the various values ("attribName") could be "taxes", "union_fees", "medical", etc. In a larger system, we may also want to consider a "constantGroup" table to better describe the constant groups. Normally the descriptions ("attribDescript") is displayed on formal reports, such as paychecks, and not the internal codes.
In the "payStub" table, the "lineType" indicator is a formatting code. "regular" is a normal line with a value, "hidden" means the line is for internal use such as auditing or debugging, "no-num" means that the value is not shown because it may not be applicable to a given line-item, and "group-heading" is simply a bold-font header that demarks sections, perhaps to separate the deductions or sales sections.
I didn't bother to split sales-related information from non-sales-related information in the "payStub" table because that is a "presentation" issue. The payroll system here is not about designing report layouts and Martin did not really address this. It supplies enough information so that it can be presented in whatever way is deemed appropriate.

Main