
Why I Prefer Procedural/Relational Over OOP

Demoting The Code To Partner

Updated: 9/21/2002

Although most of my articles focus on the down-sides of OOP, due to popular request (or popular revolt, depending on your point of view) I will describe here why I prefer the procedural/relational paradigm over OOP, focusing on the up-sides of procedural/relational instead. This is a forest-level philosophical summary and does not get into specific examples. See the links scattered about below for more specifics.

It is sometimes said that OOP combines data and behavior, while competing techniques, specifically "procedural programming", separate them. Integration is good, right? Therefore, integrating data and behavior must be a good thing, right? Well, if we look at other things in the world, we often see tradeoffs with regard to integration. Integration sometimes makes things complex and makes it hard to swap out parts for compatible alternatives. It may also make it hard to analyze things in isolation. Sometimes you want to look at just the meatballs, and other times at just the spaghetti noodles. If we integrate the meat into the noodles, that becomes harder to do.

Mother nature has decided to split the human brain into two halves. One half focuses on logic and the other half on emotions. (By some accounts this is an over-simplification of the division of labor, but it is considered a good-enough approximation for most purposes.) This "separation of concerns" seems to be beneficial, or mother nature would have merged the halves more tightly by now. Similarly, the United States has found it worthwhile to split the government into three branches: Legislative, Judicial, and Executive. One makes the laws, one interprets the laws, and the third carries out the laws (where they can be carried out). Thus, we have two examples of a seemingly "natural split" of duties.

It is my opinion that the division between data and behavior in procedural/relational (p/r) applications also has software design benefits. It allows a certain "contract" between the database and the database user (application programmer). Contracts usually have certain obligations and costs, but these costs are worthwhile for the parties of the contract or else the contract would not be entered into.

The relational paradigm involves just such a contract. If you hold up your end of the bargain, then the other party, the relational paradigm, will provide wonderful benefits. The "limitations" it requires are not arbitrary, for they provide a powerful intellectual rigor which is, in my opinion, one of the greatest tools of modern software technology. Along with GUIs, relational technology is one of the rare "wow!" technologies that represent a fundamental leap forward in software design.

The contract can be compared to a national highway system. If you agree to pay your taxes and abide by traffic laws, then the government will agree to build and maintain a vast network of highways so that you can quickly and conveniently go from place A to place B with relatively minimal effort in a comfortable vehicle (you provide the vehicle). If you don't like this contract then go live in a country with very few paved roads and you will probably change your mind.

The contract of the relational paradigm basically says, "If you put or keep your information (data) in a certain given structure, then there are powerful and concise operations that you can do on this information." E. F. Codd, the inventor of relational algebra, opened the world's eyes to a powerful kind of "math" (with some help from more practical-minded colleagues). This math can reduce complex structures and patterns to relatively simple formulas, or "relational math".

One can create instant "virtual models" merely by issuing a relational formula. This saves a lot of time and effort. It is like ordering about a vast army using a pre-determined mutual military lingo instead of instructing each and every soldier one-by-one. It does not matter whether the operation is on ten nodes/records or a billion, the order is still dutifully carried out as written. One alternative is to code such structures by hand, which is what OOP more or less is doing. OOP appears to be doing things the "old fashioned way" despite being sold as "more modern". (More on the history below.) OOP is essentially instructing each soldier one-by-one.
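To make the "army order" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical. A single UPDATE and a single GROUP BY query act on every matching row at once; the hand-written loop at the end is the soldier-by-soldier alternative that computes one of the same totals.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 100), ("west", 250), ("east", 40), ("north", 75)],
)

# One order to the whole army: this statement reads the same whether the
# table holds four rows or four billion.
conn.execute("UPDATE orders SET amount = amount + 5 WHERE region = 'east'")

# A "virtual model" produced on demand by a single formula.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
))

def east_total_by_hand(rows):
    # The soldier-by-soldier alternative: visit every record in application code.
    total = 0
    for region, amount in rows:
        if region == "east":
            total += amount
    return total

by_hand = east_total_by_hand(conn.execute("SELECT region, amount FROM orders"))
print(totals)   # {'east': 150, 'north': 75, 'west': 250}
print(by_hand)  # 150
```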

It is not just the "relational math" alone, but also the conciseness of tables (if you have a good "table browser"). Tables are simply more compact and flexible than the same information in programming code, at least to my mind. (Powerful code-handling IDEs are, in most cases, simply trying to reinvent the database.) Code is tough to read, search, sort, cross-reference, and filter compared to the same information in tables. Of course, you can't put the entire application in tables (at least not with today's technology and tools), but with some practice you can put a large part of the application-controlling information in them.

Note that I leaned toward tables even before I got my hands on decent relational technology. (Relational technology was not part of my university curriculum.) For example, early on I had a tendency to line parameters up in a tabular fashion in code. It would resemble something like this:

  w = foo(norf,   "exmpl", 05,    g,  nork)
  x = foo(glob,   "zorg",  33,    r,  frot)
  y = foo(zonnof, "blag",  22.5,  z,  melo)
  z = foo(fli,    "pag",   13.1,  k,  pendmend)
  etc...

This suggests that my table impulses are probably part of my personality rather than something ingrained by repeated relational usage. I don't know how common "table-head syndrome" is in others. I do get email from other table fans who stumble onto this website, but I cannot tell how representative these messages are of the general programming population. My observation is that thinking patterns vary widely from individual to individual, programmers included.

Thus, since tables and relational algebra are superior to code overall (at least to my mind), the more of the application that you factor (move) into tables and table tools, the more power you have to edit, manage, and change the application. Databases make information concise and powerful; code does not.
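As a small illustration of factoring code into a table, the sketch below moves the argument rows of the aligned foo(...) calls shown earlier into a table and drives the calls from it. It assumes Python's sqlite3 module; foo() is a hypothetical stand-in, the column names are invented, and only three of the five original arguments are kept. Once the rows are data, sorting and filtering them is a query rather than a hunt through source code.

```python
import sqlite3

def foo(source, label, size):
    # Hypothetical stand-in for the article's foo(); it just formats its inputs.
    return f"{source}/{label}:{size}"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo_params (result TEXT PRIMARY KEY, source TEXT, label TEXT, size REAL)"
)
conn.executemany("INSERT INTO foo_params VALUES (?, ?, ?, ?)", [
    ("w", "norf", "exmpl", 5),
    ("x", "glob", "zorg", 33),
    ("y", "zonnof", "blag", 22.5),
    ("z", "fli", "pag", 13.1),
])

# The application is driven by the table rather than by hand-aligned code.
results = {
    result: foo(source, label, size)
    for result, source, label, size in conn.execute(
        "SELECT result, source, label, size FROM foo_params"
    )
}

# Questions that are awkward to ask of source code become one-line queries.
largest_first = [r for (r,) in conn.execute(
    "SELECT result FROM foo_params ORDER BY size DESC")]
small_ones = [r for (r,) in conn.execute(
    "SELECT result FROM foo_params WHERE size < 20 ORDER BY result")]
```

Adding a fifth call becomes an INSERT rather than another hand-aligned line of code.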

OOP philosophy and trends like the GOF ("Gang of Four") design patterns are "code centric". They attempt to manage the complexity of the world with structures programmed into a programming language. In the p/r world, relational algebra handles most of this. Sure, the relational formulas are still in code in most cases, but it is usually easier to change a formula than to change the code structure, because the formula is less entangled with that structure. This means that changes to what the formula deals with are less likely to "ripple" to other parts of the program or change the larger-scale structure of the code.

Another benefit of the division of data and behavior is that different programming languages can easily work with the same data. Java, FORTRAN, Smalltalk, and LISP can all share the same "noun model" because our noun model is mostly in the database rather than coded into one programming language. It is not uncommon for an IT shop to have COBOL, Visual Basic, and Java all talking to the same database. Sharing "objects" among different programming languages and paradigms is much tougher.

More Uses for Formulas

Formulas are not just useful in relational math (above), but also in the "dispatching" (selection) of procedural rules. Procedural code tends to have the pattern of:

  task X {
    ....
    if [expression_A] {
       [perform something 1]
    }
    if [expression_B] {
       [perform something 2]
    }
    ....
  } // end task X

If the rule for when to perform "something 1" changes, for example, it does not require significant code changes, merely a formula change: expression A is altered as needed. This results in less code movement upon changes. (The expressions are usually Boolean formulas, I would note.)
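For a concrete feel, here is a minimal Python rendering of "task X" in which each rule is selected by a Boolean formula over plain records (dicts). The field names, thresholds, and action names are hypothetical. Changing who gets free shipping means editing expression A and nothing else; no code moves between classes.

```python
def task_x(order, customer):
    """Run the rules for one order; records are plain dicts with hypothetical fields."""
    actions = []
    # expression_A: a formula that may involve more than one noun (order and customer).
    if order["total"] >= 100 or customer["is_member"]:
        actions.append("free shipping")
    # expression_B
    if customer["country"] != "US" and order["total"] < 20:
        actions.append("small-order surcharge")
    return actions

print(task_x({"total": 150}, {"is_member": False, "country": "US"}))  # ['free shipping']
print(task_x({"total": 10}, {"is_member": True, "country": "CA"}))
# ['free shipping', 'small-order surcharge']
```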

OOP tends to try to dangle behavior dispatching off of noun structures or noun taxonomies, like decorative bulbs on the branches of a Christmas tree. Dispatching becomes closely tied with an action's or rule's physical position in this tree or web. OOP fans seem to like this tight association for some reason, but it is simply not change-friendly in my observation. The nouns that participate in the dispatching of a given rule are subject to relatively heavy change over time. Plus, multiple nouns can often be involved in any given rule.

Thus, I use formulas to associate the two rather than physical placement in a noun structure. (Remember that OOP tends to put the noun structure into code.) I find it easier and less disruptive to change a formula than to move code all around the noun taxonomy tree or noun structure web. The virtual-ness here tends to mirror the relational formula approach and benefits described earlier. The "formula philosophy" works, at least for me. (See The Noun Shuffle for more on this.)

The Fork in the Historical Road

P/r and OOP took two different historical paths to the "noun management" problem. In the early days of programming, the "structure" was born. These "structures" are called "struct" in C, "Records" in Pascal, "data definitions" in COBOL, or "punched card layout" in the punched card world.

As programs grew more complex, more complex things needed to be done with structures. Two diverging philosophies emerged to handle this growing complexity. OOP is the extreme result of one path, and databases the result of the other. The OOP approach was to more easily allow the programming language (code) to manipulate and manage these structures.

The database approach, however, recognized that there were certain common patterns in the manipulation of these structures. Its proponents worked on ways to make "generic structure management" tools and operations by analyzing these patterns and trying to consolidate and simplify the operations into a canonical form that transcends any single application. They also did not want to tie these tools to any one application programming language, so that they could sell them to more shops.

I find the second branch the better one (at least for custom business software). While the OOP approach made it easier for application programmers to write software that manipulates the structures, the second approach made most such manipulations unnecessary or far simpler. The first approach made it easier to reinvent database operations, but the second approach invented the database so that application programmers don't have to. The first approach (OO) gives you nifty tools to make your own scooter, while the second approach gives you a functioning car out of the box.

Okay, now that we have covered the basic philosophy and benefits of the split, let's deal with the drawbacks. I won't claim that there are zero drawbacks, but I have not seen any huge penalties. There are plenty of p/r tools and techniques to mitigate such problems if and when they occur, and I have seen no evidence that such problems "scale up" as projects get larger.

The primary claim of OOP proponents regarding integration is that OOP allows one to swap in behavior in place of data without a lot of changes. OOP allegedly "abstracts (hides) away the source of information" so that one does not have to worry about it or tie one's code to that source. But I have not seen many realistic examples where this works out better on the OO side. Sometimes certain operations need a more up-to-date version of a piece of data, and the solution is to call the operation that generates or refreshes that data item right before using it. (This may require a bit of module refactoring.) Other times, database "triggers" and "views" can be used to provide a result that is "calculated on the fly".
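A minimal sketch of the "calculated on the fly" case, using Python's sqlite3 module and a hypothetical order_lines table: the order total is never stored, so there is nothing to refresh. The view recomputes it from the current rows on every query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (order_id INTEGER, qty INTEGER, unit_price INTEGER);

    -- The total is derived, not stored; each query against the view recomputes it.
    CREATE VIEW order_totals AS
        SELECT order_id, SUM(qty * unit_price) AS total
        FROM order_lines
        GROUP BY order_id;
""")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, 2, 300), (1, 1, 150), (2, 5, 20)],
)

def total_for(order_id):
    row = conn.execute(
        "SELECT total FROM order_totals WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else 0  # no lines yet means a zero total

before = total_for(1)                                    # 750
conn.execute("INSERT INTO order_lines VALUES (1, 1, 50)")
after = total_for(1)                                     # 800
```

Code that reads order_totals does not know or care that the total is computed rather than stored.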

A related claim is that OOP "protects" or "wraps" the data to prevent or reduce the number of different "access points" that have to be tracked or changed. In their view, databases are "wide open" and "naked". However, if they are reinventing database-like operations in their code, then they have essentially created for themselves the same issues that database systems often already take into account. The issue involves complex tradeoffs, and one would have to look at specific situations. (See also Data Protection.)
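One way a database takes such protection into account is with declarative constraints. In this hypothetical sketch (Python's sqlite3 module, an invented accounts table), the CHECK rule lives with the data, so every program that writes to the table, in whatever language, runs into the same rule, and no per-access-point wrapper is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- enforced for every writer
    )
""")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()

def withdraw(account_id, amount):
    """Return True if the withdrawal was applied, False if the database refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

def balance(account_id):
    return conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]
```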

Most of the alleged drawbacks are fictional or exaggerated in OOP literature, I have come to conclude. In my opinion, the benefits of the flexible, dynamic, virtual modeling and the multiple-viewpoint power of relational algebra are far too great to toss away because of minor drawbacks. It is throwing the baby out with the bath water. To reuse our road analogy, the complaints are like saying, "but sometimes the direct route is over the hill instead of via the winding roads; so why can't we all just own SUVs (all-terrain vehicles) and go over the hills instead of relying on these roads?" (Some TV commercials have tried to sell such an idea.)

Some OOP proponents will say that using OOP in conjunction with RDBMS (relational databases) can give one the "best of both worlds". However, it appears that OOP and RDBMS fight over and duplicate too much territory. The GOF patterns themselves are a testament to how complex code can get when you don't make use of a database and instead try to "hand index" or sew all the structures (classes) together on your own. Perhaps GOF patterns are meant for situations where a database is not available, but that has never been stated explicitly. Many OOP fans, perhaps most, would rather use an OOP database (OODBMS), or no database at all, than an RDBMS. However, OODBMSs have been a commercial failure for the most part, and are thus not viable alternatives to RDBMSs right now. Some even suggest that "OOP database" is a contradiction of the very nature of OO philosophy. Needless to say, OOP's relationship with databases of any kind is murky and heated.
Also note that I am not claiming that relational technology and tables are a complete replacement for GOF and GOF-like patterns. Some of the equivalent functionality will still be in code under p/r, but most of it lives in the database and the database-handling tools.

OOP proponents tend to fall into one of two groups (and some in-between). The first group is the "taxonomists". This group clings to the central idea of sub-classing things into a mutually-exclusive (M.E.) list of "sub-types", and/or to the extension of this: the inheritance tree. My usual criticism of this is that a tree or mutually-exclusive list is too "delicate" and/or too coarse a granularity to match real-world change patterns, and/or that an M.E. list or tree is only one of many possible views (structures) of the same thing. (See Abstraction, Subtype Proliferation Myth, and Method Granularity Problems.)

In relational-centric thinking, a tree or M.E. list is simply one possible view (query result) among many other possible structure views. Categories like "stack", "tree", and "queue" are simply rather arbitrary names for as-needed views. The same node (record) can be part of a stack, a tree, and a queue at the same time. Such "structures", and GOF-like patterns, are considered rather ethereal in relational thinking. By hard-wiring trees and M.E. lists into code, you lose or reduce the ability to get different views of the same information. Plus, why should some structural views be in code and others via database queries? This strikes me as inconsistent.
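Here is a minimal sketch of one set of records serving as a tree, a queue, and a stack at once, using Python's sqlite3 module. The tasks table and its parent_id and queued_at columns are hypothetical. The tree comes from a recursive query over parent_id; the queue and the stack are just two sort orders on queued_at. None of these structures is hard-wired into the rows themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES tasks(id),
        name      TEXT,
        queued_at INTEGER  -- when the task entered the work queue
    )
""")
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", [
    (1, None, "release", 3),
    (2, 1, "build", 1),
    (3, 1, "docs", 4),
    (4, 2, "compile", 2),
])

# Tree view: task 1 and all of its descendants, with depth, via a recursive query.
tree = conn.execute("""
    WITH RECURSIVE sub(id, name, depth) AS (
        SELECT id, name, 0 FROM tasks WHERE id = 1
        UNION ALL
        SELECT t.id, t.name, sub.depth + 1
        FROM tasks t JOIN sub ON t.parent_id = sub.id
    )
    SELECT name, depth FROM sub ORDER BY depth, name
""").fetchall()

# Queue view: the same rows, first in, first out.
queue = [n for (n,) in conn.execute("SELECT name FROM tasks ORDER BY queued_at")]

# Stack view: the same rows, last in, first out.
stack = [n for (n,) in conn.execute("SELECT name FROM tasks ORDER BY queued_at DESC")]
```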

The second group of OO proponents, which I will call the "compositionists", will generally agree with my assessment that "simple inheritance" is often insufficient in non-trivial applications to handle the multiple viewpoints needed by the customer/user/module. (They tend to be "HAS-A" thinkers, in contrast to the "IS-A" thinking of the taxonomists.)

However, the compositionist solution is to basically throw complexity at the problem. They complicate OO designs such that these new contraptions lose most of the idealistic appeal of simple inheritance and are essentially re-creating a database manager from scratch by having classes cross-reference other classes up and down the wazoo to get all the needed views. (The OOP "Visitor" pattern is sometimes considered the poster-boy of such works.) In a p/r application, a good many of these would instead be indexes or "keys" between records. You don't see them in programming code, at least not between each individual node/record. If you hand-join 200 classes, then you have to do it 200 times (or at least read code with it repeated 200 times). In the relational approach a single join clause can join 200 or 2,000,000,000 records with the same relational expression.
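To illustrate keys standing in for hand-coded cross-references, here is a minimal sketch using Python's sqlite3 module with hypothetical customers and invoices tables. No record holds a pointer to another; the customer_id value is the only link, and the same key serves navigation in both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices  (id INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers(id),
                            amount INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO invoices (customer_id, amount) VALUES (?, ?)",
                 [(1, 500), (2, 120), (1, 80)])

# One join clause relates every invoice to its customer, however many rows exist.
rows = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM invoices i JOIN customers c ON c.id = i.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# The other direction uses the same key; no extra back-pointer has to be maintained.
acme_invoices = [amt for (amt,) in conn.execute("""
    SELECT i.amount
    FROM invoices i JOIN customers c ON c.id = i.customer_id
    WHERE c.name = 'Acme'
    ORDER BY i.id
""")]
```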

Some OO fans will point out that relational technology only performs such transformations on data, not on algorithms. However, at least in my domain, the vast majority of what needs to be processed either is attributes or can be converted into attributes with a little training. I have experimented with putting code snippets in tables to close the gap, but in practice I find that this is rarely needed or useful.
Also, I don't personally consider SQL the "ideal" query language; see SQL Criticism for more on this. But it is an improvement over the OOP approach of noun-modeling-in-code, even with its existing flaws. For example, OO fans sometimes complain that the same Join clause gets repeated multiple times, negating the savings over doing joins by hand. However, that does not have to be the case. Only current standards and tradition require it; the paradigm itself does not. The most common joins can technically be defined all in one spot. The SQL Criticism page referenced above goes into more detail on this.

I would much rather have all these cross-references and relationships in a database, which is a tool specifically designed to manage tons of cross-references and relationships. OOP code looks an awful lot like a raw index and ASCII data dump from the database into a flat file (programming code). I would almost never want to manage bulk data that way, so why would I also want to work with a similar representation in programming code?

Whenever I try to figure out non-tree OOP code, I often end up drawing links between the many different classes that reference each other in the code. The end result is paper printouts with a bunch of lines all over the place. I rarely have to do this with procedural code, at least not with reasonably well-designed code (which, unfortunately, may be the minority). The one big exception is the "goto" era of programming, before block IF and block While statements made their way into some shops. OOP reminds me too much of that era. When working with OOP, I often have the urge to key all the references between classes into a database so that I can sort, search, filter, etc. the relationships as needed to find something, figure out something, or study the pattern of relationships. In p/r this is pretty much all done for you as part of the database design and usage. You either look at the schema, or issue a query to see what you want to see. It is nearly impossible for every combination of relational queries to have a counterpart in code without a heavy-duty IDE that copies some database features.
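A minimal sketch of acting on that urge, assuming Python's sqlite3 module. The class names and reference pairs below are hypothetical and keyed in by hand; extracting them from real source code is not attempted here. Once the references are rows, "who uses Invoice?" and "what is referenced most?" become ordinary queries.

```python
import sqlite3

# Hypothetical (from_class, to_class) pairs: "from_class refers to to_class".
refs = [
    ("Invoice", "Customer"),
    ("Invoice", "LineItem"),
    ("LineItem", "Product"),
    ("Shipment", "Invoice"),
    ("Shipment", "Customer"),
    ("Report", "Invoice"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_refs (from_class TEXT, to_class TEXT)")
conn.executemany("INSERT INTO class_refs VALUES (?, ?)", refs)

# Who depends on Invoice?
users_of_invoice = [c for (c,) in conn.execute(
    "SELECT from_class FROM class_refs WHERE to_class = 'Invoice' ORDER BY from_class")]

# Which classes are referenced most often?
most_referenced = conn.execute("""
    SELECT to_class, COUNT(*) AS n
    FROM class_refs
    GROUP BY to_class
    ORDER BY n DESC, to_class
""").fetchall()
```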

The bottom line is that relational database technology is superior, in my opinion, to programming code (of any paradigm) for the duty of managing complex relationships usually needed in software design. Thus, the more relationship management that you can shift to a (good) relational database, the easier it is to manage those relationships.

The real power of the procedural/relational paradigm is not in the programming language itself, but in relational modeling and relational tools. OOP attempts to empower programming languages themselves, while p/r tends to farm off key forms of complexity to relational technology, reducing the role of code altogether. This generally results in simpler code and simpler software maintenance. The procedural paradigm has not died, it has simply found its place as the Yin to the Yang (or is that Yang to the Yin?).
