Core Philosophical Differences

Getting to the root of the differences between OOP and procedural/relational

Updated: 3/21/2003

Every now and then I try to see if I can better isolate and simplify the core differences between the object-oriented paradigm and relational or relational-centric techniques. This is my latest attempt at this tricky endeavor. It is not my first attempt, and probably will not be the last. Oftentimes these attempts simply create different ways of saying more or less the same thing. However, it is still a useful exercise because it offers different perspectives from which to ponder things.

The fundamental philosophical disconnect between OO modeling and relational-centric modeling creates continuous problems or time-consuming translation layers. In my opinion, one or the other must go, at least as far as domain modeling itself, such as business modeling. (Other OO techniques, such as small utility APIs, can probably live okay in a relational-centric application, since they don't fight over territory as much as the two paradigms do over business modeling.)

One of these root differences centers around the perceived nature of taxonomies, and the other around the differences between OO "records" and relational records.

The Nature of Taxonomies

Oftentimes in comparisons of OOP to procedural/relational techniques, it is stated that OOP factors duplicate case/switch statements (or nested IF structures) into a single spot or fewer spots: the sub-type hierarchy. In practice I find that case/switch statements either tend not to be duplicated very often, or tend to drift apart over time, growing less and less "duplicate".

Even if they didn't differ, the OOP approach tends to duplicate operations, such as method names, to multiple spots. Thus, it "busts" pure duplication factoring in its own way. It is simply trading one form of duplication for another. See shapes example for more on this.
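The trade can be sketched roughly as follows. This is only an illustration under my own assumptions, not the shapes example referred to above; the shape kinds, field names, and class names are hypothetical. The procedural half repeats the list of shape kinds in each operation, while the OO half repeats the list of operation names in each class.

```python
import math

# Procedural style: one function per operation.
# The list of shape kinds is repeated inside each function.
def area(shape):
    if shape["kind"] == "circle":
        return math.pi * shape["r"] ** 2
    elif shape["kind"] == "rect":
        return shape["w"] * shape["h"]
    raise ValueError(shape["kind"])

def perimeter(shape):
    if shape["kind"] == "circle":
        return 2 * math.pi * shape["r"]
    elif shape["kind"] == "rect":
        return 2 * (shape["w"] + shape["h"])
    raise ValueError(shape["kind"])

# OO style: one class per shape kind.
# The list of operation names is repeated inside each class.
class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2

    def perimeter(self):
        return 2 * math.pi * self.r

class Rect:
    def __init__(self, w, h):
        self.w = w
        self.h = h

    def area(self):
        return self.w * self.h

    def perimeter(self):
        return 2 * (self.w + self.h)
```

Either way something is listed more than once: shape kinds per operation, or operation names per shape.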

For a long while I could not find any reasoning for why they tend to drift. I just had to state to OO fans that the duplication claim goes against my observation of real-world application code and the way it changes over time. Then one day I encountered a passage that said something like, "Philosophers have generally discovered that taxonomies are relative to specific needs, persons, and/or viewpoints." A one-size-fits-all hierarchical classification of just about anything is an artificial construct that may not be appropriate for all uses.

This observation describes the case statement drift rather well: each procedural "task" is a specific viewpoint with specific needs. In other words, the case statement structure describes a local taxonomy. The best structural decomposition for one task may not be the best for another. OO design tends to assume global taxonomies, or at least taxonomies that are wider in scope than the "task" scope usually found in procedural/relational code. A global taxonomy is generally flawed design thinking in my frank opinion.
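A small sketch of what "local taxonomy" means here, assuming some made-up product records (the field names and the two tasks are hypothetical, chosen only for illustration). Each task carves up the same items along its own lines, and neither grouping is a sub-division of the other.

```python
# Hypothetical product records; field names are illustrative only.
products = [
    {"name": "novel", "medium": "paper", "taxable": False, "fragile": False},
    {"name": "vase", "medium": "ceramic", "taxable": True, "fragile": True},
    {"name": "ebook", "medium": "digital", "taxable": True, "fragile": False},
]

def shipping_group(p):
    # The shipping task only cares about how the item travels.
    if p["medium"] == "digital":
        return "download"
    elif p["fragile"]:
        return "padded box"
    else:
        return "standard box"

def tax_group(p):
    # The tax task splits the very same items along a different line.
    if p["taxable"]:
        return "taxed"
    else:
        return "exempt"

for p in products:
    print(p["name"], "|", shipping_group(p), "|", tax_group(p))
```

A single class hierarchy would have to pick one of these groupings as the "real" one, or find some third grouping that serves both tasks.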

Relational techniques further help with this "relativism of viewpoint". The view you create of the data is governed by relational expressions (queries) rather than the "shape" of the code structure. It is a "computed or virtual structure" rather than one we have to hand-build in code. (Some say that relational techniques don't deal with hierarchies very well. Whether this is a flaw of relational theory or of specific implementations is hard to say. Regardless, I often find that trees are over-used anyhow in many designs.)
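To illustrate the "computed structure" idea, here is a minimal sketch using Python's built-in sqlite3 module. The employee table and its columns are assumptions made up for this example. The same rows are grouped two different ways purely by changing the query; no code structure has to be rebuilt.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table; the columns are illustrative only.
con.execute(
    "CREATE TABLE employee (name TEXT, dept TEXT, salaried INTEGER, remote INTEGER)"
)
con.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("Ann", "sales", 1, 0), ("Bob", "sales", 0, 1), ("Cy", "ops", 1, 1)],
)

# One viewpoint: employees classified by department.
by_dept = con.execute(
    "SELECT dept, COUNT(*) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()

# Another viewpoint over the same rows: classified by pay arrangement.
by_pay = con.execute(
    "SELECT salaried, COUNT(*) FROM employee GROUP BY salaried ORDER BY salaried"
).fetchall()

print(by_dept)
print(by_pay)
```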

It is true that some people tend to "think in trees" and/or in terms of universal sub-type taxonomies. However, this is not a universal trait of all humans, and somewhat problematic if the real world does not self-organize or change in a tree-wise fashion in practice.

Note that not all OO fans over-use trees and sub-types, and that there may be some special niches where trees/sub-typing is stable enough to be useful as a global taxonomy. I don't know enough about every industry to completely rule it out.

Network Characteristics

The OO approach's basic building block is more or less a dictionary array which has two "columns," often known as the "key", and the "value". In OO lingo the key is the method or attribute name, and the value is one of:

A scalar - a simple value like number or string
An algorithm - code or a pointer to a method/function
A reference (pointer) to another dictionary array (object)

A dictionary array may go by other names such as "associative array", or "record". I will lean toward "record" in this discussion, mostly because it is the shorter name. Note that our definition of record includes fields which may potentially store algorithms (methods), or at least references to algorithms, not just data.
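A rough sketch of such a record using a plain Python dictionary. The customer and address records and their key names are hypothetical; the point is only that each of the three kinds of value can sit behind a key.

```python
def describe(rec):
    # An "algorithm" value: a function that receives the record it lives in.
    return f"{rec['name']} owes {rec['balance']}"

# Another record that the customer record will point to.
billing_address = {"city": "Springfield"}

customer = {
    "name": "Ann",               # scalar
    "balance": 120,              # scalar
    "describe": describe,        # algorithm (reference to a function)
    "address": billing_address,  # reference to another record
}

# "Calling a method" is looking up a key and invoking the value found there.
print(customer["describe"](customer))
# "Following a reference" is looking up a key and getting another record.
print(customer["address"]["city"])
```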

A "class" is kind of a "static object" that can only be changed before compile or run time. Dynamic or interpreted OO languages tend to make little or no distinction between objects and classes. Static (compiled) languages tend to have more rules that restrict the scope and usage of records, but conceptually are the same otherwise. Overloading is just a more complicated form of the "key," where parameter definitions become part of it.

There are generally two ways to view inheritance under this definition of "object". Some OOP languages perform inheritance similar to the way some cellular biological organisms do: by cloning, which is making a duplicate copy of an organism. Using this technique, inheritance is simply cloning one record to get a second. The attributes or pointers of the copy can then be changed (overridden) as needed. (Sometimes this cloning technique is called "prototyping".)

Other OOP languages perform inheritance by providing a "search path" to find methods or attributes (keys) that are not in the current record. This "search path" can simply be considered a "special" key-value pair in our dictionary array. Think of it as an attribute (key) called "parent" or "parents", depending on whether multiple inheritance is permitted.

Although this "dictionary array" definition is rather mechanical in nature, it avoids getting deeper into the various philosophies of design that vary widely among OO practitioners. If I pick a particular OO methodology to define OOP from, then fans of other OO methodologies will probably complain. Thus, I am attempting to use a lowest-common-denominator definition instead.

A dictionary array should not be confused with a Data Dictionary. These are generally unrelated concepts.
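The two views of inheritance described above can be sketched with plain Python dictionaries. This is a simplified illustration, not how any particular language implements it; the record contents and the lookup helper are hypothetical, and only single inheritance (one "parent" key) is shown.

```python
base = {"sound": "...", "legs": 4}

# View 1: inheritance by cloning. Copy the record, then override keys.
dog = dict(base)
dog["sound"] = "woof"

# View 2: inheritance by search path. Keep only the overrides, plus a
# special "parent" key that says where to look for everything else.
cat = {"parent": base, "sound": "meow"}

def lookup(rec, key):
    # Walk up the "parent" chain until the key is found.
    while rec is not None:
        if key in rec:
            return rec[key]
        rec = rec.get("parent")
    raise KeyError(key)

print(dog["legs"], lookup(cat, "legs"))    # 4 4
print(dog["sound"], lookup(cat, "sound"))  # woof meow

# One practical difference: a later change to the parent record is seen
# through the search path, but not by a clone made earlier.
base["legs"] = 3
print(dog["legs"], lookup(cat, "legs"))    # 4 3
```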

An OOP application will thus tend to look like a network of records (objects). Some of the links between records will be due to inheritance (our "parent" key as described above), and others will simply be references to other records, such as one might find in an OO "Strategy Pattern". This network of records is similar to the "network databases" (NDBs) of the 1960s, and object databases tend to share many characteristics with them, both the good and the bad.

Relational, on the other hand, is based on the concept of "tables". Since tables are a larger-scoped structure than records, you can do more powerful, larger-scale reasoning and operations with them than with a web of dictionaries (OO) in my opinion; at least at this point in history. There has yet to be a Dr. Codd of the NDB world, but I cannot rule out the possibility that some kind of "dictionary algebra" will someday be created or discovered to rival the power of relational algebra. But at this stage, relational appears to have its symbolic manipulation act better together. (Failed OO attempts are described later.)

Navigating objects in the application or queries often requires following the links in the records one-by-one. For this reason OO-like and pre-relational databases are sometimes called "navigational databases". You will often have operations like "next", "previous", "first", "up" (parent), "down", etc.; or use a "path" along a graph (network) in order to traverse and navigate the structure of records.

On the other hand, relational uses logic expressions to find stuff. You ask, "give me a result set that satisfies the following conditions....". You generally don't have to explicitly iterate or traverse through records or pointer chains to filter, find, and/or cross-reference information. (Although, it is true that you may have to iterate through the result set one result record at a time in application code. But this is using the result, not making it.)
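A side-by-side sketch of the two styles, assuming a made-up customer/order data set (all names and values are hypothetical). The first half walks the links between records by hand; the second half states the conditions and lets the database (here Python's built-in sqlite3) produce the result set.

```python
import sqlite3

# Navigational style: records linked by references, traversed one by one.
ann = {"name": "Ann", "orders": []}
bob = {"name": "Bob", "orders": []}
ann["orders"].append({"item": "desk", "total": 300})
ann["orders"].append({"item": "pen", "total": 5})
bob["orders"].append({"item": "lamp", "total": 150})
customers = [ann, bob]

nav_result = []
for c in customers:            # step through each customer record
    for o in c["orders"]:      # follow its links to order records
        if o["total"] > 100:
            nav_result.append((c["name"], o["item"]))

# Relational style: describe the result wanted; no explicit traversal.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT, total INTEGER)")
con.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "desk", 300), (1, "pen", 5), (2, "lamp", 150)],
)
rel_result = con.execute(
    "SELECT c.name, o.item "
    "FROM customer c JOIN orders o ON o.customer_id = c.id "
    "WHERE o.total > 100 "
    "ORDER BY c.name, o.item"
).fetchall()

print(nav_result)
print(rel_result)
```

Note that the link between customer and orders in the second half is computed from matching values at query time, rather than stored as a pointer inside either record.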

Figure: The illustration shows the basic differences between tables and objects. The connections between tables are shown dotted because they are generally "calculated" rather than actual links. Although indexes may be added to speed commonly-used table operations, these indexes are generally not something that users (app developers) see.

Although tables are a higher-level construct, the drawback is perhaps that they do not handle non-consistent "records" as well as the NDBs do. These are records in which the fields (keys) may be different per record. Relational requires a kind of "master column list" per table. Any "slot" or "cell" used in a record must be in this master list of columns. Objects generally don't have this requirement. Each object can usually have its own set of keys (methods or attributes) that differ from all other objects. (In static OOP languages, this may not be true of objects, but it will be of classes.)

Trying to use varied columns on tables tends to lead to lots of sparse (empty) columns for records that don't happen to use a given column, or leads to skinny "attribute tables", which are essentially tables that act a lot like a dictionary. They may have an Attribute column and a Value column, for example. (Empty columns do not necessarily take up more space in modern databases. Thus, memory or disk consumption should not be considered a drawback.)
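The two workarounds can be sketched with Python's built-in sqlite3 module. The product tables, column names, and values below are assumptions invented for illustration. The wide table carries a column for every attribute any product might have, while the skinny attribute table stores one attribute-value pair per row.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Option 1: one wide table. Columns a product does not use stay NULL.
con.execute(
    "CREATE TABLE product_wide "
    "(id INTEGER PRIMARY KEY, name TEXT, page_count INTEGER, voltage INTEGER)"
)
con.executemany(
    "INSERT INTO product_wide VALUES (?, ?, ?, ?)",
    [(1, "novel", 320, None), (2, "lamp", None, 120)],
)

# Option 2: a skinny attribute table that acts much like a dictionary.
con.execute(
    "CREATE TABLE product_attr (product_id INTEGER, attribute TEXT, value TEXT)"
)
con.executemany(
    "INSERT INTO product_attr VALUES (?, ?, ?)",
    [
        (1, "name", "novel"), (1, "page_count", "320"),
        (2, "name", "lamp"), (2, "voltage", "120"),
    ],
)

# The same question asked of each layout.
wide_voltage = con.execute(
    "SELECT voltage FROM product_wide WHERE id = 2"
).fetchone()[0]
attr_voltage = con.execute(
    "SELECT value FROM product_attr WHERE product_id = 2 AND attribute = 'voltage'"
).fetchone()[0]

print(wide_voltage, attr_voltage)
```

In this sketch the attribute table gives up per-column typing (every value is stored as text), which is one reason it behaves more like a dictionary than like an ordinary table.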

I personally think the higher-power logic overrides the drawbacks of dealing with non-consistent records, but after long debates I have to concede that the preference may be subjective. There is no math or metric that says one is universally better than the other. Relational queries do tend to be shorter than the NDB equivalent for data that fits well into table-shaped structures, but the difference is murkier for queries on more varied records. To me, the benefits that relational offers for the good table-fits outweigh the drawbacks of dealing with poor table-fits. Relational techniques can still be used reasonably well in poor-fit situations.

Perhaps there is a way to get the best of both worlds. Maybe fixed-structure records and variable-structure records can be made to live in harmony somehow. Although I have not seen a decent instance of such a tool or protocol yet, I won't discount the possibility that someday it might grow practical. It appears difficult to optimize performance for both approaches, though. Another difficulty is making it possible for a field to "move" from being a "fixed" column to a variable/dynamic column, and vice versa, without changing existing query code. This is a ripe area for further research. See Dynamic Relational.

Tables just work better for my mind. I really dig the power of relational algebra and the simplicity of presentation that tables provide for me. NDBs just lack a consistent larger-scale structure, beyond the granularity of a record (dictionary), for me to grasp onto and guide me at the forest level instead of just the tree level.

Trees (inheritance) have been tried as a solution to the larger-scale structure gap of OO. But in my opinion trees don't fit the change-patterns of the real world very well, at least not on a larger scale, as described above. GOF-like OO Patterns have also been proposed as a higher-level structure for OO, but they are not formally enforced by the paradigm. Further, the relational equivalent of most of such patterns (if there is one) is usually superior in my opinion.

It also is sometimes said that OOP better integrates behavior and data. However, this is mostly due to the overly-long tradition of hierarchy-based file systems. If such file systems were replaced with a relational file system, things would probably be different.

Summary

The base philosophical differences I have with the OO paradigm seem to boil down to the appropriateness of trees, the appropriateness of global taxonomies compared to local or ad-hoc taxonomies, and the network-database-like structure of OO versus relational.

See Also:
Other OO-vs-Relational Articles
OOP Criticism