Subtype Proliferation Myth

Updated 5/23/2002

One thing that OO is reasonably well optimized for (probably at the expense of other stuff) is extension or change by subtype. A typical example given is a Shape class with subtypes such as Rectangle and Ellipse.

The problem is that I don't see a large proliferation of clear, stable subtypes at all. This is closely related to the fact that most real world hierarchies or taxonomies are too dynamic to fit well into classical OO hierarchies. (This often applies equally to any scheme which divides things into mutually exclusive sub-divisions.)

The Lockstep Myth

For the most part, variations don't grow and change in a clean tree-wise way. For example, OO fans sometimes bring up employee subtypes as a business example. The problem is that employees can be subdivided along multiple non-mutually-exclusive facets. This is sometimes referred to as "orthogonal aspects". Subtype division candidates include:

Federal "exempt" and "nonexempt" (white or blue "collar")
Part-time versus full-time employees
commission-based wage employees versus non-commission-based
managers versus non-managers
contractors (via agency) versus temporary (non-agency) versus permanent

Further, some of these may totally change due to new laws, etc. You can pick one of these sets, but not all. And, any one choice is fairly arbitrary. It is like being forced to decide which of your many children you will assign inheritance to in your written will, assuming only one child is permitted.

A "role" or "strategy" pattern may be more appropriate than sub-typing here. Some OO fans have also suggested multiple inheritance. A potential problem with multiple inheritance is that it may not model mutually exclusive features very well. For example, one may risk inheriting both "exempt" and "non-exempt" classes. It may also make persistence tougher. See OOSC2 Chapter 24 Critique for more notes on multiple inheritance and employee examples.

My experience is that management wants features combined in dynamic and unpredictable (at design time) ways. This "feature recombination" view does not fit hierarchical sub-typing very well. (See the Bank example for more discussion on feature recombination versus sub-classing.)
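As a rough illustration of the "feature recombination" view, here is a minimal Python sketch in which each employee facet from the list above is an independent attribute rather than a branch of a class tree. The field names and the two rules (needs_overtime_pay, eligible_for_sales_bonus) are hypothetical and exist only to show facets being combined freely; they are not real payroll rules.

```python
from dataclasses import dataclass
from enum import Enum


class Engagement(Enum):
    # Mutually exclusive values inside ONE facet are fine. The trouble
    # starts only when one facet is promoted to "the" class hierarchy.
    PERMANENT = "permanent"
    CONTRACTOR = "contractor"
    TEMPORARY = "temporary"


@dataclass
class Employee:
    name: str
    exempt: bool
    full_time: bool
    on_commission: bool
    manager: bool
    engagement: Engagement


def needs_overtime_pay(emp: Employee) -> bool:
    # Hypothetical rule: depends on a single facet.
    return not emp.exempt


def eligible_for_sales_bonus(emp: Employee) -> bool:
    # Hypothetical rule: combines two facets that no single tree would pair.
    return emp.on_commission and emp.engagement is Engagement.PERMANENT


# A non-exempt, commissioned, permanent manager: no subclass needed.
pat = Employee("Pat", exempt=False, full_time=True, on_commission=True,
               manager=True, engagement=Engagement.PERMANENT)
```

If a law changes what "exempt" means, only the rules that read that one attribute need attention; nothing has to be moved between branches of a hierarchy.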

  1. Good Candidate for Sub-typing:

              a  b  c  d

       1      A  R  K  U
       2      A  R  K  I
       3      C  X  M  U
       4      C  X  M  P

  2. Poor Candidate for Sub-typing:

              a  b  c  d

       1      A  R  K  U
       2      C  R  M  I
       3      A  X  K  U
       4      A  X  M  P

In this example, the rows represent variations (subtypes) and the columns represent "features" in the Meyer-ese sense (methods or attributes).

The first set suggests a strong possibility for creating formal subtypes. Note how rows 1 and 2 share a pattern, as do rows 3 and 4. This suggests that rows 1 and 2 can safely belong to one subtype/subclass and rows 3 and 4 to another.

Subtypes will generally share several features that change "in lockstep" like this. In this case, columns a, b, and c change in lockstep in the first set. The break-even point for how many lock-stepped features are required to qualify for subtype division depends on many issues, including personal judgment and the likelihood that the lockstep pattern will last. (In the shape example, the lockstep pattern appeared in the form of empty (null, blank, or zero) columns for shapes that do not use certain measurements.)
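One way to make "lockstep" concrete is to check which columns split the rows into exactly the same groups. The Python sketch below does this for the two tables above. The helper names (grouping, lockstep_pairs) are made up for this illustration, and treating "identical row grouping" as the definition of lockstep is an assumption of the sketch.

```python
from itertools import combinations

# The two candidate tables: row number -> cells for columns a, b, c, d.
GOOD = {1: "ARKU", 2: "ARKI", 3: "CXMU", 4: "CXMP"}
POOR = {1: "ARKU", 2: "CRMI", 3: "AXKU", 4: "AXMP"}
COLUMNS = "abcd"


def grouping(table, col):
    """Return how one column partitions the rows, as a set of row groups."""
    groups = {}
    for row, cells in table.items():
        groups.setdefault(cells[col], set()).add(row)
    return frozenset(frozenset(g) for g in groups.values())


def lockstep_pairs(table):
    """List the column pairs that split the rows into identical groups."""
    return [
        (COLUMNS[i], COLUMNS[j])
        for i, j in combinations(range(len(COLUMNS)), 2)
        if grouping(table, i) == grouping(table, j)
    ]
```

For the first table this reports the pairs (a, b), (a, c), and (b, c); for the second table it reports no pairs at all, which is the situation where subtype division has nothing to hang on to.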

The second example is what I find much more often in the real world. There is either very little permanent commonality, or if there is commonality, then new variations tend to be rare. If new variations are rare, then the OO sub-classing mechanism is of little help. Being allegedly "change-friendly" is of little help if there is no change. In other words, the more dynamic something is, the less likely it is to fit a hierarchy or mutually-exclusive divisions (sub-types). Thus, if you want a change-friendly system, then abandon or downplay the notion of sub-types.

For example, "real estate" can be subdivided into land, residential housing, and commercial buildings (at least). It is not that likely that a new real estate type will come along in any given year, let alone a decade. Thus, picking a paradigm just to handle such rare changes might be mostly in vain, especially if the new variation is significantly different from the existing ones. If there is very little commonality, then perhaps the hierarchy is only nominal. See Inheritance or Something Else? for more on this. Publications almost seem like a clean hierarchy, but not on closer inspection. It may be that some of these things that initially look like a taxonomy are better served with things like the role pattern (including the procedural/relational versions of patterns), where the features are set-based rather than hierarchical or mutually exclusive.

[Even though...] it has been known since 1847 that classifications are dependent on the purpose of the classification, people continue to believe that it is possible to create a classification system that is context-independent. (Haim Kilov on comp.object, 6/01. Note that I consider sub-types to be "classifications".)
"I find OOP technically unsound. It attempts to decompose the world in terms of interfaces that vary on a single type. To deal with the real problems you need multisorted algebras.......I have yet to see an interesting piece of code that comes from these OO people." A. Stepanov, STL pioneer.

Presidential Politics Analogy

One of the problems I have with picking a U.S. presidential candidate is that I prefer some of the stances of one candidate, and some of the stances of the other(s). Perhaps it would be less of an issue if I could vote on the issues independently. (California and other states already have issue-based ballots for state-wide issues.) I would then not have to pick a mutually exclusive "lump" that on the average better fits my preferred combination by small margins.

This is essentially the same kind of problem I see with subtyping used for business modeling. Were the founding fathers OO fans by chance?

See Also:
Customer Feature Plans

Dubins and Rucks

Let's take a different look at the lockstep issue.

  class A childof BirdX    
    feature 1... // cell A1 
    feature 2... // cell A2
    feature 3... // cell A3
    feature 4... // cell A4
  endClass
  class B childof BirdX
    feature 1... // cell B1
    feature 2... // cell B2
    feature 3... // cell B3
    feature 4... // cell B4
  endClass
  ...
  // define two instances
  A Duck
  B Robin

Here, "cells" are simply implementations of features. Thus, cell B2 is the implementation of feature/attribute 2 of subclass B. (If you can think of a better name than "cell", please let me know. Meyer's "Feature" seems to apply to the method name, not necessarily the implementation variations themselves.)

Note how something belonging to the BirdX family has to be either A or B. A dichotomy has been created here. (For the sake of discussion, let's assume that class BirdX is only a "template" class, and cannot be instantiated.)

This kind of arrangement assumes either all of A's cells, or all of B's cells. Something belonging to the BirdX family must be either A or B.

However, most of the real (business) world has "Dubins" and "Rucks". A new variation or instance may need cells A1, B2, A3, B4, etc. It does not make much sense to add a new subclass for every possible combination of the 4 features. For numerous methods or variations, the combinations grow astronomically.
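To put numbers on the growth, the short Python sketch below enumerates every way of choosing a variant (A or B) for each of the 4 features. The function name subclass_count is hypothetical, and the sketch assumes every feature can vary independently.

```python
from itertools import product

FEATURES = 4
VARIANTS = ("A", "B")

# Each string picks one variant per feature, e.g. "ABAB" means
# cells A1, B2, A3, B4.
combos = ["".join(choice) for choice in product(VARIANTS, repeat=FEATURES)]


def subclass_count(variants: int, features: int) -> int:
    """Subclasses needed if every combination gets its own class."""
    return variants ** features
```

With 2 variants and 4 features there are already 16 combinations; with 3 variants and 10 features, subclass_count(3, 10) gives 59049.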

The association of features (cells) is quite often temporary in custom business applications. Thus, the "clustering", "linking", or "binding" of features as often done in OO subclasses poorly models the business world for the most part. A subclass is poor encapsulation of actual relations.

The above diagram shows the difference between common OO thinking and the approach more appropriate to most business organizations. Subclasses improperly assume that features fit into groups.

Note that we are purposely blurring the distinction between instances and "variations" (such as subclasses). The issue of when one becomes the other is a separate issue that will not be addressed here.

I find it better to treat each feature as being independent. This makes it easier to reassign features to instances without worrying about "partner" features. Subclasses just get in the way with their faulty assumptions about long-term feature association. (Note that our coined term "cell" may be more appropriate than "feature", as described above.)
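A minimal Python sketch of treating cells independently might look like the following. The names CELLS, make_cell, build, and run are invented for this illustration, and each cell implementation just returns its own label so that the selection is visible.

```python
def make_cell(variant, feature):
    """Create a stand-in implementation that just reports which cell it is."""
    def cell():
        return f"{variant}{feature}"
    return cell


# Pool of every cell, keyed by (feature number, variant letter).
CELLS = {(f, v): make_cell(v, f) for f in (1, 2, 3, 4) for v in "AB"}


def build(choices):
    """choices maps feature number -> variant letter; returns an instance."""
    return {f: CELLS[(f, v)] for f, v in choices.items()}


def run(instance):
    """Invoke the instance's features in feature order."""
    return [instance[f]() for f in sorted(instance)]


duck = build({1: "A", 2: "A", 3: "A", 4: "A"})
dubin = build({1: "A", 2: "B", 3: "A", 4: "B"})
```

Here run(dubin) uses cells A1, B2, A3, B4 without any new subclass, and reassigning one feature later (for example dubin[2] = CELLS[(2, "A")]) does not disturb the other three.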

Dicey OO Solution

To "solve" this, many OO fans suggest splitting methods up into tiny pieces and putting most or all of them in the parent class. Thus, subclasses and/or instantiations simply call the pool of feature variations (cells) of the now hefty parent.

This may "work" for the most part, but it is simply not superior to procedural/relational approaches, and may create an ongoing chore known as "refactoring", among other problems. For related information on this, please see Boundaries of Change and Variation.

Time and Space

Types can be divided into roughly 4 categories:

Types that are poor candidates for sub-typing due to the lack of a clear subdivision pattern (such as the 2nd candidate example above).
Types that have a current clear subdivision pattern, but are likely to change and lose their pattern over time. Examples might be a corporate organizational structure or a customer account feature structure. (Note that losing a sub-division pattern is not the same as losing a branch on a tree. It is important to keep these conceptually straight. Losing a branch is when a subtype is no longer active. Losing a subtype pattern is when splitting/subdividing in the first place turns out to be a bad idea.)
Types that have clear and stable subdivisions, but are not likely to gain new subdivisions (subtypes) over time. Examples include shapes. Most or all fundamental geometric shapes have already been discovered.
Types that have clear and stable subdivisions, and are likely to gain new subdivisions fairly often. The only example I can think of is device and interface drivers.

Subdividing category #1 is obviously a mistake.

Subdividing #2 may seem like a good idea at first, but can lead to messy trees that spread like a vine because one may have to keep overriding parent classes in order to keep up with the changes. This has been tagged inheritance buildup elsewhere on this site. (The link also describes how it is tough and risky to keep starting over or clean up for new hierarchies. This was dubbed the "Fragile Parent Problem.")

Now #3 seems like a good use for OO sub-classing. However, if additions are generally rare, then perhaps building in sub-classing into the language may not be a productive idea. Regular procedural techniques can handle these just fine. All the fears about accidentally denting nearby methods (code blocks) when adding subtypes are nearly moot since adding is by definition rare. (Related to Meyer's "Single Choice" principle.)

Category #4 seems to be the one that OO is all geared up for. If #4 were common, then OO would be a godsend. Unfortunately, I find it quite rare in the programming I do (small and medium custom business applications). Perhaps if you write device drivers for a living, then OO inheritance would be quite nice. But, what about the rest of us? (Certain types of variation proliferation that may or may not qualify as #4 will be discussed later.)

Overzealous?

Even things that can fit fairly well to OO sub-typing like GUI's and graphics are often over-hierarchitized by overzealous OO fans. For example, I once found a Java class that generated and re-sized JPEG images.

The class builder had to subclass a GUI panel to get access to certain graphical operations. However, this was to be a batch (command mode) process that did not need GUI's. Because it opened GUI components, the process did not shut down properly in some cases because of a bug in Sun's GUI classes.

Thus, the JPEG class designer was sucked up onto an irrelevant branch and subjected to GUI bugs in order to "get at" certain operations in a non-GUI operation. (I am not really complaining about the GUI bugs here, other than an example of side-effects of having to swim through irrelevant classes.)

Perhaps it could have been rewritten to avoid opening GUI objects with some effort, but Java's API's seem influenced by an over-eagerness to take "advantage" of inheritance. Thus, one is forced or heavily encouraged to make and use sub-classing, whether appropriate or not.

One can argue that it is the users (Java designers) and not the paradigm that is at fault. However, if something is infrequently needed and often misused, perhaps it should be yanked from the mainstream.

Couch Scholars

I wish that computer scientists would do a better job at documenting real-world subclasses before devising abstraction gimmicks to prematurely solve poorly mapped problem spaces. For example, many OO academics completely ignore operational expansion (new operations for existing subclasses) and focus only on type-wise expansion. See also the Shape example.

I would recommend more foot work to comb the countryside building a taxonomy and analysis of what, where, who, how often, and under what conditions subtyping occurs. I realize that this is less intellectually interesting than devising abstraction games, but this is what is really needed. Excessive time up in the ivory tower has perhaps clouded judgement.

The late astronomer Carl Sagan speculated that ancient Greek technology did not advance as far as pure math did in that culture because Greek citizens then believed that manual effort was for slaves and the poor. Thus, they did not want to get their hands and feet dirty by experimenting with the actual world. They believed that the Universe could best be figured out by thinking; not by doing and observing.

I suspect that software engineering research has run into a similar "Couch Scholar" phenomenon with regard to OOP. Scholars seem unwilling to dig through real projects looking for real patterns. Even when they do look, it is often in scientific applications instead of the more common business applications.

Dealing with Proliferation

One form of "sub-typing proliferation" that I actually do encounter somewhat often is what I call "parameterized subtypes". These are usually put into a table where screen forms are used to create and manage the "variations". The objective is to package the changes such that most of the additions come about by filling in a form or table row instead of programming. This allows a quicker and safer way to manage the growth. Often some programming is still needed, but the table can still serve as the primary guide and "feature selection repository" that selects the routines or functions supplied. See Control Tables for some examples.
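As a small sketch of the idea, the Python below keeps the variations as table rows and lets each row name the routine it wants. The account types, column names, and fee routines are all hypothetical; in practice the rows would more likely live in a database table maintained through a screen form rather than in a list in the source code.

```python
def flat_fee(amount, rate):
    # The rate column is read as a fixed charge.
    return rate


def percent_fee(amount, rate):
    # The rate column is read as a percentage of the amount.
    return amount * rate / 100


# Routines the table rows may select by name.
ROUTINES = {"flat": flat_fee, "percent": percent_fee}

# One row per "variation"; a new variation is a new row, not a new class.
ACCOUNT_TYPES = [
    {"code": "BASIC", "fee_routine": "flat", "fee_rate": 5.0, "overdraft": False},
    {"code": "GOLD", "fee_routine": "percent", "fee_rate": 1.0, "overdraft": True},
]


def fee_for(code, amount):
    """Look up the row for an account type and apply the routine it names."""
    row = next(r for r in ACCOUNT_TYPES if r["code"] == code)
    return ROUTINES[row["fee_routine"]](amount, row["fee_rate"])
```

Adding a "STUDENT" variation with a zero flat fee would mean appending one row. Programming is only needed when a row asks for a routine that does not exist yet.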

When I ask what sort of things OO fans tend to use subclasses with, I usually get very vague answers. However, there are 3 areas that seem to be the most common source of sub-classing among OOers:

1. Extending library/package classes for an application

2. Internal conversions ("adapters")

3. User Interface

Extending libraries and components can be done just as well in procedural/relational (P/R) programming. Rather than "override" a method, one simply uses their own function in P/R. (Sometimes one may still call a library function within the new variation, but with some pre- or post-processing.)
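For instance, a procedural "extension" of a library routine can be a plain function that calls it with some pre- and post-processing. The sketch below wraps Python's standard textwrap.fill. The wrapper name wrap_report_line and the "| " line prefix are made up for the example.

```python
import textwrap


def wrap_report_line(text, width=40):
    """Wrap text like textwrap.fill, with local pre- and post-processing."""
    # Pre-processing: collapse runs of whitespace into single spaces.
    cleaned = " ".join(text.split())
    # The unmodified library routine does the real work.
    wrapped = textwrap.fill(cleaned, width=width)
    # Post-processing: mark every output line with a report prefix.
    return "\n".join("| " + line for line in wrapped.splitlines())
```

Callers that want the library's original behavior keep calling textwrap.fill directly, so nothing is overridden and no class has to be extended.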

One can perhaps argue that this is inferior self-documentation and less protection; however, the P/R approach does generally provide the same abilities to "extend" libraries. Note that I have not inspected enough samples to see if OO sub-classing is really beneficial or simply using OO philosophy for the sake of conformance. (A related issue is the tying of complex objects or types.)

The second one on our list, adapters, is less of an issue in P/R because the association between data and operations is not as tight. Thus, less formal or no conversion is often needed.

The third stated use, UI building, is a complex topic. P/R GUI building is under-explored IMO. However, one of my favorite approaches to UI organization is Data Dictionaries. They are a subset of Control Tables (introduced above) that store information about UI and database fields. They are especially useful when the number of fields numbers in multiple dozens or more. (Not all screen fields are one-to-one mappable to DB fields, and special techniques are often needed to handle such "virtual" fields.) [Note that I plan on providing more details and suggestions regarding Data Dictionaries soon.]

One advantage of the parameterized (tabled) variation extension approach is that such collections can be shared with other paradigms and languages. Take animal classifications, for example. If they were hardwired into an OOP language, like many training materials suggest doing, then only that language can use the data. (OODBMS perhaps can be used, but this may make it hard to share with non-OO paradigms, among other problems.)

However, parameterizing that information and putting it into a RDBMS allows many different applications, paradigms, and researchers to share species information. OOP tends to be excessively memory-centric, and paradigm-centric with regard to data. It likes to hide its light under a bushel. The melding of algorithms with data can often result in such "selfish" problems.

Are Patterns a Solution?

Sometimes "Patterns" are proposed as a way to solve some of the sloppy propagation of variations (subclasses) found in the real world. However, the OO versions of many of these patterns are unnecessarily complicated. For one, they often have excessive "middlemen" classes. Aside from extra complexity, these middlemen often violate Meyer's "Single Choice Principle". (Note that I consider a list of operations just as important as a list of subtypes when applying this principle. Meyer also hints of operational lists with his text editor menu command example on page 62 of OOSC2.)

Further, some OO fans say that the real benefits of many OO patterns do not show up until several years after the birth of a project. This may conflict with standard investment accounting time discounting theories.

Also, many patterns better relate to the field of data and relational modeling. It is my opinion that data modeling not be too strongly mixed with algorithm modeling because it reduces the sharing ability of data among multiple paradigms and languages, which is a common need in business. Also, tasks and data (nouns) often change their relationship between each other. Thus, it is not good to heavily couple them.

There has not been enough research and effort put into non-OO versions of many patterns. Thus, there is not enough to compare at this time. I am the only documenter of such that I know of.

In some ways reliance on OO patterns can have drastic consequences for code structure (Meyerian continuity principle) if you need to change from one pattern to another. Procedural/relational design is often more immune to pattern changes. For example, suppose we have a table with attributes such as "isMgr" and "baseRate".

  if emp.isMgr and emp.baseRate > threshold then
     giveRaise(emp)
  end if

Later, we wish to use a role-like pattern. The code above will not have to be shuffled around; it needs only minor changes.

  if hasRole(emp, "manager") and emp.baseRate > threshold then
     giveRaise(emp)
  end if
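The same before-and-after can be written as runnable Python. This is only a sketch: the Emp record, the THRESHOLD value, and the bodies of give_raise and has_role are assumptions made so the example is self-contained; the original pseudocode does not define them.

```python
from dataclasses import dataclass, field


@dataclass
class Emp:
    name: str
    base_rate: float
    is_mgr: bool = False                      # original flag-style attribute
    roles: set = field(default_factory=set)   # role-like replacement


THRESHOLD = 50.0   # hypothetical cutoff
raised = []        # records who received a raise


def give_raise(emp):
    raised.append(emp.name)


def has_role(emp, role):
    return role in emp.roles


def review_before(emp):
    # Flag-based version.
    if emp.is_mgr and emp.base_rate > THRESHOLD:
        give_raise(emp)


def review_after(emp):
    # Role-based version: only the condition changed, not its location.
    if has_role(emp, "manager") and emp.base_rate > THRESHOLD:
        give_raise(emp)
```

The two review functions differ in a single expression, which is the point: the surrounding code stays where it was.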

Further, patterns may be a particular viewpoint, and given sections may need different orthogonal perspectives. Thus, patterns should perhaps be viewed from a "has-a" perspective instead of an "is-a" perspective. The procedural/relational viewpoint is often to try to farm out patterns to a temporary query instead of a physical code structure. It is nearly impossible to have multiple code structures overlap to give the needed pattern for different viewpoints, but often easily do-able via relational queries. It is easier to change a relational or Boolean expression than it is to change the physical arrangement of code. (Some argue that OOP code is also change-able by rearranging class associations, but if you use inheritance in the pattern structure, then you are usually limited to just one parent (aspect); and if you use delegation, then you have nothing over procedural/relational.)
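To illustrate farming a pattern out to a temporary query, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layout (emp, emp_role), the sample rows, and the helper names_with_role are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, base_rate REAL);
    CREATE TABLE emp_role (emp_id INTEGER, role TEXT);
    INSERT INTO emp VALUES (1, 'Lee', 60.0), (2, 'Kim', 40.0), (3, 'Ravi', 75.0);
    INSERT INTO emp_role VALUES (1, 'manager'), (1, 'trainer'),
                                (2, 'manager'), (3, 'trainer');
""")


def names_with_role(role, min_rate=0.0):
    """One 'view' of the employees; other views are just other queries."""
    rows = conn.execute(
        "SELECT e.name FROM emp e JOIN emp_role r ON r.emp_id = e.id "
        "WHERE r.role = ? AND e.base_rate > ? ORDER BY e.name",
        (role, min_rate),
    )
    return [name for (name,) in rows]
```

Lee shows up in both the "manager" view and the "trainer" view at once, because the grouping exists only for the duration of each query and is not fixed in the code structure.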

Not all pattern views can be generated via queries, but we still need to have code that can adapt to changing, morphing, and hybrid business patterns. This is where the IF statement expressions come in. Rather than move our code to fit today's pattern, we can often just change the IF criteria without having to move code; at least not a lot of code. It is easier to learn and find code if it does not move around a lot.

It is a philosophy of Table Oriented Programming to control as much of the relationships and behavior dispatching as possible using Boolean expressions, or expressions in general. Tables-Of-Rules are a lot easier to change, manage, and view than programming code in most cases. It is true that dynamic IDE's, such as the ones that have made Smalltalk famous, can do some of this; but they are often re-inventing techniques that are already covered in relational table management techniques/tools. Why re-invent one wheel for data and another for code? Why not focus on perfecting table browsers instead?

Another problem with OO patterns is that they encourage over-engineering for potentially fleeting patterns. A pattern fanatic will see 3 items that fit a pattern, and then arrange the code to ease "mass production" of "more of the same". Thus, if these 3 items grow into 20 of the same pattern, then there indeed may be a net gain. However, if the original 3 were mostly a coincidental alignment, then "engineering for more of the same" was a bad investment. Thus, vote K.I.S.S. (simplicity) unless you are sure of the stability and prominence of the pattern. (Prominence is mentioned in case there are overlapping, orthogonal pattern view needs, as described above.)

In some domains where natural laws or slow-changing international standards dictate a stable pattern, then code-structure-based patterns may indeed be the way to go. However, in the dynamic business culture, tread with care.

Summary

Object Oriented Programming is well-optimized for managing frequently expanding subtypes. The problem is that features perhaps rarely expand in a clean, clear-cut hierarchical way in the real world.

Although there is unfortunately insufficient research to know how real type variations actually do expand, my impression is that OOP inheritance has added a boatload of unnecessary complexity and other organizational sacrifices in order to solve something that is a small problem to begin with in most business software.

OOP sub-typing may be building better umbrellas for fish.

See also:
Split Ends
The Driver Pattern
Publications Example
Overlapping or Orthogonal Aspects
Boundaries of Change and Variation
