Critique of OOSC2 by Bertrand Meyer

Critique of Bertrand Meyer's
Object Oriented Software Construction,
2nd Edition

Document Updated: 5/28/2005

Introduction

I have asked several OOP proponents to recommend the best book for presenting the case for OOP. Meyer's book appeared the most often in the resulting lists, so I purchased a copy despite the steep price tag. The "GOF" pattern book came in a close second, but it does not really make an effort to compare OOP to non-OOP techniques. Robert Martin's works also were mentioned often, but Martin seems to focus on device drivers and systems software rather than the business domain. (Update: See Martin Critique) So, I am putting on trial the best OOP evidence book having sufficient mention of business applications: Object Oriented Software Construction (2nd ed.) by Bertrand Meyer.

I have been doing custom business programming for small and medium projects since the late 1980's. When Object Oriented Programming started popping its head into the mainstream, I began looking into it to see how it could improve the type of applications that I work on.

Note that this excludes large business frameworks such as SAP, PeopleSoft, etc. I have never built a SAP-clone and probably never will, as with many others in my niche.
I have come to the conclusion that although OO may help in building the fundamental components of business applications, and even the language itself, any minor organizational improvement OO adds to the applications themselves is not justified by the complexity, confusion, and training effort it will likely add to a business-oriented language. In other words, OO is not a general-purpose software organizational paradigm, and "selling" it as such harms progress in the alternatives.

I have used languages where the GUI, collections handling, and other basic infrastructure are built into the language or language libraries in such a way that OO's benefits would rarely help the applications developers. It is also my opinion that the language of base framework implementations probably should not be the same as the application's language for the most part. For example, most Visual Basic components are written in C++ (at least they were pre-dot-NET).

Some techies suggest that business applications are not "real programming" and that systems software and embedded work is where "real programming" is done. This write-up does not intend to evaluate the social worth of different domains or specialties. Custom business software is a large niche and needs good software engineering techniques as much as any domain regardless of the social standing or "geek coolness" factor. The focus should be on what needs to be done rather than what is "cool".
In other words, the needs of "infrastructure components" and of business software modeling are sufficiently different that the best techniques for one may not apply to the other. The longer-term patterns of change are just different for each. Meyer seems to have more of a one-size-fits-all view of languages and paradigms than I do. If a domain (industry kind) is large enough, then it makes sense to use the paradigm or methodologies that best fit it. It does not make sense to make police officers use fire-trucks simply because that is what is available.

For a preview of my opinions and analysis of this situation, may I suggest the following links:


Introduction to OO criticism
Why I Prefer Procedural/Relational
The Driver Pattern
Subtype Proliferation Myth
Black Box Wire Bloat
 

Although the stated niche is not representative of all programming tasks, it is still a rather large one and should not be ignored when choosing paradigms.

Here is a quick summary of my criticisms of OOSC2:

  1. Meyer tends to build up false or crippled representations of OO's competitors, which distorts OO's alleged comparative advantages.

  2. A good many of the patterns that OO improves are not something needed directly by the stated niche, except in rare cases.

  3. We have very conflicting views and philosophies on data sharing and databases.

Note that although my writing style has at times been called sarcastic and harsh, please do not confuse the delivery tone with the message.

Also note that I am not against abstraction and generic-ness. I am only saying that OO's brand of these is insufficient for my niche.


Chapter 1

 
Page 16, "the contribution of O-O tools to modern interactive systems and [GUI's] is well known, to the point that [it is often confused with icons and windows, etc.]."

I reject the notion that OO is the best or only way to do GUI's. Perhaps it is the best way to implement GUI drivers and components, but not the best way to add GUI's to the stated niche. I have built many FoxPro 2.6 (for Windows) GUI's without having to write a single class. (FoxPro 2.6 had some glaring GUI shortcomings, but these were not necessarily related to OO.) Visual Basic also allows one to make sophisticated GUI's without ever having to write a single class. (True, VB uses some OO-ish syntax. However, this is mostly just an issue of verb placement.)

See Also: Other GUI Approaches

Regarding the confusion with drag-n-drop mouse thingies: in my opinion and observation, this very confusion contributed to the popularity of Object Oriented Programming as a sought-after buzzword. GUI's sold product, and anything associated with GUI's (whether the perceived nature of the association was accurate or not) rode in on GUI's coattails. GUI's strong popularity increase and OO's increase are very closely related in time. Coincidence?

 
Page 16, "O-O techniques enable those who master them to produce software faster and at less cost."

I would like to see proof of this beyond anecdotal claims and from people whose wallets do not depend on OO's acceptance. This is the age of science. Where is the science? (I agree that some niches may have had large enough benefits that studies are not needed. However, extrapolation of this to all niches is a mistake. See also Objective Research Lacking.)

Further, what happens if many do not master OO? Is bad/sloppy OO better than bad/sloppy procedural or other paradigms? I think this is an important question because one must judge the usefulness of a paradigm by how typical developers are going to use it, not ideal developers. My favorite analogy for this is that road cars should be built and judged for how average drivers use them, not for how Richard Petty (a champion racer) would drive them.

This issue has been a strong dividing point in debates I have had with OO fans. I have been accused of "promoting mediocrity," but by that standard Honda is doing the same by targeting average, regular drivers instead of Richard Petty. However, it is not Honda's job, nor probably within its power, to significantly improve the skills of most drivers. (And, driving skill *is* important, since driving is just about the top pre-retirement killer.) Think of all the fuel and repairs that could be saved if everyone knew how to drive manual shift cars.

 
Page 18, The 5 to 9 digit change in U.S. postal zip-codes.

I am curious to see how OO would reduce the cost of such a change. I can envision several relational technologies that may greatly reduce the number of "visit points" for such changes.

Perhaps in COBOL you had to explicitly state the length for each zip-code reference, but this would be comparing bad procedural languages to good OO. (Perhaps COBOL has some good points, but size-dependency is not one of them, at least not in the older versions.)

Most code does not need to worry about things such as field lengths. You simply read or pass along the field as a unit. If you do need specific length information, then many RDBMS allow one to query the DB for the field length. (Unfortunately, SQL syntax is not standardized for this feature.) Optionally, a central data dictionary (see below) may do the trick.

Being agnostic about type information, such as whether a field is a string or number, can also be centralized in most cases. Unfortunately, many RDBMS are choosy about such information in their SQL implementations. (Perhaps they felt that explicitness was more important than representation change handling.) For example, some RDBMS allow you to put quotes around numerics in SQL. Thus, if a field changes from a number to string or vice versa, you don't have to change the SQL code. Others don't. (See also Dynamic Relational)
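
As a rough illustration of the quoted-numerics point, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only one example of an engine that tolerates a quoted literal against a numeric column; the table and column names are made up for the demonstration, and other engines may behave differently.

```python
import sqlite3

def find_city(zip_column_type):
    """Create a tiny table whose zip column has the given declared type,
    then look a row up using a quoted literal in the SQL text."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; only the declared type of "zip" varies.
    conn.execute(f"CREATE TABLE addresses (zip {zip_column_type}, city TEXT)")
    conn.execute("INSERT INTO addresses VALUES ('94105', 'San Francisco')")
    # The query text is identical whether zip is a number or a string.
    row = conn.execute(
        "SELECT city FROM addresses WHERE zip = '94105'"
    ).fetchone()
    conn.close()
    return row[0] if row else None

print(find_city("INTEGER"))
print(find_city("TEXT"))
```

Both calls find the row, so the SQL text survives the field changing from a number to a string.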

Overall, type- and size-agnosticism problems for basic types are mostly implementation-specific or standards-specific, and not an inherent fault of the procedural/relational paradigm itself. I beg Meyer not to paint with too wide a brush.

I see nothing inherent in the procedural/relational paradigm that forces internal representation information to be "spread out over many parts" as Meyer implies. Sure, a bad language or bad programmer may still do such, but that is true in any paradigm.

Further, many units of code have multiple grouping candidates. There often is no one right grouping because something can have multiple orthogonal factors/categories/dimensions associated with it. Unfortunately, linear text (source code) generally forces us to pick just one and favor that one dimension over other candidates. OO tends to pick a different kind of grouping factor than procedural, but it is not necessarily the best candidate. OOP book examples tend to misleadingly down-play the value of "lost" grouping factors. For more on this, see Control Tables and code structure analysis.

 
Page 18, "The issue is not that some part of the program knows the physical structure of data: this is inevitable since the data must be accessed for internal handling. But with traditional design techniques this knowledge is spread out over too many parts of the system, causing unjustifiably large program changes if some of the physical structure changes.........The theory of abstract data types will provide the key to this problem, by allowing programs to access data by external properties rather than physical implementation."

I wish Meyer would be more clear on what is meant by "traditional design techniques". (If he meant his false dichotomy between top-down procedural techniques and OO, this will be covered in chapter 5.)

However, the issue of sharing information about data formats is trickier than OO makes it sound. The problem is that different paradigms and programming languages MUST SHARE the same data in a typical business system. OO seems to assume that one language will handle all data accessing, and this is a bad assumption in many cases.

One partial non-OO solution is a good Data Dictionary. A data dictionary has the goal of putting as much information about a data field as possible in a central place. In this sense, it is somewhat similar to an OO class with regard to encapsulating (or centralizing) information about a data entity in a single place. This provides the factoring needed to reduce the number of "visit points" needed for data format changes.

Thus, if any part of a postal system needs to know how big a field is, it checks the appropriate place in the data dictionary rather than hardwire the number into the software.
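
A minimal sketch of that idea, assuming the data dictionary is itself just a table. The table name data_dict, its columns, and the helper functions are all hypothetical; a real dictionary would carry far more metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data dictionary: one row per field, holding its metadata.
conn.execute(
    "CREATE TABLE data_dict (field_name TEXT PRIMARY KEY, max_len INTEGER)"
)
conn.execute("INSERT INTO data_dict VALUES ('zip_code', 5)")

def field_length(field_name):
    """Look the length up centrally instead of hardwiring it in code."""
    row = conn.execute(
        "SELECT max_len FROM data_dict WHERE field_name = ?", (field_name,)
    ).fetchone()
    if row is None:
        raise KeyError(field_name)
    return row[0]

def fits(field_name, value):
    """True if the value fits within the field's current maximum length."""
    return len(value) <= field_length(field_name)

before = fits("zip_code", "941051234")   # nine digits against a 5-wide field

# The zip-code expansion becomes a single visit point: one row update.
conn.execute("UPDATE data_dict SET max_len = 9 WHERE field_name = 'zip_code'")

after = fits("zip_code", "941051234")
print(before, after)
```

No caller of fits() had to change when the field grew, and any language that can query the table can share the same dictionary.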

Although a good data dictionary would probably solve the zip-code expansion problem (above), a stickier point is the association of algorithms with data items.

It is my strong opinion that locking up the representation or interpretation of data in a single language and/or paradigm is often a bad idea. First, because "in style" languages and paradigms change too fast. Second, businesses often have to share data with multiple languages, paradigms, and systems. So far it appears that RDBMS (relational systems) allow this better than OODBMS.

For more on this, see Melding Can Be Hazardous and Chapter 31 Review.

(Note that data dictionaries are also capable of melding data and algorithms if needed. However, this may not be recommended for larger systems.)


Chapter 2

 
Page 25, Static typing

I have heard good arguments from both sides of the strong versus weak typing issue. It is a topic almost guaranteed to create heated discussions in "geek group" discussion boards. It may just come down to personal work habits and preferences, and sometimes the domain (business topic) in question. (Note that there are some differences between "strong" and "static" typing, but in practice they tend to go hand-in-hand.)

In my opinion, strong typing tends to add conversion and adapter clutter to the code. It is harder to work with code that has more clutter not directly related to the problem solution at hand. Strong typing is sort of like a crash helmet that blocks part of your view. On the one hand it might protect you from injury; on the other, it distracts and detracts from your "reading the road". See also Definition and Merit of Scripting and Chapter 11.

Further, it may be that strong typing works best for building components, and weak typing (dynamic typing) is best for building custom applications, using the components.

Also, sometimes you don't want the program to have to know what type something is. For example, you may be using a database and moving a value from one table to another. If each field has to be explicitly declared a type in such a program, then you risk problems if a given field's type changes. This is unnecessary coupling. (There are ways around this in some strong-typed systems, but they can get a little messy.)
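
To make the table-to-table case concrete, here is a small sketch in a dynamically typed language (Python with its built-in sqlite3 module). The tables, the ref_code field, and the copy_field helper are invented for illustration; the point is only that the copying code never declares the field's type.

```python
import sqlite3

def copy_field(conn, src_table, dst_table, field, key, key_value):
    """Move one field's value between tables without knowing its type.
    Table and field names are interpolated directly, so this sketch
    assumes they come from trusted code, not from user input."""
    row = conn.execute(
        f"SELECT {field} FROM {src_table} WHERE {key} = ?", (key_value,)
    ).fetchone()
    if row is None:
        return False
    conn.execute(
        f"UPDATE {dst_table} SET {field} = ? WHERE {key} = ?",
        (row[0], key_value),
    )
    return True

conn = sqlite3.connect(":memory:")
# ref_code is left untyped so it can hold a number or a string.
conn.execute("CREATE TABLE staging (id INTEGER, ref_code)")
conn.execute("CREATE TABLE orders (id INTEGER, ref_code)")
conn.execute("INSERT INTO staging VALUES (1, 1042)")
conn.execute("INSERT INTO staging VALUES (2, 'A-1042')")
conn.execute("INSERT INTO orders VALUES (1, NULL)")
conn.execute("INSERT INTO orders VALUES (2, NULL)")

copy_field(conn, "staging", "orders", "ref_code", "id", 1)
copy_field(conn, "staging", "orders", "ref_code", "id", 2)

copied = conn.execute("SELECT ref_code FROM orders ORDER BY id").fetchall()
print(copied)
```

The same routine handles the numeric and the string reference codes, so a later change of the field's type would not require touching it.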

In my opinion, strong typing shares some of the same problems as protocol coupling.


Chapter 3

 
Page 40 and 43, Modular Decomposability and Modular Understandability

Please see Black Box Bye Bye for a perspective on these concepts.

 
Page 44-45, Continuity: "A small change in requirements should produce only a small amount of coding effort." [paraphrased]

I happen to find "Continuity" to be a good description of a major problem with collection handling in many languages and design philosophies. Collection processing for RAM-oriented structures (like arrays) usually has a very different interface than larger or more formal collections (such as SQL tables). I see no reason whatsoever to provide a significantly different interface for the two collection types.

Arrays, including associative arrays (a.k.a. "dictionaries" in some languages), do not scale very well in complexity, and SQL tables are often too formal and require excessively bulky setup for some tasks.

I would like to see computer researchers look into this issue more.

See also:
Arbitrary Overhauls
The Gap
The Driver Pattern

Another potential point of continuity problems is excess use of IS-A thinking. Often there are multiple criteria by which to classify or manage something. If the criterion chosen as the IS-A division does not pan out or changes, then all parts of the design that are dependent on this excessive elevation of one factor will either fall apart, get in the way, or need major rework. (See Multiple Criteria and Inheritance under chapter 24 for an example.)

One potential "gotcha" of continuity to keep in mind is not taking the total cost of ownership into account. Sometimes even though small change requests may require fairly large code overhauls, the total cost of an approach may still be cheaper.

[Figure: cost of ownership]
The red approach is probably still a better design decision than the blue approach even though it has occasional continuity problems. It's the total area under the curve that matters (although time discounting may play a role).

Over-engineered applications are often like the blue curve; one has to slog through the excess complexity throughout the entire life of the project.

 
Page 57, Uniform Access Principle

Although I agree with this philosophy, frankly, I have not found it to be a significant problem when not directly supported by the language syntax. I am not sure why it has not been a bigger issue, but so far it looks like a "paper tiger." This may be the reason why more languages, even OOP ones, don't support it very well, as Meyer states.

Note that one potential drawback of Uniform Access is that it may hide the cost of calculation. If the calculation is expensive and/or time-consuming, but is disguised as a simple attribute, then one may risk excessively frequent accesses when a more appropriate treatment would be to copy it to a local memory variable. One could not tell the difference between what would otherwise be Florida.VoteCount and Florida.CountVotes().
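
The hidden-cost concern can be sketched in Python, whose property feature gives a form of uniform access. The State class, its ballot list, and the recounts counter are all invented for this illustration.

```python
class State:
    def __init__(self, name, ballots):
        self.name = name
        self._ballots = ballots
        self.recounts = 0  # instrumentation for the demo only

    @property
    def vote_count(self):
        # Reads like a stored attribute, but rescans every ballot each time.
        self.recounts += 1
        return sum(1 for ballot in self._ballots if ballot == "valid")

florida = State("Florida", ["valid", "spoiled", "valid"])

# Naive use: every access silently triggers a full recount.
for _ in range(3):
    total = florida.vote_count

# Copying the value to a local variable counts only once.
cached = florida.vote_count
for _ in range(3):
    total = cached

print(cached, florida.recounts)
```

Nothing at the call site distinguishes the cheap reads from the expensive ones; only the counter reveals that the first loop did three full recounts.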

 
Page 57-58, Open-Closed Principle

Meyer is not very clear, in my opinion, about exactly what "open-closed" is about. Some parts suggest it is about shared libraries, and others suggest it is about adding new portions without having to re-compile existing code. I will address both.

Sharing a central "source" or module is often very tough in dynamic situations, regardless of the paradigm. One often runs into the "Fragile Parent Problem", for example. (See Inheritance Problems for an example or two.)

A centralized library approach seems to work fairly well in specific applications, but cross-application sharing is a bear in any paradigm because the feature needs are rarely similar enough. I often find it much safer to make a copy of a module/package rather than try to have all potential applications reference a central copy. Also note that Copy Reuse is a tool like any other tool; and thus should only be used if appropriate. I am not recommending mass copy-N-paste here.

It would be nice to be able to have a central copy, and perhaps satisfy a certain sense of "factoring purity", but it is really tough to pull off well in a dynamic reality. "Copy reuse" (replication) is often simply a good compromise between "reference reuse" (a central repository) and no reuse. It allows you to reuse code, but not lock you into backward-compatibility with other applications, the Fragile Parent Problem, and other sharing headaches.

The only things that seem to benefit from central sharing are those that have a very simple, stable interface with a clear definition of requirements. For example, a sort routine. Unfortunately, the real world often does not fit into this type of predictability and simple interface. In many business applications the interface is almost as dynamic as the implementation.

A political analogy is that there is much duplication in State governments (USA). However, centralizing (factoring) it to the Federal level often results in an inflexible, unresponsive bureaucracy because it takes too long to evaluate and implement the impact of change.

It also tends to reduce experimentation. One advantage of State duplication is that one State tries something and the rest watch to see how it goes. Automobile insurance laws are a prime example. When "no-fault" insurance was proposed in California, other States that already had it were analyzed.

I find module variation the same way. Sometimes the only real way to know how something will work is to try it. If it does not work, it will only screw up the current application instead of all "referencees" of the centralized version.

OO "overriding" is sometimes offered as a solution to this. However, the variations often do not fall on clean "method boundaries". In other words, there is often no simple way to override 1/3 of a method. Besides, even if the variations do fall on clean method boundaries, such long-term use of overriding can result in messy Inheritance Buildup, in which the inheritance tree grows into a hard-to-manage creeping vine. "Refactoring" is kind of a silly "fix" to this. Refactoring is the symptom, not the solution. OO was originally sold as a way to reduce maintenance effort. Now they seem to be backpedaling because refactoring is simply a euphemism for code changes (especially movement of code). Renaming a problem does not make it go away.

For an illustration of "messy method boundaries", see the Change Boundary write-up.

Another common citation of Open-Closed is in distributing compiled software in chunks (such as DLL's). The problem with "my distribution grouping is better than yours" is that it is highly aspect dependent. Favoring one grouping aspect often disfavors another. This aspect issue is often used in OO examples to unfairly bash other techniques. Examples can almost always be found that make one grouping look better than another for a particular case. Excluded are sample cases which favor different groupings.

See Also:
Overriding Compiled Routines
Changing existing code (Mellor)
The Reuse Dilemma

 

Static Maintenance Paradox

It seems inheritance is often a poor choice in dynamic situations. The odd part is that in the opposite case, static situations, change management is nearly a non-issue by definition. Therefore, the paradigm used for static situations is a very minor issue from a maintenance perspective. Lots of change, then inheritance is a poor fit. Little or no change, then which paradigm best helps manage change is a moot point. I have tentatively named this the "Static Maintenance Paradox".

 
Page 61, Publication Example

I did a write-up based on this example. Please see Publications Case Study.

An alternative title for it could be "how to prevent programmers from being taxonomists."

 
Page 61-63, Single Choice Principle, "Whenever a software system must support a set of alternatives, one and only one module in the system should know their exhaustive list."

This is a point of major contention. There appears to be some discrimination going on here.

Let's look at what this "list" can be. Although Meyer's examples usually consider a set of variations as "the list", he actually dropped a hint pointing to his huge flaw (or at least a very poorly addressed issue). Near the bottom of page 62, Meyer gives the following example of a list:

"In a text editor: the notion of user command, with such variants as line insertion, line deletion.... global replacement..." [emphasis added]

So you can see, the "list" can be a set of commands also, not just subtypes. However, the typical OO approach does not centralize all lists to one module (class). All it does is invert the operational list with the subtype list.

Let's look at a template of a typical OO example:

  class Circle inherits Shape {
    method scale {....}
    method rotate {....}
    method draw {....}
  }
  class Rectangle inherits Shape {
    method scale {....}
    method rotate {....}
    method draw {....}
  }
  class Polygon inherits Shape {
    method scale {....}
    method rotate {....}
    method draw {....}
  }
Hey look! The list of operations (scale, rotate, draw) on these shape variations is repeated three times in different modules! Is this a violation of the Single Choice Principle? Sure looks like it to me.

If we add a new method, then we have to remember to visit potentially every shape:

  class Circle inherits Shape {
    method scale {....}
    method rotate {....}
    method draw {....}
    method fill {....}  // NEW
  }
  class Rectangle inherits Shape {
    method scale {....}
    method rotate {....}
    method draw {....}
    method fill {....}  // NEW
  }
  class Polygon inherits Shape {
    method scale {....}
    method rotate {....}
    method draw {....}
    method fill {....}  // NEW
  }
A procedural approach would instead (typically) have only one procedure per operation (scale, rotate, and draw). Thus, the procedural approach better satisfies the Single Choice Principle compared to OO subclasses with regard to new operations. (See the Shapes and Inheritance section before you go claiming that inheritance prevents or reduces method name-list repetition. In short, its equivalent also simplifies procedural/relational code.)
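
A rough Python rendering of the one-procedure-per-operation layout, using plain dictionaries for shapes. The field names and the string results returned by draw and fill are placeholders chosen for the sketch, not anything from Meyer's text.

```python
def scale(shape, factor):
    kind = shape["kind"]
    if kind == "circle":
        return {**shape, "radius": shape["radius"] * factor}
    if kind == "rectangle":
        return {**shape, "width": shape["width"] * factor,
                "height": shape["height"] * factor}
    raise ValueError(f"unknown shape kind: {kind}")

def draw(shape):
    kind = shape["kind"]
    if kind == "circle":
        return f"circle r={shape['radius']}"
    if kind == "rectangle":
        return f"rectangle {shape['width']}x{shape['height']}"
    raise ValueError(f"unknown shape kind: {kind}")

# NEW operation: one new procedure is added; scale and draw are untouched.
def fill(shape, color):
    kind = shape["kind"]
    if kind in ("circle", "rectangle"):
        return f"{kind} filled with {color}"
    raise ValueError(f"unknown shape kind: {kind}")

circle = {"kind": "circle", "radius": 2}
print(draw(scale(circle, 3)))
print(fill(circle, "red"))
```

Here each operation name appears once, while the list of shape kinds is what gets repeated. Adding a new shape kind would mean visiting each procedure, which is the mirror image of the subclass layout.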

OO tends to emphasize centralizing the subtype aspect (lists) at the expense of the operational aspect (lists). I have yet to see any evidence one is inherently better than the other.

Actually this "single choice" stuff is nothing more than a factoring issue. Factoring is all about reducing unnecessary repetition of patterns and constructs. Thus, a Single Choice violation is a repetition of a list.

The only way I have found to factor out both the subtype aspect and the operational aspect at the same time is with Control Tables. Outside of control tables, it is a simple tradeoff of favoring one dimension at the expense of the other. (The two dimensions being subtype/variation and operational.)
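
One way to picture such a table, as a minimal sketch only. The Control Tables write-up is not reproduced here, so the layout below (one row per shape variant, one column per operation, each cell naming a handler) and every name in it are assumptions made for illustration.

```python
def scale_circle(shape, factor):
    return {**shape, "radius": shape["radius"] * factor}

def scale_rectangle(shape, factor):
    return {**shape, "width": shape["width"] * factor,
            "height": shape["height"] * factor}

def draw_circle(shape):
    return f"circle r={shape['radius']}"

def draw_rectangle(shape):
    return f"rectangle {shape['width']}x{shape['height']}"

# Column headers: each operation name is listed exactly once.
OPERATIONS = ("scale", "draw")

# Rows: each shape variant is listed exactly once.
CONTROL_TABLE = {
    "circle":    (scale_circle,    draw_circle),
    "rectangle": (scale_rectangle, draw_rectangle),
}

def dispatch(operation, shape, *args):
    """Find the cell at (shape kind, operation) and call its handler."""
    column = OPERATIONS.index(operation)
    handler = CONTROL_TABLE[shape["kind"]][column]
    return handler(shape, *args)

print(dispatch("draw", {"kind": "circle", "radius": 1}))
print(dispatch("scale", {"kind": "rectangle", "width": 2, "height": 3}, 2))
```

In a database-backed version the same grid could live in a real table, with handler names stored as strings rather than function references.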

Note that this particular battle is mostly about the selection criteria names (such as method name or case statement selector name/string), and not the implementation code contents of the method or case "block".

Common?

Actually, I do not find repeated case-statement lists very often in my code. The first reason is that I often use tables to store dispatching information, and the names are only listed once in their appropriate column. (For this reason and others, I find tables better at managing such information than linear OO classes.) Second, most features are selections of task-specific strategy lists, and not one-strategy-per-variation as many OO examples seem to assume. Thus, the strategy case list(s) per task is usually different from task to task. For instance, the list of different strategies for calculating taxes will usually be very different from the list to calculate shipping costs. They rarely have the same names, let alone the same count. Thus, even ignoring the aspect tradeoff issue (above), in actual practice case-list repetition exists more often in anti-procedural books than in real life.
 

For more information on case statements and the factoring and proximity tradeoffs, see also:

Shapes Example
Structures and Patterns of IF and CASE statements
Control Table Theory
Aspects


Chapter 4 - Reuse

 
Reuse Notes: See the discussion above about pages 57-58 for some notes on reuse. Also, I cited 2 sources (Wikiwiki and Dr. Dobb's) on the links section of the main OO page which suggest that OO has failed to significantly increase reuse.

 
Page 74, "Over and over again, programmers weave a number of basic patterns: sorting, searching, reading, writing, comparing, traversing, allocating, synchronizing...Experienced developers know this feeling of deja vu, so characteristic of their trade."

Many of these on Meyer's list are examples of typical collection operations. I am a proponent of having these built into a language, or at least a significant part of the language design or libraries, so as to make them readily accessible and consistent. I favor table-friendly languages which make collection processing a snap. I rarely have to directly use arrays and linked lists in them. The same collections interface can be used for all collections, which should be the case IMO. The problem with choosing a collection type is that collection needs often morph over time. They often slip from one type into another or grow aspects of 2 or more collection types. Thus, a general collection mechanism is preferred except for occasional special speed needs. (Note that certain options are more likely for more complex collections, but most protocol elements can remain the same.) I talked about this a bit already in the prior discussion on continuity.

It is ridiculous to expect a typical business application programmer to spend their time writing stacks and sort routines. There are plenty of libraries and database engines out there already. Carpenters don't need to make their own drills anymore. (Perhaps OOP may be great for making them, but it does not help in using them from the application builder's perspective. The builders needing OOP is not the same as the tool/library users needing OOP. One size does not fit all.)

See Also:
Device Drivers
The Collection Taxonomy Sin

 
Page 76, "A potential obstacle to reuse comes from policies of larger organizations that limit paid-for work to the specific scope of a contract. Such rules are to make accountability simpler, but can prevent reuse. ...... However, there is nothing to prevent an RFP from requiring a general-purpose solution and metrics for meeting reuse goals." [Heavily paraphrased]

This to me sounds like dreaming. Good contracts and RFP's are already hard to build to just meet existing requirements. Adding requirements and metrics in there for reuse is probably only going to make the write-up more expensive and give lawyers more clauses to battle over.

However, I am all for experimenting to see how this works. Just don't write it into law until it is well tested.

 
Page 89, "R3 - No complex data structures are involved...."

I am not sure what Meyer means by this.

 
Page 90, "[procedural programming requires a] large number of searching routines, each covering a specific case and differing from some others only by a few details in violation of [factoring]. Candidate reusers could easily lose their way in such a maze."

I find this an overly harsh criticism of procedural programming. There are features and techniques of some procedural languages that simplify the process of building generic routines and greatly improve factoring. These techniques include indirection (preferably code indirection, a.k.a. "string evaluation," instead of pointer indirection), persistent tables, static variables, parent routine variable "scope inheritance" options, and other procedural and relational techniques.

Although I agree that such solutions generally offer less protection than OO versions, factoring is not the drawback that Meyer suggests. Based on coding debates, I would guess that such solutions are roughly up to 25 percent more code than OO, especially when multiple, independent instances are needed. (With single instances, the non-OO solutions actually seem to come out ahead with regard to code size.) However, this addition is not due to factoring, but due to manually coding structures that are otherwise built into an OO language.
Meyer seems to be looking at C and Fortran, which are rather wimpy representatives of flexible procedural programming. C and Fortran can be rather efficient from a machine perspective, but they score low in flexibility and dynamism in my opinion (unless you like C address pointers).
See Procedural/Relational Tips for language constructs beyond those of C and Fortran that can improve one's p/r programming.
However, procedural generality is generally a moot point for custom business software. I admit that I am somewhat at a loss to fully explain exactly why, but it just seems to be the case out of many years of experience. I dug around in my brain for ways to apply Meyer's techniques, but cannot find many real-world situations that would fit the patterns that OO allegedly makes easier.

The OO literature generally does a fine job of trying to sell me on patterns and solutions that I don't encounter or need very often (unless I completely toss relational theory out the window, which most businesses are not prepared to do). It comes across as, "Look how easy this tool allows you to fit your entire knee into your ear!" It indeed does a good job of allowing me to put my knee in my ear. But....um....I don't need to do that very often, Mr. OO Salesperson.

For a more detailed description of this topic, please see The Driver Pattern. The Subtypes write-up is also somewhat related, but I recommend the Driver write-up first.

Further, I propose a single interface to most collections, rather than collection taxonomies made popular with Smalltalk and other OO libraries. Selecting which "search routine" to use can be a simple attribute setting in a good API.

  t = openTable(#table invoices #driver Foo)

    versus

  t = openTable(#table invoices #driver Bar)
Here a named parameter is used to select which table driver (implementation) to use. A default can perhaps be assigned if a "driver" parameter is not given. In case our syntax looks confusing, try this:
  t = openTable(table="invoices", driver="Foo")
Named parameters can make such APIs quite flexible. For example, if we wanted to use SQL (if supported) instead of a direct table, then we could say:
  t = openTable(sql="select * from invoices", driver="Foo")
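To make the idea concrete, here is a minimal Python sketch of driver selection through a named parameter. Everything in it is an assumption made for illustration: the `open_table` function, the two driver classes, the driver names "memory" and "indexed", and the stand-in `TABLES` data are all hypothetical and do not correspond to any real library.

```python
# Hypothetical sketch: driver names, classes, and data are illustrative only.

class MemoryDriver:
    """Keeps rows in a plain list; lookups scan linearly."""
    def __init__(self, rows):
        self.rows = list(rows)

    def find(self, **criteria):
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in criteria.items())]


class IndexedDriver:
    """Builds a dictionary index on one column to narrow lookups."""
    def __init__(self, rows, key="id"):
        self.rows = list(rows)
        self.key = key
        self.index = {}
        for r in self.rows:
            self.index.setdefault(r.get(key), []).append(r)

    def find(self, **criteria):
        # Use the index when the indexed column is part of the search.
        if self.key in criteria:
            candidates = self.index.get(criteria[self.key], [])
        else:
            candidates = self.rows
        return [r for r in candidates
                if all(r.get(k) == v for k, v in criteria.items())]


DRIVERS = {"memory": MemoryDriver, "indexed": IndexedDriver}

# Stand-in data source, keyed by table name.
TABLES = {
    "invoices": [
        {"id": 1, "customer": "Acme", "total": 100},
        {"id": 2, "customer": "Bolt", "total": 250},
    ]
}


def open_table(table, driver="memory"):
    """Open a table with the named driver; 'memory' is the default."""
    return DRIVERS[driver](TABLES[table])


t1 = open_table(table="invoices")                    # default driver
t2 = open_table(table="invoices", driver="indexed")  # same interface
```

Both handles answer `find(...)` the same way, so the calling code does not change when the driver does. The choice of implementation is a one-word attribute rather than a choice among collection classes.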
See Also:
Driver Interface Versus Implementation
The Reuse Interface Bloat Dilemma


Chapter 5

 
Page 113, "One can in principle include the concern for extendibility and generality in top-down design..... but nothing in [top-down methodology] encourages generalization...."

The implication here seems to be that the top-down doctrine has no features that help with building generic, flexible code. Therefore, the argument goes, top-down sucks and OO is the only alternative.

Meyer does not state this conclusion directly, but I find it hard to read it any other way. The falsehoods thus generated are:

  1. Top-down has a flaw that cannot be remedied without tossing out the entire top-down paradigm.

  2. The only two choices are top-down or OO. (Are "procedural" and "relational" the same as "top-down" in Meyer's mind? I cannot tell.)

I have built many procedural/relational systems that did not "lock" the pieces into the strict sequence that Meyer, throughout chapter 5, implies procedural programming imposes. (They may be sequence dependent at a subroutine level, but this is also the case within OO methods.)

There are many techniques to avoid getting overly dependent on sequence. The most common technique I use is prerequisites, usually dependent on data from relational tables. A module will check to see that the prerequisites are met. If not, then an exception is triggered or a "fail" result code is returned. It is somewhat similar to the "pre-conditions" talked about in chapter 11.

For example, a Sign_Contract module/routine, based on Meyer's page 111 home buying example, could resemble:

  sub Sign_Contract(accountID)
    result = space(0)  // initialize as blank
    // retrieve the account record as "t"
    t = get_record(table="accounts",where="accountid=" & accountID)
    if empty(t.homeRef)   // no home reference ID?
       result = "Home is not specified yet."
       exit sub
    elseif t.loanStatus != "complete"  // loan not finished?
       result = "Cannot sign contract until loan is completed."
       exit sub
    endIf

    result = getBuyerSignatures(...)
    if empty(result)
       result = getSellersSignatures(...)
    endIf
    // Empty return value means success.
    // The value of 'result' is assumed to be returned, following the Eiffel convention.
  endSub
This non-OO routine could be placed anywhere in the program. Its only sequencing requirements directly reflect real world sequencing limitations, which it is designed to detect and report on.

Tree Addicts?

Regarding the rise and fall of top-down, I often suspect that inheritance is over-hyped for similar reasons that top-down was. Trees are very appealing to software organization researchers because they appear to be able to organize complex and varying information into an easy-to-comprehend and very visual (at least perceptually) structure.

However, it seems that trees end up failing to grow in a clean fashion when applied to the dynamic real world. Trees do well when the change is expansion of existing patterns and structures, but start to fail when branches need to be moved and/or cross-reference each other often enough that a graph (network) structure would have been more appropriate to begin with. In these cases, the original tree foundation becomes nothing more than an archaic burden.

In the case of top-down, this "branch jumping" problem is seen when somebody tries to change the sequence order of something, as Meyer describes.

In the case of OO inheritance, branch jumping can occur when the hierarchy is weaker than originally anticipated. (See Subtype Proliferation Myth.)

It is true that OO can be used without much inheritance, but client relationships between classes are a much harder sell than inheritance. Most of the "see how magic OO is" demos that I have seen involve inheritance. Beyond a tighter bond between data and algorithm (not always a good thing), I have yet to see any major contributions from client relationships of OO classes.

Inheritance is a strange beast. It is very powerful when it is properly applied to the right patterns; however, it is also a grand source of confusion, misuse, and language complexity. Therefore, if these "right patterns" are rare in a given domain, then perhaps inheritance should be tossed.

To Meyer's credit, he does not overemphasize inheritance and gives good advice to avoid its overuse. However, in my opinion he should look into getting rid of it for some or most target domains.

See Also:
Alternatives to Trees

 
Page 113, "The reality is less simple: project n in a company is usually a variation on project n - 1, and a preview of project n + 1."

I am not sure this is fully accurate. Most companies have only one budgeting system, one accounting system, one HR system, one inventory tracking system, etc. Why would they want to make 8 inventory tracking systems for themselves?

I will agree that they often overhaul such systems roughly every 8 years or so. However, here are the main reasons why such overhauls are done:

  1. They get a new operating system which does not run the old software.

  2. It is hard to find programmers who want to work with an out-of-style language. I have seen this happen quite a few times. In fact, I was once a consultant in an "older" language until the software system was rewritten in something more fashionable. Another time I was hired to rewrite my own software into a newer language. There was nothing technically wrong with the older one. (I don't claim this is always logical, only that it happens.)

  3. A merger or buyout results in one half of the company having to unplug their system.

  4. A new paradigm comes along and it is decided that it is easier to start from scratch rather than change existing code. (Example: mainframe to LAN to client/server to Web to ????.) One can argue that if the original code was generic enough, it could "flex" into any new paradigm. However, this is an extreme idea in my opinion. Projecting future business requirements is one thing. Projecting for future paradigms is probably an order of magnitude tougher with regard to flexibility and generality. I get one of those weird, dizzy headaches just trying to envision such a goal.

 
Page 115, "The presentation used a typical example: table searching."

Typical? Um, I think not. I am still looking for an OO book that uses typical custom business applications/patterns to demonstrate the superiority of OO without tossing RDBMS. (Note that showing "how to" is not the same as showing "how better". The latter requires more effort.)

 
Page 116, "Yet this simple idea -- look at the data first, forget the immediate purpose of the system -- may hold the key to reusability and extendibility."

Isn't this called "relational modeling"? Something that is not exclusive to OO.

Further, like I said before, relational data can be more easily shared with more paradigms and more programming languages than OO data. Sharing and convertibility are important in the business world. (Sure, relational data can be mapped to OO, but what benefit does this give in order to justify the mapping layer?)

Note that I am not for modeling purely by data. I recommend using both behavior and data for modeling, but not tying them together too early in the process, unlike most OO modeling techniques. Excess tying of the two often results in methods artificially "jammed" into classes/entities simply because OO dictates that they have to fit into one. Sometimes there is a clear relationship, but often not.


Chapter 6

 
Page 122, "The descriptions should be precise and unambiguous."

I find that the Java APIs greatly violate this, mostly because they violate the black box principles in a sneaky way.

 
Page 125, "The result of the Lientz and Swanson maintenance study.....More than 17% percent of software costs was found to come from the need to......change....data formats."

I have seen this mentioned a few times in the book. It would be interesting to see more specific, and representative, examples of this in order to analyze possible approaches and tradeoffs.

 
Page 125, The middle-initial case.

I think there is more to this than simply not planning enough.

If multiple middle initials are quite rare (I assume they are, but don't really know), is it then worth it to make more input fields for them?

I can see at least 2 drawbacks to having multiple initial input fields. The first is that it takes more data-entry time to skip the extra field(s) in the majority of cases, where a person has one middle initial or none.

Second, more fields may result in more data entry errors. A clerk may type in two middle initials instead of one, or even worse. You may get stuff like "John Q. S. Mith" instead of "John Q. Smith".

Thus, a quick look at the costs versus the benefits provides no clear answer.

 
Page 125, "The 'millennium problem' [Y2K]....is another example of the dangers of accessing data based on physical representation."

Most of the Y2K issue is caused by simple shortsighted thinking and/or priorities. (And in the early days, expensive storage costs.) In many cases technicians knew about the shortcomings of 2-digit years. However, they either did not care to deal with it or were ordered not to spend resources on it by their supervisors.

No paradigm, short of real AI, will substitute for an unwillingness of humans to plan. A poorly planned OO system is no better than a poorly planned procedural system; and I have never met an OO programmer who disagreed with this. OO will NOT think nor plan for you.

Perhaps using OO may have reduced it in certain specific situations, but at the expense of another layer between the raw data and the algorithm. Layers add to complexity; they are not free. Conversion and preprocessing layers can be built with procedural programming also.

Note that adding an OO date class, or even a date type or validation and comparing routines (API) in a procedural language, does not guarantee that the programmer will actually use such features. An OOP programmer can still write multiple, incompatible date classes; or bypass them altogether. The larger the project, the harder it is to communicate to avoid having multiple classes which may overlap in purpose.

A stupid programmer can load in and compare dates as strings in OO just as much as in procedural programming. Much of the data in business systems comes from outside sources, and one may forget to convert.

if (date1.toString < date2.toString) {....}
The programmer may be thinking, Hmmmm. What was that comparer method again? Rats! They subclassed it in another library, and I don't want to hunt for the base class in order to find it in Help. Was it date1.less(date2), date1.lessThan(date2), date1.compare("<",date2)? Oh well! I will just use "toString", it is usually consistently named. Isn't polymorphism wonderful? "toString" it is!

One could argue that comparisons are always done by overriding the standard comparer symbols (">","==", etc). However, not all languages support that, or the vendor may have used names for multi-language compatibility. Further, an operation that does not make it clear what is being operated on can cause readability and maintenance problems. The operation dateCompare(a,'<',b) makes it clear that we are comparing as dates and not as strings or numbers. (I use "as" because many dynamic languages don't have formal type declarations, or even types.) We cannot tell by a statement like (a < b) alone what is being compared. Sure, an IDE may help in some languages, but then you cannot inspect with printouts very well.
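A small Python sketch may make the difference concrete. The `date_compare` helper below is hypothetical, modeled on the `dateCompare(a,'<',b)` call mentioned above, and the MM/DD/YY input format is an assumption chosen for illustration. Python's `%y` directive maps two-digit years 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068.

```python
from datetime import datetime
import operator

# Dates as they might arrive from an outside source, as MM/DD/YY text.
a = "12/31/99"
b = "01/01/00"

# Compared as strings, the 1999 date does NOT sort before the 2000 date.
string_says_a_before_b = a < b

_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def date_compare(left, op, right, fmt="%m/%d/%y"):
    """Compare two values as dates.

    Strings are parsed with 'fmt'; anything else is assumed to be a
    datetime.date already. The call site states that dates are compared.
    """
    def to_date(value):
        if isinstance(value, str):
            return datetime.strptime(value, fmt).date()
        return value
    return _OPS[op](to_date(left), to_date(right))


date_says_a_before_b = date_compare(a, "<", b)
```

The string comparison and the date comparison disagree about the same two values, and nothing about the expression `a < b` would tell a reader which kind of comparison was intended.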
Another reason one may do something similar is to prepare a variable for export to another system, but forget that it is converted when they compare.
    d = date1.toString()
    ....
    if (d > x) {.....}
    ....
    export(d, foo, bar)
  
Nothing in OOP prevents this either:
  Class myClass
    attribute foo: String
    attribute bar: Integer
    attribute myDate: String
    ....
  end Class
  
One possible reason they may do this is that their source (outside of their control) may send the date as a string or number, but they forget to convert it to a Date class/type after loading in the original. (Note that sharing data with outside machines {B-to-B} is generally common in business applications.) Other possible reasons for such an error are that the programmer is sloppy, stupid, high, or simply never exposed to date problems. He/she may have come from a company that used only strings formatted as "YYYYMMDD".

Some claim that OOP training often uses dates as training examples, and thus "OOP better prevents Y2K-like errors". However, that is advertising training or nagging, and not the paradigm itself.

Only human intelligence, not a compiler, can tell that all such classes could and/or should be rolled up into a single class or API. Only a human, not a compiler, can tell the machine to cast/convert imported date values into proper dates or date-types instead of strings or what-not. (Perhaps a machine can guess, but the final result still needs human evaluation.)

Also note that some procedural/relational languages have a built-in date type. (Usually these are database-influenced languages for some reason, such as PL/SQL, XBase, etc.) Still, this does not guarantee proper use of dates. (The debate over strong typing versus dynamic typing or non-typing is not entered into here.)

Y2K bugs were usually caused by bad choices, NOT bad paradigms. Until you invent Spanking-Oriented-Programming, such problems will not go away. OO is not your mother; it won't pick up your socks for you.

Further, what may perhaps be useful for simple base types, such as strings, numbers, and dates, will not necessarily scale up to "fat types" like Customer, Invoice, Product, etc. What works on a small scale may not work well at a larger scale.

See also: Data Protection Myth.

 
Page 128, E-mail name removal example.

I am not sure if this example proves anything about OO. Perhaps the mail thingy was just a poorly organized or poorly factored piece of software. It can happen in any paradigm.


Chapter 8

 
Page 253, (Bottom) "head.store(save_file)....... Just by itself, this [data saving] mechanism would suffice to recommend an object-oriented environment over its more traditional counterparts.

I would like to see what the heck he is comparing this to. This is a rather strong statement from Bertrand Meyer.

I know that data saving was a pain in C and Pascal when I used them, but I have used much more collection-friendly and persistence-friendly procedural languages since then. In XBase, the saving is usually automatic, for example. Even if you forgot to explicitly close (save) a collection, everything is automatically saved when the program ends. (I am not promoting all of XBase here, only saying that the problems Meyer describes are not inherent to procedural/relational systems. His brush is too wide again.)

It is even closed if there was a run-time error, as long as the error is trapped within the interpreter. (If it crashes the interpreter, this would usually be either a bug on the interpreter writers' part, or a hardware fault.)

Some dialects also have a way to specify a collection as being a temporary table, and thus it acts as a file-bound collection while it is active. But, when the collection is closed, any files used to store the temporary data are automatically cleaned up (erased). Even if this mechanism was not built-in, re-creating it manually was simply a matter of supplying a unique table/file name and remembering to delete it at the end. (For batch or single-user systems, explicit deletion was often not even needed since the next usage would recycle the same temporary file.)

Note that due to buffering and caching mechanisms, disk I/O for repeated use of a given table is generally kept to a minimum. Smaller tables can even (automatically) fit entirely in RAM, avoiding any disk I/O until save-time (or until an explicit command is issued to persist any cached changes). Caching works for other relational technologies also, such as SQL engines.

Another wonderful benefit of using buffered relational tables for all collections was that explicit memory management was not an issue. Unlike many systems, converting from a RAM-only collection to a disk-cached collection was a 100 percent non-issue. The only "memory leaks" in XBase will come from bugs in the interpreter, not from bugs in the application code. (Some GUI approaches may leave "ghost screens" behind if the programmer does not close things properly, but this is probably a separate issue.)

In general, auto-releasing table/relational resources is very language, API, and DB engine-dependent. One should be careful about making paradigm-wide generalizations.

Most XBase dialects have/had arrays, but I found I rarely needed them. The only time I used arrays was to get results back from an array-generating API. For example, a file folder directory API would return file names and file statistics in an array. Why the API makers did not use tables to return such results is kind of odd. XBase tables are much more powerful and flexible than arrays.


Chapter 9

 
Page 301, Automatic Memory Management

See above under the page 253 entry. At least twice I have heard OO fans brag about automatic memory management. Simply another OO myth.


Chapter 11

Design By Contract

Another title for this chapter could perhaps be "extreme validation". All the extra clauses and validation blocks are not without a price. For one, they provide more code to distract the eye and concentration, making it harder or longer to find the code that actually does something besides stand guard. Thus, there is the potential cost in time to find the code you are looking for, the potential errors introduced because the clutter confused one or interrupted one's thought, and possible errors introduced from changing a method's/routine's contents, but forgetting to change a validation condition to match. (A kind of mini Single Choice Principle violation.) Also see the comments about strong typing in chapter 2. There is something to be said about keeping intent and purpose of code lean and clean.

Also, a simple IF statement can be used in place of Require and Ensure. I am not sure special constructs are worth cluttering up a language with. They strike me as a little bit on the pedantic side. However, this is only a small gripe of mine.
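As a rough illustration of the IF-based alternative, here is a short Python sketch. The `withdraw` routine and its rules are hypothetical and are not taken from Meyer's examples.

```python
def withdraw(balance, amount):
    """Return the new balance after a withdrawal (hypothetical example)."""
    # Plain IF standing in for a "require" clause (precondition).
    if amount <= 0 or amount > balance:
        raise ValueError("amount must be positive and no more than the balance")

    new_balance = balance - amount

    # Plain IF standing in for an "ensure" clause (postcondition).
    if new_balance != balance - amount or new_balance < 0:
        raise AssertionError("withdraw produced an inconsistent balance")

    return new_balance
```

The checks sit in the routine body like any other statement, so no extra language constructs are needed. The clutter concern raised above applies either way: the guard code still competes for attention with the one line that does the work.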


Chapter 14

 
For my view of inheritance and related links (at the end of the section), please see Tree Happy, or Tree Addicts under the Chapter 5 review.

 
Page 468, shape hierarchy diagram

Oh, if only the rest of the business world fit such a clean, never-changing hierarchy. OO would be a godsend if this were the case.

 
Page 474, Mail-A-Pet example (small type font)

I am so sick of shape, GUI, device driver, and animal examples in OO books. OO books would have a student believe that the world is built around shapes and animal taxonomy. (I suppose a video game might be, but I have yet to build one, at least not for money.)

Another recurring OO inheritance example topic is food categories. Dairy products, meats, cereals, etc. (Gee, where does Spam fit in?) I do not remember any food examples in OOSC2. Did I miss them?

When I complain about this lack of example imagination in discussion groups, OO fans point out that they are only for teaching purposes. However, I suspect that inheritance is of very limited use in the real world, and thus authors are forced to use the same few examples over and over again.

 
Page 498, Case Statements

This relates back to our Single Choice Principle discussion (page 61). Especially notable is, "Any addition of a new type, or change in an existing one, will affect every routine."

Again, what about the addition of a new method to each subtype (shape)? This will affect every subclass: a violation of the Single Choice Principle. Procedural programming (case statements) will tend to keep the changes isolated to one routine, which makes Single Choice happier in that case.
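The tradeoff can be sketched in a few lines of Python. The shapes and the `perimeter` operation are hypothetical choices made only to show where the edits land in each style.

```python
import math

# Procedural style: one routine per operation, branching on a type tag.
def area(shape):
    if shape["kind"] == "circle":
        return math.pi * shape["radius"] ** 2
    if shape["kind"] == "rectangle":
        return shape["width"] * shape["height"]
    raise ValueError("unknown shape kind")


# A new operation is one new routine; area() above is not touched.
def perimeter(shape):
    if shape["kind"] == "circle":
        return 2 * math.pi * shape["radius"]
    if shape["kind"] == "rectangle":
        return 2 * (shape["width"] + shape["height"])
    raise ValueError("unknown shape kind")


# OO style: one class per type, one method per operation.
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def perimeter(self):  # new operation: an edit is needed here...
        return 2 * math.pi * self.radius


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):  # ...and here, and in every other shape class
        return 2 * (self.width + self.height)
```

The sketch cuts both ways. Adding a new shape touches every routine in the procedural version, which is Meyer's point, while adding a new operation touches every class in the OO version. Which kind of change is more common in a given domain decides which layout keeps edits in one place.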


Chapter 19

 
Page 665, "...all the great methodologists have also been programmers and project leaders on large development [projects]."

Perhaps this creates a bias toward languages and paradigms geared for large projects. What is good for large projects may not also be good for small and medium projects. I have no empirical evidence, but I doubt that the needs at various points on the size spectrum scale linearly as projects increase in size. In fact, the graph on page 615 does suggest this. The lower curves appear to be more linear.

Perhaps OO indeed makes large projects run smoother, but please don't extrapolate this to all project sizes.

The dividing lines between "applications" are often blurred in my domain. For example, a particular application might be relatively small, but it may connect to a database that is shared by many other applications. There is no reason to make a large application if one can make many small applications that communicate primarily via one or a few big database(s). This is often a good way to break big problems into many small problems. Databases are in general harder to convolute than programming code in my opinion, and therefore make a better large-scale "messaging and state repository" than code.
I have asked OO fans on several occasions whether they think that OO mostly shines just with larger projects, and get mixed results (roughly a 50/50 split). Many do insist that OO has clear benefits for smaller sizes also. (Although they do a poor job of articulating/demonstrating the exact reasons for the alleged benefits.)

I agree that there is some overlap in the needs of different project sizes, but large projects are not always the best proving ground for other sizes. For one, name-space issues are often much more important in larger code bases. The steps/features to handle large name-spaces are often over-kill and distracting at smaller sizes.

For yet another vehicle analogy, the best truck driver may not make the best motorcycle driver, and vice versa. Truck-driving requires long-term alertness and keen depth perception, while motorcycling requires quick reflexes and subtle body weight balancing and distribution skills. The skill sets and training needed are generally different. After all, some good truck drivers don't even know how to ride a bicycle.

See Also: More on size comparisons


Chapter 20

Because of the size of the multi-panel example, chapter 20 has been moved to its own page.

Click Here for the Chapter 20 review

(Summary: The panel example is yet another case of Meyer setting up a non-OO straw-man to be knocked down by the mighty "HerOO". Perhaps he likes OO so much because he is possibly a crappy procedural/relational programmer. I know that is harsh criticism for such an experienced and respected man, but his procedural/relational examples frankly need help.)


Chapter 21

It seems that Bertrand's solution to Undo and related operations puts the "undo" implementations at too high a level. For example, ignoring paragraph spacing and certain other doodads for the moment, all text editing can be reduced to just inserting and deleting. (We could throw in a Replace primitive, but it is technically not needed). Thus, perhaps one only has to implement Undo for just two low-level primitives, and not for every single high-level operation. The high-level operations would be defined as combinations of the primitives.


Without primitives, one may have to repeat UNDO implementations
for each and every UI operation.

This would greatly reduce the quantity of polymorphic Undo methods or their equivalent. It may also reduce code quantity. If there are 50 command operations, then Meyer's approach may need up to 50 Undo implementations (methods). However, using primitives, we still have only two implementations regardless of the number of high-level operations.

We may still have to keep track of which high-level operation belongs with which low-level transaction(s), but this is mostly an attribute storage/tracking issue and not a polymorphic dispatching issue.
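A minimal Python sketch of the primitives idea follows. The `TextBuffer` class, its command names, and its log layout are assumptions made for illustration; they are not Meyer's design and not a real editor's API.

```python
class TextBuffer:
    """Text editing where undo is implemented only for two primitives."""

    def __init__(self, text=""):
        self.text = text
        self._log = []     # inverse primitives, most recent last
        self._marks = []   # log length at the start of each command

    # The only two primitives. Each one records its own inverse.
    def _insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        self._log.append(("delete", pos, len(s)))

    def _delete(self, pos, length):
        removed = self.text[pos:pos + length]
        self.text = self.text[:pos] + self.text[pos + length:]
        self._log.append(("insert", pos, removed))

    # High-level commands are combinations of the primitives.
    def type_text(self, pos, s):
        self._marks.append(len(self._log))
        self._insert(pos, s)

    def cut(self, pos, length):
        self._marks.append(len(self._log))
        self._delete(pos, length)

    def replace(self, pos, length, s):
        self._marks.append(len(self._log))
        self._delete(pos, length)
        self._insert(pos, s)

    def upper(self, pos, length):
        self.replace(pos, length, self.text[pos:pos + length].upper())

    def undo(self):
        """Undo the most recent high-level command, whatever it was."""
        if not self._marks:
            return
        mark = self._marks.pop()
        while len(self._log) > mark:
            kind, pos, arg = self._log.pop()
            if kind == "insert":
                self.text = self.text[:pos] + arg + self.text[pos:]
            else:
                self.text = self.text[:pos] + self.text[pos + arg:]


buf = TextBuffer("hello world")
buf.replace(0, 5, "howdy")
buf.upper(6, 5)
after_commands = buf.text
buf.undo()
after_one_undo = buf.text
buf.undo()
after_two_undos = buf.text
```

Adding a fifth or fiftieth command means writing another small method in terms of `_insert` and `_delete`; `undo` itself does not change. The `_marks` list is the attribute tracking mentioned above: it records which low-level entries belong to which high-level command.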

A similar approach may apply to a Sketch tool. Primitives could be line, pixel, and/or (rectangular) region. For more complex operations, it might be easier to take a snapshot of a region or the entire image before the transformation is applied. For example, it is tough to reverse a "blur" or an "add random noise" (speckle) algorithm using a counter-algorithm. Thus, the Blur command could simply execute a Snapshot() operation before performing. If the user chooses to undo it, then the most recent snapshot becomes the image's current state.
This is roughly analogous to using a single collections engine (database or database interface) instead of re-inventing efforts such as archiving, data converting, index management, performance tuning, inter-application messaging, etc., for each and every application.
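Returning to the Sketch tool, the snapshot idea could look roughly like the Python sketch below. The `Sketch` class, its pixel grid, and the `add_noise` command are hypothetical stand-ins; a whole-image copy is used for simplicity where a real tool might snapshot only the affected region.

```python
import copy
import random


class Sketch:
    """Image whose hard-to-reverse commands are undone via snapshots."""

    def __init__(self, width, height):
        self.pixels = [[0] * width for _ in range(height)]
        self._snapshots = []

    def snapshot(self):
        # Keep a full copy of the current image state.
        self._snapshots.append(copy.deepcopy(self.pixels))

    def add_noise(self, amount=10, seed=None):
        """Speckle the image; there is no practical counter-algorithm."""
        self.snapshot()
        rng = random.Random(seed)
        for row in self.pixels:
            for x in range(len(row)):
                row[x] += rng.randint(-amount, amount)

    def undo(self):
        # The most recent snapshot becomes the current state.
        if self._snapshots:
            self.pixels = self._snapshots.pop()


img = Sketch(4, 3)
img.pixels[1][2] = 200
before = copy.deepcopy(img.pixels)
img.add_noise(seed=1)
img.undo()
restored = img.pixels
```

The undo logic never needs to know what the command did, only that a snapshot was taken first. The cost is memory rather than code.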


Chapter 24

 
Page 813, Bottom diagram.

Minor editing issue. The arrows out of the middle square appear incorrect. I suspect that one of the side-pointing arrows is supposed to come out of the top third of the white square.

 
Page 836, "ACCOUNT <-- SAVINGS_ACCOUNT <-- FIXED_RATE_ACCOUNT"

Regarding bank accounts, see the Bank Example. A strict hierarchy may not be appropriate because of the "recombinational" nature of business and marketing features.

 
Page 851-856, "Multiple Criteria and View Inheritance"

Throughout these pages, Meyer seems somewhat uncomfortable with the various solutions to multiple criteria. This is unfortunate because the multiple criteria situation he describes in his employee classification example is quite common in business applications. (The similarity of his subcategories to those in the subtypes document is purely accidental, or perhaps arises because employee types are a commonly used example.)

He proposes his "handle pattern" as a possible general solution, but at the bottom of page 855 seems undecided or marginally committed, admitting that the handle pattern is complicated. (His handle approach seems to use Strategy Pattern-like references.)

But the most frightening part of this section is Meyer's repeated suggestion to "pick a single criterion as a best guess" (paraphrased). For example, on page 854 he says:

"The alternative to view [multiple] inheritance is to choose one of the classification criteria as primary, and use it as the sole guide for devising the inheritance hierarchy; to address the other criteria, you will use specific features." (Emphasis added)
And, on page 856 he says:
"As you improve your understanding of the application area, it will often happen that one of the criteria starts to dominate the others, imposing itself as the primary guide for devising the inheritance structure. In such cases, the preceding discussion strongly suggests that you should renounce view [multiple] inheritance in favor of more straightforward techniques." (Emphasis added)
This is dangerous advice! I find the importance of different aspects is very dynamic. For example, new government regulations might do away with the "permanent" and "temporary" classification. (A recent court case underlined the arbitrariness of such distinctions.) Suddenly, your "primary criterion" is gone with the wind.

Meyer seems to be risking a violation of his own Continuity Principle.

Even if it stays around, another criterion may grow in importance. In my opinion, the programmer should not try to hard-wire business taxonomies into the design to begin with. It makes too many assumptions about the power of your crystal ball. It may artificially elevate the status of one criterion above the others without any strong or lasting merit. This faulty mindset is rampant in OO thinking. (It may be less of an issue for some niches, but not custom business applications.)

For the employee classification example, I would suggest using purely client relations, such as some form of attributes or strategies.
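One possible shape for the attribute approach is sketched below in Python. The field names (`tenure`, `pay_basis`, `role`, `rate`) and the pay rules are hypothetical; they only loosely echo the kinds of criteria in Meyer's employee example.

```python
# Each classification criterion is an independent attribute of the record.
employees = [
    {"name": "Ann", "tenure": "permanent", "pay_basis": "salaried",
     "role": "engineer", "rate": 60000},
    {"name": "Bob", "tenure": "temporary", "pay_basis": "hourly",
     "role": "clerk", "rate": 20},
    {"name": "Cy", "tenure": "permanent", "pay_basis": "hourly",
     "role": "engineer", "rate": 40},
]


# Behavior hangs off one criterion at a time, strategy-style.
def pay_salaried(emp, hours):
    return emp["rate"] / 12        # rate is an annual salary


def pay_hourly(emp, hours):
    return emp["rate"] * hours     # rate is an hourly wage


PAY_RULES = {"salaried": pay_salaried, "hourly": pay_hourly}


def monthly_pay(emp, hours=0):
    return PAY_RULES[emp["pay_basis"]](emp, hours)


def select(rows, **criteria):
    """Filter on any combination of criteria; none of them is primary."""
    return [r for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]


permanent_engineers = select(employees, tenure="permanent", role="engineer")
```

If the permanent/temporary distinction were abolished, the `tenure` field and any rules keyed on it would be dropped, and the pay rules and role queries would be unaffected. No criterion has to be promoted to the backbone of a hierarchy.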

See Also:
Subtype Proliferation Myth
Publications Example
Bank Example
Aspects


Chapter 26

 
Page 897, "The War of the Semicolon....and the hatred between them is as ferocious as it is ancient."

I am glad I am not the only one who has encountered this messy battle. I now chalk it up to personal preference and wish that languages would let programmers choose which style to use via command switches, syntax hints, or some other means.


Chapter 27

 
Page 907, TV Station Example

I plan to add a more detailed discussion about this example at a later time.

At present, I see no reason why an RDBMS could not provide the basic framework and even design-by-contract-like features of DB constraints. Meyer seems to be re-inventing many collection-oriented features from scratch rather than let the built-in abilities of the RDBMS handle them. It is called "reuse".
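As a small illustration of constraints doing contract-like work, here is a Python sketch using the standard `sqlite3` module. The `segments` table, its columns, and the limits in the CHECK clauses are hypothetical and are not taken from Meyer's TV station example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table definition states the rules once, for every program that uses it.
conn.execute("""
    CREATE TABLE segments (
        id        INTEGER PRIMARY KEY,
        title     TEXT    NOT NULL,
        start_min INTEGER NOT NULL CHECK (start_min >= 0),
        length    INTEGER NOT NULL CHECK (length > 0 AND length <= 120)
    )
""")


def add_segment(title, start_min, length):
    """Return '' on success, or the database's complaint on failure."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO segments (title, start_min, length) "
                "VALUES (?, ?, ?)",
                (title, start_min, length),
            )
        return ""
    except sqlite3.IntegrityError as err:
        return str(err)


ok = add_segment("Evening News", 0, 30)
bad = add_segment("Zero-Length Show", 30, 0)
row_count = conn.execute("SELECT COUNT(*) FROM segments").fetchone()[0]
```

The invalid row is rejected by the database itself, so the rule holds no matter which application, language, or paradigm attempts the insert.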


Chapter 30

 
Page 964, Concurrency

Regarding multithreading support in a language: I am not sure that multithreading is really needed that often for the stated niche. One partially decent example I have seen is some sort of "wizard" help system for a GUI, which guides the user through an actual process while showing the results in an "active" screen. (This is in contrast to "modal" wizards which are basically just multi-step menus.)

Personally, I think development time is better spent on making the interface more intuitive, or making better training materials. Wizards are not easy (quick) to build. Their benefits are probably not justified, in my opinion, unless there are thousands of users.

If a wizard is a requirement, however, then there are ways to do it without adding multithreading directly to a language. One method is to have the wizard be launched as a fully independent process. The communication between the application and the wizard can be via regular relational tables. (Hopefully via something more friendly than SQL.)

There are also many other interactive help paradigms besides Microsoft-style wizards. For example, having an "instruction bar" at the bottom of the screen, and the current area or icon being discussed surrounded by yellow. This approach is very effective and does not require multithreadedness for the most part.

Another possible example is query results coming in piece by piece. The idea is that the user can be studying the start of the results while the rest is being loaded. One way to handle this in a development environment is to treat one process as a component. This resembles Microsoft's OLE technology, which generally does not require direct OO (classes) nor direct multithreading programming to implement from the application builder's perspective. From his/her perspective, it can be like starting a second, independent process, but with one process window being placed inside the other.

Both situations are similar. A window or panel is opened which is treated like a window into a separate or semi-separate application. The code to do it can resemble:

  launch_indep_window([window attributes],"myroutine()")
This starts up a new, independent window which gets its contents and scope from running the specified routine (called "myroutine" here). The "connectedness" of the new window to the existing process GUI or window depends on the window attributes given. For example, the attributes might distinguish between a floating window and a fixed-position panel (relative to parent window).

Again, communication between the two processes, if needed, can be via relational tables. Table handling often already has sharing and semaphore features built in. (There is generally the "locking" approach and the "transaction" approach. It is sometimes suggested that direct record locking is not truly relational.) Thus, there is no need to recreate/reinvent them for program variable sharing also. We are taking advantage of something that is already part of something else.
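A rough Python sketch of table-based communication follows, using the standard `sqlite3` module with the "transaction" approach. Two connections to one database file stand in for the two independent processes; the `messages` table and the `take_next` routine are hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

# "Application" side. isolation_level=None means each statement commits
# on its own unless a transaction is started explicitly.
app = sqlite3.connect(path, isolation_level=None)
app.execute("""
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY,
        topic   TEXT NOT NULL,
        body    TEXT NOT NULL,
        handled INTEGER NOT NULL DEFAULT 0
    )
""")
app.execute("INSERT INTO messages (topic, body) VALUES (?, ?)",
            ("help", "user opened invoice screen"))

# "Wizard" side: a second connection stands in for a separate process.
wizard = sqlite3.connect(path, isolation_level=None)


def take_next(conn):
    """Claim the oldest unhandled message inside one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, topic, body FROM messages "
            "WHERE handled = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE messages SET handled = 1 WHERE id = ?",
                         (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = take_next(wizard)
second = take_next(wizard)
```

The database engine supplies the locking, so neither side needs threads, shared variables, or semaphores of its own.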

I realize table locking or transactions are not always the most convenient way to do concurrency, but they are perfectly adequate for occasional use. (Remember, feature pack-rat languages tend not to do well. Handling something "adequately" is fine for the relatively uncommon stuff.)


Chapter 31 - OO and Databases

 
Page 1047, List of Database Features

Note that this is fairly similar to the Standard Collection Operations. I did not include Integrity Constraints and Transactions because these are higher-end "protection" features, rather than a minimum base.

 
Page 1051, "[relational DB] types are drawn from a small group of predefined possibilities (integers, strings, dates...), each with fixed space requirements."

The "fixed-space requirement" is implementation-specific and is not inherently true, but we will deal with that later.

Regarding the "simple types", Meyer implies that the simplicity of the types is a liability of relational systems that OO remedies. However, many RDBMS (relational systems) have as one of their goals and benefits the ability to share with many different kinds of systems and paradigms:

sharing data
The simple types facilitate this kind of sharing. Complex types are much tougher to share between systems. This is a major complaint I have with OO persistence. See Black Box Bye Bye for related observations about sharing complex types. RDBMS in essence have "skinnier wires" than OODBMS and OO data, making them better black-box candidates.
My write-ups on Control Tables suggest using tables to hold behavior. To some, this may seem to contradict my suggestion to separate data and behavior. However, Control Tables do not really hold data (in the traditional sense) for the most part. Further, most of the content of Control Tables is indicator codes, flags, and titles; not programming code. Most Control Tables in my experience are less than 1/4 code from a cell-count perspective (for those that allow code).

 
Page 1051, "Property R3 [simple types] rules out many multimedia, CAD-CAM, and image processing applications, where some data elements, such as image bitmaps, are of highly variable sizes, and sometimes very large."

This is not true. Many RDBMS have the ability to store very large fields, sometimes called "Large Binary Objects" (LBO; more commonly "Binary Large Objects" or BLOBs), because they are open to any kind of data (anything can be represented as binary).

Even file/LAN-based relational systems like early XBase have allowed "memo fields" since the mid-1980's. These can be fairly large, even open-ended in some dialects. (Some dialects only permit text in memo fields, while others have binary variations.)

Generally, the regular table stores a pointer or reference (often an integer ID) that identifies the logical location of an LBO. The DB user does not necessarily see this pointer/reference; they see only the LBO. Storing the actual LBO outside the "regular" table allows table searching and indexing to be faster, based on the observation that LBOs rarely need to be indexed. (The text may, however, be indexed in some cases through a separate free-form document indexing mechanism.)
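As a rough illustration of that layout, here is a sketch using Python's standard sqlite3 module. The table and column names ("drawing", "image_store", "image_id") are made up for the example, and the split is done by hand at the schema level; it says nothing about how any particular vendor stores large fields internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: the searchable table holds only a small integer
    -- reference; the large binary content lives in its own table.
    CREATE TABLE drawing (
        id INTEGER PRIMARY KEY,
        title TEXT,
        image_id INTEGER REFERENCES image_store(id)
    );
    CREATE TABLE image_store (
        id INTEGER PRIMARY KEY,
        content BLOB
    );
    CREATE INDEX drawing_title ON drawing (title);
    """
)

bitmap = bytes(range(256)) * 4096  # 1 MiB of stand-in image data

cur = conn.execute("INSERT INTO image_store (content) VALUES (?)", (bitmap,))
conn.execute(
    "INSERT INTO drawing (title, image_id) VALUES (?, ?)",
    ("floor plan", cur.lastrowid),
)

def load_image(title):
    # The caller asks by title and never handles the reference itself.
    row = conn.execute(
        "SELECT s.content FROM drawing d "
        "JOIN image_store s ON s.id = d.image_id WHERE d.title = ?",
        (title,),
    ).fetchone()
    return row[0] if row else None

stored = load_image("floor plan")
```

The index covers only the small "title" column, so searching never has to touch the megabyte-sized field.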

However, this is an implementation issue. Neither relational theory nor SQL standards dictate any upper field-size limits. If you want to complain about field-size limits, complain to the vendor, and not the paradigm.

 
Page 1051, "A relational description will not be able to represent the reference field, 'author', whose value is the denotation of another object.... There is a workaround in the relational model, but it is heavy and impractical.... to connect the two relations, you may perform a join....no particular problem in an O-O system's run-time network of objects."

With regard to his hierarchical query statement, "bookX.author.birthYear" [paraphrased], what if there are multiple authors? The Law of Demeter also questions such usage, and hierarchical query systems were tried in the 1960's with IBM's IMS database system. Most shops found relational to be an improvement over IMS. Tree DBs proved too fragile when relationships morphed into graphs instead of trees over time. P/R techniques tend to factor the specification of a relationship (path) into a single Query or JOIN statement instead of repeating it over and over for each field/attribute reference. This tends to reduce rework upon relationship changes.
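To make the multiple-author case concrete, here is a small sketch using Python's standard sqlite3 module. The schema, the placeholder names ("Author A", "Book X"), and the birth years are all invented for illustration. The book-to-author path is stated once, in one query, and works the same for one author or several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    -- A link table lets a book have any number of authors.
    CREATE TABLE book_author (book_id INTEGER, author_id INTEGER);

    INSERT INTO author VALUES (1, 'Author A', 1950), (2, 'Author B', 1962);
    INSERT INTO book VALUES (10, 'Book X');
    INSERT INTO book_author VALUES (10, 1), (10, 2);
    """
)

# The book-to-author path is written once, here, rather than at every
# attribute reference.
BOOK_AUTHORS = """
    SELECT a.name, a.birth_year
    FROM book b
    JOIN book_author ba ON ba.book_id = b.id
    JOIN author a ON a.id = ba.author_id
    WHERE b.title = ?
    ORDER BY a.name
"""

authors_of_x = conn.execute(BOOK_AUTHORS, ("Book X",)).fetchall()
```

If the relationship changes shape later, only the one query text needs to be revisited.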

I am not sure exactly what Meyer is comparing when he says, "heavy and impractical" and "no...problem [for OO's] run-time network of objects". Is he complaining about relational technology from a linguistic/organizational standpoint, or from a performance/speed standpoint? I will address both of these below.

Performance and Speed

Table joining (an SQL term) is nothing more than following a reference (ID or pointer). This is exactly what an OO system must also do. It is true that using the pointer/ID may require traversing an index, but indexes are to make things faster, not slower.

Indexing is a way to find something without having to sequentially traverse a collection one-by-one. I confess that I do not know how indexing works in OODBMS (OO DB's) and OO collections, but if they found a better way than RDBMS, then the RDBMS companies would have already adopted it. (See the review of chapter 8 for a description of RAM-buffering.)

I would note that the run-time structure that Meyer shows on page 227 is very relational in structure. Whether the data and links are stored in RAM or disk or a combo (caching) is mostly an implementation issue, not a paradigm issue.

Nothing says that relational data cannot be processed in RAM, and it usually is, in part, thanks to disk caching technology. My experience with XBase caching blurred the logical distinction I make between disk and RAM. Some commercial relational systems are even purposely designed and optimized to run in RAM. [I will try to insert some links later.]

The biggest distinction between relational and OO is how data is shared among paradigms (see above) and how closely algorithms are allowed to be tied to data. This has almost nothing to do with RAM versus disk. (OO also tends to be more hierarchical than relational, but this is also probably a minor issue with regard to performance.)

Many RDBMS enforce referential integrity rules, and other formal protection mechanisms. It is true that these sometimes consume a fair amount of processing. However, they are there to improve reliability. Perhaps it would be nice if these protection mechanisms could be disabled when raw speed is needed. Note that I have used relational systems that lacked many of these mechanisms. (They were quite fast, but had occasional corruption problems if machines were turned off or malfunctioned in the middle of key processes.)

Another possible source of this false dichotomy is the fact that many OO systems use a simple linked list of pointers (object references) to model one-to-many relationships. This may make traversing the list quicker than doing a B-tree search via typical relational indexes for each entry on the "many" side. However, I see no reason why a relational system could not use a similar approach if B-trees are not satisfactory (for a given relationship). After all, the relationship between items (records) is often explicitly described to an RDBMS. Thus, it should have enough information to use a similar approach if the simple list approach has superior performance.

Relational theory encourages the hiding of the indexing and storage mechanism from the user and application developer. Thus, it can be changed or improved without affecting the application code. Indexes in OOP tend to be "naked" in this regard. OO is supposed to hide implementation, and this includes indexes, no?

Somebody suggested that perhaps relational engines have been late to the RAM database market; thus they are still optimized for disk and caching instead of RAM. However, this is not a paradigm weakness, and may change over time.

One interesting theory put forth by Russell Wallace is that OODBMS emphasize a single aspect or single data access path at the expense of others. For example, if you have a hierarchical arrangement, the OODBMS may be faster. However, if you need other views or other relations in addition to the one hierarchy, then RDBMS may be faster. One can say that OODBMS may be more IS-A optimized while RDBMS are more HAS-A optimized. Perhaps RDBMS can be tuned to give priority to specific indexes and relations. However, this may have limited value in business applications, which tend to be HAS-A in structure.

Linguistic and Organizational

I agree that SQL sometimes is a pain, but fix the bath water, not the baby. There is more to relational than SQL syntax. If SQL stinks, then think about ways to improve it or replace it with a better relational language or protocol. (XBase, for example, did not use SQL as its collection manipulation protocol in its heyday. However, it did have some similar elements. But one of the most significant differences was that collection manipulation was directly integrated into the overall language. This gave it a very different feel than using SQL.)

One can even use RDBMS "views" to create virtual tables in which joined tables appear as one to the table user and/or programmer. The virtual table user may not even know it is based on multiple joined tables. (Meyer even mentioned DB views somewhere in passing, but I cannot seem to re-find the reference.)
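A brief sketch of such a view, again using Python's standard sqlite3 module. The tables, the view name "invoice_report", and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);

    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoice VALUES (100, 1, 250.0), (101, 2, 75.5), (102, 1, 40.0);

    -- The join is defined once; users of the view see a single table.
    CREATE VIEW invoice_report AS
        SELECT i.id AS invoice_id, c.name AS customer, i.total
        FROM invoice i JOIN customer c ON c.id = i.customer_id;
    """
)

# This query reads like a single-table lookup; no join appears in it.
acme_rows = conn.execute(
    "SELECT invoice_id, total FROM invoice_report "
    "WHERE customer = ? ORDER BY invoice_id",
    ("Acme",),
).fetchall()
```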

Ad-hoc Queries

Finally, ad-hoc querying and OO seem to have a tough time getting along. See Byte 10/1997 - Debunking Object-Database Myths. Quote: "Object databases are back. They are still maturing, still misunderstood, and still hard to use." It is a bit dated, but it seemed to try to paint a balanced view, and OO query ease did not rank very high. (Some also suggest that OO query languages resemble the "navigational database" query languages of the 1960's. Relational is generally regarded as an improvement over navigational querying and theory.) I hope Meyer's solution is not "yet more and more and more training." OODBMS had better be 100 percent better, because they seem to require 100 percent more training.

 
Page 1052, Object Identity

Meyer starts out by implying that "explicit" record ID's (object or record identifiers) are a limiting factor of relational systems. However, he then goes on to say that for OODB's to scale and be sharable, logical (abstract) instead of physical "identities" are needed. That is pretty much what relational record ID's are. This appears to be a contradiction. Further, record ID's often end up becoming a very useful external reference number for customers and users. Identifying things by mutable attributes is not precise enough when you have hundreds of millions of records. (Also see Improving SQL Joins.)

An explicit unique identity makes it easier to see what is going on. Take an example from the OODBMS Manifesto site:

  (peter, 40, {kids: (john, 15, {})})
  (susan, 41, {kids: (john, 15, {})})
We don't know whether the two Johns are the same person or just two different kids with the same name and age. If all the persons had a unique identifier, then one could tell by that identifier. One could see it on a printed text report:
  
  Peter, age: 40, ID: 187623
     Kids:
        John, age: 15, ID: 457365
  Susan, age: 41, ID: 201827
     Kids:
        John, age: 15, ID: 603625
Thus, the manager does not need to call an object programmer to see if they are the same person or not by comparing their RAM address or its equivalent. Identity has an external representation in most RDBMS that even managers can use and see. The only benefit of internal identifiers seems to be job-security for OODBMS query experts.
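The same point as a small Python sketch. The Person class and the two helper functions are hypothetical; the ID values are the ones from the report above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    person_id: int  # explicit, externally visible identifier
    name: str
    age: int

# IDs taken from the printed report above.
peters_kid = Person(457365, "John", 15)
susans_kid = Person(603625, "John", 15)

def same_attributes(a, b):
    # Comparing by name and age cannot tell the two Johns apart.
    return (a.name, a.age) == (b.name, b.age)

def same_person(a, b):
    # Comparing by the explicit identifier settles the question.
    return a.person_id == b.person_id

looks_alike = same_attributes(peters_kid, susans_kid)
is_same = same_person(peters_kid, susans_kid)
```

Here the attribute comparison says the two records match while the ID comparison says they are different people, and the ID is something that can be printed and read by anyone.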

Two Copies?

Some say that in RDBMS, if you retrieve the same row twice in an application, then you get two copies, while OODBMS will only give one, and only reference it twice.

How RDBMS and client drivers implement this is very vendor- and implementation-dependent. There is no base law in relational theory that says it has to be one way or the other. For set-based DB's (not the only game in town), the result set could simply be an index collection that references the records, rather than copy the actual content to the result set. An actual reference to a data field then could trigger a lookup of that record. But remember that in relational theory, a result set is simply a "view" based on the given criteria. It is not meant to be the "raw original". One has to think differently when comparing objects to relational result sets.

Alternatively, good accounting could make multiple result sets reference the same record if needed. Whether this is done in practice or not, I don't know. The savings might not be worth the effort for most vendors except for perhaps the larger fields ("memos", "blobs", etc). I know from personal experience that some of the desktop DB's used such an approach for open-ended fields. If it was possible in 1991, then it is certainly possible now. There are many ways to skin a DB.

Another thing to consider is consistency. Relational results tend to be "snapshots in time". If we only use references, some pieces that existed at the time of the query may be unavailable or out of sync when they are used later, creating a host of conundrums. For example, suppose the user enters a bunch of search or filtering criteria for a report (such as a query-by-example screen). Between the time the report is queried and the time it is displayed, some of the data may change if we rely heavily on references. In my experience this results in phone calls from angry or confused users.

They may ask for all products allocated to Texas, for example; but some may be reallocated to Hong Kong by another user during that time gap. Thus, the report user may see Hong Kong items in their report even though they only asked for Texas items. (Some kind of "refresh" button would be helpful if they want to see updated info.) The snapshot approach is thus not a flaw, but a tool used to provide an internally consistent view.
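The Texas example can be sketched in a few lines of Python with the standard sqlite3 module. The "product" table, its rows, and the helper name are invented for illustration; the sketch contrasts a copied result set with a list of references that are looked up later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    INSERT INTO product VALUES
        (1, 'Widget', 'Texas'), (2, 'Gadget', 'Texas'), (3, 'Gizmo', 'Hong Kong');
    """
)

def region_of(product_id):
    # Look up the current region of one product by its ID.
    return conn.execute(
        "SELECT region FROM product WHERE id = ?", (product_id,)
    ).fetchone()[0]

# Snapshot approach: copy the matching rows when the query is run.
snapshot = conn.execute(
    "SELECT id, name, region FROM product WHERE region = 'Texas' ORDER BY id"
).fetchall()

# Reference approach: keep only the IDs and look up the fields later.
references = [row[0] for row in snapshot]

# Another user reallocates a product before the report is displayed.
conn.execute("UPDATE product SET region = 'Hong Kong' WHERE id = 2")

snapshot_regions = [region for _, _, region in snapshot]
live_regions = [region_of(pid) for pid in references]
```

The snapshot still shows only Texas items, while the reference-based report now contains a Hong Kong item that the user never asked for.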

See Also: Relational Versus OOP


 


© Copyright 2000-2005 by Findy Services and B. Jacobs. All rights reserved.