Publications Case Study


A Study of Dynamism with Resource Constraints
DRAFT - 2/25/2000

On page 61 of Bertrand Meyer's Object-Oriented Software Construction, 2nd Edition, Meyer brings up an example involving a library system for books and periodicals.

It just so happens that I worked on a corporate library system, studying the merger of two libraries and a programming language conversion. I also did a proof-of-concept for an intranet version of the query interface.

Publications are actually a rare example of business dichotomies that remain stable enough over time to perhaps take advantage of the subclass-oriented way of thinking. However, they still have a foot in the realm of dynamism. Thus, this example should probably not be viewed as a typical business application.

The data layout for the library items I worked with resembled:


Title - Publication title
Authors - Author or authors of the work
Call_No - Unique identifier (I believe an internal method was used)
Agency_No - Special number given to certain publications from certain government agencies.
Doc_Type - Document type. There were several, including some internal classifications.
Publisher
Pub_Date - Publication Date
Library - Location of item. For a while they were filed separately.
Notes
Keywords - Any keywords which may help searching.
 
(Note that there were other fields for internal tracking and logging purposes, but I will leave them out to keep things simple. Also, note that I did not come up with this layout.)
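
To make the flat layout concrete, here is a minimal sketch of it as a single Python record type. The field names follow the list above; the string types, the empty defaults, the filled_fields helper, and the sample values are my own assumptions for illustration, not details of the original system.

```python
from dataclasses import dataclass, fields


@dataclass
class LibraryItem:
    """One flat record shape shared by every publication type."""
    title: str = ""
    authors: str = ""
    call_no: str = ""    # unique identifier
    agency_no: str = ""  # only certain government publications have one
    doc_type: str = ""
    publisher: str = ""
    pub_date: str = ""   # kept as text here; the real storage format is unknown
    library: str = ""
    notes: str = ""
    keywords: str = ""


def filled_fields(item):
    """Return the names of the fields that were actually keyed in."""
    return [f.name for f in fields(item) if getattr(item, f.name)]


# Hypothetical entry: the librarian fills in only what applies.
report = LibraryItem(title="Annual Water Survey", call_no="A-1024", doc_type="Report")
print(filled_fields(report))  # ['title', 'call_no', 'doc_type']
```

Nothing in the record itself knows which fields belong to which publication type; that judgment stays with the person doing the data entry.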

Now, this system was kept fairly simple. It was up to the librarians to key in the right set of fields for each new entry, and everybody seemed happy to do it this way. For one, it was flexible. The programmer did not have to become a Publications Taxonomy Expert to have it ask for only the applicable fields. Further, if a new type of publication came along, an existing field was often close enough to the new needs. This reduced the need to rent programmers for changes. (The library had a tight budget.)

Footnote: some complain that the unused fields for a particular publication type "waste space". However, the space "wasted" is rarely more than a byte or two per unused field on most modern RDBMS. Thus, "empty" fields are a minor resource concern.

However, what if it were to be a more sophisticated system? What if we wanted the data entry screen to ask for only those fields relevant to a particular publication type?

There seem to be two competing approaches to this. I will call the common OO approach the taxonomic hierarchy, or "tree", approach. (Taxonomy is the science of classifying things, such as animal species.) The second approach I will call the "feature-list" approach. (For a preview of what the feature approach looks like, you can peek at the Bank Example.)

Although Meyer did not show the OO version of the publication representation (that I can find), based on the information on page 61, I will surmise this representation of what Meyer would do for an OO version:

  class Publication {
     var author, title, pubYear
  }
  class Book inherits Publication {
     var publisher
  }
  class Journal inherits Publication {
     var volume, issue
  }
  class Proceedings inherits Publication {
     var editor, place
  }
(See footnote about other possible OO alternatives.)

In this approach, if a publication is designated a "Journal", then the data entry screen will have an input box for author, title, pubYear, volume, and issue. The first 3 come from the root class (Publication), and the rest come from the Journal subclass.
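
As a rough illustration of how a data entry screen could derive its input boxes from the tree, here is a sketch in Python. The class and field names come from the surmised hierarchy above; the FIELDS attribute and the entry_fields method are hypothetical devices of mine, not something Meyer shows.

```python
class Publication:
    FIELDS = ["author", "title", "pubYear"]

    @classmethod
    def entry_fields(cls):
        """Collect FIELDS from the root class down to cls."""
        names = []
        for klass in reversed(cls.__mro__):
            # Read only what each class declares itself, not what it inherits.
            names.extend(klass.__dict__.get("FIELDS", []))
        return names


class Book(Publication):
    FIELDS = ["publisher"]


class Journal(Publication):
    FIELDS = ["volume", "issue"]


class Proceedings(Publication):
    FIELDS = ["editor", "place"]


print(Journal.entry_fields())  # ['author', 'title', 'pubYear', 'volume', 'issue']
```

Note that the field-to-type associations live in the source code, so changing them means editing and redeploying the program.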

Before we pass any judgment, let's look at the "feature-list" approach. For this, I will use a sort of Control Table:

PubType      Author  Title  PubYear  Publisher  Volume  Issue  Editor  Place
Book         X       X      X        X          -       -      -       -
Journal      X       X      X        -          X       X      -       -
Proceedings  X       X      X        -          -       -      X       X
Other        X       X      X        X          X       X      X       X

This control table carries generally the same field association information as the tree (OO) version. However, we added an "Other" category as a catchall for unanticipated categories. (Allowing for such is a good practice.)
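
The control table above can be sketched as plain data plus one lookup routine. This is only an illustration in Python: in practice the table would more likely live in a database where the administrator can edit it. The names CONTROL_TABLE and entry_fields are mine, and falling back to "Other" for an unknown type is my assumption about how the catchall might be used.

```python
# Column order of the control table.
FIELDS = ["Author", "Title", "PubYear", "Publisher", "Volume", "Issue", "Editor", "Place"]

# One row per publication type; each set holds the columns marked "X".
CONTROL_TABLE = {
    "Book":        {"Author", "Title", "PubYear", "Publisher"},
    "Journal":     {"Author", "Title", "PubYear", "Volume", "Issue"},
    "Proceedings": {"Author", "Title", "PubYear", "Editor", "Place"},
    "Other":       set(FIELDS),
}


def entry_fields(pub_type):
    """Fields to show on the data entry screen, in column order."""
    # Assumption: a type with no row of its own is treated as "Other".
    marked = CONTROL_TABLE.get(pub_type, CONTROL_TABLE["Other"])
    return [name for name in FIELDS if name in marked]


print(entry_fields("Journal"))  # ['Author', 'Title', 'PubYear', 'Volume', 'Issue']
```

The lookup routine never mentions a specific publication type, so adding or changing a row does not touch the code.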

In the tree version, would adding the "Other" category result in moving those fields back to the top class? If many fields end up being shared in the future as categories are added and changed, then an OO programmer may end up playing musical chairs with the field variables and their level in the hierarchy. Plus, changing a field's level may require some tricky reformatting or converting of existing data, since the data would be moved to a different class. The feature-list approach requires no data reformatting whatsoever if a field is changed to be shared among subtypes.

Further, the "feature checklist" control table can be managed by a non-programmer. (The assigned person should be skilled and comfortable with computers, but they do not have to know anything about program code nor OO subclassing.)

Dynamic

My business software experience tells me that anything dealing with variations (subtypes) MUST be flexible. In this example, new media (variations) were being added as I watched: they had videos, CDs, and even training software in their library. What new media and publication types will there be in the future? What about hybrids? What about a book that also contains a CD or video?

The mutual exclusivity of traditional OO subtypes is a poor model for such "has X and Y" items.

Suppose videos were added as a publication type. "Length" (duration) is proposed as one of the new fields pertaining to videos. After several months go by, someone points out that Length can pertain to other publication types. For example, almost anything on paper can have a "number of pages". Software can even have its size in "K" or "megs" as its length. (One might also want to think about having a "Unit_of_Measure" field to make it more open-ended. Unit-of-Measure could be pages, minutes, megabytes, etc. The unit-of-measure could be tied to the item type so that it does not have to be retyped for every entry.)

In the OO tree approach, the programmer will first add Length to the Video subtype. Then, when it is later agreed to have Length apply to other types as well, the programmer has to come back in and move the Length field up to the root class (Publication). In the feature-list approach, the "Field Administrator" only has to open up the type control table and make a few simple mouse clicks or keystrokes. It is, in essence, "subtype painting by number." (The "Field Administrator" may or may not be a programmer.)
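
Here is a small Python sketch of what that administrative change amounts to under the feature-list approach. The Video row's other fields, the share_field and length_prompt helpers, and the unit choices are hypothetical; the point is only that sharing Length is an edit to table data, not to the program.

```python
# Control table rows as sets of enabled fields (the Video row is hypothetical).
control_table = {
    "Book":  {"Author", "Title", "PubYear", "Publisher"},
    "Video": {"Title", "PubYear", "Length"},
}

# Unit of measure tied to the item type, so it is not retyped for every entry.
unit_of_measure = {"Book": "pages", "Video": "minutes"}


def share_field(table, field, pub_types):
    """Mark an existing field as applicable to more publication types."""
    for pub_type in pub_types:
        table[pub_type].add(field)


def length_prompt(pub_type):
    """Entry-screen label for Length, or None if the type does not use it."""
    if "Length" not in control_table[pub_type]:
        return None
    return "Length (" + unit_of_measure[pub_type] + ")"


before = length_prompt("Book")                   # None: Length is video-only so far
share_field(control_table, "Length", ["Book"])   # the administrator's "few clicks"
after = length_prompt("Book")                    # 'Length (pages)'
print(before, after)
```

No existing records need converting, since the Length column was already part of the one shared layout.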

It is often not practical to expect a programmer to come tromping in every time a new variation is added or changed. That can be expensive, especially if a new (turnover) programmer has to reverse-engineer the existing code, which they will not be familiar with. The feature-list approach is much more flexible. It is often also safer because one does not have to meddle in the code to add or change variations.

New Fields

New fields (regardless of how shared among subtypes) will probably require programmer intervention in either case. However, there are steps that can be taken to minimize this. (Warning: programmers may put themselves out of a job by reading these tips :-)

First, try to keep fields somewhat generic, rather than specific to a publication type. In the example above we found we could use Length (or perhaps "Size") along with a Unit-of-Measure field. This allows it to apply to a wider variety of future and changed subtypes.

The "Place" field in Meyer's example could also double as publication location (such as Country) in case such info is needed.

Second, I find it convenient to add a custom field or two along with an editable label field. Example fields:

....
Issue
Editor
Custom1_title
Custom1_value
Custom2_title
Custom2_value
The field management table may then resemble:

PubType            Author  Title  PubYear  Publisher  Volume  Issue  Editor  Custom1  Custom2
Book               X       X      X        X          -       -      -       -        -
Journal            X       X      X        -          X       X      -       -        -
Proceedings        X       X      X        -          -       -      X       Place    -
Artifact of Hist.  -       X      X        -          -       -      -       Finder   Est. Age
Other              X       X      X        X          X       X      X       -        -

Here we have added the "Artifact of History" publication subtype. The two "new" fields for it (labeled "Finder" and "Est. Age") are probably too rare to make into general fields available to all. Thus, we used the two custom field sets. Custom fields were "created" without ever altering the data structure.

We also moved "Place" into a custom field. However, now that I think about it, "location" could also be used for the artifact discovery location. Thus, it perhaps deserves to be a generic field of its own.

(Note that all subtypes should probably have a 'Note' field. Therefore, we did not show it. Perhaps the same argument can be made for 'Title', thus simplifying our table.)
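
The custom-field columns of the table above can be sketched as a per-type label lookup. This is only one possible reading: here I assume the labels are kept in the control table per publication type and the values in the record's Custom1_value and Custom2_value slots. The names CUSTOM_LABELS and labeled_custom_fields, and the sample record, are hypothetical.

```python
# Labels that each publication type assigns to the generic custom slots.
CUSTOM_LABELS = {
    "Proceedings":       {"Custom1": "Place"},
    "Artifact of Hist.": {"Custom1": "Finder", "Custom2": "Est. Age"},
}


def labeled_custom_fields(pub_type, record):
    """Pair each custom value in a record with the label its type assigns."""
    labels = CUSTOM_LABELS.get(pub_type, {})
    return {label: record.get(slot + "_value", "")
            for slot, label in labels.items()}


# Hypothetical record; only the custom value slots matter here.
artifact = {
    "Title": "Clay tablet",
    "Custom1_value": "J. Doe",
    "Custom2_value": "3000 years",
}
print(labeled_custom_fields("Artifact of Hist.", artifact))
# {'Finder': 'J. Doe', 'Est. Age': '3000 years'}
```

Adding a rare field for a new subtype then means adding a label to the table, while the record layout stays exactly as it was.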

Summary

Many OO books and examples use inheritance as a solution for these types of problems. However, we have seen that inheritance is not flexible enough to keep up with typical changes.

Perhaps there are other ways to model such problems with OO (see below), but not in such a way that takes advantage of the built-in features of OO. In other words, OO does not provide a superior solution for these types of problems.

It seems that OO training materials often use "bait and switch" techniques. They sell you on the simplicity of inheritance, yet in the real world, something more complex and less able to take advantage of OO's differences is needed instead.

Footnotes

An OO expert may perhaps claim or agree that sub-classing is not appropriate for this application, recommending a different OO approach. For example, a "client relationship" (a Meyer term) between subtypes and fields could be envisioned:
  class Journal inherits Publication {
     authorField author
     titleField  title
     pubYearField pubYear
     volumeField volume
     issueField  issue
  }
This gets one away from direct tree management, but I fail to see how it is superior to the non-OO alternatives. Even the repetition of names suggests poor factoring (the type and the instance repeat much of the name). It also does not allow non-programmers to add new subtypes.

If you have a good OO version that you think is both simple and flexible, then please let me know. (No elaborate, Rube Goldberg-like OO patterns, please. I hate those.)

Data Integrity?

Some complain that letting the librarian add and control fields may result in "muddy data." To some extent this may be true. However, if the client does not want to pay for programming time every time a new publication medium type is encountered, then they should not have to. The analyst should only present the options and the tradeoffs. It is not the analyst's job to force their ideals onto the client, only to present honest tradeoffs and recommendations. (Note that control tables are still useful even if customer access is decided against.)


See Also:
Subtype Proliferation Myth
Shape Example
Bank Example
Control Tables


© Copyright 2000 by Findy Services and B. Jacobs