OOP Challenge 2

OOP Challenge 2 - Generic Printing

9/28/2001

Originator: Chris Twiner, et al.

Although this is a small example, I decided to include it because variations of it keep popping up, and it is a good example of how OO looks good on paper but seems simple-minded or artificial when applied to the real world.

A portion of Chris's original message (lightly edited):

  case statement :
 
  switch(Type) {
    case 'employee' : printEmployee; break;
    case 'manager' :  printManager;  break;
    case 'director' : printDirector; break;
    etc..
  }

 
An OO approach (given person as a base class):
 
  person.print;
 
Difficult wasn't it.  The functions still have to be maintained
somewhere. If you add a new type of person the
previous code will [still run]. Only the object (and whatever
creates it) need be created/altered.

[end of quote]

My Response

First of all, one may not need a separate function for "printEmployee" et al. You might be fine doing the work within the case blocks themselves. I see case blocks as no more evil than method blocks. Blocks is blocks to me. ("break" is a silly C-family artifact. Many other languages, such as Pascal and Visual Basic, don't need "break", BTW.)

Regardless, the equivalent code for both paradigms is going to be roughly the same size. Point 1: Case statements generally do not take more code than OOP polymorphism.

The rest gets into where these blocks are best kept. Enter again the noun-centric proximity versus verb-centric proximity battle. (See the Shapes Example for more. Short version: There are significant tradeoffs to each grouping with no clear objective benefit of one over the other.)

Note that Chris separated "Manager" from "Employee", etc. The subtypes document argues that such subtyping of employees is suspect. However, I realize that his example could apply to things besides employee subclasses.

Back to Reality

In practice, generic printing would only be used for debugging or "quickies". It is not something you would hand to the boss except in an emergency rush. Usually there is more than one report per primary entity, so a single Print method would not suffice. (I suppose you could designate one of them as the default report per entity, but then you have to crystal-ball the intended usage for multi-entity requests.)

And there are often reports that involve multiple entities. Thus, associating a report with only one entity is a little artificial in many cases. See the Aspects Document for more on this. Short version: Encapsulation is not as pure as it is often made out to be, because there are often multiple legitimate association candidates. Does an Employee-By-Department report belong with the Employee class, the Department class, or a class by itself? How much of each entity does a report have to refer to before it gets its own class? There is a lot of potential for continuity problems and rather arbitrary decisions.

One could replace the example with something like:

  sub recordDump(recordHandle)
     for each fld in getFieldNames(recordHandle)  // for each field name
        printLn fld & ": " & recordHandle[fld]
     end for
  end sub
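Below is a minimal Python sketch of the same utility. It assumes the record comes from the standard `sqlite3` module with `sqlite3.Row` as the row factory, so `keys()` plays the role of getFieldNames(). The table and data are made up for illustration. The function returns the lines rather than printing them, which lets the caller decide where they go.

```python
import sqlite3


def record_dump(record: sqlite3.Row) -> list[str]:
    """Return one "name: value" line per field of the record."""
    lines = []
    for fld in record.keys():                  # for each field name
        lines.append(f"{fld}: {record[fld]}")
    return lines


# Hypothetical demo table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE Employee (FirstNameMI TEXT, LastName TEXT, Dept INTEGER)")
conn.execute("INSERT INTO Employee VALUES ('Bob K.', 'Jones', 42)")

for line in record_dump(conn.execute("SELECT * FROM Employee").fetchone()):
    print(line)
```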

I have actually made such utilities before. The output resembled:

     FirstNameMI: Bob K.
        LastName: Jones
            Dept: 42
        PayGrade: 16A
       WorkPhone: 123-456-7890 x31
       HomePhone: 123-373-8383
           Hired: 12/19/2000
             Etc....

(Right-alignment of the field titles makes it much easier to read than left-alignment in my opinion. However, most vendors seem to prefer left alignment because it looks prettier. Form over function? Note that I have not shown the code for performing alignment of any kind.)
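For what it's worth, the right-alignment itself takes only a couple of lines. Here is a Python sketch that assumes the record is an ordinary dict mapping field names to values. `str.rjust` pads each name to the width of the longest one.

```python
def record_dump_aligned(record: dict) -> str:
    """Format fields as "name: value" with the names right-aligned."""
    width = max((len(name) for name in record), default=0)
    return "\n".join(f"{name.rjust(width)}: {value}"
                     for name, value in record.items())


print(record_dump_aligned({"LastName": "Jones", "Dept": 42, "Hired": "12/19/2000"}))
```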

We could also make one that would do something similar with any given SQL statement that returned a result set. That way we can supply it with more complicated lookups (joins) and conditions.

The same function could do both if some sort of switch is passed in as a parameter.
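One way to sketch such a switch in Python uses `sqlite3`. `dump()` takes either a table name or, when the hypothetical `is_sql` flag is set, a complete SELECT statement. Column names come from `cursor.description`, so the function does not need to know anything about the table's type in advance. The sketch assumes the table name is trusted, because it is spliced directly into the SQL.

```python
import sqlite3


def dump(conn: sqlite3.Connection, source: str, is_sql: bool = False) -> str:
    """Dump every row of a table, or of any SELECT if is_sql is True."""
    sql = source if is_sql else f'SELECT * FROM "{source}"'  # trusted names only
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]              # field names
    blocks = ["\n".join(f"{n}: {v}" for n, v in zip(names, row)) for row in cur]
    return "\n\n".join(blocks)


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (LastName TEXT, Dept INTEGER);
    CREATE TABLE Departments (ID INTEGER, Name TEXT);
    INSERT INTO Employee VALUES ('Jones', 42);
    INSERT INTO Departments VALUES (42, 'Widgets');
""")
print(dump(conn, "Employee"))
print(dump(conn, "SELECT e.LastName AS LastName, d.Name AS DeptName "
                 "FROM Employee e JOIN Departments d ON d.ID = e.Dept", is_sql=True))
```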

I suppose an OO fan would prefer polymorphism to select which one, but then you have to create a bunch of cluttery classes and make sure everything is the right type before it can be used by our generic reporting tool. That brings up the question of how something can be generic across applications if it expects a specific type. The chance of diverse applications all sharing the same "SQLstatement" class/type is almost nil. This is a classic fat wire issue, also known as "protocol coupling". Sure you could write adapter classes, but why bother?

I know, Smalltalk can probably do roughly the same thing in some circumstances. But since an RDBMS is already present in most business applications, why not use it rather than hope your OOP language has reinvented a half-baked DB from scratch? Note that our tool works no matter which language wrote to the database. To do such a thing in OOP, you would have to pipe everything through CORBA or the like.

A More Open-Ended Approach

Rather than limit such a device to just one entity/object/class, I often use (and sometimes even make) a tool that can do:

  showQuery("select * from foo, bar where foo.id=bar.id")

  showQuery("select * from Sales where amt > 25000")

  showQuery("select * from Sales where regionID in (14,23,82,99)")

  Etc.

This displays the query result in tabular form. (It usually either creates an HTML table or fills a Grid control on a GUI form.) A basic HTML version takes only about 20 lines of code.
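Here is a rough Python version of the HTML flavor using the standard `sqlite3` and `html` modules. The layout is a guess at a basic version, not a reconstruction of any particular tool: a header row built from `cursor.description`, one `<tr>` per result row, and every value escaped. The Sales table and its data are hypothetical.

```python
import html
import sqlite3


def show_query(conn: sqlite3.Connection, sql: str) -> str:
    """Run any SELECT and return its result set as a basic HTML table."""
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    out = ["<table>",
           "<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>"]
    for row in cur:
        cells = "".join(
            f"<td>{html.escape('' if v is None else str(v))}</td>" for v in row)
        out.append(f"<tr>{cells}</tr>")
    out.append("</table>")
    return "\n".join(out)


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (regionID INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [(14, 30000), (23, 1200)])
print(show_query(conn, "select * from Sales where amt > 25000"))
```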

An OOP class implementing its own Print or ToString method cannot easily do these kinds of things.

Medical Example

Someone in the comp.object newsgroup brought up a medical application example with a series of medical measurement "types". Example output may roughly resemble:

  Data for patient 12345 on 12/22 8:15am (Visit# 146)

  Mulse:     12.33 cc
  Triroid:   14 k (13) 4
  Hamptom:   143cc, 13pp, 312.32rg
  Yardiac:   573 KLM - 5
  Bulse:     428.3, 17, B
  (Hypothetical measurements only)

They bragged that each measurement sub-class "knew how to print itself". One could of course use a case (switch) statement in a procedural version. I see nothing wrong with a case statement so far based on the requirements given. (See Meyer's Single Choice Principle for more on case-statement issues and tradeoffs.)

However, let's explore a Control Table version.

Table: Measurements

  Abbrev  Descript  FmtExpression
  ------  --------  ------------------------------------------------
  MULS    Mulse     rs.p1 + " cc"
  TRIR    Triroid   rs.p1 + " k (" + rs.p2 + ") " + rs.p3
  HAMP    Hamptom   rs.p1 + "cc, " + rs.p2 + "pp, " + rs.p3 + "rg"
  YAC     Yardiac   rs.p1 + " KLM - " + rs.p2
  BULS    Bulse     rs.p1 + ", " + rs.p2 + ", " + rs.p3

Table: PatientData

  VisitRef  AbbrevRef  p1     p2  p3      p4  p5  p6
  --------  ---------  -----  --  ------  --  --  --
  146       MULS       12.33
  146       TRIR       14     13  4
  146       HAMP       143    13  312.32
  146       YAC        573    5
  146       BULS       428.3  17  B

A printing function may then resemble:

  subroutine printMeasurements(visitID)
    sql = "select * from PatientData, Measurements "
    sql += "where abbrev = abbrevRef "
    sql += "and visitRef = " + visitID
    rs = getRecordSet(sql, driver=std)
    while DBgetNext(rs)
       printLine rs.Descript + ": " + evaluate(rs.fmtExpression) 
    end while
    DBclose(rs)
  end subroutine

The "evaluate" function executes a string expression as code. Variations of it are found in many scripting languages.
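Below is a runnable approximation in Python. It uses the standard `sqlite3` module, and Python's built-in `eval()` stands in for evaluate(). The in-memory tables mirror the two tables above, trimmed to p1..p3, and the expressions are stored verbatim. `SimpleNamespace` lets the stored expressions keep their `rs.p1` syntax. Disabling builtins limits what an expression can reach, but it is not a real sandbox. The sketch therefore assumes the Measurements table is trusted configuration, not user input.

```python
import sqlite3
from types import SimpleNamespace

MEASUREMENTS = [
    ("MULS", "Mulse",   'rs.p1 + " cc"'),
    ("TRIR", "Triroid", 'rs.p1 + " k (" + rs.p2 + ") " + rs.p3'),
    ("HAMP", "Hamptom", 'rs.p1 + "cc, " + rs.p2 + "pp, " + rs.p3 + "rg"'),
    ("YAC",  "Yardiac", 'rs.p1 + " KLM - " + rs.p2'),
    ("BULS", "Bulse",   'rs.p1 + ", " + rs.p2 + ", " + rs.p3'),
]
PATIENT_DATA = [
    (146, "MULS", "12.33", None, None),
    (146, "TRIR", "14", "13", "4"),
    (146, "HAMP", "143", "13", "312.32"),
    (146, "YAC", "573", "5", None),
    (146, "BULS", "428.3", "17", "B"),
]


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row          # rows addressable by column name
    conn.execute("CREATE TABLE Measurements (Abbrev TEXT, Descript TEXT, FmtExpression TEXT)")
    conn.execute("CREATE TABLE PatientData (VisitRef INTEGER, AbbrevRef TEXT, "
                 "p1 TEXT, p2 TEXT, p3 TEXT)")
    conn.executemany("INSERT INTO Measurements VALUES (?, ?, ?)", MEASUREMENTS)
    conn.executemany("INSERT INTO PatientData VALUES (?, ?, ?, ?, ?)", PATIENT_DATA)
    return conn


def print_measurements(conn: sqlite3.Connection, visit_id: int) -> list[str]:
    sql = ("SELECT * FROM PatientData, Measurements "
           "WHERE Abbrev = AbbrevRef AND VisitRef = ? "
           "ORDER BY PatientData.rowid")
    lines = []
    for row in conn.execute(sql, (visit_id,)):
        rs = SimpleNamespace(**{name: row[name] for name in row.keys()})
        # eval() plays the role of evaluate(); only `rs` is visible to it.
        text = eval(row["FmtExpression"], {"__builtins__": {}}, {"rs": rs})
        lines.append(f"{row['Descript']}: {text}")
    return lines


for line in print_measurements(build_demo_db(), 146):
    print(line)
```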

There are other related approaches, but this gives an idea of what can be done. Note that it even allows new measurement "types" to be added without changing a single line of code (except for the formatting expression).

An Eval-Free Version

A more formal version that does not need evaluate() could be built with tables similar to these:

    Table: MeasurementParts
    -----------------------
    AbbrevRef  (f.key to Measurements table)
    P  (int)   (1, 2, 3, etc.)
    Prefix
    Suffix     (" cc" for first row of example)


    Table: PatientData
    ------------------
    VisitRef
    AbbrevRef
    P_ref
    TheValue

The key (no pun intended) to this solution is the "Prefix" and "Suffix" fields. They let us build the result by simple string appending instead of evaluating expressions. The code to put them together may look something like:

    rs = getRecordSet(....)
    ....
    while DBgetNext(rs)
       result += rs.Prefix + rs.TheValue + rs.Suffix
    end while
    printLine result

This solution is probably superior from a relational purist viewpoint, but would be harder to set up without a custom user interface.
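Here is a Python sketch of the prefix/suffix assembly. It makes a few assumptions: an in-memory SQLite database, only the Mulse and Triroid parts filled in, and a hypothetical `format_measurement()` that formats one measurement for one visit. The join matches each value to its part through `P = P_ref`. `ORDER BY` puts the parts in sequence before they are concatenated.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory version of MeasurementParts/PatientData (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MeasurementParts (AbbrevRef TEXT, P INTEGER, Prefix TEXT, Suffix TEXT);
        CREATE TABLE PatientData (VisitRef INTEGER, AbbrevRef TEXT, P_ref INTEGER, TheValue TEXT);
    """)
    conn.executemany("INSERT INTO MeasurementParts VALUES (?, ?, ?, ?)", [
        ("MULS", 1, "", " cc"),
        ("TRIR", 1, "", " k"),
        ("TRIR", 2, " (", ")"),
        ("TRIR", 3, " ", ""),
    ])
    conn.executemany("INSERT INTO PatientData VALUES (?, ?, ?, ?)", [
        (146, "MULS", 1, "12.33"),
        (146, "TRIR", 1, "14"),
        (146, "TRIR", 2, "13"),
        (146, "TRIR", 3, "4"),
    ])
    return conn


def format_measurement(conn: sqlite3.Connection, visit_id: int, abbrev: str) -> str:
    """Concatenate Prefix + TheValue + Suffix for each part, in part order."""
    sql = ("SELECT mp.Prefix, pd.TheValue, mp.Suffix "
           "FROM PatientData pd JOIN MeasurementParts mp "
           "  ON mp.AbbrevRef = pd.AbbrevRef AND mp.P = pd.P_ref "
           "WHERE pd.VisitRef = ? AND pd.AbbrevRef = ? "
           "ORDER BY mp.P")
    result = ""
    for prefix, value, suffix in conn.execute(sql, (visit_id, abbrev)):
        result += prefix + value + suffix
    return result


conn = build_demo_db()
print(format_measurement(conn, 146, "TRIR"))   # 14 k (13) 4
print(format_measurement(conn, 146, "MULS"))   # 12.33 cc
```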

Somebody complained that this would not allow much custom formatting, such as controlling the number of decimal places. I assumed that the value was formatted before being saved (the field is a string). However, if really needed (not likely IMO), then we could still have a "fmtFunc" field in the MeasurementParts table:

        ....
        while DBgetNext(rs)
            temp = rs.TheValue 
            if not blank(rs.fmtFunc)
               temp = eval(rs.fmtFunc & "(" & temp & ")")
            end if
            result &= rs.Prefix & temp & rs.Suffix
        end while
        ....

Of course, if this were needed, it would bring us back to using Eval(). However, it would mostly be used for rare exceptions. If something grows common, then it should perhaps be turned into a table flag of some sort. (I used "&" for concatenation instead of plus here to avoid confusion with math operations.)
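Here is a Python sketch of the fmtFunc idea, with the rows stubbed out as `SimpleNamespace` objects instead of coming from a database. `two_places` and `upper` are hypothetical formatting functions, and only the names in `FORMATTERS` are visible to `eval()`. One deliberate difference from the pseudocode: the value is inserted with `repr()`. That way a text value such as "B" is passed as a quoted string instead of being read as an identifier.

```python
from types import SimpleNamespace


# Hypothetical formatting functions that a fmtFunc cell may name.
def two_places(value: str) -> str:
    return f"{float(value):.2f}"


def upper(value: str) -> str:
    return value.upper()


FORMATTERS = {"two_places": two_places, "upper": upper}


def assemble(rows) -> str:
    result = ""
    for rs in rows:
        temp = rs.TheValue
        if rs.fmtFunc:  # blank means "no custom formatting"
            # Builtins are disabled; only the FORMATTERS names resolve.
            temp = eval(f"{rs.fmtFunc}({temp!r})", {"__builtins__": {}}, FORMATTERS)
        result += rs.Prefix + temp + rs.Suffix
    return result


rows = [
    SimpleNamespace(Prefix="", TheValue="428.333", Suffix=", ", fmtFunc="two_places"),
    SimpleNamespace(Prefix="", TheValue="17", Suffix=", ", fmtFunc=""),
    SimpleNamespace(Prefix="", TheValue="b", Suffix="", fmtFunc="upper"),
]
print(assemble(rows))  # 428.33, 17, B
```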

If you want something similar without using Eval(), you could do something like this:

        ....
        while DBgetNext(rs)
           result &= rs.Prefix & CustomFmt(rs) & rs.Suffix
        end while
        ....
        function CustomFmt(rs)
           result = rs.TheValue     // default
           select on rs.abbrevRef & "." & rs.p
           case "GLRG.2"
              result = zork(result)
           case "FLOG.1", "SCCR.3"
              result = dork(result)
           end select
           return(result)
        end function

The nice thing about this approach is that all the exceptions (oddities) are in one spot. If we did OOP divisions by "subtype", then such oddities would be scattered among the "normal stuff". Grouping by oddities allows one to better see patterns to factor into the mainstream if certain approaches grow more common. We can also see that FLOG and SCCR share a common implementation. Spotting the similarities and moving them together would be tougher in OOP subtype-based grouping.

Notes and Enhancements to Medical Example

The field references, such as rs.Descript, can be assumed to be dictionary (associative array) lookups in this example. In some languages, they might be written as rs["Descript"] instead.
We could make the formatting strings simpler by copying the fields into plain variables:
```
      p1 = rs.p1
      p2 = rs.p2
      p3 = rs.p3
      etc....
      
```
The formatting string for the last example could then be written:
```
      p1 + ", " + p2 + ", " + p3  
```
However, we risk forgetting to add a new variable if we add a new "p" column.
A Unix-shell-like substitution syntax could perhaps simplify the expressions further. The "Triroid" example could then resemble:
```
    "$rs.p1 k ($rs.p2) $rs.p3"

      Or

    "$p1 k ($p2) $p3"     // if above simplification applied
```
The dollar-sign implies substitution of a variable or expression.
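Python's standard `string.Template` implements this kind of `$` substitution. By default its identifiers cannot contain dots, so the simplified `$p1` form fits better than `$rs.p1`. When a name runs straight into letters, as in the Hamptom format, braces (`${p1}cc`) mark where the name ends. The helper `format_measurement` and the row dicts are illustrative.

```python
from string import Template


def format_measurement(fmt: str, row: dict) -> str:
    """Substitute $name / ${name} placeholders with the row's values."""
    return Template(fmt).substitute(row)


# Hypothetical rows, keyed by the simplified p1..p3 names.
triroid = {"p1": "14", "p2": "13", "p3": "4"}
hamptom = {"p1": "143", "p2": "13", "p3": "312.32"}

print(format_measurement("$p1 k ($p2) $p3", triroid))            # 14 k (13) 4
print(format_measurement("${p1}cc, ${p2}pp, ${p3}rg", hamptom))  # 143cc, 13pp, 312.32rg
```

`substitute()` raises `KeyError` if a placeholder has no value. `safe_substitute()` leaves such placeholders in place instead, which may be kinder for ad-hoc reports.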
Some might complain that this approach limits the number of measurement parameters. In this case, I estimate that if one takes the largest number of parameters any existing measurement uses and adds 2 or 3 spare columns, the chance of ever exceeding that limit is small. If there are measurements that are arbitrarily long, such as a time-series, then they should probably have their own table. The above expression system could then call a time-series function that returns a formatted string.
```
 
  A record in "Measurements" table:

  Abbrev:    "TIMS"
  Descript:  "Time-Series (sec/cc)"
  FmtExpression:  "myTimeSeries(visitID)"

  EXAMPLE OUTPUT
  ....
  Yardiac:   573 KLM - 5
  Bulse:     428.3, 17, B
  Time-Series (sec/cc):  0(2.1), 0.5(2.4), 1.0(2.8), 1.5(2.9) ....
```
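Here is a sketch of what such a time-series function might look like in Python with `sqlite3`. The `TimeSeries` table, its `Sec` and `Reading` columns, and the snake_case name `my_time_series` are all hypothetical. The `:g` format drops trailing zeros, so this prints `1` where the sample output above shows `1.0`.

```python
import sqlite3


def my_time_series(conn: sqlite3.Connection, visit_id: int) -> str:
    """Format every (seconds, reading) pair for a visit as "sec(reading)"."""
    rows = conn.execute(
        "SELECT Sec, Reading FROM TimeSeries WHERE VisitRef = ? ORDER BY Sec",
        (visit_id,))
    return ", ".join(f"{sec:g}({reading:g})" for sec, reading in rows)


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TimeSeries (VisitRef INTEGER, Sec REAL, Reading REAL)")
conn.executemany("INSERT INTO TimeSeries VALUES (146, ?, ?)",
                 [(0, 2.1), (0.5, 2.4), (1.0, 2.8), (1.5, 2.9)])
print("Time-Series (sec/cc): " + my_time_series(conn, 146))
```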
If there are a lot of measurement "types", then using a measurement ID instead of an abbreviation as a key may be more appropriate. Note that the users (doctors) will not necessarily see the abbreviations we choose. But for quick ad-hoc reports, abbreviations may simplify things.
Another field in the Measurements table could be Sequence, a floating-point number that sets the order in which items are displayed. (A floating-point value lets a new item be slotted between two existing ones without renumbering.) We might also have other classification or grouping codes, such as department interest codes. We can then apply relational and Boolean expressions to get only what we need.
To support this, the printMeasurements routine could be made a bit more generic by passing in part of the Where clause.
```
  subroutine printMeasurements(whereClause)
    sql = "select * from PatientData, Measurements "
    sql += "where abbrev = abbrevRef "
    sql += "and (" + whereClause + ") "
    ....
```
We could then get the same results as above by calling:
```
    printMeasurements("visitRef = 146")
    //  Or
    printMeasurements("visitRef = " + visitID)

    // To show just one measurement:
    printMeasurements("visitRef = " + visitID +
      " and Abbrev='YAC' " )
```
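Below is a Python sketch of the where-clause variant using `sqlite3`. It is trimmed to return raw rows, which keeps the focus on filtering and on the Sequence column suggested above. One assumed change from the pseudocode: values are bound through `?` placeholders in a separate `params` argument rather than concatenated into the SQL, which avoids quoting mistakes. The clause text itself is still spliced in, so it must come from trusted code.

```python
import sqlite3


def select_measurements(conn: sqlite3.Connection, where_clause: str,
                        params: tuple = ()) -> list:
    """Return (Descript, p1, p2) rows matching where_clause, in Sequence order."""
    sql = ("SELECT Descript, p1, p2 FROM PatientData, Measurements "
           "WHERE Abbrev = AbbrevRef "
           f"AND ({where_clause}) "      # trusted SQL text only
           "ORDER BY Sequence")
    return conn.execute(sql, params).fetchall()


# Hypothetical demo data; Sequence puts Yardiac before Mulse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Measurements (Abbrev TEXT, Descript TEXT, Sequence REAL);
    CREATE TABLE PatientData (VisitRef INTEGER, AbbrevRef TEXT, p1 TEXT, p2 TEXT);
    INSERT INTO Measurements VALUES ('MULS', 'Mulse', 2.0), ('YAC', 'Yardiac', 1.0);
    INSERT INTO PatientData VALUES (146, 'MULS', '12.33', NULL),
                                   (146, 'YAC', '573', '5'),
                                   (147, 'MULS', '11.90', NULL);
""")
visit_id = 146
print(select_measurements(conn, "VisitRef = ?", (visit_id,)))
print(select_measurements(conn, "VisitRef = ? AND Abbrev = ?", (visit_id, "YAC")))
```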
