Challenge #5 - Mellor's Problem (Power Utility)

Originator: Object Mentor site (Robert C. Martin)

Updated: 2/11/2002

The Object Mentor website has an example application that has been used to compare various OO design approaches. The application is based on a public power utility scenario and its various rate calculation strategies for different "types" or ranges of customers and the influence of season and location.

The original paper (PDF format) can be found here: http://www.objectmentor.com/publications/articlesByDate.html

Search for "Mellor" if you do not immediately see it listed.

Analysis

The crux of the OO paper's criticism of the procedural approach is:

"By putting the entire rate policy in the same module, as part of the same algorithm, we risk interdependencies between them. A single change to one part of the policy has the potential to affect them all."

I am not sure if the author means "same module" or "same routine". I will address both; routines first.

There is no law that says that all procedural code has to be in one subroutine. Thus, I split it into multiple routines (see code sample below). It is odd that the OO author did not consider this approach. (Note that I included routines not shown in the C version for completeness.)

Compiled Divisions

Some have suggested that the author is talking about granularity of the compiled units. For example, if the language can only compile at the module level, then changing one routine would require the entire module to be re-compiled. However, this is very language dependent. I have used languages that allowed re-compiling at the routine level (although it still needed a global bind). Interpreted languages offer even more options. dBASE, for example, allowed one-routine-per-file and the p-code was recompiled at that level (routine).

I won't claim that all procedural languages allow easy routine-level recompilation/replacement/splitting, but there is no by-birth limit on the paradigm itself. It seems many readers of the Object Mentor site use C as their guide to what procedural languages are or are not capable of.

Further, I probably could have put each routine in my example into its own module.

Safety Versus Ease

The problem statement seems to indicate that "safety in change" is more important than ease of change or understanding.

I am not sure if this is common in many business applications. Most businesses keep their IT department to a bare minimum, meaning that things have to be easy to get to, understand, change, and get out.

Excess layers may hamper this. I think safety and convenience are sometimes mutually exclusive. In addition, programmers may be encouraged to take "hacky" shortcuts if there is a lot of red tape in their way of getting things done. Thus, extra "protection layers" may have the opposite effect of their original goal.

Fat Joins

Someone suggested that such an application may not scale because the SQL "joins" (relations) may grow too large and complicated as the application grows.

It is hard to say without actually seeing the pattern of growth. Will the needed details be in a few tables or lots of different tables? See Join Issues in the Business Modeling article for more on this issue.

Changing Existing Code

Somebody has pointed out that using a case statement may require modification at two different places if a new region (territory) is added, for example. The case list would have to be changed, plus the new routine/module would have to be added. An OO approach can simply have a new region sub-class added without changing any existing code (although this may be language-dependent).

If the code is kept in the case statement instead of calling a separate routine, then only one routine need be changed. However, some claim that this risks altering existing code (nearby case blocks). I would point out that sometimes one has to risk "bumping" nearby methods if only one method needs changing in a class.

They also claim that keeping the code in a case statement makes the subroutine "too long". I view each case block as a separate unit the way an OO fan may view methods as separate units. Blocks is blocks. Case blocks are simply small blocks nested in larger blocks the same way that methods are usually nested in a class. It is a typical aspect tradeoff as far as what we allow to be "exposed" by having them together. The granularity of what can be compiled or changed as an independent unit is often a language-dependent and/or IDE issue.

If such was truly a problem, then a control table could perhaps be used, where fields in the Region table either directly contain or reference code (via a function name). Adding a new region would require no changes to existing code.
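A minimal sketch of the control-table idea, written in Python rather than VBA for brevity. The Region rows, the `calc_routine` column, and the two rate routines are hypothetical stand-ins; the rates simply echo those in the code sample at the end of this page.

```python
# Hypothetical rate routines. Each takes usage in KWH and a winter flag.
def seasonal_rate(kwh, is_winter):
    return kwh * (0.07 if is_winter else 0.06)


def flat_rate(kwh, is_winter):
    return kwh * 0.065


# Stand-in for a Region table. The calc_routine field holds the NAME of
# the routine to run, so the dispatch information lives in the data.
REGION_TABLE = [
    {"territ_id": 1, "calc_routine": "seasonal_rate"},
    {"territ_id": 2, "calc_routine": "seasonal_rate"},
    {"territ_id": 3, "calc_routine": "flat_rate"},
]


def territory_calc(territ_id, kwh, is_winter=False):
    """Find the region row, then call the routine it names."""
    for row in REGION_TABLE:
        if row["territ_id"] == territ_id:
            routine = globals()[row["calc_routine"]]  # look up by name
            return routine(kwh, is_winter)
    raise LookupError(f"unknown region {territ_id}")
```

Under these assumptions, adding a region that reuses an existing routine is only a new row; a region with new rules needs a new row plus a new routine, and `territory_calc` itself stays untouched.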

In some ways this is better than the OO approach if there is already a Region table being used. This is because a subclass would have to "register" itself somewhere in order to be known to the existing application. The "registry" is probably some kind of table or list. IOW, such a design has a bit of Table Oriented Programming design to it.

But, if there is already a Region table, then the registry is duplicating the region list. In other words, a violation of the Single Choice principle. One probably cannot add other region-related attributes to this registry. It makes more sense to take advantage of a list (table) already in existence, or at least one that has the ability to accept new attributes.

Although I don't have a dedicated Region table in my design, if region becomes a central concept, then the formation of one is likely IMO.

Someone pointed out that the "registry" may not have to be something that the programmer manages, and thus should not "count" as duplication. I will leave it to the reader to decide on this. It may depend on the programming environment and tool implementation.

See the countries tax example for similar issues, especially with regard to tablizing the divisions instead of using case statements. Also see the "change pattern" notes below.

Further, polymorphism can often degenerate into something else. Case statements are often more flexible under many of these degeneration scenarios because code does not have to be moved. Only the case or IF criteria need be changed in most cases. (Or, case blocks changed to IF blocks.)

Change Pattern Unknown

We don't have enough information about the change patterns of this application to know whether region-based dispatching will expand, or change into something else. My approach is equally ready for both, while the OO design is too highly coupled to region divisions in my opinion. OO solutions often assume that the change pattern is "yet more of the same" with regard to adding more variations/subtypes/divisions.

I can imagine rules like, "if in region X and an approved member of the Conservation-B program, then ....". An OO version may further split the Region-X subclass into a conservation-B and non-conservation-B sub-sub-class (under region-X). This risks degenerating into the Mega-Name Pattern.

General Government Monetary Rule Patterns

I suspect that the pattern of most government monetary billing, taxing, and fee calculations would resemble those found on typical personal income tax forms (middle-level, not the "simplified" forms). My experience with government contractor billing requirements found generally similar patterns. If one graphed such patterns, they tend to resemble a directed graph.

The directed graphs don't have many loops (they don't "back up" for the most part, unless perhaps if a mistake is detected), and the requirements tend to use "schedules" (range look-up tables) instead of formulas. Schedules require less math exposure to understand (including for the politicians who make such rules), and better avoid rounding discrepancies and confusion.

I suspect that in practice, the Mellor "sliding scale" formula would be defined under (translated to) a schedule. I suspect the author described it as a formula to save space or to avoid producing a table, which takes more effort and is sometimes harder to distribute electronically than a formula.
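To make the "schedule" idea concrete, here is a small Python sketch of a range look-up table. The brackets and per-bracket rates are invented for illustration (the original problem gives only the formula); only the 0.09 and 0.05 end points come from the code sample at the end of this page.

```python
from bisect import bisect_left

# Hypothetical schedule: upper KWH bound of each bracket (inclusive)...
SCHEDULE_UPPER_KWH = [250, 500, 750, 999]
# ...and the rate for each bracket, plus a final rate for everything above.
SCHEDULE_RATE = [0.09, 0.08, 0.07, 0.06, 0.05]


def schedule_rate(kwh):
    """Return the rate of the first bracket whose upper bound is >= kwh."""
    return SCHEDULE_RATE[bisect_left(SCHEDULE_UPPER_KWH, kwh)]


def schedule_amount(kwh):
    """Bill the whole usage at the bracket rate, as the formula version does."""
    return kwh * schedule_rate(kwh)
```

A stepped schedule like this only approximates a continuous sliding scale, but changing a bracket or a rate becomes a data edit rather than a change to arithmetic.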

Although some of this kind of pattern could perhaps be represented by splitting customers into "types", or even calculation-specific (localized) types for the "sub-branches", I see no real advantage of such. The range, scope, and granularity of such types would be subject to unpredictable change. Perhaps one could say, "I think better when I turn things into subtypes". However, they should be careful about extrapolating such a mental preference into other individuals without some external metric for reducing code size, reducing change impact effects, etc.

So why doesn't the stated Mellor Problem follow this pattern? It could just be an exception. However, I suspect that many OOP authors and fans subconsciously exaggerate the presence and occurrence of sub-types. OOP textbooks and training materials point out the pattern so often that people's minds start to see the pattern where it does not really exist, or remember its presence when it does occur, but not its absence. Reading the newspaper may give one the impression that plane crashes are common when in fact planes are safer per mile traveled than cars. Mr. Martin may be more likely to collect examples that fit patterns OOP is optimized for, or to present only a view of the problem that allegedly fits OOP's strengths. (I have not independently verified the accuracy of the Mellor Problem, so I am only presenting speculation at this point.)

Customer "Plans"

There are many other potential change-patterns that we must deal with. In business applications, usually there is an upper limit to the number of "levels" or mutually-exclusive divisions before things "degenerate" into other patterns. From the customer's perspective, the most flexible solutions are often independent features.

For example, some computer manufacturers offer "levels" of PCs. They may have an entry-level PC, a mid-level, and high-end. Sometimes this is divided up into home and business users. A typical taxonomy might resemble:

  PC CUSTOMER "LEVEL" TAXONOMY

  Home User PC
     Low End
     Medium End
     High End (serious game player)
  Business User PC
     Low End
     Medium End
     High End

However, this approach is often not satisfactory for the customer. They might want only some of the features of the high-end machines, but don't want to pay for the rest of those features. I have seen many ads by computer companies with quotation forms that treat each feature independently.

  QUOTE/ORDER FORM FOR CUSTOM PC

  Graphics card: [ ] Low    [ ] Medium     [ ] High
  RAM Amount: ______
  Case:  [ ] Flat  [ ] Mini-tower  [ ] Full-tower
  Disk Drive: [ ] 20g  [ ] 50g  [ ] 100g  Other: _____
  OS:  [ ] Windows   [ ] Red Hat   [ ] OS/2   Other: _____
  CPU: [ ] AMD-Y50   [ ] AMD-Y70   [ ] Intel ......
  Etc......

(The form is simplified for illustration purposes.)

The first approach bundles or couples features such as graphics card and case type. Although it may simplify manufacturing, it is often not the best way to satisfy customers.

I once wanted to purchase a vehicle with leather seats because leather is easier on family allergies than cloth seats. However, the dealer said we had to purchase the "high-end" model in order to get leather seats. Instead, we found a different dealer who would install leather seats in the mid-level model for a reasonable amount. Besides, there were styling features on the high-end model that we really did not want.

The given Mellor customer breakdown in many ways fits the "level" model rather than the feature-independent model. For example, the hours of notice could be made continuous. It might also grow independent of other power-related features. For example, there may be one plan for large service businesses, and another for manufacturers. Yet, both of these may still have hours-of-notice choices.

Thus, I find it not very change-friendly to couple features in software design unless there is some compelling force in the problem space that "glues" such features together. Perhaps State governments don't care about customer convenience and are stuck in their ways (arbitrary, inflexible categories). However, laws can still change in the blink of an eye, and we should prepare for them.

In extreme cases, independent feature implementations tend to resemble the hyper-grid pattern. A study of hyper-grid issues may trigger some ideas about managing changes in a Mellor-like problem.

Design Decisions

There are a couple of important design decisions I made. First off, the distinction between "business" and "industrial" seems like a weak taxonomy division both in the description and reasoning about what a clear distinction would be. Therefore, I rolled "business" into the "InterruptStrategy" field strategies. (See table illustration.) Businesses that don't have interruption discounts would get a strategy of "none". This decision may be the source of heavy debate, but adding a separate flag would not change my design much anyhow.

I could have even rolled up the consumer strategy flags into that one strategy field by having a "reg_consumer" and "lifeline" strategy name. However, I felt that it may be best to keep these separate without knowing more about typical change patterns for that organization. Also, the two consumer flags could possibly be rolled into a single "ConsumerStrategy" field. This would allow more strategies to be added without changing the schema. In fact, they could all be rolled up into a single strategy, with possible dispatching codes such as reg_consumer, lifeline, reg_biz, indust, onehour, no_notice. These could even be used to dispatch routines without a case statement. Just something to ponder.
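As a rough illustration of dispatching on a single strategy code without a case statement, here is a Python sketch. The routine names reuse the strategy codes listed above; taking a pre-computed base amount as the argument is an assumption made to keep the example short, and the multipliers are the ones in the code sample at the end of this page.

```python
# One small routine per strategy code. "base" is the amount before any
# interruption discount (an assumption for this sketch).
def reg_biz(base):
    return base


def indust(base):
    return base * 0.95


def onehour(base):
    return base * 0.9


def no_notice(base):
    return base * 0.8


# Look-up from strategy code to routine, keyed by the routine's own name.
STRATEGIES = {f.__name__: f for f in (reg_biz, indust, onehour, no_notice)}


def apply_strategy(code, base):
    """Run the routine named by the strategy field; no case statement."""
    routine = STRATEGIES.get(code.lower())
    if routine is None:
        raise ValueError(f"missing or unknown strategy: {code!r}")
    return routine(base)
```

The consumer codes (reg_consumer, lifeline) could be added to the same look-up in the same way; whether that is wise depends on the change patterns discussed above.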

Also, I use the Sites (location) table as the record (instance) source instead of the Billing table. In other words, it is by site instead of by customer, unlike the original. This has two advantages. First, it avoids having to implement a loop/iterator under the business/industrial section; and second, it allows one to break the bill out by site, which real customers would most likely be happy to see (if not demand it). Note that I have it save the calculated site value.

This approach may change how bulk discounts (multi-site) are applied. A volume discount would probably be calculated after all the site amounts are calculated.
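A sketch of that ordering in Python: per-site amounts are assumed to be already calculated and saved, and the bill total is then built from them. The three-site threshold and the 2% discount are purely hypothetical numbers; the original problem does not specify a volume discount.

```python
from collections import defaultdict

MULTI_SITE_MIN = 3        # hypothetical: sites needed to qualify
MULTI_SITE_FACTOR = 0.98  # hypothetical: 2% off the combined amount


def bill_totals(sites):
    """Sum saved site amounts per bill, then apply any volume discount."""
    amounts_by_bill = defaultdict(list)
    for site in sites:
        amounts_by_bill[site["billing_id"]].append(site["amt"])

    totals = {}
    for billing_id, amounts in amounts_by_bill.items():
        total = sum(amounts)
        if len(amounts) >= MULTI_SITE_MIN:
            total *= MULTI_SITE_FACTOR
        totals[billing_id] = round(total, 2)
    return totals
```

Because the site amounts are kept, the bill can still show the per-site breakdown alongside the discounted total.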

I used a "RegionMap" table to map zip-codes to territories. This avoids the problem of assigning territory codes to every site when the area ranges change. If by chance zip-code is not fine-grained enough, or there are exceptions, then the "TerritoryCalc" routine can be altered to handle them. (The original did not directly show how the territory calculation/lookup was done.)
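The look-up plus exception handling described above might be sketched like this in Python. The zip codes, the site ID, and the `SITE_EXCEPTIONS` table are all made up for illustration; in the real design the mapping lives in the RegionMap table and is reached through a join.

```python
# Hypothetical RegionMap contents: zip code -> territory ID.
REGION_MAP = {"90001": 1, "90002": 1, "90210": 2, "91301": 3}

# Hypothetical per-site overrides for cases where zip code is not
# fine-grained enough: site ID -> territory ID.
SITE_EXCEPTIONS = {1042: 3}


def territory_for(site_id, zip_code):
    """Return the territory for a site, or None if the zip is unmapped."""
    if site_id in SITE_EXCEPTIONS:
        return SITE_EXCEPTIONS[site_id]
    return REGION_MAP.get(zip_code)
```

If the territory boundaries are redrawn, only the map rows change (for example `REGION_MAP["90210"] = 3`); no site record needs to be touched.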

Table Illustration

Region Map

The Billing table, not shown, has fields "BillingID", "Name", and several typical address fields. See above analysis footnote about alternative schema design strategies.

Summary

Overall, the OO approach given seems much more complex than this version. At least I see no way that it is clearly superior. If this extra complexity is to provide "change-safety" then it makes some poor assumptions about human nature and/or ignores a wide range of potential change patterns in my opinion. If it is done for compiler division reasons, then it makes some wrong assumptions about the potential of procedural/relational languages.

I do agree with many of the points presented in the original. However, the reasoning seems to end when it comes to procedural/relational comparisons.

It would also be nice if a full code example was included with the original document, instead of just UML-like diagrams.

Notes

It would possibly make sense to use stored parameters for the actual rate values and ranges instead of hard-wiring them into the code. (I believe the original mentions this also.) However, if the algorithm changes as often as the rate amounts, then perhaps tablizing them is not worth it.

Still, there are places like the "SlidingScale" routine that could probably use some local factoring of rates and ranges so that one variable has to be changed instead of multiple copies of the number. Or, even make it more generic (not dependent on "rs" for example) if sliding scales are used for other parts of the system as well.

I could not get the "direct" method to work, so I had to use SQL to update the amount. Normally it would look something like this:
```
     rs!amt = calculatedAmt
     rs.update
```
Microsoft DAO often needs an ".edit" command as well; ADO does not. ADO was not available in this version of the interpreter. It might otherwise have solved the direct access problem. The direct access syntax is especially convenient when updating many fields.

In case you have the urge to claim that the "dot" syntax in the DB API's makes this an OOP program, the API's for record traversal have been very similar for decades, before OOP API's were popular. Back then the record-set reference was called a "handle".

It might make sense to actually store any error messages in the Sites records rather than just report them to the screen. This would allow for batch processing to also obtain error messages.

The summer/winter calculation in the original seems a little odd. The written description said that summer rates were higher due to A/C demand, but the rest of the specification seems to have it reversed.

There seems to be some disagreement among readers about how the sliding scale is calculated. However, I don't consider this issue pivotal to larger design issues.

The SQL engine used for this example is not very sophisticated. For example, it does not seem to like aliases in some Join clauses. VBA also is not very well designed for string concatenation and variable insertion.
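The note above about factoring the "SlidingScale" rates and ranges could look roughly like the following, sketched in Python rather than VBA. The parameter names are invented; the default values are the ones hard-wired into the code sample below, and the function takes a plain number instead of a record set so it could be reused elsewhere.

```python
def sliding_scale_rate(kwh, top_kwh=1000, high_rate=0.09, low_rate=0.05):
    """Rate slides from high_rate at 1 KWH down to low_rate at top_kwh."""
    if kwh >= top_kwh:
        return low_rate
    slide = (kwh - 1) / (top_kwh - 1)  # 0 at 1 KWH, just under 1 at top_kwh
    return high_rate - (high_rate - low_rate) * slide


def sliding_scale(kwh, **params):
    """Amount owed: the whole usage is billed at the slid rate."""
    return kwh * sliding_scale_rate(kwh, **params)
```

Each number now appears once, so a rate change is a one-place edit, and another part of the system could call `sliding_scale(kwh, top_kwh=..., high_rate=..., low_rate=...)` with its own values.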

Code Sample

This code example is written in Microsoft Visual Basic for Applications (VBA) using MS-Access 97. I don't claim VBA to be the best language. I used it simply because I have it and it has a decent table IDE. See the notes above about various language issues and suggestions for improvements. A few of the longer lines have been artificially wrapped for display purposes.

```
Option Compare Database
Option Explicit
' Mellor's Problem - version 1.0b

Dim stdDB As Database     ' DB connection

'--------------------------
Sub main()
   Set stdDB = CurrentDb
   calcMany "1=1"            ' true = all
   MsgBox "Done!"
End Sub
'--------------------------
Sub calcMany(criteria)
' calculate multiple sites with a criteria expression
   Dim sql, rs
   sql = "SELECT * FROM Sites LEFT JOIN RegionMap "
   sql = sql & " ON Sites.zipCode = RegionMap.zipCode "
   sql = sql & " WHERE " & criteria
   ' Open the recordset and calc each record
   Set rs = stdDB.OpenRecordset(sql, dbOpenDynaset)
   Do While Not rs.EOF
      CalcAmt rs             ' no parentheses: pass the Recordset object itself
      rs.MoveNext
   Loop
   rs.Close
End Sub
'--------------------------
Sub CalcAmt(rs)   ' Calculate amount for a given record
   Dim amt
   amt = 0

   If rs!isConsumer Then
      amt = CalcConsumer(rs)
   Else
      amt = slidingScale(rs)
      amt = amt * IndustrialDiscount(rs)
   End If
   '---- Now save it
   Dim sql
   sql = "UPDATE sites SET amt = " & amt & " WHERE siteID = " & rs!siteid
   stdDB.Execute sql      ' Edit command would not work, used SQL instead
End Sub
'-------------------------
Function CalcConsumer(rs)
   Dim result, KWH
   KWH = rs!KWH
   If rs!isLifeline Then     ' low-income discount
      If KWH <= 100 Then
         result = KWH * 0.03
      ElseIf KWH <= 200 Then
         result = 3 + (KWH - 100) * 0.05
      Else
         result = territoryCalc(rs)
      End If
   Else
      result = territoryCalc(rs)
   End If
   CalcConsumer = result         ' return result
End Function
'-------------------------
Function IndustrialDiscount(rs)
   ' Returns a multiplier (1 = no discount)
   Dim result
   Select Case LCase(rs!interruptStrategy)
   Case "none"
      result = 1
   Case "indust"
      result = 0.95
   Case "onehour"
      result = 0.9
   Case "no_notice"
      result = 0.8
   Case Else
      MsgBox "Error: missing Interrupt Strategy for site " & rs!siteid
      result = 1
   End Select
   IndustrialDiscount = result   ' return value
End Function
'---------------------
Function territoryCalc(rs)
   Dim rate
   Select Case rs!territID
   Case 1, 2
      rate = IIf(isWinter(), 0.07, 0.06)
   Case 3
      rate = 0.065
   Case Else
      MsgBox "Error: unknown region for site " & rs!siteid
      rate = 0               ' explicit: amount comes out as zero
   End Select
   ' Insert any exceptions to zipCode-based lookup here
   territoryCalc = rs!KWH * rate
End Function
'-----------------------
Function slidingScale(rs)
   Dim slide, rate, KWH
   KWH = rs!KWH
   If KWH < 1000 Then
      slide = (KWH - 1) / 999
      rate = 0.09 - (0.04 * slide)
   Else
      rate = 0.05
   End If
   slidingScale = (KWH * rate)
End Function
'-----------------------
Function isWinter()
   isWinter = False     ' temp filler
End Function
```
