Patterns in P/R

Patterns using Procedural/Relational Paradigm

Move Over OOP!

Updated 8/17/2003



The concept of "patterns" has swept the object oriented programming community since the release of the "Gang of Four" (GOF) OO pattern book. ([insert reference]). A good many of the GOF patterns have equivalent procedural/relational (p/r) counterparts to them. (Those that don't are usually specific to OOP languages.) Here we attempt to document some of these p/r counterparts. Patterns are not just for OOP.

Most p/r programmers will recognize many of these patterns, or at least some of their variations, in their existing work. I find the p/r versions of the patterns are usually more intuitive, simpler, and more flexible than their OO counterparts. However, this may be purely subjective until solid, unbiased, metrics are devised.

For many patterns, there is generally an "Eval" and "Non-Eval" version. The non-Eval version usually involves a case statement, while the Eval version involves storing expressions in tables or other collections. Some languages are more Eval-friendly and/or table-friendly than others. Thus, the approach chosen will often depend on the features and abilities of the language. Generally, I break table-driven dispatching down into 3 levels:

  1. Using a case-statement to execute the code selected by the name given in the table column(s).

  2. Using an Eval-like function to call the function given (as a function name). Note that the table may look the same here as #1.

  3. Putting the entire routine or code segment into the table cell. This could then be executed via an "execute()" function. This makes a row resemble a "horizontal class" in some respects.
Some OO fans say that putting expressions or code into tables is "doing OO without OOP". Rather than get caught up in chicken-or-egg terminology battles, let's just say that I prefer tables because of their 2-D nature as opposed to the 1-D nature of OO code. Placing code in collections pre-dates the birth of OOP (Simula-67) by roughly a decade, at least. OOP gets credit for using classes to do such, not collections. LISP pioneered many aspects of treating code and data in a similar fashion.

Even without Eval or Execute, the p/r version is still better than the OO version in my opinion. I will grant that the OO approach offers a bit more potential compile-time checking, but not much else. (Perhaps sacrificing run-time changes/additions in the process.) Further, it seems more logical to use the same collections management approach for both program code AND data. Otherwise, you are duplicating effort (special IDE's), duplicating schema information, and increasing the learning curve. Collection handling should be factored into a single methodology regardless of whether it is code or data. LISP came closer to the right idea by treating code like data instead of data like code, which is what OO does wrong.

You will note that many of my examples don't include a lot of code. This is because many of the GOF-influenced patterns are mere "viewpoints" in the relational world, created by a trivial relational or Boolean formula, and not a physical code structure. Further, they tend to be local rather than global. This makes them easier and less intrusive to test and change. Thus, to us table fans, GOF-like patterns appear Neanderthalic. It is better abstraction to use formulas to represent repeating patterns than to physically repeat them. For one, the same items can more easily participate in multiple patterns. Think about it.


Strategy Pattern

The strategy pattern is about a set of algorithms to solve the same basic problem. It is fairly common in business applications. It often ends up being a code in a table that specifies which algorithm to use.

For example, the shipping costs for a commerce system may be by total weight, by total amount, by quantity, or some other complex approach. If you were a commerce provider for several stores, you may have a Store table that resembles:

Table: Store
StoreID
StoreName
CustomerID - reference to customer
ShipCostMethod - strategy name (string)
...etc...

("Method" here is not to be confused with an OO "method".)

sub ShippingCost(order, store)   // parameters are recordsets
  var result = 0
  select store.shipCostMethod
  case "byweight"
     result = order.totPounds * 0.65
  case "byamount"
     var minAmt = 3.50
     result = order.totAmt * 0.10
     if result < minAmt    // minimum
        result = minAmt
     end if
  case "byquantity"
     result = order.totQty * 4.0
  case "complex1"
     result = complex1(order)
  case "complex2"
     result = complex2(order, store)
  otherwise
     errRaise "No such strategy"
  end select
  return(result)
end sub
Real system rates are often "schedule driven" (tables), but we simplified it for illustration purposes. Also, the various shipping companies may be included, such as "UPS", "Fedex", "Airborne", etc. These may be considered sub-strategies, or perhaps just parameters of ShipCostMethod. (More on parameters will be discussed below.) The various credit card vendors (MC, Amex, Visa, Discover, etc.) may also be considered strategies, since it takes different algorithms to verify their card numbers.

Notice how the simpler strategies are inside of the subroutine, while the more complex ones call other subroutines/functions.

Parameters

Sometimes parameters need to somehow be specified in the table. This of course complicates things (in either paradigm). There are at least 4 approaches to parameters:

  1. Put the whole expression in a field

  2. Have one strategy name field and one parameter field for all of the potential parameters.

  3. Have multiple generic parameters (p1, p2, p3, etc.)

  4. Have a seperate table for each strategy
Approach #1 we have seen in some of our Control Table and Data Dictionary examples. I tend to only recommend this for table contents maintained by programmers and not by application users (via a regular add/change/delete interface). Otherwise, the application will have to write and change parameters in expressions, which is messy to program and manage.

Approach #1 usually requires the ability to evaluate expressions at run-time. An operation like Eval(x) is often used for this purpose. Note that using Eval can eliminate or reduce the need for a "dispatch" routine (case statement) because the routines can be called directly. Remember, though, that not all languages offer an Eval() equivalent. These guys are probably "stuck" with a case statement (which is still better than what happy-to-bash-case-statements OO fans need, in my opinion).

Approach #2 is similar to #1 accept that it separates the parameters from strategy (routine) name. It is friendlier for languages that lack variable "scope inheritance". Often one wants to pass the primary table handle(s) used in a routine to the strategy function. These are too "internal" to store with the strategy expression. (Specific strategies are often dedicated to the strategy dispatcher routine {such as our ShippingCosts routine} because they are too specific to be considered general purpose anyhow. Internal parameters/handles are roughly analogous to "self" or "this" references in OOP.)

Thus, we don't want the internal parameters in the parameters stored in our table. In languages that don't support scope inheritance, one way to achieve this is to "edit in" the internal parameters:

e = t.strategyName & "(" & internalParams & "," & t.strategyParams & ")"
result = Eval(e)
Here "&" represents concatenation. If one side or both don't need parameters, then we may want to parse out the quote. LISP-like languages are more flexible this way because they don't need commas to separate parameters. I just wish that LISP used "f(x y z)" syntax instead of "(f x y z)". Still, LISP-like languages are great for integrating tables and code.

This approach still suffers some of the same limits as approach #1. Namely, it is hard to change the parameters except by direct (programmer) editing.

Approach #3 may use a table that resembles:

Table: Store
StoreID
StoreName
CustomerID - reference to customer
ShipCostMethod - strategy name (string)
ShipCostParam_1
ShipCostParam_2
ShipCostParam_3
...etc...

Here are some example tables based on both approaches for comparison.

Parameter Example A
StoreID (misc.) ShipCostMethod
123 ... byweight(0.65)
124 ... byamount(0.1, 3.5)
125 ... complex1( )
126 ... byweight(0.40)

Parameter Example B
StoreID (misc.) ShipCostMethod ShipCostParam_1 ShipCostParam_2
123 ... byweight 0.65  
124 ... byamount 0.1 3.5
125 ... complex1    
126 ... byweight 0.40  

Code using parameter example B may then look something like:

  ...
  case "byweight"
     result = order.totPounds * store.shipCostParam_1
  case "byamount"
     var minAmt = store.shipCostParam_2
     result = order.totAmt * store.shipCostParam_1
     if result < minAmt    // minimum
        result = minAmt
     end if
  ...
(A type conversion may be needed for strong-typed languages.)

Approach #4, separate table for each strategy (if there are parameters), allows new strategies to be added without having to change the primary table if it is lacking enough parameter slots. It also allows dedicated parameter slots (similar to named parameters.) However, this may be overkill for simpler situations.

An example of "dedicated tables" is if we have an Employee table with a flag which indicates exempt or non-exempt status. Suppose that exempt status required much more (or different) information than non-exempt. This information can be moved to an "Exempt" table, which would be dedicated to exempt-specific information. (You could perhaps also have a "NonExempt" table for the non-exempt-specific stuff if needed.) A rough rule of thumb for table dedication is 4 or more strategy-specific fields. (Note that there is still the original Employee table. We are not sub-classing here.)

A good compromise may be a hybrid. Use approach #3 for simpler strategies and #4 for the complex ones. Two or three parameters in the primary table is probably sufficient for a majority of the strategies. Thus, only the more complicated strategies would use dedicated tables.

A fifth possibility is to place all possible parameters into the primary table. Specific names would be given to each. Sometimes there is enough overlap to reduce the fields needed. For example, two of the strategies above have shipping rates; thus, a "shipRate" field could be shared. Overall, this fifth approach should be used sparingly, however. It is common practice, though, to have dedicated slots for commonly-used attributes. Often times this is done because the designer suspects that such attributes may be needed by multiple strategies/purposes. Deciding whether a field is strategy-specific or likely to be shared among strategies requires good "business sense" or domain-specific knowledge.

Frequency Assessment

Although variations of the strategy pattern are quite common, in actual use I find they tend to be "blended" with other strategies. Thus, modeling strategies as stand-alone units of execution is probably limiting. (Unfortunately, this is how OO texts often portray them.) This is why the "Eval" version above is not really used often. See Nested Block Proliferation for a discussion of certain patterns of "nested strategies". Although the cited example is exaggerated for illustration purposes, strategies are indeed often nested or "cross-influencing" in actual practice in my experience.

Another damper to isolation is shared initialization or shared results. Even though the strategies may be different, they may still need the same initial information and/or dump results to the same place. Breaking them into separate methods or routines may result in more named blocks to manage and parameter passing than if you follow the simple pattern:

  sub task_X()
    initialization stuff....
    select on [stragegy]
    case "1" {....}    // strategy #1
    case "2" {....}    // strategy #2
    case "3" {....}    // strategy #3
    case "etc...."
    end select
    finalization stuff....
  end sub


Observer Pattern

The observer pattern is generally used to "trigger" or "notify" a set of other operations (routines) when a given event happens. Three common p/r approaches are:

  1. A "dispatch" routine (with a case statement)

  2. List of routines (list evaluation)

  3. Table (one row per callee)

Approach #1 may resemble:

  sub notifyUs(x, y)
    routine1(x, y)
    routine2(x, y)
    routine3(x, y)
  end sub
Wasn't that easy?

Note that unlike the Strategy pattern, the parameters are usually the same.

An example of #2 can seen in P/R tips.

Approach #3 would involve putting the routines to be notified into a table, one row per called routine. When an event occurs that is to be "observed", the table is then iterated through and the routines executed.

In its simplest form, it is not that much different from #2. However, using tables allows us to do some fancy things like trigger only a limited set of routines based on some conditional criteria:

sub notifyUs(params)
  var rs = openRS("SELECT * FROM NotifyX WHERE deptSales > 20000")
  while getNext(rs)
    eval(rs.notifyee & "(" & params & ")" )
  end while
  // assuming auto-rs closing
end sub
Or, to be more generic:
sub notifyUs(params, criteria)
  var rs = openRS("SELECT * FROM NotifyX WHERE " & criteria)
  while getNext(rs)
    eval(rs.notifyee & "(" & params & ")" )
  end while
end sub
Tablizing the routines also makes controlling and managing the order of execution much easier than most OOP languages.

It's abilities like these that make OOP's common shortage of collection-friendly operations annoying. OO classes are half-ass collections (actually about 1/4-ass). Viva relational tables!


Multiple Dispatch

Multiple dispatching is a natural for tables. As many fields as you need, and even complex conditions, can control the dispatching. Further, a developer's view of the dispatching info is very dynamic. You can sort and filter the view any way needed.

For a hypothetical example, assume you are implementing modem drivers for many different operating systems and modem models. Further, in some cases there are different drivers needed for different speed ranges. For example, you might need one driver for a speed range from 1200 baud up to 9600 baud, and another driver for 14000 to 52000 baud.

Modem Driver Dispatcher Variation 1

OS Model Speed_Low Speed_High dial send recv hangup
UNIX Hayes 20a 1200 9600 [implement.] [implement.] [implement.] [implement.]
UNIX Hayes 20a 14000 52000 [implement.] [implement.] [implement.] [implement.]
Win-9x Hayes 20a 1200 52000 [implement.] [implement.] [implement.] [implement.]
UNIX USR Z20 300 52000 [implement.] [implement.] [implement.] [implement.]
Win-9x USR Z20 1200 2400 [implement.] [implement.] [implement.] [implement.]
Win-9x USR Z20 4800 52000 [implement.] [implement.] [implement.] [implement.]
Mac KLM 10 1200 52000 [implement.] [implement.] [implement.] [implement.]
Etc.... .... .... .... .... .... .... ....

Modem Driver Dispatcher Variation 2

OS Model Speed_Low Speed_High Operation Implementation
UNIX Hayes 20a 1200 9600 dial [implement.]
UNIX Hayes 20a 1200 9600 send [implement.]
UNIX Hayes 20a 1200 9600 recv [implement.]
UNIX Hayes 20a 1200 9600 hangup [implement.]
Win-9x USR Z20 4800 52000 dial [implement.]
Win-9x USR Z20 4800 52000 send [implement.]
Win-9x USR Z20 4800 52000 recv [implement.]
Win-9x USR Z20 4800 52000 hangup [implement.]
Etc... .... .... .... .... ....

The "[implement.]" placeholders mark what could be function names, pointers to functions (C-like), case statement selectors, or even actual code. The actual choice depends on your environment and/or language. (See introduction.)

Note that RDBMS tables may not be appropriate for implementing an actual modem driver system because they may not provide predictable retrieval times. (This example is hypothetical anyhow. No claim to matching actual driver needs is made.)

However, the table is simply an organizational tool. The final source code and/or executable does not necessarily have to use tables itself. The table could be used to implement (generate) the final code. One can even envision a field that tells which executable/DLL to target for a given implementation if there is going to be some grouping of specific implementations.

You can do a lot of other fancy things, like have the name/ID of the assigned developer in each row. Tables can make a very nice code management tool. Why reinvent the wheel in IDE's to duplicate what tables already offer? (Although many existing RDBMS's admittedly don't offer very good approaches/tools for such usage because they are too caught up in improving mass transaction speed to win benchmark contests.)

Sharing Similarities

One may notice that many of the implementations may repeat. Thus, some way to factor the similarities is perhaps needed. One approach is to implement a kind of "tablized inheritance" by having a DefaultID or ParentID field, along with a DriverID to provide a unique key. Any blank cells would be sought after in the parent(s). There are other variations like set-based and/or priority based approaches to sharing similarities and dispatching that I won't go into here. We are not stuck with the limited dispatching provided by OOP languages, but can "juice it up". Some OO fans will say, "but OO's dispatching is built-in". True, but that also means that it does not easily allow juiced-up dispatching to handle a complex and dynamic world that is not tree-shaped or needs multiple "answers", especially when implementing fickle business rules as opposed to device drivers.

UI Event-handling Example

One can envision something similar used for GUI event handling:

EventID Application Form Widget Event Implementation Priority
34 Scheduler timeslots grid doubleclick [....] 10
35 Scheduler timeslots button_1 click [....] 8
36 Scheduler confirm button_2 click [....] 10
91 Reports mainform picklist_A change [....] 12
92 Reports mainform button_1 click [....] 10
93 Reports * * key_F1 [....] 14
Asterisks represent wild-cards
See Tabled GUI's for more on table-oriented GUI's. Perhaps user roles and widget "type" (button, text-box, list, etc.) could be added to the wild-card-able columns.

OOP versions of multiple dispatching, such as the Visitor Pattern are usually a shameful mess. It is a Rube Goldberg contraption for what it allegedly does. I get a headache trying to figure out C++ or Java versions of Visitor. Tables shine in the light of manageable simplicity and nearly infinite flexibility in comparison. Viva Tables!

There is another non-table, non-case-statement, non-OO version of multi-dispatching, tentatively known as the Calculated Script approach.


Role Pattern

The role pattern allows different entities to serve the same "role" or the same entity to serve different roles. The common theme is that diverse things are being related to one another.

For example, a wide variety of "things" might "hold" fixed assets in a company. (A fixed asset might be equipment, furniture, etc.) Employees, departments, subsidiaries, etc. may be in charge of various fixed assets.

One approach to this is to place and asset number field in each of the entities involved. However, It is a little tough to process assets as a group, because each participating entity must be joined (SQL UNION) during aggregate processing.

Another approach is to have an Asset table that links to each entity.

Asset_ID Asset_Descr Holder_Entity Holder_ID
1682 Couch Dept 24
1627 Desk Employee 77377
1628 Car Dept 107

This allows entity-specific info to be retrieved if needed. Holder_Entity can be either the actual table name, or some other agreed-upon code or abbreviation.

Another variation is to have a separate column for each entity. Although this makes for a messier table in my opinion, it may make certain SQL queries easier to deal with (fewer Select's). Your choice may depend on the kinds of queries/accessing that will likely be done on the table.

SQL-friendlier version
Asset_ID Asset_Descr Dept_Holder Emp_Holder
1682 Couch 24 0
1627 Desk 0 77377
1628 Car 107 0

Structurally, I still prefer the first version. For one, you don't have to add new fields to link to new entities. However, that would probably be a relatively rare addition anyhow.

If there were multiple asset holders per asset, then the link should probably be from the holders to the asset table. In other words, the asset table would not have any foreign keys to the holders.

It may be helpful to also provide an entity description in the table. However, this may duplicate descriptions already in the entity tables, which may not be considered good normalization. For one, the original title may change and the Asset table may miss the change. If each entity table had a same-named "Description" field, then it may be prudent to get the name from them.

One problem with this is that some entities, such as Employee, may have the description split into first name, last name, etc. Perhaps a RDBMS "view" could be made to provide a "calculated" description based on the component names.

Some RDBMS may even allow calculated fields or "view columns" in the original table schema. This is probably closer to the convention of each OOP class having a Print method to provide a description of each instance regardless of the type or membership of an instance. However, this feature oddly is not that common in RDBMS outside of "views". (Perhaps the need for it, beyond competing against OOP examples, is not really that common. OO training material often exaggerates the frequency of certain patterns or needs.)

The decision to duplicate the description or reference it is a judgement call based on your organization's field naming conventions and other issues.

Link Table Or Not

One common decision point with roles is whether to have a "middle man table", also known as a "link table", or whether to put the links in entity-specific tables.

Middle-Table Version
EmpID Role Reference
9 Manager 325
13 TeamLeader 67
13 Programmer 22

Notice how the same employee can serve more than one role. The Reference field is a foreign key to a role-specific table. For example, the Reference for the first record shown would point to a record in the Manager table.

Rather than have a middle table, the Manager table (if there is one) can have a foreign key to an employee. This then gets us into the issue of what is the difference between a "role" and regular relationships. Not only may a Manager table have a foreign key to an Employee, but time-sheet line items may also have them. Does this mean that the employee serves as the "role" of a "time-sheet participant" or as a "project participant"? (A project management system may use time-sheet data to track participants). Is anything that references employees part of the role network? If so, does this mean than anything that references anything is part of a role network? Where is the dividing line? It seems highly semantic-dependant.

Thus, it appears that the dividing line between a "role" and regular relational modeling is a blurry one. Perhaps a tighter definition would be situations where a table has mutually-exclusive links to multiple entities. In other-words, an "entity indicator" to tell us which table a foreign key refers to. But, this "new" definition does not fit tightly with the OOP version. Another problem with this definition is that small changes in the direction of the links would disqualify something from being a "role", even though the participants are mostly the same. An unambiguous technical definition of "role" may be either elusive or nearly useless except as defined under particular applications (such as employee roles).

Note that often the "middle table" can grow in size (field quantity) and importance so that it is no longer just a link provider. For example, our employee roles table may contain "date joined", "grade" (performance assessment) and other attributes shared by all or most roles.

Also, excessive need for role-like patterns may be indicative of poor schema design and/or artificial or forced taxonomies. But since sometimes one has to live with bad schemas, it would be nice if SQL or other relational languages had special operations to simplify such patterns.


Patterns Versus Formulas

OOP is missing the bigger picture

I once took a university course called something like Patterns and Pattern Recognition (using computers). Although it provided relatively few out-of-the-box techniques, one of the useful concepts emphasized was the idea of building a compact formulaic representation of a pattern, any kind of pattern. In other words, you don't manage the pattern itself, but a formula for it. This allows one to store, experiment with, and change things (or their view of them) with relatively little effort or resources. The things being manipulated by the pattern are generally not in the formula itself. Thus, we are not disturbing the "payload" by changing the formula.

formula diagram

OOP patterns are generally not at this stage. They are essentially template examples to copy, and not formulas. It is my opinion that relational algebra has come much closer to this goal than GOF-like OOP patterns. In procedural/relational setups, you generally put the "noun model" into the database.

If you need are particular "noun view", you simply apply the needed relational formula to get it. We don't have to change the database schemas (as long as they have sufficient info -- classic GIGO). Nor does the structure of the programming code have to change. Yes, the formula has to change, but not the larger-scale code structure. We are changing a brick, and not the whole wall. Our "nodes" in our structure and the links between them are simply not in programming code, but in the database. (Technically, the formula or query can "create" the links in a virtual ad-hoc fashion.) This simplifies our programming code.

Relational databases were originally motivated mostly by the business need for ad-hoc queries and not for dynamic, virtual modeling and code simplification. However, it turns out that these two are more or less the same thing. Ad-hoc querying means that we can tell the machine to construct a model for us and deliver the data that such a model would present without a lot of changes, especially permanent changes.

The database model only needs sufficient info to describe what it is modeling. As long as the info is somewhere in there, relational querying can usually deliver it (or at least close-enough for later custom fine-tuning). This has at least three benefits:

Note that I find SQL less than perfect and would overhaul it if given a chance. But even with its imperfections, it still beats the hard-wired modeling thinking of OOP.)

Drawbacks?

Some OOP proponents will say that such may be great for data, but not for behavior. In other words, databases and queries can manage and manipulate data well, but not behavior they say. In theory, and practice, we could put behavior in relational tables also. (See Control Tables for examples.) However, in practice I have found that this is not really needed that often. The long-term coupling (relationship) between table rows and behavior is rarely one-to-one in my experience, at least for business applications. (See Noun Shuffle.) Generally it is better design to use tables to tell what you want, rather than how to do it. Leave "how" to a subroutine or module.

Further, the separation of data and behavior allows multiple languages to more easily communicate. Data is simply more transportable and sharable than algorithms. I also find it more navigation-able than the mixed approach. Some OOP IDE's can replicate such techniques, but in the end they are simply reinventing the database.

If such techniques are so great, how come they are not more popular? One of the reasons is that OOP has done a good job of marketing itself through clever (but misleading) cliches and examples. Second, relational database systems mostly target the "glass house, big iron" crowd. I have witnessed nimble, table-friendly tools and languages that made the benefits of table-based modeling easy to "get at" and manage (despite other flaws in such tools). The best of both worlds can be merged for the most part in my opinion; and even if they can't, existing relational tools are still better than the OOP model-nouns-in-the-code approach.

Wimpy Abstractions?

Although he does not suggest relational abstractions as a replacement, Paul Graham has this to say about GOF-style OO patterns:
This practice is not only common, but institutionalized. For example, in the OO world you hear a good deal about "patterns". I wonder if these patterns are not sometimes evidence of case (c), the human compiler, at work. When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough--often that I'm generating by hand the expansions of some macro that I need to write. (Paul Graham, emphasis added.)
Here is another interesting anonymous quote from the WikiWiki OOP Logs:
In software engineering, duplication of something is often considered a sign that more abstraction or factoring is needed. Duplication should generally be refactored into one or fewer spots to simplify the code and localize changes to one or fewer spots. In other words, why do patterns have to be duplicated instead of just referenced for each usage? Do GOF patterns represent unnecessary duplication? Is all duplication bad? Interesting philosophical questions. [Emph. added]


See Also:
Useful Procedural/Relational Techniques
Control Tables
Shapes Example (case statements)
Business Modeling
More Pattern Philosophy
Device Driver Pattern
Change Patterns
OO Challenges
Structural Analysis Notes
The Yin and Yang of P/R


Home | OOP Criticism
© Copyright 2000-2003 by Findy Services and B. Jacobs