Aspects and OO

Encapsulation of behavior begs "where do I encapsulate?"

Updated: 2/23/2002

Object oriented thinking tends to favor "encapsulating" behavior with some object/class. However, I find this problematic in many business applications because operations often involve multiple "aspects," and choosing which object, class or noun to associate the operation with often seems arbitrary.

Oddly enough, this multi-aspect organization resembles a neuron in many respects. Although this does not "prove" it is the best processing model, it suggests that perhaps it is not too far off from how the brain handles and organizes complex information.
Also note that there is a field called "Aspect Oriented Programming." However, so far it appears to be more about specific algorithms than about general process and programming organization.

First we will look at a simple "toy" example, and then we will look at a portion of a (somewhat) more realistic invoicing application.

Bus Hit Larry

There are 3 primary aspects to this:

1. The bus
2. The person (Larry)
3. Collision

OO will often try to include the behavior in the bus object, or sometimes a person object, or even a "collision" object; while a procedural/relational (p/r) paradigm will most likely use a "collision" routine or module.

The big problem is that both of these approaches are arbitrary. Both OO and p/r have to pick one primary aspect to put the behavior in.

There may be secondary aspects, handled by methods (OO) or by case/if statements within a class or subroutine. However, one is still forced to pick a primary aspect. Examples:

  // Variation 1 (typical OO)
  block bus
     block collision_1
        ...
     end block
     block collision_2
        ...
     end block
     etc...
  end block

  // Variation 2 (typical p/r)
  block collide
     block bus
       ...
     end block
     block Larry
       ...
     end block
  end block

Here, the first approach groups everything related to "bus" together. In the second approach, everything related to collisions is put together. There may be collision-related items in the Bus object, but collision-related items could also appear in other objects.

Thus, generally only one aspect can be physically grouped together at a time. Portions of secondary aspects may be grouped together under the primary aspect, but it is usually just portions.

Think of a typical sales report by product and region:

       BY REGION

  Region Product Sales

   TX     sox     13k
   TX     scarf   11k
   TX     shoes    8k

   NY     sox     54k
   NY     scarf   29k
   NY     shoes   17k

   CA     sox     40k
   CA     scarf   12k
   CA     shoes   14k


       BY PRODUCT

  Region Product  Sales

   TX     sox     13k
   NY     sox     54k
   CA     sox     40k

   TX     scarf   11k
   NY     scarf   29k
   CA     scarf   12k

   TX     shoes    8k
   NY     shoes   17k
   CA     shoes   14k

Managers often want multiple reports because no single report grouping can show all needed views. (They may also want to see sales by supplier, by distributor, by payment method, by payment tardiness, by sales, by marketing technique {TV, radio, etc.}, by profit margin on products, and so forth.)
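The report example can be sketched in a few lines of Python: the rows are stored once, flat, and the grouping aspect is just a parameter picked when the report is run. The names SALES, group_by and totals are hypothetical, made up for this illustration; the figures are the ones from the report above.

```python
# Rows from the sales report above (sales in thousands).
SALES = [
    {"region": r, "product": p, "sales": s}
    for r, p, s in [
        ("TX", "sox", 13), ("TX", "scarf", 11), ("TX", "shoes", 8),
        ("NY", "sox", 54), ("NY", "scarf", 29), ("NY", "shoes", 17),
        ("CA", "sox", 40), ("CA", "scarf", 12), ("CA", "shoes", 14),
    ]
]

def group_by(rows, column):
    """Group the same rows on any one column (aspect), keeping first-seen order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return groups

def totals(rows, column):
    """Total sales per value of the chosen grouping column."""
    return {key: sum(r["sales"] for r in group)
            for key, group in group_by(rows, column).items()}

print(totals(SALES, "region"))   # {'TX': 32, 'NY': 100, 'CA': 66}
print(totals(SALES, "product"))  # {'sox': 107, 'scarf': 52, 'shoes': 39}
```

Neither grouping is baked into the data, so a "by supplier" report would only need a new column, not a reorganization.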

Program code is not much different with regard to aspect grouping. Sometimes you wish to see one aspect, and other times you wish to see another.

Thus, when I hear OO fans say that OO "is better at grouping related things together", they often forget that most real world operations involve multiple aspects. A particular change request may favor the aspect chosen by the programmer that day, but another change request on another day may not. (See the Shapes Example for a simple demonstration.)

Where I think the p/r approach shines is that there tends to be only one primary "action" or "task" aspect. The OO designer, by contrast, has to choose whether to associate the collision behavior with the person, the bus, or a "collision" class. Thus, it is often harder to find things in OO.

  // procedural/relational
  collide( bus, person)

  // OO
  bus.collide(person)
  OR
  person.collidewith(bus)
  OR
  Collision c = new Collision(person, bus)

When thinking about an issue, usually there is one primary verb per issue. However, there are often multiple primary nouns:

Noun1     Verb    Noun2
 Bus      hit     Larry

In OO, the designer has to choose between lumping "hit" with either bus or Larry, or perhaps even a Hit class.

Thus, if somebody wants to locate the code that deals with "Bus hit Larry", they have to make 3 guesses in OO:

class/object Bus
class/object Hit
class/object Larry

In p/r one is fairly sure that the related behavior is in routine "Hit" (or perhaps "collide"). In OO you never know. It could be in any of the 3, and there are no clear rules about which one it will be in. Entire debates about which class to stick things in get long and messy.

OO fans may brag, "Look, all the bus-related info is all in one spot. Thus I can modify bus stuff easier or add another vehicle without hopping all over in your p/r verbs."

This is true, but the opposite happens if we need to add or significantly change a given verb. If you group by aspect-X, of course aspect-X-oriented changes or additions are going to be easier. However, grouping by aspect-X will then disfavor other aspects. Thus, such claims are only a card trick because they don't mention the tradeoffs of multiple aspects.

For example, if we add a new verb, such as "Clean," the noun-centric approach will require visiting all nouns to add a new Clean method. However, in the verb-oriented (p/r) approach, none of the noun code has to be changed. The Clean() routine is simply added without touching any Bus or Person entity.
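The "Clean" scenario can be sketched in Python. This is an illustration under assumptions, not a prescribed design: the classes, the routines collide and clean, and the strings they return are all hypothetical. The verb-oriented half treats nouns as plain records and uses a case-style if chain inside the routine; the noun-oriented half shows the same new verb landing in every class.

```python
from dataclasses import dataclass

# Verb-oriented (p/r) style: nouns are plain records, and each task is one
# routine that receives every noun it involves.
@dataclass
class Bus:
    route: str

@dataclass
class Person:
    name: str

def collide(bus: Bus, person: Person) -> str:
    return f"Bus on route {bus.route} hit {person.name}"

# Adding the new verb "clean" later: one new routine; Bus and Person are untouched.
def clean(thing) -> str:
    if isinstance(thing, Bus):
        return f"Washed the bus on route {thing.route}"
    if isinstance(thing, Person):
        return f"{thing.name} took a shower"
    raise TypeError(f"no cleaning rule for {type(thing).__name__}")

# Noun-oriented (OO) style, for contrast: collide had to be parked in one
# class (here, arbitrarily, the bus), and the new verb means reopening
# every class to add a method.
class BusObject:
    def __init__(self, route: str):
        self.route = route

    def collide(self, person: "PersonObject") -> str:
        return f"Bus on route {self.route} hit {person.name}"

    def clean(self) -> str:  # added to the bus...
        return f"Washed the bus on route {self.route}"

class PersonObject:
    def __init__(self, name: str):
        self.name = name

    def clean(self) -> str:  # ...and again to the person
        return f"{self.name} took a shower"

print(collide(Bus("7"), Person("Larry")))             # Bus on route 7 hit Larry
print(BusObject("7").collide(PersonObject("Larry")))  # Bus on route 7 hit Larry
print(clean(Person("Larry")))                         # Larry took a shower
```

The tradeoff runs the other way too: adding a new vehicle means adding a branch to every verb routine such as clean, which is exactly the aspect-X versus aspect-Y tradeoff described above.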

OO Presumption - Aspect Importance Ranking
Aspect 1: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Aspect 2: ||||||||||||||
Aspect 3: ||||||||
Aspect 4: ||||||
Aspect 5: |||||

A More Realistic View
Aspect 1: |||||||||||||||||||||||||||||||||
Aspect 2: ||||||||||||||||||||||||||||
Aspect 3: |||||||||||||||||||||||
Aspect 4: ||||||||||||||||||||
Aspect 5: ||||||||||||||||||

OO tends to assume, or pretend, that one aspect is significantly more important than the others. Thus, an OO fan will most likely think, "The collision is mostly about the bus. Therefore, I will put the collision behavior in the Bus class/object."

However, the collision involves at least three aspects (person, colliding, and bus). Moreover, the damage done by the bus depends on things outside of the bus's control, such as the health of Larry, the location of the collision, the hardness of the surrounding pavement or environment, laws regarding collisions, etc.

We can say that this event "has-a" bus-related aspect, but this is not the same as the event being ("is-a") bus-related event. (See Subtype Proliferation Myth for more on real-world usefulness of "is-a" relationships.)

In linear code, picking only one aspect as being the primary aspect will always be a problem. Attempts to remedy this sometimes involve some sort of code or aspect browser so that different aspects can be "queried" as needed.

One approach is a Smalltalk-like IDE (code browser) where same-named methods can be brought together for viewing, and another is a Control Table, where rows represent one aspect, and columns represent another.
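A control table can be sketched as a Python dictionary of rows. The vehicle types, attribute names and values below are hypothetical, chosen only to show that the same table can be read along either dimension: by row (everything about one noun) or by column (one attribute across all nouns).

```python
# Hypothetical control table: each row is a vehicle type (one aspect),
# each column is an attribute that drives behavior (another aspect).
CONTROL_TABLE = {
    "bus":     {"weight_tons": 12.0, "needs_inspection": True,  "carries_passengers": True},
    "bicycle": {"weight_tons": 0.01, "needs_inspection": False, "carries_passengers": False},
    "truck":   {"weight_tons": 18.0, "needs_inspection": True,  "carries_passengers": False},
}

def by_row(table, key):
    """Everything about one vehicle: the 'noun' view."""
    return table[key]

def by_column(table, attribute):
    """One attribute across every vehicle: the cross-cutting view."""
    return {key: row[attribute] for key, row in table.items()}

print(by_row(CONTROL_TABLE, "bus"))
print(by_column(CONTROL_TABLE, "needs_inspection"))
# {'bus': True, 'bicycle': False, 'truck': True}
```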

Another table-oriented approach to "query by aspect" is to have different columns to represent different aspects. This allows for complex aspect queries. It is not unlike the sales report example above. However, this approach is currently tough to apply to an entire application. More research is needed in this area.
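A toy version of such a catalog might look like the following Python sketch. It assumes each unit of code has been tagged by hand with aspect columns; the unit names and columns are hypothetical, and no existing IDE is implied to work this way.

```python
# Hypothetical catalog: one row per unit of code, one column per aspect.
UNITS = [
    {"name": "archive_x",     "task": "month_end", "entity": "invoice", "security": True,  "gov": True},
    {"name": "print_summary", "task": "month_end", "entity": "invoice", "security": False, "gov": False},
    {"name": "check_login",   "task": "login",     "entity": "user",    "security": True,  "gov": False},
]

def query(units, **criteria):
    """Names of the units whose aspect columns match every criterion."""
    return [u["name"] for u in units
            if all(u.get(column) == value for column, value in criteria.items())]

print(query(UNITS, security=True))                # ['archive_x', 'check_login']
print(query(UNITS, task="month_end", gov=False))  # ['print_summary']
```

The hard part, as noted, is not the query but keeping such tags accurate across a whole application.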

Although a dynamic code/behavior browser can solve *some* of this, it is tough because the code ranges that belong to different aspects often do not share the same boundaries. One attempted solution is to chop the code up into little pieces. However, it then gets kind of messy to follow the flow, and context often cannot be used:

begin aspect A
  var K
  blah_1
  blah_2
  begin aspect B and aspect C
     blah_3
  end
  begin aspect D
     blah_4
  end
  blah_5
end

Here, blah 3 has "inherited" aspects A, B, and C. If we chop this thing up into smaller units of code (one per "blah"), we then have to replicate the "A" block for blah 1 through blah 5. Each little block then has to have independent "header information" such as a block name, parameters, etc.

  block blah_1 {
    Aspect(A)
  }
  block blah_2 {
    Aspect(A)
  }
  block blah_3 {
    Aspect(A)
    Aspect(B)
    Aspect(C)
  }
  block blah_4 {
    Aspect(A)
    Aspect(D)
  }
  block blah_5 {
    Aspect(A)
  }

(Note that each "blah" may also be considered an aspect and/or be part of the "blah" aspect.)
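The replication can be made concrete in Python: flattening the nested version mechanically produces the per-block aspect headers, with A copied into every one. The tuple representation of the nested code is an assumption made for this sketch, and the variable K is left out.

```python
# The nested block from the text: (aspects opened by this block, children).
# A child is either a statement name (str) or another nested block.
TREE = (["A"], [
    "blah_1",
    "blah_2",
    (["B", "C"], ["blah_3"]),
    (["D"], ["blah_4"]),
    "blah_5",
])

def flatten(block, inherited=()):
    """Yield (statement, aspects) pairs; each statement carries every
    aspect of the blocks that enclose it."""
    aspects, children = block
    current = tuple(inherited) + tuple(aspects)
    for child in children:
        if isinstance(child, str):
            yield child, current
        else:
            yield from flatten(child, current)

for name, aspects in flatten(TREE):
    print(name, aspects)
# blah_1 ('A',)
# blah_2 ('A',)
# blah_3 ('A', 'B', 'C')
# blah_4 ('A', 'D')
# blah_5 ('A',)
```

In the nested form, A is stated once and supplied by context; in the flat form, it has to be repeated five times.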

It may also be hard to see the sequence with small blocks that are ordered by something other than sequence. (Sequence is yet another aspect.) In traditional programming, sequence is often indicated by the physical ordering of the code.

Even if an IDE can bring them together based on a sequence aspect query (if an IDE can determine it), we still often have to wade through the repeated header information.

Even our variable K may have to become a formal parameter (not shown) in some or all of the micro-blocks. Using too many small blocks is like writing text without pronouns or context: every sentence would have to explicitly state exactly what it is talking about. Even OO relies on such context, in the form of class variables. (In p/r the rough equivalent is routine-level variables shared by various IF and/or CASE blocks. CASE and IF blocks can be viewed as roughly analogous to methods. See Block Discrimination for more on this.)

Overall, neither paradigm has solved this sticky aspect problem. Both p/r and OO tend to focus too much on one aspect at the expense of others.

But, until something more aspect-friendly (and usable) comes along, I will stick with procedural/relational because at least the aspect usually chosen, task, is more consistent.

Invoicing Status Code Example

Let's now look at a somewhat more realistic piece of business logic:

"During the month-end process, if the invoice is complete, and it is a bill to the government, and the user has the proper permissions, then save the invoice into a special archive area required by law for government invoices for this firm's kind of work. (In this case, the month-end is triggered by a user instead of a daemon process. There could be several reasons for this, such as the user having to make a few decisions or fix problem invoices before proceeding.)"

Here is what procedural code for this may look like:

  sub monthEnd(invoice)
    ...
    if invoice.status = DONE and invoice.isGov  _
              and userHasSecurity(userID) then
       Archive_X()
    end if
    ...
  end sub

Actually, due to "context" (see above prior section), the code may sometimes look more like:

  sub monthEnd(invoice)
    ...
    if userHasSecurity(userID)
       ...
       if invoice.status = DONE
          ...
          if invoice.isGov
             ...
             Archive_X()
             ...
          end if
          ...
       end if
       ...
    end if
    ...
  end sub

Often the "blocking" is via subroutine calls instead of IF statements. Thus, the actual nesting may not be as deep.
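For readers who want to run the rule, here is a rough Python transliteration of the first monthEnd routine. Invoice, user_has_security, archive_x and archive_area are hypothetical stand-ins; the permission check in particular is a placeholder, and the user ID is passed in rather than read from a global.

```python
from dataclasses import dataclass

DONE = "done"

@dataclass
class Invoice:
    number: int
    status: str
    is_gov: bool

archive_area = []  # stand-in for the special archive required by law

def user_has_security(user_id: str) -> bool:
    # Hypothetical permission check; a real system would look up the user's rights.
    return user_id in {"clerk_with_rights"}

def archive_x(invoice: Invoice) -> None:
    archive_area.append(invoice.number)

def month_end(invoice: Invoice, user_id: str) -> None:
    # The month-end task routine is where status, government-ness and
    # security meet; none of those nouns has to "own" archive_x.
    if invoice.status == DONE and invoice.is_gov and user_has_security(user_id):
        archive_x(invoice)

month_end(Invoice(1, DONE, True), "clerk_with_rights")    # archived
month_end(Invoice(2, DONE, False), "clerk_with_rights")   # not government
month_end(Invoice(3, "open", True), "clerk_with_rights")  # not finished
month_end(Invoice(4, DONE, True), "visitor")              # no permission
print(archive_area)  # [1]
```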

Archive_X involves at least four aspects:

  1. Invoice status aspect
  2. Government aspect
  3. Security (access) aspect
  4. Month-end (task) aspect
  5. Sequence aspect (time) (may not be applicable here)

Now, an OO developer will often search for an object or class in which to "place" Archive_X (or its equivalent).

Should Archive_X go into an Invoice class/object? A Government class/object? A Security class/object? A MonthEnd class/object?

To me, they are all potential candidates because they all influence Archive_X. (We have shown only IF statements here for simplicity, but the influences can take the form of other kinds of data or language structures. Also, Archive_X could be put into an Archive class, but then it is not really different from our procedural function, except that it would probably take more syntax to set up.)

Let's say we put it into a Government class/object. After all, it does seem specific to government requirements. (Some OO fans may be tempted to subclass clients based on a government/non-government dichotomy. But, this is risky as described in the Subtypes document.)

However, suppose that your firm has gotten a bulk deal (volume discount) on the archive process. Now they pay a flat rate no matter which invoice is archived (up to a reasonable point). Thus, now all finished invoices go to the special archive.

Behold, the process no longer has any ties to Government-ness. So what the fudge is it doing in a Government class?

Okay, let's try putting it in the "Done" Status subclass. However, what if the price drops on the archive service such that even unfinished invoices can be archived? Our poor little operation now has to search for a new home class.

We could put it in a "Month_End" class, but then it is no longer very OO. Plus, it could need moving from that aspect also (under both paradigms).

Many business processes do not fit nicely into the "encapsulate behavior" encouragement of OO. Encapsulation is often the arbitrary or weak assignment of a process or item to a noun-oriented class/object. In reality, it "belongs" to multiple aspects.

We can say that Archive_X "has-a" government aspect to it during the original design, but it is not an "is-a" relationship, which OO thinking tends to be fond of. OO tends to push for artificial dichotomies or relationships of behavior or attribute assignment.

Astute readers may realize that what I am trying to promote is the de-coupling (breaking the ties) of features that should not be coupled in the source code or design. Just because archiving is tied only to government clients during the planning stage is no reason to hard-wire these two aspects together in our design.

Feature de-coupling can ensure that changes and new combinations can happen in the future with minimal or no source code change. By using IF statements with database feature selection flags and strategy codes, such as described in the Bank example and the Publications example, unnecessary feature coupling is avoided. Note that the cited examples demonstrate de-coupling from a hierarchy or taxonomy, but a similar design philosophy also applies to aspect de-coupling.

My design would probably have an IsGov flag and an ArchiveType or HasArchive flag in the client record structure. These can be changed individually in the data (instances) with no change to the source code. It might not be good job security, but it is good design.
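A minimal Python sketch of that design, assuming a client record with the two independent flags (is_gov and has_archive standing in for IsGov and HasArchive). The client names and invoice numbers are hypothetical. The point is that the bulk archive deal becomes a data update rather than a code move.

```python
from dataclasses import dataclass

@dataclass
class Client:
    name: str
    is_gov: bool
    has_archive: bool  # its own flag, no longer implied by is_gov

@dataclass
class Invoice:
    number: int
    client: Client
    done: bool

def should_archive(invoice: Invoice) -> bool:
    # Archiving keys off its own flag, not off government-ness.
    return invoice.done and invoice.client.has_archive

clients = [Client("Agency", is_gov=True, has_archive=True),
           Client("Acme", is_gov=False, has_archive=False)]
invoices = [Invoice(1, clients[0], True), Invoice(2, clients[1], True)]
print([i.number for i in invoices if should_archive(i)])  # [1]

# The bulk deal arrives: flip the flag in the data; the source is unchanged.
for client in clients:
    client.has_archive = True
print([i.number for i in invoices if should_archive(i)])  # [1, 2]
```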

Even though my original designs (above) linked archiving to government-ness, it is an easy change to "re-hook" it to an archive type field. (It could be argued that initial tying keeps the user interface simpler.) Just change an IF statement. Compare this with an OO approach that would probably require moving code from one class to another. This movement will often be the case in subclass-based action selection. Changing in-place is easier and less risky than moving in most cases.

Note that even though I use the word "type" for the archive treatment selector field name, perhaps strategy would be more appropriate. However, "type" is shorter and more common. Also note that sometimes there might be limitations placed on which combinations of the attributes are allowed. This can be done in the data entry validation and/or RDBMS triggers.
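As one possible form of such a limitation, here is a sketch using Python's built-in sqlite3 module with a CHECK constraint (a constraint rather than a trigger, as one simple option). The table layout, the archive_type values and the rule that the legal government archive requires a government client are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE clients (
        name         TEXT PRIMARY KEY,
        is_gov       INTEGER NOT NULL CHECK (is_gov IN (0, 1)),
        archive_type TEXT NOT NULL
                     CHECK (archive_type IN ('none', 'gov_legal', 'bulk')),
        -- Hypothetical combination rule: the legal government archive
        -- is only allowed for government clients.
        CHECK (archive_type <> 'gov_legal' OR is_gov = 1)
    )
""")
conn.execute("INSERT INTO clients VALUES ('Agency', 1, 'gov_legal')")
conn.execute("INSERT INTO clients VALUES ('Acme', 0, 'bulk')")

rejected = False
try:
    conn.execute("INSERT INTO clients VALUES ('Widgets', 0, 'gov_legal')")
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint refused the combination
print(rejected)  # True
```

The two flags stay independent everywhere else; only the combinations someone has actually ruled out are blocked.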

Things change their "activation criteria" much more often than they change their "task membership". This is why procedural modeling is better for business applications. OO keeps trying to tie activation criteria to noun-based membership schemes and taxonomies, but these backfire too often. Thus, I prefer verb-based (a.k.a. "task based") modeling for business application code.

Change Patterns Example

Q: When something changes, it is usually not only the nouns/data but the tasks too. In a DB, tables change a lot at the beginning, but after some time they don't change much.

But the usage of the entities still changes frequently in my experience.

Simplified Example:

January:

  step1
  step2
  step3

March:

  if entityA.x then
    step1
  end if
  step2
  step3

July:

  if entityA.x and entityC.z then
    step1
  end if
  step2
  step3

December:

  if entityB.y then
    step1
  end if
  if not entityC.r then
    step2
    step3
  end if

Here the task steps remain fairly stable; however, information from many different entities comes and goes into play.

The most variant parts are which entities affect behavior.

That is my impression of the most frequent changes in biz apps, and modeling from the start around entities would obviously cause more code changes and movements. I have worked on two applications where almost every major table was involved one way or another in price (discount) calculations.

If an OO'er walked in in March, they would be inclined to put step 1 with entity A. However, if you look at the longer-term view, then you would see that this would be a bad decision.

Hard Partitioning

Although hard partitioning is not native to object oriented designs, it seems to be more common there. It is another manifestation of the IS-A philosophy that under-pins much of OOP.

A typical example of hard partitioning is in a work-flow application. A given item may "move" from a status of "pending" to "in-review" and later to "reviewed", or perhaps "rejected" or "passed". Some designs have separate physical buckets (or at least hard-coded buckets) that they place each item into as it is being processed.

For example, they may have several collections, one called "pending_collection", another called "rejected_collection", etc. This is understandable, because it fits the visual impression that one may initially have of the process. However, the fault with this is that it is hard-wiring a single aspect (see above) into the design.

Suppose you did have all these separate collections defined. Then the boss comes up and asks you for "all items submitted by Jane Doe between January and June". The current status of the item may be part of such a report, but it is only a minor player, one column among many. You now have to combine all these collections to make the report. You have to combine them because they are split on an aspect that is not important to the new request.
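The cost is easy to see in a small Python sketch of the hard-partitioned design. The collection names follow the text; the items, dates and the function items_by_submitter are hypothetical.

```python
from datetime import date

# Hard partitioning: one physical collection per status.
pending_collection = [{"id": 1, "submitter": "Jane Doe", "submitted": date(2002, 2, 10)}]
rejected_collection = [{"id": 2, "submitter": "Jane Doe", "submitted": date(2002, 5, 3)}]
passed_collection = [{"id": 3, "submitter": "Bob Roe", "submitted": date(2002, 3, 1)},
                     {"id": 4, "submitter": "Jane Doe", "submitted": date(2002, 8, 9)}]

def items_by_submitter(name, start, end):
    # The request has nothing to do with status, yet this routine must know
    # every status bucket and glue them back together before it can filter.
    # A new status bucket means editing this routine as well.
    everything = pending_collection + rejected_collection + passed_collection
    return sorted(item["id"] for item in everything
                  if item["submitter"] == name and start <= item["submitted"] <= end)

print(items_by_submitter("Jane Doe", date(2002, 1, 1), date(2002, 6, 30)))  # [1, 2]
```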

In other words, the new request goes against the "grain" of partitioning; an orthogonal dimension to the design. We now require more effort to work around the hard-wired partitioning by status. Viewing information by multiple aspects is becoming more and more important in business. Perhaps in the old days, when only the most important aspect of a business was automated, hard partitions may have been satisfactory and very efficient. However, the world is growing up, and wants more complex, cross-cutting analysis.

The best approach is often "virtual partitioning". All such items should be in one collection (such as a table). The status code is one attribute (field) among many others. We don't elevate its status above other attributes because each attribute may play a role under different circumstances. Some of the circumstances may not be foreseeable at design time. Thus, it is usually a safer bet to prepare for multiple partitioning dimensions.

This approach allows us to supply a virtually partitioned collection when needed, but not sacrifice easy access to other partitions when needed. If we want to view by a particular status, we can issue a query or view such as:

s = SQL("select * from Items where iStatus = 'pending' ")

(I am not saying that SQL is always the best way to do virtual partitioning, just the most common.)
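Here is the same data in one table, sketched with Python's built-in sqlite3 module. The Items table and iStatus column follow the query above; the other columns and the rows are hypothetical. Both the status "partition" and the Jane Doe request are just queries against the one collection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Items (
    id        INTEGER PRIMARY KEY,
    submitter TEXT NOT NULL,
    submitted TEXT NOT NULL,   -- ISO dates, so text comparison orders correctly
    iStatus   TEXT NOT NULL
)""")
conn.executemany("INSERT INTO Items VALUES (?, ?, ?, ?)", [
    (1, "Jane Doe", "2002-02-10", "pending"),
    (2, "Jane Doe", "2002-05-03", "rejected"),
    (3, "Bob Roe",  "2002-03-01", "passed"),
    (4, "Jane Doe", "2002-08-09", "passed"),
])

# The status "partition" is just a query...
pending = conn.execute("SELECT id FROM Items WHERE iStatus = 'pending'").fetchall()

# ...and so is a cut along a completely different aspect, with status
# demoted to one column among many.
janes = conn.execute(
    "SELECT id, iStatus FROM Items "
    "WHERE submitter = ? AND submitted BETWEEN ? AND ? ORDER BY id",
    ("Jane Doe", "2002-01-01", "2002-06-30"),
).fetchall()

print(pending)  # [(1,)]
print(janes)    # [(1, 'pending'), (2, 'rejected')]
```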

There is one caveat to this approach. Suppose we start out a consumer-based store system by deleting all cancelled orders. After the system is well established, the request comes in to keep all the cancelled orders instead of deleting them.

If we introduce a status code in order to separate them, then we may have to change a lot of programming code in order to make sure every spot that uses order data knows to filter out cancelled orders.

It is possible that we may forget a few spots. Thus, it may be less risky to introduce a second collection (or table) that stores only the cancelled orders.
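A sketch of that second-table approach with Python's built-in sqlite3 module. The table names, columns and cancel_order routine are hypothetical. Cancelling moves a row instead of deleting it, so every existing query against orders keeps working without a new status filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders           (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    CREATE TABLE cancelled_orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'Ann', 20.0), (2, 'Bob', 35.5), (3, 'Cy', 12.0);
""")

def cancel_order(conn, order_id):
    """Move an order to the side table instead of deleting it."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute("INSERT INTO cancelled_orders SELECT * FROM orders WHERE id = ?",
                     (order_id,))
        conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))

cancel_order(conn, 2)
print(conn.execute("SELECT id FROM orders ORDER BY id").fetchall())  # [(1,), (3,)]
print(conn.execute("SELECT id FROM cancelled_orders").fetchall())    # [(2,)]
```

Reports that do need both live and cancelled orders will have to combine the two tables, which is the hard-partitioning cost described earlier, accepted here to limit the risk to existing code.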

One more possible reason is that the database system may not offer row-level security. Thus, one has to have similar or duplicate table schemas in order to separate rows for security reasons.

IBM Research Into Hyper-space

Here is a reference to IBM Research on multiple aspects. Although there are some wonderful papers listed describing the problem, their "solution" still needs more work.
