Programming Organization Goals and Metrics

Goals and Metrics
in Programming Organization

How well does OOP satisfy these goals
compared to other paradigms?
Updated 9/20/2002

Descriptive Metrics
Physical vs. Mental Metrics
Scenario Analysis and Numerical Metrics
Capers Jones LOC Critique
Goto Introspection Lesson
Fighter Jet Analogy
Partial Apology to Perl Fans

Descriptive Metrics

Protection

Protection is the feature of protecting elements from improper, usually careless, actions or associations. It often takes the form of rules or tags on programming elements which say, "You are not allowed to do such and such," or "You must do such and such or I will not compile you."

The earlier in the process a problem is detected, the better the "protection" is considered:

At compile or syntax checking (lint) time.
At load time (such as loading an EXE file into memory or a script into an interpreter).
At execution (run) time.
Long after the mess was made.

Most languages have at least some amount of protection, otherwise mistakes would usually result in memory space violations or program aborts beyond the interpreter or language handlers.

Protection often adds limits to scope, context, ranges, and usage. Protection often, but not always, takes the form of clauses or modifier keywords. For example, in Java the "abstract" keyword prevents a class from being instantiated directly; it must be subclassed first. Similarly, the Java "final" keyword prevents one from formally extending (inheriting from) any class marked as "final."

Array bounds checking is also a form of protection. Without it, a program may end up rewriting itself when the index goes out of bounds.

I once had a C program that inadvertently rewrote its I/O handlers when a parameter of the wrong size was inadvertently passed. (The compiler never complained.) I had a tough time debugging it because the results of the code scrambling were not apparent at the point of derailment. (Unfortunately, many C++ programmers base their poor opinion of procedural programming on C. It is almost like using Hitler as an example of all Germans.)

I will fully agree that protection is one of OOP's potential strong points; much more so than "reuse" (below), which is often where the claims are given. However, there may be tradeoffs that we will consider.

Data and behavior can be tightly integrated so only a predetermined set of operations can operate on or "see" any given object. Also, variables can be designated to only be changed by methods within their surrounding class. Similarly, methods can be designated to only be called from within their own class. OOP often provides many program-enforced ways to allow the programmer to carefully control the interfaces into objects or classes. Many OO languages also allow template classes (such as interfaces or abstract classes) which, when referenced, require an implementing class to implement all methods designated as such.
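As a rough illustration of the "template class" idea, here is a minimal Python sketch using the standard abc module. The names Shape, Square, Blob, and try_make are hypothetical and chosen only for this example; other OO languages express the same rule with interfaces or abstract classes.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical template: every concrete shape must supply area()."""

    @abstractmethod
    def area(self):
        ...


class Square(Shape):
    def __init__(self, side):
        self._side = side  # leading underscore: private by convention only

    def area(self):
        return self._side * self._side


class Blob(Shape):
    """Forgets to implement area(), so it cannot be instantiated."""


def try_make(cls, *args):
    """Return an instance, or the TypeError raised at creation time."""
    try:
        return cls(*args)
    except TypeError as err:
        return err
```

Note that Python reports the missing method only when Blob is instantiated, which is run-time detection in terms of the list of detection stages earlier in this section. A statically compiled OO language would typically catch the same mistake at compile time.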

Although most of these protection features are not required in order for a language to be considered OO, many are nevertheless easier to enforce in an OO paradigm; or at least better explored.

Protection in procedural paradigms has not been explored or documented as well as OO protection. Triggers, referential integrity, and stored procedures are often used with relational paradigms to create forms of protection. (One advantage of DB-level protection is that the data is protected regardless of paradigm or the language used.)
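To make the DB-level protection point concrete, here is a small sketch using Python's built-in sqlite3 module. The customer and invoice tables, their columns, and the helper names are all made up for illustration. The rules live in the database, so any program in any language that writes to these tables is held to them.

```python
import sqlite3


def make_db():
    """Build an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE invoice ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount REAL NOT NULL CHECK (amount >= 0))"
    )
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
    return conn


def try_insert(conn, customer_id, amount):
    """Return True if the row was accepted, False if the database refused it."""
    try:
        conn.execute(
            "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, an invoice for customer 1 with a non-negative amount is accepted, while a negative amount (CHECK constraint) or an unknown customer (referential integrity) is refused by the database itself.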

I fully agree that protection is a worthy goal. However, there may be significant tradeoffs in OOP's protection and perhaps in too much protection in general for many application domains.

For one, we have already seen that tight associations between things (objects and data) and behaviors do not model the real world very well. Most real world objects and entities do not walk around or get created with formal restrictions on what actions can be done to them. In our example we saw that crime might be greatly reduced if we could put more built-in restrictions on who can do what to who.

However, lack of a fit to the real world does not necessarily mean that OOP is not a good way to model things inside a computer. But, it may be an indicator of an idealism taken too far.

Second, there may be a tradeoff between internal integration and external integration. The internal "world" is the world directly controllable through OOP's classes and constructs. Once information is converted into "proper" OOP objects, the power and protection of OOP can be fairly easily realized. However, conditioning (mapping, converting, and screening) the data to be acceptable to an internal OO application's world may be a greater burden than if the guard was let down a little or decided by the programmer on a case-by-case basis. The burden of making the data kosher may be greater than the savings and benefits of tight internal coordination.

Finally, taking advantage of OO's brand of protection often requires "large types" (subtypes) to be a useful modeling technique. This is often not the case for custom business applications.

Note that sometimes "encapsulation" is considered a synonym for protection. However, encapsulation may imply a physical grouping, which is really "proximity" (see below) instead of protection. Protection does not imply anything about the physical location of elements in relation to one another even though physically grouping things together may help "protect" them in some situations.

See Also:
Data Protection
Definition and Merit of Scripting

Proximity

Proximity is the goal of having related items placed physically together in our software. This simplifies software maintenance and inspection by reducing the jumping around needed to make changes or inspect code. Traditional (procedural) programming tended to group like items by behavior. However, OOP allows one to also group by subclass. (See our discussion about Control Table theory for a deeper discussion of these two groupings.)

OOP proponents often claim that the subclass grouping (SOMPI) is superior to the procedural grouping (SIMPO). However, this has not been shown to be the case a significant majority of the time. Grouping by operation (SIMPO) has its own benefits at times.

This tug-of-war between the two arrangements has prompted us to combine the two arrangements so that we can have both groupings at the same time. How can this be done? By adding a dimension. OOP code and procedural code are generally one-dimensional in concept. However, Control Tables are two-dimensional in concept. Thus, one is not forced to pick one grouping at the expense of another.
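The two-dimensional idea can be sketched in a few lines of Python. This is only one assumed reading of a Control Table, since the full discussion lives elsewhere: a grid keyed by (subtype, operation), which can be viewed by row or by column. The shape names and handlers are hypothetical.

```python
# Hypothetical grid: rows are subtypes, columns are operations.
# Each cell holds the code for one (subtype, operation) pair.
CONTROL_TABLE = {
    ("circle", "draw"): lambda size: f"circle of radius {size}",
    ("circle", "area"): lambda size: 3.14159 * size * size,
    ("square", "draw"): lambda size: f"square of side {size}",
    ("square", "area"): lambda size: size * size,
}


def by_subtype(table, subtype):
    """Row view: every operation for one subtype (the subclass grouping)."""
    return {op: fn for (sub, op), fn in table.items() if sub == subtype}


def by_operation(table, operation):
    """Column view: every subtype for one operation (the procedural grouping)."""
    return {sub: fn for (sub, op), fn in table.items() if op == operation}
```

Neither view is privileged: by_subtype(CONTROL_TABLE, "circle") gathers what a circle class would hold, while by_operation(CONTROL_TABLE, "area") gathers what a procedural area routine would hold.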

Part of this gets into the issue of code indexing so that related code can be found. If we look at the world of document indexing and RDBMS, queries can be on multiple criteria, and with complex expressions. OOP and its single-class "encapsulation" is limited. It is like being forced to have only one field/parameter for all code queries. In other words, it does not scale. It is yet another example of IS-A thinking. To be fair, procedural design has also been traditionally limited to one code indexing aspect (routines). One possible solution is a variation on Control Tables, where RDBMS can be used to index code segments on multiple criteria. A glimpse can be seen in the Multiple Dispatch Pattern example. It is an area that can use more research even if you reject using RDBMS technology.

The bottom line is that OOP has no clear victory in the proximity category. It allows a proximity alternative that procedural programming does not do very well, but it still forces an awkward tradeoff on the software designer.

Internal Reuse

In this write-up we are going to distinguish between internal reuse and external reuse. Internal reuse is code reuse within the same or closely related project. Often "frameworks" are built to support a family of applications which share commonalities. For example, a company may produce a family of graphics applications which may share a core graphic rendering engine.

It appears that the goal of internal reuse and external reuse often conflict with each other because they require different granularity (chunk sizes) in the parts intended for reuse.

I will agree that OOT does fairly well at this type of reuse, especially if RDBMS are not involved. However, the need for "application families" is not that common in custom, internal business software. Most companies have only one accounting system, one inventory system, one billing system, etc. Except for some aspects of the user interface perhaps, these different sub-systems share very little potential core logic.

Note that inheritance is often cited as a reuse tool because subclasses do not have to re-implement inherited methods and attributes. However, I have found that procedural programming and other paradigms can provide very similar "default" behavior when needed. (It is usually a very different structure, but is not more overall code.) This approach also has the advantage that the features don't have to fit into a lockstep pattern usually needed for subtyping. Thus, inheritance's contribution to reuse is way over-hyped.
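One way such "default" behavior might look without inheritance is a lookup that falls back to a shared handler. This is a hedged sketch, not a description of any particular system; the customer types, operations, and the 5% rate are invented for the example.

```python
def default_tax(amount):
    """Shared default: a made-up flat 5% rate."""
    return round(amount * 0.05, 2)


def default_label(name):
    return name.upper()


def no_tax(amount):
    return 0.0


# Defaults that apply to every customer type unless overridden.
DEFAULT_HANDLERS = {"tax": default_tax, "label": default_label}

# Each customer type lists only what differs from the defaults.
OVERRIDES = {
    "nonprofit": {"tax": no_tax},
}


def handler_for(customer_type, operation):
    """Use the type-specific handler if one exists, else the default."""
    return OVERRIDES.get(customer_type, {}).get(
        operation, DEFAULT_HANDLERS[operation]
    )
```

Here "nonprofit" overrides only the tax rule and picks up the default label rule for free, much as a subclass would, but overrides can be mixed per operation without forcing the types into a hierarchy.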

External Reuse

External reuse is the ability to take parts of one application or library and transplant them into a completely different one, usually created at a different time or organization.

It is my experience that taking advantage of this type of reuse generally requires either very small components, or components with relatively simple, well-defined interfaces.

If the components have complex interfaces or require interfacing with complex structures (tables, classes, etc.), then it is often just as easy to rebuild the components from scratch anyhow. This is because the systems often model the same type of things in very different ways.

Even though two similar companies both have payroll operations, their data modeling will probably be very different. It is similar to the way that different spoken languages use very different structures to say the same thing (especially when you compare languages from different continents). Such things just do not have a one-to-one correspondence and require complex and risky re-mappings. Anyone who has ever been involved in total system conversions will realize this. The combinations and philosophies make for zillions of different ways to do and model the same general operations.

The protocol coupling analogy illustrated the "tying" nature of many OO designs. This makes the pieces more conceptually dependent on each other, and thus the whole OO application, or at least very large chunks of it, must be transplanted together.

Overall, promoted usages of OOP, like protocol coupling, often work against external reuse.

See Also:
The Reuse Dilemma (reference v. copy reuse)

Self Documentation

OOP often provides structure and element modifiers that formally state limitations of who or what can read or modify what. Keywords like "private", "protected", "final", etc. often appear in OOP languages.

It is often said that these have two purposes: 1) to formally protect, and 2) to document that something should be protected.

Although such features are "nice," their use does not significantly improve the overall program in my opinion. First of all, since they provide very little immediate benefit, many programmers will avoid using them. Second, they can be eliminated if the follow-on programmer does not want to "obey" them. Third, using regular comments instead of formal keywords keeps the language simpler.

Fourth, this type of documentation only solves half of the problem. To be effective, a programmer needs to know:

1. What should be protected
2. Why it should be protected

The OOP keywords generally do not tell WHY something should be protected. This often requires further comments. If further comments are needed anyhow, then including keywords is somewhat redundant. It seems to me that you might as well just supply good comments to begin with rather than try to formalize a spanking system.

Simplicity

This is an attempt to keep the code simple. A common (but imperfect) measurement is the number of "tokens" needed. Tokens will be defined as variables, objects, methods, keywords, operators, specifiers, etc.

Whether OOP makes the program simpler, more complicated, or whatnot is highly contested. Some argue that the protective nature of OO philosophy produces more code. For example, one is often encouraged to write "set" and "get" methods instead of simple assignments to attributes:

   method getAmount() {
     return this.Amount
   }
   method setAmount(t) {
     if valid(t) {
       this.Amount = t
     } else {
       exception ...
     }
   }

I find these cumbersome, especially when there are a lot of fields (attributes). Bloating up the code with these repetitious structures makes useful stuff harder to find and read. Validation is an important operation, but there are many other ways to deal with it besides building Set methods for every single attribute. Data Dictionaries are one approach that I prefer. (Set/Get operations are not needed in all OO languages. See also Double Dipping.)
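For comparison, a data-dictionary style of validation might look roughly like the sketch below. This is an assumption about what such an approach could be, not a description of a specific tool: the field names, rule keys, and limits are hypothetical. The point is that the rules sit in one table and a single routine applies them, instead of one Set method per attribute.

```python
# Hypothetical data dictionary: one row of rules per field.
DATA_DICTIONARY = {
    "amount": {"type": float, "min": 0.0, "max": 1_000_000.0, "required": True},
    "quantity": {"type": int, "min": 1, "max": 9_999, "required": True},
    "note": {"type": str, "required": False},
}


def validate(record, dictionary=DATA_DICTIONARY):
    """Return a list of error strings; an empty list means the record passed."""
    errors = []
    for field, rule in dictionary.items():
        value = record.get(field)
        if value is None:
            if rule.get("required"):
                errors.append(f"{field}: missing")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above {rule['max']}")
    return errors
```

Adding a field or tightening a limit then means editing one row of the table rather than writing another get/set pair.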

Repetition Factoring

Repetition factoring is about removing repetition of structure and/or code by moving it into a single or fewer locations. See the "Buzzwords" entry for more on this.

Repetition factoring (RF) is different from the term "factoring" by itself. Factoring has grown to encompass too many concepts to be a useful metric or point of clear discussion.

Although repetition factoring is generally a "good thing," one point to be careful about is factoring stuff that is somewhat likely to grow apart in the future. Just because something is the same/similar today does not mean that it will be the same/similar tomorrow.

Some argue that RF should be done even for stuff that is coincidentally or temporarily similar. They argue that one can always pull them apart if they grow too different. However, altering code simply to reorganize it as opposed to adding new functionality is generally frowned upon in my experience. (Assuming that there is even a budget for such rearranging.)

Share-ability

This is the ability to share information with other languages and paradigms. I have argued that OO often makes this harder because of its tendency to tie data with behavior.

Maintainability

Maintainability is the amount of effort needed to make changes to an existing system. OOT proponents often cite maintainability as a primary benefit of OOT. However, their arguments often make assumptions about the nature of changes that often do not fit the real world very well in my opinion.

The first assumption that we already talked about is hierarchical changes.

The second problematic assumption is that most changes are favorable to the subclass grouping (SOMPI). This just may not be the case.

Code Size

This criterion is similar to simplicity (above). However, there are times when they are not the same. For example, SQL correlated sub-queries are usually smaller than their procedural counterpart. However, many find correlated sub-queries unintuitive and unreducible in the traditional sense.

Regular expressions (parsing codes) are another example of code which can be small, yet complicated to the unpracticed.
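A small Python illustration of the size-versus-simplicity gap follows. The seven-digit phone format and the function names are made up for this example. Both functions accept exactly the same strings; one is a single pattern, the other spells out each step.

```python
import re

# Compact: one pattern, but the reader must know the regex notation.
PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{4}")


def is_phone_regex(text):
    return PHONE_RE.fullmatch(text) is not None


def is_phone_procedural(text):
    """Longer, but each check is spelled out."""
    if len(text) != 8:
        return False
    if text[3] != "-":
        return False
    digits = text[:3] + text[4:]
    return all(ch in "0123456789" for ch in digits)
```

The regular expression wins on code size, yet a reader unfamiliar with the notation may find the longer version easier to verify.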

Rapid Development

This is the ability to generate an application that fits requirements very fast. There are many tradeoffs between having something ready fast and having something that will be maintainable and expandable in the longer run.

Most of the claims for RAD in OOP relate to alleged increased reuse, not so much its ability to generate completely new code. Reuse was discussed above.

Change Impact Tracing

This is about being able to trace or track the impact of changes back to as many impact points as possible. OOP's potentially tight protection may make change impact analysis easier. However, I have not explored this enough to make any definite conclusions. I suspect OO can improve change impact tracing; but at the price of code bloat, such as extra meta classes. (See also Machine-targeted syntax.)

Compiled Unit Separation

Although I generally prefer interpreted languages, having the ability to separate compiled portions for distribution often comes up in discussions. Often it comes down to aspect tradeoff issues again: one paradigm may favor B-aspect separations at the expense of C-aspect separations, for example. See also Challenge #3. (Note that some interpreters can run encrypted p-code to protect the source code from snoopers.)

Further, the granularity of compilation is very language-dependent. I have used procedural languages that could compile at the routine level and not just the module level.

There is no law that says a language cannot be made to compile at the block or even line level. Whether that is practical or needed is another issue.

Sometimes a variation of this metric is called "intrusiveness", meaning having to "touch" existing code units to make changes. The theory is that one may Bump Thy Neighbor in the change editing process and break something. In my opinion, this fear is sometimes exaggerated unless perhaps it is a medical device or astronaut life-support system. Bumping nearby code is not significantly more likely than editing the wrong file/class/routine. Many editors make it just as easy to grab the wrong file as it is to scramble a nearby code block. This kind of error may all be relative to the editing or code management tools being used. Nothing prevents a "block-lock" editor from being built to protect neighbor blocks. You could treat files as blocks and blocks as files (or put them in tables).

Consistency

Consistency is the goal or ability to keep the design consistent from one project to the next, and/or for different developers to produce designs that are consistent. It is generally easier to read and understand code if there is some predictable pattern or organizational philosophy. See Goto Discussion below for more on this.

Summary

I will agree that OOP can increase built-in protection under certain circumstances. However, there may be significant tradeoffs in this. Other OOP claims, such as reuse, are dubious and hype-linked at best.

Physical vs. Mental Metrics

One possible distinction in metrics is between "physical" and "mental" metrics. Physical metrics are those that an observer can either analyze or verify. These will usually fall into hand and eye movements in the software engineering field. Special machinery exists to even follow eye movements, but a description by the developer (user) is probably sufficient for smaller-scale studies.

A typical description may resemble, "First I check routine X to see if it has an error handler, then if I find it, I see if it is registered in the handler list". The observer(s) can then go through the same motions later and count/measure hand and eye movements.

Using only hand and eye movements seems too primitive to some. However, the alternatives get rather subjective. If the developer can describe a fairly precise algorithm (methodology) for how to think about finding stuff and performing changes (see below about change impact analysis), then such can often be turned into physical measurements.

However, often paradigm claims seem hard for developers to pin down. They may say things like, "It just helps me think about things easier". They may indeed be correct. The paradigm could indeed be a better fit for the way that they think. However, it is intellectually dangerous to extrapolate what fits their mind into other minds.

I was once in a (yet another) debate with an object-oriented proponent. He suggested that OO better mapped to human language, or at least English. (I question this, but that is another story.) I then asked him why matching verbal languages was important.

He then said something like, "I think natively in language (English), and assumed that others do also. Perhaps this assumption is not correct." I then pointed out that I often think in images first, and then translate the visuals into whatever language is the target (English, Python, VB, etc.). There are many visual artists, amateur and professional, in my family lineage. Whether most other visual thinkers also prefer Table Oriented Programming and whether most verbal thinkers prefer OOP is simply speculation at this point, but suggests interesting areas for academic study.

A related issue is whether one should change their mind to fit a paradigm or change/choose paradigms to fit their mind? The best answer probably boils down to cost-benefit analysis. If a developer must spend several years to alter his/her thinking in order to be a competitive developer, then perhaps they should select another profession.

However, there is often an assumption that there is "one best paradigm". Suppose, for the sake of argument, there are 3 viable paradigms: 1) Procedural/relational, 2) Object Oriented, and 3) Functional. Also suppose that different minds will fit different ones better than others. If the industry drifts back and forth like clothing fashion (tie width, pant flair), then some cycles may benefit certain minds over others.

It makes more sense to keep a healthy market for all three, otherwise the cost of software development may go up because those who don't fit the in-style paradigm well are either going to be less productive, or leave the market, leaving fewer practitioners and less choice. (Assuming that all three paradigms are generally equal under a "fitting" expert.) Further, shifts in cycles may disadvantage those in the current but waning cycle, despite years of practical experience.

Is perfecting the latest groove really more important than such experience? If the argument dealt with something external, such as changes in user interface design, then I might be more likely to agree. However, we are dealing mostly with internal software design here. (I agree that one should explore the current trends in order to "keep up". However, there is a point where the change is for the sake of change instead of vertical progress.)

Some will argue that a "good developer will struggle to master all paradigms and techniques". However, I am not sure this is practical, or even possible. Mastering a methodology and paradigm may take several years, especially if it is not a native "mind-fit". Further, should the "verbal thinker" (above) have to force themselves to think visually if the current in-style paradigm happens to favor that? That is a rather drastic request, especially if the verbal thinker is highly productive using verbal-centric tools. If us geeks could change our heads that easy, we wouldn't be software developers to begin with. We would be international sales executives or doctors or rock singers. Girls dig those more :-)

An exception may be if the majority of developers were a better fit to a given paradigm. This is similar to the "right-handed principle". It is generally cheaper to cater to the majority than make multiple products or tools for the "exceptions". That is why "big and tall" clothing is usually much more expensive than department-store counterparts. (The cost of the extra cloth is actually not that significant.) The demand for niche sizes is too small to take full advantage of Economies of Scale (mass production).

However, I have not seen any evidence that one paradigm is significantly preferred over others. Although OOP is the "nominal" favorite right now, many OO proponents complain that too many actual developers keep "sliding back toward procedural designs". Until better surveys are available, it is fair to assume that the various paradigms are on equal footing with regard to popularity. In other words, "equal or unknown until proven otherwise".

If one wants to get away from the issues of comparing different minds, then sticking with physical metrics is probably preferred. However, there are catches to physical metrics also. Somebody might have fast hands and slow eyes, or vice versa. Often times I would rather type in a search sub-string to greatly narrow down a list rather than read through a larger list to find what I am looking for. However, others seem to be faster at visually scanning such lists for potential matches than me. They seem to prefer to use their eyes for such tasks instead of their hands. My favored approach may also change over the course of the day as my body grows more weary. Spending most of the day typing and reviewing documentation may wear me out differently than debugging somebody else's code.

To keep a foot in objectivity, one should simply count and measure the needed hand or eye movements, and not worry about assigning weights to the various movements until later. For example, simply record the fact that approach A takes 8 mouse clicks and approach B takes 4 eye movements of about 20 degrees each. Whether a mouse click should "count more" than an eye movement is a decision that can be made later, perhaps assigned based on individuals. One could then say something like, "Approach X favors those with strong hands but weak eyes while approach Y is the opposite."
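The "count now, weigh later" idea can be sketched as follows. The raw counts come from the example in the text (8 mouse clicks versus 4 eye movements); the weights for the two hypothetical individuals are invented purely for illustration.

```python
# Raw observations: counts only, no judgment about which movement "costs" more.
OBSERVATIONS = {
    "A": {"mouse_click": 8},
    "B": {"eye_movement": 4},
}

# Hypothetical per-person weights, assigned after the counting is done.
STRONG_HANDS = {"mouse_click": 0.5, "eye_movement": 2.0}
STRONG_EYES = {"mouse_click": 2.0, "eye_movement": 0.5}


def weighted_cost(counts, weights):
    """Apply one person's weights to the raw counts (unknown kinds weigh 1.0)."""
    return sum(n * weights.get(kind, 1.0) for kind, n in counts.items())


def preferred(weights, observations=OBSERVATIONS):
    """Name of the approach with the lowest weighted cost for this person."""
    return min(
        observations,
        key=lambda name: weighted_cost(observations[name], weights),
    )
```

Because the raw counts are stored separately from the weights, the same observations can be re-scored for different people without repeating the study.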

It is curious to note that measuring programmer productivity can resemble user interface design. In essence, much of software engineering is indeed about designing an "interface" for developers. If our design makes it easier for future programmers (including possibly our future selves) to quickly and safely find and make needed changes, then many of the goals of software engineering have been met. Thus, it is the same goal as UI design, just a different audience.

Scenario Analysis and Numerical Metrics

Sometimes the issue comes up of whether one has to have an in-depth, long-term knowledge of something in order to measure the benefits. "If you had my experience, you would just know that it is better. I cannot directly describe why in normal language." I call this the "Zen copout" of some OO fans. It is anti-science in my opinion.

As a generality, the complexity of using the measuring system (ruler) for practical systems is an order of magnitude or two simpler than what is being measured.

My favorite analogy is a stopwatch to measure car speed. The person using the stopwatch does not have to understand car engines to use the stopwatch. (An even simpler approach is a same-time race. No stop-watch is needed for that except in very close races, which we are not concerned with here.)

Perhaps atomic clocks etc. may be used for very precise measurements, but these are usually for lab experiments and not for measuring practical benefits. For this discussion, we are not talking about minute difference, but measuring factors that matter to business owners and managers. Processes that involve human behavior are rarely going to require an atomic clock because of the vast variability in human behavior. Further, using an atomic clock and building one are different endeavors.

It might take a rocket scientist to build a rocket, but measuring the speed and payload ability is a simpler process. Countries without space travel technology can still track radio signals to measure a rocket's speed, for example. (Ignoring stealth mode.)

So, if this "measuring is easier than what is being measured" rule applies to most everything practical, then is measuring OOP benefits somehow immune, as the Zen camp claims?

Can one honestly say that "to measure me is to know me"?

Manager/owners with an IT shop basically care about:

1. Maximizing features
2. Minimizing labor and supplies costs
3. Minimizing turnaround time
4. Maximizing quality
5. Maximizing machine speed

(We will not address the issue of which is more important at this point. Obviously things like #5 are less an issue today than they were 30 years ago.)

The above may also be referred to as "better, faster, and cheaper" as a shortcut, where "faster" usually refers to #3 and not #5.

These are what managers usually care about; not satisfying some internal intellectual gee-whiz curiosity. Managers are not here to entertain bored developers. (Although it may help morale some, it is not the main goal.)

They will hire, fire, and promote based on what they observe for the most part. They will probably not know about OOP in detail and should not have to care.

Manager/owners have two choices:

A. Rely on their technical experts

B. Observe the results (inputs and outputs)

Managers have to rely mostly on their own observations. Technical experts may be biased, have different goals, give conflicting info, not have enough experience with consumer behavior, etc. A technical manager may want to try the latest technical fad just to gain experience. (Although it should be pointed out that letting the staff work with the "latest and greatest" may help reduce turnover, regardless of whether the new technology is really better.)

The bottom line is that "to measure me is to know me" does not hold up from the manager's perspective. If you say to them, "Paradigm C is better because it has hyperbolictricaspulation," the manager may say, "Well that sounds nice, but how does it get me better, faster, and cheaper?"

Your reply better be something along the lines of, "It puts related things together whereas the old paradigm scattered them all over." Even though this may satisfy him/her in the short run, in the long run, if he/she does not see better, faster, and cheaper, he/she may grow skeptical. (Although, often it is too late to change at that point.)

A more technical manager may ask for more evidence that it "puts related things together" better than the old approach.

Keep in mind that measuring that something is better is not the same as discovering why something is better. The stop-watch will tell you which car is faster, but it won't directly tell you why.

Now, some have mentioned "provably correct algorithms". However, so far such efforts have had a tremendous negative impact on #2 and #3 (above). Thus, provability so far is not a magic bullet. Although it may greatly help #4, it does so at the expense of other roughly equally important criteria.

It would be nice to study large samples of actual situations to see which paradigms produce better/faster/cheaper, but this is often not realistic. It also has the unfortunate side-effect of not telling "why" something is better.

Thus, the next best thing in my opinion is comparing two programs with "scenario analysis" or "change impact analysis". This is where (hopefully) typical scenarios are given to both parties on implementations with equal features. Each party should be given the opportunity to create scenarios to reduce bias. In the end, hopefully enough scenarios will be given to get a feel for how each code example stands up. (Although multiple paradigm/languages could be used for these comparisons, we are assuming 2 for example's sake.)

Typical scenarios may resemble:

"You need to add two new countries into all relevant parts of the system."

"You need to adjust the output to assume screen size B instead of screen size A."

"You need to change the units from hours to minutes."

"Change the tax schedule for France to the following schedule....."

"Client X, and only client X, wants HTML table borders to show so that they can be printed. (Borders based on color don't print in many browsers.)"

Next comes the "counting" stage. Here are some possible ways to count or measure the benefits in scenario analyses.

1. Amount of code

This is probably the easiest to verify. Note that being easy to verify does not necessarily make it the most important. There is a famous story about Soviet shoe factories. In one sector the primary metric was number of shoes. The factories found that they could make more shoes faster and with fewer resources if they made more child and baby shoes. Thus, there was soon a glut of children's shoes on the market and a shortage of adult shoes.

A reverse situation is the voluntary export car quota that Japanese manufacturers and American law makers agreed on. Because Japan was limited on the quantity of cars they could ship, they shipped more expensive cars loaded with features such as high-end stereo, leather seats, sun roofs, fancy paint jobs, etc. Japanese cars thus moved to compete with the likes of Lincoln and Cadillac instead of Chevy.

The point here is that one metric is rarely sufficient to tell the whole story, even if it is an easy one to measure. In other words, a metric's ease of measurement is not necessarily proportional to its importance.

Note that methods of measuring code are not always agreed upon. Some approaches use raw line counts. However, different languages and paradigms may emphasize fewer but wider lines, and vice versa. Thus counting at a smaller level, such as operators and operands, may be more "fair".
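
As a rough illustration of the difference between counting lines and counting operators and operands, here is a minimal sketch that uses Python's standard tokenize module. The two layouts of one statement are made up for the example, and the choice of which token types count as "content" is my own assumption, not an agreed standard.

```python
import io
import tokenize

# Token types treated as program content. Layout tokens (NEWLINE, NL,
# INDENT, DEDENT, COMMENT, ENDMARKER) are deliberately left out.
# This selection is an assumption made for the sketch.
CONTENT_TYPES = {tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING}


def count_lines(source: str) -> int:
    """Count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def count_tokens(source: str) -> int:
    """Count operators and operands, ignoring layout."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type in CONTENT_TYPES)


# Two hypothetical layouts of the same statement: one wide, one tall.
wide = "total = price * qty + tax\n"
tall = "total = price \\\n    * qty \\\n    + tax\n"

for label, src in (("wide", wide), ("tall", tall)):
    print(label, "lines:", count_lines(src), "tokens:", count_tokens(src))
```

The line counts differ by a factor of three while the token counts are identical, which is the kind of layout bias that a token-level count is meant to remove.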

2. Stopping Points

Also referred to as "hop counts" in some debates, this is a count of the number of different places that have to be visited and/or altered. In debates one will often say something like, "I had to change only one method, but you had to change 5 different routines" (per same change).

One potential issue is what constitutes a "point". For the sake of discussion, I will use a "block" as the unit unless stated otherwise. Different approaches are examined below.

I find that these hop counts often relate to which proximity (grouping) the programmer/paradigm chose to emphasize. Emphasizing one aspect often "penalizes" competing aspects. Thus, the claimant may not actually be getting the free lunch that they originally envisioned. Debates of CASE statements often involve such sticky trade-offs. Scenarios can usually be found that benefit or penalize any given proximity decision. This metric is probably where OO fans make the most conceptual mistakes in my experience.

The most objective metrics in this category are perhaps the number of "places" that need to be changed for a given set of change scenarios. Here is an example scoring sheet.

Example Scoring Sheet
for Change Impact Analysis
(Only one application instance shown here)

Scenario ID	Description	Est. Probability	Named Blocks	Unnamed Blocks	Lines
1	Add Fedex shipping	0.6	4	7	15
2	Split description into internal and external	0.2	8	12	13
3	Make categories non-mutually-exclusive	0.4	5	5	7
4	2 price groups: regular and Gold	0.3	7	12	28

Here we are using 3 variations of the "spots that require changing" metric. One can count the number of named units, such as subroutines and methods; the number of un-named blocks, such as IF and While blocks; and the number of lines-of-code that need to be changed.

See Jones below with regard to lines-of-code. Also, the smallest unit is used if multiple are applicable. For example, "method" is used instead of "class", unless perhaps un-wrapped attributes are used. A named unit is also counted as an un-named block if it is the only relevant or surrounding block. Thus, every change spot will increment each category at least by one unless it was already counted in a prior change.

All 3 are included because different people may believe that some approaches are better than others. If all 3 show about the same result (better or worse), then there should be minimal complaints. If some vary, then one can take a closer look at the actual code to study the pattern of difference. You may want to add all 3 together so as not to favor any one. (Perhaps with normalization done on the 3, based on relative totals, so that higher quantity metrics, such as lines-of-code, don't "swamp" the impact of the other metrics.)

The important thing is that all parties agree on the scoring approach. If they don't agree, then perhaps let each party score them the way they prefer, as long as they document their favored approach *before* the actual scoring to remove the appearance of bias. It may turn out that the final results do not vary significantly even under the different tally approaches. If they do differ, then further analysis to understand why it matters may give useful insights into perceived costs or narrow the scope of disagreement, as described later. In other words, total agreement may not be necessary to learn something useful.

The estimated probability is the probability that a suggested change will actually take place. "0.5" means there is an estimated 50/50 chance that particular change will be requested in the future. It is multiplied by the counts in each row. The purpose of this is to penalize based on a practical level of occurrence. For example, somebody may come up with a change scenario that totally "crushes" your favorite paradigm. You may look at it and say, "You are right, but in my experience that kind of change is rare." Factoring in probability will shrink, but not totally remove, the impact of rare changes.
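
The weighting can be sketched in a few lines. The rows below are copied from the example scoring sheet; the function and key names are hypothetical and only there for illustration.

```python
# Rows copied from the example scoring sheet:
# (scenario id, est. probability, named blocks, unnamed blocks, lines)
SHEET = [
    (1, 0.6, 4, 7, 15),
    (2, 0.2, 8, 12, 13),
    (3, 0.4, 5, 5, 7),
    (4, 0.3, 7, 12, 28),
]


def weighted_totals(sheet):
    """Multiply each count by its row's probability, then sum per column."""
    totals = {"named": 0.0, "unnamed": 0.0, "lines": 0.0}
    for _scenario, prob, named, unnamed, lines in sheet:
        totals["named"] += prob * named
        totals["unnamed"] += prob * unnamed
        totals["lines"] += prob * lines
    return totals


for metric, value in weighted_totals(SHEET).items():
    print(f"{metric}: {value:.1f}")
```

For this sheet the raw column totals are 24, 36, and 63, while the probability-weighted totals come to about 8.1, 12.2, and 22.8. Scenario 2, with its low probability of 0.2, contributes far less after weighting than its raw counts would suggest.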

Different parties may disagree over these probability values, in which case the scoring is done under each of the proposed sets of probabilities. Even if the final scores disagree, documenting that different groups perceive change differently is important. In my observation, this alone may account for the difference in perception between some paradigm proponents.

For example, Robert C. Martin, an author of C++ training materials, believes that noun taxonomies (sub-typing) are an accurate description or modeling view of many real-world change patterns. Of course, I strongly disagree, observing that features tend to drift rather independently over time on the smallest granularity levels. Thus, Martin would probably rank lower the probability of changes which go against sub-typing patterns, such as feature selection becoming non-mutually-exclusive. But, at least we know where we disagree after reviewing some examples together. This is progress. (See also the Mellor Example.)

I sometimes suspect that exposure to OO training material will bias which change patterns people notice in the world. It is analogous to repeated viewing of TV commercials of teeth-whitening toothpaste. It makes you pay more attention to the color of people's teeth, perhaps ignoring other features in the process. "Making it easier to add a new sub-type" is a common (but misleading) theme in OO training materials, for example.

The above example scoring sheet is only for one paradigm or examined code base. There would be at least two of such sheets, one for each code base submission. (I see no reason to limit it to one code-base per paradigm. Analyzing multiple approaches can reduce claims of poor representation.)
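
Here is one possible way to combine two such sheets, using the normalization idea mentioned earlier so that lines-of-code do not swamp the block counts. Sheet A holds the rows of the example sheet. Sheet B is entirely invented for illustration, so the resulting scores mean nothing beyond showing the mechanics. The function names and the particular normalization (each metric divided by its total across all sheets) are my own assumptions.

```python
METRICS = ("named", "unnamed", "lines")

# Each row: (est. probability, named blocks, unnamed blocks, lines)
SHEET_A = [  # rows from the example scoring sheet
    (0.6, 4, 7, 15),
    (0.2, 8, 12, 13),
    (0.4, 5, 5, 7),
    (0.3, 7, 12, 28),
]
SHEET_B = [  # hypothetical second code base; numbers are invented
    (0.6, 2, 9, 20),
    (0.2, 6, 10, 18),
    (0.4, 9, 6, 5),
    (0.3, 5, 14, 30),
]


def weighted_totals(sheet):
    """Probability-weighted sum of each count column."""
    return {
        metric: sum(row[0] * row[i + 1] for row in sheet)
        for i, metric in enumerate(METRICS)
    }


def normalized_scores(sheets):
    """Give each metric equal weight by dividing by its total across sheets.

    Returns one combined score per sheet; lower means less change impact.
    Assumes every metric has a non-zero total across the sheets.
    """
    totals = {name: weighted_totals(sheet) for name, sheet in sheets.items()}
    scores = {}
    for name in sheets:
        scores[name] = sum(
            totals[name][m] / sum(t[m] for t in totals.values())
            for m in METRICS
        )
    return scores


scores = normalized_scores({"A": SHEET_A, "B": SHEET_B})
for name, score in scores.items():
    print(f"code base {name}: {score:.3f}")
```

Because each metric is scaled to sum to 1 across the sheets, the combined scores always add up to the number of metrics (3 here), and no single high-volume metric dominates the result.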

Finally, note that the above example is only counting spots that actually change. It may be possible to count spots that have to be reviewed for changes, but this appears to be a tougher metric to track and measure. However, it would make an interesting area of metric research.

3. Seek Time

This is the time it takes a programmer to find the spots that need updating. Ideally, actual times would be measured with a stopwatch and aggregated over many situations.

However, in the absence of such empirical studies, a trace of the mental and physical steps may be the next best thing. Example:

Step 1: Boss gives you a screen-shot of a form where he manually drew in a new field to be added.

Step 2: You see that it is an HR screen (based on inspection and conjecture) and that it has the title, "New Employee"

Step 3: You go to the HR code directory and search for any code file containing "New Employee".

Step 4: About 3 matching files turn up. You inspect each one until you find the routine containing the title that was displayed on the form sample.

Step 5: You inspect the routine to see what table(s) it uses.

Step 6: You add the new field into the screen and into the table.

Step 7: You realize that the "Change Employee" screen will also need the new field. (We will assume that the original programmer did not factor the similarities between Add and Change together into one routine.)

Step 8: Add new field to Change Screen.

Step 9: You test the new code.

(Note that this scenario is for example only and not necessarily meant to illustrate best practices.)

In the end, the number of steps could be tallied up by each side.

However, there may be some issue about what counts as a "step". Also, some steps may be bigger than others. For example, one approach may require one to search through 100 files and another require one to search through 1,000 files.
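
To show why step counts alone can mislead, here is a tiny sketch that records a size next to each step. The traces are shortened, hypothetical versions of the one above; only the 100-versus-1,000 file contrast comes from the text, and "files searched" is just one possible measure of step size.

```python
# Each trace is a list of (step description, files to search).
# Both traces are hypothetical and abbreviated for illustration.
TRACE_A = [
    ("Identify the screen from the screen-shot", 0),
    ("Search the code directory for the title", 100),
    ("Add the field to the screen and table", 0),
]
TRACE_B = [
    ("Identify the screen from the screen-shot", 0),
    ("Search the code directory for the title", 1000),
    ("Add the field to the screen and table", 0),
]


def summarize(trace):
    """Report both the raw step count and the total search size."""
    return {
        "steps": len(trace),
        "files_searched": sum(files for _desc, files in trace),
    }


print("A:", summarize(TRACE_A))
print("B:", summarize(TRACE_B))
```

Both traces tally to the same number of steps, yet one involves ten times as much searching.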

Thus, raw step counts are only a rough guide. One is going to have to study the actual step descriptions to get a full feel. The difference may be subjective and/or dependent on one's tools and/or skills with those tools. For example, somebody who has mastered a good text search utility will be able to find stuff quicker and easier. Further, some people have fast fingers and slow eyes, and vice versa. The two camps would rank the steps differently; the former would rather type more to avoid visual work. (See prior section for more on physical metrics.)

Each side may be called upon to give further details about any given step.

Although this approach is fraught with many potential points of contention, it is nevertheless probably one of the most useful.

4. Defect Counts

Defect counts are tough to gather using scenario analysis. A full empirical study may have to be performed to gather this kind of information.

However, defects can sometimes indirectly be inferred from some of the above. For example, the more places that have to be changed, the more likely it is for a programmer to miss one or mistype one of them.

Also, some errors may be detectable via the compiler in some language-paradigm combinations. For example, CASE statements usually do not complain until run-time about a value that is not in their list. But some strongly typed OOP languages can detect at compile-time that a variation failed to implement a method if that method was designated as "required" via some abstract (parent) class keyword. (This is one of the few areas where I give OO a potential advantage, though the situations that could take advantage of it are not that common in my domain.)
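
The contrast can be sketched in Python, with one important caveat: Python is dynamically typed, so the "required method" check below fires when the object is created rather than at compile-time. A statically typed language would reject the equivalent code before it ever runs. The carrier names and cost figures are hypothetical.

```python
from abc import ABC, abstractmethod


def shipping_cost_case(carrier: str, weight: float) -> float:
    """CASE-style dispatch: an unlisted value is noticed only when it arrives."""
    if carrier == "ground":
        return 1.0 * weight
    elif carrier == "air":
        return 3.0 * weight
    raise ValueError(f"unhandled carrier: {carrier}")


class Carrier(ABC):
    """Parent class that marks cost() as required."""

    @abstractmethod
    def cost(self, weight: float) -> float: ...


class Ground(Carrier):
    def cost(self, weight: float) -> float:
        return 1.0 * weight


class Fedex(Carrier):
    """New variation whose author forgot to implement cost()."""


try:
    Fedex()  # rejected at creation, before any cost() call is attempted
except TypeError as exc:
    print("caught:", exc)
```

In the CASE version, a forgotten "fedex" branch goes unnoticed until that value actually shows up in the data. In the class version, the omission is caught as soon as anyone tries to create the incomplete variation.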

5. Training Costs

This one is also hard to measure without extensive empirical studies. Another difficulty is isolating "switch over" factors. For example, some OO fans suggest that learning procedural programming first "ruins" a programmer's mind, so that learning OO is harder than it would otherwise be. However, perhaps such a programmer would not go into programming to begin with if they faced OO early on. Different paradigms may require and attract very different kinds of minds.

Another kink is that a paradigm which requires more training may result in more benefits in the end. For example, say paradigm A requires 3 years of training to get to productivity level 30. (After 3 years let's assume the benefit returns generally plateau.) Yet, paradigm B requires 6 years, but results in a productivity level of 50. The final answer about "which is best" may depend on the actual supply and demand of the labor pool. A programmer that is 40 percent more productive may cost 50 percent more.
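
The arithmetic behind that last point can be sketched as follows. The productivity levels are the ones from the example; treating "output per unit of cost" as the comparison is my own simplifying assumption, and it ignores the extra years of training.

```python
def output_per_cost(productivity: float, cost: float) -> float:
    """Productivity bought per unit of salary."""
    return productivity / cost


def break_even_cost_ratio(prod_a: float, prod_b: float) -> float:
    """How much more B may cost than A before A becomes the better buy."""
    return prod_b / prod_a


baseline = output_per_cost(1.0, 1.0)
premium = output_per_cost(1.4, 1.5)  # 40% more productive, 50% more costly

print(f"baseline: {baseline:.3f}, premium programmer: {premium:.3f}")
print(f"break-even cost ratio for levels 30 vs 50: "
      f"{break_even_cost_ratio(30, 50):.2f}")
```

A programmer who is 40 percent more productive at 50 percent more cost delivers less output per dollar than the baseline. For the level-30 versus level-50 example, paradigm B stays the better buy only while its programmers cost less than about 1.67 times as much.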

So, I ask you, is demonstrating OO's superiority beyond the type of "scenario analyses" I described?

If so, why?

Why is it different from measuring almost any other technical endeavor where the metrics are simpler than the implementations?

What makes OO special such that it is immune to input/output and time/motion metrics, unlike practically anything else in the practical world?

Capers Jones LOC Study

Capers Jones did a study of lines-of-code (LOC) per function-point (features). Some OO fans have used this study as "proof" that OO uses less code. However, there are some potentially large problems with that study:

"Cultural Factors" not factored out. One of the largest selling points of OOP is/was "reuse". Thus, organizations who buy into OOP are more likely to pay attention to intra-application reuse (among other types of reuse) because it was sold to them as such. If you pay good money to have a gas-mileage improvement device installed in your car, you are thereafter more likely to monitor gas mileage. In OOP's case, it may be a self-fulfilling prophecy. Whatever management pays more attention to is likely to encourage programmers to do the same because you-know-who signs their paychecks. I often have to maintain procedural code that has very poor repetition factoring. It is not at all because the paradigm cannot handle it, but almost exclusively because the perpetrators are rarely punished. Repetition factoring would probably reduce a good many programs by at least half LOC. The fixes are often even conceptually quite easy. Copy-N-Paste syndrome is rampant.
Procedural programming is often used by bare-bones scripting languages and assembler because it is easier to implement. Thus, the average LOC for OOP may end up being compared to languages like assembler.
OOP tends to stretch horizontally instead of vertically. In other words, OOP lines are perhaps longer on average than procedural lines. Thus, counting raw lines as a measurement of information density may be misleading. Perhaps counting tokens and operators instead of "lines" or "statements" would be a better metric.
Educational background of OO versus non-OO programmers. OO may appeal to those with more software engineering education and/or general programming experience.
No side-by-side inspect-able examples have confirmed such claims. I have repeatedly asked OO fans to provide side-by-side code of the same application/example that proves that OO can do the same thing in less code than procedural/relational business applications. Nobody has met the challenge. (Some say the benefits only appear in larger applications, which makes one want to see Jones's LOC statistics by application size.) If one truly knows the reasons for the differences, it is usually fairly easy to construct exaggerated examples that demonstrate what something does better, regardless of project size, but OO fans have strangely not been able to do such either. (Some have tried to compare bad code/languages to good OOP, which went over like a lead balloon.)
The differences within the paradigms are quite wide, suggesting that paradigm is only a minor player.
Nobody has been able to show or articulate exactly why OOP uses allegedly less code. (I have seen examples regarding device drivers that may qualify, but their structural patterns don't extrapolate to business applications for the most part.)
Jones admitted that the study was preliminary.
It appears that some or many of the scores for various languages were extrapolated from the scores of other languages deemed to be similar. Thus, actual studies of a given language may not have taken place, or its score may have been partially influenced by other language scores if there were insufficient data points.

Also keep in mind that "less code" is only one of many useful metrics. For example, some languages can do equivalent stuff with less code simply by using "implied constructs" which make the code harder to follow and less modification-friendly. Perl's "$_" family of implicit default variables is an example in my opinion.

Discrepancy?

Peter Douglass had this to say on Usenet:

.....
Ah, yes, I have seen this function points analysis before, and I must say
that I personally do not find it very credible.  If you compare the results
on that page with the results of

haskell.org/papers/NSWC/jfp.ps

you will see that there is wide discrepancy:

If we compare number of lines to implement the "function points" vs number
of lines to implement the geo-server in the Haskell vs Ada vs paper we see
the following:

Language |  function-points |  geo-server |  ratio

Haskell          38                85         2.24
Ada              71               767        10.80
Ada9x            49               800        16.32
C++              53              1105        20.85
Awk              21               250        11.90
Proteus         107               293         2.74
 
(these are the only languages with entries in both tables).
 
In other words, there is over an order of magnitude variation between the
results obtained by the function points method and the results of an
empirical study.  This certainly casts a shadow on the validity of function
point analysis applied to different languages.
 
--PeterD
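
The ratio column in the quoted table can be recomputed from its two count columns. This is only a sketch for checking the arithmetic; the variable names are mine, and the numbers are copied from the table as quoted.

```python
# (language, function-points column, geo-server column) from the quoted table
ROWS = [
    ("Haskell", 38, 85),
    ("Ada", 71, 767),
    ("Ada9x", 49, 800),
    ("C++", 53, 1105),
    ("Awk", 21, 250),
    ("Proteus", 107, 293),
]

# Ratio of geo-server lines to function-point lines for each language.
ratios = {lang: geo / fp for lang, fp, geo in ROWS}

# How far apart the largest and smallest ratios are.
spread = max(ratios.values()) / min(ratios.values())

for lang, ratio in ratios.items():
    print(f"{lang:8s} {ratio:6.2f}")
print(f"largest ratio / smallest ratio = {spread:.1f}")
```

The recomputed ratios agree with the quoted column to within rounding, and the largest ratio (C++) is a little over nine times the smallest (Haskell), which is the variation the post describes as an order of magnitude.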

Goto Introspection Lesson

A debate once broke out where one participant claimed that OOP is a "step up" from structured programming (procedural with "blocks") the same way that structured programming was a step up from Goto programming (as in "goto line 30"). Although this is comparing apples to oranges, it did bring up an interesting question.

Being that OO fans have a hard time articulating the alleged advantages of OOP (beyond textbook doctrine, which falls under scrutiny anyhow), we wondered if it was possible to describe exactly why block programming is "better" than Goto's. This could serve as a microcosm for the larger OO versus procedural/relational debate. Thus, our articulation skills could be tested on something smaller as practice and a little pre-study.

The original article by Edsger W. Dijkstra that triggered the demise of Goto's (Communications of the ACM, Vol. 11, No. 3, March 1968, pp. 147-148) did not provide much help. It said something along the lines of "being easier to figure out where you have been". I don't think this has been mathematically proven. The paper needed exact math or code examples rather than words to prove its point.

Being that I spent about a year doing Goto programming in my earlier days, I thought about the difference. I did not like heavy use of Goto's that much (when shown block techniques) even though I started learning with goto's. But, heavy introspection has more or less failed to describe exactly why.

The reason that I mentioned that I learned on goto's early is that some OO fans say that learning procedural first "ruins" your mind to OOP, and de-programming takes many years. Thus, I wanted to highlight that I was not "ruined" or pre-disposed to blocks by learning blocks first. (The "ruin mindset" claim of those OO fans is rather extreme if you ask me.) I was pretty much exposed to both blocks and goto's at the same time. Some shops and campuses used old language versions that did not support blocks, usually for budgetary reasons.

But, something did occur to me. I did not mind my own Goto code that much. There were certain conventions that I followed. As long as I stuck with these conventions, the results were not that bad, at least not to me. Even today there are some algorithms that would be simpler (IMO) with some Goto's sprinkled in under my conventions. (I tended to use Goto equivalents of IF-blocks for smaller statement units. Thus, the IF block was a welcome addition to me.)

The real problem was somebody else's code. Their Goto's looked like spaghetti to me. I don't know whether their convention was simply different from mine, or that they lacked a convention altogether. Unfortunately, I never asked if the others used a convention system. Just like me, they never bothered to document it; let alone explicitly analyze their approach to see if there was a pattern.

Thus, from my perspective it is not Goto's themselves that are the real culprit, but Goto's in the hands of other programmers besides myself. The block system (structured) is more consistent from programmer to programmer. Certainly not identical, but much more consistent than Goto code. For example, one never goes back up in a routine without using a loop. Thus, if there are no loop blocks (While, For, Until), then you know the "pointer" never moves up, only down. The block "types" signal what kind of movement patterns to expect.
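
The loop rule is simple enough to check mechanically. Here is a minimal sketch for Python source, using the standard ast module. It only looks for loop statements; comprehensions, recursion, and exceptions are ignored, so it is a rough illustration of the rule rather than a real flow analyzer. The function name and the sample snippets are hypothetical.

```python
import ast

# Statement types that allow control to move back up within a routine.
LOOP_NODES = (ast.For, ast.While, ast.AsyncFor)


def can_move_upward(source: str) -> bool:
    """Return True if the code contains a loop block.

    Under the loop rule, code without loop blocks only flows downward.
    Comprehensions and recursion are ignored in this simplified check.
    """
    tree = ast.parse(source)
    return any(isinstance(node, LOOP_NODES) for node in ast.walk(tree))


# Hypothetical snippets: one with only an IF block, one with a While block.
straight = "if balance < 0:\n    fee = 25\nelse:\n    fee = 0\n"
looping = "while balance < 0:\n    balance += deposit\n"

print("straight-line code can move upward:", can_move_upward(straight))
print("looping code can move upward:", can_move_upward(looping))
```

No comparable one-line check exists for Goto code, where any statement might be the target of a backward jump.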

Is this what it all boils down to: inter-programmer consistency? This would make it nothing more than a herding mechanism. Barring more evidence, this is the best that can be done with regard to "proving" nested blocks are better. Even objective measures of overall consistency are lacking, short of specific axioms like the above loop rule. However, since nobody seemed to complain much about the fall from grace of Goto's, I suppose it is a moot historical issue. (Except that we failed to collect enough information to apply to later lessons.)

However, OO is not going to dethrone procedural so easily if I have my say. Can we extrapolate anything we learned from the Goto experiment? Is OOP more consistent across programmers?

I would have to answer a definite "no" on this. If anything, OOP gives programmers more ways to make messes, and each "celebrity guru" has a vastly different methodology. There are choices whether to use IS-A or HAS-A relationships, whether to group methods by nouns or some other kind of class grouping, etc. If anything, I find more consistency in procedural programming. Procedural programming code is usually grouped around tasks or actions, and the "noun modeling" is mostly via the database instead of programming code. If I want to find the code for the month-end summary process, for instance, there is fairly likely to be a MonthEnd or MonthSummary routine or module around somewhere. There is also likely to be a "main" or "main-menu" routine to lead to the month-end routine or process if the process is started in interactive mode. (Although I tend to table-ize menu or screen navigation in larger projects.)

Some say that procedural approaches lead to "sequence lock-in", meaning that you cannot easily change the order of execution of the parts. This is hogwash. All you have to do is treat each task as an independent task. In other words, don't assume or build-in any prerequisites unless necessary for the task (beyond paradigm issues). See OOSC2 Chapter 5 Critique for more on this.

I rarely find such consistency in OOP designs. Often something like the month-end processing would be chopped up into multiple methods and the pieces stuck to the different nouns (classes) that seemed like the best fit to the programmer at the time. OO fans often talk about "dividing up responsibilities" among the nouns. However, that is a rather leaky grouping approach. (I argue that this is often an arbitrary endeavor in the Aspects write-up because the "source noun" is too dynamic in business applications, and/or each operation often involves multiple sources. That is, multiple nouns.)

The "nouns-in-DB-and-verbs-in-code" rule of thumb of the procedural/relational methodology simply reduces the possible medium- and large-scale code structure variations. If a decent objective numerical consistency metric set could be devised (perhaps the same one used for the above GOTO tests), I believe it would confirm my observation.

In summary, it appears that:

The lessons learned from Goto's are not very complete.
The benefits of post-Goto techniques don't seem to extrapolate into anything that clearly helps the OOP-versus-procedural/relational debate.
Consistency may be an important metric, although at this point agreeable numerical metrics are lacking.

Shape of Flow?

Another suggestion that I encountered at c2.com is that under nested blocks the code indentation allows one to "see the shape of the flow". With Goto's there are only a few visual clues to the flow from one part of a program or routine to another. It is indeed much harder to follow the code flow if the indentation is missing or "messed up" in a block structure, at least for me.

If this is indeed the main reason for the acceptance of nested blocks over Goto's, I don't see how this lesson applies to OOP betterment claims, though. OO is even less visual than relational-centric approaches in my opinion. I can spot far more attribute patterns if information is in a table than if in code. Part of the reason is that adjacent rows and columns are usually visually next to each other at the same time. (See Control Tables.) Another reason is that one can issue queries or views to change one's perspective on a given set of information and emphasize whatever one wants to emphasize.

Fighter Jet Metrics Analogy

One could judge fighter jets by their maximum speed. Max-speed is certainly an objective metric and it is relatively easy to measure. However, there are other aspects that affect the jet's ability to shoot down enemy planes besides raw speed. Max-speed would be roughly analogous to counting lines of code in programs because it is easy to measure, but not the full story. (See above Jones LOC study comments.)

If you ask an expert pilot why one jet may be a better fighter jet than the others, he or she may say things like, "Well, jet A is more stable in sharp turns than jet B. Jet B wobbles too much to get stable aiming."

Thus, "wobbliness in turns" enters into the list of "verbal metrics" that jet designers can look into more closely. It is harder to measure and probably subject to more fuzzy interpretations/definitions than max-speed, but it can be done with some effort. Vibrations can be measured, and the relationship between vibration, speed, and turn angle can also be measured. (Actually, vibration can have multiple parameters of its own, such as amplitude and frequency. However, we will roll these into one "energy value" to keep the example simple.)

Another factor could be how well pilots do under different combinations of the above 3 variables. More than likely, different pilots will perform differently under the various combinations.

As time goes on, the pilots would be able to articulate more and more little metrics like this one which could be turned into measurable components. At first their impressions may be based on gut feelings, but after fiddling and thinking about things, they are able to articulate more and more about why they think Jet A is better than Jet B.

I kind of like the jet analogy because the design of fighter jets is often as much about helping the pilot do a better job as it is about raw technology. Like software engineering, it is about the melding of the human controller to a machine.

As time goes on, it seems that fighter jets are becoming more and more automated. In the future they may not even have pilots. In other words, a remote-control maneuverable missile. This analogy may then no longer apply, other than perhaps a hint about the future of software engineering. "Dave, I fixed your bug. Do you want to see it, Dave?"

Different jet designs may also favor different pilots in different ways because each pilot has different skills, habits, and abilities. Some may think quicker, while others are able to concentrate on more tasks at the same time, etc. Even among the top pilots, their relative strengths and weaknesses will be different in different areas. Different jet designs may favor one of these traits over another.

In other words, a one-jet-fits-all approach may not be the best one. The same might be true of software design tools and languages. It is also true that, for reasons of compatibility and economies of scale, a custom jet/tool for each pilot/developer is not practical. However, a nice compromise might be a set of jets/tools that pilots/developers can choose from and hone their skills around. Each choice could have a distinct "personality" or philosophy of design.

Partial Apology to Perl Fans

Dear Perl Fans,
A couple of years ago I started a "criticisms of Perl" rant on the comp.lang.perl newsgroup. After pondering it for a while, I realize that my criticisms were generally misdirected.
Don't get me wrong, I *still* think Perl is a "write-only language" and I still don't like the idea of using arrays of array pointers for complex collection management; but what I think may be moot. You see, Perl started becoming "mainstream". That made me fear that it would get shoved down the throats of programmers who were not fond of Perl. OOP did this, and I didn't want to see it happen again. I inadvertently projected this Borg-esque fear onto Perl.
The benefits of paradigms and development tools are largely subjective. Just because developer A works well with tool A does *not* mean that developer B will also work well with it. One-size-fits-all does not work because we all have very different brains. (A new field, "technical psychologist" or "organizational psychologist" is perhaps needed.)
Thus, as long as Perl fans can read each other's Perl code and be productive, it does not matter if I or other non-fans cannot do the same with Perl. Similarly, you perhaps would not like applications that I design. (Note that I have not verified whether Perl fans can read each other's code.)
But, if anybody says that "Perl is for everyone", then I *will* take objection. That is crossing the line. Unless, of course, objective metrics can be provided.
In summary, I hope everybody can learn from my mistake and not over-extrapolate their own preferences onto everybody else. Viva La Difference! (Is my French better than my Perl?)
PS. The "scope" code example that I posted back then indeed was a piece of junk, as somebody later pointed out to me. However, the concept that I was incorrectly trying to illustrate was *not central* to my primary criticisms.

Thank You for your understanding, -tmind-
