Updated 2/4/2002
The following challenge is from John Urberg:
Alleged Relational Approach: ---------------------------- String query = "select * from invoice, customer, sales_rep, invoice_line, product, product_type where invoice.customer_id = customer.customer_id and invoice.sales_rep_id = sales_rep.sales_rep_id and invoice.invoice_id = invoice_line.invoice_id and invoice_line.product_id = product.product_id and product.product_type_id = product_type.product_type_id and invoice.date >= today()"; ResultSet rs = con.executeQuery(query); while(rs.next()) { switch( rs.getInt("product_type_id") ) { case 1 total += base_calculation() case 2 total += extended_calcuation( ) case 3 total += special_offer_calcualation( ) case } } An OO Approach: --------------- Vector invoice = Database.get("Invoice", "date >= " + Date.today()); Enum e = invoice.getElements(); while(e.hasElements()) { invoice = (Invoice)e.next() total += invoice.getTotal() }
The (implied) allegation is that the procedural/relational version is more code and that it is factored better.
I agree that the first version is more code, but only because it is more complete. The OOP version has simply put many of the details somewhere else. (If the method calls are through purchased API's, then he should at least describe them in detail or provide the spec.) We can do the same thing with subroutines and/or stored procedures and/or views.
// Re-worked p/r version rs = openRS("inv2", "invoice.date >= today()") while getNext(rs) { total += invoice_getTotal(rs) }Here we ask our routine openRS() to open the SQL statement called "inv2" and attach our extra AND clause to it. It is a regular ol' function. Our invoice_getTotal() routine does the same thing that the OOP version does. (Perhaps it should be called "getAmount" instead of "getTotal". I don't know why John called it that.) The product type case statement would be inside of invoice_getTotal().
Thus, procedurally we can also "farm stuff off" to other routines/modules/units/classes/snippets/blobs/etc.
Regarding the product "types", generally I avoid hard-wiring a hierarchical taxonomy into things like products, prefering a set-oriented classification technique, usually using RDBMS. (See Subtype Proliferation Myth for the reasoning behind this.) However, assuming that we keep subtypes here, some OO fans might say that OOP gets rid of case statements. Usually it does not get rid of them, it just shuffles them around. Most likely there will be some unit of code (method, class, etc.) that corresponds to each case block somewhere in the OOP code. It just may look like:
class Bathroom_products inherits Product ... method amount { return base_calculation(All three detail lines correspond to John's original case statement.) } ... } class Kitchen_products inherits Product ... method amount { return extended_calcuation( ) } ... } class Discount_products inherits Product ... method amount { return special_offer_calcualation( ) } ... }
Thus, the example given is misleading, or at best incomplete because we have to guess what the rest of it looks like.
Note that John claims that this is what typical code looks like that he has actually seen. This challenge is not about what programmers actually do (that is dealt with elsewhere), but about what the paradigm is capable of.
But ignoring this rule and comparing what John has shown (assuming we extrapolate the missing portions to match the procedural code), no metrics have been given to show that one is better than the other. One implication is that the OOP version has better repetition factoring. In other-words, stuff referenced multiple times is centralized on one spot so that it does not have to be explicitly repeated for each usage.
But, we cannot tell how many times getTotal() and getElements() are referenced throughout the total application. Thus, we have insufficient information to make that determination.
Thus, usually the sale price will be in a single field in either the Invoice entity or an InvcDetail entity (depending on how bulk discounts or other things are recorded, but the detail-level is a better bet IMO). Thus, the calculation steps may be as simple as:
SELECT SUM(SalePrice) FROM InvcDetail WHERE [criteria]Note that we may not even have to care about the Invoice entity. Why John chose to couple the node extraction to Invoice, I don't know. We may only care about a particular product or sum the discounts, for example, regardless of which invoice they occur in.
In my experience most of the SQL statements are too different per task to warrant factoring. About the only place that I can think of is possibly the joins. The join paths are fairly consistent across the application. They are often even known to the RDBMS by referential integrity rules, if not explicit link declarations. (It varies per DB. See example Join Table in the next section below for a rough example). However, there are a few problems with the simplistic approach of putting the join clauses in a central spot.
First, which joins participate often varies. John's example above includes a Sales Rep link. In other queries we may not need this link. Links are not cheap operations, so we should be careful to link only what is needed. How John's example would "know" to exclude it in other situations is not clear.
Second, sometimes you need regular joins and sometimes you need left joins. (Left joins are where filler data is included for missing matches.)
Thus, bulk isolation of the joins appears a bad idea. However, there may be ways to reduce and simplify the specification of the "regular" joins. Perhaps have a sub-language where you specify the tables and which join type to use:
SQL = std_join("invoice, customer, sales_rep, invoice_line, product, product_type")This would generate the FROM clause and part of the WHERE clause. (The rest we have to append manually.) And what about left joins? Perhaps have an asterisk that specifies where table gets a left join:
SQL = std_join("invoice, customer, *sales_rep, invoice_line...etc...")Here, the asterisk on "sales_rep" indicates a left join.
However, there are still potential problems. What about table aliases?
SQL = std_join("i:invoice, cu:customer, *s:sales_rep, invoice_line...etc...")What do you think? We are essentially trying to re-invent SQL here. Would such a component/utility be useful? I suspect there may still be too many exceptions which require us to diddle directly with SQL. Our simplification is not likely to approach the full expressive ability of SQL.
Thus, a sub-query-language API may be able to squeeze some repetition from typical SQL statements, but the benefits are marginal in my opinion. The important thing to realize is that there is nothing about the procedural/relational paradigms that prevents it.
Join ID | Table 1 | Key 1 | Table 2 | Key 2 |
---|---|---|---|---|
42 | invoice | customer_id | customer | customer_id |
43 | invoice | sales_rep_id | sales_rep | sales_rep_id |
44 | invoice | invoice_id | invoice_line | invoice_id |
45 | invoice_line | product_id | products | product_id |
Note that the fact that the ID fields have the same name suggests a simplification of the table. However, I prefer the convention of "_ref" for foriegn keys. Thus the primary key would be something like "customer_id" and the foriegn keys for it would be "customer_ref". Also, I left out the "product_type" table because of my distaste for subtyping in business modeling.Such a table could perhaps be used to create automatic joins if we can only find a way to deal with potential field name overlaps. This issue is normally resolved by creating aliases in SQL statements.
One approach is to be more explicit about which table the field request is from.
// OOP x = curObj.sales_rep.commissionRate x = curObj.getField("sales_rep", "commissionRate") // alternative x = curObj.getField("sales_rep.commissionRate") // alternative // Procedural x = getField("invoice", "sales_rep", "commissionRate") x = getField("invoice", "sales_rep.commissionRate") // alternative x = getField(handle, "sales_rep.commissionRate") // alternative // Multi-command procedural version setEntity("invoice") // set "current" or default entity x = getField("sales_rep.commissionRate") // used with priorHere, each field request needs to know the "current" table, the source table, and the field name. (In the OOP version we assume that the current object holds the current table reference. A similar situation holds with the p/r "handle" version.)
"getField" has enough information to find what we need without an explicit SQL statement if it has access to our Join Table. We will tentatively call this approach "late joining". It might look like we made explicit SQL obsolete, but the picture is not quite as rosy as it may first appear:
The extra reference information would exceed the code size of a typical SQL statement after roughly a dozen field references or so. Thus, it may be justified for shorter tasks if the efficiency issues don't come into play.
John also included an "ISAM" example (XBase-like), but the issues involved were not really different, so I did not include it here. Note that I agree that a typical SQL version is usually better at joins than cursor approaches under most circumstances. But, there are other areas where it is superior.