Merging OOP and TOP?

A discussion in the comp.objects newsgroup got me thinking: can OOP and TOP be merged? A table record is actually very similar to an OOP object, and in many cases its fields can be seen as the object's properties.

I am assuming here that the actual (physical) representation of persistent (stored) objects in OOP uses familiar relational tables and indexes. (If there is a better way, I would like to know about it.) This does not mean that programmers use relational techniques and syntax, only that the low-level storage and retrieval mechanisms are basically indexed tables.

Converting table data into objects, and objects back into table data, apparently requires some sort of mapping between table records and fields on one side and in-memory objects on the other. These in-memory objects are organized by OO techniques rather than relational techniques.

This mapping obviously requires extra planning and programming. (Some say it can be made automatic, but they have yet to show how.) The big question is whether or not this "mapping" step is worth the alleged organizational improvements handed to us by OO techniques. In other words, the mapping is supposedly what lets one transcend the primitive organization of relational tables.
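To make the "extra programming" concrete, here is a minimal Python/SQLite sketch of hand-written mapping for one small object. The `Contact` class, its fields, and the `save_contact`/`load_contact` helpers are hypothetical. The point is that every field appears once in the class and again in each SQL statement, so adding a field means touching all three places.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical in-memory object; the fields are illustrative only.
    contact_id: int
    name: str
    phone: str


def save_contact(conn: sqlite3.Connection, c: Contact) -> None:
    # Object -> table row. Every field must be listed again here...
    conn.execute(
        "INSERT OR REPLACE INTO contact (contact_id, name, phone) VALUES (?, ?, ?)",
        (c.contact_id, c.name, c.phone),
    )


def load_contact(conn: sqlite3.Connection, contact_id: int) -> Contact:
    # Table row -> object. ...and here, in the same order as the constructor.
    row = conn.execute(
        "SELECT contact_id, name, phone FROM contact WHERE contact_id = ?",
        (contact_id,),
    ).fetchone()
    if row is None:
        raise KeyError(contact_id)
    return Contact(*row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
save_contact(conn, Contact(1, "Pat Smith", "555-0100"))
print(load_contact(conn, 1))  # Contact(contact_id=1, name='Pat Smith', phone='555-0100')
```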

Before exploring this question further, let's first review the alleged advantages of keeping structures in a more table-oriented organization. In other words, before leaving Alaska, let's review the niceties of Alaska.

Automatic Persistent Storage - allocated records are automatically saved to disk. With most OOP languages, you have to program the disk-saving behavior yourself.

Automatic memory-caching - The programmer does not have to worry about structures larger than available memory.

Closer to index structure - A non-indexed search on a large table can turn a one-second data lookup into a 40-minute wait. From a practical perspective, this is not a trivial issue.

Closer to physical table structure - Algorithms are often easier to re-create than data. Given a choice between losing custom software built over the years and losing data gathered over the years, most organizations will choose to keep the data. Thus, the self-evident structure and representation of the data perhaps should not be sacrificed for the sake of "cleaner" algorithms. OO-ing the data may make it unreadable if the generating program disappears. It may also make data sharing more difficult.
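The index point can be seen with SQLite's `EXPLAIN QUERY PLAN`. This minimal Python sketch (the `client` table, its columns, and the index name are illustrative assumptions) shows the planner switching from a full-table scan to an index search once an index exists. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, compname TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO client (compname, city) VALUES (?, ?)",
    [(f"Company {i}", "Chicago" if i % 100 == 0 else "Miami") for i in range(10_000)],
)

query = "SELECT compname FROM client WHERE city = ?"


def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN returns 4-column rows; the last column is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("Chicago",)).fetchall()
    return " / ".join(row[3] for row in rows)


before = plan(query)  # full table scan, e.g. "SCAN client"
conn.execute("CREATE INDEX idx_client_city ON client (city)")
after = plan(query)   # index lookup, e.g. "SEARCH client USING INDEX idx_client_city (city=?)"
print(before)
print(after)
```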

Now let's look at a typical object-oriented database. Its classes are as follows:

  Class ADDRESS
    Name, type text
    Section, type text  (dept., suite, etc.)
    Street_Addr, type text
    City, type text
    State, type text  (Assume U.S. for simplicity)
    Zipcode, type text

  Class CONTACT
    Name, type text
    Phone, type PhoneNum
    Address, type Address

  Class CLIENT
    CompName, type text  (company name)
    Primary_Address, type Address
    Sales_Contact, type Contact
    Billing_Contact, type Contact

  Class VENDOR
    CompName, type text
    Primary_Address, type Address
    Primary_Contact, type Contact
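For readers who prefer code to class outlines, the structure above can be sketched as Python dataclasses. This is an illustration, not a real OODBMS schema; the `PhoneNum` alias is an assumption, since the text does not define that type.

```python
from dataclasses import dataclass

PhoneNum = str  # assumption: PhoneNum is not defined in the text, so a plain string stands in


@dataclass
class Address:
    name: str
    section: str      # dept., suite, etc.
    street_addr: str
    city: str
    state: str        # U.S. only, for simplicity
    zipcode: str


@dataclass
class Contact:
    name: str
    phone: PhoneNum
    address: Address


@dataclass
class Client:
    compname: str     # company name
    primary_address: Address
    sales_contact: Contact
    billing_contact: Contact


@dataclass
class Vendor:
    compname: str
    primary_address: Address
    primary_contact: Contact


hq = Address("", "Suite 100", "1 Main St", "Chicago", "IL", "60601")
billing = Contact("Lee Jones", "555-0101", hq)
client = Client("Bank of Clouds", hq, billing, billing)
print(client.billing_contact.address.city)  # Chicago
```

Here the sales and billing contacts are the same object: objects refer to one another by reference, which is what the keys in the EO tables imitate.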

Assume this is a fairly large company, or better yet, assume that we want our system to be able to scale well for very large companies. Until a better technology comes along, we will need to use indexed tables to represent the physical data.

Although there are many ways to organize the tables, we will divide them into two organizational types: "element-oriented" (EO) and "packet-oriented" (PO). EO tends to break data into separate logical elements, while PO tends to keep each record's data together in one self-contained chunk.

The EO tables would closely mirror our object-oriented structure:

  Table ADDRESS
    Address_ID, type primary key
    Name, type text
    Section, type text
    Street_Addr, type text
    City, type text
    State, type text  
    Zipcode, type text

  Table CONTACT
    Contact_ID, type primary key
    Name, type text
    Phone_ID, type key to Phone Table
    Address_ID, type key to Address Table

  Table CLIENT
    Client_ID, type primary key
    CompName, type text   (company name)
    Primary_Address_ID, type key to Address Table
    Sales_Contact_ID, type key to Contact Table
    Billing_Contact_ID, type key to Contact Table

  Table VENDOR
    Vendor_ID, type primary key
    CompName, type text
    Primary_Address_ID, type key to Address Table
    Primary_Contact_ID, type key to Contact Table

Basically, all we did was add a primary key to each table and use these keys as "pointers" to the corresponding entries in the element tables. (Users of relational tables should be familiar with this process.) Note that the "Name" field in the Address table is ignored, or overridden, by the Name field in the Contact table.
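The EO layout can be tried out directly in SQLite. In this minimal sketch the separate Phone table is replaced by a plain text column and the VENDOR table is omitted for brevity (both are simplifications). The hypothetical `deref` helper follows one key at a time, which is exactly the "pointer" role the keys play.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    name        TEXT,
    section     TEXT,
    street_addr TEXT,
    city        TEXT,
    state       TEXT,
    zipcode     TEXT
);
CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY,
    name       TEXT,
    phone      TEXT,  -- simplification: no separate Phone table
    address_id INTEGER REFERENCES address (address_id)
);
CREATE TABLE client (
    client_id          INTEGER PRIMARY KEY,
    compname           TEXT,
    primary_address_id INTEGER REFERENCES address (address_id),
    sales_contact_id   INTEGER REFERENCES contact (contact_id),
    billing_contact_id INTEGER REFERENCES contact (contact_id)
);
INSERT INTO address VALUES (1, NULL, 'Suite 100', '1 Main St', 'Chicago', 'IL', '60601');
INSERT INTO contact VALUES (10, 'Lee Jones', '555-0101', 1);
INSERT INTO client  VALUES (100, 'Bank of Clouds', 1, 10, 10);
""")


def deref(table: str, key_col: str, key: int, want: str):
    # Follow one "pointer": fetch a single column from the row with this key.
    # Table and column names come from this script, never from user input.
    return conn.execute(f"SELECT {want} FROM {table} WHERE {key_col} = ?", (key,)).fetchone()[0]


contact_id = deref("client", "client_id", 100, "billing_contact_id")
address_id = deref("contact", "contact_id", contact_id, "address_id")
city = deref("address", "address_id", address_id, "city")
print(city)  # Chicago
```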

Our Packet Oriented (PO) Tables would look like:

  Table CLIENT
    Client_ID, type primary key
    CompName, type text      
    Primary_Address_Section, type text
    Primary_Address_StreetAddr
    Primary_Address_City
    Primary_Address_State
    Primary_Address_Zipcode
    Sales_Contact_Name
    Sales_Contact_Phone
    Sales_Contact_Section
    Sales_Contact_StreetAddr
    Sales_Contact_City
    Sales_Contact_State
    Sales_Contact_Zipcode
    Billing_Contact_Name
    Billing_Contact_Phone
    Billing_Contact_Section
    Billing_Contact_StreetAddr
    Billing_Contact_City
    etc...

  Table VENDOR
    Vendor_ID, type primary key
    CompName, type text
    Primary_Address_Section
    Primary_Address_StreetAddr
    etc...

The PO organization has only two tables here; it is more "flat." The EO tables are more structured in that repeating portions are referenced instead of actually repeated. This provides some nice features. For example, if we wanted to add a fax number field to all the contacts, we would only have to change the Contact table. With the PO setup, we would have to add at least three fields (one each for the client's sales and billing contacts and one for the vendor's primary contact).
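The fax-number comparison can be expressed as schema changes. This SQLite sketch uses abridged, hypothetical tables (`client_po`, `vendor_po`) and counts the `ALTER TABLE` statements each layout needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EO: all contact data lives in one table.
conn.execute("CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
# PO (abridged): contact data is repeated per role inside the owning table.
conn.execute("""CREATE TABLE client_po (
    client_id INTEGER PRIMARY KEY, compname TEXT,
    sales_contact_name TEXT, sales_contact_phone TEXT,
    billing_contact_name TEXT, billing_contact_phone TEXT)""")
conn.execute("""CREATE TABLE vendor_po (
    vendor_id INTEGER PRIMARY KEY, compname TEXT,
    primary_contact_name TEXT, primary_contact_phone TEXT)""")

# EO: one change covers every contact, wherever it is used.
eo_changes = ["ALTER TABLE contact ADD COLUMN fax TEXT"]

# PO: one new column for each place a contact is embedded.
po_changes = [
    "ALTER TABLE client_po ADD COLUMN sales_contact_fax TEXT",
    "ALTER TABLE client_po ADD COLUMN billing_contact_fax TEXT",
    "ALTER TABLE vendor_po ADD COLUMN primary_contact_fax TEXT",
]

for stmt in eo_changes + po_changes:
    conn.execute(stmt)

print(len(eo_changes), len(po_changes))  # 1 3
```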

Is it possible, however, that the EO layout is overly structured? For example, retrieving the "company name of all clients who are billed in Chicago" requires going through three tables. This makes our system more fragile: if there is an indexing or referencing error, we are more likely to get mismatched address and contact info.

Traditional relational techniques are also tougher to use with EO. For example, under PO the SQL syntax for our sample query might be:

  Select compname from client
    where billing_contact_city = 'Chicago'

In our EO setup, this would look more like:

  Select client.compname from address, contact, client
    where address.city = 'Chicago' and
    address.address_id = contact.address_id and
    client.billing_contact_id = contact.contact_id
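Both queries can be checked against a small data set. This SQLite sketch keeps only the columns the queries touch, and names the PO client table `client_po` (an assumption) so that both layouts fit in one database. The two queries should return the same company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PO layout (abridged): the billing city is inlined in the client row.
CREATE TABLE client_po (client_id INTEGER PRIMARY KEY, compname TEXT,
                        billing_contact_city TEXT);
-- EO layout (abridged): client -> contact -> address via keys.
CREATE TABLE address (address_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
CREATE TABLE client  (client_id INTEGER PRIMARY KEY, compname TEXT,
                      billing_contact_id INTEGER);

INSERT INTO client_po VALUES (1, 'Bank of Clouds', 'Chicago'), (2, 'Acme Widgets', 'Miami');
INSERT INTO address VALUES (1, 'Chicago'), (2, 'Miami');
INSERT INTO contact VALUES (10, 'Lee Jones', 1), (20, 'Kim Park', 2);
INSERT INTO client  VALUES (1, 'Bank of Clouds', 10), (2, 'Acme Widgets', 20);
""")

po_sql = "SELECT compname FROM client_po WHERE billing_contact_city = 'Chicago'"
eo_sql = """
SELECT client.compname FROM address, contact, client
 WHERE address.city = 'Chicago'
   AND address.address_id = contact.address_id
   AND client.billing_contact_id = contact.contact_id"""

po_rows = conn.execute(po_sql).fetchall()
eo_rows = conn.execute(eo_sql).fetchall()
print(po_rows, eo_rows)  # [('Bank of Clouds',)] [('Bank of Clouds',)]
```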

If SQL were more EO-friendly, it might handle a syntax more like:

  Select client.compname from client where
    client.billing_contact.contact.address.city
    = 'Chicago'

The links between the various tables would be automatically followed by this great new OO-SQL. Since the Contact table "inherits" the Address table fields, perhaps this can be shortened a little bit to:

  Select client.compname from client where
    client.billing_contact.address.city
    = 'Chicago'

This would be much nicer than our four-line, traditional SQL example. Unfortunately, this type of SQL is not (yet) in common use.
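Lacking such a language, the path notation can at least be approximated by a translator that turns a dotted path into ordinary joins. This is a hypothetical Python sketch, not an existing tool: the `LINKS` map (which key column each link follows) and the `path_to_sql` function are made up for illustration. Table and column names are inserted as trusted text; only the compared value is passed as a query parameter.

```python
import sqlite3

# Hypothetical link map: (table, link name) -> (key column, target table, target key).
LINKS = {
    ("client", "billing_contact"): ("billing_contact_id", "contact", "contact_id"),
    ("client", "sales_contact"): ("sales_contact_id", "contact", "contact_id"),
    ("contact", "address"): ("address_id", "address", "address_id"),
}


def path_to_sql(root: str, select_col: str, path: str) -> str:
    """Translate e.g. 'billing_contact.address.city' into a chain of joins."""
    *links, field = path.split(".")
    joins, table, alias = [], root, "t0"
    for i, link in enumerate(links, start=1):
        key_col, target, target_key = LINKS[(table, link)]
        new_alias = f"t{i}"
        joins.append(f"JOIN {target} {new_alias} ON {new_alias}.{target_key} = {alias}.{key_col}")
        table, alias = target, new_alias
    return (f"SELECT t0.{select_col} FROM {root} t0 " + " ".join(joins)
            + f" WHERE {alias}.{field} = ?")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (address_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
CREATE TABLE client  (client_id INTEGER PRIMARY KEY, compname TEXT,
                      sales_contact_id INTEGER, billing_contact_id INTEGER);
INSERT INTO address VALUES (1, 'Chicago'), (2, 'Miami');
INSERT INTO contact VALUES (10, 'Lee Jones', 1), (20, 'Kim Park', 2);
INSERT INTO client  VALUES (1, 'Bank of Clouds', 20, 10), (2, 'Acme Widgets', 10, 20);
""")

sql = path_to_sql("client", "compname", "billing_contact.address.city")
print(sql)
print(conn.execute(sql, ("Chicago",)).fetchall())  # [('Bank of Clouds',)]
```

A real "OO-SQL" would also need to handle missing links, multiple paths in one query, and one-to-many links; this sketch covers only a single chain of one-to-one keys.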

Note that a compromise between EO and PO is possible: put all the address attributes in the Contact table, reducing the hierarchy from three levels to two. Having a separate Address table seems a bit excessive to us.
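The compromise layout can be sketched in SQLite as follows (abridged and illustrative; the client's own primary address is left out). The Chicago query now needs only one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Compromise layout (abridged): address fields folded into contact.
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT, phone TEXT,
                      section TEXT, street_addr TEXT, city TEXT, state TEXT, zipcode TEXT);
CREATE TABLE client  (client_id INTEGER PRIMARY KEY, compname TEXT,
                      sales_contact_id INTEGER, billing_contact_id INTEGER);
INSERT INTO contact (contact_id, name, city) VALUES (10, 'Lee Jones', 'Chicago'),
                                                    (20, 'Kim Park', 'Miami');
INSERT INTO client VALUES (1, 'Bank of Clouds', 20, 10), (2, 'Acme Widgets', 10, 20);
""")

# Only one join is needed now: client -> contact.
rows = conn.execute("""
    SELECT client.compname FROM client
      JOIN contact ON contact.contact_id = client.billing_contact_id
     WHERE contact.city = 'Chicago'""").fetchall()
print(rows)  # [('Bank of Clouds',)]
```

Acme Widgets is excluded even though its sales contact is in Chicago, because the join follows only the billing-contact key.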

Our problem is that the common OOP languages do not directly support using EO tables (or any tables) to represent objects, even though such tables map closely, and perhaps automatically, to OO structures.

If the built-in syntax and methods of OOP directly handled these types of tables, then the benefits of table-oriented thinking could perhaps be shared with OOP languages. In addition to the "OO-SQL" syntax example above, commands like:

  client cli = new client("Bank of Clouds")
  cli.address.city = "Miami"

would be equivalent to adding (appending) a new record to a persistent table and setting one of its fields. I am not sure what the ideal syntax should be. However, some good hard thinking needs to be done so that we stop seeing tables and objects as separate things.
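A rough approximation of this idea is possible in Python today. The hypothetical `PersistentRow` class below turns object creation into an `INSERT` and every attribute assignment into an `UPDATE` against SQLite. To keep it short, the address is flattened into the client row rather than being a nested object as in the example above, and table and column names are assumed to come from trusted program code.

```python
import sqlite3


class PersistentRow:
    """Hypothetical write-through object: creation appends a row, assignment updates it.

    Table and column names are interpolated into SQL, so they must be trusted.
    """

    def __init__(self, conn: sqlite3.Connection, table: str, **fields):
        # Bypass our own __setattr__ for the bookkeeping attributes.
        object.__setattr__(self, "_conn", conn)
        object.__setattr__(self, "_table", table)
        cols = ", ".join(fields)
        marks = ", ".join("?" for _ in fields)
        cur = conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                           tuple(fields.values()))
        object.__setattr__(self, "_rowid", cur.lastrowid)
        conn.commit()

    def __getattr__(self, name: str):
        # Called only when normal lookup fails, i.e. for column names.
        row = self._conn.execute(
            f"SELECT {name} FROM {self._table} WHERE rowid = ?", (self._rowid,)
        ).fetchone()
        return row[0]

    def __setattr__(self, name: str, value) -> None:
        # Every assignment is written straight to the table.
        self._conn.execute(
            f"UPDATE {self._table} SET {name} = ? WHERE rowid = ?", (value, self._rowid)
        )
        self._conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (compname TEXT, city TEXT)")

cli = PersistentRow(conn, "client", compname="Bank of Clouds")  # appends a record
cli.city = "Miami"                                               # updates that record
print(conn.execute("SELECT compname, city FROM client").fetchall())
# [('Bank of Clouds', 'Miami')]
```

Committing on every assignment keeps the sketch simple but would be slow for bulk changes; a fuller design would need some notion of transactions.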

Current OOP languages make persistent storage a separate and manual step. This does not seem necessary to us. The common OOP languages are, it seems, "memory-centric".

Perhaps our beef with OOP is not so much OO features such as inheritance, polymorphism, and encapsulation, but this memory-centricity that we keep seeing.

Give us automatic table mapping, multiple table types, and control tables, and we may just stop bashing the common OOP languages.

Footnote on 1-to-many relationships

Some have claimed that an OODBMS is more efficient because related objects are "stored together" when there is a one-to-many relationship. However, this generally cannot be true if new objects are inserted into the "many" collection, because gaps cannot easily be made in the middle of a file to fit new objects in. Thus, new objects eventually end up elsewhere (usually at the end of the file), and pointers to those insertion points must be created. These pointers are no more efficient than indexes, which are basically pointers themselves.
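The insertion argument can be modeled with a toy append-only store. In this hypothetical Python sketch, a list stands in for the file (records can only be added at the end) and a dictionary of offsets stands in for the pointers. The children of one parent end up scattered, and reaching them means following stored offsets, just as an index does.

```python
from collections import defaultdict

# Hypothetical append-only "file": records can only be added at the end.
storage: list[tuple[int, str]] = []                       # (parent_id, payload)
children_index: dict[int, list[int]] = defaultdict(list)  # parent_id -> record offsets


def append(parent_id: int, payload: str) -> int:
    offset = len(storage)                     # new records always land at the end
    storage.append((parent_id, payload))
    children_index[parent_id].append(offset)  # the "pointer" to the new record
    return offset


# Orders for two clients arrive interleaved over time.
append(1, "order A")
append(2, "order B")
append(1, "order C")
append(2, "order D")

# Client 1's orders are not contiguous (offsets 0 and 2); reaching them
# requires following the stored offsets.
print(children_index[1])                           # [0, 2]
print([storage[i][1] for i in children_index[1]])  # ['order A', 'order C']
```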

It is true that an OODBMS can be "packed" to put related parts physically together, but the same kind of packing can be done with relational tables, so that related items end up together in the memory cache.

Thus, an OODBMS does not solve the 1-to-many insertion problem as some claim.
