Dynamic Relational

Beyond the Static Model

Updated: 4/2/2004

In application programming languages, you generally have "static typing" and "dynamic typing" programming languages. Pascal and Java are examples of statically typed languages, and SmallTalk and LISP are examples of dynamic languages. Over the years, I have grown to prefer the dynamic approach, and began wondering if more dynamic features could not be applied to relational databases and relational theory.

Dynamic relational databases (DRDBMS) could perhaps help out with some niches that the current crop of RDBMS don't handle very well. Some of the complaints that OOP proponents and Functional Programming proponents have against RDBMS seem to be a result of their static nature, and not relational theory itself.

Cell Types

Every cell could be open-ended in size and type, assuming no constraints are applied to a given column. One could envision each cell being pretty much a string. (But this does not mean we have to put quotes around numbers.) Operators would probably have to be clear about whether they operate on dates, numbers, or strings, etc. For example, some RDBMS use "+" to mean both string concatenation or numeric addition, depending on the operands' types. That practice would have to end. (I never was fond of it anyhow.) One would also have be be clear whether comparisons are intended to be string comparisons, or some other type, such as numeric. Other than this, SQL queries would not need to look too different than under the static approach.

Perhaps for backward compatibility, the schemas could provide an optional "type" indicator for columns so that old-style SQL can still work in cases where ambiguity would otherwise arise. An optional type indicator could allow one to tune the system to act like a static RDBMS. Static options are discussed in more detail later on.

If you want to enforce that a given column be a particular way, then you add constraints, not "types". Ensuring that a value has only digits and has no more than one decimal, in other words a "number", would perhaps be part of a standard (included) library of constraints.

Open Schemas

Many developers complain that database administrators are bottlenecks to projects which have strict deadlines. One possible way to alleviate this is to make the number of columns open-ended per record. (Perhaps tables could also be programmer-defined, but lets not consider that just yet.)

Each record (row) would be like a dictionary array (sometimes called "associative arrays" or "hashes"). You can add new attribute-value pairs as needed. I know this may seem odd at first, but ponder it for a while before you outright reject it.

One might ask what happens if you do a query in which the target column (attribute) is not there. It would be treated just as if it was there and it was blank or null.

I don't think we need nulls anymore with a DRDBMS because numeric operations could return an empty string. Numeric operations, such as averaging where there are no matches, were the only reason I saw to keep nulls for numbers. If the average operation can return an empty string if it cannot find records to average, then we can do away with nulls.

To keep things simple, there should be no distinction made between an attribute with a blank value and the non-existence of that attribute. If you remove that rule, then it creates boatloads of problems I have discovered. I think this is one reason XML does not allow pair-less attributes, unlike HTML. For example, the HTML "HR" tag allows one to say <HR NOSHADE>. But XML would require something like <HR NOSHADE="Yes">.

If the existence of an attribute-value pair is required for all records in a table, then a constraint is added for the given table. In essence, a "required" constraint. Primary keys would probably need a required constraint in order to satisfy relational theory rules.

Implementation

A typical geek will probably start envisioning how to implement such a contraption internally. I suggest you think about each record being an XML statement. XML is in fashion these days, so talking about it in terms of XML won't get you fired. An "Employee" table might be represented as a file that looks something like:

  <employee empID=3467 lastname="Jones" manager="yes"/>
  <employee empID=1273 lastname="Smith" 
        firstname="Robin" salary=68000/>
  <employee empID=4260 lastname="Lee" 
        firstname="Nancy" hiredate="10-May-1996"/>
  etc...

Indexing should not be significantly different than static RDBMS. After all, most modern RDBMS don't use fixed record sizes anyhow. Thus, they have already solved the problem of indexing variable-length records.

It is true that this approach may take up more disk-space than the static approach because numbers are not as compressed. However, performance "hints" can perhaps be given to the database such that it knows to compress the values. More space may also be needed for column names since they are not known in advance. However, compression techniques can probably supply shorter representations of often-used names. Remember, we don't have to store place-holders for blank cells due to the equivalency rule given above.

If columns (attributes) can be added willy-nilly, then one may ask if there is a central schema. Whether there is or not is an implementation issue. Having one will make it more expensive to add new fields, but not having one will make it more expensive to take inventory of a given table.

To define constraints and required columns, a minimal central schema will probably be needed. It just would not contain every possible column, but only those columns which have special treatment. Perhaps a table could be "locked" to reflect only what is in the schema if needed.

Tuning the Level of Static-ness

A DBA could optionally indicate that a given table must have pre-defined columns and perhaps a "type" indicator for each column. The level of static-ness is then up to the DBA. Some applications or tables may be better off under the looseness of dynamic-ness, while others may need more controls. Perhaps during development, the loose approach may make more sense, and then tightened as a project matures. Thus, our system can be both a static one and a dynamic one. We are not stuck with one or the other.

Result Sets

When an application talks to a relational database, generally the result set structure is known in advanced. However, under the dynamic approach, one may not know in advance which columns will be returned. Existing frameworks may not be prepared to deal with, and some kind of translation layer may be needed. One approach is to have the middle layer take inventory of a given result set and create a column list based on the result. However, you may get a different answer depending on the query criteria issued.

Generally, the best approach seems to be to treat each record in the result set like a dictionary array. A "getNextRecord" operation would simply return a new dictionary array. One possible drawback is that many array implementations treat empty values differently than missing attributes, which does not mesh well with our convention (see above). Thus, either we have special API's to access the columns in a consistent manner by supplying a blank value if needed, or we supply a list of "expected" columns to the API before issuing a query or as part of the query. Use of the SQL-style asterisk would have to be minimized. Example:

  sql = "select empID,firstName,LastName,Salary from Employees";
  rs = query(sql, stdConnection);
  while (row = getNextRecord(rs)) {
    fancyPrint(row);
  }

Thus, a "filler" blank value will be plugged in to any of the missing columns given in our Select statement by the API. (Remember, constraints can be put in the table if certain columns should always have values.)

Keep in mind that this only applies to limited languages or limited API's. Sticking an asterisk in the Where clause could still work if our application system is prepared for it.

Another way to handle dynamic rows in static languages or API's is to return the "variable" portion as a separate dictionary array or string. For example, suppose we had a table with 3 required columns: "ID", "Name", and "Location". The accepting row structure could perhaps resemble:

  structure myRow {
    ID: Int;
    Name: String[40];
    Location: String[50];
    OtherColumns: DictionaryArray;  // dynamic columns
  }

Alternatively, the "OtherColumns" could be a long string that is an XML-like or comma-delimited string ("foo='bar',glob='flip',zap=34") containing the dynamic column value pairs. Special functions or API calls could parse out a requested value from such a string.

Does this Violate Relational Theory?

Some have suggested that this approach violates "pure" relational theory. I have yet to hear a convincing argument. I have never seen a widely-accepted relational rule that says that all columns must be known in advance, other than primary key, which we still have. Even if it did, we just define all possible columns as already existing, they just happen to be blank much of the time. Nor does relational theory seem to demand that column types be known in advanced, or that formal types be used. Maybe specific query languages require certain conventions, but those are specific implementations of relational theory. Besides, some question the relational-ness of the existing static RDBMS. But they were implemented anyhow.