Weaknesses of Present Solutions

Based on research of the present techniques, the weaknesses can be classified into three areas: data type diversity, data manipulation repetition, and metadata management.

The next sections deal with each of them in more detail.

Data Type Diversity

There are many sources the data can come from: a database, a file, a configuration (property) file, an XML file, a data structure (an object), business objects, input parameters of an HTTP request (entry form data), search engine results, and many others. What is noticeable in this enumeration is the diversity of data access methods. It is quite understandable that the approach to obtaining data from a database differs from the one where the data come from an XML file. In the first case it could be an instance of "ResultSet" and, in the latter case, a "DocumentHandler" from the SAX XML interface. Obviously, this is only an example; the approach differs based on the programming language, the component library, and the available data access techniques, such as JDBC, ODBC, or proprietary drivers; SAX, DOM, or proprietary parsers; etc.

One can see that it is difficult to design code that accommodates all possible data source variations. The most trouble comes when designing a component library with reusable code. Consider, for example, a component dedicated to debugging (inspecting) the property values of a business entity. The entity could be stored in a file, a database, XML format, input form fields, etc. The debugging component has to consider all possible data types and has to be designed appropriately.

Another example is the approach of business procedures. A business procedure is a technique for encapsulating "business logic" (mostly, if not exclusively, data transformations and data checks: validation and verification). Such a procedure needs to access input data, perform the transaction, create output data, and return it. In this case, the need for a unified data access approach is more than obvious. It is unacceptable to write several instances of the same business procedure, each dedicated to a different data source.

Figure 2.2: Data Flow Diagram: Example of type diversity --- using business procedure

Figure 2.2 shows an example of a business procedure, "StoreNewUser". It accepts input data: the new user information, such as user name, password, full name, role, etc. After input data validation checks are performed, the procedure creates the corresponding records in a database. As shown in the figure, there are three different objects holding the user's input data: "HttpRequest", "Request" and "3rdRequest".

Figure 2.3: Class Diagram: various input data types / classes

Each of them has different access methods used to retrieve the data; see Figure 2.3. There are two solutions for this situation. The first one is to write three versions of "StoreNewUser", each tailored to a given data type. Let's leave it to the reader's judgment to figure out all the disadvantages of this (really bad) solution. The other solution is to propose a common data type to which all the others would have to conform: the "BusinessObject" from our example. This way the "StoreNewUser" procedure deals with one data type, and it is easy to develop new data adapters as new input data types come along in the future. Note that the data adapters adapt the various data types ("HttpRequest", "Request" and "3rdRequest") into one uniform type ("BusinessObject"). This solution is shown in Figure 2.4.

Figure 2.4: Data Flow Diagram: Example of type diversity --- solution

To keep the example comprehensible, Figure 2.5 shows the class of the proposed "BusinessObject", representing the unified input data type.

Figure 2.5: Class Diagram: unified input data type / class

One can notice that extra objects, the adapters, are necessary to develop in order to satisfy this solution. But one should not forget that such adapters can be provided beforehand for all expected data types and reused as needed.
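
The adapter approach described above can be sketched in Java as follows. Only "BusinessObject" and "HttpRequest" are names from the example; the interface shape, the adapter class, and the demo procedure are assumptions for illustration, not the actual design.

```java
import java.util.HashMap;
import java.util.Map;

// The unified input data type proposed in the text.
interface BusinessObject {
    String getProperty(String name);
}

// Stand-in for a servlet-style request; a real HttpRequest would
// come from a web framework.
class HttpRequest {
    private final Map<String, String> parameters = new HashMap<>();
    void setParameter(String name, String value) { parameters.put(name, value); }
    String getParameter(String name) { return parameters.get(name); }
}

// Adapter: presents an HttpRequest as a BusinessObject, so that a
// procedure such as "StoreNewUser" sees only one data type.
class HttpRequestAdapter implements BusinessObject {
    private final HttpRequest request;
    HttpRequestAdapter(HttpRequest request) { this.request = request; }
    public String getProperty(String name) { return request.getParameter(name); }
}

class StoreNewUserDemo {
    // The business procedure depends only on BusinessObject, never on
    // HttpRequest, Request, or 3rdRequest directly.
    static String storeNewUser(BusinessObject user) {
        return "stored user " + user.getProperty("username");
    }

    public static void main(String[] args) {
        HttpRequest request = new HttpRequest();
        request.setParameter("username", "alice");
        System.out.println(storeNewUser(new HttpRequestAdapter(request)));
    }
}
```

Analogous adapters for "Request" and "3rdRequest" would implement the same interface, which is exactly what keeps "StoreNewUser" a single procedure.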

Data Manipulation Repetition

Next, there is a lot of data manipulation performed frequently: data type conversions, data value validation, use of default values, etc. These are very minor and simple operations, but they are required in many locations. Insufficiently designed data management leads to very little code reuse and to "copy & paste" development, which subsequently results in code that is difficult to maintain.

Let's use an example for better illustration: when developing a WWW application, the need to preprocess form data arises very frequently. A typical "parameter processing" scenario is shown in Figure 2.6.

Figure 2.6: Interaction Diagram: Example of parameter processing

The actual parameter value is obtained as type "String" from an instance of "HttpRequest". After the test for an "empty" (null) value, the sequence is divided into two scenarios: A1 and A2. The first one is for positive test results and the other one for negative results. Note that the "scenarios" are used to overcome the inability of interaction diagrams to express conditional branches.

A1: if the parameter value is not present (is empty/null), another test is performed to find out whether the value is required or not: scenarios B1 and B2. If so (B1), a "MissingParameter" exception is thrown. Otherwise (B2), if the value is not present and not required, a default value is used. That is the end of scenario A1.

Scenario A2 begins with a test of the value's validity. If the value passes the test (C1), it is converted to type "Integer" and returned. In the opposite case (C2), an "InvalidFormat" exception is thrown to indicate an invalid parameter value.
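
The whole sequence (scenarios A1/B1/B2 and A2/C1/C2) collapses into one short method. The exception names follow the text; the method name and signature are assumptions made for this sketch.

```java
// Exceptions named after those in the text.
class MissingParameterException extends Exception {
    MissingParameterException(String name) { super("missing parameter: " + name); }
}

class InvalidFormatException extends Exception {
    InvalidFormatException(String value) { super("invalid format: " + value); }
}

class ParameterProcessing {
    // A1: value absent -> B1 throw if required, B2 use default.
    // A2: value present -> C1 convert to Integer, C2 throw on bad format.
    static Integer getIntParameter(String value, boolean required, Integer defaultValue)
            throws MissingParameterException, InvalidFormatException {
        if (value == null || value.isEmpty()) {                   // A1
            if (required) throw new MissingParameterException("value"); // B1
            return defaultValue;                                  // B2
        }
        try {                                                     // A2
            return Integer.valueOf(value);                        // C1
        } catch (NumberFormatException e) {
            throw new InvalidFormatException(value);              // C2
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getIntParameter("42", true, null)); // valid value converted
        System.out.println(getIntParameter(null, false, 7));   // absent, not required: default
    }
}
```

Short as it is, this is exactly the kind of fragment that ends up duplicated across an application, which is the point of the next paragraphs.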

It is common practice to copy and paste the code which executes the just-described sequence in many locations inside the application. A slightly more sophisticated design would introduce a utility class, say "DataChecker", which takes the instance of "HttpRequest" as a parameter and encapsulates the sequence. The sequence then shrinks to just one method call, "check"; see Figure 2.7.

Figure 2.7: Interaction Diagram: new example of parameter processing

Unfortunately, two issues come with it. First, the utility class only works with "HttpRequest" as the input parameter type, so if this functionality is needed but the input data come from a different source, the whole sequence has to be copied again (see the previous section, "Data Type Diversity").

The second and more serious issue is the dependence on other components: when the sequence execution originated from "MyClass", certain method calls ("isParameterValueNull", "isRequired", "....") were invoked on certain objects (instances of "MyClass", "HttpRequest", "DefaultValues"). Now the same method calls have to be made from the new location, "DataChecker", but some of them are invoked on a "MyClass" instance. This brings back a dependency on the "MyClass" class, which prevents the "DataChecker" from being reused by classes other than "MyClass".

To remove this dependency, classes other than "MyClass" have to be responsible for holding the implementation of the dependent methods. This already leads towards the more complex design which is part of this "Data Instance" proposal.
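
One way this responsibility shift can look, sketched under assumed names: the methods that "DataChecker" previously had to call back on "MyClass" are moved behind a small interface, so the checker depends only on that interface and becomes reusable from any class. Only "DataChecker" and "check" come from the text; the interface is hypothetical.

```java
// Hypothetical interface holding the methods that, per the text, used to
// be implemented by "MyClass" (e.g. isRequired, default-value lookup).
interface ParameterMetadata {
    boolean isRequired(String name);
    String defaultValue(String name);
}

class DataChecker {
    private final ParameterMetadata metadata;

    DataChecker(ParameterMetadata metadata) { this.metadata = metadata; }

    // The single "check" call from Figure 2.7. No reference to MyClass:
    // any caller that supplies a ParameterMetadata can reuse this class.
    String check(String name, String value) {
        if (value == null || value.isEmpty()) {
            if (metadata.isRequired(name))
                throw new IllegalArgumentException("missing parameter: " + name);
            return metadata.defaultValue(name);
        }
        return value;
    }
}

class DataCheckerDemo {
    public static void main(String[] args) {
        ParameterMetadata meta = new ParameterMetadata() {
            public boolean isRequired(String name) { return name.equals("username"); }
            public String defaultValue(String name) { return "n/a"; }
        };
        DataChecker checker = new DataChecker(meta);
        System.out.println(checker.check("fullname", null)); // optional: falls back to default
    }
}
```

The design choice here is inversion of the dependency: "DataChecker" no longer knows who answers the "isRequired" question, only that someone does.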

Metadata Management (MDM)

The strategies for managing metadata information differ a lot, and each of them provides a different level of granularity. The least detailed metadata starts with property names and grows through property types, default values, value validation formulas, relationship definitions, etc.

The maturity level of the MDM is inherited from the data *storage* technique. Let's shed some light on the matter by enumerating a couple of examples:

The lessons learned from the previous metadata management enumeration are that (1) there are many strategies, (2) they depend on the data storage technique, and (3) each provides a different level of granularity of information.

A good designer should make extensive use of metadata. It allows the design of reusable components, which greatly reduces the complexity of the application design.

A good example is the sequence shown in Figure 2.6. If metadata management is in place, the design of the "DataChecker" component becomes much easier and cleaner, because the majority of the method calls having the discussed dependence on "other" components are about obtaining metadata information.
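
To illustrate the point, here is a minimal sketch in which the per-property knowledge (required flag, default value, validation formula) is data rather than code. All class and method names are assumptions; in a real system the metadata records could be loaded from an XML file or a database catalog.

```java
import java.util.HashMap;
import java.util.Map;

// A metadata record for one property.
class PropertyMetadata {
    final String name;
    final boolean required;
    final String defaultValue;
    final String pattern;      // regex the raw value must match

    PropertyMetadata(String name, boolean required, String defaultValue, String pattern) {
        this.name = name;
        this.required = required;
        this.defaultValue = defaultValue;
        this.pattern = pattern;
    }
}

// Generic checker: everything that previously forced call-backs into
// "other" components is now read from the metadata record.
class MetadataDrivenChecker {
    private final Map<String, PropertyMetadata> registry = new HashMap<>();

    void register(PropertyMetadata meta) { registry.put(meta.name, meta); }

    String check(String name, String rawValue) {
        PropertyMetadata meta = registry.get(name);
        if (rawValue == null || rawValue.isEmpty()) {
            if (meta.required)
                throw new IllegalArgumentException("missing: " + name);
            return meta.defaultValue;
        }
        if (!rawValue.matches(meta.pattern))
            throw new IllegalArgumentException("invalid format: " + name);
        return rawValue;
    }

    public static void main(String[] args) {
        MetadataDrivenChecker checker = new MetadataDrivenChecker();
        checker.register(new PropertyMetadata("age", false, "0", "\\d+"));
        System.out.println(checker.check("age", "42")); // valid value passes through
        System.out.println(checker.check("age", ""));   // empty, not required: default "0"
    }
}
```

One checker class then covers every property of every entity; adding a new property means adding a metadata record, not new code.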