Unified Semantic Model for Types, Attributes, Relationships and Behaviors
Author: Kenny Yu, PhD
Modeling techniques lack relationship semantics.
Modeling deficiencies hamper software development
Semantics and management of relationships
Comparisons with other technologies at the database level
Comparison with object-oriented analysis and design
Applicability and Performance considerations
Detailed Example of implementation in a banking application
Book loan from a network of libraries
Data or knowledge management systems
A modeling methodology is established that provides infrastructural services for managing entities, their types, attributes, relationships and behaviors with expressive relationship semantics. The model evolves dynamically to reflect changes and growth in the physical and conceptual systems being modeled. It drives the process, and registers the results, of software analysis, design and development.
The relational model describes relationships between two types with a foreign key constraint, which in many cases is inadequate. In a sample database that is supplied with MS Access, the relationships of [Customer – Order] and [Category – Product] are both modeled with one-to-many relationships. These two relationships, however, are different in that an order has definitive existential dependency on a customer while a category of a product may be arbitrary and re-assigned.
While UML expresses relationships as composition and aggregation, implementing them fully in a programming language is a non-trivial matter. The instantiation of the composite object should be dependent on the composing object, and the removal of the latter should clear the former. In Java, for example, to fully express a compositional relationship, such as the one between an order and an order item, one needs to hide the public constructor of the item and ensure that only the order object can create the item. Removing the order should nullify the item. Such behavior can be achieved only through laborious coding because the language lacks a relationship management mechanism.
The existing technologies require full knowledge of the system to be modeled. The resultant models are static and inflexible. On a relational database, tables, columns and constraints need to be added or modified. At the application level, classes need to be modified and recompiled. The modeling methods are applicable only to well-defined systems and cannot accommodate additional requirements after the design is finished. If a subject system is less than perfectly defined, the analysis and design are obsolete as soon as they are finished. The model is often incomplete because
The forms of incompleteness and incorrectness may be:
One has to assume, from the examples cited above, that the understanding of a system of reality is dynamic, fluid and evolving. It is not possible to describe such a system with static models. The methodologies employed in software analysis and design, at the object level with UML, Java, C++ and XML and at the persistence level with relational database tables and constraints, are inadequate.
The typing of entities is an evolving process. An entity may be of 0, 1 or many types. It may also devolve to lose its types. New types can be created on demand at runtime. Existing types may evolve to assume additional features. The author terms these features amorphous typing. These features reflect the dynamic nature of any realistic system. For example, when a person is born, she acquires a name and an id, likely a social security number and assumes relationships with her parents. Over time, she establishes relationships with a pediatrician, teachers and friends, assumes types of student, customer, employee, manager and learns to walk, talk and work.
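The amorphous typing described above can be illustrated with a minimal sketch. It is not the author's API: the `TypeRegistry` and `Entity` classes, their method names and the sample attributes are assumptions made only for this example.

```python
class TypeRegistry:
    """Holds type definitions that can be created on demand at runtime."""

    def __init__(self):
        self.types = {}  # type name -> set of attribute names

    def define(self, name, attributes=()):
        # Defining an existing type again extends it (types may evolve).
        self.types.setdefault(name, set()).update(attributes)


class Entity:
    """An entity may hold zero, one or many types at any moment."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.types = set()

    def assume(self, type_name):
        if type_name not in self.registry.types:
            raise KeyError(f"unknown type: {type_name}")
        self.types.add(type_name)

    def lose(self, type_name):
        self.types.discard(type_name)

    def attributes(self):
        # The attributes available to an entity follow from its current types.
        names = set()
        for type_name in self.types:
            names |= self.registry.types[type_name]
        return names


registry = TypeRegistry()
person = Entity("Jane", registry)            # starts with no type at all
registry.define("Student", {"school"})       # new type created at runtime
person.assume("Student")
registry.define("Employee", {"employee_no"})
person.assume("Employee")
person.lose("Student")                       # the entity devolves, losing a type
registry.define("Employee", {"address"})     # an existing type gains a feature
print(sorted(person.types), sorted(person.attributes()))
```

Because the attribute set is computed from the current types, an entity picks up features added to a type after it assumed that type.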
The elements of the model can be summarized as follows:
Therefore, the model has three subsystems:
The contents of the definitions in the definition subsystem are described as follows:
On the basis of the dependency between two types of entities (a primary and a secondary) and the multiplicity on the primary end, most relationships fall into one of six types:
The terminology used here to describe the relationships may be inconsistent with usage in the literature and in the field of practice, which is itself often inconsistent. Any confusion should be resolved with the definitions given above. The terms R1P, R1D, RH, RC, RU and RA may be preferred, since confusion is caused mostly by the connotations of the English words.
In R1P and R1D relationships, the secondary entity can relate to only one primary. Multiple entities must be unified to yield the secondary. When two parents produce a child, there exists a complex relationship: [(Father, Mother) – RU –<> Marriage <>– R1D – Child]. RH, RC, RU and RA relationships allow the secondary to relate to multiple primaries. In RDBMS and UML terms, R1P and R1D represent one-to-many relationships. The others represent many-to-many relationships.
The relationships and their possible implementation on an RDBMS with the use of mapping tables are summarized in the following table. Columns 2 through 4 indicate how a deletion propagates. The header “Entity to primary” means that the deletion of an entity propagates to the primary end of the relationship, and so on for the other columns.
Relation Type | Entity to primary | Entity to secondary | Relation to entity | Note |
R1P | Cascade | Cascade | Cascade on secondary | Deleting the primary entity propagates to the relationship and to the secondary. One primary for many secondary. |
R1D | | Cascade | Cascade on secondary | Deleting the secondary nullifies the relationship. One primary for many secondary. |
RH | | Cascade | Cascade on secondary | Deleting the secondary nullifies the relationship. Many primary for one secondary. |
RC | | | | Neither the primary nor the secondary can be deleted if a relationship exists. They must quit the relationship first. |
RU | Cascade | | | The primary can be deleted, nullifying the relationship. The secondary cannot. They must quit the relationship first. |
RA | Cascade | Cascade | | Both the primary and the secondary can be deleted, nullifying the relationship. |
The establishment of the six relationship types enables the architecture to provide their management as a system service for application development. This avoids the elaborate and complex use of relational technologies, such as multiple mapping tables and constraints, and therefore the numerous application-specific SQL statements needed to manipulate the data. Basic behaviors of entities are specified by the relationships they have with others. For example, different consequences ensue from a student dropping out of a class and from an instructor quitting.
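The deletion rules in the table above could be offered as a generic service along the following lines. This is a sketch under stated assumptions rather than the author's implementation: an empty table cell is read as "deletion is blocked" (which the notes for RC and RU support), and the `Policy`, `Store` and `DeletionBlocked` names are invented for the example.

```python
from dataclasses import dataclass


class DeletionBlocked(Exception):
    """Raised when an entity must quit a relationship before it can be deleted."""


@dataclass(frozen=True)
class Policy:
    primary_cascades: bool          # "Entity to primary" column
    secondary_cascades: bool        # "Entity to secondary" column
    unlink_deletes_secondary: bool  # "Relation to entity" column


# Assumption: an empty cell in the table means the deletion is blocked.
POLICIES = {
    "R1P": Policy(True, True, True),
    "R1D": Policy(False, True, True),
    "RH": Policy(False, True, True),
    "RC": Policy(False, False, False),
    "RU": Policy(True, False, False),
    "RA": Policy(True, True, False),
}


class Store:
    def __init__(self):
        self.entities = set()
        self.links = []  # (relationship type, primary, secondary)

    def link(self, rel_type, primary, secondary):
        self.entities.update((primary, secondary))
        self.links.append((rel_type, primary, secondary))

    def delete(self, entity):
        # First pass: refuse if any link blocks deletion at this end.
        for rel, primary, secondary in self.links:
            policy = POLICIES[rel]
            if primary == entity and not policy.primary_cascades:
                raise DeletionBlocked(f"{entity} is a primary in {rel}")
            if secondary == entity and not policy.secondary_cascades:
                raise DeletionBlocked(f"{entity} is a secondary in {rel}")
        # Second pass: drop the links and collect dependent secondaries.
        dependents, kept = [], []
        for rel, primary, secondary in self.links:
            if entity not in (primary, secondary):
                kept.append((rel, primary, secondary))
            elif primary == entity and POLICIES[rel].unlink_deletes_secondary:
                dependents.append(secondary)
        self.links = kept
        self.entities.discard(entity)
        for dependent in dependents:
            if dependent in self.entities:
                self.delete(dependent)  # not transactional; a sketch only
```

With a bank linked to a branch by R1P and the branch linked to its manager by RU, deleting the bank removes the branch and both links but leaves the manager entity, while deleting the manager directly is refused until the link is removed.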
A relationship is defined between two types and is therefore binary and directed. A relationship may have a description and a label at each end for presentation. The data of a relationship may have a modifier, which is often an index value or a set of coordinates. For example, while sugar has an [ingredient_of] relationship with a drink, the modifier may specify 250 grams/liter. The data type, or class, of the modifier is specified on the relationship. The instance of the class, which is represented in textual form, is recorded on the links, which are instances of the relationship linking the entities.
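The split between the modifier class (held on the relationship) and the modifier value (held as text on each link) might look like the sketch below. The `Concentration` class, the type names "Ingredient" and "Drink", the choice of the ingredient as the primary end, and the entity "lemonade" are assumptions for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Concentration:
    """Hypothetical modifier class: an amount of solute per liter."""
    grams_per_liter: Decimal

    @classmethod
    def parse(cls, text):
        value, unit = text.split()
        if unit != "grams/liter":
            raise ValueError(f"unsupported unit: {unit}")
        return cls(Decimal(value))


@dataclass(frozen=True)
class Relationship:
    name: str
    primary_type: str
    secondary_type: str
    modifier_class: type  # the class of the modifier is set on the relationship


@dataclass(frozen=True)
class Link:
    relationship: Relationship
    primary: str
    secondary: str
    modifier_text: str  # the instance is stored in textual form on the link

    def modifier(self):
        # Rebuild the typed modifier from its textual form on demand.
        return self.relationship.modifier_class.parse(self.modifier_text)


ingredient_of = Relationship("ingredient_of", "Ingredient", "Drink", Concentration)
link = Link(ingredient_of, "sugar", "lemonade", "250 grams/liter")
print(link.modifier())
```

Keeping the value as text lets one generic link store serve every relationship, at the cost of parsing when a typed value is needed.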
Not all of these relationships may exist in an application system. The designer may find that two or three relationship types are present in a particular system and decide to disregard the others. All six relationship types are identified and defined here for the sake of completeness.
The assignment subsystem establishes links among types, attributes, relationship types and behaviors. When an attribute is assigned to a type, it may assume a new name. The attribute amount may be referred to as balance on type Account, and netWorth on type Customer. A relationship is derived from a relationship type, has a name and may have labels on both ends, and may have a modifier class that defines a description on a relationship instance. Such description is needed, for example, in the relationship [document references book] to specify a page number of the book. A behavior, when assigned onto a type, may assume a new name for human comprehension.
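The renaming of an attribute on assignment can be sketched as a lookup keyed by type and attribute. The `Definitions` class and its method names are assumptions for this example; the attribute `amount`, its data type `USCurrency` and the names `balance` and `netWorth` come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Definitions:
    # attribute name -> data type
    attributes: dict = field(default_factory=dict)
    # (type name, attribute name) -> name used on that type
    assigned: dict = field(default_factory=dict)

    def define_attribute(self, name, data_type):
        self.attributes[name] = data_type

    def assign(self, type_name, attribute, alias=None):
        if attribute not in self.attributes:
            raise KeyError(f"undefined attribute: {attribute}")
        self.assigned[(type_name, attribute)] = alias or attribute

    def names_on(self, type_name):
        """Map the name shown on a type back to the underlying attribute."""
        return {
            alias: attribute
            for (assigned_type, attribute), alias in self.assigned.items()
            if assigned_type == type_name
        }


defs = Definitions()
defs.define_attribute("amount", "USCurrency")
defs.assign("Account", "amount", alias="balance")
defs.assign("Customer", "amount", alias="netWorth")
print(defs.names_on("Account"), defs.names_on("Customer"))
```

The attribute is defined once, so its data type and validation live in one place while each type presents it under its own name.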
While the model of this work goes beyond a database model, comparisons with other database models serve to help understand its features.
At the database model level, the establishment and management of the relationships distinguish this architecture from existing models, as described in http://unixspace.com/context/databases.html. The hierarchical model handles only R1P relationships. The network model manages the RU relationships defined here. The associative model handles RA relationships. The relational model, without the use of mapping tables, can handle R1P, R1D and, partially, RU. With the use of mapping tables, it can handle RA. However, the relational model requires a physical table for each class of entity, resulting in a proliferation of entity tables and mapping tables that have to be managed by application code. The defects of this method, as described earlier in this work, are obvious.
This model has the benefits the object-relational model offers because the attributes of a class can be of any data type. For example, the location of an object on a map can be modeled with an entity of GeoMap class and an RA relationship with the object.
The object-oriented database model has defects and limitations inherited from object-oriented analysis and design methodologies. OOA and OOD do not provide means for describing and managing the richness and complexities of relationships as we have discovered. More in-depth comparison with OOA/OOD is provided in a later section.
The semi-structured database model is the antithesis of this work in that it mixes the schema with the data. This work provides a high degree of separation between the definition of data and the data itself.
The context data model is a hybrid that can represent relational, hierarchical and network data structures. It does not separate the metadata and data content into three subsystems as this model does.
OOA/OOD has no mechanism for managing dynamic type creation and assignment; all type definitions must be coded and compiled. Its relationship management is inadequate and no service is available for controlling the creational and terminational dependencies. While UML can specify the cardinality of relationships, it only reflects the intention of the designer. Although it is possible to achieve dynamic attributes and behavior through design patterns, it is up to the developer to implement such patterns with no help at the programming language or platform level.
A relational database is not required for implementing this data model. This model, in fact, may not need a database server at all. The data model can be stored in flat files, XML files or serialized objects of the programming language used. However, using a relational database offers benefits: 1. Most readers are familiar with relational database notations, such as the entity relationship diagram, which helps convey the concepts; 2. The infrastructure of a relational database, including the unique constraint, referential integrity assurance mechanism and indexing schemes, facilitates the implementation; 3. There are established technologies, such as JDBC, for interfacing to the database from a programming language.
If implemented on an RDBMS, this architecture supersedes the relational model since it pre-creates all tables and constraints necessary for most applications. The developer simply identifies the types, attributes and relationships in the application and registers them in the generalized model. He may choose to create specialized relational database objects, such as tables and constraints, in addition to what this architecture provides.
An implementation of this model is illustrated in Figure 1.
Figure 1 presents a simplified version of the implementation. The Relations_type_def table holds the six relationship types identified in this work. Out of considerations such as performance and complexity, the tables attribute_data and relationship_data may be divided into more tables on the basis of the types of the attributes and the types of the relationships. For example, the data for R1D and RH may be placed in one separate table since the two relationship types share the same cascade constraints.
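As a rough illustration of how such pre-created tables could look, the sketch below builds a reduced schema in an in-memory SQLite database. The table names Entity_Type_def, Relations_type_def, Relationship_assigned and relationship_data come from the text; the `entity` table and every column name are assumptions, since Figure 1 is not reproduced here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Entity_Type_def (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES Entity_Type_def(id)
);
CREATE TABLE Relations_type_def (
    code TEXT PRIMARY KEY            -- R1P, R1D, RH, RC, RU, RA
);
CREATE TABLE Relationship_assigned (
    id INTEGER PRIMARY KEY,
    primary_type INTEGER NOT NULL REFERENCES Entity_Type_def(id),
    secondary_type INTEGER NOT NULL REFERENCES Entity_Type_def(id),
    relation_type TEXT NOT NULL REFERENCES Relations_type_def(code),
    primary_label TEXT,
    secondary_label TEXT
);
CREATE TABLE entity (                -- hypothetical table name
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE relationship_data (
    relationship_id INTEGER NOT NULL REFERENCES Relationship_assigned(id),
    primary_entity INTEGER NOT NULL REFERENCES entity(id),
    secondary_entity INTEGER NOT NULL REFERENCES entity(id),
    modifier TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO Relations_type_def VALUES (?)",
    [(code,) for code in ("R1P", "R1D", "RH", "RC", "RU", "RA")],
)
conn.executemany(
    "INSERT INTO Entity_Type_def (id, name) VALUES (?, ?)",
    [(1, "Bank"), (2, "Branch")],
)
# Registering a new relationship is a row insert, not a schema change.
conn.execute(
    "INSERT INTO Relationship_assigned VALUES (1, 1, 2, 'R1P', 'Branches', 'Headquarter')"
)
row = conn.execute(
    "SELECT p.name, s.name, r.relation_type FROM Relationship_assigned r "
    "JOIN Entity_Type_def p ON p.id = r.primary_type "
    "JOIN Entity_Type_def s ON s.id = r.secondary_type"
).fetchone()
print(row)
```

The point of the sketch is that types and relationships are rows in fixed tables, so adding one at runtime needs no DDL.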
While this generalized model can be used to complement, supplement, subsume or supersede object-oriented methodologies, including the Unified Modeling Language and programming languages with OOD features such as Java, the notations and terms widely used in those technologies are helpful for conveying the concepts and structures of this model. Therefore, the data model is extended into the application programming layer and reified in the form of a set of Java APIs depicted in UML. The API outlines a language for describing and implementing the management of entities, types, relationships, attributes and behaviors.
The set of Java programming interfaces is illustrated in Figure 2.
For an entity to assume a type, other entities, along with the type, are supplied to satisfy the requirements for R1P, R1D and RH relationships.
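The requirement that primaries be supplied when a type is assumed can be sketched as a check against the registered relationships. This is not the API of Figure 2: the function names and data layout are assumptions, the relationship triples are borrowed from the banking example later in this work, and the entity names are placeholders.

```python
# Relationship types whose secondary cannot exist without a primary.
REQUIRES_PRIMARY = {"R1P", "R1D", "RH"}

# (primary type, secondary type, relationship type)
RELATIONSHIPS = [
    ("Bank", "Branch", "R1P"),
    ("Customer", "Account", "RH"),
    ("Account", "Transaction", "R1D"),
    ("Branch", "Cashier", "RA"),
]


def required_primary_types(type_name):
    """Types of the primaries an entity needs before it can assume type_name."""
    return {
        primary
        for primary, secondary, rel in RELATIONSHIPS
        if secondary == type_name and rel in REQUIRES_PRIMARY
    }


def assume_type(entity_types, links, entity, type_name, primaries):
    """Give `entity` the type and link it to the supplied primaries.

    `primaries` maps a primary type name to the entity playing that role.
    """
    missing = required_primary_types(type_name) - set(primaries)
    if missing:
        raise ValueError(f"{type_name} needs a primary of type {sorted(missing)}")
    entity_types.setdefault(entity, set()).add(type_name)
    for primary_type, primary in primaries.items():
        links.append((primary_type, primary, type_name, entity))


entity_types, links = {}, []
assume_type(entity_types, links, "acct-1", "Account", {"Customer": "cust-1"})
print(entity_types, links)
```

A type such as Cashier, which takes part only in an RA relationship here, can be assumed without supplying any other entity.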
An application developer’s use of the API, in a simplified case, would involve:
For static, simple and isolated systems, which can be modeled with a few tables in a database, this modeling methodology does not offer significant benefits. This model solves the extensibility problems with large, complex and dynamic systems such as laboratory information management systems, chemical and bio-informatics, semantic networks, and drug interactions.
Since the model separates the meta-model, the model and the data, the performance is more easily managed. In cases where extremely large data sizes are involved, techniques for denormalization and materialized views are needed.
An example of implementation of this model is provided to illustrate the concepts and the techniques for analysis.
Example case
A bank has branch offices. Each branch has a manager and a number of cashiers. The manager is essential to the branch such that she cannot be promoted out of it or quit without a replacement. Cashiers can come and go and can work for more than one branch. Customers open accounts at branches and conduct transactions at the home branch or at other branches. Accounts may be jointly held. Some transactions are conducted through a cashier. The manager and the cashiers of a branch, as employees of the bank, receive employment benefits from the bank and differ only in their job roles at the branch.
Analysis of the data model elements
The following types can be identified, which are entered into the Entity_Type_def table as depicted in Figure 1.
Type | Parent Type | Note |
Bank | | The headquarters |
Branch | | Cannot exist without the bank |
Employee | | May relate to one or more branches |
Manager | Employee | Relates to one branch with RU, is an employee |
Cashier | Employee | Relates to one or more branches |
Customer | | Holds accounts and conducts transactions |
Transaction | | Relates to branches and cashiers |
The following relationships are identified, which are entered into the Relationship_assigned table as depicted in Figure 1.
Primary Type | Secondary Type | Relationship Type | Primary Label | Secondary Label |
Bank | Branch | R1P | Branches | Headquarter |
Branch | Manager | RU | Manager | Manager of |
Branch | Cashier | RA | Cashiers | Works at |
Employee | Employee | RU | Subordinates | Manager |
Account | Employee | RA | Opened by | Opened accounts |
Customer | Account | RH | Accounts | Owner |
Account | Branch | RA | Opened at branch | Opened accounts |
Account | Transaction | R1D | Transactions | Account |
Transaction | Employee | RA | Performed by | |
Transaction | Branch | RA | Performed at | |
The “Primary Label” is what appears on the graphical user interface on the view of the entity on the primary end. The display of a branch, for example, will show its manager next to the label “Manager”. The use of the labels on the user interface will be shown later in this section.
Attribute definitions are entered into the Attribute_Def table as depicted in Figure 1.
Attribute Name | Data Type |
Address | Text |
Amount | USCurrency |
Employee No. | Text |
The address attribute is assigned on Employee, Customer, Branch and Bank.
The amount attribute is assigned on Transaction and Account. On Transaction, a positive number indicates a deposit and a negative number a withdrawal. On Account, it is the ending balance.
The Employee Number is assigned on Employee.
The above assignments are entered into the Attribute_assigned table as depicted in Figure 1.
Behavior definitions and assignments:
Behavior definitions and assignments are entered into the Behavior_def and Behavior_assigned tables as depicted in Figure 1.
Implementation of the language elements
Since the model encapsulates most of the business logic, this layer performs calculations and links the user interface to the data model.
Implementation of the user interface
The user interface can reflect the dynamic and flexible nature of the data model. It allows the browsing of entities along the relationship links and presents all the relationships, types, attributes and behaviors.
An implementation of the user interface is presented in Figure 3.
Figure 3 User interfaces that draw data from the model
In Figure 3, the Branch view’s title, Overland Branch, is the branch’s name in the model. “Address” is an attribute; “Manager” is from the [Branch – Manager] relationship; “Cashier” is from the [Branch – Cashier] relationship. In the Account view, the title “Checking 3120017” is the account’s name. The field “Opened by” is from the [Account – Employee] relationship; “Owner” is from [Customer – Account]; “Transactions” is from [Account – Transaction]. The two buttons, “Transact” and “Close Account”, represent the two behaviors assigned to the Account type. In the Employee view, to the right of the name “Michelle White” are the types the employee has assumed. “Address” and “Employee No.” are two attributes. “Manager” and “Works at” are from relationships. “Resign” is a behavior assigned to Employee in the model.
All the text and list values, as well as the behavior definitions, on the above three views are taken directly from the model, without hard-coded values in the design of the user interface.
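How a view can be assembled from the model alone is sketched below. The two relationships and their labels are taken from the relationship table of this example; the `view` function, the link layout and the people named in the sample links are hypothetical, and relationships are identified by their type pair only to keep the sketch short.

```python
# (primary type, secondary type, primary label, secondary label)
RELATIONSHIPS = [
    ("Branch", "Manager", "Manager", "Manager of"),
    ("Branch", "Cashier", "Cashiers", "Works at"),
]

# (primary type, primary entity, secondary type, secondary entity)
# Sample links; the people are made up for the example.
LINKS = [
    ("Branch", "Overland Branch", "Manager", "Pat"),
    ("Branch", "Overland Branch", "Cashier", "Lee"),
    ("Branch", "Overland Branch", "Cashier", "Sam"),
]


def view(entity):
    """Collect the labelled fields to display for an entity."""
    fields = {}
    for p_type, s_type, p_label, s_label in RELATIONSHIPS:
        for link_p_type, primary, link_s_type, secondary in LINKS:
            if (link_p_type, link_s_type) != (p_type, s_type):
                continue
            if primary == entity:      # entity sits on the primary end
                fields.setdefault(p_label, []).append(secondary)
            if secondary == entity:    # entity sits on the secondary end
                fields.setdefault(s_label, []).append(primary)
    return fields


print(view("Overland Branch"))
print(view("Lee"))
```

No label appears in the function body, so renaming a label or registering a new relationship changes the screen without touching the user interface code.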
The cases listed below serve to demonstrate procedures and techniques for applying the model to various systems. The applicability of the model is not limited to these cases.
Kinship modeling tracks parent-child relationships across multiple generations. We identify the following entity types and attributes:
Person: first name, last name, gender, birthday, birthplace
Coupling: start date, end date. The name of a coupling is the concatenation of the names of its members.
The Coupling here is similar to a marriage in that it produces children, but different in that it is permanent as long as there is a child.
Relationships:
Primary Type | Secondary Type | Relationship Type | Primary Label | Secondary Label |
Coupling | Person | RU | Family of | Member of family |
Coupling | Person | R1D | Parent of | Child of |
While the [parent – coupling – child] relationships identified here seem looser than the alternative [parents – RH – child] relationship, they are actually more accurate because the parents of a child may be unknown. Given a person, the model can follow the R1D relationship up to the coupling that produced him and then that coupling's RU relationships to find his parents. In the other direction, it can follow his RU relationships to the couplings he belongs to and then their R1D relationships down to find his children.
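Tracing kinship over the two relationships in the table above can be sketched with plain link tuples. The reading used here is that RU links a coupling to its members and R1D links a coupling to the children it produced; the function names and all person and coupling names are hypothetical.

```python
# Links as (relationship type, coupling, person).
LINKS = [
    ("RU", "Ann+Bob", "Ann"),    # members of the coupling
    ("RU", "Ann+Bob", "Bob"),
    ("R1D", "Ann+Bob", "Cid"),   # child produced by the coupling
    ("RU", "Cid+Dee", "Cid"),
    ("RU", "Cid+Dee", "Dee"),
    ("R1D", "Cid+Dee", "Eve"),
]


def parents(person):
    """Up the R1D link to the coupling, then down its RU links to the members."""
    couplings = {c for rel, c, p in LINKS if rel == "R1D" and p == person}
    return sorted(p for rel, c, p in LINKS if rel == "RU" and c in couplings)


def children(person):
    """Up the RU links to the person's couplings, then down their R1D links."""
    couplings = {c for rel, c, p in LINKS if rel == "RU" and p == person}
    return sorted(p for rel, c, p in LINKS if rel == "R1D" and c in couplings)


def ancestors(person):
    """Repeat the parent lookup across generations."""
    found, frontier = set(), [person]
    while frontier:
        for parent in parents(frontier.pop()):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return sorted(found)


print(parents("Cid"), children("Cid"), ancestors("Eve"))
```

A person with no R1D link simply has no known parents, which is the case the coupling-based design is meant to accommodate.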
A customer of a library can request inter-library loans from a library. A book has a home library that owns it. The librarian may assist the checkout. The following entity types are identified: Library, Book, Customer, Loan, and Librarian.
Relationships:
Primary Type | Secondary Type | Relationship Type | Primary Label | Secondary Label |
Book | Loan | RH | On loan | Books |
Library | Loan | RH | Loans | Checkout Library |
Customer | Loan | RH | Has loan | Borrower |
Library | Book | RA | Owns | Belongs to |
Data or knowledge management systems
Large volumes of data from heterogeneous sources need to be categorized as the first step of their management. Categorization facilitates navigation and searches. A typical example is a content directory, such as the Yahoo directory. The categories form intersecting hierarchical structures.
For example, there are three paths to the category [United States] from http://dir.yahoo.com/:
While each path is on a branch of a hierarchical structure, the end nodes are identical. Therefore, for the purpose of content management like that on Yahoo.com, we identify the following types:
Type | Note |
Directory | The root of all categories |
Category | Including unlimited levels of subdivision |
Web Site | The site listings within a category |
Relationships
Primary Type | Secondary Type | Relationship Type | Note |
Directory | Category | R1P | Directory consists of categories |
Category | Category | R1P | Category has subcategories |
Category | Category | RU | Categories link to each other across hierarchies |
Category | Web Site | RA | Web site listings are placed under a category |
The RU relationship here describes the cross-linking of categories. Among the cross-linked ones, one is the primary, the deletion of which invalidates the link.
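The intersecting hierarchies can be sketched as one graph in which a category has a single R1P parent and any number of RU cross-links. The category names and the link set below are hypothetical and are not taken from the Yahoo directory; the sketch only shows how several paths can end at the same node.

```python
# (relationship type, primary, secondary)
LINKS = [
    ("R1P", "Directory", "Regional"),
    ("R1P", "Directory", "Government"),
    ("R1P", "Directory", "Travel"),
    ("R1P", "Regional", "Countries"),
    ("R1P", "Countries", "United States"),   # the hierarchical home
    ("RU", "Government", "United States"),   # cross-links from other branches
    ("RU", "Travel", "United States"),
]


def paths(root, target):
    """List every path from root to target along R1P and RU links."""
    children = {}
    for _, primary, secondary in LINKS:
        children.setdefault(primary, []).append(secondary)
    found, stack = [], [[root]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            found.append(path)
            continue
        for child in children.get(path[-1], []):
            if child not in path:   # guard against cycles among cross-links
                stack.append(path + [child])
    return sorted(found)


for path in paths("Directory", "United States"):
    print(" > ".join(path))
```

Because the target is a single entity, a listing attached to it is reachable along every path without being duplicated.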
The dimension of time can easily be added to the model to describe changes in all dimensions, including types, attributes and relationships, over time. Each assignment and each data item can bear a time stamp. At the relational data model level, this is achieved by adding a datetime column to the xxx_assigned and xxx_data tables. The resultant model simulates the evolution of a complex system in detail.
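A time-stamped data table and an "as of" lookup might look like the following sketch, which uses an in-memory SQLite database. The table name attribute_data comes from this work; the column names, the ISO 8601 text encoding of the time stamp and the sample addresses are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_data ("
    " entity TEXT NOT NULL,"
    " attribute TEXT NOT NULL,"
    " value TEXT,"
    " recorded_at TEXT NOT NULL)"   # the added datetime column (ISO 8601 text)
)
conn.executemany(
    "INSERT INTO attribute_data VALUES (?, ?, ?, ?)",
    [
        ("Overland Branch", "Address", "1 First St", "2003-01-01T00:00:00"),
        ("Overland Branch", "Address", "9 Ninth Ave", "2004-06-01T00:00:00"),
    ],
)


def value_as_of(entity, attribute, moment):
    """Return the latest value recorded at or before `moment`, or None."""
    row = conn.execute(
        "SELECT value FROM attribute_data"
        " WHERE entity = ? AND attribute = ? AND recorded_at <= ?"
        " ORDER BY recorded_at DESC LIMIT 1",
        (entity, attribute, moment),
    ).fetchone()
    return row[0] if row else None


print(value_as_of("Overland Branch", "Address", "2003-12-31T00:00:00"))
```

Rows are appended rather than updated, so earlier states of the system stay queryable.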
An implementation of a semantic network is provided in the Unified Medical Language System (UMLS) by the National Library of Medicine (http://www.nlm.nih.gov/research/umls/META3.HTML#s30). The 2003AB release of the Semantic Network contains 135 semantic types and 54 relationships. One branch of the semantic types contains [Physical Object – Organism – Animal – Mammal – Human], for example. These semantic types are entity types in the context of this work. In the following table, the semantic relationships are categorized into the relationship types defined in this work. It is evident that the data or knowledge on the semantic network can be cast onto this model.
Semantic relationship | Relationship type | Note |
isa | R1D | |
associated_with | RU | Use the loosest for the high-level relationship. |
physically_related_to | RU | |
part_of | R1P | |
consists_of | R1P | |
contains | R1P | |
connected_to | RU | |
interconnects | RC | Can be RA if the connection is loose. |
branch_of | R1P | |
tributary_of | R1P | |
ingredient_of | RU | Ingredient may be shared. |
spatially_related_to | RU | |
location_of | R1D | |
adjacent_to | RU | |
surrounds | RU | |
traverses | RU | |
functionally_related_to | RU | |
affects | RU | |
manages | RU | |
treats | RU | |
disrupts | RU | |
complicates | RU | |
interacts_with | RC | |
prevents | RU | Need a type or attribute of Absence. |
brings_about | R1D | |
produces | R1D | |
causes | R1D | |
performs | R1D | |
occurs_in | R1D | |
uses | RU | |
manifestation_of | RU | |
indicates | RU | |
result_of | R1D | |
temporally_related_to | RU | |
co_occurs_with | RC | |
precedes | RU | |
conceptually_related_to | RU | |
evaluation_of | RU | |
degree_of | RU | |
analyzes | RU | |
measurement_of | RU | |
measures | RU | |
diagnoses | RU | |
property_of | RU | May be modeled as attribute. |
derivative_of | R1D | |
developmental_form_of | R1D | |
method_of | R1D | |
conceptual_part_of | R1D | |
issue_in | R1D | |
Since all entities can be placed in a flat structure, separate from the type information, it is convenient to attach a security code to each of them. The code determines a user’s access permissions on the entity, such as the ability to modify its attributes and relationships, on the basis of the user’s role and group affiliations.
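A per-entity security code could be checked as in the sketch below. The code names, role names, actions and the `allowed` function are all hypothetical; the text specifies only that a code is attached to each entity and evaluated against the user's roles and groups.

```python
# Hypothetical security codes and the roles allowed to act under each.
PERMISSIONS = {
    "public": {"read": {"teller", "manager", "auditor"}, "modify": {"manager"}},
    "restricted": {"read": {"manager", "auditor"}, "modify": set()},
}

# Every entity, whatever its types, carries one security code.
SECURITY_CODE = {
    "Overland Branch": "public",
    "Checking 3120017": "restricted",
}


def allowed(user_roles, entity, action):
    """True if any of the user's roles grants `action` on the entity."""
    code = SECURITY_CODE[entity]
    return bool(PERMISSIONS[code][action] & set(user_roles))


print(allowed({"manager"}, "Overland Branch", "modify"))
```

Because the check is keyed on the entity rather than on its type, it keeps working when the entity assumes or loses types.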
The methodology unifies modeling at the object and persistence layers and provides semantics and a management infrastructure for types, attributes, relationships and behaviors. It is suitable for modeling and application development for large-scale, complex and under-defined systems. The benefits include adaptiveness, a reduced need for coding and ease of maintenance.