Introduction

FastDB is a highly efficient main memory database system with realtime capabilities and a convenient C++ interface. FastDB doesn't support a client-server architecture, so all applications using a FastDB database must run on the same host. FastDB is optimized for applications whose access pattern is dominated by reads. High query execution speed is achieved by eliminating data transfer overhead and by a very effective locking implementation. The database file is mapped into the virtual memory space of each application working with the database, so a query is executed in the context of the application, requiring no context switching and no data transfer. Synchronization of concurrent database access is implemented in FastDB by means of atomic instructions, adding almost no overhead to query processing. FastDB assumes that the whole database is present in RAM and optimizes its search algorithms and structures according to this assumption. Moreover, FastDB has no overhead caused by database buffer management and needs no data transfer between a database file and a buffer pool. That is why FastDB works significantly faster than a traditional database, even one with all data cached in its buffer pool.

FastDB supports transactions, online backup and automatic recovery after system crash. The transaction commit protocol is based on a shadow root pages algorithm, performing atomic update of the database. Recovery can be done very fast, providing high availability for critical applications. Moreover, the elimination of transaction logs improves the total system performance and leads to a more effective usage of system resources.
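As a rough illustration of the shadow-root idea, the sketch below keeps two copies of a root (here just a small page table) and one atomic index telling readers which copy is current. A transaction writes only to the shadow copy, and commit publishes it with a single atomic store. This is a toy, single-writer model under stated assumptions, not FastDB's implementation, which works on memory-mapped pages and must also keep an old root alive until readers using it have finished. The class name `ShadowRootStore` and its methods are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a shadow-root commit (single writer assumed).
class ShadowRootStore {
  public:
    explicit ShadowRootStore(std::size_t nPages)
        : roots_{{std::vector<int>(nPages, 0), std::vector<int>(nPages, 0)}},
          current_(0) {}

    // Readers always go through the currently published root.
    int read(std::size_t page) const {
        return roots_[current_.load(std::memory_order_acquire)][page];
    }

    // The writer modifies only the shadow root, never the published one.
    void write(std::size_t page, int value) { shadow()[page] = value; }

    // Commit: publish the shadow root with one atomic store, then copy it
    // into the other slot so the next transaction starts from committed state.
    void commit() {
        int next = 1 - current_.load(std::memory_order_relaxed);
        current_.store(next, std::memory_order_release);
        roots_[1 - next] = roots_[next];
    }

    // Rollback: discard uncommitted changes by resetting the shadow root.
    void rollback() {
        shadow() = roots_[current_.load(std::memory_order_relaxed)];
    }

  private:
    std::vector<int>& shadow() {
        return roots_[1 - current_.load(std::memory_order_relaxed)];
    }

    std::array<std::vector<int>, 2> roots_;
    std::atomic<int> current_;
};
```

Because readers never see a half-written root, recovery after a crash only has to pick the last published root; no log needs to be replayed.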

FastDB is an application-oriented database. Database tables are constructed using information about application classes. FastDB supports automatic schema evolution, allowing you to make changes in only one place: your application classes. FastDB provides a flexible and convenient interface for retrieving data from the database. A SQL-like query language is used to specify queries. Post-relational capabilities such as non-atomic fields, nested arrays, user-defined types and methods, and direct inter-object references simplify the design of database applications and make them more efficient.

Although FastDB is optimized on the assumption that the whole database fits into the physical memory of the computer, it can also be used with databases larger than the physical memory of the system. In that case, the standard operating system swapping mechanisms take over. However, all FastDB search algorithms and structures are optimized on the assumption that all data resides in memory, so efficiency for swapped-out data will not be very high.

Query language

FastDB supports a query language with SQL-like syntax. FastDB uses a notation more popular in object-oriented programming than in relational databases. Table rows are considered object instances, and the table is the class of these objects. Unlike SQL, FastDB works with objects instead of SQL tuples, so the result of each query execution is a set of objects of one class. The main differences between the FastDB query language and standard SQL are:

  1. There are no joins of several tables and no nested subqueries. A query always returns a set of objects from one table.
  2. Standard C types are used for atomic table columns.
  3. There are no NULL values, except null references. I completely agree with C.J. Date's criticism of three-valued logic and his proposal to use default values instead.
  4. Structures and arrays can be used as record components. A special exists quantifier is provided for locating elements in arrays.
  5. Parameterless user methods can be defined for table records (objects) as well as for record components.
  6. User functions with exactly one string or numeric argument can be defined by the application.
  7. References between objects are supported, including automatic support for inverse references.
  8. The start from ... follow by construction performs a recursive traversal of records using references.
  9. Because the query language is deeply integrated into C++ classes, a case sensitive mode is used for language identifiers as well as for keywords.
  10. No implicit conversion of integer and floating types to string representation is done. If such conversion is needed, it must be done explicitly.

The following rules in BNF-like notation specify the grammar of the FastDB query language search predicates:

Grammar conventions
Example      Meaning
expression   non-terminals
not          terminals
|            disjoint alternatives
(not)        optional part
{1..9}       repeat zero or more times

select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction 
        | conjunction or disjunction
conjunction ::= comparison 
        | comparison and conjunction
comparison ::= operand = operand 
        | operand != operand 
        | operand <> operand 
        | operand < operand 
        | operand <= operand 
        | operand > operand 
        | operand >= operand 
        | operand (not) like operand 
        | operand (not) like operand escape string
        | operand (not) in operand
        | operand (not) in expressions-list
        | operand (not) between operand and operand
        | operand is (not) null
operand ::= addition
addition ::= multiplication
        | addition +  multiplication
        | addition || multiplication
        | addition -  multiplication
multiplication ::= power
        | multiplication * power
        | multiplication / power
power ::= term
        | term ^ power
term ::= identifier | number | string
        | true | false | null
        | current | first | last
        | ( expression )
        | not comparison
        | - term
        | term [ expression ]
        | identifier . term
        | function term
        | exists identifier : term
function ::= abs | length | lower | upper
        | integer | real | string | user-function
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= field (asc | desc)
field ::= identifier { . identifier }
traverse ::= start from field ( follow by fields-list )
fields-list ::=  field { , field }
user-function ::= identifier

Identifiers are case sensitive, begin with an a-z, A-Z, '_' or '$' character, contain only a-z, A-Z, 0-9, '_' or '$' characters, and must not duplicate a SQL reserved word.

List of reserved words
abs       and       asc       between   by
current   desc      escape    exists    false
first     follow    from      in        integer
is        length    like      last      lower
not       null      or        real      start
string    true      upper

ANSI-standard comments may also be used. All characters after a double-hyphen up to the end of the line are ignored.

FastDB extends ANSI standard SQL operations by supporting bit manipulation operations. The operators and and or can be applied not only to boolean operands but also to operands of integer type. The result of applying and/or to integer operands is an integer value with bits set by the bitwise AND/OR operation. Bit operations can be used for an efficient implementation of small sets. FastDB also supports the raising to a power operation (x^y) for integer and floating point types.
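To make the integer semantics concrete, here is the equivalent C++ computation for a small set encoded as bit flags: applying or to two integers yields the union, and applying and yields the intersection. The flag names and helper functions are hypothetical and only illustrate the arithmetic FastDB performs on such fields.

```cpp
#include <cstdint>

// Hypothetical option flags: each member of the small set gets one bit.
enum Option : std::int32_t {
    Express   = 1 << 0,
    Insured   = 1 << 1,
    Fragile   = 1 << 2,
    Oversized = 1 << 3
};

// Set union: what "or" computes for integer operands.
inline std::int32_t setUnion(std::int32_t a, std::int32_t b) { return a | b; }

// Set intersection: what "and" computes for integer operands.
inline std::int32_t setIntersection(std::int32_t a, std::int32_t b) { return a & b; }

// Membership test: the element's bit survives the intersection.
inline bool contains(std::int32_t set, std::int32_t element) {
    return (set & element) != 0;
}
```

A whole set of up to 32 (or 64, with int8) flags thus fits into one atomic field, and membership tests reduce to a single bitwise operation.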

Structures

FastDB accepts structures as components of records. Fields of the structure can be accessed using the standard dot notation: company.address.city

Structure fields can be indexed and used in an order by specification. Structures can contain other structures as their components; there are no limitations on the nesting level.

The programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such a method should have no arguments except a pointer to the object to which it belongs (the this pointer in C++), and should return an atomic value (of boolean, numeric, string or reference type). The method should also not change the object instance (it must be an immutable method). If the method returns a string, this string should be allocated using the new char[] operator, because it will be deleted after its value has been copied.

User-defined methods can therefore be used to create virtual components: components which are not stored in the database but are instead calculated from the values of other components. For example, the FastDB dbDateTime type contains only an integer timestamp component and methods such as dbDateTime::year(), dbDateTime::month(), and so on. So it is possible to specify queries like "delivery.year = 1999" in an application where the delivery record field has the dbDateTime type. Methods are executed in the context of the application in which they are defined and are not available to other applications or to interactive SQL.
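A minimal sketch of such a virtual component, in plain C++ without FastDB headers: the only stored state is an integer timestamp, and `year()` derives its value on demand. The struct name `Timestamp` is hypothetical, and this sketch converts in UTC with `std::gmtime` (an assumption made so the result does not depend on the local time zone).

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical record component: only the timestamp is stored.
struct Timestamp {
    std::int32_t stamp;  // seconds since 1970-01-01 00:00:00 UTC

    // Virtual component: computed from the stored field, never stored.
    // Takes no arguments, does not modify the object, and returns an
    // atomic (integer) value, as required for methods used in queries.
    int year() const {
        std::time_t t = stamp;  // widen to time_t before conversion
        std::tm const* parts = std::gmtime(&t);
        return parts->tm_year + 1900;
    }
};
```

A condition like `delivery.year = 1999` then amounts to calling `year()` on each candidate record's component and comparing the result.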

Arrays

FastDB accepts arrays with dynamic length as components of records. Multidimensional arrays are not supported, but it is possible to define an array of arrays. FastDB provides a set of special constructions for dealing with arrays:

  1. It is possible to get the number of elements in the array by the length() function.
  2. Array elements can be fetched by the [] operator. If an index expression is out of the array range, an exception is raised.
  3. The operator in can be used to check if an array contains the value specified by the left operand. This operation can be used only for arrays of atomic type: with boolean, numeric, reference or string components.
  4. Iteration through array elements is performed by the exists operator. A variable specified after the exists keyword can be used as an array index in the expression that follows the exists quantifier. This index variable iterates through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition
            exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped by companies located in 'US', while the query
            not exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped from companies outside 'US'.

    Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops using the corresponding index variables. For example, the query

            exists column: (exists row: (matrix[column][row] = 0))
    
    will select all records containing 0 in any element of the matrix field, which has the type array of array of integer. This construction is equivalent to the following two nested loops:
           bool result = false;
           for (int column = 0; column < matrix.length() && !result; column++) { 
                for (int row = 0; row < matrix[column].length(); row++) { 
                    if (matrix[column][row] == 0) { 
                        result = true;
                        break;  // leaves the inner loop; "!result" then stops the outer one
                    }
                }
           }
    
    The order of using indices is essential! The result of the following query execution
            exists row: (exists column: (matrix[column][row] = 0))
    
    will be completely different from the result of the previous query. In the latter case, the program simply hangs due to an infinite loop in case of empty matrices.
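For a single array, exists behaves like a short-circuiting search over the index range, and not exists is simply its negation. The sketch below expresses `exists i: (contract[i].company.location = 'US')` and the array form of the `in` operator in plain C++. The structs `Company`, `Contract` and `Part` are simplified, hypothetical stand-ins for the real tables, with references modeled as plain pointers.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the tables; references are plain pointers here.
struct Company  { std::string location; };
struct Contract { Company const* company; };
struct Part     { std::vector<Contract> contract; };

// exists i: (contract[i].company.location = location)
// Tries every valid index and stops at the first match.
bool shippedFrom(Part const& part, std::string const& location) {
    for (std::size_t i = 0; i < part.contract.size(); i++) {
        if (part.contract[i].company->location == location) {
            return true;
        }
    }
    return false;
}

// value in array: true if any element equals the left operand.
bool inArray(int value, std::vector<int> const& array) {
    return std::find(array.begin(), array.end(), value) != array.end();
}
```

Note that an empty array yields false for exists (and therefore true for not exists), which matches the loop semantics described above.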

Strings

All strings in FastDB have varying length, so the programmer need not worry about specifying a maximal length for character fields. All operations applicable to arrays are also applicable to strings. In addition, strings have a set of their own operations. First of all, strings can be compared with each other using the standard relational operators. At present, FastDB supports only the ASCII character set (corresponding to type char in C) and byte-by-byte comparison of strings, ignoring locale settings.

The operator like can be used for matching a string with a pattern containing special wildcard characters '%' and '_'. The character '_' matches any single character, while the character '%' matches zero or more characters. An extended form of the like operator together with the escape keyword can be used to handle the characters '%' and '_' in the pattern as normal characters if they are preceded by a special escape character, specified after the escape keyword.
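The matching rules can be sketched with a small backtracking matcher. This is an illustrative implementation of the semantics described above, not FastDB's code; `escape` stands for the character given after the escape keyword, with `'\0'` (an assumption of this sketch) meaning that no escape character is used.

```cpp
#include <cstddef>
#include <string>

// Returns true if text[ti..] matches pattern[pi..]:
// '_' matches exactly one character, '%' matches any run (even empty),
// and the escape character makes the following pattern character literal.
bool likeFrom(std::string const& text, std::size_t ti,
              std::string const& pattern, std::size_t pi, char escape) {
    while (pi < pattern.size()) {
        char p = pattern[pi];
        if (escape != '\0' && p == escape && pi + 1 < pattern.size()) {
            // Escaped character: compare it literally.
            if (ti >= text.size() || text[ti] != pattern[pi + 1]) return false;
            ti += 1;
            pi += 2;
        } else if (p == '%') {
            // Try every possible length for the '%' run, shortest first.
            for (std::size_t k = ti; k <= text.size(); k++) {
                if (likeFrom(text, k, pattern, pi + 1, escape)) return true;
            }
            return false;
        } else if (p == '_') {
            if (ti >= text.size()) return false;
            ti += 1;
            pi += 1;
        } else {
            if (ti >= text.size() || text[ti] != p) return false;
            ti += 1;
            pi += 1;
        }
    }
    return ti == text.size();  // the whole text must be consumed
}

bool like(std::string const& text, std::string const& pattern, char escape = '\0') {
    return likeFrom(text, 0, pattern, 0, escape);
}
```

The backtracking on '%' keeps the code short but can be slow for patterns with many '%' characters; a production matcher would typically avoid that worst case.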

It is possible to search for substrings within a string with the in operator. The expression ('blue' in color) will be true for all records whose color field contains 'blue'. If the length of the searched string is greater than a threshold value (currently 512), a Boyer-Moore substring search algorithm is used instead of a straightforward search.
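The same two-tier strategy can be sketched with the C++17 standard library: a plain `std::string::find` for short strings and `std::boyer_moore_searcher` (from `<functional>`) above the threshold. Applying the threshold to the length of the string being searched is this sketch's reading of the text, and the names `containsSubstring` and `kBoyerMooreThreshold` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Assumed threshold, mirroring the value quoted in the text.
constexpr std::size_t kBoyerMooreThreshold = 512;

// ('pattern' in text): true if pattern occurs somewhere in text.
bool containsSubstring(std::string const& text, std::string const& pattern) {
    if (text.size() <= kBoyerMooreThreshold) {
        // Short strings: a straightforward search is cheap enough.
        return text.find(pattern) != std::string::npos;
    }
    // Long strings: Boyer-Moore precomputes skip tables from the pattern
    // so that it can jump over parts of the text.
    std::boyer_moore_searcher<std::string::const_iterator>
        searcher(pattern.begin(), pattern.end());
    return std::search(text.begin(), text.end(), searcher) != text.end();
}
```

The table construction has a fixed cost per search, which is why switching algorithms only for long strings is a reasonable design choice.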

Strings can be concatenated by the + or || operators. The latter was added for compatibility with the ANSI SQL standard. Since FastDB doesn't support implicit conversion to the string type in expressions, the semantics of the + operator can be redefined for strings.

References

References can be dereferenced using the same dot notation as used for accessing structure components. For example the following query
        company.address.city = 'Chicago'
will access records referenced by the company component of a Contract record and extract the city component of the address field of the referenced record from the Supplier table.

References can be checked for null by is null or is not null predicates. Also references can be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception is raised by FastDB.

There is a special keyword current, which can be used during a table search to refer to the current record. Usually, the current keyword is used to compare the current record identifier with other references or to locate it within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has the type dbArray< dbReference<Contract> >):

        current not in supplier.canceledContracts

FastDB provides special operators for recursive traverse of records by references:

     start from root-references
     ( follow by list-of-reference-fields )
The first part of this construction is used to specify root objects. The nonterminal root-references should be a variable of reference or of array of reference type. The two special keywords first and last can be used here, locating the first and last record in the table, respectively. If you want to check all records referenced by an array of references or a single reference field for some condition, this construction can be used without the follow by part.

If you specify the follow by part, FastDB recursively traverses the table, starting from the root references and using the list-of-reference-fields for transitions between records. The list-of-reference-fields should consist of fields of reference or array of reference type. The traversal is done in depth-first, top-left-right (TLR) order: first the parent node is visited, then its children in left-to-right order. The recursion terminates when a null reference is accessed or an already visited record is referenced. For example, the following query searches a tree of records for records with weight larger than 1, in TLR order:

        "weight > 1 start from first follow by left, right"

For the following tree:

                              A:1.1
              B:2.0                             C:1.5
      D:1.3         E:1.8                F:1.2         G:0.8
the result of the query execution will be:
('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
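The traversal order can be reproduced with a short recursive function. In this sketch, references are modeled as pointers, `left` and `right` play the role of the fields listed after follow by, and a visited set stops the recursion at records that were already reached. The `Node` type and the `traverse` function are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record with two reference fields used by "follow by left, right".
struct Node {
    std::string name;
    double weight;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Depth-first, top-left-right traversal: check the record, then follow each
// reference field in order. Stops at null or already visited records.
void traverse(Node* node, std::set<Node*>& visited,
              std::vector<std::string>& result) {
    if (node == nullptr || !visited.insert(node).second) {
        return;
    }
    if (node->weight > 1) {  // the query condition "weight > 1"
        result.push_back(node->name);
    }
    traverse(node->left, visited, result);
    traverse(node->right, visited, result);
}
```

Building the tree shown above and calling `traverse` on the root collects A, B, D, E, C, F; node G is visited but rejected by the condition. The visited set also makes the traversal safe on graphs with cycles.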

Functions

Predefined functions
Name      Argument type   Return type   Description
abs       integer         integer       absolute value of the argument
abs       real            real          absolute value of the argument
integer   real            integer       conversion of real to integer
length    array           integer       number of elements in array
lower     string          string        lowercase string
real      integer         real          conversion of integer to real
string    integer         string        conversion of integer to string
string    real            string        conversion of real to string
upper     string          string        uppercase string

A FastDB application can define its own functions. The function should have only a single argument of int8, real8 or char const* type and return a value of bool, int8, real8 or char* type. User functions should be registered by the USER_FUNC(f) macro, which creates a static object of the dbUserFunction class, binding the function pointer and the function name. For example the following statements make it possible to use the sin function in SQL statements:

        #include <math.h>
        ...
        USER_FUNC(sin);
Functions can be used only within the application where they are defined. Functions are not accessible from other applications or interactive SQL. If a function returns a string, the returned string should be copied into memory allocated by operator new, because FastDB deletes it after copying the returned value.
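The registration mechanism relies on a common C++ idiom: a static object whose constructor records a name-to-function binding during static initialization, before `main` runs. The sketch below shows that idiom in isolation. `userFunctionRegistry`, `RegisteredFunction` and `MY_USER_FUNC` are hypothetical names, not FastDB's `dbUserFunction` or `USER_FUNC`, and only the real8-argument case is modeled.

```cpp
#include <map>
#include <string>

using RealFunc = double (*)(double);

// Hypothetical registry mapping SQL-visible names to function pointers.
// A function-local static avoids static initialization order problems.
inline std::map<std::string, RealFunc>& userFunctionRegistry() {
    static std::map<std::string, RealFunc> registry;
    return registry;
}

// A static instance of this class registers a function at startup.
struct RegisteredFunction {
    RegisteredFunction(char const* name, RealFunc f) {
        userFunctionRegistry()[name] = f;
    }
};

// Hypothetical counterpart of USER_FUNC: binds the function under its own name.
#define MY_USER_FUNC(f) static RegisteredFunction f##_registration(#f, f)

double half(double x) { return x / 2; }
MY_USER_FUNC(half);
```

Because the binding lives in the application's own registry, it also explains why such functions are invisible to other applications and to interactive SQL.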

In FastDB, the function argument can (but need not) be enclosed in parentheses. So both of the following expressions are valid:

        '$' + string(abs(x))
        length string y

C++ interface

One of the primary goals of FastDB is to provide a flexible and convenient application language interface. Anyone who has had to use ODBC or similar SQL interfaces will understand what I am talking about. In FastDB, a query can be written in C++ in the following way:

    dbQuery q; 
    dbCursor<Contract> contracts;
    dbCursor<Supplier> suppliers;
    int price, quantity;
    q = "(price >=",price,"or quantity >=",quantity,
        ") and delivery.year=1999";
    // input price and quantity values
    if (contracts.select(q) != 0) { 
        do { 
            printf("%s\n", suppliers.at(contracts->supplier)->company);
        } while (contracts.next());
    } 

Table

Data in FastDB is stored in tables which correspond to C++ classes whereas the table records correspond to class instances. The following C++ types are accepted as atomic components of FastDB records:

Type              Description
bool              boolean type (true, false)
int1              one byte signed integer (-128..127)
int2              two bytes signed integer (-32768..32767)
int4              four bytes signed integer (-2147483648..2147483647)
int8              eight bytes signed integer (-2**63..2**63-1)
real4             four bytes ANSI floating point type
real8             eight bytes ANSI double precision floating point type
char const*       zero terminated string
dbReference<T>    reference to class T
dbArray<T>        dynamic array of elements of type T

In addition to types specified in the table above, FastDB records can also contain nested structures of these components. FastDB doesn't support unsigned types to simplify the query language, to eliminate bugs caused by signed/unsigned comparison and to reduce the size of the database engine.
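The integer ranges in the table are those of two's-complement fixed-width integers. The compile-time checks below confirm them using `<cstdint>` and `<limits>`; treating `int1`, `int2`, `int4` and `int8` as aliases of `std::int8_t` through `std::int64_t` is an assumption of this sketch, not a copy of FastDB's own typedefs.

```cpp
#include <cstdint>
#include <limits>

// Assumed equivalents of FastDB's atomic integer types (sizes in bytes).
using int1 = std::int8_t;
using int2 = std::int16_t;
using int4 = std::int32_t;
using int8 = std::int64_t;

static_assert(std::numeric_limits<int1>::min() == -128, "int1 min");
static_assert(std::numeric_limits<int1>::max() == 127, "int1 max");
static_assert(std::numeric_limits<int2>::min() == -32768, "int2 min");
static_assert(std::numeric_limits<int2>::max() == 32767, "int2 max");
static_assert(std::numeric_limits<int4>::min() == -2147483647 - 1, "int4 min");
static_assert(std::numeric_limits<int4>::max() == 2147483647, "int4 max");
static_assert(std::numeric_limits<int8>::max() == INT64_MAX, "int8 max");
```

Note that the minimum of each type is one larger in magnitude than its maximum, which is why the int4 minimum is written as `-2147483647 - 1`.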

Unfortunately C++ provides no way to get metainformation about a class at runtime (RTTI is not supported by all compilers and also doesn't provide enough information). Therefore the programmer has to explicitly enumerate class fields to be included in the database table (it also makes mapping between classes and tables more flexible). FastDB provides a set of macros and classes to make such mapping as simple as possible.

Each C++ class or structure used in the database should contain a special method describing its fields. The macro TYPE_DESCRIPTOR(field_list) constructs this method. The single argument of this macro is a list of class field descriptors, enclosed in parentheses. If you want to define methods for the class and make them available to the database, use the macro CLASS_DESCRIPTOR(name, field_list) instead of TYPE_DESCRIPTOR. The class name is needed to get references to member functions.

The following macros can be used for the construction of field descriptors:

FIELD(name)
Non-indexed field with specified name.
KEY(name, index_type)
Indexed field. index_type should be a combination of the HASHED and INDEXED flags. When the HASHED flag is specified, FastDB creates a hash table for the table using this field as a key. When the INDEXED flag is specified, FastDB creates a T-tree (a special kind of balanced tree index) for the table using this field as a key.
SUPERCLASS(name)
Specifies information about the base class (parent) of the current class.
RELATION(reference, inverse_reference)
Specifies one-to-one, one-to-many or many-to-many relationships between classes (tables). Both reference and inverse_reference fields should be of reference or of array of reference type. inverse_reference is a field of the referenced table containing the inverse reference(s) to the current table. Inverse references are automatically updated by FastDB and are used for query optimization (see Inverse references).
METHOD(name)
Specifies a method of the class. The method should be a parameterless instance member function returning a boolean, numeric, reference or string type. Methods should be specified after all other attributes of the class.

Although only atomic fields can be indexed, an index type can be specified for structures. The index will be created for components of the structure only if such type of index is specified in the index type mask of the structure. This allows the programmers to enable or disable indices for structure fields depending on the role of the structure in the record.

The following example illustrates the creation of a type descriptor in the header file:

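#include <time.h>

class Contract;   // forward declarations: Detail and Contract refer to each other
class Supplier;

class dbDateTime { 
    int4 stamp;
  public:

    int year() { 
        time_t t = stamp;  // widen first: int4 is narrower than time_t on most platforms
        return localtime(&t)->tm_year + 1900;
    }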
    ...

    CLASS_DESCRIPTOR(dbDateTime, 
		     (KEY(stamp,INDEXED|HASHED), 
		      METHOD(year), METHOD(month), METHOD(day),
		      METHOD(dayOfYear), METHOD(dayOfWeek),
		      METHOD(hour), METHOD(minute), METHOD(second)));
};    

class Detail { 
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;

    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED), 
		     KEY(material, HASHED), 
		     KEY(color, HASHED),
		     KEY(weight, INDEXED),
		     RELATION(contracts, detail)));
};

class Contract { 
  public:
    dbDateTime            delivery;
    int4                  quantity;
    int8                  price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED), 
		     KEY(quantity, INDEXED), 
		     KEY(price, INDEXED),
		     RELATION(detail, contracts),
		     RELATION(supplier, contracts)));
};
Type descriptors should be defined for all classes used in the database. In addition to defining type descriptors, it is necessary to establish a mapping between C++ classes and database tables. The macro REGISTER(name) will do it. Unlike the TYPE_DESCRIPTOR macro, the REGISTER macro should be used in the implementation file and not in the header file. It constructs a descriptor of the table associated with the class. If you are going to work with multiple databases from one application, it is possible to register a table in a concrete database by means of the REGISTER_IN(name,database) macro. The parameter database of this macro should be a pointer to the dbDatabase object. You can register tables in the database as follows:

REGISTER(Detail);
REGISTER(Supplier);
REGISTER(Contract);
A table (and its corresponding class) can be used with only one database at a time. When you open a database, FastDB imports all classes defined in the application into the database. If a class with the same name already exists in the database, its descriptor stored in the database is compared with the descriptor of this class in the application. If the class definitions differ, FastDB tries to convert the records of the table to the new format. Any kind of conversion between numeric types (integer to real, real to integer, with extension or truncation) is allowed. Adding new fields is also handled easily. But fields can be removed only from empty tables (to avoid accidental data destruction).

After loading all class descriptors, FastDB checks whether all indices specified in the application class descriptors are already present in the database, constructs new indices and removes indices that are no longer used. Reformatting a table and adding or removing indices is only possible when no more than one application accesses the database. So when the first application attaches to the database, it can perform table conversion; all other applications can only add new classes to the database.

There is one special internal table, Metatable, which contains information about the other tables in the database. C++ programmers need not access this table, because the format of database tables is specified by C++ classes. But in an interactive SQL program, it may be necessary to examine this table to get information about record fields.

Query

The dbQuery class serves two purposes:
  1. to construct a query and bind query parameters
  2. to cache compiled queries
FastDB overloads the '=' and ',' C++ operators to construct query statements with parameters. Parameters can be specified directly where they are used, eliminating any mapping between parameter placeholders and C variables. In the following sample query, pointers to the parameters price and quantity are stored in the query, so the query can be executed several times with different parameter values. Overloaded C++ functions automatically determine the type of each parameter, requiring no extra information from the programmer (and thus reducing the possibility of a bug).
        dbQuery q;
        int price, quantity;
        q = "price >=",price,"or quantity >=",quantity;
Since the char* type can be used both for a fragment of the query text (such as "price >=") and for a parameter of string type, FastDB uses a special rule to resolve this ambiguity. The rule is based on the assumption that there is no reason to split query text into two strings like ("price ",">=") or to specify more than one parameter in a row ("color=",color,color). So FastDB treats the first string as a fragment of the query text and switches to operand mode after it. In operand mode, FastDB treats a char* argument as a query parameter and switches back to query text mode, and so on. It is also possible to skip this "syntactic sugar" and construct query elements explicitly with the dbQuery::append(dbQueryElement::ElementType type, void const* ptr) method. Before appending elements to the query, it is necessary to reset the query with the dbQuery::reset() method ('operator=' does this automatically).
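Both ideas, binding numeric parameters by address and alternating between text mode and operand mode, can be illustrated with a small standalone class that overloads `operator=` and `operator,` in the same way. `MiniQuery` and `render` are hypothetical and only mimic the behavior described above; a real query is compiled rather than rendered back to text. Note that `q = "a", x, "b"` works because assignment binds tighter than the comma, so the chain starts with `operator=` and continues with `operator,`.

```cpp
#include <string>
#include <vector>

class MiniQuery {
  public:
    // Assigning the first text fragment resets the query.
    MiniQuery& operator=(char const* text) {
        parts_.clear();
        addText(text);
        return *this;
    }

    // In text mode a string is query text; in operand mode it is a parameter.
    MiniQuery& operator,(char const* s) {
        if (operandMode_) {
            parts_.push_back({Part::StringParam, s, nullptr});
            operandMode_ = false;
        } else {
            addText(s);
        }
        return *this;
    }

    // Numeric parameters are bound by address, so literals are rejected.
    MiniQuery& operator,(int& value) {
        parts_.push_back({Part::IntParam, "", &value});
        operandMode_ = false;
        return *this;
    }

    // Produces the query text using the parameters' current values.
    std::string render() const {
        std::string out;
        for (Part const& p : parts_) {
            switch (p.kind) {
              case Part::Text:        out += p.text; break;
              case Part::StringParam: out += "'" + p.text + "'"; break;
              case Part::IntParam:    out += std::to_string(*p.intPtr); break;
            }
            out += ' ';
        }
        if (!out.empty()) out.pop_back();
        return out;
    }

  private:
    struct Part {
        enum Kind { Text, StringParam, IntParam } kind;
        std::string text;
        int* intPtr;
    };

    void addText(char const* text) {
        parts_.push_back({Part::Text, text, nullptr});
        operandMode_ = true;  // the next argument is an operand
    }

    std::vector<Part> parts_;
    bool operandMode_ = false;
};
```

After `q = "price >=", price, "or quantity >=", quantity;`, changing `price` and rendering again reflects the new value without rebuilding the query, which is the property the original dbQuery relies on.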

Numeric C++ constants cannot be used as query parameters, because parameters are accessed by reference. String constants can be used, however, because strings are passed by value. There are two ways to specify string parameters in a query: using a string buffer or a pointer to a pointer to a string:

     dbQuery q;
     char const* type;   // string literals are const in C++
     char name[256];
     q = "name=",name,"and type=",&type;

     scanf("%255s", name);   // bounded read into the 256-byte buffer
     type = "A";     
     cursor.select(q);
     ...
     scanf("%255s", name);
     type = "B";     
     cursor.select(q);
     ...

Query variables can neither be passed to a function as a parameter nor be assigned to another variable. When FastDB compiles a query, it saves the compiled tree in the query object. The next time the query is used, no compilation is needed and the already compiled tree is reused. This saves the time otherwise spent on query compilation.

FastDB provides two approaches to integrating user-defined types into databases. The first, defining class methods, was already mentioned. The other approach deals only with query construction: the programmer defines methods that do not perform the actual calculation but instead return an expression (in terms of predefined database types) that performs it. This is best described by example. FastDB has no built-in datetime type. Instead, a normal C++ class dbDateTime can be used by the programmer. This class defines methods that allow datetime fields to be used in order by lists and two dates to be compared using the normal relational operators:

class dbDateTime { 
    int4 stamp;
  public:
    ...
    dbQueryExpression operator == (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"=",stamp;
	return expr;
    }
    dbQueryExpression operator != (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<>",stamp;
	return expr;
    }
    // The dbDateTime object is the left operand and the field name the right one,
    // so "d < field" must become "field.stamp > d.stamp", and so on.
    dbQueryExpression operator < (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">",stamp;
	return expr;
    }
    dbQueryExpression operator <= (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">=",stamp;
	return expr;
    }
    dbQueryExpression operator > (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<",stamp;
	return expr;
    }
    dbQueryExpression operator >= (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<=",stamp;
	return expr;
    }
    friend dbQueryExpression between(char const* field, dbDateTime& from,
				     dbDateTime& till)
    { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"between",from.stamp,"and",till.stamp;
	return expr;
    }

    friend dbQueryExpression ascent(char const* field) { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp");
	return expr;
    }	
    friend dbQueryExpression descent(char const* field) { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"desc";
	return expr;
    }	
};
All these methods receive the name of a record field as their parameter. This name is used to construct the full name of the record's component. This is done by the class dbComponent, whose constructor takes the name of the structure field and the name of the structure's component and returns a compound name separated by a '.' symbol. The class dbQueryExpression is used to collect expression items. The expression is automatically enclosed in parentheses, eliminating conflicts with operator precedence.

So, assuming a record containing a field delivery of dbDateTime type, it is possible to construct queries like these:

        dbDateTime from, till;
        q1 = between("delivery", from, till),"order by",ascent("delivery");
        q2 = till >= "delivery"; 
In addition to these methods, other class-specific methods can be defined in the same way, for example an overlaps method for a region type. The benefit of this approach is that the database engine works with predefined types and can apply indices and other optimizations to process such queries. At the same time, the encapsulation of the class implementation is preserved, so programmers do not need to rewrite all queries when the class representation changes.
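The string composition performed by dbComponent and the automatic parenthesization done by dbQueryExpression can be mimicked with plain `std::string` helpers. In this sketch, `component`, `parenthesize` and `stampBetween` are hypothetical names, and the parameter values are spliced in as text for readability, whereas the real classes bind them as query parameters.

```cpp
#include <string>

// Joins a structure field name and one of its components with '.'.
std::string component(std::string const& field, std::string const& member) {
    return field + "." + member;
}

// Wraps an expression in parentheses so it composes safely with
// surrounding operators, whatever their precedence.
std::string parenthesize(std::string const& expr) {
    return "(" + expr + ")";
}

// Rough textual analogue of between("delivery", from, till).
std::string stampBetween(char const* field, long from, long till) {
    return parenthesize(component(field, "stamp") + " between " +
                        std::to_string(from) + " and " + std::to_string(till));
}
```

The generated condition refers only to the integer stamp component, which is what lets the engine use the index declared on that field.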

Variables of the following C++ types can be used as query parameters:

int1      bool
int2      char const*
int4      char **
int8      char const**
real4     dbReference<T>
real8     dbArray< dbReference<T> >

Cursor

Cursors are used to access records returned by a select statement. FastDB provides typed cursors, i.e. cursors associated with concrete tables. There are two kinds of cursors in FastDB: readonly cursors and cursors for update. Cursors in FastDB are represented by the C++ template class dbCursor<T>, where T is the name of a C++ class associated with the database table. The cursor type should be specified in the constructor of the cursor. By default, a read-only cursor is created. To create a cursor for update, you should pass a parameter dbCursorForUpdate to the constructor.

A query is executed either by the cursor method select(dbQuery& q) or by the select() method, which can be used to iterate through all records in the table. Both methods return the number of selected records and set the current position to the first record (if available). A cursor can be scrolled forward or backward. The methods next(), prev(), first() and last() change the current position of the cursor. If the operation cannot be performed because there are no (more) records available, these methods return NULL and the cursor position is not changed.
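The positioning rules can be modeled with a toy cursor over an in-memory vector. `ToyCursor` and its members are hypothetical; they only mirror the documented behavior of select(), next(), prev(), first() and last(): navigation returns a null pointer when no record is available and leaves the position unchanged.

```cpp
#include <cstddef>
#include <vector>

template <class T>
class ToyCursor {
  public:
    // The cursor refers to the table, so it must not outlive it.
    explicit ToyCursor(std::vector<T> const& table) : table_(table) {}

    // Selects all records; returns their number and moves to the first one.
    std::size_t select() {
        pos_ = 0;
        return table_.size();
    }

    T const* get() const { return table_.empty() ? nullptr : &table_[pos_]; }

    T const* next() {
        if (pos_ + 1 >= table_.size()) return nullptr;  // position unchanged
        return &table_[++pos_];
    }

    T const* prev() {
        if (table_.empty() || pos_ == 0) return nullptr;  // position unchanged
        return &table_[--pos_];
    }

    T const* first() {
        if (table_.empty()) return nullptr;
        pos_ = 0;
        return &table_[pos_];
    }

    T const* last() {
        if (table_.empty()) return nullptr;
        pos_ = table_.size() - 1;
        return &table_[pos_];
    }

  private:
    std::vector<T> const& table_;
    std::size_t pos_ = 0;
};
```

With these rules, the loop `if (cur.select() != 0) do { ... } while (cur.next());` from the C++ interface example visits every selected record exactly once.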

A cursor for class T contains an instance of class T, which is used for fetching the current record. That is why table classes must have a default constructor (a constructor without parameters) that has no side effects. FastDB optimizes fetching records from the database by copying only the data from the fixed parts of the object. String bodies are not copied; instead, the corresponding field points directly into the database. The same is true for arrays, whose components have the same representation in the database as in the application (arrays of scalar types or arrays of nested structures of scalar components).

An application should not change elements of strings and arrays in the database directly. When an array method needs to update an array body, it creates an in-memory copy of the array and updates this copy. To update a string field, the programmer should assign a new value to the pointer instead of changing the string in place in the database. It is recommended to use the char const* type instead of char* for string components, so that the compiler can detect illegal modifications of strings.
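Here is a minimal sketch of the recommended update style. It assumes a hypothetical Person record type, and databaseString merely stands in for string bytes that live in the mapped database. Declaring the field as char const* lets the compiler reject writes through it. The only way to change the value is therefore to point the field at a different string.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical record type; the string field is char const*,
// so writing through it is a compile-time error.
struct Person {
    char const* name;
    int age;
};

// Stands in for string bytes stored inside the mapped database file.
static char const databaseString[] = "Alice";

// Correct update: point the field at a new string instead of editing the old bytes.
inline void renamePerson(Person& p, char const* newName) {
    p.name = newName;      // allowed: only the pointer changes
    // p.name[0] = 'X';    // would not compile: p.name points to const char
}
```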

The cursor class provides the get() method for obtaining a pointer to the current record (stored inside the cursor). The overloaded operator-> can also be used to access components of the current record. If a cursor is opened for update, the current record can be changed and stored in the database by the update() method, or it can be removed. If the current record is removed, the next record becomes current; if there is no next record, the previous record (if it exists) becomes current. The method removeAll() removes all records in the table, whereas the method removeAllSelected() removes only the records selected by the cursor.
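The repositioning rule after a removal can be written down as a small function. This is a hypothetical model over a std::vector of record ids, not FastDB's implementation. It returns the new current position, or nothing when the selection has become empty.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: after removing the record at `pos`, the next record
// becomes current; if there is none, the previous one; otherwise no record.
inline std::optional<std::size_t> removeCurrent(std::vector<int>& selection, std::size_t pos) {
    assert(pos < selection.size());
    selection.erase(selection.begin() + static_cast<std::ptrdiff_t>(pos));
    if (pos < selection.size()) return pos;  // the former next record now sits at pos
    if (pos > 0) return pos - 1;             // no next record: fall back to the previous one
    return std::nullopt;                     // the selection is now empty
}
```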

When records are updated, the database may grow, so the database section in virtual memory has to be extended. As a result of this remapping, the base address of the section can change, and all pointers to database fields kept by the application become invalid. FastDB automatically updates the current records in all open cursors when the database section is remapped. So, while the database is being updated, the programmer should access record fields only through the cursor's operator-> and should not keep pointers to them in variables.
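An analogy from the standard library shows why saved pointers are dangerous (this is not FastDB code). A std::vector that grows may move its elements, much as a remapped section may get a new base address. The hypothetical Section type below keeps an offset and resolves it on every access, so it stays correct after relocation. An int* taken before extend() could dangle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analogy only: growing the vector may relocate its storage, like remapping
// a database section. Offsets survive relocation; raw pointers may not.
struct Section {
    std::vector<int> storage;

    // Resolve an offset against the current base address on every access.
    int& at(std::size_t offset) { return storage[offset]; }

    // Grow the section; this may reallocate and move every element.
    void extend(std::size_t extra) { storage.resize(storage.size() + extra); }
};
```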

Memory used for the current selection can be released by the reset() method. This method is automatically called by the select(), dbDatabase::commit(), dbDatabase::rollback() methods and the cursor destructor, so in most cases there is no need to call the reset() method explicitly.

Cursors can also be used to access records by reference. The method at(dbReference const& ref) sets the cursor to the record pointed to by the reference. In this case, the selection consists of exactly one record, and the next() and prev() methods always return NULL. Since cursors and references in FastDB are strictly typed, all necessary checks can be done statically by the compiler and no dynamic type checking is needed. The only check done at runtime is the check for null references. The object identifier of the current record in the cursor can be obtained by the currentId() method.

It is possible to restrict the number of records returned by a select statement. The cursor class has two methods, setSelectionLimit(size_t lim) and unsetSelectionLimit(), which set and remove the limit on the number of records returned by the query. In some situations a programmer wants only one record or only the first few records. Limiting the size of the selection then reduces both query execution time and memory consumption. Note, however, that if you also specify an order for the selected records, a query restricted to k records will not return the k records with the smallest key values. Instead, an arbitrary k records are selected and then sorted.
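A small example makes the difference between a limited, ordered selection and a true top-k query easy to see. In this sketch, limitThenSort models the documented behavior. For concreteness, it assumes that the "arbitrary" k records are the first k in table order. trueTopK shows what one might mistakenly expect. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Documented behavior: take k records (here: the first k in table order), then sort them.
inline std::vector<int> limitThenSort(std::vector<int> const& tableOrder, std::size_t k) {
    std::size_t n = std::min(k, tableOrder.size());
    std::vector<int> picked(tableOrder.begin(), tableOrder.begin() + static_cast<std::ptrdiff_t>(n));
    std::sort(picked.begin(), picked.end());
    return picked;
}

// Not what a limited selection returns: the k smallest keys of the whole table.
inline std::vector<int> trueTopK(std::vector<int> tableOrder, std::size_t k) {
    std::sort(tableOrder.begin(), tableOrder.end());
    tableOrder.resize(std::min(k, tableOrder.size()));
    return tableOrder;
}
```

With keys stored in the order 50, 40, 10, 30, 20 and k = 2, the first function yields 40, 50 while the second yields 10, 20.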

So all operations with database data can be performed by means of cursors. The only exception is the insert operation, for which FastDB provides an overloaded insert function:

        template<class T>
        dbReference<T> insert(T const& record);
This function inserts a record at the end of the table and returns a reference to the created object. The order of insertion is strictly specified in FastDB, and applications can rely on this record order in the table. Applications that make heavy use of references for navigation between objects need some root object from which traversal by references can start. A good candidate for such a root object is the first record in the table (which is also the oldest record). This record can be accessed by executing the select() method without parameters: the current record in the cursor will then be the first record in the table.

The C++ API of FastDB defines a special null variable of reference type. It is possible to compare the null variable with references or assign it to the reference:

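        void update(dbReference<Contract> c) {
            if (c != null) {
                dbCursor<Contract> contract(dbCursorForUpdate);
                contract.at(c);              // select exactly the record referenced by c
                contract->supplier = null;   // change the copy held by the cursor
                contract.update();           // store the modified record in the database
            }
        }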

Database

The class dbDatabase controls the application's interactions with the database. It handles synchronization of concurrent access to the database, transaction management, memory allocation and error handling.

The constructor of dbDatabase objects allows programmers to specify some database parameters:

    dbDatabase(dbAccessType type = dbAllAccess,
	       size_t dbInitSize = dbDefaultInitDatabaseSize,
	       size_t dbExtensionQuantum = dbDefaultExtensionQuantum,
	       size_t dbInitIndexSize = dbDefaultInitIndexSize,
	       int nThreads = 1);
A database can be opened in read-only mode (dbDatabase::dbReadOnly access type) or in normal mode, which allows modification of the database (dbDatabase::dbAllAccess). When the database is opened in read-only mode, no new class definitions can be added to the database, and definitions of existing classes and indices cannot be altered.

The parameter dbInitSize specifies the initial size of the database file. The database file increases on demand; setting the initial size can only reduce the number of reallocations (which can take a lot of time). In the current implementation of the FastDB database the size is at least doubled at each extension. The default value of this parameter is 4 megabytes.
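Because the file is at least doubled on each extension, the number of extensions grows only logarithmically with the final size. The hypothetical helper below counts extensions under the simplifying assumption that each one exactly doubles the size. That gives an upper bound: growing from the 4 MB default to 1 GB takes at most 8 extensions (4, 8, 16, ..., 1024 MB).

```cpp
#include <cassert>
#include <cstdint>

// Number of doublings needed to grow from `initial` to at least `required` bytes.
// Since each real extension at least doubles the size, this is an upper bound.
inline int extensionsNeeded(std::uint64_t initial, std::uint64_t required) {
    assert(initial > 0);
    int count = 0;
    for (std::uint64_t size = initial; size < required; size *= 2) {
        ++count;
    }
    return count;
}
```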

The parameter dbExtensionQuantum specifies the quantum of extension of the memory allocation bitmap. Briefly, this value specifies how much memory will be allocated sequentially without attempting to reuse the space of deallocated objects. The default value of this parameter is 4 MB. See section Memory allocation for more details.

The parameter dbInitIndexSize specifies the initial index size. All objects in FastDB are accessed through an object index. There are two copies of this object index: current and committed. Object indices are reallocated on demand, so the initial index size only affects how many reallocations are needed. The default value of this parameter is 64K object identifiers.

The last parameter, nThreads, controls the level of query parallelization. If it is greater than 1, FastDB can execute some queries in parallel (including sorting the result), spawning the specified number of threads. Usually it does not make sense to set this parameter higher than the number of online CPUs in the system. It is also possible to pass zero, in which case FastDB automatically detects the number of online CPUs. The number of threads can also be changed at any time with the dbDatabase::setConcurrency method.

The class dbDatabase contains a static field dbParallelScanThreshold, which specifies a threshold for the number of records in the table after which query parallelization is used. The default value of this parameter is 1000.
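The rules for nThreads and dbParallelScanThreshold can be summarized as two small predicates. This is a hypothetical sketch, not FastDB's internal logic. It resolves 0 to the number of CPUs reported by std::thread::hardware_concurrency(), falling back to 1 when that number is unknown. It also assumes that a parallel scan is considered only when the table has more records than the threshold and more than one thread is available; the exact comparison at the boundary is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical: nThreads == 0 means "use the number of online CPUs".
inline unsigned effectiveThreads(int nThreads) {
    if (nThreads > 0) return static_cast<unsigned>(nThreads);
    unsigned cpus = std::thread::hardware_concurrency();  // 0 if it cannot be determined
    return cpus > 0 ? cpus : 1;
}

// Hypothetical: parallelize only large tables (threshold defaults to 1000, as in the text)
// and only when more than one thread is available.
inline bool useParallelScan(std::size_t records, unsigned threads, std::size_t threshold = 1000) {
    return threads > 1 && records > threshold;
}
```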

The database is opened by the open(char const* databaseName, char const* fileName = NULL) method. If the file name parameter is omitted, it is constructed from the database name by appending the ".fdb" suffix. The database name can be an arbitrary identifier consisting of any characters except '\'. The open method returns true if the database was successfully opened, or false if the open operation failed. In the latter case, the database's handleError method is called with the DatabaseOpenError error code. A database session is terminated by the close method, which implicitly commits the current transaction.
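The default file-name rule and the naming restriction are simple enough to state as code. Both helpers below are hypothetical and only mirror the description of open(); they are not part of FastDB's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical: an explicit file name wins; otherwise append ".fdb" to the database name.
inline std::string databaseFileName(std::string const& databaseName, char const* fileName = nullptr) {
    return fileName != nullptr ? std::string(fileName) : databaseName + ".fdb";
}

// Hypothetical: database names may contain any character except a backslash.
inline bool isValidDatabaseName(std::string const& name) {
    return name.find('\\') == std::string::npos;
}
```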

In a multithreaded application, each thread that wants to access the database must first be attached to it. The method dbDatabase::attach() allocates thread-specific data and attaches the thread to the database. This method is called automatically by the open() method, so there is no need to call attach() for the thread that opens the database. When a thread finishes working with the database, it should call the dbDatabase::detach() method. The close method invokes detach() automatically. The method detach() implicitly commits the current transaction. An attempt to access the database from a detached thread causes an assertion failure.
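Every attached thread must eventually call detach(), so a scope guard is a natural way to pair the two calls in worker threads. ThreadAttachment is a hypothetical helper, not part of FastDB. It only assumes that the database type provides attach() and detach() as described above.

```cpp
#include <cassert>

// Hypothetical RAII helper: attach in the constructor, detach in the destructor,
// so early returns or exceptions cannot skip detach().
template <class Db>
class ThreadAttachment {
  public:
    explicit ThreadAttachment(Db& database) : db(database) { db.attach(); }
    ~ThreadAttachment() { db.detach(); }
    ThreadAttachment(ThreadAttachment const&) = delete;
    ThreadAttachment& operator=(ThreadAttachment const&) = delete;

  private:
    Db& db;
};
```

In a worker thread, one would write something like ThreadAttachment<dbDatabase> attachment(db); at the top of the thread function. Since detach() commits the current transaction, the guard's destructor commits as well.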

FastDB can compile and execute queries in parallel, which significantly increases performance on multiprocessor systems. Concurrent updates of the database, however, are not possible; this is the price of the efficient log-less transaction mechanism and zero-time recovery. When an application wants to modify the database (open a cursor for update or insert a new record into a table), it first locks the database in exclusive mode. This prohibits access to the database by other applications, even for read-only queries. To avoid blocking other database applications for a long time, a modification transaction should therefore be as short as possible. No blocking operations (such as waiting for user input) should be performed within such a transaction.

Using only shared and exclusive database-level locks allows FastDB to almost eliminate locking overhead and to optimize the execution speed of non-conflicting operations. But if many applications simultaneously update different parts of the database, this approach is very inefficient. That is why FastDB is best suited to a single-application access model or to multiple applications with a read-dominated access pattern.

Cursor and query objects should each be used by only one thread in a multithreaded application. If your application has more than one thread, use local variables for cursor and query objects in each thread. The dbDatabase object is shared between all threads and uses thread-specific data to perform query compilation and execution in parallel with minimal synchronization overhead. Only a few global structures require synchronization (the symbol table, the pool of tree nodes, ...). Scanning, parsing and executing a query can be done without any synchronization, which provides a high level of concurrency on multiprocessor systems.

A database transaction is started by the first select or insert operation. If a cursor for update is used, the database is locked in exclusive mode, prohibiting access to the database by other applications and threads. If a read-only cursor is used, the database is locked in shared mode. This prevents other applications and threads from modifying the database but still allows concurrent read requests. A transaction should be terminated explicitly, either by the dbDatabase::commit() method, which makes all changes done by the transaction permanent, or by the dbDatabase::rollback() method, which undoes all modifications done by the transaction. The method dbDatabase::close() automatically commits the current transaction.
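A transaction must be ended explicitly, so forgetting commit() or rollback() on an error path leaves the database locked. A common C++ remedy is a guard that rolls back unless commit() was reached. Transaction is a hypothetical helper, not part of FastDB. It only assumes commit() and rollback() methods as described above, plus the further assumption that calling rollback() after no modifications is harmless. It does not start a transaction itself, because in FastDB the first select or insert does that.

```cpp
#include <cassert>

// Hypothetical RAII helper: roll back on scope exit unless commit() was called.
template <class Db>
class Transaction {
  public:
    explicit Transaction(Db& database) : db(database) {}
    ~Transaction() {
        if (!done) db.rollback();  // error path or early return: undo the changes
    }
    void commit() {
        db.commit();
        done = true;
    }
    Transaction(Transaction const&) = delete;
    Transaction& operator=(Transaction const&) = delete;

  private:
    Db& db;
    bool done = false;
};
```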

Suppose you start a transaction with a selection through a read-only cursor and then use a cursor for update to modify the database. The database is then first locked in shared mode, and the lock is later upgraded to exclusive mode. This can cause a deadlock if several applications access the database simultaneously. Imagine that application A starts a read transaction and application B also starts a read transaction. Both of them hold shared locks on the database. If both of them want to upgrade their locks to exclusive mode, they will block each other forever (an exclusive lock cannot be granted while another process holds a shared lock). To avoid such situations, use a cursor for update at the beginning of the transaction, or explicitly call the dbDatabase::lock() method. More information about the implementation of transactions in FastDB can be found in section Transactions.
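The same rule, taking the exclusive lock first instead of upgrading, can be shown with std::shared_mutex from C++17. This is an analogy rather than FastDB's locking code. The hypothetical Counter class reads under a shared lock, but its read-modify-write step acquires the exclusive lock from the start. std::shared_mutex offers no upgrade operation at all, which avoids the deadlock described above by construction.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Analogy only: readers share the lock; a read-modify-write step takes
// the exclusive lock up front instead of upgrading a shared one.
class Counter {
  public:
    int read() const {
        std::shared_lock<std::shared_mutex> lock(mutex);  // many readers at once
        return value;
    }
    void increment() {
        std::unique_lock<std::shared_mutex> lock(mutex);  // exclusive from the start
        int current = value;                              // read ...
        value = current + 1;                              // ... then modify, no upgrade
    }

  private:
    mutable std::shared_mutex mutex;
    int value = 0;
};
```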

It is possible to explicitly lock the database by the lock() method. Locking is usually done automatically - there are only few cases when you will want to use this method. It will lock the database in exclusive mode until the end of the current transaction.

A backup of the database can be done by the dbDatabase::backup(char const* file) method. A backup locks the database in shared mode and flushes an image of the database from main memory to the specified file. Because of using a shadow object index, the database file is always in a consistent state, so recovery from the backup can be performed just by renaming the backup file (if backup was performed on tape, it should be first restored to the disk).

The class dbDatabase is also responsible for handling various application errors, such as syntax errors during query compilation, out of range index or null reference access during query execution. There is a virtual method dbDatabase::handleError, which handles these errors:

        virtual void handleError(dbErrorClass error, 
                                 char const*  msg = NULL, 
                                 int          arg = 0);
A programmer can derive her/his own subclass from the dbDatabase class and redefine the default reaction on errors.

Error classes and default handling
        Class                  Description                                            Argument                    Default reaction
        QueryError             query compilation error                                position in query string    abort compilation
        ArithmeticError        arithmetic error during division or power operations   -                           terminate application
        IndexOutOfRangeError   index is out of array bounds                           value of index              terminate application
        DatabaseOpenError      error while opening the database                       -                           open method returns false
        FileError              failure of file I/O operation                          error code                  terminate application
        OutOfMemoryError       not enough memory for object allocation                requested allocation size   terminate application
        Deadlock               upgrading a lock causes deadlock                       -                           terminate application
        NullReferenceError     null reference is accessed during query execution      -                           terminate application

Query optimization

When all data is present in memory, query execution is very fast compared with query execution in a traditional RDBMS. FastDB increases query speed even further by applying several optimizations: indices, inverse references and query parallelization. The following sections describe these optimizations in more detail.

Using indices in queries

Indices are the traditional approach to increasing RDBMS performance. FastDB uses two types of indices: extensible hash tables and T-trees. A hash table provides the fastest way (constant time on average) to access a record with a specified key value. The T-tree, a combination of an AVL tree and an array, plays the same role for a main-memory DBMS as the B-tree does for a traditional RDBMS. It provides search, insertion and deletion with guaranteed logarithmic complexity: a search, insert or delete operation on a table with N records takes C*log2(N) time, where C is some constant. The T-tree is better suited to a main-memory DBMS than the B-tree, because the B-tree tries to minimize the number of page loads (an expensive operation in disk-based databases), while the T-tree tries to minimize the number of compare/move operations. The T-tree is the best choice for range operations or when the order of records is significant.
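The standard containers offer an analogy for the division of labor between the two index kinds, although they are not FastDB's index classes. std::unordered_map behaves like the hash index for exact matches. std::map, a balanced binary search tree, behaves like the T-tree for range queries. The helper names and the RecordId alias are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using RecordId = int;  // hypothetical record identifier

// Tree index stand-in: O(log N) positioning, then an ordered walk over [low, high].
inline std::vector<RecordId> rangeLookup(std::map<int, RecordId> const& tree, int low, int high) {
    std::vector<RecordId> result;
    if (low > high) return result;
    auto end = tree.upper_bound(high);
    for (auto it = tree.lower_bound(low); it != end; ++it) {
        result.push_back(it->second);
    }
    return result;
}

// Hash index stand-in: average O(1) exact-match lookup; -1 means "no record".
inline RecordId exactLookup(std::unordered_map<std::string, RecordId> const& hash,
                            std::string const& key) {
    auto it = hash.find(key);
    return it == hash.end() ? -1 : it->second;
}
```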

FastDB uses simple rules for applying indices, which allow a programmer to predict whether an index will be used and which one. The check for index applicability is done at each query execution, so the decision can depend on the values of the operands. The following rules describe how FastDB applies indices:

Now we should make clear what the phrase "index is compatible with operation" means and which type of index is used in each case. A hash table can be used when:

A T-tree index can be applied if a hash table is not applicable (or a field is not hashed) and:

If an index is used to search the prefix of a like expression, and the suffix is not just the '%' character, then an index search operation can return more records than really match the pattern. In this case we should filter the index search output by applying a pattern match operation.
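The prefix-then-filter strategy can be sketched with an ordered container standing in for the T-tree. In this hypothetical example, likeSearch takes the literal prefix of the pattern (everything before the first '%' or '_'). It scans only the index range that starts with that prefix and applies a minimal LIKE matcher to each candidate. Escape characters are not handled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Minimal LIKE matcher: '%' matches any run of characters, '_' exactly one.
inline bool likeMatch(char const* s, char const* p) {
    if (*p == '\0') return *s == '\0';
    if (*p == '%') {
        for (;; ++s) {  // try every length for the '%' run, including zero
            if (likeMatch(s, p + 1)) return true;
            if (*s == '\0') return false;
        }
    }
    if (*s == '\0') return false;
    return (*p == '_' || *p == *s) && likeMatch(s + 1, p + 1);
}

// Narrow the ordered index with the literal prefix, then filter with the full pattern.
inline std::vector<std::string> likeSearch(std::map<std::string, int> const& index,
                                           std::string const& pattern) {
    std::string prefix = pattern.substr(0, pattern.find_first_of("%_"));
    std::vector<std::string> result;
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
        if (likeMatch(it->first.c_str(), pattern.c_str())) result.push_back(it->first);
    }
    return result;
}
```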

When the search condition is a disjunction of several subexpressions (the expression contains several alternatives combined by the or operator), then several indices can be used for the query execution. To avoid record duplicates in this case, a bitmap is used in the cursor to mark records already selected.
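The de-duplication step can be sketched as follows, assuming record ids are small dense integers so that a std::vector<bool> can serve as the bitmap. mergeAlternatives is a hypothetical name, not FastDB's API. Each alternative's index results are appended to the selection only if the record's bit is not yet set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge the results of several index searches (one per OR alternative),
// using a bitmap so that each record appears in the selection only once.
inline std::vector<std::size_t> mergeAlternatives(
        std::vector<std::vector<std::size_t>> const& indexResults, std::size_t maxId) {
    std::vector<bool> selected(maxId, false);
    std::vector<std::size_t> selection;
    for (auto const& hits : indexResults) {
        for (std::size_t id : hits) {
            assert(id < maxId);
            if (!selected[id]) {
                selected[id] = true;
                selection.push_back(id);
            }
        }
    }
    return selection;
}
```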

If the search condition requires a sequential table scan, a T-tree index can still be used if the order by clause contains a single record field for which a T-tree index is defined. Since sorting is a very expensive operation, using an index instead of sorting significantly reduces query execution time.
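The saving comes from the fact that walking an ordered index already yields records in key order. In this hypothetical sketch, std::multimap stands in for the T-tree on the order by field. The filter is applied during the walk, and the output needs no sort call.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Walk the ordered index (keys may repeat) and keep the records that pass the filter;
// the result is already ordered by the index key, so no sort is needed.
inline std::vector<std::string> scanInIndexOrder(std::multimap<int, std::string> const& priceIndex,
                                                 bool (*predicate)(std::string const&)) {
    std::vector<std::string> result;
    for (auto const& entry : priceIndex) {
        if (predicate(entry.second)) result.push_back(entry.second);
    }
    return result;
}
```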

It is possible to check which indices are used for query execution, and how many probes are made during an index search, by compiling FastDB with the option -DDEBUG=DEBUG_TRACE. In this case, FastDB dumps trace information about database operations, including information about indices.

Inverse references

Inverse references provide an efficient and reliable way to establish relations between tables. FastDB uses information about inverse references when a record is inserted/updated/deleted and also for query optimization. Relations between records can be of one of the following types: one-to-one, one-to-many and many-to-many.