FastDB supports transactions, online backup and automatic recovery after system crash. The transaction commit protocol is based on a shadow root pages algorithm, performing atomic update of the database. Recovery can be done very fast, providing high availability for critical applications. Moreover, the elimination of transaction logs improves the total system performance and leads to a more effective usage of system resources.
FastDB is an application-oriented database. Database tables are constructed using information about application classes. FastDB supports automatic schema evolution, allowing you to make changes in only one place: your application classes. FastDB provides a flexible and convenient interface for retrieving data from the database. A SQL-like query language is used to specify queries. Post-relational capabilities such as non-atomic fields, nested arrays, user-defined types and methods, and direct interobject references simplify the design of database applications and make them more efficient.
Although FastDB is optimized under the assumption that the database as a whole fits into the physical memory of the computer, it can also be used with databases whose size exceeds the physical memory of the system. In that case, the standard operating system swapping mechanisms will work. However, all FastDB search algorithms and structures are optimized under the assumption that all data resides in memory, so the efficiency for swapped-out data will not be very high.
start from ... follow by: performs a recursive traversal of records using references.
The following rules in BNF-like notation specify the grammar of the FastDB query language search predicates:
Example | Meaning |
---|---|
expression | non-terminals |
not | terminals |
| | disjoint alternatives |
(not) | optional part |
{1..9} | repeat zero or more times |
    select-condition ::= ( expression ) ( traverse ) ( order )
    expression ::= disjunction
    disjunction ::= conjunction
            | conjunction or disjunction
    conjunction ::= comparison
            | comparison and conjunction
    comparison ::= operand = operand
            | operand != operand
            | operand <> operand
            | operand < operand
            | operand <= operand
            | operand > operand
            | operand >= operand
            | operand (not) like operand
            | operand (not) like operand escape string
            | operand (not) in operand
            | operand (not) in expressions-list
            | operand (not) between operand and operand
            | operand is (not) null
    operand ::= addition
    addition ::= multiplication
            | addition + multiplication
            | addition || multiplication
            | addition - multiplication
    multiplication ::= power
            | multiplication * power
            | multiplication / power
    power ::= term
            | term ^ power
    term ::= identifier | number | string
            | true | false | null
            | current | first | last
            | ( expression )
            | not comparison
            | - term
            | term [ expression ]
            | identifier . term
            | function term
            | exists identifier : term
    function ::= abs | length | lower | upper
            | integer | real | string
            | user-function
    string ::= ' { { any-character-except-quote } ('') } '
    expressions-list ::= ( expression { , expression } )
    order ::= order by sort-list
    sort-list ::= field-order { , field-order }
    field-order ::= field (asc | desc)
    field ::= identifier { . identifier }
    traverse ::= start from field ( follow by fields-list )
    fields-list ::= field { , field }
    user-function ::= identifier
Identifiers are case sensitive, begin with an a-z, A-Z, '_' or '$' character, contain only a-z, A-Z, 0-9, '_' or '$' characters, and do not duplicate a SQL reserved word.
abs | and | asc | between | by |
current | desc | escape | exists | false |
first | follow | from | in | integer |
is | length | like | last | lower |
not | null | or | real | start |
string | true | upper |
ANSI-standard comments may also be used. All characters after a double-hyphen up to the end of the line are ignored.
FastDB extends ANSI standard SQL operations by supporting bit manipulation operations. The and/or operators can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value with bits set by the bit-AND/bit-OR operation. Bit operations can be used for efficient implementation of small sets. FastDB also supports the raising-to-a-power operation (x^y) for integer and floating point types.
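The small-set idea can be sketched in plain C++. This is only an illustration of the bit-AND/bit-OR semantics described above, not FastDB code: the option constants and the helper names (containsAny, containsAll, setUnion, ipow) are hypothetical, and the query predicates quoted in the comments are assumed spellings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical small set: each member is one bit of an integer field.
constexpr std::int64_t OPT_AIR  = 1 << 0;
constexpr std::int64_t OPT_RAIL = 1 << 1;
constexpr std::int64_t OPT_ROAD = 1 << 2;

// Mirrors a predicate such as "(options and mask) <> 0" (assumed spelling).
bool containsAny(std::int64_t set, std::int64_t mask) {
    return (set & mask) != 0;
}

// Mirrors a predicate such as "(options and mask) = mask" (assumed spelling).
bool containsAll(std::int64_t set, std::int64_t mask) {
    return (set & mask) == mask;
}

// Mirrors "a or b" applied to integer operands: the union of two small sets.
std::int64_t setUnion(std::int64_t a, std::int64_t b) {
    return a | b;
}

// Integer analogue of x^y for non-negative exponents (square and multiply).
std::int64_t ipow(std::int64_t base, int exponent) {
    assert(exponent >= 0);
    std::int64_t result = 1;
    while (exponent > 0) {
        if (exponent & 1) result *= base;
        exponent >>= 1;
        if (exponent > 0) base *= base;
    }
    return result;
}
```

A set with up to 63 members fits into one int8 field this way, so membership tests cost a single integer operation.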
company.address.city
Structure fields can be indexed and used in an order by
specification. Structures can contain other structures as their components;
there are no limitations on the nesting level.
The programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such a method should have no arguments except a pointer to the object to which it belongs (the this pointer in C++), and should return an atomic value (of boolean, numeric, string or reference type). Also, the method should not change the object instance (immutable method). If the method returns a string, this string should be allocated using the new char operator, because it will be deleted after its value has been copied.

User-defined methods can thus be used to create virtual components: components which are not stored in the database, but instead are calculated using values of other components. For example, the FastDB dbDateTime type contains only integer timestamp components and such methods as dbDateTime::year(), dbDateTime::month()... So it is possible to specify queries like "delivery.year = 1999" in an application where the delivery record field has dbDateTime type. Methods are executed in the context of the application where they are defined, and are not available to other applications and interactive SQL.
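A virtual component of this kind can be sketched without FastDB. The structure below is a hypothetical stand-in for dbDateTime: it stores only a timestamp and derives the year and month on demand. It uses std::gmtime rather than localtime so that the result does not depend on the time zone; that choice is an assumption made for the sketch.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for a timestamp structure with virtual components.
struct DateTimeSketch {
    std::time_t stamp;  // the only stored component: seconds since the epoch

    // Virtual component: computed from stamp, never stored.
    int year() const {
        std::tm const* parts = std::gmtime(&stamp);
        assert(parts != nullptr);
        return parts->tm_year + 1900;
    }

    // Virtual component: month in the range 1..12.
    int month() const {
        std::tm const* parts = std::gmtime(&stamp);
        assert(parts != nullptr);
        return parts->tm_mon + 1;
    }
};

// Plain C++ analogue of the predicate "delivery.year = 1999".
bool deliveredIn(DateTimeSketch const& delivery, int year) {
    return delivery.year() == year;
}
```

Both methods take no arguments and do not modify the object, which matches the restrictions on methods stated above.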
The number of elements in an array can be obtained with the length() function.

Array elements can be accessed with the [] operator. If an index expression is out of array range, an exception will be raised.

The in operator can be used to check if an array contains a value specified by the left operand. This operation can be used only for arrays of atomic type: with boolean, numeric, reference or string components.
Iteration through array elements is performed with the exists operator. A variable specified after the exists keyword can be used as an index in arrays in the expression preceded by the exists quantifier. This index variable will iterate through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition

    exists i: (contract[i].company.location = 'US')

will select all details which are shipped by companies located in 'US', while the query

    not exists i: (contract[i].company.location = 'US')

will select all details which are shipped from companies outside 'US'.

Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops using the corresponding index variables. For example, the query

    exists column: (exists row: (matrix[column][row] = 0))

will select all records containing 0 in elements of a matrix field, which has type array of array of integer. This construction is equivalent to the following two nested loops:

    bool result = false;
    for (int column = 0; !result && column < matrix.length(); column++) {
        for (int row = 0; row < matrix[column].length(); row++) {
            if (matrix[column][row] == 0) {
                result = true;
                break;
            }
        }
    }

The order of using indices is essential! The result of executing the following query

    exists row: (exists column: (matrix[column][row] = 0))

will be completely different from the result of the previous query. In the latter case, the program simply hangs due to an infinite loop in case of empty matrices.
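The nested-loop equivalence can be checked with a small self-contained function. This is a sketch over std::vector, not FastDB's dbArray; the names Matrix and existsZero are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Plain C++ analogue of
//   exists column: (exists row: (matrix[column][row] = 0))
// The outer index selects a column array, the inner index an element of it.
bool existsZero(Matrix const& matrix) {
    for (std::size_t column = 0; column < matrix.size(); ++column) {
        for (std::size_t row = 0; row < matrix[column].size(); ++row) {
            if (matrix[column][row] == 0) {
                return true;  // leaves both loops at once
            }
        }
    }
    return false;  // every index ran out of range without a match
}
```

Because the inner bound depends on the outer index, swapping the two quantifiers changes which index is checked against which array length, which is why the order matters.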
char in C) and byte-by-byte comparison of strings ignoring locale settings.

The like operator can be used for matching a string with a pattern containing the special wildcard characters '%' and '_'. The character '_' matches any single character, while the character '%' matches zero or more characters. An extended form of the like operator together with the escape keyword can be used to handle the characters '%' and '_' in the pattern as normal characters if they are preceded by a special escape character, specified after the escape keyword.
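The wildcard rules can be illustrated with a short recursive matcher. This is a sketch of the semantics described above, not FastDB's implementation; the function name likeMatch and the convention that '\0' means "no escape character" are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of LIKE matching: '_' matches one character, '%' matches zero or
// more characters, and a character preceded by the escape character is
// compared literally. escape == '\0' means that no escape character is set.
bool likeMatch(const std::string& text, const std::string& pattern,
               char escape = '\0', std::size_t t = 0, std::size_t p = 0) {
    while (p < pattern.size()) {
        char c = pattern[p];
        if (escape != '\0' && c == escape && p + 1 < pattern.size()) {
            // Escaped character: must match the text literally.
            if (t >= text.size() || text[t] != pattern[p + 1]) return false;
            ++t;
            p += 2;
        } else if (c == '%') {
            // Try every possible length for the run consumed by '%'.
            for (std::size_t next = t; next <= text.size(); ++next) {
                if (likeMatch(text, pattern, escape, next, p + 1)) return true;
            }
            return false;
        } else if (c == '_') {
            if (t >= text.size()) return false;  // needs exactly one character
            ++t;
            ++p;
        } else {
            if (t >= text.size() || text[t] != c) return false;
            ++t;
            ++p;
        }
    }
    return t == text.size();  // the whole text must be consumed
}
```

For instance, the pattern 100#% with escape character '#' matches only the literal text 100%, whereas 100% without an escape matches any text starting with 100.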
It is possible to search for substrings within a string with the in operator. The expression ('blue' in color) will be true for all records whose color field contains 'blue'. If the length of the searched string is greater than some threshold value (currently 512), a Boyer-Moore substring search algorithm is used instead of a straightforward search implementation.

Strings can be concatenated by the + or || operators. The latter was added for compatibility with the ANSI SQL standard. Since FastDB does not support implicit conversion to string type in expressions, the semantics of the + operator can be redefined for strings.
    company.address.city = 'Chicago'

will access records referenced by the company component of a Contract record and extract the city component of the address field of the referenced record from the Supplier table.
References can be checked for null by the is null or is not null predicates. References can also be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception is raised by FastDB.

There is a special keyword current, which can be used during a table search to refer to the current record. Usually, the current keyword is used for comparing the current record identifier with other references or for locating it within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has the type dbArray< dbReference<Contract> >):

    current not in supplier.canceledContracts
FastDB provides special operators for the recursive traversal of records by references:

    start from root-references ( follow by list-of-reference-fields )

The first part of this construction is used to specify root objects. The nonterminal root-references should be a variable of reference or of array of reference type. The two special keywords first and last can be used here, locating the first/last record in the table respectively. If you want to check all records referenced by an array of references or a single reference field for some condition, then this construction can be used without the follow by part.

If you specify the follow by part, then FastDB will recursively traverse the table of records, starting from the root references and using the list-of-reference-fields for transition between records. The list-of-reference-fields should consist of fields of reference or of array of reference type. The traversal is done in depth-first top-left-right order (first we visit the parent node and then the siblings in left-to-right order). The recursion terminates when a null reference is accessed or an already visited record is referenced. For example, the following query will search a tree of records for those with weight larger than 1 in TLR order:

    "weight > 1 start from first follow by left, right"

For the following tree:

                        A:1.1
              B:2.0               C:1.5
         D:1.3     E:1.8     F:1.2     G:0.8

the result of the query execution will be:

    ('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
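The traversal order and the termination rules can be reproduced with a short standalone sketch. It is not FastDB code: the table is a std::vector, references are row indices with -1 standing in for null, and the names Node, traverse, selectHeavy and sampleTree are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory table row; left/right are row indices, -1 is null.
struct Node {
    std::string name;
    double weight;
    int left;
    int right;
};

// Depth-first traversal: visit the parent, then follow left, then right.
void traverse(const std::vector<Node>& table, int ref,
              std::set<int>& visited, std::vector<std::string>& selected) {
    if (ref < 0) return;                      // null reference ends recursion
    assert(ref < static_cast<int>(table.size()));
    if (!visited.insert(ref).second) return;  // already visited record
    const Node& node = table[ref];
    if (node.weight > 1) selected.push_back(node.name);  // "weight > 1"
    traverse(table, node.left, visited, selected);
    traverse(table, node.right, visited, selected);
}

// Analogue of "weight > 1 start from first follow by left, right".
std::vector<std::string> selectHeavy(const std::vector<Node>& table) {
    std::set<int> visited;
    std::vector<std::string> selected;
    if (!table.empty()) traverse(table, 0, visited, selected);
    return selected;
}

// The example tree from the text, stored with A as the first record.
std::vector<Node> sampleTree() {
    return {
        {"A", 1.1, 1, 2},   {"B", 2.0, 3, 4},   {"C", 1.5, 5, 6},
        {"D", 1.3, -1, -1}, {"E", 1.8, -1, -1}, {"F", 1.2, -1, -1},
        {"G", 0.8, -1, -1},
    };
}
```

The visited set is what makes the traversal safe on cyclic reference graphs: a record reached a second time is skipped instead of being expanded again.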
Name | Argument type | Return type | Description |
---|---|---|---|
abs | integer | integer | absolute value of the argument |
abs | real | real | absolute value of the argument |
integer | real | integer | conversion of real to integer |
length | array | integer | number of elements in array |
lower | string | string | lowercase string |
real | integer | real | conversion of integer to real |
string | integer | string | conversion of integer to string |
string | real | string | conversion of real to string |
upper | string | string | uppercase string |
A FastDB application can define its own functions. A function should have a single argument of int8, real8 or char const* type and return a value of bool, int8, real8 or char* type. User functions should be registered by the USER_FUNC(f) macro, which creates a static object of the dbUserFunction class, binding the function pointer and the function name. For example, the following statements make it possible to use the sin function in SQL statements:

    #include <math.h>
    ...
    USER_FUNC(sin);

Functions can be used only within the application where they are defined. Functions are not accessible from other applications and interactive SQL. If a function returns a string, the returned string should be copied into memory allocated with the new operator, because FastDB will deallocate it after copying the returned value.

In FastDB, the function argument can (but need not) be enclosed in parentheses. So both of the following expressions are valid:

    '$' + string(abs(x))
    length string y
    dbQuery q;
    dbCursor<Contract> contracts;
    dbCursor<Supplier> suppliers;
    int price, quantity;
    q = "(price >=",price,"or quantity >=",quantity,
        ") and delivery.year=1999";
    // input price and quantity values
    if (contracts.select(q) != 0) {
        do {
            printf("%s\n", suppliers.at(contracts->supplier)->company);
        } while (contracts.next());
    }
Type | Description |
---|---|
bool | boolean type (true,false ) |
int1 | one byte signed integer (-128..127) |
int2 | two bytes signed integer (-32768..32767) |
int4 | four bytes signed integer (-2147483648..2147483647) |
int8 | eight bytes signed integer (-2**63..2**63-1) |
real4 | four bytes ANSI floating point type |
real8 | eight bytes ANSI double precision floating point type |
char const* | zero terminated string |
dbReference<T> | reference to class T |
dbArray<T> | dynamic array of elements of type T |
In addition to types specified in the table above, FastDB records can also contain nested structures of these components. FastDB doesn't support unsigned types to simplify the query language, to eliminate bugs caused by signed/unsigned comparison and to reduce the size of the database engine.
Unfortunately C++ provides no way to get metainformation about a class at runtime (RTTI is not supported by all compilers and also doesn't provide enough information). Therefore the programmer has to explicitly enumerate class fields to be included in the database table (it also makes mapping between classes and tables more flexible). FastDB provides a set of macros and classes to make such mapping as simple as possible.
Each C++ class or structure that will be used in the database should contain a special method describing its fields. The macro TYPE_DESCRIPTOR(field_list) will construct this method. The single argument of this macro is a list of class field descriptors, enclosed in parentheses. If you want to define some methods for the class and make them available to the database, then the macro CLASS_DESCRIPTOR(name, field_list) should be used instead of TYPE_DESCRIPTOR. The class name is needed to get references to member functions.

The following macros can be used for the construction of field descriptors:

KEY(name, index_type): an indexed field. The index_type should be a combination of the HASHED and INDEXED flags. When the HASHED flag is specified, FastDB will create a hash table for the table using this field as a key. When the INDEXED flag is specified, FastDB will create a T-tree (a special kind of index) for the table using this field as a key.

RELATION(reference, inverse_reference): inverse_reference is a field of the referenced table containing the inverse reference(s) to the current table. Inverse references are automatically updated by FastDB and are used for query optimization (see Inverse references).
Although only atomic fields can be indexed, an index type can be specified for structures. The index will be created for components of the structure only if such type of index is specified in the index type mask of the structure. This allows the programmers to enable or disable indices for structure fields depending on the role of the structure in the record.
The following example illustrates the creation of a type descriptor in the header file:
    class dbDateTime {
        int4 stamp;
      public:
        int year() {
            return localtime((time_t*)&stamp)->tm_year + 1900;
        }
        ...

        CLASS_DESCRIPTOR(dbDateTime,
                         (KEY(stamp,INDEXED|HASHED),
                          METHOD(year), METHOD(month), METHOD(day),
                          METHOD(dayOfYear), METHOD(dayOfWeek),
                          METHOD(hour), METHOD(minute), METHOD(second)));
    };

    class Detail {
      public:
        char const* name;
        char const* material;
        char const* color;
        real4       weight;

        dbArray< dbReference<Contract> > contracts;

        TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED),
                         KEY(material, HASHED),
                         KEY(color, HASHED),
                         KEY(weight, INDEXED),
                         RELATION(contracts, detail)));
    };

    class Contract {
      public:
        dbDateTime            delivery;
        int4                  quantity;
        int8                  price;
        dbReference<Detail>   detail;
        dbReference<Supplier> supplier;

        TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED),
                         KEY(quantity, INDEXED),
                         KEY(price, INDEXED),
                         RELATION(detail, contracts),
                         RELATION(supplier, contracts)));
    };

Type descriptors should be defined for all classes used in the database. In addition to defining type descriptors, it is necessary to establish a mapping between C++ classes and database tables. The macro REGISTER(name) will do it. Unlike the TYPE_DESCRIPTOR macro, the REGISTER macro should be used in the implementation file and not in the header file. It constructs a descriptor of the table associated with the class. If you are going to work with multiple databases from one application, it is possible to register a table in a concrete database by means of the REGISTER_IN(name,database) macro. The parameter database of this macro should be a pointer to the dbDatabase object. You can register tables in the database as follows:

    REGISTER(Detail);
    REGISTER(Supplier);
    REGISTER(Contract);

A table (and the corresponding class) can be used with only one database at any moment of time. When you open a database, FastDB imports into the database all classes defined in the application. If a class with the same name already exists in the database, its descriptor stored in the database is compared with the descriptor of this class in the application. If the class definitions differ, FastDB tries to convert records from the table to the new format. Any kind of conversion between numeric types (integer to real, real to integer, with extension or truncation) is allowed. Also, addition of new fields can be easily handled. But removal of fields is only possible for empty tables (to avoid accidental data destruction).
After loading all class descriptors, FastDB checks whether all indices specified in the application class descriptors are already present in the database, constructs new indices and removes indices that are no longer used. Reformatting a table and adding/removing indices is only possible when no more than one application accesses the database. So when the first application attaches to the database, it can perform table conversion. All other applications can only add new classes to the database.
There is one special internal database table, Metatable, which contains information about other tables in the database. C++ programmers need not access this table, because the format of database tables is specified by C++ classes. But in an interactive SQL program, it may be necessary to examine this table to get information about record fields.
FastDB uses the overloaded 'operator=' and 'operator,' C++ operators to construct query statements with parameters. Parameters can be specified directly in places where they are used, eliminating any mapping between parameter placeholders and C variables. In the following sample query, pointers to the parameters price and quantity are stored in the query, so that the query can be executed several times with different parameter values. C++ overloaded functions make it possible to automatically determine the type of the parameter, requiring no extra information to be supplied by the programmer (thus reducing the possibility of a bug).
    dbQuery q;
    int price, quantity;
    q = "price >=",price,"or quantity >=",quantity;

Since the char* type can be used both for specifying a fragment of a query (such as "price >=") and for a parameter of string type, FastDB uses a special rule to resolve this ambiguity. This rule is based on the assumption that there is no reason for splitting a query text into two strings like ("price ",">=") or specifying more than one parameter sequentially ("color=",color,color). So FastDB assumes the first string to be a fragment of the query text and switches to operand mode after it. In operand mode, FastDB treats the char* argument as a query parameter and switches back to query text mode, and so on...

It is also possible not to use this "syntax sugar" and to construct query elements explicitly by the dbQuery::append(dbQueryElement::ElementType type, void const* ptr) method. Before appending elements to the query, it is necessary to reset the query by the dbQuery::reset() method ('operator=' does it automatically).

It is not possible to use C++ numeric constants as query parameters, because parameters are accessed by reference. But it is possible to use string constants, because strings are passed by value. There are two possible ways of specifying string parameters in a query: using a string buffer or a pointer to a pointer to a string:

    dbQuery q;
    char const* type;   // const, so that string literals can be assigned
    char name[256];
    q = "name=",name,"and type=",&type;

    scanf("%s", name);
    type = "A";
    cursor.select(q);
    ...
    scanf("%s", name);
    type = "B";
    cursor.select(q);
    ...
Query variables can neither be passed to a function as a parameter nor be assigned to another variable. When FastDB compiles the query, it saves the compiled tree in this object. The next time the query is used, no compilation is needed and the already compiled tree can be reused. This saves the time otherwise needed for query compilation.
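The alternation between query text mode and operand mode, and the fact that parameters are stored by address, can be illustrated with a toy class. This is a simplified sketch and not the dbQuery implementation: the class MiniQuery, its render() method and the quoting of string parameters are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a query built with overloaded '=' and ',' operators.
class MiniQuery {
    struct Element {
        enum Kind { Text, IntParam, StrParam } kind;
        const char* text;     // query fragment or string parameter buffer
        const int* intParam;  // address of an integer variable
    };
    std::vector<Element> elements;
    bool operandMode = false;

    // A char* is query text in text mode and a parameter in operand mode.
    MiniQuery& add(const char* s) {
        elements.push_back(
            {operandMode ? Element::StrParam : Element::Text, s, nullptr});
        operandMode = !operandMode;
        return *this;
    }

  public:
    // Assignment resets the query and starts in query text mode.
    MiniQuery& operator=(const char* text) {
        elements.clear();
        operandMode = false;
        return add(text);
    }
    MiniQuery& operator,(const char* s) { return add(s); }

    // Only the address is stored, so later changes to the variable are seen.
    MiniQuery& operator,(const int& value) {
        elements.push_back({Element::IntParam, nullptr, &value});
        operandMode = false;
        return *this;
    }

    // Substitutes the current parameter values into the query text.
    std::string render() const {
        std::string out;
        for (const Element& e : elements) {
            if (!out.empty()) out += ' ';
            switch (e.kind) {
            case Element::Text:     out += e.text; break;
            case Element::IntParam: out += std::to_string(*e.intParam); break;
            case Element::StrParam: out += '\''; out += e.text; out += '\''; break;
            }
        }
        return out;
    }
};
```

With int price = 10, quantity = 5 and the statement q = "price >=", price, "or quantity >=", quantity, rendering gives "price >= 10 or quantity >= 5"; assigning price = 20 afterwards changes the next rendering without rebuilding the query. The same mechanism explains why a numeric constant cannot be a parameter: a temporary has no address that remains valid after the statement.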
FastDB provides two approaches to integrating user-defined types in databases. The first, the definition of class methods, was already mentioned. The other approach deals only with query construction. Programmers should define methods which do not perform the actual calculations, but instead return an expression (in terms of predefined database types) that performs the necessary calculation. This is best described by an example. FastDB has no builtin datetime type. Instead, a normal C++ class dbDateTime can be used by the programmer. This class defines methods that allow datetime fields to be specified in order by lists and two dates to be compared using normal relational operators:

    class dbDateTime {
        int4 stamp;
      public:
        ...
        dbQueryExpression operator == (char const* field) {
            dbQueryExpression expr;
            expr = dbComponent(field,"stamp"),"=",stamp;
            return expr;
        }
        dbQueryExpression operator != (char const* field) {
            dbQueryExpression expr;
            expr = dbComponent(field,"stamp"),"<>",stamp;
            return expr;
        }
        dbQueryExpression operator < (char const* field) {
            dbQueryExpression expr;
            expr = dbComponent(field,"stamp"),"<",stamp;
            return expr;
        }
        dbQueryExpression operator <= (char const* field) {
            dbQueryExpression expr;
            expr = dbComponent(field,"stamp"),"<=",stamp;
            return expr;
        }
        dbQueryExpression operator > (char const* field) {
            dbQueryExpression expr;
            expr = dbComponent(field,"stamp"),">",stamp;
            return expr;
        }
        dbQueryExpression operator >= (char const* field) {
            dbQueryExpression expr;
            expr = dbComponent(field,"stamp"),">=",stamp;
            return expr;
        }
        friend dbQueryExpression between(char const* field, dbDateTime& from,
                                         dbDateTime& till)
        {
            dbQueryExpression expr;
            expr=dbComponent(field,"stamp"),"between",from.stamp,"and",till.stamp;
            return expr;
        }
        friend dbQueryExpression ascent(char const* field) {
            dbQueryExpression expr;
            expr=dbComponent(field,"stamp");
            return expr;
        }
        friend dbQueryExpression descent(char const* field) {
            dbQueryExpression expr;
            expr=dbComponent(field,"stamp"),"desc";
            return expr;
        }
    };

All these methods receive the name of a field in the record as their parameter. This name is used to construct the full name of the record's component. This can be done with the class dbComponent, whose constructor takes the name of the structure field and the name of the component of the structure and returns a compound name separated by a '.' symbol. The class dbQueryExpression is used to collect expression items. The expression is automatically enclosed in parentheses, eliminating conflicts with operator precedence.

So, assuming a record containing a field delivery of dbDateTime type, it is possible to construct queries like these:

    dbDateTime from, till;
    q1 = between("delivery", from, till),"order by",ascent("delivery");
    q2 = till >= "delivery";

In addition to these methods, class-specific methods can be defined in the same way, for example the method overlaps for a region type. The benefit of this approach is that the database engine works with predefined types and is able to apply indices and other optimizations to process such queries. On the other hand, the encapsulation of the class implementation is preserved, so programmers need not rewrite all queries when a class representation is changed.

Variables of the following C++ types can be used as query parameters:
int1 | bool |
int2 | char const* |
int4 | char ** |
int8 | char const** |
real4 | dbReference<T> |
real8 | dbArray< dbReference<T> > |
dbCursor<T>, where T is the name of a C++ class associated with the database table. The cursor type should be specified in the constructor of the cursor. By default, a read-only cursor is created. To create a cursor for update, you should pass the dbCursorForUpdate parameter to the constructor.

A query is executed either by the cursor select(dbQuery& q) method or by the select() method, which can be used to iterate through all records in the table. Both methods return the number of selected records and set the current position to the first record (if available). A cursor can be scrolled in the forward or backward direction. The methods next(), prev(), first(), last() can be used to change the current position of the cursor. If no operation can be performed because there are no (more) records available, these methods return NULL and the cursor position is not changed.
A cursor for class T contains an instance of class T, used for fetching the current record. That is why table classes should have a default constructor (a constructor without parameters) which has no side effects. FastDB optimizes fetching records from the database, copying only data from fixed parts of the object. String bodies are not copied; instead, the corresponding field points directly into the database. The same is true for arrays: their components have the same representation in the database as in the application (arrays of scalar types or arrays of nested structures of scalar components).
An application should not change elements of strings and arrays in a database directly. When an array method needs to update an array body, it creates an in-memory copy of the array and updates this copy. A programmer who wants to update a string field should assign a new value to the pointer, and must not change the string directly in the database. It is recommended to use the char const* type instead of the char* type for string components, to enable the compiler to detect illegal usage of strings.
The cursor class provides the get() method for obtaining a pointer to the current record (stored inside the cursor). The overloaded 'operator->' can also be used to access components of the current record. If a cursor is opened for update, the current record can be changed and stored in the database by the update() method or can be removed. If the current record is removed, the next record becomes the current. If there is no next record, then the previous record (if it exists) becomes the current. The method removeAll() removes all records in the table, whereas the method removeAllSelected only removes all records selected by the cursor.
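The rule for which record becomes current after a removal can be written down as a small function. This is a sketch on a std::vector standing in for the cursor's selection; removeCurrent is a hypothetical helper, not a FastDB method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes selection[current] and returns the index of the new current
// record: the next record if there is one, otherwise the previous one.
// When the selection becomes empty, 0 is returned and there is no current
// record.
std::size_t removeCurrent(std::vector<int>& selection, std::size_t current) {
    assert(current < selection.size());
    selection.erase(selection.begin() + static_cast<std::ptrdiff_t>(current));
    // After erase, the former next record (if any) sits at index current.
    if (current == selection.size() && current > 0) {
        --current;  // the removed record was the last one: step back
    }
    return current;
}
```

A loop that removes records one by one therefore does not need to call next() after each removal, because the cursor has already advanced.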
When records are updated, the size of the database may increase. Thus an extension of the database section in the virtual memory is needed. As a result of such remapping, base addresses of the section can be changed and all pointers to database fields kept by applications will become invalid. FastDB automatically updates current records in all opened cursors when a database section is remapped. So, when a database is updated, the programmer should access record fields only through the cursor -> method and should not use pointer variables.
Memory used for the current selection can be released by the reset() method. This method is automatically called by the select(), dbDatabase::commit(), dbDatabase::rollback() methods and the cursor destructor, so in most cases there is no need to call the reset() method explicitly.
Cursors can also be used to access records by reference. The method at(dbReference<T> const& ref) sets the cursor to the record pointed to by the reference. In this case, the selection consists of exactly one record and the next(), prev() methods will always return NULL. Since cursors and references in FastDB are strictly typed, all necessary checking can be done statically by the compiler and no dynamic type checking is needed. The only kind of checking which is done at runtime is checking for null references.
The object identifier of the current record in the cursor can be obtained by the currentId() method.
It is possible to restrict the number of records returned by a select statement. The cursor class has the two methods setSelectionLimit(size_t lim) and unsetSelectionLimit(), which can be used to set/unset the limit on the number of records returned by the query. In some situations, a programmer may want to receive only one record or only the first few records; the query execution time and the amount of consumed memory can then be reduced by limiting the size of the selection. But if you specify an order for the selected records, a query restricted to k records will not return the first k records with the smallest value of the key. Instead, k arbitrary records will be taken and then sorted.
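The difference between "limit, then sort" and "sort, then limit" can be seen on a tiny example. The sketch below models the k arbitrary records as the first k records in table order, which is an assumption made only for the illustration; both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Models the behaviour described above: k records are taken first (here:
// the first k in table order, an assumption), and only those are sorted.
std::vector<int> limitThenSort(std::vector<int> rows, std::size_t k) {
    if (rows.size() > k) rows.resize(k);
    std::sort(rows.begin(), rows.end());
    return rows;
}

// What one might expect instead: the k records with the smallest keys.
std::vector<int> sortThenLimit(std::vector<int> rows, std::size_t k) {
    std::sort(rows.begin(), rows.end());
    if (rows.size() > k) rows.resize(k);
    return rows;
}
```

For the keys 50, 10, 40, 20, 30 and k = 2, the first function yields 10, 50 while the second yields 10, 20. An application that needs the k smallest keys should therefore select without a limit and stop iterating after k records.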
So all operations with database data can be performed by means of cursors. The only exception is the insert operation, for which FastDB provides an overloaded insert function:
    template<class T>
    dbReference<T> insert(T const& record);

This function will insert a record at the end of the table and return a reference to the created object. The order of insertion is strictly specified in FastDB and applications can rely on this record order in the table. For applications widely using references for navigation between objects, it is necessary to have some root object, from which a traversal by references can be made. A good candidate for such a root object is the first record in the table (it is also the oldest record in the table). This record can be accessed by executing the select() method without a parameter. The current record in the cursor will then be the first record in the table.
The C++ API of FastDB defines a special null variable of reference type. It is possible to compare the null variable with references or assign it to a reference:

    void update(dbReference<Contract> c) {
        if (c != null) {
            dbCursor<Contract> contract(dbCursorForUpdate);
            contract.at(c);
            contract->supplier = null;
            contract.update(); // store the changed record in the database
        }
    }
The class dbDatabase controls the application's interactions with the database. It performs synchronization of concurrent accesses to the database, transaction management, memory allocation, error handling,... The constructor of dbDatabase objects allows programmers to specify some database parameters:

    dbDatabase(dbAccessType type = dbAllAccess,
               size_t dbInitSize = dbDefaultInitDatabaseSize,
               size_t dbExtensionQuantum = dbDefaultExtensionQuantum,
               size_t dbInitIndexSize = dbDefaultInitIndexSize,
               int nThreads = 1);

A database can be opened in readonly mode (dbDatabase::dbReadOnly access type) or in normal mode, allowing modification of the database (dbDatabase::dbAllAccess). When the database is opened in readonly mode, no new class definitions can be added to the database and definitions of existing classes and indices cannot be altered.

The parameter dbInitSize specifies the initial size of the database file. The database file grows on demand; setting the initial size can only reduce the number of reallocations (which can take a lot of time). In the current implementation of the FastDB database the size is at least doubled at each extension. The default value of this parameter is 4 megabytes.
The parameter dbExtensionQuantum
specifies the quantum of extension of the
memory allocation bitmap.
Briefly speaking, the value of this parameter specifies how much memory
will be allocated sequentially without attempt to reuse space of
deallocated objects. The default value of this parameter is 4 Mb.
See section Memory allocation for more details.
The parameter dbInitIndexSize
specifies the initial index size.
All objects in FastDB are accessed through an object index.
There are two copies of this object index:
current and committed. Object indices are reallocated on
demand; setting an initial index size can only reduce (or increase)
the number of reallocations. The default value of this parameter is 64K object
identifiers.
And the last parameter nThreads
controls the level of query
parallelization. If it is greater than 1, then FastDB can start the parallel
execution of some queries (including sorting the result).
The specified number of parallel threads will
be spawned by the FastDB engine in this case. Usually it does not make
sense to specify the value of this parameter to be greater than the
number of online CPUs in the system. It is also possible to pass zero
as the value of this parameter. In this case, FastDB will automatically detect
the number of online CPUs in the system. The number of threads also can be set
by the dbDatabase::setConcurrency
method at any moment of time.
The class dbDatabase
contains a static field
dbParallelScanThreshold
, which specifies a threshold for the
number of records in the table after which query parallelization
is used. The default value of this parameter is 1000.
The database can be opened by the
open(char const* databaseName, char const* fileName = NULL)
method.
If the file name parameter is omitted, it is constructed from
the database name by appending the ".fdb" suffix. The database name can
be an arbitrary identifier consisting of any characters except '\'.
The method open
returns true
if the database was
successfully opened; or false
if the open operation failed.
In the latter case, the database handleError
method is called with a
DatabaseOpenError
error code. A database session can be terminated
by the close
method, which implicitly commits current transactions.
In a multithreaded application each thread, which wants to access the database,
should first be attached to it. The method dbDatabase::attach()
allocates thread specific data and attaches the thread to the database.
This method is automatically called by the open()
method, so
there is no reason to call the attach()
method for the thread,
which opens the database. When the thread finishes work with the database, it should
call the dbDatabase::detach()
method. The method
close
automatically invokes the detach()
method.
The method detach()
implicitly commits current transactions.
An attempt to access a database by a detached thread causes an assertion failure.
FastDB is able to compile and execute queries in parallel, providing a significant performance increase on multiprocessor systems. But concurrent updates of the database are not possible (this is the price for the efficient log-less transaction mechanism and zero time recovery). When an application wants to modify the database (open a cursor for update or insert a new record in the table), it first locks the database in exclusive mode, prohibiting access to the database by other applications, even for read-only queries. So, to avoid blocking database applications for a long time, a modification transaction should be as short as possible. No blocking operations (like waiting for input from the user) should be done within this transaction.
Using only shared and exclusive locks at the database level allows FastDB to almost eliminate locking overhead and to optimize the execution speed of non-conflicting operations. But if many applications simultaneously update different parts of the database, then the approach used in FastDB will be very inefficient. That is why FastDB is most suitable for a single-application database access model or for multiple applications with a read-dominated access pattern.
Both cursor and query objects should be used by only one thread in a
multithreaded application. If there is more than one thread in your
application, use local variables for cursor and query objects
in each thread. The dbDatabase
object is shared between all
threads and uses thread specific data to perform query
compilation and execution in parallel with minimal synchronization overhead.
Only a few global structures require synchronization, such as the symbol table
and the pool of tree nodes. Scanning, parsing and execution of a query can
be done without any synchronization, providing a high level of concurrency
on multiprocessor systems.
A database transaction is started by the first select or an insert operation.
If a cursor for update is used, then the database is locked in exclusive
mode, prohibiting access to the database by other applications and threads.
If a read-only cursor is used, then the database is locked in shared mode, preventing
other applications and threads from modifying the database,
but allowing the execution of concurrent read requests.
A transaction should be explicitly terminated
either by the dbDatabase::commit()
method, which fixes all
changes done by the transaction in the database; or by the
dbDatabase::rollback()
method to undo all modifications
done by the transaction. The method dbDatabase::close()
automatically
commits current transactions.
If you start a transaction by performing selection using a read-only cursor and
then use a cursor for update to perform some modifications of the database,
the database will be first locked in shared mode; then the lock will be upgraded
to exclusive mode. This can cause a deadlock problem if the database is simultaneously
accessed by several applications. Imagine that application A starts
a read transaction and application B also starts a read transaction. Both
of them hold shared locks on the database. If both of them want to
upgrade their locks to exclusive mode, they will forever block each other
(an exclusive lock cannot be granted while another process holds a shared lock).
To avoid such situations, try to use a cursor for update at the beginning of the
transaction, or explicitly use the dbDatabase::lock()
method.
More information about the implementation of transactions in FastDB can be found
in section Transactions.
It is possible to explicitly lock the database by the lock()
method.
Locking is usually done automatically - there are only a few cases when
you will want to use this method. It will lock the database in exclusive
mode until the end of the current transaction.
A backup of the database can be done by the
dbDatabase::backup(char const* file)
method. A backup locks the database in shared mode and flushes an image of the
database from main memory to the specified file. Because of using a shadow object index,
the database file is always in a consistent state, so recovery from the backup can
be performed just by renaming the backup file (if backup was performed on tape, it
should be first restored to the disk).
The class dbDatabase
is also responsible for handling various
application errors, such as syntax errors during query compilation,
out of range index or null reference access during query execution.
There is a virtual method dbDatabase::handleError
, which handles
these errors:
virtual void handleError(dbErrorClass error, char const* msg = NULL, int arg = 0);
A programmer can derive her/his own subclass from the
dbDatabase
class and redefine the default reaction on errors.
Class | Description | Argument | Default reaction |
---|---|---|---|
QueryError | query compilation error | position in query string | abort compilation |
ArithmeticError | arithmetic error during division or power operations | - | terminate application |
IndexOutOfRangeError | index is out of array bounds | value of index | terminate application |
DatabaseOpenError | error while database opening | - | open method will return false |
FileError | failure of file IO operation | error code | terminate application |
OutOfMemoryError | not enough memory for object allocation | requested allocation size | terminate application |
Deadlock | upgrading lock causes deadlock | - | terminate application |
NullReferenceError | null reference is accessed during query execution | - | terminate application |
FastDB uses simple rules for applying indices, allowing a programmer to predict when an index and which one will be used. The check for index applicability is done during each query execution, so the decision can be made depending on the values of the operands. The following rules describe the algorithm of applying indices by FastDB:
= < > <= >= between like
)
Now we should make clear what the phrase "index is compatible with operation" means and which type of index is used in each case. A hash table can be used when:
=
is used.
between
operation is used and the values of both bounds operands
are the same.
like
operation is used and the pattern string contains
no special characters ('%' or '_') and no escape characters (specified in an
escape
part).
A T-tree index can be applied if a hash table is not applicable (or a field is not hashed) and:
= < > <= >= between
)
is used.
like
operation is used and the pattern string has
a non-empty prefix (i.e. the first character of the pattern is not '%' or '_').
If an index is used to search the prefix of a like
expression, and
the suffix is not just the '%' character, then an index search operation can return
more records than really match the pattern. In this case we should filter the
index search output by applying a pattern match operation.
When the search condition is a disjunction of several subexpressions
(the expression contains several alternatives combined by the or
operator), then several indices can be used for the query execution.
To avoid record duplicates in this case, a bitmap is used in the cursor
to mark records already selected.
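The duplicate elimination described above can be sketched as follows. This is only an illustration of the idea, not FastDB's code: the type `Oid`, the function `mergeIndexResults` and the use of `std::vector<bool>` as the bitmap are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Oid = std::size_t;  // hypothetical record identifier type

// Combine the results of several index searches (one per "or" alternative)
// into a single selection without duplicates. The bitmap holds one bit per
// possible record identifier below `tableSize`.
std::vector<Oid> mergeIndexResults(const std::vector<std::vector<Oid>>& results,
                                   std::size_t tableSize) {
    std::vector<bool> selected(tableSize, false);  // records already selected
    std::vector<Oid> selection;
    for (const auto& result : results) {
        for (Oid oid : result) {
            assert(oid < tableSize);
            if (!selected[oid]) {  // first time this record is seen
                selected[oid] = true;
                selection.push_back(oid);
            }
        }
    }
    return selection;
}
```

Testing a bit costs constant time per record, so merging the index results stays linear in the total number of records they return.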
If the search condition requires a sequential table scan, the T-tree index
can still be used if the order by
clause contains a single
record field for which a T-tree index is defined. Since sorting is a very
expensive operation, using an index instead of sorting significantly
reduces the query execution time.
It is possible to check which indices are used for the query execution,
and how many probes are performed during an index search, by compiling FastDB
with the option -DDEBUG=DEBUG_TRACE
. In this case, FastDB will
dump trace information about database functionality including information
about indices.
When a record with declared relations is inserted in the table, the inverse references in all tables, which are in relation with this record, are updated to point to this record. When a record is updated and a field specifying the record's relationship is changed, then the inverse references are also reconstructed automatically by removing references to the updated record from those records which are no longer in relation with the updated record and by setting inverse references to the updated record for new records included in the relation. When a record is deleted from the table, references to it are removed from all inverse reference fields.
For efficiency reasons, FastDB is not able to guarantee the consistency of all references. If you remove a record from the table, there can still be references to the removed record in the database. Accessing these references can cause unpredictable behavior of the application and even database corruption. Using inverse references eliminates this problem, because all references are updated automatically and the consistency of references is preserved.
Let's use the following table definitions as an example:
class Contract;

class Detail {
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;

    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED),
                     KEY(material, HASHED),
                     KEY(color, HASHED),
                     KEY(weight, INDEXED),
                     RELATION(contracts, detail)));
};

class Supplier {
  public:
    char const* company;
    char const* location;
    bool        foreign;

    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(company, INDEXED|HASHED),
                     KEY(location, HASHED),
                     FIELD(foreign),
                     RELATION(contracts, supplier)));
};

class Contract {
  public:
    dbDateTime delivery;
    int4       quantity;
    int8       price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED),
                     KEY(quantity, INDEXED),
                     KEY(price, INDEXED),
                     RELATION(detail, contracts),
                     RELATION(supplier, contracts)));
};
In this example there are one-to-many relations between the tables
Detail-Contract and Supplier-Contract. When a Contract
record is inserted in the database, it is necessary only to set the references
detail
and supplier
to the corresponding
records of the Detail
and the Supplier
table.
The inverse references contracts
in these records will be updated
automatically. The same happens when a Contract
record is
removed: references to the removed record will be automatically excluded
from the contracts
field of the referenced Detail
and
Supplier
records.
Moreover, using inverse references allows FastDB to choose more efficient plans for query execution. Consider the following query, selecting all details shipped by some company:
q = "exists i:(contracts[i].supplier.company=",company,")";
The straightforward approach to execute this query is scanning the
Detail
table and testing each record for this condition.
But using inverse references we can choose another approach: perform an
index search in the Supplier
table for records with the specified
company name and then use the inverse references to locate records from the
Detail
table, which are in transitive relation with the
selected supplier records. Of course, we must eliminate duplicate
records, which can appear because the company can ship a number of different
details. This is done by a bitmap in the cursor object.
Since an index search is significantly faster than a sequential search
and accessing records by reference is a very fast operation, the total
time of such query execution is much shorter compared with the
straightforward approach.
The algorithms used in FastDB make it possible to calculate quite precisely the average and maximal query execution time depending on the number of records in the table (assuming that the size of array fields in records is significantly smaller than the table size, so that the time of iteration through array elements can be excluded from the estimation). The following table shows the complexity of searching a table with N records depending on the search condition:
Type of search | Average | Maximal |
---|---|---|
Sequential search | O(N) | O(N) |
Sequential search with sorting | O(N*log(N)) | O(N*log(N)) |
Search using hash table | O(1) | O(N) |
Search using T-tree | O(log(N)) | O(log(N)) |
Access by reference | O(1) | O(1) |
FastDB uses the Heapsort algorithm for sorting selected records to provide guaranteed O(N*log(N)) complexity (quicksort is a little faster on average, but its worst-case time is O(N*N)). A hash table also has different average and maximal complexity. On average, a hash table search is faster than a T-tree search, but in the worst case it is equivalent to a sequential search, while a T-tree search always guarantees O(log(N)) complexity.
The execution of update statements in FastDB is also fast, but this time is less predictable, because the commit operation requires flushing of modified pages to disk which can cause unpredictable operating system delays.
To split a table scan, FastDB starts N threads, each of which tests every N-th record of the table (i.e. thread number 0 tests records 0,N,2*N,..., thread number 1 tests records 1,1+N,1+2*N,... and so on). Each thread builds its own list of selected records. After all threads have terminated, these lists are concatenated to construct the single result list.
If the result must be sorted, then each thread, after finishing the table scan, sorts the records it selected. After all threads have terminated, their lists are merged (as is done in an external sort).
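The partitioning and merging scheme can be sketched as below. This is a simplified model under stated assumptions, not FastDB's implementation: records are plain `int` values, the names `scanStride` and `stridedSelect` are invented for the example, and the workers are called one after another instead of running in real threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Work of one scanning thread: worker `id` out of `nWorkers` tests records
// id, id + nWorkers, id + 2*nWorkers, ... and sorts what it selected.
std::vector<int> scanStride(const std::vector<int>& table, std::size_t id,
                            std::size_t nWorkers,
                            const std::function<bool(int)>& matches) {
    std::vector<int> selected;
    for (std::size_t i = id; i < table.size(); i += nWorkers) {
        if (matches(table[i])) selected.push_back(table[i]);
    }
    std::sort(selected.begin(), selected.end());
    return selected;
}

// Run all workers and merge their sorted lists into one sorted result.
// The workers run sequentially here to keep the sketch simple; in a real
// engine each call to scanStride would execute in its own thread.
std::vector<int> stridedSelect(const std::vector<int>& table,
                               std::size_t nWorkers,
                               const std::function<bool(int)>& matches) {
    assert(nWorkers > 0);
    std::vector<int> result;
    for (std::size_t id = 0; id < nWorkers; ++id) {
        std::vector<int> part = scanStride(table, id, nWorkers, matches);
        std::vector<int> merged;
        std::merge(result.begin(), result.end(), part.begin(), part.end(),
                   std::back_inserter(merged));
        result.swap(merged);
    }
    return result;
}
```

Because every worker touches a disjoint set of records, the scan phase needs no synchronization; only the final merge combines the per-worker lists.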
Parallel query execution is controlled by two parameters: the number of spawned
threads and a parallel search threshold. The first is specified in the
dbDatabase
class constructor or set by the
dbDatabase::setConcurrency
method. A zero value of this parameter
asks FastDB to automatically detect the number of online CPUs in the system and
to spawn exactly this number of threads. By default, the number of threads is set to 1,
so no parallel query execution takes place.
The parallel search threshold parameter specifies the minimal number of records in the
table for which parallelization of the query can improve query performance
(starting a thread has its own overhead). This parameter is a static
component of the dbDatabase
class and can be changed by an application at
any moment of time.
Parallel query execution is not possible when:
dbDatabase::dbParallelScanThreshold
.
start from
part.
FastDB performs a cyclic scan of bitmap pages. It saves the identifier
of the current bitmap page and the current position within the page. Each time
an allocation request arrives, scanning the bitmap starts from the (saved)
current position.
When the last allocated bitmap page has been scanned, scanning continues from the
beginning (from the first bitmap page) up to the current position.
When no free space is found after a full cycle through all bitmap pages,
a new chunk of memory is allocated. The size of the extension is the maximum of the size
of the allocated object and the extension quantum. The extension quantum is a parameter
of the database, specified in the constructor. The bitmap is extended in order to map
the additional space. If the virtual space is exhausted and no more
bitmap pages can be allocated, then an OutOfMemory
error
is reported.
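A minimal sketch of the cyclic scan follows, under simplifying assumptions: the bitmap is one flat array with one bit per allocation quantum (no pages), sizes are counted in quanta, and the class name `BitmapAllocator` with its members is hypothetical. Free quanta at the tail are not merged with newly added space when the bitmap is extended.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class BitmapAllocator {
public:
    explicit BitmapAllocator(std::size_t extensionQuantum)
        : extensionQuantum_(extensionQuantum) {}

    // Allocate n consecutive quanta; returns the index of the first one.
    std::size_t allocate(std::size_t n) {
        assert(n > 0);
        // 1. Scan from the saved position to the end of the bitmap.
        std::size_t pos = findHole(current_, bits_.size(), n);
        // 2. Wrap around: look for holes starting before the saved position.
        if (pos == kNotFound) {
            pos = findHole(0, std::min(bits_.size(), current_ + n - 1), n);
        }
        // 3. Full cycle without success: extend by max(n, extension quantum).
        if (pos == kNotFound) {
            pos = bits_.size();
            bits_.resize(bits_.size() + std::max(n, extensionQuantum_), false);
        }
        mark(pos, n, true);
        current_ = pos + n;  // the next request continues from here
        return pos;
    }

    void release(std::size_t pos, std::size_t n) { mark(pos, n, false); }
    std::size_t size() const { return bits_.size(); }

private:
    static constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

    // Find n free quanta lying entirely inside [from, to).
    std::size_t findHole(std::size_t from, std::size_t to, std::size_t n) const {
        std::size_t run = 0;
        for (std::size_t i = from; i < to; ++i) {
            run = bits_[i] ? 0 : run + 1;
            if (run == n) return i + 1 - n;
        }
        return kNotFound;
    }

    void mark(std::size_t pos, std::size_t n, bool used) {
        for (std::size_t i = pos; i < pos + n; ++i) bits_[i] = used;
    }

    std::vector<bool> bits_;       // true = quantum in use
    std::size_t current_ = 0;      // saved scan position
    std::size_t extensionQuantum_;
};
```

In this model a released hole near the start of the bitmap is reused only after the scan position has reached the end of the bitmap and wrapped around, which is what keeps allocation mostly sequential.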
Allocating memory using a bitmap provides high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimizing the number of modified pages is significant when a commit operation is performed and all dirty pages should be flushed onto the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and the transaction commit time is reduced. Using the extension quantum helps to preserve sequential allocation. Once the bitmap is extended, objects will be allocated sequentially until the extension quantum is completely consumed. Only after reaching the end of the bitmap, the scan restarts from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, FastDB associates a descriptor with each page, which is used to remember the maximal size of the hole in the page. This calculation of the maximal hole size is performed in the following way: if an object of size M can not be allocated from this bitmap page, the maximal hole size is less than M, and M is stored in the page descriptor if the previous size value of the descriptor is greater than M. For the next allocation of an object of size >= M, we will skip this bitmap page. The page descriptor is reset when some object is deallocated within this bitmap page.
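The descriptor logic can be modelled in a few lines. The sketch below is an assumption-laden illustration: the struct `PageHint` and its member names are invented, and the "no information" state is represented by the largest `std::size_t` value.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Per-page hint: every hole in the page is known to be smaller than `bound`.
struct PageHint {
    static constexpr std::size_t unknown =
        std::numeric_limits<std::size_t>::max();
    std::size_t bound = unknown;

    // A request of `size` can skip the page if it is known not to fit.
    bool canSkip(std::size_t size) const { return size >= bound; }

    // Called after a scan of the page failed to find a hole of `size`:
    // keep the smallest size that is known to have failed.
    void noteFailure(std::size_t size) {
        if (size < bound) bound = size;
    }

    // Called when any object inside the page is deallocated.
    void reset() { bound = unknown; }
};
```

The hint is conservative: it may force an unnecessary scan after a reset, but it never causes a page containing a large enough hole to be skipped.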
Some database objects (like hash table pages) should be aligned on the page boundary to provide more efficient access. The FastDB memory allocator checks the requested size. If it is aligned on page boundary, the address of the allocated memory segment is also aligned on page boundary. A search for a free hole will be done faster in this case, because FastDB increases the step of the current position increment according to the value of the alignment.
To be able to deallocate the memory used by an object, FastDB needs to keep information about the object size somewhere. There are two ways of getting the object size in FastDB. All table records are preceded by a record header, which contains the record size and an (L2-list) pointer linking all records in the table. Thus the size of a table record can be extracted from the record header. Internal database objects (bitmap pages, T-tree and hash table nodes) have a known size and are allocated without any header. Instead, handles for such objects contain special markers, which make it possible to determine the class of the object and get its size from the table of builtin object sizes. Markers can be used because allocation is always done in quanta of 16 bytes, so the low 4 bits of an object handle are not used.
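The marker trick relies only on the 16 byte alignment of object offsets. The following sketch shows how a marker could be packed into and extracted from the low 4 bits of a 32-bit handle; the marker names and values are hypothetical and do not reproduce FastDB's actual encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical marker values, chosen for illustration only.
enum HandleMarker : std::uint32_t {
    RecordMarker = 0,   // size is taken from the record header
    BitmapPageMarker = 1,
    HashTableMarker = 2,
    TtreeNodeMarker = 3
};

constexpr std::uint32_t kQuantum = 16;               // allocation quantum, bytes
constexpr std::uint32_t kMarkerMask = kQuantum - 1;  // low 4 bits

// Offsets are multiples of 16, so the low 4 bits are free to carry a marker.
std::uint32_t makeHandle(std::uint32_t offset, HandleMarker marker) {
    assert(offset % kQuantum == 0);
    return offset | marker;
}

std::uint32_t offsetOf(std::uint32_t handle) { return handle & ~kMarkerMask; }

HandleMarker markerOf(std::uint32_t handle) {
    return static_cast<HandleMarker>(handle & kMarkerMask);
}
```

With the marker in hand, the size of a headerless internal object could be looked up in a small table indexed by marker value.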
It is possible to create a database larger than 4 Gb, or one containing more than
4G objects, by defining values greater than 32 on the compiler command line
for the dbDatabaseOffsetBits
or the
dbDatabaseOidBits
parameter.
In this case, FastDB will use an 8 byte integer type to
represent an object handle/object identifier. This works only on truly
64-bit operating systems, such as Digital Unix.
When an object is modified for the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created copy. The shadow index still contains a handle which points to the original version of the object. All changes are made to the copy, leaving the original object unchanged. FastDB records in a special bitmap which pages of the object index contain modified object handles.
When a transaction is committed, FastDB first checks whether the size of the object index was increased during the committed transaction. If so, it also reallocates the shadow copy of the object index. Then FastDB frees memory for all "old objects", i.e. objects which have been cloned within the transaction. Memory cannot be deallocated before commit, because we want to preserve the consistent state of the database by keeping cloned objects unchanged. If we deallocated memory immediately after cloning, a new object could be allocated in the place of the cloned object, and we would lose consistency. As memory deallocation is done in FastDB by the bitmap using the same transaction mechanism as for normal database objects, deallocation of object space requires clearing some bits in a bitmap page, which also has to be cloned before modification. Cloning a bitmap page requires new space for the page copy, and we could reuse the space of deallocated objects. But this is not acceptable for the reasons explained above: we would lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking the object space are also cloned (if not cloned before). So when the transaction is committed, we only clear some bits in bitmap pages: no more requests for allocation of memory can be generated at this moment.
After deallocation of old copies, FastDB flushes all modified pages onto the disk to synchronize the contents of the memory and the contents of the disk file. After that, FastDB changes the current object index indicator in the database header to switch the roles of the object indices. The current object index becomes the shadow index and vice versa. Then FastDB again flushes the modified page (i.e. the page with the database header) onto the disk, transferring the database to a new consistent state. After that, FastDB copies all modified handles from the new object index to the object index which was previously shadow and now becomes current. At this moment, the contents of both indices are synchronized and FastDB is ready to start a new transaction.
The bitmap of the modified object index pages is used to minimize the duration of committing a transaction. Not the whole object index, but only its modified pages should be copied. After the transaction commitment the bitmap is cleared.
When a transaction is explicitly aborted by the dbDatabase::rollback
method, the shadow object index is copied back to the current index, eliminating
all changes done by the aborted transaction. After the end of copying,
both indices are identical again and the database state corresponds to the state
before the start of the aborted transaction.
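The interplay of the two object indices during commit and rollback can be illustrated with a toy in-memory model. Everything in it is an assumption made for the example: the class `ShadowStore`, string-valued objects, a per-handle modification flag instead of a bitmap of index pages, and a heap that never reclaims old copies. Disk flushing is only indicated by a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ShadowStore {
public:
    explicit ShadowStore(std::vector<std::string> initial)
        : heap_(std::move(initial)) {
        index_[0].resize(heap_.size());
        for (std::size_t oid = 0; oid < heap_.size(); ++oid) index_[0][oid] = oid;
        index_[1] = index_[0];
        modified_.assign(heap_.size(), false);
    }

    const std::string& get(std::size_t oid) const {
        return heap_[index_[current_][oid]];
    }

    // The first modification clones the object; the shadow index keeps
    // pointing to the original version.
    void put(std::size_t oid, const std::string& value) {
        if (!modified_[oid]) {
            std::string copy = heap_[index_[current_][oid]];
            heap_.push_back(std::move(copy));
            index_[current_][oid] = heap_.size() - 1;
            modified_[oid] = true;
        }
        heap_[index_[current_][oid]] = value;
    }

    void commit() {
        // (Modified pages would be flushed to disk here.)
        std::size_t committed = current_;
        current_ = 1 - current_;  // switch the roles of the two indices
        // Synchronize the new working index, copying only modified handles.
        syncFrom(committed);
    }

    void rollback() {
        syncFrom(1 - current_);  // copy the shadow handles back
    }

private:
    void syncFrom(std::size_t source) {
        for (std::size_t oid = 0; oid < modified_.size(); ++oid) {
            if (modified_[oid]) {
                index_[current_][oid] = index_[source][oid];
                modified_[oid] = false;
            }
        }
    }

    std::vector<std::string> heap_;       // all object versions
    std::vector<std::size_t> index_[2];   // OID -> heap slot
    std::size_t current_ = 0;             // which index is the working one
    std::vector<bool> modified_;          // handles changed in this transaction
};
```

In the model, commit and rollback differ only in which index serves as the source of the copy, and both touch only the handles modified by the transaction.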
Allocation of object handles is done by a free handle list. The header of the list is also shadowed and the two instances of the list headers are stored in the database header. A switch between them is done in the same way as between the object indices. When there are no more free elements in the list, FastDB allocates handles from the unused part of a new index. When there is no more space in the index, it is reallocated. The object index is the only entity in the database which is not cloned on modification. Instead of this, two copies of the object index are always used.
There are some predefined OID values in FastDB. OID 0 is reserved as an invalid object identifier. OID 1 is used as the identifier of the metatable object - the table containing descriptors of all other tables in the database. This table is automatically constructed during database initialization; descriptors of all registered application classes are stored in this metatable. OIDs starting from 2 are reserved for bitmap pages. The number of bitmap pages depends on the maximum virtual space of the database. For 32 bit handles, the maximal virtual space is 4 Gb. The number of bitmap pages can be calculated as this size divided by the page size, divided by the allocation quantum size, divided by the number of bits in a byte. For a 4 Gb virtual space, a 4 Kb page size and a 16 byte allocation quantum, 8K bitmap pages are required. So 8K handles are reserved in the object index for bitmaps. Bitmap pages are allocated on demand, when the database size is extended. So the OID of the first user object will be 8194.
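The arithmetic above can be checked with a short calculation. The function names `bitmapPages` and `firstUserOid` are invented for this example; the formula itself is the one given in the text.

```cpp
#include <cassert>
#include <cstdint>

// One bitmap page holds pageSize * 8 bits, and each bit maps one
// allocation quantum of the virtual space.
constexpr std::uint64_t bitmapPages(std::uint64_t virtualSpace,
                                    std::uint64_t pageSize,
                                    std::uint64_t quantum) {
    return virtualSpace / pageSize / quantum / 8;
}

// OID 0 is invalid, OID 1 is the metatable, then come the bitmap pages.
constexpr std::uint64_t firstUserOid(std::uint64_t nBitmapPages) {
    return 2 + nBitmapPages;
}
```

For a 4 Gb space, 4 Kb pages and a 16 byte quantum this gives 2**32 / 2**12 / 2**4 / 2**3 = 2**13 = 8192 bitmap pages, and 2 + 8192 = 8194 for the first user OID.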
dirty
flag is set in the database header, FastDB performs a
database recovery. Recovery is very similar to rollback of transactions.
The indicator of the current index in the database object header is used to
determine the index corresponding to the consistent database state. Object handles
from this index are copied to another object index, eliminating
all changes done by uncommitted transactions. As the only action
performed by the recovery procedure is copying the object index (really only
handles having different values in the current and the shadow index are copied to
reduce the number of modified pages) and the size of the object index is small,
recovery can be done very fast.
The fast recovery procedure reduces the "out-of-service" time for
an application.
There is one hack which is used in FastDB to increase the database performance.
All records in the table are linked in an L2-list, allowing efficient traversal
through the list and insertion/removal of records.
The header of the list is stored in a table object (which is the record of the
Metatable
). L2-list pointers are
stored at the beginning of the object together with the object size.
New records are always appended in FastDB at the end of the list.
To provide consistent inclusion into a database list, we should clone the last record
in the table and the table object itself. But if the record size is very big,
cloning the last record can cause significant space
and time overhead.
To eliminate this overhead, FastDB does not clone the last record but allows a temporary inconsistency of the list. In which state will the list be if a system fault happens before the transaction is committed? The consistent version of the table object will point to the record which was the last record in the previous consistent state of the database. But as this record was not cloned, it can contain a pointer to a next record which doesn't exist in this consistent database state. To fix this inconsistency, FastDB checks all tables in the database during the recovery procedure: if the last record in the table contains a non-NULL next reference, next is changed to NULL to restore consistency.
If a database file was corrupted on the disk, the only way to recover the database
is to use a backup file (provided, of course, that you did not forget to make one).
A backup file can be made by the interactive SQL utility using the backup
command; or by the application using the dbDatabase::backup()
method.
Both create a snapshot of the database in a specified file (it can be the name of a
device, a tape for example). Since a database file is always in a consistent
state, the only action needed to perform recovery by means of the backup file
is to replace the original database file with the backup file.
If some application starts a transaction, locks the database and then crashes, the database is left in a locked state and no other application can access it. To recover from this situation, stop all applications working with the database and then restart them. The first application opening the database will initialize the database monitor and perform recovery after this type of crash.
FastDB uses an extensible hash table with collision chains. The table is implemented as an array of object references with a pointer to a collision chain. Collision chain elements form an L1-list: each element contains a pointer to the next element, the hash function value and the OID of the associated record. Hash tables can be created for boolean, numeric and string fields.
To prevent the growth of collision chains, the size of a hash table is automatically increased when the table becomes full. In the current implementation, the hash table is extended when both of the following two conditions are true:
char
field, because no more than 256 items of the hash table can be
used). Each time the hash table is extended, its size is doubled. More precisely:
the hash table size is 2**n-1.
Using an odd or a prime number for the hash table size improves the
quality of hashing and allows space to be allocated efficiently
for the hash table, the size of which is aligned on a page
boundary. If the hash table size were 2**n, the table index would depend only on
the n least significant bits of the hash code; all other bits would be lost.
FastDB uses a very simple hash function which, despite its simplicity, can provide good results (a uniform distribution of values within the hash table). The hash code is calculated using all bytes of the key value by the following formula:
h = h*31 + *key++;
The hash table index is the remainder of dividing the hash code by the hash table size.
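As an illustration, the formula can be wrapped into two small functions. The names `hashCode` and `hashSlot`, the 32-bit unsigned accumulator starting at zero and the treatment of key bytes as unsigned are assumptions of this sketch; the text only fixes the recurrence and the final remainder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Apply h = h*31 + *key++ to every byte of the key.
std::uint32_t hashCode(const unsigned char* key, std::size_t length) {
    std::uint32_t h = 0;  // assumed initial value
    while (length-- > 0) {
        h = h * 31 + *key++;
    }
    return h;
}

// The slot is the remainder of dividing the hash code by the table size.
std::size_t hashSlot(std::uint32_t code, std::size_t tableSize) {
    assert(tableSize > 0);
    return code % tableSize;
}
```

For the two-byte key "ab" the recurrence gives 97*31 + 98 = 3105, which falls into slot 3105 mod 127 = 57 of a table with 2**7-1 entries.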
Like AVL trees, the height of left and right subtrees of a T-tree may differ by at most one. Unlike AVL trees, each node in a T-tree stores multiple key values in a sorted order, rather than a single key value. The left-most and the right-most key value in a node define the range of key values contained in the node. Thus, the left subtree of a node contains only key values less than the left-most key value, while the right subtree contains key values greater than the right-most key value in the node. A key value which falls between the smallest and largest key value in a node is said to be bounded by that node. Note that keys equal to the smallest or largest key in the node may or may not be considered to be bounded based on whether the index is unique and based on the search condition (e.g. "greater-than" versus "greater-than or equal-to").
A node with both a left and a right child is referred to as an internal node, a node with only one child is referred to as a semi-leaf, and a node with no children is referred to as a leaf. In order to keep the occupancy high, every internal node must contain a minimum number of key values (typically k-2, if k is the maximum number of keys that can be stored in a node). However, there is no occupancy condition on the leaves or semi-leaves.
Searching for a key value in a T-tree is relatively straightforward. For every node, a check is made to see if the key value is bounded by the left-most and the right-most key value in the node; if this is the case, then the key value is returned if it is contained in the node (else, the key value is not contained in the tree). Otherwise, if the key value is less than the left-most key value, then the left child node is searched; else the right child node is searched. The process is repeated until either the key is found or the node to be searched is null.
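The search procedure can be written down compactly. The sketch below assumes integer keys, non-empty nodes and an exact-match search; the struct `TNode` and the function `ttreeContains` are hypothetical names and not part of FastDB.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TNode {
    std::vector<int> keys;   // sorted, assumed non-empty
    TNode* left = nullptr;   // subtree with keys below keys.front()
    TNode* right = nullptr;  // subtree with keys above keys.back()
};

bool ttreeContains(const TNode* node, int key) {
    while (node != nullptr) {
        if (key < node->keys.front()) {
            node = node->left;            // below the node's range
        } else if (key > node->keys.back()) {
            node = node->right;           // above the node's range
        } else {
            // The key is bounded by this node: it is either here or nowhere.
            return std::binary_search(node->keys.begin(), node->keys.end(), key);
        }
    }
    return false;
}
```

Only two comparisons per node are needed on the way down, and a binary search is performed in at most one node, the one that bounds the key.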
Insertions and deletions into the T-tree are a bit more complicated. For insertions, first a variant of the search described above is used to find the node that bounds the key value to be inserted. If such a node exists, then if there is room in the node, the key value is inserted into the node. If there is no room in the node, then the key value is inserted into the node and the left-most key value in the node is inserted into the left subtree of the node (if the left subtree is empty, then a new node is allocated and the left-most key value is inserted into it). If no bounding node is found, then let N be the last node encountered by the failed search and proceed as follows: If N has room, the key value is inserted into N; else, it is inserted into a new node that is either the right or left child of N, depending on the key value and the left-most and right-most key values in N.
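The step taken when a bounding node is found can be sketched on its own. The sketch below covers only that case and ignores rebalancing; `insertIntoBoundingNode` and its parameters are hypothetical names, and keys are plain integers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Inserts `key` into the sorted key array of a node that bounds it.
// If the node was already full (it held `capacity` keys), the left-most key is
// removed, stored in `displaced`, and the function returns true; the caller
// would then insert the displaced key into the node's left subtree.
inline bool insertIntoBoundingNode(std::vector<int>& keys, std::size_t capacity,
                                   int key, int& displaced) {
    const bool wasFull = keys.size() >= capacity;
    keys.insert(std::upper_bound(keys.begin(), keys.end(), key), key);
    if (!wasFull) {
        return false;               // there was room, nothing is displaced
    }
    displaced = keys.front();       // the left-most key moves out of the node
    keys.erase(keys.begin());
    return true;
}
```

Displacing the left-most key keeps the node's upper boundary stable, so keys already stored in the right subtree remain on the correct side of the node.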
Deletion of a key value begins by determining the node containing the key value, and the key value is deleted from the node. If deleting the key value results in an empty leaf node, then the node is deleted. If the deletion results in an internal node or semi-leaf containing fewer than the minimum number of key values, then the deficit is made up by moving the largest key in the left subtree into the node, or by merging the node with its right child.
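In both insert and delete, allocation/deallocation of a node may cause the tree to become unbalanced and rotations (RR, RL, LL, LR) may be necessary. The heights of subtrees in the following description include the effects of the insert or delete operation. In the case of an insert, nodes along the path from the newly allocated node to the root are examined until either a node whose two subtrees have equal heights is found (in which case no rotation is needed), or a node whose subtrees' heights differ by more than one is found, in which case a single rotation involving that node is performed.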
In the case of delete, nodes along the path from the de-allocated node's parent to the root are examined until a node is found whose subtrees' heights now differ by one. Furthermore, every time a node whose subtrees' heights differ by more than one is encountered, a rotation is performed. Note that de-allocation of a node may result in multiple rotations.
The following rules in BNF-like notation specify the grammar of the SUBSQL directives:

    directive ::=
        select (*) from table-name select-condition ;
      | insert into table-name values values-list ;
      | create index on field-name ;
      | drop index field-name ;
      | drop table-name
      | open database-name ( database-file-name ) ;
      | delete from table-name
      | backup file-name
      | commit
      | rollback
      | exit
      | show
      | help
    table-name ::= identifier
    values-list ::= tuple { , tuple }
    tuple ::= ( value { , value } )
    value ::= number | string | true | false | tuple
    index ::= index | hash
    field-name ::= identifier { . identifier }
    database-name ::= string
    database-file-name ::= string
    file-name ::= string

SUBSQL automatically commits a read-only transaction after each select statement in order to release the shared database lock as soon as possible. All database modification operations, however, must be explicitly committed by a commit statement or undone by a rollback statement. The open statement opens a new database, whereas exit closes an open database (if one was opened) and so implicitly commits the last transaction. If a database file name is not specified in the open statement, the file name is constructed from the database name by appending the ".fdb" suffix.
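The file name rule can be written down in a few lines. The helper `databaseFileName` is a hypothetical name introduced for this illustration and is not part of SUBSQL or the FastDB API.

```cpp
#include <string>

// Hypothetical helper: choose the database file name as described above.
// An explicitly given file name wins; otherwise ".fdb" is appended to the
// database name.
inline std::string databaseFileName(const std::string& databaseName,
                                    const std::string& fileName = "") {
    return fileName.empty() ? databaseName + ".fdb" : fileName;
}
```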
The select statement always prints all record fields. FastDB doesn't support tuples: the result of a selection is always a set of objects (records). The format of the select statement output is similar to the one accepted by the insert statement (with the exception of reference fields), so it is possible to export/import a database table without references by means of the select/insert directives of SUBSQL.
The select statement prints references in the format "#hexadecimal-number", but this format cannot be used in the insert statement. Because object references are represented in FastDB by internal object identifiers (OIDs), a reference field cannot be set in an insert statement: an object inserted into the database is assigned a new OID, so it does not make sense to specify a reference field in the insert statement. To ensure database reference consistency, FastDB simply ignores reference fields when new records are inserted into a table with references. You should specify the value 0 in place of reference fields.
If you omit the '*' symbol in the select statement, FastDB will output object
identifiers of each selected record.
It is mandatory to provide values for all record fields in an insert statement; default values are not supported. Components of structures and arrays should be enclosed in parentheses.
It is not possible to create or drop indices and tables while other applications are working with the database. Such operations change the database schema, and after such modifications the state of the other applications would become incorrect. The delete operation, however, does not change the database schema, so it can be performed as a normal transaction while the database is concurrently used by several applications.
If SUBSQL hangs trying to execute some statement, then some other application
holds the lock on the database, preventing SUBSQL from accessing it.
This is an example of a "navigation-only" application - no queries are used in this application at all. All navigation between records (objects) is done by means of references. Really, this application is more suitable for object oriented databases, but I include it in FastDB
testperf
program by the number of iterations.
System | Number of CPUs | Number of threads | Insertion*) | Hash table search | T-tree search | Sequential search | Sequential search with sorting |
---|---|---|---|---|---|---|---|
Pentium-II 300, 128 MB RAM, Windows NT | 1 | 1 | 0.056 | 0.015 | 0.041 | 1 400 | 25 000 |
Pentium-II 333, 512 MB RAM, Linux | 1 | 1 | 0.052 | 0.016 | 0.045 | 1 600 | 33 000 |
Pentium-Pro 200, 128 MB RAM, Windows NT | 2 | 1 | 0.071 | 0.023 | 0.052 | 1 600 | 35 000 |
Pentium-Pro 200, 128 MB RAM, Windows NT | 2 | 2 | 0.071 | 0.023 | 0.052 | 1 800 | 23 000 |
AlphaServer 2100, 250 MHz, 512 MB RAM, Digital Unix | 2 | 1 | 0.250 | 0.031 | 0.084 | 2 600 | 42 000 |
AlphaServer 2100, 250 MHz, 512 MB RAM, Digital Unix | 2 | 2 | 0.250 | 0.031 | 0.084 | 1 600 | 23 000 |
AlphaStation, 500 MHz, 256 MB RAM, Digital Unix | 2 | 1 | 0.128 | 0.010 | 0.039 | 1 300 | 36 000 |
*) doesn't include commit time
It would be nice if you could run this test on other platforms and send me the results. Note that for N = 1000000 you need at least 128 MB of memory; otherwise you will be testing the performance of your disk.
REGISTER macro (it should be done in some implementation module). If you are going to redefine the default FastDB error handler (for example, if you want to use a message window for reporting instead of stderr), you should define your own database class and derive it from dbDatabase. You should create an instance of the database class and make it accessible to all application modules.
Before you can do anything with your database, you must open it. By checking the dbDatabase::open() return code, you can find out whether the database was opened successfully. Errors during database opening do not terminate the application (but they are reported), even with the default error handler.
Once you are certain that the database has been opened normally, you can start to work with it. If your application is multithreaded and several threads will work with the same database, you should attach each thread to the database by the dbDatabase::attach() method. Before a thread terminates, it should detach itself from the database by invoking the dbDatabase::detach() method. If your application navigates through database objects by references, you need some kind of root object which can be located without any references. The best candidate for the root object is the first record of the table. FastDB guarantees that new records are always inserted at the end of the table, so the first table record is also the oldest record in the table.
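One way to make sure that every attach is paired with a detach is a small scope guard. The sketch below is an illustration only: `ThreadAttachment` is not part of FastDB, and `FakeDatabase` is a stand-in that merely provides attach() and detach() so the guard can be exercised without the library. In an application the template argument would be the database class.

```cpp
// Scope guard: attaches the current thread on construction and detaches it
// on destruction. `Database` is any type with attach() and detach() member
// functions; the guard itself is hypothetical and not part of FastDB.
template <class Database>
class ThreadAttachment {
  public:
    explicit ThreadAttachment(Database& db) : db_(db) { db_.attach(); }
    ~ThreadAttachment() { db_.detach(); }
    ThreadAttachment(const ThreadAttachment&) = delete;
    ThreadAttachment& operator=(const ThreadAttachment&) = delete;
  private:
    Database& db_;
};

// Stand-in used only to exercise the guard without the FastDB library.
struct FakeDatabase {
    int attached = 0;
    void attach() { ++attached; }
    void detach() { --attached; }
};
```

A thread function would create the guard as its first local variable, so that the detach call happens on every exit path of the function.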
To access database data, you should create a number of dbQuery and dbCursor objects. If several threads are working with the database, each thread should have its own instances of query and cursor objects. Usually it is enough to have one cursor for each table (or two if your application can also update table records), but nested queries may require several cursors. Query objects are usually created for each type of query. Query objects are also used for caching compiled queries, so it is a good idea to extend the life span of query variables (perhaps by making them static).
There are four main operations on the database: insert, select, update and remove. The first is done without using cursors, by means of the global overloaded template function insert. Selection, updating and deleting of records are performed using cursors. To be able to modify a table you should use a cursor for update. A cursor in FastDB is typed and contains an instance of an object of the table class. The overloaded operator-> of the cursor can be used to access components of the current record and also to update these components. The update method copies data from the cursor's object to the current table record. The cursor's remove method removes the current cursor record, the removeAllSelected method removes all selected records, and the removeAll method removes all records in the table.
Each transaction should be either committed by the dbDatabase::commit() method or aborted by the dbDatabase::rollback() method. A transaction is started automatically when the first select, insert or remove operation is executed. Before exiting from your application, do not forget to close the database. Also remember that the dbDatabase::close() method automatically commits the last transaction, so if this is not what you want, explicitly call dbDatabase::rollback() before exiting.
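Since closing the database commits whatever is pending, it can help to make the rollback automatic on early exits. The following sketch shows one possible pattern; `TransactionScope` is a hypothetical helper, not a FastDB class, and `RecordingDatabase` is a stand-in that only counts calls to commit() and rollback() so the example runs without the library.

```cpp
// Scope guard: rolls the transaction back on destruction unless commit()
// was called. `Database` is any type with commit() and rollback() member
// functions; the guard is hypothetical and not part of FastDB.
template <class Database>
class TransactionScope {
  public:
    explicit TransactionScope(Database& db) : db_(db), done_(false) {}
    ~TransactionScope() { if (!done_) db_.rollback(); }
    void commit() { db_.commit(); done_ = true; }
    TransactionScope(const TransactionScope&) = delete;
    TransactionScope& operator=(const TransactionScope&) = delete;
  private:
    Database& db_;
    bool done_;
};

// Stand-in used only to exercise the guard without the FastDB library.
struct RecordingDatabase {
    int commits = 0;
    int rollbacks = 0;
    void commit() { ++commits; }
    void rollback() { ++rollbacks; }
};
```

With such a guard, a function that returns early or throws leaves no uncommitted changes behind for a later close to commit.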
So a template for a FastDB application can look like this:

    //
    // Header file
    //
    #include "fastdb.h"

    extern dbDatabase db; // database object shared by all modules

    class MyTable {
        char const* someField;
        ...
      public:
        TYPE_DESCRIPTOR((someField));
    };

    //
    // Implementation
    //
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    dbDatabase db; // create database object
    REGISTER(MyTable);

    int main()
    {
        if (db.open("mydatabase")) {
            dbCursor<MyTable> cursor;
            dbQuery q;
            char value[bufSize];

            q = "someField=",value;
            // fgets replaces the unsafe gets; strip the trailing newline it keeps
            if (fgets(value, sizeof value, stdin) != NULL) {
                value[strcspn(value, "\r\n")] = '\0';
            }
            if (cursor.select(q) > 0) {
                do {
                    printf("%s\n", cursor->someField);
                } while (cursor.next());
            }
            db.close();
            return EXIT_SUCCESS;
        } else {
            return EXIT_FAILURE;
        }
    }

To compile a FastDB application you have to include the header file "fastdb.h". This header file includes other FastDB header files, so make sure that the FastDB directory is in the compiler's include path. To link a FastDB application, you need the FastDB library ("fastdb.lib" for Windows or "libfastdb.a" for Unix). You can either specify the full path to this library or place it in some default library directory (for example /usr/lib for Unix).
To build the FastDB library, just type make in the FastDB directory. There is no autoconfiguration utility included in the FastDB distribution. Most system-dependent parts of the code are handled by conditional compilation. There are two makefiles in the FastDB distribution: one for MS Windows with MS Visual C++ (makefile.mvc) and another for generic Unix with the gcc compiler (makefile). If you want to use Posix threads or some other compiler, you should edit this makefile. There is also a make.bat file, which just spawns the nmake -f makefile.mvc command.
The install target in the Unix makefile copies the FastDB header files, the FastDB library and the subsql utility to the directories specified by the INCSPATH, LIBSPATH and BINSPATH makefile variables, respectively. The default values of these variables are the following:

    INCSPATH=/usr/include
    LIBSPATH=/usr/lib
    BINSPATH=/usr/bin

Once your application starts to work, you will be busy with supporting and extending it. FastDB is able to perform automatic schema evolution for such cases as adding a new field to a table or changing the type of a field. The programmer can also add new indices or remove rarely used ones. The database trace can be switched on (by (re-)compiling the FastDB library with the -DDEBUG=DEBUG_TRACE option) to analyze database functionality and the efficiency of index usage.
The SUBSQL utility can be used for database browsing and inspection, performing online backups, and importing data to and exporting data from the database. FastDB performs automatic recovery after a system or application crash, so you need not worry about it. The only thing you may have to do manually is stop all database applications if one of them crashes and leaves the database blocked.
I will provide e-mail support and help you with development of FastDB applications.