Ways to Improve Performance
Your databases are growing bigger and bigger, and it is extremely important to
consider data access performance throughout the development cycle. The
performance of an application is typically dominated by data access.
Performance optimization may involve many aspects, such as server hardware and
software, database design and size, network tuning, optimization of distributed
queries, and client application coding. In many cases, performance can be
improved simply by coding your data access components correctly. This short
article focuses mainly on tuning client-side code with the OleDBPro module.
OleDBPro offers various ways to boost your application's performance, some of
them unique, and all of them can be implemented simply without much coding.
-
Avoid network
traffic as much as possible.
Network roundtrips and data packing between a client and a server are
usually the number one cause of poor data access performance, and must be
reduced as much as possible, especially when a large set of data is
involved. OleDBPro has two core classes, CRBase and CBatchParam<T>,
which by default use batch mode to update and retrieve data to and from
an OLE DB data source. All of their derived classes inherit the same
mechanism.
It is critical to construct a SQL query statement correctly to avoid
fetching needless data. One of the capabilities of the SQL language is
its ability to filter data at the server side so that only the data
required is returned to the client. Using this facility minimizes
expensive network traffic between a server and a client, which implies
that both the SELECT fields and the WHERE clause must be restrictive
enough to retrieve only the data required. Reducing the size of a rowset
improves data access speed, remote usability, and multi-user scalability.
If you only update (SQL INSERT, UPDATE and DELETE) a lot of data in a
data source, it is highly desirable to use CBatchParam<T> instead of
CRBase-derived classes, and to send multiple sets (20, 40, or more) of
data to the server in a single call to CBatchParam<T>::DoBatch
through a parameterized SQL statement or a stored procedure. This avoids
retrieving data from the server to the client and reduces data packing
and movement over the expensive network,
as shown in the example MultiProcs.
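The idea behind sending multiple parameter sets per call can be sketched in plain C++. This is a toy model, not the actual OleDBPro API: `OrderParams` is a hypothetical parameter set, and each inner vector returned by `MakeBatches` stands for the payload of one batched call, i.e. one network roundtrip.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parameter set for something like
// "INSERT INTO Orders VALUES (?, ?)".
struct OrderParams {
    int id;
    std::string customer;
};

// Pack parameter sets into groups of `perBatch`; each inner vector
// represents the payload of one batched call (one roundtrip), instead
// of one roundtrip per row.
std::vector<std::vector<OrderParams>>
MakeBatches(const std::vector<OrderParams>& rows, std::size_t perBatch) {
    std::vector<std::vector<OrderParams>> batches;
    for (std::size_t i = 0; i < rows.size(); i += perBatch) {
        std::size_t end = std::min(rows.size(), i + perBatch);
        batches.emplace_back(rows.begin() + i, rows.begin() + end);
    }
    return batches;
}
```

With 45 rows and 20 sets per batch, only 3 roundtrips are needed instead of 45.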
The OleDBPro module has a powerful template class,
CMultiBulkRecord<T>, which can handle complicated statement
batches like "SELECT * FROM Orders;INSERT INTO Employees
VALUES(.....);EXECUTE GetOrderInfo(?, ?, ?,....)". A statement batch is
a way of sending multiple statements from a client to a server at one
time, thereby reducing the number of network roundtrips to the server. If
the statement batch contains multiple SELECT statements, the server
returns multiple rowsets to the client in a single data stream.
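Building such a batch amounts to joining individual statements with semicolons so they travel in a single roundtrip. A minimal sketch (generic code, not the CMultiBulkRecord<T> API):

```cpp
#include <string>
#include <vector>

// Combine individual SQL statements into one batch string so they are
// sent to the server in a single roundtrip instead of one per statement.
std::string MakeStatementBatch(const std::vector<std::string>& statements) {
    std::string batch;
    for (const auto& s : statements) {
        if (!batch.empty()) batch += ';';
        batch += s;
    }
    return batch;
}
```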
As shown in the example Scroll,
this OleDBPro module supports the use of bookmarks, keys and indexes to
pinpoint records; see CRBase and
CRBaseEx.
Additionally, you can jump from one record to another by setting nSkipped
in CRBase::MoveNext(LONG
nSkipped=0). All of these methods are designed
to reduce data traffic over the network.
As shown and discussed in the
examples FilterSort and DataShape,
using MS data access services can eliminate avoidable data movement
over the network in many cases.
If possible, it is highly recommended to move a statement batch with
multiple executions into a stored procedure and let the server run it,
reducing data movement over the network as much as possible.
CRBase uses batch mode to fetch records from a server to a client. By
default, the batch size is 20. However, if a record has only a few
fields and its size is small, it may be worthwhile to increase the
batch size before opening a rowset to boost performance.
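The effect of the batch size on roundtrips is simple arithmetic, sketched below (generic code, not the CRBase API): fetching N records in batches of B costs ceil(N / B) roundtrips, so raising the batch size for small records divides the roundtrip count accordingly.

```cpp
#include <cstddef>

// Roundtrips needed to fetch `totalRows` records `batchSize` at a time.
std::size_t FetchRoundTrips(std::size_t totalRows, std::size_t batchSize) {
    return (totalRows + batchSize - 1) / batchSize;  // ceiling division
}
```

Fetching 1000 small records takes 50 roundtrips at the default batch size of 20, but only 10 at a batch size of 100.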
-
Don't use
more rowset properties than necessary.
In OLE DB, rowset properties determine what cursor is used to manage a
resultant rowset. Cursors are a useful and flexible tool in a database
management system, but they are expensive for a server to manage. The
more functionality a cursor has, the more it costs.
-
Use
transactions often but correctly.
A primary goal of using transactions
(COSession::BeginTrans,
COSession::Commit and
COSession::Rollback) is to reduce
the amount of data transferred and data packing between server and
client. Long-running transactions can be fine for a single user, but they
scale poorly to multiple users, may block other users accessing the
same resources, and may even cause deadlocks. Therefore, an application
should avoid long-running transactions in a multi-user environment.
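One way to keep transactions short and correctly ended is a scope guard: the transaction is rolled back automatically unless it is explicitly committed. The sketch below is a generic RAII pattern, not the actual COSession API; the commit/rollback callables stand in for COSession::Commit and COSession::Rollback.

```cpp
#include <functional>
#include <utility>

// Minimal RAII transaction guard (a pattern sketch, not COSession):
// unless Commit() is called, the destructor rolls the transaction
// back, so an early return or exception cannot leave it open and
// blocking other users.
class TransactionGuard {
public:
    TransactionGuard(std::function<void()> commit,
                     std::function<void()> rollback)
        : commit_(std::move(commit)), rollback_(std::move(rollback)) {}
    ~TransactionGuard() { if (!done_) rollback_(); }
    void Commit() { commit_(); done_ = true; }
private:
    std::function<void()> commit_, rollback_;
    bool done_ = false;
};
```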
-
Use prepared
parameterized statements or procedures.
Both the CBatchParam<T> and
CBulkRecordParam<T>
classes fully support prepared parameterized statements and procedures with
any number (1, 2, 3, .......) and type (INPUT, OUTPUT and INPUT/OUTPUT)
of parameters. Using prepared parameterized statements or procedures
avoids repeated parsing of a SQL statement on the server side.
Furthermore, the two classes reuse one OLE DB command object instead of
repeatedly creating it on the client side.
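The benefit can be modeled by counting parses. The mock below is not the CBatchParam<T> API; it is a toy statement object that parses its SQL text once when prepared and then reuses the plan for every execution, versus re-parsing on each unprepared execution.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Toy model of prepare-once / execute-many (not the OleDBPro API).
// ParseCount() shows how many times the server would parse the SQL.
class PreparedStatement {
public:
    explicit PreparedStatement(std::string sql) : sql_(std::move(sql)) {}
    void Prepare() {
        if (!prepared_) { ++parseCount_; prepared_ = true; }
    }
    void Execute() {
        if (!prepared_) ++parseCount_;  // unprepared: parse every time
        // prepared: reuse the existing plan, no parse
    }
    std::size_t ParseCount() const { return parseCount_; }
private:
    std::string sql_;
    bool prepared_ = false;
    std::size_t parseCount_ = 0;
};
```

Preparing once before 100 executions costs one parse; executing unprepared costs one parse per call.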
-
Consider using
provider-specific interfaces.
OLE DB is extremely flexible and extensible. You may
easily use provider-specific interfaces to send and fetch data to and
from a data source. For example, you can use the interface
IRowsetFastLoad of the SQL Server provider to load records into a table
at the fastest speed (BCP, bulk copy). In comparison with ODBC, it is
really simple to use provider-specific interfaces.
-
Select an OLEDB
provider correctly.
Typically, it is highly recommended to use a native
OLE DB provider. Today there may be several OLE DB providers available
for a given DBMS. Some of them may be faster at retrieving records, and
others faster at updating records. You may need to compare them and
select the one that fits your specific purposes.
-
Use
Just-In-Need to retrieve data from a server.
As shown in the example FastAccess,
this OleDBPro module has a unique feature at the time of this writing:
deferred columns. If you have a big rowset with a large number
of fields but don't always need to access all of them for each record
(accessing them only under particular conditions), you can use
CRBase::SetDBPart to discard the fields that are not often
accessed at run time. If you do need to access those discarded fields
under particular conditions, you can use CRBase::GetDataEx or
CRBase::SetDataEx to retrieve or update them. This feature works like
just-in-need retrieval, and obviously reduces network traffic.
The improvement mainly depends on whether the property DBPROP_DEFERRED
is set to true and what percentage of data fetching can be avoided. For
details and reasons, refer to the short article, Deferred
Columns and Performance. At the time of this
writing, the MS Access provider sets DBPROP_DEFERRED to true by default.
Even if an OLE DB provider does not set this property to true, it is still
safe and recommended to use this feature, because the provider may add
it in the future; if so, your current code will help improve your
application's performance then.
Additionally, this unique feature reduces copying data from an OLE DB
provider to its consumer, which increases speed somewhat as well. When we
tested this idea with the MS Access provider, the result amazed us!
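The deferred-column idea itself is just lazy loading, which the sketch below models in plain C++ (a toy of the concept behind DBPROP_DEFERRED, not the OleDBPro API): the column value is fetched from the provider only on first access, so a column that is never read costs no transfer or copy.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Toy deferred column: `fetch` stands in for the provider call that
// would actually move the data over the network. It runs at most once.
class DeferredColumn {
public:
    explicit DeferredColumn(std::function<std::string()> fetch)
        : fetch_(std::move(fetch)) {}
    const std::string& Value() {
        if (!cache_) cache_ = fetch_();  // fetch on first access only
        return *cache_;
    }
    bool Fetched() const { return cache_.has_value(); }
private:
    std::function<std::string()> fetch_;
    std::optional<std::string> cache_;
};
```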
-
Avoid
conversions between data types when possible.
By default, OleDBPro uses the data types of the columns of a
raw rowset, which are determined by the OLE DB provider and the table
column definitions. This is fast. When possible, use the default data
types.
-
Configure the
underlying DBBINDING structures correctly.
By default, CRBase and its derived classes retrieve
data values, statuses and lengths for variable-length data types from a
provider. You can configure the underlying DBBINDING structures at run
time to bind data values only. This can reduce the work of copying data
from a provider into its client consumer and slightly increase data
access speed.
-
Reduce
traversal of the array of DBBINDING structures.
Getting data values, statuses or lengths requires traversing an array
of DBBINDING structures inside CRBase and its derived classes. This may
consume a little time, especially for a big rowset. You can obtain an
array of pointers to the internal buffers described by the DBBINDING
structures by calling CRBase::GetData,
CRBase::GetStatusPtr and
CRBase::GetLengthPtr, eliminating
repeated traversal of the DBBINDING structures. This may slightly
increase performance.
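The pointer-caching trick can be illustrated with a simplified binding model (a toy of the idea, not the real DBBINDING layout or the OleDBPro API): instead of scanning the binding array for a column's buffer offset on every access, resolve each column's pointer once and index directly afterwards.

```cpp
#include <cstddef>
#include <vector>

// Simplified binding: each column's value lives at `offset` bytes into
// a row buffer (a toy model, not the real OLE DB DBBINDING structure).
struct Binding {
    std::size_t ordinal;
    std::size_t offset;
};

// Slow path: scan the binding array for the column on every access.
char* FindValue(const std::vector<Binding>& bindings, char* row,
                std::size_t ordinal) {
    for (const auto& b : bindings)
        if (b.ordinal == ordinal) return row + b.offset;
    return nullptr;
}

// Fast path: resolve each column's buffer pointer once up front, then
// index the cached array directly on every subsequent access.
std::vector<char*> CacheValuePtrs(const std::vector<Binding>& bindings,
                                  char* row) {
    std::vector<char*> ptrs;
    for (const auto& b : bindings) ptrs.push_back(row + b.offset);
    return ptrs;
}
```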