This chapter introduces the basic concepts and terminology of Oracle's distributed database architecture. The chapter includes:
1.1 Distributed Database Architecture
1.2 Database Links
1.1 Distributed Database Architecture
A distributed database system
allows applications to access data from local and remote databases. In a homogenous
distributed system, each database is an Oracle database. In a heterogeneous
distributed system, at least one of the databases is a non-Oracle
database. Distributed databases use a client/server architecture to
process information requests.
This section contains the following topics:
1.1.1 Homogenous Distributed Database Systems
1.1.2 Heterogeneous Distributed Database Systems
1.1.3 Client/Server Database Architecture
1.1.4 Data Fragmentation
1.1.1 Homogenous Distributed Database Systems
A homogenous distributed database system is a network of two or more Oracle databases that reside on one or more machines. Figure 1.1 illustrates a distributed system that connects three databases: ORACLE1, ORACLE2, and ORACLE0.
Figure 1.1 Homogeneous Distributed Database
An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on local database ORACLE2 can retrieve joined data from the PRODUCTS table on the local database and the DEPT table on the remote ORACLE1 database.
For a client application, the
location and platform of the databases are transparent. One can also create synonyms
for remote objects in the distributed system so that users can access them with
the same syntax as local objects. For example, if one is connected to database
ORACLE2 yet wants to access data on database ORACLE1, creating a synonym on
ORACLE2 for the remote DEPT table allows one to issue this query:
SELECT * FROM dept;
In this way, a distributed system
gives the appearance of native data access. Users on ORACLE2 do not have to
know that the data they access resides on remote databases.
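As a rough sketch (the link name, schema, and net service name below are assumptions, not taken from the figure), the synonym on ORACLE2 might be set up as follows:
CREATE DATABASE LINK oracle1 USING 'oracle1';
CREATE SYNONYM dept FOR scott.dept@oracle1;
After this, the query SELECT * FROM dept; issued on ORACLE2 transparently reads the remote DEPT table.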
An Oracle distributed database
system can incorporate Oracle databases of different versions. All supported
releases of Oracle can participate in a distributed database system.
Nevertheless, the applications that work with the distributed database must
understand the functionality that is available at each node in the system--for
example, a distributed database application cannot expect an Oracle7 database
to understand the object SQL extensions that are only available with Oracle8i.
The terms distributed database and distributed processing are closely related, yet have distinct meanings.
Distributed database | A set of databases in a distributed system that can appear to applications as a single data source. For example, the network of machines shown above is such an instance.
Distributed processing | The operations that occur when an application distributes its tasks among different computers in a network. For example, a database application typically distributes front-end presentation tasks to client computers and allows a back-end database server to manage shared access to a database. Consequently, a distributed database application processing system is more commonly referred to as a client/server database application system.
Oracle distributed database
systems employ a distributed processing architecture. For example, an Oracle
database server acts as a client when it requests data that another Oracle
database server manages.
The terms distributed
database system and database replication are related, yet
distinct. In a pure (that is, non-replicated) distributed database,
the system manages a single copy of all data and supporting database objects.
Typically, distributed database applications use distributed transactions to
access both local and remote data and modify the global database in real-time.
The term replication
refers to the operation of copying and maintaining database objects in multiple
databases belonging to a distributed system. While replication relies on
distributed database technology, database replication offers applications
benefits that are not possible within a pure distributed database environment.
Most commonly, replication is
used to improve local database performance and protect the availability of
applications because alternate data access options exist.
For example, an application may normally access a local database rather
than a remote server to minimize network traffic and achieve maximum
performance. Furthermore, the application can continue to function if the local server fails, because other servers with replicated data remain accessible.
1.1.2 Heterogeneous Distributed Database Systems
In a heterogeneous
distributed database system, at least one of the databases is a non-Oracle
system. To the application, the heterogeneous distributed database system
appears as a single, local, Oracle database; the local Oracle database server
hides the distribution and heterogeneity of the data. The Oracle database
server accesses the non-Oracle system using Oracle8i Heterogeneous
Services in conjunction with an agent. If one accesses the non-Oracle
data store using an Oracle Transparent Gateway, then the agent is a
system-specific application.
For example, if one includes a Sybase database in an Oracle
distributed system, then one needs to obtain a Sybase-specific transparent
gateway so that the Oracle databases in the system can communicate with it.
Alternatively, generic
connectivity can be used to access non-Oracle data stores so long as the
non-Oracle system supports the ODBC or OLE DB protocols. If one uses the
generic agent included with the Oracle database server, then one does not need
to purchase a separate transparent gateway.
1.1.2.1 Generic Connectivity
Generic connectivity allows one
to connect to non-Oracle8i data stores by using either a Heterogeneous
Services ODBC agent or a Heterogeneous Services OLE DB agent--both are included
with the Oracle8i product as a standard feature. Any data source
compatible with the ODBC or OLE DB standards can be accessed using a generic
connectivity agent.
Generic connectivity is implemented as one of two types of Heterogeneous Services agents: a Heterogeneous Services ODBC agent or a Heterogeneous Services OLE DB agent.
Each user session receives its own dedicated agent process, spawned by the first use in that user session of the database link to the non-Oracle system. The agent process ends when the user session ends.
The advantage of generic connectivity is that it does not require one to purchase and configure a separate system-specific agent. One simply needs an ODBC or OLE DB driver that can interface with the agent.
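As a rough sketch only (the agent SID, ODBC data source name, host, and link name are assumptions), setting up a generic connectivity ODBC agent typically involves an agent initialization file, a net service entry marked as heterogeneous, and an ordinary database link:
# hs/admin/inithsodbc.ora -- agent parameters; the DSN is an assumption
HS_FDS_CONNECT_INFO = my_odbc_dsn
# tnsnames.ora -- HS=OK marks the connection as heterogeneous
hsodbc = (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))
          (CONNECT_DATA=(SID=hsodbc))(HS=OK))
-- database link through which the non-Oracle tables are queried
CREATE DATABASE LINK odbc_store USING 'hsodbc';
SELECT * FROM some_table@odbc_store;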
1.1.2.2 Heterogeneous Services Features
The features of the Heterogeneous
Services include the following:
Feature | Purpose
Distributed transactions | Allows a transaction to span both Oracle and non-Oracle systems, while still guaranteeing transaction consistency.
SQL translations | Integrates data from non-Oracle systems into the Oracle environment as if the data were stored in one local database. SQL statements are transparently transformed into SQL statements understood by the non-Oracle system.
Data dictionary translations | Makes a non-Oracle system appear as an Oracle database server. SQL statements containing references to Oracle's data dictionary tables are transformed into SQL statements containing references to a non-Oracle system's data dictionary tables.
Pass-through SQL | Gives application programmers direct access to a non-Oracle system from an Oracle application using the non-Oracle system's SQL dialect.
Stored procedure access | Allows access to stored procedures in SQL-based non-Oracle systems as if they were PL/SQL remote procedures.
NLS support | Supports multi-byte character sets, and translates character sets between a non-Oracle system and the Oracle8i server.
Multi-threaded agents | Takes advantage of the operating system's threading capabilities by reducing the number of required processes.
Agent self-registration | Automates the updating of Heterogeneous Services configuration data on remote hosts, ensuring correct operation over heterogeneous database links.
Generic connectivity | Allows Oracle8i to connect to a non-Oracle data store using the ODBC or OLE DB protocol.
1.1.3 Client/Server Database Architecture
A database server is the Oracle
software managing a database, and a client is an application that requests
information from a server. Each computer in a network is a node that can host
one or more databases. Each node in a distributed database system can act as a
client, a server, or both, depending on the situation.
In Figure 1.2, the
host for the ORACLE1 database is acting as a database server when a statement
is issued against its local data (for example, the second statement in
each transaction issues a statement against the local DEPT table), but is
acting as a client when it issues a statement against remote data (for example,
the first statement in each transaction is issued against the remote table EMP
in the ORACLE0 database).
A client can connect directly
or indirectly to a database server. A direct connection occurs when a
client connects to a server and accesses information from a database contained on
that server.
For example, if one connects to the ORACLE1 database and accesses the DEPT table on this database as in Figure 1.2, one can issue the following:
SELECT * FROM dept;
This query is direct because one
is not accessing an object on a remote database.
In contrast, an indirect
connection occurs when a client connects to a server and then accesses
information contained in a database on a different server.
For example, if one connects to the ORACLE1 database but accesses the EMP table on the remote ORACLE0 database as in Figure 1.2, one can issue the following:
SELECT * FROM emp@oracle0;
This query is indirect because
the object one is accessing is not on the database to which one is directly
connected.
1.1.4 Data Fragmentation
If relation r is fragmented, r is divided into a number of fragments r1, r2, r3, ..., rn. These fragments contain sufficient information to allow reconstruction of the original relation r. This reconstruction can take place through the application of either the union operation or a special type of join operation on the various fragments. There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation. Horizontal fragmentation splits the relation by assigning each tuple of r to one or more fragments. Vertical fragmentation splits the relation by decomposing the schema R of relation r in a special way. These two schemes can be applied successively to the same relation, resulting in a number of fragments. We take as an example the relation account with schema Account-schema = (branch, account, balance).
1.1.4.1 Horizontal Fragmentation
The relation r is partitioned into a number of subsets r1, r2, ..., rn. Each tuple of the relation must belong to at least one of the fragments, so that the original relation can be reconstructed if needed.
A fragment ri can be defined as a selection on the global relation r; that is, a predicate Pi is used to construct fragment ri as follows: ri = σPi(r)
Branch | Account | Balance
Hillside | a-305 | 500
Hillside | a-226 | 336
Valleyview | a-177 | 205
Valleyview | a-402 | 10000
Hillside | a-155 | 62
Valleyview | a-408 | 1123
Valleyview | a-639 | 750
Sample account
Hillside | a-305 | 500
Hillside | a-226 | 336
Hillside | a-155 | 62
Account1
Valleyview | a-177 | 205
Valleyview | a-402 | 10000
Valleyview | a-408 | 1123
Valleyview | a-639 | 750
Account2
We obtain the reconstruction of the relation r by taking the union of all the fragments, i.e., r = r1 ∪ r2 ∪ ... ∪ rn
As an illustration, suppose that the relation r is the account relation. This relation can be divided into n different fragments, each of which consists of the tuples of accounts belonging to a particular branch. If the banking system has only two branches, Hillside and Valleyview, then there are two different fragments:
Account1 = σbranch="Hillside"(account)
Account2 = σbranch="Valleyview"(account)
The fragments in this example are disjoint.
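A minimal SQL sketch of this horizontal fragmentation, assuming an account table with the columns shown above (the fragment table names are illustrative):
CREATE TABLE account1 AS SELECT * FROM account WHERE branch = 'Hillside';
CREATE TABLE account2 AS SELECT * FROM account WHERE branch = 'Valleyview';
-- reconstruct the original relation as the union of the (disjoint) fragments
SELECT * FROM account1 UNION ALL SELECT * FROM account2;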
1.1.4.2 Vertical Fragmentation
Vertical fragmentation is the same as decomposition. Vertical fragmentation of r(R) involves the definition of several subsets of attributes R1, R2, ..., Rn of the schema R such that R = R1 ∪ R2 ∪ ... ∪ Rn.
Each fragment ri of r is defined by ri = ΠRi(r). The fragmentation should be done such that we can reconstruct relation r from the fragments by taking the natural join:
r = r1 ⋈ r2 ⋈ r3 ... ⋈ rn
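A minimal SQL sketch of a vertical fragmentation of the account relation (the column split is illustrative), in which each fragment carries the key column account so that the natural join can reconstruct r:
CREATE TABLE account_r1 AS SELECT account, branch FROM account;
CREATE TABLE account_r2 AS SELECT account, balance FROM account;
-- reconstruct the original relation by joining the fragments on the shared key
SELECT r1.branch, r1.account, r2.balance
FROM account_r1 r1, account_r2 r2
WHERE r1.account = r2.account;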
1.2 Database Links
The central concept in distributed database systems is a database link. A database link is a connection between two physical database servers that allows a client to access them as one logical database.
This section contains the following topics:
1.2.1 What Are Database Links?
1.2.2 What Are Shared Database Links?
1.2.3 Why Use Database Links?
1.2.4 Global Database Names in Database Links
1.2.5 Types of Database Links
1.2.6 Users of Database Links
1.2.7 Creation of Database Links: Examples
1.2.8 Schema Objects and Database Links
1.2.9 Database Link Restrictions
1.2.1 What Are Database Links?
A database link is a pointer that defines a one-way communication path from an Oracle database server to another database server. The link pointer is actually defined as an entry in a data dictionary table. To access the link, one must be connected to the local database that contains the data dictionary entry.
A database link connection is
one-way in the sense that a client connected to local database A can use a link
stored in database A to access information in remote database B, but users
connected to database B cannot use the same link to access data in database A.
If local users on database B want to access data on database A, then they must
define a link that is stored in the data dictionary of database B.
A database link connection allows
local users to access data on a remote database. For this connection to occur,
each database in the distributed system must have a unique global database
name in the network domain. The global database name uniquely identifies a
database server in a distributed system.
Figure 1.3 shows an
example of user SCOTT accessing the EMP table on the remote database with the
global name ORACLE1.VRCE.ernet.in:
Database links are either private
or public. If they are private, then only the user who created the
link has access; if they are public, then all database users have access.
One principal difference among database links is the way that connections to a remote database occur. Users can access a remote database through a connected user link, a fixed user link, or a current user link, as described in 1.2.6 Users of Database Links.
One can create database links
using the CREATE DATABASE LINK statement. After a link is created, one can use
it to specify schema objects in SQL statements.
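For example, a minimal fixed user link and a query through it might look like this (the link name, credentials, and net service name are assumptions):
CREATE DATABASE LINK hq CONNECT TO scott IDENTIFIED BY tiger USING 'oracle0';
SELECT * FROM emp@hq;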
1.2.2 What Are Shared Database Links?
A shared database link is a link between a local server process and the remote database. The link is shared because multiple client processes can use the same link simultaneously.
When a local database is
connected to a remote database through a database link, either database can run
in dedicated or multi-threaded server (MTS) mode. The following table
illustrates the possibilities:
Local Database Mode | Remote Database Mode
Dedicated | Dedicated
Dedicated | Multi-threaded
Multi-threaded | Dedicated
Multi-threaded | Multi-threaded
A shared database link can exist
in any of these four configurations. Shared links differ from standard database
links in the following ways:
1.2.3 Why Use Database Links?
The great advantage of database
links is that they allow users to access another user's objects in a remote
database so that they are bounded by the privilege set of the object's owner.
In other words, a local user can access a link to a remote database without
having to be a user on the remote database.
For example, assume that
employees submit expense reports to Accounts Payable (A/P), and further suppose
that a user using an A/P application needs to retrieve information about
employees from the HR database. The A/P users should be able to connect to the
HR database and execute a stored procedure in the remote HR database that
retrieves the desired information. The A/P users should not need to be HR
database users to do their jobs; they should only be able to access HR
information in a controlled way as limited by the procedure.
Database links allow one to grant
limited access on remote databases to local users. By using current user links,
one can create centrally managed global users whose password information is hidden
from both administrators and non-administrators. For example, A/P
users can access the HR database as SCOTT, but unlike fixed user links, SCOTT's
credentials are not stored where database users can see them.
By using fixed user links, one
can create non-global users whose password information is stored in unencrypted
form in the LINK$ data dictionary table. Fixed user links are easy to create
and require low overhead because there are no SSL or directory requirements,
but a security risk results from the storage of password information in the
data dictionary.
1.2.4 Global Database Names in Database Links
To understand how a database link
works, one must first understand what a global database name is. Each database
in a distributed database is uniquely identified by its global database name.
Oracle forms a database's global database name by prefixing the database's
network domain, specified by the DB_DOMAIN initialization parameter at database
creation, with the individual database name, specified by the DB_NAME
initialization parameter.
For example, Figure 1.4
illustrates a representative hierarchical arrangement of databases throughout a
network.
The name of a database is formed by starting at the leaf of the tree and following a path to the root. For example, the MFTG database is in DIVISION3 of the VRCE_TOOLS branch of the ernet.in domain. The global database name for MFTG is created by concatenating the nodes in the tree as follows:
MFTG.DIVISION3.VRCE_TOOLS.ernet.in
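A sketch of the initialization parameters from which Oracle would form this global name (parameter values assumed to match the example):
DB_NAME = mftg
DB_DOMAIN = division3.vrce_tools.ernet.in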
While several databases can share
an individual name, each database must have a unique global database name. For
example, the network domains US.ETRLS.VRCEadmin.ernet.in and
UK.ELNX.VRCEadmin.ernet.in each contain an ORACLE0 database. The global database
naming system distinguishes the ORACLE0 database in the ETRLS division from the
ORACLE0 database in the ELNX division as follows:
ORACLE0.US.ETRLS.VRCEadmin.ernet.in
ORACLE0.UK.ELNX.VRCEadmin.ernet.in
1.2.5 Types of Database Links
Oracle allows one to create private,
public, and global database links. These basic link types
differ according to which users are allowed access to the remote database:
Type | Owner | Description
Private | User who created the link. Access ownership data through DBA_DB_LINKS or ALL_DB_LINKS. | Creates a link in a specific schema of the local database. Only the owner of a private database link or PL/SQL subprograms in the schema can use this link to access database objects in the corresponding remote database.
Public | User called PUBLIC. Access ownership data through DBA_DB_LINKS or ALL_DB_LINKS. | Creates a database-wide link. All users and PL/SQL subprograms in the database can use the link to access database objects in the corresponding remote database.
Global | User called PUBLIC. Access ownership data through DBA_DB_LINKS or ALL_DB_LINKS. | Creates a network-wide link. When an Oracle network uses Oracle Names, the names servers in the system automatically create and manage global database links for every Oracle database in the network. Users and PL/SQL subprograms in any database can use a global link to access objects in the corresponding remote database.
Determining the type of database
links to employ in a distributed database depends on the specific requirements
of the applications using the system. Consider these advantages and
disadvantages:
Private Database Link | This link is more secure than a public or global link, because only the owner of the private link, or subprograms within the same schema, can use the link to access the remote database.
Public Database Link | When many users require an access path to a remote Oracle database, one can create a single public database link for all users in a database.
Global Database Link | When an Oracle network uses Oracle Names, an administrator can conveniently manage global database links for all databases in the system. Database link management is centralized and simple.
1.2.6 Users of Database Links
When creating the link, one
determines which user should connect to the remote database to access the data.
The following table explains the differences among the categories of users
involved in database links:
User Type | Meaning | Sample Link Creation Syntax
Connected user | A local user accessing a database link in which no fixed username and password have been specified. If SYSTEM accesses a public link in a query, then the connected user is SYSTEM, and Oracle connects to the SYSTEM schema in the remote database. Note: A connected user does not have to be the user who created the link, but is any user who is accessing the link. | CREATE PUBLIC DATABASE LINK oracle1 USING 'oracle1';
Current user | A global user in a CURRENT_USER database link. The global user must be authenticated by an X.509 certificate and a private key and be a user on both databases involved in the link. Current user links are an aspect of the Oracle Advanced Security option. | CREATE PUBLIC DATABASE LINK oracle1 CONNECT TO CURRENT_USER USING 'oracle1';
Fixed user | A user whose username/password is part of the link definition. If a link includes a fixed user, then the fixed user's username and password are used to connect to the remote database. | CREATE PUBLIC DATABASE LINK oracle1 CONNECT TO Jhoro IDENTIFIED BY doe USING 'oracle1';
1.2.6.1 Connected User Database Links
Connected user links have no
connect string associated with them. The advantage of a connected user link is
that a user referencing the link connects to the remote database as the same
user. Furthermore, because no connect string is associated with the link, no
password is stored in clear text in the data dictionary.
Connected user links have some
disadvantages. Because these links require users to have accounts and
privileges on the remote databases to which they are attempting to connect,
they require more privilege administration for administrators. Also, giving
users more privileges than they need violates the fundamental security concept
of least privilege: users should only be given the privileges they need to
perform their jobs.
The ability to use a connected user database link depends on several factors, chief among them whether the user is
authenticated by Oracle using a password, or externally authenticated by the
operating system or a network authentication service. If the user is externally
authenticated, then the ability to use a connected user link also depends on
whether the remote database accepts remote authentication of users, which is
set by the REMOTE_OS_AUTHENT initialization parameter.
The REMOTE_OS_AUTHENT parameter
operates as follows:
If REMOTE_OS_AUTHENT is... | Then...
TRUE for the remote database | An externally-authenticated user can connect to the remote database using a connected user database link.
FALSE for the remote database | An externally-authenticated user cannot connect to the remote database using a connected user database link unless a secure protocol or a network authentication service supported by the Oracle Advanced Security option is used.
1.2.6.2 Fixed User Database Links
A benefit of a fixed user link is that it connects a user in a primary database to a remote database with the security context of the user specified in the connect string. For example, local user JOE can
create a public database link in JOE's schema that specifies the fixed user
SCOTT with password TIGER. If JHORO uses the fixed user link in a query, then
JHORO is the user on the local database, but he connects to the remote database
as SCOTT/TIGER.
Fixed user links have a username
and password associated with the connect string. The username and password are
stored in unencrypted form in the data dictionary in the LINK$ table. This fact
creates a possible security weakness of fixed user database links: a user with
the SELECT ANY TABLE privilege has access to the data dictionary so long as the
O7_DICTIONARY_ACCESSIBILITY initialization parameter is set to TRUE, and thus
the authentication associated with a fixed user is compromised.
For an example of this security
problem, assume that JHORO does not have privileges to use a private link that
connects to the HR database as SCOTT/TIGER, but has SELECT ANY TABLE privilege
on a database in which the O7_DICTIONARY_ACCESSIBILITY initialization parameter
is set to TRUE. He can select from LINK$ and read that the connect string to HR
is SCOTT/TIGER. If JHORO has an account on the host on which HR resides, then
he can connect to the host and then connect to HR as SCOTT using the password
TIGER. He will have all SCOTT's privileges if he connects locally and any audit
records will be recorded as if he were SCOTT.
Current user database links make
use of a global user. A global user must be authenticated by an X.509
certificate and a private key and be a user on both databases involved in the
link.
The user invoking the
CURRENT_USER link does not have to be a global user. For example, if JHORO is
authenticated by password to the Accounts Payable database, he can access a
stored procedure to retrieve data from the HR database. The procedure uses a
current user database link, which connects him to HR as global user SCOTT.
SCOTT is a global user and thereby authenticated through a certificate and
private key over SSL, but JHORO is not.
Note that current user database
links have these consequences:
1.2.7 Creation of Database Links: Examples
Create database links using the CREATE
DATABASE LINK statement. The table gives examples of SQL statements that create
database links in a local database to the remote ORACLE0.US.ETRLS.VRCEadmin.ernet.in database:
SQL Statement | Connects To Database | Connects As | Link Type
| ORACLE0 using net service name ORACLE0_US | Connected user | Private connected user
| ORACLE0 using service name AM_SLS | Current global user | Private current user
| ORACLE0 using net service name ORACLE0_US | SCOTT using password TIGER | Private fixed user
| ORACLE0 using net service name REV | SCOTT using password TIGER | Public fixed user
| ORACLE0 using net service name ORACLE0 | SCOTT using password TIGER, authenticated as KAUSTAV using password KAUSTAV | Shared public fixed user
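The following statements are sketches of link definitions matching each row of the table (the link names are assumptions):
CREATE DATABASE LINK oracle0_cu USING 'oracle0_us';
CREATE DATABASE LINK oracle0_curr CONNECT TO CURRENT_USER USING 'am_sls';
CREATE DATABASE LINK oracle0_fixed CONNECT TO scott IDENTIFIED BY tiger USING 'oracle0_us';
CREATE PUBLIC DATABASE LINK oracle0_pub CONNECT TO scott IDENTIFIED BY tiger USING 'rev';
CREATE SHARED PUBLIC DATABASE LINK oracle0_shared CONNECT TO scott IDENTIFIED BY tiger
AUTHENTICATED BY kaustav IDENTIFIED BY kaustav USING 'oracle0';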
1.2.8 Schema Objects and Database Links
After one has created a database
link, one can execute SQL statements that access objects on the remote
database. For example, to access remote object EMP using database link FOO, one
can issue:
SELECT * FROM emp@foo;
Constructing properly formed
object names using database links is an essential aspect of data manipulation
in distributed systems.
Oracle uses the global database
name to name the schema objects globally using the following scheme:
schema.schema_object@global_database_name
where:
schema | is a collection of logical structures of data, or schema objects. A schema is owned by a database user and has the same name as that user. Each user owns a single schema.
schema_object | is a logical data structure like a table, index, view, synonym, procedure, package, or a database link.
global_database_name | is the name that uniquely identifies a remote database. This name must be the same as the concatenation of the remote database's initialization parameters DB_NAME and DB_DOMAIN, unless the parameter GLOBAL_NAMES is set to FALSE, in which case any name is acceptable.
For example, using a database
link to database ORACLE0.DIVISION3.VRCE.ernet.in, a user or application can reference
remote data as follows:
SELECT * FROM scott.emp@oracle0.division3.VRCE.ernet.in; # emp table in scott's schema
SELECT loc FROM scott.dept@oracle0.division3.VRCE.ernet.in;
If GLOBAL_NAMES is set to FALSE, then one can use any name for the link to ORACLE0.DIVISION3.VRCE.ernet.in. For example, one can call the link FOO. Then, one can access the database as follows:
SELECT name FROM scott.emp@foo; # link name different from global name
Oracle allows one to create synonyms
so that one can hide the database link name from the user. A synonym allows
access to a table on a remote database using the same syntax that one would use
to access a table on a local database. For example, assume one issues the
following query against a table in a remote database:
SELECT * FROM emp@oracle1.VRCE.ernet.in;
One can create the synonym EMP
for EMP@ORACLE1.VRCE.ernet.in so that one can issue the following query instead
to access the same data:
SELECT * FROM emp;
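The synonym itself would be created along these lines (assuming the database link to ORACLE1 already exists):
CREATE SYNONYM emp FOR emp@oracle1.VRCE.ernet.in;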
1.2.8.2 Name Resolution of Schema Objects
To resolve application references
to schema objects (a process called name resolution), Oracle forms
object names hierarchically. For example, Oracle guarantees that each schema
within a database has a unique name, and that within a schema each object has a
unique name. As a result, a schema object's name is always unique within the
database. Furthermore, Oracle resolves application references to an object's
local name.
In a distributed database, a
schema object such as a table is accessible to all applications in the system.
Oracle extends the hierarchical naming model with global database names to
effectively create global object names and resolve references to the
schema objects in a distributed database system. For example, a query can
reference a remote table by specifying its fully qualified name, including the
database in which it resides.
For example, assume that one
connects to the local database as user SYSTEM:
CONNECT system/manager@oracle1
One then issues the following statements using database link ORACLE1.VRCE.ernet.in to access objects in the SCOTT and JHORO schemas on remote database ORACLE1:
SELECT * FROM scott.emp@oracle1.VRCE.ernet.in;
INSERT INTO Jhoro.accounts@oracle1.VRCE.ernet.in (acc_no, acc_name, balance)
VALUES (5001, 'BOWER', 2000);
UPDATE Jhoro.accounts@oracle1.VRCE.ernet.in
SET balance = balance + 500;
DELETE FROM Jhoro.accounts@oracle1.VRCE.ernet.in
WHERE acc_name = 'BOWER';
1.2.9 Database Link Restrictions
One cannot perform the following operations using database links:
Chapter 2: Materialized Views
This chapter introduces one to
the use of materialized views and discusses:
2.1 The Need for Materialized Views
2.2 Types of Materialized Views
2.3 Creating a Materialized View
2.5 Altering a Materialized View
2.6 Dropping a Materialized View
Materialized views are used in
data warehouses to increase the speed of queries on very large databases.
Queries to large databases often involve joins between tables or aggregations
such as SUM, or both. These operations are very expensive in terms of time and
processing power. The type of materialized view that is created determines how
the materialized view can be refreshed and used by query rewrite.
Materialized views can be used in
a number of ways and almost identical syntax can be used to perform a number of
roles. For example, a materialized view can be used to replicate data, which
was formerly achieved by using the CREATE SNAPSHOT statement. Now CREATE
MATERIALIZED VIEW is a synonym for CREATE SNAPSHOT.
Materialized views improve query
performance by precalculating expensive join and aggregation operations on the
database prior to execution time and storing the results in the database. The
query optimizer can use materialized views by automatically recognizing when an
existing materialized view can and should be used to satisfy a request. It then
transparently rewrites the request to use the materialized view. Queries are
then directed to the materialized view and not to the underlying detail tables.
In general, rewriting queries to use materialized views rather than detail
tables results in a significant performance gain.
When using query rewrite, one
wants to create materialized views that satisfy the largest number of queries.
For example, if one identifies 20 queries that are commonly applied to the
detail or fact tables, then one might be able to satisfy them with five or six
well-written materialized views. A materialized view definition can include any
number of aggregations (SUM, COUNT(x), COUNT(*), COUNT(DISTINCT x), AVG,
VARIANCE, STDDEV, MIN, and MAX) and/or include any number of joins. If one is
unsure of which materialized views to create, Oracle provides a set of advisory
functions in the DBMS_OLAP package to help in designing and evaluating
materialized views for query rewrite.
If a materialized view is to be
used by query rewrite, it must be stored in the same database as its fact or
detail tables. A materialized view can be partitioned, and one can define a
materialized view on a partitioned table and one or more indexes on the
materialized view.
Materialized views are similar to
indexes in several ways: they consume storage space, they must be refreshed
when the data in their master tables changes, and, when used for query rewrite,
they improve the performance of SQL execution and their existence is
transparent to SQL applications and users. Unlike indexes, materialized views
can be accessed directly using a SELECT statement. Depending on the type of
refresh that is required, they can also be accessed directly in an INSERT,
UPDATE, or DELETE statement.
The SELECT clause in the
materialized view creation statement defines the data that the materialized
view is to contain. There are only a few restrictions on what may be specified. Any number of tables can be joined together; however, they cannot be remote tables if one wishes to take advantage of query rewrite or the warehouse refresh facility (part of the DBMS_OLAP package). Besides tables, views, inline views, subqueries, and materialized views may all be joined or referenced in the SELECT clause.
The types of materialized views
are:
2.2.1 Materialized Views with Joins and Aggregates
2.2.2 Single-Table Aggregate Materialized Views
2.2.3 Materialized Views Containing Only Joins
In data warehouses, materialized
views would normally contain one of the aggregates shown in Example 2 below.
For fast refresh to be possible, the SELECT list must contain all of the GROUP
BY columns (if present), and may contain one or more aggregate functions. The
aggregate function must be one of: SUM, COUNT(x), COUNT(*), COUNT(DISTINCT x), AVG, VARIANCE, STDDEV, MIN, and MAX, and the expression to be aggregated can be any SQL value expression.
If a materialized view contains
joins and aggregates, then it cannot be fast refreshed using a materialized
view log. Therefore, for a fast refresh to be possible, only new data can be
added to the detail tables and it must be loaded using the direct path method.
Here are some examples of the
type of materialized view that can be created.
Create
Materialized View: Example 1
CREATE MATERIALIZED VIEW vrce_oracle1_mv
PCTFREE 0 TABLESPACE mviews
STORAGE (initial 16k next 16k pctincrease 0)
BUILD DEFERRED
REFRESH COMPLETE ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT
s.vrce_name,
SUM(dollar_oracle1) AS sum_dollar_oracle1
FROM vrce s, fact f
WHERE f.vrce_key = s.vrce_key
GROUP BY s.vrce_name;
Example 1 creates a materialized
view vrce_oracle1_mv that computes the sum of oracle1 by vrce.
It is derived by joining the tables vrce and fact on the
column vrce_key. The materialized view does not initially contain any
data because the build method is DEFERRED. A complete refresh is required for
the first refresh of a build deferred materialized view. When it is refreshed,
a complete refresh is performed and, once populated, this materialized view can
be used by query rewrite.
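Once populated, a query written against the detail tables along the following lines could be transparently rewritten by the optimizer to read from vrce_oracle1_mv instead:
SELECT s.vrce_name, SUM(f.dollar_oracle1)
FROM vrce s, fact f
WHERE f.vrce_key = s.vrce_key
GROUP BY s.vrce_name;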
Create
Materialized View: Example 2
CREATE MATERIALIZED VIEW vrce_stdcnt_mv
PCTFREE 0 TABLESPACE mviews
STORAGE (INITIAL 16k NEXT 16k PCTINCREASE 0)
BUILD IMMEDIATE
REFRESH FAST
ENABLE QUERY REWRITE
AS
SELECT vrce_name, t.time_key,
STDDEV(unit_oracle1) AS stdcnt_unit_oracle1,
AVG(unit_oracle1) AS avgcnt_unit_oracle1,
COUNT(unit_oracle1) AS count_days,
SUM(unit_oracle1) AS sum_unit_oracle1
FROM vrce s, fact f, time t
WHERE s.vrce_key = f.vrce_key AND
f.time_key = t.time_key
GROUP BY vrce_name, t.time_key;
The statement above creates a
materialized view vrce_stdcnt_mv that computes the standard deviation
for the number of units sold by a vrce on a given date. It
is derived by joining the tables vrce, time, and fact on the
columns vrce_key and time_key. The materialized view is populated with data
immediately because the build method is immediate and it is available for use
by query rewrite. In this example, the refresh method is FAST, which is allowed because the COUNT and SUM aggregates have been included to support fast refresh of the STDDEV aggregate.
A materialized view that contains
one or more aggregates (SUM, AVG, VARIANCE, STDDEV, COUNT) and a GROUP BY
clause may be based on a single table. The aggregate function can involve an
expression on the columns such as SUM(a*b). If this materialized view is to be
incrementally refreshed, then a materialized view log must be created on the
detail table with the INCLUDING NEW VALUES option, and the log must contain all
columns referenced in the materialized view query definition.
CREATE MATERIALIZED VIEW LOG ON fact
WITH ROWID (vrce_key, time_key, dollar_oracle1, unit_oracle1)
INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW sum_oracle1
PARALLEL
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT f.vrce_key, f.time_key,
COUNT(*) AS count_grp,
SUM(f.dollar_oracle1) AS sum_dollar_oracle1,
COUNT(f.dollar_oracle1) AS count_dollar_oracle1,
SUM(f.unit_oracle1) AS sum_unit_oracle1,
COUNT(f.unit_oracle1) AS count_unit_oracle1
FROM fact f
GROUP BY f.vrce_key, f.time_key;
In this example, a materialized
view has been created which contains aggregates on a single table. Because the
materialized view log has been created, the materialized view is fast
refreshable. If DML is applied against the fact table, then, when the commit is
issued, the changes will be reflected in the materialized view.
Table 2.1
illustrates the aggregate requirements for a single-table aggregate
materialized view.
If aggregate X is present, aggregate Y is required and aggregate Z is optional
X | Y | Z
Note that COUNT(*) must always be
present.
Incremental refresh for a
single-table aggregate materialized view is possible after any type of DML to
the base tables (direct load or conventional INSERT, UPDATE, or DELETE).
A single-table aggregate
materialized view can be defined to be refreshed ON COMMIT or ON DEMAND. If it
is ON COMMIT, the refresh is performed at commit time of the transaction that
does DML on one of the materialized view's detail tables.
After a refresh ON COMMIT, you
are urged to check the alert log and trace files to see if any error occurred
during the refresh.
Materialized views may contain
only joins and no aggregates, such as in the next example where a materialized
view is created which joins the fact table to the vrce table.
The advantage of creating this type of materialized view is that expensive
joins will be precalculated.
Incremental refresh for a
materialized view containing only joins is possible after any type of DML to
the base tables (direct load or conventional INSERT, UPDATE, or DELETE).
A materialized view containing
only joins can be defined to be refreshed ON COMMIT or ON DEMAND. If it is ON
COMMIT, the refresh is performed at commit time of the transaction that does
DML on the materialized view's detail table.
If one
specifies REFRESH FAST, Oracle performs further verification of the query
definition to ensure that fast refresh can be performed if any of the detail tables change.
If some of the above restrictions
are not met, then the materialized view should be created as REFRESH FORCE to
take advantage of incremental refresh when it is possible. If the materialized
view is created as ON COMMIT, Oracle performs all of the fast refresh checks.
If one of the tables did not meet all of the criteria, but the other tables
did, the materialized view would still be incrementally refreshable with
respect to the other tables for which all the criteria are met.
A materialized view log should
contain the rowid of the master table. It is not necessary to add other
columns.
To speed up refresh, it is recommended that the user create indexes on the columns of the materialized view that store the rowids of the fact table.
CREATE MATERIALIZED VIEW LOG ON fact WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON time WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON vrce WITH ROWID;
CREATE MATERIALIZED VIEW detail_fact_mv PARALLEL
BUILD IMMEDIATE
REFRESH FAST
AS
SELECT
f.rowid "fact_rid", t.rowid "time_rid", s.rowid "vrce_rid",
s.vrce_key, s.vrce_name, f.dollar_oracle1,
f.unit_oracle1, f.time_key
FROM fact f, time t, vrce s
WHERE f.vrce_key = s.vrce_key(+) AND
f.time_key = t.time_key(+);
In this example, in order to
perform a REFRESH FAST, unique constraints should exist on s.vrce_key
and t.time_key. It is also recommended that indexes be created on the
columns fact_rid, time_rid, and vrce_rid, as
illustrated below, which will improve the performance of refresh.
CREATE INDEX mv_ix_factrid ON
detail_fact_mv(fact_rid);
Alternatively, if the example
shown above did not include the columns time_rid and vrce_rid,
and if the refresh method was REFRESH FORCE, then this materialized view would
be fast refreshable only if the fact table was updated but not if the
tables time or vrce were updated.
CREATE MATERIALIZED VIEW detail_fact_mv
PARALLEL
BUILD IMMEDIATE
REFRESH FORCE
AS
SELECT
f.rowid "fact_rid",
s.vrce_key, s.vrce_name, f.dollar_oracle1,
f.unit_oracle1, f.time_key
FROM fact f, time t, vrce s
WHERE f.vrce_key = s.vrce_key(+) AND
f.time_key = t.time_key(+);
A materialized view can be
created with the CREATE MATERIALIZED VIEW statement or using Oracle Enterprise
Manager. The following command creates the materialized view vrce_oracle1_mv.
CREATE MATERIALIZED VIEW vrce_oracle1_mv
PCTFREE 0 TABLESPACE mviews
STORAGE (INITIAL 16k NEXT 16k PCTINCREASE 0)
PARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE
ENABLE QUERY REWRITE
AS
SELECT
s.vrce_name,
SUM(dollar_oracle1) AS sum_dollar_oracle1
FROM vrce s, fact f
WHERE f.vrce_key = s.vrce_key
GROUP BY s.vrce_name;
It is not uncommon in a data
warehouse to have already created summary or aggregation tables, and the DBA
may not wish to repeat this work by building a new materialized view. In this
instance, the table that already exists in the database can be registered as a prebuilt materialized view.
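A sketch of registering such an existing summary table as a prebuilt materialized view (the table sum_dollar_by_vrce and its column layout are assumptions; the prebuilt table must bear the same name as the materialized view):
CREATE MATERIALIZED VIEW sum_dollar_by_vrce
ON PREBUILT TABLE WITH REDUCED PRECISION
ENABLE QUERY REWRITE
AS
SELECT s.vrce_name, SUM(f.dollar_oracle1) AS sum_dollar_oracle1
FROM vrce s, fact f
WHERE f.vrce_key = s.vrce_key
GROUP BY s.vrce_name;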
2.3.1 ENABLE QUERY REWRITE
Even though a materialized view
is defined, it will not automatically be used by the query rewrite facility.
The clause ENABLE QUERY REWRITE must be specified if the materialized view is
to be considered available for rewriting queries.
If this clause is omitted or
specified as DISABLE QUERY REWRITE when the materialized view is initially
created, the materialized view can subsequently be enabled for query rewrite
with the ALTER MATERIALIZED VIEW statement.
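For example, to enable the materialized view created earlier for query rewrite:
ALTER MATERIALIZED VIEW vrce_oracle1_mv ENABLE QUERY REWRITE;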
If one defines a materialized
view as BUILD DEFERRED, it is also not eligible for query rewrite until it is
populated with data.
Query
rewrite is not possible with all materialized views. If query rewrite is not
occurring when expected, check to see if the materialized view satisfies all of
the following conditions.
1. There cannot be
non-repeatable expressions (ROWNUM, SYSDATE, non-repeatable PL/SQL functions,
and so on) anywhere in the defining query.
2. There cannot be
references to RAW or LONG RAW datatypes or object REFs.
3. The query must
be a single-block query, that is, it cannot contain set functions (UNION,
MINUS, and so on). However, a materialized view can have multiple query blocks
(for example, inline views in the FROM clause and subselects in the WHERE or
HAVING clauses).
4. If the
materialized view was registered as PREBUILT, the precision of the columns must
agree with the precision of the corresponding SELECT expressions unless
overridden by WITH REDUCED PRECISION.
5. If a query has both local and remote tables, only local tables will be considered for potential rewrite.
6. None of the detail tables can be owned by SYS, and the materialized view cannot be owned by SYS.
When one defines a materialized
view, one can specify its two refresh options: how to refresh and what type of
refresh. If unspecified, the defaults are assumed as ON DEMAND and FORCE.
The two refresh execution modes
are: ON COMMIT and ON DEMAND. The method you select will affect the type of
materialized view that can be defined.
Refresh Mode | Description
ON COMMIT | Refresh occurs automatically when a transaction that modified one of the materialized view's fact tables commits. Can be used with materialized views on single-table aggregates and materialized views containing joins only.
ON DEMAND | Refresh occurs when a user manually executes one of the available refresh procedures contained in the DBMS_MVIEW package (REFRESH, REFRESH_ALL_MVIEWS, REFRESH_DEPENDENT).
If a materialized view fails
during refresh at COMMIT time, the user has to explicitly invoke the refresh
procedure using the DBMS_MVIEW package after addressing the errors specified in
the trace files. Until this is done, the view will no longer be refreshed
automatically at commit time.
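As a sketch, such a manual refresh of the earlier example view can be invoked through the DBMS_MVIEW package ('C' requests a complete refresh):
EXECUTE DBMS_MVIEW.REFRESH('vrce_oracle1_mv', 'C');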
One can specify how the materialized views are to be refreshed from the detail tables by selecting one of four options: FORCE, COMPLETE, FAST, and NEVER.
Refresh Option | Description
COMPLETE | Refreshes by recalculating the materialized view's defining query.
FAST | Refreshes by incrementally adding the new data that has been inserted into the tables. The new data is obtained from the direct path log or from the materialized view logs.
FORCE | Applies fast refresh if possible; otherwise, it applies COMPLETE refresh.
NEVER | Indicates that the materialized view will not be refreshed with the Oracle refresh mechanisms.
Whether the fast refresh option
is available will depend upon the type of materialized view. Fast refresh is
available for three general classes of materialized views: materialized views
with joins only, materialized views with joins and aggregates, and materialized
views with aggregates on a single table.
The materialized view's defining
query is restricted as follows:
Defining queries for materialized
views with joins only and no aggregates have these restrictions on fast
refresh:
Defining queries for materialized
views with single-table aggregates have these restrictions on fast refresh:
Defining queries for materialized
views with joins and aggregates have these restrictions on fast refresh:
An ORDER BY clause is allowed in
the CREATE MATERIALIZED VIEW statement. It is only used during the initial
creation of the materialized view. It is not used during a full refresh or an
incremental refresh.
To improve the performance of queries against large materialized views, store the rows in the materialized view in the order specified in the ORDER BY clause. This initial ordering
provides physical clustering of the data. If indexes are built on the columns
by which the materialized view is ordered, accessing the rows of the
materialized view using the index will significantly reduce the time for disk
I/O due to the physical clustering.
The ORDER BY clause is not
considered part of the materialized view definition. As a result, there is no
difference in the manner in which Oracle detects the various types of
materialized views (for example, materialized join views with no aggregates).
For the same reason, query rewrite is not affected by the ORDER BY clause. This
feature is similar to the CREATE TABLE ... ORDER BY ... capability that exists
in Oracle. For example:
CREATE MATERIALIZED VIEW sum_oracle1
REFRESH FAST ON DEMAND AS
SELECT cityid, COUNT(*) count_all,
SUM(oracle1) sum_oracle1, COUNT(oracle1) cnt_oracle1
FROM city_oracle1
ORDER BY cityid;
In the above example, we would
use the "ORDER BY cityid" clause only during the creation of the
materialized view. The materialized view definition is not affected by the
ORDER BY clause. The definition is:
SELECT cityid, COUNT(*) count_all,
SUM(oracle1) sum_oracle1, COUNT(oracle1) cnt_oracle1
FROM city_oracle1
A nested materialized view is a
materialized view whose definition is based on another materialized view. A
nested materialized view may reference other relations in the database in
addition to materialized views.
In a data warehouse, you
typically create many aggregate views on a single join (for example, rollups
along different dimensions). Incrementally maintaining these distinct
materialized aggregate views can take a long time because the underlying join has
to be performed many times. By using nested materialized views, the join is
performed just once (while maintaining the materialized view containing joins
only) and incremental maintenance of single-table aggregate materialized views
is very fast due to the self-maintenance refresh operations on this class of
views. Using nested materialized views also overcomes the limitation posed by
materialized aggregate views, where incremental maintenance can only be done
with direct-load insert.
Nested materialized views incur the space overhead of materializing the join and having a materialized view log. This is in contrast to materialized aggregate views, which avoid the space requirements of the materialized join view and its log, but have relatively long refresh times due to multiple computations of the same join.
Nested materialized views are
incrementally refreshable under any type of DML while materialized aggregate
views are incrementally refreshable under direct-load insert only.
One can create a materialized join view or a single-table aggregate materialized view on top of another materialized join view, single-table aggregate materialized view, complex materialized view (a materialized view on which Oracle cannot perform incremental refresh), or base table. All the underlying objects (be they materialized views or tables) on which the materialized view is defined must have a materialized view log. All the underlying objects are treated as if they were tables. All the existing options for materialized join views and single-table aggregate materialized views can be used. Thus, ON COMMIT refresh is supported for these types of nested materialized views.
The following presents a retail
database with an example schema and some materialized views to illustrate how
nested materialized views can be created.
VRCE (vrce_key, vrce_name, vrce_city, vrce_state, vrce_country)
PRODUCT (prod_key, prod_name, prod_brand)
TIME (time_key, time_day, time_week, time_month)
FACT (vrce_key, prod_key, time_key, dollar_oracle1)
/* create the materialized view logs */
CREATE MATERIALIZED VIEW LOG ON fact
WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON vrce
WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON time
WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON product
WITH ROWID;
/*create materialized join view join_fact_vrce_time as incrementally
refreshable at COMMIT time */
CREATE MATERIALIZED VIEW join_fact_vrce_time
REFRESH FAST ON COMMIT AS
SELECT s.vrce_key, s.vrce_name, f.dollar_oracle1, t.time_key, t.time_day,
f.prod_key, f.rowid frid, t.rowid trid, s.rowid srid
FROM fact f, vrce s, time t
WHERE f.time_key = t.time_key AND
f.vrce_key = s.vrce_key;
To create a nested materialized
view on the table join_fact_vrce_time, one would have to create a
materialized view log on the table. Because this will be a single-table
aggregate materialized view on join_fact_vrce_time, one needs to log
all the necessary columns and use the INCLUDING NEW VALUES clause.
/* create materialized view log on join_fact_vrce_time */
CREATE MATERIALIZED VIEW LOG ON join_fact_vrce_time
WITH ROWID (vrce_name, time_day, dollar_oracle1)
INCLUDING NEW VALUES;
/* create the single-table aggregate materialized view sum_oracle1_vrce_time on
join_fact_vrce_time as incrementally refreshable at COMMIT time. */
CREATE MATERIALIZED VIEW sum_oracle1_vrce_time
REFRESH FAST ON COMMIT
AS
SELECT COUNT(*) cnt_all, SUM(dollar_oracle1) sum_oracle1, COUNT(dollar_oracle1)
cnt_oracle1, vrce_name, time_day
FROM join_fact_vrce_time
GROUP BY vrce_name, time_day;
Note that the above single-table
aggregate materialized view sum_oracle1_vrce_time is logically equivalent
to a multi-table aggregate on the tables fact, time, and vrce whose definition
is
SELECT COUNT(*) cnt_all, SUM(f.dollar_oracle1) sum_oracle1,
COUNT(f.dollar_oracle1) cnt_oracle1, s.vrce_name, t.time_day
FROM fact f, time t , vrce s
WHERE f.time_key = t.time_key AND
f.vrce_key = s.vrce_key
GROUP BY vrce_name, time_day;
One can now define a materialized
join view join_fact_vrce_time_prod as a join between join_fact_vrce_time
and table product.
CREATE MATERIALIZED VIEW join_fact_vrce_time_prod
REFRESH FAST ON COMMIT
AS
SELECT j.rowid jrid, p.rowid prid, j.vrce_name, j.prod_key, p.prod_name,
j.dollar_oracle1
FROM join_fact_vrce_time j, product p
WHERE j.prod_key = p.prod_key;
The above schema can be
diagrammatically represented as in Figure 2.1
Materialized views with joins and
aggregates can be nested if they are refreshed as COMPLETE REFRESH. Thus, a
user can arbitrarily nest materialized views having joins and aggregates. No
incremental maintenance is possible for these materialized views.
Note that the ON COMMIT refresh
option is not available for complex materialized views. Because you have to
invoke the refresh functions manually, ordering has to be taken into account.
This is because the refresh for a materialized view that is built on other
materialized views will use the current state of the other materialized views,
whether they are fresh or not. You can find the dependent materialized views
for a particular object using the PL/SQL function GET_MV_DEPENDENCIES() in the
DBMS_MVIEW package.
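As a sketch, the materialized views that depend on join_fact_vrce_time could be listed as follows (the PL/SQL block and output handling are illustrative):
SET SERVEROUTPUT ON
DECLARE
  deplist VARCHAR2(4000);
BEGIN
  DBMS_MVIEW.GET_MV_DEPENDENCIES('join_fact_vrce_time', deplist);
  DBMS_OUTPUT.PUT_LINE(deplist);
END;
/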
Here are some guidelines on how
to use nested materialized views:
1. If incremental
refresh is desired when a materialized view contains joins and aggregates and
standard fast refresh cannot be used because DML is occurring on the tables,
consider creating a single-table aggregate materialized view nested on a
materialized join view.
2. If incremental
refresh is desired, you should incrementally refresh all the materialized views
along any chain. It makes little sense to define an incrementally refreshable
materialized view on top of a materialized view that must be refreshed with a
complete refresh.
3. When using
materialized join views and single-table aggregate materialized views, you can
define them to be ON COMMIT or ON DEMAND. The choice would depend on the
application using the materialized views. If one expects the materialized views
to always remain fresh, then all the materialized views should have the ON
COMMIT refresh option. If the time window for refresh does not permit
refreshing all the materialized views at commit time, then the appropriate
materialized views could be created with (or altered to have) the ON DEMAND
refresh option.
There are five amendments that
can be made to a materialized view:
All other changes are achieved by
dropping and then recreating the materialized view.
The COMPILE clause of the ALTER
MATERIALIZED VIEW statement can be used when the materialized view has been
invalidated as described in "Invalidating a Materialized View". This compile process is quick, and
allows the materialized view to be used by query rewrite again.
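For example, to recompile the materialized view created earlier after it has been invalidated:
ALTER MATERIALIZED VIEW vrce_oracle1_mv COMPILE;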
Use the DROP MATERIALIZED VIEW
statement to drop a materialized view. For example:
DROP MATERIALIZED VIEW oracle1_sum_mv;
This command drops the materialized view oracle1_sum_mv. If the materialized view was prebuilt on a table, then the table is not dropped but it
can no longer be maintained with the refresh mechanism. Alternatively, you can
drop a materialized view using Oracle Enterprise Manager.
Loading strategies can be
classified as one-phase or two-phase. In one-phase loading,
data is loaded directly into the target table, quality assurance tests are
performed, and errors are resolved by performing DML operations prior to
refreshing materialized views. If a large number of deletions are possible,
then storage utilization may be adversely affected, but temporary space
requirements and load time are minimized. The DML that may be required after
one-phase loading causes multi-table aggregate materialized views to become
unusable in the safest rewrite integrity level.
In a two-phase loading process:
Immediately after loading the
detail data and updating the indexes on the detail data, the database can be
opened for operation, if desired. Query rewrite can be disabled by default
(with ALTER SYSTEM SET QUERY_REWRITE_ENABLED = FALSE) until all the
materialized views are refreshed, but enabled at the session level for any
users who do not require the materialized views to reflect the data from the
latest load (with ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE). However, as
long as QUERY_REWRITE_INTEGRITY = ENFORCED or TRUSTED, this is not necessary
because the system ensures that only materialized views with updated data
participate in a query rewrite.
Replication is the process of copying
and maintaining database objects,
such as tables, in multiple databases that make up a
distributed database system.
In a replicated environment, multiple copies of the same database object are present at the various sites constituting the replication environment. The changes applied to the database objects at one site are monitored continuously, stored locally, and, at frequent intervals, forwarded and applied at each of the remote locations. Replication uses distributed database technology to share data between multiple sites, but it differs from a distributed database system in that, unlike the distributed architecture where a particular database object resides at a particular location, in a replicated database the same data is available at multiple locations.
Many reasons provide a strong case for the use of replicated environments. Some of them are listed below.
3.1 Availability
Replication improves the availability of applications because it provides them with alternative data access options. If one site becomes unavailable, users can continue to query or even update the remaining locations.
3.2 Performance
Replication provides fast, local
access to shared data because it balances activity over multiple sites. Some
users can access one server while other users access other servers, thereby
reducing the load at all servers. Also, users can access data from the
replication site that has the lowest access cost, which is typically the site
that is geographically closest to them.
3.3 Disconnected Computing
A snapshot is a
complete or partial copy (replica) of a target master table from a single point
in time. Snapshots enable users to work on a subset of a database while
disconnected from the central database server. Later, when a connection is
established, users can synchronize (refresh) snapshots on demand. When users
refresh snapshots, they update the central database with all of their changes,
and they receive any changes that may have happened while they were
disconnected.
3.4 Network Load Reduction
Replication can be used to distribute
data over multiple regional
locations. Then, applications can access various regional servers instead of
accessing one central server. This configuration can reduce network load
dramatically.
3.5 Types of Replication Environment
Oracle supports three types of replication environments, as indicated below.
3.5.1 Multimaster replication
3.5.2 Snapshot replication
3.5.3 Multimaster and Snapshot Hybrid Configuration
3.5.1 MULTIMASTER REPLICATION
Multimaster replication is
the type of replication environment which allows multiple sites, acting as
equal peers, to manage groups of replicated database objects. Each site
participating in this environment is called a master site. All
the sites interact with every other site in the environment frequently to
exchange updates that have taken place to each of their database objects. These
changes are then applied at all the sites; in other words after each exchange
all the sites contain the same updated information.
A single master site acts as the master definition site for the master groups (master groups are dealt with in detail later). This particular site performs many of the administrative and maintenance tasks for the multimaster replication environment. Any of the master sites can be configured as the master definition site, but there can be only a single master definition site for a master group; the master definition site can be changed to a different master site if necessary.
Multimaster replication ensures excellent failover protection, making sure information is available when and where the users need it. It can be used to protect the availability of a mission-critical database. For example, a multimaster replication environment can replicate all of the data in the database to establish a failover site should the primary site become unavailable due to system or network outages. In contrast with Oracle's standby database feature, such a failover site can also serve as a fully functional database to support application access while the primary site is concurrently operational, whereas a standby database can only become fully functional if the primary site is unavailable.
There
are two types of multimaster replication.
3.5.1.1 Asynchronous Replication,
often referred
to as store-and-forward replication, captures any local changes,
stores them in a queue, and, at regular intervals, propagates and applies these
changes at remote sites. With this form of replication, there is a period of
time before all sites achieve data convergence.
3.5.1.2 Synchronous Replication,
also known as real-time
replication, applies any changes or executes any replicated procedures at
all sites participating in the replication environment as part of a single
transaction. If the DML or procedure fails at any site, the entire transaction
rolls back. Synchronous replication ensures data consistency at all sites in
real-time.
3.5.1.1 Asynchronous Replication
Asynchronous replication independently sends the changes to all the sites in the environment. Propagation occurs separately after the execution of the replication procedure locally. This is the default mode of replication. The process of asynchronous replication is described below.
* User issues DML or executes a wrapper for a replicated procedure.
Once a table has been set up
for replication, any DML that a user issues on the table is captured for
replication to all other master sites.
For each row that is
inserted, updated, or deleted, an internal trigger creates a deferred remote
procedure call (RPC). The deferred transaction queue contains all deferred
RPCs. If a procedure has been replicated and its wrapper is executed at a
master site, the procedure call is also placed in the deferred transaction
queue.
* Deferred transaction queue stores deferred RPCs.
Each transaction in the
deferred transaction queue has a list of destinations that define where the
deferred transaction should be propagated; this list contains all master sites
except for the originating site.
* Propagation sends deferred transaction queue entry to destination.
At scheduled intervals or
on-demand, the deferred transactions in the deferred transaction queue are
propagated to the target destinations.
* Deferred transaction queue entry applied at remote destination.
As a deferred transaction is
being propagated to a target destination, each deferred RPC is applied at the
destination site. If the deferred transaction cannot be successfully applied at
the destination site, it is placed into the error queue at the
destination site, where the DBA can fix the error condition and re-apply the
deferred transaction.
When a deferred transaction
queue entry is applied at the remote destination, Oracle checks for data
conflicts. If a conflict is detected, it is logged and optionally a conflict
resolution method is invoked.
3.5.1.2 Synchronous Replication
Synchronous replication propagates any changes made at a local site to other synchronously linked masters in a replication environment during the same transaction as the initial change. If the propagation fails at any of the master sites, the entire transaction, including the initial change at the local master site, rolls back. This strict enforcement ensures data consistency across the replication environment. The process of synchronous replication is described below.
* User issues DML or executes a wrapper for a
replicated procedure.
Once a table has been set up
for replication, any DML that a user issues on the target table is captured for
replication to all other master replication sites. If a procedure has been
replicated and its wrapper is executed at a master site, the procedure call is
also captured for replication.
* DML or wrapper execution is immediately propagated to destination sites.
The internal trigger captures any DML and immediately propagates these actions to all other master sites in the replication environment. The internal trigger applies these actions in the security context of the propagator's database link and uses an internal RPC to apply these actions at the destination site.
Like an internal trigger, a wrapper for a replicated procedure
immediately propagates the procedure call to all other master sites in the
replication environment. If the
transaction fails at any one of the master replication sites, then the
transaction is rolled back at all master sites. This methodology ensures data
consistency across all master replication sites. Because of the need to
rollback a transaction if any site fails, synchronous replication is extremely
dependent on highly-available networks, databases, and the associated hardware.
Asynchronous replication, however, means that the data sets at the different master sites in the replication environment may differ for a period of time before the changes have been propagated, and data conflicts may occur. In synchronous replication, by contrast, there is never a period of time when the data at the master sites does not match, and no data conflicts are introduced. These benefits come at the cost of requiring highly available hardware and networking resources, with no tolerance for downtime.
3.6
Conflict Resolution Concepts
Asynchronous replication
might result in conflicts. The possible conflicts that can occur fall into
three categories:
UPDATE CONFLICTS – Two or more updates are
applied to the same row before the preceding update can be propagated to the
other sites.
UNIQUENESS CONFLICTS – An insert is performed at two or more sites and the primary key for each insert contains the same value, or an update at one site modifies the primary key to the same value as an insert at another site.
DELETE CONFLICTS – A row is deleted at one
site and an update occurs at another site, which may result in an attempt to
update a row that does not exist, or the same row is deleted in the same
interval at more than one site.
An instance where an update conflict is possible is shown below.
TIME         Master Site A                          Master Site B
5:00 P.M.    Propagate changes to Master Site B     Propagate changes to Master Site A
5:05 P.M.    Updates row 1
5:07 P.M.                                           Updates row 1
5:10 P.M.    Propagate changes to Master Site B     Propagate changes to Master Site A
If the propagation of changes occurs every 10 minutes in this environment, a conflict occurs on row 1 of the table at 5:10 P.M. Once a data conflict is detected, the following actions occur: the conflict resolution methods try to resolve the data conflict; if the conflict is not resolved, the data conflict is logged in the error queue at the destination site; and if the data conflict is logged in the error queue, the database administrator is responsible for resolving it manually. One of the conflict resolution methods is to take the latest change in the interval into consideration, apply it, and ignore the rest; a refined form of this is called the “Latest Timestamp” method of resolving conflicts. In addition, Oracle provides a pre-built method called the “Overwrite” method, which replaces the current value at the destination site with the value from the originating site; it does not guarantee convergence of data and is designed to be used with a single master site and multiple snapshot sites. Oracle does not provide any prebuilt conflict resolution methods to handle delete or ordering conflicts. It does, however, allow the user to build a custom conflict resolution method to resolve such conflicts.
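A hedged sketch of configuring the latest timestamp method with the replication management API, assuming a column group sub_cg and a timestamp column ts already exist on the replicated table scott.sub (these names are illustrative):
BEGIN
   DBMS_REPCAT.ADD_UPDATE_RESOLUTION (
      sname => 'scott',
      oname => 'sub',
      column_group => 'sub_cg',
      sequence_no => 1,
      method => 'LATEST TIMESTAMP',
      parameter_column_name => 'ts');
END;
/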
3.7
Multimaster Topology
The network topology of the machines used in this project is as shown.
(Figure: the master sites connected by database links, with one site acting as the master definition site.)
The following sections cover setting up the master sites and the master groups.
3.7.1 Configuring a Snapshot Replication Environment
In order to set up an environment for snapshot replication, the following tasks need to be done:
* Create the necessary schema (schema objects that may participate in replication may be tables, among others).
* Create the necessary database links (Refer to chapter ).
* Assign the appropriate privileges.
* Allocate sufficient job processes (Refer to chapter ).
This may be done in two ways:
* Using an automated tool provided by Oracle called the Replication Manager Setup Wizard.
* Using the replication management API (application program interface).
3.8 Creating a Schema
A snapshot schema in a
remote database and the schema that contains the master table in the master
database must be the same. Since the Scott user of oracle2 contains the tables
to be replicated in the database network, the snapshot schema in oracle1 must
also be Scott.
The snapshot uses database links to reference remote tables, which enables it to be refreshed in the refresh mode specified in its definition. To ensure secure behavior when refreshing the snapshot, the defining query uses database links that include an embedded user name and password in their definition. The restriction is that a public database link cannot be used to define a snapshot. Also, the remote account that the link uses must have the SELECT privileges necessary to access the data referenced in the snapshot's defining query.
Before creating snapshots, some administrative database links are required to be created:
A “public” database link from the snapshot site to the master site. This makes defining private database links easier.
“Private” database links from the snapshot administrator to the proxy administrator and from the propagator to the receiver.
The Replication Manager Setup Wizard creates these database links automatically.
In addition, private database links connecting each snapshot database to the schema at the master database need to be created.
3.8.1 Privileges
Both the creator and the owner of the snapshot must be able to issue the defining SELECT statement of the snapshot. The user that creates the snapshot must have the CREATE SNAPSHOT privilege and the appropriate SELECT privileges to execute the defining SELECT statement.
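For example, the snapshot owner can be given the required system privilege with (assuming the owner is scott, as in this project):
GRANT CREATE SNAPSHOT TO scott;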
3.8.2
Schedule Purge at Master Site
To keep the size of the deferred transaction queues in check, a purge operation may be scheduled to remove all successfully completed deferred transactions from the deferred transaction queue.
3.8.3 Scheduled Push
Scheduling a push at the snapshot site automatically propagates the deferred transactions at the snapshot site to the associated target master site. Typically, there is only a single scheduled link per snapshot group at a snapshot site, because a snapshot group only has a single target master site.
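A minimal sketch of scheduling such a push from the snapshot site, assuming the target master site is reached through a database link named oracle1 (the interval is illustrative):
BEGIN
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE1',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
END;
/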
3.8.4 SNP Background
Processes and Interval
Sufficient
SNP (or job queue) background processes should be allocated to handle the
refreshing of snapshots. Because the snapshot site has only a single scheduled
link to the target master site, the snapshot site only requires a single SNP
process, but to handle additional activity, such as scheduled jobs it may be
required to allocate at least two SNP processes at the snapshot site.
The SNP processes are
defined using the JOB_QUEUE_PROCESSES initialization parameter in the
initialization parameter file for the database. The initialization parameter
file for the database is named init(sid).ora. To set up SNP processes we may manually edit the initialization
parameter file. The SNP job interval determines how often the
SNP processes "wake up" to execute any pending operations, such as
pushing a queue.
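For example, the following lines in init(sid).ora would allocate two SNP processes that wake up every 60 seconds (the values shown are illustrative):
JOB_QUEUE_PROCESSES = 2
JOB_QUEUE_INTERVAL = 60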
The following are the three mechanisms used at the master site to
support refreshing of the snapshots:
Master Table
The master table is the
basis for the snapshot and is located at the target master site. This table may
be involved in both snapshot replication and multimaster replication (remember
that a snapshot points to only one master site).
Changes made to the master
table, as recorded by the snapshot log, will be propagated to the snapshot
during the refresh process.
Internal Triggers
When changes are made to the master table using DML, an internal trigger records information about the affected rows in the snapshot log. This information includes the values of the primary key and/or the ROWID. This internal trigger is created when a snapshot log is defined for the target master table.
Snapshot Logs
A snapshot log is a table
that records all of the DML changes to a master table. A single master
table has one and only one snapshot
log, regardless of how many snapshots refresh from the master.
Before any snapshot groups
are created, snapshot logs at the master sites need to be created. The log is
required for any master table that supports snapshots with fast refreshes.
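For example, assuming the master table scott.sub used later in this chapter, a snapshot log is created at the master site with:
CREATE SNAPSHOT LOG ON scott.sub;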
A fast refresh of a snapshot
is possible only if the snapshot's master table has a snapshot log. When a
snapshot is fast refreshed, entries in the snapshot's associated snapshot log
that have appeared since the snapshot was last refreshed are applied to the
snapshot. The time of last refresh can be obtained from the database audit trail.
Also, in order to get the snapshot refresh times manually, the view DBA_SNAPSHOT_REFRESH_TIMES may be queried at the master.
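A sketch of such a query (the owner name matches the schema used in this project):
SELECT name, last_refresh FROM dba_snapshot_refresh_times WHERE owner = 'SCOTT';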
Any snapshot created at the
snapshot site is automatically registered at the master site.
3.8.5
Snapshot Groups
A snapshot group in a replication system maintains a partial or
complete copy of the objects at the target master group. Snapshot groups cannot
span master group boundaries.
Snapshot groups are required for supporting
updateable snapshots. If a snapshot does not belong to a snapshot group, then
it must be a read-only snapshot.
3.8.6 Refresh Groups
One of the main tasks of Oracle is to provide a read-consistent view of the database at the statement level as well as the transaction level. Thus, within a given transaction, the group of logically related snapshots must be transactionally consistent.
A refresh group can contain snapshots from more than one snapshot group to maintain transactional (read) consistency across master group boundaries.
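A refresh group can be created with the DBMS_REFRESH package; a minimal sketch, assuming the snapshot scott.test created later in this chapter (the group name and interval are illustrative):
BEGIN
   DBMS_REFRESH.MAKE (
      name => 'scott.test_refg',
      list => 'scott.test',
      next_date => SYSDATE,
      interval => 'SYSDATE + 1/24');
END;
/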
The use of multiple refresh groups to refresh the contents of a single snapshot group must be avoided, because it may introduce inconsistencies in the snapshot data, which may cause referential integrity problems at the snapshot site.
3.8.7 OVERVIEW OF
SNAPSHOT CREATED
An
overview of the snapshot created on the master definition site Oracle1 is
shown.
The table ‘sub’ is created on the master definition site from the SQL prompt:
create table sub(subcode number ,
subname varchar2(30),
constraint pk_2 primary key(subcode));
This table is created in the ‘scott’ schema of the database in Oracle1.
SQL script for creation of the snapshot:
create snapshot scott.test
refresh next (sysdate) + 1/(24*60)
for update as
select * from scott.sub@scott_link;
The snapshot ‘test’ is created in the ‘scott’ schema of the Oracle2 database environment. It references the table ‘sub’ in the ‘scott’ schema of Oracle1 through the database link ‘scott_link’, as shown in the last line of the script. The snapshot is updateable, as indicated by the ‘for update’ clause, and is refreshed every minute, as indicated in the ‘refresh’ clause.
3.9 SETUP OF REPLICATION ENVIRONMENT
The SQL scripts generated during the process of setting up the multimaster as well as the snapshot environment are given below. They include scripts for generating master sites, adding them to master groups, and configuring the materialized view as well. These scripts are generated after configuring the replication environment through DBA Studio and then clicking on the ‘Generate SQL Script’ tab.
Steps in setting up a Master
site:
At oracle0:
Connect as SYSTEM to oracle0
CONNECT SYSTEM/MANAGER@oracle0
Create replication
administrator
CREATE USER repadmin IDENTIFIED BY repadmin;
Grant privileges to
replication administrator
To grant the replication administrator
privileges to create and
manage a replication environment we execute the
GRANT_ADMIN_ANY_SCHEMA API.
DBMS_REPCAT_ADMIN.GRANT_ADMIN_ANY_SCHEMA (
   username => 'repadmin');
Note: In order that REPADMIN is able to
create snapshot logs for
any replicated table, we grant
COMMENT ANY TABLE and
LOCK ANY TABLE to REPADMIN.
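That is:
GRANT COMMENT ANY TABLE TO repadmin;
GRANT LOCK ANY TABLE TO repadmin;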
Register propagator
A propagator that is
responsible for propagating the deferred
transaction queue to other master sites
is registered.
DBMS_DEFER_SYS.REGISTER_PROPAGATOR (
USERNAME => 'repadmin');
The receiver receives the propagated
deferred transactions
sent by the propagator from other master sites.
DBMS_REPCAT_ADMIN.REGISTER_USER_REPGROUP (
   username => 'repadmin',
   privilege_type => 'receiver',
   list_of_gnames => NULL);
Schedule Purge at Master
Site
The SCHEDULE_PURGE API automates the purge process.
This helps to keep the size of the deferred transaction
queue
in check by purging successfully
completed deferred
transactions.
CONNECT repadmin/repadmin@oracle0
DBMS_DEFER_SYS.SCHEDULE_PURGE (
   next_date => SYSDATE,
   interval => 'SYSDATE + 1/24',
   delay_seconds => 0,
   rollback_segment => '');
Create Master Site Users
This consists of the following steps:
CREATE PROXY SNAPSHOT ADMINISTRATOR
The proxy snapshot administrator performs tasks at the
target
master site on behalf of the snapshot
administrator
CONNECT system/manager@oracle0
CREATE USER proxy_snapadmin IDENTIFIED BY
proxy_snapadmin;
DBMS_REPCAT_ADMIN.REGISTER_USER_REPGROUP (
   username => 'PROXY_SNAPADMIN',
   privilege_type => 'PROXY_SNAPADMIN',
   list_of_gnames => NULL);
Create proxy refresher
The proxy refresher performs tasks at the master site on
behalf
of the refresher at the snapshot site.
CREATE USER proxy_refresher IDENTIFIED BY
proxy_refresher;
GRANT CREATE SESSION TO proxy_refresher;
GRANT SELECT ANY TABLE TO proxy_refresher;
Create database links between master sites
The database links provide the necessary distributed mechanisms to allow the different replication sites to replicate data among themselves. Before creating any private database links, the public database links that each private database link will use are created. All the replication administrators are then connected using private database links.
CONNECT system/manager@oracle0
CREATE PUBLIC DATABASE LINK oracle1 USING 'oracle1';
CREATE PUBLIC DATABASE LINK oracle2 USING 'oracle2';
CONNECT repadmin/repadmin@oracle0
CREATE DATABASE LINK oracle1 CONNECT TO repadmin
IDENTIFIED BY repadmin;
CREATE DATABASE LINK oracle2 CONNECT TO repadmin
IDENTIFIED BY repadmin;
CONNECT system/manager@oracle1
CREATE PUBLIC DATABASE LINK oracle0 USING 'oracle0';
CREATE PUBLIC DATABASE LINK oracle2 USING 'oracle2';
CONNECT repadmin/repadmin@oracle1;
CREATE DATABASE LINK oracle0 CONNECT TO repadmin
IDENTIFIED BY repadmin;
CREATE DATABASE LINK oracle2 CONNECT TO repadmin
IDENTIFIED BY repadmin;
CONNECT system/manager@oracle2;
CREATE PUBLIC DATABASE LINK oracle0 USING 'oracle0';
CREATE PUBLIC DATABASE LINK oracle1 USING 'oracle1';
CONNECT repadmin/repadmin@oracle2
CREATE DATABASE LINK oracle0 CONNECT TO repadmin
IDENTIFIED BY repadmin;
CREATE DATABASE LINK oracle1 CONNECT TO repadmin
IDENTIFIED BY repadmin;
The scheduled link determines how often the deferred
transaction queue is propagated to each of
the other master
sites. The SCHEDULE_PUSH procedure needs to be executed for each database link that is created:
CONNECT repadmin/repadmin@oracle0
BEGIN
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE1',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE2',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
END;
/
CONNECT repadmin/repadmin@oracle1
BEGIN
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE0',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE2',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
END;
/
CONNECT repadmin/repadmin@oracle2
BEGIN
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE0',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
   DBMS_DEFER_SYS.SCHEDULE_PUSH (
      destination => 'ORACLE1',
      interval => 'SYSDATE + 10 / (24 * 60)',
      next_date => SYSDATE);
END;
/
SETTING UP SNAPSHOT ENVIRONMENT
SETTING UP SNAPSHOT GROUPS
MONITORING THE DATABASE AND IMPROVING
THE PERFORMANCE BASED ON THE STATISTICS
LOGGED.
Based on the information logged regarding the replication activity through auditing (Refer ), the performance of replication is optimized, as elaborated in the coming sections.
4.1 General Overview
Auditing is the tracking of database actions. The main aims of auditing are to:
Gather data about specific database activities. The database administrator can then make useful decisions for improving either the performance or the security features of the database.
Check for the occurrence of unauthorized activities.
The general types of auditing are:
4.1.1 Statement Auditing
The auditing of SQL statements with respect to only the type of statement is called statement auditing. For example, AUDIT TABLE tracks several DDL statements regardless of the table on which they are issued.
4.1.2 Privilege Auditing
The auditing of the use of powerful system privileges to perform corresponding actions is called Privilege auditing. Privilege auditing is more specific than statement auditing because it audits only the use of the privilege.
4.1.3 Schema Object Auditing
Schema object auditing is the auditing of specific statements on a particular schema object. It is the most specific type of auditing, auditing only a specific statement on a specific schema object.
4.3 The Database Audit Trail
Auditing records information such as the operation,
the user performing the operation in the chronological order of their
occurrence. Audit records can be stored in either a data dictionary table,
called the database audit trail,
or an operating system audit trail.
The database audit trail is generated by setting the
AUDIT_TRAIL variable to DB
or TRUE. This enables system-wide auditing and causes
audited records to be written to the database audit trail (the SYS.AUD$ table).
The parameter AUDIT_TRAIL is set to TRUE in the parameter file init.ora and is set to FALSE in the parameter file init(sid).ora, where ‘sid’ denotes the system identifier (Oracle0, Oracle1, Oracle2).
If either of the above two settings is not made, the audit information is not generated. The database audit trail is a single table named AUD$ in
the SYS schema of each Oracle database's data dictionary.
The following general information is included in each audit trail record:
The user name.
The name of the schema object accessed.
The operation performed or attempted.
The completion code of the operation.
The date and time stamp.
The system privileges used.
The audit trail has nothing to do with transaction commit or rollback; the audit information remains even after a rollback. In order to make a meaningful interpretation of the audit information, there are two PL/SQL scripts that need to be run as SYS: CATALOG.SQL and CATAUDIT.SQL. In order to disable auditing, we run the script CATNOAUD.SQL.
4.4 Project Specific Use of Auditing
The various steps are:
1. The AUDIT_TRAIL parameter was set to TRUE.
2. The scripts CATALOG.SQL and CATAUDIT.SQL were run.
3. An AUDIT SELECT is issued on the snapshot at the snapshot site, as shown below.
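For example, assuming the snapshot scott.test created earlier (the BY ACCESS option is illustrative):
AUDIT SELECT ON scott.test BY ACCESS;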
Any access of the snapshot for reading is then entered into the audit trail along with its time stamp. The audit information for that snapshot is read from the audit trail as follows:
SELECT owner, TO_CHAR(timestamp, 'DD-MON-YY HH:MI P.M.'), action_name
FROM sys.dba_audit_trail
WHERE owner = 'SCOTT'
  AND TO_CHAR(timestamp, 'DD-MON-YY') IN
      (SELECT TO_CHAR(sysdate, 'DD-MON-YY') FROM dual)
  AND ((TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '01:00 P.M.' AND '02:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '02:00 P.M.' AND '03:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '03:00 P.M.' AND '04:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '04:00 P.M.' AND '05:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '05:00 P.M.' AND '06:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '06:00 P.M.' AND '07:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '07:00 P.M.' AND '08:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '08:00 P.M.' AND '09:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '09:00 P.M.' AND '10:00 P.M.')
    OR (TO_CHAR(timestamp, 'HH:MI P.M.') BETWEEN '10:00 P.M.' AND '11:00 P.M.'));
Suppose this query shows that access to the snapshot is very high between 11:30 a.m. and 12 noon, and that the snapshot refresh interval is 1 hour (that is, refreshes occur at 11 a.m., 12 noon, 1 p.m., and so on). Heavy access in that interval means the snapshot is serving values that are not up to date. So, in order to provide updated information to the users of the snapshot, we force a refresh of the snapshot by using the DBMS_SNAPSHOT.REFRESH procedure.
For instance,
dbms_snapshot.refresh (
'sub,marks,code', 'CF' ); performs a complete refresh of the SUB snapshot, a
fast refresh of the MARKS snapshot, and a default refresh of the CODE
snapshot.
4.5 Job
Queues
Job queues are
event schedules that are specified by the user. They are used in a distributed
database configuration to enable automatic refresh of snapshots. These
processes wake up periodically and refresh any snapshots that are scheduled to
be automatically refreshed. If more than one job
queue process is
used, the processes share the task of
refreshing snapshots. These processes also execute job requests created
by the DBMS_JOB package and propagate queued messages to queues on other
databases.
The job queue
background processes (SNPn) handle message propagation. To enable propagation, at least one job
queue process must be started.
We can use job queues to schedule
periodic execution of PL/SQL code.
4.5.1 SNP Background Processes
A multi-process
Oracle system uses some additional processes called background processes in order to improve the performance and to
allow multiple users. Background processes consolidate functions that would
otherwise be handled by multiple Oracle programs running for each user
process.
Background processes asynchronously perform I/O and monitor other Oracle processes to provide increased parallelism for better performance and reliability.
SNP background
processes execute job queues. Any PL/SQL code can be scheduled to be performed
periodically using the job queue. To schedule a job we submit it to the job queue
and specify the frequency at which the job is to be run. Jobs can be altered,
deleted and disabled.
At least one SNP
process is required to be running to execute queued jobs in the background. SNP
processes periodically wake up and execute any queued jobs that are due to be
run. SNP background processes differ from other Oracle background processes, in
that the failure of an SNP
process does not cause the instance to fail. If an SNP process fails, Oracle
restarts it.
SNP background
processes will not execute jobs if the system has been started in restricted mode. The mode
can be turned on or off as follows.
ALTER
SYSTEM ENABLE RESTRICTED SESSION;
ALTER
SYSTEM DISABLE RESTRICTED SESSION;
In a restricted
session the jobs are not executed, we have to switch to the unrestricted mode
for the processes to run.
An Oracle instance can support a maximum of 36 SNP processes. The task of executing queued jobs can be shared among multiple SNP processes, which are named SNP0 to SNP9 and SNPA to SNPZ.
Job queue initialization parameters make it possible to control the operation of the SNP background processes. The parameters can be set in the init(sid).ora file or changed with a statement such as:
ALTER SYSTEM SET JOB_QUEUE_PROCESSES = 10;
The
JOB_QUEUE_INTERVAL parameter controls the frequency of wake up of the SNP
processes.
4.5.2 Managing Job Queues
To schedule and
manage jobs in the job queue, the procedures in the DBMS_JOB package are used.
Any user who can execute the job queue procedures can use the job queue. The
DBMS_JOB package provides the following procedures:
SUBMIT
REMOVE
CHANGE
WHAT
NEXT_DATE
INTERVAL
BROKEN
RUN
4.5.2.1 SUBMIT
Used to submit a new job to the job queue. The following parameters can be specified:
VARIABLE jobno NUMBER;
BEGIN
DBMS_JOB.SUBMIT ( :jobno,
'snapjob(''db2'',''sub'');',
SYSDATE, 'SYSDATE + 1');
COMMIT;
END;
/
jobno -- the job number assigned to the new job that is added to the queue.
snapjob -- the name of the procedure that is required to be executed periodically.
sysdate -- specifies the next time the job is to be run.
'sysdate + 1' -- an expression that calculates the interval between the updates.
Job Execution Mechanism
SNP background
processes execute jobs. To execute a job, the process creates a session to run
the job. When an SNP process runs a job, the job is run in the same environment
in which it was submitted and with the owner's default privileges. When a job
is forced to run using the procedure DBMS_JOB.RUN, the job is run by a user process. In this case it is run with the
user’s default privileges only.
4.5.2.2 REMOVE
Used to remove a
job from the job queue. The following statement removes job number (jno) from
the job queue
DBMS_JOB.REMOVE
(jno);
Restrictions
Currently executing jobs can be removed from the job queue; however, the job will not be interrupted and the current execution will be completed. You can remove only jobs you own. If you try to remove a job that you do not own, you receive a message that states the job is not in the job queue.
4.5.2.3 CHANGE
We can alter any
of the user-definable parameters associated with a job by calling the
DBMS_JOB.CHANGE procedure. In the following example, the job identified as
‘jno’ is now executed every n days…
DBMS_JOB.CHANGE(jno, null, null, 'SYSDATE
+ n');
If we specify
NULL for WHAT, NEXT_DATE, or INTERVAL when the procedure CHANGE is called, the
current value remains unchanged.
4.5.2.4 WHAT
The definition of a job is changed by calling the DBMS_JOB.WHAT procedure. The following example changes the definition of the job identified as jno:
DBMS_JOB.WHAT(jno, 'new_procedure_name;');
4.5.2.5 NEXT_DATE
The next date that Oracle executes a job
can be altered by calling the DBMS_JOB.NEXT_DATE procedure, as follows:
DBMS_JOB.NEXT_DATE(jno,
'SYSDATE + 1');
4.5.2.6
INTERVAL
The following
statement changes the execution interval for a job by calling the DBMS_JOB.INTERVAL
procedure:
DBMS_JOB.INTERVAL(jno, 'NULL');
In this case,
the job will not run again after it successfully executes.
4.5.2.7 BROKEN
Used to mark a job as broken or not broken.
A job is said to be broken if:
a. It has failed to execute for 16 consecutive attempts.
b. The job has been explicitly marked broken by using the DBMS_JOB.BROKEN procedure:
DBMS_JOB.BROKEN(jno, TRUE);
Oracle does not attempt to execute a broken job until you either mark the job as not broken, or force the job to be executed by calling the procedure DBMS_JOB.RUN.
A job can be marked as not broken and subsequently forced to
execute using …
DBMS_JOB.BROKEN(jno,
FALSE);
4.5.2.8 RUN
Used to manually
execute a job. To force a job to be executed immediately, the procedure RUN is
used. On running a job using
DBMS_JOB.RUN, Oracle recomputes the next execution date.
The following
statement runs job ‘jno’ in your session and recomputes the next execution date
DBMS_JOB.RUN (jno);
Terminating a Job
A running job is terminated by marking the job as broken,
identifying the session running the job, and disconnecting that session. The
job should be marked as broken, so that Oracle does not attempt to run the job
again.
A standby database is a database replica created from a backup of a
primary database. By applying archived redo logs from the primary database to
the standby database, we can keep the two databases synchronized.
Main purposes of standby database:
· Disaster protection
· Protection against data corruption
· Supplemental reporting
If the primary database is destroyed or its data becomes corrupted, we
can perform a failover to the standby database, in which case the standby
database becomes the new primary database. We can also open a standby database
with the read-only option, thereby allowing it to function as an independent
reporting database.
We can construct the standby database from backups of the primary
database control files and datafiles, and then prepare it for managed recovery.
This involves the following basic steps:
1. Make a backup of the primary datafiles (or access a previous backup) and create the standby control file.
2. Transfer the standby datafiles and control file to the standby site.
3. Configure Net8 so that we can connect to the standby service name.
4. Configure the primary and standby initialization parameter files, as sketched below.
5. Initiate automatic archiving on the primary site.
6. Start the standby instance without mounting it.
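A minimal sketch of the relevant initialization parameters, assuming the standby is reachable through a Net8 service name standby1 and hypothetical directory paths:
# primary init.ora
log_archive_start = true
log_archive_dest_1 = 'LOCATION=/oracle/arch'
log_archive_dest_2 = 'SERVICE=standby1'
log_archive_format = 'arch_%t_%s.arc'
# standby init(sid).ora
standby_archive_dest = '/oracle/stby/arch'
db_file_name_convert = ('/oracle/dbs', '/oracle/stby/dbs')
log_file_name_convert = ('/oracle/dbs', '/oracle/stby/dbs')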
A standby database environment refers to the
physical configuration of a primary database with one or more associated
standby databases. The main factors affecting configuration of the standby
database environment are described as follows.
Although a standby database can be synchronized with one and only one
primary database, a single primary database can support a theoretically
unlimited number of standby databases. These standby databases are separate and
independent, and can reside on multiple machines or on a single machine.
Each standby database must be created from a backup of the primary
database. This backup can be either consistent or inconsistent, open or closed.
We can also use a single backup of the primary database to create an
unlimited number of standby databases, although the various standby databases
in the environment do not have to be created from the same backup. For example,
we can create one standby database from a backup of the primary database taken
on some date and create another standby database from the backup taken on some
other date.
In a managed
standby environment, the primary database can automatically archive to a
maximum of four standby sites. Consequently, we can simultaneously run a
maximum of four standby databases in managed recovery mode in any given standby
database environment.
Although automatic archival limits the number of standby databases that
we can maintain in managed recovery mode, it does not limit the number of
standby databases that we can maintain in manual recovery mode.
We have two options for transferring the logs to the standby site:
· Configure the primary database to archive automatically to the standby site
· Transfer the logs manually
If we choose to implement a managed recovery environment, then we must
connect the primary and standby databases through Net8. To configure the
primary initialization parameter file for automatic archival to a standby site,
we need to specify a service name. To specify a service name, we must configure
network files such as tnsnames.ora, listener.ora, and possibly names.ora.
When a primary database archives to a standby service, Oracle automatically
transfers the archived redo logs through Net8 to a directory on the standby
site. As the primary database archives each log, Oracle automatically transfers
the new log to the standby site.
The primary database can continue to archive to the standby site even
if the standby database is not in standby recovery mode, but only if the
standby instance is started. The mechanism for recovery of a standby database
is independent of the mechanism for automatic transfer of archived redo logs to
the standby site. Consequently, we can take a standby database out of managed
recovery mode and temporarily place it in read-only mode. While the standby
database is in read-only mode, archived redo logs continue to accumulate at the
standby site.
Even if we configure the primary database to archive automatically to
the standby site, we can still transfer
the logs manually if the occasion requires it, but only if we have specified
the standby site as an optional destination .
In a non-managed recovery environment, we do not configure the primary
database to archive automatically to the standby site and so must transfer the
archived redo logs by hand. Because the primary database is not archiving
automatically to the standby site, we do not need to maintain a Net8
connection. For example, we can use an operating system utility such as UNIX cp or ftp to transfer the archived
redo logs to the standby site, connect to the standby database using operating
system authentication, then recover the database manually.
We need to know the number and configuration of the machines involved.
Of particular importance are whether:
· The primary and standby databases reside on the same host or on different hosts
· The primary and standby sites have identical or different directory structures
There is no theoretical limit to the number of machines on which the
standby databases reside. For example, we can locate a standby database:
· On the same host as the primary database
· On a different host in the same data center
· On a different host in a different data center, but in the same metropolitan area
· On a different host in a data center in a geographically remote location, for example, on a different continent
The location of hosts involved in the standby database environment has
obvious implications for a disaster recovery strategy. For example, if the
primary host in a data center is destroyed, then we cannot perform failover to
a standby database unless it resides on a different host, which may or may not
be in the same data center. In a worst case scenario, if the data center is
completely destroyed, then we cannot perform a failover to a standby database
unless the standby database is located on a different machine in a remote
location.
The directory structure of the various standby sites determines the
path names for the standby datafiles and redo logs. A standby site on the same
host as the primary database necessarily uses a different directory structure;
otherwise, the standby database attempts to overwrite the primary database
files.
We have three basic configuration options, which are illustrated in the
figure given below:
· A standby database on the same host as the primary database
· A standby database on a separate host that uses the same directory structure as the primary host
· A standby database on a separate host that uses a different directory structure from the primary host
The following table describes possible configurations of primary and
standby databases and the consequences of each:
Although these configurations are mutually exclusive, that is, a given
standby database must use only one configuration, we can run multiple standby
databases simultaneously for a given
primary database. Consequently, we
can implement any combination of configurations. For example, we can maintain
one standby database on the same host as the primary database, another standby
database on a separate host in the same data center, and a third standby
database on a separate host on the other side of the world. We can run each
standby in managed recovery mode, manual recovery mode, or read-only mode.
Although a managed standby environment is mostly automated, it is not
completely automated. In various situations, we may need to perform status
checks or maintenance on the standby database. The most common maintenance
operations are:
· Checking the Status of Archived Redo Logs
· Backing Up the Standby Database
· Responding to Physical Changes in the Primary Database
A standby database can be a powerful tool for both disaster prevention
and supplementary reporting. For example, we can:
· Maintain a standby database in a location that is geographically remote from the primary database, or maintain several standby databases in geographically diverse locations.
· Maintain the primary and standby databases on different disk drives of the same machine, so that if the primary database's drive fails, we can activate the standby database and resume normal operations.
· Implement a managed standby configuration, whereby a standby database automatically applies archived redo logs that are automatically shipped to the standby site by a primary database. In this way, changes to a primary database are regularly propagated to a standby database.
· Make a standby database the new primary database with minimal loss of time and data if the primary database is completely destroyed.
· Provide possible protection against erroneous batch jobs, user errors (for example, truncating the wrong table), or application corruptions on the primary database by not applying archived logs containing corrupt data to the standby database. We can then activate the uncorrupted standby database, making it the primary database.
While a standby database can be a tremendous benefit in our backup and recovery strategy, it involves costs as well. For example, a standby database requires:
· An additional computer if we want to maximize disaster prevention by maintaining a standby database on a separate host
· Implementation and maintenance of a Net8 connection if we use the managed standby environment
· Additional system resources and extra storage space no matter which implementation we choose
· Administration of the standby database to mirror some structural operations (for example, adding a tablespace or datafile) performed on the primary database