Chapter 8: Fault-Tolerance Techniques
Learning Objectives
- Describe different types of hardware failures
- Plan and set up fault-tolerance methods for disk
storage
- Describe the built-in fault-tolerance features of
some operating systems
- Explain and set up UPS fault-tolerance options
- Develop and implement a tape backup plan and
rotation method
- Develop a disaster recovery plan
Component Failures
- Workstations
- Servers
- Switches
- Repeaters
- Bridges
Component Failures Definitions
- Disk fragmentation is a normal and gradual process in
which files become spread throughout a disk and empty
spaces develop between files.
- Defragmentation is a software process that rearranges
data to fill in the empty spaces that develop on disks,
so that each file is stored contiguously and can be read
more quickly.
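The two definitions above can be illustrated with a toy model. This is a minimal sketch that assumes a disk is simply a list of blocks, each holding a file name or `None` for free space. The functions `fragment_count` and `defragment` are hypothetical names for illustration, and real defragmenters work on file-system metadata rather than a flat list.

```python
from typing import List, Optional

# Hypothetical model: a disk is a list of blocks, each holding a file name
# or None for a free block. Real file systems track this in metadata
# structures, not a flat list.
Disk = List[Optional[str]]


def fragment_count(disk: Disk, name: str) -> int:
    """Count the contiguous runs (fragments) of blocks owned by `name`."""
    runs = 0
    prev: Optional[str] = None
    for block in disk:
        if block == name and prev != name:
            runs += 1  # a new run of this file's blocks starts here
        prev = block
    return runs


def defragment(disk: Disk) -> Disk:
    """Return a rearranged disk in which each file's blocks are contiguous
    and all free blocks are gathered at the end. Files keep the order in
    which they first appear on the original disk."""
    order: List[str] = []
    sizes = {}
    for block in disk:
        if block is None:
            continue
        if block not in sizes:
            order.append(block)
            sizes[block] = 0
        sizes[block] += 1
    packed: Disk = []
    for name in order:
        packed.extend([name] * sizes[name])
    packed.extend([None] * (len(disk) - len(packed)))  # free space at the end
    return packed


disk = ["A", "B", None, "A", "C", None, "B", "A"]
print(fragment_count(disk, "A"))    # 3
packed = defragment(disk)
print(packed)                       # ['A', 'A', 'A', 'B', 'B', 'C', None, None]
print(fragment_count(packed, "A"))  # 1
```

File "A" starts in three separate fragments. After the rearrangement it occupies one contiguous run, and the free blocks form a single empty area.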
- A power supply is the component in an electrical device
that converts power from the wall outlet to the type and
level of power required by the electrical device.
- A backplane is a main circuit board in a modular computer
or network device with plug-in connectors for the modular
boards.
Fault Tolerance
- Fault tolerance is the use of hardware and software to
guard against equipment failures, computer service
interruptions, and data loss.
Disk-Storage Fault Tolerance
- One of the best data security measures is to plan for
disk redundancy in servers and host computers in one of
two ways:
- Installing backup disks
- Installing RAID drives
Installing Backup Disks
- Disk mirroring prevents data loss by duplicating data
from a main disk to a backup disk.
- A small computer system interface (SCSI) adapter is a 32-
or 64-bit computer adapter that transports data between
the computer and attached device(s).
- Disk duplexing also prevents data loss by duplicating
data from a main disk to a backup disk; however, it places
the backup disk on a controller or adapter different from
the one used by the main disk, so the data remains
available even if one controller fails.
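The difference between mirroring and duplexing comes down to where the single point of failure sits. The following sketch uses hypothetical in-memory `Disk`, `Controller`, and `MirroredSet` classes. Every write goes to both members of the set, and a read falls back to whichever copy is still reachable. If both disks are placed behind one controller, the sketch models mirroring. If they are placed behind two controllers, it models duplexing.

```python
from typing import List, Optional, Tuple


class Disk:
    """In-memory stand-in for a physical disk (hypothetical)."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.blocks: List[Optional[bytes]] = [None] * size
        self.failed = False

    def write(self, lba: int, data: bytes) -> None:
        if self.failed:
            raise OSError(f"disk {self.name} has failed")
        self.blocks[lba] = data

    def read(self, lba: int) -> Optional[bytes]:
        if self.failed:
            raise OSError(f"disk {self.name} has failed")
        return self.blocks[lba]


class Controller:
    """Stand-in for a disk controller/adapter: if it fails, every disk
    attached to it becomes unreachable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.failed = False

    def write(self, disk: Disk, lba: int, data: bytes) -> None:
        if self.failed:
            raise OSError(f"controller {self.name} has failed")
        disk.write(lba, data)

    def read(self, disk: Disk, lba: int) -> Optional[bytes]:
        if self.failed:
            raise OSError(f"controller {self.name} has failed")
        return disk.read(lba)


class MirroredSet:
    """Writes go to both members; reads use the first member that responds.
    Both members on one controller = mirroring; on two controllers = duplexing."""

    def __init__(self, *members: Tuple[Controller, Disk]) -> None:
        self.members = members

    def write(self, lba: int, data: bytes) -> None:
        written = 0
        for ctrl, disk in self.members:
            try:
                ctrl.write(disk, lba, data)
                written += 1
            except OSError:
                pass  # this member is down; the other copy still gets the data
        if written == 0:
            raise OSError("no member of the set is reachable")

    def read(self, lba: int) -> Optional[bytes]:
        for ctrl, disk in self.members:
            try:
                return ctrl.read(disk, lba)
            except OSError:
                continue  # fall back to the next copy
        raise OSError("no member of the set is reachable")


# Mirroring: both disks share one adapter. Setting shared.failed = True
# would make mirror.read(0) raise OSError, because both copies are cut off.
shared = Controller("adapter0")
mirror = MirroredSet((shared, Disk("d0", 8)), (shared, Disk("d1", 8)))
mirror.write(0, b"payroll")

# Duplexing: each disk has its own adapter, so one adapter can fail.
c0, c1 = Controller("adapter0"), Controller("adapter1")
duplex = MirroredSet((c0, Disk("d0", 8)), (c1, Disk("d1", 8)))
duplex.write(0, b"payroll")
c0.failed = True
print(duplex.read(0))  # b'payroll'
```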
[Figure: Disk mirroring]
[Figure: Disk duplexing]
Installing RAID Drives
- Redundant array of inexpensive disks (RAID) is a set of
standards to extend the life of hard disk drives and to
prevent data loss from a hard disk failure.
- Striping is a data storage method that breaks data files
into blocks and spreads them across all volumes of a disk
set, so that reads, writes, and wear are distributed
rather than concentrated on a single volume.
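Striping by itself only spreads data around. RAID levels that add parity (RAID 5, for example) are what allow a set to survive a failed disk. The sketch below uses a simple dedicated-parity layout, closer to RAID 4, whereas RAID 5 rotates the parity block among the disks. It deals fixed-size blocks round-robin across the data disks, stores the XOR of each stripe on an extra disk, and rebuilds a lost disk from the survivors. The function names, disk count, and block size are illustrative assumptions, not part of any RAID standard.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data: bytes, n_data: int, block_size: int) -> List[List[bytes]]:
    """Deal fixed-size blocks of `data` round-robin across `n_data` disks and
    store one XOR parity block per stripe on an extra disk (index n_data)."""
    stripe_len = n_data * block_size
    padded = data + b"\x00" * (-len(data) % stripe_len)  # pad to whole stripes
    disks: List[List[bytes]] = [[] for _ in range(n_data + 1)]
    for start in range(0, len(padded), stripe_len):
        chunk = [padded[start + i * block_size:start + (i + 1) * block_size]
                 for i in range(n_data)]
        for i, block in enumerate(chunk):
            disks[i].append(block)
        disks[n_data].append(xor_blocks(chunk))  # parity for this stripe
    return disks


def rebuild(disks: List[List[bytes]], lost: int) -> List[bytes]:
    """Recreate a failed disk: in every stripe, the XOR of all surviving
    blocks (data and parity) equals the missing block."""
    survivors = [d for i, d in enumerate(disks) if i != lost]
    return [xor_blocks(list(blocks)) for blocks in zip(*survivors)]


def unstripe(disks: List[List[bytes]], n_data: int, length: int) -> bytes:
    """Read the data disks back stripe by stripe and drop the padding."""
    joined = b"".join(blk for blocks in zip(*disks[:n_data]) for blk in blocks)
    return joined[:length]


data = b"FAULT-TOLERANT!"
disks = stripe(data, n_data=3, block_size=4)
disks[1] = []                         # simulate losing data disk 1
disks[1] = rebuild(disks, lost=1)     # regenerate it from the survivors
print(unstripe(disks, 3, len(data)))  # b'FAULT-TOLERANT!'
```

The design choice here is XOR parity. Any single missing block in a stripe equals the XOR of the remaining blocks, so the set can survive the loss of one disk at the cost of one disk's worth of capacity.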
To Mirror a Drive
[Figure: Open the Disk Administrator]
[Figure: Setting up a mirrored set]
Server Fault Tolerance
- Hard disk hot-fix methods
- Transaction tracking
- Directory replication
- User account and security replication
- Protection of the operating system from software
application errors
- Record locking
Hot-Fix Capabilities
- A hot fix automatically stores data to an undamaged area
of a disk when a damaged area prevents the data from
being written.
- In sector sparing, certain hard disk sectors are reserved
so they can be used when a bad sector is discovered.
- Cluster remapping flags a damaged cluster and finds an
undamaged cluster on which to write data.
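Sector sparing and cluster remapping work together in a hot fix. A pool of clusters is held in reserve, and a remap table silently redirects a damaged location to one of the spares. The sketch below is a minimal model under the following assumptions: the `HotFixVolume` class is hypothetical, and a set named `bad` simulates what real hardware or drivers would detect through failed writes or read-after-write verification.

```python
from typing import Dict, List, Set


class HotFixVolume:
    """Hypothetical volume with `spare_count` spare clusters reserved past the
    end of the data area (sector sparing) and a remap table for bad clusters."""

    def __init__(self, data_clusters: int, spare_count: int) -> None:
        self.storage: Dict[int, bytes] = {}   # physical cluster -> contents
        self.bad: Set[int] = set()            # physical clusters flagged as damaged
        self.remap: Dict[int, int] = {}       # logical cluster -> spare cluster
        self.spares: List[int] = list(range(data_clusters, data_clusters + spare_count))

    def _physical(self, logical: int) -> int:
        return self.remap.get(logical, logical)

    def write(self, logical: int, data: bytes) -> None:
        phys = self._physical(logical)
        while phys in self.bad:               # damaged: hot-fix the write
            if not self.spares:
                raise OSError("no spare clusters left")
            phys = self.spares.pop(0)         # take the next reserved cluster
            self.remap[logical] = phys        # later reads and writes go there
        self.storage[phys] = data

    def read(self, logical: int) -> bytes:
        return self.storage[self._physical(logical)]


vol = HotFixVolume(data_clusters=100, spare_count=4)
vol.bad.add(7)             # simulate a damaged cluster found during a write
vol.write(7, b"ledger")    # the write is redirected instead of failing
print(vol.remap)           # {7: 100}
print(vol.read(7))         # b'ledger'
```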
Transaction Tracking
- Transaction tracking is a fault-tolerance method in which
a log is kept of all recent transactions until they are
written to disk.
- If a hard disk or system failure occurs, unwritten
transactions are recovered from the log.
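One common way to implement transaction tracking is a redo log. Each change is appended to a log and forced to disk before the main data file is touched, and after a failure the log is replayed. The sketch below assumes this redo-style scheme. Other systems instead log the original data so that incomplete transactions can be backed out. The `TransactionLog` class and the JSON-lines log format are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict


class TransactionLog:
    """Minimal redo-log sketch: each change is appended to a log and forced to
    disk before the main data file is updated. After a crash, replaying the
    log recovers changes that never reached the data file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, txn_id: int, key: str, value: Any) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"txn": txn_id, "key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the entry is on disk before we continue

    def replay(self, store: Dict[str, Any]) -> Dict[str, Any]:
        if not os.path.exists(self.path):
            return store
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # half-written last entry from the crash: skip it
                store[entry["key"]] = entry["value"]
        return store

    def checkpoint(self) -> None:
        """Call after the data file is safely written: the log can be emptied."""
        open(self.path, "w", encoding="utf-8").close()


log = TransactionLog(os.path.join(tempfile.mkdtemp(), "txn.log"))
log.append(1, "acct:42", 150)
log.append(2, "acct:42", 90)
# ...system fails before the data file is updated...
recovered = log.replay({"acct:42": 200})  # 200 is the stale on-disk value
print(recovered)  # {'acct:42': 90}
```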
Directory Replication
- A copy of information is available on the secondary
server if the main server malfunctions.
- Database reports can be generated on the secondary server
without slowing down updates on the primary server.
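At its simplest, replication keeps a secondary copy of a primary server's directory tree current. The sketch below performs a one-way pass: it copies any file that is missing on the secondary or newer on the primary. The function `replicate_directory`, the temporary directories, and the `logon.bat` file are hypothetical stand-ins for the two servers and their shared data. The sketch uses the standard-library calls `os.walk` and `shutil.copy2`.

```python
import os
import shutil
import tempfile
from typing import List


def replicate_directory(export_dir: str, import_dir: str) -> List[str]:
    """One-way replication from a primary (export) directory tree to a
    secondary (import) copy: copy files that are missing on the secondary
    or newer on the primary. Returns the relative paths that were copied."""
    copied: List[str] = []
    for root, _dirs, files in os.walk(export_dir):
        rel_root = os.path.relpath(root, export_dir)
        dest_root = os.path.normpath(os.path.join(import_dir, rel_root))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_root, name)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 also copies the modification time
                copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied


primary, secondary = tempfile.mkdtemp(), tempfile.mkdtemp()
os.makedirs(os.path.join(primary, "scripts"))
with open(os.path.join(primary, "scripts", "logon.bat"), "w") as f:
    f.write("echo Welcome\n")

first = replicate_directory(primary, secondary)   # copies scripts/logon.bat
second = replicate_directory(primary, secondary)  # nothing changed
print(first, second)
```

Because `copy2` carries over the modification time, a second pass finds nothing newer and copies nothing. This sketch does not propagate deletions from the primary.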
User Account and Security Replication
- The security accounts manager (SAM) database stores
information about user accounts, groups, and access
privileges on a Microsoft Windows NT server.
- The Registry is a database used to store information
about the configuration, program setup, devices, drivers,
and other data important to the setup of a computer
running Windows NT or Windows 95.
- A primary domain controller (PDC) is an NT server that
acts as the master server when there are two or more NT
servers on a network. It holds the master database of
user accounts and access privileges.
- A backup domain controller (BDC) is an NT server that
acts as a backup to the PDC. It has a copy of the
security accounts manager database that contains
information about user accounts and access privileges.
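The PDC/BDC relationship can be pictured as one master copy of the account database plus copies that pull in changes. The sketch below assumes a numbered change log on the PDC, so each BDC only fetches changes it has not yet seen. The `PrimaryDC` and `BackupDC` classes, the record layout, and the `promote` method are hypothetical illustrations and do not describe Windows NT's actual replication protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]


@dataclass
class PrimaryDC:
    """Hypothetical PDC: the master account database plus a numbered change log."""
    accounts: Dict[str, Record] = field(default_factory=dict)
    changes: List[Tuple[int, str, Record]] = field(default_factory=list)

    def update(self, user: str, record: Record) -> None:
        self.accounts[user] = record
        self.changes.append((len(self.changes) + 1, user, record))

    def changes_since(self, serial: int) -> List[Tuple[int, str, Record]]:
        return [c for c in self.changes if c[0] > serial]


@dataclass
class BackupDC:
    """Hypothetical BDC: a copy of the account database kept current from the PDC."""
    accounts: Dict[str, Record] = field(default_factory=dict)
    last_serial: int = 0

    def synchronize(self, pdc: PrimaryDC) -> None:
        # Pull only the changes this BDC has not seen yet.
        for serial, user, record in pdc.changes_since(self.last_serial):
            self.accounts[user] = record
            self.last_serial = serial

    def promote(self) -> PrimaryDC:
        """If the PDC is lost, this copy can become the new master."""
        return PrimaryDC(accounts=dict(self.accounts))


pdc = PrimaryDC()
bdc1, bdc2 = BackupDC(), BackupDC()
pdc.update("alice", {"groups": ["Users"]})
pdc.update("bob", {"groups": ["Users", "Backup Operators"]})
for bdc in (bdc1, bdc2):
    bdc.synchronize(pdc)
pdc.update("alice", {"groups": ["Users", "Administrators"]})
bdc1.synchronize(pdc)                      # transfers only change number 3
print(bdc1.last_serial, bdc2.last_serial)  # 3 2
```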
- A standalone server is an NT server that is used as a
special-purpose server, such as a database server. It
does not perform account logon verification.
[Figure: Backing up the PDC with BDCs]
[Figure: Synchronizing the BDCs with the PDC]
Protecting the Operating System
- The Windows NT privileged mode is a protected area from
which the operating system runs.
- Direct access to the computer’s memory or hardware
is allowed only from this mode.
- Application programs that need to access memory and
hardware issue requests to an operating system service
rather than issuing direct memory or hardware
instructions.
File and Record Locking
- Locking is an operating system process that prevents more
than one user from updating a file or a record in a file
at the same time.
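Record locking prevents the classic lost update, where two users read the same value, each modifies it, and the second write overwrites the first. The following is an in-process analogy using Python threads and one `threading.Lock` per record. On a real file server, the operating system enforces locks across users and processes (on Unix, for example, through `fcntl` byte-range locks). The `RecordStore` class is hypothetical.

```python
import threading
from collections import defaultdict


class RecordStore:
    """Hypothetical in-memory 'file' of numeric records with one lock per
    record: two updates to the same record run one at a time, while updates
    to different records do not block each other."""

    def __init__(self) -> None:
        self.records = defaultdict(int)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of per-record locks

    def _lock_for(self, key: str):
        with self._guard:
            return self._locks[key]

    def add(self, key: str, amount: int) -> None:
        with self._lock_for(key):                  # lock the record, not the file
            current = self.records[key]            # read
            self.records[key] = current + amount   # modify and write back


store = RecordStore()


def deposit_many() -> None:
    for _ in range(10_000):
        store.add("acct:42", 1)


threads = [threading.Thread(target=deposit_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.records["acct:42"])  # 40000
```

Locking individual records instead of the whole file lets users work on different records at the same time. Only updates that touch the same record have to wait.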
Using an Uninterruptible Power Supply for Fault Tolerance
- An uninterruptible power supply (UPS) is a device built
into electrical equipment or a separate device that
provides immediate battery power to equipment during a
power failure or brownout.
To Set the UPS Interface Parameters
[Figure: Select the UPS icon on the Control Panel]
[Figure: Configuring the UPS setup interface]
[Figure: Setting the UPS service to start automatically]
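Once the UPS service is configured, its job is to react to a power-failure signal from the UPS. It warns users, waits, and shuts the server down cleanly before the battery runs out. The sketch below computes that schedule for an outage of a given length. The `UpsSettings` fields and their defaults are hypothetical and only loosely reflect the kinds of values a UPS configuration dialog asks for. They are not the actual Windows NT setting names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UpsSettings:
    """Hypothetical settings; names and defaults are illustrative only."""
    first_warning_s: int = 5       # delay before the first "on battery" warning
    warning_interval_s: int = 120  # time between repeated warnings
    battery_runtime_s: int = 600   # expected battery life at the current load
    shutdown_margin_s: int = 120   # battery time kept in reserve for a clean shutdown


def ups_actions(settings: UpsSettings, outage_s: int) -> List[Tuple[int, str]]:
    """Return the (seconds after the power failure, action) schedule the service
    would follow if power comes back after `outage_s` seconds."""
    actions: List[Tuple[int, str]] = []
    shutdown_at = settings.battery_runtime_s - settings.shutdown_margin_s
    t = settings.first_warning_s
    while t < min(outage_s, shutdown_at):
        actions.append((t, "warn users: server is on battery power"))
        t += settings.warning_interval_s
    if outage_s >= shutdown_at:
        actions.append((shutdown_at, "close open files and shut down the server"))
    else:
        actions.append((outage_s, "power restored: resume normal operation"))
    return actions


for t, action in ups_actions(UpsSettings(), outage_s=60):
    print(t, action)
# 5 warn users: server is on battery power
# 60 power restored: resume normal operation
```

With these example values, an hour-long outage produces warnings at 5, 125, 245, and 365 seconds. A shutdown follows at 480 seconds, which leaves two minutes of battery in reserve.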
Developing a Backup Plan
- Purchase a reliable tape system
- Develop a regular backup schedule for network servers,
hosts, and workstations
[Figure: Connecting a tape drive to a separate SCSI adapter]
Types of Backups
- A full backup is a backup of an entire system, including
all system files, programs, and data files.
- An incremental backup is a backup of only the files that
are new or have changed since the last backup.
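The difference between the two backup types comes down to which files a run selects. The sketch below assumes selection is based on each file's modification time compared with the time of the previous backup. Many Windows backup programs track this with each file's archive attribute instead. The function `select_files` and the sample paths and timestamps are hypothetical.

```python
from typing import Dict, List, Optional


def select_files(files: Dict[str, float], backup_type: str,
                 last_backup: Optional[float]) -> List[str]:
    """Pick the files a backup run should copy to tape.
    `files` maps each path to its last-modification time (seconds since the epoch);
    `last_backup` is when the previous backup (full or incremental) ran."""
    if backup_type == "full" or last_backup is None:
        # Full backup, or no earlier backup exists yet: copy everything.
        return sorted(files)
    if backup_type == "incremental":
        return sorted(p for p, mtime in files.items() if mtime > last_backup)
    raise ValueError(f"unknown backup type: {backup_type!r}")


files = {
    "apps/editor.exe": 1_650_000_000.0,
    "data/payroll.db": 1_700_000_500.0,
    "data/readme.txt": 1_600_000_000.0,
}
last_run = 1_700_000_000.0
print(select_files(files, "full", last_run))
# ['apps/editor.exe', 'data/payroll.db', 'data/readme.txt']
print(select_files(files, "incremental", last_run))
# ['data/payroll.db']
```

Incremental runs are fast and use little tape. The trade-off is that a full restore needs the last full backup plus every incremental taken since then, applied in order.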
[Figure: Backing up a server hard disk]
Backing Up Workstations
- May be acceptable on small networks
(10-15 workstations)
- For large networks (several hundred workstations),
consider purchasing tape drives to be rotated from
workstation to workstation.
Keeping Spare Equipment on Hand
- Reduces interruptions to network computing
- Ensures that critical business or organizational
functions suffer only brief interruptions due to
equipment failure
Developing a Disaster Recovery Plan
- Purchase computer operating systems with built-in
fault tolerance
- Implement disk storage redundancy
- Implement a tape backup scheme with tape rotation
- Install a UPS
- Store recent full backups at an offsite location
- Purchase spare equipment
- Formulate a written disaster recovery plan
- Install additional cable