Chapter 8: Fault-Tolerance Techniques

Learning Objectives

Describe different types of hardware failures
Plan and set up fault-tolerance methods for disk storage
Describe the built-in fault-tolerance features of some operating systems
Explain and set up UPS fault-tolerance options
Develop and implement a tape backup plan and rotation method
Develop a disaster recovery plan

Component Failures

Workstations
Servers
Switches
Repeaters
Bridges

Component Failures Definitions

Disk fragmentation is a normal and gradual process in which files become spread throughout a disk and empty spaces develop between files.
Defragmentation is a software process that rearranges data to fill in the spaces that develop on disks and makes data easier to obtain.
A power supply is the component in an electrical device that converts power from the wall outlet to the type and level of power required by the electrical device.
A backplane is a main circuit board in a modular computer or network device with plug-in connectors for the modular boards.

Fault Tolerance

Fault tolerance is the use of hardware and software to guard against equipment failures, computer service interruptions, and data loss.

Disk-Storage Fault Tolerance

One of the best data security measures is to plan for disk redundancy in servers and host computers in one of two ways:
Installing backup disks
Installing RAID drives

Installing Backup Disks

Disk mirroring prevents data loss by duplicating data from a main disk to a backup disk.
A small computer system interface (SCSI) adapter is a 32- or 64-bit computer adapter that transports data between the computer and attached device(s).
Disk duplexing also prevents data loss by duplicating data from a main disk to a backup disk, but it places the backup disk on a different controller or adapter from the one used by the main disk, so a controller failure does not take down both disks.
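
The difference between mirroring and duplexing can be sketched in a few lines of Python. This is an illustrative model only: Disk, MirroredSet, and the controller labels "scsi0" and "scsi1" are hypothetical names chosen for the example, not part of any operating system interface.

```python
class Disk:
    """A toy disk: block number -> data, attached to a named controller."""

    def __init__(self, name, controller):
        self.name = name
        self.controller = controller  # hypothetical adapter label
        self.blocks = {}
        self.failed = False

    def write(self, block_no, data):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        return self.blocks[block_no]


class MirroredSet:
    """Duplicates every write from the main disk to the backup disk."""

    def __init__(self, main, backup):
        self.main = main
        self.backup = backup

    def is_duplexed(self):
        # Duplexing = mirroring with the two disks on different controllers.
        return self.main.controller != self.backup.controller

    def write(self, block_no, data):
        written = 0
        for disk in (self.main, self.backup):
            try:
                disk.write(block_no, data)
                written += 1
            except IOError:
                pass  # keep going with the surviving disk
        if written == 0:
            raise IOError("both disks have failed")

    def read(self, block_no):
        try:
            return self.main.read(block_no)
        except IOError:
            return self.backup.read(block_no)  # fall back to the copy


mirror = MirroredSet(Disk("main", "scsi0"), Disk("backup", "scsi1"))
mirror.write(0, b"payroll")
mirror.main.failed = True      # simulate a main-disk failure
print(mirror.read(0))          # b'payroll'
print(mirror.is_duplexed())    # True
```

In this model the data survives the loss of the main disk either way; duplexing only changes which single hardware fault (a disk versus a shared controller) can make both copies unreachable.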

Disk Mirroring

Disk Duplexing

Installing RAID Drives

Redundant array of inexpensive disks (RAID) is a set of standards to extend the life of hard disk drives and to prevent data loss from a hard disk failure.
Striping is a data storage method that breaks up data files across all volumes of a disk set to minimize wear on a single volume.
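
As a rough illustration of striping, the sketch below deals fixed-size chunks of a file across volumes in round-robin order and then reassembles them. The function names and the tiny chunk size are assumptions made for the example; real RAID controllers work on disk blocks, and striping by itself adds no redundancy unless it is combined with mirroring or parity.

```python
def stripe(data, volumes, chunk_size=2):
    """Split data into chunks and deal them round-robin across volumes."""
    stripes = [[] for _ in range(volumes)]
    for offset in range(0, len(data), chunk_size):
        chunk_no = offset // chunk_size
        stripes[chunk_no % volumes].append(data[offset:offset + chunk_size])
    return stripes


def unstripe(stripes):
    """Reassemble the data by reading one chunk from each volume in turn."""
    chunks = []
    longest = max(len(volume) for volume in stripes)
    for row in range(longest):
        for volume in stripes:
            if row < len(volume):
                chunks.append(volume[row])
    return b"".join(chunks)


layout = stripe(b"ABCDEFGHIJ", volumes=3)
print(layout)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(layout))  # b'ABCDEFGHIJ'
```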

To Mirror a Drive
Open the Disk Administrator

Setting up a mirrored set

Server Fault Tolerance

Hard disk hot-fix methods
Transaction tracking
Directory replication
User account and security replication
Protection of the operating system from software application errors
Record locking

Hot-Fix Capabilities

A hot fix automatically stores data to an undamaged area of a disk when a damaged area prevents the data from being written.
In sector sparing, certain hard disk sectors are reserved so they can be used when a bad sector is discovered.
Cluster remapping flags a damaged cluster and finds an undamaged cluster on which to write data.
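
The hot-fix idea can be modeled as a small remapping table. The sketch below is a simplified illustration, not how any particular file system implements it: the HotFixDisk class is hypothetical, and it assumes the disk already knows which sectors are bad and that spare sectors are always good.

```python
class HotFixDisk:
    """Toy disk that redirects writes away from bad sectors to spares."""

    def __init__(self, data_sectors, spare_sectors, bad=()):
        self.store = {}                 # physical sector -> data
        self.bad = set(bad)             # sectors known to be damaged
        # Sector sparing: sectors past the data area are held in reserve.
        self.spares = list(range(data_sectors, data_sectors + spare_sectors))
        self.remap = {}                 # damaged sector -> spare sector

    def write(self, sector, data):
        target = self.remap.get(sector, sector)
        if target in self.bad:
            if not self.spares:
                raise IOError("no spare sectors left")
            target = self.spares.pop(0)   # take the next reserved sector
            self.remap[sector] = target   # remember the redirection
        self.store[target] = data

    def read(self, sector):
        return self.store[self.remap.get(sector, sector)]


disk = HotFixDisk(data_sectors=8, spare_sectors=2, bad={3})
disk.write(3, b"invoice")   # sector 3 is damaged, so the data goes to a spare
print(disk.remap)           # {3: 8}
print(disk.read(3))         # b'invoice'
```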

Transaction Tracking

Transaction tracking is a fault-tolerance method in which a log is kept of all recent transactions until they are written to disk.
If a hard disk or system failure occurs, unwritten transactions are recovered from the log.
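
A minimal sketch of transaction tracking follows. The TransactionLog class is a hypothetical name, and the example assumes the log itself survives the failure (in practice it would be kept on stable storage); the dictionary standing in for the disk is also an assumption.

```python
class TransactionLog:
    """Keeps each transaction in a log until it has been written to disk."""

    def __init__(self):
        self.pending = []   # logged transactions not yet on disk
        self.disk = {}      # stand-in for the data on disk

    def record(self, key, value):
        self.pending.append((key, value))

    def write_next(self):
        key, value = self.pending[0]
        self.disk[key] = value
        del self.pending[0]   # drop the entry only after the write completes

    def recover(self):
        # After a failure, replay every transaction still in the log.
        while self.pending:
            self.write_next()


log = TransactionLog()
log.record("acct-1", 100)
log.record("acct-2", 250)
log.record("acct-3", 75)
log.write_next()            # only the first transaction reaches the disk
# ... the system fails here, with two transactions still unwritten ...
log.recover()
print(log.disk)             # {'acct-1': 100, 'acct-2': 250, 'acct-3': 75}
```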

Directory Replication

A copy of information is available on the secondary server if the main server malfunctions.
Database reports can be generated on the secondary server without slowing down updates on the primary server.

User Account and Security Replication

The security accounts manager (SAM) database stores information about user accounts, groups, and access privileges on a Microsoft Windows NT server.
The Registry is a database used to store information about the configuration, program setup, devices, drivers, and other data important to the setup of a computer running Windows NT or Windows 95.

A primary domain controller (PDC) is an NT server that acts as the master server when there are two or more NT servers on a network. It holds the master database of user accounts and access privileges.
A backup domain controller (BDC) is an NT server that acts as a backup to the PDC. It has a copy of the security accounts manager database that contains information about user accounts and access privileges.

A standalone server is an NT server that is used as a special-purpose server, such as to store databases. It does no account log-on verification.

Backing Up the PDC with BDCs

Synchronizing the BDCs with the PDC

Protecting the Operating System

The Windows NT privileged mode is a protected area from which the operating system runs.
Direct access to the computer’s memory or hardware is allowed only from this mode.
Application programs that need to access memory and hardware issue requests to an operating system service rather than issuing direct memory or hardware instructions.

File and Record Locking

Locking is an operating system process that prevents more than one user from updating a file or a record in a file at the same time.
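
Record locking can be illustrated with a small lock table. This is a sketch under stated assumptions rather than an operating system mechanism: RecordLocks, the record numbers, and the user names are all hypothetical.

```python
import threading


class RecordLocks:
    """Allows only one user at a time to hold the lock on a record."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the table itself
        self._owners = {}                # record id -> user holding the lock

    def acquire(self, record_id, user):
        with self._guard:
            holder = self._owners.get(record_id)
            if holder is None or holder == user:
                self._owners[record_id] = user
                return True
            return False                 # someone else is updating the record

    def release(self, record_id, user):
        with self._guard:
            if self._owners.get(record_id) == user:
                del self._owners[record_id]


locks = RecordLocks()
print(locks.acquire(42, "alice"))   # True
print(locks.acquire(42, "bob"))     # False: alice is updating record 42
print(locks.acquire(43, "bob"))     # True: a different record is free
locks.release(42, "alice")
print(locks.acquire(42, "bob"))     # True
```

Locking at the record level rather than the file level lets users update different records in the same file at the same time.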

Using an Uninterruptible Power Supply for Fault Tolerance

An uninterruptible power supply (UPS) is a device built into electrical equipment or a separate device that provides immediate battery power to equipment during a power failure or brownout.

To Set the UPS Interface Parameters

Select the UPS icon on the Control Panel

Configuring the UPS setup interface

Setting the UPS service to start automatically

Developing a Backup Plan

Purchase a reliable tape system
Develop a regular backup schedule for network servers, hosts, and workstations

Connecting a Tape Drive to a Separate SCSI Adapter

Types of Backups

A full backup is a backup of an entire system, including all system files, programs, and data files.
An incremental backup is a backup of new or changed files.
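
The two backup types differ only in which files they select. The sketch below assumes each file's last-modified time is compared against the time of the previous backup; the function name, file paths, and timestamps are hypothetical, and some systems track changes with a per-file archive flag instead.

```python
def select_for_backup(files, last_backup_time=None):
    """Return the paths to copy to tape.

    files maps each path to its last-modified timestamp.
    With no last_backup_time the result is a full backup (every file);
    otherwise it is an incremental backup (new or changed files only).
    """
    if last_backup_time is None:
        return sorted(files)
    return sorted(path for path, modified in files.items()
                  if modified > last_backup_time)


# Hypothetical files with made-up timestamps.
files = {
    "system/kernel.sys": 100,
    "apps/editor.exe": 150,
    "data/budget.xls": 320,
    "data/memo.doc": 410,
}

print(select_for_backup(files))
# ['apps/editor.exe', 'data/budget.xls', 'data/memo.doc', 'system/kernel.sys']
print(select_for_backup(files, last_backup_time=300))
# ['data/budget.xls', 'data/memo.doc']
```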

Backing Up a Server Hard Disk

Backing Up Workstations

Consider network traffic

May be acceptable on small networks (10-15 workstations)

For large networks (several hundred workstations), consider purchasing tape drives to be rotated from workstation to workstation.

Keeping Spare Equipment on Hand

Reduces interruptions to network computing
Ensures that critical business or organizational functions suffer only brief interruptions due to equipment failure

Developing a Disaster Recovery Plan

Purchase computer operating systems with built-in fault tolerance
Implement disk storage redundancy
Implement tape backup scheme with tape rotation
Install a UPS
Store recent full backups at an offsite location
Purchase spare equipment
Formulate a written disaster recovery plan
Install additional cable