Module 7: Managing Fault Tolerance
Fault tolerance is the ability of a computer or OS to respond to a catastrophic event, such as a power outage or a hardware failure, so that no data is lost and work in progress is not corrupted.
RAID provides fault tolerance by implementing data redundancy. The levels, numbered 0 through 5, are:
RAID 0 | Volume sets & disk striping without parity (just for study). |
RAID 1 | Disk mirroring / duplexing. |
RAID 2 | Disk striping with Error-Correction-Code (ECC). |
RAID 3 | Disk striping with ECC stored as parity. |
RAID 4 | Disk striping large blocks; parity stored on one drive. |
RAID 5 | Disk striping with parity distributed on multiple drives. |
Windows NT Server supports only RAID 0, RAID 1 and RAID 5. Windows NT Workstation supports only RAID 0, so it has no built-in fault tolerance.
Mirror Sets
- Mirror sets (RAID 1) use the NT Ftdisk.sys (fault tolerance driver) to simultaneously write the same data to two physical drives.
- NT Server configures fault tolerance at the level of the logical drive letter, not the physical disk level. If one physical disk holds two logical drives, you can choose to mirror just one of them.
- Mirror sets can enhance read performance because the fault tolerance driver reads from both members of a set at once. There can be a slight decrease in write performance. When one drive fails, performance returns to normal.
Disk Duplexing
- In disk duplexing, each disk in the mirror set has its own disk controller. This protects against the failure of a single controller. Disk duplexing is a hardware enhancement to an NT Server mirror set. No additional software configuration is necessary.
- Disk duplexing can reduce bus traffic because two controllers are involved.
- Keep in mind that only about 50% of the total capacity of both disks is usable. This doesn't make it cheap, but it is one of the cheapest fault tolerance methods.
Stripe Sets with Parity
- Stripe sets with parity (RAID 5) need at least 3 disks. Up to 32 disks are supported.
- Each stripe contains one parity block, and the parity block is placed on a different disk from one stripe to the next.
- After a failed disk is replaced, its data can be regenerated onto the new disk from the data and parity information in each stripe on the remaining disks.
- RAID 5 significantly improves read performance, but write performance is significantly slower than a stripe set without parity.
- RAID 5 offers a better cost advantage than mirror sets: the parity overhead is 1/n of the total space for n disks (about 33% with 3 disks, 25% with 4), compared with 50% in a mirror set.
- To restore fault tolerance, replace the failed disk, then in Disk Administrator open the Fault Tolerance menu and choose Regenerate.
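The regeneration step can be illustrated with a small sketch. It assumes the parity block is the bytewise XOR of the data blocks in a stripe, which is the usual RAID 5 scheme; these notes do not spell out how Ftdisk.sys computes parity, and the block contents and names below are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three disks: two data blocks plus one parity block.
data_a = b"NT4!"
data_b = b"RAID"
parity = xor_blocks([data_a, data_b])

# Simulate losing the disk that held data_b, then regenerate it
# from the surviving data block and the parity block.
regenerated_b = xor_blocks([data_a, parity])
print(regenerated_b)  # b'RAID'
```

Because XOR is its own inverse, any single missing block in a stripe can be rebuilt from the others, which is why the set survives one disk failure but not two.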
Breaking a Mirrored Set
When a member of a mirrored set fails, the functional member will continue to operate. To replace the failed disk the administrator must first break the mirror set by doing the following:
Remember: only mirror sets can provide fault tolerance for the boot and system partition.
Fault Tolerance Specifications
RAID 0 means NOT fault tolerant. Windows NT Server supports only RAID 0, 1 and 5. NT Workstation natively supports only RAID 0, so it has no built-in fault tolerance.
RAID Level 2 - Disk Striping with error correction code (ECC)
RAID Level 3 - Disk Striping with ECC stored as parity
RAID Level 4 - Disk Striping large blocks; parity stored on one drive
RAID Summary Chart
Feature | Volume Set | Mirror | Duplex | Disk Striping without Parity | Disk Striping with Parity |
RAID | 0 | 1 | 1 | 0 | 5 |
# Disks required | 1 (2-32 areas per volume) | 2, same controller | 2, not same controller | 2 | 3 |
Max # disks: 32 areas per vol. / 32 / 32
Contain system / boot partition | No | Yes | Yes | No | No |
Can be extended without data loss: Yes / No
Can be decreased without data loss: No / No
File systems: must be the same on all volumes and/or FAT, NTFS; can put multiple file systems together / NTFS only
Different types of hard disks together: Yes
Advantage | Disk space; best method without fault tolerance | Potential read performance gain | Reduced bus traffic and potential read performance gain; also protects against controller failure | I/O speed gain; fastest read/write performance of all disk sets | I/O speed gain; 2nd fastest read/write performance of all disk sets |
Disadvantage | No fault tolerance; no performance gain | Write performance; cost | Cost | No fault tolerance | Space; requires 3x more memory for parity calcs (AKA memory hog and space hog!) |
Supports removable media?: Can be done but not recommended unless you plan to use removable media as fixed disk / Can be done but not recommended unless you plan to use removable media as fixed disk / No
Paging file | Can be placed but no performance gains | Can be placed but no performance gains | Can be placed but no performance gains | Can be placed but no performance gains | Should not be implemented on; causes poor performance |
"Lose one you lose em all": Yup / Yup
Creating RAID 0, RAID 1, and RAID 5 Disks
Creating a Volume Set
- Select area of free space (or select a formatted partition with a drive letter assigned to it)
- Holding down the CTRL key, select a second area of free space (can be on same physical disk)
- Partition > Create Volume Set (for a formatted partition, "Extend Volume Set")
- Select Total size of volume set
- Volume set is created and Disk Administrator automatically assigns a drive letter to it
- Commit Changes now
Extending a Volume Set when a Volume Set is already created
- Select the Volume Set
- Holding down the CTRL key, select another area of free space (can be on same physical disk)
- Partition > Extend Volume Set
- Volume set is extended
- Commit Changes now
Creating a Stripe Set without Parity
- Select area of free space
- Holding down the CTRL key, select a second area of free space (must be on separate physical disk)
- Partition > Create Stripe Set
- Select Total size of Stripe set
- Stripe set is created and Disk Administrator automatically assigns a drive letter to it
- Commit Changes now
Creating a Disk Mirror / Duplex
- Select a formatted partition with a drive letter assigned to it.
- Holding down the CTRL key, select an area of free space (must be equal in size) (must be on separate physical disk)
- Fault Tolerance > Establish Mirror
- Mirror / Duplex is created (it uses the drive letter of the formatted partition selected in the first step)
- Commit Changes now
Creating a Stripe Set with Parity
Calculate the size
- 1/3 of total space is used to store parity in Disk Striping with Parity (3 disks) ; 1/4 of the total space is used to store parity in Disk Striping with parity (4 disks) etc.
- Disk striping with parity combines the available free space on three or more drives. The space used on each drive is limited to the smallest free area available on any of the drives.
- Select area of free space (on separate physical disk)
- Holding down the CTRL key, select a second area of free space (must be on separate physical disk)
- Holding down the CTRL key, select a third area of free space (must be on separate physical disk)
- Fault Tolerance > Create Stripe Set with Parity
- Commit Changes now
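The size calculation above can be sketched as follows. This is a rough illustration of the arithmetic only: the function name and the sample free-space figures are hypothetical, and the 3 to 32 disk limits are the ones stated in these notes.

```python
def stripe_set_with_parity_capacity(free_mb):
    """Return (total_mb, usable_mb, parity_mb) for one free area per disk.

    Each disk contributes only as much space as the smallest free area,
    and the equivalent of one disk's share is consumed by parity.
    """
    if not 3 <= len(free_mb) <= 32:
        raise ValueError("a stripe set with parity needs 3 to 32 disks")
    per_disk = min(free_mb)
    total = per_disk * len(free_mb)
    parity = per_disk
    return total, total - parity, parity


# Three disks with 500, 400 and 650 MB free: only 400 MB per disk is used.
print(stripe_set_with_parity_capacity([500, 400, 650]))  # (1200, 800, 400)
```

With three disks the parity share is 1/3 of the set; adding a fourth equal disk drops it to 1/4, which is the trend the bullets above describe.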
RAID 1 Failure: Disk Mirrors And Duplex Failure
Note: It looks easy enough, but recovery in RL (real life) is a whole different story. Disaster recovery is a very, very complex undertaking and demands careful planning and testing both at the hardware and software level -- do it -- before you actually have to!
Overview
- When the original member of a disk mirror or duplex set fails, NT automatically uses the other member of the set to continue operation.
- Whenever a member of a mirror or duplex set fails, you must replace the failed member and reestablish the mirror or duplex, to continue to have data protection.
- NT Server will be unable to reboot at all if the original set member fails, because the BOOT.INI file points to that member (it's necessary to hand-edit the ARC name in the boot file to point to the other member of the set to restart the system at all). You will have to boot with your Fault Tolerance boot disk to regain access to the system to run Disk Administrator utility to repair the set.
Fixing Broken Mirrors And Duplexes
RAID 5 Failure: Stripe Set with Parity
Overview
Fixing Failed Members of a Stripe Set
Windows NT Boot Disk - this is not the same disk as the Emergency Repair Disk
Windows NT Boot Disk Fixes - NT boot disk can access a drive that has NTFS or FAT file system installed. Boot disk useful for:
Note
To create a fault tolerance boot disk:
Intel x86-based computers | RISC-based computers |
Ntldr | Osloader.exe |
NTdetect.com | Hal.dll |
NTbootdd.sys (for small computer system interface (SCSI) disks not using a SCSI BIOS)* | *.pal (Alpha only) |
Boot.ini |
*The NTbootdd.sys file appears only on SCSI systems in which the SCSI BIOS is not used!
Variable | Value |
Osloader | Multi(0)disk(0)fdisk(0)\Osloader.exe |
Systempartition | Multi(0)disk(0)fdisk(0) |
Osloadpartition | Path to the secondary mirrored partition. |
osloadfilename | Path to the Windows NT Server root directory. |
Multi(0)disk(0)rdisk(1)partition(2)
Convention | Description |
Multi / scsi | Identifies the controller type. It can either be SCSI or some other type (multi). |
(x) | Identifies the hardware adapter (starts with 0). |
Disk(y) | SCSI bus number. For multi the value is always 0. |
Rdisk(z) | Disk number (ignored for SCSI controllers). Starts with 0. |
Partition(a) | Partition number. Starts with 1; this one is the oddball, all the rest start with 0. |
The SCSI ARC naming convention varies the disk() parameter for successive disks on one controller; the multi controller format varies the rdisk() parameter.
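To make the convention concrete, here is a minimal parsing sketch. The `ArcName` type and `parse_arc_name` function are hypothetical helpers, not part of Windows NT, and the pattern only covers hard disk paths of the form shown above (it does not handle the `fdisk()` floppy form).

```python
import re
from typing import NamedTuple


class ArcName(NamedTuple):
    controller: str  # "multi" or "scsi"
    adapter: int     # hardware adapter, starts with 0
    disk: int        # always 0 for multi
    rdisk: int       # ignored for scsi
    partition: int   # starts with 1


_ARC_RE = re.compile(
    r"^(multi|scsi)\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)$",
    re.IGNORECASE,
)


def parse_arc_name(text):
    """Split a hard disk ARC name into its five components."""
    match = _ARC_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a hard disk ARC name: {text!r}")
    controller, *numbers = match.groups()
    adapter, disk, rdisk, partition = (int(n) for n in numbers)
    if partition < 1:
        raise ValueError("partition numbers start with 1")
    return ArcName(controller.lower(), adapter, disk, rdisk, partition)


print(parse_arc_name("Multi(0)disk(0)rdisk(1)partition(2)"))
```

For the sample path this yields the second disk (rdisk 1) on the first multi controller, second partition.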
Example Boot.ini
How do I work with Advanced RISC Computer (ARC) names?
What are the two sections labeled [boot loader] and [operating systems] in the BOOT.INI (read only, system, hidden)?
1. The [boot loader] section supplies the timeout interval after which the default operating system (defined in the default= line that follows timeout) loads automatically. *Windows NT usually boots from the [boot loader] section.
2. The [operating systems] section supplies the complete menu of operating system choices that NTLDR displays after it loads. You can stop the timer before it elapses by pressing any arrow or letter key on the keyboard. Then you can wait as long as you like to make your menu selection. *Windows NT boots from the [operating systems] section if a deliberate change of OS is made.
Note: You need to make changes in both sections, especially if you are using disk mirroring or disk duplexing and one member fails.
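As a rough sketch of changing both sections together, the function below rewrites the ARC path in the default= line and in the menu entries so they point at the surviving mirror member. The sample file contents, the `\WINNT` directory, the menu text and the function name are all assumptions made for illustration; a real BOOT.INI will differ.

```python
# Hypothetical BOOT.INI contents, used only for this illustration.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"
"""


def retarget_boot_ini(text, old_arc, new_arc):
    """Point the default= line and every matching menu entry at new_arc."""
    prefix = "default="
    out = []
    for line in text.splitlines():
        lowered = line.lower()
        if lowered.startswith(prefix + old_arc.lower()):
            # [boot loader] section: keep everything after the old ARC path.
            line = prefix + new_arc + line[len(prefix) + len(old_arc):]
        elif lowered.startswith(old_arc.lower()):
            # [operating systems] section: the entry starts with the ARC path.
            line = new_arc + line[len(old_arc):]
        out.append(line)
    return "\n".join(out) + "\n"


print(retarget_boot_ini(
    SAMPLE_BOOT_INI,
    "multi(0)disk(0)rdisk(0)partition(1)",
    "multi(0)disk(0)rdisk(1)partition(1)",
))
```

Since BOOT.INI is read only, system and hidden, those attributes would have to be cleared before an edited copy could be saved over a real file.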
Remember
I. How do I determine if I use the Multi or SCSI parameter?
1. "SCSI" only applies to a SCSI drive whose onboard BIOS has been disabled (no BIOS translation capabilities).
2. If the NTBOOTDD.SYS file is on your system, use "SCSI".
If neither is the case, use "multi."
II. How do I determine the SCSI or Multi number?
What is a disk controller? A chip and associated circuitry that is responsible for controlling a disk drive. There are different controllers for different interfaces. For example, an IDE interface requires an IDE controller and a SCSI interface requires a SCSI controller.
Which disk controller is the drive we are looking for attached to?
A. If there is only 1 disk controller, the multi or SCSI number = 0.
B. If there are 2 disk controllers, the multi or SCSI number = 1, assuming the hard drive we are looking for is attached to the second controller. If it is attached to the first disk controller, the number = 0.
III. How do I determine the rdisk and disk number?
Part 1
1. If we are using "multi", "disk" will always be 0.
2. If we are using "scsi", "rdisk" will always be 0.
Part 2
A. disk = SCSI bus ID (usually 0 to 6) when "SCSI" is chosen.
B. rdisk = LUN (SCSI logical unit number) or position in disk chain when "multi" is chosen. Usually:
- The first hard disk = 0
- The second hard disk = 1
- The third hard disk = 2
- and so on unless specifically mentioned.
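The rules in sections I to III can be collected into one small helper. This is only a sketch of the rules as stated in these notes; the function name and its parameters are hypothetical.

```python
def build_arc_name(controller, adapter, disk_number, partition):
    """Assemble an ARC name from the rules above.

    controller  -- "multi" or "scsi"
    adapter     -- 0 for the first disk controller, 1 for the second
    disk_number -- SCSI bus ID for "scsi", position in the chain for "multi"
    partition   -- partition number, starting with 1
    """
    controller = controller.lower()
    if controller not in ("multi", "scsi"):
        raise ValueError("controller must be 'multi' or 'scsi'")
    if partition < 1:
        raise ValueError("partition numbers start with 1")
    if controller == "multi":
        disk, rdisk = 0, disk_number   # "disk" is always 0 for multi
    else:
        disk, rdisk = disk_number, 0   # "rdisk" is always 0 for scsi
    return f"{controller}({adapter})disk({disk})rdisk({rdisk})partition({partition})"


# Second hard disk on the first multi controller, second partition.
print(build_arc_name("multi", 0, 1, 2))  # multi(0)disk(0)rdisk(1)partition(2)
```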
Odds and Ends
NTFS support:
When to use NTFS:
NTFS Notables