Objectives
Introduction
The Components of Backups
Scheduler
Transport
Characteristics of a Good Backup Strategy
Considerations for Developing a Backup Strategy
Commands
dump and restore
The tar Command
The dd Command
The mt Command
Compression Programs
Conclusions
Review Questions
Section 8
BACKUPS
This section aims to
- discuss the components of a backup strategy,
- introduce you to the characteristics of a good backup strategy,
- discuss the factors that affect a backup strategy,
- introduce the UNIX commands tar, dump, restore, dd, gzip and compress, and
- offer suggestions on creating a backup strategy for your system.
This is THE MOST IMPORTANT responsibility of the System Administrator. Backups MUST be made of all the data on the system. It is inevitable that equipment will fail and that users will "accidentally" delete files. There should be a safety net so that important information can be recovered.
It isn't just users who accidentally delete files
A friend of mine who was once the administrator of a UNIX machine (and shall remain nameless, but is now a respected Academic at CQU) committed one of the great no-nos of UNIX Administration.
Early on in his career he was carefully removing numerous old files for some obscure reason when he entered commands resembling the following (he was logged in as root when doing this).
cd / usr/user/panea      (notice the mistake)
rm -r *
The first command contained a typing mistake (the extra space) that means that instead of being in the directory /usr/user/panea he was now in the / directory. The second command says delete everything in the current directory and any directories below it. Result: a great many files removed.
The moral of this story is that everyone makes mistakes. Root users, normal users, hardware and software all make mistakes, break down or have faults. This means you must keep backups of any system.
There are basically three components to a backup strategy: the
- scheduler,
This is what decides when backups are performed and how much is backed up.
- transport, and
This is the program that is used to move the data to be backed up from the disk on which it resides to the media on which it will be saved. There are a number of different programs available on a UNIX system and you will be introduced to most of the more popular ones in this section.
- media.
The actual physical device on which the backup is made. Media might be floppy disks (very painful on anything but a trivial system), magnetic tapes (ranging from 80 MB up to 8 GB and beyond), write-once optical disks or even another hard disk drive.
The scheduler is the object that decides when backups should be performed and how much should be backed up. The scheduler could be the root user or a program, usually cron (discussed in a later section).
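As a rough sketch of the scheduler's job, the shell function below maps a day of the week to a kind of backup. The function name backup_type_for_day and the policy (a full backup on Sunday, incremental backups on every other day) are assumptions made purely for illustration; a real site would choose its own policy and would normally have cron run the script.

```shell
#!/bin/sh
# Hypothetical policy: full backup on Sunday, incremental on other days.
backup_type_for_day() {
    case "$1" in
        Sun) echo full ;;
        Mon|Tue|Wed|Thu|Fri|Sat) echo incremental ;;
        *) echo "unknown day: $1" >&2; return 1 ;;
    esac
}

# LC_ALL=C makes date print English day abbreviations (Sun, Mon, ...).
today=$(LC_ALL=C date +%a)
echo "Today is $today: run a $(backup_type_for_day "$today") backup"
```

Keeping the decision in one small function makes the policy easy to change without touching the commands that actually write the backup.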
The amount of information that the scheduler backs up falls into one of the following categories
- full backups,
All the information on the entire system is backed up. This is the safest type but also the most expensive in machine and operator time and the amount of media required.
- partial backups, or
Only the busier and more important file systems are backed up. One example of a partial backup might include configuration files (like /etc/passwd), user home directories and the mail and news spool directories. The reasoning is that these files change the most and are the most important to keep track of. In most instances this can still take substantial resources to perform.
- incremental backups.
Only those files that have been modified since the last backup are backed up. This method requires fewer resources, but a large number of incremental backups makes it more difficult to locate the version of a particular file you want.
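The difference between a full and an incremental backup can be sketched with find and tar on a scratch directory. Everything here is an assumption made for illustration: the directory and file names, the marker file last_backup that stands in for "the time of the last backup", and the -T option (read file names from a list), which GNU tar and BSD tar provide but which older versions of tar may lack.

```shell
#!/bin/sh
# Sketch: a full backup, then an incremental backup of files changed since.
work=$(mktemp -d)
mkdir "$work/data"
echo "report" > "$work/data/report.txt"
echo "notes"  > "$work/data/notes.txt"

# Back-date the files, take the full backup, then record when it was taken.
touch -t 202001010000 "$work/data/report.txt" "$work/data/notes.txt"
(cd "$work" && tar -cf full.tar data)
touch -t 202001020000 "$work/last_backup"

# A user edits one file after the full backup; its modification time is now.
echo "more notes" >> "$work/data/notes.txt"

# Incremental backup: only regular files newer than the marker file.
(cd "$work" &&
    find data -type f -newer last_backup -print > changed.list &&
    tar -cf incr.tar -T changed.list)

incr_contents=$(tar -tf "$work/incr.tar")
echo "incremental archive holds: $incr_contents"
```

To get a file back you may need both archives: the full backup for files that have not changed and the most recent incremental backup for those that have.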
The transport is a program that is responsible for placing the backed up data onto the media. There are quite a number of different programs that can be used as transports. Some of the standard UNIX transport programs are examined later in this section.
There are two basic mechanisms that are used by transport programs to obtain the information from the disk
- image, and
- through the file system.
Image Transports
An image transport program bypasses the file system and reads the information straight off the disk using the raw device file. To do this the transport program needs to understand how the information is structured on the disk. This means that transport programs are tied very closely to particular file systems, since different file systems structure information differently.
Once read off the disk the data is written byte by byte onto tape. Backups made this way are generally quicker than those made with the file by file method. However restoration of individual files generally takes much more time.
Transport programs that use this method include dd, volcopy and dump.
File by File
Commands performing backups using this method use the system calls provided by the operating system to read the information. Since almost any UNIX system uses the same system calls a transport program that uses the file by file method (and the data it saves) is more portable.
File by file backups generally take more time but it is generally easier to restore individual files. Commands that use this method include tar and cpio.
Media
Backups are usually made to tape based media. There are different types of tape. Tape media can differ in
- physical size and shape, and
- amount of information that can be stored.
Capacities range from 100 MB up to 8 GB.
Some types of media are also more reliable and efficient than others. The most common type of backup media used today is the 4 millimetre DAT tape.
Backup strategies change from site to site. What works on one machine may not be possible on another. There is no standard backup strategy. There are however a number of characteristics that need to be considered including
- ease of use,
- time efficiency,
- easy to restore files,
- ability to verify backups,
- tolerant of faulty media, and
- portable to a range of machines.
Easy To Use
If backups are easy to use, you will use them. AUTOMATE!! It should be as easy as placing a tape in a drive, typing a command and waiting for it to complete. The reading for this section contains a story about what happens if backups are seen to be too much work.
Time Efficiency
Obtain a balance to minimize the amount of operator, real and CPU time taken to carry out the backup and to restore files. The typical tradeoff is that a quick backup implies a longer time to restore files. Keep in mind that you will in general perform more backups than restores.
On some large sites particular backup strategies fail because there aren't enough hours in a day. Backups scheduled to occur every 24 hours fail because the previous backup still hasn't finished.
Easy to Restore Files
The reason for doing backups is so you can get information back. You will have to be able to restore information ranging from a single file to an entire file system. You need to know on which media the required file is and you need to be able to get to it quickly.
This means that you will need to maintain a table of contents and label media carefully.
Verified Backups
YOU MUST VERIFY YOUR BACKUPS. The safest method is once the backup is complete, read the information back from the media and compare it with the information stored on the disk. If it isn't the same then the backup is not correct.
Well that is a nice theory but it rarely works in practice. This method is only valid if the information on the disk hasn't changed since the backup started. This means the file system cannot be used by users while a backup is being performed or during the verification. Keeping a file system unused for this amount of time is not often an option.
Other quicker methods include
- restoring selected files from the start, middle and end of the backup,
If these particular files are retrieved correctly the assumption is that all of the files are valid.
- create a table of contents during the backup, afterwards read the contents of the tape and compare the two.
These methods also do not always work. Under some conditions and with some commands the two methods will not guarantee that your backup is correct.
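The two quicker checks can be sketched on a scratch directory as follows. The file names are made up for the example, and the archive is written to an ordinary file rather than a tape. As the text warns, passing both checks does not prove that every file in the backup is good.

```shell
#!/bin/sh
# Sketch: two quick checks on a freshly written archive.
work=$(mktemp -d)
mkdir -p "$work/data/sub" "$work/restore"
echo "first"  > "$work/data/a.txt"
echo "middle" > "$work/data/sub/b.txt"
echo "last"   > "$work/data/z.txt"

# Table of contents taken from the disk at backup time (regular files only).
(cd "$work" && find data -type f -print | sort > toc.disk && tar -cf backup.tar data)

# Check 1: the archive's own table of contents matches the one from disk.
# Directory entries are listed with a trailing "/", so they are filtered out.
(cd "$work" && tar -tf backup.tar | grep -v '/$' | sort > toc.tape)
if cmp -s "$work/toc.disk" "$work/toc.tape"; then toc_check=ok; else toc_check=FAILED; fi

# Check 2: restore one selected file and compare it with the copy on disk.
(cd "$work/restore" && tar -xf ../backup.tar data/sub/b.txt)
if cmp -s "$work/data/sub/b.txt" "$work/restore/data/sub/b.txt"; then
    file_check=ok
else
    file_check=FAILED
fi

echo "table of contents: $toc_check, sample file: $file_check"
```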
Fault Tolerant
A backup strategy should be able to handle
- faults in the media, and
- physical dangers.
There are situations where it is important that
- there exist at least two copies of full backups of a system, and
- that at least one set should be stored at another site.
Consider the following situation.
A site has one set of full backups stored on tapes. They are currently performing another full backup of the system onto the same tapes. What happens if the backup is happily churning away, gets about halfway and then crashes (the power goes off, the tape drive fails etc.)? This could result in both the tape and the disk drive being corrupted. Always maintain duplicate copies of full backups.
The Pauls ice-cream factory in Brisbane is located right on the river bank. During the early 1970s Brisbane suffered a major flood. Pauls' computer room was in the basement of their factory and was completely washed out. All the backups were kept in the computer room.
Portable
There may be situations where the data stored on backups must be retrieved onto a different type of machine. The ability for backups to be portable to different types of machine is often an important characteristic.
For example:
The computer currently being used by a company is the last in its line. The manufacturer is bankrupt and no-one else uses the machine. Due to unforeseen circumstances the machine burns to the ground. The Systems Administrator has recent backups available and they contain essential data for this business. How are the backups to be used to reconstruct the system?
Apart from the above characteristics, factors that may affect the type of backup strategy implemented will include
- the available commands
The characteristics of the available commands limit what can be done.
- available hardware
The capacity of the backup media to be used also limits how backups are performed. In particular how much information can the media hold?
- maximum expected size of file systems
The amount of information required to be backed up and whether or not the combination of the available software and hardware can handle it. A suggestion is that individual file systems should never contain more information than can fit easily onto the backup media.
- importance of the data
The more important the data is the more important that it be backed up regularly and safely.
- level of data modification
The more data being created and modified the more often it should be backed up. For example the directories /bin and /usr/bin will hardly ever change so they rarely need backing up. On the other hand directories under /home are likely to change drastically every day.
Reading.
UNIX System Administration Handbook (2nd Ed.) Chapter 11
Purpose.
To provide an overview of backup systems.
As with most things the different versions of UNIX provide a plethora of commands that could possibly act as the transport in a backup system. The following table provides a summary of the characteristics of the more common programs that are used for this purpose.
Command        Availability        Characteristics
dump/restore   BSD systems         image backup, allows multiple volumes,
                                   not included on most AT&T systems
tar            almost all systems  file by file, most versions do not support
                                   multiple volumes, intolerant of errors
cpio           AT&T systems        file by file, can support multiple volumes
                                   (some versions don't)
Table 8.1. The Different Backup Commands.
There are a number of other public domain and commercial backup utilities available which are not listed here.
A favourite amongst many Systems Administrators, dump is used to perform backups and restore is used to retrieve information from the backups.
These programs are of BSD UNIX origin and have not made the jump across to SysV systems. Most SysV systems do not come with dump and restore. The main reason is that dump and restore bypass the file system, so they must know how the particular file system is structured. This means you can't simply recompile a version of dump from one machine on another (unless they use the same file system structure).
Many recent versions of systems based on SVR4 (the latest version of System V UNIX) come with versions of dump and restore.
dump
The command line format for dump is
dump [ options [ arguments ] ] file system
dump [ options [ arguments ] ] filename
Arguments must appear after all options and must appear in a set order.
dump is generally used to backup an entire partition (file system). If given a list of filenames dump will back up the individual files.
dump works on the concept of levels (it uses levels 0 to 9). A dump level of 0 means that all files will be backed up. A dump level of 1 to 9 means that all files that have changed since the most recent dump of a lower level will be backed up.
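The rule for dump levels can be worked through with a small shell function. This is only a model of the rule described above, not part of dump itself, and the name reference_dump is made up for the example. Given the levels of the dumps made so far, it reports which earlier dump a new dump at a given level would be taken relative to.

```shell
#!/bin/sh
# reference_dump NEW_LEVEL LEVEL...
# The LEVEL arguments are the levels of the dumps already made, oldest first.
# Prints the position (1 = oldest) of the dump that a new dump at NEW_LEVEL
# is taken relative to, or 0 when nothing qualifies and every file is dumped.
reference_dump() {
    new_level=$1
    shift
    position=0
    reference=0
    for level in "$@"; do
        position=$((position + 1))
        if [ "$level" -lt "$new_level" ]; then
            reference=$position
        fi
    done
    echo "$reference"
}

# Example history: level 0 on Sunday, then dumps at levels 5, 5 and 3.
echo "level 5 next: relative to dump $(reference_dump 5 0 5 5 3)"
echo "level 3 next: relative to dump $(reference_dump 3 0 5 5 3)"
echo "level 0 next: relative to dump $(reference_dump 0 0 5 5 3)"
```

With this history a new level 5 dump saves only what has changed since the level 3 dump, while a new level 3 dump goes all the way back to the level 0 dump.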
Options          Purpose
0-9              dump level
a archive-file   archive-file will be a table of contents of the archive.
f dump-file      specify the file (usually a device file) to write the
                 dump to, a - specifies standard output
u                update the dump record (/etc/dumpdates)
v                after writing each volume rewind the tape and verify.
                 The file system must not be used during dump
                 or the verification.
Table 8.2. Arguments for dump
There are other options. Refer to the manual page for the system for more information.
At the time of writing Linux did not include the dump or restore commands. Versions for the ext2 family of file systems have since become available.
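For example:
dump 0dsbfu 54000 6000 126 /dev/rst2 /usr
A full (level 0) backup of the /usr file system onto a 2.3 Gig 8mm tape connected to device rst2. The numbers are special information about the tape drive the backup is being written to. The arguments are matched, in order, with the options that take them, so 54000 goes with d (tape density), 6000 with s (tape length), 126 with b (blocking factor) and /dev/rst2 with f.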
The purpose of the restore command is to extract files archived using the dump command. restore provides the ability to extract single individual files, directories and their contents and even an entire file system.
restore -irRtx [ modifiers ] [ filenames ]
The restore command has an interactive mode where commands like ls etc can be used to search through the backup.
Arguments   Purpose
i           interactive, directory information is read from the tape
            after which you can browse through the directory hierarchy
            and select files to be extracted.
r           restore the entire tape. Should only be used to restore an
            entire file system or to restore an incremental tape after
            a full level 0 restore.
t           table of contents, if no filename provided root directory is
            listed including all subdirectories (unless the h modifier
            is in effect)
x           extract named files. If a directory is specified it and all
            its sub-directories are extracted.
Table 8.3. Arguments for the restore Command.
Modifiers        Purpose
a archive-file   use an archive file to search for a file's location.
c                convert contents of the dump tape to the new file
                 system format
d                turn on debugging
h                prevent hierarchical restoration of sub-directories
v                verbose mode
f dump-file      specify dump-file to use, - refers to standard input
s n              skip to the nth dump file on the tape
Table 8.4. Argument modifiers for the restore Command.
tar is a general purpose command used for archiving files. It takes multiple files and directories and combines them into one large file. By default the resulting file is written to a default device (usually a tape drive). However the resulting file can be placed onto a disk drive.
tar -function[modifier] device [files]
When using tar each individual file stored in the final archive is preceded by a 512-byte header. The end of each file is also padded so that it finishes on a 512-byte block boundary. For this reason every file added to the tape archive carries on average about 0.75 KB of overhead: 512 bytes of header plus, on average, 256 bytes of padding.
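The per-file overhead can be worked out with a little shell arithmetic. The function name tar_member_size is made up for the example. It models only the 512-byte header and the rounding of the data up to whole 512-byte blocks, and ignores the end-of-archive blocks and any extra headers some versions of tar add for long file names.

```shell
#!/bin/sh
# tar_member_size BYTES: bytes one file of BYTES bytes occupies in an archive,
# i.e. one 512-byte header plus the data rounded up to whole 512-byte blocks.
tar_member_size() {
    data_blocks=$(( ($1 + 511) / 512 ))
    echo $(( 512 + data_blocks * 512 ))
}

for size in 0 1 512 513 10000; do
    stored=$(tar_member_size "$size")
    echo "$size bytes of data -> $stored bytes in the archive (overhead $((stored - size)))"
done
```

The overhead is largest for files just over a block boundary, which is why archives of many tiny files are noticeably bigger than the files themselves.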
Arguments   Purpose
function    A single letter specifying what should be done,
            values listed in Table 8.6
modifier    Letters that modify the action of the specified
            function, values listed in Table 8.7
files       The names of the files and directories to be
            restored or archived. If it is a directory then
            EVERYTHING in that directory is restored or archived
Table 8.5. Arguments to tar.
Function   Purpose
c          create a new tape, do not write after last file
r          replace, the named files are written onto the end
           of the tape
t          table, information about specified files is listed,
           similar in output to the command ls -l, if no
           files specified all files listed
u *        update, named files are added to the tape if they
           are not already there or they have been modified
           since being previously written
x          extract, named files restored from the tape, if the
           named file matches a directory all the contents
           are extracted recursively
* the u function can be very slow
Table 8.6. Values of the function argument for tar.
Modifier   Purpose
v          verbose, tar reports what it is doing and to what
w          tar prints the action to be taken, the name of the
           file and waits for user confirmation
f          file, use the next argument as the name of the
           archive (a file or a device) in place of the
           default device
m          modify, tells tar not to restore the modification
           times as they were archived but instead to use
           the time of extraction
o          ownership, use the UID and GID of the user running
           tar not those stored on the tape
Table 8.7. Values of the modifier argument for tar.
If the f modifier is used it must be the last modifier used. Also tar is an example of a UNIX command where the - character is not required to specify modifiers.
For example:
tar -xvf temp.tar    or    tar xvf temp.tar
extract all the contents of the tar file temp.tar
tar -xf temp.tar hello.dat
extract the file hello.dat from the tar file temp.tar
tar -cvf /dev/rmt0 /home
archive all the contents of the /home directory onto tape,
overwriting whatever is there
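The c, t and x functions can be tried safely on a scratch directory, writing the archive to an ordinary file instead of a tape. The directory and file names below are made up for the example.

```shell
#!/bin/sh
# Sketch: create, list and selectively extract with tar.
work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo "hello"  > "$work/project/hello.dat"
echo "readme" > "$work/project/docs/readme.txt"

# c = create, f = write the archive to the named file rather than a tape.
(cd "$work" && tar -cf project.tar project)

# t = list the table of contents.
(cd "$work" && tar -tf project.tar)

# Simulate an accidental deletion, then x = extract just that one file.
rm "$work/project/hello.dat"
(cd "$work" && tar -xf project.tar project/hello.dat)

restored=$(cat "$work/project/hello.dat")
echo "restored contents: $restored"
```

Note that the file is named on extraction exactly as it appears in the table of contents (project/hello.dat), which is why listing the archive first is a useful habit.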
Exercise 8-1. Create a file called temp.dat under a directory tmp that is within your home directory. Use tar to create an archive containing the contents of your home directory.
Exercise 8-2. Delete the $HOME/tmp/temp.dat created in the previous question. Extract the copy of the file that is stored in the tape archive (the term tape archive is used to refer to a file created by tar) created in the previous question.
The man page for dd lists its purpose as being "copy and convert data". Basically dd takes input from one source and sends it to a different destination. The source and destination can be device files for disk and tape drives or normal files.
The basic format of dd is
dd [option=value ...]
Table 8.8 lists some of the different options available.
Option        Purpose
if=name       input file name (default is standard input)
of=name       output file name (default is standard output)
ibs=num       the input block size in num bytes (default is 512)
obs=num       the output block size in num bytes (default is 512)
bs=num        set both input and output block size
skip=num      skip num input records before starting to copy
files=num     copy num files before stopping (used when input is
              from magnetic tape)
conv=ascii    convert EBCDIC to ASCII
conv=ebcdic   convert ASCII to EBCDIC
conv=lcase    make all letters lowercase
conv=ucase    make all letters uppercase
conv=swab     swap every pair of bytes
Table 8.8. Options for dd.
For example:
dd if=/dev/hda1 of=/dev/rmt4
with all the default settings copy the contents of hda1
(the first partition on disk a) to the tape drive for the system
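Since dd works just as well on normal files, its options can be tried without touching a disk or tape device. The sketch below uses made-up file names in a scratch directory and assumes a dd that supports conv=ucase, as the GNU and BSD versions do.

```shell
#!/bin/sh
# Sketch: dd as a plain copier and as a converter, using ordinary files.
work=$(mktemp -d)
printf 'hello backup\n' > "$work/source.txt"

# Plain copy in 512-byte blocks; dd reports its block counts on standard error.
dd if="$work/source.txt" of="$work/copy.img" bs=512 2>/dev/null

# Copy again, converting every letter to upper case on the way through.
dd if="$work/source.txt" of="$work/upper.txt" conv=ucase 2>/dev/null

if cmp -s "$work/source.txt" "$work/copy.img"; then
    copy_check=identical
else
    copy_check=different
fi
upper=$(cat "$work/upper.txt")
echo "copy is $copy_check; converted text: $upper"
```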
Exercise 8-3. Use dd to copy the contents of a floppy disk to a single file to be stored under your home directory.
The usual medium used for backups is magnetic tape. Magnetic tape is a sequential medium. That means that to access a particular file you must pass over all the tape containing files that come before the file you want. The mt command is used to send commands to a magnetic tape drive that control the location of the read/write head of the drive.
mt [-f tapename] command [count]
Arguments   Purpose
tapename    raw device name of the tape device
command     one of the commands specified in Table 8.10.
            Not all commands are recognised by all tape drives.
count       number of times to carry out command
Table 8.9. Parameters for the mt Command.
Commands    Action
fsf         move forward the number of files specified by
            the count argument
asf         move forward to file number count
rewind      rewind the tape
retension   wind the tape out to the end and then rewind
erase       erase the entire tape
offline     eject the tape
Table 8.10. Commands Possible using the mt Command.
For example:
mt -f /dev/nrst0 asf 3
move to file number 3 on the tape (the equivalent of a rewind followed by moving forward three files)
mt -f /dev/nrst0 rewind
mt -f /dev/nrst0 fsf 3
same as the first command
The mt command can be used to put multiple dump/tar archive files onto the one tape. Each time dump/tar is used one file is written to the tape. The mt command can be used to move the read/write head of the tape drive to the end of that file, at which time dump/tar can be used to add another file.
For example:
mt -f /dev/rmt/4 rewind
rewind the tape drive to the start of the tape
tar -cvf /dev/rmt/4 /home/jonesd
backup my home directory, after this command the tape will be automatically rewound
mt -f /dev/rmt/4n asf 1
move the read/write head forward to the end of the first file; the no-rewind device name (assumed here to be /dev/rmt/4n) is needed so that the tape stays where mt leaves it
tar -cvf /dev/rmt/4n /home/thorleym
backup the home directory of thorleym onto the tape after the first file
There are now two tar files on the tape, the first containing all the files and directories from the directory /home/jonesd and the second containing all the files and directories from the directory /home/thorleym.
Various compression programs are sometimes used in conjunction with transport programs to reduce the size of backups. This is not always a good idea. Adding compression to a backup adds extra complexity to the backup and as such increases the chances of something going wrong.
compress
compress is the standard UNIX compression program and is found on every UNIX machine (well, I don't know of one that doesn't have it). The basic format of the compress command is
compress filename
The file with the name filename will be replaced with a file with the same name but with an extension of .Z added, and that is smaller than the original (it has been compressed).
A compressed file is uncompressed using the uncompress command or the -d switch of compress.
uncompress filename or compress -d filename
For example:
bash$ ls -l ext349*
-rw-r----- 1 jonesd 17340 Jul 16 14:28 ext349
bash$ compress ext349
bash$ ls -l ext349*
-rw-r----- 1 jonesd 5572 Jul 16 14:28 ext349.Z
bash$ uncompress ext349
bash$ ls -l ext349*
-rw-r----- 1 jonesd 17340 Jul 16 14:28 ext349
gzip
gzip is a newer addition to the UNIX compression family. It works in basically the same way as compress but uses a different (and better) compression algorithm. Current versions add an extension of .gz (early versions, like the one shown below, used .z) and the program to uncompress a gzip archive is gunzip.
For example:
bash$ gzip ext349
bash$ ls -l ext349*
-rw-r----- 1 jonesd 4029 Jul 16 14:28 ext349.z
bash$ gunzip ext349
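A compression program is most often combined with a transport through a pipe. The sketch below, which uses made-up names in a scratch directory, has tar write its archive to standard output for gzip to compress, and then reads the archive back the same way.

```shell
#!/bin/sh
# Sketch: compressing a tar archive through a pipe and reading it back.
work=$(mktemp -d)
mkdir "$work/data"
# A repetitive file compresses well, which makes the saving easy to see.
i=0
while [ "$i" -lt 500 ]; do
    echo "the same line of text, repeated many times"
    i=$((i + 1))
done > "$work/data/log.txt"

# "f -" makes tar write the archive to standard output, which gzip compresses.
(cd "$work" && tar -cf - data | gzip > data.tar.gz)
# An uncompressed archive of the same directory, for comparison.
(cd "$work" && tar -cf data.tar data)

plain_size=$(wc -c < "$work/data.tar" | tr -d ' ')
packed_size=$(wc -c < "$work/data.tar.gz" | tr -d ' ')
echo "uncompressed: $plain_size bytes, compressed: $packed_size bytes"

# Reading it back: gunzip -c writes to standard output, tar reads from "f -".
listing=$(cd "$work" && gunzip -c data.tar.gz | tar -tf - | grep 'log.txt')
echo "archive contains: $listing"
```

The extra step is exactly the added complexity the text warns about: if the compressed stream is damaged, everything after the damage may be unreadable.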
Exercise 8-4. Modify your solution to exercise 8-3 so that instead of writing the contents of your floppy straight to a file on your hard disk it first compresses the file using either compress or gzip and then saves to a file.
In this section you have
- been introduced to the components of a backup strategy: scheduler, transport, and media
- been shown some of the UNIX commands that can be used as the transport in a backup strategy
- examined some of the characteristics of a good backup strategy and some of the factors that affect a backup strategy
8.1. Design a backup strategy for your system. List the components of your backup strategy and explain how these components affect your backup strategy.
8.2. Explain the terms media, scheduler and transport.
8.3. Outline the difference between file by file and image transport programs.
8.4. If possible contact a local business and discuss the backup strategy that they use. If possible identify whether that strategy meets some of the guidelines discussed in this section. (It is not necessary for them to be using UNIX.)
David Jones (author)
Chris Hanson (html 05/09/96)