
Objectives| Introduction| The Components of Backups| Scheduler| Transport| Characteristics of a Good Backup Strategy| Considerations for Developing a Backup Strategy| Commands| dump and restore| The tar Command| The dd Command| The mt Command| Compression Programs| Conclusions| Review Questions

Section 8


BACKUPS


Objectives


This section aims to

Introduction


This is THE MOST IMPORTANT responsibility of the System Administrator. Backups MUST be made of all the data on the system. It is inevitable that equipment will fail and that users will "accidentally" delete files. There should be a safety net so that important information can be recovered.


It isn't just users who accidentally delete files


A friend of mine who was once the administrator of a UNIX machine (and shall remain nameless, but is now a respected Academic at CQU) committed one of the great no-nos of UNIX Administration.

Early on in his career he was carefully removing numerous old files for some obscure reason when he entered commands resembling the following (he was logged in as root when doing this).

	cd / usr/user/panea		notice the mistake
	rm -r *	
The first command contained a typing mistake (the extra space), which meant that instead of being in the directory /usr/user/panea he was now in the / directory. The second command says delete everything in the current directory and in every directory below it. Result: a great many files removed.

The moral of this story is that everyone makes mistakes. Root users, normal users, hardware and software all make mistakes, break down or have faults. This means you must keep backups of any system.


The Components of Backups


There are basically three components to a backup strategy: the scheduler, the transport and the media.

Scheduler


The scheduler is the object that decides when backups should be performed and how much should be backed up. The scheduler could be the root user or a program, usually cron (discussed in a later section).

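The amount of information that the scheduler backs up falls into categories such as full backups, in which everything is backed up, and incremental backups, in which only the information that has changed since an earlier backup is backed up.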


Transport


The transport is a program that is responsible for placing the backed up data onto the media. There are quite a number of different programs that can be used as transports. Some of the standard UNIX transport programs are examined later in this section.

There are two basic mechanisms that transport programs use to obtain the information from the disk: image transports and file by file transports.


Image Transports


An image transport program bypasses the file system and reads the information straight off the disk using the raw device file. To do this the transport program needs to understand how the information is structured on the disk. This means that image transport programs are tied closely to particular file systems, since different file systems structure information differently.

Once read off the disk, the data is written byte by byte onto tape. This method generally makes backups quicker than the file by file method. However, restoring individual files usually takes much more time.

Transport programs that use this method include dd, volcopy and dump.


File by File


Commands performing backups using this method use the system calls provided by the operating system to read the information. Since almost every UNIX system provides the same system calls, a transport program that uses the file by file method (and the data it saves) is more portable.

File by file backups generally take more time, but it is easier to restore individual files. Commands that use this method include tar and cpio.


Media


Backups are usually made to tape-based media, and there are different types of tape. Tape media can differ in characteristics such as capacity and speed, and some types of media are more reliable and efficient than others. The most common type of backup media used today is the 4 millimetre DAT tape.


Characteristics of a Good Backup Strategy


Backup strategies change from site to site. What works on one machine may not be possible on another. There is no standard backup strategy. There are, however, a number of characteristics that need to be considered, including the following.

Easy To Use


If backups are easy to use, you will use them. AUTOMATE!! It should be as easy as placing a tape in a drive, typing a command and waiting for it to complete. The reading for this section contains a story about what happens if backups are seen to be too much work.


Time Efficiency


Obtain a balance to minimize the amount of operator, real and CPU time taken to carry out the backup and to restore files. The typical tradeoff is that a quick backup implies a longer time to restore files. Keep in mind that you will in general perform more backups than restores.

On some large sites particular backup strategies fail because there aren't enough hours in a day. Backups scheduled to occur every 24 hours fail because the previous backup still hasn't finished.


Easy to Restore Files


The reason for doing backups is so you can get information back. You will have to be able to restore anything from a single file to an entire file system. You need to know which media holds the required file, and you need to be able to get to it quickly.

This means that you will need to maintain a table of contents and label media carefully.
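One low-tech way to keep such a table of contents is to save the listing of each archive in a text file named after the label on its media, then search those listings when a restore is needed. The sketch below assumes file-by-file tar archives, uses archive files to stand in for tapes, and uses hypothetical labels (tape-mon, tape-tue) and file names; it illustrates the idea rather than a complete cataloguing system.

```sh
set -e
# Hypothetical scratch area; the two .tar files stand in for two labelled tapes.
work=$(mktemp -d)
mkdir -p "$work/data/reports" "$work/toc"
echo "monday"  > "$work/data/reports/mon.txt"
echo "tuesday" > "$work/data/reports/tue.txt"

cd "$work/data"
# Back up, then save each archive's listing as that tape's table of contents.
tar -cf "$work/tape-mon.tar" reports/mon.txt
tar -tf "$work/tape-mon.tar" > "$work/toc/tape-mon.toc"
tar -cf "$work/tape-tue.tar" reports/tue.txt
tar -tf "$work/tape-tue.tar" > "$work/toc/tape-tue.toc"

# Later: which tape holds reports/tue.txt?
# grep -l prints the names of the TOC files containing an exact (-x) match.
found=$(grep -lx 'reports/tue.txt' "$work/toc"/*.toc)
echo "reports/tue.txt is on: $(basename "$found" .toc)"
```

Keeping the listings on disk (and a copy with the media) means a restore can start by reading a small text file instead of scanning every tape.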


Verified Backups


YOU MUST VERIFY YOUR BACKUPS. The safest method is once the backup is complete, read the information back from the media and compare it with the information stored on the disk. If it isn't the same then the backup is not correct.

Well that is a nice theory but it rarely works in practice. This method is only valid if the information on the disk hasn't changed since the backup started. This means the file system cannot be used by users while a backup is being performed or during the verification. Keeping a file system unused for this amount of time is not often an option.
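The read-back-and-compare method can be sketched for a tar archive: extract the archive into a scratch directory and compare the result with the originals using diff -r. This sketch assumes a tar that supports the -C (change directory) option, as GNU and BSD tar do; the directory names are hypothetical, and it assumes nothing changes between the backup and the comparison, which is exactly the limitation described above.

```sh
set -e
# Hypothetical source data and scratch area.
work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/check"
printf 'alpha\n' > "$work/src/docs/a.txt"
printf 'beta\n'  > "$work/src/docs/b.txt"

# Make the backup (an archive file stands in for the tape).
tar -cf "$work/backup.tar" -C "$work/src" docs

# Read everything back into the scratch area and compare with the originals.
tar -xf "$work/backup.tar" -C "$work/check"
if diff -r "$work/src/docs" "$work/check/docs" > /dev/null; then
    status=verified
else
    status=MISMATCH
fi
echo "backup $status"
```

On a live file system a MISMATCH may only mean a file changed after it was archived, so the result has to be interpreted with that in mind.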

Other quicker methods include

These methods also do not always work. Under some conditions and with some commands the two methods will not guarantee that your backup is correct.


Fault Tolerant


A backup strategy should be able to cope with faults, whether they occur during the backup itself or affect the place where the backups are stored. Consider the following situations.

A site has one set of full backups stored on tapes. It is currently performing another full backup of the system onto the same tapes. What happens if the backup is happily churning away and, about halfway through, crashes (the power goes off, the tape drive fails, etc.)? This could result in both the tape and the disk drive being corrupted. Always maintain duplicate copies of full backups.

The Pauls ice-cream factory in Brisbane is located right on the river bank. During the early 1970s Brisbane suffered a major flood. Pauls' computer room was in the basement of the factory and was completely washed out. All the backups were kept in the computer room, so they were lost along with the computer. Keep at least one copy of your backups off-site.


Portable


There may be situations where the data stored on backups must be retrieved onto a different type of machine. The ability to restore backups onto different types of machine is therefore often an important characteristic.

For example:

The computer currently being used by a company is the last in its line. The manufacturer is bankrupt and no-one else uses the machine. Due to unforeseen circumstances the machine burns to the ground. The Systems Administrator has recent backups available and they contain essential data for this business. How are the backups to be used to reconstruct the system?

Considerations for Developing a Backup Strategy


Apart from the above characteristics, factors that may affect the type of backup strategy implemented will include
Reading.

UNIX System Administration Handbook (2nd Ed.) Chapter 11

Purpose.

To provide an overview of backup systems.


Commands


As with most things the different versions of UNIX provide a plethora of commands that could possibly act as the transport in a backup system. The following table provides a summary of the characteristics of the more common programs that are used for this purpose.
Command         Availability         Characteristics

dump/restore    BSD systems          image backup, allows multiple volumes,
                                       not included on most AT&T systems
tar             almost all systems   file by file, most versions do not
                                       support multiple volumes, intolerant
                                       of errors
cpio            AT&T systems         file by file, can support multiple
                                       volumes (some versions don't)

		Table 8.1. The Different Backup Commands.
There are a number of other public domain and commercial backup utilities available which are not listed here.


dump and restore


A favourite amongst many Systems Administrators, dump is used to perform backups and restore is used to retrieve information from the backups. These programs are of BSD UNIX origin and have generally not made the jump across to SysV systems; most SysV systems do not come with dump and restore. The main reason is that, because dump and restore bypass the file system, they must know how the particular file system is structured. So you simply can't take the source of dump from one machine and recompile it on another (unless both use the same file system structure).

Many recent versions of systems based on SVR4 (the latest version of System V UNIX) come with versions of dump and restore.


dump


The command line format for dump is
dump [ options [ arguments ] ] file system
dump [ options [ arguments ] ] filename
The arguments for any options must appear after all of the option letters, in the same order as the options they belong to.

dump is generally used to back up an entire partition (file system). If given a list of filenames, dump will back up the individual files.

dump works on the concept of levels (there are 10 levels, 0 to 9). A dump level of 0 means that all files will be backed up. A dump level of 1 to 9 means that all files that have changed since the last dump of a lower level will be backed up.
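The level rule can be sketched as a small calculation: for a dump at level N, find the most recent dump of any lower level, and back up everything changed after that time. dump itself keeps these dates in /etc/dumpdates (updated by the u option); the record format below, one "LEVEL EPOCH-SECONDS" pair per line, and the function name reference_time are hypothetical simplifications, not dump's real file format.

```sh
# Print the reference time (epoch seconds) for a dump at level $1,
# using the hypothetical record file $2. A result of 0 means no lower-level
# dump exists, so every file would be backed up (a full dump).
reference_time() {
    awk -v lvl="$1" 'BEGIN { max = 0 }
        $1 < lvl && $2 > max { max = $2 }
        END { print max }' "$2"
}

# Hypothetical history: a level 0, then a level 1, then a level 5 dump.
record=$(mktemp)
cat > "$record" <<'EOF'
0 1000
1 2000
5 3000
EOF

for level in 0 1 2 5 9; do
    echo "level $level: back up files changed after time $(reference_time "$level" "$record")"
done
```

Note that a level 5 dump here uses the level 1 date, not the earlier level 5 date, because only lower levels count. A real dump would then select files whose inodes changed after that reference time.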

	Options		Purpose

	0-9		dump level
	a archive-file	archive-file will be a table of contents of the archive.
	f dump-file	specify the file (usually a device file) to write the
			  dump to, a - specifies standard output
	u 		update the dump record (/etc/dumpdates)
	v		after writing each volume rewind the tape and verify.
			  The file system must not be used during dump
			  or the verification.

		Table 8.2. Arguments for dump
There are other options. Refer to the manual page for the system for more information.

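At the time of writing, Linux did not provide the dump or restore commands. (Versions of dump and restore for the Linux ext2 family of file systems have since become widely available.)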

For example:

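dump 0dsbfu 54000 6000 126 /dev/rst2 /usr
full backup of the /usr file system onto a 2.3 Gig 8mm tape connected to device rst2. The numbers are information about the tape drive the backup is being written on: they are the arguments of the d (tape density), s (tape length in feet) and b (blocking factor) options, given in the same order as those option letters, and /dev/rst2 is the argument of the f option.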

The restore/rrestore Command


The purpose of the restore command is to extract files archived using the dump command. restore provides the ability to extract single individual files, directories and their contents and even an entire file system.
restore -irRtx [ modifiers ] [ filenames ]
The restore command has an interactive mode in which commands such as ls and cd can be used to search through the backup.
	Arguments	Purpose

	i	interactive, directory information is read from the tape
		  after which you can browse through the directory hierarchy
		  and select files to be extracted.
	r	restore the entire tape.  Should only be used to restore an
		  entire file system or to restore an incremental tape after
		  a full level 0 restore.
	t	table of contents, if no filename provided root directory is
		  listed including all subdirectories (unless the h modifier
		  is in effect)
	x	extract named files.  If a directory is specified it and all
		  its sub-directories are extracted.

		Table 8.3. Arguments for the restore Command.


	Modifiers	Purpose

	a archive-file	use an archive file to search for a file's location.
	c		convert contents of the dump tape to the new file
			  system format
	d		turn on debugging
	h		prevent hierarchical restoration of sub-directories
	v		verbose mode
	f dump-file	specify dump-file to use, - refers to standard input
	s n		skip to the nth dump file on the tape

		Table 8.4. Argument modifiers for the restore Command.

The tar Command


tar is a general purpose command used for archiving files. It takes multiple files and directories and combines them into one large file. By default the resulting file is written to a default device (usually a tape drive), but it can also be written to an ordinary file on disk.
tar -function[modifier] device [files]
When using tar, each individual file stored in the final archive is preceded by a 512-byte header. Also, the file's data is always padded so that it ends on a 512-byte block boundary, which wastes 256 bytes per file on average. For this reason every file added to the tape archive carries on average an extra 0.75 KB (512 + 256 bytes) of overhead.
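That arithmetic can be sketched directly, assuming the traditional layout of one 512-byte header block followed by the file's data rounded up to whole 512-byte blocks. The function name tar_member_bytes is hypothetical, and the sketch ignores the end-of-archive blocks and the record padding that real tar implementations add to the archive as a whole.

```sh
# Bytes one file occupies inside a traditional tar archive:
# a 512-byte header plus the data rounded up to whole 512-byte blocks.
tar_member_bytes() {
    size=$1
    echo $(( 512 + (size + 511) / 512 * 512 ))
}

for size in 0 1 256 512 513 17340; do
    bytes=$(tar_member_bytes "$size")
    echo "$size-byte file: $bytes bytes in archive (overhead $((bytes - size)))"
done
```

For file sizes spread evenly over a block, the padding averages 256 bytes, which with the header gives the 0.75 KB figure above. Archives of many small files therefore grow noticeably compared with the data they hold.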
	Arguments	Purpose

	function	A single letter specifying what should be done,
			  values listed in Table 8.6
	modifier	Letters that modify the action of the specified
			  function, values listed in Table 8.7
	files		The names of the files and directories to be
			  restored or archived.  If it is a directory then
			  EVERYTHING in that directory is restored or archived

		Table 8.5. Arguments to tar.


	Function	Purpose

	c		create a new archive, writing from the start of
			  the tape rather than after the last file
	r		replace, the named files are written onto the end
			  of the tape
	t		table, information about specified files is listed,
			  similar in output to the command ls -l, if no
			  files specified all files listed
	u *		update, named files are added to the tape if they
			  are not already there or they have been modified
			  since being previously written
	x		extract, named files restored from the tape, if the
			  named file matches a directory all the contents
			  are extracted recursively
		*  the u function can be very slow

		Table 8.6. Values of the function argument for tar.


	Modifier	Purpose

	v		verbose, tar reports what it is doing and to what
	w		tar prints the action to be taken, the name of the
			  file and waits for user confirmation
	f		file, causes the device parameter to be treated as
			  a file
	m		modify, tells tar not to restore the modification
			  times as they were archived but instead to use
			  the time of extraction
	o		ownership, use the UID and GID of the user running
			  tar not those stored on the tape

		Table 8.7. Values of the modifier argument for tar.
If the f modifier is used it must be the last modifier used. Also tar is an example of a UNIX command where the - character is not required to specify modifiers.

For example:

	tar -xvf temp.tar		(or: tar xvf temp.tar)
	extract all the contents of the tar file temp.tar
	tar -xf temp.tar hello.dat
	extract the file hello.dat from the tar file temp.tar
	tar -cvf /dev/rmt0 /home
	archive all the contents of the /home directory onto tape,
	 overwriting whatever is there (without the f modifier, tar
	 would treat /dev/rmt0 as one more file to archive)
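The same create, list and extract cycle can be tried safely with an archive file instead of a tape, since the f modifier makes tar treat the device parameter as a file. This sketch assumes a modern tar (for example GNU or BSD tar) and uses hypothetical file names in a scratch directory.

```sh
set -e
# Hypothetical scratch directory standing in for a home directory.
work=$(mktemp -d)
cd "$work"
mkdir -p home/project
echo "hello" > home/hello.dat
echo "notes" > home/project/notes.txt

tar -cvf backup.tar home               # c: create, v: verbose, f: archive file name
tar -tvf backup.tar                    # t: list the contents, ls -l style

rm home/hello.dat                      # simulate an accidental deletion
tar -xvf backup.tar home/hello.dat     # x: extract only the named member
cat home/hello.dat
```

The member name given to x must match the name stored in the archive (here home/hello.dat, because the archive was created from the parent of home).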
Exercise 8-1. Create a file called temp.dat under a directory tmp that is within your home directory. Use tar to create an archive containing the contents of your home directory.

Exercise 8-2. Delete the file $HOME/tmp/temp.dat created in the previous exercise. Extract the copy of the file that is stored in the tape archive (the term tape archive is used to refer to a file created by tar) created in the previous exercise.


The dd Command


The man page for dd lists its purpose as being "copy and convert data". Basically dd takes input from one source and sends it to a different destination. The source and destination can be device files for disk and tape drives or normal files.

The basic format of dd is

	dd [option=value ...]
Table 8.8 lists some of the different options available.
	Option	Purpose

	if=name		input file name (default is standard input)
	of=name		output file name (default is standard output)
	ibs=num		the input block size in num bytes (default is 512)
	obs=num		the output block size in num bytes (default is 512)
	bs=num		set both input and output block size
	skip=num	skip num input records before starting to copy
	files=num	copy num files before stopping (used when input is
			  from magnetic tape)
	conv=ascii	convert EBCDIC to ASCII
	conv=ebcdic	convert ASCII to EBCDIC
	conv=lcase	make all letters lowercase
	conv=ucase	make all letters uppercase
	conv=swab	swap every pair of bytes

		Table 8.8. Options for dd.
For example:
	dd if=/dev/hda1 of=/dev/rmt4
	using the default settings (512-byte blocks), copy the contents
	 of /dev/hda1 (the first partition on disk a) to the system's
	 tape drive, /dev/rmt4
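Because dd treats device files and ordinary files alike, its options can be tried on ordinary files that stand in for the disk and tape. This sketch assumes a POSIX-style dd (for example GNU coreutils); the file names are hypothetical, and conv=ucase and skip= are included only to show how those options from Table 8.8 behave.

```sh
set -e
work=$(mktemp -d)
printf 'backup me\n' > "$work/in.dat"

# Plain copy with an explicit block size (bs sets ibs and obs together).
dd if="$work/in.dat" of="$work/copy.dat" bs=512
cmp "$work/in.dat" "$work/copy.dat" && echo "copy identical"

# Copy while converting lowercase letters to uppercase.
dd if="$work/in.dat" of="$work/upper.dat" conv=ucase

# With 1-byte blocks, skip=7 skips the first 7 bytes ("backup ").
dd if="$work/in.dat" of="$work/tail.dat" bs=1 skip=7

cat "$work/upper.dat" "$work/tail.dat"
```

dd reports the number of records read and written on standard error; when copying a real partition, a larger bs usually makes the copy much faster than the 512-byte default.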
Exercise 8-3. Use dd to copy the contents of a floppy disk to a single file to be stored under your home directory.


The mt Command


The usual medium used for backups is magnetic tape. Magnetic tape is a sequential medium: to access a particular file you must pass over all the tape containing the files that come before the file you want. The mt command is used to send commands to a magnetic tape drive that control the position of the drive's read/write head.
mt [-f tapename] command [count]


	Arguments	Purpose

	tapename	raw device name of the tape device
	command		one of the commands specified in Table 8.10.
			  Not all commands are recognised by all tape drives.
	count		number of times to carry out command

		Table 8.9. Parameters for the mt Command.



	Commands	Action

	fsf		move forward the number of files specified by
			 the count argument
	asf		move forward to file number count
	rewind		rewind the tape
	retension	wind the tape out to the end and then rewind
	erase		erase the entire tape
	offline		eject the tape

		Table 8.10. Commands Possible using the mt Command.
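For example:
mt -f /dev/nrst0 asf 3
skip past the first three files on the tape, leaving the head at the start of the next file (/dev/nrst0 is a no-rewind device, so the tape stays where mt leaves it)
mt -f /dev/nrst0 rewind
mt -f /dev/nrst0 fsf 3
together these two commands have the same effect as the first command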
The mt command can be used to put multiple dump/tar archive files onto the one tape. Each time dump/tar is used one file is written to the tape. The mt command can be used to move the read/write head of the tape drive to the end of that file, at which time dump/tar can be used to add another file.

For example:

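mt -f /dev/rmt/4 rewind
rewind the tape drive to the start of the tape
tar -cvf /dev/rmt/4 /home/jonesd
back up my home directory; because /dev/rmt/4 is a rewinding device, the tape is automatically rewound after this command
mt -f /dev/rmt/4n asf 1
move the read/write head forward to the end of the first file; this must use the no-rewind device (assuming Solaris-style naming, where an n suffix selects it), otherwise the tape would rewind again as soon as mt finished
tar -cvf /dev/rmt/4n /home/thorleym
back up the home directory of thorleym onto the tape, after the first archive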
There are now two tar files on the tape, the first containing all the files and directories from the directory /home/jonesd and the second containing all the files and directories from the directory /home/thorleym.


Compression Programs


Various compression programs are sometimes used in conjunction with transport programs to reduce the size of backups. This is not always a good idea. Adding compression to a backup adds extra complexity to the backup and as such increases the chances of something going wrong.


compress


compress is the standard UNIX compression program and is found on every UNIX machine (well, I don't know of one that doesn't have it). The basic format of the compress command is
	compress filename
The file filename is replaced by a compressed file with the same name plus the extension .Z, which is smaller than the original. (If compression would not make the file smaller, compress leaves it unchanged.) A compressed file is uncompressed using the uncompress command or the -d switch of compress.
	
	uncompress filename   or   compress -d filename
For example:
	bash$ ls -l ext349*
	-rw-r----- 1 jonesd      17340 Jul 16 14:28 ext349
	bash$ compress ext349
	bash$ ls -l ext349*
	-rw-r----- 1 jonesd       5572 Jul 16 14:28 ext349.Z
	bash$ uncompress ext349
	bash$ ls -l ext349*
	-rw-r----- 1 jonesd      17340 Jul 16 14:28 ext349

gzip


gzip is a newer addition to the UNIX compression family. It works in basically the same way as compress but uses a different (and better) compression algorithm. It adds the extension .gz, and the program used to uncompress a gzip-compressed file is gunzip (or gzip -d).

For example:

	bash$ gzip ext349
	bash$ ls -l ext349*
	-rw-r----- 1 jonesd    4029 Jul 16 14:28 ext349.gz
	bash$ gunzip ext349
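Compression is usually combined with a transport by piping the archive through the compressor rather than compressing files one at a time. This sketch assumes a tar that accepts - as the archive name for standard input/output and supports -C (as GNU and BSD tar do); the directory and file names are hypothetical.

```sh
set -e
work=$(mktemp -d)
mkdir -p "$work/src"
# Some compressible sample data (hypothetical).
awk 'BEGIN { for (i = 1; i <= 2000; i++) print i }' > "$work/src/numbers.txt"

# "-" as the f argument means standard output when creating.
tar -cf - -C "$work" src | gzip > "$work/src.tar.gz"

gzip -t "$work/src.tar.gz" && echo "gzip data intact"   # integrity check
gunzip -c "$work/src.tar.gz" | tar -tf -                  # list without unpacking
```

This is also where the extra risk mentioned above shows up: damage to a compressed stream can make the remainder of it unreadable, which is one reason an integrity check such as gzip -t is worth running on compressed backups.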
Exercise 8-4. Modify your solution to exercise 8-3 so that, instead of writing the contents of your floppy straight to a file on your hard disk, it first compresses the data using either compress or gzip and then saves it to a file.


Conclusions


In this section you have

Review Questions


8.1. Design a backup strategy for your system. List the components of your backup strategy and explain how these components affect your backup strategy.

8.2. Explain the terms media, scheduler and transport.

8.3. Outline the difference between file by file and image transport programs.

8.4. If possible contact a local business and discuss the backup strategy that they use. If possible identify whether that strategy meets some of the guidelines discussed in this section. (It is not necessary for them to be using UNIX.)



David Jones (author)
Chris Hanson (html 05/09/96)