Objectives
Introduction
The UNIX Boot Process
Single User, Multi-User and Different Run-Levels
System V Run Levels
The Startup Scripts
Why Won't My System Boot?
Solutions
Daemons
Shutting the System Down
Reasons for Shutting the System Down
Being Nice to the Users
Commands for Shutting Down and Rebooting
Conclusions
Review Questions
Section 9
STARTING UP AND SHUTTING DOWN
Objectives
By the end of this section you should be
- able to describe the process of booting a UNIX computer,
- able to explain the different formats of the initialisation scripts used by different versions of the UNIX operating system,
- able to explain the purpose of the init process,
- able to describe the format, contents and purpose of the file /etc/inittab,
- able to list and explain the different run levels a UNIX computer may be in,
- aware of the daemons that may be started,
- aware of the reasons and methods of shutting a UNIX computer down,
- able to explain why you shouldn't just turn a UNIX computer off, and
- aware of some of the problems that can prevent a UNIX computer from booting, and their solutions.
Introduction
Being a multi-tasking, multi-user operating system means that UNIX is a great deal more complex than an operating system like MS-DOS. Before the UNIX operating system can perform correctly there are a number of steps that must be followed and procedures executed. The failure of any one of these can mean that the system will not start, or if it does it will not work correctly. It is important for the Systems Administrator to be aware of what happens during system startup so that any problems that occur can be remedied.
It is also important for the Systems Administrator to understand the correct mechanism for shutting a UNIX machine down. A UNIX machine should (almost) never be just turned off. There are a number of steps to carry out to ensure that the operating system and many of its support functions remain in a consistent state.
By the end of this section you should be familiar with the startup and shutdown procedures for a UNIX machine and all the related concepts.
The UNIX Boot Process
The startup procedure of a UNIX machine is much more complex than that of a single user operating system. Different UNIX systems will have slightly different methods of coming up. The following discussion provides a general overview of the UNIX boot process. Diagram 9.1 provides a summary of the process.
ROM
Most machines have a section of read only memory (ROM) that contains a program the machine executes when the power first comes on. What is programmed into ROM will depend on the hardware platform.
For example, on an IBM PC the ROM program typically does some hardware probing and then looks in a number of predefined locations (the first floppy drive and the primary hard drive partition) for a bootstrap program.
On hardware designed specifically for the UNIX operating system (machines from DEC, Sun, etc.) the ROM program will be a little more complex. Many will present some form of prompt. Generally this prompt will accept a number of commands that allow the Systems Administrator to specify
- where to boot the machine from, and
Sometimes the standard root partition will be corrupt and the system will have to be booted from another device. Examples include another hard drive, a CD-ROM, floppy disk or even a tape drive.
- whether to come up in single user or multi-user mode.
As a bare minimum the ROM program must be smart enough to work out where the bootstrap program is stored and how to start executing it.
The Bootstrap Program
At some stage the ROM program will execute the code stored in the boot block of a device (typically a hard disk drive). The code stored in the boot block is referred to as a bootstrap program.
The bootstrap program is responsible for locating and loading the kernel of the UNIX operating system into memory. The kernel of a UNIX operating system is usually stored in the root directory of the file system under some system defined filename. Table 9.1 provides examples of the kernel filenames used by some systems.
Filename    System
/vmunix     BSD
/unix       SysV
/vmlinuz    Linux
Table 9.1. Kernel Names.
On some systems the bootstrap program may also perform some additional hardware probing.
Kernel Initialisation
Once the bootstrap program has loaded the kernel into memory the kernel will
- initialise its internal data structures,
- perform some further hardware checking (some systems check for every major device that is supposed to be connected),
- verify the integrity of the root file system and then mount it, and
- create process 0 (swapper) and process 1 (init).
The swapper process is actually part of the kernel and is not a "real" process. The init process is the ultimate parent of all processes that will execute on a UNIX system.
The only way in which a process can be created is by an existing process performing a fork. A fork creates a brand new process that contains copies of the code and data structures of the original process. In most cases the new process will then perform an exec that replaces the old code and data structures with that of a new program.
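The fork and exec steps described above can be seen from the Bourne shell itself. The following is a minimal sketch only; the function name demo_fork_exec is made up for this illustration, and it assumes an echo program can be found on the PATH.

```shell
#!/bin/sh
# Sketch: fork and exec as seen from the Bourne shell.

demo_fork_exec() {
    (
        # The parentheses make the shell fork: this block runs in a child
        # process that starts life as a copy of the parent shell.
        # exec then replaces the child's program with echo.
        exec echo "child: now running echo"
        echo "child: never reached, the shell code was replaced"
    )
    # The parent shell is untouched by the exec in the child.
    echo "parent: still running"
}

demo_fork_exec
```

The second echo inside the parentheses never runs because, after the exec, the child is no longer executing shell code.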
Once the kernel has initialised itself init will perform the remainder of the startup procedure.
init
init is the process that is the ultimate ancestor of all user processes on a UNIX system. It always has a PID of 1. Depending on its configuration init will place the system into either single user mode or multi-user mode (the normal state of a UNIX machine).
Going into multi-user mode the init process must execute the various system startup scripts. These startup scripts are Bourne shell scripts stored under the /etc directory. Different versions of UNIX have slightly different formats but the responsibilities are basically the same (they are outlined below).
Diagram 9.1. UNIX Startup Procedure.
User Logins
One of the last steps performed by init is to enable user logins. A user is able to login using a terminal because a getty process has been run for that terminal. What happens after the getty process will be discussed in the next section on terminals.
Single User, Multi-User and Different Run-Levels
All UNIX machines can be in one of two basic states
- multi-user mode, and
This is the standard mode for a UNIX machine. Multiple users are allowed to log in, all the daemons and all the services provided by the machine are available.
- single user mode.
This is basically the system maintenance mode. In single user mode only the bare minimum of services are available. Only one user (the root user) will be able to log in, only the root file system will be mounted automatically (others may be able to be mounted manually) and most of the daemons and services will not be available.
Whether or not the system comes up into single-user or multi-user mode depends on how the system was started up. On some systems the ROM program will have a switch that allows the Systems Administrator to specify single or multi-user mode.
When going into single user mode init forks to create a new process running the Bourne shell with root privilege. It does this prior to executing any of the initialisation scripts. This means that in single-user mode very few of the normal services on the machine are running, including normal logins, networking, the print services and many others.
The system will typically come up in single user mode for two reasons
- the root user wanted it to
There are times when you will want to bring the system up in single user mode to perform some system maintenance duties. Typically the root user will bring the system up in single user mode by entering a particular command at the ROM program prompt or by specifying a particular option at shutdown.
- the boot procedure has failed
Typically caused by some errors in the initialisation files or by fsck detecting some errors that it could not fix by itself.
When the Systems Administrator logs out of the shell prompt provided in single user mode the system will normally attempt to enter multi-user mode.
System V Run Levels
Later versions of System V based Unices added a number of different run levels that the machine could be in. Table 9.2 summarises the different run levels. At any one time the system must be in one of these run levels. There are various administrative commands that can take the system from one level to the other (the command who -r can be used to display the run level).
State   Function
0       prepare the machine for turning off power; if the machine can turn the power off, tell it to do so
1       system administrator mode, all file systems mounted, only a small set of kernel processes running, single user mode
2       multi-user mode
3       multi-user mode with remote file sharing, processes, and daemons
4       user definable system state
5       shutdown to ROM
6       shutdown and reboot
s,S     single-user mode, only root file system mounted
Table 9.2. init states for SVR4 UNIX Systems
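As a rough illustration of the who -r command mentioned above, the following sketch pulls the run level out of its output. The function name parse_run_level is invented for this example, and it assumes the output contains the word run-level followed by the level, which is the usual layout but may differ on your system.

```shell
#!/bin/sh
# Sketch: extract the run level from "who -r" style output read on stdin.
parse_run_level() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "run-level") { print $(i + 1); exit }
    }'
}

# Usage on a live system (prints nothing if who reports no run level):
who -r 2>/dev/null | parse_run_level
```

Reading from standard input keeps the parsing separate from the command, so the function can be tried on saved output from another machine.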
As the system boots it will move through the various run levels (s, 1, 2, 3) under the control of init. Each run level has associated with it various initialisation scripts that will be executed as the machine enters that run level.
On SysV based machines the procedures followed by init as it moves through these run levels are controlled by the file /etc/inittab.
/etc/inittab
Each line of the inittab file is a separate entry and uses the following format
id:run_state:action:process
Label       Explanation
id          one or two characters to uniquely identify the entry
run_state   indicates the run level at which the process should be executed
action      this tells init how to execute process
process     the full path of a program to execute for this entry
Table 9.3. Explanation of inittab field entries.
When init receives notification that a certain event has happened it will examine the /etc/inittab file and execute the process specified for that event. How it should execute the process is controlled by the action field.
Example values for action include
- respawn
If the process does not already exist create it but don't wait for it to terminate, carry on immediately. If at any stage the process terminates, restart it (the getty processes for terminals are started this way).
- wait
When init receives notification of the event for the first time it will execute the process and wait for the process to finish. The process is never executed again until the system receives notification of the event again (This is how the system will execute the initialisation scripts.)
- bootwait
Execute the process only when the system first goes multi-user and wait for the process to finish.
- powerfail
Execute only when init receives a power fail signal.
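To make the four fields concrete, the following sketch splits one entry on its colons. It is an illustration only: parse_inittab_line is a made-up name, and the sample entry is the s2 line from Figure 9.1.

```shell
#!/bin/sh
# Sketch: split one inittab entry into its four fields.
parse_inittab_line() {
    # Only the first three colons separate fields; anything after the
    # third colon (including further colons) belongs to the process field.
    printf '%s\n' "$1" | {
        IFS=: read -r id run_state action process
        printf 'id=%s\nrun_state=%s\naction=%s\nprocess=%s\n' \
            "$id" "$run_state" "$action" "$process"
    }
}

parse_inittab_line 's2:23:wait:/etc/rc2 > /dev/console 2>&1 < /dev/console'
```

Note that the run_state field of this entry is 23, meaning the entry applies to both run level 2 and run level 3.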
Figure 9.1 is an example of an /etc/inittab taken from a System V Release 4 machine.
mt:23:bootwait:/etc/brc < /dev/console > /dev/console 2>&1
p3:s1234:powerfail:/etc/shutdown -y -i0 -g0 > /dev/console 2>&1
s0:0:wait:/etc/rc0 off > /dev/console 2>&1 < /dev/console
s1:1:wait:/etc/rc1 > /dev/console 2>&1 < /dev/console
s2:23:wait:/etc/rc2 > /dev/console 2>&1 < /dev/console
.....
sac:234:respawn:/usr/lib/saf/sac -t 300
con:123:respawn:/etc/getty console console
Figure 9.1. Example /etc/inittab.
The Startup Scripts
As mentioned above, one of the major responsibilities of the init process is to execute the system's startup scripts. These scripts are executed after the kernel has been initialised but before normal users are allowed to log on. These scripts will typically
- check the integrity of the machine's file systems using fsck,
- mount the file systems,
- designate paging and swap areas,
- check disk quotas,
- clear out temporary files in /tmp and other locations,
- start up system daemons for printing, mail, accounting, system logging, networking and cron,
- enable user logins by running getty processes, and
- a number of other tasks.
Each basic version of UNIX has a different format for the startup scripts
- the BSD format, and
init will run predefined shell scripts usually called /etc/rc and perhaps /etc/rc.local.
- the System V format.
init reads the file /etc/inittab and as the system enters each associated run level it runs a specified shell script.
BSD Startup Scripts
Most BSD based systems will have at least the files
- /etc/rc, and
The system startup script that is executed as the system goes multi-user. It will typically run /etc/rc.local.
- /etc/rc.local.
The startup script that contains procedures deemed to be specific to your local site.
Some systems will add additional scripts, such as /etc/rc.boot and /etc/rc.single, that are run under various circumstances.
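The hand-over from /etc/rc to /etc/rc.local might look something like the following sketch. This is not taken from any particular system's rc file; run_rc_local is a made-up name, and a scratch directory created with mktemp stands in for /etc so that the sketch can be tried without touching the real startup scripts.

```shell
#!/bin/sh
# Sketch: how a BSD-style rc script might hand over to rc.local.
# etc_dir is a stand-in for /etc so the sketch can be tried safely.
run_rc_local() {
    etc_dir=$1
    if [ -f "$etc_dir/rc.local" ]; then
        sh "$etc_dir/rc.local"
    else
        echo "no rc.local, nothing site-specific to run"
    fi
}

# Try it against a scratch directory rather than the real /etc.
scratch=$(mktemp -d)
echo 'echo "starting local daemons"' > "$scratch/rc.local"
run_rc_local "$scratch"
rm -rf "$scratch"
```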
SysV Startup Scripts
Under SysV the /etc/inittab file is used to inform the init process which startup scripts it should execute. Each run level will have associated with it a particular startup script.
The filenames for these scripts generally follow the format /etc/rcL, where L is the single character that denotes the run level (taken from Table 9.2). On later versions these files may be located in the /sbin directory.
The purpose of these scripts is to execute all the shell scripts stored in the directory /etc/rcL.d, where L is again the run level.
For example:
When the system enters run level 3 init will execute the script /etc/rc3 (/sbin/rc3 on later machines). This script will in turn execute all the scripts in the directory /etc/rc3.d.
The rcL.d directories will contain scripts with filenames that either start with
- an uppercase K, or
The "K files" are used to kill processes.
- an uppercase S
The "S files" are used to start processes and other initialisation procedures.
The rcL script will execute all the "K files" in a directory in alphabetical order first and then execute all the "S files" in alphabetical order.
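The ordering of the "K files" and "S files" can be sketched as a pair of loops. The following is an illustration, not the rcL script of any real system. The name run_rc_dir and the three script names are hypothetical, and passing stop and start as arguments is a convention found on many SysV systems rather than something guaranteed everywhere.

```shell
#!/bin/sh
# Sketch: the core loop of an rcL script.
run_rc_dir() {
    rc_dir=$1
    # The shell expands each pattern in sorted order, which gives the
    # alphabetical ordering; the K files run before the S files.
    for script in "$rc_dir"/K*; do
        if [ -f "$script" ]; then sh "$script" stop; fi
    done
    for script in "$rc_dir"/S*; do
        if [ -f "$script" ]; then sh "$script" start; fi
    done
}

# Try it on a scratch directory holding three hypothetical scripts.
scratch=$(mktemp -d)
for name in S20lp K10cron S05mount; do
    echo "echo \"$name \$1\"" > "$scratch/$name"
done
run_rc_dir "$scratch"
rm -rf "$scratch"
```

Run as written, the sketch reports K10cron first, then S05mount, then S20lp. The digits after the K or S are what allow the Systems Administrator to control the order within each group.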
Exercise 9-1. Examine the startup scripts of your system. Which format does your machine use? List the names of all the startup scripts used. Add echo commands to each of the startup scripts to display an informative message when the scripts start and finish.
Exercise 9-2. Modify your system's startup files so that they display a message.
Why Won't My System Boot?
There will be times when you have to reboot your machine in a nasty manner. One rule of thumb used by Systems Administrators to solve some problems is "When in doubt, turn the power off, count to ten slowly, and turn the power back on". There will be times when the system won't come back to you. DON'T PANIC!
Possible reasons why the system won't reboot include
- hardware problems,
Both hardware failure and problems caused by human error (e.g. the power cord isn't plugged in, the drive cable is the wrong way around)
- defective boot floppies, drives or tapes,
- damaged file systems,
- improperly configured kernels,
A kernel configured to use SCSI drives won't boot on a system that uses an IDE drive controller.
- errors in the rc scripts.
Solutions
The following is a Systems Administration maxim
Always keep a separate working method for booting the machine into at least single user mode.
This method might be a boot floppy, CD-ROM or tape. The format doesn't matter. What does matter is that at any time you can bring the system up in at least single user mode so you can perform some repairs.
A separate mechanism to bring the system up in single user mode will enable you to solve most problems involved with damaged file systems, improperly configured kernels and errors in the rc scripts.
Hardware Problems
Some guidelines to solving hardware problems
- check the power supply and its connections,
Don't laugh, there are many cases I know of in which the whole problem was caused by the equipment not being plugged in properly or not at all.
- check the cables and plugs on the devices,
- check any fault lights on the hardware,
- power cycle the equipment (power off, power on),
There is an old Systems Administration maxim. If something doesn't work turn it off, count to 10 very slowly and turn it back on again (usually with the fingers crossed).
- try rebooting the system without selected pieces of hardware,
It may be only one faulty device that is causing the problem. Try isolating the problem device.
- use any diagnostic programs that are available, or as a last resort
- call a technician or a vendor.
Damaged File Systems
First, always have backups of all file systems so that you can quickly recover some information. Try using fsck to fix the problem. If worst comes to worst, resort to your backups.
Improperly Configured Kernels
Reasons why you might change the kernel will be discussed in a later section. When you do change the kernel you should always keep a backup working version of the kernel that you can use to reboot the system.
Daemons
A daemon is a process that spends much of its time waiting for some event to occur. Once the event occurs the daemon wakes up and performs some predefined action. The action is sometimes controlled by a configuration file. init itself can be classed as a daemon.
A standard UNIX system has a large number of daemons running at any one time. Most of these daemons have to be started by the initialisation scripts as the system first boots. Table 9.4 lists some of the common daemons on a UNIX machine.
Daemon          Purpose
inetd           the main network server
named           Name server, provides dynamic hostname data for TCP/IP networking
timed           Time daemon used to synchronise different system clocks
sendmail        the mail daemon, responsible for delivering mail locally and to remote hosts
nfsd            nfs file exporting daemon
ypbind ypserv   NIS (yellow pages) daemons
syslogd         System logging daemon, records various events.
Table 9.4. Various Daemons.
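One rough way to spot candidate daemons is to look for processes that have no controlling terminal. The sketch below assumes a ps that accepts the -e and -o options and shows "?", "??" or "-" in the terminal column for such processes; the function name no_terminal is made up. Treat the result as a starting point only, since not every process without a terminal is a daemon.

```shell
#!/bin/sh
# Sketch: pick out processes with no controlling terminal from
# "ps -e -o pid,tty,comm" style output read on stdin.
no_terminal() {
    # ps shows "?" (or "??" or "-" on some systems) in the TTY column
    # for a process with no controlling terminal. NR > 1 skips the header.
    awk 'NR > 1 && ($2 == "?" || $2 == "??" || $2 == "-") { print $3 }'
}

ps -e -o pid,tty,comm 2>/dev/null | no_terminal | sort -u
```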
Exercise 9-3. Using the ps command and by examining your system's startup scripts, discover which daemons are being run on your system.
Shutting the System Down
You should not just simply turn a UNIX computer off or reboot it. Doing so will usually cause some sort of damage to the system especially to the file system. Most of the time the operating system may be able to recover from such a situation (but NOT always).
There are a number of tasks that have to be performed for a UNIX system to be shut down cleanly
- tell the users the system is going down,
Telling them 5 seconds before pulling the plug is not a good way of promoting good feeling amongst your users. Wherever possible the users should know at least a couple of days in advance that the system is going down (there is always one user who never knows about it and complains).
- signal all the currently executing processes that it is time for them to die,
Hopefully these processes will all die gracefully (given some time) and will not do anything nasty to the system in the process.
- place the system into single user mode, and
- perform sync to flush the file system buffers so that the physical state of the file system matches the logical state.
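The tasks above can be laid out as a sequence of commands. The following sketch only prints the commands; it deliberately never runs them. The name plan_shutdown is invented for this illustration, wall is assumed to be available for broadcasting the warning, and sending init a TERM signal (discussed later in this section) may not work or be safe on all machines.

```shell
#!/bin/sh
# Sketch: print (never run) the commands for a manual clean shutdown.
plan_shutdown() {
    minutes=$1
    # 1. warn the users
    echo "echo 'System going down in $minutes minutes' | wall"
    echo "sleep $((minutes * 60))"
    # 2 and 3. ask init to stop user processes and enter single user mode
    #    (the TERM signal; not safe or supported on every system)
    echo "kill -15 1"
    # 4. flush the file system buffers
    echo "sync"
}

plan_shutdown 10
```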
All UNIX systems will supply a command or two that perform these tasks for you. System V and BSD based systems use a slightly different format.
Reasons for Shutting the System Down
In general, you should try to limit the number of times you turn a computer on or off as doing so involves some wear and tear. It is often better to simply leave the computer on 24 hours a day. In the case of a UNIX system being used for a mission critical application by some business it may have to be up 24 hours a day.
Some of the reasons why you may wish to shut a UNIX system down include
- general housekeeping,
Every time you reboot a UNIX computer it will perform some important housekeeping tasks, including deleting files from the temporary directories and performing checks on the machine's file systems.
Rebooting will also get rid of any zombie processes.
- general failures, and
Occasionally problems will arise for which there is only one resort, shutdown. These problems can include
- hanging logins,
- unsuccessful mount requests,
- dazed devices,
- runaway processes filling up disk space or CPU time and preventing any useful work being done.
- system maintenance and additions.
There are some operations that only work if the system is rebooted or if the system is in single user mode, for example adding a new device.
Being Nice to the Users
Knowing of the existence of the appropriate command is the first step in bringing your UNIX computer down. The other step is outlined in the heading for this section. The following command is an example of what not to do.
shutdown -g0
On a SVR4 box this results in a message somewhat like this appearing on the user's terminal
THE SYSTEM IS BEING SHUT DOWN NOW ! ! !
Log off now or risk your files being damaged.
Not a method inclined to win friends and influence people.
The following is a list of guidelines of how and when to perform system shutdowns
- shutdowns should be scheduled,
If users know the system is coming down at specified times they can organise their computer time around those times.
- perform a regular shutdown once a week, and
A guideline, so that the housekeeping tasks discussed above can be performed. If it's regular the users get to know when the system will be going down.
- use /etc/motd.
This is the message users see when they first log onto a system. Use it to inform users of the next scheduled shutdown.
Commands for Shutting Down and Rebooting
There are a number of different methods for shutting down and rebooting a system including
- the shutdown command
The most used method for shutting the system down. The command can display messages at preset intervals warning the users that the system is coming down.
- the BSD halt command
Logs the shutdown, kills the system processes, executes sync and halts the processor.
- the BSD reboot command
Similar to halt but causes the machine to reboot rather than halting.
- sending init a TERM signal
init will usually interpret a TERM signal (signal number 15) as a command to go into single user mode. It will kill off user processes and daemons. The command is kill -15 1 (init is always process number 1). It may not work or be safe on all machines.
- the BSD fasthalt or fastboot commands
Shell scripts which create a file /fastboot before calling halt or reboot. When the system reboots and it finds a file /fastboot it will not perform a fsck on the file systems.
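The /fastboot test could be sketched as follows. This is an assumption about how a startup script might be written, not a copy of any system's rc file. In particular, removing the marker once it has been seen is this sketch's choice, fsck_decision is a made-up name, and a scratch directory created with mktemp stands in for the root directory.

```shell
#!/bin/sh
# Sketch: the /fastboot test a BSD-style startup script might make.
# root_dir stands in for / so the sketch can be tried safely.
fsck_decision() {
    root_dir=$1
    if [ -f "$root_dir/fastboot" ]; then
        # Use the marker once, then remove it so the next boot checks again.
        rm -f "$root_dir/fastboot"
        echo "fast boot: skipping fsck"
    else
        echo "normal boot: running fsck"
    fi
}

scratch=$(mktemp -d)
touch "$scratch/fastboot"      # what fasthalt or fastboot would leave behind
fsck_decision "$scratch"       # first boot finds the marker
fsck_decision "$scratch"       # next boot does not
rm -rf "$scratch"
```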
The most used method will normally be the shutdown command. It provides users with warnings and is the safest method to use. AT&T and BSD have different shutdown commands.
The following sections talk about the format, purpose and options for various commands associated with the shutdown procedure. It is by no means an exhaustive discussion and many systems will have modified or made additions to the way in which these commands work.
The AT&T shutdown Command
The format of the command is
shutdown -ggrace_period -iinit_state [-y]
Parameter      Purpose
grace_period   The number of seconds to wait before beginning the shutdown (default of 60)
init_state     The init state (run level) to put the system into (one of the appropriate levels listed in Table 9.2.)
Table 9.5. Parameters of the SysV Shutdown Command.
In some cases the command will ask for confirmation just before performing the shutdown. This question can be pre-answered by using the -y option.
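As an illustration of how the parameters fit together, the following sketch builds a SysV shutdown command line and prints it without running it. The function name sysv_shutdown_cmd is invented for this example.

```shell
#!/bin/sh
# Sketch: build, but do not run, a SysV shutdown command line.
sysv_shutdown_cmd() {
    grace_period=$1     # seconds of warning before the shutdown begins
    init_state=$2       # a run level from Table 9.2
    echo "shutdown -g$grace_period -i$init_state -y"
}

sysv_shutdown_cmd 300 6     # five minutes' warning, then reboot
sysv_shutdown_cmd 60 0      # one minute's warning, then ready for power off
```

Note that there is no space between an option letter and its value.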
The BSD shutdown Command
The format of the command is
shutdown [-fhknr] time [warning-message]
The command actually calls on the halt command to perform the actual work of halting the system. shutdown just displays messages etc. before calling halt.
Parameters        Meaning
-f                flag to specify that file systems will not be checked on system restart (create the /fastboot file)
-k                simulate shutdown of the system, WON'T actually do it
-h                simply halt the system (the same as using the halt command)
-r                reboot the system (same as using the reboot command)
-n                don't execute the sync command before shutting down
time              has two formats, either
                    +number    system down in number minutes
                    hour:min   system down at time indicated in 24 hour format
                  Warning messages are displayed at periodic intervals and logins are disabled five minutes before shutdown
warning-message   message to display to the users
Table 9.6. Format of the BSD Shutdown Command.
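The two time formats can be illustrated with a sketch that builds a BSD shutdown command line and prints it without running it. The function name bsd_shutdown_cmd and the messages are made up for this example, and the pattern check on the time is only a rough one.

```shell
#!/bin/sh
# Sketch: build, but do not run, a BSD shutdown command line.
bsd_shutdown_cmd() {
    option=$1       # one of the flags from Table 9.6, for example -r
    when=$2         # +number or hour:min
    message=$3
    case $when in
        +[0-9]*|[0-9]*:[0-9][0-9]) ;;
        *) echo "time must be +number or hour:min" >&2; return 1 ;;
    esac
    echo "shutdown $option $when \"$message\""
}

bsd_shutdown_cmd -r +10 "Rebooting for a kernel upgrade"
bsd_shutdown_cmd -k 18:30 "Testing the warning messages only"
```

The -k form is a useful way to check what the users will see before scheduling a real shutdown.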
The BSD halt Command
The halt and reboot commands are actually used by the shutdown command. The halt command performs a sync on the disks and stops the processors. The format of the halt command is
halt [-nqy]
Parameters   Purpose
-n           don't sync the disks before stopping
-q           do a quick halt (create the /fastboot file so that the file systems are not fscked on reboot)
-y           halt the system
Table 9.7. Parameters of the BSD halt Command.
The reboot Command
The reboot command shuts the system down and then restarts it. Its format is
reboot [-dnq] [boot arguments]
Parameters       Meaning
-d               dump a copy of the kernel's memory (referred to as system core) before rebooting (some machines recognise this option but don't do anything)
-n               avoid the sync call
-q               reboot quickly and ungracefully
boot arguments   used to specify how the system should restart (single user mode for example); some machines ignore these arguments
Table 9.8. Parameters of the BSD reboot Command.
Conclusions
This section has provided
- an introduction to the processes performed when starting up and shutting down a UNIX computer,
- a knowledge of the commands, shell scripts and configuration files involved in both processes,
- an introduction to the problems that might cause a UNIX computer not to reboot and some of the possible solutions, and
- reasons why you would want to shut a UNIX computer down.
Review Questions
9.1. Describe (with diagrams) the process of starting a UNIX computer.
9.2. List three reasons why you might want to shut a UNIX computer down.
9.3. List four reasons why a UNIX computer may not reboot.
9.4. What steps have to be completed to shut a UNIX system down properly?
9.5. On a Linux machine there is usually a shell script /etc/rc.d/rc.0 that is executed when the machine is shut down. However, there is no entry in the /etc/inittab that tells init to run the script. How is the script executed?
Hints.
Think about the programs that are executed when a machine is being shut down.
There is a command, strings, that displays any text that may appear in an executable program.
David Jones (author)
Chris Hanson (html 08/09/96)