
Section 4 ADVANCED UNIX USE

Objectives

At the end of this section you will

be able to write shell programs that make use of shell functions,
become aware of how to use input/output redirection from within shells,
be introduced to the concepts of processes, programs and signals,
understand and use regular expressions to perform a number of tasks,
be able to make use of awk and understand some of its intricacies, and
be able to make use of sed.

In section 3 you were provided with an introduction to some simple UNIX commands and to the art of writing shell scripts. In this section you will encounter some of the more advanced commands and some of the advanced techniques that can be used in writing shell scripts.

Before getting started, let's be certain about the terminology used in relation to shell programs. Table 4.1 lists the terms that will be used throughout this course. Other people may use slightly different terms.

Term		Explanation

Shell program	an executable file that contains UNIX and shell
		 commands and can be interpreted by a UNIX shell
Shell script	same as a shell program
Shell procedure	same as a shell program
shell function	similar to a function in C or Pascal, only used
		 within a shell program (introduced below)

	Table 4.1. Terminology for shell scripts and functions.

Shell Functions

Any good programming language provides support for functions. Functions serve to group collections of commands that carry out an often required task. The Bourne shell programming language is no exception.

Actually that isn't quite true. Older versions of the Bourne shell actually didn't support functions. On most System V based UNIX operating systems this is no longer a problem as the Bourne shell has been updated. However some BSD based machines may have old versions of the Bourne shell that don't recognise shell functions. (bash supports them)

		Bourne Shell Function Syntax

	name()
	{
		command-list
	}

	Figure 4.1. Bourne Shell Function Syntax.

Section 3 introduced the predefined shell variables $0 $1...$9 and the special meaning that shell programs give them. The meaning of these variables changes when they are used inside a shell function.

Some of the differences include:

$1 $2... are used to represent any parameters passed to the function (not those passed to the shell program),
$0 still contains the name of the shell program,
$* is the list of all the parameters passed to the shell function.

There is only one pool of shell variables in a shell program. The Bourne shell does not support any notion of local variables. Any variable created in a shell function can be accessed from outside of that function.

For example:

Create a shell procedure containing the following code

#!/bin/sh

# an example shell function called hello
hello()
{
    # display the function's 1st param
    echo hello there $1
    # display all function parameters
    echo $*
    echo $0
    local_var="hello"
}

# call the function and pass to it all the
# program's parameters
hello $*
# show that there are no scope rules
echo "Local variable value = " $local_var
# call the function but pass it no parameters
hello

In keeping with the standard concept of functions, shell functions also have a return status. The return command indicates the shell function's return status. A successful shell function should return a value of 0 while a shell function that fails should return a non zero value.

For example:

temp()
{
  return 1
}

if temp
then
  echo succeeded
else
  echo failed
fi
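
The return status can also report the result of a test. The following is a minimal sketch, assuming a hypothetical function called is_readable (it is not part of the examples above). It passes back the exit status of the test command using return $?.

```shell
#!/bin/sh

# is_readable is a hypothetical helper used only for illustration
is_readable()
{
    # test exits with 0 if the file named by $1 exists and is readable
    test -r "$1"
    # hand that exit status back to the caller
    return $?
}

if is_readable /etc/passwd
then
    echo "/etc/passwd can be read"
else
    echo "/etc/passwd cannot be read"
fi
```

Because the function returns 0 for success, it can be used directly as the condition of an if or while statement, just like any other UNIX command.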

Exercise 4-1. Convert the shell script file_type from review question three from section 3 into a function. The function should be called file_type and it should set a variable type to the type of the file passed in (the function should only take one parameter).

An example use of the function would be
file_type /etc/passwd
echo "/etc/passwd is a $type."

Input/Output for Shell Programs

This section shows you how to

obtain interactive input from the user while a shell program is executing,
read input from a file while the shell program is running, and
use some of the advanced features of the echo command.

The read Command

The read command is used to take a line from standard input and place it into a list of shell variables. On versions of UNIX System V Release 2 or later the input to the read command can be redirected to come from a file.
```
		read Command Format.

	read variable-list

variable-list is a list of one or more shell variables.

The first word read is placed into the first variable, the second
 into the second and so on.  Any excess words are placed into the
 last shell variable.

read always returns an exit status of zero unless an end of file
 condition is met.  With interactive input from the keyboard this
 is when the CTRL-D key combination is used.

Figure 4.2. read Command Format.
```
For example:
Using the following shell program called input
```
  read first second
  echo "Variable first contains " $first
  echo "Variable second contains " $second
```
Some example runs of input
```
  input					run the shell script
  hello there my friend			input some data
  Variable first contains hello
  Variable second contains there my friend
  input
  hello
  Variable first contains hello
  Variable second contains 
```
Assume there exists a file called input_file with the following contents
hello there
The following command line runs the script with the above file as input rather than using standard input (the keyboard)
```
  input < input_file
  Variable first contains hello
  Variable second contains there
```
Assume the following is in a shell procedure
```
  while read whole_line
  do
    echo I just got the line $whole_line
  done < /etc/passwd
```
This shell procedure will go through the /etc/passwd file a line at a time. The read command continues to succeed until end of file is reached.
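
The two ideas above, splitting a line into several variables and redirecting a file into a loop, can be combined. The following is a minimal sketch only. The function name show_fields and the sample file name are assumptions made for illustration.

```shell
#!/bin/sh

# show_fields is a hypothetical function name used for illustration.
# It reads the file named by $1 a line at a time, placing the first
# word in "first" and any remaining words in "rest".
show_fields()
{
    while read first rest
    do
        echo "first=$first rest=$rest"
    done < "$1"
}

# build a small sample file and run the function over it
sample=/tmp/read_demo.$$
echo "hello there my friend" > "$sample"
echo "hello" >> "$sample"
show_fields "$sample"
rm -f "$sample"
```

For the second line of the sample file there are no excess words, so rest is left empty.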
Exercise 4-2. It is often the case that you need to ask a user whether or not to do something. Write a shell function called ok that displays the following message
(y,n) ==>
then waits for the user to enter a value. The function should set the value of the shell variable response using the following guides
- if the user enters y (or yes Yes YES etc) it should contain y
- if the user enters n (or no No nO etc) it should contain n
- if the user enters anything else it should contain ERROR
Exercise 4-3. Assume the existence of a file called amounts in which each line contains two numbers (you don't know how many lines it will have). For example,
5 15
65 40
Write a shell procedure calculate that uses the file amounts as input and produces a file totals with the following format
5 + 15 = 20
65 + 40 = 105

Extensions to the echo Command

By default the echo command always ends its output with a newline. There is a way of circumventing this. As of the System III version of UNIX, special echo escape characters were added. These escape characters are listed in Table 4.2.
Using \c to suppress the terminating newline character will only work on some versions of UNIX. The alternative to using \c on most systems is a -n switch for the echo command.
For example:
Use the \c character to suppress the newline character (this won't work on Linux)
```
	echo "Enter a number ==> \c"
	read number
```
Use the -n switch to suppress the newline character
```
	echo -n "Enter a number ==> "
	read number
```
```
  Character	Purpose

	\b	display the backspace character
	\c	display the line without the terminating newline
	\f	do a form feed
	\n	display the newline character
	\r	display the carriage return character
	\t	display the tab character
	\\	display the backslash character
	\nnn	display any character with the ASCII value signified by nnn
		 (a 1 to 3 digit octal number starting with 0)

		Table 4.2. echo Escape Characters.
```
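Because \c and -n each work on only some systems, one way around the problem is to avoid echo for prompts altogether. The following sketch assumes that the printf command is available (it is not covered in this section) and wraps it in a hypothetical function called prompt.

```shell
#!/bin/sh

# prompt is a hypothetical helper: it prints its first argument
# without a terminating newline, using printf rather than echo
prompt()
{
    printf '%s' "$1"
}

prompt "Enter a number ==> "
# in a real script the next command would be: read number
echo "(input would be typed on the same line as the prompt)"
```

The %s format prints the argument exactly as given, so the prompt text is not searched for escape characters.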
Trapping Signals

When you hit the CTRL-C combination to stop the execution of a shell program, an interrupt signal (SIGINT, signal number 2) is sent to the program's process. By default a shell program terminates immediately when it receives this signal.
The UNIX operating system generates a number of different signals. Each signal has a unique identifying number and a symbolic name. Table 4.3 lists the signals defined by a machine running an SVR4 version of UNIX.
Each process has a default behaviour for every signal. When it receives a signal it carries out this default behaviour. The most common default actions are
- terminate, or
  In this case the process will cease to exist. Signal number 9 (SIGKILL) is an example of a signal that causes this behaviour.
- ignore.
  Many processes will simply ignore some signals and carry on processing. Some signals cannot be ignored, SIGKILL for example.
There are mechanisms available that allow programmers or Systems Administrators to change the default behaviour associated with some signals. The trap command listed below is one of them.

The kill Command

There are a number of methods by which you can send a signal to a process. For example, hitting the key combination CTRL-C will usually send the SIGINT signal to the process running in the foreground. Another, more general method is to use the kill command.
```
		kill Command Format.

	kill [ -signal ] pid ...

	Sends the signal specified by the number signal to
	the process identified with process identifier pid.
	The kill command will handle a list of process identifiers.

	By default kill sends signal number 15 (the TERM signal).

	Figure 4.3. kill Command Format.
```
The ps command lists processes, their process identifiers and a variety of other information.
For example:
```
ps
  PID TTY STAT  TIME COMMAND
26796 pp0 S     0:01 -bash	the user's login shell
30434 pp0 R     0:00 ps		the process for ps
kill 26796
```
This will have the effect of logging the user out, as the TERM signal has been sent to the login shell, which will terminate on receiving it.
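
A safer way to experiment with kill is to send the signal to a process you have started yourself. The sketch below is illustrative only. It assumes the shell variable $! (the process identifier of the last background command) and the wait command, neither of which has been introduced in this section.

```shell
#!/bin/sh

# start a harmless background process to act as the target
sleep 30 &
pid=$!    # $! holds the process identifier of the last background command

# send the default signal (15, TERM) to that process
kill $pid

# wait reports the exit status of the process; a status greater
# than 128 indicates that the process was terminated by a signal
status=0
wait $pid || status=$?
echo "process $pid finished with status $status"
```

Some shells also print a message such as Terminated when the background process dies. That message comes from the shell itself, not from the script.
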
The trap Command

Situations occur in which the default behaviour associated with a signal must be overridden.
For example:
It is common to write shell scripts that create temporary files. In most cases it is good practice to delete these temporary files when the shell script exits. However what happens if a user sends a shell program process the SIGTERM signal while it is running?
By default the process will cease immediately. It will NOT delete any temporary files. In this case it would be nice to be able to delete the temporary files first, before exiting.
Another example is the case where you don't want the user to be able to quit out of a shell procedure. In this case you want signals like SIGTERM to be ignored.
The trap command allows the default behaviour of signals to be overridden.
```
Symbolic Name	Number	Purpose

SIGHUP		1	hangup 
SIGINT		2	interrupt (rubout) 
SIGQUIT		3	quit (ASCII FS) 
SIGILL		4	illegal instruction (not reset when caught) 
SIGTRAP		5	trace trap (not reset when caught) 
SIGIOT		6	IOT instruction 
SIGABRT		6	used by abort, replace SIGIOT in the future 
SIGEMT		7	EMT instruction 
SIGFPE		8	floating point exception 
SIGKILL		9	kill (cannot be caught or ignored) 
SIGBUS		10	bus error 
SIGSEGV		11	segmentation violation 
SIGSYS		12	bad argument to system call 
SIGPIPE		13	write on a pipe with no one to read it 
SIGALRM		14	alarm clock 
SIGTERM		15	software termination signal from kill 
SIGUSR1		16	user defined signal 1 
SIGUSR2		17	user defined signal 2 
SIGCLD		18	child status change 
SIGCHLD		18	child status change alias (POSIX) 
SIGPWR		19	power-fail restart 
SIGWINCH	20	window size change 
SIGURG		21	urgent socket condition 
SIGPOLL		22	pollable event occured 
SIGIO		22	socket I/O possible (SIGPOLL alias) 
SIGSTOP		23	stop (cannot be caught or ignored) 
SIGTSTP		24	user stop requested from tty 
SIGCONT		25	stopped process has been continued 
SIGTTIN		26	background tty read attempted 
SIGTTOU		27	background tty write attempted 
SIGVTALRM	28	virtual timer expired 
SIGPROF		29	profiling timer expired 
SIGXCPU		30	exceeded cpu limit 
SIGXFSZ		31	exceeded file size limit 

		Table 4.3. Signals on a SYSVR4 Box.


		trap Command Format.

	trap [commands] [signals]

	trap with no parameters displays a list of current trap assignments.

	signals	a list of signals to change the default action for
	commands	one or more commands that will be executed
		 whenever one of the listed signals is received

	If commands is the null string then the specified signals are ignored.

	If only signals is used the list of signals will be reset back to the
	 original default actions.

	Figure 4.4. trap Command Format.
```
For example:
```
ps			execute ps to find out the pid of login shell
  PID TTY STAT  TIME COMMAND
26796 pp0 S     0:01 -bash
30434 pp0 R     0:00 ps
trap "echo I received signal 2; echo goodbye" 2
```
Every time the shell receives signal 2 it should perform the echo commands.
kill -2 26796
Send signal 2 to the login shell.
I received signal 2
goodbye
trap
View the current trap assignments.
trap -- 'echo you sent signal 2' SIGINT
trap 2
Reset the default actions for signal 2.
trap 'echo logged off at `date` >> $HOME/logoffs' 0
When the user logs out append a message to the logoffs file.
Type the command in at the shell prompt and then logout.
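The temporary file problem described earlier can be handled with the same two forms of trap. The following is a sketch under stated assumptions: the file name /tmp/trap_demo.$$ is hypothetical, and the work is done inside a subshell, written as ( ... ), only so that the result can be checked afterwards.

```shell
#!/bin/sh

# tmpfile is a hypothetical scratch file name built from the process id
tmpfile=/tmp/trap_demo.$$

(
    # the trap on 0 runs when this subshell exits, so the file is
    # removed however the work below finishes
    trap 'rm -f "$tmpfile"' 0
    # on hangup, interrupt or TERM, exit cleanly, which in turn
    # runs the trap on 0
    trap 'exit 1' 1 2 15

    echo "scratch data" > "$tmpfile"
    # ... the real work of the script would go here ...
    exit 0
)

if [ -f "$tmpfile" ]
then
    echo "temporary file still exists"
else
    echo "temporary file was removed"
fi
```

In a real script the two trap commands would normally sit at the top of the script itself rather than inside a subshell.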
Exercise 4-4. Write a shell procedure called signals that loops forever doing nothing. However if it should receive the signal 15 it should display an appropriate message.
For example:
```
	signals&	run the shell script in the background
	kill 34256	send signal 15 to the signals process
	Signals received signal 15
```
Regular Expressions

It has been mentioned that the shell has built in support for character matching with the * ? and [] constructs. However some UNIX commands understand a more powerful method for character matching called regular expressions.
Regular expressions are recognised, in one form or another, by the following UNIX commands ed, ex, sed, awk, grep, egrep, expr and even vi. Regular expressions are used by these commands to match a sequence of characters.
Regular expressions are sometimes referred to as REs.
Regular expressions provide more powerful and varied mechanisms for matching and modifying text. Table 4.4 lists the basic symbols recognised by REs and what they will match.
The power in regular expressions comes when they are combined. The primitives from Table 4.4 can be combined together to perform complex matches. Table 4.5 provides a number of examples.
```
Symbols	Purpose

c	any character other than \ [ . * ^ ] $ (or some subset of these
	 depending on implementation) matches itself
\	remove any special meaning from a character
.	match any one character
^	match the start of the line when it is the first character in the RE
$	matches the end of the line when it is the last character in the RE
*	match zero or more matches of previous character or expression
[chars]	match any one character from the string chars
[^chars] match any one character NOT in the string chars

		Table 4.4. Summary of Regular Expressions.
```
```
Example RE	What it matches

unix		matches the word unix and nothing else
[Uu]nix		matches any word starting with a u or a U followed by nix
[^aeiouAEIOU]*	matches any sequence of characters that does not contain
		  a vowel
^abc$		matches lines that contain abc only
hel.		matches hel followed by any one character
		  e.g. help hell hel<

		Table 4.5. Example Regular Expressions.
```
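Some of the example REs can be tried out with grep. The sketch below is illustrative only. The helper count_matches and the sample file are assumptions, and grep's -c switch is used to print the number of matching lines rather than the lines themselves.

```shell
#!/bin/sh

# count_matches is a hypothetical helper: it prints how many lines
# of the file $2 contain a match for the regular expression $1
count_matches()
{
    grep -c "$1" "$2"
}

sample=/tmp/re_demo.$$
cat > "$sample" <<'EOF'
unix
Unix
abc
abcd
help
EOF

count_matches '[Uu]nix' "$sample"   # unix and Unix
count_matches '^abc$' "$sample"     # abc but not abcd
count_matches 'hel.' "$sample"      # help
rm -f "$sample"
```

With this sample file the three calls print 2, 1 and 1. The REs are placed in single quotes so that the shell does not try to interpret characters such as [ ] $ and * itself.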
Exercise 4-5. Describe the text that the following REs will match.
a) abc.def
b) [aA][bB][cC]dD
c) $
d) [^sS]*[aeiou]*[a-z]*
e) hello^there$friend
f) [abc]\*
g) \[hello there\.*$
One of the difficulties with regular expressions is that there is more than one variety. Each different variety has its own extensions and limitations. The varieties of REs include
- limited regular expressions,
  As used by the commands ex, ed, sed and grep.
- full regular expressions, and
  As used by egrep, awk and some versions of vi
- System V extensions to regular expressions.
  Versions of System V support commands that recognise extensions to limited REs
Limited Regular Expressions

Limited regular expressions recognise the basic symbols from Table 4.4 and in addition support the storing of matched expressions into registers. The system maintains a set of registers starting from register one. The act of storing an RE into a register is sometimes referred to as tagging.
A regular expression surrounded by \( and \) is considered tagged (the \ characters are part of the syntax; a plain ( or ) simply matches itself in a limited RE) and is stored into the next register. The first tagged RE will be placed into register 1, the second into register 2 and so on. The value stored in each register is accessed by \n where n is the number of the register.
For example:
\(tick\) tock \1
will match the string tick tock tick
\(aa\)bb\1*
will match any string that starts with aabb and then has any number of aa sequences following
Tagging is especially powerful when used in combination with some of the ed commands discussed later in this section.
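Tagging can be tried out with grep, which uses limited regular expressions. The following is a minimal sketch. The function name matches_tagged is an assumption made for illustration.

```shell
#!/bin/sh

# matches_tagged is a hypothetical helper: it succeeds if the string
# $1 contains tick, then tock, then a repeat of the tagged text
matches_tagged()
{
    # \(tick\) is stored in register 1, so \1 must match the
    # same text again later on the line
    echo "$1" | grep '\(tick\) tock \1' > /dev/null
}

for line in "tick tock tick" "tick tock tack"
do
    if matches_tagged "$line"
    then
        echo "$line: match"
    else
        echo "$line: no match"
    fi
done
```

Only the first string matches, because the text following tock must be identical to whatever the tagged RE matched.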

Full Regular Expressions

Full regular expressions include the symbols outlined in Table 4.4 and add the additional constructs from Table 4.6. They do not support the notion of tagging supported by limited regular expressions.
```
	Symbol	Meaning

	+	matches one or more occurrences of the previous RE
	?	matches zero or one occurrences of the previous RE
	|	matches either of two REs separated by a | or both REs
		 separated by a new line
	()	used to group an RE so that * + ? | can be applied to it

		Table 4.6. Extensions for Full REs.
```
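The extensions in Table 4.6 can be tried out as follows. This sketch assumes that your grep accepts the -E switch, which selects full regular expressions in the same way as egrep. The helper count_full and the sample file are hypothetical.

```shell
#!/bin/sh

# count_full is a hypothetical helper: it prints the number of lines
# in file $2 that match the full regular expression $1
count_full()
{
    grep -E -c "$1" "$2"
}

sample=/tmp/ere_demo.$$
cat > "$sample" <<'EOF'
color
colour
cat
dog
ababc
EOF

count_full 'colou?r' "$sample"   # ? makes the u optional
count_full 'cat|dog' "$sample"   # | matches either alternative
count_full '(ab)+c' "$sample"    # + applies to the grouped RE (ab)
rm -f "$sample"
```

With this sample file the three calls print 2, 2 and 1.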
System V Extensions

System V provides an extension to limited REs that allows you to specify the exact number of matches you require.
```
	Symbol	Meaning

	\{n\}	match exactly n occurrences of a single character RE
	\{n,\}	match at least n occurrences of a single character RE
	\{n,m\}	match between n and m occurrences of a single character RE

		Table 4.7. System V Extensions to Limited REs.
```
For example:
abc\{5\}
matches ab followed by exactly 5 occurrences of c, for example abccccc
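The counts apply only to the single character RE immediately before the \{. The following is a small sketch using grep; the function name re_match is an assumption made for illustration.

```shell
#!/bin/sh

# re_match is a hypothetical helper: it succeeds if the string $2
# contains a match for the limited regular expression $1
re_match()
{
    echo "$2" | grep "$1" > /dev/null
}

# \{5\} applies to the c only, so ab must be followed by five c's
re_match 'abc\{5\}' abccccc && echo "abccccc matches"
re_match 'abc\{5\}' abcc || echo "abcc does not match"

# between 2 and 3 b's, anchored to the start and end of the line
re_match '^ab\{2,3\}$' abbb && echo "abbb matches"
```

Without the ^ and $ anchors the RE may match anywhere in the line, so a longer run of the character can still contain a match.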
Exercise 4-6. Write grep commands that use REs to carry out the following.
a) find any line starting with j in the file /etc/passwd
(equivalent to asking find any username that starts with j)
b) find any username that starts with j and uses bash as their login shell
c) find any user that belongs to a group with a group number between 0 and 99

ex, vi and Regular Expressions

When you use commands like :w or :q from within the vi editor you are actually executing an ex command. ex is a powerful line oriented text editor that will accept commands either interactively or from an ex script file.
Our main interest in ex here is its ability to use regular expressions to perform complex manipulation of text. Particularly interesting is the ability to use these ex commands from vi. Similar commands can also be used under ed and sed.
You will probably already have used a number of ex commands during your normal use of vi. Whenever you enter : and then some command (e.g. :w :wq :q) you are entering an ex command.

Specifying Addresses

ex commands support the notion of addresses, which allow you to specify the range of lines to which a command is applied. Table 4.8 lists the methods by which addresses can be specified.
For example:
:1,5w!hello.dat
Save lines 1 through to 5 into the file hello.dat
:/hello/,/goodbye/w!inbetween
Save to the file inbetween all the lines between the next line that contains hello and the next line after that that contains goodbye
:.,$w!rest
Save all the lines from the current line to the last line into the file rest
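Similar addresses can be tried from the shell with sed. The sketch below is illustrative only. It assumes sed's -n switch and p command (print only the addressed lines), which are not covered in this section, and a hypothetical helper called print_range. sed has no notion of a current line, so the . address is not available.

```shell
#!/bin/sh

# print_range is a hypothetical helper: it prints the lines of the
# file $2 selected by the address range $1
print_range()
{
    sed -n "${1}p" "$2"
}

sample=/tmp/addr_demo.$$
cat > "$sample" <<'EOF'
one
hello
two
goodbye
three
EOF

print_range '1,2' "$sample"                 # lines 1 through to 2
print_range '/hello/,/goodbye/' "$sample"   # from hello to goodbye
print_range '3,$' "$sample"                 # line 3 to the last line
rm -f "$sample"
```

The addresses are placed in single quotes so that the shell passes the $ through to sed unchanged.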
```
Address			Purpose

.			addresses the current line
$			addresses the last line of the file
number			addresses the line which is number lines from
			 the start of the file
letter			addresses the line which has been tagged with
			 the label corresponding with the lower case
			 alphabetical character letter*
/expression/		addresses the first line which matches the regular
			 expression expression by searching FORWARD from
			 the current line
?expression?		addresses the first line which matches the regular
			 expression expression by searching BACKWARD from
			 the current line
address+number		addresses the line which is number lines FORWARD
			 from address
address-number		addresses the line which is number lines BACKWARD
			 from address
address1,address2	specifies the range from address1 to address2
,			stands for the pair 1,$ i.e. the whole file,
			 from the first line to the last line 
;			stands for the pair .,$ i.e. from the current
			 line to the end of the file
	* lines are tagged using the k command outlined below

		Table 4.8. Specifying Addresses for Regular Expressions.
```
Other Commands

w and q (write and quit) are just two of a wide range of ex commands. Some of the others are listed in Table 4.9. (This is not a complete list.)
```
		ex Command Format.

	address command count

	address		as specified in Table 4.8
	command		as specified in Table 4.9
	count		how many times to perform the command
			 count defaults to 1

	Figure 4.5. ex Command Format.
```
The buffer referred to in Table 4.9 (for the d command) could be one of the 27 buffers maintained by ex and vi for the purpose of storing text.
There is an unnamed buffer that is used to store text from a normal delete operation. There are then 26 named buffers (referred to by the names a through to z).
For example:
4d3
Starting from line 4 delete 3 lines
1,$s/hello/HELLO/
replace the first occurrence of hello with HELLO on every line in the file
1,$s/hello/HELLO/g
replace all occurrences of hello with HELLO from every line in the file
5d+++
delete the 5th line and then move on three lines
da2
delete two lines starting from the current line and put them into the a buffer
5pa
insert the contents of the a buffer after line 5
```
Command			Purpose

line a			append text after line
range co line 		copy lines specified by range to just after line
range d buffer count	delete lines specified by range and count.
			 Place them into buffer if specified
range j count		join text from specified lines into one line
n			edit the next file specified in the command line
q			quit
line r file		read the contents of file and insert it after line
sh			start up another shell
range s/pat1/rep1/options count
			replace all patterns matching the RE pat1 in the
			 specified range with rep1 (will only work on the
			 first occurrence of pat1 for each line unless g is
			 used as an option)
u			undo the previous change
range w file		write the specified lines to the file. 
			 w>>file appends the lines to the file

		Table 4.9. Sample of ex commands.
```
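The d and s commands behave in much the same way under sed, which makes them easy to try from the shell. The following is a sketch only. The sample file is an assumption, and a sed command given without an address is applied to every line, like 1,$ in ex.

```shell
#!/bin/sh

sample=/tmp/sed_demo.$$
cat > "$sample" <<'EOF'
hello hello
two
three
four
five
six
seven
EOF

# like the ex command 4d3: delete three lines starting at line 4
sed '4,6d' "$sample"

# like 1,$s/hello/HELLO/ : only the first hello on each line changes
sed 's/hello/HELLO/' "$sample"

# like 1,$s/hello/HELLO/g : every hello on each line changes
sed 's/hello/HELLO/g' "$sample"

rm -f "$sample"
```

sed writes the modified text to standard output and leaves the original file untouched, so the three commands all start from the same contents.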
Exercise 4-7. Do the following using both vi and sed
a) replace all occurrences of UNIX with UNIX[tm]
b) replace all occurrences of Write with Writeln for all lines between the next occurrence of BEGIN and the next occurrence of END
c) replace all occurrences of > that start a line with nothing
Exercise 4-8. What do the following ex commands do?
a) .+1,$d
b) 1,$s/OSF/Open Software Foundation/g
c) 1,/end/s/\([a-z]*\) \([0-9]*\)/\2 \1/

awk

awk (named for its inventors Aho, Weinberger and Kernighan) is a very useful UNIX tool used by Systems Administrators to perform a number of tasks. awk's primary tasks are involved with searching, modifying and creating reports based on the contents of a file.
The following reading provides you with an introduction to this powerful tool. You will notice that the syntax of awk has much in common with that of the C programming language.
For example:
Place the following into a shell script. It will find any line that starts with the first parameter to the shell script in the file indicated by the second parameter. To use a shell variable inside an awk script, close the single quotes around the awk program just before the variable and reopen them just after it, as in '$1' below.
```
	awk 'BEGIN {
	  FS=":";
	  print "Usernames matching '$1' are.."
	           }
	  /^'$1'/  {
	  print $1;
	  X = X + 1
	        }
	  END   {
	  print "There were", X, " of them."
	        }' $2
```
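Another way of getting a shell value into an awk script is to pass it in as an awk variable. The sketch below assumes a version of awk that supports the -v and -F switches (older versions may not), and the function name count_users and the awk variable prefix are hypothetical. Inside the single quotes $1 is awk's first field, not the shell parameter.

```shell
#!/bin/sh

# count_users is a hypothetical function: it counts the lines of the
# colon delimited file $2 whose first field starts with the prefix $1
count_users()
{
    awk -F: -v prefix="$1" '
        # index returns 1 when prefix is found at the start of field 1
        index($1, prefix) == 1 { n = n + 1 }
        # n + 0 prints 0 rather than an empty string if nothing matched
        END { print n + 0 }
    ' "$2"
}

sample=/tmp/awk_demo.$$
cat > "$sample" <<'EOF'
jones:x:100:10
jsmith:x:101:10
root:x:0:0
EOF

count_users j "$sample"
count_users z "$sample"
rm -f "$sample"
```

With this sample file the two calls print 2 and 0. Because index compares plain strings, characters in the prefix are not treated as regular expression symbols.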
Exercise 4-9. A computing academic stores the marks for his students in a colon delimited file called results.dat. The file has the following format
surname:firstname:a1Mark:examMark
For example:
```
  jones:david:5:55
  blow:joe:10:100
```
Write an awk script to produce a file called final.dat that displays the information taken from results.dat in the following format:
```
  jones, david     60
  blow, joe       110
```
Assume: a1Mark is out of 10 and is worth 10% of the final result. examMark is out of 90 and is worth 90% of the mark.
Exercise 4-10. Write a shell script called user that accepts a list of group identifiers and uses awk to produce a list of users who belong to each group and displays the total number of users who belong to that group.
```
	
		user 0 1 2
		Group 0 has no users
		root
		daemon
		uucp
		uucpa
		Group 1 has 4 users
		Group 2 has no users
```
Perl

The following description of perl is taken from the frequently asked questions list of the Usenet newsgroup comp.lang.perl. Perl is becoming an increasingly important tool in the toolkit of a Systems Administrator. We do not cover it in any detail in this course because of a lack of space.

What is Perl?

A programming language, by Larry Wall <lwall@netlabs.com>.
Here's the beginning of the description from the Perl man page:
Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.) Expression syntax corresponds quite closely to C expression syntax. Unlike most UNIX utilities, Perl does not arbitrarily limit the size of your data--if you've got the memory, Perl can slurp in your whole file as a single string. Recursion is of unlimited depth. And the hash tables used by associative arrays grow as necessary to prevent degraded performance.
Perl uses sophisticated pattern matching techniques to scan large amounts of data very quickly. Although optimized for scanning text, Perl can also deal with binary data, and can make dbm files look like associative arrays (where dbm is available). Setuid Perl scripts are safer than C programs through a dataflow tracing mechanism which prevents many stupid security holes. If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then Perl may be for you. There are also translators to turn your sed and awk scripts into Perl scripts.

Conclusion

Throughout this section you have been introduced to some of the more advanced commands that form some of the tools of the trade for a Systems Administrator. These are tools you will use again and again as a UNIX Systems Administrator. It is important that you become familiar with them.

Review Questions

4.1. Write a shell program called mycp. It is to be a replacement for the cp command but instead of simply copying the file it should first check to see if the destination file exists. If it does it should ask the user whether or not they
- wish to copy over the existing destination file,
- move the existing destination file to filename.old and then copy the new destination, or
- not copy at all.
4.2. The existing cp program supports the following type of command.
cp file1 file2 file3 directory
That is copying more than one file into a destination directory. Modify your mycp command from the previous question to allow this type of syntax.
4.3. It is often the case that specific users on a system continually use too much disk space. There are a number of solutions to this problem including quotas (talked about in a later section).
Implement another solution to this problem that has the following characteristics.
- maintain a file called disk.hog which contains the usernames (one per line) of the offending users and the amount of disk space they are allowed to have. For example
```
			jonesd 50000
			okellys 10
```
- maintain a script find_hog that is run once a day and performs the following tasks
  - for each user in disk.hog discover how much disk space they are using
  - if the amount of disk space exceeds the allowed amount write their username to a file offender
HINTS. The command du -s directoryname can be used to find out how much disk space the directory directoryname and all its child directories use.
The file /etc/passwd records the home directory for each user.


David Jones (author)
Chris Hanson (html 25/08/96)
