F. SYSTEM MONITOR REPORT

Let me start by explaining why I created this script. My office is in the main office building, 2 km away from the Centralized Control Room of the refinery. When I started working here I noticed that I was missing some I/A system alarms, because operations called me only when a problem was really bad. At times the flashing SYS button was forgotten because the operators were very busy. The worst case was when I found one of our fault-tolerant CPs in red; according to the system monitor log, it had been in that state for several days.

At that time we did not have remote display managers that could show us the System Monitor status. We did, however, have telnet (vt100) access to the I/A systems. I decided to write an SQL report to extract all alarm messages from the system monitor message group resident in the historian.

Later on I modified it to return only the alarm messages of the last 2 days. However, it was still tedious to log in to each node every day, run the report, and see which problems we had to solve. Besides, I sometimes forgot to run the script.

After struggling for several days to understand how sendmail works on Sun stations, I succeeded in having the Sun stations send their reports to my ccMail account.

The rule in the refinery is that the first thing you do when you arrive at your office is check your ccMail messages. Since then, no I/A system alarm has gone unnoticed for more than a few hours. After installing the report on other systems, we have been receiving their System Monitor reports regularly.

If you want to do the same on your system, here are the steps required for a 50 series station. For the 20 series, wait a few more days. It will be here. COMING VERY SOON!


HISTORIAN

Start by finding which station hosts the historian that receives the alarm messages of your System Monitors (look at your system configuration).

By default the System Monitor message group "sysmonmsg" is configured to hold a small number of messages. When that number is exceeded, new alarms overwrite the oldest ones. So that no alarms are lost to overwriting, I suggest first raising the maximum number of messages of that group to 1000.

Create the following 2 ASCII files in /opt/fox/hstorian/bin: ac0.sql and ac1.sql

ac0.sql :

update messag_cfg set max_msgs = 1000 where id = "sysmonmsg";
update statistics;

ac1.sql :

update messag_cfg set policy = 2 where policy != 2;
update statistics;

Now type the following commands (C shell syntax), replacing 'hhhhhh' with your historian name:

setenv PATH ${PATH}:/opt/informix/bin
setenv INFORMIXDIR /opt/informix
cd /opt/fox/hstorian/bin
isql hhhhhh ac0.sql
isql hhhhhh ac1.sql

The first file, ac0.sql, raises the maximum number of messages to 1000, while the second, ac1.sql, sets the policy to 2 (overwrite old messages). The second one might not be required on your system; it addresses an old historian problem: HH652.
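
As a sketch, the steps above could be collected into one Bourne shell script that writes both SQL files and then runs them. The function names write_sql_files and run_sql_files are made up for this example, and the isql call simply repeats the form shown above; adjust both to your installation.

```shell
#!/bin/sh
# Sketch only: write_sql_files and run_sql_files are hypothetical helper names.

# Write ac0.sql and ac1.sql into the given directory.
# Second argument: maximum number of messages (default 1000).
write_sql_files() {
    dir=$1
    max=${2:-1000}
    # ac0.sql: raise the message limit of the sysmonmsg group
    cat > "$dir/ac0.sql" <<EOF
update messag_cfg set max_msgs = $max where id = "sysmonmsg";
update statistics;
EOF
    # ac1.sql: set policy 2 (overwrite old messages) where not already set
    cat > "$dir/ac1.sql" <<'EOF'
update messag_cfg set policy = 2 where policy != 2;
update statistics;
EOF
}

# Run both files against the historian, in a subshell so that the
# caller's environment and working directory are left untouched.
run_sql_files() {
    hist=$1    # historian name
    dir=$2     # directory holding ac0.sql and ac1.sql
    (
        INFORMIXDIR=/opt/informix
        PATH=$PATH:$INFORMIXDIR/bin
        export INFORMIXDIR PATH
        cd "$dir" || exit 1
        isql "$hist" ac0.sql && isql "$hist" ac1.sql
    )
}
```

Typical use on the historian station would be: write_sql_files /opt/fox/hstorian/bin followed by run_sql_files hhhhhh /opt/fox/hstorian/bin, with 'hhhhhh' replaced by your historian name.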


THE LAST 2 DAYS SQL REPORT

Now type or paste the report "last2.ace" into directory /opt/ac. You can use another directory name; just remember to modify all references to it in all the files.

The only changes you have to make in that report are:

Replace 'hhhhhh' with your historian name.
Replace 'Thermal Cracker' with the name of your plant.

To compile the report, type the following commands, or put them in a script:

INFORMIXDIR=/opt/informix
export INFORMIXDIR
/opt/informix/bin/saceprep /opt/ac/last2

Confirm that no error messages are reported during the compilation; otherwise look in the file last2.err and fix the errors it lists.
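
The check for last2.err can also be scripted. The following is a minimal sketch; check_ace_compile is a made-up name, and it assumes only what is stated above, namely that compilation errors end up in a .err file next to the report.

```shell
#!/bin/sh
# Sketch only: check_ace_compile is a hypothetical helper name.
# Argument: report path without extension, e.g. /opt/ac/last2

check_ace_compile() {
    base=$1
    if [ -s "$base.err" ]; then
        # A non-empty .err file means the compiler found problems
        echo "compilation errors, see $base.err:" >&2
        cat "$base.err" >&2
        return 1
    fi
    echo "no errors found in $base.err"
}
```

To avoid being misled by an old error file, remove /opt/ac/last2.err before compiling, then run saceprep and call check_ace_compile /opt/ac/last2.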

last2.out sample:
THE FOLLOWING REPORT IS FOR THE FCC SYSTEM (Last 2 days ONLY)

TODAY      =    24
YESTERDAY  =    23

22:16:41 23/11/97     CCDI01      Equip = FCCTR1
    SYSMON -00070 Single PIO Bus Access Recovery on A

22:16:32 23/11/97     CCDI01      Equip = FCCTR1
    SYSMON -00068 Single PIO Bus Access Error on A

00:23:10 23/11/97     CCAP01      Process = ARCHIVE_CTL
    HSTORI -00006 Unrecognized error code 9
....
....

SCRIPTS TO RUN THE REPORT

I have two scripts for this purpose. One is run on demand any time I want to check the latest messages, while the other one is run periodically by cron.

The on demand "last2.go" script is:

#!/bin/sh
# last2.go
INFORMIXDIR=/opt/informix
export INFORMIXDIR
# -f: do not complain if there is no previous report
rm -f /opt/ac/last2.out
/opt/informix/bin/sacego /opt/ac/last2
more /opt/ac/last2.out

The script "last2.run" to be run by cron is:

#!/bin/sh
# last2.run
INFORMIXDIR=/opt/informix
export INFORMIXDIR
# -f: do not complain if there is no previous report
rm -f /opt/ac/last2.out
/opt/informix/bin/sacego /opt/ac/last2
mail tsid1@email.isla.pdv.com < /opt/ac/last2.out
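
If sacego fails for any reason, last2.run as written mails an empty or missing file. One possible guard, shown here only as a sketch, is to mail the report only when it exists and is not empty. The names send_report and MAIL_CMD are invented for this example; MAIL_CMD is just a hook for trying the function without actually sending mail.

```shell
#!/bin/sh
# Sketch only: send_report and MAIL_CMD are hypothetical names,
# not part of I/A, Informix or the mail program.

send_report() {
    report=$1    # file to send, e.g. /opt/ac/last2.out
    rcpt=$2      # recipient address
    if [ ! -s "$report" ]; then
        echo "send_report: $report is missing or empty, nothing sent" >&2
        return 1
    fi
    # Uses "mail" unless MAIL_CMD names another command
    ${MAIL_CMD:-mail} "$rcpt" < "$report"
}
```

In last2.run the final mail line would then become a call such as send_report /opt/ac/last2.out followed by your own address.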

To modify your crontab file for cron to run the script, type the following commands:

cd /opt/ac
crontab -l > crontab_file

To run the script every day at 7:00 am, edit crontab_file adding this line:

0 7 * * * /opt/ac/last2.run > /dev/null 2>&1

Finally, load your crontab_file into cron:

crontab crontab_file

At this point you will have the report last2.out available every day at 7:00 am on your historian station.
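
When the report is installed on several stations, the manual edit of crontab_file can be replaced by a small helper that appends the line only if it is not already there. This is a sketch under that assumption; add_cron_line is a made-up name.

```shell
#!/bin/sh
# Sketch only: add_cron_line is a hypothetical helper name.
# It edits the saved crontab_file, not the live crontab.

add_cron_line() {
    file=$1
    line=$2
    # -F: treat the entry as a fixed string, -x: match whole lines only.
    # On some older systems "fgrep -x" is needed instead of "grep -F -x".
    if grep -F -x -e "$line" "$file" > /dev/null 2>&1; then
        echo "entry already present in $file"
    else
        printf '%s\n' "$line" >> "$file"
        echo "entry added to $file"
    fi
}
```

The sequence is then: crontab -l > crontab_file, add_cron_line crontab_file '0 7 * * * /opt/ac/last2.run > /dev/null 2>&1', and finally crontab crontab_file. Running it twice leaves a single entry.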

If you want to receive the report in your local e-mail account, the station (or another one in the same node) must be connected to the company LAN your PC is on. If you are lucky enough to be in this situation, continue with the following steps.


HOSTS FILE

Modify the I/A Sun's hosts database to include your company's DNS server. In our refinery another Sun station acts as the DNS server: isla.pdv.com

After the modification your /etc/hosts should look like this:

# Sun Host Database
127.0.0.1 localhost loghost
#
# 2nd Ethernet Port stations ...
195.3.16.26 isla.pdv.com
195.3.22.26 hlaphl
195.3.19.50 tsid1
...

Where:

5th line: defines the DNS Server.
6th line: 2nd ethernet port name of this station.
7th line: my PC name in the network.
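
A quick way to confirm that the new entries are really in the file is sketched below. host_in_file is a made-up name; it only looks for a host name in the name columns of a hosts file, skipping comments, and does not test the network itself. On older stations "nawk" may be needed in place of "awk" for the -v option.

```shell
#!/bin/sh
# Sketch only: host_in_file is a hypothetical helper name.
# Returns 0 if the name appears as a host name or alias in the file.

host_in_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++) {      # field 1 is the IP address
                if ($i ~ /^#/) break         # rest of the line is a comment
                if ($i == n) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, host_in_file isla.pdv.com && echo "DNS server entry found" checks /etc/hosts for the first of the three entries above.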


RESOLV.CONF FILE

If the report has to go out to the Internet, the file /etc/resolv.conf must exist to specify the domain and DNS servers of the company. If the message only goes to the company LAN, this file is not necessary.

/etc/resolv.conf should look like this:
; DNS resolver file
;
domain isla.pdv.com
;
nameserver 195.3.16.26
nameserver 198.64.100.32
...

The numbers shown are not the real ones, for security reasons, even though our network is protected by a firewall.


SENDMAIL CONFIGURATION FILE

The file "/etc/sendmail.cf" defines how mail is sent to the outside world. In it you should modify the major relay host and the local domain names. In our refinery, the local domain name is: isla.pdv.com

For 50 Series: /etc/sendmail.cf
For 51 Series: /etc/mail/sendmail.cf
After editing, copy it to /etc.

Here is our sendmail.cf after the modifications (partial display):

###########################################################
#       SENDMAIL CONFIGURATION FILE FOR SUBSIDIARY MACHINES
#       See the manual "System Administration for the Sun Workstation".
#       Look at "Setting Up The Mail Routing System" in the chapter on
#       Communications.  The Sendmail references in the back of the
#       manual are also very useful.
 
# local UUCP connections -- not forwarded to mailhost
CV
 
# my official hostname
# Dj$w.$m
Dj$w
 
# major relay mailer
DMether
 
# major relay host
# DRmailhost                                    <-- Original Entry
# CRmailhost                                    <-- Original Entry
DRisla.pdv.com
CRisla.pdv.com
 
#################################################
#       General configuration information
# local domain names
#
Dmisla.pdv.com
Cmisla.pdv.com
....
....
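
After editing, the four active entries changed above can be listed with a short helper, sketched here. show_mail_macros is a made-up name; it prints only uncommented lines that begin with DR, CR, Dm or Cm, so the commented-out original entries and the DMether line are not shown. On older systems "egrep" may be needed in place of "grep -E".

```shell
#!/bin/sh
# Sketch only: show_mail_macros is a hypothetical helper name.
# Prints the relay host (DR, CR) and local domain (Dm, Cm) lines.

show_mail_macros() {
    cf=${1:-/etc/sendmail.cf}
    # Lines starting with "#" do not match, so old entries kept as
    # comments are ignored; the match is case-sensitive.
    grep -E '^(DR|CR|Dm|Cm)' "$cf"
}
```

On a 51 series station the file to pass would be /etc/mail/sendmail.cf, as noted above.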


Test whether sendmail works by sending an ASCII file to yourself, for example:

mail tsid1@isla.pdv.com < /etc/printers


If everything went OK, you will receive the I/A System Monitor reports in your e-mail every day. Enjoy them.


