UNIX Hints & Hacks
Chapter 4: System Monitoring
Use grep with diff to monitor problems that might be hiding in large log files.
Flavors: AT&T, BSD
Shell: sh
Syntax:
grep [pattern] file
diff file1 file2
This short script monitors a log file for specific errors or messages; when they occur, the script sends an email to a predefined address.
#! /bin/sh
touch /tmp/sys.old
while [ 1 ]
do
    grep ERROR /var/adm/SYSLOG > /tmp/sys.new
    FOUND=`diff /tmp/sys.new /tmp/sys.old`
    if [ -n "$FOUND" ]; then
        Mail -s "ALERT ERROR" admin@rocket.ugu.com < /tmp/sys.new
        mv /tmp/sys.new /tmp/sys.old
    else
        sleep 10
    fi
done
Line 1: Define the shell.
Line 2: Create a file to compare to.
Lines 3-4: Begin the endless monitoring.
Line 5: Search for any errors in the system log file.
Line 6: Find any differences between the old error list and the new one.
Lines 7-8: If there is a difference, notify the system administrator via email.
Line 9: Replace the old list of errors with the new list.
Lines 10-11: If there is no difference, wait 10 seconds and check again.
There are times when you want to be alerted to errors or changes on a system. Comparing an old list of errors from a log file with a new one shows which errors or problems have appeared since the last check.
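The comparison of an old and a new error list can also be used to pick out only the lines that are new, so that an alert does not repeat errors that were already reported. The following is a minimal sketch of that idea, not part of the original script. The function name new_errors and the demonstration file names are hypothetical; the sketch relies on the default diff output format, in which lines found only in the second file are prefixed with "> ".

```shell
#!/bin/sh
# new_errors OLD NEW: print the lines that appear in NEW but not in OLD.
# (Hypothetical helper name; an assumption made for this sketch.)
new_errors() {
    # In diff's default output, lines only in the second file start with "> ".
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Small demonstration with throwaway files.
demo_dir=${TMPDIR:-/tmp}/grepdiff.$$
mkdir "$demo_dir" || exit 1
printf 'ERROR disk full\n' > "$demo_dir/sys.old"
printf 'ERROR disk full\nERROR nfs timeout\n' > "$demo_dir/sys.new"
new_errors "$demo_dir/sys.old" "$demo_dir/sys.new"
rm -rf "$demo_dir"
```

Run as is, the demonstration prints only "ERROR nfs timeout". In a monitoring loop, the output of such a function could be piped to Mail in place of the whole error list.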
In this scenario, a user experienced drops in network connectivity through a specific mount point. This occurred many times throughout the day, for periods of up to five minutes. My first thought was that a network device was rebooting automatically, because five minutes is a fairly long outage. The log files showed NFS timeouts to the remote mount point at random times, once or twice a day. None of the logs on the remote server reported any problems, so the cause had to be somewhere in the network.
I acquired a sniffer and set it up to watch the line through the night. Knowing that the sniffer was going to collect an enormous amount of data, I ran the preceding script to search for NFS timeouts. Searching through all the data would have taken forever, because this sniffer could monitor only the network as a whole, not the specific NFS service port. In the morning, three NFS timeouts had been emailed to me. When I compared the sniffer data with the time stamps of the NFS timeouts I received via email, I learned that all connectivity to the router was lost during these periods.
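Matching log entries against sniffer data is easier when the time stamps are pulled out on their own. The sketch below shows one way this might be done. It assumes the traditional syslog line layout "Mon DD HH:MM:SS host message"; the function name timeout_times, the search pattern, and the sample log lines are all made up for illustration and will differ from system to system.

```shell
#!/bin/sh
# timeout_times PATTERN FILE: print the time stamp of every matching line.
# Assumes the traditional syslog layout "Mon DD HH:MM:SS host message".
# (Hypothetical helper name; an assumption made for this sketch.)
timeout_times() {
    grep -- "$1" "$2" | awk '{ print $1, $2, $3 }'
}

# Made-up sample data; real NFS timeout messages vary by system.
sample=${TMPDIR:-/tmp}/nfsdemo.$$
cat > "$sample" <<'EOF'
Mar  3 01:12:40 rocket unix: NFS server moon not responding
Mar  3 01:17:05 rocket unix: NFS server moon ok
Mar  3 03:44:10 rocket unix: NFS server moon not responding
EOF
timeout_times 'not responding' "$sample"
rm -f "$sample"
```

With the sample data, this prints the time stamps of the two "not responding" lines, one per line, which can then be looked up in the sniffer capture.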
Although I didn't have access to the router, I now had the evidence I needed that something was likely wrong with it. The network administrator for that router was then brought into the loop and discovered that the router was continuously rebooting. It was replaced and all was well.
Man pages:
diff, grep, Mail, mail
© Copyright Macmillan USA. All rights reserved.