Chapter 4: System Monitoring: 4.8 Watching by grepping the Difference

UNIX Hints & Hacks

Chapter 4: System Monitoring


4.8 Watching by grepping the Difference

4.8.1 Description

Use grep with diff to watch for problems that might be hiding in large log files.

Example

Flavors: AT&T, BSD

Shell: sh

Syntax:

grep pattern [file ...]
diff file1 file2

This short script monitors a log file for specific errors or messages; when they occur, the script sends an email to a predefined address.

#! /bin/sh
touch /tmp/sys.old
while [ 1 ]
do
  grep ERROR /var/adm/SYSLOG > /tmp/sys.new        # current list of errors
  FOUND=`diff /tmp/sys.new /tmp/sys.old`           # empty if nothing changed
  if [ -n "$FOUND" ]; then
    Mail -s "ALERT ERROR" admin@rocket.ugu.com < /tmp/sys.new
    mv /tmp/sys.new /tmp/sys.old                   # new list becomes the baseline
  else
    sleep 10
  fi
done

Line 1: Define the shell.

Line 2: Create a file to compare to.

Line 3: Begin the endless monitoring.

Line 5: Search for any errors in the system log file.

Line 6: Find any differences between the old error list and the new one.

Lines 7-8: If there is a difference, notify the system administrator via email.

Line 9: Replace the old list of errors with the new list.

Lines 10-11: If there is no difference, wait 10 seconds and check again.
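
The script above mails the entire error list every time the list changes. If only the newly added lines are of interest, one possible variation is to pull them out of the diff output. The following is a minimal sketch, not part of the original script: new_errors is a hypothetical helper, the sample log lines are made up, and the demonstration assumes a mktemp that supports -d, which is common but not guaranteed on every flavor.

```shell
#!/bin/sh
# Sketch: report only the error lines added since the last snapshot.
# new_errors is a hypothetical helper, not part of the original script.
new_errors() {
    # $1 = previous snapshot, $2 = current snapshot
    # In diff's default output, lines present only in the second file
    # are prefixed with "> "; sed strips that prefix and prints only
    # those lines.
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Demonstration with throwaway files (sample log lines are made up).
demo_dir=`mktemp -d`
printf 'ERROR disk full\n' > "$demo_dir/sys.old"
printf 'ERROR disk full\nERROR nfs timeout\n' > "$demo_dir/sys.new"

ADDED=`new_errors "$demo_dir/sys.old" "$demo_dir/sys.new"`
echo "$ADDED"

rm -rf "$demo_dir"
```

In the monitoring loop, the output of new_errors could be piped to Mail in place of the whole /tmp/sys.new file. Because lines that disappear from the log show up with a "< " prefix, they are ignored here, so a log that is trimmed or rotated would not by itself produce any reported lines.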

Reason

There are times when you want to be aware of errors or changes on the system as they happen. Finding the differences between an old log file and a new one can reveal the errors or problems you are looking for.

Real World Experience

In this scenario, a user experienced drops in network connectivity through a specific mount point. This occurred many times throughout the day, for periods of up to five minutes. My first thought was that a network device was rebooting automatically. Five minutes was a fairly long time to be down. The log files showed NFS timeouts to the remote mount point at random, once or twice a day. The logs on the remote server reported no problems, so it had to be something going on with the network.

After acquiring a sniffer, I set it up to watch the line through the night. I also ran the preceding script, searching for NFS timeouts, because I knew the sniffer was going to collect an enormous amount of data. Searching through all of that data would have taken forever, because this sniffer could monitor only the network as a whole, not the specific NFS service port. In the morning, three NFS timeouts had been emailed to me. When I compared the sniffer data with the time stamps of the NFS timeouts I received via email, I learned that there was no connectivity to the router during these periods.

I didn't have access to the router, but I now had the proof I needed that something was possibly wrong with it. The network administrator for that router was then brought into the loop and discovered that the router was continuously rebooting. It was replaced and all was well.
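
To watch for a different message, such as the NFS timeouts in this story, the search pattern and file names can be passed in rather than hard-coded. The sketch below shows one pass of the loop written that way. The function name check_log, the state-file layout, the 'NFS' pattern, and the sample log line are all assumptions made for illustration; the actual message text depends on the system. It uses cmp -s instead of capturing the diff output, which is a substitution on my part, and the demonstration again assumes mktemp -d is available.

```shell
#!/bin/sh
# check_log is a hypothetical helper: one pass of the monitoring loop,
# with the pattern, log file, and state file passed as arguments.
# Returns 0 if the list of matching lines changed, 1 if it did not.
check_log() {
    pattern=$1
    log=$2
    state=$3
    # Create an empty baseline the first time through.
    [ -f "$state" ] || touch "$state"
    grep "$pattern" "$log" > "$state.new"
    # cmp -s is silent and only sets the exit status.
    if cmp -s "$state.new" "$state"; then
        rm -f "$state.new"
        return 1
    fi
    mv "$state.new" "$state"
    return 0
}

# Demonstration with a throwaway log (the log line is made up).
demo_dir=`mktemp -d`
printf 'boot ok\nNFS server fs1 not responding\n' > "$demo_dir/log"

if check_log 'NFS' "$demo_dir/log" "$demo_dir/state"; then
    FIRST=changed
else
    FIRST=same
fi
if check_log 'NFS' "$demo_dir/log" "$demo_dir/state"; then
    SECOND=changed
else
    SECOND=same
fi
echo "$FIRST $SECOND"

rm -rf "$demo_dir"
```

The first call reports a change because the baseline starts out empty; the second reports none because the log has not grown. In a real loop, a return status of 0 would trigger the Mail command and a status of 1 would trigger the sleep.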

Other Resources

Man pages:

diff, grep, Mail, mail
