UNIX Hints & Hacks

Chapter 4: System Monitoring

 

4.9 Monitoring with ping

4.9.1 Description

There are several reasons why a system can fall off the network. Use ping to let you know when it happens.

Example One: ping a Remote Host

Flavor: AT&T

Syntax:

ping [-c count] [-s size] host
ping host [size] [count]
ping host [-n count] size

Almost all flavors of UNIX let you run ping with a packet count and a packet size, but the order of the arguments varies, so check your man pages. To avoid a false result, use a count of 3 and enough data bytes to test the network connectivity between the two machines thoroughly.

For a successful result with a count of three data packets, all three should be transmitted and all three received, with 0% packet loss between the two systems. If the remote host cannot be reached, the number of packets received is zero and the loss is 100%.

Here is the output of a successful transmission of three packets of 1000 data bytes each.

xinu 1% ping -c 3 -s 1000 rocket
PING rocket (209.15.10.11): 1000 data bytes
1008 bytes from 209.15.10.11: icmp_seq=0 ttl=255 time=20 ms
1008 bytes from 209.15.10.11: icmp_seq=1 ttl=255 time=5 ms
1008 bytes from 209.15.10.11: icmp_seq=2 ttl=255 time=5 ms
----rocket PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 5/10/20 ms

If there were a problem with the network, the output would look like this.

xinu 2% ping -c 3 -s 1000 rocket
PING 209.15.10.11 (209.15.10.11): 1000 data bytes
----209.15.10.11 PING Statistics----
3 packets transmitted, 0 packets received, 100% packet loss

The idea is that if a script can tell when zero packets were received, it can send out a message on that signal. Combining the ping command with grep and awk extracts the number of packets received.

xinu 3% ping -c 3 -s 1000 rocket | grep received | awk -F, '{ print $2 }' | awk '{print $1}'
0

If there is no connectivity to the remote machine, as in this case, the result is the value 0. If there had been full connectivity, the result would have been 3.
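The grep and the two awk calls can be folded into a single awk call. The following is a minimal sketch, assuming the summary line has the form "N packets transmitted, M packets received, ..." as in the output shown earlier. The function names received_count and link_state, and the labels up, degraded, and down, are illustrative only and are not part of ping.

```sh
#!/bin/sh
# received_count: read ping output on stdin and print the number of
# packets received, taken from the summary line. Prints nothing if no
# summary line is present (for example, when the host name is unknown).
received_count() {
  awk -F, '/received/ { split($2, f, " "); print f[1]; exit }'
}

# link_state GOT SENT: classify the result of one test.
# An empty GOT is treated as zero packets received.
link_state() {
  got=${1:-0}
  sent=$2
  if [ "$got" -eq 0 ]; then
    echo down
  elif [ "$got" -lt "$sent" ]; then
    echo degraded
  else
    echo up
  fi
}
```

It could be used as link_state "`ping -c 3 -s 1000 rocket | received_count`" 3, which separates a host that answers only some of the packets from one that answers none.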

Example Two: Monitoring a Host with ping

Flavor: AT&T

Shell: sh

Syntax:

ping [-c count] [-s size] host
grep [pattern]
echo [string]
mail [-s string] address
sleep [value]

When this is added to a notification script that mails a system administrator when there are problems, it could be written as follows:

#!/bin/sh
while [ 1 ]
do
 PING=`ping -c 3 -s 1000 rocket | grep received | awk -F, '{ print $2 }' | awk '{print $1}'`
 if [ "${PING:-0}" -eq 0 ]; then   # empty output counts as 0 received
   echo "rocket Off Network" | mail -s "PING FAILED" admin@pager.ugu.com
 fi
 sleep 60
done

Line 1: Define the shell.

Line 2: Begin the endless monitoring.

Line 4: Store the number of received data packets into the variable $PING.

Line 5: Check whether no data packets were received back from the remote host.

Line 6: If none were received, send a mail message saying that the ping test shows the system is off the network.

Line 8: Wait for a minute, and then test it again.

Note - Do not set the sleep period too high. Cases have been reported where it was set to five minutes, and some UNIX systems can reboot in under five minutes. When that happens, the system is back on the network before the next test, the PING FAILED message is never sent, and the problem goes unnoticed. Use your own judgment on this value.
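As written, the loop sends a message on every pass for as long as the host stays down. One possible refinement, sketched here under stated assumptions rather than taken from the original script, is to send mail only when the state of the host changes. The directory /tmp/pingmon and the function name state_changed are hypothetical.

```sh
#!/bin/sh
# Hypothetical directory holding one small state file per host.
STATE_DIR=${STATE_DIR:-/tmp/pingmon}

# state_changed HOST STATE: return 0 only when STATE differs from the
# state recorded by the previous call for HOST, then record STATE.
# The first call for a host always counts as a change.
state_changed() {
  host=$1
  new=$2
  mkdir -p "$STATE_DIR" || return 2
  if [ -f "$STATE_DIR/$host" ]; then
    old=`cat "$STATE_DIR/$host"`
  else
    old=
  fi
  echo "$new" > "$STATE_DIR/$host"
  [ "$old" != "$new" ]
}

# Possible use inside the monitoring loop:
#   if [ "${PING:-0}" -eq 0 ]; then STATE=down; else STATE=up; fi
#   if state_changed rocket "$STATE" && [ "$STATE" = down ]; then
#     echo "rocket Off Network" | mail -s "PING FAILED" admin@pager.ugu.com
#   fi
```

With this in place a host that stays down for an hour produces one message instead of sixty, at the cost of keeping a small file per host.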


If you are in a large environment, you can easily modify the preceding script to perform this function on a list of hosts that you define.

#!/bin/sh
HOSTS="rocket moon pluto"
while [ 1 ]; do
 for SYS in $HOSTS; do
   PING=`ping -c 3 -s 1000 $SYS | grep received | awk -F, '{ print $2 }' | awk '{print $1}'`
   if [ "${PING:-0}" -eq 0 ]; then   # empty output counts as 0 received
     echo "$SYS Off Network" | mail -s "PING FAILED" admin@pager.ugu.com
   fi
 done
 sleep 30
done

Line 1: Define the shell.

Line 2: Define the multiple systems that are being monitored.

Line 3: Begin the endless monitoring of the remote hosts.

Line 4: Begin checking each of the remote hosts one at a time.

Line 5: Store the number of received data packets into the variable $PING.

Line 6: Check whether no data packets were received back from the remote host.

Line 7: If none were received, send a mail message saying that the ping test shows the system is off the network.

Line 10: Wait a little bit, and then try it all again.

Adjust the sleep value as hosts are added to the list, because each host adds the time of its own ping test to every pass through the loop. With more than seven hosts, you might not need a pause from the sleep command at all. It is good to test each system once every one or two minutes, but you should evaluate the interval that your individual environment requires.
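One way to keep the interval roughly constant as hosts are added is to time each pass and sleep only for what remains. This is a minimal sketch, assuming a POSIX shell with $(( )) arithmetic and a date command that supports +%s (seconds since the epoch), which is common but not available on every older system. INTERVAL, remaining_sleep, and check_hosts are illustrative names.

```sh
#!/bin/sh
INTERVAL=60   # target seconds from the start of one pass to the next

# remaining_sleep TARGET ELAPSED: print how many seconds to sleep so
# that a pass plus the pause adds up to TARGET; never goes below 0.
remaining_sleep() {
  rest=$(( $1 - $2 ))
  if [ "$rest" -lt 0 ]; then
    rest=0
  fi
  echo "$rest"
}

# Possible use inside the monitoring loop, where check_hosts stands
# for the for loop over $HOSTS:
#   START=`date +%s`
#   check_hosts
#   END=`date +%s`
#   sleep `remaining_sleep $INTERVAL $((END - START))`
```

When the pass itself takes longer than the target interval, the function prints 0 and the loop starts again immediately, which matches the advice to drop the pause for long host lists.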

Reason

Monitoring a series of systems is a necessary proactive step that can be taken to maintain a healthy computing environment.

Real World Experience

Monitoring the system is one of my favorite things to do, especially when one goes down. I'll tell you why: Almost two out of five times that my pager goes off, it's because a system dropped off the network because the user hit the power button. When a proactive phone call is made to the user, asking whether everything is okay, he is usually shocked that you knew so quickly that he tampered with the system. By staying on top of the situation, you earn his admiration. At the same time, however, he's scared that you are like a god or Big Brother that knows all and can see all.

Other Resources

Man pages:

awk, echo, grep, mail, ping, sleep

World Wide Web:

Big Brother-- http://maclawran.ca/bb-dnld


© Copyright Macmillan USA. All rights reserved.