UNIX Hints & Hacks
Chapter 4: System Monitoring
There are several reasons why a system can fall off the network. Use ping to let you know when it happens.
Flavor: AT&T
Syntax:
ping [-c count] [-s size] host
ping host [size] [count]
ping host [-n count] size
Almost all flavors of UNIX let you run ping with a packet count and a packet size. Check your man pages for the order in which the arguments must appear. To avoid a false response, use ping with a count of 3 and enough data bytes to test the network connectivity between the two machines thoroughly.
For a successful result with a count of three data packets, all three should be transmitted and all three received, with 0% packet loss between the two systems. If the remote system cannot be reached, the number of packets received will be zero.
Here is the output of a successful transmission of three data packets of 1000 bytes each.
xinu 1% ping -c 3 -s 1000 rocket
PING jumbo (209.15.10.11): 1000 data bytes
1008 bytes from 209.15.10.11: icmp_seq=0 ttl=255 time=20 ms
1008 bytes from 209.15.10.11: icmp_seq=1 ttl=255 time=5 ms
1008 bytes from 209.15.10.11: icmp_seq=2 ttl=255 time=5 ms

----jumbo PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 5/10/20 ms
If there was a problem with the network, this is what the output would look like.
xinu 2% ping -c 3 -s 1000 rocket
PING 209.15.10.11 (209.15.10.11): 1000 data bytes

----209.15.10.11 PING Statistics----
3 packets transmitted, 0 packets received, 100% packet loss
The idea is that if I can detect when zero packets were received, I can send out a message on that condition. Combining the ping command with grep and awk pulls the number of packets received out of the summary line.
xinu 2% ping -c 3 -s 1000 rocket | grep received | awk -F, '{ print $2 }' | awk '{print $1}'
0
If there is no connectivity to the remote machine, as in this case, the result is the value 0. If there is connectivity the result would have been 3.
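The same extraction can be wrapped in a small shell function so that it is written only once. This is a minimal sketch, not part of the original hint: the function name received_count is hypothetical, and it assumes the summary line has the form shown above, with the received count in the second comma-separated field. It also prints 0 when no summary line appears at all, for example when the host name cannot be resolved.

```shell
#!/bin/sh
# received_count (hypothetical helper): read ping output on standard input
# and print the number of packets received.
# Assumes a summary line such as:
#   3 packets transmitted, 3 packets received, 0% packet loss
# Prints 0 if no line containing "received" is seen.
received_count() {
    awk -F, '/received/ { split($2, f, " "); n = f[1] }
             END { print n + 0 }'
}
```

It would be used as ping -c 3 -s 1000 rocket | received_count, which takes the place of the grep and both awk commands.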
Flavor: AT&T
Shell: sh
Syntax:
ping [-c count] [-s size] host
grep [pattern]
echo [string]
mail [-s string] address
sleep [value]
When this is added to a notification script that mails a system administrator when there are problems, it could be written as
#! /bin/sh
while [ 1 ]
do
    PING=`ping -c 3 -s 1000 rocket | grep received | awk -F, '{ print $2 }' | awk '{print $1}'`
    if [ "${PING:-0}" -eq 0 ]; then
        echo "rocket Off Network" | mail -s "PING FAILED" admin@pager.ugu.com
    fi
    sleep 60
done
Line 1: Define the shell.
Line 2: Begin the endless monitoring.
Line 4: Store the number of received data packets into the variable $PING.
Line 5: Check whether no data packets were received back from the remote host.
Line 6: If none were received, send a mail message saying that the ping test shows the system is off the network.
Line 8: Wait for a minute, and then test it again.
If you are in a large environment, you can easily modify the preceding script to perform this function on a list of hosts that you define.
#! /bin/sh
HOSTS="rocket moon pluto"
while [ 1 ]; do
    for SYS in $HOSTS; do
        PING=`ping -c 3 -s 1000 $SYS | grep received | awk -F, '{ print $2 }' | awk '{print $1}'`
        if [ "${PING:-0}" -eq 0 ]; then
            echo "$SYS Off Network" | mail -s "PING FAILED" admin@pager.ugu.com
        fi
    done
    sleep 30
done
Line 1: Define the shell.
Line 2: Define the multiple systems that are being monitored.
Line 3: Begin the endless monitoring of the remote hosts.
Line 4: Begin checking each of the remote hosts one at a time.
Line 5: Store the number of received data packets into the variable $PING.
Line 6: Check whether no data packets were received back from the remote host.
Line 7: If none were received, send a mail message saying that the ping test shows the system is off the network.
Line 10: Wait a little bit, and then try it all again.
Adjust the sleep value each time you add a host to the list. The hosts are pinged one after another, and each three-packet ping takes a few seconds, so every added host lengthens a full pass through the list. With more than about seven hosts, you might not need a pause from the sleep command at all. A good target is to test each system once every one or two minutes, but you should work out the interval that your own environment requires.
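One way to keep the interval steady as the host list grows is to time each pass and sleep only for what is left of the interval. The following is a sketch under assumptions, not part of the original script: the function name remaining_pause is hypothetical, the $(( )) arithmetic needs a POSIX shell rather than the oldest Bourne shells, and date +%s (seconds since the epoch) is a common extension that not every flavor provides.

```shell
#!/bin/sh
# remaining_pause INTERVAL ELAPSED (hypothetical helper)
# Print the number of seconds left in INTERVAL after ELAPSED seconds
# have already been spent pinging. Never prints a negative value.
remaining_pause() {
    left=$(( $1 - $2 ))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# Possible use inside the monitoring loop (assumes date +%s is available):
#   START=`date +%s`
#   for SYS in $HOSTS; do ... ; done
#   END=`date +%s`
#   sleep `remaining_pause 60 $(( END - START ))`
```

With a 60-second interval, a pass that took 9 seconds leaves a 51-second pause, and a pass that took longer than the interval leaves no pause at all.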
Monitoring a series of systems is a necessary proactive step that can be taken to maintain a healthy computing environment.
Monitoring the system is one of my favorite things to do, especially when one goes down. I'll tell you why: Almost two out of five times that my pager goes off, it's because a system dropped off the network because the user hit the power button. When a proactive phone call is made to the user, asking whether everything is okay, he is usually shocked that you knew so quickly that he tampered with the system. By staying on top of the situation, you earn his admiration. At the same time, however, he's scared that you are like a god or Big Brother that knows all and can see all.
Man pages:
awk, echo, grep, mail, ping, sleep
World Wide Web:
Big Brother-- http://maclawran.ca/bb-dnld
© Copyright Macmillan USA. All rights reserved.