Chapter 1: Topics in Administration: 1.5 Keep Those Daemons Running

UNIX Hints & Hacks

Chapter 1: Topics in Administration

Sections in this Chapter:
1.1 Collecting System Information	1.7 Swap on-the-Fly	1.13 Remove the ---- Dashes ----
1.2 Backup Key Files!	1.8 Keep It Up with nohup	1.14 echo Does ls
1.3 Execution on the Last Day of a Month	1.9 Redirecting Output to Null	1.15 Building Large Dummy Files
1.4 Dealing with Unwanted Daemons	1.10 Keeping Remote Users Out	1.16 Burning-in Disk Drives
1.5 Keep Those Daemons Running	1.11 Rewinding Tapes Fast	1.17 Bringing a System Down
1.6 fuser Instead of ps	1.12 Generating a Range of Numbers

1.5 Keep Those Daemons Running

1.5.1 Description

If a daemon has a habit of dying, monitor it and restart it if it dies.

Example

Flavors: AT&T

Shells: csh, ksh.

The following csh script checks the process table to see whether any of a predefined list of daemons has died. If a daemon has died, the script restarts it. If the daemon is still running, the script moves on to the next one and exits after the last daemon has been checked.

#!/bin/csh

foreach DAEMON ( MonitorSuLog.pl MonitorLogins.pl DiskHogs.pl )
 ps -e | fgrep "$DAEMON:t" > /dev/null
 if ( $status > 0 ) then
   echo "Restarting $DAEMON"
   date
   $DAEMON &
 endif
end

Line 1: Define the shell to use for the script.

Line 3: Process each of the defined daemons listed individually.

Line 4: Search the process table for the name of the daemon. Only the exit status of the search matters, so any output is sent to /dev/null.

Line 5: If the daemon was not found in the process table, the exit status is greater than 0 and lines 6, 7, and 8 are executed. If the daemon is still running, the status is 0 and the script skips ahead to line 10.

Line 6: Send the restart message to standard out.

Line 7: Send the current date and time to standard out.

Line 8: Start the daemon.

Line 9: End the testing.

Line 10: If there are more daemons to test, get the next daemon and check the daemon with line 4. If there are no more daemons to check, exit the script.
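Because this hack lists ksh as well as csh, the same logic can be sketched for a Bourne-family shell. The following is only an illustration, not the book's script: the helper names list_processes, is_running, and check_daemon are made up for this example, and the daemons are taken from the command line rather than from a fixed list.

```shell
#!/bin/sh
# Sketch of the daemon monitor for sh/ksh. Helper names are illustrative.
# Hypothetical usage: monitor_daemons MonitorSuLog.pl MonitorLogins.pl DiskHogs.pl

# Print the process table; kept in one place so it is easy to swap out.
list_processes() {
    ps -e
}

# Succeed (exit status 0) if the name $1 appears in the process table.
is_running() {
    list_processes | grep -F -- "$1" > /dev/null
}

# Restart the daemon given as $1 unless it is already running.
check_daemon() {
    name=$(basename "$1")         # same idea as the csh :t modifier
    if is_running "$name"; then
        return 0                  # still alive, nothing to do
    fi
    echo "Restarting $1"
    date
    "$1" &                        # start the daemon in the background
}

# Check every daemon named on the command line.
for DAEMON in "$@"; do
    check_daemon "$DAEMON"
done
```

Testing the exit status of grep directly in the if statement avoids a separate comparison against the status variable.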

To get this to monitor the daemons continuously throughout the day, put an entry into the crontab and have it run every 10 minutes. Modify the crontab setting:

# crontab -l > /tmp/crontab.txt
# vi /tmp/crontab.txt

Add the following line into the crontab file so that it runs the monitor script every 10 minutes:

0,10,20,30,40,50 * * * * /usr/local/bin/monitor_daemons

With the previous crontab entry, any output from the job (the restart message and the date) will be logged if accounting is turned on, and possibly mailed to the user who owns the cron job. If you don't expect the daemons to die very often, keep this entry so that you can see how often they are dying. If you expect the daemons to die a great deal, send the output to /dev/null with the following crontab entry:

0,10,20,30,40,50 * * * * /usr/local/bin/monitor_daemons >/dev/null 2>&1
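A possible middle ground between mail and /dev/null, which the original text does not cover, is to append the output to a log file and count the restart messages from time to time. The sketch below assumes a hypothetical log path and assumes that the monitor prints one "Restarting" line per restart, as the script above does. The count_restarts name is made up for this example.

```shell
#!/bin/sh
# Hypothetical crontab entry that appends the monitor's output to a log
# instead of mailing or discarding it (the log path is an assumption):
#   0,10,20,30,40,50 * * * * /usr/local/bin/monitor_daemons >>/var/tmp/monitor_daemons.log 2>&1

# Print how many times each daemon was restarted, most frequent first.
# $1 = log file containing the "Restarting <daemon>" lines.
count_restarts() {
    grep '^Restarting ' "$1" | sort | uniq -c | sort -rn
}

# Report only when a log file is named on the command line and exists.
if [ $# -gt 0 ] && [ -f "$1" ]; then
    count_restarts "$1"
fi
```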

After the entry has been made in the crontab.txt file, submit the file to cron. For security reasons, remove the crontab.txt file afterward:

# /bin/crontab /tmp/crontab.txt
# rm /tmp/crontab.txt
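The dump, edit, submit, and remove steps can also be scripted. The following is a minimal sketch under a few assumptions: the function names add_entry and install_entry are made up for this example, and mktemp (widely available, but not on every UNIX flavor) is used so that the temporary file gets an unpredictable name. The call that actually changes the crontab is left commented out.

```shell
#!/bin/sh
# Sketch: add the monitor entry to the current user's crontab without an editor.

ENTRY='0,10,20,30,40,50 * * * * /usr/local/bin/monitor_daemons'

# Append line $2 to file $1 unless an identical line is already there.
add_entry() {
    grep -F -x -- "$2" "$1" > /dev/null 2>&1 || printf '%s\n' "$2" >> "$1"
}

# Dump the crontab to a private temporary file, add the entry,
# resubmit the file to cron, and clean up.
install_entry() {
    tmp=$(mktemp) || return 1
    crontab -l > "$tmp" 2>/dev/null
    add_entry "$tmp" "$ENTRY"
    crontab "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}

# install_entry    # uncomment to apply; left off so the sketch has no side effects
```

Because add_entry checks for an identical line first, running the script twice does not duplicate the crontab entry.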

Reason

This hack isn't a fix, merely a patch over an underlying problem: daemons should not die if they are functioning properly. Even so, there are times when daemons belonging to certain applications and system programs do die. If you do not have maintenance support, you could be out of luck and stuck with the problem. If you do have support, you already know that it takes time to get to that second- or third-level tech support person who knows what you are talking about, and time is not on your side.

Real World Experiences

Experience has shown that daemons sometimes die for unknown reasons. Being faced with a DNS daemon dying twice a month and receiving those wonderful early morning calls is no fun. Because the problem might occur only rarely, it can be very difficult to figure out. This hack patches the problem quickly and, if nothing else, stops your pager from going off.

Another great use for this hack is to monitor a process; but instead of restarting the daemon, you can start a new process when the old one dies. There are times when data cannot be verified, checked, or processed until another program finishes, so keeping an eye on specific processes can help to automate your environment.
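As a rough illustration of this second use, the sketch below waits until a named process has left the process table and then runs a follow-up command. It polls in a loop rather than running from cron, which is a choice made for this example only. The names run_after, POLL_SECS, nightly_load, and verify_data are all hypothetical.

```shell
#!/bin/sh
# Sketch: wait for a process to leave the process table, then run a follow-up.
# Hypothetical usage: run_after.sh nightly_load verify_data

POLL_SECS=${POLL_SECS:-60}        # seconds between checks; an arbitrary default

# Print the process table.
list_processes() {
    ps -e
}

# Succeed (exit status 0) if the name $1 appears in the process table.
is_running() {
    list_processes | grep -F -- "$1" > /dev/null
}

# $1 = command name to wait for; remaining arguments = command to run afterward.
run_after() {
    watched=$1
    shift
    while is_running "$watched"; do
        sleep "$POLL_SECS"
    done
    "$@"
}

# Do nothing unless both a process name and a follow-up command were given.
if [ $# -ge 2 ]; then
    run_after "$@"
fi
```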

Other Resources

Man pages:

cron, crontab, ps, test
