UNIX Hints & Hacks

Chapter 4: System Monitoring

Sections in this Chapter:

4.1 Monitoring at Boot Time
4.2 Starting with a Fresh Install
4.3 Monitor with tail
4.4 Cut the Log in Half
4.5 Mail a Process
4.6 Watching the Disk Space
4.7 Find the Disk Hog
4.8 Watching by grepping the Difference
4.9 Monitoring with ping
4.10 Monitoring Core Files
4.11 Monitoring Crash Files
4.12 Remember Daylight Savings Time

4.10 Monitoring Core Files

4.10.1 Description

Watching for core files is an important way to avoid wasting disk space. If your users don't need them, get rid of them.

Example One: Locating the Core Files

Flavors: AT&T, BSD

Shells: All

Syntax:

find dirname [-xdev] [-local] [-mount] [-name file] expression

The find command has an option that keeps a search from crossing into other filesystems, including NFS-mounted filesystems. The name of that option varies by flavor: it may be -x, -xdev, -local, or -mount. Check your man pages to see which one your flavor uses.

Search the local root filesystem for all core files that have not been accessed in more than three days and print them to standard output:

# find / -xdev -name core -atime +3 -print
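Before deciding what to do with the matches, it helps to see how large they are. The following is a minimal sketch, not a standard tool: the function name list_cores and its arguments are hypothetical, and it assumes your find accepts -xdev (substitute -x, -local, or -mount as your flavor requires). It adds -type f so that directories that happen to be named core are skipped.

```shell
# list_cores DIR [DAYS]  (hypothetical helper, not a standard command)
# Prints a long listing of regular files named "core" under DIR that
# have not been accessed in more than DAYS days (default: 3).
list_cores() {
    dir=${1:?usage: list_cores dir [days]}
    days=${2:-3}
    # -xdev is an assumption; use the option your flavor of find provides.
    find "$dir" -xdev -type f -name core -atime +"$days" -exec ls -l {} \;
}
```

Running list_cores / 3 as root produces the same set of files as the find command above, with sizes and owners included.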

If you determine that it is safe in your environment to remove any and all core files, find can run the rm command on each core file that it finds.

# find / -local -name core -exec rm -f {} ';'

In this version of the command, find searches the local filesystems for files named core. Each time it finds one, it substitutes the file's pathname for {} and runs rm -f on it; the quoted semicolon marks the end of the rm command. This continues until find completes its search. The command can be placed in the crontab to run once a day. The entry below runs at 12:15 p.m.; change the first two fields (minute and hour) to run it overnight instead. The crontab entry would appear as

15 12 * * * find / -local -name core -exec rm -f {} ';'
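If removing every core file unconditionally feels too aggressive, a more cautious variant can be put in a small script and called from cron instead. This is only a sketch under stated assumptions: the function name purge_cores is hypothetical, -xdev stands in for whichever filesystem option your find supports, and the three-day default simply mirrors the earlier example. It removes only regular files, leaves recently accessed cores alone, and prints each path before removing it so the job leaves a record of what it deleted.

```shell
# purge_cores DIR [DAYS]  (hypothetical helper, not a standard command)
# Removes regular files named "core" under DIR that have not been
# accessed in more than DAYS days (default: 3), printing each path first.
purge_cores() {
    dir=${1:?usage: purge_cores dir [days]}
    days=${2:-3}
    # -xdev is an assumption; use the option your flavor of find provides.
    # -type f keeps rm away from directories that are named core.
    find "$dir" -xdev -type f -name core -atime +"$days" -print -exec rm -f {} \;
}
```

Most cron implementations mail a job's output to the owner of the crontab, so the printed paths double as a removal report.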

Reason

When a program receives a QUIT signal, or another fatal signal such as the one raised by a segmentation violation, it writes the contents of its memory at that moment to disk as a core file. A core file can be as large as the amount of memory in the system. More often, it is about the size of the memory that the running application was using when the core file was created.

Real World Experience

The root filesystem is sometimes allocated only 7-30MB of disk space for the partition. If a large enough core file is created and root fills up with no space left, the system has the potential to grind slowly to a halt or crash.

It is very important to keep a watchful eye for these files. Today, some vendors have this find command built in to the crontab when you do a base default installation of the operating system. It is one of the little forgotten commands that add to an administrator's headaches on a bad day.

Core files can be written in various places throughout the system. Each place where core files reside can be considered individually when deciding whether a file needs to be removed right away. When memory is dumped to a core file, the file can land in the user's home directory, in the directory the application resides in, or in the root directory.
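Because the likely locations are few, checking just those directories is much cheaper than walking the whole filesystem. The sketch below assumes you already know which directories to watch; the function name check_core_dirs and the example paths in the usage note are hypothetical.

```shell
# check_core_dirs DIR...  (hypothetical helper, not a standard command)
# Prints a long listing of the core file sitting directly in each
# directory given, without descending into subdirectories.
check_core_dirs() {
    for d in "$@"; do
        if [ -f "$d/core" ]; then
            ls -l "$d/core"
        fi
    done
}
```

For example, check_core_dirs / /home/* /usr/local/app/bin would look in the root directory, each home directory under /home, and one application directory, assuming those paths exist on your system.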

Because core files are binary files, you might want to use the strings command to extract whatever useful text they contain. The output can be extremely long, so pipe the command to more.

# strings core | more
couldn't register prog %d vers %d
out of memory
registerrpc: %s
trouble replying to prog %d
never registered prog %d
svc_tcp.c - tcp socket creation problem
svctcp_.c - cannot getsockname or listen
out of memory
svctcp_create: %s
svc_tcp: makefd_xprt: %s
svcudp_create: socket creation problem
svcudp_create - cannot getsockname
out of memory
svcudp_create: %s
enablecache: cache already enabled
enablecache: could not allocate cache
enablecache: could not allocate cache data
enablecache: could not allocate cache fifo
cache_set: victim not found
cache_set: victim alloc failed
cache_set: could not allocate new rpc_buffer
. . . etc...
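Paging through thousands of strings is tedious, so it can help to narrow the output to lines that look like error messages first. The filter below is a rough sketch: the function name likely_errors is hypothetical, and the pattern is only a guess based on the kind of messages shown above, so adjust it to suit the application that dumped core.

```shell
# likely_errors  (hypothetical filter, reads text on standard input)
# Keeps lines that look like error messages, counts duplicates, and
# shows the 20 most frequent messages first.
likely_errors() {
    grep -i -E 'out of memory|cannot|could not|couldn.t|failed|error' |
        sort | uniq -c | sort -rn | head -n 20
}
```

It is meant to sit at the end of a pipeline, as in strings core | likely_errors. Keep in mind that many of these strings are simply message text compiled into the program and its libraries, so a match does not prove that the error actually occurred.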

The largest core file I've seen was a little over 500MB and took over 15 minutes to write out. I had a root filesystem configured on a 4GB disk partition, so there was enough space to write out the core file. The programmers didn't want to keep it around, because it was too much for them to analyze. They already knew their program had a memory leak.

One last note on the subject of core files. Time and time again, third-party first-line technical support people ask users to email the core files. Unless you want to have more problems on your hands, don't let your users do this! Tell them to use the FTP site, because core files are usually too big for email gateways and can get rejected or, worse, they can hang up in an SMTP gateway somewhere. Users often believe the person on the other end of the phone knows what he or she is talking about because they believe technical support knows what's best. Only UNIX administrators know what's best for their environment, right?

Other Resources

Man pages:

find, strings

© Copyright Macmillan USA. All rights reserved.