UNIX Hints & Hacks

Chapter 4: System Monitoring

Sections in this Chapter:
4.1 Monitoring at Boot Time	4.5 Mail a Process	4.9 Monitoring with ping	4.13 Checking the Time
4.2 Starting with a Fresh Install	4.6 Watching the Disk Space	4.10 Monitoring Core Files
4.3 Monitor with tail	4.7 Find the Disk Hog	4.11 Monitoring Crash Files
4.4 Cut the Log in Half	4.8 Watching by grepping the Difference	4.12 Remember Daylight Savings Time

4.10 Monitoring Core Files

4.10.1 Description

Watching for core files is an important way to avoid wasting disk space. If the user doesn't need them, get rid of them.

Example One: Locating the Core Files

Flavors: AT&T, BSD

Shells: All

Syntax:

find dirname [-xdev] [-local] [-mount] [-name file] expression

The find command has an option that keeps it from crossing into other filesystems, including NFS-mounted filesystems. The name of the option varies by flavor: it can be -x, -xdev, -local, or -mount. Check your man pages to see which one your flavor uses.

Search the local root filesystem for all core files that have not been accessed in more than three days and print them to standard output by using

# find / -xdev -name core -atime +3 -print
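The same search can be wrapped in a small function so that the starting directory and the age are easy to change. This is only a minimal sketch: the function name stale_cores and its default of three days are made up for illustration, and -xdev is assumed to be the option your flavor uses. It also adds -type f so that directories that happen to be named core are skipped.

```shell
#!/bin/sh
# stale_cores DIR [DAYS] (hypothetical helper, not a standard command)
# List regular files named "core" under DIR that were last accessed more
# than DAYS days ago (default 3), without crossing into other filesystems.
# Assumes your find spells the "stay on this filesystem" option as -xdev.
stale_cores() {
    find "$1" -xdev -type f -name core -atime +"${2:-3}" -print
}
```

For example, stale_cores / 7 would list core files on the root filesystem that have not been accessed in more than a week.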

If it is determined in your environment that it is safe to remove any and all core files, find can execute the remove command on the core files that it finds.

# find / -local -name core -exec rm -f {} ';'

In this version of the command, find searches the local filesystems for files named core. Each time it finds one, it substitutes the pathname of that file for {} and executes the rm command on it. This continues until the find command completes its search. This can be placed in the crontab to be run every night. The first two fields of a crontab entry are the minute and the hour, so an entry that runs at 12:15 a.m. would appear as

15 0 * * * find / -local -name core -exec rm -f {} ';'
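If the removal runs unattended, it helps to have a record of what was deleted. The following is a sketch under a few assumptions: purge_cores is a hypothetical name, -xdev is assumed to be the option your flavor supports, and -type f is added so that only regular files are touched. Each pathname is printed before rm runs on it, so on systems where cron mails the output of a job, the list ends up in the mailbox of the crontab's owner.

```shell
#!/bin/sh
# purge_cores DIR (hypothetical helper, not a standard command)
# Print, then remove, every regular file named "core" under DIR while
# staying on DIR's filesystem. Directories named "core" are left alone.
purge_cores() {
    find "$1" -xdev -type f -name core -print -exec rm -f {} \;
}
```

Try it on a scratch directory before pointing it at /, because nothing it removes can be recovered.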

Reason

When a program receives a QUIT signal, or another signal whose default action is to dump core, such as SEGV or BUS, the system writes to disk what was in the program's memory at the time the signal arrived. These core files can be as large as the amount of memory in the system. More often, they are about the size of the memory that the application was using when the core file was created.

Real World Experience

The root filesystem is sometimes allocated only 7-30MB of disk space for the partition. If a large enough core file is created and root fills up with no space left, the system has the potential to grind slowly to a halt or crash.
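To judge how much of a small partition the core files are holding, their sizes can be added up. The sketch below is one way to do it, assuming -xdev is the option your flavor supports and that the fifth column of ls -l is the size in bytes, as it is for regular files on the systems I know of. The name core_bytes is hypothetical.

```shell
#!/bin/sh
# core_bytes DIR (hypothetical helper, not a standard command)
# Print the combined size, in bytes, of all regular files named "core"
# under DIR, staying on DIR's filesystem. Prints 0 when none are found.
core_bytes() {
    find "$1" -xdev -type f -name core -exec ls -l {} + |
        awk '{ total += $5 } END { print total + 0 }'
}
```

Running core_bytes / and comparing the result with the free space reported by df shows how much room a cleanup would win back.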

It is very important to keep a watchful eye for these files. Today, some vendors have this find command built in to the crontab when you do a base default installation of the operating system. It is one of the little forgotten commands that add to an administrator's headaches on a bad day.

Core files can be written in various places throughout the system. Each location can be considered individually when deciding whether a core file needs to be removed right away. When memory is dumped to a core file, the file can reside in the user's home directory, the directory the application resides in, or the root directory.

User Home Directory: Some applications lock themselves to the directory of the user running the application. When the core file is dumped, it is placed in the home directory of that user. You should verify that the user does not need the core file before it is removed. She might be experiencing problems with the application and working with a vendor who needs the core file to resolve the problem. In reality, 9 out of 10 users never need the core file, and half of those never even know what the file is or how it was created. They leave it in their home directories thinking it is a core file that the system uses as part of their account.

Application Directory: If applications are running daemons and they crash, the application can trap the signal and write the core file to a predetermined directory set by an environment variable that the application knows about. Vendors sometimes do this so that they can tell users where the necessary maintenance and support files can be found to help solve problems that users are having. The users of the application might be working with the vendor to resolve any problems they are having. You might want to check with them before removing these files.

Root Directory: The core files that end up in the root directory should be either moved and analyzed or removed. Core files reside here when an application running as root is terminated by a QUIT signal or, on some BSD systems, when the operating system crashes in a hard way with a memory parity error or other hardware failure.
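Because each of the three locations calls for a different decision, a cleanup script can sort the core files it finds before acting on them. The sketch below shows the idea only. The function name core_location and the HOME_BASE variable are hypothetical, and it assumes that home directories live under /home, which is not true of every site.

```shell
#!/bin/sh
# core_location PATH (hypothetical helper, not a standard command)
# Classify the pathname of a core file as "root", "home", or "application".
# HOME_BASE is an assumed variable naming the parent of the home
# directories; it defaults to /home.
core_location() {
    home_base=${HOME_BASE:-/home}
    case $1 in
        /core)           echo root ;;         # dumped in the root directory
        "$home_base"/*)  echo home ;;         # ask the user before removing
        *)               echo application ;;  # ask the application's users
    esac
}
```

A script could then remove the root entries right away and mail the owners of the others instead of deleting their files.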

Because these core files are binary files, you might want to use the strings command to extract any useful information that you can. The output can be extremely long, so you should pipe it to more.

# strings core | more

couldn't register prog %d vers %d
out of memory
registerrpc: %s
trouble replying to prog %d
never registered prog %d
svc_tcp.c - tcp socket creation problem
svctcp_.c - cannot getsockname or listen
out of memory
svctcp_create: %s
svc_tcp: makefd_xprt: %s
svcudp_create: socket creation problem
svcudp_create - cannot getsockname
out of memory
svcudp_create: %s
enablecache: cache already enabled
enablecache: could not allocate cache
enablecache: could not allocate cache data
enablecache: could not allocate cache fifo
cache_set: victim not found
cache_set: victim alloc failed
cache_set: could not allocate new rpc_buffer
.
.
.
etc...
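Output like this often repeats the same message many times (out of memory appears three times in the short listing above). One way to shorten it is to count the duplicates. The following is a sketch, not the only approach: core_messages is a hypothetical name, and the default minimum length of 8 characters is an arbitrary choice to cut down on short fragments.

```shell
#!/bin/sh
# core_messages FILE [MINLEN] (hypothetical helper, not a standard command)
# Print each distinct printable string of at least MINLEN characters
# (default 8) found in FILE, preceded by the number of times it occurs,
# with the most frequent strings first.
core_messages() {
    strings -n "${2:-8}" "$1" | sort | uniq -c | sort -rn
}
```

The result can still be long, so core_messages core | more remains a sensible way to read it.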

The largest core I've experienced was a little over 500MB and took over 15 minutes to write out. I had a root filesystem configured on a 4GB disk partition, so there was enough space to write out the core file. The programmers didn't want to keep it around, because it was too much for them to analyze. They already knew they had a memory leak problem in the program.

One last note on the subject of core files. Time and time again, third-party first-line technical support people ask users to email the core files. Unless you want to have more problems on your hands, don't let your users do this! Tell them to use the FTP site, because core files are usually too big for email gateways and can get rejected or, worse, they can hang up in an SMTP gateway somewhere. Users often believe the person on the other end of the phone knows what he or she is talking about because they believe technical support knows what's best. Only UNIX administrators know what's best for their environment, right?

Other Resources

Man pages:

find, strings
