UNIX Hints & Hacks

Chapter 6: File Management

 

6.10 Clean Up DOS Files

6.10.1 Description

Here is a simple way to get rid of all the ^Ms at the end of lines.

Flavors: AT&T, BSD

Syntax:

tr -d string < infile > outfile
sed 's/[regular expression]/[replacement]/[flags]' infile > outfile

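Files downloaded from PCs running DOS often show a Ctrl-M (^M) at the end of every line when viewed in an editor, because DOS ends each line with a carriage return followed by a line feed, while UNIX uses a line feed alone.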

# vi /tmp/hosts.dos

206.19.11.10    pluto    pluto.foo.com^M
206.19.11.203    star    star.foo.com^M
206.19.11.161    moon    moon.foo.com^M
206.16.11.201    mars    mars.foo.com^M

There are a couple of ways to strip the Ctrl-M (^M) out of the file. The first is the tr command, which translates characters and can also be told to delete every Ctrl-M character.

# tr -d "\015" < /tmp/hosts.dos > /tmp/hosts.unix

In this tr command, you delete (-d) all occurrences of the \015 character (octal for Ctrl-M, the carriage return) from the file /tmp/hosts.dos and write the output to the file /tmp/hosts.unix. Another way to strip the Ctrl-M character is to pass the file through sed and have it perform a substitution:

# sed 's/^V^M//g' /tmp/hosts.dos > /tmp/hosts.unix

In this version, sed processes the file /tmp/hosts.dos, searching for all occurrences of the Ctrl-M character. The ^V^M in the command is typed as Ctrl-V followed by Ctrl-M; the Ctrl-V makes the next keystroke be entered literally. When sed finds a Ctrl-M, it replaces the character with nothing and writes the result to the file /tmp/hosts.unix.
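Typing a literal Ctrl-M is awkward inside a script. The following is a minimal sketch that builds a small sample file with printf and strips the carriage returns with both tr and sed, without any control characters in the script itself. The scratch directory created with mktemp (widely available, though not guaranteed on every flavor) and the names workdir, dosfile and cr are assumptions made for illustration.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical location for this demo).
workdir=$(mktemp -d)
dosfile="$workdir/hosts.dos"

# Build a two-line sample whose lines end in CR LF, as a DOS file would.
printf '206.19.11.10    pluto    pluto.foo.com\r\n' > "$dosfile"
printf '206.19.11.203    star    star.foo.com\r\n' >> "$dosfile"

# Method 1: tr deletes every carriage return (octal 015).
tr -d '\015' < "$dosfile" > "$workdir/hosts.tr"

# Method 2: sed removes the carriage return. The CR is held in a
# shell variable, so no Ctrl-V Ctrl-M typing is needed.
cr=$(printf '\r')
sed "s/$cr//g" "$dosfile" > "$workdir/hosts.sed"

# Both results should be identical; cmp is silent when they are.
cmp "$workdir/hosts.tr" "$workdir/hosts.sed" && echo "tr and sed outputs match"
```

Holding the carriage return in a variable keeps the script readable in any editor and avoids relying on sed understanding an escape such as \r, which not every sed supports.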

The same substitution can also be performed from within the vi editor. The vi editor has two modes: the insert mode and the command mode. From within the command mode, the substitute command, which uses the same syntax as sed, can be executed to perform the substitution.

# vi /tmp/hosts.dos

When inside the vi editor, make sure you are in the command mode by pressing the Esc key. Pressing the colon (:) key allows you to input the substitute command. Here ^V^M is typed as Ctrl-V followed by Ctrl-M.

:%s/^V^M//g

This command behaves the same as using the sed command from a UNIX shell. It searches for all occurrences of the Ctrl-M character. When it finds one, it substitutes the character with nothing. You can then continue working in the file if you have more changes to make.

206.19.11.10    pluto    pluto.foo.com
206.19.11.203    star    star.foo.com
206.19.11.161    moon    moon.foo.com
206.16.11.201    mars    mars.foo.com

The result is a clean file with no Ctrl-M characters anywhere in it.
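One way to confirm the result without opening an editor is to count the lines that still contain a carriage return. This is a minimal sketch; the helper name count_cr_lines and the throwaway sample file are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# count_cr_lines FILE: print how many lines of FILE still contain a CR.
# (Hypothetical helper name, not a standard command.)
count_cr_lines() {
    cr=$(printf '\r')
    grep -c "$cr" "$1"
}

# Throwaway sample with DOS line endings.
sample=$(mktemp)
printf 'moon    moon.foo.com\r\nmars    mars.foo.com\r\n' > "$sample"

echo "before: $(count_cr_lines "$sample")"

# Strip the carriage returns, then count again.
tr -d '\015' < "$sample" > "$sample.unix"
echo "after: $(count_cr_lines "$sample.unix")"
```

For this sample the script prints "before: 2" and then "after: 0". A count of 0 on a real file means no line carries a Ctrl-M any longer.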

Reason

Some databases and applications require data that originates outside the UNIX world. Files imported from DOS must be clean plain text with no special control characters embedded in them.

Real World Experience

Some new flavors of UNIX provide tools for handling cases such as these. These tools are called to_dos, to_unix, unix2dos and dos2unix and provide extended options. Check with your flavor to see whether it supports these commands.
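A quick way to see which of these tools a given system offers is to look each one up on the PATH. The sketch below assumes a POSIX shell with the command -v builtin; the function name list_converters is hypothetical. It only reports availability and does not run the tools, since their options differ between flavors.

```shell
#!/bin/sh
# list_converters: report which of the conversion tools named above
# are installed on this system. (Hypothetical helper name.)
list_converters() {
    for tool in to_dos to_unix unix2dos dos2unix; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: available"
        else
            echo "$tool: not found"
        fi
    done
}

list_converters
```

If none of the tools is found, the tr and sed approaches described earlier still work on any flavor.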

Other Resources

Man pages:

sed, tr, to_dos, dos2unix

© Copyright Macmillan USA. All rights reserved.