
Mac OS X Unix Tutorial

by Adrian Mayo - Senior Editor for Mac OS X Unix, Janice Mayo - Senior Editor for Mac OS X Unix

Part 6 - 'grep', 'sed', and Regular Expressions (page 1 of 2)

The Story So Far

Parts 1 to 4 covered basic Unix theory. In Part 5 we used our Unix skills in a more practical way, displaying and locating files with commands such as 'head', 'tail', 'locate', and 'find'.

This part continues where Part 5 left off, introducing regular expressions and two commands that make use of them:

  • 'grep' for searching within files
  • 'sed' for changing files.

Part 7 will cover the concepts of re-direction and pipes, and how one uses pipes to combine simple commands to perform more complex tasks.

Parts 8 and 9 will cover shell scripting.

Regular Expressions

A regular expression is a pattern that can be searched for in a plain text file. It may be fixed with only one possible match, or contain special characters and have a number of possible matches.

For example, the pattern 'Janice' has only one possible match: 'Janice'.

The pattern 'J.*e' will match any text that starts with 'J' and finishes with 'e' - 'Jane', 'Janice' etc.
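To see this pattern at work before 'grep' is introduced properly, here is a minimal sketch. The list of names and the temporary file are made up for illustration; only the pattern 'J.*e' comes from the text above.

```shell
# Hypothetical sample data: one name per line in a temporary file.
names=$(mktemp)
printf '%s\n' Jane Janice Jan Scott > "$names"

# '.' matches any single character and '*' means "zero or more of the
# preceding item", so 'J.*e' is a 'J', then anything, then an 'e'.
matches=$(grep 'J.*e' "$names")
echo "$matches"

rm -f "$names"
```

This prints 'Jane' and 'Janice'. 'Jan' is skipped because it has no 'e', and 'Scott' because it has no 'J'.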

Many Unix commands work with regular expressions:

  • grep
  • sed
  • awk
  • ed, ex, and vi

BBEdit can also match using regular expressions.

To demonstrate regular expressions we need a command that makes use of them. To this end I will first introduce the infinitely useful 'grep', which searches text files for a given pattern, using just simple fixed patterns. Then I will describe more complex regular expressions, again using 'grep' to demonstrate pattern matching. Finally I will show 'sed' in action. This command searches files for regular expressions and can make changes to the matching text.



Searching Inside Files with 'grep'

So far the commands we have used locate and search for files. 'grep' searches within files: it will search one file, or many files, for a given piece of text (often called a string).

I will cover more complex usage of 'grep' in an Advanced Lesson.

The syntax of grep is:

% grep [options] pattern file

Let's search the contents of files in a directory called 'letters':

% cd ~/letters
% ls
party-list.txt   to-bank-manager.txt   to-jan.txt
to-jan2.txt      to-me.txt             to-scott.txt

To find all letters that contain the text, or string, 'Janice':

% grep Janice *
party-list.txt:Janice
to-jan.txt:Dear Janice
to-jan2.txt:Dear Janice

Each match is listed in the form:

<file name>:<line that contains the string>

Another example:

% grep adrian *
to-me.txt:Dear adrian
to-me.txt:Regards adrian

Only 2! 'grep' is case sensitive unless you give option '-i':

% grep -i adrian *
to-bank-manager.txt:Sincerely Adrian
to-jan.txt:Regards Adrian
to-jan2.txt:Regards Adrian
to-me.txt:Dear adrian
to-me.txt:Regards adrian
to-scott.txt:Regards Adrian
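If you would like to follow along, a sketch like the one below builds a stand-in for the 'letters' directory. The file contents are invented for illustration, chosen so that the searches shown so far give the same results; the scratch directory from 'mktemp' is likewise an assumption, used so that nothing in your home directory is touched.

```shell
# Use the C locale so that '*' expands in plain byte order
# (to-jan.txt before to-jan2.txt), whatever the local settings are.
export LC_ALL=C

# A scratch directory standing in for ~/letters.
letters=$(mktemp -d)
cd "$letters" || exit 1

# Invented file contents, two lines per file.
printf 'Janice\nJanet\n'               > party-list.txt
printf 'Dear Sir\nSincerely Adrian\n'  > to-bank-manager.txt
printf 'Dear Janice\nRegards Adrian\n' > to-jan.txt
printf 'Dear Janice\nRegards Adrian\n' > to-jan2.txt
printf 'Dear adrian\nRegards adrian\n' > to-me.txt
printf 'Dear Scott\nRegards Adrian\n'  > to-scott.txt

# Case-sensitive searches, as in the examples above.
janice=$(grep Janice *)
adrian=$(grep adrian *)
echo "$janice"
echo "$adrian"
```

The directory is left in place so you can try the remaining options against it. Remove it with 'rm -r "$letters"' when you have finished.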

'grep', like most Unix commands, has many options. The most useful of these are demonstrated next:

 
Tell Me More...

What's in a Name?

From where does 'grep' get its rather odd name?

The standard Unix editors, ed/ex/vi, have a command that searches every line of a file for a given 'regular expression', displaying each matching line. The syntax of this command is:

g/re/p

where 're' is the regular expression to search for.

'grep' is a stand-alone program that does just this.

Regular expressions are explained later in the tutorial. A string is a simple regular expression.

Is Scott Invited?

Is Scott invited to the party? Let's see:

% grep -i scott party-list.txt
%

Nope!

'man grep'

Never forget to 'man' new commands to learn more about them:

% man grep

<there follows much useful information....>


'grep' options

'-c' count: display each file searched and the number of lines in it that match the pattern.

% grep -ci adrian *
party-list.txt:0
to-bank-manager.txt:1
to-jan.txt:1
to-jan2.txt:1
to-me.txt:2
to-scott.txt:1

'-l' list: display the name only for each file that contains the pattern.

% grep -li adrian *
to-bank-manager.txt
to-jan.txt
to-jan2.txt
to-me.txt
to-scott.txt

'-h' no-list: display only the lines that contain the string, no filenames.

% grep -hi adrian *
Sincerely Adrian
Regards Adrian
Regards Adrian
Dear adrian
Regards adrian
Regards Adrian

'-w' word: search for the pattern as a whole word.

Compare:

% grep Jan *
party-list.txt:Janice
party-list.txt:Janet
to-jan.txt:Dear Janice
to-jan2.txt:Dear Janice

Against:

% grep -w Jan *
%

'-v' inverse: search for lines that do not contain the pattern.
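A small sketch of '-v', using a made-up guest list rather than the files in 'letters':

```shell
# Hypothetical guest list, one name per line.
guests=$(mktemp)
printf '%s\n' Janice Janet Sarah > "$guests"

# Print only the lines that do NOT contain 'Jan'.
others=$(grep -v Jan "$guests")
echo "$others"

rm -f "$guests"
```

Only 'Sarah' is printed, because it is the one line that does not contain the pattern.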

'-I' ignore: ignore binary files (non-text files) in the search.

'-E' extended: switches to extended 'grep', which can handle extended regular expressions. Note the upper case: lower-case '-e' is a different option, which marks the argument that follows as the pattern.

The standard 'grep' can only handle basic regular expressions. As an alternative to '-E', you can use 'egrep' which is just 'grep' with an implicit '-E' option.
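One visible difference between the two syntaxes is the '|' character, which means "or" only in extended regular expressions. The sketch below assumes a made-up list of names and uses upper-case '-E', the switch that selects extended syntax; '-c' is used so that each search prints a simple count.

```shell
# Hypothetical list of names, one per line.
list=$(mktemp)
printf '%s\n' Janice Janet Scott > "$list"

# Basic syntax: '|' is an ordinary character, so this looks for the
# literal text 'Janice|Janet' and finds no lines. grep reports failure
# when nothing matches, hence the '|| true'.
basic=$(grep -c 'Janice|Janet' "$list" || true)

# Extended syntax: '|' separates alternatives.
extended=$(grep -Ec 'Janice|Janet' "$list")

echo "basic: $basic, extended: $extended"
rm -f "$list"
```

The basic search counts 0 matching lines and the extended search counts 2.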


Recursive 'grep'

The option '-r' tells 'grep' to search directories recursively. Thus it will search all files in the current directory, and for each directory it finds, it will search all files in that directory too. This recursion will traverse through a directory hierarchy of arbitrary depth.

% cd ~/Sites/Tips/
% grep -iIrc Monday *
index.ws:0
unix-tricks/index.ws:20
unix-tricks/week10/friday.ws:0
unix-tricks/week10/monday.ws:1
unix-tricks/week10/thursday.ws:0
unix-tricks/week10/tuesday.ws:0
unix-tricks/week10/wednesday.ws:0
unix-tricks/week11/friday.ws:0
unix-tricks/week11/monday.ws:1
unix-tricks/week11/thursday.ws:0
unix-tricks/week11/tuesday.ws:0
unix-tricks/week11/wednesday.ws:0
......

Notice that I have combined several options: ignore case; ignore binary files (recommended for recursive searches); recursive search; print count only.
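To try a recursive search without a web site to hand, here is a minimal sketch that builds a tiny hierarchy in a scratch directory. The directory layout echoes the listing above, but the file contents are invented, and '-l' is used in place of '-c' so that only the matching file is named.

```shell
# Scratch hierarchy standing in for ~/Sites/Tips (contents are made up).
site=$(mktemp -d)
mkdir -p "$site/unix-tricks/week10"
printf 'Tips index\n'     > "$site/index.ws"
printf 'Tip for Monday\n' > "$site/unix-tricks/week10/monday.ws"
printf 'Tip for Friday\n' > "$site/unix-tricks/week10/friday.ws"

cd "$site" || exit 1

# Ignore case, skip binary files, recurse, list matching file names only.
found=$(grep -iIrl monday *)
echo "$found"

# Tidy up the scratch directory.
cd / && rm -r "$site"
```

The only file named is 'unix-tricks/week10/monday.ws', two directories below the starting point.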

 
Tell Me More...

Unix Lines

Remember that the Unix end of line is different to the standard Mac end of line: Unix ends each line with a line feed, while traditional Mac applications use a carriage return. Thus grepping Mac style files will not give the expected results - the file appears to be one long line.
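The effect is easy to demonstrate. The sketch below assumes a made-up two-line file whose lines end in carriage returns, as traditional Mac applications write them, and uses the standard 'tr' command to translate each carriage return into a Unix line feed. It also uses the '<' and '>' re-direction symbols, which are covered properly in Part 7.

```shell
mac=$(mktemp)
unix=$(mktemp)

# Hypothetical two-line file with Mac line ends (\r is a carriage return).
printf 'Regards Adrian\rRegards Janice\r' > "$mac"

# grep finds no Unix line ends, so the whole file counts as one line.
before=$(grep -c Regards "$mac")

# Translate every carriage return into a line feed, then search again.
tr '\r' '\n' < "$mac" > "$unix"
after=$(grep -c Regards "$unix")

echo "matching lines before: $before, after: $after"
rm -f "$mac" "$unix"
```

Before the conversion 'grep' counts one matching line; afterwards it counts two.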

Word Processor Files

Files from the likes of MS Word and AppleWorks contain control information and may appear to be binary files and so will be skipped by the '-I' option. They will also contain non-Unix end of line markers.


Next Page

Next page I will describe regular expressions. These can be used with 'grep' and describe a range of possible matches instead of an exact string such as "Janice". For example "Jan.*" will match "Janice", "Jan", "Janet", etc.

I will also introduce the command 'sed', which can search for and replace text described by regular expressions.


Copyright © 2000-2008 Inside Mac Media, Inc. All rights reserved.