Page 1: Lecture 8: Using RegExps with Sed and Awkmath.iit.edu/~mccomic/595/unixhelp/lectures/lecture8.pdf · C/C++/Java? Stuff like this is the reason scripting was created - you can get

SedAwk

Lecture 8: Using RegExps with Sed and Awk

CS2042 - UNIX Tools

October 17, 2008

Lecture 8: Sed and Awk

Lecture Outline

1 Sed
    The Translate Command
    Intro to Stream Editing
    Some Practical Examples

2 Awk
    What is Awk?
    Awk Variables

When to Script

What if we wanted to...

Change a Notepad-style text file to Unix-style?

Strip directory prefixes from a path?

Print certain columns from a text file?

Remove all the comment lines from some code?

How much time/effort/code would it take to do this stuff in C/C++/Java?

Stuff like this is the reason scripting was created - you can get each task done in a line or two.

Search and Replace with TR

The Translate Command

tr [options] <set1> [set2]

Translate or delete characters

Sets are strings of characters

Only works on STDIN/STDOUT - use redirection to translate files!

By default, replaces each character in set1 with the character at the same position in set2 (it maps single characters, not whole strings)

tr -c <set1> [set2] will complement set1 before replacing it with set2

tr -d <set1> deletes the characters in set1 without translating anything.
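A minimal sketch of the three behaviours listed above, using only echo, printf and tr. The sample strings and variable names are made up for illustration.

```shell
# Character-by-character mapping: a->x, b->y, c->z (not the string "abc").
mapped=$(echo "cab abc" | tr 'abc' 'xyz')
echo "$mapped"

# -d deletes every character listed in set1.
deleted=$(echo "hello world" | tr -d 'lo')
echo "$deleted"

# -c complements set1: every character that is NOT a digit becomes '-'.
# printf is used so there is no trailing newline for tr to translate.
complemented=$(printf 'a1b22c' | tr -c '0-9' '-')
echo "$complemented"
```

Note that the first command turns "cab" into "zxy": tr never looks for the sequence "abc", only for the individual characters.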

TR Examples

Example:

Try echo * - it prints everything in the directory, separated by spaces. Let’s separate them by newlines instead:

echo * | tr ' ' '\n' - replaces all spaces with newlines

Example:

Let’s print a file in all uppercase:

tr 'a-z' 'A-Z' < test.txt - prints the contents of test.txt in all caps

Redirection Revisited

Bash processes I/O redirection from left to right, allowing us to do fun things like this:

Example:

Let’s delete everything but the numbers from test1.txt, then store them in test2.txt.

tr -cd '0-9' < test1.txt > test2.txt

Note:

Redirecting from and to a file at the same time (i.e. if test2.txt above were also test1.txt) will empty the target file. To preserve the original data, either redirect output to a different file, or append the output to the end of the original using the >> operator.
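One common way to get the effect of editing a file "in place" is to write to a temporary file and rename it afterwards. The sketch below is an illustration, not something from the slides; the file contents and the name test1.tmp are assumptions.

```shell
# Work in a scratch directory so the sketch leaves nothing behind.
workdir=$(mktemp -d)
printf 'room 101, floor 3\n' > "$workdir/test1.txt"

# Write to a temporary file first, then replace the original
# only if tr succeeded. test1.txt is never open for reading and
# writing at the same time, so it is not emptied.
tr -cd '0-9' < "$workdir/test1.txt" > "$workdir/test1.tmp" \
  && mv "$workdir/test1.tmp" "$workdir/test1.txt"

result=$(cat "$workdir/test1.txt")
echo "$result"

rm -r "$workdir"
```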

About Sed

Stream Editor

sed [options] [script] [file]

Stream editor for filtering and transforming text

We’ll focus on the form sed 's/<regexp>/<text>/' [file]

This form replaces the first match of <regexp> on each line with <text>

What is the difference between sed and tr?

We can match RegExps!

sed also does a lot of other stuff, grep some docs for details!
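A short sketch of the substitute form on made-up input. The g flag shown here is standard sed but is not covered on the slide; it is included only to contrast with the default of one replacement per line.

```shell
line="blue sky, blue sea"

# Without a flag, only the first match on each line is replaced.
first=$(echo "$line" | sed 's/blue/grey/')
echo "$first"

# The g flag replaces every match on the line.
all=$(echo "$line" | sed 's/blue/grey/g')
echo "$all"

# The pattern is a regular expression, which tr cannot express:
# here, a run of one or more digits.
digits=$(echo "room 101, floor 3" | sed 's/[0-9][0-9]*/N/g')
echo "$digits"
```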

Sed In Use

Example:

echo "The sky is blue" > test
sed 's/blue/falling/' test

The sky is falling

Or, without using a file:

Example:

echo "The sky is blue" | sed 's/blue/falling/'

The sky is falling

Examples

Let’s strip the directory prefix from our pathnames (i.e. convert /usr/local/src to src)

Example:

pwd | sed 's/.*\///'

Translates anything preceding (and including) a frontslash to nothing. Because .* is greedy, the match runs through the last frontslash on the line.

Note the backslash-escaped frontslash!

Without escaping the slash, sed would take it as the end of the RegExp, which would then be just .*
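As an alternative to escaping, sed accepts a delimiter other than the slash after the s. The sketch below compares the two spellings on a fixed path (the variable names are illustrative); the choice of | as delimiter is arbitrary.

```shell
path="/usr/local/src"

# Escaped slash, as in the example above.
escaped=$(echo "$path" | sed 's/.*\///')
echo "$escaped"

# Same substitution with | as the delimiter, so / needs no escaping.
piped=$(echo "$path" | sed 's|.*/||')
echo "$piped"
```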

A Sed Script

Any text file that begins with #! and has execute permission can be run as a script!

Example:

Create a new text file named trim.sed

#!/usr/bin/sed -f
s/^ *//
s/ *$//

Make it executable with chmod +x trim.sed. You can now run this script from the shell like any other local program:

echo "   this is a test   " | ./trim.sed

this is a test

And we have a script that trims leading and trailing spaces!
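The #! line hard-codes /usr/bin/sed, and on some systems sed is installed elsewhere (for example /bin/sed). The sketch below sidesteps that by passing the same two commands to sed -f directly; the scratch directory and variable names are assumptions made for the illustration.

```shell
workdir=$(mktemp -d)

# Write the two substitution commands to a script file.
cat > "$workdir/trim.sed" <<'EOF'
s/^ *//
s/ *$//
EOF

# Run it with sed -f, which works wherever sed is on the PATH.
trimmed=$(echo "   this is a test   " | sed -f "$workdir/trim.sed")
echo "[$trimmed]"

rm -r "$workdir"
```

The brackets in the final echo only make the absence of surrounding spaces visible.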

A New Language

Awk is another scripting language which is specifically designed to operate on database-like text files.

What is a database?

Databases are files used to store data in a way that is easy to search. A simple database is made up of records (rows) and fields (columns). Fields store data related to each record.

A simple database of highballs and their ingredients might look like:

screwdriver    vodka      orange juice
highball       whiskey    ginger ale
bulldog        gin        grapefruit juice

Awk Syntax

The GNU version of Awk is called gawk.

The gawk Command

(1) gawk [options] -f program-file [target-file]
(2) gawk [options] program-text [target-file]

Programs can be specified through a separate file (1) or as part of the command (2)

If no target file is specified, it’ll work on STDIN

An Awk program consists of a sequence of pattern-action pairs:

pattern {action statements}

The action statements will be run on any input record that matches the pattern.
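The two invocation forms can be sketched side by side. Plain awk is used so the sketch also runs where gawk is not installed; gawk accepts the same program. The file name first.awk and the input lines are hypothetical.

```shell
workdir=$(mktemp -d)

# Form (1): the program lives in its own file.
cat > "$workdir/first.awk" <<'EOF'
/gin/ { print $1 }
EOF
from_file=$(printf 'bulldog gin\nhighball whiskey\n' | awk -f "$workdir/first.awk")
echo "$from_file"

# Form (2): the same program given as text on the command line.
# Single quotes keep the shell from touching $1 and the braces.
inline=$(printf 'bulldog gin\nhighball whiskey\n' | awk '/gin/ { print $1 }')
echo "$inline"

rm -r "$workdir"
```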

Examples

Print the second and fourth columns of the input

echo "This is a test" | gawk '{print $2, $4}'
is test

If no pattern is given, each record is acted on

Print the second column of lines containing blue:

gawk '/blue/ {print $2}'

Print lines where the 3rd column is bologna

gawk '$3=="bologna" {print}'

Print lines from #START through #FINISH (the marker lines are included)

gawk '/#START/,/#FINISH/ {print}'
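To see what each pattern selects, the sketch below runs the three programs against a small made-up input. The data is invented for illustration, and plain awk stands in for gawk so the sketch runs on systems without it.

```shell
# Hypothetical sample input, invented for this sketch.
data='#START
alice blue bologna
bob red cheese
#FINISH
carol green blue'

# Second column of lines containing "blue" anywhere on the line.
blue=$(printf '%s\n' "$data" | awk '/blue/ {print $2}')
echo "$blue"

# Whole lines whose third column is exactly "bologna".
bologna=$(printf '%s\n' "$data" | awk '$3=="bologna" {print}')
echo "$bologna"

# Range pattern: from the #START line through the #FINISH line, inclusive.
range=$(printf '%s\n' "$data" | awk '/#START/,/#FINISH/ {print}')
echo "$range"
```

The /blue/ pattern matches the last line even though "blue" is in its third column, because a bare RegExp is tested against the whole record.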

User-Defined Variables

Awk Variables

Awk variables are created when they are first used - no declaration necessary. They are either floating-point numbers or strings, with the type based on context.

gawk '{n=5; print $n}'

Will print the 5th column of each record
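A small sketch of how context decides the type. The variable names (count, label, v) are made up, and plain awk is used in place of gawk so it runs anywhere.

```shell
# n is created on first use; $n then means "field number n".
third=$(echo "a b c d e" | awk '{n=3; print $n}')
echo "$third"

# A variable that was never assigned acts as 0 in numeric context
# and as the empty string in string context.
unset_demo=$(echo "x" | awk '{print count + 1, "[" label "]"}')
echo "$unset_demo"

# The same value is used as a number or a string depending on context:
# v + 5 is arithmetic, v "5" is string concatenation.
mixed=$(echo "x" | awk '{v="10"; print v + 5, v "5"}')
echo "$mixed"
```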

Built-in Awk Variables

There are a few Awk built-ins which may be useful to us:

FS - the input field separator - a single space by default, which Awk treats specially: fields are split on any run of spaces or tabs

Change this to \t if your file is tab-separated

RS - the input record separator - newline by default

NR - number of input records seen so far

NF - number of fields in the current input record

All of the built-ins can be found in the gawk manpage.
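The sketch below applies FS, NR and NF to the drinks table from earlier in the lecture. It assumes the columns are tab-separated, which the slides do not state, and uses plain awk so it runs without gawk installed.

```shell
# The drinks table, written with tab-separated fields (an assumption).
drinks=$(printf 'screwdriver\tvodka\torange juice\nhighball\twhiskey\tginger ale\nbulldog\tgin\tgrapefruit juice\n')

# Default FS splits on whitespace, so "orange juice" becomes two fields.
default_nf=$(printf '%s\n' "$drinks" | awk 'NR==1 {print NF}')
echo "$default_nf"

# With FS set to a tab, each record has exactly three fields.
# The BEGIN action runs before any input is read.
tab_nf=$(printf '%s\n' "$drinks" | awk 'BEGIN {FS="\t"} NR==1 {print NF}')
echo "$tab_nf"

# NR numbers the records; $3 is now the whole mixer name.
listing=$(printf '%s\n' "$drinks" | awk 'BEGIN {FS="\t"} {print NR ": " $3}')
echo "$listing"
```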

BEGIN and END

What if we want to take an action only one time, at the beginning or end of execution?

Special Patterns

BEGIN and END are special patterns which aren’t tested against the input. The action parts of all BEGIN patterns are executed before any input is read, and the action parts of END patterns are executed after all input has been exhausted.

Example:

gawk '{sum+=$2} END {print sum}'

Sums column 2 as input is read, prints the total at the end

gawk '{sum+=$3} END {print (sum/NR)}'

Prints the average of column 3
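A runnable sketch of both special patterns on invented data. The NR > 0 guard is an addition not on the slide: it avoids a division by zero when the input is empty. Plain awk is used so the sketch runs without gawk.

```shell
# Hypothetical scores, invented for this sketch: name, id, score.
scores='ann 1 80
bob 2 90
cal 3 100'

# Sum column 3 as records are read; print the average once at the end.
average=$(printf '%s\n' "$scores" | awk '{sum+=$3} END {if (NR > 0) print sum/NR}')
echo "$average"

# BEGIN runs before any input, so it is a natural place for a header.
report=$(printf '%s\n' "$scores" | awk 'BEGIN {print "name score"} {print $1, $3}')
echo "$report"
```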

Interval Expressions in Awk

Note:

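In order to use interval expressions in your RegExps (the things within curly braces, like {3,}) you must specify either the --posix or --re-interval option when you run Awk. The asterisk, question mark, and plus sign operators all work fine by default. (This applies to the gawk releases current when this lecture was written; since gawk 4.0, interval expressions are enabled by default.)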
