learning sed and awk

Learning sed and awk Presented by Yogesh Sawant January 2008

Upload: yogesh-sawant

Post on 16-Jul-2015


Page 1: Learning sed and awk

Learning sed and awk

Presented by Yogesh Sawant

January 2008

Page 2: Learning sed and awk

Course Goal

• This course provides you with the knowledge and skills to:
– Develop sed and awk scripts
– Use sed and awk to automate common tasks
– Use sed and awk to create formatted reports

Page 3: Learning sed and awk

Course Map

• A primer on sed and awk
– Conceptual understanding of sed and awk
– Similarities of sed and awk
– Working of sed and awk
– How to invoke sed and awk
• UNIX regular expressions
– Overview of UNIX regular expressions
– Metacharacters
• Delving deeper into sed
– Syntax of sed commands
– Commonly used sed commands
• Delving deeper into awk
– Programming model of awk
– Variables
– Operators
– Conditionals
– Loops
– Arrays
– Functions

Page 4: Learning sed and awk

A primer on sed and awk

• Objectives
– Learn what sed and awk are
– Learn the similarities of sed and awk
– Understand how sed and awk work
– Learn how to use sed and awk

Page 5: Learning sed and awk

What is sed

• sed is a non-interactive, stream-oriented UNIX utility
• sed is used to parse text files and to apply textual transformations to a sequential stream of data
• sed reads the input data stream line by line, applies the operations that have been specified, and then outputs the modified data
– $ sed 's/needle/magnet/g' haystack > new_haystack
• sed is often used as a filter in a pipeline
• sed has its origins in ed, the original UNIX line editor
• The basic difference between sed and ed is that sed is stream-oriented, whereas ed is not
• ed is an interactive editor, whereas sed is not
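The filter role mentioned above can be sketched as follows. This is only an illustration: the sample text and the variable name result are invented here, and the substitution is the one from the slide.

```shell
# Feed two sample lines to sed on standard input; sed edits each line
# in turn and writes the result to standard output.
result=$(printf 'a needle in the hay\nneedle after needle\n' |
    sed 's/needle/magnet/g')
printf '%s\n' "$result"
```

Because sed reads standard input when no file is named, the same instruction works unchanged at any point in a pipeline.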

Page 6: Learning sed and awk

When should you use sed

• To automate editing actions to be performed on one or more files
• To simplify the task of performing the same edits on multiple files
• To write data conversion programs

Page 7: Learning sed and awk

What is AWK

• AWK can be described as a pattern-matching programming language
• AWK is designed for processing text-based data, either in files or in data streams
• A typical example of an awk program is one that transforms data into a formatted report
• You can trace the lineage of awk to sed and grep, and through these two programs to ed, the original UNIX line editor

Page 8: Learning sed and awk

What does awk offer me?

• View a text file as a textual database made up of records and fields
• Use variables to manipulate this database
• Use arithmetic and string operations
• Use programming constructs such as loops and conditionals
• Generate formatted reports
• Define functions
• Execute UNIX commands from a script
• Process the result of UNIX commands

Page 9: Learning sed and awk

Similarities of sed and awk

• They are invoked using similar syntax
– $ sed 'instructions' /foo/bar
– $ awk 'instructions' /foo/bar
• They are both stream-oriented, reading input from text files one line at a time and directing the result to standard output
• They use regular expressions for pattern matching
• They allow the user to specify instructions in a script

Page 10: Learning sed and awk

How sed and awk work

• Read one line at a time from the input file
• Make a copy of the input line
• Execute the given instructions on the copy of the input line
• Output the modified line

[Figure: Input stream, Input line, instructions, Output stream]

Page 11: Learning sed and awk

Instructions to sed and awk

• Each instruction has two parts: a pattern and a procedure
• The pattern is a regular expression delimited with forward slashes (/)
• The procedure specifies one or more actions to be performed
• In sed, the procedure consists of editing commands like those used in the line editor
• In awk, the procedure consists of programming statements and functions
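A small sketch of the pattern-plus-procedure idea, using the same pattern with both tools. The sample text and variable names are invented for illustration. NF is a standard awk built-in (the number of fields on the line) that the slides do not cover.

```shell
# Sample input (invented for this sketch).
text='One Ring to rule them all
In the Land of Mordor
where the Shadows lie'

# sed: pattern /Mordor/, procedure d (delete the matching line).
sed_out=$(printf '%s\n' "$text" | sed '/Mordor/d')

# awk: pattern /Mordor/, procedure { print $NF } (print the last field).
awk_out=$(printf '%s\n' "$text" | awk '/Mordor/ { print $NF }')

printf '%s\n' "$sed_out" "$awk_out"
```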

Page 12: Learning sed and awk

Invoking sed

• Specifying instructions on the command line
– sed [-e] 'instructions' /foo/bar
– sed 's/us/them/' ring_file.text
– Enclosing single quotes prevent the shell from interpreting special characters
– sed 's/us/them/' ring_file > new_file
– The -e option is necessary only when you specify more than one instruction
• The -e option tells sed to interpret the next argument as an instruction
• sed -e 's/us/them/' -e 's/ring/stone/' ring_file.text
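As a hedged illustration of several -e instructions, the sketch below applies the two substitutions from the slide to one invented input line. The instructions run in the order given, each on the result of the previous one.

```shell
# Two instructions, each introduced by -e, applied in order to every line.
result=$(printf 'give us the ring\n' |
    sed -e 's/us/them/' -e 's/ring/stone/')
printf '%s\n' "$result"
```

Each s command here replaces only the first match on the line, so with other input the second instruction could hit an earlier "ring" (for example inside "bring").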

Page 13: Learning sed and awk

Invoking sed

• Using a script file
– Editing instructions can be placed in a file
– sed -f scriptfile /foo/bar
– Editing instructions in the file are executed in the order in which they appear

    $ cat subs_file
    s/us/them/
    s/ring/stone/
    $ sed -f subs_file ring_file.text

– Comments can be added to the sed script with the number sign (#)

Page 14: Learning sed and awk

Invoking sed

• Suppressing automatic display of input lines
– -n, --quiet, --silent
– By default, sed writes each input line to output after processing it. This option prevents that.
– $ sed -n '/Mordor/ p' ring_file
• In-place editing of files
– -i, --in-place
– GNU sed can replace the original file with the result of applying the sed program
– $ sed -i 's/Ring/Stone/' ring_file
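The effect of -n is easiest to see side by side. The sketch below uses invented sample text and runs the same p instruction with and without the option.

```shell
text='Three Rings for the Elven-kings
One for the Dark Lord
In the Land of Mordor'

# Without -n, the matching line appears twice: once from p and once
# from the automatic output of sed.
with_default=$(printf '%s\n' "$text" | sed '/Mordor/p')

# With -n, only the line printed explicitly by p appears.
with_n=$(printf '%s\n' "$text" | sed -n '/Mordor/p')

printf '%s\n' "$with_default" '---' "$with_n"
```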

Page 15: Learning sed and awk

Invoking awk

• Specifying instructions on the command line
– awk 'instructions' /foo/bar
– Enclosing single quotes prevent the shell from interpreting special characters
– $ awk '/ya/' /etc/passwd
– If a procedure is not specified in the instruction, the default action is to print the line
– $ awk -F : '/ya/ { print $5 }' /etc/passwd
• -F, --field-separator
– This option lets you change the field delimiter
– The default field delimiter is one or more spaces and/or tabs
– The procedure should be enclosed within braces ({})

Page 16: Learning sed and awk

Invoking awk

• Specifying instructions on the command line
– Multiple instructions can be given, separated by semicolons
– $ awk -F : '/ya/ { print $5 ; print $6; print $7 }' /etc/passwd
– awk interprets each input line as a record, and each word on that line as a field
– $0 represents the entire input line
– $1, $2, $3 represent individual fields on the input line
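A minimal sketch of field access with -F, run on an invented record in /etc/passwd format so that it does not depend on the local system. NF is a standard awk built-in (the number of fields) that is not covered in the slides.

```shell
# A made-up record in /etc/passwd format.
record='frodo:x:1001:1001:Frodo Baggins:/home/frodo:/bin/bash'

# -F : makes the colon the field separator; $1 and $5 are then the
# login name and the full name, and NF is the number of fields.
result=$(printf '%s\n' "$record" |
    awk -F : '{ print $1; print $5; print NF }')
printf '%s\n' "$result"
```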

Page 17: Learning sed and awk

Invoking awk

• Using a script file
– Instructions can be placed in a file
• -f scriptfile
• --file=scriptfile
– This option instructs the awk utility to get the script from the specified file

    awk -f scriptfile /foo/bar

    $ cat awkscr_file
    /ya/ {
        print $5
        print $6
        print $7
    }
    $ awk -F : -f awkscr_file /etc/passwd

Page 18: Learning sed and awk

Invoking awk

• Assigning a value to a variable
– -v var=value
– --assign=var=value
• This option assigns a value to a variable before the script is executed. This happens even before the BEGIN procedure is run.
• The -v option and its assignment must precede all the file name arguments, as well as the program text

    $ cat awkscr_option-v
    {
        if (match($0, user))
            print user, "exists"
    }
    $ awk -v user=yogeshs -f awkscr_option-v /etc/passwd
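The point that -v takes effect before BEGIN can be checked with a short sketch. The input records and the user name sam are invented for illustration.

```shell
# -v sets the awk variable user before any input is read, so it is
# already available in the BEGIN procedure.
result=$(printf 'frodo:x:1001\nsam:x:1002\n' |
    awk -v user=sam -F : '
        BEGIN { print "looking for", user }
        $1 == user { print user, "exists" }')
printf '%s\n' "$result"
```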

Page 19: Learning sed and awk

UNIX Regular Expressions

• Objective
– Learn what regular expressions are
– Learn to use regular expressions in the UNIX environment

Page 20: Learning sed and awk

UNIX Regular Expressions

• An expression is something that cannot be interpreted literally
• An expression is something that needs to be evaluated
• An expression describes a result
• A regular expression is a string that is used to describe or match a set of strings, according to certain syntax rules

Page 21: Learning sed and awk

Metacharacters

.        Matches any single character except newline
*        Matches any number (including zero) of occurrences of the single character that immediately precedes it
[…]      Matches any one of the characters enclosed between the brackets
         A circumflex (^) as the first character inside the brackets reverses the match, so that any character not listed is matched
         A hyphen (-) is used to indicate a range of characters
^        As the first character of a regular expression, matches the beginning of the line
$        As the last character of a regular expression, matches the end of the line
\{n\}    Matches exactly n occurrences of the single character that immediately precedes it
\{n,\}   Matches at least n occurrences
\{n,m\}  Matches any number of occurrences between n and m
\        Escapes the special character that follows
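A few of these metacharacters can be tried with grep, which uses the same basic regular expressions. The word list and variable names below are invented for this sketch.

```shell
words='ring
rings
bring
rng
riiing'

# ^ and $ anchor the match; . stands for exactly one character.
exact=$(printf '%s\n' "$words" | grep '^r.ng$')

# \{2,\} requires at least two of the preceding character (i).
repeated=$(printf '%s\n' "$words" | grep 'ri\{2,\}ng')

# ^[^b] matches lines whose first character is anything but b;
# -c prints the number of matching lines instead of the lines.
not_b=$(printf '%s\n' "$words" | grep -c '^[^b]')

printf '%s\n' "$exact" "$repeated" "$not_b"
```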

Page 22: Learning sed and awk

Extended Metacharacters (egrep and awk)

+    Matches one or more occurrences of the preceding regular expression
?    Matches zero or one occurrence of the preceding regular expression
|    Specifies that either the preceding or the following regular expression can be matched (alternation)
     $ egrep 'an|the' a_case_of_identity
()   Groups regular expressions
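The extended metacharacters can be sketched with awk patterns. The word list is invented for illustration.

```shell
words='rng
ring
riing
stone
stones'

# i+ needs at least one i, so "rng" is excluded.
plus=$(printf '%s\n' "$words" | awk '/^ri+ng$/')

# (ring|stone) groups an alternation; s? makes a final s optional.
alt=$(printf '%s\n' "$words" | awk '/^(ring|stone)s?$/')

printf '%s\n' "$plus" '---' "$alt"
```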

Page 23: Learning sed and awk

Delving deeper into sed

• Objective
– Understand how sed commands work
– Learn commonly used sed commands

Page 24: Learning sed and awk

sed commands

• The sed command set consists of 25 commands
• An address is optional with any command
– [address] command
– An address can be a pattern described as a regular expression
• $ sed '/Dark/ d' ring_file
– An address can be specified as a line number
• $ sed '3 d' ring_file
• $ sed '5,10 d' ring_file
• $ sed '$ d' ring_file
– Appending the ! character to the end of an address negates the sense of the match
• $ sed '/Dark/! d' ring_file
• Multiple commands can be placed on the same line, separated by semicolons (;)
– $ sed 's/Mortal/Immortal/; s/Men/Gods/' ring_file
• Commands can be grouped at the same address by surrounding the list of commands in braces
– $ sed '2,10 {s/Mortal/Immortal/; s/Men/Gods/;}' ring_file
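The address forms above can be compared on a small invented input. This is a sketch; the variable names are made up.

```shell
text='line 1
line 2 Dark
line 3
line 4
line 5'

# Line-number range: delete lines 2 through 4.
by_number=$(printf '%s\n' "$text" | sed '2,4d')

# $ addresses the last line of input.
without_last=$(printf '%s\n' "$text" | sed '$d')

# ! negates the address: delete every line that does NOT contain Dark.
only_dark=$(printf '%s\n' "$text" | sed '/Dark/!d')

printf '%s\n' "$by_number" '---' "$without_last" '---' "$only_dark"
```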

Page 25: Learning sed and awk

sed commands – substitution (s)

• [address] s/pattern/replacement/flags
– The regular expression can be delimited with any character other than newline or backslash
• s#/usr/mail#/usr2/mail#
– If an address is given, the substitute command is applied only to lines matching it
• $ sed '3,5 s/One/None/' ring_file
• $ sed '/Dark/ s/One/None/' ring_file
– In the replacement section, the following characters have special meaning:
• & Replaced by the string matched by the regular expression
– $ sed 's/sky/blue &/' ring_file
• \n Replaced by the nth substring matched in the pattern, as delimited by \( and \)
– $ sed 's/\(Dark\) \(Lord\)/very \1 and sluggardly \2/' ring_file
• \ Used to escape the ampersand (&), the backslash (\), and the delimiter when they are used literally in the replacement section
– $ sed 's/\/usr\/mail/\/usr2\/mail/' mail_user
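A short sketch of the special replacement characters, on invented input. It shows &, the numbered groups \1 and \2, and an alternative delimiter.

```shell
line='the Dark Lord on his dark throne'

# & stands for the whole matched text.
amp=$(printf '%s\n' "$line" | sed 's/Lord/[&]/')

# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(printf '%s\n' "$line" | sed 's/\(Dark\) \(Lord\)/\2 \1/')

# A different delimiter avoids escaping the slashes in a path.
path=$(printf '/usr/mail/frodo\n' | sed 's#/usr/mail#/usr2/mail#')

printf '%s\n' "$amp" "$swapped" "$path"
```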

Page 26: Learning sed and awk

sed commands – substitution (s)

– Flags that modify the substitution are:
• n A number (1 to 512) indicating that a replacement should be made for only the nth occurrence of the pattern
– $ sed 's/Ring/Stone/2' ring_file
• g Make changes globally on all occurrences in the pattern space
– $ sed 's/Ring/Stone/g' ring_file
• p Print the contents of the pattern space if a substitution was made
– $ sed -n 's/Ring/Stone/p' ring_file
• w file Write the contents of the pattern space to file if a substitution was made
– $ sed 's/Ring/Stone/w stonefile' ring_file
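The numeric, g and p flags can be compared on one invented line. This is a sketch only; the w flag is left out because it writes a file.

```shell
line='Ring Ring Ring'

# A number replaces only that occurrence.
second=$(printf '%s\n' "$line" | sed 's/Ring/Stone/2')

# g replaces every occurrence on the line.
all=$(printf '%s\n' "$line" | sed 's/Ring/Stone/g')

# With -n, the p flag prints only lines where a substitution happened.
changed=$(printf 'Ring\nSword\n' | sed -n 's/Ring/Stone/p')

printf '%s\n' "$second" "$all" "$changed"
```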

Page 27: Learning sed and awk

sed commands – delete (d)

• [address] d
• This command takes an address and deletes the contents of the pattern space if the line matches the address
• $ sed '/ring/ d' ring_file
• If the line matches the address, the entire line is deleted, not just the portion of the line that is matched

Page 28: Learning sed and awk

sed commands – append (a)

• [address] a\
– text
• This command places the given text after the line that is matched by address

    $ cat sedscr_append
    /find them/ a\
    wherever they are
    $ sed -f sedscr_append ring_file

• This command can be applied only to a single line, not a range of lines
• A backslash is required after a to escape the end-of-line
• Text to be appended must be placed on the next line
• To append multiple lines of text, all lines must end with a backslash, except the last line

Page 29: Learning sed and awk

sed commands – insert (i)

• [address] i\
– text
• This command places the given text before the line that is matched by address

    $ cat sedscr_insert
    /find them/ i\
    And what the rings do?
    $ sed -f sedscr_insert ring_file

• This command can be applied only to a single line, not a range of lines
• A backslash is required after i to escape the end-of-line
• Text to be inserted must be placed on the next line
• To insert multiple lines of text, all lines must end with a backslash, except the last line

Page 30: Learning sed and awk

sed commands – change (c)

• [address] c\
– text
• This command replaces the line selected by address with the given text

    $ cat sedscr_insert
    /bind them/ c\
    One Ring to bring Frodo Baggins, and put an end to all the rings
    $ sed -f sedscr_insert ring_file

• A backslash is required after c to escape the end-of-line
• Replacement text must be placed on the next line
• To provide multiple lines as replacement text, all lines must end with a backslash, except the last line
• When a range of lines is specified as address, all lines as a group are replaced by a single copy of text

Page 31: Learning sed and awk

sed commands – transform (y)

• [address] y/source-chars/dest-chars/
• This command transliterates each character in the pattern space that matches one of the source-chars into the corresponding character in dest-chars
• $ sed 'y/DL/dl/' ring_file
• The replacement is made by character position, so the command has no notion of a word
• This command affects the entire contents of the pattern space

Page 32: Learning sed and awk

sed commands – print (p)

• [address] p
• This command causes the contents of the pattern space to be output
• $ sed -n '/Mordor/ p' ring_file
• It is useful when default output is suppressed using the -n option of sed
• The = command prints the line number
• $ sed -n '/Mordor/ =' ring_file

Page 33: Learning sed and awk

sed commands – write (w)

• [address] w file
• This command appends the contents of the pattern space to the given file
• $ sed '/Mordor/ w places' ring_file
• Exactly one space must be present between w and file
• This command will create the file if it does not exist
• If the file exists, its contents will be overwritten each time the script is executed

Page 34: Learning sed and awk

sed commands – next (n)

• [address] n
• This command outputs the contents of the pattern space and then reads the next line of input
• In effect, this command causes the next line of input to replace the current line in the pattern space. Subsequent commands in the sed script are applied to the replacement line, not the current line.

    /Men/{
    n
    s/Dark/White/
    }

• Matches any line containing the word Men and substitutes the word Dark with White on the next line
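The script above can be run on a small invented input to see that only the line directly after the /Men/ match is edited. This is a sketch; the sample text is made up.

```shell
text='Nine for Mortal Men
Dark is the first following line
Dark is the second following line'

# On the /Men/ line, n prints it and loads the next line, and the
# substitution is then applied to that next line only.
result=$(printf '%s\n' "$text" | sed '/Men/{
n
s/Dark/White/
}')
printf '%s\n' "$result"
```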

Page 35: Learning sed and awk

sed commands – quit (q)

• [line-address] q
• This command causes sed to stop reading new input lines (and stop sending them to output)
• This command can take only a single-line address. Once the address is reached, the script will be terminated.
• $ sed '/Mordor/ q' ring_file
• $ sed '100q' ring_file

Page 36: Learning sed and awk

Delving deeper into awk

• Objective
– Understand the programming model of awk
– Learn the variables of awk
– Learn the operators of awk
– Learn the conditionals and loops of awk
– Learn the arrays of awk
– Learn the functions of awk

Page 37: Learning sed and awk

Programming model of AWK

• The essential organization of an awk script is of the form:
– pattern { action }
– An action is one or more statements that will be performed on those input lines that match the pattern
• $ awk '/ring/ { print }' ring_file
– If no pattern is specified, the action is performed for every input line
• $ awk '{ print }' ring_file
– If no action is specified, the default action is to print the line
• $ awk '/ring/' ring_file

Page 38: Learning sed and awk

Programming model of AWK

• A pattern can be any of the following:
– /regular expression/
• $ awk '/[Ll]ord|king/ {print}' ring_file
– BEGIN
• Specifies an action to be taken before any lines are read
• $ awk 'BEGIN {print "howdy, folks"} //' ring_file
– END
• Specifies an action to be taken after the last line is read
• $ awk 'BEGIN {lc=0} // {lc++} END {print lc}' ring_file
– relational expression
• $ awk 'BEGIN {i=1} i<4 {print; i++}' ring_file
– pattern,pattern
• $ awk '/Dark/,/Shadow/ {print}' ring_file
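Two of these pattern types can be sketched on an invented five-line input: BEGIN and END around a line counter, and a range pattern.

```shell
text='one
Dark start
middle
Shadow end
five'

# BEGIN runs before any input is read, END after the last line.
count=$(printf '%s\n' "$text" |
    awk 'BEGIN { lc = 0 } { lc++ } END { print lc }')

# A range pattern selects from a /Dark/ line through the next /Shadow/ line.
range=$(printf '%s\n' "$text" | awk '/Dark/,/Shadow/ { print }')

printf '%s\n' "$count" "$range"
```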

Page 39: Learning sed and awk

Variables in awk

• There are three kinds of variables in awk:
– user-defined
– built-in
– fields
• A variable need not be declared or initialized
• A variable can contain a string or numeric value
• An uninitialized variable has the empty string as its string value and zero as its numeric value

Page 40: Learning sed and awk

Variables in awk

• User-defined variables
– The name of a variable must be a sequence of letters, digits, and underscores, and it may not begin with a digit

    $ cat awkscr_var-user-defined
    BEGIN { FS=":" }
    { if ($7 == "/bin/bash") bash_users++ }
    END { print bash_users + 0, "users have bash as their default shell" }
    $ awk -f awkscr_var-user-defined /etc/passwd

Page 41: Learning sed and awk

Variables in awk

• Built-in variables
• There are two types of built-in variables in awk:
– Variables whose values can be changed
• FS defines the field separator
• OFS defines the output field separator
• RS defines the record separator
• ORS defines the output record separator
– Variables that can be used, and whose values are internally updated by awk
• FILENAME contains the name of the current input file
– The names of all built-in variables are entirely uppercase
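A minimal sketch of FS and OFS working together, on invented colon-separated records: FS controls how input is split, and OFS is what print puts between comma-separated items.

```shell
# Split input on colons, join output fields with " -> ".
result=$(printf 'frodo:x:1001\nsam:x:1002\n' |
    awk 'BEGIN { FS = ":"; OFS = " -> " } { print $1, $3 }')
printf '%s\n' "$result"
```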

Page 42: Learning sed and awk

Variables in awk

• Field variables
• awk considers each input line as a record, and each word as a field
– $1, $2, $3 etc. refer to individual fields in the input record
• $ awk 'BEGIN {FS=":"} {print $1, "is known as", $5}' /etc/passwd
– $0 refers to the entire input record
• $ awk 'BEGIN {FS=":"} {if ($7 == "/sbin/nologin") print $0}' /etc/passwd

Page 43: Learning sed and awk

Operators of awk

• Arithmetic operators
+    Addition
-    Subtraction
*    Multiplication
/    Division
%    Modulo
^    Exponentiation
**   Exponentiation
• Assignment operators
++   Add 1 to variable
--   Subtract 1 from variable
+=   Assign result of addition
-=   Assign result of subtraction
*=   Assign result of multiplication
/=   Assign result of division
%=   Assign result of modulo
^=   Assign result of exponentiation
**=  Assign result of exponentiation

Page 44: Learning sed and awk

Operators of awk

• Relational operators
<    Less than
>    Greater than
<=   Less than or equal to
>=   Greater than or equal to
==   Equal to
!=   Not equal to
~    Matches
!~   Does not match
• Boolean operators
– Boolean operators allow you to combine a series of comparisons
||   Logical OR
&&   Logical AND
!    Logical NOT

Page 45: Learning sed and awk

Conditionals in awk

• A conditional statement allows you to make a test before performing an action

    if (expression)
        action1
    [else
        action2]

• An expression might contain arithmetic, relational, or Boolean operators
• If an action consists of more than one statement, it is enclosed within a pair of braces

    $ cat awkscr_if
    {
        print $1
        if (index($7, "bash"))
            print " uses bash"
        else
            print " does not use bash"
    }
    $ awk -F : -f awkscr_if /etc/passwd

Page 46: Learning sed and awk

Conditionals in awk

• awk provides the conditional operator found in the C programming language
• expr ? action1 : action2

    $ cat awkscr_ternary
    {
        print $1, (index($7, "bash") ? "uses bash" : "does not use bash")
    }
    $ awk -F : -f awkscr_ternary /etc/passwd

Page 47: Learning sed and awk

Looping in awk

• Loops can be specified using the while, do, or for statement
• While loop

    while (condition)
        action

• If the conditional expression is never true, the action is not performed
• An action consisting of more than one statement must be enclosed in braces

    $ cat awkscr_while
    BEGIN {
        i = 1
        while (i <= 10) {
            print num, "*", i, "=", num * i
            i++
        }
    }
    $ awk -v num=100 -f awkscr_while

Page 48: Learning sed and awk

Looping in awk

• Do loop

    do
        action
    while (condition)

• The action is performed at least once
• An action consisting of more than one statement must be enclosed in braces

    $ cat awkscr_do-while
    BEGIN {
        i = 1
        do {
            print num, "*", i, "=", num * i
            i++
        } while (i <= 10)
    }
    $ awk -v num=5 -f awkscr_do-while

Page 49: Learning sed and awk

Looping in awk

• For loop

    for (initialization; condition; increment)
        action

• This loop starts by executing initialization
• As long as condition is true, it repeatedly executes action, and then increment
• An action consisting of more than one statement must be enclosed in braces

    $ cat awkscr_for
    BEGIN {
        for (i = 1; i <= 10; i++)
            print num, "*", i, "=", num * i
    }
    $ awk -v num=3 -f awkscr_for

Page 50: Learning sed and awk

Looping in awk

• Statements that affect the flow control of a loop are:
– break
• Breaks out of the loop so that no more iterations of the loop are performed
– continue
• Stops the current iteration and starts a new iteration at the top of the loop
• Statements that affect the main input loop of awk are:
– next
• Causes the next line of input to be read and execution to resume at the top of the main loop
– exit
• Exits the main loop and passes control to the END block, if there is one

Page 51: Learning sed and awk

Arrays in awk

• An array is a variable that can be used to store a set of values
• In awk, you don't have to declare the size of the array
– users[1] = "root"
• Individual elements are accessed by their index in the array
– print users[3]
• Whether an element exists in an array at a certain index can be determined as:

    if (index in array)
        print "subscript index is present"

• To remove an individual element of an array, use the delete statement:
– delete array[index]
• You cannot have a variable and an array with the same name in the same awk program

Page 52: Learning sed and awk

Associative arrays in awk

• The index of an associative array can be a string or a number
• There is a special looping syntax for accessing all elements of an associative array

    for (index in array)
        print index, array[index]

• All arrays in awk are associative arrays

    $ cat awkscr_array-associative
    BEGIN { FS=":" }
    { shells[$7]++ }
    END {
        for (s in shells)
            print shells[s], "users have", s, "as their default shell"
    }
    $ awk -f awkscr_array-associative /etc/passwd
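The shell-counting script can be tried on invented sample records instead of the real /etc/passwd. One assumption worth stating: awk does not guarantee the order in which for (s in shells) visits the elements, so this sketch pipes the output through sort to make it stable.

```shell
# Made-up records in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
frodo:x:1001:1001:Frodo:/home/frodo:/bin/bash'

# Count records per login shell, using the shell path as the array index.
result=$(printf '%s\n' "$passwd_sample" |
    awk -F : '{ shells[$7]++ }
              END { for (s in shells) print s, shells[s] }' |
    sort)
printf '%s\n' "$result"
```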

Page 53: Learning sed and awk

Functions in awk

• A function is a self-contained computation that accepts a number of arguments and returns some value
• Arithmetic functions
– int(x) Returns the truncated value of x
• $ awk 'BEGIN {print int(57.43)}'
– sqrt(x) Returns the square root of x
• $ awk 'BEGIN {print sqrt(25)}'

Page 54: Learning sed and awk

Functions in awk

• String functions
– length(s) Returns the length of string s, or the length of $0 if no string is supplied
• $ awk '{ if (length($0)) print }' ring_file
– index(s,t) Returns the position of substring t in string s, or zero if not present
• $ awk -F : '{ if (index($0, "bash")) print }' /etc/passwd
– tolower(s) Translates all uppercase characters in string s to lowercase and returns the new string
• $ awk -F : '{ print tolower($5) }' /etc/passwd
– toupper(s) Translates all lowercase characters in string s to uppercase and returns the new string
• $ awk -F : '{ print toupper($5) }' /etc/passwd
– sprintf("format", expr) Formats expr according to the printf format specification and returns the resulting string
• $ awk -F : '{ name = sprintf("%s is known as %s", $1, $5); print name }' /etc/passwd

Page 55: Learning sed and awk

Functions in awk

• String functions
– match(s,r) Returns the position in s where regular expression r begins, or 0 if no occurrences are found
• $ awk -F : '{ if (match($0, "bash")) print $1 " uses " $7 }' /etc/passwd
– sub(r,s,t) Substitutes the first occurrence of regular expression r with s in string t
• If t is not supplied, defaults to $0
• $ awk '{ sub("Dark", "Blue"); print }' ring_file
– gsub(r,s,t) Substitutes all occurrences of regular expression r with s in string t
• If t is not supplied, defaults to $0
• $ awk '{ gsub(":", " "); print }' /etc/passwd
– split(s,a,sep) Parses string s into elements of array a using field separator sep
• $ awk '{ split($0, all, ":"); print all[1], "is known as", all[5] }' /etc/passwd
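The difference between sub and gsub, and the return values of gsub and split, can be sketched on invented input. gsub returns the number of substitutions made and split returns the number of elements created.

```shell
line='Dark Lord on his Dark throne'

# sub changes only the first match.
first=$(printf '%s\n' "$line" | awk '{ sub("Dark", "Blue"); print }')

# gsub changes every match and returns how many it changed.
every=$(printf '%s\n' "$line" |
    awk '{ n = gsub("Dark", "Blue"); print n, $0 }')

# split fills array f and returns the number of elements.
parts=$(printf 'frodo:x:1001\n' |
    awk '{ n = split($0, f, ":"); print n, f[1], f[3] }')

printf '%s\n' "$first" "$every" "$parts"
```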

Page 56: Learning sed and awk

Writing functions with awk

• A function is a program component that can be reused

    function name(parameter-list) {
        statements
    }

• A valid function name is a sequence of letters, digits, and underscores that doesn't start with a digit
• The parameter-list is a comma-separated list of the function's arguments and local variable names
– The argument names are used to hold the argument values passed to the function
– The local variables are initialized to the empty string
– A function cannot have a parameter with the same name as the function itself
• Whitespace characters (spaces and tabs) are not allowed between the function name and the open parenthesis of the argument list
• Passing variables as parameters to a function is a case of pass by value
• When an array is passed as a parameter to a function, it is a case of call by reference in awk
• If a return statement is not written, the function returns an unpredictable value
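The parameter rules above can be illustrated with a small sketch. The function name fill and the variable names are hypothetical. The extra parameter i is not passed by the caller and so acts as a local variable, and split("", squares) is used here only to make squares an empty array before it is passed.

```shell
result=$(awk '
    # arr is passed by reference, n by value; i is a local variable.
    function fill(arr, n,    i) {
        for (i = 1; i <= n; i++)
            arr[i] = i * i
        n = 0          # changes only the local copy of n
    }
    BEGIN {
        split("", squares)     # start with an empty array
        count = 3
        fill(squares, count)
        print count, squares[1], squares[2], squares[3]
    }')
printf '%s\n' "$result"
```

The caller still sees count as 3 after the call, while the elements stored in squares inside the function remain visible to the caller.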

Page 57: Learning sed and awk

Epilogue

• Part of solving a problem is knowing which tool to use
• Using awk for simple problems such as 'printing fourth column from a file' is wasting a mighty tool on trivial problems
• Use sed and/or cut for such simple problems
• When you need to work in a context-oriented way, use awk
• Context-oriented means problems like "get all the numbers in a file and total them" or "get the content of a certain line and apply changes to the lines following it, depending on this content"

Page 58: Learning sed and awk

Why not to use awk when sed can do the job?

• Using awk instead of sed comes at a price in performance and size
• awk takes substantially longer to load than sed or ed, and does its job at a considerably slower pace
• The real distinguishing point between sed and awk as text processors is that awk is able to work with a persistent context, whereas the capabilities of sed in this area range from limited to non-existent. If, for instance, you had to sum one field to a total, you would do it with awk (it would be possible with sed, but it would be a nightmare: a poorly suited tool for the job)
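The field-summing case mentioned above takes one line of awk. This is a sketch on invented data; the variable total persists across input lines, which is the persistent context the slide refers to.

```shell
# Add up the second field of every line, then print the total at the end.
result=$(printf 'apples 3\npears 4\nplums 5\n' |
    awk '{ total += $2 } END { print total }')
printf '%s\n' "$result"
```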

Page 59: Learning sed and awk

How to Learn More

• Books
– sed & awk, by Dale Dougherty and Arnold Robbins, O'Reilly and Associates Inc.
– Mastering Regular Expressions, by Jeffrey E. F. Friedl, O'Reilly and Associates Inc.
– Effective awk Programming, by Arnold Robbins, O'Reilly and Associates Inc.

Page 60: Learning sed and awk

How to Learn More

• Internet
– sed
• The sed tutorial from Grymoire
– http://www.grymoire.com/Unix/Sed.html
• Handy one-line sed scripts
– http://sed.sourceforge.net/sed1line.txt
• sed Tutorial
– http://www.gnulamp.com/sed.html
– AWK
• The awk tutorial from Grymoire
– http://www.grymoire.com/Unix/Awk.html
• awk Tutorial
– http://www.grymoire.com/Unix/Awk.html
• The GNU Awk User's Guide
– http://www.gnu.org/software/gawk/manual/html_node/index.html