CIS 218 Advanced UNIX CIS 218 – Advanced UNIX (g)awk

Upload: eunice-hood

Post on 13-Jan-2016



CIS 218 – Advanced UNIX

(g)awk


Overview

• awk is a programming language

• awk uses syntax based on grep and sed for handling numbers and text

• awk provides field-level addressability, and addressability within a field (word) through the substring functions

• awk reads its input record by record (line by line) and splits each record into fields


awk command syntax

• There are two ways to execute an awk program/script:
– awk [-F field-separator] 'program' target-file
– awk [-F field-separator] -f program.file target-file

• From our discussion of sed, and Refrigerator Rule No. 5, I would hope you are firmly committed to the second form!
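
As a minimal sketch of the two forms, assuming a hypothetical colon-separated file users.txt and a hypothetical program file names.awk (neither name comes from the slides):

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
printf 'root:0\nalice:1000\n' > "$dir/users.txt"   # hypothetical target file
printf '{ print $1 }\n' > "$dir/names.awk"         # hypothetical program file

# Form 1: program text given inline on the command line
inline_form() { awk -F: '{ print $1 }' "$dir/users.txt"; }

# Form 2: program text read from a file with -f
file_form() { awk -F: -f "$dir/names.awk" "$dir/users.txt"; }

inline_form
file_form
```

Both calls should print the same two names; the second form keeps the program in a file where it can be commented and reused.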


awk Variables

• There are a number of awk variables that are very useful:
– FS (the field separator, defaults to white space)
– OFS (output field separator, can be critical)
– NR (number of records read so far, a sequential counter)
– NF (number of fields in the current record)
– FILENAME (name of the current target file)


awk Variables (cont.)

– $0 (the entire line as read from the target file)
– $n (where n is the nth field in the record; this is how we get field-level addressability in awk)

• nawk, gawk, etc. give us more variables; the two most significant are:
– ARGC (the count of the command line arguments)
– ARGV (an array of the command line arguments)
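
The variables above can be seen together in a small sketch. The sample records and the words "one" and "two" are made up for illustration:

```shell
vars_demo() {
  printf 'alice:30\nbob:25\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " }    # split on colons, join output with " | "
       { print NR, NF, $1, $2 }           # record number, field count, then the fields
       END { print "records read: " NR }'
}

args_demo() {
  # No target file is read because the whole program is a BEGIN action.
  awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' one two
}

vars_demo
args_demo
```

ARGV[0] holds the name of the awk command itself, which is why the loop in args_demo starts at 1.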


Parts of a program

• All programs are composed of one or more of the following three constructs:
– sequence (a series of instructions, one following the next, executed sequentially)
– selection (the ability of the code to decide which instructions to execute; conditional execution)
– iteration (looping, so that selected code is repeated over and over)


awk Program Format

• awk programs are composed of pattern {action} pairs (actions must be enclosed in French braces { })
– a pattern without a corresponding action takes the default action, print $0
– an action without a corresponding pattern is applied to every line
– each input line is submitted to every pattern/action pair
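
A minimal sketch of the three cases, using made-up sample lines:

```shell
pairs_demo() {
  printf 'apple 3\nbanana 7\n' |
  awk '$2 > 5                      # pattern only: default action prints $0
       { print "seen:", $1 }       # action only: runs for every line
       /^a/ { print "a-word" }     # pattern with an action'
}
pairs_demo
```

Each input line is offered to all three pairs in order, so "apple 3" produces two lines of output and "banana 7" produces two more.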


awk Program Format (cont.)

• Placement of the open French brace is critical
– With the brace on the same line as the pattern, both actions are executed for lines matching the pattern:
    pattern {
        action 1
        action 2
    }
– With the brace on the line after the pattern, lines matching the pattern are printed, and both actions are performed on every line!
    pattern
    {
        action 1
        action 2
    }
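
The difference is easy to check with two tiny programs that differ only in where the brace sits (the input lines "a" and "b" are made up):

```shell
same_line() {
  printf 'a\nb\n' | awk '/a/ {
      print "hit"
  }'
}

next_line() {
  printf 'a\nb\n' | awk '/a/
  {
      print "hit"
  }'
}

same_line     # prints "hit" once, for the matching line only
next_line     # prints "a" (default action), then "hit" for every line
```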


Patterns

• In an awk program, the pattern is the selection tool that decides what actions are applied to which lines.

• Patterns can be:
– relational expressions
– regular expressions
– magic patterns


Relational Expression patterns

Symbol  Meaning                     Symbol  Meaning
<       less than                   ==      equal to
<=      less than or equal to       ~       contains the RE
>       greater than                !~      doesn't contain RE
>=      greater than or equal to    &&      logical and
!=      not equal to                ||      logical or
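
A short sketch combining several of these operators in patterns, with invented names and ages as input:

```shell
relational_demo() {
  printf 'alice 30\nbob 25\ncarol 41\n' |
  awk '$2 >= 30 && $1 !~ /^c/      { print "A:", $1 }
       $2 < 30 || $1 == "carol"    { print "B:", $1 }'
}
relational_demo
```

Only alice satisfies the first pattern; bob and carol each satisfy one side of the || in the second.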


Regular Expression patterns

• Must be enclosed in slashes: /RE/
• Anchors apply to the entire line if they are used as the only pattern
• Remember, you can use regular expressions in relational patterns with ~ and !~ to apply them to fields
• Both true regular expressions and fixed patterns can be used as REs in awk
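
To illustrate how an anchor behaves on the whole line versus on a single field, here is a small sketch with made-up input:

```shell
regex_demo() {
  printf 'tom cat\ncat tom\n' |
  awk '/^cat/      { print "line starts with cat:", $0 }
       $2 ~ /^cat/ { print "field 2 starts with cat:", $0 }'
}
regex_demo
```

The bare /^cat/ is anchored to the start of the line, while $2 ~ /^cat/ anchors the same expression to the start of the second field.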


Pre/Post Processing

• There are two magic patterns in awk:
– BEGIN {the associated action is performed before the target file is opened}
– END {the associated action is performed after the target file is successfully closed}
• Both are coded in UPPER CASE
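
A minimal sketch of pre- and post-processing around a running total (the numbers are invented):

```shell
begin_end_demo() {
  printf '10\n20\n30\n' |
  awk 'BEGIN { print "start" }           # runs once, before any input
       { total += $1 }                   # runs for every line
       END { print "total:", total }     # runs once, after the last line'
}
begin_end_demo
```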


# comments

• As in most scripting languages, # indicates a comment

• awk scripts should be well documented

• Comments should explain what you are doing and why.


print

• The print command is the simple output tool for awk, basically an "echo".

• You can direct print to send its data to a file with the > operator

• Generally print is used for simple output or debugging output
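
As a sketch of print with the > operator, the following writes the first field of each line to a scratch file. The variable name out and the sample data are assumptions made for the example:

```shell
print_demo() {
  out=$(mktemp)                      # scratch file standing in for a real output file
  printf 'alice 30\nbob 25\n' |
  awk -v out="$out" '{ print $1 > out }'
  cat "$out"
  rm -f "$out"
}
print_demo
```

Within one run, awk opens the file the first time > is used and keeps writing to it, so both names end up in the file rather than the second overwriting the first.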


printf

• Similar in concept to the "C" language command. The format of a printf command is:

printf ("formatting string", variables)

• The formatting characters correspond to the variables one for one, in the same order.

• Each formatting character is prefixed by %


printf (cont.)

• The formatting specifiers contain the following characters:
– - indicates that the data should be left justified
– n indicates the minimum width of the field
– .n indicates the maximum width of a string field (for floating point numbers, the number of digits after the decimal point)

"%-5s" indicates a string field, left justified, with a minimum width of 5 bytes
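
The effect of these characters is easiest to see with brackets around each field. This is only a sketch with made-up values:

```shell
width_demo() {
  awk 'BEGIN {
      printf("[%-5s]\n", "ab")        # left justified, padded to 5
      printf("[%5s]\n", "ab")         # right justified is the default
      printf("[%.3s]\n", "abcdef")    # at most 3 characters of the string
      printf("[%7.2f]\n", 3.14159)    # width 7, 2 digits after the point
  }'
}
width_demo
```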


printf formatting characters

Format  Meaning                   Format  Meaning
%c      single ASCII character    %G      shortest of %E or %f
%d      decimal integer           %i      decimal integer
%e      scientific notation       %o      octal number
%E      SCIENTIFIC NOTATION       %s      string
%f      floating point            %x      hexadecimal (lc)
%g      shortest of %f or %e      %X      HEXADECIMAL
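
A brief sketch exercising a few of these formatting characters on arbitrary sample values:

```shell
conv_demo() {
  awk 'BEGIN {
      printf("%d %o %x %X\n", 255, 255, 255, 255)   # same number in four bases
      printf("%c %s\n", "A", "text")                # one character, then a string
      printf("%e\n", 1234.5)                        # scientific notation
  }'
}
conv_demo
```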


printf spacing characters

• There are two characters available to change the spacing of your text:
– \n inserts a newline character. You must use this if you want your output to occur on successive lines.
– \t inserts a tab character
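
A small sketch showing that printf adds no newline of its own (the letters are placeholders):

```shell
spacing_demo() {
  awk 'BEGIN {
      printf("a")            # no \n, so the next output stays on this line
      printf("b\n")
      printf("c\td\n")       # \t puts a tab between c and d
  }'
}
spacing_demo
```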


getline

• getline is used to read from the keyboard
• It can also capture the results of a command, but this form is seldom used
• Read from the keyboard using
getline variable < "/dev/tty"
• If you don’t supply a variable, awk will use $0, so in most cases you want to use a variable.
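
Both forms can be sketched in one function. So that it can run unattended, the source to read from is passed in as an argument; pass /dev/tty to read from the keyboard. The function and variable names are assumptions made for the example:

```shell
getline_demo() {
  # "$1" names the file to read from; use /dev/tty for the keyboard
  awk -v src="$1" 'BEGIN {
      if ((getline answer < src) > 0)      # read one line into a variable
          print "you typed: " answer
      "echo hello" | getline greeting      # capture the output of a command
      print "command said: " greeting
  }'
}
# interactive use: getline_demo /dev/tty
```

Because a variable is supplied in both calls, $0 is left alone.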


rand() srand()

• The rand() function generates pseudo-random numbers in the range 0 to 1 (it can return 0 but never 1).

• Given the same seed, it will always generate the same series of numbers.

• srand() is used to supply a new seed to rand().

• If you don’t supply srand() a value, it uses the current time as the seed.
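
A minimal sketch of seeding, with an arbitrary seed of 42, plus one common way to turn rand() into a die roll:

```shell
rand_demo() {
  awk 'BEGIN {
      srand(42); first = rand()
      srand(42); second = rand()          # same seed, so same first number
      print (first == second ? "same" : "different")
      roll = int(rand() * 6) + 1          # whole number from 1 to 6
      print (roll >= 1 && roll <= 6 ? "in range" : "out of range")
  }'
}
rand_demo
```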


system()

• The system() function allows you to execute system commands within an awk script.

• You must enclose the system command in quotation marks.

• You cannot capture the output from the system() function within the script but you can capture the return code.
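
As a sketch, the return code can be saved in a variable and tested; the commands echo and false are just convenient stand-ins:

```shell
system_demo() {
  awk 'BEGIN {
      rc = system("echo hello from the shell")   # output goes straight to the terminal
      print "return code:", rc
      rc = system("false")                       # a command that always fails
      print (rc != 0 ? "false failed" : "false succeeded")
  }'
}
system_demo
```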


length()

• The length([argument]) function returns the length of the argument in bytes.

• If you give length() a number, it will return the number of digits in the number.

• If you don’t give length() an argument, it will use $0 by default.
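
The three cases in one line, as a sketch with made-up input:

```shell
length_demo() {
  echo 'hello world' |
  awk '{ print length("awk"), length(12345), length() }'   # string, number, $0
}
length_demo
```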


index()

• The index(string,target) function returns the position of the first occurrence of the target within the string.

• The index() function is often used to set the boundary for the substr() function.
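
A quick sketch with an arbitrary string; note that index() returns 0 when the target is not found:

```shell
index_demo() {
  awk 'BEGIN { print index("banana", "na"), index("banana", "x") }'
}
index_demo
```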


substr()

• The substr(string,start[,length]) function will return the part of the string beginning with start and continuing for length bytes.

• If you don’t give it a length, it will return all the bytes between the start and the end of the string.
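
A sketch of substr() with and without a length, using index() to find the boundary in a made-up name=value string:

```shell
substr_demo() {
  awk 'BEGIN {
      s = "name=value"
      eq = index(s, "=")              # position of the separator, 5 here
      print substr(s, 1, eq - 1)      # from position 1 for eq-1 characters
      print substr(s, eq + 1)         # no length: everything to the end
  }'
}
substr_demo
```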


split()

• You will use split(string, array[, separator]) to divide a string into parts using separator to parse them, storing the resultant parts in the array.

• If you don’t code a separator, the function will use the field separator to parse the string.
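
A sketch of both forms; the date string and the array names d and w are invented for the example. split() also returns the number of parts it stored:

```shell
split_demo() {
  echo 'x y z' |
  awk '{
      n = split("2016-01-13", d, "-")     # explicit separator
      print n, d[1], d[2], d[3]
      m = split($0, w)                    # no separator: uses FS
      print m, w[3]
  }'
}
split_demo
```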


if

• Besides using patterns, if gives us another way to perform selection
• The format of an if statement is

if (condition)
    {verb(s)}
[else
    {verb(s)}]

• If you have more than one verb, they must be enclosed in French braces.


if conditions

A < B      A is less than B
A <= B     A is less than or equal to B
A == B     A equals B (note 2 =)
A > B      A is greater than B
A >= B     A is greater than or equal to B
A != B     A is not equal to B
A ~ /RE/   A contains the regular expression RE


if

• A sample if
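
The sample itself is not in the transcript. One possible sketch, with invented names, scores and a pass mark of 60, is:

```shell
if_demo() {
  printf 'alice 91\nbob 58\n' |
  awk '{
      if ($2 >= 60) {
          print $1, "passes"
      } else {                      # two verbs, so the braces are required
          print $1, "fails"
          failures++
      }
  }
  END { print "failures:", failures + 0 }'
}
if_demo
```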


exit

• The input file is closed

• Control is transferred to the action associated with the END magic pattern if there is one

• Generally used as a bailout in case of catastrophic errors
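
A sketch of exit as a bailout, assuming input that should contain only whole numbers:

```shell
exit_demo() {
  printf '1\n2\nbad\n4\n' |
  awk '$1 !~ /^[0-9]+$/ { print "bad input on line " NR; exit }
       { sum += $1 }
       END { print "sum so far:", sum }'
}
exit_demo
```

The fourth line is never read, yet the END action still runs and reports the total of the first two lines.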


for loop

• This is a counted loop

• executes until the counter reaches the target value

• Increment (count up) or decrement (count down)

• also works with the elements of an array

• multiple verbs must be enclosed in { }


for loop example
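
The slide's example is not in the transcript. A possible sketch of both forms, a counted loop that counts down and a loop over the elements of an array (sample values invented), is:

```shell
for_demo() {
  echo 'a b c' |
  awk '{
      for (i = NF; i >= 1; i--)                  # count down through the fields
          printf("%s%s", $i, (i > 1 ? " " : "\n"))
      n = split("4 5 6", nums)
      for (k in nums)                            # array form: visits every index
          total += nums[k]
      print total
  }'
}
for_demo
```

The array form visits the indexes in no guaranteed order, which is harmless when summing.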


while loop

• The while loop is an example of conditional execution

• The loop cycles as long as the condition specified is true

• A while loop always checks its condition before each pass, so the body may not execute at all

• multiple verbs must be enclosed in { }


while loop example
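
The slide's example is not in the transcript. A possible sketch, halving an arbitrary starting value until it is no longer above 10, is:

```shell
while_demo() {
  awk 'BEGIN {
      n = 100
      while (n > 10) {        # tested before every pass
          n = int(n / 2)
          steps++
      }
      print n, steps
  }'
}
while_demo
```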


do/while

• Even though it has a while in it, this is an example of until logic.

• Until logic is shunned by conscientious coders.

• ‘nuff said


break

• Used to exit from a loop

• Control is passed to the line following the end of the loop

• Causes an exit from the loop but NOT the awk script. If you want to bail out of the whole script, use the exit command.


break example
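
The slide's example is not in the transcript. A possible sketch, adding up fields until the first negative one (values invented), is:

```shell
break_demo() {
  echo '3 8 -1 5' |
  awk '{
      for (i = 1; i <= NF; i++) {
          if ($i < 0) break                  # leave the loop at the first negative field
          sum += $i
      }
      print "sum before negative:", sum      # the script carries on after the loop
  }'
}
break_demo
```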


continue

• Causes awk to skip the rest of the body of the loop for the current value

• In a for loop the counter is incremented, and the next cycle of the loop is started

• In a while loop, the next iteration of the loop starts


continue example
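
The slide's example is not in the transcript. A possible sketch on the same invented input as a break loop might use, this time skipping the negative field instead of stopping at it, is:

```shell
continue_demo() {
  echo '3 8 -1 5' |
  awk '{
      for (i = 1; i <= NF; i++) {
          if ($i < 0) continue               # skip this field, move on to the next
          sum += $i
      }
      print "sum of non-negatives:", sum
  }'
}
continue_demo
```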


next

• Causes the script to start over from its first pattern/action pair

• Takes the next record from standard input or the target file

• Like exit, this command affects the whole script


next example
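
The slide's example is not in the transcript. A possible sketch, skipping comment lines and blank lines in made-up input, is:

```shell
next_demo() {
  printf '# header\nalice 30\n\nbob 25\n' |
  awk '/^#/ || NF == 0 { next }      # skip comments and blank lines
       { print NR ": " $1 }'
}
next_demo
```

The skipped lines still count toward NR, so the surviving records keep their original line numbers.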