perl. perl notes 2 perl perl - practical extraction report language –for text files –system...

Perl

Upload: aubrie-merritt

Post on 12-Jan-2016


Perl


Perl notes 2

Perl

• Perl - Practical Extraction and Report Language
  – for text files
  – system management
  – combines C, SED, AWK, SH
  – interpreted
  – dynamic


Perl notes 3

Data Structures

• scalars: $num
• arrays: @num
• associative arrays (hashes): %num

• $num[50] – the element at index 50 of the array @num (the 51st element, since indices start at 0)

• $#num – the last index of @num
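A minimal sketch (not part of the original notes) of the three data types and the indexing rules above; the variable names and values are made up for illustration. An element of an array or hash is a scalar, so it is written with the `$` sigil.

```perl
use strict;
use warnings;

my $count = 42;                          # scalar: a single value
my @num   = map { $_ * 10 } 0 .. 99;     # array: 100 elements, indices 0..99
my %age   = (fred => 30, barney => 28);  # associative array (hash)

my $elem = $num[50];      # index 50, i.e. the 51st element: 500
my $last = $#num;         # last index: 99
my $size = scalar @num;   # number of elements: 100, always $#num + 1

print "count=$count elem=$elem last=$last size=$size fred=$age{fred}\n";
```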


Perl notes 4

Examples

#!/usr/local/bin/perl -w
# find the sum of a list of numbers from STDIN
# one number per line
$sum = 0;
while( <STDIN> ) {
    $sum += int $_;
}
print "the sum is $sum\n";


Perl notes 5

Examples

#!/usr/bin/perl -w
# find the sum of a list of numbers from STDIN
# several numbers per line
$sum = 0;
while( <STDIN> ) {
    @nums = split;
    foreach (@nums) {
        $sum += int $_;
    }
}
print "the sum is $sum\n";


Perl notes 6

Average

#!/usr/bin/perl -w
# find the average of a list of
# numbers from STDIN
# several numbers per line
$sum = 0;
$count = 0;
while( <STDIN> ) {
    @nums = split;
    foreach (@nums) {
        $sum += int $_;
        $count++;
    }
}
die "no numbers read\n" unless $count;    # avoid dividing by zero
print "the average is ", $sum/$count, "\n";


Perl notes 7

median

#!/usr/bin/perl -w
# find the median of a list of numbers
# from STDIN
# several numbers per line
@nums = ();
while( <STDIN> ) {
    @nums = (@nums, split );
}
@nums = sort { $a <=> $b } @nums;   # numeric sort; a plain sort compares strings
if($#nums % 2) {                   # odd last index: even number of elements
    $median = ($nums[($#nums - 1)/2] + $nums[($#nums + 1)/2])/2;
}
else {
    $median = $nums[$#nums/2];
}
print "the median is $median\n";
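The same median logic wrapped in a subroutine so it can be checked without STDIN. This is a sketch; the name `median` and the choice to return `undef` for an empty list are assumptions. The last test shows why the numeric comparison matters: a plain `sort` puts 10 before 2.

```perl
use strict;
use warnings;

# Median of a list of numbers, using the index arithmetic from the slide.
sub median {
    my @s = sort { $a <=> $b } @_;   # numeric, not string, order
    return undef unless @s;          # no median for an empty list
    if ($#s % 2) {                   # odd last index: even element count
        return ($s[($#s - 1) / 2] + $s[($#s + 1) / 2]) / 2;
    }
    return $s[$#s / 2];
}

print median(10, 9, 2, 1), "\n";
```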


Perl notes 8

Output?

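#!/usr/bin/perl -w
@stuff = ("one", "two", "three");
print @stuff, "\n";
$stuff = ("one", "two", "three");
print $stuff, "\n";
$stuff = @stuff;
print $stuff, "\n";

onetwothree
three
3

• a parenthesized list in scalar context is the comma operator, which returns its last value: three
• an array in scalar context returns its number of elements: 3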


Perl notes 9

Pattern Matching

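m//   match

s///  substitute

Modifiers
• i case-insensitive
• m multiple lines (^ and $ also match at embedded newlines)
• s single line (. also matches a newline)
• x extended (white space in the pattern is ignored and # starts a comment)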
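A small sketch (not from the notes) showing the effect of each modifier on a two-line string; the text is invented for illustration.

```perl
use strict;
use warnings;

my $text = "First line\nsecond LINE\n";

# i: case-insensitive
my $has_line = ($text =~ /second line/i) ? 1 : 0;     # 1

# m: ^ and $ also match at embedded newlines
my @starts = $text =~ /^(\w+)/mg;                     # ("First", "second")

# s: . also matches a newline
my $spans = ($text =~ /line.second/s) ? 1 : 0;        # 1
my $no_s  = ($text =~ /line.second/)  ? 1 : 0;        # 0: . stops at "\n"

# x: white space is ignored and # starts a comment inside the pattern
my ($word) = $text =~ / (\w+)      # a word
                        \s+ LINE   # followed by upper-case LINE
                      /x;          # "second"

# s/// with modifiers: every occurrence (g), ignoring case (i)
(my $copy = $text) =~ s/line/row/gi;                  # "First row\nsecond row\n"
```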


Perl notes 10

Regular Expressions

Code     Meaning
\w       Word character (letter, digit, or underscore)
\W       Non-word character
\s       White space
\S       Non-white space
\d       Digit
\D       Non-digit
\b       Word boundary
\B       Non-word boundary
\A  ^    At the beginning of a string
\Z  $    At the end of a string (or before a final newline)
.        Any single character except newline


Perl notes 11

Regular Expressions

*                    Zero or more occurrences
?                    Zero or one occurrence
+                    One or more occurrences
{N}                  Exactly N occurrences
{N,M}                Between N and M occurrences
.*<thingy>           Greedy match, up to the last thingy
.*?<thingy>          Non-greedy match, up to the first thingy
[set_of_things]      Match any item in the set
[^set_of_things]     Match anything not in the set
(some_expression)    Tag (capture) an expression
$1..$N               Tagged expressions, used in substitutions


Perl notes 12

Rules

• Rule 1 – The engine tries to match as far left as it can.

• Rule 2 – The regular expression is regarded as a set of alternatives, tried left to right. (see page 61)

• Rule 3 – Items that have choices match from left to right.

/x*y*/

• Rule 4 – Assertions
  – ^ $ \b \B \A \Z \G (?=…) (?!…)
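A sketch (not from the notes) of Rule 1 beating greediness with the `/x*y*/` example. Because `x*y*` can match the empty string, against "aaxxyy" it succeeds at position 0 and never reaches the x's; `$-[0]` holds the offset where the match started.

```perl
use strict;
use warnings;

my $s = "aaxxyy";

$s =~ /(x*y*)/;
my ($start, $matched) = ($-[0], $1);     # 0, "": empty match at the far left

# Requiring at least one x moves the leftmost match to the first x,
# and Rule 3 (choices tried left to right, greedily) takes all of "xxyy".
$s =~ /(x+y*)/;
my ($start2, $matched2) = ($-[0], $1);   # 2, "xxyy"
```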


Perl notes 13

Rules

• Rule 5 – A quantified atom matches only if the atom itself matches some number of times allowed by the quantifier.

Maximal   Minimal
{n,m}     {n,m}?    At least n, at most m
{n,}      {n,}?     At least n
{n}       {n}?      Exactly n
*         *?        0 or more
+         +?        1 or more
?         ??        0 or 1
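A sketch (not from the notes) comparing maximal and minimal quantifiers on the same input. The last line shows Rule 5 at work: a minimal quantifier still takes more when the rest of the pattern needs it.

```perl
use strict;
use warnings;

my $s = "aaaaa";
my ($max)  = $s =~ /(a{2,4})/;    # "aaaa":  maximal takes as many as allowed
my ($min)  = $s =~ /(a{2,4}?)/;   # "aa":    minimal takes as few as allowed
my ($star) = $s =~ /(a*)/;        # "aaaaa"
my ($lazy) = $s =~ /(a*?)/;       # "":      zero repetitions already succeed
my ($plus) = $s =~ /(a+?)/;       # "a"

# Zero a's leaves the "a" facing the b, so a?? backtracks and takes one.
my ($need) = "ab" =~ /(a??)b/;    # "a"
```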


Perl notes 14

Rules

• Rule 6 – Each atom matches according to its type
  – (…) grouping, with storage in $1, $2, …
  – . matches any char except \n
  – […] character classes
  – special characters \a \n \r …
  – \1 \2 … backreference to (…)
  – \033 octal char
  – \xf7 hex char
  – \cD control char
  – any other backslashed non-alphanumeric char matches that char itself
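A sketch (not from the notes) of several atom types from Rule 6: a backreference inside the pattern, and octal, hex, and control-character escapes. The sample strings are invented.

```perl
use strict;
use warnings;

# \1 refers back to the text that group 1 matched: finds a doubled word.
my ($dup) = "this is is a typo" =~ /\b(\w+)\s+\1\b/;   # "is"

my $esc_ok  = ("\e[0m" =~ /\033\[/) ? 1 : 0;    # 1: \033 is ESC
my $hex_ok  = ("caf\xe9" =~ /\xe9$/) ? 1 : 0;    # 1: \xe9 by hex code
my $ctrl_ok = ("\x04" =~ /\cD/) ? 1 : 0;         # 1: Ctrl-D is 0x04

# A backslash before a non-alphanumeric char matches that char literally.
my $dot_ok  = ("1.5" =~ /^\d\.\d$/) ? 1 : 0;     # 1
```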


Perl notes 15

precedence

• () (?: )
• Repetition
• Sequence
• | alternation

Pattern                      Strings
/ab*c/                       abc, ac, ababd, abbbc
/abc*/                       a, ab, abc, abccc, abcabc
/(abc)*/                     abc, abcc, empty string, abcabc
/ed|jo/                      ed, jo, edo, ejo
/(ed)|(jo)/                  ed, jo, edo, ejo
/ed|jo{1,3}/                 ed, jo, edo, ejo, joo, jooooo
/ed|jo{1,3}?/                ed, jo, edo, ejo, joo, jooooo
/^ed|jo$/                    fred and joe, ed jo, fred jo, jo
/^(ed|jo)$/                  fred and joe, ed jo, fred jo, jo
$pat = 'bob'; /$pat{3}/      pat, bob, bobbobbob, bobbb, patt
$pat = 'bob'; /($pat){3}/    pat, bob, bobbobbob, bobbb, patt
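One way to check the exercise above is to run every pattern against its strings. This is a sketch, not part of the notes; it writes the interpolated pattern as `${pat}{3}` so that `{3}` is unambiguously a quantifier (perldata documents the braces form for exactly this ambiguity).

```perl
use strict;
use warnings;

my $pat = 'bob';
my @table = (
    [ '/ab*c/',        qr/ab*c/,        'abc', 'ac', 'ababd', 'abbbc' ],
    [ '/abc*/',        qr/abc*/,        'a', 'ab', 'abc', 'abccc', 'abcabc' ],
    [ '/(abc)*/',      qr/(abc)*/,      'abc', 'abcc', '', 'abcabc' ],
    [ '/ed|jo/',       qr/ed|jo/,       'ed', 'jo', 'edo', 'ejo' ],
    [ '/(ed)|(jo)/',   qr/(ed)|(jo)/,   'ed', 'jo', 'edo', 'ejo' ],
    [ '/ed|jo{1,3}/',  qr/ed|jo{1,3}/,  'ed', 'jo', 'edo', 'ejo', 'joo', 'jooooo' ],
    [ '/ed|jo{1,3}?/', qr/ed|jo{1,3}?/, 'ed', 'jo', 'edo', 'ejo', 'joo', 'jooooo' ],
    [ '/^ed|jo$/',     qr/^ed|jo$/,     'fred and joe', 'ed jo', 'fred jo', 'jo' ],
    [ '/^(ed|jo)$/',   qr/^(ed|jo)$/,   'fred and joe', 'ed jo', 'fred jo', 'jo' ],
    [ '/$pat{3}/',     qr/${pat}{3}/,   'pat', 'bob', 'bobbobbob', 'bobbb', 'patt' ],
    [ '/($pat){3}/',   qr/($pat){3}/,   'pat', 'bob', 'bobbobbob', 'bobbb', 'patt' ],
);

my @results;    # one comma-separated list of matching strings per row
for my $row (@table) {
    my ($label, $re, @strings) = @$row;
    my @hits = grep { $_ =~ $re } @strings;
    push @results, join(',', @hits);
    printf "%-14s %s\n", $label, join(' ', map { "'$_'" } @hits);
}
```

Note how the anchors bind: in `/^ed|jo$/` alternation has the lowest precedence, so it means "starts with ed, or ends with jo".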


Perl notes 16

Pattern     String
/\w+/       Greetings, planet earth!
/\w*/       Greetings, planet earth!
/n[et]*/    Greetings, planet earth!
/n[et]+/    Greetings, planet earth!
/G.*t/      Greetings, planet earth!
/('.*')/    this 'test' isn't good

• How do you fix it?

/('[^']*')/
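A sketch (not from the notes) that prints what each pattern in the table actually matches, including the greedy quote problem and two fixes: the negated class from the slide and, as an alternative, a non-greedy `.*?`.

```perl
use strict;
use warnings;

my $g = "Greetings, planet earth!";
my ($w_plus) = $g =~ /(\w+)/;      # "Greetings"
my ($w_star) = $g =~ /(\w*)/;      # "Greetings": leftmost first, then greedy
my ($n_star) = $g =~ /(n[et]*)/;   # "n": the n in "Greetings" comes first
my ($n_plus) = $g =~ /(n[et]+)/;   # "net" from "planet"
my ($g_to_t) = $g =~ /(G.*t)/;     # "Greetings, planet eart": up to the LAST t

my $q = "this 'test' isn't good";
my ($greedy) = $q =~ /('.*')/;     # "'test' isn'": .* runs to the last quote
my ($fixed)  = $q =~ /('[^']*')/;  # "'test'": no quote allowed inside
my ($lazy)   = $q =~ /('.*?')/;    # "'test'": non-greedy alternative

print join("\n", $w_plus, $w_star, $n_star, $n_plus, $g_to_t, $greedy, $fixed), "\n";
```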


Perl notes 17

Examples

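s/^([^ ]+) +([^ ]+)/$2 $1/      # swap the first two words

/(\w+)\s*=\s*\1/                # same word on both sides of =

/.{40,}/                        # at least 40 characters

/^(\d+\.?\d*|\.\d+)$/           # an unsigned decimal number

if (/Time: (..):(..):(..)/) {
    $hours = $1;
    $minutes = $2;
    $seconds = $3;
}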
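A sketch (not from the notes) that runs each example pattern on sample input; the input strings are invented for illustration.

```perl
use strict;
use warnings;

(my $swapped = "hello big world") =~ s/^([^ ]+) +([^ ]+)/$2 $1/;  # "big hello world"

my ($same) = "x = x + 1" =~ /(\w+)\s*=\s*\1/;                     # "x"

my $long = (('-' x 45) =~ /.{40,}/) ? 1 : 0;                       # 1

# Accepts 42, 3.14, "5." and ".5"; rejects "." and "1.2.3".
my @ok = grep { /^(\d+\.?\d*|\.\d+)$/ } qw(42 3.14 5. .5 . 1.2.3);

my ($hours, $minutes, $seconds);
if ("Time: 12:34:56" =~ /Time: (..):(..):(..)/) {
    ($hours, $minutes, $seconds) = ($1, $2, $3);
}
```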


Perl notes 18

Default arguments

• $_, @_, @ARGV, STDIN

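sub foo {
    my $x = shift;    # shift uses @_ by default inside a sub
}

• in the main program, shift uses @ARGV by default

while (defined($_ = shift)) {    # defined() so an argument "0" does not end the loop
    if (/^-(.*)/) {
        process_option($1);
    } else {
        process_file($_);
    }
}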
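A runnable sketch of the argument loop above. `process_option` and `process_file` are the slide's handler names; here they are hypothetical stand-ins that just collect their arguments, and `@ARGV` is filled in by hand to simulate a command line.

```perl
use strict;
use warnings;

my (@options, @files);
sub process_option { push @options, shift }   # shift takes from @_
sub process_file   { push @files,   shift }

@ARGV = ('-v', 'notes.txt', '-o', 'more.txt');   # simulated command line
while (defined($_ = shift)) {                   # shift takes from @ARGV here
    if (/^-(.*)/) {
        process_option($1);
    } else {
        process_file($_);
    }
}
print "options: @options; files: @files\n";
```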


Perl notes 19

Reading a stream

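open FIN, "myfile" or die "can't open myfile: $!";

while (<FIN>) {      # reads one line at a time
    # do something with $_
}

foreach (<FIN>) {    # reads every line into a list first
    # do something with $_
}

print sort <FIN>;    # prints all lines, sorted

(These are alternatives: each one reads FIN to end-of-file.)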
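A sketch using the three-argument `open` with a lexical filehandle, which avoids the global name FIN. To stay self-contained, the "file" here is an in-memory string opened through a scalar reference (supported since Perl 5.8); with a real file you would pass its name instead of `\$data`. The handle is reopened before the second read because the first loop leaves it at end-of-file.

```perl
use strict;
use warnings;

my $data = "pear\napple\nfig\n";

open my $fh, '<', \$data or die "can't open: $!";
my $count = 0;
while (my $line = <$fh>) {   # scalar context: one line per iteration
    $count++;
}
close $fh;

open $fh, '<', \$data or die "can't open: $!";
my @sorted = sort <$fh>;     # list context: all remaining lines at once
close $fh;

print @sorted;
```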


Perl notes 20

Reading a stream

# print a window
@f = <FIN>;
foreach ( 0..$#f ) {
    if ($f[$_] =~ /\bShazam\b/) {
        $lo = ($_ > 0) ? $_ - 1 : $_;
        $hi = ($_ < $#f) ? $_ + 1 : $_;
        print map { "$_: $f[$_]" } $lo .. $hi;
    }
}
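The window logic as a subroutine over an array of lines, so it can be tested without a file. The name `window` and the extra `$word` parameter (quoted with `\Q...\E`) are assumptions for this sketch; the slide hard-codes Shazam.

```perl
use strict;
use warnings;

# Return "index: line" for each line containing $word, plus one line of
# context on either side, clipped at the ends of the array.
sub window {
    my ($word, @f) = @_;
    my @out;
    for my $i (0 .. $#f) {
        next unless $f[$i] =~ /\b\Q$word\E\b/;
        my $lo = $i > 0   ? $i - 1 : $i;
        my $hi = $i < $#f ? $i + 1 : $i;
        push @out, map { "$_: $f[$_]" } $lo .. $hi;
    }
    return @out;
}

my @lines = ("alpha\n", "Shazam!\n", "gamma\n", "delta\n");
print window('Shazam', @lines);
```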


Perl notes 21

Sorting

• sort numerically

sub numerically { $a <=> $b }
@list = sort numerically (16, 1, 8, 2, 4, 32);

or

@list = sort { $a <=> $b } (16, 1, 8, 2, 4, 32);

@list = sort { uc($a) cmp uc($b) } qw(this is a test);

# reverse
@list = sort { $b <=> $a } (16, 1, 8, 2, 4, 32);
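A sketch (not from the notes) showing the result of each sort, plus the default string sort for contrast.

```perl
use strict;
use warnings;

sub numerically { $a <=> $b }

my @asc  = sort numerically 16, 1, 8, 2, 4, 32;            # 1 2 4 8 16 32
my @desc = sort { $b <=> $a } 16, 1, 8, 2, 4, 32;          # 32 16 8 4 2 1
my @ci   = sort { uc($a) cmp uc($b) } qw(this is a test);  # a is test this
my @str  = sort 16, 1, 8, 2, 4, 32;                        # 1 16 2 32 4 8 (string order)

print "@asc\n@desc\n@ci\n@str\n";
```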


Perl notes 22

example

#!/usr/bin/perl -w
# This script will count the frequency of distinct words
# in the file that is given as an argument.
# Warning: Error checking is minimal!
die "usage: $0 file\n" unless @ARGV;
while(<>){
    tr/A-Z/a-z/;                  # translate to lowercase
    @w = split(/\W+/, $_);        # split into words
    foreach (@w){
        $list{$_}++ if length;    # increment the counter (skip the empty
                                  # field split returns for leading punctuation)
    }
}
foreach $key (sort {$list{$b} <=> $list{$a}} keys %list) {
    print $key, ' = ', $list{$key}, "\n";
}
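The same counting logic as a subroutine over a list of lines. The name `word_counts` is hypothetical, and the alphabetical tie-break (`|| $a cmp $b`) is an addition so that words with equal counts print in a stable order.

```perl
use strict;
use warnings;

sub word_counts {
    my %list;
    for my $line (@_) {
        for my $w (split /\W+/, lc $line) {
            $list{$w}++ if length $w;   # skip empty leading fields
        }
    }
    return %list;
}

my %list = word_counts("The cat, the hat.\n", "-- the end\n");
for my $key (sort { $list{$b} <=> $list{$a} || $a cmp $b } keys %list) {
    print $key, ' = ', $list{$key}, "\n";   # the = 3, cat = 1, end = 1, hat = 1
}
```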


Perl notes 23

Tokenizing

# tokenize an arithmetic expression
while (length $_) {    # length, so a remaining "0" does not end the loop
    if (/^(\d+)/) {
        push @tok, 'num', $1;
    } elsif (/^([+\-\/*()])/) {
        push @tok, 'punct', $1;
    } elsif (/^([\d\D])/) {
        die "invalid char $1 in input";
    }
    $_ = substr($_, length $1);
}

• substr slows things down – every pass rebuilds the string to cut off its start


Perl notes 24

Tokenizing 2

while (/
        (\d+)          |
        ([+\-\/*()])   |
        ([\d\D])
       /gx) {
    if (defined $1) {           # unmatched groups are undef, not ""
        push @tok, 'num', $1;
    } elsif (defined $2) {
        push @tok, 'punct', $2;
    } else {
        die "invalid char $3 in input";
    }
}


Perl notes 25

Tokenizing 3

{
    if (/\G(\d+)/gc) {
        push @tok, 'num', $1;
    } elsif (/\G([+\-\/*()])/gc) {
        push @tok, 'punct', $1;
    } elsif (/\G([\d\D])/gc) {
        die "invalid char $1 in input";
    } else {
        last;
    }
    redo;
}
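A complete version of the `\G`/`gc` tokenizer, wrapped in a subroutine. The name `tokenize` is hypothetical, and skipping white space is an addition (the slides die on a space). With `/gc`, a failed match keeps `pos` where it was, so each branch tries again from the same spot; the bare block with `redo` and `last` works as a loop.

```perl
use strict;
use warnings;

sub tokenize {
    my ($expr) = @_;
    my @tok;
    for ($expr) {                  # alias $_ to $expr so m//gc applies to it
        {
            if (/\G\s+/gc) {
                # skip white space (not in the original slides)
            } elsif (/\G(\d+)/gc) {
                push @tok, 'num', $1;
            } elsif (/\G([+\-\/*()])/gc) {
                push @tok, 'punct', $1;
            } elsif (/\G(.)/gcs) {
                die "invalid char $1 in input\n";
            } else {
                last;              # nothing left to match
            }
            redo;
        }
    }
    return @tok;
}

print join(' ', tokenize("12 + (3*4)")), "\n";
```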


Perl notes 26

Use split for clarity

($a, $b, $c) = /^(\S+)\s+(\S+)\s+(\S+)/;

($a, $b, $c) = split /\s+/, $_;

($a, $b, $c) = split;    # same as split ' ', $_: also skips leading white space

Get the fifth field:

($a) = /[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;

or

($a) = /(?:[^:]*:){4}([^:]*)/;

or

($a) = (split /:/)[4];
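A sketch (not from the notes) of the split forms above on sample data. It avoids the names `$a` and `$b`, which Perl reserves for `sort` comparison blocks. The first two lines show the one case where `split /\s+/` and `split ' '` differ: leading white space.

```perl
use strict;
use warnings;

my $line = "  alpha beta gamma";
my @by_regex = split /\s+/, $line;   # ("", "alpha", "beta", "gamma")
my @by_space = split ' ', $line;     # ("alpha", "beta", "gamma")

my $rec = "root:x:0:0:Super User:/root:/bin/sh";
my ($f1) = $rec =~ /(?:[^:]*:){4}([^:]*)/;   # fifth field via regex
my ($f2) = (split /:/, $rec)[4];              # fifth field via split slice
```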


Perl notes 27

unpack

ps l

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND

100 1216 30562 30561 7 0 2804 1768 rt_sig S pts/2 0:00 -tcsh

000 1216 30658 30562 10 0 2780 1080 - R pts/2 0:00 ps l

chomp (@ps = `ps l`);
shift @ps;                 # drop the header line
for (@ps) {
    ($uid, $pid, $sz, $tt) =
        unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;
    print "$uid, $pid, $sz, $tt\n";
}
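The `unpack` offsets above depend on the exact column layout of the local `ps` output. This sketch builds its own fixed-width record with `sprintf` so the offsets are known; the field values are invented. In a template, `A8` takes 8 characters and strips trailing spaces, and `@8` jumps to absolute offset 8.

```perl
use strict;
use warnings;

# Name in columns 0-7, uid in 8-13, shell from 14 to the end.
my $rec = sprintf "%-8s%-6s%s", 'fred', '1216', '/bin/tcsh';

my ($name, $uid, $shell) = unpack 'A8 A6 A*', $rec;
my ($uid2) = unpack '@8 A6', $rec;   # same field, reached by absolute position

print "$name, $uid, $shell\n";
```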


Perl notes 28

Avoid regex for simple strings

do_it() if $answer eq 'yes';

do_it() if $answer =~ /^yes$/;      # also accepts "yes\n"

do_it() if $answer =~ /yes/;        # also matches "yesterday"

do_it() if lc($answer) eq 'yes';

do_it() if $answer =~ /^yes$/i;


Perl notes 29

#!/usr/bin/perl
# remove the comments from a C program
$filename = shift or die "usage $0 filename\n";
open FIN, $filename or die "can't open $filename: $!";
while (<FIN>){
    chomp;                  # the newline is printed explicitly below
    for(split m!("(?:\\\W|.)*?"|/\*|\*/)!){
        if($in_comment){
            $in_comment = 0 if $_ eq "*/";
        } else {
            if ($_ eq "/*") {
                $in_comment = 1;
                print " ";
            } else {
                print;
            }
        }
    }
    print "\n";
}
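The comment stripper rewritten as a subroutine that takes lines and returns the cleaned text, so it can be tested without a file. The name `strip_comments` is hypothetical. Because the split pattern has a capturing group, `split` returns the separators (string literals, `/*`, `*/`) as fields too, which is what lets the loop recognize them; string literals are matched first, so a `/*` inside quotes is left alone.

```perl
use strict;
use warnings;

sub strip_comments {
    my $in_comment = 0;          # persists across lines
    my $out = '';
    for my $line (@_) {
        chomp(my $copy = $line);
        for my $piece (split m!("(?:\\\W|.)*?"|/\*|\*/)!, $copy) {
            if ($in_comment) {
                $in_comment = 0 if $piece eq '*/';
            } elsif ($piece eq '/*') {
                $in_comment = 1;
                $out .= ' ';     # replace the comment with one space
            } else {
                $out .= $piece;
            }
        }
        $out .= "\n";            # keep the line count unchanged
    }
    return $out;
}

print strip_comments("int x; /* note */ int y;\n");
```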


Perl notes 30

References

$a = 3.1416;

$scalar_ref = \$a;

$array_ref = \@a;

$hash_ref = \%a;

$array_el_ref = \$a[3];

$hash_el_ref = \$a{'John'};
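A sketch (not from the notes) that creates each kind of reference above and dereferences it. It uses `$x` for the scalar because `$a` is reserved for `sort` blocks; `@a` and `%a` keep the slide's names. `ref` reports what kind of thing a reference points to.

```perl
use strict;
use warnings;

my $x = 3.1416;
my @a = (10, 20, 30, 40);
my %a = (John => 'Lennon');

my $scalar_ref   = \$x;
my $array_ref    = \@a;
my $hash_ref     = \%a;
my $array_el_ref = \$a[3];
my $hash_el_ref  = \$a{John};

my $pi    = $$scalar_ref;         # 3.1416
my $n     = scalar @$array_ref;   # 4
my $third = $array_ref->[2];      # 30
my $name  = $hash_ref->{John};    # "Lennon"
$$array_el_ref = 99;              # writes through to $a[3]
my $type  = ref $hash_ref;        # "HASH"
```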


Perl notes 31

Lists of Lists

@LoL = (
    [ "fred", "barney" ],
    [ "george", "jane", "elroy" ],
    [ "homer", "marge", "bart" ],
);
print $LoL[2][2];    # prints "bart"

$ref_to_LoL = [
    [ "fred", "barney" ],
    [ "george", "jane", "elroy" ],
    [ "homer", "marge", "bart" ],
];
print $ref_to_LoL->[2][2];

• Note: $LoL[2][2] is shorthand for $LoL[2]->[2]


Perl notes 32

Grow your own

while(<>){
    @tmp = split;
    push @LoL, [ @tmp ];
}
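A sketch of growing a list of lists, with an in-memory array of lines standing in for `<>`. `[ ... ]` builds a new anonymous array for each line; pushing `\@tmp` with a single global `@tmp` instead would make every row refer to the same array.

```perl
use strict;
use warnings;

my @input = ("fred barney\n", "george jane elroy\n");
my @LoL;
for (@input) {
    push @LoL, [ split ];      # split with no arguments splits $_ on white space
}

my $rows  = @LoL;              # 2
my $cols1 = @{ $LoL[1] };      # 3
my $last  = $LoL[1][-1];       # "elroy"
for my $row (@LoL) {
    print join(' ', @$row), "\n";
}
```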


Perl notes 33

Hashes of Arrays

%HoL = (
    flintstones => [ "fred", "barney" ],
    jetsons     => [ "george", "jane", "elroy" ],
    simpsons    => [ "homer", "marge", "bart" ],
);

• generation

# reading from a file with format:
# flintstones: fred barney ...
while(<>){
    next unless s/^(.*?):\s*//;
    $HoL{$1} = [ split ];
}

• or

while($line = <>){
    ($who, $rest) = split /:\s*/, $line, 2;
    @fields = split ' ', $rest;
    $HoL{$who} = [ @fields ];
}


Perl notes 34

Hashes of Arrays

# calling a function
for $group ("flintstones", "jetsons", "simpsons") {
    $HoL{$group} = [ get_family($group) ];
}

# append member to existing family
push @{ $HoL{flintstones} }, "wilma", "betty";

• access

$HoL{flintstones}[0] = "fred";
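A sketch that builds a hash of arrays from "family: members" lines (an in-memory array stands in for `<>`), then appends to and indexes into it. The limit argument 2 makes `split` stop after the first colon.

```perl
use strict;
use warnings;

my %HoL;
my @input = ("flintstones: fred barney\n", "jetsons: george jane elroy\n");
for my $line (@input) {
    my ($who, $rest) = split /:\s*/, $line, 2;   # at most two fields
    $HoL{$who} = [ split ' ', $rest ];
}

push @{ $HoL{flintstones} }, "wilma", "betty";   # append to an existing family
$HoL{flintstones}[0] = "fred";                    # element access

for my $family (sort keys %HoL) {
    print "$family: @{ $HoL{$family} }\n";
}
```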


Perl notes 35

Packages, Modules, and Object Classes