Perl
Perl notes 2
Perl
• Perl - Practical Extraction and Report Language
– for text files
– system management
– combines C, sed, awk, sh
– interpreted
– dynamic
Data Structures
• scalars $num
• arrays @num
• associative arrays (hashes) %num
• $num[50]
– element at index 50 of the array @num (the 51st, since indexing starts at 0)
• $#num
– last index of @num
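A minimal sketch of the three data types and the index forms listed above. The variable contents are made up for illustration.

```perl
use strict;
use warnings;

my $num = 42;                      # scalar
my @num = (10, 20, 30, 40);        # array
my %num = (one => 1, two => 2);    # associative array (hash)

# Scalars, arrays and hashes live in separate namespaces,
# so all three can share the name "num".
my $third      = $num[2];          # element at index 2, i.e. the third element
my $last_index = $#num;            # last index of @num
my $count      = scalar @num;      # number of elements
my $two        = $num{two};        # hash lookup by key

print "third=$third last_index=$last_index count=$count two=$two\n";
```

Note that the last index is always one less than the element count, because indexing starts at 0.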
Examples
#!/usr/local/bin/perl -w
# find the sum of a list of numbers from STDIN
# one number per line
$sum = 0;
while( <STDIN> ) {
    $sum += int $_;
}
print "the sum is $sum\n";
Examples
#!/usr/bin/perl -w
# find the sum of a list of numbers from STDIN
# several numbers per line
$sum = 0;
while( <STDIN> ) {
    @nums = split;
    foreach (@nums) {
        $sum += int $_;
    }
}
print "the sum is $sum\n";
Average
#!/usr/bin/perl -w
# find the average of a list of
# numbers from STDIN
# several numbers per line
$sum = 0;
$count = 0;
while( <STDIN> ) {
    @nums = split;
    foreach (@nums) {
        $sum += int $_;
        $count++;
    }
}
die "no numbers read\n" unless $count;    # avoid dividing by zero
print "the average is ", $sum/$count, "\n";
Median
#!/usr/bin/perl -w
# find the median of a list of numbers
# from STDIN
# several numbers per line
@nums = ();
while( <STDIN> ) {
    push @nums, split;
}
@nums = sort { $a <=> $b } @nums;    # numeric sort; plain sort compares as strings
if($#nums % 2) {
    # even number of elements: average the two middle values
    $median = ($nums[($#nums - 1)/2] + $nums[($#nums + 1)/2])/2;
}
else {
    $median = $nums[$#nums/2];
}
print "the median is $median\n";
Output?
#!/usr/bin/perl -w
@stuff = ("one", "two", "three");
print @stuff, "\n";
$stuff = ("one", "two", "three");
print $stuff, "\n";
$stuff = @stuff;
print $stuff, "\n";
onetwothree
three
3
Pattern Matching
m//
s///
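Modifiers
• i case-insensitive
• m multiple lines (^ and $ also match at embedded newlines)
• s single line (. also matches newline)
• x extended (whitespace and comments allowed in the pattern)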
Regular Expressions
Code Meaning
\w Word Characters (alphanumerics and _)
\W Non-Word Characters
\s White Space
\S Non-White Space
\d Digits
\D Non-Digits
\b Word Boundary
\B Non-Word Boundary
\A ^ At the Beginning of a String
\Z $ At the End of a String
. Match Any Single Character
Regular Expressions
* Zero or More Occurrences
? Zero or One Occurrence
+ One or More Occurrences
{ N } Exactly N Occurrences
{ N,M } Between N and M Occurrences
.* <thingy> Greedy Match, up to the last thingy
.*? <thingy> Non-Greedy Match, up to the first thingy
[ set_of_things ] Match Any Item in the Set
[ ^ set_of_things ] Does Not Match Anything in the Set
( some_expression ) Tag an Expression
$1..$N Tagged Expressions used in Substitutions
Rules
• Rule 1
– The engine tries to match as far left as it can
• Rule 2
– The regular expression is regarded as a set of alternatives, which are tried left to right (see page 61)
• Rule 3
– Items that have choices match from left to right
/x*y*/
• Rule 4
– Assertions
– ^ $ \b \B \A \Z \G (?=…) (?!…)
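As a rough illustration of Rules 1 and 2, the sketch below shows that the leftmost match wins even when it is empty, and that the first alternative that matches is used even if a later one would match more. The test strings are made up for the example.

```perl
use strict;
use warnings;

# Rule 1: the leftmost match wins, even when it is empty.
# Neither x nor y occurs at offset 0, so /x*y*/ matches the empty string there
# instead of the "xxyy" further right.
my $text = "aaxxyy";
$text =~ /x*y*/ or die "no match";
my ($start, $matched) = ($-[0], $&);

# Rule 2: alternatives are tried left to right; the first one that matches is used.
my $word = "foobar";
$word =~ /foo|foobar/ or die "no match";
my $alt = $&;

print "matched '$matched' at offset $start\n";
print "alternation matched '$alt'\n";
```

Reordering the alternatives to /foobar|foo/ would make the second match return the whole word.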
Rules
• Rule 5
– A quantified atom matches only if the atom itself matches some number of times allowed by the quantifier
Maximal   Minimal   Allowed range
{n,m}     {n,m}?    At least n, at most m
{n,}      {n,}?     At least n
{n}       {n}?      Exactly n
*         *?        0 or more
+         +?        1 or more
?         ??        0 or 1
Rules
• Rule 6
– Each atom matches according to its type
– (…) ==> grouping + storage in $1, $2
– . matches any char except \n
– […] character class
– special characters \a \n \r …
– \1 \2 ... backreference to (…)
– \033 octal char
– \xf7 hex char
– \cD control char
– any other backslashed char matches the char itself
Precedence
• () (?: )
• Repetition
• Sequence
• | alternation
Pattern                       Strings
/ab*c/                        abc, ac, ababd, abbbc
/abc*/                        a, ab, abc, abccc, abcabc
/(abc)*/                      abc, abcc, empty string, abcabc
/ed|jo/                       ed, jo, edo, ejo
/(ed)|(jo)/                   ed, jo, edo, ejo
/ed|jo{1,3}/                  ed, jo, edo, ejo, joo, jooooo
/ed|jo{1,3}?/                 ed, jo, edo, ejo, joo, jooooo
/^ed|jo$/                     fred and joe, ed jo, fred jo, jo
/^(ed|jo)$/                   fred and joe, ed jo, fred jo, jo
$pat = 'bob'; /$pat{3}/       pat, bob, bobbobbob, bobbb, patt
$pat = 'bob'; /($pat){3}/     pat, bob, bobbobbob, bobbb, patt
Pattern       String
/\w+/         Greetings, planet earth!
/\w*/         Greetings, planet earth!
/n[et]*/      Greetings, planet earth!
/n[et]+/      Greetings, planet earth!
/G.*t/        Greetings, planet earth!
/('.*')/      this 'test' isn't good
• How do you fix it?
/('[^']*')/
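The quoted-string case on this slide can be checked with a short sketch. It compares the greedy pattern with the negated-character-class fix on the same sample sentence, written here with plain ASCII quotes.

```perl
use strict;
use warnings;

my $text = "this 'test' isn't good";

# Greedy: .* runs to the last quote, so the match swallows the apostrophe in "isn't".
my ($greedy) = $text =~ /('.*')/;

# Negated class: [^']* cannot cross a quote, so the match ends at the first closing quote.
my ($fixed) = $text =~ /('[^']*')/;

print "greedy: $greedy\n";
print "fixed:  $fixed\n";
```

A non-greedy /('.*?')/ would give the same result for this input; the negated class just states the intent more directly.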
Examples
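s/^([^ ]+) +([^ ]+)/$2 $1/         # swap the first two words
/(\w+)\s*=\s*\1/                   # match "foo = foo"
/.{40,}/                           # match a line of at least 40 chars
/^(\d+\.?\d*|\.\d+)$/              # match a decimal number
if (/Time: (..):(..):(..)/){
    $hours = $1;
    $minutes = $2;
    $seconds = $3;
}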
Default arguments
• $_, @_, @ARGV, STDIN
• inside a sub, shift defaults to @_
sub foo {
    my $x = shift;    # @_ default
}
• in the main program, shift defaults to @ARGV
while($_ = shift) {
    if(/^-(.*)/){
        process_option($1);
    } else {
        process_file($_);
    }
}
Reading a stream
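open FIN, "myfile" or die;
# three alternative ways to consume the stream
while (<FIN>){
    # do something with $_ (reads one line at a time)
}
foreach (<FIN>){
    # do something with $_ (reads every line into a list first)
}
print sort <FIN>;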
Reading a stream
# print a window
@f = <FIN>;
foreach ( 0..$#f ) {
    if ($f[$_] =~ /\bShazam\b/) {
        $lo = ($_ > 0) ? $_ - 1 : $_;
        $hi = ($_ < $#f) ? $_ + 1 : $_;
        print map { "$_: $f[$_]" } $lo .. $hi;
    }
}
Sorting
• sort numerically
sub numerically { $a <=> $b }
@list = sort numerically
(16, 1, 8, 2, 4, 32);
or
@list = sort { $a <=> $b }
(16, 1, 8, 2, 4, 32);
@list = sort{uc($a) cmp uc($b)}
qw(this is a test);
#reverse
@list = sort { $b <=> $a }
(16, 1, 8, 2, 4, 32);
Example
#!/usr/bin/perl -w
# This script will count the frequency of distinct words
# in the file that is given as an argument.
# Warning: Error checking is minimal!
die "usage: $0 file\n" unless @ARGV;
while(<>){
tr/A-Z/a-z/; # translate to lowercase
@w = split(/[\W]+/,$_); # split into words
foreach (@w){
$list{$_}++; # increment the counter
}
}
foreach $key (sort {$list{$b} <=> $list{$a}} keys %list) {
print $key, ' = ', $list{$key}, "\n";
}
Tokenizing
# tokenize an arithmetic expression
while(length $_){    # test the length, so a trailing "0" is not dropped
    if(/^(\d+)/) {
        push @tok, 'num', $1;
    } elsif(/^([+\-\/*()])/) {
        push @tok, 'punct', $1;
    } elsif (/^([\d\D])/) {
        die "invalid char $1 in input";
    }
    $_ = substr($_, length $1);
}
• substr slows things down
– it cuts the start off the string on every pass
Tokenizing 2
while(/
    (\d+) |
    ([+\-\/*()]) |
    ([\d\D])/gx) {
    if(defined $1){
        push @tok, 'num', $1;
    }elsif (defined $2) {
        push @tok, 'punct', $2;
    }else {
        die "invalid char $3 in input";
    }
}
Tokenizing 3
{
    if(/\G(\d+)/gc) {
        push @tok, 'num', $1;
    } elsif(/\G([+\-\/*()])/gc) {
        push @tok, 'punct', $1;
    } elsif (/\G([\d\D])/gc) {
        die "invalid char $1 in input";
    }else{
        last;
    }
    redo;
}
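The \G approach can be wrapped into a self-contained sketch as below. The tokenize helper, the whitespace-skipping branch, and the sample expression are additions for illustration and are not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a flat list of (type, text) pairs.
sub tokenize {
    my ($input) = @_;
    my @tok;
    for ($input) {                        # alias $_ to the input string
        while (1) {
            if    (/\G\s+/gc)            { next }    # skip whitespace (an addition)
            elsif (/\G(\d+)/gc)          { push @tok, 'num',   $1 }
            elsif (/\G([+\-\/*()])/gc)   { push @tok, 'punct', $1 }
            elsif (/\G(.)/gcs)           { die "invalid char $1 in input\n" }
            else                         { last }    # nothing left to match
        }
    }
    return @tok;
}

my @tokens = tokenize("12 + 3*(4-5)");
print join(' ', @tokens), "\n";
```

The /c modifier keeps the match position when a branch fails, so each following branch retries at the same offset without copying the string.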
Use split for clarity
($a, $b, $c) =
/^(\S+)\s+(\S+)\s+(\S+)/;
($a, $b, $c) = split /\s+/, $_;
($a, $b, $c) = split;
Get the fifth field:
($a) =
/[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;
or
($a) = /(?:[^:]*:){4}([^:]*)/;
or
($a) = (split /:/)[4];
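To check that the three fifth-field forms agree, here is a small sketch run against a made-up colon-separated record in the style of /etc/passwd. The record and variable names are assumptions for the example.

```perl
use strict;
use warnings;

# Made-up colon-separated record.
my $line = "fred:x:1001:100:Fred Flintstone:/home/fred:/bin/sh";

my ($by_regex)  = $line =~ /[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;
my ($by_repeat) = $line =~ /(?:[^:]*:){4}([^:]*)/;
my $by_split    = (split /:/, $line)[4];    # list slice: index 4 is the fifth field

print "$by_regex | $by_repeat | $by_split\n";
```

All three extract the same field; the split form says most directly which field is wanted.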
unpack
ps l
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
100 1216 30562 30561 7 0 2804 1768 rt_sig S pts/2 0:00 -tcsh
000 1216 30658 30562 10 0 2780 1080 - R pts/2 0:00 ps l
chomp (@ps = `ps l`);
shift @ps;
for(@ps){
($uid, $pid, $sz, $tt) =
unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;
print "$uid, $pid, $sz, $tt\n";
}
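Since real ps output varies between systems, the sketch below applies the same unpack idea to a fixed-width record built with sprintf. The column layout and field names are invented for the example.

```perl
use strict;
use warnings;

# Build a fixed-width record: 5-char name, 6-char right-justified pid, 8-char tty.
my $rec = sprintf "%-5s%6d%8s", "root", 1234, "pts/2";

# @N jumps to absolute offset N; A<len> reads len characters and strips trailing spaces.
my ($user, $pid, $tty) = unpack '@0 A5 @7 A4 @14 A5', $rec;

print "$user, $pid, $tty\n";
```

Because A only strips trailing spaces, the @ offsets are chosen to land on the first character of each right-justified field.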
Avoid regex for simple strings
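do_it() if $answer eq 'yes';
do_it() if $answer =~ /^yes$/;     # same test via regex (also accepts "yes\n")
do_it() if $answer =~ /yes/;       # matches any string containing "yes"
do_it() if lc($answer) eq 'yes';
do_it() if $answer =~ /^yes$/i;    # case-insensitive regex equivalent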
#!/usr/bin/perl
# remove the comments from a C program
$filename = shift or die "usage: $0 filename\n";
open FIN, $filename or die "can't open file";
while (<FIN>){
    chomp;    # the newline is printed once per line below
    # split keeps the captured delimiters: string literals, /* and */
    for(split m!("(?:\\\W|.)*?"|/\*|\*/)!){
        if($in_comment){
            $in_comment = 0 if $_ eq "*/";
        } else {
            if ($_ eq "/*") {
                $in_comment = 1;
                print " ";
            } else {
                print;
            }
        }
    }
    print "\n";
}
References
$a = 3.1416;
$scalar_ref = \$a;
$array_ref = \@a;
$hash_ref = \%a;
$array_el_ref = \$a[3];
$hash_el_ref = \$a{'John'};
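The slide shows how references are taken; a minimal sketch of how they might then be dereferenced follows. The variable names and values are chosen for the example rather than taken from the slides.

```perl
use strict;
use warnings;

my $pi   = 3.1416;
my @list = (10, 20, 30, 40);
my %age  = (John => 42);

my $scalar_ref   = \$pi;
my $array_ref    = \@list;
my $hash_ref     = \%age;
my $array_el_ref = \$list[3];

# Dereference with a sigil prefix or with the arrow operator.
my $value  = $$scalar_ref;          # the scalar behind the reference
my $second = $array_ref->[1];       # element at index 1
my $count  = scalar @$array_ref;    # number of elements in the referenced array
my $john   = $hash_ref->{John};     # value stored under the key John

# A reference to an element points at that element: writing through it changes the array.
$$array_el_ref = 99;

print "$value $second $count $john $list[3]\n";
```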
Lists of Lists
@LoL = (
    ["fred", "barney" ],
    ["george", "jane", "elroy" ],
    ["homer", "marge", "bart" ],
);
print $LoL[2][2]; # prints "bart"
$ref_to_LoL = [
    ["fred", "barney" ],
    ["george", "jane", "elroy" ],
    ["homer", "marge", "bart" ],
];
print $ref_to_LoL->[2][2];
• Note: $LoL[2][2] implies $LoL[2]->[2]
Grow your own
while(<>){
@tmp = split;
push @LoL, [ @tmp ];
}
Hashes of Arrays
%HoL = (
    flintstones => ["fred", "barney" ],
    jetsons => ["george", "jane", "elroy" ],
    simpsons => ["homer", "marge", "bart" ],
);
• generation
# reading from a file with format:
# flintstones: fred barney ..
while(<>){
    next unless s/^(.*?):\s*//;
    $HoL{$1} = [ split ];
}
• or
while($line = <>){
    ($who, $rest) = split /:\s*/, $line, 2;
    @fields = split ' ', $rest;
    $HoL{$who} = [ @fields ];
}
Hashes of Arrays
# calling a function
for $group ("flintstones", "jetsons", "simpsons") {
    $HoL{$group} = [ get_family($group) ];
}
# append member to existing family
push @{ $HoL{flintstones} }, "wilma", "betty";
• access
$HoL{flintstones}[0] = "fred";
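Putting the hash-of-arrays steps together, the following sketch builds the structure from in-memory lines instead of a file, appends to one family, and prints the result. The sample lines are illustrative only.

```perl
use strict;
use warnings;

# Sample input in the "family: member member ..." format.
my @lines = (
    "flintstones: fred barney",
    "jetsons: george jane elroy",
);

my %HoL;
for my $line (@lines) {
    my ($who, $rest) = split /:\s*/, $line, 2;    # limit 2: split at the first colon only
    $HoL{$who} = [ split ' ', $rest ];            # anonymous array of members
}

# Append members to an existing family.
push @{ $HoL{flintstones} }, "wilma", "betty";

# Print each family with its members, in sorted key order.
for my $family (sort keys %HoL) {
    print $family, ": ", join(" ", @{ $HoL{$family} }), "\n";
}
```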
Packages, Modules, and Object Classes