An Introduction to Perl
Sources and inspirations:
http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Perl/lecture.html
Randal L. Schwartz and Tom Christiansen, “Learning Perl” 2nd ed., O’Reilly
Randal L. Schwartz and Tom Phoenix, “Learning Perl” 3rd ed., O’Reilly
Dr. Nathalie Japkowicz, Dr. Alan Williams
Go O'Reilly! CSI 3125, Perl, page 1
Perl overview (1)
• Perl = Practical extraction and report language
• Perl = Pathologically eclectic rubbish lister
• It is a powerful general-purpose language, which is particularly useful for writing “quick and dirty” programs.
• Invented by Larry Wall, with no apologies for its lack of elegance (!).
• If you know C and a fair bit of Unix (or Linux), you can learn Perl in days (well, some of it...).
Perl overview (2)
• In the hierarchy of programming languages, Perl is located half-way between high-level languages such as Pascal, C and C++, and shell scripts (languages that add control structure to the Unix command line instructions) such as sh, sed and awk.
• By the way:
– awk = Aho, Weinberger, Kernighan
– sed = Stream Editor.
Advantages of Perl (1)
• Perl combines the best (according to its admirers) features of:
– Unix/Linux shell programming,
– the commands sed, grep, awk and tr,
– C,
– Cobol.
• Shell scripts are usually written in many small files that refer to each other. Perl achieves the functionality of such scripts in a single program file.
Advantages of Perl (2)
• Perl offers extremely strong regular expression capabilities, which allow fast, flexible and reliable string handling operations, especially pattern matching.
As a result, Perl works particularly well in text processing applications.
• As a matter of fact, it is Perl that allowed a lot of text documents to be quickly moved to the HTML format in the early 1990s, allowing the Web to expand so rapidly.
Disadvantages of Perl
• Perl is a jumble! It contains many, many features from many languages and tools.
• It contains different constructs for the same functionality (for example, there are at least 5 ways to perform a one-line if statement).
It is not a very readable language.
• You cannot distribute a Perl program as an opaque binary. That is, you cannot really commercialize products you develop in Perl.
Perl resources and versions
• http://www.perl.org tells you everything that you want to know about Perl.
• What you will see here is Perl 5.
• Perl 5.8.0 was released in July 2002.
• Perl 6 (http://dev.perl.org/perl6/) is the next version, still under development, but moving along nicely. The first book on Perl 6 is in stores (http://www.oreilly.com/catalog/perl6es).
Scalar data: strings and numbers
Scalars need not be defined or their types declared: Perl understands from context.
% cat hellos.pl
#!/usr/bin/perl -w
print "Hello" . " " . "world\n";
print "hi there " . 2 . " worlds!" . "\n";
print (("5" + 6) . " eggs\n" . " in " . " 3 + 2 = " . ("3" + "2") . " baskets\n" );
% hellos.pl                      (invoke Perl)
Hello world
hi there 2 worlds!
11 eggs
 in  3 + 2 = 5 baskets
Scalar variables
Scalar variable names start with a dollar sign. They do not have to be declared.
% cat scalar.pl
#!/usr/bin/perl -w
$i = 1;
$j = "2";
print "$i and $j \n";
$k = $i + $j;
print "$k\n";
print $i . $j . "\n";
print '$k\n' . "\n";
% scalar.pl
1 and 2
3
12
$k\n
Quotes and substitution
Suppose $x = 3.
Single-quotes ' ' allow no substitution except for the escape sequences \\ and \'.
print('$x\n'); gives $x\n and no new line.
Double-quotes " " allow substitution of variables like $x and control codes like \n (newline).
print("$x\n"); gives 3 (and a new line).
Back-quotes ` ` also allow substitution, then try to execute the result as a system command, returning as the final value whatever the system command outputs.
$y = `date`; print($y); results in Sun Aug 10 07:04:17 EDT 2003
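A minimal sketch of the three kinds of quotes side by side. The variable names are illustrative, and echo stands in for date so that the captured output is predictable; this assumes a system on which echo is available as a command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 3;

# Single quotes: no substitution, so $x and \n stay as literal characters.
my $single = '$x\n';

# Double quotes: $x is replaced by its value and \n becomes a newline.
my $double = "$x\n";

# Back-quotes: substitute, run the result as a system command, capture its output.
# echo is only a stand-in here for a command such as date.
my $back = `echo $x`;

print length($single), "\n";   # 4 characters: $ x \ n
print length($double), "\n";   # 2 characters: 3 and a newline
print $back;                   # whatever the command printed
```

The lengths make the difference visible even though '$x\n' and "$x\n" look alike in the source.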
Control statements: if, else, elsif
% cat names.pl
#!/usr/bin/perl -w
$name = <STDIN>;    # standard input
chomp($name);       # cut newline
if ($name gt 'fred') {
    print "'$name' follows 'fred'\n";
}
elsif ($name eq 'fred') {
    print "both names are 'fred'\n";
}
else {
    print "'$name' precedes 'fred'\n";
}
% names.pl
stan                             (my input)
'stan' follows 'fred'            (Perl's output)
% names.pl
Stan
'Stan' precedes 'fred'
Control statements: loops (1)
% cat oddsum_while.pl
#!/usr/bin/perl -w
# Add up some odd numbers
$max = <STDIN>;
$n = 1;
while ($n < $max) {
    $sum += $n;
    $n += 2;    # On to the next odd number
}
print "The total is $sum.\n";
% oddsum_while.pl
10                               (my input)
Use of uninitialized value at oddnums.pl line 6, <STDIN> chunk 1.    (a warning)
The total is 25.                 (Perl's output)
Control statements: loops (2)
• End-line comments begin with #
• It is okay, though not nice, to use a variable without initialization (like $sum). Such a variable is initialized to 0 if it is first used as a number or to the empty string "" if it is first used as a string. In fact, it is always undef, variously converted.
• Perl can, if asked, issue a warning (use the -w flag).
• Of course, while is only one of many looping constructs in Perl. Read on...
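A small sketch of how undef is "variously converted". The variable names are illustrative, and the warning is switched off inside one block only so that the conversions can be shown quietly; under -w each of these uses would normally be reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $u;                             # never assigned, so its value is undef
my ($as_number, $as_string);
{
    no warnings 'uninitialized';   # silence the warning for this demonstration only
    $as_number = $u + 0;           # undef used as a number
    $as_string = $u . "";          # undef used as a string
}

print "as a number: $as_number\n";              # as a number: 0
print "as a string: '$as_string'\n";            # as a string: ''
print defined($u) ? "defined\n" : "undef\n";    # undef
```

The variable itself is unchanged by these conversions; defined is the way to tell undef from a genuine 0 or "".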
Control statements: loops (3)
% cat oddsum_until.pl
#!/usr/bin/perl -w
# Add up some odd numbers
$max = <STDIN>;
$n = 1;
$sum = 0;
until ($n >= $max) {
    $sum += $n;
    $n += 2;    # On to the next odd number
}
print "The total is $sum.\n";
% oddsum_until.pl
10
The total is 25.
Control statements: loops (4)
% cat oddsum_for.pl
#!/usr/bin/perl -w
# Add up some odd numbers
$max = <STDIN>;
$sum = 0;
for ($n = 1 ; $n < $max ; $n += 2) {
    $sum += $n;
}
print "The total is $sum.\n";
% oddsum_for.pl
10
The total is 25.
We also have do-while and do-until, and we have foreach. Read on.
Control statements: loops (5)
% cat oddsum_foreach.pl
#!/usr/bin/perl -w
# Add up some odd numbers
$max = <STDIN>;
$sum = 0;
foreach $n ( (1 .. $max) ) {
    if ( $n % 2 != 0 ) { $sum += $n; }
}
print "The total is $sum.\n";
% oddsum_foreach.pl
10
The total is 25.
Control constructs compared
            C                        Perl (braces required)
the same    if () { ... }            if () { ... }
            if (! ) { ... }          unless () { ... }
different   } else if () { ... }     } elsif () { ... }
the same    while () { ... }         while () { ... }
the same    for (aa;bb;cc) {...}     for (aa;bb;cc) {...}
                                     foreach $v (@array) { ... }
different   break                    last
different   continue                 next
similar     0 is FALSE               0, "0", and "" are FALSE
similar     != 0 is TRUE             anything not false is TRUE
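The rows of the comparison that differ from C can be tried out with a short sketch. The helper truth and the sample values are made up for illustration; the call with undef is an addition to the list above (undef also counts as false).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report how an if statement would see the value.
sub truth {
    my ($value) = @_;
    return $value ? "TRUE" : "FALSE";
}

print truth(0),     "\n";   # FALSE
print truth("0"),   "\n";   # FALSE
print truth(""),    "\n";   # FALSE
print truth(undef), "\n";   # FALSE
print truth("0.0"), "\n";   # TRUE: as a string it is neither "" nor "0"
print truth("00"),  "\n";   # TRUE
print truth(" "),   "\n";   # TRUE

# next skips to the next iteration (C's continue); last leaves the loop (C's break).
my @seen;
foreach my $n (1 .. 10) {
    next if $n % 2 == 0;    # skip even numbers
    last if $n > 7;         # stop once past 7
    push @seen, $n;
}
print "@seen\n";            # 1 3 5 7

# unless (cond) reads the same as if (!cond).
unless (@seen == 0) {
    print "the list is not empty\n";
}
```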
Lists and arrays
• A list is an ordered collection of scalars. An array is a variable that contains a list.
• Each element is an independent scalar value. A list can hold numbers, strings, undef values—any mixture of kinds of scalar values.
• To use an array element, prefix the array name with a $; place a subscript in square brackets.
• To access the whole array, prefix its name with a @.
• You can copy an array into another. You can use the operators sort, reverse, push, pop, split.
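A short sketch of the operations just listed, applied to a made-up list of words (the data and variable names are only for illustration).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = split(/\s+/, "pear apple fig");   # ("pear", "apple", "fig")

print "$words[0]\n";              # one element, prefixed with $: pear
print scalar(@words), "\n";       # number of elements: 3

push(@words, "kiwi");             # add an element at the end
my $last = pop(@words);           # remove it again: kiwi

my @copy     = @words;            # copy the whole array
my @sorted   = sort(@copy);       # string order
my @reversed = reverse(@copy);    # original order, back to front

print "@sorted\n";                # apple fig pear
print "@reversed\n";              # fig apple pear
```

Note that sort and reverse return new lists; @copy itself is left unchanged.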
Command-line arguments
Suppose that a Perl program stored in the file cleanUp is invoked in Unix/Linux with the command:
cleanUp -o result.htm data.htm
The built-in list named @ARGV then contains three elements:
('-o', 'result.htm', 'data.htm')
These three elements can be accessed as:
$ARGV[0]
$ARGV[1]
$ARGV[2]
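A possible way to work through such an argument list, sketched under the assumption that -o introduces the name of an output file (the slide does not say what cleanUp does with its arguments). The subroutine parse_args is hypothetical; in a real script it would be called as parse_args(@ARGV).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Separate the (assumed) output file named after -o from the remaining arguments.
sub parse_args {
    my @args = @_;
    my $out_file;
    my @in_files;
    while (@args) {
        my $arg = shift @args;
        if ($arg eq '-o') {
            $out_file = shift @args;    # the word after -o
        }
        else {
            push @in_files, $arg;
        }
    }
    return ($out_file, @in_files);
}

# The argument list from the example above, written out by hand.
my ($out, @in) = parse_args('-o', 'result.htm', 'data.htm');
print "output: $out\n";     # output: result.htm
print "inputs: @in\n";      # inputs: data.htm
```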
Array examples (1)
% cat arraysort.pl
#!/usr/bin/perl -w
$i = 0;
while ($k = <STDIN>) {
    $a[$i++] = $k;
}
print "===== sorted =====\n";
print sort(@a);
% arraysort.pl
Nathalie
Frank
hello
John
Zebra
notary
nil
                                 (control-D here)
===== sorted =====
Frank
John
Nathalie
Zebra
hello
nil
notary
Array examples (2A)
Reversing a text file (whole lines).
% cat whole_rev.pl
#!/usr/bin/perl -w
while ($k = <STDIN>) {
    push(@a, $k);
}
print "== reversed ==\n";
while ($oldval = pop(@a)) {
    print $oldval;
}
% whole_rev.pl
a b c d
e f
g h i
                                 (control-D here)
== reversed ==
g h i
e f
a b c d
Array examples (2B)
Reversing each line in a text file.
% cat each_rev.pl
#!/usr/bin/perl -w
while ($k = <STDIN>) {
    # split cuts the line on white space
    # (we will see regular expressions soon)
    @a = split(/\s+/, $k);
    $s = "";
    for ($i = @a; $i > 0; $i--) {
        $s = "$s$a[$i-1] ";
    }
    chop($s);
    print "$s\n";
}
Each input line is followed by Perl's output; control-D ends the input.
% each_rev.pl
a bc d efg
efg d bc a
hi j
j hi
klm nopq st
st nopq klm
Array examples (3)
Reversing a text file (whole lines)
print reverse(<STDIN>);
Reversing each line in a text file
while ($k = <STDIN>) {
    $s = "";
    foreach $i (reverse(split(/\s+/, $k))) {
        $s = "$s$i ";
    }
    chop($s);
    print "$s\n";
}
A digression: Perl's favourite default variable
By default, Perl reads into $_:
while (<STDIN>) {
    $s = "";
    foreach $i (reverse(split(/\s+/, $_))) {
        $s = "$s$i ";
    }
    chop($s);
    print "$s\n";
}
By default, Perl splits $_ too!
while (<STDIN>) {
    $s = "";
    foreach $i (reverse(split(/\s+/))) {
        $s = "$s$i ";
    }
    chop($s);
    print "$s\n";
}
Hashes
• A hash is similar to an array, but instead of subscripts, we can have anything as a key, and we use curly brackets rather than square brackets.
• The official name is associative array (known to be implemented by hashing).
• Keys and values can be any scalars; keys are always converted to strings.
• To refer to a hash as a whole, prefix its name with a %.
• If you assign a hash to an array, it becomes a simple list.
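The points above can be tried out with a small sketch. The hash contents and names are invented for illustration; the => used to build the hash is just a more readable comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = ("fred" => 35, "wilma" => 33);

$age{"betty"} = 31;                  # curly brackets select one entry
print "$age{'fred'}\n";              # 35

print exists($age{"barney"}) ? "yes\n" : "no\n";   # no

# Keys are converted to strings, so the number 2.50 and "2.5" name the same entry.
my %h;
$h{2.50} = "hello";
print "$h{'2.5'}\n";                 # hello

# Assigning a hash to an array flattens it into key, value, key, value, ...
my @flat = %age;
print scalar(@flat), "\n";           # 6

# The order of keys is not predictable, so sort them for stable output.
foreach my $name (sort keys %age) {
    print "$name is $age{$name}\n";
}
```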
Hash examples I (1)
% cat hash_array.pl
#!/usr/bin/perl -w
%some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
              "wilma", 1.72e30, "betty", "bye\n");
@an_array = %some_hash;
print "@an_array\n========\n";
foreach $key (keys %some_hash) {
    print "$key: ";
    print delete $some_hash{$key};
    print "\n";
}
Hash examples I (2)
% hash_array.pl
betty bye
wilma 1.72e+30 foo 35 2.5 hello bar 12.4
========
betty: bye
wilma: 1.72e+30
foo: 35
2.5: hello
bar: 12.4
Hash examples II
% cat hash_arrows.pl
#!/usr/bin/perl -w
my %hash = ( "a" => 1, "b" => 2, "c" => 3);
foreach $key (sort keys %hash) {
    $value = $hash{$key};
    print "$key => $value\n";
}
% hash_arrows.pl
a => 1
b => 2
c => 3
A brief interlude: the diamond operator
<> loops over the files listed as command-line arguments; $_ is the current input line.
% cat concat
#!/usr/bin/perl -w
while ( <> ) { print $_; }
% cat a
one-a
two-a
% cat b
three-b
four-b
five-b
% concat a b
one-a
two-a
three-b
four-b
five-b
% concat a b >c
% cat c
one-a
two-a
three-b
four-b
five-b
Hash examples III: character frequency count
% cat frequency.pl
#!/usr/bin/perl -w
while (<>) {
    # split $_ into single characters, loop
    foreach $c (split //) {
        # Increment $count of $c
        ++$count{$c};
    }
}
# end of input, print %count
for $c (sort keys %count) {
    print "$c\t$count{$c}\n";
}
Character frequency count (2)
% frequency.pl
Nathalie
Fran
hello
John
rather
Notary
F 1
J 1
^D

        8                        (the \n character)
        2                        (the space character)
1       2
F       2
J       2
N       2
a       5
e       3
h       4
i       1
l       3
n       2
o       3
r       4
t       3
y       1
Subroutines
• A subroutine is a user-defined function. The syntax is very simple; so is the semantics.
#!/usr/bin/perl
sub max {
    if ( $x > $y ) { $x } else { $y }
}
$x = 10; $y = 11;
print &max . "\n";
• There are no arguments; the script accesses two global variables. The subroutine call is marked with &. The value returned is that of the last expression evaluated.
Subroutines (2)
A few housekeeping rules.
• You can place your definitions anywhere in the file, though it is recommended to have them at the beginning.
• Perl always uses the latest definition in the file; any preceding one is ignored.
• Certain elements of the syntax are optional.
– The & might sometimes be omitted (but it is not a good idea).
– The return operator may precede a value to be returned (this can be useful):
if ( $x > $y ) { return $x }
else { return $y }
Subroutines (3)
• Clearly, the use of global variables is much too limited. Subroutines take arguments, and work on them via a predefined list variable @_ or its elements $_[0], $_[1] and so on.
#!/usr/bin/perl
sub max {
    if ( $_[0] > $_[1] ) { $_[0] } else { $_[1] }
}
print &max ( 12, 13 ) . "\n";
Subroutines (4)
• $_[0], $_[1] are not fun to work with. We can rename them locally, using the my operator, which creates a sub's private variables. Here, we declare two such variables and right away initialize them.
#!/usr/bin/perl
sub max {
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 15, 14 ) . "\n";
Subroutines (5)
• But: this is not a safe max calculation.
#!/usr/bin/perl
sub max {
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 16, 19, 23 ) . "\n";
print &max ( 26 ) . "\n";
• This produces 19 (23 gets ignored) and 26 (the second value is undef, that is, 0).
Subroutines (6)
• We could stop the subroutine if the number of arguments is wrong. The (generally very useful!) operator die does that for us.
#!/usr/bin/perl
sub max {
    if ( @_ != 2 ) {
        die "max needs two arguments: @_\n";
    }
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 16, 19, 23 ) . "\n";
The script is stopped after printing this:
max needs two arguments: 16 19 23
Subroutines (7)
• We can have just a warning, if we use the operator warn instead.
#!/usr/bin/perl
sub max {
    if ( @_ != 2 ) {
        warn "max needs two arguments: @_\n";
    }
    my ( $a, $b ) = @_;
    if ( $a > $b ) { $a } else { $b }
}
print &max ( 16, 19, 23 ) . "\n";
The script prints this:
max needs two arguments: 16 19 23
19
Subroutines (8)
• It is, by the way, not a bad idea to generalize max by allowing it to take any number of arguments.
#!/usr/bin/perl
sub max {
    my ( $curr_max ) = shift @_;
    foreach ( @_ ) {
        if ( $_ > $curr_max ) { $curr_max = $_; }
    }
    $curr_max
}
print &max ( 15, 14 ) . "\n";
print &max ( 16, 19, 23 ) . "\n";
print &max ( 26 ) . "\n";
Subroutines (9)
• This even works for empty lists.
#!/usr/bin/perl
sub max {
    my ( $curr_max ) = shift @_;
    foreach ( @_ ) {
        if ( $_ > $curr_max ) { $curr_max = $_; }
    }
    $curr_max
}
$z = &max ( );
if ( defined $z ) { print $z . "\n"; }
else { print "undefined\n"; }
Regular expressions (1)
• A regular expression (also called a pattern) is a template that describes a class of strings. A string can either match or not match the pattern.
• The simplest pattern is one character.
• A character class (the pattern matches any of these characters) is written in square brackets:
[01234567]    an octal digit
[0-7]         an octal digit
[0-9A-F]      a hex digit
[^A-Za-z]     not a letter (^ "negates")
[0-9-]        a decimal digit or a minus
Regular expressions (2)
• Metacharacters:
.        (dot) any character except \n
• Anchors:
^        the beginning of a string
$        the end of a string
• Multipliers:
*        repeat the preceding item 0 or more times
+        repeat the preceding item 1 or more times
?        make the preceding item optional
{n}      repeat n times
{n,m}    repeat n to m times (n <= m)
{n,}     repeat n or more times
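A few of these elements in action, as a sketch. The helper matches and the sample strings are made up for illustration; the helper simply reports whether a string matches a pattern given as text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report whether $string matches the pattern held in $pattern.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ /$pattern/ ? "yes" : "no";
}

print matches("ac",    'ab*c'),     "\n";   # yes: * allows zero b's
print matches("ac",    'ab+c'),     "\n";   # no:  + needs at least one b
print matches("abbc",  'ab+c'),     "\n";   # yes
print matches("color", 'colou?r'),  "\n";   # yes: ? makes the u optional
print matches("abbbc", 'ab{2}c'),   "\n";   # no:  exactly two b's required
print matches("abbbc", 'ab{2,}c'),  "\n";   # yes: two or more
print matches("abbbc", 'ab{1,2}c'), "\n";   # no:  at most two
print matches("xabc",  '^abc'),     "\n";   # no:  ^ anchors at the beginning
print matches("xabc",  'abc$'),     "\n";   # yes: $ anchors at the end
print matches("a\nc",  'a.c'),      "\n";   # no:  dot does not match \n
```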
Regular expressions (3)
$x = "01239876AGH";
if ( $x =~ /^0[1-9]{4,}/ ){ print "yes1\n"; }
if ( $x =~ /[A-Z]{3}$/ ){ print "yes2\n"; }
if ( $x =~ /^.*[A-Z]{4}$/ ){ print "yes3\n"; }
• The Boolean operator =~ tries to match a string with a regular expression written inside slashes.
Regular expressions (4)
$x = "01239876AGH";
if ( $x =~ /([0-9]{4}|[A-Z]{3}){2,}/ ){ print "yes4\n"; }
if ( $x =~ /(0?|4)(5|[1abc]{1,})/ ){ print "yes5\n"; }
• Patterns can be grouped by parentheses (the whole pattern becomes one item).
Alternative is denoted by the bar |.
Regular expressions (5)
• The precedence of pattern elements:
parentheses          ( )
multipliers          * + ? {n} {n,m} {n,}
sequence, anchors    ^ $
alternation          |
• Some character classes are predefined:
                            class    not class
digit                       \d       \D
word char [a-zA-Z0-9_]      \w       \W
whitespace                  \s       \S
• Some additional anchors:
word boundary               \b       \B
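The effect of precedence is easiest to see on pairs of patterns that differ only in their parentheses. This is a sketch with made-up strings; matches is a hypothetical helper that reports whether a string matches a pattern given as text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report whether $string matches the pattern held in $pattern.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ /$pattern/ ? "yes" : "no";
}

# Alternation binds loosest: ^cat|dog$ means (^cat) or (dog$).
print matches("catalog",   '^cat|dog$'),   "\n";   # yes: starts with cat
print matches("catalog",   '^(cat|dog)$'), "\n";   # no:  must be exactly cat or dog

# A multiplier applies only to the item just before it.
print matches("ababab",    '^ab{3}$'),     "\n";   # no:  means a followed by bbb
print matches("ababab",    '^(ab){3}$'),   "\n";   # yes

# Predefined classes and the word-boundary anchor.
print matches("room 101",  '\d{3}'),       "\n";   # yes
print matches("room 101",  '^\w+\s\d+$'),  "\n";   # yes
print matches("cathedral", '\bcat\b'),     "\n";   # no:  cat is not a whole word here
print matches("a cat nap", '\bcat\b'),     "\n";   # yes
```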
Regular expression examples (1)
$i = "Jim";
match
$i =~ /Jim/; yes
$i =~ /J/; yes
$i =~ /j/; no
$i =~ /j/i; yes
$i =~ /\w/; yes
$i =~ /\W/; no
Case is ignored in matching if the postfix i is used.
Regular expression examples (2)
$j = "JjJjJjJj";
$j =~ /j*/; yes: matches anything
$j =~ /j+/; yes: matches the first j
$j =~ /j?/; yes: matches the first j
$j =~ /j{2}/; no
$j =~ /j{2}/i; yes: ignores case
$j =~ /(Jj){3}/; yes
Regular expression examples (3)
$k = "Boom Boom, out go the lights!";
$k =~ /Jim|Boom/; # yes: matches Boom
$k =~ /(Boom){2}/; # no: a space between Booms
$k =~ /(Boom ){2}/; # no: fails on the comma
$k =~ /(Boom\W){2}/; # yes: \W is space, comma
$k =~ /\bBoom\b/; # yes
$k =~ /\bBoom.*the\b/; # yes
$k =~ /\Bgo\B/; # no: "go" is a complete word
$k =~ /\Bgh\B/; # yes: the "gh" inside "lights"
Regular expression substitution (1)
We can modify a string variable by applying a substitution. The operator is =~ and the substitution is written as:
s/pattern/replacement/
$v = "a string to play with";
$v =~ s/^\w+/just a single/;
print "$v\n";
just a single string to play with
Regular expression substitution (2)
Matched patterns are remembered in built-in variables $1, $2, $3 etc. These variables keep their values till the next matching operation. Each set of parentheses in a pattern corresponds to a "memory" variable.
# $v == "just a single string to play with"
$v =~ s/(\b\w*\b)(.*)/'$1'$2/;
print "$v\n";
print "$2, $1 $1\n";
'just' a single string to play with
a single string to play with, just just
Regular expression substitution (3)
A substitution can be applied to all occurrences of the pattern, that is, globally:
s/pattern/replacement/g
# $v == "'just' a single string to play with"
$v =~ s/\b\w+\b/word/g;
print "$v\n";
'word' word word word word word word
$v =~ s/\b\w*\b$/last/;
print "$v\n";
'word' word word word word word last
Regular expression substitution (4)
$v = "This is a double double word.";
$v =~ s/(\b\w+\b) \1/$1/;
print "$v\n";
This is a double word.
$v = "This is a triple triple triple word.";
$v =~ s/(\b\w+\b) \1 \1/$1/;
print "$v\n";
This is a triple word.
Parentheses as memory can help construct powerful patterns with "instant repetition". Inside the pattern, we can use \1, \2 etc. for matched substrings; in the replacement, $1, $2 etc. are preferred.
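The two substitutions above can be merged into one that handles any number of repetitions. This is a sketch, not part of the original examples; the subroutine name collapse_repeats and the third test sentence are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every run of a repeated word by a single copy of that word.
sub collapse_repeats {
    my ($text) = @_;
    # (\w+) remembers a word; ( \1\b)+ matches one or more further copies of it.
    $text =~ s/\b(\w+)( \1\b)+/$1/g;
    return $text;
}

print collapse_repeats("This is a double double word."), "\n";
# This is a double word.
print collapse_repeats("This is a triple triple triple word."), "\n";
# This is a triple word.
print collapse_repeats("the the cat sat sat sat"), "\n";
# the cat sat
```

The \b after \1 keeps a word from being treated as a repetition when it merely begins the next word, as in "is island".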
Regular expression substitution (5)
Here is a more realistic example (last year's homework). It needs explanation, which will be given in class.
$Day = '0[1-9]|[12][0-9]|3[01]|[1-9]';
$Month = '0[1-9]|1[012]|[1-9]';
# Year number up to 31 must have a leading zero or two.
$Year = '[0-9]{4}|[0-9]{3}|3[2-9]|[4-9][0-9]';
while (<>) {
    # Find all dates, selecting and reinserting the context.
    # $1 and $6 match the context. Superfluous digits,
    # as 43 and 55 in 432001-01-2255, belong in the context.
    # "Dates" such as April 31 or February 30 are allowed.
    # There are no provisions for leap years.
    s/(\D*)(($Year)-($Month)-($Day))(\D|.*$)/$1<date>$2<\/date>$6/g;
    s/(\D*)(($Day)-($Month)-($Year))(\D|.*$)/$1<date>$2<\/date>$6/g;
    print $_;
}
Regular expression substitution (6)
DATA
Both 12-09-2000 and 25-8-324 are good dates,
but 30-14-1955 and 10-10-10 are not. OTOH, 10-10-010 is.
RESULTS
Both <date>12-09-2000</date> and <date>25-8-324</date> are good dates,
but 30-14-1955 and 10-10-10 are not. OTOH, <date>10-10-010</date> is.
One example run, to show how it works.
In another course
• Predefined variables (lots!)
• More on lists, arrays and hashes
• More on regular expressions
• File management
• Directory management
• Process management
• Perl database facilities
• CGI programming
• ... and more, and much more
Mistakes that novices make (1)
Thanks to Alan Williams for this list. Adapted from Programming Perl, page 361.
1. Testing "all-at-once" instead of incrementally, either bottom-up or top-down.
2. Optimistically skipping print scaffolding to dump values and show progress.
3. Not running with the perl -w switch to catch obvious typographical errors.
4. Leaving off $ or @ or % from the front of a variable.
5. Forgetting the trailing semicolon.
6. Forgetting curly braces around a block.
Mistakes that novices make (2)
7. Unbalanced (), {}, [], "", '', ``, and sometimes <>.
8. Confusing '' and "", or / and \.
9. Using == instead of eq, != instead of ne, = instead of ==, and so on.
• ('White' == 'Black') and ($x = 5) evaluate as (0 == 0) and (5) and thus are true!
10. Using "else if" instead of "elsif".
11. Putting a comma after the file handle in a print statement.
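Mistake 9 can be observed directly. In this sketch the two comparison subroutines are hypothetical helpers, and the "isn't numeric" warning is switched off inside one of them only so that the wrong comparison runs quietly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare with ==, which converts both sides to numbers first.
sub same_as_numbers {
    my ($left, $right) = @_;
    no warnings 'numeric';      # silence "isn't numeric" for this demonstration
    return $left == $right ? 1 : 0;
}

# Compare with eq, which compares the strings character by character.
sub same_as_strings {
    my ($left, $right) = @_;
    return $left eq $right ? 1 : 0;
}

print same_as_numbers('White', 'Black'), "\n";   # 1: both strings convert to 0
print same_as_strings('White', 'Black'), "\n";   # 0
print same_as_numbers('10', '10.0'),     "\n";   # 1: the same number
print same_as_strings('10', '10.0'),     "\n";   # 0: different strings
```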
Mistakes that novices make (3)
12. Not chopping the output of backquotes `date` or not chopping input:
print "Enter y to proceed: ";
$ans = <STDIN>;
chop $ans;
if ($ans eq 'y') { print "You said y\n"; }
else { print "You did not say 'y'\n"; }
13. Forgetting that Perl array subscripts and string indexes normally start at 0, not 1.
14. Using $_, $1, or other side-effect variables, then modifying the code in a way that unknowingly affects or is affected by these.
15. Forgetting that regular expressions are greedy, seeking the longest match, not the shortest match.
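Mistakes 12 and 15 are easy to demonstrate. The sketch below uses made-up strings; chomp (used earlier in names.pl) is shown next to chop, and the non-greedy form +? is an addition that the list above does not mention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chop removes the last character whatever it is;
# chomp removes only a trailing newline.
my $with_newline    = "y\n";
my $without_newline = "y";

chop(my $a1 = $with_newline);        # "y"
chop(my $a2 = $without_newline);     # "": the y itself is lost
chomp(my $b1 = $with_newline);       # "y"
chomp(my $b2 = $without_newline);    # "y"

print "chop:  '$a1' '$a2'\n";        # chop:  'y' ''
print "chomp: '$b1' '$b2'\n";        # chomp: 'y' 'y'

# A greedy multiplier takes as much as it can; adding ? makes it take as little as it can.
my $html = "<b>bold</b> and <i>italic</i>";
my ($greedy) = $html =~ /<(.+)>/;
my ($lazy)   = $html =~ /<(.+?)>/;

print "$greedy\n";                   # b>bold</b> and <i>italic</i
print "$lazy\n";                     # b
```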