Chapter 11: Perl Scripting Off Larry’s Wall

Upload: harriet-sherman

Post on 17-Jan-2016

Page 1: Chapter 11: Perl Scripting Off Larry’s Wall. In this chapter … Background Terminology Syntax Variables Control Structures File Manipulation Regular Expressions

Chapter 11: Perl Scripting

Off Larry’s Wall

In this chapter …

• Background

• Terminology

• Syntax

• Variables

• Control Structures

• File Manipulation

• Regular Expressions

Perl

• Practical Extraction and Report Language

• Developed by Larry Wall in 1987

• Originally created for data processing and report generation

• Elements of C, AWK, sed, scripting

• Add-on modules and third party code make it a more general programming language

Features

• C-derived syntax

• Ambiguous variables & dynamic typing

• Singular and plural variables

• Informal, easy to use

• Many paradigms – procedural, functional, object-oriented

• Extensive third party modules

Features, con’t

• As elegant as you make it

• Do What I Mean intelligence

• Fast, easy, down and dirty coding

• Interpreted, not compiled

• perldoc – man pages for Perl modules

Terminology

• Module – one stand-alone piece of code

• Distribution – set of modules

• Package – a namespace for one or more distributions

• Package variable – declared in package, accessible between modules

• Lexical variable – local variable (scope)

Terminology, con’t

• Scalar – variable that contains only one value (number, string, etc.)

• Composite – variable made of one or more scalars

• List – series of one or more scalars
  – e.g. (2, 4, 'Zach')

• Array – composite variable containing a list

Invoking Perl

• perl -e 'text of perl program'

• perl perl_script

• Make the Perl script executable and you can execute the script itself
  – e.g. ./my_script.pl

• The common file extension .pl is not required

• Like other scripts, start with a #! line to specify the interpreter

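Invoking Perl, con’t

• Use perl -w to display warnings
  – Warns about likely mistakes, e.g. uninitialized values or variable names used only once
  – Instead of -w, put use warnings; in your script
    • Similar effect, but limited to the file or block that contains it

• Usually you’ll find perl in /usr/bin/perl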
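
The invocation points above can be put together in one small script. This is only a sketch: the file name my_script.pl and the variable $greeting are made up for illustration, and the #! line assumes perl lives in /usr/bin/perl.

```perl
#!/usr/bin/perl
# Hypothetical script, saved as my_script.pl
use strict;     # refuse variables that were never declared
use warnings;   # report likely mistakes, much like perl -w

my $greeting = "Hello from Perl";   # illustrative value
print "$greeting\n";
```

Run it with perl my_script.pl, or make it executable (chmod +x my_script.pl) and run ./my_script.pl.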

Syntax

• Each Perl statement ends with a semicolon (;)

• Can have multiple statements per line

• Whitespace is largely ignored
  – Except within quoted strings

• Double quotes allow interpretation of variables and special characters (like \n)

• Single quotes don’t (just like the shell)

Syntax, con’t

• Forward slash used to delimit regular expressions (e.g. /.*sh?/)

• Backslash used for escape characters
  – e.g. \n – newline, \t – tab

• Lines beginning with # are ignored as comments
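
A short sketch of the quoting and statement rules from the Syntax slides. The variable names and the value "Larry" are placeholders chosen for the example.

```perl
use strict;
use warnings;

# This whole line is a comment and is ignored.
my $name   = "Larry";             # illustrative value
my $double = "Hello, $name\n";    # double quotes: $name and \n are interpreted
my $single = 'Hello, $name\n';    # single quotes: everything is kept literally

print $double; print $single, "\n";   # two statements on one line
```

The first print shows the interpolated name; the second shows the text $name\n exactly as typed.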

Output

• Old way
  – print what_to_print;
  – Concatenate several items
    • print item_1, item_2;
  – Want a newline?
    • print what_to_print, "\n";

• New way
  – say what_to_print;
    • Automatically adds newline
    • Needs use feature 'say'; (Perl 5.10 or later)

Output, con’t

• what_to_print can be many things
  – Quoted string – "Here's some text"
  – Variables – $myvar
  – Result of a function – uc($myvar)
  – A combination
    • print "Sub Tot: $total\n", "Tax: ", $total*$tax, "\n";
    • Arithmetic is not evaluated inside quotes, so the expression goes outside them

• Want to display an error and exit?
  – die "Uh-oh!\n";

Variables

• Perl variables can be singular or plural

• Data typing done dynamically at runtime

• Three types
  – Scalar (singular)
  – Array (plural)
  – Hash a.k.a. associative array (plural)

• Variable names are case sensitive

• Can contain letters, numbers, underscore (but cannot start with a number)

Variables, con’t

• Each type of variable starts with a different special character to mark its type

• By default all variables are package in scope

• To make a variable lexical, preface its declaration with the my keyword

• A lexical variable hides a package variable of the same name

• Include use strict; to disallow use of undeclared variables

Variables, con’t

• We’ve already covered use warnings;

• Variables that were never assigned a value, if referenced, have a default value of undef
  – Equates to 0 as a number or the null string as a string
  – Can check by using the defined() function

• $. is the current line number of the input being read

• $_ is the default operand – ‘it’
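
A sketch of package versus lexical scope, undef, and $_ as described on the Variables slides. All variable names here are illustrative.

```perl
use strict;
use warnings;

our $scope = "package";       # package variable, declared with our to satisfy strict
my $before = $scope;

my $inside;
{
    my $scope = "lexical";    # lexical variable hides the package one in this block
    $inside = $scope;
}

my $unset;                    # declared but never assigned, so its value is undef
my $state = defined($unset) ? "defined" : "undef";

my @seen;
for (qw(a b c)) {             # no loop variable named, so each item lands in $_
    push(@seen, $_);
}

print "$before $inside $state @seen\n";
```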

Scalars

• Singular, holds one value, either string or number

• Must be preceded with $, e.g. $myvar

• Perl will automatically convert between strings and numbers

• Will treat the value as a number or string, whichever is appropriate in context
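
To illustrate the automatic conversion, here is a minimal sketch in which the same scalar is used once as a number and once as a string. The names are placeholders.

```perl
use strict;
use warnings;

my $count = "10";          # a string that looks like a number
my $sum   = $count + 5;    # + asks for numbers, so "10" is treated as 10
my $text  = $count . 5;    # . asks for strings, so 5 is treated as "5"

print "$sum $text\n";
```

The operator, not the variable, decides the context: + yields 15, while . yields 105.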

Arrays

• Plural, containing an ordered list of scalars

• Zero-based indexing

• Dynamic size and allocation

• Begin with @ e.g. @myarray

• @variable references entire array

• To reference a single element (which would be a scalar, right?) $variable[index]

Arrays, con’t

• $#array returns the index of the last element
  – Zero based – this means it’s one less than the size of the array

• @array[x..y] returns a ‘slice’ or sublist

• Printing arrays
  – Array enclosed in double quotes prints a space-delimited list
  – Not in quotes, all entries are concatenated
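
The indexing, slicing, and printing rules for arrays can be sketched as follows. The array contents are invented for the example.

```perl
use strict;
use warnings;

my @myarray = (2, 4, 'Zach', 8);     # illustrative contents

my $first   = $myarray[0];           # single element, so the sigil is $
my $last_ix = $#myarray;             # index of the last element
my $size    = scalar(@myarray);      # number of elements, one more than $#myarray
my @slice   = @myarray[1..2];        # slice: elements at index 1 and 2
my $spaced  = "@myarray";            # in double quotes: space-delimited

print "@myarray\n";                  # 2 4 Zach 8
print @myarray, "\n";                # 24Zach8
```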

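Arrays, con’t

• Arrays can be treated like FIFO queues
  – shift(@array) – take the first element off the front
  – push(@array, scalar) – push element on at end

• Use splice to combine arrays
  – splice(@array, offset, length, @otherarray)
  – Replaces length elements of @array, starting at offset, with the contents of @otherarray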
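
A sketch of the queue operations and splice. The array names and values are placeholders.

```perl
use strict;
use warnings;

my @queue = (1, 2, 3);
push(@queue, 4);                  # add at the end: (1, 2, 3, 4)
my $head = shift(@queue);         # remove from the front: $head is 1

my @other   = ('a', 'b');
# Replace 1 element starting at offset 1 with the contents of @other.
my @removed = splice(@queue, 1, 1, @other);

print "@queue\n";                 # 2 a b 4
```

splice returns the elements it removed, which is why @removed holds the single value 3 here.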

Hashes

• Plural, contain an array of key-value pairs

• Prefix with %, e.g. %myhash

• Keys are strings, act as indexes to the array

• Each key must be unique, returns one value

• Unordered

• Optimized for random access

• Keys don’t need quotes unless they contain spaces or other special characters

Hashes, con’t

• Element access
  – $hashvar{index} = value
    • e.g. $myvar{boat} = "tuna"; print $myvar{boat};
  – %hashvar = ( key => value, …);
    • e.g. %myvar = ( boat => "tuna", 4 => "fish");
  – Get array of keys or values
    • keys(%hashvar)
    • values(%hashvar)
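
A sketch of hash creation and access, reusing the slide's %myvar. The "two words" key is an added illustration of a key that needs quotes, and the keys are sorted only because hashes are unordered.

```perl
use strict;
use warnings;

my %myvar = ( boat => "tuna", 4 => "fish" );
$myvar{"two words"} = "quoted key";    # key with a space needs quotes

my $value = $myvar{boat};              # single value, so the sigil is $
my @keys  = sort(keys(%myvar));        # sorted for a predictable order
my $count = scalar(keys(%myvar));      # number of pairs

print "$_ => $myvar{$_}\n" for @keys;
```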

Evaluating Expressions

• Most control structures use an expression to evaluate whether they are run

• Perl uses different comparison operators for strings and numbers

• Also uses the same file operators (existence, access, etc.) that bash uses

Expressions

• Numeric operators
  – ==, !=, <, >, <=, >=
  – <=> returns 0 if equal, 1 if >, -1 if <

• String operators
  – eq, ne, lt, gt, le, ge
  – cmp does for strings what <=> does for numbers
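
Why the two operator families matter: the same operands can compare differently as numbers and as strings. A minimal sketch, with a file test on the current directory added as an assumed example of the bash-style operators.

```perl
use strict;
use warnings;

my $num_eq  = (10 == 10.0)     ? 1 : 0;   # numerically equal
my $str_eq  = ("10" eq "10.0") ? 1 : 0;   # different strings
my $num_cmp = (9 <=> 10);                 # 9 is less than 10
my $str_cmp = ("9" cmp "10");             # as strings, "9" sorts after "1..."
my $is_dir  = (-d ".")         ? 1 : 0;   # file test: is . a directory?

print "$num_eq $str_eq $num_cmp $str_cmp $is_dir\n";
```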

Control Structures

• if (expr) {…}

• unless (expr) {…}

• if (expr) {…} else {…}

• if (expr) {…} elsif (expr) {…} … else {…}

• while (expr) {…}

• until (expr) {…}

Control Structures, con’t

• for and foreach are interchangeable

• Syntax 1
  – Similar to bash for…in structure
  – foreach [var] (list) {…}
  – If var is omitted, $_ is assumed
  – For each loop iteration, the next value from list is populated in var

Control Structures, con’t

• for/foreach Syntax 2
  – Similar to C’s for loop
  – foreach (expr1; expr2; expr3) {…}
  – expr1 sets initial condition
  – expr2 is the terminal condition, tested before each iteration; the loop ends when it is false
  – expr3 is the incrementor

Control Structures, con’t

• Short-circuiting loops
  – Use last to break out of loop altogether
    • Same as bash’s break
  – Use next to skip to the next iteration of the loop
    • Same as bash’s continue
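
The control structures from the last few slides, combined into one sketch. The loop bounds, variable names, and labels are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# Syntax 1: walk a list, using next and last to short-circuit.
my @evens;
foreach my $n (1..10) {
    next if $n % 2;        # odd number: skip to the next iteration
    last if $n > 6;        # past 6: leave the loop altogether
    push(@evens, $n);
}

# Syntax 2: C-style loop with initial condition, test, and incrementor.
my $sum = 0;
for (my $i = 1; $i <= 3; $i++) {
    $sum += $i;
}

my $size;
if    ($sum > 10) { $size = "big"; }
elsif ($sum == 6) { $size = "six"; }
else              { $size = "small"; }

my $countdown = 3;
until ($countdown == 0) {  # runs while the expression is false
    $countdown--;
}

print "@evens $sum $size $countdown\n";
```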

Handles

• A handle is essentially a variable linked to a file or process

• Perl automatically opens handles for the default streams
  – STDIN, STDOUT, STDERR

• You can open additional handles
  – To a file for input/output/appending
  – To a process for input/output

Handles, con’t

• Basic syntax
  – open(handle, ['mode'], "ref");
  – handle is a variable to reference the handle
  – mode can be many things
    • Simple cases: <, >, >>, |
    • Input (<) implied if omitted
  – ref is what to open – file or process
  – mode and ref can be combined as one string

Handles, con’t

• Once open, access via handle variable

• Output
  – print handle "what to print"

• Input
  – $var = <handle> gets one line of input
  – Use <handle> as a loop condition to read input one line at a time, populating $_

Handles, con’t

• <> – magic handle, reads from the files named as command line arguments to the script, or from STDIN if there are none

• Line of input contains EOL character
  – Use chomp($var) to remove it
  – Use chop($var) to remove the last character

• When done, close(handle);
  – Housekeeping, good coding practice
  – Perl actually closes all open handles for you when the program exits

Handles, con’t

• Examples
  – open(my $INPUT, "/path/to/file");
  – open(my $ERRLOG, ">>/var/log/errors");
  – open(my $SORT, "| sort -n");
  – open(my $ALIST, "grep '^[Aa]' /usr/share/dict/words |");
  – while (<$INPUT>) { print $ERRLOG $_; }
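
A runnable sketch of writing, appending, and reading through handles. It uses a temporary file from the core File::Temp module in place of a real path, passes the mode as a separate argument, and checks each open with die; these choices, and the handle names, are assumptions for the example rather than part of the slides.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file that stands in for a real path; removed when the script exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

open(my $OUT, '>', $path) or die "Cannot write $path: $!\n";
print $OUT "first line\n", "second line\n";
close($OUT);

open(my $LOG, '>>', $path) or die "Cannot append to $path: $!\n";
print $LOG "third line\n";
close($LOG);

open(my $INPUT, '<', $path) or die "Cannot read $path: $!\n";
my @lines;
while (<$INPUT>) {             # each line is read into $_
    chomp;                     # strip the EOL character from $_
    push(@lines, "$.: $_");    # $. is the current input line number
}
close($INPUT);

print "$_\n" for @lines;
```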

Regular Expressions

• Recall Appendix A

• Perl has a few unique features and caveats

• Regular Expressions (RE) delimited by forward slash

• Perl uses the =~ operator for RE matching
  – Ex. if ($myvar =~ /^T/) { … } # if myvar starts w/ T

• To negate RE matching use !~ operator

RE, con’t

• =~ operator can also be used to do replacement
  – Ex. $result =~ s/old/new/;
  – First ‘old’ in $result replaced with ‘new’ if matched

• Remember, RE (esp. in Perl) are greedy
  – Will match longest possible match

• Bracketed (grouped) expressions don’t need backslash escapes, just use plain parentheses
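
A sketch of matching, negated matching, substitution, greediness, and grouping. All sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $myvar = "Tux the penguin";
my $starts_with_T = ($myvar =~ /^T/) ? 1 : 0;    # match
my $has_no_digit  = ($myvar !~ /\d/) ? 1 : 0;    # negated match

my $result = "old habits, old hats";
$result =~ s/old/new/;                           # replaces the first match only

# Greedy: .* runs from the first < to the last >, not to the nearest one.
my ($greedy) = "<b>bold</b>" =~ /(<.*>)/;

# Plain parentheses group and capture; no backslashes are needed.
my ($key, $val) = "name=Larry" =~ /^(\w+)=(\w+)$/;

print "$result | $greedy | $key $val\n";
```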