karthik sangaiah. developed by larry wall ◦ “there’s more than one way to do it” ◦...

Karthik Sangaiah

Upload: nancy-york

Post on 20-Jan-2016

Developed by Larry Wall
◦ “There’s more than one way to do it”
◦ “Easy things should be easy and hard things should be possible”
Perl's main purpose was text manipulation
Regular expressions are fundamental to text processing

A regular expression (regex) is a string that describes a pattern
The simplest regex is a word
A regex consisting of a word matches any string that contains that word
Ex:
◦ "Hello World" =~ /World/
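A minimal runnable sketch of the word match above. The sample strings are illustrative assumptions; the third one shows that matching is case-sensitive by default.

```perl
use strict;
use warnings;

# Each string is tested against the one-word regex /World/.
# A match only needs "World" to appear somewhere in the string.
my @samples = ("Hello World", "World peace", "hello world");
my @matched;
for my $s (@samples) {
    if ($s =~ /World/) {
        push @matched, $s;
        print "'$s' matches /World/\n";
    } else {
        # "hello world" fails: matching is case-sensitive by default
        print "'$s' does not match /World/\n";
    }
}
```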

The “=~” operator produces TRUE if the regex matches the string
Ex:
    if ("Sample Words" =~ /Sample/) {
        print "It matches\n";
    } else {
        print "It doesn't match\n";
    }
The “!~” operator produces TRUE if the regex does NOT match the string
Ex:
    if ("Sample Words" !~ /Sample/) {
        print "It doesn't match\n";
    } else {
        print "It matches\n";
    }
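A small sketch showing that, for the same pattern, “!~” is simply the negation of “=~”. The list of lines is a made-up example; grep splits it into the strings that do and do not contain "Sample".

```perl
use strict;
use warnings;

# For the same pattern, !~ is true exactly when =~ is false.
my @lines = ("Sample Words", "Other Words", "Sampled text");
my @with    = grep { $_ =~ /Sample/ } @lines;
my @without = grep { $_ !~ /Sample/ } @lines;

print "contain 'Sample': ", join(", ", @with), "\n";
print "do not contain 'Sample': ", join(", ", @without), "\n";
```

Every line lands in exactly one of the two lists, which is a quick way to confirm the two operators are complements.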

A variable can be used as a regex
Ex:
    $temp = "ls";
    "ls -l" =~ /$temp/;
If the target is the default variable “$_”, “$_ =~” can be omitted
Ex:
    $_ = "ls -l";
    if (/ls/) {
        print "It matches\n";
    } else {
        print "It doesn't match\n";
    }
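The sketch below combines both ideas in a loop; the command list is a hypothetical example. A foreach loop without an explicit loop variable sets $_ to each element in turn, so the bare /$temp/ match applies to it.

```perl
use strict;
use warnings;

my $temp = "ls";                          # the pattern lives in a variable
my @commands = ("ls -l", "cd /tmp", "ls -a");
my $count = 0;

for (@commands) {                         # foreach aliases each element to $_
    if (/$temp/) {                        # shorthand for $_ =~ /$temp/
        print "matches: $_\n";
        $count++;
    }
}
print "$count of ", scalar(@commands), " commands contain '$temp'\n";
```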

Regexes in Perl are mostly treated as double-quoted strings
Values of variables in a regex are substituted in before the regex is evaluated for matching
Ex:
    $foo = 'vision';
    'television' =~ /tele$foo/;   # pattern becomes /television/, so it matches
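A small sketch of what double-quote-style interpolation means in practice. The second half goes slightly beyond the slide: it uses Perl's \Q...\E quoting escape to show that metacharacters inside a variable are interpreted as regex syntax unless they are quoted. The variable names and strings are illustrative.

```perl
use strict;
use warnings;

my $foo = 'vision';
# The pattern is built as "tele" . $foo, i.e. /television/, before matching.
my $interpolated = ('television' =~ /tele$foo/) ? 1 : 0;

# Because interpolation happens first, metacharacters inside the
# variable keep their special meaning in the pattern.
my $dot    = 'a.c';
my $plain  = ('abc' =~ /$dot/)     ? 1 : 0;   # 1: '.' matches the 'b'
my $quoted = ('abc' =~ /\Q$dot\E/) ? 1 : 0;   # 0: \Q...\E makes '.' literal

print "interpolated=$interpolated plain=$plain quoted=$quoted\n";
```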

The default “/ /” delimiters can be changed to arbitrary delimiters by writing the match operator “m” explicitly
Ex:
    "Sample Text" =~ m!Text!;
    "Sample Text" =~ m{Text};
    "Sample Text" =~ m"Text";
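One practical reason to switch delimiters is matching file paths. This sketch (the path is only an example) checks the same prefix three ways; all three patterns are equivalent, but only the first needs its slashes escaped.

```perl
use strict;
use warnings;

my $path = "/usr/bin/perl";

# With the default delimiter every "/" in the pattern must be escaped...
my $slashes = ($path =~ /\/usr\/bin/) ? 1 : 0;
# ...while other delimiters let "/" appear unescaped.
my $braces  = ($path =~ m{/usr/bin})  ? 1 : 0;
my $bangs   = ($path =~ m!/usr/bin!)  ? 1 : 0;

print "slashes=$slashes braces=$braces bangs=$bangs\n";   # all three are 1
```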

Metacharacters are reserved for use in regex notation
◦ { }, [ ], ( ), ^, $, ., |, *, +, ?, \
To match a metacharacter literally, put a “\” before it in the regex
Ex:
    "5*2=10" =~ /5\*2/;
    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;
“/” also needs to be backslashed if it is used as the delimiter
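The sketch below contrasts escaped and unescaped metacharacters; the test strings are illustrative. Without the backslash, * acts as a quantifier meaning "zero or more of the preceding character", and . matches any single character.

```perl
use strict;
use warnings;

# Escaped: "\*" is a literal asterisk, so "5*2" must appear verbatim.
my $escaped_hit  = ("5*2=10" =~ /5\*2/) ? 1 : 0;   # 1
my $escaped_miss = ("52"     =~ /5\*2/) ? 1 : 0;   # 0: no asterisk in "52"
# Unescaped: "*" means "zero or more of the previous character".
my $unescaped    = ("52"     =~ /5*2/)  ? 1 : 0;   # 1

# The same applies to ".": escaped, it matches only a literal dot.
my $dot_literal  = ("3x14"   =~ /3\.14/) ? 1 : 0;  # 0
my $dot_any      = ("3x14"   =~ /3.14/)  ? 1 : 0;  # 1

print "escaped_hit=$escaped_hit escaped_miss=$escaped_miss unescaped=$unescaped\n";
print "dot_literal=$dot_literal dot_any=$dot_any\n";
```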

“^” matches at the beginning of the string
“$” matches at the end of the string or before a newline at the end of the string
Ex:
    "television" =~ /^tele/;
    "television" =~ /vision$/;
When both “^” and “$” are used, the regex has to match from the beginning to the end of the string (i.e., match the whole string)
Ex:
    "vision" =~ /^vision$/;
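A sketch that runs the three anchored patterns above against a few words chosen for illustration, recording 1 or 0 per pattern in the order ^tele, vision$, ^vision$. The last word, "vision\n", shows that “$” also matches just before a trailing newline.

```perl
use strict;
use warnings;

my @words = ("television", "vision", "revisionist", "vision\n");
my %result;

for my $w (@words) {
    # One digit per pattern: /^tele/, /vision$/, /^vision$/
    $result{$w} = join "",
        ($w =~ /^tele/    ? 1 : 0),
        ($w =~ /vision$/  ? 1 : 0),
        ($w =~ /^vision$/ ? 1 : 0);
    (my $shown = $w) =~ s/\n/\\n/;   # show the trailing newline as \n
    print "$shown => $result{$w}\n";
}
```

Note that "revisionist" contains "vision" but matches none of the anchored patterns.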

A character class allows a set of possible characters, rather than a single character, to match at a particular point in the regex
Character classes are denoted by […], with the set of characters to be matched inside
Ex:
    /[btc]all/;             # matches ball, tall, or call
    /word[0123456789]/;     # matches word0 … word9
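A minimal sketch applying both character classes to a hypothetical word list. Each class matches exactly one character, so "wall" and "wordx" are rejected.

```perl
use strict;
use warnings;

my @candidates = qw(ball tall call wall word0 word7 wordx);

# [btc] matches exactly one character: b, t, or c.
my @all_hits  = grep { /[btc]all/ } @candidates;
# Any single digit after "word".
my @word_hits = grep { /word[0123456789]/ } @candidates;

print "[btc]all: @all_hits\n";
print "word[0123456789]: @word_hits\n";
```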

Special characters in a character class are handled with a backslash as well
Special characters within a character class:
◦ “-”, “]”, “\”, “^”, “$” (and the pattern delimiter)
Ex:
    /[\$c]w/;       # matches $w or cw
    $x = 'btc';
    /[$x]all/;      # matches ball, tall, or call
    /[\$x]all/;     # matches $all or xall
    /[\\$x]all/;    # matches \all, ball, tall, or call
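The three interpolation cases above can be checked directly. This sketch runs each pattern over a small list of test strings (an assumption for illustration); single-quoted Perl strings are used so that '$all' and '\\all' (which is \all) stay literal.

```perl
use strict;
use warnings;

my $x = 'btc';
my @strings = ('ball', 'tall', 'call', '$all', 'xall', '\\all');

my @plain   = grep { /[$x]all/ }   @strings;   # $x interpolated: [btc]all
my @escaped = grep { /[\$x]all/ }  @strings;   # literal '$' or 'x'
my @both    = grep { /[\\$x]all/ } @strings;   # literal '\' plus b, t, c

print "plain:   @plain\n";
print "escaped: @escaped\n";
print "both:    @both\n";
```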

Special char. “-” is used as a range operator
Ex:
    /word[0-9]/;        # matches word0 … word9
    /word[0-9a-z]/;     # matches word0 … word9, or worda … wordz
Special char. “^” in the first position of a character class denotes a negated character class
Ex:
    /[^0-9]/;           # matches a non-numeric character
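A short sketch of ranges and negation, using made-up inputs. The range class is case-sensitive, so "wordQ" falls outside [0-9a-z]; the negated class finds any string containing at least one non-digit, including a space.

```perl
use strict;
use warnings;

# Range: one character from 0-9 or a-z after "word" (uppercase is not in the range).
my @range_hits = grep { /word[0-9a-z]/ } qw(word3 wordq wordQ);

# Negated class: the string contains at least one character that is not a digit.
my @non_numeric = grep { /[^0-9]/ } ("123", "12a", "999", "4 5");

print "range: @range_hits\n";
print "non-numeric: ", join(", ", @non_numeric), "\n";
```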

Common character class abbreviations:
◦ \d – digit, [0-9]
◦ \s – whitespace character, [\ \t\r\n\f]
◦ \w – word character (alphanumeric or _), [0-9a-zA-Z_]
◦ \D – negated \d
◦ \S – negated \s
◦ \W – negated \w
◦ . – any character but “\n”
Abbreviations can be used both inside and outside character classes
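A minimal sketch using hypothetical ID strings: \d appears outside a class in the first pattern and inside a negated class in the second, and \s picks out the string containing whitespace.

```perl
use strict;
use warnings;

my @ids = ("555-1234", "555 1234", "55a-1234");

# Outside a class: \d stands for one digit, so this is "3 digits, dash, 4 digits".
my @phone_like = grep { /^\d\d\d-\d\d\d\d$/ } @ids;

# Inside a class: [^\d-] is "any character that is neither a digit nor a dash".
my @bad_chars = grep { /[^\d-]/ } @ids;

# \s finds the string that contains whitespace.
my @with_space = grep { /\s/ } @ids;

print "phone-like: @phone_like\n";
print "bad characters: ", join(", ", @bad_chars), "\n";
print "with space: @with_space\n";
```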

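“\b” matches a boundary between a word character and a non-word character; the start and end of the string are treated as non-word positions
Ex:
    $x = "Exam1 Question from Sample Exam";
    $x =~ /Exam/;        # matches Exam in Exam1
    $x =~ /\bExam/;      # also matches Exam in Exam1: there is a boundary at the start of the string
    $x =~ /\bExam\b/;    # matches Exam at the end of the string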
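To see where each pattern lands, this sketch records the start offset of each match using Perl's @- array ($-[0] is the offset where the last successful match began). That array goes slightly beyond the slide and is used here only to make the positions visible.

```perl
use strict;
use warnings;

my $x = "Exam1 Question from Sample Exam";
my %offset;

for my $pair (['Exam', qr/Exam/], ['\bExam', qr/\bExam/], ['\bExam\b', qr/\bExam\b/]) {
    my ($label, $re) = @$pair;
    # $-[0] holds the start offset of the last successful match.
    $offset{$label} = ($x =~ $re) ? $-[0] : -1;
    print "/$label/ matches at offset $offset{$label}\n";
}
```

The first two patterns both match at offset 0; only /\bExam\b/ skips "Exam1" (no boundary between "m" and "1") and matches the final word at offset 27.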

Often, we want to match against lines and ignore newline characters
Sometimes we need to keep track of newlines
◦ //s – single-line matching
◦ //m – multi-line matching
These modifiers affect two aspects of how the regex is interpreted:
◦ How the “.” character class is defined
◦ Where the anchors ^ and $ are able to match

No modifier (//) – default
◦ . matches any character but \n
◦ ^ matches at the beginning of the string
◦ $ matches at the end of the string or before a newline at the end of the string
String as a single long line (//s)
◦ . matches any character
◦ ^ matches at the beginning of the string
◦ $ matches at the end of the string or before a newline at the end of the string

String as multiple lines (//m)
◦ . matches any character but \n
◦ ^ matches at the beginning of any line within the string
◦ $ matches at the end of any line within the string
String as a single long line, but detect multiple lines (//sm)
◦ . matches any character
◦ ^ matches at the beginning of any line within the string
◦ $ matches at the end of any line within the string

    $x = "You will know how to use Perl\nFor text processing\n";
    $x =~ /^For/;      # no match, "For" not at start of string
    $x =~ /^For/s;     # no match, "For" not at start of string
    $x =~ /^For/m;     # match, "For" at start of second line
    $x =~ /^For/sm;    # match, "For" at start of second line
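The example above only exercises “^”. The sketch below reuses the same string to check the other two effects listed earlier: whether “.” can match the newline (//s) and where “$” can match (//m). The patterns /Perl.For/ and /Perl$/ are illustrative choices.

```perl
use strict;
use warnings;

my $x = "You will know how to use Perl\nFor text processing\n";

# "." and newlines: without //s, "." refuses to match "\n".
my $dot_default = ($x =~ /Perl.For/)  ? 1 : 0;   # 0
my $dot_s       = ($x =~ /Perl.For/s) ? 1 : 0;   # 1

# "$" and inner line ends: without //m, "$" only matches at the very end
# (or before the final newline), so "Perl" at the end of line 1 is missed.
my $end_default = ($x =~ /Perl$/)  ? 1 : 0;      # 0
my $end_m       = ($x =~ /Perl$/m) ? 1 : 0;      # 1

print "dot: default=$dot_default s=$dot_s; end: default=$end_default m=$end_m\n";
```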

Alternation metacharacter “|”
◦ Used to match different possible words or character strings
◦ word1 or word2 -> /word1|word2/;
Perl tries to match the regex at the earliest possible point in the string; at that point, the first alternative that matches is used
Ex.
    "shoes and strings" =~ /shoes|strings|and/;   # matches "shoes"
    "shoes" =~ /s|sh|sho|shoes/;                  # matches "s"
    "shoes" =~ /shoes|sho|s/;                     # matches "shoes"
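A sketch that records what each alternation actually matched, using Perl's $& variable (the text of the last successful match). The fourth case is an added illustration: because the earliest position wins, "strings" is found before "shoes" even though shoes is listed first in the alternation.

```perl
use strict;
use warnings;

my @cases = (
    ["shoes and strings", qr/shoes|strings|and/],
    ["shoes",             qr/s|sh|sho|shoes/],
    ["shoes",             qr/shoes|sho|s/],
    ["strings and shoes", qr/shoes|strings|and/],
);

my @matched;
for my $case (@cases) {
    my ($str, $re) = @$case;
    if ($str =~ $re) {
        my $m = $&;               # $& holds the text of the last successful match
        push @matched, $m;
        print "'$str' matched '$m'\n";
    }
}
```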
