REGULAR EXPRESSION IN PERL (PART 1) Thach Nguyen

Upload: berniece-clark

Post on 17-Jan-2016




OBJECTIVE


WHAT IS A REGULAR EXPRESSION (REGEX, REGEXP)?

A big factor behind the fame of Perl

A string that describes a pattern

Examples of patterns: a search engine finding web pages (Google); listing files in a directory (ls *.txt, dir *.*); searching, extracting parts of strings, and search and replace (Microsoft Word)

Efficient and flexible for manipulating text

Not as difficult to understand as its reputation suggests

Constructed from simple concepts (conditionals, loops)

Once you get used to their terse notation, you're good to go


HOW TO USE REGEX

Part 1: basics (solves about 98% of your needs): simple word matching, using character classes, matching this or that

Part 2: power tools (for the rest): advanced regex operators, latest innovations


PART1: THE BASICS

Simple word matching: the simplest regex is just a word (a string of characters), and it matches any string that contains that word

E.g.:

Result: it matches
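A minimal sketch of simple word matching, assuming Perl 5 with `strict` and `warnings`. The sample string and the variable names `$text` and `$matched` are illustrative, not taken from the slides.

```perl
use strict;
use warnings;

# The regex /World/ matches any string that contains the substring "World".
my $text    = "Hello World";
my $matched = ($text =~ /World/) ? 1 : 0;

print $matched ? "It matches\n" : "It doesn't match\n";
```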


PART1: THE BASICS

Simple word matching: operators

=~ : returns true if the regex matches; !~ : returns true if it does not match

/ … / : delimiters enclosing the pattern (a literal string or a variable holding one) to search for. E.g.: $greeting = "World";

if ("Hello World" =~ /$greeting/) { … } Other arbitrary delimiters can be used with a leading m:
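A short sketch of the two match operators and of alternative delimiters, assuming Perl 5. The slide does not list which delimiters it had in mind; `m{…}` and `m!…!` are shown here only as two common choices, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $greeting = "World";

# =~ is true when the pattern matches; the pattern is interpolated from $greeting.
my $found = ("Hello World" =~ /$greeting/) ? 1 : 0;

# !~ is true when the pattern does NOT match.
my $absent = ("Hello World" !~ /Perl/) ? 1 : 0;

# With a leading m, other delimiters can replace the slashes.
# m{...} is convenient when the pattern itself contains '/'.
my $path_ok = ("/usr/bin/perl" =~ m{/usr/bin}) ? 1 : 0;
my $bang_ok = ("Hello World" =~ m!World!) ? 1 : 0;

print "found=$found absent=$absent path_ok=$path_ok bang_ok=$bang_ok\n";
```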


PART1: THE BASICS

Simple word matching – Additional: you can match against the default variable $_ and omit the "$_ =~" part. E.g.: $_ = "Hello World"; if (/World/) { … }

If a regex could match in more than one place, the earliest (leftmost) match is used. E.g.: "Hello World" =~ /o/; # matches 'o' in 'Hello'
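A sketch of both points, assuming Perl 5. To show *where* the match happened it reads the built-in array `@-`, whose element `$-[0]` is the offset at which the whole match started; this array is standard Perl but is not mentioned on the slides.

```perl
use strict;
use warnings;

# Matching against the default variable $_: the "$_ =~" part is omitted.
$_ = "Hello World";
my $default_ok = /World/ ? 1 : 0;

# When a pattern could match in several places, the leftmost match wins.
my $where;
if ("Hello World" =~ /o/) {
    $where = $-[0];    # offset of the 'o' in "Hello", not the one in "World"
}

print "default_ok=$default_ok where=$where\n";
```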


PART1: THE BASICS

Simple word matching – Special characters: the metacharacters {}[]()^$.|*+?\ have special meanings

Precede a metacharacter with a backslash \ to match it literally

Escape sequences: ASCII control characters (\n, \t, etc.) and arbitrary bytes (octal, hexadecimal)

Variables: interpolated into the pattern before matching. E.g.: $foo = 'house';

'cathouse' =~ /cat$foo/; # matches
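A sketch of escaping, escape sequences, and interpolation, assuming Perl 5. The sample strings are illustrative; only the hexadecimal form of a byte escape is shown (`\x41` is the byte 0x41, the letter 'A').

```perl
use strict;
use warnings;

# Metacharacters must be backslash-escaped to match themselves.
my $plus  = ("2+2=4" =~ /2\+2/)    ? 1 : 0;   # \+ is a literal '+'
my $slash = ("a/b"   =~ /a\/b/)    ? 1 : 0;   # \/ is a literal '/' inside /.../
my $dot   = ("a.b"   =~ /^a\.b$/)  ? 1 : 0;   # \. is a literal '.'
my $dotx  = ("axb"   =~ /^a\.b$/)  ? 1 : 0;   # 'x' is not a literal '.', so no match

# Escape sequences: \t is a tab, \x41 is the byte 0x41 ('A').
my $tab = ("col1\tcol2" =~ /1\tc/) ? 1 : 0;
my $hex = ("ABC" =~ /^\x41/)      ? 1 : 0;

# Variables are interpolated into the pattern before matching.
my $foo = 'house';
my $cat = ('cathouse' =~ /cat$foo/) ? 1 : 0;

print "plus=$plus slash=$slash dot=$dot dotx=$dotx tab=$tab hex=$hex cat=$cat\n";
```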


PART1: THE BASICS

Simple word matching – Special characters: the anchor metacharacters ^ and $ match the beginning and the end of the string ($ also matches just before a newline at the end of the string)

Overall, this only scratches the surface of regex technology
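A sketch of the two anchors, assuming Perl 5; "housekeeper" is just an illustrative test word.

```perl
use strict;
use warnings;

my $start = ("housekeeper"   =~ /^keeper/)  ? 1 : 0;  # "keeper" is not at the start
my $end   = ("housekeeper"   =~ /keeper$/)  ? 1 : 0;  # "keeper" is at the end
my $nl    = ("housekeeper\n" =~ /keeper$/)  ? 1 : 0;  # $ also matches before a final "\n"
my $whole = ("keeper"        =~ /^keeper$/) ? 1 : 0;  # both anchors: the whole string

print "start=$start end=$end nl=$nl whole=$whole\n";
```

Using both anchors, as in `/^keeper$/`, forces the pattern to match the entire string rather than any substring of it.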


PART1: THE BASICS

Using character classes:

A character class is a set of possible characters, denoted by brackets [ … ]; it matches any single character from the set at a particular point in the regex

E.g.: /item[0123456789]/; # matches 'item0' or ... or 'item9'

"abc" =~ /[cab]/; # matches 'a'

To match 'yes' case-insensitively (yes, Yes, YES): /[yY][eE][sS]/ or /yes/i (i is the case-insensitive modifier of the match operation)
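A sketch of the character-class examples above, assuming Perl 5. The word lists are illustrative; the capture group in `/([cab])/` is added only so the matched character can be inspected.

```perl
use strict;
use warnings;

# /item[0123456789]/ matches "item" followed by any one digit.
my @items = grep { /item[0123456789]/ } qw(item0 item7 itemX);

# A class matches ONE character from the set; the leftmost such character wins.
my ($first) = "abc" =~ /([cab])/;

# Case-insensitive "yes": explicit classes vs. the /i modifier.
my @words    = qw(yes Yes YES yEs no);
my @by_class = grep { /[yY][eE][sS]/ } @words;
my @by_flag  = grep { /yes/i } @words;

print "@items | $first | @by_class | @by_flag\n";
```

Both approaches accept the same four spellings here; `/i` is shorter and also covers every other mix of upper and lower case.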


PART1: THE BASICS

Using character classes – Special characters:

The special characters inside a class are -]\^$; each needs a backslash to be matched literally

] The end of a character class

$ Scalar variable (interpolated)

\ Escape sequences

- Range operator within a character class (e.g. [a-z])

^ Negated character class (when it is the first character after [)
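A sketch of each special character inside a class, assuming Perl 5. The variable `$chars` and the test characters are illustrative.

```perl
use strict;
use warnings;

# '-' between two characters forms a range.
my $lower = ("q" =~ /^[a-z]$/) ? 1 : 0;

# '^' as the first character negates the class.
my $not_vowel = ("x" =~ /^[^aeiou]$/) ? 1 : 0;

# Escaped ']' and '-' are matched literally.
my $bracket = ("]" =~ /^[\]\-]$/) ? 1 : 0;
my $dash    = ("-" =~ /^[\]\-]$/) ? 1 : 0;

# An unescaped $name inside a class is interpolated as a variable...
my $chars  = "xyz";
my $interp = ("y" =~ /^[$chars]$/) ? 1 : 0;

# ...so a literal '$' has to be escaped.
my $dollar = ('$' =~ /^[\$]$/) ? 1 : 0;

print "lower=$lower not_vowel=$not_vowel bracket=$bracket dash=$dash interp=$interp dollar=$dollar\n";
```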


PART1: THE BASICS

Using character classes – Special characters:

Several abbreviations for common character classes:

\d a digit, represents [0-9]

\s a whitespace character, represents [\ \t\r\n\f]

\w a word character, represents (in ASCII) [0-9a-zA-Z_]

\D negated \d

\S negated \s

\W negated \w

. any character but "\n"

\b matches a boundary between a word character and a non-word character: \w\W or \W\w
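A sketch of the abbreviations, assuming Perl 5. The sample strings are illustrative; `/(\w+)/g` in list context returns every run of word characters.

```perl
use strict;
use warnings;

# \d: a date-like string of digits separated by '-'.
my $date = ("2016-01-17" =~ /^\d\d\d\d-\d\d-\d\d$/) ? 1 : 0;

# \w: collect every run of word characters; spaces, ',' and tab are skipped.
my @words = ("hi there,\tfriend" =~ /(\w+)/g);

# \s: any whitespace character.
my $has_space = ("a b" =~ /\s/) ? 1 : 0;

# \b: "cat" only as a whole word.
my $inside = ("concatenate" =~ /\bcat\b/) ? 1 : 0;  # "cat" is inside a word
my $alone  = ("the cat sat" =~ /\bcat\b/) ? 1 : 0;

# '.': any character except "\n".
my $dot_nl = ("a\nb" =~ /a.b/) ? 1 : 0;

print "date=$date words=@words space=$has_space inside=$inside alone=$alone dot_nl=$dot_nl\n";
```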


PART1: THE BASICS

Issues:

Why does '.' match everything but "\n"? Often we want to ignore newline characters when counting and matching within a line. To keep track of newlines, use the anchors ^ and $ together with the modifiers /…/s (single line) and /…/m (multiple lines):

No modifier //: '.' matches any character except "\n"; ^ and $ match only at the beginning and end of the string ($ also just before a final newline)

s modifier //s: treat the string as a single long line; '.' matches any character, including "\n"; ^ and $ match only at the beginning and end of the string ($ also just before a final newline)

m modifier //m: treat the string as a set of multiple lines; '.' matches any character except "\n"; ^ and $ match at the start or end of any line in the string

Both //sm: treat the string as a single line, but detect multiple lines; '.' matches any character; ^ and $ match at the start and end of any line within the string
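A sketch that runs two patterns under each modifier combination, assuming Perl 5. The two-line sample string is illustrative: `/^Who/` tests whether `^` can match after an embedded newline, and `/girl.Who/` tests whether `.` can match that newline.

```perl
use strict;
use warnings;

my $x = "There once was a girl\nWho programmed in Perl\n";

# Does ^ match at the start of the second line?
my $caret_none = ($x =~ /^Who/)   ? 1 : 0;  # no: ^ is start of string only
my $caret_s    = ($x =~ /^Who/s)  ? 1 : 0;  # no: /s does not change ^
my $caret_m    = ($x =~ /^Who/m)  ? 1 : 0;  # yes: /m makes ^ match after "\n"
my $caret_sm   = ($x =~ /^Who/sm) ? 1 : 0;  # yes

# Does '.' match the embedded "\n"?
my $dot_none = ($x =~ /girl.Who/)   ? 1 : 0;  # no
my $dot_s    = ($x =~ /girl.Who/s)  ? 1 : 0;  # yes: /s lets '.' match "\n"
my $dot_m    = ($x =~ /girl.Who/m)  ? 1 : 0;  # no: /m does not change '.'
my $dot_sm   = ($x =~ /girl.Who/sm) ? 1 : 0;  # yes

print "^: none=$caret_none s=$caret_s m=$caret_m sm=$caret_sm\n";
print ".: none=$dot_none s=$dot_s m=$dot_m sm=$dot_sm\n";
```

In short, /s affects only '.', and /m affects only ^ and $, which is why combining them gives both behaviours.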


PART1: THE BASICS

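Matching this or that: to match one of several possible words or strings, use the alternation metacharacter |. The regex tries to match at the earliest possible position in the string; at that position, the first alternative that matches wins. E.g.:

"cats and dogs" =~ /dog|cat|bird/; # matches "cat"

"cats" =~ /cats|cat|ca|c/; # matches "cats"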
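A sketch of how alternation picks its match, assuming Perl 5. The parentheses are added only to capture the matched text; the reversed pattern `/(c|ca|cat|cats)/` is an illustrative variant showing that the order of the alternatives matters.

```perl
use strict;
use warnings;

# "cat" occurs earlier in the string than "dog", so it wins.
my ($pet) = "cats and dogs" =~ /(dog|cat|bird)/;

# At the same position, the first alternative that matches is chosen.
my ($long)  = "cats" =~ /(cats|cat|ca|c)/;
my ($short) = "cats" =~ /(c|ca|cat|cats)/;

print "pet=$pet long=$long short=$short\n";
```

If the longest match is wanted, list the longer alternatives first.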


QUESTION