re review (perl syntax)

Fall 2006 CSE 467/567 1

RE review (Perl syntax)

• single-character disjunction: [aeiou]
• ranges: [0-9]
• negation: [^aeiou]
• conjunction: /cat/
• matching zero or one: /cats?/
• Kleene * and +: /[ab]+/ matches ‘a’, ‘b’, ‘aa’, ‘ab’, ‘ba’, ‘bb’, etc.
• wildcard: /c.t/ matches “cat”, “cbt”, “cct”, …
• anchors: ^, $, \b, \B
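The operators reviewed above can be tried out in a few lines of Perl. This is only a minimal sketch: the helper name `matches` and all of the test strings are illustrative assumptions, not part of the course materials.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

print matches(qr/[aeiou]/, "sky"), "\n";          # character class: is there a lowercase vowel?
print matches(qr/[0-9]/, "room 12"), "\n";        # range: is there a digit?
print matches(qr/^[^aeiou]/, "tree"), "\n";       # negation: first character is not a lowercase vowel
print matches(qr/cats?/, "cat"), "\n";            # zero or one: the 's' is optional
print matches(qr/^[ab]+$/, "abba"), "\n";         # Kleene +: one or more of 'a' or 'b', nothing else
print matches(qr/c.t/, "cut"), "\n";              # wildcard: '.' stands for any single character
print matches(qr/\bcat\b/, "concatenate"), "\n";  # \b: 'cat' must stand alone as a word
```

Each line prints 1 for a match and 0 otherwise, so the script doubles as a quick self-check when experimenting with the patterns.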

/projects/CSE467/Resources/Code/Perl

Conjunction

Two regular expressions are conjoined by juxtaposition (placing the expressions side by side).

Examples:

/a/ matches ‘a’

/m/ matches ‘m’

/am/ matches ‘am’ but not ‘a’ or ‘m’ alone
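As a quick check of conjunction, the sketch below tests /am/ against a few strings. The test strings (including "ham") and the hash name `%hit` are illustrative assumptions; note that an unanchored pattern matches anywhere inside the string, so "ham" matches too.

```perl
use strict;
use warnings;

# Test strings are illustrative, not from the slides.
my %hit;
for my $s ("a", "m", "am", "ham") {
    $hit{$s} = ($s =~ /am/) ? 1 : 0;
    print "'$s' ", ($hit{$s} ? "matches" : "does not match"), " /am/\n";
}
```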

Disjunction

We have already seen disjunction of characters using the square bracket notation.

General disjunction is expressed using the vertical bar (|), also called the pipe symbol.

This form of disjunction lets us match any one of several alternative patterns, whereas the [ ] form only chooses among single characters.
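A minimal sketch of general disjunction follows. The pattern /cat|dog/ and the word list are illustrative assumptions chosen for this example.

```perl
use strict;
use warnings;

# /cat|dog/ accepts either whole alternative; a class like [cd] could
# only choose between the single characters 'c' and 'd'.
my @animals = grep { /cat|dog/ } ("cat", "dog", "cow", "hotdog");
print "@animals\n";
```

Because the pattern is not anchored, a string that merely contains one of the alternatives (such as "hotdog") is also selected.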

Grouping

• Parentheses, ‘(’ and ‘)’, are used to group subpatterns of a larger pattern.

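• Ex: /[Gg](ee|oo)se/ matches ‘geese’ and ‘goose’, capitalized or not. Because | has the lowest precedence, /[Gg](ee)|(oo)se/ would instead match either ‘Gee’/‘gee’ or ‘oose’.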
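Where the parentheses sit matters, because | binds more loosely than juxtaposition. The sketch below compares /[Gg](ee|oo)se/ with /[Gg](ee)|(oo)se/; the variable names and the test words are illustrative assumptions.

```perl
use strict;
use warnings;

my $grouped   = qr/[Gg](ee|oo)se/;    # G or g, then "ee" or "oo", then "se"
my $ungrouped = qr/[Gg](ee)|(oo)se/;  # "Gee" or "gee", or else "oose"

for my $word ("goose", "Geese", "gee", "moose") {
    my $g = ($word =~ $grouped)   ? "yes" : "no";
    my $u = ($word =~ $ungrouped) ? "yes" : "no";
    print "$word: grouped=$g ungrouped=$u\n";
}
```

Only the first form restricts the alternation to the vowels; the second accepts "gee" and "moose", which is rarely what is intended.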

Replacement

In addition to matching, we can do replacements when a match is found:

Example: To replace the British spelling ‘colour’ with the American spelling ‘color’, we can write:

s/colour/color/
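The substitution operator can be applied to a string with =~. In the sketch below the sample sentence and variable names are illustrative assumptions; it also shows the /g modifier, which replaces every match instead of only the first.

```perl
use strict;
use warnings;

my $text = "The colour of the sky; colours of autumn.";

# Copy $text, then substitute in the copy so the original is left unchanged.
(my $first_only = $text) =~ s/colour/color/;   # replaces the first match only
(my $all        = $text) =~ s/colour/color/g;  # /g replaces every match

print "$first_only\n$all\n";
```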

Registers – saving matches

• To save a match from part of a pattern, to reuse it later on, Perl provides registers
• Registers are named \#, where # is the number of the register (parenthesized groups are numbered from 1, left to right)
• Ex.

DE DO DO DO DE DA DA DA
IS ALL I WANT TO SAY TO YOU

/(D[AEO].)*/ will match the first line

/(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/ matches it more specifically

This pattern also matches strings like DA DE DE DE DA DO DO DO

\s matches a whitespace character
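The register example can be run as follows. This is a sketch under stated assumptions: the pattern is written with no literal spaces between its pieces (a space in a Perl pattern matches a space character, and groups 2 and 3 already capture the leading separator through '.'), and the third test string and the names `$re`, `@lines` and `%registers` are made up for illustration.

```perl
use strict;
use warnings;

my $re = qr/(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/;

my @lines = (
    "DE DO DO DO DE DA DA DA",
    "DA DE DE DE DA DO DO DO",
    "DE DO DO DA DE DA DA DA",   # illustrative string that breaks the repetition
);

my %registers;
for my $line (@lines) {
    if ($line =~ $re) {
        # After a successful match the saved groups are available as $1, $2, $3.
        $registers{$line} = [$1, $2, $3];
        print "match: '$line' (1='$1' 2='$2' 3='$3')\n";
    } else {
        print "no match: '$line'\n";
    }
}
```

Inside the pattern the registers are written \1, \2, \3; once the match has succeeded, the same saved text is read from $1, $2, $3.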

For more information

• PERL Regular Expression TUTorial
  – http://perldoc.perl.org/perlretut.html

• PERL Regular Expression reference page
  – http://perldoc.perl.org/perlre.html

Eliza

• Published by Weizenbaum in 1966

• Modelled a Rogerian therapist

• Had no intelligence – worked by pattern matching and replacement

• Had some people convinced that it really understood!

• demo at http://chayden.net/eliza/Eliza.shtml
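To give a feel for how far pattern matching and replacement alone can go, here is a toy Eliza-style responder. It is only a sketch: the two rules, the fallback reply and the function name `respond` are hypothetical and are not taken from Weizenbaum's program.

```perl
use strict;
use warnings;

# Hypothetical rules for illustration; not Weizenbaum's original script.
sub respond {
    my ($input) = @_;
    my $reply = lc $input;
    $reply =~ s/[.!?]+\s*$//;    # drop trailing punctuation

    # Each rule matches a pattern and reuses the captured text in the reply.
    return "Why do you say you are $1?"  if $reply =~ /\bi am (.*)/;
    return "Tell me more about your $1." if $reply =~ /\bmy (\w+)/;
    return "Please go on.";      # fallback when no rule applies
}

print respond("I am unhappy."), "\n";
print respond("My mother hates me."), "\n";
print respond("Hello"), "\n";
```

The responder has no model of meaning at all: it echoes back whatever text the capture group picked up.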

Wordcount program

• Unix wordcount program (wc) counts lines, words and characters

• Determining counts & probabilities of words has many applications:
  – augmentative communication
  – context-sensitive spelling error correction
  – speech recognition
  – handwriting recognition

Counting words in a corpus (preview)

#!/usr/bin/perl
# From the Perl book, page 39.

$/ = "";                       # Enable paragraph mode.
# The original also set $* = 1 to enable multi-line patterns. $* has been
# removed from modern Perl (use the /m modifier instead), and no pattern
# below needs it.

# Now read each paragraph and split into words. Record each
# instance of a word in the %wordcount associative array.
$total = 0;
while (<>) {
    s/-\n//g;                  # Dehyphenate hyphenations (across lines)
    s/<s>//g;                  # Remove <s>
    tr/A-Z/a-z/;               # Canonicalize to lowercase.
    @words = split(/\W*\s+\W*/, $_);
    foreach $word (@words) {
        next if $word eq "";   # Skip the empty field split can return first.
        $wordcount{$word}++;   # Increment the entry.
        $total++;
    }
}

# Now print out all the entries in the %wordcount array
foreach $word (sort keys(%wordcount)) {
    printf "(%8.6f%%) %20s occurs %3d time(s)\n",
        (100 * $wordcount{$word} / $total), $word, $wordcount{$word};
}
printf "Total number of words is %d, of which %d are distinct.\n",
    $total, scalar(keys(%wordcount));
