Fall 2006 CSE 467/567 1
RE review (Perl syntax)
• single-character disjunction: [aeiou]
• ranges: [0-9]
• negation: [^aeiou]
• conjunction: /cat/
• matching zero or one: /cats?/
• Kleene * and +: /[ab]+/ matches ‘a’, ‘b’, ‘aa’, ‘ab’, ‘ba’, ‘bb’, etc.
• wildcard: /c.t/ matches “cat”, “cbt”, “cct”, …
• anchors: ^, $, \b, \B
/projects/CSE467/Resources/Code/Perl
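The operators reviewed above can be tried out with a short script. This is a minimal sketch; the word list (@words) is hypothetical and chosen only for illustration, and the patterns are anchored with ^ and $ so that each one must match the whole string.

```perl
use strict;
use warnings;

# Hypothetical test strings, chosen only for illustration.
my @words = ('cat', 'cats', 'cut', 'ct', 'abba', 'concat');

my @optional = grep { /^cats?$/ }     @words;  # zero or one 's'
my @wildcard = grep { /^c.t$/ }       @words;  # exactly one character between 'c' and 't'
my @kleene   = grep { /^[ab]+$/ }     @words;  # one or more of 'a' or 'b'
my @no_vowel = grep { /^[^aeiou]+$/ } @words;  # only characters that are not vowels
my @boundary = grep { /\bcat\b/ }     @words;  # 'cat' with a word boundary on each side

print "optional: @optional\n";   # cat cats
print "wildcard: @wildcard\n";   # cat cut
print "kleene:   @kleene\n";     # abba
print "no vowel: @no_vowel\n";   # ct
print "boundary: @boundary\n";   # cat
```

Note that /\bcat\b/ rejects both ‘cats’ and ‘concat’: in each case one side of ‘cat’ touches another word character, so there is no word boundary there.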
Conjunction
Two regular expressions are conjoined by juxtaposition (placing the expressions side by side).
Examples:
/a/ matches ‘a’
/m/ matches ‘m’
/am/ matches ‘am’ but not ‘a’ or ‘m’ alone
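As a small sketch of conjunction by juxtaposition, the two single-character patterns can be compiled separately and then placed side by side. The variable names and the input list are hypothetical.

```perl
use strict;
use warnings;

my $first  = qr/a/;                  # matches 'a'
my $second = qr/m/;                  # matches 'm'
my $both   = qr/$first$second/;      # juxtaposition: 'a' immediately followed by 'm'

# Hypothetical inputs, chosen only for illustration.
my @inputs  = ('a', 'm', 'am', 'ham', 'ma');
my @matched = grep { $_ =~ $both } @inputs;

print "matched: @matched\n";         # am ham
```

Because the pattern is not anchored, ‘ham’ also matches: it contains ‘am’. The string ‘ma’ does not, since the order of the conjoined expressions matters.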
Disjunction
We have already seen disjunction of single characters using the square bracket notation.
General disjunction is expressed using the vertical bar (|), also called the pipe symbol.
This form of disjunction allows us to match any one of the alternative patterns, not just characters like the [ ] disjunction form.
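The difference between the two forms of disjunction can be sketched as follows. The input list is hypothetical; /cat|dog/ chooses between whole patterns, while /[cd]/ chooses between single characters.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only for illustration.
my @inputs = ('cat', 'dog', 'bird', 'catalog');

my @pipe    = grep { /cat|dog/ } @inputs;  # contains 'cat' or contains 'dog'
my @bracket = grep { /[cd]/ }    @inputs;  # contains a single 'c' or 'd'

print "pipe:    @pipe\n";      # cat dog catalog
print "bracket: @bracket\n";   # cat dog bird catalog
```

‘bird’ is matched only by the bracket form, because it contains a ‘d’ but neither of the full alternatives.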
Grouping
• Parentheses, ‘(’ and ‘)’, are used to group subpatterns of a larger pattern.
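• Ex: /[Gg](ee|oo)se/ matches ‘goose’ and ‘geese’ (with ‘G’ or ‘g’). The | must be inside the parentheses: it binds more loosely than juxtaposition, so /[Gg](ee)|(oo)se/ would mean "[Gg]ee or oose".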
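Where the parentheses are placed decides what the alternatives of | are, since | has lower precedence than juxtaposition. The following sketch compares two placements on a hypothetical word list; both patterns are anchored to make the difference easy to see.

```perl
use strict;
use warnings;

# Alternation inside the group: 'G' or 'g', then 'ee' or 'oo', then 'se'.
my $inside  = qr/^[Gg](ee|oo)se$/;
# Alternation outside the groups: either '^[Gg]ee' or 'oose$'.
my $outside = qr/^[Gg](ee)|(oo)se$/;

# Hypothetical inputs, chosen only for illustration.
my @words = ('goose', 'Geese', 'geek', 'moose', 'gase');

my @by_inside  = grep { $_ =~ $inside }  @words;
my @by_outside = grep { $_ =~ $outside } @words;

print "inside:  @by_inside\n";    # goose Geese
print "outside: @by_outside\n";   # goose Geese geek moose
```

With the alternation outside the groups, ‘geek’ matches because it starts with ‘gee’, and ‘moose’ matches because it ends in ‘oose’.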
Replacement
In addition to matching, we can do replacements when a match is found:
Example: To replace the British spelling of color with the American spelling, we can write:
s/colour/color/
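A short sketch of the substitution operator applied to a string, assuming a hypothetical sample sentence. Without a modifier, s/// replaces only the first match; the /g modifier replaces every match, and the operator returns the number of replacements made.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = "The colour of the sky; colours fade.";

my $first_only = $text;
$first_only =~ s/colour/color/;            # replaces the first occurrence only

my $all   = $text;
my $count = ($all =~ s/colour/color/g);    # /g replaces every occurrence

print "$first_only\n";                     # The color of the sky; colours fade.
print "$all ($count replacements)\n";      # The color of the sky; colors fade. (2 replacements)
```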
Registers – saving matches
• To save a match from part of a pattern, to reuse it later on, Perl provides registers
• Registers are named \#, where # is the number of the register
• Ex.
DE DO DO DO DE DA DA DA
IS ALL I WANT TO SAY TO YOU
/(D[AEO].)*/ will match the first line
/(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/ matches it more specifically
This pattern also matches strings like DA DE DE DE DA DO DO DO
\s matches a whitespace character
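The register pattern can be checked with a short script. This is a sketch under two assumptions: the pattern is written without literal spaces between its parts (in Perl a space inside a pattern matches a space unless the /x modifier is used), and ^ and $ are added so the whole line must match. The third test line is hypothetical.

```perl
use strict;
use warnings;

# \1, \2, \3 refer back to whatever the 1st, 2nd and 3rd parenthesized groups matched.
my $refrain = qr/^(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3$/;

my @lines = (
    'DE DO DO DO DE DA DA DA',
    'DA DE DE DE DA DO DO DO',
    'DE DO DO DA DE DA DA DA',   # hypothetical: breaks the repetition of register 2
);

my %registers;
for my $line (@lines) {
    # In list context a successful match returns the contents of the registers.
    if (my @regs = $line =~ $refrain) {
        $registers{$line} = join('|', @regs);
        print "match:    $line (registers: $registers{$line})\n";
    } else {
        print "no match: $line\n";
    }
}
```

For the first line the registers hold ‘DE’, ‘ DO’ and ‘ DA’; the second and third registers include the leading space matched by the dot.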
For more information
• PERL Regular Expression TUTorial – http://perldoc.perl.org/perlretut.html
• PERL Regular Expression reference page – http://perldoc.perl.org/perlre.html
Eliza
• Published by Weizenbaum in 1966
• Modelled a Rogerian therapist
• Had no intelligence – worked by pattern matching and replacement
• Had some people convinced that it really understood!
• demo at http://chayden.net/eliza/Eliza.shtml
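To give a feel for "pattern matching and replacement", here is a very small sketch in the spirit of Eliza. It is not Weizenbaum's script: the rules, the respond function and the fallback reply are all hypothetical and invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical rules: a pattern with one register and a reply template.
my @rules = (
    [ qr/\bI am (.+)/i,            'Why do you say you are %s?'  ],
    [ qr/\bI feel (.+)/i,          'How long have you felt %s?'  ],
    [ qr/\bmy (mother|father)\b/i, 'Tell me more about your %s.' ],
);

sub respond {
    my ($input) = @_;
    $input =~ s/[.!?]+\s*$//;                 # drop trailing punctuation
    for my $rule (@rules) {
        my ($pattern, $template) = @$rule;
        # On a match, copy the saved text into the reply template.
        if (my @regs = $input =~ $pattern) {
            return sprintf($template, $regs[0]);
        }
    }
    return 'Please go on.';                   # no rule matched
}

print respond('I am unhappy.'), "\n";         # Why do you say you are unhappy?
print respond('My mother hates me'), "\n";    # Tell me more about your mother.
print respond('Hello'), "\n";                 # Please go on.
```

The program has no model of what the words mean; each reply is just the user's own text copied into a canned template.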
Wordcount program
• Unix wordcount program (wc) counts lines, words and characters
• Determining counts & probabilities of words has many applications:
– augmentative communication
– context-sensitive spelling error correction
– speech recognition
– hand-writing recognition
Counting words in a corpus (preview)
#!/usr/bin/perl
# FROM Perl BOOK, PAGE 39
$/ = "";    # Enable paragraph mode.
# The original also set $* = 1 to enable multi-line patterns. $* is no longer
# supported in current versions of Perl, and no pattern below needs it.
# Now read each paragraph and split into words. Record each
# instance of a word in the %wordcount associative array.
$total = 0;
while (<>) {
    s/-\n//g;       # Dehyphenate hyphenations (across lines)
    s/<s>//g;       # Remove <s>
    tr/A-Z/a-z/;    # Canonicalize to lowercase.
    @words = split(/\W*\s+\W*/, $_);
    foreach $word (@words) {
        next if $word eq "";    # Skip the empty field split can yield at the start of a paragraph.
        $wordcount{$word}++;    # Increment the entry.
        $total++;
    }
}
# Now print out all the entries in the %wordcount array
foreach $word (sort keys(%wordcount)) {
    # %% prints a literal percent sign.
    printf "(%8.6f%%) %20s occurs %3d time(s)\n",
        (100 * $wordcount{$word}/$total), $word, $wordcount{$word};
}
# $total counts every word occurrence; the number of distinct words is the number of keys.
printf "Total number of distinct words is %d.\n", scalar(keys %wordcount);