1
Matchete: Paths through the Pattern Matching Jungle
Martin Hirzel, Nate Nystrom, Bard Bloom, Jan Vitek
PADL, 7-8 January 2008
2
What is Pattern Matching?
Examples:
– Switch in C/Java
– Exception handlers
– ML-style patterns
– Regular expressions
– XPath patterns
– Bit masks
Selection
– If match, then execute handler
– E.g. is this a float? 22.341
Bindings
– Give names to parts
– E.g. integral part: 22, fractional part: 341
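Selection and bindings for the float example can be sketched with java.util.regex. This is only an illustration in plain Java: the class FloatMatchDemo, the method matchFloat, and the exact regular expression are assumptions made for this sketch, not part of the talk.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FloatMatchDemo {
    // Assumed shape of a float for this sketch: digits, a dot, digits.
    static final Pattern FLOAT = Pattern.compile("([0-9]+)\\.([0-9]+)");

    // Selection: returns null when the text is not a float.
    // Bindings: on success, element 0 is the integral part, element 1 the fractional part.
    static String[] matchFloat(String text) {
        Matcher m = FLOAT.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = matchFloat("22.341");
        if (parts != null) {
            // The "handler" runs only when the match succeeded.
            System.out.println("integral part: " + parts[0] + ", fractional part: " + parts[1]);
        }
    }
}
```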
3
Example: Lists
-- list multiplication
mult([3, -1, 0, 4])
= 3 * mult([-1, 0, 4])
= 3 * -1 * mult([0, 4])
= 3 * -1 * 0 * mult([4])
= 3 * -1 * 0 * 4 * mult(nil)
= 3 * -1 * 0 * 4 * 1 = 0
-- list construction
cons(3, cons(-1, cons(0, cons(4, null)))) = [3, -1, 0, 4]
4
Matching Structured Terms
int mult(List ls) {
  match(ls) {
    cons~(0, _): return 0;
    cons~(int h, List t): return h * mult(t);
    null: return 1;
  }
  return 1;
}
Selection
Bindings
Central feature of ML, Haskell
Hardly a jungle!
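For comparison, a hand-written plain-Java version of mult might look like the sketch below, with selection done by if tests and bindings done by explicit assignments. IntList and its static cons helper are stand-ins invented for this sketch; they are not the List class from the talk.

```java
class IntList {
    final int head;
    final IntList tail; // null marks the end of the list

    IntList(int head, IntList tail) {
        this.head = head;
        this.tail = tail;
    }

    // Hypothetical helper so that construction reads like cons(3, cons(-1, ...)).
    static IntList cons(int head, IntList tail) {
        return new IntList(head, tail);
    }

    static int mult(IntList ls) {
        if (ls == null) {
            return 1;            // plays the role of the null case
        }
        if (ls.head == 0) {
            return 0;            // plays the role of cons~(0, _)
        }
        int h = ls.head;         // bindings written out by hand
        IntList t = ls.tail;
        return h * mult(t);      // plays the role of cons~(int h, List t)
    }

    public static void main(String[] args) {
        IntList ls = cons(3, cons(-1, cons(0, cons(4, null))));
        System.out.println(mult(ls));
    }
}
```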
5
Less Structured Data
Data       Pattern              Language
Strings    Regular expression   Perl
XML        XPath                XSLT
Raw bits   Binary pattern       Erlang
Major factor in success of practical languages!
6
Why Unify?
• Given list of strings: sue 10, bob 15, ann 11
• Given String variable: name
• Find name, extract int age
• Match list: deconstructor pattern cons~(…, …)
• Match string: nested RegExp /([a-z]+) ([0-9]+)/(name, int age)
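Without a unified pattern language, the same task mixes list traversal, regular-expression matching, an equality test, and a string-to-int conversion by hand. The sketch below is plain Java, not Matchete; AgeLookupDemo and findAge are hypothetical names, and searching the whole list (rather than only its first element) is an assumption about what "find name" means.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgeLookupDemo {
    static final Pattern ENTRY = Pattern.compile("([a-z]+) ([0-9]+)");

    // Returns the age of the first entry whose name equals the given name.
    static OptionalInt findAge(List<String> entries, String name) {
        for (String entry : entries) {                        // list traversal
            Matcher m = ENTRY.matcher(entry);                 // string matching
            if (m.matches() && m.group(1).equals(name)) {     // value test on the name
                // Conversion to int; assumes the digits fit in an int.
                return OptionalInt.of(Integer.parseInt(m.group(2)));
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("sue 10", "bob 15", "ann 11");
        System.out.println(findAge(entries, "bob"));
    }
}
```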
7
Matchete (Java Extension)
• Integrates pattern sublanguages
• Common set of primitive patterns
• Nesting composite patterns
• Simple uniform semantics
8
Primitive Patterns
Name       Examples
Wildcard   _
Value      22.341
           x
           tiger.stripes + spider.legs
Binder     int x
           ScaryAnimal python
9
Composite Patterns
Name            Examples
Deconstructor   cons~(0, _)
Parameterized   re("([0-9]+)")~(int i)
Array           int[]{1, x, int y}
XPath           <bib/book>(NodeList n)
RegExp          /([a-z]) ([0-9]+)/(chr,int f)
BitLevel        [[(0x2cf9:16) 01 (int x:14)]]
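As a rough idea of what the BitLevel example replaces, here is a hand-written mask-and-shift version in plain Java. It assumes one reading of the pattern: a 32-bit word whose top 16 bits equal 0x2cf9, followed by the two bits 0 and 1, followed by 14 bits bound to x, most significant bit first. That reading, and the names BitLevelDemo and matchWord, are assumptions of this sketch.

```java
import java.util.OptionalInt;

class BitLevelDemo {
    // Returns the 14-bit payload if the word has the assumed layout, otherwise empty.
    static OptionalInt matchWord(int word) {
        int tag = (word >>> 16) & 0xFFFF;  // top 16 bits
        int flags = (word >>> 14) & 0x3;   // next 2 bits
        int x = word & 0x3FFF;             // low 14 bits
        if (tag != 0x2cf9 || flags != 0b01) {
            return OptionalInt.empty();    // selection failed
        }
        return OptionalInt.of(x);          // binding for x
    }

    public static void main(String[] args) {
        int word = (0x2cf9 << 16) | (0b01 << 14) | 1234;
        System.out.println(matchWord(word));
    }
}
```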
10
Deconstructor Definition
class List {
  // Fields
  private int head;
  private List tail;
  // Constructor
  public List(int h, List t) { head = h; tail = t; }
  // Deconstructor
  public cons~(int h, List t) { h = head; t = tail; }
}
Match on receiver object
Out parameters = subjects for nested patterns
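Plain Java has no out parameters, so one way to approximate a deconstructor is a method on the receiver that returns the extracted parts in a small holder object. The sketch below is an emulation under that assumption; ConsList, Parts, deconstruct, and sum are hypothetical names, not Matchete syntax or its generated code.

```java
class ConsList {
    private final int head;
    private final ConsList tail;

    ConsList(int h, ConsList t) {
        head = h;
        tail = t;
    }

    // Hypothetical holder standing in for the out parameters h and t.
    static final class Parts {
        final int h;
        final ConsList t;

        Parts(int h, ConsList t) {
            this.h = h;
            this.t = t;
        }
    }

    // Emulates cons~: invoked on the receiver, hands back the parts
    // that nested patterns would take as their subjects.
    Parts deconstruct() {
        return new Parts(head, tail);
    }

    static int sum(ConsList ls) {
        if (ls == null) {
            return 0;                 // no receiver, so nothing to deconstruct
        }
        Parts p = ls.deconstruct();
        return p.h + sum(p.t);
    }

    public static void main(String[] args) {
        ConsList ls = new ConsList(3, new ConsList(-1, new ConsList(0, new ConsList(4, null))));
        System.out.println(sum(ls));
    }
}
```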
11
Nesting
cons~(/([a-z]+) ([0-9]+)/(name, int age), _)
Pattern tree:
Deconstructor cons
  RegExp ([a-z]+) ([0-9]+)
    Value name
    Binder int age
  Wildcard _
Subject: sue 10, bob 15, ann 11
12
Subjects flow to children
Deconstructor cons (subject: sue 10, bob 15, ann 11)
  RegExp ([a-z]+) ([0-9]+) (subject: sue 10)
    Value name (subject: sue)
    Binder int age (subject: 10)
  Wildcard _ (subject: bob 15, ann 11)
13
Decisions and bindings flow to textual successor
Deconstructor cons → RegExp ([a-z]+) ([0-9]+) → Value name → Binder int age → Wildcard _ → Handler print(age)
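The two flows can be modelled with a few pattern combinators in plain Java: each pattern passes subjects down to its children, and an environment of bindings is threaded through the patterns in textual order. This is a toy model to illustrate the idea, not how the Matchete compiler works; the interface Pat, the environment map, and all method names are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternTreeDemo {
    // A pattern inspects a subject and may add bindings to env; false means "no match".
    interface Pat {
        boolean match(Object subject, Map<String, Object> env);
    }

    static Pat wildcard() {
        return (s, env) -> true;
    }

    // Value pattern: the subject must equal the value already held by a variable.
    static Pat value(String var) {
        return (s, env) -> s != null && s.equals(env.get(var));
    }

    // Binder pattern: converts the subject to int and binds it for later patterns.
    static Pat binderInt(String var) {
        return (s, env) -> {
            try {
                env.put(var, Integer.parseInt(String.valueOf(s)));
                return true;
            } catch (NumberFormatException e) {
                return false; // a failed conversion is a failed match, not an error
            }
        };
    }

    // RegExp pattern: capture group i becomes the subject of child i, left to right.
    static Pat regExp(String regex, Pat... children) {
        Pattern compiled = Pattern.compile(regex);
        return (s, env) -> {
            if (!(s instanceof String)) return false;
            Matcher m = compiled.matcher((String) s);
            if (!m.matches() || m.groupCount() != children.length) return false;
            for (int i = 0; i < children.length; i++) {
                if (!children[i].match(m.group(i + 1), env)) return false;
            }
            return true;
        };
    }

    // Deconstructor pattern over java.util.List: head and tail are the children's subjects.
    static Pat cons(Pat head, Pat tail) {
        return (s, env) -> {
            if (!(s instanceof List) || ((List<?>) s).isEmpty()) return false;
            List<?> ls = (List<?>) s;
            return head.match(ls.get(0), env) && tail.match(ls.subList(1, ls.size()), env);
        };
    }

    // Models cons~(/([a-z]+) ([0-9]+)/(name, int age), _); returns age, or null on failure.
    static Integer ageOfFirst(List<String> entries, String name) {
        Map<String, Object> env = new HashMap<>();
        env.put("name", name);
        Pat p = cons(regExp("([a-z]+) ([0-9]+)", value("name"), binderInt("age")), wildcard());
        return p.match(entries, env) ? (Integer) env.get("age") : null;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("sue 10", "bob 15", "ann 11");
        System.out.println(ageOfFirst(entries, "sue"));
        System.out.println(ageOfFirst(entries, "bob"));
    }
}
```

In this model the handler corresponds to whatever the caller does with the returned age. Note that the sketch does not undo bindings made before a later pattern fails; it simply reports the failure.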
14
Compilation
Sources: Matchete source code, other Java source
Tools: Matchete compiler (built on Rats! parser generator), Java compiler, postprocessor
Artifacts: generated Java source, debugging information, runtime library, Java class files
15
Implemented Examples
• Balance red-black tree
• Process TCP/IP network packet
• Pretty-print XML bibliography
• … + smaller regression tests
16
Discussion: Typing
Matchete uses strong dynamic typing
– No runtime errors, just failed matches
– If Matchete compiler gives no error, then Java compiler gives no error either
Why not (more) static typing?
– Data formats mismatch
– Test bed for a new scripting language
17
Discussion: Integration
Choice              Example                Advantage
Loose integration   /a(b)c(d)/ (p,q)       Sublanguage reuse
Tight integration   /a(p:b)c(q:d)/         No need to count
No integration      re("a(b)c(d)")~(p,q)   Simpler language
Matchete chooses tight integration for BitLevel, loose integration for RegExp and XPath, and no integration for XML as terms
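The "no need to count" advantage has a rough analogue in java.util.regex: positional groups have to be counted, while named groups carry their names inside the pattern. The sketch below uses Java's own (?<name>...) syntax, which is not Matchete's /a(p:b)c(q:d)/ notation; GroupNamingDemo and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNamingDemo {
    // Positional: the caller must know that p is group 1 and q is group 2.
    static String[] byPosition(String text) {
        Matcher m = Pattern.compile("a(b)c(d)").matcher(text);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // Named: the names sit next to the sub-patterns they bind, so nothing is counted.
    static String[] byName(String text) {
        Matcher m = Pattern.compile("a(?<p>b)c(?<q>d)").matcher(text);
        return m.matches() ? new String[] { m.group("p"), m.group("q") } : null;
    }

    public static void main(String[] args) {
        String[] pq = byName("abcd");
        System.out.println(pq[0] + " " + pq[1]);
    }
}
```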
18
Related Work
• Structured terms
– Algebraic types: ML, Haskell, …
– Objects: Tom, OOMatch, JMatch, …
– Letting users define patterns: F#, Scala
• Strings: Perl; SNOBOL
• Bit-level data: Erlang; DataScript; PADS
• XML:
– As trees: XSLT, XJ (XPath)
– As terms: XDuce, HydroJ, …
19
Conclusions
• Pattern matching applies to terms, strings, XML, and raw bits
• Matchete offers path to unification