4251 3
0011 0010 1010 1101 0001 0100 1011
Parsing Expression Grammars
Aaron Hoffer
CSS 548
Autumn 2012
2
PEGs
• What if Flex and Yacc were one program?
• What if you could use the same regular-expression patterns as Flex in your parser generator?
• What if Yacc supported:
  – ! (not “XYZ…”)
  – * (zero or more)
3
/* Scanning C comments with Flex */
%x IN_COMMENT WARNING
%%
<INITIAL>"/*"          { BEGIN(IN_COMMENT); }
<IN_COMMENT>[^*]*\*+   { BEGIN(WARNING); }
<WARNING>[^/]          { BEGIN(IN_COMMENT); }
<WARNING>"/"           { BEGIN(INITIAL); }
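The Flex rules form a small state machine: INITIAL, IN_COMMENT, and WARNING, where WARNING means "just saw one or more *". The Python sketch below walks that machine character by character. It is a rough transliteration, not what Flex actually generates (Flex compiles the rules into a DFA), and the function name strip_c_comments is hypothetical. As in the Flex spec, text outside comments is echoed (Flex's default rule) and comment text is swallowed. On an unterminated comment the sketch raises ValueError, a simplifying assumption; the Flex version would instead fall through to its default rule.

```python
def strip_c_comments(text: str) -> str:
    """Return text with /* ... */ comments removed (hypothetical helper)."""
    out = []
    state = "INITIAL"
    i, n = 0, len(text)
    while i < n:
        if state == "INITIAL":
            if text.startswith("/*", i):        # <INITIAL>"/*"
                state = "IN_COMMENT"
                i += 2
            else:                               # default rule: ECHO
                out.append(text[i])
                i += 1
        elif state == "IN_COMMENT":
            # <IN_COMMENT>[^*]*\*+ : skip non-stars, then a run of stars
            while i < n and text[i] != "*":
                i += 1
            if i == n:
                break                           # no closing star before EOF
            while i < n and text[i] == "*":
                i += 1
            state = "WARNING"
        else:  # WARNING: the previous character was '*'
            if text[i] == "/":                  # <WARNING>"/"
                state = "INITIAL"
            else:                               # <WARNING>[^/]
                state = "IN_COMMENT"
            i += 1
    if state != "INITIAL":
        # Assumption of this sketch; Flex would not raise here.
        raise ValueError("unterminated comment")
    return "".join(out)
```

For example, strip_c_comments("a/* ** x ***/b") yields "ab". Runs of stars are consumed greedily, just as `\*+` does in the Flex rule.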
4
/* Scanning C comments with PEG */
Comment: "/*" ("*" !"/" / [^*])* "*/"
Let’s break it into multiple rules to see what it means:
Comment: "/*" Middle "*/"
Middle: (Asterisk / NotAsterisk)*
Asterisk: "*" !"/"
NotAsterisk: [^*]
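Each rule above maps onto one function in a hand-written recursive-descent recognizer. The Python sketch below is illustrative only: the function names are hypothetical and do not come from any PEG tool. Each function returns the position just past its match, or None on failure. Ordered choice therefore means "try the next alternative from the same position", and !"/" is a peek that never consumes input.

```python
from typing import Optional

# Recognizer for the four rules above. Each function returns the position
# just past its match, or None if it fails at pos.

def asterisk(text: str, pos: int) -> Optional[int]:
    # Asterisk: "*" !"/"  -- the !"/" only peeks; it never consumes input
    if text.startswith("*", pos) and not text.startswith("/", pos + 1):
        return pos + 1
    return None

def not_asterisk(text: str, pos: int) -> Optional[int]:
    # NotAsterisk: [^*]
    if pos < len(text) and text[pos] != "*":
        return pos + 1
    return None

def middle(text: str, pos: int) -> int:
    # Middle: (Asterisk / NotAsterisk)*  -- ordered choice inside a greedy loop
    while True:
        p = asterisk(text, pos)
        if p is None:
            p = not_asterisk(text, pos)   # retry from the same pos
        if p is None:
            return pos                    # zero-or-more always succeeds
        pos = p

def comment(text: str, pos: int = 0) -> Optional[int]:
    # Comment: "/*" Middle "*/"
    if not text.startswith("/*", pos):
        return None
    pos = middle(text, pos + 2)
    return pos + 2 if text.startswith("*/", pos) else None
```

For example, comment("/* a * b */") returns 11, the length of the whole comment, while comment("/* never closed") returns None. Middle stops at the final "*" because Asterisk refuses a "*" that is followed by "/".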
5
/*Nested /*comments*/ with PEG*/
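• Add the nonterminal Comment to Middle
• Comment comes first in the ordered choice, so an inner "/*" is tried as a comment before NotAsterisk can consume its "/"
• The grammar now parses nested comments
Comment: "/*" Middle "*/"
Middle: (Comment / Asterisk / NotAsterisk)*
Asterisk: "*" !"/"
NotAsterisk: [^*]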
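For the nested grammar only Middle changes. The self-contained Python sketch below is again a hypothetical hand-written recognizer, not output from a PEG tool. Inside middle it calls comment recursively as the first alternative.

```python
from typing import Optional

def comment(text: str, pos: int = 0) -> Optional[int]:
    # Comment: "/*" Middle "*/"
    if not text.startswith("/*", pos):
        return None
    pos = middle(text, pos + 2)
    return pos + 2 if text.startswith("*/", pos) else None

def middle(text: str, pos: int) -> int:
    # Middle: (Comment / Asterisk / NotAsterisk)*
    # Comment is tried first, so an inner "/*" is not eaten by NotAsterisk.
    while True:
        p = comment(text, pos)
        if p is None:
            p = asterisk(text, pos)
        if p is None:
            p = not_asterisk(text, pos)
        if p is None:
            return pos
        pos = p

def asterisk(text: str, pos: int) -> Optional[int]:
    # Asterisk: "*" !"/"
    if text.startswith("*", pos) and not text.startswith("/", pos + 1):
        return pos + 1
    return None

def not_asterisk(text: str, pos: int) -> Optional[int]:
    # NotAsterisk: [^*]
    if pos < len(text) and text[pos] != "*":
        return pos + 1
    return None
```

On "/* a /* b */ c */" this returns 17, the full length, whereas the non-nesting grammar stops after the first "*/". PEG repetition never gives input back, so "/* /* */" is rejected: the inner Comment claims the only "*/".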
6
What is a PEG?
• Are neither context-free grammars nor regular expressions
• Are not ambiguous: a PEG parser tries the alternatives of a choice in the order they are written and commits to the first one that matches
• Are a formal description of what a recursive descent parser with backtracking is capable of parsing
• Support predicates like “not” (!) and “and” (&) because the parser can look ahead and then backtrack
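These properties can be made concrete with a few parser combinators. The Python sketch below rests on stated assumptions. The names lit, any_char, seq, choice, star, not_ and and_ are hypothetical, not a real library's API. Each parser maps (text, pos) to a new position or None, so backtracking is just retrying from the saved pos. The last definition restates the earlier Comment grammar with [^*] written as !"*" ., the usual PEG idiom for "any character except *".

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the position after its match, or None.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def any_char(text: str, pos: int) -> Optional[int]:
    # PEG "." : any single character
    return pos + 1 if pos < len(text) else None

def seq(*ps: Parser) -> Parser:
    def run(text: str, pos: int) -> Optional[int]:
        for p in ps:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return run

def choice(*ps: Parser) -> Parser:
    # Ordered choice e1 / e2 / ...: every alternative starts from the same pos
    # (backtracking), and the first one that succeeds wins.
    def run(text: str, pos: int) -> Optional[int]:
        for p in ps:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return run

def star(p: Parser) -> Parser:
    # e*: greedy, never gives input back; stops if e fails or consumes nothing
    def run(text: str, pos: int) -> Optional[int]:
        while True:
            r = p(text, pos)
            if r is None or r == pos:
                return pos
            pos = r
    return run

def not_(p: Parser) -> Parser:
    # !e: look ahead; succeed without consuming iff e fails here
    return lambda text, pos: pos if p(text, pos) is None else None

def and_(p: Parser) -> Parser:
    # &e: look ahead; succeed without consuming iff e matches here
    return lambda text, pos: pos if p(text, pos) is not None else None

# Ordered choice is not symmetric:
ab_then_c = seq(choice(lit("a"), lit("ab")), lit("c"))
reordered = seq(choice(lit("ab"), lit("a")), lit("c"))

# Comment: "/*" ("*" !"/" / !"*" .)* "*/"
comment = seq(
    lit("/*"),
    star(choice(seq(lit("*"), not_(lit("/"))), seq(not_(lit("*")), any_char))),
    lit("*/"),
)
```

ab_then_c fails on "abc" because the choice commits to "a" and the sequence never goes back to try "ab". Swapping the alternatives, as in reordered, accepts the input. This is why rule order matters in a PEG and why a PEG cannot be ambiguous.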
7
Domain Specific Languages and PEGs
• Alan Kay’s Viewpoints Research Institute (VPRI) has done lots of research on PEGs
• Their vision: tiny DSLs cooperating in the same environment to accomplish big tasks (sounds like Lisp to some critics)
• VPRI’s STEPS project demonstrated an OS, graphics system, word processor, spreadsheet, etc. in 20 KLOC (20,000 lines of code)
• How did VPRI do it?
  – Throw everything away.
  – Create a self-hosting language and little DSLs.
  – Collapse code size by a factor of 1000.