yacc no more
DESCRIPTION
YACC no more. Integrating parsers, interpreters and compilers into your application. Sriram Srinivasan (“Ram”). This is he. Sriram Srinivasan One of the core engineers of the WebLogic app server Wrote the first commercially available EJB implementation Wrote the TP engine in the WLS - PowerPoint PPT PresentationTRANSCRIPT
Session # 2221
YACC no more
Sriram Srinivasan (“Ram”)
Integrating parsers, interpreters and compilers into your application
Session # 22212
This is he
• Sriram Srinivasan
• One of the core engineers of the WebLogic app server– Wrote the first commercially available EJB
implementation– Wrote the TP engine in the WLS
• Author: “Advanced Perl Programming” (O’reilly)
Beginning
Session # 22213
Why this talk?
• Quest for higher level programming patterns– More productive / faster / maintainable etc…
• Integrating compilers, parsers, interpreters into your application
Beginning
Session # 22214
Embeddable Parsers
• JDK parsers for configuration data– java.util.Properties, XML, regex library
• java.util.Properties– Limited to “property = value” format– Takes care of comments, multi-line values, quotes
Case Study: Configuration Data
#app server propertiesconnectionPoolName = testPoolnumThreads = 10
…p = new Properties().load(inputStream)
Middle
Session # 22215
XML parsers
• Good for structured, hierarchical data
• DOM (Document Object Model) parser– Converts an entire XML document into a
corresponding tree of Nodes.
• SAX (Simple API for XML) – Callback class extends DefaultHandler– Supplies methods for startDocument(…), startElement(…), endElement(…) etc.
Middle
Session # 22216
Adding code to data
• Problem: We want to add add macros and expressions to our properties.
numThreads = numProcessors# Ensure that connection pool is smaller than# thread pool. connectionPoolSize = min(numThreads – 2, 1)
• This requires an expression evaluator
Middle
Session # 22217
Embeddable interpreters
• Plethora of free, high quality interpreters available– BeanShell (Java-like syntax)– Rhino (JavaScript)– Jython (Python in Java)– Kawa (Scheme in Java)
• When embedded, flow of control easily goes from java to interpreter to back.
• Command-line shell always included
Middle
Session # 22218
BeanShell
• Expressions identical to java
• Types are inferred dynamically
Middle
add( a, b ) { return a + b; }
sum = add(1, 2); // 3 str = add("Web", "Logic"); // "WebLogic"
Session # 22219
Embedding BeanShell
Middle
import bsh.Interpreter;
Interpreter i = new Interpreter();i.set("foo", 5);i.eval("bar = foo*10"); System.out.println("bar = "+ i.get("bar"));
i.eval(new FileReader("config.properties"));Integer n = i.get("connectionPoolSize");
• Instead of writing code to parse the properties file, just eval it!– Comments should be “// … ”, not “# …– Each property definition line should end in “;”
Session # 222110
• Strict java expression syntax – no class declarations
• Loose convenience syntax
BeanShell features
Middle
b = new java.awt.Button();b.label = "Yo" // eqvt. to b.setLabel("Yo")h = new Hashtable();h{"spud"} = "potato";// Swing stuffb = new JButton("My Button");f = new JFrame("My Frame");f.getContentPane().add(b, "Center");f.pack();f.show();
Session # 222111
Rhino
• Free ECMAScript interpreter from Mozilla
• Slightly more cumbersome to embed than BeanShell
• Contains bytecode compiler that can be called from within java
• Closures
• Regex support built-in. Good for text manipulation
Middle
Session # 222112
Case study: Command pattern
Middle
function insertCommand(text) { this.pos = buf.pos buf.insert(text) this.len = text.length this.undo = function () { buf.moveTo(this.pos) buf.erase(this.len); }
undoStack.push(this);}
new insertCommand("foo")undoStack.pop().undo()
• Undo/Redo in an editor
Session # 222113
Python
Middle
• Python (Java implementation is "Jython") – powerful high-level language– Compiles to bytecode. – True scripting language– Can extend java classes– Static compilation and standalone execution
Session # 222114
More case studies
• Embedded expressions – Spreadsheet formulae
• Customizable GUIs– Macro facility, keyboard mapping
• Remote agents
• Monitoring
• Performance through partial evaluation
Middle
Session # 222115
Case Study: Remote Agents
• Example: Test Agents
• Can upload script to each agent to launch processes, control them locally.– Jython is well-suited for this kind of task
• Example: Scriptable IMAP mail server– "All messages that contain this regex, make
a copy in this folder"
Middle
Session # 222116
Case Study: Monitoring
• SNMP model: Obtain attributes from each node over the network, do calculation
• Alternatively, upload script to each node, and let it return the result– Conserves network bandwidth
• Can insert any kind of probe • Study application data structures• Application-specific profiling
Middle
Session # 222117
Case Study: Performance
• Partial evaluation can yield substantial performance benefits
• Object - RDBMS adaptors– Code generator studies class and db
schema– Omits unnecessary conversions, null checks
• Vector dot product
Middle
dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
// But if 'a' is fixed {16,0,4} …dp = b[0] << 4 + b[2] << 2
Session # 222118
Generating java
• Moving from embedded interpreters to generating java source– Example: JSP.
• Convert template to java, compile and dynamically load
• BEA/WebLogic's weblogic.dtdc– Converts XML DTD to a high performance
SAX parser tuned to that DTD
Middle
Session # 222119
Generating code with Doclets
• javadoc is a general purpose parserjavadoc –doclet ListClass foo.java
• ListClass.start() called with a hierarchy of *Doc nodes
import com.sun.javadoc.*; public class ListClass { public static boolean start(RootDoc root) { ClassDoc[] classes = root.classes(); for (int i = 0; i < classes.length; ++i) { System.out.println(classes[i]); } return true; }
• Arbitrary tags can be introduced at any level
Middle
Session # 222120
Case study: iContract
• Pattern: doclet expressions converted to annotated java code
/** * Ensure that argument is always > 0* @pre f >= 0.0** Ensure that the function produces the sqrt * within a* @post Math.abs((return * return) - f) < 0.001 */ public float sqrt(float f) { ... }
Middle
Session # 222121
Case Study: EJBGen
/** * @ejbgen:entity * ejb-name = AccountEJB-OneToMany * data-source-name = demoPool * table-name = Accounts */abstract public class AccountBean implements EntityBean { /** * @ejbgen:cmp-field column = acct_id * @ejbgen:primkey-field * @ejbgen:remote-method transaction-attribute = Required */ abstract public String getAccountId();
Middle
Session # 222122
Generating bytecode
• Example: WebLogic RMI adaptors
• Sometimes, some facilities are available only in bytecode (goto's!)
• Example: fast string matching– Given a search string, encode the state
machine into bytecode– Worth it if the same pattern is going to be
used many times• Virus scanners• Searching genome sequences
Middle
Session # 222123
Example: String matching
• Problem: match "10100"– Convert to a state machine– Each state encodes a succesful prefix match
Middle
S5S0 S1 S3 S41 0 1 0 0
0 1
S2
1
0
1
Session # 222124
String matching (contd.)
• If only goto were allowed in java …
• But, goto's are allowed in bytecode!
Middle
try { //buf is the buffer to be searched int i = -1; s0: i++; if (buf[i] != '1') goto s0; s1: i++; if (buf[i] != '0') goto s1; s2: i++; if (buf[i] != '1') goto s0; s3: i++; if (buf[i] != '0') goto s1; s4: i++; if (buf[i] != '0') goto s3; s5: i++; return i-5;} catch (ArrayIndexOutOfBoundsException e) { return -1;}
Session # 222125
String matching (contd.)
• Using an assembler like jasmin
Middle
iconst_m1 istore_1S0: ;; i++; if a[i] != '1' goto S0; iinc 1 1 ; i++ aload_0 ; load a[i] iload_1 caload bipush 49 ; load '1' if_icmpne S0 ; if .. goto S0S1: ;; i++; if a[i] != '0' goto S1 iinc 1 1 aload_0 iload_1 caload bipush 48 if_icmpne S1
Session # 222126
Custom languages
• Craft a language that fits the context you are working in– Avoid XML ugliness: SRML (Simple Rule Markup)– Instead of "if s.purchaseAmount > 100 … "
Middle
<simpleCondition className="ShoppingCart" objectVariable="s"> <binaryExp operator="gt"> <field name="purchaseAmount"/> <constant type="float" value="100"/> </binaryExp> </simpleCondition>
Session # 222127
Antlr Introduction
• Antlr: A recursive descent parser with configurable lookahead (LL(k) parser)
• Much, much simpler than lex/yacc– Yacc error messages are cryptic, tough for non-CS
types to understand– Even generated code easy to understand
• Includes tree building and recognition– No such facility in yacc
• Lexer, parser and tree recognizer phase have similar syntax
Middle
Session # 222128
Antlr
• Example: hierarchical property list– A list consists of name value pairs– Names are identifiers, values are numbers or lists
Middle
( a 200 b (c 10 d 20))
Session # 222129
Antlr (contd.)
Middle
class LispLexer extends Lexer;
ID : ('a' .. 'z')+;
NUM: ('0' .. '9')+;
LP : '(';
RP : ')';
class LispParser extends Parser;
list : LP (nameValuePair)+ RP;
nameValuePair : ID value ;
value : NUM | list;
Session # 222130
Antlr (contd.)
Middle
nameValuePair returns [NVP ret=null]
{Object v;}
: t:ID v=value
{ret = new NVP(t.getText(),v);}
;
value returns [Object ret=null]
: t:NUM {ret=t.getText();}
| ret=list
;
• Adding code, arguments, return values
Session # 222131
Way out there …
Middle
• Configurable hardware– New circuits on the fly
• Intentional programming– Code not represented as a stream of characters
Session # 222132
Summary
• Run-time evaluation gives you a lot of power
• Other languages add features (e.g. closures) to java
• Lots of simple, free, quality parsers, interpreters
• Produce custom java source or byte code for performance
• Roll your own domain-specific language with ANTLR or javacc.
• Yacc No More.
End
Session # 222133
References
• Doclets– Doclet tools: www.doclet.com– EJBGen: www.beust.com, Cedric Beust – Icontract: www.reliable-systems.com, Reto Kramer
• Languages, interpreters– Beanshell: www.beanshell.org– Rhino: www.mozilla.org/rhino– Python: www.python.org, www.jython.org– ANTLR: www.antlr.org– More … flp.cs.tu-berlin.de/~tolk/vmlanguages.html
• SRML: xml.coverpages.org/srml.html
End
Session # 222134
References (contd.)
• Bytecode manipulation:– Jasmin: mrl.nyu.edu/~meyer/jasmin/– Jikes Bytecode toolkit:
www.alphaworks.ibm.com/tech/jikesbt – BCEL: bcel.sourceforge.net
• "Rapid" - Reconfigurable hardware – www.cs.washington.edu/research
• "The death of computer languages, the birth of intentional programming", Charles Simonyi– research.microsoft.com/scripts/pubs/trpub.asp– Microsoft tech report MSR-TR-95-52
• Thinking in Patterns with Java, Bruce Eckel– www.mindview.net/Books/TIPatterns
End