The Xtext Grammar Language
Slides of a talk held at XtextCON 2014 (Kiel).
Xtext Grammar Language
Jan Köhnlein
2014, Kiel
grammar org.xtextcon.Statemachine with org.eclipse.xtext.common.Terminals
generate statemachine "http://www.xtextcon.org/Statemachine"
Statemachine: name=STRING elements+=Element*;
Element: Event | State;
Event: name=ID description=STRING?;
State: 'state' name=ID transitions+=Transition*;
Transition: event=[Event] '=>' state=[State];
Precedences
From strongest to weakest binding:
Action, Assignment, Keyword, RuleCall, Parenthesized
Cardinalities: *, +, ?
Predicates: =>, ->
Group: <blank> (elements written one after another)
Unordered Group: &
Alternative: |
AssignmentFirst: name=ID?;      // (name=ID)?;
CardinalityFirst: 'a' 'b'?;     // 'a' ('b'?);
PredicateFirst: =>'a' 'b';      // (=>'a') 'b';
GroupFirst: 'a' | 'b' 'c';      // 'a' | ('b' 'c');
Syntactic Aspects
grammar org.xtextcon.Statemachine with org.eclipse.xtext.common.Terminals
generate statemachine "http://www.xtextcon.org/Statemachine"
Statemachine: name=STRING ('events' events+=Event+)? states+=State*;
Event: name=ID description=STRING?;
State: 'state' name=ID transitions+=Transition*;
Transition: event=[Event|ID] '=>' state=[State|ID];
Lexing
Lexer
Splits document text into tokens
Works before and independent of the parser
Token matching: first match wins
Precedence: all keywords, then local terminal rules, then terminal rules from the super grammar
As a last resort, you can define a custom lexer
Lexing
Char stream:
// Example
'Lamp'
events button_pressed
state on
	button_pressed => off
state off
	button_pressed => light
Token stream:
SL_COMMENT
STRING WS
'events' WS ID WS
'state' WS ID WS
ID WS '=>' WS ID WS
'state' WS ID WS
ID WS '=>' WS ID
Tokens: 'events', 'state', '=>', ID, STRING, ML_COMMENT, SL_COMMENT, WS, ANY_OTHER
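The lexing step can be sketched in Java. This is a simplified model, not the ANTLR-generated lexer Xtext actually uses: the regular expressions are hand translations of the Terminals rules, and "first match wins" is read here as "the longest match wins; among equally long matches, the rule with the higher precedence wins". That reading is an assumption about how the generated lexer resolves overlaps, and the class name SketchLexer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified lexer model (hypothetical), NOT the ANTLR-generated lexer.
class SketchLexer {
    // A token type and the regular expression standing in for its rule.
    static final class Rule {
        final String type;
        final Pattern pattern;

        Rule(String type, String regex) {
            this.type = type;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Precedence order: keywords first, then the terminal rules in declaration order.
    static final List<Rule> RULES = new ArrayList<>();
    static {
        RULES.add(new Rule("'events'", "events"));
        RULES.add(new Rule("'state'", "state"));
        RULES.add(new Rule("'=>'", "=>"));
        RULES.add(new Rule("ID", "\\^?[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.add(new Rule("INT", "[0-9]+"));
        // Simplified: accepts any escaped character, not only the escapes listed in Terminals.
        RULES.add(new Rule("STRING", "\"(\\\\.|[^\\\\\"])*\"|'(\\\\.|[^\\\\'])*'"));
        RULES.add(new Rule("ML_COMMENT", "(?s)/\\*.*?\\*/"));   // reluctant: up to the first */
        RULES.add(new Rule("SL_COMMENT", "//[^\\n\\r]*(\\r?\\n)?"));
        RULES.add(new Rule("WS", "[ \\t\\r\\n]+"));
        RULES.add(new Rule("ANY_OTHER", "(?s)."));               // always matches one char
    }

    // Returns the token types of the input, separated by single spaces.
    static String tokenize(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestEnd = pos;
            for (Rule rule : RULES) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // Longest match wins; strict '>' keeps the earlier rule on a tie,
                // so a keyword beats an ID of the same length.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestType = rule.type;
                    bestEnd = m.end();
                }
            }
            types.add(bestType);
            pos = bestEnd;
        }
        return String.join(" ", types);
    }

    public static void main(String[] args) {
        String model = "// Example\n'Lamp'\nevents button_pressed\nstate on\n"
                + "\tbutton_pressed => off\nstate off\n\tbutton_pressed => light";
        System.out.println(tokenize(model));
    }
}
```

With this reading, `state` becomes the keyword while `statemachine` stays a single ID, because the longer ID match beats the keyword.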
Terminal Rules
grammar org.eclipse.xtext.common.Terminals hidden(WS, ML_COMMENT, SL_COMMENT)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
terminal ID: '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
terminal INT returns ecore::EInt: ('0'..'9')+;
terminal STRING:
    '"' ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|'"') )* '"' |
    "'" ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|"'") )* "'";
terminal ML_COMMENT: '/*' -> '*/';
terminal SL_COMMENT: '//' !('\n'|'\r')* ('\r'? '\n')?;
terminal WS: (' '|'\t'|'\r'|'\n')+;
terminal ANY_OTHER: .;
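The until operator `->` in ML_COMMENT matches everything up to and including the first occurrence of its right-hand side. The hypothetical helper class below shows that behavior twice: once by hand with `indexOf`, once with a reluctant Java regular expression. The regex is just one way to translate the rule, not how Xtext implements it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrates  terminal ML_COMMENT: '/*' -> '*/';  (class name is hypothetical)
class UntilOperatorDemo {
    // '/*', then everything up to and including the FIRST '*/'.
    // Returns the matched comment, or null if the input does not start with a complete comment.
    static String matchUntil(String input) {
        if (!input.startsWith("/*")) {
            return null;
        }
        int close = input.indexOf("*/", 2);   // search only after the opening '/*'
        return close < 0 ? null : input.substring(0, close + 2);
    }

    // Same rule as a regular expression: the reluctant '.*?' stops at the first '*/'.
    private static final Pattern UNTIL = Pattern.compile("(?s)/\\*.*?\\*/");

    static String matchRegex(String input) {
        Matcher m = UNTIL.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "/* first */ x = 1; /* second */";
        System.out.println(matchUntil(text));   // the first comment only
        System.out.println(matchRegex(text));
    }
}
```

Searching from index 2 matters: in `/*/` the `*` of the opening delimiter must not be reused as part of the closing one.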
Terminals in the Parser
Hidden Terminals
Are ignored by the parser inside the respective scope
Can be defined on grammar or rule level (inherited), e.g. on the grammar:
grammar org.eclipse.xtext.common.Terminals hidden(WS, ML_COMMENT, SL_COMMENT)
or on parser rules:
QualifiedName hidden(): // strictly no whitespace
    ID ('.' ID)*;
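Hidden terminals can be modeled as a filter between lexer and parser. In this sketch (class and method names are hypothetical), the token types to hide are passed in explicitly: with WS hidden, `org . xtextcon` parses as a QualifiedName, while with an empty set, as with `hidden()`, the same input is rejected and only `org.xtextcon` is accepted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of hidden tokens: the parser never sees tokens whose type is hidden.
class HiddenTokensDemo {
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static Set<String> hiddenSet(String... types) {
        return new HashSet<>(Arrays.asList(types));
    }

    // Tiny lexer for this demo only: WS, ID, and single-character keywords such as '.'.
    static List<Token> lex(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
                tokens.add(new Token("WS", s.substring(start, i)));
            } else if (Character.isLetter(c) || c == '_') {
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) i++;
                tokens.add(new Token("ID", s.substring(start, i)));
            } else {
                i++;
                tokens.add(new Token("'" + c + "'", String.valueOf(c)));
            }
        }
        return tokens;
    }

    // QualifiedName: ID ('.' ID)*;  parsed with the given hidden token types.
    // Returns the concatenated text of the consumed tokens, or null on a syntax error.
    static String parseQualifiedName(String input, Set<String> hidden) {
        List<Token> visible = new ArrayList<>();
        for (Token t : lex(input)) {
            if (!hidden.contains(t.type)) visible.add(t);   // hidden tokens are dropped here
        }
        if (visible.isEmpty() || !visible.get(0).type.equals("ID")) return null;
        StringBuilder value = new StringBuilder(visible.get(0).text);
        int i = 1;
        while (i < visible.size()) {
            boolean dotThenId = visible.get(i).type.equals("'.'")
                    && i + 1 < visible.size()
                    && visible.get(i + 1).type.equals("ID");
            if (!dotThenId) return null;
            value.append('.').append(visible.get(i + 1).text);
            i += 2;
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Set<String> grammarDefault = hiddenSet("WS", "ML_COMMENT", "SL_COMMENT");
        System.out.println(parseQualifiedName("org . xtextcon", grammarDefault)); // org.xtextcon
        System.out.println(parseQualifiedName("org . xtextcon", hiddenSet()));    // null
    }
}
```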
Datatype Rules
Return a value (instead of an EObject)
Are processed by the parser
Superior alternative to terminal rules
Examples
Double returns ecore::EDouble hidden():
    '-'? INT '.' INT (('e'|'E') '-'? INT)?;
QualifiedName: // returns ecore::EString (default)
    ID ('.' ID)*;
ValidID:
    ID | 'keyword0' | 'keyword1';
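A datatype rule is parsed like any other parser rule, but its result is a value rather than an EObject. The sketch below (hypothetical names) walks the Double rule, with the exponent written as `(('e'|'E') '-'? INT)?`, over already-lexed token texts, concatenates the consumed text, and converts it with `Double.parseDouble`; in Xtext that last step belongs to the value converter. Lexing is deliberately left out, so how a real lexer splits input such as `1.5e3` is not modeled here.

```java
// Sketch of the datatype rule
//   Double returns ecore::EDouble hidden(): '-'? INT '.' INT (('e'|'E') '-'? INT)?;
// The parser consumes tokens and concatenates their text; a converter turns the String into a double.
class DoubleDatatypeRule {
    private final String[] tokens;   // token texts, as a lexer would deliver them
    private int pos = 0;

    private DoubleDatatypeRule(String[] tokens) {
        this.tokens = tokens;
    }

    private boolean at(String text) {
        return pos < tokens.length && tokens[pos].equals(text);
    }

    private String expectInt() {
        if (pos >= tokens.length || !tokens[pos].matches("[0-9]+")) {
            throw new IllegalArgumentException("INT expected at token " + pos);
        }
        return tokens[pos++];
    }

    private String parse() {
        StringBuilder text = new StringBuilder();
        if (at("-")) text.append(tokens[pos++]);
        text.append(expectInt());
        if (!at(".")) throw new IllegalArgumentException("'.' expected at token " + pos);
        text.append(tokens[pos++]);
        text.append(expectInt());
        if (at("e") || at("E")) {                      // (('e'|'E') '-'? INT)?
            text.append(tokens[pos++]);
            if (at("-")) text.append(tokens[pos++]);
            text.append(expectInt());
        }
        if (pos != tokens.length) throw new IllegalArgumentException("unexpected token " + tokens[pos]);
        return text.toString();
    }

    // Datatype rule value (a String), followed by the String-to-double conversion.
    static double toValue(String... tokens) {
        return Double.parseDouble(new DoubleDatatypeRule(tokens).parse());
    }

    public static void main(String[] args) {
        System.out.println(toValue("-", "1", ".", "5", "E", "-", "3"));
    }
}
```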
Value Converter
Transforms the textual representation into a value and back
For terminal and datatype rules
e.g. strip quotes (STRINGValueConverter), remove a leading ^ (IDValueConverter)
EMF defaults are often sufficient (EFactory)
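import com.google.inject.Inject
import org.eclipse.xtext.conversion.ValueConverter
import org.eclipse.xtext.conversion.impl.AbstractDeclarativeValueConverterService

// Register it in the runtime module, e.g. by overriding bindIValueConverterService()
class MyValueConverterService extends AbstractDeclarativeValueConverterService {

    // your IValueConverter for the MY_TERMINAL rule
    @Inject MyValueConverter myValueConverter

    @ValueConverter(rule = "MY_TERMINAL")
    def MY_TERMINAL() {
        return myValueConverter
    }
}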
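The round trip from text to value and back can be sketched with a hypothetical, simplified converter interface that mirrors the two directions of Xtext's IValueConverter (the real interface also receives the parse-tree node and may throw ValueConverterException). The escape handling covers only a subset, and re-adding `^` for keywords in IdConverter is an assumption about what an ID converter needs so that the serialized text lexes as an ID again.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for IValueConverter<T>.
interface SimpleValueConverter<T> {
    T toValue(String text);      // textual representation -> value

    String toString(T value);    // value -> textual representation
}

// In the spirit of STRINGValueConverter: strip the quotes and resolve escapes.
class StringConverter implements SimpleValueConverter<String> {
    @Override
    public String toValue(String text) {
        String body = text.substring(1, text.length() - 1);   // drop the surrounding ' or "
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char escaped = body.charAt(++i);
                if (escaped == 'n') sb.append('\n');
                else if (escaped == 't') sb.append('\t');
                else sb.append(escaped);   // \\, \" and \'; other escapes simplified the same way
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public String toString(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\t", "\\t");
        return "\"" + escaped + "\"";
    }
}

// In the spirit of IDValueConverter: drop a leading '^', add it back where the ID is a keyword.
class IdConverter implements SimpleValueConverter<String> {
    private final Set<String> keywords;

    IdConverter(String... keywords) {
        this.keywords = new HashSet<>(Arrays.asList(keywords));
    }

    @Override
    public String toValue(String text) {
        return text.startsWith("^") ? text.substring(1) : text;
    }

    @Override
    public String toString(String value) {
        return keywords.contains(value) ? "^" + value : value;
    }
}

class ValueConverterDemo {
    public static void main(String[] args) {
        StringConverter strings = new StringConverter();
        IdConverter ids = new IdConverter("state", "events");
        System.out.println(strings.toValue("'Lamp'"));   // Lamp
        System.out.println(ids.toValue("^state"));       // state
        System.out.println(ids.toString("state"));       // ^state
    }
}
```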
Ambiguities
Example
Expression: If;
If: Variable | 'if' condition=Expression 'then' then=Expression ('else' else=Expression)?;
Variable: name=ID;
warning(200): ...: Decision can match input such as "'else'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): ...: Decision can match input such as "'else'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
if cond1 then if cond2 then foo
else // dangling
bar
Ambiguity Analysis
In the workflow, add
fragment = DebugAntlrGeneratorFragment {
    options = auto-inject {}
}
and open the generated grammar in AntlrWorks (www.antlr3.org/works/)
Ambiguity Resolution
Add keywords
Use syntactic predicates: “If the predicated element matches in the current decision, decide for it.”
plain:
(=>'else' else=Expression)?;
=>('else' else=Expression)?; // not so good: the predicate covers the whole Expression
first-set:
->('else' else=Expression)?;
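What the predicate decides can be traced with a hand-written recursive-descent sketch of the If rule (not the generated ANTLR parser; names are hypothetical): whenever `else` is the next token, the innermost open `if` consumes it, so the dangling else binds to `if cond2`.

```java
// Hand-written recursive-descent sketch of
//   If: Variable | 'if' condition=Expression 'then' then=Expression (=>'else' else=Expression)?;
// (Expression: If; is inlined.)
class DanglingElseParser {
    private final String[] tokens;
    private int pos = 0;

    private DanglingElseParser(String input) {
        this.tokens = input.trim().split("\\s+");
    }

    private String peek() {
        return pos < tokens.length ? tokens[pos] : null;
    }

    private void expect(String keyword) {
        if (!keyword.equals(peek())) {
            throw new IllegalStateException("expected '" + keyword + "' at token " + pos);
        }
        pos++;
    }

    // Returns the parsed expression as a string: if(condition, then[, else]).
    private String parseIf() {
        if (peek() == null) throw new IllegalStateException("unexpected end of input");
        if (!"if".equals(peek())) {
            return tokens[pos++];              // Variable: name=ID (not validated in this sketch)
        }
        expect("if");
        String condition = parseIf();
        expect("then");
        String then = parseIf();
        // The predicate =>'else': if 'else' is next, this (innermost open) if takes it.
        if ("else".equals(peek())) {
            pos++;
            String otherwise = parseIf();
            return "if(" + condition + ", " + then + ", " + otherwise + ")";
        }
        return "if(" + condition + ", " + then + ")";
    }

    static String parse(String input) {
        DanglingElseParser parser = new DanglingElseParser(input);
        String result = parser.parseIf();
        if (parser.peek() != null) throw new IllegalStateException("trailing input at token " + parser.pos);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("if cond1 then if cond2 then foo else bar")); // if(cond1, if(cond2, foo, bar))
    }
}
```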
Correspondence to Ecore
grammar org.xtextcon.Statemachine with org.eclipse.xtext.common.Terminals
generate statemachine "http://www.xtextcon.org/Statemachine"
Statemachine returns Statemachine: name=STRING ('events' events+=Event+)? states+=State*;
Event returns Event: name=ID description=STRING?;
State returns State: 'state' name=ID transitions+=Transition*;
Transition returns Transition: event=[Event|ID] '=>' state=[State|ID];
Inferred EPackage statemachine:
Statemachine: name : String; events : Event [*]; states : State [*]
Event: name : String; description : String
State: name : String; transitions : Transition [*]
Transition: event : Event [1]; state : State [1]
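The correspondence can be approximated with plain Java classes. This is a sketch only: EMF's generated code uses interfaces, EList, and the reflective API instead. Containment features (`+=` with a rule call) own their children, while the cross-references `[Event|ID]` and `[State|ID]` point to objects contained elsewhere. The two-state model built in `lamp()` is a hypothetical example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java approximation of the inferred EClasses (structure only, no EMF).
class StatemachineModel {
    static class Statemachine {
        String name;                                            // name=STRING
        final List<Event> events = new ArrayList<>();           // events+=Event+  (containment)
        final List<State> states = new ArrayList<>();           // states+=State*  (containment)
    }

    static class Event {
        String name;                                            // name=ID
        String description;                                     // description=STRING? (may stay null)
    }

    static class State {
        String name;                                            // name=ID
        final List<Transition> transitions = new ArrayList<>(); // transitions+=Transition* (containment)
    }

    static class Transition {
        Event event;   // event=[Event|ID]: cross-reference to an Event contained in Statemachine.events
        State state;   // state=[State|ID]: cross-reference to a State contained in Statemachine.states
    }

    // Hypothetical model; names are resolved to objects the way a linker would.
    static Statemachine lamp() {
        Statemachine sm = new Statemachine();
        sm.name = "Lamp";
        Event pressed = new Event();
        pressed.name = "button_pressed";
        sm.events.add(pressed);
        for (String name : new String[] {"on", "off"}) {
            State s = new State();
            s.name = name;
            sm.states.add(s);
        }
        addTransition(sm, "on", "button_pressed", "off");
        addTransition(sm, "off", "button_pressed", "on");
        return sm;
    }

    static void addTransition(Statemachine sm, String from, String eventName, String to) {
        Transition t = new Transition();
        t.event = findEvent(sm, eventName);
        t.state = findState(sm, to);
        findState(sm, from).transitions.add(t);
    }

    static State findState(Statemachine sm, String name) {
        for (State s : sm.states) if (s.name.equals(name)) return s;
        throw new IllegalArgumentException("no state " + name);
    }

    static Event findEvent(Statemachine sm, String name) {
        for (Event e : sm.events) if (e.name.equals(name)) return e;
        throw new IllegalArgumentException("no event " + name);
    }

    public static void main(String[] args) {
        Transition t = lamp().states.get(0).transitions.get(0);
        System.out.println(t.event.name + " => " + t.state.name); // button_pressed => off
    }
}
```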
Supertypes
...
Element: Event | State;
Event returns Event: name=ID description=STRING?;
State returns State: 'state' name=ID transitions+=Transition*;
...
Element: name : String
Event extends Element: description : String
State extends Element
Transition
Common features are promoted to the super type
The dispatcher rule Element need not be called
A rule can return a subtype of the specified return type
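A plain-Java sketch of the inferred hierarchy (hypothetical names, not EMF-generated code): since both Event and State assign `name`, the feature moves up to Element, and code written against Element works for either subtype.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java approximation of the hierarchy inferred from  Element: Event | State;
class SupertypeModel {
    static class Element {
        String name;          // common feature, promoted from Event and State
    }

    static class Event extends Element {
        String description;   // only Event has a description
    }

    static class State extends Element {
        final List<Transition> transitions = new ArrayList<>();
    }

    static class Transition {
        Event event;
        State state;
    }

    // Code written against the supertype works for every subtype.
    static String names(List<? extends Element> elements) {
        List<String> names = new ArrayList<>();
        for (Element e : elements) names.add(e.name);
        return String.join(", ", names);
    }

    // Hypothetical sample elements.
    static List<Element> sample() {
        Event pressed = new Event();
        pressed.name = "button_pressed";
        State on = new State();
        on.name = "on";
        State off = new State();
        off.name = "off";
        List<Element> elements = new ArrayList<>();
        elements.add(pressed);
        elements.add(on);
        elements.add(off);
        return elements;
    }

    public static void main(String[] args) {
        System.out.println(names(sample())); // button_pressed, on, off
    }
}
```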
Imported EPackage
Workflow {
    bean = StandaloneSetup {
        ...
        registerGeneratedEPackage = "org.xtextcon.statemachine.StatemachinePackage"
        registerGenModelFile = "platform:/resource/<path-to>/Statemachine.genmodel"
    }
}
In the grammar, replace the generate declaration by an import:
// generate statemachine "http://www.xtextcon.org/Statemachine"
import "http://www.xtextcon.org/Statemachine"
AST Creation
EObject Creation
The current pointer
Points to the EObject returned by a parser rule call
Is set to null when entering the rule
On the first assignment, the EObject is created and assigned to current
Further assignments go to current
EObject Creation
Type: (Class | DataType) 'is' visibility=('public' | 'private');
DataType: 'datatype' name=ID;
Class: 'class' name=ID;
datatype A is public
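Type current = null;               // set to null when entering rule Type
current = new DataType();          // returned by the DataType rule call, created there on name=ID
current.setName("A");
current.setVisibility("public");   // further assignment in Type goes to current
return current;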
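The current-pointer rules can be imitated in a hand-written parser for the Type grammar. This is a sketch of the behavior described above, not the code Xtext generates; all names are hypothetical, and `Clazz` stands in for the grammar's Class to avoid a clash with `java.lang.Class`.

```java
// Hand-written imitation of the 'current' pointer rules for
//   Type: (Class | DataType) 'is' visibility=('public' | 'private');
//   DataType: 'datatype' name=ID;
//   Class: 'class' name=ID;
class CurrentPointerDemo {
    static class Type {
        String name;
        String visibility;
    }

    static class DataType extends Type {}

    static class Clazz extends Type {}   // the grammar's Class

    private final String[] tokens;
    private int pos = 0;

    private CurrentPointerDemo(String input) {
        tokens = input.trim().split("\\s+");
    }

    private String next() {
        if (pos >= tokens.length) throw new IllegalStateException("unexpected end of input");
        return tokens[pos++];
    }

    private void keyword(String k) {
        if (!k.equals(next())) throw new IllegalStateException("expected '" + k + "'");
    }

    private Type ruleType() {
        Type current = null;                            // set to null when entering the rule
        if (pos < tokens.length && tokens[pos].equals("datatype")) {
            current = ruleDataType();                   // unassigned rule call: result becomes current
        } else {
            current = ruleClass();
        }
        keyword("is");
        String v = next();
        if (!v.equals("public") && !v.equals("private")) throw new IllegalStateException("bad visibility " + v);
        if (current == null) current = new Type();      // would be created here if nothing did before
        current.visibility = v;                         // further assignments go to current
        return current;
    }

    private Type ruleDataType() {
        Type current = null;
        keyword("datatype");
        String id = next();
        if (current == null) current = new DataType();  // first assignment creates the object
        current.name = id;
        return current;
    }

    private Type ruleClass() {
        Type current = null;
        keyword("class");
        String id = next();
        if (current == null) current = new Clazz();
        current.name = id;
        return current;
    }

    static Type parse(String input) {
        CurrentPointerDemo parser = new CurrentPointerDemo(input);
        Type result = parser.ruleType();
        if (parser.pos != parser.tokens.length) throw new IllegalStateException("trailing input");
        return result;
    }

    public static void main(String[] args) {
        Type t = parse("datatype A is public");
        System.out.println(t.getClass().getSimpleName() + " " + t.name + " " + t.visibility); // DataType A public
    }
}
```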
Actions
An alternative without an assignment creates no object, so nothing is returned
Actions make sure an EObject is created
The specified type must be compatible with the return type
The created element is assigned to current
BooleanLiteral: value?='true' | 'false';
The rule 'BooleanLiteral' may be consumed without object instantiation. Add an action to ensure object creation, e.g. '{BooleanLiteral}'.
BooleanLiteral returns Expression: {BooleanLiteral} (value?='true' | 'false');
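A sketch in the same style (hypothetical names) shows why the action is needed: without it, only the `'true'` alternative reaches an assignment, so parsing `false` leaves current null; with `{BooleanLiteral}`, the object exists before either alternative is tried.

```java
// Sketch of object creation with and without an action, imitating the 'current' pointer rules.
class ActionDemo {
    static class Expression {}

    static class BooleanLiteral extends Expression {
        boolean value;
    }

    // BooleanLiteral: value?='true' | 'false';
    // Only the 'true' alternative contains an assignment, so 'false' never creates an object.
    static BooleanLiteral withoutAction(String token) {
        BooleanLiteral current = null;
        if (token.equals("true")) {
            current = new BooleanLiteral();   // created on the first assignment ...
            current.value = true;             // ... value?='true'
        } else if (!token.equals("false")) {
            throw new IllegalArgumentException("expected 'true' or 'false'");
        }
        return current;                       // null for 'false'
    }

    // BooleanLiteral returns Expression: {BooleanLiteral} (value?='true' | 'false');
    static Expression withAction(String token) {
        BooleanLiteral current = new BooleanLiteral();   // the action creates the object up front
        if (token.equals("true")) {
            current.value = true;
        } else if (!token.equals("false")) {
            throw new IllegalArgumentException("expected 'true' or 'false'");
        }
        return current;                       // 'false' yields a BooleanLiteral with value == false
    }

    public static void main(String[] args) {
        System.out.println(withoutAction("false"));                         // null
        System.out.println(withAction("false") instanceof BooleanLiteral); // true
    }
}
```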
Left Recursion
Expression: Expression '+' Expression | Number;
Number: value=INT;
The rule 'Expression' is left recursive.
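The reason for this error can be seen in a naive recursive-descent sketch (hypothetical names, with a depth guard in place of the stack overflow that would otherwise occur): the first alternative calls Expression again at the same position, so no token is ever consumed and the Number alternative is never reached.

```java
// Why top-down parsing cannot handle  Expression: Expression '+' Expression | Number;
class LeftRecursionDemo {
    private static final int MAX_DEPTH = 1_000;   // guard instead of a StackOverflowError

    private final String[] tokens;
    private int pos = 0;
    private int depth = 0;

    private LeftRecursionDemo(String input) {
        tokens = input.trim().split("\\s+");
    }

    private void parseExpression() {
        if (++depth > MAX_DEPTH) {
            throw new IllegalStateException(
                    "no progress: still at token " + pos + " after " + MAX_DEPTH + " nested calls");
        }
        parseExpression();   // alternative 1 begins with Expression: same rule, same position
        expect("+");         // never reached
        parseExpression();
    }

    private void expect(String keyword) {
        if (pos >= tokens.length || !tokens[pos].equals(keyword)) {
            throw new IllegalStateException("expected '" + keyword + "'");
        }
        pos++;
    }

    static void parse(String input) {
        new LeftRecursionDemo(input).parseExpression();
    }

    public static void main(String[] args) {
        try {
            parse("1 + 2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```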
Left Factoring
Expression: {Addition} left=Number ('+' right=Number)*;
Number: value=INT;
1
// model
Addition { left = Number { value = 1 } }
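A hand-written sketch of the left-factored rule (names hypothetical) makes the resulting model visible: every expression, even a plain `1`, is wrapped in an Addition. The sketch also assigns the single-valued `right` feature inside the loop, so for `1 + 2 + 3` only the last operand survives in it.

```java
// Hand-written sketch of
//   Expression: {Addition} left=Number ('+' right=Number)*;
//   Number: value=INT;
class LeftFactoringDemo {
    static class Expression {}

    static class Number extends Expression {
        int value;

        @Override
        public String toString() {
            return "Number{value=" + value + "}";
        }
    }

    static class Addition extends Expression {
        Expression left;
        Expression right;

        @Override
        public String toString() {
            return "Addition{left=" + left + (right != null ? ", right=" + right : "") + "}";
        }
    }

    private static Number number(String[] tokens, int pos) {
        if (pos >= tokens.length) throw new IllegalArgumentException("INT expected at end of input");
        Number n = new Number();
        n.value = Integer.parseInt(tokens[pos]);   // value=INT
        return n;
    }

    static Expression parse(String input) {
        String[] tokens = input.trim().split("\\s+");
        int pos = 0;
        Addition current = new Addition();           // {Addition}: always created, even for a plain "1"
        current.left = number(tokens, pos++);
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;
            current.right = number(tokens, pos++);   // single-valued: a later '+' overwrites it
        }
        if (pos != tokens.length) throw new IllegalArgumentException("unexpected " + tokens[pos]);
        return current;
    }

    public static void main(String[] args) {
        System.out.println(parse("1")); // Addition{left=Number{value=1}}
    }
}
```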
Assigned Action
Addition returns Expression: Number ({Sum.left=current} '+' right=Number)*;
Number: value=INT;
1
Expression current = null;
current = new Number();
current.setValue(1);
return current;
// model
Number { value = 1 }
Assigned Action II
Addition returns Expression: Number ({Sum.left=current} '+' right=Number)*;
Number: value=INT;
1 + 2
Expression current = null;
current = new Number();
current.setValue(1);
Expression temp = new Sum();
temp.setLeft(current);
current = temp;
…
current.setRight(<second Number>);
return current;
// model
Sum { left = Number { value = 1 } right = Number { value = 2 } }
Assigned Action III
Creates a new element, makes it the parent of current, and sets current to it
Needed for normalizing left-factored grammars, e.g. for infix operators
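The assigned action can be sketched the same way (names hypothetical): each `+` creates a Sum, makes the previous current its left child, and continues with the Sum as current. Repeating this yields a left-nested tree, i.e. left associativity, while a plain `1` stays a Number.

```java
// Hand-written sketch of
//   Addition returns Expression: Number ({Sum.left=current} '+' right=Number)*;
//   Number: value=INT;
class AssignedActionDemo {
    static class Expression {}

    static class Number extends Expression {
        int value;

        @Override
        public String toString() {
            return "Number{value=" + value + "}";
        }
    }

    static class Sum extends Expression {
        Expression left;
        Expression right;

        @Override
        public String toString() {
            return "Sum{left=" + left + ", right=" + right + "}";
        }
    }

    private static Number number(String[] tokens, int pos) {
        if (pos >= tokens.length) throw new IllegalArgumentException("INT expected at end of input");
        Number n = new Number();
        n.value = Integer.parseInt(tokens[pos]);   // value=INT
        return n;
    }

    static Expression parse(String input) {
        String[] tokens = input.trim().split("\\s+");
        int pos = 0;
        Expression current = number(tokens, pos++);   // unassigned rule call: current = the Number
        while (pos < tokens.length && tokens[pos].equals("+")) {
            Sum sum = new Sum();                        // {Sum.left=current}: create a new element,
            sum.left = current;                         // make it the parent of current,
            current = sum;                              // and set current to it
            pos++;                                      // '+'
            sum.right = number(tokens, pos++);          // right=Number
        }
        if (pos != tokens.length) throw new IllegalArgumentException("unexpected " + tokens[pos]);
        return current;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 + 3"));
    }
}
```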
Best Practices
Use nsURIs for imported EPackages
Prefer datatype rules to terminal rules
Use syntactic predicates instead of backtracking
Be sloppy in the grammar but strict in validation