c03 syntax

Upload: bama-raja-segaran

Post on 11-Oct-2015

55 views

Category:

Documents


0 download

DESCRIPTION

c03 Syntax

TRANSCRIPT

  • Defining Program Syntax

    Advanced Programming Languages

  • Syntax And SemanticsProgramming language syntax: how programs look, their form and structureSyntax is defined using a kind of formal grammarProgramming language semantics: what programs do, their behavior and meaningSemantics is harder to define

    Advanced Programming Languages

  • OutlineGrammar and parse tree examplesBNF and parse tree definitionsConstructing grammarsPhrase structure and lexical structureOther grammar forms

    Advanced Programming Languages

  • An English GrammarA sentence is a nounphrase, a verb, and anoun phrase.

    A noun phrase is anarticle and a noun.

    A verb is

    An article is

    A noun is... ::=

    ::=

    ::= loves | hates|eats

    ::= a | the ::= dog | cat | rat

    Advanced Programming Languages

  • How The Grammar WorksThe grammar is a set of rules that say how to build a treea parse treeYou put at the root of the treeThe grammars rules say how children can be added at any point in the treeFor instance, the rule says you can add nodes , , and , in that order, as children of ::=

    Advanced Programming Languages

  • A Parse Tree

    thedogthecatloves

    Advanced Programming Languages

  • A Programming Language GrammarAn expression can be the sum of two expressions, or the product of two expressions, or a parenthesized subexpressionOr it can be one of the variables a, b or c ::= + | * | ( ) | a | b | c

    Advanced Programming Languages

  • A Parse Tree

    + ( ) * ( )ab((a+b)*c)c

    Advanced Programming Languages

  • OutlineGrammar and parse tree examplesBNF and parse tree definitionsConstructing grammarsPhrase structure and lexical structureOther grammar forms

    Advanced Programming Languages

  • ::=

    ::=

    ::= loves | hates|eats

    ::= a | the ::= dog | cat | rattokensnon-terminal symbolsstart symbola production

    Advanced Programming Languages

  • BNF Grammar DefinitionA BNF grammar consists of four parts:The set of tokensThe set of non-terminal symbolsThe start symbolThe set of productions

    Advanced Programming Languages

  • Definition, ContinuedThe tokens are the smallest units of syntaxStrings of one or more characters of program text They are atomic: not treated as being composed from smaller partsThe non-terminal symbols stand for larger pieces of syntaxThey are strings enclosed in angle brackets, as in They are not strings that occur literally in program textThe grammar says how they can be expanded into strings of tokensThe start symbol is the particular non-terminal that forms the root of any parse tree for the grammar

    Advanced Programming Languages

  • Definition, ContinuedThe productions are the tree-building rulesEach one has a left-hand side, the separator ::=, and a right-hand side The left-hand side is a single non-terminalThe right-hand side is a sequence of one or more things, each of which can be either a token or a non-terminalA production gives one possible way of building a parse tree: it permits the non-terminal symbol on the left-hand side to have the things on the right-hand side, in order, as its children in a parse tree

    Advanced Programming Languages

  • AlternativesWhen there is more than one production with the same left-hand side, an abbreviated form can be usedThe BNF grammar can give the left-hand side, the separator ::=, and then a list of possible right-hand sides separated by the special symbol |

    Advanced Programming Languages

  • ExampleNote that there are six productions in this grammar. It is equivalent to this one: ::= + | * | ( ) | a | b | c ::= + ::= * ::= ( ) ::= a ::= b ::= c

    Advanced Programming Languages

  • EmptyThe special nonterminal is for places where you want the grammar to generate nothingFor example, this grammar defines a typical if-then construct with an optional else part: ::= if then ::= else |

    Advanced Programming Languages

  • Parse TreesTo build a parse tree, put the start symbol at the rootAdd children to every non-terminal, following any one of the productions for that non-terminal in the grammarDone when all the leaves are tokensRead off leaves from left to rightthat is the string derived by the tree

    Advanced Programming Languages

  • PracticeShow a parse tree for each of these strings:

    a+ba*b+c(a+b)(a+(b)) ::= + | * | ( )| a | b | c

    Advanced Programming Languages

  • Compiler NoteWhat we just did is parsing: trying to find a parse tree for a given stringThats what compilers do for every program you try to compile: try to build a parse tree for your program, using the grammar for whatever language you usedTake a course in compiler construction to learn about algorithms for doing this efficiently

    Advanced Programming Languages

  • Language DefinitionWe use grammars to define the syntax of programming languagesThe language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammarAs in the previous example, that set is often infinite (though grammars are finite)Constructing grammars is a little like programming...

    Advanced Programming Languages

  • OutlineGrammar and parse tree examplesBNF and parse tree definitionsConstructing grammarsPhrase structure and lexical structureOther grammar forms

    Advanced Programming Languages

  • Constructing GrammarsMost important trick: divide and conquerExample: the language of Java declarations: a type name, a list of variables separated by commas, and a semicolonEach variable can be followed by an initializer:float a; boolean a,b,c; int a=1, b, c=1+2;

    Advanced Programming Languages

  • Example, ContinuedEasy if we postpone defining the comma-separated list of variables with initializers: Primitive type names are easy enough too: (Note: skipping constructed types: class names, interface names, and array types) ::= ; ::= boolean | byte | short | int | long | char | float | double

    Advanced Programming Languages

  • Example, ContinuedThat leaves the comma-separated list of variables with initializersAgain, postpone defining variables with initializers, and just do the comma-separated list part: ::= | ,

    Advanced Programming Languages

  • Example, ContinuedThat leaves the variables with initializers: For full Java, we would need to allow pairs of square brackets after the variable nameThere is also a syntax for array initializersAnd definitions for and ::= | =

    Advanced Programming Languages

  • OutlineGrammar and parse tree examplesBNF and parse tree definitionsConstructing grammarsPhrase structure and lexical structureOther grammar forms

    Advanced Programming Languages

  • Where Do Tokens Come From?Tokens are pieces of program text that we do not choose to think of as being built from smaller piecesIdentifiers (count), keywords (if), operators (==), constants (123.4), etc.Programs stored in files are just sequences of charactersHow is such a file divided into a sequence of tokens?

    Advanced Programming Languages

  • Lexical Structure AndPhrase StructureGrammars so far have defined phrase structure: how a program is built from a sequence of tokensWe also need to define lexical structure: how a text file is divided into tokens

    Advanced Programming Languages

  • One Grammar For BothYou could do it all with one grammar by using characters as the only tokensNot done in practice: things like white space and comments would make the grammar too messy to be readable ::= if then ::= else |

    Advanced Programming Languages

  • Separate GrammarsUsually there are two separate grammarsOne says how to construct a sequence of tokens from a file of charactersOne says how to construct a parse tree from a sequence of tokens ::= | ::= | | ::= | | ::= | | |

    Advanced Programming Languages

  • Separate Compiler PassesThe scanner reads the input file and divides it into tokens according to the first grammarThe scanner discards white space and commentsThe parser constructs a parse tree (or at least goes through the motionsmore about this later) from the token stream according to the second grammar

    Advanced Programming Languages

  • Historical Note #1Early languages sometimes did not separate lexical structure from phrase structureEarly Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keywordOther languages like PL/I allow keywords to be used as identifiersThis makes them harder to scan and parseIt also reduces readability

    Advanced Programming Languages

  • Historical Note #2Some languages have a fixed-format lexical structurecolumn positions are significantOne statement per line (i.e. per card)First few columns for statement labelEtc.Early dialects of Fortran, Cobol, and BasicAlmost all modern languages are free-format: column positions are ignored

    Advanced Programming Languages

  • OutlineGrammar and parse tree examplesBNF and parse tree definitionsConstructing grammarsPhrase structure and lexical structureOther grammar forms

    Advanced Programming Languages

  • Other Grammar FormsBNF variationsEBNF variationsSyntax diagrams

    Advanced Programming Languages

  • BNF VariationsSome use or = instead of ::=Some leave out the angle brackets and use a distinct typeface for tokensSome allow single quotes around tokens, for example to distinguish | as a token from | as a meta-symbol

    Advanced Programming Languages

  • EBNF VariationsAdditional syntax to simplify some grammar chores:{x} to mean zero or more repetitions of x[x] to mean x is optional (i.e. x | )() for grouping| anywhere to mean a choice among alternativesQuotes around tokens, if necessary, to distinguish from all these meta-symbols

    Advanced Programming Languages

  • EBNF ExamplesAnything that extends BNF this way is called an Extended BNF: EBNFThere are many variations ::= { ;} ::= if then [else ] ::= { ( | ) ;}

    Advanced Programming Languages

  • Syntax DiagramsSyntax diagrams (railroad diagrams)Start with an EBNF grammarA simple production is just a chain of boxes (for nonterminals) and ovals (for terminals):ifthenelseexprstmtstmtif-stmt ::= if then else

    Advanced Programming Languages

  • BypassesSquare-bracket pieces from the EBNF get paths that bypass themifthenelseexprstmtstmtif-stmt ::= if then [else ]

    Advanced Programming Languages

  • BranchingUse branching for multiple productions ::= + | * | ( )| a | b | c

    Advanced Programming Languages

  • LoopsUse loops for EBNF curly brackets ::= {+ }

    Advanced Programming Languages

  • Syntax Diagrams, Pro and ConEasier for people to read casuallyHarder to read precisely: what will the parse tree look like?Harder to make machine readable (for automatic parser-generators)

    Advanced Programming Languages

  • Formal Context-Free GrammarsIn the study of formal languages and automata, grammars are expressed in yet another notation: These are called context-free grammarsOther kinds of grammars are also studied: regular grammars (weaker), context-sensitive grammars (stronger), etc.S aSb | X X cX |

    Advanced Programming Languages

  • Many Other VariationsBNF and EBNF ideas are widely usedExact notation differs, in spite of occasional efforts to get uniformityBut as long as you understand the ideas, differences in notation are easy to pick up

    Advanced Programming Languages

  • ExampleWhileStatement: while ( Expression ) Statement DoStatement: do Statement while ( Expression ) ; ForStatement: for ( ForInitopt ; Expressionopt ; ForUpdateopt) Statement [from The Java Language Specification, James Gosling et. al.]

    Advanced Programming Languages

  • ConclusionWe use grammars to define programming language syntax, both lexical structure and phrase structureConnection between theory and practiceTwo grammars, two compiler passesParser-generators can write code for those two passes automatically from grammars

    Advanced Programming Languages

  • Conclusion, ContinuedMultiple audiences for a grammarNovices want to find out what legal programs look likeExpertsadvanced users and language system implementerswant an exact, detailed definitionToolsparser and scanner generatorswant an exact, detailed definition in a particular, machine-readable form

    Advanced Programming Languages