
Upload: nicholas-burns

Post on 26-Dec-2015


1 Syntax and Semantics

• The Purpose of Syntax

• Problem of Describing Syntax

• Formal Methods of Describing Syntax

• Derivations and Parse Trees

• Sebesta Chapter 3

2 What is Syntax and Semantics

• Syntax and semantics define a PL

• Syntax
– form or structure of program units
• expressions, statements, declarations, etc.

• Semantics
– meaning of program units
• expressions, statements, declarations, etc.

• Why do we need language definitions?
– to design a language
– to implement a compiler/interpreter
– to write a program (use the language)

3 Syntax Elements

• A sentence is – a string of characters over some alphabet

• A language is – a set of sentences

• A lexeme is – the lowest level syntactic unit of a language

• e.g., *, public, totalCount

• A token is – a category of lexemes

• e.g., identifier
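The lexeme/token distinction can be made concrete with a small scanner. This is only a sketch: the token category names and the keyword list below are assumptions chosen for illustration, not part of the slides.

```python
import re

# Hypothetical token categories (assumed for this sketch).
# Each pattern describes the set of lexemes that belong to one token.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:public|while|begin|end)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INT_LITERAL", r"\d+"),
    ("OPERATOR", r"[*+\-/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs; raise on characters outside the alphabet."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":  # whitespace separates lexemes but is not one
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs


if __name__ == "__main__":
    for token, lexeme in tokenize("public totalCount * 2"):
        print(f"{lexeme!r:14} is a lexeme of token {token}")
```

Here totalCount is a lexeme and IDENTIFIER is the token (category) it belongs to; many different lexemes map to the same token.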

4 Describing Syntax

• Recognizers
– read an input string over the alphabet of the language and decide whether it belongs to the language (i.e., whether it is a sentence)
• used in compilers (see Chapter 4 for details)

• Generators
– produce sentences of a language
• a sentence is syntactically correct if it can be generated by the generator

5 Backus-Naur Form (BNF)

• BNF is a meta-language
– i.e., a language used to describe another language
– invented by John Backus to describe ALGOL 58
– used by Peter Naur to describe ALGOL 60

• BNF is equivalent to context-free grammars

• A BNF grammar is defined by
– a set of terminal symbols
– a set of nonterminal symbols
– a set of rules
– a start symbol (one of the nonterminal symbols)

6 BNF Elements

• terminal symbols
– are the lexemes of the target PL
• e.g., while, (, )

• nonterminal symbols
– represent classes of syntactic structures
• they act like syntactic variables
• e.g., <statement>

• rules
– define how a nonterminal symbol can be developed into a sequence of nonterminal and terminal symbols
• e.g., <while_stmt> → while ( <logic_expr> ) <stmt>

7 BNF Rules

• A rule has
– a left-hand side (LHS)
– then the symbol →
– a right-hand side (RHS)

• There can be several rules for one LHS
<stmt> → <assignment>
<stmt> → begin <stmt_list> end

• Syntactic lists are described using recursion
<ident_list> → ident
<ident_list> → ident , <ident_list>

• A grammar is
– a finite nonempty set of rules
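A grammar of this kind can be written down as plain data, and the same rules can drive both a generator and a recognizer. The sketch below uses only the two <ident_list> rules; the function names, the dictionary layout, and the depth limit are assumptions made for illustration.

```python
import random

# Rules as data: each LHS maps to a list of alternative RHS sequences.
RULES = {
    "<ident_list>": [
        ["ident"],                       # <ident_list> -> ident
        ["ident", ",", "<ident_list>"],  # <ident_list> -> ident , <ident_list>
    ],
}
START = "<ident_list>"
NONTERMINALS = set(RULES)
TERMINALS = {s for alts in RULES.values() for rhs in alts for s in rhs} - NONTERMINALS


def generate(symbol, rng, depth=0, max_depth=5):
    """Generator: expand `symbol` into terminals by applying rules at random."""
    if symbol not in RULES:
        return [symbol]  # terminal: nothing left to expand
    alts = RULES[symbol]
    # Beyond max_depth, force the first (non-recursive) rule so expansion stops.
    rhs = alts[0] if depth >= max_depth else rng.choice(alts)
    out = []
    for sym in rhs:
        out.extend(generate(sym, rng, depth + 1, max_depth))
    return out


def is_ident_list(tokens):
    """Recognizer: mirrors the two rules, including the recursion."""
    if tokens == ["ident"]:
        return True
    return (
        len(tokens) >= 3
        and tokens[:2] == ["ident", ","]
        and is_ident_list(tokens[2:])
    )


if __name__ == "__main__":
    sentence = generate(START, random.Random())
    print(" ".join(sentence), "->", is_ident_list(sentence))
```

Every sentence the generator produces is accepted by the recognizer, which is the sense in which the two approaches describe the same language.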

8 EBNF

• Extended BNF (EBNF)
– is most often used
– avoids having numerous rules for the same LHS

• Extra meta-symbols (in addition to →)
– […]
• enclosed symbols are optional (0 or 1 times)
• e.g., <if_stmt> → if ( <exp> ) <stmt> [ else <stmt> ]
– {…}
• enclosed symbols can be repeated (0 to n times)
• e.g., <ident_list> → ident { , ident }
– …|…
• choice of one of the symbol sequences separated by |
• e.g., <stmt> → <assignment> | begin <stmt_list> end
– (…)
• groups enclosed symbols

9 BNF vs. EBNF

BNF
<expr> → <expr> + <term>
<expr> → <expr> - <term>
<expr> → <term>
<term> → <term> * <factor>
<term> → <term> / <factor>
<term> → <factor>
<factor> → <exp> ** <factor>
<factor> → <exp>
<exp> → ( <expr> )
<exp> → id

EBNF
<expr> → <term> { ( + | - ) <term> }
<term> → <factor> { ( * | / ) <factor> }
<factor> → <exp> [ ** <factor> ]
<exp> → ( <expr> ) | id
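One practical consequence of the EBNF form is that it maps almost directly onto a hand-written recognizer: {…} becomes a loop, […] becomes an if, and | becomes a branch. The following is a minimal sketch under the assumption that the input is already a list of lexemes and that every identifier has been replaced by the terminal id; the class and function names are illustrative.

```python
class Parser:
    """Recursive-descent recognizer, one method per EBNF rule."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):  # <expr> -> <term> { ( + | - ) <term> }
        self.term()
        while self.peek() in ("+", "-"):  # { ... } is a loop
            self.pos += 1
            self.term()

    def term(self):  # <term> -> <factor> { ( * | / ) <factor> }
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):  # <factor> -> <exp> [ ** <factor> ]
        self.exp()
        if self.peek() == "**":  # [ ... ] is an optional part
            self.pos += 1
            self.factor()

    def exp(self):  # <exp> -> ( <expr> ) | id
        if self.peek() == "(":
            self.pos += 1
            self.expr()
            self.expect(")")
        else:
            self.expect("id")


def is_expr(tokens):
    """True if the whole token list is a sentence derivable from <expr>."""
    parser = Parser(tokens)
    try:
        parser.expr()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)


if __name__ == "__main__":
    print(is_expr("id + id * id".split()))
    print(is_expr("id +".split()))
```

The left-recursive BNF rules (<expr> → <expr> + <term>) could not be coded this way directly, because expr() would call itself without consuming any input; the EBNF repetition avoids that.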

10 Augmented EBNF

• another meta-symbol
– = (equal) instead of →

• meta-symbols for repetitions
– + means one or more times
– * means zero or more times
• e.g., <ident> = <letter>+ ( <letter> | <digit> )*

• rules can use iteration instead of recursion, e.g.:
– <stmt_list> → <stmt> | <stmt> ; <stmt_list>
– can be formulated as
• <stmt_list> = <stmt> ( ; <stmt> )*
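The + and * meta-symbols have the same meaning as in regular expressions, so a rule like <ident> can be tried out directly. A small sketch, assuming that <letter> means an ASCII letter and <digit> means 0-9 (the slides do not define them):

```python
import re

# <ident> = <letter>+ ( <letter> | <digit> )*
# Assumption: <letter> is [A-Za-z] and <digit> is [0-9].
IDENT = re.compile(r"[A-Za-z]+[A-Za-z0-9]*")


def is_ident(text):
    """True if the whole string matches the <ident> rule."""
    return IDENT.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["totalCount", "x1", "1x", ""]:
        print(repr(candidate), is_ident(candidate))
```

fullmatch is used rather than match so that trailing characters outside the rule (as in "x1!") cause rejection.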

11 Context-Free Grammar

• Context-Free Grammars (CFG)
– defined by Noam Chomsky
– meant to describe the syntax of natural languages

• Context-Free Grammar G = (S, T, N, P)
• S = start symbol
• T = set of terminal symbols (lexemes and tokens)
• N = set of nonterminal symbols (abstractions)
• P = production rules (each defines an LHS abstraction using an RHS)

• A sentence
– a sequence of terminal symbols derivable from S

12 A Small Language in EBNF

<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c

13 Derivation

• A derivation is
– a repeated application of rules
• starting with the start symbol
• substituting a nonterminal (an LHS) with the RHS of one of its rules
• ending with a sentence (all terminal symbols)

• Every string of symbols in the derivation is
– a sentential form

• A sentence is
– a sentential form with only terminal symbols

14 Derivation Types

• A leftmost derivation
– the leftmost nonterminal in each sentential form is expanded first

• A rightmost derivation
– the rightmost nonterminal in each sentential form is expanded first

• A mixed derivation
– an arbitrary nonterminal is expanded
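The difference between the derivation types can be seen by mechanically replaying rule choices. The sketch below encodes the small language from slide 12 and derives the sentence begin c = a - b end twice, once leftmost and once rightmost. The function name derive and the "list of alternative indices" encoding are assumptions made for illustration.

```python
# Grammar of the small language; alternatives are indexed from 0.
RULES = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
    "<var>": [["a"], ["b"], ["c"]],
}


def derive(start, choices, leftmost=True):
    """Return all sentential forms; choices[i] selects the rule used at step i."""
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        # Positions of all nonterminals in the current sentential form.
        positions = [k for k, sym in enumerate(form) if sym in RULES]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = RULES[form[i]][choice]  # replace the LHS by the chosen RHS
        forms.append(" ".join(form))
    return forms


# Rule choices that both end in: begin c = a - b end
LEFT = [0, 0, 0, 2, 1, 0, 0, 0, 1]
RIGHT = [0, 0, 0, 1, 0, 1, 0, 0, 2]

if __name__ == "__main__":
    for name, forms in (("leftmost", derive("<program>", LEFT)),
                        ("rightmost", derive("<program>", RIGHT, leftmost=False))):
        print(name)
        for form in forms:
            print("  =>", form)
```

Both runs apply the same rules the same number of times and reach the same sentence; only the order of expansion, and therefore the intermediate sentential forms, differs.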

15 Derivation Example

<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c

<program> => begin <stmt_list> end

=> begin <stmt> end

=> begin <var> = <expr> end

=> begin a = <expr> end

=> begin a = <term> + <term> end

=> begin a = <var> + <term> end

=> begin a = b + <term> end

=> begin a = b + const end

16 Questions

In the preceding slide:

1. Is the derivation a leftmost or a rightmost derivation?

2. State the "opposite" derivation.
• I.e., if it is a leftmost derivation, give the rightmost one, or vice versa

3. What are the terminal symbols of the language, what are the nonterminal symbols, and what is the start symbol?

4. Change a rule so that begin a = - b + const end is a legal sentence

17 Parse Tree

• A parse tree is
– a hierarchical representation of a derivation

Parse tree of begin a = b + const end (children are indented under their parent):

<program>
    begin
    <stmt_list>
        <stmt>
            <var>
                a
            =
            <expr>
                <term>
                    <var>
                        b
                +
                <term>
                    const
    end
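A parse tree is easy to hold in a program as nested data. The sketch below writes the tree for begin a = b + const end as (label, children) pairs and reads the sentence back off its leaves; this tuple representation is one possible encoding chosen for illustration, not something the slides prescribe.

```python
# An inner node is (nonterminal, [children]); a leaf is a terminal string.
TREE = ("<program>", [
    "begin",
    ("<stmt_list>", [
        ("<stmt>", [
            ("<var>", ["a"]),
            "=",
            ("<expr>", [
                ("<term>", [("<var>", ["b"])]),
                "+",
                ("<term>", ["const"]),
            ]),
        ]),
    ]),
    "end",
])


def leaves(node):
    """Collect the terminal symbols of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


def internal_labels(node):
    """Collect the nonterminals used, in preorder."""
    if isinstance(node, str):
        return []
    label, children = node
    out = [label]
    for child in children:
        out.extend(internal_labels(child))
    return out


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))
    print(internal_labels(TREE))
```

Every inner node corresponds to one rule application in the derivation, and the leaves read left to right give the derived sentence.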

18 Simple Assignment Language

EBNF Grammar

<assign> → <id> = <expr>
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
<id> → a | b | c

Parse tree of the sentence:

a = b * (a + c)

<assign>
    <id>
        a
    =
    <expr>
        <id>
            b
        *
        <expr>
            (
            <expr>
                <id>
                    a
                +
                <expr>
                    <id>
                        c
            )

19 Ambiguous Grammars

• A grammar is ambiguous
– if and only if it generates a sentential form that has two or more distinct parse trees
– e.g.
<assign> → <id> = <expr>
<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>
<id> → a | b | c

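20 Two Distinct Parse Trees

Both trees derive a = b + c * d with the ambiguous grammar (d is treated here as one more <id>, although the <id> rule lists only a, b and c).

add-first parse tree of a = b + c * d

<assign>
    <id>
        a
    =
    <expr>
        <expr>
            <expr>
                <id>
                    b
            +
            <expr>
                <id>
                    c
        *
        <expr>
            <id>
                d

multiply-first parse tree of a = b + c * d

<assign>
    <id>
        a
    =
    <expr>
        <expr>
            <id>
                b
        +
        <expr>
            <expr>
                <id>
                    c
            *
            <expr>
                <id>
                    d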
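Ambiguity can be checked mechanically for a given sentence by counting its parse trees. The sketch below does this for the <expr> part of the ambiguous grammar (<expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>). The function name and the span-based counting scheme are illustrative choices, and d is added to the identifier set as an assumption so that the example sentence can be used.

```python
from functools import lru_cache

# Assumption: 'd' is accepted as an <id> in addition to a, b, c.
IDS = {"a", "b", "c", "d"}


def count_expr_trees(tokens):
    """Count the distinct parse trees that derive `tokens` from <expr>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees for the slice tokens[i:j].
        n = 0
        if j - i == 1 and tokens[i] in IDS:
            n += 1  # <expr> -> <id>
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)  # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # <expr> -> <expr> op <expr>, with the operator at position k
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


if __name__ == "__main__":
    for text in ["b", "b + c * d", "( b + c ) * d"]:
        print(text, "->", count_expr_trees(text.split()), "parse tree(s)")
```

A count above 1 for any sentence shows that the grammar is ambiguous; a count of 1 for one sentence says nothing about the grammar as a whole.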

21 An Unambiguous Expression Grammar

• The same language can be defined with an unambiguous grammar!

<assign> → <id> = <expr>
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
<id> → a | b | c

22 Precedence Through Grammar

• A grammar can enforce the precedence of operators
– The parse tree shows how
• (low levels are evaluated first)
– e.g.,
<expr> → <expr> + <term> | <term>
<term> → <term> * const | const

[Figure: parse tree of a sentence built from const, + and * with this grammar; the * node sits lower in the tree than the + node.]
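The effect of this grammar on evaluation order can be tried out with a small parser that builds a tree and then evaluates it bottom-up. This is a sketch under two assumptions: const is represented by Python integers, and the left-recursive rules are implemented as loops (which accept the same sentences and build the same left-leaning trees). All function names are illustrative.

```python
def parse_term(tokens):
    """<term> -> <term> * const | const. Returns (tree, remaining tokens)."""
    if not tokens or not isinstance(tokens[0], int):
        raise SyntaxError("expected a constant")
    tree, rest = tokens[0], tokens[1:]
    while rest and rest[0] == "*":
        if len(rest) < 2 or not isinstance(rest[1], int):
            raise SyntaxError("expected a constant after '*'")
        tree = ("*", tree, rest[1])  # earlier product becomes the left child
        rest = rest[2:]
    return tree, rest


def parse_expr(tokens):
    """<expr> -> <expr> + <term> | <term>. Returns (tree, remaining tokens)."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = ("+", tree, right)  # earlier sum becomes the left child
    return tree, rest


def parse(tokens):
    tree, rest = parse_expr(list(tokens))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def evaluate(tree):
    """Evaluate bottom-up: deeper (lower) nodes are computed first."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


if __name__ == "__main__":
    tree = parse([2, "+", 3, "*", 4])
    print(tree, "=", evaluate(tree))
```

Because * can only appear inside a <term>, and a <term> is always a child of an <expr>, every multiplication ends up below the additions around it and is therefore evaluated first.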