Chapter 6: Syntax
Syntax
Syntax is the structure of a language. Earlier, both syntax and semantics were
described using lengthy English language explanations. Although semantics are still described in
English, syntax is described using a formal system.
In the 1950s, Noam Chomsky developed the idea of context-free grammars. John Backus, with contributions by Peter Naur,
developed a notational system for describing context-free grammars:
Backus-Naur Form (BNF).
BNF was first used to describe the syntax of Algol60. It was later used to describe C, Java, and Ada.
Every modern programmer and computer scientist must know how to read, interpret,
and apply BNF descriptions of language syntax.
BNFs occur in three basic forms:
• Original BNF
• Extended BNF (EBNF), popularized by Niklaus Wirth
• Syntax diagrams
Lexical Structure
The lexical structure of a programming language is the structure of its words. It can be considered separate from syntax, but is very closely related to it.
Typically, the scanning phase of a translator collects sequences of characters from the input program into tokens.
Tokens are then processed by a parsing phase, which determines the syntactic structure.
Tokens can be defined using either grammars or regular expressions (which describe text patterns).
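As a rough illustration of the scanning phase, the Python sketch below collects characters into tokens using regular expressions. The token categories, the tiny keyword set, and the function name `scan` are assumptions made for this example; they are not taken from any particular translator.

```python
import re

# Hypothetical token definitions; a real language has many more.
KEYWORDS = {"if", "else", "while"}
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r">=|<=|[><;,+]"),
    ("SKIP",   r"\s+"),
]
# One master pattern with a named group per token category.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Return a list of (kind, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":               # white space only separates tokens
            continue
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"             # reserved words look like identifiers
        tokens.append((kind, lexeme))
    return tokens
```

A parsing phase would then consume the resulting list, for example the output of `scan("if balance >= 42 ;")`, instead of raw characters.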
• Tokens fall into several distinct categories:
– Reserved words (keywords): if, while, else, main
– Literals or constants: 42, 27.5, "Hello", 'A'
– Special symbols: > >= < ; , +
– Identifiers: X24, var1, balance
Java reserved words:
abstract, boolean, break, byte, case, catch, char, class, const, continue, default, do, double, else, extends, final, finally, float, for, goto, if, implements, import, instanceof, int, interface, long, native, new, package, private, protected, public, return, short, static, strictfp, super, switch, synchronized, this, throw, throws, transient, try, void, volatile, while
Identifiers may not have the same names as keywords.
Keywords may also be called predefined identifiers.
In some languages, identifiers have a fixed maximum length.
Some programming languages allow identifiers of arbitrary length, but guarantee only the first six or eight characters to be significant (very confusing for programmers).
• What about: doif
– Is it an identifier called "doif"?
– Or is it the keywords do if?
Principle of longest substring (principle of maximum munch): at each point, the longest possible string of characters is collected into a single token.
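The principle can be sketched in a few lines of Python. This is only an illustration under assumed token patterns (two keywords, simple identifiers, three symbols); the names `PATTERNS` and `munch` are made up for the example.

```python
import re

# Hypothetical token patterns for the demonstration.
PATTERNS = [
    ("KEYWORD", re.compile(r"do|if")),
    ("IDENT",   re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("SYMBOL",  re.compile(r">=|>|=")),
]

def munch(text):
    """Tokenize text, always taking the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # white space only separates tokens
            pos += 1
            continue
        best_kind, best_lexeme = None, ""
        for kind, pattern in PATTERNS:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier pattern is kept,
            # so "do" on its own stays a KEYWORD rather than an IDENT.
            if m and len(m.group()) > len(best_lexeme):
                best_kind, best_lexeme = kind, m.group()
        if best_kind is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append((best_kind, best_lexeme))
        pos += len(best_lexeme)
    return tokens
```

With these patterns, `munch("doif")` yields the single identifier `doif`, while `munch("do if")` yields the two keywords `do` and `if`: the white space is what separates them.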
The principle of longest substring requires that certain tokens be separated by token delimiters or white space. In some languages, ends of lines are significant, and indentation may be significant as well.
A free format language is one where the format does not affect the program structure (except to satisfy the principle of longest substring). Example: put as many blank lines as you want, and as many spaces as you want between identifiers.
Most modern languages are free format.
FORTRAN is a primary example of a language violating the free format conventions.
As pre-processing, classic fixed-form FORTRAN ignores white space entirely: blanks are removed before processing starts.
FORTRAN has no reserved words at all.
Regular expressions:
• Are descriptions of patterns of characters.
• Are composed of three basic operations: concatenation, repetition, and choice (selection).
Regular expressions: Example: describe, using a regular expression, the occurrence of:
• 0 or more repetitions (repetition) of either a or b (choice)
• Followed by the single character c (concatenation)
Such as: aaaaabbbbbbc, abbbbbbbbbc, abaaaabbbbaaaaabc, c, abaaabbbbc, bbbbbbc
Examples of rejected strings: bca, cabbbb, b, a, aaaabbb
The regular expression is: (a | b)* c
• The | means OR.
• The * means zero or more occurrences.
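The expression can be tried out with Python's `re` module. This is a small sketch; the helper name `accepts` is made up here, and the spaces in the slide's notation are dropped because a space inside a Python pattern would be matched literally.

```python
import re

# (a | b)* c written without the layout spaces.
PATTERN = re.compile(r"(a|b)*c")

def accepts(s):
    """True if the whole string s matches (a | b)* c."""
    return PATTERN.fullmatch(s) is not None

accepted = ["aaaaabbbbbbc", "abbbbbbbbbc", "c", "bbbbbbc"]
rejected = ["bca", "cabbbb", "b", "a", "aaaabbb"]

for s in accepted + rejected:
    print(s, "accepted" if accepts(s) else "rejected")
```

`fullmatch` is used rather than `search` so that the whole string, not just some substring of it, has to fit the pattern.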
• Regular expressions:
– Regular expression notation is often extended by additional operators such as the "+" operator.
– (a | b)+ means ONE or more occurrences of either a or b. It is equivalent to (a | b) (a | b)*
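The equivalence can be spot-checked by brute force. The sketch below compares the two patterns on every string over the letters a, b, c up to a small length; the alphabet, the length bound, and the helper names are arbitrary choices for this illustration, and a finite check of this kind is evidence rather than a proof.

```python
import re
from itertools import product

PLUS = re.compile(r"(a|b)+")
EXPANDED = re.compile(r"(a|b)(a|b)*")

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agree(max_len=4):
    """True if both patterns accept exactly the same strings up to max_len."""
    return all(
        (PLUS.fullmatch(s) is None) == (EXPANDED.fullmatch(s) is None)
        for s in all_strings("abc", max_len)
    )
```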
Regular expressions: Example: write a regular expression for integer constants, i.e. one or more digits.
Note: [a-b] means a range of characters.
The regular expression for integer constants is: [0-9]+
Regular expressions: Example: write a regular expression for floating point constants: one or more digits, optionally followed by a decimal point and one or more further digits.
[0-9]+(\.[0-9]+)?
Here \. is an escape sequence that matches a literal decimal point, and the trailing ? makes the parenthesized group optional.
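Both constant patterns can be checked directly in Python. This is a minimal sketch; the helper names `is_integer` and `is_float` are invented for the example.

```python
import re

INTEGER = re.compile(r"[0-9]+")
FLOAT = re.compile(r"[0-9]+(\.[0-9]+)?")

def is_integer(s):
    """True if s consists of one or more digits."""
    return INTEGER.fullmatch(s) is not None

def is_float(s):
    """True if s is digits, optionally followed by '.' and more digits."""
    return FLOAT.fullmatch(s) is not None

for text in ["42", "27.5", "27.", ".5"]:
    print(text, is_integer(text), is_float(text))
```

Note that, as written, the floating point pattern also accepts plain integers such as 42, because the fractional part is optional; it rejects "27." and ".5".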
Regular expressions: Most modern text editors allow regular expressions to be used for searching.
Search utilities such as UNIX grep also use them.
Lex can also be used to turn regular expressions into an automatic scanner!
Regular expressions:
Can you write a small lexical analyzer to recognize certain tokens?
Can you write a small scanner to accept a simple expression consisting of the tokens you previously recognized?
Parsing Techniques and Tools
• A scanner program that only identifies tokens can be generated automatically from regular expressions.
• Lex is a famous scanner generator.
• Its freeware version is called Flex (Fast Lex).
• To be covered in detail in a compiler course.
Context-Free Grammars and BNFs
Grammar of a Simple English Sentence
Example:
sentence -> noun_phrase verb_phrase .
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets
• Grammar of a Simple English Sentence, example:
– One can alternatively use a different notation, such as: <sentence> ::= <noun_phrase> <verb_phrase> '.'
– But the quotation marks used around the full stop now also become metasymbols themselves.
There is an ISO standard format for BNF notation: ISO 14977 (1996), which standardizes an extended BNF.
• Question: Does the sentence “The girl sees a dog.” belong to the grammar indicated earlier?
• We go through a process of derivation to see if this sentence is accepted by the grammar or not.
Exercise: Is it possible to derive:
The girl sees a dog.
from the following grammar?
sentence -> noun_phrase verb_phrase .
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets
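One way to explore the exercise is to let a program search for a leftmost derivation. The sketch below is a simple backtracking search, not an efficient parser; it assumes the sentence has already been split into lowercase word tokens with the full stop as its own token, and the names `GRAMMAR` and `leftmost_derivation` are invented for this illustration.

```python
GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase", "."]],
    "noun_phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "verb":        [["sees"], ["pets"]],
}

def leftmost_derivation(form, tokens):
    """Return the list of sentential forms deriving tokens from form, or None."""
    # Find the leftmost non-terminal in the current sentential form.
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            break
        # Terminals to the left must already agree with the input.
        if i >= len(tokens) or symbol != tokens[i]:
            return None
    else:
        # No non-terminal left: the form must be exactly the input.
        return [form] if form == tokens else None
    # Try each right-hand side choice for the leftmost non-terminal.
    for rhs in GRAMMAR[symbol]:
        rest = leftmost_derivation(form[:i] + rhs + form[i + 1:], tokens)
        if rest is not None:
            return [form] + rest
    return None

tokens = "the girl sees a dog .".split()
for step in leftmost_derivation(["sentence"], tokens):
    print(" ".join(step))
```

Each printed line is one sentential form, starting with `sentence` and ending with the token string itself. The search terminates because this grammar has no recursive rules.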
• There are two primary problems with the previous grammar:
– thegirlseesadog is also an acceptable sentence, because the grammar says nothing about spaces. It is up to the scanner to be insensitive to spaces.
– The grammar does not specify that articles appearing at the beginning of a sentence should be capitalized. Such a "positional" property is often hard to deal with using context-free grammars.
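Terminology:
sentence -> noun_phrase verb_phrase .
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets
• Non-terminal: a name for a structure, such as sentence or noun_phrase
• Terminal: a word of the language itself, such as girl or sees
• Metasymbol: a symbol of the notation, such as -> or |
• Production (grammar rule): each line of the grammar
• Start symbol: sentence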
• Definitions:
– A context-free grammar consists of a series of grammar rules. Each rule consists of:
• A left-hand side that is a single structure.
• Followed by the metasymbol "->".
• Followed by a right-hand side consisting of sequences of non-terminals and terminals, with alternatives separated by |
– Productions are in BNF if they are given using only the metasymbols -> and | (and sometimes parentheses).
Definitions:
A context-free grammar defines a language, called the language of the grammar.
This language is the set of all strings of terminals for which there exists a derivation beginning with the start symbol and ending with the string of terminals.
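For the small English grammar used in these slides, the language is finite, so it can be listed exhaustively. The sketch below does that; it only works because that grammar has no recursive rules, and the names `GRAMMAR`, `expand`, and `LANGUAGE` are made up for the illustration.

```python
from itertools import product

GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase", "."]],
    "noun_phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "verb":        [["sees"], ["pets"]],
}

def expand(symbol):
    """Return every string of terminals derivable from symbol, as word lists."""
    if symbol not in GRAMMAR:            # a terminal derives only itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every possible expansion of each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

# The language of the grammar: everything derivable from the start symbol.
LANGUAGE = {" ".join(words) for words in expand("sentence")}
print(len(LANGUAGE))
```

Membership in the language is then just a set lookup, for example `"the girl sees a dog ." in LANGUAGE`.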
• Definitions:
– A grammar is called context-free because:
• Non-terminals appear singly on the left-hand side of productions.
• Each non-terminal can be replaced by any right-hand side choice, regardless of where it appears.
– Example:
• In the previous example, we can use any of the given verbs (pets, sees) with the girl subject (context-free).
• It may make sense to use the verb "pets" only with girls; this would make the grammar context-sensitive!
• Context-sensitivity is more of a semantic issue!
Definitions:
A grammar is made context-sensitive by adding non-terminals to the left-hand side of productions.
Anything that is not expressible using a context-free grammar is a semantic, not a syntactic, issue.
Example of a context-sensitive grammar: force articles appearing at the beginning of sentences to be capitalized.
sentence -> beginning noun_phrase verb_phrase .   (beginning added to the first rule)
beginning article -> The | A   (newly added production)
noun_phrase -> article noun
article -> a | the
noun -> girl | dog
verb_phrase -> verb noun_phrase
verb -> sees | pets
The new production has two non-terminals on the LHS, so the grammar is NOT context-free!
Derivation with this context-sensitive grammar:
sentence -> beginning noun_phrase verb_phrase .
sentence -> beginning article noun verb_phrase .
sentence -> The noun verb_phrase .
Now we have enforced capital letters at the beginning of sentences! (Semantic!)
Example: Describe, using a CFG, arithmetic expressions with addition and multiplication.
expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
We can alternatively say: number -> [0-9]+
Exercise: Derive 235 + 55
Exercise: Derive (2 + 5) * 6
Parse Trees and Abstract Syntax Trees
Derivations express the structure of syntax, but not very well.
There could be multiple derivations at times.
A parse tree better expresses the structure inherent in a derivation.
The parse tree graphically describes the replacement process in a derivation.
Example: Derive 234 from the following grammar using a parse tree.
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse tree for 234 (children are indented under their parent):
number
  number
    number
      digit
        2
    digit
      3
  digit
    4
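The same tree can be built mechanically by following the rule number -> number digit | digit. The sketch below represents a tree as nested tuples of the form (label, children...); that representation and the names `parse_number` and `show` are choices made for this illustration.

```python
def parse_number(digits):
    """Build the parse tree of a digit string as nested tuples."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError(f"not a number: {digits!r}")
    last = ("digit", digits[-1])
    if len(digits) == 1:
        return ("number", last)                       # number -> digit
    # number -> number digit: everything but the last digit is a number.
    return ("number", parse_number(digits[:-1]), last)

def show(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    if isinstance(tree, str):                         # a leaf (terminal)
        return ["  " * depth + tree]
    label, *children = tree
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(show(child, depth + 1))
    return lines

print("\n".join(show(parse_number("234"))))
```

Because the rule is left-recursive, the tree leans to the left: the last digit always hangs directly under the root.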
Example: Derive (2+3) * 4 from the following grammar using a parse tree.
expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse tree for (2+3) * 4 (children are indented under their parent):
expr
  expr
    (
    expr
      expr
        number
          digit
            2
      +
      expr
        number
          digit
            3
    )
  *
  expr
    number
      digit
        4
Notes:
• Leaves are terminals (tokens).
• Interior nodes are non-terminals.
• Every replacement in a derivation using a grammar rule A -> xyz... corresponds to the creation of children x, y, z, ... at node A.
Abstract Syntax Trees:
Parse trees are too detailed.
An abstract syntax tree condenses a parse tree to its essential structure.
Abstract Syntax Trees Example:
[Figure: the original (concrete) syntax tree for 234, with its number and digit nodes, shown next to the abstract syntax tree, which keeps only the digits 2, 3, and 4.]
Abstract Syntax Trees Example: (2+3) * 4
Abstract syntax tree (even the parentheses can go):
*
  +
    2
    3
  4
Original (concrete) syntax tree: the full parse tree of (2+3) * 4, with its expr, number, and digit nodes and both parentheses.
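The condensing step can be sketched as a small recursive function. Trees are written as nested tuples (label, children...), which is a representation chosen for this example; `num`, `CONCRETE`, and `to_ast` are hypothetical names. As a further assumption, the sketch collapses each number into a single leaf holding its digits.

```python
def num(d):
    """Helper: the concrete subtree expr -> number -> digit -> d."""
    return ("expr", ("number", ("digit", d)))

# Concrete parse tree of (2+3) * 4.
CONCRETE = ("expr",
            ("expr", "(", ("expr", num("2"), "+", num("3")), ")"),
            "*",
            num("4"))

def to_ast(tree):
    """Condense a concrete parse tree into an abstract syntax tree."""
    label, *children = tree
    if label == "digit":
        return children[0]
    if label == "number":
        # number -> number digit | digit: glue the digits back together.
        return "".join(to_ast(c) for c in children)
    # Otherwise the node is an expr.
    if len(children) == 1:                  # expr -> number
        return to_ast(children[0])
    if children[0] == "(":                  # expr -> ( expr ): drop parentheses
        return to_ast(children[1])
    left, op, right = children              # expr -> expr op expr
    return (op, to_ast(left), to_ast(right))

print(to_ast(CONCRETE))
```

The result is the tuple `('*', ('+', '2', '3'), '4')`: the operator tree alone, with the grouping that the parentheses expressed now carried by the tree shape.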
Syntax Directed Semantics:
The parse tree and the abstract syntax tree must have a structure that corresponds to the computation being performed.
Also called Semantics-Based Syntax.
Ambiguity, Associativity, and Precedence
Ambiguity:
Two different derivations can lead to the same parse tree.
Different derivations can also lead to different parse trees.
Ambiguity:
A grammar is ambiguous if some string has two distinct parse (or abstract syntax) trees.
Not necessarily just two distinct derivations!
Ambiguity Example:
Grammar:
expr -> expr + expr | expr * expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Derive: 2 + 3 * 4
Two parse trees derive the same expression!
Parse tree 1 (addition grouped first):
expr
  expr
    expr
      NUMBER (2)
    +
    expr
      NUMBER (3)
  *
  expr
    NUMBER (4)
Parse tree 2 (multiplication grouped first):
expr
  expr
    NUMBER (2)
  +
  expr
    expr
      NUMBER (3)
    *
    expr
      NUMBER (4)
Precedence issue: which one comes first, multiplication or addition?
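The ambiguity can be made concrete by enumerating every parse tree of 2 + 3 * 4 and evaluating each one. The sketch below is a minimal illustration that ignores parentheses and assumes the input is already a well-formed token list (integers alternating with operators); `all_trees` and `evaluate` are names invented here.

```python
def all_trees(tokens):
    """Return every parse tree of tokens under expr -> expr op expr | number."""
    if len(tokens) == 1:
        return [tokens[0]]                       # expr -> number
    trees = []
    # Operators sit at the odd positions; try each one as the top-level split.
    for i in range(1, len(tokens) - 1, 2):
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

def evaluate(tree):
    """Compute the value of a tree of the form (op, left, right) or an int."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

for tree in all_trees([2, "+", 3, "*", 4]):
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to 14 and 20, so the choice of parse tree changes the meaning of the expression, not just its shape.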
Ambiguity Example:
Grammar (with subtraction now):
expr -> expr + expr | expr - expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Derive: 2 - 3 - 4
Two parse trees derive the same expression!
Parse tree 1 (left subtraction grouped first):
expr
  expr
    expr
      NUMBER (2)
    -
    expr
      NUMBER (3)
  -
  expr
    NUMBER (4)
Parse tree 2 (right subtraction grouped first):
expr
  expr
    NUMBER (2)
  -
  expr
    expr
      NUMBER (3)
    -
    expr
      NUMBER (4)
Associativity issue: which subtraction to execute first?
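The two groupings give different results, which a short sketch makes visible. The function names `left_assoc` and `right_assoc` are made up for the illustration, and the input is assumed to be a non-empty list of numbers to be subtracted in order.

```python
from functools import reduce

def left_assoc(numbers):
    """Group from the left: ((a - b) - c) ..."""
    return reduce(lambda acc, x: acc - x, numbers)

def right_assoc(numbers):
    """Group from the right: a - (b - (c ...))"""
    return reduce(lambda acc, x: x - acc, reversed(numbers))

print(left_assoc([2, 3, 4]))    # (2 - 3) - 4
print(right_assoc([2, 3, 4]))   # 2 - (3 - 4)
```

The first call yields -5 and the second yields 3, so the grammar must commit to one grouping.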
Ambiguity: Ambiguity must usually be eliminated.
Semantics determine which parse tree is correct.
Leftmost Derivation:
You can identify the presence of ambiguity by using leftmost derivations.
When performing a leftmost derivation, only replace the leftmost remaining non-terminal.
Each parse tree corresponds to exactly one leftmost derivation, so if a string has two different leftmost derivations, it has two different parse trees and the grammar is ambiguous!
Leftmost Derivation Example: Derive 3 + 4 * 5
expr -> expr + expr
expr -> number + expr
expr -> 3 + expr
expr -> 3 + expr * expr
expr -> 3 + number * expr
Etc.
Always replace the leftmost non-terminal first!
Ambiguity, Associativity, and Precedence
Another Leftmost Derivation: Derive 3 + 4 * 5
expr -> expr * expr
expr -> expr + expr * expr
expr -> number + expr * expr
expr -> 3 + expr * expr
expr -> 3 + number * expr
Etc.
The leftmost derivation in this example leads to a different parse tree!
The grammar is ambiguous!
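The two leftmost derivations of 3 + 4 * 5 can be replayed step by step. This is a small sketch under stated assumptions: the rule constants `ADD`, `MUL`, `NUM` and the helper `lit` are hypothetical names, and `lit` collapses the number -> digit steps into one replacement, as the slides do.

```python
NONTERMINALS = {"expr", "number"}

ADD = ("expr", ["expr", "+", "expr"])
MUL = ("expr", ["expr", "*", "expr"])
NUM = ("expr", ["number"])


def lit(n):
    """Shorthand for number -> n, skipping the digit-level steps."""
    return ("number", [n])


def leftmost_derivation(rules):
    """Apply each rule to the leftmost non-terminal; return all sentential forms."""
    form = ["expr"]
    steps = [" ".join(form)]
    for lhs, rhs in rules:
        # Locate the leftmost remaining non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        assert form[i] == lhs, f"leftmost non-terminal is {form[i]}, not {lhs}"
        form[i:i + 1] = rhs
        steps.append(" ".join(form))
    return steps


first = leftmost_derivation([ADD, NUM, lit("3"), MUL, NUM, lit("4"), NUM, lit("5")])
second = leftmost_derivation([MUL, ADD, NUM, lit("3"), NUM, lit("4"), NUM, lit("5")])

for name, steps in (("first", first), ("second", second)):
    print(name + ":", " => ".join(steps))
```

Both runs end in the same string but pass through different sentential forms, so the string has two distinct leftmost derivations.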
Ambiguity, Associativity, and Precedence
But which of the previously performed leftmost derivations is the correct one for the expression 3 + 4 * 5? Semantics determine that.
Which operator has higher precedence? The
addition or the multiplication?
Ambiguity, Associativity, and Precedence
• Also, when executing 3 - 4 - 5, do we execute using
• Left associativity: (3 - 4) - 5, OR
• Right associativity: 3 - (4 - 5)
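The two groupings give different results for subtraction. A minimal sketch using folds (the function names `left_assoc` and `right_assoc` are illustrative):

```python
from functools import reduce


def left_assoc(nums):
    """Group from the left: ((a - b) - c) ..."""
    return reduce(lambda acc, x: acc - x, nums)


def right_assoc(nums):
    """Group from the right: a - (b - (c ...))"""
    return reduce(lambda acc, x: x - acc, reversed(nums))


print(left_assoc([3, 4, 5]))   # (3 - 4) - 5
print(right_assoc([3, 4, 5]))  # 3 - (4 - 5)
```

For 3 - 4 - 5 the left grouping yields -6 and the right grouping yields 4, so the language definition has to choose one.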
Ambiguity, Associativity, and Precedence
Example of Ambiguity Removal by modifying the grammar. The grammar:
expr -> expr + expr | expr - expr | (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Is changed to:
expr -> expr + term | term
term -> term * factor | factor
factor -> (expr) | number
number -> number digit | digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Multiplication is defined in a lower rule (term). This forces multiplication to occur lower in the parse tree, and thus gives it higher precedence than addition.
expr + term is different from term + expr: the side on which the recursion occurs controls associativity (left or right).
Ambiguity, Associativity, and Precedence
If we say expr -> expr + term, (left recursion of expr), it causes left associativity
For 2 + 3 + 4: 2 + 3 is executed first, then 4 is added to the result.
[Parse tree: expr( expr( expr(NUMBER 2) + term(NUMBER 3) ) + term(NUMBER 4) )]
Ambiguity, Associativity, and Precedence
If we say expr -> term + expr, (right recursion of expr), it causes right associativity
For 2 + 3 + 4: 3 + 4 is executed first, then 2 is added to the result.
[Parse tree: expr( term(NUMBER 2) + expr( term(NUMBER 3) + expr(NUMBER 4) ) )]
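How the direction of recursion shapes the tree can be sketched by building nested tuples for a list of terms. This is an illustration only; `left_recursive` and `right_recursive` are hypothetical names that mimic the two rule shapes rather than parse real input.

```python
def left_recursive(terms):
    """Mimics expr -> expr + term | term: the tree grows down the left side."""
    tree = terms[0]
    for term in terms[1:]:
        tree = (tree, "+", term)
    return tree


def right_recursive(terms):
    """Mimics expr -> term + expr | term: the tree grows down the right side."""
    if len(terms) == 1:
        return terms[0]
    return (terms[0], "+", right_recursive(terms[1:]))


print(left_recursive([2, 3, 4]))   # ((2, '+', 3), '+', 4)
print(right_recursive([2, 3, 4]))  # (2, '+', (3, '+', 4))
```

The innermost tuple is the operation that is executed first, which matches the two parse trees described in the slides.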
Ambiguity, Associativity, and Precedence
• Is there another way to remove ambiguity?
– Fully parenthesized expressions:
expr → ( expr + expr ) | ( expr * expr ) | NUMBER
so: ((2 + 3) * 4)
and: (2 + (3 * 4))
– Prefix expressions:
expr → + expr expr | * expr expr | NUMBER
so: + + 2 3 4
and: + 2 * 3 4
– But both alternatives change the language!!!
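Because each operator in the prefix grammar is followed by exactly two operands, a parser never has to choose between groupings. A minimal sketch of an evaluator for the prefix rule (the name `eval_prefix` is illustrative, and tokens are assumed to be separated by spaces):

```python
def eval_prefix(tokens):
    """Evaluate expr -> + expr expr | * expr expr | NUMBER."""

    def expr(pos):
        tok = tokens[pos]
        if tok in ("+", "*"):
            # An operator is always followed by exactly two sub-expressions.
            left, pos = expr(pos + 1)
            right, pos = expr(pos)
            return (left + right if tok == "+" else left * right), pos
        return int(tok), pos + 1

    value, end = expr(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


print(eval_prefix("+ + 2 3 4".split()))  # (2 + 3) + 4
print(eval_prefix("+ 2 * 3 4".split()))  # 2 + (3 * 4)
```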
Extended BNF (EBNF)
• An extension to classical BNF was adopted to simplify grammatical rules.
• Example:
– number -> number digit | digit
– Generates a number as a sequence of digits:
• number -> number digit
• number -> number digit digit
• number -> digit digit digit
• …
– Using EBNF, we can express it as
• number -> digit {digit}, where the braces express repetition (0 or more occurrences)
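The EBNF braces map naturally onto a loop. A minimal sketch of a recognizer for number -> digit {digit} (the names `DIGITS` and `is_number` are illustrative):

```python
DIGITS = set("0123456789")


def is_number(text):
    """Recognize number -> digit {digit}."""
    pos = 0
    # One mandatory digit.
    if pos >= len(text) or text[pos] not in DIGITS:
        return False
    pos += 1
    # {digit}: zero or more further digits become a while loop.
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos == len(text)


print(is_number("234"), is_number(""), is_number("2a"))
```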
Extended BNF (EBNF)
• Example:
– expr -> expr + term | term
– Generates an expression as a sequence of terms separated by +’s
• expr -> expr + term
• expr -> expr + term + term
• expr -> expr + term + term + term
• …
– It can be written in EBNF as
• expr -> term { + term }
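The EBNF form translates almost directly into a recursive-descent evaluator, with each pair of braces becoming a loop. The sketch below is an illustration, not the slides' own code. It assumes the term rule is rewritten the same way (term -> factor { * factor }), and the `Parser` class, `tokenize`, and `evaluate_expr` are hypothetical names. The tokenizer silently skips characters it does not recognize, which a real scanner would report.

```python
import re


def tokenize(text):
    """Split text into numbers and the symbols + * ( )."""
    return re.findall(r"\d+|[+*()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr -> term { + term }
        value = self.term()
        while self.peek() == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):
        # term -> factor { * factor }
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        # factor -> ( expr ) | number
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate_expr(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return value


print(evaluate_expr("3 + 4 * 5"))
print(evaluate_expr("(3 + 4) * 5"))
```

Because `term` is called from inside `expr`, multiplication is grouped before addition, and the left-to-right loops accumulate results in left-associative order.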
Extended BNF (EBNF)
• We can use EBNF to express optional features.
• if-stmt -> if ( expr ) stmt | if ( expr ) stmt else stmt
• Can be written using EBNF as:
• if-stmt → if( expr ) stmt [ else stmt ]
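The square brackets turn into a single optional check in a parser. In this sketch the tokens "e" and "s" are assumed placeholders for an already-recognized expression and a simple statement, and `parse_stmt`, `parse_if`, and `expect` are hypothetical names.

```python
def expect(tokens, pos, wanted):
    """Raise SyntaxError unless tokens[pos] is the wanted token."""
    if pos >= len(tokens) or tokens[pos] != wanted:
        raise SyntaxError(f"expected {wanted!r} at token {pos}")


def parse_stmt(tokens, pos=0):
    """stmt -> if-stmt | 's'  (the 's' placeholder is an assumption)."""
    if pos < len(tokens) and tokens[pos] == "if":
        return parse_if(tokens, pos)
    expect(tokens, pos, "s")
    return "s", pos + 1


def parse_if(tokens, pos):
    """if-stmt -> if ( expr ) stmt [ else stmt ]"""
    for offset, wanted in enumerate(["if", "(", "e", ")"]):
        expect(tokens, pos + offset, wanted)
    then_part, pos = parse_stmt(tokens, pos + 4)
    else_part = None
    # [ else stmt ]: parse the else branch only if the next token is 'else'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
    return ("if", then_part, else_part), pos


print(parse_stmt("if ( e ) s".split()))
print(parse_stmt("if ( e ) s else s".split()))
```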
Syntax Diagrams
Syntax diagrams are a useful graphical representation of grammar rules.
They indicate the sequence of terminals and non-terminals encountered on the right-hand side of a rule.
EBNF is usually more compact than syntax diagrams.
Syntax Diagrams
• Example: The syntax diagram of the following EBNF rule:
– if-stmt → if( expr ) stmt [ else stmt ]
[Syntax diagram for if-statement: if, (, expression, ), statement, followed by an optional branch: else, statement]
Circles or ovals denote terminals
Squares or rectangles denote non-terminals
Parsing Techniques and Tools
A grammar written in BNF, EBNF, or as syntax diagrams describes the strings of tokens that are syntactically legal in a programming language.
The simplest form of a parser is a recognizer: a program that accepts or rejects strings, based on whether they are legal in the language or not.
More general parsers build parse trees (or abstract syntax trees) and carry out other operations, such as calculating values for expressions.
Parsing Techniques and Tools
Parsers can be: Bottom-Up (Shift Reduce) Parsers.
Top-Down Parsers.
72
![Page 73: Chapter 6: Syntaxrafea/CSCE325/slides/06/Syntax.pdf · certain tokens be separated by token delimiters or white space. End of lines may be significant ... • It’s freeware version](https://reader035.vdocument.in/reader035/viewer/2022070213/6109e71805ee483ef2171996/html5/thumbnails/73.jpg)
Parsing Techniques and Tools
Bottom-Up (Shift Reduce) Parsers:
Match an input such as 234 with the right-hand sides of grammatical rules.
When a match occurs, the right-hand side is replaced by, or reduced to, the non-terminal on the left.
They construct derivations and parse trees from the leaves to the root.
They are also called shift-reduce parsers because they shift tokens onto a stack prior to reducing strings to non-terminals.
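The shift and reduce steps for the input 234 can be traced with a small hand-written sketch. It hard-codes the rules number -> number digit | digit and digit -> 0 | … | 9. It is not a table-driven parser of the kind a parser generator produces, and the name `shift_reduce_number` is illustrative.

```python
def shift_reduce_number(text):
    """Recognize a number bottom-up; return (accepted, trace of stack states)."""
    stack, trace = [], []
    for ch in text:
        stack.append(ch)                              # shift the next token
        trace.append(("shift " + ch, list(stack)))
        if ch in "0123456789":
            stack[-1] = "digit"                       # reduce by digit -> ch
            trace.append(("reduce digit", list(stack)))
        if stack[-2:] == ["number", "digit"]:
            stack[-2:] = ["number"]                   # reduce by number -> number digit
            trace.append(("reduce number", list(stack)))
        elif stack == ["digit"]:
            stack[:] = ["number"]                     # reduce by number -> digit
            trace.append(("reduce number", list(stack)))
    # Accept only if the whole input was reduced to the start symbol.
    return stack == ["number"], trace


accepted, trace = shift_reduce_number("234")
for action, state in trace:
    print(f"{action:15} {state}")
print("accepted:", accepted)
```

The trace shows the parse tree being assembled from the leaves upward: each digit is reduced first, then folded into the number to its left.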
Parsing Techniques and Tools
Top-Down Parsers:
Non-terminals are expanded to match incoming tokens and directly
construct a derivation.
Parsing Techniques and Tools
• Programs can be written that automatically translate a BNF description into a parser.
• Bottom-up parsing is usually more powerful than top-down parsing, and is the preferred method for such parser generators.
• Parser generators are also called compiler compilers.
• YACC (Yet Another Compiler Compiler) is a famous parser generator. Its free software version is called Bison.
• To be covered in detail in a compiler course.
Lexics vs. Syntax vs. Semantics
A number can be defined by a regular expression.
A number can also be defined using a grammatical rule!
How do we define a number: using a regular expression or a BNF rule?
A scanner operating on regular expressions is generally faster; there is no need to use the extensive recursive power of a parser operating on BNF.
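Both definitions accept the same strings, which can be checked on a few samples. In this sketch the pattern `[0-9]+` is one assumed regular expression for a number, and `is_number_grammar` applies the rule number -> number digit | digit recursively. Both function names are illustrative.

```python
import re

NUMBER_RE = re.compile(r"[0-9]+")
DIGITS = set("0123456789")


def is_number_regex(text):
    """Scanner-style check: one pass with a regular expression."""
    return NUMBER_RE.fullmatch(text) is not None


def is_number_grammar(text):
    """Parser-style check: number -> number digit | digit, applied recursively."""
    if len(text) == 0:
        return False
    if len(text) == 1:
        return text in DIGITS                       # number -> digit
    # number -> number digit
    return is_number_grammar(text[:-1]) and text[-1] in DIGITS


samples = ["234", "7", "", "2a", "a2", "007"]
for s in samples:
    print(repr(s), is_number_regex(s), is_number_grammar(s))
```

The regular expression needs no recursion at all, which illustrates why tokens such as numbers are usually left to the scanner.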
Lexics vs. Syntax vs. Semantics
• Example:
– Lexics: tokens exist such as:
• A, the, girl, dog, sees, pets, .
– Syntax:
• How do we arrange the tokens above according to a language grammar?
• Which one comes when: the noun, the verb, the article, etc.
– Semantics:
• Articles such as “a”, “the” need to be upper case if at the beginning of the sentence.
Lexics vs. Syntax vs. Semantics
• Rule:
If it is not grammar, or the disambiguating rules,
It’s semantics!