Slide 2
Syntax, Semantics - Definition
• The syntax of a programming language is the form of its expressions, statements and programming units.
• The semantics is the meaning of these expressions, statements and programming units.
• A grammar is a formal set of rules that describes the valid syntax of a language.
Slide 3
Syntax, Semantics - Examples
• Syntax of Date: DD/DD/DDDD
where D represents a digit.
The semantics describes which parts stand for the date, month and year.
• Syntax of “if” statement:
if ( <logic_expr> ) <stmt>
Semantics: <stmt> will be executed only if <logic_expr> evaluates to “true”
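The date example can be sketched in Python to show that syntax and semantics are checked separately. The function names and the day/month/year ordering are assumptions made for illustration only; the slide does not fix which field is the day.

```python
import re

# Syntax only: DD/DD/DDDD, where D is a digit.
DATE_SYNTAX = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

def is_syntactically_valid(text: str) -> bool:
    """True if the text has the form DD/DD/DDDD."""
    return DATE_SYNTAX.fullmatch(text) is not None

def interpret_day_month_year(text: str) -> dict:
    """Assumed semantics: first field is the day, second the month, third the year."""
    day, month, year = text.split("/")
    return {"day": int(day), "month": int(month), "year": int(year)}

# "99/99/9999" has valid syntax even though it names no real date:
# the syntax rule says nothing about meaning.
print(is_syntactically_valid("99/99/9999"))
print(interpret_day_month_year("03/04/2023"))
```

A different semantics (month first, for example) would accept exactly the same strings but interpret them differently.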
Slide 4
Lexemes
• Lexemes are the lowest level syntactic units.
Example:
val = (int)(xdot + y*0.3) ;
In the above statement, the lexemes are
val, = , ( , int, ), (, xdot, +, y, * , 0.3, ), ;
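Splitting a statement into lexemes can be sketched with a regular expression. The pattern below is an illustrative assumption that only covers names, simple numbers and single-character symbols; it is not a complete lexer for any real language.

```python
import re

# Alternatives, tried left to right: a name, a decimal number, an integer,
# or any single character that is not whitespace, a letter, a digit or "_".
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+\.\d+|\d+|[^\sA-Za-z_0-9]")

def lexemes(source: str) -> list[str]:
    """Return the lexemes of the source text in order, skipping whitespace."""
    return LEXEME.findall(source)

print(lexemes("val = (int)(xdot + y*0.3) ;"))
# ['val', '=', '(', 'int', ')', '(', 'xdot', '+', 'y', '*', '0.3', ')', ';']
```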
Slide 5
Tokens
Tokens are categories of lexemes.
• Identifiers: Names chosen by the programmer. Eg. val, xdot, y
• Keywords: Names chosen by the language designer to help syntax and structure. Eg. int, return, void. (Keywords that cannot be used as identifiers are known as reserved words.)
Slide 6
Tokens (Contd.)
• Operators: Identify actions. Eg. +, &&, !
• Literals: Denote values directly. Eg. 3.14, -10, ‘a’, true, null
• Punctuation Symbols: Supports syntactic structure. Eg. (, ), ;, {, }
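The token categories above can be sketched as a small classifier that maps a lexeme to its token. The keyword, operator and punctuation sets are illustrative assumptions drawn from the examples on these slides, and treating "=" as an operator is also an assumption.

```python
KEYWORDS = {"int", "return", "void"}
OPERATORS = {"+", "*", "=", "&&", "!"}   # "=" counted as an operator by assumption
PUNCTUATION = {"(", ")", ";", "{", "}"}

def token_category(lexeme: str) -> str:
    """Return the token (category) that a single lexeme belongs to."""
    if lexeme in KEYWORDS:
        return "keyword"
    if lexeme in OPERATORS:
        return "operator"
    if lexeme in PUNCTUATION:
        return "punctuation"
    if lexeme[0].isdigit():
        return "literal"          # only numeric literals are handled here
    if lexeme[0].isalpha() or lexeme[0] == "_":
        return "identifier"
    raise ValueError(f"unknown lexeme: {lexeme!r}")

for lx in ["val", "=", "(", "int", "0.3", ";"]:
    print(lx, "->", token_category(lx))
```

Keywords are checked before identifiers because both look like names; that ordering is what makes int a keyword rather than an identifier.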
Slide 7
Backus Naur Form (BNF)
Useful for describing the syntax of programming languages. Eg. Pascal “if”:
<if_stmt> → if <logic_expr> then <stmt>
The whole rule is a production. Its LHS is <if_stmt>. The non-terminals are <if_stmt>, <logic_expr> and <stmt>; the terminals are if and then.
Non-terminals are abstractions for syntactic structures. Terminals are lexemes or tokens.
Slide 8
Logical OR in BNF
Logical OR in BNF is denoted by |
Eg. <digit> → 0|1|2|3|4|5|6|7|8|9
<if_stmt> → if <logic_expr> then <stmt>
| if <logic_expr> then <stmt> else <stmt>
<sign> → + | -
Slide 9
Recursive rules in BNF
A BNF rule is recursive if LHS appears on RHS.
Eg. <ident_list> → <identifier>
| <identifier> , <ident_list>
<integer> → <digit>
| <digit> <integer>
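A recursive BNF rule maps naturally onto a recursive function. The sketch below follows the two alternatives of the <integer> rule directly; the function names are chosen for illustration.

```python
DIGITS = set("0123456789")

def is_digit(s: str) -> bool:
    """<digit> -> 0|1|2|3|4|5|6|7|8|9"""
    return len(s) == 1 and s in DIGITS

def is_integer(s: str) -> bool:
    """<integer> -> <digit> | <digit> <integer>"""
    # First alternative: a single digit.
    if is_digit(s):
        return True
    # Second alternative: a digit followed by a (shorter) integer.
    return len(s) > 1 and is_digit(s[0]) and is_integer(s[1:])

print(is_integer("2024"))   # True
print(is_integer("12a"))    # False
```

The recursion ends because each recursive call receives a string one character shorter.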
Slide 10
Extended BNF
• [ ] Optional element:
<if_stmt> → if <logic_expr> then <stmt> [ else <stmt> ]
<real_num> → [ <int_num> ] . <int_num>
• { } Unspecified number of repetitions: The enclosed element may be repeated indefinitely or left out altogether.
<ident_list> → <identifier> { , <identifier> }
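The EBNF braces correspond to a loop, where the recursive BNF rule would use recursion. A minimal sketch for <ident_list>, assuming the input has already been split into lexemes and using Python's own notion of an identifier as a stand-in for <identifier>:

```python
def parse_ident_list(lexemes: list[str]) -> list[str]:
    """<ident_list> -> <identifier> { , <identifier> }"""
    # The mandatory first <identifier>.
    if not lexemes or not lexemes[0].isidentifier():
        raise SyntaxError("expected an identifier")
    names = [lexemes[0]]
    pos = 1
    # The { , <identifier> } part: zero or more repetitions.
    while pos < len(lexemes) and lexemes[pos] == ",":
        if pos + 1 >= len(lexemes) or not lexemes[pos + 1].isidentifier():
            raise SyntaxError("expected an identifier after ','")
        names.append(lexemes[pos + 1])
        pos += 2
    if pos != len(lexemes):
        raise SyntaxError(f"unexpected lexeme {lexemes[pos]!r}")
    return names

print(parse_ident_list(["a", ",", "b", ",", "c"]))   # ['a', 'b', 'c']
```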
Slide 11
EBNF (Contd.)
• ( …| …) Multiple choice options. A single element must be chosen from a group.
“for” loop in Pascal:
<for_stmt> → for <var> := <expr> (to|downto) <expr> do <stmt>
EBNF enhances the readability and writability of BNF.
Slide 12
Syntax Graphs
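Each BNF element has a counterpart in the syntax graph:
• LHS: <if_stmt> in BNF, if_stmt in the syntax graph
• Non-terminal: <stmt> in BNF, stmt in the syntax graph
• Terminal: if in BNF, if in the syntax graph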
Slide 13
Syntax Graph - Example
BNF:
<if_stmt> → if <logic_expr> then <stmt>
Syntax Graph:
if_stmt: → if → logic_expr → then → stmt →
Slide 15
Syntax Graph Constructs (Contd.)
• Unspecified repetitions:
{<digit>}
[Syntax graph: digit]
• Repetition with minimum one occurrence:
<digit>{<digit>}
[Syntax graph: digit]
Slide 16
A simple grammar
<assign> → <ident> = <expr>
<ident> → A|B|C
<expr> → <ident> + <expr>
| <ident> * <expr>
| ( <expr> )
| <ident>
Slide 17
Sentences
A sentence is obtained by starting from the start symbol and repeatedly replacing non-terminals with strings of symbols according to the rules of the grammar, until only terminals remain.
Eg. (based on the grammar on the previous slide)
A = B*(A+C)
C = A+B*A
B = A
Slide 18
Parse Trees
A parse tree describes the hierarchical structure of a sentence. It has the following properties:
• The root is labeled with the LHS of the starting rule (the start symbol).
• Every non-leaf node (internal node) is a non-terminal.
• Each leaf is labeled with a terminal.
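These properties can be seen in a small recursive-descent sketch for the grammar on the “A simple grammar” slide. The nested-tuple tree representation and the function names are assumptions made for illustration: each tuple starts with a non-terminal label, and plain strings are the terminal leaves.

```python
def parse_assign(sentence: str):
    """Build a parse tree for: <assign> -> <ident> = <expr>"""
    tokens = [ch for ch in sentence if not ch.isspace()]
    pos = 0

    def expect(symbol):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1
        return symbol

    def ident():
        # <ident> -> A | B | C
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in "ABC":
            pos += 1
            return ("<ident>", tokens[pos - 1])
        raise SyntaxError(f"expected A, B or C at position {pos}")

    def expr():
        # <expr> -> ( <expr> )
        if pos < len(tokens) and tokens[pos] == "(":
            return ("<expr>", expect("("), expr(), expect(")"))
        # <expr> -> <ident> + <expr> | <ident> * <expr> | <ident>
        left = ident()
        if pos < len(tokens) and tokens[pos] in "+*":
            op = expect(tokens[pos])
            return ("<expr>", left, op, expr())
        return ("<expr>", left)

    tree = ("<assign>", ident(), expect("="), expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return tree

print(parse_assign("B = A"))
# ('<assign>', ('<ident>', 'B'), '=', ('<expr>', ('<ident>', 'A')))
```

The root tuple is labeled <assign>, every tuple is a non-terminal, and every plain string is a terminal leaf.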
Slide 20
Ambiguous Grammar
A grammar that generates a sentence which has two or more distinct parse trees is said to be an ambiguous grammar.
Eg. If we rewrite the grammar on slide 16 as below,
<assign> → <ident> = <expr>
<ident> → A|B|C
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <ident>
then the sentence A = B*C+A would have two distinct parse trees, and therefore the above grammar is ambiguous.
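The claim can be checked by counting parse trees. The sketch below counts the distinct parse trees that the ambiguous <expr> rules give to the right-hand side of the assignment (the <assign> and <ident> rules add no ambiguity). The counting approach and the function name are illustrative assumptions, not part of the slides.

```python
from functools import lru_cache

def count_expr_trees(expr: str) -> int:
    """Count parse trees for expr under:
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <ident>
    """
    tokens = tuple(ch for ch in expr if not ch.isspace())

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of parse trees deriving tokens[i:j] from <expr>.
        if j - i == 1:
            return 1 if tokens[i] in "ABC" else 0      # <expr> -> <ident>
        total = 0
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)               # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):
            if tokens[k] in "+*":                      # operator at the root
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_expr_trees("B*C+A"))   # 2
```

The two trees correspond to grouping the sentence as (B*C)+A and as B*(C+A).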
Slide 21
Derivation
Derivation is a mechanism by which the rules of a grammar are applied repeatedly to generate a sentence.
At each step, a non-terminal is replaced by the right-hand side of one of its rules, until the whole sentence is generated.
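A derivation can be sketched as repeated string replacement. The example below uses the grammar from the “A simple grammar” slide and always replaces the leftmost non-terminal (a leftmost derivation, which is one possible replacement order). The GRAMMAR dictionary layout and the function name are illustrative assumptions.

```python
import re

GRAMMAR = {
    "<assign>": ["<ident> = <expr>"],
    "<ident>": ["A", "B", "C"],
    "<expr>": ["<ident> + <expr>", "<ident> * <expr>", "( <expr> )", "<ident>"],
}
NONTERMINAL = re.compile(r"<[a-z_]+>")

def leftmost_derivation(start: str, choices: list[tuple[str, str]]) -> list[str]:
    """Apply the chosen (LHS, RHS) rules to the leftmost non-terminal, step by step."""
    form = start
    steps = [form]
    for lhs, rhs in choices:
        match = NONTERMINAL.search(form)          # leftmost non-terminal
        if match is None or match.group() != lhs:
            raise ValueError(f"{lhs} is not the leftmost non-terminal of {form!r}")
        if rhs not in GRAMMAR[lhs]:
            raise ValueError(f"{lhs} -> {rhs} is not a rule of the grammar")
        form = form[:match.start()] + rhs + form[match.end():]
        steps.append(form)
    return steps

steps = leftmost_derivation("<assign>", [
    ("<assign>", "<ident> = <expr>"),
    ("<ident>", "B"),
    ("<expr>", "<ident>"),
    ("<ident>", "A"),
])
print(" => ".join(steps))
# <assign> => <ident> = <expr> => B = <expr> => B = <ident> => B = A
```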