Syntax and Semantics Form and Meaning of Programming Languages Copyright © 2003-2015 by Curt Hill




Definitions

• Syntax: the form of the expressions, statements and units
• Semantics: the meaning of those expressions, statements and units
• What is needed for this course and beyond is a way to describe both clearly and unambiguously

Some Terminology

• Sentence
  – A string of characters using some alphabet
• Language
  – A set of sentences
  – Possibly infinite
• Lexeme
  – The most basic unit of the syntax
• Token
  – A class of lexemes

Programming Languages

• Here we also have characters and lexemes
• A token is a class of lexemes
  – Any lexeme is interchangeable with another lexeme of its own class as far as syntax is concerned
  – The swap may change the meaning, but not the form
• In English: nouns, verbs, etc.
  – Nouns are interchangeable, even though the meaning changes
• In programming languages: reserved words, punctuation, identifiers

Tokens and Lexemes

• The lexeme is the word or item from the language itself
• A token is the representation of the lexeme that is output by the scanner
• Tokens are often records or objects
• Tokens are often identified by an enumeration
• This may be enhanced by other information, such as an identifier's entry in a symbol table
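The idea of a token as a record identified by an enumeration can be sketched in Python. This is only an illustration under stated assumptions: the names `TokenKind`, `Token`, `scan` and the reserved-word set are hypothetical and do not come from the slides, and no symbol table is attached.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto


class TokenKind(Enum):
    """Hypothetical token classes; the enumeration identifies each token."""
    RESERVED = auto()
    IDENTIFIER = auto()
    CONSTANT = auto()
    PUNCTUATION = auto()


@dataclass(frozen=True)
class Token:
    kind: TokenKind   # the class of lexemes this token belongs to
    lexeme: str       # the text exactly as it appeared in the source


RESERVED_WORDS = {"if", "else", "while"}  # assumed, for illustration only

# One alternative per lexeme shape: word, integer, or single punctuation mark.
LEXEME = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(\S))")


def scan(source: str) -> list[Token]:
    """Split source text into lexemes and wrap each one in a Token."""
    tokens = []
    pos = 0
    source = source.rstrip()  # guarantees a lexeme follows any whitespace
    while pos < len(source):
        match = LEXEME.match(source, pos)
        word, number, punct = match.groups()
        if word is not None:
            kind = TokenKind.RESERVED if word in RESERVED_WORDS else TokenKind.IDENTIFIER
            tokens.append(Token(kind, word))
        elif number is not None:
            tokens.append(Token(TokenKind.CONSTANT, number))
        else:
            tokens.append(Token(TokenKind.PUNCTUATION, punct))
        pos = match.end()
    return tokens
```

Here `scan("if x1 = 42;")` yields five tokens; `x1` and any other identifier share the kind `IDENTIFIER`, which is what makes them interchangeable for syntax.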

Formal methods of describing syntax

• Two men worthy of note
  – Noam Chomsky
    • Noted linguist and political activist
    • Devised a hierarchy of languages
  – John Backus
    • FORTRAN
    • ALGOL 60
    • Backus Normal (Naur) Form

Chomsky Grammars

• All languages are defined by a grammar
• A grammar contains four pieces
  – V - an alphabet
    • The legal symbols, both terminals and non-terminals
  – T - the set of terminal symbols
    • Terminals may appear in the language, such as reserved words
    • Non-terminals may not appear
      – They are concepts or statements composed of terminals
  – P - a set of rewriting rules, called productions
  – Z - the distinguished symbol

More on Grammar

• A language is all the legal strings accepted by its grammar
• Terminals are those things that actually exist in the language
• Non-terminals are those things that only represent syntactic items
• For a parse to be complete all non-terminals must be rewritten into terminals
• Let's consider a simple example

Binary

• The grammar is G = {V, T, P, Z}
• The alphabet, terminals and non-terminals: V = {0, 1, Z, A}
• Terminals: T = {0, 1}
• Non-terminals must be Z and A
• Distinguished symbol is Z
• Productions are on the next screen

Productions

• P = {
    Z ::= A
    A ::= 1 A
    A ::= 0 A
    A ::= 0
    A ::= 1
  }
• A production allows us to rewrite from one form to another
• A non-terminal is on the left
• Terminals and non-terminals are on the right

Derive 101

Step                                 Result
Start with distinguished symbol      Z
Apply production: Z ::= A            A
Apply production: A ::= 1 A          1A
Apply production: A ::= 0 A          10A
Apply production: A ::= 1            101
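The derivation of 101 can be replayed mechanically. The sketch below is one possible encoding, not the author's: the list `PRODUCTIONS` and the function `derive` are hypothetical names, and each step rewrites the first occurrence of the production's left-hand side.

```python
# Productions of the binary grammar, in the order listed on the slide.
PRODUCTIONS = [
    ("Z", "A"),    # 0: Z ::= A
    ("A", "1A"),   # 1: A ::= 1 A
    ("A", "0A"),   # 2: A ::= 0 A
    ("A", "0"),    # 3: A ::= 0
    ("A", "1"),    # 4: A ::= 1
]


def derive(start, steps):
    """Apply productions by index, rewriting the first matching non-terminal.

    Returns every sentential form, beginning with the start symbol.
    """
    forms = [start]
    for index in steps:
        lhs, rhs = PRODUCTIONS[index]
        current = forms[-1]
        if lhs not in current:
            raise ValueError(f"{lhs} does not occur in {current!r}")
        forms.append(current.replace(lhs, rhs, 1))
    return forms


print(derive("Z", [0, 1, 2, 4]))
# ['Z', 'A', '1A', '10A', '101']
```

The derivation is complete because the final form contains only terminals.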

Chomsky Hierarchy

• Chomsky proposed a hierarchy of languages based on the strength of the rewriting rules
• There are four levels
  – Type 0 through Type 3
• Type 0 is the strongest (least restricted), Type 3 is the weakest
• In programming languages we are only interested in Types 3 and 2

Type 3 - regular languages

• U ::= N or U ::= WN
• U and W are non-terminals and N is a terminal
• A non-terminal may only be replaced by a terminal, or by a non-terminal followed by a terminal
  – The mirrored form U ::= NW (terminal first) also gives a regular grammar, provided the two forms are not mixed
• Often used for describing tokens
• Regular expressions describe languages of this type
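Since regular expressions describe Type 3 languages, the binary language from the earlier grammar (one or more digits drawn from 0 and 1) can also be written as a regular expression. A minimal sketch using Python's standard `re` module; the names `BINARY` and `is_binary` are made up for the example.

```python
import re

# Regular expression for the same language as the binary grammar:
# one or more binary digits.
BINARY = re.compile(r"[01]+")


def is_binary(text):
    """Return True when text is a sentence of the binary language."""
    return BINARY.fullmatch(text) is not None
```

`is_binary("101")` is true, while the empty string and `"102"` are rejected, matching what the grammar can and cannot derive.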

Type 2 - context free languages

• U ::= v
• U is in the set of non-terminals and v is any string of terminals and non-terminals
• A non-terminal may be replaced by any combination of terminals and non-terminals
  – The context of the non-terminal does not matter
• Most programming languages are context-free or have a few minor exceptions

Language Hierarchies

[Figure: the four language types; each type is contained in the one listed after it]

• Type 3 Regular
• Type 2 Context Free
• Type 1 Context Sensitive
• Type 0 Unrestricted

BNF

• In 1959 John Backus described ALGOL 58 with a notation similar to context-free grammars, independently of Chomsky
• Peter Naur extended it slightly in describing ALGOL 60
• It became known as BNF, for Backus Normal Form or Backus Naur Form
• A meta-language is a language that describes another language

BNF Again

• There are several notations for BNF; the production rules given above are one
• Like the Chomsky grammar there are non-terminals, terminals, productions and a start symbol
  – Each non-terminal represents some abstract concept in a language
  – There is often some notational way to distinguish a terminal from a non-terminal

Simplest notation

• Form of productions: LHS → RHS
• Where:
  – LHS is a non-terminal (context free and regular grammars)
  – RHS is any sequence of terminals and non-terminals, including empty
• There can be many productions with exactly the same LHS; these are alternatives
• If the RHS contains the LHS, the rule is recursive

Simple extensions

• Sometimes there is an alternation symbol, often the vertical bar, so that only one production is needed for each LHS
• Sometimes things enclosed in [ and ] are optional; they may be present zero or one times
• Sometimes things enclosed in { and } may be present 1 or more times
  – Thus [{x}] allows zero or more x items

More

• The extensions are often called EBNF
• Syntax graphs are equivalent to EBNF
• Syntax graphs tend to be easier to read

Simple Expressions

[Figure: syntax graphs for expression (terms joined by + or -), term (factors joined by * or /) and factor (constant, ident, or an expression in parentheses)]
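The three syntax graphs map naturally onto three mutually recursive routines. The sketch below assumes the graphs read as: expression is a term followed by any number of (+ or -) term pairs, term is a factor followed by any number of (* or /) factor pairs, and factor is a constant, an identifier, or a parenthesized expression. It evaluates as it parses; the names `tokenize`, `Parser`, `evaluate` and the `env` dictionary of identifier values are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split text into constants, identifiers, operators and parentheses."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per syntax graph: expression, term and factor."""

    def __init__(self, tokens, env=None):
        self.tokens = tokens
        self.pos = 0
        self.env = env or {}   # hypothetical identifier values

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        if token[0].isalpha() or token[0] == "_":
            return self.env[token]
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text, env=None):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    parser = Parser(tokenize(text), env)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

For instance, `evaluate("2 + 4 * (1 + 2)")` returns 14: because term sits below expression in the graphs, multiplication binds tighter than addition without any extra precedence rules.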

BNF is generative

• A derivation is sentence generation
• Leftmost derivation
  – At each step only the leftmost non-terminal is rewritten
  – This is usually the kind of derivation used by compilers
  – The previous derivation was leftmost
• There are also rightmost derivations
• The order of derivation does not affect the language defined

Example BNF productions

<program> → <stmts>
<stmts>   → <stmt> | <stmt> ; <stmts>
<stmt>    → <var> = <expr>
<var>     → a | b | c | d
<expr>    → <term> + <term> | <term> - <term>
<term>    → <var> | const

Example Derivation

<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
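A leftmost derivation like this one can be reproduced by always rewriting the first non-terminal in the current sentential form. The sketch below encodes the example grammar as a dictionary; `GRAMMAR`, `NON_TERMINAL` and `leftmost_derivation` are hypothetical names, and the caller picks which alternative to use at each step.

```python
import re

# The example grammar: each non-terminal maps to its list of alternatives.
GRAMMAR = {
    "<program>": ["<stmts>"],
    "<stmts>": ["<stmt>", "<stmt> ; <stmts>"],
    "<stmt>": ["<var> = <expr>"],
    "<var>": ["a", "b", "c", "d"],
    "<expr>": ["<term> + <term>", "<term> - <term>"],
    "<term>": ["<var>", "const"],
}

NON_TERMINAL = re.compile(r"<\w+>")


def leftmost_derivation(choices, start="<program>"):
    """Rewrite the leftmost non-terminal once per entry in choices.

    Each entry is the index of the alternative to use at that step.
    """
    forms = [start]
    for choice in choices:
        current = forms[-1]
        match = NON_TERMINAL.search(current)
        if match is None:
            raise ValueError("no non-terminal left to rewrite")
        rhs = GRAMMAR[match.group()][choice]
        forms.append(current[:match.start()] + rhs + current[match.end():])
    return forms


for form in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", form)
```

The eight choices shown reproduce the eight rewriting steps of the example, ending in `a = b + const`.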

Parse trees

• A multi-way tree where:
  – Each interior node is a non-terminal
  – Each leaf is a terminal
  – The start symbol is the root
  – Nested under each interior node is the RHS of the production, with the LHS being the node itself
• This is a handy data structure for compilers and the like

Example Parse Tree

[Figure: parse tree for a = b + const]

program
  stmts
    stmt
      var
        a
      =
      expr
        term
          var
            b
        +
        term
          const
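As a data structure, a parse tree needs little more than a symbol and a list of children per node. The sketch below builds the tree for `a = b + const` by hand and reads the sentence back from its leaves; the `Node` class and its `leaves` method are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # non-terminal name, or terminal text for a leaf
    children: list = field(default_factory=list)  # RHS of the production used at this node

    def leaves(self):
        """Return the terminals in left-to-right order."""
        if not self.children:
            return [self.symbol]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


tree = Node("program", [Node("stmts", [Node("stmt", [
    Node("var", [Node("a")]),
    Node("="),
    Node("expr", [
        Node("term", [Node("var", [Node("b")])]),
        Node("+"),
        Node("term", [Node("const")]),
    ]),
])])])

print(" ".join(tree.leaves()))
# a = b + const
```

Interior nodes hold non-terminals, leaves hold terminals, and the root is the start symbol, as the parse-tree definition requires.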

Ambiguity

• A grammar is ambiguous when two parse trees can be derived from the same input sequence
• Ambiguous grammars usually require some fix-up in the compiler to guarantee that only one tree will be chosen
• Many grammars for IF statements are ambiguous about which IF an ELSE belongs to
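Ambiguity can be checked by counting parse trees. The sketch below uses a deliberately ambiguous grammar that is not from the slides, E → E - E | n, and counts how many distinct trees derive a given token sequence; the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under the grammar E -> E - E | n."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "n" else 0
        total = 0
        # Try every "-" as the operator at the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "-":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("n - n - n".split()))
# 2
```

The two trees correspond to (n - n) - n and n - (n - n), which have different values, so a compiler has to pick one, typically by fixing associativity.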

BNF Problems

• BNF cannot capture important information
  – That a variable is defined
  – That an expression contains proper types
• Some problems like type checking could be done but would bulk out the grammar so much as to be unusable
  – Other problems, like declare before use in C++, are impossible to catch in BNF
• Many of these types of things are called Static Semantics

The Solution?

• Attribute Grammars
• An attempt to augment the syntax with static semantic information
• Associate with each production (and with nodes of the parse tree) a function that would check the static semantic information
• Check the attributes with a set of predicates

Attribute Grammars

• A context free grammar
• For each symbol there may be a set of attribute values
• A set of functions that define these attribute values from the attributes of the other symbols in the production

Example

Production                        Attribute
<exp> ::= <term>                  val(exp) = val(term)
<exp> ::= <exp> + <term>          val(exp) = val(exp) + val(term)
<term> ::= <term> * <factor>      val(term) = val(term) * val(factor)
<term> ::= <factor>               val(term) = val(factor)
<factor> ::= ident                val(factor) = val(ident)
<factor> ::= (<exp>)              val(factor) = val(exp)

Consider: 2 + 4 * (1 + 2)
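The `val` rules can be evaluated bottom-up over a parse tree. The sketch below makes two assumptions that are not in the slides: the expression to consider is read as 2 + 4 * (1 + 2), since the grammar has no implicit multiplication, and each number is treated as an `ident` leaf whose value is the number itself. Nodes are plain `(symbol, children)` tuples, and the helper `ident` is a hypothetical convenience.

```python
def val(node):
    """Compute the synthesized attribute val for a parse-tree node."""
    symbol, children = node
    if symbol == "ident":
        return children                  # leaf: the value bound to the identifier
    if symbol == "exp":
        if len(children) == 1:           # <exp> ::= <term>
            return val(children[0])
        left, _plus, right = children    # <exp> ::= <exp> + <term>
        return val(left) + val(right)
    if symbol == "term":
        if len(children) == 1:           # <term> ::= <factor>
            return val(children[0])
        left, _star, right = children    # <term> ::= <term> * <factor>
        return val(left) * val(right)
    if symbol == "factor":
        if len(children) == 1:           # <factor> ::= ident
            return val(children[0])
        _open, inner, _close = children  # <factor> ::= ( <exp> )
        return val(inner)
    raise ValueError(f"unknown symbol {symbol!r}")


def ident(value):
    """Build the subtree <factor> ::= ident for a given value."""
    return ("factor", [("ident", value)])


# Parse tree for 2 + 4 * (1 + 2), written out by hand.
inner = ("exp", [("exp", [("term", [ident(1)])]), "+", ("term", [ident(2)])])
product = ("term", [("term", [ident(4)]), "*", ("factor", ["(", inner, ")"])])
tree = ("exp", [("exp", [("term", [ident(2)])]), "+", product])

print(val(tree))
# 14
```

Each branch of `val` corresponds to one row of the production table, so the attribute of a node depends only on the attributes of its children.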

Second Example

• Consider declarations

Production                    Attribute
<decl> ::= <type> <list>      <type, names>
<type> ::= int                type = int
<type> ::= float              type = float
<list> ::= ident              names(list) = ident
<list> ::= ident , <list>     names(list) = ident names(list)

Now we can determine from the attributes whether an item is defined or not
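The declaration attributes amount to building a table of names and their types. The sketch below follows the recursive shape of the names(list) rule; the function names `names`, `declare` and `is_defined`, and the use of a dictionary as the table, are assumptions made for illustration.

```python
def names(list_text):
    """names(list) = ident, or ident followed by names of the rest of the list."""
    head, comma, rest = list_text.partition(",")
    ident = head.strip()
    if not ident.isidentifier():
        raise SyntaxError(f"bad identifier {ident!r}")
    return [ident] + (names(rest) if comma else [])


def declare(decl, symbols=None):
    """Evaluate <decl> ::= <type> <list> and record the type of each name."""
    symbols = {} if symbols is None else symbols
    type_name, _, rest = decl.strip().partition(" ")
    if type_name not in ("int", "float"):
        raise SyntaxError(f"unknown type {type_name!r}")
    for name in names(rest):
        symbols[name] = type_name
    return symbols


def is_defined(name, symbols):
    """An item is defined when some declaration put it in the table."""
    return name in symbols
```

After `table = declare("int a, b")` and `declare("float x", table)`, the table maps a and b to int and x to float, so `is_defined("a", table)` holds and `is_defined("z", table)` does not.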

YACC Uses

• YACC (Yet Another Compiler Compiler), like many similar programs, is a common UNIX tool for constructing compilers
• YACC uses an attribute grammar of sorts
  – Attached to each production is a function call
  – You get to write the function that does the checking at that point, including code generation

Conclusion and Summary

• Syntax is about the form of languages
• Semantics is about the meaning
• BNF represents a context free grammar