syntax of all sorts

Post on 21-Jan-2016

35 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Syntax of All Sorts. COS 441 Princeton University Fall 2004. Acknowledgements. Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford Good source of supplemental information http://theory.stanford.edu/people/jcm/books/cpl-teaching.html - PowerPoint PPT Presentation

TRANSCRIPT

Syntax of All Sorts

COS 441

Princeton University

Fall 2004

Acknowledgements

Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford– Good source of supplemental information

http://theory.stanford.edu/people/jcm/books/cpl-teaching.html– Differs in foundational approach when

compared to Harper’s approach will discuss this difference in later lectures

Interpreter vs Compiler

Source Program

Source Program

Compiler

Input OutputInterpreter

Input OutputTarget

Program

Interpreter vs Compiler

• The difference is actually a bit more fuzzy

• Some “interpreters” compile to native code– SML/NJ runs native machine code!– Does fancy optimizations too

• Some “compilers” compile to byte-code which is interpreted – javac and a JVM which may also compile

Interactive vs Batch

• SML/NJ is a native code compiler with an interactive interface

• javac is a batch compiler for bytecode which may be interpreted or compiled to native code by a JVM

• Python compiles to bytecode then interprets the bytecode it has both batch and interactive interfaces

• Terms are historical and misleading– Best to be precise and verbose

Typical CompilerSource Program

Lexical Analyzer

Syntax Analyzer

Semantic Analyzer

Intermediate Code Generator

Code Optimizer

Code Generator Target Program

Brief Look at Syntax

• Grammar e ::= n | e + e | e £ e n ::= d | nd d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• Expressions in languagee e £ e e £ e + e n £ n + n nd £ d + d dd £ d + d

… 27 £ 4 + 3

Grammar defines a languageExpressions in language derived by sequence of productions

Parse Tree

• Derivation represented by treee e £ e e £ e + e n £ n + n nd £ d + d dd £ d + d … 27 £ 4 + 3

e

e £

e +

e

e27

4 3

Tree shows parenthesization of expression

Dealing with Parsing Ambiguity

• Ambiguity– Expression 27 £ 4 + 3 can be parsed two ways– Problem: 27 £ (4 + 3) (27 £ 4) + 3

• Ways to resolve ambiguity– Precedence

• Group £ before + • Parse 3 £ 4 + 2 as (3 £ 4) + 2

– Associativity• Parenthesize operators of equal precedence to left (or right)• Parse 3 + 4 + 5 as (3 + 4) + 5

Rewriting the Grammar

• Harper describe a way to rewrite the grammar to avoid ambiguity

• Eliminating ambiguity in a parsing grammar is a dark art– Not always even possible– Depends on the parsing technique you use– Learn all you want to know and more in a compiler

book or course

• Ambiguity is bad when trying to think formally about programming languages

Abstract Syntax Trees

• We want to reason about expressions inductively– So we need an inductive description of

expressions

• We could use the parse tree– But it has all this silly term and factor stuff to

parsing unambiguous

• Introduce nicer tree after the parsing phase

Syntax Comparisons

Digits d ::= 0 | … | 9

Numbers n ::= d | n d

Expressions e ::= n

| e1 + e2

| e1 £ e2

Numbers n 2 N

Expressions e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

Digits d ::= 0 | … | 9

Numbers n ::= d | n d

Expressions e ::= t | t + e

Terms t ::= f | f £ t

Factors f ::= n | (e)

Abstract

Ambiguous Concrete

Unambiguous Concrete

Induction Over Syntax

• We can think of the CFG notation as defining a predicate exp

• Thinking about things this way lets us use rule induction on abstract syntax

times(E1,E2)expE1 exp E2 exp

times-eplus(E1,E2)exp

E1 exp E2 expplus-e

num[N] expN 2 N

num-e

A More General Pattern

• One might ask what is the general pattern of predicates defined by CFG that represent abstract syntax– represents a signature that assigns arities

to operators such a num[N], plus, and times

– Note the T1 … Tn are all unique

OP(T1,…,Tn) term

T1 term Tn term(OP) = n

An Example

succ(X)natX nat

Szero nat

Z

An Example

succ(X)natX nat

Szero nat

Z

Operator Arity

zero 0

succ 1

An Example

succ(X)natX nat

Szero nat

Z

n ::= zero

| succ(n)

Operator Arity

zero 0

succ 1

Another Example

succ(X)oddX even

S-O

zero evenZ-E

succ(X)evenX odd

S-E

Another Example

succ(X)oddX even

S-O

zero evenZ-E

succ(X)evenX odd

S-E

Operator Arity

zero 0

succ 1

even ::= zero

| succ(odd)

odd ::= succ(even)

Another Example

succ(X)oddX even

S-O

zero evenZ-E

succ(X)evenX odd

S-E

Operator Arity

zero 0

succ 1

Operator Arity

succ 1

even ::= zero

| succ(odd)

odd ::= succ(even)

NOT Abstract Syntax

eq(X,X) equal

A(X,succ(Y),succ(Z))A(X,Y,Z)

F(succ(succ(N)),Z)F(succ(N),X) F(N,Y) A(X,Y,Z)

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

datatype nat = zero | succ of nat

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

datatype exp = num of nat | plus of (exp * exp)| times of (exp * exp)

datatype nat = zero | succ of nat

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

datatype exp = Num of int | Plus of (exp * exp)| Times of (exp * exp)

Capitalize by convention

datatype nat = Zero | Succ of nat

A small cheat for performance

Abstract Syntax To ML

• Converting things is not always straightforward– Constructor names need to be distinct

• ML lacks some features that would make some things easier– Operator overloading – Interacts badly with type-inference

• Subtyping would improve things too– We can get by with predicates sometimes instead

Abstract Syntax To ML (cont.)

n ::= zero

| succ(n)

datatype nat = Zero | Succ of nat

even ::= zero

| succ(odd)

odd ::= succ(even)

Abstract Syntax To ML (cont.)

n ::= zero

| succ(n)

datatype nat = Zero | Succ of nat

even ::= zero

| succ(odd)

odd ::= succ(even)

datatype even = EvenZero| EvenSucc of (odd)and odd = OddSucc of even

Abstract Syntax To ML (cont.)

n ::= zero

| succ(n)

datatype nat = Zero | Succ of nat

fun even(Zero) = true|even(Succ(n))= odd(n)and odd(Zero) = false |odd(Succ(n)) = even(n)

even ::= zero

| succ(odd)

odd ::= succ(even)

Syntax Trees Are Not Enough

• Programming languages include binding constructs– variables function arguments, variable function

declarations, type declarations

• When reasoning about binding constructs we want to ignore irrelevant details – e.g. the actual “spelling” of the variable– we only that variables are the same or different

• Still we must be clear about the scope of a variable definition

Abstract Binding Trees

• Harper uses abstract binding trees (Chapter 5)– This solution is based on very recent work

M. J. Gabbay and A.M. Pitts, A New Approach to Abstract Syntax with Variable Binding, Formal Aspects of Computing 13(2002)341-363

• The solution is new but the problems are old!Alonzo Church. A set of postulates for the foundation of logic. The

Annals of Mathematics, Second Series, Volume 33, Issue 2 (Apr. 1932), 346-366.

• We will talk about the problem with binding today and deal with the solutions in the next lecture

Lambda Calculus

• Formal system with three parts– Notation for function expressions– Proof system for equations– Calculation rules called reduction

• Basic syntactic notions– Free and bound variables– Illustrates some ideas about scope of binding

• Symbolic evaluation useful for discussing programs

History

• Original intention– Formal theory of substitution

• More successful for computable functions– Substitution ! symbolic computation– Church/Turing thesis

• Influenced design of Lisp, ML, other languages

• Important part of CS history and theory

Expressions and Functions

• Expressionsx + y x + 2 £ y + z

• Functionsx. (x + y) z. (x + 2 £ y + z)

• Application(x. (x + y)) 3 = 3 + y(z. (x + 2 £ y + z)) 5 = x + 2 £ y + 5

Parsing: x. f (f x) = x.( f (f (x)) )

Free and Bound Variables

• Bound variable is “placeholder”– Variable x is bound in x. (x+y) – Function x. (x+y) is same function as z. (z+y)

• Compare x+y dx = z+y dz x P(x) = z P(z)

• Name of free (=unbound) variable does matter– Variable y is free in x. (x+y) – Function x. (x+y) is not same as x. (x+z)

• Occurrences– y is free and bound in x. ((y. y+2) x) + y

Reduction

• Basic computation rule is -reduction

(x. e1) e2 [xÃe2]e1

where substitution involves renaming as

needed

• Example:

(f. x. f (f x)) (y. y+x)

Rename Bound Variables

• Rename bound variables to avoid conflicts

(f. z. f (f z)) (y. y+x)

z. (z+x)+x

• Substitute “blindly”

(f. x. f (f x)) (y. y+x)

x. (x+x)+x

Reduction

(f. x. f (f x)) (y. y+x)

Reduction

(f. z. f (f z)) (y. y+x)

Rename bound “x” to “z” to avoid conflict with free “x”

Reduction

(f. z. f (f z)) (y. y+x)

[fÃ(y. y+x)](z. f (f z))

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) ([yÃz](y+x))

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) (z+x)

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) (z+x)

z. [yÃ(z+x)](y+x)

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) (z+x)

z. ((z+x)+x)

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

[fÃ(y. y+x)](x. f (f x))

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) ([yÃx](y+x))

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) (x+x)

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) (x+x)

x. [yÃ(x+x)](y+x)

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) (x+x)

x. ((x+x)+x)

Main Points About -calculus

• captures “essence” of variable binding– Function parameters– Bound variables can be renamed

• Succinct function expressions

• Simple symbolic evaluator via substitution

• Easy rule: always rename bound variables to be distinct

top related