Part III Learning structured representations Hierarchical Bayesian models


Page 1: Part III Learning structured representations Hierarchical Bayesian models

Part III

Learning structured representations
Hierarchical Bayesian models

Page 2: Part III Learning structured representations Hierarchical Bayesian models

S → NP VP
NP → Det Adj Noun RelClause
RelClause → Rel NP V
VP → VP NP
VP → Verb

Phrase structure

Utterance

Speech signal

Grammar

Universal Grammar: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)

Page 3: Part III Learning structured representations Hierarchical Bayesian models

Outline

• Learning structured representations
– grammars
– logical theories

• Learning at multiple levels of abstraction

Page 4: Part III Learning structured representations Hierarchical Bayesian models

Structured Representations

Unstructured Representations

A historical divide

(Chomsky, Pinker, Keil, ...)

(McClelland, Rumelhart, ...)

Innate knowledge vs. Learning

Page 5: Part III Learning structured representations Hierarchical Bayesian models

Innate Knowledge

Structured Representations

Learning

Unstructured Representations

Chomsky, Keil

McClelland, Rumelhart

Structure Learning

Page 6: Part III Learning structured representations Hierarchical Bayesian models

Representations

S → NP VP
NP → Det Adj Noun RelClause
RelClause → Rel NP V
VP → VP NP
VP → Verb

Grammars

Logical theories

Causal networks

asbestos

lung cancer

coughing chest pain

Page 7: Part III Learning structured representations Hierarchical Bayesian models

Representations

Chemicals Diseases

Bio-active substances

Biologicalfunctions

cause

affect

disrupt

interact with affect

Semantic networks

Phonological rules


Page 8: Part III Learning structured representations Hierarchical Bayesian models

How to learn a representation R

• Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)

• Prerequisites
– Put a prior over a hypothesis space of Rs.
– Decide how observable data are generated from an underlying R.


Page 10: Part III Learning structured representations Hierarchical Bayesian models

Context free grammar

S → N VP        N → “Alice”
VP → V          N → “Bob”
VP → V N        V → “scratched”
                V → “cheered”

Parse trees:

(S (N Alice) (VP (V cheered)))

(S (N Alice) (VP (V scratched) (N Bob)))

Page 11: Part III Learning structured representations Hierarchical Bayesian models

Probabilistic context free grammar

S → N VP (1.0)        N → “Alice” (0.5)
VP → V (0.6)          N → “Bob” (0.5)
VP → V N (0.4)        V → “scratched” (0.5)
                      V → “cheered” (0.5)

Derivation of “Alice cheered”:
(S (N Alice) (VP (V cheered)))
probability = 1.0 × 0.5 × 0.6 × 0.5 = 0.15

Derivation of “Alice scratched Bob”:
(S (N Alice) (VP (V scratched) (N Bob)))
probability = 1.0 × 0.5 × 0.4 × 0.5 × 0.5 = 0.05
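The derivation probabilities can be checked mechanically: a PCFG scores a derivation as the product of the probabilities of the rules it uses. A minimal sketch, with the rule probabilities read off the slide (including the 0.5 for the verb's lexical rule):

```python
# Each rule maps (left-hand side, right-hand side) to its probability.
rules = {
    ("S", ("N", "VP")): 1.0,
    ("VP", ("V",)): 0.6,
    ("VP", ("V", "N")): 0.4,
    ("N", ("Alice",)): 0.5,
    ("N", ("Bob",)): 0.5,
    ("V", ("scratched",)): 0.5,
    ("V", ("cheered",)): 0.5,
}

def derivation_prob(derivation):
    """Multiply the probabilities of the rules used in a derivation."""
    p = 1.0
    for rule in derivation:
        p *= rules[rule]
    return p

# "Alice cheered": S -> N VP, N -> Alice, VP -> V, V -> cheered
p1 = derivation_prob([
    ("S", ("N", "VP")), ("N", ("Alice",)),
    ("VP", ("V",)), ("V", ("cheered",)),
])

# "Alice scratched Bob": S -> N VP, N -> Alice, VP -> V N,
#                        V -> scratched, N -> Bob
p2 = derivation_prob([
    ("S", ("N", "VP")), ("N", ("Alice",)),
    ("VP", ("V", "N")), ("V", ("scratched",)),
    ("N", ("Bob",)),
])

print(p1)  # 1.0 * 0.5 * 0.6 * 0.5       = 0.15
print(p2)  # 1.0 * 0.5 * 0.4 * 0.5 * 0.5 = 0.05
```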

Page 12: Part III Learning structured representations Hierarchical Bayesian models

The learning problem

Grammar G:

S → N VP (1.0)        N → “Alice” (0.5)
VP → V (0.6)          N → “Bob” (0.5)
VP → V N (0.4)        V → “scratched” (0.5)
                      V → “cheered” (0.5)

Data D:

Alice scratched. Bob scratched. Alice scratched Alice. Alice scratched Bob. Bob scratched Alice. Bob scratched Bob.

Alice cheered. Bob cheered. Alice cheered Alice. Alice cheered Bob. Bob cheered Alice. Bob cheered Bob.

Page 13: Part III Learning structured representations Hierarchical Bayesian models

Grammar learning

• Search for the G that maximizes P(G | D) ∝ P(D | G) P(G)

• Prior:

• Likelihood: assume that sentences in the data are independently generated from the grammar.


(Horning 1969; Stolcke 1994)
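The scoring rule can be illustrated with a toy comparison. This is a sketch, not the slides' experiment: the two candidate "grammars" below are hypothetical distributions over whole sentences, and the independence assumption makes the likelihood a simple product:

```python
from math import prod

def posterior(grammars, priors, data):
    """P(G | D) ∝ P(D | G) P(G), with sentences generated i.i.d. from G."""
    scores = []
    for g, prior in zip(grammars, priors):
        likelihood = prod(g.get(s, 0.0) for s in data)  # independent sentences
        scores.append(prior * likelihood)
    z = sum(scores)
    return [s / z for s in scores]

# Hypothetical grammars: one tight, one that spreads mass over more sentences.
g_small = {"Alice cheered": 0.5, "Bob cheered": 0.5}
g_big = {s: 0.25 for s in ["Alice cheered", "Bob cheered",
                           "Alice scratched", "Bob scratched"]}

data = ["Alice cheered", "Bob cheered", "Alice cheered"]
post = posterior([g_small, g_big], [0.5, 0.5], data)
print(post)  # the tighter grammar wins: likelihood 0.125 vs 0.015625
```

The looser grammar also generates the data, but it wastes probability mass on sentences that never occur, so its likelihood (and hence posterior) is lower: a Bayesian Occam's razor.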

Page 14: Part III Learning structured representations Hierarchical Bayesian models


Experiment

• Data: 100 sentences

(Stolcke, 1994)

...

Page 15: Part III Learning structured representations Hierarchical Bayesian models


[Figure: the generating grammar and the model solution]

Page 16: Part III Learning structured representations Hierarchical Bayesian models

For all x and y, if y is the sibling of x then x is the sibling of y.

For all x, y and z, if x is the ancestor of y and y is the ancestor of z, then x is the ancestor of z.

Predicate logic

• A compositional language

Page 17: Part III Learning structured representations Hierarchical Bayesian models

Learning a kinship theory

Sibling(victoria, arthur), Sibling(arthur,victoria),

Ancestor(chris,victoria), Ancestor(chris,colin),

Parent(chris,victoria), Parent(victoria,colin),

Uncle(arthur,colin), Brother(arthur,victoria), … (Hinton; Quinlan; …)

Data D:

Theory T:

Page 18: Part III Learning structured representations Hierarchical Bayesian models

Learning logical theories

• Search for the T that maximizes P(T | D) ∝ P(D | T) P(T)

• Prior:

• Likelihood: assume that the data include all facts that are true according to T.

(Conklin and Witten; Kemp et al 08; Katz et al 08)

Page 19: Part III Learning structured representations Hierarchical Bayesian models

Theory-learning in the lab

Observed pairs over the six objects f, k, c, l, b, h:

R(f,k), R(f,c), R(k,c), R(f,l), R(k,l), R(c,l), R(f,b), R(k,b), R(c,b), R(l,b), R(f,h), R(k,h), R(c,h), R(l,h), R(b,h)

(cf Krueger 1979)

Page 20: Part III Learning structured representations Hierarchical Bayesian models

Theory-learning in the lab

R(f,k). R(k,c). R(c,l). R(l,b). R(b,h).

R(X,Z) ← R(X,Y), R(Y,Z).

Transitive:

Derived pairs: (f,k), (f,c), (k,c), (f,l), (k,l), (c,l), (f,b), (k,b), (c,b), (l,b), (f,h), (k,h), (c,h), (l,h), (b,h)
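The transitive rule can be chained mechanically until no new facts appear. A minimal sketch of forward chaining from the five base facts on the slide:

```python
# Base facts: a chain f -> k -> c -> l -> b -> h.
base = {("f", "k"), ("k", "c"), ("c", "l"), ("l", "b"), ("b", "h")}

def closure(facts):
    """Apply R(X,Z) <- R(X,Y), R(Y,Z) until fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(x, w) for (x, y) in facts for (z, w) in facts if y == z}
        if not new <= facts:
            facts |= new
            changed = True
    return facts

derived = closure(base)
print(len(derived))  # 15: every pair (x, y) with x earlier in the chain
```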

Page 21: Part III Learning structured representations Hierarchical Bayesian models

[Figure: learning time (complexity) plotted against theory length, for the transitive, transitive-with-exception, and Goodman conditions]

(Kemp et al 08)

Page 22: Part III Learning structured representations Hierarchical Bayesian models

Conclusion: Part 1

• Bayesian models can combine structured representations with statistical inference.

Page 23: Part III Learning structured representations Hierarchical Bayesian models

Outline

• Learning structured representations
– grammars
– logical theories

• Learning at multiple levels of abstraction

Page 24: Part III Learning structured representations Hierarchical Bayesian models

(Han and Zhu, 2006)

Vision

Page 25: Part III Learning structured representations Hierarchical Bayesian models

Motor Control

(Wolpert et al., 2003)

Page 26: Part III Learning structured representations Hierarchical Bayesian models

Causal learning

Causal models

Contingency data

Schema

chemicals

diseases

symptoms

Patient 1: asbestos exposure, coughing, chest pain

Patient 2: mercury exposure, muscle wasting

asbestos

lung cancer

coughing chest pain

mercury

minamata disease

muscle wasting

(Kelley; Cheng; Waldmann)

Page 27: Part III Learning structured representations Hierarchical Bayesian models

S → NP VP
NP → Det Adj Noun RelClause
RelClause → Rel NP V
VP → VP NP
VP → Verb

Phrase structure

Utterance

Speech signal

Grammar

Universal Grammar: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)

P(phrase structure | grammar)

P(utterance | phrase structure)

P(speech | utterance)

P(grammar | UG)

Page 28: Part III Learning structured representations Hierarchical Bayesian models

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:

P({ui}, {si}, G | U) = P({ui} | {si}) P({si} | G) P(G | U)

Hierarchical Bayesian model

P(G|U)

P(s|G)

P(u|s)

Page 29: Part III Learning structured representations Hierarchical Bayesian models

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

Infer {si} given {ui}, G:

P({si} | {ui}, G) ∝ P({ui} | {si}) P({si} | G)

Top-down inferences

Page 30: Part III Learning structured representations Hierarchical Bayesian models

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

Infer G given {si} and U:

P(G | {si}, U) ∝ P({si} | G) P(G | U)

Bottom-up inferences

Page 31: Part III Learning structured representations Hierarchical Bayesian models

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

Infer G and {si} given {ui} and U:

P(G, {si} | {ui}, U) ∝ P({ui} | {si}) P({si} | G) P(G | U)

Simultaneous learning at multiple levels
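All three inference patterns fall out of the same joint distribution. A toy sketch with invented discrete numbers: two candidate grammars, two phrase-structure values, and brute-force enumeration of P(G | {ui}, U):

```python
from itertools import product

# Hypothetical conditional probability tables (all numbers made up).
P_G = {"g0": 0.5, "g1": 0.5}                # P(G | U)
P_s = {"g0": {"a": 0.9, "b": 0.1},          # P(s | G)
       "g1": {"a": 0.2, "b": 0.8}}
P_u = {"a": {"x": 0.8, "y": 0.2},           # P(u | s)
       "b": {"x": 0.3, "y": 0.7}}

observed = ["x", "x", "y"]                  # utterances u1..u3

# Joint score for every (G, s1..s3); normalizing gives the posterior.
joint = {}
for G in P_G:
    for ss in product(P_s[G], repeat=len(observed)):
        p = P_G[G]
        for s, u in zip(ss, observed):
            p *= P_s[G][s] * P_u[s][u]
        joint[(G, ss)] = p
z = sum(joint.values())

# Marginalize out the phrase structures to get P(G | {ui}, U).
post_G = {G: sum(p for (g, _), p in joint.items() if g == G) / z for G in P_G}
print(post_G)  # g0 is favored: "x" is common under its preferred s = "a"
```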

Page 33: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

physical knowledge

FH, FT

θi ~ Beta(FH, FT) for each coin (Coin 1, Coin 2, …, Coin 200)

data for each coin i: d1 d2 d3 d4

• Qualitative physical knowledge (symmetry) can influence estimates of continuous parameters (FH, FT).

• Explains why 10 flips each of 200 coins are better than 2000 flips of a single coin: the many coins are more informative about FH, FT.

Coins
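A quick simulation (an illustration of the point, not the slide's model fit) shows why many coins reveal the hyperparameters while one coin cannot: only variation across coins exposes the shape of the population distribution.

```python
import random

random.seed(0)

def simulate(n_coins, flips_per_coin, a=6.0, b=6.0):
    """Flip n_coins coins whose biases are drawn from Beta(a, b)."""
    rates = []
    for _ in range(n_coins):
        theta = random.betavariate(a, b)
        heads = sum(random.random() < theta for _ in range(flips_per_coin))
        rates.append(heads / flips_per_coin)
    return rates

many_coins = simulate(n_coins=200, flips_per_coin=10)
one_coin = simulate(n_coins=1, flips_per_coin=2000)

# The spread of head-rates across coins is what pins down (FH, FT);
# a single coin, however many flips, says nothing about that spread.
mean = sum(many_coins) / len(many_coins)
var = sum((r - mean) ** 2 for r in many_coins) / len(many_coins)
print(mean, var)    # across-coin mean and variance are both observable
print(one_coin[0])  # a precise estimate of ONE bias, silent about the spread
```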

Page 34: Part III Learning structured representations Hierarchical Bayesian models

Word Learning

“This is a dax.” “Show me the dax.”

• 24 month olds show a shape bias

• 20 month olds do not

(Landau, Smith & Gleitman)

Page 35: Part III Learning structured representations Hierarchical Bayesian models

Is the shape bias learned?

• Smith et al (2002) trained 17-month-olds on labels for 4 artificial categories:

• After 8 weeks of training, 19-month-olds show the shape bias:

“wib”

“lug”

“zup” “div”

“This is a dax.” “Show me the dax.”

Page 36: Part III Learning structured representations Hierarchical Bayesian models

?

Learning about feature variability

(cf. Goodman)

Page 37: Part III Learning structured representations Hierarchical Bayesian models

?

Learning about feature variability

(cf. Goodman)

Page 38: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical model

Bag proportions

Data

mostly red

mostly brown

mostly green

Color varies across bags but not much within bags

Bags in general

mostly yellow

mostly blue?

Meta-constraints M

Page 39: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

[1,0,0] [0,1,0] [1,0,0] [0,1,0] [.1,.1,.8] …

α = 0.1 (within-bag variability)

β = [0.4, 0.4, 0.2]

[6,0,0] [0,6,0] [6,0,0] [0,6,0] [0,0,1] …

M

Bag proportions

Data

Bags in general

Meta-constraints

Page 40: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

[.5,.5,0] [.5,.5,0] [.5,.5,0] [.5,.5,0] [.4,.4,.2] …

α = 5 (within-bag variability)

β = [0.4, 0.4, 0.2]

[3,3,0] [3,3,0] [3,3,0] [3,3,0] [0,0,1] …

M

Bag proportions

Data

Bags in general

Meta-constraints
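Under the standard Dirichlet-multinomial version of this model (assuming each bag's proportions θ ~ Dirichlet(α·β)), the effect of α on predictions about a new bag has a closed form. A sketch: after one red marble from a new bag, the predictive distribution for the next draw is (α·β + counts) / (α + n).

```python
def predictive(alpha, beta, counts):
    """Posterior predictive for the next draw from a bag with theta ~ Dir(alpha*beta)."""
    n = sum(counts)
    return [(alpha * b + c) / (alpha + n) for b, c in zip(beta, counts)]

beta = [0.4, 0.4, 0.2]   # bags in general (red, brown, green)
one_red = [1, 0, 0]      # a single red marble drawn from the new bag

low = predictive(alpha=0.1, beta=beta, counts=one_red)
high = predictive(alpha=5.0, beta=beta, counts=one_red)
print(low)   # small alpha: bags are near-uniform in color, so expect red again
print(high)  # large alpha: colors mix within bags, prediction stays near beta
```

This is the marble analogue of the shape bias: a low inferred α means one example from a new category tells you a lot about the rest of it.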

Page 41: Part III Learning structured representations Hierarchical Bayesian models

Shape of the Beta prior

Page 42: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

Bag proportions

Data

Bags in general

Meta-constraints M

Page 43: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

M

Bag proportions

Data

Bags in general

Meta-constraints

Page 44: Part III Learning structured representations Hierarchical Bayesian models

Learning about feature variability

Categories in general

Meta-constraints

Individual categories

Data

M

Page 45: Part III Learning structured representations Hierarchical Bayesian models

“wib” “lug” “zup” “div”

Page 46: Part III Learning structured representations Hierarchical Bayesian models

“dax”

“wib” “lug” “zup” “div”

Page 47: Part III Learning structured representations Hierarchical Bayesian models

Model predictions

“dax”

“Show me the dax:”

Choice probability

Page 48: Part III Learning structured representations Hierarchical Bayesian models

Where do priors come from?

M

Categories in general

Meta-constraints

Individual categories

Data

Page 49: Part III Learning structured representations Hierarchical Bayesian models

Knowledge representation

Page 50: Part III Learning structured representations Hierarchical Bayesian models

Children discover structural form

• Children may discover that– Social networks are often organized into cliques– The months form a cycle– “Heavier than” is transitive– Category labels can be organized into hierarchies

Page 51: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

Form

Structure

Data

mouse

squirrel

chimp

gorilla

whiskers hands tail X

mouse

squirrel

chimp

gorilla

Tree

Meta-constraints M

Page 52: Part III Learning structured representations Hierarchical Bayesian models

A hierarchical Bayesian model

F: form

S: structure

D: data

mouse

squirrel

chimp

gorilla

Tree

whiskers hands tail X

mouse

squirrel

chimp

gorilla

Meta-constraints M

Page 53: Part III Learning structured representations Hierarchical Bayesian models

Structural forms

Order   Chain   Ring   Partition

Hierarchy   Tree   Grid   Cylinder

Page 54: Part III Learning structured representations Hierarchical Bayesian models

P(S|F,n): Generating structures

P(S | F, n) = 0 if S is inconsistent with F; otherwise P(S | F, n) ∝ θ^|S|

• Each structure is weighted by the number of nodes it contains, where |S| is the number of nodes in S.

[Figure: three example structures over mouse, squirrel, chimp, and gorilla, each with a different number of nodes]
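A minimal sketch of this prior, with hypothetical structure names and node counts: weighting each structure by θ^|S| with θ < 1 shifts prior mass toward structures with fewer nodes.

```python
# P(S | F) ∝ theta^{|S|}; theta < 1 penalizes each additional node.
theta = 0.5
node_counts = {"one-cluster": 1, "two-cluster": 2, "four-cluster": 4}

weights = {name: theta ** n for name, n in node_counts.items()}
z = sum(weights.values())
prior = {name: w / z for name, w in weights.items()}
print(prior)  # fewer nodes -> higher prior probability
```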

Page 55: Part III Learning structured representations Hierarchical Bayesian models

P(S|F, n): Generating structures from forms

• Simpler forms are preferred.

[Figure: P(S|F) over all possible graph structures S on nodes A, B, C, D, for the Chain and Grid forms]

Page 56: Part III Learning structured representations Hierarchical Bayesian models

whiskers hands tail X

mouse

squirrel

chimp

gorilla

A hierarchical Bayesian model

F: form

S: structure

D: data

mouse

squirrel

chimp

gorilla

Tree

Meta-constraints M

Page 57: Part III Learning structured representations Hierarchical Bayesian models

• Intuition: features should be smooth over graph S

Relatively smooth Not smooth

p(D|S): Generating feature data

Page 58: Part III Learning structured representations Hierarchical Bayesian models

Let fi be the feature value at node i. Features are generated from a Gaussian distribution that favors values varying smoothly over the graph: neighboring nodes i and j tend to take similar values. (Zhu, Lafferty & Ghahramani)

p(D|S): Generating feature data
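Smoothness over a graph can be scored with the Laplacian quadratic form f·L·f, which equals the sum of squared differences across edges; low roughness means high probability under a Gaussian prior like the one above. A sketch on a hypothetical 4-node chain, with invented feature values:

```python
def roughness(edges, f):
    """Sum of squared differences across edges; equals f^T L f
    for the graph Laplacian L of the (unweighted) edge list."""
    return sum((f[i] - f[j]) ** 2 for i, j in edges)

chain = [(0, 1), (1, 2), (2, 3)]   # mouse - squirrel - chimp - gorilla
smooth = [0.0, 0.1, 0.9, 1.0]      # e.g. "hands": changes once along the chain
rough = [1.0, 0.0, 1.0, 0.0]       # alternates at every edge

print(roughness(chain, smooth))  # small -> relatively probable feature
print(roughness(chain, rough))   # large -> improbable feature
```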

Page 59: Part III Learning structured representations Hierarchical Bayesian models

whiskers hands tail X

mouse

squirrel

chimp

gorilla

A hierarchical Bayesian model

F: form

S: structure

D: data

mouse

squirrel

chimp

gorilla

Tree

Meta-constraints M

Page 60: Part III Learning structured representations Hierarchical Bayesian models

Feature data: results

[Figure: structures found for an animals × features dataset and a judges × cases dataset]

Page 61: Part III Learning structured representations Hierarchical Bayesian models

Developmental shifts

110 features

20 features

5 features

Page 62: Part III Learning structured representations Hierarchical Bayesian models

Similarity data: results

[Figure: structure found for a colors × colors similarity dataset]

Page 63: Part III Learning structured representations Hierarchical Bayesian models

Relational data

Form

Structure

Data

Cliques

Meta-constraints M

[Figure: a clique structure over entities 1–8 and the corresponding 8 × 8 relational matrix]

Page 64: Part III Learning structured representations Hierarchical Bayesian models

Relational data: results

Primates Bush cabinet Prisoners

“x dominates y” “x tells y” “x is friends with y”

Page 65: Part III Learning structured representations Hierarchical Bayesian models

Why structural form matters

• Structural forms support predictions about new or sparsely-observed entities.

Page 66: Part III Learning structured representations Hierarchical Bayesian models
Page 67: Part III Learning structured representations Hierarchical Bayesian models

Cliques (n = 8/12) Chain (n = 7/12)

Experiment: Form discovery

Page 68: Part III Learning structured representations Hierarchical Bayesian models

Form

Structure

Data

mouse

squirrel

chimp

gorilla

whiskers hands tail blooded

mouse

squirrel ?

chimp ?

gorilla ?

Universal Structure grammar U

Page 69: Part III Learning structured representations Hierarchical Bayesian models

A hypothesis space of forms

[Figure: candidate forms, each paired with its generative process]

Page 70: Part III Learning structured representations Hierarchical Bayesian models

Conclusions: Part 2

• Hierarchical Bayesian models provide a unified framework which helps to explain:

– How abstract knowledge is acquired

– How abstract knowledge is used for induction

Page 71: Part III Learning structured representations Hierarchical Bayesian models

Outline

• Learning structured representations
– grammars
– logical theories

• Learning at multiple levels of abstraction

Page 72: Part III Learning structured representations Hierarchical Bayesian models

Handbook of Mathematical Psychology, 1963
