Part III: Learning structured representations / Hierarchical Bayesian models (2010. 8. 23.)


Page 1:

Part III

Learning structured representations
Hierarchical Bayesian models

Page 2:

[Figure: parse tree and grammar rules — S → NP VP; VP → VP NP; VP → Verb; NP → Det Adj Noun RelClause; RelClause → Rel NP V]

Phrase structure

Utterance

Speech signal

Grammar

Universal Grammar: hierarchical phrase-structure grammars (e.g., CFG, HPSG, TAG)

Page 3:

Outline

• Learning structured representations
  – grammars
  – logical theories

• Learning at multiple levels of abstraction

Page 4:

Structured representations

Unstructured representations

A historical divide

(Chomsky, Pinker, Keil, ...)

(McClelland, Rumelhart, ...)

Innate knowledge vs. learning

Page 5:

[Diagram: 2 × 2 grid of innate knowledge vs. learning against structured vs. unstructured representations. Chomsky and Keil: innate knowledge, structured representations. McClelland and Rumelhart: learning, unstructured representations. Structure learning occupies the learning + structured quadrant.]

Page 6:

Representations

[Figure: parse tree and grammar rules — S → NP VP; VP → VP NP; VP → Verb; NP → Det Adj Noun RelClause; RelClause → Rel NP V]

Grammars

Logical theories

Causal networks

[Causal network: asbestos → lung cancer; lung cancer → coughing, chest pain]

Page 7:

Representations

[Semantic network over Chemicals, Diseases, Bio-active substances, and Biological functions, with links labeled "cause", "affect", "disrupt", and "interact with"]

Semantic networks

Phonological rules

Page 8:

How to learn a representation R
• Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)
• Prerequisites
  – Put a prior over a hypothesis space of Rs.
  – Decide how observable data are generated from an underlying R.

Page 9:

How to learn a representation R
• Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)
• Prerequisites
  – Put a prior over a hypothesis space of Rs.
  – Decide how observable data are generated from an underlying R.

R can be anything.

Page 10:

Context-free grammar
S → N VP     VP → V       N → "Alice"   V → "scratched"
             VP → V N     N → "Bob"     V → "cheered"

[Parse trees: "Alice cheered" (S → N VP; N → Alice; VP → V; V → cheered) and "Alice scratched Bob" (S → N VP; N → Alice; VP → V N; V → scratched; N → Bob)]

Page 11:

Probabilistic context-free grammar
S → N VP (1.0)     VP → V (0.6)       N → "Alice" (0.5)   V → "scratched" (0.5)
                   VP → V N (0.4)     N → "Bob" (0.5)     V → "cheered" (0.5)

[Parse tree for "Alice cheered": probability = 1.0 × 0.5 × 0.6 = 0.3]
[Parse tree for "Alice scratched Bob": probability = 1.0 × 0.5 × 0.4 × 0.5 × 0.5 = 0.05]
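A PCFG derivation probability is just the product of the probabilities of the rules it uses. A minimal sketch (my construction, not the tutorial's code), using the rule probabilities labeled above:

```python
# Scoring a derivation under a PCFG by multiplying rule probabilities.
# Rules and probabilities follow the slide's grammar.
rule_prob = {
    ("S", ("N", "VP")): 1.0,
    ("VP", ("V",)): 0.6,
    ("VP", ("V", "N")): 0.4,
    ("N", ("Alice",)): 0.5,
    ("N", ("Bob",)): 0.5,
    ("V", ("scratched",)): 0.5,
    ("V", ("cheered",)): 0.5,
}

def derivation_probability(rules):
    """Probability of a derivation = product of its rule probabilities."""
    p = 1.0
    for r in rules:
        p *= rule_prob[r]
    return p

# "Alice scratched Bob": S -> N VP, N -> Alice, VP -> V N, V -> scratched, N -> Bob
p = derivation_probability([
    ("S", ("N", "VP")),
    ("N", ("Alice",)),
    ("VP", ("V", "N")),
    ("V", ("scratched",)),
    ("N", ("Bob",)),
])
print(p)  # 1.0 * 0.5 * 0.4 * 0.5 * 0.5 = 0.05
```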

Page 12:

The learning problem

Grammar G:
S → N VP (1.0)     VP → V (0.6)       N → "Alice" (0.5)   V → "scratched" (0.5)
                   VP → V N (0.4)     N → "Bob" (0.5)     V → "cheered" (0.5)

Data D:
Alice scratched.  Bob scratched.  Alice scratched Alice.  Alice scratched Bob.  Bob scratched Alice.  Bob scratched Bob.
Alice cheered.  Bob cheered.  Alice cheered Alice.  Alice cheered Bob.  Bob cheered Alice.  Bob cheered Bob.

Page 13:

Grammar learning
• Search for the G that maximizes P(G | D) ∝ P(D | G) P(G)
• Prior: a prior P(G) over grammars
• Likelihood: assume that sentences in the data are independently generated from the grammar.

(Horning 1969; Stolcke 1994)
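With independently generated sentences, the score being maximized factors into a log prior plus per-sentence log-likelihoods. A toy sketch (my construction; the priors and sentence probabilities below are hypothetical illustrative numbers, not from the tutorial):

```python
# Comparing two candidate grammars by unnormalized log posterior:
# log P(G|D) = log P(G) + sum_i log P(s_i|G) + const.
from math import log

def log_posterior_score(log_prior, sentence_probs):
    """Unnormalized log posterior of a grammar given i.i.d. sentences."""
    return log_prior + sum(log(p) for p in sentence_probs)

# Hypothetical numbers: grammar A is simpler (higher prior) but fits worse.
score_a = log_posterior_score(log(0.7), [0.01, 0.01, 0.01])
score_b = log_posterior_score(log(0.3), [0.05, 0.05, 0.05])
print(score_b > score_a)  # here the better-fitting grammar outweighs its lower prior
```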

Page 14:

Experiment

• Data: 100 sentences

(Stolcke, 1994)

...

Page 15:

[Figure: the generating grammar and the model's solution]

Page 16:

For all x and y, if y is the sibling of x then x is the sibling of y.

For all x, y and z, if x is the ancestor of y and y is the ancestor of z, then x is the ancestor of z.

Predicate logic

• A compositional language
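The two statements above can be written compositionally in predicate logic as:

```latex
\forall x\,\forall y\;\big(\mathrm{Sibling}(x, y) \rightarrow \mathrm{Sibling}(y, x)\big)

\forall x\,\forall y\,\forall z\;\big(\mathrm{Ancestor}(x, y) \wedge \mathrm{Ancestor}(y, z) \rightarrow \mathrm{Ancestor}(x, z)\big)
```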

Page 17:

Learning a kinship theory

Sibling(victoria, arthur), Sibling(arthur, victoria), Ancestor(chris, victoria), Ancestor(chris, colin), Parent(chris, victoria), Parent(victoria, colin), Uncle(arthur, colin), Brother(arthur, victoria) …

(Hinton, Quinlan, …)

Data D:

Theory T:

Page 18:

Learning logical theories
• Search for the T that maximizes P(T | D) ∝ P(D | T) P(T)
• Prior: a prior P(T) over theories
• Likelihood: assume that the data include all facts that are true according to T.

(Conklin and Witten; Kemp et al 08; Katz et al 08)

Page 19:

Theory-learning in the lab

[Figure: pairwise relations among six objects f, k, c, l, b, h — R(f,k); R(f,c), R(k,c); R(f,l), R(k,l), R(c,l); R(f,b), R(k,b), R(c,b), R(l,b); R(f,h), R(k,h), R(c,h), R(l,h), R(b,h)]

(cf. Krueger 1979)

Page 20:

Theory-learning in the lab
Facts: R(f,k). R(k,c). R(c,l). R(l,b). R(b,h).
Transitive rule: R(X,Z) ← R(X,Y), R(Y,Z).

[Grid: all pairs derivable by transitivity — f,k; f,c; f,l; f,b; f,h; k,c; k,l; k,b; k,h; c,l; c,b; c,h; l,b; l,h; b,h]
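The derived pairs above are the transitive closure of the observed facts under R(X,Z) ← R(X,Y), R(Y,Z). A minimal sketch (my construction), assuming the six objects form the chain f–k–c–l–b–h:

```python
# Deriving all pairs entailed by the observed facts plus the transitive
# rule R(X,Z) <- R(X,Y), R(Y,Z), via fixed-point iteration.
facts = {("f", "k"), ("k", "c"), ("c", "l"), ("l", "b"), ("b", "h")}

def transitive_closure(pairs):
    closure = set(pairs)
    while True:
        # One application of the rule to every pair of known facts.
        new = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if new <= closure:          # fixed point reached: nothing new derivable
            return closure
        closure |= new

tc = transitive_closure(facts)
print(len(tc))  # 15 pairs: every ordered pair along a 6-element chain
```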

Page 21:

[Figure: theory length ("complexity") predicts learning time; conditions: a transitive theory vs. a transitive theory with an exception (cf. Goodman)] (Kemp et al 08)

Page 22:

Conclusion: Part 1

• Bayesian models can combine structured representations with statistical inference.

Page 23:

Outline

• Learning structured representations
  – grammars
  – logical theories

• Learning at multiple levels of abstraction

Page 24:

(Han and Zhu, 2006)

Vision

Page 25:

Motor Control

(Wolpert et al., 2003)

Page 26:

Causal learning

Causal models

Contingency data

Schema

chemicals

diseases

symptoms

Patient 1: asbestos exposure, coughing, chest pain

Patient 2: mercury exposure, muscle wasting

[Causal models: asbestos → lung cancer → coughing, chest pain; mercury → Minamata disease → muscle wasting]

(Kelley; Cheng; Waldmann)

Page 27:

[Figure: parse tree and grammar rules — S → NP VP; VP → VP NP; VP → Verb; NP → Det Adj Noun RelClause; RelClause → Rel NP V]

Phrase structure

Utterance

Speech signal

Grammar

Universal Grammar: hierarchical phrase-structure grammars (e.g., CFG, HPSG, TAG)

P(phrase structure | grammar)

P(utterance | phrase structure)

P(speech | utterance)

P(grammar | UG)

Page 28:

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:

P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U)

Hierarchical Bayesian model

P(G|U)

P(s|G)

P(u|s)
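The factorization above licenses top-down (ancestral) sampling: draw G from P(G|U), each s_i from P(s_i|G), and each u_i from P(u_i|s_i). A toy sketch (my construction, with placeholder distributions; all names hypothetical):

```python
# Ancestral sampling in a three-level hierarchy U -> G -> s_i -> u_i,
# mirroring P({u_i}, {s_i}, G | U) = P({u_i}|{s_i}) P({s_i}|G) P(G|U).
import random

rng = random.Random(1)

def sample_grammar(U):       # stand-in for P(G | U)
    return rng.choice(U)

def sample_structure(G):     # stand-in for P(s | G)
    return f"{G}-struct-{rng.randint(0, 9)}"

def sample_utterance(s):     # stand-in for P(u | s)
    return f"utt({s})"

U = ["G1", "G2"]             # toy hypothesis space of grammars
G = sample_grammar(U)
data = [sample_utterance(sample_structure(G)) for _ in range(3)]
print(G, data)
```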

Page 29:

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

Infer {s_i} given {u_i} and G:

P({s_i} | {u_i}, G) ∝ P({u_i} | {s_i}) P({s_i} | G)

Top-down inferences

Page 30:

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

Infer G given {s_i} and U:

P(G | {s_i}, U) ∝ P({s_i} | G) P(G | U)

Bottom-up inferences

Page 31:

Phrase structure

Utterance

Grammar

Universal Grammar

u1 u2 u3 u4 u5 u6

s1 s2 s3 s4 s5 s6

G

U

Infer G and {s_i} given {u_i} and U:

P(G, {s_i} | {u_i}, U) ∝ P({u_i} | {s_i}) P({s_i} | G) P(G | U)

Simultaneous learning at multiple levels

Page 32:

Whole-object bias
Shape bias

"gavagai", "duck", "monkey", "car"

Words in general

Individual words

Data

Word learning

Page 33:

A hierarchical Bayesian model

[Graphical model: physical knowledge → hyperparameters F_H, F_T; coin-specific biases θ_1, θ_2, …, θ_200 with θ ~ Beta(F_H, F_T); each of Coin 1 … Coin 200 generates flips d_1 d_2 d_3 d_4]

• Qualitative physical knowledge (symmetry) can influence estimates of the continuous parameters (F_H, F_T).

• Explains why 10 flips of 200 coins are better than 2000 flips of a single coin: more informative about F_H, F_T.

Coins
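The 200-coins vs. one-coin point can be checked with the beta-binomial marginal likelihood. A minimal sketch (my construction, not the tutorial's code; the grid of (F_H, F_T) settings is illustrative):

```python
# How informative is each dataset about the Beta hyperparameters (F_H, F_T)?
# Marginal likelihood of one flip sequence under theta ~ Beta(fh, ft) is
# B(fh + heads, ft + tails) / B(fh, ft); we score a small hyperparameter grid.
from math import lgamma

def log_betabinom(heads, flips, fh, ft):
    """Log marginal likelihood of one sequence with `heads` heads in `flips`."""
    def logbeta(a, b):
        return lgamma(a) + lgamma(b) - lgamma(a + b)
    return logbeta(fh + heads, ft + flips - heads) - logbeta(fh, ft)

grid = [(0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (20.0, 20.0)]
# Dataset A: 200 coins, 10 flips each, each landing about half heads.
lls_a = [sum(log_betabinom(5, 10, fh, ft) for _ in range(200)) for fh, ft in grid]
# Dataset B: one coin, 2000 flips, half heads.
lls_b = [log_betabinom(1000, 2000, fh, ft) for fh, ft in grid]
print([round(x, 1) for x in lls_a])
print([round(x, 1) for x in lls_b])
```

Across the grid, dataset A's log marginal likelihood varies by hundreds of log units while dataset B's barely moves, so the 200-coin dataset pins down (F_H, F_T) far more tightly even though both contain 2000 flips.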

Page 34:

Word Learning

"This is a dax." "Show me the dax."

• 24-month-olds show a shape bias
• 20-month-olds do not

(Landau, Smith & Gleitman)

Page 35:

Is the shape bias learned?
• Smith et al (2002) trained 17-month-olds on labels for 4 artificial categories: "wib", "lug", "zup", "div"
• After 8 weeks of training, 19-month-olds show the shape bias:

“This is a dax.” “Show me the dax.”

Page 36:

?

Learning about feature variability

(cf. Goodman)

Page 37:

?

Learning about feature variability

(cf. Goodman)

Page 38:

A hierarchical model

Bag proportions

Data

mostly red

mostly brown

mostly green

Color varies across bags but not much within bags
Bags in general

mostly yellow

mostly blue?

Meta-constraints M

Page 39:

A hierarchical Bayesian model

[Example: within-bag variability = 0.1; bags-in-general color distribution = [0.4, 0.4, 0.2]; bag proportions [1,0,0], [0,1,0], [1,0,0], [0,1,0], [.1,.1,.8]; observed counts [6,0,0], [0,6,0], [6,0,0], [0,6,0], [0,0,1]]

M

Bag proportions

Data

Bags in general

Meta-constraints

Page 40:

A hierarchical Bayesian model

[Example: within-bag variability = 5; bags-in-general color distribution = [0.4, 0.4, 0.2]; bag proportions [.5,.5,0], [.5,.5,0], [.5,.5,0], [.5,.5,0], [.4,.4,.2]; observed counts [3,3,0], [3,3,0], [3,3,0], [3,3,0], [0,0,1]]

M

Bag proportions

Data

Bags in general

Meta-constraints
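The role of the within-bag variability parameter can be illustrated with a Dirichlet model of bag proportions. A minimal sketch (my construction: I model bag proportions as θ ~ Dirichlet(α · β), with α the within-bag variability and β the bags-in-general distribution; this parameterization is an assumption, not taken from the slides):

```python
# Small alpha -> each bag dominated by one color; large alpha -> bags mixed.
import random

def sample_dirichlet(alphas, rng):
    """Sample from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws) or 1e-300   # guard against float underflow at tiny alpha
    return [d / total for d in draws]

rng = random.Random(0)
beta = [0.4, 0.4, 0.2]             # bags-in-general color distribution
results = {}
for alpha in (0.1, 5.0):           # within-bag variability settings
    thetas = [sample_dirichlet([alpha * b for b in beta], rng) for _ in range(1000)]
    # Average size of the largest color proportion per bag:
    # near 1.0 means bags are nearly pure, lower means well mixed.
    results[alpha] = sum(max(t) for t in thetas) / len(thetas)
    print(alpha, round(results[alpha], 2))
```

With α = 0.1 almost every sampled bag is nearly all one color (as in the "mostly red / mostly brown" slide), while with α = 5 bags mirror the overall proportions.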

Page 41:

Shape of the Beta prior

Page 42:

A hierarchical Bayesian model

Bag proportions

Data

Bags in general

Meta-constraints M

Page 43:

A hierarchical Bayesian model

M

Bag proportions

Data

Bags in general

Meta-constraints

Page 44:

Learning about feature variability

Categories in general

Meta-constraints

Individual categories

Data

M

Page 45:

“wib” “lug” “zup” “div”

Page 46:

“dax”

“wib” “lug” “zup” “div”

Page 47:

Model predictions

“dax”

“Show me the dax:”

Choice probability

Page 48:

Where do priors come from?

Categories in general

Meta-constraints M

Individual categories

Data

Page 49:

Knowledge representation

Page 50:

Children discover structural form
• Children may discover that:
  – Social networks are often organized into cliques
  – The months form a cycle
  – "Heavier than" is transitive
  – Category labels can be organized into hierarchies

Page 51:

A hierarchical Bayesian model

Form

Structure

Data

[Tree structure over mouse, squirrel, chimp, gorilla]

[Data matrix: rows mouse, squirrel, chimp, gorilla; columns whiskers, hands, tail, X]

Tree

Meta-constraints M

Page 52:

A hierarchical Bayesian model

F: form

S: structure

D: data

[Tree structure over mouse, squirrel, chimp, gorilla]

Tree

[Data matrix: rows mouse, squirrel, chimp, gorilla; columns whiskers, hands, tail, X]

Meta-constraints M

Page 53:

Structural forms

Order   Chain   Ring   Partition

Hierarchy Tree Grid Cylinder

Page 54:

P(S|F,n): Generating structures

P(S | F, n) = 0 if S is inconsistent with F; P(S | F, n) ∝ θ^{|S|} otherwise

• Each structure is weighted by the number of nodes it contains: θ^{|S|}, where |S| is the number of nodes in S.

[Figure: three candidate structures over mouse, squirrel, chimp, gorilla]

Page 55:

P(S|F, n): Generating structures from forms
• Simpler forms are preferred

[Figure: P(S|F) plotted over all possible graph structures S (examples A, B, C, D), for the Chain and Grid forms]

Page 56:

[Data matrix: rows mouse, squirrel, chimp, gorilla; columns whiskers, hands, tail, X]

A hierarchical Bayesian model

F: form

S: structure

D: data

[Tree structure over mouse, squirrel, chimp, gorilla]

Tree

Meta-constraints M

Page 57:

• Intuition: features should be smooth over graph S

Relatively smooth Not smooth

p(D|S): Generating feature data

Page 58:

p(D|S): Generating feature data

Let f_i be the feature value at node i; neighboring nodes i and j should have similar feature values (Zhu, Lafferty & Ghahramani)
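The smoothness intuition can be captured by penalizing squared differences in feature values across the edges of S. A minimal sketch (my construction of the penalty; the actual model places a Gaussian prior with this kind of term in its energy):

```python
# Graph-smoothness score for a feature vector f over a structure S:
# sum of (f_i - f_j)^2 over edges; lower = smoother over the graph.
def smoothness_penalty(edges, f):
    return sum((f[i] - f[j]) ** 2 for i, j in edges)

chain = [(0, 1), (1, 2), (2, 3)]   # a 4-node chain structure
smooth = [1.0, 1.0, 0.0, 0.0]      # changes once along the chain
rough = [1.0, 0.0, 1.0, 0.0]       # alternates at every edge
print(smoothness_penalty(chain, smooth))  # 1.0
print(smoothness_penalty(chain, rough))   # 3.0
```

A structure that puts similar animals next to each other makes observed features like "has whiskers" smooth, and therefore more probable under p(D|S).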

Page 59:

[Data matrix: rows mouse, squirrel, chimp, gorilla; columns whiskers, hands, tail, X]

A hierarchical Bayesian model

F: form

S: structure

D: data

[Tree structure over mouse, squirrel, chimp, gorilla]

Tree

Meta-constraints M

Page 60:

Feature data: results

[Figures: structures learned from animals × features data and from judges × cases data]

Page 61:

Developmental shifts

110 features

20 features

5 features

Page 62:

Similarity data: results

[Figure: structure learned from colors × colors similarity data]

Page 63:

Relational data

Form

Structure

Data

Cliques

Meta-constraints M

[Figure: clique structure over items 1–8 with the corresponding 8 × 8 relational matrix]

Page 64:

Relational data: results

Primates ("x dominates y"), Bush cabinet ("x tells y"), Prisoners ("x is friends with y")

Page 65:

Why structural form matters

• Structural forms support predictions aboutnew or sparsely-observed entities.

Page 66:

Page 67:

Cliques (n = 8/12) Chain (n = 7/12)

Experiment: Form discovery

Page 68:

Form

Structure

Data

[Tree structure over mouse, squirrel, chimp, gorilla]

[Data matrix: rows mouse, squirrel, chimp, gorilla; columns whiskers, hands, tail, warm-blooded; warm-blooded is unobserved (?) for squirrel, chimp, and gorilla]

Universal Structure grammar U

Page 69:

A hypothesis space of forms

[Table: forms paired with their generative processes]

Page 70:

Conclusions: Part 2

• Hierarchical Bayesian models provide a unified framework which helps to explain:

– How abstract knowledge is acquired

– How abstract knowledge is used for induction

Page 71:

Outline

• Learning structured representations
  – grammars
  – logical theories

• Learning at multiple levels of abstraction

Page 72:

Handbook of Mathematical Psychology, 1963