From Performance to Competence: Language evolution, language change, and evidence that language is shaped by usage

EVELIN 2012, Psycholinguistic and Grammar, Lecture 1

Florian Jaeger, Human Language Processing Lab, http://www.hlp.rochester.edu/




Please interrupt when you have questions

[2]


Email list

• Please subscribe to our email list

https://groups.google.com/group/evelin2012-psycholinguistic/subscribe

All updates will be sent to this address rather than the previous email list I used.

[3]


Goal of this class

• Present and discuss data and insights from psycholinguistics that should be of relevance to linguistics, and in doing so, give an introduction to a few current views and issues in psycholinguistics.

• Our entry point will be language use, which some treat as orthogonal to the study of linguistics (e.g. certain approaches in generative grammar), whereas others consider it pivotal (e.g. so-called functionalist linguistics).

[4]


Language use (performance) & Grammar (competence)?

• Competence vs. performance:
  – Ontologically (i.e. by definition), such a distinction can be drawn.
  – But we can ask another question, and in some sense it’s a personal question: Is this distinction, in the definition offered by others, useful or even relevant to (your) research? Is it productive in that it generates predictions that can be investigated? Etc.

• An old claim (let’s call it the functionalist hypothesis, even though it’s older): Grammar cannot be studied without reference to language use. [e.g. BatesMacWhinney82,89; Bybee01,02; Givon91,92,01; Hawkins94,01,02,04,07; Hockett60; Langacker91,92; Slobin73]

[5]


If so

‘Grammar’ cannot be understood without psycholinguistics and sociolinguistics as the studies of language use.

[6]


What is meant by this?

• What are the properties of ‘grammar’ in need of explanation?
  – Language acquisition
    • The mere fact that it is possible (cf. poverty of stimulus, subset problem)
    • The time course of language acquisition
  – Patterns of language change
  – Typological distributions (‘Universals’)

• Two interpretations of the functionalist hypothesis:
  – Strong: All of the above properties can be explained without reference to arbitrary linguistic biases/rules/constraints.
  – Weak: Some/many of the above properties can be explained without reference to arbitrary linguistic biases/rules/constraints.

[7]


Types of universals

• Categorical
  – Absolute: Every/No language has property X
  – Implicational: Every language that has property X also has property Y

• Gradient/Statistical
  – E.g. word order preferences:

[based on Tomlin 1986]

  – E.g. Every language that has property X tends to also have property Y

• I’ll use the term ‘universal’ for all of these, for convenience’s sake.

[8]


The functionalist hypothesis for typological generalizations

• Functional pressures on language change: Grammatical properties may be observed more often across languages because they improve a language’s ‘utility’. [e.g. BatesMacWhinney82,89; Bybee01,02; ChristiansenChater08; Croft04; Givon91,92,01; Hawkins94,01,02,04,07; Hockett60; Langacker91; Slobin73; Zipf49]

[9]


Biological vs. cultural evolution

Biological evolution

Cultural evolution/language change ‘transmission’

[10]

[taken from Nowak et al. 2002‐Nature]


The functionalist hypothesis for typological generalizations

• Functional pressures on language change: Grammatical properties may be observed more often across languages because they improve a language’s ‘utility’. [e.g. BatesMacWhinney82,89; Bybee01,02; ChristiansenChater08; Croft04; Givon91,92,01; Hawkins94,01,02,04,07; Hockett60; Langacker91; Slobin73; Zipf49]

• This is an intriguing possibility, as it promises to reduce the number of cognitively arbitrary (i.e. language-specific) properties of language that we need to explain.

[11]


Challenges

1. ‘Transmission problem’: Where do hypothesized pressures operate? That is, how would such pressures come to shape language over time?
  – Biases on language acquisition, changing the structures acquired by the next generation [Lecture 2]
  – Biases operating throughout adult life that change the output provided to the next generation (would imply linguistic malleability throughout life) [Lecture 3, 4]

[12]


Challenges

2. What is ‘utility’? What is good? [Lecture 1 & 4]
  – Learnability [cf. Deacon98; Slobin76; Newport81; ChristiansenChater08]
  – Ease of processing, e.g.: Minimization of memory cost [cf. GildeaTemperley08; Hawkins94,01,02,04,07,09; Levy05; Tily11]
  – Trade-off between production and comprehension effort [Zipf, 1935, 1949; Levy & Jaeger, 2007]
  – Efficient and robust communication [cf. Aylett and Turk, 2004; FerreriCancho05,07,10; GenzelCharniak02,03; Jaeger06,10; LevyJaeger07; PiantadosiTilyGibson11; QianJaeger09,10,submitted]

We need to define utility based on clear principles that are validated against psycholinguistic (and sociolinguistic) data.

[13]


[Hawkins, 2004]

[14]


[15]


Plan

• Today:
  – Crash course on word recognition and some aspects of sentence processing
  – Typological and diachronic evidence that languages across the world are shaped by general learning biases, processing preferences and communicative pressures. [Gildea and Temperley, 2010; Hawkins, 2004; Manin, 2006; Piantadosi et al., 2011a,b; Zipf, 1935, 1949]
  – Introduction to fundamentals of information theory

[16]


Readings

• Required:
  – Pinker (2000), 2 pp
  – Jaeger and Tily (2011), 7 pp

[17]


Part 2

Let’s start with a well-known fact about the mental lexicon

[18]


The frequency ~ length correlation

German

[19]

[taken from Zipf 1935:23; based on Kaeding 1928]
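The frequency ~ length correlation can be quantified directly from a corpus. A minimal sketch, assuming whitespace-tokenized text and word length counted in letters (syllables or phonemes would be equally reasonable units; this is not Zipf's own procedure): it computes a Spearman rank correlation between word-type frequency and word-type length. The function names and the toy corpus are hypothetical.

```python
from collections import Counter


def ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def frequency_length_correlation(tokens):
    """Spearman correlation between word-type frequency and word length (letters)."""
    counts = Counter(tokens)
    types = list(counts)
    freqs = [counts[w] for w in types]
    lengths = [len(w) for w in types]
    # Spearman's rho is the Pearson correlation of the rank-transformed variables.
    return pearson(ranks(freqs), ranks(lengths))


toy = "a a a a a the the the the of of of cat cat elephant".split()
print(frequency_length_correlation(toy))
```

A negative value (about -0.82 for this toy corpus) means that more frequent types tend to be shorter. On real data one would use a large corpus and could replace the hand-rolled ranking with `scipy.stats.spearmanr`.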


American English

[20][taken from Zipf 1935:28]


The Principle of Least Effort [Zipf 1949:1]

In simple terms, the Principle of Least Effort means, for example, that a person in solving his immediate problems will view these against the background of his probable future problems as estimated by himself. […] The person will strive to minimize the probable average rate of his work-expenditure (over time).

[emphasis in original; Zipf attributes the roots of similar ideas to Maupertuis in the 18th century]

[21]


Two opposing forces

• Prior to considering that language use might (among other things) serve to communicate:
  – Speaker economy (force of unification): map all meanings onto the same (short) word /ə/
  – Hearer economy (force of diversification): map each meaning onto a different word

These two forces together are assumed to affect (through diachronic change) the structure of the mental lexicon.
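Zipf describes the two forces informally. One way to make them concrete, an operationalization assumed here and not Zipf's own (information theory postdates his proposal): treat speaker effort as the expected word length, and hearer effort as the uncertainty about the intended meaning that remains after hearing the word, the conditional entropy H(M | W) in bits. The four meanings, their uniform probabilities, and the example forms are hypothetical.

```python
import math
from collections import defaultdict


def speaker_cost(lexicon, p_meaning):
    """Expected length (in segments) of the word the speaker must produce."""
    return sum(p * len(lexicon[m]) for m, p in p_meaning.items())


def hearer_cost(lexicon, p_meaning):
    """H(M | W) in bits: uncertainty about the meaning left after hearing the word."""
    p_word = defaultdict(float)
    for m, p in p_meaning.items():
        p_word[lexicon[m]] += p
    h = 0.0
    for m, p in p_meaning.items():
        if p > 0:
            # Subtract p(m, w) * log2 p(m | w), with w = lexicon[m].
            h -= p * math.log2(p / p_word[lexicon[m]])
    return h


p_meaning = {"m1": 0.25, "m2": 0.25, "m3": 0.25, "m4": 0.25}
unified = {m: "ə" for m in p_meaning}  # force of unification: one short word
diversified = {"m1": "ba", "m2": "bi", "m3": "bu", "m4": "bo"}  # one word per meaning

for name, lex in [("unified", unified), ("diversified", diversified)]:
    print(name, speaker_cost(lex, p_meaning), hearer_cost(lex, p_meaning))
```

Under these assumptions the unified lexicon minimizes speaker cost (1 segment) but leaves the hearer with 2 bits of uncertainty, while the diversified lexicon removes that uncertainty at twice the articulatory cost.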

[23]


• Zipf’s insights and speculations preceded two important events that are crucial to his general idea:
  – The formulation of information theory (Shannon, 1948)
  – The rise of modern psycholinguistics (1960s): e.g. What makes a word easy to produce or recognize?

[24]


Part 3

Information theory (light)


Information theory [Shannon, 1948]

[26]


Information

• Shannon information, for example, of a word:

  $I(w) = \log \frac{1}{p(w)} = -\log p(w)$

  – Log often taken to base 2, so units of information are “bits”

• Intuitive properties:
  – 0 bits of new information if something is perfectly predictable (cf. surprisal)
  – The less predictable something is, the more new information it carries
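A minimal sketch of this definition, taking the log to base 2 so the result is in bits; the probabilities in the loop are arbitrary illustrations.

```python
import math


def information_bits(p):
    """Shannon information I = log2(1 / p) of an outcome with probability p, in bits."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)


for p in (1.0, 0.5, 0.25, 0.01):
    print(f"p = {p}: {information_bits(p):.2f} bits")
```

A perfectly predictable outcome (p = 1) carries 0 bits, each halving of the probability adds one bit, and an outcome with p = 0.01 carries about 6.64 bits.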

[27]


Communication through a noisy channel

[Figure 1 from Shannon 1949]

[28]


Shannon’s noisy channel theorem (for mere mortals like us)

[Shannon, 1948; Wolfowitz, 1961]

• For any noisy digital channel with capacity C > 0 and any rate of information transmission 0 < R < C, there is a finite sequence of n codes (a finite language with 1 ≤ n < ∞ words) so that
  – The number of possible words increases exponentially with the maximum length l < n
  – The error probability decreases exponentially with n

  If R < C, it is possible to communicate at an arbitrarily low error rate.

• The converse holds, too: for any R > C, the error probability will converge to 1 the larger the codeword vocabulary.
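To make ‘capacity’ concrete, here is a standard textbook example that is not on the slide: a binary symmetric channel, which flips each transmitted bit with probability p, has capacity C = 1 − H(p) bits per transmitted symbol, where H is the binary entropy. A short sketch (the function names are our own):

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary variable that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_prob):
    """Capacity (bits per transmitted symbol) of a binary symmetric channel."""
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    return 1.0 - binary_entropy(flip_prob)


for p in (0.0, 0.01, 0.11, 0.5):
    print(f"flip probability {p}: capacity {bsc_capacity(p):.3f} bits/symbol")
```

A noiseless channel carries 1 bit per symbol; at p = 0.11 capacity drops to roughly 0.5 bits per symbol; at p = 0.5 the output is independent of the input, so C = 0. By the theorem, any rate R below these values is achievable with an arbitrarily low error rate, given long enough codes.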

[29]


Noisy Channel Theorem

• The channel capacity defines the maximal rate of information per time step/sent signal that allows communication at an arbitrarily low error rate.

• An optimal code then transmits information at an average rate close to, but not exceeding, the channel capacity.
  – Constant Entropy Rate [Genzel and Charniak, 2002]
  – Smooth Signal Redundancy [Aylett and Turk, 2004]
  – Uniform Information Density [Jaeger, 2006; Levy and Jaeger, 2007]
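The intuition behind these proposals can be illustrated with made-up numbers: two hypothetical ways of encoding the same 12 bits of information, one spread evenly across four units, one concentrated in a single peak. Assuming, for illustration only, a channel that can carry 4 bits per unit, only the peaked encoding exceeds capacity anywhere. This sketches the intuition; it is not an implementation of any of the cited models.

```python
from statistics import pvariance

CAPACITY_BITS = 4.0  # hypothetical per-unit channel capacity


def units_over_capacity(surprisals, capacity=CAPACITY_BITS):
    """Number of units whose information (in bits) exceeds the channel capacity."""
    return sum(1 for s in surprisals if s > capacity)


uniform = [3.0, 3.0, 3.0, 3.0]  # information spread evenly
peaked = [1.0, 1.0, 1.0, 9.0]   # same total, concentrated on one unit

for name, profile in [("uniform", uniform), ("peaked", peaked)]:
    print(name, sum(profile), pvariance(profile), units_over_capacity(profile))
```

Both profiles carry the same total information; the uniform one has zero variance and never exceeds the assumed capacity, which is the sense in which a uniform distribution of information is argued to be efficient.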

[30]


Part 4

Some background on word recognition


Spoken word recognition


Spoken word recognition

The problem: Spoken words unfold as a series of transient acoustic events extending over a few hundred ms, without reliable cues to word boundaries.

Imagine reading this page through a two-letter aperture, the text scrolling past without spaces separating the words, at a variable rate one could not control, with the visual features for each letter arriving asynchronously.


We don’t say what we hear

• Most of what we hear isn’t really in the speech stream, or at least not in the sequential order that one might think.

[Figure 2 from Johnson, 2004]

[34]


Variability (noise) in production

• Even when phonemes are realized, they don’t map deterministically onto acoustic dimensions.

• The acoustic signal created by speakers maps linguistic categories probabilistically onto acoustic dimensions (e.g. energy distributions over frequencies).

[from Fox and Vaughn, in prep]

35


Perceptual noise

• The human nervous system is a biological system, and as such it exhibits noisy responses to stimuli. Consider, for example, a neuron’s response rate to its preferred stimulus (e.g. a line with a given spatial orientation):

[Figure: Noise at the neural level. Axes: neural activity (one trial); preferred stimulus]

36


An illustration: Noisy visual input during reading


Over time


Speech perception: A challenging task!

• Listeners have to extract features and phonemes out of the speech stream

• Perception is noisy

• Lack of invariance [cf. Lecture 3]
  – for segments and for speakers
  – Sounds can be affected by phonetic context, syllabic stress, prosody & intonation, speaking rate, and emotional state

• So, how do we do it?


Integration of bottom-up noisy signal with top-down expectations

[McClelland] [Rumelhart]

[46]

[Rumelhart and McClelland 1981; this is a model of visual word recognition, but similar models have been proposed for spoken word recognition]
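The interactive activation model combines bottom-up and top-down information through excitatory and inhibitory connections between feature, letter, and word units. The sketch below swaps in a simpler, explicitly probabilistic formulation (Bayes' rule over candidate words) to show the same qualitative idea; the candidate words and all numbers are hypothetical.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(word | signal) is proportional to P(signal | word) * P(word)."""
    unnormalized = {w: likelihoods[w] * priors[w] for w in likelihoods}
    total = sum(unnormalized.values())
    return {w: v / total for w, v in unnormalized.items()}


# Top-down expectation, e.g. from context, favors "beaker".
priors = {"beaker": 0.8, "beetle": 0.2}

# Case 1: the bottom-up signal is fully ambiguous, so the prior decides.
ambiguous = {"beaker": 0.5, "beetle": 0.5}
# Case 2: the bottom-up signal clearly favors "beetle" and overrides the prior.
clear = {"beaker": 0.05, "beetle": 0.95}

print(posterior(ambiguous, priors))
print(posterior(clear, priors))
```

With an ambiguous signal the posterior simply mirrors the expectation (0.8 for "beaker"); with a clear signal the bottom-up evidence wins (about 0.83 for "beetle").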


Sine Wave Speech

[Audio examples: Tones 1 and 2; Tones 1 and 3; Tones 2 and 3; All three; Original speech]

Source: http://www.haskins.yale.edu/haskins/misc/SWS/tonecombo.html


Two types of top-down knowledge (illustrated for visual word recognition)

• The ‘word superiority’ effect
  – Captures degraded or partial input
  – In online processing, this effect is present, too: We are faster at recognizing letters in words than in non-words.

• The ‘pronounceable non-word’ effect
  – In online processing, we are faster to recognize orthographically licensed forms.


How can one study spoken word recognition?

[Figure: head-mounted eye-tracker with eye camera and scene camera; instruction: “Pick up the beaker”]

[Allopenna et al. 1998]


Modern eye-trackers

[50]


[Figure: Trial sequence and resulting proportion of fixations over time (200 ms marked). Instructions: “Look at the cross. Click on the beaker.” Target = beaker, Cohort = beetle, Unrelated = carriage]


Results

?


Another top-down effect: Frequency effects

• Evidence for stronger activation of more frequent words in lexical processing of speech (Dahan et al. 2001)

53


Back to Zipf

• We are biased towards expecting more frequent words.

• Think about the fact that frequent words are recognized faster even when they are equally long.

• Not only does it save production effort to make frequent words phonologically shorter (as Zipf speculated), it also seems that less bottom-up signal is required to understand them!

• Interestingly, more frequent words that have the same number of phonemes still tend to be pronounced with shorter duration in speech. [e.g. Gahl, 2008]
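Why would frequent words need less bottom-up signal? A toy model, assuming a single competitor, independent noisy samples that each favor the target by a fixed likelihood ratio, a hypothetical recognition threshold on the posterior, and the prior probability of the target standing in for word frequency. It counts how many samples are needed before the target is recognized; it is an illustration, not a model from the cited studies.

```python
def samples_needed(prior, likelihood_ratio, threshold=0.95):
    """Noisy samples needed before P(target | samples) reaches the threshold.

    Each sample multiplies the posterior odds (target vs. competitor) by
    likelihood_ratio; the prior probability of the target plays the role
    of word frequency.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be in (0, 1)")
    if not 0 < threshold < 1:
        raise ValueError("threshold must be in (0, 1)")
    if likelihood_ratio <= 1:
        raise ValueError("likelihood_ratio must exceed 1, or the target is never recognized")
    odds = prior / (1 - prior)
    target_odds = threshold / (1 - threshold)
    n = 0
    while odds < target_odds:
        odds *= likelihood_ratio
        n += 1
    return n


for prior in (0.9, 0.5, 0.1):  # high-, mid-, low-frequency word
    print(prior, samples_needed(prior, likelihood_ratio=2.0))
```

Under these assumptions the high-prior word is recognized after 2 samples and the low-prior word only after 8, which is consistent with the idea that frequent words can afford a shorter signal.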

[54]


More top-down knowledge: contextual predictability

eat + (something edible)

“I’m gonna eat the…”


Surprisal

• Hale (2001) proposed that a word’s complexity in sentence comprehension is determined by its surprisal, an old measure borrowed from information theory (it’s the same as Shannon information, computed for the word given its preceding context: $-\log p(w_i \mid w_1 \ldots w_{i-1})$)
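Hale's original proposal derives surprisal from a probabilistic context-free grammar. As a much simpler stand-in, the sketch below estimates each word's surprisal, −log2 P(word | previous word), from bigram counts in a tiny hypothetical corpus built around the ‘eat the…’ example. Real estimates require large corpora and proper smoothing; the optional add-alpha smoothing here is only a placeholder.

```python
import math
from collections import Counter


def bigram_surprisals(train_tokens, test_tokens, alpha=0.0):
    """Surprisal in bits of each test word given the previous word.

    Probabilities are bigram estimates from train_tokens with add-alpha
    smoothing (alpha = 0 gives maximum-likelihood estimates).
    """
    vocab_size = len(set(train_tokens) | set(test_tokens))
    history = Counter(train_tokens[:-1])  # how often each word precedes another
    pairs = Counter(zip(train_tokens, train_tokens[1:]))
    result = []
    for prev, word in zip(test_tokens, test_tokens[1:]):
        numerator = pairs[(prev, word)] + alpha
        denominator = history[prev] + alpha * vocab_size
        if numerator == 0:
            raise ValueError(f"unseen bigram ({prev!r}, {word!r}); try alpha > 0")
        # log2(1 / P(word | prev))
        result.append((word, math.log2(denominator / numerator)))
    return result


corpus = ("i am gonna eat the cake . i am gonna eat the soup . "
          "i am gonna read the book .").split()
for word, bits in bigram_surprisals(corpus, "i am gonna eat the cake".split()):
    print(f"{word:>6}: {bits:.2f} bits")
```

In this toy corpus "the" after "eat" is fully predictable (0 bits), whereas "cake" after "the" carries log2 3 ≈ 1.58 bits because three different nouns follow "the".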

56


57


[from Smith and Levy, in prep]

58


More top-down expectations

• In addition to general lexical knowledge, frequency, and contextual predictability, we are also incredibly good at taking into consideration the current visual context and what we think the speaker knows.

Listeners overcome the problem of noisy perception of variable input by rapidly integrating the noisy bottom-up signal with top-down information from many information sources.

This is quite compatible with Zipf’s observations, although he, of course, did not know about all of these facts about word recognition (he postulated his laws in 1949!).

[59]


Part 5

Re-visiting Zipf with information theory & psycholinguistics in mind

[60]


Re-thinking Zipf [Piantadosi, Tily, and Gibson, 2011]

• If spoken word recognition is sensitive to what is contextually expected, perhaps contextual expectations, rather than frequency, are what (over generations) determines the phonological length of words (i.e. the amount of bottom-up signal).

• The average information a word w carries in its different contexts C sums (marginalizes) over contexts, weighting how probable the word is given each context by how probable that context is given the word:

$$-\sum_{c} P(C = c \mid W = w) \log P(W = w \mid C = c)$$

This can be estimated from a corpus, where N is the number of occurrences of w and c_i is the context of its i-th occurrence:

$$-\frac{1}{N} \sum_{i=1}^{N} \log P(W = w \mid C = c_i)$$
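The corpus-based estimate of a word's average information, the mean over all N occurrences of w of −log2 P(w | c_i), can be sketched in a few lines. The sketch assumes the simplest possible context, the immediately preceding word (Piantadosi et al. report results for several n-gram orders), and maximum-likelihood probabilities from the same hypothetical toy corpus; the first token, which has no context, is skipped.

```python
import math
from collections import Counter, defaultdict


def average_information(tokens):
    """Estimate -1/N * sum_i log2 P(w | c_i) for every word type w.

    c_i is the word preceding the i-th occurrence of w (a bigram context);
    probabilities are maximum-likelihood estimates from the same tokens.
    """
    history = Counter(tokens[:-1])
    pairs = Counter(zip(tokens, tokens[1:]))
    total_bits = defaultdict(float)
    occurrences = Counter()
    for prev, word in zip(tokens, tokens[1:]):
        # log2(1 / P(word | prev)) for this occurrence
        total_bits[word] += math.log2(history[prev] / pairs[(prev, word)])
        occurrences[word] += 1
    return {w: total_bits[w] / occurrences[w] for w in total_bits}


corpus = ("i am gonna eat the cake . i am gonna eat the soup . "
          "i am gonna read the book .").split()
info = average_information(corpus)
print(sorted(info.items(), key=lambda item: item[1]))
```

On a large corpus, one would then correlate these per-word averages with word length and compare that correlation with the one for plain frequency.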

[61]


Result for English

[62][Figure 2 from Piantadosi et al., 2011]


Bigram results for all 11 languages

[63]

[part of Figure 1 from Piantadosi et al., 2011]


[Figure 1 from Piantadosi et al., 2011]

• Comparing the results for different n in the n-gram model

[64]


One more example: Russian

[Figure 1 and 2 from Manin, 2006]

• Manin uses human judgments from a cloze‐like task to estimate information (or unpredictability)

[65]


Take home points (1)

• Bottom-up input in language processing is noisy (the output of production is variable and perception itself is a noisy process)

• Listeners overcome the challenges of spoken word recognition by relying on probabilistic cues (top-down knowledge about the language, the current context, etc.)

• Words that are hard to process because they are, on average, unexpected (have high information) tend to be phonologically longer.

[66]


[from http://roosterteeth.com/comics/strip.php; thx to Nurit Melnik]

[67]


Part 6

Beyond the lexicon: An example from sentence processing

[68]


Memory and dependency length [cf. Jaeger and Tily, 2011 for a concise overview]

• Similar to word recognition, sentence processing is sensitive to probabilities: structures that are less expected in the context take longer to process. [e.g. MacDonald and Shillcock, 2001; Kamide et al., 2003; Staub and Clifton, 2006; Levy, 2008; Smith and Levy, 2009]

• Syntactic processing is also sensitive to memory demands: Longer dependencies take longer to process   [e.g. Gibson 1998, 2000; Gibson and Grodner, 2005; Lewis et al., 2006; Vasishth et al., 2005]
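Dependency length is often operationalized as the distance in words between a dependent and its head, summed over the sentence. The sketch below assumes that definition and uses a hypothetical hand-annotated example (the familiar particle-placement pair); `heads[i]` gives the position of word i's head, with `None` for the root, and the function name is illustrative.

```python
def total_dependency_length(words, heads):
    """Sum over all dependencies of |position of dependent - position of head|.

    heads[i] is the index of the head of words[i], or None for the root.
    """
    if len(words) != len(heads):
        raise ValueError("need exactly one head entry per word")
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)


# Same dependencies (dependent -> head), two word orders, hypothetical annotation:
# John -> threw, out -> threw, the -> trash, old -> trash, trash -> threw
short_order = ["John", "threw", "out", "the", "old", "trash"]
short_heads = [1, None, 1, 5, 5, 1]

long_order = ["John", "threw", "the", "old", "trash", "out"]
long_heads = [1, None, 4, 4, 1, 1]

print(total_dependency_length(short_order, short_heads))  # 9
print(total_dependency_length(long_order, long_heads))    # 11
```

Putting the one-word particle right after the verb keeps its dependency at length 1 and lengthens the object's dependency by only one word; putting it after the three-word object stretches the particle's own dependency to four words.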

[69]


Memory and dependency length

[Figure 1 from Jaeger and Tily, 2011]

[from Tily, 2012]

[70]


How do psycholinguists study effects of dependency length?

• Self‐paced reading
– Word‐by‐word reading times between button presses
– Comprehension accuracy on questions displayed after the sentence

• Stop‐making‐sense task
– Like self‐paced reading, but critical sentences are ungrammatical, and we measure how long it takes before subjects press “n” (“nope, not a sentence anymore”).

• Eye‐tracking reading
– First pass fixations
– Fixation durations
– Proportion of regressive saccades

[71]


[72]


[73]


[74]


[75]


[76]


[77]



[79]


An example finding

[Figure 4 from Gibson, 1998]

• Confirmed in many independent studies, though there are still questions about whether the result can be reduced to expectation‐based processing [e.g. Wells et al., 2009; MacDonald]

[80]


Is this processing preference reflected in grammar?

• So, do languages tend to keep dependencies short?

• This question was recently addressed for English and German by Gildea and Temperley (2010)
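One way to ask whether a language keeps dependencies short is to compare the observed total dependency length with what other orderings of the same dependency structure would give. The toy sketch below uses a hypothetical six-word example (John threw out the old trash) and, unlike real studies, which typically restrict the comparison to linguistically possible (e.g. projective) orders, it enumerates every permutation of the words as its baseline. All names are illustrative.

```python
from itertools import permutations
from statistics import mean

# Hypothetical dependency structure as (dependent, head) pairs.
DEPENDENCIES = [
    ("John", "threw"),
    ("out", "threw"),
    ("the", "trash"),
    ("old", "trash"),
    ("trash", "threw"),
]
OBSERVED = ("John", "threw", "out", "the", "old", "trash")


def total_length(order, dependencies):
    """Total dependency length of a word order (each word occurs once)."""
    position = {word: i for i, word in enumerate(order)}
    return sum(abs(position[dep] - position[head]) for dep, head in dependencies)


# Baseline: every possible ordering of the six words, crossing orders included.
all_lengths = [total_length(order, DEPENDENCIES) for order in permutations(OBSERVED)]

print("observed:", total_length(OBSERVED, DEPENDENCIES))
print("mean over all orders:", round(mean(all_lengths), 2))
print("shortest possible:", min(all_lengths))
```

In this toy case the observed order is shorter than the average over all orders but not the shortest possible; the question on the slide asks the same thing at the level of whole corpora rather than a single sentence.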

[81]

[Table 1 from Gildea and Temperley, 2010]


Why is German not as optimal as English in terms of dependencies?

• It could be the availability of case marking, which provides another cue to the underlying structure of the sentence (and thereby reduces memory costs).

• Intriguing evidence comes from Tily (2011), who studies the development of dependency length from Old English, which had case, to Modern English, which does not.

[82]

[from Tily, 2012]


Wrapping up

[83]


Take home point (2)Take home point (2)

• Given that language has the properties we would expect if humans and languages evolved to transfer information efficiently, and given that these properties are unlikely to be due to chance (see e.g. Ferrer i Cancho and Sole, 2003 on Zipf’s law), there is tentative support for the idea that functional pressures shape language over time.

• But so far we’ve only seen correlations. How would processing and communicative biases come to affect language over time?

• Tomorrow we’ll see how one can test more directly how processing and communicative biases could come to affect language over generations by affecting language acquisition.

[84]


How to estimate the average information of a word?

• Piantadosi et al. (2011) use ngram models with smoothing (n = 2, 3, 4) based on the Google ngram corpus, which is available for 11 languages.
– The large size of the Google ngram corpus is important for obtaining reliable estimates of the average information a word carries in context.

• These estimates are then regressed against word length.
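The slide does not say which smoothing method was used. As a stand-in, the sketch below shows the simplest scheme, add-k smoothing (Laplace smoothing when k = 1) of a bigram estimate, which gives unseen word pairs a small nonzero probability. The toy token list and the function name are hypothetical, and large-scale models normally use stronger smoothing methods than this.

```python
from collections import Counter


def add_k_bigram_prob(tokens, prev, word, k=1.0):
    """Add-k smoothed estimate of p(word | prev) from a list of tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])  # how often each word is followed by another
    vocab_size = len(set(tokens))
    return (bigrams[(prev, word)] + k) / (context_counts[prev] + k * vocab_size)


# Hypothetical toy corpus: 'this' is followed by 'class' 2 out of 3 times.
tokens = "this class is new this area is big this class is fun".split()

print(add_k_bigram_prob(tokens, "this", "class"))       # seen pair, pulled below 2/3
print(add_k_bigram_prob(tokens, "this", "fun"))         # unseen pair, but not zero
print(add_k_bigram_prob(tokens, "this", "class", k=0))  # k = 0 recovers the raw estimate 2/3
```

Smoothing matters here because the information of an unseen pair would otherwise be infinite; the larger the corpus, the less the estimates depend on the smoothing choice.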

[85]


How is information estimated from a corpus?

• As Shannon information is defined with reference to probability, we need to estimate the probability of words in order to estimate their information.

• So‐called ngram models provide a simple and frequently employed way to derive probability estimates from a collection of speech or writing.

• So, let’s assume we have such a corpus.
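Concretely, the Shannon information (surprisal) of a word is the negative logarithm of its probability; with base‐2 logarithms it is measured in bits. A one‐line worked example:

```latex
I(w) = -\log_2 p(w), \qquad p(w) = \tfrac{1}{8} \;\Rightarrow\; I(w) = -\log_2 \tfrac{1}{8} = 3 \text{ bits}
```

The less probable a word, the more information it carries; a word with probability 1 carries 0 bits.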

[86]


A very small corpus:

Over the last two decades, cognitive science has undergone a paradigm shift towards probabilistic models of the brain and cognition. Many aspects of human cognition are now understood in terms of rational use of available information in the light of uncertainty (e.g. models in memory, categorization, generalization and concept learning, visual inference, motor planning). Building on a long tradition of computational models for language, such rational models have also been proposed for language processing and acquisition. This class provides an overview to the newly emerging field of computational psycholinguistics, which combines insights and methods from linguistic theory, natural language processing, machine learning, psycholinguistics, and cognitive science into the study of how we understand and produce language. There has been a surge in work in this area, which is attracting scholars from many disciplines. The goal of this class is to provide students with enough background to start their own research in computational psycholinguistics.

• Now let’s extract the bigrams from this text. That’s really just the list of all two‐word sequences in the above text, together with how often each occurs.
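The extraction step can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (lowercasing, letters‐only tokens, no special treatment of sentence boundaries), run on an abridged excerpt of the sample corpus; the `ngrams` helper is hypothetical, and its `n` parameter also yields the trigrams and 4‐grams used for larger n.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Count all n-word sequences in `text` (lowercased, letters only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # zip over n staggered copies of the token list gives consecutive n-tuples
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Abridged excerpt of the sample corpus.
excerpt = (
    "Over the last two decades, cognitive science has undergone a paradigm shift "
    "towards probabilistic models of the brain and cognition. "
    "This class provides an overview to the newly emerging field of computational "
    "psycholinguistics. There has been a surge in work in this area, which is "
    "attracting scholars from many disciplines. The goal of this class is to provide "
    "students with enough background to start their own research in computational "
    "psycholinguistics."
)

bigrams = ngrams(excerpt, n=2)
for pair in [("this", "class"), ("over", "the"), ("the", "last"), ("last", "two"), ("this", "area")]:
    print(" ".join(pair), bigrams[pair])
```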

[87]


From bigrams to Shannon information

this class 2
over the 1
the last 1
last two 1
this area 1
…

• But we can be lazy and use a tool (for a neat tool that lets you estimate bigrams based on the Brown corpus, see e.g. http://word.snu.ac.kr/ngram/).

• E.g. the word this is followed by the word class 2 out of 3 times. Hence, our best (maximum likelihood) estimate is p(class | this) = 2/3, and the information that class carries after this is -log2(2/3) ≈ 0.58 bits.
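The maximum likelihood estimate and the resulting information can be computed directly from the bigram counts listed on this slide. A minimal sketch, assuming the counts are stored in a `Counter` keyed by word pairs (`mle_prob` is a hypothetical helper name):

```python
import math
from collections import Counter

# Bigram counts for the context word 'this' in the sample corpus.
bigram_counts = Counter({("this", "class"): 2, ("this", "area"): 1})


def mle_prob(counts, prev, word):
    """Maximum likelihood estimate of p(word | prev) = c(prev, word) / c(prev, *)."""
    context_total = sum(c for (first, _), c in counts.items() if first == prev)
    return counts[(prev, word)] / context_total


p = mle_prob(bigram_counts, "this", "class")
information = -math.log2(p)  # Shannon information in bits
print(round(p, 3), round(information, 2))  # 0.667 0.58
```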

[88]


Getting the average information of a word in context

• In our sample text, the word class occurs only twice, each time preceded by this. Recall that the average information of a word w in context, estimated from a corpus, is calculated as:

$$\bar{I}(w) = -\frac{1}{N}\sum_{i=1}^{N} \log_2 p(w \mid C = c_i)$$

where c_i is the context of the i‐th occurrence of w (here simply the preceding word) and N is the number of occurrences of w in the corpus (here 2, since class occurs twice).

• Hence, the average information of class given the preceding word in our sample is

$$-\tfrac{1}{2}\left(\log_2 \tfrac{2}{3} + \log_2 \tfrac{2}{3}\right) = -\tfrac{1}{2}\log_2 \tfrac{4}{9} \approx 0.58 \text{ bits},$$

i.e., class provides on average 0.58 bits of information.
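The same calculation as a minimal Python sketch, assuming the conditional probabilities have already been estimated (here a hypothetical lookup table holding the maximum likelihood estimate p(class | this) = 2/3) and that we know the context of each occurrence of the word:

```python
import math

# Hypothetical table of estimated conditional probabilities, keyed by (context, word).
COND_PROB = {("this", "class"): 2 / 3}


def average_information(word, contexts, cond_prob=COND_PROB):
    """-1/N * sum_i log2 p(word | C_i), over the N occurrences of `word`.

    `contexts` lists the context (here: the preceding word) of each occurrence.
    """
    n = len(contexts)
    return -sum(math.log2(cond_prob[(c, word)]) for c in contexts) / n


# 'class' occurs twice in the sample corpus, both times preceded by 'this'.
print(round(average_information("class", ["this", "this"]), 2))  # 0.58
```

With a word that occurs in different contexts, each occurrence contributes its own surprisal, and the result is their mean.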

[89]