day 2: pruning continued; begin competition models
![Page 1: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/1.jpg)
Day 2: Pruning continued; begin competition models
Roger Levy
University of Edinburgh & University of California – San Diego
![Page 2: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/2.jpg)
Today
• Concept from probability theory: marginalization
• Complete Jurafsky 1996: modeling online data
• Begin competition models
![Page 3: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/3.jpg)
Marginalization
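• In many cases, a joint probability distribution will be more “basic” than the raw distribution of any member variable
• Imagine two dice with a weak spring attached
  • No independence → joint more basic
• Summing the joint distribution over X gives a distribution over Y alone: $P(Y) = \sum_x P(X = x, Y)$
• The resulting distribution over Y is known as the marginal distribution
• Calculating P(Y) is called marginalizing over X

Joint distribution over two coins:

|           | Coin1 = H | Coin1 = T |
|-----------|-----------|-----------|
| Coin2 = H | 1/3       | 1/8       |
| Coin2 = T | 1/8       | 5/12      |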
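As a worked illustration of marginalization, the sketch below sums the two-coin joint table over one variable at a time. The function name `marginal` and the dictionary layout are illustrative choices, not part of the slides; the four probabilities are the ones given in the table.

```python
from fractions import Fraction as F

# Joint distribution from the two-coin table, keyed by (coin1, coin2).
joint = {
    ("H", "H"): F(1, 3),
    ("T", "H"): F(1, 8),
    ("H", "T"): F(1, 8),
    ("T", "T"): F(5, 12),
}

def marginal(joint, axis):
    """Sum the joint over every variable except the one at position `axis`."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0) + p
    return out

p_coin1 = marginal(joint, 0)  # marginalizing over Coin2
p_coin2 = marginal(joint, 1)  # marginalizing over Coin1

# The coins are independent only if every joint cell equals the product of its marginals.
is_independent = all(joint[(a, b)] == p_coin1[a] * p_coin2[b] for (a, b) in joint)
```

Here each marginal comes out as 11/24 for H and 13/24 for T, and `is_independent` is false: the joint cannot be rebuilt from the marginals, which is the sense in which the joint is more basic.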
![Page 4: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/4.jpg)
Today
• Concept from probability theory: marginalization
• Complete Jurafsky 1996: modeling online data
• Begin competition models
![Page 5: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/5.jpg)
Modeling online parsing
• Does this sentence make sense?
  The complex houses married and single students and their families.
• How about this one?
  The warehouse fires a dozen employees each year.
• And this one?
  The warehouse fires destroyed all the buildings.
• fires can be either a noun or a verb. So can houses:
  [NP The complex] [VP houses married and single students…].
• These are garden path sentences
• Originally taken as some of the strongest evidence for serial processing by the human parser (Frazier and Rayner 1987)
![Page 6: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/6.jpg)
Limited parallel parsing
• Full-serial: keep only one incremental interpretation
• Full-parallel: keep all incremental interpretations
• Limited parallel: keep some but not all interpretations
• In a limited parallel model, garden-path effects can arise from the discarding of a needed interpretation
[S [NP The complex] [VP houses…] …]   (discarded)
[S [NP The complex houses …] …]   (kept)
![Page 7: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/7.jpg)
Modeling online parsing: garden paths
• Pruning strategy for limited ranked-parallel processing
  • Each incremental analysis is ranked
  • Analyses falling below a threshold are discarded
• In this framework, a model must characterize
  • The incremental analyses
  • The threshold for pruning
• Jurafsky 1996: partial context-free parses as analyses
  • Probability ratio as pruning threshold
  • Ratio defined as P(I) : P(I_best)
• (Gibson 1991: complexity ratio for pruning threshold)
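The ranked-parallel pruning idea can be sketched in a few lines. This is a minimal illustration, not Jurafsky's implementation: the function name `prune`, the labels, and the probabilities are all hypothetical, and the threshold is expressed as the largest tolerated P(I_best) / P(I), i.e. the inverse of the ratio as written on the slide.

```python
def prune(analyses, max_ratio):
    """Keep analyses whose probability is within `max_ratio` of the best one.

    analyses:  dict mapping an analysis label to its (positive) probability.
    max_ratio: largest tolerated value of P(best) / P(analysis).
    """
    best = max(analyses.values())
    return {label: p for label, p in analyses.items() if best / p <= max_ratio}

# Hypothetical incremental analyses with made-up probabilities.
analyses = {"A": 0.02, "B": 0.005, "C": 0.0001}
kept = prune(analyses, max_ratio=10)
```

With these numbers, A (ratio 1) and B (ratio about 4) stay on the beam, while C (ratio about 200) is discarded. If C later turns out to be the correct analysis, the model predicts a garden path.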
![Page 8: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/8.jpg)
Garden path models 1: N/V ambiguity
• Each analysis is a partial PCFG tree
• Tree prefix probability used for ranking of analysis
• Partial rule probs marginalize over rule completions
these nodes are actually still undergoing expansion
*implications for granularity of structural analysis
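To make "partial rule probs marginalize over rule completions" concrete, here is a small sketch under stated assumptions. The toy grammar is hypothetical: the individual rule probabilities are invented for illustration, chosen only so that the rules beginning with NP total roughly the P(S -> NP …) = 0.92 figure quoted elsewhere in these slides. The helper name `prefix_rule_prob` is likewise an assumption.

```python
# Hypothetical PCFG expansions of S; the individual probabilities are invented.
rules = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("NP", "VP", "PP")): 0.12,
    ("S", ("VP",)): 0.08,
}

def prefix_rule_prob(rules, lhs, prefix):
    """Probability that `lhs` expands to some right-hand side starting with `prefix`.

    This marginalizes over every possible completion of the partial rule.
    """
    prefix = tuple(prefix)
    n = len(prefix)
    return sum(
        p for (left, rhs), p in rules.items()
        if left == lhs and rhs[:n] == prefix
    )

p_s_np = prefix_rule_prob(rules, "S", ["NP"])  # sums the two NP-initial rules
```

The partial rule S -> NP … is charged the summed probability of both NP-initial rules, since the parser has not yet seen which completion applies.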
![Page 9: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/9.jpg)
N/V ambiguity (2)
• Partial CF tree analysis of the complex houses…
• Analysis of houses as verb has much lower probability than analysis as noun (> 250:1)
• Hypothesis: the low-ranking alternative is discarded
![Page 10: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/10.jpg)
N/V ambiguity (3)
• Note that top-down vs. bottom-up questions are immediately implicated, in theory
• Jurafsky includes the cost of generating the initial NP under the S
  • of course, it’s a small cost as P(S -> NP …) = 0.92
• If parsing were bottom-up, that cost would not have been explicitly calculated yet
![Page 11: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/11.jpg)
Garden path models II
• The most famous garden-paths: reduced relative clauses (RRCs) versus main clauses (MCs)
• From the valence + simple-constituency perspective, MC and RRC analyses differ in two places:
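The horse raced past the barn fell.
The horse (that was) raced past the barn fell.

| Analysis | Constituency | Valence                    |
|----------|--------------|----------------------------|
| MC       | p≈1          | best intransitive: p=0.92  |
| RRC      | p=0.14       | transitive valence: p=0.08 |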
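The probability ratio between the two analyses can be checked directly from the figures on this slide. The sketch assumes that p≈1 and the intransitive valence (0.92) belong to the main-clause analysis, that 0.14 and the transitive valence (0.08) belong to the reduced relative, and that p≈1 can be taken as exactly 1.0; the function name is illustrative.

```python
def analysis_prob(constituency, valence):
    """Probability of an analysis as the product of its two component probabilities."""
    return constituency * valence

mc = analysis_prob(1.0, 0.92)    # main clause: constituency ~1, best intransitive valence
rrc = analysis_prob(0.14, 0.08)  # reduced relative: constituency 0.14, transitive valence
ratio = mc / rrc

print(round(ratio))  # 82
```

Under these assumptions the main-clause analysis is about 82 times more probable than the reduced-relative analysis.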
![Page 12: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/12.jpg)
Garden path models II (2)
• 82 : 1 probability ratio means that lower-probability analysis is discarded
• In contrast, some RRCs do not induce garden paths:
  The bird found in the room died.
• Here, found is preferentially transitive (0.62)
• As a result, the probability ratio is much closer (≈ 4 : 1)
• Conclusion within pruning theory: beam threshold is between 4 : 1 and 82 : 1
• (granularity issue: when exactly does probability cost of valence get paid??? cf. the complex houses)
*note also that Jurafsky does not treat found as having POS ambiguity
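The bracketing argument for the beam threshold can be written out as a small search. This is a sketch under assumptions: the ratios 82 and 4 are the rounded values from the slides, the convention that an analysis is pruned when its ratio strictly exceeds the threshold is an arbitrary choice, and the names are hypothetical.

```python
def induces_garden_path(ratio, beam_threshold):
    """True if the dispreferred analysis falls outside the beam and is pruned."""
    return ratio > beam_threshold

# verb -> (probability ratio from the slides, whether a garden path is observed)
observed = {"raced": (82, True), "found": (4, False)}

# Integer thresholds that reproduce both observations.
consistent = [
    t for t in range(1, 101)
    if all(induces_garden_path(r, t) == gp for r, gp in observed.values())
]
```

Only thresholds from 4 up to just under 82 predict a garden path for raced but none for found, which is the sense in which the data bracket the beam width.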
![Page 13: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/13.jpg)
Notes on the probabilistic model
• Jurafsky 1996 is a product-of-experts (PoE) model
  • Expert 1: the constituency model
  • Expert 2: the valence model
• PoEs are flexible and easy to define, but…
  • The Jurafsky 1996 model is actually deficient (loses probability mass), due to relative frequency estimation
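One generic way to see how a product of experts can lose probability mass: if two separately normalized distributions are multiplied pointwise, the products usually sum to less than 1 unless they are renormalized. The sketch below illustrates only that general point. The numbers are hypothetical, and it is not a reconstruction of how Jurafsky's model was estimated.

```python
def product_of_experts(*experts):
    """Multiply expert distributions pointwise.

    Returns (raw, normalized): the unnormalized products and the same
    values rescaled to sum to 1.
    """
    raw = {}
    for outcome in experts[0]:
        p = 1.0
        for expert in experts:
            p *= expert[outcome]
        raw[outcome] = p
    z = sum(raw.values())
    return raw, {outcome: p / z for outcome, p in raw.items()}

# Hypothetical expert distributions over two competing analyses.
constituency = {"MC": 0.7, "RRC": 0.3}
valence = {"MC": 0.6, "RRC": 0.4}

raw, normalized = product_of_experts(constituency, valence)
lost_mass = 1.0 - sum(raw.values())
```

With these made-up numbers the raw products sum to 0.54, so 0.46 of the mass is unaccounted for until the scores are renormalized. Ratios between analyses, which are what the pruning account uses, are unaffected by the renormalization.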
![Page 14: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/14.jpg)
Notes on the probabilistic model (2)
• Jurafsky 1996 predated most work on lexicalized parsers (Collins 1999, Charniak 1997)
• In a generative lexicalized parser, valence and constituency are often combined through decomposition & Markov assumptions, e.g.,
• The use of decomposition makes it easy to learn non-deficient models
sometimes approximated as
![Page 15: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/15.jpg)
Jurafsky 1996 & pruning: main points
• Syntactic comprehension is probabilistic
• Offline preferences explained by syntactic + valence probabilities
• Online garden-path results explained by same model, when beam search/pruning is assumed
![Page 16: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/16.jpg)
General issues
• What is the granularity of incremental analysis?
  • In [NP the complex houses], complex could be an adjective (=the houses are complex)
  • complex could also be a noun (=the houses of the complex)
  • Should these be distinguished, or combined?
  • When does valence probability cost get paid?
• What is the criterion for abandoning an analysis?
• Should the number of maintained analyses affect processing difficulty as well?
![Page 17: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/17.jpg)
Today
• Concept from probability theory: marginalization
• Complete Jurafsky 1996: modeling online data
• Begin competition models
![Page 18: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/18.jpg)
General idea
• Disambiguation: when different syntactic alternatives are available for a given partial input, each alternative receives support from multiple probabilistic information sources
• Competition: the different alternatives compete with each other until one wins, and the duration of competition determines processing difficulty
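The claim that duration of competition determines difficulty can be illustrated with a toy loop. The slides do not specify an update rule at this point, so everything here is an assumption made for illustration: activations are normalized support values, each cycle squares and renormalizes them (so the leader gains ground), and competition ends once one alternative reaches a criterion of 0.9.

```python
def competition_cycles(support, criterion=0.9, max_cycles=1000):
    """Count cycles until one alternative's normalized activation reaches `criterion`.

    support: non-negative initial support values, one per alternative.
    The square-and-renormalize update is an illustrative assumption.
    """
    total = sum(support)
    activation = [s / total for s in support]
    cycles = 0
    while max(activation) < criterion and cycles < max_cycles:
        boosted = [a * a for a in activation]   # stronger alternatives gain more
        total = sum(boosted)
        activation = [b / total for b in boosted]
        cycles += 1
    return cycles

biased = competition_cycles([0.8, 0.2])       # one alternative clearly favored
balanced = competition_cycles([0.55, 0.45])   # alternatives nearly tied
```

Under these assumptions the strongly biased case resolves in 1 cycle and the nearly balanced case takes 4, so closely matched alternatives predict more processing difficulty. An exact tie never resolves under this rule, which is why `max_cycles` caps the loop.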
![Page 19: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/19.jpg)
Origins of competition models
• Parallel competition models of syntactic processing have their roots in lexical access research
• Initial question: process of word recognition
  • are all meanings of a word simultaneously accessed?
  • or are only some (or one) meanings accessed?
• Parallel vs. serial question, for lexical access
![Page 20: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/20.jpg)
Origins of competition models (2)
• Testing access models: priming studies show that subordinate (= less frequent) meanings are accessed as well as dominant (= more frequent) meanings
• Also, lexical decision studies show that more frequent meanings are accessed more quickly
![Page 21: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/21.jpg)
Origins of competition models (3)
• Lexical ambiguity in reading: does the amount of time spent on a word reflect its degree of ambiguity?
• Readers spend more time reading equibiased ambiguous words than non-equibiased ambiguous words (eye-tracking studies)
• Different meanings compete with each other
Example: Of course the pitcher was often forgotten… (pitcher is ambiguous between two meanings)
Rayner and Duffy (1986); Duffy, Morris, and Rayner (1988)
![Page 22: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/22.jpg)
Competition in syntactic processing
• Can this idea of competition be applied to online syntactic comprehension?
• If so, then multiple interpretations of a partial input should compete with one another and slow down reading
  • does this mean increased difficulty of comprehension?
  • [compare with other types of difficulty, e.g., memory overload]
![Page 23: Day 2: Pruning continued; begin competition models](https://reader036.vdocument.in/reader036/viewer/2022062409/5681469f550346895db3b797/html5/thumbnails/23.jpg)
Constraint types
• Configurational bias: MV vs. RR
• Thematic fit (initial NP to verb’s roles)
  • i.e., Plaus(verb,noun), ranging from
• Bias of verb: simple past vs. past participle
  • i.e., P(past | verb)*
• Support of by
  • i.e., P(MV | <verb,by>) [not conditioned on specific verb]
• That these factors can affect processing in the MV/RR ambiguity is motivated by a variety of previous studies (MacDonald et al. 1993, Burgess et al. 1993, Trueswell et al. 1994 (cf. Ferreira & Clifton 1986), Trueswell 1996)
*technically not calculated this way, but this would be the rational reconstruction