Is syntactic knowledge probabilistic?

Experiments described in:

Anette Rosenbach (2002) Genitive Variation in English: Conceptual Factors in Synchronic and Diachronic Studies. Mouton de Gruyter.

Anette Rosenbach (2003) Aspects of Iconicity and Economy in the Choice between the s-Genitive and the of-Genitive in English. In Determinants of Grammatical Variation in English, ed. by G. Rohdenburg and B. Mondorf. Berlin: Mouton de Gruyter, 379–411.

Joan Bresnan (2007) Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Roots: Linguistics in Search of Its Evidential Base. Series: Studies in Generative Grammar, ed. by S. Featherston and W. Sternefeld. Berlin: Mouton de Gruyter, 77–96.

Rosenbach (2003) reports a forced-choice study that controls for the overlapping factors that affect genitive choice (animacy, topicality, and prototypicality of the possessive relation):

Items and conditions:

[+animate, +topical, +proto]: the boy’s eyes ∼ the eyes of the boy

[+animate, +topical, −proto]: the mother’s future ∼ the future of the mother

[+animate, −topical, +proto]: a girl’s face ∼ the face of a girl

[+animate, −topical, −proto]: a woman’s shadow ∼ the shadow of a woman

[−animate, +topical, +proto]: the chair’s frame ∼ the frame of the chair

[−animate, +topical, −proto]: the bag’s contents ∼ the contents of the bag

[−animate, −topical, +proto]: a lorry’s wheels ∼ the wheels of a lorry

[−animate, −topical, −proto]: a car’s fumes ∼ the fumes of a car

–Operationalizes animacy as personal common nouns vs. concrete common nouns (excluding geographical and temporal nouns)

–Operationalizes topicality as second-mention, definite expressions vs. first-mention, indefinite expressions

–Operationalizes prototypicality of the possessive relation as

for humans: body parts, kin terms, and permanent legal ownership vs. states and abstract ‘possessions’

for inanimates: part/whole relations vs. non-part/whole relations
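The coding rules above can be written as a small decision function. This is a minimal sketch, not Rosenbach's actual procedure: the class `Possessor`, the function `code_features`, and the relation labels (`body_part`, `kin_term`, `legal_ownership`, `part_whole`, `emission`) are hypothetical names introduced here. Items with mixed topicality cues, such as a first-mention definite, are simply rejected, since the design does not include them.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical relation labels; the slide names the categories, not these strings.
PROTOTYPICAL_FOR_HUMANS = {"body_part", "kin_term", "legal_ownership"}
PROTOTYPICAL_FOR_INANIMATES = {"part_whole"}


@dataclass
class Possessor:
    noun_class: str  # "personal" or "concrete" (common nouns only)
    definite: bool   # definite vs. indefinite expression
    mention: int     # 1 = first mention, 2 = second mention
    relation: str    # hypothetical label for the possessive relation


def code_features(p: Possessor) -> Tuple[bool, bool, bool]:
    """Return (animate, topical, prototypical) for one possessor."""
    if p.noun_class not in ("personal", "concrete"):
        raise ValueError("only personal or concrete common nouns are coded")
    animate = p.noun_class == "personal"

    # Topical = second-mention definite; non-topical = first-mention indefinite.
    if p.definite and p.mention == 2:
        topical = True
    elif not p.definite and p.mention == 1:
        topical = False
    else:
        raise ValueError("mixed topicality cues fall outside the design")

    # Prototypicality is judged against a different relation set for humans.
    prototypes = PROTOTYPICAL_FOR_HUMANS if animate else PROTOTYPICAL_FOR_INANIMATES
    return animate, topical, p.relation in prototypes


# "a car's fumes": concrete, first-mention indefinite, not a part/whole relation.
print(code_features(Possessor("concrete", False, 1, "emission")))  # prints (False, False, False)
```

Keeping humans and inanimates on separate prototype sets mirrors the slide's point that "prototypical possession" means different things for the two possessor types.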

A sample question from her questionnaire:

A helicopter waited on the nearby grass like a sleeping insect, its pilot standing outside with Marino. Whit, a perfect specimen of male fitness in a black flight suit, opened [the helicopter’s doors / the doors of the helicopter] to help us board.

(based on Patricia Cornwell, The Body Farm, 52)

[Figure: ’s and of genitives in English (Rosenbach 2002)]

Other findings:

The ’s-genitive is spreading across time (from older to younger speakers) and space (from younger American to younger British speakers).

Note on design and analysis:

–univariable analysis (= ‘basic statistical tests’, such as chi-square)

–controls (e.g. holds the length of possessor and possessum constant; excludes proper nouns)

–stratified analysis (e.g. by age, pp. 396–7)
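As an illustration of the kind of univariable test listed above, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table of genitive choice by possessor animacy, plus its p-value for one degree of freedom. The counts are invented for the example and are not Rosenbach's data; `chi_square` and `p_value_df1` are hypothetical helper names, and no continuity correction is applied.

```python
import math
from typing import List


def chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = animate / inanimate possessor, columns = 's / of.
counts = [[60, 40],
          [30, 70]]
stat = chi_square(counts)  # 200/11, about 18.18
print(round(stat, 3), p_value_df1(stat))
```

A test like this looks at one factor at a time, which is why the controls and stratification in the design matter: they keep other factors from being confounded with the one being tested.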

Compare a subsequent corpus study:

Lars Hinrichs & Benedikt Szmrecsanyi (2007). Recent changes in the function and frequency of Standard English genitive constructions: A multivariate analysis of tagged corpora. English Language and Linguistics 11(3): 437–74.

Bresnan (2007):

Hypothesis: If the dative corpus model sufficiently characterizes language users’ implicit linguistic knowledge of usage probabilities, then where the model predicts higher- or lower-probability outcomes, we would expect experiment participants to do so as well in behavioral tasks.

• An indirect task: Rate the naturalness of the alternatives according to your own judgments on a numerical scale of 1 to 100.

• A direct task: Guess the choices made by the original dialogue speakers and rate the likelihood of your guess being correct on a numerical scale of 1 to 100.

Bresnan (2007)

Experiment:

Thirty instances of dative constructions were randomly drawn from the centers of five probability bins of the dative corpus model distribution. (Potentially ambiguous items were replaced.)
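A minimal sketch of this sampling step, assuming five equal-width bins on [0, 1] and reading "the center" of a bin as a window of ±0.05 around its midpoint. The bin boundaries, the window width (`half_width`), and the function name `sample_bin_centers` are assumptions, since the slide does not state them. Six items per bin yields the thirty constructions.

```python
import random
from typing import List, Sequence


def sample_bin_centers(probs: Sequence[float], n_bins: int = 5, per_bin: int = 6,
                       half_width: float = 0.05, seed: int = 0) -> List[int]:
    """Randomly draw `per_bin` item indices near the midpoint of each probability bin.

    Assumes equal-width bins on [0, 1]; half_width is a hypothetical tolerance.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    width = 1.0 / n_bins
    chosen: List[int] = []
    for b in range(n_bins):
        center = (b + 0.5) * width
        near_center = [i for i, p in enumerate(probs) if abs(p - center) <= half_width]
        if len(near_center) < per_bin:
            raise ValueError(f"bin {b} has only {len(near_center)} items near its center")
        chosen.extend(rng.sample(near_center, per_bin))
    return chosen


# Hypothetical model probabilities standing in for a corpus of 2,000 datives.
rng = random.Random(1)
model_probs = [rng.random() for _ in range(2000)]
items = sample_bin_centers(model_probs)
print(len(items))  # 30
```

Drawing from bin centers rather than anywhere in a bin keeps the five groups well separated in model probability, so that differences in participants' responses can be attributed to probability level.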

[Figure: Sampled Constructions for Experiment 1. Corpus model probabilities (0.0–1.0) of the 30 sampled items, grouped into five bins: vlow, low, med, hi, vhi.]

The contexts of the sampled instances were retrieved from the full Switchboard corpus transcriptions and edited for readability by removing disfluencies and backchannels.

• The probability model was not conditioned on speech features (disfluency, prosody, etc.).

• The experimental task required reading, not listening.

• An alternative to each target construction was created.

• The order of passages was randomized.

• The order of target constructions alternated.

• A questionnaire was created containing the thirty passages.
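The assembly steps above can be sketched as follows. The function `build_questionnaire` and the item layout are hypothetical, and "alternated" is read here as alternating which dative construction (double object or prepositional) is listed first from one passage to the next; the slide does not spell out the exact scheme.

```python
import random
from typing import List, Tuple

# (context, double-object version, prepositional version); hypothetical layout.
Item = Tuple[str, str, str]


def build_questionnaire(items: List[Item], seed: int = 0) -> List[Item]:
    """Shuffle passages, then alternate which construction is listed first."""
    rng = random.Random(seed)
    passages = list(items)   # copy so the caller's list is not reordered
    rng.shuffle(passages)    # randomize the order of passages
    questionnaire: List[Item] = []
    for k, (context, double_object, prepositional) in enumerate(passages):
        if k % 2 == 0:
            questionnaire.append((context, double_object, prepositional))
        else:
            questionnaire.append((context, prepositional, double_object))
    return questionnaire


items = [
    ("Well, that's really too bad because",
     "it's giving some people unfair advantage.",
     "it's giving unfair advantage to some people."),
    # ... the remaining passages would follow here
]
for context, first, second in build_questionnaire(items):
    print(context, "(1)", first, "(2)", second)
```

Alternating after shuffling keeps the two constructions balanced across first and second position, so a general preference for whichever option is read first cannot masquerade as a preference for one dative type.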

Sample passage

Speaker A:

I moved to Arkansas and Texas after living in Ohio and the schools down here rate, you know, bottom ten percent across the country and having been through grade school up there and coming down here to high school I can understand why. Because they’re so far behind and so poorly staffed, half the time the teachers don’t know what’s going on.

Speaker B:

Well, that’s really too bad because

(1) it’s giving some people unfair advantage.

(2) it’s giving unfair advantage to some people.

Nineteen Stanford summer-term undergraduates were recruited as participants and paid.[a]

The participants were instructed to rate the relative naturalness of the alternatives in the given context passage, according to their own intuitions, on a scale of 0 to 100; the ratings of the two alternatives had to sum to 100.

[a] The results from participants who had taken a syntax course were excluded, as were those from bilinguals and non-native speakers of English.

Finding: Both as a group and individually, participants’ numerical ratings of the alternative dative continuations showed a direct linear relation to the corpus log odds of those constructions.
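To make the quantities concrete: the corpus log odds of a construction with model probability p is ln(p / (1 − p)), and a "direct linear relation" can be summarized by an ordinary least-squares line of ratings on log odds. The sketch below uses invented probabilities and ratings, and the helper names `log_odds` and `fit_line` are hypothetical. It is a simple per-participant summary, not the multilevel analysis reported in Bresnan (2007).

```python
import math
from typing import Sequence, Tuple


def log_odds(p: float) -> float:
    """Natural-log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares (slope, intercept) for ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical corpus probabilities of one construction and one participant's ratings.
corpus_probs = [0.01, 0.10, 0.50, 0.90, 0.99]
ratings = [15, 35, 50, 70, 85]
slope, intercept = fit_line([log_odds(p) for p in corpus_probs], ratings)
print(f"rating = {intercept:.1f} + {slope:.1f} * log odds")
```

The log-odds scale stretches probabilities near 0 and 1, so a linear relation on this scale means that ratings keep changing even among items the model already considers very likely or very unlikely.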

[Figure: Ratings vs. corpus log odds]

[Figure: Ratings (0–100) vs. corpus log odds, one panel per participant (s1.us–s19.us)]

Analysis using multilevel multivariable regression showed that the corpus model probabilities are significant predictors of the ratings, after controlling for random effects of subject and verb as well as item order, order of constructions, and lemma frequency.

Bresnan (2007) also compared each subject’s ratings with the actual choices made by the speakers in the original conversations. Baseline = 0.57.

Proportions of Participants’ Ratings Favoring Actual Corpus Choices

0.63 0.83 0.80 0.70

0.80 0.80 0.67 0.77

0.73 0.83 0.80 0.77

0.80 0.77 0.77 0.73

0.73 0.87 0.67
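Because each pair of ratings sums to 100, a rating can be taken to favor the corpus choice when that choice receives more than 50 points. The sketch below computes one participant's proportion under that reading; the 50-point threshold, treating a 50/50 split as not favoring, and the name `proportion_favoring` are assumptions. With thirty items every proportion is a multiple of 1/30, which fits the two-decimal values in the table (e.g. 19/30 ≈ 0.63).

```python
from typing import Sequence


def proportion_favoring(corpus_choice_ratings: Sequence[float]) -> float:
    """Share of items where the corpus choice got more than half of the 100 points."""
    favored = sum(1 for r in corpus_choice_ratings if r > 50)  # a 50/50 split counts as not favoring
    return favored / len(corpus_choice_ratings)


# Hypothetical ratings given by one participant to the 30 original (corpus) choices.
ratings = [70] * 19 + [30] * 10 + [50]
print(round(proportion_favoring(ratings), 2))  # 0.63
```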

Participants’ naturalness ratings are reliably associated with the syntactic alternatives used by the original speakers (Wilcoxon signed-rank test with continuity correction, n = 19, V = 190, p < 0.001).
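The reported statistic can be checked against the table of proportions. The sketch assumes, which the slide does not state, that the test compared the nineteen proportions with the 0.57 baseline. It implements the one-sample signed-rank test using the normal approximation with tie and continuity corrections, which is what R's `wilcox.test` falls back to when ties rule out an exact p-value. The helper names `average_ranks` and `wilcoxon_signed_rank` are hypothetical. Under that assumption the sketch reproduces V = 190 and p < 0.001.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(xs: Sequence[float], mu: float) -> Tuple[float, float]:
    """One-sample signed-rank test of xs against mu; returns (V, two-sided p)."""
    diffs = [x - mu for x in xs if x != mu]  # zero differences are dropped
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of ranks of positive differences

    # Variance of V under the null hypothesis, reduced for tied |differences|.
    tie_sizes = {}
    for a in abs_diffs:
        tie_sizes[a] = tie_sizes.get(a, 0) + 1
    tie_adjust = sum(t ** 3 - t for t in tie_sizes.values()) / 48
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_adjust)

    z = v - n * (n + 1) / 4
    correction = 0.5 if z > 0 else (-0.5 if z < 0 else 0.0)  # continuity correction
    z = (z - correction) / sigma
    return v, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


# The nineteen proportions from the table, row by row.
proportions = [0.63, 0.83, 0.80, 0.70,
               0.80, 0.80, 0.67, 0.77,
               0.73, 0.83, 0.80, 0.77,
               0.80, 0.77, 0.77, 0.73,
               0.73, 0.87, 0.67]
v, p = wilcoxon_signed_rank(proportions, mu=0.57)
print(v, p < 0.001)  # 190.0 True
```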

In a follow-up experiment, different participants were asked to guess which choice the original speaker made and to rate the likelihood that their guess was correct. These likelihood ratings were highly significant: participants could make reliable guesses about which alternative the original dialogue participant chose (Wilcoxon signed-rank test with continuity correction, n = 20, V = 210, p < 0.0001).
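A worked check on the reported statistic: V is the sum of the ranks R_i of the positive differences d_i, so it can be no larger than the sum of all n ranks (n counted after zero differences are dropped).

```latex
V \;=\; \sum_{i:\, d_i > 0} R_i \;\le\; \sum_{i=1}^{n} i \;=\; \frac{n(n+1)}{2},
\qquad
\frac{20 \cdot 21}{2} = 210 .
```

With n = 20, the reported V = 210 is this maximum, so every participant's difference from the null value was positive. The same holds for V = 190 with n = 19 in the rating study (19 · 20 / 2 = 190).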

Related work:

MacDonald, M. C. (1999). Distributional information in language and acquisition: Three puzzles and a moral. In The Emergence of Language, ed. by Brian MacWhinney, 177–96. Mahwah, NJ: Lawrence Erlbaum.

Arnold, J., Wasow, T., Losongco, A., & Ginstrom, R. (2000). Heaviness vs. newness: The effects of complexity and information structure on constituent ordering. Language 76(1): 28–55.

Gries, S. T. (2003). Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1: 1–27.

Rosenbach, A. (2003). Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. In Determinants of Grammatical Variation in English, ed. by Gunter Rohdenburg and Britta Mondorf, 379–411. Berlin: Mouton de Gruyter.

Rosenbach, A. (2005). Animacy versus weight as determinants of grammatical variation in English. Language 81(3): 613–44.

Gahl, S., & Garnsey, S. (2004). Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language 80(4): 748–774.

Levy, R., & Jaeger, T. (2007). Speakers optimize information density through syntactic reduction. Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, 29–37. Vancouver: NIPS.

Bresnan, J. (2007). Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Roots: Linguistics in Search of Its Evidential Base. Series: Studies in Generative Grammar, ed. by S. Featherston and W. Sternefeld, 77–96. Berlin: Mouton de Gruyter.

Tily, H., Gahl, S., Arnon, I., Snider, N., Kothari, A., & Bresnan, J. (2009). Syntactic probabilities affect pronunciation variation in spontaneous speech. Language and Cognition 1(2): 147–165.

Jaeger, T. F. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology 61: 23–62.

Bresnan, J., & Ford, M. (2010). Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language 86(1): 168–213.

Kuperman, V., & Bresnan, J. (2012). The effects of construction probability on word durations during spontaneous incremental sentence production. Journal of Memory and Language 66: 588–611.

Theijssen, D. (2012). Making Choices: Modeling the English Dative Alternation. Ph.D. dissertation, Radboud University Centre for Language Studies, Nijmegen.

Ford, M., & Bresnan, J. (2012). “They whispered me the answer” in Australia and the US: A comparative experimental study. In From Quirky Case to Representing Space: Papers in Honor of Annie Zaenen, ed. by Tracy Holloway King and Valeria de Paiva. Stanford: CSLI Publications.

Ford, M., & Bresnan, J. (2013). Generating data as a proxy for unavailable corpus data: The contextualized sentence completion task. To appear in Corpus Linguistics and Linguistic Theory.