Pranav Anand, Caroline Andrews, Matthew Wagers Assessing the pragmatics of experiments with crowdsourcing: The case of scalar implicature University of California, Santa Cruz


Page 1

Pranav Anand, Caroline Andrews, Matthew Wagers

Assessing the pragmatics of experiments with

crowdsourcing: The case of scalar implicature

University of California, Santa Cruz

Page 2

Experiments & Pragmatic Processing

Case Study: (Embedded) Implicatures

Each of the critics reviewed some of the movies.
→ some, but not all?

Depending on the study:
– evidence for EIs, with different response choices
– no evidence of EIs

Worry: How much do methodologies themselves influence judgements?
Worry: Are we adequately testing the influence of methodologies on our data?

Previous Limitation: lack of subjects and money
Crowd-sourcing addresses both problems

Page 3

Pragmatics of Experimental Situations

Worry: How much do methodologies themselves influence judgements?

The experiment itself is part of the pragmatic context:
– Teleological Curiosity – subjects hypothesizing "expected" behavior, matching an ideal
– Evaluation Apprehension – subjects know they are being judged

See Rosenthal & Rosnow (1975), The Volunteer Subject.

Page 4

Elements of Experimental Context

Worry: How much do methodologies themselves influence judgements?

– Prompt – the question
– Response Structure – response choices available to the subject, e.g. True/False, Yes/No, 1–7 scale
– Protocol – social context / task specification; directions for the Response Structure
– Immediate Linguistic/Visual Context

Our Goal: explore variations of these elements in a systematic way

Page 5

Experimental Design

Prompt: "Is this an accurate description?"
Target: Some of the spices have red lids.

Linguistic Contexts – All-Relevant, All-Irrelevant, No Context
Protocols – Experimental: normal experiment instructions; Annotation: checking the work of unaffiliated annotators

4 implicature targets, 6 some/all controls, 20 fillers
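The design just described — two protocols crossed with three linguistic contexts, over 4 targets, 6 controls, and 20 fillers — can be sketched as a trial-list builder. All item labels, and the assumption that each participant sees a single protocol × context cell (between-subjects), are placeholders of mine, not the authors' actual stimuli or scripts:

```python
import itertools
import random

# Hypothetical item labels -- the actual stimuli are not given in the slides.
targets  = [f"target-{i}"  for i in range(1, 5)]    # 4 implicature targets
controls = [f"control-{i}" for i in range(1, 7)]    # 6 some/all controls
fillers  = [f"filler-{i}"  for i in range(1, 21)]   # 20 fillers

contexts  = ["all-relevant", "all-irrelevant", "no-context"]
protocols = ["experimental", "annotation"]

def build_trial_list(protocol, context, seed=0):
    """Assemble one randomized 30-item trial list for a single
    protocol x context cell (assuming a between-subjects design)."""
    trials = ([(item, "target",  context) for item in targets] +
              [(item, "control", context) for item in controls] +
              [(item, "filler",  context) for item in fillers])
    random.Random(seed).shuffle(trials)   # reproducible per-participant order
    return {"protocol": protocol, "context": context, "trials": trials}

# The full design space is the cross-product of the two factors.
cells = list(itertools.product(protocols, contexts))          # 6 cells
lists = [build_trial_list(p, c) for p, c in cells]
```

Crossing the factors with `itertools.product` makes the "parameter space" of methodological variables explicit, which is the exploration the later slides advocate.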

Page 6

Experiment 1: Social Context

Focus on Protocol: Annotation vs. Experiment
Population: undergraduates

Linguistic Contexts: All-Irrelevant, No Story, All-Relevant

Accuracy Prompt – "Is this an accurate description?"
Response Categories – Yes, No, Don't Know

Page 7

Experiment 1: Social Context

Finding: Social context matters even when linguistic context does not.

Linguistic Context: no effect

Page 8

Experiment 1: Social Context

Finding: Social context matters even when linguistic context does not.

Lower SI rate for Annotation (p < 0.05)
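The protocol effect reported here is a difference in SI rates between the Annotation and Experiment groups. A minimal way to test such a difference on binary responses is a two-proportion z-test; the counts below are invented placeholders for illustration, not the study's data:

```python
from math import erf, sqrt

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for a difference between two independent
    proportions (normal approximation, pooled standard error)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented counts: 35/80 SI readings under Experiment, 20/80 under Annotation.
z, p = two_proportion_z_test(35, 80, 20, 80)
```

In practice a mixed-effects logistic regression (with subject and item random effects) would be the standard analysis for this kind of design; the z-test is just the simplest version of the comparison.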

Page 9

Experiment 2: Prompt Type

Accuracy Prompt – "Is this an accurate description?"
Response Categories – Yes, No, Don't Know

Informativity Prompt – "How informative is this sentence?"
Response Categories – Not Informative Enough, Informative Enough, Too Much Information, False

Population: Mechanical Turk workers
Systematic debriefing survey

Page 10

Experiment 2: Prompt Type

Effect for Prompt

Page 11

Experiment 2: Prompt Type

Effect for Prompt (p < 0.001)
Effect for Context (p < 0.001)

Page 12

Experiment 2: Prompt Type

Effect for Prompt (p < 0.001)
Effect for Context (p < 0.001)
Weak interaction: Prompt × Context (p < 0.06)

Page 13

Experiment 2: Prompt Type

No effect for Protocol

Page 14

Experiment 2: Prompt Type

Low SI rates overall

But the debriefing survey indicates that roughly 70% of participants were aware of the some/all contrast.
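The "roughly 70%" awareness figure is a sample proportion, so it carries sampling uncertainty that depends on the survey's N. A Wilson score interval makes that explicit; the sample size below is a hypothetical placeholder, since the slides do not report the exact count:

```python
from math import sqrt

def wilson_interval(k, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion
    (better behaved than the naive Wald interval near 0 and 1)."""
    p = k / n
    denom  = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half   = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical: 70 of 100 respondents aware of the some/all contrast.
lo, hi = wilson_interval(70, 100)
```

With these assumed numbers the interval spans roughly 0.60–0.78, i.e. "about 70% aware" is compatible with a fairly wide range of true awareness rates.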

Page 15

Populations

Turkers – more sensitive to linguistic context; less sensitive to changes in social context / evaluation apprehension

Undergraduates – more sensitive to Protocol

Page 16

Take Home Points

• Methodological variables should be explored alongside conventional linguistic variables
– Ideal: models of these processes (cf. Schütze 1996)
– Crowdsourcing allows for cheap/fast exploration of parameter spaces

• New Normal: Don't guess, test.
– Controls, norming, confounding … all testable online

Page 17

A potential check on exuberance

• Undergraduates may be WEIRD*, but crowdsourcing engenders its own weirdness
– High evaluation apprehension
– Uncontrolled backgrounds, skillsets, focus levels
– Unknown motivations

• Ignorance does not necessarily mean diversity
– This requires study if we rely more on such participants

* Henrich et al. (2010), "The Weirdest People in the World?", BBS

Page 18

Acknowledgments

Thanks to Jaye Padgett, to the attendees of two Semantics Lab presentations and of the XPRAG conference for their comments, to the HUGRA committee for their generous award and support, and to Rosie Wilson-Briggs for stimuli construction.