Post on 08-May-2018

Art Graesser

Professor, Psychology & Institute for Intelligent Systems, University of Memphis

Honorary Research Fellow, University of Oxford

The Impact of Automated Measurement of Text Characteristics

Bill & Melinda Gates Foundation

Overview

• A snapshot of conversational agents in assessments of reading, writing, listening, and speaking

• Models of reading that emphasize discourse

• Automatic scoring of text and writing with Coh-Metrix and other automated systems

Foundational Claims

• There have been major advances in computational linguistics and automated discourse analyses during the last two decades.

• Accuracy is impressive in computer analyses of reading, writing, listening, and speaking.

• Conversational agents in social scenarios will play an increasing role in these assessments.

A snapshot of agents in assessments of

Reading

Writing

Speaking

Listening

Conversational Agents

BEAT Leonardo PKD Android Casey

iSTART TLTS MRE

AutoTutor Adele STEVE

iMAP

SI Agent

Guru

Writing-Pal

Memphis Agent Environments

PKD Android (Andrew Olney)

DeepTutor (Vasile Rus)

AutoTutor (Art Graesser)

iMAP (Max Louwerse)

Guru (Andrew Olney)

Meta-Tutor (Roger Azevedo)

AutoTutor-LITE (Xiangen Hu)

iSTART (Danielle McNamara)

iDRIVE (Barry Gholson)

Writing-Pal (Danielle McNamara)

HURA Advisor (Xiangen Hu)

Trialogs

Expert Fellow Student

Human

Vicarious

Trialogs in Learning

Low Ability: Vicarious learning

Medium Ability: Tutorial dialogue

High Ability: Teachable agent

Trialogs in Assessment

Low Ability: Short responses to prompts; inaccurate or irrelevant; violation of social norms

Medium Ability

High Ability: Lengthier turns; accurate contributions; social appropriateness

Confidential and Proprietary. Copyright © 2011 Educational Testing Service. All rights reserved.

Trialog (English Language Skills)

Agent: Utterance

Lisa: Hey, Ron, you need to leave your water outside. I'm going to go talk to my friends. I'll see you guys inside.

Ron: Why did she tell me I have to leave my water outside, Tim?

Human (Tim): I do not know.

Ron: Tim, why can't I drink water?

Human (Tim): The books may get wet.

Lisa: Why do you still have your water bottle, Ron? Look at rule number 2. We cannot get in the library with food or drink.

2005 ETS Invitational Conference, New York City, October 10-11, 2005

Tactical Language and Culture Training System (Lewis Johnson)

iSTART (Danielle McNamara) interactive Strategy Training for Active Reading and Thinking

Writing Pal (Danielle McNamara)

Self-Regulated Learning with MetaTutor (Roger Azevedo)

Scientific reasoning

Electronic textbook

Periodic questions – answer and ask

Critique experiments and newspaper clips

Game context with aliens

Trialogs with tutor and student agents

Operation ARIES!

People need to learn how to critically evaluate descriptions of research . . .

Correlational design

Causal statement

[Figure: core concepts are presented across the game, which moves through Interactive text, Case studies, and Interrogation, with story elements running throughout]

Daphne Greenberg

Lee Branum-Martin

Robin Morris

Chris Oshima

Maureen Lovett

Jan Frijters

Art Graesser

Xiangen Hu

Mark Conley

Andrew Olney

Intervention with AutoTutor and Trialogues (Graesser, D’Mello, Hu, Cai, Olney, & Morgan, 2012)

Conversational agents

Intelligent Tutoring System

Online through browser

Media include texts, diagrams, videos, quizzes, games, and social media

Adults communicate by typing, speaking, or pointing/clicking

Text Selection and Repository

• Critical to good instruction

• Related to both cognition and motivation

• Interesting

• Relevant to adult lives

• Multiple purposes

• Not too easy or too difficult

• Different genres, media, and technologies

Models of reading that emphasize discourse

Models or Frameworks in Discourse Processes Field

• Construction-integration model (Kintsch)

• Structure building framework (Gernsbacher)

• Causal structure (Trabasso, Van den Broek)

• Landscape model (Van den Broek)

• Constructionist theory (Graesser, Singer, Trabasso)

• Event indexing model (Zwaan, Magliano, Graesser)

• Memory-based resonance model (Myers, O’Brien)

• Embodied cognition (Glenberg, Zwaan)

Multilevel framework of discourse comprehension

1. Words

2. Syntax

3. Textbase

Explicit ideas (propositions)

Referential cohesion

4. Situation model

Causal, intentional, temporal, spatial, logical relationships

Connectives

5. Genre and rhetorical structure

6. Pragmatic communication

Graesser & McNamara (2011). Topics in Cognitive Science.

Language Learning is Multidimensional and Changes over Time (Scarborough, 2003)

Reading Framework Proposed by Perfetti (1999)

Goldman, Brown, Britt, Magliano, Greenleaf, Lee, Griffin, Hastings, Lawless, Pellegrino, Radinsky, Raphael, Shanahan, Wiley

Multilevel framework of discourse comprehension

1. Words

2. Syntax

3. Textbase

Explicit ideas (propositions)

Referential cohesion

4. Situation model

Causal, intentional, temporal, spatial, logical relationships

Connectives

5. Genre and rhetorical structure

6. Pragmatic communication

Graesser & McNamara (2011). Topics in Cognitive Science.

Potential Principles of Processing

• Bottom-up processing until reaching a level where the reader is not proficient. Attempt to achieve deepest, most global level.

• Intermediate levels (3 & 4) have the highest information novelty. Textbase and situation model demand resources.

• Top-down higher levels can circumvent the need to process lower levels. A cost to comprehension at lower levels.

• Levels can compensate for deficits at other levels. Particular compensation mechanisms require more research.

Scenario 1

A child has trouble recognizing letters in the alphabet so there is an obstacle in lexical decoding at the word level (level 1). The word deficit blocks him from understanding any of the text at levels 2-6.

Scenario 2

• Parents take their children to a new Disney movie that has some adult themes. The children notice the parents laughing at different points in the movie than they do.

• The children are making it successfully through discourse levels 1-4, but not levels 5 and 6.

Scenario 3

• An adult reads a health insurance document. There are lengthy sentences with embedded clauses, complex syntax, numerous quantifiers (all, many, rarely), and many logical operators (and, or, not, if).

• The adult signs the contract because she understands its purpose and trusts the insurance agency. Levels 5 and 6 circumvent the need to understand levels 2-4 completely.

Scenario 4

• Laboratory partners in an engineering course read the directions to assemble a new computer. They argue about how to hook up the cables on the dual monitors.

• They have no problems with levels 1, 2, 3, 5, and 6, but they do have a deficit at the situation model level (level 4).
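The first processing principle (bottom-up processing until reaching a level where the reader is not proficient) can be sketched in a few lines. This is only an illustration: the `Level` names and the `deepest_bottom_up` helper are hypothetical, and the sketch deliberately ignores the top-down and compensation principles.

```python
from enum import IntEnum


class Level(IntEnum):
    """The six levels of the multilevel framework (names are illustrative)."""
    WORDS = 1
    SYNTAX = 2
    TEXTBASE = 3
    SITUATION_MODEL = 4
    GENRE = 5
    PRAGMATICS = 6


def deepest_bottom_up(deficits):
    """Return the deepest level reached by strictly bottom-up processing.

    Processing climbs from level 1 and stops at the first level with a
    deficit. Returns None if the reader is blocked at level 1.
    """
    reached = None
    for level in Level:  # IntEnum iterates in definition order, 1 to 6
        if level in deficits:
            break
        reached = level
    return reached


# Scenario 1: a word-level deficit blocks everything above it.
print(deepest_bottom_up({Level.WORDS}))
# Scenario 2: the children get through levels 1-4 but not 5 and 6.
print(deepest_bottom_up({Level.GENRE, Level.PRAGMATICS}))
```

Scenarios 3 and 4 show why a strictly bottom-up rule is not enough: there, readers succeed at levels 5 and 6 despite trouble at levels 2-4, which requires the top-down and compensation principles.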

Conclusion

A mature assessment of reading and writing needs to be sensitive to the different levels of language and discourse.

Automatic scoring of text and writing with Coh-Metrix and other automated systems

Graesser, McNamara, & Kulikowich (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40, 223-234.

Graesser, A. C., & McNamara, D. S. (2012). Automated analysis of essays and open-ended verbal responses. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA Handbook of Research Methods in Psychology, Vol 1: Foundations, Planning, Measures, and Psychometrics (pp. 307-325). Washington, DC: American Psychological Association.

Text Difficulty Measures

• Popular measures (correlate r = .89 to .94)

– Flesch-Kincaid (Klare, 1976)

– Degrees of Reading Power (Koslin, Zeno, Koslin, 1987)

– Lexile scores (Stenner, 2006, MetaMetrics)

– SourceRater (Kosten, 2011, Educational Testing Service)

– Reading Maturity Metric (Landauer, Foltz, 2011, Pearson)

• Typical factors

– Word Familiarity: word frequency, # of letters, # of syllables

– Sentence Length

• Measures are more accurate if there are other levels
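To make the two typical factors concrete, here is a minimal sketch of the Flesch-Kincaid grade-level formula, which combines words per sentence with syllables per word. The syllable counter is a crude vowel-group heuristic and the sentence splitter is a simple regular expression; both are assumptions for illustration, not what any of the tools listed above use.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups (a heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from sentence length and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


easy = "The cat sat on a hat. The hat was black."
hard = ("Comprehension necessitates simultaneous integration of numerous "
        "interdependent representational levels.")
print(round(flesch_kincaid_grade(easy), 2))
print(round(flesch_kincaid_grade(hard), 2))
```

Because only word and sentence length enter the formula, two texts with the same score can still differ in cohesion, genre, or situation-model demands, which is the point of the last bullet.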

Automated Computer Assessments of Writing

• Essay graders (accuracy = human experts)

– Intelligent Essay Assessor (Pearson Knowledge Technology)

– E-Rater, Criterion, CBAL (Educational Testing Service)

– Writing-Pal (McNamara, Crossley, similar to Coh-Metrix)

• Answers to questions

– C-Rater (Educational Testing Service)

• Think aloud & self-explanations during reading

– iSTART (McNamara)

– Reading Strategy Assessment Tool (Magliano, Millis)

• Contributions during conversation

– AutoTutor dialogs and trialogs (Graesser)

– Operation ARA (Pearson Education)

Language & Discourse Analysis Tools

• Analysis of Words – WordNet, FrameNet, MRC Database, CELEX, LIWC

• Syntax – Penn Treebank, Charniak Parser

• Propositions – PropBank, logical form, entailment

• Coreference and cohesion – Coh-Metrix, SourceRater

• Essay Graders – Intelligent Essay Assessor, E-rater

• Genre analyzers – Biber, Coh-Metrix

• Dialogue – AutoTutor, iSTART, Operation ARA

Tools to Represent World Knowledge

• Linguistic Inquiry and Word Count (LIWC) (Pennebaker, Booth, & Francis, 2007)

• Latent semantic analysis (Landauer, McNamara, Dennis, & Kintsch, 2007)

• N-grams (Jurafsky & Martin, 2008)

• Topic models and MEM (Griffiths, Steyvers, & Tenenbaum, 2007; Chung & Pennebaker, 2010)

Figurative Language Remains a Challenge

• Metaphor, Simile

• Personification

• Metonymy

• Hyperbole, Understatement

• Irony, Sarcasm

• Jokes, Wit

• Indirect speech acts

Cohesion

– Repeated nouns and concepts across sentences

That cat sat on a hat. The hat was black.

– Less anaphor

The hat was black. vs. It was black.

That is short. vs. That sentence is short.

– More connectives between sentences

because, although, however, first, then

– More headers and topic sentences

– Genre structure
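The first two cohesion cues above (repeated nouns, fewer pronouns) can be approximated by checking whether adjacent sentences share a content word. The sketch below is a simplification under stated assumptions: the stopword list is a small hand-picked set, and plain word matching stands in for the part-of-speech tagging a real tool would use.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "that", "this", "it", "is", "was",
             "on", "in", "of", "and"}


def content_words(sentence):
    """Lowercased words in the sentence that are not stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def adjacent_overlap(text):
    """Proportion of adjacent sentence pairs sharing at least one content word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return shared / len(pairs)


print(adjacent_overlap("That cat sat on a hat. The hat was black."))  # 1.0
print(adjacent_overlap("That cat sat on a hat. It was black."))       # 0.0
```

Replacing "The hat" with the pronoun "It" removes the shared word, so the overlap score drops even though a human reader resolves the reference easily.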

Plants (low and high cohesion)

What Are the Needs of Plants?

Like all living things, plants have certain needs. Plants need sunlight, water, and air to live. Plants also need minerals (MIN·uhr·uhlz). A mineral is a naturally occurring substance that is neither plant nor animal.

The parts of plants help them to get or make what they need. All plants get water and minerals from the soil. The root is the part of the plant that grows underground. Roots help hold the plant in the ground. Roots also help take in water and minerals that the plant needs.

The stem is the part that supports the plant. It helps the plant stand upright. It carries minerals and water from the roots. It also carries food from the leaves to other parts of the plant.

[…]

What Plants Need

Plants have certain needs, just like all living things have needs. For example, plants need sunlight, water, and air to live. Plants also need minerals (pronounced as MIN·uhr·uhlz). A mineral is not a plant or an animal. Instead, a mineral is a substance in the ground that occurs naturally. There are three parts of plants that help plants get what they need or help plants make what they need.

The Three Parts of a Plant

The three parts of the plant are the roots, stems, and leaves.

1. The Root

The root is the part of the plant that grows underground. All plants get water and minerals from the ground, which is sometimes called soil. Roots help the plant take in water and minerals that the plant needs from the soil. Roots also help hold the plant in the ground.

2. The Stem

The stem is the part that supports the plant. The stem helps the plant stand upright. It carries minerals and water from the roots of the plant to other parts of the plant. The stem also carries food from the leaves to other parts of the plant.

Coh-Metrix Goals

• Automatic analysis of texts on multiple levels of the multilevel, multicomponent frameworks

• A tool that can be used by researchers, teachers, students, educational leaders, and the public

Coh-Metrix: A Natural Language Processing Tool

Analyzes texts on over 100 measures of cohesion and language

Google Cohmetrix

Example Coh-Metrix Measures

Word Measures

• Number of syllables

• Part of speech (noun, verb…)

• Word frequency

• Concreteness, imagery

• Multiple meanings

• WordNet

Co-reference Cohesion

• Noun and argument overlap

• Stem overlap

– (lemmas: run, runs, runner)

• Latent semantic analysis (LSA)

• Lexical diversity (type-token ratio)

• Pronouns

Situation Model Cohesion

• Connectives & discourse markers

• Causal and intentional verbs

• Causal and intentional cohesion

• Repetition in tense and aspect

• Logical operators

– and, or, therefore, if, then, not

Syntax

• Structural complexity

• Modifiers per noun-phrase

• Words before main verb of main clause

• Syntactic similarity between sentences
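Two of the measures listed above are simple enough to sketch directly: lexical diversity as a type-token ratio, and the density of logical operators. The incidence-per-1,000-words scaling and the tokenizer are assumptions made for illustration; Coh-Metrix itself relies on taggers, parsers, and lexicons that are not reproduced here.

```python
import re

# The operators named in the list above.
LOGICAL_OPERATORS = {"and", "or", "therefore", "if", "then", "not"}


def tokenize(text):
    """Lowercase word tokens (a simple regular-expression tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Unique words divided by total words; lower values mean more repetition."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    return len(set(tokens)) / len(tokens)


def incidence_per_1000(text, vocabulary):
    """Occurrences of words from `vocabulary` per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    hits = sum(1 for t in tokens if t in vocabulary)
    return 1000 * hits / len(tokens)


print(type_token_ratio("The hat was black. The hat was big."))
print(incidence_per_1000("If it rains then we stay and read.", LOGICAL_OPERATORS))
```

Type-token ratio is sensitive to text length, so comparisons are only meaningful between texts of similar size.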

[Figure: Coh-Metrix architecture. Processing stages: Preprocessing (Filters), Syntax Analysis (Tagger, Parser), Lexical Analysis (Lemmatizer, Stemmer). Resources: LSA, WordNet, MRC, CELEX, word lists, Coh-Metrix database. Outputs: syntax features and sentence complexity; lexical features and word difficulty; spatial, temporal, causal, and referential cohesion; text complexity components.]

Copy and Paste text

Click on Analyze

Graph appears with verbal translation

Tea.cohmetrix.com

Coh-Metrix Easability Components

• Narrativity. Narrative text tells a story, with characters, events, places, and things that are familiar to the reader.

• Syntactic simplicity. Sentences with more complex syntax are more difficult to process, whereas those with few words and simple, familiar structures are easier to process and understand.

• Word concreteness. Concrete words evoke mental images and are more meaningful to the reader than abstract words.

• Referential cohesion. High cohesion text contains words and ideas that overlap across sentences and the entire text, forming threads that connect the textbase together for the reader.

• Deep cohesion. Causal, intentional, and temporal connectives help the reader to form a more coherent and deeper understanding of the text.

[Figure: Coh-Metrix easability percentiles (0 to 100) on Narrativity, Syntactic Simplicity, Word Concreteness, Referential Cohesion, and Deep Cohesion for two texts: Maps and Globes, and How the Camel Got His Hump Back]

Maps and Globes:

Thousands of years ago, our ancestors invented the map. Ancient maps were crude but very useful tools. They helped people find food, clean water, and the way back home--even when home was a cave. As civilizations grew, better maps were needed.

How the Camel Got His Hump Back:

Now this is the next tale, and it tells how the Camel got his big hump. In the beginning of years, when the world was so new and all, and the Animals were just beginning to work for Man, there was a Camel, and he lived in the middle of a Howling Desert…

How did we arrive at our 5 major measures? (Graesser, McNamara, & Kulikowich, 2011, Educational Researcher)

• TASA - Touchstone Applied Science Associates

• 37,520 texts

• Texts had a mean of 288.6 words (SD = 25.4)

• Most of the texts are in language arts, science, and social studies

• Represents texts a student would experience throughout K-12

• Degrees of Reading Power (DRP) scores and genre classification

• Conducted a Principal Components Analysis with 53 Coh-Metrix measures

[Figures: five line graphs, one per component, with a y-axis running from -1.5 to 2 and grade bands on the x-axis (Grades < 2, Grades 2-3, Grades 4-5, Grades 6-8, Grades 9-10, Grades 11-CCR). Grade level approximated by DRP. Red = Narrative, Green = Social Studies, Blue = Science.]

• Narrativity: Large differences between narrative and informational texts

• Syntactic Simplicity: Syntax is simpler for informational texts

• Referential Cohesion: Referential cohesion may compensate for difficult content

• Deep Cohesion: Science is a bit lower, with a slight increase over grades

• Word Concreteness: An intriguing curvilinear trend

Measures of Text Difficulty Report (Nelson, Perfetti, Liben, & Liben, 2012)

Formality of Language

Formal

• Expository

• High cohesion

• Complex syntax

• Abstract words

Informal

• Narrative

• Low cohesion

• Simple syntax

• Concrete words

Policies of Text Assignment

• Pushing the envelope

• Building self-efficacy

• Balanced diet

• The role of topic interest

Text Library

Search

Coh-Metrix Profile

Analysis & Recommendations

Sort Texts

Easy Hard

Text Difficulty Heat Maps

Difficult Text Version

There was once a popular television show about an exceptional talking horse named Mr. Ed.

You might surmise that some people actually believed, or, at the very least, wanted to believe in the alleged talking horse.

Centuries ago, there were vast majorities of people who believed that there was a horse that could answer questions; that it could even spell words and do complex arithmetic!

This is the factual story about a horse known as Clever Hans, and the "clever" part of his name came from his lofty intelligence.

Around the turn of the 20th century, a German school instructor named Mr. von Osten exhibited his keenly intelligent horse who could not convey information verbally, so instead, he did so by tapping his hoofs on the ground in order to give his responses to questions.

Easy Text Version

There was once a popular television show about a talking horse named Mr. Ed.

You might think that some people actually believed there really were talking horses.

But no, even back then they knew that horses could not talk.

Over 100 years ago there were a lot of people who believed that there was a horse that could answer questions.

The horse could even spell words and do arithmetic!

This is the true story about a horse known as Clever Hans.

The horse lived with his owner in Germany; Hans is a common name (even for a horse) in Germany.

The “clever” part of his name came from his high intelligence.

Here is the story: Around the turn of the 20th century, a German school teacher named Mr. von Osten showed off his very smart horse.

Of course, not even this smart horse could talk, so he communicated by tapping his hoofs.

Heat map scale: Easy to Hard

Corresponding sentences differ up to 3 levels between Text A and Text B

Definition of Engagement

Ztd = Z-score of text segment on difficulty

Zrt = Z-score of reading time

Discrepancy score = |Zrt - Ztd|

Disengagement increases with the discrepancy score.
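The definition above can be worked through with a short sketch. How the z-scores are standardized is not specified on the slide, so the choices here (standardizing across the segments of one text, using the population standard deviation) are assumptions, and the function names and sample numbers are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, see lead-in
    if sigma == 0:
        raise ValueError("values must not all be identical")
    return [(v - mu) / sigma for v in values]


def discrepancy_scores(difficulty, reading_time):
    """|Zrt - Ztd| for each text segment."""
    if len(difficulty) != len(reading_time):
        raise ValueError("need one reading time per text segment")
    ztd = z_scores(difficulty)
    zrt = z_scores(reading_time)
    return [abs(rt - td) for td, rt in zip(ztd, zrt)]


difficulty = [2.0, 4.0, 6.0, 8.0]   # e.g. difficulty of four segments
coupled = [1.0, 2.0, 3.0, 4.0]      # reading time tracks difficulty
decoupled = [4.0, 3.0, 2.0, 1.0]    # reading time runs against difficulty

print(discrepancy_scores(difficulty, coupled))
print(discrepancy_scores(difficulty, decoupled))
```

When reading time rises and falls with difficulty, the discrepancy stays near zero; when the reader speeds through hard segments or lingers on easy ones, it grows.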

What is the half-life of engagement for a text?

[Figure: De-Coupling as a function of Flesch-Kincaid score of sentence triplets; x-axis: FKG Score]

Conclusions about Reading Time and Decoupling

• Readers most engaged at the zone of language they can handle

• Currently analyzing this decoupling at different levels of the multilevel framework

• Intelligent tutor could provide feedback and activities at periodic points as they read.

Foundational Claims

• There have been major advances in computational linguistics and automated discourse analyses during the last two decades.

• Accuracy is impressive in computer analyses of reading, writing, listening, and speaking.

• Conversational agents in social scenarios will play an increasing role in these assessments.

Who is at the Table?

• AI and computational linguistics

• Cognitive and learning sciences

• Language and discourse processes

• Measurement and assessment
