
Improving Learning from Peer Review with NLP and ITS Techniques

(July 2009 – June 2011)

Kevin Ashley, Diane Litman, Chris Schunn

Thank You for the Support!

New interdisciplinary research group

Research outcomes
– Refereed publications
– Pending IES and NSF proposals

Technology development
– New version of SWoRD
– “Intelligent” scaffolding components

Outline

– SWoRD
– Intelligent Scaffolding for Reviewers and Authors
– AI-supported Argument Diagramming
– Summary

SWoRD [Cho & Schunn, 2007]

1. Authors submit papers
2. Reviewers submit (anonymous) feedback
3. Authors revise and resubmit papers
4. Authors provide back-ratings to reviewers regarding feedback helpfulness

SWoRD Rebuild

SWoRD 3.5 at LRDC dying; v4.0 at Missouri struggling

New SWoRD v5.0 at LRDC
– Rebuilt from scratch (more stable, expandable)
– More instructional flexibility
» Number and type of rating dimensions, reviewing dimensions, number of drafts, grading options, …
– Better instructor oversight of students
» Missing papers & reviews, high-conflict reviews, inter-rater accuracy, …
– Better research support
» Can directly download ‘research’ data

SWoRD 5.0 Users

Active classes in Spring 2011: 39

Users in Spring 2011: ~2000

Total user accounts: ~3900

Countries:

– USA

– Canada

– United Kingdom

– Netherlands

– Estonia

– Hungary

– Turkey

– China

Disciplines:

– Psychology

– Astronomy & Physics

– Computer Science

– Biology

– Economics

– Engineering

– Speech-Language Pathology

– English & Rhetoric

– Philosophy

– Women's Health

Levels:

– University

– High School

– Middle School

Some Remaining Weaknesses

1. Feedback is often not stated in effective ways

2. Feedback and papers often do not focus on core aspects

Feedback Features and Positive Writing Performance [Nelson & Schunn, 2008]

[Figure: relations among feedback features (Solutions, Summarization, Localization), Understanding of the Problem, and Implementation]

Our Approach: Detect and Scaffold

1. Detect and direct reviewer attention to key feedback features such as solutions

2. Detect and direct reviewer and author attention to thesis statements in papers and feedback

Detecting Key Features of Text Using Educational Data Mining

Natural Language Processing (NLP) to extract attributes from text, e.g.:
– Regular expressions (e.g. “the section about”)
– Domain lexicons (e.g. “federal”, “American”)
– Syntax (e.g. demonstrative determiners)
– Overlapping lexical windows (quotation identification)

Machine Learning (ML) to predict whether feedback contains localization and solutions, and whether papers contain a thesis statement
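The four attribute types listed above can be sketched in a few lines of Python. This is a minimal illustration, not the project's feature extractor: the location patterns, the two-word lexicon, and the 4-word window size are assumptions. Counting demonstrative words is only a lexical stand-in for the syntactic cue; a real system would take determiners from a parser.

```python
from __future__ import annotations

import re

# Illustrative resources (assumptions, not the project's actual lists)
LOCATION_PATTERNS = [
    re.compile(r"\bthe section about\b", re.IGNORECASE),
    re.compile(r"\b(?:page|paragraph)\s+\d+\b", re.IGNORECASE),
]
DOMAIN_LEXICON = {"federal", "american"}            # e.g. for a History class
DEMONSTRATIVES = {"this", "that", "these", "those"}  # lexical proxy for a syntactic cue


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (punctuation dropped)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def quotes_paper(feedback: str, paper: str, n: int = 4) -> bool:
    """True if any n-word window of the feedback also occurs in the paper."""
    fb, pp = tokenize(feedback), tokenize(paper)
    paper_windows = {tuple(pp[i:i + n]) for i in range(len(pp) - n + 1)}
    return any(tuple(fb[i:i + n]) in paper_windows for i in range(len(fb) - n + 1))


def extract_features(feedback: str, paper: str) -> dict:
    """Attribute vector for one feedback comment on one paper."""
    toks = tokenize(feedback)
    return {
        "regex_location": any(p.search(feedback) for p in LOCATION_PATTERNS),
        "domain_words": sum(t in DOMAIN_LEXICON for t in toks),
        "demonstratives": sum(t in DEMONSTRATIVES for t in toks),
        "quotes_paper": quotes_paper(feedback, paper),
    }
```

Each comment's dictionary would become one row of training data for the ML step described above. Note that the proxy also counts the complementizer "that", one reason a syntactic feature is preferable.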

Learned Localization Model [Xiong, Litman & Schunn, 2010]

Quantitative Model Evaluation (10-fold cross-validation)

Feedback feature   Classroom corpus      N   Baseline accuracy   Model accuracy   Model kappa   Human kappa
Localization       History             875   53%                 78%              .55           .69
Localization       Psychology         3111   75%                 85%              .58           .63
Solution           History            1405   61%                 79%              .55           .79
Solution           CogSci             5831   67%                 85%              .65           .86
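The table's columns can be read through a few small definitions, sketched below. The sketch assumes that "baseline" means majority-class accuracy (the slide does not say), that Model Kappa compares predictions with human labels while Human Kappa compares two human coders, and it uses plain round-robin folds rather than whatever (possibly stratified) split the original evaluation used.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently
    expected = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1:
        return float("nan")  # undefined: both sequences use one identical label
    return (observed - expected) / (1 - expected)


def kfold_indices(n, k=10):
    """Split item indices 0..n-1 into k disjoint test folds (round-robin)."""
    return [list(range(i, n, k)) for i in range(k)]
```

For example, two coders who agree on 6 of 8 binary labels, each using both labels equally often, have observed agreement .75 and chance agreement .50, so kappa is .50. This is why a model can beat the baseline on accuracy yet still show a kappa well below 1.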

Predicting Feedback Helpfulness [Xiong & Litman, under review]

Recall that SWoRD supports numerical back ratings of feedback helpfulness

– My concerns come from some of the claims that are put forth. Page 2 says that the 13th amendment ended the war. Is this true? Was there no more fighting or problems once this amendment was added? … (rating 5)

– Your paper and its main points are easy to find and to follow. (rating 1)

Predicting Expert Ratings (Average of Writing and Domain Experts)

Structural attributes (e.g. review length, number of questions), lexical statistics, and meta-data (e.g. paper ratings) developed for product reviews (e.g. Amazon) are also useful for peer feedback

Features specialized for peer-review (e.g. localization) can further improve performance
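A sketch of the generic structural attributes named above, plus one peer-review-specific cue. The three structural counts and the page-reference regex are illustrative assumptions; lexical statistics and paper-rating meta-data are omitted. The training target follows the slide's definition of expert ratings as the average of the writing and domain experts' scores.

```python
import re


def structural_features(review: str) -> dict:
    """Generic structural attributes of the kind used for product reviews."""
    words = re.findall(r"[\w']+", review)
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    return {
        "length_words": len(words),
        "num_sentences": len(sentences),
        "num_questions": review.count("?"),
        # Peer-review-specific cue (hypothetical regex): does the comment point to a page?
        "mentions_page": bool(re.search(r"\bpage\s+\d+", review, re.IGNORECASE)),
    }


def expert_target(writing_rating: float, domain_rating: float) -> float:
    """Training target: average of the writing expert's and domain expert's ratings."""
    return (writing_rating + domain_rating) / 2
```

On the two example comments (the first truncated at its "…"), the rating-5 comment yields 37 words, 4 sentences, 2 questions, and a page reference, while the rating-1 comment yields 13 words, 1 sentence, and no questions, so even these crude attributes separate the pair.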

Current work: student helpfulness ratings

The Problem

Students unable to synthesize what the sources say…

… or to apply them in solving the problem.

Our Solution

Source texts

1. Author creates Argument Diagram
2. Peers review Argument Diagrams
3. Author revises Argument Diagram
4. Author writes paper
5. Peers review papers
6. Author revises paper

AI: Guides preparing the diagram and using it in writing

AI: Guides reviewing

Argument diagram a student created with LASAD

1 · Hypothesis Link: 1
If: Participants are assigned to the active condition
Then: they will be better at correctly identifying stimuli than participants in the passive condition.

2 · Hypothesis Link: 2
If: The participant has small hands
Then: they will be better at recognizing objects than regardless of what condition they’re in..

9 · (+) supports Link: 1
– Active touch participants were able to more accurately identify objects because they had the use of sensitive fingertips in exploring the objects
– Active touch is more effective than passive touch
– Active touch improved through the development levels but passive touch stayed the same (hand size may play role)
– Sensory perceptors in smaller hands are closer together, allowing for more accurate object acuity

8 · Citation Link: 1
– (Craig 2001)
– (Gibson 1962)
– (Cronin 1977)
– (Peters 2009)

LASAD analyzes diagrams

Even with a small set of argument node types, relation types, and constraint-defining rules, simple argument diagrams provide pedagogical information that can be analyzed automatically. E.g., has the student:
– Addressed all sources and hypotheses? (No)
– Indicated that citations support claims/hypotheses? (No; here the links run vice versa)
– Related all sources and hypotheses under a single claim? (No)
– Related some citations to more than one hypothesis? (No interactions here)
– Included oppositional relations as well as supports? (No)
– Avoided isolated citations? (Yes)
– Avoided disjoint sub-arguments? (No)
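Several of these checks reduce to simple graph queries. A minimal sketch follows; the Diagram class, node types, and relation names are hypothetical stand-ins for LASAD's actual ontology and rule language, which are not shown here. It covers four of the listed checks: isolated citations, support links drawn in the wrong direction, missing oppositional relations, and disjoint sub-arguments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Diagram:
    # node id -> node type: "hypothesis", "citation", or "claim" (hypothetical ontology)
    nodes: dict[str, str]
    # (source, target, relation), relation is "supports" or "opposes"
    links: list[tuple[str, str, str]] = field(default_factory=list)


def isolated(d: Diagram, node_type: str) -> list[str]:
    """Nodes of the given type that take part in no link."""
    linked = {n for s, t, _ in d.links for n in (s, t)}
    return [n for n, ty in d.nodes.items() if ty == node_type and n not in linked]


def reversed_supports(d: Diagram) -> list[tuple[str, str]]:
    """'supports' links drawn from a hypothesis to a citation instead of the reverse."""
    return [(s, t) for s, t, r in d.links
            if r == "supports" and d.nodes[s] == "hypothesis" and d.nodes[t] == "citation"]


def components(d: Diagram) -> list[set[str]]:
    """Connected components of the diagram, ignoring link direction."""
    adj = {n: set() for n in d.nodes}
    for s, t, _ in d.links:
        adj[s].add(t)
        adj[t].add(s)
    seen: set[str] = set()
    comps = []
    for start in d.nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                comp.add(node)
                stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps


def check(d: Diagram) -> dict:
    """Run the rule checks and report what a feedback panel could flag."""
    return {
        "isolated_citations": isolated(d, "citation"),
        "reversed_supports": reversed_supports(d),
        "has_oppositions": any(r == "opposes" for _, _, r in d.links),
        "disjoint_subarguments": len(components(d)) > 1,
    }
```

Treating the diagram as an undirected graph for the disjointness check is a design choice: a sub-argument counts as connected if any relation joins it, whatever the relation's direction.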

Prototype SWoRD interface giving feedback to the reviewer before the review is submitted

Claims or reasons are unconnected to the research question or hypothesis.

Lippman, 2010 is not organized around a hypothesis.

Siler 2009 is more focused on the response to the task not focused on the actual type of task which is what the hypothesis for the effect of IV2. Doesn’t support the research question.

H2 needs reasoning to connect prior research with the hypothesis, e.g. “because multi-step algebra problems are perceived as more difficult, people are more likely to fail in solving them.”

Support 2 is weak because it’s basically citing a study as the reason itself. Instead, it should be a general claim, that uses Jones, 2007 to back it up.

Lippman, 2010 is free floating and needs to be linked to either the research question or a hypothesis.

Say where these issues happen! (like the green text in other comments)

Suggest how to fix these problems! (like the blue text in other comments)

Legend: Localization hints · Solution hints
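One way the detectors could drive this prototype: run the localization and solution models on each comment and attach the matching hint whenever a feature is missing. A minimal sketch, assuming per-comment triggering; the Comment class and attach_hints function are hypothetical, the boolean flags stand in for classifier outputs, and only the hint wording is taken from the prototype.

```python
from dataclasses import dataclass

LOCALIZATION_HINT = "Say where these issues happen!"
SOLUTION_HINT = "Suggest how to fix these problems!"


@dataclass
class Comment:
    text: str
    localized: bool     # in a real system: the localization model's prediction
    has_solution: bool  # in a real system: the solution model's prediction


def attach_hints(comments):
    """Pair each comment with the hints it should trigger before the review is submitted."""
    paired = []
    for c in comments:
        hints = []
        if not c.localized:
            hints.append(LOCALIZATION_HINT)
        if not c.has_solution:
            hints.append(SOLUTION_HINT)
        paired.append((c.text, hints))
    return paired
```

With hand-assigned flags, "Claims or reasons are unconnected to the research question or hypothesis." would receive both hints, while "Lippman, 2010 is free floating and needs to be linked to either the research question or a hypothesis." would receive none.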

Prototype tool to translate student argument diagrams into text

A Translation of Your Argument Diagram (click to edit)

The first hypothesis is, “If participants are assigned to the active condition, then they will be better at correctly identifying stimuli than participants in the passive condition.” This hypothesis is supported by (Craig 2001) where it was found that “Active touch participants were able to more accurately identify objects because they had the use of sensitive fingertips in exploring the objects.” The hypothesis is also supported by (Gibson 1962) where …

The second hypothesis is, …

Next Steps

Possible things to improve your argument:
– Add a missing citation
– Add a third hypothesis
– Indicate which hypothesis is an interaction hypothesis and specify the interaction variable(s)
– Relate one or more hypotheses, along with their supporting sources, under a single sub-claim
– Include any oppositional relations between citations and a hypothesis
– Relate the disjoint sub-arguments concerning the hypotheses under one overall argument
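The translation prototype reads like template filling over the diagram's nodes. A minimal sketch under that assumption: the sentence templates are copied from the example translation, while the input format and the function names (hypothesis_sentence, translate) are hypothetical. A full generator would also need templates for claims, oppositions, and interactions.

```python
ORDINALS = ["first", "second", "third", "fourth", "fifth"]


def hypothesis_sentence(if_part: str, then_part: str) -> str:
    """Join a hypothesis node's 'If:' and 'Then:' fields into one sentence."""
    if_part = if_part.strip().rstrip(".")
    then_part = then_part.strip().rstrip(".")  # also drops stray double periods
    return f"If {if_part[0].lower()}{if_part[1:]}, then {then_part}."


def with_period(text: str) -> str:
    text = text.strip()
    return text if text.endswith(".") else text + "."


def translate(hypotheses) -> str:
    """hypotheses: list of (if_part, then_part, [(citation, finding), ...]).

    Handles at most len(ORDINALS) hypotheses; one paragraph per hypothesis.
    """
    paragraphs = []
    for i, (if_part, then_part, supports) in enumerate(hypotheses):
        sentences = [f'The {ORDINALS[i]} hypothesis is, "{hypothesis_sentence(if_part, then_part)}"']
        for j, (citation, finding) in enumerate(supports):
            lead = "This hypothesis is supported" if j == 0 else "The hypothesis is also supported"
            sentences.append(f'{lead} by ({citation}) where it was found that "{with_period(finding)}"')
        paragraphs.append(" ".join(sentences))
    return "\n\n".join(paragraphs)
```

Keeping the templates as plain strings makes the "click to edit" step simple: the student edits generated prose rather than the generator.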
