detecting missing hyphens in learner text

16
Copyright © 2013 by Educational Testing Service. All rights reserved. 14-June-2013 Detecting Missing Hyphens in Learner Text Aoife Cahill * , Martin Chodorow ** , Susanne Wolff * and Nitin Madnani * * Educational Testing Service, 660 Rosedale Road, Princeton, NJ 08541, USA {acahill, swolff, nmadnani}@ets.org ** Hunter College and the Graduate Center, City University of New York, NY 10065, USA [email protected]

Upload: hasana

Post on 07-Jan-2016

17 views

Category:

Documents


0 download

DESCRIPTION

Detecting Missing Hyphens in Learner Text. Aoife Cahill * , Martin Chodorow ** , Susanne Wolff * and Nitin Madnani * * Educational Testing Service, 660 Rosedale Road, Princeton, NJ 08541, USA {acahill, swolff, nmadnani}@ets.org - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved. 14-June-2013

Detecting Missing Hyphens in Learner Text

Aoife Cahill*, Martin Chodorow**, Susanne Wolff* and Nitin Madnani*

*Educational Testing Service, 660 Rosedale Road, Princeton, NJ 08541, USA{acahill, swolff, nmadnani}@ets.org

**Hunter College and the Graduate Center, City University of New York, NY 10065, [email protected]

Page 2: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Outline

14-June-2013

• Motivation• Baselines• New Model • Experiments and Results• Conclusion

2

Page 3: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Motivation

14-June-2013

• Hyphen errors are infrequent• But are an important consideration

for students aiming to improve the overall quality of their writing

3

Dogs are lucky… most of them have built in fur coats! Brrrr!

From: http://daughternumberthree.blogspot.com

Page 4: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Motivation

14-June-20134

• Missing hyphen errors are not all lexical1. Schools may have more after school sports.2. I went to the dentist after school today.

• Language Learner text introduces additional complications3. My father like play basketball with me.

Page 5: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Baselines

14-June-2013

• Baseline 1: Collins Dictionary [5,246]– predicts a missing hyphen between bigrams that appear

hyphenated in the dictionary

• Baseline 2: Wiki (counts) [1,095]– predicts a missing hyphen between bigrams that occur

hyphenated more than 1,000 times in Wikipedia

• Baseline 3: Wiki (probs) [673,269]– predicts a missing hyphen between bigrams where the

probability of the hyphenated form as estimated from Wikipedia is greater than 0.66

5

Page 6: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

New Model

14-June-2013

• Logistic Regression Model– assigns a probability to the likelihood of a

hyphen occurring between wi and wi+1

6

Features

Tokens wi-1, wi, wi+1, wi+2 Dict Does the hyphenated form appear in the Collins dictionary?

Stems si-1, si, si+1, si+2 Prob What is the probability of the word bigram appearing hyphenated in Wikipedia?

Tags ti-1, ti, ti+1, ti+2 Distance Distance to following and preceding verb, noun

Bigrams wi–wi+1, si–si+1, ti–ti+1 Verb/Noun Is there a verb/noun preceding/following this bigram?

Page 7: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Data

14-June-2013

• Training– Well-edited text (San Jose Mercury News)– Error-corrected data mined from Wikipedia

Revisions– Combination

• Test– Artificial errors: Brown corpus– Learner text: CLC-FCE corpus, TOEFL/GRE

essays

7

Page 8: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Evaluation on Artificial Errors

14-June-2013

• Brown Corpus: 24,243 sentences, automatically remove hyphens from 2,072 words

• Each system makes a prediction for all bigrams about whether a hyphen should appear between the pair of words– precision: how many of the missing hyphen errors

predicted by the system were true errors– recall: how many of the artificially removed hyphens the

system detected as errors– f-score: the harmonic mean of precision and recall

8

Page 9: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Artificial Error Results

14-June-20139

Baseline True Positives Precision Recall F-Score

Collins Dictionary 397 40.5 19.2 26.0

Wiki (counts) 359 39.1 17.3 24.0

Wiki (probs) 811 85.5 39.1 53.7

Classifier True Positives Precision Recall F-Score

SJM-Trained 1097 82.0 52.9 64.3

Wiki-Revision-trained 1061 72.8 51.2 60.1

Combination 1106 80.9 53.4 64.3

Page 10: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Evaluation on Learner Text (1)

14-June-2013

• CLC-FCE corpus: 173 instances of missing hyphen errors

10

Baseline True Positives Precision Recall F-Score

Collins Dictionary 131 64.5 75.7 69.7

Wiki (counts) 141 73.1 81.5 77.0

Wiki (probs) 36 92.3 20.8 34.0

Classifier True Positives Precision Recall F-Score

SJM-Trained 60 84.5 34.7 49.2

Wiki-Revision-trained 71 98.6 41.0 58.0

Combination 66 98.5 38.2 55.0

Page 11: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Evaluation on Learner Text (1)

• Some observations:– Very low frequency error (173)– Dominated by one lexical item: make-up– Errors are not independent events

14-June-201311

Page 12: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Evaluation on Learner Text (2)

• Precision-only manual evaluation

• Random sample of 100 errors per system detected in 1,000 student essays

• 2 native speaker judgements (0.79)

14-June-201312

Page 13: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Evaluation on Learner Text (2)

• Native Speaker Judgements (Precision)

14-June-201313

Baseline Total Predictions Judge 1 Judge 2

Collins Dictionary 416 11 8

Wiki (counts) 2185 20 21

Wiki (probs) 224 54 52

Classifier Total Predictions Judge 1 Judge 2

SJM-Trained 421 62 69

Wiki-Revision-trained 577 43 41

Combination 450 60 62

Page 14: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Conclusions

14-June-2013

• Logistic Regression Model for predicting missing hyphens in learner text

• Trained on:1. A corpus of well-edited text2. A corpus of automatically mined corrections

• In general, the classifiers outperform the baselines, especially in terms of precision

14

http://blog.ezinearticles.com

Thanks!

Questions? Comments?

Page 15: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

Brown Corpus: Precision/Recall

14-June-201315

Page 16: Detecting Missing Hyphens in Learner Text

Copyright © 2013 by Educational Testing Service. All rights reserved.Copyright © 2013 by Educational Testing Service. All rights reserved.

CLC-FCE: Precision/Recall

14-June-201316