James Christie
Automated marking for essay content ~ does it work?
Essay Definition 1 of 2
… requires a response composed by the examinee, usually in the form of one or more sentences, of a nature that no single response or pattern of responses can be listed as correct, and the accuracy and quality of which can be judged subjectively only by one skilled or informed in the subject, …
Essay Definition 2 of 2
… but even an expert cannot usually classify a response as categorically right or wrong. Rather, there are different degrees of quality or merit which can be recognized. …
attributed to Stalnaker, 1951
Possible criteria for automated essay marking
• Ease of creating a scoring schema
• Ability to score on various mark regimes
• Ease of identification of non-scoring elements
• Ease of modification should scoring error(s) occur
• Consistent and reproducible scoring
• Acceptability of results to human markers, essayists, …
• Defensibility
• Accuracy and precision
• Coachability avoidance
• Cost
Essay Set
• ALPHA – The cat sat on the mat.
• BRAVO – The cat sat on the floor.
• CHARLIE – The dog lay on the floor.
• MODEL – The cat sat on the mat.
Process interface
What LEVEL of Diagnostics to use [0 ... 3] : 0
What ESSAY SET to use .................... : catmat
Enter SCHEMA to use ...................... : catmat
ALPHA.EXT . BRAVO.EXT . CHARLIE.EXT . MODEL.EXT .
Started on Thursday, February 06 2003 at 15:59:05
Finished on Thursday, February 06 2003 at 15:59:06
Schema Report
Essay set ............. catmat
Schema Report using ... catmat
Entities 1 - 4 :
Entity ID   : 1 2 3 4
Entity Type : a f f f
Part ID's   : z
Essay Name  :
ALPHA.EXT   : y y y y
BRAVO.EXT   : _ y y _
CHARLIE.EXT : _ _ _ _
MODEL.EXT   : y y y y
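The Schema Report reads as a presence matrix: for each essay, whether each schema entity was found (y) or not (_). SEAR's schema format is not shown in the talk, so the sketch below is only a reconstruction consistent with the catmat rows. The keywords "cat", "sat" and "mat" for the three type f entities, and the reading of type a as "all of entities 2 to 4 present", are hypothetical assumptions chosen because they reproduce the report.

```python
import re

# HYPOTHETICAL schema: SEAR's real schema format is not shown in the talk.
# Entity 1 (type "a") is assumed to mean "all of the listed entities present";
# entities 2 to 4 (type "f") are assumed to be single keywords.
SCHEMA = {
    1: ("a", [2, 3, 4]),
    2: ("f", "cat"),
    3: ("f", "sat"),
    4: ("f", "mat"),
}

ESSAYS = {
    "ALPHA.EXT": "The cat sat on the mat.",
    "BRAVO.EXT": "The cat sat on the floor.",
    "CHARLIE.EXT": "The dog lay on the floor.",
    "MODEL.EXT": "The cat sat on the mat.",
}


def words(text):
    """Lower-cased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def presence(text, schema=SCHEMA):
    """Return {entity_id: bool} saying whether each entity is found."""
    tokens = set(words(text))
    found = {}
    # Resolve keyword ("f") entities first ...
    for eid, (etype, spec) in schema.items():
        if etype == "f":
            found[eid] = spec in tokens
    # ... then aggregate ("a") entities, assumed to refer only to "f" entities.
    for eid, (etype, spec) in schema.items():
        if etype == "a":
            found[eid] = all(found[child] for child in spec)
    return found


def report_row(text, schema=SCHEMA):
    """Render one Schema Report row, e.g. '_ y y _'."""
    found = presence(text, schema)
    return " ".join("y" if found[eid] else "_" for eid in sorted(schema))


for name, text in ESSAYS.items():
    print(f"{name:<12}: {report_row(text)}")
```

Under these assumptions the printed rows match the four Schema Report rows; a different schema could produce the same matrix, so this shows the shape of the computation, not SEAR's rules.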
Content Report
Essay set .............. catmat
Content Report using ... catmat
Essay Name   Words  Sentences  Usage[%]  Coverage[%]  Part z[3]  Mark[3]  %[100]
ALPHA.EXT        6          1     50.00       100.00          3        3  100.00
BRAVO.EXT        6          1     33.33        50.00          0        0    0.00
CHARLIE.EXT      6          1      0.00         0.00          0        0    0.00
MODEL.EXT        6          1     50.00       100.00          3        3  100.00
Started on Thursday, February 06 2003 at 15:59:05
Finished on Thursday, February 06 2003 at 15:59:06
Marked 4 file(s): scanned 4 file(s)
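The talk does not define the Usage and Coverage columns. One reading that reproduces every row is: Coverage = schema entities found / entities in the schema, and Usage = essay words matched by a schema keyword / total words. The sketch below computes both, plus a mark, from the Schema Report presence patterns. The matched-word counts (3, 2, 0, 3) are inferred as Usage × Words / 100, and the rule that part z earns its 3 marks only when entity 1 is found is an assumption.

```python
# HYPOTHETICAL reading of the Content Report columns; SEAR's actual
# formulas are not given in the talk.
PART_Z_MAX = 3  # from the report header "Part: z[ 3]" / "Mark[ 3]"

# (words, sentences, schema-matched words, presence pattern from Schema Report)
# The matched-word counts are inferred as Usage[%] * Words / 100.
ROWS = {
    "ALPHA.EXT": (6, 1, 3, "y y y y"),
    "BRAVO.EXT": (6, 1, 2, "_ y y _"),
    "CHARLIE.EXT": (6, 1, 0, "_ _ _ _"),
    "MODEL.EXT": (6, 1, 3, "y y y y"),
}


def content_row(words, matched, pattern, part_max=PART_Z_MAX):
    """Return (usage %, coverage %, part mark, total mark, mark %)."""
    flags = [f == "y" for f in pattern.split()]
    usage = 100.0 * matched / words if words else 0.0
    coverage = 100.0 * sum(flags) / len(flags)
    # Assumption: part z earns full marks only when entity 1 (type "a") is found.
    part = part_max if flags[0] else 0
    mark = part  # this schema has a single part, z
    pct = 100.0 * mark / part_max
    return round(usage, 2), round(coverage, 2), part, mark, round(pct, 2)


for name, (w, s, m, pat) in ROWS.items():
    u, c, part, mark, pct = content_row(w, m, pat)
    print(f"{name:<12}{w:>5} {s:>9} {u:>9.2f} {c:>12.2f} {part:>9} {mark:>8} {pct:>7.2f}")
```

Note how BRAVO scores 50% coverage yet 0 marks under this rule: partial keyword overlap is reported but not rewarded.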
Marking Performance
Essay Set   First v Second Markers   Human v SEAR
A           0.704** / 0.700**        0.594** / 0.596**
B           0.810** / 0.740**        0.404** / 0.376**
C           0.164 / 0.277            0.302* / 0.394**
D           N/A                      0.238* / 0.336**
Values are Pearson / Spearman correlations.
Significance: ** at the 0.01 level, * at the 0.05 level.
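Each cell above pairs a Pearson correlation (linear agreement of the raw marks) with a Spearman correlation (agreement of their rank order). The sketch below computes both from scratch, with Spearman taken as Pearson on ranks and tied marks given their average rank. The marks in the example are invented toy values, not data from the study, and the significance tests behind * and ** are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy marks (invented for illustration, NOT the study's data).
human = [2, 4, 3, 1, 4]
machine = [1, 3, 3, 2, 4]
print(f"{pearson(human, machine):.3f} / {spearman(human, machine):.3f}")
```

With coarse scales such as 4-point marks, ties are common, which is why average ranks matter for the Spearman figure.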
Commercial packages v SEAR
Compared on: essay marking (style, content), grading scheme, and free-text input (plain text, word-processed, English, non-English)
Product                      Grading scheme
PEG                          4 point
e-Rater                      4 point
Intelligent Assessor         marks
Intelligent Essay Assessor   4 point
IntelliMetric                4 point
SEAR                         % & marks
?
Future work [content]
• maximise use of active and passive voices
• cope with spelling (and grammar) errors
• increase coverage of Bloom’s Taxonomy
• include non-textual feature(s)
• develop
  – better feedback to the essayist
  – better feedback to the examiner
  – plagiarism detection mechanism(s)
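For the spelling item, one common approach (not necessarily what SEAR would adopt) is to accept a schema keyword when an essay word lies within a small Levenshtein edit distance of it. The function names and the one-edit threshold below are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_contains(keyword, tokens, max_edits=1):
    """True if any token is within max_edits of keyword (hypothetical rule)."""
    return any(levenshtein(keyword, t) <= max_edits for t in tokens)


tokens = "the cat sat on the matt".split()
print(fuzzy_contains("mat", tokens))  # True: "matt" is one insertion away
```

Short keywords make a fixed threshold risky: "cat" is also one edit from "mat", so a practical scheme would likely need length-dependent thresholds or a dictionary check.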
Is this the future [for style and content]?
English The cat sat on the mat.
Italian Il gatto era seduto sullo zerbino.
Greek I gata ekatse ston kanape.
Russian Koshka sidit na matrase.
French Le chat s’est assis sur le tapis.
German Die Katze sass auf dem Teppich.
Dutch De kat zat op de mat.
Spanish El gato se sentó en la alfombra.
etc.
Future work [style]
• obtain marked essays for style marking
  – plain ASCII essays
    • using a common set of metrics
  – word-processed essays
    • using a common set of metrics
    • augmented with word-processing-based metrics
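The talk does not name the common set of metrics. As an illustration only, the sketch below computes a few surface measures often used for plain-text style; the metric choice and the naive sentence splitter are assumptions. Its word and sentence counts correspond to the Words and Sentences columns of the Content Report.

```python
import re


def style_metrics(text):
    """A few hypothetical plain-text style metrics (not SEAR's own set)."""
    # Naive sentence split: a run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words, n_sents = len(words), len(sentences)
    types = {w.lower() for w in words}  # distinct word forms
    return {
        "words": n_words,
        "sentences": n_sents,
        "mean_sentence_length": n_words / n_sents if n_sents else 0.0,
        "type_token_ratio": len(types) / n_words if n_words else 0.0,
    }


print(style_metrics("The cat sat on the mat. The dog lay on the floor."))
```

Word-processed essays could extend the same dictionary with formatting-derived measures, keeping the plain-text metrics as the shared core.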
James R Christie
Does it work? Yes, but …
http://www.jkp.christie.btinternet.co.uk