SOFTCARDINALITY: Hierarchical Text Overlap for
Student Response Analysis
Sergio Jimenez and Claudia Becerra
a participating system in the Student Response Analysis task (Task 7), SemEval 2013
Alexander Gelbukh
Instituto Politécnico Nacional, México
Soft Cardinality: Hierarchical Text Overlap for SRA Sergio Jimenez
Soft Cardinality
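[Figure: example collections A and B, three elements each]
Classical (integer) cardinality: |A| = 3, |B| = 3
Soft (real) cardinality: |A|' ≈ 2.9, |B|' ≈ 1.3
Cardinality: the number of distinct elements in a collection, i.e. the set definition.
[Figure: collection C made of identical elements] |C| = 1, |C|' = 1.0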
Soft Cardinality
$$|A|' = \sum_{i=1}^{|A|} w_i \left( \sum_{j=1}^{|A|} \mathrm{sim}(a_i, a_j)^{p} \right)^{-1}$$
where:
sim(a_i, a_j): inter-element similarity
w_i: element weights
p: "softness" control
When
word-to-word similarity
idf term weighting
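The soft cardinality formula can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `soft_cardinality`, its argument names, the default uniform weights, and the default `p = 1.0` are all assumptions made for the example. It also assumes that `sim(a, a) = 1` for every element, so the inner sum is never zero.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def soft_cardinality(
    elements: Sequence[T],
    sim: Callable[[T, T], float],
    weights: Optional[Sequence[float]] = None,
    p: float = 1.0,
) -> float:
    """Sum over i of w_i / (sum over j of sim(a_i, a_j) ** p).

    Assumes sim(a, a) == 1, so every inner sum is at least 1.
    """
    if weights is None:
        # Hypothetical default: every element weighs 1.
        weights = [1.0] * len(elements)
    total = 0.0
    for a_i, w_i in zip(elements, weights):
        inner = sum(sim(a_i, a_j) ** p for a_j in elements)
        total += w_i / inner
    return total


def exact_match(a: str, b: str) -> float:
    """Crisp similarity: 1 for identical elements, 0 otherwise."""
    return 1.0 if a == b else 0.0


# With a crisp similarity the soft cardinality equals the classical one.
print(soft_cardinality(["x", "x"], exact_match))       # 1.0
print(soft_cardinality(["a", "a", "b"], exact_match))  # 2.0
```

With a graded similarity function, partially similar elements contribute less than 1 each, which is how a three-element collection can have a soft cardinality well below 3.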
Hierarchical Similarity Model
Word-to-word similarity: words
"Sentence"-to-"sentence" similarity: Questions (Q) vs. Answers (A); Reference Answer (RA) vs. Reference Answer (RA)
"Document" soft cardinality: Q vs. set(SA); A vs. set(SA) (features for ML)
Word-to-word Similarity
q-grams character overlap
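A character q-gram overlap between two words might look like the sketch below. The slide does not say which overlap coefficient or which value of q was used, so the Dice coefficient, the default `q = 2`, the absence of padding characters, and the function names `qgrams` and `qgram_similarity` are all assumptions chosen for illustration.

```python
def qgrams(word: str, q: int = 2) -> set:
    """Return the set of character q-grams of a word (no padding)."""
    if len(word) < q:
        # Words shorter than q are treated as a single gram (assumption).
        return {word}
    return {word[i:i + q] for i in range(len(word) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over the q-gram sets of two words (assumed measure)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    shared = len(grams_a & grams_b)
    return 2.0 * shared / (len(grams_a) + len(grams_b))


print(qgram_similarity("test", "test"))    # 1.0
print(qgram_similarity("night", "nacht"))  # 0.25 (only "ht" is shared)
```

A measure of this kind returns 1 for identical words and a graded value for words that share spelling, which is the property the soft cardinality needs from its similarity function.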
“Sentence”-to-“Sentence” Similarity
Word overlap
Questions (Q) vs. Answers (A)
Reference Answer (RA) vs. Reference Answer (RA)
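One way to obtain a word overlap between two sentences from soft cardinalities is inclusion-exclusion: the soft intersection is taken as |A|' + |B|' - |A ∪ B|', where A ∪ B is the concatenation of both word lists. The slide does not list the exact overlap features, so the sketch below is an assumption throughout: the helper names, the unweighted soft cardinality with p = 1, and the Dice-style ratio are illustrative choices, and a crisp word similarity stands in for a graded one.

```python
from typing import Callable, Sequence


def soft_card(words: Sequence[str], sim: Callable[[str, str], float]) -> float:
    """Unweighted soft cardinality with p = 1 (assumes sim(w, w) == 1)."""
    return sum(1.0 / sum(sim(a, b) for b in words) for a in words)


def soft_overlap(
    a: Sequence[str], b: Sequence[str], sim: Callable[[str, str], float]
) -> float:
    """Dice-style overlap built from soft cardinalities (assumed feature)."""
    card_a = soft_card(a, sim)
    card_b = soft_card(b, sim)
    card_union = soft_card(list(a) + list(b), sim)
    intersection = card_a + card_b - card_union  # inclusion-exclusion
    return 2.0 * intersection / (card_a + card_b)


def crisp(x: str, y: str) -> float:
    """Stand-in word similarity: exact match only."""
    return 1.0 if x == y else 0.0


question = ["the", "cat"]
answer = ["the", "dog"]
print(soft_overlap(question, answer, crisp))  # 0.5
```

Replacing `crisp` with a graded word-to-word similarity lets near-matching words contribute partially to the overlap instead of counting as entirely different.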
“Document” Soft Cardinality
Text overlap
a question (Q) vs. sets of reference answers (RA)
an answer (A) vs. sets of reference answers (RA)
Weights from sentence soft cardinality
Feature Set
Total number of features:
Submitted System
A single run obtained from supervised classification models (J48 Tree + bagging) trained separately on Beetle and SciEntsBank data sets.
Same feature set for the 5-way, 3-way and 2-way classification tasks.
Parameters were not necessary! No external resources used!
[Chart: overall accuracy. Soft Cardinality + Lexical Overlap Baseline]
Conclusions
• The text overlap method based on soft cardinality is a very challenging baseline for the SRA task.
• The soft cardinality method combined with the lexical overlap baseline produces an even stronger baseline.
Soft Cardinality at *SEM and SemEval
• STS-2012, official 3rd out of 89 systems
• STS-2013-CORE task, 18th out of 90 systems (4th unofficial)
• STS-2013-TYPED task, top system (UNITOR team)
• CLTE-2012, 3rd out of 29 systems (1st unofficial)
• CLTE-2013, among the top 2 systems
• SRA-2013, among the top 2 systems