HTST Evaluation Notes

Upload: godwin-gaines

Post on 17-Jan-2016


Outline of Stable Version of HTST

• Stable version of HTST contains:
  – HT sense knowledge base
  – Auxiliary sub-lexicons and data extracted from HT, e.g. highly polysemous words, polyseme density, etc.
  – Context feature (USAS tags) model data extracted from OED word sense definitions
  – Main software modules:
    • CLAWS
    • USAS
    • VARD
    • HT-OED based components developed in SAMUELS


Evaluation

• HTST is evaluated on six manually annotated test texts.
• Test data:
  – Five test texts manually annotated by Fraser
  – One EEBO test text manually annotated by Jane
  – Full test data set will contain ten texts
• Evaluation criteria:
  – General performance in terms of precision
  – Impact of OED contextual information
  – Impact of time filtering
• Further evaluation on full test data is under way.
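
The slides name precision as the main criterion but do not define it. A minimal sketch, assuming precision is the share of tagged tokens whose predicted code equals the manually annotated code; the function name and the example codes are hypothetical, not part of HTST.

```python
def precision(predicted, gold):
    """Share of tagged tokens whose predicted code equals the manual (gold) code.

    Tokens the tagger left untagged (None) are skipped, so they do not count
    against precision. This definition is an assumption, not taken from HTST.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    pairs = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    if not pairs:
        return 0.0
    correct = sum(1 for p, g in pairs if p == g)
    return correct / len(pairs)


# Hypothetical codes, for illustration only.
gold = ["AA", "AB", "AC", "AD"]
predicted = ["AA", "AB", "ZZ", None]
score = precision(predicted, gold)
print(f"{score:.2%}")
```

Whether untagged tokens should be excluded or counted as errors is a design choice that changes the reported figure, so it is worth stating alongside any precision number.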


HTST Overall Performance

Test file                  Main HT codes   Thematic codes
Hans1820                   83.15%          86.17%
Hans2001                   78.78%          80.52%
Fiction1                   79.51%          79.83%
Fiction2                   80.28%          80.48%
History                    84.37%          84.43%
1621-Newes-out-of-France   85.69%          86.67%

Note: VARD is used for the EEBO sample “1621-Newes-out-of-France”, but not for the other test data.


Experiment with OED Information

              Main HT codes         Thematic codes
Test file     With OED   No OED     With OED   No OED
Hans1820      83.15%     81.01%     86.17%     83.93%
Hans2001      78.78%     76.36%     80.52%     78.00%
Fiction1      79.51%     76.82%     79.83%     77.15%
Fiction2      80.28%     79.18%     80.48%     79.38%
History       84.37%     81.43%     84.43%     81.49%
1621-Newes    85.69%     87.06%     86.67%     87.75%

Note: OED information improved precision on all five modern English texts, but decreased it on the EEBO sample. A possible cause is that OED definitions are all written in modern English.
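
The effect of OED information can be read off the table as a per-file difference. A small sketch of that arithmetic for the main HT codes, using the figures from the table above; the variable names are illustrative.

```python
# Main HT code precision (%) from the table above: (with OED, without OED).
main_ht = {
    "Hans1820": (83.15, 81.01),
    "Hans2001": (78.78, 76.36),
    "Fiction1": (79.51, 76.82),
    "Fiction2": (80.28, 79.18),
    "History": (84.37, 81.43),
    "1621-Newes": (85.69, 87.06),
}

# Positive values mean OED information helped; negative values mean it hurt.
deltas = {
    name: round(with_oed - no_oed, 2)
    for name, (with_oed, no_oed) in main_ht.items()
}

for name, delta in deltas.items():
    print(f"{name:<12} {delta:+.2f}")
```

The modern English texts gain between 1.10 and 2.94 percentage points, while the EEBO sample loses 1.37 points.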


Time Filtering on EEBO Sample Published in 1621 (Main HT Codes)

Year range   1650     1700     1750     1800     1850     1900     1950     2000
500          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
550          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
600          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
650          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
700          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
750          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
800          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
850          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
900          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
950          84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1000         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1050         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1100         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1150         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1200         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1250         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1300         84.70%   85.19%   85.00%   84.70%   84.41%   84.21%   84.11%   84.01%
1350         84.60%   85.09%   84.90%   84.60%   84.31%   84.11%   84.01%   83.92%
1400         84.60%   85.09%   84.80%   84.50%   84.21%   84.01%   83.92%   83.82%
1450         84.70%   85.19%   84.90%   84.60%   84.31%   84.01%   83.92%   83.82%
1500         84.70%   85.19%   85.00%   84.70%   84.41%   84.11%   84.01%   83.92%
1550         84.80%   85.29%   85.29%   85.00%   84.70%   84.41%   84.31%   84.21%
1600         85.19%   85.69%   85.69%   85.39%   85.09%   84.80%   84.70%   84.60%
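
The slides report the results of time filtering but do not describe the mechanism. One plausible reading, sketched here purely as an assumption, is that each row label is the start and each column label the end of a year range, and that a candidate sense is kept only if its attested date span overlaps that range. The `Sense` class, its fields, and the example senses are hypothetical and are not taken from HTST.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sense:
    code: str                 # hypothetical sense identifier
    first_year: int           # first attested year (assumed field)
    last_year: Optional[int]  # None means still in use (assumed convention)


def filter_by_time(senses, start_year, end_year):
    """Keep senses whose attested period overlaps [start_year, end_year]."""
    kept = []
    for sense in senses:
        last = sense.last_year if sense.last_year is not None else float("inf")
        if sense.first_year <= end_year and last >= start_year:
            kept.append(sense)
    return kept


candidates = [
    Sense("sense-a", 1300, 1580),  # obsolete before the range starts
    Sense("sense-b", 1590, None),  # in use when the 1621 text was printed
    Sense("sense-c", 1850, None),  # first attested long after 1621
]
kept = filter_by_time(candidates, 1600, 1700)
print([sense.code for sense in kept])
```

Under this reading, a narrow range around the publication date removes anachronistic senses, which is consistent with the highest figures in the table sitting in the 1600 row.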


Time Filtering on EEBO Sample Published in 1621 (Thematic Codes)

Year range   1650     1700     1750     1800     1850     1900     1950     2000
500          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
550          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
600          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
650          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
700          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
750          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
800          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
850          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
900          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
950          85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1000         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1050         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1100         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1150         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1200         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1250         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1300         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1350         85.88%   86.37%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1400         85.98%   86.47%   86.07%   85.78%   85.58%   85.39%   85.29%   85.19%
1450         86.07%   86.56%   86.17%   85.88%   85.68%   85.39%   85.29%   85.19%
1500         86.07%   86.56%   86.27%   85.98%   85.78%   85.49%   85.39%   85.29%
1550         86.17%   86.67%   86.47%   86.17%   85.98%   85.68%   85.58%   85.49%
1600         86.07%   86.56%   86.56%   86.27%   86.07%   85.58%   85.49%   85.39%
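
The best setting in the thematic code table can be located with a simple search over the grid. A short sketch using the figures above, assuming that row labels are range start years and column labels are range end years; the variable names are illustrative.

```python
end_years = [1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000]

# Thematic code precision (%) by start year, copied from the table above.
# Start years 500 to 1350 all share one row, so it is stored once.
shared = [85.88, 86.37, 86.07, 85.78, 85.58, 85.39, 85.29, 85.19]
grid = {start: shared for start in range(500, 1351, 50)}
grid.update({
    1400: [85.98, 86.47, 86.07, 85.78, 85.58, 85.39, 85.29, 85.19],
    1450: [86.07, 86.56, 86.17, 85.88, 85.68, 85.39, 85.29, 85.19],
    1500: [86.07, 86.56, 86.27, 85.98, 85.78, 85.49, 85.39, 85.29],
    1550: [86.17, 86.67, 86.47, 86.17, 85.98, 85.68, 85.58, 85.49],
    1600: [86.07, 86.56, 86.56, 86.27, 86.07, 85.58, 85.49, 85.39],
})

# Find the highest precision and every (start, end) cell that reaches it.
best_value = max(max(row) for row in grid.values())
best_cells = [
    (start, end)
    for start, row in grid.items()
    for end, value in zip(end_years, row)
    if value == best_value
]
print(best_value, best_cells)
```

The maximum, 86.67% at 1550 to 1700, equals the thematic code figure reported for this sample in the overall performance table.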


Observation

• On average, about 82% precision is expected.
• With proper parameter settings, thematic code tagging can reach nearly 88% on some types of text.
• Further improvement is needed, by tuning the implemented methods and introducing more reliable ones.
• OED data contains noise caused by inconsistent HT versions. If OED entries can be precisely mapped to the latest HT codes in future, the tagger should improve.
• A larger, reliable test data set is needed for further development.
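
The "about 82%" figure can be checked against the overall performance table. A short sketch, assuming an unweighted mean over the six test texts (the slides do not say whether the average is weighted by text length):

```python
# Precision (%) per test text, from the overall performance table.
main_ht = [83.15, 78.78, 79.51, 80.28, 84.37, 85.69]
thematic = [86.17, 80.52, 79.83, 80.48, 84.43, 86.67]

# Unweighted means: every text counts equally, regardless of its length.
main_avg = sum(main_ht) / len(main_ht)
thematic_avg = sum(thematic) / len(thematic)

print(f"Main HT codes:  {main_avg:.2f}%")
print(f"Thematic codes: {thematic_avg:.2f}%")
```

This gives 81.96% for main HT codes and 83.02% for thematic codes, in line with the stated expectation.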