Metadata generation and glossary creation in eLearning Lothar Lemnitzer Review meeting, Zürich, 25 January 2008


Page 1:

Metadata generation and glossary creation in eLearning

Lothar Lemnitzer
Review meeting, Zürich, 25 January 2008

Page 2:

Outline

• Demonstration of the functionalities
• Where we stand
• Evaluation of the tools
• Consequences for the development of the tools in the final phase

Page 3:

Demo

We simulate a tutor who adds a learning object and then generates and edits additional data.

Page 4:

Where we stand (1)

Achievements reached in the first year of the project:

• Annotated corpora of learning objects
• Stand-alone prototype of the keyword extractor (KWE)
• Stand-alone prototype of the glossary candidate detector (GCD)

Page 5:

Where we stand (2)

Achievements reached in the second year of the project:

• Quantitative evaluation of the corpora and tools

• Validation of the tools in user-centered usage scenarios for all languages

• Further development of tools in response to the results of the evaluation

Page 6:

Evaluation - rationale

Quantitative evaluation is needed to
• inform the further development of the tools (formative)
• find the optimal settings / parameters for each language (summative)

Page 7:

Evaluation (1)

Evaluation is applied to:
• the corpora of learning objects
• the keyword extractor
• the glossary candidate detector

In the following, I will focus on the tool evaluation.

Page 8:

Evaluation (2)

Evaluation of the tools comprises:
1. measuring recall and precision against the manual annotation
2. measuring agreement on each task between different annotators
3. measuring the acceptance of keywords / definitions (rated on a scale)

Page 9:

KWE Evaluation – step 1

• One human annotator marked n keywords in document d
• The first n choices of the KWE for document d are extracted
• Measure the overlap between both sets
• Also measure partial matches
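The set-overlap scoring described above can be sketched as follows. `prf` is a hypothetical helper name, and only exact string matches are counted; the partial-match variant mentioned on the slide is omitted:

```python
def prf(gold, predicted):
    """Precision, recall and F-measure of a predicted keyword set
    against a gold (manually annotated) keyword set.
    Exact match only; partial matches are not scored here."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                       # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)             # harmonic mean
    return precision, recall, f
```

With the same number n of gold and predicted keywords, as in the setup above, precision and recall coincide and equal the F-measure.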

Page 10:

Language     Best method     F-measure
Bulgarian    TFIDF/ADRIDF    0.25
Czech        TFIDF/ADRIDF    0.18
Dutch        TFIDF           0.29
English      ADRIDF          0.33
German       TFIDF           0.16
Polish       ADRIDF          0.26
Portuguese   TFIDF           0.22
Romanian     TFIDF/ADRIDF    0.15
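Plain TFIDF, one of the two methods compared in the table above, can be sketched in a few lines. The project's ADRIDF variant is not shown, and `tfidf_keywords` is an illustrative name, not the project's actual implementation:

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, n=10):
    """Rank the terms of one document by TF*IDF and return the top n.
    `doc_tokens` is the token list of the target document; `corpus`
    is a list of token lists, one per document (target included)."""
    df = Counter()                       # document frequency per term
    for tokens in corpus:
        df.update(set(tokens))
    tf = Counter(doc_tokens)             # term frequency in the document
    total = len(corpus)
    scores = {t: (c / len(doc_tokens)) * math.log(total / df[t])
              for t, c in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Note that a term occurring in every document gets IDF 0 and thus never ranks; a real extractor would also lemmatize and filter stopwords first.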

Page 11:

KWE Evaluation – step 2

• Measure Inter-Annotator Agreement (IAA)
• Participants read a text (Calimera "Multimedia")
• Participants assign keywords to that text (ideally not more than 15)
• The KWE produces keywords for the text

Page 12:

KWE Evaluation – step 2

1. Agreement is measured between the human annotators
2. Agreement is measured between the KWE and the human annotators

We have tested two measures / approaches:
– kappa according to Bruce / Wiebe
– AC1, an alternative agreement weighting suggested by Debra Haley at the OU, based on Gwet
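For the simplest case of two raters making binary keyword judgements, the two measures can be sketched as below. The slide's kappa follows Bruce / Wiebe and covers multiple annotators, so this two-rater version is only an approximation, and `kappa_ac1` is an illustrative name:

```python
def kappa_ac1(a, b):
    """Cohen-style kappa and Gwet's AC1 for two raters giving binary
    judgements (1 = keyword, 0 = not) over the same candidate list.
    AC1 uses a different chance model that stays stable when one
    class is much more frequent than the other (skewed prevalence)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                 # per-rater prevalence
    pe_kappa = pa * pb + (1 - pa) * (1 - pb)        # chance agreement (kappa)
    pi = (pa + pb) / 2
    pe_ac1 = 2 * pi * (1 - pi)                      # chance agreement (AC1)
    kappa = (po - pe_kappa) / (1 - pe_kappa)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1
```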

Page 13:

Language     IAA human annotators    IAA of KWE with best settings
Bulgarian    0.63                    0.99
Czech        0.71                    0.78
Dutch        0.67                    0.72
English      0.62                    0.82
German       0.64                    0.63
Polish       0.63                    0.67
Portuguese   0.58                    0.67
Romanian     0.59                    0.61

Page 14:

KWE Evaluation – step 3

• Humans judge the adequacy of the keywords
• Participants read a text (Calimera "Multimedia")
• Participants see 20 keywords generated by the KWE and rate them
• Scale: 1 – 4 (excellent – not acceptable); 5 = not sure

Page 15:

Average ratings (1 = excellent … 4 = not acceptable):

Language     20 kw   First 5 kw   First 10 kw
Bulgarian    2.21    2.54         2.12
Czech        2.22    1.96         1.96
Dutch        1.93    1.68         1.64
English      2.15    2.52         2.22
German       2.06    1.96         1.96
Polish       1.95    2.06         2.10
Portuguese   2.34    2.08         1.94
Romanian     2.14    1.80         2.06

Page 16:

GCD Evaluation - step 1

• A human annotator marked the definitions in document d
• The GCD extracts defining contexts from the same document d
• Measure the overlap between both sets
• Overlap is measured on the sentence level; partial overlap counts
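The sentence-level scoring with partial overlap can be sketched as follows; spans are represented as sets of sentence indices, and `gcd_scores` is an illustrative name, not the project's actual evaluation script:

```python
def gcd_scores(gold_spans, extracted_spans):
    """Recall and precision of extracted defining contexts against
    manually marked definitions.  Each span is a set of sentence
    indices; sharing any sentence counts as a match, so partial
    overlap counts, as on the slide."""
    def matched(span, others):
        return any(span & other for other in others)

    recall = sum(matched(g, extracted_spans)
                 for g in gold_spans) / len(gold_spans)
    precision = sum(matched(e, gold_spans)
                    for e in extracted_spans) / len(extracted_spans)
    return recall, precision
```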

Page 17:

Is-definitions

Language     Recall   Precision
Bulgarian    0.64     0.18
Czech        0.48     0.29
Dutch        0.92     0.21
English      0.58     0.17
German       0.55     0.37
Polish       0.74     0.22
Portuguese   0.69     0.30
Romanian     1.0      0.53

Page 18:

GCD Evaluation – step 2

• Measure Inter-Annotator Agreement
• Experiments were run for Polish and Dutch
• A prevalence-adjusted version of kappa was used as the measure
• Polish: 0.42; Dutch: 0.44
• IAA is rather low for this task

Page 19:

GCD Evaluation – step 3

• Judging the quality of the extracted definitions
• Participants read a text
• Participants get the definitions extracted by the GCD for that text and rate their quality
• Scale: 1 – 4 (excellent – not acceptable); 5 = not sure

Page 20:

Language     # definitions   # testers   Avg. rating
Bulgarian    25              7           2.7
Czech        24              6           3.1
Dutch        14              6           2.8
English      10              4           3.3
German       5               5           2.1
Polish       11              5           2.7
Portuguese   36              6           2.2
Romanian     9               7           3.0

Page 21:

GCD Evaluation – step 3

Further findings:
• relatively high variance (many '1' and '4' ratings)
• disagreement between users about the quality of individual definitions

Page 22:

Individual user feedback – KWE

• The quality of the generated keywords remains an issue
• There is variance in the responses from the different language groups
• We suspect a correlation between the language of the users and their satisfaction
• The performance of the KWE relies on language settings; we have to investigate them further

Page 23:

Individual user feedback – GCD

• Not all the suggested definitions are real definitions.

• Terms are ok, but definitions cited are often not what would be expected.

• Some terms proposed in the glossary did not make any sense.

• The ability to see the context where a definition has been found is useful.

Page 24:

Consequences - KWE

• Use non-distributional information to rank keywords (layout, chains)

• Present first 10 keywords to user, more keywords on demand

• For keyphrases, present most frequent attested form

• Users can add their own keywords

Page 25:

Consequences - GCD

• Split definitions into types and tackle the most important types

• Use machine learning alongside local grammars

• Look into the part of the grammars which extracts the defined term

• Users can add their own definitions

Page 26:

Plans for final phase

• KWE: work with lexical chains
• GCD: extend the ML experiments
• Finalize the documentation of the tools

Page 27:

Validation

User scenarios with NLP tools embedded:

1. Content provider adds keywords and a glossary for a new learning object

2. Student uses keywords and definitions extracted from a learning object to prepare a presentation of the content of that learning object

Page 28:

Validation

3. Students use keywords and definitions extracted from a learning object to prepare a quiz / exam about the content of that learning object

Page 29:

Validation

We want to get feedback about
• the users' general attitude towards the tools
• the users' satisfaction with the results obtained by the tools in the particular situation of use (scenario)

Page 30:

User feedback

• Participants appreciate the option to add their own data

• Participants found it easy to use the functions

Page 31:

Plans for the next phase

Improve the precision of the extraction results:
• KWE – implement the lexical chainer
• GCD – use machine learning in combination with local grammars, or as a substitute for these grammars
• Finalize the documentation of the tools

Page 32:

Corpus statistics – full corpus

• Measuring the lengths of the corpora (# of documents, tokens)
• Measuring the token / type ratio
• Measuring the type / lemma ratio
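Both ratios can be sketched as below; `corpus_ratios` is an illustrative name, and the toy `lemma_of` dict stands in for a real per-language lemmatizer:

```python
def corpus_ratios(tokens, lemma_of):
    """Token/type and type/lemma ratios for one corpus.
    `tokens` is the flat token list of the corpus; `lemma_of` maps
    each distinct token (type) to its lemma.  A high token/type
    ratio means types recur often; a high type/lemma ratio means
    each lemma surfaces in many inflected forms."""
    types = set(tokens)
    lemmas = {lemma_of[t] for t in types}
    return len(tokens) / len(types), len(types) / len(lemmas)
```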

Page 33:

Language     # of documents   # of tokens
Bulgarian    55               218900
Czech        1343             962103
Dutch        77               505779
English      125              1449658
German       36               265837
Polish       35               299071
Portuguese   29               244702
Romanian     69               484689

Page 34:

Language     Token / type   Types / lemma
Bulgarian    9.65           2.78
Czech        18.37          1.86
Dutch        14.18          1.15
English      34.93          2.8 (tbc)
German       8.76           1.38
Polish       7.46           1.78
Portuguese   12.27          1.42
Romanian     12.43          1.54

Page 35:

Corpus statistics – full corpus

• The Bulgarian, German and Polish corpora have a very low number of tokens per type (probably causing sparseness problems)
• English has by far the highest ratio
• Czech, Dutch, Portuguese and Romanian are in between
• The type / lemma ratio reflects the richness of the inflectional paradigms

Page 36:

To do

• Please check / verify these numbers
• For the M24 deliverable, report on improvements / re-analyses of the corpora (I am aware of such activities for Bulgarian, German, and English)

Page 37:

Corpus statistics – annotated subcorpus

• Measuring the lengths of the annotated documents
• Measuring the distribution of manually marked keywords over the documents
• Measuring the share of keyphrases

Page 38:

Language     # of annotated documents   Average length (# of tokens)
Bulgarian    55                         3980
Czech        465                        672
Dutch        72                         6912
English      36                         9707
German       34                         8201
Polish       25                         4432
Portuguese   29                         8438
Romanian     41                         3375

Page 39:

Language     # of keywords   Average # of keywords per doc.
Bulgarian    3236            77
Czech        1640            3.5
Dutch        1706            24
English      1174            26
German       1344            39.5
Polish       1033            41
Portuguese   997             34
Romanian     2555            62

Page 40:

Keyphrases

Language     Share of keyphrases
Bulgarian    43 %
Czech        27 %
Dutch        25 %
English      62 %
German       10 %
Polish       67 %
Portuguese   14 %
Romanian     30 %