Page 1: Different German & English Coreference Resolution …gscl2017.dfki.de/files/presentations/Session3/Ankit-S...Ankit Srivastava, Sabine Weber, Peter Bourgonje, and Georg Rehm DFKI GmbH

Ankit Srivastava, Sabine Weber, Peter Bourgonje, and Georg Rehm

DFKI GmbH – Forschungsbereich Sprachtechnologie, Berlin

GSCL 2017, 14th September, Berlin

Coreference Resolution

Different German & English Coreference Resolution Models for Multi-Domain Content Curation Scenarios

Page 2

About this Presentation
• Introduction to Coreference Resolution
• Coreference Resolution for English
• Coreference Resolution for German
• Our Approaches: Corefrule, Corefstat, Corefproj
• Coreference Resolution for Digital Curation
• Endpoint

GSCL 2017 - Coreference for Digital Curation 2

Page 3

COREFERENCE RESOLUTION

Page 4

Source: Coreference Resolution presentation by Shumin Wu and Nicolas Nicolov of J.D. Power and Associates

Page 5

What is Coreference Resolution?
• Process of identifying all words & phrases in a document that refer to the same entity
• Core of Natural Language Understanding (NLU) since the 1960s
• Documents usually contain the full named entity once or a few times. For full NLU, coreference resolution is essential
• Can be meaningfully applied in Question Answering, Named Entity Recognition, Machine Translation, Summarisation

Page 6

Coreference & Co.

Anaphora
The music was so loud it couldn’t be enjoyed.

Cataphora
Despite her difficulty, Wilma came to understand the point.

Split antecedents
Carol told Bob to attend the party. They arrived together.

Coreferring noun phrases
Some of our colleagues will help us. These people will earn our trust.

Page 7

Motivation
• Digital Curation Platform for Knowledge Workers
  – Named Entity Recognition, Entity & Relation Extraction, Translation, Summarisation, Timelining, Clustering
• Benefit from Coreference: Disambiguation
  – Explore tools available for both German & English
  – Build our own… evaluate

Page 8

SUMMARY OF APPROACHES

Page 9

Approaches to Coreference
• 3 Paradigms
  – Rule-based (Heuristics)
  – Machine Learning (Mention-Rank Model)
  – Knowledge-based (Crosslingual Projections)
• Coreference Resolution for English
  – [Raghunathan et al., 2010] [Clark & Manning, 2015 | 2016]
• Coreference Resolution for German
  – [Versley et al., 2008] [Krug et al., 2015] [Roesiger & Riester, 2015] [Tuggener, 2016]

Page 10

Evaluation of Coreference
• Benchmarking Shared Tasks on Standard datasets
  – Message Understanding Conference (MUC)
  – Automatic Content Extraction (ACE)
  – Computational Natural Language Learning (CoNLL)
  – Semantic Evaluation (SemEval)
  – Coreference Resolution beyond OntoNotes (CORBON)
• Evaluation Metrics
  – Message Understanding Conference F-measure (MUC6)
  – Bagga, Baldwin, Biermann (B3)
  – Constrained Entity-Alignment F-Measure (CEAF)

Page 11

Evaluation: MUC-6 F-Measure

[Figure: the mentions a1 to a4, b1, b2 and c1 to c3 shown twice, once linked as in the reference and once as in the system output]

Count the number of corresponding links between mentions

Precision = 4/5
Recall = 4/6
F-measure = 2 * Precision * Recall / (Precision + Recall) = 0.727
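The link counting above can be sketched in Python. Because the figure did not survive extraction, the system partition below is a hypothetical one, chosen only so that it reproduces the slide's numbers (6 reference links, 5 system links, 4 in common). The function names are illustrative, and the code follows the usual partition-based formulation of the MUC score rather than any particular scorer implementation.

```python
def muc_links(key, response):
    """Return (common_links, key_links) for one direction of the MUC score.

    A chain of n mentions needs n - 1 links. The links the response recovers
    are n minus the number of pieces the response cuts the chain into.
    """
    chain_of = {m: i for i, chain in enumerate(response) for m in chain}
    common = total = 0
    for chain in key:
        # A mention missing from the response forms its own piece.
        pieces = {chain_of.get(m, ("singleton", m)) for m in chain}
        common += len(chain) - len(pieces)
        total += len(chain) - 1
    return common, total


def muc_score(reference, system):
    rec_num, rec_den = muc_links(reference, system)
    prec_num, prec_den = muc_links(system, reference)
    precision = prec_num / prec_den
    recall = rec_num / rec_den
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
# Hypothetical system output (the original figure is not recoverable).
system = [{"a1", "a2"}, {"a3", "a4"}, {"b1", "b2", "c1"}, {"c2", "c3"}]

precision, recall, f1 = muc_score(reference, system)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")  # P=0.800 R=0.667 F=0.727
```

Singleton entities contribute zero links in both directions, which is why the metric ignores them.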

Page 12

Evaluation Metrics Summary
• MUC6 F-measure
  – Ignores single mention entities
  – Potentially biased toward large clusters
  – No one-to-one entity mapping guarantee
• B3
  – Set view of mentions in an entity
  – Based on number of corresponding mentions between entities averaged over total number of mentions
  – Does not provide one-to-one entity mapping
• CEAF
  – One-to-one entity mapping
  – Optimal mapping can be tuned to a different similarity measure
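To make the contrast between B3 and CEAF concrete, here is a minimal Python sketch of both. It assumes that reference and system cover exactly the same mentions, uses a hypothetical system partition, and implements only the mention-based CEAF variant (similarity = size of the overlap). The one-to-one mapping is found by brute force over permutations, which is only workable for toy inputs; real scorers use an assignment algorithm instead.

```python
from itertools import permutations


def b_cubed(reference, system):
    """B3: per-mention overlap of its two entities, averaged over all mentions."""
    ref_of = {m: chain for chain in reference for m in chain}
    sys_of = {m: chain for chain in system for m in chain}
    mentions = list(ref_of)  # assumption: both sides contain the same mentions
    precision = sum(
        len(ref_of[m] & sys_of[m]) / len(sys_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(ref_of[m] & sys_of[m]) / len(ref_of[m]) for m in mentions
    ) / len(mentions)
    return precision, recall


def ceaf_m(reference, system):
    """Mention-based CEAF: best one-to-one entity alignment by overlap size."""
    small, large = sorted((reference, system), key=len)
    best = 0
    for perm in permutations(range(len(large)), len(small)):
        overlap = sum(len(small[i] & large[j]) for i, j in enumerate(perm))
        best = max(best, overlap)
    precision = best / sum(len(chain) for chain in system)
    recall = best / sum(len(chain) for chain in reference)
    return precision, recall


reference = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
# Hypothetical system output, used for illustration only.
system = [{"a1", "a2"}, {"a3", "a4"}, {"b1", "b2", "c1"}, {"c2", "c3"}]

b3_precision, b3_recall = b_cubed(reference, system)
ceaf_precision, ceaf_recall = ceaf_m(reference, system)
print(f"B3   P={b3_precision:.3f} R={b3_recall:.3f}")
print(f"CEAF P={ceaf_precision:.3f} R={ceaf_recall:.3f}")
```

In B3 the reference entity a1 to a4 is credited against both system halves, whereas CEAF may align it with only one of them; that is the one-to-one guarantee named above.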

Page 13

THREE IMPLEMENTATIONS

Corefrule Corefstat Corefproj

Page 14

Rule-based (Multi-Sieve)
• English version based on Stanford CoreNLP
  – https://nlp.stanford.edu/software/dcoref.html
• German version: in-house implementation
  – https://github.com/dkt-projekt/e-NLP/ecorenlp/modules
• Idea of an annotation pipeline
  – Sentence splitting, tokenisation, parsing, morphology

Page 15

Multi-Sieve Coreference

Sieve 7: German Specific Processing
Sieve 6: Named Entity Recognition
Sieves 3-5: NP Head Matching
Sieve 2: Precise Constructs
Sieve 1: Exact Match
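One way to read the sieve stack is as a pipeline: sieves run in order starting from Sieve 1, and each later sieve can only add links to the clusters built so far. The Python sketch below illustrates that control flow with a union-find structure and two toy sieves. It is an assumption-laden simplification (pairwise decisions, last token as head), not the Stanford or the in-house implementation, and all names are hypothetical.

```python
def find(parent, i):
    """Follow parent pointers to the cluster representative."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


def run_sieves(mentions, sieves):
    """Apply sieves in order; each sieve decides whether two mentions link."""
    parent = list(range(len(mentions)))
    for sieve in sieves:
        for j in range(len(mentions)):
            for i in range(j):  # candidate antecedents precede mention j
                if sieve(mentions[i], mentions[j]):
                    parent[find(parent, j)] = find(parent, i)
    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(parent, idx), []).append(mention)
    return list(clusters.values())


# Toy stand-ins for Sieve 1 (exact match) and Sieves 3-5 (head match).
def exact_match(a, b):
    return a.lower() == b.lower()


def head_match(a, b):
    # Assumption: the head is approximated by the last token.
    return a.split()[-1].lower() == b.split()[-1].lower()


mentions = ["the old dog", "The old dog", "a cat", "the dog"]
clusters = run_sieves(mentions, [exact_match, head_match])
print(clusters)
```

Running the stricter sieve first means that looser sieves only ever extend decisions that were already made with higher precision.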

Page 16

Multi-Sieve Coreference

Sieve 1: Exact Match

• Parse a document
• Get Noun Phrases, Pronominal Phrases
• Cluster them via sieve heuristics
• Exact Match: If NPs match each other in a context window of 5 (with stemming), then link them
• “Der Hund” / “Des Hundes”
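The exact-match rule can be sketched as follows. The slide does not say what unit the window of 5 is measured in, so this sketch assumes sentences. The suffix stripper is a toy stand-in for a real German stemmer, and all names are hypothetical.

```python
ARTICLES = {"der", "die", "das", "des", "dem", "den",
            "ein", "eine", "eines", "einem", "einen", "einer"}
SUFFIXES = ("es", "en", "er", "e", "s", "n")


def crude_stem(token):
    """Strip one common German inflection suffix (toy stand-in for a stemmer)."""
    token = token.lower()
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalise(noun_phrase):
    """Drop articles and stem the remaining tokens."""
    tokens = [t for t in noun_phrase.split() if t.lower() not in ARTICLES]
    return tuple(crude_stem(t) for t in tokens)


def exact_match_links(mentions, window=5):
    """mentions: list of (sentence_index, noun_phrase) in document order.

    Returns index pairs (antecedent, mention) whose normalised forms are
    identical and which lie at most `window` sentences apart.
    """
    links = []
    for j, (sent_j, np_j) in enumerate(mentions):
        key_j = normalise(np_j)
        if not key_j:
            continue  # nothing left to compare after removing articles
        for i in range(j):
            sent_i, np_i = mentions[i]
            if sent_j - sent_i <= window and normalise(np_i) == key_j:
                links.append((i, j))
    return links


mentions = [(0, "Der Hund"), (1, "die Katze"), (2, "Des Hundes"), (9, "der Hund")]
links = exact_match_links(mentions)
print(links)  # [(0, 2)]
```

The last mention is left unlinked because it falls outside the window, even though its normalised form matches.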

Page 17

Multi-Sieve Coreference

Sieve 2: Precise Constructs

• Mentions are linked if appositive or predicate nominative constructs are detected
• Appositive: “Donald Trump, President of USA”

Page 18

Multi-Sieve Coreference

Sieves 3-5: NP Head Matching

• If the head word of two NPs is the same, they are coreferent
• Some relaxations and rules

Page 19

Examples of Match Heuristics
• Compatible mentions:
  – Exact string match of capitalized mentions: “Trump” & “Trump”
  – Exact string match of mentions within a sentence: “car” & “car”
  – Acronyms: “USA” & “United States of America”
  – First person / Second person / Third person pronoun: “I” & “me”, “you” & “yours”, “he” & “him”, “she” & “her”
• Incompatible mentions:
  – Different acronyms: “USA” & “UK”
  – Person, gender, number disagreement: “I” & “you”, “he” & “she”, “car” & “cars”
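The compatibility checks listed above can be sketched as a single predicate. The pronoun table and the acronym rule are hypothetical simplifications that cover only the slide's examples; a real system would draw these features from morphological analysis.

```python
PRONOUNS = {
    # form: (person, gender, number); None means "unspecified"
    "i": (1, None, "sg"), "me": (1, None, "sg"),
    "you": (2, None, None), "yours": (2, None, None),
    "he": (3, "m", "sg"), "him": (3, "m", "sg"),
    "she": (3, "f", "sg"), "her": (3, "f", "sg"),
}


def acronym_of(phrase):
    """Initials of the capitalised words of a multi-word name."""
    return "".join(w[0] for w in phrase.split() if w[0].isupper())


def is_acronym(text):
    return len(text) > 1 and text.isalpha() and text.isupper()


def compatible(a, b):
    """Return True if the two mention strings may refer to the same entity."""
    la, lb = a.lower(), b.lower()
    if la in PRONOUNS and lb in PRONOUNS:
        # Every feature that is specified on both sides must agree.
        return all(x is None or y is None or x == y
                   for x, y in zip(PRONOUNS[la], PRONOUNS[lb]))
    if is_acronym(a) and is_acronym(b):
        return a == b
    if is_acronym(a):
        return acronym_of(b) == a
    if is_acronym(b):
        return acronym_of(a) == b
    return a == b  # fall back to exact string match


for pair in [("USA", "United States of America"), ("USA", "UK"),
             ("he", "him"), ("he", "she"), ("car", "cars")]:
    print(pair, compatible(*pair))
```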

Page 20

Evaluation: Sieve Settings
1. All Sieves in place
2. All mentions but no coreference links
3. {1} after deletion of clusters with no mentions
4. {1} with insertion of clusters with no mentions added last

Page 21

Statistical (Mention-Ranking)
• Based on Stanford CoreNLP (English & German)
  – English trained and evaluated on CoNLL ’11, ’12
  – German trained on TüBa-D/Z, evaluated on SemEval ’10
• 4 Types of Features learned using Dagger [Ross 2011]
  – Distance
  – Syntactic
  – Semantic
  – Lexical
• Issues in out-of-domain adaptation
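The core idea of mention ranking is that, for each mention, every candidate antecedent gets a score and the best one is chosen, with "start a new entity" as a competing option. The Python sketch below shows this with one toy feature per feature type from the slide. The features, the weights and the threshold are hypothetical, and the training procedure (Dagger) is not shown.

```python
def features(antecedent, mention):
    """Toy versions of the four feature types named on the slide."""
    return {
        "distance": -(mention["sentence"] - antecedent["sentence"]),
        "syntactic": 1.0 if antecedent["is_subject"] else 0.0,
        "semantic": 1.0 if antecedent["gender"] == mention["gender"] else -1.0,
        "lexical": 1.0 if antecedent["head"] == mention["head"] else 0.0,
    }


# Hypothetical weights; in a trained model these would be learned.
WEIGHTS = {"distance": 0.3, "syntactic": 0.5, "semantic": 1.0, "lexical": 2.0}


def score(antecedent, mention):
    feats = features(antecedent, mention)
    return sum(WEIGHTS[name] * value for name, value in feats.items())


def best_antecedent(candidates, mention, new_entity_score=0.0):
    """Return the top-ranked antecedent, or None to start a new entity."""
    best, best_score = None, new_entity_score
    for candidate in candidates:
        s = score(candidate, mention)
        if s > best_score:
            best, best_score = candidate, s
    return best


carol = {"head": "carol", "sentence": 0, "is_subject": True, "gender": "f"}
bob = {"head": "bob", "sentence": 0, "is_subject": False, "gender": "m"}
she = {"head": "she", "sentence": 1, "is_subject": True, "gender": "f"}
it = {"head": "it", "sentence": 1, "is_subject": True, "gender": "n"}

chosen_for_she = best_antecedent([carol, bob], she)
chosen_for_it = best_antecedent([carol, bob], it)
print(chosen_for_she["head"], chosen_for_it)
```

Because the weights are fitted to the training corpus, a model like this can degrade on text from other domains, which is the adaptation issue noted on the slide.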

Page 22

Features

Page 23

Projection (Crosslingual)
• Coreference for German based on English models
• Transferring Models vs. Transferring Data
• CORBON 2017 English-German Data
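For the "transferring data" option, a common approach is annotation projection: coreference chains found on the English side of a parallel text are mapped onto the German side through word alignments. The Python sketch below illustrates that step. The sentence pair and the alignment are hand-made for illustration, and the slides do not state which alignment tool or projection rules were actually used.

```python
def project_span(span, alignment):
    """Map a source token span (start, end inclusive) to the target side.

    alignment: iterable of (source_index, target_index) word-alignment pairs.
    Returns the smallest target span covering all aligned tokens, or None.
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return min(targets), max(targets)


def project_chains(chains, alignment):
    """Project every mention of every chain; drop mentions that do not align."""
    projected = []
    for chain in chains:
        spans = [project_span(span, alignment) for span in chain]
        spans = [s for s in spans if s is not None]
        if len(spans) > 1:  # a chain needs at least two mentions
            projected.append(spans)
    return projected


# English: Yesterday(0) the(1) dog(2) barked(3) .(4) It(5) was(6) hungry(7) .(8)
# German:  Gestern(0) bellte(1) der(2) Hund(3) .(4) Er(5) hatte(6) Hunger(7) .(8)
alignment = [(0, 0), (1, 2), (2, 3), (3, 1), (4, 4),
             (5, 5), (6, 6), (7, 7), (8, 8)]
english_chains = [[(1, 2), (5, 5)]]  # "the dog" ... "It"

german_chains = project_chains(english_chains, alignment)
print(german_chains)  # [[(2, 3), (5, 5)]], i.e. "der Hund" ... "Er"
```

The projected chains could then serve as training data for a German model, whereas "transferring models" would apply an English-trained model to German input more directly.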

Page 24

Comparative Evaluation

English: CoNLL 2012
German: SemEval 2010

Page 25

MULTI DOMAIN CONTENT CURATION SCENARIOS

Page 26

Curation Case Studies

Page 27

Coreference for Curation
• Applied English & German coreference models on different datasets
• Corefrule outperforms Corefstat, Corefproj in terms of number of mentions

Page 28

About this Presentation
• Introduction to Coreference Resolution
• Coreference Resolution for English
• Coreference Resolution for German
• Our Approaches: Corefrule, Corefstat, Corefproj
• Coreference Resolution for Digital Curation
• Endpoint

Page 29

Conclusions
• Performed Coreference Resolution
  – In both English & German
  – On a variety of text types
  – For competing approaches (sieve, mention-rank, projection)
• Successful in coreference resolution for curation datasets such as an archive of letters, research materials for exhibition, news articles & downstream applications
• Currently, best choice is Multi-Sieve (Rule-based) approach for out-of-domain processing

Page 30

Thank You!

Email:[email protected]

Page 31

EXTRA SLIDES