NLP: An Information Extraction Perspective, Ralph Grishman, September 2005, NYU
TRANSCRIPT
-
NLP: An Information Extraction Perspective
Ralph Grishman, September 2005
-
Information Extraction (for this talk)
Information Extraction (IE) = identifying the instances of the important relations and events for a domain from unstructured text.
-
Extraction Example
Topic: executive succession
"George Garrick, 40 years old, president of the London-based European Information Services Inc., was appointed chief executive officer of Nielsen Marketing Research, USA."

Extracted records:

Position   Company                              Location  Person          Status
President  European Information Services, Inc.  London    George Garrick  Out
CEO        Nielsen Marketing Research           USA       George Garrick  In
-
Why an IE Perspective?
- IE can use a wide range of technologies:
  - some successes with simple methods (names, some relations)
  - high-performance IE will need to draw on a wide range of NLP methods
  - ultimately, everything needed for deep understanding
- Potential impact of high-performance IE
- A central perspective of our NLP laboratory
-
Progress and Frustration
Over the past decade:
- The introduction of machine learning methods has allowed a shift from hand-crafted rules to corpus-trained systems
  - shifted the burden to annotation of lots of data for a new task
- But it has not produced large gains in bottom-line performance
  - glass ceiling on event extraction performance
- Can the latest advances give us a push in performance and portability?
-
Pattern Matching
Roughly speaking, IE systems are pattern-matching systems:
- we write a pattern corresponding to a type of event we are looking for:
    x shot y
- we match it against the text:
    Booth shot Lincoln at Ford's Theatre
- and we fill a database entry:
    shooting event: assailant = Booth, target = Lincoln
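As a minimal sketch of this match-and-fill step (Python; the pattern, helper name, and record layout are illustrative, and real IE patterns match over structural analyses rather than raw strings, as the following slides argue):

```python
import re

# Toy token-level pattern for "<assailant> shot <target>".
SHOOTING_PATTERN = re.compile(r"(?P<assailant>[A-Z]\w+) shot (?P<target>[A-Z]\w+)")

def extract_shooting_events(text):
    """Fill one database record per match of the shooting pattern."""
    return [
        {"event": "shooting",
         "assailant": m.group("assailant"),
         "target": m.group("target")}
        for m in SHOOTING_PATTERN.finditer(text)
    ]

records = extract_shooting_events("Booth shot Lincoln at Ford's Theatre.")
# records[0]["assailant"] is "Booth"; records[0]["target"] is "Lincoln"
```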
-
Three Degrees of IE-Building Tasks
1. We know what linguistic patterns we are looking for.
2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
3. We know the topic, but not the relations involved.
(The tasks range from high performance to high portability; the boundaries are fuzzy.)
-
Three Degrees of IE-Building Tasks
1. We know what linguistic patterns we are looking for.
2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
3. We know the topic, but not the relations involved.
-
Identifying Linguistic Expressions
To be at all useful, the patterns for IE must be stated structurally:
- patterns at the token level are not general enough
So our main obstacle (as for many NLP tasks) is accurate structural analysis:
- name recognition and classification
- syntactic structure
- co-reference structure
If the analysis is wrong, the pattern won't match.
-
Decomposing Structural Analysis
Decomposing structural analysis into subtasks like named entities, syntactic structure, and coreference has clear benefits:
- problems can be addressed separately
- can build separate corpus-trained models
- can achieve fairly good levels of performance (near 90%) separately
  - well, maybe not for coreference
But it also has problems ...
-
Sequential IE Framework
[Diagram: pipeline Raw Doc → Name/Nominal Mention Tagger → Reference Resolver → Relation Tagger → Analyzed Doc; precision falls stage by stage (100% → 90% → 80% → 70%). Errors are compounded from stage to stage.]
-
A More Global View
- The typical pipeline approach performs local optimization of each stage
- We can take advantage of interactions between stages by taking a more global view of the best analysis
- For example, prefer named entity analyses which allow for more coreference or more semantic relations
-
Names which can be coreferenced are much more likely to be correct.
(Counting only names that are difficult for the name tagger: small margin over the 2nd hypothesis, not on a list of common names.)
-
Names which can participate in semantic relations are much more likely to be correct
[Chart: probability of a name being correct vs. margin threshold (the difference between the log probabilities of the first and second name hypotheses). Names participating in a relation are far more accurate at every threshold: e.g., 66.4% vs. 11% at margin 0.2, rising to 90.7% vs. 55.3% at margin 4.]
-
Sources of Interaction
- Coreference and semantic relations impose type constraints (or preferences) on their arguments
- A natural discourse is more likely to be cohesive: to have mentions (noun phrases) which are linked by coreference and semantic relations
-
N-best
One way to capture such global information is to use an N-best pipeline and rerank after each stage, using the additional information provided by that stage (Ji and Grishman, ACL 2005).
Reduced name tagging errors for Chinese by 20% (F measure: 87.5 → 89.9)
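The reranking step can be sketched as follows (Python; the hypothesis representation, bonus weights, and `rerank` helper are invented for illustration; in a real system the weights are learned):

```python
def rerank(hypotheses, coref_bonus=0.5, relation_bonus=1.0):
    """Rerank N-best name hypotheses using downstream evidence.

    Each hypothesis is (base_logprob, n_coreferent_mentions, n_relations).
    Hypotheses that support more coreference links and semantic relations
    get a score boost, capturing the 'global view' of the talk.
    """
    def score(h):
        logprob, n_coref, n_rel = h
        return logprob + coref_bonus * n_coref + relation_bonus * n_rel
    return sorted(hypotheses, key=score, reverse=True)

# The locally best hypothesis (-1.0) is overtaken by one that supports a
# coreference link and a relation (-1.4 + 0.5 + 1.0 = 0.1).
best = rerank([(-1.0, 0, 0), (-1.4, 1, 1)])[0]
```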
-
Multiple Hypotheses + Re-Ranking
- Re-ranking model: combination of information from interactions between stages
[Diagram: Raw Doc → Name/Nominal Mention Tagger → Reference Resolver → Relation Tagger; the top N hypotheses are pruned after the name, coreference, and relation stages down to the top 1; maximum attainable precision 100% / 99% / 98% / 97% vs. a final precision of 85%.]
-
Computing Global Probabilities
Roth and Yih (CoNLL 2004) optimized a combined probability over two analysis stages:
- limited the interaction to name classification and semantic relation identification
- optimized the product of name and relation probabilities, subject to constraints on the types of the name arguments
- used linear programming methods
- obtained a 1%+ improvement in name tagging, and 2-4% in relation tagging, over the conventional pipeline
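A toy version of this joint optimization (Python; the probabilities, labels, and `joint_best` helper are invented, not Roth and Yih's data; they solve the real problem with linear programming, while exhaustive search suffices here):

```python
from itertools import product

# Illustrative local classifier scores for two mentions and one relation.
name_probs = [{"PER": 0.45, "ORG": 0.55},   # mention 1
              {"PER": 0.30, "ORG": 0.70}]   # mention 2
rel_probs = {"works_for": 0.9, "none": 0.1}

def joint_best(name_probs, rel_probs):
    """Maximize the product of name and relation probabilities subject to
    the argument-type constraint works_for(PER, ORG)."""
    best, best_p = None, -1.0
    for t1, t2, r in product(["PER", "ORG"], ["PER", "ORG"], rel_probs):
        if r == "works_for" and (t1, t2) != ("PER", "ORG"):
            continue  # violates the type constraint
        p = name_probs[0][t1] * name_probs[1][t2] * rel_probs[r]
        if p > best_p:
            best, best_p = (t1, t2, r), p
    return best

# Locally, mention 1 prefers ORG (0.55 vs. 0.45), but the high-probability
# works_for relation forces it to PER in the joint optimum.
```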
-
Three Degrees of IE-Building Tasks
1. We know what linguistic patterns we are looking for.
2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
3. We know the topic, but not the relations involved.
-
Lots of Ways of Expressing an Event
- Booth assassinated Lincoln
- Lincoln was assassinated by Booth
- The assassination of Lincoln by Booth
- Booth went through with the assassination of Lincoln
- Booth murdered Lincoln
- Booth fatally shot Lincoln
-
Syntactic Paraphrases
Some paraphrase relations involve the same words (or morphologically related words) and are broadly applicable:
- Booth assassinated Lincoln
- Lincoln was assassinated by Booth
- The assassination of Lincoln by Booth
- Booth went through with the assassination of Lincoln
These are syntactic paraphrases.
-
Semantic Paraphrases
Other paraphrase relations involve different word choices:
- Booth assassinated Lincoln
- Booth murdered Lincoln
- Booth fatally shot Lincoln
These are semantic paraphrases.
-
Attacking Syntactic Paraphrases
Syntactic paraphrases can be addressed through deeper syntactic representations which reduce the paraphrases to a common relationship:
- chunks
- surface syntax
- deep structure (logical subject/object)
- predicate-argument structure (semantic roles)
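As a toy illustration of this reduction (Python; the `normalize` helper, the `NOMINALIZATIONS` table, and the construction labels are all invented for illustration; in a real system the roles are read off a parser's output):

```python
# Tiny NomBank-style mapping from nominalizations to their verbs (toy).
NOMINALIZATIONS = {"assassination": "assassinate"}

def normalize(construction, pred, arg_a, arg_b):
    """Reduce active, passive, and nominalized forms to one
    (predicate, agent, patient) triple."""
    if construction == "active":          # "Booth assassinated Lincoln"
        return (pred, arg_a, arg_b)
    if construction == "passive":         # "Lincoln was assassinated by Booth"
        return (pred, arg_b, arg_a)
    if construction == "nominalization":  # "the assassination of Lincoln by Booth"
        return (NOMINALIZATIONS[pred], arg_b, arg_a)
    raise ValueError(construction)

# All three surface forms map to the same triple, so one deep pattern
# covers all of them.
assert normalize("active", "assassinate", "Booth", "Lincoln") == \
       normalize("passive", "assassinate", "Lincoln", "Booth") == \
       normalize("nominalization", "assassination", "Lincoln", "Booth")
```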
-
Tree Banks
Syntactic analyzers have been effectively created through training from tree banks:
- good coverage possible with a limited corpus
-
Predicate Argument Banks
The next stage of syntactic analysis is being enabled through the creation of predicate-argument banks:
- PropBank (for verb arguments) (Kingsbury and Palmer [Univ. of Penn.])
- NomBank (for noun arguments)* (Meyers et al.)
* first release next week
-
PA Banks, cont'd
Together these predicate-argument banks assign common argument labels to a wide range of constructs:
- The Bulgarians attacked the Turks
- The Bulgarians' attack on the Turks
- The Bulgarians launched an attack on the Turks
-
Depth vs. Accuracy
- Patterns based on deeper representations cover more examples
- but deeper representations are generally less accurate
This leaves us with a dilemma: use shallow (chunk) or deep (PA) patterns?
-
Resolving the Dilemma
The solution: allow patterns at multiple levels:
- combine evidence from the different levels
- use machine learning methods to assign appropriate weights to each level
In cases where deep analysis fails, the correct decision can often be made from the shallow analysis.
-
Integrating Multiple Levels
Zhao applied this approach to relation and event detection:
- corpus-trained method
- a kernel measures the similarity of an example in the training corpus with a test input
- separate kernels at the word level, chunk level, and logical syntactic structure level
- a composite kernel combines information at the different levels
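A minimal sketch of the composite-kernel idea (Python; the toy level kernels and weights are invented, not Zhao's). A nonnegative weighted sum of kernels is itself a valid kernel, so the combination can be fed directly to an SVM or KNN:

```python
def composite_kernel(kernels, weights):
    """Combine similarity kernels computed at different levels of
    analysis (word, chunk, logical syntax) into one kernel."""
    def k(x, y):
        return sum(w * kern(x, y) for kern, w in zip(kernels, weights))
    return k

# Toy level kernels over feature sets (stand-ins for the real ones):
word_k  = lambda x, y: len(set(x["words"])  & set(y["words"]))
chunk_k = lambda x, y: len(set(x["chunks"]) & set(y["chunks"]))
k = composite_kernel([word_k, chunk_k], [1.0, 2.0])

a = {"words": {"Booth", "shot"},   "chunks": {"NP:Booth"}}
b = {"words": {"Booth", "killed"}, "chunks": {"NP:Booth"}}
# k(a, b) = 1.0 * 1 + 2.0 * 1 = 3.0
```

The weights let the learner trust the deeper (but less reliable) levels only as much as the training data justifies.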
-
Kernel-based Integration
[Diagram: preprocessing (POS tagger, name tagger, sentence parser, other analyzers) produces logical relations; the kernels feed an SVM / KNN classifier; post-processing yields the results.]
-
Benefits of Level Integration
Zhao demonstrated significant performance improvements for semantic relation detection by combining the word, chunk, and logical syntactic relation levels, over the performance of the individual levels (Zhao and Grishman, ACL 2005).
-
Attacking Semantic Paraphrase
Some semantic paraphrase can be addressed through manually prepared synonym sets, such as are available in WordNet.
Stevenson and Greenwood [Sheffield] (ACL 2005) measured the degree to which IE patterns could be successfully generalized using WordNet:
- measured on the executive succession task
- started with a small seed set of patterns
-
Seed Pattern Set for Executive Succession

Subject  Verb       Object
company  v-appoint  person
person   v-resign   -

v-appoint = { appoint, elect, promote, name }
v-resign = { resign, depart, quit }
-
Evaluating IE Patterns
Text filtering metric: if we select documents / sentences containing a pattern, how many of the relevant documents / sentences do we get?
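The metric can be sketched as (Python; `text_filtering` is an illustrative helper name, not from the talk):

```python
def text_filtering(selected, relevant):
    """Recall and precision of pattern-based filtering.

    selected: the set of documents (or sentences) containing some pattern;
    relevant: the gold-standard on-topic set.
    """
    hits = len(selected & relevant)
    return hits / len(relevant), hits / len(selected)

# Toy corpus: patterns select d1-d3; d1, d2, d4, d5 are actually on topic.
recall, precision = text_filtering({"d1", "d2", "d3"},
                                   {"d1", "d2", "d4", "d5"})
# recall = 2/4 = 0.5, precision = 2/3
```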
-
WordNet worked quite well for the executive succession task

                     seed           expanded
                     P      R       P      R
document filtering   100%   26%     68%    96%
sentence filtering   81%    10%     47%    64%
-
Challenge of Semantic Paraphrase
But semantic paraphrase, by its nature, is more open-ended and more domain-specific than syntactic paraphrase, so it is hard to prepare any comprehensive resource by hand.
Corpus-based discovery methods will be essential to improve our coverage.
-
Paraphrase Discovery
Basic intuition:
- find pairs of passages which probably convey the same information
- align the structures at points of known correspondence (e.g., names which appear in both passages):
    Fred xxxxx Harriet
    Fred yyyyy Harriet
  (xxxxx and yyyyy are then candidate paraphrases)
- similar to MT training from bitexts
-
Evidence of Paraphrase
- From almost parallel text: strong external evidence of paraphrase + a single aligned example
- From comparable text: weak external evidence of paraphrase + a few aligned examples
- From general text: using lots of aligned examples
-
Paraphrase from Translations
(Barzilay and McKeown, ACL 2001 [Columbia])
- Take multiple translations of the same novel: high likelihood of passage paraphrase
- Align sentences
- Chunk and align sentence constituents
Found lots of lexical paraphrases (words & phrases); a few larger (syntactic) paraphrases. Data availability is limited.
-
Paraphrase from News Sources
(Shinyama, Sekine, et al., IWP 2003)
- Take news stories from multiple sources from the same day
- Use a word-based metric to identify stories about the same topic
- Tag sentences for names; look for sentences in the two stories with several names in common: moderate likelihood of sentence paraphrase
- Look for syntactic structures in these sentences which share names
  - sharing 2 names: paraphrase precision 62% (articles about murder, in Japanese)
  - sharing one name, with at least four examples of a given paraphrase relation: precision 58% (2005 results, English, no topic constraint)
-
Relation Paraphrase from Multiple Examples
Basic idea:
If expression R appears with several pairs of names
    a R b, c R d, e R f,
and expression S appears with several of the same pairs
    a S b, e S f,
then there is a good chance that R and S are paraphrases.
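This shared-name-pair heuristic can be sketched as (Python; the `paraphrase_candidates` helper and the instance format are invented for illustration):

```python
from collections import defaultdict

def paraphrase_candidates(instances, min_shared=2):
    """instances: iterable of (expression, name1, name2) occurrences.
    Two expressions that co-occur with at least min_shared of the same
    name pairs are proposed as paraphrases."""
    pairs_by_expr = defaultdict(set)
    for expr, n1, n2 in instances:
        pairs_by_expr[expr].add((n1, n2))
    exprs = sorted(pairs_by_expr)
    return {(r, s)
            for i, r in enumerate(exprs) for s in exprs[i + 1:]
            if len(pairs_by_expr[r] & pairs_by_expr[s]) >= min_shared}

data = [("buy", "CBS", "Westinghouse"),
        ("buy", "Eastern Group", "Hanson"),
        ("acquire", "CBS", "Westinghouse"),
        ("acquire", "Eastern Group", "Hanson"),
        ("sue", "CBS", "Westinghouse")]
# "buy" and "acquire" share two name pairs, so they are linked;
# "sue" shares only one pair and is not.
```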
-
Relation Paraphrase: Example
- Eastern Group's agreement to buy Hanson
- Eastern Group to acquire Hanson
- CBS will acquire Westinghouse
- CBS's purchase of Westinghouse
- CBS agreed to buy Westinghouse
(example based on Sekine 2005)
-
Relation Paraphrase: Example, cont'd
- Eastern Group's agreement to buy Hanson
- Eastern Group to acquire Hanson
- CBS will acquire Westinghouse
- CBS's purchase of Westinghouse
- CBS agreed to buy Westinghouse
Select the main linking predicate.
-
Relation Paraphrase: Example, cont'd
- Eastern Group's agreement to buy Hanson
- Eastern Group to acquire Hanson
- CBS will acquire Westinghouse
- CBS's purchase of Westinghouse
- CBS agreed to buy Westinghouse
Two shared name pairs yield a paraphrase link (buy ~ acquire).
-
Relation Paraphrase, cont'd
- Brin (1998); Agichtein and Gravano (2000): acquired individual relations (authorship, location)
- Lin and Pantel (2001): patterns for use in QA
- Sekine (IWP 2005): acquire all relations between two types of names; paraphrase precision 86% for person-company pairs, 73% for company-company pairs
-
Three Degrees of IE-Building Tasks
1. We know what linguistic patterns we are looking for.
2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
3. We know the topic, but not the relations involved.
-
Topic → set of documents on the topic → set of patterns characterizing the topic
-
Riloff Metric
- Divide the corpus into relevant (on-topic) and irrelevant (off-topic) documents
- Classify (some) words into major semantic categories (people, organizations, ...)
- Identify predication structures in the documents (such as verb-object pairs)
- Count the frequency of each structure in relevant (R) and irrelevant (I) documents
- Score structures by (R/I) log R
- Select the top-ranked patterns
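The scoring step can be sketched as (Python; the +1 smoothing on I, added so the score is defined when a pattern never occurs in irrelevant documents, is an assumption, not from the talk):

```python
import math

def riloff_score(r, i):
    """Score a candidate pattern: r = frequency in relevant documents,
    i = frequency in irrelevant ones, following the slide's (R/I) log R.
    Patterns frequent in relevant text and rare elsewhere score highest."""
    if r == 0:
        return 0.0
    return (r / (i + 1)) * math.log(r)

# A pattern seen 8 times on-topic and once off-topic outscores one seen
# 8 times on-topic but 7 times off-topic.
```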
-
Bootstrapping
Goal: find examples / patterns relevant to a given topic without any corpus tagging (Yangarber 2000).
Method:
- identify a few seed patterns for the topic
- retrieve the documents containing the patterns
- find additional structures with a high Riloff metric
- add them to the seed set and repeat
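The loop can be sketched as follows (Python; a toy approximation of Yangarber's procedure: the corpus representation, the inline Riloff-style scorer with its +1 smoothing, and the stopping rule are all assumptions for illustration):

```python
import math

def riloff(r, i):
    # (R/I) log R with +1 smoothing on I (an assumption).
    return (r / (i + 1)) * math.log(r) if r else 0.0

def bootstrap(corpus, seeds, n_iter=5):
    """corpus: {doc_id: set of patterns occurring in it};
    seeds: the initial pattern set.  Each iteration marks documents
    containing a known pattern as relevant, then adopts the best-scoring
    new pattern."""
    patterns = set(seeds)
    for _ in range(n_iter):
        relevant = {d for d, pats in corpus.items() if pats & patterns}
        candidates = set().union(*corpus.values()) - patterns
        if not candidates:
            break
        def metric(p):
            r = sum(1 for d in relevant if p in corpus[d])
            i = sum(1 for d in corpus if d not in relevant and p in corpus[d])
            return riloff(r, i)
        best = max(candidates, key=metric)
        if metric(best) <= 0:
            break  # nothing left that is clearly topic-relevant
        patterns.add(best)
    return patterns

corpus = {
    "d1": {"person retires", "person was named president"},
    "d2": {"person retires", "person was named president"},
    "d3": {"stocks fell"},
}
# Starting from < person retires >, the succession pattern is adopted;
# the off-topic pattern is not.
```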
-
#1: pick seed pattern
Seed: < person retires >
-
#2: retrieve relevant documents
Seed: < person retires >
Relevant documents:
  "Fred retired. ... Harry was named president."
  "Maki retired. ... Yuki was named president."
(plus other documents)
-
#3: pick new pattern
Seed: < person retires >
< person was named president > appears in several relevant documents (top-ranked by the Riloff metric):
  "Fred retired. ... Harry was named president."
  "Maki retired. ... Yuki was named president."
-
#4: add new pattern to pattern set
Pattern set: < person retires >, < person was named president >
-
Applied to the Executive Succession Task

Seed:
Subject  Verb       Object
company  v-appoint  person
person   v-resign   -

v-appoint = { appoint, elect, promote, name }
v-resign = { resign, depart, quit, step-down }
Run the discovery procedure for 80 iterations.
-
Discovered Patterns

Subject  Verb                Object
company  v-appoint           person
person   v-resign            -
person   succeed             person
person   be | become         president | officer | chairman | executive
company  name                president |
person   join | run | leave  company
person   serve               board | company
person   leave               post
-
Evaluation: Text Filtering
Evaluated using document-level text filtering:

Pattern set      Recall  Precision
Seed             11%     93%
Seed+discovered  88%     81%

Comparable to WordNet-based expansion. Successful for a variety of extraction tasks.
-
[Chart: document recall / precision]
-
Evaluation: Slot Filling
How effective are the patterns within a complete IE system? MUC-style IE on the MUC-6 corpora.
Caveat: filtered / aligned by hand.

[Table, partially recoverable (recall / precision / F, two conditions per row):
  manual (MUC):  54 / 71 / 62    47 / 70 / 56
  manual (now):  69 / 79 / 74    56 / 75 / 64
  (other rows):  27 / 74 / 40    52 / 72 / 60]
-
Topical Patterns vs. Paraphrases
These methods gather the main expressions about a particular topic. These include sets of paraphrases:
- name, appoint, select
But they also include topically related phrases which are not paraphrases:
- appoint & resign
- shoot & die
-
Pattern Discovery + Paraphrase Discovery
We can couple topical pattern discovery and paraphrase discovery:
- first discover patterns from a topic description (Sudo)
- then group them into paraphrase sets (Shinyama)
The result is semantically coherent extraction pattern groups (Shinyama 2002), although not all patterns are grouped. Paraphrase detection works better here because the patterns are already semantically related.
-
Paraphrase Identification for Discovered Patterns (Shinyama et al. 2002)
- Worked well for the executive succession task (in Japanese): precision 94%, coverage 47%
  - coverage = number of paraphrase pairs discovered / number of pairs required to link all paraphrases
- Didn't work as well for the arrest task: fewer names, and multiple sentences with the same name led to alignment errors
-
Conclusion
Current basic research on NLP methods offers significant opportunities for improved IE performance and portability:
- global optimization to improve analysis performance
- richer treebanks to support greater coverage of syntactic paraphrase
- corpus-based discovery methods to support greater coverage of semantic paraphrase