information extraction shallow processing techniques for nlp ling570 december 5, 2011
Post on 19-Dec-2015
230 views
TRANSCRIPT
![Page 1: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/1.jpg)
Information Extraction
Shallow Processing Techniques for NLPLing570
December 5, 2011
![Page 2: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/2.jpg)
RoadmapInformation extraction:
Definition & Motivation
General approach
Relation extraction techniques:Fully supervised
Lightly supervised
![Page 3: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/3.jpg)
Information ExtractionMotivation:
Unstructured data hard to manage, assess, analyze
![Page 4: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/4.jpg)
Information ExtractionMotivation:
Unstructured data hard to manage, assess, analyze
Many tools for manipulating structured data:Databases: efficient search, agglomerationStatistical analysis packagesDecision support systems
![Page 5: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/5.jpg)
Information ExtractionMotivation:
Unstructured data hard to manage, assess, analyze
Many tools for manipulating structured data:Databases: efficient search, agglomerationStatistical analysis packagesDecision support systems
Goal:Given unstructured information in textsConvert to structured data
E.g., populate DBs
![Page 6: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/6.jpg)
IE ExampleCiting high fuel prices, United Airlines said Friday it
has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco. Lead airline: Amount: Effective Date: Follower:
![Page 7: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/7.jpg)
IE ExampleCiting high fuel prices, United Airlines said Friday it
has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco. Lead airline: United Airlines Amount: $6 Effective Date: 2006-10-26 Follower: American Airlines
![Page 8: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/8.jpg)
Other ApplicationsShared tasks:
Message Understanding Conferences (MUC)
![Page 9: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/9.jpg)
Other ApplicationsShared tasks:
Message Understanding Conferences (MUC)Joint ventures/mergers:
![Page 10: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/10.jpg)
Other ApplicationsShared tasks:
Message Understanding Conferences (MUC)Joint ventures/mergers:
CompanyA, CompanyB, CompanyC, amount, type
Terrorist incidents
![Page 11: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/11.jpg)
Other ApplicationsShared tasks:
Message Understanding Conferences (MUC)Joint ventures/mergers:
CompanyA, CompanyB, CompanyC, amount, type
Terrorist incidents Incident type, Actor, Victim, Date, Location
General:
![Page 12: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/12.jpg)
Other ApplicationsShared tasks:
Message Understanding Conferences (MUC)Joint ventures/mergers:
CompanyA, CompanyB, CompanyC, amount, type
Terrorist incidents Incident type, Actor, Victim, Date, Location
General:PricegrabberStock market analysisApartment finder
![Page 13: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/13.jpg)
Information ExtractionComponents
Incorporates range of techniques, tools
![Page 14: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/14.jpg)
Information ExtractionComponents
Incorporates range of techniques, toolsFSTs (pattern extraction)Supervised learning:
ClassificationSequence labeling
![Page 15: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/15.jpg)
Information ExtractionComponents
Incorporates range of techniques, toolsFSTs (pattern extraction)Supervised learning:
ClassificationSequence labeling
Semi-, un-supervised learning: clustering, bootstrapping
![Page 16: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/16.jpg)
Information ExtractionComponents
Incorporates range of techniques, toolsFSTs (pattern extraction)Supervised learning:
ClassificationSequence labeling
Semi-, un-supervised learning: clustering, bootstrapping
Part-of-Speech taggingNamed Entity RecognitionChunkingParsing
![Page 17: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/17.jpg)
Information Extraction Process
Typically pipeline of major components
![Page 18: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/18.jpg)
Information Extraction Process
Typically pipeline of major components
First step: Named Entity Recognition (NER)Often application-specific
Generic type: Person, Organiztion, LocationSpecific: genes to college courses
![Page 19: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/19.jpg)
Information Extraction Process
Typically pipeline of major components
First step: Named Entity Recognition (NER)Often application-specific
Generic type: Person, Organiztion, LocationSpecific: genes to college courses
Identify NE mentions: specific instances
![Page 20: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/20.jpg)
Example: Pre-NERCiting high fuel prices, United Airlines said Friday
it has increased fares by $6 per round trip to some cities also served by lower-cost carriers. American Airlines, a unit of AMR Corp. immediately matched the move, spokesman Time Wagner said. United, a unit of UAL Corp., said the increase took effect Thursday an applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Denver to San Francisco.
![Page 21: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/21.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines] said
[TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 22: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/22.jpg)
Information Extraction:Reference Resolution
Builds on NER
Clusters entity mentions referring to same thingsCreates entity equivalence classes
![Page 23: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/23.jpg)
Information Extraction:Reference Resolution
Builds on NER
Clusters entity mentions referring to same thingsCreates entity equivalence classes
Includes anaphora resolutionE.g. pronoun resolution (it, he, she…; one; this, that)
![Page 24: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/24.jpg)
Information Extraction:Reference Resolution
Builds on NER
Clusters entity mentions referring to same thingsCreates entity equivalence classes
Includes anaphora resolutionE.g. pronoun resolution (it, he, she…; one; this, that)
Also non-anaphoric, general mentions
![Page 25: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/25.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines] said
[TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 26: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/26.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines]
said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 27: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/27.jpg)
Relation Detection & Classification
Relation detection & classification:Find and classify relations among entities in text
![Page 28: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/28.jpg)
Relation Detection & Classification
Relation detection & classification:Find and classify relations among entities in text
Mix of application-specific and generic relations
Generic relations:
![Page 29: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/29.jpg)
Relation Detection & Classification
Relation detection & classification:Find and classify relations among entities in text
Mix of application-specific and generic relations
Generic relations:
Family, employment, part-whole, membership, geospatial
![Page 30: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/30.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines]
said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 31: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/31.jpg)
Relation Detection & Classification
Relation detection & classification:Find and classify relations among entities in text
Mix of application-specific and generic relationsGeneric relations:
Family, employment, part-whole, membership, geospatial
Generic relations:United is part-of UAL Corp.American Airlines is part-of AMRTim Wagner is an employee of American Airlines
![Page 32: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/32.jpg)
Relation Detection & Classification
Relation detection & classification: Find and classify relations among entities in text
Mix of application-specific and generic relationsGeneric relations:
Family, employment, part-whole, membership, geospatial
Generic relations:United is part-of UAL Corp.American Airlines is part-of AMRTim Wagner is an employee of American Airlines
Domain-specific relations:United serves Chicago; United serves Denver, etc
![Page 33: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/33.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines]
said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 34: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/34.jpg)
Event Detection &Classification
Event detection and classification:Find key events
![Page 35: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/35.jpg)
Event Detection &Classification
Event detection and classification:Find key events
Fare increase by UnitedMatching fare increase by AmericanReporting of fare increases
How cued?
![Page 36: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/36.jpg)
Event Detection &Classification
Event detection and classification:Find key events
Fare increase by UnitedMatching fare increase by AmericanReporting of fare increases
How cued? Instances of “said” and ”cite”
![Page 37: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/37.jpg)
Temporal Expressions:Recognition and Analysis
Temporal expression recognitionFinds temporal events in text
![Page 38: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/38.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines]
said [TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 39: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/39.jpg)
Temporal Expressions:Recognition and Analysis
Temporal expression recognitionFinds temporal events in text
E.g. Friday, Thursday in example text
![Page 40: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/40.jpg)
Temporal Expressions:Recognition and Analysis
Temporal expression recognitionFinds temporal events in text
E.g. Friday, Thursday in example textOthers:
Date expressions: Absolute: days of week, holidays Relative: two days from now, next year
Clock times: noon, 3:30pm
![Page 41: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/41.jpg)
Temporal Expressions:Recognition and Analysis
Temporal expression recognition Finds temporal events in text
E.g. Friday, Thursday in example text Others:
Date expressions: Absolute: days of week, holidays Relative: two days from now, next year
Clock times: noon, 3:30pm
Temporal analysis: Map expressions to points/spans on timeline
Anchor: e.g. Friday, Thursday: tied to example dateline
![Page 42: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/42.jpg)
Template FillingTexts describe stereotypical situations in domain
Identify texts evoking template situations
Fill slots in templates based on text
![Page 43: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/43.jpg)
Template FillingTexts describe stereotypical situations in domain
Identify texts evoking template situations
Fill slots in templates based on text
In example: fare-raise is stereo-typical event
![Page 44: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/44.jpg)
Information Extraction Steps
Named Entity Recognition
Reference Resolution
Relation Detection & Classification
Event Detection & Classification
Temporal Expression Recognition & Analysis
Template-filling
![Page 45: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/45.jpg)
Relation Detection & Classification
![Page 46: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/46.jpg)
Common Relations
![Page 47: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/47.jpg)
Relations and Entitiesfrom Example
![Page 48: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/48.jpg)
Supervised Approaches to Relation Analysis
Two stage process:
![Page 49: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/49.jpg)
Supervised Approaches to Relation Analysis
Two stage process:Relation detection:
Detect whether a relation holds between two entities
![Page 50: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/50.jpg)
Supervised Approaches to Relation Analysis
Two stage process:Relation detection:
Detect whether a relation holds between two entities Positive examples:
![Page 51: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/51.jpg)
Supervised Approaches to Relation Analysis
Two stage process:Relation detection:
Detect whether a relation holds between two entities Positive examples:
Drawn from labeled training data Negative examples:
![Page 52: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/52.jpg)
Supervised Approaches to Relation Analysis
Two stage process:Relation detection:
Detect whether a relation holds between two entities Positive examples:
Drawn from labeled training data Negative examples:
Generated from within-sentence entity pairs w/no relation
Relation classification:Label each related pair of entities by type
Multi-class classification task
![Page 53: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/53.jpg)
Feature DesignWhich features can we use to classify relations?
![Page 54: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/54.jpg)
Example: post-NERCiting high fuel prices, [ORGUnited Airlines] said
[TimeFriday] it has increased fares by [MONEY$6] per round trip to some cities also served by lower-cost carriers. [ORGAmerican Airlines], a unit of [ORGAMR Corp.] immediately matched the move, spokesman [PERTime Wagner] said. [orgUnited], a unit of [ORGUAL Corp.], said the increase took effect [TIMEThursday] an applies to most routes where it competes against discount carriers, such as [LOCChicago] to [LOCDallas] and [LOCDenver] to [LOCSan Francisco].
![Page 55: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/55.jpg)
Common Relations
![Page 56: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/56.jpg)
Feature DesignWhich features can we use to classify relations?
![Page 57: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/57.jpg)
Feature DesignWhich features can we use to classify relations?
Named entity features:Named entity types of candidate argumentsConcatenation of two entity typesHead words of argumentsBag-of-words from arguments
![Page 58: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/58.jpg)
Feature DesignWhich features can we use to classify relations?
Named entity features:Named entity types of candidate argumentsConcatenation of two entity typesHead words of argumentsBag-of-words from arguments
Context word features:Bag-of-words/bigrams between entities (+/-
stemmed)Words and stems before/after entitiesDistance in words, # entities between arguments
![Page 59: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/59.jpg)
Feature DesignFeatures for relation classification
Syntactic structure features can signal relations E.g. appositive signals part-of relation
![Page 60: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/60.jpg)
Feature DesignFeatures for relation classification
Syntactic structure features can signal relations E.g. appositive signals part-of relation
![Page 61: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/61.jpg)
Feature DesignSyntactic structure features
Derived from syntactic analysisFrom base-phrase chunking to full constituent parse
![Page 62: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/62.jpg)
Feature DesignSyntactic structure features
Derived from syntactic analysisFrom base-phrase chunking to full constituent parse
Features:Presence of constructionChunk base-phrase pathsBag-of-chunk headsDependency/constituency pathDistance between arguments
How can we represent syntax?
![Page 63: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/63.jpg)
Feature DesignSyntactic structure features
Derived from syntactic analysisFrom base-phrase chunking to full constituent parse
Features:Presence of constructionChunk base-phrase pathsBag-of-chunk headsDependency/constituency pathDistance between arguments
How can we represent syntax?Binary features: Detection of construction patterns
![Page 64: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/64.jpg)
Feature DesignSyntactic structure features
Derived from syntactic analysisFrom base-phrase chunking to full constituent parse
Features:Presence of constructionChunk base-phrase pathsBag-of-chunk headsDependency/constituency pathDistance between arguments
How can we represent syntax?Binary features: Detection of construction patternsStructure features: Encode paths through trees
![Page 65: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/65.jpg)
Some Example Features
![Page 66: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/66.jpg)
Lightly Supervised Approaches
Supervised approaches highly effective, but
![Page 67: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/67.jpg)
Lightly Supervised Approaches
Supervised approaches highly effective, butRequire large manually annotated corpus for
training
![Page 68: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/68.jpg)
Lightly Supervised Approaches
Supervised approaches highly effective, butRequire large manually annotated corpus for
training
Unsupervised approaches interesting, but
![Page 69: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/69.jpg)
Lightly Supervised Approaches
Supervised approaches highly effective, butRequire large manually annotated corpus for
training
Unsupervised approaches interesting, butMay not yield results consistent with desired
categories
Lightly supervised approaches:
![Page 70: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/70.jpg)
Lightly Supervised Approaches
Supervised approaches highly effective, butRequire large manually annotated corpus for
training
Unsupervised approaches interesting, butMay not yield results consistent with desired
categories
Lightly supervised approaches:Build on small seed sets of patterns that capture
relations of interestE.g. manually constructed regular expressions
![Page 71: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/71.jpg)
Lightly Supervised Approaches
Supervised approaches highly effective, butRequire large manually annotated corpus for
training
Unsupervised approaches interesting, butMay not yield results consistent with desired
categories
Lightly supervised approaches:Build on small seed sets of patterns that capture
relations of interestE.g. manually constructed regular expressions
Iteratively generalize and refine patterns to optimize
![Page 72: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/72.jpg)
Bootstrapping RelationsCreate seed patterns and instances
Example: finding airline hub citiesInitial pattern:
![Page 73: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/73.jpg)
Bootstrapping RelationsCreate seed patterns and instances
Example: finding airline hub citiesInitial pattern: / * has a hub at * /
In a large corpus finds examples like: Delta has a hub at LaGuardia. Milwaukee-based Midwest has a hub at KCI. American airlines has a hub at the San Juan airport
![Page 74: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/74.jpg)
Bootstrapping RelationsCreate seed patterns and instances
Example: finding airline hub citiesInitial pattern: / * has a hub at * /
In a large corpus finds examples like: Delta has a hub at LaGuardia. Milwaukee-based Midwest has a hub at KCI. American airlines has a hub at the San Juan airport
Instances: (Delta, LaGuardia), (Midwest,KCI),(American,San Juan)
![Page 75: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/75.jpg)
Bootstrapping RelationsIssues:
![Page 76: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/76.jpg)
Bootstrapping RelationsIssues:
False alarms:Some matches aren’t valid relations
The catheter has a hub at the proximal end. A star topology often has a hub at the center.
Fix?
![Page 77: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/77.jpg)
Bootstrapping RelationsIssues:
False alarms:Some matches aren’t valid relations
The catheter has a hub at the proximal end. A star topology often has a hub at the center.
Fix? /[ORG] has hub at [LOC]/Misses:
Some instances are narrowly missed by pattern No frills rival easyJet, which has established a hub at
Liverpool. Ryanair also has a continental hub at Charleroi airport.
Fix?
![Page 78: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/78.jpg)
Bootstrapping RelationsIssues:
False alarms:Some matches aren’t valid relations
The catheter has a hub at the proximal end. A star topology often has a hub at the center.
Fix? /[ORG] has hub at [LOC]/Misses:
Some instances are narrowly missed by pattern No frills rival easyJet, which has established a hub at Liverpool. Ryanair also has a continental hub at Charleroi airport.
Fix? Relax patterns
![Page 79: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/79.jpg)
Bootstrapping RelationsIssues:
False alarms:Some matches aren’t valid relations
The catheter has a hub at the proximal end. A star topology often has a hub at the center.
Fix? /[ORG] has hub at [LOC]/Misses:
Some instances are narrowly missed by pattern No frills rival easyJet, which has established a hub at Liverpool. Ryanair also has a continental hub at Charleroi airport.
Fix? Relax patterns – see issue 1
![Page 80: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/80.jpg)
Bootstrapping Relations Issues:
False alarms:Some matches aren’t valid relations
The catheter has a hub at the proximal end. A star topology often has a hub at the center.
Fix? /[ORG] has hub at [LOC]/ Misses:
Some instances are narrowly missed by pattern No frills rival easyJet, which has established a hub at Liverpool. Ryanair also has a continental hub at Charleroi airport.
Fix? Relax patterns – see issue 1 Add new high precision patterns
![Page 81: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/81.jpg)
BootstrappingHow can we obtain more high quality patterns?
![Page 82: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/82.jpg)
BootstrappingHow can we obtain more high quality patterns?
Human labeling?
![Page 83: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/83.jpg)
BootstrappingHow can we obtain more high quality patterns?
Human labeling? – see above…Use tuples identified by seed patternsFind other occurrences of tuplesExtract patterns from those occurrences
![Page 84: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/84.jpg)
BootstrappingHow can we obtain more high quality patterns?
Human labeling? – see above…Use tuples identified by seed patternsFind other occurrences of tuplesExtract patterns from those occurrences
E.g. tuple: Ryanair, Charleroi and hub
![Page 85: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/85.jpg)
BootstrappingHow can we obtain more high quality patterns?
Human labeling? – see above…Use tuples identified by seed patternsFind other occurrences of tuplesExtract patterns from those occurrences
E.g. tuple: Ryanair, Charleroi and hubOccurrences:
Budget airline Ryanair, which uses Charleroi as a hubAll flights in and out of Ryanair’s Belgian hub at Charleroi
Patterns:
![Page 86: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/86.jpg)
BootstrappingHow can we obtain more high quality patterns?
Human labeling? – see above… Use tuples identified by seed patterns Find other occurrences of tuples Extract patterns from those occurrences
E.g. tuple: Ryanair, Charleroi and hub Occurrences:
Budget airline Ryanair, which uses Charleroi as a hubAll flights in and out of Ryanair’s Belgian hub at Charleroi
Patterns: /[ORG], which uses [LOC] as a hub /
![Page 87: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/87.jpg)
BootstrappingHow can we obtain more high quality patterns?
Human labeling? – see above… Use tuples identified by seed patterns Find other occurrences of tuples Extract patterns from those occurrences
E.g. tuple: Ryanair, Charleroi and hub Occurrences:
Budget airline Ryanair, which uses Charleroi as a hubAll flights in and out of Ryanair’s Belgian hub at Charleroi
Patterns: /[ORG], which uses [LOC] as a hub / /[ORG]’s hub at [LOC] /
![Page 88: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/88.jpg)
![Page 89: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/89.jpg)
Bootstrapping IssuesIssues:
How do we represent patterns?
How to we assess the patterns?
How to we assess the tuples?
![Page 90: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/90.jpg)
Pattern RepresentationGeneral approaches:
![Page 91: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/91.jpg)
Pattern RepresentationGeneral approaches:
Regular expressionsSpecific, high precision
![Page 92: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/92.jpg)
Pattern RepresentationGeneral approaches:
Regular expressionsSpecific, high precision
Machine learning style feature vectorsMore flexible, higher recall
![Page 93: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/93.jpg)
Pattern RepresentationGeneral approaches:
Regular expressionsSpecific, high precision
Machine learning style feature vectorsMore flexible, higher recall
Components in representation:Context before first entity mentionContext between entity mentionsContext after last entity mentionOrder of arguments
![Page 94: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/94.jpg)
Assessing Patterns & Tuples
Issue:
![Page 95: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/95.jpg)
Assessing Patterns & Tuples
Issue: No gold standard!Can use seed patterns
![Page 96: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/96.jpg)
Assessing Patterns & Tuples
Issue: No gold standard!Can use seed patterns
Balance two main factors:Performance on seed tuples
Hits and missesProductivity in full collection
Finds
![Page 97: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/97.jpg)
Assessing Patterns & Tuples
Issue: No gold standard!Can use seed patterns
Balance two main factors:Performance on seed tuples
Hits and missesProductivity in full collection
Finds
Use conservative strategy to minimize semantic driftE.g. Sydney has a ferry hub at Circular Key.
![Page 98: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/98.jpg)
Relation Analysis Evaluation
Approaches:Standardized, labeled data sets
![Page 99: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/99.jpg)
Relation Analysis Evaluation
Approaches:Standardized, labeled data sets
Precision, recall, f-measure ‘Labeled’ precision: test both detection and classification ‘Unlabeled’ precision: test only detection
![Page 100: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/100.jpg)
Relation Analysis Evaluation
Approaches:Standardized, labeled data sets
Precision, recall, f-measure ‘Labeled’ precision: test both detection and classification ‘Unlabeled’ precision: test only detection
No fully labeled corpus
![Page 101: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/101.jpg)
Relation Analysis Evaluation
Approaches:Standardized, labeled data sets
Precision, recall, f-measure ‘Labeled’ precision: test both detection and classification ‘Unlabeled’ precision: test only detection
No fully labeled corpusEvaluate retrieved tuples (e.g. DB contents)
Intuition: Don’t care how many times we find same tuple
![Page 102: Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d2e5503460f94a06445/html5/thumbnails/102.jpg)
Relation Analysis Evaluation
Approaches:Standardized, labeled data sets
Precision, recall, f-measure ‘Labeled’ precision: test both detection and classification ‘Unlabeled’ precision: test only detection
No fully labeled corpusEvaluate retrieved tuples (e.g. DB contents)
Intuition: Don’t care how many times we find same tuple
Checked by human assessors