Sequence Classification: Chunking & NER
Shallow Processing Techniques for NLP
Ling570
November 23, 2011

Roadmap
Named Entity Recognition
Chunking
HW #9

Named Entity Recognition

Roadmap
Named Entity Recognition
  Definition
  Motivation
  Challenges
  Common Approach

Named Entity Recognition
Task: Identify named entities in (typically) unstructured text
Typical entities:
  Person names
  Locations
  Organizations
  Dates
  Times

Example
Microsoft released Windows Vista in 2007.
<ORG>Microsoft</ORG> released <PRODUCT>Windows Vista</PRODUCT> in <YEAR>2007</YEAR>
Entities: Often application/domain specific
  Business intelligence: products, companies, features
  Biomedical: genes, proteins, diseases, drugs, …
Example due to F. Xia
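
The inline markup in the example can be read back into (text, type) pairs. The following is a minimal sketch, assuming tags are never nested and that type names are upper-case letters only; `TAG_RE` and `extract_entities` are illustrative names, not part of any toolkit mentioned in these slides.

```python
import re

# Matches <TYPE>text</TYPE>; the backreference forces the closing tag
# to carry the same type name as the opening tag.
TAG_RE = re.compile(r"<(?P<label>[A-Z]+)>(?P<text>.*?)</(?P=label)>")


def extract_entities(marked_up):
    """Return (entity_text, entity_type) pairs in order of appearance."""
    return [(m.group("text"), m.group("label")) for m in TAG_RE.finditer(marked_up)]


sentence = (
    "<ORG>Microsoft</ORG> released "
    "<PRODUCT>Windows Vista</PRODUCT> in <YEAR>2007</YEAR>"
)
entities = extract_entities(sentence)
```

Nested or overlapping entities would need a real parser rather than a single regular expression.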

Named Entity Types
Common categories

Named Entity Examples
For common categories:

Why NER?
Machine translation:
  Person names typically not translated
    Possibly transliterated
    Waldheim
  Number:
    9/11: date vs. ratio
    911: emergency phone number, simple number

Why NER?
Information extraction:
  MUC task: Joint ventures/mergers
    Focus on company names, person names (CEO), valuations
Information retrieval:
  Named entities are the focus of retrieval
  In some data sets, 60+% of queries target NEs
Text-to-speech:
  206-616-5728
  Phone numbers (vs. other digit strings); conventions differ by language

Challenges
Ambiguity
  Washington chose
    D.C., State, George, etc.
  Most digit strings
  cat: (95 results)
    CAT(erpillar) stock ticker
    Computerized Axial Tomography
    Chloramphenicol Acetyl Transferase
    small furry mammal

Context & Ambiguity

Evaluation
Precision
Recall
F-measure

Resources
Online:
  Name lists
    Baby names, who’s who, newswire services, census.gov
  Gazetteers
  SEC listings of companies
Tools
  LingPipe
  OpenNLP
  Stanford NLP toolkit

Approaches to NER
Rule/Regex-based:
  Match names/entities in lists
  Regex: e.g. \d\d/\d\d/\d\d: 11/23/11
  Currency: \$\d+\.\d+
Machine Learning via Sequence Labeling:
  Better for names, organizations
Hybrid
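
The two regular expressions above can be tried directly. This is a minimal sketch using Python's standard `re` module; the names `DATE_RE`, `CURRENCY_RE`, and `find_entities` and the labels `DATE` and `MONEY` are illustrative assumptions. The dollar sign is escaped because an unescaped `$` is an end-of-string anchor, and word boundaries are added to the date pattern so it does not fire inside longer digit strings.

```python
import re

# Two-digit month/day/year, e.g. 11/23/11.
DATE_RE = re.compile(r"\b\d\d/\d\d/\d\d\b")
# Dollar amount with cents; "\$" matches a literal dollar sign.
CURRENCY_RE = re.compile(r"\$\d+\.\d+")


def find_entities(text):
    """Return (label, matched_string) pairs in order of appearance."""
    hits = []
    for label, pattern in (("DATE", DATE_RE), ("MONEY", CURRENCY_RE)):
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    # Sort by start offset so results follow the text order.
    return [(label, value) for _, label, value in sorted(hits)]


found = find_entities("Class met on 11/23/11 and the book cost $12.50.")
```

Patterns like these work for regular formats such as dates and prices; they do not help with open classes such as person or organization names.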

NER as Sequence Labeling

NER as Classification Task
Instance: token
Labels:
  Position: B(eginning), I(nside), O(utside)
  NER types: PER, ORG, LOC, NUM
  Label: Type-Position, e.g. PER-B, PER-I, O, …
  How many tags?
    (|NER Types| x 2) + 1
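
The tag count can be checked by enumerating the label set. A small sketch, using the four types from the slide; the function name `bio_tagset` is an assumption made for illustration.

```python
def bio_tagset(entity_types):
    """Build Type-Position labels (e.g. PER-B, PER-I) plus the single O tag."""
    tags = []
    for entity_type in entity_types:
        tags.append(f"{entity_type}-B")  # first token of an entity
        tags.append(f"{entity_type}-I")  # any later token of the same entity
    tags.append("O")                     # one shared tag for non-entity tokens
    return tags


tags = bio_tagset(["PER", "ORG", "LOC", "NUM"])
tag_count = len(tags)  # 4 types x 2 positions + 1 = 9
```

Each type contributes a B and an I tag, while O is shared by all types, which is where the "+ 1" comes from.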

NER as Classification: Features
What information can we use for NER?
Predictive tokens: e.g. MD, Rev, Inc, ...
How general are these features? Language? Genre? Domain?
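
One way predictive tokens might enter a feature vector is sketched below. The feature names, the padding symbols `<s>` and `</s>`, and the three-item cue list are assumptions for illustration; the cue list simply reuses the examples from the slide and is specific to English.

```python
# Cue words taken from the slide; a real list would be larger and
# would depend on language, genre, and domain.
PREDICTIVE_TOKENS = {"MD", "Rev", "Inc"}


def token_features(tokens, i):
    """Features for tokens[i] drawn from the token and its two neighbors."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "word": tokens[i],
        "prev_word": prev_tok,
        "next_word": next_tok,
        # Strip a trailing period so "Rev." and "Rev" are treated alike.
        "prev_is_predictive": prev_tok.rstrip(".") in PREDICTIVE_TOKENS,
        "next_is_predictive": next_tok.rstrip(".") in PREDICTIVE_TOKENS,
    }


tokens = ["Rev.", "Jackson", "joined", "Acme", "Inc"]
jackson = token_features(tokens, 1)
acme = token_features(tokens, 3)
```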

NER as Classification: Shape Features
Shape types:
  lower: e.g. cumming
    All lower case
  capitalized: e.g. Washington
    First letter uppercase
  all caps: e.g. WHO
    All letters capitalized
  mixed case: e.g. eBay
    Mixed upper and lower case
  Capitalized with period: e.g. H.
  Ends with digit: e.g. A9
  Contains hyphen: e.g. H-P
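
The shape types can be computed with a few string tests. This is a sketch only: a token can satisfy more than one shape (H-P is also all caps), so the order of the checks below is an assumption, as are the returned label strings and the name `word_shape`.

```python
import re


def word_shape(token):
    """Map a token to one of the shape types listed on the slide."""
    if "-" in token:
        return "contains-hyphen"
    if token[-1:].isdigit():
        return "ends-with-digit"
    if re.fullmatch(r"[A-Z]\.", token):
        return "capitalized-with-period"
    if token.islower():
        return "lower"
    if token.isupper():
        return "all-caps"
    if token[:1].isupper() and token[1:].islower():
        return "capitalized"
    if any(c.isupper() for c in token) and any(c.islower() for c in token):
        return "mixed-case"
    return "other"


shapes = {t: word_shape(t) for t in
          ["cumming", "Washington", "WHO", "eBay", "H.", "A9", "H-P"]}
```

A classifier could instead receive one binary feature per shape, which avoids having to pick an order.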

Example Instance Representation
Example

Sequence Labeling
Example

Evaluation
System: output of automatic tagging
Gold Standard: true tags
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F_β = (β² + 1)PR / (β²P + R); with β = 1, F1 = 2PR / (P + R)
F1 balances precision & recall
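
The three measures can be computed from two sets of chunks. A minimal sketch, assuming each chunk is a `(start, end, label)` tuple and that a system chunk counts as correct only when it matches a gold chunk exactly; the function name `prf` and the token offsets in the example are made up for illustration.

```python
def prf(system_chunks, gold_chunks, beta=1.0):
    """Chunk-level precision, recall, and F-measure.

    A system chunk is correct only if its span and label both match
    a gold chunk exactly.
    """
    system, gold = set(system_chunks), set(gold_chunks)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Gold: Microsoft / Windows Vista / 2007. The system finds Microsoft
# but truncates the product name, so only one of its two chunks is correct.
gold = [(0, 1, "ORG"), (2, 4, "PRODUCT"), (5, 6, "YEAR")]
system = [(0, 1, "ORG"), (2, 3, "PRODUCT")]
p, r, f1 = prf(system, gold)
```

Here precision is 1/2, recall is 1/3, and F1 is 0.4. Under exact matching the truncated product chunk earns no partial credit.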

Evaluation
Standard measures:
  Precision, Recall, F-measure
  Computed on entity types (CoNLL evaluation)
Classifiers vs. evaluation measures
  Classifiers optimize tag accuracy
    Most common tag? O – most tokens aren’t NEs
  Evaluation measures focus on NEs
State-of-the-art:
  Standard tasks: PER, LOC: 0.92; ORG: 0.84
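
The gap between tag accuracy and entity-level measures shows up with a baseline that tags every token O. The sentence and its tags below are a hypothetical example, and the helper names are assumptions.

```python
def tag_accuracy(system_tags, gold_tags):
    """Fraction of tokens whose system tag equals the gold tag."""
    matches = sum(s == g for s, g in zip(system_tags, gold_tags))
    return matches / len(gold_tags)


def count_entities(tags):
    """Count entities by counting Type-B tags (each entity has exactly one)."""
    return sum(tag.endswith("-B") for tag in tags)


# Hypothetical sentence: "The report was sent to Washington late last week ."
gold = ["O", "O", "O", "O", "O", "LOC-B", "O", "O", "O", "O"]
all_o = ["O"] * len(gold)  # baseline that never proposes an entity

accuracy = tag_accuracy(all_o, gold)
entities_found = count_entities(all_o)
entities_in_gold = count_entities(gold)
```

The baseline gets 9 of 10 tags right yet finds none of the entities, so its recall is zero. This is why precision, recall, and F-measure are reported over entities rather than tokens.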

Hybrid Approaches
Practical systems
  Exploit lists, rules, learning…
Multi-pass:
  Early passes: high precision, low recall
  Later passes: noisier sequence learning
Hybrid system:
  High precision rules tag unambiguous mentions
    Use string matching to capture substring matches
  Tag items from domain-specific name lists
  Apply sequence labeler

Chunking

Roadmap
Chunking
  Definition
  Motivation
  Challenges
  Approach

What is Chunking?
Form of partial (shallow) parsing
  Extracts major syntactic units, but not full parse trees
Task: identify and classify
  Flat, non-overlapping segments of a sentence
  Basic non-recursive phrases
  Correspond to major POS
    May ignore some categories, e.g. base NP chunking
  Create simple bracketing
    [NP The morning flight] [PP from] [NP Denver] [VP has arrived]
    [NP The morning flight] from [NP Denver] has arrived
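
Because chunks are flat and non-overlapping, a bracketing can be rewritten as one tag per token, in the same Type-Position style used for NER earlier. This is a sketch under that assumption; `chunks_to_tags` and its `keep` parameter are illustrative names.

```python
def chunks_to_tags(chunks, keep=None):
    """Flatten (label, tokens) chunks into per-token (token, tag) pairs.

    Tags follow the Type-Position convention (e.g. NP-B, NP-I).
    If `keep` is given, chunks whose label is not in it are tagged O.
    """
    tagged = []
    for label, tokens in chunks:
        for position, token in enumerate(tokens):
            if keep is not None and label not in keep:
                tagged.append((token, "O"))
            else:
                suffix = "B" if position == 0 else "I"
                tagged.append((token, f"{label}-{suffix}"))
    return tagged


chunks = [
    ("NP", ["The", "morning", "flight"]),
    ("PP", ["from"]),
    ("NP", ["Denver"]),
    ("VP", ["has", "arrived"]),
]
full = chunks_to_tags(chunks)                # all chunk types
base_np = chunks_to_tags(chunks, keep={"NP"})  # base NP chunking only
```

With `keep={"NP"}` the tokens from, has, and arrived receive O, which corresponds to the second bracketing.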

Why Chunking?
Used when full parse unnecessary
  Or infeasible or impossible (when?)
Extraction of subcategorization frames
  Identify verb arguments
    e.g. VP NP; VP NP NP; VP NP to NP
Information extraction: who did what to whom
Summarization: Base information, remove mods
Information retrieval: Restrict indexing to base NPs

Processing Example
Tokenization: The morning flight from Denver has arrived
POS tagging: DT JJ N PREP NNP AUX V
Chunking: NP PP NP VP
Extraction: NP NP VP
etc.
Approaches
Finite-state Approaches
  Grammatical rules in FSTs
  Cascade to produce more complex structure
Machine Learning
  Similar to POS tagging
![Page 90: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/90.jpg)
Finite-State Rule-Based Chunking
Hand-crafted rules model phrases
  Typically application-specific
Left-to-right longest match (Abney 1996)
  Start at beginning of sentence
  Find longest matching rule
  Greedy approach, not guaranteed optimal
![Page 94: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/94.jpg)
Finite-State Rule-Based Chunking
Chunk rules: Cannot contain recursion
  NP -> Det Nominal: Okay
  Nominal -> Nominal PP: Not okay
Examples:
  NP -> (Det) Noun* Noun
  NP -> Proper-Noun
  VP -> Verb
  VP -> Aux Verb
Consider: Time flies like an arrow
Is this what we want?
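The rules and the longest-match strategy above can be tried out in a few lines. This is a minimal sketch, not the course's implementation: encoding the rules as regular expressions over POS tags, the `chunk` function, and the `Prep` tag are all assumptions made for illustration. It runs the slide's four rules on two possible taggings of "Time flies like an arrow".

```python
import re

# The slide's chunk rules, written as regexes over space-terminated POS tags.
# Tag names follow the slide; the regex encoding is an assumption of this sketch.
RULES = [
    ("NP", re.compile(r"(Det )?(Noun )*Noun ")),
    ("NP", re.compile(r"Proper-Noun ")),
    ("VP", re.compile(r"Aux Verb ")),
    ("VP", re.compile(r"Verb ")),
]


def chunk(tagged):
    """Greedy left-to-right longest-match chunking of (word, tag) pairs."""
    chunks = []
    i = 0
    while i < len(tagged):
        # Render the remaining tags as one string so the regexes can scan it.
        rest = "".join(tag + " " for _, tag in tagged[i:])
        best_label, best_len = None, 0
        for label, pattern in RULES:
            m = pattern.match(rest)
            if m:
                n_tokens = m.group(0).count(" ")  # one space per matched tag
                if n_tokens > best_len:
                    best_label, best_len = label, n_tokens
        if best_label is None:
            # No rule starts here: leave the token outside any chunk.
            chunks.append(("O", [tagged[i][0]]))
            i += 1
        else:
            chunks.append((best_label, [w for w, _ in tagged[i:i + best_len]]))
            i += best_len
    return chunks


# Two hypothetical taggings of the slide's sentence.
noun_reading = [("Time", "Noun"), ("flies", "Noun"), ("like", "Prep"),
                ("an", "Det"), ("arrow", "Noun")]
verb_reading = [("Time", "Noun"), ("flies", "Verb"), ("like", "Prep"),
                ("an", "Det"), ("arrow", "Noun")]
print(chunk(noun_reading))
print(chunk(verb_reading))
```

If "flies" is tagged as a noun, the longest match groups "Time flies" into a single NP, which is one way to read the question on the slide: the greedy matcher takes the longest chunk available, whether or not it is the intended analysis.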
![Page 96: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/96.jpg)
Cascading FSTs
Richer partial parsing
Pass output of FST to next FST
Approach:
  First stage: Base phrase chunking
  Next stage: Larger constituents (e.g. PPs, VPs)
  Highest stage: Sentences
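The staged approach can be sketched as a sequence of rewriting passes, each consuming the output of the previous one. This is an illustration only: plain list matching stands in for real finite-state transducers, and the functions `apply_stage` and `cascade` and the rules in `STAGES` are hypothetical, chosen to fit the running example sentence.

```python
def apply_stage(nodes, rules):
    """One left-to-right pass: replace the longest matching label sequence
    at each position with a new node.

    nodes: list of (label, children) tuples
    rules: list of (new_label, [label, ...]) fixed-length patterns
    """
    out = []
    i = 0
    while i < len(nodes):
        best = None
        for new_label, pattern in rules:
            window = [label for label, _ in nodes[i:i + len(pattern)]]
            if window == pattern and (best is None or len(pattern) > len(best[1])):
                best = (new_label, pattern)
        if best is None:
            out.append(nodes[i])  # nothing matches: pass the node through
            i += 1
        else:
            new_label, pattern = best
            out.append((new_label, nodes[i:i + len(pattern)]))
            i += len(pattern)
    return out


def cascade(nodes, stages):
    """Feed the output of each stage into the next."""
    for rules in stages:
        nodes = apply_stage(nodes, rules)
    return nodes


# Hypothetical rules for the three stages named on the slide.
STAGES = [
    [("NP", ["DT", "JJ", "N"]), ("NP", ["NNP"]), ("VP", ["AUX", "V"])],  # base phrases
    [("PP", ["PREP", "NP"])],                                            # larger constituents
    [("S", ["NP", "PP", "VP"])],                                         # sentences
]

tokens = [("DT", "The"), ("JJ", "morning"), ("N", "flight"), ("PREP", "from"),
          ("NNP", "Denver"), ("AUX", "has"), ("V", "arrived")]

after_stage_1 = apply_stage(tokens, STAGES[0])
print([label for label, _ in after_stage_1])
print([label for label, _ in cascade(tokens, STAGES)])
```

No stage refers to its own output label, so each pass stays non-recursive; the richer structure comes from stacking passes.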
![Page 97: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/97.jpg)
Example
![Page 103: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/103.jpg)
Chunking by Classification
Model chunking as task similar to POS tagging
Instance: tokens
Labels: Simultaneously encode segmentation & identification
  IOB tagging (or BIO) (also BIOE or BIOSE)
  Segment: B(eginning), I(nternal), O(utside)
  Identity: Phrase category: NP, VP, PP, etc.
The   morning  flight  from  Denver  has   arrived
NP-B  NP-I     NP-I    PP-B  NP-B    VP-B  VP-I
NP-B  NP-I     NP-I          NP-B
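A short sketch of the encoding in both directions may help: from bracketed chunks to per-token tags, and from tags back to spans. The function names are hypothetical, the tag spelling (`NP-B`, `NP-I`) follows the slide, and using `O` for the untagged positions in the NP-only row is an assumption.

```python
def chunks_to_tags(chunks):
    """chunks: list of (category, [words]); category None means outside."""
    tags = []
    for category, words in chunks:
        for position, _ in enumerate(words):
            if category is None:
                tags.append("O")
            elif position == 0:
                tags.append(category + "-B")
            else:
                tags.append(category + "-I")
    return tags


def tags_to_spans(tags):
    """Recover (category, start, end) spans, end exclusive, from the tags."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        category, _, segment = tag.rpartition("-")
        continues = segment == "I" and category == current
        if current is not None and not continues:
            spans.append((current, start, i))
            start, current = None, None
        # A stray I tag with no open chunk is treated leniently as a beginning.
        if segment == "B" or (segment == "I" and current is None):
            start, current = i, category
    return spans


chunks = [("NP", ["The", "morning", "flight"]), ("PP", ["from"]),
          ("NP", ["Denver"]), ("VP", ["has", "arrived"])]
all_tags = chunks_to_tags(chunks)
print(all_tags)
print(tags_to_spans(all_tags))

# NP-only tagging, with O assumed for tokens outside any NP.
np_tags = ["NP-B", "NP-I", "NP-I", "O", "NP-B", "O", "O"]
print(tags_to_spans(np_tags))
```

Because each tag carries both the boundary (B/I/O) and the category, a per-token classifier recovers segmentation and labeling in one pass.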
![Page 108: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/108.jpg)
Features for Chunking
What are good features?
Preceding tags
  for 2 preceding words
Words
  for 2 preceding, current, 2 following
Parts of speech
  for 2 preceding, current, 2 following
Vector includes those features + true label
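The feature set listed above can be sketched as a function that builds one training instance per token. The feature names (`w[-1]`, `pos[+1]`, `chunk[-2]`), the `<PAD>` symbol, and the function itself are assumptions for illustration. At training time the preceding chunk tags come from the gold standard; at test time they would have to come from the tagger's own earlier decisions.

```python
def extract_features(words, pos_tags, chunk_tags, i):
    """Features for token i: words and POS tags in a +/-2 window,
    plus the chunk tags of the 2 preceding tokens."""
    def at(seq, j):
        # Positions before the start or past the end get a padding symbol.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    feats = {}
    for offset in range(-2, 3):
        feats["w[%+d]" % offset] = at(words, i + offset)
        feats["pos[%+d]" % offset] = at(pos_tags, i + offset)
    for offset in (-2, -1):
        feats["chunk[%+d]" % offset] = at(chunk_tags, i + offset)
    return feats


words = ["The", "morning", "flight", "from", "Denver", "has", "arrived"]
pos_tags = ["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]
chunk_tags = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]

# One (feature dict, true label) pair per token.
instances = [(extract_features(words, pos_tags, chunk_tags, i), chunk_tags[i])
             for i in range(len(words))]

feats, label = instances[3]  # the token "from"
print(label, feats)
```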
![Page 109: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/109.jpg)
Chunking as Classification
Example
![Page 110: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/110.jpg)
Evaluation
System: output of automatic tagging
Gold Standard: true tags
  Typically extracted from parsed treebank
Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F_β = (β² + 1)PR / (β²P + R)
F1 balances precision & recall: F1 = 2PR / (P + R)
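The chunk-level metrics above can be computed directly from two sets of spans. In this sketch a system chunk counts as correct only if its category and both boundaries match a gold chunk exactly; that matching criterion, the `(category, start, end)` span representation, and the sample system output are assumptions, since the slide does not spell them out.

```python
def evaluate(gold_chunks, system_chunks):
    """Chunk-level precision, recall, and F1 over (category, start, end) spans."""
    gold, system = set(gold_chunks), set(system_chunks)
    correct = len(gold & system)  # exact match on category and both boundaries
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [("NP", 0, 3), ("PP", 3, 4), ("NP", 4, 5), ("VP", 5, 7)]
# Hypothetical system output that splits the VP into two chunks.
system = [("NP", 0, 3), ("PP", 3, 4), ("NP", 4, 5), ("VP", 5, 6), ("VP", 6, 7)]

precision, recall, f1 = evaluate(gold, system)
print(precision, recall, f1)
```

Here 3 of the 5 system chunks are correct (precision 0.6) and 3 of the 4 gold chunks are found (recall 0.75). The split VP is penalized twice: it adds two wrong system chunks and misses one gold chunk.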
![Page 114: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/114.jpg)
State-of-the-Art
Base NP chunking: 0.96
Complex phrases:
  Learning: 0.92-0.94
    Most learners achieve similar results
  Rule-based: 0.85-0.92
Limiting factors:
  POS tagging accuracy
  Inconsistent labeling (parse tree extraction)
  Conjunctions
    Late departures and arrivals are common in winter
    Late departures and cancellations are common in winter
![Page 115: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/115.jpg)
HW #9
![Page 116: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/116.jpg)
Building a MaxEnt POS Tagger
Q1: Build feature vector representations for POS tagging in SVMlight format
maxent_features.* training_file testing_file rare_wd_threshold rare_feat_threshold outdir
training_file, testing_file: like HW#7
  w1/t1 w2/t2 … wn/tn
Filter rare words and infrequent features
Store vectors & intermediate representations in outdir
![Page 117: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/117.jpg)
Feature Representations
Features:
  Ratnaparkhi, 1996, Table 1 (duplicated in MaxEnt slides)
Character issues:
  Replace “,” with “comma”
  Replace “:” with “colon”
  Mallet and svmlight format use these as delimiters
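The character replacement can be sketched as a small helper applied to every label and feature value before a line is written. The exact output format expected by the assignment is not given here, so the `label name=value:1` layout, the feature names `curW` and `prevT`, and both functions are assumptions for illustration only.

```python
def sanitize(value):
    """Replace the characters that the file format uses as delimiters."""
    return value.replace(",", "comma").replace(":", "colon")


def format_instance(label, features):
    """Build one line: the label, then name=value:1 for each active feature.

    features: list of (name, value) string pairs; names are assumed to be
    delimiter-free already, so only labels and values are sanitized.
    """
    parts = [sanitize(label)]
    for name, value in features:
        parts.append("%s=%s:1" % (name, sanitize(value)))
    return " ".join(parts)


# A comma token tagged "," and a colon token tagged ":".
line_comma = format_instance(",", [("curW", ","), ("prevT", "NN")])
line_colon = format_instance(":", [("curW", ":"), ("prevT", "NN")])
print(line_comma)
print(line_colon)
```

After sanitizing, the only colons left in a line are the ones separating a feature from its value, so a reader of the file can split on them safely.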
![Page 118: Sequence Classification: Chunking & NER Shallow Processing Techniques for NLP Ling570 November 23, 2011](https://reader037.vdocument.in/reader037/viewer/2022110322/56649d2e5503460f94a05589/html5/thumbnails/118.jpg)
Q2: Experiments
Run MaxEnt classification using your training and test files
Compare effects of different thresholds on feature count, accuracy, and runtime
Note: Big files
  This assignment will produce even larger sets of results than HW#8.
  Please gzip your tar files.
  If the DropBox won’t accept the files, you can store the files on patas. Just let Sanghoun know where to find them.