chunking: shallow parsing
School of Computing, Faculty of Engineering. Chunking: Shallow Parsing. Eric Atwell, Language Research Group. Shallow Parsing. Break text up into non-overlapping contiguous subsets of tokens. Also called chunking, partial parsing, light parsing. What is it useful for? – semantic patterns
School of Computing, Faculty of Engineering
Chunking: Shallow Parsing
Eric Atwell, Language Research Group
Shallow Parsing
Break text up into non-overlapping contiguous subsets of tokens.
• Also called chunking, partial parsing, light parsing.
What is it useful for? – semantic patterns
• Finding key “meaning-elements”: Named Entity Recognition
• people, locations, organizations
• Studying linguistic patterns, e.g. semantic patterns of verbs
• gave NP
• gave up NP in NP
• gave NP NP
• gave NP to NP
• Can ignore complex structure when not relevant
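Once a sentence has been chunked, verb patterns such as "gave NP to NP" can be read directly off the flat chunk sequence. The sketch below is only an illustration: the representation (a list of plain words and `("NP", [words])` tuples), the function name `verb_frame`, and the example sentences are all assumptions made here, not part of the original slides.

```python
def verb_frame(chunked, verb_forms):
    """Return the pattern that follows the first occurrence of the verb.

    `chunked` is a list whose items are either plain word strings or
    ("NP", [words]) tuples. This representation is hypothetical and
    chosen only to keep the example small.
    """
    for i, item in enumerate(chunked):
        if isinstance(item, str) and item.lower() in verb_forms:
            frame = [item.lower()]
            for nxt in chunked[i + 1:]:
                # Replace each chunk by its label; keep other words as-is.
                frame.append(nxt[0] if isinstance(nxt, tuple) else nxt.lower())
            return " ".join(frame)
    return None


s1 = [("NP", ["She"]), "gave", ("NP", ["the", "book"]), "to", ("NP", ["Mary"])]
s2 = [("NP", ["She"]), "gave", ("NP", ["Mary"]), ("NP", ["the", "book"])]
print(verb_frame(s1, {"gave"}))  # gave NP to NP
print(verb_frame(s2, {"gave"}))  # gave NP NP
```

The internal structure of each NP is ignored, which is the point of the last bullet: the pattern only needs to know where the chunks are.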
A Relationship between Segmenting and Labeling
Tokenization segments the text
Tagging labels the text
Shallow parsing does both simultaneously.
Chunking vs. Full Syntactic Parsing
“G.K. Chesterton, author of The Man who was Thursday”
Representations for Chunks
IOB tags
• Inside, outside, and begin
• In English, the start of a phrase is often marked by a function-word
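As a rough illustration of the IOB encoding, the sketch below turns chunk spans into one tag per token. It assumes the common `B-NP` / `I-NP` / `O` spelling of the tags; the function name `spans_to_iob` and the example sentence are made up for this illustration.

```python
def spans_to_iob(n_tokens, spans):
    """Convert chunk spans to IOB tags.

    `spans` is a list of (start, end, label) with `end` exclusive.
    Spans must not overlap, as required of chunks.
    """
    tags = ["O"] * n_tokens              # outside any chunk by default
    for start, end, label in spans:
        tags[start] = "B-" + label       # first token begins the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label       # remaining tokens are inside it
    return tags


words = ["the", "little", "dog", "barked", "at", "the", "cat"]
spans = [(0, 3, "NP"), (5, 7, "NP")]
for word, tag in zip(words, spans_to_iob(len(words), spans)):
    print(word, tag)
```

In this example both chunks start with the function word "the", which matches the observation above about English phrase starts.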
Representations for Chunks
Trees
• Chunk structure is a two-level tree that spans the entire text, containing both chunks and non-chunks
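The two representations carry the same information, so one can be converted into the other. The following sketch builds a two-level tree from IOB-tagged words; the nested tuple/list layout and the name `iob_to_tree` are assumptions for illustration, not a library format.

```python
def iob_to_tree(tagged):
    """Build a two-level tree from (word, iob_tag) pairs.

    The result is ("S", children). Each child is either a bare word
    (a non-chunk token) or a (label, [words]) chunk node.
    """
    children = []
    for word, tag in tagged:
        if tag == "O":
            children.append(word)
            continue
        label = tag[2:]                      # strip the "B-" or "I-" prefix
        last = children[-1] if children else None
        continues = (tag.startswith("I-")
                     and isinstance(last, tuple)
                     and last[0] == label)
        if continues:
            last[1].append(word)             # extend the open chunk
        else:
            children.append((label, [word])) # start a new chunk
    return ("S", children)


tagged = [("the", "B-NP"), ("dog", "I-NP"), ("barked", "O"),
          ("at", "O"), ("the", "B-NP"), ("cat", "I-NP")]
print(iob_to_tree(tagged))
```

The root spans the entire text, and its children are a mix of chunks and non-chunk tokens, so the tree never gets deeper than two levels.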
CoNLL Corpus: training data for Machine Learning of chunking
From the shared task (competition) of the Conference on Computational Natural Language Learning (CoNLL), 2000
Goal: create machine learning methods to improve on the chunking task
CoNLL Corpus
Data in IOB format from the Wall Street Journal (WSJ):
• Word POS-tag IOB-tag
• Training set: 8936 sentences
• Test set: 2012 sentences
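A small reader for the "Word POS-tag IOB-tag" layout might look like the sketch below. It assumes one token per line, whitespace-separated fields, and a blank line between sentences; the sample lines are illustrative text written in the corpus style, not an extract to be relied on, and `read_conll` is a name invented here.

```python
def read_conll(text):
    """Parse 'word POS chunk' lines into a list of sentences.

    Each sentence is a list of (word, pos_tag, iob_tag) triples.
    Blank lines are taken to separate sentences (an assumption
    about the file layout).
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk = line.split()
        current.append((word, pos, chunk))
    if current:                      # last sentence without trailing blank
        sentences.append(current)
    return sentences


sample = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP

It PRP B-NP
fell VBD B-VP
"""
sentences = read_conll(sample)
print(len(sentences), "sentences;", len(sentences[0]), "tokens in the first")
```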
Tags from the Brill tagger
• Penn Treebank Tags
Evaluation measure: F-score
• F = 2 × precision × recall / (precision + recall)
• Baseline: select the chunk tag that is most frequently associated with the POS tag, F = 77.07
• Best score in the contest was F = 94.13
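The evaluation measure and the baseline can both be sketched in a few lines. This is an illustration under assumptions, not the official scoring script: chunks are compared as exact `(start, end, label)` triples, the function names are invented here, and the toy data is made up. Note that the function returns F as a fraction, whereas the scores above are quoted as percentages.

```python
from collections import Counter, defaultdict


def f_score(gold_chunks, predicted_chunks):
    """F = 2 * precision * recall / (precision + recall) over chunks.

    A predicted chunk counts as correct only if the same
    (start, end, label) triple appears in the gold standard.
    """
    gold, pred = set(gold_chunks), set(predicted_chunks)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def train_baseline(training_tokens):
    """Map each POS tag to the chunk tag it most often occurs with."""
    counts = defaultdict(Counter)
    for pos, chunk_tag in training_tokens:
        counts[pos][chunk_tag] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


gold = {(0, 2, "NP"), (3, 5, "NP"), (5, 6, "VP")}
pred = {(0, 2, "NP"), (3, 4, "NP")}
print(round(f_score(gold, pred), 2))   # precision 1/2, recall 1/3 -> 0.4

training = [("DT", "B-NP"), ("NN", "I-NP"), ("NN", "I-NP"),
            ("NN", "B-NP"), ("VBD", "B-VP")]
print(train_baseline(training))
```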
Chunking with Regular Expressions
This time we write regexes over TAGS rather than characters
• <DT><JJ>?<NN>
• <NN.*>
• <JJ|NN>+
Compile them with parse.ChunkRule()
• rule = parse.ChunkRule('<DT|NN>+')
• chunkparser = parse.RegexpChunk([rule], chunk_node = 'NP')
Resulting object is a (sort-of) parse tree
• Top-level node called S
• Chunks are labelled NP
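To show what "regexes over tags" means mechanically, here is a dependency-free sketch that translates a tag pattern into an ordinary character regex and runs it over the sentence's tag string. It is not the toolkit's implementation: `tag_pattern_to_regex`, `regex_chunk`, and the example sentence are assumptions made for illustration. Current NLTK releases offer comparable functionality through classes such as `nltk.RegexpParser`.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern such as '<DT><JJ>?<NN>' into a regex.

    Each <...> becomes one group matching exactly one tag, and '.' is
    restricted so that it cannot run across a tag boundary.
    """
    regex = tag_pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    regex = regex.replace(".", "[^<>]")
    return re.compile(regex)


def regex_chunk(tagged, tag_pattern, label="NP"):
    """Chunk a list of (word, tag) pairs; returns ("S", children).

    Children are (word, tag) tokens or (label, [tokens]) chunk nodes.
    """
    regex = tag_pattern_to_regex(tag_pattern)
    tag_string = "".join("<%s>" % pos for _, pos in tagged)
    # Map character offsets in tag_string back to token indices.
    offset_to_index, offset = {}, 0
    for i, (_, pos) in enumerate(tagged):
        offset_to_index[offset] = i
        offset += len(pos) + 2
    offset_to_index[offset] = len(tagged)

    children, position = [], 0
    for m in regex.finditer(tag_string):
        if m.end() == m.start():
            continue                          # ignore empty matches
        start, end = offset_to_index[m.start()], offset_to_index[m.end()]
        children.extend(tagged[position:start])
        children.append((label, tagged[start:end]))
        position = end
    children.extend(tagged[position:])
    return ("S", children)


sentence = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
            ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(regex_chunk(sentence, "<DT><JJ>?<NN>"))
```

The result has the shape described above: a top-level node S whose children are NP chunks and unchunked tokens.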
Chunking with Regular Expressions
Rule application is sensitive to order
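A small experiment makes the order sensitivity concrete. The sketch assumes a cascade in which each rule may only chunk tokens that no earlier rule has claimed; `apply_rules`, `bracket`, and the three-word example are invented for this illustration and simplify what a real chunk parser does.

```python
import re


def apply_rules(tags, rules):
    """Apply tag-pattern rules in order; earlier chunks block later rules.

    Returns a list giving, for each token, the number of the chunk it
    belongs to, or None if it is unchunked.
    """
    chunk_of = [None] * len(tags)
    n_chunks = 0
    for pattern in rules:
        regex = re.compile(pattern.replace("<", "(?:<(?:")
                                  .replace(">", ")>)")
                                  .replace(".", "[^<>]"))
        # Tokens already in a chunk are masked with '#' so they cannot match.
        pieces = ["<%s>" % t if chunk_of[i] is None else "#" * (len(t) + 2)
                  for i, t in enumerate(tags)]
        starts, offset = {}, 0
        for i, piece in enumerate(pieces):
            starts[offset] = i
            offset += len(piece)
        starts[offset] = len(tags)
        for m in regex.finditer("".join(pieces)):
            if m.end() > m.start():
                for i in range(starts[m.start()], starts[m.end()]):
                    chunk_of[i] = n_chunks
                n_chunks += 1
    return chunk_of


def bracket(words, chunk_of):
    """Render the chunking with square brackets around each chunk."""
    out = []
    for i, word in enumerate(words):
        inside = chunk_of[i] is not None
        opens = inside and (i == 0 or chunk_of[i - 1] != chunk_of[i])
        closes = inside and (i + 1 == len(words) or chunk_of[i + 1] != chunk_of[i])
        out.append(("[" if opens else "") + word + ("]" if closes else ""))
    return " ".join(out)


words, tags = ["the", "cat", "food"], ["DT", "NN", "NN"]
print(bracket(words, apply_rules(tags, ["<DT><NN>", "<NN>+"])))  # [the cat] [food]
print(bracket(words, apply_rules(tags, ["<NN>+", "<DT><NN>"])))  # the [cat food]
```

The same two rules give different chunkings depending on which runs first, because the first rule removes tokens that the second would otherwise have used.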
Chinking
Specify what does not go into a chunk.
• Similar to defining punctuation as whatever is not alphanumeric or whitespace.
• Can be more difficult to think about.
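Chinking can be sketched as the mirror image of chunking: start with everything in one chunk, then cut out the tokens named as chinks. The function name `chink`, the choice of chink tags, and the example sentence below are assumptions for illustration.

```python
def chink(tagged, chink_tags):
    """Put everything in one chunk, then remove the chinks.

    Tokens whose tag is in `chink_tags` are excluded; the stretches
    that remain between them form the chunks.
    """
    chunks, current = [], []
    for word, pos in tagged:
        if pos in chink_tags:
            if current:                 # a chink closes the open chunk
                chunks.append(current)
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks


sentence = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
            ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chink(sentence, {"VBD", "IN"}))
# [['the', 'little', 'dog'], ['the', 'cat']]
```

Only two tags had to be listed here, whereas a chunk rule for the same result would need to describe determiners, adjectives and nouns.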
Simple chink-chunk approach: function vs. content word classes
Regular expressions for chunks and chinks CAN get complex
BUT the whole point is to be simpler than full parsing!
SO: use a simple model which works “reasonably well”
(then tidy up afterwards…)
Chunk = nominal content-word (noun)
Chink = others (verb, pronoun, determiner, preposition, conjunction) (+adjective, adverb as a borderline category)
Example
Fruit flies like a banana
fruit\N flies\N like\V a\A banana\N
[fruit flies] like a [banana]
[S [NP fruit\N flies\N NP]
   [VP like\V
       [NP a\A banana\N NP]
   VP]
S]
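The flat bracketing in this example (not the full tree) can be reproduced with a few lines of code. The sketch assumes the slide's `word\TAG` notation with `N` as the only chunk tag; the function name `chink_chunk` is invented here.

```python
def chink_chunk(tagged_text, chunk_tags=("N",)):
    """Bracket maximal runs of chunk-tagged words; all else is chink.

    `tagged_text` uses the word\\TAG notation, e.g. r"fruit\\N flies\\N".
    """
    tokens = [t.rsplit("\\", 1) for t in tagged_text.split()]
    out, in_chunk = [], False
    for i, (word, tag) in enumerate(tokens):
        is_chunk = tag in chunk_tags
        next_is_chunk = i + 1 < len(tokens) and tokens[i + 1][1] in chunk_tags
        prefix = "[" if is_chunk and not in_chunk else ""
        suffix = "]" if is_chunk and not next_is_chunk else ""
        out.append(prefix + word + suffix)
        in_chunk = is_chunk
    return " ".join(out)


print(chink_chunk(r"fruit\N flies\N like\V a\A banana\N"))
# [fruit flies] like a [banana]
```

Note that the determiner "a" stays outside the chunk under this simple model, which is the kind of detail left for tidying up afterwards.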
An alternative parse
This sentence is grammatically ambiguous:
Fruit flies like a banana
fruit\N flies\N like\V a\A banana\N  →  [fruit flies] like a [banana]
fruit\N flies\V like\I a\A banana\N  →  [fruit] flies like a [banana]
cf. “bank robbers like a chase” vs. “bread bakes in an oven”
[S [NP fruit\N NP]
   [VP flies\V
       [PP like\I [NP a\A banana\N NP] PP]
   VP]
S]
Ambiguity leads to more rules
fruit\N flies\N like\V a\A banana\N  →  [fruit flies] like a [banana]
fruit\N flies\V like\I a\A banana\N  →  [fruit] flies like a [banana]
BUT what about “Time flies like an arrow”? Here “time” can be time\N or time\V:
time\N flies\N like\V an\A arrow\N  →  [time flies] like an [arrow]
time\N flies\V like\I an\A arrow\N  →  [time] flies like an [arrow]
time\V flies\N like\I an\A arrow\N  →  time [flies] like an [arrow]
The 3rd PoS-tagging gives an ambiguous parse
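To see how quickly tag ambiguity multiplies, the sketch below enumerates every tagging licensed by a toy lexicon and chunks each one with the noun-only chink-chunk model. The lexicon is hypothetical and deliberately tiny, and it generates more taggings than the plausible ones listed above; the names `LEXICON`, `chunk_nouns` and `all_readings` are invented for this illustration.

```python
from itertools import groupby, product

# Hypothetical toy lexicon: each word lists its possible PoS tags.
LEXICON = {
    "time": ["N", "V"],
    "flies": ["N", "V"],
    "like": ["V", "I"],
    "an": ["A"],
    "arrow": ["N"],
}


def chunk_nouns(tagged):
    """Bracket maximal runs of N-tagged words; everything else is chink."""
    parts = []
    for is_noun, group in groupby(tagged, key=lambda wt: wt[1] == "N"):
        words = " ".join(w for w, _ in group)
        parts.append("[" + words + "]" if is_noun else words)
    return " ".join(parts)


def all_readings(sentence):
    """Yield (tagging, chunking) for every combination of lexicon tags."""
    words = sentence.lower().split()
    for tags in product(*(LEXICON[w] for w in words)):
        tagged = list(zip(words, tags))
        yield tagged, chunk_nouns(tagged)


readings = list(all_readings("Time flies like an arrow"))
print(len(readings), "taggings")
for chunking in sorted({c for _, c in readings}):
    print(chunking)
```

A tagger has to pick among these candidates before chunking, and different choices lead to different chunkings, which is why ambiguity pushes the rule set to grow.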
Chunking can predict prosodic breaks
http://www.acm.org/crossroads/
An Approach for Detecting Prosodic Phrase Boundaries in Spoken English by Claire Brierley and Eric Atwell
Summary
Shallow parsing is useful for:
Entity recognition
• people, locations, organizations
Studying linguistic patterns
• gave NP
• gave up NP in NP
• gave NP NP
• gave NP to NP
Prosodic phrase breaks – pauses in speech
Can ignore complex structure when not relevant
Chink-chunk approach: “quick-and-dirty” chunking based on content vs. function PoS
Chink-chunk parsing is simpler than context-free grammar parsing!