Natural Language Processing: Data, Algorithms, and Knowledge
BEARS 2011. Dan Klein, Computer Science Division, University of California, Berkeley.
![Page 1: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/1.jpg)
Natural Language Processing: Data, Algorithms, and Knowledge
BEARS 2011
Dan Klein
Computer Science Division
University of California, Berkeley
![Page 2: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/2.jpg)
Language Technologies
Goal: Deep Understanding
Requires context, linguistic structure, meanings…
Reality: Shallow Matching
Requires robustness and scale
Amazing successes, but fundamental limitations
![Page 3: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/3.jpg)
Large-Scale NLP: Watson
![Page 4: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/4.jpg)
Factoids and Limitations
![Page 5: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/5.jpg)
Text Data is Superficial
An iceberg is a large piece of freshwater ice that has broken off from a snow-formed glacier or ice shelf and is floating in open water.
![Page 6: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/6.jpg)
… But Language is Complex
Semantic structures
References and entities
Discourse-level connectives
Meanings and implicatures
Contextual factors
Perceptual grounding
…
An iceberg is a large piece of freshwater ice that has broken off from a snow-formed glacier or ice shelf and is floating in open water.
![Page 7: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/7.jpg)
More Data: Machine Translation
SOURCE: Cela constituerait une solution transitoire qui permettrait de conduire à terme à une charte à valeur contraignante.
HUMAN: That would be an interim solution which would make it possible to work towards a binding charter in the long term.
1x DATA: [this] [constituerait] [assistance] [transitoire] [who] [permettrait] [licences] [to] [terme] [to] [a] [charter] [to] [value] [contraignante] [.]
10x DATA: [it] [would] [a solution] [transitional] [which] [would] [of] [lead] [to] [term] [to a] [charter] [to] [value] [binding] [.]
100x DATA: [this] [would be] [a transitional solution] [which would] [lead to] [a charter] [legally binding] [.]
1000x DATA: [that would be] [a transitional solution] [which would] [eventually lead to] [a binding charter] [.]
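The bracketed outputs on this slide suggest translation by phrase lookup, where more training data yields longer and better phrase pairs. The slide does not say which system produced them, so the following is only a minimal sketch under that assumption: a greedy, left-to-right, longest-match lookup with no reordering and no language model. The function `translate` and both phrase tables are hypothetical and hand-written for illustration; they are not learned from any corpus.

```python
def translate(tokens, phrase_table, max_len=4):
    """Greedy longest-match phrase lookup (illustrative only)."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest source phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            src = " ".join(tokens[i:i + n])
            if src in phrase_table:
                out.append(phrase_table[src])
                i += n
                break
        else:
            # No entry covers this word: pass it through untranslated.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical "small data" table: only a few single-word entries.
small_table = {"cela": "this", "une": "a", "solution": "solution"}

# Hypothetical "large data" table: the same entries plus multi-word phrases.
large_table = {
    **small_table,
    "cela constituerait": "that would be",
    "une solution transitoire": "a transitional solution",
}

tokens = "cela constituerait une solution transitoire".split()
print(translate(tokens, small_table))
print(translate(tokens, large_table))
```

With the small table, unknown words such as "constituerait" leak through untranslated, much like the 1x DATA row. With the larger table the same input is covered by two fluent multi-word phrases, as in the 1000x DATA row.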
![Page 8: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/8.jpg)
Data By Itself Isn’t Enough!
![Page 9: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/9.jpg)
Analysis and Alignment
[Burkett, Blitzer, and Klein 10]
![Page 10: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/10.jpg)
Data and Knowledge
Classic knowledge representation worry: How will a machine ever know that…
Ice is frozen water?
Beige looks like this:
Chairs are solid?
Answers:
1980: write it all down
2000: get by without it
2020: learn it from data
![Page 11: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/11.jpg)
Deeper Linguistic Analysis
Hurricane Emily howled toward Mexico's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters.
Accuracy: 90+ [Petrov and Klein 09]
![Page 12: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/12.jpg)
Learning Hidden Syntax
Personal Pronouns (PRP)
PRP-1: it, them, him
PRP-2: it, he, they
PRP-3: It, He, I
Proper Nouns (NNP)
NNP-14: Oct., Nov., Sept.
NNP-12: John, Robert, James
NNP-2: J., E., L.
NNP-1: Bush, Noriega, Peters
NNP-15: New, San, Wall
NNP-3: York, Francisco, Street
Parsing Accuracy: 90.5+ [Petrov and Klein 09]
![Page 13: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/13.jpg)
Data and Knowledge: Parsing
They considered running the ad during the Super Bowl.
considered it during: 112
running it during: 239
running * during: 3k
considered * during: 2k
[Bansal and Klein 11]
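The counts on this slide can be read as evidence about where "during the Super Bowl" attaches: to "considered" or to "running". How the cited parser actually consumes such counts is not stated on the slide, so the sketch below is only an illustration: for each pattern it picks the candidate head verb with the larger count and tallies the wins. The function `attachment_votes` is a hypothetical name, and "3k" and "2k" are treated as 3000 and 2000, which is an assumption about the rounding.

```python
# Counts copied from the slide; "3k"/"2k" are assumed to mean roughly 3000/2000.
SLIDE_COUNTS = {
    ("considered", "it", "during"): 112,
    ("running", "it", "during"): 239,
    ("running", "*", "during"): 3000,
    ("considered", "*", "during"): 2000,
}


def attachment_votes(heads, prep, counts):
    """For each middle pattern ("it", "*"), vote for the head with the larger count."""
    votes = {h: 0 for h in heads}
    # Collect the middle patterns that were observed with this preposition.
    fillers = sorted({f for (_, f, p) in counts if p == prep})
    for filler in fillers:
        best = max(heads, key=lambda h: counts.get((h, filler, prep), 0))
        votes[best] += 1
    return votes


votes = attachment_votes(["considered", "running"], "during", SLIDE_COUNTS)
print(votes)
print("preferred head:", max(votes, key=votes.get))
```

With the slide's numbers, both patterns favor "running", which matches the intended reading that the ad runs during the Super Bowl. A real parser would more plausibly fold such counts into its scoring as features than take a hard vote.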
![Page 14: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/14.jpg)
Deeper Understanding: Reference
![Page 15: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/15.jpg)
Names vs. Entities
![Page 16: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/16.jpg)
Example Errors
![Page 17: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/17.jpg)
Discovering Knowledge
![Page 18: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/18.jpg)
Unsupervised Learning
![Page 19: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/19.jpg)
Coreference Systems
![Page 20: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/20.jpg)
Cross-Document Identity
![Page 21: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/21.jpg)
Cross-Document Summaries
Lindsay Lohan pleaded not guilty Wednesday to felony grand theft of a $2,500 necklace, a case that could return the troubled starlet to jail rather than the big screen. Saying it appeared that Lohan had violated her probation in a 2007 drunken driving case, the judge set bail at $40,000 and warned that if Lohan was accused of breaking the law while free he would have her held without bail. The Mean Girls star is due back in court on Feb. 23, an important hearing in which Lohan could opt to end the case early.
[Berg-Kirkpatrick, Gillick, and Klein 11]
![Page 22: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/22.jpg)
Grounded Language
[Golland, Liang, and Klein 10]
![Page 23: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/23.jpg)
Grounding with Natural Data
… on the beige loveseat.
![Page 24: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/24.jpg)
Predictions

| Today | 2020 (likely) | 2020 (hopefully) |
| --- | --- | --- |
| Find information | Synthesize information | Infer information |
| Keywords and names | Entities | Concepts |
| Knowledge-free “structural” systems | Knowledge from text | Knowledge from grounded contexts |
| “Talk” to search engines | Talk to embedded devices | Talk to mobile robots |

Superficial patterns Deep understanding Monologs dialogs
![Page 25: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/25.jpg)
Conclusion
Simple algorithms and large data have gotten us amazingly far!
To go further, we need:
Algorithms that work with deeper structure
Learning methods that turn data into knowledge
Systems that are contextualized
![Page 26: Natural Language Processing: Data, Algorithms, and Knowledge](https://reader031.vdocument.in/reader031/viewer/2022033101/56812f26550346895d94bc22/html5/thumbnails/26.jpg)
Thank you!