Intro to AI - Clarkson University (jsearlem/cs451/fa13/lectures/24.NLU.pdf)
11-20-2013
Natural Language Understanding
Read: AIMA Chapters 22 & 23
HW#8, due Monday, 11/25
What are some of the most impressive
technologies in futuristic Science Fiction?
e.g. consider Star Trek
One is the “Universal Translator”
Science behind Watson
Three key capabilities
Natural Language Understanding
Hypothesis Generation
Evidence-based Learning
NLP is a discipline that aims to build computer systems that will be able to analyze, understand and generate human speech.
NLP subareas of research are:
Speech Recognition (speech analysis),
Speech Synthesis (speech generation), and
Natural Language Understanding (NLU).
Putting meaning to the words
Input might be speech or could be typed in
Holy grail of Artificial Intelligence problems
Georgetown University:
“The spirit is willing but the flesh is weak.”
English to Russian
Russian to English
“The vodka is good but the meat has spoiled.”
Consider the following conversation between
Mary and Tom:
Tom: “Who do you like tonight, Boston or LA?”
Mary: “Lakers. You?”
Tom: “Come on Mary, LA can’t handle Bird.”
Mary: “I’ve got a five that says Magic will shut him down.”
Problem: English sentences are incomplete descriptions of the info they are intended to convey.
I called Linda to ask her to the movies. She said she’d love to go.
but…
speakers can be vague or precise; can leave out
details that the hearer is expected to know
Problem: The same expression means different things in different contexts.
Where’s the water?
but…
can communicate about an infinite world with a
finite number of symbols
Problem: New words, expressions and meanings evolve.
I’ll fax it to you.
In the 1600s, St. Paul’s cathedral was said to be “amusing, awful and artificial.”
“Selfie” named by Oxford dictionaries as word of the year 2013.
but…
languages can evolve as experiences change
Problem: There are a lot of ways of saying the same thing.
Mary was born on March 27th.
Mary’s birthday is March 27th.
but…
when you know a lot, facts imply each other
Speech recognition is the process of converting spoken language to written text or some similar form.
Speech synthesis is the process of converting the text into spoken language.
Natural Language Understanding (NLU) is the process of analyzing recognized words and transforming them into data that is meaningful to a computer.
In other words, an NLU system is a computer-based system that “understands” human language.
NLU is used in combination with speech recognition.
■ Three major issues involved in understanding language
A large amount of human knowledge is assumed
Language is pattern based: phonemes are components of words, and words make up phrases and sentences
Language acts are the product of agents, either human or computer
■ Terry Winograd’s SHRDLU (Winograd 1972)
Early AI programs made progress by restricting their focus to a microworld
SHRDLU could respond to English queries:
What is sitting on the red block?
What shape is the blue block on the table?
Place the green pyramid on the red brick.
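A microworld is small enough that both the world state and the accepted sentences can be written down explicitly. The sketch below is a toy illustration only, not Winograd's implementation: the `BlocksWorld` class, its two regular-expression sentence patterns, and the object names are all assumptions made for this example.

```python
import re


class BlocksWorld:
    """Toy microworld: each object rests on exactly one support."""

    def __init__(self, on):
        # on[obj] = the thing obj is resting on (another object or "table")
        self.on = dict(on)

    def objects_on(self, support):
        return sorted(obj for obj, s in self.on.items() if s == support)

    def place(self, obj, support):
        if obj not in self.on:
            return f"I don't know of a {obj}."
        if support != "table" and support not in self.on:
            return f"I don't know of a {support}."
        if self.objects_on(obj):
            return f"I can't move the {obj}: something is on it."
        self.on[obj] = support
        return "OK."

    def respond(self, sentence):
        s = sentence.lower().strip()
        # Pattern 1: a query about what rests on a named object.
        m = re.fullmatch(r"what is sitting on the (.+)\?", s)
        if m:
            found = self.objects_on(m.group(1))
            return ("The " + " and the ".join(found) + ".") if found else "Nothing."
        # Pattern 2: a command to move one object onto another.
        m = re.fullmatch(r"place the (.+) on the (.+)\.", s)
        if m:
            return self.place(m.group(1), m.group(2))
        return "I don't understand."


world = BlocksWorld({"red block": "table", "blue block": "table", "green pyramid": "table"})
print(world.respond("Place the green pyramid on the red block."))
print(world.respond("What is sitting on the red block?"))
```

Anything outside the two patterns gets "I don't understand.", which is the price of restricting the focus this tightly.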
Language is a complicated phenomenon, involving processes as varied as the recognition of sounds or printed letters, syntactic parsing, high-level semantic inferences, and even the communication of emotional content through rhythm and inflection.
To manage this complexity, linguists have defined different levels of analysis for natural language.
NLP Pipeline
speech → Phonetic/Phonological Analysis
text → OCR/Tokenization
Morphological analysis
Syntactic analysis
Semantic Interpretation
Discourse Processing
Phonology
Syntax
Semantics
Pragmatics & World Knowledge
Prosody: dealing with inflection, stress, pitch, timing
Phonology: examining sounds combined to form language, important for speech recognition and generation
Morphology: concerned with the morphemes that make up words, including the rules governing the formation of words. Important in determining the role of a word in a sentence in most of the world's languages.
Morphological anomaly: “The computer eated an apple.”
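A morphological anomaly such as “eated” comes from applying the regular past-tense rule to an irregular verb. A minimal sketch of that idea, assuming a tiny hand-made table of irregular verbs (the table and both function names are hypothetical, chosen for this example):

```python
# Hypothetical mini-lexicon of irregular past-tense forms.
IRREGULAR_PAST = {"eat": "ate", "go": "went", "think": "thought"}


def past_tense(stem):
    """Return the past tense, preferring the irregular form when one is listed."""
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    return stem + ("d" if stem.endswith("e") else "ed")


def is_overregularized(word):
    """True if word is an irregular stem with the regular -ed suffix attached."""
    return any(word == stem + "ed" for stem in IRREGULAR_PAST)


print(past_tense("eat"), past_tense("walk"))
print(is_overregularized("eated"), is_overregularized("walked"))
```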
Syntax: dealing with rules for combining words into legal phrases and sentences
Syntactic anomaly:
“The computer ate apple.”
“An the ate apple computer.”
Semantics: considers the meaning of words, phrases, and sentences, and the ways in which meaning is conveyed in natural language
Semantic anomaly: “The computer ate an apple.”
Pragmatics: dealing with ways in which language is used and its effects on the listener
“Do you know the time?”
Pragmatic anomaly: “Next year, all taxes will disappear.”
World knowledge: includes knowledge of the physical world and is essential to understanding the full meaning of a text
“The pen is in the box.” versus
“The box is in the pen.”
Lazy Contented Cats Sleep Peacefully
Colorless Green Ideas Sleep Furiously
Squad helps dog bite victim.
Helicopter powered by human flies.
I ate spaghetti with meatballs.
… with salad.
… with abandon.
… with a fork.
… with a friend.
Ambiguity can be lexical, syntactic, semantic, or referential
[Figure: three parse trees for the sentence below, built from the categories S, NP, VP, V, Art, N and PP, differing in where the prepositional phrases “in a park” and “with a telescope” attach]
John saw a boy in a park with a telescope.
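The attachment ambiguity in this sentence can be counted mechanically. A minimal sketch, assuming a toy grammar and lexicon invented for illustration (neither is taken from the lecture): a CKY-style chart that counts how many parse trees the grammar assigns to a sentence. The count depends entirely on the grammar chosen.

```python
from collections import defaultdict

# Assumed toy lexicon: word -> possible categories.
LEXICON = {
    "john": ["NP"], "saw": ["V"], "a": ["Art"],
    "boy": ["N"], "park": ["N"], "telescope": ["N"],
    "in": ["P"], "with": ["P"],
}

# Assumed toy grammar, binary rules only: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies the verb phrase
    ("NP", "Art", "N"),
    ("NP", "NP", "PP"),   # PP modifies a noun phrase
    ("PP", "P", "NP"),
]


def count_parses(words, start="S"):
    """Count distinct parse trees of `words` rooted in `start`."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, category) -> number of trees over words[i:j]
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[(i, i + 1, cat)] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in RULES:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, start)]


sentence = "john saw a boy in a park with a telescope".split()
print(count_parses(sentence))
```

Under this grammar the sentence has five parses, because each prepositional phrase may attach to the verb phrase or to any noun phrase to its left, as long as attachments do not cross.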
[Figure: three parse trees, one for each sentence below, each attaching the final prepositional phrase differently]
John saw a boy in a park with a telescope.
John saw a boy in a park with a dog.
John saw a boy in a park with a statue.
Identify all noun phrases that refer to the same entity
John Simon, Chief Financial Officer of Prime Corp. since 1986, saw his pay jump 20%, to $1.3 million, as the 37-year-old also became the financial-services company’s president...
Best results: F-measure of 70.4 (MUC-6) and 63.4 (MUC-7) [Ng & Cardie, 2002]
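The F-measure quoted here combines precision and recall into a single score as their harmonic mean. A minimal sketch; the precision and recall values in the usage line are made up for illustration and are not figures from [Ng & Cardie, 2002]:

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values on a 0-100 scale (not from the cited paper).
print(f_measure(75.0, 50.0))
```

The harmonic mean is pulled toward the lower of the two values, so a system cannot score well by maximizing recall alone.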
Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.
Stochastic and symbolic methods combine for real world applications.
Speech Processing
A Voice Interface
Some Applications
■ Information Retrieval: Web search (uni- or multi-lingual)
■ Query Answering/Dialogue, e.g., Chat-80
■ Report Generation: English/French weather report
■ Foreign Language Training: Spanish/Arabic tutorial systems for military linguists
■ Machine Translation: Babelfish on Yahoo
[Figure: voice interface architecture. Components: Speech Recognizer, Natural Language Understanding, Dialog Manager, Domain Knowledge, Natural Language Generator, Speech Synthesizer]
User: “I would like to fly to Seattle tomorrow.”
System: “When would you like to leave?”
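The text-level components named on this slide can be sketched as a chain of functions. This is a toy illustration under stated assumptions: speech recognition and synthesis are left out, and the slot names, regular expressions, and prompt table are all hypothetical stand-ins for real NLU, dialog management, and domain knowledge.

```python
import re

# Hypothetical domain knowledge: slots a flight booking needs, in asking order.
PROMPTS = {
    "destination": "Where would you like to go?",
    "date": "What day would you like to travel?",
    "time": "When would you like to leave?",
}


def understand(utterance):
    """Toy NLU: turn recognized text into a slot frame."""
    frame = {"destination": None, "date": None, "time": None}
    lowered = utterance.lower()
    match = re.search(r"\bto ([A-Z][a-z]+)", utterance)  # "to" + capitalized word
    if match:
        frame["destination"] = match.group(1)
    match = re.search(r"\b(today|tomorrow)\b", lowered)
    if match:
        frame["date"] = match.group(1)
    match = re.search(r"\bat (\d{1,2}(?::\d{2})? ?(?:am|pm))", lowered)
    if match:
        frame["time"] = match.group(1)
    return frame


def manage_dialog(frame):
    """Toy dialog manager: return the first slot still missing, or None."""
    for slot in PROMPTS:
        if frame[slot] is None:
            return slot
    return None


def generate(frame, missing_slot):
    """Toy language generator: ask for the missing slot or confirm the request."""
    if missing_slot is not None:
        return PROMPTS[missing_slot]
    return "Booking a flight to {destination} {date} at {time}.".format(**frame)


def respond(utterance):
    frame = understand(utterance)
    return generate(frame, manage_dialog(frame))


print(respond("I would like to fly to Seattle tomorrow."))
```

Because the utterance supplies a destination and a date but no departure time, the dialog manager selects the time slot and the generator produces the question shown on the slide.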
What is speech?
Vibrations of vocal cords create sound (“ahh”)
Mouth, throat, tongue, lips shape sound
English speech: 40 phonemes (24 consonants, 16 vowels)
Sounds transmit “language”
Speech does not equal written language
"I told him to go back where he came from, but he wouldn't listen."
Tell which person it is (voice print)
Could also be important for monitoring meetings, determining speaker
Primarily identifying words
Improving all the time
Commercial systems:
IBM ViaVoice, Dragon Dictate, ...
Speaker dependent/independent
Parametric patterns are sensitive to speaker
With training (dependent) can get better
Vocabulary
Some have 50,000+ words
Isolated word vs. continuous speech
Continuous: where words stop & begin
Typically a pattern match, no context used
Did you vs. Didja
Java Speech SDK
FreeTTS 1.1.1: http://freetts.sourceforge.net/docs/index.php
IBM JavaBeans for speech
Visual/Real Basic speech SDK
OS capabilities (speech recognition and synthesis built in to OS) (TextEdit)
VoiceXML
tool: automate the construction of NLP systems
avoid the need for large linguistic knowledge bases
portability: move to new domain quickly
reduce the need for expertise in computational linguistics
robustness: handle ungrammatical or unexpected text
missing domain knowledge
Statistical methods have transformed the field of NLP
Very good performance on increasing numbers/types of problems in NLP
Thus far, the most successful statistical and ML algorithms are supervised learning algorithms
Require large amounts of training data that has been annotated with the “correct” answers
Corpus annotation bottleneck
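What annotated training data buys can be seen in a very simple supervised learner. A minimal sketch, assuming a three-sentence hand-tagged corpus invented for this example and a most-frequent-tag baseline (the tag names `Art`, `N`, `V` and both function names are hypothetical):

```python
from collections import Counter, defaultdict

# Invented annotated corpus: each word is paired with its "correct" tag.
TRAINING_DATA = [
    [("the", "Art"), ("dog", "N"), ("saw", "V"), ("the", "Art"), ("cat", "N")],
    [("a", "Art"), ("saw", "N"), ("cut", "V"), ("the", "Art"), ("wood", "N")],
    [("john", "N"), ("saw", "V"), ("a", "Art"), ("boy", "N")],
]


def train(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the annotations."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="N"):
    """Tag each word with its learned tag; unseen words get the default tag."""
    return [(word, model.get(word.lower(), default)) for word in words]


model = train(TRAINING_DATA)
print(tag(["The", "boy", "saw", "a", "telescope"], model))
```

Every decision the tagger makes traces back to a human-supplied label, which is why the cost of annotation becomes the bottleneck.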
Japanese, Chinese, Thai, ...: no spaces between words
Combining simple statistics from unsegmented Japanese
newswire yields results rivaling grammar-based approaches.
[Ando & Lee 2000, 2003]
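To give a feel for how simple statistics can find word boundaries, here is a deliberately simplified sketch. It is not the algorithm of [Ando & Lee 2000, 2003]; it is an assumed heuristic loosely in that spirit, using only character bigram counts from unsegmented text. A boundary is placed where the bigrams on either side of a position are both more frequent than the bigram that straddles it. The corpus and input strings are artificial.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count every character bigram in an unsegmented corpus string."""
    return Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))


def segment(text, counts):
    """Split text where both neighboring bigrams outnumber the straddling one."""
    cuts = []
    for k in range(2, len(text) - 1):
        left = text[k - 2:k]          # bigram ending at the candidate boundary
        right = text[k:k + 2]         # bigram starting at the candidate boundary
        straddle = text[k - 1:k + 1]  # bigram crossing the candidate boundary
        if counts[left] > counts[straddle] and counts[right] > counts[straddle]:
            cuts.append(k)
    pieces, start = [], 0
    for k in cuts + [len(text)]:
        pieces.append(text[start:k])
        start = k
    return pieces


counts = bigram_counts("ababcdcdabcd")  # artificial "language" with words ab and cd
print(segment("abcdab", counts))
```

No dictionary or grammar is consulted; the only knowledge used is how often character pairs occur in raw text.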
Translating from one language to another is challenging even to human translators.
e.g. signs translated into English by a person:
Utmost of chicken with smashed pot. (restaurant in Greece)
Nervous meatballs (restaurant in Bulgaria)
The nuns harbor all diseases and have no respect for religion. (Swiss nunnery hospital)
All the water has been passed by the manager. (German hotel)
[Figure: interlingua-based machine translation]
Analysis (input): Morphological analysis → Syntactic analysis → Semantic Interpretation → Interlingua
Generation (output): Interlingua → Lexical selection → Syntactic realization → Morphological synthesis
Doesn’t work well enough yet
ACL 2013 8th Workshop on Statistical Machine Translation
MT Summit 2013
Machine Translation without the Translation, Chronicle of Higher Education
NLPCS 2013: 10th International Workshop on Natural Language Processing and Cognitive Science
Workshop on Natural Language Processing and Automated Reasoning