Lecture 2, 7/22/2005 Natural Language Processing 1
CS60057 Speech & Natural Language Processing
Autumn 2005
Lecture 2
22 July 2005
Today’s slides are adapted from Ilyas Cicekli’s slides
http://www.cs.ucf.edu/~ilyas/Courses/CAP6640
and from Martin & Jurafsky’s book.
Text Books
Daniel Jurafsky and James H. Martin, "Speech and Language Processing", Prentice Hall, 2000.
Other References
James Allen, "Natural Language Understanding", Second edition, The Benjamin/Cummings Publishing Company Inc., 1995.
Christopher D. Manning and Hinrich Schütze, "Foundations of Statistical Natural Language Processing", The MIT Press, 1999.
Why Should You Care?
Trends
1. An enormous amount of knowledge is now available in machine-readable form as natural language text.
2. Human-computer communication is becoming increasingly in vogue: dialog-based systems.
Applications include question answering, conversational agents, machine translation, information retrieval, summarization and fusion.
Ambiguity
I made her duck.
How many different interpretations does this sentence have?
What are the reasons for the ambiguity?
The categories of knowledge of language can be thought of as ambiguity-resolving components. How can each ambiguous piece be resolved?
Does speech input make the sentence even more ambiguous? Yes: the word boundaries must also be decided.
Ambiguity
Some interpretations of: I made her duck.
1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.
duck – morphologically and syntactically ambiguous: noun or verb.
her – syntactically ambiguous: dative or possessive.
make – semantically ambiguous: cook or create.
make – syntactically ambiguous:
Transitive – takes a direct object. => interpretation 2
Ditransitive – takes two objects. => interpretation 5
Takes a direct object and a verb. => interpretation 4
Why is NLP difficult? Because natural language is highly ambiguous.
Syntactic ambiguity: the sentence “The president spoke to the nation about the problem of drug use in the schools from one coast to the other.” has 720 parses. For example:
• “to the other” can attach to any of the preceding NPs (e.g. “the problem”) or to the head verb: 6 places
• “from one coast” has 5 places to attach
• …
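The 720 figure follows from the counting argument in the bullets: the six PPs after the verb have 1, 2, 3, 4, 5 and 6 possible attachment sites respectively, and 1 × 2 × 3 × 4 × 5 × 6 = 720. The Python sketch below reproduces that product. As an extra check it also counts only the combinations with no crossing attachments, since a constituency tree cannot contain crossing branches; that filter is an addition for illustration, not part of the slide's argument, and it leaves noticeably fewer structures.

```python
from itertools import product
from math import prod

# The six PPs that follow the verb "spoke", in sentence order.
PPS = ["to the nation", "about the problem", "of drug use",
       "in the schools", "from one coast", "to the other"]

def attachment_choices(n):
    # Site 0 is the verb; site k + 1 is the noun inside PP k.
    # PP i (0-based) may attach to the verb or to any earlier PP's noun: i + 1 choices.
    return [range(i + 1) for i in range(n)]

def naive_count(n):
    # The slide's argument: multiply the number of choices for each PP.
    return prod(len(c) for c in attachment_choices(n))

def is_noncrossing(attach):
    # PP i links site attach[i] to its own position i + 1. A later PP j crosses
    # that link if it attaches to a site strictly between them: attach[i] < attach[j] <= i.
    return not any(attach[i] < attach[j] <= i
                   for i in range(len(attach))
                   for j in range(i + 1, len(attach)))

def noncrossing_count(n):
    # Enumerate every combination of choices and keep only tree-shaped ones.
    return sum(1 for a in product(*attachment_choices(n)) if is_noncrossing(a))

print(naive_count(len(PPS)))        # 720
print(noncrossing_count(len(PPS)))  # 132
```

Even the filtered count grows quickly with the number of PPs, which is why parsers need a principled way, such as probabilities, to choose among attachments.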
Ambiguity in a Bengali/Hindi Sentence
Give examples
Some interpretations of:
Morphological Ambiguity:
Semantic Ambiguity:
Why is NLP difficult?
Word category ambiguity: book --> verb? or noun?
Word sense ambiguity: bank --> financial institution? building? or river side?
Words can mean more than the sum of their parts: make up a story
Fictitious worlds: People on Mars can fly.
Defining scope: People like ice-cream. Does this mean that all (or some?) people like ice cream?
Language is changing and evolving: I’ll email you my answer. This new S.U.V. has a compartment for your mobile phone. Googling, …
Resolve Ambiguities
We will introduce models and algorithms to resolve ambiguities at different levels.
part-of-speech tagging -- Deciding whether duck is a verb or a noun.
word-sense disambiguation -- Deciding whether make means create or cook.
lexical disambiguation -- Part-of-speech and word-sense resolution are two important kinds of lexical disambiguation.
syntactic ambiguity -- her duck is an example of syntactic ambiguity, and it can be addressed by probabilistic parsing.
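Word-sense disambiguation can be illustrated with the simplified Lesk algorithm, a classic gloss-overlap method: pick the sense whose dictionary definition shares the most words with the surrounding context. This is only one possible technique, not necessarily the one the course develops, and the two glosses for bank and the stopword list below are hypothetical, written just for this sketch.

```python
import re

# Hypothetical glosses for two senses of "bank" (invented for this sketch).
GLOSSES = {
    "financial institution": "an institution that accepts deposits of money and makes loans",
    "river side": "the sloping land beside a river or lake",
}

# Small hypothetical stopword list; function words would otherwise dominate overlaps.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "to", "in", "on",
             "at", "by", "i", "we", "my", "our"}

def content_words(text):
    """Lowercase, split into alphabetic words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(target, sentence, glosses=GLOSSES):
    """Return the sense whose gloss shares the most content words with the context."""
    context = content_words(sentence) - {target}
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I went to the bank to deposit some money"))
print(simplified_lesk("bank", "We had a picnic on the bank of the river"))
```

Exact word matching is brittle (here "deposit" does not match "deposits"), which is one reason practical systems add stemming or move to statistical models.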
Resolve Ambiguities (cont.)
I made her duck
[S [NP I] [VP [V made] [NP her] [NP duck]]]
[S [NP I] [VP [V made] [NP [DET her] [N duck]]]]
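The two trees can be reproduced by a toy context-free grammar read off the trees themselves; the rule set is an assumption for illustration, not a grammar from the course. The Python sketch enumerates every parse by brute-force top-down search. That is fine for a four-word sentence but blows up in general; practical parsers typically use dynamic programming (chart parsing), and a probabilistic parser would additionally score the two trees to choose between them.

```python
# Toy grammar read off the two trees: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP", "NP"], ["V", "NP"]],
    "NP":  [["I"], ["her"], ["duck"], ["DET", "N"]],
    "V":   [["made"]],
    "DET": [["her"]],
    "N":   [["duck"]],
}

def parse(symbol, tokens, start):
    """Yield (tree, end) for every way `symbol` can cover tokens[start:end]."""
    if symbol not in GRAMMAR:  # terminal: must match the next token
        if start < len(tokens) and tokens[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, tokens, start):
            yield (symbol, *children), end

def parse_sequence(symbols, tokens, start):
    """Yield (list of trees, end) for a sequence of grammar symbols."""
    if not symbols:
        yield [], start
        return
    for first_tree, mid in parse(symbols[0], tokens, start):
        for rest_trees, end in parse_sequence(symbols[1:], tokens, mid):
            yield [first_tree] + rest_trees, end

def parses(sentence):
    """Return all trees rooted in S that cover the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]

def bracket(tree):
    """Render a tree in labelled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"

for t in parses("I made her duck"):
    print(bracket(t))
```

It prints exactly the two bracketings of the slide: one where her and duck are separate objects of made, and one where her duck is a single NP.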
Dealing with Ambiguity
Three approaches:
1. Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures (e.g. the "syntax proposes/semantics disposes" approach).
3. Probabilistic approaches based on making the most likely choices.
Models to Represent Linguistic Knowledge
Different formalisms (models) are used to represent the required linguistic knowledge.
State Machines -- FSAs, HMMs, ATNs, RTNs
Formal Rule Systems -- Context-Free Grammars, Unification Grammars, Probabilistic CFGs
Logic-based Formalisms -- first-order predicate logic, some higher-order logic
Models of Uncertainty -- Bayesian probability theory
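As an example of the state-machine formalism, here is a deterministic finite-state automaton written as a transition table. It recognizes the toy "sheep talk" language baa+! (a b, then two or more a's, then !), the running example in Jurafsky & Martin's chapter on finite-state automata; the state numbering is this sketch's own choice.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of further a's
    (3, "!"): 4,
}
ACCEPTING = {4}

def accepts(string):
    """Return True if the automaton ends in an accepting state after reading string."""
    state = 0
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING

for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, accepts(s))
```

A table-driven design keeps the recognizer generic: the same accepts loop works for any deterministic automaton once the table and accepting states are swapped out.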
Algorithms to Manipulate Linguistic Knowledge
We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior.
Most of the algorithms we will study are transducers and parsers: they construct some structure based on their input.
Since language is ambiguous at all levels, these algorithms are never simple processes.
Most of the algorithms that will be used fall into the following categories:
state space search
dynamic programming
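Dynamic programming is easiest to see on minimum edit distance: the cost of turning one string into another by insertions, deletions and substitutions, which also underlies spelling correction. The sketch fills a table in which each cell is computed from three already-solved neighbours. Insertions and deletions cost 1; the substitution cost is a parameter (1 gives the standard Levenshtein distance, while Jurafsky & Martin also use a cost of 2).

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance by dynamic programming (insertion/deletion cost 1)."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]

print(min_edit_distance("kitten", "sitting"))                   # 3
print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

Each cell is solved once, so the cost is proportional to the product of the two string lengths rather than to the exponential number of possible edit sequences.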
Language and Intelligence
Turing Test
Participants: Computer, Human, Human Judge
The human judge asks tele-typed questions to both the computer and the human. The computer’s job is to act like a human; the human’s job is to convince the judge that he is not a machine. The computer is judged “intelligent” if it can fool the judge. The judgment of intelligence is linked to the system giving appropriate answers to questions.
NLP - an inter-disciplinary Field
NLP borrows techniques and insights from several disciplines.
Linguistics: How do words form phrases and sentences? What constrains the possible meanings of a sentence?
Computational Linguistics: How is the structure of sentences identified? How can knowledge and reasoning be modeled?
Computer Science: Algorithms for automata, parsers.
Engineering: Stochastic techniques for ambiguity resolution.
Psychology: What linguistic constructions are easy or difficult for people to learn to use?
Philosophy: What is meaning, and how do words and sentences acquire it?
Some Buzz-Words
NLP – Natural Language Processing
CL – Computational Linguistics
SP – Speech Processing
HLT – Human Language Technology
NLE – Natural Language Engineering
SNLP – Statistical Natural Language Processing
Other Areas: Speech Generation, Text Generation, Speech Understanding, Information Retrieval, Dialogue Processing, Inference, Spelling Correction, Grammar Correction, Text Summarization, Text Categorization
Some NLP Applications
Machine Translation – Translation between two natural languages, e.g. the Babel Fish translation system, Systran.
Information Retrieval – Web search (monolingual or multilingual).
Query Answering/Dialogue – Natural language interface with a database system, or a dialogue system.
Report Generation – Generation of reports such as weather reports.
Other Applications – Grammar Checking, Spell Checking, Spelling Correction
The Big Picture
Source Language Speech Signal --> Speech Recognition --> Source text --> Analysis --> Generation --> Target text --> Speech Synthesis --> Target Language Speech Signal
The Reductionist Approach
Source Language Analysis: Text Normalization --> Morphological Analysis --> POS Tagging --> Parsing --> Semantic Analysis --> Discourse Analysis
Target Language Generation: Discourse Planning --> Lexical Choice --> Role Ordering --> Phrase Generation --> Morphological Synthesis --> Text Rendering
Natural Language Understanding
Words
Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
Syntactic Analysis
Syntactic Structure
Semantic Analysis
Context-independent meaning representation
Discourse Processing
Final meaning representation
Natural Language Generation
Meaning representation
Utterance Planning
Meaning representations for sentences
Sentence Planning and Lexical Choice
Syntactic structures of sentences with lexical choices
Sentence Generation
Morphologically analyzed words
Morphological Generation
Words
Natural Language Generation
NLG is the process of constructing natural language outputs from non-linguistic inputs; it is the reverse of NL understanding.
An NLG system may have two main parts:
Discourse Planning -- deciding what will be generated;
Surface Realization -- realizing a sentence from its internal representation.
Lexical Choice -- selecting the correct words to describe the concepts.
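A minimal template-based sketch of discourse planning, lexical choice and surface realization, applied to the weather-report application mentioned earlier. The record fields, the rain thresholds and the sentence template are all hypothetical; a real generator would have far richer planning and realization components.

```python
# Hypothetical input record: city name, forecast high (deg C), rainfall (mm).
record = {"city": "Kolkata", "high_c": 31, "rain_mm": 12.0}

def plan(rec):
    """Discourse planning: decide which facts to say, and in what order."""
    facts = [("high", rec["high_c"])]
    if rec["rain_mm"] > 0:  # only mention rain if there is some
        facts.append(("rain", rec["rain_mm"]))
    return facts

def lexicalize(concept, value):
    """Lexical choice: pick words for each concept.
    The rain thresholds are illustrative assumptions, not official categories."""
    if concept == "high":
        return f"a high of {value} degrees Celsius"
    if value < 2.5:
        return "light rain"
    if value < 10:
        return "moderate rain"
    return "heavy rain"

def realize(city, phrases):
    """Surface realization: slot the chosen phrases into a fixed sentence template."""
    return f"{city}: " + ", with ".join(phrases) + "."

def generate(rec):
    phrases = [lexicalize(concept, value) for concept, value in plan(rec)]
    return realize(rec["city"], phrases)

print(generate(record))  # Kolkata: a high of 31 degrees Celsius, with heavy rain.
```

Keeping the three steps as separate functions mirrors the division of labour on the slide: the planner never sees words, and the realizer never sees raw numbers.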
Machine Translation
Machine Translation -- converting a text in language A into the corresponding text in language B (or speech).
Different Machine Translation architectures: interlingua-based systems and transfer-based systems.
How can the required knowledge resources, such as mapping rules and a bilingual dictionary, be acquired? By hand, or automatically from corpora.
Example-Based Machine Translation acquires the required knowledge (some or all of it) from corpora.
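The transfer idea can be sketched with a bilingual dictionary plus one structural mapping rule. The three-entry English-to-Spanish dictionary and its POS tags are hypothetical, and the sketch ignores gender and number agreement (the entries simply store feminine forms); it only contrasts word-for-word lookup with lookup followed by an ADJ N to N ADJ reordering.

```python
# Hypothetical bilingual dictionary: English word -> (Spanish word, POS tag).
LEXICON = {
    "the": ("la", "DET"),
    "white": ("blanca", "ADJ"),
    "house": ("casa", "N"),
}

def word_for_word(sentence):
    """Direct translation: replace each word, keep English word order."""
    return " ".join(LEXICON[w][0] for w in sentence.split())

def transfer(sentence):
    """Dictionary lookup plus one transfer rule: English ADJ N becomes Spanish N ADJ."""
    words = [LEXICON[w] for w in sentence.split()]
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and words[i][1] == "ADJ" and words[i + 1][1] == "N":
            out.extend([words[i + 1][0], words[i][0]])  # swap adjective and noun
            i += 2
        else:
            out.append(words[i][0])
            i += 1
    return " ".join(out)

print(word_for_word("the white house"))  # la blanca casa
print(transfer("the white house"))       # la casa blanca
```

Real transfer systems apply such rules to parse trees rather than flat tag sequences, and the mapping rules are exactly the kind of resource that can be written by hand or learned from corpora.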
Some statistics (old)
Business e-mail sent per day in the US: 2.1 billion
First-class mail per year: 107 billion
Text on the Internet (2/99): > 6 TB; current: ?
Indexed: 16% (Lawrence and Giles, Nature 400, 1999)
Dialog (www.dialog.com): 9 TB
Average college library: 1 TB
Languages
Languages: 39,000 languages and dialects (22,000 dialects in India alone)
Top languages: Chinese/Mandarin (885M), Spanish (332M), English (322M), Bengali (189M), Hindi (182M), Portuguese (170M), Russian (170M), Japanese (125M)
Source: www.sil.org/ethnologue, www.nytimes.com
Internet: English (128M), Japanese (19.7M), German (14M), Spanish (9.4M), French (9.3M), Chinese (7.0M)
Usage: English (1999: 54%, 2001: 51%, 2003: 46%, 2005: 43%)
Source: www.computereconomics.com