statistical nlp
-
8/18/2019 Statistical NLP
Statistical Natural Language
Processing
-
What is NLP?
Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical and practical issues in the design and implementation of computer systems for processing human languages.
It is an interdisciplinary field which draws on other areas of study such as computer science, artificial intelligence, linguistics and logic.
-
Applications of NLP
natural language interfaces to databases
programs for classifying and retrieving documents by content
explanation generation for expert systems
machine translation
advanced word-processing tools
-
What makes NLP a computational challenge?
Ambiguous nature of natural language.
There are varied applications for language technology.
Knowledge representation is a difficult task.
There are different levels of information encoded in our language.
-
What is statistical NLP?
Statistical NLP aims to perform
statistical inference for the field of NLP
Statistical inference consists of taking some data generated in accordance with some unknown probability distribution and making inferences about that distribution.
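As a minimal sketch of what such an inference can look like, the snippet below estimates word probabilities from observed text by relative frequency (the maximum-likelihood estimate). The toy corpus and the function name `estimate_unigram_probs` are invented for illustration and do not come from the slides.

```python
from collections import Counter


def estimate_unigram_probs(tokens):
    """Maximum-likelihood estimate: the relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy sample (invented) standing in for data drawn from an unknown distribution.
corpus = "the dog saw the cat and the cat ran".split()
probs = estimate_unigram_probs(corpus)

# Print the estimated distribution, most probable words first.
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.3f}")
```

With more data the estimates would be expected to move closer to the underlying distribution; on a sample this small they are only a rough guess.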
-
Motivations for Statistical NLP
Cognitive modeling of human language processing has not reached a stage where we can have a complete mapping between the language signal and the information contents.
Complete mapping is not always required.
A statistical approach provides the flexibility required for making the modeling of a language more accurate.
-
Idea behind Statistical NLP
View language processing as information transmission over a noisy channel.
The approach requires a model that characterizes the transmission by giving, for every message, the probability of the observed output.
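A minimal sketch of noisy-channel decoding follows. It assumes the usual formulation, in which the channel model P(observed | message) is combined with a prior P(message) and the decoder picks the message with the largest product. The prior is an assumption of this sketch (the slide names only the channel model), and all words and probabilities are made up for illustration.

```python
def decode(observed, prior, channel):
    """Return the message m that maximizes P(m) * P(observed | m)."""
    return max(prior, key=lambda m: prior[m] * channel[m].get(observed, 0.0))


# Hypothetical prior over intended words.
prior = {"their": 0.6, "there": 0.4}

# Hypothetical channel model: probability of each observed output per message.
channel = {
    "their": {"thier": 0.30, "their": 0.70},
    "there": {"thier": 0.05, "there": 0.95},
}

best = decode("thier", prior, channel)
print(best)
```

Here "their" scores 0.6 * 0.30 = 0.18 and "there" scores 0.4 * 0.05 = 0.02, so the decoder returns "their".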
-
Statistical Modeling and Classification
Primitive acoustic features
Quantization
Maximum likelihood and related rules
Class conditional density function
Hidden Markov Model Methodology
-
Details...
Primitive acoustic features are used to estimate the speech spectrum on the basis of its statistical properties.
By means of quantization, a typical speech signal can be represented as a sequence of symbols and can be mapped using statistical decision rules into a multidimensional acoustic feature space, thus classifying the signal.
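One common form of quantization is nearest-neighbour vector quantization: each acoustic feature vector is replaced by the index of the closest entry in a codebook, which turns the signal into a sequence of symbols. The sketch below assumes that form; the slides do not say which quantizer is meant, and the codebook and frames are invented two-dimensional values.

```python
import math


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook vector."""
    symbols = []
    for frame in frames:
        distances = [math.dist(frame, code) for code in codebook]
        symbols.append(distances.index(min(distances)))
    return symbols


# Hypothetical codebook of three prototype feature vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5)]

# Hypothetical feature vectors, one per short frame of the signal.
frames = [(0.1, -0.2), (0.9, 1.2), (2.7, 0.4), (1.1, 0.8)]

symbols = quantize(frames, codebook)
print(symbols)
```

In practice the codebook would be learned from data (for example by clustering) rather than written by hand.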
-
Maximum Likelihood
Although there is no direct method for computing the probability of a phonetic unit given its acoustic features, we can use Bayes' rule to estimate the probability of a phonetic class given its features from the likelihood of the features given the class. This method leads to the maximum likelihood classifier, which assigns an unknown vector to the class whose probability density function, conditioned on the class, has the maximum value.
Another variant of the maximum likelihood methodology is clustering.
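The maximum likelihood classifier described above can be sketched as follows, assuming a single acoustic feature and a Gaussian class-conditional density per phonetic class. The Gaussian form, the class labels, and the (mean, standard deviation) values are illustrative assumptions, not taken from the slides. If all classes are equally likely a priori, Bayes' rule makes this choice the same as picking the most probable class.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def ml_classify(x, class_params):
    """Pick the class whose class-conditional density p(x | class) is largest."""
    return max(class_params, key=lambda c: gaussian_pdf(x, *class_params[c]))


# Hypothetical (mean, std) of one acoustic feature for two phonetic classes.
class_params = {"/a/": (700.0, 80.0), "/i/": (300.0, 60.0)}

label = ml_classify(650.0, class_params)
print(label)
```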
-
Hidden Markov Models
A Hidden Markov Model is a set of states (lexical categories in our case) with directed edges labeled with transition probabilities that indicate the probability of moving to the state at the end of the directed edge, given that one is now in the state at the start of the edge. The states are also labeled with a function which indicates the probabilities of outputting different symbols if in that state (while in a state, one outputs a single symbol before moving to the next state). In our case, the symbol output from a state/lexical category is a word belonging to that lexical category.
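To make the definition concrete, here is a small sketch with two lexical categories as states (N and V), transition probabilities on the edges, and per-state word output probabilities. It uses the Viterbi algorithm to find the most probable category sequence for a sentence. The start probabilities are an addition of this sketch (the slide does not mention them), and every number is hypothetical.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the words, and its probability."""
    # best[s] holds (probability, path) of the best path that ends in state s.
    best = {s: (start_p[s] * emit_p[s].get(words[0], 0.0), [s]) for s in states}
    for word in words[1:]:
        new_best = {}
        for s in states:
            # Choose the best predecessor state for s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s].get(word, 0.0), path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}                      # assumed start probabilities
trans_p = {"N": {"N": 0.3, "V": 0.7},               # transition probabilities
           "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"John": 0.6, "walks": 0.4},         # word output probabilities
          "V": {"walks": 1.0}}

path, prob = viterbi(["John", "walks"], states, start_p, trans_p, emit_p)
print(path, prob)
```

With these numbers the best path is N then V, with probability 0.8 * 0.6 * 0.7 * 1.0 = 0.336.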
-
Hidden Markov Models (cont.)
-
Conditional Class Density Function
All statistical methods of speech recognition depend on the class conditional density function.
These, in turn, depend on the existence of a sufficiently large, correctly labeled training set and well understood statistical estimation techniques.
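As a rough illustration of estimating class-conditional densities from a labeled training set, the sketch below fits one Gaussian per class using the maximum-likelihood mean and variance. The Gaussian assumption, the function name `fit_class_densities`, and the tiny training set are all hypothetical; a real system would need far more data.

```python
import math
from collections import defaultdict


def fit_class_densities(samples):
    """Estimate a (mean, std) Gaussian per class from (label, value) pairs."""
    by_class = defaultdict(list)
    for label, value in samples:
        by_class[label].append(value)
    params = {}
    for label, values in by_class.items():
        mean = sum(values) / len(values)
        # Maximum-likelihood variance: divide by n, not n - 1.
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        params[label] = (mean, math.sqrt(variance))
    return params


# Hypothetical labeled training set: (phonetic class, acoustic feature value).
training = [("/a/", 690.0), ("/a/", 710.0), ("/a/", 700.0),
            ("/i/", 290.0), ("/i/", 310.0)]

params = fit_class_densities(training)
print(params)
```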
-
How does statistics help?
Disambiguation may be achieved by using stochastic context free grammars
It helps in providing degrees of grammaticality
Naturalness
Structural preference
Error Tolerance
-
Example using stochastic CFG
For example, consider the sentence
"John Walks"
The grammar is as follows:
1  S  -> NP V    0.7
2  S  -> NP      0.3
3  NP -> N       0.8
4  NP -> N N     0.2
5  N  -> John    0.6
6  N  -> Walks   0.4
7  V  -> Walks   1.0
The numbers on the right represent the weights for each rule. The weight of the analysis is the product of the weights of the rules used in the derivation.
Predicting which sentence is perceived is based on these weights.
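The product rule can be checked with a short sketch that scores the two possible analyses of "John Walks". The rule table mirrors the grammar on this slide, with one caveat: the last digit of rule 3's weight is missing in the source, so 0.8 is an assumption chosen so that the two NP rules sum to 1. The derivations are listed by hand rather than produced by a parser.

```python
from math import prod

# Rule number -> (left-hand side, right-hand side, weight).
# Rule 3's weight (0.8) is an assumption: the digit is lost in the source,
# and 0.8 makes the weights of the two NP rules sum to 1.
RULES = {
    1: ("S", ("NP", "V"), 0.7),
    2: ("S", ("NP",), 0.3),
    3: ("NP", ("N",), 0.8),
    4: ("NP", ("N", "N"), 0.2),
    5: ("N", ("John",), 0.6),
    6: ("N", ("Walks",), 0.4),
    7: ("V", ("Walks",), 1.0),
}


def derivation_weight(rule_ids):
    """Weight of an analysis: the product of the weights of the rules it uses."""
    return prod(RULES[r][2] for r in rule_ids)


# Analysis A: S -> NP V, NP -> N, N -> John, V -> Walks
noun_verb = derivation_weight([1, 3, 5, 7])
# Analysis B: S -> NP, NP -> N N, N -> John, N -> Walks
noun_noun = derivation_weight([2, 4, 5, 6])

print(noun_verb, noun_noun)
```

Under these weights the noun-verb analysis (0.7 * 0.8 * 0.6 * 1.0 = 0.336) outweighs the noun-noun analysis (0.3 * 0.2 * 0.6 * 0.4 = 0.0144), so it would be the preferred reading.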
-
Degrees of grammaticality
Traditional approaches to NLP do not accommodate gradations of grammaticality. A sentence is either correct or not.
In some cases acceptability may vary with the structure and context of the sentence.
-
Structural Preference
Consider the sentence
"The emergency crews hate most is domestic violence."
The correct interpretation is:
"The emergency [that the crews hate most] is domestic violence."
These preferences can be seen more as structural preferences than as parsing preferences.
Statistical approaches can easily handle such structural preferences.
-
Error Tolerance
A remarkable property of human language comprehension is error tolerance.
Many sentences that the traditional approach classifies as ungrammatical can actually be interpreted by statistical NLP techniques.
-
Conclusions
Free and commercial software is now available that provides many NLP features (e.g., Microsoft XP has speech recognition software with which users can control menus and execute commands).
A lot of research is going into developing new applications and investigating new techniques and approaches that will make Statistical NLP more feasible in the near future.