Stochastic Methods for NLP
Probabilistic Context-Free Parsers
Probabilistic Lexicalized Context-Free Parsers
Hidden Markov Models – Viterbi Algorithm
Statistical Decision-Tree Models
Probabilistic CFG
1. sent <- np, vp. p(sent) = p(r1) * p(np) * p(vp).
2. np <- noun. p(np) = p(r2) * p(noun).
....
9. noun <- dog. p(noun) = p(dog).
The probabilities are taken from a particular corpus
of text.
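A minimal sketch of how these rule probabilities combine: the probability of a parse tree is the product of the probabilities of the rules used to build it. The grammar, the numbers, and the names RULE_PROB and tree_prob are illustrative assumptions, not estimates from any corpus.

```python
# Hypothetical rule probabilities; a real parser would estimate them from a corpus.
RULE_PROB = {
    ("sent", ("np", "vp")): 1.0,
    ("np", ("noun",)): 0.6,
    ("np", ("det", "noun")): 0.4,
    ("vp", ("verb",)): 1.0,
    ("noun", ("dog",)): 0.5,
    ("noun", ("cat",)): 0.5,
    ("verb", ("barks",)): 1.0,
}

def tree_prob(tree):
    """Probability of a parse tree written as (label, child, ...); leaves are words."""
    if isinstance(tree, str):
        return 1.0  # a word contributes no further factor
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, child_labels)]  # probability of the rule applied here
    for child in children:
        p *= tree_prob(child)             # times the probability of each subtree
    return p

parse = ("sent", ("np", ("noun", "dog")), ("vp", ("verb", "barks")))
print(tree_prob(parse))
```

With these made-up numbers, the tree for "dog barks" gets 1.0 * (0.6 * 0.5) * (1.0 * 1.0) = 0.3.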
Probabilistic Lexicalized CFG
1. sent <- np(noun), vp(verb). p(sent) = p(r1) * p(np) * p(vp) * p(verb|noun).
2. np <- noun. p(np) = p(r2) * p(noun).
....
9. noun <- dog. p(noun) = p(dog).
Note that we've introduced the probability of a
particular verb given a particular noun.
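A small sketch of the effect of the extra p(verb|noun) factor. All tables and names here (P_RULE, P_NOUN, P_VP, P_VERB_GIVEN_NOUN, sentence_prob) are hypothetical, and p(vp) is fixed at 1.0 purely to keep the example short.

```python
# All numbers are made-up illustrations, not corpus estimates.
P_RULE = {"r1": 1.0, "r2": 1.0}      # p(r1), p(r2)
P_NOUN = {"dog": 0.5, "cat": 0.5}     # p(noun)
P_VP = 1.0                            # assumed constant p(vp), a simplification
P_VERB_GIVEN_NOUN = {                 # p(verb | head noun)
    ("barks", "dog"): 0.9, ("meows", "dog"): 0.1,
    ("barks", "cat"): 0.1, ("meows", "cat"): 0.9,
}

def sentence_prob(noun, verb):
    """p(sent) = p(r1) * p(np) * p(vp) * p(verb|noun), with np <- noun."""
    p_np = P_RULE["r2"] * P_NOUN[noun]
    return P_RULE["r1"] * p_np * P_VP * P_VERB_GIVEN_NOUN[(verb, noun)]

print(sentence_prob("dog", "barks"))
print(sentence_prob("dog", "meows"))
```

Under these assumed numbers "dog barks" scores higher than "dog meows", a preference the unlexicalized grammar cannot express because it never relates the verb to the noun.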
Markov Chain
Discrete random process: The system is in various states and we move from state to state. The probability of moving to a particular next state (a transition) depends solely on the current state and not previous states (the Markov property).
May be modeled by a finite state machine with probabilities on the edges.
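A minimal sketch of such a chain as a table of edge probabilities, with a random walk over it. The two states, the numbers, and the names TRANSITIONS, step, and walk are assumptions made for illustration.

```python
import random

# Hypothetical two-state chain; the probabilities on each state's outgoing edges sum to 1.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng):
    """Pick the next state using only the current state (the Markov property)."""
    edges = TRANSITIONS[state]
    return rng.choices(list(edges), weights=list(edges.values()))[0]

def walk(start, n_steps, seed=0):
    """Return the list of states visited, starting at start."""
    rng = random.Random(seed)  # seeded so the walk is repeatable
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

print(walk("sunny", 5))
```

Note that step never looks at earlier entries of the path; that restriction is what makes the process a Markov chain.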
Hidden Markov Model
Each state (or transition) may produce an output.
The outputs are visible to the viewer, but the underlying Markov model is not.
The problem is often to infer the path through the model given a sequence of outputs.
The probabilities associated with the transitions are known a priori.
There may be more than one start state. The probability of each start state may also be known.
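As a rough sketch of how these pieces fit together, the code below scores one candidate path against a sequence of outputs. It assumes outputs are produced by states rather than transitions; the tables START, TRANS, and EMIT and the function joint_prob are hypothetical.

```python
# Hypothetical HMM parameters (illustrative numbers only).
START = {"hot": 0.6, "cold": 0.4}                    # start-state probabilities
TRANS = {"hot": {"hot": 0.7, "cold": 0.3},           # transition probabilities
         "cold": {"hot": 0.4, "cold": 0.6}}
EMIT = {"hot": {1: 0.2, 2: 0.4, 3: 0.4},             # output probabilities per state
        "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def joint_prob(path, outputs):
    """Probability of following path and seeing outputs along the way."""
    p = START[path[0]] * EMIT[path[0]][outputs[0]]
    for prev, cur, out in zip(path, path[1:], outputs[1:]):
        p *= TRANS[prev][cur] * EMIT[cur][out]
    return p

print(joint_prob(["hot", "hot"], [3, 1]))
print(joint_prob(["hot", "cold"], [3, 1]))
```

An observer sees only the outputs [3, 1]; inferring the path means finding the state sequence that makes this product largest.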
Uses of HMM
Parts of speech (POS) tagging
Speech recognition
Handwriting recognition
Machine Translation
Cryptanalysis
Many other non-NLP applications
Viterbi Algorithm
Used to find the most likely sequence of states (the Viterbi path) in an HMM that leads to a given sequence of observed events.
Runs in time proportional to (number of observations) * (number of states)^2.
Can be modified if the state depends on the last n states (instead of just the last state). It then takes time proportional to (number of observations) * (number of states)^(n+1), which reduces to the bound above for n = 1.
Viterbi Algorithm - Assumptions
The system at any given time is in one particular state.
There are a finite number of states.
Transitions have an associated incremental metric.
Events are cumulative over a path, i.e., additive in some sense.
Viterbi Algorithm - Code
See http://en.wikipedia.org/wiki/Viterbi_algorithm.
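For orientation, here is one possible sketch of the algorithm for a first-order HMM, applied to a toy tagging problem. It is not the code from the page above; the function signature, the two-tag model, and all probabilities are assumptions chosen for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence for obs."""
    # prob[t][s]: probability of the best path that ends in state s at time t.
    prob = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        prob.append({})
        back.append({})
        for s in states:
            # Trying every predecessor r for every s gives the (states)^2 factor.
            best_r = max(states, key=lambda r: prob[t - 1][r] * trans_p[r][s])
            prob[t][s] = prob[t - 1][best_r] * trans_p[best_r][s] * emit_p[s][obs[t]]
            back[t][s] = best_r
    # Start from the best final state and follow the back-pointers.
    last = max(states, key=lambda s: prob[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return prob[-1][last], path

# Toy model with made-up numbers: states are tags, observations are words.
states = ("noun", "verb")
start_p = {"noun": 0.8, "verb": 0.2}
trans_p = {"noun": {"noun": 0.3, "verb": 0.7}, "verb": {"noun": 0.6, "verb": 0.4}}
emit_p = {"noun": {"dogs": 0.6, "bark": 0.4}, "verb": {"dogs": 0.1, "bark": 0.9}}

best_prob, best_path = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(best_path, best_prob)
```

With these numbers the best path is noun, verb. Long inputs would normally use log probabilities, added rather than multiplied, to avoid underflow; that matches the assumption that the metric is additive along a path.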
Uses in NLP
Parts of speech (POS) tagging: The observations are the words of the sentences. The HMM nodes are the parts of speech.
Speech recognition: The observations are the sound waves (after some processing). The HMM may contain the words in the text sentence, or the phonemes.
Statistical Decision-Tree Model
SPATTER (Magerman)
Alternative approach to CFGs.
Uses statistical measures generated by hand annotation of a large corpus of text.
Automatically discovers disambiguation criteria for parsing.
Uses a stack decoding algorithm.
Finds one tree, then uses branch-and-bound.
Stack Decoding Algorithm
Similar to beam search, but claims to use a stack instead of a priority queue.
The n best nodes (partial solutions) are kept.
The best node is expanded and its children are put on the stack.
The stack is then trimmed back to n nodes.
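The steps above can be sketched roughly as follows. This is a generic illustration, not SPATTER's decoder: the function stack_decode, its callback parameters, and the toy tag-sequence problem are all assumptions.

```python
def stack_decode(start, expand, score, is_goal, n=3):
    """Repeatedly expand the best partial solution, keeping only the n best."""
    stack = [start]
    while stack:
        stack.sort(key=score, reverse=True)  # best node first
        best = stack.pop(0)
        if is_goal(best):
            return best
        stack.extend(expand(best))           # children are put on the stack
        stack.sort(key=score, reverse=True)
        del stack[n:]                        # trim back to n nodes
    return None

# Toy problem with made-up numbers: choose one tag per position.
STEP_PROBS = [
    {"noun": 0.7, "verb": 0.3},
    {"noun": 0.2, "verb": 0.8},
]

def expand(node):
    """Extend a partial tag sequence by one position."""
    if len(node) == len(STEP_PROBS):
        return []
    return [node + (tag,) for tag in STEP_PROBS[len(node)]]

def score(node):
    """Product of the probabilities of the tags chosen so far."""
    p = 1.0
    for i, tag in enumerate(node):
        p *= STEP_PROBS[i][tag]
    return p

def is_goal(node):
    return len(node) == len(STEP_PROBS)

result = stack_decode((), expand, score, is_goal, n=3)
print(result)
```

Because the list is re-sorted by score on every step, the "stack" here behaves like a bounded priority queue, which is why the technique resembles beam search.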