Stochastic Methods for NLP


Page 1: Stochastic Methods for NLP

Stochastic Methods for NLP

Probabilistic Context-Free Parsers
Probabilistic Lexicalized Context-Free Parsers
Hidden Markov Models – Viterbi Algorithm
Statistical Decision-Tree Models

Page 2: Stochastic Methods for NLP

Probabilistic CFG

1. sent <- np, vp.      p(sent) = p(r1) * p(np) * p(vp).
2. np <- noun.          p(np) = p(r2) * p(noun).
...
9. noun <- dog.         p(noun) = p(dog).

The probabilities are taken from a particular corpus of text.
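As a rough sketch of how such a parse probability is computed (Python; the grammar and rule probabilities below are invented for illustration, not estimated from any real corpus):

```python
# Minimal sketch of scoring a parse with a toy PCFG.
# All rule probabilities are invented for illustration.
rule_prob = {
    "sent -> np vp": 1.0,    # p(r1)
    "np -> noun":    0.4,    # p(r2)
    "vp -> verb":    0.3,
    "noun -> dog":   0.05,   # p(dog)
    "verb -> barks": 0.02,
}

def parse_probability(rules_used):
    """Probability of a parse = product of the probabilities
    of every rule used in the derivation."""
    p = 1.0
    for rule in rules_used:
        p *= rule_prob[rule]
    return p

# p(sent) = p(r1) * p(np) * p(vp)
print(parse_probability([
    "sent -> np vp",
    "np -> noun", "noun -> dog",
    "vp -> verb", "verb -> barks",
]))
```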

Page 3: Stochastic Methods for NLP

Probabilistic Lexicalized CFG

1. sent <- np(noun), vp(verb).      p(sent) = p(r1) * p(np) * p(vp) * p(verb|noun).
2. np <- noun.                      p(np) = p(r2) * p(noun).
...
9. noun <- dog.                     p(noun) = p(dog).

Note that we've introduced the probability of a particular verb given a particular noun.
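A similar sketch for the lexicalized case, showing only the extra p(verb|noun) factor (the lexical probability table is invented for illustration):

```python
# Sketch: the lexicalized grammar multiplies in p(verb | head noun)
# on top of the plain PCFG rule probabilities. Numbers are invented.
p_verb_given_noun = {
    ("barks", "dog"): 0.30,
    ("meows", "dog"): 0.001,
}

def lexicalized_sent_prob(p_r1, p_np, p_vp, verb, noun):
    """p(sent) = p(r1) * p(np) * p(vp) * p(verb | noun)."""
    return p_r1 * p_np * p_vp * p_verb_given_noun.get((verb, noun), 1e-6)

print(lexicalized_sent_prob(1.0, 0.02, 0.006, "barks", "dog"))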

Page 4: Stochastic Methods for NLP

Markov Chain

Discrete random process: The system is in various states and we move from state to state. The probability of moving to a particular next state (a transition) depends solely on the current state and not previous states (the Markov property).

May be modeled by a finite state machine with probabilities on the edges.
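A minimal sketch of such a finite state machine with probabilities on the edges (the states and transition probabilities below are invented for illustration):

```python
import random

# Toy Markov chain: transition probabilities out of each state sum to 1.
transitions = {
    "Det":  {"Noun": 0.9, "Adj": 0.1},
    "Adj":  {"Noun": 1.0},
    "Noun": {"Verb": 0.7, "Noun": 0.3},
    "Verb": {"Det": 0.5, "Noun": 0.5},
}

def walk(state, steps):
    """Random walk: the next state depends only on the current state
    (the Markov property), not on how we got there."""
    path = [state]
    for _ in range(steps):
        options = transitions[state]
        state = random.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path

print(walk("Det", 5))
```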

Page 5: Stochastic Methods for NLP

Hidden Markov Model

Each state (or transition) may produce an output.
The outputs are visible to the viewer, but the underlying Markov model is not.
The problem is often to infer the path through the model given a sequence of outputs.
The probabilities associated with the transitions are known a priori.
There may be more than one start state. The probability of each start state may also be known.

Page 6: Stochastic Methods for NLP

Uses of HMM

Parts of speech (POS) tagging
Speech recognition
Handwriting recognition
Machine translation
Cryptanalysis
Many other non-NLP applications

Page 7: Stochastic Methods for NLP

Viterbi Algorithm

Used to find the most likely sequence of states (the Viterbi path) in an HMM that leads to a given sequence of observed events.
Runs in time proportional to (number of observations) * (number of states)^2.
Can be modified if the next state depends on the last n states (instead of just the last state); this takes time (number of observations) * (number of states)^n.

Page 8: Stochastic Methods for NLP

Viterbi Algorithm - Assumptions

The system at any given time is in one particular state.
There are a finite number of states.
Transitions have an associated incremental metric.
Events are cumulative over a path, i.e., additive in some sense.

Page 9: Stochastic Methods for NLP

Viterbi Algorithm - Code

See the Wikipedia article: http://en.wikipedia.org/wiki/Viterbi_algorithm.
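Since the slide defers to the Wikipedia page, here is a minimal sketch of the standard dynamic-programming formulation. The toy two-state weather HMM at the bottom is only an illustration; its numbers are not from any real data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence (the Viterbi path) and its
    probability for a given observation sequence.
    Runs in O(len(observations) * len(states)**2)."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, prob

# Toy two-state HMM; numbers invented for illustration.
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
```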

Page 10: Stochastic Methods for NLP

Uses in NLP

Parts of speech (POS) tagging: The observations are the words of the sentences. The HMM nodes are the parts of speech. (A small sketch follows below.)
Speech recognition: The observations are the sound waves (after some processing). The HMM may contain the words in the text sentence, or the phonemes.
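Continuing the POS tagging case: the observations are words and the hidden states are tags. The snippet below reuses the viterbi() function from the earlier sketch; the probability tables are invented for illustration and would normally be estimated from a tagged corpus.

```python
# POS tagging with the viterbi() sketch from the "Viterbi Algorithm - Code" slide.
# All probabilities are invented for illustration.
tags = ["Det", "Noun", "Verb"]
start_p = {"Det": 0.8, "Noun": 0.15, "Verb": 0.05}
trans_p = {"Det":  {"Det": 0.01, "Noun": 0.89, "Verb": 0.10},
           "Noun": {"Det": 0.10, "Noun": 0.20, "Verb": 0.70},
           "Verb": {"Det": 0.60, "Noun": 0.30, "Verb": 0.10}}
emit_p = {"Det":  {"the": 0.9, "dog": 0.0,  "barks": 0.0},
          "Noun": {"the": 0.0, "dog": 0.6,  "barks": 0.1},
          "Verb": {"the": 0.0, "dog": 0.01, "barks": 0.7}}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# -> (['Det', 'Noun', 'Verb'], ...)
```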

Page 11: Stochastic Methods for NLP

Statistical Decision-Tree Model

SPATTER (Magerman)

Alternative approach to CFGs.
Uses statistical measures generated by hand annotation of a large corpus of text.
Automatically discovers disambiguation criteria for parsing.
Uses a stack decoding algorithm.
Finds one tree, then uses branch-and-bound.

Page 12: Stochastic Methods for NLP

Stack Decoding Algorithm

Similar to beam search, but claims to use a stack instead of a priority queue.
The n best nodes (partial solutions) are kept.
The best node is expanded and its children are put on the stack.
The stack is then trimmed back to n nodes.
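A hedged sketch of that loop (the expand, score, and is_goal callables are placeholders invented for illustration, not SPATTER's actual interfaces):

```python
import heapq

def stack_decode(start, expand, score, is_goal, n=10):
    """Keep the n best partial solutions (the 'stack'); repeatedly expand
    the best one, push its children, and trim back to n nodes.
    expand, score, and is_goal are placeholder callables supplied by the caller."""
    stack = [start]
    while stack:
        stack.sort(key=score, reverse=True)          # best node first
        best = stack.pop(0)                          # expand the best node
        if is_goal(best):
            return best                              # first complete solution found
        stack.extend(expand(best))                   # children go on the stack
        stack = heapq.nlargest(n, stack, key=score)  # trim back to n nodes
    return None
```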