Stochastic Methods for NLP


Page 1: Stochastic Methods for NLP

Stochastic Methods for NLP

Probabilistic Context-Free Parsers
Probabilistic Lexicalized Context-Free Parsers
Hidden Markov Models – Viterbi Algorithm
Statistical Decision-Tree Models

Page 2: Stochastic Methods for NLP

Probabilistic CFG

1. sent <- np, vp.      p(sent) = p(r1) * p(np) * p(vp).
2. np <- noun.          p(np) = p(r2) * p(noun).
...
9. noun <- dog.         p(noun) = p(dog).

The probabilities are taken from a particular corpus of text.
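As a rough sketch of how such a parse probability is computed (Python; the grammar and rule probabilities below are invented for illustration, not estimated from any real corpus):

```python
# Minimal sketch of scoring a parse with a toy PCFG.
# All rule probabilities are invented for illustration.
rule_prob = {
    "sent -> np vp": 1.0,    # p(r1)
    "np -> noun":    0.4,    # p(r2)
    "vp -> verb":    0.3,
    "noun -> dog":   0.05,   # p(dog)
    "verb -> barks": 0.02,
}

def parse_probability(rules_used):
    """Probability of a parse = product of the probabilities
    of every rule used in the derivation."""
    p = 1.0
    for rule in rules_used:
        p *= rule_prob[rule]
    return p

# p(sent) = p(r1) * p(np) * p(vp)
print(parse_probability([
    "sent -> np vp",
    "np -> noun", "noun -> dog",
    "vp -> verb", "verb -> barks",
]))
```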

Page 3: Stochastic Methods for NLP

Probabilistic Lexicalized CFG

1. sent <- np(noun), vp(verb).      p(sent) = p(r1) * p(np) * p(vp) * p(verb|noun).
2. np <- noun.                      p(np) = p(r2) * p(noun).
...
9. noun <- dog.                     p(noun) = p(dog).

Note that we've introduced the probability of a particular verb given a particular noun.
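A similar sketch for the lexicalized case, showing only the extra p(verb|noun) factor (the lexical probability table is invented for illustration):

```python
# Sketch: the lexicalized grammar multiplies in p(verb | head noun)
# on top of the plain PCFG rule probabilities. Numbers are invented.
p_verb_given_noun = {
    ("barks", "dog"): 0.30,
    ("meows", "dog"): 0.001,
}

def lexicalized_sent_prob(p_r1, p_np, p_vp, verb, noun):
    """p(sent) = p(r1) * p(np) * p(vp) * p(verb | noun)."""
    return p_r1 * p_np * p_vp * p_verb_given_noun.get((verb, noun), 1e-6)

print(lexicalized_sent_prob(1.0, 0.02, 0.006, "barks", "dog"))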

Page 4: Stochastic Methods for NLP

Markov Chain

Discrete random process: The system is in various states and we move from state to state. The probability of moving to a particular next state (a transition) depends solely on the current state and not previous states (the Markov property).

May be modeled by a finite state machine with probabilities on the edges.
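A minimal sketch of such a finite state machine with probabilities on the edges (the states and transition probabilities below are invented for illustration):

```python
import random

# Toy Markov chain: transition probabilities out of each state sum to 1.
transitions = {
    "Det":  {"Noun": 0.9, "Adj": 0.1},
    "Adj":  {"Noun": 1.0},
    "Noun": {"Verb": 0.7, "Noun": 0.3},
    "Verb": {"Det": 0.5, "Noun": 0.5},
}

def walk(state, steps):
    """Random walk: the next state depends only on the current state
    (the Markov property), not on how we got there."""
    path = [state]
    for _ in range(steps):
        options = transitions[state]
        state = random.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path

print(walk("Det", 5))
```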

Page 5: Stochastic Methods for NLP

Hidden Markov Model

Each state (or transition) may produce an output.
The outputs are visible to the viewer, but the underlying Markov model is not.
The problem is often to infer the path through the model given a sequence of outputs.
The probabilities associated with the transitions are known a priori.
There may be more than one start state. The probability of each start state may also be known.

Page 6: Stochastic Methods for NLP

Uses of HMM

Parts of speech (POS) tagging
Speech recognition
Handwriting recognition
Machine translation
Cryptanalysis
Many other non-NLP applications

Page 7: Stochastic Methods for NLP

Viterbi Algorithm

Used to find the most likely sequence of states (the Viterbi path) in an HMM that leads to a given sequence of observed events.
Runs in time proportional to (number of observations) * (number of states)^2.
Can be modified if the next state depends on the last n states (instead of just the last state); this takes time (number of observations) * (number of states)^n.

Page 8: Stochastic Methods for NLP

Viterbi Algorithm - Assumptions

The system at any given time is in one particular state.
There are a finite number of states.
Transitions have an associated incremental metric.
Events are cumulative over a path, i.e., additive in some sense.

Page 9: Stochastic Methods for NLP

Viterbi Algorithm - Code

See the Wikipedia article: http://en.wikipedia.org/wiki/Viterbi_algorithm.
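Since the slide defers to the Wikipedia page, here is a minimal sketch of the standard dynamic-programming formulation. The toy two-state weather HMM at the bottom is only an illustration; its numbers are not from any real data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence (the Viterbi path) and its
    probability for a given observation sequence.
    Runs in O(len(observations) * len(states)**2)."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, prob

# Toy two-state HMM; numbers invented for illustration.
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
```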

Page 10: Stochastic Methods for NLP

Uses in NLP

Parts of speech (POS) tagging: The observations are the words of the sentences. The HMM nodes are the parts of speech. (A small sketch follows below.)
Speech recognition: The observations are the sound waves (after some processing). The HMM may contain the words in the text sentence, or the phonemes.
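Continuing the POS tagging case: the observations are words and the hidden states are tags. The snippet below reuses the viterbi() function from the earlier sketch; the probability tables are invented for illustration and would normally be estimated from a tagged corpus.

```python
# POS tagging with the viterbi() sketch from the "Viterbi Algorithm - Code" slide.
# All probabilities are invented for illustration.
tags = ["Det", "Noun", "Verb"]
start_p = {"Det": 0.8, "Noun": 0.15, "Verb": 0.05}
trans_p = {"Det":  {"Det": 0.01, "Noun": 0.89, "Verb": 0.10},
           "Noun": {"Det": 0.10, "Noun": 0.20, "Verb": 0.70},
           "Verb": {"Det": 0.60, "Noun": 0.30, "Verb": 0.10}}
emit_p = {"Det":  {"the": 0.9, "dog": 0.0,  "barks": 0.0},
          "Noun": {"the": 0.0, "dog": 0.6,  "barks": 0.1},
          "Verb": {"the": 0.0, "dog": 0.01, "barks": 0.7}}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# -> (['Det', 'Noun', 'Verb'], ...)
```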

Page 11: Stochastic Methods for NLP

Statistical Decision-Tree Model

SPATTER (Magerman)

Alternative approach to CFGs.
Uses statistical measures generated by hand annotation of a large corpus of text.
Automatically discovers disambiguation criteria for parsing.
Uses a stack decoding algorithm.
Finds one tree, then uses branch-and-bound.

Page 12: Stochastic Methods for NLP

Stack Decoding Algorithm

Similar to beam search, but claims to use a stack instead of a priority queue.
The n best nodes (partial solutions) are kept.
The best node is expanded and its children are put on the stack.
The stack is then trimmed back to n nodes.
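A hedged sketch of that loop (the expand, score, and is_goal callables are placeholders invented for illustration, not SPATTER's actual interfaces):

```python
import heapq

def stack_decode(start, expand, score, is_goal, n=10):
    """Keep the n best partial solutions (the 'stack'); repeatedly expand
    the best one, push its children, and trim back to n nodes.
    expand, score, and is_goal are placeholder callables supplied by the caller."""
    stack = [start]
    while stack:
        stack.sort(key=score, reverse=True)          # best node first
        best = stack.pop(0)                          # expand the best node
        if is_goal(best):
            return best                              # first complete solution found
        stack.extend(expand(best))                   # children go on the stack
        stack = heapq.nlargest(n, stack, key=score)  # trim back to n nodes
    return None
```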