
Page 1: A* Search

A* Search

• Uses evaluation function f(n) = g(n) + h(n), where n is a node.

– g is a cost function
• Total cost incurred so far from the initial state to node n
• For the 8-puzzle, each move has equal cost

– h is a heuristic that estimates the cost to the goal

Page 2: A* Search

Example heuristic h for the 8-puzzle: Hamming distance (the number of misplaced tiles)

Page 3: A* Search

A* Pseudocode

create the open list of nodes, initially containing only our starting node
create the closed list of nodes, initially empty

while (we have not reached our goal) {
    consider the best node in the open list (the node with the lowest f value)
    if (this node is the goal) {
        then we're done
    } else {
        move the current node to the closed list and consider all of its successors
        for (each successor) {
            if (this successor is in the closed list and our current g value is lower) {
                update the successor with the new, lower g value
                change the successor's parent to our current node
            } else if (this successor is in the open list and our current g value is lower) {
                update the successor with the new, lower g value
                change the successor's parent to our current node
            } else if (this successor is not in either the open or closed list) {
                add the successor to the open list and set its g value
            }
        }
    }
}
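As a concrete companion to the pseudocode, here is a minimal runnable Python sketch for the 8-puzzle, assuming unit move costs and the Hamming-distance heuristic from the earlier slides. It uses a priority queue with lazy deletion rather than updating open-list entries in place; names such as astar and successors are illustrative, not from the slides.

import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the blank tile

def hamming(state):
    """h(n): number of misplaced tiles (blank excluded)."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def successors(state):
    """States reachable by sliding one tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

def astar(start):
    """Return the list of states from start to GOAL, or None."""
    open_heap = [(hamming(start), 0, start)]    # entries are (f, g, node)
    parent, best_g, closed = {start: None}, {start: 0}, set()
    while open_heap:
        f, g, node = heapq.heappop(open_heap)   # best node: lowest f value
        if node == GOAL:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue                            # stale duplicate entry
        closed.add(node)
        for succ in successors(node):
            g2 = g + 1                          # each move has equal cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2               # found a lower g value
                parent[succ] = node
                heapq.heappush(open_heap, (g2 + hamming(succ), g2, succ))
    return None

print(len(astar((1, 2, 3, 4, 5, 6, 0, 7, 8))) - 1)   # 2 moves to solve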

Page 4: A* Search

Go over HW 2

Page 5: A* Search

Go over HW 3

Nim Demo: http://www.math.uri.edu/~bkaskosz/flashmo/marien.html

Page 6: A* Search

Presentations on Wednesday:

Nathan Nifong
Matthaus Litteken

Page 7: A* Search

Example of statistical language models: n-grams

• Estimates the probability distribution of a word w, given the n-1 words that have come before it in the sequence: P(w | w1, w2, …, wn-1)

• Purpose: to guess next word from previous words to disambiguate:

– “Students have access to a list of course requirements”

– “Would you like a drink of [garbled]?”

– “He loves going to the [bark].”

Page 8: A* Search

• N-grams have applications throughout natural language processing:
– text classification
– speech recognition
– machine translation
– intelligent spell-checking
– handwriting recognition
– playing “Jeopardy!”

Page 9: A* Search

• What is P(bark | he loves going to the)?
• What is P(park | he loves going to the)?

• Can estimate from a large corpus:

– P(w | w1, …, wn-1) = frequency of w1 … wn-1 w divided by frequency of w1 … wn-1

– Example: use Google hit counts as rough frequencies (see the sketch below)
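A minimal sketch of this relative-frequency estimate; the toy corpus and helper names here are made up for illustration, standing in for web-scale counts.

# Hypothetical toy corpus; in practice the counts would come from a much
# larger source such as web hit counts.
corpus = ("he loves going to the park . he loves going to the movies . "
          "the dog loves to bark .").split()

def ngram_count(tokens, seq):
    """How many times the word sequence seq occurs in tokens."""
    n = len(seq)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i+n] == seq)

def cond_prob(tokens, history, w):
    """P(w | history) = frequency(history w) / frequency(history)."""
    denom = ngram_count(tokens, history)
    return ngram_count(tokens, history + [w]) / denom if denom else 0.0

hist = "going to the".split()
print(cond_prob(corpus, hist, "park"))   # 0.5 on this toy corpus
print(cond_prob(corpus, hist, "bark"))   # 0.0 -- never observed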

Page 10: A* Search

Problem! Web doesn’t give us enough examples to get good statistics.

One solution:
– Approximate P(w | w1, …, wn-1) by using small n (e.g., n = 2: bigrams).

Bigram example: P(bark | the) vs. P(park | the)

(calculate using Google)

Trigram: P(bark | to the) vs. P(park | to the)

Page 11: A* Search

Typically, bigrams are used:

Let candidate utterance s = w1 w2 ... wn

Then, by the chain rule plus the bigram approximation:

P(s) = ∏ k=1..n P(wk | wk-1)

where each bigram probability is estimated from counts:

P(wk | wk-1) = C(wk-1 wk) / C(wk-1), where C stands for “count”

Page 12: A* Search

Now can calculate probability of utterance:

P(he loves going to the bark) ≈ P(he | <s>) P(loves | he) P(going | loves) P(to | going) P(the | to) P(bark | the) P(</s> | bark)

<s> = sentence start marker
</s> = sentence end marker
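A minimal sketch of this computation, assuming bigram counts gathered from a toy corpus; the corpus and names are illustrative, not from the slides.

from collections import Counter

sentences = [["he", "loves", "going", "to", "the", "park"],
             ["he", "loves", "going", "to", "the", "bark"]]  # toy data

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    words = ["<s>"] + s + ["</s>"]
    unigrams.update(words[:-1])                  # context words w_{k-1}
    bigrams.update(zip(words[:-1], words[1:]))   # pairs (w_{k-1}, w_k)

def sentence_prob(s):
    """P(s) = product over k of C(w_{k-1} w_k) / C(w_{k-1})."""
    words = ["<s>"] + s + ["</s>"]
    p = 1.0
    for prev, w in zip(words[:-1], words[1:]):
        p *= bigrams[(prev, w)] / unigrams[prev]
    return p

print(sentence_prob(["he", "loves", "going", "to", "the", "bark"]))  # 0.5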

Page 13: A* Search

Text classification using N-grams

Page 14: A* Search

Mini-example (adapted from Jurafsky & Martin, 2000)

Corpus 1 (Class 1): <s> I am he as you are he </s>

Corpus 2 (Class 2): <s> I am the Walrus </s> <s> I am the egg man </s>

Bigram probabilities (examples, computed over both corpora):
P(I | <s>) = 1    P(am | I) = 1    P(man | egg) = 1
P(are | you) = 1    P(egg | the) = .5    P(the | am) = .67

Page 15: A* Search

Mini-example, continued (corpora and bigram probabilities as above)

New sentence 1: “They are the egg man”

Page 16: A* Search

Mini-example, continued (corpora and bigram probabilities as above)

New sentence 1: “They are the egg man”
New sentence 2: “Goo goo g’joob”
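A sketch of how the mini-example becomes a classifier: train one bigram model per class, then score each new sentence under both. Function names are illustrative; the result also exposes the sparse-data problem that the Smoothing slide at the end addresses.

from collections import Counter

def train(sentences):
    """Build unsmoothed bigram counts for one class."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        uni.update(words[:-1])
        bi.update(zip(words[:-1], words[1:]))
    return uni, bi

def prob(model, sentence):
    """Bigram probability of the sentence under one class's model."""
    uni, bi = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, w in zip(words[:-1], words[1:]):
        p *= (bi[(prev, w)] / uni[prev]) if uni[prev] else 0.0
    return p

class1 = train(["I am he as you are he"])
class2 = train(["I am the Walrus", "I am the egg man"])

for s in ["They are the egg man", "Goo goo g'joob"]:
    print(s, prob(class1, s), prob(class2, s))
# Both sentences score 0 under both classes: "They" and "Goo" never occur
# in training, so an unsmoothed model zeroes out the whole product --
# exactly the sparse-data problem that smoothing (last slide) fixes.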

Page 17: A* Search

N-gram approximation to Shakespeare (Jurafsky and Martin, 2000)

• Trained unigram, bigram, trigram, and quadrigram models on the complete corpus of Shakespeare’s works (including punctuation).

• Used these models to generate random sentences, choosing each successive unigram/bigram/trigram/quadrigram probabilistically (see the sketch below).
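A sketch of the generation procedure for the bigram case; the toy corpus stands in for the complete Shakespeare corpus, and the names are illustrative.

import random
from collections import defaultdict

corpus = "we few , we happy few , we band of brothers".split()
followers = defaultdict(list)
for prev, w in zip(corpus[:-1], corpus[1:]):
    followers[prev].append(w)        # repeats preserve bigram frequencies

def generate(start, length=8):
    """Each next word is drawn from the observed followers of the last."""
    out = [start]
    for _ in range(length):
        options = followers.get(out[-1])
        if not options:              # word only ever seen at corpus end
            break
        out.append(random.choice(options))
    return " ".join(out)

random.seed(0)
print(generate("we"))               # a random bigram walk starting at "we"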

Page 18: A* Search

Unigram model

1. To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have.

2. Every enter now severally so, let

3. Hill he late speaks; or! a more to leg less first you enter

4. Are where exeunt and sighs have rise excellency took of...Sleep knave we. near; vile like.

Page 19: A* Search

Bigram model

1. What means, sir. I confess she? then all sorts, he is trim, captain.

2. Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. Live king. Follow.

3. What we, hath got so she that I rest and sent to scold and nature bankrupt, nor the first gentleman?

4. Thou whoreson chops. Consumption catch your dearest friend, well, and I know where many mouths upon my undoing all but be, how soon, then; we’ll execute upon my love’s bonds and we do you will?

Page 20: A* Search

Trigram model

1. Sweet prince, Falstaff shall die. Harry of Monmouth’s grave.

2. This shall forbid it should be branded, if renown made it empty.

3. Indeed the duke; and had a very good friend.

4. Fly, and will rid me these news of price. Therefore the sadness of parting, as they say, ‘tis done.

Page 21: A* Search

Quadrigram model

1. King Henry. What! I will go seek the traitor Gloucester. Exeunt some of the watch. A great banquet serv’d in;

2. Will you not tell me who I am?

3. Indeed the short and long. Marry, ‘tis a noble Lepidus.

4. Enter Leonato’s brother Antonio, and the rest, but seek the weary beds of people sick.

Page 22: A* Search

From Cavnar and Trenkle, “N-Gram-Based Text Categorization” (1994)

• Early paper, but clearly lays out the main ideas of n-gram text classification.

• Categorization of USENET newsgroups
– by language
– by topic

Page 23: A* Search

Categorization requirements

Page 24: A* Search

N-grams (in this paper)

• N-character slice (rather than N-word slice)

• Examples: see the sketch below.
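A sketch of N-character slices, with the padding convention assumed: the paper pads words with blanks, shown here as underscores.

def char_ngrams(word, n):
    """All n-character slices of the word, padded with underscores."""
    padded = "_" + word + "_" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("TEXT", 2))  # ['_T', 'TE', 'EX', 'XT', 'T_']
print(char_ngrams("TEXT", 3))  # ['_TE', 'TEX', 'EXT', 'XT_', 'T__']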

Page 25: A* Search

Advantages of using character n-grams versus word n-grams

• Less sensitive to errors (e.g., in OCR documents)

• Helps deal with limited statistics problem (some words might not appear in document)

Page 26: A* Search

Frequency distribution of n-grams

• Zipf’s law: Frequency (n-gram) ≅ 1 / rank(n-gram)

Also true for words

Page 28: A* Search

Generate profile of document

Can also do this for entire category by putting all n-grams from all category documents in a single “bag of n-grams”
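A sketch of the profile step: count every character n-gram (the paper uses slices of length 1 through 5) and keep the top-ranked ones; the ranked list is the profile. The 300-n-gram cutoff and whitespace tokenization here are simplifying assumptions.

from collections import Counter

def char_ngrams(word, n):
    """Padded n-character slices, as in the earlier sketch."""
    padded = "_" + word + "_" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def profile(text, max_rank=300):
    """Rank-ordered list of the text's most frequent n-grams (n = 1..5)."""
    counts = Counter()
    for word in text.lower().split():   # tokenization simplified here
        for n in range(1, 6):
            counts.update(char_ngrams(word, n))
    return [g for g, _ in counts.most_common(max_rank)]  # cutoff assumed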

Page 29: A* Search

Observations

Page 32: A* Search

Measure profile distance

• Given profile for entire category (e.g., “cryptography”), can calculate distance from a new document to that category by comparing their profiles.

• For each n-gram in document profile, calculate how “out of place” it is in rank compared with its rank in the category profile.
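A sketch of the out-of-place measure over two rank-ordered profiles like those produced above; the penalty for n-grams absent from the category profile is an assumed choice.

def profile_distance(doc_profile, cat_profile, max_penalty=None):
    """Sum of out-of-place rank values for each document n-gram."""
    if max_penalty is None:
        max_penalty = len(cat_profile)   # penalty for absent n-grams; assumed
    cat_rank = {g: r for r, g in enumerate(cat_profile)}
    return sum(abs(r - cat_rank[g]) if g in cat_rank else max_penalty
               for r, g in enumerate(doc_profile))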

Page 34: A* Search

Classifying document

• To classify document D, calculate its distance from each category, and choose the category with minimum distance (must be below some threshold distance).

• If no category is below threshold distance, then class of D is “not known”.
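Putting the pieces together, reusing profile and profile_distance from the sketches above; the threshold value is hypothetical.

def classify(doc_text, category_profiles, threshold=50000):
    """category_profiles: dict mapping category name -> ranked profile."""
    doc = profile(doc_text)              # from the earlier sketch
    distances = {c: profile_distance(doc, p)
                 for c, p in category_profiles.items()}
    best = min(distances, key=distances.get)
    # threshold is a hypothetical rejection cutoff, per the slide
    return best if distances[best] < threshold else "not known"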

Page 35: A* Search

Cavnar and Trenkle’s Results

• Used newsgroup FAQs as “category” documents from which to “learn” n-gram models

Results (Confusion Matrix)

Page 36: A* Search

Smoothing

Needed to overcome problem of sparse data.

E.g., even in a large corpus, can get zero probability for valid bigrams.

Laplace smoothing (or “add-one” smoothing):
Add 1 to all the bigram counts (and add the vocabulary size V to each denominator, so the probabilities still sum to 1).
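A sketch of the smoothed estimate, where V is the vocabulary size:

def laplace_bigram_prob(prev, w, bigram_counts, unigram_counts, vocab_size):
    """P(w | prev) = (C(prev w) + 1) / (C(prev) + V)."""
    return ((bigram_counts.get((prev, w), 0) + 1)
            / (unigram_counts.get(prev, 0) + vocab_size))

With this, unseen bigrams such as (<s>, They) from the earlier mini-example get a small nonzero probability instead of zeroing out the whole sentence probability.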