announcements - university of washington...boulder beer mojo ipa goose island india pale ale great...


Announcements

©2017 Kevin Jamieson

• HW3 problem 4c


Sequences and Recurrent Neural Networks

Machine Learning – CSE4546 Kevin Jamieson University of Washington

November 30, 2017

Variable length sequences

Images are usually standardized to be the same size (e.g., 256x256x3)

[Figure: a fixed-size image fed into a Neural Network]

But what if we wanted to do classification on country-of-origin for names?

[Figure: the name "Hinton" fed into a Recurrent Neural Network, with candidate labels Scottish, English, Irish]

[Figures: Recurrent Neural Network diagrams, labeled "Standard RNN" and "LSTM"]

Slide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
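To make the variable-length idea concrete, here is a minimal sketch of a standard RNN forward pass that reads a name one character at a time and outputs class probabilities. It is not taken from the slides: the weights are random and untrained, and `rnn_classify`, `HIDDEN`, and the toy alphabet are hypothetical names chosen for illustration.

```python
import numpy as np

CLASSES = ["Scottish", "English", "Irish"]  # label set from the slide's example
ALPHABET = "abcdefghijklmnopqrstuvwxyz"     # toy alphabet (assumption)
HIDDEN = 16                                 # hypothetical hidden-state size

rng = np.random.default_rng(0)
# Randomly initialized (untrained) parameters, shared across all time steps.
W_xh = rng.normal(scale=0.1, size=(HIDDEN, len(ALPHABET)))
W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
b_h = np.zeros(HIDDEN)
W_hy = rng.normal(scale=0.1, size=(len(CLASSES), HIDDEN))
b_y = np.zeros(len(CLASSES))


def one_hot(ch):
    """Encode a single lowercase letter as a one-hot vector."""
    v = np.zeros(len(ALPHABET))
    v[ALPHABET.index(ch)] = 1.0
    return v


def rnn_classify(name):
    """Return class probabilities for a name of any length."""
    h = np.zeros(HIDDEN)
    for ch in name.lower():
        if ch not in ALPHABET:
            continue  # skip characters outside the toy alphabet
        # Standard RNN update: the same weights are applied at every step.
        h = np.tanh(W_xh @ one_hot(ch) + W_hh @ h + b_h)
    logits = W_hy @ h + b_y
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


probs_short = rnn_classify("Ng")
probs_long = rnn_classify("Hinton")
```

Both calls return a length-3 probability vector even though the inputs have different lengths, because the hidden state `h` has a fixed size regardless of how many characters were read.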

Basic Text/Document Processing

TF*IDF

n documents/articles with lots of text

How to get a feature representation of each article?

1. For each document d compute the proportion of times word t occurs out of all words in d, i.e. term frequency

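2. For each word t in your corpus, compute the proportion of documents out of n in which the word t occurs, i.e., document frequency

3. Compute score for word t in document d as

$$\frac{\mathrm{TF}_{d,t}}{\mathrm{DF}_t} \quad \text{or} \quad \mathrm{TF}_{d,t} \log\left(\frac{1}{\mathrm{DF}_t}\right)$$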
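The three steps can be sketched in a few lines of plain Python. This is an illustrative sketch using the log form of the score; the function name `tf_idf` and the three toy "reviews" are assumptions, not data from the slides.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {word: score} dict per document."""
    n = len(docs)
    # Document frequency: proportion of the n documents that contain word t.
    doc_count = Counter()
    for tokens in docs:
        doc_count.update(set(tokens))
    df = {t: c / n for t, c in doc_count.items()}

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (share of the document's words) times log(1 / DF_t).
        scores.append(
            {t: (c / total) * math.log(1.0 / df[t]) for t, c in counts.items()}
        )
    return scores


# Toy corpus (made up for illustration).
docs = [
    "hoppy bitter citrus hoppy".split(),
    "dark roasty coffee bitter".split(),
    "crisp light citrus".split(),
]
scores = tf_idf(docs)
```

In the first toy document, "hoppy" scores higher than "bitter": it occurs more often in that document and in fewer documents overall. A word that appears in every document gets a score of zero.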

BeerMapper - Under the Hood

Algorithm requires feature representations of the beers $\{x_1, \dots, x_n\} \subset \mathbb{R}^d$

Pipeline: Reviews for each beer → Bag of Words weighted by TF*IDF → Get 100 nearest neighbors using cosine distance → Non-metric multidimensional scaling → Embedding in d dimensions

Two Hearted Ale - Input: ~2500 natural language reviews

http://www.ratebeer.com/beer/two-hearted-ale/1502/2/1/

Two Hearted Ale - Weighted Bag of Words: [figure not recovered]

Weighted count vector for the ith beer: $z_i \in \mathbb{R}^{400{,}000}$

Cosine distance:

$$d(z_i, z_j) = 1 - \frac{z_i^T z_j}{\|z_i\| \, \|z_j\|}$$

Two Hearted Ale - Nearest Neighbors: Bear Republic Racer 5, Avery IPA, Stone India Pale Ale (IPA), Founders Centennial IPA, Smuttynose IPA, Anderson Valley Hop Ottin IPA, AleSmith IPA, BridgePort IPA, Boulder Beer Mojo IPA, Goose Island India Pale Ale, Great Divide Titan IPA, New Holland Mad Hatter Ale, Lagunitas India Pale Ale, Heavy Seas Loose Cannon Hop3, Sweetwater IPA
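As a rough illustration of the nearest-neighbor step, the sketch below computes cosine distances between rows of a small matrix and ranks neighbors. The toy matrix `Z` and the helper names are assumptions; the real vectors have about 400,000 dimensions and would normally be stored sparsely. The sketch also assumes no row is all zeros.

```python
import numpy as np


def cosine_distance_matrix(Z):
    """Z: (n, p) array of weighted count vectors, one nonzero row per item."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    U = Z / norms          # unit-normalize each row
    return 1.0 - U @ U.T   # d(z_i, z_j) = 1 - z_i^T z_j / (||z_i|| ||z_j||)


def nearest_neighbors(Z, i, k):
    """Indices of the k items closest to item i (excluding i itself)."""
    d = cosine_distance_matrix(Z)[i]
    order = np.argsort(d)
    return [j for j in order if j != i][:k]


# Toy data: rows 0 and 1 point in nearly the same direction, row 2 does not.
Z = np.array([[2.0, 1.0, 0.0],
              [4.0, 2.1, 0.0],
              [0.0, 0.5, 3.0]])
neighbors = nearest_neighbors(Z, 0, k=2)
```

Cosine distance ignores vector length, so row 1 is the closest neighbor of row 0 even though it is roughly twice as long. This is the property that makes it suitable for comparing beers with very different numbers of reviews.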

Find an embedding $\{x_1, \dots, x_n\} \subset \mathbb{R}^d$ such that

$$\|x_k - x_i\| < \|x_k - x_j\| \quad \text{whenever} \quad d(z_k, z_i) < d(z_k, z_j)$$

for all 100-nearest neighbors, where $d(\cdot, \cdot)$ is the distance in 400,000 dimensional “word space”.

($10^7$ constraints, $10^5$ variables)

Solve with hinge loss and stochastic gradient descent.

Could have also used local-linear-embedding, max-volume-unfolding, kernel-PCA, etc.

(20 minutes on my laptop) (d=2, err=6%) (d=3, err=4%)
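One way to set up the hinge-loss formulation is sketched below on toy data. The slides do not give the exact loss, so the squared distances, the unit margin, the step size, and the names `ordinal_embedding` and `triplet_error` are all assumptions made for illustration.

```python
import numpy as np


def ordinal_embedding(triplets, n, d=2, steps=20000, lr=0.05, seed=0):
    """Fit X (n x d) so that ||x_k - x_i|| < ||x_k - x_j|| for each (k, i, j).

    Assumed loss: max(0, 1 + ||x_k - x_i||^2 - ||x_k - x_j||^2), plain SGD.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n, d))
    for _ in range(steps):
        k, i, j = triplets[rng.integers(len(triplets))]
        d_ki = X[k] - X[i]
        d_kj = X[k] - X[j]
        if 1.0 + d_ki @ d_ki - d_kj @ d_kj > 0:  # violated or inside the margin
            # Gradients of the active hinge term w.r.t. x_k, x_i, x_j.
            g_k = 2 * d_ki - 2 * d_kj
            g_i = -2 * d_ki
            g_j = 2 * d_kj
            X[k] -= lr * g_k
            X[i] -= lr * g_i
            X[j] -= lr * g_j
    return X


def triplet_error(X, triplets):
    """Fraction of triplets whose ordering is violated by the embedding."""
    T = np.asarray(triplets)
    d_ki = np.linalg.norm(X[T[:, 0]] - X[T[:, 1]], axis=1)
    d_kj = np.linalg.norm(X[T[:, 0]] - X[T[:, 2]], axis=1)
    return float(np.mean(d_ki >= d_kj))


# Toy stand-in for the high-dimensional vectors z_i (made-up data).
n = 20
Z = np.random.default_rng(1).normal(size=(n, 5))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
D = 1.0 - U @ U.T  # cosine distances in the original space

# One triplet (k, i, j) for every ordering d(z_k, z_i) < d(z_k, z_j).
triplets = [(k, i, j)
            for k in range(n) for i in range(n) for j in range(n)
            if len({k, i, j}) == 3 and D[k, i] < D[k, j]]

X_random = np.random.default_rng(0).normal(scale=0.1, size=(n, 2))
err_before = triplet_error(X_random, triplets)
X = ordinal_embedding(triplets, n=n, d=2)
err_after = triplet_error(X, triplets)
```

`triplet_error` plays the role of the "err" figure quoted on the slide: the fraction of ordering constraints the low-dimensional embedding fails to satisfy. Only the ordering of distances is used, never their values, which is what makes the scaling non-metric.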


Sanity check: styles should cluster together and similar styles should be close.

[Figure: embedding of the beers with style clusters labeled IPA, Pale ale, Brown ale, Porter, Stout, Doppelbock, Belgian dark, Lambic, Wheat, Belgian light, Wit, Light lager, Pilsner, Amber, Blond]

Other document modeling

Matrix factorization:

1. Construct word x document matrix of counts

2. Compute non-negative matrix factorization

3. Use factorization to represent documents

4. Cluster documents into topics

Also see latent Dirichlet allocation (LDA)
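The four steps can be sketched with NumPy as follows. The slides do not say which NMF solver is used; this sketch assumes the classic multiplicative updates for the squared-error objective, and the tiny count matrix `V` is invented for illustration.

```python
import numpy as np


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (words x documents) as W @ H, W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_words, n_docs = V.shape
    W = rng.random((n_words, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Step 1: toy word x document count matrix (6 words, 4 documents).
V = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 2, 3],
              [0, 0, 3, 2],
              [0, 0, 1, 1]], dtype=float)

# Step 2: non-negative factorization with two latent factors.
W, H = nmf(V, rank=2)

# Step 3: column j of H is the low-dimensional representation of document j.
doc_features = H.T

# Step 4: assign each document to its dominant factor ("topic").
topic_of_doc = H.argmax(axis=0)
```

The columns of `W` can be read as topics (non-negative weights over words), and the columns of `H` say how much of each topic a document contains.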

Word embeddings, word2vec

Previous section presented methods to embed documents into a latent space

Alternatively, we can embed words into a latent space

This embedding came from directly querying for relationships.

word2vec is a popular unsupervised learning approach that just uses a text corpus (e.g. nytimes.com)

slide: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

Training neural network to predict co-occurring words. Use first layer weights as embedding, throw out output layer.

For example, the predicted probability that "car" appears near the input word "ants" is the softmax

$$\frac{e^{\langle x_{\text{ants}}, y_{\text{car}} \rangle}}{\sum_i e^{\langle x_{\text{ants}}, y_i \rangle}}$$
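A toy version of this skip-gram model is sketched below: `X` holds the first-layer weights (the embeddings), `Y` holds the output-layer weights, and the predicted probability is a softmax over inner products. The five-word vocabulary, the training pairs, and the learning rate are made up for illustration. Real word2vec implementations use tricks such as negative sampling that are not shown here.

```python
import numpy as np

vocab = ["ants", "car", "queen", "colony", "road"]  # toy vocabulary (assumption)
idx = {w: i for i, w in enumerate(vocab)}
dim = 3                                             # hypothetical embedding size

rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(len(vocab), dim))   # first layer: embeddings x_w
Y = rng.normal(scale=0.1, size=(len(vocab), dim))   # output layer: y_w


def context_probs(word):
    """Softmax over the vocabulary: P(context = i | input = word)."""
    scores = Y @ X[idx[word]]                # inner products <x_word, y_i>
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def sgd_step(word, context, lr=0.1):
    """One gradient step on -log P(context | word)."""
    err = context_probs(word)
    err[idx[context]] -= 1.0                 # gradient w.r.t. the scores
    x = X[idx[word]].copy()
    X[idx[word]] -= lr * (Y.T @ err)         # update the input embedding
    Y[:] -= lr * np.outer(err, x)            # update the output weights


# Made-up (input word, co-occurring word) training pairs.
pairs = [("ants", "colony"), ("ants", "queen"), ("car", "road")]

p_before = context_probs("ants")[idx["colony"]]
for _ in range(200):
    for w, c in pairs:
        sgd_step(w, c)
p_after = context_probs("ants")[idx["colony"]]

embeddings = X  # keep the first-layer weights, throw out Y
```

After training, the rows of `X` are the word vectors. `Y` only exists to define the prediction task and is discarded.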

word2vec outputs

slide: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/

king - man + woman = queen

country - capital
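The analogy arithmetic can be illustrated with hand-made vectors. The 2-D vectors below are invented so that one axis loosely means "royalty" and the other "gender"; they are not real word2vec outputs, which are learned and typically have hundreds of dimensions.

```python
import numpy as np

# Hand-made toy vectors (assumption): axis 0 ~ "royalty", axis 1 ~ "gender".
emb = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}


def analogy(a, b, c):
    """Word whose vector is closest (by cosine) to a - b + c, excluding a, b, c."""
    target = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -np.inf
    for word, v in emb.items():
        if word in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best


result = analogy("king", "man", "woman")
```

With learned embeddings the match is only approximate, so the answer is taken to be the nearest vocabulary word to `king - man + woman` rather than an exact equality.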

Active Learning, classification

Impressive recent advances in image recognition and translation…

[Figure: plot against Time of "Amount of data needed for state of the art model" and "Number of available labels"]

Challenges for large models:

1) An enormous amount of labeled data is necessary for training

2) An enormous amount of wall-clock time is necessary for training

Example: Image recognition

[Figure: images from the classes airplane, automobile, and bird plotted against feature 1 and feature 2, in two panels titled "Nonadaptive label assignment" and "Adaptive label assignment"]

[Figure: error versus # labels for random sampling ($x_1, x_2, \dots$ i.i.d.) and adaptive sampling ($x_j$ may depend on $\{x_i\}_{i<j}$), with a further axis for complexity (reliability/robustness, scalability/computation, etc)]

Being convinced that data-collection should be adaptive is not the same thing as knowing how to be adaptive.

Caption Contest #553, January 20, 2017

Third: “Maybe his second week will go better”

Second: “I’d like to see other people”

First: “The corrupt media will blow this way out of proportion”

• n ≈ 5000 captions submitted each week

• crowdsource contest to volunteers who rate captions

• goal: identify funniest caption

• 50+ weeks of experiments

Bob Mankoff, Cartoon Editor, The New Yorker

newyorker.com/cartoons/vote

“It's amazing to think he started out in the lobby.”

“I thought all our plants moved to Mexico.”

“Be patient. He'll grow on you.”

Which caption do we show next?

1) Non-adaptive: uniform distribution over captions

2) Adaptive: stop showing captions that will not win

[Figure: Non-Adaptive versus Adaptive. 4-5 times fewer ratings needed]

Best-action identification problem

While algorithm does not exit (Stopping rule):
- algorithm shows caption $i \in \{1, \dots, n\}$ (Sampling rule)
- Observe iid Bernoulli with $P(\text{“funny”}) = \mu_i$

Objective: with probability .99, identify $\arg\max_{i=1,\dots,n} \mu_i$ using as few total samples as possible

Best-arm Identification n=2

Consider n = 2 and flip coins i = 1, 2 to get $X_{i,1}, X_{i,2}, \dots, X_{i,m}$

$$\hat{\mu}_{i,m} = \frac{1}{m} \sum_{j=1}^{m} X_{i,j}$$

[Figure: distributions ("prob") of the Number of heads for the two coins, centered at $\mu_2$ and $\mu_1$, each with width on the order of $1/\sqrt{m}$]

Test: $\hat{\mu}_{1,m} - \hat{\mu}_{2,m} \geq 0$

By a Chernoff Bound, if $\Delta = \mu_1 - \mu_2$ then

$$m = 2 \log(1/\delta) \Delta^{-2} \implies \hat{\mu}_{1,m} > \hat{\mu}_{2,m} + 2\sqrt{\frac{\log(1/\delta)}{2m}} \implies \mu_1 > \mu_2$$

with probability $\geq 1 - 2\delta$

(Arm 1 lower confidence bound > Arm 2 upper confidence bound)
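A quick simulation can sanity-check the sample size $m = 2\log(1/\delta)\Delta^{-2}$ together with the simple test $\hat{\mu}_{1,m} - \hat{\mu}_{2,m} \geq 0$. The coin biases, $\delta$, and the number of trials below are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math
import numpy as np


def samples_needed(delta, gap):
    """m = 2 log(1/delta) / gap^2 flips per coin, rounded up."""
    return math.ceil(2.0 * math.log(1.0 / delta) / gap ** 2)


def pick_better_coin(mu1, mu2, m, rng):
    """Flip each coin m times; return 1 if coin 1's empirical mean is at least coin 2's."""
    mu1_hat = rng.binomial(m, mu1) / m
    mu2_hat = rng.binomial(m, mu2) / m
    return 1 if mu1_hat - mu2_hat >= 0 else 2


delta, mu1, mu2 = 0.01, 0.6, 0.4  # hypothetical values for illustration
m = samples_needed(delta, mu1 - mu2)

rng = np.random.default_rng(0)
trials = 2000
success_rate = sum(
    pick_better_coin(mu1, mu2, m, rng) == 1 for _ in range(trials)
) / trials
```

The number of flips scales as $\Delta^{-2}$: halving the gap between the two coins roughly quadruples `m`, while tightening $\delta$ only costs a logarithmic factor.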

[Figure: for each of the captions 1, 2, 3, ..., n-1, n, the average # votes for “Funny” on a 0 to 1 axis, drawn with a confidence interval $\propto \sqrt{\log(n) / \#\text{votes}}$; the gaps $\Delta_2, \Delta_3$ are marked]

more votes $\implies$ smaller intervals

Non-adaptive: keep sampling until intervals do not overlap

# votes: $n \max_{i=1,\dots,n} \Delta_i^{-2} \log(n)$

Successive Elimination [Even-dar…’06]: stop sampling caption i as soon as no overlap

# votes: $\sum_{i=1}^{n} \Delta_i^{-2} \log(n)$
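The elimination idea can be sketched as a small simulation. This is an illustrative sketch rather than the exact algorithm from the cited paper: the constant inside the logarithm of the confidence radius is one common anytime choice, and the "funny" probabilities in `means` are invented.

```python
import math
import numpy as np


def successive_elimination(means, delta=0.01, seed=0, max_rounds=100000):
    """Return (best_arm, pulls), where pulls[i] counts the samples of arm i."""
    rng = np.random.default_rng(seed)
    n = len(means)
    active = list(range(n))
    sums = np.zeros(n)
    pulls = np.zeros(n, dtype=int)
    for t in range(1, max_rounds + 1):
        # Sample every surviving arm once (a Bernoulli "funny" vote).
        for i in active:
            sums[i] += rng.random() < means[i]
            pulls[i] += 1
        # Confidence radius shrinks like sqrt(log(.) / #votes).
        radius = math.sqrt(math.log(4.0 * n * t * t / delta) / (2.0 * t))
        mu_hat = sums[active] / t  # every active arm has exactly t samples
        leader_lcb = mu_hat.max() - radius
        # Drop arm i once its upper bound falls below the leader's lower bound.
        active = [i for i, m in zip(active, mu_hat) if m + radius >= leader_lcb]
        if len(active) == 1:
            break
    best = max(active, key=lambda i: sums[i] / pulls[i])
    return best, pulls


means = [0.9, 0.5, 0.4, 0.3, 0.2]  # hypothetical "funny" probabilities
best, pulls = successive_elimination(means)
```

Arms with a large gap to the leader are dropped after few votes, so the total `pulls.sum()` is driven by the sum of the per-arm costs. A non-adaptive scheme would keep sampling every arm until the hardest comparison is resolved.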

Pure Exploration

• Learn an accurate classifier using a small number of labels

• Find the winner of a competition using a small number of judgements

Very related to adaptive A/B testing

Balance of exploration versus exploitation

• Find the ad that results in highest click-through-rate and keep showing it
