TRANSCRIPT
The Art of Predictive Analytics: More Data, Same Models
[STUDY SLIDES]
Joseph Turian, [email protected]
@turian, MetaOptimize
2012.02.02
NOTE: These are the STUDY slides from my talk at the predictive analytics meetup: http://bit.ly/xVLBuS
I have removed some graphics and added some text. Please email me any questions.
Who am I?
Engineer with 20 yrs coding exp
PhD 10 yrs exp: large-scale ML + NLP
Founded MetaOptimize
What is MetaOptimize?
Consultancy + community on:
Large-scale ML + NLP
Well-engineered solutions
"Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge." - @aria42
http://metaoptimize.com/qa/
"A lot of expertise in machine learning is simply developing effective biases."
- Dan Melamed (quoted from memory)
What's a good choice of learning rate for the second layer of this neural net on image patches?
[intuition]
(Yoshua Bengio)
0.02!
Occam's Razor is a great example of ML intuition
Without the aid of prejudice and custom I should not be able to find my way across the room.
- William Hazlitt
It's fun to be a geek
Be an artist
How to build the world's biggest langid (langcat) model?
+ Vowpal Wabbit = Win
SOLVED.
The art of predictive analytics:
1) Know the data out there
2) Know the code out there
3) Intuition (bias)
A lot of data with one feature correlated with the label
Twitter sentiment analysis?
Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History http://bit.ly/Eusxi :)
"Distant supervision" (Go et al., 09)
(Use emoticons as labels)
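A minimal sketch of the distant-supervision trick (Go et al., 09): use emoticons as noisy labels, then strip them so the model can't cheat. The emoticon lists here are illustrative, not from the paper.

```python
# Distant supervision sketch: emoticons give noisy sentiment labels.
import re

POS = [":)", ":-)", ":D", "=)"]
NEG = [":(", ":-(", "=("]

def distant_label(tweet):
    """Return (text_without_emoticons, label) or None if unlabelable."""
    has_pos = any(e in tweet for e in POS)
    has_neg = any(e in tweet for e in NEG)
    if has_pos == has_neg:       # no emoticon, or contradictory ones
        return None
    label = 1 if has_pos else 0
    for e in POS + NEG:
        tweet = tweet.replace(e, " ")
    return re.sub(r"\s+", " ", tweet).strip(), label

print(distant_label("Awesome! RT @rupertgrintnet Harry Potter :)"))
# -> ('Awesome! RT @rupertgrintnet Harry Potter', 1)
```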
Recipe: You know a lot about the problem
Smart Priors
You know a lot about the problem: Smart Priors
Yarowsky (1995), WSD
1) One sense per collocation.
2) One sense per discourse.
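A hedged sketch of using "one sense per discourse" as a smart prior: after a base WSD model labels each occurrence, collapse every occurrence of a word within one document to its majority sense. Function and data names are illustrative, not from Yarowsky's paper.

```python
# "One sense per discourse" as a post-processing prior.
from collections import Counter

def one_sense_per_discourse(occurrences):
    """occurrences: list of (word, predicted_sense) within ONE document.
    Returns the list with each word's senses collapsed to its majority."""
    by_word = {}
    for word, sense in occurrences:
        by_word.setdefault(word, []).append(sense)
    majority = {w: Counter(s).most_common(1)[0][0] for w, s in by_word.items()}
    return [(word, majority[word]) for word, _ in occurrences]

print(one_sense_per_discourse(
    [("plant", "factory"), ("plant", "factory"), ("plant", "flora")]))
# -> all three occurrences collapse to 'factory'
```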
Recipe: You know a lot about the problem
Create new features
You know a lot about the problem: Create new features
Error-analysis
What errors is your model making?
DO SOME EXPLORATORY DATA ANALYSIS (EDA)
Andrew Ng: "Advice for applying ML": Where do the errors come from?
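One hedged way to operationalize that advice (the bucketing scheme is my illustration, not Ng's): count dev-set errors per bucket and attack the biggest bucket first.

```python
# Error analysis sketch: where do the errors come from?
from collections import Counter

def error_buckets(examples, gold, predicted, bucket_fn):
    """Count errors per bucket; bucket_fn maps an example to a bucket name."""
    errors = Counter()
    for x, y, y_hat in zip(examples, gold, predicted):
        if y != y_hat:
            errors[bucket_fn(x)] += 1
    return errors.most_common()

docs = ["short doc", "a much longer document over here", "mid size doc"]
print(error_buckets(docs, gold=[1, 0, 1], predicted=[0, 0, 0],
                    bucket_fn=lambda d: "long" if len(d) > 15 else "short"))
# -> [('short', 2)]: both errors come from short documents
```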
Recipe: You know a little about the problem
Semi-supervised learning
You know a little about the problem: Semi-supervised learning
JOINT semi-supervised learning
Ando and Zhang (2005)
Suzuki and Isozaki (2008), Suzuki et al. (2009), etc.
=> effective but task-specific
You know a little about the problem: Semi-supervised learning
Unsupervised learning, followed by supervised learning
[DIAGRAM] Supervised training: sup data -> sup model
How can Bob improve his model?
[DIAGRAM] Semi-sup training? Sup data -> sup model
[DIAGRAM] Semi-sup training? Sup data + more feats -> sup model
More features can be used on different tasks
[DIAGRAM] The same more-feats box feeds two pipelines (sup task 1 and sup task 2), each with its own sup data and sup model
[DIAGRAM] Joint semi-sup (the standard semi-sup setup): unsup data + sup data -> semi-sup model
[DIAGRAM] Unsupervised, then supervised: unsup data -> unsup model (unsup pretraining), then sup data -> semi-sup model (semi-sup fine-tuning)
[DIAGRAM] Use unsupervised learning to create new features: unsup data -> unsup training -> unsup feats
[DIAGRAM] Sup training: sup data + unsup feats -> semi-sup model
These features can then be shared with other people
[DIAGRAM] One set of unsup feats (unsup data -> unsup training) feeds sup task 1, sup task 2, and sup task 3
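A minimal end-to-end sketch of the last few diagrams, assuming scikit-learn (the talk doesn't name a library): run unsupervised k-means on plentiful unlabeled data, turn cluster IDs into extra features, and feed them to a small supervised learner.

```python
# unsup training -> unsup feats -> sup training
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_unsup = rng.randn(1000, 10)            # lots of unlabeled data
X_sup = rng.randn(50, 10)                # little labeled data
y_sup = (X_sup[:, 0] > 0).astype(int)

km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_unsup)

def add_unsup_feats(X):
    onehot = np.eye(20)[km.predict(X)]   # cluster id as a one-hot feature
    return np.hstack([X, onehot])

clf = LogisticRegression().fit(add_unsup_feats(X_sup), y_sup)
print(clf.score(add_unsup_feats(X_sup), y_sup))
```

The `km` object is exactly the shareable artifact from the last diagram: once trained, the same cluster features can be handed to sup task 1, 2, and 3.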
Recipe: You know almost nothing about the problem
Build cool generic features
Know almost nothing about problem: Build cool generic features
Word features (Turian et al., 2010)
http://metaoptimize.com/projects/wordreprs/
Brown clustering(Brown et al. 92)
(image from Terry Koo)
cluster(chairman) = `0010`
2-prefix(cluster(chairman)) = `00`
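A tiny sketch of turning Brown-cluster bit strings into multi-granularity prefix features, as in the chairman example above; the toy cluster table is made up.

```python
# Brown-cluster prefix features: shorter prefixes = coarser clusters.
CLUSTERS = {"chairman": "0010", "president": "0011", "ran": "1100"}

def prefix_feats(word, lengths=(2, 4)):
    path = CLUSTERS.get(word)
    if path is None:
        return []
    return ["cluster%d=%s" % (k, path[:k]) for k in lengths]

print(prefix_feats("chairman"))   # ['cluster2=00', 'cluster4=0010']
print(prefix_feats("president"))  # ['cluster2=00', 'cluster4=0011']
# chairman and president share the coarse 2-prefix feature.
```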
50-dim embeddings: Collobert + Weston (2008)
t-SNE vis by van der Maaten + Hinton (2008)
Know almost nothing about problem: Build cool generic features
Document features:
Document clustering
LSA/LDA
Deep model
Document features
Salakhutdinov + Hinton 06
Domain adaptation for sentiment analysis
(Glorot et al. 11)
Document features example
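For instance, a hedged LDA sketch, assuming scikit-learn (the talk only names LSA/LDA, not a library): topic proportions become a compact document feature vector for downstream models.

```python
# LDA document features: one row of topic proportions per document.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the movie was great fun",
        "terrible plot and bad acting",
        "great acting, fun plot"]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_feats = lda.fit_transform(X)   # dense (n_docs, n_topics) matrix
print(topic_feats.shape)             # (3, 2)
```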
Recipe: You know a little about the problem
Make more REAL training examples
Make more real training examples
'Cuz you have some time or a small budget
Amazon Mechanical Turk
Snow et al. 08: "Cheap and Fast – But is it Good?"
1K turk labels per dollar
Average over (5) Turks to reduce noise
=> http://crowdflower.com/
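A minimal sketch of "average over (5) Turks": majority vote over several noisy labels per item yields one cleaner label.

```python
# Majority vote over redundant crowd labels (Snow et al. 08 style).
from collections import Counter

def majority_vote(labels_per_item):
    """labels_per_item: list of label-lists, one list per item."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in labels_per_item]

print(majority_vote([["pos", "pos", "neg", "pos", "pos"],
                     ["neg", "neg", "pos", "neg", "neg"]]))
# -> ['pos', 'neg']
```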
Soylent (Bernstein et al. 10)
Find-Fix-Verify: crowd control design pattern
Soylent, a prototype...
Find a problem
Fix each problem
Verify quality of each fix
Make more real training examples
Active learning
Dualist (Settles 11): http://code.google.com/p/dualist/
Applications:
Document categorization
WSD
Information extraction
Twitter sentiment analysis
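A hedged sketch of the active-learning loop behind tools like Dualist, reduced to plain uncertainty sampling with scikit-learn (Dualist itself also queries features, not just instances): repeatedly ask the annotator about the example the current model is least sure of.

```python
# Uncertainty sampling: query the pool example with the lowest
# top-class probability under the current model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def most_uncertain(clf, X_pool):
    probs = clf.predict_proba(X_pool)
    return int(np.argmin(probs.max(axis=1)))

rng = np.random.RandomState(0)
X_lab, y_lab = rng.randn(10, 5), rng.randint(0, 2, 10)
X_pool = rng.randn(100, 5)

clf = LogisticRegression().fit(X_lab, y_lab)
print("ask the annotator about example", most_uncertain(clf, X_pool))
```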
You know a little about the problem: Make more training examples
FAKE training examples
NOISE
FAKE training examples
Denoising AA (autoencoder), RBM
MNIST distortions (LeCun et al. 98)
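A minimal sketch of manufacturing FAKE examples by corrupting real ones, in the spirit of denoising autoencoders and the MNIST distortions: each real example yields several noisy copies with the same label.

```python
# Noise-injection data augmentation: fake examples from real ones.
import numpy as np

def noisy_copies(X, y, n_copies=3, sigma=0.1, seed=0):
    rng = np.random.RandomState(seed)
    X_aug = [X] + [X + rng.normal(0, sigma, X.shape)
                   for _ in range(n_copies)]
    return np.vstack(X_aug), np.tile(y, n_copies + 1)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_aug, y_aug = noisy_copies(X, np.array([0, 1]))
print(X_aug.shape, y_aug.shape)   # (8, 2) (8,)
```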
No negative examples?
FAKE training examples
Multi-view / multi-modal
How do you evaluate an IR system if you have no labels?
See how good the title is at retrieving the body text.
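A hedged sketch of that label-free evaluation, assuming scikit-learn's TfidfVectorizer: treat each title as a query and check whether it ranks its own body text first.

```python
# Label-free IR evaluation: does each title retrieve its own body?
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

titles = ["cats and dogs", "stock market crash"]
bodies = ["dogs chase cats in the park", "the market fell sharply today"]

vec = TfidfVectorizer().fit(titles + bodies)
T, B = vec.transform(titles), vec.transform(bodies)
sims = T.dot(B.T).toarray()              # title-vs-body similarity matrix
hits = (sims.argmax(axis=1) == np.arange(len(titles))).mean()
print("precision@1:", hits)
```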
2) KNOW THE DATA
Know the data
Labelled/structured data:
ODP, Freebase, Wikipedia, DBpedia, etc.
Know the data
Unlabelled data:
WaCKy, ClueWeb09, CommonCrawl, ngram corpora
Ngrams: Google, Bing, Google Books
Roll your own: Common Crawl
Know the data
Do something stupid on a lot of data
Do something stupid on a lot of data: Ngrams
Spell-checking
Phrase segmentation
Word breaking
Synonyms
Language models
See "An Overview of Microsoft Web N-gram Corpus and Applications" (Wang et al. 10)
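A toy sketch in the spirit of Norvig's count-based spell corrector, with a tiny count table standing in for a web-scale ngram corpus: generate one-edit candidates and pick the one with the highest count.

```python
# "Stupid on a lot of data": spell-check from nothing but word counts.
COUNTS = {"the": 1000000, "of": 800000, "predictive": 5000, "analytics": 4000}

def edits1(word):
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)

def correct(word):
    candidates = [w for w in edits1(word) | {word} if w in COUNTS]
    return max(candidates, key=COUNTS.get) if candidates else word

print(correct("analytcs"))   # -> 'analytics'
```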
Do something stupid on a lot of data
Web-scale k-means for NER (Lin and Wu 09)
Do something stupid on a lot of data
Web-scale clustering
Know the data
Multi-modal learning
Multi-modal learning: images and captions
[DIAGRAM] Image features = caption features ("facepalm")
Multi-modal learning: titles and article body
[DIAGRAM] Title features = article body features
Multi-modal learning: audio and tags
[DIAGRAM] Audio features = tag features ("upbeat", "hip hop")
3) IT'S MODELS ALL THE WAY DOWN
Break down a pipeline
1-best (greedy), k-best
Finkel et al. 06
Good code to build on
Stanford NLP tools, clustering algorithms, Terry Koo's parser, etc.
[DIAGRAM] Good code to build on, with YOUR MODEL plugged into the pipeline
Eat your own dogfood
Bootstrapping (Yarowsky 95)
Co-training (Blum + Mitchell 98)
EM (Nigam et al., 00)
Self-training (McClosky et al., 06)
Dualist (Settles '11): active learning + semi-sup learning
Eat your own dogfood
Cheap bootstrapping: one step of EM
(Settles 11)
"Awesome! What a great movie!"
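A minimal sketch of that cheap bootstrapping step, assuming scikit-learn: train on the seed labels, guess labels for the unlabeled pool, retrain once on everything (a hard-label, self-training flavor of one EM iteration).

```python
# One cheap EM-style bootstrapping step.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(0)
X_lab, y_lab = rng.randn(20, 4), rng.randint(0, 2, 20)
X_unl = rng.randn(200, 4)

clf = GaussianNB().fit(X_lab, y_lab)   # train on the labeled seed
y_guess = clf.predict(X_unl)           # E-like step: fill in labels
clf = GaussianNB().fit(np.vstack([X_lab, X_unl]),
                       np.concatenate([y_lab, y_guess]))  # retrain once
print(clf.score(X_lab, y_lab))
```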
It's models all the way down
Use models to annotate
Low recall + high precision + lots of data = win
Use models to annotate
Face modeling
Pose-invariant face features
It's models all the way down
THE FUTURE?
Joins on large noisy data sets
ReVerb (Fader et al., 11): http://reverb.cs.washington.edu
Extractions over entire ClueWeb09 (826 MB compressed)
ReVerb (Fader et al., 11)
⋈
Joins on noisy data sets (can clean up the data??)
???
The art of predictive analytics:
1) Know the data out there
2) Know the code out there
3) Intuition (bias)
Summary of recipes:
Know your problem
Throw in good features
Use others' good models in your pipeline
Make more training examples
Use a lot of data
"It especially annoys me when racists are accused of 'discrimination.'
The ability to discriminate is a precious facility; by judging all members of one
'race' to be the same, the racist precisely shows himself incapable of discrimination."
- Christopher Hitchens (RIP)
Other cool research to look at:
* Frustratingly easy domain adaptation (Daume 07)
* The Unreasonable Effectiveness of Data (Halevy et al. 09)
* Web-scale algorithms (search on http://metaoptimize.com/qa/)
* Self-taught learning (Raina et al. 07)
Joseph Turian, [email protected]
@turian
http://metaoptimize.com/qa/
2012.02.02
Please email me any questions.