TRANSCRIPT
The Art of Predictive Analytics: More Data, Same Models
[STUDY SLIDES]
Joseph Turian, [email protected]
@turian, MetaOptimize
2012.02.02
NOTE: These are the STUDY slides from my talk at the predictive analytics meetup: http://bit.ly/xVLBuS
I have removed some graphics and added some text. Please email me any questions.
Who am I?
Engineer with 20 yrs coding exp
PhD 10 yrs exp: large-scale ML + NLP
Founded MetaOptimize
What is MetaOptimize?
Consultancy + community on:
Large-scale ML + NLP
Well-engineered solutions
"Both NLP and ML have a lot of folk wisdom about what works and what doesn't. [This site] is crucial for sharing this collective knowledge." - @aria42
http://metaoptimize.com/qa/
"A lot of expertise in machine learning is simply developing effective biases."
- Dan Melamed (quoted from memory)
What's a good choice of learning rate for the second layer of this neural net on image patches?
[intuition]
(Yoshua Bengio)
0.02!
Occam's Razor is a great example of ML intuition
Without the aid of prejudice and custom I should not be able to find my way across the room.
- William Hazlitt
It's fun to be a geek
Be an artist
How to build the world's biggest langid (langcat) model?
+ Vowpal Wabbit = Win
SOLVED.
The art of predictive analytics:
1) Know the data out there
2) Know the code out there
3) Intuition (bias)
A lot of data with one feature correlated with the label
Twitter sentiment analysis?
Awesome! RT @rupertgrintnet Harry Potter Marks Place in Film History http://bit.ly/Eusxi :)
"Distant supervision" (Go et al., 09)
(Use emoticons as labels)
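A minimal sketch of the distant-supervision trick (Go et al., 09): use emoticons as noisy labels, then strip them so the model can't cheat. The emoticon lists here are illustrative, not from the paper.

```python
# Distant supervision sketch: emoticons give noisy sentiment labels.
import re

POS = [":)", ":-)", ":D", "=)"]
NEG = [":(", ":-(", "=("]

def distant_label(tweet):
    """Return (text_without_emoticons, label) or None if unlabelable."""
    has_pos = any(e in tweet for e in POS)
    has_neg = any(e in tweet for e in NEG)
    if has_pos == has_neg:       # no emoticon, or contradictory ones
        return None
    label = 1 if has_pos else 0
    for e in POS + NEG:
        tweet = tweet.replace(e, " ")
    return re.sub(r"\s+", " ", tweet).strip(), label

print(distant_label("Awesome! RT @rupertgrintnet Harry Potter :)"))
# -> ('Awesome! RT @rupertgrintnet Harry Potter', 1)
```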
Recipe: You know a lot about the problem
Smart Priors
You know a lot about the problem: Smart Priors
Yarowsky (1995), WSD
1) One sense per collocation.
2) One sense per discourse.
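A hedged sketch of using "one sense per discourse" as a smart prior: after a base WSD model labels each occurrence, collapse every occurrence of a word within one document to its majority sense. Function and data names are illustrative, not from Yarowsky's paper.

```python
# "One sense per discourse" as a post-processing prior.
from collections import Counter

def one_sense_per_discourse(occurrences):
    """occurrences: list of (word, predicted_sense) within ONE document.
    Returns the list with each word's senses collapsed to its majority."""
    by_word = {}
    for word, sense in occurrences:
        by_word.setdefault(word, []).append(sense)
    majority = {w: Counter(s).most_common(1)[0][0] for w, s in by_word.items()}
    return [(word, majority[word]) for word, _ in occurrences]

print(one_sense_per_discourse(
    [("plant", "factory"), ("plant", "factory"), ("plant", "flora")]))
# -> all three occurrences collapse to 'factory'
```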
Recipe: You know a lot about the problem
Create new features
You know a lot about the problem: Create new features
Error-analysis
What errors is your model making?
DO SOME EXPLORATORY DATA ANALYSIS (EDA)
Andrew Ng: "Advice for applying ML": Where do the errors come from?
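One hedged way to operationalize that advice (the bucketing scheme is my illustration, not Ng's): count dev-set errors per bucket and attack the biggest bucket first.

```python
# Error analysis sketch: where do the errors come from?
from collections import Counter

def error_buckets(examples, gold, predicted, bucket_fn):
    """Count errors per bucket; bucket_fn maps an example to a bucket name."""
    errors = Counter()
    for x, y, y_hat in zip(examples, gold, predicted):
        if y != y_hat:
            errors[bucket_fn(x)] += 1
    return errors.most_common()

docs = ["short doc", "a much longer document over here", "mid size doc"]
print(error_buckets(docs, gold=[1, 0, 1], predicted=[0, 0, 0],
                    bucket_fn=lambda d: "long" if len(d) > 15 else "short"))
# -> [('short', 2)]: both errors come from short documents
```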
Recipe: You know a little about the problem
Semi-supervised learning
You know a little about the problem: Semi-supervised learning
JOINT semi-supervised learning
Ando and Zhang (2005)
Suzuki and Isozaki (2008), Suzuki et al. (2009), etc.
=> effective but task-specific
You know a little about the problem: Semi-supervised learning
Unsupervised learning, followed by supervised learning
[DIAGRAM] Supervised training: sup data -> sup model
How can Bob improve his model?
[DIAGRAM] Semi-sup training? Sup data -> sup model
[DIAGRAM] Semi-sup training? Sup data + more feats -> sup model
More features can be used on different tasks
[DIAGRAM] The same more-feats box feeds two pipelines (sup task 1 and sup task 2), each with its own sup data and sup model
[DIAGRAM] Joint semi-sup (the standard semi-sup setup): unsup data + sup data -> semi-sup model
[DIAGRAM] Unsupervised, then supervised: unsup data -> unsup model (unsup pretraining), then sup data -> semi-sup model (semi-sup fine-tuning)
[DIAGRAM] Use unsupervised learning to create new features: unsup data -> unsup training -> unsup feats
[DIAGRAM] Sup training: sup data + unsup feats -> semi-sup model
These features can then be shared with other people
[DIAGRAM] One set of unsup feats (unsup data -> unsup training) feeds sup task 1, sup task 2, and sup task 3
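A minimal end-to-end sketch of the last few diagrams, assuming scikit-learn (the talk doesn't name a library): run unsupervised k-means on plentiful unlabeled data, turn cluster IDs into extra features, and feed them to a small supervised learner.

```python
# unsup training -> unsup feats -> sup training
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_unsup = rng.randn(1000, 10)            # lots of unlabeled data
X_sup = rng.randn(50, 10)                # little labeled data
y_sup = (X_sup[:, 0] > 0).astype(int)

km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_unsup)

def add_unsup_feats(X):
    onehot = np.eye(20)[km.predict(X)]   # cluster id as a one-hot feature
    return np.hstack([X, onehot])

clf = LogisticRegression().fit(add_unsup_feats(X_sup), y_sup)
print(clf.score(add_unsup_feats(X_sup), y_sup))
```

The `km` object is exactly the shareable artifact from the last diagram: once trained, the same cluster features can be handed to sup task 1, 2, and 3.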
Recipe: You know almost nothing about the problem
Build cool generic features
Know almost nothing about problem: Build cool generic features
Word features (Turian et al., 2010)
http://metaoptimize.com/projects/wordreprs/
Brown clustering(Brown et al. 92)
(image from Terry Koo)
cluster(chairman) = `0010`
2-prefix(cluster(chairman)) = `00`
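A tiny sketch of turning Brown-cluster bit strings into multi-granularity prefix features, as in the chairman example above; the toy cluster table is made up.

```python
# Brown-cluster prefix features: shorter prefixes = coarser clusters.
CLUSTERS = {"chairman": "0010", "president": "0011", "ran": "1100"}

def prefix_feats(word, lengths=(2, 4)):
    path = CLUSTERS.get(word)
    if path is None:
        return []
    return ["cluster%d=%s" % (k, path[:k]) for k in lengths]

print(prefix_feats("chairman"))   # ['cluster2=00', 'cluster4=0010']
print(prefix_feats("president"))  # ['cluster2=00', 'cluster4=0011']
# chairman and president share the coarse 2-prefix feature.
```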
50-dim embeddings: Collobert + Weston (2008)
t-SNE vis by van der Maaten + Hinton (2008)
Know almost nothing about problem: Build cool generic features
Document features:
Document clustering
LSA/LDA
Deep model
Document features
Salakhutdinov + Hinton 06
Domain adaptation for sentiment analysis
(Glorot et al. 11)
Document features example
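For instance, a hedged LDA sketch, assuming scikit-learn (the talk only names LSA/LDA, not a library): topic proportions become a compact document feature vector for downstream models.

```python
# LDA document features: one row of topic proportions per document.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the movie was great fun",
        "terrible plot and bad acting",
        "great acting, fun plot"]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_feats = lda.fit_transform(X)   # dense (n_docs, n_topics) matrix
print(topic_feats.shape)             # (3, 2)
```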
Recipe: You know a little about the problem
Make more REAL training examples
Make more real training examples
'Cuz you have some time or a small budget
Amazon Mechanical Turk
Snow et al. 08: "Cheap and Fast – But is it Good?"
1K turk labels per dollar
Average over (5) Turks to reduce noise
=> http://crowdflower.com/
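A minimal sketch of "average over (5) Turks": majority vote over several noisy labels per item yields one cleaner label.

```python
# Majority vote over redundant crowd labels (Snow et al. 08 style).
from collections import Counter

def majority_vote(labels_per_item):
    """labels_per_item: list of label-lists, one list per item."""
    return [Counter(labels).most_common(1)[0][0]
            for labels in labels_per_item]

print(majority_vote([["pos", "pos", "neg", "pos", "pos"],
                     ["neg", "neg", "pos", "neg", "neg"]]))
# -> ['pos', 'neg']
```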
Soylent (Bernstein et al. 10)
Find-Fix-Verify: crowd control design pattern
Soylent, a prototype...
Find a problem
Fix each problem
Verify quality of each fix
Make more real training examples
Active learning
Dualist (Settles 11): http://code.google.com/p/dualist/
Applications:
Document categorization
WSD
Information extraction
Twitter sentiment analysis
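A hedged sketch of the active-learning loop behind tools like Dualist, reduced to plain uncertainty sampling with scikit-learn (Dualist itself also queries features, not just instances): repeatedly ask the annotator about the example the current model is least sure of.

```python
# Uncertainty sampling: query the pool example with the lowest
# top-class probability under the current model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def most_uncertain(clf, X_pool):
    probs = clf.predict_proba(X_pool)
    return int(np.argmin(probs.max(axis=1)))

rng = np.random.RandomState(0)
X_lab, y_lab = rng.randn(10, 5), rng.randint(0, 2, 10)
X_pool = rng.randn(100, 5)

clf = LogisticRegression().fit(X_lab, y_lab)
print("ask the annotator about example", most_uncertain(clf, X_pool))
```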
You know a little about the problem: Make more training examples
FAKE training examples
NOISE
FAKE training examples
Denoising AA (autoencoder), RBM
MNIST distortions (LeCun et al. 98)
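A minimal sketch of manufacturing FAKE examples by corrupting real ones, in the spirit of denoising autoencoders and the MNIST distortions: each real example yields several noisy copies with the same label.

```python
# Noise-injection data augmentation: fake examples from real ones.
import numpy as np

def noisy_copies(X, y, n_copies=3, sigma=0.1, seed=0):
    rng = np.random.RandomState(seed)
    X_aug = [X] + [X + rng.normal(0, sigma, X.shape)
                   for _ in range(n_copies)]
    return np.vstack(X_aug), np.tile(y, n_copies + 1)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_aug, y_aug = noisy_copies(X, np.array([0, 1]))
print(X_aug.shape, y_aug.shape)   # (8, 2) (8,)
```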
No negative examples?
FAKE training examples
Multi-view / multi-modal
How do you evaluate an IR system if you have no labels?
See how good the title is at retrieving the body text.
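A hedged sketch of that label-free evaluation, assuming scikit-learn's TfidfVectorizer: treat each title as a query and check whether it ranks its own body text first.

```python
# Label-free IR evaluation: does each title retrieve its own body?
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

titles = ["cats and dogs", "stock market crash"]
bodies = ["dogs chase cats in the park", "the market fell sharply today"]

vec = TfidfVectorizer().fit(titles + bodies)
T, B = vec.transform(titles), vec.transform(bodies)
sims = T.dot(B.T).toarray()              # title-vs-body similarity matrix
hits = (sims.argmax(axis=1) == np.arange(len(titles))).mean()
print("precision@1:", hits)
```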
2) KNOW THE DATA
Know the data
Labelled/structured data:
ODP, Freebase, Wikipedia, DBpedia, etc.
Know the data
Unlabelled data:
WaCKy, ClueWeb09, CommonCrawl, ngram corpora
Ngrams: Google, Bing, Google Books
Roll your own: Common Crawl
Know the data
Do something stupid on a lot of data
Do something stupid on a lot of data: Ngrams
Spell-checking
Phrase segmentation
Word breaking
Synonyms
Language models
See "An Overview of Microsoft Web N-gram Corpus and Applications" (Wang et al. 10)
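A toy sketch in the spirit of Norvig's count-based spell corrector, with a tiny count table standing in for a web-scale ngram corpus: generate one-edit candidates and pick the one with the highest count.

```python
# "Stupid on a lot of data": spell-check from nothing but word counts.
COUNTS = {"the": 1000000, "of": 800000, "predictive": 5000, "analytics": 4000}

def edits1(word):
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)

def correct(word):
    candidates = [w for w in edits1(word) | {word} if w in COUNTS]
    return max(candidates, key=COUNTS.get) if candidates else word

print(correct("analytcs"))   # -> 'analytics'
```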
Do something stupid on a lot of data
Web-scale k-means for NER (Lin and Wu 09)
Do something stupid on a lot of data
Web-scale clustering
Know the data
Multi-modal learning
Multi-modal learning: images and captions
[DIAGRAM] Image features = caption features ("facepalm")
Multi-modal learning: titles and article body
[DIAGRAM] Title features = article body features
Multi-modal learning: audio and tags
[DIAGRAM] Audio features = tag features ("upbeat", "hip hop")
3) IT'S MODELS ALL THE WAY DOWN
Break down a pipeline
1-best (greedy), k-best
Finkel et al. 06
Good code to build on
Stanford NLP tools, clustering algorithms, Terry Koo's parser, etc.
[DIAGRAM] Good code to build on, with YOUR MODEL plugged into the pipeline
Eat your own dogfood
Bootstrapping (Yarowsky 95)
Co-training (Blum + Mitchell 98)
EM (Nigam et al., 00)
Self-training (McClosky et al., 06)
Dualist (Settles '11): active learning + semi-sup learning
Eat your own dogfood
Cheap bootstrapping: one step of EM
(Settles 11)
"Awesome! What a great movie!"
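A minimal sketch of that cheap bootstrapping step, assuming scikit-learn: train on the seed labels, guess labels for the unlabeled pool, retrain once on everything (a hard-label, self-training flavor of one EM iteration).

```python
# One cheap EM-style bootstrapping step.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.RandomState(0)
X_lab, y_lab = rng.randn(20, 4), rng.randint(0, 2, 20)
X_unl = rng.randn(200, 4)

clf = GaussianNB().fit(X_lab, y_lab)   # train on the labeled seed
y_guess = clf.predict(X_unl)           # E-like step: fill in labels
clf = GaussianNB().fit(np.vstack([X_lab, X_unl]),
                       np.concatenate([y_lab, y_guess]))  # retrain once
print(clf.score(X_lab, y_lab))
```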
It's models all the way down
Use models to annotate
Low recall + high precision + lots of data = win
Use models to annotate
Face modeling
Pose-invariant face features
It's models all the way down
THE FUTURE?
Joins on large noisy data sets
ReVerb (Fader et al., 11): http://reverb.cs.washington.edu
Extractions over entire ClueWeb09 (826 MB compressed)
ReVerb (Fader et al., 11)
⋈
Joins on noisy data sets (can clean up the data??)
???
The art of predictive analytics:
1) Know the data out there
2) Know the code out there
3) Intuition (bias)
Summary of recipes:
Know your problem
Throw in good features
Use others' good models in your pipeline
Make more training examples
Use a lot of data
"It especially annoys me when racists are accused of 'discrimination.'
The ability to discriminate is a precious facility; by judging all members of one
'race' to be the same, the racist precisely shows himself incapable of discrimination."
- Christopher Hitchens (RIP)
Other cool research to look at:
* Frustratingly easy domain adaptation (Daume 07)
* The Unreasonable Effectiveness of Data (Halevy et al. 09)
* Web-scale algorithms (search on http://metaoptimize.com/qa/)
* Self-taught learning (Raina et al. 07)
Joseph Turian, [email protected]
@turian
http://metaoptimize.com/qa/
2012.02.02
Please email me any questions.