
Preliminaries Background Challenges Statistics and CS Outline

Statistical Machine Learning

Venue: Tuesday/Thursday 11:40 - 12:55, WN 145

Lecturer: Giles Hooker
Office Hours: Wednesday 2 - 4
Comstock 1186
Ph: 5-1638
e-mail: gjh27


Texts and Resources

Hastie, Tibshirani, Friedman, 2001, "The Elements of Statistical Learning", Springer.

Other References: see www.bscb.cornell.edu/~hooker/ML2007
Software: R plus toolboxes, see www.r-project.org


Other Resources

See class website.

Vapnik, 2000, The Nature of Statistical Learning Theory

CS 478 – Machine Learning
CS 578 – Empirical Machine Learning
MATH 774 – Topics in Statistical Learning Theory
ORIE 474 – Statistical Data Mining; a master's-level course.
ORIE 674 – superseded by this class.

http://www.cs.cornell.edu/Projects/learning/


Evaluation and Expectations

Prerequisites: ORIE 351 and ORIE 670
Really: basic probability, theoretical statistics, linear algebra, multivariate calculus, programming.

Assessment: 4 assignments + 1 (small) project
First 3 assignments: 25% each
Project: 5%
Collaboration (with acknowledgement) is encouraged for the first three assignments and the project.
The final assignment (20%) will be in place of an exam; you are expected to do your own work.


Context and Data

Last 25+ years → automated data collection

automated transactions
centralized electronic databases
new measurement devices

Massive increase in amount, quality and complexity of available data
Canonical example: Walmart transactions

Items purchased, sale information, store layout, checkout information, mode of payment
Customer info: age, gender, address, previous purchase patterns, credit history, health status, marital status, occupation, education, driving records....
In 2000, 1 petabyte; roughly the amount of visual information you will process in your lifetime.


More Examples

More recent: Amazon.com

Same info as above, but also click streams, browsing patterns, visiting patterns, listed opinions, links followed, mouse gestures....

Also

Image data: photos (recent Microsoft research), digitized handwriting, face recognition
Audio data: same thing
Natural writing: epinions, aviation safety inspection reports, google searches, e-mail...
E-bay buying patterns, auction patterns


Outside of Sociology/Economics/Commerce

Medical imaging
Automatic diagnosis/ER triage
Drug discovery
Cancer taxonomy
Astronomy: Sloan Digital Sky Survey
Communications: predicting network failures

What are we going to do with all this?


Data Size: Challenges for Statistics

Storage, access, processing

⇒ heavy CS involvement
Concern over computational efficiency
Sub-sampling ⇒ very small populations might be of high economic value
Amazon Goldbox example

Modeling and Inference

Enormous power → all (reasonable) models are wrong
Manual model-building requires large amounts of time
Parametric assumptions are not reasonable
Don't know what we're looking for (e.g. association rules)


Complexity: Challenges

High dimensionality

many "nuisance" variables, lots of correlation (frequently nonlinear)
model/variable selection ⇒ massive variance
difficulty of model checking
curse of dimensionality

"Feature" complexity

Different lengths of covariates
Spatial and geometric structure
Unstructured covariates: natural language, click-streams

Uncertain measures of fit: Google search
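One standard way to illustrate the curse of dimensionality listed above, as a minimal sketch under an assumption the slide does not make: inputs spread uniformly over a unit hypercube. To capture a given fraction of the data, a sub-cube needs edge length equal to that fraction raised to the power 1/dimension, so "local" neighborhoods stop being local as the dimension grows. The function name is hypothetical.

```python
def edge_length_for_fraction(fraction, dimension):
    """Edge length of a sub-cube holding `fraction` of uniform data
    in the unit hypercube of the given dimension."""
    return fraction ** (1.0 / dimension)

# Capture 1% of the data in 1, 2, 10 and 100 dimensions.
for dimension in (1, 2, 10, 100):
    edge = edge_length_for_fraction(0.01, dimension)
    print(f"dimension {dimension:3d}: edge length {edge:.3f}")
```

In 10 dimensions, a neighborhood holding just 1% of the data already spans about 63% of the range of every input.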


Feature Complexity

Data do not look like a traditional design matrix:

images
sentences
video
molecules
ebay auctions

Need to have a metric for comparison ⇒ feature extraction.
But also complex relationships:

links between webpages
composite molecules

Strange invariances (face recognition: rotation, translation, occlusion).
Much modern effort = dealing with specific complex problems.


What is Machine Learning? And what is Data Mining?

Definitions (and distinctions) differ; not well demarcated/largely a matter of advertising.
Concerned with "automated", model-free statistics.

My definitions:
Machine Learning = focus on predictive modeling
Data Mining = discovery of patterns from massive amounts of data.

General Philosophy
"Don't show me a model: let the data do the work".


This Class

Focus on prediction:

Generic model: $y = F(x) + \epsilon$

Given data $\{y_i, x_i\}_{i=1}^{N} \sim p(y, x)$
y = "output" or "response"
x = "inputs" or "features" (in ML, not "predictors")

Want to estimate F to get good prediction:

$$F^* = \operatorname*{argmin}_{F \in C} E_{y,x} L(y, F(x))$$
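One common way to make this objective operational, which the slide itself does not state: because p(y, x) is unknown, the expectation can be approximated by an average over the observed data. The symbol $\hat{F}$ is introduced here only for illustration.

```latex
\hat{F} = \operatorname*{argmin}_{F \in C} \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, F(x_i)\bigr)
```

The gap between this sample-based minimizer and $F^*$ depends on how rich the class C is relative to N.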


Why?

L = "loss" of guessing F(x) when truth is y

$F^*$ is the best predictor of y under L

Examples:
Regression: $y \in \mathbb{R}$, $L(y, F(x)) = |y - F(x)|$ or $(y - F(x))^2$

Classification: $y \in \{c_1, \ldots, c_K\}$, $L = L_{y,F}$ (a $K \times K$ matrix)
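The choice of loss changes which predictor is "best". As a minimal sketch, assuming the simplest case of a constant prediction with no inputs, the code below searches a grid of candidates for the one minimizing the average loss on a small made-up sample. The data, grid, and function names are hypothetical. Squared loss selects the sample mean and absolute loss selects the sample median.

```python
import statistics

def empirical_risk(loss, ys, prediction):
    """Average loss of predicting the same constant for every y."""
    return sum(loss(y, prediction) for y in ys) / len(ys)

def squared_loss(y, f):
    return (y - f) ** 2

def absolute_loss(y, f):
    return abs(y - f)

# Hypothetical responses with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate constant predictions: 0.0, 0.5, ..., 100.0.
candidates = [k * 0.5 for k in range(201)]

best_squared = min(candidates, key=lambda c: empirical_risk(squared_loss, ys, c))
best_absolute = min(candidates, key=lambda c: empirical_risk(absolute_loss, ys, c))

print("squared loss minimizer:", best_squared, "mean:", statistics.mean(ys))
print("absolute loss minimizer:", best_absolute, "median:", statistics.median(ys))
```

The outlier pulls the squared-loss minimizer far from the bulk of the data, while the absolute-loss minimizer stays put.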


Nearest Neighbors

Canonical example

Observations: $x_1, \ldots, x_N$

Responses: $y_1, \ldots, y_N$.
Desired: estimate the response y for a new observation x.

Prediction Rule: let

$$I = \operatorname*{argmin}_{i \in \{1, \ldots, N\}} \|x_i - x\|$$

and predict $y_I$.

Model-free, successful (sort of), not something a statistician would think of.
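The prediction rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the Euclidean norm (the slide does not specify one), breaks ties by taking the lowest index, and uses made-up data and a hypothetical function name.

```python
import math

def nearest_neighbor_predict(xs, ys, x_new):
    """Return y_I, where I = argmin_i ||x_i - x_new|| (Euclidean norm assumed)."""
    best_index = min(range(len(xs)), key=lambda i: math.dist(xs[i], x_new))
    return ys[best_index]

# Hypothetical training data: three observations in two dimensions.
xs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
ys = ["a", "b", "c"]

print(nearest_neighbor_predict(xs, ys, (4.0, 4.5)))
print(nearest_neighbor_predict(xs, ys, (0.2, 0.1)))
```

No model for F is ever written down: the training data are the predictor, so each prediction costs one pass over all N observations.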


Is Data Mining Statistics?

Historically (and currently) dominated by computer science.
Very little attention from statisticians.
Still perceived as Computer Science/Math
Led to "obvious" statistical mistakes in research, but also successful ideas that are very different from statistical thinking.
Growing recognition/convergence from both sides.
Friedman, 1997, "Data Mining and Statistics: What's the Connection?"
Breiman, 2001, "Statistical Modeling: The Two Cultures"



This Class

Statistical Machine Learning

How do we understand ML success from a statistical perspective?

bias, variance, regularization

How do we assess the performance of these methods?

cross validation, bootstrap....

Does this fit into what we already know?
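As a rough illustration of the cross validation mentioned above, the sketch below estimates prediction error by holding out each fold in turn, predicting the held-out points from the remaining data, and averaging the loss. It is a minimal sketch under stated assumptions: a nearest-neighbor predictor with Euclidean distance, 0-1 loss, folds assigned by index modulo k rather than at random, and made-up data. All names are hypothetical.

```python
import math

def one_nn_predict(train_x, train_y, x_new):
    """Nearest-neighbor prediction with Euclidean distance."""
    i = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], x_new))
    return train_y[i]

def zero_one_loss(y, guess):
    return 0.0 if y == guess else 1.0

def k_fold_cv_error(xs, ys, predict, loss, k):
    """Average held-out loss; fold f holds out indices f, f + k, f + 2k, ..."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        test_idx = [i for i in range(n) if i % k == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            total += loss(ys[i], predict(train_x, train_y, xs[i]))
    return total / n

# Hypothetical one-dimensional data: two well-separated classes.
xs = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
ys = ["low"] * 4 + ["high"] * 4

cv_error = k_fold_cv_error(xs, ys, one_nn_predict, zero_one_loss, k=4)
print("4-fold CV error:", cv_error)
```

Evaluating on held-out points matters especially for nearest neighbors: on its own training data the rule reproduces every response exactly, so training error says nothing about predictive performance.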


Outline

Linear methods: linear regression/linear discriminant analysis
Extensions: nonlinear regression, kernel methods, neural networks
Regularization and Model Assessment
Nonlinear methods: nearest neighbors
Tree-based methods and extensions
Aggregation methods: boosting, bagging and randomization
Requested topics, PAC learning theory, semi-supervised learning, clustering
