Large-Scale Machine Learning

John Langford (Microsoft Research) and Yann LeCun (Courant Institute, New York University)


Page 1: Large-Scale Machine Learning

John Langford, Microsoft Research
Yann LeCun, Courant Institute

Page 2: What is Data Science?

Data Science: automatically extracting knowledge from data
  Mathematics & Statistics
  Machine Learning
  Domain Expertise

Applications in Business
  Lots and lots

Applications in the Sciences
  Astronomy, Cosmology
  High-energy Physics
  Biology, Genomics
  Neuroscience
  The Social Sciences

Medicine

Government

[Figure: Venn diagram with circles Mathematics & Statistics, Computation, and Domain Expertise; regions labeled Machine Learning, conventional research, Danger Zone!, and Data Science. After Drew Conway's Data Science Venn Diagram.]

Page 3: Large Scale Machine Learning

Class website: http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:start
  (http://cilvr.cs.nyu.edu → courses → big data)

Forum, discussion, Q&A on Piazza: https://piazza.com/class#spring2013/csciga3033002

Evaluation:
  Programming assignments
  Project
  Final exam

Computing infrastructure:
  100-node cluster, 8 CPUs/node, Hadoop (donated by Yahoo! Labs)

Software:
  Torch: http://www.torch.ch/
  Vowpal Wabbit: https://github.com/JohnLangford/vowpal_wabbit/wiki

Page 4: Big Data?

Data often comes in the form of a table:
  N: dimension of each vector (possibly very sparse)
  T: number of training samples (possibly infinite)
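As a rough illustration of such a table, the sketch below stores T samples of dimension N as sparse rows, so memory grows with the number of non-zero entries rather than with T times N. The names `sparse_rows` and `dot` and all toy values are assumptions made for this example, not part of the course material.

```python
# Hypothetical toy data: T = 3 samples, each a vector of dimension N = 1_000_000.
# Only non-zero entries are stored, as {feature_index: value}.
N = 1_000_000
sparse_rows = [
    {3: 1.0, 999_999: 2.5},
    {42: -1.0},
    {3: 0.5, 42: 4.0, 7: 1.0},
]

def dot(weights, row):
    """Dot product of a sparse weight dict with a sparse row.

    The cost is proportional to the non-zeros in `row`, not to N.
    """
    return sum(value * weights.get(index, 0.0) for index, value in row.items())

T = len(sparse_rows)
nonzeros = sum(len(row) for row in sparse_rows)
density = nonzeros / (T * N)  # fraction of table cells that are non-zero
```

With this layout, a linear model's prediction on one sample touches only that sample's non-zero features, which is what makes very large N workable when the data is sparse.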

Big Data is large T, or large N, or both
  Large T, small N: great!
  Infinite T, small N: on-line / streaming
  Small T, large N: hell!
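The "infinite T, small N" case can be sketched as one-pass stochastic gradient descent on a linear model: each sample is used for one update and then discarded, so memory depends on N only. This is a minimal sketch under stated assumptions; the synthetic `stream` generator, the noise-free target y = 2*x0 - 3*x1, the learning rate, and the step count are all hypothetical choices for illustration.

```python
import random

def stream(seed=0):
    """Hypothetical endless stream of (x, y) pairs with y = 2*x0 - 3*x1."""
    rng = random.Random(seed)
    while True:
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, 2.0 * x[0] - 3.0 * x[1]

def online_sgd(samples, n_features, steps, lr=0.1):
    """One pass of SGD on squared loss; each sample is seen once."""
    w = [0.0] * n_features
    for _, (x, y) in zip(range(steps), samples):
        error = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i in range(n_features):
            # Gradient of 0.5 * error**2 with respect to w[i] is error * x[i].
            w[i] -= lr * error * x[i]
    return w

w = online_sgd(stream(), n_features=2, steps=5000)
```

Because the stream is never stored, the same loop works whether it yields five thousand samples or runs indefinitely; `steps` only decides when to stop reading.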

Problems:
  (distributed) data storage and access
  can't use algorithms super-linear in T
  Large N: overfitting
  Parallelizing
  Dealing with unbalanced sets
  Representing high-dim data
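To see why super-linear algorithms are ruled out, the back-of-the-envelope sketch below compares running times for different complexity classes. Both the sample count (T = 10^9) and the throughput (10^9 basic operations per second) are assumed round numbers, not figures from the lecture.

```python
import math

T = 10**9               # assumed number of training samples
OPS_PER_SECOND = 10**9  # assumed throughput: one basic operation per nanosecond

def seconds(operations):
    """Wall-clock time for a given operation count at the assumed throughput."""
    return operations / OPS_PER_SECOND

linear = seconds(T)                       # O(T)
linearithmic = seconds(T * math.log2(T))  # O(T log T)
quadratic = seconds(T**2)                 # O(T^2)

SECONDS_PER_YEAR = 365 * 24 * 3600
quadratic_years = quadratic / SECONDS_PER_YEAR
```

Under these assumptions a linear pass takes about one second and a T log T pass about thirty seconds, while a quadratic algorithm would need roughly 32 years.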

[Figure: data table with dimensions N and T]

Page 5: Syllabus

Intro

Online Linear learning

2nd order optimization methods

LBFGS

Online Non-linear learning

Boosted Decision Trees

Hadoop, Allreduce

Parallel learning, OpenMP, CUDA

Inverted Indices & Predictive Indexing

Hashing, LSH, linear/non-linear dimensionality reduction

Feature Learning, deep learning

Many Classes

Active Learning

Exploration and Learning
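As a small taste of one syllabus topic, hashing, the sketch below shows the hashing trick: string features are mapped straight to indices in a fixed-size sparse vector, so no feature dictionary has to be built or stored. The function name `hash_features`, the use of CRC32 as the hash, and the default of 18 bits are assumptions for illustration, not the course's or any library's implementation. Variants that also hash a sign to reduce collision bias are omitted here.

```python
import zlib

def hash_features(tokens, num_bits=18):
    """Map string features into a sparse count vector of size 2**num_bits."""
    size = 1 << num_bits
    vector = {}
    for token in tokens:
        # CRC32 is a stand-in for any fast, deterministic hash function.
        index = zlib.crc32(token.encode("utf-8")) % size
        # Colliding tokens simply add into the same slot.
        vector[index] = vector.get(index, 0.0) + 1.0
    return vector

v = hash_features(["large", "scale", "machine", "learning", "large"])
```

The vector size is fixed in advance by `num_bits`, which trades memory against the chance that two different features collide in the same slot.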