Machine Learning Lecture for Methodological Foundations of Biomedical Informatics
Fall 2015 (BMSC-GA 4449)
Sisi Ma
NYU Langone Medical Center
CHIBI
What type of problems can machine learning solve?
Real Estate
Artificial Intelligence
Retail Sales
Conservation
Climate
Current active projects on Kaggle as of October 26, 2015
What type of problems can machine learning solve?
Predominantly:
Classification
How to classify?
Main Ways to Classify:
- Unsupervised
- Supervised
Unsupervised Learning
Group similar items together
Comics credit: http://nlp.cs.berkeley.edu/comics.shtml
Unsupervised Learning
Since the definition of similarity is arbitrary, one can get different labeling solutions.
Unsupervised Learning
The solution depends on both: (1) which variables are used to construct the similarity metric and (2) how the similarity metric is constructed.
Lowe, 2012
Image Credit: https://en.wikipedia.org/wiki/Metric_(mathematics)
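The dependence on variables and metric can be sketched with a small example. The items, coordinates, and function names below are hypothetical and chosen only for illustration; they are not from the lecture. The same query point gets a different "most similar" item depending on which distance is used and which variables enter it.

```python
import math

# Hypothetical items described by two numeric variables (illustrative values only).
items = {"a": (3.0, 3.0), "b": (0.0, 5.0)}
query = (0.0, 0.0)

def euclidean(p, q):
    # Straight-line distance over all variables.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute differences over all variables.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def first_variable_only(p, q):
    # Same idea, but the similarity is built from a subset of the variables.
    return abs(p[0] - q[0])

def nearest(metric):
    # Name of the item closest to the query under the given metric.
    return min(items, key=lambda name: metric(items[name], query))

for metric in (euclidean, manhattan, first_variable_only):
    print(metric.__name__, "->", nearest(metric))
```

Under the Euclidean distance the query is closest to item a; under the Manhattan distance, or when only the first variable is used, it is closest to item b. A clustering built on these distances would group the items differently in each case.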
Unsupervised Learning
How do we know the solution is good?
It corresponds to something we care about.
Unsupervised Learning
Supervised Learning
Supervised Learning
Overfitting
Duda, 2ed
Supervised Learning
Overfitting
Image Credit: https://commons.wikimedia.org/wiki/File:Overfitting.svg
Supervised Learning
How do I know if I am overfitting?
Validation Data
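A minimal sketch of the validation idea, assuming synthetic one-dimensional data with 20% label noise (the data generator, the two toy models, and all names are assumptions for illustration, not the lecture's example). A model that memorizes the training set looks perfect on the training data; only data it has not seen shows how well it generalizes.

```python
import random

random.seed(0)

def make_data(n):
    # Hypothetical data: the true label is x > 0.5, with 20% of labels flipped (noise).
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

def predict_1nn(train, x):
    # Memorizing model: copy the label of the closest training point.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def predict_threshold(train, x):
    # Simple rule: a fixed threshold at 0.5 (it does not use the training set).
    return int(x > 0.5)

def error_rate(predict, train, data):
    # Fraction of points in `data` that the model gets wrong.
    return sum(predict(train, x) != y for x, y in data) / len(data)

train, validation = make_data(100), make_data(100)
for name, predict in [("1-NN", predict_1nn), ("threshold", predict_threshold)]:
    print(name,
          "train error:", error_rate(predict, train, train),
          "validation error:", error_rate(predict, train, validation))
```

The 1-nearest-neighbor model has zero training error because every training point is its own nearest neighbor, noise included. A large gap between training error and validation error is the usual sign of overfitting.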
Supervised Learning
How do I know if I am overfitting?
Duda, 2ed
Supervised Learning
Support Vector Machine
Key Characteristics of SVM
• Maximum gap to prevent overfitting
• QP problems can be solved with standard methods
• Soft margins to tolerate noise
• Kernel trick for linearly non-separable data
Statnikov et al., 2011
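The kernel trick can be illustrated with a small sketch. It assumes a degree-2 homogeneous polynomial kernel and four XOR-style points; both are chosen here for illustration and are not taken from the slides. The kernel value equals an inner product in a mapped feature space, so the mapping never has to be computed explicitly, and in that space the points become linearly separable.

```python
import math

def poly2_kernel(x, z):
    # Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z)^2
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    # Explicit feature map whose inner product equals the kernel above.
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# XOR-style points: no straight line in the original two variables separates the labels.
points = [((1.0, 1.0), +1), ((-1.0, -1.0), +1), ((1.0, -1.0), -1), ((-1.0, 1.0), -1)]

# Check the identity k(x, z) = <phi(x), phi(z)> on every pair of points.
for x, _ in points:
    for z, _ in points:
        assert math.isclose(poly2_kernel(x, z),
                            dot(feature_map(x), feature_map(z)),
                            abs_tol=1e-9)

# In the mapped space, the sign of the middle coordinate matches the label.
for x, label in points:
    print(x, label, feature_map(x)[1])
```

An SVM only needs inner products between samples, so replacing them with kernel values gives a linear separator in the mapped space without constructing that space.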
Most modern algorithms have built-in mechanisms to minimize overfitting.
Predictive Modeling: A Simplified General Framework
Validation Data
Predictive Modeling: Cross Validation for error estimation and model selection
Ma et al., 2015 (in preparation)
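A minimal sketch of k-fold cross validation for error estimation follows. The toy nearest-centroid classifier, the fold assignment by index, and the data values are assumptions made for illustration; the protocol shown on the slide may differ in its details.

```python
def k_fold_indices(n, k):
    # Assign sample i to fold i % k so every sample is held out exactly once.
    return [[i for i in range(n) if i % k == fold] for fold in range(k)]

def nearest_centroid_fit(xs, ys):
    # Hypothetical toy model: one mean per class on a single variable.
    return {label: sum(x for x, y in zip(xs, ys) if y == label) /
                   sum(1 for y in ys if y == label)
            for label in set(ys)}

def nearest_centroid_predict(centroids, x):
    # Predict the class whose mean is closest to x.
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def cross_val_error(xs, ys, k):
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        # Train only on the samples outside the current fold.
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = nearest_centroid_fit(train_x, train_y)
        # Evaluate only on the held-out fold.
        wrong = sum(nearest_centroid_predict(model, xs[i]) != ys[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / k

# Illustrative values only, not data from the lecture.
xs = [1.0, 3.0, 1.2, 3.2, 0.8, 2.8, 1.1, 3.1, 2.1, 1.9, 1.3, 2.7]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print("3-fold CV error estimate:", cross_val_error(xs, ys, 3))
```

For model selection, the same loop can be run once per candidate model or parameter setting and the candidate with the lowest estimated error kept. If the error of the selected model is also to be reported, an additional outer loop or a separate held-out set is usually needed, because the selection step itself has seen the validation folds.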
Machine Learning vs Statistics
Robert Tibshirani
Machine Learning vs Statistics
Robert Tibshirani
Machine Learning vs Statistics
Machine Learning Statistics
One major difference between machine learning and statistics: how is the model evaluated?
Machine Learning vs Statistics
What is a good model? According to most statisticians, especially in practice:
Most commonly evaluated by R-squared.
Breiman, 2001
Machine Learning vs Statistics
Validation Data
What is a good model? According to machine learning researchers.
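The two evaluation styles can be put side by side in a short sketch. It assumes a simple least-squares line and small made-up numbers; the data and function names are illustrative only. R-squared is computed on the same data used for fitting, while the validation error is computed on data the model did not see.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x; returns (a, b).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def r_squared(a, b, xs, ys):
    # R^2 = 1 - SS_res / SS_tot, computed on whatever data is passed in.
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def mse(a, b, xs, ys):
    # Mean squared prediction error on the given data.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Illustrative values only.
train_x, train_y = [1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.1]
valid_x, valid_y = [6, 7, 8], [5.7, 7.4, 7.9]

a, b = fit_line(train_x, train_y)
print("in-sample R^2:", r_squared(a, b, train_x, train_y))
print("training MSE:", mse(a, b, train_x, train_y))
print("validation MSE:", mse(a, b, valid_x, valid_y))
```

A high in-sample R-squared says the line describes the training data well. The validation error answers a different question: how well the fitted model predicts new observations.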
The Future
What’s the job?
Homework
Research bias-variance decomposition and answer the following question from “An Introduction to Statistical Learning”.
Resources