Data Mining
Relation to course: Data Mining (Chap. 28)
G15: Dillon Littlefield & Nathan Moeller
Zimmer, Carl. "Bacterial Ecosystems Divide People Into 3 Groups, Scientists Say." The New York Times. The New York Times, 20 Apr. 2011. Web. 10 Apr. 2015.
Classification vs Clustering
Criteria | Classification | Clustering
Prior knowledge of classes | Yes | No
Use case | Classify new sample into known classes | Suggest groups based on patterns in data
Algorithms | Decision Trees, Bayesian classifiers | K-means, Expectation Maximization
Data needs | Labeled samples from a set of classes | Unlabeled samples
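The contrast in the table can be sketched in a few lines of Python. This is a minimal illustration, not a method from the slides: the data points and the labels "X" and "Y" are made up, a simple nearest-centroid rule stands in for the decision trees or Bayesian classifiers named above, and the k-means start (the first k samples) is a naive, hypothetical choice.

```python
from statistics import mean

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(mean(axis) for axis in zip(*points))

# Classification: class labels are known in advance.
def fit_classifier(samples, labels):
    """Learn one centroid per known class from labeled samples."""
    classes = sorted(set(labels))
    centers = [centroid([s for s, l in zip(samples, labels) if l == c])
               for c in classes]
    return classes, centers

def classify(point, classes, centers):
    """Assign a new sample to the known class with the nearest centroid."""
    return classes[nearest(point, centers)]

# Clustering: no labels; groups are suggested by the data alone.
def kmeans(samples, k, iterations=20):
    centers = list(samples[:k])  # naive start: the first k samples (assumption)
    for _ in range(iterations):
        assignment = [nearest(s, centers) for s in samples]
        new_centers = []
        for i in range(k):
            members = [s for s, a in zip(samples, assignment) if a == i]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(centroid(members) if members else centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute so the returned assignment matches the returned centers.
    return [nearest(s, centers) for s in samples], centers

# Hypothetical 2-D measurements (made-up values, not real microbiome data).
samples = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
labels = ["X", "Y", "X", "Y", "X", "Y"]  # needed only for classification

classes, class_centers = fit_classifier(samples, labels)
predicted = classify((4.9, 5.0), classes, class_centers)
print(predicted)  # Y

assignment, cluster_centers = kmeans(samples, k=2)  # labels are never used
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

The classifier answers "which known class does this sample belong to?", while k-means only returns anonymous group indices. K-means results depend on the starting centers, so practical implementations usually start from random or more carefully chosen points.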
Bacterial Ecosystems
• Blood types fall into classes A, B, AB, and O. What about gut ecosystems?
• Each gut has a unique population of microbes.
• Research suggests there may be three distinct types of microbiomes called enterotypes.
Medical Applications
• Tailor diets to specific enterotypes
• Tailor drug prescriptions to enterotypes
• Alternative to antibiotics: restore good bacteria to gut
Challenge
• Each person has 100 trillion microbes
• Each enterotype is a balance of many bacterial species
• Debate not settled: UMN professor Dan Knights argues for a continuum rather than distinct types
Classification vs Clustering
Categorize each question as a classification or clustering problem:
• What is the blood type of the patient?
• Based on gut-bacteria ecosystems, do humans fall into a small number of distinct groups?
• How many natural groups do humans fall into based on their gut-bacteria ecosystem?
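One common heuristic for the "how many groups" question is to run k-means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply (the "elbow"). The sketch below assumes a single made-up measurement per person and a hypothetical helper `kmeans_1d`; it is an illustration of the heuristic, not the analysis used in the cited research.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=50):
    """Plain k-means on 1-D data; returns (centers, within-cluster sum of squares)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (assumption): k values spread evenly through the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            closest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[closest].append(v)
        # Keep the old center if a group ends up empty.
        new_centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    # Each value contributes its squared distance to the nearest final center.
    wcss = sum(min((v - c) ** 2 for c in centers) for v in ordered)
    return centers, wcss

# Hypothetical measurements (made-up values, not real microbiome data).
abundance = [0.10, 0.12, 0.15, 0.48, 0.50, 0.53, 0.88, 0.90, 0.93]

wcss = {k: kmeans_1d(abundance, k)[1] for k in range(1, 5)}
for k, value in wcss.items():
    print(k, round(value, 4))
```

For this toy data the sum of squares falls steeply up to k = 3 and barely changes afterwards, which points to three groups. A curve with no clear elbow is one sign that the data may lie on a continuum instead, which is the open question raised on the Challenge slide.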