I review what remains constant in Data Science regardless of data size and describe the basic principles and ideas of data science.
© KDnuggets 2013 1
Data Science not just for Big Data
Gregory Piatetsky, @kdnuggets
Analytics, Big Data, Data Mining, and Data Science Resources
What do we call it?
• Statistics, 1830-
• Data mining, 1980-
• Knowledge Discovery in Data (KDD), 1989-
• Business Analytics, 1997-
• Predictive Analytics, 2002-
• Data Analytics, 2011-
• Data Science, 2011-
• Big Data, 2012-
Same Core Idea: Finding Useful Patterns in Data
Different Emphasis
Big Data > Data Mining >> Predictive Analytics, Data Science
[Chart: Google Trends search, Jan 2008 - Sep 2013, Worldwide; series: Big Data, Data mining]
Data Science before “Big Data”
• Ancient astronomers
• Kepler's laws of planetary motion (1609), derived from observations by Tycho Brahe
• Genetics – Gregor Mendel found patterns in inheritance of pea plants
• Western Medicine
• …
Ignaz Semmelweis – early data scientist (1818-1865)
Semmelweis found that the main difference between the clinics was that the first had medical students who also examined cadavers, and inferred that the students carried something on their hands from the autopsies. He proposed washing hands after autopsy, but his proposal was rejected, and he died in an insane asylum.
Graph from Wikipedia
Data Science Application: Process, not one step
CRISP-DM process
Building Predictive Models
Most fun for data scientists, but only a small part of the process
Data Science Basic Principles & Ideas
• Focus on actionable patterns
• Build predictive models - supervised learning (train, test, x-validate)
• Avoid overfitting
• Calculating similarity of objects - unsupervised learning
• Avoid information leakers
• Select important variables/features
• Model accuracy vs lift: how much more prevalent a pattern is than would be expected by chance
• Estimate probability and cost/gain of actions
• Help optimize decisions
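As a rough illustration of two of these ideas, the sketch below computes lift for the top-scored fraction of cases and the expected gain of acting on one case. It is not taken from the slides: the function names, the toy scores and labels, and the gain and cost figures are all made up for illustration, and lift is taken here to mean the response rate among the top-scored cases divided by the overall response rate.

```python
def lift(scores, labels, top_fraction=0.1):
    """Lift for the top-scored fraction of cases.

    Assumed definition: positive rate among the highest-scored cases,
    divided by the positive rate in the whole population (the rate
    "expected by chance").
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_top = max(1, int(len(scores) * top_fraction))
    # Rank cases from highest to lowest model score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    return top_rate / base_rate


def expected_gain(p_response, gain_if_response, cost_of_action):
    """Expected net gain of taking an action on one case."""
    return p_response * gain_if_response - cost_of_action


# Hypothetical toy data: model scores and true outcomes (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top20_lift = lift(scores, labels, top_fraction=0.2)
offer_gain = expected_gain(p_response=0.05, gain_if_response=100.0, cost_of_action=2.0)
print(f"lift in top 20%: {top20_lift:.2f}")
print(f"expected gain per offer: {offer_gain:.2f}")
```

A lift well above 1 for the top-scored cases means the model concentrates responders there, even when overall accuracy looks unimpressive. Combining the estimated probability with the gain and cost of an action is one simple way to decide whether the action is worth taking.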
What Changes in Data Science with Big Data?
• Data munging becomes much more complex
• New algorithms, technology needed to deal with Big Data Volume, Velocity, & Variety
• New, effective algorithms that require Big Data, e.g. deep belief networks, recommendations
• Predictions become (somewhat) more accurate
• New things become visible: social networks, recommendations, mobility, knowledge?
• However, basic principles remain