data-science-not-just-for-big-data

Upload: gregory-piatetsky-shapiro

Post on 12-May-2015

Category:

Technology


DESCRIPTION

I review what remains constant in Data Science regardless of data size and describe the basic principles and ideas of data science.

TRANSCRIPT

Page 1: data-science-not-just-for-big-data

© KDnuggets 2013 1

Data Science not just for Big Data

Gregory Piatetsky, @kdnuggets

Analytics, Big Data, Data Mining, and Data Science Resources

Page 2: data-science-not-just-for-big-data

What do we call it?

• Statistics, 1830-
• Data mining, 1980-
• Knowledge Discovery in Data (KDD), 1989-
• Business Analytics, 1997-
• Predictive Analytics, 2002-
• Data Analytics, 2011-
• Data Science, 2011-
• Big Data, 2012-

Same Core Idea: Finding Useful Patterns in Data

Different Emphasis

Page 3: data-science-not-just-for-big-data

Big Data > Data Mining >> Predictive Analytics, Data Science

[Chart: Google Trends search, Jan 2008 - Sep 2013, Worldwide; series shown: "Big Data" and "Data mining"]

Page 4: data-science-not-just-for-big-data

Data Science before “Big Data”

• Ancient astronomers
• Kepler's laws of planetary motion (1609), derived from observations by Tycho Brahe
• Genetics – Gregor Mendel found patterns in inheritance of pea plants
• Western Medicine
• …

Page 5: data-science-not-just-for-big-data

Ignaz Semmelweis – early data scientist (1818-1865)

Semmelweis found that the main difference between the two clinics was that the 1st had medical students who also examined cadavers, and inferred that the students carried something on their hands from the autopsies. He proposed washing hands after autopsy, but his proposal was rejected and he died in an insane asylum.

Graph from Wikipedia

Page 6: data-science-not-just-for-big-data

Data Science Application: Process, not one step

CRISP-DM process

Page 7: data-science-not-just-for-big-data

Data Science Application: Process, not one step

CRISP-DM process

Building Predictive Models

Most fun for data scientists, but only a small part of the process

Page 8: data-science-not-just-for-big-data

Data Science Basic Principles & Ideas

• Focus on actionable patterns
• Build predictive models - supervised learning (train, test, x-validate)
• Avoid overfitting
• Calculating similarity of objects - unsupervised learning
• Avoid information leakers
• Select important variables/features
• Model accuracy vs lift: how much more prevalent a pattern is than would be expected by chance
• Estimate probability and cost/gain of actions
• Help optimize decisions
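
The lift and cost/gain items above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the slides: the function names and all campaign numbers are hypothetical, chosen only to show how lift compares a pattern's prevalence in a selected segment against chance, and how an estimated probability combines with cost and gain to score an action.

```python
def lift(targets_in_segment, segment_size, targets_overall, population_size):
    """Lift = prevalence of the target inside a segment / prevalence overall."""
    segment_rate = targets_in_segment / segment_size
    baseline_rate = targets_overall / population_size
    return segment_rate / baseline_rate


def expected_gain(p_response, gain_if_response, cost_of_action):
    """Expected net gain of acting on one case: p * gain - cost."""
    return p_response * gain_if_response - cost_of_action


# Hypothetical campaign, for illustration only:
# 10,000 customers with 500 responders overall (5% baseline);
# the 1,000 customers scored highest by a model contain 200 responders (20%).
campaign_lift = lift(200, 1000, 500, 10000)

# Assumed economics: contacting a customer costs 1 unit, a response is worth 20.
gain_top_segment = expected_gain(0.20, 20.0, 1.0)  # acting on the model's top segment
gain_random = expected_gain(0.05, 20.0, 1.0)       # acting on a randomly chosen customer

print(campaign_lift, gain_top_segment, gain_random)
```

With these assumed numbers the top segment has a lift of 4, meaning responders are four times as prevalent there as chance would predict. Contacting a random customer only breaks even, while contacting the top segment has a positive expected gain, which is the sense in which a model can help optimize decisions even when its raw accuracy looks unimpressive.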

Page 9: data-science-not-just-for-big-data

What Changes in Data Science with Big Data?

• Data munging becomes much more complex
• New algorithms, technology needed to deal with Big Data Volume, Velocity, & Variety
• New, effective algorithms that require Big Data, e.g. deep belief networks, recommendations
• Predictions become (somewhat) more accurate
• New things become visible: social networks, recommendations, mobility, knowledge?

• However, basic principles remain