the rise of data science in the age of big data analytics: why data distillation and machine...
DESCRIPTION
Big Data is important because we want to use it to make sense of our world. It’s tempting to think there’s some “magic bullet” for analyzing big data, but simple “data distillation” often isn’t enough, and unsupervised machine-learning systems can be dangerous. (Like, bringing-down-the-entire-financial-system dangerous.) Data Science is the key to unlocking insight from Big Data: by combining computer science skills with statistical analysis and a deep understanding of the data and the problem, we can not only make better predictions but also fill in gaps in our knowledge, and even find answers to questions we hadn’t thought of yet.
TRANSCRIPT
![Page 1: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/1.jpg)
Revolution Confidential
The Rise of Data Science in the Age of Big Data Analytics
Why Data Distillation and Machine Learning Aren’t Enough
David M Smith
VP Marketing and Community
Revolution Analytics
![Page 2: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/2.jpg)
Today, we’ll discuss:
• What is Data Science?
• Why machine learning isn’t enough
• Why Data Science works
• The Data Scientist’s Toolkit
• The Future of Big Data Analytics
• Closing thoughts and resources
![Page 3: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/3.jpg)
© Dov Harrington, CC BY 2.0: http://www.flickr.com/photos/idovermani/4110546683/
![Page 4: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/4.jpg)
Where is it safe to fish near San Francisco?
San Francisco Estuary Institute: http://www.sfei.org/tools/wqt
![Page 5: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/5.jpg)
Hurricane Sandy
Bob Rudis: http://rud.is/b/2012/10/28/watch-sandy-in-r-including-forecast-cone/
![Page 6: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/6.jpg)
Hurricane Sandy
Ed Chen: http://blog.echen.me/hurricane-sandy-outages/
![Page 7: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/7.jpg)
When did Michael Jackson have his biggest hits?
New York Times, June 25 2009 (3 hours after Michael Jackson’s death): http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
![Page 8: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/8.jpg)
Three Essential Skills of Data Scientists
Drew Conway: http://www.dataists.com/2010/09/the-data-science-venn-diagram/
[Venn diagram annotated with: Data Integration, Mashups, Applications; Models, Visualization, Predictions, Uncertainty; Problems, Data Sources, Credibility; Effective Data Applications]
![Page 9: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/9.jpg)
Image © Abode of Chaos, CC BY 2.0: http://www.flickr.com/photos/home_of_chaos/6418989233/
![Page 10: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/10.jpg)
Machine learning (ML) for predictions
[Diagram in three panels]
Building the Model: Response, Features, ML, scoring rules
Validating the Model: Validation set, scoring rules, Predictions, Responses, “Accuracy”
Scoring new data: New Data, scoring rules, Predictions (scores)
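The three steps on this slide (build the model, validate it, score new data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenter's method: the data are synthetic, the "scoring rule" is a single learned threshold, and the names `fit_stump`, `score`, and `accuracy` are hypothetical helpers invented for this example.

```python
# Hypothetical data: one integer feature, binary response, some flipped labels.
xs = list(range(100))
ys = [int(x >= 50) ^ int(x % 7 == 3) for x in xs]  # label noise where x % 7 == 3

# Hold out every fourth observation as the validation set.
train = [(x, y) for x, y in zip(xs, ys) if x % 4 != 0]
valid = [(x, y) for x, y in zip(xs, ys) if x % 4 == 0]


def fit_stump(pairs):
    """Building the model: pick the threshold with the most correct training labels."""
    best_threshold, best_correct = None, -1
    for t in sorted({x for x, _ in pairs}):
        correct = sum((x >= t) == bool(y) for x, y in pairs)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def score(threshold, features):
    """Apply the learned scoring rule: predict 1 when the feature reaches the threshold."""
    return [1 if x >= threshold else 0 for x in features]


def accuracy(predictions, responses):
    """Share of predictions that match the observed responses."""
    return sum(p == y for p, y in zip(predictions, responses)) / len(responses)


rule = fit_stump(train)                                   # building the model
valid_accuracy = accuracy(score(rule, [x for x, _ in valid]),
                          [y for _, y in valid])          # validating the model
new_scores = score(rule, [10, 90])                        # scoring new data
```

The validation accuracy only says how well the rule reproduces held-out labels. It says nothing about whether the rule is credible or makes sense for the problem, which is the gap the following slides point at.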
![Page 11: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/11.jpg)
Problem: A lack of perspective
Image © 2010 David M Smith. Some rights reserved, CC BY 2.0
![Page 12: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/12.jpg)
Problem: Lack of credibility
![Page 13: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/13.jpg)
Problem: Complexity
![Page 14: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/14.jpg)
Data Science to the Rescue!
![Page 15: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/15.jpg)
Answer Unasked Questions
Revolutions blog, “The Uncanny Valley of Big Data”: http://blog.revolutionanalytics.com/2012/02/the-uncanny-valley-of-big-data.html
![Page 16: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/16.jpg)
Fill in knowledge gaps
“More data beats better algorithms, every time” – Google
“Companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue.” – Tim O’Reilly
Google Research, “The Unreasonable Effectiveness of Data”: http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html
Tim O’Reilly on Google+: https://plus.google.com/107033731246200681024/posts/4Xa76AtxYwd
TechnoCalifornia: http://technocalifornia.blogspot.com/2012/07/more-data-or-better-models.html
![Page 17: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/17.jpg)
Avoid ineffective reactions
Stupid Data Miner Tricks: http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf
[Chart label: S&P 500]
![Page 18: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/18.jpg)
© Henricks Photos, CC BY-ND 2.0: http://www.flickr.com/photos/hendricksphotos/3240667626/
![Page 19: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/19.jpg)
0. Data (Big & Messy)
![Page 20: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/20.jpg)
1. A language for programming with data
Download the White Paper
R is Hot: bit.ly/r-is-hot
![Page 21: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/21.jpg)
Grant awards to homeless veterans FY09
Data: Data.gov
Analysis: Drew Conway
User-defined functions
Internet API interface
XML parsing
Custom graphics
Data import and pre-processing
Iterative data processing
![Page 22: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/22.jpg)
2. Speed. Lots and lots of speed.
Variable Transformation
Model Estimation
Model Refinement
Model Comparison / Benchmarking
Feature Selection
Sampling
Aggregation
Data Predictions
![Page 23: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/23.jpg)
Use all available computing cycles
[Diagram: Multicore Processor (4, 8, 16+ cores) with Core 0 (Thread 0), Core 1 (Thread 1), Core 2 (Thread 2) through Core n (Thread n); other labels: Data, Disk, Shared Memory]
![Page 24: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/24.jpg)
3. Algorithms that don’t choke on Big Data
PEMAs: Parallel External-Memory Algorithms
[Diagram: BIG DATA split into four Data Partitions, each with a Compute Node, plus a Master Node]
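The slide names parallel external-memory algorithms without showing one. A minimal sketch of the general idea, assuming a simple mean-and-variance computation, is below. It is not Revolution Analytics' implementation, and the helper names (`chunks`, `partial_stats`, `combine`, `mean_and_variance`) are hypothetical. Each chunk is reduced to a small summary, and the summaries are then combined, so the full data set never has to sit in memory at once.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items, reading the source only once."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def partial_stats(block):
    """Per-chunk summary: count, sum, and sum of squares."""
    return len(block), sum(block), sum(x * x for x in block)


def combine(a, b):
    """Merge two summaries; the order of merging does not matter."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def mean_and_variance(stream, chunk_size=1000):
    """Compute the mean and sample variance from chunk summaries alone."""
    total = (0, 0.0, 0.0)
    for block in chunks(stream, chunk_size):
        total = combine(total, partial_stats(block))
    n, s, ss = total
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)
    return mean, variance


# Hypothetical input: the integers 1..100, processed seven values at a time.
mean, variance = mean_and_variance(range(1, 101), chunk_size=7)
```

Because `combine` does not depend on the order of its inputs, the per-chunk summaries could be computed on separate cores or nodes and merged by a master process. The sum-of-squares formula is used here for brevity and can lose precision on large or badly scaled data.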
![Page 25: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/25.jpg)
Drink less coffee!
Single-Threaded, Non-optimized algorithms
Optimized, Parallelized Algorithms
![Page 26: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/26.jpg)
4. Move code to data (not vice versa)
Map-Reduce
RHadoop: http://bit.ly/RHadoop
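To make "move code to data" concrete, here is a small map-reduce style sketch in plain Python. It does not use Hadoop or the RHadoop API. The partitions, the region keys, and the function names `map_partition` and `reduce_summaries` are all hypothetical, chosen only to show the shape of the computation: a small function runs against each partition where it lives, and only compact summaries travel back to be merged.

```python
from collections import Counter

# Hypothetical partitions of (region, amount) records, as if stored on two nodes.
partitions = [
    [("east", 10.0), ("west", 4.0)],
    [("east", 20.0), ("west", 6.0), ("west", 8.0)],
]


def map_partition(records):
    """Map step: summarize one partition locally as per-key counts and totals."""
    counts, totals = Counter(), Counter()
    for key, value in records:
        counts[key] += 1
        totals[key] += value
    return counts, totals


def reduce_summaries(summaries):
    """Reduce step: merge the per-partition summaries into per-key means."""
    counts, totals = Counter(), Counter()
    for partition_counts, partition_totals in summaries:
        counts.update(partition_counts)
        totals.update(partition_totals)
    return {key: totals[key] / counts[key] for key in counts}


means = reduce_summaries(map_partition(p) for p in partitions)
```

Only the small `Counter` summaries cross between the map and reduce steps, which is the point of the slide: the raw records stay where they are stored.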
![Page 27: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/27.jpg)
Big Data Appliances
More info: http://bit.ly/R-Netezza
![Page 28: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/28.jpg)
Play Nice with Others
Presentation Layer
• Business Intelligence Tools
• Web-based data apps
• Reporting / Spreadsheets
Analytics Layer
• R
Data Layer
• Relational datastores
• Unstructured datastores
![Page 29: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/29.jpg)
What every data scientist needs
[Comparison table; columns: Open-Source R, Revolution R Enterprise; rows:]
• Interface with multiple data sources
• Exploratory data analysis
• Wide range of statistical methods
• High-speed computation
• Big Data support
• Data/code locality (Hadoop, etc.)
• Print-quality data visualization
• Scheduled batch production
• Works in a multi-tool ecosystem
• Integration into Data Apps
![Page 30: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/30.jpg)
Revolution R Enterprise: Big-Data R
[Comparison table; columns: Open-Source R, Revolution R Enterprise; rows:]
• Interface with multiple data sources
• Exploratory data analysis
• Wide range of statistical methods
• High-speed computation
• Big Data support
• Data/code locality (Hadoop, etc.)
• Print-quality data visualization
• Scheduled batch production
• Works in a multi-tool ecosystem
• Integration into Data Apps
www.revolutionanalytics.com/products
![Page 31: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/31.jpg)
Image © www.tinyplanetphotography.com
![Page 32: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/32.jpg)
And … the future?
Even more data
Cloud computing
Demand for Data Scientists
Diverging paradigms for data analytics
http://www.indeed.com/jobtrends
![Page 33: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/33.jpg)
Diverging data paradigms
Hadoop, NoSQL, Files, Clusters: More data, better fault tolerance
Data Appliances: Easier programming, better performance
[Other diagram labels: Exploration, Modeling, Storage, Preprocessing, Production]
![Page 34: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/34.jpg)
Data Science in Production
Real-time Big Data Analytics: From Deployment to Production
Thursday, November 29, 2012, 10:00AM - 11:00AM Pacific Time
www.revolutionanalytics.com/news-events/free-webinars/
![Page 35: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/35.jpg)
Building Data Science Teams
DJ Patil in O’Reilly Radar: http://oreil.ly/I3H5fI
Statistics and Data Science graduates
Kaggle and Chorus
Revolution Analytics R Training: http://www.revolutionanalytics.com/services/training/
![Page 36: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/36.jpg)
Closing Thoughts
The Data Science process leads to more powerful and more useful models
Data Scientists need a technology platform to think about, explore, and model data
Revolution R Enterprise is R for Big Data
![Page 37: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/37.jpg)
Resources
Revolution R Enterprise, R for Big Data: www.revolutionanalytics.com/products
RHadoop, Connecting R and Hadoop: bit.ly/r-hadoop
Contact: David Smith, [email protected], @revodavid, blog.revolutionanalytics.com
![Page 38: The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough](https://reader034.vdocument.in/reader034/viewer/2022042714/554f5aecb4c905524c8b54b2/html5/thumbnails/38.jpg)
Thank you.
www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR
The leading commercial provider of software and support for the popular open source R statistics language.