TRANSCRIPT
Data Science: Becoming a Data-driven Organization
@MatthewRenze
#MicrosoftUAE
Are you a decision maker?
Are you flooded with data?
Are you using data science?
Why is it important?
What is data science?
How do I get started?
Why is data science important?
Job Postings for Data Scientists
Source: Dice Salary Survey 2017
Top-paying Tech Skills
[Table columns: Skill, 2016, Change]
What’s the problem?
The Current State of Business
Don’t understand customers
Lack of product-market fit
Unused / low-value features
Missed market opportunities
Human biases
Guesswork
Cost of labor
Human errors
Three Main Approaches
Create better products
Make smarter decisions
Reduce labor costs
What is data science?
Computer science
Math and statistics
Domain knowledge
Data science
Data engineering
Scientific method
Data science
Data Knowledge Decision Action
What Is a Data Scientist?
Performs data science
More than a scientist
More than an analyst
More than a developer
What skills are necessary?
Data Science Skills
Programming
Working with data
Descriptive statistics
Data visualization
Statistical modeling
Handling Big Data
Machine learning
Deploying to production
What tools are used?
Data Science Tools
[Bar chart. Vertical axis: share of respondents, 0% to 70%. Horizontal axis: tool (language, platform, analytics). Tools shown: SQL, Excel, Python, MySQL, R, Python tools, ggplot, SQL Server, Tableau, JavaScript, Matplotlib, Java, PostgreSQL, Oracle, D3, Homegrown, Hive, Spark, Cloudera, Visual Basic, MongoDB, Hadoop, SAS, C++, Scala, PowerPivot, SQLite, C, Pig, Redshift, Weka, HBase, (EMR), Perl, SPSS, Teradata]
Source: O’Reilly 2015 Data Science Salary Survey
How is data science performed?
The Data Science Process
Find a question
Collect the data
Prepare the data
Create a model
Evaluate the model
Deploy the model
Data
The Data Science Process
Iterative process
Non-sequential
Early termination
Find a question
Explore the data
Prepare the data
Create a model
Evaluate the model
Deploy the model
Data
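The steps of the process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's method: the data is synthetic, the question ("how does y depend on x?") is made up, the model is a simple least-squares line, and every function name is hypothetical.

```python
import random
import statistics


def load_data(n=200, seed=0):
    """Synthetic (x, y) records standing in for real data (an assumption)."""
    rng = random.Random(seed)
    return [(x, 3.0 * x + 2.0 + rng.gauss(0, 1))
            for x in (rng.uniform(0, 10) for _ in range(n))]


def explore_data(rows):
    """Explore: summarize the data before modeling."""
    ys = [y for _, y in rows]
    return {"rows": len(rows), "mean_y": statistics.fmean(ys)}


def prepare_data(rows, train_fraction=0.8):
    """Prepare: split the records into training and held-out test sets."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def create_model(train):
    """Create a model: least-squares line y = slope * x + intercept."""
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate_model(model, test):
    """Evaluate: mean squared error on the held-out data."""
    slope, intercept = model
    return statistics.fmean((y - (slope * x + intercept)) ** 2 for x, y in test)


def deploy_model(model):
    """Deploy: wrap the fitted parameters in a callable predictor."""
    slope, intercept = model
    return lambda x: slope * x + intercept


def run_process(max_error=2.0):
    """Run one pass of the process; return None on early termination."""
    rows = load_data()
    if explore_data(rows)["rows"] < 10:
        return None  # early termination: too little data to continue
    train, test = prepare_data(rows)
    model = create_model(train)
    if evaluate_model(model, test) > max_error:
        return None  # early termination: model not good enough to deploy
    return deploy_model(model)


if __name__ == "__main__":
    predictor = run_process()
    print("deployed" if predictor else "terminated early")
```

In practice a pass that returns None would send you back to an earlier step (a new question, more data, a different model), which is the iterative and non-sequential part of the process.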
How do I get started?
What are the ingredients of a data-driven enterprise?
Strategy
People
Data
Technology
Culture
What is the process of becoming a data-driven enterprise?
AI
Predict
Analyze
Organize
Collect
Needs
1. Collect
Transactions
Logging
Digitization
Telemetry
Experiments
External data
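As a rough illustration of the logging and telemetry items above, the sketch below writes one structured event per line as JSON. The function name `record_event`, the event name, and the property names are all hypothetical; a real system would more likely use an existing logging or telemetry library.

```python
import io
import json
from datetime import datetime, timezone


def record_event(stream, name, **properties):
    """Append one telemetry event to the stream as a single JSON line."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": name,
        "properties": properties,
    }
    stream.write(json.dumps(event) + "\n")
    return event


if __name__ == "__main__":
    # An in-memory buffer stands in for a log file or message queue.
    buffer = io.StringIO()
    record_event(buffer, "checkout", cart_value=42.5, items=3)
    print(buffer.getvalue(), end="")
```

One event per line keeps the collected data easy to append to and easy to parse later, when it is organized and analyzed.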
2. Organize
Transform
Clean
Store
Data ETL
Data Warehouse
Data Lake
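The transform, clean, and store items above can be sketched as a very small ETL job using only the Python standard library. The sample CSV, the `sales` table, and the function names are assumptions made for illustration; an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent casing, stray spaces, and a gap.
RAW_CSV = """customer,amount,country
alice, 19.99 ,ae
BOB,5,AE
carol,,US
dave,12.50,us
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform and clean: normalize text, parse numbers, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        cleaned.append((row["customer"].strip().title(),
                        float(amount),
                        row["country"].strip().upper()))
    return cleaned


def load(rows, connection):
    """Load (store): write the cleaned rows into a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, country TEXT)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), connection)
    query = "SELECT country, SUM(amount) FROM sales GROUP BY country"
    for country, total in connection.execute(query):
        print(country, total)
```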
3. Analyze
Reports
Dashboards
KPI monitors
Data mining
Descriptive analytics
Diagnostic analytics
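To make the descriptive analytics and KPI monitor items above a little more concrete, here is a minimal sketch using Python's `statistics` module. The daily order counts, the function names, and the two-standard-deviation alert rule are assumptions chosen for illustration, not a recommended monitoring policy.

```python
import statistics


def describe(values):
    """Descriptive analytics: basic summary statistics for one metric."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def kpi_alert(history, latest, threshold=2.0):
    """KPI monitor: flag a value far from the historical mean.

    Returns True when `latest` is more than `threshold` sample standard
    deviations away from the mean of `history`.
    """
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(latest - mean) > threshold * stdev


if __name__ == "__main__":
    daily_orders = [100, 102, 98, 101, 99]  # hypothetical KPI history
    print(describe(daily_orders))
    print(kpi_alert(daily_orders, latest=110))
```

A report or dashboard would typically present the output of something like `describe`, while a monitor would run a check like `kpi_alert` on each new value.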
4. Predict
Predictive analytics
Prescriptive analytics
Machine learning
5. Automate
Artificial intelligence
Reinforcement learning
Deep learning
Advice for Success
Get buy-in from leadership
Focus on low-hanging fruit
Don’t silo data science teams
Democratize your data
Embrace smart failure
Focus on feedback
Embed data collection
Avoid the Observer Effect
Where to Go Next?
Data Camp: https://www.datacamp.com
Pluralsight: https://www.pluralsight.com
Coursera: https://www.coursera.org
Pluralsight Courses
Data Science: The Big Picture
Data Science with R
Exploratory Data Analysis with R
Data Visualization with R (3-part)
Deep Learning: The Big Picture
https://www.pluralsight.com/authors/matthew-renze
www.matthewrenze.com
Feedback
Very important to me!
What did you like?
What could I improve?
Conclusion
Why is it important?
What is data science?
How do I get started?
Are you prepared?
Is your organization prepared?
Is our world prepared?
Thank You!
Matthew Renze
Data Science Consultant
Renze Consulting
Twitter: @matthewrenze
Email: [email protected]
Website: www.matthewrenze.com