1 EMC CONFIDENTIAL—INTERNAL USE ONLY
The Data Analytic Lifecycle
Steve Todd, EMC Fellow Vice President of Strategy and Innovation
Academic University St. Petersburg, Russia April 11, 2013
Goals
Introduce myself
My history managing Global Innovation since 2011
My decision to gather/analyze global innovation data
The mistakes I made when I started
My involvement with EMC’s Data Scientist curriculum
Starting over with the Data Analytics Lifecycle
My Career
B.S.C.S., M.S.C.S., University of New Hampshire
200+ patents filed
Author of Two Books on Innovation
Selected as Top 10 Innovation Blogger
Selected as EMC Distinguished Engineer in 2008
One of 5 active EMC Fellows (60,000+ employees)
Corporate Vice President of Strategy and Innovation
Global Innovation consultant
Russia, China, Israel, Egypt, Europe, India, and Brazil
May 2011: Director of the EIN
EMC Innovation Network was created in 2007
The Director manages global innovation and research
Mission Statement:
You can’t manage what you can’t measure…
‘Expand knowledge locally, Transfer it globally, and Leverage it strategically’
Gathering Innovation Data
Team Formed June 7, 2011
Beijing / Jidong Chen
Bangalore / Karthik Srinivasan
Tel Aviv / Yael Villa
St. Petersburg / Pavel Egorov, Inga Petryaevskaya, Ivan Gumenyuk
Cork / Padraig Murphy
Santa Clara / Mike Dutch
Cairo / Shareef Bassiouny
Hopkinton /
Shanghai / Roby Chen
Steve Todd, Sudhir Vijendra, Mary Henderson, Sairam Iyer, Calvin Smith
Data Collection
Track activities commonly associated with innovation
University Engagements
Publications
Conferences
Customers/Partners
Knowledge Transfer Sessions
Ideas
Intellectual Property
Architectural Approach
Dashboard/Analytics for research/innovation activities
Database
Dashboard – Metrics/reports
Analytics
Problem #1
Dirty Data
Problem #2
Selecting an Analytic Model
Problem #3
Too many visualizations!
Problem #4
No way to measure “lineage”
Ideas 1-7
Finalists 1-3
POC Mtgs 1-4
Product Specification
Sprints 1-3
Product Complete
Patents 1-2
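One way to make lineage measurable is to record each hand-off between stages as a directed link and then walk those links backwards. The sketch below is only an illustration of that idea. The slide names the stages but not which idea fed which finalist, so every link in HYPOTHETICAL_LINKS is invented, and ancestors is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical parent -> child links. The slide lists the stages only;
# which idea fed which finalist (and so on) is assumed for illustration.
HYPOTHETICAL_LINKS = [
    ("Idea 2", "Finalist 1"),
    ("Idea 5", "Finalist 1"),
    ("Finalist 1", "POC Mtg 1"),
    ("POC Mtg 1", "Product Specification"),
    ("Product Specification", "Sprint 1"),
    ("Sprint 1", "Product Complete"),
    ("Product Complete", "Patent 1"),
]


def ancestors(node, links):
    """Return every upstream activity that `node` can be traced back to."""
    parents = defaultdict(set)
    for parent, child in links:
        parents[child].add(parent)

    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


lineage = ancestors("Patent 1", HYPOTHETICAL_LINKS)
print(sorted(lineage))
```

With links like these in the database, a question such as "which ideas led to this patent?" becomes a graph traversal rather than guesswork.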
Problem #5
No recommendations to improve innovation at EMC
EMC To the Rescue
EMC is a Big Data Analytics company…
EMC has created a Data Scientist curriculum
IN 2000 THE WORLD GENERATED
TWO EXABYTES OF NEW INFORMATION
Sources: “How Much Information?” Peter Lyman and Hal Varian, UC Berkeley; 2011 IDC Digital Universe Study.
IN 2000 THE WORLD GENERATED
TWO EXABYTES OF NEW INFORMATION
EVERY DAY
I enrolled in the Data Scientist Course…
… and discovered the Data Analytics Lifecycle
Phase 1: Discovery
• Frame the business problem as an analytic challenge that can be solved in phases.
• Understand what's been done in the past.
• Assess the resources supporting the project (people, technology, time, and data).
• Form initial hypotheses.
• Create an Analytic Plan
Phase 1: Hypotheses
• Statements that I will try to prove or disprove with analytics
• IH1: Innovation activity in different geographic regions can be mapped to corporate strategic directions.
• IH2: Innovators that participate in global knowledge transfer deliver ideas more quickly than those that do not.
• IH3: An idea submission can be analyzed and evaluated for the likelihood of receiving funding.
• IH4: Knowledge discovery and growth for a particular topic can be measured and compared across geographic regions.
• IH5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
• IH6: Strategic corporate themes can be mapped to geographic regions.
• IH7: Frequent knowledge expansion and transfer events reduce the amount of time it takes to generate a corporate asset from an idea.
• IH8: Emerging research topics can be classified and mapped to specific ideators, innovators, boundary spanners, and assets.
An increase in geographic knowledge transfer improves the speed of idea delivery.
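A hypothesis about speed of delivery, such as the one on knowledge transfer, could be checked with a simple statistical test once delivery times are available. The sketch below uses an exact permutation test, which is one possible choice and not a method named in the deck. The delivery times are invented for illustration and are not EMC data, and permutation_p_value is a hypothetical helper name.

```python
from itertools import combinations
from statistics import mean

# Invented days-to-delivery figures, for illustration only (not EMC data).
with_transfer = [30, 35, 40, 42]     # innovators who took part in knowledge transfer
without_transfer = [45, 50, 55, 60]  # innovators who did not


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test: is mean(group_a) lower than mean(group_b)?

    Returns the share of relabelings whose mean difference is at least as
    large as the observed one.
    """
    observed = mean(group_b) - mean(group_a)
    pooled = group_a + group_b
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(b) - mean(a) >= observed:
            extreme += 1
        total += 1
    return extreme / total


p = permutation_p_value(with_transfer, without_transfer)
print(f"one-sided p-value: {p:.4f}")
```

With these made-up numbers, only 1 of the 70 possible relabelings is as extreme as the observed split, so the p-value is 1/70. Real innovation data would need far more care about sample size and confounding factors.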
Phase 1: Discovery
• Create an Analytic Plan
Phase 2: Data Preparation
• Build an analytic sandbox
• Extract, Load, Transform (ELT, not ETL)
• Explore the Data
• Assess Data Quality
• Phase 2 is all about conditioning, or preparing, the data…
• … and I DISCOVERED that I did not have the right data to prove one of my hypotheses
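The ELT ordering can be sketched in a few lines: the raw data is loaded into the sandbox untouched, and the conditioning happens afterwards inside the sandbox, so the original rows stay available for quality checks. This is a minimal sketch, assuming an in-memory SQLite database as a stand-in for an analytic sandbox. The table names, columns, and rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system:
# inconsistent casing, stray whitespace, and a missing region.
RAW_ROWS = [
    ("  Idea 1 ", "st. petersburg", "2012-03-01"),
    ("Idea 2", "CORK", "2012-04-15"),
    ("Idea 3", None, "2012-05-20"),
]

sandbox = sqlite3.connect(":memory:")  # stand-in for an analytic sandbox

# Extract + Load: copy the raw data into the sandbox untouched.
sandbox.execute("CREATE TABLE raw_ideas (title TEXT, region TEXT, submitted TEXT)")
sandbox.executemany("INSERT INTO raw_ideas VALUES (?, ?, ?)", RAW_ROWS)

# Transform: condition the data inside the sandbox, keeping the raw table.
sandbox.execute(
    """
    CREATE TABLE clean_ideas AS
    SELECT TRIM(title) AS title,
           UPPER(TRIM(region)) AS region,
           submitted
    FROM raw_ideas
    WHERE region IS NOT NULL
    """
)

# Assess data quality: how many raw rows could not be used?
raw_count = sandbox.execute("SELECT COUNT(*) FROM raw_ideas").fetchone()[0]
clean_count = sandbox.execute("SELECT COUNT(*) FROM clean_ideas").fetchone()[0]
clean_rows = sandbox.execute(
    "SELECT title, region FROM clean_ideas ORDER BY title"
).fetchall()
print(f"{raw_count - clean_count} of {raw_count} raw rows lack a region")
print(clean_rows)
```

Because the raw table is still in the sandbox, a gap like the missing region shows up as a count rather than disappearing silently during the transform.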
Phase 3: Model Planning
In Phase 2, the data exploration was mainly about conditioning the data, exploring it, validating its quality, and understanding it more fully.
Phase 3:
• Look at every hypothesis
• Perform limited experiments with different analytic models
IH5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
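A limited experiment for the boundary spanner hypothesis might start with a very simple proxy: count how many other regions each person has exchanged knowledge with. The deck does not say which network measure was used, so this count is a stand-in chosen for brevity. The people, regions, and sessions below are invented, and regions_reached is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical people, regions, and knowledge transfer sessions,
# invented for illustration (not EMC data).
REGION_OF = {
    "Ana": "Cork",
    "Boris": "St. Petersburg",
    "Chen": "Beijing",
    "Dina": "Cairo",
    "Eli": "Cork",
}
TRANSFERS = [  # (presenter, attendee)
    ("Ana", "Boris"),
    ("Ana", "Chen"),
    ("Ana", "Eli"),
    ("Boris", "Dina"),
    ("Chen", "Boris"),
]


def regions_reached(transfers, region_of):
    """Map each person to the set of other regions they exchanged knowledge with."""
    reached = defaultdict(set)
    for a, b in transfers:
        if region_of[a] != region_of[b]:
            reached[a].add(region_of[b])
            reached[b].add(region_of[a])
    return reached


reached = regions_reached(TRANSFERS, REGION_OF)
# Rank by number of regions reached, breaking ties alphabetically.
ranking = sorted(reached, key=lambda person: (-len(reached[person]), person))
for person in ranking:
    print(person, sorted(reached[person]))
```

People near the top of such a ranking are candidates for boundary spanners. A fuller social network analysis would likely use a graph measure such as betweenness centrality instead of a raw count.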
Phase 4: Run the Models!
Key analytic models chosen for our project:
• Social Network Analysis
• Topic Modeling (Stanford Toolkit)
• Natural Language Processing
Phase 5: Communicate Results
• Many people who are great at the analytics do not enjoy telling their story or evangelizing the project…
• … but analytics are supposed to drive change!
Phase 6: Operationalize
• Employees can “improve” their ideas by using our analytic models
• Complex text matching algorithm
• Helps measure ancestry of ideas
• Helps identify subject matter experts around the world
• Convinces EMC executives that Big Data is powerful
• Great public relations for EMC
• Identifies “clusters” of innovators
• Creates a data-driven culture at EMC
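The deck does not describe the text matching algorithm itself. As a rough illustration of how matching a new submission against earlier ones could surface related ideas, the sketch below uses Jaccard similarity over word sets, a much simpler technique swapped in for the example. The idea texts are invented, and tokens and jaccard are hypothetical helper names.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: shared words / all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical idea submissions, invented for illustration.
earlier_ideas = {
    "Idea A": "Deduplicate backup data across virtual machines",
    "Idea B": "Visualize storage capacity trends for customers",
}
new_idea = "Deduplicate data across virtual machines before backup"

scores = {name: jaccard(new_idea, text) for name, text in earlier_ideas.items()}
closest = max(scores, key=scores.get)
print(closest, round(scores[closest], 2))
```

A high score against an earlier submission suggests a possible ancestor idea, and its author is a possible subject matter expert to consult.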
EMC Data Science Curriculum
90 min
1 day
5 days
Aspiring Data Scientists
Business Leaders
Heads of Data Science Teams
Data Science and Big Data Analytics
Data Science and Big Data Analytics for Business Transformation
Introducing Data Science and Big Data Analytics for Business Transformation
New
New
Questions?
Additional Resources:
1. EMC Education Services curriculum on Data Science and Big Data Analytics for Business Transformation:
http://education.emc.com/guest/campaign/data_science.aspx
2. My Blog on Data Science & Big Data Analytics:
http://infocus.emc.com/author/david_dietrich/
3. Blog on applying Data Analytics Lifecycle to measuring innovation data:
http://stevetodd.typepad.com/my_weblog/data-science-and-big-data-curriculum/