1 EMC CONFIDENTIAL—INTERNAL USE ONLY

The Data Analytic Lifecycle

Steve Todd, EMC Fellow, Vice President of Strategy and Innovation

Academic University, St. Petersburg, Russia, April 11, 2013


Goals

Introduce myself

My history managing Global Innovation since 2011

My decision to gather/analyze global innovation data

The mistakes I made when I started

My involvement with EMC’s Data Scientist curriculum

Starting over with the Data Analytics Lifecycle


My Career

B.S.C.S., M.S.C.S., University of New Hampshire

200+ patents filed

Author of two books on innovation

Selected as a Top 10 innovation blogger

Selected as an EMC Distinguished Engineer in 2008

One of 5 active EMC Fellows (out of 60,000+ employees)

Corporate Vice President of Strategy and Innovation

Global innovation consultant: Russia, China, Israel, Egypt, Europe, India, and Brazil


May 2011: Director of the EIN

The EMC Innovation Network (EIN) was created in 2007

The Director manages global innovation and research

Mission Statement:

You can’t manage what you can’t measure…

“Expand knowledge locally, transfer it globally, and leverage it strategically”


Gathering Innovation Data

Team formed June 7, 2011

Beijing / Jidong Chen

Bangalore / Karthik Srinivasan

Tel Aviv / Yael Villa

St. Petersburg / Pavel Egorov, Inga Petryaevskaya, Ivan Gumenyuk

Cork / Padraig Murphy

Santa Clara / Mike Dutch

Cairo / Shareef Bassiouny

Shanghai / Roby Chen

Hopkinton / Steve Todd, Sudhir Vijendra, Mary Henderson, Sairam Iyer, Calvin Smith


Data Collection

Track activities commonly associated with innovation:

• University engagements
• Publications
• Conferences
• Customers/partners
• Knowledge transfer sessions
• Ideas
• Intellectual property


Architectural Approach: Dashboard/Analytics for Research/Innovation Activities

• Database
• Dashboard: metrics/reports
• Analytics
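The three tiers above (database, dashboard, analytics) can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in `sqlite3` as the database tier; the table name `activities`, its columns, and the sample rows are hypothetical and not EMC's actual schema.

```python
import sqlite3

def build_database(rows):
    """Database tier: one row per innovation activity (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (region TEXT, activity_type TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", rows)
    return conn

def dashboard_metrics(conn):
    """Dashboard tier: count activities per region and activity type."""
    query = """
        SELECT region, activity_type, COUNT(*) AS n
        FROM activities
        GROUP BY region, activity_type
        ORDER BY region, activity_type
    """
    return conn.execute(query).fetchall()

def share_by_region(conn):
    """Analytics tier (placeholder): each region's share of all activities."""
    total = conn.execute("SELECT COUNT(*) FROM activities").fetchone()[0]
    rows = conn.execute(
        "SELECT region, COUNT(*) FROM activities GROUP BY region ORDER BY region"
    ).fetchall()
    return {region: n / total for region, n in rows}

if __name__ == "__main__":
    sample = [  # hypothetical sample records
        ("St. Petersburg", "publication", 2012),
        ("St. Petersburg", "university engagement", 2012),
        ("Bangalore", "idea", 2012),
        ("Bangalore", "idea", 2013),
    ]
    conn = build_database(sample)
    for row in dashboard_metrics(conn):
        print(row)
    print(share_by_region(conn))
```

Keeping the metrics as SQL queries over one table means the dashboard and the analytics read the same source of truth; a real deployment would likely swap SQLite for a shared database.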


Problem #1

Dirty Data


Problem #2

Selecting an Analytic Model


Problem #3

Too many visualizations!


Problem #4

No way to measure “lineage”

[Figure: the lineage of an idea, from Ideas 1–7 through Finalists 1–3, proof-of-concept (POC) meetings 1–4, a product specification, Sprints 1–3, and a completed product, to Patents 1 and 2]
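One way to make lineage measurable is to record, for each artifact, the earlier artifacts it was derived from, and then walk those links backwards. The sketch below assumes a hypothetical `parents` mapping; the specific links between ideas, finalists, meetings, sprints, and patents are invented for illustration, because the figure does not say which items feed which.

```python
from collections import deque

# Hypothetical lineage links: each artifact maps to the artifacts it came from.
parents = {
    "Finalist 1": ["Idea 1", "Idea 2"],
    "Finalist 2": ["Idea 4"],
    "POC Mtg 1": ["Finalist 1"],
    "POC Mtg 2": ["Finalist 1", "Finalist 2"],
    "Product Specification": ["POC Mtg 1", "POC Mtg 2"],
    "Sprint 1": ["Product Specification"],
    "Sprint 2": ["Sprint 1"],
    "Product Complete": ["Sprint 2"],
    "Patent 1": ["Finalist 2"],
}

def ancestry(artifact, parents):
    """Every artifact reachable by following parent links (breadth-first)."""
    seen = set()
    queue = deque(parents.get(artifact, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # guards against revisiting shared ancestors
            continue
        seen.add(node)
        queue.extend(parents.get(node, []))
    return seen

def root_ideas(artifact, parents):
    """Ancestors with no recorded parents, i.e. the originating ideas."""
    return sorted(a for a in ancestry(artifact, parents) if a not in parents)

print(root_ideas("Product Complete", parents))  # ['Idea 1', 'Idea 2', 'Idea 4']
print(root_ideas("Patent 1", parents))          # ['Idea 4']
```

With links like these captured at each stage, "lineage" becomes a graph query rather than a manual reconstruction.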


Problem #5

No recommendations to improve innovation at EMC


EMC to the Rescue

EMC is a Big Data analytics company…

EMC has created a Data Scientist curriculum


IN 2000 THE WORLD GENERATED

TWO EXABYTES OF NEW INFORMATION

Sources: “How Much Information?”, Peter Lyman and Hal Varian, UC Berkeley; 2011 IDC Digital Universe Study.



IN 2000 THE WORLD GENERATED

TWO EXABYTES OF NEW INFORMATION

EVERY DAY


I enrolled in the Data Scientist course…


…and discovered the Data Analytics Lifecycle


Phase 1: Discovery

• Frame the business problem as an analytic challenge that can be solved in phases.

• Understand what's been done in the past.

• Assess the resources supporting the project (people, technology, time, and data).

• Form initial hypotheses.

• Create an Analytic Plan.


Phase 1: Hypotheses

• Statements that I will try to prove or disprove with analytics:

• IH1: Innovation activity in different geographic regions can be mapped to corporate strategic directions.
• IH2: Innovators that participate in global knowledge transfer deliver ideas more quickly than those that do not.
• IH3: An idea submission can be analyzed and evaluated for the likelihood of receiving funding.
• IH4: Knowledge discovery and growth for a particular topic can be measured and compared across geographic regions.
• IH5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
• IH6: Strategic corporate themes can be mapped to geographic regions.
• IH7: Frequent knowledge expansion and transfer events reduce the amount of time it takes to generate a corporate asset from an idea.
• IH8: Emerging research topics can be classified and mapped to specific ideators, innovators, boundary spanners, and assets.

An increase in geographic knowledge transfer improves the speed of idea delivery.
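A hypothesis such as IH2 can be tested by comparing idea-delivery times for innovators who did and did not take part in global knowledge transfer. The sketch below uses a one-sided permutation test on the difference in mean delivery time; the delivery times are made-up numbers, and the choice of test is an assumption, not the method the project used.

```python
import random
from statistics import mean

def permutation_test(with_transfer, without_transfer, n_iter=10_000, seed=0):
    """One-sided permutation test on mean delivery time.

    H0: knowledge-transfer participation has no effect on delivery time.
    H1: participants deliver faster (smaller mean delivery time).
    Returns (observed difference in means, estimated p-value).
    """
    observed = mean(without_transfer) - mean(with_transfer)
    pooled = list(with_transfer) + list(without_transfer)
    k = len(with_transfer)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel innovators at random under H0
        diff = mean(pooled[k:]) - mean(pooled[:k])
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_iter + 1)
    return observed, p_value

if __name__ == "__main__":
    # Hypothetical days from idea submission to delivery.
    participants = [30, 42, 35, 28, 40, 33]
    non_participants = [55, 48, 60, 39, 52, 58]
    diff, p = permutation_test(participants, non_participants)
    print(f"mean difference = {diff:.1f} days, p ≈ {p:.4f}")
```

A permutation test makes no normality assumption, which suits the small, skewed samples typical of early innovation data.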


Phase 1: Discovery

• Create an Analytic Plan


Phase 2: Data Preparation

• Build an analytic sandbox

• Extract, Load, Transform (ELT, not ETL)

• Explore the Data

• Assess Data Quality

• Phase 2 is all about conditioning, or preparing, the data…

• …and I DISCOVERED that I did not have the right data to prove one of my hypotheses.
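ELT loads raw data into the sandbox first and transforms it there afterwards, so the untouched original stays available for exploration and quality checks. A minimal sketch, assuming SQLite as a stand-in for the analytic sandbox and a hypothetical CSV export of idea submissions with columns `idea_id`, `region`, and `submitted`:

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent region spelling, a duplicate row, a missing date.
RAW_CSV = """idea_id,region,submitted
1,St. Petersburg,2012-03-01
2, st. petersburg ,2012-04-15
2, st. petersburg ,2012-04-15
3,Bangalore,
4,BANGALORE,2012-06-30
"""

def extract_and_load(conn, text):
    """E + L: copy the raw rows into the sandbox exactly as exported."""
    conn.execute(
        "CREATE TABLE raw_ideas (idea_id TEXT, region TEXT, submitted TEXT)"
    )
    reader = csv.DictReader(io.StringIO(text))
    rows = [(r["idea_id"], r["region"], r["submitted"]) for r in reader]
    conn.executemany("INSERT INTO raw_ideas VALUES (?, ?, ?)", rows)

def transform(conn):
    """T: condition the data inside the sandbox; raw_ideas is left untouched."""
    conn.execute("""
        CREATE TABLE ideas AS
        SELECT DISTINCT CAST(idea_id AS INTEGER) AS idea_id,
               LOWER(TRIM(region)) AS region,
               NULLIF(submitted, '') AS submitted
        FROM raw_ideas
    """)

def quality_report(conn):
    """Basic data-quality checks comparing the raw and conditioned tables."""
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]
    return {
        "raw_rows": scalar("SELECT COUNT(*) FROM raw_ideas"),
        "clean_rows": scalar("SELECT COUNT(*) FROM ideas"),
        "missing_dates": scalar("SELECT COUNT(*) FROM ideas WHERE submitted IS NULL"),
        "regions": [r[0] for r in conn.execute(
            "SELECT DISTINCT region FROM ideas ORDER BY region")],
    }

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, RAW_CSV)
    transform(conn)
    print(quality_report(conn))
    # {'raw_rows': 5, 'clean_rows': 4, 'missing_dates': 1,
    #  'regions': ['bangalore', 'st. petersburg']}
```

A report like this is where gaps surface: a column that is mostly null, or a field that was never collected, shows that a hypothesis cannot be tested with the data at hand.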


Phase 2: Data Preparation


Phase 3: Model Planning

• In Phase 2, the data exploration was mainly about conditioning the data, exploring it, validating its quality, and understanding it more fully.

• In Phase 3, look at every hypothesis and perform limited experiments with different analytic models.

IH5: Knowledge transfer activity can identify research-specific boundary spanners in disparate regions.
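A limited first experiment for IH5 might simply count, for each person, how many regions other than their own they exchange knowledge with; people who reach several regions are candidate boundary spanners. The names, regions, and transfer events below are hypothetical, and the `min_regions` threshold is an arbitrary assumption.

```python
from collections import defaultdict

# Hypothetical home regions and knowledge-transfer events (undirected pairs).
home = {"ana": "Cork", "boris": "St. Petersburg", "chen": "Shanghai",
        "dana": "Tel Aviv", "eli": "Tel Aviv", "faye": "Cork"}
events = [("ana", "boris"), ("ana", "chen"), ("ana", "dana"),
          ("eli", "dana"), ("faye", "ana"), ("boris", "chen")]

def foreign_reach(home, events):
    """Map each person to the set of other regions they exchanged knowledge with."""
    reach = defaultdict(set)
    for a, b in events:
        if home[a] != home[b]:  # same-region transfers do not span a boundary
            reach[a].add(home[b])
            reach[b].add(home[a])
    return reach

def candidate_spanners(home, events, min_regions=2):
    """People whose contacts span at least `min_regions` regions besides their own."""
    reach = foreign_reach(home, events)
    return sorted(p for p, regions in reach.items() if len(regions) >= min_regions)

print(candidate_spanners(home, events))                 # ['ana', 'boris', 'chen']
print(candidate_spanners(home, events, min_regions=3))  # ['ana']
```

This heuristic is cheap and easy to explain, which makes it a reasonable baseline before trying a full network model.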

25 EMC CONFIDENTIAL—INTERNAL USE ONLY

Phase 4: Run the Models!

• Key analytic models chosen for our project:
   • Social Network Analysis
   • Topic Modeling (Stanford Toolkit)
   • Natural Language Processing
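For social network analysis, one standard way to score boundary spanners is betweenness centrality: how often a person lies on the shortest paths between other people. The sketch below implements Brandes' algorithm for an unweighted, undirected graph using only the standard library; the graph is hypothetical, and the slide does not say whether the project used betweenness or which tool computed it.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs.

    graph: dict mapping each node to a list of its neighbours.
    Returns un-normalized betweenness (each unordered pair counted once).
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

if __name__ == "__main__":
    # Hypothetical network: two small clusters joined through one person.
    graph = {
        "a1": ["a2", "bridge"], "a2": ["a1", "bridge"],
        "bridge": ["a1", "a2", "b1", "b2"],
        "b1": ["bridge", "b2"], "b2": ["bridge", "b1"],
    }
    print(betweenness(graph))
    # {'a1': 0.0, 'a2': 0.0, 'bridge': 4.0, 'b1': 0.0, 'b2': 0.0}
```

The node joining the two clusters scores highest because every cross-cluster shortest path passes through it, which is the intuition behind a "boundary spanner".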


Phase 5: Communicate Results

• Many people who are great at analytics do not enjoy telling the story or evangelizing the project…

• …but analytics are supposed to drive change!


Phase 6: Operationalize

• Employees can “improve” their ideas by using our analytic models

• Complex text matching algorithm
• Helps measure the ancestry of ideas
• Helps identify subject matter experts around the world
• Convinces EMC executives that Big Data is powerful
• Great public relations for EMC
• Identifies “clusters” of innovators
• Creates a data-driven culture at EMC
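The slide does not describe the text matching algorithm itself. As a hedged stand-in, a common baseline is TF-IDF weighting plus cosine similarity, which ranks existing ideas by how much vocabulary they share with a new submission and weights rare terms more heavily. The idea texts, the tokenizer, and the smoothed-IDF formula below are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case alphanumeric tokens; deliberately simple (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """One {term: weight} dict per document: raw term count times smoothed IDF."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in counts for term in tf)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in tf.items()} for tf in counts]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def most_similar(new_idea, existing):
    """Rank existing ideas by similarity to a new idea, best match first."""
    vectors = tfidf_vectors(existing + [new_idea])
    query = vectors[-1]
    scores = [(cosine(query, vec), text) for vec, text in zip(vectors[:-1], existing)]
    return sorted(scores, reverse=True)

if __name__ == "__main__":
    existing = [  # hypothetical prior idea submissions
        "deduplicate backup data across storage arrays",
        "predict disk failures from sensor logs",
        "compress backup streams before replication",
    ]
    for score, text in most_similar("faster deduplication of backup data", existing):
        print(f"{score:.3f}  {text}")
```

Because the matched ideas carry their submitters, the same ranking can point an employee toward related prior work and toward colleagues with overlapping expertise.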


EMC Data Science Curriculum

Course lengths: 90 min, 1 day, 5 days

Audiences: Aspiring Data Scientists, Business Leaders, Heads of Data Science Teams

Courses:

• Data Science and Big Data Analytics
• Data Science and Big Data Analytics for Business Transformation
• Introducing Data Science and Big Data Analytics for Business Transformation

Two of these courses are marked “New”.


Questions?

Additional Resources:

1. EMC Education Services curriculum on Data Science and Big Data Analytics for Business Transformation:

http://education.emc.com/guest/campaign/data_science.aspx

2. David Dietrich's blog on Data Science & Big Data Analytics:

http://infocus.emc.com/author/david_dietrich/

3. Blog on applying the Data Analytics Lifecycle to measuring innovation data:

http://stevetodd.typepad.com/my_weblog/data-science-and-big-data-curriculum/
