symex 2015 - agile process for big data analytic

More than 17 years of experience in the IT industry, with a theoretical physics background. He started as a scientific programmer at the University of Indonesia’s semiconductor lab, then later worked as a software engineer and architect at various software companies. He joined SRIN in 2014 to lead development of various mobile apps and middleware platforms, as well as to conduct research projects on predictive data analytics using deep machine learning technologies. Prior to SRIN, he spent 10 years at Microsoft Indonesia as Director of the Developer Ecosystem (DX) division.

Upload: pmi-indonesia-chapter

Post on 21-Jan-2017


Category: Leadership & Management



TRANSCRIPT


• Context of “Big Data” Science
• Scope of Data Analytic
• Project Management Complexities
• Team Structure and R&R
• Agile Principles and Process Model
• Common Execution Issues
• Q&A

Volume: Exceeds physical limits of vertical scalability

Velocity: Decision window small compared to data change rate

Variety: Many different formats make integration expensive

Variability: Many options or variable interpretations confound analysis

By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.

– Gartner, Mark Beyer, “Information Management in the 21st Century”

Data, Data, ... Everywhere: New Data Sources, Larger Data Volumes

New Data Management Technologies: Hadoop + Spark + Tool Ecosystem

New Era of Data Analytics: Descriptive, Predictive & Prescriptive; Data-Driven Organization

10x increase every five years
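
As a rough check on what this rate means per year, assuming the growth compounds evenly (an assumption; the slide gives only the five-year figure), the yearly growth factor g satisfies:

```latex
g^{5} = 10 \quad\Longrightarrow\quad g = 10^{1/5} \approx 1.585
```

Under that assumption, data volume grows by roughly 58% each year.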

85% from new data types

Volume Velocity Variety

[Figure: timeline, 1940 to 2015; label: 2013]

Cloud Data Storage is Unlimited

Quincy, WA; Chicago, IL; San Antonio, TX; Dublin, Ireland; Generation 4 DCs

[Figure: timeline, 1940 to 2015; label: 2015]

Generic Tasks
1. Define Analytic Requirement
2. Set Up Infrastructure
3. Collect Data
4. Data Modeling
5. Data Processing
6. Model Deployment
7. Monitoring
8. Evaluation
9. Etc.

[Figure: analytics types plotted on two axes, Complexity and Value]

Descriptive Analytics: What happened?

Diagnostic Analytics: Why did it happen?

Predictive Analytics: What will happen?

Prescriptive Analytics: How can we make it happen?
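
The four questions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not from the slides: the sales figures, the function names, the straight-line forecast, and the uplift values per action are all hypothetical.

```python
from statistics import mean

# Hypothetical monthly unit sales, invented purely for illustration.
sales = [100, 110, 120, 130, 140, 150]


def descriptive(series):
    """What happened? Summarize the history."""
    return {"total": sum(series), "average": mean(series)}


def diagnostic(series):
    """Why did it happen? Here, only the month-over-month changes."""
    return [b - a for a, b in zip(series, series[1:])]


def predictive(series):
    """What will happen? Extend a least-squares straight line by one step."""
    n = len(series)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(series)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (n - x_bar)


def prescriptive(forecast, target, uplift_by_action):
    """How can we make it happen? Return the first action whose assumed
    uplift closes the gap to the target, or None if no action does."""
    for action, uplift in uplift_by_action.items():
        if forecast + uplift >= target:
            return action
    return None


if __name__ == "__main__":
    forecast = predictive(sales)
    # Hypothetical uplift per action, ordered from cheapest to most expensive.
    actions = {"no action": 0, "email campaign": 5, "10% discount": 15}
    print(descriptive(sales))
    print(diagnostic(sales))
    print(forecast)
    print(prescriptive(forecast, 170, actions))
```

Each step builds on the one before it, which is one way to read the rising complexity on the chart: the prescriptive step needs a forecast, and the forecast needs the history.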

Vision Analytics

Recommendation engines

Advertising analysis

Weather forecasting for business planning

Social network analysis

Legal discovery and document archiving

Pricing analysis

Fraud detection

Churn analysis

Equipment monitoring

Location-based tracking and services

Personalized Insurance

Advanced computation based on machine learning and predictive analytics is a core capability needed throughout future business.

Pull-based Batch Loads

Enterprise Data Models

Complex ETL Logic

Poorly Suited to Non-Relational Data

Emergent design is difficult

Much More than Technologies: People, Process

New Roles: 1. Data Engineer 2. Data Scientist

CRISP-DM - Cross Industry Standard Process for Data Mining.

• Framework for guidance
• Process model
• Non-proprietary
• Experience base
• Application/industry neutral
• Tool neutral
• Focus on business issues as well as technical analysis

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

Business Understanding: Determine Business Objectives; Assess Situation; Determine Data Mining Goals; Produce Project Plan

Data Understanding: Collect Initial Data; Describe Data; Explore Data; Verify Data Quality

Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data

Modeling: Select Modeling Technique; Generate Test Design; Build Model; Assess Model

Evaluation: Evaluate Results; Review Process; Determine Next Steps

Deployment: Plan Deployment; Plan Monitoring & Maintenance; Produce Final Report; Review Project
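
One way a team might turn the CRISP-DM phases and tasks into a trackable checklist is sketched below in Python. The names `CRISP_DM`, `next_phase` and `checklist` are hypothetical, and the rule that a failed evaluation returns the project to Business Understanding is a simplification assumed for this sketch.

```python
# CRISP-DM phases mapped to their generic tasks, in the standard phase order.
CRISP_DM = {
    "Business Understanding": [
        "Determine Business Objectives",
        "Assess Situation",
        "Determine Data Mining Goals",
        "Produce Project Plan",
    ],
    "Data Understanding": [
        "Collect Initial Data",
        "Describe Data",
        "Explore Data",
        "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data",
        "Clean Data",
        "Construct Data",
        "Integrate Data",
        "Format Data",
    ],
    "Modeling": [
        "Select Modeling Technique",
        "Generate Test Design",
        "Build Model",
        "Assess Model",
    ],
    "Evaluation": ["Evaluate Results", "Review Process", "Determine Next Steps"],
    "Deployment": [
        "Plan Deployment",
        "Plan Monitoring & Maintenance",
        "Produce Final Report",
        "Review Project",
    ],
}

PHASES = list(CRISP_DM)


def next_phase(current, evaluation_passed=True):
    """Return the phase after `current`, or None after Deployment.

    Simplification (an assumption of this sketch): a failed Evaluation
    sends the project back to Business Understanding.
    """
    if current == "Evaluation" and not evaluation_passed:
        return "Business Understanding"
    index = PHASES.index(current)
    return PHASES[index + 1] if index + 1 < len(PHASES) else None


def checklist():
    """Yield (phase, task) pairs in execution order."""
    for phase, tasks in CRISP_DM.items():
        for task in tasks:
            yield phase, task


if __name__ == "__main__":
    for phase, task in checklist():
        print(f"{phase}: {task}")
```

Keeping the loop-back explicit in `next_phase` makes the iterative nature of the process visible in a sprint plan, where a weak evaluation result reopens the business question instead of pushing a model to deployment.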

Common Issues
• Steep learning curve for data science and data engineering.
• We can’t design insights; we discover them through exploration.
• Low data quality means fewer insights from the data.
• The result is not good enough.

Key Strategies
• Extra dedicated time to learn before project sprints (e.g., MOOCs).
• Add capabilities to explore data, iterate and publish intermediate results.
• Improve data quality based on feedback.
• Build-Measure-Release iteration.
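
As one possible illustration of publishing an intermediate result that feeds data-quality improvement, the Python sketch below reports how complete each field is after an iteration. The records, the field names, the helper names and the 0.9 threshold are all hypothetical and chosen only for this example.

```python
def completeness(records, fields):
    """Return, per field, the share of records with a usable value."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, ""))
        / len(records)
        for field in fields
    }


def fields_to_fix(report, threshold=0.9):
    """List fields whose completeness is below the (hypothetical) threshold."""
    return sorted(f for f, share in report.items() if share < threshold)


# Hypothetical customer records, invented for illustration.
records = [
    {"customer_id": 1, "age": 34, "city": "Jakarta"},
    {"customer_id": 2, "age": None, "city": "Bandung"},
    {"customer_id": 3, "age": 41, "city": ""},
    {"customer_id": 4, "age": None, "city": "Surabaya"},
]

report = completeness(records, ["customer_id", "age", "city"])
print(report)
print(fields_to_fix(report))
```

A report like this can be shared at the end of each iteration so that feedback on low-quality fields reaches the data owners before the next sprint.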