Big Data Agile Analytics by Ken Collier, Director, Agile Analytics, ThoughtWorks

Post on 26-Jan-2015

DESCRIPTION

We are in the midst of an exciting time. There is an explosion of very interesting data, an emergence of powerful new technologies for harnessing it, and devices that enable humans to receive tremendous benefits from it. What is required are innovative processes that enable the creation and delivery of value from all of that data. More often than not, it is the predictive (what will happen?) and prescriptive (how to make it happen!) analytics that produce this value, not the raw data itself.

Agile software teams regularly work on projects that involve rich, complex, and messy data. Often this data represents innovative analytics opportunities. Being analytics-aware gives these teams the opportunity to collaborate with stakeholders to innovate by creating additional value from the data. This session is aimed at making Agile software teams more analytics-aware so that they will recognize these innovation opportunities.

The trouble with conventional analytics (like conventional software development) is that it involves long, phased, sequential steps that take too long and fail to deliver actionable results. This talk will examine the convergence of the following elements of an exciting emerging field called Agile Analytics:

• sophisticated analytics techniques, plus
• lean learning principles, plus
• agile delivery methods, plus
• so-called "big data" technologies

Learn:

• The analytical modeling process and techniques
• How analytical models are deployed using modern technologies
• The complexities of data discovery, harvesting, and preparation
• How to apply agile techniques to shorten the analytics development cycle
• How to apply lean learning principles to develop actionable and valuable analytics
• How to apply continuous delivery techniques to operationalize analytical models

TRANSCRIPT

BIG DATA AGILE ANALYTICS Ken Collier, PhD Director, Agile Analytics @theagilist #thoughtworks

[Chart with axes labeled Value and Complexity, showing four levels of analytics:]

• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?

[The chart repeats with two added labels: Traditional Business Intelligence; Advanced Analytics]

Agile Analytics

[Diagram: Agile Analytics and its elements]

• Advanced Analytics
• Big Data
• Solutions Thinking
• Ethics
• Agile Delivery
• Lean Learning
• Impact

Volume

Velocity

Variety

NoSQL

Complexity

Polyglot Persistence

Big Data Analytics Pipeline

[Flow diagram; the arrows between nodes are not recoverable. Node labels: Operational Data, External Data, Data Integration, Clean Data, Dimension Mapping, Dimensional Data, Reporting Engine, Report (three times), Modeling Data, Data Sampling, Feature Selection, Data Partitioning, Training Data, Test Data, Analytical Modeling, Candidate Model, Model Validation, Accepted Model]
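The modeling half of the pipeline (Data Partitioning, Training Data, Test Data, Analytical Modeling, Candidate Model, Model Validation, Accepted Model) could be sketched roughly as below. This is only an illustration under stated assumptions: the data is synthetic, the single-feature threshold model is a toy stand-in for a real technique, and the 0.9 acceptance bar and all function names are hypothetical, not taken from the talk.

```python
import random
from statistics import mean

def partition(rows, test_fraction=0.25, seed=0):
    """Data Partitioning: shuffle a copy, then split into training and test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_threshold_model(training):
    """Analytical Modeling: build a toy candidate model.

    Predicts label 1 when the feature exceeds the midpoint between the two
    class means (assumes class 1 tends to have the larger feature values).
    """
    mean_0 = mean(x for x, y in training if y == 0)
    mean_1 = mean(x for x, y in training if y == 1)
    threshold = (mean_0 + mean_1) / 2
    return lambda x: 1 if x > threshold else 0

def validate(model, test):
    """Model Validation: accuracy on the held-out test data."""
    correct = sum(1 for x, y in test if model(x) == y)
    return correct / len(test)

# Hypothetical modeling data: (feature, label) pairs drawn from two clusters.
rng = random.Random(42)
modeling_data = [(rng.gauss(2.0, 0.5), 0) for _ in range(100)] + \
                [(rng.gauss(6.0, 0.5), 1) for _ in range(100)]

training_data, test_data = partition(modeling_data)
candidate_model = fit_threshold_model(training_data)
accuracy = validate(candidate_model, test_data)
accepted = accuracy >= 0.9  # the acceptance bar is an arbitrary assumption
print(f"accuracy={accuracy:.2f} accepted={accepted}")
```

Keeping the test data out of the fitting step is the point of the partitioning node: the candidate model is judged only on rows it has not seen.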

How Advanced Analytics Works

[Diagram labels: Discover & Explore; Analyze & Act; Data Convergence; Analytical Divergence]

Steps: Discover, Harvest, Filter, Integrate, Augment, Analyze, Act

Analytical Opportunities: If we knew X, we could do Y

Typical Timeline: 3-6 months, 2 months, 2-4 months

Traditional Analytics

[The Data Convergence / Analytical Divergence diagram repeats under this title.]

Continuous Integration

Collaboration

Evolve

Continuous Delivery

Hypothesis

Build

Learn

Measure

Agility in Analytics

[The Data Convergence / Analytical Divergence diagram repeats with a BUILD, MEASURE, LEARN loop.]

Analytical Opportunities: If we knew X, we could do Y

Repeat this cycle, solving small problems every few days.

Like this example…

High value business goal: Retain high value customers

What’s the smallest, simplest thing we can do?

• Common features of defectors?

Is it useful & actionable?

Repeat!

• Shopping behaviors of defectors?
• What leads to customers leaving?
• What do defectors say about us?
• Customers’ sentiment before defecting?
• What encourages customers to stay?
• Do incentives reduce defection rates?

Problem solved or continue?
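As one possible reading of the first small question above ("Common features of defectors?"), the sketch below compares feature averages between customers who defected and customers who stayed. The records, field names, and the `feature_gaps` helper are all hypothetical and exist only for illustration; a real first iteration would use whatever customer data is already at hand.

```python
from statistics import mean

# Hypothetical customer records; every field name and value is made up.
customers = [
    {"orders_per_month": 4.0, "support_tickets": 0, "defected": False},
    {"orders_per_month": 3.5, "support_tickets": 1, "defected": False},
    {"orders_per_month": 5.0, "support_tickets": 0, "defected": False},
    {"orders_per_month": 1.0, "support_tickets": 3, "defected": True},
    {"orders_per_month": 1.5, "support_tickets": 4, "defected": True},
    {"orders_per_month": 0.5, "support_tickets": 2, "defected": True},
]

def feature_gaps(records, features, flag="defected"):
    """Return {feature: defector mean minus retained mean}."""
    defectors = [r for r in records if r[flag]]
    retained = [r for r in records if not r[flag]]
    return {
        f: mean(r[f] for r in defectors) - mean(r[f] for r in retained)
        for f in features
    }

gaps = feature_gaps(customers, ["orders_per_month", "support_tickets"])

# Largest absolute gap first: a first, rough hint at what defectors share.
for name, gap in sorted(gaps.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {gap:+.2f}")
```

A result like this is small enough to show stakeholders within days, which is where the "Is it useful & actionable?" check comes in before the next question is picked up.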

Data Science

Machine Learning

Statistics

THE “DATA SCIENTIST”

Machine Learning

• Artificial Neural Networks
• Decision Tree Learning
• Support Vector Machines
• Clustering
• …and many more…

Statistical Modeling

• Bayesian Classification
• Monte Carlo Simulation
• Logistic Regression
• K-Nearest Neighbor
• …and many more…

Domain Knowledge

• Data Semantics
• Business Understanding
• Business Communication

Programming Skills

• Functional Programming
• Data “Wrangling”
• Map/Reduce, SQL, & NoSQL
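To make one of the listed techniques concrete, here is a minimal sketch of K-Nearest Neighbor classification using only the Python standard library. The points, labels, and the `knn_predict` name are hypothetical; the slide names the technique but does not show an implementation.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points: ((feature_1, feature_2), label).
training = [
    ((1.0, 1.0), "stays"), ((1.2, 0.8), "stays"), ((0.9, 1.1), "stays"),
    ((4.0, 4.2), "defects"), ((4.1, 3.9), "defects"), ((3.8, 4.0), "defects"),
]

print(knn_predict(training, (1.1, 1.0)))
print(knn_predict(training, (4.0, 4.0)))
```

Sorting the whole training set is fine at this scale; larger data sets would normally call for an indexed search or an established library.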

Visual Storytelling

drones.pitchinteractive.com

Data Visualization

Data Reduction

Objective Truth

Discoverable Truth

Uninterpretable

Irrelevant Noise

Not Actionable

Impactful New Insights

“Little Data”

Data Reduction

Insight

Knowledge

Action

Disruption

Business vs. IT

Focus vs. Platform

Monitor & Measure

Privacy Controls

Radical Transparency

Data Democracy

Open Data

Ken Collier, Director, Agile Analytics kcollier@thoughtworks.com

Cool New Technologies +

Sophisticated Analytics +

Lean Learning Principles +

Fast Agile Delivery =

Value Creation
