Big Data Agile Analytics by Ken Collier, Director of Agile Analytics, ThoughtWorks
DESCRIPTION
We are in the midst of an exciting time. There is an explosion of very interesting data, an emergence of powerful new technologies for harnessing that data, and a growing range of devices that enable people to receive tremendous benefits from it. What is required are innovative processes that enable the creation and delivery of value from all of that data. More often than not, it is predictive (what will happen?) and prescriptive (how to make it happen!) analytics that produce this value, not the raw data itself.

Agile software teams are continually engaged in projects that involve rich, complex, and messy data. Often this data represents innovative analytics opportunities. Being analytics-aware gives these teams the opportunity to collaborate with stakeholders and innovate by creating additional value from the data. This session is aimed at making Agile software teams more analytics-aware so that they will recognize these innovation opportunities.

The trouble with conventional analytics (like conventional software development) is that it relies on phased, sequential steps that take too long and fail to deliver actionable results. This talk will examine the convergence of the following elements of an exciting emerging field called Agile Analytics:
•sophisticated analytics techniques, plus
•lean learning principles, plus
•agile delivery methods, plus
•so-called "big data" technologies

Learn:
•The analytical modeling process and techniques
•How analytical models are deployed using modern technologies
•The complexities of data discovery, harvesting, and preparation
•How to apply agile techniques to shorten the analytics development cycle
•How to apply lean learning principles to develop actionable and valuable analytics
•How to apply continuous delivery techniques to operationalize analytical models

TRANSCRIPT
BIG DATA AGILE ANALYTICS Ken Collier, PhD Director, Agile Analytics @theagilist #thoughtworks
Value
Complexity
What happened?
Descriptive Analytics
Why did it happen?
Diagnostic Analytics
What will happen?
Predictive Analytics
How can we make it happen?
Prescriptive Analytics
Traditional Business Intelligence
Advanced Analytics
Agile Analytics
Big Data
Solutions Thinking
Ethics
Agile Delivery
Lean Learning
Impact
Advanced Analytics
Volume
Velocity
Variety
NoSQL Complexity
Polyglot Persistence
Big Data Analytics Pipeline
Modeling Data
Operational Data
External Data
Data Integration
Reporting Engine
Dimension Mapping
Clean Data
Report Report Report
Dimensional Data
Data Sampling
Feature Selection
Data Partitioning
Test Data
Training Data
Analytical Modeling
Candidate Model
Model Validation
Accepted Model
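The modeling branch of the pipeline above (Data Sampling, Feature Selection, Data Partitioning, Analytical Modeling, Model Validation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function names, the single-feature threshold model, the 0.8 acceptance bar, and the synthetic records are all hypothetical, and feature selection is reduced to naming one feature by hand.

```python
import random

def sample(records, fraction, rng):
    """Data Sampling: draw a random subset of the clean data."""
    k = max(1, int(len(records) * fraction))
    return rng.sample(records, k)

def partition(records, train_fraction, rng):
    """Data Partitioning: shuffle, then split into training and test data."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_stump(training, feature):
    """Analytical Modeling: choose the threshold on one feature that
    labels the most training rows correctly (predict True when >= threshold)."""
    best_threshold, best_correct = None, -1
    for candidate in sorted({row[feature] for row in training}):
        correct = sum((row[feature] >= candidate) == row["label"] for row in training)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def accuracy(threshold, feature, rows):
    """Model Validation: share of rows the candidate model labels correctly."""
    hits = sum((row[feature] >= threshold) == row["label"] for row in rows)
    return hits / len(rows)

def run_pipeline(clean_data, feature, min_accuracy=0.8, seed=0):
    """Run sampling, partitioning, modeling, and validation in sequence."""
    rng = random.Random(seed)
    modeling_data = sample(clean_data, 0.5, rng)
    training, test = partition(modeling_data, 0.7, rng)
    candidate = fit_stump(training, feature)       # candidate model
    score = accuracy(candidate, feature, test)     # validated on held-out data
    return {"threshold": candidate,
            "test_accuracy": score,
            "accepted": score >= min_accuracy}     # accepted model or not

def make_synthetic_data(n=200, seed=1):
    """Hypothetical data for illustration only: label is True when x >= 5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append({"x": x, "label": x >= 5.0})
    return rows

if __name__ == "__main__":
    print(run_pipeline(make_synthetic_data(), feature="x"))
```

The point of the sketch is the separation of training and test data: the candidate model is fitted on one partition and only becomes an accepted model if it holds up on data it has not seen.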
How Advanced Analytics Works
If we knew X, we could do Y
Discover & Explore
Analyze & Act
Data Convergence
Analytical Divergence
Discover
Harvest
Filter
Integrate
Augment
Analyze
Act
Analytical Opportunities
Typical Timeline
3-6 months, 2 months, 2-4 months
Traditional Analytics
If we knew X, we could do Y
Continuous Integration
Collaboration
Evolve
Continuous Delivery
Hypothesis
Build
Measure
Learn
Analytical Divergence
Analytical Opportunities
If we knew X, we could do Y
Data Convergence
Discover
Harvest
Filter
Integrate
Augment
Analyze
Act
Repeat this cycle solving small problems every few days
LEARN
MEASURE
BUILD
Agility in Analytics
Retain high value customers
High value business goal
Like this example…
What’s the smallest, simplest thing we can do?
Common features of defectors?
Is it useful & actionable?
Repeat!
Problem solved or continue?
What leads to customers leaving?
Common features of defectors?
Shopping behaviors of defectors?
What do defectors say about us?
Customers’ sentiment before defecting?
What encourages customers to stay?
Do incentives reduce defection rates?
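One of the small questions above, "Do incentives reduce defection rates?", is the kind of thing a team could answer in a single short iteration. The sketch below shows one possible first pass in Python: compare the defection rate of customers who received an incentive with the rate of those who did not. The customer records, the field names `got_incentive` and `defected`, and the helper `defection_rates` are all invented for illustration and are not taken from the talk.

```python
from collections import defaultdict

def defection_rates(customers, group_key):
    """Return the share of defectors within each value of group_key."""
    totals = defaultdict(int)
    defected = defaultdict(int)
    for customer in customers:
        group = customer[group_key]
        totals[group] += 1
        if customer["defected"]:
            defected[group] += 1
    return {group: defected[group] / totals[group] for group in totals}

# Hypothetical records, made up purely to exercise the function.
customers = [
    {"got_incentive": True,  "defected": False},
    {"got_incentive": True,  "defected": False},
    {"got_incentive": True,  "defected": True},
    {"got_incentive": True,  "defected": False},
    {"got_incentive": False, "defected": True},
    {"got_incentive": False, "defected": True},
    {"got_incentive": False, "defected": False},
    {"got_incentive": False, "defected": True},
]

rates = defection_rates(customers, "got_incentive")
print(rates)  # {True: 0.25, False: 0.75}
```

A gap between the two rates is a measurement, not proof that the incentive caused it. It is enough, though, to decide whether the question is useful and actionable and whether the next iteration should dig deeper, which is the decision the slides ask for after each cycle.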
Data Science
Machine Learning
Statistics
THE “DATA SCIENTIST”
Machine Learning
•Artificial Neural Networks
•Decision Tree Learning
•Support Vector Machines
•Clustering
•…and many more…
Statistical Modeling
•Bayesian Classification
•Monte Carlo Simulation
•Logistic Regression
•K-Nearest Neighbor
•…and many more…
Domain Knowledge
•Data Semantics
•Business Understanding
•Business Communication
Programming Skills
•Functional Programming
•Data “Wrangling”
•Map/Reduce, SQL, & NoSQL
Visual Storytelling
drones.pitchinteractive.com
Data Visualization
Data Reduction
Objective Truth
Discoverable Truth
Uninterpretable
Irrelevant Noise
Not Actionable
Impactful New Insights
“Little Data”
Insight
Knowledge
Action
Disruption
Business vs. IT
Focus vs. Platform
Monitor & Measure
Privacy Controls
Radical Transparency
Data Democracy
Open Data
Ken Collier, Director, Agile Analytics [email protected]
Cool New Technologies +
Sophisticated Analytics +
Lean Learning Principles +
Fast Agile Delivery =
Value Creation