From concept to adoption
The maze of organizational readiness for Big Data solutions
Or
Switching gears in a maturing organization
Intro
4
2017
The Workshop
The Story
Accepting payments for virtual goods is a risky business.
Ecommerce fraud 1.0
Countermeasures
§ Limit the number of attempts
§ Vet the source of transactions
§ Manual review of transactions
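The first countermeasure, limiting the number of attempts, is typically a per-source sliding-window counter. Below is a minimal sketch, assuming a hypothetical `AttemptLimiter` class keyed by an arbitrary source identifier (card, account or IP address). The limit of 3 attempts per 60 seconds is illustrative only and does not come from the talk.

```python
from collections import defaultdict, deque


class AttemptLimiter:
    """Hypothetical sliding-window limiter: at most `max_attempts` per `window` seconds per source."""

    def __init__(self, max_attempts: int, window: float) -> None:
        self.max_attempts = max_attempts
        self.window = window
        # source id -> timestamps of recently accepted attempts, oldest first
        self._attempts: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, source: str, now: float) -> bool:
        recent = self._attempts[source]
        # Drop attempts that have fallen out of the window
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False  # over the limit: reject (or route to manual review)
        recent.append(now)
        return True


limiter = AttemptLimiter(max_attempts=3, window=60.0)
results = [limiter.allow("card-123", t) for t in (0, 10, 20, 30, 61)]
```

Here the fourth attempt at t=30 is rejected, and by t=61 the first attempt has left the window, so the source is allowed again. Rejected attempts are not recorded, which is one design choice among several; counting them too would penalize persistent retries more harshly.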
Ecommerce fraud 2.0
Countermeasures
§ Semantic associations
§ Risk modeling
§ Behavioral modeling
Friendly fraud vs. Identity theft vs. Financial fraud
A prediction problem
Given a set of features describing a user, how likely are things to turn sour?
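Framed this way, the question maps onto a probabilistic classifier that turns user features into a risk score. Below is a minimal sketch, assuming a logistic model with hand-picked, purely illustrative weights. The feature names (`account_age_days`, `distinct_cards_7d`, `chargebacks`) are hypothetical and not from the talk; in practice the weights would be learned from labelled history.

```python
import math

# Hypothetical weights for illustration only; a real model would be fitted on labelled data.
WEIGHTS = {
    "account_age_days": -0.01,  # older accounts look less risky
    "distinct_cards_7d": 0.8,   # many cards used in a week looks riskier
    "chargebacks": 1.5,         # past chargebacks strongly raise risk
}
BIAS = -3.0


def risk_score(features: dict[str, float]) -> float:
    """Return P(fraud | features) under a logistic model: sigmoid(bias + w . x)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


new_user = {"account_age_days": 2, "distinct_cards_7d": 4, "chargebacks": 1}
loyal_user = {"account_age_days": 900, "distinct_cards_7d": 1, "chargebacks": 0}
```

The new user scores well above 0.5 while the long-standing user scores close to zero. A threshold on this score could decide which transactions go to manual review, although where to set it is a business decision the sketch does not address.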
Building a Vision
The art and the science
The Problem
A long feedback loop between
§ Data entering our systems
§ People gaining insights about its meaning
§ Those insights creating an impact
[Diagram: feedback loop between Data, Users and Applications driving Data Driven Decisions (DDD); cycle time: months]
Start with Why
§ Why is the problem relevant?
§ Why now?
[Diagram: the same Data, Users and Applications loop driving Data Driven Decisions (DDD); cycle time: real-time]
The Context
Are we using the right tools?
The Context
Death by a thousand cuts
Big Data
More than a technology enabler
Used to define
• A new approach to decision making
• A different operating model
• Changes in roles and responsibilities
Something that captures people's imagination
We love buzzwords
Anything missing?
Reality Check #1
Technology selection
Landscape
Technology selection is getting increasingly complex
§ Vendors push for vertical platforms
§ We love to build frameworks
All data products can be bent to a certain extent
§ Native graph, non-native document
§ Native columnar, non-native time-series
1. Big Picture
Used as a collection of guiding principles and patterns
It highlights
§ What capabilities are needed
§ When decisions need to be made
Other considerations
§ Existing skills
§ Competency availability
§ Learning curve
[Diagram: big-picture data architecture with Data Ingestion, Events store (events, raw data), Stream Processing, Batch Processing (MR / Hive / Spark / R), Aggregate / Specialized databases (aggregates), OLTP, RTDW, BI / OLAP, ML, Dashboard, Applications and User]
2. Complexity
Traditionally regarded as size.
The reality is there's more to it
§ The 5Vs describe inherent complexity
§ Extrinsic complexity needs to be factored in
[Diagram: inherent complexity (the 5Vs: Volume, Velocity, Variety, Veracity, Value) alongside extrinsic factors such as Availability and Confidentiality]
3. Experiment
§ Operational and production readiness
§ CAP theorem in practice
§ The devil is in the details
The product brochure does not give the full picture
§ What is @Aphyr (Jepsen) saying?
§ Do you really need it?
Reality Check #2
Users
Traditional BI
The role of BI is traditionally biased towards reacting.
§ Reports
§ KPIs
§ Alerts
Heavy reliance on
§ Few, coarse-grained aggregates
§ SQL
§ Excel!
[Diagram: BI team skill set across Business, Data visualization, Programming and Statistics]
Data Science
§ Not just about upskilling
§ Focus on building actionable insights
§ Find champions who can help spread the word
§ Learn the craft
[Diagram: Data Science team skill set across Business, Data visualization, Programming, Statistics and Big data]
Reality Check #3
Maturity vs Innovation
Maturity
Process, organizational structure and engineering practices can hinder innovation
Innovation-led projects are hard to manage once an organization has moved into a later phase
So, ultimately…
[Figure: rate of product innovation and process innovation over time across the fluid, transitional and specific phases. Source: (Utterback 1994)]
Anything missing?
Focus
Maintaining momentum and engagement
The road to MVP
Focus on a Minimum Viable Solution
§ Focus on outcome, not output
§ Deliver value incrementally
§ Measure early
§ Experiment with real data
Build a start-up team to focus on core benefits
§ Cut through bureaucracy
§ Avoid biggerism
[Diagram: product layers with Core benefits at the centre, surrounded by Tangible specification and then Augmented features]
Innovation happens at the centre
MVS
Core benefits
§ Taxonomy of associations between players and other data sources
§ Device protection against account takeovers
§ Fraud ring identification
§ Bonus abuse prevention
[Diagram: a relationship graph linking players through Device fingerprint, Location, ID check, Physical address, Phone number, Credit cards, Date of birth, Password and Risk score, fed by the Event store; a Classifier separates Friendly fraud, Financial fraud and Identity theft]
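Fraud ring identification on a relationship graph can be approximated by grouping players who are connected, directly or transitively, through a shared identifying attribute such as a device fingerprint, credit card or phone number. Below is a minimal sketch using union-find to compute those connected components. The player names and attribute values are made up, and treating any single shared attribute as a link is a simplifying assumption, not the approach described in the talk.

```python
from collections import defaultdict


def find(parent: dict[str, str], x: str) -> str:
    # Follow parent pointers to the root, halving the path as we go
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def fraud_rings(players: dict[str, set[str]], min_size: int = 2) -> list[set[str]]:
    """Group players connected through any shared attribute; keep groups of at least min_size."""
    parent = {p: p for p in players}
    first_owner: dict[str, str] = {}  # attribute value -> first player seen with it
    for player, attributes in players.items():
        for attr in attributes:
            if attr in first_owner:
                # Merge this player's component with the earlier owner's component
                root_a = find(parent, player)
                root_b = find(parent, first_owner[attr])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_owner[attr] = player
    groups: dict[str, set[str]] = defaultdict(set)
    for player in players:
        groups[find(parent, player)].add(player)
    return [g for g in groups.values() if len(g) >= min_size]


players = {
    "alice": {"device:abc", "card:1111"},
    "bob": {"device:abc", "phone:555-0100"},
    "carol": {"phone:555-0100"},
    "dave": {"card:9999"},
}
rings = fraud_rings(players)
```

Alice and Bob share a device and Bob and Carol share a phone, so all three end up in one ring even though Alice and Carol share nothing directly; Dave stays on his own. A production system would likely weight attribute types differently, since a shared public IP is far weaker evidence than a shared card.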
Outcomes
§ Prove something, then engineer it
§ ML can be done in Excel
Choosing not to adopt something is as important as adopting it
§ Reduces clutter
§ Improves focus
Again, focus on the core benefits
[Diagram: the big-picture data architecture shown again, with Data Ingestion, Events store (events, raw data), Stream Processing, Batch Processing (MR / Hive / Spark / R), Aggregate / Specialized databases (aggregates), OLTP, RTDW, BI / OLAP, ML, Dashboard, Applications and User]
End of the story
Lessons Learned
§ Understanding: fact checking
§ Projecting: a viable vision
§ Executing: honest feedback
At all steps, isolate the blast radius
Understand
§ The problem
§ The context
Project
§ Start with why
§ Build a vision
§ Experiment
Execute
§ Focus
§ Measure early
§ Learn & Adapt
Questions?