Big Data Analytics
Why Enterprises are struggling and How Startups can step in
Manish Choudhary Sunil Guttula
Susheel Kaushik Raghu Mendu
“Big Data Is Less About Size, And More About Freedom”
―TechCrunch
“Findings: ‘Big Data’ Is More Extreme Than Volume”
― Gartner
“Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World”
―IDC
“Total data: ‘bigger’ than big data”
― 451 Group
Stop Talking, Let’s Get Started
Digital Shadow – it’s growing continuously
Storage Footprint is also Increasing
"The World’s Technological Capacity to Store, Communicate, and Compute Information", Martin Hilbert and Priscila López (2011), Science
[Chart: "Storage in Exabytes" by year; log-scale axis from 1 to 1,000,000, labeled Petabytes]
1986: 281
1993: 471
2000: 2,200
2007: 67,000
2012: 667,000
Growth annotations: 2x in 10 yrs, 10x in 5 yrs, 10x in 5 yrs
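The growth annotations on the chart ("10x in 5 yrs", "2x in 10 yrs") can be turned into equivalent yearly growth rates. The sketch below is only an illustration of that arithmetic; the helper name `annual_growth_rate` is hypothetical and does not come from the slides.

```python
def annual_growth_rate(multiple: float, years: float) -> float:
    """Return the constant yearly growth rate that compounds to `multiple` after `years`."""
    return multiple ** (1.0 / years) - 1.0


# The two growth annotations shown on the chart.
ten_x_in_five = annual_growth_rate(10, 5)
two_x_in_ten = annual_growth_rate(2, 10)

print(f"10x in 5 yrs is about {ten_x_in_five:.1%} per year")   # about 58.5% per year
print(f"2x in 10 yrs is about {two_x_in_ten:.1%} per year")    # about 7.2% per year
```

Read this way, the jump from "2x in 10 yrs" to "10x in 5 yrs" means yearly growth went from single digits to well over 50 percent.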
What is Big Data Analytics
• Big Data is so large and complex that it becomes difficult to process using existing on-hand database management tools
– Walmart: over 1 million transactions an hour
• 2.5 petabytes = 167 times the information contained in all books
• Big Data Analytics is the application of advanced analytic techniques to very large, diverse and varied data sets
• Volume: the amount of data is ever increasing
• Velocity: data creation speed and insights latency
• Variety: type of data, structured & unstructured
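To make the Velocity point concrete, the Walmart figure quoted on this slide (over 1 million transactions an hour) can be converted into per-second and per-day rates. This is a rough back-of-the-envelope sketch; the constant and variable names are illustrative, and the real rate is not spread evenly over the day.

```python
# Figure quoted on the slide: over 1 million transactions an hour.
TRANSACTIONS_PER_HOUR = 1_000_000

# Average sustained rate, assuming (unrealistically) a uniform load.
per_second = TRANSACTIONS_PER_HOUR / 3600
per_day = TRANSACTIONS_PER_HOUR * 24

print(f"about {per_second:.0f} transactions per second")
print(f"about {per_day:,} transactions per day")
```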
USE CASE
Predict Buyer Behavior to Increase Revenue
Big Data Analytics Enables Increased Per-Customer-Profit
[Chart: Customer Profit (LOW to HIGH) against data leveraged (TRADITIONAL DATA LEVERAGED to BIG DATA LEVERAGED)]
• Legacy System: Agent “Best Guess”
• BI Reporting: Branch Level Reporting Enabling Profit-based Recommendations
• In-Database Analytics: Market Basket Analysis & Customer Lifetime Value Computations Enabling User-based Recommendations
• Big Data Analytics: Data Enriched with Unstructured Activity Logs To Identify At Risk Customers
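The slide names Market Basket Analysis without showing what it computes. As a minimal sketch, the snippet below derives the three usual association-rule measures (support, confidence, lift) for one item pair. The baskets are invented for illustration, and the function name `rule_metrics` is hypothetical; a production system would run this kind of computation inside the database over far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

# How many baskets contain each item, and each (alphabetically ordered) pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def rule_metrics(antecedent: str, consequent: str) -> tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    support = both / n                               # share of baskets with both items
    confidence = both / item_counts[antecedent]      # P(consequent | antecedent)
    lift = confidence / (item_counts[consequent] / n)  # confidence relative to baseline
    return support, confidence, lift


support, confidence, lift = rule_metrics("diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than chance would predict, which is the signal a recommendation would build on.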
USE CASE
Reduce Risk With External Data
Big Data Analytics Enables Global Crisis Avoidance
[Chart: Underwriting Risk (LOW to HIGH) against data leveraged (TRADITIONAL DATA LEVERAGED to BIG DATA LEVERAGED)]
• Legacy System: Monthly Risk Model Updates
• BI Reporting: Daily Risk Model Updates
• In-Database Analytics: K-Means Clustering & Decision Tree Scoring Improves Accuracy, Delivering In Minutes What Was Days
• Big Data Analytics: Unstructured Data Sources Enrich The Data
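For readers unfamiliar with the K-Means Clustering mentioned on this slide, here is a minimal pure-Python sketch of the standard iterative procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). The policy data, the two features, and the starting centroids are all hypothetical, and the sketch covers only the clustering half, not decision tree scoring.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical policies as (claims per year, loss ratio); invented for illustration.
points = [
    (1.0, 0.20), (1.2, 0.30), (0.8, 0.25),   # looks like a lower-risk group
    (5.0, 0.90), (5.5, 0.80), (4.8, 1.00),   # looks like a higher-risk group
]

# Starting centroids are picked by hand here so the run is deterministic.
final_centroids, final_clusters = kmeans(points, [points[0], points[3]])
for centroid, cluster in zip(final_centroids, final_clusters):
    print(f"centroid={centroid} size={len(cluster)}")
```

In practice the starting centroids are usually chosen at random and the features are scaled first; both are left out to keep the sketch short.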
Staying Competitive
• Earned Run Average (ERA) – Used to define pitchers’ “productivity” for almost 100 years in baseball
– Statistic that measures how good pitchers are at preventing runs
• Skill-Interactive Earned Run Average (SIERA)
SIERA = 6.145
  - 16.986*(SO/PA)
  + 11.434*(BB/PA)
  - 1.858*((GB-FB-PU)/PA)
  + 7.653*((SO/PA)^2)
  +/- 6.664*(((GB-FB-PU)/PA)^2)
  + 0.130*(SO/PA)*((GB-FB-PU)/PA)
  - 5.195*(BB/PA)*((GB-FB-PU)/PA)
Source: New York Times
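The SIERA formula above translates directly into code. The sketch below copies the coefficients exactly as printed on the slide. The slide does not say how to resolve the "+/-" term; this sketch assumes the sign is negative when (GB-FB-PU)/PA is zero or positive and positive otherwise, which is an assumption and should be checked against the published definition. The function name `siera` and the sample stat line are hypothetical.

```python
def siera(so: int, bb: int, gb: int, fb: int, pu: int, pa: int) -> float:
    """Evaluate the SIERA formula using the coefficients as printed on the slide.

    so: strikeouts, bb: walks, gb: ground balls, fb: fly balls,
    pu: pop-ups, pa: plate appearances.
    """
    so_rate = so / pa
    bb_rate = bb / pa
    net_gb_rate = (gb - fb - pu) / pa

    # ASSUMPTION: the slide only shows "+/-" for the squared ground-ball term.
    # Here the term is subtracted for a non-negative net ground-ball rate
    # and added for a negative one.
    sign = -1.0 if net_gb_rate >= 0 else 1.0

    return (
        6.145
        - 16.986 * so_rate
        + 11.434 * bb_rate
        - 1.858 * net_gb_rate
        + 7.653 * so_rate ** 2
        + sign * 6.664 * net_gb_rate ** 2
        + 0.130 * so_rate * net_gb_rate
        - 5.195 * bb_rate * net_gb_rate
    )


# Hypothetical stat line, invented for illustration only.
example = siera(so=20, bb=8, gb=45, fb=30, pu=5, pa=100)
print(f"SIERA for the sample line: {example:.3f}")
```

As with ERA, a lower value indicates a better pitcher; the difference is that SIERA is built from strikeout, walk, and batted-ball rates rather than from runs allowed.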
Data Manipulation, Analytics and Visualization
[Diagram: analytics platform stack and the data science team that uses it]
• Analytics Productivity Layer
• Massively Parallel Processing: SQL, NoSQL
• Cloud, x86 Infrastructure, or Appliance
• DATA SCIENCE TEAM: Data Scientist, Data Engineer, Data Analyst, BI Analyst, LOB User, Data Platform Admin
Data Scientist
• Extract meaning from data and create data products
• Leveraging: Math, Statistics, Machine Learning, Pattern Recognition, Advanced Computing, Uncertainty Modeling, Visualization…
Data Scientist Is The 'Sexiest Job Of The 21st Century' - Harvard Business Review
Mobile
Data Visualization
Big Data Analytics Reference Architecture
[Diagram: pipeline stages Ingestion, Assimilation, Store and Access, Analysis, Visualization]
• Data Sources
– Structured Data Sources: POS, CRM, ERP, LOB Data
– Web/Social, Multimedia, Machine, Mobile, Documents
• Traditional Data Integration: ETL, MDM, Data Quality
• Traditional Data Warehousing: Federated Data Warehouse, Enterprise Data Warehouse, Data Marts (BU 1, BU 2, BU 3), BI as a Service, SQL Stores
• Big Data Analytics Ramifications
• Big Data Analytics Platforms (MPP Databases, NoSQL Stores): EMC Greenplum, Apache Hadoop, HP Vertica, IBM Netezza, Oracle ExaData, Teradata
• Analysis: Statistics, Data Mining, Operations Research, Neural Networks, OLAP, Genetic Algorithms
• Traditional Analysis: Alerts, Dashboard, Spreadsheet, Report
• Automation
Early Adopters of Big Data Analytics
Online Use Case
Online Companies
• Media: Ad Optimization, Article Categorization
• Retailers: Product Recommendation
Telco
• Churn Prediction
Banking
• Product Recommendation
Marketing
• Sentiment Analysis
Why they worked
• Everything at scale
• Actions were automated
• New business process
Data Scientists
• Built complex optimization and recommendation models
Enterprise Adoption of Big Data Analytics
[Diagram: Technology Adoption Process curve: Innovators, Early Adopters, "The Chasm", Early Majority, Late Majority, Laggards]
Labels on the curve:
• Data Scientists working with Business Users
• Business Users collaborating with Data Scientists
• Business Users
• Online Enterprises
Great Panel
• Manish Choudhary, MD at Pitney Bowes (Consumer)
• Sunil Guttula, CEO, Bizosys Technologies (Producer)
• Raghu Mendu, Co-Founder, Ventureast (Investor)
• Susheel Kaushik, Senior Director Product Management, EMC (All Rounder)
Backup Slides