Big Data Science at the Digital Catapult
DESCRIPTION
Talk on Big Data and the need for it in the digital economy. This talk is centred around the Digital Catapult's challenge areas.
TRANSCRIPT
BIG DATA SCIENCE
“The price of light is far less than the cost of darkness”
Chandan Rajah [ @ChandanRajah ]
BENEFITS OF BIG DATA
COST SPEED
AGILITY CAPABILITY
Steps to the EPIPHANY
WHAT WHY WHERE DEMO
What is Big Data ?
Big Data ≠ Data Volume
Big Data = Crude Oil
Think of data like ‘Crude Oil’
Big Data is about extracting ‘crude oil’; transporting it in ‘pipelines’; storing it in ‘mega tanks’
What is Data Science ?
Data Science ≠ Statistical Analysis
Data Science = Oil Refinery
Data science is about ‘treating’ data; applying ‘science’ to the data; refining the ‘results’; and combining them to form ‘insight’
Knowns, Unknowns & DIKUW FTW!
known knowns: we know we know
known unknowns: we know we don’t know
unknown unknowns: we don’t know we don’t know
DATA | INFORMATION | KNOWLEDGE | UNDERSTANDING | WISDOM
raw | what | how to | why | when
numbers | description | experience | cause & effect | prediction
letters | context | tested | proven | what’s best
symbols | relationship | instruction
signals | reports | programs | models
PAST FUTURE
Data Engineer Data Analyst Data Miner Data Scientist
known knowns | known unknowns | unknown unknowns
Data Analytics to Data Discovery ?
data you know
data you don’t know
questions you’re asking
questions you’re not asking
Data Analyst
Data Scientist
Data Analytics
Data Discovery
DATA MODELLING: Y = f(X, random noise, parameters)
ALGORITHMIC MODELLING: Y ← [ BLACK BOX ] ← X
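The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the speaker's demo: the toy data, the straight-line form of f and the nearest-neighbour "black box" are all assumptions chosen to keep the example short.

```python
# Toy data, made up for illustration: y is roughly 2*x + 1 plus noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

def fit_line(xs, ys):
    """Data modelling: assume Y = a*X + b + noise and estimate the parameters a, b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def nearest_neighbour(xs, ys, x_new):
    """Algorithmic modelling: no assumed formula; predict from the closest known example."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[closest]

a, b = fit_line(xs, ys)
line_prediction = a * 2.4 + b                      # uses the fitted parameters
box_prediction = nearest_neighbour(xs, ys, 2.4)    # uses only the stored examples
```

The first function yields parameters that can be inspected and explained; the second yields predictions only, which is the trade-off the two slides point at.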
DIVIDE
SCATTER
Split Data in Blocks
Replicate and Store
Petabytes of Resilience
CONQUER
EXPLORE
1000s of Parallel Threads
Explore Every Path
Machine Learning
INSIGHT
GATHER
Real Time Action
Periodic Dashboards
Iterative Evolution
What is the Big Idea ?
Divide = HDFS
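WRITE
Client: 1. Create Metadata (Name Node); 2. Put Blocks (Data Nodes)
Name Node: Control / Monitoring of Data Nodes
[Diagram: blocks 1 to 3, each replicated across the Data Nodes]
READ
Client: 1. Get Metadata (Name Node); 2. Fetch Blocks (Data Nodes)
Name Node: Control / Monitoring of Data Nodes
[Diagram: blocks 1 to 4, each replicated across the Data Nodes]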
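The write and read paths above can be mimicked with a small in-memory toy. This is a sketch under stated assumptions, not the HDFS API: the `ToyCluster` class, the 4-byte block size, the replication factor of 3 and the round-robin placement are all hypothetical choices made for illustration.

```python
BLOCK_SIZE = 4     # bytes per block; tiny on purpose (assumption for the demo)
REPLICATION = 3    # copies kept of every block (assumption for the demo)

class ToyCluster:
    """Hypothetical stand-in for a name node plus data nodes; not the HDFS API."""

    def __init__(self, node_count):
        self.data_nodes = [dict() for _ in range(node_count)]  # block id -> bytes
        self.metadata = {}  # file name -> list of (block id, node indexes)

    def write(self, name, payload):
        # 1. Create metadata: split the file into blocks and choose replica locations.
        blocks = [payload[i:i + BLOCK_SIZE] for i in range(0, len(payload), BLOCK_SIZE)]
        entries = []
        for number, block in enumerate(blocks):
            block_id = f"{name}#{number}"
            nodes = [(number + r) % len(self.data_nodes) for r in range(REPLICATION)]
            # 2. Put blocks: store one copy on each chosen data node.
            for node in nodes:
                self.data_nodes[node][block_id] = block
            entries.append((block_id, nodes))
        self.metadata[name] = entries

    def read(self, name):
        # 1. Get metadata, then 2. fetch each block from the first node still holding it.
        parts = []
        for block_id, nodes in self.metadata[name]:
            holder = next(n for n in nodes if block_id in self.data_nodes[n])
            parts.append(self.data_nodes[holder][block_id])
        return b"".join(parts)

cluster = ToyCluster(node_count=4)
cluster.write("talk.txt", b"big data science")
cluster.data_nodes[0].clear()           # simulate losing one data node
recovered = cluster.read("talk.txt")    # still readable thanks to the replicas
```

Because every block lives on three of the four toy nodes, wiping one node does not lose data, which is the resilience the DIVIDE slide refers to.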
Conquer = MapReduce
Insight = Functional Paradigm
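The link between MapReduce and the functional paradigm shows up in a word count written with plain `map` and `reduce`. This is a minimal single-machine sketch, not Hadoop code: the sample chunks and the function names `map_phase` and `reduce_phase` are made up for illustration.

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Map: turn one chunk of text into partial word counts."""
    return Counter(chunk.lower().split())

def reduce_phase(left, right):
    """Reduce: merge two partial results into one."""
    return left + right

# Each chunk stands in for a block stored on a different node (illustrative data).
chunks = [
    "big data is crude oil",
    "data science is the refinery",
    "big data science",
]

partials = map(map_phase, chunks)    # independent per chunk, so it could run in parallel
word_counts = reduce(reduce_phase, partials, Counter())
```

Neither function touches shared state, which is what lets a framework run the map step on thousands of blocks at once and merge the results afterwards.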
Why is Big Data needed ?
VOLUME: Exponential growth; 2x in 2 yrs. PB (1000 TB) is now common.
VELOCITY: Event streams; never at rest. 640k GB per internet minute.
VARIETY: 100s of data sources. 85% not in a table.
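To put the volume and velocity figures in perspective, the arithmetic can be worked through as below. The inputs are the slide's own numbers; the 10-year horizon and the decimal conversion 1 TB = 1000 GB are assumptions added for the example.

```python
GB_PER_PB = 1_000_000             # 1 PB = 1000 TB (slide), assuming 1 TB = 1000 GB
GB_PER_INTERNET_MINUTE = 640_000  # "640k GB per internet minute" (slide)
DOUBLING_PERIOD_YEARS = 2         # "2x in 2 yrs" (slide)

def growth_factor(years):
    """How many times larger the data volume is after `years` of doubling every 2 years."""
    return 2 ** (years / DOUBLING_PERIOD_YEARS)

pb_per_day = GB_PER_INTERNET_MINUTE * 60 * 24 / GB_PER_PB
ten_year_growth = growth_factor(10)   # 10 years is an arbitrary horizon
```

Under these assumptions the per-minute figure adds up to several hundred petabytes a day, and doubling every two years compounds to a 32-fold increase over a decade.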
Where in the Value Chain ?
Generation Transport Knowledge Output Value
BIG DATA SCIENCE
Straddles all four Challenge Areas
Big Data Heat Map – Gartner 2012
Big Data Potential by Sector – McKinsey for USBLS, 2011
Big Data Investment by Industry – Gartner, 2012
Top Big Data Challenges – Gartner, 2012
Survey on Big Data Investments – IDG Survey, 2013
Survey on Main Drivers to Invest – IDG Survey, 2014
DEMO
RECAP OF BENEFITS
COST SPEED
AGILITY CAPABILITY
LAST WORDS OF WISDOM
NOT ALL ROADS LEAD TO ROME
TIME VALUE OF DATA
KNOWLEDGE IS POWER
I AM AN INDIVIDUAL
“The price of light is far less than the cost of darkness”