Getting your head around
BIG Data
https://github.com/glennblock
https://twitter.com/gblock
“I should be tweeting”
Make machine data accessible, usable and valuable to everyone.
Platform for Machine Data
Any Machine Data
HA Indexes and Storage
Search and Investigation
Proactive Monitoring
Operational Visibility
Real-time Business Insights
Commodity Servers
Online Services
Web Services
Servers
Security
GPS Location
Storage
Desktops
Networks
Packaged Applications
Custom Applications
Messaging
Telecoms
Online Shopping Cart
Web Clickstreams
Databases
Energy Meters
Call Detail Records
Smartphones and Devices
RFID
DATA
15,000 BC – Pictures (Lascaux, France)
6000 BC – Symbols
3,500 BC – Language
1,275 BC – Papyrus
1st to 13th Century – Codex
13th Century – Movable type
15th Century – Printing press
19th to 20th Century – Babbage Analytical Engine
1936 – Turing machine
1945 – ENIAC
1947 – The first bug
1977 – ARPANET
1990s – Internet
Phones and Tablets
RFID
Cloud
Services
New consumer devices
90 percent of all the data in the world has been generated over the last two years
source: sciencedaily.com
Every day 2.5 quintillion bytes of data is generated
1 quintillion = 1 + 18 zeros!
57.5 billion 32 GB iPads
source: storagenewsletter.com
2.7 zettabytes exist in the digital universe
1 zettabyte = 1 + 21 zeros!
42 ZB = All human speech digitized
source: highscalability.com
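To get a feel for these magnitudes, the figures above can be put through some quick arithmetic. This is only a rough sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes), and the constant and function names are made up for illustration.

```python
# Quick scale check on the quoted figures, assuming decimal (SI) units.
BYTES_PER_DAY = 25 * 10**17   # 2.5 quintillion bytes generated per day
IPAD_BYTES = 32 * 10**9       # one 32 GB iPad
ZETTABYTE = 10**21            # 1 followed by 21 zeros


def ipads_to_hold(num_bytes: int) -> int:
    """Number of 32 GB iPads needed to hold num_bytes (rounded up)."""
    return -(-num_bytes // IPAD_BYTES)  # ceiling division on integers


def ipads_in_zettabytes(ipad_count: int) -> float:
    """Total capacity of ipad_count 32 GB iPads, in zettabytes."""
    return ipad_count * IPAD_BYTES / ZETTABYTE


if __name__ == "__main__":
    print(ipads_to_hold(BYTES_PER_DAY))           # iPads filled per day
    print(ipads_in_zettabytes(575 * 10**8))       # 57.5 billion iPads, in ZB
    print(27 * 10**20 // BYTES_PER_DAY)           # days to generate 2.7 ZB
```

Under these assumptions, one day of data fills about 78 million iPads, while 57.5 billion iPads hold roughly 1.84 zettabytes, so the iPad comparison appears to describe a total on the zettabyte scale rather than a single day's output.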
How big is big?
That’s A LOT of data!
How do you harness it?
This is what big data is really about.
Asking questions and getting answers
Massive amounts of data.
Machine generated
VOLUME
Data is coming from a multitude of sources
Mix of structured and unstructured (JSON, XML, CSV, plain text)
Need a way to store it and query it
VARIETY
VARIETY
Log files
Activity Feeds
Emails
Device Streams
Audio Files
Videos
Data arrives at many different frequencies
Need to be able to process it in real time.
VELOCITY
Not all data that is stored is useful.
Need to identify the useful data
Need to wade through all the noise
VERACITY
SOLUTIONS
Map/Reduce
function map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    emit (w, 1)

function reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += ParseInt(pc)
  emit (word, sum)
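The map and reduce pseudocode above can be tried out on a single machine. Below is a minimal Python sketch of the same word count, not a distributed implementation: the grouping ("shuffle") step that a framework such as Hadoop performs between map and reduce is simulated with a dictionary, words are lowercased and split on whitespace as a simplifying assumption, and the names map_words, reduce_counts and word_count are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(name: str, document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in the document.

    name is the document name (unused here, kept to mirror the pseudocode).
    """
    for word in document.split():
        yield (word.lower(), 1)


def reduce_counts(word: str, partial_counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum the partial counts collected for one word."""
    return (word, sum(partial_counts))


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, group by key, then reduce, all in one process."""
    # Shuffle step: group every emitted value under its key (the word).
    grouped: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_words(name, text):
            grouped[word].append(count)
    # Reduce each group independently; in a cluster these run in parallel.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"a.txt": "big data big", "b.txt": "Data is big"}
    print(word_count(docs))
```

Because each map call looks at one document and each reduce call looks at one word, both steps can be spread across many machines, which is what makes the pattern suit large datasets.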
High-scale, high-availability databases
Distributed processing of large datasets
Data Visualization and analysis
End to end tools
More information
www.mongodb.org
www.memsql.com
cassandra.apache.org
hadoop.apache.org
www.tableausoftware.com
www.elasticsearch.org
splunk.com
@gblock http://github.com/glennblock
http://www.flickr.com/photos/11812960@N04/4050576435