A real-time architecture using Hadoop and Storm @ FOSDEM 2013
![Page 1: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/1.jpg)
A real-time architecture using Hadoop and Storm.
![Page 2: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/2.jpg)
Speakers
Nathan Bijnens (@nathan_gs)
Geert Van Landeghem (@gvanlandeghem)
![Page 3: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/3.jpg)
Our Vision
Big Data
Volume
![Page 4: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/4.jpg)
Big Data
Velocity
![Page 5: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/5.jpg)
Our Vision
Volume
Variety
![Page 6: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/6.jpg)
Credits
Nathan Marz, Engineer at BackType (now Twitter).
Storm
Cascalog
ElephantDB
manning.com/marz
![Page 7: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/7.jpg)
A Data System
![Page 8: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/8.jpg)
Data is more than Information
Not all information is equal. Some information is derived from other pieces of information.
![Page 9: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/9.jpg)
Data is more than Information
Eventually you will reach the most
This is the information you hold true, simply because it exists.
![Page 10: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/10.jpg)
Events
Everything we do generates events:
- Pay with Credit Card
- Commit to Git
- Click on a webpage
- Tweet
![Page 11: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/11.jpg)
Events - Before
Events used to manipulate the master data.
![Page 12: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/12.jpg)
Events - After
Today, events are the master data.
![Page 13: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/13.jpg)
everything.
Data System
![Page 14: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/14.jpg)
Events
Data is Immutable
![Page 15: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/15.jpg)
Events
Data is Time Based
![Page 16: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/16.jpg)
Capturing change traditionally

| Person | Location |
| --- | --- |
| Nathan | Antwerp |
| Geert | Dendermonde |
| John | Ghent |

| Person | Location |
| --- | --- |
| Nathan | Ghent |
| Geert | Dendermonde |
| John | Ghent |
![Page 17: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/17.jpg)
Capturing change

| Person | Location | Time |
| --- | --- | --- |
| Nathan | Antwerp | 2005-01-01 |
| Geert | Dendermonde | 2011-10-08 |
| John | Ghent | 2010-05-02 |
| Nathan | Ghent | 2013-02-03 |

| Person | Location | Time |
| --- | --- | --- |
| Nathan | Antwerp | 2005-01-01 |
| Geert | Dendermonde | 2011-10-08 |
| John | Ghent | 2010-05-02 |
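
The difference between overwriting a row and appending a timestamped fact can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `record_move` and `location_as_of` are hypothetical and do not come from the talk.

```python
from datetime import date

# Append-only log of (person, location, time) facts, taken from the slide.
facts = [
    ("Nathan", "Antwerp", date(2005, 1, 1)),
    ("Geert", "Dendermonde", date(2011, 10, 8)),
    ("John", "Ghent", date(2010, 5, 2)),
]

def record_move(log, person, location, when):
    """Capture a change by appending a new fact; existing facts are never modified."""
    log.append((person, location, when))

def location_as_of(log, person, when):
    """Return the person's most recent location at or before `when`, or None."""
    known = [(t, loc) for p, loc, t in log if p == person and t <= when]
    return max(known)[1] if known else None

record_move(facts, "Nathan", "Ghent", date(2013, 2, 3))
print(location_as_of(facts, "Nathan", date(2012, 1, 1)))  # Antwerp
print(location_as_of(facts, "Nathan", date(2013, 2, 3)))  # Ghent
```

Because nothing is overwritten, the earlier location stays queryable after the move is recorded.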
![Page 18: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/18.jpg)
Query
The data you query is often transformed, aggregated, ...
![Page 19: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/19.jpg)
Query
Query = function ( data )
![Page 20: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/20.jpg)
Number of people living in each city.

| Person | Location | Time |
| --- | --- | --- |
| Nathan | Antwerp | 2005-01-01 |
| Geert | Dendermonde | 2011-10-08 |
| John | Ghent | 2010-05-02 |
| Nathan | Ghent | 2013-02-03 |

| Location | Count |
| --- | --- |
| Ghent | 2 |
| Dendermonde | 1 |
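
As a rough sketch of "Query = function(data)", the query on this slide can be written as a plain Python function over all events. The function name `people_per_city` is an assumption made for illustration; the data is the slide's.

```python
from collections import Counter

# The slide's events as (person, location, time); ISO dates sort correctly as strings.
events = [
    ("Nathan", "Antwerp", "2005-01-01"),
    ("Geert", "Dendermonde", "2011-10-08"),
    ("John", "Ghent", "2010-05-02"),
    ("Nathan", "Ghent", "2013-02-03"),
]

def people_per_city(all_events):
    """Query = function(all data): count people by their most recent location."""
    latest = {}  # person -> (time, location)
    for person, location, time in all_events:
        if person not in latest or time > latest[person][0]:
            latest[person] = (time, location)
    return Counter(location for _, location in latest.values())

print(people_per_city(events))
```

For the four events above this yields Ghent 2 and Dendermonde 1, matching the result table.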
![Page 21: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/21.jpg)
Query
[Diagram: All Data → Query]
![Page 22: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/22.jpg)
Query: Precompute
[Diagram: All Data → Precomputed View → Query]
![Page 23: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/23.jpg)
Layered Architecture
Speed Layer
Batch Layer
Serving Layer
![Page 24: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/24.jpg)
Layered Architecture
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
![Page 25: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/25.jpg)
Batch Layer
![Page 26: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/26.jpg)
Batch Layer
[Diagram: Incoming Data, Hadoop, ElephantDB]
![Page 27: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/27.jpg)
Batch Layer
Unrestrained computation.
![Page 28: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/28.jpg)
Batch Layer
Horizontally scalable.
![Page 29: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/29.jpg)
High Latency.
matter.
Batch Layer
![Page 30: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/30.jpg)
Batch Layer
Stores master copy of data set... append only.
![Page 31: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/31.jpg)
Batch Layer
![Page 32: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/32.jpg)
Batch: View generation
[Diagram: Master Dataset → MapReduce → View #1, View #2, View #3]
![Page 33: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/33.jpg)
MapReduce
1. Take a large problem and divide it into sub-problems
2. Perform the same function on all sub-problems
3. Combine the output from all sub-problems
[Diagram: Map (DoWork() DoWork() DoWork() …) → Reduce → Output]
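
The three steps can be sketched in plain Python. This is a single-process illustration of the idea, not Hadoop's API: the `split`, `do_work` and `combine` helpers and the sample records are assumptions, and Hadoop would run the map step on many machines.

```python
from collections import Counter
from functools import reduce

def split(records, n_chunks):
    """Step 1: divide the input into roughly equal sub-problems."""
    return [records[i::n_chunks] for i in range(n_chunks)]

def do_work(chunk):
    """Step 2 (map): the same function runs on every chunk; here, count locations."""
    return Counter(location for _, location in chunk)

def combine(left, right):
    """Step 3 (reduce): merge two partial results."""
    return left + right

# Hypothetical input: current (person, location) pairs.
records = [("Nathan", "Ghent"), ("Geert", "Dendermonde"), ("John", "Ghent")]
partials = [do_work(chunk) for chunk in split(records, n_chunks=2)]
output = reduce(combine, partials, Counter())
print(output)
```

Because `combine` is associative, the partial counts can be merged in any order, which is what lets the work be spread across nodes.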
![Page 34: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/34.jpg)
Batch View Database
Read only database. No random writes required.
![Page 35: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/35.jpg)
Batch View Database
- ElephantDB
- Splout
![Page 36: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/36.jpg)
Batch Layer
[Timeline: data absorbed into Batch Views, then data not yet absorbed (just a few hours of data), up to Now]
![Page 37: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/37.jpg)
Speed Layer
![Page 38: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/38.jpg)
Overview
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra]
![Page 39: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/39.jpg)
Speed Layer
Stream processing.
![Page 40: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/40.jpg)
Speed Layer
Continuous computation.
![Page 41: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/41.jpg)
Speed Layer
Transactional.
![Page 42: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/42.jpg)
Speed Layer
Storing a limited window of data.
Compensating for the last few hours of data.
![Page 43: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/43.jpg)
Speed Layer
All the complexity is isolated in the Speed layer, auto-corrected.
![Page 44: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/44.jpg)
CAP
You have a choice between:
- Availability: queries are eventually consistent.
- Consistency: queries are consistent.
![Page 45: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/45.jpg)
Eventual accuracy
Some algorithms are hard to implement in real time. For those cases we could estimate the results.
![Page 46: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/46.jpg)
Speed Layer
[Diagram: Incoming Data → Real Time View 1, Real Time View 2]
![Page 47: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/47.jpg)
Storm
Message passing.
Distributed processing.
Horizontally scalable.
Incremental algorithms.
Fast.
Data in motion.
![Page 49: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/49.jpg)
Storm
[Diagram: Nimbus, Zookeeper, and three Worker Nodes, each with a Supervisor and three Workers]
![Page 50: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/50.jpg)
Storm
Tuple
Stream
![Page 51: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/51.jpg)
Storm
Spout
Bolt
![Page 52: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/52.jpg)
Storm
Grouping
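
To make the Storm vocabulary concrete, here is a small plain-Python simulation of a spout emitting tuples, bolts counting them, and a fields-style grouping that routes equal values to the same bolt task. It does not use Storm's API: the `spout`, `CountBolt` and `fields_grouping` names are hypothetical, and a real topology would run the tasks in parallel on the cluster.

```python
import zlib
from collections import Counter

def spout():
    """Spout: the source of a stream; here it emits a fixed list of (word,) tuples."""
    for word in ["storm", "hadoop", "storm", "cassandra", "hadoop", "storm"]:
        yield (word,)

class CountBolt:
    """Bolt: consumes tuples and keeps a running count per word."""
    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        self.counts[tup[0]] += 1

def fields_grouping(tup, n_tasks):
    """Grouping: route tuples with the same field value to the same bolt task."""
    return zlib.crc32(tup[0].encode("utf-8")) % n_tasks

tasks = [CountBolt() for _ in range(2)]
for tup in spout():
    tasks[fields_grouping(tup, len(tasks))].execute(tup)

# Each word lives in exactly one task, so summing the tasks gives the global count.
total = sum((task.counts for task in tasks), Counter())
print(total)
```

A stable hash (`zlib.crc32`) is used instead of Python's built-in `hash`, which is randomized per process for strings.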
![Page 53: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/53.jpg)
Speed Layer Views
The views are stored in a Read & Write database.
- Cassandra
- HBase
- MongoDB
- MySQL
- ElasticSearch
Much more complex than a read only view.
![Page 54: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/54.jpg)
Serving Layer
![Page 55: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/55.jpg)
Overview
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
![Page 56: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/56.jpg)
Serving Layer
This layer queries the Batch & Real Time views and merges them.
![Page 57: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/57.jpg)
Serving Layer
[Diagram: Batch Views + Real Time Views → Merge]
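
For an additive aggregate such as a count, the merge step might look like the sketch below. The `merge_views` function and the page-view numbers are made up for illustration; other kinds of views (unique counts, averages) would need a different merge.

```python
from collections import Counter

def merge_views(batch_view, realtime_view):
    """Combine a batch view with the real-time view covering data not yet absorbed."""
    return Counter(batch_view) + Counter(realtime_view)

# Hypothetical page-view counts: the batch view lags by a few hours,
# the real-time view only holds what arrived since the last batch run.
batch_view = {"/home": 1200, "/docs": 340}
realtime_view = {"/home": 15, "/pricing": 4}

print(merge_views(batch_view, realtime_view))
```

Keys present in only one of the two views carry over unchanged, so a page first seen in the last few hours still shows up in the merged result.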
![Page 58: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/58.jpg)
Overview
![Page 59: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/59.jpg)
Overview
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
![Page 60: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/60.jpg)
Lambda Architecture
Can discard any view, batch and real time, and just recreate everything from the master data.
Mistakes are corrected via recomputation.
- Write bad data? Remove the data & recompute.
- Bug in view generation? Just recompute the view.
Data storage is highly optimized.
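
Correction by recomputation can be illustrated with a toy master dataset. Everything here is assumed for the example: the `build_view` function, the event shape, and the rule that an empty user name marks a bad record.

```python
from collections import Counter

def build_view(master_data):
    """Recreate the view from scratch: count events per user."""
    return Counter(user for user, _ in master_data)

# Hypothetical master dataset of (user, action) events; one record is bad.
master_data = [("alice", "click"), ("bob", "click"), ("", "click"), ("alice", "tweet")]
view = build_view(master_data)

# Write bad data? Remove it from the master data and recompute the view.
cleaned = [event for event in master_data if event[0] != ""]
view = build_view(cleaned)
print(view)
```

The view is never patched in place; it is simply rebuilt from the corrected master data.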
![Page 61: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/61.jpg)
Recommendations
![Page 62: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/62.jpg)
Serialization & Schema
Catch errors as quickly as they happen. Validation on write vs on read.
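
Validation on write can be sketched with a record type that refuses to be constructed from bad input, so errors surface where the data is produced rather than when it is read later. The `LocationEvent` class and its rules are hypothetical; a schema-based format would enforce similar constraints.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class LocationEvent:
    """Hypothetical event record that is validated when it is created (on write)."""
    person: str
    location: str
    time: date

    def __post_init__(self):
        if not self.person or not self.location:
            raise ValueError("person and location must be non-empty")
        if not isinstance(self.time, date):
            raise TypeError("time must be a datetime.date")

ok = LocationEvent("Nathan", "Ghent", date(2013, 2, 3))

try:
    LocationEvent("Nathan", "", date(2013, 2, 3))
except ValueError as err:
    print("rejected on write:", err)
```

With validation on read, the empty location would only be noticed later by whichever job first touches the record.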
![Page 63: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/63.jpg)
Serialization & Schema
CSV is actually a serialization language that is just poorly defined.
![Page 64: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/64.jpg)
Serialization & Schema
Use a format with a schema.
- Thrift
- Avro
- Protocol Buffers
![Page 65: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/65.jpg)
Questions?
What are your needs?
@nathan_gs & @gvanlandeghem
![Page 66: A real time architecture using Hadoop and Storm @ FOSDEM 2013](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6504e4a7959b1098b459b/html5/thumbnails/66.jpg)
DataCrunchers
We enable companies in envisioning, defining and implementing a data strategy.
A one-stop-shop for all your Big Data needs.
The first Big Data Consultancy agency in Belgium.