The Hive Think Tank: "Stream Processing Systems" by Karthik Ramasamy of Twitter
TRANSCRIPT
Stream Processing Systems
Karthik Ramasamy, Twitter
@karthikz
Value of Real Time Data
It’s contextual
[1] Courtesy Michael Franklin, BIRTE, 2015.
Heron
Design: Goals
Batching of tuples: amortizing the cost of transferring tuples
Task isolation: ease of debuggability, isolation, and profiling
Fully API compatible with Storm: directed acyclic graph; topologies, spouts, and bolts
Support for back pressure: topologies should be self-adjusting
Use of mainstream languages: C++, Java, and Python
Efficiency: reduce resource consumption
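The batching and back pressure goals above can be illustrated with a toy, single-threaded Python sketch. It is not Heron code and does not use the Heron or Storm APIs; the names `BoundedBuffer`, `batched`, and `run`, along with the batch size and buffer capacity, are hypothetical choices made for illustration. A spout-like producer groups tuples into batches so that each transfer carries several tuples, and a bounded buffer refuses new batches when it is full, which forces the producer to wait while the bolt-like consumer catches up.

```python
from collections import deque


class BoundedBuffer:
    """Hypothetical queue between a producer (spout) and a consumer (bolt)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.batches = deque()

    def offer(self, batch):
        # Refusing the batch is the back pressure signal: the producer must wait.
        if len(self.batches) >= self.capacity:
            return False
        self.batches.append(batch)
        return True

    def poll(self):
        return self.batches.popleft() if self.batches else None


def batched(tuples, batch_size):
    """Group tuples so that one transfer carries up to batch_size tuples."""
    batch = []
    for item in tuples:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run(words, batch_size=3, capacity=2):
    """Simulate a word-count pipeline; return (counts, transfers, stalls)."""
    buffer = BoundedBuffer(capacity)
    counts = {}
    transfers = 0
    stalls = 0

    def drain_one():
        # The consumer processes a single batch and updates the word counts.
        batch = buffer.poll()
        if batch is not None:
            for word in batch:
                counts[word] = counts.get(word, 0) + 1

    for batch in batched(words, batch_size):
        # While the buffer is full, the producer stalls and the consumer runs.
        while not buffer.offer(batch):
            stalls += 1
            drain_one()
        transfers += 1

    # Let the consumer finish whatever is still buffered.
    while buffer.batches:
        drain_one()
    return counts, transfers, stalls


counts, transfers, stalls = run(list("abacbad"))
print(counts, transfers, stalls)
```

With seven tuples, a batch size of 3, and room for 2 batches, the producer makes 3 transfers instead of 7 and is stalled once before the consumer frees a slot. A real system would run producer and consumer concurrently; the sequential loop here only keeps the sketch deterministic.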
Better Storm
Twitter Heron
Container-based architecture
Separate monitoring and scheduling
Simplified execution model
Much better performance
Heron
Sample Topologies
Heron@Twitter
Storm is decommissioned
LARGEST CLUSTER
100’s OF TOPOLOGIES
BILLIONS OF MESSAGES
100’s OF TERABYTES
REDUCED INCIDENTS
GOOD NIGHT SLEEP
3X reduction in resource usage
The Road Ahead
Technology Challenges
Auto scaling the system in the presence of unpredictability
Auto tuning of real time analytics jobs/queries
Exploiting faster networks for efficiently moving data
Get in Touch
@karthikz