Fifth Elephant 2014: Live analytical dashboards at scale

Live analytical dashboards at scale - SQL style Shashwat Agarwal

Author: agshashwat

Posted on 15-Jan-2015

Category:

Data & Analytics


DESCRIPTION

https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style

TRANSCRIPT

  • 1. Live analytical dashboards at scale - SQL style. Shashwat Agarwal
  • 2. Live Analytical
  • 3. Live Analytical
  • 4. What we have: Services (a lot of them); Events (millions of updates); Information
  • 5. Challenges: Metric Definition; Scale; Reliability
  • 6. Metric Definition: Not just a count of events, but a function of fields from one or more related events/entities, computed on each event or on a batch of events (for statistical analysis), for a set of dimensions
  • 7. Scale Challenges: Dimensional lookup; High throughput (write), low latency (query); Multidimensional store
  • 8. Reliability Challenges: Accuracy; Consistency; Fault tolerance
  • 9. Solution? Real time + Scale == Stream Processing: Kafka; Storm
  • 10. Storage: Multidimensional support; Optimized for time series queries; Low query response times; High write throughput; Scalable; TSD* (* OpenTSDB does not support Kerberos)
  • 11. Metric Definition: Not scalable to write Storm topologies for each metric; Requires a DSL for non-tech folks; Introducing... Esper
  • 12. Storm Topology - 1. Diagram labels: Dim Lookup; Dim Lookup; Kafka Spouts; Enricher Bolts; Kafka Bolts; Dim Store. Event: { id: a123-234, time: 1234, entityId: OD12 }. Enriched Event: { id: a123-234, time: 1234, entityId: OD12 }
  • 13. Storm Topology - 3. Diagram labels: TSD; Kafka Spouts; Esper Bolts; TSD Bolts. Enriched Event: { id: a123-234, time: 1234, entityId: OD12 }. ( metric name, [dim name-value-pairs]*, value, ts )
  • 14. Time Batching: Event time. Enables: calculating statistics; windowed joins; out-of-order events
  • 15. Reliability: Faults; Upgrades; Metrics Def changes; Last good Checkpoint; Reset Checkpoint; Replay; Transactional Storm
  • 16. Storm Topology - 2. Diagram labels: Kafka Spouts; Time Batch Bolt; HBase Bolt. Enriched Event: { id: a123-234, time: 1234, entityId: OD12 }
  • 17. HBase Time Batch Schema, Table 1 - Event Queue. Key: _slot_; batchId is constructed from event timestamp. Value: (each column - Event JSON)
  • 18. HBase Time Batch Schema, Table 2 - Event Queue Update Log. Key: _log__; batchId is constructed from event timestamp; version is the timestamp at which the batch was updated. Value: Version
  • 19. Storm Topology - 3. Diagram labels: TSD; Time Batch Spout; Esper Bolts; TSD Bolts. ( metric name, [dim name-value-pairs]*, value, ts )
  • 20. Learnings: Replayability; Event and Entity Schema; Checkpointing; Bootstrapping; Sidelining; Fault Tolerance
  • 21. Questions ?? sb.lk/hasgeek
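
The time-batching idea in the slides (a batchId constructed from the event timestamp, so that out-of-order events still land in the right batch) can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the batch width, the function names, the metric name "events.count" and the extra event ids are assumptions made for illustration; only the event fields (id, time, entityId) and the output tuple shape (metric name, dim name-value pairs, value, ts) come from the slides.

```python
from collections import defaultdict

BATCH_WIDTH = 60  # assumed batch width in event-time units; not stated in the slides


def batch_id(event_time, width=BATCH_WIDTH):
    """Return the start of the time batch that contains event_time."""
    return event_time - (event_time % width)


def time_batch(events, width=BATCH_WIDTH):
    """Group events by event-time batch, regardless of arrival order."""
    batches = defaultdict(list)
    for event in events:
        batches[batch_id(event["time"], width)].append(event)
    return dict(batches)


def count_metric(batches, metric_name="events.count"):
    """Emit one (metric name, dims, value, ts) tuple per batch and entityId."""
    out = []
    for ts in sorted(batches):
        counts = defaultdict(int)
        for event in batches[ts]:
            counts[event["entityId"]] += 1
        for entity_id in sorted(counts):
            dims = (("entityId", entity_id),)
            out.append((metric_name, dims, counts[entity_id], ts))
    return out


# Hypothetical events; the third one arrives late but belongs to the first batch.
events = [
    {"id": "a123-234", "time": 1234, "entityId": "OD12"},
    {"id": "a123-236", "time": 1275, "entityId": "OD12"},
    {"id": "a123-235", "time": 1210, "entityId": "OD12"},
]
metrics = count_metric(time_batch(events))
```

Because the batch is derived from the event's own timestamp rather than its arrival time, replaying the same events in any order yields the same tuples, which is what makes checkpoint-and-replay recovery produce consistent metrics.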