Introducing Athena: 08/19 Big Data Application Meetup, Talk #3

Upload: cask-data-inc

Posted on 13-Apr-2017

TRANSCRIPT

Page 1: Introducing Athena: 08/19 Big Data Application Meetup, Talk #3

Athena Streaming with Samza

Goddess of Wisdom

Page 2

The team

Page 3

Brief Background: Stream Processing, Samza, Athena

Kafka: Uber uses Kafka as its logging/event system

Message streams

Near-real time computation

Stream Processing Framework: Apache Samza

Platform built on top of Samza: Athena

Page 4

Current use cases

Page 5

Current use cases: Aggregation

Page 6

Pricing

Assess supply and demand in real time to calculate accurate surge multiples

[Diagram labels: Realtime (location updates, Kafka, Samza); Batch (location updates, S3, Spark); Elasticsearch; HTTP Service; Queries]

Page 7

[Diagram labels, Artemis on Samza: Raw events; Mapper Job (x3); Reducer Job (x2); Inserter Job (x2); 1m Agg Tiles; Tiles; Riak. Stage captions: "Event parsing, filtering, classification" and "Event Aggregation". Group labels: "Per Contract" and "Common".]
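The slide's labels suggest a mapper stage that parses, filters, and classifies raw events, followed by a reducer stage that aggregates them into one-minute tiles. The sketch below illustrates that general pattern in plain Java. It is not Artemis or Athena code: the `tileId,type,timestampMs` event format, the `supply`/`demand` classes, and every class and method name are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Illustrative only: all names and the event format are hypothetical. */
final class TileAggregationSketch {

    /** A parsed event: the tile it fell in, its class, and its timestamp. */
    static final class Event {
        final String tileId;
        final String type;
        final long timestampMs;

        Event(String tileId, String type, long timestampMs) {
            this.tileId = tileId;
            this.type = type;
            this.timestampMs = timestampMs;
        }
    }

    /** Mapper step: parse "tileId,type,timestampMs" and drop anything malformed or unclassified. */
    static Optional<Event> map(String raw) {
        String[] parts = raw.split(",");
        if (parts.length != 3) {
            return Optional.empty(); // filtering: wrong number of fields
        }
        String type = parts[1].trim();
        if (!type.equals("supply") && !type.equals("demand")) {
            return Optional.empty(); // classification: unknown event class
        }
        try {
            return Optional.of(new Event(parts[0].trim(), type, Long.parseLong(parts[2].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty(); // filtering: unparseable timestamp
        }
    }

    /** Reducer step: count events per (tile, class, one-minute bucket). */
    static Map<String, Integer> reduce(List<String> rawEvents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : rawEvents) {
            map(raw).ifPresent(e -> {
                long minute = e.timestampMs / 60_000L; // one-minute bucket index
                counts.merge(e.tileId + "|" + e.type + "|" + minute, 1, Integer::sum);
            });
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "tileA,supply,1000", "tileA,supply,59000", "tileA,demand,61000", "garbage");
        System.out.println(reduce(raw));
    }
}
```

In a real deployment the mapper and reducer would be separate jobs connected by a Kafka topic keyed by tile, so that all events for one tile reach the same reducer; here both steps run in one process to keep the example self-contained.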

Page 8

Current use cases: Event-driven update engine

Page 9

Driver Activation

[Diagram labels: Driver status updates; KAFKA; SAMZA; Onboarding Service; Fetch driver info; Update status; RocksDB; Retry queue]
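One plausible reading of this slide is an event-driven update loop: a status update arrives, the job fetches driver info from a service, updates the status, and parks the event in a retry queue when the call fails. The sketch below shows that pattern only. The in-memory `ArrayDeque` stands in for a RocksDB-backed queue, the `Function` stands in for the onboarding service call, and all names are hypothetical rather than taken from the talk.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative only: an event-driven updater with a retry queue. */
final class DriverActivationSketch {
    // Stand-in for the remote call that fetches driver info; it may throw when the service is down.
    private final Function<String, String> fetchDriverInfo;
    // Stand-in for a persistent retry queue (the slide mentions RocksDB).
    private final Deque<String> retryQueue = new ArrayDeque<>();
    private final Map<String, String> status = new HashMap<>();

    DriverActivationSketch(Function<String, String> fetchDriverInfo) {
        this.fetchDriverInfo = fetchDriverInfo;
    }

    /** Handles one status-update event; failed lookups are queued instead of dropped. */
    void onStatusUpdate(String driverId) {
        try {
            String info = fetchDriverInfo.apply(driverId);
            status.put(driverId, "updated:" + info);
        } catch (RuntimeException e) {
            retryQueue.addLast(driverId);
        }
    }

    /** Re-attempts only the events queued at the time of the call, so a failing service cannot loop forever. */
    void drainRetries() {
        int pending = retryQueue.size();
        for (int i = 0; i < pending; i++) {
            onStatusUpdate(retryQueue.removeFirst());
        }
    }

    int pendingRetries() {
        return retryQueue.size();
    }

    String statusOf(String driverId) {
        return status.get(driverId);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        DriverActivationSketch job = new DriverActivationSketch(id -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("service unavailable");
            }
            return "info-" + id;
        });
        job.onStatusUpdate("driver-1"); // first call fails and is queued
        job.drainRetries();             // second attempt succeeds
        System.out.println(job.statusOf("driver-1"));
    }
}
```

In a Samza job, `drainRetries` would typically be triggered from a periodic window callback rather than called by hand.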

Page 10

Upcoming use cases

Page 11

Fraud monitoring and alerting

[Diagram labels: KAFKA; Partitioner; Real Time; Hourly; Alerting Service; Monitoring Service; Uberx metrics]

Page 12

Da Vinci: Streaming platform for data science

[Diagram labels: KAFKA; Aggregation job (Model generation); RocksDB; Historical state; Timeseries data; Model evaluation; Model Updates; updates; Database / Elasticsearch]

Page 13

Samza Architecture Overview

Page 14

Samza Architecture

Page 15

Basic Structure of a Task

Page 16

Task
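The slide content for the task structure did not survive extraction, so the following is a reconstruction of the general shape rather than the speaker's example. Samza tasks implement `StreamTask` (a `process` callback per message) and may also implement `InitableTask` and `WindowableTask`. To stay runnable without the Samza dependency, the sketch uses simplified stand-in interfaces with the same names; the real Samza callbacks take envelope, collector, and coordinator objects instead of plain strings. The `task.output` config key is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: simplified stand-ins for the Samza task interfaces. */
final class TaskShapeSketch {
    interface Collector { void send(String stream, String message); }
    interface InitableTask { void init(Map<String, String> config); }
    interface StreamTask { void process(String key, String message, Collector collector); }
    interface WindowableTask { void window(Collector collector); }

    /** Counts messages per key and emits the counts on every window tick. */
    static final class CountingTask implements InitableTask, StreamTask, WindowableTask {
        private final Map<String, Integer> counts = new TreeMap<>();
        private String outputStream = "counts";

        @Override
        public void init(Map<String, String> config) {
            // Called once before any message; "task.output" is a hypothetical key.
            outputStream = config.getOrDefault("task.output", outputStream);
        }

        @Override
        public void process(String key, String message, Collector collector) {
            // Called once per incoming message; only local state is updated here.
            counts.merge(key, 1, Integer::sum);
        }

        @Override
        public void window(Collector collector) {
            // Called periodically; flush the aggregated state downstream and reset it.
            counts.forEach((k, v) -> collector.send(outputStream, k + "=" + v));
            counts.clear();
        }
    }

    /** Drives the task the way a container would: init once, process each message, then one window tick. */
    static List<String> runDemo(List<String> keys) {
        List<String> sent = new ArrayList<>();
        Collector collector = (stream, message) -> sent.add(stream + ":" + message);
        CountingTask task = new CountingTask();
        task.init(Map.of("task.output", "out"));
        for (String key : keys) {
            task.process(key, "payload", collector);
        }
        task.window(collector);
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(List.of("a", "b", "a")));
    }
}
```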

Page 17

Deployment in Uber

Page 18
Page 19

Athena Tooling

Page 20

Tooling

● Athena manager
● Job configuration
● Unit test framework
● Graphite integration
● Codahale library support
● Maven archetype
● Artifactory support

Page 21

Athena Manager

Page 22
Page 23
Page 24

Job configuration

[Diagram: reference.conf (athena-core-lib) and application.conf (Samza job), each covering the Sandbox, Staging, Production, and Dev environments]

Page 25

Job configuration

projects {
  artemis {
    job_1 {
      mapper {
        envs.common {
          task.inputs = "kafka.topic_1"
          task.class = "com.uber.athena.SampleTaskClass"
        }
        envs.local = ${projects.artemis.job_1.mapper.envs.common}
        envs.local = {
          task.window.ms = 30000
        }
        envs.sandbox = ${projects.artemis.job_1.mapper.envs.local}
        envs.sandbox = {
          yarn.package.path = "http://artifactory..../artifactory/libs-snapshot-local/com/uber/athena/.../hello-athena.tar.gz"
        }
        envs.staging = ${projects.artemis.job_1.mapper.envs.sandbox}
        envs.production = ${projects.artemis.job_1.mapper.envs.sandbox}
      }
    }
  }
}
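The configuration above layers environments: `local` starts from `common` and adds a window interval, `sandbox` starts from `local` and adds a package path, and `staging` and `production` reuse `sandbox`. The sketch below reproduces that layering with plain Java maps to make the resulting values explicit. It does not use the configuration library itself; the `overlay` and `resolve` helpers are hypothetical names, and the elided package URL is copied as it appears on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: environment layering expressed as map overlays. */
final class EnvConfigSketch {

    /** Returns a copy of base with every key in overrides added or replaced. */
    static Map<String, String> overlay(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> merged = new TreeMap<>(base);
        merged.putAll(overrides);
        return merged;
    }

    /** Builds the per-environment settings in the same order as the slide's config. */
    static Map<String, Map<String, String>> resolve() {
        Map<String, String> common = Map.of(
            "task.inputs", "kafka.topic_1",
            "task.class", "com.uber.athena.SampleTaskClass");
        Map<String, String> local = overlay(common, Map.of("task.window.ms", "30000"));
        Map<String, String> sandbox = overlay(local, Map.of(
            "yarn.package.path",
            "http://artifactory..../artifactory/libs-snapshot-local/com/uber/athena/.../hello-athena.tar.gz"));

        Map<String, Map<String, String>> envs = new LinkedHashMap<>();
        envs.put("common", common);
        envs.put("local", local);
        envs.put("sandbox", sandbox);
        envs.put("staging", sandbox);    // staging reuses sandbox unchanged
        envs.put("production", sandbox); // production reuses sandbox unchanged
        return envs;
    }

    public static void main(String[] args) {
        resolve().forEach((env, settings) -> System.out.println(env + " -> " + settings.keySet()));
    }
}
```

The point of the layering is that a setting defined once in `common` reaches every environment, while each environment states only what differs.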

Page 26

Job configuration

Page 27

Unit test framework

Stream Job

[Diagram labels: Stream Job (StreamTask, InitableTask, WindowableTask); TaskUnitTestHarness; Message Listener; Inject data; Custom IncomingMessageEnvelope; Custom MessageCollector]

Page 28

Unit test framework

// Fully qualified class name of the task under test
String classTaskName = "com.uber.athena.test.TestProcessTask";

// Register the job to the test harness
TaskUnitTestHarness<String, Integer> testProcessTask =
    new TaskUnitTestHarness<>(classTaskName, false, true);

// Register a listener for validating the output of every process function
testProcessTask.registerMessageListenerOnProcess(new KeyAsserterListener());

// Start the job
testProcessTask.start();

// Inject data
testProcessTask.inject(key, value);

// Get the full output
testProcessTask.getResult();
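`TaskUnitTestHarness` is internal to Athena and its implementation is not shown in the talk. As a rough idea of how such a harness could work, the sketch below wraps a task, lets a test inject key/value pairs, notifies listeners of each output, and collects the full result. `MiniTaskHarness` and its nested `Task` interface are hypothetical stand-ins and make no claim about the real harness's constructor flags or internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: a minimal in-memory harness for a message-processing task. */
final class MiniTaskHarness<K, V> {

    /** The task under test: receives a key/value pair and a sink for its output. */
    interface Task<K, V> {
        void process(K key, V value, Consumer<String> out);
    }

    private final Task<K, V> task;
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> results = new ArrayList<>();
    private boolean started;

    MiniTaskHarness(Task<K, V> task) {
        this.task = task;
    }

    /** Registers a listener that sees every message the task emits. */
    void registerListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    void start() {
        started = true;
    }

    /** Feeds one message to the task, recording and broadcasting whatever it emits. */
    void inject(K key, V value) {
        if (!started) {
            throw new IllegalStateException("call start() before inject()");
        }
        task.process(key, value, message -> {
            results.add(message);
            listeners.forEach(listener -> listener.accept(message));
        });
    }

    /** Returns a copy of everything emitted so far. */
    List<String> getResult() {
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        MiniTaskHarness<String, Integer> harness =
            new MiniTaskHarness<String, Integer>((k, v, out) -> out.accept(k + ":" + (v * 2)));
        harness.registerListener(message -> System.out.println("emitted " + message));
        harness.start();
        harness.inject("a", 21);
        System.out.println(harness.getResult());
    }
}
```

Capturing output through an injected sink, instead of a live Kafka producer, is what lets assertions run per message without a cluster.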

Page 29

Tooling

● Athena manager
● Job configuration
● Unit test framework
● Graphite integration
● Codahale library support
● Maven archetype
● Artifactory support

Page 30

Observations

Page 31

Observations

● YARN is not bad!
● Offset lag and buffered messages
● Kafka Appender for ELK
● Checkpoint topic partition incorrect count
● Config validation needs improvement
● Job restarts are complicated
● Built-in metrics are insufficient

Page 32

Upcoming Samza improvements

● Seamless upgrades
● Custom built-in serde support
● Config validation enhancement
● Auto benchmarking a Samza job
● Unit test framework enhancement

Page 33

Q&A