fabric - realtime stream processing framework

Fabric Real-time stream processing framework Shashank Gautam Sathish Kumar KS

Upload: shashank-gautam

Post on 12-Jan-2017


TRANSCRIPT

Page 1: Fabric - Realtime stream processing framework

Fabric Real-time stream processing framework

Shashank Gautam Sathish Kumar KS

What is Fabric?

Fabric is a scalable, practical and reliable real-time stream processing framework designed for easy operability and extension.

Fabric is proven to work very well for:

● High velocity multi-destination event ingestion with guaranteed persistence.

● Rules/filter-based real-time triggers for advertising/broadcast
● Online fraud detection
● Real-time pattern matching
● Streaming analytics

The Problem

● Primary motivation
  ○ Streaming millions of messages per second
  ○ Connectivity to different sources: Kafka, MySQL, etc.
  ○ Writing to different targets: DB, queue, API, or publishing to other high-level applications
  ○ Near real time
● Desirable properties of the framework
  ○ High throughput: support for batching of events
  ○ Data sanity: avoiding datasets that make no sense
  ○ Making data available for other applications to consume
  ○ Scalability and data reliability
  ○ Easy development and deployment
  ○ Resource effectiveness

Fabric Core Components

Fabric Compute and Executor

Fabric Compute Framework

● Computation pipeline setup
● Batch event processing
● Event passing among components
● Acknowledgements

Fabric Compute and Executor continued...

Fabric Compute and Executor continued...

Fabric-executor

Responsible for:

● Launching, monitoring and managing deployed computations
● 1:1 relation between a computation instance and a fabric executor process
● Fabric executor is a single JVM process within a Docker container

Fabric Terminologies

● Compute Framework
  ○ Real-time event processing framework
  ○ Core event orchestration
  ○ Performs user-defined operations
● EventSet
  ○ Collection (of configurable size) of events
  ○ Basic transmission unit within the computation
● Computation/Topology
  ○ Pipeline for data flow, built by the user from Fabric components
  ○ Components can be of two types: Source and Processor
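The idea of an EventSet as a batch of configurable size can be sketched in plain Java. This is a minimal illustration only: the class `EventSetBatcher` and the method `toEventSets` are hypothetical names, events are modelled as plain strings, and none of this is Fabric's own code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for event batching; not part of the Fabric API. */
final class EventSetBatcher {

    /** Splits events into consecutive batches holding at most batchSize events each. */
    static <T> List<List<T>> toEventSets(List<T> events, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> eventSets = new ArrayList<>();
        for (int start = 0; start < events.size(); start += batchSize) {
            int end = Math.min(start + batchSize, events.size());
            // Copy the slice so each event set is independent of the input list.
            eventSets.add(new ArrayList<>(events.subList(start, end)));
        }
        return eventSets;
    }

    public static void main(String[] args) {
        List<String> events = List.of("e1", "e2", "e3", "e4", "e5");
        // prints [[e1, e2], [e3, e4], [e5]]
        System.out.println(toEventSets(events, 2));
    }
}
```

Passing whole batches between components, rather than single events, is what lets a framework amortise per-message overhead.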

Fabric Terminologies continued...

● Source
  ○ Sources event sets into the computation
  ○ Manages the QoS of the events ingested into the computation
● Processor
  ○ Performs computation on an incoming event set
  ○ Emits an outgoing event set
  ○ Types:
    ■ Streaming Processor: triggered whenever an event set is sent to the processor.
    ■ Scheduled Processor: triggered periodically, each time a fixed period of time elapses.
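The difference between the two trigger modes can be sketched without the framework. In this hypothetical model, `StreamingCounter` emits a result for every incoming event set, while `ScheduledCounter` only accumulates and emits when `onTick()` is called. The explicit `onTick()` call stands in for a timer firing and is an assumption made to keep the example deterministic. All names here are illustrative, not Fabric classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the two trigger modes; not Fabric's processor classes. */
final class TriggerModes {

    /** Emits one result per incoming event set (streaming-style trigger). */
    static final class StreamingCounter {
        final List<Integer> emitted = new ArrayList<>();

        void onEventSet(List<String> eventSet) {
            emitted.add(eventSet.size());
        }
    }

    /** Buffers counts and emits a total only when a tick fires (scheduled-style trigger). */
    static final class ScheduledCounter {
        final List<Integer> emitted = new ArrayList<>();
        private int pending;

        void onEventSet(List<String> eventSet) {
            pending += eventSet.size();
        }

        void onTick() {
            emitted.add(pending);
            pending = 0;
        }
    }

    public static void main(String[] args) {
        StreamingCounter streaming = new StreamingCounter();
        ScheduledCounter scheduled = new ScheduledCounter();

        List<List<String>> incoming = List.of(
            List.of("a", "b"), List.of("c"), List.of("d", "e", "f"));

        for (int i = 0; i < incoming.size(); i++) {
            streaming.onEventSet(incoming.get(i));
            scheduled.onEventSet(incoming.get(i));
            if (i == 1) {
                scheduled.onTick(); // simulated timer firing after two event sets
            }
        }
        scheduled.onTick(); // simulated timer firing at the end

        System.out.println(streaming.emitted); // prints [2, 1, 3]
        System.out.println(scheduled.emitted); // prints [3, 3]
    }
}
```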

Management And Deployment

Fabric Manager

● Dropwizard web service that runs inside a Docker container
● Provides APIs to register components (sources and processors)
● Provides APIs to perform CRUD on computations
● Management APIs to deploy, scale, get and delete computations
● The Application resource exposes APIs for deployment-related operations on computations
● Deployment environment: Marathon and Mesos

Sample Resources

● Components, e.g. POST /v1/components
  ○ Other APIs: get, search, register, etc.
● Computation, e.g. POST /v1/computations/{tenant}
  ○ Other APIs: get, search, update, deactivate, etc.
● Application, e.g. POST /v1/applications/{tenant}/{computation_name}
  ○ Other APIs: get, scale, suspend, etc.
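As a rough sketch, a client could build the Application request with the JDK's own HTTP client classes (Java 11+). Only the path `/v1/applications/{tenant}/{computation_name}` comes from the slide. The base URL, the tenant name, the empty request body and the helper name `deployRequest` are all assumptions for illustration; the real API may expect a payload or headers not shown here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper; only the URL path is taken from the slide. */
final class FabricManagerRequests {

    /**
     * Builds (but does not send) a POST to /v1/applications/{tenant}/{computation_name}.
     * Assumes tenant and computation names are already URL-safe.
     */
    static HttpRequest deployRequest(String baseUrl, String tenant, String computationName) {
        URI uri = URI.create(baseUrl + "/v1/applications/" + tenant + "/" + computationName);
        return HttpRequest.newBuilder(uri)
            // Empty body is an assumption; the actual API may require a payload.
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        // "http://localhost:8080" and "my-tenant" are placeholder values.
        HttpRequest request =
            deployRequest("http://localhost:8080", "my-tenant", "word-count-print-topology");
        System.out.println(request.method() + " " + request.uri());
    }
}
```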

Fabric Sample

Fabric Components

Create Components using maven archetype

Maven archetype command -

mvn archetype:generate \
  -DarchetypeGroupId=com.olacabs.fabric \
  -DarchetypeArtifactId=fabric-processor-archetype \
  -DarchetypeVersion=1.0.0-SNAPSHOT \
  -DartifactId=<artifact_id_of_your_project> \
  -DgroupId=<group_id_of_your_project> \
  -DinteractiveMode=true

Example -

mvn archetype:generate \
  -DarchetypeGroupId=com.olacabs.fabric \
  -DarchetypeArtifactId=fabric-processor-archetype \
  -DarchetypeVersion=1.0.0-SNAPSHOT \
  -DartifactId=fabric-my-processor \
  -DgroupId=com.olacabs.fabric \
  -DinteractiveMode=true

What it does -

Creates the POM project for the processor, with up-to-date versions of the compute and other related JARs.

Creates boilerplate code, with examples, for scheduled and streaming processors. You can modify the example Java files as needed.

Sample Fabric Source

/**
 * A sample Source implementation which generates
 * random sentences.
 */
@Source(
    namespace = "global",
    name = "random-sentence-source",
    version = "0.1",
    description = "Sample source",
    cpu = 0.1,
    memory = 64,
    requiredProperties = {},
    optionalProperties = {"randomGeneratorSeed"})
public class RandomSentenceSource implements PipelineSource {

    Random random;
    String[] sentences = {
        "A quick brown fox jumped over the lazy dog",
        "Life is what happens to you when you are busy making other plans"
        // . . . . . .
    };

    @Override
    public void initialize(final String instanceName,
                           final Properties global,
                           final Properties local,
                           final ProcessingContext processingContext,
                           final ComponentMetadata componentMetadata) throws Exception {
        // Read the optional seed property, falling back to 42.
        int seed = ComponentPropertyReader.readInteger(
            local, global, "randomGeneratorSeed", instanceName, componentMetadata, 42);
        random = new Random(seed);
    }

Sample Fabric Source continued...

    @Override
    public RawEventBundle getNewEvents() {
        // Emit a bundle of five randomly chosen, lower-cased sentences.
        return RawEventBundle.builder()
            .events(getSentences(5).stream()
                .map(sentence -> Event.builder()
                    .id(random.nextInt())
                    .data(sentence.toLowerCase())
                    .build())
                .collect(Collectors.toCollection(ArrayList::new)))
            .meta(Collections.emptyMap())
            .partitionId(Integer.MAX_VALUE)
            .transactionId(Integer.MAX_VALUE)
            .build();
    }

    private List<String> getSentences(int n) {
        List<String> listOfSentences = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            listOfSentences.add(sentences[random.nextInt(sentences.length)]);
        }
        return listOfSentences;
    }

}

Sample Fabric Processor

/**
 * A sample Processor implementation which
 * gets the data (sentences) and splits it based on a delimiter.
 */
@Processor(
    namespace = "global",
    name = "splitter-processor",
    version = "0.1",
    cpu = 0.1,
    memory = 32,
    description = "A processor that splits sentences by a given delimiter",
    processorType = ProcessorType.EVENT_DRIVEN,
    requiredProperties = {},
    optionalProperties = {"delimiter"})
public class SplitterProcessor extends StreamingProcessor {

    private String delimiter;

    @Override
    public void initialize(final String instanceName,
                           final Properties global,
                           final Properties local,
                           final ComponentMetadata componentMetadata) throws InitializationException {
        // Read the optional delimiter property, falling back to ",".
        delimiter = ComponentPropertyReader.readString(
            local, global, "delimiter", instanceName, componentMetadata, ",");
    }

Sample Fabric Processor continued...

    @Override
    protected EventSet consume(final ProcessingContext processingContext,
                               final EventSet eventSet) throws ProcessingException {
        List<Event> events = new ArrayList<>();
        eventSet.getEvents().stream()
            .forEach(event -> {
                String sentence = (String) event.getData();
                // Note: String.split treats the delimiter as a regular expression.
                String[] words = sentence.split(delimiter);
                events.add(Event.builder()
                    .data(words)
                    .id(Integer.MAX_VALUE)
                    .properties(Collections.emptyMap())
                    .build());
            });

        return EventSet.eventFromEventBuilder()
            .partitionId(eventSet.getPartitionId())
            .events(events)
            .build();
    }

    @Override
    public void destroy() {
        // do some cleanup if necessary
    }

}

Sample Computation / Topology

A sample topology -

● Select a random sentence from an in-memory list
● Split the sentence based on a delimiter
● Count the words
● Print the counts on the console
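The four steps can be traced end to end in plain Java, without any Fabric classes. This is a sketch of the data flow only: `WordCountSketch` and its methods are hypothetical names, the delimiter is assumed to be a single space, and `Pattern.quote` is used so the delimiter is matched literally rather than as a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Plain-Java walk-through of the sample topology; not built on Fabric components. */
final class WordCountSketch {

    static final String[] SENTENCES = {
        "A quick brown fox jumped over the lazy dog",
        "Life is what happens to you when you are busy making other plans"
    };

    /** Step 1: select a random sentence from the in-memory list. */
    static String pickSentence(Random random) {
        return SENTENCES[random.nextInt(SENTENCES.length)].toLowerCase();
    }

    /** Step 2: split the sentence on a literal delimiter. */
    static List<String> split(String sentence, String delimiter) {
        return Arrays.asList(sentence.split(Pattern.quote(delimiter)));
    }

    /** Step 3: count occurrences of each word (sorted by word for stable output). */
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        String sentence = pickSentence(random);
        // Step 4: print the counts on the console.
        System.out.println(count(split(sentence, " ")));
    }
}
```

In a topology, each of these methods would live in its own component, and the hand-off between them would be an event set rather than a method call.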

Sample Computation / Topology Spec continued...

{
  "name": "word-count-print-topology",
  "sources": [
    {
      "id": "random-sentence-source",
      "meta": { // … meta for source
      },
      "properties": { // .. properties for source
      }
    }
  ],
  "processors": [
    {
      "id": "splitter-processor",
      "meta": { // … meta for processor
      },
      "properties": { // .. properties for processor
      }
    },
    {
      "id": "word-count-processor",
      "meta": { // … meta for processor
      },
      "properties": { // .. properties for processor
      }
    },
    {
      "id": "console-printer-processor",
      "meta": { // … meta for processor
      },
      "properties": { // .. properties for processor
      }
    }
  ],
  "connections": [
    {
      "fromType": "SOURCE",
      "from": "random-sentence-source",
      "to": "splitter-processor"
    },
    {
      "fromType": "PROCESSOR",
      "from": "splitter-processor",
      "to": "word-count-processor"
    },
    {
      "fromType": "PROCESSOR",
      "from": "word-count-processor",
      "to": "console-printer-processor"
    }
  ],
  "properties": { // … global properties
  }
}
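Because connections refer to components by id, a mistyped id silently breaks the pipeline wiring, for example a connection pointing at `console-print-processor` when the declared id is `console-printer-processor`. A small pre-submission check can catch this. The sketch below is hypothetical: `ConnectionValidator` and `Connection` are illustrative names, and whether Fabric performs a similar check itself is not stated in the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-submission check for a topology spec; not part of Fabric. */
final class ConnectionValidator {

    /** One entry of the "connections" array, reduced to its two endpoints. */
    static final class Connection {
        final String from;
        final String to;

        Connection(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Returns every endpoint id that is not among the declared source/processor ids. */
    static List<String> unknownIds(Set<String> declaredIds, List<Connection> connections) {
        List<String> unknown = new ArrayList<>();
        for (Connection connection : connections) {
            if (!declaredIds.contains(connection.from)) {
                unknown.add(connection.from);
            }
            if (!declaredIds.contains(connection.to)) {
                unknown.add(connection.to);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of(
            "random-sentence-source", "splitter-processor",
            "word-count-processor", "console-printer-processor");
        List<Connection> connections = List.of(
            new Connection("random-sentence-source", "splitter-processor"),
            new Connection("splitter-processor", "word-count-processor"),
            new Connection("word-count-processor", "console-print-processor"));
        // prints [console-print-processor]
        System.out.println(unknownIds(declared, connections));
    }
}
```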

Steps for Action

Fabric Implementation at Ola

Fabric At Ola

Fabric At Ola continued...

Artifact Registration View

Fabric At Ola continued...

Topology Creation View

Fabric At Ola continued...

Created Topology View

Fabric At Ola continued...

One click deployment

Fabric At Ola continued...

Marathon App

Fabric Numbers

Fabric At Ola Stats

Ola currently receives ~2.5 million events per second from its end users (driver and customer apps) as well as internally generated events. Multiple real-time use cases stem from these events, including:

● Fraud detection and prevention
● Just-in-time notifications
● Security alerts
● Real-time reporting
● Generating user-specific offers

Fabric has been in production at Ola for 10 months now, powering these applications in addition to acting as a raw event ingestion and pub-sub system.

Fabric At Ola Stats continued...

Key Stats -

● Event Streams Handled : 375+

● No of topologies live : 160+

● Ingestion rate : ~2.5 million per second on 10 nodes

● Node Config : c4.8xlarge machines
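For scale, the stated ingestion rate can be broken down with simple arithmetic. The sketch below uses only the two figures from the slide (about 2.5 million events per second, 10 nodes) and assumes, for illustration, that load is spread evenly across nodes and sustained for a full day. `ThroughputMath` is a hypothetical helper name.

```java
/** Back-of-the-envelope arithmetic on the figures quoted in the slide. */
final class ThroughputMath {

    static final long SECONDS_PER_DAY = 86_400L;

    /** Events per second handled by each node, assuming an even spread. */
    static long perNode(long eventsPerSecond, int nodes) {
        return eventsPerSecond / nodes;
    }

    /** Events per day, assuming the rate is sustained around the clock. */
    static long perDay(long eventsPerSecond) {
        return eventsPerSecond * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long rate = 2_500_000L; // ~2.5 million events per second (from the slide)
        int nodes = 10;         // from the slide

        System.out.println(perNode(rate, nodes)); // prints 250000
        System.out.println(perDay(rate));         // prints 216000000000
    }
}
```

That is roughly 250,000 events per second per node, or about 216 billion events per day if the peak rate were sustained.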

Fabric Summary Points

1. Developed in Java.
2. Highly scalable, with guaranteed availability.
3. Reliable: framework-level guarantees against message loss, support for replay, multiple sources and complex tuple trees.
4. Event batching is supported at the core level.
5. Source-level event partitioning is used as the unit of scalability.
6. Uses capabilities provided by docker to ensure strong application
7. On-the-fly topology creation and deployment by dynamically assembling topologies using components directly from artifactory.
8. Inbuilt support for custom metrics and custom code-level healthchecks to catch application failures right when they happen.
9. Easy development and deployment.
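The reliability point (guarantees against message loss, with replay) generally rests on acknowledgements: a source remembers each emitted event set until the pipeline confirms it, and anything unconfirmed can be sent again. The sketch below shows that bookkeeping in its simplest form. It is an assumption-laden illustration, not Fabric's mechanism: `AckTracker`, its methods and the use of a sequential transaction id are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical acknowledgement bookkeeping; not Fabric's actual implementation. */
final class AckTracker {

    // Event sets emitted but not yet acknowledged, kept in emit order.
    private final Map<Long, List<String>> inFlight = new LinkedHashMap<>();
    private long nextTransactionId = 0;

    /** Registers an emitted event set and returns its transaction id. */
    long emit(List<String> eventSet) {
        long transactionId = nextTransactionId++;
        inFlight.put(transactionId, new ArrayList<>(eventSet));
        return transactionId;
    }

    /** Marks an event set as fully processed so it is never replayed. */
    void ack(long transactionId) {
        inFlight.remove(transactionId);
    }

    /** Returns the event sets still awaiting acknowledgement, in emit order. */
    List<List<String>> pendingReplay() {
        return new ArrayList<>(inFlight.values());
    }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker();
        long first = tracker.emit(List.of("a", "b"));
        tracker.emit(List.of("c"));
        tracker.ack(first);
        // prints [[c]]
        System.out.println(tracker.pendingReplay());
    }
}
```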

And many more...

Links

Fabric was recently open-sourced on GitHub.

● GitHub link: https://github.com/olacabs/fabric

● Documentation: https://github.com/olacabs/fabric/blob/develop/README.md

Please Contribute…!

Thank You!

Shashank Gautam Sathish Kumar KS