Better Stream Processing with Python

Taking the Hipster out of Streaming

Andreas Heider, Robert Wall

12.07.2017, EuroPython

Who are we?

• Developers at Winton

• Winton is a global investment management and data science company, founded in 1997

• We believe the scientific method can be profitably applied to the field of investing

What do we mean by Stream processing?

[Figure: two panels, "Batch" and "Stream"]

Example: Real Time Financial Market Data

Time      Symbol  Price  Qty
10:15:01  AAPL    $144   10
10:15:02  GOOG    $940   5
10:15:03  AAPL    $145   11

[Figure: an exchange emitting a stream of trades: 10:15:01 AAPL 10@$144, then 10:15:02 GOOG 5@$940]

Stream processing: Binning

Time      Symbol  Price  Qty
10:15:01  AAPL    $144   10
10:15:02  GOOG    $940   5
10:15:03  AAPL    $145   11

Binning Process

Time   Symbol  Avg. Price  Volume
10:15  AAPL    $144.5      1300
10:15  GOOG    $943        1250
10:16  AAPL    $145.3      1450
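The binning step can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the talk: the function name `bin_trades`, the tuple layout, and the choice of a simple (unweighted) mean for the average price are all assumptions. The state is updated one event at a time, so the same loop would work on an unbounded stream. Note that the volumes on the slide come from many more trades than the three rows shown, so the numbers below differ.

```python
from collections import defaultdict
from datetime import datetime


def bin_trades(trades):
    """Aggregate (time, symbol, price, qty) trades into one-minute bins.

    Hypothetical helper: keeps a running sum per (minute, symbol) key and
    updates it for every incoming trade.
    """
    state = defaultdict(lambda: {"price_sum": 0.0, "count": 0, "volume": 0})
    for time_str, symbol, price, qty in trades:
        # Truncate the timestamp to the minute to get the bin key.
        minute = datetime.strptime(time_str, "%H:%M:%S").strftime("%H:%M")
        entry = state[(minute, symbol)]
        entry["price_sum"] += price
        entry["count"] += 1
        entry["volume"] += qty
    # Turn the running sums into the per-bin summary.
    return {
        key: {"avg_price": e["price_sum"] / e["count"], "volume": e["volume"]}
        for key, e in state.items()
    }


# The three example trades from the slide.
trades = [
    ("10:15:01", "AAPL", 144.0, 10),
    ("10:15:02", "GOOG", 940.0, 5),
    ("10:15:03", "AAPL", 145.0, 11),
]
binned = bin_trades(trades)
for (minute, symbol), stats in sorted(binned.items()):
    print(minute, symbol, stats["avg_price"], stats["volume"])
```

A volume-weighted average would be an equally reasonable choice; the slide does not say which one is meant.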

Streaming Data at Winton

[Figure: event streams at Winton. Diagram labels: Event Streams, Market Data, Alternative Data, Internal/Business Events, Monitoring, Databases, Risk Management, Investment Management, Analytics, Transformations, Research]

Apache Kafka

[Figure: a producer writes to a topic split into Partition 1, Partition 2 and Partition 3; a consumer reads from it]
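How records end up in partitions can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Kafka itself: the `Topic` class and `choose_partition` function are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records. The point it shows is that records with the same key land in the same partition and keep their relative order there.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in for Kafka's murmur2 hash."""
    return zlib.crc32(key) % num_partitions


class Topic:
    """Toy in-memory topic: a list of append-only partitions (illustration only)."""

    def __init__(self, num_partitions: int = 3):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: bytes, value: str):
        """Append a record and return its (partition, offset)."""
        partition = choose_partition(key, len(self.partitions))
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset


topic = Topic(num_partitions=3)
first = topic.produce(b"AAPL", "10@$144")
second = topic.produce(b"GOOG", "5@$940")
third = topic.produce(b"AAPL", "11@$145")
print(first, second, third)
```

Ordering is only guaranteed within a partition, which is why keying trades by symbol keeps each symbol's trades in sequence.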

Sprawl of Stream Processing systems

Kafka Streams

• Simple library, not a framework

• Event-at-a-time stream processing

• Stateful processing, joins and aggregations

• Distributed processing and fault tolerance

• Part of main Apache Kafka project

• Java only so far :(
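To make "event at a time", "stateful", "joins" and "aggregations" concrete, here is a minimal plain-Python sketch. It does not use the Kafka Streams API: the class `EnrichingProcessor`, its method names, and the reference-data stream carrying a currency per symbol are all made up for illustration. In Kafka Streams the two dictionaries would be fault-tolerant state stores rather than local memory.

```python
class EnrichingProcessor:
    """Hypothetical processor: handles one event at a time and keeps local state."""

    def __init__(self):
        self.currency_by_symbol = {}  # table side of the join (latest value per key)
        self.notional_by_symbol = {}  # aggregation state (running total per key)

    def on_reference(self, symbol, currency):
        """Update the lookup table from a (hypothetical) reference-data stream."""
        self.currency_by_symbol[symbol] = currency

    def on_trade(self, symbol, price, qty):
        """Join the trade against the table and update the running notional."""
        currency = self.currency_by_symbol.get(symbol)  # None if no reference yet
        total = self.notional_by_symbol.get(symbol, 0.0) + price * qty
        self.notional_by_symbol[symbol] = total
        return {"symbol": symbol, "currency": currency, "notional_total": total}


processor = EnrichingProcessor()
processor.on_reference("AAPL", "USD")
out1 = processor.on_trade("AAPL", 144.0, 10)
out2 = processor.on_trade("AAPL", 145.0, 11)
print(out1)
print(out2)
```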

Python at Winton

Many users, with different skill sets:

• Developers

• Researchers

• Operations

• …

Talking to Kafka using kafka-python

Hipster Stream Processing
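A script of the kind this slide refers to might look like the sketch below. It is not the code shown in the talk. The topic name `trades`, the broker address `localhost:9092`, the group id, and the JSON message format are all assumptions; `KafkaConsumer`, its keyword arguments, and `commit()` are part of kafka-python.

```python
import json


def decode_trade(raw: bytes) -> dict:
    """Deserialize one message value, assuming trades are sent as UTF-8 JSON."""
    return json.loads(raw.decode("utf-8"))


def consume_trades(topic="trades", servers="localhost:9092", group_id="demo-group"):
    """Print every trade on the topic; needs kafka-python and a running broker."""
    # Imported here so decode_trade can be used without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        group_id=group_id,
        value_deserializer=decode_trade,
        auto_offset_reset="earliest",
        enable_auto_commit=False,
    )
    for record in consumer:
        print(record.partition, record.offset, record.value)
        consumer.commit()  # commit only after the record has been handled


# Usage, with a broker running: consume_trades()
sample = decode_trade(b'{"symbol": "AAPL", "price": 144, "qty": 10}')
print(sample)
```

Committing after every record is simple but slow; it is used here only to keep the example short.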

Python Kafka Clients

https://github.com/dpkp/kafka-python

• Pure Python implementation

• Friendly, Pythonic interface

https://github.com/confluentinc/confluent-kafka-python

• Wrapper around C library

• Amazingly high performance and robustness

Experiences using low-level client

• What starts out as a 10-line script ends up as yet another homegrown streaming framework

• The devil is in the details:
  • Guaranteeing at-least-once (or even exactly-once) processing
  • Handling stateful processing
  • Distributing load over various machines
  • Microbatching
  • Handling rebalances nicely
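The first of these details, at-least-once processing, comes down to committing an offset only after the record has been handled. The simulation below illustrates that idea without a broker. `InMemoryLog` and `run_consumer` are hypothetical stand-ins for a single partition and a consumer loop, and the crash is simulated with an exception.

```python
class InMemoryLog:
    """Stand-in for one partition plus its committed offset (illustration only)."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # next offset to read after a restart


def run_consumer(log, handle, crash_after=None):
    """Process from the committed offset; commit only after handling succeeds."""
    handled = 0
    for offset in range(log.committed, len(log.records)):
        handle(log.records[offset])
        handled += 1
        if crash_after is not None and handled == crash_after:
            raise RuntimeError("simulated crash before commit")
        log.committed = offset + 1  # commit after processing: at-least-once


log = InMemoryLog(["a", "b", "c"])
seen = []
try:
    run_consumer(log, seen.append, crash_after=2)  # dies after handling "b"
except RuntimeError:
    pass
run_consumer(log, seen.append)  # restart resumes from the last commit
print(seen)  # "b" is processed twice, nothing is lost
```

Committing before handling would avoid the duplicate but could lose "b" instead (at-most-once), which is why downstream processing usually has to tolerate repeats.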

Kafka Streams for Python

https://github.com/wintoncode/winton-kafka-streams

Demo

Goals / Roadmap

1. Clean implementation of Kafka’s core streams API in Python

2. Experiment with more Pythonic API/DSL

3. Optimise performance via batching/numpy/Arrow

4. Implement more advanced features of Kafka’s streams API (exactly once, …)

Get in touch!

• Project on GitHub: https://github.com/wintoncode/winton-kafka-streams

• Roadmap: https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md

• Announcement on kafka-dev

• Come to our stand and talk to us

• Thanks to Confluent

Questions?

Backup

Some words of experience

• Not everything fits the streaming model

• Manually changing data is tricky
  • Be careful what you put in, have recovery method

• Stable deployment can be challenging
  • Especially Zookeeper and buggy clients

• Set up monitoring from the start
  • We use Prometheus and Grafana
  • https://github.com/yahoo/kafka-manager
