
Uploaded by sergii-khomenko on 24-Jan-2018


Page 1: Building data pipelines: from simple to more advanced - hands-on experience / CrunchConf - Oct 29, 2015

Building data pipelines

from simple to more advanced - hands-on

Sergii Khomenko, Data Scientist, [email protected], @lc0d3r

CrunchConf - October 29, 2015

Sergii Khomenko

Data scientist at one of the biggest fashion communities, Stylight.

Data analysis and visualisation hobbyist, working on problems not only during working hours but also in free time, for fun and for personal data visualisations.

Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London, Berlin Buzzwords 2015, Tableau Conference on Tour, Budapest BI Forum 2015

Profitable Leads: Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel.

Inspiration: Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.

Branding & Reach: Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.

Shopping: Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.

Stylight – Make Style Happen

Core Target Group: Stylight helps aspiring women between 18 and 35 evolve their style through shoppable inspiration.


Stylight – acting on a global scale

Experienced & Ambitious Team

An innovative cross-functional organisation with a flat hierarchy builds a unique team spirit.

• 200+ employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits

Agenda

The Good, The Bad And The Legacy

Open Source stack

Amazon AWS

Google Cloud

Tips, tricks and best practices

In computing, a pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one.
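The definition above maps directly onto function composition. The following is a minimal Python sketch, not code from the talk: the stage functions `parse`, `only_clicks` and `count` and the comma-separated input format are hypothetical, chosen only to show each element's output becoming the next element's input.

```python
from __future__ import annotations

from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def pipeline(*stages: Stage) -> Stage:
    """Connect stages in series: each stage receives the previous stage's output."""
    def run(data: Any) -> Any:
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# Hypothetical stages, for illustration only.
def parse(lines: Iterable[str]) -> list[dict]:
    # "user,event" lines -> {"user": ..., "event": ...}
    return [dict(zip(("user", "event"), line.split(","))) for line in lines]


def only_clicks(events: list[dict]) -> list[dict]:
    return [e for e in events if e["event"] == "click"]


def count(events: list[dict]) -> int:
    return len(events)


clicks = pipeline(parse, only_clicks, count)
print(clicks(["u1,click", "u2,view", "u3,click"]))  # prints 2
```

Because every stage has the same shape (one input, one output), stages can be reordered, tested in isolation or swapped for a different implementation without touching their neighbours.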

The Good, The Bad And The Legacy

Sources of data:

• Web tracking
• Metrics tracking
• Behaviour tracking
• Business intelligence ETL
• Internal Services
• ML tagging service

Access patterns

• Real-time
• Nearly real-time
• Daily batches
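The three access patterns differ mainly in how many events are processed together. A minimal sketch, assuming events are hypothetical `(unix_timestamp, payload)` pairs: real-time handles each event on its own, nearly real-time groups events into short fixed windows, and daily batches group them by UTC calendar day.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (unix_timestamp_seconds, payload).
Event = Tuple[int, str]


def real_time(events: List[Event], handle: Callable[[Event], None]) -> None:
    """Real-time: process each event individually as it arrives."""
    for event in events:
        handle(event)


def micro_batches(events: List[Event], window_seconds: int) -> Dict[int, List[Event]]:
    """Nearly real-time: group events into short fixed windows keyed by window start."""
    windows: Dict[int, List[Event]] = defaultdict(list)
    for ts, payload in events:
        windows[ts - ts % window_seconds].append((ts, payload))
    return dict(windows)


def daily_batches(events: List[Event]) -> Dict[str, List[Event]]:
    """Daily batches: group events by UTC calendar day for a once-a-day job."""
    days: Dict[str, List[Event]] = defaultdict(list)
    for ts, payload in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        days[day].append((ts, payload))
    return dict(days)


events = [(0, "a"), (30, "b"), (90, "c"), (86_400, "d")]
print(micro_batches(events, 60))  # {0: [(0, 'a'), (30, 'b')], 60: [(90, 'c')], 86400: [(86400, 'd')]}
print(daily_batches(events))      # {'1970-01-01': [(0, 'a'), (30, 'b'), (90, 'c')], '1970-01-02': [(86400, 'd')]}
```

The trade-off is latency versus efficiency: smaller groups give fresher results, larger groups amortise per-run overhead.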


Properties

• Data consistency
• Doesn’t scale
• Hard to add new sources
• Complex system
• Many interfaces

• As lean and legacy as possible
• No need for special services


Streaming

Open Source Stack

http://lambda-architecture.net/
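The lambda architecture described at the linked site splits processing into a batch layer (recomputes views over all historical data), a speed layer (incrementally covers recent data the batch view has not absorbed yet) and a serving layer that merges the two at query time. A toy sketch in Python using counts; the class and function names are hypothetical and it stands in for no particular framework.

```python
from collections import Counter
from typing import List


def batch_view(master_dataset: List[str]) -> Counter:
    """Batch layer: recompute the view from scratch over all historical events."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.view[key] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.view.clear()


def query(key: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the complete-but-stale batch view with the real-time view."""
    return batch[key] + speed.view[key]


batch = batch_view(["shoes", "dress", "shoes"])
speed = SpeedLayer()
speed.on_event("shoes")
print(query("shoes", batch, speed))  # prints 3
```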

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
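To make the commit-log idea concrete, here is a toy, single-partition, in-memory log. It is not Kafka's client API: `CommitLog`, `append` and `poll` are hypothetical names. It shows the core difference from a classic queue: publishing appends to a retained log, and each subscriber group only tracks its own offset, so several consumers can read the same records independently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy single-partition commit log (no retention, replication or persistence)."""

    records: list[bytes] = field(default_factory=list)
    # Next offset to read, per consumer group.
    offsets: dict[str, int] = field(default_factory=dict)

    def append(self, record: bytes) -> int:
        """Publish: append to the end of the log and return the record's offset."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, group: str, max_records: int = 10) -> list[bytes]:
        """Subscribe: read from the group's own position; the log is never modified."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


log = CommitLog()
log.append(b"view")
log.append(b"click")
print(log.poll("bi"))     # [b'view', b'click']
print(log.poll("ml", 1))  # [b'view']  (independent position)
print(log.poll("bi"))     # []  (bi has caught up)
```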

http://www.ipponusa.com/wp-content/uploads/2014/10/spark-architecture.jpg

Results

• Scalable
• Flexible

• High costs of maintenance
• Not so easy to set up

A programming language is low level when its programs require attention to the irrelevant.

Alan Jay Perlis / Epigrams on Programming

Amazon AWS

Kinesis Streams
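In Kinesis Streams every record carries a partition key. The service hashes it with MD5 to a 128-bit integer and routes the record to the shard whose hash-key range contains that value, so records sharing a key always land on the same shard and keep their relative order. The sketch below reproduces that routing locally under two assumptions: the key is hashed as UTF-8 bytes, and the shards split the key space into equal contiguous ranges (a stream that has been resharded will have different ranges). `hash_key` and `shard_for` are hypothetical helpers, not AWS SDK calls.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer with MD5 (UTF-8 encoding assumed)."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the owning shard, ASSUMING equal contiguous hash-key ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be >= 1")
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE


for user in ["user-1", "user-2", "user-1"]:
    print(user, shard_for(user, 4))  # "user-1" maps to the same shard both times
```

Choosing a high-cardinality partition key (for example a user or session id) spreads load across shards; a constant key would send everything to one shard.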


Diagram: website events, enrichment, Business Intelligence, business development & finance

Kinesis Firehose

Kinesis Analytics
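Kinesis Firehose delivers records to its destination in batches, flushing its buffer when either a size limit or a time interval is reached, whichever comes first. The toy class below mimics that size-or-time rule locally; it is not the Firehose API, and `Buffer`, its thresholds and the `deliver` callback are hypothetical. An injectable clock keeps it testable.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class Buffer:
    """Toy size-or-time buffer in the spirit of Firehose buffering hints."""

    def __init__(self, deliver: Callable[[list[bytes]], None], max_bytes: int,
                 max_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.deliver = deliver
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self.records: list[bytes] = []
        self.size = 0
        self.opened_at: Optional[float] = None

    def put(self, record: bytes) -> None:
        if self.opened_at is None:
            self.opened_at = self.clock()  # buffer age starts with its first record
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Deliver when either threshold is reached. A real service would also
        call this from a timer so that a quiet stream still gets flushed."""
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.deliver(self.records)
            self.records, self.size, self.opened_at = [], 0, None
```

Usage: `Buffer(print, max_bytes=5 * 1024 * 1024, max_seconds=300)` would batch until roughly 5 MB have accumulated or 300 seconds have passed since the first buffered record.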


Diagram: product events with a variety of event types and structures, a custom unification pipeline, Product Processing, Business Intelligence, ML/Tagging
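A unification step typically maps each producer's event shape onto one common schema so that downstream consumers only need to understand a single format. A minimal sketch, assuming two hypothetical raw shapes (`click` and `impression`) whose field names differ; the common schema `type` / `product_id` / `ts` is also an illustrative assumption, not the talk's actual format. Unknown or malformed events are set aside rather than dropped silently.

```python
from __future__ import annotations

from typing import Any, Callable


# Hypothetical raw shapes: each producer names its fields differently.
def from_click(raw: dict[str, Any]) -> dict[str, Any]:
    return {"type": "click", "product_id": str(raw["pid"]), "ts": int(raw["time"])}


def from_impression(raw: dict[str, Any]) -> dict[str, Any]:
    return {"type": "impression", "product_id": str(raw["product"]["id"]),
            "ts": int(raw["timestamp"])}


NORMALIZERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "click": from_click,
    "impression": from_impression,
}


def unify(raw_events: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Map every known event type onto the common schema; collect the rest."""
    unified, rejected = [], []
    for raw in raw_events:
        normalizer = NORMALIZERS.get(raw.get("kind"))
        if normalizer is None:
            rejected.append(raw)  # unknown event type
            continue
        try:
            unified.append(normalizer(raw))
        except (KeyError, TypeError, ValueError):
            rejected.append(raw)  # known type, but malformed
    return unified, rejected
```

Adding a new source then means registering one more normalizer instead of changing every consumer.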


AWS Data Pipeline

Google Cloud

Tips, tricks and best practices

Cross-Functional Team

Department: a mission-oriented team with all the resources it needs and the fewest dependencies

Product Team: builds the software the department or its customers use

Squad: the team that executes product development

Diagram: nested Department > Product Team > Squad; roles shown: PO, Engineer, Engineer, Designer, Data Scientist, Head of, Business Role, Business Role

Cross-Functional Team

• You build it, you run it

• You check your numbers (domain knowledge)

• You provide your data as an interface layer

• Data reporting comes after data tracking
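One way a squad could make "provide your data as an interface layer" concrete is to publish a small contract for its events and check events against it before they leave the team. A minimal sketch; the `ORDER_CONTRACT` fields and the `violations` helper are hypothetical, and real setups would more likely use a schema language than a hand-rolled dictionary.

```python
from __future__ import annotations

from typing import Any

# Hypothetical contract a squad might publish for its "order" events.
ORDER_CONTRACT: dict[str, type] = {
    "order_id": str,
    "shop_id": str,
    "amount_eur": float,
}


def violations(event: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable contract violations; an empty list means the event is valid."""
    problems = []
    for name, expected in contract.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(event[name]).__name__}"
            )
    return problems


print(violations({"order_id": "o1", "amount_eur": "9.5"}, ORDER_CONTRACT))
# ['missing field: shop_id', 'amount_eur: expected float, got str']
```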



I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun.

Alan Jay Perlis / The Structure and Interpretation of Computer Programs

Related talks

• Helping Data Teams with Puppet / Puppet Camp London

• Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift / Tableau Conference on Tour - Berlin

• Google Cloud Dataflow: Two Worlds Become a Much Better One