you sun jeong data analytics with druid°œ표... · data analytics with druid druid architecture...

30
DATA ANALYTICS WITH DRUID YOU SUN JEONG

Upload: others

Post on 24-May-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUIDYOU SUN JEONG

Page 2: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

WHO AM I ?

Senior Software Engineer of SK Telecom

Commercial Products

Big Data Discovery Solution (~’16)

Hadoop DW (~’15)

PaaS(CloudFoundry) (~’13)

Iaas (OpenStack) (~’13)

Mail to : [email protected]

2

Page 3: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

FOOTPRINTS

2014

2015 - Hadoop DW - Realtime NW Analytics

2016 - Big Data Discovery- Streaming Processing

3

Page 4: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

AGENDA

‣ History

‣ What is Druid?

‣ Druid Architecture

‣ Real-Time Ingestion Demo (15m)

‣ Cohort Analysis (15m)

4

Page 5: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

HISTORY

▸ Development started at Meta markets in 2011

▸ Apache V2 in early 2015

▸ 150+ contributors today

▸ https://github.com/druid-io

5

Page 6: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DATA LAKE

6

https://www.linkedin.com/pulse/more-analytics-than-just-fishing-data-lake-john-poppelaars

Page 7: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DW VS DATA LAKE

http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html

7

Page 8: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

WHAT IS DRUID

Distributed, In-memory Multi-dimensional

OLAP store

8

Page 9: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

PROBLEMStimestamp domain user gender clicked 2011-01-01T00:01:35Z bieber.com 4312345532 Female 1 2011-01-01T00:03:03Z bieber.com 3484920241 Female 0 2011-01-01T00:04:51Z ultra.com 9530174728 Male 1 2011-01-01T00:05:33Z ultra.com 4098310573 Male 1 2011-01-01T00:05:53Z ultra.com 5832057930 Female 0 2011-01-01T00:06:17Z ultra.com 5789283478 Female 1 2011-01-01T00:23:15Z bieber.com 4730093842 Female 0 2011-01-01T00:38:51Z ultra.com 3909846810 Male 1 2011-01-01T00:49:33Z bieber.com 4930097162 Female 1 2011-01-01T00:49:53Z ultra.com 0381837193 Female 0

timestamp impressions clicks 2011-01-01T00:00:00Z 10 6

timestamp domain user gender clicked 2011-01-01T00:01:35Z bieber.com 4312345532 Female 1 2011-01-01T00:03:03Z bieber.com 3484920241 Female 0 2011-01-01T00:04:51Z ultra.com 9530174728 Male 1 2011-01-01T00:05:33Z ultra.com 4098310573 Male 1 2011-01-01T00:05:53Z ultra.com 5832057930 Female 0 2011-01-01T00:06:17Z ultra.com 5789283478 Female 1 2011-01-01T00:23:15Z bieber.com 4730093842 Female 0 2011-01-01T00:38:51Z ultra.com 9530174728 Male 1 2011-01-01T00:49:33Z bieber.com 4930097162 Female 1 2011-01-01T00:49:53Z ultra.com 0381837193 Female 0

timestamp domain gender impressions clicks 2011-01-01T00:00:00Z bieber.com Female 4 2 2011-01-01T00:00:00Z ultra.com Female 3 1 2011-01-01T00:00:00Z ultra.com Male 3 2

9

Page 10: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

BIG DATA DISCOVERY

▸ Roll-up

▸ Summarizing over a dimension

▸ Drill-down

▸ Focusing (zooming in)

▸ Slicing and dicing

▸ Reducing dimensions (slice)

▸ Picking values of specific dimensions (dice)

▸ Pivoting

▸ Rotating multi-dimensional cube

10

Page 11: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

OLAP CUBE

▸ Slice and Dice

11

Page 12: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

IN-MEMORY

12

Page 13: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

COLUMNAR STORAGE

13

Page 14: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DRUID TERMS

▸ Data

▸ Timestamp

▸ Dimension

▸ Metric

▸ Datasource

▸ Segment

▸ Granularity

14

Page 15: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DRUID ARCHITECTURE

REALTIME

BROKER HISTORICAL

15

Page 16: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

ARCHITECTURE - BATCH INGESTION

HDFS

HISTORICAL NODE

HISTORICAL NODE

HISTORICAL NODE

BROKER NODE

Segments

Queries

16

Page 17: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

ARCHITECTURE - STREAMING INGESTION

REALTIME NODE

HISTORICAL NODE

HISTORICAL NODE

HISTORICAL NODE

BROKER NODE

Segments

QueriesStreaming

17

Page 18: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

ARCHITECTURE - LAMBDA

REALTIME NODE

HISTORICAL NODE

HISTORICAL NODE

HISTORICAL NODE

BROKER NODE

Segments

QueriesStreaming

HDFS

18

Page 19: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

GLUE ARCHITECTURE

REAL TIME TASK

HISTORICAL NODE

HISTORICAL NODE

HISTORICAL NODE

BROKER NODE

Segments

Queries

Streaming

STREAM PROCESSOR

(TRANQUILITY)

Kafka Indexing Service

19

Page 20: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

REAL WORLD ARCHITECTURE

DATA NODE #1

DATA NODE #N

OVERLORDMIDDLE MANAGE

#1

COORDINATOR

MYSQL

HA PROXY

MEMCACHED#2

BROKER NODE

#1

BROKER NODE

#1

MEMCACHED#3

MEMCACHED#1

HISTORICAL NODE #1

HISTORICAL NODE #N

MIDDLE MANAGE

#N

ZK1

ZK2

ZK3

20

Page 21: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DRUID MONITORING

21

http://www.slideshare.net/CharlesAllen9/programmatic-bidding-data-streams-druid

Page 22: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DRUID DATASOURCE

22

Page 23: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

RDRUID

DATA ANALYTICS WITH DRUID

https://github.com/druid-io/RDruid

23

Page 24: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

PYDROID

24

https://github.com/druid-io/pydruid

Page 25: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

DEMO

▸ Jupyter Notebook(PyDruid)

▸ Mobile App User Events for 1 week : 2 billion events

▸ Scenario : Unique users Cohort Analysis

25

Page 26: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DEMO

Page 27: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

MAY THE FORCE BE WITH YOU

27

Page 28: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

REFERENCES

▸ Druid: http://www.popit.kr/tag/druid/ (https://www.facebook.com/popitkr/): http://druid.io/

▸ Cohort Analysis : http://www.gregreda.com/2015/08/23/cohort-analysis-with-python/

▸ Druid Meetup@Seoul : http://www.meetup.com/Druid-Seoul/

28

Page 29: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

DATA ANALYTICS WITH DRUID

POPIT

29

https://www.facebook.com/popitkr/

Page 30: YOU SUN JEONG DATA ANALYTICS WITH DRUID°œ표... · data analytics with druid druid architecture realtime broker historical 15. data analytics with druid architecture - batch ingestion

Q&A

THANK YOU

DATA ANALYTICS WITH DRUID 30