Web analytics at scale with Druid at naver.com
Jason Heo (analytic.js.heo@navercorp.com), Doo Yong Kim (dooyong.kim@navercorp.com)


Post on 03-Jan-2021


Page 1: Web analytics at scale with Druid at naver...Web analytics at scale with Druid at naver.com Jason Heo(analytic.js.heo@navercorp.com) Doo Yong Kim (dooyong.kim@navercorp.com) •Part

Web analytics at scale with Druid at naver.com

Jason Heo (analytic.js.heo@navercorp.com), Doo Yong Kim (dooyong.kim@navercorp.com)

Page 2

Agenda

• Part 1
  • About naver.com
  • What is & Why Druid
  • The Architecture of our service
• Part 2
  • Druid Segment File Structure
  • Spark Druid Connector
  • TopN Query
  • Plywood & Split-Apply-Combine
  • How to fix TopN's unstable results
• Appendix

Page 3

About naver.com

• naver.com
  • The biggest website in South Korea
  • The Google of South Korea
  • 74.7% of all web searches in South Korea

https://en.wikipedia.org/wiki/Naver

Page 4

About Speakers

Jason Heo
• Developed analytics systems at Naver
• Working with databases since 2000
• Author of 3 MySQL books
• Currently working with Elasticsearch, Spark, Kudu, and Druid

Doo Yong Kim
• Working on a Spark- and Druid-based OLAP platform
• Implemented the search infrastructure at coupang.com
• Interested in MPP and advanced file formats for big data

Page 5

Platforms we've tested so far

Parquet, ORC, CarbonData, Elasticsearch, ClickHouse, Kudu, Druid, SparkSQL, Hive, Impala, Drill, Presto, Kylin, Phoenix

[Diagram axes: Query Engine, Storage Format]

Page 6

What is & Why Druid

• What is Druid?
• Our Requirements
• Why Druid?
• Experimental Results

Page 7

What is Druid? (from Hortonworks)

• Column-oriented distributed datastore
• Real-time streaming ingestion
• Scalable to petabytes of data
• Approximate algorithms (HyperLogLog, theta sketch)

https://www.slideshare.net/HadoopSummit/scalable-realtime-analytics-using-druid

Page 8

What is Druid? (from my point of view)

• Druid is like a more cumbersome Elasticsearch (without the search feature)
• Similar points
  • Secondary index
  • DSLs for queries
  • Flow of query processing
    • Terms Aggregation ↔ TopN Query, Coordinator ↔ Broker, Data Node ↔ Historical
• Different points
  • More complicated to operate
  • Better with much more data
  • Better for ultra-high cardinality
  • Less GC overhead
  • Better Spark connectivity (for full scans)

Page 9

What is Druid? - Architecture

[Diagram components]
• Clients send queries in Druid DSL to the Query Service (Broker)
• Real-time Node and Historical: serve segments for queries
• Index Service: Overlord and MiddleManager, ingesting from Kafka
• Coordinator: segment management
• MySQL: metadata
• ZooKeeper: cluster management
• Deep Storage (HDFS, S3): stores Druid segments for durability; Historicals download segments from it

Page 10

What is Druid? - Queries

Clients send queries to the Broker, which fans them out to the Real-time and Historical nodes.

groupBy query:

{
  "queryType": "groupBy",
  "dataSource": "sample_data",
  "dimensions": ["country", "device"],
  "filter": {},
  "aggregations": [...],
  "limitSpec": {...}
}

topN query:

{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "filter": {...},
  "aggregations": [...],
  "threshold": 5
}

• SQL (SELECT ... FROM dataSource) can be converted to Druid DSL
• No JOIN
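A "most viewed URLs" SQL query (GROUP BY one column, ORDER BY an aggregate DESC, LIMIT n) maps naturally onto a topN query. The sketch below assumes Druid's native JSON query format. `buildTopNQuery`, the interval, and the `views` output name are hypothetical. It also assumes that COUNT(*) over raw rows becomes a `longSum` over the `count` metric defined at ingestion time, because rollup has already pre-counted the rows.

```javascript
// Hypothetical helper (not part of Druid or the authors' code): builds a native
// topN query roughly equivalent to
//   SELECT <dimension>, SUM(<metricField>) AS <metricName> FROM <dataSource>
//   WHERE <time within interval>
//   GROUP BY <dimension> ORDER BY <metricName> DESC LIMIT <threshold>
function buildTopNQuery({ dataSource, dimension, metricField, metricName, threshold, interval }) {
  return {
    queryType: "topN",
    dataSource: dataSource,
    intervals: [interval],   // ISO-8601 interval, e.g. "2018-05-23/2018-05-24"
    granularity: "all",      // a single time bucket for the whole interval
    dimension: dimension,    // topN groups by exactly one dimension
    metric: metricName,      // aggregator output used for ranking (descending)
    threshold: threshold,    // N
    aggregations: [{ type: "longSum", name: metricName, fieldName: metricField }],
  };
}

// SELECT url, COUNT(*) FROM logs GROUP BY url ORDER BY COUNT(*) DESC LIMIT 10
const mostViewed = buildTopNQuery({
  dataSource: "logs",
  dimension: "url",
  metricField: "count",      // rolled-up row count from the ingestion spec
  metricName: "views",
  threshold: 10,
  interval: "2018-05-23/2018-05-24",
});
console.log(JSON.stringify(mostViewed, null, 2));
```

The `metric` field must name one of the aggregators. This requirement is why a topN query can rank by only one aggregate at a time.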

Page 11

Why Druid? - Requirements

1. Random Access (OLTP)
   SELECT COUNT(*) FROM logs WHERE url = ?;

2. Most Viewed
   SELECT url, COUNT(*) FROM logs GROUP BY url ORDER BY COUNT(*) DESC LIMIT 10;

3. Full Aggregation
   SELECT visitor, COUNT(*) FROM logs GROUP BY visitor;

4. JOIN
   SELECT ... FROM logs INNER JOIN users GROUP BY ... HAVING ...

Page 12

Why Druid?

Perfect solution for OLTP and OLAP

For OLTP
• Supports bitmap index
• Fast random access

For OLAP
• Supports TopN query
  • 100x faster than GroupBy query
• Supports complex queries
  • JOIN, HAVING, etc.
  • with our Spark Druid Connector

1. Random Access ★★★★☆
2. Most Viewed ★★★★★
3. Full Aggregation ★★★★☆
4. JOIN ★★★★☆

Page 13

Comparison – Elasticsearch

Pros
• Fast random access
• Terms aggregation
  • TopN query
• Easy to manage

Cons
• Slow full scan with es-hadoop
• Low performance for multi-field terms aggregation (esp. high cardinality)
• GC overhead

1. Random Access ★★★★★
2. Most Viewed ★★★☆☆
3. Full Aggregation ☆☆☆☆☆
4. JOIN ☆☆☆☆☆

Page 14

Comparison – Kudu + Impala

Pros
• Fast random access via primary key
• Fast OLAP with Impala

Cons
• No secondary index
• No TopN query

1. Random Access ★★★★★ (PK), ★☆☆☆☆ (non-PK)
2. Most Viewed ☆☆☆☆☆
3. Full Aggregation ★★★★★
4. JOIN ★★★★★

Page 15

Experimental Results – Response Time (seconds, lower is better)

Random Access: Elasticsearch 0.003, Kudu+Impala 0.14, Druid 0.03
Most Viewed (1 field / 2 fields): Elasticsearch 0.25 / 2.7, Kudu+Impala 0.35 / 2.9, Druid 0.08 / 0.78

Page 16

Experimental Results – Notes

Random Access
• ES: Lucene index
• Kudu+Impala: primary key
• Druid: bitmap index

Most Viewed
• ES: terms aggregation
• Kudu+Impala: GROUP BY
• Druid: TopN
• Split-Apply-Combine for multiple fields

Data Sets
• 210 million rows
• Same parallelism
• Same number of shards/partitions/segments

Page 17

The Architecture of our service

[Diagram, grouped by the areas labeled on the slide]
• Real-time Ingestion: logs, Kafka (transform logs), Kafka Indexing Service
• Batch Ingestion: a daily batch job removes duplicated logs (Parquet in, Parquet out) and ingests the result
• Druid: Coordinator, Overlord, Middle Manager, Peon, Broker, Historical (segment files)
• SparkSQL: Spark Thrift Server, Spark Executors reading the Historicals' segment files
• Clients: Zeppelin, API Server, Plywood (sends Druid DSL)

Page 18

Switching

Page 19

Introduction – Who am I?

1. Doo Yong Kim
2. Naver
3. Software engineer
4. Big data

Page 20

Contents

1. Druid Storage Model
2. Spark Druid Connector Implementation
3. TopN Query
4. Plywood & Split-Apply-Combine
5. Extending Druid Query

Page 21

Druid Storage Model – 4 characteristics

• Columnar format
• Explicit distinction between dimensions and metrics
• Bitmap index
• Dictionary encoding

Page 22

Druid Storage Model - background

Druid treats dimensions and metrics separately.

Dimension
• Bitmap index
• GROUP BY fields

Metric
• Argument of aggregate functions

Druid Ingestion Spec:

{
  "dimensionsSpec": {
    "dimensions": ["country", "device", ...]
  },
  ...
  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "doubleSum", "fieldName": "duration", "name": "duration" }
  ]
}

Page 23

Druid Storage Model - Dimension

Country (dimension), row by row: Korea, UK, Korea, Korea, Korea, UK

Dictionary for country: Korea ↔ 0, UK ↔ 1
Dictionary-encoded values: 0 1 0 0 0 1

Bitmaps:
  Korea → 101110
  UK → 010001 (UK appears in the 2nd and 6th rows)
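The three structures above can be derived from the raw column in a few lines. This is a minimal sketch that assumes a dictionary sorted by value. Bitmaps are shown as "0"/"1" strings only for readability; Druid stores them as compressed bitmaps (Concise or Roaring). `encodeDimension` is a hypothetical name.

```javascript
// Sketch of how a string dimension can be stored: a sorted dictionary,
// the dictionary-encoded row values, and one bitmap per distinct value.
function encodeDimension(values) {
  const dictionary = [...new Set(values)].sort();          // id = position in sorted order
  const idOf = new Map(dictionary.map((v, i) => [v, i]));
  const encoded = values.map((v) => idOf.get(v));           // one small integer per row
  const bitmaps = new Map(
    dictionary.map((v) => [v, values.map((x) => (x === v ? "1" : "0")).join("")])
  );
  return { dictionary, encoded, bitmaps };
}

const country = ["Korea", "UK", "Korea", "Korea", "Korea", "UK"];
const { dictionary, encoded, bitmaps } = encodeDimension(country);
console.log(dictionary);           // [ 'Korea', 'UK' ]
console.log(encoded.join(" "));    // 0 1 0 0 0 1
console.log(bitmaps.get("Korea")); // 101110
console.log(bitmaps.get("UK"));    // 010001
```

A filter such as `country = 'Korea'` then becomes a dictionary lookup followed by reading one bitmap. No row values are compared.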

Page 24

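Druid Storage Model - Metric

Metric values are stored as a plain column, one value per row, without a bitmap index.

Country (dimension)   duration (metric)
Korea                 13
UK                    2
Korea                 15
Korea                 29
Korea                 30
UK                    14

duration column: 13 2 15 29 30 14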

Page 25

Druid Storage Model

SELECT country, device, duration
FROM logs
WHERE country = 'Korea'
  AND device LIKE 'Iphone%'

[Diagram: columns country (bitmap), device (bitmap), and duration; each row passes through three steps]
1. Filter by bitmap: country = 'Korea'
2. Filter the remaining rows manually: device LIKE 'Iphone%'
3. Read duration for the rows that pass, e.g. ('Korea', 'Iphone 6s', 13)
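The three-step flow above can be sketched over a six-row segment. The `country` and `duration` values come from the previous slides. Apart from row 0's 'Iphone 6s', the `device` values are hypothetical. Bitmaps are arrays of 0/1 here, built at query time for brevity, whereas Druid precomputes them at ingestion.

```javascript
// Columns of a tiny segment. country and duration follow the slides;
// device values other than row 0 are hypothetical.
const columns = {
  country: ["Korea", "UK", "Korea", "Korea", "Korea", "UK"],
  device: ["Iphone 6s", "Galaxy S8", "Galaxy S8", "Iphone 7", "Galaxy S9", "Iphone 6s"],
  duration: [13, 2, 15, 29, 30, 14],
};

// Bitmap for one dimension value: entry i is 1 when row i holds that value.
// (Druid builds these at ingestion time; here it is computed on the fly.)
function bitmapFor(column, value) {
  return column.map((v) => (v === value ? 1 : 0));
}

// SELECT country, device, duration FROM logs
// WHERE country = 'Korea' AND device LIKE 'Iphone%'
function runQuery(cols) {
  const koreaBitmap = bitmapFor(cols.country, "Korea");  // 1. filter by bitmap
  const result = [];
  koreaBitmap.forEach((bit, row) => {
    if (bit === 0) return;                                 // row excluded by the bitmap
    const device = cols.device[row];
    if (!device.startsWith("Iphone")) return;             // 2. LIKE 'Iphone%' checked row by row
    result.push([cols.country[row], device, cols.duration[row]]); // 3. read the other columns
  });
  return result;
}

console.log(JSON.stringify(runQuery(columns)));
// [["Korea","Iphone 6s",13],["Korea","Iphone 7",29]]
```

Only the four Korea rows reach the string check. This reduction is where the bitmap index pays off on large segments.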

Page 26

Spark Druid Connector

Page 27

Spark Druid Connector

1. 3 ways to implement, our implementation
2. What is needed to implement
3. Sample code, performance test
4. How to implement

Page 28

Spark Druid Connector - 3 Ways to implement

1st way: SQL → Spark Driver → DSL → Druid Broker
• Good if the SQL is rewritable to DSL
• But DSL does not support all of SQL
  • e.g. JOIN, sub-query

2nd way: SQL → Spark Driver → Spark Executors → Select DSL → Druid Historical (large JSON responses)
• Easy to implement
• No need to understand the Druid index library
• Ser/de operations are expensive
• Parallelism is bounded by the number of Historicals

Page 29

Spark Druid Connector - 3 Ways to implement

3rd way: SQL → Spark Driver → Executors that read segment files directly
• Read Druid segment files directly, using the Druid library
• Similar to the way Parquet files are read
• Spark executors are allocated on the Historical nodes
• Difficult to implement
• Need to understand the Druid segment library

We chose this way!

Page 30

Spark Druid Connector – How to use

Create table:

spark.read.format("com.navercorp.ni.druid.spark.druid")
  .option("coordinator", "host1.com:18081")
  .option("broker", "host2.com:18082")
  .option("datasource", "logs")
  .load()
  .createOrReplaceTempView("logs")

Execute query:

spark.sql("""
  SELECT country, device, duration
  FROM logs
  WHERE country = 'Korea'
    AND device LIKE 'Iphone%'
""").show(false)

Page 31

Spark Druid Connector - Performance (total 4.4B rows; seconds, lower is better)

Random Access: Spark Druid 0.21, Spark Parquet 7.5
Full Scan & GROUP BY: Spark Druid 24.1, Spark Parquet 7.7

Page 32

Spark Druid Connector – How to implement

Page 33

Spark Druid Connector – How to implement

1. Druid REST API
2. Druid Segment Library
3. Spark Data Source API

Page 34

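Spark Druid Connector – Get table schema

When the table is loaded, the Spark driver sends a segmentMetadata query to the Druid Broker and builds the schema from the response.

spark.read.format("...")
  .option("coordinator", "...")
  .option("broker", "...")
  .option("datasource", "logs")
  .load()

Query (Spark Driver → Druid Broker):

{
  "queryType": "segmentMetadata",
  "dataSource": "logs",
  "merge": true
}

Response (→ schema):

{
  "columns": {
    "__time": {...},
    "country": {...},
    "device": {...},
    "duration": {...}
    ...
  }
}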
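Turning that response into a Spark schema is mostly a type mapping. The sketch below makes three assumptions:

• The response is trimmed, and `merge: true` returns one merged analysis inside an array.
• The type mapping is hypothetical, because the connector's real mapping is not shown in the talk.
• `duration` is a `DOUBLE`, inferred from the `doubleSum` metric in the ingestion spec.

```javascript
// Hypothetical mapping from Druid segmentMetadata column types to Spark SQL type names.
const DRUID_TO_SPARK = { STRING: "string", LONG: "bigint", FLOAT: "float", DOUBLE: "double" };

function toSparkSchema(segmentMetadataResponse) {
  // With "merge": true the Broker returns a single merged analysis.
  const columns = segmentMetadataResponse[0].columns;
  return Object.entries(columns).flatMap(([name, info]) => {
    if (name === "__time") return [{ name, type: "timestamp" }]; // stored as epoch millis (LONG)
    const type = DRUID_TO_SPARK[info.type];
    return type ? [{ name, type }] : [];                          // skip complex types (e.g. sketches)
  });
}

// Trimmed example response; column names follow the slide, types are assumed.
const response = [{
  columns: {
    __time: { type: "LONG" },
    country: { type: "STRING" },
    device: { type: "STRING" },
    duration: { type: "DOUBLE" },
  },
}];

console.log(toSparkSchema(response).map((f) => `${f.name} ${f.type}`).join(", "));
// __time timestamp, country string, device string, duration double
```

Skipping complex columns such as HyperLogLog or theta sketches keeps the schema readable. A real connector might expose them as binary instead.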

Page 35

Spark Druid Connector – Partition pruning

WHERE country = 'Korea'
  AND __time = CAST('2018-05-23' AS TIMESTAMP)

Segments can be pruned by the interval condition and by single-dimension partitioning:

1. Interval condition: serverview returns only the segments that match the interval
2. Single-dimension partition: compare each segment's start and end with the given filter

Spark Driver → Druid Coordinator:

GET /.../logs/intervals/2018-05-23/serverview

[
  {
    "segment": {
      "shardSpec": { "dimension": "country", "start": null, "end": "b", ... },
      "id": "segmentId"
    },
    "servers": [{ "host": "host1" }, { "host": "host2" }]
  },
  { "segment": ... },
  ...
]
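The single-dimension check in step 2 is a range test per segment. This sketch assumes three things:

• A segment covers the values in [start, end).
• `null` means unbounded, as in Druid's single-dimension shard spec.
• Values compare as plain strings.

Segment ids and boundaries are hypothetical. String comparison is case-sensitive: with the slide's `"end": "b"`, 'Korea' sorts before "b" because uppercase letters precede lowercase ones.

```javascript
// Sketch of single-dimension partition pruning on the driver side.
function mayContain(shardSpec, dimension, value) {
  if (!shardSpec || shardSpec.dimension !== dimension) return true; // cannot prune on other dimensions
  const aboveStart = shardSpec.start === null || shardSpec.start <= value; // start is inclusive
  const belowEnd = shardSpec.end === null || value < shardSpec.end;        // end is exclusive
  return aboveStart && belowEnd;
}

function pruneSegments(serverview, dimension, value) {
  return serverview
    .filter((entry) => mayContain(entry.segment.shardSpec, dimension, value))
    .map((entry) => entry.segment.id);
}

// Hypothetical serverview entries for one interval (ids and boundaries are made up).
const serverview = [
  { segment: { id: "seg_1", shardSpec: { dimension: "country", start: null, end: "japan" } }, servers: [{ host: "host1" }] },
  { segment: { id: "seg_2", shardSpec: { dimension: "country", start: "japan", end: "uk" } }, servers: [{ host: "host2" }] },
  { segment: { id: "seg_3", shardSpec: { dimension: "country", start: "uk", end: null } }, servers: [{ host: "host1" }] },
];

console.log(pruneSegments(serverview, "country", "korea")); // [ 'seg_2' ]
console.log(pruneSegments(serverview, "country", "uk"));    // [ 'seg_3' ]
```

Only the surviving segment ids become Spark partitions, so an equality filter on the partition dimension reads one segment per interval instead of all of them.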

Page 36

Spark Druid Connector – Spark filters to Druid filters

WHERE country = 'Korea'
  AND city = 'Seoul'

Spark passes the pushed-down filters to buildScan:

buildScan(
  requiredColumns: [country, device, duration],
  filters: [EqualTo(country, Korea), EqualTo(city, Seoul)])

Spark's filters are converted into Druid's DimFilter:

private def toDruidDimFilters(sparkFilter: Filter): DimFilter = {
  sparkFilter match {
    ...
    case EqualTo(attribute, value) =>
      new SelectorDimFilter(attribute, value.toString, null)
    case GreaterThan(attribute, value) => ...
  }
}
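The same conversion can be shown at the JSON level. This is a sketch, not the connector's Scala code. Spark filters are modelled as plain objects named after Spark's `EqualTo`, `In`, `GreaterThan`, `And`, `Or`, and `Not` classes, and all function names are hypothetical. Returning `null` leaves a filter to Spark. By default Spark re-evaluates pushed filters after the scan, so skipping one costs performance but not correctness.

```javascript
// Hypothetical translation of Spark data source filters into Druid JSON filters.
function toDruidFilter(f) {
  switch (f.kind) {
    case "EqualTo":
      return { type: "selector", dimension: f.attribute, value: String(f.value) };
    case "In":
      return { type: "in", dimension: f.attribute, values: f.values.map(String) };
    case "GreaterThan":
      return {
        type: "bound",
        dimension: f.attribute,
        lower: String(f.value),
        lowerStrict: true,                     // strictly greater than
        ordering: typeof f.value === "number" ? "numeric" : "lexicographic",
      };
    case "And":
    case "Or": {
      const left = toDruidFilter(f.left);
      const right = toDruidFilter(f.right);
      if (left === null || right === null) return null; // push down only if both sides convert
      return { type: f.kind === "And" ? "and" : "or", fields: [left, right] };
    }
    case "Not": {
      const child = toDruidFilter(f.child);
      return child === null ? null : { type: "not", field: child };
    }
    default:
      return null;                             // e.g. StringStartsWith: left to Spark here
  }
}

// buildScan receives the WHERE clause as an array of filters that are AND-ed together.
function toDruidFilters(sparkFilters) {
  const converted = sparkFilters.map(toDruidFilter).filter((x) => x !== null);
  if (converted.length === 0) return null;
  return converted.length === 1 ? converted[0] : { type: "and", fields: converted };
}

// WHERE country = 'Korea' AND city = 'Seoul'
const druidFilter = toDruidFilters([
  { kind: "EqualTo", attribute: "country", value: "Korea" },
  { kind: "EqualTo", attribute: "city", value: "Seoul" },
]);
console.log(JSON.stringify(druidFilter));
```

Nested `And`/`Or`/`Not` are pushed down only when every child converts. A partially converted child under `Not` would filter out rows that should survive.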

Page 37

Spark Druid Connector – Attach locality to RACK_LOCAL

• getPreferredLocations(partition: Partition)
  • Returns the hosts that hold the partition's Druid segments
  • Caution: Spark does not guarantee that tasks run on their preferred locations
  • Set spark.locality.wait to a very large value
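The preferred locations can be derived directly from the serverview response. This minimal sketch assumes one Spark partition per segment, and all names are hypothetical. In the Scala connector, the host list is what `RDD.getPreferredLocations` would return. Spark waits only `spark.locality.wait` (3s by default) for a slot on a preferred host before it falls back to a less local one, which is why the slide recommends a large value.

```javascript
// Sketch: one Spark partition per Druid segment. The preferred locations are the
// hosts that serve the segment, taken from the Coordinator's serverview response.
function planPartitions(serverview) {
  return serverview.map((entry, index) => ({
    index,                                   // Spark partition index
    segmentId: entry.segment.id,
    // Spark matches executors by hostname, so drop a ":port" suffix if present.
    preferredLocations: entry.servers.map((s) => s.host.split(":")[0]),
  }));
}

// Hypothetical serverview entries: every segment is replicated on two Historicals.
const serverviewEntries = [
  { segment: { id: "seg_1" }, servers: [{ host: "host1:8083" }, { host: "host2:8083" }] },
  { segment: { id: "seg_2" }, servers: [{ host: "host2:8083" }, { host: "host3:8083" }] },
];

for (const p of planPartitions(serverviewEntries)) {
  console.log(p.index, p.segmentId, p.preferredLocations.join(","));
}
// 0 seg_1 host1,host2
// 1 seg_2 host2,host3
```

Listing every replica gives the scheduler more than one local choice per task. This helps when one Historical's executor is busy.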

Page 38

Spark Druid Connector - How to implement

Done! Now the Spark executor can read records from Druid segment files.

[Diagram: Segment File → Spark Druid Connector → Spark]

Page 39

TopN Query

Page 40

TopN Query

1. How a TopN query works
2. Performance
3. Limitations

Page 41

TopN Query – We heavily use TopN queries

TopN query flow (N=100):
• The user sends the query to the Broker.
• Each Historical node computes its local top 100 results from its segment cache.
• The Broker merges the results from each Historical and makes the final records.
• The client gets the merged results.

Page 42

TopN Query - Example

Top 3 countries ORDER BY SUM(duration)

Top 3 of Historical a:
country   SUM(duration)
korea     114
uk        47
usa       21

Top 3 of Historical b:
country   SUM(duration)
uk        67
korea     24
usa       3

Top 3 of Historical c:
country   SUM(duration)
korea     87
uk        57
china     33

Merged on the Broker:
country   SUM(duration)
korea     225
uk        171
china     33
usa       24

Broker Top 3 result:
country   SUM(duration)
korea     225
uk        171
china     33

Page 43

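TopN is an approximate approach

Full results of Historical a:
country   SUM(duration)
korea     114
uk        47
usa       21
china     17

Full results of Historical b:
country   SUM(duration)
uk        67
korea     24
usa       3
china     1

Full results of Historical c:
country   SUM(duration)
korea     87
uk        57
usa       22
china     33

Broker Top 3 result:
country   SUM(duration)
korea     225
uk        171
china     33

Missing! Rows cut by each Historical's local top 3 (china 17 on a, china 1 on b, usa 22 on c) never reach the Broker, so china is reported as 33 instead of its true total of 51.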
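The approximation can be reproduced with the per-Historical numbers above. This minimal sketch assumes that each Historical returns only its local top 3 and that the Broker sums whatever it receives. An exact, GroupBy-style merge is shown for comparison. In stock Druid, each node ranks at least `max(threshold, druid.query.topN.minTopNThreshold)` entries (1000 by default), which makes such errors rarer than in this toy example.

```javascript
// Full local aggregates per Historical, copied from the slide (before truncation).
const historicals = {
  a: { korea: 114, uk: 47, usa: 21, china: 17 },
  b: { uk: 67, korea: 24, usa: 3, china: 1 },
  c: { korea: 87, uk: 57, usa: 22, china: 33 },
};

// Sort {key: value} descending by value (ties by key) and keep the first n pairs.
function topN(sums, n) {
  return Object.entries(sums)
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .slice(0, n);
}

// Add [key, value] pairs into an accumulator map.
function addInto(acc, pairs) {
  for (const [key, value] of pairs) acc[key] = (acc[key] || 0) + value;
  return acc;
}

// TopN style: each Historical sends only its local top n, and the Broker merges them.
function approximateTopN(nodes, n) {
  const merged = {};
  for (const local of Object.values(nodes)) addInto(merged, topN(local, n));
  return topN(merged, n);
}

// GroupBy style: every group from every Historical is merged before ranking.
function exactTopN(nodes, n) {
  const merged = {};
  for (const local of Object.values(nodes)) addInto(merged, Object.entries(local));
  return topN(merged, n);
}

console.log(JSON.stringify(approximateTopN(historicals, 3))); // [["korea",225],["uk",171],["china",33]]
console.log(JSON.stringify(exactTopN(historicals, 3)));       // [["korea",225],["uk",171],["china",51]]
```

Here the ranking happens to survive but china's value does not. The GroupBy-vs-TopN comparison on the next slide shows the same two kinds of error.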

Page 44

TopN – 100x faster than GroupBy

rank   GroupBy (a few minutes)   TopN (1536 ms)
1      1,948,297                 1,948,297
2      1,404,167                 1,404,167
3      1,383,538                 1,383,538
4      1,141,977                 1,141,977
5      1,099,028                 1,090,277
6      1,090,277                 1,079,242
7      1,051,448                 1,051,448
8      996,961                   996,961
9      941,284                   941,284
10     937,078                   937,078

100x faster!

Differences:
1. Rank changed: rank 5 → rank 6
2. Value changed: 1,099,028 → 1,079,242

Page 45

TopN – Limitations

1. TopN supports only one dimension.
2. Results are unstable when the replication factor is 2 or more.

Page 46

Plywood

1. Plywood
2. Split-Apply-Combine
3. Our Improvement

Page 47

Split Apply Combine - SAC

SELECT country, city, device
FROM $TABLE
WHERE …
GROUP BY country, city, device

// Split [ country, city, device ]
ply()
  .apply(dataSource, $(dataSource).filter(...)) // Filter1
  .apply(dataSource, $(dataSource).filter(...)) // Filter2
  .apply(dataSource, $(dataSource).filter(...)) // Filter3
  .apply('country', $(dataSource).split(...)
    .apply(...) // Filter to Split1 (country)
    .apply('city', $(dataSource).split(...)
      .apply(...) // Filter to Split2 (city)
      .apply(...) // Filter to Split2 (city)
      .apply('device', $(dataSource).split(...)
        .apply(...) // Filter to Split3 (device)
      )
    )
  )

1. https://www.jstatsoft.org/article/view/v040i01/v40i01.pdf
2. http://plywood.imply.io/index
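The nested splits above follow a simple recursive pattern. This minimal in-memory sketch assumes that each level keeps the top `limit` values by the summed metric. It is not Plywood's API, only the shape of the computation, and the rows are made up. Run against Druid with `limit = L` and three dimensions, the pattern needs up to 1 + L + L² topN queries, so the fan-out grows quickly.

```javascript
// Sum a metric per value of one dimension.
function sumBy(rows, dim, metric) {
  const sums = new Map();
  for (const row of rows) sums.set(row[dim], (sums.get(row[dim]) || 0) + row[metric]);
  return sums;
}

// Split on the first dimension, keep the top `limit` values (apply), then recurse
// into the rows of each kept value with the remaining dimensions and combine the
// results into a nested tree.
function splitApplyCombine(rows, dims, metric, limit) {
  if (dims.length === 0) return [];
  const [dim, ...rest] = dims;
  const top = [...sumBy(rows, dim, metric)]
    .sort((x, y) => y[1] - x[1] || String(x[0]).localeCompare(String(y[0])))
    .slice(0, limit);
  return top.map(([value, total]) => ({
    [dim]: value,
    [metric]: total,
    // Against Druid, each recursion step would be a further filtered topN query.
    children: splitApplyCombine(rows.filter((r) => r[dim] === value), rest, metric, limit),
  }));
}

// Made-up rows for illustration.
const sacRows = [
  { country: "korea", city: "seoul", device: "ios", duration: 10 },
  { country: "korea", city: "seoul", device: "android", duration: 20 },
  { country: "korea", city: "busan", device: "ios", duration: 5 },
  { country: "uk", city: "london", device: "ios", duration: 7 },
  { country: "uk", city: "leeds", device: "android", duration: 3 },
  { country: "usa", city: "nyc", device: "ios", duration: 4 },
];

const tree = splitApplyCombine(sacRows, ["country", "city", "device"], "duration", 2);
console.log(JSON.stringify(tree, null, 2));
```

Each level inherits the approximation of a single-dimension topN. The query count per request is also why tuning the way Plywood issues these queries affects throughput.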

Page 48

Plywood tuning

[Figure: before and after tuning]

Page 49

Tuning Results

[Chart: throughput (qps, higher is better), before vs. after tuning]

Page 50

Challenge

Page 51

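Stable TopN - Motivation

The same query can return different results when the replication factor is 2 or more.

• Seg_1, Seg_2, and Seg_3 are each replicated on Historical 1 and Historical 2.
• First result: Historical 1 computes TopN(Seg_1 + Seg_2), and Historical 2 computes TopN(Seg_3).
• Second result: Historical 1 computes TopN(Seg_1), and Historical 2 computes TopN(Seg_2 + Seg_3).
• Each Historical merges its segments before truncating to the top N, so a different segment assignment can produce a different result (First Result != Second Result).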

Page 52

by_segment patch

Bypass the Historical-side TopN merge: each Historical returns the TopN result of every segment separately, and the Broker merges the per-segment results in segment ID order.

• First result: Historical 1 returns TopN(Seg_1) + TopN(Seg_2), and Historical 2 returns TopN(Seg_3).
• Second result: Historical 1 returns TopN(Seg_1), and Historical 2 returns TopN(Seg_2) + TopN(Seg_3).
• The Broker merges the same per-segment inputs in the same order, so the first and second results are always identical (==).
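A toy model makes the difference concrete. This sketch uses threshold N = 1 and made-up per-segment sums, and the function names are hypothetical. It models only the merge order, not the team's actual patch. Stock Druid also has a related `bySegment` query-context flag that returns per-segment results.

```javascript
// Toy model of the replication problem with threshold N = 1.
// Per-key partial sums stored in each segment (made-up numbers).
const segments = {
  seg_1: { a: 10, b: 9 },
  seg_2: { b: 9, a: 1 },
  seg_3: { a: 5, b: 4 },
};

// Sort {key: value} descending by value (ties by key) and keep the first n pairs.
function topN(sums, n) {
  return Object.entries(sums)
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .slice(0, n);
}

// Sum several lists of [key, value] pairs into one {key: value} map, in list order.
function mergeAll(lists) {
  const acc = {};
  for (const pairs of lists) {
    for (const [key, value] of pairs) acc[key] = (acc[key] || 0) + value;
  }
  return acc;
}

// Default: each Historical merges the segments it serves and truncates to N;
// the Broker merges the per-Historical results.
function historicalSideMerge(assignment, n) {
  const perHistorical = assignment.map((segIds) =>
    topN(mergeAll(segIds.map((id) => Object.entries(segments[id]))), n));
  return topN(mergeAll(perHistorical), n);
}

// by_segment style: every segment is truncated on its own, and the Broker merges
// the per-segment results in segment-ID order, whichever Historical served them.
function perSegmentMerge(assignment, n) {
  const ids = assignment.flat().sort();
  return topN(mergeAll(ids.map((id) => topN(segments[id], n))), n);
}

// Two runs of the same query that hit different replicas: [Historical 1, Historical 2].
const firstRun = [["seg_1", "seg_2"], ["seg_3"]];
const secondRun = [["seg_1"], ["seg_2", "seg_3"]];

console.log(JSON.stringify(historicalSideMerge(firstRun, 1)));  // [["b",18]]
console.log(JSON.stringify(historicalSideMerge(secondRun, 1))); // [["b",13]]
console.log(JSON.stringify(perSegmentMerge(firstRun, 1)));      // [["a",15]]
console.log(JSON.stringify(perSegmentMerge(secondRun, 1)));     // [["a",15]]
```

The per-segment result is still approximate: the true top-1 is b with 22. It is, however, deterministic. A fixed merge order also keeps floating-point sums and tie-breaking reproducible. The trade-off is that the Broker receives one partial result per segment instead of one per Historical.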

Page 53

Special Thanks

• Navis @ SK Telecom
• Ens @ Naver

Page 54

Thank you!

Page 55

Appendix

Page 56

Druid Deploy & Configuration (1)

• 10 Broker nodes
• 40 Historical nodes
• 2 MiddleManager & Overlord nodes
• 2 Coordinator nodes
• 10 YARN & HDFS nodes for batch ingestion
• A Spark standalone cluster runs on the Historical nodes
  • for locality

Page 57

Druid Deploy & Configuration (2)

• Druid version: 0.11
• H/W spec for Broker & Historical
  • CPU: 40 cores (with hyper-threading)
  • RAM: 128GB
  • Disk: SSD with RAID 5
• Memory configuration

Configuration                        Broker   Historical
-Xmx                                 20GB     12GB
-XX:MaxDirectMemorySize              30GB     45GB
druid.processing.numMergeBuffers     10       20
druid.processing.numThreads          20       30
druid.processing.buffer.sizeBytes    512MB    800MB
druid.cache.sizeInBytes              0        5GB
druid.server.http.numThreads         40       40
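The memory table can be sanity-checked against the direct-memory sizing rule in Druid's configuration documentation: MaxDirectMemorySize ≥ buffer.sizeBytes × (numMergeBuffers + numThreads + 1). This minimal sketch treats 1GB as 1024MB, which is an assumption about the table's units. The object shape is hypothetical.

```javascript
// Druid's documented lower bound for direct memory on query-processing nodes:
//   MaxDirectMemorySize >= buffer.sizeBytes * (numMergeBuffers + numThreads + 1)
// Sizes are in MB here, treating 1GB as 1024MB.
function requiredDirectMemoryMB({ bufferMB, numMergeBuffers, numThreads }) {
  return bufferMB * (numMergeBuffers + numThreads + 1);
}

// Values copied from the configuration table above.
const nodeConfigs = {
  broker: { bufferMB: 512, numMergeBuffers: 10, numThreads: 20, maxDirectGB: 30 },
  historical: { bufferMB: 800, numMergeBuffers: 20, numThreads: 30, maxDirectGB: 45 },
};

for (const [name, cfg] of Object.entries(nodeConfigs)) {
  const needed = requiredDirectMemoryMB(cfg);
  const available = cfg.maxDirectGB * 1024;
  console.log(`${name}: needs ${needed} MB, has ${available} MB, fits=${needed <= available}`);
}
// broker: needs 15872 MB, has 30720 MB, fits=true
// historical: needs 40800 MB, has 46080 MB, fits=true
```

Both node types satisfy the rule. The Broker has about 14.5GB of headroom, while the Historical has only about 5GB, so raising its buffer size or thread count would also require raising MaxDirectMemorySize.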

Page 58

Ingest Spec for External Yarn and HDFS

Use an external YARN cluster for batch ingestion:

"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "yarn.resourcemanager.hostname": "host1.com",
    "yarn.resourcemanager.address": "host1.com:8032",
    "yarn.resourcemanager.scheduler.address": "host1.com:8030",
    "yarn.resourcemanager.webapp.address": "host1.com:8088",
    "yarn.resourcemanager.resource-tracker.address": "host1.com:8031",
    "yarn.resourcemanager.admin.address": "host1.com:8033"
  }
}

Page 59

Ingest Spec for External Yarn and HDFS

Use an external HDFS for intermediate MapReduce output:

"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "fs.defaultFS": "hdfs://DEFAULT_FS:8020",
    "dfs.namenode.http-address": "NAMENODE:50070",
    "dfs.namenode.https-address": "NAMENODE:50470",
    "dfs.namenode.servicerpc-address": "NAMENODE:8022"
  }
}

Page 60

Why Druid? – Simple Lambda Architecture

Lambda Architecture with Two Databases
https://en.wikipedia.org/wiki/Lambda_architecture

Lambda Architecture with Druid
https://www.slideshare.net/gianmerlino/druid-at-sf-big-analytics-2015-1201

Page 61

How Kafka Indexing Service

Page 62

https://github.com/knoguchi/cm-druid

Druid on CDH

Page 63

Extending Druid Query

1. Accumulated metric in TopN
2. Stable TopN result

Page 64

Extending Druid Query

[Diagram: Client → Broker → Historicals. On a Historical, a Cursor streams rows into Aggregation, which produces the Result. A Second Query produces a second Result.]

Page 65

Extending Druid Query - Motivation

Two queries are needed to build the following table:
1. A TopN query for the top 3 countries
2. An aggregation query for the total duration

Country   SUM(duration)   Ratio over total duration
korea     225             20%
uk        171             15.2%
usa       33              2.9%

Can we do it at once?

Page 66

Extending Druid Query - Background

Yes, we can! Just do the TopN operation and the SUM operation simultaneously.

Segment data:
country   duration
korea     100
korea     14
uk        40
uk        7
usa       21
china     17

Aggregated in a map structure:
country   SUM(duration)
korea     114
china     17
usa       21
uk        47

Final records (top 3):
country   SUM(duration)
korea     114
uk        47
usa       21

The total duration equals the sum of all metric values in the map (199), so it can be computed in the same pass.
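The single-pass idea can be sketched on the segment data above. The `__acc_` key mimics the `__acc_edits` field of the team's customized TopN response; it is not a stock Druid field, and `topNWithTotal` is a hypothetical name. The total covers every row, including rows that fall outside the top N. As a result, the total stays exact when the Broker adds up per-segment totals, even though the TopN part is approximate.

```javascript
// Compute a topN and the grand total of the same metric in one pass over the rows.
function topNWithTotal(rows, dim, metric, n) {
  const sums = new Map();
  let total = 0;
  for (const row of rows) {
    sums.set(row[dim], (sums.get(row[dim]) || 0) + row[metric]); // per-key aggregate
    total += row[metric];                                          // running grand total, same pass
  }
  const top = [...sums]
    .sort((x, y) => y[1] - x[1])
    .slice(0, n)
    .map(([key, value]) => ({ [dim]: key, [metric]: value }));
  return { top, ["__acc_" + metric]: total };
}

// Segment data from the slide.
const segmentRows = [
  { country: "korea", duration: 100 },
  { country: "korea", duration: 14 },
  { country: "uk", duration: 40 },
  { country: "uk", duration: 7 },
  { country: "usa", duration: 21 },
  { country: "china", duration: 17 },
];

const result = topNWithTotal(segmentRows, "country", "duration", 3);
console.log(JSON.stringify(result.top)); // korea 114, uk 47, usa 21
console.log(result.__acc_duration);      // 199
```

With the total available, the "ratio over total duration" column is just `value / total` for each returned row, so no second query is needed.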

Page 67

Extending Druid Query in TopN

User request:

{
  "queryType": "topN",
  ...
  "metric": "edits",
  "accMetrics": ["edits"],
  ...
}

Druid response:

{
  ...
  "edits": 33,
  "__acc_edits": 1234
  ...
}

We customized Druid to calculate the total edits and the TopN metric at once.

[Diagram: Broker → Historical. On the Historical, the Cursor feeds each row into the TopN aggregation (TopN queue) and into a metric counter.]

Page 68

Druid Spark Batch

Huge intermediate files with MapReduce
• Druid's default batch ingestion uses MapReduce
• To ingest a 1.4GB Parquet file (single-dimension partition):
  • Read: 16.6GB
  • Write: 20.5GB
  • Total: 37.1GB

Page 69

Druid Spark Batch

We modified the original Druid Spark Batch
• https://github.com/metamx/druid-spark-batch
  • The original version of Druid Spark Batch from Metamarkets (creator of Druid)
• We added some features:
  • Parquet input
  • Single-dimension partition
  • Query granularity
  • The same ingest spec as Druid MapReduce batch

Page 70

Druid Spark Batch – MapReduce vs. Spark

• Disk read + write (GB, lower is better): MapReduce 37.1, Spark 7
• [Chart: Ingest time, single-dimension partition, 3 segments of 430MB each (seconds, lower is better), MapReduce vs. Spark; values 759 and 2260]
• [Chart: Ingest time, single-dimension partition, 11 segments of 135MB each (seconds, lower is better), MapReduce vs. Spark; values 333 and 376]