Blink: Improvements to Flink & Its Applications in Alibaba Search

Xiaowei Jiang, Feng Wang {xiaowei.jxw, jason.wang}@alibaba-inc.com

Upload: dataworks-summithadoop-summit

Posted on 16-Apr-2017


Page 1: Improvements to Flink & Its Applications in Alibaba Search

Blink: Improvements to Flink & Its Applications in Alibaba Search

Xiaowei Jiang, Feng Wang
{xiaowei.jxw, jason.wang}@alibaba-inc.com

Page 2

Who Are We?

- Xiaowei Jiang
  - 2014 to now: Alibaba
  - 2010 to 2014: Facebook
  - 2002 to 2010: Microsoft
  - 2000 to 2002: Stratify
- Feng Wang
  - 2006 to now: Alibaba

Page 3

About Alibaba

- Alibaba Group
  - Operating the world’s largest online marketplace
  - Annual GMV of $394 billion in 2015
- Alibaba Search
  - Personalized search and recommendation platform
  - Major driver of online traffic

Page 4

Agenda

- Background
- What is Blink?
- Improvements in Blink
- Challenges & Future

Page 5

Scenario: Realtime A/B Test

[Figure: dataflow diagram. Labels: Logs (Transaction, Click, Impression); Parser and Filter stages on each log stream; Join; Agg; UDF; Druid.]
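The stage names on this slide (Parser, Filter, Join, Agg) can be illustrated with a small batch-style sketch in plain Python. It does not use any Flink or Blink API, and it assumes a hypothetical `request_id,bucket` log format, a join between clicks and impressions on `request_id`, and click-through rate as the aggregate; none of these details come from the slides.

```python
from collections import defaultdict

def parse(line):
    """Parser stage: turn a hypothetical 'request_id,bucket' line into a dict, or None if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 2 or not all(parts):
        return None
    return {"request_id": parts[0], "bucket": parts[1]}

def parse_and_filter(lines):
    """Parser + Filter stages: keep only well-formed records."""
    return [rec for rec in map(parse, lines) if rec is not None]

def ctr_by_bucket(impression_lines, click_lines):
    """Join clicks to impressions on request_id, then aggregate CTR per A/B bucket."""
    impressions = parse_and_filter(impression_lines)
    clicked_ids = {rec["request_id"] for rec in parse_and_filter(click_lines)}
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        shown[imp["bucket"]] += 1
        if imp["request_id"] in clicked_ids:  # the Join step
            clicked[imp["bucket"]] += 1
    # Agg step: click-through rate per bucket
    return {bucket: clicked[bucket] / shown[bucket] for bucket in shown}
```

A streaming job would maintain these counts incrementally per window rather than over complete lists, but the stage boundaries are the same.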

Page 6

Scenario: Search Index Build & Update

[Figure: dataflow diagram. Labels: DataSource; Filter; Sync; HBase; IC; UIC; Join; Export; Result; Search Engine; partitions IC1, IC2, UIC1, UIC2.]

Page 7

Flink: Unified Compute Engine

- Streaming Topologies → low latency
- Long Batch Pipelines → resource utilization
- Machine Learning at Scale → iterative algorithms
- Graph Analysis → mutable state

Page 8

Flink Stack

Page 9

What is Blink?

- Blink: Improvements to Flink from Alibaba
  - Comprehensive Improvements to Flink Table API
  - Improved Runtime Compatible with Flink API and Ecosystem
- Status
  - Runs on Thousands of Nodes in Alibaba Production
  - Supports Mission Critical Products

Page 10

Table API Improvements

- Principle: Unified SQL layer for batch and streaming
- Functionality
  - UDF/UDTF/UDAGG
  - Stream-Stream Join
  - Aggregation (min, max, avg, sum, count, distinct_count)
  - Windowing (time_window, count_window)
  - Retraction
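Retraction is the least self-explanatory item in the list above. The following is a minimal sketch of the general idea in plain Python, not the Table API: when a streaming aggregate updates a result it has already emitted, it first emits a retraction of the old value so downstream consumers can undo it. The function names and the `("+" | "-", key, value)` record shape are assumptions made for illustration.

```python
def count_with_retraction(keys):
    """Running count per key; emits ('-', key, old) before each ('+', key, new) update."""
    counts = {}
    out = []
    for key in keys:
        old = counts.get(key)
        if old is not None:
            out.append(("-", key, old))  # retract the previously emitted result
        counts[key] = (old or 0) + 1
        out.append(("+", key, counts[key]))
    return out

def apply_changelog(records):
    """Downstream consumer: materialize the changelog into the latest value per key."""
    table = {}
    for op, key, value in records:
        if op == "+":
            table[key] = value
        elif table.get(key) == value:  # only undo a value that is currently present
            del table[key]
    return table
```

For the input keys `a, b, a`, the aggregate emits an add for `a`, an add for `b`, a retraction of the first `a` result, and a new add for `a`; applying that changelog leaves one row per key with its latest count.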

Page 11

Runtime Improvements

- New Runtime Architecture on YARN
- Optimized State, Checkpoint & Failover
- Reliable & Production Quality
- Much More

Page 12

Flink on YARN

[Figure: deployment diagram. Labels: Client Node with Flink YARN Client; HDFS; YARN ResourceManager; YARN Nodes, each with a YARN NodeManager and a Container; one Container holds the Flink JobManager and YARN AppMaster, the others hold Flink TaskManagers.]

Steps shown in the figure:

1. Store user jar and configuration
2. Register resource and request app master
3. Allocate app master
4. Allocate worker

Note in the figure: always bootstrap containers with user jar and config

Page 13

Blink on YARN

[Figure: deployment diagram. Labels: Client Node with Blink Client; HDFS; YARN ResourceManager; YARN Nodes, each with a YARN NodeManager; Containers holding JobMasters and TaskExecutors.]

Steps shown in the figure:

1. Store user jar and configuration
2. Register resource and request app master
3. Allocate app master
4. Allocate worker (labeled on each JobMaster)

Note in the figure: always bootstrap containers with user jar and config

Page 14

Blink Job Architecture

[Figure: architecture diagram. Labels: YARN Nodes, each with a NodeManager and a Shuffle Service; HDFS; ZooKeeper; a Container with the Job Master (task scheduler, checkpoint coordinator); Containers with Task Executors (task, in, out, RocksDB, spilled file). Connections: control channel; local data channel; network data channel; state backup/recover; completed checkpoint; schedule events.]

Page 15

Blink Checkpoint & State

[Figure: checkpoint flow between the Job Master, a TaskExecutor, and HDFS. Labels: Task, operator, state; in queue (i1, i2, i3, Bn); out queue (o1, o2, Bn-1, o3); Local CPn, Local CPn-1; Incremental Backup; OnComplete; clean up; async; HDFS reference; CPn, CPn-1, diff; State Files (1.sst, 2.sst, n.sst).]

Steps shown in the figure:

1. Trigger
2. Hard link snapshot
3. Ack
4. Complete
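The "Incremental Backup", "diff", and "HDFS reference" labels suggest that only state files added since the previous checkpoint are uploaded, while unchanged files are referenced from the earlier backup. A minimal sketch of that file-set diff follows. It assumes state files are immutable once written (as RocksDB SST files are) and identified by name; the function name and return shape are hypothetical and do not describe Blink's actual implementation.

```python
def incremental_backup(current_files, previous_backup):
    """Split checkpoint n's state files into those to upload and those to reference.

    current_files: file names in the local snapshot for checkpoint n.
    previous_backup: file names already backed up for checkpoint n-1.
    """
    current = set(current_files)
    previous = set(previous_backup)
    to_upload = sorted(current - previous)     # the diff: new files only
    to_reference = sorted(current & previous)  # reuse the existing remote copies
    return to_upload, to_reference
```

Files present only in the previous backup are neither uploaded nor referenced, which is what would make them candidates for the "clean up" step once the new checkpoint completes.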

Page 16

Blink Rescale

Page 17

Blink Failover

[Figure: two failover diagrams built from Source and Sink tasks. Panel "At Least Once" labels: fail, restart, restart, failover. Panel "Exactly Once" labels: fail, restart, failover.]
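The difference between the two guarantees named on this slide can be shown with a toy model. This is a simplified sketch of the general semantics, not Blink's failover mechanism: a single operator sums events, fails once, and the source replays from the last checkpointed offset. The function `run` and all of its parameters are hypothetical; `restore_state=True` models rolling operator state back together with the offset.

```python
def run(events, checkpoint_at, fail_at, restore_state):
    """Sum events with one simulated failure and replay from the checkpointed offset."""
    total, offset = 0, 0
    saved_total, saved_offset = 0, 0
    failed = False
    while offset < len(events):
        if offset == checkpoint_at and not failed:
            # Checkpoint: remember the state and the source position together.
            saved_total, saved_offset = total, offset
        if offset == fail_at and not failed:
            failed = True
            offset = saved_offset      # source replays from the checkpoint
            if restore_state:
                total = saved_total    # state rolled back: no double counting
            continue
        total += events[offset]
        offset += 1
    return total
```

With `events=[1, 2, 3, 4]`, a checkpoint before the third event, and a failure before the fourth, restoring state yields the exact sum, while replaying without restoring state counts the third event twice.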

Page 18

Blink Metrics

[Figure: metrics dashboard. Labels: Job Vertex Number: [CPU, Memory] * Parallelism; Running Tasks; Task Metrics: In Queue, Out Queue, TPS, Latency, Delay, CPU, Memory.]
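One reading of the "[CPU, Memory] * Parallelism" label is that each job vertex's footprint is its per-task CPU and memory multiplied by its parallelism. Under that assumption, a job's total footprint could be computed as below. The dictionary keys (`cpu`, `memory_mb`, `parallelism`) are made up for illustration and are not Blink configuration names.

```python
def total_resources(vertices):
    """Sum per-task CPU and memory over all parallel instances of every job vertex."""
    cpu = sum(v["cpu"] * v["parallelism"] for v in vertices)
    memory_mb = sum(v["memory_mb"] * v["parallelism"] for v in vertices)
    return {"cpu": cpu, "memory_mb": memory_mb}
```

For example, a vertex with 1 core and 1024 MB at parallelism 4 plus a vertex with 2 cores and 2048 MB at parallelism 2 totals 8 cores and 8192 MB.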

Page 19

Challenges & Future

- Continued Optimization in Streaming
- Batch in Production
- Machine Learning in Production
- Larger Cluster Scale
- Contribute back to Flink community

Page 20

Q & A

Thank You!

Xiaowei Jiang, Twitter: @xiaoweij

Feng Wang, Twitter: @ifengwang