Eron Wright - Introducing Flink on Mesos

Posted on 16-Apr-2017


Introducing Flink on Mesos

Eron Wright – eron.wright@emc.com

DELL EMC

@eronwright


What is Apache Mesos?

• A popular cluster manager (similar to YARN)
• Makes available CPU, memory, & disk resources
• Unique capabilities for storage services
• Emerging as a foundation for data-centric, converged infrastructure

• Provides a programming model for using cluster resources
• A Mesos program is called a “framework”

• Packaged into an open-source distribution called DCOS
• Prescribes best practices related to Mesos frameworks, related services, etc.


Why Flink on Mesos?

• Flink works best on a cluster manager
  – Easy to scale each job independently
  – Externalize scheduling logic (fairness, quota, …)
  – Good job isolation

• Flink can benefit from unique Mesos capabilities
  – Disk resources
  – Dynamic resource management
  – Unique management features (e.g. inverse offers for controlled downscaling & maintenance)

Demo

Flink Master Process


Introduction

Flink Master Process

• The Flink Master Process is:
  – The “Application Master” for a single Flink cluster
  – A Mesos framework!

• Hosts numerous components:
  – Job Manager
  – Resource Manager (acts as Mesos scheduler)
  – Artifact Server (HTTP server for Mesos fetcher)

• Responsible for TM scaling and recovery
  – Handles JobManager scale change requests
  – Stores task state in ZooKeeper

[Diagram: host1 and host2 running on Mesos; a Master containing JM, RM, and HTTPD, plus two TMs]


How it Works

Flink Master Process

• Offer handling:
  – Uses Netflix Fenzo as an optimizer
  – Gathers offers until all tasks launched

• Recovery:
  – Stores intentional state in ZooKeeper
  – Master uses leader election
  – Mesos allows some time for recovery before killing tasks

• Monitoring:
  – Detects task failure; launches replacement automatically.
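The offer-handling bullet ("gathers offers until all tasks launched") can be illustrated with a small sketch. This is not Fenzo and not the Flink implementation: the `Offer` and `TaskRequest` shapes and the greedy first-fit `assign` function are hypothetical stand-ins for the optimizer step, chosen only to show the gather-and-place loop.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A simplified Mesos resource offer (hypothetical shape)."""
    offer_id: str
    host: str
    cpus: float
    mem_mb: int


@dataclass
class TaskRequest:
    """Resources one Task Manager needs (hypothetical shape)."""
    name: str
    cpus: float
    mem_mb: int


def assign(offers, pending):
    """Greedy first-fit placement; a stand-in for the optimizer step.

    Returns (launches, still_pending), where launches maps an offer id
    to the list of tasks placed on that offer.
    """
    launches = {}
    still_pending = []
    # Track what is left of each offer as tasks are placed on it.
    remaining = {o.offer_id: [o.cpus, o.mem_mb] for o in offers}
    for task in pending:
        for offer in offers:
            cpus, mem = remaining[offer.offer_id]
            if task.cpus <= cpus and task.mem_mb <= mem:
                remaining[offer.offer_id] = [cpus - task.cpus, mem - task.mem_mb]
                launches.setdefault(offer.offer_id, []).append(task)
                break
        else:
            # No offer in this batch can host the task; try the next batch.
            still_pending.append(task)
    return launches, still_pending


def gather_until_launched(offer_batches, tasks):
    """Consume offer batches until every task has been placed."""
    pending = list(tasks)
    placed = []
    for batch in offer_batches:
        if not pending:
            break  # nothing left to launch; stop gathering offers
        launches, pending = assign(batch, pending)
        for offer_id, launched in launches.items():
            placed.extend((t.name, offer_id) for t in launched)
    return placed, pending
```

A real optimizer would also weigh constraints such as host spreading; first-fit is used here only because it is the simplest placement rule that makes the loop visible.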

[Diagram: the Master, two TMs on host1 and host2, and Mesos, with numbered steps: 1. Register, 2. Resource Offers, 3. Optimize, 4. Launch, 5. Fetch (HTTP), 6. Status update]
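The status-update step is what makes the monitoring behaviour possible: when Mesos reports that a task has failed, a replacement is launched. The sketch below illustrates that reaction under simplifying assumptions. The `WorkerTracker` class and the task id format are hypothetical, the state names are standard Mesos task states, and the intended state is kept in memory here, whereas the slides say it is stored in ZooKeeper.

```python
# Mesos task states treated as failures in this sketch.
FAILURE_STATES = {"TASK_FAILED", "TASK_LOST", "TASK_KILLED"}


class WorkerTracker:
    """Tracks the Task Managers that should exist (hypothetical helper)."""

    def __init__(self):
        self.active = set()  # ids of tasks that are launched or running
        self._counter = 0

    def launch(self):
        """Record a new Task Manager and return its id."""
        self._counter += 1
        task_id = f"taskmanager-{self._counter:05d}"
        self.active.add(task_id)
        return task_id

    def on_status_update(self, task_id, state):
        """Handle one status update.

        Returns the id of a replacement task if one was launched,
        otherwise None.
        """
        if state in FAILURE_STATES and task_id in self.active:
            self.active.discard(task_id)
            return self.launch()
        # Non-failure states and updates for unknown tasks need no action.
        return None
```

Checking `task_id in self.active` means a repeated failure report for the same task does not launch a second replacement.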


Configuration

Flink Master Process (Cont’d)

• Framework Info
  – mesos.resourcemanager.framework.secret
  – mesos.resourcemanager.framework.principal
  – mesos.resourcemanager.framework.role

• Mesos Master Info
  – mesos.master: (IP address or ZK lookup info)
  – mesos.failover-timeout

Note: no port configuration is necessary; Mesos automatically assigns ports.
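To make the settings above concrete, here is a small sketch that renders them as `key: value` lines in the style of a Flink configuration file. Only the key names come from the slide; every value is a hypothetical placeholder, and the failover timeout is assumed to be expressed in seconds.

```python
# Hypothetical values throughout; only the key names come from the slide.
settings = {
    # ZooKeeper lookup form; a plain "host:port" address is the alternative.
    "mesos.master": "zk://zk1.example.com:2181/mesos",
    # Assumed to be in seconds.
    "mesos.failover-timeout": 600,
    "mesos.resourcemanager.framework.role": "flink",
    "mesos.resourcemanager.framework.principal": "flink-principal",
    "mesos.resourcemanager.framework.secret": "change-me",
}


def render(settings):
    """Render settings as 'key: value' text, one setting per line."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


print(render(settings))
```

No port keys appear in the sketch, in line with the note that Mesos assigns ports automatically.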

Dispatcher


Introduction

Dispatcher

• A highly-available service for launching Flink clusters.

• A Mesos framework!

• Accessed via REST by the CLI

• DCOS compatibility:
  – HTTP-based
  – Accessible via the Admin Router
  – (future) JWT authentication

• Aligned with FLIP-6

[Diagram: four Mesos hosts (host1 to host4), each with slots A to D, running the Dispatcher, two Masters, and their TMs; the CLI connects to the Dispatcher]


Framework Hierarchy

Dispatcher (Cont’d)

• Nesting of frameworks is a common Mesos pattern. Here, Marathon launches the dispatcher, which launches the Flink Master Process, etc.

• Architecturally, it avoids a dependency on the Marathon API. For example, Aurora could be used here in place of Marathon.

[Diagram: framework hierarchy. Marathon runs the Dispatcher as a task, the Dispatcher runs the Master as a task, and the Master runs TMs as tasks]


Launching a Session

Dispatcher (Cont’d)

• Use: mesos-session.sh

• CLI uploads files to dispatcher via HTTP
  – Flink Configuration
  – Supplemental files (--ship)
  – Keytabs
  – Certificates

• Dispatcher adds additional elements:
  – Configuration
    › ZooKeeper Namespace
  – Flink JAR
  – …

[Diagram: four Mesos hosts (host1 to host4), each with slots A to D, running the Dispatcher, a Master, and TMs; the CLI reaches the Dispatcher over HTTP(S), with a second connection also labeled HTTP(S)]


Dispatcher Deployment Modes

Dispatcher (Cont’d)

• Dispatcher is usable in two ways

• Remote Mode:
  – Recommended for detached execution

• Local Mode:
  – Recommended for simple, interactive sessions (e.g. flink shell)

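[Diagram: two panels, Local Mode and Remote Mode. In Remote Mode the CLI connects over HTTP(S) to a Dispatcher running in the cluster alongside the Masters. In Local Mode the CLI and Dispatcher run together ("CLI + Dispatcher") and the Master runs in the cluster]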

Summary


Future Directions

• Dynamic Scaling
  – Add/remove Task Managers in response to scale changes over a job’s lifetime
  – Support Mesos maintenance procedures (e.g. inverse offers)

• Dispatcher Evolution (FLIP-6)
  – Generalize to support all deployment scenarios, unified CLI
  – Provide a centralized Web UI (incl. job history)
  – Authentication Support (e.g. OAuth 2.0)

• Docker Image Support
  – Tracking the “Mesos unified containerizer”

• Mesos Disk Support
  – Allocate multiple disks for Task Manager temp space
  – Scale up the I/O


Project Status

• Targeted for: Flink 1.2

• Contributors:
  – Eron Wright (Dell EMC)
  – Maximilian Michels (data Artisans)

• Design Doc:
  – Mesos Integration on Google Docs

• JIRAs:
  – FLINK-1984 – Integrate Flink with Apache Mesos

• Code:
  – https://github.com/EronWright/flink/tree/feature-FLINK-1984-T2
