pravega: streaming storage · linux foundation cncf –webinar april 2020. about the speaker...

64
Pravega: Rethinking Storage for Streams Flavio Junqueira [email protected] Pravega - http ://pravega.io Linux Foundation CNCF – Webinar April 2020

Upload: others

Post on 31-Dec-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega: Rethinking Storage for StreamsFlavio Junqueira

[email protected]

Pravega - http://pravega.io

Linux Foundation CNCF – Webinar

April 2020

Page 2: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

About the speaker

• Dell EMC• Senior Distinguished Engineer

• On Pravega since 2016

• Background• Distributed computing• Research: Microsoft, Yahoo!• Worked on various Apache projects• E.g., Apache ZooKeeper, Apache

BookKeeper

• Contact• Email: [email protected]• Twitter: @fpjunqueira

Pravega: Streaming Storage - http://pravega.io 2

Page 3: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Many sources continuously generating data

Pravega: Streaming Storage - http://pravega.io 3

Page 4: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Social networksOnline shopping

Server monitoring

Sensors (IoT)

Stream of user events• Status updates• Online transactions

Telemetry streams• CPU, memory, disk utilization

Stream of sensor events• Temperature samples• Radar and image

Unbounded data streams

Pravega: Streaming Storage - http://pravega.io 4

Page 5: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Landscape

Continuous data flows

Data Pipelines

StorageStream

Processor

• Sources of continuous data flows:• User status• Online transactions• Server telemetry data• Sensor samples• Connected cars• Drone videos

• Core elements• Storage (Pravega)• Stream processor (Apache Flink)

• Arbitrary Direct Acyclic Processing Graphs

• Visualization• Alerts• Insights• Recommendations• Actionable analytics

Storage Storage

StreamProcessor

StreamProcessor

StreamProcessor

Storage

Pravega: Streaming Storage - http://pravega.io 5

Page 6: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Video stream

Telemetry

Drone fleet

UNIFIEDANALYTICSPROCESSINGEST

Unbounded data streams

• From cattle health• To airplane inspection between flights

Pravega: Streaming Storage - http://pravega.io 6

Page 7: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

UNIFIED

ANALYTICS

UNIFIED

PROCESSING

INGESTION

ENGINE

Manufacturing flow

Decoiler Leveler

Lubricator

Press

Industrial Sensor Data Anomaly Detection

Pravega: Streaming Storage - http://pravega.io 7

Page 8: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Data streams – Sequential

EventsServer,Sensor,etc. e1e2e3e4e5e6e7e8ekek+1ek+2ek+3

HeadTail

Pravega: Streaming Storage - http://pravega.io 8

Page 9: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Data streams - Parallelism

EventsServers,Sensors,etc.

e1

e2

e3

e4

e5

e6

e7

e8

e10

e11

e14

e12

e13

e15

e16

e9e18

e17

e19

e20

e21

e22

e23

e24

e26

e28

e29

e27

e25ek

ek+1

ek+2

ek+3

ek+5

ek+4

ek+6

ek+7

ek+8

ek+9

ek+10 ek+8

HeadTail

Pravega: Streaming Storage - http://pravega.io 9

Page 10: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Data streams – Traffic fluctuation

Servers,Sensors,etc.

Load Increases

Load Drops

HeadTail

Events

e1

e2

e3

e4

e5

e6

e7

e8

e10

e11e14

e12

e13e15

e16

e9

e18

e17

e19

e20

e21

e22

e23e24

e26

e28

e29

e27

e25

ek

ek+1

ek+2

ek+3

ek+5

ek+4

ek+6

ek+7

ek+8

ek+9

ek+10 ek+8

e30

e33

e31

e32

e39

e34

e35

e37

e36

e38e41

e43

e40

e42

e44

Pravega: Streaming Storage - http://pravega.io 10

Page 11: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Data streams – Unbounded

Servers,Sensors,etc.

e1

e2

e3

e4

e5

e6

e7

e8

e10

e11e14

e12

e13e15

e16

e9

e18

e17

e19

e20

e21

e22

e23e24

e26

e28

e29

e27

e25

ek

ek+1

ek+2

ek+3

ek+5

ek+4

ek+6

ek+7

ek+8

ek+9

ek+10 ek+8

e30

e33

e31

e32

e39

e34

e35

e37

e36

e38e41

e43

e40

e42

e44

HeadTail

Fresh, recentlyingested

Older, historicaldata

Events

Ideally, no such a distinctionbetween fresh and historical

The Lambda way

Pravega: Streaming Storage - http://pravega.io 11

Page 12: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Data streams – Read scalability

HeadTail

e1

e2

e3

e4

e5

e6

e7

e8

e10

e11e14

e12

e13e15

e16

e9

e18

e17

e19

e20

e21

e22

e23e24

e26

e28

e29

e27

e25

ek

ek+1

ek+2

ek+3

ek+5

ek+4

ek+6

ek+7

ek+8

ek+9

ek+10 ek+8

e30

e33

e31

e32

e39

e34

e35

e37

e36

e38e41

e43

e40

e42

e44

Streamprocessor

Pravega: Streaming Storage - http://pravega.io 12

Page 13: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream in

Streaming Storage

• Unbounded data• Elastic• Consistent• Tailing and historical data analytics

• Cloud native

Stream out

Data stream: cloud-native, storage primitive

Pravega: Streaming Storage - http://pravega.io 13

Page 14: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Introducing Pravega

Pravega: Streaming Storage - http://pravega.io 14

Page 15: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream Segment

0001010011111101101001010010101001

HeadTail

• Storage unit• Append-only sequences of bytes• … bytes, not events, messages or

records• Flexible in the formats it can store• Events: expect serializer implementation

Pravega: Streaming Storage - http://pravega.io 15

Page 16: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream Segment – Parallelism

00011101010101001

00011101010101001

00011101010101001

• Sources append to segments in parallel• Routing keys map appends to segments

HeadTail

Pravega: Streaming Storage - http://pravega.io 16

Page 17: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream Segment – Scaling

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

Scaleup

Scaledown

HeadTail

• Degree of parallelism changes dynamically• Auto-scaling: reacts to workload changes

Pravega: Streaming Storage - http://pravega.io 17

Page 18: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream Segment - Transactions

00011101010101001

00011101010101001

00011101010101001

HeadTail

00011101010101001

00011101010101001

00011101010101001

Temporarytransactionsegments

Stream mainsegments

Merged upon commit

Pravega: Streaming Storage - http://pravega.io 18

Page 19: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream Segment - Transactions

00011101010101001

00011101010101001

00011101010101001

HeadTail

00011101010101001

00011101010101001

00011101010101001

Temporarytransactionsegments

Stream mainsegments

Discarded on abort

Pravega: Streaming Storage - http://pravega.io 19

Page 20: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Append 1 if offset is 0 - fails

Stream Segment –Appending conditionally

HeadTail

Append 0 if offset is 0 - succeeds

• Revisioned streams• Conditional appends• Compare expected and current revisions• Revision implementation uses segment

offset

• State synchronizer• Builds on revisioned streams• Replicated state machines• Optimistic concurrency

Pravega: Streaming Storage - http://pravega.io 20

Page 21: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Stream scaling(a.k.a, changing the degree of parallelism)

Pravega: Streaming Storage - http://pravega.io 21

Page 22: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

01000110

Scaling a stream

….. 01110110 01100001 01101100 01000110

• Stream has one segment

1

….. 01110110 01100001 01101100

• Seal current segment

• Create new ones

2

01000110

01000110

• Auto or manual scaling• Auto scaling

• Follows write workload• Say input load has increased• Increase the degree of

parallelism• Manual scaling

• API call

Pravega: Streaming Storage - http://pravega.io 22

Page 23: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Routing key space

0.0

1.0

Timet0

Segment 1

Pravega: Streaming Storage - http://pravega.io 23

Page 24: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Split

0.5

Segment 1

Segment 3

t0 t1

Routing key space

0.0

1.0

Time

0.5

Segment 2Segment 1

Pravega: Streaming Storage - http://pravega.io 24

Page 25: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Location 1

Location 2

- Keys are coordinates in a geo application- E.g., taxi rides

Split

0.5

Segment 1

Segment 3

t0 t1

Routing key space

0.0

1.0

Time

0.5

Segment 2Segment 1

Pravega: Streaming Storage - http://pravega.io 25

Page 26: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Split

0.75

Segment 4

Segment 5

t2

- Keys are coordinates in a geo application- E.g., taxi rides

Location 1

Location 2

Split

0.5

Segment 1

Segment 3

t0 t1

Routing key space

0.0

1.0

0.5

0.75

Time

Segment 2Segment 1

Pravega: Streaming Storage - http://pravega.io 26

Page 27: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Merge

0.75

t2

Back to cold

Split

0.75

Segment 4

Segment 5

t2

Split

0.5

Segment 1

Segment 3

t0 t1

Routing key space

0.0

1.0

0.5

0.75

Time

Segment 6Segment 2Segment 1

Pravega: Streaming Storage - http://pravega.io 27

Page 28: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Time

Key ranges are not statically assigned to segments

0.9 maps toSegment 1

0.9 maps toSegment 2

0.9 maps toSegment 4

0.9 maps toSegment 6

Merge

0.75Segment 6

t2

Split

0.75

Segment 4

Segment 5

t2

Split

0.5

Segment 1

Segment 2

Segment 3

t0 t1

Routing key space

0.0

1.0

Segment 1

0.5

0.75

0.9

Pravega: Streaming Storage - http://pravega.io 28

Page 29: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega: Streaming Storage - http://pravega.io 29

Page 30: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Daily CyclesPeak rate is 10x higher than lowest rate

4:00 AM

9:00 AM

NYC Yellow Taxi Trip Records, March 2015http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Page 31: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega Auto Scaling

Merge Split

Page 32: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega Architecture

Pravega: Streaming Storage - http://pravega.io 34

Page 33: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Ingesting to a stream

Append

Event writer

Event writer

PravegaIngest stream data

00011101010101001

00011101010101001

00011101010101001

Pravega: Streaming Storage - http://pravega.io 35

Tracks writerposition

Page 34: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Reading from a stream

AppendRead

Event writer

Event writer

Event reader

Event reader

Group• Split segment load• Load balance• Ability to grow and shrink

PravegaIngest stream data Process stream data

00011101010101001

00011101010101001

00011101010101001

Pravega: Streaming Storage - http://pravega.io 36

Page 35: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Reading from a stream

AppendRead

Event writer

Event writer

Event reader

Event reader

Group• Split segment load• Load balance• Ability to grow and shrink

PravegaIngest stream data Process stream data

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

Stream scaling

Pravega: Streaming Storage - http://pravega.io 37

Page 36: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Control and data planes

AppendRead

Event writer

Event writer

Event reader

Event reader

PravegaIngest stream data Process stream data

Controller(Stream metadata)

Controller(Stream metadata)

Controller(Stream metadata)

Segment store(Stream data)Segment store(Stream data)

Segment Store(Stream data)

Segment containers

Controller• Stream lifecycle• Transactions

Segment store• Segment lifecycle• Stores segment data

Group• Split segment load• Load balance• Ability to grow and shrink

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

Pravega: Streaming Storage - http://pravega.io 38

Page 37: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Tiered storage

AppendRead

Event writer

Event writer

Event reader

Event reader

PravegaIngest stream data Process stream data

Controller(Stream metadata)

Controller(Stream metadata)

Controller(Stream metadata)

Segment store(Stream data)Segment store(Stream data)

Segment Store(Stream data)

Segment containers

Controller• Stream lifecycle• Transactions

Segment store• Segment lifecycle• Stores segment data

Group• Split segment load• Load balance• Ability to grow and shrink

ApacheZooKeeper

Tier 1(Apache

BookKeeper)

Tier 2(File or Object)

Tiered storage• Tier 1: Apache BookKeeper• Tier 2: File or Object

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

00011101010101001

Pravega: Streaming Storage - http://pravega.io 39

Page 38: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Segment storeSegment

store

The write path

Event Stream Writer

Controller

1

2Apache

BookKeeper

Long-term storage

3

4

• Synchronous write• Temporarily stored• Truncated once flushed to

next storage tier• Optimized for low-latency

writes

• Asynchronous write• Permanently stored• Options: HDFS, NFS, Extended S3• High read/write throughput

Append bytes

Locate segment

Segment store

Pravega: Streaming Storage - http://pravega.io 41

Page 39: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Segment storeSegment

store

The read path

Event StreamReader

Controller

1

2Apache

BookKeeper

Long-term storage

3

• Used for recovery alone• Not used to serve reads

• Bytes read from memory• If not present, pull data from Tier 2

Read bytes

Locate segment

4 Bytes read

Segment store

Pravega: Streaming Storage - http://pravega.io 42

Page 40: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Connecting to Stream Processors

Pravega: Streaming Storage - http://pravega.io 43

Page 41: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Connectors

Writer

Writer

Writer

Writer

Source connector(uses Pravega as a source)

Reader

Reader

Reader

Reader

Sink connector(dumps data onto Pravega)

e.g., job output data

e.g., job input data

https://github.com/pravega/flink-connectors

Pravega: Streaming Storage - http://pravega.io 44

Page 42: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Existing connectors

• Apache Flink

• Apache Hadoop

• Logstash plugins

• Alpakka connector

• More to come…

Pravega: Streaming Storage - http://pravega.io 45

Page 43: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Existing connectors

• Apache Flink

• Apache Hadoop

• Logstash plugins

• Alpakka connector

• More to come…

Pravega: Streaming Storage - http://pravega.io 46

Page 44: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Queries

Applications

Devices

etc.

Database

Stream

File / Object

Storage

Stateful computations over streamsreal-time and historic

scalable, fault tolerant, fast, in-memory,event time, large state, exactly-once

Historic

Data

Streams

Application

Apache Flink in a nutshell

Pravega: Streaming Storage - http://pravega.io 47

Page 45: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Compute with Apache Flink

Pravega(Streaming Storage / PubSub / Messaging)

Applications

Sensor

APIs

Data / Event

Producers Consuming

Applications

Streaming Applications

Pravega: Streaming Storage - http://pravega.io 48

Page 46: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega

Pravega

Streaming Pipelines

Applications

Sensor

APIs

Pravega

Pravega

Pravega

Pravega: Streaming Storage - http://pravega.io 49

Page 47: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega

Flink reading from Pravega

Pravega: Streaming Storage - http://pravega.io 50

Page 48: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Reading via Reader Group

Task Manager

Task Manager

Task Manager

Task Manager

Pravega Stream

• Reader group automatically assigns and re-balances segments

• Hides complexity from app

• Stream scaling

Pravega

Reader

Segment 2

Segment 1

Segment 3

Segment 4

Flink Job

Source tasks

Pravega

Reader

Pravega: Streaming Storage - http://pravega.io 51

Page 49: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Upon a Flink checkpoint

Task Manager

Task Manager

Task Manager

Task Manager

Pravega Stream

Pravega

Reader

Segment 2

Segment 1

Segment 3

Segment 4

Flink Job

Source tasks

Pravega

Reader

Master• Initiates checkpoint• Invokes call on ReaderGroup API• Implementation of

MasterTriggerRestoreHook

1

Pravega: Streaming Storage - http://pravega.io 52

Page 50: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Upon a Flink checkpoint

Task Manager

Task Manager

Task Manager

Task Manager

Pravega Stream

Pravega

Reader

Segment 2

Segment 1

Segment 3

Segment 4

Flink Job

Source tasks

Pravega

Reader

Master

C

C

12

RevisionedStream

• Coordinate via state synchronizer• Readers emit checkpoint event

• Initiates checkpoint• Invokes call on ReaderGroup API• Implementation of

MasterTriggerRestoreHook

Pravega: Streaming Storage - http://pravega.io 53

Page 51: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Upon a Flink checkpoint

Task Manager

Task Manager

Task Manager

Task Manager

Pravega Stream

Pravega

Reader

Segment 2

Segment 1

Segment 3

Segment 4

Flink Job

Source tasks

Pravega

Reader

Master

C

C

12

RevisionedStream

• Coordinate via state synchronizer• Readers emit checkpoint event

3

• Receives checkpoint

Pravega: Streaming Storage - http://pravega.io 54

Page 52: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Creating a Pravega source

// create the Pravega source to read a stream of text

FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()

.withPravegaConfig(pravegaConfig)

.forStream(stream)

.withDeserializationSchema(PravegaSerialization.deserializationFor(String.class))

.build();

https://github.com/pravega/pravega-samples

Pravega: Streaming Storage - http://pravega.io 55

Page 53: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Creating a Pravega source (… and using it)

// create the Pravega source to read a stream of text

FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()

.withPravegaConfig(pravegaConfig)

.forStream(stream)

.withDeserializationSchema(PravegaSerialization.deserializationFor(String.class))

.build();

// count each word over a 10 second time period

DataStream<WordCount> dataStream = env.addSource(source).name("Pravega Stream")

.flatMap(new WordCountReader.Splitter())

.keyBy("word")

.timeWindow(Time.seconds(10))

.sum("count");

https://github.com/pravega/pravega-samples

Pravega: Streaming Storage - http://pravega.io 56

Page 54: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Flink writing to Pravega

Pravega

Pravega: Streaming Storage - http://pravega.io 57

Page 55: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Exactly-once with Transactions

Sink tasks

Flink Job

Pravega

Txnwrites

• Transactional writes for job output• Executes a 2PC to commit results• Option to not use transactions

• At-least-once semantics

Pravega: Streaming Storage - http://pravega.io 58

Page 56: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Exactly-once with Transactions

FlinkMaster

FlinkMaster

S

S

Pravega

Coordinates checkpointing

Start checkpoint (Prepare)

AckPrepare

Complete checkpoint

Commit txn

2-Phase commit protocol

S

S

Sink tasks Sink tasks

Pravega: Streaming Storage - http://pravega.io 59

Page 57: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Creating a Pravega sink

// create the Pravega sink to write a stream of text

FlinkPravegaWriter<String> writer = FlinkPravegaWriter.<String>builder()

.withPravegaConfig(pravegaConfig)

.forStream(stream)

.withEventRouter(new EventRouter())

.withWriterMode(PravegaWriterMode.EXACTLY_ONCE)

.withSerializationSchema(PravegaSerialization.serializationFor(String.class))

.build();

dataStream.addSink(writer).name("Pravega Stream");

https://github.com/pravega/pravega-samples

Pravega: Streaming Storage - http://pravega.io 60

Page 58: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Pravega on Kubernetes

Pravega: Streaming Storage - http://pravega.io 61

Page 59: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Kubernetes operators

• Operator• Custom controller for managing the lifecycle of an application

• Automation• Deployment

• Configuration

• Scaling

• Upgrades

• Monitoring

• etc.

Pravega: Streaming Storage - http://pravega.io 62

Page 60: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Kubernetes operators

• Pravega Operator

• BookKeeper Operator

• ZooKeeper Operator

https://github.com/pravega/pravega-operator

https://github.com/pravega/bookkeeper-operator

https://github.com/pravega/zookeeper-operator

Pravega: Streaming Storage - http://pravega.io 63

Page 61: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Wrap up

Pravega: Streaming Storage - http://pravega.io 64

Page 62: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Conclusion

• Multitude of sources continuously generating data• Pravega: storage for data streams

• Unbounded• Elastic• Consistent

• Connects to streams processors• E.g., Apache Flink• Exactly-once end to end

• Open source• Apache License v2• Hosted on github• Looking for a home for incubation

Pravega: Streaming Storage - http://pravega.io 65

Page 63: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

Getting started

• Check the web site: http://pravega.io

• Check the organization and repository:https://github.com/pravega/pravega

• Run Pravega standalone./gradlew :standalone:startStandalone

• Try some sampleshttps://github.com/pravega/pravega-samples

• Try on Kuberneteshttps://github.com/pravega/pravega-operator

• Feel free to provide feedback and contribute

Pravega: Streaming Storage - http://pravega.io 66

Page 64: Pravega: Streaming Storage · Linux Foundation CNCF –Webinar April 2020. About the speaker •Dell EMC •Senior Distinguished Engineer •On Pravega since 2016 •Background •Distributed

http://pravega.iohttp://github.com/pravega/pravegahttp://flink.apache.orghttps://github.com/pravega/flink-connectorshttps://github.com/pravega/pravega-operatorhttps://github.com/pravega/bookkeeper-operatorhttps://github.com/pravega/zookeeper-operator

E-mail: [email protected]: @fpjunqueira

Questions?

Pravega’s Web site

Pravega’s codeApache Flink’s sitePravega-Flink connector

Operators

Pravega: Streaming Storage - http://pravega.io 67