pravega: streaming storage · linux foundation cncf –webinar april 2020. about the speaker...
TRANSCRIPT
Pravega: Rethinking Storage for StreamsFlavio Junqueira
Pravega - http://pravega.io
Linux Foundation CNCF – Webinar
April 2020
About the speaker
• Dell EMC• Senior Distinguished Engineer
• On Pravega since 2016
• Background• Distributed computing• Research: Microsoft, Yahoo!• Worked on various Apache projects• E.g., Apache ZooKeeper, Apache
BookKeeper
• Contact• Email: [email protected]• Twitter: @fpjunqueira
Pravega: Streaming Storage - http://pravega.io 2
Many sources continuously generating data
Pravega: Streaming Storage - http://pravega.io 3
Social networksOnline shopping
Server monitoring
Sensors (IoT)
Stream of user events• Status updates• Online transactions
Telemetry streams• CPU, memory, disk utilization
Stream of sensor events• Temperature samples• Radar and image
Unbounded data streams
Pravega: Streaming Storage - http://pravega.io 4
Landscape
Continuous data flows
Data Pipelines
StorageStream
Processor
• Sources of continuous data flows:• User status• Online transactions• Server telemetry data• Sensor samples• Connected cars• Drone videos
• Core elements• Storage (Pravega)• Stream processor (Apache Flink)
• Arbitrary Direct Acyclic Processing Graphs
• Visualization• Alerts• Insights• Recommendations• Actionable analytics
Storage Storage
StreamProcessor
StreamProcessor
StreamProcessor
Storage
Pravega: Streaming Storage - http://pravega.io 5
Video stream
Telemetry
Drone fleet
UNIFIEDANALYTICSPROCESSINGEST
Unbounded data streams
• From cattle health• To airplane inspection between flights
Pravega: Streaming Storage - http://pravega.io 6
UNIFIED
ANALYTICS
UNIFIED
PROCESSING
INGESTION
ENGINE
Manufacturing flow
Decoiler Leveler
Lubricator
Press
Industrial Sensor Data Anomaly Detection
Pravega: Streaming Storage - http://pravega.io 7
Data streams – Sequential
EventsServer,Sensor,etc. e1e2e3e4e5e6e7e8ekek+1ek+2ek+3
HeadTail
Pravega: Streaming Storage - http://pravega.io 8
Data streams - Parallelism
EventsServers,Sensors,etc.
e1
e2
e3
e4
e5
e6
e7
e8
e10
e11
e14
e12
e13
e15
e16
e9e18
e17
e19
e20
e21
e22
e23
e24
e26
e28
e29
e27
e25ek
ek+1
ek+2
ek+3
ek+5
ek+4
ek+6
ek+7
ek+8
ek+9
ek+10 ek+8
HeadTail
Pravega: Streaming Storage - http://pravega.io 9
Data streams – Traffic fluctuation
Servers,Sensors,etc.
Load Increases
Load Drops
HeadTail
Events
e1
e2
e3
e4
e5
e6
e7
e8
e10
e11e14
e12
e13e15
e16
e9
e18
e17
e19
e20
e21
e22
e23e24
e26
e28
e29
e27
e25
ek
ek+1
ek+2
ek+3
ek+5
ek+4
ek+6
ek+7
ek+8
ek+9
ek+10 ek+8
e30
e33
e31
e32
e39
e34
e35
e37
e36
e38e41
e43
e40
e42
e44
Pravega: Streaming Storage - http://pravega.io 10
Data streams – Unbounded
Servers,Sensors,etc.
e1
e2
e3
e4
e5
e6
e7
e8
e10
e11e14
e12
e13e15
e16
e9
e18
e17
e19
e20
e21
e22
e23e24
e26
e28
e29
e27
e25
ek
ek+1
ek+2
ek+3
ek+5
ek+4
ek+6
ek+7
ek+8
ek+9
ek+10 ek+8
e30
e33
e31
e32
e39
e34
e35
e37
e36
e38e41
e43
e40
e42
e44
HeadTail
Fresh, recentlyingested
Older, historicaldata
Events
Ideally, no such a distinctionbetween fresh and historical
The Lambda way
Pravega: Streaming Storage - http://pravega.io 11
Data streams – Read scalability
HeadTail
e1
e2
e3
e4
e5
e6
e7
e8
e10
e11e14
e12
e13e15
e16
e9
e18
e17
e19
e20
e21
e22
e23e24
e26
e28
e29
e27
e25
ek
ek+1
ek+2
ek+3
ek+5
ek+4
ek+6
ek+7
ek+8
ek+9
ek+10 ek+8
e30
e33
e31
e32
e39
e34
e35
e37
e36
e38e41
e43
e40
e42
e44
Streamprocessor
Pravega: Streaming Storage - http://pravega.io 12
Stream in
Streaming Storage
• Unbounded data• Elastic• Consistent• Tailing and historical data analytics
• Cloud native
Stream out
Data stream: cloud-native, storage primitive
Pravega: Streaming Storage - http://pravega.io 13
Introducing Pravega
Pravega: Streaming Storage - http://pravega.io 14
Stream Segment
0001010011111101101001010010101001
HeadTail
• Storage unit• Append-only sequences of bytes• … bytes, not events, messages or
records• Flexible in the formats it can store• Events: expect serializer implementation
Pravega: Streaming Storage - http://pravega.io 15
Stream Segment – Parallelism
00011101010101001
00011101010101001
00011101010101001
• Sources append to segments in parallel• Routing keys map appends to segments
HeadTail
Pravega: Streaming Storage - http://pravega.io 16
Stream Segment – Scaling
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
Scaleup
Scaledown
HeadTail
• Degree of parallelism changes dynamically• Auto-scaling: reacts to workload changes
Pravega: Streaming Storage - http://pravega.io 17
Stream Segment - Transactions
00011101010101001
00011101010101001
00011101010101001
HeadTail
00011101010101001
00011101010101001
00011101010101001
Temporarytransactionsegments
Stream mainsegments
Merged upon commit
Pravega: Streaming Storage - http://pravega.io 18
Stream Segment - Transactions
00011101010101001
00011101010101001
00011101010101001
HeadTail
00011101010101001
00011101010101001
00011101010101001
Temporarytransactionsegments
Stream mainsegments
Discarded on abort
Pravega: Streaming Storage - http://pravega.io 19
Append 1 if offset is 0 - fails
Stream Segment –Appending conditionally
HeadTail
Append 0 if offset is 0 - succeeds
• Revisioned streams• Conditional appends• Compare expected and current revisions• Revision implementation uses segment
offset
• State synchronizer• Builds on revisioned streams• Replicated state machines• Optimistic concurrency
Pravega: Streaming Storage - http://pravega.io 20
Stream scaling(a.k.a, changing the degree of parallelism)
Pravega: Streaming Storage - http://pravega.io 21
01000110
Scaling a stream
….. 01110110 01100001 01101100 01000110
• Stream has one segment
1
….. 01110110 01100001 01101100
• Seal current segment
• Create new ones
2
01000110
01000110
• Auto or manual scaling• Auto scaling
• Follows write workload• Say input load has increased• Increase the degree of
parallelism• Manual scaling
• API call
Pravega: Streaming Storage - http://pravega.io 22
Routing key space
0.0
1.0
Timet0
Segment 1
Pravega: Streaming Storage - http://pravega.io 23
Split
0.5
Segment 1
Segment 3
t0 t1
Routing key space
0.0
1.0
Time
0.5
Segment 2Segment 1
Pravega: Streaming Storage - http://pravega.io 24
Location 1
Location 2
- Keys are coordinates in a geo application- E.g., taxi rides
Split
0.5
Segment 1
Segment 3
t0 t1
Routing key space
0.0
1.0
Time
0.5
Segment 2Segment 1
Pravega: Streaming Storage - http://pravega.io 25
Split
0.75
Segment 4
Segment 5
t2
- Keys are coordinates in a geo application- E.g., taxi rides
Location 1
Location 2
Split
0.5
Segment 1
Segment 3
t0 t1
Routing key space
0.0
1.0
0.5
0.75
Time
Segment 2Segment 1
Pravega: Streaming Storage - http://pravega.io 26
Merge
0.75
t2
Back to cold
Split
0.75
Segment 4
Segment 5
t2
Split
0.5
Segment 1
Segment 3
t0 t1
Routing key space
0.0
1.0
0.5
0.75
Time
Segment 6Segment 2Segment 1
Pravega: Streaming Storage - http://pravega.io 27
Time
Key ranges are not statically assigned to segments
0.9 maps toSegment 1
0.9 maps toSegment 2
0.9 maps toSegment 4
0.9 maps toSegment 6
Merge
0.75Segment 6
t2
Split
0.75
Segment 4
Segment 5
t2
Split
0.5
Segment 1
Segment 2
Segment 3
t0 t1
Routing key space
0.0
1.0
Segment 1
0.5
0.75
0.9
Pravega: Streaming Storage - http://pravega.io 28
Pravega: Streaming Storage - http://pravega.io 29
Daily CyclesPeak rate is 10x higher than lowest rate
4:00 AM
9:00 AM
NYC Yellow Taxi Trip Records, March 2015http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Pravega Auto Scaling
Merge Split
Pravega Architecture
Pravega: Streaming Storage - http://pravega.io 34
Ingesting to a stream
Append
Event writer
Event writer
PravegaIngest stream data
00011101010101001
00011101010101001
00011101010101001
Pravega: Streaming Storage - http://pravega.io 35
Tracks writerposition
Reading from a stream
AppendRead
Event writer
Event writer
Event reader
Event reader
Group• Split segment load• Load balance• Ability to grow and shrink
PravegaIngest stream data Process stream data
00011101010101001
00011101010101001
00011101010101001
Pravega: Streaming Storage - http://pravega.io 36
Reading from a stream
AppendRead
Event writer
Event writer
Event reader
Event reader
Group• Split segment load• Load balance• Ability to grow and shrink
PravegaIngest stream data Process stream data
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
Stream scaling
Pravega: Streaming Storage - http://pravega.io 37
Control and data planes
AppendRead
Event writer
Event writer
Event reader
Event reader
PravegaIngest stream data Process stream data
Controller(Stream metadata)
Controller(Stream metadata)
Controller(Stream metadata)
Segment store(Stream data)Segment store(Stream data)
Segment Store(Stream data)
Segment containers
Controller• Stream lifecycle• Transactions
Segment store• Segment lifecycle• Stores segment data
Group• Split segment load• Load balance• Ability to grow and shrink
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
Pravega: Streaming Storage - http://pravega.io 38
Tiered storage
AppendRead
Event writer
Event writer
Event reader
Event reader
PravegaIngest stream data Process stream data
Controller(Stream metadata)
Controller(Stream metadata)
Controller(Stream metadata)
Segment store(Stream data)Segment store(Stream data)
Segment Store(Stream data)
Segment containers
Controller• Stream lifecycle• Transactions
Segment store• Segment lifecycle• Stores segment data
Group• Split segment load• Load balance• Ability to grow and shrink
ApacheZooKeeper
Tier 1(Apache
BookKeeper)
Tier 2(File or Object)
Tiered storage• Tier 1: Apache BookKeeper• Tier 2: File or Object
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
00011101010101001
Pravega: Streaming Storage - http://pravega.io 39
Segment storeSegment
store
The write path
Event Stream Writer
Controller
1
2Apache
BookKeeper
Long-term storage
3
4
• Synchronous write• Temporarily stored• Truncated once flushed to
next storage tier• Optimized for low-latency
writes
• Asynchronous write• Permanently stored• Options: HDFS, NFS, Extended S3• High read/write throughput
Append bytes
Locate segment
Segment store
Pravega: Streaming Storage - http://pravega.io 41
Segment storeSegment
store
The read path
Event StreamReader
Controller
1
2Apache
BookKeeper
Long-term storage
3
• Used for recovery alone• Not used to serve reads
• Bytes read from memory• If not present, pull data from Tier 2
Read bytes
Locate segment
4 Bytes read
Segment store
Pravega: Streaming Storage - http://pravega.io 42
Connecting to Stream Processors
Pravega: Streaming Storage - http://pravega.io 43
Connectors
Writer
Writer
Writer
Writer
Source connector(uses Pravega as a source)
Reader
Reader
Reader
Reader
Sink connector(dumps data onto Pravega)
e.g., job output data
e.g., job input data
https://github.com/pravega/flink-connectors
Pravega: Streaming Storage - http://pravega.io 44
Existing connectors
• Apache Flink
• Apache Hadoop
• Logstash plugins
• Alpakka connector
• More to come…
Pravega: Streaming Storage - http://pravega.io 45
Existing connectors
• Apache Flink
• Apache Hadoop
• Logstash plugins
• Alpakka connector
• More to come…
Pravega: Streaming Storage - http://pravega.io 46
Queries
Applications
Devices
etc.
Database
Stream
File / Object
Storage
Stateful computations over streamsreal-time and historic
scalable, fault tolerant, fast, in-memory,event time, large state, exactly-once
Historic
Data
Streams
Application
Apache Flink in a nutshell
Pravega: Streaming Storage - http://pravega.io 47
Compute with Apache Flink
Pravega(Streaming Storage / PubSub / Messaging)
Applications
Sensor
APIs
Data / Event
Producers Consuming
Applications
Streaming Applications
Pravega: Streaming Storage - http://pravega.io 48
Pravega
Pravega
Streaming Pipelines
Applications
Sensor
APIs
Pravega
Pravega
Pravega
Pravega: Streaming Storage - http://pravega.io 49
Pravega
Flink reading from Pravega
Pravega: Streaming Storage - http://pravega.io 50
Reading via Reader Group
Task Manager
Task Manager
Task Manager
Task Manager
Pravega Stream
• Reader group automatically assigns and re-balances segments
• Hides complexity from app
• Stream scaling
Pravega
Reader
Segment 2
Segment 1
Segment 3
Segment 4
Flink Job
Source tasks
Pravega
Reader
Pravega: Streaming Storage - http://pravega.io 51
Upon a Flink checkpoint
Task Manager
Task Manager
Task Manager
Task Manager
Pravega Stream
Pravega
Reader
Segment 2
Segment 1
Segment 3
Segment 4
Flink Job
Source tasks
Pravega
Reader
Master• Initiates checkpoint• Invokes call on ReaderGroup API• Implementation of
MasterTriggerRestoreHook
1
Pravega: Streaming Storage - http://pravega.io 52
Upon a Flink checkpoint
Task Manager
Task Manager
Task Manager
Task Manager
Pravega Stream
Pravega
Reader
Segment 2
Segment 1
Segment 3
Segment 4
Flink Job
Source tasks
Pravega
Reader
Master
C
C
12
RevisionedStream
• Coordinate via state synchronizer• Readers emit checkpoint event
• Initiates checkpoint• Invokes call on ReaderGroup API• Implementation of
MasterTriggerRestoreHook
Pravega: Streaming Storage - http://pravega.io 53
Upon a Flink checkpoint
Task Manager
Task Manager
Task Manager
Task Manager
Pravega Stream
Pravega
Reader
Segment 2
Segment 1
Segment 3
Segment 4
Flink Job
Source tasks
Pravega
Reader
Master
C
C
12
RevisionedStream
• Coordinate via state synchronizer• Readers emit checkpoint event
3
• Receives checkpoint
Pravega: Streaming Storage - http://pravega.io 54
Creating a Pravega source
// create the Pravega source to read a stream of text
FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(PravegaSerialization.deserializationFor(String.class))
.build();
https://github.com/pravega/pravega-samples
Pravega: Streaming Storage - http://pravega.io 55
Creating a Pravega source (… and using it)
// create the Pravega source to read a stream of text
FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(PravegaSerialization.deserializationFor(String.class))
.build();
// count each word over a 10 second time period
DataStream<WordCount> dataStream = env.addSource(source).name("Pravega Stream")
.flatMap(new WordCountReader.Splitter())
.keyBy("word")
.timeWindow(Time.seconds(10))
.sum("count");
https://github.com/pravega/pravega-samples
Pravega: Streaming Storage - http://pravega.io 56
Flink writing to Pravega
Pravega
Pravega: Streaming Storage - http://pravega.io 57
Exactly-once with Transactions
Sink tasks
Flink Job
Pravega
Txnwrites
• Transactional writes for job output• Executes a 2PC to commit results• Option to not use transactions
• At-least-once semantics
Pravega: Streaming Storage - http://pravega.io 58
Exactly-once with Transactions
FlinkMaster
FlinkMaster
S
S
Pravega
Coordinates checkpointing
Start checkpoint (Prepare)
AckPrepare
Complete checkpoint
Commit txn
2-Phase commit protocol
S
S
Sink tasks Sink tasks
Pravega: Streaming Storage - http://pravega.io 59
Creating a Pravega sink
// create the Pravega sink to write a stream of text
FlinkPravegaWriter<String> writer = FlinkPravegaWriter.<String>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withEventRouter(new EventRouter())
.withWriterMode(PravegaWriterMode.EXACTLY_ONCE)
.withSerializationSchema(PravegaSerialization.serializationFor(String.class))
.build();
dataStream.addSink(writer).name("Pravega Stream");
https://github.com/pravega/pravega-samples
Pravega: Streaming Storage - http://pravega.io 60
Pravega on Kubernetes
Pravega: Streaming Storage - http://pravega.io 61
Kubernetes operators
• Operator• Custom controller for managing the lifecycle of an application
• Automation• Deployment
• Configuration
• Scaling
• Upgrades
• Monitoring
• etc.
Pravega: Streaming Storage - http://pravega.io 62
Kubernetes operators
• Pravega Operator
• BookKeeper Operator
• ZooKeeper Operator
https://github.com/pravega/pravega-operator
https://github.com/pravega/bookkeeper-operator
https://github.com/pravega/zookeeper-operator
Pravega: Streaming Storage - http://pravega.io 63
Wrap up
Pravega: Streaming Storage - http://pravega.io 64
Conclusion
• Multitude of sources continuously generating data• Pravega: storage for data streams
• Unbounded• Elastic• Consistent
• Connects to streams processors• E.g., Apache Flink• Exactly-once end to end
• Open source• Apache License v2• Hosted on github• Looking for a home for incubation
Pravega: Streaming Storage - http://pravega.io 65
Getting started
• Check the web site: http://pravega.io
• Check the organization and repository:https://github.com/pravega/pravega
• Run Pravega standalone./gradlew :standalone:startStandalone
• Try some sampleshttps://github.com/pravega/pravega-samples
• Try on Kuberneteshttps://github.com/pravega/pravega-operator
• Feel free to provide feedback and contribute
Pravega: Streaming Storage - http://pravega.io 66
http://pravega.iohttp://github.com/pravega/pravegahttp://flink.apache.orghttps://github.com/pravega/flink-connectorshttps://github.com/pravega/pravega-operatorhttps://github.com/pravega/bookkeeper-operatorhttps://github.com/pravega/zookeeper-operator
E-mail: [email protected]: @fpjunqueira
Questions?
Pravega’s Web site
Pravega’s codeApache Flink’s sitePravega-Flink connector
Operators
Pravega: Streaming Storage - http://pravega.io 67