apache flink
Post on 18-Feb-2017
Apache Flink Introduction
By: Ahmed Nader
2
Agenda
• What’s Apache Flink?
• Deeper into Flink
• Quick Start and Configuration
• Get your hands dirty
• Tips and some useful links
• References
3
What’s Apache Flink?
• Open source platform for distributed stream and batch processing.
• Large-scale data processing engine.
• A true streaming engine: it does not cut the stream into batches.
• Flink has 2 APIs: DataStream and DataSet.
4
DataStream API
• Represents a continuous stream of data of a certain type.
• Operations are applied to each element of the stream or to windows of elements.
Source → Data Stream → Operation → Data Stream → Sink
5
DataStream API Example
Live stock feed: Apple 235, Google 516, Microsoft 124
• Alert if Microsoft > 120
• Write event to database
• Sum every 10 seconds, then alert if sum > 10000
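The stock-feed example can be sketched in plain Java to show what the per-element rule and the 10-second window compute. This is not the Flink API: the class `StockFeedSketch`, the `Tick` type, and the timestamps are hypothetical names and values chosen for illustration, and the feed is a finite list rather than a live stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the stock-feed rules; NOT the Flink API.
class StockFeedSketch {

    // One feed event: symbol, price, and an (assumed) event time in milliseconds.
    static final class Tick {
        final String symbol;
        final int price;
        final long timestampMillis;

        Tick(String symbol, int price, long timestampMillis) {
            this.symbol = symbol;
            this.price = price;
            this.timestampMillis = timestampMillis;
        }
    }

    // Per-element rule: one alert for every tick of `symbol` priced above `threshold`.
    static List<String> priceAlerts(List<Tick> feed, String symbol, int threshold) {
        List<String> alerts = new ArrayList<>();
        for (Tick t : feed) {
            if (t.symbol.equals(symbol) && t.price > threshold) {
                alerts.add(symbol + " > " + threshold + ": " + t.price);
            }
        }
        return alerts;
    }

    // Windowed rule: sum prices per tumbling window; the map key is the window start time.
    static Map<Long, Integer> sumPerWindow(List<Tick> feed, long windowMillis) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Tick t : feed) {
            long windowStart = (t.timestampMillis / windowMillis) * windowMillis;
            sums.merge(windowStart, t.price, Integer::sum);
        }
        return sums;
    }

    // Start times of the windows whose sum exceeds `limit`.
    static List<Long> windowsOver(Map<Long, Integer> sums, int limit) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : sums.entrySet()) {
            if (e.getValue() > limit) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tick> feed = Arrays.asList(
                new Tick("Apple", 235, 1_000L),
                new Tick("Google", 516, 2_000L),
                new Tick("Microsoft", 124, 3_000L),
                new Tick("Apple", 235, 12_000L));

        System.out.println(priceAlerts(feed, "Microsoft", 120)); // [Microsoft > 120: 124]
        Map<Long, Integer> sums = sumPerWindow(feed, 10_000L);
        System.out.println(sums);                                // {0=875, 10000=235}
        System.out.println(windowsOver(sums, 10_000));           // []
    }
}
```

In a real streaming job the windows would be emitted as time advances; here they are computed in one pass only to make the grouping visible.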
6
DataSet API
• Used for batch processing.
• Batch is treated as a special case of stream processing, where finite data sources are just streams that happen to end.
• Offers a dedicated API, with machine learning and graph processing libraries.
Source → Data Set → Operation → Data Set → Sink
7
DataSet API Example
Map/Reduce paradigm: Map → Reduce
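As a rough illustration of the Map/Reduce paradigm, here is a word count in plain Java. It does not use the Flink DataSet API; the class and method names (`MapReduceSketch`, `mapPhase`, `reducePhase`) and the input lines are made up for this sketch. The map phase emits a (word, 1) pair per word, and the reduce phase sums the values per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of map/reduce word count; NOT the Flink DataSet API.
class MapReduceSketch {

    // Map phase: emit one (word, 1) pair for every word in every line.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Reduce phase: group the pairs by word and sum their counts.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to stream or to batch");
        System.out.println(reducePhase(mapPhase(lines)));
        // {batch=1, be=2, not=1, or=2, stream=1, to=4}
    }
}
```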
8
Flink Stack
9
Analyzing the Flink stack
• Streaming dataflow runtime, which interprets every program as a dataflow graph.
• Some libraries on top of the DataStream and DataSet APIs, such as:
  • Table: enables SQL-like queries.
  • Gelly: graph processing, to transform and traverse graphs in a distributed fashion.
  • ML: has a couple of machine learning algorithms, yet is still too basic.
  • CEP: easily detects complex events in a data stream, which helps you get hold of what’s really important in your data.
10
Deeper into Flink
Data Sources
From an input file
From a socket
From a collection
11
Deeper into Flink
Data Sinks
Write to a CSV File
Write to a socket
Print on the terminal
12
Deeper into Flink
Data Transformations (for the DataStream API):
• Map: takes 1 element and produces 1 element.
• FlatMap: takes 1 element and produces 0 or more elements.
• Filter: evaluates a boolean function for each element and retains those for which it returns true.
• KeyBy: partitions a stream into disjoint partitions, each holding the elements that share the same key.
• Window: groups stream events according to some characteristic, e.g. data that arrived in the last 5 seconds.
• Union, Join, Split, Select…
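The first four transformations have close analogues in Java's own `java.util.stream` library, which makes their semantics easy to try on a small finite collection. The sketch below uses only the JDK, not Flink; `TransformationsSketch` and its method names are hypothetical, and `groupingBy` only approximates KeyBy (it builds a finished map, while KeyBy partitions a running stream). Window is left out because it has no direct counterpart on a finite list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// JDK-only illustration of map / flatMap / filter / keyBy semantics; NOT the Flink API.
class TransformationsSketch {

    // Map: exactly one output (the length) per input line.
    static List<Integer> lengths(List<String> lines) {
        return lines.stream().map(String::length).collect(Collectors.toList());
    }

    // FlatMap: zero or more outputs (words) per input line; an empty line yields none.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.isEmpty() ? Stream.<String>empty() : Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // Filter: keep only the elements for which the predicate returns true.
    static List<String> longWords(List<String> words, int minLength) {
        return words.stream().filter(w -> w.length() >= minLength).collect(Collectors.toList());
    }

    // KeyBy analogue: split the words into disjoint groups that share a key (the first letter).
    static Map<Character, List<String>> byFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("flink streams", "", "fast");
        List<String> words = words(lines);
        System.out.println(lengths(lines));       // [13, 0, 4]
        System.out.println(words);                // [flink, streams, fast]
        System.out.println(longWords(words, 5));  // [flink, streams]
        System.out.println(byFirstLetter(words)); // {f=[flink, fast], s=[streams]}
    }
}
```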
13
Deeper into Flink
Interesting use cases:
• Processing a Twitter feed; one good application is collecting statistics on that feed.
  see: http://blog.brakmic.com/stream-processing-with-apache-flink/
• Identifying popular locations where people arrive by taxi, by applying filter and map functions to a data stream of taxi ride records and then getting the most popular places for, say, the last 15 minutes.
  see: https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
14
Setup
Prerequisites:
• Java 7.x or higher.
• Maven 3.0.4 or higher.
Start a new Flink project using Maven by running the following command in the terminal:
    mvn archetype:generate \
        -DarchetypeGroupId=org.apache.flink \
        -DarchetypeArtifactId=flink-quickstart-java \
        -DarchetypeVersion=1.0.1
OR add Flink to an existing project:
see: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
15
Get your hands dirty:
16
Get your hands dirty:
17
Get your hands dirty:
Execution
• Local/debugging
• Cluster
• Command Line Interface
• Web interface
See: https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
18
Tips and some useful links:
• Subscribe to the mailing list by sending an empty email to user-subscribe@flink.apache.org.
• Clone the Flink project on GitHub for more examples.
• There’s a free course by DataArtisans, see: http://dataartisans.github.io/flink-training/index.html
Here are some other useful links too:
• http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
• https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
• https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
19
References
• http://blog.brakmic.com/stream-processing-with-apache-flink/
• http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
• https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
• https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
• http://dataartisans.github.io/flink-training/index.html
• https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
20
Thanks! Any questions?