ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta aRchItectures
Collaborative Project
D 6.3
Project Presentation 03.02.2014 – 30.04.2014
Contractual Date of Delivery: 30.04.2014
Actual Date of Delivery: 30.04.2014
Author(s): Michael Mock
Institution: Poslovna Inteligencija d.o.o.
Workpackage: WP6
Security: PU
Nature: O
Total number of pages: 37
Project coordinator name: Michael Mock
Project coordinator organisation name:
Fraunhofer Institute for Intelligent Analysis
and Information Systems (IAIS)
Revision: 1
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Abstract:
This document is the FERARI deliverable of WP6 for the first review period
(03.02.2014 – 30.04.2014). The project presentation gives an overall overview of the FERARI project including the goals of the project, project partners and workpackage organization.
Revision history
Administration Status
Project acronym: FERARI ID: ICT-FP7-619491
Document identifier: D 6.3 Project Presentation (03.02.2014 – 30.04.2014)
Leading Partner: Poslovna Inteligencija d.o.o.
Report version: 1
Report preparation date: 10.04.2014
Classification: PU
Nature: OTHER
Author(s) and contributors: Michael Mock (FHG)
Status:
- Plan
- Draft
- Working
- Final
x Submitted
Copyright
This report is © FERARI Consortium 2014. Its duplication is restricted to the personal use
within the consortium and the European Commission. www.ferari-project.eu
Flexible Event pRocessing for big dAta
aRchItectures (FERARI)
Introduction
FERARI – An FP7 EC ICT project
Grant Agreement No. 619491
STREP Specific Targeted Research Project
Grown out of FP7 basic research project LIFT (FET Open)
FERARI was ranked 6th of 33 proposals within objective 4.2 Scalable Data Analysis
February 2014 – January 2017, funding: 2.95 million EUR
FERARI - Consortium
Fraunhofer IAIS (FHG)
Technion (Technion) + Haifa University
Technical University of Crete (TUC)
IBM (IBM)
T-Hrvatski Telekom (HT)
Poslovna Inteligencija (PI)
Fraunhofer IAIS: Intelligent Analysis and Information Systems
270 people: scientists, project engineers, technical and
administrative staff
Located on Fraunhofer Campus Schloss
Birlinghoven/Sankt Augustin near Bonn
Joint research groups and cooperation with
Lead researcher: Dr. Michael Mock
„From sensor data to business intelligence, from media
analysis to visual information systems: Our research
allows companies to do more with data“
Institute Director: Prof. Dr. Stefan Wrobel
Technical University of Crete
Founded in 1977 in Chania, Crete
120 faculty members, ~175 adjunct faculty and lab personnel
2900 undergraduate and 550 graduate students
Around 200 research programs, total budget 20.5 million
ECE department: 25 faculty, ~200 undergrad students/year
Research organized in 10 research laboratories
SoftNet Lab (headed by Prof. Garofalakis): Focus on Big Data Analytics, Data
Streams, Cloud Computing
Lead researcher: Prof. Minos Garofalakis
TECHNION – Israel Institute of Technology and University of Haifa
Located in Haifa, oldest university in Israel (1912)
600 Faculty Members (3 Nobel Laureates)
Computer Science: 50 faculty members, 1500 Students
Lead researcher: Prof. Assaf Schuster, head of Technion
Computer Engineering Center, focus on Distributed and
Scalable Data Mining, Monitoring Distributed Data
Streams, Big Data Technologies and Analytics
and Dr. Daniel Keren, Department of Computer Science
at Haifa University
The Technion-Israel Institute of Technology is a major source of the innovation and brainpower that
drives the Israeli economy, and a key to Israel’s reputation as the world’s “Start-Up Nation.” Its
three Nobel Prize winners exemplify academic excellence.
IBM Research – Haifa
350 people: scientists, software engineers, subject
matter experts
Located in Haifa, Israel on the campus of Haifa
University
The largest IBM Research Lab outside the USA
Lead researcher: Fabiana Fournier
IBM Research is the innovation branch of IBM. Its motto is “the world is our lab”.
T-Hrvatski Telekom: Communication, Information & Entertainment, Always & Everywhere
T-HT Group is the leading provider of telecommunications
services in Croatia and the sole company to offer the full
range of these services: it combines the services of fixed
and mobile telephony, data transmission, Internet and
international communications
T-HT’s strategy: GROW - COMPETE – TRANSFORM
Key figures for 2012:
Revenues: 991 million EUR
EBITDA margin: 45.3%
5780 employees
Lead representative: Maja Vekić-Vedrina
“T-HT - to be the online company and to power the online
society and digital economy in Croatia and the Region”
Poslovna inteligencija: Leader in business intelligence
90 employees: 90% project engineers, technical and business consultants; 10% sales and administration
HQ in Zagreb, Croatia, offices in UK, Slovenia, Serbia,
Bosnia and Herzegovina and Montenegro
Extensive experience in Telecommunication industry
and in R&D Big Data projects
Lead representative: Dražen Oreščanin
„We provide our customers with the best possible service in strategic consultancy and in
implementation of intelligent information systems for decision support, thereby helping them to
create new values and identify new business opportunities.“
Motivation
A number of recent technological developments have started to change our world forever:
• the rise of the internet
• the ever growing amount of activities in social networks
• the widespread adoption of smart phones and other mobile devices
• the instrumentation of the world with sensors.
This is accompanied by dropping prices for computers, networks, and storage.
Objectives
Provide support for large scale services by making the sensor layer a first class citizen in Big Data architectures.
Provide support for Complex Event Processing technology for business users in Big Data architectures.
Provide support for integrating machine learning tasks in the architecture.
Provide support for flexible and adaptive analytics workflows.
Exemplify the potential of the new architecture in the telecommunication and the cloud domain.
Use cases
Monitoring a smart energy grid.
Analysing the traffic state of a large city using car-to-car communication.
Monitoring the quality of a telecommunication network.
Detecting latent failures in a large cloud of thousands of machines.
Inspecting potentially fraudulent credit card transactions in real-time and blocking these transactions when necessary.
Application Scenarios
Mobile Phone Fraud Detection
Detecting mobile phone fraud by analysing usage patterns
Reliably detect mobile phone fraud
Avoid financial losses due to fraud
Scalability to millions of events/sec for simple filtering; lower rates for more complex analysis, depending on the complexity of the task
Cloud Health Monitoring
Cloud data centre activity log monitoring
Possibility to replace time-interval-based maintenance with event-based maintenance
Avoiding service down-time
Negotiation Question: Data Size
Quantity of data
The average monthly number of rated call detail records is > 650 million, and the total monthly quantity of data is > 300 GB. For raw call details, the monthly quantities are significantly higher: > 5500 million records and > 10 TB of data.
Cloud services are among the most recently implemented services at Hrvatski Telekom. The number of cloud servers and of customers using cloud services is still fairly low, but both are increasing rapidly. Currently, the cloud consists of 6 machines, which produce > 40 GB of data per month in total. During the course of this project, we expect that the cloud might double its current size.
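To put the call detail record figures into streaming terms, the monthly totals can be converted into average per-second rates. The sketch below is a back-of-envelope calculation, not a measurement: it assumes a 30-day month and decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), and the function names are illustrative only.

```python
# Back-of-envelope rates from the monthly call detail record figures.
# The source states lower bounds (">"), so the rates are lower bounds too;
# the per-record sizes are only rough ratios of two lower bounds.
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def per_second(records_per_month: float) -> float:
    """Average number of records per second over the assumed month."""
    return records_per_month / SECONDS_PER_MONTH


def bytes_per_record(total_bytes: float, records: float) -> float:
    """Average record size in bytes."""
    return total_bytes / records


rated_records, rated_bytes = 650e6, 300e9   # > 650 million records, > 300 GB
raw_records, raw_bytes = 5500e6, 10e12      # > 5500 million records, > 10 TB

rated_rate = per_second(rated_records)
raw_rate = per_second(raw_records)
rated_size = bytes_per_record(rated_bytes, rated_records)
raw_size = bytes_per_record(raw_bytes, raw_records)

print(f"rated records: {rated_rate:,.0f}/s, about {rated_size:,.0f} bytes each")
print(f"raw records:   {raw_rate:,.0f}/s, about {raw_size:,.0f} bytes each")
```

Under these assumptions the averages are on the order of 250 rated records and 2,100 raw records per second. Peak rates are not given in the source and are likely to be higher than the average.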
FERARI success criteria
The project’s success will be rigorously measured by the following validation criteria:
Communication reduction with respect to global/state-of-the-art solutions.
A second quantitative validation criterion is processing time relative to the size of the data.
A third criterion is, for monitoring applications, the number of false alarms.
Number of domains to which the approach can be deployed. A key to this is the variety aspect enabled by Distributed Complex Event Processing.
Flexibility. The system will be designed such that it can adapt to new, unforeseen circumstances and can be easily consumable.
Workpackages
Work Plan
Phase 1 (M1 – M12)
- Use case definition
- Component definition
- Architecture definition
Phase 2 (M13 – M24)
- Component refinement
- First use case prototype implementation
- First architecture implementation
Phase 3 (M25 – M34)
- Demonstration and evaluation of the impact of the methods developed in this project
Workpackage Structure
* WPs 6 and 7, which will interact with all WPs for dissemination and management tasks, have been left out to increase readability. The general flow of dependencies is top-down from the use cases to the architecture and methodological work. Architecture and methods interact iteratively, since there are many technical and methodological dependencies.
WP1 – Use Cases
WP2 – Architecture
WP3 – Communication Efficient, Low-Latency Methods
WP4 – Flexible Event Processing
WP5 – Robust Distributed Stream Monitoring
FERARI - Workpackages
[Diagram labels: Software Platform, Prototype, Stream processing, Communication efficient processing, Complex event processing, provides]
WP1: Application Scenarios, Test bed, Prototype
Objectives:
Selecting and defining the application scenarios fraud mining and cloud health monitoring
Definition of testing & evaluation criteria for the end users at HT
Setting up of a test bed both at HT and at the project partners’ local sites
Implementation and evaluation of scenarios in a prototype to demonstrate the advantage of FERARI with respect to the state of the art as well as to demonstrate its business value
WP2: Big Data Streaming Architecture & Technology Integration
Objectives:
Define a Big Data architecture that makes the sensor layer a first class citizen of the architecture,
Define a data and control flow that can implement a push based approach, so that processing can be partially done in situ,
Provide methods for robust distributed stream processing including online machine learning
Implement the architecture as a software platform (open source).
Architecture Diagram of FERARI
Event processing deals with these functions:
• get events from sources (event producers).
• route these events, filter them, normalize or otherwise transform them, aggregate them, detect patterns over multiple events (event processing agents).
• transfer events as alerts to a human or as a trigger to an autonomous adaptation system (event consumers).
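The three roles listed above (producers, event processing agents, consumers) can be illustrated with a small pipeline. This is a minimal sketch under stated assumptions, not the FERARI platform or its CEP language: the `Event` fields, the function names and the toy pattern (a source that produces two "call" events above a threshold) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str    # hypothetical field: which producer emitted the event
    kind: str      # hypothetical field: event type, e.g. "call"
    value: float   # hypothetical field: payload, e.g. a call cost


def producer(raw: Iterable[Tuple[str, str, float]]) -> Iterator[Event]:
    """Event producer: turns raw tuples into Event objects."""
    for source, kind, value in raw:
        yield Event(source, kind, value)


def filter_agent(events: Iterable[Event], kind: str) -> Iterator[Event]:
    """Event processing agent: keeps only events of one kind."""
    return (e for e in events if e.kind == kind)


def pattern_agent(events: Iterable[Event], threshold: float,
                  count: int) -> Iterator[Event]:
    """Event processing agent: emits an alert once a source has produced
    `count` events above `threshold` (a toy pattern over multiple events)."""
    hits: Dict[str, int] = {}
    for e in events:
        if e.value > threshold:
            hits[e.source] = hits.get(e.source, 0) + 1
            if hits[e.source] == count:
                yield Event(e.source, "alert", float(count))


def consumer(events: Iterable[Event], sink: Callable[[Event], None]) -> None:
    """Event consumer: hands each derived event to a sink (human or system)."""
    for e in events:
        sink(e)


raw = [("A", "call", 5.0), ("A", "sms", 9.0), ("A", "call", 12.0),
       ("B", "call", 11.0), ("A", "call", 15.0)]
alerts: List[Event] = []
pipeline = pattern_agent(filter_agent(producer(raw), "call"),
                         threshold=10.0, count=2)
consumer(pipeline, alerts.append)
print(alerts)
```

Generators keep each stage lazy, so events flow through the agents one at a time, which loosely mirrors a streaming setting. In this run only source "A" reaches two qualifying events, so a single alert is delivered to the consumer.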
WP2: Big Data Streaming Architecture & Technology Integration - TASKS
The tangible output of WP2 will be the definition of the software big-data architecture allowing for the integration of components for complex event processing, in-situ processing and robust distributed stream processing including online machine learning. In addition, the architecture will be provided as a software platform.
Interdependencies between WP1 & WP2
WP2: Software Platform
Open source
General purpose for communication efficient big-data stream analysis algorithms
Flexible event processing
Components as libraries
Interfaces to plug in concrete algorithms (learning, monitoring)
In-stream learning
CEP Language
[Diagram labels: Software Platform, Prototype, Plugin concrete algorithms]
WP3: Communication Efficient, Low-Latency Methods
Objectives:
develop in-situ processing methods that go beyond current methods
develop new algorithms that are able to efficiently detect granular events
identify and explore the right level of in-situ processing for scalability issues
In-Situ Processing (LIFT)
Coordinator monitors global threshold (example: all nodes of a cloud work in a “healthy” state)
Sensor nodes monitor their local Safe Zone in situ
Alarm message only if local Safe Zone is violated
Global condition / reference point
Local condition: Safe Zone
Resolution protocol (after violation)
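The safe-zone idea can be made concrete with a small simulation. The sketch below is an illustration under simplifying assumptions, not the LIFT algorithm itself: the global condition is "the average of all local values stays at or below a threshold", each node's safe zone is the check "reference point plus my drift since the last synchronisation stays at or below the threshold", and the resolution protocol simply collects all values and re-synchronises. The class and method names are hypothetical.

```python
from typing import List


class Node:
    """A sensor node that checks its local safe zone in situ."""

    def __init__(self, value: float) -> None:
        self.value = value        # current local measurement
        self.ref_value = value    # local value at the last synchronisation
        self.global_ref = value   # reference point, set by the coordinator

    def sync(self, global_ref: float) -> None:
        """Adopt a new reference point and restart drift tracking."""
        self.ref_value = self.value
        self.global_ref = global_ref

    def update(self, value: float, threshold: float) -> bool:
        """Store a measurement; return True if the safe zone is violated."""
        self.value = value
        drift = self.value - self.ref_value
        return self.global_ref + drift > threshold


class Coordinator:
    """Monitors whether the global average exceeds the threshold."""

    def __init__(self, nodes: List[Node], threshold: float) -> None:
        self.nodes = nodes
        self.threshold = threshold
        self.messages = 0         # alarm and resolution messages received
        self._sync()

    def _sync(self) -> float:
        """Recompute the reference point and push it to all nodes."""
        avg = sum(n.value for n in self.nodes) / len(self.nodes)
        for n in self.nodes:
            n.sync(avg)
        return avg

    def observe(self, index: int, value: float) -> bool:
        """Simulation driver: deliver a measurement to one node and return
        True if the global condition is violated."""
        if not self.nodes[index].update(value, self.threshold):
            return False          # inside the safe zone: no communication
        # One alarm plus one value from each of the other nodes.
        self.messages += len(self.nodes)
        return self._sync() > self.threshold


nodes = [Node(1.0), Node(1.0), Node(1.0)]
coordinator = Coordinator(nodes, threshold=5.0)
updates = [(0, 2.0), (1, 2.0), (2, 2.0), (0, 3.0), (1, 3.0), (2, 3.0), (0, 12.0)]
results = [coordinator.observe(i, v) for i, v in updates]
print(results, coordinator.messages)
```

The local check is sound for this setting: the current average equals the reference point plus the average drift, so if every node's "reference plus drift" is within the threshold, the average is too. In the run above only the last of seven measurements triggers communication, whereas forwarding every measurement to the coordinator would send one message per measurement.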
WP4: Flexible Event Processing
Objectives:
develop a Complex Event Processing model and methodology suitable for specification, implementation, and maintenance of event-driven applications
Providing semantics for specifying event patterns
Providing an end-user consumable framework for flexibly specifying event processing systems
Providing modules for generation of an event processing network implementation and optimization plan that allows distributed in situ monitoring of complex event patterns
WP5: Robust Distributed Stream Monitoring
Objectives:
develop methods for robust distributed stream monitoring
exploit online machine learning methods to adapt the FERARI data/control flow to unforeseen circumstances
Provide support for integrating machine learning into the architecture.
Accounting for uncertainty in the architecture
Mobility Monitoring using stationary sensors
Each sensor computes a (linear counting) sketch of Bluetooth addresses in sensor range
Sketch is a bit-array of fixed length
Provide set of mobility mining primitives
count distinct
union
intersection
Simple LIFT Example
[Diagram: Coordinator with sensors Si and Sj and their sketches sk(R_Si) and sk(R_Sj)]
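The three primitives can be sketched with a linear counting bit array. The code below is a minimal illustration, not the project's implementation: the hash function (SHA-256), the array length of 4096 bits and the class name are assumptions. Count distinct uses the standard linear counting estimate n ≈ -m ln(z/m) for m bits of which z are zero, union is a bitwise OR, and the intersection is estimated by inclusion-exclusion, which is one common choice and not stated in the source.

```python
import hashlib
import math


class LinearCountingSketch:
    """Fixed-length bit array; each item sets one hashed bit."""

    def __init__(self, num_bits: int = 4096, bits: int = 0) -> None:
        self.num_bits = num_bits
        self.bits = bits  # a Python int used as the bit array

    def add(self, item: str) -> None:
        """Hash the item to one position and set that bit."""
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits |= 1 << index

    def count_distinct(self) -> float:
        """Linear counting estimate: -m * ln(fraction of zero bits)."""
        zeros = self.num_bits - bin(self.bits).count("1")
        if zeros == 0:
            raise ValueError("sketch is saturated; use a longer bit array")
        return -self.num_bits * math.log(zeros / self.num_bits)

    def union(self, other: "LinearCountingSketch") -> "LinearCountingSketch":
        """Sketch of the union of both item sets (bitwise OR)."""
        if self.num_bits != other.num_bits:
            raise ValueError("sketches must have the same length")
        return LinearCountingSketch(self.num_bits, self.bits | other.bits)


def intersection_estimate(a: LinearCountingSketch,
                          b: LinearCountingSketch) -> float:
    """Inclusion-exclusion estimate: |A| + |B| - |A union B|."""
    return a.count_distinct() + b.count_distinct() - a.union(b).count_distinct()


# Hypothetical device identifiers standing in for Bluetooth addresses.
sensor_i, sensor_j = LinearCountingSketch(), LinearCountingSketch()
for k in range(0, 300):
    sensor_i.add(f"device-{k}")
for k in range(200, 500):
    sensor_j.add(f"device-{k}")

est_union = sensor_i.union(sensor_j).count_distinct()
est_common = intersection_estimate(sensor_i, sensor_j)
print(f"union: {est_union:.0f}, seen by both sensors: {est_common:.0f}")
```

Here the true values are 500 distinct devices overall and 100 seen by both sensors; the estimates should be close to these but not exact. Each sensor only needs to send its fixed-length bit array to the coordinator, regardless of how many addresses it has observed.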
WP6: Dissemination & Exploitation
Objectives:
Disseminating the FERARI theoretic framework to the scientific community of data mining and distributed systems.
Outlining the methodological and technical superiority of the proposed solution compared to other approaches to distributed monitoring
Dissemination to high-profile early adopters within the scope of the application scenarios
WP7: Coordination
Objectives:
Establishment of a strong project management scheme
Successful achievement of the project objectives on time and within budget
Generation of synergies amongst the project members
Continuous monitoring of the project’s progress and timely initiation of corrective actions (if needed)
Coordination of the continuous process aiming to transfer the knowledge generated to the relevant scientific communities
List of Deliverables
Deliverable No. | Deliverable name | WP No. | Nature | Dissemination level | Due date
1.1 | Application Scenario Description and Requirement Analysis | 1 | R | PU | M12
1.2 | Final Application Scenarios and Description of Test Environment | 1 | R | PU | M24
1.3 | Application Scenario & Prototype Report | 1 | R | PU | M36
2.1 | Architecture Definition | 2 | R | PU | M12
2.2 | System Prototype | 2 | R | PU | M24
2.3 | Final Prototype | 2 | R | PU | M36
3.1 | Requirements and State of the Art Overview on In-Situ Methods | 3 | R | PU | M12
3.2 | Development of Algorithms Based on In-Situ, Low-Latency Methods | 3 | R | PU | M24
3.3 | Implementation and Evaluation of In-Situ, Low-Latency Algorithms | 3 | R | PU | M36
4.1 | Requirements and State of the Art Overview on Flexible Event Processing | 4 | R | PU | M12
4.2 | Goal-Driven Model and Methodology for Specification of Event Processing Applications | 4 | R | PU | M24
4.3 | Automatic Generation of Annotated Event Processing Network from the Goal-Driven Model | 4 | R | PU | M36
5.1 | Requirements and State of the Art Overview on Robust Stream Monitoring | 5 | R | PU | M12
5.2 | Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 5 | R | PU | M24
5.3 | Implementation of Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity | 5 | R | PU | M36
6.1 | Project Fact Sheet | 6 | O | PU | M3
6.2 | Project Web Site | 6 | O | PU | M3
6.3 | Project Presentation | 6 | O | PU | M3
6.4 | Project Workshop, Seminar and Training Course | 6 | R | PU | M30
6.5 | First Draft of Exploitation Plan | 6 | R | CO | M24
6.6 | Exploitation and Dissemination Plan | 6 | R | CO | M36
7.1 | Quality Assurance Plan | 7 | R | PU | M6
7.2 | 1st Annual Project Report | 7 | R | CO | M12
7.3 | 2nd Annual Project Report | 7 | R | CO | M24
7.4 | Final Project Report | 7 | R | CO | M36
Each WP-Leader is responsible for the deliverables of his or her WP – more details in the
Summary
The goal of the FERARI project is to pave the way for efficient, real-time Big Data technologies of the future.
It will enable business users to express complex analytics tasks through a high-level declarative language that supports distributed Complex Event Processing and sophisticated machine learning operators as an integral part of the system architecture.
Effective, real-time execution at scale will be achieved by making the sensor layer a first-class citizen in distributed streaming architectures and leveraging in-situ data processing as a first (and, in the long run, the only realistic) choice for realizing planetary-scale Big Data systems.
http://www.ferari-project.eu