Frankfurt Big Data Lab & Refugee Project


Upload: goethe-univeristy

Post on 12-Feb-2017


TRANSCRIPT

Page 1: Frankfurt Big Data Lab & Refugee Project

http://www.bigdata.uni-frankfurt.de/

Welcome to the Frankfurt Big Data Lab!


Page 2: Frankfurt Big Data Lab & Refugee Project

Mission

The objective of the Big Data Laboratory is to carry out research in the domains of big data and data analytics from the perspective of information systems and computer science.

Our approach is based on the interdisciplinary link between data management technologies and analytics.

The lab is located in Frankfurt, the financial metropolis of central Europe, and aims to be a source of knowledge and expertise for both research and industry applications.

Frankfurt Big Data Lab: The DATA REFUGEES Project

Page 3: Frankfurt Big Data Lab & Refugee Project

Team

• Prof. Dott. Ing. Roberto V. Zicari

• Dr. Karsten Tolle, Lab Director

• Hee Eun Kim, PhD Student

• Todor Ivanov, PhD Student

• Marten Rosselli, PhD Student. Affiliations: Goethe University / Accenture

• Sven Rill, PhD Student. Affiliation: Hof University of Applied Sciences

• Rahul Soni, PhD Student. Affiliations: Goethe University / Accenture

• Concha Sanchez-Ocaña, Project Manager. DBIS, Goethe University Frankfurt

• Raik Niemann, PhD Student. Affiliation: Hof University of Applied Sciences

Page 4: Frankfurt Big Data Lab & Refugee Project

Teaching

Page 5: Frankfurt Big Data Lab & Refugee Project

Research Areas

Our lab is currently active in the following research areas:

1. Big Data Management Technologies

2. Data Analytics / Data Science

3. Graph Databases / Linked Open Data (LOD)

4. Big Data for Common Good

Page 6: Frankfurt Big Data Lab & Refugee Project

1. Big Data Management Technologies

Our work is concentrated on the evaluation and optimization of:

• Operational data stores that allow flexible schemas

• Big Data management and analytical platforms (Hadoop, Spark, etc.)

• Complex distributed storage and processing architectures

• Big Data benchmarks

Page 7: Frankfurt Big Data Lab & Refugee Project

1. Big Data Management Technologies (cont.)

Benchmarking Big Data platforms for performance, scalability, elasticity, fault tolerance, and more.

Benchmarks used

• Yahoo! Cloud Serving Benchmark (YCSB): evaluates the performance (read/write workloads) of NoSQL stores like Cassandra.

• HiBench: 10 workloads for evaluating the Hadoop platform in terms of speed, throughput, HDFS bandwidth, system resource utilization and machine learning algorithms.

• BigBench: application-level benchmark consisting of 30 queries implemented in Hive, based on the TPC-DS benchmark.

• TPCx-HS: the first standard Big Data benchmark for Hadoop, based on the TeraSort workload.
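
The read/write workloads mentioned for NoSQL benchmarking can be illustrated with a minimal sketch. This is not YCSB itself: the in-memory dictionary only stands in for a NoSQL store, and `run_workload`, `read_fraction`, `key_space` and the operation counts are hypothetical names and values chosen for illustration.

```python
import random
import time


def run_workload(store, num_ops, read_fraction, key_space, seed=0):
    """Run a mixed read/write workload and return (reads, writes, ops_per_sec)."""
    rng = random.Random(seed)  # fixed seed keeps the operation mix repeatable
    reads = writes = 0
    start = time.perf_counter()
    for _ in range(num_ops):
        key = f"user{rng.randrange(key_space)}"
        if rng.random() < read_fraction:
            store.get(key)  # read: the key may or may not exist yet
            reads += 1
        else:
            store[key] = rng.random()  # write: insert or update the record
            writes += 1
    elapsed = time.perf_counter() - start
    throughput = num_ops / elapsed if elapsed > 0 else float("inf")
    return reads, writes, throughput


if __name__ == "__main__":
    # Hypothetical stand-in for a NoSQL store such as Cassandra.
    store = {}
    for label, fraction in [("read-heavy", 0.95), ("balanced", 0.5)]:
        reads, writes, ops = run_workload(store, 10_000, fraction, key_space=1_000)
        print(f"{label}: {reads} reads, {writes} writes, {ops:.0f} ops/s")
```

A real benchmark run would replace the dictionary with a client for the store under test and would also record latency distributions, not only throughput.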

Page 8: Frankfurt Big Data Lab & Refugee Project

1. Big Data Management Technologies (cont.)

Platforms used

NoSQL

• Evaluated Cassandra / DataStax Enterprise (9-node Cassandra cluster) with HiBench and the Yahoo! Cloud Serving Benchmark.

Hadoop Ecosystem

• Evaluated the performance of different virtualized Hadoop cluster configurations on top of VMware vSphere using the Big Data Extensions (Project Serengeti).

• Benchmarking the Cloudera Hadoop Distribution 5.2 (4-node Hadoop cluster) with the TPCx-HS benchmark.

• Experimenting with the BigBench benchmark using Hive and Spark SQL.

In-Memory Databases

• Evaluation of a Big Data architecture based on SAP HANA and Cloudera Hadoop for different use cases and analytical workloads.

Page 9: Frankfurt Big Data Lab & Refugee Project

1. Big Data Management Technologies (cont.)

Relevant Publications

• Performance Evaluation of Enterprise Big Data Platforms with HiBench (In 9th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2015), August 20-22, Helsinki, Finland)

• Benchmarking the Availability and Fault Tolerance of Cassandra (In 6th Workshop on Big Data Benchmarking (6th WBDB), June 16-17, 2015, Toronto, Canada)

• Performance Evaluation of Spark SQL using BigBench (In 6th Workshop on Big Data Benchmarking (6th WBDB), June 16-17, 2015, Toronto, Canada)

• Benchmarking DataStax Enterprise/Cassandra with HiBench (Technical Report No. 2014-2)

• Performance Evaluation of Virtualized Hadoop Clusters (Technical Report No. 2014-1)

• Benchmarking Virtualized Hadoop Clusters (In proceedings of the Big Data Benchmarking - 5th International Workshop, WBDB 2014, Potsdam, Germany, August 5-6, 2014, Revised Selected Papers)

The full list of publications is available online: http://www.bigdata.uni-frankfurt.de/publications/

Page 10: Frankfurt Big Data Lab & Refugee Project

1. Big Data Management Technologies (cont.)

Member of the Standard Performance Evaluation Corporation (SPEC)

SPEC is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. The RG Big Data Working Group is a forum for individuals and organizations interested in the big data benchmarking topic.

List of all 52 Member Organizations

Advanced Strategic Technology LLC

ARM

bankmark UG

Barcelona Supercomputing Center

Charles University

Cisco Systems

Cloudera, Inc

Compilaflows

Delft University of Technology

Dell

fortiss GmbH

Friedrich-Alexander-University Erlangen-Nuremberg

Goethe University Frankfurt

Hewlett-Packard

Huawei

IBM

Imperial College London

Indian Institute of Technology, Bombay

Institute for Information Industry, Taiwan

Institute of Communication and Computer Systems/NTUA

Intel

Karlsruhe Institute of Technology

Kiel University

Microsoft

MIOsoft Corporation

NICTA

NovaTec GmbH

Oracle

Purdue University

Red Hat

RWTH Aachen University

Salesforce.com

San Diego Supercomputer Center

San Francisco State University

SAP AG

Siemens Corporation

Technische Universität Darmstadt

The MITRE Corporation

Umea University

University of Alberta

University of Coimbra

University of Florence

University of Lugano

University of Minnesota

University of North Florida

University of Paderborn

University of Pavia

University of Stuttgart *

University of Texas at Austin

University of Wuerzburg

VMware

Page 11: Frankfurt Big Data Lab & Refugee Project

2. Graph Databases / Linked Open Data (LOD)

Linked Open Data are structured data published online so that computers can access them automatically. Combining different sources brings together large amounts of similar or related structured data that can then be queried and analyzed.

This research area is closely related to the Semantic Web and its standards stack, such as RDF* and OWL. We are interested in analyzing and benchmarking existing storage solutions and in applying the idea of LOD to selected applications.

Our current activities are:

• Benchmarking (Berlin SPARQL Benchmark, BSBM)

• AFE-Web: cooperation project with the Römisch-Germanische Kommission (RGK). Antike Fundmünzen in Europa (AFE-WEB) is a web-based database for recording and publishing coin finds.

• European Coin Find Network

• Nomisma.org (Karsten Tolle is a member of the steering committee)
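
The idea of combining sources of structured data and querying the result can be sketched in a few lines. The example below is only an illustration in plain Python, not RDF or SPARQL tooling: the triples, the identifiers such as `find:1` and `coin:42`, and the helper names `match` and `materials_found_in` are all invented for this sketch.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


# Two hypothetical sources describing coin finds (invented example data).
source_a = [
    ("find:1", "foundIn", "place:Frankfurt"),
    ("find:1", "hasCoin", "coin:42"),
]
source_b = [
    ("coin:42", "material", "silver"),
    ("find:2", "foundIn", "place:Mainz"),
]

# Combining sources is a union of their triples.
graph = set(source_a) | set(source_b)


def materials_found_in(place: str) -> List[str]:
    """Join three patterns: find -> place, find -> coin, coin -> material."""
    results = []
    for find, _, _ in match(graph, p="foundIn", o=place):
        for _, _, coin in match(graph, s=find, p="hasCoin"):
            for _, _, material in match(graph, s=coin, p="material"):
                results.append(material)
    return sorted(results)


if __name__ == "__main__":
    print(materials_found_in("place:Frankfurt"))
```

The query can only be answered once both sources are combined: the find and its coin come from one source, the coin's material from the other.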

Page 12: Frankfurt Big Data Lab & Refugee Project

3. Data Analytics / Data Science

• Twenty students from UC Berkeley, Stanford University and Goethe University Frankfurt committed to participating in challenges on obesity, heart/lung failure and mood disorder.

• Frankfurt Big Data Lab uses data acquisition and data blending to improve the quality of analytics.

[Figures: number of patients per postal area (visualization of the given patients' data); Twitter data retrieved by the keyword "obesity"; Twitter data retrieved from the state of Pennsylvania]

Page 13: Frankfurt Big Data Lab & Refugee Project

4. Big Data for Social Good

What can be done in the international research community to make sure that some of the most brilliant big data use cases also have an impact on social issues?

Our motivation is to encourage the international research community to work on Big Data problems that have a potential positive social impact for mankind.

World map of scientific collaborations, 2014

Page 14: Frankfurt Big Data Lab & Refugee Project

4. Big Data for Social Good Projects

The DATA REFUGEES PROJECT

• 1.1 million refugees and migrants registered in Germany in 2015

• The number of refugees arriving in Frankfurt is increasing from 170 to 250 per week

We will explore the question of whether and how data can be used to create:

• Data products to support inclusion in the city of Frankfurt

• Insights that can be escalated to the decision makers in the city of Frankfurt

Page 15: Frankfurt Big Data Lab & Refugee Project

4. Big Data for Social Good Projects

The DATA REFUGEES PROJECT - METHODOLOGY

We will gather data from various sources available in Frankfurt. The challenge is that the flow of information is not, by nature, well organized.

Techniques

• Data integration: Collect data from multiple sources, including changes of format and cleanup of redundant or useless entries. The outcome is a standardized, unified table.

• Data fusion: Integrate imperfect data sources overlapping over a small group of objects.

• Data blending: Allow sources to be imperfect, incomplete, and overlapping over a few objects or none at all, requiring inspired guesses and generalizations.

• Design Thinking: Create and evaluate new ideas through a human-centric approach to problem solving.
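
The data integration step described above (format changes, cleanup of redundant entries, one unified table) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the two sources, their field names, the example records and the helpers `normalize` and `integrate` are all hypothetical.

```python
def normalize(record, name_field, district_field, source):
    """Map a source-specific record onto the shared schema."""
    return {
        "name": record[name_field].strip().lower(),
        "district": record[district_field].strip().title(),
        "source": source,
    }


def integrate(*sources):
    """Unify formats and drop redundant entries across all sources."""
    unified = {}
    for rows, name_field, district_field, source in sources:
        for row in rows:
            rec = normalize(row, name_field, district_field, source)
            key = (rec["name"], rec["district"])
            unified.setdefault(key, rec)  # keep the first copy of each entry
    return list(unified.values())


# Hypothetical sources listing activities, each with its own field names.
source_a = [
    {"Name": " German Course ", "Stadtteil": "bockenheim"},
    {"Name": "Sports Club", "Stadtteil": "Gallus"},
]
source_b = [
    {"title": "german course", "area": "Bockenheim"},  # duplicate of source A
    {"title": "Homework Help", "area": "nordend"},
]

table = integrate(
    (source_a, "Name", "Stadtteil", "A"),
    (source_b, "title", "area", "B"),
)

if __name__ == "__main__":
    for row in table:
        print(row)
```

Data fusion and data blending go further than this exact-match deduplication, since they must reconcile records that overlap only partially or not at all.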

Page 16: Frankfurt Big Data Lab & Refugee Project

4. Big Data for Social Good Projects

We are here!

Not much data available!

We aim to demonstrate that it is feasible to use available data to help, support and possibly guide the process of inclusion for refugees in the city of Frankfurt.

Page 17: Frankfurt Big Data Lab & Refugee Project

4. Big Data for Social Good Projects

• LOOKING FOR DATA!

• Define an open source tool and methodology for managing the volunteers and activities and propose it to the AWO organization. THIS DATA IS NOT COLLECTED. http://lale.help https://volunteer-planner.org

• Retrieve the Twitter data about refugees in Frankfurt.

• Coaching two refugees to develop a mobile app.

• We aim in particular to help refugee children. We hope to be able to help them in the process of inclusion in our society.

Page 18: Frankfurt Big Data Lab & Refugee Project

4. Big Data for Social Good Projects

The DATA REFUGEES PROJECT NEEDS

We encourage developers to contact us if they wish to contribute to this project!

Contact Person: Concha Sanchez-Ocaña, Project Manager. [email protected]

Organizations involved

• Frankfurt Big Data Lab, Goethe University Frankfurt

• School of Business, University of Applied Sciences Mainz

• Research Center SAFE (funded by the State of Hessen initiative for research, LOEWE)

• Betriebliche Kommunikationssysteme und IT-Security, University of Applied Sciences Offenburg

THANK YOU!!