Frankfurt Big Data Lab & Refugee Project
TRANSCRIPT
http://www.bigdata.uni-frankfurt.de/
Welcome to the Frankfurt Big Data Lab!
Mission
The objective of the Big Data Laboratory is to carry out research in the domains of big data and data analytics from the perspective of information systems and computer science.
Our approach is based on the interdisciplinary link between data management technologies and analytics.
The lab is located in Frankfurt, the financial metropolis of central Europe, and aims to be a source of knowledge and expertise for both research and industry applications.
Frankfurt Big Data Lab: The DATA REFUGEES Project
Team
Prof. Dott. Ing. Roberto V. Zicari
Dr. Karsten Tolle, Lab Director
Hee Eun Kim, PhD Student
Todor Ivanov, PhD Student
Marten Rosselli, PhD Student (Affiliations: Goethe University / Accenture)
Sven Rill, PhD Student (Affiliation: Hof University of Applied Sciences)
Rahul Soni, PhD Student (Affiliations: Goethe University / Accenture)
Concha Sanchez-Ocaña, Project Manager (DBIS, Goethe University Frankfurt)
Raik Niemann, PhD Student (Affiliation: Hof University of Applied Sciences)
Research Areas
Our lab is currently active in the following research areas:
1. Big Data Management Technologies
2. Data Analytics / Data Science
3. Graph Databases / Linked Open Data (LOD)
4. Big Data for Common Good
1. Big Data Management Technologies
Our work concentrates on the evaluation and optimization of:
• Operational data stores that allow flexible schemas
• Big Data management and analytical platforms (Hadoop, Spark, etc.)
• Complex distributed storage and processing architectures
• Big Data benchmarks
1. Big Data Management Technologies (cont.)
Benchmarks used
Benchmarking Big Data platforms for performance, scalability, elasticity, fault tolerance, …
• Yahoo! Cloud Serving Benchmark (YCSB): evaluates the performance (read/write workloads) of NoSQL stores like Cassandra.
• HiBench: 10 workloads for evaluating the Hadoop platform in terms of speed, throughput, HDFS bandwidth, system resource utilization and machine learning algorithms.
• BigBench: application-level benchmark consisting of 30 queries implemented in Hive, based on the TPC-DS benchmark.
• TPCx-HS: the first standard Big Data benchmark for Hadoop, based on the TeraSort workload.
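To make the notion of a read/write workload concrete, here is a minimal Python sketch of a mixed workload with a throughput measurement. It is not YCSB and does not talk to Cassandra: the `run_workload` function, the key naming scheme and the in-memory dictionary standing in for a store are all assumptions made for illustration.

```python
import random
import time


def run_workload(store, operations, read_fraction, seed=0):
    """Run a mixed read/write workload; return (reads, writes, ops_per_second)."""
    rng = random.Random(seed)
    reads = 0
    writes = 0
    start = time.perf_counter()
    for i in range(operations):
        key = f"user{rng.randrange(1000)}"  # hypothetical key scheme
        if rng.random() < read_fraction:
            store.get(key)  # read; the key may not exist yet
            reads += 1
        else:
            store[key] = i  # write (insert or update)
            writes += 1
    elapsed = time.perf_counter() - start
    throughput = operations / elapsed if elapsed > 0 else float("inf")
    return reads, writes, throughput


# A plain dict stands in for the data store under test (assumption).
READS, WRITES, THROUGHPUT = run_workload({}, operations=10_000, read_fraction=0.5)
print(f"{READS} reads, {WRITES} writes, {THROUGHPUT:.0f} ops/s")
```

Changing `read_fraction` shifts the mix between read-heavy and write-heavy runs; a real benchmark would additionally record latencies and run against a cluster rather than a local dictionary.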
1. Big Data Management Technologies (cont.)
Platforms used
NoSQL
• Evaluated Cassandra / DataStax Enterprise (9-node Cassandra cluster) with HiBench and the Yahoo! Cloud Serving Benchmark (YCSB).
Hadoop Ecosystem
• Evaluated the performance of different virtualized Hadoop cluster configurations on top of VMware vSphere using the Big Data Extension (Project Serengeti).
• Benchmarked the Cloudera Hadoop Distribution 5.2 (4-node Hadoop cluster) with the TPCx-HS benchmark.
• Experimenting with the BigBench benchmark using Hive and Spark SQL.
In-Memory Databases
• Evaluation of a Big Data architecture based on SAP HANA and Cloudera Hadoop for different use cases and analytical workloads.
1. Big Data Management Technologies (cont.)
Relevant Publications
• Performance Evaluation of Enterprise Big Data Platforms with HiBench (In 9th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE 2015), August 20-22, Helsinki, Finland)
• Benchmarking the Availability and Fault Tolerance of Cassandra (In 6th Workshop on Big Data Benchmarking (6th WBDB), June 16-17, 2015, Toronto, Canada)
• Performance Evaluation of Spark SQL using BigBench (In 6th Workshop on Big Data Benchmarking (6th WBDB), June 16-17, 2015, Toronto, Canada)
• Benchmarking DataStax Enterprise/Cassandra with HiBench (Technical Report No. 2014-2)
• Performance Evaluation of Virtualized Hadoop Clusters (Technical Report No. 2014-1)
• Benchmarking Virtualized Hadoop Clusters (In proceedings of the Big Data Benchmarking - 5th International Workshop, WBDB 2014, Potsdam, Germany, August 5-6, 2014, Revised Selected Papers)
Full list of publications is available online: http://www.bigdata.uni-frankfurt.de/publications/
1. Big Data Management Technologies (cont.)
Member of the Standard Performance Evaluation Corporation (SPEC)
SPEC is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. The RG Big Data Working Group is a forum for individuals and organizations interested in the big data benchmarking topic.
List of all 52 Member Organizations
Advanced Strategic Technology LLC
ARM
bankmark UG
Barcelona Supercomputing Center
Charles University
Cisco Systems
Cloudera, Inc
Compilaflows
Delft University of Technology
Dell
fortiss GmbH
Friedrich-Alexander-University Erlangen-Nuremberg
Goethe University Frankfurt
Hewlett-Packard
Huawei
IBM
Imperial College London
Indian Institute of Technology, Bombay
Institute for Information Industry, Taiwan
Institute of Communication and Computer Systems/NTUA
Intel
Karlsruhe Institute of Technology
Kiel University
Microsoft
MIOsoft Corporation
NICTA
NovaTec GmbH
Oracle
Purdue University
Red Hat
RWTH Aachen University
Salesforce.com
San Diego Supercomputer Center
San Francisco State University
SAP AG
Siemens Corporation
Technische Universität Darmstadt
The MITRE Corporation
Umea University
University of Alberta
University of Coimbra
University of Florence
University of Lugano
University of Minnesota
University of North Florida
University of Paderborn
VMware
University of Wuerzburg
University of Texas at Austin
University of Stuttgart *
University of Pavia
2. Graph Databases / Linked Open Data (LOD)
Linked Open Data are structured data that are published online so that computers can access them automatically. By combining different sources, huge amounts of similar or related structured data are brought together to be queried and analyzed.
This research area is closely related to the Semantic Web and its standards stack, such as RDF* and OWL. We are interested in analyzing and benchmarking existing storage solutions and in applying the idea of LOD to selected applications.
Benchmarking: Berlin SPARQL Benchmark (BSBM)
Our current activities are:
• AFE-Web: cooperation project with the Römisch-Germanische Kommission (RGK). The Antike Fundmünzen in Europa database (AFE-WEB) is a web-based database for recording and publishing coin finds.
• European Coin Find Network
• Nomisma.org (Karsten Tolle is a member of the steering committee)
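The idea of bringing structured data from different sources together and querying it as one data set can be sketched in a few lines of Python. This is a toy illustration only: the triples, the `ex:` identifiers and the `match` helper are hypothetical, and a real setup would use an RDF store queried with SPARQL rather than Python sets.

```python
# Two hypothetical sources publishing (subject, predicate, object) triples.
SOURCE_1 = {
    ("ex:coin1", "ex:foundIn", "ex:Frankfurt"),
    ("ex:coin1", "ex:material", "ex:Silver"),
}
SOURCE_2 = {
    ("ex:coin2", "ex:foundIn", "ex:Frankfurt"),
    ("ex:coin1", "ex:material", "ex:Silver"),  # same statement as in SOURCE_1
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if subject in (None, t[0])
        and predicate in (None, t[1])
        and obj in (None, t[2])
    )


# Merging is a set union, so identical statements from both sources collapse.
MERGED = SOURCE_1 | SOURCE_2

# Query across both sources: everything found in Frankfurt.
FOUND = match(MERGED, predicate="ex:foundIn", obj="ex:Frankfurt")
print(FOUND)
```

Because both sources use the same identifiers for the same things, the merged set can answer a question that neither source answers completely on its own.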
3. Data Analytics / Data Science
• Twenty students from UC Berkeley, Stanford University and Goethe University Frankfurt committed to participating in challenges on obesity, heart/lung failure and mood disorder.
• Frankfurt Big Data Lab uses data acquisition and data blending to improve the quality of analytics.
[Figures: visualization of the given patients' data (number of patients in a postal area); Twitter data retrieved by the keyword "obesity"; Twitter data retrieved from the state of Pennsylvania]
4. Big Data for Social Good
What can be done in the international research community to make sure that some of the most brilliant big data use cases also have an impact on social issues?
Our motivation is to encourage the international research community to work on Big Data problems that have a potential positive social impact for mankind.
World map of scientific collaborations, 2014
4. Big Data for Social Good Projects
The DATA REFUGEES PROJECT
• 1.1 million refugees and migrants registered in Germany in 2015
• Number of refugees to arrive in Frankfurt increases from 170 to 250 per week
We will explore the question of whether and how data can be used to create:
• Data products to help inclusion in the city of Frankfurt
• Insights that can be escalated to the decision makers in the city of Frankfurt
4. Big Data for Social Good Projects
The DATA REFUGEES PROJECT - METHODOLOGY
We will gather data from various sources available in Frankfurt. The challenge is that the flow of information is not, by nature, well organized.
Techniques
• Data integration: Collect data from multiple sources, including changes of format and cleanup of redundant or useless entries. The outcome is a standardized, unified table.
• Data fusion: Integrate imperfect data sources overlapping over a small group of objects.
• Data blending: Allow sources to be imperfect, incomplete, and overlapping over a few objects or none at all, requiring inspired guesses and generalizations.
• Design Thinking: Create and evaluate new ideas through a human-centric approach to problem solving.
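As a rough illustration of the data integration technique (format changes, cleanup of redundant or useless entries, one unified table), here is a minimal Python sketch. The two sources, their field names and all records are invented for illustration; they are not project data, and the `normalize` and `integrate` helpers are hypothetical.

```python
from datetime import datetime

# Hypothetical records from two sources with different formats (invented data).
SOURCE_A = [
    {"name": "Language Course ", "date": "2016-03-01", "district": "Bockenheim"},
    {"name": "Homework Help", "date": "2016-03-02", "district": "Gallus"},
]
SOURCE_B = [
    {"title": "language course", "day": "01.03.2016", "area": "bockenheim"},
    {"title": "", "day": "05.03.2016", "area": "Nordend"},  # useless: no title
]


def normalize(name, date_text, date_format, district):
    """Map one raw record onto the unified schema (name, ISO date, district)."""
    return (
        name.strip().lower(),
        datetime.strptime(date_text, date_format).date().isoformat(),
        district.strip().lower(),
    )


def integrate(source_a, source_b):
    """Return a sorted, de-duplicated unified table built from both sources."""
    rows = [
        normalize(r["name"], r["date"], "%Y-%m-%d", r["district"]) for r in source_a
    ]
    rows += [
        normalize(r["title"], r["day"], "%d.%m.%Y", r["area"]) for r in source_b
    ]
    # Drop entries without a name, then remove exact duplicates via a set.
    return sorted({row for row in rows if row[0]})


UNIFIED = integrate(SOURCE_A, SOURCE_B)
for row in UNIFIED:
    print(row)
```

The sketch only covers exact duplicates after normalization. Data fusion and data blending, as described above, deal with sources that overlap only partially or not at all, which calls for fuzzier matching than a set can provide.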
4. Big Data for Social Good Projects
We are here!
Not much data available!
We aim to demonstrate that it is feasible to use available data to help, support and possibly guide the process of inclusion for refugees in the city of Frankfurt.
4. Big Data for Social Good Projects
• LOOKING FOR DATA!
• Define an open source tool and methodology for managing the volunteers and activities and propose it to the AWO organization. THIS DATA IS NOT COLLECTED. http://lale.help https://volunteer-planner.org
• Retrieve the Twitter data about refugees in Frankfurt.
• Coach two refugees in developing a mobile app.
• We aim in particular to help refugee children. We hope to be able to help them in the inclusion process in our society.
4. Big Data for Social Good Projects
The DATA REFUGEES PROJECT NEEDS
We encourage developers who wish to contribute to this project to contact us!
Contact Person: Concha Sanchez-Ocaña, Project Manager. [email protected]
Organizations involved
• Frankfurt Big Data Lab, Goethe University Frankfurt
• School of Business, University of Applied Sciences Mainz
• Research Center SAFE (funded by the State of Hessen initiative for research, LOEWE)
• Betriebliche Kommunikationssysteme und IT-Security, University of Applied Sciences Offenburg
THANK YOU!!