Summary, conclusions and comments · bigdata.htw-berlin.de/17/slides/4.4_guelzow.pdf
Volker Guelzow
Summary, conclusions and comments
Social Media Platforms
• 10 billion Facebook messages/day
• 4.5 billion likes
• 350 million pictures/day
Source: https://www.simplilearn.com/how-facebook-is-using-big-data-article
2
Social Media used for marketing
Source: http://www.socialmediaexaminer.com/SocialMediaMarketingIndustryReport2015.pdf
3
Big Data, are we still leading?
• Flagship projects SKA and HL-LHC with exabytes/year in about 2025
• Various other projects, such as
  • European XFEL, ~100 PB/year before 2020
  • CTA, …
• Big Data is not only a matter of size
• Big Data means open data
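To put the yearly volumes above into perspective, the following minimal sketch converts a volume per year into the average sustained data rate it implies. It assumes decimal units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes), a 365-day year, and takes "exabytes/year" as exactly 1 EB/year purely for illustration; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, leap years ignored

PB = 1e15  # decimal petabyte in bytes (assumption)
EB = 1e18  # decimal exabyte in bytes (assumption)


def sustained_rate_gbps(bytes_per_year: float) -> float:
    """Average rate in Gbit/s needed to move a yearly volume continuously."""
    bits_per_second = bytes_per_year * 8 / SECONDS_PER_YEAR
    return bits_per_second / 1e9


if __name__ == "__main__":
    # Illustrative inputs: 1 EB/year and 100 PB/year.
    print(f"1 EB/year   -> {sustained_rate_gbps(1 * EB):.1f} Gbit/s")
    print(f"100 PB/year -> {sustained_rate_gbps(100 * PB):.1f} Gbit/s")
```

Under these assumptions, 1 EB/year corresponds to roughly 250 Gbit/s averaged over the whole year, and 100 PB/year to about a tenth of that. Peak rates during data taking would be higher.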
4
Big Data in Astronomy
• Data-driven methods
  • Lots of information hidden in lowers data, but difficult to extract individually (degeneracies)
  • Data-driven methods can statistically tackle the problem
• Astronomy is a discovery science
• In order to tap the full potential: Big data means open data!
• Some very exciting big data projects in the making
• Ideal playing ground (no commercial interest, (almost) no privacy concerns)
• Data-driven algorithms key to exploit the fun
5
The SKA Data Flow
6
Fraunhofer: Data Life Cycle
1 Application generates data
2 Data recorded by sensors
3 Sensors pre-process data
4 Transmitting data
5 Storing data
6 Structuring data
7 Grouping data
8 Interpreting data
9 Forecasting data
10 Data simulation
11 Data becomes input for decision models
12 Optimizing models
13 Solutions show added values
14 Interpreting and implementing added values
15 Implementation modifies application
Legend: A = ANALYTICS, A = APPLICATION, D = DATA
What about Data Compression/Reduction?
From 1 PB/s to 100 MB/s
Nyriad (NZ), start-up
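As a rough worked example of what the step from 1 PB/s to 100 MB/s means, the sketch below computes the reduction factor and the data volume per day before and after reduction. Decimal units are assumed (1 PB = 10^15 bytes, 1 MB = 10^6 bytes); the constant and function names are illustrative only.

```python
# Rates quoted on the slide, in bytes per second (decimal units assumed).
RAW_RATE = 10**15            # 1 PB/s
REDUCED_RATE = 100 * 10**6   # 100 MB/s
SECONDS_PER_DAY = 86_400


def reduction_factor(raw: int, reduced: int) -> float:
    """How many times smaller the reduced stream is than the raw one."""
    return raw / reduced


def daily_volume_tb(rate_bytes_per_s: int) -> float:
    """Volume accumulated in one day at a constant rate, in terabytes."""
    return rate_bytes_per_s * SECONDS_PER_DAY / 10**12


if __name__ == "__main__":
    print(f"reduction factor: {reduction_factor(RAW_RATE, REDUCED_RATE):.0e}")
    print(f"raw volume/day:     {daily_volume_tb(RAW_RATE):,.0f} TB")
    print(f"reduced volume/day: {daily_volume_tb(REDUCED_RATE):.2f} TB")
```

The reduction is seven orders of magnitude: about 86.4 EB/day of raw data would shrink to about 8.64 TB/day.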
8
Will the network be a problem?
CERN to Tier X: average 48 Gbps
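To get a feeling for the 48 Gbps average, this minimal sketch estimates how much data such a link moves per day and how long a given volume would take to transfer. It assumes the link is fully and continuously used with no protocol overhead, and decimal units (1 TB = 10^12 bytes); both function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def volume_per_day_tb(link_gbps: float) -> float:
    """Terabytes moved in one day over a fully used link (no overhead)."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def transfer_days(volume_bytes: float, link_gbps: float) -> float:
    """Days needed to move a volume over a fully used link (no overhead)."""
    seconds = volume_bytes * 8 / (link_gbps * 1e9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"48 Gbit/s moves about {volume_per_day_tb(48):.1f} TB/day")
    print(f"1 PB at 48 Gbit/s takes about {transfer_days(1e15, 48):.2f} days")
```

Under these idealised assumptions, 48 Gbps corresponds to a little over 500 TB per day, so one petabyte takes just under two days.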
9
Future computing models?
• Federation of resources
• Federation across domains
• Only a few large data centres
• Many centres for compute
• Hybrid solutions
• Commercial clouds
10
The SKA Computing Model
A collaborative alliance
• transparent and location-agnostic interface to SRCs for users
• no SKA user should care where their data products are
• all SKA users should be able to access their data products, irrespective of whether their country or region hosts a regional SRC
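The idea of a location-agnostic interface can be illustrated with a small sketch: users ask for a data product by identifier and a catalogue decides which centre serves it. This is not the SKA design; the class `ReplicaCatalogue`, its methods, and the product and centre names are all hypothetical and exist only to illustrate the two bullets above.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Hypothetical catalogue mapping data product IDs to hosting centres."""

    replicas: dict[str, list[str]] = field(default_factory=dict)

    def register(self, product_id: str, centre: str) -> None:
        """Record that a centre holds a copy of the product."""
        self.replicas.setdefault(product_id, []).append(centre)

    def resolve(self, product_id: str, user_region: str) -> str:
        """Return a centre that can serve the product to this user.

        A replica in the user's own region is preferred, but access is
        never refused because the user's region hosts no centre.
        """
        centres = self.replicas.get(product_id)
        if not centres:
            raise KeyError(f"unknown data product: {product_id}")
        if user_region in centres:
            return user_region
        return centres[0]


if __name__ == "__main__":
    catalogue = ReplicaCatalogue()
    catalogue.register("product-001", "centre-A")  # made-up names
    catalogue.register("product-001", "centre-B")
    # A user from a region without its own centre still gets the data.
    print(catalogue.resolve("product-001", user_region="centre-C"))
    print(catalogue.resolve("product-001", user_region="centre-B"))
```

The user code only ever names the product, never the site; where the bytes come from is the catalogue's decision.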
11
Software, Algorithms, Methods?
Science needs causality; marketing from social media is focused on correlation.
• Software is a problem for all communities
• Algorithms and methods such as machine learning, simulations (start-to-end), low signal-to-noise, visual analytics and visualization etc. need to evolve
• Optimization, enabling new technologies etc. is needed
We need to invest in this field.
What about data curation?
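The distinction between correlation and causality drawn at the top of this slide can be made concrete with a toy simulation. In this sketch, two quantities x and y never influence each other; both depend on a hidden common cause z. They still correlate, and the correlation vanishes once z is accounted for. The data are synthetic, and the variable names and the fixed random seed are arbitrary choices for illustration.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hidden common cause; x and y each add their own independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# Raw correlation: clearly non-zero although x does not cause y.
r_raw = pearson(x, y)

# Remove the known contribution of z from both; what is left is noise.
r_controlled = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

if __name__ == "__main__":
    print(f"correlation of x and y:           {r_raw:.2f}")
    print(f"after removing the common cause:  {r_controlled:.2f}")
```

A purely correlation-driven analysis would treat x as a good predictor of y, which is enough for marketing; a scientific model has to identify z.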
12
Software, Algorithms, Methods
13
Cooperation with industry
14
Cooperation with industry
• SAP-HANA4Pulsar
• SAP for medical Insights
• Partner KIT in smart Innovation Lab
But it's not one-way!
15
Only Astronomers?
16
17
Education, Training, Reputation, new directions
• A new type of scientist is needed, bridging the gap between computer science and domain science
  … to make efficient use of ICT and to help harvest the domain science
• We have to give reputation and career possibilities
• And we have to train scientists in the research fields
18
ADA-Center Structure
ADA-Center: Fraunhofer IIS and FAU
• Further universities
  • National (LMU, Passau, Würzburg etc.)
  • International (Georgia-Tech, Montreal etc.)
• Companies: E-Commerce, Industrie 4.0, Agrarian, Public safety, Automotive, Logistics, Public transport, Insurance etc.
• Youth development
• SENSORS: Signal detection and processing
• IoT: Data transmission and networking
• ANALYTICS: Creation of knowledge and models
• Research cooperations, groups, projects
• Cooperations with industry, projects
Helmholtz Association
HGF President Wiestler: Information & Data Management, a Key Element for Helmholtz
• Helmholtz Data Federation
• LSDMA
• Helmholtz Analytics Framework
• LHC Tier 1 & 2 centres
• Helmholtz Incubator
Software developments, e.g. dCache, GGUS, HPC, …
-> Many competences!
20
Helmholtz Association - DLR
• 20,000 simulations for digital aircraft
• Deep learning in remote sensing
• Jena Institute: Management/Analysis of BD
• Smart Systems
• IT Security
• Citizen science
21
Partnership Initiative Computational Sciences πCS
• Individualized services for selected scientific groups (flagship role)
  - Dedicated point-of-contact
  - Individual support and guidance and targeted training & education
  - Planning dependability for use case specific optimized IT infrastructures
  - Early access to latest IT infrastructure (hard- and software) developments and specification of future requirements
  - Access to IT competence network and expertise at CS and Math departments
• Partner contribution
  - Embedding IT experts in user groups
  - Joint research projects (including funding)
  - Scientific partnership, equal footing, joint publications
• LRZ benefits
  - Understanding the (current and future) needs and requirements of the respective scientific domain
  - Developing future services for all user groups
  - Thematic focusing: Environmental Computing
D. Kranzlmüller, CompBioMed Workshop
22
Technology
Technology development (up to 20% increase in performance/year for constant budget) will NOT solve the problems.
We don't really see disruptive technologies coming up soon.
Develop software technology:
• Optimization and parallelization
• GPGPUs, FPGAs, …
• Green computing
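The claim that technology progress alone will not solve the problem can be checked with simple compound-growth arithmetic. The sketch below assumes the best case quoted above, a steady 20% gain per year at constant budget; the eight-year horizon and the target factors of 10 and 1000 are illustrative assumptions, not figures from the talk.

```python
import math


def performance_factor(annual_gain: float, years: float) -> float:
    """Cumulative performance gain after compounding a yearly gain."""
    return (1 + annual_gain) ** years


def years_to_reach(factor: float, annual_gain: float) -> float:
    """Years of compounding needed to reach a given cumulative factor."""
    return math.log(factor) / math.log(1 + annual_gain)


if __name__ == "__main__":
    gain = 0.20  # upper bound quoted on the slide
    print(f"gain after 8 years:   x{performance_factor(gain, 8):.1f}")
    print(f"years to reach x10:   {years_to_reach(10, gain):.1f}")
    print(f"years to reach x1000: {years_to_reach(1000, gain):.1f}")
```

At 20% per year a constant budget buys a bit more than a factor of four after eight years, and a factor of ten takes more than twelve years. Growth in data volume by several orders of magnitude therefore has to be met by software, data reduction, or more funding.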
23
What about funding?
• National and European funding
• Coordinated and coherent actions needed
24
Will the EOSC solve the problem?
25
Summary – 1st
• Big data is not only a matter of size
• Google, Amazon, Facebook etc. will not solve our problems
• But we can learn and profit from them
• We can have a direct impact on industry and should cooperate
• But we need to cooperate across disciplines
• There is plenty of expertise in Germany on data management and software development, and there are good cooperations with industry
• But we need to improve the education of people
• We need to get experts across domains together
• We need a change in the scientific reputation of "scientific software development"
26
Summary – 2nd
• We need to invest in SW development
• We don't see pressing limits in networking
• This allows for different computing models -> hybrid models, only a few data centres in Germany?
• Progress in computer and storage technology will not solve our problems for a constant budget
• Open Data and FAIR principles need policies
27