big data - unibs.it · big data analytics long term archiving tape library high performance data...

Big Data Management and Analytics Claudio Cacciari ([email protected])

Post on 27-Apr-2020

Page 1: Big data - unibs.it · Big Data analytics Long term archiving Tape library High performance data transfer . Cloud HPC plainfs Long term archive. Big Data and Analytics ... • Opensourceapplications:R,H2O.ai,StanfordNLP,Knime5

Big Data

Management and Analytics

Claudio Cacciari ([email protected])

Cineca’s used storage

Cineca’s scenario

• Projects: European projects, Italian projects

• FAIR principles

• Services & resources: Cloud, HPC, Big Data analytics, Long term archiving, Tape library, High performance data transfer

Diagram labels: Cloud, HPC, plain fs, Long term archive

Big Data and Analytics

Giorgio Pedrazzi

Technologies

• Cloud computing: OpenStack, Docker, Singularity

• Hadoop ecosystem: Hive, Pig, Mahout, Spark

• Open source applications: R, H2O.ai, Stanford NLP, Knime, ...

• Commercial software: Stata, SAS, Matlab, ...

Data repository

EuHIT Portal

EuHIT

EuHIT is a consortium that aims at integrating cutting-edge European facilities for turbulence research across national boundaries.

EUDAT

A truly pan-European Infrastructure

EUDAT offers common data services, supporting multiple research communities as well as individuals, through a geographically distributed, resilient network of 35 European organisations.

Our vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure.

EUDAT Service Suite

http://www.eudat.eu/services

EUDAT data management

Persistent Identifiers (PID)

• EUDAT relies on the B2HANDLE service to associate persistent identifiers with digital objects.

• Its focus is the registration of data at an early stage of the scientific process, when large amounts of data are generated and must become referable so that researchers can collaborate with other scientific groups or communities.

Handle resolution
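
Handle resolution can be illustrated with a short sketch. It assumes the public Handle.Net proxy REST endpoint (`https://hdl.handle.net/api/handles/<handle>`) and its JSON record layout. The handle `12345/abc-example`, the sample record and the function names are hypothetical illustrations. They are not taken from the slides or from the B2HANDLE library, and an EUDAT site may expose its own resolver.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the public Handle.Net proxy is used as the resolver.
HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """Build the REST URL that returns the record of a handle (prefix/suffix)."""
    return HANDLE_PROXY + urllib.parse.quote(handle, safe="/")


def extract_location(record: dict) -> Optional[str]:
    """Return the value of the first URL-typed entry in a handle record, if any."""
    for entry in record.get("values", []):
        if entry.get("type") == "URL":
            return entry.get("data", {}).get("value")
    return None


def resolve_handle(handle: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the handle record over HTTPS and return the location it points to."""
    with urllib.request.urlopen(handle_api_url(handle), timeout=timeout) as resp:
        return extract_location(json.load(resp))


# Hypothetical record, shaped like a proxy response, so the parsing step
# can be tried without network access.
SAMPLE_RECORD = {
    "responseCode": 1,
    "handle": "12345/abc-example",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {"format": "string", "value": "https://example.org/datasets/42"},
        }
    ],
}

print(handle_api_url("12345/abc-example"))
print(extract_location(SAMPLE_RECORD))
```

Calling `resolve_handle` with a registered handle performs the network lookup. Because the identifier stays fixed while its URL entry can be updated, references to the data remain valid when the object moves.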

Why High Performance Computers in HBP?

Brain simulation

Data analytics

Image processing

Visualisation

The human brain is COMPLEX!

Illustration: Brown Bird Design

High Performance Analytics & Computing Platform

Our mission

Build and operate a supercomputing, data and visualization infrastructure enabling scientists to:

• Run large-scale, data-intensive, interactive brain simulations up to the size of a full human brain

• Manage the large amounts of data used and produced in the HBP

• Manage complex workflows, data analysis and visualization workloads

High Performance Analytics & Computing Platform

Our role in the Human Brain Project

• Providing the base infrastructure for the HBP: supercomputers, storage, network, other resources

• Development of software and technology to

  – Facilitate usage of the infrastructure for researchers

  – Make more efficient use of the infrastructure, e.g.

    • Simulation technology capable of exploiting modern and future supercomputers

    • Visualization tools: working with large-scale imaging or simulation data

    • Enabling the data federation and data-intensive computing

    • Interactive computing technology

• Supporting developers and users

Federated base infrastructure for the HBP

Unified access to federated resources

Middleware: unified access to resources

QUESTIONS