Big data - unibs.it
Cineca’s scenario (diagram): Projects (European projects, Italian projects), FAIR principles, Services & resources (Cloud, HPC, Big Data analytics, Long term archiving, Tape library, High performance data transfer)
Cineca’s used storage (diagram): Cloud, HPC, plain fs, Long term archive
Big Data and Analytics
Giorgio Pedrazzi
Technologies
• Cloud computing: Openstack, Docker, Singularity
• Hadoop ecosystem: Hive, Pig, Mahout, Spark
• Open source applications: R, H2O.ai, Stanford NLP, Knime
• Commercial software: Stata, SAS, Matlab
Data repository
EuHIT Portal
EuHIT
EuHIT is a consortium that aims to integrate cutting-edge European turbulence research facilities across national boundaries.
EUDAT
A truly pan-European infrastructure
EUDAT offers common data services, supporting multiple research communities as well as individuals, through a geographically distributed, resilient network of 35 European organisations.
Our vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure.
EUDAT Service Suite
http://www.eudat.eu/services
EUDAT data management
Persistent Identifiers (PID)
• EUDAT relies on the B2HANDLE service to associate persistent identifiers with digital objects.
• Its focus is registering data at an early stage of the scientific process, when large amounts of data are generated and must become referable so that researchers can collaborate with other scientific groups or communities.
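To make the registration idea concrete, here is a minimal Python sketch, assuming a plain in-memory dictionary stands in for a Handle server. The class `ToyHandleRegistry` and the prefix `21.T12345` are hypothetical; this is not the B2HANDLE API. It illustrates the key property of a PID: the identifier stays fixed while the location registered for it can change.

```python
import hashlib
import uuid


class ToyHandleRegistry:
    """Hypothetical in-memory stand-in for a Handle service (not B2HANDLE)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.records: dict[str, dict[str, str]] = {}

    def register(self, url: str, payload: bytes) -> str:
        # A handle has the form "<prefix>/<suffix>"; a random UUID keeps suffixes unique.
        handle = f"{self.prefix}/{uuid.uuid4()}"
        self.records[handle] = {
            "URL": url,
            # A stored checksum lets collaborators verify the object they retrieve.
            "CHECKSUM": hashlib.sha256(payload).hexdigest(),
        }
        return handle

    def update_location(self, handle: str, new_url: str) -> None:
        # The identifier stays stable even when the data moves to another store.
        self.records[handle]["URL"] = new_url


if __name__ == "__main__":
    registry = ToyHandleRegistry("21.T12345")  # hypothetical prefix
    pid = registry.register("https://example.org/run42.h5", b"simulation output")
    registry.update_location(pid, "https://archive.example.org/run42.h5")
    print(pid, registry.records[pid]["URL"])
```

Because collaborators cite `pid` rather than the URL, moving the data into a long-term archive only requires updating the record.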
© CINECA
Handle resolution
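Resolution turns a handle back into the information stored for it. A minimal sketch, assuming the public Handle.net proxy at `https://hdl.handle.net`, which redirects `/<handle>` to the registered location and serves the full record as JSON under `/api/handles/<handle>`. The helper names and the example handle are hypothetical, and the code only builds URLs and reads an already-fetched record, so it sends no network request.

```python
from typing import Optional
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net"  # public Handle.net proxy


def split_handle(handle: str) -> tuple[str, str]:
    """Split '<prefix>/<suffix>'; the prefix names the naming authority."""
    prefix, sep, suffix = handle.partition("/")
    if not (sep and prefix and suffix):
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def resolver_urls(handle: str) -> dict[str, str]:
    """Build proxy URLs for a handle (no request is sent)."""
    split_handle(handle)  # reject malformed input early
    encoded = quote(handle, safe="/")
    return {
        # Following this URL redirects to the location registered for the handle.
        "redirect": f"{HANDLE_PROXY}/{encoded}",
        # JSON view of the whole handle record.
        "json": f"{HANDLE_PROXY}/api/handles/{encoded}",
    }


def url_from_record(record: dict) -> Optional[str]:
    """Return the value of the first entry of type 'URL' in a JSON handle record."""
    for entry in record.get("values", []):
        if entry.get("type") == "URL":
            return entry.get("data", {}).get("value")
    return None
```

With an HTTP client such as `requests`, `requests.get(resolver_urls(pid)["json"]).json()` would return a record that `url_from_record` can read; that call is left out so the sketch runs offline.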
Why High Performance Computers in the Human Brain Project (HBP)?
• Brain simulation
• Data analytics
• Image processing
• Visualisation
The human brain is COMPLEX!
Illustration: Brown Bird Design
High Performance Analytics & Computing Platform
Our mission
Build and operate a supercomputing, data and visualization infrastructure enabling scientists to:
• Run large-scale, data-intensive, interactive brain simulations up to the size of a full human brain
• Manage the large amounts of data used and produced in the HBP
• Manage complex workflows, data analysis and visualization workloads
Our role in the Human Brain Project
• Providing the base infrastructure for the HBP: supercomputers, storage, network and other resources
• Development of software and technology to:
  – Facilitate use of the infrastructure by researchers
  – Make more efficient use of the infrastructure, e.g.:
    • Simulation technology capable of exploiting modern and future supercomputers
    • Visualization tools for working with large-scale imaging or simulation data
    • Enabling data federation and data-intensive computing
    • Interactive computing technology
• Supporting developers and users
Federated base infrastructure for the HBP
Unified access to federated resources
Middleware: unified access to resources
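The slides name unified access to federated resources without describing a mechanism. A minimal Python sketch of the idea, assuming a hypothetical `Site`/`Federation` design rather than the middleware actually used in the HBP: users submit through one entry point, and the middleware routes each job to a site that already holds the input dataset, so large data does not have to move between sites.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical federated site (e.g. one HPC centre) with local datasets."""
    name: str
    datasets: set[str] = field(default_factory=set)
    submitted: list[str] = field(default_factory=list)

    def submit(self, job: str) -> str:
        # Record the job locally and return a site-qualified job ID.
        self.submitted.append(job)
        return f"{self.name}:{job}"


class Federation:
    """Toy middleware: a single entry point in front of several sites."""

    def __init__(self, sites: list[Site]) -> None:
        self.sites = sites

    def locate(self, dataset: str) -> list[Site]:
        # Sites that already store the requested dataset.
        return [s for s in self.sites if dataset in s.datasets]

    def submit(self, job: str, dataset: str) -> str:
        candidates = self.locate(dataset)
        if not candidates:
            raise LookupError(f"dataset {dataset!r} not found in the federation")
        # Among data-holding sites, pick the one with the fewest submitted jobs.
        target = min(candidates, key=lambda s: len(s.submitted))
        return target.submit(job)
```

Sending computation to the data rather than the reverse is one common choice for data-intensive federations; a real deployment would also need authentication, quotas and failure handling.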
QUESTIONS