e-Infrastructure available for research, using the right tool for the right job
TRANSCRIPT
Dr David Wallom
Associate Director
Overview
• What?
• Where?
• How?
What is e-Infrastructure?
The integration of digitally-based technology, resources, facilities, and services combined with people and organizational structures needed to support modern, collaborative research (and teaching).
1. Data and Storage
2. Software (and Algorithms)
3. Hardware (Compute)
4. Networks
5. Security and Authentication (BIS Report)
6. People (Collaboration, Skills, Capacity)
7. The Digital Library
Supercomputers
• Modern supercomputers are parallel (multi-processor) computers with hundreds or thousands of processors.
• Usually commodity processors (Intel, AMD, etc.) – similar to those in a desktop PC.
• Usually commodity compute servers connected by fast networks (10 Gbit Ethernet, InfiniBand).
• This is a change from previous custom-built supercomputers (Cray T3 etc.).
• The increase in speed of supercomputers over desktop computers comes from using multiple CPUs at once, not from faster CPUs.
• Although this philosophy is now moving to desktop PCs with multicore processors.
• Servers and HPC are moving to “manycore” processors.
• So gaining benefit from supercomputers requires getting your application to run on multiple processors – parallel computing.
Parallel Programming
• Two basic options for efficient parallel computing.
• Reduce completion time of a single run
– Speed up the execution time of a single program run by dividing up the computation among the processors.
– High-performance computing
– Need to modify (parallelize) your program.
• Reduce total completion time of many runs
– Run many instances of the same program concurrently, each on a different processor.
– High-throughput computing
– Don’t need to change your code – need to introduce high-level control functions.
High Performance Computing
• Use multiple processors to speed up program run-time, by dividing up the computation among CPUs.
• Requires changing your program -> parallel programming.
• Usually achieved by splitting up the data to be processed across different processors – data-parallel computing.
• Each processor does processing on its section of the data, concurrently with all the other processors.
• Processors may need to access data stored in the memory of another processor (on a cluster), or in banks of memory shared between processors (SMP).
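As a minimal sketch of the data-parallel idea (an illustration, not from the slides: it uses Python’s multiprocessing on a single node rather than a real cluster), the input data is split into chunks and each worker computes on its own chunk concurrently:

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Each worker handles only its own section of the data.
    return [x * x for x in chunk]

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4  # one worker per processor/core
    size = len(data) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as pool:
        parts = pool.map(process_chunk, chunks)  # chunks processed concurrently
    results = [x for part in parts for x in part]
```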
Parallel Computing
• Many compute nodes (or servers) connected by a fast network
– Usually InfiniBand or 1/10 Gbit Ethernet
• Each node has multiple processors
– Usually 2 or 4
• Each processor has multiple processing cores
– Usually 4 to 16
– Manycore (32 or more) coming soon
– GPUs or custom chips have hundreds of cores
• A single compute node can have lots of cores
– 32 or 64 now inexpensive and common
– About 10 years ago a 64-processor SMP was a high-end supercomputer
Shared and Distributed Memory Application Models
• Shared memory (SMP)
– Processing cores on a compute node all have shared access to all the memory on the node
– Parallel programs often written using multiple threads, usually with one thread per processing core
• Distributed memory (cluster)
– Processing cores on one compute node can’t directly access memory on other nodes
– The program needs to send information using message passing
– Parallel programs often written using the Message Passing Interface (MPI) standard (see the sketch below)
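A minimal message-passing sketch (an illustration assuming the mpi4py package, which wraps the MPI standard; run with e.g. mpirun -n 4 python script.py): rank 0 scatters the data, each rank computes on its own piece, and a reduction combines the partial results.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Root prepares one chunk of work per rank.
    data = [list(range(i * 10, (i + 1) * 10)) for i in range(size)]
else:
    data = None

chunk = comm.scatter(data, root=0)                # message passing: distribute work
partial = sum(x * x for x in chunk)               # local computation on this rank
total = comm.reduce(partial, op=MPI.SUM, root=0)  # combine partial results

if rank == 0:
    print("sum of squares:", total)
```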
GP-GPU
• The high performance of GPUs for gaming has led to the development of general-purpose GPUs (GP-GPUs)
• High-performance GPUs aimed at technical computing rather than gaming
• More general processing capability, fast double-precision floating point, large error-correcting GPU memory
• But much more expensive than standard GPUs
• NVIDIA Tesla Kepler K40 GPU has 2880 cores
– 4.29 TFlops single precision
– 1.43 TFlops double precision
• But GPUs have a specialised architecture and programming model, so programs need to be rewritten for GPUs – CUDA or OpenCL
• Many applications have now been ported to GPUs.
• Some applications run very well on GPUs and can scale across multiple GPUs.
• However some give little or no performance benefit over a manycore compute node.
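A minimal GPU sketch (an assumption for illustration: the slides name CUDA and OpenCL generally, while this uses Numba’s CUDA support from Python). Each of the GPU’s many cores computes one element of a vector addition:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)       # this thread's global index
    if i < out.size:       # guard: the grid may have excess threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)  # Numba moves the arrays to/from the GPU
```

Whether this outperforms a manycore CPU node depends, as the slide notes, on how well the application maps onto the GPU’s architecture.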
Types of hardware e-Infrastructure
• Generic
– Supporting multiple different research communities through provision of general-purpose services
• Community-specific
– A tailored set of resources specific to a particular research group or discipline
HPC Computational Resources
• Institutional – Advanced Computing Centres
• Regional – New EPSRC funded mid range centres
• National – HECToR/ARCHER
• International - PRACE
Institutional
• Many HEIs have research computing resources locally
• HPC clusters and data storage services
• Often named ‘Advanced Computing Centre’ or ‘Research Computing Centre’
The UK Government decided there was a need for regional research infrastructure to link into national facilities.
UK Tier 1 and Tier 2 Systems
National HPC
ARCHIE-WeSt
MidPlus
EPSRC Regional HPC
• Emerald
– GPU system: 372 NVIDIA Tesla processors
– Sustained capability of 114 TF; on installation in March 2012 it was one of the largest GPU-based systems in Europe
– Hosted by STFC e-Science
• Iridis
– 12,000-core Intel Westmere-based system, ~108 TF
– Capability/highly scaling work
• SGI supercomputer cluster
– 83 SGI servers with Intel Xeon E5-2600 processors, 5,312 cores in total
EPSRC Regional HPC
• Compute
– New capability cluster: 2,700 cores, InfiniBand, some GPU and large-memory SMP nodes
– High-throughput cluster: 2,900 cores, to facilitate projects that need to span large parameter spaces
• Data storage and archive facilities
– Initially ~1 PB capacity, including metadata-based search and retrieval, with secure implementation of a range of user-specified levels of privacy
MidPlus
HPC Midlands
• 3,000-core Bull HPC system
• 48 TFlop
• InfiniBand interconnect
• Based across the Sci-Tech Daresbury and Harwell Oxford national science and innovation campuses, a formidable supercomputing environment that includes:
– Blue Joule: the UK’s number one supercomputer, the world’s largest dedicated to software development and capable of over a thousand trillion calculations per second
– Blue Wonder: a world-class iDataPlex cluster comprising over 8,000 processor cores and ideal for driving optimal value from ‘big data’
– State-of-the-art interactive, 3D immersive visualisation suites, including a 150-seat lecture theatre with stereo capabilities and an eight-projector 120-degree surround visualisation system
• Software development: harness our world-class supercomputing expertise and infrastructure to design, test and optimise cutting-edge code
• Applications and optimisation: test concepts and solve problems using our state-of-the-art modelling, simulation and visualisation facilities
• HPC on-demand: access our portfolio of resources on a one-off or regular basis to drive innovation, productivity and competitiveness
• Collaboration: propel fundamental and applied research to new heights utilising our agenda-setting capabilities
• Training and education: enhance your understanding of leading-edge computing technologies and the benefits of harnessing them
HECToR
• Procured for UK scientists by the Engineering and Physical Sciences Research Council – EPSRC
• Hardware – Cray
• Management & Computational Science and Engineering Support – EPCC
• ARCHER, its successor, is located at the University of Edinburgh
• Until recently the UK’s largest HPC system
PRACE
The Partnership for Advanced Computing in Europe is the European HPC Research Infrastructure.
• PRACE enables world-class science through large-scale simulations
• PRACE provides HPC services on leading-edge capability systems on a diverse set of architectures
• PRACE operates up to six Tier-0 systems as a single entity, including user and application support
– An international not-for-profit association with its seat in Brussels; 20 members
– Systems funded by hosting members with €100 million / 5 years each
– Currently France, Germany, Italy, Spain; The Netherlands expected soon
• PRACE offers its resources through a single pan-European peer-review process
– Governed by an independent Scientific Steering Committee
High Throughput Computing
• Many researchers want to run the same program many times with many different input parameters and/or input data files.
• These are often called parameter sweep applications.
• If each program run is independent, then different runs can execute at the same time on different processors.
• Each program run is on a single processor, so takes about the same time to execute as on a single PC.
• However, if you use 20 processors at once, 20 runs complete in the time of one, so the whole application is 20 times faster.
• More later! (A minimal sketch follows.)
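As a minimal parameter-sweep sketch (an illustration, not from the slides), the same simulate() function runs many times with different inputs; because the runs are independent they execute concurrently across the available processors:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def simulate(params):
    temperature, pressure = params
    # Stand-in for a real, long-running simulation.
    return temperature * pressure

if __name__ == "__main__":
    temperatures = range(270, 290)
    pressures = range(90, 110)
    sweep = list(product(temperatures, pressures))   # 400 independent runs
    with ProcessPoolExecutor(max_workers=20) as pool:
        results = list(pool.map(simulate, sweep))    # ~20 runs execute at once
```

Note that simulate() itself is unchanged; only the high-level control loop is added, exactly as the earlier slide describes.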
EGI
• European
– Over 35 countries
• Grid
– Secure sharing
• Infrastructure
– Computers
– Cloud
– Data
– Instruments
– …. and others
Analysing New Viruses
• VIDISCA-454: a new method to find new viruses from genetic material
• Runs on EGI using customised workflows, allowing researchers to save time
• Analysis done in 14 hours, not 17 days
• The method was used to identify a new type of coronavirus
• Results published in Nature Medicine
Many children suffer from respiratory diseases caused by unknown viruses.
http://go.egi.eu/virus
Fusion Reactor Modeling
• Investigating the viability of fusion as a power source
• Modeling and simulating the reactor
• Used 1 million CPU hours in the last 12 months
Tackling Alzheimer’s
• Diagnostic Enhancement of Confidence by an International Distributed Environment
• Diagnostic tools for the medical community, sharing medical data securely
• Allows doctors to diagnose Alzheimer’s disease in its early stages and track the progress of the symptoms over time
Cherenkov Telescope Array
• Future ground-based high-energy gamma-ray instrument
• Integrates resources from 132 institutes in 25 countries
• Using applications and grid technology provided by EGI
• Rapidly runs data-intensive simulations to explore the design
EUDAT core services
Core services are the building blocks of EUDAT’s Common Data Infrastructure, mainly included in the bottom layer of data services.
Fundamental Core Services
• Long-term preservation
• Persistent identifier service
• Data access and upload
• Workspaces
• Web execution and workflow services
• Single sign-on (federated AAI)
• Monitoring and accounting services
• Network services
Extended Core Services (community-supported)
• Joint metadata service
• Joint data mining service
There is no need to match the needs of all at the same time; addressing a group of communities can be very valuable, too.
Expected benefits of a Collaborative Data Infrastructure
• Cost-efficiency through shared resources and economies of scale
– Better exploitation of synergies between communities and service providers
– Support for existing scientific communities’ infrastructures and for smaller communities
• Trans-disciplinarity
– Inter-disciplinary collaboration: communities from different disciplines working together to build services
– Data sharing between disciplines – re-use and re-purposing
– Each discipline can solve only part of a problem
• Cross-border services
– Data is nowadays distributed across states, countries and continents, and research groups are international
• User-driven infrastructure
– User-centric approach in designing, testing and evaluating the services
– Strategic user empowerment in the governance approach
• Sustainability
– Ensuring wide access to and preservation of data
– Greater access to existing data and better management of data for the future
– Increased security by managing multiple copies in geographically distant locations
– Puts Europe in a competitive position for important data repositories of worldwide relevance
National and International Community Research Infrastructure Projects
• STFC CLF
• Diamond Light Source
• MOTT-2
• NSCCS
• NanoCMOS
• DSR (analysing requirements)
• MIMAS
• EDINA
• NeISS
• DiRAC
• ELIXIR
• LifeWatch
• CLARIN
• GridPP
• SKA
• SDSS
National Service for Computational Chemistry Software (NSCCS)
• Provides access to software, specialist consultation, computing resources and software training to support UK academics working across all fields of chemistry.
• Led by Imperial College, with physical hardware based at STFC and service software support from both STFC and Imperial.
• Future service development roadmap
– Web-based portal job submission: local at first, and then via WMS
– Shibboleth authentication to the portal
– Migration of the local bespoke accounting tool to NGS UAS/APEL/RUS
– Make resources available as an NGS partner
– Install the NGS software stack for job submission (likely to be gLite CREAM-CE, integrated into WMS)
National eInfrastructure for Social Simulation
• JISC-funded. Meets the demand for powerful simulation tools by social scientists and public and private sector policymakers.
• Problems being addressed:
– Curation, sharing and re-use of simulation outputs
– Design and implementation of standards for sharing data and methods
– Controlling access to information which may be private, confidential, or copyright
– Manipulation of complex simulation outputs across multiple service components, providing real-time access to powerful computational resources
– Facilitating access to research resources and expertise among a distributed community of users
Distributed Research utilising Advanced Computing (DiRAC)
• Integrated supercomputing facility for theoretical modeling and HPC-based research in particle physics, astronomy, cosmology and nuclear physics
European Strategic Framework for Research Infrastructures (ESFRI)
ELIXIR: Europe’s emerging infrastructure for biological information
AIM – To build a sustainable European infrastructure for biological information, supporting life science research and its translation to medicine, the environment, the bio-industries and society.
Services:
• Management of Europe’s growing volume and variety of biological data, which are heterogeneous, complex and heavily linked
• Interaction with and support for data in other ESFRI projects in medicine, agriculture and environment
• Biological domain expertise
• Computer tools infrastructure
• Computational infrastructure
• Training centres for users of ELIXIR
• Industry translational services
• 3 million users, growing to 10 million in 2020
• Petabytes now, growing to exabytes in 2020
• 1,800 terrestrial Long-Term Ecological Research (LTER) sites: increasingly sensor-instrumented
• >200 marine reference and focal sites, with more to come: increasingly sensor-instrumented
• Hundreds of millions of specimens in natural science collections: >275m now indexed, increasing at 20% p.a.
• Challenge of SCALE: >25,000 users
• Plus all kinds of small, personal, group, and departmental datasets that need to be published
CLARIN
The CLARIN Mission
What?
• Create a research infrastructure that makes language resources and technologies available to scholars of all disciplines, especially the humanities and social sciences
How?
• Unite existing digital archives into a federation of connected archives with unified web access
• Provide language and speech technology tools as web services operating on language data in the archives
LHC Computing (courtesy of Ian Bird, CERN)
Signal/Noise: 10⁻¹³ (10⁻⁹ offline)
• Data volume
– High rate × large number of channels × 4 experiments
– 15 petabytes of new data each year
• Compute power
– Event complexity × number of events × thousands of users
– 200k of (today’s) fastest CPUs
– 45 PB of disk storage
• Worldwide analysis & funding
– Computing funded locally in major regions & countries
– Efficient analysis everywhere
– GRID technology
& IN THE CLOUD…
A Working Definition of Cloud Computing
• Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Wallom’s definition: if a user speaks to a person to get access to resources, it’s virtualisation, HPC or whatever; if the user gets ‘access’ through an ICT-based interface, without human intervention, expanding and contracting their available resources at will, it’s a Cloud!
Courtesy of NIST
5 Essential Cloud Characteristics
• On-demand self-service
• High-performance network access (not necessarily JANET quality, though)
• Resource pooling / location independence
• Rapid elasticity / service scalability
• Measured service / usage is accounted for
Courtesy of NIST
Service Models of Cloud Computing: SaaS
• SaaS: Software as a Service –> Google Apps, Force.com, Facebook, Microsoft Office 365;
[Diagram: the application is deployed by the SaaS provider; the user simply uses it]
Examples
Service Models of Cloud Computing: SaaS, PaaS
• SaaS: Software as a Service –> Google Apps, Force.com, Facebook, Microsoft Office 365;
• PaaS: Platform as a Service –> Google App Engine, Azure Platform, Oracle Fusion;
[Diagram: the user deploys an application package onto the PaaS provider’s platform]
[Azure™ Services Platform: .NET, PHP, Python, Ruby; Visual Studio and Eclipse; web standards + industry standards]
Microsoft Azure
Service Models of Cloud Computing: SaaS, PaaS, IaaS
• SaaS: Software as a Service –> Google Apps, Force.com, Facebook, Microsoft Office 365;
• PaaS: Platform as a Service –> Google App Engine, Azure Platform;
• IaaS: Infrastructure as a Service –> Amazon Web Services, Elastic Hosts, 100percentIT
[Diagram: an OS image is instantiated on the IaaS provider; the user then uses it]
Amazon AWS
• Elastic Compute Cloud (EC2)
• SimpleDB
• Simple Storage Service (S3)
• Simple Queue Service (SQS)
• CloudFront
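A minimal IaaS sketch (an illustration using the AWS boto3 SDK; the region, AMI ID and instance type below are placeholder assumptions): the user instantiates an OS image programmatically, with no human intervention from the provider.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # placeholder region

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder: an OS image of your choice
    InstanceType="t2.micro",  # placeholder instance size
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("launched", instance_id)  # the instance can later be stopped or terminated on demand
```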
University of Oxford IT Services Cloud
Federated Cloud Platform
• 12 countries provide 15 certified resources
– Czech Republic, Germany, Greece, Hungary, Italy, Macedonia, Poland, Slovakia, Spain, Sweden, Turkey, United Kingdom
• 2 countries currently integrating
– Croatia, Finland
• 5 countries interested
– Bulgaria, France, Israel*, The Netherlands, Switzerland
• Worldwide interest
– South Africa* (SAGrid)
– South Korea* (KISTI)
– United States* (NIST, NSF Centres)
* Not shown on map
Fed Cloud Value proposition
The EGI Federated Cloud is the federation of public and private clouds, offering cloud services to consumers in Europe and the rest of the world.
A cloud system able to
• Scale to user* needs, presenting individually to each community
• Integrate multiple different providers to give resilience
• Prevent vendor lock-in
• Enable resource provision targeted towards the research community
Standards based federation of IaaS cloud:
• Exposes a set of independent cloud services accessible to users utilising a common standards profile
• Allows deployment of services across multiple providers and capacity bursting
* Where user is no longer just the end user
Some of the EGI FedCloud Communities
• Ecology – BioVeL: Biodiversity Virtual e-Laboratory
• Structural biology – WeNMR: a worldwide e-Infrastructure for NMR and structural biology
• Linguistics – CLARIN: ‘British National Corpus’ service (BNCWeb)
• Earth Observation – SSEP: European Space Agency’s Supersites Exploitation Platform for volcano and earthquake monitoring (collaboration with Helix Nebula)
• Software Engineering – SCI-BUS: simulated environments for portal testing
• Software Engineering – DIRAC: deploying ready-to-use distributed computing systems
• Interdisciplinary research – Catania Science Gateway Framework
• Musicology – Peachnote: dynamic analysis of musical scores
• Earth Observation – ENVRI: Common Operations of Environmental Research Infrastructures (collaboration with EISCAT3D)
• Geology – VERCE: Virtual Earthquake and seismology Research
• Ecology – LifeWatch: e-Science European Infrastructure for Biodiversity and Ecosystem Research
• High Energy Physics – CERN ATLAS: ATLAS processing cluster via HelixNebula
More info: https://wiki.egi.eu/wiki/Fedcloud-tf:Users
Strategic Plan for Helix Nebula
• Set up a cloud computing infrastructure for the European Research Area
• Identify and adopt policies for trust, security and privacy on a European-level
• Create a light-weight governance structure involving all stakeholders
• Define a short and medium term funding scheme
Pilot phase goals
• Through the pilot phase we expect to explore and push a series of perceived barriers to Cloud adoption:
– Security: unknown or low compliance and security standards
– Reliability: availability of service for business-critical tasks
– Data privacy: moving sensitive data to the Cloud
– Scalability/elasticity: will the Cloud scale up to our needs?
– Network performance: data transfer bottlenecks; QoS
– Integration: hybrid systems with in-house/legacy systems
– Vendor lock-in: dependency on vendors once data & applications have been transferred to the Cloud
– Legal concerns: such as who has legal liability
– Transparency: clarity of conditions, terms and pricing
What about commercial providers?
EPSRC Software Strategy
• Software as an Infrastructure – survey, response, action plan
– http://www.epsrc.ac.uk/SiteCollectionDocuments/other/SoftwareAsAnInfrastructure.pdf
• Areas
– Identification of new areas and grand challenges
– Enabling and promoting collaboration
– Research and development
– Training
– Career path support
– Joint funding models
– Supporting innovation
– User support
– Quality of code
– Sustainability of code
EPSRC Collaborative Computational Projects
CCP4 – Macromolecular Crystallography
CCP5 – The Computer Simulation of Condensed Phases
CCP9 – Computational Electronic Structure of Condensed Matter
CCP12 – High Performance Computing in Engineering
CCP-ASEArch – Algorithms and Software for Emerging Architectures
CCP-BioSim – Biomolecular Simulation at the Life Sciences Interface
CCP-EM – Electron cryo-Microscopy
CCPi – Tomographic Imaging
CCPN – NMR
CCP-NC – NMR Crystallography
CCPQ – Quantum Dynamics in Atomic, Molecular and Optical Physics
The Software Sustainability Institute
A national facility for building better software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• The SSI provides the expertise and services needed to negotiate to the next stage
– Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, training, community building, publicity and more…
SSI: Long Term Goals
• Provision of useful, effective services for the research software community – transfer knowledge and skills to the community
• Development and sharing of research community knowledge, intelligence and interactions – raise awareness, identify and address trends and issues
• Promotion of research software best practice – change the culture and attitudes to software
• Mantra: keep the software in its respective community – work with the community to increase its ability; don’t introduce dependency on the SSI as the developer; expand and exploit networks and opportunities
A National Facility for Research Software
Website: www.software.ac.uk
Email: [email protected]
Twitter: twitter.com/SoftwareSaved
Some current collaborations
Conclusion
• Research is increasingly dominated by digital generation, interpretation and storage of information
• e-Infrastructure is the full breadth of services, hardware, software and people which must be integrated to provide a common platform for all research
• You don’t need to own physical equipment to get access to e-infrastructure in the new landscape
• Making sure that you use the right tool for the right job is essential