GRID, PaaS for e-science? (thoughts about grids and clouds)
Jorge Gomes [email protected]
CloudViews, Porto, May 2010
Laboratório de Instrumentação e Física Experimental de Partículas
It seems reasonable to envision, for a time 10 or 15 years hence, a “thinking center” that will incorporate the functions of present-day libraries together with anticipated advances in information storage and retrieval and the symbiotic functions suggested earlier in this paper.
The picture readily enlarges itself into a network of such centers, connected to one another by wide-band communication lines and to individual users by leased-wire services.
In such a system, the speed of the computers would be balanced, and the cost of the gigantic memories and the sophisticated programs would be divided by the number of users.
Man-Computer Symbiosis, J.C.R. Licklider, 1960
A view from 1960
About LIP
• LIP is a Portuguese scientific research laboratory:
– High Energy Physics (HEP)
– Associated laboratory funded by the Portuguese public funding agencies
– Private non-profit association
– Created in 1986 when Portugal joined CERN
• LIP participation in physics experiments includes:
– ATLAS, CMS, COMPASS, Auger, AMS, SNO, Zeplin, HADES, …
• Other activities include:
– Building DAQ systems and particle detectors, detector R&D, medical physics, Geant4
– Electronics, precision mechanics, grid computing
CERN - LHC
About the LHC
• The Large Hadron Collider (LHC) is the largest scientific instrument on earth:
– Located at CERN on the Swiss/French border
– 27 km in circumference
– At 100 meters depth (average)
– 600 million particle collisions per second
– Reproducing the energy density that existed just a few moments after the big bang
• Objective:
– Probe deeper into the structure of matter than ever before
– Understand fundamental questions about the universe
• Four experiments working in parallel:
– ATLAS, CMS, ALICE, LHCb
Beams of protons will collide at an energy of 14 TeV
Beams of lead nuclei will collide at an energy of 1150 TeV
• Data volume
– High rate * large number of channels * 4 experiments
→ 15 PetaBytes of new data each year
• Compute power
– Event complexity * Nb. events * thousands of users
→ 100 k of (today's) fastest CPUs
• Worldwide analysis & funding
– Computing funding locally in major regions & countries
– Efficient analysis everywhere
– Hundreds of locations
→ GRID Computing
The LHC computing challenge
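As a rough back-of-the-envelope check on the data volume above, the 15 PB of new data per year can be converted into an average sustained rate. This is only a sketch: it assumes decimal petabytes (10^15 bytes) and a 365-day year, neither of which is stated on the slide, and real data taking is bursty rather than uniform.

```python
# Average sustained data rate implied by 15 PB of new data per year.
# Assumptions (not from the slides): 1 PB = 10**15 bytes, 365-day year.
PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_second = petabytes_per_year * PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / 10**6


if __name__ == "__main__":
    print(f"{average_rate_mb_per_s(15):.0f} MB/s on average")
```

Under these assumptions the average comes out a little under 0.5 GB/s, sustained around the clock, before any replication between sites is counted.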
[Figure: Grid infrastructure projects timeline, 2001–2010: DataGrid, CrossGrid, LCG, EELA, EGEE-I, EGEE-II, EGEE-III, Int.Eu.Grid, EGI InSPIRE]
LIP and Grid Computing projects
Today WLCG is a success
• Running increasingly high workloads:
– Jobs in excess of 650k / day; anticipate millions / day soon
– CPU equiv. ~100k cores
• Workloads are:
– Real data processing
– Simulations
– Analysis – more and more (new) users
• Data transfers at unprecedented rates
267 sites, 55 countries, 150,000 CPUs, 28 PB online, 41 PB offline, 16,000 users, 200 VOs, 660,000 jobs/day
Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
Was the largest multidisciplinary grid. Now being replaced by EGI-InSPIRE.
Middleware
• Security
– Virtual Organization Management (VOMS)
– MyProxy
• Data management
– File catalogue (LFC)
– File transfer service (FTS)
– Storage Element (SE)
– Storage Resource Management (SRM)
• Job management
– Workload Management System (WMS)
– Logging and Bookkeeping (LB)
– Computing Element (CREAM CE, LCG CE)
– Worker Nodes (WN)
• Information System
– Monitoring: BDII (Berkeley Database Information Index) and RGMA (Relational Grid Monitoring Architecture) aggregate service information from multiple grid sites; now moved to SAM (Site Availability Monitoring)
– Monitoring & visualization (Gridview, Dashboard, Gridmap etc.)
LIP workshop 2010 12
European Grid Initiative (EGI)
• European Grid Initiative replacing EGEE
– Scientific grid computing sustainability in Europe:
  • Grid computing in Europe is now critical for many scientific communities
  • Grid computing in Europe must not depend on isolated short term projects
– New organizational model with two layers:
  • National Grid Initiatives (NGIs) funded and managed by the governments
  • European Grid Initiative funded by the NGIs and by the EU
• EGI headquarters has been established in Amsterdam
• The transition is happening now!
• Most NGIs are not limited to grid computing:
– Distributed computing (current focus is grid computing)
– HPC
– HTC
– Applications
– Network provisioning for distributed computing
• There is a growing interest in cloud computing:
– by the NGIs
– by the research communities
National Grid Initiatives
• The 48 month EGI-InSPIRE (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) project will continue the transition to a sustainable pan-European e-Infrastructure started in EGEE-III. It will sustain support for Grids of high-performance and high-throughput computing resources, while seeking to integrate new Distributed Computing Infrastructures (DCIs), i.e. Clouds, SuperComputing, Desktop Grids, etc.
• Future technologies will include the integration of cloud resources (from either commercial or academic providers) into the production infrastructure offered to the European research community.
• Exploratory work to see how cloud computing technologies could be used to provision gLite services is already taking place within EGEE-III between the EC FP7 funded RESERVOIR project and the StratusLab collaboration. Work using the Azure environment from Microsoft will be explored through the VenusC project.
European Grid Initiative
• D2.6) Integration of Clouds and Virtualisation into the European production infrastructure: Provide a roadmap as to how clouds and virtualisation technology could be integrated into the EGI exploring not only the technology issues, but also the total costs of ownership of delivering such resources. [month 8]
European Grid Initiative
Portuguese NGI Main Site
• Electrical power:
– 2000 kVA
– 6x 200 kVA UPSs
– Emergency power generation
• Chilled water cooling:
– Chillers with free-cooling
– Close-control units
• Other characteristics:
– Computer room area 370 m²
– Fire detection and extinction
– Access control
– Remote monitoring and alarms
• Computing resources:
– Cluster HTC and HPC
– Online storage
– Offline storage
– Services
– Housing
The availability of computing resources initially targeted for grid computing opens new opportunities!!!
• Grid has been very successful for some user communities
• High Energy Physics matches grid computing perfectly:
– Large user community
– Excellent technical skills
– Very structured and well organized
– Users share common goals
– Users share common data
– Willing to share and collaborate
– Distributed users and resources (geographically and administratively)
– Huge amounts of data to process
• They have a motivation and a reward for sharing resources!
Lessons – user communities
• This is not valid for many other user communities:
– Small number of users (sometimes one single user)
– Not structured, sometimes even in direct competition
– Communities that are not very distributed
– Isolated peaks of activity instead of sustained usage
– No tradition of cooperation (also sociological)
• They have low motivation to share resources
• Sometimes it is possible to create common VOs for them:
– A good example is the EGEE biomed VO, which includes many independent researchers under the global coordination of EGEE
Lessons – user communities
• The model for the scientific grids is based on virtual organizations (VOs) - user communities:
– Users organize themselves and create VOs
– Users integrate their own resources
– Users share their resources with the other VO members
– They might share resources with other VOs
– There is no pay-per-use model
– There is no economic model
Lessons – business model
[Figure: EGEE / EGI grid infrastructure bus, linking Operations, Core Services, Helpdesk, VO support, Training, Dissemination and Middleware with the resource providers (Resources) and the virtual organizations (VOs)]
• Reduced motivation for the resource providers:
– No reward for providers not related to VOs
– Most frequently providers only share if they have a local user community that needs grid computing and pushes for it
– Providers tend to commit the minimum possible resources
– Small capacity to provide elasticity for the VOs
Lessons – business model
• Technical
– Standards effort is very important (Open Grid Forum)
– Interoperability (not perfect or complete but very valuable)
– Sophisticated data and resource management
– Worldwide authentication for scientific grids
– Common usage and security policies
– Many developments for privacy and security
– Powerful European infrastructure based on the Géant network
• European policy and coordination structure
– Creation of national grid initiatives supported by the governments
– Creation of the European Grid Initiative
– Model for long term sustainability
Lessons – grid achievements
• Mostly oriented to batch processing
• Complex architecture
• Steep learning curve
• Hard to deploy, maintain and troubleshoot
• May require considerable human resources to operate and use
• Creation of new VOs is a heavy task
• Several middleware stacks without full interoperability (gLite, ARC, UNICORE, Globus, ...)
• Applications may require some degree of porting
• Not very user friendly
• Reduced range of supported operating systems
• Too heavy for small sites
• Too heavy for users without very large processing requirements
Lessons – grid complaints
• Elastic grid infrastructures:
– Complement native grids with grids on top of clouds
– Plan the physical infrastructure for sustained loads and use cloud services to accommodate usage peaks
– Better elasticity for the VOs
– Lower costs and higher capacity for peak usage
– Native grids for HPC and HTC, clouds for HTC
– Valid both for grid sites and grid infrastructures
Grids & Clouds
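The idea of sizing the physical infrastructure for sustained load and sending only the peaks to a cloud can be sketched as a simple split of a demand profile. The function name, the demand figures and the 600-core local capacity are all hypothetical and only illustrate the principle; no real scheduler or cloud API is involved.

```python
# Split a demand profile between a fixed local grid capacity and
# cloud resources that are used only for the peaks.
# All numbers are hypothetical and only illustrate the idea.
from typing import List, Tuple


def split_demand(demand: List[int], local_capacity: int) -> Tuple[List[int], List[int]]:
    """Return (cores served locally, cores burst to the cloud) per time slot."""
    local = [min(d, local_capacity) for d in demand]
    cloud = [max(d - local_capacity, 0) for d in demand]
    return local, cloud


if __name__ == "__main__":
    demand = [400, 450, 500, 1200, 1500, 600, 420]  # requested cores per slot
    local, cloud = split_demand(demand, local_capacity=600)
    print("local:", local)
    print("cloud:", cloud)
    print("cloud core-slots needed:", sum(cloud))
```

Raising or lowering `local_capacity` shows the trade-off directly: a larger local cluster sits idle more often, a smaller one pushes more core-slots to the cloud.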
• Fully clouded grid infrastructures:
– Full grid infrastructure on top of clouds
– Fully dynamic: only pay for/allocate what is needed
– Grid managers have fewer worries about the underlying infrastructure and can concentrate on the service
– Easier deployment if releases are cloud oriented
– Unfortunately this is still problematic for HPC
– When using commercial providers:
  • Possibly more resilient infrastructures
  • Might not be so interesting for high sustained loads
  • Requires careful estimation of costs and economies
– But valid for a scientific/academic cloud service ...
Grids & Clouds
• In science there are additional issues:
– Is there money to pay for services?
– Is there money to pay for hardware?
– Is there money to pay for human resources?
– Is there money for maintenance?
– Is there money for the electricity?
– We only have big money once in a while ...
– Budgets and projects last one year so we don’t know if there will be money next year ...
• Requires:
– Careful estimation of costs
– Detailed and accurate planning
– Sustained funding
Grids & Clouds
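One way to approach the "careful estimation of costs" is to look for the utilization level at which owning hardware becomes cheaper than renting it on demand. The sketch below is a deliberately simplified model: the function names are illustrative and every price passed in would be a hypothetical placeholder, not a quoted figure. It also ignores data transfer, storage and staff costs.

```python
# Compare the yearly cost of owning cores with renting them on demand.
# Every price used with these helpers is a hypothetical placeholder.
HOURS_PER_YEAR = 365 * 24


def owned_cost_per_year(cores: int, yearly_cost_per_core: float) -> float:
    """Owned capacity is paid for whether or not it is used."""
    return cores * yearly_cost_per_core


def cloud_cost_per_year(cores: int, utilization: float, price_per_core_hour: float) -> float:
    """Cloud capacity is paid only for the fraction of hours actually used."""
    return cores * utilization * HOURS_PER_YEAR * price_per_core_hour


def break_even_utilization(yearly_cost_per_core: float, price_per_core_hour: float) -> float:
    """Utilization (0..1 or above) beyond which owning is cheaper than renting."""
    return yearly_cost_per_core / (price_per_core_hour * HOURS_PER_YEAR)
```

A break-even value above 1 would mean renting is cheaper at any utilization under the assumed prices; a low value matches the observation that commercial providers are less attractive for high sustained loads.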
• In science there are additional concerns:
– Black box: we don’t know the architecture and scalability behind the commercial clouds
– Lack of standard interfaces (provider lock-in)
– Performance for very data intensive applications
– Low latency for parallel applications
– Privacy, security and availability concerns
– Legal concerns
– Network bandwidth to the commercial Internet
– Future/evolution of the clouds and their costs
Grids & Clouds
• Support for special cases:
– As an NGI we get requests from all types of users with all sorts of requirements ...
– A more generic approach to support all sorts of computing requirements is welcome
– Users with small/medium computing requirements that don’t want to mess with grid computing
– Users that need very specific or non-supported distributed computing middleware
– Users that need very specific software environments
– Users that want to do things other than computing
Grids & Clouds
• Attract more resource providers:
– Bring in resource providers not rewarded by grid
– Cloud computing is more generic than grid
– It may bring in more academic / scientific resource providers
– Being generic is more advantageous, everybody can use it for something
– Then these new resources could also be used to provide grid over the cloud
– An economic model allowing providers to get credits for the CPU provided
Grids & Clouds
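The credit-based economic model mentioned above is not specified in the slides. A minimal sketch, assuming a hypothetical rule of one credit per CPU hour provided and one credit spent per CPU hour consumed, could look like this; the class and method names are illustrative only.

```python
# Toy ledger for a credit-based model: a provider earns credits for the CPU
# hours it contributes and spends them when its users consume CPU elsewhere.
# The one-credit-per-CPU-hour rule is an assumption made for illustration.
from collections import defaultdict


class CreditLedger:
    def __init__(self) -> None:
        # Credit balance per site; unknown sites start at zero.
        self.balance = defaultdict(float)

    def provide(self, provider: str, cpu_hours: float) -> None:
        """Credit a site for CPU hours it made available to others."""
        self.balance[provider] += cpu_hours

    def consume(self, consumer: str, cpu_hours: float) -> None:
        """Debit a site for CPU hours its users consumed elsewhere."""
        if self.balance[consumer] < cpu_hours:
            raise ValueError(f"{consumer} has insufficient credits")
        self.balance[consumer] -= cpu_hours


if __name__ == "__main__":
    ledger = CreditLedger()
    ledger.provide("site-a", 100)
    ledger.consume("site-a", 40)
    print(dict(ledger.balance))
```

A real scheme would also need to weight different hardware, handle storage, and decide whether credits expire; none of that is covered here.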
• Increase flexibility
– Use the same resources to support a wide range of users
  • Grid users with many different requirements and needs
  • Generic scientific computing users (non-grid)
  • Other types of needs
– Optimize the use of the infrastructures
  • Use free resources on grid and non-grid computing clusters
– Preservation of data and processes
  • Capability to resurrect older grid and non-grid computing environments to run legacy applications
– More power to the end users
  • Let the users choose and take care of their needs
  • Let operations people concentrate on running the infrastructure
Grids & Clouds
• Most building blocks do exist for “clouded grids”
• There is motivation to do it
• Several scenarios already demonstrated
• At LIP we run OpenNebula in our grid infrastructure
• Clouds are very suitable for HTC grid applications
• Some of the concerns related to clouds are not much different from the ones mentioned for grids
• For commercial grids further analysis is needed
• Evolution points in the direction of providing virtual grid infrastructures on clouds
Summary
http://www.ibergrid.eu/2010/
http://wiki.ncg.ingrid.pt/
• Portugal and Spain compose the EGEE Southwest federation
• LIP has coordinated EGEE activities in the country since EGEE-I
– Infrastructure operations coordination
– User and site support
– Infrastructure services
– Training
– Security and authentication
• The Portuguese sites are:
– LIP (Lisbon)
– LIP (Coimbra)
– UP (Porto)
– DI-Uminho (Braga)
– Uminho-CP (Braga)
– IEETA (Aveiro)
– CFP-IST (Lisbon)
– Univ Lusíada (Famalicão)
EGEE in Portugal
LIP
• Besides the Tier-2, LIP also provides computing resources for other research activities, namely within grid projects
– AUGER, SNO, COMPASS, CMS, ATLAS, medical physics, ESA, AMS, etc.
– LCG, EGEE, int.eu.grid, EELA, NGI, IBERGRID, etc.
• LCG has been the driving force behind grid computing in the country
• The Portuguese federated Tier-2 is composed of 3 sites:
– LIP-Lisbon
– LIP-Coimbra
– NGI main node for grid computing
LIP Tier-2 topology (Lisbon)
• SGE farm, about 600 cores:
– 37x HP DL160 G5
– 14x HP DL160 G6
– Other systems: SUN X2200, SUN X4100, DELL PE1950
• Force10 core switch
• DELL storage servers, ~40 TB each
• SRM storage: Lustre + StoRM, with SRM + GSIFTP doors
• gLite services: Computing Element, Site BDII, Monitoring Box
• Why
– Encapsulation
– Provide multiple environments for multiple applications and services
– More dynamic resource allocation profiting from existing resources
• Where and how
– Computing farm (mixture of real and virtual resources)
– To host grid (or other) persistent services
– For testing purposes
– Xen paravirtualization mostly
• Things being explored
– More dynamic approach to virtual resources for classic farm computing (enable multiple environments tailored to the VOs)
– A flexible framework for persistent virtual machines enabling resilient service provisioning
– Cloud computing services on top of the existing resources (virtual machines on demand)
– KVM (Kernel-based Virtual Machine): native virtualization with Intel VT and AMD-V; Red Hat bought Qumranet, the developer of KVM, and seems to be betting on it
Virtualization
Iniciativa Nacional Grid (INGRID)
• Initiative from the Portuguese Ministry of Science
– Launched in April 2006 in the context of “Ligar Portugal”
  • “Ligar Portugal” is a larger initiative for the information society
– Managed by the government agencies FCT and UMIC
  • Bodies from the Ministry of Science:
    – FCT is the Portuguese Science Foundation
    – UMIC is the Portuguese Knowledge Society Agency
– Technical coordination by LIP and UMIC
• Main objectives:
– Reinforce the national competence and capacity in the grid computing domain
– Enable the use of grid computing for complex problem solving
– Integrate Portugal in international grid computing infrastructures
– Reinforce the multidisciplinary collaboration among research communities
– Promote conditions for commercial companies to find know-how in the grid computing domain in the country
INGRID projects
• G-Cast: Application of GRID-computing in a coastal morphodynamics nowcast-forecast system
• GridClass - Learning Classifiers Systems for Grid Data Mining
• PoliGrid - distributed policies for resource management in Grids
• Collaborative Resources Online to Support Simulations on Forest Fires (CROSS-Fire): a Grid Platform to Integrate Geo-referenced Web Services for Real-Time Management
• GRID for ATLAS/LHC data simulation and analysis
• GERES-med: Grid-Enabled REpositorieS for medical applications
• BING – Brain Imaging Network Grid
• GRITO – A Grid for preservation
• PM#GRID - GRID Platform Development for European Scale Satellite Based Air Pollution Mapping
• AspectGrid: Pluggable Grid Aspects for Scientific Applications
• P-found: GRID computing and distributed data warehousing of protein folding and unfolding simulations
Infrastructures and projects
[Figure: the NGI infrastructure (INGRID+) in relation to INGRID, EGI, IBERGRID, LCG, ...]
• Users:
– INGRID projects
– Virtual organizations (national and international)
– Other users with demanding computing requirements
• Resources:
– Core resources (main node etc.)
– Existing resources (EGEE, int.eu.grid, EELA, INGRID projects...)
– Other resources
Create an autonomous NGI grid infrastructure
Setup of NGI Core Resources
• Core resources initially composed of three grid clusters:
– Main node for grid computing
  • New facility located at LNEC
– Grid resources provided by the LIP computer centre in Lisbon
  • Located at the LIP facilities in Lisbon
– Additional grid resources provided by the LIP computer centre in Coimbra
  • Located at the CFC datacentre in the University of Coimbra
• Support for the integration of computing resources in the country:
– Initially focus on existing resource centres
– Expand to other sites at a later stage
– Concentrate on gLite resources
– Funding line to support the organizations providing resources
Main node - Location
• The main node is being built by a consortium of research organizations under the Portuguese NGI context:
– LIP, FCCN, LNEC
• The project started in 2007.
• Some components are already operational.
• It will become officially operational in the coming weeks.
• The centre is located at the LNEC campus, very near the FCCN NOC in Lisbon
• Excellent network connectivity:
– FCCN national backbone
– Géant PoP
• Facilities to house computing equipment are very expensive
– 900K Euro in equipment
– More than 1200K Euro in the facility (low construction cost)
• Operational costs are also very heavy
– Electrical power for cooling and all the systems
– Environmental impact also relevant
• Optimization very important
– Minimize electrical power losses
– Maximize effectiveness of cooling systems
• Measures
– Chillers + free cooling
– Minimize mixture of hot and cold air
– Careful set point selection for air conditioning
– Highly efficient UPS systems and power supplies
– Use blade centers for higher power efficiency
– Power efficiency study (look at reusing heat or other forms of generating power)
– Look at ways to turn off/on systems dynamically
Main Node Facility – Details
• Setup
– Tape library LTO-4
  • Grid accessible data repositories
  • Hierarchical storage
– Core grid services
  • Two blade centers
  • 192 CPU cores
– Grid cluster
  • HTC and HPC blades
  • ~1250 CPU cores for processing
– Online grid storage
  • Server direct attached storage
  • ~620 TB raw + 70 TB raw SAN
– Local network
  • Core 10 Gigabit Ethernet
  • Non-blocking, wire-speed, low latency
– Resources from other organizations:
  • LNEC grid cluster
Main node - Computing resources
Computing Resources
• High Throughput Computing Servers
– IBM blades:
  • 2 quad-core AMD Opteron 2356 processors
  • 2 quad-core Intel Xeon E5420 processors
– HP blades:
  • 2 quad-core Intel Xeon X5550 processors
– 3 GB of RAM per core (24 GB per blade)
– Running SL5 x86_64
• High Performance Computing Servers
– IBM blades:
  • 2 quad-core AMD Opteron 2356 processors
  • InfiniBand
  • 4 GB of RAM per core (32 GB per blade)
  • Running SL5 x86_64
Storage Resources
• Storage servers and expansion boxes
– IBM X3650 servers running SL5 x86_64
  • 2 quad-core Intel(R) Xeon(R) L5420 CPUs
  • 2 SAS disks deployed in RAID mirror
  • 10 Gigabit Ethernet adapters
  • Each server has 40 TB of effective storage associated with it
  • Expansion boxes in RAID 5 volumes with 1 TB SATA-II disks
  • Total of ~620 TB of online grid storage space
– HP DL360 servers running SL5 x86_64
  • 2 quad-core Intel(R) Xeon(R) L5420 CPUs
  • 2 SAS disks deployed in RAID mirror
  • 10 Gigabit Ethernet adapters
  • Each server has 40 TB of effective storage associated with it
  • Expansion boxes with 450 GB SAS disks
Jornadas RCTS 2010 48
Main node for grid computing - schema
[Figure: main node schema. A core 10 Gigabit Ethernet switch connects the computing blades (SGE cluster, HPC and HTC), the storage (Lustre + StoRM) and the support services blades. 1st phase: ~1250 CPU cores, ~620 TB raw]
Middleware
• What do we want:
– Interoperability with other organizations
– Long term support
– Reliability
– Low cost
• Choice:
– Long term: may depend on decisions taken at European level in EGI (UMD)
– Short term: use gLite
– Medium term: consider other user needs
• gLite:
– Possibly the most used middleware in European and other grid infrastructures
– Already being used by the Portuguese resource centres in EGEE, Int.Eu.Grid and EELA
– gLite developers participate actively in the international standardization bodies
• But ...
– Difficult to deploy and maintain, some reliability issues, too HEP-centric
– We will integrate additional components when needed
– MPI support with Int.Eu.Grid middleware extensions
– Cloud computing can be a good complementary technology
• EGI-DS
– European Grid Initiative planning
– Portugal on the policy board (UMIC)
• EGI
– Portugal is a member through UMIC
– First national fee paid
• EGI InSPIRE
– Integrated Sustainable Pan-European Infrastructure for Researchers in Europe
– Under EU negotiations
– Main project for infrastructure coordination and operation
• EGI InSPIRE international tasks
– International bid for global tasks
– Portugal and Spain in the middleware rollout coordination
European Grid Initiative (EGI)
The EGI-InSPIRE Project
Integrated Sustainable Pan-European Infrastructure for Researchers in Europe
• A 4 year project with €25M EC contribution
– Project cost €69M
– Total effort ~€330M
– Staff ~170 FTE
• Project partners (48): EGI.eu, 37 NGIs, 2 EIROs, 8 AP
[Figure legend: Funded / Un-Funded]
EGI-InSPIRE - EGEE UF5
[Figure: EGI user support structure. Users and VOs are grouped into Virtual Research Communities (including ESFRI projects) and represented in the User Community Board; the NGIs and EGI.eu provide helpdesks (NGI, EGI, VRC, other), training events, trainers and Apps.DB]
The European Grid Initiative
IBERGRID
• Is a common Portuguese/Spanish Iberian infrastructure
• IBERGRID will provide an umbrella for an Iberian regional grid
– Integrating Portuguese and Spanish NGI resources
– Fully interoperable with EGI
• Focus is now on the IBERGRID development as a requirement for a successful common participation in EGI
– Towards a sustainable model but without losing synergies and advantages
• Current status
– Main grid core services have been deployed in both countries
– The initial set of virtual organizations has been created
– Several sites are already configured to support IBERGRID VOs
– Testing of this pilot infrastructure is ongoing
IBERGRID and EGI
Portuguese grid initiative Spanish grid initiative
IBERGRID = grid computing, HPC, applications, networks, volunteer computing
• Portugal
– IBERGRID common VOs management and coordination
– Operations portal
– Catalogues and services for the IBERGRID VOs
– Certification Authority for Portugal (LIPCA)
• Spain
– Helpdesk (Request Tracker)
– Monitoring and accounting
– Infrastructure database (GOCDB/HGSM)
– Certification Authority (PKIrisGrid)
– Middleware security
• Common
– Core services and redundancy
– Regional information system
– Support groups
– Operations coordination
– Training infrastructure
– Infrastructure security
– Seed resources for new users
Iberian transition plan