Download - Jorge gomes

CloudViews, Porto, May 2010

GRID, PaaS for e-science? (thoughts about grids and clouds)

Jorge Gomes

Laboratório de Instrumentação e Física Experimental de Partículas

http://www.lip.pt/index.php


It seems reasonable to envision, for a time 10 or 15 years hence, a “thinking center” that will incorporate the functions of present-day libraries together with anticipated advances in information storage and retrieval and the symbiotic functions suggested earlier in this paper.

The picture readily enlarges itself into a network of such centers, connected to one another by wide-band communication lines and to individual users by leased-wire services.

In such a system, the speed of the computers would be balanced, and the cost of the gigantic memories and the sophisticated programs would be divided by the number of users.

Man-Computer Symbiosis, J.C.R. Licklider, 1960

A view from 1960





About LIP
• LIP is a Portuguese scientific research laboratory:
– High Energy Physics (HEP)
– Associated laboratory funded by the Portuguese public funding agencies
– Private non-profit association
– Created in 1986 when Portugal joined CERN
• LIP participation in physics experiments includes:
– ATLAS, CMS, Compass, Auger, AMS, SNO, Zeplin, Hades, …
• Other activities include:
– Building DAQ systems and particle detectors, detectors R&D, medical physics, Geant4
– Electronics, precision mechanics, grid computing

CERN - LHC




About the LHC
• The Large Hadron Collider (LHC) is the largest scientific instrument on earth:
– Located at CERN on the Swiss/French border
– 27 km in circumference
– At 100 meters depth (average)
– 600 million particle collisions per second
– Reproducing the energy density that existed just a few moments after the big bang
• Objective:
– Probe deeper into the structure of matter than ever before
– Understand fundamental questions about the universe
• Four experiments working in parallel:
– ATLAS, CMS, ALICE, LHCb

Beams of protons will collide at an energy of 14 TeV

Beams of lead nuclei will collide at an energy of 1150 TeV




The LHC computing challenge
• Data volume
– High rate * large number of channels * 4 experiments → 15 PetaBytes of new data each year
• Compute power
– Event complexity * Nb. events * thousands of users → 100 k of (today's) fastest CPUs
• Worldwide analysis & funding
– Computing funding locally in major regions & countries
– Efficient analysis everywhere
– Hundreds of locations
→ GRID Computing
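To put the 15 PetaBytes per year into perspective, the figure can be converted into an average sustained data rate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal petabytes (10^15 bytes) and a data flow spread evenly over a 365-day year, neither of which is stated in the slides.

```python
# Rough conversion of a yearly data volume into an average sustained rate.
# Assumptions (not from the slides): decimal units and a uniform data flow.
SECONDS_PER_YEAR = 365 * 24 * 3600
PETABYTE = 10**15  # bytes, decimal petabyte
MEGABYTE = 10**6   # bytes, decimal megabyte


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in MB/s for a given yearly volume in PB."""
    bytes_per_year = petabytes_per_year * PETABYTE
    return bytes_per_year / SECONDS_PER_YEAR / MEGABYTE


if __name__ == "__main__":
    rate = average_rate_mb_per_s(15)
    print(f"15 PB/year is roughly {rate:.0f} MB/s sustained")
```

Real detector output is bursty rather than uniform, so peak rates would be higher than this average.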



LIP and Grid Computing projects

[Figure: Grid Infrastructure Projects Timeline, 2001 to 2010. Projects shown: DataGrid, CrossGrid, LCG, EGEE-I, EELA, EGEE-II, Int.Eu.Grid, EGEE-III, EGI InSPIRE]




Today WLCG is a success

• Running increasingly high workloads:
– Jobs in excess of 650k / day; anticipate millions / day soon
– CPU equiv. ~100k cores
• Workloads are:
– Real data processing
– Simulations
– Analysis: more and more (new) users
• Data transfers at unprecedented rates


267 sites, 55 countries, 150,000 CPUs, 28 PB online, 41 PB offline, 16,000 users, 200 VOs, 660,000 jobs/day

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

Was the largest multidisciplinary grid. Now being replaced by EGI InSPIRE.




Middleware

• Security
– Virtual Organization Management (VOMS)
– MyProxy
• Data management
– File catalogue (LFC)
– File transfer service (FTS)
– Storage Element (SE)
– Storage Resource Management (SRM)
• Job management
– Workload Management System (WMS)
– Logging and Bookkeeping (LB)
– Computing Element (CREAM CE, LCG CE)
– Worker Nodes (WN)
• Information System
– Monitoring: BDII (Berkeley Database Information Index) and RGMA (Relational Grid Monitoring Architecture) aggregate service information from multiple Grid sites, now moved to SAM (Site Availability Monitoring)
– Monitoring & visualization (Gridview, Dashboard, Gridmap etc.)
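Jobs handed to the Workload Management System are described in a small text format, JDL (Job Description Language). As a rough illustration, the sketch below renders a Python dictionary as JDL-style text. The helper names (`Expr`, `jdl_value`, `render_jdl`) are hypothetical, the job values are made up, and the attribute names follow commonly documented gLite JDL usage; treat it as a sketch, not as a reference for the format.

```python
# Minimal sketch: render a job description as JDL-style text.
# Expr, jdl_value and render_jdl are illustrative helpers, not part of gLite.


class Expr(str):
    """Marks a raw expression that must be emitted without quotes."""


def jdl_value(value):
    """Format one Python value using JDL-style syntax."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)


def render_jdl(attributes):
    """Render a dict of attributes as a bracketed JDL-style block."""
    lines = [f"  {name} = {jdl_value(val)};" for name, val in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"


if __name__ == "__main__":
    job = {
        "Executable": "/bin/hostname",
        "StdOutput": "std.out",
        "StdError": "std.err",
        "OutputSandbox": ["std.out", "std.err"],
        # Assumed requirement: only match resources published as in production.
        "Requirements": Expr('other.GlueCEStateStatus == "Production"'),
    }
    print(render_jdl(job))
```

The resulting text would normally be saved to a file and passed to the middleware's job submission tools, which then match it against the resources published in the information system.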



LIP workshop 2010

European Grid Initiative (EGI)

• European Grid Initiative replacing EGEE
– Scientific grid computing sustainability in Europe:
  • Grid computing in Europe is now critical for many scientific communities
  • Grid computing in Europe must not depend on isolated short term projects
– New organizational model with two layers:
  • National Grid Initiatives (NGIs) funded and managed by the governments
  • European Grid Initiative funded by the NGIs and by the EU
• EGI headquarters has been established in Amsterdam
• The transition is happening now!



National Grid Initiatives
• Most NGIs are not limited to grid computing:
– Distributed computing (current focus is grid computing)
– HPC
– HTC
– Applications
– Network provisioning for distributed computing
• There is a growing interest in cloud computing:
– by the NGIs
– by the research communities




• The 48 month EGI-InSPIRE (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) project will continue the transition to a sustainable pan-European e-Infrastructure started in EGEE-III. It will sustain support for Grids of high-performance and high-throughput computing resources, while seeking to integrate new Distributed Computing Infrastructures (DCIs), i.e. Clouds, SuperComputing, Desktop Grids, etc.

• Future technologies will include the integration of cloud resources (from either commercial or academic providers) into the production infrastructure offered to the European research community.

• Exploratory work to see how cloud computing technologies could be used to provision gLite services is already taking place within EGEE-III between the EC FP7 funded RESERVOIR project and the StratusLab collaboration. Work using the Azure environment from Microsoft will be explored through the VenusC project.

European Grid Initiative




• D2.6) Integration of Clouds and Virtualisation into the European production infrastructure: Provide a roadmap as to how clouds and virtualisation technology could be integrated into the EGI exploring not only the technology issues, but also the total costs of ownership of delivering such resources. [month 8]

European Grid Initiative





Portuguese NGI Main Site

• Electrical power:
– 2000 kVA
– 6x 200 kVA UPSs
– Emergency power generation
• Chilled water cooling:
– Chillers with free-cooling
– Close-control units
• Other characteristics:
– Computer room area 370 m²
– Fire detection and extinction
– Access control
– Remote monitoring and alarms
• Computing resources:
– Cluster HTC and HPC
– Online storage
– Offline storage
– Services
– Housing

The availability of computing resources initially targeted for grid computing opens new opportunities!



Lessons – user communities
• Grid has been very successful for some user communities
• High Energy Physics is a perfect match for grid computing:
– Large user community
– Excellent technical skills
– Very structured and well organized
– Users share common goals
– Users share common data
– Willing to share and collaborate
– Distributed users and resources (geographically and administratively)
– Huge amounts of data to process
• They have a motivation and a reward for sharing resources!




Lessons – user communities
• This is not valid for many other user communities:
– Small number of users (sometimes one single user)
– Not structured, sometimes even in direct competition
– Communities that are not very distributed
– Isolated peaks of activity instead of sustained usage
– No tradition of cooperation (also sociological)
• They have low motivation to share resources
• Sometimes it is possible to create common VOs for them:
– A good example is the EGEE biomed VO, which includes many independent researchers under the global coordination of EGEE




Lessons – business model
• The model for the scientific grids is based on virtual organizations (VO), i.e. user communities:
– Users organize themselves and create VOs
– Users integrate their own resources
– Users share their resources with the other VO members
– They might share resources with other VOs
– There is no pay-per-use model
– There is no economic model


[Figure: EGEE / EGI grid infrastructure bus. Components shown: Operations, Core Services, Helpdesk, VO support, Training, Dissemination, Middleware. Several Resources blocks and several VOs are attached to the bus.]



Lessons – business model
• Reduced motivation for the resource providers:
– No reward for providers not related to VOs
– Most frequently providers only share if they have a local user community that needs grid computing and pushes for it
– Providers tend to commit the minimum possible resources
– Small capacity to provide elasticity for the VOs




Lessons – grid achievements
• Technical
– Standards effort is very important (Open Grid Forum)
– Interoperability (not perfect or complete but very valuable)
– Sophisticated data and resource management
– Worldwide authentication for scientific grids
– Common usage and security policies
– Many developments for privacy and security
– Powerful European infrastructure based on Géant network
• European policy and coordination structure
– Creation of national grid initiatives supported by the governments
– Creation of the European Grid Initiative
– Model for long term sustainability




Lessons – grid complaints
• Mostly oriented towards batch processing
• Complex architecture
• Steep learning curve
• Hard to deploy, maintain and troubleshoot
• May require considerable human resources to operate and use
• Creation of new VOs is a heavy task
• Several middleware stacks without full interoperability (gLite, ARC, UNICORE, Globus, ...)
• Applications may require some degree of porting
• Not very user friendly
• Reduced range of supported operating systems
• Too heavy for small sites
• Too heavy for users without very large processing requirements




Grids & Clouds
• Elastic grid infrastructures:
– Complement native grids with grids on top of clouds
– Plan the physical infrastructure for sustained loads and use cloud services to accommodate usage peaks
– Better elasticity for the VOs
– Lower costs and higher capacity for peak usage
– Native grids for HPC and HTC, clouds for HTC
– Valid both for grid sites and grid infrastructures
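The idea of sizing the physical infrastructure for the sustained load and bursting to cloud services for the peaks can be sketched as a simple split of an hourly demand profile. The function name `split_load` and the demand numbers are made up for illustration; this is a minimal sketch, not a model of any real site.

```python
# Sketch: split an hourly demand profile between owned cores and cloud bursting.
# The demand profile and capacity below are invented illustrative numbers.


def split_load(demand_cores, owned_cores):
    """Return (local_core_hours, cloud_core_hours) for hourly demand samples.

    Each entry of demand_cores is the number of cores requested during one hour.
    Demand up to owned_cores runs locally; the excess is sent to the cloud.
    """
    local = sum(min(d, owned_cores) for d in demand_cores)
    cloud = sum(max(d - owned_cores, 0) for d in demand_cores)
    return local, cloud


if __name__ == "__main__":
    hourly_demand = [800, 900, 1000, 2500, 3000, 900]  # hypothetical profile
    local, cloud = split_load(hourly_demand, owned_cores=1000)
    print(f"local core-hours: {local}, cloud core-hours: {cloud}")
```

With a profile like this, most hours fit on the owned cores and only the two peak hours spill over, which is the situation where bursting is attractive.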




Grids & Clouds
• Fully clouded grid infrastructures:
– Full grid infrastructure on top of clouds
– Fully dynamic: only pay for (allocate) what is needed
– Grid managers have fewer worries about the underlying infrastructure and can concentrate on the service
– Easier deployment if releases are cloud oriented
– Unfortunately this is still problematic for HPC
– When using commercial providers:
  • Possibly more resilient infrastructures
  • Might not be so interesting for high sustained loads
  • Requires careful estimation of costs and economies
– But valid for a scientific/academic cloud service ...
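Why commercial clouds might be less interesting for high sustained loads can be illustrated with a break-even estimate: owned hardware costs roughly the same whether it is busy or idle, while cloud capacity is paid per hour used. The sketch below assumes a flat yearly cost per owned core and a flat cloud price per core-hour. Both prices are hypothetical placeholders, and the function names are illustrative.

```python
# Sketch: utilization above which owning a core is cheaper than renting it.
# All prices are hypothetical placeholders, not real provider or site figures.
HOURS_PER_YEAR = 365 * 24


def break_even_utilization(owned_cost_per_core_year, cloud_price_per_core_hour):
    """Fraction of the year a core must be busy for ownership to pay off."""
    return owned_cost_per_core_year / (cloud_price_per_core_hour * HOURS_PER_YEAR)


def cheaper_option(utilization, owned_cost_per_core_year, cloud_price_per_core_hour):
    """Return 'own' or 'cloud' for a given average utilization (0.0 to 1.0)."""
    cloud_cost = utilization * HOURS_PER_YEAR * cloud_price_per_core_hour
    return "own" if owned_cost_per_core_year < cloud_cost else "cloud"


if __name__ == "__main__":
    threshold = break_even_utilization(300.0, 0.08)  # EUR, assumed values
    print(f"break-even utilization: {threshold:.1%}")
    print("at 90% utilization:", cheaper_option(0.9, 300.0, 0.08))
    print("at 10% utilization:", cheaper_option(0.1, 300.0, 0.08))
```

A real estimate would also need to account for staff, electricity, facility amortization and data transfer charges.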




Grids & Clouds
• In science there are additional issues:
– Is there money to pay for services?
– Is there money to pay for hardware?
– Is there money to pay for human resources?
– Is there money for maintenance?
– Is there money for the electricity?
– We only have big money once in a while ...
– Budgets and projects last one year so we don’t know if there will be money next year ...
• Requires:
– Careful estimation of costs
– Detailed and accurate planning
– Sustained funding




Grids & Clouds
• In science there are additional concerns:
– Black box: we don’t know the architecture and scalability behind the commercial clouds
– Lack of standard interfaces (provider lock-in)
– Performance for very data intensive applications
– Low latency for parallel applications
– Privacy, security and availability concerns
– Legal concerns
– Network bandwidth to the commercial Internet
– Future/evolution of the clouds and their costs




Grids & Clouds
• Support for special cases:
– As an NGI we get requests from all types of users with all sorts of requirements ...
– A more generic approach to support all sorts of computing requirements is welcome
– Users with small/medium computing requirements that don’t want to mess with grid computing
– Users that need very specific or non-supported distributed computing middleware
– Users that need very specific software environments
– Users that want to do things other than computing




Grids & Clouds
• Attract more resource providers:
– Bring in resource providers not rewarded by grid
– Cloud computing is more generic than grid
– It may bring in more academic / scientific resource providers
– Being generic is more advantageous, everybody can use it for something
– These new resources could then also be used to provide grid over the cloud
– An economic model that allows providers to earn credits for the CPU they provide
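The slides only mention an economic model in which providers earn credits for the CPU they provide. One possible reading is a simple ledger where each site accumulates credits for the CPU hours it contributes and spends them when its users consume resources elsewhere. The class below is a hypothetical sketch of that reading with a one-to-one exchange rate, not a description of any existing accounting system.

```python
# Hypothetical sketch of a CPU credit ledger (one credit per CPU hour).
from collections import defaultdict


class CreditLedger:
    """Tracks CPU-hour credits earned and spent by each site."""

    def __init__(self):
        self._balance = defaultdict(float)

    def record_provided(self, site, cpu_hours):
        """Credit a site for CPU hours it made available to others."""
        if cpu_hours < 0:
            raise ValueError("cpu_hours must be non-negative")
        self._balance[site] += cpu_hours

    def record_consumed(self, site, cpu_hours):
        """Debit a site for CPU hours its users consumed elsewhere."""
        if cpu_hours < 0:
            raise ValueError("cpu_hours must be non-negative")
        if cpu_hours > self._balance[site]:
            raise ValueError(f"{site} has insufficient credits")
        self._balance[site] -= cpu_hours

    def balance(self, site):
        """Return the current credit balance of a site."""
        return self._balance[site]


if __name__ == "__main__":
    ledger = CreditLedger()
    ledger.record_provided("site-a", 1000)  # site names are made up
    ledger.record_consumed("site-a", 250)
    print("site-a balance:", ledger.balance("site-a"))
```

A real scheme would have to decide how to weight different hardware, whether credits expire, and how to treat sites that only consume.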




Grids & Clouds
• Increase flexibility
– Use the same resources to support a wide range of users
  • Grid users with many different requirements and needs
  • Generic scientific computing users (non-grid)
  • Other types of needs
– Optimize the use of the infrastructures
  • Use free resources on grid and non-grid computing clusters
– Preservation of data and processes
  • Capability to resurrect older grid and non-grid computing environments to run legacy applications
– More power to the end users
  • Let the users choose and take care of their needs
  • Let operations people concentrate on running the infrastructure




Summary
• Most building blocks do exist for “clouded grids”
• There is motivation to do it
• Several scenarios already demonstrated
• At LIP we run OpenNebula in our grid infrastructure
• Clouds are very suitable for HTC grid applications
• Some of the concerns related to clouds are not much different from the ones mentioned for grids
• For commercial grids further analysis is needed
• Evolution points in the direction of providing virtual grid infrastructures on clouds





http://www.ibergrid.eu/2010/

http://wiki.ncg.ingrid.pt/





EGEE in Portugal
• Portugal and Spain compose the EGEE Southwest federation
• LIP coordinates EGEE activities in the country since EGEE-I
– Infrastructure operations coordination
– User and site support
– Infrastructure services
– Training
– Security and authentication
• The Portuguese sites are:
– LIP (Lisbon)
– LIP (Coimbra)
– UP (Porto)
– DI-Uminho (Braga)
– Uminho-CP (Braga)
– IEETA (Aveiro)
– CFP-IST (Lisbon)
– Univ Lusíada (Famalicão)





LIP

• Besides the Tier-2, LIP also provides computing resources for other research activities, namely within grid projects
– AUGER, SNO, COMPASS, CMS, ATLAS, medical physics, ESA, AMS, etc.
– LCG, EGEE, int.eu.grid, EELA, NGI, IBERGRID, etc.
• LCG has been the driving force behind grid computing in the country
• The Portuguese federated Tier-2 is composed of 3 sites:
– LIP-Lisbon
– LIP-Coimbra
– NGI main node for grid computing




LIP Tier-2 topology (Lisbon)

...

• SGE farm: 37x HP DL160G5, 14x HP DL160G6 and other systems (SUN X2200, SUN X4100, DELL PE1950), about 600 cores
• Force10 core switch
• DELL storage servers, ~40 TB each
• SRM storage: Lustre + StoRM
• SRM + GSIFTP doors
• Computing Element
• Site BDII
• Monitoring Box
• gLite



Virtualization
• Why
– Encapsulation
– Provide multiple environments for multiple applications and services
– More dynamic resource allocation profiting from existing resources
• Where and how
– Computing farm (mixture of real and virtual resources)
– To host grid (or other) persistent services
– For testing purposes
– Xen paravirtualization mostly
• Things being explored
– More dynamic approach to virtual resources for classic farm computing (enable multiple environments tailored to the VOs)
– A flexible framework for persistent virtual machines enabling resilient service provisioning
– Cloud computing services on top of the existing resources (virtual machines on demand)
– KVM (Kernel-based Virtual Machine) native virtualization using Intel-VT and AMD-V; Red Hat bought Qumranet, the developer of KVM, and seems to be betting on it
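KVM relies on the hardware virtualization extensions mentioned above (Intel-VT or AMD-V). On Linux x86 hosts these show up as the `vmx` and `svm` CPU flags in `/proc/cpuinfo`, so a worker node can be checked with a few lines of code. The function name `virtualization_support` is illustrative, and the sketch only inspects CPU flags. It does not check whether the extensions are enabled in the firmware or whether the KVM modules are loaded.

```python
# Sketch: detect hardware virtualization support from /proc/cpuinfo text.
# Only the CPU flags are inspected; firmware settings are not checked.


def virtualization_support(cpuinfo_text):
    """Return 'Intel VT-x', 'AMD-V' or None based on the CPU flags."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            result = virtualization_support(handle.read())
    except OSError:
        result = None  # not a Linux host, or the file is unreadable
    print("hardware virtualization:", result or "not detected")
```

Hosts without these flags can still run paravirtualized Xen guests, which matches the mostly paravirtualized setup described in the slide.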





Iniciativa Nacional Grid (INGRID)

• Initiative from the Portuguese Ministry of Science
– Launched in April 2006 in the context of “Ligar Portugal”
  • “Ligar Portugal” is a larger initiative for the information society
– Managed by the government agencies FCT and UMIC
  • Bodies from the Ministry of Science:
  – FCT is the Portuguese Science Foundation
  – UMIC is the Portuguese Knowledge Society Agency
– Technical coordination by LIP and UMIC
• Main objectives:
– Reinforce the national competence and capacity in the grid computing domain
– Enable the use of grid computing for complex problem solving
– Integrate Portugal in international grid computing infrastructures
– Reinforce the multidisciplinary collaboration among research communities
– Promote conditions for commercial companies to find in the country know-how in the grid computing domain




INGRID projects

• G-Cast: Application of GRID-computing in a coastal morphodynamics nowcast-forecast system

• GridClass - Learning Classifiers Systems for Grid Data Mining

• PoliGrid - distributed policies for resource management in Grids

• Collaborative Resources Online to Support Simulations on Forest Fires (CROSS-Fire): a Grid Platform to Integrate Geo-referenced Web Services for Real-Time Management

• GRID for ATLAS/LHC data simulation and analysis

• GERES-med: Grid-Enabled REpositorieS for medical applications

• BING – Brain Imaging Network Grid

• GRITO – A Grid for preservation

• PM#GRID - GRID Platform Development for European Scale Satellite Based Air Pollution Mapping

• AspectGrid: Pluggable Grid Aspects for Scientific Applications

• P-found: GRID computing and distributed data warehousing of protein folding and unfolding simulations




Infrastructures and projects

[Figure: INGRID+ infrastructure and its relation to INGRID, EGI, IBERGRID, LCG, ...]
• Goal: create an autonomous NGI grid infrastructure
• Users:
– INGRID projects
– Virtual organizations (national and international)
– Other users with demanding computing requirements
• Resources:
– Core resources (main node etc.)
– Existing resources (EGEE, int.eu.grid, EELA, INGRID projects...)
– Other resources




Setup of NGI Core Resources

• Core resources initially composed of three grid clusters:
– Main node for grid computing
  • New facility located at LNEC
– Grid resources provided by the LIP computer centre in Lisbon
  • Located at the LIP facilities in Lisbon
– Additional grid resources provided by the LIP computer centre in Coimbra
  • Located at the CFC datacentre in the University of Coimbra
• Support for the integration of computing resources in the country:
– Initially focus on existing resource centres
– Expand to other sites at a later stage
– Concentrate on gLite resources
– Funding line to support the organizations providing resources




Main node - Location

• The main node is being built by a consortium of research organizations under the Portuguese NGI context:
– LIP, FCCN, LNEC
• The project started in 2007.
• Some components are already operational.
• It will become officially operational in the coming weeks.
• The centre is located at the LNEC campus, very near the FCCN NOC in Lisbon
• Excellent network connectivity:
– FCCN national backbone
– Géant PoP



Main Node Facility – Details
• Facilities to house computing equipment are very expensive
– 900K Euro in equipment
– More than 1200K Euro in the facility (low construction cost)
• Operational costs are also very heavy
– Electrical power for cooling and all the systems
– Environmental impact also relevant
• Optimization is very important
– Minimize electrical power losses
– Maximize effectiveness of cooling systems
• Measures
– Chillers + free cooling
– Minimize mixture of hot and cold air
– Careful set point selection for air conditioning
– Highly efficient UPS systems and power supplies
– Use blade centers for higher power efficiency
– Power efficiency study (look at reusing heat or other forms of generating power)
– Look at ways to turn off/on systems dynamically




Main node - Computing resources
• Setup
– Tape library LTO-4
  • Grid accessible data repositories
  • Hierarchical storage
– Core grid services
  • Two blade centers
  • 192 CPU cores
– Grid cluster
  • HTC and HPC blades
  • ~1250 CPU cores for processing
– Online grid storage
  • Server direct attached storage
  • ~620 TB raw + 70 TB raw SAN
– Local network
  • Core 10 gigabit Ethernet
  • Non-blocking, wire-speed, low latency
– Resources from other organizations:
  • LNEC grid cluster




Computing Resources

• High Throughput Computing Servers
– IBM blades:
  • 2 quad-core AMD Opteron 2356 processors
  • 2 quad-core Intel Xeon E5420 processors
– HP blades:
  • 2 quad-core Intel Xeon X5550 processors
– 3 GB of RAM per core (24 GB per blade)
– Running SL5 x86_64
• High Performance Computing Servers
– IBM blades:
  • 2 quad-core AMD Opteron 2356 processors
  • Infiniband
  • 4 GB of RAM per core (32 GB per blade)
  • Running SL5 x86_64




Storage Resources

• Storage servers and expansion boxes
– IBM X3650 servers running SL5 x86_64
  • 2 quad-core Intel(R) Xeon(R) L5420 CPUs
  • 2 SAS disks deployed in RAID mirror
  • 10 Gigabit Ethernet adapters
  • Each server has 40 TB of effective storage associated
  • Expansion boxes in RAID 5 volumes with 1 TB SATA-II disks
  • Total of ~620 TB of online grid storage space
– HP DL360 servers running SL5 x86_64
  • 2 quad-core Intel(R) Xeon(R) L5420 CPUs
  • 2 SAS disks deployed in RAID mirror
  • 10 Gigabit Ethernet adapters
  • Each server has 40 TB of effective storage associated
  • Expansion boxes with 450 GB SAS disks



Jornadas RCTS 2010

Main node for grid computing - schema

[Figure: core 10 gigabit Ethernet switches (CORE) interconnecting the computing blades (SGE cluster, HPC and HTC), the storage (Lustre + StoRM), the support services blades and the network. 1st phase: ~1250 CPU cores, ~620 TB raw]




Middleware
• What do we want:
– Interoperability with other organizations
– Long term support
– Reliability
– Low cost
• Choice:
– Long term: may depend on decisions taken at European level in EGI (UMD)
– Short term: use gLite
– Medium term: consider other user needs
• gLite:
– Possibly the most used middleware in European and other grid infrastructures
– Already being used by the Portuguese resource centres in EGEE, Int.Eu.Grid and EELA
– gLite developers participate actively in the international standardization bodies
• But ...
– Difficult to deploy and maintain, some reliability issues, too HEP-centric
– We will integrate additional components when needed
– MPI support with Int.Eu.Grid middleware extensions
– Cloud computing can be a good complementary technology



European Grid Initiative (EGI)
• EGI-DS
– European Grid Initiative planning
– Portugal in policy board (UMIC)
• EGI
– Portugal is a member through UMIC
– First national fee paid
• EGI InSPIRE
– Integrated Sustainable Pan-European Infrastructure for Researchers in Europe
– Under EU negotiations
– Main project for infrastructure coordination and operation
• EGI InSPIRE international tasks
– International bid for global tasks
– Portugal and Spain in the middleware rollout coordination




The EGI-InSPIRE Project

Integrated Sustainable Pan-European Infrastructure for Researchers in Europe

• A 4 year project with €25M EC contribution
– Project cost €69M
– Total Effort ~€330M
– Staff ~170 FTE
• Project Partners (48): EGI.eu, 37 NGIs, 2 EIROs, 8 AP

[Figure legend: Funded, Un-Funded]

EGI-InSPIRE - EGEE UF5




The European Grid Initiative

[Figure: diagram with the labels Users, VOs, Virtual Research Community, User Community Board, ESFRI Project, NGI, NGI Helpdesk, EGI.eu, EGI Helpdesk, VRC Helpdesk, Other Helpdesk, Training Events, Trainers, Apps.DB]



IBERGRID

• Is a common Portuguese/Spanish Iberian infrastructure
• IBERGRID will provide an umbrella for an Iberian regional grid
– Integrating Portuguese and Spanish NGI resources
– Fully interoperable with EGI
• Focus is now on the IBERGRID development as a requirement for a successful common participation in EGI
– Towards a sustainable model but without losing synergies and advantages
• Current status
– Main grid core services have been deployed in both countries
– The initial set of virtual organizations has been created
– Several sites are already configured to support IBERGRID VOs
– Testing of this pilot infrastructure is ongoing



IBERGRID and EGI


Portuguese grid initiative Spanish grid initiative

IBERGRID = grid computing, HPC, applications, networks, volunteer computing



Iberian transition plan
• Portugal
– IBERGRID common VOs management and coordination
– Operations portal
– Catalogues and services for the IBERGRID VOs
– Certification Authority for Portugal (LIPCA)
• Spain
– Helpdesk (Request Tracker)
– Monitoring and accounting
– Infrastructure database (GOCDB/HGSM)
– Certification Authority (PKIrisGrid)
– Middleware security
• Common
– Core services and redundancy
– Regional information system
– Support groups
– Operations coordination
– Training infrastructure
– Infrastructure security
– Seed resources for new users



