
Nikhef/(SARA) tier-1 data center infrastructure

Tier-1 facts
Expanding the Nikhef center

Wim Heubers / Nikhef
Amsterdam NL

HEPiX, May 2008

LCG Tier-1 Amsterdam Science Park

Nikhef - National institute for subatomic physics
  LHC (ATLAS, LHCb, ALICE), astroparticle physics
  data center: 500 m², 800 kW incl. cooling
  grid services (disk storage, clusters)
  internet exchange AMS-IX

SARA - Computing and Networking Services
  colo services, consulting
  data center: 1500 m², 2 MW incl. cooling
  national supercomputer, national cluster, NetherLight, etc.
  grid services (tape and disk storage, clusters)
  internet exchange AMS-IX


LCG Tier-1 Amsterdam Science Park

More infrastructure:

SURFnet - national research network
  provides connectivity to LCG OPN

Big Grid - the Dutch e-science grid
  provides resources for LCG tier-1 and other domains, 2008-2011

Amsterdam Internet Exchange AMS-IX
  major and neutral internet exchange
  six housing locations, including SARA and Nikhef


Nikhef-SARA LCG Tier-1

Nikhef - SARA share …
  campus, building, on-site security, restaurant
  LCG OPN connections
  tier-1 operations (!)

Nikhef - SARA do NOT share …
  power and cooling infrastructure
  sysadmin
  tier-1 resources (grid services, clusters, storage)
  tier-1 operations (!)

SARA does and Nikhef does not …
  provide hierarchical storage (tape, dCache)
  generic grid services

Nikhef does and SARA does not …
  middleware development (VOMS, LCAS, etc.)
  scaling and validation test beds
  tier-3 services


Computing

[Chart: computing capacity in kSPECint2000 (axis 0 to 16000) for the periods 2007/08 to 2010/2011; series: LCG HEP, Other e-sciences]


Disk Storage

[Chart: disk storage in TByte (axis 0 to 7000) for the periods 2007/08 to 2010/2011; series: LCG HEP, Other e-sciences]


Tape Storage

[Chart: tape storage in TByte (axis 0 to 9000) for the periods 2007/08 to 2010/2011; series: LCG HEP, Other e-sciences]


LCG HEP resources

Tier-1 installations:
  Computing: SARA 60%, Nikhef 40%
  Disk storage: SARA 60%, Nikhef 40%
  Tape storage: SARA 100%

Note: ‘Big Grid’ budget until 2011.
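To make the split concrete, here is a minimal Python sketch that divides a total capacity over the two sites using the percentages on this slide. The function name `split_capacity` and the 1000 TByte total are hypothetical and chosen only for illustration; the slides do not give the actual totals in a recoverable form.

```python
# Site shares as stated on the slide; tape is hosted at SARA only.
SHARES = {
    "computing": {"SARA": 0.60, "Nikhef": 0.40},
    "disk": {"SARA": 0.60, "Nikhef": 0.40},
    "tape": {"SARA": 1.00},
}


def split_capacity(resource: str, total: float) -> dict[str, float]:
    """Split a total capacity over the sites according to SHARES."""
    return {site: total * share for site, share in SHARES[resource].items()}


# Hypothetical total of 1000 TByte of disk, for illustration only.
print(split_capacity("disk", 1000.0))
```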


Expanding the Nikhef data center


Data center layout

[Figure: data center layout, with areas labeled ‘amsterdam internet exchange’ and ‘nikhef grid’]


Amsterdam Internet Exchange AMS-IX

neutral and independent
started 15 years ago at Science Park
now: distributed housing at 6 locations in Amsterdam
large exchange: 300 connected parties
Nikhef housing: 200 racks, 100 customers

Nikhef provides:
  UPS power, cooling, security, access
  assistance during office hours


Amsterdam Internet Exchange AMS-IX

zero-down-time


Nikhef - power demands

controlled linear increase

[Chart: power demand in kW (axis 0 to 1800) per year from 2007 to 2015; series: Cooling, Grid, AMS-IX, Nikhef]


Expanding the data center

we need more …
  floor space, power, cooling
  security, fire suppression, alarm procedures
  monitoring of critical infrastructure

but it has to be …
  realized within the existing (institute) building
  without affecting AMS-IX operations (zero-down-time)


What happened …

many discussions with management
  reliable infrastructure is very expensive!

gained experience
  visit commercial data centers
  visit conferences like ‘Datacenter Dynamics’

hired external technical expertise and project management

incident due to overloaded circuit breaker
  monitoring and capacity planning are essential

put effort into temporary measures


Temporary measures (1)

backup generators


Temporary measures (2)

This week: add extra cooling for 50 kW of grid resources, just in time for the May run of CCRC08


Planning …

install new cooling equipment (on the roof)
integrate a 2nd UPS and generator into infrastructure
  remember ‘zero-down-time’
install new fire suppression and climate handling systems
convert library into new data room on the 2nd floor
move grid clusters and storage from 1st to 2nd floor
extend AMS-IX housing on the 1st floor

Make the grid resources visible …

finished April 2009 (I hope)


From library to grid …


Monitoring power to the racks

main power distribution:
  connected to facility control system (alarm -> standby service)
  current (amps) per phase in power distribution units
  power drop per phase on distribution rails

power usage in racks:
  connected to ‘our’ IT control system
  current (amps) and power usage (kWh) per phase in racks
  needed for capacity planning and billing energy costs to users

Note: monitoring the grid clusters and storage is done separately (Ganglia)
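As an illustration of how per-phase current readings could feed capacity planning, the following Python sketch flags phases that run close to their breaker rating. Everything in it is an assumption made for the example: the `PhaseReading` structure, the 16 A breaker rating, the 80% warning threshold and the sample values are hypothetical and do not describe the actual Nikhef control system.

```python
from dataclasses import dataclass


@dataclass
class PhaseReading:
    rack: str      # hypothetical rack label
    phase: str     # "L1", "L2" or "L3"
    amps: float    # measured current on this phase


def overloaded_phases(readings, breaker_amps=16.0, warn_fraction=0.8):
    """Return the readings whose current exceeds warn_fraction of the breaker rating."""
    limit = breaker_amps * warn_fraction
    return [r for r in readings if r.amps > limit]


# Made-up sample readings, for illustration only.
sample = [
    PhaseReading("rack-01", "L1", 9.5),
    PhaseReading("rack-01", "L2", 14.2),
    PhaseReading("rack-02", "L3", 11.0),
]
for r in overloaded_phases(sample):
    print(f"{r.rack} {r.phase}: {r.amps} A is above the warning limit")
```

Warning below the breaker rating rather than at it leaves headroom to move load before a breaker trips, which is the kind of incident mentioned earlier in the talk.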


Power monitoring

Amps and kWh
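Billing energy costs to users from kWh readings comes down to simple arithmetic, sketched below in Python. The function name `energy_cost`, the meter readings and the tariff are hypothetical; the slides do not say how billing is actually implemented.

```python
def energy_cost(kwh_start: float, kwh_end: float, price_per_kwh: float) -> float:
    """Cost of the energy used between two cumulative kWh meter readings."""
    if kwh_end < kwh_start:
        raise ValueError("meter reading went backwards")
    return (kwh_end - kwh_start) * price_per_kwh


# Hypothetical readings and tariff, for illustration only.
print(round(energy_cost(1000.0, 1250.0, 0.10), 2))
```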


Cooling

AMS-IX housing (can’t change too much):
  10 years ago designed for 1.8 kW average per rack
  yes, this is still the average today! [telco equipment]
  but we have annoying ‘hot spots’ on the floor
  too many obstacles under raised floor and above ceiling

grid housing (new floor!)
  maximum 50 racks and 300 kW total power
  raised floor, but limited space above the racks
  proposed solution: cold corridor principle

save energy …
  free cooling, optimize cold air flow
  increase room temperature and cold water temperature (10-16 °C)
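The figures on this slide allow a rough comparison of power density between the two rooms. The Python sketch below assumes the grid room is filled to both stated maxima and that power is spread evenly over the racks; neither assumption comes from the slides.

```python
AMSIX_KW_PER_RACK = 1.8   # design average for AMS-IX housing (from the slide)
GRID_MAX_RACKS = 50       # maximum racks in the new grid room (from the slide)
GRID_MAX_KW = 300.0       # maximum total power in the new grid room (from the slide)


def average_kw_per_rack(total_kw: float, racks: int) -> float:
    """Average power per rack, assuming an even spread over all racks."""
    return total_kw / racks


grid_kw_per_rack = average_kw_per_rack(GRID_MAX_KW, GRID_MAX_RACKS)
ratio = grid_kw_per_rack / AMSIX_KW_PER_RACK
print(f"grid: {grid_kw_per_rack:.1f} kW per rack, {ratio:.1f} times the AMS-IX average")
```

Under these assumptions the grid room averages 6 kW per rack, a bit more than three times the AMS-IX design figure, which fits with the grid room getting its own cooling approach.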


Fire suppression

now: only smoke detection

choice between:
  leave it as it is
  suppression with inert gas (Argon)
  suppression with chemical gas (Novec-1230)

Suggestions?


Extending an existing facility

[Chart: increase in floor space, no-break power and cooling, 0% to 120%, per year from 2007 to 2015]

It is expensive in time and money

You don’t get what you really want

Piping and fitting through concrete floors

zero-down-time: stressful


Remarks and conclusions

from idea to realization: it takes two years

you have to position yourself between IT and infrastructure

sustainability: can a data center be green?
  Cooling
  Grid: how to guarantee an optimal usage of resources?

if you can start all over again: do it!


Questions?