Nikhef/(SARA) tier-1 data center infrastructure
Tier-1 facts
Expanding the Nikhef center
Wim Heubers / Nikhef, Amsterdam NL
HEPiX May 2008
LCG Tier-1 Amsterdam Science Park
Nikhef - National institute for subatomic physics
  LHC (ATLAS, LHCb, ALICE), astroparticle physics
  data center: 500 m², 800 kW incl. cooling
  grid services (disk storage, clusters)
  internet exchange AMS-IX
SARA - Computing and Networking Services
  colo services, consulting
  data center: 1500 m², 2 MW incl. cooling
  national super, national cluster, netherlight, etc.
  grid services (tape and disk storage, clusters)
  internet exchange AMS-IX
LCG Tier-1 Amsterdam Science Park
More infrastructure:
SURFnet - national research network
  provides connectivity to LCG OPN
Big Grid - the Dutch e-science grid
  provides resources for LCG tier-1 and other domains, 2008-2011
Amsterdam Internet Exchange AMS-IX
  major and neutral internet exchange
  six housing locations including SARA and Nikhef
Nikhef-SARA LCG Tier-1
Nikhef - SARA share …
  campus, building, on-site security, restaurant
  LCG OPN connections
  tier-1 operations (!)
Nikhef - SARA do NOT share …
  power and cooling infrastructure
  sysadmin
  tier-1 resources (grid services, clusters, storage)
  tier-1 operations (!)
SARA does and Nikhef does not …
  provide hierarchical storage (tape, dCache)
  generic grid services
Nikhef does and SARA does not …
  middleware development (VOMS, LCAS, etc.)
  scaling and validation test beds
  tier-3 services
Computing
[Chart: computing capacity in kSPECint2000 (axis 0 to 16000) for 2007/08, 2008/09, 2009/10 and 2010/11; series: LCG HEP, Other e-sciences]
Disk Storage
[Chart: disk storage in TByte (axis 0 to 7000) for 2007/08, 2008/09, 2009/10 and 2010/11; series: LCG HEP, Other e-sciences]
Tape Storage
[Chart: tape storage in TByte (axis 0 to 9000) for 2007/08, 2008/09, 2009/10 and 2010/11; series: LCG HEP, Other e-sciences]
LCG HEP resources
Tier-1 installations:
  Computing:     SARA 60%, Nikhef 40%
  Disk storage:  SARA 60%, Nikhef 40%
  Tape storage:  SARA 100%
Note: ‘Big Grid’ budget until 2011.
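As a rough illustration of the split above, the sketch below divides a total capacity between the two sites. The percentages are the ones listed on this slide; the function name, the dictionary layout and the total of 1000 are placeholders made up for this example and are not figures from the talk.

```python
# Site shares of the LCG HEP tier-1 installations, in percent (from the slide).
SHARES = {
    "computing": {"SARA": 60, "Nikhef": 40},
    "disk": {"SARA": 60, "Nikhef": 40},
    "tape": {"SARA": 100},
}


def split_capacity(resource, total):
    """Return the capacity per site, in the same unit as `total`."""
    shares = SHARES[resource]
    if sum(shares.values()) != 100:
        raise ValueError(f"shares for {resource} do not add up to 100%")
    return {site: total * pct / 100 for site, pct in shares.items()}


if __name__ == "__main__":
    # 1000 is a hypothetical total, chosen only to make the arithmetic visible.
    for resource in SHARES:
        print(resource, split_capacity(resource, 1000))
```

Keeping the shares in one table makes it easy to check that each resource adds up to 100% before any planning numbers are derived from it.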
Expanding the Nikhef data center
Data center layout
[Figure: data center layout showing the Amsterdam Internet Exchange housing and the Nikhef grid area]
Amsterdam Internet Exchange AMS-IX
neutral and independent
started 15 years ago at Science Park
now: distributed housing at 6 locations in Amsterdam
large exchange: 300 connected parties
Nikhef housing: 200 racks, 100 customers
Nikhef provides:
  UPS power, cooling, security, access
  assistance during office hours
Amsterdam Internet Exchange AMS-IX
zero-down-time
Nikhef - power demands
[Chart: power demand in kW (axis 0 to 1800) for 2007 to 2015, showing a controlled linear increase; series: Cooling, Grid, AMS-IX, Nikhef]
Expanding the data center
we need more …
  floor space, power, cooling
  security, fire suppression, alarm procedures
  monitoring of critical infrastructure
but it has to be …
  realized within the existing (institute) building
  without affecting AMS-IX operations (zero-down-time)
What happened …
many discussions with management
  reliable infrastructure is very expensive!
gained experience
  visit commercial data centers
  visit conferences like ‘Datacenter Dynamics’
hired external technical expertise and project management
incident due to overloaded circuit breaker
  monitoring and capacity planning are essential
put effort into temporary measures
Temporary measures (1)
backup generators
Temporary measures (2)
This week: add extra cooling for 50 kW of grid resources, just in time for the May run of CCRC08
Planning …
install new cooling equipment (on the roof)
integrate a 2nd UPS and generator into infrastructure
remember ‘zero-down-time’
install new fire suppression and climate handling systems
convert library into new data room on the 2nd floor
move grid clusters and storage from 1st to 2nd floor
extend AMS-IX housing on the 1st floor
Make the grid resources visible …
finished April 2009 (I hope)
Monitoring power to the racks
main power distribution:
  connected to facility control system (alarm -> standby service)
  current (amps) per phase in power distribution units
  power drop per phase on distribution rails
power usage in racks:
  connected to ‘our’ IT control system
  current (amps) and power usage (kWh) per phase in racks
  needed for capacity planning and billing energy costs to users
Note: monitoring the grid clusters and storage is done separately (Ganglia)
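The per-phase readings described on this slide can be turned into power, energy and cost figures with very little code. The sketch below is only an illustration of that arithmetic, not the monitoring system used at Nikhef: the 230 V phase voltage, the power factor of 1, the reading format, the 16 A limit and the price per kWh are all assumptions introduced here.

```python
from dataclasses import dataclass

PHASE_VOLTAGE = 230.0  # volts, phase to neutral; assumed nominal value
POWER_FACTOR = 1.0     # assumed; real equipment is usually somewhat below 1


@dataclass
class RackReading:
    """One sample from a rack power distribution unit (hypothetical format)."""
    rack: str
    amps: tuple    # current per phase (L1, L2, L3) in amperes
    hours: float   # how long this sample is taken to represent


def power_kw(reading):
    """Estimated power over all phases, in kW."""
    return sum(reading.amps) * PHASE_VOLTAGE * POWER_FACTOR / 1000.0


def energy_kwh(readings):
    """Energy over a series of samples, in kWh."""
    return sum(power_kw(r) * r.hours for r in readings)


def overloaded_phases(reading, limit_amps):
    """Phase numbers (1-based) whose current exceeds the given limit."""
    return [i + 1 for i, a in enumerate(reading.amps) if a > limit_amps]


def energy_cost(readings, price_per_kwh):
    """Cost to bill for the samples, in the currency of price_per_kwh."""
    return energy_kwh(readings) * price_per_kwh


if __name__ == "__main__":
    # Made-up samples for one rack; the limit and price are placeholders.
    samples = [
        RackReading("rack-01", (10.0, 12.0, 9.0), hours=1.0),
        RackReading("rack-01", (11.0, 17.0, 9.5), hours=1.0),
    ]
    for s in samples:
        print(s.rack, round(power_kw(s), 2), "kW, phases over limit:",
              overloaded_phases(s, limit_amps=16.0))
    print("energy:", round(energy_kwh(samples), 2), "kWh")
    print("cost:", round(energy_cost(samples, price_per_kwh=0.10), 2))
```

Checking each phase against a limit is the kind of test that flags an overloaded circuit before a breaker trips, while the kWh total serves both capacity planning and billing.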
Power monitoring
Amps and kWh
Cooling
AMS-IX housing (can’t change too much):
  10 years ago designed for 1.8 kW average per rack
  yes, this is still the average today! [telco equipment]
  but we have annoying ‘hot spots’ on the floor
  too many obstacles under raised floor and above ceiling
grid housing (new floor!)
  maximum 50 racks and 300 kW total power
  raised floor, but limited space above the racks
  proposed solution: cold corridor principle
save energy …
  free cooling, optimize cold air flow
  increase room temperature and cold water temperature (10-16 °C)
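The cooling figures imply quite different power densities for the two rooms. The back-of-envelope sketch below works that out; it is not a calculation from the talk. It combines the 1.8 kW average and the 50 racks / 300 kW limit from this slide with the 200 racks quoted for the Nikhef AMS-IX housing, and assumes that all 200 racks run at the average load.

```python
def total_kw(racks, avg_kw_per_rack):
    """Total IT load in kW for a number of racks at a given average."""
    return racks * avg_kw_per_rack


def average_kw_per_rack(total, racks):
    """Average load per rack in kW when `total` kW is spread over `racks`."""
    if racks <= 0:
        raise ValueError("racks must be positive")
    return total / racks


if __name__ == "__main__":
    amsix_avg = 1.8                               # kW per rack, from the slide
    amsix_total = total_kw(200, amsix_avg)        # assumes all 200 racks in use
    grid_avg = average_kw_per_rack(300, 50)       # grid room at its maximum
    print("AMS-IX housing load:", round(amsix_total, 1), "kW")
    print("grid housing average:", round(grid_avg, 1), "kW per rack")
    print("density ratio grid/AMS-IX:", round(grid_avg / amsix_avg, 2))
```

Under these assumptions the grid room has to remove several times more heat per rack than the telco-style AMS-IX floor, which is consistent with the proposal to use a cold corridor there.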
Fire suppression
now: only smoke detection
choice between:
  leave it as it is
  suppression with inert gas (Argon)
  suppression with chemical gas (Novec-1230)
Suggestions?
Extending an existing facility
[Chart: relative increase (axis 0% to 120%) of floor space, no-break power and cooling, 2007 to 2015]
It is expensive in time and money
You don’t get what you really want
Piping and fitting through concrete floors
zero-down-time: stressful
Remarks and conclusions
from idea to realization: it takes two years
you have to position yourself between IT and infrastructure
sustainability: can a data center be green?
  Cooling
  Grid: how to guarantee an optimal usage of resources?
if you can start all over again: do it!