Post on 27-Dec-2015


A. Mohapatra, HEPiX 2013 Ann Arbor 1

UW Madison CMS T2 site report

D. Bradley, T. Sarangi, S. Dasu, A. Mohapatra
HEP Computing Group

Outline

Infrastructure
Resources
Management & Operation
Contributions to CMS
Summary


Evolution

Started out as a Grid3 site

Played a key role in the formation of the Grid Laboratory of Wisconsin (GLOW)

HEP/CS (Condor team) collaboration
• Designed a standalone MC production system, adapted CMS software, and ran it robustly in non-dedicated environments (UW grid & beyond)

Selected as one of the 7 CMS Tier-2 sites in the US

Became a member of WLCG and subsequently OSG

Serving all OSG-supported VOs besides CMS


Infrastructure

3 machine rooms, 16 racks

Power supply – 650 kW

Cooling – chilled-water-based air coolers and POD-based hot aisles


Compute / Storage Resources

Compute (SL6)
• T2 HEP Pool – 4200 cores, dedicated to CMS
• CHTC and GLOW Pool – 3200 cores, opportunistic

Storage (Hadoop)
• Migrated from dCache to Hadoop 3 years ago
• 3 PB distributed across 350 nodes
• Will add 1 PB soon
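As a rough illustration of the figures on this slide, the sketch below adds up the two compute pools and averages the Hadoop capacity over the storage nodes. It assumes decimal units (1 PB = 1000 TB) and an even spread of data across nodes. The slide states neither of these, nor whether the 3 PB figure is raw or usable space. The function names are hypothetical.

```python
# Figures quoted on the slide.
T2_HEP_CORES = 4200      # dedicated to CMS
CHTC_GLOW_CORES = 3200   # opportunistic, not guaranteed
STORAGE_PB = 3
STORAGE_NODES = 350

TB_PER_PB = 1000  # assumption: decimal units


def total_cores(dedicated, opportunistic):
    """Upper bound on cores reachable when all opportunistic slots are free."""
    return dedicated + opportunistic


def avg_tb_per_node(total_pb, nodes):
    """Average capacity per node, assuming an even distribution."""
    return total_pb * TB_PER_PB / nodes


if __name__ == "__main__":
    print("cores (upper bound):", total_cores(T2_HEP_CORES, CHTC_GLOW_CORES))
    print("avg TB per node: %.2f" % avg_tb_per_node(STORAGE_PB, STORAGE_NODES))
```

Under these assumptions the site can reach at most 7400 cores, and each storage node holds about 8.6 TB on average.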


Network Configuration

[Figure: network diagram. Labels: Internet2, NLR, ESnet, Chicago, Purdue, FNAL, Nebraska, UW Campus, T2 LAN, servers, switches. Link speeds shown: 100G (new), 3x10G, 10G, 1G, 4x10G.]

perfSONAR (latency & bandwidth) nodes are used to debug LAN and WAN (+ USCMS matrix) issues


Network Configuration (2)

Strong support from the campus network team (DoIT)

Good rates for CMS data transfers from T1/T2 sites


Software and Services

File systems
• AFS, NFS, CVMFS, ext4

Job batch system
• Condor

OSG software
• Globus, GUMS, glexec

Storage
• Hadoop (HDFS), BeStMan2 (SRM), GridFTP, Xrootd

Cluster management
• Local yum repo, Puppet

Cluster monitoring
• Nagios, Ganglia, and tons of home-grown scripts


Cluster Software Management

Cfengine usage until fall of 2012

Migration to Puppet
• Translation and successful validation of individual Cfengine-based modules


Cluster Health Monitoring

Nagios
• Hardware, disks, etc.

Ganglia
• Services, memory, CPU/disk usage, network, storage

OSG tools
• RSV

CMS specific
• SAM, HammerCloud

Miscellaneous scripts
• Cron'ed
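The site's own monitoring scripts are not shown in the slides. As a minimal sketch of what a cron'ed or Nagios-driven disk check could look like, the example below follows the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The thresholds, function names and message format are assumptions for illustration, not the site's configuration.

```python
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(used_fraction, warn=0.80, crit=0.90):
    """Map a used-space fraction to a status (thresholds are hypothetical)."""
    if used_fraction >= crit:
        return CRITICAL
    if used_fraction >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=0.80, crit=0.90):
    """Return (exit_code, message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, "DISK UNKNOWN - %s: %s" % (path, exc)
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - %s reports zero size" % path
    used = usage.used / usage.total
    status = classify(used, warn, crit)
    return status, "DISK %s - %s %.0f%% used" % (LABELS[status], path, used * 100)


def main(argv):
    """Print one status line and return the exit code for the caller."""
    path = argv[1] if len(argv) > 1 else "/"
    code, message = check_disk(path)
    print(message)
    return code

# To run as a plugin or from cron, end the script with:
#     sys.exit(main(sys.argv))
```

Run from cron, the printed line can be mailed or logged. Run by Nagios, the exit code determines the service state.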


CMS Usage (production/analysis)


Contributions to CMS

• HTCondor and Glidein technology


Any data, Anytime, Anywhere


Experience with Amazon EC2


Summary

Our Tier-2 site has been running well