Computing Strategy - DOE Annual Review. Patricia McBride, Fermilab Computing Division. September 26, 2007


TRANSCRIPT

Page 1: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Computing Strategy

DOE Annual Review

Patricia McBride

Fermilab Computing Division

September 26, 2007

Page 2: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

DOE Annual Program Review

• Computing is a fundamental part of the laboratory infrastructure and the scientific program.

• Three main components:
  – Core Computing
  – Computing in support of the Scientific Program
  – Computing R&D

• Strategy:
  – Support the scientific program with state-of-the-art computing facilities
  – Maximize connections between the experiments and computer facilities
  – Research and development aimed at scientific discovery and the facilities of tomorrow

Page 3: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Computing in Support of the Scientific Program

• Broad scientific program with diverse needs:
  – Run II, CMS (and the LHC), Astrophysics, Neutrino program, Theory, Accelerator
  – Future program: ILC, Project X, NOvA, SNAP

• Challenges:
  – Experiments require more and more computing resources
  – Experiments and users are spread over the globe

• Look for common solutions:
  – Grid, so resources can be shared
  – Common storage solutions (expertise from Run II)

• Connectivity between global computing resources is increasingly important.
  – We need to move data seamlessly throughout widely distributed computing systems.

• Accomplishing this takes experience, expertise, and R&D.
  – Fermilab has become a leader in HEP computing facilities through Run II experience and continues this tradition with the CMS Tier-1 center.

Page 4: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Facilities - Current Status

• The Computing Division operates computing hardware and provides and manages the needed computer infrastructure, i.e., space, power and cooling.
• Computer rooms are located in 3 different buildings (FCC, LCC and GCC).
• Mainly 4 types of hardware:
  – Computing boxes (multi-CPU, multi-core)
  – Disk servers
  – Tape robots with tape drives
  – Network equipment

computing boxes   6,300 (Apr 07)
disk              > 1 PetaByte
tapes             5.5 PetaBytes in 39,250 tapes (62,000 slots available)
tape robots       9
power             4.5 MegaWatts
cooling           4.5 MegaWatts
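For rough scale, a back-of-the-envelope estimate not stated on the slide: dividing the 4.5 MW by the number of computing boxes gives only an upper bound per box, since the same power also feeds disk servers, tape robots and networking.

```python
# Rough upper bound on the average electrical draw per computing box.
# The 4.5 MW facility figure also covers disk, tape and network gear,
# so the true per-box average is lower than this.
total_power_watts = 4.5e6
computing_boxes = 6300
print(f"< {total_power_watts / computing_boxes:.0f} W per box")  # ~714 W
```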

Page 5: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

[Chart: Year 2002 Total Computer Power Projections vs. Capacity & Actual Usage, in KVA (scale 0-4000). Series: Built Capacity, Next Build, Experiments Prediction, Budget Prediction, Power Trend Prediction, Actual Power Usage. Labels: FCC, LCC 108, GCC CRA, GCC CRB, GCC CRC, LCC 107, anticipated FY08.]

Planning for future facilities

• The scientific program will continue to demand increasing amounts of computing power and storage.
  – More computers are purchased (~1,000/yr)
  – Power required per new computer increases
  – More computers per sq. ft. of floor space
• Advances in computing may not be as predictable in the future.
  – Multi-core processors of the future may not be so easy to exploit for our applications.
  – R&D in these areas may be needed.
• R&D will be needed to learn to cope with increasing demands in the next decade.
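One commonly discussed way to exploit multiple cores for HEP workloads is event-level parallelism, since events are processed independently. The sketch below only illustrates that idea with Python's standard multiprocessing module; it is not a description of any Fermilab application, and process_event and the toy event data are hypothetical placeholders.

```python
import multiprocessing as mp

def process_event(event):
    # Placeholder for a per-event reconstruction/analysis step;
    # HEP applications of this era were largely single-threaded.
    return sum(hit * hit for hit in event)

def run_parallel(events, n_workers):
    # Event-level parallelism: farm independent events out to one
    # worker process per core, rather than threading within an event.
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(process_event, events)

if __name__ == "__main__":
    fake_events = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(10000)]
    results = run_parallel(fake_events, n_workers=4)
    print(len(results), "events processed")
```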

FCC KVA Reduction History

Date         Exp        KVA   Comments
Jan '03      CDF         -6   50 500 MHz
Jan '03      D0          -6   50 500 MHz
Sept '03     GP Farms    -6   50 nodes
Nov '03      CDF         -5   5 SCGI2000
Nov '03      D0          -8   50 Recon PCs
Jan '04      CDF         -4   23 Recon PCs
Apr '04      CMS         -7   42 nodes, disk
Apr/May '04  SDSS        -7   Remove/replace
May '04      CDF        -35   160 nodes - New Muon
June '04     D0          -8   D0mino
June '04     D0         -52   Recon farms, 15 racks
Sept '04     CDF        -17   80 farm nodes to BTeV
Dec '04      GP Farms    -6   40 nodes
Jan '05      D0          -8   D0mino
Jan '05      CDF         -5   FCDFSGI2
March '05    CMS        -44   176 nodes
June '05     D0         -32   128 nodes from FCC2
July '05     CMS        -16   64 nodes from FCC
April '06    D0         -24   96 D0 CAB from FCC2 to GCC
April '06    D0         -22   89 D0 Recon from FCC2 to GCC
Sept '06     D0         -30   160 D0 CAB compute nodes retired
Oct '06      CDF        -60   256 CAF compute nodes retired
May '07      D0          -3   2 SCSI racks on FCC1E
Sept '07     CDF        -35   80 CAF II file servers from FCC1E

Total                  -446   By experiment: CDF -213, D0 -147, CMS -67, SDSS -7, Other -12

Page 6: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Facilities - Future

• To improve provisioning of computing power, electricity and cooling, the following new developments are under discussion:
  – Water-cooled racks (instead of air-cooled racks)
  – Blade server designs
    • Vertical arrangement of server units in a rack
    • Common power supply instead of individual power supplies per unit
    • Higher density, lower power consumption
  – Multi-core processors enabled by smaller chip manufacturing processes
    • Same computing power at reduced power consumption
    • Major topic of discussion at CHEP in Sept. '07

Page 7: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Strategies to support Run II Computing

• Computing Division FTEs dedicated to Run II computing will be held constant through FY08 and increase by 2 FTEs in FY09.
  – 2 Application Physicists were hired in FY07 explicitly for Computing Operations
  – Scientist leadership expected to remain in place through FY09
  – Continuing Guest Scientist and visitor positions
• Increased use of shared scientific services (common across Run II, CMS, MINOS, others).
  – This has been a fairly successful strategy for the past 3+ years.
• Use of Grid resources for computing.
  – All computational resources at Fermilab are part of FermiGrid and are thus available for sharing in times of low use by the primary customer.
  – Being Grid-enabled allows for opportunistic use of computing resources world-wide and will facilitate large Run II special processing efforts.
• Experiments and the Computing Division (CD) are undertaking campaigns together to improve automation and operational efficiency.

Page 8: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

FermiGrid: a strategy for success

• A set of common services for the Fermilab site:
  – The site Globus gateway.
  – The site Virtual Organization support services (VOMRS and VOMS).
  – The site Grid User Mapping Service (GUMS), which routinely handles well in excess of 500K transactions/day.
  – The Site AuthoriZation Service (SAZ), which allows us to implement a consistent and enforceable site security policy (whitelist and blacklist).
• The public interface between the Open Science Grid and Fermilab computing resources:
  – Also provides interoperation with EGEE and LCG.
  – Approximately 10% of our total computing resources have been used opportunistically by members of various OSG Virtual Organizations.
• A consistent interface to much of our disparate collection of compute and storage resources:
  – CDF, D0 and GPFARM clusters and worker nodes.
  – SRM, dCache, STKen.

[Chart: Unique VO Count by Organization per month, Jan-06 through Apr-07 (scale 0-40). Series: EGEE, Other, LHC, Fermilab User VOs, Pragma, ILC, OSG.]

[Chart: CPU Time (Hours) per month, Jan-06 through Apr-07 (scale 0-35,000,000). Series: Other, Fermilab User VOs, EGEE, LHC, Pragma, OSG, ILC.]
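To illustrate the roles of SAZ and GUMS described above, here is a toy sketch of the two decisions they make for an incoming grid credential. The DN strings, VO names and account names are hypothetical, and this is not the actual GUMS/SAZ interface, only the shape of the policy decision.

```python
# Toy illustration (not the real GUMS/SAZ interfaces) of the two
# decisions these services make for an incoming grid job:
#   SAZ  - is this certificate/VO allowed on site at all?
#   GUMS - which local UNIX account should the job be mapped to?

SITE_BLACKLIST = {"/DC=org/DC=example/CN=banned user"}   # hypothetical DNs
VO_TO_ACCOUNT = {                                         # hypothetical mapping
    "cms":      "cmsprod",
    "dzero":    "d0grid",
    "cdf":      "cdfgrid",
    "fermilab": "fnalgrid",
}

def authorize(dn: str, vo: str) -> bool:
    """SAZ-style site authorization: enforce the white/blacklist policy."""
    if dn in SITE_BLACKLIST:
        return False
    return vo in VO_TO_ACCOUNT          # whitelist: only known VOs

def map_to_account(dn: str, vo: str) -> str:
    """GUMS-style mapping of a grid identity to a local account."""
    if not authorize(dn, vo):
        raise PermissionError(f"{dn} ({vo}) denied by site policy")
    return VO_TO_ACCOUNT[vo]

if __name__ == "__main__":
    print(map_to_account("/DC=org/DC=example/CN=some physicist", "dzero"))
```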

Page 9: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Data Handling: Tape systems

• Fermilab CD has extensive operational experience with tape systems.
• Operational success is achieved through frequent interactions with stakeholders.
• The integrity of archived data is assured through checksums, reading of unaccessed tapes, migration to newer media, and other measures.
• Tape systems are a mature technology. Improvements will come from increases in tape capacity and robot technology.
• Data handling for experiments:
  – Storage: "active library-style" archiving on tapes in tape robots
  – Access: a disk-based system (dCache) to cache sequential/random access patterns to archived data samples
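As a minimal sketch of the checksum step mentioned above: the slide does not specify an algorithm, so Adler-32 via Python's zlib is assumed here purely for illustration, and the catalogued value would come from the storage system's metadata.

```python
import zlib

def adler32_of_file(path, chunk_size=1 << 20):
    """Compute an Adler-32 checksum over a file, reading it in 1 MB chunks."""
    value = 1                      # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF

def verify(path, expected_checksum):
    """Compare a freshly computed checksum against the catalogued value."""
    return adler32_of_file(path) == expected_checksum
```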

Page 10: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Central Storage Management

Central storage aims to reduce the cost of user storage.
• Fileserver consolidation
  – BlueArc NAS filers
• Tiered storage
  – HDS, 3PAR, Nexsan
• Thin provisioning
• Dual SAN fabric
  – 272 ports
• 32 storage arrays
  – ~800 TB raw total

[Chart: NAS Storage Growth, Year 1]

Page 11: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Networking for FNAL

Strategy for networking:
• Deploy additional capacity at incremental cost through an over-provisioned fiber plant on-site and leased dark fiber off-site.
• Build and share expertise in high-performance data movement from application to application.

Data transfers in preparation for CMS data operations:
• In the last two years, outbound traffic from Fermilab has grown from 94.3 TB/month (July 2005) to 2.15 PB/month (July 2007). The CMS exercise "CSA06" accounts for the bump in June/July 2006.

[Chart: outbound traffic from Fermilab by month; full scale = 2.5 PB/month]
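For scale, a back-of-the-envelope conversion (not a figure from the slide, assuming decimal petabytes and a 30-day month) shows that 2.15 PB/month corresponds to roughly 6-7 Gbit/s of sustained outbound traffic.

```python
# Convert the monthly volume to an average sustained rate.
# Assumes decimal units (1 PB = 1e15 bytes) and a 30-day month.
volume_bytes = 2.15e15
seconds_per_month = 30 * 24 * 3600
rate_gbit_s = volume_bytes * 8 / seconds_per_month / 1e9
print(f"~{rate_gbit_s:.1f} Gbit/s sustained")   # roughly 6.6 Gbit/s
```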

Page 12: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Distributed Computing: Networking and data transfers

• Computing on the Grid requires excellent and reliable network connections.
• Network research insights have proved invaluable for real-world CMS data movement performance tuning and problem solving.
• Fermilab network and storage teams provide expertise for CMS data operations.
• Over a 90-day period of mock-production data transfer testing for CMS, 77% of all data delivered from CMS Tier-1 centers was delivered by Fermilab.

[Chart: full scale = 5 PB]

CMS computing tier structure: 20% of resources at the T0 at CERN, 40% at T1s and 40% at T2s. Fermilab is the largest CMS T1.

Page 13: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Open Science Grid

• Fermilab is a major contributor to world-wide Grid computing and a consortium member and leader in the Open Science Grid.
• OSG is a collaboration between physics (high energy, astro, nuclear, gravitational wave-LIGO), computer science, IT facilities and non-physical sciences.
• The OSG Project is funded cross-agency (DOE SciDAC-2 and NSF) for 5 years (from 10/06) at 33 FTE of effort.
  – Fermilab staff members have project leadership roles:
    • Ruth Pordes - Executive Director
    • Don Petravick - Security Officer
    • Chris Green - co-leader of the Users Group
    • Ted Hesselroth - Storage Software Coordinator
• OSG currently provides access to 77 compute clusters (~30,000 cores) and 15 disk caches (~4 PetaBytes), as well as mass storage at BNL, LBNL and Fermilab. Throughput is currently 60K application jobs/day and 10K CPU-days/day. The overall success rate on the grid is ~80%.

Page 14: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Run II already benefits from OSG

• D0 used Grid-accessible farms in the US, South America (on OSG) and Europe "opportunistically" for more than half of their full-dataset reprocessing, Feb-May 2007.
• More than 12 lab and university clusters on OSG sites were used; D0 processed 286 million events and transferred 70 TB of data from/to Fermilab.
• CDF has grid-enabled all Monte Carlo production.

[Charts: D0 Throughput on OSG (CPU hours/week) and Event Throughput - D0 Reprocessing, by week 1-23 of 2007 (scale to 160,000). Sites: CIT_CMS_T2, FNAL_DZEROOSG_2, FNAL_FERMIGRID, FNAL_GPFARM, GLOW, GRASE-CCR-U2, MIT_CMS, MWT2_IU, Nebraska, NERSC-PDSF, OSG_LIGO_PSU, OU_OSCER_ATLAS, OU_OSCER_CONDOR, Purdue-RCAC, SPRACE, UCSDT2, UFlorida-IHEPA, UFlorida-PG, USCMS-FNAL-WC1-CE.]

Page 15: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

(Slide from O. Gutsche, CMS)

Fermilab CMS Tier-1 facility

• Fermilab's T1 facility is in the 3rd year of a 4-year procurement period and within budget.
  – The number of CPUs has doubled to ~900 nodes, corresponding to 5.5 MSI2k
  – Disk space increased to 1.7 PB (one of the world's largest dCache installations)
  – Wide area network connection currently 20 Gbit/s

[Charts (August): Export from FNAL, peak of more than 1 GB/s in a day; Import to FNAL, peak of more than 250 MB/s in a day]

Page 16: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

(Slide from O. Gutsche, CMS)

Fermilab T1 facility usage

• Fermilab contributes significantly to overall CMS computing.
  – Major contribution to MC production (own production and archiving of samples produced at US T2s)
  – Major contribution to standard operations (re-reconstruction, skimming, etc.)
• The user analysis contribution goes beyond the T1 facility.
  – Large user analysis activity, not only on the T1 facility
  – The LPC-CAF is used extensively by Fermilab, US and international collaborators for various analysis purposes
• Operation and extension of the facility is manpower intensive.
  – Admin staff continuously maintain the systems
  – Scaling issues frequently arise while increasing the size
  – The 4-year ramp-up plan helps solve scaling problems in a timely manner
  – Strong support will be required in the future for successful operation

[Charts (August): successful analysis jobs (dark green), more than 50,000; successful production jobs (dark green), more than 100,000]

Page 17: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

LHC@FNAL: remote operations

• Remote operations for CMS and the LHC have been an R&D effort for several years.
• Collaborated with CERN to set the scope of remote operations and the development plan.
  – Expect to participate in CMS shifts, DQM, data operations and LHC studies.
• Established a collaboration with the plasma physics community.
  – A joint proposal was submitted to SciDAC (not funded).
• Collaborative tools, including high-quality communication equipment, are an important part of remote operations.
• FNAL development efforts for remote operations: Role-Based Access (RBAC) for the LHC, and the Screen Shot Service (already in use by CDF and CMS global runs).
  – RBAC: 1 CD developer at CERN for 9 months
  – SSS: developer at FNAL (~0.3 FTE)
• Working with the ILC community to develop plans for remote operations capabilities.

Page 18: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

GEANT4 Development @ FNAL

Fermilab joined the Geant4 Collaboration in June 2007.

Contributions to time performance and reliability (M. Fischer, J. Kowalkowski, M. Paterno) and hadronic physics (S. Banerjee, D. Elvira, J. Yarba).

This work has already resulted in:
• a 5% improvement to the G4 code
• validation of low-energy p/p thin-target data to be used in a re-parameterization of the Low Energy Parameterization
• a re-design of the repository of validation results and collection of information

The Fermilab group also has responsibilities in CMS simulation group management, and it is the core of the CMS simulation infrastructure development group.

CMS Simulation at the LHC Physics Center

Page 19: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Computing for the ILC

• Detector simulations (~3 FTE)
  – CD has established detector simulation activities, working with PPD, other labs and universities. This effort will grow and evolve to meet the needs of the detector studies.
  – In addition, we plan to focus on infrastructure and tools.
  – A combination of computer professionals and physicists will be involved in this effort.
  – Tools include simulation package support, algorithm development, a code repository, and other expertise. GEANT4 support from CD will aid in these efforts.
• Accelerator simulations (~2 FTE)
  – CD will provide accelerator simulation tools and accelerator simulations for the ILC as part of the APC.
  – Contributions include Synergia and related tools and work on linac studies.
  – Damping ring studies are ramping down for now.
  – Other efforts will be taken up as required.

Page 20: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Computing for the ILC (continued)

• Large-scale simulation support (will be 1-2 FTE)
  – There is a proposal to provide large-scale facilities for tightly coupled calculations for the ILC and other scientific efforts at the lab. One possible application is the simulation of the ILC RF system for the input coupler and higher-order-mode couplers for the cavities. This facility could provide support for many applications that require tightly coupled parallel computing, including accelerator simulations.
• Test beam (1-2 FTE for computing; 1-2 engineering)
  – CD is planning to provide support for data storage and analysis for ILC test beam studies. The support is similar to that provided for the experimental program, and it is an important service for detector R&D groups without access to significant computing resources.
• Accelerator controls (~5-7 FTE from CD / ~5 from AD)
  – Includes LLRF, test facility controls (high availability, timing), RDR, EDR

Page 21: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Accelerator Simulations: SciDAC-2 and COMPASS

• The COMPASS (Community Petascale Project for Accelerator Science and Simulation) collaboration won a SciDAC-2 award in April 2007.
  – The project is funded by HEP, NP, BES and ASCR at ~$3M/year for 5 years.
• COMPASS is the successor of the AST SciDAC-1 project.
  – Includes more activities and participants.
  – Panagiotis Spentzouris of Fermilab is the PI for COMPASS. (See his presentation for more details.)

Page 22: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Computing in ILC Accelerator R&D

Joint Computing Division / Accelerator Division team

Successful test of ILC Low Level RF control at DESY-FLASH, Sept. 2007:
• 10-channel LLRF controller noise measurements: SFDR = -81.8 dB
• Superconducting RF 8-cavity vector-sum gradient measurements
  – Field regulation: 0.006%
  – Phase regulation: 0.042°
• Noise measured using cavity probe splits while the existing DESY LLRF was controlling the cryomodule. Noise figures include cavity probes, cables, analog and digital electronics.
  – RMS error: amplitude -84 dB, phase 0.042 degrees

Page 23: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Computing for Astrophysics

The computing strategy is to leverage existing infrastructure (tape robots, FermiGrid, etc.) already in place for CDF/D0 and CMS.

Fermilab is hosting a public archive for SDSS, plus two copies as backups in storage.

Resource planning for Astrophysics experiments (5 years):

Expt.   Duration     TB/yr   Nodes
SDSS    1 yr             3     100
DES     Start 2011      50     100
CDMS    2+ yrs           7     100
Auger   5 yrs            3      16
Total                   63     300

Page 24: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

SNAP Instrument Electronics R&D

• Reading a billion pixels in space drives computing R&D.
• Fermilab (CD) has contributed to the definition of the partitioned DAQ architecture for SNAP.
• Designed a firmware development board and a test stand to test communications, instrument control, data compression and data communications capabilities.

[Photos: FNAL flash memory test system for SNAP - Flash Memory Test Board (flash memory under test, SDRAM, FPGA (Actel A3P1000)) and DAQ Slice Firmware Development Board]

Page 25: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Lattice QCD facility

Fermilab is a member of the SciDAC-2 Computational Infrastructure for LQCD Project.

The facility is distributed across 3 labs:
• Custom-built QCDOC at BNL
• Specially configured clusters at JLab and Fermilab
• At Fermilab:
  – "QCD" (2004): 128 processors coupled with a Myrinet 2000 network, sustaining 150 GFlop/s
  – "Pion" (2005): 520 processors coupled with an Infiniband fabric, sustaining 850 GFlop/s
  – "Kaon" (2006): 2400 processor cores coupled with an Infiniband fabric, sustaining 2.56 TFlop/s

Many scientific papers have been produced using this facility.
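A quick back-of-the-envelope view of the sustained performance per processor or core implied by the numbers above (computed from the slide's figures, not quoted from it):

```python
# Sustained GFlop/s per processor/core for the three Fermilab LQCD
# clusters listed above, using the figures quoted on the slide.
clusters = {
    "QCD (2004)":  (150.0,  128),    # GFlop/s sustained, processors
    "Pion (2005)": (850.0,  520),    # GFlop/s sustained, processors
    "Kaon (2006)": (2560.0, 2400),   # GFlop/s sustained, cores
}
for name, (gflops, units) in clusters.items():
    print(f"{name}: {gflops / units:.2f} GFlop/s per processor/core")
```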

Page 26: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Lattice QCD - plans

• As part of the DOE 4-year USQCD project, Fermilab is scheduled to build:
  – a 4.2 TFlop/s system in late 2008
  – a 3.0 TFlop/s system in late 2009
• Software projects:
  – new and improved libraries for LQCD computations
  – multicore optimizations
  – automated workflows
  – reliability and fault tolerance
  – visualizations

[Chart: TOP 500 Supercomputer list, with Kaon indicated]

Page 27: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Cosmological Computing - Plans

CD currently maintains a small 8-core cluster for cosmology. Other groups have many more resources:
• Virgo Consortium: 670-CPU SparcIII, 816-CPU Power4
• ITC, Harvard: 316-CPU Opteron, 264-CPU Athlon
• LANL: 294-CPU Pentium 4 (just for cosmology)
• CITA: 270-CPU Xeon cluster
• SLAC: 72-CPU SGI Altix, 128-CPU Xeon cluster
• Princeton: 188-CPU Xeon cluster
• UWash: 64-CPU Pentium cluster

Using an FRA grant and contributions from KICP (UC) and Fermilab, CD will host and maintain a 560-core cluster for cosmology by December.

Gnedin (FNAL/PPD) developed the Adaptive Refinement Tree (ART) code:
• an implementation of the Adaptive Mesh Refinement method
• refines on a cell-by-cell basis
• fully 4D adaptive
• includes dark matter dynamics, gas dynamics and chemistry, radiative transfer, star formation, etc.

The new cluster is crucial to extract fundamental physics from astrophysical observations (SDSS, DES, SNAP) and complements/enhances the experimental program.

Page 28: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Cosmological Computing Cluster

• Modern state-of-the-art cosmological simulations require even more inter-communication between processes than Lattice QCD, and:
  – ≥ 100,000 CPU-hours (~130 CPU-months); the biggest ones take > 1,000,000 CPU-hours
  – computational platforms with wide (multi-CPU), large-memory nodes
• New cluster plans are based on the "Computational Cosmology Initiative: Task Force Report".
  – CD is participating in a Task Force to unify cosmological computing on a national scale.
• Equipment for the cluster:
  – AMD Barcelona, 552 cores
• Cosmological calculations involve substantial amounts of data:
  – The full system will involve the FNAL Enstore Mass Storage System.

Page 29: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Summary and Conclusions

• Computing is an integral part of the Fermilab scientific program and must adapt to new demands and an ever-changing environment.
• Fermilab is a leader in Grid computing, storage and networking for HEP applications and provides facilities for the Fermilab experiments, the astrophysics program and CMS.
• R&D efforts in networking, storage and other core systems have been key to smooth operations of the Fermilab computing facilities.
• The Fermilab Tier-1 facility is a leader in CMS, and the computing and scientific staff associated with the center provide expertise and leadership to the collaboration.
• We have begun a program to address computing issues for the ILC.
• Advanced scientific computing R&D is a vital part of the computing strategy at the lab: accelerator simulations, Lattice QCD, cosmological computing. These applications need state-of-the-art facilities to be competitive.

Page 30: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division


Backup slides

Page 31: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

FNAL Computer Security

• The challenge in computer security is to maintain a balance of security and openness in support of open science.
• The risk-based program follows NIST standards.
• An array of scanners and detectors with a central database (NIMI):
  – Tracks every system connected to the FNAL network
  – Identifies the sysadmin of every system
  – Scans continuously and periodically for services and vulnerabilities
  – Detects network anomalies
  – Notifies and blocks non-compliant systems
• Central laboratory-wide authentication system:
  – Kerberos- and Windows-based
  – Kerberos-derived X.509 certificates

The thing standing between us and millions of attacks a day is the computer security team…

[Chart: 2007 Darknet Attacks, scale 0-6,000,000]

Page 32: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Mass Storage & Data Movement

• Enstore is used as the tape backend for storage of scientific data.
  – Presents a file-system view of tape storage
  – Routinely moves >30 TB per day in and out of Enstore
• dCache is used as a high-performance disk cache for transient data.
  – May be used with or without Enstore
  – Provides Grid interfaces
  – Supports many replicas for performance or reliability
  – Built from commodity disk arrays (SATABeast)
• Both are joint projects involving High-Energy Physics and Grid collaborators.

Page 33: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

(Slide from Ruth Pordes, OSG)

LHC and OSG

• The OSG facility provides access to all US LHC resources (Tier-1s, 12 Tier-2 sites and >10 Tier-3s), contributing more than the agreed-upon data and job throughput for experiment data challenges and event simulations in 2006 and 2007.
• OSG provides common shared grid services: operations, security, monitoring, accounting, testing, etc. It partners with the European grid infrastructure (EGEE), the Nordic DataGrid Facility (NDGF) and other national infrastructures to make up the Worldwide LHC Computing Grid.
• The OSG Virtual Data Toolkit (VDT) provides common grid middleware and support for both OSG and EGEE. Fermilab CD and the US CMS and US ATLAS software and computing organizations develop common and reusable software through joint projects facilitated by OSG.

Page 34: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division

Scientific Linux at Fermilab

• Scientific Linux was born to enable HEP computer centers to continue to use open-source Linux distributions with support and security patches.
• Scientific Linux (SL) is a joint project between Fermilab, CERN and other contributors which provides an open-source distribution of Linux for the scientific (primarily High-Energy Physics) community.
• Scientific Linux Fermi (SLF) provides Fermilab-specific customizations.
• SL is installed at EGEE and OSG sites for LHC computing.
• SL and SLF are community-supported (primarily via mailing lists).
• SLF provides infrastructure for patching, inventory and configuration management.
• Some applications (primarily Oracle) require commercially supported Red Hat Linux.
• See: https://www.scientificlinux.org/

Page 35: Computing Strategy DOE Annual Review Patricia McBride Fermilab Computing Division


Credits

• Many thanks to the many people from CD who contributed to this presentation.

• Apologies for all the work I could not show.