HENP Computing at BNL
Torre Wenaus, STAR Software and Computing Leader
BNL
RHIC & AGS Users Meeting, Asilomar, CA
October 21, 1999
Content
Bruce's talk
ATLAS
Linux
Mock Data Challenges
D0
focus on areas really changing the scale of HENP computing at BNL
Mount’s APOGEE talk
Security
Software ‘attracting good people’
ROOT; PHENIX's online threaded
Objectivity, MySQL
RIKEN computing center
ESnet
Open Science
Historical Perspective
Prior to RHIC, BNL hosted many small to modest scale AGS experiments
With RHIC, BNL moves into the realm of large collider detectors, with a computing task at a scale similar to SLAC, Fermilab, CERN, etc.
This has required a dramatic change in the scale of HENP computing at BNL
RHIC Computing Facility (RCF) established Feb 1997 to supply primary (non-simulation) RHIC computing needs
Successful operations in two 'Mock Data Challenge' production stress tests and in the summer 1999 engineering run
First physics run in early 2000
Presence of RCF was a strong factor in the selection of BNL as the principal US computing site for the CERN LHC ATLAS experiment
Requirements and computing plan similar to RCF
Will operate in close coordination with RCF
LHC and ATLAS operations begin in 2005
This Talk
Will focus on the major growth of HENP computing as a BNL activity brought by these new programs
RHIC computing at BNL
ATLAS computing at BNL
Brief mention of some other programs
Conclusions
Thanks to Bruce Gibbard, RHIC computing facility head, and others (indicated on slides) for materials
RHIC Computing at RCF
Four experiments: PHENIX, STAR, PHOBOS, BRAHMS (4:4:2:1 relative scales of computing task)
Aggregate raw data recording rate of ~60 MBytes/sec
Annual raw data volume ~600 TBytes
NB: size of global WWW content estimated at 7 TBytes
Event reconstruction: 13,000 SPECint95 (450 MHz PC = 18 SPECint95)
Event filtering (data mining) and physics analysis: 7,000 SPECint95
'Mining' interesting data off tape for physics analyses: aggregate access rates of ~200 MBytes/sec
Iterative, interactive analysis of disk-based data by hundreds of users: aggregate access rates of ~1000 MBytes/sec
Software development and distribution: hundreds of developers; many 100k lines of code per experiment; RCF is the primary development and distribution (AFS) site
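A rough back-of-envelope translation of the CPU figures above into commodity hardware, assuming the quoted 18 SPECint95 per 450 MHz PC:

$$
N_{\text{reco}} \approx \frac{13{,}000}{18} \approx 720 \text{ PCs}, \qquad
N_{\text{mining/analysis}} \approx \frac{7{,}000}{18} \approx 390 \text{ PCs}
$$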
Computing Strategies
Extensive use of community/commercial/commodity products, hardware and software; increasing use of open software (e.g. Linux, MySQL database)
Exploit 'embarrassingly parallel' nature of HENP computing: farms of loosely coupled processors (Linux PCs on Ethernet); limited use of Sun machines for I/O intensive analysis (see the sketch at the end of this slide)
Hierarchical storage management (disk + tape robot/shelf) and flexible partitioning of event data based on access characteristics: optimize storage cost and access latencies to interesting data
Extensive use of OO software technologies: adopted by all four RHIC experiments, ATLAS, other BNL HENP software efforts (e.g. D0), and virtually all other forthcoming experiments; primarily C++, some Java
Object I/O: Objectivity, a commercial OO database, and ROOT, a community (CERN) developed tool
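A minimal sketch of the 'embarrassingly parallel' farm strategy referred to above (file names and the node-id/node-count arguments are hypothetical, not RCF's actual batch interface): each Linux farm node is handed an independent slice of the event files and needs no communication with the other nodes.

```cpp
// Minimal sketch of loosely coupled farm processing (illustrative only).
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Stand-in for a full reconstruction or analysis pass over one file.
void processFile(const std::string& file) {
  std::printf("processing %s\n", file.c_str());
}

int main(int argc, char** argv) {
  // In practice the farm/batch management system would supply these.
  const int nodeId = (argc > 1) ? std::atoi(argv[1]) : 0;
  const int nNodes = (argc > 2) ? std::atoi(argv[2]) : 1;

  std::vector<std::string> files;                    // the run's raw-data files
  for (int i = 0; i < 1000; ++i)
    files.push_back("raw/run0001_file" + std::to_string(i) + ".daq");

  // Static round-robin split: events in different files are independent,
  // so each node simply works through its own share.
  for (std::size_t i = nodeId; i < files.size(); i += nNodes)
    processFile(files[i]);
  return 0;
}
```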
Event Data Storage and Management
Major software challenge: event data storage and management
ROOT: HENP community tool (from CERN); used by all RHIC experiments for event data storage
Objectivity: commercial object database; used by PHENIX for conditions database; RCF did the Linux port
Relational databases (MySQL, ORACLE): many cataloguing applications in the experiments and RCF; a MySQL-based solution developed by STAR as a complement to ROOT for the event store, replacing Objectivity (a sketch of this ROOT-file-plus-catalogue pattern follows this slide)
Grand Challenge Architecture: managed access to HPSS-resident data, particularly for data mining; LBNL-led with ANL, BNL participation; deployment at RCF
Particle Physics Data Grid: transparent wide-area data processing; US HENP 'Next Generation Internet' project, primarily LHC-directed; RCF/RHIC will act as an early testbed
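As an illustration of the ROOT-plus-relational-catalogue pattern mentioned above (a minimal sketch with hypothetical file, branch, and struct names, not the actual STAR or PHENIX event model): event data are written to a ROOT file, and a catalogue entry pointing at that file would then be recorded in MySQL.

```cpp
// Minimal sketch: persist per-event data with ROOT, then record the file in a
// relational catalogue.  Names and the flat summary struct are illustrative.
#include "TFile.h"
#include "TTree.h"

struct EventSummary {   // hypothetical flat per-event summary
  int   run;
  int   event;
  int   nTracks;
  float vertexZ;        // primary vertex z position, cm
};

int main() {
  TFile* out  = new TFile("st_physics_demo.root", "RECREATE");
  TTree* tree = new TTree("EventTree", "per-event summaries");

  EventSummary evt;
  tree->Branch("run",     &evt.run,     "run/I");
  tree->Branch("event",   &evt.event,   "event/I");
  tree->Branch("nTracks", &evt.nTracks, "nTracks/I");
  tree->Branch("vertexZ", &evt.vertexZ, "vertexZ/F");

  for (int i = 0; i < 1000; ++i) {   // stand-in for the reconstruction loop
    evt.run = 1; evt.event = i; evt.nTracks = 4000; evt.vertexZ = 0.0f;
    tree->Fill();
  }

  out->Write();   // writes the tree into the ROOT file
  out->Close();   // the file owns the in-memory tree and deletes it here
  delete out;

  // A row such as (file name, run, number of events, HPSS or disk location)
  // would now be inserted into the MySQL file catalogue so that analyses can
  // locate the events without opening every file.
  return 0;
}
```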
ATLAS Computing at BNL
A Toroidal LHC ApparatuS: one of 4 experiments at the LHC, a 14 TeV pp collider
ATLAS computing at CERN estimated to be >10 times that of RHIC
Augmented by regional centers outside CERN; total scale similar to the CERN installation
US ATLAS will have one primary 'Tier 1' regional center, at BNL: ~20% of the CERN facility; ~2x RCF
BNL also manages the US ATLAS construction project; ~20% of full ATLAS detector
Simulation, data mining, physics analysis, and software development will be primary missions of the BNL Tier 1 center
ATLAS: Commonality and Synergy with RHIC
Qualitative requirements and Tier 1 quantitative requirements similar to RCF
Exploit economies of scale in hardware and software
Share technical expertise
Learn from and build on RHIC computing as a 'real world testbed'
Commonality: complete coincidence of supported platforms
Intel/Linux processor farms, Sun/Solaris
Objectivity -- and shared concerns over Objectivity!
HPSS -- and shared concerns over HPSS!
Data mining, Grand Challenge
ROOT as an interim analysis tool
Particle Physics Data Grid
Current Status
RHIC RCF
Hardware for first year physics in place, except for some tape store hardware (5 drives; IBM server upgrades)
Extensive testing and tuning to be done: performance, reliability, robustness
All year 1 requirements satisfied except for disk capacity (later augmentation an option; not critically needed now)
In production use by experiments
Positive review by Technical Advisory Committee just concluded
US ATLAS Tier 1 center
Initial facility in place, usage by US ATLAS ramping up
Operating out of RCF
ATLAS software installed and operating
More hardware on the way; further increases at proposal stage
Dedicated manpower ramping up
Conclusions
RHIC and RCF have brought BNL to the forefront of HENP computing
Computing scale, imminent operation, mainstream approaches and community involvement make RHIC computing an important testbed for today's technologies and a stepping stone to the next generation
Performance to date gives confidence for RHIC operations
Strong software efforts at BNL in the experiments
BNL as host of US ATLAS Tier 1 center will be a leading HENP computing center in the years to come
Leveraging the facilities, expertise and experience of RCF and the RHIC program
Facility installation to be complemented by a software development effort integrated with the local US ATLAS group
Programs well supported by Brookhaven as part of an increased attention to scientific computing at the lab
Lots of potential for involvement!
RIKEN QCDSP Parallel Computer
Special purpose massively parallel machine based on DSPs for quantum field theory calculations
4D mesh with nearest neighbor connections
12,288 nodes, 600 Gflops
Custom designed and built
Collaboration centered at Columbia
RIKEN BNL Research Center
192 motherboards, 64 processors each
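A quick consistency check of the quoted figures:

$$
192 \text{ boards} \times 64 \text{ processors} = 12{,}288 \text{ nodes}, \qquad
\frac{600 \text{ Gflops}}{12{,}288 \text{ nodes}} \approx 50 \text{ Mflops per node}
$$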
CDIC - Center for Data Intensive Computing
Newly established BNL center developing collaborative projects
Close ties to SUNY at Stony Brook
Some of the HENP projects proposed or begun:
RHIC Visualization: newly established collaboration with Stony Brook to develop dynamic 3D visualization tools for RHIC interactions and a 'beam's eye' view
RHIC Computing: proposed collaboration with IBM to use idle PC cycles for RHIC physics simulation (generator level)
Data Mining: new project studying application of 'rough sets' data mining concepts to RHIC event classification and feature extraction
Accelerator Design: proposed parallel simulation of beam dynamics for accelerator design and optimization
Visualization
RHIC Au-Au collision animation
(Quicktime movie available on web)
PHENIX event simulation
ESnet Utilization
Open Software/Open Science Conference
BNL, Oct 2, 1999
Educate scientists on open source projects
Stimulate open source applications in science
Present science applications to open source developers
HENP Computing Challenges
Experiment         | Data       | Compute
E895 (AGS)         | 10 TB/yr   | 600 SPECint95
BaBar (SLAC)       | 400 TB/yr  | 5,000 SPECint95
STAR (RHIC)        | 266 TB/yr  | 10,100 SPECint95
PHENIX (RHIC)      | 700 TB/yr  | 8,500 SPECint95
D0 Run II (FNAL)   | 280 TB/yr  | 4,075 SPECint95
CDF Run II (FNAL)  | 464 TB/yr  | 3,650 SPECint95
ATLAS (LHC)        | 1100 TB/yr | 2,000,000 SPECint95

Experiment         | Countries | Institutes | Collaborators | Time Frame
E895 (AGS)         | 3         | 12         | 49            | 2000
BaBar (SLAC)       | 9         | 85         | 600           | 2010
STAR (RHIC)        | 7         | 34         | 400           | 2010
PHENIX (RHIC)      | 10        | 41         | 400           | 2010
D0 Run II (FNAL)   | 11        | 77         | 500           | 2005
CDF Run II (FNAL)  | 8         | 41         | 490           | 2005
ATLAS (LHC)        | 34        | 144        | 1700          | 2015
Craig Tull, LBNL
STAR at RHIC
RHIC: Relativistic Heavy Ion Collider at Brookhaven National Laboratory
Colliding Au-Au nuclei at 200 GeV/nucleon
Principal objective: discovery and characterization of the Quark Gluon Plasma
Additional spin physics program in polarized p-p
Engineering run 6-8/99; first year physics run 1/00
STAR experiment
One of two large 'HEP-scale' experiments at RHIC, >400 collaborators each (PHENIX is the other)
Heart of the experiment is a Time Projection Chamber (TPC) drift chamber (operational), together with a Si tracker (year 2) and electromagnetic calorimeter (staged over years 1-3)
Hadrons, jets, electrons and photons over large solid angle
The STAR Computing Task
Data recording rate of 20 MB/sec; ~12 MB raw data per event (~1 Hz)
~4000+ tracks/event recorded in tracking detectors (factor of 2 uncertainty in physics generators)
High statistics per event permit event-by-event measurement and correlation of QGP signals such as strangeness enhancement, J/psi attenuation, high-Pt parton energy loss modifications in jets, and global thermodynamic variables (e.g. Pt slope correlated with temperature)
17M Au-Au events (equivalent) recorded in a nominal year
Relatively few but highly complex events requiring large processing power
Wide range of physics studies: ~100 concurrent analyses in ~7 physics working groups
RHIC/STAR Computing Facilities
Dedicated RHIC computing center at BNL, the RHIC Computing Facility
Data archiving and processing for reconstruction and analysis
Three production components: reconstruction (CRS) and analysis (CAS) services and managed data store (MDS)
10,000 (CRS) + 7,500 (CAS) SPECint95 CPU
~50 TB disk, 270 TB robotic tape, 200 MB/s I/O bandwidth, managed by the High Performance Storage System (HPSS) developed by a DOE/commercial consortium (IBM et al.)
Current scale: ~2500 Si95 CPU, 3 TB disk for STAR
Limited resources require the most cost-effective computing possible
Commodity Intel farms (running Linux) for all but I/O intensive analysis (Sun SMPs)
Smaller outside resources: simulation and analysis facilities at outside computing centers; limited physics analysis computing at home institutions
Implementation of RHIC Computing Model: Incorporation of Offsite Facilities
[Diagram: RHIC computing model incorporating offsite facilities -- HPSS tape store, T3E and SP2 systems; sites include Berkeley, Japan, MIT, and many universities]
Doug Olson, LBNL
HENP Computing: Today’s Realities
Very Large Data Volumes
Large, Globally Distributed Collaborations
Long Lived Projects (>15 years)
Large (1-2M LOC), Complex Analyses
Distributed, Heterogeneous Systems
Very Limited Computing Manpower
Most Computing Manpower are not Computing Professionals
Not necessarily a bad thing! Good understanding of, and direct interest in, the problem among developers
Reliance on Open and Commercial Software & Standards
Evolving Computer Industry & Technology
Event Data Storage
Management of petabyte data volumes is arguably the most difficult task in HENP computing today
Solutions must map effectively onto OO software technology
Intensive community effort in object database technology in the last 5 years
Focus on Objectivity, the only commercial product that scales to PBytes
Great early promise; strong potential to minimize in-house development and match well the OO architecture of experiments
Reality has been more difficult: development effort much greater than expected, and mixed results on scalability
In parallel with Objectivity, community solutions have also been developed
Particularly the ROOT system from CERN, supporting I/O of C++ based object models
When complemented by a relational database, it provides a robust and scalable solution that integrates well with experiment software
The jury is still out
STAR and some other experiments have dropped Objectivity in favor of ROOT+RDBMS
BaBar at SLAC is in production with Objectivity, and is working through the problems
Data Management
Coupled to the event data storage problem, but distinct, is the problem of managing effective archiving and retrieval of the data
Hierarchical storage management system required, capable of managing terabytes of disk-resident rapid-access data and petabytes of tape-resident data with medium latency access
Industry offers very few solutions today; one (only) has been identified: HPSS
Deployed at RCF (and many other sites), successfully but with caveats
Demands high manpower levels for development and 24x7 support
Still under development, particularly in HENP applications, with stability and robustness issues
Community HENP solutions under development in this area as well (Fermilab, DESY)
Distributed Computing
In current generation experiments such as RHIC, and to a much greater degree in the next generation such as LHC, distributed computing is essential
Fully empowering physicists not at the experimental site to participate in development and analysis, with effective access to the data
Distributing the computing and data management task among several large sites
The central site can no longer afford to support computing on its own
Near and long term efforts are underway to address the need
e.g. the NOVA project at BNL (Networked Object-based enVironment for Analysis): a small project to address immediate and near term needs (STAR/RHIC, ATLAS, possibly others)
Large, LHC-directed projects such as the Particle Physics Data Grid project and the MONARC regional center modelling project
Computing Requirements
Nominal year processing and data volume requirements:
Raw data volume: 200TB
Reconstruction: 2800 Si95 total CPU, 30 TB DST data
10x event size reduction from raw to reco
1.5 reconstruction passes/event assumed
Analysis: 4000 Si95 total analysis CPU, 15 TB micro-DST data
1-1000 Si95-sec/event per MB of DST depending on the analysis; wide range, from CPU-limited to I/O-limited
~100 active analyses, 5 passes per analysis
Micro-DST volumes from 0.1 to several TB
Simulation: 3300 Si95 total including reconstruction, 24 TB
Total nominal year data volume: 270TB
Total nominal year CPU: 10,000 Si95
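As a sanity check, the component figures above add up to the quoted totals (a simple sum; the slide's totals are rounded):

$$
200 + 30 + 15 + 24 = 269 \text{ TB} \approx 270 \text{ TB}, \qquad
2800 + 4000 + 3300 = 10{,}100 \text{ Si95} \approx 10{,}000 \text{ Si95}
$$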
STAR Computing Facilities: RCF
Data archiving and processing for reconstruction and analysis (not simulation; done offsite)
General user services (email, web browsing, etc.)
Three production components: reconstruction and analysis services (CRS, CAS) and managed data store (MDS)
Nominal year scale:
10,000 (CRS) + 7,500 (CAS) SPECint95 CPU
Intel farms running Linux for almost all processing; limited use of Sun SMPs for I/O intensive analysis
Cost-effective, productive, well-aligned with the HENP community
~50 TB disk, 270 TB robotic tape, 200 MB/s, managed by HPSS
Current scale (when new procurements are in place): ~2500 Si95 CPU, 3 TB disk for STAR; ~8 TB of data currently in HPSS
Computing Facilities
Dedicated RHIC computing center at BNL, the RHIC Computing Facility
Data archiving and processing for reconstruction and analysis
Simulation done offsite
10,000 (reco) + 7,500 (analysis) Si95 CPU
Primarily Linux; some Sun for I/O intensive analysis
~50 TB disk, 270 TB robotic tape, 200 MB/s, managed by HPSS
Current scale (STAR allocation, ~40% of total): ~2500 Si95 CPU, 3 TB disk
Support for (a subset of) physics analysis computing at home institutions
Mock Data Challenges
MDC1: Sep/Oct '98
>200k (2 TB) events simulated offsite; 170k reconstructed at RCF (goal was 100k)
Storage technologies exercised (Objectivity, ROOT)
Data management architecture of the Grand Challenge project demonstrated
Concerns identified: HPSS, AFS, farm management software
MDC2: Feb/Mar '99
New ROOT-based infrastructure in production
AFS improved, HPSS improved but still a concern
Storage technology finalized (ROOT)
New problem area, STAR program size, addressed in new procurements and OS updates (more memory, swap)
Both data challenges: effective demonstration of productive, cooperative, concurrent (in MDC1) production operations among the four experiments
Bottom line verdict: the facility works, and should perform in physics data taking and analysis
Offline Software Environment
Current software base is a mix of Fortran (55%) and C++ (45%), from ~80%/20% (~95%/5% in non-infrastructure code) in 9/98
New development, and all post-reco analysis, in C++
Framework built over ROOT adopted 11/98; origins in the 'Makers' of ATLFAST (see the sketch at the end of this slide)
Supports legacy Fortran codes and table (IDL) based data structures developed in the previous StAF framework without change
Deployed in offline production and analysis in our 'Mock Data Challenge 2', 2-3/99
Post-reconstruction analysis: C++/OO data model 'StEvent'
StEvent interface is ‘generic C++’; analysis codes are unconstrained by ROOT and need not (but may) use it
Next step: migrate the OO data model upstream to reco
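For illustration, a sketch of the 'Maker' pattern referred to above (class and method names are hypothetical, not the actual STAR framework API): each reconstruction or analysis module implements a small lifecycle interface and the framework drives a chain of them through the event loop.

```cpp
// Illustrative sketch of a Maker-style module chain (hypothetical names,
// not the real StMaker interface).
#include <cstdio>
#include <memory>
#include <vector>

class Maker {                            // framework-facing interface
public:
  virtual ~Maker() = default;
  virtual int Init()   { return 0; }     // once, before the event loop
  virtual int Make()   = 0;              // once per event
  virtual int Finish() { return 0; }     // once, after the event loop
};

class EventCountMaker : public Maker {   // trivial example module
public:
  int Make() override { ++fEvents; return 0; }
  int Finish() override {
    std::printf("processed %d events\n", fEvents);
    return 0;
  }
private:
  int fEvents = 0;
};

int main() {
  std::vector<std::unique_ptr<Maker>> chain;
  chain.push_back(std::make_unique<EventCountMaker>());

  for (auto& m : chain) m->Init();
  for (int evt = 0; evt < 100; ++evt)    // stand-in event loop
    for (auto& m : chain) m->Make();
  for (auto& m : chain) m->Finish();
  return 0;
}
```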
Initial RHIC DB Technology Choices
A RHIC-wide Event Store Task Force in Fall ‘97 addressed data management alternatives
Requirements formulated by the four experiments
Objectivity and ROOT were the 'contenders' put forward
STAR and PHENIX selected Objectivity as the basis for data management
Concluded that only Objectivity met the requirements of their event stores
ROOT selected by the smaller experiments and seen by all as an analysis tool with great potential
Issue for the two larger experiments: where to draw the dividing line between Objectivity and ROOT in the data model and data processing
Event Store Requirements -- And Fall ‘97 View
Requirement                          | Objectivity | ROOT
Good C++ API                         | OK          | OK
Scalability to RHIC data volumes     | OK          | No file mgmt
Adequate I/O throughput              | OK          | OK
HPSS compatibility                   | Planned     | No
Integrity, availability of data      | OK          | No file mgmt
Recovery from permanently lost data  | OK          | No file mgmt
Object versioning, schema evolution  | OK          | Crude
Long term availability               | OK?         | OK?
Access control                       | OS          | OS
Administration tools                 | OK          | No
Backup, recovery of subsets of data  | OK          | No file mgmt
WAN distribution of data             | OK          | No file mgmt
Data locality control                | OK          | OS
Linux support                        | No          | OK
Requirements: STAR 8/99 View (My Version)
Requirement              | Obj 97  | Obj 99   | ROOT 97      | ROOT 99
C++ API                  | OK      | OK       | OK           | OK
Scalability              | OK      | ?        | No file mgmt | MySQL
Aggregate I/O            | OK      | ?        | OK           | OK
HPSS                     | Planned | OK?      | No           | OK
Integrity, availability  | OK      | OK       | No file mgmt | MySQL
Recovery from lost data  | OK      | OK       | No file mgmt | OK, MySQL
Versions, schema evolve  | OK      | Your job | Crude        | Almost OK
Long term availability   | OK?     | ???      | OK?          | OK
Access control           | OS      | Your job | OS           | OS, MySQL
Admin tools              | OK      | Basic    | No           | MySQL
Recovery of subsets      | OK      | OK       | No file mgmt | OK, MySQL
WAN distribution         | OK      | Hard     | No file mgmt | MySQL
Data locality control    | OK      | OK       | OS           | OS, MySQL
Linux                    | No      | OK       | OK           | OK
RHIC Data Management: Factors For Evaluation
My perception of changes in the STAR view from '97 to now is shown
Factors for evaluation (Objectivity vs. ROOT+MySQL):
Cost
Performance and capability as data access solution
Quality of technical support
Ease of use, quality of doc
Ease of integration with analysis
Ease of maintenance, risk
Commonality among experiments
Extent, leverage of outside usage
Affordable/manageable outside RCF
Quality of data distribution mechanisms
Integrity of replica copies
Availability of browser tools
Flexibility in controlling permanent storage location
Level of relevant standards compliance, eg. ODMG
Java access
Partitioning DB and resources among groups
Object Database: Storage Hierarchy vs User View
User deals only with ‘object model’ of his own design; storage details are hidden
ATLAS and US ATLAS
One of two large HEP experiments at CERN’s Large Hadron Collider (LHC)
Proton-proton collider; 14 TeV in center of mass; 1 billion events/year
Principal objective: discovery and characterization of physics 'beyond the Standard Model': Higgs, Supersymmetry, …
Startup 2005+
Brookhaven hosts the US Project Office for US contributions to ATLAS: ~$170M, about 20% of the project
Brookhaven recently selected as host lab for US ATLAS Computing and site of the US Regional Center
Extension of the RHIC Computing Facility
US ATLAS Computing projected to grow to ~$15M/yr
Conclusions
HENP is (unfortunately!) still pushing the envelope in the scale of the data processing and management tasks of present and next generation experiments
The HENP community has looked to the commercial and open software worlds for tools and approaches, with strong successes in some areas (OO programming), qualified successes in others (HPSS), and the jury still out on some (object databases)
Moore’s Law and the rise of Linux have made provisioning CPU cycles less of an issue
The community has converged on OO as the principal tool to make software development tractable
But solutions to data storage and management are much less clear
A need on the rise is distributed computing, but internet-driven growth in capacities and technologies will be a strong lever
Developments within the HENP community continue to be important, either as fully capable solutions or interim solutions pending further commercial/open software developments
Conclusions
The circumstances of STAR:
Startup this year
Slow start in addressing event store implementation, C++ migration
Large base of legacy software
Extremely limited manpower and computing resources
drive us to very practical and pragmatic data management choices:
Beg, steal and borrow from the community
Deploy community and industry standard technologies
Isolate implementation choices behind standard interfaces, to revisit and re-optimize in the future
which leverage existing STAR strengths:
Component and standards-based software greatly eases integration of new technologies
preserving compatibility with existing tools for selective and fall-back use
while efficiently migrating legacy software and legacy physicists
After some course corrections, we have a capable data management architecture for startup that scales to STAR's data volumes … but Objectivity is no longer in the picture.