Jefferson Lab: Experimental and Theoretical Physics Grids
Andy Kowalski
Thomas Jefferson National Accelerator Facility
Operated by the Southeastern Universities Research Association for the U.S. Department of Energy
SURA Cyberinfrastructure Workshop, Georgia State University
January 5–7, 2005
Jefferson Lab
• Who are we?
• Thomas Jefferson National Accelerator Facility
• Department of Energy Research Laboratory
• Southeastern Universities Research Association
• What do we do?
• High Energy Nuclear Physics
• quarks and gluons
• Operate a 6.07 GeV continuous electron beam accelerator
• Free-Electron Laser (10 kW)
Data and Storage
• Three experimental halls
• Hall A and Hall C
• Hundreds of GB/day each
• Hall B – CLAS
• 1–2.5 TB/day (currently up to 30 MB/sec)
• Currently store and manage 1 PB of data on tape.
• Users around the world want access to the data
Computing
• Batch Farm
• 200 dual CPU nodes
• ~358,060 SPECint2000
• Moves 4-7 TB/day
• Reconstruction
• Analysis
• Simulations (CLAS – large)
• Lattice QCD Machine
• 3 clusters
• 128, 256, 384 nodes
Need for Grids
• 12 GeV Upgrade
• Hall B – CLAS data rates increase to 40–60 MB/sec
• Will export 50% or more of the data
• Import data from simulations done at Universities
• This can be a rather large amount
• Hall D – GlueX
• Same scale as the LHC experiments
• 100 MB/sec, about 3 PB of data per year
• 1 PB of raw data at JLab
• 1 PB for analysis (JLab and offsite)
• 1 PB for simulations (offsite)
• Lattice QCD
• 10 TF machine
• A significant amount of data
• Users around the world want access to the data
JLab: Theory and Experimental Grid Efforts
• Similarities
• Focus on Data Grids
• Desire interface definitions for interoperability
• Chose web services for implementation
• WSDL defines the interface
• Theory
• ILDG and PPDG
• SRM
• Replica Catalog
• Experimental
• PPDG and pursuing OSG
• SRM
• Job submission interface
ILDG: Data Grid Services Web Services Architecture
[Diagram: data grid web services architecture. File Clients access multiple sites, each exposing a Meta Data Catalog, Replica Catalog, SRM Service, Replication Service, File Server(s), and Storage (disk, silo); one site also runs a Consistency Agent.]
* Slide from Chip Watson, ILDG Middleware Project Status
ILDG: A Three Tier Web Services Architecture
[Diagram: three-tier architecture. A web browser connects to a web server (portal) running XML-to-HTML servlets and applications; these make authenticated connections to web services, both local and on remote web servers, which front local backend services (batch, file, etc.), storage systems, and catalogs.]
Web services provide a standard API for clients, and intermediary servlets allow use from a browser (as in a portal)
* Slide from Chip Watson, ILDG Middleware Project Status
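To make the middle tier concrete, here is a minimal sketch of the kind of servlet a portal could use, assuming a hypothetical lookupReplicas() helper standing in for the real, authenticated web-service call; the class and method names are illustrative and are not JLab's portal code.

```java
// Hypothetical middle-tier sketch: a servlet that would call a back-end web service
// and render the result as HTML for the browser (the "XML to HTML servlet" tier).
// The lookupReplicas() helper is a placeholder for the real web-service call.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReplicaLookupServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String gfn = req.getParameter("gfn");     // global file name supplied by the browser
        String[] replicas = lookupReplicas(gfn);  // in a real portal: an authenticated SOAP call
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body><h2>Replicas for " + gfn + "</h2><ul>");
        for (String surl : replicas) {
            out.println("<li>" + surl + "</li>");
        }
        out.println("</ul></body></html>");
    }

    // Placeholder standing in for a call to a remote Replica Catalog web service.
    private String[] lookupReplicas(String gfn) {
        return new String[] { "srm://storage.example.org/" + gfn };
    }
}
```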
Components: Meta Data Catalog
Hold metadata for files
• Hold metadata for a set of files (data set)
• Process lookup queries
• Queries return (sets of) GFNs (Global File Name = key) and optionally full metadata for each match (see the sketch below)
* Slide from Chip Watson, ILDG Middleware Project Status
[Diagram: single-site web services: File Client, Meta Data Catalog, Replica Catalog, SRM Service, Replication Service, Storage Resource, and File Server(s) with SRM Listener.]
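As a rough illustration of the operations above, a metadata catalog service interface might look like the Java sketch below; the method names and query format are assumptions, not the ILDG-defined WSDL.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical sketch of a Meta Data Catalog service interface (not the ILDG WSDL).
public interface MetaDataCatalog extends Remote {
    // Look up files or data sets whose metadata matches a query; returns the
    // matching Global File Names (GFN = key into the catalogs).
    String[] query(String metadataQuery) throws RemoteException;

    // Same lookup, but also return the full metadata document for each match.
    String[][] queryWithMetadata(String metadataQuery) throws RemoteException;

    // Fetch the metadata stored for a single GFN or data-set name.
    String getMetadata(String gfn) throws RemoteException;
}
```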
Components: Replica Catalog
Track all copies of a file / data set
• Get replicas
• Create replica
• Remove replica
• Prototypes exist at
• Jefferson Lab
• Fermilab
* Slide from Chip Watson, ILDG Middleware Project Status
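A hypothetical Java view of the replica operations listed above; again, the method names are illustrative and are not the ILDG WSDL.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical sketch of a Replica Catalog service interface (not the ILDG WSDL).
// It maps a Global File Name (GFN) to the storage URLs (SURLs) of its copies.
public interface ReplicaCatalog extends Remote {
    // All known replicas (SURLs) of a file or data set.
    String[] getReplicas(String gfn) throws RemoteException;

    // Register a new copy of an existing GFN at some storage site.
    void createReplica(String gfn, String surl) throws RemoteException;

    // Forget one copy; the data itself is removed through that site's SRM.
    void removeReplica(String gfn, String surl) throws RemoteException;
}
```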
Components: Storage Resource Manager
Manage storage system
• Disk only
• Disk plus tape
• 3rd-party file transfers
• Negotiate protocols for file retrieval (select a file server)
• Auto stage a file on get (asynchronous operation; see the sketch below)
• Version 2.1 defined (by the SRM collaboration)
* Slide from Chip Watson, ILDG Middleware Project Status
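The asynchronous staging behaviour can be pictured with a client-side sketch like the one below; the SrmClient interface and its method names are invented for illustration and do not match the actual SRM operation names.

```java
// Hypothetical sketch of the asynchronous SRM "get" described above. The SrmClient
// interface and its method names are invented for illustration; they are not the
// actual SRM v2 operation names.
public class SrmGetSketch {

    // Stand-in for a WSDL-generated client stub.
    interface SrmClient {
        // Request a file, offering the transfer protocols the client can speak;
        // returns a request token for later polling.
        String prepareToGet(String surl, String[] protocols) throws Exception;

        // Current state of the request ("PENDING" while the file is staged from tape).
        String status(String token) throws Exception;

        // Transfer URL on the file server the SRM selected during protocol negotiation.
        String transferUrl(String token) throws Exception;
    }

    static String fetch(SrmClient srm, String fileSurl) throws Exception {
        String token = srm.prepareToGet(fileSurl, new String[] { "gsiftp", "http" });
        while ("PENDING".equals(srm.status(token))) {
            Thread.sleep(5000);        // staging from the silo happens asynchronously
        }
        return srm.transferUrl(token); // fetch this URL with the negotiated protocol
    }
}
```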
ILDG Components
MetaData Catalog (MDC)
• Each collaboration deploys one
• A mechanism for searching all of them (a virtual MDC) is under discussion but not yet defined
Replica Catalog (RC)
• (same comments)
Storage Resource Manager (SRM)
• Each collaboration deploys one or more
• At each SRM site, there are one or more file servers: http, ftp, gridftp, jparss, bbftp, …
* Slide from Chip Watson, ILDG Middleware Project Status
JLab: Experimental Effort
• PPDG (Particle Physics Data Grid)
• Collaboration of computer scientists and physicists
• Developing and deploying production Grid systems for experiment-specific applications
• Now supporting OSG (Open Science Grid)
• SRM (Storage Resource Manager)
• A common/standard interface to mass storage systems
• In 2003 FSU used SRM v1 to process Monte Carlo data for 30 million events
• In 2004 deployed a v2 implementation for testing
• Required for production in February 2005
• Already working with LBL, Fermi, CERN to define v3
• Job Submission
• PKI-based authentication to Auger (JLab job submission system)
• Investigated uJDL (a user level job description language)
• BNL leading this effort
Envisioned Architecture
[Diagram: a user submission in uJDL enters a Meta-Job Gateway; a Meta-Job Scheduler and Meta-Job Translator (provided by the experiments) produce sJDL, which is passed via Condor-G to a Site Submitter. Site-provided services (Meta-Data Catalog, Replica Catalog, Storage Resource Manager, LSF, and Grid Monitoring) resolve GFNs to SURLs for the job.]
SRM v2
• Implemented SRM version 2.1.1.
• Interface to Jasmine via the HPC Disk/Cache Manager.
• JLab SRM is a Java Web Service.
• Uses Apache Axis as SOAP Engine
• Uses Apache Tomcat as Servlet Engine.
• Uses GridFTP for file movement
• Testing with CMU
• Production service required by February 2005.
• Had a hard time using GT3
• Cannot just take components that one wants (it is all or nothing)
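For flavour, a dynamic-invocation call through Apache Axis (the SOAP engine named above) might look like the sketch below; the endpoint URL, namespace, and operation name are placeholders rather than the deployed JLab service definition.

```java
import java.net.URL;
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

// Hypothetical Axis dynamic-invocation client. The endpoint, namespace, and
// operation name below are placeholders, not the actual JLab SRM deployment.
public class AxisSrmCallSketch {
    public static void main(String[] args) throws Exception {
        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new URL("https://srm.example.org:8443/axis/services/srm"));
        call.setOperationName(new QName("urn:srm", "ping"));  // illustrative operation
        Object result = call.invoke(new Object[] {});          // SOAP request over HTTPS
        System.out.println("service replied: " + result);
    }
}
```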
SRM v2 Server Deployment
• Requires Tomcat, MySQL, SRM worker daemon
• Firewall configuration: SRM port 8443; GridFTP ports 2811 and 40000–45000
• Currently only installed at JLab
• Testing client access with CMU
• Next step: install an SRM server at CMU
SRM v2 Client Deployment
• Installed at JLab and CMU
• Implements only srmGet and srmPut (permission problem to fix)
• Requires specific Ant and Java versions
• Proper grid certificate request and installation can be a challenge
• Use OpenSSL for cert request instead
• Globus requires a full installation simply to request a cert and run the client
• Just need grid-proxy-init
• Note: Curtis' notes are at http://www.curtismeyer.com/grid_notes
• Currently the only SRM v2 server and client
Long-Term SRM Work
• We are considering how the next SRM version could become the primary interface to Jasmine and the primary farm file mover.
• Use for Local and Remote Access
• Goal: 25TB/day from tape through SRM.
• Balancing classes of requests/prioritizing types of data transfers becomes essential.
• Farm interaction use cases must be modeled: farm input, farm output, scheduling.
• We are already looking at what SRM v3 will look like.
• SRM Core Features and Feature Sets (ideas from the last SRM meeting)
Job Submission
• uJDL
• Is this really needed?
• Is a standard job submission interface what is really needed?
• Is that Condor-G?
• Auger interface
• Uses java web services
• Uses PKI for authentication (see the sketch below)
• Not GSI
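A minimal sketch of how a Java client could authenticate with a PKI credential rather than a GSI proxy, assuming the credential is an X.509 client certificate in a Java keystore; the keystore paths and endpoint URL are placeholders, not the actual Auger client configuration.

```java
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical sketch of PKI (client-certificate) authentication to a Java web
// service such as the Auger interface. Paths and the URL are placeholders.
public class PkiClientSketch {
    public static void main(String[] args) throws Exception {
        // Present the user's certificate and key during the TLS handshake.
        System.setProperty("javax.net.ssl.keyStore", "/home/user/client-keystore.jks");
        System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
        // Trust the certificate authority that signed the server's certificate.
        System.setProperty("javax.net.ssl.trustStore", "/home/user/truststore.jks");

        URL endpoint = new URL("https://auger.example.org/services/jobsubmit");
        HttpsURLConnection conn = (HttpsURLConnection) endpoint.openConnection();
        // The server authenticates the caller from the presented certificate (no GSI proxy).
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```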
Grid3dev - OSG
• JLab development effort is limited
• Grid3 proved successful
• ATLAS and CMS were the major users
• JLab plans to join Grid3dev as a step toward OSG-INT/OSG
• We cannot develop everything we need
• VO management tools, monitoring, etc.
• Testing and evaluation
• Integration with facility infrastructure
• Determine what we need and what we can use from others
References
• http://www.ppdg.net
• http://www.opensciencegrid.org
• http://www.lqcd.org/ildg
• http://sdm.lbl.gov/srm
• http://www.globus.org