TRANSCRIPT

Page 1: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Operated by the Southeastern Universities Research Association for the U.S. Department of Energy

Thomas Jefferson National Accelerator Facility

SURA Cyberinfrastructure Workshop, Georgia State University

January 5–7, 2005

Jefferson Lab: Experimental and Theoretical Physics Grids

Andy Kowalski

Page 2: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Jefferson Lab

• Who are we?

• Thomas Jefferson National Accelerator Facility

• Department of Energy Research Laboratory

• Southeastern Universities Research Association

• What do we do?

• High Energy Nuclear Physics

• quarks and gluons

• Operate a 6.07 GeV continuous electron beam accelerator

• Free-Electron Laser (10 kW)

Page 3: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Jefferson Lab

Page 4: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Data and Storage

• Three experimental halls

• Hall A and Hall C

• 100s of GB/day each

• Hall B – CLAS

• 1–2.5 TB/day (currently up to 30 MB/s)

• Currently store and manage 1 PB of data on tape.

• Users around the world want access to the data

Page 5: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Computing

• Batch Farm

• 200 dual CPU nodes

• ~358,060 SPECint2000

• Moves 4-7 TB/day

• Reconstruction

• Analysis

• Simulations (CLAS – large)

• Lattice QCD Machine

• 3 clusters

• 128, 256, 384 nodes

Page 6: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Need for Grids

• 12 GeV Upgrade

• Hall B – CLAS data rates increase to 40–60 MB/s

• Will export 50% or more of the data

• Import data from simulations done at Universities

• This can be a rather large amount

• Hall D – GlueX

• Same scale as the LHC experiments

• 100 MB/s, roughly 3 PB of data per year

• 1 PB of raw data at JLab

• 1 PB for analysis (JLab and offsite)

• 1 PB for simulations (offsite)

• Lattice QCD

• 10 TF machine

• A significant amount of data

• Users around the world want access to the data

Page 7: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

JLab: Theory and Experimental Grid Efforts

• Similarities

• Focus on Data Grids

• Desire interface definitions for interoperability

• Chose web services for implementation

• WSDL defines the interface (see the sketch at the end of this page)

• Theory

• ILDG and PPDG

• SRM

• Replica Catalog

• Experimental

• PPDG and pursuing OSG

• SRM

• Job submission interface
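The interface-first approach above can be sketched in Java. The interface below is hypothetical (not the actual JLab or ILDG service definition); with Axis 1.x, a WSDL contract can be generated from such an interface (Java2WSDL), and other sites generate client bindings from that WSDL.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical service endpoint interface in the JAX-RPC style used by
// Axis 1.x. A WSDL contract can be generated from it (Java2WSDL), and other
// sites can build client bindings from that WSDL, which is what makes the
// interface the point of interoperability.
public interface DataGridCatalog extends Remote {
    // Look up the storage URLs registered for a global file name (GFN).
    String[] lookup(String globalFileName) throws RemoteException;
}
```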

Page 8: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

ILDG: Data Grid Services Web Services Architecture

[Diagram: a File Client uses web services at multiple sites; each site exposes a Meta Data Catalog, Replica Catalog, SRM Service, Replication Service (with a Consistency Agent), Storage (disk, silo), and File Server(s).]

* Slide from Chip Watson, ILDG Middleware Project Status

Page 9: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

ILDG: A Three Tier Web Services Architecture

[Diagram: a Web Browser talks to a Web Server (Portal) containing an XML-to-HTML servlet and applications; these call web services, locally and on a remote web server, over authenticated connections; the web services front local backend services (batch, file, etc.), a storage system, and catalogs.]

Web services provide a standard API for clients, and intermediary servlets allow use from a browser (as in a portal)

* Slide from Chip Watson, ILDG Middleware Project Status
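To illustrate the intermediary-servlet idea on this slide, here is a minimal sketch of a portal servlet that turns a web-service result into HTML for the browser. The class names and the stubbed lookup are hypothetical; a real portal would use a client generated from the catalog's WSDL.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical portal servlet: browser -> servlet -> web service -> HTML.
public class CatalogBrowserServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String gfn = req.getParameter("gfn");
        String[] replicas = lookupReplicas(gfn);   // web-service call (stubbed below)
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body><h2>Replicas for " + gfn + "</h2><ul>");
        for (String surl : replicas) {
            out.println("<li>" + surl + "</li>");
        }
        out.println("</ul></body></html>");
    }

    // Placeholder for the WSDL-generated client stub invocation.
    private String[] lookupReplicas(String gfn) {
        return new String[] { "srm://site-a.example.org/" + gfn,
                              "srm://site-b.example.org/" + gfn };
    }
}
```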

Page 10: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Components: Meta Data Catalog

Hold metadata for files

• Hold metadata for a set of files (data set)

• Process query lookup

• Queries return (sets of) GFNs (Global File Name = key), and optionally full metadata for each match (see the sketch below)

* Slide from Chip Watson, ILDG Middleware Project Status

[Diagram: single-site web services - File Client, Meta Data Catalog, Replica Catalog, SRM Service, Replication Service, Storage Resource, File Server(s) with SRM Listener.]
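A minimal Java sketch of the catalog behaviour described on this page; the interface and method names are assumptions, not the ILDG definition.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Map;

// Hypothetical metadata catalog interface: a query returns global file
// names (GFNs), and full metadata can be fetched per GFN.
public interface MetadataCatalog extends Remote {
    // Return the GFNs whose metadata matches the query expression.
    String[] query(String queryExpression) throws RemoteException;

    // Return the full metadata record for a single GFN.
    Map<String, String> getMetadata(String gfn) throws RemoteException;
}
```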

Page 11: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Components: Replica Catalog

Track all copies of a file / data set

• Get replicas

• Create replica

• Remove replica

• Prototypes exist at

• Jefferson Lab

• Fermilab

* Slide from Chip Watson, ILDG Middleware Project Status

[Diagram: single-site web services architecture, as on Page 10.]
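A corresponding sketch for the replica catalog operations listed above (get, create, remove); the names are illustrative, not the actual ILDG interface.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical replica catalog interface mirroring the operations listed
// above; it maps a global file name (GFN) to its registered copies (SURLs).
public interface ReplicaCatalog extends Remote {
    String[] getReplicas(String gfn) throws RemoteException;
    void createReplica(String gfn, String surl) throws RemoteException;
    void removeReplica(String gfn, String surl) throws RemoteException;
}
```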

Page 12: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Components: Storage Resource Manager

Manage storage system

• Disk only

• Disk plus tape

• 3rd party file transfers

• Negotiate protocols for file retrieval (select a file server)

• Auto stage a file on get (asynchronous operation)

• Version 2.1 defined (collaboration)

* Slide from Chip Watson, ILDG Middleware Project Status

[Diagram: single-site web services architecture, as on Page 10.]
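A sketch of the asynchronous get described above (request, stage, poll); the operation names are simplified stand-ins, not the real SRM specification.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical SRM-style interface: a get request is queued, the file may
// be staged from tape, and the client polls until a transfer URL for a
// negotiated protocol is ready.
public interface StorageResourceManager extends Remote {
    // Ask for a file; protocols lists what the client can speak (gsiftp, http, ...).
    // Returns a request token for later polling.
    String prepareToGet(String surl, String[] protocols) throws RemoteException;

    // Poll the request token; returns null until the file is staged and pinned.
    String getTransferUrl(String requestToken) throws RemoteException;
}
```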

Page 13: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

ILDG Components

MetaData Catalog (MDC)

• Each collaboration deploys one

• A mechanism for searching across all of them (a virtual MDC) is under discussion but not yet defined

Replica Catalog (RC)

• (same comments as for the MDC)

Storage Resource Manager (SRM)

• Each collaboration deploys one or more

• At each SRM site, there are one or more file servers: http, ftp, gridftp, jparss, bbftp, …

* Slide from Chip Watson, ILDG Middleware Project Status

Page 14: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

JLab: Experimental Effort

• PPDG (Particle Physics Data Grid)

• Collaboration of computer scientists and physicists

• Developing and deploying production Grid systems for experiment-specific applications

• Now supporting OSG (Open Science Grid)

• SRM (Storage Resource Manager)

• A common/standard interface to mass storage systems

• In 2003, FSU used SRM v1 to process Monte Carlo for 30 million events

• In 2004, deployed a v2 implementation for testing

• Required for production in February 2005

• Already working with LBL, Fermilab, and CERN to define v3

• Job Submission

• PKI-based authentication to Auger (the JLab job submission system)

• Investigated uJDL (a user level job description language)

• BNL leading this effort

Page 15: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Envisioned Architecture

[Diagram: a user submits a job in uJDL to a Meta-Job Gateway; a Meta-Job Scheduler and Meta-Job Translator, together with the Meta-Data Catalog and Replica Catalogs, translate uJDL to sJDL and GFNs to SURLs; a Site Submitter (Condor-G) passes the sJDL to the site-provided services: Grid Monitoring, LSF, and the Storage Resource Manager. The experiments provide the meta-job components; the site provides the local services.]
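A rough sketch of the translation step in the diagram: resolve each GFN in the user's job description to a SURL and emit a site-level description. All names and the JDL handling here are hypothetical.

```java
import java.util.function.Function;

// Hypothetical sketch of the Meta-Job Translator step: take a user-level
// job description (uJDL), resolve each input GFN to a storage URL (SURL)
// via a replica-catalog lookup, and emit a site-level sJDL.
// The JDL handling is deliberately naive (plain text appending).
public class MetaJobTranslator {
    private final Function<String, String[]> replicaLookup;  // GFN -> SURLs

    public MetaJobTranslator(Function<String, String[]> replicaLookup) {
        this.replicaLookup = replicaLookup;
    }

    public String translate(String uJdl, String[] inputGfns) {
        StringBuilder sJdl = new StringBuilder(uJdl);
        for (String gfn : inputGfns) {
            String[] surls = replicaLookup.apply(gfn);
            // Naive choice of the first replica; a real meta-scheduler would
            // weigh site, load, and transfer cost before picking one.
            sJdl.append("\ninput_surl = ").append(surls[0]);
        }
        return sJdl.toString();
    }
}
```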

Page 16: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

SRM v2

• Implemented SRM version 2.1.1.

• Interface to Jasmine via the HPC Disk/Cache Manager.

• JLab SRM is a Java Web Service.

• Uses Apache Axis as SOAP Engine

• Uses Apache Tomcat as Servlet Engine.

• Uses GridFTP for file movement

• Testing with CMU

• Production service required by February 2005.

• Had a hard time using GT3

• Cannot just take components that one wants (it is all or nothing)
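For illustration, a dynamic client call against an Axis-hosted service looks roughly like the sketch below; the endpoint URL, namespace, and operation name are placeholders, not the actual JLab SRM deployment.

```java
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

// Minimal Axis 1.x dynamic-invocation client; endpoint, namespace, and
// operation name are placeholders.
public class SrmPingClient {
    public static void main(String[] args) throws Exception {
        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new java.net.URL(
                "https://srm.example.org:8443/srm/services/srm"));
        call.setOperationName(new QName("urn:srm", "srmPing"));
        Object result = call.invoke(new Object[] {});
        System.out.println("srmPing returned: " + result);
    }
}
```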

Page 17: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

SRM v2 Server Deployment

• Requires Tomcat, MySQL, SRM worker daemon

• Firewall configuration: SRM port 8443; GridFTP ports 2811 and 40000-45000

• Currently only installed at JLab

• Testing client access with CMU

• Next step: install an SRM server at CMU
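One quick way to verify the firewall configuration above from a remote site is a simple socket probe, as in the sketch below; the host name is a placeholder.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Probe the ports listed above (SRM 8443, GridFTP control 2811) from a
// remote site; the host is a placeholder for the actual server.
public class PortProbe {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "srm.example.org";
        int[] ports = { 8443, 2811 };
        for (int port : ports) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 3000);
                System.out.println(host + ":" + port + " reachable");
            } catch (IOException e) {
                System.out.println(host + ":" + port + " blocked or closed: " + e.getMessage());
            }
        }
    }
}
```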

Page 18: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

SRM v2 Client Deployment

• Installed at JLab and CMU

• Implements only srmGet and srmPut (permission problem to fix)

• Requires specific Ant and Java versions

• Proper grid certificate request and installation can be a challenge

• Use OpenSSL for cert request instead

• Globus requires a full installation simply to request a cert and run the client

• Just need grid-proxy-init

• Note:  Curtis' notes are at http://www.curtismeyer.com/grid_notes

• Currently the only SRM v2 server and client
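Since only grid-proxy-init is really needed on the client side, a small pre-flight check for a proxy (here via the standard X509_USER_PROXY variable) can save a failed srmGet/srmPut; this wrapper is an illustration, not part of the actual client.

```java
import java.io.File;

// Sanity check before calling srmGet/srmPut: is there a grid proxy where
// Globus tools expect it? Only X509_USER_PROXY is probed here, to keep the
// sketch portable; the /tmp default location is not checked.
public class ProxyCheck {
    public static void main(String[] args) {
        String path = System.getenv("X509_USER_PROXY");
        if (path == null) {
            System.err.println("X509_USER_PROXY not set; run grid-proxy-init first");
            return;
        }
        File proxy = new File(path);
        if (!proxy.isFile()) {
            System.err.println("No proxy file at " + path + "; run grid-proxy-init");
            return;
        }
        System.out.println("Found proxy: " + path + " (" + proxy.length() + " bytes)");
    }
}
```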

Page 19: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Long-Term SRM Work

• We are considering how the next SRM version could become the primary interface to Jasmine and the primary farm file mover.

• Use for Local and Remote Access

• Goal: 25 TB/day from tape through SRM.

• Balancing classes of requests/prioritizing types of data transfers becomes essential.

• Farm interaction use cases must be modeled: farm input, farm output, scheduling.

• We are already looking at what SRM v3 will look like.

• SRM Core Features and Feature Sets (ideas from the last SRM meeting)
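As a toy illustration of balancing classes of requests, the sketch below orders transfer requests by an assumed class priority so farm input is not starved by bulk export; the classes, priorities, and paths are made up, not JLab policy.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Toy model of class-based prioritization of transfer requests.
public class TransferScheduler {
    enum RequestClass { FARM_INPUT, FARM_OUTPUT, REMOTE_EXPORT }

    record Request(String file, RequestClass cls) {}

    private static int priority(RequestClass c) {
        switch (c) {
            case FARM_INPUT:  return 0;  // highest: keeps the farm busy
            case FARM_OUTPUT: return 1;
            default:          return 2;  // bulk export can wait
        }
    }

    public static void main(String[] args) {
        PriorityQueue<Request> queue = new PriorityQueue<>(
                Comparator.comparingInt((Request r) -> priority(r.cls())));
        queue.add(new Request("/mss/halld/raw/run1.evt", RequestClass.REMOTE_EXPORT));
        queue.add(new Request("/mss/clas/raw/run2.evt", RequestClass.FARM_INPUT));
        queue.add(new Request("/mss/clas/out/run3.root", RequestClass.FARM_OUTPUT));
        while (!queue.isEmpty()) {
            System.out.println(queue.poll());
        }
    }
}
```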

Page 20: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Job Submission

• uJDL

• Is this really needed?

• Is a standard job submission interface what is really needed? 

• Is that Condor-G?

• Auger interface

• Uses Java web services

• Uses PKI-based authentication

• Not GSI
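A hedged sketch of what PKI-based (non-GSI) client authentication to a Java web service can look like: present an ordinary X.509 user certificate from a keystore over mutual TLS. The keystore path, password, and URL are placeholders; this is not the actual Auger client.

```java
import java.io.FileInputStream;
import java.net.URL;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

// Client-side mutual TLS with an ordinary X.509 user certificate (PKI,
// not GSI proxies). Keystore path, password, and URL are placeholders.
public class PkiClient {
    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray();
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream("usercert.p12")) {
            keyStore.load(in, password);
        }
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), null, null);  // default trust store

        HttpsURLConnection conn = (HttpsURLConnection)
                new URL("https://auger.example.org/jobs/submit").openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```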

Page 21: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

Grid3dev - OSG

• JLab development effort is limited

• Grid3 proved successful

• ATLAS and CMS were the major users

• JLab plans to join Grid3dev as a step toward OSG-INT/OSG

• We cannot develop everything we need

• VO management tools, monitoring, etc.

• Testing and evaluation

• Integration with facility infrastructure

• Determine what we need and what we can use from others

Page 22: SURA Cyberinfrastructure Workshop Georgia State University January 5 – 7, 2005

References

• http://www.ppdg.net

• http://www.opensciencegrid.org

• http://www.lqcd.org/ildg

• http://sdm.lbl.gov/srm

• http://www.globus.org