Page 1: D0RACE: Testbed Session

D0RACE: Testbed Session

Lee Lueking

D0 Remote Analysis Workshop

February 12, 2002

Page 2: D0RACE: Testbed Session

February 12, 2002 Lee Lueking - D0 RACE 2

Overview

Network reliability and performance are of great importance to the D0 Data Model. D0 would like to be involved in the DataGrid/WP7 and DataTAG studies to monitor and improve the network performance. DataGrid/WP7 covers the network within Europe, whereas DataTAG concentrates on the Europe-US intercontinental link. An effort to enhance the performance of the connecting network in the US could be a result of this initiative.

Until recently, SAM used ftp and bbftp to transport files between the data storage location and the cache area of the computer used to process them. Tests have recently started using GridFTP together with the Grid Security Infrastructure (GSI). This is particularly interesting because it will require a marriage between this security layer and the Kerberos security in operation at Fermilab.

SAM can be used to select files and run production or analysis jobs on them. Recently an initiative has started to use Condor as a workload scheduler within SAM, in order to make maximal use of the compute resources available at Fermilab and the participating institutes.

D0 has a request system for Monte Carlo generation of specific data channels. At this moment these requests are sent to the (mostly external) Monte Carlo production sites by email and submitted through human intervention. The goal is to evolve to an automatic system where user requests are submitted to the full D0 Monte Carlo Compute Fabric, consisting of all CPU resources within the collaboration. Most likely this service will be integrated within SAM.
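As a rough illustration of what automatic submission to a shared compute fabric could look like, the sketch below greedily routes each request to the site that currently has the most free CPUs. It is only a toy model: the function name, the request and site identifiers, and the CPU counts are hypothetical, and it is not how Condor or SAM actually match jobs to resources.

```python
import heapq

def assign_requests(requests, free_cpus):
    """Greedy routing sketch (hypothetical, not Condor's matchmaking).

    requests:  {request_id: cpus_needed}
    free_cpus: {site_name: free_cpu_count}
    Returns {request_id: site_name, or None if no single site has room}.
    """
    # Max-heap on free CPUs, implemented by negating the counts.
    heap = [(-count, site) for site, count in free_cpus.items()]
    heapq.heapify(heap)

    assignment = {}
    # Place the largest requests first so they are not starved by small ones.
    for request_id, needed in sorted(requests.items(), key=lambda kv: -kv[1]):
        neg_free, site = heap[0]
        if -neg_free >= needed:
            # Book the CPUs and push the site back with its reduced capacity.
            heapq.heapreplace(heap, (neg_free + needed, site))
            assignment[request_id] = site
        else:
            assignment[request_id] = None
    return assignment


if __name__ == "__main__":
    sites = {"siteA": 10, "siteB": 6}          # illustrative numbers only
    wanted = {"r1": 8, "r2": 5, "r3": 4}
    print(assign_requests(wanted, sites))
```

A real scheduler would also weigh data location, queue priorities and site policies; the sketch only shows the bookkeeping that replaces the manual email step.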

At this moment only limited use is made of the data storage capacity of institutes other than Fermilab. One of the difficulties has been the multitude of storage systems. Monte Carlo data is currently stored at SARA in Amsterdam and at the Computer Centre CCIN2P3 in Lyon, but an effort will be made to integrate as many of the available storage locations within the collaboration as possible, for Monte Carlo generated data as well as analysis data.

Planning

Most of the above projects have at least been discussed, or have even been started at some level, as part of the D0 PPDG and wider D0 Grid efforts. We estimate that the International Grid Testbed initiative will boost most if not all of these activities. The network monitoring and GridFTP tests have started on the European side but could be taken to a similar level on the US side before the end of this year.

First contacts have been made with the Condor team and some initial tests have been done, but more work is needed. The use of Condor as a workload scheduler within SAM will still take several months, and the use of Condor to exploit grid CPU resources in the participating institutes in Europe will have about the same timescale.

The distributed data storage can only proceed as fast as new storage locations become available, but the present situation can still be improved considerably. Within half a year it should be possible to store all externally produced data locally at the producing institutes. One year from now it should be possible to store any D0 data at the location where the grid (or SAM in the D0 case) decides the data can best be stored.

The network performance has to be increased to the point where the user will not notice whether the files in use are stored locally or not. Distributed compute resources within the collaboration should become more integrated, just like the data storage systems. A workload scheduler should be able to make optimal use of all these resources.

Management

D0 is preparing a more detailed proposal for an International D0 Grid Testbed, and the management will be described there. It will have a small managerial board with people from the participating institutes in the US and Europe, and a technical board that will address architectural issues and practical problems. The managerial board will have its representatives in the International DataGrid Coordination meeting as well as in the other appropriate bodies such as the PPDG.

Page 3: D0RACE: Testbed Session

Who is interested (Oklahoma U., IN2P3, Wuppertal, NIKHEF, UTA, Lancaster, Imperial, Prague, Michigan).
Standard network testing procedure:
  Netperf, iperf performance (Horst Severini, Shawn McKee).
  Transmission rates from station logs as a function of time.
  Throughput numbers.
  Measurements of error rates, at packet level and higher.
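The last three items can be derived from the same set of transfer records. The sketch below bins records by time and reports a mean transfer rate and an error rate per bin. The record layout (timestamp, bytes, success flag) is an assumption made for illustration; the actual SAM station log format is not given in these slides.

```python
from collections import defaultdict

def summarize_transfers(records, bin_seconds=3600):
    """Summarize transfers per time bin.

    records: iterable of (timestamp_s, bytes_transferred, succeeded)
             tuples; this layout is hypothetical, not the SAM log format.
    Returns {bin_start_s: (mean_rate_MB_per_s, error_rate)}.
    """
    bytes_per_bin = defaultdict(int)
    attempts = defaultdict(int)
    failures = defaultdict(int)

    for timestamp, nbytes, succeeded in records:
        bin_start = (timestamp // bin_seconds) * bin_seconds
        attempts[bin_start] += 1
        if succeeded:
            bytes_per_bin[bin_start] += nbytes
        else:
            failures[bin_start] += 1

    summary = {}
    for bin_start in sorted(attempts):
        # Mean rate over the whole bin, in units of 10^6 bytes per second.
        rate = bytes_per_bin[bin_start] / bin_seconds / 1e6
        error_rate = failures[bin_start] / attempts[bin_start]
        summary[bin_start] = (rate, error_rate)
    return summary


if __name__ == "__main__":
    sample = [
        (0, 3_600_000_000, True),
        (1800, 0, False),
        (3600, 7_200_000_000, True),
    ]
    for start, (rate, errors) in summarize_transfers(sample).items():
        print(start, rate, errors)
```

Averaging over the full bin rather than over the time spent transferring gives a sustained rate, which is the number that matters when comparing sites over days or weeks.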

Page 4: D0RACE: Testbed Session

What are we trying to achieve?
Find bottlenecks; understand and debug the network at various sites.
Understand scalability issues and the operation of multiple sites.
End-to-end test.
Transfers not only from FNAL but among all sites, or configured locations.
Break into specific tests, isolate components. Unit testing.
How to optimize caching.
How can we do real work? Run reco or reco analyze at multiple sites simultaneously.
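Testing transfers among all sites, rather than only from FNAL, grows the test matrix quickly. The sketch below enumerates the ordered source-to-destination pairs for a small set of sites. The site list is an illustrative subset of the names on these slides, and the function name is hypothetical.

```python
from itertools import permutations

# Illustrative subset of sites named in this talk.
SITES = ["FNAL", "IN2P3", "NIKHEF", "Lancaster"]

def transfer_pairs(sites):
    """Return every ordered (source, destination) pair of distinct sites."""
    return list(permutations(sites, 2))


if __name__ == "__main__":
    pairs = transfer_pairs(SITES)
    fnal_only = [p for p in pairs if p[0] == "FNAL"]
    # n sites give n*(n-1) directed transfers; FNAL-only gives n-1.
    print(len(pairs), "pairs among all sites,", len(fnal_only), "from FNAL only")
    for source, destination in pairs:
        print(source, "->", destination)
```

With n sites the full matrix has n(n-1) directed transfers against n-1 for FNAL-only tests, which is one reason to restrict runs to configured locations.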

Page 5: D0RACE: Testbed Session

By March 15
Iperf test.
Single file, from cache, from enstore, from central-analysis cache.
Project package running. Iain will set up.
Begin more complex tests and monitoring.
Touch base biweekly, at least in e-mail, maybe in d0grid meetings or other.

Page 6: D0RACE: Testbed Session

Linux clusters: Clued0/sam combination. (Roger)
How do we judge when a release is useful? (Gordon)
Remote tasks: software shifts, moderating FAQ pages, web master.
Suggestion: Maintain a standard, operational sam reference station for people to look at and see how things are configured.
Pass the torch. Encourage and help each other get things set up and running.
Documentation needs to be kept up to date. (Meena)

Page 7: D0RACE: Testbed Session

"All of these projects are working towards the common goal of providing transparent access to the massively distributed computing infrastructure that is needed to meet the challenges of modern experiments …" (From the EU DataTAG proposal)

Page 8: D0RACE: Testbed Session

Grid Projects Timeline

[Figure: timeline chart. Time axis: Q3 00, Q4 00, Q1 01, Q2 01, Q3 01, Q4 01, Q1 02.]

GriPhyN: $11.9M + $1.6M
PPDG: $9.5M
iVDGL: $13.65M
EU DataGrid: $9.3M
EU DataTAG: 4M Euros
GridPP: