PROOF Farm preparation for Atlas FDR-1
Wensheng Deng, Tadashi Maeno, Sergey Panitkin, Robert Petkus, Ofer Rind, Torre Wenaus, Shuwei Ye
BNL


Page 1: PROOF Farm preparation for Atlas FDR-1 Wensheng Deng, Tadashi Maeno, Sergey Panitkin, Robert Petkus, Ofer Rind, Torre Wenaus, Shuwei Ye BNL


Page 2: Outline

Introduction
Atlas FDR-1
Farm preparation for FDR-1
PROOF tests
Analyses

Sergey Panitkin

Page 3: FDR: What is it?

S. Rajagopalan, FDR meeting for U.S.

Provides a realistic test of the computing model, from online (SFO) to analysis at the Tier-2s.

Exercise the full software infrastructure (CondDB, TAGDB, trigger configuration, simulation with mis-alignments, etc.) using mixed events.

Implement the calibration/alignment model.

Implement the Data Quality monitoring.

Specifics (from D. Charlton, T/P week):

Prepare a sample of mixed events that looks like raw data (bytestream)

Stream the events from the SFO output at Point 1, including express and calibration streams

Copy to Tier 0 (and replicate to Tier 1’s)

Run calibration and DQ procedures on express/calibration stream

Bulk processing after 24-48 hours incorporating any new calibrations.

Distribute ESD and AOD to Tier-1s (later to Tier-2s as well)

Make TAG and DPDs

Distributed Analysis

Reprocess data after a certain time.

Page 4: FDR-1 Time Line

S. Rajagopalan, FDR meeting for U.S.

January: Sample preparation, mixing events

Week of Feb. 4: FDR-1 run. Stream data through SFOs.

Transfer to T0, processing of ES and CS.

Bulk processing completed by weekend.

Including ESD and AOD production

Regular shifts: DQ monitoring, Calibration and Tier-0 processing shifts

Expert coverage at Tier-1 as well to ensure smooth data transfer.

Week of February 11: AOD samples transferred to Tier-1s

DPD production at Tier-1.

Week of February 18/25: All data samples should be available for subsequent analysis.

At some later point: Reprocessing at Tier-1’s and re-production of DPDs.

FDR-1 should complete before April and feed back into FDR-2

Page 5: PROOF farm preparation

The existing Atlas PROOF farm at BNL was expanded in anticipation of FDR1.

10 new nodes, each with:
8 CPU cores
16 GB RAM
500 GB hard drive
An additional 64 GB Solid State Disk (SSD) expected
1 Gb network

Standard Atlas software stack
Ganglia monitoring
Latest version of ROOT (currently 5.18, as of Jan. 28, 2008)

Page 6: Current Farm Configuration

"Old farm":
10 nodes, 4 GB RAM each
40 cores: 1.8 GHz Opterons
20 TB of HDD space (10 x 4 x 500 GB)

Extension:
10 nodes, 16 GB RAM each
80 cores: 2.0 GHz Kentsfields
5 TB of HDD space (10 x 500 GB)
640 GB SSD space (10 x 64 GB)

Page 7: Farm resource distribution issues

The new "extension" machines are "CPU heavy": 8 cores, 1 HDD.

Tests showed that 1 CPU core requires ~10 MB/s in a typical I/O-bound Atlas analysis.

Tests showed that 1 SATA HDD can sustain ~20 MB/s, i.e. enough for ~2 cores.

In order to provide adequate bandwidth for 8 cores per box, we needed to augment the "extension" machines with SSDs.

SSDs provide high bandwidth, capable of sustaining an 8-core load, but have relatively small volume (64 GB per machine). They will be able to accommodate only a fraction of the expected FDR1 data.

Hence, SSD space should be actively managed. The exact data management scheme needs to be worked out; the following slides attempt to summarize the current discussion of data management with the current PROOF farm configuration.
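The sizing argument above reduces to simple arithmetic. A minimal sketch, using only the approximate rates quoted in these slides (the SSD read rate is from the Mtron spec on the next slide):

```python
# Back-of-envelope I/O sizing for the extension nodes.
# Rates are the approximate figures quoted in the slides.
CORE_RATE_MBPS = 10   # ~10 MB/s needed per core in I/O-bound Atlas analysis
HDD_RATE_MBPS = 20    # sustained read of one SATA HDD
SSD_RATE_MBPS = 120   # sustained read of the Mtron SSD
CORES_PER_NODE = 8

def cores_fed(device_rate_mbps, per_core_mbps=CORE_RATE_MBPS):
    """Number of concurrent analysis cores one device can keep busy."""
    return device_rate_mbps // per_core_mbps

# One HDD feeds only ~2 of the 8 cores; one SSD covers all 8 with headroom.
print(cores_fed(HDD_RATE_MBPS))                    # 2
print(cores_fed(SSD_RATE_MBPS) >= CORES_PER_NODE)  # True
```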

Page 8: New Solid State Disks

Model: Mtron MSP-SATA7035064
Capacity: 64 GB
Average access time: ~0.1 ms (typical HDD: ~10 ms)
Sustained read: ~120 MB/s
Sustained write: ~80 MB/s
IOPS (sequential/random): 81,000 / 18,000
Write endurance: >140 years @ 50 GB written per day
MTBF: 1,000,000 hours
7-bit Error Correction Code

Page 9: Farm resource distribution

[Diagram: farm storage pools. "Old Farm": 40 cores, 20 TB HDD. Extension: 80 cores, 5 TB HDD plus 640 GB SSD. Xrootd pools: BNLXRDHDD1, BNLXRDHDD2, BNLXRDSSD.]

Page 10: Plans for FDR1 and beyond

Test data transfer from dCache:
Direct transfer (xrdcp) via the Xrootd door on dCache
Two-step transfer (dccp + xrdcp) through intermediate storage

Integration with Atlas DDM

Implement dq2 registration for dataset transfers

Gain experience with SSDs:
Scalability tests with SSDs and regular HDDs

Choice of optimal PROOF configuration for SSD nodes

Data staging mechanism within the farm

HD to SSD data transfer

SSD space monitoring and management

Analysis policies (free-for-all, analysis train, subscription, etc.)

Test “fast Xrootd access” – new I/O mode for Xrootd client

Test Xrootd/PROOF federation (geographically distributed) with Wisconsin

Organize local user community to analyze FDR data
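The two transfer paths listed above could be scripted roughly as follows. This is only a sketch: host names, paths, and the helper functions are hypothetical; only the xrdcp/dccp invocation shapes follow the plan.

```python
# Sketch of the two dCache -> farm transfer paths (direct and two-step).
# All hosts and paths below are hypothetical placeholders.

def direct_transfer(dcache_door, pnfs_path, farm_redirector, dest_path):
    """One step: read via the Xrootd door on dCache, write into the farm."""
    return ["xrdcp",
            f"root://{dcache_door}/{pnfs_path}",
            f"root://{farm_redirector}/{dest_path}"]

def two_step_transfer(pnfs_path, staging_dir, farm_redirector, dest_path):
    """Fallback: dccp to intermediate local storage, then xrdcp to the farm."""
    staged = staging_dir + "/" + pnfs_path.rsplit("/", 1)[-1]
    return [["dccp", pnfs_path, staged],
            ["xrdcp", staged, f"root://{farm_redirector}/{dest_path}"]]
```

Commands are returned as argument lists so a wrapper script can execute them (e.g. with subprocess) and register each copied dataset in DQ2 afterwards.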


Page 11: Data Flow I

We expect that all the data (AODs, DPDs, TAGs, etc.) will first arrive at dCache.

We assume that a certain subset of the data will be copied from dCache to the PROOF farm for analysis in ROOT.

This movement is expected to be done using a set of custom scripts and is initiated by the Xrootd/PROOF farm manager.

Scripts will copy datasets using xrdcp via the Xrootd door on dCache.

A fall-back solution exists in case the Xrootd door on dCache is unstable.

Copied datasets will be registered in DQ2.

On the Xrootd farm, datasets will be stored on HDD space (currently ~25 TB)

Certain high priority datasets will be copied to SSD disks by farm manager for analysis with PROOF

Determination of the high priority datasets will be done based on physics analysis priorities (FDR coordinator, PWG, etc)

The exact scheme for SSD "subscription" needs to be worked out:
Subscription, on-demand loading, etc.
Look at ALICE's approach.
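One possible shape for the SSD bookkeeping is sketched below. This is an illustration only: the admit-by-priority policy, dataset names, and sizes are hypothetical, not a scheme the team has settled on.

```python
# Sketch: admit high-priority datasets to the limited SSD pool in
# priority order until the space budget is exhausted.
SSD_POOL_GB = 10 * 64  # 10 extension nodes x 64 GB SSD each

def stage_to_ssd(requests, capacity_gb=SSD_POOL_GB):
    """requests: iterable of (dataset_name, size_gb, priority).
    Returns the dataset names that fit, highest priority first."""
    resident, used = [], 0
    for name, size_gb, _prio in sorted(requests, key=lambda r: -r[2]):
        if used + size_gb <= capacity_gb:
            resident.append(name)
            used += size_gb
    return resident
```

A real manager would also need eviction when priorities change and per-node placement, which this sketch ignores.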


Page 12: Integration with Atlas DDM

[Diagram: Grid transfers bring data from T0 into dCache. Datasets are copied to the Xrootd/PROOF farm's /data area via xrdcp with dq2 registration, and staged to /ssd via xrdcp (driven with tentakel). DQ2 and Panda connect to the farm; an Atlas user locates a dataset with dq2_ls -fp -s BNLXRDHDD1 "my_dataset" and runs analysis.]

Page 13: FDR tests

Batch analyses with Xrootd as data server:
AOD analysis; compare speed with dCache (D. Adams, H. Ma)

Store (all?) TAGs on the farm:
Our previous tests showed that Athena analyses gain from TAGs stored on Xrootd

Use the PROOF farm for physics analysis:
Athena ROOT Access (ARA) analysis of AODs using PROOF
ARA was demonstrated to run on PROOF in January (Shuwei Ye)

Store (all?) FDR1 DPDs on the farm:
FDR1 DPDs made by H. Ma were already copied to the farm

DPD-based analyses:
Stephanie Majewski plans to study the increase in sensitivity of an inclusive SUSY search using information from isolated tracks


Page 14: Root version mismatch issues

All datasets for FDR1 will be produced with release 13, which relies on ROOT v5.14.

The PROOF farm currently uses the latest production version of ROOT, v5.18. This version has many improvements in functionality and stability compared to v5.14, and it is recommended by the PROOF developers.

Due to changes in the xrootd protocol, clients running ROOT v5.14 cannot work with xrootd/PROOF servers from v5.18.

In order to run ARA analysis on PROOF, or to utilize the farm as an Xrootd SE for AOD/TAG analysis, the PROOF farm needs to be downgraded to v5.14. Such a downgrade will hurt ROOT-based analysis of AANTs and DnPDs.

In principle we can run two farms in parallel:
The old farm with PROOF v5.14
The extension farm with PROOF v5.18

The data management scheme described on the previous slides can be trivially applied to both farms.

This is a temporary solution: Athena is expected to use ROOT v5.18 in the next release, which will largely remove the version mismatch problems.


Page 15: Current status

Work in progress!

File transfer from dCache is functional.
A new LRC was created.
Files copied to Xrootd are registered in the LRC via a custom dq2_cr.
Datasets can be found using DDM tools (dq2-list-dataset-replicas):

user.HongMa.fdr08_run1.0003050.StreamEgamma.merge.AOD.o1_r6_t1.DPD_v130040_V5
INCOMPLETE: BNLPANDA, BNLXRDHDD1
COMPLETE:

The list of files in a dataset on Xrootd can be obtained via dq2_ls.
Several FDR1 AOD datasets and one DPD dataset were transferred using this mechanism.

Issues:
Still need better integration with DDM
Possible problem with large file transfers via the dCache door
