Management of User Requested Data in US ATLAS

Armen Vartapetian, University of Texas, Arlington

US ATLAS Distributed Facility Workshop, UC Santa Cruz, November 14, 2012

Outline

User Analysis Output

Central Deletion Service

Victor

USERDISK cleanup

Monitoring and Notifications

DaTRI

LOCALGROUPDISK policy

Storing User Analysis Output

User analysis output in the US is stored in the USERDISK of the site where the job ran

Only US sites have USERDISK space tokens. At non-US sites the output destination is SCRATCHDISK

The US has a specific policy for USERDISK maintenance/cleanup that is more relaxed and user-friendly than the one for SCRATCHDISK (details later)

Both space tokens are temporary storage, but users can subscribe their data to other locations using the DaTRI request system (details later)

The typical destination for user data in DaTRI requests is LOCALGROUPDISK or GROUPDISK for longer storage, or even SCRATCHDISK for further temporary storage

Datasets in LOCALGROUPDISK or GROUPDISK have no limited lifetime by default, so these space tokens (unlike some others) are not cleaned up on a regular basis

Central Deletion Service

Cleanup of all space tokens is carried out through the Central Deletion Service

The most basic command to submit a dataset for deletion is: dq2-delete-replicas <dataset> <space-token>

The command submits the dataset deletion to the Central Deletion Service and puts it in the queue right away

The deletion service flow for datasets is: ToDelete -> Waiting -> Resolved -> Queued -> Deleted. The service also shows the ToDelete -> Deleted status for the file count and for the space. Errors, if any, are shown as well
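The dataset flow listed above can be pictured as a simple linear state sequence. The sketch below is only an illustration in Python: the state names come from the slide, while the class, the helper functions and the one-step-at-a-time transition rule are assumptions, not the deletion service's actual code.

```python
from enum import Enum


class DeletionState(Enum):
    """Dataset states in the order given on the slide."""
    TO_DELETE = "ToDelete"
    WAITING = "Waiting"
    RESOLVED = "Resolved"
    QUEUED = "Queued"
    DELETED = "Deleted"


# Assumption: a dataset moves through the states strictly in this order.
FLOW = list(DeletionState)


def next_state(state):
    """Return the state that follows `state`, or None once it is Deleted."""
    index = FLOW.index(state)
    if index + 1 < len(FLOW):
        return FLOW[index + 1]
    return None


def path_from(state):
    """List `state` and every state that still follows it."""
    return FLOW[FLOW.index(state):]
```

For example, `path_from(DeletionState.RESOLVED)` lists the Resolved, Queued and Deleted states that a dataset still has to pass through.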

Currently the typical deletion rate for US sites is 2-4 Hz for T2s and 7-8 Hz for the T1

The deletion rate can be changed/optimized by tweaking some site-specific parameters in the deletion service configuration file

Load, bottlenecks and other SRM issues can cause timeouts, reduce the deletion rate and produce errors

If a site has more than 100 errors in 4 hours, the ADCoS shifter must file a GGUS ticket
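The ticketing rule above amounts to counting errors in a sliding 4-hour window. A minimal sketch, assuming the error timestamps are available as Python `datetime` objects (the function name and the input format are hypothetical, not part of any ADCoS tool):

```python
from datetime import datetime, timedelta

ERROR_LIMIT = 100              # from the slide: more than 100 errors ...
WINDOW = timedelta(hours=4)    # ... within 4 hours


def needs_ticket(error_times, now):
    """Return True if more than ERROR_LIMIT errors fall in the window ending at `now`."""
    recent = [t for t in error_times if now - WINDOW < t <= now]
    return len(recent) > ERROR_LIMIT
```

Exactly 100 errors in the window does not trigger a ticket in this sketch, since the rule reads "more than 100".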

Cleanup Decision - Victor

Daily monitoring of the space tokens, to detect low space availability and trigger space cleanup, is done by a system called Victor

Victor takes care of only those space tokens which need regular cleanup

It prepares a list of datasets to be sent to the central deletion system. A grace period of 1 day is applied

SCRATCHDISK: cleanup is triggered when free space is <50%. The oldest replicas (older than 15 days) are selected for deletion. Target free space is >55%

DATADISK: cleanup is triggered when free space is getting low. Only "secondary" datasets older than 15 days are selected for deletion. Popularity of datasets is taken into account

for T2s cleanup is triggered when free space is <10%, with target >15%

for the T1 cleanup is triggered when free space is <500 TB, with target >750 TB

PRODDISK: cleanup is triggered when free space is <10 TB, with target free space >12 TB. Only datasets older than 31 days are selected. The pandamover files also need to be cleaned up, which is done locally

GROUPDISK: cleanup is defined by the person responsible for the group
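The trigger and target thresholds above can be collected in a small lookup table. The following is a sketch only: the numbers are those quoted on the slide, but the data structure, the function name and the idea of returning "TB to free" are assumptions made for illustration and do not describe Victor's real implementation (popularity and replica selection are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupRule:
    trigger: float      # cleanup starts when free space drops below this
    target: float       # cleanup aims to bring free space above this
    unit: str           # "%" of the token's total space, or "TB" absolute
    min_age_days: int   # only datasets older than this are candidates


# Thresholds as quoted on the slide; the (token, tier) keys are an assumption.
RULES = {
    ("SCRATCHDISK", "any"): CleanupRule(50, 55, "%", 15),
    ("DATADISK", "T2"): CleanupRule(10, 15, "%", 15),
    ("DATADISK", "T1"): CleanupRule(500, 750, "TB", 15),
    ("PRODDISK", "any"): CleanupRule(10, 12, "TB", 31),
}


def space_to_free(token, tier, free_tb, total_tb):
    """Return the TB to delete to reach the target, or 0.0 if no cleanup is triggered."""
    rule = RULES.get((token, tier)) or RULES.get((token, "any"))
    if rule is None:
        return 0.0  # token not covered here, e.g. GROUPDISK (decided by the group)
    if rule.unit == "%":
        if 100.0 * free_tb / total_tb >= rule.trigger:
            return 0.0
        return rule.target * total_tb / 100.0 - free_tb
    if free_tb >= rule.trigger:
        return 0.0
    return rule.target - free_tb
```

For a hypothetical 100 TB SCRATCHDISK with 40 TB free, `space_to_free("SCRATCHDISK", "T2", 40, 100)` gives 15 TB, the amount needed to get back to 55% free.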

USERDISK Cleanup

USERDISK cleanup is done on average every 2 months

We target datasets older than 2 months

Targeted user datasets are matched with the dataset owner DN from the dq2 catalog, and dataset lists per DN are created

A notification email is sent to users about the upcoming cleanup of their datasets, with a link to the list and some basic information on how to proceed if a dataset is still needed

We maintain and use a list of DN to email address associations, and regularly take care of missing/obsolete emails

After the notification email the users have 10 days to save the data they need

This cleanup procedure has been in use for the last 4 years

Very smooth operation, no complaints, users happy
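The bookkeeping behind this procedure (select old datasets, group them per owner DN, spot DNs without an email address, set the deadline) might look roughly like the sketch below. The input format, the function name and the 60-day approximation of "2 months" are assumptions; the real procedure reads owners from the dq2 catalog, which is not modelled here.

```python
from collections import defaultdict
from datetime import date, timedelta

MIN_AGE = timedelta(days=60)   # "older than 2 months", approximated as 60 days
GRACE = timedelta(days=10)     # users have 10 days after the notification


def plan_cleanup(datasets, dn_to_email, today):
    """Group old datasets per owner DN and compute the deletion date.

    `datasets` is an iterable of (name, owner_dn, creation_date) tuples.
    Returns (lists_per_dn, dns_without_email, deletion_date).
    """
    per_dn = defaultdict(list)
    for name, owner_dn, created in datasets:
        if today - created > MIN_AGE:
            per_dn[owner_dn].append(name)
    # DNs with targeted datasets but no known email need manual follow-up.
    missing = sorted(dn for dn in per_dn if dn not in dn_to_email)
    return dict(per_dn), missing, today + GRACE
```

The list of DNs without an email address corresponds to the "missing/obsolete emails" that have to be taken care of before notifications go out.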

USERDISK Cleanup Notification

The question is whether users are well informed about all available options to save data targeted for deletion

Excerpt from the notification email with the information for users:

You are advised to save any dataset, which is still of interest, to your private storage area. You may also use your local group disk storage area xxx_LOCALGROUPDISK if such area has been defined. Please contact your local T1/T2/T3 responsible of disk storage for further assistance.

If the list contains datasets of common interest to a particular physics group, please contact that group representative to move your datasets to xxx_ATLASGROUPDISK area.

If you are going to copy your dataset to xxx_LOCALGROUPDISK or xxx_ATLASGROUPDISK please use the Subscription Request page: http://panda.cern.ch:25980/server/pandamon/query?mode=ddm_req If you are going to copy your dataset to any private storage area (not known to grid) please use dq2-get. See the link for help: https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo

This must cover all the practical options…

Storage Monitoring, Notifications

Storage monitoring from ddm group: http://bourricot.cern.ch/dq2/accounting/site_reports/USASITES/

Drop-down menus provide other storage tables and plots, grouped by space tokens, clouds, etc.

Notifications are also sent with the list of space tokens that are running low on free space, and when any space token runs out of space (< 0.5 TB) and is blacklisted

Notification thresholds:

T1 DATADISK < 10 TB

T2 DATADISK < 2 TB

PRODDISK < 20%

USERDISK < 10%

Others < 10 TB
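These notification thresholds can be written down as one small check. This is a sketch under two assumptions that the slide does not spell out: the percentage thresholds are taken relative to the total size of the space token, and any DATADISK that is not at the T1 is treated as a T2. The function names are hypothetical.

```python
def is_low_on_space(token, tier, free_tb, total_tb):
    """Return True if the token is below its notification threshold."""
    if token == "DATADISK":
        # Absolute thresholds: 10 TB at the T1, 2 TB at T2s.
        return free_tb < (10.0 if tier == "T1" else 2.0)
    if token == "PRODDISK":
        return free_tb < 0.20 * total_tb
    if token == "USERDISK":
        return free_tb < 0.10 * total_tb
    return free_tb < 10.0  # all other space tokens


def is_out_of_space(free_tb):
    """Below 0.5 TB free the token counts as out of space and is blacklisted."""
    return free_tb < 0.5
```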

DaTRI

The Data Transfer Request Interface (DaTRI) is used to submit transfer requests, and also provides monitoring of the transfer status

A request can be placed through the web interface, or automatically as the output destination of an analysis job

All the links are available in the left bar of the Panda Monitor page, under the Datasets Distribution drop-down menu

Users need to be registered within DaTRI. The registration link is on the main page, along with a link to check the registration status. If you are not sure, also take the opportunity to check your certificate for the usatlas role

For a DaTRI request on the web interface, you basically fill in the dataset pattern, the destination and the justification for the transfer

DaTRI

A submitted DaTRI request goes through the following states/stages: PENDING -> AWAITING_APPROVAL -> AWAITING_SUBSCRIPTION -> SUBSCRIBED -> TRANSFER -> DONE

Once scheduled for approval, a request ID will be assigned

An error message is returned if the dataset pattern is not correct, the dataset is empty, the destination site does not have enough space, the group quota at the destination site is exceeded, etc.

Each cloud has DaTRI coordinators for manual approval. In the US: Kaushik De, Armen Vartapetian

Approval for GROUPDISKs is done by the group representatives

Approval is automatic if the total size is < 0.5 TB, and only if the user has the usatlas role (a very common issue/problem)

The monitoring also provides a link to the dashboard, as well as the replica status for each dataset

There is a plan to provide functionality within the DaTRI web interface to upload a list/pattern of user datasets for deletion, to help users get rid of obsolete data
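The approval rules on this slide can be summarized as a small decision function. This is a sketch only: the state names and the 0.5 TB limit are from the slide, while the function name, the returned labels, the way a GROUPDISK is recognized from the token name, and the precedence of the GROUPDISK rule over automatic approval are all assumptions, not DaTRI's actual logic.

```python
# Request states in the order given on the slide.
DATRI_STATES = [
    "PENDING",
    "AWAITING_APPROVAL",
    "AWAITING_SUBSCRIPTION",
    "SUBSCRIBED",
    "TRANSFER",
    "DONE",
]

AUTO_APPROVAL_LIMIT_TB = 0.5


def approval_route(total_size_tb, has_usatlas_role, destination):
    """Return who approves a request: a hypothetical label, not a DaTRI value."""
    is_group_disk = (destination.endswith("GROUPDISK")
                     and not destination.endswith("LOCALGROUPDISK"))
    if is_group_disk:
        # Assumption: GROUPDISK requests always go to the group representative.
        return "group representative"
    if has_usatlas_role and total_size_tb < AUTO_APPROVAL_LIMIT_TB:
        return "automatic"
    return "cloud coordinator"
```

With these assumptions, a 0.2 TB request to a LOCALGROUPDISK from a user without the usatlas role ends up with the cloud coordinators rather than being approved automatically, which matches the common problem mentioned above.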

LOCALGROUPDISK Policy

Intended as long-term storage for users

Unpledged resource (main concern T1/T2)

No ADC policy or recommendations for management

Central cleaning only for aborted and failed tasks

The main issue is the absence of a usage and cleanup policy. Because of that, it tends to grow in size

Usage tables for some of the US LOCALGROUPDISKs are in the backup slides

A common trend is that there are usually 2-3 super users per site who occupy more than half of the space (there may be a group behind such a user). A dozen top users occupy more than 90% of the space, and there are many more users with smaller shares

A similar storage distribution can be seen in other clouds as well

Part of that data may be more relevant to GROUPDISK or even DATADISK (move data to pledged resources).

LOCALGROUPDISK Policy

Some datasets have many replicas, and some of them are owned by the same top users. The situation will become unsustainable if the number of such top users grows over time

Some datasets have only one replica, and a big chunk of them has not been used for a while. Put in place a policy/path for their retirement

Popularity analysis may help to identify datasets which may be obsolete and are candidates for retirement

We may start with a soft space limit of 2-3 TB per user per site

Start to ask questions when the size is above that

Particularly for datasets not used for N months (1 year?): check if the user still needs them

Approval mechanism for sample transfers > N TB (10 TB?). Centralized approval and decision on space allocation for big samples.

LOCALGROUPDISK management policy is currently under discussion at RAC
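To make the proposal concrete, the three checks could be expressed as below. The values are the tentative ones floated on the slide (2 TB taken from the 2-3 TB range, 12 months for "1 year?", 10 TB for "10TB?"); nothing here is an agreed policy, and the function names and input dictionaries are hypothetical.

```python
SOFT_LIMIT_TB = 2.0        # slide suggests a soft limit of 2-3 TB per user per site
UNUSED_MONTHS = 12         # "N months (1 year?)"
APPROVAL_LIMIT_TB = 10.0   # "N TB (10TB?)"


def users_to_contact(usage_tb_by_user):
    """Users above the soft limit, i.e. those to start asking questions."""
    return sorted(user for user, tb in usage_tb_by_user.items() if tb > SOFT_LIMIT_TB)


def retirement_candidates(months_unused_by_dataset):
    """Datasets not used for at least UNUSED_MONTHS; owners should be asked first."""
    return sorted(name for name, months in months_unused_by_dataset.items()
                  if months >= UNUSED_MONTHS)


def needs_central_approval(sample_size_tb):
    """Large sample transfers would go through centralized approval."""
    return sample_size_tb > APPROVAL_LIMIT_TB
```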

BACKUP

BNL localgroupdisk, used space 196TB

User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=david adams 407137 | 53 | 695
/dc=org/dc=doegrids/ou=people/cn=anyes taffard 365111 | 37 | 767
/dc=org/dc=doegrids/ou=people/cn=andrew haas 477621 | 24 | 7959
/dc=org/dc=doegrids/ou=people/cn=caleb lampen 137475 | 21 | 26312
/dc=org/dc=doegrids/ou=people/cn=shuwei ye 481005 | 11 | 832
/c=ru/o=rdig/ou=users/ou=mephi.ru/cn=mikhail titov | 7 | 380
/c=uk/o=escience/ou=manchester/l=hep/cn=john almond | 5 | 777
/dc=org/dc=doegrids/ou=people/cn=jacob searcy 585618 | 5 | 428
/dc=org/dc=doegrids/ou=people/cn=vivek jain 39104 | 4 | 146
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 3 | 136
/dc=org/dc=doegrids/ou=people/cn=tarrade fabien 615936 | 2 | 120
/dc=org/dc=doegrids/ou=people/cn=stephanie majewski 989915 | 2 | 12
/dc=org/dc=doegrids/ou=people/cn=venkatesh kaushik 292404 | 2 | 37

Total for Top 13 Users (used space > 2TB), see the list above 176

Total for Remaining 35 Users (used space < 2TB) 20

Total Used Space 196
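As a quick check of how concentrated the usage is, the BNL numbers above can be turned into cumulative shares. The per-user values and the 196 TB total are copied from the table; the function and variable names are only illustrative.

```python
from itertools import accumulate

# Used space (TB) of the 13 top users in the BNL table, and the site total.
bnl_top_users_tb = [53, 37, 24, 21, 11, 7, 5, 5, 4, 3, 2, 2, 2]
BNL_TOTAL_TB = 196


def cumulative_shares(used_tb, total_tb):
    """Fraction of the total held by the top 1, top 2, ... users."""
    ordered = sorted(used_tb, reverse=True)
    return [running / total_tb for running in accumulate(ordered)]


shares = cumulative_shares(bnl_top_users_tb, BNL_TOTAL_TB)
print(f"top 3 users:  {shares[2]:.0%}")
print(f"top 13 users: {shares[12]:.0%}")
```

For BNL the three largest users hold 114 of 196 TB (about 58%), and the 13 listed users hold 176 of 196 TB (about 90%).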

SLAC localgroupdisk, used space 355TB

User DN | Used Space (TB) | # of Datasets
/dc=ch/dc=cern/ou=organic units/ou=users/cn=eifert | 122 | 5048
/dc=ch/dc=cern/ou=organic units/ou=users/cn=toshi | 68 | 352
/dc=org/dc=doegrids/ou=people/cn=anyes taffard 365111 | 44 | 1637
/dc=org/dc=doegrids/ou=people/cn=brokk toggerson 918086 | 21 | 600
/dc=org/dc=doegrids/ou=people/cn=andrew haas 477621 | 20 | 5067
/dc=org/dc=doegrids/ou=people/cn=steven andrew farrell 628960 | 17 | 1489
/dc=org/dc=doegrids/ou=people/cn=jason veatch 421088 | 15 | 362
/dc=org/dc=doegrids/ou=people/cn=michael werth 340844 | 9 | 165
/dc=org/dc=doegrids/ou=people/cn=bart clayton butler 62122 | 6 | 138
/dc=org/dc=doegrids/ou=people/cn=alaettin serhan mete 462708 | 5 | 77
/dc=org/dc=doegrids/ou=people/cn=david wilkins miller 359945 | 5 | 555
/dc=org/dc=doegrids/ou=people/cn=robert w. gardner jr. 669916 | 3 | 56
/dc=org/dc=doegrids/ou=people/cn=venkatesh kaushik 292404 | 3 | 32
/dc=org/dc=doegrids/ou=people/cn=maximilian swiatlowski 759645 | 3 | 966
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 2 | 42

Total for Top 15 Users (used space > 2TB), see the list above 343

Total for Remaining 19 Users (used space < 2TB) 12

Total Used Space 355

MWT2+ILLINOISHEP localgroupdisk, used space 302TB

User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=samuel meehan 301165 | 140 | 631
/dc=org/dc=doegrids/ou=people/cn=david lesny 786524 | 34 | 1358
/dc=org/dc=doegrids/ou=people/cn=frederick luehring 621522 | 26 | 1310
/dc=org/dc=doegrids/ou=people/cn=anton kapliy 714928 | 23 | 1387
/dc=org/dc=doegrids/ou=people/cn=jordan scott webster 343989 | 20 | 1012
/c=uk/o=escience/ou=oxford/l=oesc/cn=maria fiascaris | 16 | 1325
/dc=org/dc=doegrids/ou=people/cn=antonio boveia 203522 | 15 | 1076
/dc=org/dc=doegrids/ou=people/cn=constantinos melachrinos 366868 | 6 | 432
/c=ru/o=rdig/ou=users/ou=mephi.ru/cn=mikhail titov | 3 | 12
/dc=org/dc=doegrids/ou=people/cn=robert w. gardner jr. 669916 | 2 | 44
/dc=org/dc=doegrids/ou=people/cn=joseph tuggle 107765 | 2 | 155
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 2 | 43
/dc=org/dc=doegrids/ou=people/cn=elizabeth jue hines 745833 | 2 | 1

Total for Top 13 Users (used space > 2TB), see the list above 291

Total for Remaining 20 Users (used space < 2TB) 11

Total Used Space 302

AGLT2 localgroupdisk, used space 238TB

User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=haijun yang 938003 | 204 | 4739
/dc=org/dc=doegrids/ou=people/cn=shawn mckee 83467 | 10 | 496
/dc=ch/dc=cern/ou=organic units/ou=users/cn=lxu | 9 | 9
/c=il/o=iucc/ou=tau/cn=nir amram | 4 | 8
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 2 | 40

Total for Top 5 Users (used space > 2TB), see the list above 229

Total for Remaining 18 Users (used space < 2TB) 9

Total Used Space 238