
Page 1: DCache Deployment at Tier1A

UK HEP Sysman, April 2005

Derek Ross

E-Science Department

Page 2: DCache at RAL 1

• Mid 2003
  – We deployed a non-grid version for CMS.
  – It was never used in production.
• End of 2003 / start of 2004
  – RAL offered to package a production-quality dCache.
  – Stalled due to bugs; went back to the dCache developers and the LCG developers.

Page 3: DCache at RAL 2

• Mid 2004
  – Small deployment for EGEE JRA1
    • Intended for gLite I/O testing.
• End of 2004
  – CMS instance
    • 3 disk servers, ~10 TB disk space.
    • Disk served via NFS to pool nodes.
    • Each pool node running a GridFTP door.
    • In the LCG information system.

Page 4: CMS Instance

Page 5: DCache at RAL 4

• Start of 2005
  – New production instance supporting the CMS, DTeam, LHCb and Atlas VOs.
    • 22 TB disk space.
    • CMS instance decommissioned and reused.
    • Separate gdbm file for each VO.
    • Uses directory-pool affinity to map areas of the file system to each VO's assigned disk (see the sketch after this list).
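
To make the directory-pool affinity idea concrete, here is a minimal sketch in Python. It is purely illustrative, not dCache's PoolManager configuration; the path prefixes and pool names are hypothetical. The point is simply that writes under a VO's area of the pnfs namespace are only eligible to land on that VO's pools.

    # Illustrative only: map pnfs path prefixes to the pools bought for each VO.
    # This is not dCache's real configuration mechanism; names are hypothetical.
    AFFINITY = {
        "/pnfs/gridpp.rl.ac.uk/data/cms":   ["cms_pool1", "cms_pool2"],
        "/pnfs/gridpp.rl.ac.uk/data/atlas": ["atlas_pool1"],
        "/pnfs/gridpp.rl.ac.uk/data/lhcb":  ["lhcb_pool1"],
        "/pnfs/gridpp.rl.ac.uk/data/dteam": ["dteam_pool1"],
    }

    def pools_for(path):
        """Return the pools allowed to hold a file written under `path`."""
        for prefix, pools in AFFINITY.items():
            if path.startswith(prefix + "/"):
                return pools
        raise ValueError("no pools assigned for %s" % path)

    print(pools_for("/pnfs/gridpp.rl.ac.uk/data/cms/2005/file1.root"))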

Page 6: (untitled slide; no transcribed content)

Page 7: DCache at RAL 5

• Early 2005
  – Service Challenge 2
    • 4 disk servers, ~12 TB disk space.
    • UKLight connection to CERN.
    • Pools directly on disk servers.
    • Standalone GridFTP and SRM doors.
    • SRM not used in the challenge due to software problems at CERN.
    • Interfaced to the Atlas Data Store.

Page 8: SC2 instance

[Diagram: SC2 instance layout. Four 3 TB disk servers hosting 8 dCache pools, diskless GridFTP doors, a standalone SRM door, and the dCache head node with its database (D/B), interconnected by a Nortel 5510 stack (80 Gbps) and a Summit 7i, with a 2x1 Gbps UKLight link to CERN and a 2x1 Gbps link to SJ4.]

Page 9: SC2 results

• Achieved 75 MB/s to disk, 50 MB/s to tape.
  – Seen faster: 3000 Mb/s to disk over the LAN.
  – Network delivered at the last minute and under-provisioned.
• Odd iperf results; high UDP packet loss.

Page 10: Future Developments

• Interface the ADS to the production dCache
  – Considering a second SRM door.
  – Implement a script to propagate deletes from dCache to the ADS.
• Service Challenge 3
  – Still planning.
  – Use the production dCache.
    • Experiments may want to retain data.
  – Avoid multi-homing if possible.
    • Connect UKLight into the site network.

Page 11: Production Setup

[Diagram: current and proposed production setups; the proposed setup is in testing, DTeam only for now.]

Page 12: VO Support

• Bit of a hack: dCache has no concept of VOs.
  – The grid-mapfile is periodically run through a Perl script to produce a mapping of DN to Unix UID/GID (see the sketch below).
    • Each VO member is mapped to the first pool account of the VO; all of the VO's files are owned by that account.
  – VOMS support coming…
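
A minimal sketch of what that mapping script does, written here in Python rather than Perl and not the actual RAL script: read the grid-mapfile, map each VO to its first pool account, and write out DN to UID/GID pairs. The file locations, the grid-mapfile line format, and the pool-account naming convention (e.g. cms001) are assumptions.

    # Sketch only (Python stand-in for the Perl script described above).
    # Assumes grid-mapfile lines like:  "/C=UK/.../CN=Some User" .cms
    # and pool accounts named <vo>001, <vo>002, ...
    import pwd
    import re

    GRIDMAP = "/etc/grid-security/grid-mapfile"   # assumed location
    OUTPUT  = "/opt/d-cache/etc/dn-to-uid.map"    # hypothetical output file

    def first_pool_account(vo):
        """Map a VO name to its first pool account, e.g. 'cms' -> 'cms001'."""
        return "%s001" % vo

    def main():
        entries = []
        with open(GRIDMAP) as f:
            for line in f:
                m = re.match(r'"(?P<dn>[^"]+)"\s+\.?(?P<vo>\S+)', line)
                if not m:
                    continue
                account = first_pool_account(m.group("vo"))
                try:
                    p = pwd.getpwnam(account)
                except KeyError:
                    continue   # no such pool account on this host
                entries.append((m.group("dn"), p.pw_uid, p.pw_gid))
        with open(OUTPUT, "w") as out:
            for dn, uid, gid in entries:
                out.write('"%s" %d %d\n' % (dn, uid, gid))

    if __name__ == "__main__":
        main()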

Page 13: Postgres

• The Postgres SRM database is a CPU hog.
  – Being worked on.
  – Current recommendation is a separate host for PostgreSQL.
• Can use the database to store dCache transfer information for monitoring (see the sketch below).
• In future it may be possible to use it for the pnfs databases.
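
As an example of the sort of monitoring this enables, a sketch that totals the last day's transfers is shown below. It assumes the transfer records land in a billing-style table; the host, database, table and column names (billinginfo, datestamp, transfersize) are assumptions and will vary between dCache versions and sites.

    # Sketch: summarise the last 24 hours of dCache transfers from PostgreSQL.
    # Table/column names (billinginfo, datestamp, transfersize) are assumptions.
    import psycopg2

    conn = psycopg2.connect(host="dcache-db.example.org",   # hypothetical DB host
                            dbname="billing", user="monitor")
    cur = conn.cursor()
    cur.execute("""
        SELECT count(*), coalesce(sum(transfersize), 0)
        FROM billinginfo
        WHERE datestamp > now() - interval '1 day'
    """)
    n_transfers, total_bytes = cur.fetchone()
    print("transfers in last 24h: %d, volume: %.1f GB"
          % (n_transfers, float(total_bytes) / 1e9))
    conn.close()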

Page 14: SRM requests

• Each SRM request lasts for 24 hours (the default) if it is not finished properly.
  – Too many and the SRM door queues new requests until a slot becomes available.
  – Educate users: call lcg-sd after an lcg-gt, and don't Ctrl-C lcg-rep… (see the sketch below).
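
One way to make the pairing hard to forget is to wrap it so that lcg-sd always runs, even if the job is interrupted. The sketch below is illustrative only: the output layout assumed for lcg-gt (TURL, request id, file id) and the argument order passed to lcg-sd are assumptions to be checked against the lcg-utils documentation.

    # Sketch: always release an SRM "get" request, even on interruption.
    # The lcg-gt output layout and lcg-sd argument order are assumptions.
    import subprocess
    from contextlib import contextmanager

    @contextmanager
    def pinned_turl(surl, protocol="gsiftp"):
        out = subprocess.run(["lcg-gt", surl, protocol], check=True,
                             capture_output=True, text=True).stdout.split()
        turl, request_id, file_id = out[0], out[1], out[2]  # assumed layout
        try:
            yield turl
        finally:
            # Tell the SRM we are done so the request slot is freed now,
            # rather than lingering for the 24-hour default lifetime.
            subprocess.run(["lcg-sd", surl, request_id, file_id], check=True)

    # Usage (hypothetical SURL):
    # with pinned_turl("srm://dcache.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/cms/file1") as turl:
    #     pass  # transfer the file using the TURL here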

Page 15: SRM-SRM copies

• Pull mode
  – If dCache is the destination, the destination pool initiates the GridFTP transfer from the source SRM.
    • Need the dcache-opt RPM installed on the pools (a GridFTP door does not need to be running).
    • Pool nodes need a host certificate and their GLOBUS_TCP_PORT_RANGE accessible to incoming connections.
  – lcg-utils don't do this, but srmcp does (see the sketch below).
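
For illustration, driving such an SRM-to-SRM copy with srmcp might look like the sketch below. The SURLs are placeholders, and the GLOBUS_TCP_PORT_RANGE value and its "min,max" format are assumptions about the local Globus setup; the slide's point is that the pool nodes themselves must also have their port range open to incoming connections.

    # Sketch: SRM-to-SRM copy with srmcp.  SURLs and port range are placeholders.
    import os
    import subprocess

    env = dict(os.environ, GLOBUS_TCP_PORT_RANGE="20000,25000")  # assumed format
    subprocess.run(
        ["srmcp",
         "srm://source.example.org/pnfs/example.org/data/cms/file1",
         "srm://dcache.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/cms/file1"],
        env=env, check=True)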

Page 16: Quotas

• If two VOs can access the same pool, there is no way to stop one VO grabbing all of the pool.
• No global quotas.
  – Hard to do, as pools can come and go.
• The only way to restrict disk usage is to limit the pools a VO can write to.
  – But then you can't get the space available per VO.

Page 17: Links

• http://ganglia.gridpp.rl.ac.uk/?c=DCache

• http://ganglia.gridpp.rl.ac.uk/?c=SC