Update on Plan for Tier-1 @ KISTI-GSDC Sang-Un Ahn, for the GSDC Tier-1 Team [email protected] GSDC Tier-1 Team 20/11/2012 WLCG MB


Page 1: Update on Plan for Tier-1 @ KISTI-GSDC

Update on Plan for Tier-1 @ KISTI-GSDC

Sang-Un Ahn, for the GSDC Tier-1 Team [email protected]

GSDC Tier-1 Team

20/11/2012 WLCG MB

Page 2: Update on Plan for Tier-1 @ KISTI-GSDC

Outline

• Current Status
– Resource Summary
– Operation status
– Network

• Update on Plan
– Tape: installation & test
– Network upgrade
– Pledges
– SAM & APEL
– Staff

• Conclusion

Page 3: Update on Plan for Tier-1 @ KISTI-GSDC

CURRENT STATUS

Resource Summary
Operation
Network

Page 4: Update on Plan for Tier-1 @ KISTI-GSDC

Resource Summary

• CPUs
– Intel Xeon, 24 cores w/ 96GB (3GB/core) x 62 WNs = 1488 cores (~17k HS06)

• Disks
– 1000 TB disks

• Middleware components
– CREAM-CE, site-BDII, VOBOX, XROOTD, WNs, APEL
– Production in gLite3.2 -> EMI migration in progress: EMI-2 on SL6

• Tape (being installed)
– 1 PB capacity, 275 TB buffer, 2 GB/s throughput

• Network
– Dedicated 1Gbps established

Figure: Storage Element Status @ MonALISA
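The core count and the HS06 figures quoted in this deck can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the "current" CPU figure (16,900 HS06) and the 2014 pledge (2500 cores, 31,250 HS06) from the pledge slide, and the per-core ratios it derives are not stated anywhere in the slides.

```python
# Figures quoted on the slides; the variable names are illustrative.
CORES_PER_WN = 24
WORKER_NODES = 62
CURRENT_HS06 = 16_900        # "Current" CPU figure from the pledge table
PLEDGED_2014_CORES = 2_500   # "2500 cores (~31k HS06)" by 2014
PLEDGED_2014_HS06 = 31_250

# Total cores available today: 24 cores per worker node x 62 worker nodes.
total_cores = CORES_PER_WN * WORKER_NODES

# Derived ratios (not quoted on the slides): HS06 delivered per core.
hs06_per_core_now = CURRENT_HS06 / total_cores
hs06_per_core_2014 = PLEDGED_2014_HS06 / PLEDGED_2014_CORES

print(f"total cores: {total_cores}")
print(f"HS06/core now: {hs06_per_core_now:.1f}")
print(f"HS06/core for the 2014 pledge: {hs06_per_core_2014:.1f}")
```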

Page 5: Update on Plan for Tier-1 @ KISTI-GSDC

Operation

• Job capacity of more than 1400 jobs since the end of July
– Internal network issue solved

• Quite stable now, but instability in PBS requires restarting it every 48/72 hrs
• Expected to improve after the EMI-2 migration

Figure: Active jobs (# of jobs), March – October 2012. Annotations: internal network issue solved; 1,400 jobs; stable status.

Figure: Monthly availability and reliability, Apr-12 to Oct-12

Month | Availability | Reliability
Apr-12 | 78.68% | 78.68%
May-12 | 86.47% | 86.47%
Jun-12 | 72.87% | 84.72%
Jul-12 | 93.68% | 93.68%
Aug-12 | 91.35% | 91.35%
Sep-12 | 93.86% | 93.86%
Oct-12 | 91.38% | 91.38%
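The monthly availability and reliability values from the chart (Apr-12 to Oct-12) can be checked against a "sustained for at least 2 months" style of criterion. The following is a minimal sketch: the 90% threshold is chosen for illustration because it echoes the milestone wording, and the helper name `longest_run_at_or_above` is hypothetical. It does not reproduce how WLCG actually evaluates its service targets.

```python
# Monthly values read off the chart, in percent, for Apr-12 .. Oct-12.
months = ["Apr-12", "May-12", "Jun-12", "Jul-12", "Aug-12", "Sep-12", "Oct-12"]
availability = [78.68, 86.47, 72.87, 93.68, 91.35, 93.86, 91.38]
reliability = [78.68, 86.47, 84.72, 93.68, 91.35, 93.86, 91.38]


def longest_run_at_or_above(values, threshold):
    """Return the length of the longest streak of consecutive values >= threshold."""
    best = run = 0
    for value in values:
        run = run + 1 if value >= threshold else 0
        best = max(best, run)
    return best


THRESHOLD = 90.0  # illustrative threshold, in percent

avail_run = longest_run_at_or_above(availability, THRESHOLD)
rel_run = longest_run_at_or_above(reliability, THRESHOLD)
mean_availability = sum(availability) / len(availability)

print(f"longest availability streak >= {THRESHOLD}%: {avail_run} months")
print(f"longest reliability streak >= {THRESHOLD}%: {rel_run} months")
print(f"mean availability: {mean_availability:.2f}%")
```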

Page 6: Update on Plan for Tier-1 @ KISTI-GSDC

Network Traffic Status

• GLORIAD-KR – CERN (dedicated 1Gbps)
– ‘Yearly’ graph (1 Day Average)
– Correlation with # of active jobs

Figure: ‘Yearly’ traffic graph. Legend: Incoming Traffic in bps; Outgoing Traffic in bps; Maximal 5 minutes Incoming Traffic; Maximal 5 minutes Outgoing Traffic. Annotations: Functional test; Dedicated 1Gbps established; Internal network issue solved; Start of torrent-based ALICE package distribution service: Observed large incoming traffic (maximal 5 minutes) after

Page 7: Update on Plan for Tier-1 @ KISTI-GSDC

UPDATE ON PLAN

TAPE: installation & test
Network upgrade
Pledges
SAM & APEL
Staff

Page 8: Update on Plan for Tier-1 @ KISTI-GSDC

TAPE

• IBM TAPE library (TS3500)

• TSM and GPFS are being installed on SL6.3; the functional test will be completed this week

• Disk buffer for cache (allows prompt access to archived data)
– ALICE request: min. 200 TB; max. 400 TB for pA
– 275 TB is now assigned for the buffer

• Start of data transfer test with ALICE is foreseen by 3 Dec. 2012

Capacity (expandable) | Tape Drive | Throughput | Robotics | Support
1 PB (3PB) | 8 drives: R/W @ 250MB/s | 2 GB/s | Dual | 24/7 recovery for 3 years
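The 2 GB/s throughput figure follows from the drive count and the per-drive rate. The sketch below redoes that arithmetic and estimates how long it would take to migrate a full 275 TB buffer to tape. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and that all 8 drives sustain their peak rate at the same time, which is an idealisation and not a measured result.

```python
# Figures from the tape slide.
DRIVES = 8
RATE_PER_DRIVE_MB_S = 250
BUFFER_TB = 275

# Aggregate throughput with all drives streaming in parallel (decimal units).
aggregate_gb_s = DRIVES * RATE_PER_DRIVE_MB_S / 1000

# Idealised time to write the whole disk buffer to tape at that rate.
buffer_gb = BUFFER_TB * 1000
drain_seconds = buffer_gb / aggregate_gb_s
drain_hours = drain_seconds / 3600

print(f"aggregate throughput: {aggregate_gb_s} GB/s")
print(f"time to drain a full buffer: {drain_hours:.1f} hours")
```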

Page 9: Update on Plan for Tier-1 @ KISTI-GSDC

Network to LHCOPN

• Currently, 1Gbps (dedicated) connection for CERN-KISTI
• 10Gbps connection is required to join LHC OPN (Optical Private Network)
• In April 2013, dedicated 2Gbps will be established (budget secured)
• Plan for upgrading to 3Gbps will be presented by May 2013

Figure: Current Network Set-up (link labels: Shared 10Gbps; Dedicated 1Gbps; Dedicated 1Gbps)
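To give a feel for what the link upgrades mean in practice, the sketch below estimates how long a bulk transfer would take at each of the bandwidths mentioned (1, 2, 3 and 10 Gbps). The 275 TB volume is borrowed from the tape buffer size purely as an illustrative payload. The calculation assumes the link is fully and exclusively used with no protocol overhead, so real transfers would take longer.

```python
# Illustrative payload: the size of the tape disk buffer, in decimal terabytes.
VOLUME_TB = 275
LINK_SPEEDS_GBPS = [1, 2, 3, 10]  # bandwidths mentioned on the slide

SECONDS_PER_DAY = 86_400


def transfer_days(volume_tb, link_gbps):
    """Idealised transfer time in days: full link utilisation, no overhead."""
    volume_bits = volume_tb * 1e12 * 8
    seconds = volume_bits / (link_gbps * 1e9)
    return seconds / SECONDS_PER_DAY


days = {speed: transfer_days(VOLUME_TB, speed) for speed in LINK_SPEEDS_GBPS}

for speed, value in days.items():
    print(f"{speed:>2} Gbps: {value:.1f} days")
```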

Page 10: Update on Plan for Tier-1 @ KISTI-GSDC

Pledged

• Pledged resources
– Based on WLCG & ALICE Collaboration MoU
– Provides 2500 cores (~31k HS06) & 2PB for TAPE by 2014
– Meeting ALICE requirement by 2013

Resource | Current (ALICE Req.) | Pledged 2012 | Pledged 2013 | Pledged 2014
CPUs (HS06) | 16,900 (25,000) | 18,800 | 25,000 | 31,250
Disk (TB) | 1,000 (1,000) | 1,000 | 1,000 | 1,000
Tape (TB) | - (1,500) | 700 | 1,500 | 2,000
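The statement that the ALICE requirement is met by 2013 can be read directly off the pledge table. A minimal sketch of that comparison follows. The dictionary keys and the helper `first_year_meeting` are hypothetical names introduced for illustration, and the numbers are copied from the table.

```python
# ALICE requirement (values in parentheses in the "Current" column).
requirement = {"cpu_hs06": 25_000, "disk_tb": 1_000, "tape_tb": 1_500}

# Pledged resources per year, copied from the table.
pledges = {
    2012: {"cpu_hs06": 18_800, "disk_tb": 1_000, "tape_tb": 700},
    2013: {"cpu_hs06": 25_000, "disk_tb": 1_000, "tape_tb": 1_500},
    2014: {"cpu_hs06": 31_250, "disk_tb": 1_000, "tape_tb": 2_000},
}


def first_year_meeting(resource):
    """Return the first pledge year that covers the requirement, or None."""
    for year in sorted(pledges):
        if pledges[year][resource] >= requirement[resource]:
            return year
    return None


first_year = {resource: first_year_meeting(resource) for resource in requirement}

# The requirement as a whole is met once every resource is covered.
all_met_by = max(first_year.values())

print(first_year)
print(f"all requirements met by: {all_met_by}")
```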

Page 11: Update on Plan for Tier-1 @ KISTI-GSDC

SAM & APEL

• Job queues available for OPS VO
– KISTI_GSDC is included in the monthly SAM report for OPS VO
– Not monitored by ALICE VO

• APEL server is running and properly publishing accounting data
– Recently found that APEL was not correctly configured
– It has been fixed and should be working now

Page 12: Update on Plan for Tier-1 @ KISTI-GSDC

Staff

ROLE | June (7 FTE) | Current (7 FTE)
System Admin: Disk & Server | 3 FTE | 2.5 FTE (2 FTE to be employed)
System Admin: Tape | 1 FTE (to be employed) | 1 FTE
Grid Middleware | 1.5 FTE | 1.5 FTE
Network | 0.5 FTE + KISTI support | 0.5 FTE + KISTI support
Power/Cooling | KISTI support | KISTI support
Physics Analysis ALICE service | 1 FTE (0.5 FTE to be employed) | 1.5 FTE

Good News
• 2 new members employed in June: 1 for TAPE; 1 for M/W and ALICE support

Bad News
• 1 member (Beob Kyun Kim) for M/W left in September and 1 member for S/A will leave soon

Page 13: Update on Plan for Tier-1 @ KISTI-GSDC

SUMMARY

Milestones
T1 Demonstration Plan Roadmap

Page 14: Update on Plan for Tier-1 @ KISTI-GSDC

Milestones

Objective | Target date (original) | Target date (revised)
Nominate KISTI/GSDC representatives in the WLCG Management Board and the GDB | Jun. 2012 | -
Establishment of a 1Gbps connectivity to CERN | Apr. 2012 | -
Installation of tape system | Nov. 2012 | Dec. 2012
High speed transfer of data from CERN to KISTI at the speed required to receive and archive 10% of the ALICE AA raw data foreseen for 2012 over a continuous period of 2 weeks | Dec. 2012 | Jan. 2013
Provide a precise plan for 3Gbps (or higher) connectivity to CERN | Jan. 2013 | May 2013
Present a plan for providing on-call services/support according to the T1 specifications as laid out in the WLCG MoU | May 2013 | -
85% of the job capacity running for at least 2 months | Jan. 2013 | Feb. 2013
90% Storage Element (DPM and/or XROOTD ?) availability (functional tests) for at least 2 months | Jan. 2013 | Jul. 2013
Running of the reliability tests (both OPS and ALICE-specific) and publishing those to the new SAM infrastructure | Jan. 2013 | -
Integration with the APEL accounting system and publishing accounting data | Jan. 2013 | -
90% of the WLCG T1 service targets for at least 2 months | Feb. 2013 | Sep. 2013
Integration in the WLCG OPN (with 2Gbps) | Jan. 2013 | Jul. 2013
Functional tests of the OPN (with 2Gbps) | Feb. 2013 | Aug. 2013

Legend: ■ Done ■ Almost done ■ To be done ■ In question ■ Urgent
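The size of each schedule slip can be worked out from the original and revised target dates. The sketch below does this for a subset of the milestones that carry a revised date. The shortened milestone labels and the helper names `parse_month` and `slip_in_months` are introduced here for illustration only.

```python
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_month(text):
    """Turn a slide-style date such as 'Nov. 2012' or 'May 2013' into (year, month)."""
    name, year = text.replace(".", "").split()
    return int(year), MONTHS[name]


def slip_in_months(original, revised):
    """Number of whole months between the original and the revised target."""
    (y1, m1), (y2, m2) = parse_month(original), parse_month(revised)
    return (y2 - y1) * 12 + (m2 - m1)


# (original, revised) target dates for a subset of the milestones.
milestones = {
    "Installation of tape system": ("Nov. 2012", "Dec. 2012"),
    "Plan for 3Gbps connectivity": ("Jan. 2013", "May 2013"),
    "85% of the job capacity": ("Jan. 2013", "Feb. 2013"),
    "90% Storage Element availability": ("Jan. 2013", "Jul. 2013"),
    "90% of the WLCG T1 service targets": ("Feb. 2013", "Sep. 2013"),
    "Integration in the WLCG OPN": ("Jan. 2013", "Jul. 2013"),
    "Functional tests of the OPN": ("Feb. 2013", "Aug. 2013"),
}

slips = {name: slip_in_months(o, r) for name, (o, r) in milestones.items()}

for name, months in slips.items():
    print(f"{name}: +{months} month(s)")
```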

Page 15: Update on Plan for Tier-1 @ KISTI-GSDC

T1 Roadmap

1. TAPE library set-up and data transfer test: to be ready for pA
2. Plan for 3Gbps to be provided by May 2013
3. Internal discussion on 24hr on-call service: monitoring, shift, and documentation …
4. >85% job capacity can be accomplished
5. Demonstration of Tier-1 target services (>90%) for at least 2 months

Page 16: Update on Plan for Tier-1 @ KISTI-GSDC


Thank you