Page 1: Site Report from KEK, Japan

JP-KEK-CRC-01 and JP-KEK-CRC-02
Go Iwai, KEK/CRC

Grid Operations Workshop – 2007
Kungliga Tekniska högskolan, Stockholm, Sweden
13-15 June 2007

Page 2: DEPLOYMENT STATUS AT KEK
JP-KEK-CRC-01 and JP-KEK-CRC-02

2007/6/13 Grid Operations Workshop at KTH, Stockholm

Page 3: Logical Site Overview

[Figure: logical site overview. Labels recoverable from the diagram: KEK External Network and KEK Internal Network; KEK Firewall; APAN (links to Taiwan and the Asia-Pacific region); SuperSINET (links to domestic institutes and the U.S.A.); Central Computing System (new KEK-CC) with HPSS; Grid LAN (scoped only for Grids); JP-KEK-CRC-01 and JP-KEK-CRC-02 (production systems); JP-KEK-CRC-00 (not for WLCG; staff training; will shift to PPS).]

Page 4: Physical Site Overview

[Figure: physical layout of the KEK-1 and KEK-2 sites.]

Page 5: Brief Summary of LCG Deployment

JP-KEK-CRC-01
• In operation since Nov. 2005.
• Registered with the GOC and ready for WLCG.
• Operated by KEK staff.
• Site role:
  – Practice ground for the production system JP-KEK-CRC-02.
  – Test use by university groups in Japan.
• Resources and components:
  – SL 3.0.5 with gLite 3.0 or later
  – CPU: 14, storage: ~1.5 TB
  – FTS, FTA, RB, MON, BDII, LFC, CE, SE
• Supported VOs:
  – belle, apdg, g4med, ppj, dteam, ops, calice, ilc and ail

JP-KEK-CRC-02
• In operation since early 2006.
• Registered with the GOC and ready for WLCG.
• Site role:
  – More stable services, based on the experience gained with KEK-1.
• Resources and components:
  – SL or SLC with gLite 3.0 or later
  – CPU: 48, storage: ~1 TB (excluding HPSS)
  – Full set of components
• Supported VOs:
  – belle, apdg, g4med, atlasj, ppj, ilc, calice, dteam, ops and ail

Page 6: Grid-Related Services

• We run our own Grid CA
  – Started in Feb. 2006 and recognized by LCG.
  – Accredited by the APGrid PMA.
  – http://gridca.kek.jp/
• VO Membership Service
  – Supported VOs:
    • apdg: the VO for the Asia-Pacific Data Grid.
    • belle: the VO for the Belle experiment.
    • atlasj: the VO for the ATLAS experiment in Japan.
    • g4med: the VO for Geant4 medical applications.
    • PPJ: the VO for particle physics in Japan.
    • ail: the VO for the Associated International Laboratory between Japan and France.
  – http://voms.kek.jp/
• Local Mirror Service
  – Mirrors SL, SLC, LCG and gLite.
  – An update with apt-get against the CERN or FNAL repositories takes ~30 minutes.
    • It takes ~3 minutes with the KEK repository.
  – http://hepdg.cc.kek.jp/mirror/
• Semi-automatic Installation Service
  – WNs can be installed semi-automatically using PXE (Preboot eXecution Environment) and a kickstart configuration file.
  – http://hepdg.cc.kek.jp/install/
• Site Portal
  – http://grid.kek.jp/
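The slides do not say which PXE boot loader the installation service uses. Assuming it is PXELINUX (from the Syslinux project), each worker node looks in `pxelinux.cfg/` for a per-host configuration file before falling back to `default`, and that per-host file is where a node-specific kickstart file can be selected. The sketch below only computes that search order; the function name and the MAC/IP values are hypothetical.

```python
from __future__ import annotations

import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """List the file names PXELINUX tries under pxelinux.cfg/, in order.

    1. "01-" + MAC address (ARP type 1 = Ethernet), lowercase, dash-separated.
    2. The IPv4 address as 8 uppercase hex digits, then the same string with
       one trailing digit removed at a time, down to a single digit.
    3. "default".
    The optional client-UUID lookup that newer PXELINUX versions try first
    is omitted here.
    """
    mac_part = "01-" + mac.lower().replace(":", "-")
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    # Progressively shorter prefixes: 8 hex digits down to 1.
    ip_parts = [hex_ip[:n] for n in range(len(hex_ip), 0, -1)]
    return [mac_part, *ip_parts, "default"]


if __name__ == "__main__":
    # Hypothetical worker node, not a real KEK host.
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.0.2.91"):
        print(name)
```

Placing a MAC-named file that passes the node's kickstart location to the installer (the Anaconda `ks=` boot option) is one common way to keep installation per-host yet unattended; whether KEK does exactly this is not stated in the slides.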


Page 7: People on Grid at KEK/CRC

• 7 persons in total
• CA
  – T. Sasaki and Y. Iida
• VOMS
  – Y. Watase and G. Iwai
• Site operation and security
  – KEK-0
    • G. Iwai
  – KEK-1
    • T. Sasaki, Y. Iida, Y. Watase and G. Iwai
  – KEK-2
    • T. Sasaki, Y. Watase and G. Iwai
• Deployment
  – Y. Watase, Y. Iida and G. Iwai
• Documentation
  – Y. Watase
• Networking
  – S. Suzuki, S. Yashiro and Y. Iida
• Applications (SRB, portal and some Gridified applications)
  – K. Murakami, Y. Iida and G. Iwai

Page 8: OPERATION STATISTICS

Page 9: Submitted GGUS Tickets in JFY2006 (Japanese fiscal year 2006)

• Total number of submitted tickets: 28
  – KEK-1: 11
  – KEK-2: 17

Page 10: Number of Submitted Jobs in JFY2006

[Figure: job-count charts for JP-KEK-CRC-01 and JP-KEK-CRC-02.]

Page 11: Normalized CPU Time in JFY2006 (kSI2K*hrs)

[Figure: normalized CPU time charts for JP-KEK-CRC-01 and JP-KEK-CRC-02.]
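The CPU statistics above are expressed in kSI2K*hrs. Assuming the usual WLCG accounting convention, a job's raw CPU hours are scaled by the SPECint2000 (SI2K) rating of the node that ran it, divided by 1000. The sketch below only illustrates that arithmetic; the `JobRecord` type, the function name and the sample ratings are hypothetical and are not KEK data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobRecord:
    cpu_seconds: float  # raw CPU time consumed by the job
    si2k_rating: float  # SPECint2000 rating of the worker node (hypothetical)


def normalized_cpu_ksi2k_hours(jobs: list[JobRecord]) -> float:
    """Sum CPU time normalized to kSI2K*hours.

    Each job contributes (cpu_seconds / 3600) * (si2k_rating / 1000).
    """
    return sum((j.cpu_seconds / 3600.0) * (j.si2k_rating / 1000.0) for j in jobs)


if __name__ == "__main__":
    # Hypothetical jobs: 2 h on a 1500-SI2K node and 1 h on a 1000-SI2K node.
    jobs = [JobRecord(7200, 1500), JobRecord(3600, 1000)]
    print(normalized_cpu_ksi2k_hours(jobs))  # 4.0
```

Normalizing this way lets hours on CPUs of different speed be added together, which is why the per-site charts can be compared directly.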

Page 12: VIRTUAL ORGANIZATION
Belle Experiment and Accelerator Science

Page 13: VO for the Belle Experiment

• The Belle VO is federated among 4 countries, 6 institutes and 9 sites.
  – Japan: Nagoya University and KEK
  – Taiwan: ASGC and NCU
  – Australia: University of Melbourne
  – Poland: CYFRONET
  – Korea University will join soon.
• Started using SRB and LCG
• Data distribution service using SRB-DSI
  – Belle already has a few PB of data in total, including hundreds of TB of DST and MC.
• Bulk file registration with Sregister helps us:
  – We do not move any of the data; physically exporting the existing data to LCG would be far too difficult.
  – It benefits both native SRB users and LCG users.
• SRB-DSI with LCG is in operation now.

[Figure: map of Belle VO sites: KEK and Nagoya Univ. (Japan), ASGC and NCU (Taiwan), Melbourne Univ. (Australia), CYFRONET (Poland).]

Page 14

Page 15: VO for the Accelerator Science

• Domestic support
  – Typical case at a laboratory: a few staff members, ~10 students and no technicians.
• Started monitoring these sites centrally over the VO
  – Focused on supporting their operations
  – Not only for WLCG sites but also for non-WLCG sites
  – The PPJ VO was started for accelerator science in Japan.
  – Federated among a few universities:
    • Tohoku Univ., Tsukuba Univ., Kobe Univ., Hiroshima Univ., Nagoya Univ. and KEK.
  – Usage:
    • To share resources and experience among the major groups (ILC, KamLAND, CDF and ATLAS) independently of individual experimental projects.

[Figure label: Hiroshima IT]

Page 16: Conclusion

• Tools used in daily grid operations
  – Semi-automatic installation tools, for WNs only
    • Most tools are handmade scripts.
  – Monitoring tools such as SAM and GSTAT are very useful.
  – GGUS Search and APWIKI are useful as well.
  – We are testing auditing with nCircle, a vulnerability management system.
• Scheduled interventions
  – 11 times in JFY2006
  – Causes:
    • Software/hardware upgrades and site reconfiguration
    • Annual maintenance
    • Replacement of host certificates
• Unscheduled interventions
  – ~10 times/year
  – E.g. a failed site reconfiguration, or power cuts caused by thunderstorms.
• Domestic support in Japan
  – An important mission for KEK.
• ~90% of problems are detected by the COD, SAM, GSTAT and Nagios.
  – Our Grid operations are supported by the great efforts of the APROC members at ASGC, Taiwan.
  – We would like to keep up the close collaboration with ASGC.
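Host-certificate replacement appears among the scheduled interventions, and Nagios is among the tools that catch most problems. As a hypothetical example of the kind of handmade script mentioned above (not KEK's actual tooling), the function below turns a certificate's `notAfter` date into a Nagios-style status using the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL). The function name and the 30-day and 7-day thresholds are assumptions.

```python
from __future__ import annotations

import ssl
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_cert_expiry(not_after: str, now: float | None = None,
                      warn_days: float = 30, crit_days: float = 7) -> tuple[int, str]:
    """Map a notAfter string such as 'Jun 13 12:00:00 2008 GMT' to a
    Nagios (status, message) pair. Thresholds are hypothetical defaults."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the Epoch (UTC)
    now = time.time() if now is None else now
    days_left = (expires - now) / 86400.0
    if days_left < 0:
        return CRITICAL, f"CRITICAL - host certificate expired {-days_left:.1f} days ago"
    if days_left < crit_days:
        return CRITICAL, f"CRITICAL - host certificate expires in {days_left:.1f} days"
    if days_left < warn_days:
        return WARNING, f"WARNING - host certificate expires in {days_left:.1f} days"
    return OK, f"OK - host certificate valid for {days_left:.1f} more days"
```

A plugin wrapper would print the message and exit with the status code. The date could come from `openssl x509 -enddate -noout -in /etc/grid-security/hostcert.pem` (the conventional gLite host-certificate location), after stripping the leading `notAfter=`.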


Page 17: END
Thank you

Page 18: LCG with SRB at Belle VO

[Figure: network diagram. Labels recoverable from the diagram: APAN and SuperSINET; KEK Firewall; KEK-DMZ; B-NET; KEK-FB; KEK-CC; Grid LAN; KEK-1 (130.87.208.0/22); KEK-2 (202.13.197.0/24); newly built segment (130.87.224.0/21); SRB/MCAT (172.22.28.0/24); GridFTP (130.87.104.0/22); HSM, NFS, SRB and GridFTP, with SRB-DSI as a pluggable extension.]
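The diagram places the Grid hosts on several distinct address blocks. When reading firewall or GridFTP transfer logs, a small lookup like the one below can tell which segment a peer address belongs to. It is a sketch: the segment names come from the diagram labels, pairing 130.87.224.0/21 with the newly built segment is inferred from label placement, and `SEGMENTS`/`segment_of` are hypothetical names.

```python
from __future__ import annotations

import ipaddress

# Segments as labelled in the diagram (pairing of 130.87.224.0/21 inferred).
SEGMENTS = {
    "KEK-1": ipaddress.ip_network("130.87.208.0/22"),
    "KEK-2": ipaddress.ip_network("202.13.197.0/24"),
    "New built": ipaddress.ip_network("130.87.224.0/21"),
    "SRB/MCAT": ipaddress.ip_network("172.22.28.0/24"),
    "GridFTP": ipaddress.ip_network("130.87.104.0/22"),
}


def segment_of(address: str) -> str | None:
    """Return the name of the segment containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in SEGMENTS.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address standing in for an external peer.
    for addr in ("130.87.210.5", "202.13.197.40", "192.0.2.1"):
        print(addr, "->", segment_of(addr))
```

The blocks do not overlap, so a linear scan returns at most one match; for many segments a sorted prefix table would scale better.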

Page 19: Points to Cover in Each Presentation

• Tools used in daily grid operations
• What features are missing to make your work easier
• Examples of the most frequent scheduled interventions at your site
• Examples of the most frequent unscheduled interventions at your site
• Points to improve in communication with the ROC, other sites, VOs, the rest of the world...
• How do you plan the deployment of updates/new versions so that continuous production is not interrupted?
• Communication with users: how are you informed about operational problems at your site reported by local/remote users? Mail/GGUS/phone/other?
• Correlation of cross-site issues: is the operations meeting enough for this? How do you do it otherwise?
• What percentage of real site problems are detected and reported by the COD before you know about them?
• Usefulness of the following operations bodies/meetings, and suggestions to improve them:
  – COD
  – your ROC support team
  – operations meeting