Kolkata, Asia 2 2011 - Joint CHAIN/EU-IndiaGrid2/EPIKH School for Grid Site Administrators, 03.02.2011, www.epikh.eu
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
CE+WN Installation and configuration
Riccardo Rotondo
National Institute of Nuclear Physics
Asia 2 2011 - CHAIN/EU-IndiaGrid2/EPIKH School for Grid Site Administrators
Kolkata, 03.02.2011
Outline
• Computing Element overview
• Worker Node overview
• CE CREAM overview
• gLite stack overview
• gLite CE CREAM and site BDII
  – Installation on CE and WN (wiki)
  – Configuration on CE and WN (wiki)
gLite stack overview
gLite overview
[Figure: diagram of the gLite components, including the worker node]
gLite overview
• User Interface (UI): the access point through which users reach the gLite grid services
• WMS: the component that optimizes resource usage
• CE: the machine that manages the worker nodes
• WN: the machines that actually execute applications
• SE: the machines where files are stored
• LFC: used to locate files on the grid
• BDII: the service responsible for publishing all information about your site
• Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished
Computing Element Overview
• The Computing Element provides some of the main services of a site.
• Main functionalities:
  – job management (job submission, job control)
  – job status updates for the WMS
  – usually installed together with the site BDII service, which publishes all information about the Computing Element
• It can run several kinds of batch system:
  – Torque + MAUI
  – LSF
  – SGE
  – Condor
Torque + MAUI
• Torque server service:
  – pbs_server provides basic batch services such as receiving and creating batch jobs.
• Torque client service:
  – pbs_mom places jobs into execution. It is also responsible for returning a job's output to the user.
• MAUI system service:
  – job_scheduler holds the site's policy and decides which job is executed and when.
Site BDII*
• It used to be installed on the CE by default, but it is now better to install it on a dedicated server, physical or virtual.
• It collects information from all site GRISes** (for example SE, RB, LFC, etc.)
• The service is named bdii
• Log file: /opt/bdii/var/bdii.log
* BDII = Berkeley Database Information Index
** GRIS = Grid Resource Information Service
Worker Node Element Overview
• They are the machines that actually execute your jobs.
• Users can access their services only through a Computing Element.
• Their characteristics are collected by the Computing Element, which publishes all the information through the BDII service.
CE CREAM overview
• CREAM = Computing Resource Execution And Management
• It accepts job submission requests coming from a WMS, as well as other job management requests.
• It exposes a web service interface.
Requirements
• Three or more machines:
  – one will be used for the CE installation;
  – the others will be used for the WN installation.
• Architecture: 64 bit
• Operating System: Scientific Linux 5
• The CE machine needs a public IP address, forward and reverse DNS resolution, and an X.509 certificate.
CE CREAM and WN Installation & Configuration
(on Torque/PBS)
Wiki
• Follow the steps here for CE CREAM:
  – https://grid.ct.infn.it/twiki/bin/view/EPIKH/CECreamEpikh
• Follow the steps here for WN:
  – https://grid.ct.infn.it/twiki/bin/view/EPIKH/WNEpikh
A few words on benchmarks
• How to set CE_SI00, CE_SF00, CE_CAPABILITY, CE_OTHERDESCR?
• Look up the value for your hardware at these links:
  – http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
  – https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
  – https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results
• For example, if you have an Intel XEON 5520 2.23 GHz without Hyper-Threading, the table at the previous link gives a value of 95; with a conversion factor of 1 HS06 = 40, this gives 95 × 40 = 3800, so:
  – CE_SI00=3800
  – CE_SF00=3800
  – CE_CAPABILITY="CPUScalingReferenceSI00=3800"
  – CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
• Where (3800/40)/4 = 23.75
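The arithmetic of this example can be checked with a few lines of shell. This is only a sketch that reuses the figures from the slide (table value 95, factor 40, 4 cores). The lowercase variable names are illustrative and are not YAIM variables; only the CE_* names printed at the end come from the slide.

```shell
# Figures taken from the example on the slide; replace them with your own values.
hs06_table_value=95   # value read from the benchmark table
si00_per_hs06=40      # conversion factor used in the example
cores=4               # number of cores used in the example

# The example sets CE_SI00 and CE_SF00 to the same number.
ce_si00=$((hs06_table_value * si00_per_hs06))

# Per-core benchmark as computed on the slide: (CE_SI00 / factor) / cores.
per_core=$(awk -v si="$ce_si00" -v f="$si00_per_hs06" -v c="$cores" \
  'BEGIN { printf "%.2f", si / f / c }')

echo "CE_SI00=$ce_si00"
echo "CE_SF00=$ce_si00"
echo "CE_CAPABILITY=\"CPUScalingReferenceSI00=$ce_si00\""
echo "CE_OTHERDESCR=\"Cores=$cores,Benchmark=$per_core-HEP-SPEC06\""
```

The four printed lines can be compared with the values you put in your site configuration file.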
Adding a VO
# vim my-ig-site-info.def
VOS="euindia infngrid ops dteam"
QUEUES="cert grid"
CERT_GROUP_ENABLE="euindia ops dteam"
GRID_GROUP_ENABLE="infngrid"
Adding a VO/2
[Figure: example queues q1, q2 and q3]
Adding a VO/3
• Each queue has its own <QUEUE>_GROUP_ENABLE variable:
Q1_GROUP_ENABLE
Q2_GROUP_ENABLE
Q3_GROUP_ENABLE
Adding a VO/4
• Here are some settings to support the euindia VO:
# vim vo.d/euindia
SW_DIR=$VO_SW_DIR/euindia
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/euindia
VOMS_SERVERS="'vomss://voms.ct.infn.it:8443/voms/euindia?/euindia'"
VOMSES="'euindia voms.ct.infn.it 15004 /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it euindia'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA'"
• Then install the VO VOMS certificates with:
# wget http://grid018.ct.infn.it/mrepo/cometa_sl4-i386/RPMS.app/cometa-vomscert-1.0-3.noarch.rpm
# rpm -ivh cometa-vomscert-1.0-3.noarch.rpm
Adding a VO/5
• Now you have to provide a group and some users for the EUINDIA VO by modifying these two files:
  - ig-groups.conf
  - ig-users.conf
# vim ig-groups.conf
# Append the following lines to the end of the file
"/euindia/ROLE=SoftwareManager":::sgm:
"/euindia"::::
Adding a VO/6
# vim ig-users.conf
# Append these lines to the end of the file
39001:euindia001:3900:euindia:euindia::
39002:euindia002:3900:euindia:euindia::
39003:euindia003:3900:euindia:euindia::
39004:euindia004:3900:euindia:euindia::
39005:euindia005:3900:euindia:euindia::
39006:euindia006:3900:euindia:euindia::
39007:euindia007:3900:euindia:euindia::
39008:euindia008:3900:euindia:euindia::
39009:euindia009:3900:euindia:euindia::
39010:euindia010:3900:euindia:euindia::
39011:euindia011:3900:euindia:euindia::
39012:euindia012:3900:euindia:euindia::
39013:euindia013:3900:euindia:euindia::
39014:euindia014:3900:euindia:euindia::
39015:euindia015:3900:euindia:euindia::
39016:euindia016:3900:euindia:euindia::
39017:euindia017:3900:euindia:euindia::
39018:euindia018:3900:euindia:euindia::
39019:euindia019:3900:euindia:euindia::
39020:euindia020:3900:euindia:euindia::
39101:sgmeuindia001:3910,3900:sgmeuindia,euindia:euindia:sgm:
39102:sgmeuindia002:3910,3900:sgmeuindia,euindia:euindia:sgm:
39103:sgmeuindia003:3910,3900:sgmeuindia,euindia:euindia:sgm:
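Typing twenty nearly identical pool account lines by hand is error-prone, so they can also be generated. The following is a minimal sketch, assuming the field layout of the plain euindia lines in ig-users.conf. The function name gen_pool_accounts and its arguments are hypothetical helpers, not part of YAIM, and the sgm accounts (which carry two group IDs) are not covered.

```shell
# Print COUNT pool account lines in the format UID:LOGIN:GID:GROUP:VO::
# Arguments (all illustrative): VO name, base UID, GID, number of accounts.
gen_pool_accounts() {
  vo=$1
  base_uid=$2
  gid=$3
  count=$4
  i=1
  while [ "$i" -le "$count" ]; do
    # The login is the VO name followed by a zero-padded three-digit index.
    printf '%d:%s%03d:%d:%s:%s::\n' "$((base_uid + i))" "$vo" "$i" "$gid" "$vo" "$vo"
    i=$((i + 1))
  done
}

# Reproduces the twenty plain euindia lines of this example.
gen_pool_accounts euindia 39000 3900 20
```

Review the generated lines before appending them to ig-users.conf, and check that the UIDs do not clash with accounts that already exist on your machines.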
Testing installation
Tests on CE
• SSH into the CE to check that it can see the WNs and that all main services are up and running:
# pbsnodes
Your-ip-hostname
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     status = opsys=linux,uname=Linux grid-test-63.trigrid.it 2.6.18-164.6.1.el5 #1 [cut]
# /etc/init.d/gLite status
*** tomcat5:
/opt/glite/etc/init.d/tomcat5 is already running (1514)
*** glite-lb-locallogger:
glite-lb-logd running
glite-lb-interlogd running
# /etc/init.d/globus-gridftp status
globus-gridftp-server (pid 25452) is running...
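With many worker nodes it is easier to count the free ones than to read the whole listing. The sketch below is an assumption-laden helper, not a gLite or Torque tool: count_free_nodes is a hypothetical function that reads pbsnodes-style text on standard input and counts the lines of the form "state = free". The sample text is modelled on the output above, and wn2.localdomain is a made-up node.

```shell
# Count nodes reported as "state = free" in pbsnodes-style text read from stdin.
count_free_nodes() {
  grep -c '^[[:space:]]*state = free'
}

# Sample text modelled on the pbsnodes output above (wn2 is invented for the demo).
sample_output='wn1.localdomain
     state = free
     np = 2
wn2.localdomain
     state = down
     np = 2'

printf '%s\n' "$sample_output" | count_free_nodes
```

On a real CE the same function would be fed with the output of the pbsnodes command instead of the sample text.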
Tests on CE
• SSH into the CE and then become a VO user (euindia001 in this example):
# su - euindia001
• Create a file and add the following:
$ vi test.sh
#!/bin/sh
sleep 20   # gives you time to see the job status
hostname
• Set the right permissions to make it executable:
$ chmod 700 test.sh
Tests on CE
• Launch the job locally on the CE:
$ qsub -q euindia test.sh
• Then check the list of jobs running on the CE:
$ qstat -a
ce.localdomain:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
3.wn.localdo    gilda001 short    test.sh      5839  --  --     -- 00:15 R   --
• To abort a running job:
$ qdel 3   # 3 is the job ID
• To get more information about a job:
$ qstat -f 3
Tests on CE
• If "qstat -a" returns no output, no jobs are running on the CE. This means your previous job has terminated, so you can now list its output:
$ ls
test.sh.e3 test.sh.o3
$ cat test.sh.e3   # error file
$ cat test.sh.o3   # output file
wn.localdomain
JDL example
$ vim hostname-cream.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
OutputSandboxBaseDestUri = "gsiftp://localhost";
ShallowRetryCount = 3;
Working test
• SSH into the UI to check that the CE can receive and execute a simple job:
$ ssh [email protected]   # password: XXXXXXX
$ voms-proxy-init --voms euindia
[cut]
[rotondo@genius ~]$ glite-ce-delegate-proxy -e grid-test-33.trigrid.it riccardo
2010-06-29 02:36:21,683 WARN - No configuration file suitable for loading. Using built-in configuration
2010-06-29 02:36:26,389 NOTICE - Proxy with delegation id [riccardo] succesfully delegated to endpoint [https://grid-test-33.trigrid.it:8443//ce-cream/services/gridsite-delegation]
[rotondo@genius ~]$ glite-ce-job-submit -r grid-test-33.trigrid.it:8443/cream-pbs-cert -D riccardo hostname-cream.jdl
2010-06-29 02:39:06,444 WARN - No configuration file suitable for loading. Using built-in configuration
https://grid-test-33.trigrid.it:8443/CREAM501920532
$ glite-ce-job-status https://ceristXX.grid.arn.dz:8443/CREAM888739522
****** JobID=[https://ceristXX.grid.arn.dz:8443/CREAM888739522]
Status = [DONE-OK]
ExitCode = [0]
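When the test is repeated often, the status can be picked out of the command output in a script. This is a minimal sketch, assuming the "Status = [...]" layout shown above. extract_status is a hypothetical helper, not a gLite command, and the sample text is copied from the output on this slide.

```shell
# Print the value inside "Status = [...]" from job-status text read on stdin.
extract_status() {
  sed -n 's/.*Status *= *\[\([^]]*\)].*/\1/p'
}

# Sample text copied from the glite-ce-job-status output above.
sample='****** JobID=[https://ceristXX.grid.arn.dz:8443/CREAM888739522]
Status = [DONE-OK]
ExitCode = [0]'

status=$(printf '%s\n' "$sample" | extract_status)
if [ "$status" = "DONE-OK" ]; then
  echo "job finished successfully"
else
  echo "job status: $status"
fi
```

On the UI, the sample text would be replaced by the real output of glite-ce-job-status for your own job ID.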
Troubleshooting
• Which logs should you check if something goes wrong?
  – /var/log/messages, for general errors
  – /opt/glite/var/log (especially glite-ce-cream.log)
  – /var/spool/pbs/server_priv/accounting/<date>, if even local submission to the batch system does not work.
References
• INFNGRID generic installation guide:
– http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2
• YAIM configuration variables
– https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables
• CE Cream installation guide:
– GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
• YAIM system administrator guide:
– https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
• How To Check And Test Your CREAMCE
– http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE
Thank you for your kind attention!
Any questions?