South African Grid Training COMPUTING ELEMENT Albert van Eck UFS - ICTS 18 November 2009 Slides by: GIUSEPPE PLATANIA

18 Nov 2009, Cape Town, South African Grid Training

OUTLINE

• OVERVIEW

• INSTALLATION & CONFIGURATION

• TESTING

• FIREWALL SETUP

• TROUBLESHOOTING

OVERVIEW

• The Computing Element (CE) is the central service of a site.
• Its main functions are:
  – managing jobs (job submission, job control)
  – reporting job status to the WMS
  – publishing all site information (site location, queues, CPU status, and so on) via LDAP (site BDII service)
• It can run on several kinds of batch systems:
  – Torque + MAUI
  – LSF
  – SGE
  – Condor

TORQUE + MAUI

• The Torque server is composed of:
  – pbs_server, which provides the basic batch services such as receiving and creating a batch job.
• The Torque client is composed of:
  – pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
• The MAUI system is composed of:
  – job_scheduler, which contains the site's policy for deciding which job must be executed and when.

Site BDII**

– By default it is installed on the CE
– It collects all site GRISes* (for example SE, RB, LFC, etc.)
– The name of the service is bdii
– Log file: /opt/bdii/var/bdii.log

* GRIS = Grid Resource Information Service
** BDII = Berkeley Database Information Index

Computing Element installation & configuration using YAIM

WHAT KIND OF CE?

There are several metapackages to choose from:

• ig_CE – LCG Computing Element without batch system packages.
• ig_CE_LSF – LCG Computing Element with LSF.
  – IMPORTANT: provided for consistency; it does not install LSF, but it applies some fixes via ig_configure_node.
• ig_CE_torque – LCG Computing Element with Torque+MAUI.

HOW TO GET A HOST CERTIFICATE

• Host certificate for the CE.
  – Please request it from your RA.

• For this tutorial:
  HOST=$(hostname -f)
  mkdir /etc/grid-security
  cp /root/$HOST/${HOST}-cert.pem /etc/grid-security/hostcert.pem
  cp /root/$HOST/${HOST}-key.pem /etc/grid-security/hostkey.pem

• Install the host certificate and key (hostcert.pem and hostkey.pem) in /etc/grid-security:
  – mkdir /etc/grid-security
  – cd /etc/grid-security
  – chmod 644 hostcert.pem
  – chmod 400 hostkey.pem
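The permissions set above can be verified with a small check. This is only a sketch: the function name check_hostcert_perms is hypothetical, and it assumes GNU stat (for the -c %a option) is available on the CE.

```shell
#!/bin/sh
# Hypothetical helper: report whether hostcert.pem and hostkey.pem in a
# directory carry the modes given on the slide (644 and 400).
check_hostcert_perms() {
    dir=${1:-/etc/grid-security}
    status=0
    for pair in hostcert.pem:644 hostkey.pem:400; do
        file=${pair%%:*}    # part before the colon: file name
        want=${pair##*:}    # part after the colon: expected octal mode
        if [ ! -f "$dir/$file" ]; then
            echo "MISSING $dir/$file"
            status=1
            continue
        fi
        have=$(stat -c %a "$dir/$file")   # GNU stat: prints the octal mode
        if [ "$have" = "$want" ]; then
            echo "OK $file $have"
        else
            echo "WRONG $file has $have, expected $want"
            status=1
        fi
    done
    return $status
}
```

Calling check_hostcert_perms with no argument inspects /etc/grid-security; a non-zero return status means at least one file is missing or has a different mode.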

Repository settings

• REPOS="ca dag glite-lcg_ce ig jpackage gilda"

Download and save the repo files:

• for name in $REPOS; do wget http://grid018.ct.infn.it/mrepo/repos/$name.repo -O /etc/yum.repos.d/$name.repo; done
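After the download loop it can be worth confirming that every repo file actually arrived, since a failed wget -O may leave an empty file behind. The sketch below assumes nothing beyond a POSIX shell; the function name missing_repos is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the repo names whose .repo file is missing
# or empty in the given directory.
missing_repos() {
    repo_dir=$1
    shift
    for name in "$@"; do
        # -s is true only for files that exist and are non-empty
        [ -s "$repo_dir/$name.repo" ] || echo "$name"
    done
}

# Example call with the list from the slide:
#   missing_repos /etc/yum.repos.d ca dag glite-lcg_ce ig jpackage gilda
```

An empty result means all listed repo files are present and non-empty.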

INSTALLATION

• yum remove jdk
• yum install xml-commons-resolver12
• yum install jdk java-1.6.0-sun-compat
• yum install maui-3.2.6p19_20.snap.1182974819-5.slc4 \
    maui-server-3.2.6p19_20.snap.1182974819-5.slc4
• yum install ig_CE_torque
• yum install lcg-CA

Gilda rpms:
• yum install gilda_utils

If it's also the site BDII collector:
• yum install ig_BDII

Customize ig-site-info.def

• Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:

  cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/<your_site-info.def>

• Open /opt/glite/yaim/etc/gilda/<your_site-info.def> in a text editor and set the following values according to your grid environment:

  CE_HOST=<write the CE hostname you are installing>
  TORQUE_SERVER=$CE_HOST

Customize ig-site-info.def

JOB_MANAGER=lcgpbs
BATCH_BIN_DIR=/usr/bin
BATCH_VERSION=torque-2.1.9-4
CE_BATCH_SYS=pbs
CE_CPU_MODEL=Opteron
CE_CPU_VENDOR=AMD
CE_CPU_SPEED=3000
CE_OS="ScientificSL"
CE_OS_RELEASE=4.8
CE_OS_VERSION="SL"
CE_MINPHYSMEM=2048
CE_MINVIRTMEM=4096
CE_SMPSIZE=2
CE_SI00=1000
CE_SF00=1200
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=TRUE

Customize ig-site-info.def

GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
JAVA_LOCATION="/usr/java/latest"

SITE_EMAIL="grid-prod@<your_domain>"
SITE_NAME=GILDA-54..58 #Your Number (eg. GILDA-60)
SITE_LOC="Cape Town, SOUTH AFRICA"
SITE_LAT=37.5
SITE_LONG=15.152
SITE_WEB="https://gilda.ct.infn.it"
SITE_SUPPORT_SITE="grid-prod@<your_domain>"

REMOVE the following, if it exists:
SITE_TIER="xxxxxxxx"

Customize ig-site-info.def

QUEUES="short long infinite gilda"

SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS

If you configure a queue for a single VO:

QUEUES="short long infinite gilda"

SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"
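The settings above follow a naming pattern: each queue in QUEUES has a matching <QUEUE>_GROUP_ENABLE variable with the queue name in upper case. The sketch below checks that pattern. The function names are hypothetical, and it assumes queue names made only of letters, digits and underscores, as in the example.

```shell
#!/bin/sh
# Hypothetical helper: derive the variable name for a queue,
# e.g. "short" -> "SHORT_GROUP_ENABLE".
queue_enable_var() {
    printf '%s_GROUP_ENABLE\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Hypothetical helper: for every queue in the given list, report whether
# the matching *_GROUP_ENABLE variable is set in the current shell.
check_queue_vars() {
    for q in $1; do
        var=$(queue_enable_var "$q")
        eval "val=\${$var-}"    # indirect lookup of the variable's value
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is not set"
        fi
    done
}
```

After sourcing the site-info file in a shell, check_queue_vars "$QUEUES" lists any queue that still lacks its variable.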

Customize ig-site-info.def

DPM_HOST="aliserv6.ct.infn.it"
SE_LIST="$DPM_HOST"
VOS="gilda <others>" #If you have more than one: "gilda my_other_vo"
ALL_VOMS="gilda"
WMS_HOST="egee-wms-01.cnaf.infn.it"
SE_MOUNT_INFO_LIST="none"
CE_OTHERDESCR="Cores=8,Benchmark=$CE_SI00-HEP-SPEC06"
CE_RUNTIMEENV="LCG-2 LCG-2_1_0 LCG-2_1_1 LCG-2_2_0 GLITE-3_0_0 GLITE-3_1_0 R-GMA"
CE_CAPABILITY="CPUScalingReferenceSI00=$CE_SI00"
BATCH_SERVER=$CE_HOST
BDII_HOST=gilda-bdii.ct.infn.it
SITE_BDII_HOST=$CE_HOST
BDII_REGIONS="CE SE"
BDII_CE_URL="ldap://$CE_HOST:2170/mds-vo-name=resource,o=grid"
BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"

Customize ig-site-info.def

VO_GILDA_SW_DIR=$VO_SW_DIR/gilda
VO_GILDA_DEFAULT_SE=$CLASSIC_HOST
VO_GILDA_STORAGE_DIR=$CLASSIC_STORAGE_DIR/gilda
VO_GILDA_QUEUES="gilda"
VO_GILDA_VOMS_SERVERS="vomss://voms.ct.infn.it:8443/voms/gilda?/gilda"
VO_GILDA_VOMSES="'gilda voms.ct.infn.it 15001 /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it gilda'"
VO_GILDA_VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

Customize ig-site-info.def

WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf

The file specified in WN_LIST has to list the full hostname of every WN.

WARNING: It is important to fill in the WN file (/opt/glite/yaim/etc/gilda/wn-list.conf) before you run the yaim configure command.
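A quick sanity check of the WN list before running yaim can catch short hostnames early. The sketch below assumes one hostname per line. The function name check_wn_list is hypothetical, and "contains a dot and only hostname characters" is used here as a rough stand-in for "fully qualified".

```shell
#!/bin/sh
# Hypothetical helper: flag entries in a WN list file that do not look
# like fully qualified hostnames.
check_wn_list() {
    file=$1
    bad=0
    [ -s "$file" ] || { echo "empty or missing: $file"; return 1; }
    # The "|| [ -n ... ]" part also handles a last line without newline.
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue          # skip blank lines
        case $host in
            *[!A-Za-z0-9.-]*|.*|*.|*..*)
                echo "suspicious: $host"; bad=1 ;;
            *.*)
                ;;                          # has a dot: looks qualified
            *)
                echo "not fully qualified: $host"; bad=1 ;;
        esac
    done < "$file"
    return $bad
}
```

For example, check_wn_list /opt/glite/yaim/etc/gilda/wn-list.conf prints nothing and returns 0 when every entry looks qualified.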

Customize ig-site-info.def

• Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:

  cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
  cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/

• Append the gilda users and groups definitions to /opt/glite/yaim/etc/gilda/ig-users.conf and ig-groups.conf:

  cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf

  cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf

CE Torque Configuration

• Now we can configure the node:

  /opt/glite/yaim/bin/ig_yaim -c \
    -s /opt/glite/yaim/etc/gilda/<your_site-info.def> \
    -n ig_CE_torque \
    -n BDII_site

* Note that there are two different node type (-n) parameters.
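Before running the configure command, a preflight check along these lines can confirm that the site-info file defines the values the earlier slides rely on. It is a sketch only: preflight_site_info is a hypothetical name, and the set of variables checked (CE_HOST, WN_LIST, USERS_CONF, GROUPS_CONF) is a selection from the slides, not a complete list of what yaim requires.

```shell
#!/bin/sh
# Hypothetical preflight: source a site-info file in a subshell and check
# a few variables from the slides before ig_yaim is run.
preflight_site_info() {
    site_info=$1
    [ -r "$site_info" ] || { echo "cannot read $site_info"; return 1; }
    (
        # Subshell: the caller's environment is left untouched.
        . "$site_info" || exit 1
        rc=0
        [ -n "${CE_HOST-}" ] || { echo "CE_HOST is not set"; rc=1; }
        for var in WN_LIST USERS_CONF GROUPS_CONF; do
            eval "path=\${$var-}"    # indirect lookup of the file path
            if [ -z "$path" ]; then
                echo "$var is not set"; rc=1
            elif [ ! -s "$path" ]; then
                echo "$var points to a missing or empty file: $path"; rc=1
            fi
        done
        exit $rc
    )
}
```

Pass the full path of the site-info file; the function prints one line per problem and returns non-zero if any was found.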

Computing Element Testing

Testing

• Check that the local GRIS and the site BDII are running on the CE and are publishing the right information (CPU, site name and so on):

  ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=resource,o=grid

  ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=your_site_name,o=grid

The second ldapsearch will return nothing (see the next slide).

Testing

ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=your_site_name,o=grid

The ldapsearch won't return anything.

Solution: edit the following file:
/opt/glite/yaim/etc/gilda/services/glite-bdii_site

Comment out the following entries, or set the correct values for them, and rerun ig_yaim:
...
BDII_REGIONS=...
BDII_host-id-1_URL=...
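To tell quickly whether a query "returns nothing", the number of entries in the ldapsearch output can be counted. The sketch below is a hypothetical helper that counts the lines starting with "dn:" on standard input; it does not run ldapsearch itself.

```shell
#!/bin/sh
# Hypothetical helper: count LDIF entries (lines starting with "dn:")
# read from standard input. Prints 0 when there are none.
count_ldif_entries() {
    # grep -c prints the count but exits non-zero on zero matches,
    # so the status is normalised with "|| true".
    grep -c '^dn:' || true
}

# Example, using the query from the slide:
#   ldapsearch -x -h your_ce_hostname -p 2170 \
#       -b mds-vo-name=your_site_name,o=grid | count_ldif_entries
```

A count of 0 for the site-level query, with a non-zero count for the resource-level one, matches the situation described above.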

Testing

• Become a gilda user:
  # su - gilda001

• Create a file (test.sh) and add the following:
  #!/bin/sh
  sleep 20 #(it's useful to see the job status)
  hostname

• Save it and make it executable:

  chmod 700 test.sh
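The same test script can also be created non-interactively. The sketch below wraps the steps above in a hypothetical function, make_test_job, that writes test.sh into a chosen directory and sets the mode from the slide.

```shell
#!/bin/sh
# Hypothetical helper: write the test job from the slide into a directory
# (default: the current one) and make it executable.
make_test_job() {
    dir=${1:-.}
    cat > "$dir/test.sh" <<'EOF'
#!/bin/sh
sleep 20   # keeps the job around long enough to see its status
hostname
EOF
    chmod 700 "$dir/test.sh"
}
```

Running make_test_job as the gilda user in its home directory leaves a test.sh that is ready to submit.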

Testing

[gilda001@ce gilda001]$ qsub -q short test.sh

[gilda001@ce gilda001]$ qstat -a

ce.localdomain:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
3.wn.localdo    gilda001 short    test.sh      5839  --  --     -- 00:15 R   --

Testing

[gilda001@ce gilda001]$ qstat -a
[gilda001@ce gilda001]$

• The job has finished, so we can list the output files:

[gilda001@ce gilda001]$ ls
test.sh.e3 test.sh.o3

• And show the results:

[gilda001@ce gilda001]$ cat test.sh.e3 (error file)
[gilda001@ce gilda001]$
[gilda001@ce gilda001]$ cat test.sh.o3 (output file)
wn.localdomain
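Instead of rerunning qstat -a by hand, the wait can be scripted. The sketch below polls until the job number no longer appears at the start of a line of the qstat -a output, as in the empty listing above. The function name wait_for_job and the QSTAT override variable are hypothetical, and the sketch assumes the server drops finished jobs from the listing; if it keeps completed jobs for a while, the loop runs correspondingly longer.

```shell
#!/bin/sh
# Command used to list jobs; overridable for testing (assumption).
QSTAT=${QSTAT:-qstat}

# Hypothetical helper: block until a job disappears from "qstat -a".
wait_for_job() {
    job_num=$1          # numeric part of the job ID, e.g. 3
    interval=${2:-5}    # seconds between polls
    # Job lines start with "<number>.", e.g. "3.wn.localdo".
    while $QSTAT -a | grep -q "^$job_num\."; do
        sleep "$interval"
    done
    echo "job $job_num is no longer listed"
}
```

For the job shown earlier, wait_for_job 3 returns once the listing is empty, after which test.sh.o3 and test.sh.e3 can be inspected.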

Testing

Log onto the UI:

Hostname -> glite-tutor.ct.infn.it
Username -> capetown01..06
Password -> GridCAP01..06

Grid passphrase -> CAPETOWN

Testing

[plt@glite-tutor plt]$ voms-proxy-init --voms gilda
[plt@glite-tutor plt]$ globus-job-run <your-ce-full-hostname>:2119/jobmanager-lcgpbs -q short /bin/hostname

wn.localdomain

[plt@glite-tutor plt]$ glite-wms-job-submit -a -r your-ce-hostname:2119/jobmanager-lcgpbs-gilda hostname.jdl

Selected Virtual Organisation name (from proxy certificate extension): gilda
Connecting to host glite-rb.ct.infn.it, port 7772
Logging to host glite-rb.ct.infn.it, port 9002
********************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

- https://glite-rb.ct.infn.it:9000/Vo-4Ih1s-iDbBPr3rs69GQ

********************************************************************************
[plt@glite-tutor plt]$ glite-wms-job-status https://glite-rb.ct.infn.it:9000/Vo-4Ih1s-iDbBPr3rs69GQ
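The slides do not show the contents of hostname.jdl. A minimal job description that would fit the command above might look like the following sketch; the output file names std.out and std.err are assumptions, not taken from the training material.

```shell
#!/bin/sh
# Write a minimal JDL file that runs /bin/hostname on the worker node.
# The file names std.out and std.err are illustrative assumptions.
cat > hostname.jdl <<'EOF'
Executable    = "/bin/hostname";
StdOutput     = "std.out";
StdError      = "std.err";
OutputSandbox = {"std.out", "std.err"};
EOF
```

With this file in the current directory on the UI, the glite-wms-job-submit command shown above can reference it as hostname.jdl.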

FIREWALL SETUP

/etc/sysconfig/iptables (1/2)

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2135 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2119 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2170 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2811 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport maui -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_mom -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_resmon -j ACCEPT

/etc/sysconfig/iptables (2/2)

-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3878:3879 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 3879 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3882 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 1020:1023 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 20000:25000 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 32768:65535 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 32768:65535 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
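Most of the rules above share one pattern and differ only in the port. A small generator, sketched below, can help avoid typing mistakes when preparing the file. The function name tcp_accept_rules is hypothetical; it only prints rule lines in the format used above and does not touch the running firewall.

```shell
#!/bin/sh
# Hypothetical helper: print one TCP ACCEPT rule per argument, in the
# same format as the RH-Firewall-1-INPUT rules listed above.
# Arguments may be ports, port ranges (3878:3879) or service names (pbs).
tcp_accept_rules() {
    for port in "$@"; do
        printf '%s\n' "-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport $port -j ACCEPT"
    done
}

# Example: the first five TCP ports from the listing.
tcp_accept_rules 22 2135 2119 2170 2811
```

The printed lines can be pasted into /etc/sysconfig/iptables before the final REJECT rules.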

IPTABLES STARTUP

/sbin/chkconfig iptables on

/etc/init.d/iptables start

Troubleshooting

Troubleshooting

[plt@ui plt]$ globus-job-run your_ce_hostname:2119/jobmanager-lcgpbs -q short /bin/hostname
GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)

Solution: check that the globus-gatekeeper daemon is up and running on the CE.

[plt@ui plt]$ globus-job-run <ce_hostname>:2119/jobmanager-lcgpbs -q short /bin/hostname
GRAM Job submission failed because authentication failed:
GSS Major Status: Authentication Failed
GSS Minor Status Error Chain:

init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
globus_i_gsi_gss_utils.c:888: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
globus_i_gsi_gss_utils.c:847: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials: Couldn't verify the remote certificate
OpenSSL Error: s3_pkt.c:1046: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate (error code 7)

Solution: the GILDA CA rpm is probably not installed on the CE.

Troubleshooting

[plt@ui plt]$ edg-gridftp-ls gsiftp://<ce_hostname>/
error the server sent an error response: 530 530 LCMAPS credential mapping NOT successful

error the server sent an error response: 530 530 LCMAPS credential mapping NOT successful

Solution: check the VO mapping on the CE:
/opt/edg/etc/lcmaps/gridmapfile
/opt/edg/etc/lcmaps/groupmapfile

Troubleshooting

The CE is publishing incorrect information such as:
GlueCEStateFreeCPUs: 0
GlueCEStateRunningJobs: 0
GlueCEStateStatus: Production
GlueCEStateTotalJobs: 0
GlueCEStateWaitingJobs: 4444

Run the script:
/opt/glite/etc/gip/plugin/glite-info-dynamic-scheduler-wrapper
and check whether it reports any errors. Often it doesn't work because the batch system is down or in a locked state. If that is the case, restart the torque-server service:
/etc/init.d/pbs_server restart
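The symptom above can be spotted automatically in the output of an ldapsearch against the CE. The sketch below is a hypothetical filter, check_waiting_jobs, that reads LDIF text on standard input and flags the value 4444 listed above as incorrect; it does not query the BDII itself.

```shell
#!/bin/sh
# Hypothetical helper: scan LDIF text on stdin for the suspicious
# GlueCEStateWaitingJobs value shown on the slide.
check_waiting_jobs() {
    if grep -q '^GlueCEStateWaitingJobs: 4444[[:space:]]*$'; then
        echo "GlueCEStateWaitingJobs is 4444: check the dynamic scheduler plugin"
        return 1
    fi
    echo "no suspicious GlueCEStateWaitingJobs value found"
}

# Example:
#   ldapsearch -x -h your_ce_hostname -p 2170 \
#       -b mds-vo-name=resource,o=grid | check_waiting_jobs
```

A non-zero return status is the cue to run the wrapper script named above and look at its errors.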

Troubleshooting

• If a query to the site BDII doesn't show the information about a site, look at the BDII log file:
  /opt/bdii/var/bdii.log

• For example:
  GILDA: ldap_bind: Can't contact LDAP server

Check:
– that the BDII is up and running (ps aux | grep bdii)
– that the resource URL is in the list file /opt/glite/etc/gip/site-urls.conf
– the firewall setup
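The log check above can be condensed into a one-line summary of which sources the BDII could not reach. The sketch below assumes log lines of the form shown in the example ("NAME: ldap_bind: Can't contact LDAP server"); the function name bdii_unreachable_sources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct source names for which the BDII
# log reports "Can't contact LDAP server".
bdii_unreachable_sources() {
    log=${1:-/opt/bdii/var/bdii.log}
    # Keep only the matching lines, take the text before the first colon,
    # and remove duplicates.
    grep "Can't contact LDAP server" "$log" | cut -d: -f1 | sort -u
}
```

Each name printed corresponds to a resource URL worth checking in /opt/glite/etc/gip/site-urls.conf and against the firewall rules.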
