/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira

20100127 @ uu.nl

Life-cycle of a Grid Computing Job with some side stories



1/24

Outline

Grid & Science - EGEE

Virtual Organizations

enmr.eu architecture

Grid Job Life Cycle

Hello Grid!

CNS tutorial

Web Portals

2/24

The Grid

“Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations”.

Foster, I. et al., Int. J. Supercomput. Appl. 15(3) (2001)

3/24

Why do scientists need the Grid?

High-energy physics (15 PB/year)

15 PB ~ 20 × 10^6 CDs

Genome projects, data mining,

Tackling protein folding,

Protein structure, …
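The CD comparison is easy to check. A minimal sketch, assuming decimal (SI) units and a 700 MB CD; both are assumptions, since the slide states neither:

```python
# Back-of-the-envelope check of "15 PB ~ 20 x 10^6 CDs".
# Assumptions (not from the slide): decimal units, i.e. 1 PB = 10**15 bytes
# and 1 MB = 10**6 bytes, and a 700 MB CD-ROM.

PB = 10**15
MB = 10**6


def cds_needed(data_bytes: int, cd_capacity_bytes: int = 700 * MB) -> float:
    """Number of CDs needed to hold data_bytes."""
    return data_bytes / cd_capacity_bytes


if __name__ == "__main__":
    n = cds_needed(15 * PB)
    print(f"15 PB / 700 MB = {n:.3g} CDs")  # prints: 15 PB / 700 MB = 2.14e+07 CDs
```

With a 650 MB disc the figure rises to about 23 × 10^6; either way the slide's order of magnitude holds.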

4/24

Enabling Grids for E-science

GStat (Jan 2010) : http://goc.grid.sinica.edu.tw/gstat/

Infrastructure

317 sites

58 countries

~140K CPUs, 24/7

~ 69 PB disk

Users

182 registered VOs

~ 12K registered users

> 300K jobs / day

5/24

Registered EGEE Virtual Organizations

Application domain    Active VOs    Users
High-energy Physics   41            4737
Infrastructures       28            2365
Life Sciences         10            519
...                   ...           ...
Total                 182           11908

http://cic.gridops.org/index.php?section=home&page=volist

VO name    Scope     Registered users (20090210)    Registered users (20100125)
biomed     Global    223                            257
enmr.eu    Global    54                             155

VO Registered Users

6/24 Stats : 20100125

7/24

How to become an enmr.eu user?

http://ca.dutchgrid.nl/request/

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Your Name
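The certificate subject above is an X.509 distinguished name (DN) in the slash-separated form printed by the Globus/gLite tools. A minimal Python sketch that splits such a DN into ordered (attribute, value) pairs; it assumes no value contains a literal `/`, which holds for the DNs in this talk but not for every DN, and the function names are hypothetical:

```python
from typing import List, Tuple


def parse_slash_dn(dn: str) -> List[Tuple[str, str]]:
    """Split '/O=dutchgrid/O=users/CN=Jane' into [('O', 'dutchgrid'), ...].

    Simplifying assumption: values contain no '/', and the first '=' in
    each component separates the attribute name from its value.
    """
    if not dn.startswith("/"):
        raise ValueError(f"not a slash-separated DN: {dn!r}")
    pairs = []
    for component in dn[1:].split("/"):
        attr, sep, value = component.partition("=")
        if not sep:
            raise ValueError(f"malformed DN component: {component!r}")
        pairs.append((attr.strip(), value.strip()))
    return pairs


def first_common_name(dn: str) -> str:
    """Return the first CN: proxies append further CNs (e.g. CN=proxy)."""
    cns = [value for attr, value in parse_slash_dn(dn) if attr == "CN"]
    if not cns:
        raise ValueError("DN has no CN component")
    return cns[0]
```

Taking the first CN rather than the last means the same helper returns the user's name for both the certificate subject and the proxy subject shown by `voms-proxy-info`.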

enmr.eu Grid architecture

8/24

enmr.eu Grid Status

9/24

The (not so short) Job Life-cycle

10/24 www.gridcafe.org

Authentication and Authorization (1/2)

11/24

[nuno@ui-enmr ~]$ ll ~/.globus

total 16

-rw-r--r-- 1 nuno users 2189 Nov 14 17:18 usercert.p12

-rw-r--r-- 1 nuno users 4947 Nov 14 17:19 usercert.pem

-rw------- 1 nuno users 963 Nov 14 17:20 userkey.pem

[nuno@ui-enmr ~]$ voms-proxy-init --voms enmr.eu

Cannot find file or dir: /home/nuno/.glite/vomses

Enter GRID pass phrase:

Your identity: /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira

Creating temporary proxy ........................... Done

Contacting voms-02.pd.infn.it:15014 [/C=IT/O=INFN/OU=Host/L=Padova/CN=voms-02.pd.infn.it] "enmr.eu" Done

Creating proxy .......................... Done

Your proxy is valid until Wed Jan 27 03:44:48 2010

[nuno@ui-enmr ~]$ grid-cert-info -s -i -sd -ed

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira

/C=NL/O=NIKHEF/CN=NIKHEF medium-security certification auth

Oct 23 00:00:00 2009 GMT

Oct 23 15:15:43 2010 GMT
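Two checks follow from this slide: the private key must be accessible only by its owner (note the `-rw-------` mode on `userkey.pem`), and the certificate is usable only between the start and end dates printed by `grid-cert-info -sd -ed`. A minimal Python sketch of both; the date format is copied from the output above, and it assumes an English/C locale and a trailing `GMT`. The helper names are hypothetical:

```python
import os
import stat
from datetime import datetime, timezone

# Format of dates such as "Oct 23 15:15:43 2010 GMT" (assumption: the zone
# is always GMT, so it is stripped and UTC is attached explicitly).
_CERT_DATE_FMT = "%b %d %H:%M:%S %Y"


def parse_cert_date(text: str) -> datetime:
    """Parse a grid-cert-info style date into a timezone-aware UTC datetime."""
    text = text.strip()
    if text.endswith(" GMT"):
        text = text[: -len(" GMT")]
    return datetime.strptime(text, _CERT_DATE_FMT).replace(tzinfo=timezone.utc)


def cert_is_valid(start: str, end: str, now: datetime) -> bool:
    """True if `now` lies inside the certificate's validity window."""
    return parse_cert_date(start) <= now <= parse_cert_date(end)


def key_is_private(path: str) -> bool:
    """True if no group or other permission bits are set on the key file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For the certificate above the window runs from Oct 23 2009 to Oct 23 2010, a little over 365 days, so the talk on 27 January 2010 falls well inside it.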

Authentication and Authorization (2/2)

12/24

[nuno@ui-enmr ~]$ voms-proxy-init --voms enmr.eu

Cannot find file or dir: /home/nuno/.glite/vomses

Enter GRID pass phrase:

Your identity: /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira

Creating temporary proxy ............................................... Done

Contacting voms2.cnaf.infn.it:15014 [/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it] "enmr.eu" Done

Creating proxy ............................................. Done

Your proxy is valid until Wed Jan 27 03:54:00 2010

[nuno@ui-enmr ~]$ voms-proxy-info

subject : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira/CN=proxy

issuer : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira

identity : /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Nuno Loureiro Ferreira

type : proxy

strength : 1024 bits

path : /tmp/x509up_u500

timeleft : 11:56:19
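The proxy is short-lived (about 12 hours here), so a scripted submission should check how much lifetime is left before sending work out. A minimal sketch that parses the `timeleft : HH:MM:SS` line of `voms-proxy-info` output as shown above; the one-hour safety margin is an arbitrary assumption:

```python
import re
from typing import Optional

# Matches the "timeleft : 11:56:19" line printed by voms-proxy-info.
_TIMELEFT_RE = re.compile(r"^timeleft\s*:\s*(\d+):(\d{2}):(\d{2})\s*$", re.MULTILINE)


def proxy_seconds_left(info_output: str) -> Optional[int]:
    """Remaining proxy lifetime in seconds, or None if no timeleft line."""
    m = _TIMELEFT_RE.search(info_output)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def proxy_long_enough(info_output: str, needed_s: int = 3600) -> bool:
    """True if the proxy outlives `needed_s` seconds (assumed 1 h margin)."""
    left = proxy_seconds_left(info_output)
    return left is not None and left >= needed_s
```

For the output above, `11:56:19` is 42979 seconds, enough for a short job but not for one expected to run a full 12 hours.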

Available resources

14/24

[nuno@ui-enmr bcbr]$ lcg-infosites --vo enmr.eu ce all

#CPU Free Total Jobs Running Waiting ComputingElement

----------------------------------------------------------

399 20 85 57 28 grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-short

16 7 9 9 0 ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium

88 88 0 0 0 glite-ce.grid.uj.ac.za:8443/cream-pbs-long

2460 906 103 103 0 trekker.nikhef.nl:2119/jobmanager-pbs-medium

1632 1584 45 45 0 deimos.htc.biggrid.nl:2119/jobmanager-pbs-medium

200 0 0 0 0 t2-ce-05.lnl.infn.it:8443/cream-lsf-enmr1

… snip …

Avail Space(Kb) Used Space(Kb) Type SEs

----------------------------------------------------------

2444576886 555136905 n.a prod-se-01.pd.infn.it

3127661680 1371977164 n.a prod-se-02.pd.infn.it

1858674692 106001211 n.a se-enmr.chem.uu.nl

13828076063 21152016643 n.a se01.dur.scotgrid.ac.uk

… snip …
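The CE listing is whitespace-separated text, so it is easy to post-process, for example to see where free slots are. A minimal sketch that parses rows in the six-column layout shown above (#CPU, Free, Total Jobs, Running, Waiting, ComputingElement) and picks the CE with the most free CPUs. Free CPUs are only a rough indicator and not how the WMS itself ranks resources; the names `CERow`, `parse_ce_table` and `most_free` are hypothetical:

```python
from typing import List, NamedTuple


class CERow(NamedTuple):
    cpus: int
    free: int
    total_jobs: int
    running: int
    waiting: int
    ce_id: str


def parse_ce_table(text: str) -> List[CERow]:
    """Parse the CE section of `lcg-infosites --vo <vo> ce` output.

    Header, dashed separator, '... snip ...' and SE lines are skipped:
    only lines with exactly five integers followed by a CE identifier
    are kept.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not all(f.isdigit() for f in fields[:5]):
            continue
        cpus, free, total, running, waiting = (int(f) for f in fields[:5])
        rows.append(CERow(cpus, free, total, running, waiting, fields[5]))
    return rows


def most_free(rows: List[CERow]) -> CERow:
    """CE with the largest number of free CPUs (ties: first listed wins)."""
    return max(rows, key=lambda r: r.free)
```

In the listing above, `deimos.htc.biggrid.nl` has the most free CPUs (1584), and in every row Total Jobs equals Running plus Waiting.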

Submit a job

15/24

[nuno@ui-enmr bcbr]$ glite-wms-job-submit -a -o jid hello.jdl

Connecting to the service https://wms-enmr.chem.uu.nl:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy

Your job identifier is:

https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg

The job identifier has been saved in the following file:

/home/nuno/grid/hello/bcbr/jid

==========================================================================
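The contents of `hello.jdl` are not shown in the talk. JDL files are lists of `Attribute = value;` statements, as in `run-cns.jdl` in the CNS example. A minimal Python sketch that renders such a file from a dictionary; the executable name `hello.sh` is a hypothetical placeholder, while `hello.out` and `hello.err` are the file names that appear in the retrieved output:

```python
from typing import Dict, List, Union


class Expr(str):
    """A JDL value emitted verbatim (unquoted), e.g. a Requirements expression."""


JdlValue = Union[str, List[str]]


def _quote(s: str) -> str:
    # Deliberately strict: embedded double quotes are rejected rather than
    # escaped, so no particular JDL escaping rule is assumed.
    if '"' in s:
        raise ValueError(f"double quote not supported in JDL string: {s!r}")
    return f'"{s}"'


def render_jdl(attrs: Dict[str, JdlValue]) -> str:
    """Render attributes as 'Name = value;' lines, in insertion order."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, Expr):  # check before str: Expr is a str subclass
            rendered = str(value)
        elif isinstance(value, list):
            rendered = "{" + ", ".join(_quote(v) for v in value) + "}"
        else:
            rendered = _quote(value)
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines) + "\n"


# Hypothetical hello.jdl: "hello.sh" is a placeholder name; only the
# hello.out / hello.err names come from the output shown in the talk.
HELLO = {
    "Executable": "hello.sh",
    "StdOutput": "hello.out",
    "StdError": "hello.err",
    "InputSandbox": ["hello.sh"],
    "OutputSandbox": ["hello.out", "hello.err"],
}

if __name__ == "__main__":
    print(render_jdl(HELLO), end="")
```

The `Expr` wrapper keeps expressions such as the `RegExp(...)` requirement in `run-cns.jdl` from being quoted as plain strings.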

Query Job Status

16/24

[nuno@ui-enmr bcbr]$ glite-wms-job-status -i jid

*************************************************************

BOOKKEEPING INFORMATION:

Status info for the Job : https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg

Current Status: Scheduled

Status Reason: Job successfully submitted to Globus

Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong

Submitted: Tue Jan 26 16:26:07 2010 CET

*************************************************************

[nuno@ui-enmr bcbr]$ glite-wms-job-status -i jid

*************************************************************

BOOKKEEPING INFORMATION:

Status info for the Job : https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg

Current Status: Done (Success)

Exit code: 0

Status Reason: Job terminated successfully

Destination: pbs-enmr.cerm.unifi.it:2119/jobmanager-lcgpbs-verylong

Submitted: Tue Jan 26 16:26:07 2010 CET

*************************************************************
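Between `Scheduled` and `Done (Success)` the job passes through further states, and polling by hand as above gets tedious. A minimal sketch of a polling loop that extracts the `Current Status:` line and stops at a final state. The set of final states (`Done`, `Aborted`, `Cancelled`, `Cleared`) reflects our understanding of the gLite job state names and should be checked against the middleware documentation. `fetch` is any callable returning the text printed by `glite-wms-job-status`:

```python
import re
import time
from typing import Callable, Optional

_STATUS_RE = re.compile(r"Current Status:\s*(\w+)(?:\s*\(([^)]*)\))?")

# Assumed terminal states of the gLite job state machine.
TERMINAL = {"Done", "Aborted", "Cancelled", "Cleared"}


def parse_status(output: str) -> Optional[str]:
    """Return e.g. 'Scheduled' or 'Done (Success)' from job-status output."""
    m = _STATUS_RE.search(output)
    if m is None:
        return None
    state, detail = m.group(1), m.group(2)
    return f"{state} ({detail})" if detail else state


def wait_for_job(fetch: Callable[[], str], interval_s: float = 60.0,
                 max_polls: int = 1000,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until a terminal state is reached and return the final status."""
    for _ in range(max_polls):
        status = parse_status(fetch())
        if status is not None and status.split()[0] in TERMINAL:
            return status
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

In practice `fetch` could wrap `subprocess.run(["glite-wms-job-status", "-i", "jid"], capture_output=True, text=True).stdout`, using the same `-i jid` invocation as above; injecting `sleep` keeps the loop testable without a grid.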

Retrieve Job Output

17/24

[nuno@ui-enmr bcbr]$ glite-wms-job-output -i jid --dir ./out

Connecting to the service https://wms-enmr.chem.uu.nl:7443/glite_wms_wmproxy_server

================================================================================

JOB GET OUTPUT OUTCOME

Output sandbox files for the job:

https://lb-enmr.chem.uu.nl:9000/gOtqQuG4ebqpz3m5z8_2Eg

have been successfully retrieved and stored in the directory:

/home/nuno/grid/hello/bcbr/out

================================================================================

[nuno@ui-enmr bcbr]$ ll ./out/

total 4

-rw-r--r-- 1 nuno users 0 Jan 26 17:31 hello.err

-rw-r--r-- 1 nuno users 48 Jan 26 17:31 hello.out

[nuno@ui-enmr bcbr]$ more ./out/hello.out

Hello Grid! I was here : wn3-enmr.cerm.unifi.it
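Once retrieved, the output sandbox can be sanity-checked before the results are used: every file named in `OutputSandbox` should be present, and an empty standard-error file (like the 0-byte `hello.err` above) is a cheap hint that nothing went wrong. A minimal sketch; treating a non-empty stderr as suspicious is a heuristic assumption, since some programs log to stderr normally, and the helper names are hypothetical:

```python
import os
from typing import Iterable, List


def missing_outputs(out_dir: str, expected: Iterable[str]) -> List[str]:
    """Names from the OutputSandbox list that were not retrieved."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(out_dir, name))]


def stderr_is_empty(out_dir: str, stderr_name: str) -> bool:
    """True if the retrieved stderr file exists and has size 0."""
    path = os.path.join(out_dir, stderr_name)
    return os.path.isfile(path) and os.path.getsize(path) == 0
```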

CNS example (1/3)

18/24

[nuno@ui-enmr cns-example]$ ll

total 160

-rw-r--r-- 1 nuno users 144884 Mar 18 2009 cns-input.tgz

-rw-r--r-- 1 nuno users 1529 Mar 18 2009 README

-rwxr-xr-x 1 nuno users 134 Mar 18 2009 run-cns

-rw-r--r-- 1 nuno users 229 Jan 17 17:58 run-cns.jdl

[nuno@ui-enmr cns-example]$ tar tvzf cns-input.tgz

-rw-r--r-- abonvin/staff 30070 2008-05-06 12:42:33 CaMM13Tmpcs1.tbl

-rw-r--r-- abonvin/staff 16946 2008-05-06 12:42:33 CaMM13Tmrdc1.tbl

-rw-r--r-- abonvin/staff 912 2008-05-06 12:44:53 README

-rw-r--r-- abonvin/staff 208142 2008-05-06 12:42:33 calmodulin-MM13.pdb

-rw-r--r-- abonvin/staff 341327 2008-05-06 12:42:33 calmodulin-MM13.psf

-rw-r--r-- abonvin/staff 4982 2008-05-06 12:42:33 ion.param

-rw-r--r-- abonvin/staff 158398 2008-05-06 12:42:33 noes.tbl

-rw-r--r-- abonvin/staff 548 2008-05-06 12:42:33 par_axis.pro

-rw-r--r-- abonvin/staff 74090 2008-05-06 12:42:33 parallhdg5.3.pro

-rw-r--r-- abonvin/staff 16549 2008-05-06 12:42:33 phipsi.tbl

-rw-r--r-- abonvin/staff 9571 2008-05-06 12:42:33 sa-test.inp

-rw-r--r-- abonvin/staff 273 2008-05-06 12:42:33 tensor.pdb

-rw-r--r-- abonvin/staff 1181 2008-05-06 12:42:33 tensor.psf

-rw-r--r-- abonvin/staff 57 2008-05-06 12:42:33 tensor.tbl

http://www.enmr.eu/eNMR-tutorials
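`tar tvzf` lists a gzipped tarball without unpacking it. When input bundles are prepared from a script, Python's standard `tarfile` module gives the same listing. A minimal sketch for checking that the required inputs, such as `sa-test.inp`, actually made it into `cns-input.tgz` before submission; the required-file list is only an example:

```python
import tarfile
from typing import Dict, Iterable, List


def list_tgz(path: str) -> Dict[str, int]:
    """Map member name -> size in bytes for a .tgz, like `tar tvzf`."""
    with tarfile.open(path, "r:gz") as tar:
        return {m.name: m.size for m in tar.getmembers() if m.isfile()}


def missing_members(path: str, required: Iterable[str]) -> List[str]:
    """Required member names that are absent from the archive."""
    present = list_tgz(path)
    return [name for name in required if name not in present]
```

For example, `missing_members("cns-input.tgz", ["sa-test.inp", "noes.tbl"])` should return an empty list for the archive listed above.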

CNS example (2/3)

19/24

[nuno@ui-enmr cns-example]$ more run-cns

source $VO_ENMR_EU_SW_DIR/BCBR/cns/1.2-para/set_cns.bash

tar xfz cns-input.tgz

cns < sa-test.inp > sa-test.out

tar cvfz cns-output.tgz *

[nuno@ui-enmr cns-example]$ more run-cns.jdl

Executable = "run-cns";

StdOutput = "std.out";

StdError = "std.err";

InputSandbox = {"cns-input.tgz","run-cns"};

OutputSandbox = {"std.out", "std.err","cns-output.tgz"};

Requirements = RegExp ("chem.uu.nl",other.GlueCEUniqueId);
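The `Requirements` expression restricts matchmaking to computing elements whose `GlueCEUniqueId` matches the regular expression `chem.uu.nl`. Its effect can be previewed against CE identifiers such as those listed by `lcg-infosites`. A minimal sketch that uses Python's unanchored `re.search` as a stand-in for the JDL `RegExp` function; this is an approximation, because the two regex dialects may differ in corner cases. Note that the unescaped `.` matches any character:

```python
import re
from typing import Iterable, List


def matching_ces(pattern: str, ce_ids: Iterable[str]) -> List[str]:
    """CE identifiers for which RegExp(pattern, GlueCEUniqueId) would hold,
    approximated with an unanchored re.search."""
    rx = re.compile(pattern)
    return [ce for ce in ce_ids if rx.search(ce)]


# CE identifiers taken from the lcg-infosites listing in this talk.
CES = [
    "grid-ce-01.ba.infn.it:2119/jobmanager-lcgpbs-short",
    "ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium",
    "trekker.nikhef.nl:2119/jobmanager-pbs-medium",
    "t2-ce-05.lnl.infn.it:8443/cream-lsf-enmr1",
]

if __name__ == "__main__":
    print(matching_ces("chem.uu.nl", CES))
    # prints: ['ce-enmr.chem.uu.nl:2119/jobmanager-lcgpbs-medium']
```

Escaping the dots (`chem\.uu\.nl` in Python; in the JDL string the backslashes may need doubling) makes the match stricter, although for these host names the result is the same.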

CNS example (3/3)

20/24

[nuno@ui-enmr cns-example]$ glite-wms-job-submit -a -o jid run-cns.jdl

[nuno@ui-enmr cns-example]$ glite-wms-job-output -i jid --dir ./

[nuno@ui-enmr cns-example]$ ll

total 24464

-rw-r--r-- 1 nuno users 144884 Mar 18 2009 cns-input.tgz

-rw-r--r-- 1 nuno users 24854174 Jan 26 18:24 cns-output.tgz

-rw-r--r-- 1 nuno users 79 Jan 26 17:13 jid

-rw-r--r-- 1 nuno users 1529 Mar 18 2009 README

-rwxr-xr-x 1 nuno users 137 Jan 26 17:12 run-cns

-rw-r--r-- 1 nuno users 229 Jan 17 17:58 run-cns.jdl

[nuno@ui-enmr out]$ more sa_1.pdb

REMARK FILENAME="/home/enmr016/globus-tmp.wn23-enmr.25892.0/https_3a_2f_2flb-"

… snip …

REMARK DATE:26-Jan-2010 17:29:14 created by user: enmr016

REMARK VERSION:1.2

ATOM 1 HA ALA 1 1.868 27.047 -8.664 1.00 15.00 A

ATOM 2 CB ALA 1 0.511 28.488 -7.902 1.00 15.00 A

ATOM 3 HB1 ALA 1 0.379 28.981 -8.854 1.00 15.00 A

… snip …
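The coordinates in `sa_1.pdb` can be read back for quick checks. A minimal sketch that takes the x, y, z fields of the ATOM records shown above and computes an interatomic distance. It splits on whitespace and assumes there is no chain-ID column, as in this CNS output; real PDB files are fixed-column and fields can run together, so a dedicated PDB parser is safer for production use:

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def read_atoms(pdb_text: str) -> Dict[Tuple[int, str], Coord]:
    """Map (residue number, atom name) -> (x, y, z) for ATOM records.

    Assumes whitespace-separated fields laid out as in the CNS output:
    ATOM serial name resName resSeq x y z occupancy B segid
    """
    atoms = {}
    for line in pdb_text.splitlines():
        f = line.split()
        if not f or f[0] != "ATOM":
            continue  # skip REMARK lines, snips, etc.
        res_seq, name = int(f[4]), f[2]
        atoms[(res_seq, name)] = (float(f[5]), float(f[6]), float(f[7]))
    return atoms


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance in the file's units (Angstrom for PDB)."""
    return math.dist(a, b)
```

For the three atoms above, the CB-HB1 distance comes out at about 1.08 Å, a typical C-H bond length and a quick sign that the coordinates were read correctly.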

21/24

Web Portal Grid Interaction

Ongoing work: GROMACS WebPortal

22/24

Zwartkijken / Idées Noires - Franquin

“Life cycle of a GRID computing job?

That's something like:

conception..,

abortion..,

conception..,

birth..,

premature death..,

reanimation.., etc?

:p

T.”

20100127 – 11AM

23/24

24/24

Big-Picture layer

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Alexandre Bonvin

Rolf Boelens

Hardware-layer

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Johan van der Zwan

Middleware layer

/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Cristina Aiftimiei

Application layer

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Marc van Dijk

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Sjoerd De Vries

/O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=Tsjerk Wassenaar

User layer /O=dutchgrid/O=users/O=universiteit-utrecht/OU=chem/CN=*.*

Acknowledgments

Service Availability Monitoring

25/24

Grid Operations Center Data Base

26/24

Building a Grid

27/24

1. The architecture

2. The hardware

3. The middleware

Figure: layered stack (Network, Resources, Middleware, Application), labelled "User-centric"