
Enabling Grids for E-sciencE, EGEE-II INFSO-RI-031688: Porting an application to the EGEE Grid & Data management for Application, Rachel Chen


Upload: piers-mccoy

Post on 18-Jan-2018


Porting an application to the EGEE Grid & Data management for Application
Rachel Chen, Academia Sinica Grid Computing

Outline
- Introduction
- The workflow of porting an application to the Grid
- Common command list
- The practical without data management
- The practical with data management
- References

Introduction
The main goal:
- Application porting: to port and execute an existing non-grid application on the Grid.
- Application development: to develop a grid application.
Some sources commonly call this process "gridifying". Many useful applications need gridifying.

Introduction (continued)
Many applications already use EGEE:
- HEP: ATLAS, CMS, LHCb, ALICE
- Bioinformatics: BioDCV, 3DEM
- Biomed: WISDOM, AvianFlu
- More...

The workflow of grid application development
1. Analyze the application.
2. Develop a non-grid application (or inherit and update an existing one).
3. Execute, test and debug the application.
4. Construct the job suite: JDL (Job Description Language) files, executables, auxiliary scripts and input/output data files.
5. Upload your data files to an SE (Storage Element).
6. Submit the job to the Grid.
7. Execute, test and debug the application on the Grid.
8. If something goes wrong, go back to step 4 (or step 2).

Some information
We are using the GILDA testbed today. GILDA is a VO on EGEE resources used for training and prototyping.
[Figures: the production EGEE grid; the current EGEE production middleware]

Practicals
The application called add will be ported to and executed in the grid environment. add is written in C, Java and Python.

Common command list
    Command              Meaning
    voms-proxy-init      Initialize the user proxy.
    voms-proxy-info      Get the user proxy information.
    edg-job-submit       Submit a job.
    edg-job-list-match   Check whether any CE (Computing Element) matches the job.
    edg-job-status       Get the job status.
    edg-job-get-output   Get the job output.
    lcg-cp               Copy a grid file to a local destination.
    lcg-cr               Copy a file to an SE and register it in the catalog.
    lcg-del              Remove a file/directory.
    lfc-ls               List the file/directory entries in a directory.
    lfc-mkdir            Create a directory.

The practical without data management

add program (Version 1, no data management)
- Reads input data from a file called testFile.txt. This file must be specified in the JDL file.
- Adds the 2 values on each line of the input file and writes the result to standard output.
- Needs one parameter, the input file:
    ./add testFile.txt          (C)
    java add testFile.txt       (Java)
    python add.py testFile.txt  (Python)
[Diagram: INPUT -> add -> OUTPUT]

Prerequisites
- add.c / add.java / add.py: the source code of the program.
- testFile.txt: sample input values.
- add.jdl: a prepared JDL (Job Description Language) file.
- run.sh: a script for running the executable manually.
- readme.txt: describes all files in the folder.
- A compiler or an interpreter: a standard C compiler and linker (we use GNU C, gcc, which is already installed), a Java compiler, or a Python interpreter.

add: logon
Step 1: Log on to the GILDA user interface using the PuTTY SSH (Secure Shell) client located on your Windows desktop:
    Hostname: glite-tutor.ct.infn.it
    login as: taipeiXX (taipei01 to taipei50, where XX is your number)
    Password: GridTAIXX (GridTAI01 to GridTAI50, where XX is the same number)

add: getting the prerequisites
Step 2: Download the prerequisites, stored in the zipped file NoDataManagement.zip, with the following command:
    wget
Unzip the archive in your current directory:
    unzip NoDataManagement.zip
Change the current directory:
    cd NoDataManagement
The current directory contains 3 folders (c, java, python) and 3 files (readme.txt, std.out, testFile.txt). Each folder is named after the programming language used for that version of the example. Choose one of them and change your working directory:
    cd c    or    cd java    or    cd python

add: compilation (C)
Step 3: Compile and link the program using the GNU C compiler/linker:
    gcc -o add add.c
This creates an executable file named add. Look at the directory contents:
    ls -l

add: compilation (Java)
Step 3: Compile the program using the Java compiler:
    javac add.java
This creates a class file named add.class. Look at the directory contents:
    ls -l

add: compilation (Python)
Step 3: Not needed if you choose Python.

add: testing as a non-grid application
Step 4: Execute your program with the following command:
    ./add testFile.txt          (C)
    java add testFile.txt       (Java)
    python add.py testFile.txt  (Python)
Look at the content of the input file testFile.txt:
    more testFile.txt
You may also examine the source code:
    more add.c     (C)
    more add.java  (Java)
    more add.py    (Python)

gLite: entering the Grid!
Step 5: Log in to the GILDA Grid by initializing your proxy:
    voms-proxy-init --voms gilda
This will ask for the passphrase, which is TAIPEI for all users.
Check the proxy status with:
    voms-proxy-info -all

gLite: checking the job requirements
Step 6: Check whether any resources match the job:
    edg-job-list-match --vo gilda add.jdl
This command lists all the Grid Computing Elements, together with their jobmanager queues, that fulfill the requirements of our job.
[Screenshot: a list of Computing Elements that can execute your program]

gLite: submitting the job to the GILDA Grid
Step 7: Execute the following command:
    edg-job-submit -o myJobId --vo gilda add.jdl
This submits the job and stores its unique identifier in a file called myJobId. You may look at that file.
Step 8: Monitor the job status with:
    edg-job-status -i myJobId
Repeat this command until the status is Done (Success).

Practical (continued): retrieving the job results
Step 9: Execute the following command:
    edg-job-get-output -i myJobId -dir ./
This retrieves the output sandbox files and stores them in a local directory under the current directory. The directory name will be something like taipeiXX_6tJj5hmisLFXsl9zoSaw6A.
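After several submissions, several such directories accumulate. The following is a small Python sketch of a hypothetical helper (not a gLite command) that lists the retrieved output directories for one account, newest first. It relies only on the <account>_<id> naming pattern described in step 9; nothing else about edg-job-get-output is assumed.

```python
from pathlib import Path


def find_output_dirs(account, base="."):
    """Return directories named <account>_<id> under base, newest first.

    Hypothetical helper: it only matches the naming pattern shown in the
    tutorial, e.g. taipei01_6tJj5hmisLFXsl9zoSaw6A.
    """
    # Keep directories only; a stray file such as taipei01_notes.txt is skipped.
    candidates = [p for p in Path(base).glob(f"{account}_*") if p.is_dir()]
    # Most recently modified first, i.e. usually the latest retrieved job.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `find_output_dirs("taipei01")[0] / "std.out"` would give the path of the most recent std.out, with taipei01 standing in for your own account.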
Enter the output directory and look at the file std.out:
    cd taipeiXX_6tJj5hmisLFXsl9zoSaw6A
    more std.out

add: the JDL file
Step 10: Look at the supplied add.jdl file:
    more add.jdl
If you chose Python, add.jdl looks like this:
    Executable = "/usr/bin/env";
    JobType = "Normal";
    Arguments = "python add.py testFile.txt";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = { "add.py", "testFile.txt" };
    OutputSandbox = { "std.out", "std.err" };
- Executable: the name of the executable file;
- JobType: the type of the job;
- Arguments: the command-line arguments of the program;
- StdOutput, StdError: the files that store the standard output and standard error messages;
- InputSandbox: the input files needed by the program, including the executable;
- OutputSandbox: the output files written during the execution, including standard output and standard error.

The practical with data management

Data management in gLite
Why do we need data management in our application?
- Sharing
- The size of the data set
- Getting data more efficiently

add program (Version 2, with data management)
- Reads input data from a file whose logical file name is /grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt.
- Adds the 2 values on each line of the input file, writes the result to a file called result.txt and writes some messages to standard output.
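The slides do not show the source of add.py, so the following is only a minimal Python sketch of the behavior described for Versions 1 and 2. It assumes each line of testFile.txt holds exactly two whitespace-separated numbers (the real file format is not shown). With one argument it prints the sums to standard output, like Version 1; with two it writes them to the output file and prints a short message, like Version 2. The function names and message text are hypothetical.

```python
import sys


def add_pairs(lines):
    """Return the sum of the two numbers on each non-empty line.

    Assumption: every line holds exactly two whitespace-separated
    numbers, e.g. "3 4".
    """
    sums = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 values, got {line.strip()!r}")
        sums.append(float(fields[0]) + float(fields[1]))
    return sums


def format_number(x):
    """Print whole numbers without a trailing '.0' (7.0 -> '7')."""
    return str(int(x)) if x.is_integer() else str(x)


def main(argv):
    """argv = [INPUT] (Version 1) or [INPUT, OUTPUT] (Version 2)."""
    if len(argv) not in (1, 2):
        print("usage: python add.py INPUT [OUTPUT]", file=sys.stderr)
        return 2
    with open(argv[0]) as src:
        results = [format_number(s) for s in add_pairs(src)]
    if len(argv) == 1:
        # Version 1: results go to standard output (std.out in the JDL).
        print("\n".join(results))
    else:
        # Version 2: results go to the output file, a message to stdout.
        with open(argv[1], "w") as dst:
            dst.writelines(r + "\n" for r in results)
        print(f"add: wrote {len(results)} result(s) to {argv[1]}")
    return 0


# Run only when file arguments are given, e.g. python add.py testFile.txt result.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Keeping the arithmetic in add_pairs, separate from the file handling in main, lets the same logic serve both versions; only the destination of the output changes.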
To run it in a non-grid environment, it needs two parameters, the input and output files:
    ./add testFile.txt result.txt          (C)
    java add testFile.txt result.txt       (Java)
    python add.py testFile.txt result.txt  (Python)
[Diagram: INPUT -> add -> OUTPUT]

Prerequisites
- add.c / add.java / add.py: the source code of the program.
- testFile.txt: sample input values. You have to use the lcg-cr command to upload this file to an SE and register it in the LFC.
- add.jdl: a prepared JDL (Job Description Language) file.
- run.sh: a script for running the executable manually.
- runJob.sh: a script that runs on the WN (Worker Node).
- readme.txt: describes all files in the folder.
- A compiler or an interpreter: a standard C compiler and linker (we use GNU C, gcc, which is already installed), a Java compiler, or a Python interpreter.

add: getting the prerequisites
Step 1: Go to your home directory and download the prerequisites, stored in the zipped file DataManagement.zip, with the following command:
    wget
Unzip the archive in your current directory (this creates a subdirectory DataManagement containing all of the prerequisite files):
    unzip DataManagement.zip
Change the current directory:
    cd DataManagement
The current directory contains 3 folders (c, java, python) and 4 files (readme.txt, std.out, testFile.txt and result.txt). Each folder is named after the programming language used for that version of the example. Choose one of them and change your working directory:
    cd c    or    cd java    or    cd python

add: export some environment variables
Step 2:
Export the necessary environment variables:
    export LFC_HOST=lfc-gilda.ct.infn.it
    export LCG_GFAL_INFOSYS=glite-rb.ct.infn.it:2170
    export LCG_CATALOG_TYPE=lfc
    export PATH=$PATH:/opt/lcg/bin

Modify some files
Step 3: Open your add.jdl file and modify the InputData item according to your account name:
    InputData = {"lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt"};
e.g.:
    InputData = {"lfn:/grid/gilda/training/taipei/taipei01/testFile.txt"};
Then open runJob.sh and modify the following line according to your account name:
    lcg-cp -v --vo gilda lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt file:`pwd`/testFile.txt
e.g.:
    lcg-cp -v --vo gilda lfn:/grid/gilda/training/taipei/taipei01/testFile.txt file:`pwd`/testFile.txt

Upload the input file
Step 4: Make your own directory in the file catalog:
    lfc-mkdir /grid/gilda/training/taipei/YOUR_ACCOUNT/
e.g.:
    lfc-mkdir /grid/gilda/training/taipei/taipei01/
Upload testFile.txt to the SE and register it in the file catalog:
    lcg-cr --vo gilda -v -d iceage-se-01.ct.infn.it -l lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt file:/home/YOUR_ACCOUNT/DataManagement/testFile.txt
e.g.:
    lcg-cr --vo gilda -v -d iceage-se-01.ct.infn.it -l lfn:/grid/gilda/training/taipei/taipei01/testFile.txt file:/home/taipei01/DataManagement/testFile.txt
Check the directory contents:
    lfc-ls /grid/gilda/training/taipei/YOUR_ACCOUNT/
e.g.:
    lfc-ls /grid/gilda/training/taipei/taipei01/

add: compilation (C)
Step 5: Compile and link the program using the GNU C compiler/linker:
    gcc -o add add.c
This creates an executable file named add.
Look at the directory contents:
    ls -l

add: compilation (Java)
Step 5: Compile the program using the Java compiler:
    javac add.java
This creates a class file named add.class. Look at the directory contents:
    ls -l

add: compilation (Python)
Step 5: Not needed if you choose Python.

add: testing as a non-grid application
Step 6: Execute your program with the following commands:
    ./add testFile.txt result.txt          (C)
    java add testFile.txt result.txt       (Java)
    python add.py testFile.txt result.txt  (Python)
Look at the content of the input file testFile.txt:
    more testFile.txt
Look at the content of the output file result.txt:
    more result.txt
You may also examine the source code:
    more add.c     (C)
    more add.java  (Java)
    more add.py    (Python)

gLite: checking the job requirements
Step 7: Check whether any resources match the job:
    edg-job-list-match --vo gilda add.jdl
This command lists all the Grid Computing Elements, together with their jobmanager queues, that fulfill the requirements of our job.
[Screenshot: a list of Computing Elements that can execute your program]

gLite: submitting the job to the GILDA Grid
Step 8: Execute the following command:
    edg-job-submit -o myJobId --vo gilda add.jdl
This submits the job and stores its unique identifier in a file called myJobId. You may look at that file.
Step 9: Monitor the job status with:
    edg-job-status -i myJobId
Repeat this command until the status is Done (Success).

Practical (continued): retrieving the job results
Step 10:
Execute the following command:
    edg-job-get-output -i myJobId -dir ./
This retrieves the output sandbox files and stores them in a local directory under the current directory. The directory name will be something like taipeiXX_6tJj5hmisLFXsl9zoSaw6A. Enter the output directory and look at the files std.out and result.txt:
    cd taipeiXX_6tJj5hmisLFXsl9zoSaw6A
    more std.out
    more result.txt

add: the JDL file
Look at the supplied add.jdl file:
    more add.jdl
If you chose Python, add.jdl looks like this:
    Executable = "/bin/sh";
    JobType = "Normal";
    Arguments = "runJob.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = { "add.py", "runJob.sh" };
    InputData = {"lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt"};
    DataAccessProtocol = {"rfio","gridftp","gsiftp"};
    OutputSandbox = { "std.out", "std.err", "result.txt" };
- InputData: the Logical File Names (LFNs) or Grid Unique Identifiers (GUIDs) that the job needs as input;
- DataAccessProtocol: the protocols the application can use to access the files listed in InputData on a given SE.

add: runJob.sh
Step 11:
Look at the supplied runJob.sh file:
    more runJob.sh
If you chose Python, runJob.sh looks like this:
    #!/bin/sh
    export LFC_HOST=lfc-gilda.ct.infn.it
    export LCG_GFAL_INFOSYS=glite-rb.ct.infn.it:2170
    export LCG_CATALOG_TYPE=lfc
    export PATH=$PATH:/opt/lcg/bin
    # get the file from SE
    lcg-cp -v --vo gilda lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt file:`pwd`/testFile.txt
    # execute it
    /usr/bin/env python add.py testFile.txt result.txt

Summary
- Understand the grid infrastructure:
  - Data management
  - Information management
  - WMS
- Understand the requirements and the workflow:
  - Requirement analysis
  - Modify your program
  - Keep close communication with domain experts
- Adapt existing grid applications: NA4 in EGEE

Support for application gridification
SZTAKI operates as a Grid Application Support Centre. More information

References
- JDL Attributes: https://edms.cern.ch/document/590869/1
- gLite 3.0 User Guide: https://edms.cern.ch/file/722398/1.1/
- R-GMA overview page
- GLUE Schema
- JDL attributes specification for WM proxy: https://edms.cern.ch/document/590869/1

More exercises
Another example: this program is very similar to add, but it returns the product of the 2 values on each line of the input file testFile.txt.
Download the sample code, input files and some scripts: Without Data management / With Data management.
Write your own JDL file and, if needed, a script to run on the WN.
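A minimal Python sketch of the multiplication program, assuming the same input format as add (exactly two whitespace-separated numbers per line) and the Version 2 calling convention (input and output file names as arguments). The file name multiply.py, the function names and the message text are hypothetical; the exercise's own sample code may differ.

```python
import sys


def multiply_pairs(lines):
    """Return the product of the two numbers on each non-empty line.

    Assumes the same input format as add: exactly two
    whitespace-separated numbers per line.
    """
    products = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 values, got {line.strip()!r}")
        products.append(float(fields[0]) * float(fields[1]))
    return products


def main(argv):
    """argv = [INPUT, OUTPUT], like Version 2 of add."""
    if len(argv) != 2:
        print("usage: python multiply.py INPUT OUTPUT", file=sys.stderr)
        return 2
    with open(argv[0]) as src:
        products = multiply_pairs(src)
    with open(argv[1], "w") as dst:
        for p in products:
            # Write whole numbers without a trailing ".0".
            dst.write(f"{int(p) if p.is_integer() else p}\n")
    print(f"multiply: wrote {len(products)} product(s) to {argv[1]}")
    return 0


# Run only when file arguments are given, e.g. python multiply.py testFile.txt result.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

On the WN it would be called from the job script in place of add.py, for example `/usr/bin/env python multiply.py testFile.txt result.txt`.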
You have to upload your output file (result.txt) to an SE and register it in the file catalog as /grid/gilda/training/taipei/YOUR_ACCOUNT/resultFile.txt.
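The upload can use lcg-cr, just as testFile.txt was uploaded in step 4 of the data-management practical. Below is a small Python sketch that assembles that command for result.txt. The VO, SE host and LFN directory are the tutorial's values; the helper name and the example account taipei01 are hypothetical. It only builds and prints the command line. You would add the printed line to your WN script after the program has run, or pass the list to `subprocess.run(cmd, check=True)`.

```python
import shlex
from pathlib import Path

# Values taken from the tutorial slides; adjust them for your own VO and SE.
VO = "gilda"
STORAGE_ELEMENT = "iceage-se-01.ct.infn.it"
LFN_BASE = "/grid/gilda/training/taipei"


def lcg_cr_command(local_file, account, lfn_name):
    """Return the lcg-cr argument list that copies local_file to the SE
    and registers it in the LFC as LFN_BASE/account/lfn_name."""
    # The tutorial always passes an absolute path in the file: URL.
    local = Path(local_file).resolve()
    lfn = f"lfn:{LFN_BASE}/{account}/{lfn_name}"
    return [
        "lcg-cr", "--vo", VO, "-v",
        "-d", STORAGE_ELEMENT,
        "-l", lfn,
        f"file:{local}",
    ]


if __name__ == "__main__":
    # Print the command for the exercise (taipei01 is an example account).
    print(shlex.join(lcg_cr_command("result.txt", "taipei01", "resultFile.txt")))
```

Building an argument list instead of a single string avoids shell-quoting problems if a path ever contains spaces; shlex.join is used only to display it.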