Grid Services

Jobs can be submitted to grid resources using the ARC tools, which are available in CVMFS. Our colleagues in Durham have written a good introductory tutorial; a summary of the steps required to submit and manage jobs, adapted for Glasgow users, is given below.

ARC tools

The tools required for grid job submission and management are available from CVMFS:

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS

lsetup emi

If you plan to submit jobs to the ScotGrid VO, at present you must also amend the X509_VOMSES environment variable as follows:

export X509_VOMSES=/etc/vomses

Certificates and proxies

To use grid resources, you will need a certificate, from which you can generate a proxy certificate. The proxy certificate has a relatively short lifetime, and is used to actually submit the job. A proxy is associated with a particular Virtual Organisation (VO), for example vo.scotgrid.ac.uk, which is selected when it is created. You can generate a proxy using the arcproxy command:

$ arcproxy -S <VO_ALIAS> -N

For example, to generate a proxy for the vo.scotgrid.ac.uk VO:

$ arcproxy -S vo.scotgrid.ac.uk -N
Your identity: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=bugs bunny
Contacting VOMS server (named vo.scotgrid.ac.uk): voms.gridpp.ac.uk on port: 15509
Proxy generation succeeded
Your proxy is valid until: 2018-01-19 23:36:59
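
Because a proxy is short-lived, it can be handy to check how long it has left before submitting a batch of jobs. The following is a minimal sketch that parses the "valid until" line shown in the output above; the function names proxy_expiry and seconds_left are hypothetical helpers, and seconds_left assumes a date command that accepts -d (such as GNU date).

```shell
# Extract the expiry timestamp from arcproxy output read on standard input.
# Relies on the "Your proxy is valid until:" line shown in the example above.
proxy_expiry() {
    sed -n 's/^Your proxy is valid until: //p'
}

# Print the number of seconds between now and the given timestamp.
# Assumes a date command that understands -d (for example GNU date).
seconds_left() {
    expiry_epoch=$(date -d "$1" +%s) || return 1
    now_epoch=$(date +%s)
    echo $((expiry_epoch - now_epoch))
}

# Possible usage (not run here, since it needs the ARC tools and a certificate):
#   expiry=$(arcproxy -S vo.scotgrid.ac.uk -N | proxy_expiry)
#   seconds_left "$expiry"
```

A negative result means the proxy has already expired and a new one should be generated.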

Job description (xRSL)

Before submitting a job, you need to create a file which describes the features of the job for ARC (its executable, the names of input and output files, what to do with logs, etc.). This file is written in the Extended Resource Specification Language (xRSL). A simple job description which runs a script called test.sh could look like this:

&
(executable = "test.sh")
(arguments = "")
(jobName = "TestJob")
(stdout = "stdout")
(stderr = "stderr")
(gmlog = "gmlog")
(walltime = "60")

A full description of xRSL can be found in the ARC reference manual:

http://www.nordugrid.org/documents/xrsl.pdf
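
To try the job description above, both test.sh and the xRSL file need to exist in the directory you submit from. The sketch below creates them; the contents of test.sh are a hypothetical placeholder payload, and the file name test.xrsl is simply the one used in the submission examples on this page.

```shell
# Hypothetical payload: report which machine the job ran on.
cat > test.sh <<'EOF'
#!/bin/sh
echo "Running on $(uname -n)"
EOF
chmod +x test.sh

# The job description from the example above, written to test.xrsl.
cat > test.xrsl <<'EOF'
&
(executable = "test.sh")
(arguments = "")
(jobName = "TestJob")
(stdout = "stdout")
(stderr = "stderr")
(gmlog = "gmlog")
(walltime = "60")
EOF
```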

Submitting a job

Jobs are submitted to a Compute Element (CE). The ScotGrid site at Glasgow has four CEs:

ce01.gla.scotgrid.ac.uk
ce02.gla.scotgrid.ac.uk
ce03.gla.scotgrid.ac.uk
ce04.gla.scotgrid.ac.uk

It does not matter which CE you choose to submit to. (If you've looked at the tutorial linked above, you'll see that Durham gave their CEs the sensible names ce1, ce2, etc. We thought that would be too easy.)
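
Since the choice of CE does not matter, one option is to pick one at random so that submissions are spread across all four. This is only a sketch; pick_ce is a hypothetical helper name, and it assumes the four hostnames listed above.

```shell
# Print one of the four Glasgow CE hostnames, chosen at random.
# awk's rand() returns a value in [0, 1), so n is always between 1 and 4.
pick_ce() {
    n=$(awk 'BEGIN { srand(); print int(rand() * 4) + 1 }')
    echo "ce0${n}.gla.scotgrid.ac.uk"
}

# Possible usage (not run here, since it needs the ARC tools):
#   arcsub -c "$(pick_ce)" test.xrsl
```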

Jobs are submitted using the arcsub command:

$ arcsub -c <CE_HOSTNAME> <XRSL_FILENAME>

For example, to submit test.xrsl to ce03 at Glasgow:

$ arcsub -c ce03.gla.scotgrid.ac.uk test.xrsl
Job submitted with jobid: gsiftp://ce03.gla.scotgrid.ac.uk:2811/jobs/NCKLDmEQkwrnZ4eC5pmRAbBiTBFKDmABFKDmpMFKDmABFKDmQffBxy

When a job is submitted successfully, you will be presented with its job ID, which can be used to refer to the job later. Information about submitted jobs is also recorded in a job list file; by default, this file is ~/.arc/jobs.dat (~/.arc/jobs.xml with some earlier versions of ARC), but you can choose a different location by supplying the -j argument to arcsub:

$ arcsub -j <JOBLIST_FILENAME> -c <CE_HOSTNAME> <XRSL_FILENAME>

For example:

$ arcsub -j test.dat -c ce03.gla.scotgrid.ac.uk test.xrsl
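
If you are scripting submissions, you may want to capture the job ID rather than copy it by hand. The sketch below extracts it from the "Job submitted with jobid:" line shown in the example output above; extract_job_id is a hypothetical helper name, and the approach assumes arcsub prints that line in the same form on standard output.

```shell
# Print the job ID found in arcsub output read on standard input.
extract_job_id() {
    sed -n 's/^Job submitted with jobid: //p'
}

# Possible usage (not run here, since it needs the ARC tools and a proxy):
#   job_id=$(arcsub -c ce03.gla.scotgrid.ac.uk test.xrsl | extract_job_id)
#   echo "Submitted as $job_id"
```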

Querying the status of a job

You can obtain information about the status of jobs using the arcstat command:

$ arcstat <JOB_ID>

For example, to obtain information about the job submitted in the previous step:

$ arcstat gsiftp://ce03.gla.scotgrid.ac.uk:2811/jobs/NCKLDmEQkwrnZ4eC5pmRAbBiTBFKDmABFKDmpMFKDmABFKDmQffBxy
Job: gsiftp://ce03.gla.scotgrid.ac.uk:2811/jobs/NCKLDmEQkwrnZ4eC5pmRAbBiTBFKDmABFKDmpMFKDmABFKDmQffBxy
 Name: TestJob
 State: Queuing

Status of 1 jobs was queried, 1 jobs returned information

You may have to wait a few minutes after submitting a job before status information becomes available.

You can also query the status of all jobs in a job list file:

$ arcstat -j <JOBLIST_FILENAME>
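
Rather than rerunning arcstat by hand, a script can poll until the job is done. The following is a rough sketch: job_state and wait_for_job are hypothetical helper names, the parsing relies on the "State:" line shown in the example output above, and the list of terminal states (Finished, Failed, Killed, Deleted) is an assumption that you should check against the output of your ARC version.

```shell
# Print the value of the first "State:" line in arcstat output read on stdin.
job_state() {
    sed -n 's/^ *State: *//p' | head -n 1
}

# Poll arcstat for the given job ID until it reaches a terminal state.
# $1 = job ID, $2 = seconds between polls (defaults to 60).
# The set of terminal states below is an assumption, not taken from this page.
wait_for_job() {
    while :; do
        state=$(arcstat "$1" 2>/dev/null | job_state)
        case "$state" in
            Finished*|Failed*|Killed*|Deleted*)
                echo "$state"
                return 0
                ;;
        esac
        sleep "${2:-60}"
    done
}

# Possible usage (not run here, since it needs the ARC tools and a proxy):
#   wait_for_job "$job_id" 120
```

A polling interval of a minute or more keeps the load on the CE low, and fits the delay before status information first becomes available.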

Retrieving job output

Output and log files for a job can be retrieved using the arcget command. As when querying the status of a job, you can use either a job ID or a job list file with this command:

$ arcget <JOB_ID>
$ arcget -j <JOBLIST_FILENAME>

For example, to get the output of the job submitted above:

$ arcget gsiftp://ce03.gla.scotgrid.ac.uk:2811/jobs/NCKLDmEQkwrnZ4eC5pmRAbBiTBFKDmABFKDmpMFKDmABFKDmQffBxy
Results stored at: p6vLDmj3kwrnZ4eC3pmXXsQmABFKDmABFKDm9pFKDmABFKDmtVM1wm
Jobs processed: 1, successfully retrieved: 1, successfully cleaned: 1

You will only be able to retrieve job output once the job has finished.
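
When retrieving many jobs from a job list file, it is easy to miss that some were not ready. One way to check, sketched below, is to compare the counts on the summary line that arcget prints; all_retrieved is a hypothetical helper name, and the parsing assumes the summary line has the same form as in the example above.

```shell
# Succeed only if arcget output (on stdin) reports that every processed job
# was retrieved. Splits the "Jobs processed: N, successfully retrieved: M, ..."
# line on ":" and "," so that field 2 is N and field 4 is M.
all_retrieved() {
    awk -F'[:,] *' '
        /^Jobs processed:/ { found = 1; ok = ($2 == $4) }
        END { exit !(found && ok) }
    '
}

# Possible usage (not run here, since it needs the ARC tools and a proxy):
#   arcget -j test.dat | all_retrieved || echo "Some jobs were not retrieved"
```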

Copying input and output files ("staging")

You can tell ARC to copy input and output files to and from the compute element by including additional attributes in your xRSL file:

&
(executable = "test.sh")
(arguments = "")
(jobName = "TestJob")
(inputFiles = ("input.dat" ""))
(outputFiles = ("output.txt" "")
               ("results.tgz" "")
)
(stdout = "stdout")
(stderr = "stderr")
(gmlog = "gmlog")
(walltime = "60")

Files used in the executable, stdout and stderr attributes are transferred automatically, but other files should be listed in the inputFiles or outputFiles attribute as necessary. The inputFiles and outputFiles attributes each take one or more values like this:

("<FILENAME>" "<URL>")

Where <URL> is left blank, ARC transfers the file to or from the submission machine (this would be the case for input.dat, output.txt and results.tgz in the example xRSL above). Alternatively, a URL may be provided to copy the file to or from a remote resource:

("index.html" "http://www.example.org/index.html")
("rabbits.zip" "ftp://ftp.example.org/rabbits.zip")
("values.dat" "gsiftp://gridftp.example.org/data/values.dat")

Various protocols are supported, including Rucio and SRM, and details can be found in the ARC reference manual:

http://www.nordugrid.org/documents/xrsl.pdf
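
If you generate xRSL files from a script, the inputFiles and outputFiles attributes can be built from a list of file names and URLs. The sketch below shows one way to do this; xrsl_files is a hypothetical helper name, and its NAME=URL argument convention is invented for this example (it does not support file names that contain "=").

```shell
# Print an xRSL file-staging attribute.
# Usage: xrsl_files ATTRIBUTE NAME=URL [NAME=URL ...]
# An empty URL (for example "input.dat=") means the submission machine.
xrsl_files() {
    attr=$1
    shift
    printf '(%s =' "$attr"
    for pair in "$@"; do
        name=${pair%%=*}   # everything before the first "="
        url=${pair#*=}     # everything after the first "="
        printf ' ("%s" "%s")' "$name" "$url"
    done
    printf ')\n'
}

xrsl_files inputFiles "input.dat=" "values.dat=gsiftp://gridftp.example.org/data/values.dat"
xrsl_files outputFiles "output.txt=" "results.tgz="
```

The two calls print an inputFiles line and an outputFiles line that can be pasted into, or appended to, a job description.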

Because files take a slightly convoluted route from the submission machine through the CE to the compute node, it is easiest to avoid paths when specifying outputFiles. If your script creates files in subdirectories, copy them back to $HOME at the end of the script; on the compute node, $HOME is a working directory belonging to your job and is not related to your own home directory. You may also wish to add multiple output files or directories to a single archive, which further simplifies retrieving the results.
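
As an illustration of the archive approach, here is a sketch of a job script that writes its results into subdirectories and then bundles them into a single results.tgz, matching the outputFiles entry in the staging example above. The directory and file names are hypothetical, and the script assumes that it starts in the job's working directory, so the archive is written there with no path.

```shell
#!/bin/sh
# Remember the directory the job starts in (assumed to be the job's
# working directory on the compute node).
start_dir=$(pwd)

# Hypothetical payload: produce some output in subdirectories.
mkdir -p plots tables
echo "histogram data" > plots/hist.txt
echo "fit results" > tables/fit.txt

# Bundle the subdirectories into one archive at the top level, so the xRSL
# only needs ("results.tgz" "") in outputFiles.
tar -czf "$start_dir/results.tgz" plots tables
```

After retrieving the job output, the archive can be unpacked with tar -xzf results.tgz.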

Topic revision: r19 - 2024-03-06 - GordonStewart