Batch System

Overview

The PPE group maintains a PBS cluster for running small numbers of jobs. If you need to run large numbers of jobs, consider running them on ScotGrid instead. The batch system currently consists of the following nodes:

Nodes                 Operating System      Total CPU cores
node001 to node003    Scientific Linux 6    96
node007               Scientific Linux 6    40
node008               Scientific Linux 5    4
node013 to node015    Scientific Linux 5    12
node034               Scientific Linux 6    56

The PBS headnode is offler.ppe.gla.ac.uk, and you will see this name in the output of various PBS commands.

Queues

Name      Operating System      Maximum runtime
short5    Scientific Linux 5    1 hour
medium5   Scientific Linux 5    6 hours
long5     Scientific Linux 5    1 day
vlong5    Scientific Linux 5    5 days
short6    Scientific Linux 6    1 hour
medium6   Scientific Linux 6    6 hours
long6     Scientific Linux 6    1 day
vlong6    Scientific Linux 6    5 days
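Choosing a queue comes down to the expected runtime and the Scientific Linux version your job needs. As an illustration, the hypothetical shell function pick_queue below (it is not part of PBS) returns the shortest queue whose maximum runtime covers a given runtime in minutes. The limits are copied by hand from the queue table, so treat this as a sketch that would need updating if the queues change.

```shell
#!/bin/bash
# Hypothetical helper (not a PBS command): print the shortest queue whose
# maximum runtime covers the expected runtime.
# Usage: pick_queue <runtime_in_minutes> <sl_version>   (sl_version: 5 or 6)
pick_queue() {
    local minutes=$1 os=$2 tier
    if [ "$os" != 5 ] && [ "$os" != 6 ]; then
        echo "pick_queue: Scientific Linux version must be 5 or 6" >&2
        return 1
    fi
    if   [ "$minutes" -le 60 ];   then tier=short    # up to 1 hour
    elif [ "$minutes" -le 360 ];  then tier=medium   # up to 6 hours
    elif [ "$minutes" -le 1440 ]; then tier=long     # up to 1 day
    elif [ "$minutes" -le 7200 ]; then tier=vlong    # up to 5 days
    else
        echo "pick_queue: runtime exceeds 5 days; consider ScotGrid" >&2
        return 1
    fi
    echo "${tier}${os}"
}
```

The result can be passed to qsub's -q option, for example qsub -q "$(pick_queue 200 6)" test.pbs, which selects medium6. Picking the shortest suitable queue also keeps a job out of the vlong queues, where it could be pre-empted.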

Jobs running in the vlong* queues can be pre-empted by jobs in the short* and medium* queues. A pre-empted job is placed in the suspended state; it remains in memory on the compute node, but is no longer being executed. Once the pre-empting job has finished, the pre-empted job will be allowed to continue.

Job Prioritisation

The cluster is configured with a fair-share scheduler, which aims to distribute compute time fairly among users. When multiple users are competing for resources, preference will be shown to users whose recent usage has been lower. Short jobs are also generally given priority over longer jobs.

Using PBS

Batch jobs can be submitted and managed from any Linux desktop using the commands described in this section. Further information on these commands can be found in the documentation linked under References at the bottom of this page, and in each command's man page (for example, man qsub).

Create a submission script

Jobs are defined in a submission script: an ordinary shell script with added directives (lines starting with #PBS) that tell PBS how to handle the job. A simple submission script might look like this:

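#!/bin/bash
# Job name, as shown by qstat
#PBS -N TestJob
# Write output to test.log, merging stderr (e) into stdout (o)
#PBS -o test.log
#PBS -j oe
# Request 1024 MB of memory
#PBS -l mem=1024Mb

echo "This is a test..."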
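PBS normally starts a job in your home directory rather than in the directory you submitted it from. A common pattern, sketched below, is to change to the submission directory using the PBS_O_WORKDIR environment variable, which PBS sets for each job. The fallbacks to $PWD and to placeholder values for PBS_JOBID and PBS_JOBNAME are an assumption added here so that the same file can also be tried with plain bash outside PBS.

```shell
#!/bin/bash
#PBS -N TestJob
#PBS -o test.log
#PBS -j oe
#PBS -l mem=1024Mb

# Move to the directory qsub was run from. Outside PBS (e.g. "bash test.pbs")
# PBS_O_WORKDIR is unset, so stay in the current directory instead.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_JOBID and PBS_JOBNAME are set by PBS; use placeholders when run locally.
msg="Job ${PBS_JOBID:-local} (${PBS_JOBNAME:-TestJob}) on $(hostname) in $(pwd)"
echo "$msg"
```

Relative paths used later in the script, such as input files sitting next to test.pbs, then resolve the same way they would in your interactive shell.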

Submit a job

Jobs are submitted using the qsub command:

$ qsub <FILENAME>

qsub prints the ID of the newly submitted job. For example, to submit the job defined by the submission script test.pbs:

$ qsub test.pbs
1000150.offler.ppe.gla.ac.uk

The numerical portion of this ID (1000150 in this example) can be used to manage the job in the future.
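When scripting around PBS it can be convenient to capture the ID that qsub prints and keep only its numerical part. A minimal sketch, using a hypothetical helper named job_number and plain bash parameter expansion:

```shell
#!/bin/bash
# Hypothetical helper: strip the server suffix from a job ID printed by qsub,
# e.g. "1000150.offler.ppe.gla.ac.uk" -> "1000150".
# An ID without a suffix is returned unchanged.
job_number() {
    local id=$1
    echo "${id%%.*}"   # remove everything from the first "." onwards
}
```

On the cluster this might be used as jobid=$(qsub test.pbs) followed by qstat "$(job_number "$jobid")". In Torque, the full ID can also be passed to qsub's -W depend=afterok:<JOB_ID> option so that a second job (next.pbs is a hypothetical name) only starts once the first has finished successfully.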

Show running jobs

You can view details of submitted jobs using the qstat command:

$ qstat

offler.ppe.gla.ac.uk:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
1000151.offler.p     rrabbit  medium6  test_job_123      56299     1   1    --  05:59 R 03:21   node034
1000152.offler.p     bbunny   long6    test_job          29369     1   1    --  24:00 R 01:24   node007

You can also provide a job ID to limit the output to a particular job:

$ qstat 1000151

offler.ppe.gla.ac.uk:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
1000151.offler.p     rrabbit  medium6  test_job_123      56299     1   1    --  05:59 R 03:21   node034
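With many jobs in the system it can help to summarise the qstat output rather than read it line by line. The hypothetical function job_state_counts below reads qstat output on standard input and prints how many jobs are in each state. It assumes the column layout shown above, where the state ("S") is the 10th whitespace-separated field of each job line, and that job lines are the only ones starting with a numerical job ID.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state in qstat output read from stdin.
# Assumes the column layout shown above (state "S" is field 10).
# Prints one "<state> <count>" line per state, sorted, e.g. "R 2".
job_state_counts() {
    awk '$1 ~ /^[0-9]+\./ { n[$10]++ }          # job lines only
         END { for (s in n) print s, n[s] }' | sort
}
```

On the cluster this would be used as qstat | job_state_counts, or qstat -u "$USER" | job_state_counts to count only your own jobs. Because it relies on field positions, it will give wrong results if the output format differs from the one shown on this page.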

Delete a job

Jobs are deleted using the qdel command:

$ qdel <JOB_ID>

To delete the job with ID 12345:

$ qdel 12345
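To remove all of your own jobs at once, the list of IDs can be taken from qselect, which comes with PBS and prints one job ID per line for the jobs matching its options (-u selects jobs by owner). A minimal sketch, wrapped in a hypothetical function named delete_my_jobs:

```shell
#!/bin/bash
# Hypothetical helper: delete every job owned by the current user.
# qselect -u prints the IDs of that user's jobs, one per line.
delete_my_jobs() {
    qselect -u "$USER" | while read -r id; do
        echo "Deleting $id"
        qdel "$id"
    done
}
```

This deletes every job you own, whether running or queued, so it is worth checking the list first by running qselect -u "$USER" on its own.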

References

Topic revision: r16 - 2017-05-30 - GordonStewart
 
Copyright © 2008-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.