Batch System
The PPE group maintains a PBS cluster for running small quantities of jobs. If you need to run large numbers of jobs, you should investigate the possibility of running on ScotGrid. The current composition of the batch system is as follows:
Nodes                       | Operating System   | Total CPU Cores
--------------------------- | ------------------ | ---------------
node007                     | Scientific Linux 6 | 40
node008                     | Scientific Linux 5 | 4
node013 to node017          | Scientific Linux 5 | 20
node019                     | Scientific Linux 6 | 4
node034                     | Scientific Linux 6 | 56
tempnode001 to tempnode006  | Scientific Linux 5 | 24
tempnode007 to tempnode015  | Scientific Linux 6 | 36
The following queues are provided:
Name    | Operating System   | Maximum runtime
------- | ------------------ | ---------------
long5   | Scientific Linux 5 | 1 day
long6   | Scientific Linux 6 | 1 day
medium5 | Scientific Linux 5 | 6 hours
medium6 | Scientific Linux 6 | 6 hours
short5  | Scientific Linux 5 | 1 hour
short6  | Scientific Linux 6 | 1 hour
vlong5  | Scientific Linux 5 | 5 days
vlong6  | Scientific Linux 6 | 5 days
Jobs running in the vlong* queues can be pre-empted by jobs in the short* and medium* queues. A pre-empted job is placed in the suspended state; it remains in memory on the compute node, but is no longer being executed. Once the pre-empting job has finished, the pre-empted job will be allowed to continue.
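Jobs go to the default queue (short5; see below) unless another queue is requested explicitly, so to use a queue with a longer runtime limit you must name it. A minimal sketch, where myjob.pbs is a hypothetical submission script:
$ qsub -q vlong6 myjob.pbs
The same effect can be had by adding a #PBS -q vlong6 directive to the script itself.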
The PBS headnode is offler.ppe.gla.ac.uk, and you will see this name in the output of various PBS commands.
Using PBS
Batch jobs can be submitted and managed from any Linux desktop using the commands described in this section. Further information on these commands can be found in the Linux man pages and in the documentation linked at the bottom of this page.
Create a submission script
Jobs are defined using a submission script, which is like a shell script with the addition of certain directives (indicated by the #PBS prefix) which tell PBS how the job should be handled. A simple submission script might look like the following:
# Name the job "TestJob"
#PBS -N TestJob
# Request one hour of walltime and 1024 MB of memory
# (walltime uses the format [[HH:]MM:]SS)
#PBS -l walltime=01:00:00,mem=1024mb
# Send mail when the job aborts (a), begins (b) and ends (e)
#PBS -m abe
#PBS -M user@machine
#
echo "This is a test..."
Submit a job
Jobs are submitted using the qsub command:
$ qsub <FILENAME>
After running this command, the ID of the newly-submitted job will be printed. For example, to submit a job defined by the submission script test.pbs:
$ qsub test.pbs
1000150.offler.ppe.gla.ac.uk
The numerical portion of this ID (1000150 in this example) can be used to manage the job in the future.
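If you submit jobs from a shell script, it can be convenient to capture this numerical ID at submission time. A minimal sketch, reusing the test.pbs script from above:
$ JOBID=$(qsub test.pbs | cut -d. -f1)
$ echo "Submitted job ${JOBID}"
Here cut strips the .offler.ppe.gla.ac.uk suffix from qsub's output, leaving just the number.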
Show running jobs
You can view details of submitted jobs using the qstat command:
$ qstat
offler.ppe.gla.ac.uk:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
1000151.offler.p rrabbit medium6 maus_sim_814 56289 1 1 -- 05:59 R 03:21 node034
1000152.offler.p bbunny long6 test_job 29669 1 1 -- 24:00 R 01:24 node007
This output shows, for each job, its ID, owner, queue and name, together with the requested and elapsed walltime, its state (R for running) and the node it is executing on.
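To see the full attributes of a single job (resource usage, output paths and so on), pass its ID to qstat with the -f flag; for example, taking the first job shown above:
$ qstat -f 1000151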
Queues
There are currently eight queues on the batch system. The four queues ending in '5' will run jobs on SL5 machines and the four queues ending in '6' will run jobs on SL6 machines, as shown by qstat -q:
$ qstat -q
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short5           --     --       01:00:00 --      0   0 --   E R
medium5          --     --       06:00:00 --      0   0 --   E R
long5            --     --       24:00:00 --      0   0 --   E R
vlong5           --     --       120:00:0 --      0   0 --   E R
short6           --     --       01:00:00 --      0   0 --   E R
medium6          --     --       06:00:00 --      0   0 --   E R
long6            --     --       24:00:00 --      0   0 --   E R
vlong6           --     --       120:00:0 --      0   0 --   E R
Here short5 is the default queue and Walltime is the maximum walltime allowed on each queue (qstat truncates the vlong limit of 120:00:00, i.e. 5 days, to fit the column).
While it is possible to view your own jobs with qstat, the command will not display all jobs. To display all jobs, use the Maui client command showq. To see the current priorities of waiting jobs, use the command showq -i.
Job Priority
The priority of a job is the sum of several weighting factors:
- There is a constant weighting given to short jobs and a smaller weighting given to medium and long jobs, so that if all other factors are equal, short jobs will have priority.
- The primary weighting is user fairshare. As a user's jobs run, their usage increases and the priority of their queued jobs decreases. This is balanced so that a user who uses exactly their fairshare allotment (currently 20% of the CPU, averaged over the previous 48 days) will have their medium job priority decreased such that it equals the vlong job priority of someone who has not used the batch system in the previous 48 days.
- A waiting job's priority slowly increases as a function of time spent in the queue. Currently a vlong job would have to wait several weeks to match the priority of a medium queue job, all other things being equal (see the example below).
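To see how Maui has scored and scheduled a particular job, the checkjob command can be useful. A hedged example, assuming the Maui client tools are available on your desktop and reusing the job ID from earlier:
$ checkjob 1000150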
Delete a job
Jobs are deleted using the qdel command:
$ qdel <JOB_ID>
For example, to delete the job with ID 12345:
$ qdel 12345
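To delete several jobs at once, qdel can be combined with qselect, which prints the IDs of jobs matching given criteria. For example, to select and delete all of your own jobs:
$ qdel $(qselect -u $USER)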
References