Batch System
The PPE group has limited resources for batch computing. The ppepbs batch system is provided for running a small number of jobs. If you need to run a large number of jobs, please use the Grid or try the Compute Cluster.
The PPE batch system is managed via the TORQUE Resource Manager (based on OpenPBS) and the Maui scheduler. The batch system can be accessed from any Linux desktop using the PBS commands described below.
The batch nodes are installed with a mixture of 64-bit SL4 and SL5. There are 47 CPUs for SL5 jobs and 40 CPUs for SL4 jobs. Eight queues are available, split into two groups: four queues for SL4 jobs and four queues for SL5 jobs (see the Queues section below). Executables should be built on one of the PPE Linux desktop machines of the required flavour. The version of Scientific Linux installed on a machine can be checked by examining the /etc/redhat-release file:
cat /etc/redhat-release
and to check whether a machine is 32-bit or 64-bit:
uname -m
A 64-bit machine will return x86_64 and a 32-bit machine i686.
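The two checks above can be combined into a small helper script. This is only a sketch: the function names sl_major_version and arch_bits are made up for illustration, and it assumes the release file contains the word "release" followed by a version number (for example "Scientific Linux SL release 5.2 (Boron)").

```shell
# sl_major_version RELEASE_STRING
# Prints the major version number that follows the word "release".
# (Hypothetical helper; assumes the usual redhat-release wording.)
sl_major_version() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}

# arch_bits MACHINE
# Maps the output of `uname -m` to 64, 32 or "unknown".
arch_bits() {
    case "$1" in
        x86_64) echo 64 ;;
        i?86)   echo 32 ;;
        *)      echo unknown ;;
    esac
}

# Report on the current machine; the release file is skipped if absent.
if [ -r /etc/redhat-release ]; then
    echo "SL major version: $(sl_major_version "$(cat /etc/redhat-release)")"
fi
echo "Word size: $(arch_bits "$(uname -m)")"
```

The major version printed here corresponds to the digit at the end of the queue names (4 or 5), so it indicates which group of queues a binary built on this machine is suited to.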
Job submission
From any PPE Linux desktop, jobs can be submitted to a TORQUE queue via qsub, e.g.:
ssh ppepbs
qsub test.job
where test.job might contain:
#PBS -N TestJob
#PBS -l walltime=1,mem=1024Mb
#PBS -m abe
#PBS -M user@machine
#
echo "This is a test..."
More documentation is given in the qsub man page.
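A slightly fuller job script might look like the sketch below. The job name, queue choice and resource values are illustrative assumptions, not site recommendations. It uses the standard -q directive to name a queue and the PBS_O_WORKDIR and PBS_JOBID variables that TORQUE sets for a running job, with fallbacks so the script can also be tried out interactively.

```shell
#!/bin/sh
#PBS -N LongerJob
#PBS -q medium5
#PBS -l walltime=02:00:00,mem=1024mb
#PBS -j oe

# Batch jobs normally start in the home directory, so return to the
# directory the job was submitted from. Outside the batch system
# PBS_O_WORKDIR is unset and the current directory is used instead.
work_dir=${PBS_O_WORKDIR:-$PWD}
cd "$work_dir" || echo "cannot change to $work_dir" >&2

# PBS_JOBID is only set inside a running batch job.
job_id=${PBS_JOBID:-not-in-batch}
echo "Running on $(uname -n) as job $job_id in $work_dir"
```

Here walltime is written as HH:MM:SS and -j oe merges standard error into the standard output file. The requested walltime has to fit within the limit of the chosen queue.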
Queues
There are currently eight queues on the batch system. The four queues ending in '4' will run jobs on SL4 machines and the four queues ending in '5' will run jobs on SL5 machines:
Queue            Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
short4           --     --       01:00:00 --   0   0   -- E R
medium4          --     --       06:00:00 --   0   0   -- E R
long4            --     --       24:00:00 --   0   0   -- E R
vlong4           --     --       120:00:0 --   0   0   -- E R
short5           --     --       01:00:00 --   0   0   -- E R
medium5          --     --       06:00:00 --   0   0   -- E R
long5            --     --       24:00:00 --   0   0   -- E R
vlong5           --     --       120:00:0 --   0   0   -- E R
where short5 is the default queue and Walltime is the maximum walltime allowed on each queue.
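As an illustration of how the walltime limits map onto queue names, the following sketch picks the shortest queue that can hold a job of a given length. The function pick_queue is hypothetical and not part of TORQUE or Maui. It assumes the limits are 1, 6, 24 and 120 hours, as listed in the table, and that the requested time is given in whole hours.

```shell
# pick_queue HOURS FLAVOUR
# Prints the shortest queue whose walltime limit covers HOURS.
# FLAVOUR is 4 for SL4 nodes or 5 for SL5 nodes.
pick_queue() {
    hours=$1
    flavour=$2
    if   [ "$hours" -le 1 ];   then echo "short$flavour"
    elif [ "$hours" -le 6 ];   then echo "medium$flavour"
    elif [ "$hours" -le 24 ];  then echo "long$flavour"
    elif [ "$hours" -le 120 ]; then echo "vlong$flavour"
    else
        echo "no queue allows ${hours}h of walltime" >&2
        return 1
    fi
}

pick_queue 3 5    # prints medium5
```

The result could then be passed to qsub with its -q option, for example qsub -q "$(pick_queue 3 5)" test.job.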
While it is possible to view your own jobs with qstat, the command will not display all jobs. To display all jobs, use the Maui client command showq. To see the current priorities of waiting jobs, use the command showq -i.
Job Pre-emption
Jobs in the vlong4 and vlong5 queues can be preempted by jobs waiting in the short4, short5, medium4 or medium5 queues. A preempted job is placed in the suspended state: it remains in memory but is no longer being executed. Once the preempting job has finished, the preempted job starts executing again.
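The pre-emption rule can be written down as a small predicate. This is a literal reading of the paragraph above and not the scheduler's actual configuration: the function can_preempt is hypothetical, and it assumes any listed short or medium queue may pre-empt either vlong queue, regardless of SL flavour.

```shell
# can_preempt WAITING_QUEUE RUNNING_QUEUE
# Succeeds if a job waiting in WAITING_QUEUE may pre-empt a job
# running in RUNNING_QUEUE, according to the rule described above.
can_preempt() {
    # Only jobs in the vlong queues can be pre-empted.
    case "$2" in
        vlong4|vlong5) ;;
        *) return 1 ;;
    esac
    # Only jobs waiting in the short and medium queues can pre-empt.
    case "$1" in
        short4|short5|medium4|medium5) return 0 ;;
        *) return 1 ;;
    esac
}

if can_preempt medium5 vlong5; then
    echo "medium5 jobs can suspend vlong5 jobs"
fi
```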
Job Priority
The priority of a job is the sum of several weighting factors.
- There is a constant weighting given to short jobs and a smaller weighting given to medium and long jobs, so that if all other factors are equal a short job will have priority.
- The primary weighting is user fairshare. As a user's jobs run, their usage increases and the priority of their queued jobs decreases. This is balanced so that a user who uses exactly their fairshare allotment (currently 20% of the CPU averaged over the previous week) will have the priority of their medium jobs reduced to equal the vlong job priority of someone who has not used the batch system in the previous week.
- A waiting job's priority slowly increases as a function of time spent waiting in the queue. Currently a vlong job would have to wait several weeks to match the priority of a medium queue job, all other things being equal.
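The way these three factors combine can be sketched as a toy calculation. All numbers below are invented for illustration and are not the weights configured in Maui. They are chosen only so that the two relationships described above hold: a medium job from a user at exactly their fairshare allotment ties with a vlong job from an idle user, and a vlong job needs about 20 days (an assumed figure for "several weeks") of waiting to match a fresh medium job.

```shell
# priority QUEUE_CLASS USAGE_PCT WAIT_DAYS
#   QUEUE_CLASS  short, medium, long or vlong
#   USAGE_PCT    usage as a percentage of the fairshare allotment
#                (100 means exactly the allotment)
#   WAIT_DAYS    whole days spent waiting in the queue
# All weights are hypothetical.
priority() {
    queue=$1
    usage_pct=$2
    wait_days=$3
    case "$queue" in
        short)  bonus=300 ;;
        medium) bonus=200 ;;
        long)   bonus=100 ;;
        *)      bonus=0 ;;
    esac
    echo $(( bonus - 2 * usage_pct + 10 * wait_days ))
}

priority medium 100 0   # prints 0
priority vlong 0 0      # prints 0
priority vlong 0 20     # prints 200
priority medium 0 0     # prints 200
```

The actual priorities of waiting jobs are reported by showq -i.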
--
AndrewPickford - 12 Jan 2009