
Batch System

The PPE group has limited resources for batch computing. The ppepbs batch system is provided for running a small number of jobs. If a large number of jobs are needed then please use the Grid or try the Compute Cluster.

The PPE batch system is managed via the TORQUE Resource Manager (based on OpenPBS) and the Maui scheduler. The batch system can be accessed from any Linux desktop using the PBS commands described below.

The batch nodes are installed with a mixture of 64-bit SL4 and SL5. There are 47 CPUs for SL5 jobs and 40 CPUs for SL4 jobs. Eight queues are available, split into two groups: four queues for SL4 jobs and four queues for SL5 jobs (see the queues section below). Executables should be built on one of the PPE Linux desktop machines of the required flavour. The version of Scientific Linux installed on a machine can be checked by examining the /etc/redhat-release file:

cat /etc/redhat-release

and to check whether a machine is 32 or 64 bit:

uname -m

A 64-bit machine will return x86_64 and a 32-bit machine i686.
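
The two checks above can be combined into a small helper that suggests which queue suffix matches the machine an executable was built on. This is only a sketch: the function names are made up for illustration, and the sample release string in the comment is assumed to be typical of Scientific Linux rather than copied from a PPE machine.

```shell
# Extract the major version number from a redhat-release style string.
# Example input (assumed format): "Scientific Linux SL release 5.5 (Boron)"
sl_major() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}

# Print the queue suffix (4 or 5) for the machine this runs on,
# or fail if the release is neither SL4 nor SL5.
queue_suffix_for_host() {
    major=$(sl_major "$(cat /etc/redhat-release)")
    case "$major" in
        4|5) echo "$major" ;;
        *)   echo "unsupported release: $major" >&2; return 1 ;;
    esac
}

# Usage on a desktop:
#   echo "build host: SL$(queue_suffix_for_host), $(uname -m)"
```

Only the major version matters here, because the queues are grouped by SL4 and SL5 rather than by minor release.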

Job submission

From any PPE Linux desktop, jobs can be submitted to a TORQUE queue via qsub, e.g.:

qsub test.job

where test.job might contain

#PBS -N TestJob
#PBS -l walltime=1,mem=1024Mb
#PBS -m abe
#PBS -M user@machine
echo "This is a test..."

More documentation is given in the qsub man page.
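
A slightly fuller job file might look like the sketch below. The queue name is taken from the queue list on this page; the walltime and memory values are arbitrary illustrations, and job_main is a hypothetical wrapper name. According to the TORQUE documentation, walltime is read as seconds or as [[HH:]MM:]SS, so walltime=1 in the example above requests a single second; the sketch asks for ten minutes instead.

```shell
#!/bin/sh
# Directives: job name, target queue, resource limits,
# and "-j oe" to merge stderr into the stdout file.
#PBS -N TestJob
#PBS -q short5
#PBS -l walltime=00:10:00,mem=1024mb
#PBS -j oe

# Hypothetical payload wrapper; replace the echo with the real work.
job_main() {
    # TORQUE sets PBS_O_WORKDIR to the directory qsub was run from.
    # Fall back to the current directory when run outside the batch system.
    cd "${PBS_O_WORKDIR:-.}" || return 1
    echo "TestJob running in $(pwd)"
}

job_main
```

Because of the fallback, the same file can be run directly on a desktop as a quick check before it is handed to qsub.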


Queues

There are currently eight queues on the batch system. The four queues ending in '4' run jobs on SL4 machines and the four queues ending in '5' run jobs on SL5 machines:

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short4             --      --    01:00:00   --    0   0 --   E R
medium4            --      --    06:00:00   --    0   0 --   E R
long4              --      --    24:00:00   --    0   0 --   E R
vlong4             --      --    120:00:0   --    0   0 --   E R
short5             --      --    01:00:00   --    0   0 --   E R
medium5            --      --    06:00:00   --    0   0 --   E R
long5              --      --    24:00:00   --    0   0 --   E R
vlong5             --      --    120:00:0   --    0   0 --   E R

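where short5 is the default queue and Walltime is the maximum walltime allowed on each queue. The vlong entries, shown as 120:00:0, correspond to 120 hours; the listing truncates the final digit.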
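
The walltime limits in the table can be turned into a simple queue chooser. The following is a sketch only: pick_queue is a hypothetical helper, not a command provided by the batch system, and it works in whole hours for simplicity.

```shell
# Print the shortest queue whose walltime limit covers the request.
#   $1 = required walltime in whole hours
#   $2 = Scientific Linux major version of the build machine (4 or 5)
pick_queue() {
    hours=$1
    sl=$2
    case "$sl" in
        4|5) ;;
        *) echo "SL version must be 4 or 5" >&2; return 1 ;;
    esac
    if   [ "$hours" -le 1 ];   then echo "short$sl"
    elif [ "$hours" -le 6 ];   then echo "medium$sl"
    elif [ "$hours" -le 24 ];  then echo "long$sl"
    elif [ "$hours" -le 120 ]; then echo "vlong$sl"
    else
        echo "no queue allows $hours hours" >&2
        return 1
    fi
}

# Usage (test.job is the job file from the submission example):
#   qsub -q "$(pick_queue 5 5)" -l walltime=05:00:00 test.job
```

Choosing the shortest queue that fits is deliberate: shorter queues receive a larger priority weighting, and only the vlong queues are subject to pre-emption.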

While it is possible to view your own jobs with qstat, the command will not display all jobs. To display all jobs, use the Maui client command showq.

To see the current priorities of waiting jobs use the command showq -i.

Job Pre-emption

Jobs in the vlong4 and vlong5 queues can be preempted by jobs waiting in the short4, short5, medium4 or medium5 queues. A preempted job is placed in the suspended state: it remains in memory but is no longer being executed. Once the preempting job has finished, the preempted job starts executing again.

Job Priority

The priority of a job is the sum of several weighting factors.

  • There is a constant weighting given to short jobs and a smaller weighting given to medium and long jobs, so that if all other factors are equal, short jobs will have priority.
  • The primary weighting is user fairshare. As a user's jobs run, their usage increases and the priority of their queued jobs decreases. This is balanced so that a user who uses exactly their fairshare allotment (currently 20% of the CPU averaged over the previous 48 days) has their medium job priority reduced to equal the vlong job priority of a user who has not used the batch system in the previous 48 days.
  • A waiting job's priority slowly increases as a function of time spent in the queue. Currently a vlong job would have to wait several weeks to match the priority of a medium queue job, all other things being equal.
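
The way the three factors add up can be illustrated with a toy model. Every number in it is hypothetical: the real Maui weights are not given on this page, and the function names are invented for the example. The values are chosen only so that the relationships described above hold.

```shell
# Hypothetical constant weighting per queue type (the suffix 4 or 5 is ignored).
queue_weight() {
    case "$1" in
        short*)  echo 300 ;;
        medium*) echo 200 ;;
        long*)   echo 100 ;;
        vlong*)  echo 0 ;;
        *) return 1 ;;
    esac
}

# priority QUEUE USAGE_PERCENT DAYS_WAITING
# Sum of queue weighting, fairshare penalty and waiting-time bonus.
priority() {
    base=$(queue_weight "$1") || return 1
    penalty=$(( $2 * 10 ))   # 20% usage costs 200, the medium-to-vlong gap
    waited=$(( $3 * 7 ))     # slow growth with time spent in the queue
    echo $(( base - penalty + waited ))
}

# Usage:
#   priority medium5 20 0    # user at exactly their fairshare allotment
#   priority vlong5 0 0      # user with no recent usage; same value as above
#   priority vlong5 0 29     # about four weeks of waiting overtakes a fresh medium job
```

In this model a medium job from a user at the 20% allotment and a vlong job from an idle user both score 0, which mirrors the balance described in the fairshare bullet.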

Killing a job

Jobs may be terminated by executing qdel JOBID, where JOBID is the numerical ID code shown in the qstat listing.
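
qsub prints the identifier of the job it creates, usually as a number followed by a server name, so the numerical part can be captured at submission time instead of being read from qstat. The helper below is a sketch; numeric_jobid is a hypothetical name and the ".ppepbs" suffix in the comment is an assumed example of what the server part might look like.

```shell
# Strip the server suffix from a job identifier such as "12345.ppepbs".
# The ".ppepbs" suffix is a hypothetical example; qsub prints the real one.
numeric_jobid() {
    echo "${1%%.*}"
}

# Usage:
#   id=$(qsub test.job)              # qsub prints the new job's identifier
#   qdel "$(numeric_jobid "$id")"    # terminate that job
```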

-- AndrewPickford - 12 Jan 2009

Topic revision: r11 - 2015-05-28 - GavinKirby