
Batch System

The PPE group has limited resources for batch computing. The ppepbs batch system is provided for running a small number of jobs. If a large number of jobs is needed, please use the Grid or try the Compute Cluster.

PPEPBS

The PPE group has a small batch system that is managed via the TORQUE Resource Manager (based on OpenPBS) and the Maui scheduler. The batch nodes are installed with a mixture of 64-bit SL4 and SL5. There are 47 CPUs for SL5 jobs and 40 CPUs for SL4 jobs. Eight queues are available, split into two groups: four queues for SL4 jobs and four queues for SL5 jobs (see the Queues section below). Executables should be built on one of the PPE Linux desktop machines of the required flavour. The version of Scientific Linux installed on a machine can be checked by examining the /etc/redhat-release file:

cat /etc/redhat-release

and to check whether a machine is 32 or 64 bit:

uname -m

A 64-bit machine will return x86_64 and a 32-bit machine will return i686.
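
The two checks can be combined in a small helper when deciding which queue group a build belongs to. The sketch below is illustrative only: the function name sl_flavour is hypothetical, and it assumes the release file contains text of the form "release 5.x", which may not hold on every machine.

```shell
# Hypothetical helper: report the SL major version and word size of a machine.
# Assumes the release string contains "release <major>.<minor>".
sl_flavour() {
    release_text="$1"   # contents of /etc/redhat-release
    arch="$2"           # output of uname -m

    # Extract the major version number that follows the word "release".
    major=$(printf '%s\n' "$release_text" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p')

    # Map the architecture string onto a word size.
    case "$arch" in
        x86_64) bits=64 ;;
        i?86)   bits=32 ;;
        *)      bits=unknown ;;
    esac

    echo "SL${major:-unknown} ${bits}-bit"
}

# On a real machine the inputs come from the two commands shown above.
if [ -r /etc/redhat-release ]; then
    sl_flavour "$(cat /etc/redhat-release)" "$(uname -m)"
fi
```

A result such as "SL5 64-bit" would suggest building for the queues ending in '5'.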

Job submission

Jobs can be submitted to a TORQUE queue via qsub, e.g.:

ssh ppepbs
qsub test.job

where test.job might contain:

#PBS -N TestJob
#PBS -l walltime=1,mem=1024Mb
#PBS -m abe
#PBS -M user@machine
#
echo "This is a test..."

More documentation is given in the qsub man page. The TORQUE documentation pages installed on ppepbs can be listed via:

ssh ppepbs
rpm -ql torque-docs
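
Jobs go to the default queue unless another one is requested. As a minimal sketch, a job script can name a queue with a "#PBS -q" directive (the same can be done on the command line with qsub -q). The file name, job name and resource values below are placeholders chosen for illustration, not site recommendations.

```shell
# Write a small job script that asks for a named queue, then show how it
# would be submitted. The file name and job name here are placeholders.
job_file="test_long5.job"

cat > "$job_file" <<'EOF'
#PBS -N TestJobLong
#PBS -q long5
#PBS -l walltime=12:00:00,mem=1024mb
#
echo "Running on $(hostname)"
EOF

# Submit only where qsub is available (i.e. after "ssh ppepbs").
if command -v qsub >/dev/null 2>&1; then
    qsub "$job_file"
else
    echo "qsub not found; run this on ppepbs: qsub $job_file"
fi
```

The requested walltime should not exceed the limit of the chosen queue (see the Queues section).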

Queues

There are currently eight queues on ppepbs. The four queues ending in '4' will run jobs on SL4 machines and the four queues ending in '5' will run jobs on SL5 machines:

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short4             --      --    01:00:00   --    0   0 --   E R
medium4            --      --    06:00:00   --    0   0 --   E R
long4              --      --    24:00:00   --    0   0 --   E R
vlong4             --      --    120:00:0   --    0   0 --   E R
short5             --      --    01:00:00   --    0   0 --   E R
medium5            --      --    06:00:00   --    0   0 --   E R
long5              --      --    24:00:00   --    0   0 --   E R
vlong5             --      --    120:00:0   --    0   0 --   E R

where short5 is the default queue and Walltime is the maximum walltime allowed on each queue.
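
Choosing a queue from the table can be scripted. The helper below is a hypothetical sketch, not part of the ppepbs setup: pick_queue is an invented name, it takes the requested walltime in whole hours, and it reads the vlong limit as 120 hours from the truncated "120:00:0" entry in the listing.

```shell
# Hypothetical helper: pick the shortest queue whose walltime limit covers
# the requested number of hours. Limits are taken from the table above.
pick_queue() {
    hours="$1"         # requested walltime in whole hours
    flavour="${2:-5}"  # 4 for SL4 queues, 5 for SL5 queues (default)

    if   [ "$hours" -le 1 ];   then name=short
    elif [ "$hours" -le 6 ];   then name=medium
    elif [ "$hours" -le 24 ];  then name=long
    elif [ "$hours" -le 120 ]; then name=vlong
    else
        echo "no queue allows ${hours}h of walltime" >&2
        return 1
    fi
    echo "${name}${flavour}"
}

pick_queue 3      # prints medium5
pick_queue 48 4   # prints vlong4
```

The result can be passed straight to qsub, for example qsub -q "$(pick_queue 3)" test.job.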

Jobs in the vlong4 and vlong5 queues can be preempted by jobs waiting in the short4, short5, medium4 or medium5 queues. A preempted job is placed in the suspended state: it remains in memory but is no longer being executed. Once the preempting job has finished, the preempted job starts executing again.

While it is possible to view your own jobs with qstat, the command will not display all jobs. To display all jobs, use the Maui client command showq.

-- AndrewPickford - 12 Jan 2009

Topic revision: r6 - 2011-05-10 - AndrewPickford
 