---+ Batch System

The PPE group maintains a [[https://en.wikipedia.org/wiki/Portable_Batch_System][PBS]] cluster for running small quantities of jobs. If you need to run large numbers of jobs, you should investigate the possibility of running on ScotGrid.

The batch system uses the [[https://en.wikipedia.org/wiki/TORQUE][TORQUE]] resource manager (based on OpenPBS) and the Maui scheduler. It can be accessed from any Linux desktop using the commands described below.

The current composition of the batch system is as follows:

| *Nodes* | *Operating System* | *Total CPU Cores* |
| =node123= to =node456= | SL5 | 999 |

The following queues are provided:

| *Name* | *Operating System* | *Maximum runtime* |
| =short5= | SL5 | 1 hour |
| =medium5= | SL5 | 6 hours |
| =long5= | SL5 | 1 day |
| =vlong5= | SL5 | 5 days |
| =short6= | SL6 | 1 hour |
| =medium6= | SL6 | 6 hours |
| =long6= | SL6 | 1 day |
| =vlong6= | SL6 | 5 days |

---++ Using PBS

---+++ Create a submission script

Jobs are defined using a submission script, which is like a shell script with the addition of certain directives (indicated by the =#PBS= prefix) which tell PBS how the job should be handled. A simple submission script might look like the following:

<pre>
#PBS -N TestJob
#PBS -l walltime=1,mem=1024Mb
#PBS -m abe
#PBS -M user@machine
#
echo "This is a test..."
</pre>

---+++ Submit a job

Jobs are submitted using the =qsub= command:

=$ qsub <FILENAME>=

To submit a job defined by the submission script =test.pbs=:

=$ qsub test.pbs=

More details can be found in the qsub [[http://linux.die.net/man/1/qsub-torque][man page]].

---+++ Show running jobs

---+++ Queues

There are currently eight queues on the batch system.
The four queues ending in '4' will run jobs on SL4 machines and the four queues ending in '5' will run jobs on SL5 machines:

<pre>
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
short4             --      --    01:00:00   --    0   0 --   E R
medium4            --      --    06:00:00   --    0   0 --   E R
long4              --      --    24:00:00   --    0   0 --   E R
vlong4             --      --    120:00:0   --    0   0 --   E R
short5             --      --    01:00:00   --    0   0 --   E R
medium5            --      --    06:00:00   --    0   0 --   E R
long5              --      --    24:00:00   --    0   0 --   E R
vlong5             --      --    120:00:0   --    0   0 --   E R
</pre>

where <code>short5</code> is the default queue and <code>Walltime</code> is the maximum walltime allowed on each queue.

While it is possible to view your own jobs with <code>qstat</code>, the command will not display all jobs. To display all jobs, use the Maui client command <code>showq</code>. To see the current priorities of waiting jobs, use the command <code>showq -i</code>.

---+++ Job Pre-emption

Jobs in the <code>vlong4</code> and <code>vlong5</code> queues can be preempted by jobs waiting in the <code>short4</code>, <code>short5</code>, <code>medium4</code> or <code>medium5</code> queues. A preempted job is placed in the suspended state: it remains in memory but is no longer being executed. Once the preempting job has finished, the preempted job resumes execution.

---+++ Job Priority

The priority of a job is the sum of several weighting factors.

   * There is a constant weighting given to short jobs and a smaller weighting given to medium and long jobs, so that if all other factors are equal, short jobs will have priority.
   * The primary weighting is user fairshare. As a user's jobs run, their usage increases and the priority of their queued jobs decreases.
     This is balanced so that a user who uses exactly their fairshare allotment (currently 20% of the CPU averaged over the previous 48 days) will have their medium job priority decreased until it equals the vlong job priority of a user who has not used the batch system in the previous 48 days.
   * A waiting job's priority slowly increases as a function of the time it has spent waiting in the queue. Currently, a vlong job would have to wait several weeks to match the priority of a medium queue job, all other things being equal.

---+++ Killing a job

Jobs may be terminated by executing <code>qdel JOBID</code>, where JOBID is the numerical ID code returned in the qstat listing.
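
As a rough sketch of the submit-and-kill cycle described above, the helpers below capture the identifier printed by <code>qsub</code> and pass its numerical part to <code>qdel</code>. The function names =job_number= and =submit_then_cancel= are hypothetical and not part of PBS, and the sketch assumes that <code>qsub</code> prints an identifier of the form =NUMBER.SERVER=, as TORQUE normally does.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not PBS commands.

# Strip the server suffix from a full job identifier of the assumed form
# NUMBER.SERVER, leaving only the numerical job ID.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Submit a script with qsub, report the numerical job ID, then delete
# the job again with qdel.
submit_then_cancel() {
    full_id=$(qsub "$1") || return 1   # qsub prints the new job identifier
    id=$(job_number "$full_id")
    echo "Submitted job $id"
    qdel "$id"                         # terminate the job by its numerical ID
}
```

For example, =submit_then_cancel test.pbs= would submit the script from the earlier section and remove it again straight away; in practice the =qdel= step would be run later, once it is clear the job is no longer wanted.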
Topic revision: r12 - 2016-04-22 - GordonStewart
Copyright © 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.