Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
Batch System (HTCondor) | ||||||||
Line: 6 to 6 | ||||||||
Overview | ||||||||
Changed: | ||||||||
< < | The PPE group maintains an HTCondor
> > | The PPE group maintains an HTCondor
The current composition of the batch system is as follows:
| ||||||||
Changed: | ||||||||
< < | ||||||||
> > |
| |||||||
The Condor central manager (the closest thing it has to a headnode) is hex.ppe.gla.ac.uk.
HTCondor was known as Condor prior to 2012, when threatened legal action forced a change of name. It is still commonly referred to as simply "Condor", and you will find both names used interchangeably in this document. | ||||||||
Line: 36 to 34 | ||||||||
You may find it easiest to submit jobs by first logging into hex.ppe.gla.ac.uk.
Deleted: | ||||||||
< < | ||||||||
Create a submit description file
Jobs are defined using a submit description file, which contains commands that tell HTCondor how to queue the job. These commands are analogous to the lines in a PBS submission script that begin with the #PBS prefix and contain directives used by PBS when queuing the job.
A simple submit description file might look like the following: | ||||||||
Changed: | ||||||||
< < |
universe = vanilla | |||||||
> > | universe = vanilla | |||||||
executable = test.sh
input = test.data
output = test.out
Line: 55 to 50 | ||||||||
This will run the executable test.sh in a manner similar to the following: | ||||||||
Changed: | ||||||||
< < |
./test.sh < test.data > test.out 2> test.error | |||||||
> > | ./test.sh < test.data > test.out 2> test.error | |||||||
The log file (test.log in this example) will contain logging information provided by Condor. | ||||||||
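As a minimal sketch, the fragments shown above can be assembled into a complete submit description file. The `error`, `log` and `queue` lines are assumptions inferred from the equivalent command line and from the log file name mentioned above; they are not visible in the excerpt. The file name `test.job` matches the one used in the submission example elsewhere in this document.

```shell
# Write a complete submit description file to test.job.
# Assumption: the error, log and queue lines are inferred from the
# equivalent command line (2> test.error) and the log file name (test.log).
cat > test.job <<'EOF'
universe   = vanilla
executable = test.sh
input      = test.data
output     = test.out
error      = test.error
log        = test.log
queue
EOF

# On a submit host the file would then be submitted with:
#   condor_submit test.job
```

The heredoc delimiter is quoted so that the shell writes the text verbatim, without expanding anything inside the file.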
Added: | ||||||||
> > | Condor jobs are allocated a single CPU and 1 GiB memory by default, and will run on a machine with the same architecture and operating system as the submission host (i.e. jobs submitted from hex.ppe.gla.ac.uk will run on CentOS 7 nodes by default). To request a different resource allocation, or to specify that a job should run under a different operating system, see Specify CPU and memory requirements and Submit a job with additional requirements. | |||||||
Further information can be found in the Condor documentation: | ||||||||
Line: 75 to 69 | ||||||||
$ condor_submit <FILENAME>
After you run this command, the ID of the newly submitted job is printed. For example, to submit a job defined by the submit description file test.job:
Changed: | ||||||||
< < |
$ condor_submit test.job | |||||||
> > | $ condor_submit test.job | |||||||
Submitting job(s).
1 job(s) submitted to cluster 38.
Line: 84 to 76 | ||||||||
This cluster ID (38 in this example) can be used to manage the job in the future. | ||||||||
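If you want to capture the cluster ID in a script, one possible approach is to parse it out of the message shown above. The sketch below assumes the output keeps the "submitted to cluster N." wording from the example; `extract_cluster_id` is a hypothetical helper name, not an HTCondor command.

```shell
# Extract the cluster ID from condor_submit output read on standard input.
# Assumption: the output contains a line ending "submitted to cluster N."
extract_cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Sample text copied from the example above. On a submit host this would be:
#   condor_submit test.job | extract_cluster_id
printf 'Submitting job(s).\n1 job(s) submitted to cluster 38.\n' | extract_cluster_id
```

The captured value can then be passed to other commands that take a cluster ID, such as condor_rm.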
Deleted: | ||||||||
< < | ||||||||
Show status information
You can view details of submitted jobs using the condor_q command:
Changed: | ||||||||
< < |
$ condor_q | |||||||
> > | $ condor_q | |||||||
Changed: | ||||||||
< < | -- Schedd: hex.ppe.gla.ac.uk : <172.20.203.50:9618?... @ 05/30/17 11:18:00 | |||||||
> > | -- Schedd: hex.ppe.gla.ac.uk : <172.20.203.50:9618?... @ 05/30/17 11:18:00 | |||||||
OWNER     BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
gpstewart CMD: sleep.sh 5/30 11:17     _     1       _      1 42.0
Line: 101 to 90 | ||||||||
You can view information about the state of the Condor system as a whole using the condor_status command: | ||||||||
Changed: | ||||||||
< < |
$ condor_status | |||||||
> > | $ condor_status | |||||||
Name                        OpSys Arch   State     Activity LoadAv Mem   ActvtyTime
slot1@node004.ppe.gla.ac.uk LINUX X86_64 Unclaimed Idle     0.000  64010 6+00:49:19
Line: 126 to 112 | ||||||||
$ condor_rm <CLUSTER_ID>
To remove the job with cluster ID 43:
Changed: | ||||||||
< < |
$ condor_rm 43 | |||||||
> > | $ condor_rm 43 | |||||||
All jobs in cluster 43 have been marked for removal | ||||||||
Line: 132 to 116 | ||||||||
All jobs in cluster 43 have been marked for removal | ||||||||
Deleted: | ||||||||
< < | ||||||||
View history
You can view information about historical job submissions using the condor_history command:
Changed: | ||||||||
< < |
$ condor_history | |||||||
> > | $ condor_history | |||||||
ID   OWNER     SUBMITTED  RUN_TIME   ST COMPLETED  CMD
43.0 gpstewart 5/30 11:28            X  ???        /home/grid/gpstewart/condor/sleep/sleep.sh
42.0 gpstewart 5/30 11:17 0+00:00:31 C  5/30 11:18 /home/grid/gpstewart/condor/sleep/sleep.sh
Line: 147 to 128 | ||||||||
39.0 gpstewart 5/11 14:00 0+00:00:06 C 5/11 14:00 /home/grid/gpstewart/condor/mail/mail.sh | ||||||||
Added: | ||||||||
> > | You can view detailed information about a job by including the -long argument:
$ condor_history -long 3805
ResidentSetSize = 0
ResidentSetSize_RAW = 0
RemoteUserCpu = 0.0
RecentBlockWrites = 0
RecentBlockReadKbytes = 36
JobCurrentStartExecutingDate = 1498649218
...
As noted previously, the history reported by Condor covers only jobs submitted via the scheduler on the local machine, not jobs from across the whole pool.
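The -long output is a list of "Name = value" lines, so a single attribute can be picked out with standard text tools. The following is a rough sketch only; `get_attr` is a hypothetical helper name, and the sample lines are copied from the excerpt above rather than from a live query.

```shell
# Print the value of one attribute from "Name = value" lines on standard input.
# The match is exact, so ResidentSetSize does not also match ResidentSetSize_RAW.
get_attr() {
    awk -v name="$1" 'index($0, name " = ") == 1 { print substr($0, length(name) + 4) }'
}

# Sample lines copied from the example above. On a submit host this would be:
#   condor_history -long 3805 | get_attr JobCurrentStartExecutingDate
printf 'ResidentSetSize = 0\nResidentSetSize_RAW = 0\nJobCurrentStartExecutingDate = 1498649218\n' \
    | get_attr JobCurrentStartExecutingDate
```

Matching on the full "Name = " prefix, rather than splitting on "=", keeps values that themselves contain an equals sign intact.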
Added: | ||||||||
> > |
Specify CPU and memory requirements
Unlike the old PBS nodes, on which jobs were free to grab whatever resources they liked (to the detriment of both themselves and other jobs on the node), the Condor compute nodes are configured to use cgroups, which restrict a job's resource usage to the resources it requested. By default, all Condor jobs are allocated a single CPU and 1 GiB of memory. You can adjust these values by adding request_cpus and request_memory statements to your job submit description file:
request_cpus = 2
request_memory = 4 GB
Requesting significantly more CPUs or memory than usual may mean that your job has to wait longer before sufficient resources can be allocated to run it. On the other hand, specifying a lower memory requirement may allow jobs to squeeze in to otherwise heavily-loaded nodes.
Further information can be found in the Commands for Matchmaking section of the Condor documentation:
Submit a job with additional requirements
You can exert more control over where a job runs by including a requirements specification in your job submit description file. This allows you to specify values for various Condor ClassAd attributes, combined with C-style boolean operators. For example, to specify that your job should run on a Scientific Linux 6 machine:
requirements = OpSysAndVer == "SL6"
You can obtain a list of ClassAds and their values on a given node by running the following command:
condor_status -startd HOSTNAME
For example, to obtain the list of ClassAds from node005:
condor_status -startd node005.ppe.gla.ac.uk
Further information can be found in the Commands for Matchmaking section of the Condor documentation:
|
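To show how the resource requests and a requirements expression might sit together, here is a sketch of a single submit description file that combines them. The file name `sl6.job` is hypothetical, the `error`, `log` and `queue` lines are assumptions carried over from the simple example, and the `Arch` clause is added only to illustrate the `&&` operator, using the architecture value that appears in the condor_status output above.

```shell
# Write a submit description file that requests 2 CPUs and 4 GB of memory
# and restricts the job to Scientific Linux 6 nodes.
# Assumptions: sl6.job is an illustrative file name, and the Arch clause is
# included only to demonstrate combining conditions with &&.
cat > sl6.job <<'EOF'
universe       = vanilla
executable     = test.sh
output         = test.out
error          = test.error
log            = test.log
request_cpus   = 2
request_memory = 4 GB
requirements   = (OpSysAndVer == "SL6") && (Arch == "X86_64")
queue
EOF

# On a submit host the file would then be submitted with:
#   condor_submit sl6.job
```

Parenthesising each comparison keeps the expression readable as more conditions are added.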