---+ Batch System (HTCondor)

%TOC%

---++ Overview

The PPE group maintains an [[https://research.cs.wisc.edu/htcondor][HTCondor]] cluster for running batch jobs. The system is still in its development phase, but it is expected to replace the much older [[Batch System][PBS batch system]] in the future. You are welcome to submit jobs to the Condor cluster, but please be aware that machines may be reconfigured and rebooted without warning while the system is being commissioned.

The current composition of the batch system is as follows:

| *Nodes* | *Operating System* | *Total CPU Cores* |
| =node004= | !CentOS 7 | 32 |

The Condor central manager (the closest thing it has to a headnode) is =hex.ppe.gla.ac.uk=.

HTCondor was known as Condor prior to 2012, when threatened legal action forced a change of name. It is still commonly referred to simply as "Condor", and you will find both names used interchangeably in this document.

---+++ Queues

HTCondor does not have queues in the way that PBS does. Instead, jobs are submitted to the Condor pool and then matched to appropriate resources based on their individual requirements.

---+++ Job Prioritisation

The cluster is configured with a fair-share scheduler, which aims to distribute compute time fairly among users. When multiple users are competing for resources, preference is given to users whose recent usage has been lower.

Running jobs can be pre-empted by newly submitted jobs with a higher priority. Pre-empted jobs are either suspended or evicted:

   * A suspended job remains on the node on which it was running, but is no longer executed. Once the pre-empting job has finished, the suspended job is allowed to continue.
   * An evicted job is terminated and re-queued for execution at a later time.

---++ Using HTCondor

Unlike PBS, which has a central server and multiple client machines, HTCondor features a distributed architecture.
Jobs can be submitted from the central manager or from any machine running the scheduler daemon, which includes most Linux desktops. The job history reported by =condor_history= covers only jobs submitted via the scheduler on the local machine, not the whole pool, so it is a good idea to use a single machine for job submission. Running jobs must also communicate periodically with the submission machine. You may find it easiest to submit jobs by first logging into =hex.ppe.gla.ac.uk=.

---+++ Create a submit description file

Jobs are defined using a submit description file, which contains commands that tell HTCondor how to queue the job. These commands are analogous to the lines in a PBS submission script that begin with the =#PBS= prefix and contain directives used by PBS when queuing the job.

A simple submit description file might look like the following:

<pre>
universe   = vanilla
executable = test.sh
input      = test.data
output     = test.out
error      = test.error
log        = test.log
queue
</pre>

This will run the executable =test.sh= in a manner similar to the following:

<pre>
./test.sh < test.data > test.out 2> test.error
</pre>

The log file (=test.log= in this example) will contain logging information provided by Condor.

Further information can be found in the Condor documentation:

   * [[https://research.cs.wisc.edu/htcondor/quick-start.html][Examples for Submit Description Files]]
   * [[http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html][condor_submit]]
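
To tie the pieces together, the following is a minimal sketch of how the example job above could be prepared and submitted from a shell. Several details are assumptions made for illustration and do not come from this page: the file name =test.sub= for the submit description file, the scratch directory created with =mktemp=, and the contents of =test.sh= and =test.data=. The script only calls =condor_submit= and =condor_q= if they are found on the current machine, so it can be tried safely elsewhere.

```shell
#!/bin/sh
# Sketch only: prepare the example job in a scratch directory and submit it.
set -eu

# Hypothetical scratch location; any directory you can write to will do.
workdir=$(mktemp -d)
cd "$workdir"

# Illustrative executable: copies standard input to standard output.
cat > test.sh <<'EOF'
#!/bin/sh
cat
EOF
chmod +x test.sh

# Illustrative input data, fed to the job on standard input.
printf 'hello condor\n' > test.data

# Submit description file, as in the example above.
# The name test.sub is an assumption; any file name can be used.
cat > test.sub <<'EOF'
universe   = vanilla
executable = test.sh
input      = test.data
output     = test.out
error      = test.error
log        = test.log
queue
EOF

# Submit only if the HTCondor tools are available on this machine.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit test.sub
    condor_q    # list your jobs queued on this machine's scheduler
else
    echo "condor_submit not found; run this on a submit machine" >&2
fi
```

Once the job has finished, =test.out= and =test.error= should appear in the directory from which the job was submitted, alongside =test.log=.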
Topic revision: r1 - 2017-05-11 - GordonStewart
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.