-- ThomasDoherty - 2009-10-26
Using Ganga to submit jobs to the Panda backend
References:
Full Ganga Atlas Tutorial
Data preparation reprocessing - using Ganga
1. In a clean shell, set up Ganga. It is important to set up Ganga before Athena.
> source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
2. Set up the Athena release.
NOTE: To set up for any release one must be familiar with using CMT (bootstrap procedures and requirements files) - see here for more info:
> source cmthome/setup.sh -tag=14.5.2.6,32,AtlasProduction
3. Set up any checked-out packages you use in your code.
> cd $TEST/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
> source setup.sh
4. Go to the run directory and start Ganga.
> cd ../run
> ganga
5. Execute your job script.
In[0]: execfile('cbarrera_test2.py')
6. You can monitor your job's progress by typing *jobs* inside Ganga or, if you submitted to the Panda backend, via the Panda monitor at http://panda.cern.ch:25880/server/pandamon/query.
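A few GPI commands that are useful for monitoring from inside the Ganga session (the job id 0 below is an assumption; use the id Ganga printed when you submitted):
______________________________________________________________________________
jobs                 # table of all your jobs and their statuses
jobs(0).status       # status of job 0: 'submitted', 'running', 'completed', ...
jobs(0).subjobs      # per-subjob view when a splitter was used
______________________________________________________________________________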
7. Once your job has finished you can copy the output data using the DQ2 tools.
> source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
> dq2-get yourData
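The dataset name to pass to dq2-get is recorded on the job itself; a small sketch, assuming your job has id 0 and that your Ganga version exposes the datasetname field on DQ2OutputDataset:
______________________________________________________________________________
# From inside Ganga: print the name of the output dataset this job wrote
print jobs(0).outputdata.datasetname
______________________________________________________________________________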
Obviously it's not as simple as that...
Here is the job file I submitted using Ganga:
1 j = Job()
2 j.application = Athena()
3 j.application.atlas_dbrelease = 'ddo.000001.Atlas.Ideal.DBRelease.v06060101:DBRelease-6.6.1.1.tar.gz'
4 j.application.option_file = 'Data_jobOptions_cosmic.py'
5 j.application.athena_compile = False
6 j.application.prepare()
7 j.inputdata = DQ2Dataset()
8 j.inputdata.dataset = "data08_cos.00092051.physics_IDCosmic.recon.ESD.o4_r653/"
9 j.outputdata = DQ2OutputDataset()
10 j.backend = Panda()
11 j.splitter = DQ2JobSplitter()
12 j.splitter.numsubjobs = 20
13 j.submit()
A few comments on some of these lines. Line 3 overrides the database release to match the one needed to read the ESD/DPD. In the case of the spring cosmic reprocessing,
the DB release is 6.6.1.1. If the database releases don't match, the jobs fail on the Grid. I don't understand why it works locally though (maybe a question for Graeme).
Line 4 corresponds to your jobOptions. There were some changes I needed to make in mine as well, to prepare my code for running on the Grid:
______________________________________________________________________________
include("RecExCommission/RecExCommissionFlags_jobOptions.py" )
ATLASCosmicFlags.useLocalCOOL = True
# setup DBReplicaSvc to choose closest Oracle replica, configurables style
from AthenaCommon.AppMgr import ServiceMgr
from PoolSvc.PoolSvcConf import PoolSvc
ServiceMgr+=PoolSvc(SortReplicas=True)
from DBReplicaSvc.DBReplicaSvcConf import DBReplicaSvc
ServiceMgr+=DBReplicaSvc(UseCOOLSQLite=False)
globalflags.ConditionsTag.set_Value_and_Lock('COMCOND-REPC-002-13')
______________________________________________________________________________
Also remember to remove the input data line in the original jobOptions (from the second reference above); Ganga fills the input files in for you, as sketched below. Line 5 is
set to False because we have already compiled the packages locally. Line 6 tells Ganga to tar your user area and send it with the job. Line 10 specifies the backend to which
you are sending your job. There are three options: LCG, Panda and NorduGrid. I chose Panda because my data existed only at BNLPANDA, a site in the US cloud. And
apparently, as Graeme told us yesterday, that's the way to go: prepare your jobs with Ganga and then submit to PanDA.
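For reference, the line to remove is the one that points the EventSelector at a hard-coded file list. A hypothetical example of what it typically looks like (the file name here is made up for illustration); Ganga sets the real input files itself when you attach a DQ2Dataset:
______________________________________________________________________________
# Remove (or comment out) any hard-coded input list like this one;
# 'myLocalESD.pool.root' is a hypothetical placeholder
ServiceMgr.EventSelector.InputCollections = [ 'myLocalESD.pool.root' ]
______________________________________________________________________________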
Line 12 corresponds to the number of subjobs you want to split your job into. I read on this page
https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ#DQ2JobSplitter_won_t_submit_when that if you are using DQ2JobSplitter you should not specify a site; you should let
it decide. But I believe specifying a cloud is permitted, as sketched below. Finally, in Line 13 you submit your job.
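A minimal sketch of pinning the cloud while leaving the site choice to the splitter; the requirements.cloud attribute is my understanding of how GangaPanda exposes this, so check it against your Ganga version:
______________________________________________________________________________
# Let DQ2JobSplitter choose the sites, but restrict brokering to the US cloud
j.backend = Panda()
j.backend.requirements.cloud = 'US'
______________________________________________________________________________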
These instructions are just guidance. I am sure unexpected things will emerge when dealing with different cases.