Using Ganga to submit jobs to the Panda backend


Full Ganga Atlas Tutorial

Data preparation reprocessing - using Ganga

1. In a clean shell, set up Ganga. It is important to set up Ganga before Athena.
   > source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
2. Set up the Athena release.
   > source cmthome/setup.sh -tag=,AtlasProduction
3. Set up any checked-out packages you use in your code.
   > cd $TEST/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
   > source setup.sh
4. Go to the run directory and start Ganga.
   > cd ../run
   > ganga
5. Execute your job script.
   In[0]: execfile('cbarrera_test2.py')
6. You can monitor your job's progress by typing *jobs* inside Ganga or, if you submitted to the Panda backend, through the Panda monitor at http://panda.cern.ch:25880/server/pandamon/query.
7. Once your job has finished, you can copy the output data using the dq2 tools.
   > source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
   > dq2-get 
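
The two commands in step 7 have to run in the same shell, because sourcing the setup script only affects the shell that sources it. A minimal sketch of a helper that builds the combined command line is shown below. It is standalone Python, not part of Ganga or the DQ2 tools; the function name and the dataset name in the usage line are hypothetical placeholders, so substitute the output dataset name of your own job.

```python
import shlex

# Setup script path taken from step 7 above.
DQ2_SETUP = "/afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh"


def dq2_get_command(dataset, setup_script=DQ2_SETUP):
    """Return one bash command line that sets up DQ2 and fetches `dataset`.

    Hypothetical helper for illustration only. The dataset name is passed
    to dq2-get as its single argument.
    """
    if not dataset.strip():
        raise ValueError("dataset name must not be empty")
    # Quote both pieces so unusual characters cannot break the command line.
    return "source {} && dq2-get {}".format(
        shlex.quote(setup_script), shlex.quote(dataset)
    )


if __name__ == "__main__":
    # "user.example.mytest.output" is a made-up dataset name.
    print(dq2_get_command("user.example.mytest.output"))
```

The printed line can be pasted into a bash shell, or passed to `bash -c` from a script.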


Obviously it's not as simple as that... 

Here is the job file I submitted using Ganga:

1    j = Job()
2    j.application = Athena()
3    j.application.atlas_dbrelease = 'ddo.000001.Atlas.Ideal.DBRelease.v06060101:DBRelease-'
4    j.application.option_file = 'Data_jobOptions_cosmic.py'
5    j.application.athena_compile = False
6    j.application.prepare()
7    j.inputdata = DQ2Dataset()
8    j.inputdata.dataset = "data08_cos.00092051.physics_IDCosmic.recon.ESD.o4_r653/"
9    j.outputdata = DQ2OutputDataset()
10   j.backend = Panda()
11   j.splitter = DQ2JobSplitter()
12   j.splitter.numsubjobs = 20
13   j.submit()

A few comments on some of these lines. Line 3 overrides the database release to match the one needed to read ESD/DPD, in this case the one used for the spring cosmic 
reprocessing. If the database releases don't match, the jobs fail on the Grid. I don't understand why it works locally, though (maybe a question for Graeme). 
Line 4 corresponds to your jobOptions. There were some changes I needed to make in mine as well, to prepare my code for running on the Grid:


include("RecExCommission/RecExCommissionFlags_jobOptions.py" )
ATLASCosmicFlags.useLocalCOOL  = True
# setup DBReplicaSvc to choose closest Oracle replica, configurables style
from AthenaCommon.AppMgr import ServiceMgr
from PoolSvc.PoolSvcConf import PoolSvc
from DBReplicaSvc.DBReplicaSvcConf import DBReplicaSvc


Also remember to remove the input data line in the original jobOptions (from the second Reference webpage). Line 5 is set to False because we have already compiled the 
packages locally. Line 6 tells Ganga to tar your user area and send it with the job. Line 10 specifies the backend to which you are sending your job. There are three 
options: LCG, Panda and NorduGrid. I chose Panda because my data existed only at BNLPANDA, a site in the US cloud. And apparently, as Graeme told us yesterday, that's 
the way to go: prepare your jobs with Ganga and then submit to Panda. Line 12 sets the number of subjobs you want to split your job into. I read on this page 
https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ#DQ2JobSplitter_won_t_submit_when that if you are using DQ2JobSplitter you should not specify a site but let 
it decide. I believe specifying a cloud is permitted, though. Finally, Line 13 submits your job.
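
To get a feeling for what the numsubjobs setting in Line 12 means, here is a minimal sketch of dividing a list of input files evenly over a number of subjobs. This is standalone Python and only an illustration. It is not how DQ2JobSplitter is implemented, and it ignores where the files are stored, which the real splitter has to take into account. The function name and the file names are made up.

```python
def split_into_subjobs(files, numsubjobs):
    """Divide `files` into at most `numsubjobs` groups of near-equal size.

    Illustrative only: group sizes differ by at most one file, and no
    group is left empty when there are fewer files than subjobs.
    """
    if numsubjobs < 1:
        raise ValueError("numsubjobs must be at least 1")
    n = min(numsubjobs, len(files))
    if n == 0:
        return []
    base, extra = divmod(len(files), n)
    groups = []
    start = 0
    for i in range(n):
        # The first `extra` groups take one additional file each.
        size = base + (1 if i < extra else 0)
        groups.append(files[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Hypothetical file names standing in for the files of a dataset.
    files = ["ESD._%04d.pool.root" % i for i in range(1, 46)]
    for number, group in enumerate(split_into_subjobs(files, 20)):
        print(number, len(group))
```

With 45 files and 20 subjobs, each subjob in this sketch gets two or three files. With fewer files than requested subjobs, you simply get one subjob per file.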

These instructions are just guidance. I am sure unexpected things will emerge when dealing with different cases.

