ATLAS Web>RunningGangaWithPanda (revision 1)~~EditAttach~~

-- ThomasDoherty - 2009-10-26

Using Ganga to submit jobs to the Panda backend

References:

Data preparation reporocessing - using Ganga

1. In a clean shell, setup Ganga. It is important to setup Ganga before Athena.

  source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

2. Setup the athena release.

   > source cmthome/setup.sh -tag=14.5.2.6,AtlasProduction

3. Setup any checked out packages you use in your code.
   > cd $TEST/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
   > source setup.sh
4. Go to run directory and start ganga.
   > cd ../run
   > ganga
5. Execute your job script.
   In[0]: execfile('cbarrera_test2.py')
6. You can monitor your job's progress by typing *jobs* inside Ganga or, if you submitted to the Panda backend by http://panda.cern.ch:25880/server/pandamon/query.
7. Once your job has finished you can copy the output data using the dq2 tools.
   > source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
   > dq2-get

yourData

Obviously it's not as simple as that... 

Here is the job file I submitted using ganga:

1    j = Job()
2    j.application = Athena()
3    j.application.atlas_dbrelease = 'ddo.000001.Atlas.Ideal.DBRelease.v06060101:DBRelease-6.6.1.1.tar.gz'
4    j.application.option_file = 'Data_jobOptions_cosmic.py'
5    j.application.athena_compile = False
6    j.application.prepare()
7    j.inputdata = DQ2Dataset()
8    j.inputdata.dataset = "data08_cos.00092051.physics_IDCosmic.recon.ESD.o4_r653/"
9    j.outputdata = DQ2OutputDataset()
10   j.backend = Panda()
11   j.splitter = DQ2JobSplitter()
12   j.splitter.numsubjobs = 20
13   j.submit()

A few comments on some of these lines. Line 3 is overriding the database release to match the one needed to read ESD/DPD. In the case of the spring cosmic reprocessing, 
the DB release is 6.6.1.1. If the database releases don't match the jobs fail on the Grid. I don't understand why it works locally though (maybe a question for Graeme). 
Line 4 corresponds to your jobOptions. There were some changes I needed to do in mine as well, to prepare my code for running on the Grid:

______________________________________________________________________________

include("RecExCommission/RecExCommissionFlags_jobOptions.py" )
ATLASCosmicFlags.useLocalCOOL  = True
# setup DBReplicaSvc to choose closest Oracle replica, configurables style
from AthenaCommon.AppMgr import ServiceMgr
from PoolSvc.PoolSvcConf import PoolSvc
ServiceMgr+=PoolSvc(SortReplicas=True)
from DBReplicaSvc.DBReplicaSvcConf import DBReplicaSvc
ServiceMgr+=DBReplicaSvc(UseCOOLSQLite=False) 

globalflags.ConditionsTag.set_Value_and_Lock('COMCOND-REPC-002-13')
______________________________________________________________________________

Also remember to remove the input data line in the original JO's (from second Reference webpage). Line 5 is set to False because we have already compiled the packages 
locally. Line 6 tells Ganga to tar your user area and send it with the job. Line 10 specifies the backend to which you are sending your job. There are three options: LCG, 
Panda and NorduGrid. I chose Panda because my data existed only in BNLPANDA, a site in the US cloud. And apparently, as Graeme told us yesterday, that's the way to go: 
prepare your jobs with Ganga and then submit to PanDa. Line 12 corresponds to the number of subjobs you want to split your job into. I read in this page 
https://twiki.cern.ch/twiki/bin/view/Atlas/DAGangaFAQ#DQ2JobSplitter_won_t_submit_when that if you are using DQ2Splitter you should not specify a site, that you should let 
it decide. But I believe specifying a cloud is permitted. Finally in Line 13 you submit your job.

This instructions are just guidance. I am sure unexpected things will emerge when dealing with different cases.

Topic revision: r1 - 2009-10-26 - ThomasDoherty

ATLAS