-- ThomasDoherty - 2009-10-26
Using Ganga to submit jobs to the Panda backend
Data preparation reprocessing - using Ganga
1. In a clean shell, setup Ganga.

   source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

2. Setup the athena release.
   NOTE: To set up for any release one must be familiar with using CMT (bootstrap procedures and requirement files) - see here.

   source cmthome/setup.sh -tag=14.5.2.6,32,AtlasProduction

3. Setup any checked out packages you use in your code.
   For example, check out (and compile) the UserAnalysis package as in the "HelloWorld" example here.

   cd $TEST/PhysicsAnalysis/AnalysisCommon/UserAnalysis/cmt
   source setup.sh
4. Go to the run directory and start ganga.

   cd ../run
   ganga

5. Execute your Ganga job script while Ganga is running (an example of what 'pandaBackend_test.py' could look like is given below - have this file in your run directory).

   execfile('pandaBackend_test.py')
6. You can monitor your job's progress by typing jobs inside Ganga or, if you submitted to the Panda backend, via the Panda monitor at http://panda.cern.ch:25880/server/pandamon/query
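   For instance, a quick status check from the Ganga prompt might look like this (the job id 0 is purely illustrative):

   # inside the Ganga prompt (GPI)
   jobs                 # table of all registered jobs and their statuses
   j = jobs(0)          # fetch a job by id - 0 is just an example
   print j.status       # e.g. 'submitted', 'running', 'completed'
   print j.subjobs      # per-subjob statuses after splitting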
7. Once your job has finished you can copy the output data using the dq2 tools.

   source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh
   dq2-get "your_dataset_name"
Where "your_dataset_name" is given to you by Ganga once the job completes. And 'pandaBackend_test.py' could look like this (the line numbers are shown only so the notes below can refer to them - they should not appear in the actual file):
   1 j = Job()
   2 j.application = Athena()
   3 j.application.atlas_dbrelease = 'ddo.000001.Atlas.Ideal.DBRelease.v06060101:DBRelease-6.6.1.1.tar.gz'
   4 j.application.option_file = 'Data_jobOptions_cosmic.py'
   ...
   12 j.splitter.numsubjobs = 20
   13 j.submit()
NOTE: Line 3 is overriding the database release to match the one needed to read ESD/DPD. In the case of the spring cosmic reprocessing, the DB release is 6.6.1.1. If the database releases don't match, the jobs fail on the Grid. Line 4 corresponds to your Athena jobOptions. You can use the top job options copied from your UserAnalysis package's share directory.
   cp ../share/AnalysisSkeleton_topOptions.py .
BUT to prepare your code for running on the Grid there are some changes needed in this Athena JO - please add these lines:

______________________________________________________________________________
include("RecExCommission/RecExCommissionFlags_jobOptions.py" ) ATLASCosmicFlags.useLocalCOOL = True | ||||||||
...
from DBReplicaSvcConf import DBReplicaSvc
ServiceMgr += DBReplicaSvc(UseCOOLSQLite=False)
______________________________________________________________________________
Also remember to remove (or comment out) the input data line and change the geometry tag and the conditions DB tag to match those used in the reprocessing cycle (see the details for each reprocessing campaign here). For example:

   globalflags.ConditionsTag.set_Value_and_Lock('COMCOND-REPC-002-13')
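A minimal sketch of what those edits might look like in the top JO (the geometry tag value and the input line shown here are illustrative assumptions - take the correct tags for your cycle from the reprocessing page):

   # comment out any local input file list - the Grid job gets its input
   # from the dataset defined in the Ganga job, not from the JO
   #ServiceMgr.EventSelector.InputCollections = [ 'my_local_file.root' ]

   # lock the tags used by the reprocessing cycle (values are examples only)
   globalflags.DetDescrVersion.set_Value_and_Lock('ATLAS-GEO-03-00-00')
   globalflags.ConditionsTag.set_Value_and_Lock('COMCOND-REPC-002-13')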
The Athena JO's used for this specific example (Data_jobOptions_cosmic.py and Settings_DepletionDepth.py) can be found here.
Back to the Ganga JO script:

Line 5 is set to False because we have already compiled the packages locally. Line 6 tells Ganga to tar your user area and send it with the job. Line 10 specifies the backend to which you are sending your job. There are three options: LCG, Panda and NorduGrid. I chose Panda because my data existed only in BNLPANDA, a site in the US cloud. Line 12 corresponds to the number of subjobs you want to split your job into. Finally, in Line 13 you submit your job.
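Since lines 5-11 are elided in the listing above, here is a hypothetical reconstruction of the full script, pieced together from these notes; the dataset name and the exact splitter are assumptions, not the original file:

   # pandaBackend_test.py - illustrative sketch, not the original script
   j = Job()
   j.application = Athena()
   j.application.atlas_dbrelease = 'ddo.000001.Atlas.Ideal.DBRelease.v06060101:DBRelease-6.6.1.1.tar.gz'
   j.application.option_file = 'Data_jobOptions_cosmic.py'
   j.application.athena_compile = False   # line 5: packages already compiled locally
   j.application.prepare()                # line 6: tar the user area and send it with the job
   j.inputdata = DQ2Dataset()
   j.inputdata.dataset = 'your.input.dataset.name/'   # placeholder
   j.outputdata = DQ2OutputDataset()
   j.backend = Panda()                    # line 10: submit to the Panda backend
   j.splitter = DQ2JobSplitter()          # assumed splitter for DQ2 input data
   j.splitter.numsubjobs = 20
   j.submit()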