NA62 Monte Carlo Grid Production Shifts
This wiki contains information for and about the volunteers on the NA62 MC Grid production rota. Check this page regularly for updates.
Today's production plan
Keep the queue full: 200-300 jobs running at all times, and fewer than 100 scheduled.
We are running decay type 10 (Kch2pipipi) simulations with software version v9/r261 at IC, RAL, GLA, LIV, BIR and UCL. Each job simulates 6000 events.
When this is done, and if this wiki is not updated, continue with the next production on the list, moving upwards on the page.
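The rule for picking the next production can be sketched in Python as below. This is only an illustration: the Round record, its field names and the idea of passing the table rows in page order (top to bottom) are assumptions, not the actual schema of the production database. Rounds still to be done are taken to be those with 0 runs, 0 files and status "Not yet started", as described in the checklist further down this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Round:
    """One row of the production table (hypothetical field names)."""
    name: str
    runs: int
    files: int
    status: str


def is_not_started(r: Round) -> bool:
    # A round still to be done has 0 runs, 0 files and is marked "Not yet started".
    return r.runs == 0 and r.files == 0 and r.status == "Not yet started"


def next_round(rows: Sequence[Round], current_index: int) -> Optional[Round]:
    """Return the first not-yet-started round above the current one.

    rows are assumed to be listed in page order, top to bottom, so
    "moving upwards on the page" means walking towards index 0.
    """
    for i in range(current_index - 1, -1, -1):
        if is_not_started(rows[i]):
            return rows[i]
    return None  # nothing left above the current round
```

If the function returns None, there is nothing further up the list to start, and the production coordinator should be asked what to run next.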
This wiki section will be updated often, so please check carefully before your shift. If you find any discrepancies, please notify Dan and Janusz immediately. Read the logbook/blog for more up-to-date information:
https://na62.gla.ac.uk/elog (or the older elog at http://na62shifts.wordpress.com/).
Shifts sign up
We have the following shifters: Antonio Cassese, Mark Slater, Philip Rubin, Paolo Massarotti, Monica Pepe, Spasimir Balev, Vito Palladino, Mario Vormstein, Karim Massri.
We have a Doodle poll entitled "NA62 Grid Production Shifts", where volunteers can sign up for shifts. If you've volunteered here for shifts, please check your email for the poll address.
Production plan
The production plan is located here:
https://na62.gla.ac.uk/index.php?task=production
This page displays a database-generated table that fills up as production progresses. The production coordinator may add new items at any time. At present only Mark and Dan can use Ganga for job management, but it will soon be available to everyone.
Grid Production Shifts
If all goes well, GP "shifts" will not require too much work. Shifts are defined as 9am to 9pm CERN time, but it is enough for the person on shift to check the production status and submit jobs about twice a day. If problems arise, this might require another hour or so of your time, since you are not expected to fix the problem yourself, only to notify the admins.
What to do on shift
Until Ganga is available, MC jobs must be submitted manually. The person overseeing the production (the "shift" taker) must check periodically that everything is going according to plan, and
- intervene to fix the error(s) where possible
- notify the production coordinator if it is not possible to intervene directly
- notify the site admin if jobs systematically fail at a given site. Check the job history to detect any problems at sites (e.g. jobs consistently finishing early, or going to status CLEARED without registering any output). If jobs fail at a site, exclude it from job submissions and notify the site admin!
- notify Janusz if outputs fail to be replicated on CERN Castor (labelled LS instead of CC in the files table)
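As a rough aid for spotting systematic failures at a site, the job history could be summarised per site along the following lines. This is a sketch only: the record layout (dictionaries with "site", "status" and "has_output" keys) and the two thresholds are assumptions made for illustration, not something provided by the online interface.

```python
from collections import Counter


def suspicious_sites(jobs, min_jobs=10, max_bad_fraction=0.5):
    """Return sites where most recent jobs were CLEARED without output.

    jobs: iterable of dicts with hypothetical keys
          "site" (str), "status" (str) and "has_output" (bool).
    min_jobs and max_bad_fraction are illustrative thresholds only.
    """
    total = Counter()
    bad = Counter()
    for job in jobs:
        total[job["site"]] += 1
        # A job that reached CLEARED without registering output counts as bad.
        if job["status"] == "CLEARED" and not job["has_output"]:
            bad[job["site"]] += 1
    return sorted(
        site
        for site, n in total.items()
        if n >= min_jobs and bad[site] / n > max_bad_fraction
    )
```

Any site returned here is only a candidate: look at its jobs by hand before excluding it from submissions and contacting the site admin.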
Checklist
The person in charge must do the following:
- check current production status from the wiki and from our logbook
- keep an eye on the jobs list and make sure we have 200 jobs RUNNING at all times and not more than 50 SCHEDULED. Please note that these numbers will change when new resources are added. Check these numbers at the beginning of each shift, read the (b)logbook and, if necessary, contact the previous shifter for clarification.
- if the queue is low (fewer than 100 RUNNING), use the online script creator to submit a new batch of jobs
- check the production page to see if we've reached 100%. If so, start the next production round on the list. Rounds that still need to be done are those with 0 runs and 0 files, marked "Not yet started".
- re-validate results weekly (every Monday) with the physics group (email Tonino to check the outputs on Castor)
- notify the production coordinator if any major errors occur
- log everything in our logbook (https://na62.gla.ac.uk/elog) or blog (http://na62shifts.wordpress.com, no longer used)
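The queue numbers in the checklist can be turned into a small check like the one below. It is a minimal sketch: the function name and the wording of the messages are invented for illustration, the job counts have to be read off the jobs list by hand, and the default thresholds (200 RUNNING, 50 SCHEDULED, 100 RUNNING as the "low" mark) are the checklist values, which will change when new resources are added.

```python
def check_queue(running, scheduled,
                target_running=200, max_scheduled=50, low_running=100):
    """Compare job counts with the checklist thresholds.

    Returns a list of findings; an empty list means the queue looks fine.
    """
    findings = []
    if running < low_running:
        # Queue is low: the checklist asks for a new batch of jobs.
        findings.append("low: submit a new batch")
    elif running < target_running:
        findings.append("below target")
    if scheduled > max_scheduled:
        findings.append("too many scheduled")
    return findings


# Example: 150 RUNNING and 60 SCHEDULED
print(check_queue(150, 60))
```

Returning a list rather than a single verdict keeps the two conditions independent, so a shifter sees both a shortfall in RUNNING jobs and a backlog of SCHEDULED jobs when they occur together.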
Troubleshooting
If you find an error produced by the online interface, notify Dan, Janusz and Tonino immediately.