NA62 Monte Carlo Grid Production Rota
This wiki contains information about and for the volunteers for the NA62 MC Grid production rota. Check this page regularly for updates.
List of volunteers
As of October 15, 2012 we have the following names: Antonio Cassese, Mark Slater, Philip Rubin, Paolo Massarotti, Monica Pepe, Spasimir Balev, Vito Palladino, Mario Vormstein, Karim Massri.
A schedule will be posted here as soon as people sign up for shifts.
Production plan
The production plan is located here:
http://na62.gla.ac.uk/index.php?task=production
This displays a database-generated table that is filled in as production progresses. New items will be added by the production coordinator as discussed at the Siena meeting in August 2012.
Grid Production Shifts
GP "shifts" will not require much work: it is enough for the person on shift to check the production status about twice a day. GP "shifts" are expected to be at least a day long.
How it all works
MC jobs will be submitted automatically by cron jobs. The person overseeing the production (the "shift" taker) will have to check periodically that everything is going according to plan, and
- intervene, if possible, to fix the error(s)
- notify the production coordinator if it is not possible to intervene directly to fix the error
- notify the site admin if jobs systematically fail at a given site. Check the jobs history to detect any problems at sites (e.g. jobs consistently finishing early, or going to status CLEARED without registering any output). If jobs fail at a site, exclude it from job submissions and notify the site admin!
- notify Janusz if outputs fail to be replicated on CERN Castor (labelled LS instead of CC in the files table)
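The site check described above can be sketched in a few lines of Python. This is only an illustration: the job records, their field names (site, status, has_output) and the thresholds min_jobs and max_bad_fraction are assumptions made for the example, not the actual format of the jobs history page.

```python
from collections import defaultdict


def flag_suspect_sites(jobs, min_jobs=5, max_bad_fraction=0.5):
    """Return a sorted list of sites where too many jobs went to
    CLEARED without registering any output.

    `jobs` is a list of dicts with hypothetical keys "site", "status"
    and "has_output"; both thresholds are illustrative assumptions.
    """
    totals = defaultdict(int)  # jobs seen per site
    bad = defaultdict(int)     # CLEARED jobs without output per site
    for job in jobs:
        site = job["site"]
        totals[site] += 1
        if job["status"] == "CLEARED" and not job["has_output"]:
            bad[site] += 1
    # Only judge sites with enough jobs to call a failure "systematic".
    return sorted(
        site
        for site, n in totals.items()
        if n >= min_jobs and bad[site] / n > max_bad_fraction
    )
```

Any site returned by such a check would be a candidate for exclusion from job submissions and for a message to its site admin, after a look at the jobs history by hand.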
Checklist
The person in charge must do the following:
- keep an eye on the jobs list and make sure we have 200 jobs RUNNING at all times and no more than 100 SCHEDULED. Please note that these numbers will change when new resources are added, so check them at the beginning of each shift.
- if the queue is low (fewer than 100 RUNNING), use the online script creator to submit a new batch of jobs
- check production page to see if we've reached 100%
- re-validate results weekly (every Monday) with the physics group (email Tonino to check the outputs on Castor)
- notify production coordinator if any major errors occur
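As a rough aid for the queue checks in this list, the thresholds can be written down as a small Python function. It is a sketch under stated assumptions: the input is simply a list of job status strings copied from the jobs list, the function name and messages are invented for the example, and the default numbers (200 RUNNING, 100 SCHEDULED, low queue below 100 RUNNING) are the ones quoted above and will change when new resources are added.

```python
from collections import Counter


def queue_report(statuses, target_running=200, low_running=100,
                 max_scheduled=100):
    """Compare job counts with the checklist thresholds.

    `statuses` is a list of status strings such as "RUNNING" or
    "SCHEDULED". Returns (running, scheduled, messages).
    """
    counts = Counter(statuses)
    running = counts["RUNNING"]
    scheduled = counts["SCHEDULED"]
    messages = []
    if running < low_running:
        # Checklist: a low queue calls for a new batch of jobs.
        messages.append("queue is low: submit a new batch of jobs")
    elif running < target_running:
        messages.append("RUNNING is below the target")
    if scheduled > max_scheduled:
        messages.append("SCHEDULED is above the limit")
    return running, scheduled, messages
```

For example, calling queue_report(statuses, target_running=300) would apply an updated target after new resources have been added, without touching the other limits.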
Troubleshooting
In case you find an error produced by the online interface, please immediately notify Dan, Janusz and Tonino.