NA62 Monte Carlo Grid Production Rota |
|
- intervene if possible to fix the error(s)
- notify the production coordinator if it is not possible to intervene directly to fix the error
|
|
< < |
- notify the site admin if jobs systematically fail at a given site
|
> > |
- notify the site admin if jobs systematically fail at a given site. Check the jobs history to detect any problems at sites (e.g. jobs consistently finishing early, or going to status CLEARED without registering any output).
If jobs fail at a site, exclude it from job submissions and notify the site admin!
|
|
- notify Janusz if outputs fail to be replicated on CERN Castor (labelled LS instead of CC in the files table)
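The site check described above (jobs going to status CLEARED without registering any output) can be sketched as a small script. This is only an illustration: the job record fields `site`, `status` and `has_output`, and the thresholds `min_jobs` and `max_bad_fraction`, are assumptions made for the example and are not taken from the production system. Detecting jobs that finish early would additionally need a runtime field, which is not modelled here.

```python
from collections import defaultdict


def suspicious_sites(jobs, min_jobs=5, max_bad_fraction=0.5):
    """Return sites where most jobs were CLEARED without any output.

    `jobs` is a list of dicts with the hypothetical keys
    'site', 'status' and 'has_output'. Both thresholds are
    illustrative assumptions, not official rota values.
    """
    totals = defaultdict(int)  # jobs seen per site
    bad = defaultdict(int)     # CLEARED jobs with no registered output
    for job in jobs:
        totals[job["site"]] += 1
        if job["status"] == "CLEARED" and not job["has_output"]:
            bad[job["site"]] += 1
    # Only flag sites with enough jobs to call the failure systematic.
    return sorted(
        site
        for site, n in totals.items()
        if n >= min_jobs and bad[site] / n > max_bad_fraction
    )
```

A site returned by this function would be a candidate for exclusion from job submissions and for a message to its site admin, after a manual look at the job history.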
Checklist
The person in charge must do the following: |
|
< < |
- keep an eye on the jobs list and make sure we have 200+ jobs queued at all times
- if queue is low, use online script creator to submit a new batch of jobs
|
> > |
- keep an eye on the jobs list and make sure we have 200 jobs RUNNING at all times and no more than 100 SCHEDULED. Please note that these numbers will change when new resources are added. Check these numbers at the beginning of each shift.
- if the queue is low (fewer than 100 RUNNING), use the online script creator to submit a new batch of jobs
|
|
- check production page to see if we've reached 100%
|
|
< < |
- re-validate results weekly (every Monday) with the physics group (email Tonino)
|
> > |
- re-validate results weekly (every Monday) with the physics group (email Tonino to check the outputs on Castor)
|
|
- notify production coordinator if any major errors occur
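The queue thresholds in the checklist can be expressed as a short check, shown here as a minimal sketch. It assumes the job statuses have already been copied from the jobs list into a plain list of strings; the function name `queue_flags` and the constant names are hypothetical. The numbers are the ones quoted in the checklist and will change when new resources are added.

```python
from collections import Counter

# Values quoted in the checklist; update them when resources change.
TARGET_RUNNING = 200  # jobs that should be RUNNING at all times
LOW_RUNNING = 100     # below this the queue counts as low
MAX_SCHEDULED = 100   # upper limit on SCHEDULED jobs


def queue_flags(statuses):
    """Count RUNNING and SCHEDULED jobs and compare with the thresholds."""
    counts = Counter(s.upper() for s in statuses)
    running = counts["RUNNING"]
    scheduled = counts["SCHEDULED"]
    return {
        "running": running,
        "scheduled": scheduled,
        "low_queue": running < LOW_RUNNING,          # submit a new batch
        "below_target": running < TARGET_RUNNING,    # keep watching
        "too_many_scheduled": scheduled > MAX_SCHEDULED,
    }
```

If `low_queue` is true, the checklist calls for a new batch from the online script creator. What to do when `too_many_scheduled` is also true is not specified on this page, so the sketch only reports the flags and leaves the decision to the person on shift.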
Troubleshooting |