NA62 Monte Carlo Production Howto
This wiki explains how to submit NA62 Monte Carlo jobs on the Grid using the custom-written tools and online interface. It is written for NA62 members who have volunteered to participate in the production rota.
Monitoring
The web interface for NA62 MC Grid jobs scripting, monitoring and accounting is located at:
https://na62.gla.ac.uk/index.php?task=production
You can use this interface to monitor running and completed jobs, output files and production status. You can also use the iPhone app to monitor jobs, files and production status.
You can get the iPhone app from here.
The aim is to keep the production rate at its maximum (which depends on the resources available). To do so, the person on shift must submit new jobs when the number of waiting and running jobs is low. How is low defined?
We should have more than 200 jobs RUNNING at all times and about 50 SCHEDULED, but not more than 100 jobs in waiting states.
Please note that these numbers will change when new resources are added. Check them at the beginning of each shift.
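The thresholds above can be turned into a quick check. The following is a minimal sketch, not part of the production tools: the function name and parameters are hypothetical, the job counts are assumed to be read by hand from the monitoring page, and treating "about 50 SCHEDULED" as a lower bound is an interpretation of the rule.

```python
def needs_more_jobs(running, scheduled, waiting,
                    min_running=200, target_scheduled=50, max_waiting=100):
    """Return True if another batch should be submitted.

    Defaults mirror the numbers quoted in this wiki; they are expected to
    change when new resources are added, so check them at every shift.
    """
    # Never push the queue past the limit on jobs in waiting states.
    if waiting >= max_waiting:
        return False
    # "More than 200 RUNNING" means that 200 or fewer counts as low.
    return running <= min_running or scheduled < target_scheduled


# Example: 150 running, 60 scheduled, 20 waiting -> time to submit.
print(needs_more_jobs(150, 60, 20))
```

How the waiting count is obtained (for example, whether SCHEDULED jobs are included in it) is left to whoever reads the numbers off the jobs table.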
Job submissions in production mode are done via the Scripter interface, as explained below.
Scripter
The Scripter is a user-friendly UI that produces all the scripts and commands needed for NA62 MC job submission (JDL, wrapper and .mac file), in both single and multiple job submission modes. The Scripter is located here:
https://na62.gla.ac.uk/index.php?task=scripter
This is an HTML form with many input fields, most of them self-explanatory. The pre-filled values are inherited from the previous submission (which could have been a test job, for example), so you must check that they fit the production round you are managing.
Here is what the Scripter interface looks like:
Description of the form fields:
- Choose description - this is now a drop down menu, containing items from the actual production schedule
- Run numbers - the start run is pre-filled with the next available run number (from the DB). Choose the upper limit such that you submit not more than 100 jobs at a time.
- Number of events - this is the number of events per job (run). We aim to keep the job runtime below 12 hours, so for channel 10 and v9/r261 that means 6000 events per job. For other channels you have to calculate an optimum. How to do this: submit a job with 300-500 events; when it finishes with its output saved locally, go to the jobs table and click the "Submission Date/Time" cell to expand the row and get detailed info. Find the "events per second" figure there. Calculate how many events (in multiples of 1000) we can run while keeping the total runtime below 12 hours. Check previous production jobs as well.
- Leave the random seed as it is, because it will be set automatically for each run.
- MC software version - you must use the latest software version (check here if unsure). Take a look at the scripts of previous jobs to make sure. There is a grid "version" for each installed software revision (e.g. v6/r188, v7/r193, v9/r261), see this wiki.
- Radiative corrections, Disable detector(s), Disable Cherenkov - leave default value here ("off", "none,none"), unless instructed otherwise.
- Destination - tick here only the sites that have the chosen MC software version installed. Check this table to make sure. Check the jobs history to detect any problems at sites (e.g. jobs consistently finishing early, or going to status CLEARED without registering any output). If jobs fail at a site, uncheck it here, notify the site admin and add a comment in the logbook!
- In single jobs mode, you can display commented lines from scripts in case you would like to check extra settings, comments etc. For production, leave this unchecked.
- Write scripts to disk/User and password - for multiple job submissions, you need to tick the "Write scripts to disk" checkbox and enter your uid and password for this interface. You must have registered and your credentials must have been validated for this to work. In single job mode, uid and password are not needed, since you will have to submit the (test) job with your credentials from your UI.
- Click Prepare, and you are taken to a new page. If the page says "There are scheduled submissions in there. Please try again in 5 minutes" it means that you have (or someone else has) just scheduled another batch of jobs and you have to wait for these to be actually submitted, else the scripts may be overwritten - with unpredictable results.
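The "Number of events" calculation described in the list above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the rate of 0.15 events per second is an invented example value, not a measurement for any channel.

```python
def events_per_job(events_per_second, max_hours=12, step=1000):
    """Largest multiple of `step` events that fits within `max_hours`.

    `events_per_second` is the figure read from the expanded row of a
    finished test job in the jobs table.
    """
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    max_events = events_per_second * max_hours * 3600
    # Round down to a multiple of `step` so the runtime stays under the limit.
    return int(max_events // step) * step


# Illustrative rate: 0.15 events/s allows 6480 events in 12 hours,
# which rounds down to 6000 events per job.
print(events_per_job(0.15))
```

A result of 0 means that fewer than 1000 events fit in the time limit, in which case the multiple-of-1000 granularity would need to be reconsidered.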
Multiple Submissions
Below is the screen that will be displayed after clicking the Prepare button for multiple submissions. Carefully double-check the settings here as well:
This example shows only two jobs. You can submit up to 60 at a time, but it is best to submit batches of 50 (these numbers may change, check this wiki before your shift). You can open the linked files to check that all settings are correct.
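Since each job corresponds to one run, the upper run number to enter in the Scripter follows directly from the start run and the batch size. Below is a minimal sketch of that arithmetic; the function name is hypothetical, the start run 1200 is an invented example, and the defaults (50 per batch, 60 maximum) are the figures quoted above and may change.

```python
def run_range(first_run, n_jobs=50, max_jobs=60):
    """Return (first_run, last_run), inclusive, for a batch of n_jobs runs."""
    if not 1 <= n_jobs <= max_jobs:
        raise ValueError(f"n_jobs must be between 1 and {max_jobs}")
    # One job per run, so the last run is first_run + n_jobs - 1.
    return first_run, first_run + n_jobs - 1


# Hypothetical start run 1200 with the default batch of 50 jobs.
print(run_range(1200))
```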
Do not use manual submission. Click Schedule to send these jobs to the bot. Relax. A cronjob will pick up these commands and execute them within the next 10 minutes. You will be able to see the result of your multiple submission by checking the jobs table.
If the action is not executed, you must notify the Glasgow Scotgrid team <uki-scotgrid-glasgow@physics.gla.ac.uk> as soon as possible, because something is wrong with the WMS endpoint.
Manual job submission
Jobs can be submitted manually, one by one, with your credentials (i.e. grid certificate), from the command line on your Grid UI. Run the Scripter in single run mode (no password is required), paste the commands provided by the Scripter into your UI terminal, and press Enter. If you have submitted a job this way (e.g. for testing the system), then use the form at the bottom of the page to insert the job specs and status URL in the run database.
You must have registered and your credentials must have been validated for this to work.
Remember that to submit jobs with your credentials, you must:
- have a valid Grid certificate (e.g. a CERN certificate)
- register for NA62 VO membership via https://voms.gridpp.ac.uk:8443/voms/na62.vo.gridpp.ac.uk/user/home.action (you must have the certificate uploaded into the browser for this).
- have access to a Grid UI (a computer with the necessary software and settings)
Make sure you have all the above. Familiarize yourself with Grid commands before trying this feature.
Troubleshooting
In case you find an error produced by the online interface, please immediately notify Dan, Janusz and Tonino.