Troubleshooting MC
This page lists the most common problems encountered with the NA62 MC code and the whole grid production setup, along with our solutions.
GGUS tickets
Some of the problems encountered are available as GGUS tickets:
Code compilation
Shared object portability issue. In Generator/GNUmakefile, define a new variable CMCLIBNAME, just above the existing CMCLIB definition:
CMCLIBNAME := libcmc.so
CMCLIB := $(CMCDIR)/libcmc.so
and then the target:
Generator: $(CMCLIB)
$(CMCLIB): $(OBJF) $(OBJC) $(COMMONOBJ) $(OBJCC)
	$(FC) -shared -Wl,-soname,$(CMCLIBNAME) -o $(CMCLIB) $^
This way the executable records the library by its soname (libcmc.so) rather than by its full path, which keeps the executable portable. The same trick applies in Beam/GNUmakefile, with the variable names BEAMLIBNAME and BEAMLIB.
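One way to check that the change took effect is to inspect the dynamic section of the library and of the executable with readelf. The sketch below is not part of the NA62 build; the helper names soname_of and needed_of are hypothetical, and it assumes the usual GNU readelf -d output format.

```shell
#!/bin/sh
# Hypothetical helpers: filter `readelf -d` output read on stdin.

# Print the soname recorded in a shared object.
soname_of() {
    sed -n 's/.*(SONAME).*\[\([^]]*\)\].*/\1/p'
}

# Print the libraries a binary asks the dynamic loader for.
needed_of() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)\].*/\1/p'
}

# Usage (paths are placeholders):
#   readelf -d path/to/libcmc.so  | soname_of
#   readelf -d path/to/executable | needed_of
```

If the soname is set, the first command should print libcmc.so, and the executable's NEEDED entries should list libcmc.so without a directory prefix.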
Access permissions
Many errors came from read/write access permissions, either for the local software area on sites or for storage paths on various SEs. Using two certificates, one for job submission (and writing to the local storage) and the other for output replication to RAL and CERN, complicates things somewhat.
Utilities
There are many utilities, mostly bash scripts, for fixing various problems:
- recover_zombies.sh - checks ZOMBIE jobs and updates the database if output is on CERN Castor (on svr020)
- cleanup.sh - deletes old output files from local storage on sites (on svr020)
- verify-castor.sh - checks if MC outputs are on CERN Castor and updates the DB (ppepc102)
- castor.sh - produces missing files lists for Janusz's controller
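
As an illustration of the missing-files idea only, the sketch below compares two plain-text lists with one file name per line. It is not the actual castor.sh: the function name missing_files and the two input files are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper, not the actual castor.sh.
# $1: file listing the expected outputs, one name per line
# $2: file listing the outputs found on storage, one name per line
# Prints every expected name that is absent from the storage listing.
missing_files() {
    # -F: fixed strings, -x: whole-line match, -v: keep non-matching lines
    grep -F -x -v -f "$2" "$1"
}

# Usage (file names are placeholders):
#   missing_files expected.txt on_storage.txt > missing.txt
```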