Troubleshooting MC

This page lists the most common problems encountered with the NA62 MC code and the whole grid production setup, along with our solutions.

GGUS tickets

Some of the problems encountered were reported as GGUS tickets:

Code compilation

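Shared object portability issue. Without a soname, an executable linked against the shared library records the library's full build path, which breaks portability. To fix this, define a new variable CMCLIBNAME in Generator/GNUmakefile, just above the existing CMCLIB definition:

CMCLIBNAME := libcmc.so
CMCLIB := $(CMCDIR)/libcmc.so

and then change the target so that the soname is passed to the linker (the recipe line must be indented with a tab character):

Generator: $(CMCLIB)

$(CMCLIB): $(OBJF) $(OBJC) $(COMMONOBJ) $(OBJCC)
	$(FC) -shared -Wl,-soname,$(CMCLIBNAME) -o $(CMCLIB) $^

This way the executable refers to the .so by its name only, without the full path, which ensures its portability. Apply the same change in Beam/GNUmakefile, using the variable names BEAMLIBNAME and BEAMLIB.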
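For reference, the corresponding change in Beam/GNUmakefile could look like the sketch below. Only BEAMLIBNAME and BEAMLIB come from this page; the library name libbeam.so, the directory variable BEAMDIR and the object list BEAMOBJ are hypothetical placeholders and should be replaced with whatever Beam/GNUmakefile actually defines. The check-soname target is an optional addition that uses readelf -d to confirm that the SONAME entry was written into the library.

```makefile
# Hypothetical names: libbeam.so, BEAMDIR and BEAMOBJ are placeholders for
# whatever Beam/GNUmakefile really uses; BEAMLIBNAME/BEAMLIB follow this page.
BEAMLIBNAME := libbeam.so
BEAMLIB := $(BEAMDIR)/$(BEAMLIBNAME)

# Store only the bare library name as SONAME, so executables linked against
# $(BEAMLIB) record "libbeam.so" rather than the absolute build path.
# Recipe lines must start with a TAB character.
$(BEAMLIB): $(BEAMOBJ)
	$(FC) -shared -Wl,-soname,$(BEAMLIBNAME) -o $(BEAMLIB) $^

# Optional sanity check: print the SONAME entry of the library's dynamic section.
.PHONY: check-soname
check-soname: $(BEAMLIB)
	readelf -d $(BEAMLIB) | grep SONAME
```

Running make check-soname should print a line containing "Library soname: [libbeam.so]"; if nothing is printed, the library was linked without a soname.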

Access permissions

Many errors were caused by read/write access permissions, either on the local software area at sites or on storage paths on various SEs (Storage Elements). Using two certificates, one for job submission (and writing to the local storage) and another for replicating output to RAL and CERN, complicates things somewhat.

Utilities

Several utilities, mostly bash scripts, are available for fixing various problems:

  • recover_zombies.sh - checks ZOMBIE jobs and updates the database if output is on CERN Castor (on svr020)
  • cleanup.sh - deletes old output files from local storage on sites (on svr020)
  • verify-castor.sh - checks whether MC outputs are on CERN Castor and updates the DB (on ppepc102)
  • castor.sh - produces lists of missing files for Janusz's controller

Topic revision: r9 - 2013-04-12 - DanProtopopescu