Jet Flavour Tagging Howto
This is a detailed record of how the Marlin framework and the included LCFI packages are used for jet flavour tagging. Tagging of b-jets is part of our analysis of the feasibility of the ZZ-fusion channel with CLIC ILD at 1.4 TeV.
Jet Finder and Truth Tagging
We use the LCFI flavour tagging package. It consists of the topological vertex finder ZVTOP, which reconstructs secondary vertices, and a multivariate classifier (neural nets) which combines several jet-related variables to tag bottom, charm and light-quark jets (see diagram).
Our steering file contains the jet finder, the truth flavour tagging and the LCFI processors, and writes new slcio files containing the added collections:
<group name="JetFinders"/>
<group name="MyTrueAngularJetFlavourProcessorCollection"/>
<processor name="IPRPCutProcessor"/>
<processor name="MyPerEventIPFitterProcessor"/>
<processor name="ZVRESRPCutProcessor"/>
<processor name="MyZVTOP_ZVRES"/>
<processor name="FTRPCutProcessor"/>
<processor name="MyFlavourTagInputsProcessor"/>
<processor name="MyLCIOOutputProcessor"/>
The JetFinder processors reconstruct each event as 2 jets and as 4 jets from the input collection (we used LooseSelectedPandoraPFANewPFOs). For the 4-jet reconstruction, MyTrueAngularJetFlavourProcessor determines the MC jet flavour by angular matching of heavy quarks to jets, and also determines the hadronic and partonic charge of each jet.
The LCFI processors have the following functions:
- IPRPCut: selects reconstructed particles based on track parameters, number of hits, etc.
- MyPerEventIPFitter: determines the IP position and its error from the tracks in an event by a simple fit
- ZVRESRPCut: applies cuts on the d0 and z0 values of the tracks
- MyZVTOP_ZVRES: topological vertex finder
- FTRPCut: flavour tagging cuts on reconstructed particles (on d0, z0 and pT)
- MyFlavourTagInputs: calculates, from vertices and tracks, the discriminating variables for the neural nets
Table of input and output collections for our setup (other names can, of course, be chosen):
| Processor | Type | Input collection(s) | Output collection(s) |
| JetFinder | SatoruJetFinder | LooseSelectedPandoraPFANewPFOs | Durham_4Jets |
| MyTrueAngularJetFlavour | TrueAngularJetFlavour | MCParticlesSkimmed, Durham_4Jets | TrueJetFlavour_4Jets |
| IPRPCut | RPCut | LooseSelectedPandoraPFANewPFOs | IPFitSelectedParticles |
| MyPerEventIPFitter | PerEventIPFitter | IPFitSelectedParticles | IPVertex |
| ZVRESRPCut | RPCut | RecoMCTruthLink, Durham_4Jets | ZVRESSelectedJets |
| MyZVTOP_ZVRES | ZVTOP_ZVRES | IPVertex, ZVRESSelectedJets | ZVRESDecayChains, ZVRESDecayChainRPTracks, ZVRESSelectedJets |
| FTRPCut | RPCut | RecoMCTruthLink, ZVRESDecayChains | FTSelectedJets |
| MyFlavourTagInputs | FlavourTagInputs | ZVRESDecayChains, FTSelectedJets | FlavourTagInputs |
Our input slcio files contain the collections LooseSelectedPandoraPFANewPFOs, MCParticlesSkimmed, PandoraPFANewClusters, PandoraPFANewPFOs, PandoraPFANewReclusterMonitoring, ProngVertices, RecoMCTruthLink, SelectedLDCTracks, SelectedPandoraPFANewPFOs, TightSelectedPandoraPFANewPFOs and V0Vertices.
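Because each processor reads collections written by earlier ones, the order of the execute section matters. Below is a minimal Python sketch of a consistency check over the table and the input-file collection list: the pipeline is transcribed by hand (it does not parse the steering XML), and `missing_inputs` is a hypothetical helper, not part of Marlin or LCFI.

```python
# Pipeline transcribed by hand from the collection table: (processor, inputs, outputs).
PIPELINE = [
    ("JetFinder", ["LooseSelectedPandoraPFANewPFOs"], ["Durham_4Jets"]),
    ("MyTrueAngularJetFlavour", ["MCParticlesSkimmed", "Durham_4Jets"], ["TrueJetFlavour_4Jets"]),
    ("IPRPCut", ["LooseSelectedPandoraPFANewPFOs"], ["IPFitSelectedParticles"]),
    ("MyPerEventIPFitter", ["IPFitSelectedParticles"], ["IPVertex"]),
    ("ZVRESRPCut", ["RecoMCTruthLink", "Durham_4Jets"], ["ZVRESSelectedJets"]),
    ("MyZVTOP_ZVRES", ["IPVertex", "ZVRESSelectedJets"],
     ["ZVRESDecayChains", "ZVRESDecayChainRPTracks", "ZVRESSelectedJets"]),
    ("FTRPCut", ["RecoMCTruthLink", "ZVRESDecayChains"], ["FTSelectedJets"]),
    ("MyFlavourTagInputs", ["ZVRESDecayChains", "FTSelectedJets"], ["FlavourTagInputs"]),
]

# Collections already present in our input slcio files.
INPUT_FILE_COLLECTIONS = {
    "LooseSelectedPandoraPFANewPFOs", "MCParticlesSkimmed", "PandoraPFANewClusters",
    "PandoraPFANewPFOs", "PandoraPFANewReclusterMonitoring", "ProngVertices",
    "RecoMCTruthLink", "SelectedLDCTracks", "SelectedPandoraPFANewPFOs",
    "TightSelectedPandoraPFANewPFOs", "V0Vertices",
}


def missing_inputs(pipeline, initial):
    """Return (processor, collection) pairs whose input collection is not yet
    available when the processor runs, assuming processors run in list order."""
    available = set(initial)
    problems = []
    for name, inputs, outputs in pipeline:
        for coll in inputs:
            if coll not in available:
                problems.append((name, coll))
        available.update(outputs)  # later processors can use these
    return problems


if __name__ == "__main__":
    print(missing_inputs(PIPELINE, INPUT_FILE_COLLECTIONS))  # [] for the order above
    # Moving the vertex finder in front of its input cut exposes the problem.
    swapped = PIPELINE[:4] + [PIPELINE[5], PIPELINE[4]] + PIPELINE[6:]
    print(missing_inputs(swapped, INPUT_FILE_COLLECTIONS))
```

The same check applies when the chain is split into steps: the collections available to a later step are those in the intermediate slcio file written by the previous one.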
The processors listed above can be run in a single pass, or split into several steps, with an LCIOOutput processor writing intermediate slcio output at each step. Here is a script for that, in which the intermediate XML steering files are slight modifications of the examples provided in LCFIVertex/steering. We found that the most time-consuming processor is ZVTOP_ZVRES, at more than 10 s/event.
The LCIOOutput processor creates new slcio files containing the collections added by the processors above.
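A minimal Python sketch of such a step-wise driver, assuming each step is run as `Marlin <steering.xml>` and that every steering file already names its own input file and the output file of its LCIOOutputProcessor (read by the next step). The step file names and their grouping of processors are hypothetical; with `dry_run=True` nothing is executed. The helper `zvtop_hours` only turns the observed >10 s/event into a lower bound on wall time.

```python
import shlex
import subprocess

# Hypothetical steering files, one per step; each is assumed to be a modified
# LCFIVertex/steering example that reads the previous step's slcio output.
STEPS = [
    "step1_jets_truthflavour.xml",
    "step2_ipfit.xml",
    "step3_zvtop.xml",
    "step4_ftinputs.xml",
]


def run_steps(steps, dry_run=True):
    """Run `Marlin <steering>` for each step in order, stopping at the first failure.
    Returns the command lines; with dry_run=True they are only built, not run."""
    commands = [["Marlin", xml] for xml in steps]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a crash
    return [shlex.join(cmd) for cmd in commands]


def zvtop_hours(n_events, seconds_per_event=10.0):
    """Lower bound on the wall time (hours) spent in ZVTOP_ZVRES alone."""
    return n_events * seconds_per_event / 3600.0


if __name__ == "__main__":
    for line in run_steps(STEPS):
        print(line)
    print(f"ZVTOP_ZVRES, 10000 events: > {zvtop_hours(10000):.1f} h")
```

For example, 10 000 events need roughly 28 hours in ZVTOP_ZVRES alone, which is one reason to write intermediate files and not rerun the cheaper steps.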
Troubleshooting: the b3_D0CutValue parameter of the IPRPCutProcessor had been set to 5O (digit 5, letter O) instead of 50, which caused a crash. For the ZVRESRPCut processor, h1_MCPIDEnable had to be set to false. See also this post.
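Typos like 5O are easy to miss by eye. A minimal Python sketch that scans a steering file for parameter values that look numeric but contain the letter O; it uses only the standard library, and the handling of a `value="..."` attribute is included as an assumption in case a steering file uses that form instead of element text.

```python
import re
import xml.etree.ElementTree as ET

# Characters that can appear in a number, plus the letters O/o that get mistyped for 0.
NUMBER_LIKE = re.compile(r"[0-9Oo.+\-eE]+")


def suspicious_numbers(steering_xml):
    """Yield (processor, parameter, token) for values that look numeric
    but contain the letter O, e.g. '5O' typed instead of '50'."""
    root = ET.fromstring(steering_xml)
    for proc in root.iter("processor"):
        for par in proc.iter("parameter"):
            # Value taken from a value="" attribute if present, else the element text.
            value = par.get("value", par.text or "")
            for token in value.split():
                if (NUMBER_LIKE.fullmatch(token)
                        and any(c.isdigit() for c in token)
                        and any(c in "Oo" for c in token)):
                    yield proc.get("name"), par.get("name"), token


SAMPLE = """<marlin>
  <processor name="IPRPCutProcessor">
    <parameter name="b3_D0CutValue">5O</parameter>
  </processor>
  <processor name="ZVRESRPCutProcessor">
    <parameter name="h1_MCPIDEnable">false</parameter>
  </processor>
</marlin>"""

if __name__ == "__main__":
    print(list(suspicious_numbers(SAMPLE)))
```

Non-numeric values such as false are ignored because they contain letters outside the number-like set.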
Neural Network Training
The slcio files created in the previous step contain the collections Durham_4Jets, FlavourTagInputs and TrueJetFlavour_4Jets, which we now use to train our neural nets with the NeuralNetTrainer code included in the LCFI package. Separate nets were trained for jets with 1, 2, or 3+ vertices, to identify b-jets, c-jets, and c-jets against a b-jet background. Our steering file contains only:
<processor name="MyNeuralNetTrainer" type="NeuralNetTrainer"/>
The neural nets are saved as XML files in nnets/ and are used for flavour tagging in the next step. No slcio output is written at this stage.
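Three taggers times three vertex categories give nine separate nets. A small Python sketch of that grid and of how a jet would be routed to a net, assuming (hypothetically) that the vertex count includes the IP vertex, so every jet has at least one, and a file-naming scheme like `b_net-1vtx.xml`; check the names actually written to nnets/ by your trainer configuration.

```python
from itertools import product

TAGS = ("b", "c", "bc")  # b-tag, c-tag, c-tag against b background
VERTEX_CATEGORIES = ("1vtx", "2vtx", "3plusvtx")


def vertex_category(n_vertices):
    """Map a jet's vertex count (assumed to include the IP vertex) to a net category."""
    if n_vertices < 1:
        raise ValueError("expected at least the IP vertex")
    return VERTEX_CATEGORIES[min(n_vertices, 3) - 1]


def net_file(tag, n_vertices, directory="nnets"):
    """Hypothetical path of the net used for this tag and vertex count."""
    return f"{directory}/{tag}_net-{vertex_category(n_vertices)}.xml"


# Every (tag, category) combination that has to be trained.
ALL_NETS = [f"{t}_net-{c}.xml" for t, c in product(TAGS, VERTEX_CATEGORIES)]

if __name__ == "__main__":
    print(len(ALL_NETS), "nets")
    print(net_file("bc", 2))
```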
Flavour Tagging
Now we are ready to run the FlavourTag processor, which performs flavour tagging using the neural nets trained in the previous step. The input slcio file contains the FlavourTagInputs and FTSelectedJets collections (or Durham_4Jets; we are not sure whether there is a difference at this level).
<processor name="MyFlavourTag"/>
<processor name="MyLCIOOutputProcessor"/>
The output slcio file will contain the FlavourTag collection, which will be used in our ZZ-fusion analysis.
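As a hedged illustration of how per-jet tag values are typically used downstream, this Python sketch counts the jets above a b-tag cut in each event and keeps events with at least a given number of tagged jets. The cut value, the `min_btagged` threshold and the plain-list event format are assumptions for illustration; reading the FlavourTag collection itself is not shown.

```python
def count_btagged(event_btags, cut):
    """Number of jets in one event whose b-tag value exceeds the cut."""
    return sum(1 for value in event_btags if value > cut)


def select_events(events, cut, min_btagged=2):
    """Indices of events with at least `min_btagged` jets above the cut.
    `events` is a list of per-event lists of b-tag values (hypothetical format)."""
    return [i for i, tags in enumerate(events) if count_btagged(tags, cut) >= min_btagged]


if __name__ == "__main__":
    # Toy input only, not analysis data: two 4-jet events.
    toy_events = [[0.9, 0.8, 0.1, 0.2], [0.9, 0.3, 0.2, 0.1]]
    print(select_events(toy_events, cut=0.7))
```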
Purity and Efficiency Studies
To determine the optimal cut for our b-tagging, a purity vs. efficiency study was performed. One can use the MakePurityVsEfficiencyRootPlot.C macro provided by the LCFIVertex package.
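For reference, the standard definitions behind such a plot: for a given cut on the b-tag output, the efficiency is the fraction of true b-jets that pass, and the purity is the fraction of passing jets that are true b-jets. A minimal Python sketch, assuming per-jet b-tag values and truth labels have already been extracted (e.g. from the FlavourTag and TrueJetFlavour_4Jets collections) into plain lists; it is not a reimplementation of the LCFIVertex macro.

```python
def purity_efficiency(btag_values, is_true_b, cuts):
    """Return (cut, efficiency, purity) for each cut.
    efficiency = true b-jets passing / all true b-jets
    purity     = true b-jets passing / all jets passing (None if nothing passes)"""
    n_true_b = sum(is_true_b)
    curve = []
    for cut in cuts:
        passed = [b for tag, b in zip(btag_values, is_true_b) if tag > cut]
        n_tagged = len(passed)
        n_tagged_b = sum(passed)
        efficiency = n_tagged_b / n_true_b if n_true_b else 0.0
        purity = n_tagged_b / n_tagged if n_tagged else None
        curve.append((cut, efficiency, purity))
    return curve


if __name__ == "__main__":
    # Toy input only: five jets, three of them true b-jets.
    tags = [0.9, 0.8, 0.3, 0.6, 0.1]
    truth = [True, True, True, False, False]
    for cut, eff, pur in purity_efficiency(tags, truth, [0.0, 0.5, 0.7, 0.95]):
        print(cut, eff, pur)
```

Raising the cut trades efficiency for purity; scanning the cut gives the curve from which a working point can be chosen.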
First, we have to run:
<processor name="MyAIDAProcessor"/>
<processor name="MyPlot"/>
<processor name="MyLCFIAIDAPlotProcessor"/>
We had to provide MyPlot with the actual name of the TrueJetFlavourCollection:
<parameter name="TrueJetFlavourCollection" type="string">TrueJetFlavour_4Jets</parameter>
Note that LCFI must be compiled with ROOT if one wants .root output from the PlotProcessor (instead of .txt). For this, add as usual
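FIND_PACKAGE( ROOT REQUIRED )
FOREACH( pkg ROOT )
    IF( ${pkg}_FOUND )
        INCLUDE_DIRECTORIES( ${${pkg}_INCLUDE_DIRS} )
        # also link against the ROOT libraries (usual ilcsoft package pattern)
        LINK_LIBRARIES( ${${pkg}_LIBRARIES} )
        ADD_DEFINITIONS( ${${pkg}_DEFINITIONS} )
    ENDIF()
ENDFOREACH()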
to the LCFIVertex/CMakeLists.txt file, source the ROOT environment, then run cmake and make install.
Once the plot processors have been run via Marlin, a RAIDA ROOT file is produced. Customise the MakePurityVsEfficiencyRootPlot.C macro and run it on the RAIDA file to produce the purity vs. efficiency plots.