Jet Flavour Tagging Howto

This is a detailed record of how the Marlin framework and the included LCFI packages are used for jet flavour tagging. Tagging of b-jets is part of our feasibility analysis of the ZZ fusion channel with CLIC ILD at 1.4 TeV.

Jet Finder and Truth Tagging

We use the LCFI flavour tagging package. This package consists of a topological vertex finder, ZVTOP, which reconstructs secondary interactions, and a multivariate classifier which combines several jet-related variables to tag bottom, charm, and light quark jets (see the attached flow diagram, LCFI_Flow_Diagram.pdf).

Our steering file will contain the jet finder, flavour tagging and LCFI processors, and we will write new slcio files containing the added collections:

  <group name="JetFinders"/>
  <group name="MyTrueAngularJetFlavourProcessorCollection"/>
  <processor name="IPRPCutProcessor"/>
  <processor name="MyPerEventIPFitterProcessor"/>
  <processor name="ZVRESRPCutProcessor"/>
  <processor name="MyZVTOP_ZVRES"/>
  <processor name="FTRPCutProcessor"/>
  <processor name="MyFlavourTagInputsProcessor"/>
  <processor name="MyLCIOOutputProcessor"/>  

The JetFinder processor reconstructs events as 2-jet and 4-jet configurations from the input collection (we used LooseSelectedPandoraPFANewPFOs). For the reconstructed 4-jet events, MyTrueAngularJetFlavourProcessor determines the MC jet flavour by angular matching of heavy quarks to jets, and also determines the hadronic and partonic charge of each jet.
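The idea of angular matching can be illustrated with a small sketch. This is not the code of the TrueAngularJetFlavour processor; it is a simplified illustration which assumes that each heavy quark is attached to the jet closest in angle, that a b quark takes precedence over a c quark in the same jet, and that a cut max_angle (a hypothetical name and value) rejects poor matches. The label 1 for light jets is also an assumption made here; 4 and 5 are the PDG codes of the c and b quarks.

```python
import math


def angle(u, v):
    """Opening angle in radians between two non-zero 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def truth_flavour(jets, heavy_quarks, max_angle=0.5):
    """Return one flavour label per jet: 5 (b), 4 (c) or 1 (light).

    jets: non-empty list of jet momentum 3-vectors.
    heavy_quarks: list of (pdg_code, momentum) with |pdg_code| in {4, 5}.
    max_angle: hypothetical matching cut in radians (an assumption).
    """
    flavours = [1] * len(jets)
    for pdg, momentum in heavy_quarks:
        # Attach each heavy quark to the jet closest in angle.
        best = min(range(len(jets)), key=lambda i: angle(jets[i], momentum))
        if angle(jets[best], momentum) < max_angle:
            # A b quark takes precedence over a c quark in the same jet.
            flavours[best] = max(flavours[best], abs(pdg))
    return flavours


# Toy event: four jets along the axes, two c quarks and one b quark.
jets = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, 0, 0)]
quarks = [(5, (0.9, 0.1, 0.0)), (-4, (0.0, 0.95, 0.05)), (4, (0.95, 0.0, 0.1))]
print(truth_flavour(jets, quarks))
```

In this toy event the first jet contains both a b and a c quark and is labelled as a b-jet, the second is a c-jet, and the remaining two are light.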

The LCFI processors have the following functions:

  • IPRPCut - selects Reconstructed Particles based on track parameters, number of hits etc.
  • MyPerEventIPFitter - determines IP position and error from the tracks in an event by simple fitting
  • ZVRESRPCut - applies cuts on the d0 and z0 values of the track
  • MyZVTOP_ZVRES - topological vertex finder
  • FTRPCut - flavour tagging reconstructed particle cuts (on d0, z0 and PT)
  • MyFlavourTagInputs - calculates the discriminating variables for the neural net from vertices and tracks

Table of input and output collections for our setup (one can choose other names, of course):

| # | Processor | Type | Input collection(s) | Output collection(s) |
| 1 | JetFinder | SatoruJetFinder | LooseSelectedPandoraPFANewPFOs | Durham_4Jets |
| 2 | MyTrueAngularJetFlavour | TrueAngularJetFlavour | MCParticlesSkimmed, Durham_4Jets | TrueJetFlavour_4Jets |
| 3 | IPRPCut | RPCut | LooseSelectedPandoraPFANewPFOs | IPFitSelectedParticles |
| 4 | MyPerEventIPFitter | PerEventIPFitter | IPFitSelectedParticles | IPVertex |
| 5 | ZVRESRPCut | RPCut | RecoMCTruthLink, Durham_4Jets | ZVRESSelectedJets |
| 6 | MyZVTOP_ZVRES | ZVTOP_ZVRES | IPVertex, ZVRESSelectedJets | ZVRESDecayChains, ZVRESDecayChainRPTracks, ZVRESSelectedJets |
| 7 | FTRPCut | RPCut | RecoMCTruthLink, ZVRESDecayChains | FTSelectedJets |
| 8 | MyFlavourTagInputs | FlavourTagInputs | ZVRESDecayChains, FTSelectedJets | FlavourTagInputs |

Our input slcio files contain the collections: LooseSelectedPandoraPFANewPFOs, MCParticlesSkimmed, PandoraPFANewClusters, PandoraPFANewPFOs, PandoraPFANewReclusterMonitoring, ProngVertices, RecoMCTruthLink, SelectedLDCTracks, SelectedPandoraPFANewPFOs, TightSelectedPandoraPFANewPFOs and V0Vertices.
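When renaming collections or splitting the chain into steps, it is easy to make a processor read a collection that does not exist yet. The following sketch checks the ordering, using the collection names from the table and the input file contents listed above. The function name missing_inputs and the data layout are our own choices for illustration, not part of Marlin or LCFI.

```python
# Collections already present in the input slcio files.
INPUT_FILE = {
    "LooseSelectedPandoraPFANewPFOs", "MCParticlesSkimmed",
    "PandoraPFANewClusters", "PandoraPFANewPFOs",
    "PandoraPFANewReclusterMonitoring", "ProngVertices", "RecoMCTruthLink",
    "SelectedLDCTracks", "SelectedPandoraPFANewPFOs",
    "TightSelectedPandoraPFANewPFOs", "V0Vertices",
}

# (processor, input collections, output collections), in execution order.
CHAIN = [
    ("JetFinder", ["LooseSelectedPandoraPFANewPFOs"], ["Durham_4Jets"]),
    ("MyTrueAngularJetFlavour", ["MCParticlesSkimmed", "Durham_4Jets"],
     ["TrueJetFlavour_4Jets"]),
    ("IPRPCut", ["LooseSelectedPandoraPFANewPFOs"], ["IPFitSelectedParticles"]),
    ("MyPerEventIPFitter", ["IPFitSelectedParticles"], ["IPVertex"]),
    ("ZVRESRPCut", ["RecoMCTruthLink", "Durham_4Jets"], ["ZVRESSelectedJets"]),
    ("MyZVTOP_ZVRES", ["IPVertex", "ZVRESSelectedJets"],
     ["ZVRESDecayChains", "ZVRESDecayChainRPTracks", "ZVRESSelectedJets"]),
    ("FTRPCut", ["RecoMCTruthLink", "ZVRESDecayChains"], ["FTSelectedJets"]),
    ("MyFlavourTagInputs", ["ZVRESDecayChains", "FTSelectedJets"],
     ["FlavourTagInputs"]),
]


def missing_inputs(chain, available):
    """Return (processor, collection) pairs that are read before they exist."""
    known = set(available)
    problems = []
    for name, inputs, outputs in chain:
        problems += [(name, c) for c in inputs if c not in known]
        known.update(outputs)
    return problems


print(missing_inputs(CHAIN, INPUT_FILE))      # prints []
print(missing_inputs(CHAIN[1:], INPUT_FILE))  # what breaks without the jet finder
```

With the full chain nothing is missing. Dropping the jet finder makes both MyTrueAngularJetFlavour and ZVRESRPCut fail to find Durham_4Jets.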

The processors listed above can be run in sequence, or split into several steps, invoking an LCIOOutput processor to write intermediate slcio output at every step. The attached script runLCFI.txt does the latter; its intermediate xml files are slight modifications of the files provided in LCFIVertex/steering examples. We found that the most time-consuming processor is ZVTOP_ZVRES, at more than 10 s/event.

The LCIOOutput processor creates new slcio files containing the new collections added by the above processors.
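Running the chain in several steps, as described above, can be sketched as a small driver that calls Marlin once per steering file. This is not the attached runLCFI.txt script. Apart from jet_truth_tag-steer.xml, the steering file names below are hypothetical placeholders, and the sketch assumes that each steering file already names the slcio output of the previous step as its input.

```python
import subprocess

# One steering file per step. Only the first name comes from this page;
# the others are hypothetical placeholders.
STEPS = [
    "jet_truth_tag-steer.xml",
    "ipfit-steer.xml",
    "zvres-steer.xml",
    "ftinputs-steer.xml",
]


def marlin_commands(steering_files):
    """Build one 'Marlin <steering file>' command per step."""
    return [["Marlin", name] for name in steering_files]


def run_chain(steering_files, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in marlin_commands(steering_files):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_chain(STEPS)  # dry run: only prints the commands
```

Stopping at the first failure avoids running the slow vertexing step on a broken intermediate file.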

Troubleshooting: The b3_D0CutValue parameter of the IPRPCutProcessor was set to 5O (letter O) instead of 50 (digit zero), which caused a crash. For the ZVRESRPCut processor, h1_MCPIDEnable had to be set to false. See also this post.
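A typo like the one above is hard to spot by eye. The sketch below scans a steering file for numeric parameters whose values do not parse as numbers. It assumes that parameters carry a type attribute (as in the parameter lines of the example steering files) and that the value is given either as element text or in a value attribute. The type="double" shown for b3_D0CutValue and the processor type names in the example are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NUMERIC_TYPES = {"int", "float", "double", "IntVec", "FloatVec", "DoubleVec"}


def bad_numeric_parameters(steering_xml):
    """Return (processor, parameter, token) for tokens that are not numbers."""
    root = ET.fromstring(steering_xml)
    bad = []
    for proc in root.iter("processor"):
        for par in proc.findall("parameter"):
            if par.get("type") not in NUMERIC_TYPES:
                continue
            # The value may be element text or a value attribute.
            text = par.get("value") or par.text or ""
            for token in text.split():
                try:
                    float(token)
                except ValueError:
                    bad.append((proc.get("name"), par.get("name"), token))
    return bad


# Minimal example reproducing the typo (letter O instead of digit zero).
EXAMPLE = """<marlin>
  <processor name="IPRPCutProcessor" type="RPCut">
    <parameter name="b3_D0CutValue" type="double">5O</parameter>
  </processor>
  <processor name="ZVRESRPCutProcessor" type="RPCut">
    <parameter name="h1_MCPIDEnable" type="bool">false</parameter>
  </processor>
</marlin>"""

print(bad_numeric_parameters(EXAMPLE))
```

The check is deliberately loose: it accepts anything Python's float() accepts, so it catches stray letters but not, for example, a non-integer value given to an int parameter.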

Neural Network Training

The slcio files created at the previous step contain the collections Durham_4Jets, FlavourTagInputs and TrueJetFlavour_4Jets, which we will use now to train our neural nets. We use the NeuralNetTrainer code included in the LCFI package. Separate nets were trained for 1, 2, or 3+ vertices to identify b-jets, c-jets, and c-jets with b background. Our steering file contains only:

  <processor name="MyNeuralNetTrainer" type="NeuralNetTrainer"/>

The neural nets are saved as XML files in nnets/ and will be used for flavour tagging (next step). No slcio output is written at this time.

Flavour Tagging

Now we are ready to employ the FlavourTag processor, which does the flavour tagging using the neural nets trained in the previous step. The input slcio file contains the FlavourTagInputs and FTSelectedJets collections (or Durham_4Jets; we are not sure whether there is a difference at this level).

  <processor name="MyFlavourTag"/>
  <processor name="MyLCIOOutputProcessor"/>  

The output slcio will contain the collection FlavourTag, which will be used for our ZZFusion analysis.

Purity and Efficiency Studies

To determine the optimal cut for our b-tagging, a purity vs. efficiency study needs to be done. We use the MakePurityVsEfficiencyRootPlot.C macro provided by the LCFIVertex package. First, we have to run:

  <processor name="MyAIDAProcessor"/>
  <processor name="MyPlot"/>
  <processor name="MyLCFIAIDAPlotProcessor"/>

We had to provide MyPlot with the actual name of the TrueJetFlavourCollection:

  <parameter name="TrueJetFlavourCollection" type="string">TrueJetFlavour_4Jets</parameter>

Note that LCFI must be compiled with ROOT if one wants .root output from PlotProcessor (instead of .txt). For this, we added

  FIND_PACKAGE( ROOT REQUIRED )
  FOREACH( pkg ROOT )
   IF( ${pkg}_FOUND )
    INCLUDE_DIRECTORIES( ${${pkg}_INCLUDE_DIRS} )
    ADD_DEFINITIONS( ${${pkg}_DEFINITIONS} )
   ENDIF()
  ENDFOREACH()

to the LCFIVertex/CMakeLists.txt file, sourced the ROOT environment, then ran cmake and make install.

Once the AIDA Plots processors are run via Marlin, a RAIDA root file is produced. We customised the MakePurityVsEfficiencyRootPlot.C macro and ran it to use this RAIDA file as input to produce the purity vs. efficiency plots:

root -l MakePurityVsEfficiencyRootPlot.C
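
For orientation, the two quantities in such a study can be written down in a few lines. This is not the MakePurityVsEfficiencyRootPlot.C macro; it is a sketch that assumes a jet is tagged as b when its neural net output exceeds a cut, with efficiency defined as the fraction of true b-jets that are tagged and purity as the fraction of tagged jets that are true b-jets. The neural net outputs below are made-up toy numbers.

```python
def purity_efficiency(b_outputs, other_outputs, cut):
    """Efficiency and purity of the selection 'NN output > cut'.

    b_outputs: NN outputs of true b-jets (must not be empty).
    other_outputs: NN outputs of all other jets.
    """
    tagged_b = sum(1 for x in b_outputs if x > cut)
    tagged_other = sum(1 for x in other_outputs if x > cut)
    efficiency = tagged_b / len(b_outputs)
    tagged = tagged_b + tagged_other
    # Purity is undefined when nothing passes the cut.
    purity = tagged_b / tagged if tagged else float("nan")
    return efficiency, purity


# Toy NN outputs, invented for illustration only.
b = [0.9, 0.8, 0.7, 0.2]
other = [0.6, 0.3, 0.1, 0.05]
for cut in (0.25, 0.5, 0.75):
    print(cut, purity_efficiency(b, other, cut))
```

Raising the cut trades efficiency for purity, which is why the optimal working point has to be read off the purity vs. efficiency curve.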

Topic attachments
  • LCFI_Flow_Diagram.pdf (87.7 K, 2013-07-01 16:10, DanProtopopescu) - LCFI processors - flow diagram
  • Timing-ScreenShot.png (70.9 K, 2013-07-01 16:03, DanProtopopescu) - Screen shot: time used by Marlin processors
  • Vertexing_Howto.pdf (743.5 K, 2013-07-01 16:52, DanProtopopescu) - Vertexing HowTo (Ben Jeffery)
  • jet_truth_tag-steer.xml (82.1 K, 2013-07-01 17:20, DanProtopopescu) - Steering file 1: jet finder and truth flavour tagging
  • runLCFI.txt (1.1 K, 2013-07-04 16:15, DanProtopopescu) - Script to run LCFI sequence of processors
  • BRTotalUncertBands_lm.png (111.7 K, 2013-07-01 14:41, DanProtopopescu) - Higgs branching ratios (from A. Denner et al., EPJ C71, p.1753)
Topic revision: r11 - 2013-07-10 - DanProtopopescu
 