MCLimits Fitting Code: Tool for Measuring Sensitivity

The MCLimits fitting code, developed by Tom Junk for the CDF Collaboration, is a hypothesis-testing tool. It uses the likelihood function to discriminate between two hypothesised theories, and the output of the test can be used to measure expected discovery and exclusion limits for new physics.
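The idea of discriminating between two hypotheses with a likelihood ratio can be sketched in a few lines. The example below is only an illustration of the general technique for binned Poisson counts (signal-plus-background against background-only), using toy pseudo-experiments to estimate CLs. It is not the MCLimits interface: the function names neg2_ln_q, poisson and cls are hypothetical, and systematic uncertainties are ignored.

```python
import math
import random


def neg2_ln_q(counts, signal, background):
    """Return -2 ln Q with Q = L(s+b) / L(b) for independent Poisson bins."""
    ln_q = 0.0
    for n, s, b in zip(counts, signal, background):
        # Per bin, ln Q = n * ln(1 + s/b) - s; the n! terms cancel in the ratio.
        ln_q += n * math.log(1.0 + s / b) - s
    return -2.0 * ln_q


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def cls(observed, signal, background, n_toys=2000, seed=1):
    """Estimate CLs = CLs+b / CLb from toy pseudo-experiments."""
    rng = random.Random(seed)
    q_obs = neg2_ln_q(observed, signal, background)

    def tail_fraction(means):
        # Fraction of toys at least as background-like as the observation.
        hits = 0
        for _ in range(n_toys):
            toy = [poisson(rng, m) for m in means]
            if neg2_ln_q(toy, signal, background) >= q_obs:
                hits += 1
        return hits / n_toys

    cl_sb = tail_fraction([s + b for s, b in zip(signal, background)])
    cl_b = tail_fraction(background)
    return cl_sb / cl_b if cl_b > 0 else float("inf")
```

With this convention a signal hypothesis is conventionally regarded as excluded at 95% confidence level when CLs falls below 0.05. For example, cls([5], [10.0], [5.0]) tests a single bin with 5 observed events, 10 expected signal events and 5 expected background events.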

This tool is currently used at both CDF and D0 at Fermilab for the combined exclusion measurements. This page details the use of the code at Glasgow to determine the sensitivity of the ATLAS Experiment to the SM Higgs boson. Thus far, it has been used both to assess the sensitivity of a Neural Network analysis on the Higgs plus associated top channel and to assess the combined sensitivity of four low-mass Higgs channels in the mass range 110 - 190 GeV.

The code can be downloaded in a tarball from the PhyStat website, by following this link. The file you download (as of 15th Feb 2010) is mclimit_feb17_2009.tgz.

MCLimits Code

The MC Limits tarball should be downloaded following the link above. Several files are included in the tarball in a directory, mclimit/. Details of the files are given below, alongside their purpose.

  • mclimits.C
  • mclimits.h
  • toy.C
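As a convenience, the tarball can be unpacked and its contents listed from Python instead of the command line. This is a minimal sketch using only the standard library; the helper name unpack_tarball is hypothetical, and it should only be used on an archive from a trusted source, since it extracts every member as-is.

```python
import tarfile


def unpack_tarball(archive_path, destination="."):
    """Extract a gzipped tarball and return the sorted names of its regular files."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = sorted(m.name for m in archive.getmembers() if m.isfile())
        archive.extractall(destination)
    return names


# Example call (assumes the tarball is in the current directory):
#   for name in unpack_tarball("mclimit_feb17_2009.tgz"):
#       print(name)
```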

Project Aims

This project aims to develop an Artificial Neural Network (ANN) system and fitting software for the analysis of data from inclusive Higgs searches at ATLAS involving a lepton trigger and Higgs decay to b+bbar. It also aims to document this software, both its development and how it is to be used for data analysis - specifically separating signal from background and obtaining exclusion limits based on the anticipated luminosity. It is desirable that the documentation should allow the data analysis system to be used by those without advanced knowledge of ROOT or C++ programming, and should include a user-friendly guide to gaining access to relevant resources (TWikis, Grid, etc.).

Prerequisites

Before starting data analysis it is necessary to gain access to several online resources, specifically the Glasgow ATLAS TWiki and the central ATLAS TWiki, and to obtain a Grid certificate to allow use of Subversion, a software repository tool. The first and third of these can be accomplished easily enough - see instructions on Full TWiki Access and Grid Computing respectively. However, gaining access to the central ATLAS TWiki requires a CERN computer account.

Once you have a Grid certificate you must follow the instructions on the PPEIT TWiki to obtain access to the Subversion repositories.

Online Resources

The central ATLAS TWiki is an invaluable resource, with extensive information on all aspects of data analysis, including a detailed guide to the relevant software. For an introductory overview see the analysis and computing workbooks here. The computing workbook gives a detailed account of the full chain of data generation and analysis, together with guidance on using Athena and the Grid. This is intended to be consulted before the analysis workbook, since the latter assumes a reasonable working knowledge of the ATLAS computing environment.

Tools

ANN :- This is a kind of algorithm with a structure consisting of "neurons" organised in a sequence of layers. The most common type, which is used here, is the Multi-Layer Perceptron (MLP), which comprises three kinds of layer. The input neurons are activated by a set trigger, and once activated they pass data on to a further set of "hidden" neurons (which can in principle be organised into any number of layers, but most frequently one or two), and finally the processed data is forwarded to the output neurons.
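The layer structure described above can be illustrated with a forward pass through a small MLP. This is a sketch only: the network size (2 inputs, 3 hidden neurons, 1 output), the sigmoid activation and all weight values are assumptions chosen for illustration, not the configuration used in the analysis.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: each neuron applies a sigmoid to its weighted sum."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass the inputs through one hidden layer and then the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
HIDDEN_W = [[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]]
HIDDEN_B = [0.0, 0.1, -0.1]
OUTPUT_W = [[1.2, -0.6, 0.9]]
OUTPUT_B = [0.05]

score = mlp_forward([1.0, 2.0], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)[0]
```

Training adjusts the weights and biases so that the output score is close to 1 for signal-like inputs and close to 0 for background-like inputs.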

The key feature of a neural network is its ability to be "trained" to recognise patterns in data, allowing high efficiency algorithms to be developed with relative ease. This training is typically done with sample data which has been generated artificially, resulting in an algorithm that is very effective at recognising certain patterns in data sets. The main shortcoming is the danger of "over-training" an ANN, meaning that it fits the particular features of the training sample too closely, so that it becomes overly discriminating and searches across a narrower range of patterns than is desired (one countermeasure is to add extra noise to training data).

ROOT :- This is a data analysis package developed by CERN specifically for use in nuclear and particle physics. It is extremely flexible and powerful, especially for generating graphical representations (e.g. histograms) of data. Essentially, it may be thought of as an object-oriented framework built on C++, although ROOT code generally does not resemble pure C++. Some useful ROOT resources are:

Limitations

  • Due to ROOT limitations, the code cannot output files larger than 2 GB. This has been remedied by cutting the input: an additional constraint was added to genemflat_batch.sh, specifically GeneralParameter string 1 Constraint=(my_failEvent&65536)==0.
  • It must also be run on a PBS machine because of the structure of the genemflat_batch.sh file (i.e. PBS commands).
  • The script is set up to use an older version of the code (v4_02) - do newer versions offer any improvements?
  • The file teststeerFlatPlotterATLAStthSemileptonic.txt appears to contain an invalid range for the pseudorapidity (max. value = pi)
  • Adjusted ntuple_area so that it can be set dynamically (i.e. as a 9th input variable)
  • If USEHILOSB is set to 1 then && must be appended to cut criteria, e.g. GeneralParameter string 1 Constraint=(my_failEvent&65536)==0&&
  • It would be desirable to adapt the code to be able to process different signals, e.g. lbb.
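The constraint quoted in the list above is a bitmask test: 65536 is 1 << 16, so the cut keeps only events in which bit 16 of my_failEvent is not set. The sketch below mirrors that logic and the rule about appending && when USEHILOSB is 1. The helper names passes_cut and constraint_line are hypothetical, and what bit 16 signifies in the ntuples is not stated on this page.

```python
FAIL_BIT = 65536  # equals 1 << 16, i.e. only bit 16 is set


def passes_cut(my_fail_event):
    """Mirror of the cut (my_failEvent&65536)==0: keep events with bit 16 clear."""
    return (my_fail_event & FAIL_BIT) == 0


def constraint_line(cut, use_hilosb):
    """Build the steering line, appending && to the cut when USEHILOSB is 1."""
    suffix = "&&" if use_hilosb else ""
    return "GeneralParameter string 1 Constraint=" + cut + suffix
```

For example, constraint_line("(my_failEvent&65536)==0", True) reproduces the second form of the line shown in the list.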

TMVA Training Plots

There is a macro in the latest NNFitter version which will calculate the responsiveness of the neural net as a function of the number of training cycles, in order to gauge the optimal number of cycles to use (i.e. avoid the dangers of under- or over-training). The macro must be run in a directory where a neural net run has already been carried out.

  • Type source exportscript.sh to export the relevant parameters.
  • Enter ROOT and type .x runTrainTest.C

This will create two .eps output files, one showing the success of signal/background fitting, and the other displaying the sensitivity of the neural net to the number of training cycles used, allowing the speed of convergence to be gauged.
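One common way to read such a plot is to pick the number of cycles at which the error on an independent test sample is lowest, since a test error that rises while the training error keeps falling is a sign of over-training. The sketch below shows that selection; it is not what runTrainTest.C does internally, the function name best_cycle_count is hypothetical, and the error values are made-up numbers for illustration only.

```python
def best_cycle_count(cycles, test_errors):
    """Return the cycle count with the lowest error on the independent test sample."""
    best_index = min(range(len(cycles)), key=lambda i: test_errors[i])
    return cycles[best_index]


# Made-up numbers for illustration only (not results from any real training run).
CYCLES = [100, 200, 400, 800, 1600]
TRAIN_ERRORS = [0.30, 0.24, 0.20, 0.17, 0.15]  # keeps falling with more cycles
TEST_ERRORS = [0.31, 0.26, 0.23, 0.24, 0.27]   # rises again once over-trained

optimal = best_cycle_count(CYCLES, TEST_ERRORS)
```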

Setting up Subversion

Note - the first part of this section outlines the procedure used to set up a repository with Subversion. Once the repository has been created, these commands do not need to be used again, so if you are working with code from an existing repository you should start at checking out.

Running analysis & making ntuples

This is the procedure used to remake the ATLAS ntuples, using code from CERN Subversion repositories. These ntuples were then used as input for the neural net.

-- CatherineWright - 2010-02-12

Topic revision: r6 - 2010-02-15 - CatherineWright
 