MCLimits Fitting Code: Tool for Measuring Sensitivity
The MCLimits fitting code, developed by Tom Junk for the CDF Collaboration, is a hypothesis-testing tool. It uses the likelihood function to discriminate between two hypothesised theories, and the output of the test can be used to estimate expected discovery and exclusion limits for new physics.
This tool is currently used at both CDF and D0 at Fermilab for the combined exclusion measurements. This page details the use of the code at Glasgow as a tool for determining the sensitivity of the ATLAS Experiment to the SM Higgs boson. Thus far, it has been used both to assess the sensitivity of a Neural Network analysis of the Higgs plus associated top channel and to assess the combined sensitivity of four low-mass Higgs channels in the mass range 110 - 190 GeV.
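As an illustration of the likelihood-based discrimination described above, a test statistic commonly used in searches of this kind is the log-likelihood ratio between the signal-plus-background and background-only hypotheses. The sketch below assumes independent Poisson-distributed bin counts and no systematic uncertainties; the function name and its inputs are hypothetical, and it is not a reproduction of what the MCLimits code does internally.

```python
import math


def neg2_ln_q(observed, signal, background):
    """Return -2 ln Q for independent Poisson bins, with Q = L(s+b) / L(b).

    observed, signal and background are equal-length sequences holding the
    observed count, expected signal and expected background in each bin.
    Every background entry is assumed to be strictly positive.
    """
    total = 0.0
    for n, s, b in zip(observed, signal, background):
        # For one bin: ln Q = -s + n * ln(1 + s/b)
        total += 2.0 * (s - n * math.log(1.0 + s / b))
    return total


# Signal-like data (an excess over background) gives a more negative value
# than background-like data for the same signal and background expectations.
signal_like = neg2_ln_q([15], [5.0], [10.0])
background_like = neg2_ln_q([10], [5.0], [10.0])
```

More negative values of -2 ln Q indicate data that look more signal-like, which is what allows the two hypotheses to be separated.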
The code can be downloaded as a tarball from the PhyStat website, by following this link. The file you download (as of 15th Feb 2010) is mclimit_feb17_2009.tgz.
MCLimits Code
The MCLimits tarball should be downloaded by following the link above. It contains a directory, mclimit/, which holds several files. The files are listed below.
- mclimits.C
- mclimits.h
- toy.C
Project Aims
This project aims to develop an Artificial Neural Network (ANN) system and fitting software for the analysis of data from inclusive Higgs searches at ATLAS involving a lepton trigger and Higgs decay to b+bbar. It also aims to document this software, both its development and how it is to be used for data analysis - specifically separating signal from background and obtaining exclusion limits based on the anticipated luminosity. It is desirable that the documentation should allow the data analysis system to be used by those without advanced knowledge of ROOT or C++ programming, and should include a user-friendly guide to gaining access to relevant resources (TWikis, Grid, etc.).
Prerequisites
Before starting data analysis it is necessary to gain access to several online resources, specifically the Glasgow ATLAS TWiki and the central ATLAS TWiki, as well as obtaining a Grid certificate to allow use of Subversion, a version control system. The first and third of these can be accomplished easily enough - see the instructions on Full TWiki Access and Grid Computing respectively. However, gaining access to the central ATLAS TWiki requires a CERN computer account.
Once you have a Grid certificate you must follow the instructions on the PPEIT TWiki to obtain access to the Subversion repositories.
Online Resources
The central ATLAS TWiki is an invaluable resource, with extensive information on all aspects of data analysis, including a detailed guide to the relevant software. For an introductory overview see the analysis and computing workbooks here. The computing workbook gives a detailed account of the full chain of data generation and analysis, together with guidance on using Athena and the Grid. It is intended to be consulted before the analysis workbook, since the latter assumes a reasonable working knowledge of the ATLAS computing environment.
Tools
ANN :- An artificial neural network is an algorithm built from "neurons" organised in a sequence of layers. The most common type, which is used here, is the Multi-Layer Perceptron (MLP), which comprises three kinds of layer. The input neurons receive the input variables and pass them on to a further set of "hidden" neurons (which can in principle be organised into any number of layers, but most frequently one or two). Each hidden neuron applies an activation function to a weighted sum of its inputs, and finally the processed data is forwarded to the output neurons.
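The layered structure described above can be sketched as a forward pass through a network with a single hidden layer. This is a minimal illustration only: the function names, the choice of a sigmoid activation and the single output neuron are assumptions made for the example, not details of the neural net used in this analysis.

```python
import math


def sigmoid(x):
    """Logistic activation function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Evaluate a one-hidden-layer MLP with a single output neuron.

    hidden_weights holds one list of weights per hidden neuron (one weight
    per input variable); output_weights holds one weight per hidden neuron.
    """
    # Each hidden neuron takes a weighted sum of the inputs plus a bias.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron combines the hidden-layer activations in the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Two input variables, two hidden neurons, arbitrary illustrative weights.
response = mlp_forward(
    [0.5, -1.2],
    hidden_weights=[[0.8, -0.3], [0.1, 0.9]],
    hidden_biases=[0.0, 0.2],
    output_weights=[1.5, -0.7],
    output_bias=0.1,
)
```

Training consists of adjusting the weights and biases so that the output is close to 1 for signal events and close to 0 for background events.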
The key feature of a neural network is its ability to be "trained" to recognise patterns in data, allowing high efficiency algorithms to be developed with relative ease. This training is typically done with sample data which has been generated artificially (i.e. simulated), resulting in an algorithm that is very effective at recognising certain patterns in data sets. A major shortcoming is the danger of "over-training" an ANN, meaning that it learns the statistical fluctuations of the training sample rather than the general features of the data, and so performs worse on independent data (one countermeasure is to add extra noise to training data).
ROOT :- This is a data analysis framework developed at CERN, used mainly in nuclear and particle physics. It is extremely flexible and powerful, especially for generating graphical representations (e.g. histograms) of data. It is an object-oriented framework written in C++ and includes an interactive C++ interpreter, although ROOT macros often do not resemble standard C++. Some useful ROOT resources are:
Limitations
- Due to ROOT limitations, output files larger than 2GB cannot be written. This has been remedied by cutting the input: an additional constraint was added to genemflat_batch.sh, specifically GeneralParameter string 1 Constraint=(my_failEvent&65536)==0.
- The code must also be run on a PBS machine, because genemflat_batch.sh contains PBS commands.
- The script is set up to use an older version of the code (v4_02) - do newer versions offer any improvements?
- The file teststeerFlatPlotterATLAStthSemileptonic.txt appears to contain an invalid range for the pseudorapidity (max. value = pi)
- ntuple_area has been adjusted so that it can be set dynamically (i.e. as a 9th input variable)
- If USEHILOSB is set to 1 then && must be appended to cut criteria, e.g. GeneralParameter string 1 Constraint=(my_failEvent&65536)==0&&
- It would be desirable to adapt the code to be able to process different signals, e.g. lbb.
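The constraint (my_failEvent&65536)==0 quoted in the list above is a bitwise test: 65536 is 2 to the power 16, so the cut keeps only events in which that one bit of my_failEvent is not set. The sketch below shows the same logic in isolation. What this bit actually flags is not stated on this page, and the helper name and the example values are hypothetical.

```python
FAIL_BIT_MASK = 65536  # equal to 1 << 16, i.e. only bit 16 is set


def passes_constraint(my_fail_event):
    """Return True if bit 16 of the failure word is clear."""
    return (my_fail_event & FAIL_BIT_MASK) == 0


# Made-up failure words: the second and third have bit 16 set.
events = [0, 65536, 65537, 3]
kept = [value for value in events if passes_constraint(value)]
print(kept)
```

Other bits of my_failEvent do not affect the result, so events that fail for unrelated reasons are still kept by this particular cut.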
TMVA Training Plots
There is a macro in the latest NNFitter version which will calculate the responsiveness of the neural net as a function of the number of training cycles, in order to gauge the optimal number of cycles to use (i.e. avoid the dangers of under- or over-training). The macro must be run in a directory where a neural net run has already been carried out.
- Type source exportscript.sh to export the relevant parameters.
- Enter ROOT and type .x runTrainTest.C
This will create two .eps output files, one showing the success of signal/background fitting, and the other displaying the sensitivity of the neural net to the number of training cycles used, allowing the speed of convergence to be gauged.
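One way to read such a plot is to pick the number of training cycles at which the error on an independent test sample is smallest: too few cycles leave the net under-trained, while beyond the minimum the test error rises again as over-training sets in. The sketch below illustrates that choice. It is not what runTrainTest.C does internally, the function name is hypothetical, and the numbers are made up purely for illustration.

```python
def best_cycle_count(cycles, test_errors):
    """Return the number of cycles with the lowest test-sample error.

    cycles and test_errors are equal-length sequences; if several entries
    share the minimum error, the smallest such index is used.
    """
    best_index = min(range(len(test_errors)), key=test_errors.__getitem__)
    return cycles[best_index]


# Illustrative values only: the test error falls, then rises again.
cycles = [100, 200, 400, 800]
test_errors = [0.40, 0.31, 0.28, 0.33]
optimal = best_cycle_count(cycles, test_errors)
```

With these example values the minimum lies at 400 cycles, so training for 800 cycles would already be in the over-trained regime.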
Setting up Subversion
Note - the first part of this section outlines the procedure used to set up a repository with Subversion. Once the repository has been created, these commands do not need to be used again, so if you are working with code from an existing repository you should start at checking out.
Running analysis & making ntuples
This is the procedure used to remake the ATLAS ntuples, using code from CERN Subversion repositories. These ntuples were then used as input for the neural net.
--
CatherineWright - 2010-02-12