MCLimits Fitting Code: Tool for Measuring Sensitivity

This tool is currently utilised at both CDF and D0 at Fermilab for the combined exclusion measurements. This page details the use of the code at Glasgow as a tool for determining the sensitivity of the ATLAS experiment to the SM Higgs boson. Thus far, the tool has been used both to assess the sensitivity of a Neural Network analysis of the Higgs plus associated top channel and to assess the combined sensitivity of four low-mass Higgs channels in the mass range 110 - 190 GeV.

The code can be downloaded as a tarball from the PhyStat website by following this link. The file you download (as of 15th Feb 2010) is mclimit_feb17_2009.tgz.

MCLimits Tarball

 
The MC Limits tarball should be downloaded by following the link above. Several files are included in the tarball, in a directory mclimit/. These are detailed below.

Documentation

 
Significant documentation comes with the tarball. It gives details of the methods and functions available in mclimits, as well as an outline of the statistical framework:
  • mclimit_csm.pdf - Details the general statistical framework and describes each of the methods available in the mclimits code.
  • chisquare.pdf - Describes the relationship between a chi^2 and a likelihood, detailing the specific chi^2 function that is minimised, and discusses how systematic uncertainties are handled.
  • genlimit.pdf - Describes the Bayesian limit calculator, developed by J. Heinrich and utilised by CDF/D0 to produce the combined exclusion limits for the two experiments as a function of the SM cross-section.
  • mclimit.html - Contains details of updates to the code, along with brief explanations of what each file is.
  • README - A warning about the example code for running; see below.

The Code

  • mclimit_csm.C - The code itself, containing all the methods and functions necessary to test two hypotheses and produce discovery sensitivities and exclusion limits.
  • mclimit_csm.h - The corresponding header file.
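
Assuming a standard ROOT installation, the code would typically be compiled and loaded with ACLiC inside a ROOT session before running any driver macro, e.g. (prepare_simple.C is a hypothetical macro name; a sketch of one is given under Running the Code below):

  root [0] .L mclimit_csm.C+     // compile and load the mclimits classes
  root [1] .x prepare_simple.C   // then run an input-preparation macro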

Test Files - Beware

 
These files DO NOT work (see the README file for details):
  • tchanlc.C
  • tchan_cls.C
  • preparetchan.h
  • Makefile
  • Makefile.arch
 
This file might work, but has not been tested. It takes the output of the mclimits method csm_model::print() (detailed in the documentation, mclimit_csm.pdf) and produces a webpage of template names and systematics sources for each template:
  • ptohtml.pl
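
As a sketch (assuming csm_model::print() writes to standard output; testhyp stands for any csm_model you have built, as in the example under Running the Code below), its input could be captured from a ROOT macro using TSystem output redirection:

  gSystem->RedirectOutput("model_print.txt", "w"); // divert stdout to a file
  testhyp->print();                                // dump template names and systematics
  gSystem->RedirectOutput(0);                      // restore stdout

The resulting model_print.txt can then be passed to ptohtml.pl to build the webpage.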
 
Running the Code

 
To run the mclimits package, you need to produce a file which prepares the inputs to mclimits. The tchan* files are examples of this; however, they require specific ROOT files that do not come with the tarball, so a simple example is provided here.
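
A minimal sketch of such an input-preparation macro follows. It is illustrative only: the histograms, the systematic name and all the numbers are invented, and the csm_model / mclimit_csm calls follow the interface described in mclimit_csm.pdf, so the add_template argument order and flag values should be checked against mclimit_csm.h before use.

  // prepare_simple.C - illustrative input-preparation macro (not from the tarball).
  // Load the code first in ROOT (.L mclimit_csm.C+), then run with .x prepare_simple.C
  #include "TH1F.h"
  #include "TRandom3.h"
  #include <iostream>
  #include "mclimit_csm.h"

  void prepare_simple()
  {
    // Invented templates: flat background, Gaussian signal, background-like "data".
    TH1F *bg   = new TH1F("bg",   "background",  20, 0., 1.);
    TH1F *sig  = new TH1F("sig",  "signal",      20, 0., 1.);
    TH1F *data = new TH1F("data", "pseudo-data", 20, 0., 1.);
    TRandom3 rnd(1234);
    for (int i = 0; i < 1000; ++i) bg->Fill(rnd.Uniform());
    for (int i = 0; i < 50;   ++i) sig->Fill(rnd.Gaus(0.8, 0.05));
    for (int i = 0; i < 1000; ++i) data->Fill(rnd.Uniform());

    // One invented rate systematic (+/- 10%) on the background normalisation.
    char *npname[1]      = { (char *)"bgnorm" };
    Double_t nps_low[1]  = { -0.10 };
    Double_t nps_high[1] = {  0.10 };
    TH1 *noshape[1]      = { 0 };    // no shape systematics in this sketch
    Double_t nosigma[1]  = { 0. };

    // Null hypothesis: background only; test hypothesis: signal + background.
    // Assumed add_template argument order (from mclimit_csm.pdf): template,
    // scale factor, number of nuisance parameters, their names and low/high
    // rate errors, low/high shape histograms and sigmas, Poisson flag,
    // scale-with-signal flag, channel name.
    char *chan = (char *)"channel1";
    csm_model *nullhyp = new csm_model();
    csm_model *testhyp = new csm_model();
    nullhyp->add_template(bg, 1.0, 1, npname, nps_low, nps_high,
                          noshape, nosigma, noshape, nosigma, 1, 0, chan);
    testhyp->add_template(bg, 1.0, 1, npname, nps_low, nps_high,
                          noshape, nosigma, noshape, nosigma, 1, 0, chan);
    testhyp->add_template(sig, 1.0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, chan);

    // Hand the models and the data to the calculator and run pseudoexperiments.
    mclimit_csm *mcl = new mclimit_csm();
    mcl->set_null_hypothesis(nullhyp);
    mcl->set_test_hypothesis(testhyp);
    mcl->set_null_hypothesis_pe(nullhyp);  // models used to generate the pseudoexperiments
    mcl->set_test_hypothesis_pe(testhyp);
    mcl->set_datahist(data, chan);
    mcl->set_npe(10000);                   // number of pseudoexperiments
    mcl->run_pseudoexperiments();

    std::cout << "CLs   = " << mcl->cls()   << std::endl;
    std::cout << "1-CLb = " << mcl->omclb() << std::endl;
  }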
 

Useful Output

 
The output from the MCLimits code depends heavily on which methods are selected and run. However, there is a set of standard outputs that are of use when performing a hypothesis test and assessing sensitivity. These are detailed here:
 
  • PDFs
  • Lumi95
  • Lumi3sigma
  • Lumi5sigma
  • CLs +/- 1,2sigma
  • 1-CLb +/- 1,2sigma
  • J. Heinrich's Bayesian Limit results (s95med, s95p1, s95p2, s95m1, s95m2)
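
As an illustration, these quantities might be printed after run_pseudoexperiments() in the sketch above. The accessor names here are assumed from the list above and mclimit_csm.pdf, and should be verified against mclimit_csm.h:

  // Standard sensitivity outputs (assumed accessor names).
  std::cout << "Lumi95     = " << mcl->lumi95() << std::endl; // luminosity multiplier for 95% CL exclusion
  std::cout << "Lumi3sigma = " << mcl->lumi3s() << std::endl; // luminosity multiplier for 3 sigma evidence
  std::cout << "Lumi5sigma = " << mcl->lumi5s() << std::endl; // luminosity multiplier for 5 sigma discovery
  std::cout << "CLs        = " << mcl->cls()    << std::endl;
  std::cout << "1-CLb      = " << mcl->omclb()  << std::endl;
  std::cout << "s95 median = " << mcl->s95med() << std::endl; // J. Heinrich's Bayesian limit, median expected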
 
Useful Statistics Documentation

 
There are myriad statistics books and papers available. Papers useful for understanding the statistical concepts adopted in the mclimits code are listed here:
  • "Confidence Level Computation for Combining Searches with Small Statistics", Thomas Junk, arXiv:hep-ex/9902006v1
  • "Presentation of search results: the CLs technique", A.L. Read, J.Phys. G: Nucl.Part. Phys. 28 (2002) 2693-2704
  • "Signal Significance in Particle Physics", Pekka K. Sinervo, arXiv:hep-ex/0208005v1 (CDF/PUB/STATISTICS/PUBLIC/6031)
  • "How to Claim a Discovery", W.A.Rolke and A.M. Lopez, PHYSTAT-2003-MOBT002
  • "Sensitivity of Searches for New Signals and Its Optimization", Giovanni Punzi, PHYSTAT-2003-MODT002
  • "Evaluation of three methods for calculating statistical significance when incorporating a systematic uncertainty into a test of the background-only hypothesis for a Poisson process", Robert D. Cousins, James T. Linnemann, Jordan Tucker, arXiv:physics/0702156v3
  • "Combined CDF and D0 Upper Limits on Standard Model Higgs-Boson Production with 2.1 - 5.4 fb-1 of Data", The TEVNPH Working Group, arXiv:0911.3930v1
-- CatherineWright - 2010-02-12
 