ATLAS Web>HiggsAnalysisAtATLASUsingRooStats (revision 38)~~EditAttach~~

-- WilliamBreadenMadden - 2012-01-20

Higgs analysis at ATLAS using RooStats

This page contains basic information on getting started with Higgs analysis at ATLAS using RooStats.

Higgs analysis at ATLAS using RooStats

What is RooStats?

RooStats is a project to create statistical tools built on top of the RooFit library, which is a data modelling toolkit. It is distributed in ROOT. Specifically, it has been distributed in the ROOT release since version 5.22 (December 2008). The latest version of ROOT is recommended as the RooStats project developes quickly.

Using ROOT on the Glasgow PPELX network

There are instructions on how to use the different versions of ROOT at Glasgow here.

Execute the following commands in order to set up ROOT version 5.32.00 on PPELX:

export ROOTSYS=/data/ppe01/sl5x/x86_64/root/5.32.00
export PATH=$ROOTSYS/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib/root:$LD_LIBRARY_PATH

Using ROOT on the CERN LXPLUS network

Execute the following commands in order to set up ROOT version 5.32.00 on LXPLUS:

. /afs/cern.ch/sw/lcg/external/gcc/4.3.2/x86_64-slc5/setup.sh
cd /afs/cern.ch/sw/lcg/app/releases/ROOT/5.32.00/x86_64-slc5-gcc43-opt/root/
. bin/thisroot.sh

Setting up RooStats

There are three main options available for acquiring ROOT with RooStats included.

Option 1: Download the latest ROOT release binaries.

The latest ROOT binaries for various operating systems are accessible here.

Option 2: Build the ROOT trunk from source.

Follow the appropriate instructions here to build the ROOT trunk.

Shell script: building ROOT with RooFit and RooStats

#!/bin/bash # This script builds the latest version of ROOT with RooFit and RooStats in Ubuntu. # First, the ROOT prerequisites are installed, # then, the most common ROOT optional packages are installed. # Next, the latest version of ROOT in the CERN Subversion repository is checked out. # Finally, ROOT is compiled. # Install ROOT prerequisites. sudo apt-get install subversion sudo apt-get install make sudo apt-get install g++ sudo apt-get install gcc sudo apt-get install binutils sudo apt-get install libx11-dev sudo apt-get install libxpm-dev sudo apt-get install libxft-dev sudo apt-get install libxext-dev # Install optional ROOT packages. sudo apt-get install gfortran sudo apt-get install ncurses-dev sudo apt-get install libpcre3-dev sudo apt-get install xlibmesa-glu-dev sudo apt-get install libglew1.5-dev sudo apt-get install libftgl-dev sudo apt-get install libmysqlclient-dev sudo apt-get install libfftw3-dev sudo apt-get install cfitsio-dev sudo apt-get install graphviz-dev sudo apt-get install libavahi-compat-libdnssd-dev sudo apt-get install libldap-dev sudo apt-get install python-dev sudo apt-get install libxml2-dev sudo apt-get install libssl-dev sudo apt-get install libgsl0-dev # Check out the latest ROOT trunk. svn co http://root.cern.ch/svn/root/trunk /usr/local/root # Configure the build. cd /usr/local/root # Configure for the system architecture and configure to build the libRooFit advanced fitting package as part of the compilation. ./configure linuxx8664gcc --enable-roofit # See other possible configurations using the following command: ./configure --help # Compile. make

On a MacBook Pro 7, 1 running Ubuntu 11.04, the compilation takes ~ 1 hour. Following the compilation, the ROOT environment variables can be set up. In Ubuntu or Scientific Linux, the following lines could be added to the ~/.bashrc file:

export ROOTSYS=/usr/local/root export PATH=$ROOTSYS/bin:$PATH export LD_LIBRARY_PATH=$ROOTSYS/lib/root:$LD_LIBRARY_PATH

Option 3: Build the RooStats branch.

Do this if you want the latest development of RooStats (that has not yet been incorporated into a ROOT version).

The necessary instructions can be found here.

RooFit

General description

The RooFit library provides a toolkit for modelling the expected distribution of events in a physics analysis. Models can be used to perform unbinned maximum likelihood fits, produce plots and generate "toy Monte Carlo" samples for various studies.

The core functionality of RooFit is to enable the modelling of 'event data' distributions, in which each event is a discrete occurrence in time and has one or more measured observables associated with it. Experiments of this nature result in datasets obeying Poisson (or binomial) statistics. The natural modeling language for such distributions is probability density functions (PDFs), F(x;p), that describe the probability density of the distribution of observables x in terms of the function parameter p.

In RooFit, every variable, data point, function and PDF is represented in a C++ object. So, for example, in constructing a RooFit model, the mathematical components of the model map to separate C++ objects. Objects are classified by the data or function type that they represent, not by their respective role in a particular setup. All objects are self-documenting. The name of an object is a unique identifier for the object while the title of an object is a more elaborate description of the object.

Here are a few examples of mathematical concepts that correspond to various RooFit classes:

Mathematical concept	RooFit class
variable	RooRealVar
function	RooAbsReal
PDF	RooAbsPdf
space point (set of parameters)	RooArgSet
list of space points	RooAbsData
integral	RooRealIntegral

Composite functions correspond to composite objects. The ArgSet class is dependent on argument order while the ArgList class is not.

Example code: defining a RooFit variable


   General form for defining a RooFit variable:
      RooRealVar x(<object name>, <object title>, <value>, <minimum value>, <maximum value>)
   Specific example for defining a RooFit variable x with the value 5:
      RooRealVar x("x", "x observable", 5, -10, 10)

RooPlot

A RooPlot is essentially an empty frame that is capable of holding anything plotted verses its variable.

PDFs

One of the things that makes the RooFit PDFs nice and flexible (but perhaps counterintuitive) is that they have no idea what is considered the observable. For example, a Gaussian has x, the mean and sigma while the PDF is always normalised to unity. What is one integrating in order to get 1? For the Gaussian, it is x, by convention; the mean and sigma are parameters of the model. RooFit doesn't know that x is special; x, the mean and sigma are all on equal footing. You can tell RooFit that x is the variable to normalise over.

So, RooGaussian has no intrinsic notion of distinction between observables and parameters. The choice of observables (for unit normalisation) is always passed to gauss.getVal().

What is the value of this? In a nutshell, it allows one to do Bayesian stuff very easily.

Bayes' theorem holds that the probability of b given a is related to the probability of a given b. Normally, one might say that the mean of a Gaussian is 1 and that then gives a distribution for x. However, if one had a dataset for x, one might want to know the posterior for the mean. The RooFit ability to switch what the probability density function is of is very useful for Bayesian stuff. Because RooFit allows this composition, the thing that acts as x in the Gaussian could actually be a function of, say, x^2. A Jacobian factor is picked up in going from x to x^2, so the normalisation no longer makes sense. Sometimes RooFit can figure out what the Jacobian factor should be and sometimes it resorts to numerical integration.

Example code: create a Gaussian PDF using RooStats and plot it using the RooPlot class


   {
      // Build a Gaussian PDF.
         RooRealVar x("x", "x", -10, 10);
         RooRealVar mean("mean", "mean of Gaussian", 0, -10, 10);
         RooRealVar sigma("sigma", "width of Gaussian", 3);
         RooGaussian gauss("gauss", "Gaussian PDF", x, mean, sigma);

      // Plot the PDF.
         RooPlot* xframe = x.frame();
         gauss.plotOn(xframe);
         xframe->Draw();
   }

Example code: telling a RooFit PDF what to normalise over


   Not normalised (i.e., this is not a PDF):
      gauss.getVal();
   Hey, RooFit! This is the thing to normalise over (i.e., guarantees Int[xmin, xmax] Gauss(x, m, s)dx == 1):
      gauss.getVal(x);
   What is the value if sigma is considered the observable? (i.e., guarantees Int[smin, smax] Gauss(x, m, s)ds == 1):

Datasets

General description

A dataset is a collection of points in N-dimensional space. In RooFit, data can be stored in an unbinned or binned manner. Unbinned data is stored using the RooDataSet class while binned data is stored using the RooDataHist class. The user can define how many bins there are in a variable. For the purposes of plotting using the RooPlot class, a RooDataSet is binned into a histogram.

In general, working in RooFit with binned and unbinned data is very similar, as both the RooDataSet (for unbinned data) and RooDataHist (for binned data) classes inherit from a common base class, RooAbsData, which defines the interface for a generic abstract data sample. With few exceptions, all RooFit methods take abstract datasets as input arguments, allowing for the interchangeable use of binned and unbinned data.

RooDataSet (unbinned data)

Example code: generating toy Monte Carlo, storing it as unbinned data and then plotting it


   }
      // Create a RooDataSet and fill it with generated toy Monte Carlo data:
         RooDataSet* myData = gauss.generate(x, 100);
      // Plot the dataset.
         RooPlot* myFrame = x.frame();
         myData.plotOn()(myFrame);
         myFrame.Draw();
   }

Plotting unbinned data is similar to plotting binned data with the exception that one can display it in some preferred binning.

Example code: plotting unbinned data (a RooDataSet) using a specified binning


   RooPlot* myFrame = x.frame() ;
   myData.plotOn(myFrame, Binning(25)) ;
   myFrame->Draw()

Importing data from ROOT trees (how to populate RooDataSets from TTrees)

* to be completed *

RooDataHist (binned data)

Importing data from ROOT TH histogram objects (take a histogram and map it to a binned data set) (how to populate RooDataHists from histograms)

In RooFit, binned data is represented by the RooDataHist class. The contents of a ROOT histogram can be imported into a RooDataHist object. In importing a ROOT histogram, the binning of the original histogram is imported as well. A RooDataHist associates the histogram with a RooFit variable object of type RooRealVar. In this way it is always known what kind of data is stored in the histogram.

In displaying the data, RooFit, by default, shows the 68% confidence interval for Poisson statistics.

Example code: import a ROOT histogram into a RooDataHist (a RooFit binned dataset)


   {
         // Access the file.
            TFile* myFile = new TFile("myFile.root");
         // Load the histogram.
            TH1* myHistogram = (TH1*) myFile->Get("myHistogram");
         // Draw the loaded histogram.
            myHistogram.Draw();

         // Declare an observable x.
            RooRealVar x("x", "x", -1, 2);

         // Create a binned dataset that imports the contents of TH1 and associates its contents to observable 'x'.
            RooDataHist myData("myData", "myData", RooArgList(x), myHistogram);

         // Plot the imported dataset.
            RooPlot* myFrame = x.frame();
            myData.plotOn(myFrame)
            myFrame.Draw()
   }

Fitting

Fitting a model to data

Fitting a model to data can be done in many ways. The most common methods are the χ2 fit and the -log(L) fit. The default fitting method in ROOT is the χ2 method, while the default method in RooFit is the -log(L) method. The -log(L) method is often preferred because it is more robust for low statistics fits and because it can also be performed on unbinned data.

Fitting a PDF to unbinned data

Example code: fit a Gaussian PDF to data


   // Fit gauss to unbinned data
      gauss.fitTo(*myData);

The RooFit workspace

General description

The RooFit "workspace" provides the ability to store the full likelihood model, any desired priors and the minimal data necessary to reproduce the likelihood function in a ROOT file. Thus, the workspace is needed for combinations and has potential for digitally publishing results (PhyStats agreed to publish likelihood functions). The RooFit workspace can be used for this.

Consider a Gaussian PDF. One might create this Gaussian PDF and then import it into a workspace. All of the dependencies of the Gaussian would be drawn in and owned by the workspace (there are no nightmarish ownership problems). Alternatively, one might create simply the Gaussian inside the workspace using the "Workspace Factory".

Example code: using the Workspace Factory to create a Gaussian PDF


   // Create a Gaussian PDF using the Workspace Factory (this is essentially the shorthand for creating a Gaussian).
      RooWorkspace* myWorkspace = new RooWorkspace("myWorkspace");
      myWorkspace->factory("Gaussian::g(x[-5, 5], mu[0], sigma[1]");

What's in the RooFit workspace?

Example code: What's in the workspace?


   // Open the appropriate ROOT file.
      root -l myFile.root
   // Import the workspace.
      myWorkspace = (RooWorkspace*) _file0->Get("myWorkspace");
   // Print the workspace contents.
      myWorkspace.Print();
   // Example printout:

   // variables
   // ---------
   // (x,m,s)
   //
   // p.d.f.s
   // -------
   // RooGaussian::g[ x=x mean=m sigma=s ] = 0
   //
   // datasets
   // --------
   // RooDataSet::d(x)

   // Import the variable saved as x.
      RooRealVar* myVariable = myWorkspace.var("x");
   // Import the PDF saved as g.
      RooAbsPdf* myPDF = myWorkspace.pdf("g");
   // Import the data saved as d.
      RooAbsData* myData = myWorkspace.data("d");
   // Import the ModelConfig saved as m.
      ModelConfig* myModelConfig = (ModelConfig*) myWorkspace.obj("m");

Visual representations of the model/PDF contents

Graphviz

Graphviz consists of a graph description language called the DOT language and a set of tools that can generate and/or process DOT files.

Example code: examining PDFs and creating graphical representations of them


   // Create variables and a PDF using those variables.
      RooRealVar mu("mu", "mu", 150);
      RooRealVar sigma("sigma", "sigma", 5, 0, 20);
      RooGaussian myGaussianPDF("myGaussianPDF", "Gaussian PDF", x, mu, sigma);
   // Create a Graphviz DOT file with a representation of the object tree.
      myGaussianPDF.graphVizTree("myGaussianPDFTree.dot");
      // This produced DOT file can be converted to some graphical representation:
         // Convert the DOT file to a 'top-to-bottom graph' using UNIX commands:
            // dot -Tgif -o myGaussianPDF_top-to-bottom_graph.gif myGaussianPDFTree.dot
         // Convert the DOT file to a 'spring-model graph' using UNIX commands:
            // fdp -Tgif -o myGaussianPDF_spring-model_graph.gif myGaussianPDFTree.dot
   // Print the PDF contents.
      myGaussianPDF.Print();
      // Example output:
         // RooGaussian::G[ x=x mean=mu sigma=sigma ] = 1
         // sigma.Print()
         // RooRealVar::sigma = 5  L(0 - 20) 
   // Print the PDF contents in a detailed manner.
      myGaussianPDF.Print("verbose");
   // Print the PDF contents to stdout.
      myGaussianPDF.Print("t");
      // Example output:
         // 0x166eab0 RooGaussian::G = 1 [Auto] 
         // 0x15f7fe0/V- RooRealVar::x = 150
         // 0x1487090/V- RooRealVar::mu = 150
         // 0x1487bc0/V- RooRealVar::sigma = 5
   // Print the PDF contents to a file.
      myGaussianPDF.printCompactTree("", "myGaussianPDFTree.txt")
      // Example output file contents:
         // 0x166eab0 RooGaussian::G = 1 [Auto] 
         // 0x15f7fe0/V- RooRealVar::x = 150
         // 0x1487090/V- RooRealVar::mu = 150
         // 0x1487bc0/V- RooRealVar::sigma = 5

Model Inspector

The Model Inspector is a GUI for examining the model contained in the RooFit workspace. The function and it's parameters are as follows:

void ModelInspector(const char* infile = "", const char* workspaceName = "combined", const char* modelConfigName = "ModelConfig", const char* dataName = "obsData")

If the workspace(s) were made using hist2workspace, the names have a standard form (as shown above).

Using the Model Inspector

   // Load the macro.
      root -L ModelInspector.C++
   // Run the macro on the appropriate ROOT workspace file.
      ModelInspector("results/my_combined_example_model.root")

The Model Inspector GUI should appear. The GUI consists of a number of plots, corresponding to the various channels in the model, and a few sliders, corresponding to the parameters of interest and the nuisance parameters in the model. The initial plots are based on the values of the parameters in the workspace. There is a little "Fit" button which fits the model to the data points (while also printing the standard terminal output detailing the fitting). After fitting, a yellow band is shown around the best fit model indicating the uncertainty from propagating the uncertainty of the fit through the model. On the plots, there is a red line shown (corresponding to the fit for each of the parameters at their nominal value pushed up by 1 sigma).

How do I get it?

You can get the ModelInspector.C from the Statistics Forum RooStats tools here.

Links

Video illustrating the usage of the Model Inspector

Accessing the RooFit workspace

Example code: accessing the workspace


   // Open the appropriate ROOT file.
      root -l BR5_MSSM_signal90_combined_datastat_model.root
      // Alternatively, you could open the file in a manner such as the following:
         myFileName = "BR5_MSSM_signal90_combined_datastat_model.root"
         TFile *myFile = TFile::Open(myFileName);
   // Import the workspace.
      RooWorkspace* myWorkspace = (RooWorkspace*) _file0->Get("combined");
   // Print the workspace contents.
      myWorkspace->Print();
   // Import the PDF.
      RooAbsPdf* myPDF = myWorkspace->pdf("model_BR5_MSSM_signal90");
   // Import the variable representing the observable.
      RooRealVar* myObservable = myWorkspace->var("obs");
   // Create a RooPlot frame using the imported variable..
      RooPlot* myFrame = myObservable.frame();
   // Plot the PDF on the created RooPlot frame.
      myPDF.plotOn(myFrame);
   // Draw the RooPlot.
      myFrame->Draw();

Example code: accessing both data and PDF from a workspace stored in a file


   // Note that the following code is independent of actual PDF in the file. So, for example, a full Higgs combination could work with identical code.

   // Open a file and import the workspace.
      TFile myFile("myResults.root") ;
      RooWorkspace* myWorkspace = f.Get("myWorkspace") ;
   // Plot the data and PDF
      RooPlot* xframe = w->var("x")->frame() ;
      w->data("d")->plotOn(xframe) ;
      w->pdf("g")->plotOn(xframe) ;
   // Construct a likelihood and profile likelihood
      RooNLLVar nll("nll","nll",*myWorkspace->pdf("g"),*w->data("d")) ;
      RooProfileLL pll("pll","pll", nll,*myWorkspace->var("m")) ;
      RooPlot* myFrame = w->var("m")->frame(-1,1) ;
      pll.plotOn(myFrame) ;
      myFrame->Draw()

Links for RooFit

User's Manual

Tutorials

RooStats

General description

RooStats provides tools for high-level statistics questions in ROOT. It builds on RooFit, which provides basic building blocks for statistical questions.

The main goal is to standardise the interface for major statistical procedures so that they can work on an arbitrary RooFit model and dataset and handle any parameters of interest and nuisance parameters. Another goal is to implement most accepted techniques from frequentist, Bayesian and likelihood based approaches. A further goal is to provide utilities to perform combined measurements.

Example code: create a simple model using the RooFit Workspace Factory. Specify parts of the model using ModelConfig. Create a simple dataset. Complete a confidence interval test using the ProfileLikelihoodCalculator of RooStats


   {
      // In this script, a simple model is created using the Workspace Factory in RooFit.
      // ModelConfig is used to specify the parts of the model necessary for the statistical tools of RooStats.
      // A 95% confidence interval test is run using the ProfileLikelihoodCalculator of RooStats.

      // Define a RooFit random seed in order to produce reproducible results.
         RooRandom::randomGenerator()->SetSeed(271);
   
      // Make a simple model using the Workspace Factory.
         // Create a new workspace.
            RooWorkspace* myWorkspace = new RooWorkspace();
         // Create the PDF G(x|mu,1) and the variables x, mu and sigma in one command using the factory syntax.
            myWorkspace->factory("Gaussian::normal(x[-10,10], mu[-1,1], sigma[1])");
      // Define parameter sets for observables and parameters of interest.
         myWorkspace->defineSet("poi","mu");
         myWorkspace->defineSet("obs","x");
      // Print the workspace contents.
         myWorkspace->Print() ;
      // Specify for the statistical tools the components of the defined model.
         // Create a new ModelConfig.
            ModelConfig* myModelConfig = new ModelConfig("my G(x|mu,1)");
         // Specify the workspace.
            myModelConfig->SetWorkspace(*myWorkspace);
         // Specify the PDF.
            myModelConfig->SetPdf(*myWorkspace->pdf("normal"));
         // Specify the parameters of interest.
            myModelConfig->SetParametersOfInterest(*myWorkspace->set("poi"));
         // Specify the observables.
            myModelConfig->SetObservables(*myWorkspace->set("obs"));
      // Create a toy dataset.
         // Create a toy dataset of 100 measurements of the observables (x).
            RooDataSet* myData = myWorkspace->pdf("normal")->generate(*myWorkspace->set("obs"), 100);
         //myData->print();
      // Use the ProfileLikelihoodCalculator to obtain a 95% confidence interval.
         // Specify the confidence level required.
            double confidenceLevel = 0.95;
         // Create an instance of the ProfileLikelihoodCalculator, specifying the data and the ModelConfig for it.
            ProfileLikelihoodCalculator myProfileLikelihoodCalculator(*myData, *myModelConfig);
         // Set the confidence level.
            myProfileLikelihoodCalculator.SetConfidenceLevel(confidenceLevel);
         // Obtain the resulting interval.
            LikelihoodInterval* myProfileLikelihoodInterval = myProfileLikelihoodCalculator.GetInterval();
         // Use this interval result. In this case, it makes sense to say what the lower and upper limits are.
         // Define the object variables for the purposes of the confidence interval.
            RooRealVar* x = myWorkspace->var("x");
            RooRealVar* mu = myWorkspace->var("mu");
         cout << "The profile likelihood calculator interval is [ "<<
         myProfileLikelihoodInterval->LowerLimit(*mu) << ", " <<
         myProfileLikelihoodInterval->UpperLimit(*mu) << "] " << endl;
         // Set mu equal to zero.
            mu->setVal(0);
         // Is mu in the interval?
            cout << "Is mu = 0 in the interval?" << endl;
            if (myProfileLikelihoodInterval->IsInInterval(*mu) == 1){
               cout << "Yes" << endl;
            } else{
               cout << "No" << endl;
            }
   }

Links for RooStats

Wiki

RooStats User's Guide

Tutorials

E-group for support with ATLAS-sensitive information: atlas-phys-stat-root@cern.ch

E-mail support for software issues, bugs etc.: roostats-development@cern.ch

ModelConfig

The ModelConfig RooStats class encapsulates the configuration of a model to define a particular hypothesis. It is now used extensively by the calculator tools. ModelConfig always contains a reference to an external workspace that manages all of the objects that are a part of the model (PDFs and parameter sets). So, in order to use ModelConfig, the user must specify a workspace pointer before creating the various objects of the model.

HistFactory

General description

The HistFactory can be used to run an analysis without being a RooFit/RooStats expert. In a nutshell, ROOT files containing input histograms are set up and XML configuration files are set up for those input ROOT files. The XML configuration files specify details on the histograms, how RooFit should interpret the information in the files and how histograms are to be used in calculations. A little executable contained in ROOT called hist2workspace is used to import the histograms into a RooFit workspace.

prepareHistFactory

The ROOT release ships with a script called prepareHistFactory in the $ROOTSYS/bin directory. The prepareHistFactory script prepares a working area. Specifically, it creates results/, data/ and config/ directories. It then copies the HistFactorySchema.dtd and example XML files from the ROOT tutorials directory into the config/ directory. It also copies a ROOT file into the data/ directory for use with the example XML configuration files.

HistFactorySchema.dtd

The HistFactorySchema.dtd file, located in $ROOTSYS/etc/, specifies the XML schema. It is typically placed in the config/ directory of a working area together with the top-level XML file and the individual channel XML files. The user should not modify this file. The HistFactorySchema.dtd is commented to specify exactly the meaning of the various options.

hist2workspace

The hist2workspace executable is run using the top-level XML configuration file as an argument in the following manner:

   hist2workspace top_level.xml

XML files

General description

A minimum of two XML files are required for configuration. The "top-level" XML configuration file defines the measurement and contains a list of channels that contribute to this measurement. The "channel" XML configuration files are used to describe each channel in detail. For each contributing channel, there is a separate XML configuration file.

Conventions

The nominal and variational histograms should all have the same normalisation convention. There are a few conventions possible:

Option 1:
- Lumi="XXX" is in the main XML's element, where XXX is in fb^-1.
- Histograms have units of fb/bin.
- Some samples have NormFactors that are all relative to prediction (e.g. 1 is the nominal prediction).

Option 2:
- Lumi="1" is in the main XML's element.
- Histograms are normalized to unity.
- Each sample has a NormFactor that is the expected numbers of events in data.

Option 3:
- Lumi="1" is in the main XML's element.
- Histograms have units of (numbers of events)/bin expected in data.
- Some samples have NormFactors that are all relative to prediction (e.g. 1 is the nominal prediction).
- It's up to you. In the end, the expected number is a product: N=Lumi*BinContent*NormFactor(s). See the user's guide for more precise equations.

Top-Level XML file

General description

This file specifies a top level 'Combination' that is composed of:

several 'Channels', which are described in separate XML configuration files.
several 'Measurements' (corresponding to a full fit of the model), each of which specifies
- Name, a name for this measurement to be used in tables and files.
- Lumi, the integrated luminosity associated with the measurement in picobarns.
- LumiRelErr, the relative error of the luminosity measurement.
- the histogram bins to be used:
  - BinLow specifies the lowest bin number used for the measurement (inclusive).
  - BinHigh specifies the highest bin number used for the measurement (exclusive).
- relative uncertainty on the luminosity.
- what is(/are) the parameter(/s) of interest that will be measured.
  - Use POI to specify this.
- which parameter(/s) should be fixed/floating (e.g., nuisance parameters)
- which type of constriants are desired:
  - ConstraintTerm is used.
  - default: Gaussian
  - supported: Gaussian, Gamma, LogNormal, Uniform
  - RelativeUncertainty
- whether the tool should export the model only and skip the default fit.
  - ExportOnly: if "True", skip the fit (export only the model; don't perform the initial fit).

OutputFilePrefix is a prefix for the output ROOT file to be created.

Mode represents the type of analysis. Use "comb".

ParamSetting allows for the specification of which parameters are fixed. If a parameter is included here, it is neither a nuisance parameter nor a POI, but a fixed parameter of the mode.

Val allows for the specification of the specific value.

Const: this has a specific value, not set here, but where the param is defined.

Specific instructions

There are several parts in the top-level XML file. Initially, the XML schema is specified. The output is specified next. Specifically, the prefix for any output files (ROOT files containing workspaces) is given. Now, the channel XML files are given for each measurement (e.g., signal, background, systematics information) using the Input tag. Next, the details for each channel are defined using the Measurement tag. Which bins to use are specified inclusively using the Measurement tag attributes BinLow and BinHigh. Within the Measurement tag, the POI tag is used to specify the point of interest for the channel, i.e. the test statistic.

Example file: $ROOTSYS/tutorials/histfactory/example.xml

<!DOCTYPE Combination  SYSTEM 'HistFactorySchema.dtd'>

<Combination OutputFilePrefix="./results/example" Mode="comb" >

  <Input>./config/example_channel.xml</Input>

  <Measurement Name="GaussExample" Lumi="1." LumiRelErr="0.1" BinLow="0" BinHigh="2" Mode="comb" >
    <POI>SigXsecOverSM</POI>
    <ParamSetting Const="True">Lumi alpha_syst1</ParamSetting>
    <!-- don't need <ConstraintTerm> default is Gaussian-->
  </Measurement>

  <Measurement Name="GammaExample" Lumi="1." LumiRelErr="0.1" BinLow="0" BinHigh="2" Mode="comb" >
    <POI>SigXsecOverSM</POI>
    <ParamSetting Const="True">Lumi alpha_syst1</ParamSetting>
    <ConstraintTerm Type="Gamma" RelativeUncertainty=".3">syst2</ConstraintTerm>
  </Measurement>

  <Measurement Name="LogNormExample" Lumi="1." LumiRelErr="0.1" BinLow="0" BinHigh="2" Mode="comb" >
    <POI>SigXsecOverSM</POI>
    <ParamSetting Const="True">Lumi alpha_syst1</ParamSetting>
    <ConstraintTerm Type="LogNormal" RelativeUncertainty=".3">syst2</ConstraintTerm>
  </Measurement>

  <Measurement Name="ConstExample" Lumi="1." LumiRelErr="0.1" BinLow="0" BinHigh="2" Mode="comb" ExportOnly="True">
    <POI>SigXsecOverSM</POI>
    <ParamSetting Const="True">Lumi alpha_syst1</ParamSetting>
  </Measurement>

</Combination>

Channel XML files

General description

These files specify for each channel

Name, a name for the channel.
InputFile, the input file in which the histogram can be found. If this is not specified, it must be specified for each sample and data.
HistoPath, the path (within the ROOT file) at which the histogram can be found.
HistoName (optional), the name of the histogram to be used, unless overridden for specific samples and data.
observed data (if absent, the tool will use the expectation, which is useful for expected sensitivity)
several 'Samples' (e.g., signal, bkg1, bkg2 etc.), each of which specifies
- Name, a name
- whether the sample is normalized by theory (e.g., N = L * sigma) or whether the sample is data driven.
- a nominal expectation histogram.
- a named 'Normalization Factor' (which can be fixed or allowed to float in a fit).
- several 'Overall Systematics' in normalization with
  - a name.
  - +/- 1 sigma variations (e.g., 1.05 and 0.95 for a 5% uncertainty).
- several 'Histogram Systematics' in shape with
  - a name (which can be shared with the OverallSyst if correlated).
  - +/- 1 sigma variational histograms.

Specific instructions

First, the channel file specifies the XML schema. Then, the channel is defined and named. The location of the data histogram for the channel is defined. For each background, a Sample is defined. It is specified whether it is normalised to luminosity (i.e., the histograms should be per inverse picobarn and will be scaled). To enable normalisation to luminosity, set the tag attribute NormalizeByTheory to "True". For external normalisation, a data-driven background measurement is fixed to the lumi of the dataset. In this case, set the tag attribute NormalizeByTheory to "False". For the normalisation factor, (e.g., "SigXsecOverSM"), the tag attribute "Name" should match the POI specified in the top level XML configuration file.

Systematic uncertainties

For an overall relative rate systematic, the "OverallSys" tag is used with its appropriate tag attributes. For a shape systematic (a systematic that affects the shape of a histogram), the "HistoSys" tag is used with its appropriate tag attributes. Specifically, for the HistoSys tag attributes "HistoNameHigh" and "HistoNameLow", the respective histograms for the upper and lower shape systematic uncertainties are specified in a manner such as the following:

<HistoSys Name="myShapeSystematic_1" HistoNameHigh="myShapeSystematic_1_high" HistoNameLow="myShapeSystematic_1_low" />

Example file: $ROOTSYS/tutorials/histfactory/example_channel.xml

<!DOCTYPE Channel  SYSTEM 'HistFactorySchema.dtd'>

  <Channel Name="channel1" InputFile="./data/example.root" HistoName="" >
    <Data HistoName="data" HistoPath="" />
    <Sample Name="signal" HistoPath="" HistoName="signal">
      <OverallSys Name="syst1" High="1.05" Low="0.95"/>
      <NormFactor Name="SigXsecOverSM" Val="1" Low="0." High="3." Const="True" />
    </Sample>
    <Sample Name="background1" HistoPath="" NormalizeByTheory="True" HistoName="background1">
      <OverallSys Name="syst2" Low="0.95" High="1.05"/>
    </Sample>
    <Sample Name="background2" HistoPath="" NormalizeByTheory="True" HistoName="background2">
      <OverallSys Name="syst3" Low="0.95" High="1.05"/>
      <!-- <HistoSys Name="syst4" HistoPathHigh="" HistoPathLow="histForSyst4"/>-->
    </Sample>
  </Channel>

Guidance in writing the XML configuration files

Histograms

Units

The units of the input histogram and the luminosity specified in the top-level XML configuration file should be compatible; for example, input histograms might be in pb while the luminosity might be in pb^{-1} or input histograms might be in fb while the luminosity might be in fb^{-1}.

Naming convention

The input histogram names are of no special significance, however, it is often preferable to have a good naming convention devised. One might consider the order in which one has information in the XML and aim to have histogram names appear in a similar order when listed alphabetically.

Generic example
Type of histogram	Histogram naming convention
Phenomenon histogram	<phenomenon name>_m<mass point>
Phenomenon upward systematic histogram	<phenomenon name>_m<mass point>_sys_<systematic name>_up
Phenomenon downward systematic histogram	<phenomenon name>_m<mass point>_sys_<systematic name>_do

The reason for having an "m" character preceding the mass point is that it allows for easy search and replace of the mass point value when automatically producing multiple XML configuration files corresponding to multiple mass points.

Specific example
Type of histogram	Histogram name
ttH histogram	ttH_m110
ttH upward luminosity systematic histogram	ttH_m110_sys_Lumi_up
ttH downward luminosity systematic histogram	ttH_m110_sys_Lumi_do
ttH upward JES systematic histogram	ttH_m110_sys_JES_up
ttH downward JES systematic histogram	ttH_m110_sys_JES_do
WW Herwig 105987 upward luminosity systematic histogram	WW_Herwig_105987_m110_sys_Lumi_up
WW Herwig 105987 downward luminosity systematic histogram	WW_Herwig_105987_m110_sys_Lumi_do
WW Herwig 105987 upward JES systematic histogram	WW_Herwig_105987_m110_sys_JES_up
WW Herwig 105987 downward JES systematic histogram	WW_Herwig_105987_m110_sys_JES_do

Samples

The sample names are of no special significance, however, it is often preferable to have very short names (e.g., signal1) for the purposes of clarity when, for example, printing the contents of the workspace.

The NormalizedByTheory attribute should be "True" (as opposed to "TRUE" or "true") for all non-data-driven backgrounds. If the Data tag is removed, expected data shall be used in calculations.

The signal(s) are specified as such through the use of the POI tag (in the channel XML configuration files).

Example file: ttH_m110_channel.xml

<!DOCTYPE Channel SYSTEM 'HistFactorySchema.dtd'>

<!-- channel name and input file -->
<Channel Name="ttH_m110" InputFile="data/ttH_histograms.root" HistoName="">
   
   <!-- data -->
   <Data HistoName="data" />

   <!-- signal -->
   <Sample Name="signal" HistoName="ttH_m110"
      NormalizeByTheory="False" >
      <NormFactor Name="SigXsecOverSM" Val="1.0" Low="0.0" High="100." Const="True" />
      <!-- systematics: -->
      <HistoSys Name="Lumi"
      HistoNameHigh="ttH_m110_sys_Lumi_up"
      HistoNameLow="ttH_m110_sys_Lumi_do"
      />
      <HistoSys Name="JES"
      HistoNameHigh="ttH_m110_sys_JES_up"
      HistoNameLow="ttH_m110_sys_JES_do"
      />         
      <OverallSys Name="scale_ttbar" High="1.05" Low="0.95"/>
   </Sample>

   <!-- backgrounds -->
   
   <!-- WW_Herwig_105987 -->
   <Sample Name="WW_Herwig_105987" NormalizeByTheory="True" HistoName="WW_Herwig_105987_m110">
   </Sample>

   <!-- Wplusjets -->
   <Sample Name="Wplusjets" NormalizeByTheory="False" HistoName="Wplusjets_m110">
      <!-- systematics -->
      <HistoSys Name="Lumi"
      HistoNameHigh="Wplusjets_m110_sys_Lumi_up"
      HistoNameLow="Wplusjets_m110_sys_Lumi_do"
      />
      <HistoSys Name="JES"
      HistoNameHigh="Wplusjets_m110_sys_JES_up"
      HistoNameLow="Wplusjets_m110_sys_JES_do"
      />   
   </Sample>

</Channel>

Example file: ttH_m110_top-level.xml

<!DOCTYPE Combination SYSTEM 'HistFactorySchema.dtd'>

<!-- workspace output file prefix -->
<Combination OutputFilePrefix="workspaces/ttH_m110_workspace" Mode="comb" >

   <!-- channel XML file(s) -->
   <Input>config/ttH_m110_channel.xml</Input>

   <!-- measurement and bin range -->
   <Measurement Name="datastat" Lumi="1" LumiRelErr="0.037" BinLow="0" BinHigh="21" Mode="comb" ExportOnly="True">
      <POI>SigXsecOverSM</POI>
   </Measurement>

</Combination>

Create XML configuration files automatically

Once the XML configuration files for a certain mass point are written, these files can be used to automatically create further XML configuration files for all remaining mass points.

If the described naming convention is used (i.e., mass points are prefixed with an "m"), the following shell script might be used to create all required XML configuration files for all mass points.

Example file: createXMLFiles.sh

#!/bin/bash

# This script is designed for use with the naming conventions described in the following page:
# https://ppes8.physics.gla.ac.uk/twiki/bin/view/ATLAS/HiggsAnalysisAtATLASUsingRooStats
# The #userInput hashtag is used to indicate places in the script where the user might modify
# things.

createXMLFiles()
# Arguments: originalMass
{
originalMass=$1
newMass=$2
# Set/specify the original XML configuration file names.
originalChannelXMLFileName=ttH_m${originalMass}_channel.xml
originalTopLevelXMLFileName=ttH_m${originalMass}_top-level.xml
# Set/specify the new XML configuration file names.
newChannelXMLFileName=ttH_m${newMass}_channel.xml
newTopLevelXMLFileName=ttH_m${newMass}_top-level.xml
# Duplicate the original XML configuration files while substituting the original mass
# points with the new mass points in the new resulting file.
# Here, the program sed is used to replace all occurrances of a specified pattern in
# a specified file with another specified pattern.
# The "s" correspondes to "substitute".
# The "g" corresponds to "globally" (all instances in a line).
sed "s/m$originalMass/m$newMass/g" $originalChannelXMLFileName > $newChannelXMLFileName
sed "s/m$originalMass/m$newMass/g" $originalTopLevelXMLFileName > $newTopLevelXMLFileName
}

# Specify the original mass point.
originalMass=110 #userInput

# Execute the function for creation of new XML configuration files for all required mass points.
createXMLFiles $originalMass 115 #userInput
createXMLFiles $originalMass 120 #userInput
createXMLFiles $originalMass 125 #userInput
createXMLFiles $originalMass 130 #userInput
createXMLFiles $originalMass 140 #userInput

Links for HistFactory

HistFactory manual

HistFactory XML reference

XML example

Exotics Working Group statistics tutorial XML reference

Exotics Working Group statistics tutorial workspace examples

Analysis!

ATLAS recommends the use of the profile likelihood as a test statistic.

Full significance calculation example, from histogram creation to model production (using hist2workspace), to RooStats calculations

Prepare a working area.

Make the main directory.

cd ~
mkdir test
cd test

Make the directory for the XML configuration files.

mkdir config

Make the directory for the input histogram ROOT files.

mkdir data

Make the directory for the workspace.

mkdir workspaces

Generate input histograms.

Change to the directory for the input histograms.

cd data

Run some code such as the following. For fun, we will create a simulated data signal at about 126 GeV at about three times the number of events of that which was expected.

Example file: make_test_histograms.c

{
   // Create the expected signal histogram.
      // Create the function used to describe the signal shape (a simple Gaussian shape).
         TF1 mySignalFunction("mySignalFunction", "(1/sqrt(2*pi*0.5^2))*2.718^(-(x-126)^2/(2*0.5^2))", 120, 130);
      // Create the histogram with 100 bins between 120 and 130 GeV.
         TH1F mySignal("mySignal", "mySignal", 100, 120, 130);
      // Fill the histogram using the signal function.
         mySignal.FillRandom("mySignalFunction", 10000000);
   // Create the background histogram.
      // Create the function used to describe the signal shape (a simple polynomial of the second order).
         TF1 myBackgroundFunction("myBackgroundFunction", "-2.3*10^(-6)*x^2+0.0007*x+0.4", 120, 130);
      // Create the histogram with 100 bins between 120 and 130 GeV.
         TH1F myBackground("myBackground", "myBackground", 100, 120, 130);
      // Fill the histogram using the background function.
         myBackground.FillRandom("myBackgroundFunction", 100000000000);
   // Create the (simulated) data histogram. This histogram represents what one might have as real data.
      // Create the histogram using by combining the signal histogram multiplied by 3 with the background histogram.
         TH1F myData=3*mySignal+myBackground;
      // Set the name of the histogram.
         myData->SetName("myData");
   // Save the histograms created to a ROOT file.
      TFile myFile("test_histograms.root", "RECREATE");
      mySignal->Write();
      myBackground->Write();
      myData->Write();
      myFile.Close();
}

Create the workspace.

There are two approaches which might be used to create the RooFit workspace. The first method uses the hist2workspace program to run on XML configuration files and input histogram files. The second method explicitly builds the workspace using RooFit code. In general, most should opt for the first, XML-based approach. If you wish to understand the process of specifying the specifics of the models you want to create, you might take a look at the second approach.

Option 1: Create the workspace using hist2workspace and XML configuration files.

Change to the directory for the configuration XML files.

cd ~/test/config

Copy the HistFactory XML schema to the configuration directory.

cp $ROOTSYS/etc/HistFactorySchema.dtd .

Create XML configuration files in the configuration directory.

Example file: test_top-level.xml

<!DOCTYPE Combination SYSTEM 'HistFactorySchema.dtd'>

<!-- workspace output file prefix -->
<Combination OutputFilePrefix="./workspaces/test_workspace" Mode="comb" >

   <!-- channel XML file(s) -->
   <Input>./config/test_channel.xml</Input>

   <!-- measurement and bin range -->
   <Measurement Name="datastat" Lumi="1" LumiRelErr="0.037" BinLow="0" BinHigh="99" Mode="comb" ExportOnly="True">
      <POI>SigXsecOverSM</POI>
   </Measurement>

</Combination>

Example file: test_channel.xml

<!DOCTYPE Channel SYSTEM 'HistFactorySchema.dtd'>

<!-- channel name and input file -->
<Channel Name="test" InputFile="./data/test_histograms.root" HistoName="">
   
   <!-- data -->
   <Data HistoName="myData" />

   <!-- signal -->
   <Sample Name="signal" HistoName="mySignal"
      NormalizeByTheory="True" >
      <NormFactor Name="SigXsecOverSM" Val="1.0" Low="0.0" High="100." Const="True" />
   </Sample>

   <!-- backgrounds -->
   
   <!-- background -->
   <Sample Name="background" NormalizeByTheory="True" HistoName="myBackground">
   </Sample>

</Channel>

As you can see, the model is very simple. There is the expected signal, the expected background and the actual data (in this case, the data is simulated). The point of interest relates to the signal and the expected data will be compared to the actual data.

Run hist2workspace.

hist2workspace config/test_top-level.xml

There should now be created 4 files in the workspaces directory:

test_workspace_combined_datastat_model.root
test_workspace_datastat.root
test_workspace_results.table
test_workspace_test_datastat_model.root

The test_workspace_combined_datastat_model.root file is the ROOT file that contains the combined workspace object (is has constraints, weightings etc. properly incorporated). This is the file you are almost certainly interested in. The test_workspace_test_datastat_model.root file contains a workspace object without proper constraints, weightings etc. I don't know what the other two files are.

Use the ProfileLikelihoodCalculator to calculate the 95% condifence interval on the parameter of interest as specified in the ModelConfig.

Create a C++ program such as the following:

Example file: ProfileLikeliHoodCalculator_confidence_level.cpp

#include <iostream>
#include <fstream>
#include <sstream>
#include <stdio.h>
#include <string.h>
#include <cmath>
#include "TStyle.h"
#include "TROOT.h"
#include "TPluginManager.h"
#include "TSystem.h"
#include "TFile.h"
#include "TGaxis.h"
#include "TCanvas.h"
#include "TH1.h"
#include "TF1.h"
#include "TLine.h"
#include "TSpline.h"
#include "RooAbsData.h"
#include "RooDataHist.h"
#include "RooCategory.h"
#include "RooDataSet.h"
#include "RooRealVar.h"
#include "RooAbsPdf.h"
#include "RooSimultaneous.h"
#include "RooProdPdf.h"
#include "RooNLLVar.h"
#include "RooProfileLL.h"
#include "RooFitResult.h"
#include "RooPlot.h"
#include "RooRandom.h"
#include "RooMinuit.h"
#include "TRandom3.h"
#include "RooWorkspace.h"
#include "RooStats/RooStatsUtils.h"
#include "RooStats/ModelConfig.h"
#include "RooStats/ProfileLikelihoodCalculator.h"
#include "RooStats/LikelihoodInterval.h"
#include "RooStats/LikelihoodIntervalPlot.h"
#include "TStopwatch.h"
using namespace std;
using namespace RooFit;
using namespace RooStats;

int main(){
   // Access the inputs.
      // Open the ROOT workspace file.
         TString myInputFileName = "workspaces/test_workspace_combined_datastat_model.root";
         cout << "Opening file " << myInputFileName << "..." << endl;
         TFile *_file0 = TFile::Open(myInputFileName);
      // Access the workspace.
         cout << "Accessing workspace..." << endl;
         RooWorkspace* myWorkspace = (RooWorkspace*) _file0->Get("combined");
      // Access the ModelConfig
         cout << "Accessing ModelConfig..." << endl;      
         ModelConfig* myModelConfig = (ModelConfig*) myWorkspace->obj("ModelConfig");
      // Access the data.
         cout << "Accessing data..." << endl;
         RooAbsData* myData = myWorkspace->data("obsData");

   // Use the ProfileLikelihoodCalculator to calculate the 95% confidence interval on the parameter of interest as specified in the ModelConfig.
      cout << "Calculating profile likelihood...\n" << endl;
      ProfileLikelihoodCalculator myProfileLikelihood(*myData, *myModelConfig);
      myProfileLikelihood.SetConfidenceLevel(0.95);
      LikelihoodInterval* myConfidenceInterval = myProfileLikelihood.GetInterval();
      // Access the confidence interval on the parameter of interest (POI).
         RooRealVar* myPOI = (RooRealVar*) myModelConfig->GetParametersOfInterest()->first();
   
   // Print the results.
      cout << "Printing results..." << endl;
      // Print the confidence interval on the POI.
         cout << "\n95% confidence interval on the point of interest " << myPOI->GetName()<<": ["<<
            myConfidenceInterval->LowerLimit(*myPOI) << ", "<<
            myConfidenceInterval->UpperLimit(*myPOI) <<"]\n"<<endl;
   return 0;
  }

Compile this code using a Makefile such as the following:

Example file: Makefile

ProfileLikeliHoodCalculator_confidence_level.cpp : ProfileLikeliHoodCalculator_confidence_level.cpp
   g++ -g -O2 -fPIC -Wno-deprecated  -o ProfileLikeliHoodCalculator_confidence_level.cpp ProfileLikeliHoodCalculator_confidence_level.cpp `root-config --cflags --libs --ldflags` -lHistFactory -lXMLParser -lRooStats -lRooFit -lRooFitCore -lThread -lMinuit -lFoam -lHtml -lMathMore -I$ROOTSYS/include -L$ROOTSYS/lib

Compile the code.

make

Results

The end result as displayed in the terminal output is the following:

95% confidence interval on the point of interest SigXsecOverSM: [2.99653, 3.00347]

Further information

Links for ROOT

ROOT User's Guide

-- WilliamBreadenMadden - 2010-10-29

Topic revision: r38 - 2012-01-20 - WilliamBreadenMadden

ATLAS