-- WilliamBreadenMadden - 2011-08-03

Higgs analysis at ATLAS using RooStats

This page contains basic information on getting started with Higgs analysis at ATLAS using RooStats.

A note on code and formatting

Example code is generally indented and given highlighting. Explanatory code and text directly relating to it has yellow highlighting while example code that one might run directly in ROOT is given a green-text-on-black-background console style. Scripts are given grey highlighting. So, in a nutshell, code segments are in yellow, while full scripts are in terminal. In code examples given, user-created objects are generally prefaced with "my", for example, "myData" for the purposes of clarity.

What is RooStats?

RooStats is a project to create statistical tools built on top of the RooFit library, which is a data modelling toolkit. It is distributed in ROOT. Specifically, it has been distributed in the ROOT release since version 5.22 (December 2008). The latest version of ROOT is recommended as the RooStats project is developing quickly.

Using the appropriate version of ROOT at Glasgow

There are instructions on how to use the different versions of ROOT at Glasgow here.

Personally acquiring RooStats

There are three main options available for acquiring ROOT with RooStats included.

Option 1: Download the latest ROOT release binaries.

The latest ROOT binaries for various operating systems are accessible here.

Option 2: Build the ROOT trunk from source.

ftp://root.cern.ch/root/root_v5.27.06.source.tar.gz

Follow the appropriate instructions here to build it.

Shell script: building ROOT with RooFit and RooStats

#!/bin/bash # This script builds the latest version of ROOT with RooFit and RooStats in Ubuntu. # First, the ROOT prerequisites are installed, # then, the most common ROOT optional packages are installed. # Next, the latest version of ROOT in the CERN Subversion repository is checked out. # Finally, ROOT is compiled. # Install the ROOT prerequisites. sudo apt-get install subversion sudo apt-get install make sudo apt-get install g++ sudo apt-get install gcc sudo apt-get install binutils sudo apt-get install libx11-dev sudo apt-get install libxpm-dev sudo apt-get install libxft-dev sudo apt-get install libxext-dev # Install the optional ROOT packages. sudo apt-get install gfortran sudo apt-get install ncurses-dev sudo apt-get install libpcre3-dev sudo apt-get install xlibmesa-glu-dev sudo apt-get install libglew1.5-dev sudo apt-get install libftgl-dev sudo apt-get install libmysqlclient-dev sudo apt-get install libfftw3-dev sudo apt-get install cfitsio-dev sudo apt-get install graphviz-dev sudo apt-get install libavahi-compat-libdnssd-dev sudo apt-get install libldap-dev sudo apt-get install python-dev sudo apt-get install libxml2-dev sudo apt-get install libssl-dev sudo apt-get install libgsl0-dev # Check out latest ROOT trunk. svn co http://root.cern.ch/svn/root/trunk ~/root # The configuration for the build is set. cd ~/root # Run this to define the system architecture and to enable building of the libRooFit advanced fitting package: ./configure linuxx8664gcc --enable-roofit # See other possible configurations using the following command: ./configure --help # Start compiling. make # Upon completion, ROOT is run by executing ~/root/bin/root. # The following line could be added to the ~/.bashrc file: # export PATH=$PATH:/home/wbm/root/bin

Option 3: Build the RooStats branch.

Do this if you want the latest development of RooStats (that has not yet been incorporated into a ROOT version).

The necessary instructions can be found here.

RooFit

The RooFit library provides a toolkit for modelling the expected distribution of events in a physics analysis. Models can be used to perform unbinned maximum likelihood fits, produce plots and generate "toy Monte Carlo" samples for various studies.

The core functionality of RooFit is to enable the modelling of 'event data' distributions, where each event is a discrete occurrence in time, and has one or more measured observables associated with it. Experiments of this nature result in datasets obeying Poisson (or binomial) statistics. The natural modeling language for such distributions are probability density functions F(x;p) that describe the probability density the distribution of observables x in terms of function in parameter p.

In RooFit, every variable, data point, function and PDF is represented in a C++ object. So, for example, in constructing a RooFit model, the mathematical components of the model map to separate C++ objects. Objects are classified by the data or function type that they represent, not by their respective role in a particular setup. All objects are self-documenting. The name of an object is a unique identified for the object while the title of an object is a more elaborate description of the object.

Here are a few examples of mathematical concepts that correspond to various RooFit classes:

Mathematical concept RooFit class
variable RooRealVar
function RooAbsReal
PDF RooAbsPdf
space point (set of parameters) RooArgSet
list of space points RooAbsData
integral RooRealIntegral

Example code: defining a RooFit variable

General form for defining a RooFit variable: RooRealVar x(<object name>, <object title>, <value>, <minimum value>, <maximum value>) Specific example for defining a RooFit variable x with the value 5: RooRealVar x("x", "x observable", 5, -10, 10)

A RooPlot is essentially an empty frame that is capable of holding anything plotted verses its variable.

PDFs

One of the things that makes the RooFit PDFs nice and flexible (but perhaps counterintuitive) is that they have no idea what is considered the observable. For example, a Gaussian has x, the mean and sigma while the PDF is always normalised to unity. What is one doing performing the integration on in order to get 1? For the Gaussian, it is x, by convention; the mean and sigma are parameters of teh model. RooFit doesn't know that x is not special; x, the mean and sigma are all on equal footing. You can tell it that x is the variable to normalise over.

So, RooGaussian has no intrinsic notion of distinction between observables and parameters. The choice of observables (for unit normalisation) is always passed to gauss.getVal().

What is the value of this? In a nutshell, it allows one to do Bayesian stuff very easily.

Bayes' theorem holds that the probability of b given a is related to the probability of a given b. Normally, one might say that the mean of a Gaussian is 1 and that then gives a distribution for x. However, if one had a dataset for x, one might want to know the posterior for the mean. The RooFit ability to switch who it is a probability density function of is very useful for Bayesian stuff. Because RooFit allows this composition, the thing that acts as x in the Gaussian could actually be a function of, say, x^2. You picked up a Jacobian factor in going from x to x^2, so the normalisation no longer makes sense. Sometimes RooFit can figure out what the Jacobian factor should be and sometimes it resorts to numerical integration.

Example code: create a Gaussian PDF using RooStats and plot it using the RooPlot class

{ // Build a Gaussian PDF. RooRealVar x("x", "x", -10, 10); RooRealVar mean("mean", "mean of Gaussian", 0, -10, 10); RooRealVar sigma("sigma", "width of Gaussian", 3); RooGaussian gauss("gauss", "Gaussian PDF", x, mean, sigma); // Plot the PDF. RooPlot* xframe = x.frame(); gauss.plotOn(xframe); xframe->Draw(); }

Example code: telling a RooFit PDF what to normalise over

Not normalised (i.e., this is not a PDF): gauss.getVal(); Hey, RooFit! This is the thing to normalise over (i.e., guarantees Int[xmin, xmax] Gauss(x, m, s)dx == 1): gauss.getVal(x); What is the value if sigma is considered the observable? (i.e., guarantees Int[smin, smax] Gauss(x, m, s)ds == 1):

Composite functions correspond to composite objects.

The RooFit workspace

The RooFit "workspace" provides the ability to store the full likelihood model, any desired priors and the minimal data necessary to reproduce the likelihood function in a ROOT file. Thus, the workspace is needed for combinations and has potential for digitally publishing results (PhyStats agreed to publish likelihood functions). The RooFit workspace can be used for this.

One might create a Gaussian PDF and then import it into a workspace. All of the dependencies of the Gaussian shall be drawn in and owned by the workspace. There are no nightmarish ownership problems. Alternatively, one might simply create the Gaussian inside the workspace using the Workspace Factory.

Example code: using the Workspace Factory to create a Gaussian PDF

// Create a Gaussian PDF using the Workspace Factory (this is essentially the shorthand for creating a Gaussian). RooWorkspace* myWorkspace = new RooWorkspace("myWorkspace"); myWorkspace->factory("Gaussian::g(x[-5, 5], mu[0], sigma[1]");

Example code: What's in the workspace?

// Open the appropriate ROOT file. root -l myFile.root // Import the workspace. myWorkspace = (RooWorkspace*) _file0->Get("myWorkspace"); // Print the workspace contents. myWorkspace.Print(); // Example printout: variables --------- (x,m,s) p.d.f.s ------- RooGaussian::g[ x=x mean=m sigma=s ] = 0 datasets -------- RooDataSet::d(x) // Import the variable saved as x. RooRealVar* myVariable = myWorkspace.var("x"); // Import the PDF saved as g. RooAbsPdf* myPDF = myWorkspace.pdf("g"); // Import the data saved as d. RooAbsData* myData = myWorkspace.data("d");

Example code: accessing the workspace

// Open the appropriate ROOT file. root -l BR5_MSSM_signal90_combined_datastat_model.root // Import the workspace. myWorkspace = (RooWorkspace*) _file0->Get("combined"); // Print the workspace contents. myWorkspace->Print(); // Import the PDF. RooAbsPdf* myPDF = myWorkspace->pdf("model_BR5_MSSM_signal90"); // Import the variable representing the observable. RooRealVar* myObservable = myWorkspace->var("obs"); // Create a RooPlot frame using the imported variable.. RooPlot* myFrame = myObservable.frame(); // Plot the PDF on the created RooPlot frame. myPDF.plotOn(myFrame); // Draw the RooPlot. myFrame->Draw();

RooStats

RooStats provides tools for high-level statistics questions in ROOT. It builds on RooFit, which provides basic building blocks for statistical questions.

The main goal is to standardise the interface for major statistical procedures so that they can work on an arbitrary RooFit model and dataset and handle any parameters of interest and nuisance parameters. Another goal is to implement most accepted techniques from frequentist, Bayesian and likelihood based approaches. A further goal is to provide utilities to perform combined measurements.

Datasets

In RooFit, data can be stored in an unbinned or binned manner. Unbinned data is stored using the RooDataSet class while binned data is stored using the RooDataHist class. The user can defined how many bins there are in a variable. For the purposes of plotting using the RooPlot class, a RooDataSet is binned into a histogram.

Example code: generating toy Monte Carlo, storing it as unbinned data and then plotting it

} // Create a RooDataSet and fill it with generated toy Monte Carlo data: RooDataSet* myData = gauss.generate(x, 100); // Plot the dataset. RooPlot* myFrame = x.frame(); myData.plotOn()(myFrame); myFrame.Draw(); }

Importing data (how to populate datasets from histograms and TTrees

Importing data from ROOT trees

Importing data from ROOT TH histogram objects (take a histogram and map it to a binned data set)

Fitting a PDF to unbinned data

Example code: generating toy Monte Carlo, storing it as unbinned data and then plotting it

// Fit gauss to unbinned data gauss.fitTo(*myData);

Example code: create a simple model using the RooFit Workspace Factory. Specify parts of the model using ModelConfig. Create a simple dataset. Complete a confidence interval test using the ProfileLikelihoodCalculator of RooStats

{ // In this script, a simple model is created using the Workspace Factory in RooFit. // ModelConfig is used to specify the parts of the model necessary for the statistical tools of RooStats. // A 95% confidence interval test is run using the ProfileLikelihoodCalculator of RooStats. // Define a RooFit random seed in order to produce reproducible results. RooRandom::randomGenerator()->SetSeed(271); // Make a simple model using the Workspace Factory. // Create a new workspace. RooWorkspace* myWorkspace = new RooWorkspace(); // Create the PDF G(x|mu,1) and the variables x, mu and sigma in one command using the factory syntax. myWorkspace->factory("Gaussian::normal(x[-10,10], mu[-1,1], sigma[1])"); // Define parameter sets for observables and parameters of interest. myWorkspace->defineSet("poi","mu"); myWorkspace->defineSet("obs","x"); // Print the workspace contents. myWorkspace->Print() ; // Specify for the statistical tools the components of the defined model. // Create a new ModelConfig. ModelConfig* myModelConfig = new ModelConfig("my G(x|mu,1)"); // Specify the workspace. myModelConfig->SetWorkspace(*myWorkspace); // Specify the PDF. myModelConfig->SetPdf(*myWorkspace->pdf("normal")); // Specify the parameters of interest. myModelConfig->SetParametersOfInterest(*myWorkspace->set("poi")); // Specify the observables. myModelConfig->SetObservables(*myWorkspace->set("obs")); // Create a toy dataset. // Create a toy dataset of 100 measurements of the observables (x). RooDataSet* myData = myWorkspace->pdf("normal")->generate(*myWorkspace->set("obs"), 100); //myData->print(); // Use the ProfileLikelihoodCalculator to obtain a 95% confidence interval. // Specify the confidence level required. double confidenceLevel = 0.95; // Create an instance of the ProfileLikelihoodCalculator, specifying the data and the ModelConfig for it. ProfileLikelihoodCalculator myProfileLikelihoodCalculator(*myData, *myModelConfig); // Set the confidence level. myProfileLikelihoodCalculator.SetConfidenceLevel(confidenceLevel); // Obtain the resulting interval. LikelihoodInterval* myProfileLikelihoodInterval = myProfileLikelihoodCalculator.GetInterval(); // Use this interval result. In this case, it makes sense to say what the lower and upper limits are. // Define the object variables for the purposes of the confidence interval. RooRealVar* x = myWorkspace->var("x"); RooRealVar* mu = myWorkspace->var("mu"); cout << "The profile likelihood calculator interval is [ "<< myProfileLikelihoodInterval->LowerLimit(*mu) << ", " << myProfileLikelihoodInterval->UpperLimit(*mu) << "] " << endl; // Set mu equal to zero. mu->setVal(0); // Is mu in the interval? cout << "Is mu = 0 in the interval?" << endl; if (myProfileLikelihoodInterval->IsInInterval(*mu) == 1){ cout << "Yes" << endl; } else{ cout << "No" << endl; } }

Further information

RooFit links:

User's Manual

Tutorials

RooStats links:

Wiki

RooStats User's Guide

Tutorials

HistFactory manual

E-group for support with ATLAS-sensitive information: <atlas-phys-stat-root@cern.ch>

E-mail support for software issues, bugs etc.: <roostats-development@cern.ch>

ROOT links:

ROOT User's Guide

-- WilliamBreadenMadden - 2010-10-29

Edit | Attach | Print version | History: r51 | r16 < r15 < r14 < r13 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r14 - 2011-08-05 - WilliamBreadenMadden
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback