ln -s setup_glantp.sh /afs/phas.gla.ac.uk/user/a/atlasmgr/physics/GlaNtp/setup_glantp.sh
then set up the environment:
source ./setup_glantp.sh -v 00-00-72 -b /afs/phas.gla.ac.uk/user/a/atlasmgr/physics/GlaNtp/ -s GlaNtp\ Packagev17 -a 17.0.5.5.2
This will set up the environment for working in a v17 release of Athena, and it will make the GlaNtp commands available in the current working environment. Note that an overview of the GlaNtp framework, made in doxygen, may be found here.
#PBS -j oe -m e -M a.gemmell@physics.gla.ac.uk
with your e-mail address; this enables the batch system to send you an e-mail informing you of the completion (successful or otherwise) of your job.
check out the latest version of the code running framework from subversion (check what this is on trac) using the command
svn co https://ppesvn.physics.gla.ac.uk/svn/atlas/NNFitter/tags/NNFitter-00-00-0X
check out a version of the GlaNtp code into your home directory (or set up genemflat_batch_Complete2_SL5.sh to point at someone else's installation of the code). The procedure for how to do this is described in the next section.
ensure you know the ntuple_area variable to be passed in at run-time to genemflat_batch_Complete2_SL5.sh. This will be the directory where the input ntuples are stored.
The BASEBATCHDIR is now set automatically to the working directory when the script is executed.
Set yourself up for access into SVN (using a proxy to access SVN, as described here)
source /data/ppe01/sl5x/x86_64/grid/glite-ui/latest/external/etc/profile.d/grid-env.sh
svn-grid-proxy-init
Create the directory where you want to set up your copy, and get a copy of the setup script. I'm afraid the best place to get this script is the scripts area of the GlaNtp code you're checking out. I am aware of the circularity of getting a script from the package so you can get the package, but that's the way it is. Just download this one file and go from there - you can delete it later when you've got the whole thing. (The code below assumes you're checking out from the trunk. It is generally better to check out a specific tag, but the latest tag and the trunk should be the same, so you should just be able to copy and paste the code below.)
mkdir /home/ahgemmell/GlaNtp
cd /home/ahgemmell/GlaNtp
svn co https://ppesvn.physics.gla.ac.uk/svn/atlas/GlaNtp/trunk/scripts/GlaNtpScript.sh
You then need to set up your environment ready for the validation. This is done with the setup_glantp.sh script, which is available within the NNFitter package. (Yes, I know - another case of getting the code before getting the code...) You run the script (which is also used for debugging the code) with
source setup_glantp.sh
Make a directory to hold the code itself:
mkdir GlaNtpPackage
GlaNtpScript.sh not only checks out and compiles the code, it also then goes and validates it. setup_glantp.sh sets up the environment variables so the validation data can be found.
You now run the script in the parent directory of GlaNtpPackage, specifying whether you want a specific tag (e.g. 00-00-10), or just from the head of the trunk (h) so you're more free to play around with it. It's always a good idea to check out a specific tag, so that whatever you do to the head, you can still run over a valid release.
./GlaNtpScript.sh SVN 00-00-10
This will check out everything, and run a few simple validations - the final output should look like this (i.e. don't be worried that not everything seems to have passed validation!):
HwwFlatFitATLAS Validation succeeded
Done with core tests
Result of UtilBase validation: NOT DONE: NEED
Result of Steer validation: OK
Result of StringStringSet validation: OK
Result of StringIntMap validation: OK
Result of ItemCategoryMap validation: OK
Result of FlatSystematic validation: OK
Result of LJMetValues validation: OK
Result of PhysicsProc validation: OK
Result of FlatNonTriggerableFakeScale validation: OK
Result of FlatProcessInfo validation: OK
Result of PaletteList validation: OK
Result of CutInterface validation: NOT DONE: NEED
Result of NNWeight validation: NOT DONE: NEED
Result of FlatFileMetadata validation: OK
Result of FlatFileMetadataContainer validation: OK
Result of Masks validation: NOT DONE: NEED
Result of FFMetadata validation: OK
Result of RUtil validation: NOT DONE: NEED
Result of HistHolder validation: NOT DONE: NEED
Result of GlaFlatFitCDF validation: OK
Result of GlaFlatFitBigSysTableCDF validation: OK
Result of GlaFlatFitBigSysTableNoScalingCDF validation: OK
Result of GlaFlatFitATLAS validation: OK
Result of FlatTuple validation: OK
Result of FlatReWeight validation: OK
Result of FlatReWeight_global validation: OK
Result of FlatReWeightMVA validation: OK
Result of FlatReWeightMVA_global validation: OK
Result of TreeSpecGenerator validation: OK
Result of FlatAscii validation: OK
Result of FlatAscii_global validation: OK
Result of FlatTRntp validation: OK
GeneralParameter string 1 FlatTupleVar/<variable_name>=<tree>/<variable_name_in_tree>
Also specified are the names of the leaves for the cutmask and invert word - these are global values for a file.
GeneralParameter string 1 CutMaskString=cutMask
GeneralParameter string 1 InvertWordString=invertWord
The structure of Computentp's output is specified by
ListParameter EvInfoTree:1 1 NN_BJetWeight_Jet1:NN_BJetWeight_Jet1/NN_BJetWeight_Jet1
If you want a parameter to be found in the output, it is best to list it here.
ListParameter EvInfoTree:1 1 NN_BJetWeight_Jet1:NN_BJetWeight_Jet1/NN_BJetWeight_Jet1
Currently all information is in the EvInfoTree, which provides event level information. However, future work will involve trying to establish a GlobalInfoTree, which contains information about the entire sample, such as cross-section - this will only need to be loaded once, and saves having to write the same information into the tree repeatedly, and subsequently read it repeatedly.
ListParameter <tag> <onoff> <colon-separated-parameter-list>
<onoff> - specifies whether this parameter will be taken into consideration (1) or ignored (0) - generally this should be set to 1.
ColumnParameter <tag> <sequence> <keyword=doubleValue:keyword=doubleValue...>
The expression <tag>:<sequence> must be unique, e.g.
ColumnParameter File 0 OnOff=0:SorB=0:Process=Data
ColumnParameter File 1 OnOff=1:SorB=0:Process=Fake
where <tag> is the same, but <sequence> is different. The fact that the <sequence> carries meaning is specific to the implementation. Note that all of the values passed from ColumnParameter will eventually be evaluated as Doubles. Where you pass a string (as for 'Process' above), it is not actually passed to the code - such entries are there to make the steering file more easily readable by puny humans, who comprehend the meaning of strings more readily than Doubles.
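The uniqueness rule can be checked before a run with a few lines of Python. This is only a sketch, not part of GlaNtp: the function names are made up for illustration, and values are kept as strings because labels such as Process are not numbers.

```python
def parse_column_parameter(line):
    """Split 'ColumnParameter <tag> <sequence> k=v:k=v' into its parts."""
    keyword, tag, sequence, body = line.split(None, 3)
    if keyword != "ColumnParameter":
        raise ValueError("not a ColumnParameter line: " + line)
    # Values stay as strings here; GlaNtp itself evaluates them as doubles.
    values = dict(item.split("=", 1) for item in body.split(":"))
    return tag, int(sequence), values


def find_duplicate_keys(lines):
    """Return every <tag>:<sequence> key that appears more than once."""
    seen, duplicates = set(), []
    for line in lines:
        tag, sequence, _ = parse_column_parameter(line)
        key = f"{tag}:{sequence}"
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


lines = [
    "ColumnParameter File 0 OnOff=0:SorB=0:Process=Data",
    "ColumnParameter File 1 OnOff=1:SorB=0:Process=Fake",
]
print(find_duplicate_keys(lines))  # an empty list means no clashes
```

Because the tag is taken as the whole second field, a tag that itself contains a colon (e.g. Combine:Lumi) is handled without special treatment.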
0 116102-filter.root FlatPlotter/NNScoreAny_0_0_0 0
The file in general expects each line to contain an integer, two strings and another integer, separated by spaces. If the integers are negative (note that 0 is a valid value, as in the line above), then that line is ignored. Therefore, so long as you are careful to exclude spaces from your strings, and stick to the integer/string/string/integer formula, it is possible to place comments in this file:
-1 -------------------------------------------------------- x -1
-1 ttH x -1
-1 -------------------------------------------------------- x -1
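To make the comment trick concrete, here is a rough Python reader for this file format. It is a sketch only: the function name is invented, and it assumes a line is skipped when either integer is negative (as in the -1 lines above); the real code may apply a different test.

```python
def read_histlist(lines):
    """Keep data lines; skip comment lines marked with negative integers."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # not in the integer/string/string/integer shape
        first, filename, histname, last = fields
        if int(first) < 0 or int(last) < 0:
            continue  # assumed comment convention: negative integers
        entries.append((int(first), filename, histname, int(last)))
    return entries


sample = [
    "-1 -------------------------------------------------------- x -1",
    "-1 ttH x -1",
    "0 116102-filter.root FlatPlotter/NNScoreAny_0_0_0 0",
]
for entry in read_histlist(sample):
    print(entry)
```

Only the last sample line survives; the two -1 lines are treated as comments.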
Process_0_0 TTjj:Semileptonic
Process_1_0 ttH:Semileptonic
Process_2_0 EWK:Semileptonic
Process_3_0 QCD:Semileptonic
GeneralParameter string 1 FileString=my_Eventtype
This indicates the leaf in the input file that shows which process the event belongs to. It is the same number as we specify later in genemflat for steerComputentp.txt; it does not have to be consistent with the process numbers as defined in atlastth_histlist_flat-v15.txt, AtlasttHRealTitles.txt and FlatAtlastthPhysicsProc1.txt.
ColumnParameter File 1 OnOff=1:SorB=1:Process=tth
The number before the switches (OnOff, SorB, etc - in this case it is 1) corresponds to the number given in AtlasttHRealTitles.txt. The other numbers are self-explanatory - they establish whether that file is to be used, whether it is signal or background (1=signal, 0=background) and the name of the process. In this instance, the Process name is just a comment for your own elucidation - it is not used in the code, so does not necessarily have to correspond to the process names as provided in AtlasttHRealTitles.txt (though of course it is useful for them to be similar). The other file produced by genemflat that specifies the input files for Computentp is steerComputentp.txt
# Specify the known metadata
ListParameter SignalProcessList 1 Alistair_tth
ListParameter Process:Alistair_tth 1 Filename:${ntuple_area}/ttH-v15.root:File:${mh}:IntLumi:1.0
This is just a list of the various input files, and we specify the integrated luminosity. The 'File' parameter is only used for book-keeping by Computentp, and does not have to correspond to the file numbers used in the ANN steering files (or to my_Eventtype), but for sanity's sake it is probably best to keep things consistent. We make an exception for the signal - we assign it the number ${mh} - so that we can keep track of things if we have different mass Higgs in our signals.
# Map of input file name to output file name: The ComputentpOutput will have a sed used to get the right mapping.
ListParameter InputOutputMapName:1 1 ${ntuple_area}/ttH-v15.root:${Computentpoutput}/tth_NNinput.root
Each InputOutputMapName entry is numbered with an integer - this doesn't have to bear any relevance to any numbers that have gone before - just give each output a unique number. This is followed by the mapping from the input file name provided to the output name that Computentp will produce.
ColumnParameter BackgroundList 0 tt0j=0
ColumnParameter SignalList 1 ttH=1
ColumnParameter DataList 1 Data=11
Here you specify once again the numbers assigned to the processes by my_Eventtype (for tt0j it equals zero), and list things as BackgroundList, SignalList or DataList. The number after 'BackgroundList' or 'SignalList' is unique for each process (to preserve the uniqueness of <tag>:<sequence>), and it must also be sequential, running from 0 to n-1 (where you have n samples) - apart from for DataList entries (as shown above). It does not need to correspond to my_Eventtype; however, for completeness' sake within this file I have set it as such. The number at the end of this declaration (tt0j=0 in this case) needs to be sequential - it instructs the net of the order in which to process the samples, so it must go from 0 to n-1 (when you have n samples). It must match up with the numbers provided in atlastth_histlist_flat-v16.txt and AtlasttHRealTitles.txt so that processes and data can be matched to the various individual files.
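The "0 to n-1 with no gaps" requirement is easy to get wrong when samples are added or removed, so a quick check along the following lines may be useful. It is a sketch under assumptions: the function name is invented and the dictionary is a made-up three-sample excerpt, not the contents of a real steering file.

```python
def is_sequential(indices):
    """True when the indices are exactly 0, 1, ..., n-1 in some order."""
    indices = list(indices)
    return sorted(indices) == list(range(len(indices)))


# Hypothetical excerpt: the final number of each declaration, keyed by process.
process_index = {"tt0j": 0, "ttH": 1, "Data": 2}
print(is_sequential(process_index.values()))

# A gap (here index 1 is missing) is reported as a failure.
print(is_sequential([0, 2, 3]))
```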
ColumnParameter PseudoDataList 0 tt0j=0
This is simply a restatement of the BackgroundList (as we're looking for exclusion, the pseudodata is background only) - the same numbers in the same place. This list specifies the processes included in the pseudoexperiments, and therefore the signal process is not included in this list.
ListParameter ProcessLabels:1 1 tt0j:t#bar{t}0j
The number after ProcessLabels again doesn't correspond to my_Eventtype - I have made it the same as the number after BackgroundList/SignalList and PseudoDataList. The important feature of this entry is that it tells the ANN how to label each of the various processes in the results plots. The numbers must run from 1 to n.
ColumnParameter UCSDPalette 0 tt0j=19
ColumnParameter PrimaryColorPalette 0 tt0j=0
These two parameters specify the colours used in the plotting for each of the processes (the numbers correspond to those in the Color Wheel of TColor). The numbers after UCSDPalette and PrimaryColorPalette are the same ones as have been used previously in this file. Whether the plotting uses the colours stated in UCSDPalette or PrimaryColorPalette is determined in the file flatsteerStackNNAtlas.txt by setting the parameter:
GeneralParameter string 1 Palette=UCSDPalette
The final parameter to be set in FlatAtlastthPhysicsProc1.txt is:
ColumnParameter ProcessOrder 0 tt0j=0
Once again, the number on its own (in this case 0) is the same as the other such instances in this file. The final number (zero in this case) is the order in which this process should be plotted - i.e. in this case, the tt0j sample will be plotted first in the output, with the other samples piled on top of it. This number obviously does not need to correspond to my_Eventtype.
ColumnParameter Combine:Lumi 0 OnOff=1:Low=-0.11:High=0.11:Channel=1:Process=TTjj
The <sequence> parameter (in this case '0') is there so that you can specify the parameters for a given error for multiple channels, without falling foul of the uniqueness requirement for <tag>:<sequence>. We have chosen it so that it equals my_Eventtype for that process. 'Channel' is present just in case you're considering multiple channels. We're only considering the one channel in this case (SemiLeptonic). The final parameter (Process) is not actually used - the second parameter tells the ANN which errors are which, but this isn't very easily read by you, so feel free to add it in to help you keep track of the various errors! These final few parameters can be placed in any order, so long as they are separated by colons.
GeneralParameter int 1 NEvent=20000000
GeneralParameter int 0 FirstEvent=1
GeneralParameter int 0 LastEvent=10
FirstEvent and LastEvent allow you to specify a range of events to run over - this is liable only to be useful during debugging. (Note that these parameters are currently turned off.) NEvent gives the maximum number of events processed for any given sample - take care with this if you are running a particularly large sample through the code.
ListParameter EvInfoTree:1 1 my_NN_BJetWeight_Jet1:my_NN_BJetWeight_Jet1/my_NN_BJetWeight_Jet1
This information must be provided for every variable you're interested in in any way. It provides the variable name, and a map to that variable name from the input tree. Note that the number after EvInfoTree must be unique for each entry (EvInfoTree:2, EvInfoTree:3, etc).
ListParameter SpecifyVariable:my_NN_BJetWeight_Jet1 1 Type:double
This is another compulsory piece of information for GlaNtp - telling it which tree the information is in (event or global) and the variable type (double in this case).
ColumnParameter SpecifyHist:my_NN_BJetWeight_Jet1 0 OnOff=1:Min=-5:Max=10:NBin=25
This is just for the plotting scripts (but if you're training on variables, you will probably want them plotted as well). The number after the SpecifyHist string (in this case 0) needs to be different for each entry. OnOff decides whether the variable is to be plotted or not, and must be specified. Min and Max specify the range of the x-axis (for energy / mass, this is in units of MeV), and default to 0 and 200 respectively if not specified. NBin specifies the number of bins in the histogram, with a default of 50.
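As an illustration of how the defaults combine with what you specify, the following Python sketch fills in Min, Max and NBin when they are left out and works out the resulting bin width. The function names are invented for this example and the real plotting code may handle missing keys differently.

```python
# Defaults quoted in the text above: Min=0, Max=200, NBin=50.
HIST_DEFAULTS = {"Min": 0.0, "Max": 200.0, "NBin": 50.0}


def parse_specify_hist(body):
    """Turn 'OnOff=1:Min=-5:...' into a dict, filling in the defaults."""
    values = {}
    for item in body.split(":"):
        key, value = item.split("=", 1)
        values[key] = float(value)  # everything is evaluated as a double
    if "OnOff" not in values:
        raise ValueError("OnOff must be specified")
    return {**HIST_DEFAULTS, **values}


def bin_width(hist):
    """Width of one histogram bin on the x-axis."""
    return (hist["Max"] - hist["Min"]) / hist["NBin"]


hist = parse_specify_hist("OnOff=1:Min=-5:Max=10:NBin=25")
print(hist, bin_width(hist))
print(parse_specify_hist("OnOff=1"))  # falls back to 0, 200 and 50
```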
ListParameter DiscrToLabel:7 1 my_NN_BJet12_M:M^{BJet}_{12}\(MeV/c^{2}),MeV/c^{2}
The number following DiscrToLabel must match that given in VariableTreeToNTPATLASttHSemiLeptonic-v16.txt. Then comes the real variable name - the name that the code deals with. Following the colon is the x-axis label, written in LaTeX style formatting. The backslash denotes a space in the axis label (the parameter must be one long continuous stream). The bit after the comma is optional, but if used specifies the units for the y-axis (e.g. # events per MeV). These labels are written using Root's LaTeX markup.
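The way such an entry breaks down can be sketched in Python as follows. This is an illustration only: the function name is invented, and it assumes the first colon separates the variable name from the label and the first comma separates the label from the optional units.

```python
def parse_discr_to_label(value):
    """Split 'name:label,units' and turn backslashes into spaces."""
    name, _, rest = value.partition(":")
    label, _, units = rest.partition(",")
    # A backslash stands in for a space, since the entry cannot contain one.
    return name, label.replace("\\", " "), units or None


example = r"my_NN_BJet12_M:M^{BJet}_{12}\(MeV/c^{2}),MeV/c^{2}"
name, label, units = parse_discr_to_label(example)
print(name)
print(label)
print(units)
```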
sysfile=FlatSysSetAtlastth.txt
steerfile=FlatFitSteer.txt
mkdir -p templates/fit
rm -f templates/fit/out_${mh}.log
Fit ${basehistlistname} ${template_area}/ \$sysfile \$steerfile $mh > templates/fit/out_${mh}.log
The final call is rendered in the actual job file (e.g. run114) as
Fit /home/ahgemmell/NNFitter-00-00-09-Edited/NNTraining/atlastth_histlist_flat-v15.txt templates/tth120/ $sysfile $steerfile 120 > templates/fit/out_120.log
If you want to save time (by not having to run templating for every error you wish to consider), you can instead consider only the rate uncertainties, and provide these as fractional changes to the rate, specified in FlatSysSetAtlastth1.txt. Whether or not you consider shape uncertainties is controlled by a couple of parameters in the steering file FlatFitSteer.txt (which is created by the action of genemflat_batch_Complete2_SL5.sh)
GeneralParameter bool 1 UseShape=0
GeneralParameter bool 1 UseShapeMean=0
Setting UseShape=1 means shape uncertainties will be taken into account for all the uncertainties for which you provide the extra steering files and ANN scores. UseShapeMean=1 means that the ANN results for your various uncertainties will be used to produce the rate uncertainties based on their integrals, rather than on the numbers provided in FlatSysSetAtlastth1.txt. Using the relative sizes of the integrals of the ANN output as an estimator of the rate uncertainty can be useful if you don't want to be subject to statistical variations in the computation of your systematic uncertainties (if UseShapeMean=0, the systematic rate uncertainty is calculated as a fractional change on the nominal rate). Considering shape uncertainties requires more steering files, and this will be detailed later.
ColumnParameter Combine:Lumi 0 OnOff=1:Low=-0.11:High=0.11:Channel=1:Process=TTjj
The first parameter consists of two parts in this example: 'Combine' and 'Lumi'. The second part is the name of the uncertainty being considered. The first part 'Combine' (and the associated colon between them) is optional. It tells the ANN that the uncertainties thus labelled are independent of each other, and can be added in quadrature. 'OnOff' obviously tells the ANN to consider that uncertainty (1) or not (0). 'Low' and 'High' establish the relevant bounds of the uncertainty as fractions of the total (however, for the ANN these uncertainties are symmetrised, so to save time they are here assumed to be symmetric unless elsewhere stated) - note that these are not the uncertainties on the quantity, but rather the effect of that uncertainty on the rate of your process. Process is not actually read by the ANN, but is there to make the whole thing more human-friendly to read. The current errors, and their bounds, are below. If no source for these error bounds is given, then they were the defaults found in the files from time immemorial (where necessary I assumed that all tt + X errors were the same, as were all ttbb (QCD) errors, since in the original files the only samples considered were ttjj, ttbb(EWK), ttbb(QCD) and ttH - these errors probably originate from the CSC note). If you are only considering rate uncertainties, this is where the fitting code will find the relevant numbers.
ListParameter SysInfoToSysMap:1 1 Combine:LumiTrigLepID
The number in the <tag> after SysInfoToSysMap is unique for each error (in this case it goes from one to eight). There is one entry per error considered, apart from the cases where the errors are combined in quadrature (as specified in FlatSysSetAtlastth1.txt), where they are given one entry to share between them. The <colon-separated-parameter-list> provides a map between the name of the errors as considered by FlatSysSetAtlastth1.txt (the errors combined in quadrature are lumped together under the name 'Combine'), and something more human-readable. The human-readable names are what will be written out by the fitting code (which identifies each error based on numbers, rather than the names in FlatSysSetAtlastth1.txt) when it is producing its logfile. Obviously there is often not much change between the two names, apart from in the case of Combined errors.
template_area=templates/${process}${mh}
0 116102-filter.root FlatPlotter/NNScoreAny_0_0_0 0
could become:
template_area=${MAINDIR}
0 run1/templates/tth120/116102-filter.root FlatPlotter/NNScoreAny_0_0_0 0
This ensures that atlastth_histlist_flat-v15.txt will still point toward the ANN templates from the nominal run. You must now create additional steering files to point toward the high and low error ANN templates - their names are of the format:
"ShapePos_"+errorname+"_"+HistOutput
"ShapeNeg_"+errorname+"_"+HistOutput
where HistOutput is atlastth_histlist_flat-v15.txt and errorname is the human-readable error name, as defined in SysNamesAtlastth1.txt. You also need to change the ${basehistlistname} in the call to the fitting code so that it points directly at atlastth_histlist_flat-v15.txt, with no preceding directory structure - the code bases the names of the two extra shape steering files on this argument, and will not take into account any directories in the argument. (So if ${basehistlistname} was directory/file.txt, the fitting code would look for the extra steering files with the name ShapePos_ISR_directory/file.txt in the case of ISR being our error.)
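The naming rule, and the reason a directory in the argument causes trouble, can be seen in a few lines of Python. This is a sketch that simply follows the string concatenation quoted above; the function name is invented.

```python
def shape_steering_names(errorname, hist_output):
    """Build the two shape steering file names by plain concatenation."""
    return (
        "ShapePos_" + errorname + "_" + hist_output,
        "ShapeNeg_" + errorname + "_" + hist_output,
    )


# With a bare file name the result is a sensible file in the working directory.
print(shape_steering_names("ISR", "atlastth_histlist_flat-v15.txt"))

# With a directory in the argument the prefix lands in front of the directory.
print(shape_steering_names("ISR", "directory/file.txt"))
```

The second call shows the prefix being attached in front of the directory part, which is why the argument should carry no directory structure.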
GeneralParameter string 1 FlatTupleVar/cutWord=my_GoodJets_N/my_GoodJets_N
This sets the variable we wish to use in our filter - it interfaces with the cutMask and invertWord as specified in TreeSpecATLAStth.txt. Note that depending on the number of jets you wish to run your analysis on (set as a command line argument during the running of the script), this is edited by genemflat.
ListParameter SpecifyVariable:Higgs:cutMask 1 Type:int:Default:3
ListParameter SpecifyVariable:Higgs:invertWord 1 Type:int:Default:0
InvertWord is used to invert the relevant bits (in this case no bits are inverted) before the cut from cutMask is applied. The cutMask tells the filter which bits we care about (we use a binary filter). So, for example, if cutMask is set to 6 (110 in binary), we are telling the filter that we wish the second and third bit to be equal to one in cutWord - we don't care about the first bit. It is possible to specify multiple options of the cutMask and invertWord in the same file, distinguished by the word after SpecifyVariable (in this case Higgs). Which ones are used is determined by teststeerFlatReaderATLAStthSemileptonic-v16.txt.
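One way to picture the filter is the Python sketch below. It is an assumption about the logic based on the description above (flip the invertWord bits, then require every cutMask bit to be set), not a copy of the GlaNtp implementation, and the function name is invented.

```python
def passes_mask(cut_word, cut_mask, invert_word=0):
    """Assumed logic: invert the chosen bits, then require all masked bits to be 1."""
    return ((cut_word ^ invert_word) & cut_mask) == cut_mask


# cutMask = 6 (binary 110): the second and third bits must be set.
print(passes_mask(0b110, cut_mask=6))  # both required bits set
print(passes_mask(0b111, cut_mask=6))  # the first bit is not looked at
print(passes_mask(0b010, cut_mask=6))  # third bit missing

# Inverting the second bit first lets a word with that bit unset through.
print(passes_mask(0b100, cut_mask=6, invert_word=0b010))
```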
GeneralParameter string 1 Constraint=(my_failEvent&3)==3
This controls the events used in the training, using a bitwise comparison. If the constraint is true (i.e. the first two bits are set, and not equal to zero), then the event is used for training. This filter is not used currently, as training of the net takes place based on the Computentp output - this Computentp output only contains sensible states (as specified in the TreeSpecATLAStth.txt file's filter). If further filtering is required, then care must be taken to ensure that my_failEvent (or whatever you wish to base your filter on) is specified in the VariableTreeToNTP file, so that Computentp will copy it into its output. **If USEHILOSB is set to 1 then && must be appended to cut criteria, e.g. GeneralParameter string 1 Constraint=(my_failEvent&65536)==0&&. This is because USEHILOSB adds more constraints.**
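The bitwise comparison in the Constraint can be tried out in Python, which uses the same & and == operators. This is just a worked illustration with an invented function name; the steering file expression itself is evaluated by the training code, not by Python.

```python
def passes_constraint(my_fail_event, mask, expected):
    """Evaluate a constraint of the form (my_failEvent & mask) == expected."""
    return (my_fail_event & mask) == expected


# (my_failEvent&3)==3 : both of the two lowest bits must be set.
print(passes_constraint(3, mask=3, expected=3))
print(passes_constraint(1, mask=3, expected=3))
print(passes_constraint(7, mask=3, expected=3))

# (my_failEvent&65536)==0 : bit 16 (65536 is 1 << 16) must NOT be set.
print(passes_constraint(65536, mask=65536, expected=0))
print(passes_constraint(131072, mask=65536, expected=0))
```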
GeneralParameter string 1 ControlRegion=Higgs
Specifies which cutMask and invertWord are to be used from TreeSpecATLAStth_global.txt. This is changed at runtime with one of the parameters
./genemflat_batch_Complete2_SL5.sh 12 400 1.04 tth 120 120 6 Higgs 00-00-45 /data/atlas07/stdenis/v16-r13/bjet2 agemmell@cern.ch srv001 ahgemmell ppepc23.physics.gla.ac.uk
These options denote:
12 is the run number
400 is the jobstart - this is a potentially redundant parameter to do with the PBS queue.
1.04 is the luminosity that will be normalised to (in fb^-1).
tth is the process type - aim to develop this to incorporate other processes, e.g. lbb
120 is the min. Higgs mass
120 is the max. Higgs mass
6 is the number of jets in the events you want to run over (i.e. this is an exclusive 6 jet analysis - events with 7 jets are excluded)
Higgs controls which cutMask and invertWord you wish to use, as specified in TreeSpecATLAStth-v16_global.txt. Current options are 'Higgs' and 'NoCuts'
00-00-45 is the release of GlaNtp that you are using for your run
/data/atlas07/stdenis/v16-r13/bjet2 is the directory where the input ntuples are located (having my_failEvent bits set: 65536 for >0 sensible states and 131072 for 4 tight b-tagged jets)
agemmell@cern.ch is your email address, so the batch system can let you know when the jobs are done
srv001 is the Neurobayes server you want to run (if you're running a TMVA run, this is less important). There are 10 servers, 001-010. Servers 001-005 are on ppepc23, servers 006-010 are on ppepc39. This is related to the last argument you can pass to the script.
ahgemmell is your Glasgow username, used for Neurobayes servers
ppepc23.physics.gla.ac.uk is the machine your Neurobayes server is located on. If you don't provide this argument, it defaults to ppepc23.
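With fourteen positional arguments it is easy to lose track of which is which, so a small Python helper that labels them may be handy when preparing a run. This is a sketch only: the names in ARGUMENT_NAMES are invented for illustration (they are not the variable names used inside the script), and the default host follows the description of the last argument above.

```python
# Hypothetical labels for the positional arguments, in the order listed above.
ARGUMENT_NAMES = [
    "run", "jobstart", "lumi", "process", "mh_min", "mh_max", "njets",
    "control_region", "glantp_release", "ntuple_area", "email",
    "nb_server", "username", "nb_host",
]


def label_arguments(argv, default_host="ppepc23"):
    """Pair each positional argument with a label, defaulting the last one."""
    args = list(argv)
    if len(args) == len(ARGUMENT_NAMES) - 1:
        args.append(default_host)  # last argument is optional
    if len(args) != len(ARGUMENT_NAMES):
        raise ValueError(f"expected 13 or 14 arguments, got {len(argv)}")
    return dict(zip(ARGUMENT_NAMES, args))


command = ("12 400 1.04 tth 120 120 6 Higgs 00-00-45 "
           "/data/atlas07/stdenis/v16-r13/bjet2 agemmell@cern.ch "
           "srv001 ahgemmell ppepc23.physics.gla.ac.uk")
for label, value in label_arguments(command.split()).items():
    print(f"{label:15s} {value}")
```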
Creates a run12 subdirectory in working directory and makes it the working directory
Creates TMVAsteer.txt - writes fitting parameters to it
NN structure is set ( H6AONN5MEMLP MLP 1 H:!V:NCycles=1000:HiddenLayers=N+1,N:RandomSeed=9876543). This line sets up two hidden layers with N+1 and N neurons respectively (where N is the number of input variables).
Training cycles (1000) and hidden layers - N+1?
4 text steer files are copied into the run directory for templating
2 text steer files are copied into the run directory for stacking plots.
2 lines of text are appended to a temporary copy of flatsteerStackNNAtlas.txt: GeneralParameter string 1 HWW=tth-TMVA
GeneralParameter double 1 IntLumi=${lumi}
jetmin/jetmax - These seem to be redundant. Commented out, effective as of v.3
zmin/zmax (1/2) - what is their function?
weighting = TrainWeight - is this redundant?
TMVAvarset.txt - input variable set
# Flags to limit the scope of the run if desired
Computentps=1
DoTraining=1
ComputeTMVA=1
DoTemplates=1
DoStackedPlots=1
DoFit=1
These control whether or not various parts of the code are run - the names of the flags are pretty self-explanatory about what parts of the code they control. For example, it is possible to omit the training in subsequent (templating) runs, if it has previously been done. This shortens the run time significantly. ***NOTE*** The flags DoTraining and DoTemplates had previously (until release 00-00-21) been set on the command line. They were moved from the command line when the other flags were introduced. If you wish the fit to be run using data and not pseudodata, then the flag is set in FlatFitSteer.txt, which is created in genemflat:
GeneralParameter bool 1 PseudoData=1
If this flag is set to 1 then pseudodata is used; 0 causes data to be used.
GeneralParameter bool 1 LoadGlobalOnEachEvent=0
Determines if you have a separate global tree or not. If you do not, set this equal to one, and the relevant global values will be read out anew for each event from the event tree.
ParamInfo: File:0 OnOff : 1 Process : 2.19001e-314 SorB : 0
These reflect the parameters as set in TMVAsteer.txt (created via genemflat_batch_Complete2_SL5.sh). Note that the number following 'Process' is nonsense (and in later releases of GlaNtp is not present) - that parameter is there in the steering file simply to make it more human-readable. However, the code still tries to read it in, but can only handle doubles - the net result varies from run to run, but can always be safely ignored.
****** FlatReader Info Start ******
Entries examined : 2884
Events:00 Seen : 2884
Events:00 Seen Any : 2884
Events:01 Passing Mask Selection for Higgs : 1800
Events:01 Passing Mask Selection for Higgs Any : 1800
Events:02 Passing DilPairType Selection for ALL : 1800
Events:02 Passing DilPairType Selection for ALL Any : 1800
Events:30 Passing Selection : 1800
Events:30 Passing Selection Any : 1800
Events:40 Passing Bad Lepton Energy Filter : 1800
Events:40 Passing Bad Lepton Energy Filter Any : 1800
Events:9999 Final Selection : 1800
Events:9999 Final Selection Any : 1800
No precomputed weight: Weight Difference not checked : 148336
UPEvents: 00 Seen : 2884
UPEvents: 00 SeenAny : 2884
UPEvents:30 Passing Selection : 1800
UPEvents:30 Passing SelectionAny : 1800
UPWtEvents: 00 Seen : -2115.31
UPWtEvents: 00 SeenAny : -2115.31
UPWtEvents:30 Passing Selection : -1318.36
UPWtEvents:30 Passing SelectionAny : -1318.36
WtEvents:00 Seen : -2115.31
WtEvents:00 Seen Any : -2115.31
WtEvents:01 Passing Mask Selection for Higgs : -1318.36
WtEvents:01 Passing Mask Selection for Higgs Any : -1318.36
WtEvents:02 Passing DilPairType Selection for ALL : -1318.36
WtEvents:02 Passing DilPairType Selection for ALL Any : -1318.36
WtEvents:30 Passing Selection : -1318.36
WtEvents:30 Passing Selection Any : -1318.36
WtEvents:40 Passing Bad Lepton Energy Filter : -1318.36
WtEvents:40 Passing Bad Lepton Energy Filter Any : -1318.36
WtEvents:9999 Final Selection : -1318.36
WtEvents:9999 Final Selection Any : -1318.36
****** FlatReader Info End ******
In 'Events', the first number, the entries examined, is the number of entries in the MC sample being passed to the Neural Net. The Final Selection is the number of entries that make it past your cutmask etc to actually be passed to the Neural Net. The intervening numbers are at this stage rather meaningless. 'UPWtEvents' can be safely ignored in its entirety.
'WtEvents' provides the same numbers as in 'Events', but this time with Scale Weights etc applied - these are now the yields.
Print out Yield
================
Channel & tt & ttH & ttbb & Wlnu & Wbb & Wc & Wcc & st_Wt & st_schan & st_tchan & eFake & Data\\
SemiLeptonic& 2154.475 & 2.598 & 49.313 & 4140.901 & 160.634 & 498.841 & 322.464 & 83.942 & 3.265 & 18.288 & 4069.197 & 10616.000
sum & 2154.475 & 2.598 & 49.313 & 4140.901 & 160.634 & 498.841 & 322.464 & 83.942 & 3.265 & 18.288 & 4069.197 & 10616.000
======================================================
NSig 2.59829 NBkg 11501.3 NData 10616
======================================================
End Print out
Just above this is information on the weighted histograms of the various processes (weighted such that the integral equals the yield).
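As a sanity check, the NBkg figure can be reproduced by adding up the background columns of the yield table by hand. The Python below does this with the numbers from the print-out; it assumes that ttH is the only signal and that Data is not counted as background.

```python
# Yields copied from the SemiLeptonic row of the print-out.
yields = {
    "tt": 2154.475, "ttH": 2.598, "ttbb": 49.313, "Wlnu": 4140.901,
    "Wbb": 160.634, "Wc": 498.841, "Wcc": 322.464, "st_Wt": 83.942,
    "st_schan": 3.265, "st_tchan": 18.288, "eFake": 4069.197,
    "Data": 10616.000,
}

signal = {"ttH"}          # assumption: the only signal process
not_background = signal | {"Data"}

n_sig = sum(value for name, value in yields.items() if name in signal)
n_bkg = sum(value for name, value in yields.items() if name not in not_background)

print(f"NSig {n_sig:.3f}  NBkg {n_bkg:.1f}  NData {yields['Data']:.0f}")
```

The background sum agrees with the NBkg 11501.3 quoted in the log to the printed precision, which is a quick way to confirm that the signal, background and data lists are set up as intended.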
Channel: SemiLeptonic(0) Process: eFake(10) ib= 1 0.88551 1.50073 wgt= 0.88551 wgtE= 1.50073 wgtEsum2= 2.2522
The first number is the bin number being considered. wgt is the weighted integral of that bin and all preceding bins (i.e. the total integral up to that point). Immediately following this is the record of generating the first pseudoexperiment. It lists the weighted contents of each of the bins of a neural net histogram, assuming background only, with Poisson fluctuations. It then gives the integral of this pseudoexperiment:
Pseudodata Integral: 11506
For obvious reasons this should be similar to the projected background yield. Later on, at the start of the fitting we also have the following:
= After fit ==========================================
Parameters fit: 7
Name Value Error
=========== ============ =========
LumiTrigLepID : 0.0140496 0.903865
JES : 0.00324431 0.980996
Met : 0.00131664 0.99884
btag : 0.0197434 0.649849
NLOAccep : -0.0627014 0.727165
pdf : -0.0106981 0.99073
xsec : 0.00620052 0.923289
These values come from a Minuit fit, so should be taken with a pinch of salt. 'Value' compares the result of the pseudoexperiment for each of the various errors with what you told it. E.g. if you said you had 1 fb^-1 for luminosity, but the pseudodata suggested a luminosity of 1.01, then Value would be 0.01 - you are 'out' by 1%. 'Error' says how much of your proposed error you have 'used' - if you say you have a 10% error on your luminosity, but the fit suggests a 1% error, then 'Error' would be 0.10 - you are using 10% of your 'allowed' error. These are calculated against data, seeing how much the various backgrounds are allowed to vary according to the systematics before they are no longer compatible with data.
GeneralParameter bool 1 PlotLikelihood =1
in teststeerFlatReaderATLAStthSemileptonic.txt. The range and number of bins in this plot can be controlled by editing the following switches:
GeneralParameter int 1 LikeliPseudoExpNBin=400
GeneralParameter double 1 LikeliPseudoExpMin=0.
GeneralParameter double 1 LikeliPseudoExpMax=10.
Process Name   File Name   File   Scale   Events   Integral   IntLumi   Alpha
ttjj   /data/atlas09/ahgemmell/NNInputFiles_v16/mergedfilesProcessed/105200-29Aug.root   0   1   613   613   1   0.907015
ttH   /data/atlas09/ahgemmell/NNInputFiles_v16/mergedfilesProcessed/ttH-v16.root   120   1   556   556   1   1
Some of the values are established through steerComputentp.txt in the line
ListParameter Process:ttH 1 Filename:/data/atlas09/ahgemmell/NNInputFiles_v16/mergedfilesProcessed/ttH-v16.root:File:120:IntLumi:1.0
It must also be run on a PBS machine because of the structure of the genemflat_batch_Complete2_SL5.sh file (i.e. PBS commands).
If USEHILOSB is set to 1 then && must be appended to cut criteria, e.g. GeneralParameter string 1 Constraint=(my_failEvent&65536)==0&&
It would be desirable to adapt the code to be able to process different signals, e.g. lbb.
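The example constraint above, (my_failEvent&65536)==0, is a bit test: 65536 is 2^16, so the cut keeps events in which bit 16 of my_failEvent is not set. The Python sketch below illustrates the bit test and the trailing && rule for USEHILOSB. Both helper names are hypothetical, and which selection bit 16 stands for is not stated on this page.

```python
FAIL_BIT_MASK = 65536  # 2**16, i.e. bit 16 of the cut word


def passes_constraint(my_fail_event):
    """Python equivalent of the cut (my_failEvent&65536)==0."""
    return (my_fail_event & FAIL_BIT_MASK) == 0


def with_trailing_and(constraint, use_hilosb):
    """Append '&&' to a cut string when USEHILOSB is 1 (hypothetical helper)."""
    if use_hilosb and not constraint.endswith("&&"):
        return constraint + "&&"
    return constraint


print(passes_constraint(0))       # no bits set, so the event is kept
print(passes_constraint(65536))   # bit 16 set, so the event is rejected
print(with_trailing_and("(my_failEvent&65536)==0", True))
```

Checking for an existing && before appending means the helper can be applied twice without producing a malformed cut string.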
Type source exportscript.sh to export the relevant parameters.
Enter ROOT and type .x runTrainTest.C
./plotTMVA.sh 120 <run> <job>
N.B. This is currently done automatically by genemflat.
source setup_glantp.sh 00-00-32
At any point you can check that a given steering file can be read by GlaNtp by using testSteerrv5.exe, found inside your GlaNtp package:
testSteerrv5.exe <file to be tested>
Another debugging script checks you have defined the processes correctly:
testFlatProcessInforv5.exe FlatAtlastthPhysicsProc1.txt
The output near the end of this is the important bit:
Table of what category each process is falling under
IP : PN : LB : B : D : S : P : O
0 : tt : t#bar{t} : 1 : 0 : 0 : 1 : 0
The meaning of IP still needs confirming (need to ask Rick to remind me), PN is the process name, and LB is the label of the process. B, D and S are whether or not the process is Background, Data or Signal respectively. P is whether or not that process is included in the manufacture of pseudoexperiments, and O is the order in which that process is plotted.
To debug the code further, two things need to be done - first, all the debug switches need to be turned on, and then you need to restrict the number of events to ~10 (for a Computentp run this will still manage to generate a 2 GB log file!). All of these switches are found in teststeerFlatReaderATLAStthSemileptonic.txt (the progenitor for all FlatReader files) and steerComputentp.txt (created by genemflat). The debug switches are:
GeneralParameter bool 1 Debug=0
GeneralParameter bool 1 DebugGlobalInfo=0
GeneralParameter bool 1 DebugEvInfo=0
GeneralParameter int 1 ReportInterval=100
In steerComputentp.txt there is also one additional debug option:
GeneralParameter bool 1 DebugFlatTRntp=1
All the debug switches can be set to one (I'm not sure of the exact effect of each individual switch). The report interval can be adapted depending on how many events are present in your input files and on how large you want your log files to be. To restrict the events, you use:
#
# Loop Control
#
GeneralParameter int 1 NEvent=999999
GeneralParameter int 0 FirstEvent=1
GeneralParameter int 0 LastEvent=10
The easiest option is to set NEvent=10. However, if desired you can run over a specified range by switching off the NEvent switch (changing it to int 0 NEvent) and switching on the other two switches, using them to specify the events you wish to run over. You can then run a subset of a complete run by altering the flags found in genemflat:
# Flags to limit the scope of the run if desired
Computentps=1
DoTraining=0
ComputeTMVA=0
DoTemplates=0
DoStackedPlots=0
DoFit=0
However, sometimes even this does not produce enough information, so there exist a few other options for checking your code. The first option is
runFlatReader FlatReaderATLAStthNoNN.txt /data/atlas09/ahgemmell/NNInputFiles_v16/mergedfilesProcessed/ttH-v16.root
This produces a lot of printout, so be sure to restrict the number of events as described above! An example of part of the output is
****** FlatReader Info Start ******
Entries examined : 13307
Events:00 Seen : 13307
Events:00 Seen Any : 13307
Events:01 Passing Mask Selection for Higgs : 5049
The first number (13307 here) is the number of entries in the input MC passing into the FlatReader (e.g. in our preselection we require at least 6 jets). The second number (5049 here) is the number of entries passing the cutMask (e.g. requiring exactly 6 jets). This is shortly followed by
WtEvents:00 Seen : 6.17741
WtEvents:00 Seen Any : 6.17741
WtEvents:01 Passing Mask Selection for Higgs : 2.59829
These entries correspond to the yields - the numbers of events expected in our specified luminosity. If you want to get more debugging from Computentp, then run it with another argument (it doesn't matter what the argument is - in the example below it's simply 1):
Computentp steerComputentp.txt 1
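The debug switches and event limits described above are edited by hand in the steering files. If you find yourself doing this often, something like the following Python sketch could flip them for you. It is not part of GlaNtp: set_general_parameter is a hypothetical helper, and it assumes every switch sits on its own line in the form "GeneralParameter <type> <flag> <Name>=<value>".

```python
import re


def set_general_parameter(text, name, value=None, enabled=None):
    """Return steering text with one GeneralParameter switch changed.

    value   : new value after '=', or None to leave it alone
    enabled : True/False to set the 1/0 flag, or None to leave it alone
    Raises KeyError if the switch is not present in the text.
    """
    pattern = re.compile(
        r"^(GeneralParameter\s+\w+\s+)([01])(\s+" + re.escape(name) + r"\s*=)(.*)$",
        re.MULTILINE,
    )

    def replace(match):
        flag = match.group(2) if enabled is None else ("1" if enabled else "0")
        new_value = match.group(4) if value is None else str(value)
        return match.group(1) + flag + match.group(3) + new_value

    new_text, count = pattern.subn(replace, text)
    if count == 0:
        raise KeyError(name)
    return new_text


steering = """GeneralParameter bool 1 Debug=0
GeneralParameter bool 1 DebugGlobalInfo=0
GeneralParameter bool 1 DebugEvInfo=0
GeneralParameter int 1 ReportInterval=100
GeneralParameter int 1 NEvent=999999
"""

# Turn on the debug switches and restrict the run to 10 events.
for switch in ("Debug", "DebugGlobalInfo", "DebugEvInfo"):
    steering = set_general_parameter(steering, switch, value=1)
steering = set_general_parameter(steering, "NEvent", value=10)
print(steering)
```

The switch name is matched as a whole word followed by "=", so changing Debug does not touch DebugGlobalInfo or DebugEvInfo. Work on a copy of the steering file, since genemflat regenerates steerComputentp.txt.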
Double Variable: my_NN_BJet12_M not valid and hence saved : 1
Look at VariableTreeToNTPATLASttHSemiLeptonic-v16.txt - are the names of the variables really consistent?
These switches let you specify the range and number of bins in the histogram showing the distribution of the pseudoexperiment exclusions (found in drivetestFlatFitAtlastth.rootUnscaledTemplates.root):
GeneralParameter double 1 LikeliPseudoExpMin=0.
GeneralParameter double 1 LikeliPseudoExpMax=10.
GeneralParameter int 1 LikeliPseudoExpNBin=400
H6AONN5MEMLP MLP 1 !H:!V:NCycles=1000:HiddenLayers=N+1,N:RandomSeed=9876543
If the phrase 'H6AONN5MEMLP' is changed, then this change must also be propagated to the webpage plotter (e-mail from Rick, 1 Mar 2011).
Type | Attachment | History | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|---|
ods | CutWordListv8.ods | r1 | manage | 73.5 K | 2012-05-03 - 10:29 | AdrianBuzatu | CutWordList version 8 |
ods | CutWordListv9.ods | r1 | manage | 73.8 K | 2012-05-04 - 09:17 | RichardStDenis | Cutmasks for tth, WH and ZH version 9 |
eps | Est_12_120.eps | r1 | manage | 16.0 K | 2009-07-24 - 12:06 | GavinKirby | |
txt | FlatStackParams.txt | r1 | manage | 4.9 K | 2012-03-07 - 16:06 | AdrianBuzatu | Description of the parameters that can be set in the plotting control. |
eps | FlatStack_1.eps | r1 | manage | 90.1 K | 2009-07-24 - 11:12 | GavinKirby | |
eps | FlatStack_2.eps | r1 | manage | 109.0 K | 2009-07-24 - 12:06 | GavinKirby | |
pdf | Report-FINAL.pdf | r1 | manage | 202.3 K | 2009-09-29 - 10:47 | ChrisCollins | |
eps | drivetestFlatFitAtlastth.rootSemiLeptonic_lnsb1.eps | r1 | manage | 17.0 K | 2009-07-24 - 12:06 | GavinKirby | |
eps | drivetestFlatFitAtlastth.rootSemiLeptonic_lnsb2.eps | r1 | manage | 16.1 K | 2009-07-24 - 12:06 | GavinKirby | |
eps | score_12_120.eps | r1 | manage | 28.5 K | 2009-07-24 - 12:06 | GavinKirby |