The key feature of a neural network is its ability to be "trained" to recognise patterns in data, allowing highly efficient algorithms to be developed with relative ease. This training is typically done with artificially generated sample data, resulting in an algorithm that is very effective at recognising certain patterns in data sets. The main shortcoming is the danger of "over-training" an ANN, meaning that it becomes overly discriminating and searches across a narrower range of patterns than is desired (one countermeasure, sketched below, is to add extra noise to the training data).
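The noise-injection countermeasure can be illustrated with a minimal numpy sketch. This is not taken from the analysis code; the function name, the relative noise scale and the fake training matrix are purely illustrative.

```python
import numpy as np

def add_training_noise(features, rel_scale=0.05, rng=None):
    """Smear each input variable with Gaussian noise proportional to its spread.

    This is the "extra noise" countermeasure against over-training: the net
    sees slightly jittered copies of the training events, so it cannot latch
    onto overly narrow patterns present only in the artificial sample.
    """
    rng = rng or np.random.default_rng()
    spread = features.std(axis=0)                                    # per-variable spread
    noise = rng.normal(0.0, rel_scale * spread, size=features.shape)
    return features + noise

# Illustrative usage on a fake training matrix (events x input variables)
train_x = np.random.default_rng(1).normal(size=(1000, 8))
smeared_x = add_training_noise(train_x, rel_scale=0.05)
```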
Computentp: Simply running the code as above will result in less than optimal Neural Net training. The training procedure requires equal numbers of events from signal and from background (in this case, half of the signal events are used for training and half for testing). However, the above code will take events from the background and signal samples in proportion to the file sizes, and these proportions are not quite in accordance with the physical ratios. The Neural Net weights the results according to information about the cross-section of the process (and so on) stored in the tree, so the final result is that while the outputs are weighted in a physical fashion, the Net is not trained to the same ratios and is therefore not optimally trained. To solve this problem, Computentp is used to mix together all background and signal samples and to assign TrainWeights to them, so that the events are weighted correctly for the Net's training.
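Computentp itself is a dedicated tool; the numpy sketch below only illustrates the idea behind a TrainWeight. The names physics_weight and is_signal are hypothetical: each event keeps its relative physics weight within its class, but the two classes are rescaled so that signal and background contribute equal total weight to the training.

```python
import numpy as np

def train_weights(physics_weight, is_signal):
    """Sketch of the TrainWeight idea (not the actual Computentp implementation).

    Each event keeps its relative physics weight (cross-section, k-factor,
    filter efficiency, ...), but signal and background are each rescaled so
    that the two classes enter the training with equal total weight.
    """
    physics_weight = np.asarray(physics_weight, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)

    w = np.empty_like(physics_weight)
    for mask in (is_signal, ~is_signal):
        w[mask] = physics_weight[mask] / physics_weight[mask].sum()  # each class sums to 1
    return w

# Illustrative mixed sample: three signal events, four background events
pw  = np.array([0.2, 0.2, 0.2, 1.5, 0.5, 0.5, 0.5])
sig = np.array([1, 1, 1, 0, 0, 0, 0])
print(train_weights(pw, sig))
```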
Preparing samples for the Neural Net
These cross-sections are for the overall process, at √s = 7 TeV.
The ttH sample cross-sections are provided for the overall process; the MC is divided into two samples, with W+ and W- generated independently of one another. These two samples are merged before being put through the ANN (see the sketch below).
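A minimal way to do this merge in Python is sketched below, assuming the two ttH samples live in ROOT files with a common tree; the file and tree names here are hypothetical.

```python
import uproot

# Hypothetical file and tree names for the two ttH samples (W+ and W-).
# Reading them together gives the single merged sample to which the
# overall ttH cross-section applies.
ttH_files = [
    "ttH_Wplus.root:events",   # hypothetical
    "ttH_Wminus.root:events",  # hypothetical
]

merged = uproot.concatenate(ttH_files, library="np")
print({branch: array.shape for branch, array in merged.items()})
```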
The tt samples were initially generated to produce the equivalent of 75 fb-1 of data, based on the LO cross-sections. Taking into account the k-factor of 1.84, this means that all samples now simulate 40.8 fb-1 of data. These samples have also had a generator-level filter applied: most events (especially for tt+0j) are of no interest to us, and we don't want to fill up disk space with them, so we apply filters based on the number of jets etc. The Filter Efficiency is the fraction of events that pass from the general sample into the final simulated sample. To clarify how all the numbers hang together, consider the case of tt+0j. We have simulated 66,911 events, which, as said above, corresponds to 40.8 fb-1 of data. We have a Filter Efficiency of 0.06774, so the full number of events that the complete semi-leptonic sample would contain comes to 987,762 events in 40 fb-1. Divide this by 40 to get the number of events in 1 fb-1 (i.e. the cross-section), and you get 24,694 events per fb-1. Our starting point for our cross-section is 13.18, with a k-factor of 1.84, which gives a cross-section of 24.25, so all the numbers compare with each other pretty favourably. This of course makes getting from the number of sensible state events to the number expected per fb-1 rather easy: simply divide by 40.8. You'll notice that the cross-section includes all the branching ratios already, so we don't need to worry about that.
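The bookkeeping above can be reproduced with a few lines of arithmetic; all numbers are taken from the text (note that the text quotes 40.8 fb-1 but performs the final division by 40, and the sketch follows the text's steps).

```python
# Worked cross-check of the tt+0j numbers quoted above (values from the text).
n_simulated = 66_911      # events in the filtered tt+0j sample
filter_eff  = 0.06774     # generator-level filter efficiency
sigma_lo    = 13.18       # LO cross-section in pb (events per fb-1)
k_factor    = 1.84

unfiltered = n_simulated / filter_eff   # ~987,762 semi-leptonic events before the filter
per_fb     = unfiltered / 40            # ~24,694 events per fb-1
sigma_k    = sigma_lo * k_factor        # 24.25 pb, i.e. ~24,250 events per fb-1

print(f"{unfiltered:,.0f} events before the filter")
print(f"{per_fb:,.0f} events per fb-1 from the sample")
print(f"{sigma_k:.2f} pb from sigma_LO x k-factor")
```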
**IMPORTANT** The Filter Efficiency for these samples was calculated based on a no-pileup sample. The filter is applied at generator level, and one of the things it will cut an event for is having too few jets. However, pileup adds jets, and these are added well after the filter. The net result is that a number of events that failed the filter would have passed had the pileup been added earlier in the process. This means the filter efficiencies (and thus the cross-sections) are incorrect, by an amount that has yet to be determined.