E. Santovetti, lesson 2: Monte Carlo methods, Statistical test

Page 1: E. Santovetti, lesson 2, Monte Carlo methods / Statistical test (statistics.roma2.infn.it/~santovet/Downloads/Stat2.pdf)

E. Santovetti

lesson 2

Monte Carlo methods
Statistical test

Page 2

Introduction

Monte Carlo methods simulate the physical process and the detector response in order to better understand the detector performance (geometrical acceptance, efficiency, dead time, etc.) for the desired signal.

A crucial issue to face when we want to simulate any process is to provide a robust and reliable random number generator.

The usual steps are:

1) Generate a sequence r1, r2, …, rm, uniform in [0, 1]

2) Use this to produce another sequence x1, x2, ..., xm, distributed according to some pdf f(x) in which we are interested

3) Use the x values to estimate some property of f (x), e.g., fraction of x values with a < x < b.

Monte Carlo (MC) generated values = simulated data

Page 3

Random number generator

Goal: generate uniformly distributed values in [0, 1].

Building e.g. a 32-bit number by flipping a coin is impractical (too tiring).

→ ‘random number generator’ = computer algorithm to generate r1, r2, ..., rn.

Example: multiplicative linear congruential generator (MLCG)

The rule n(i+1) = (a · ni) mod m produces a sequence of integers n0, n1, ...; dividing by m gives values in [0, 1).
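The MLCG recurrence can be sketched in a few lines. The rule n(i+1) = (a · ni) mod m is the standard MLCG form; the small values a = 3, m = 7 match the Brandt example on the next slide and are chosen only so the period is visible.

```python
import itertools

def mlcg(a, m, seed):
    """Yield the integer sequence n0, n1, ... of a multiplicative
    linear congruential generator: n_{i+1} = (a * n_i) mod m."""
    n = seed
    while True:
        yield n
        n = (a * n) % m

# small illustrative values (a = 3, m = 7, n0 = 1): period is m - 1 = 6
ints = list(itertools.islice(mlcg(a=3, m=7, seed=1), 8))
# uniform values in [0, 1) are obtained as r_i = n_i / m
rs = [n / 7 for n in ints]
```

After six steps the sequence returns to the seed, illustrating the periodicity discussed on the next slide.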

Page 4

Random number generator (2)

The sequence is (unfortunately) periodic!

Example (see Brandt Ch 4): a = 3, m = 7, n0 = 1.

sequence repeats

Choose a, m to obtain long period (maximum = m – 1);

m is usually close to the largest integer that can be represented in the computer.

Only use a subset of a single period of the sequence.

Page 5

Random number generator (3)

Is the sequence really a random sequence?

a and m have to be chosen so that the ri pass a few tests of randomness:

Uniform distribution in [0, 1]
All values independent (no correlation between pairs)

Far better generators are available, e.g. TRandom3, based on the Mersenne twister algorithm, period = 2^19937 − 1 ≈ 4·10^6000 (a “Mersenne prime”). See Matsumoto, M.; Nishimura, T. (1998). "Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator". ACM Transactions on Modeling and Computer Simulation 8 (1): 3–30.

Suggested values (Commun. ACM 31 (1988) 742): a = 40692, m = 2147483399.

The sequence is periodic: a pseudo-random sequence.

Page 6

Transformation method

Given the sequence

find the transformation to obtain

Require that

Then we have

We need the inverse of the cumulative function. This is not always possible.
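The formulas on this slide were lost in extraction; the standard inverse-transform relations, consistent with the surrounding text, read:

```latex
r \sim \text{Uniform}[0,1], \qquad
F(x) = \int_{-\infty}^{x} f(x')\,dx', \qquad
P\big(x(r) \le x\big) = P\big(r \le F(x)\big) = F(x)
\;\Rightarrow\; x(r) = F^{-1}(r)
```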

Page 7

Transformation method example

Consider the exponential pdf
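For the exponential pdf the inversion can be done in closed form. A minimal sketch, writing the pdf as f(x) = λ e^(−λx) (the symbol λ is an assumption, since the slide's formula was lost):

```python
import math

def exp_inverse(r, lam):
    """Map a uniform r in [0, 1) to x distributed as f(x) = lam * exp(-lam*x)
    via the inverse CDF: F(x) = 1 - exp(-lam*x)  =>  x = -ln(1 - r) / lam."""
    return -math.log(1.0 - r) / lam

# sanity check at the median: F^{-1}(0.5) = ln(2) / lam
x_med = exp_inverse(0.5, lam=2.0)
```

Feeding uniform random numbers r through `exp_inverse` yields an exponentially distributed sample.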

Page 8

The acceptance-rejection method

Consider a pdf for which it is not trivial to invert the cumulative function.

A random sequence following the pdf f(x) can be obtained in the following way

Enclose the pdf in a box

Generate a random variable x uniform in the interval [xmin, xmax], x=xmin+(xmax-xmin)r, r uniform in [0,1]

Generate a second independent uniform random number u in the interval [0, fmax]

If u < f(x), x is accepted; if not, reject and repeat.

The efficiency is much lower than for direct inversion: < 0.5, depending on the shape of the pdf.
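The steps above can be sketched as follows. The triangular pdf and the box height fmax = 2 are illustrative only; for a normalized pdf the acceptance efficiency is 1/(box area), here 0.5, consistent with the remark above.

```python
import random

def accept_reject(f, xmin, xmax, fmax, n, rng):
    """Generate n values distributed as pdf f on [xmin, xmax] by
    acceptance-rejection: throw (x, u) uniformly in the box
    [xmin, xmax] x [0, fmax] and keep x when u < f(x)."""
    out = []
    tried = 0
    while len(out) < n:
        tried += 1
        x = xmin + (xmax - xmin) * rng.random()
        u = fmax * rng.random()
        if u < f(x):
            out.append(x)
    return out, len(out) / tried  # sample and acceptance efficiency

# illustrative pdf: triangle f(x) = 2x on [0, 1], so fmax = 2
rng = random.Random(12345)
sample, eff = accept_reject(lambda x: 2.0 * x, 0.0, 1.0, 2.0, 1000, rng)
```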

Page 9

Example of acceptance-rejection method

If the dot is below the curve, use it in the histogram.

Page 10

The acceptance-rejection method (2)

The efficiency of the method can be very low if the curve has long tails (low probability values over large x regions, e.g. a Gaussian).

We can select different regions in which the maximum is very different. For each region we apply the acceptance method separately considering the local maximum.

Caveat: the number of trials generated in each region must be proportional to the area of its box (width times local maximum), so that the regions are populated in the correct proportions.

Page 11

Random generator software

Page 12

Monte Carlo event generator

Consider a very simple event

Generate cosθ and φ
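For such a simple event, the angular generation can be sketched directly, assuming an isotropic distribution: cosθ uniform in [−1, 1] and φ uniform in [0, 2π).

```python
import math
import random

def isotropic_direction(rng):
    """Generate (cos_theta, phi) for an isotropic decay:
    cos(theta) uniform in [-1, 1], phi uniform in [0, 2*pi)."""
    cos_theta = -1.0 + 2.0 * rng.random()
    phi = 2.0 * math.pi * rng.random()
    return cos_theta, phi

rng = random.Random(1)
angles = [isotropic_direction(rng) for _ in range(1000)]
```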

Less simple events:

For these processes, dedicated software packages (PYTHIA, EvtGen, HERWIG, ISAJET, …) are used.

Output = ‘events’, i.e., for each event we get a list of generated particles with their momentum vectors, types, etc.

Page 13

Monte Carlo event generator (2)

PYTHIA generator: p p → gluino gluino

First, the elementary particles

...then, the observable particles

Page 14

Monte Carlo detector simulation

Takes as input the particle list and momenta from the generator.

Simulates the detector response: multiple Coulomb scattering (generate scattering angle), particle decays (generate lifetime), ionization energy loss (generate Δ), electromagnetic and hadronic showers, production of signals, electronics response, …

Output = simulated raw data → input to reconstruction software: track finding, fitting, etc.

Predict what you should see at ‘detector level’ given a certain hypothesis at ‘generator level’. Compare with the real data in a particular channel to tune and calibrate the simulated detector response.

Estimate ‘efficiencies’ = # events found / # events generated.

Programming package: GEANT4

Page 15

Statistical test: general concepts

Page 16

Hypotheses

A hypothesis H is some theory or model that specifies the probability for the data, i.e., the outcome of the observation.

x could be univariate or multivariate, continuous or discrete.

x could represent e.g. observation of a single particle, a single event, or an entire “experiment”.

Possible values of x form the sample space S (or “data space”).

Simple (or “point”) hypothesis: f(x|H) completely specified (e.g. H fixes the values of all parameters).

Composite hypothesis: H contains unspecified parameter(s).

The probability for x given H is also called the likelihood of the hypothesis, written L(x|H).

Page 17

Definition of a test

The goal is to make some statement, based on the observed data x, on the validity of the possible hypotheses.

Consider e.g. a simple hypothesis H1 and alternative H0.

A test of H1 is defined by specifying a (small) critical region W of the data space such that there is no more than some (small) probability α, assuming H1 is correct, to observe the data there, i.e.,
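The defining condition (lost in extraction) is, in the standard notation:

```latex
P(x \in W \mid H_1) = \int_W f(x \mid H_1)\, dx \le \alpha
```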

If x is observed in the critical region, reject H1.

α is called the size or significance level of the test.

Critical region also called “rejection” region; complement is acceptance region.

Page 18

Definition of a test (2)

In general there are many possible critical regions that give approximately the same significance α (unless the hypothesis H makes very precise predictions on the event x).

So the choice of the critical region for a test of H1 needs to take into account the alternative hypothesis H0. It is more difficult to give an absolute evaluation of a single hypothesis than to compare two hypotheses. Test H1 against H0

In other words, we place the critical region where there is a low probability for x to be found if H1 is true, but a high probability if H0 is true:

(figure: distribution of x with critical region W)

Page 19

Rejecting a hypothesis

Note that rejecting H1 is not necessarily equivalent to the statement that we believe it is false and H0 true (confidence level).

In frequentist statistics one only associates probability with outcomes of repeatable observations (the data).

In Bayesian statistics, probability of the hypothesis (degree of belief) would be found using Bayes’ theorem:
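The formula itself was lost in extraction; the standard form of Bayes' theorem for a hypothesis H given data x is:

```latex
P(H \mid x) = \frac{P(x \mid H)\, P(H)}{\int P(x \mid H')\, P(H')\, dH'}
```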

which depends on the prior probability P(H).

What makes a frequentist test useful is that we can compute the probability to accept/reject a hypothesis assuming that it is true, or assuming some alternative is true (there are no a priori probabilities; all hypotheses are equivalent at the beginning). We need to know the pdfs f(x|H).

Page 20

Errors of type-I or type-II

The first mistake we can make is to reject a hypothesis that is the right one: we reject H1 even though it is the right hypothesis. This is a type-I error, and the maximum probability for it to occur is the size α of the test (the event falls on the tail of the pdf).

On the other hand, we can accept H1 as true while it is false and the hypothesis H0 is true. This is called a type-II error and occurs with probability β.
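The formulas were lost in extraction; in the notation of the previous slides (S = sample space, W = critical region) they read:

```latex
\beta = P(x \notin W \mid H_0) = \int_{S-W} f(x \mid H_0)\, dx ,
\qquad
\alpha = P(x \in W \mid H_1)
```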

1 − β is called the power of the test with respect to the alternative H0.

(figure: distribution of x with critical region W; tail areas α and β)

β is a measure of the contamination

Page 21

Example 1

An example is the selection of a particle decay by the reconstruction of the daughter tracks. To select these events we can look at their invariant mass.

As we can easily see from MC simulation, signal and background (combinatoric) have a completely different distribution

The signal is a Gaussian (or a Crystal Ball function) while the background is a negative exponential (almost linear).

Going far enough from the peak, we can easily find a rejection region with very low α.

Since the background is almost flat, the probability β will be quite large.

In this case, the power of this test against the background contamination is limited (hence the sPlot and sFit techniques).

(figure: invariant-mass distribution with acceptance region and background contamination in the signal region)

Page 22

Example 2

Another example is the selection of Z . To detect this channel we use the decay.

A possible background for this channel is the direct decay Z

To reject it, the invariant mass has to be considered far from the Z mass peak

We can also consider the variable:

This variable peaks at zero for the background Z

signal

background

Page 23

Event example in HEP

High-pT muons

High-pT jets of hadrons

Simulated SUSY event

Missing transverse energy

Page 24

Event example in HEP

Simulated minimum bias event

This event from Standard Model t-tbar production also has high-pT jets and muons, and some missing transverse energy → it can easily mimic a SUSY event.

Page 25

Statistical tests in a particle physics context

For each event the result is a collection of numbers x:
Momentum (for each reconstructed particle)
Particle mass (for each reconstructed and identified particle)
Jet energy and pT
Missing energy

x follows some n-dimensional multivariate pdf, which depends on the type of event produced, e.g.

background SUSY signal

We can define for each event two basic hypotheses: H0 (background), H1 (signal) and build the conditional probability of such event given that it is a background or signal event (PDF)

Most of the time we do not have the analytic form of the pdf; instead we have the results of our simulation program, i.e. distributions of many correlated variables for signal (H1) and background (H0).

Page 26

Selecting events

Suppose we have a data sample with two kinds of events, corresponding to hypotheses H0 (background) and H1 (signal) and we want to select those of type H1.

Each event is a point in the variable space (in the present example a 2-dimensional space). What ‘decision boundary’ should we use to accept/reject events as belonging to type H0 or H1?

A first selection can be done with simple linear cuts:

Accepted region

Page 27

Other selection cuts

We could also decide to select signal events in other ways.

(figures: linear and non-linear accepted regions)

How do we find the best way to select?

Page 28

Test statistic

A generic way to proceed is to introduce a scalar function t(x) of the event variables; t is called a test statistic.

Then, let us consider the pdf of the t variable

Selection can be done cutting in the variable t. The cut value tcut divides the space in two regions:

t < tcut: signal, event accepted

t > tcut: background, event rejected

(figure: distributions of t for signal and background)

In general we do not have an explicit pdf form, but we have the Monte Carlo simulation, which gives us the distributions.
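The cut described above can be sketched directly on Monte Carlo samples of t. The toy values below are illustrative only.

```python
def cut_efficiencies(t_signal, t_background, t_cut):
    """Apply the selection t < t_cut (accept) to Monte Carlo samples of the
    test statistic; return (signal efficiency, background efficiency)."""
    eff_s = sum(1 for t in t_signal if t < t_cut) / len(t_signal)
    eff_b = sum(1 for t in t_background if t < t_cut) / len(t_background)
    return eff_s, eff_b

# toy MC values of the test statistic (illustrative)
t_sig = [0.1, 0.2, 0.3, 0.4, 0.9]
t_bkg = [0.6, 0.7, 0.8, 0.9, 0.2]
eff_s, eff_b = cut_efficiencies(t_sig, t_bkg, t_cut=0.5)
```

Scanning t_cut trades signal efficiency against background rejection, which is what the α/β discussion formalizes.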

(figure: distributions of t with cut value tcut; tail areas α and β)

Page 29

Efficiency

The probability to reject the background hypothesis (i.e. accept the signal hypothesis) for a background event (background contamination) is the background efficiency.


The probability to accept the signal hypothesis for a signal event (signal efficiency) is:
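Both formulas were lost in extraction; with the acceptance region t < tcut from the previous slide they read:

```latex
\varepsilon_b = \int_{-\infty}^{t_{\text{cut}}} f(t \mid H_0)\, dt = \beta ,
\qquad
\varepsilon_s = \int_{-\infty}^{t_{\text{cut}}} f(t \mid H_1)\, dt = 1 - \alpha
```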

β and α are the tails beyond the cut of the background and signal distributions, respectively.

(figure: distributions of t with the cut; tail areas α and β)


Page 30

Purity of event selection

We now want to quantify how “pure” our signal sample is, i.e. what is the probability that an event is signal, given that it has been accepted (purity).

We have to make some initial assumptions: suppose the fractions of background and signal events in our sample are b and s (prior probabilities).

We can use Bayes’ theorem:
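The formula was lost in extraction; in terms of the prior fractions s, b and the efficiencies of the previous slide it is:

```latex
P(H_1 \mid \text{accepted})
= \frac{\varepsilon_s\, s}{\varepsilon_s\, s + \varepsilon_b\, b}
```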

So the purity depends on the prior probabilities as well as on the signal and background efficiencies.

We have to maximize it

or better:

Page 31

Building a test statistic

How can we build a good (optimal) critical region, starting from a test function t?

We can invoke the Neyman-Pearson lemma that states:

To get the highest power for a given significance level in a test of H0 (background) versus H1 (signal), the critical region should have

f(x|H1) / f(x|H0) > c

inside the region, and ≤ c outside, where c is a constant which determines the power.

Equivalently, the optimal scalar test statistic is the likelihood ratio

r = f(x|H1) / f(x|H0)

(also called the Bayes factor).

In general x is a multidimensional vector, and t is a scalar function of x.

(figure: acceptance region for r)

Page 32

Building a test statistic (2)

The problem is that in many cases we do not have the analytical expression of the pdfs to evaluate P(x|H1) and P(x|H0).

Instead we may have Monte Carlo models for signal and background processes, so we can produce simulated data and enter each event into an n-dimensional histogram (n variables per event). With M bins for each of the n variables, there are M^n cells in total (covering all possible correlations).

But n is potentially large, → prohibitively large number of cells to populate with Monte Carlo data.

Compromise: test statistic t(x) with fewer parameters; determine them (e.g. using MC) to give best discrimination between signal and background.

Multivariate methods: Fisher discriminant, neural networks, kernel density methods, support vector machines, decision trees (boosting, bagging)

Page 33

Linear test statistic

Ansatz: linear dependence

Choose the parameters a1, ..., an so that the pdfs f(t|H0) and f(t|H1) have maximum ‘separation’: large distance between the mean values and small widths.

Fisher: maximize
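The ansatz and the Fisher criterion were lost in extraction; in standard notation (τk and Σk² are the mean and variance of t under hypothesis Hk, μk and Vk the mean vector and covariance matrix of x):

```latex
t(\mathbf{x}) = \sum_{i=1}^{n} a_i x_i = \mathbf{a}^{T}\mathbf{x},
\qquad
J(\mathbf{a}) = \frac{(\tau_0 - \tau_1)^2}{\Sigma_0^2 + \Sigma_1^2},
\qquad
\tau_k = \mathbf{a}^{T}\boldsymbol{\mu}_k, \quad
\Sigma_k^2 = \mathbf{a}^{T} V_k\, \mathbf{a}
```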

In physics and mathematics, an ansatz is an educated guess that is verified later by its results

Page 34

Linear test statistic (2)

We have:

In terms of mean and variance of t:

Page 35

Linear test statistic (3)

The numerator of J(a) is

We have to maximize

and the denominator is

gives Fisher’s linear discriminant function
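The expressions were lost in extraction; in the notation above (B the between-class matrix, W the sum of the covariance matrices) they read:

```latex
(\tau_0-\tau_1)^2 = \mathbf{a}^T B\,\mathbf{a}, \quad
B = (\boldsymbol{\mu}_0-\boldsymbol{\mu}_1)(\boldsymbol{\mu}_0-\boldsymbol{\mu}_1)^T ;
\qquad
\Sigma_0^2+\Sigma_1^2 = \mathbf{a}^T W\,\mathbf{a}, \quad
W = V_0+V_1 ;
\qquad
\mathbf{a} \propto W^{-1}(\boldsymbol{\mu}_0-\boldsymbol{\mu}_1)
```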

linear decision boundary. We do not need to know the full joint pdf, but only the means and covariances: n(n+3)/2 parameters.
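As a sketch, the prescription a = (V0 + V1)⁻¹ (μ0 − μ1) can be coded directly for two variables; the means and covariances below are illustrative only.

```python
def fisher_coefficients(mu0, mu1, V0, V1):
    """Fisher's linear discriminant for 2 variables:
    a = (V0 + V1)^{-1} (mu0 - mu1), with the 2x2 inverse written out."""
    W = [[V0[0][0] + V1[0][0], V0[0][1] + V1[0][1]],
         [V0[1][0] + V1[1][0], V0[1][1] + V1[1][1]]]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    d = [mu0[0] - mu1[0], mu0[1] - mu1[1]]
    a0 = ( W[1][1] * d[0] - W[0][1] * d[1]) / det
    a1 = (-W[1][0] * d[0] + W[0][0] * d[1]) / det
    return [a0, a1]

# toy classes: separated along the first variable, unit covariances
a = fisher_coefficients(mu0=[0.0, 0.0], mu1=[2.0, 0.0],
                        V0=[[1.0, 0.0], [0.0, 1.0]],
                        V1=[[1.0, 0.0], [0.0, 1.0]])
```

With uncorrelated unit variances the discriminant direction is simply along the separation of the means, as expected.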

Page 36

Fisher discriminant for Gaussian data

Suppose f(x|Hk) is a multivariate Gaussian with

The Fisher discriminant (plus an offset) is:

and the Likelihood ratio is

That is, t = log(r) + const. (a monotonic relation), so for this case the Fisher discriminant is equivalent to using the likelihood ratio, and thus gives maximum purity for a given efficiency.

For non-Gaussian data this no longer holds, but linear discriminant function may be simplest practical solution.

One often tries to transform the data so as to better approximate a Gaussian before constructing the Fisher discriminant.

Adding a constant does not change the properties of t and only moves the cut value

Page 37

Fisher discriminant for Gaussian data (2)

In the case of a multivariate Gaussian with common covariance matrix, it is interesting to evaluate the ”a posteriori” probability of a certain hypothesis

and for a particular choice of the offset a0, we can have:
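The formula was lost in extraction; the standard result, with t the Fisher discriminant including the offset a0, is:

```latex
P(H_1 \mid \mathbf{x}) = \frac{1}{1 + e^{-t}}
```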

which is the logistic sigmoid function

Page 38

Linear decision boundaries

A linear decision boundary is optimal only when both classes follow multivariate Gaussians with the same covariance matrix but different means.

For other cases the linear boundaries can be completely useless

Page 39

Non-linear transformations

We can try to find a non-linear transformation of the variables such that in the new variables a linear boundary is effective.

A non-linear test statistic is in many cases mandatory.

(figure: non-linear accepted region)

Multivariate statistical methods are a Big Industry:

● Neural networks
● Boosted decision trees
● Kernel density methods
● …

Page 40

Decision tree

Boosting algorithms can be applied to any classifier. Here they are applied to decision trees.

S means signal, B means background; terminal nodes, called leaves, are shown in boxes. The key issue is to define a criterion that describes the goodness of the separation between signal and background in a tree split.

Assume the events are weighted, with event i having weight Wi. Define the purity of the sample in a node by
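The formula was lost in extraction; the standard definition, summing weights over signal and background events in the node, is:

```latex
P = \frac{\sum_{\text{sig}} W_i}{\sum_{\text{sig}} W_i + \sum_{\text{bkg}} W_i}
```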

Note that P(1 − P) is 0 if the sample is pure signal or pure background. For a given node, let n be the number of events in the node.

The criterion chosen is to minimize

To determine the increase in quality when a node is split into two nodes, one maximizes
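The split-quality formulas were lost in extraction; the standard Gini-based forms are:

```latex
\text{Gini} = P(1-P)\sum_{i=1}^{n} W_i , \qquad
C = \text{Gini}_{\text{left}} + \text{Gini}_{\text{right}} \;(\text{minimize}),
\qquad
\Delta = \text{Gini}_{\text{parent}} - \text{Gini}_{\text{left}} - \text{Gini}_{\text{right}} \;(\text{maximize})
```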

At the end, if a leaf has purity greater than 1/2 (or whatever threshold is set), it is called a signal leaf; otherwise, a background leaf. Events are classified as signal (score +1) if they land on a signal leaf and as background (score −1) if they land on a background leaf.

The resulting tree is a decision tree.

Page 41

Boosted decision tree (BDT)

Decision trees have been available for some time. They are known to be powerful but unstable, i.e., a small change in the training sample can produce a large change in the tree and the results.

Combining many decision trees to make a “majority vote” can improve the stability somewhat. In the boosted decision tree method, the weights of misclassified events are boosted for succeeding trees.

If there are N total events in the sample, the weight of each event is initially taken as 1/N. Suppose there are M trees, and let m be the index of an individual tree.

There are several commonly used algorithms for boosting the weights of the misclassified events in the training sample. The boosting performance can be quite different for the various ways of updating the event weights.

Legend (for the weight-update formula): I = 1 if the event is misclassified, I = 0 if it is correctly classified; the other labels refer to the event weight, the real event type, and the event type from classification.

Page 42

Adaptive boost (AdaBoost)

Event weight: it changes only if the event was misidentified. We give larger weight to misidentified events, so that these cases get more attention in the next tree.

Err is a measure of how much the tree misidentifies the events: a very small Err means that the tree is working very well (and alpha is large).
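The formulas were lost in extraction; the standard AdaBoost forms, with Ii the misclassification indicator defined above and β a constant (β = 1 in the original AdaBoost), are:

```latex
\text{Err}_m = \frac{\sum_i w_i\, I_i}{\sum_i w_i}, \qquad
\alpha_m = \beta \,\ln\frac{1-\text{Err}_m}{\text{Err}_m}, \qquad
w_i \;\to\; w_i\, e^{\alpha_m I_i}
```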

Page 43

Adaptive boost (AdaBoost) (2)

For boosting, the training events which were misclassified (a signal event fell on a background leaf or vice versa) have their weights increased (boosted), and a new tree is formed (the new tree will, in a sense, pay more attention to these events).

This procedure is then repeated for the new tree and so on, many times (an iterative process). At the end, many trees have been built.

The score from the m-th individual tree Tm is taken as +1 if the event falls on a signal leaf and −1 if it falls on a background leaf. The final score is a weighted sum of the scores of the individual trees.
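The whole procedure can be sketched with threshold "stumps" on a single variable (the sample and the stump classifier are illustrative; real BDTs split on many variables):

```python
import math

def stump_predict(cut, sign, x):
    """Threshold stump: +1/-1 depending on which side of the cut x falls."""
    return sign if x < cut else -sign

def best_stump(xs, ys, ws):
    """Choose the stump with the smallest weighted misclassification."""
    best = None
    for cut in sorted(set(xs)) + [max(xs) + 1.0]:
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if stump_predict(cut, sign, x) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    return best

def adaboost(xs, ys, n_trees):
    """AdaBoost: boost the weights of misclassified events for each new
    tree; the final score is T(x) = sum_m alpha_m * T_m(x)."""
    ws = [1.0 / len(xs)] * len(xs)
    trees = []
    for _ in range(n_trees):
        err, cut, sign = best_stump(xs, ys, ws)
        frac = max(err / sum(ws), 1e-12)   # weighted error fraction
        if frac >= 0.5:
            break
        alpha = math.log((1.0 - frac) / frac)
        trees.append((alpha, cut, sign))
        ws = [w * math.exp(alpha) if stump_predict(cut, sign, x) != y else w
              for x, y, w in zip(xs, ys, ws)]
    return trees

def bdt_score(trees, x):
    return sum(alpha * stump_predict(cut, sign, x) for alpha, cut, sign in trees)

# toy 1D sample: signal (+1) at low values, background (-1) at high values
xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
ys = [+1, +1, +1, -1, -1, -1]
model = adaboost(xs, ys, n_trees=5)
```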

Page 44

BDT concrete example

Using TMVA and some code modified from G. Cowan's CERN academic lectures (June 2008)

Page 45

BDT concrete example (2)

First tree (m=0) result

Page 46

BDT concrete example: trees 1, 2, 3, 4

Page 47

BDT concrete example: final result

The BDT is much more powerful than the Fisher linear discriminant and a simple decision tree.

Page 48

Example 2: XOR (quite unusual)

BDT completely efficient even at the first iteration (single decision tree).

The Fisher linear discriminant is not effective at all.

A variable transformation is very useful in this case (a = x + y, b = x − y).

Page 49

BDT example: circular correlation

Using TMVA and the create_circ macro from $ROOTSYS/tmva/test/createData.C to generate the dataset

Page 50

Circular correlation: method performance

Compare performance of Fisher discriminant, single DT and BDT with more and more trees (5 to 400)

All other parameters at TMVA default (would be 400 trees)

Fisher bad (expected)

Single (small) DT: not so good

More trees improve performance until saturation

Page 51

Circular correlation, decision contour

Fisher bad (expected)

Note: max tree depth = 3

Single (small) DT: not so good. Note: a larger tree would solve this problem

More trees improve performance (less step-like, closer to optimal separation) until saturation

Largest BDTs: they wiggle a little around the contour → they picked up features of the training sample, that is, overtraining.

Whether it is better to have a large tree depth or a large number of trees depends on the particular case.

Page 52

Circular correlation, decision contour (2)

Better shape with more trees: quasi-continuous

Overtraining because of disagreement between training and testing? Let's see

Page 53

Circular correlation, decision contour (3)

Best significance actually obtained with the last BDT, 400 trees!

But to be fair, equivalent performance with 10 trees already

Less “stepped” output desirable? → maybe 50 is reasonable

Page 54

BDT example

Apply this method to select the decay

In order to discriminate this channel from minimum bias events and combinatorial background, we identify a certain number of kinematic variables.

Page 55

BDT example (2)

Other variables

Page 56

BDT example (3)

We train the BDT with signal (Monte Carlo simulated) and background data. Once the BDT is trained (and fixed), we apply the cut to the real data.