Advanced Analysis Techniques in HEP
Pushpa Bhat, Fermilab
ACAT2000, Fermilab, IL, October 2000
A reasonable man adapts himself to the world. An unreasonable man persists in trying to adapt the world to himself. So, all progress depends on the unreasonable one.
- Bernard Shaw
Outline
Introduction
Intelligent Detectors: moving intelligence closer to the action
Optimal Analysis Methods
The Neural Network Revolution
New Searches & Precision Measurements: discovery reach for the Higgs boson; measuring the top quark mass and the Higgs mass
Sophisticated Approaches
Probabilistic Approach to Data Analysis
Summary
[Diagram: the analysis chain from the "world before the experiment" to the "world after the experiment": data collection (with express analysis) → data transformation → feature extraction → data organization, reduction and analysis → global decision and data interpretation.]
Intelligent Detectors
Data analysis starts when a high-energy event occurs: transform electronic data into useful "physics" information in real time. Move intelligence closer to the action!
Algorithm-specific hardware: neural networks in silicon
Configurable hardware: FPGAs, DSPs implementing "smart" algorithms in hardware
Innovative on-line data management plus "smart" algorithms in hardware: data in RAM disk and AI algorithms in FPGAs
Expert Systems for Control & Monitoring
Data Analysis Tasks
Particle identification: e-ID, τ-ID, b-ID, e/γ and q/g separation
Signal/background event classification: signals of new physics are rare and small (finding a "jewel" in a haystack)
Parameter estimation: top quark mass, Higgs mass, track parameters, for example
Function approximation: correction functions, tag rates, fake rates
Data exploration: knowledge discovery via data mining; data-driven extraction of information, latent structure analysis
Optimal Analysis Methods
The measurements being multivariate, the optimal methods of analysis are necessarily multivariate.
Discriminant analysis: partition the multidimensional variable space and identify decision boundaries.
Cluster analysis: assign objects to groups based on similarity.
Examples: Fisher linear discriminant, Gaussian classifier, kernel-based methods, K-nearest-neighbor (clustering) methods, adaptive/AI methods.
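To make the first of these examples concrete, here is a minimal NumPy sketch (not from the talk; the two-Gaussian toy samples and all numbers are invented) of a Fisher linear discriminant: project events onto the direction that best separates the two class means relative to the within-class spread.

import numpy as np

rng = np.random.default_rng(0)

# Toy two-class samples in two feature variables (x1, x2).
signal = rng.multivariate_normal([1.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], size=5000)
background = rng.multivariate_normal([-1.0, -1.0], [[1.0, -0.2], [-0.2, 1.0]], size=5000)

# Fisher direction: w proportional to S_w^{-1} (mu_s - mu_b),
# where S_w is the sum of the within-class covariance matrices.
mu_s, mu_b = signal.mean(axis=0), background.mean(axis=0)
S_w = np.cov(signal, rowvar=False) + np.cov(background, rowvar=False)
w = np.linalg.solve(S_w, mu_s - mu_b)

# The linear discriminant D(x) = w . x; a cut on D defines the decision boundary.
D_sig, D_bkg = signal @ w, background @ w
print("Fisher coefficients:", w)
print("mean D  signal: %.3f   background: %.3f" % (D_sig.mean(), D_bkg.mean()))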
Why Multivariate Methods?
Because they are optimal!
[Figure: scatter plots of two classes in the (x1, x2) feature space with the linear decision boundary D(x1, x2) = 2.014 x1 + 1.592 x2.]
Also, they need to have optimal flexibility/complexity.
[Figure: Mth-order polynomial fits to h(x) = 0.5 + 0.4 sin(2πx) for M = 1 (simple), M = 3 (flexible) and M = 10 (highly flexible).]
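The figure's point can be reproduced numerically; this is a sketch under the assumption that the target curve is h(x) = 0.5 + 0.4 sin(2πx), as reconstructed above, with invented noise, comparing how closely fits of increasing order track the true curve.

import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of the assumed target curve h(x) = 0.5 + 0.4*sin(2*pi*x).
x = np.linspace(0.0, 1.0, 15)
y = 0.5 + 0.4 * np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)

x_dense = np.linspace(0.0, 1.0, 200)
h_true = 0.5 + 0.4 * np.sin(2 * np.pi * x_dense)

# Polynomial fits of order M = 1 (simple), 3 (flexible), 10 (highly flexible).
for M in (1, 3, 10):
    fit = np.polyval(np.polyfit(x, y, deg=M), x_dense)
    rms_true = np.sqrt(np.mean((fit - h_true) ** 2))
    print(f"M = {M:2d}: RMS deviation from the true curve = {rms_true:.3f}")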
The Golden Rule
Keep it simple. As simple as possible. Not any simpler.
- Einstein
Optimal Event Selection
r(x) = \frac{p(x|s)\, p(s)}{p(x|b)\, p(b)} = \frac{p(s|x)}{p(b|x)}
r(x) defines decision boundaries that minimize the probability of misclassification.
So, the problem mathematically reduces to calculating r(x), the Bayes discriminant function, or equivalently the underlying probability densities.
Posterior probability:
p(s|x) = \frac{p(x|s)\, p(s)}{p(x|s)\, p(s) + p(x|b)\, p(b)} = \frac{r}{1 + r}
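A small numerical sketch of this prescription (toy one-dimensional Gaussian densities and class fractions, all invented): once p(x|s), p(x|b) and the priors are known or estimated, r(x) and the posterior p(s|x) = r/(1+r) follow directly.

from scipy.stats import norm

# Toy one-dimensional class-conditional densities and prior class fractions.
p_x_given_s = lambda x: norm.pdf(x, loc=1.0, scale=1.0)
p_x_given_b = lambda x: norm.pdf(x, loc=-1.0, scale=1.5)
p_s, p_b = 0.1, 0.9

def bayes_discriminant(x):
    # r(x) = p(x|s) p(s) / (p(x|b) p(b))
    return p_x_given_s(x) * p_s / (p_x_given_b(x) * p_b)

def posterior_signal(x):
    # p(s|x) = r / (1 + r)
    r = bayes_discriminant(x)
    return r / (1.0 + r)

for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:+.1f}   r(x) = {bayes_discriminant(x):8.3f}   p(s|x) = {posterior_signal(x):.3f}")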
Probability Density Estimators
Histogramming: the basic problem of non-parametric density estimation is very simple! Histogram the data in M bins in each of the d feature variables, giving M^d bins (see the sketch below).
Curse of dimensionality: in high dimensions we would either require a huge number of data points, or most of the bins would be empty, leading to an estimated density of zero.
But the variables are generally correlated and hence tend to be restricted to a sub-space: the intrinsic dimensionality.
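A short numerical sketch of the M^d problem (toy Gaussian data; N, M and the axis range are invented): with M = 10 bins per variable, even 10^5 events occupy a vanishing fraction of the bins once d grows.

import numpy as np

rng = np.random.default_rng(2)
N, M = 100_000, 10                      # number of events, bins per feature variable

for d in (1, 2, 5, 10):
    data = rng.normal(size=(N, d))      # toy d-dimensional data
    # Assign each event to one of the M**d bins (axis range -4 .. 4 per variable).
    idx = np.clip(((data + 4.0) / 8.0 * M).astype(int), 0, M - 1)
    occupied = np.unique(idx, axis=0).shape[0]
    print(f"d = {d:2d}:  M^d = {M**d:>14,d} bins,  occupied = {occupied:>7,d}")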
Kernel-Based Methods
Akin to histogramming, but adopts importance sampling.
Place in d-dimensional space a hypercube of side h centered on each data point x_n; the density estimate is then the first expression below.
The estimate will have discontinuities. These can be smoothed out using different forms for the kernel function H(u); a common choice is a multivariate Gaussian kernel, giving the second expression below.
\tilde p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^d}\, H\!\left(\frac{x - x_n}{h}\right)

\tilde p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2\pi h^2)^{d/2}} \exp\!\left(-\frac{\lVert x - x_n \rVert^2}{2h^2}\right)

N = number of data points; H(u) = 1 if x_n lies in the hypercube, 0 otherwise; h = smoothing parameter.
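A minimal NumPy sketch of the Gaussian-kernel estimator just written down (one-dimensional toy data; the sample and the smoothing parameter h are invented).

import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(loc=0.0, scale=1.0, size=500)   # toy data points x_n
h, d = 0.3, 1                                       # smoothing parameter, dimensionality

def p_tilde(x):
    # p~(x) = (1/N) sum_n (2 pi h^2)^(-d/2) exp(-|x - x_n|^2 / (2 h^2))
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    return np.mean(np.exp(-(x - sample) ** 2 / (2.0 * h ** 2))) / norm

for x in (-2.0, 0.0, 2.0):
    print(f"p~({x:+.1f}) = {p_tilde(x):.4f}")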
K Nearest-Neighbor Method
Place a hyper-sphere centered at each data point x and allow the radius to grow to a volume V until it contains K data points. Then the density at x is
p(x) = \frac{K}{N V}
If our data set contains N_k points in class C_k and N points in total, then
p(x|C_k) = \frac{K_k}{N_k V}
where N = number of data points and K_k = number of points of class C_k in the volume V. The class posterior is then
p(C_k|x) = \frac{p(x|C_k)\, p(C_k)}{p(x)} = \frac{K_k}{K}
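A sketch of the same recipe in NumPy (toy two-class data; K and all numbers are invented): the posterior at x reduces to K_k/K, the fraction of the K nearest neighbors that belong to class C_k.

import numpy as np

rng = np.random.default_rng(4)
K = 25

# Toy two-class training sample in two feature variables.
sig = rng.multivariate_normal([1.0, 1.0], np.eye(2), size=2000)
bkg = rng.multivariate_normal([-1.0, -1.0], np.eye(2), size=2000)
points = np.vstack([sig, bkg])
labels = np.concatenate([np.ones(len(sig)), np.zeros(len(bkg))])

def knn_posterior(x):
    # p(signal | x) ~ K_s / K, the signal fraction among the K nearest neighbors of x.
    dist = np.linalg.norm(points - x, axis=1)
    nearest = np.argsort(dist)[:K]
    return labels[nearest].mean()

for x in ([1.0, 1.0], [0.0, 0.0], [-1.5, -1.0]):
    print(f"x = {x}:  p(s|x) ≈ {knn_posterior(np.array(x)):.2f}")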
Discriminant Approximation with Neural Networks
The output of a feed-forward neural network can approximate the Bayesian posterior probability p(s|x,y) directly, without estimating the class-conditional probabilities.
[Figure: a feed-forward network taking inputs x and y and producing the output n(x, y, w).]
n(x, y, w) = p(s|x, y) = \frac{r}{1 + r}
Calculating the Discriminant
Consider the sum
E = \sum_i \big[\, n(x_i, y_i, w) - d_i \,\big]^2
where d_i = 1 for signal, d_i = 0 for background, and w = vector of parameters.
Then
\frac{\partial E}{\partial n} = 0 \;\Rightarrow\; n(x, y, w) = p(s|x, y) = \frac{r}{1 + r}
in the limit of large data samples and provided that the function n(x, y, w) is flexible enough.
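The reasoning behind this standard result, not spelled out on the slide, is the usual least-squares argument: in the large-sample limit the sum becomes an expectation, which is minimized pointwise by the conditional mean of the target,

E \;\longrightarrow\; \sum_{d \in \{0,1\}} \int \big[\, n(x, y, w) - d \,\big]^2 \, p(x, y, d)\, dx\, dy ,
\qquad
\frac{\delta E}{\delta n} = 0 \;\Rightarrow\; n(x, y, w) = \langle d \mid x, y \rangle = 1\cdot p(s|x,y) + 0\cdot p(b|x,y) = p(s|x, y).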
Neural Networks
An NN estimates a mapping function without requiring a mathematical description of how the output formally depends on the inputs.
The "hidden" transformation functions g adapt themselves to the data as part of the training process. The number of such functions needs to grow only as the complexity of the problem grows.
[Figure: a feed-forward network mapping inputs x1, x2, x3, x4 through a hidden layer to the output D_NN.]
D_{NN}(\mathbf{X}) = g\Big(\sum_j w_{jk}\, g\big(\sum_i w_{ij} x_i + a_j\big)\Big), \qquad g(a) = \frac{1}{1 + e^{-a}}
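A compact, purely illustrative sketch of this statement using scikit-learn (the toy data, network size and every setting are invented, not from the talk): a feed-forward network with sigmoidal hidden units, trained on 0/1 targets with a squared-error criterion, should produce an output close to the exact posterior p(s|x).

import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)

# Toy one-dimensional problem with equal signal and background samples.
x_sig = rng.normal(1.0, 1.0, size=10000)
x_bkg = rng.normal(-1.0, 1.0, size=10000)
X = np.concatenate([x_sig, x_bkg]).reshape(-1, 1)
d = np.concatenate([np.ones_like(x_sig), np.zeros_like(x_bkg)])  # d_i = 1 (signal), 0 (background)

# Feed-forward net with sigmoidal hidden units, trained by minimizing sum_i [n(x_i) - d_i]^2.
net = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                   max_iter=200, random_state=0)
net.fit(X, d)

# For equal priors the exact posterior is p(s|x) = p(x|s) / (p(x|s) + p(x|b)).
x_test = np.array([[-2.0], [0.0], [2.0]])
exact = norm.pdf(x_test, 1.0, 1.0) / (norm.pdf(x_test, 1.0, 1.0) + norm.pdf(x_test, -1.0, 1.0))
for xi, n_out, p_sx in zip(x_test.ravel(), net.predict(x_test), exact.ravel()):
    print(f"x = {xi:+.1f}   n(x) = {n_out:.3f}   exact p(s|x) = {p_sx:.3f}")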
Measuring the Top Quark Mass
The Discriminants
[Figure: distributions of the discriminant variables; shaded = top.]
Measuring the Top Quark Mass
[Figure: DØ lepton+jets sample divided into background-rich and signal-rich regions.]
mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²
DØ Lepton+jets
Strategy for Discovering the Higgs Boson
at the Tevatron
P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000); hep-ph/0001152
Hints from the Analysis of Precision Data
LEP Electroweak Group, http://www.cern.ch/LEPEWWG/plots/summer99
M_H = 107^{+67}_{-45} GeV/c²
M_H < 225 GeV/c² at 95% C.L.
Event Simulation
Signal processes: pp̄ → WH → ℓν bb̄ and pp̄ → ZH → ℓℓ bb̄, νν̄ bb̄
Backgrounds: pp̄ → Wbb̄, Zbb̄, WZ, ZZ, tt̄, single top (tb, tqb)
Event generation: WH, ZH, ZZ and top with PYTHIA; Wbb̄, Zbb̄ with CompHEP, fragmentation with PYTHIA
Detector modeling: SHW (http://www.physics.rutgers.edu/~jconway/soft/shw/shw.html); trigger, tracking, jet-finding, b-tagging (double b-tag efficiency ~ 45%)
Di-jet mass resolution ~ 14% (scaled down to 10% for Run II Higgs studies)
WH Results from NN Analysis, M_H = 100 GeV/c²
WH vs Wbb̄
WH (110 GeV/c²) NN Distributions
Results, Standard vs. NN
A good chance of discovery up to M_H = 130 GeV/c² with 20-30 fb⁻¹
Improving the Higgs Mass Resolution
Use m_jj and H_T (= Σ E_T^jets) to train NNs to predict the Higgs boson mass, as sketched below.
[Figure: fitted mass distributions; the relative mass resolution improves from 13.8% to 12.2%, from 13.1% to 11.3%, and from 13% to 11%.]
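A sketch of this approach with scikit-learn (a made-up smearing model: a di-jet mass resolved at ~14% and an H_T-like variable carrying weaker mass information; none of the numbers are from the talk): a small network is trained to map (m_jj, H_T) to the generated mass, and its prediction can then be compared with m_jj alone.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
N = 20000

# Toy events (all numbers invented): true mass, a di-jet mass smeared by ~14%,
# and an H_T-like variable that carries additional, weaker, mass information.
m_true = rng.uniform(90.0, 130.0, size=N)
m_jj = m_true * (1.0 + rng.normal(0.0, 0.14, size=N))
h_t = 1.5 * m_true * (1.0 + rng.normal(0.0, 0.20, size=N))
X = np.column_stack([m_jj, h_t])

# Small network trained to predict the generated mass from (m_jj, H_T).
net = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(15,), activation="logistic",
                                 solver="lbfgs", max_iter=5000, random_state=0))
net.fit(X[: N // 2], m_true[: N // 2])
m_nn = net.predict(X[N // 2:])

print("relative spread of m_jj:", np.std((m_jj[N // 2:] - m_true[N // 2:]) / m_true[N // 2:]))
print("relative spread of m_NN:", np.std((m_nn - m_true[N // 2:]) / m_true[N // 2:]))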
Newer Approaches
Ensembles of Networks
Committees of networks: performance can be better than that of the best single network (see the sketch after this list).
Stacks of networks: control both bias and variance.
Mixture of experts: decompose complex problems.
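A minimal sketch of the committee idea with scikit-learn (toy data and all settings invented): train several small networks that differ only in their random initialization, average their outputs, and compare with the individual members.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=4000, noise=0.35, random_state=0)
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

# Train a committee of small networks that differ only in their random initialization.
committee = [MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed)
             .fit(X_train, y_train) for seed in range(5)]

# Committee output = average of the member posteriors.
probs = np.mean([net.predict_proba(X_test)[:, 1] for net in committee], axis=0)
committee_acc = np.mean((probs > 0.5) == y_test)

member_accs = [net.score(X_test, y_test) for net in committee]
print("member accuracies :", np.round(member_accs, 3))
print("committee accuracy:", round(committee_acc, 3))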
Exploring Models: Bayesian Approach
Provides probabilistic information on each parameter of a model (SUSY, for example) via marginalization over the other parameters (a small numerical sketch follows below).
The Bayesian method enables straightforward and meaningful model comparisons. It also allows treatment of all uncertainties in a consistent manner.
Mathematically linked to adaptive algorithms such as neural networks (NN).
Hybrid methods involving NNs for probability density estimation and Bayesian treatment can be very powerful.
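A tiny numerical sketch of marginalization (a made-up two-parameter toy, not a SUSY fit): build the joint posterior of an interesting parameter p and an uninteresting parameter A on a grid, then integrate A out to obtain p(p|d).

import numpy as np
from scipy.stats import norm

# Toy model: the observed value d measures p + A; A has a known prior.
d_obs, sigma_d = 3.0, 1.0

p_grid = np.linspace(-5, 10, 301)            # interesting parameter
A_grid = np.linspace(-5, 5, 201)             # uninteresting (nuisance) parameter
P, A = np.meshgrid(p_grid, A_grid, indexing="ij")

likelihood = norm.pdf(d_obs, loc=P + A, scale=sigma_d)
prior = norm.pdf(A, loc=0.0, scale=1.0)      # flat prior in p, Gaussian prior in A
posterior = likelihood * prior

# Marginalize over the nuisance parameter A to get p(p|d).
p_post = np.trapz(posterior, A_grid, axis=1)
p_post /= np.trapz(p_post, p_grid)

mean = np.trapz(p_grid * p_post, p_grid)
sd = np.sqrt(np.trapz((p_grid - mean) ** 2 * p_post, p_grid))
print(f"p = {mean:.2f} +/- {sd:.2f} after marginalizing over A")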
Summary
We are building very sophisticated equipment and will record unprecedented amounts of data in the coming decade.
The use of advanced "optimal" analysis techniques will be crucial to achieving the physics goals.
Multivariate methods, particularly neural network techniques, have already made an impact on discoveries and precision measurements, and will be the methods of choice in future analyses.
Hybrid methods combining "intelligent" algorithms and the probabilistic approach will be the wave of the future.
Optimal Event Selection
[Figure: signal (S) and background (B) distributions in the (x, y) feature space. Conventional cuts select rectangular regions defined by x0 and y0, whereas r(x, y) = constant defines an optimal decision boundary.]
r(x, y) = \frac{p(x, y|s)\, p(s)}{p(x, y|b)\, p(b)} = \frac{p(s|x, y)}{p(b|x, y)}
Probabilistic Approach to Data Analysis
Bayesian Methods
(The Wave of the future)
Bayesian Analysis
M = model, A = uninteresting parameters, p = interesting parameters, d = data
p(A, p, M | d) \propto L(d | A, p, M)\; Q(A, M)\; q(p, M) \qquad (Posterior \propto Likelihood \times Prior)
p(p | d) = \sum_M \int p(A, p, M | d)\, dA \qquad (marginalize over the uninteresting parameters A and the models M)
p(M | d) = \int\!\!\int p(A, p, M | d)\, dA\, dp \qquad (marginalize over A and p to compare models)
Bayesian Analysis of Multi-source Data: P.C. Bhat, H. Prosper, S. Snyder, Phys. Lett. B 407 (1997) 73
Higgs Mass Fits
S = 80 WH events; the background distribution is assumed to be described by Wbb̄. Results:
S/B = 1/10: M_fit = 114 ± 11 GeV/c²
S/B = 1/5: M_fit = 114 ± 7 GeV/c²