
Image Processing, Computer Vision, and Deep Learning: new approaches to the analysis and physics interpretation of LHC events

A. Schwartzman¹, M. Kagan¹, L. Mackey², B. Nachman¹ and L. de Oliveira³

¹ SLAC National Accelerator Laboratory, Stanford University, 2575 Sand Hill Road, Menlo Park, CA 94025, USA
² Department of Statistics, Stanford University, Stanford, CA 94305, USA
³ Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA

E-mail: [email protected]

Abstract. This review introduces recent developments in the application of image processing, computer vision, and deep neural networks to the analysis and interpretation of particle collision events at the Large Hadron Collider (LHC). The link between LHC data analysis and computer vision techniques relies on the concept of jet-images, building on the notion of a particle physics detector as a digital camera and the particles it measures as images. We show that state-of-the-art image classification techniques based on deep neural network architectures significantly improve the identification of highly boosted electroweak particles with respect to existing methods. Furthermore, we introduce new methods to visualize and interpret the high level features learned by deep neural networks that provide discrimination beyond physics-derived variables, adding a new capability to understand physics and to design more powerful classification methods at the LHC.

1. Introduction
The Large Hadron Collider is exploring physics at the energy frontier, probing some of the most fundamental questions about the nature of our universe. The datasets of the LHC experiments are among the largest in all of science. Each particle collision event at the LHC is rich in information, particularly in the detail and complexity of each event picture, making it ideal for the application of machine learning techniques to extract the maximum amount of physics information. Machine learning has been widely used for the analysis of particle collision data. Examples include unsupervised learning algorithms, such as jet clustering [1], and supervised algorithms for different particle identification and classification tasks, including the identification of different kinds of quark flavors, such as b- and c-jets [2], tau leptons [3], and photons [4], among many others. Particle classification algorithms at particle colliders have traditionally been designed using physics intuition to extract a small set of observables or features that were then input into a multivariate classifier. For example, in the case of b flavor tagging in the ATLAS experiment, a Boosted Decision Tree with 24 input variables is used [5]. The 24 input variables provide information about the physics of b-quark decays that is distinct from the light-quark and gluon jet backgrounds. Much of the work in developing such techniques relies on finding the physics variables that best capture the differences between different classes of particles.

During the last several years, spectacular advances in the fields of artificial intelligence, computer vision, and deep learning have resulted in remarkable performance in image classification and vision tasks [6], in particular through the use of deep convolutional neural networks (CNN) [7]. One of the key innovations in these methods is that they operate directly on raw images instead of requiring the extraction of features from the images. Motivated by the success of these techniques, this paper investigates the applicability of deep neural networks for image classification to the analysis of LHC data. LHC events are in fact ideal for the use of computer vision methods, since each LHC event picture can be interpreted as an image from a digital (∼100M pixel) camera.

2. Jets at the LHC
One of the most important signatures for the analysis of data from the Large Hadron Collider (LHC) are collimated streams of particles known as jets. Individual jets identify quarks and gluons emitted in high-energy interactions. Combinations of jets are used to identify unstable massive particles such as the top quark and the W, Z, and Higgs bosons. For example, W bosons are reconstructed from the two jets of the decay process W → qq, where q represents a quark. These heavy particles appear in many models of new physics that predict the existence of new exotic sub-atomic particles. Almost every analysis of LHC data rests on the efficient and accurate reconstruction of jets and their properties. An example of an event display from the ATLAS experiment¹ containing two high-pT² jets displayed in different colors is shown in Figure 1.

Figure 1. ATLAS event display containing two high pT jets.

The event display shows three different views: a transverse (relative to the beam axis) x−y view in the left panel, a parallel x−z view in the top right panel, and a lego view in the bottom right panel. The lego view shows the projection of the energy deposits in the calorimeter onto a two-dimensional (η, φ) plane that is obtained by unfolding the detector along the φ direction. This view is particularly relevant for image processing and computer vision approaches as it can be related to a fixed-size detector image. The variable η is called pseudo-rapidity and is related to the polar angle θ by η = −ln tan(θ/2).

¹ https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATLAS-CONF-2011-047/fig 28.png
² Analyses of hadron collider data use the jet transverse momentum, pT, rather than total energy. The jet pT is defined as the magnitude of the jet momentum transverse to the beam axis: pT = √(px² + py²).
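As an illustration of these coordinates, here is a minimal NumPy sketch (ours, not from the paper) of how pT, η, and φ would be computed from Cartesian momentum components; the function name is illustrative:

```python
import numpy as np

def jet_kinematics(px, py, pz):
    """Transverse momentum, pseudo-rapidity, and azimuth from Cartesian momenta (GeV)."""
    pt = np.hypot(px, py)                # pT = sqrt(px^2 + py^2)
    p = np.sqrt(px**2 + py**2 + pz**2)
    theta = np.arccos(pz / p)            # polar angle relative to the beam axis
    eta = -np.log(np.tan(theta / 2.0))   # eta = -ln tan(theta/2)
    phi = np.arctan2(py, px)             # azimuthal angle
    return pt, eta, phi

# example: px = 40, py = 30, pz = 50 GeV  ->  pT = 50 GeV, eta ~ 0.88, phi ~ 0.64
print(jet_kinematics(40.0, 30.0, 50.0))
```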

Page 3: Image Processing, Computer Vision, and Deep Learning: new … · 2018. 11. 19. · Image Processing, Computer Vision, and Deep Learning: new approaches to the analysis and physics

Jets are reconstructed using dedicated algorithms that group calorimeter energy deposits into clusters of fixed two-dimensional spatial (η, φ) size, corresponding to individual jets. Circles in the bottom right panel of Figure 1 represent the area of jets found with the anti-kt clustering algorithm with radius parameter R = 0.4, whereas the magnitude of the transverse energy of the calorimeter deposits within jets is shown in yellow. For a review of jet algorithms see [8].
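To make the clustering step concrete, below is a deliberately naive O(N³) sketch of anti-kt recombination over massless (pT, η, φ) deposits. This is our toy illustration of the algorithm of [1], not the paper's code; real analyses use the FastJet implementation.

```python
import math

def delta_r2(a, b):
    """Squared (eta, phi) distance between two (pT, eta, phi) objects, with phi wrap."""
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return (a[1] - b[1])**2 + dphi**2

def anti_kt(deposits, R=0.4):
    """Toy anti-kt: repeatedly merge the pair with the smallest d_ij, or promote
    an object to a jet when its beam distance d_iB = pT^-2 is smallest."""
    objs, jets = [tuple(d) for d in deposits], []
    while objs:
        d_beam = [(o[0]**-2, ('beam', i)) for i, o in enumerate(objs)]
        d_pair = [(min(objs[i][0]**-2, objs[j][0]**-2) * delta_r2(objs[i], objs[j]) / R**2,
                   ('pair', i, j))
                  for i in range(len(objs)) for j in range(i + 1, len(objs))]
        _, tag = min(d_beam + d_pair)
        if tag[0] == 'beam':                      # promote object to a jet
            jets.append(objs.pop(tag[1]))
        else:                                     # merge i and j, pT-weighted
            i, j = tag[1], tag[2]                 # (naive about the phi wrap)
            a, b = objs[i], objs[j]
            objs = [o for k, o in enumerate(objs) if k not in (i, j)]
            w = a[0] + b[0]
            objs.append((w, (a[0]*a[1] + b[0]*b[1]) / w,
                            (a[0]*a[2] + b[0]*b[2]) / w))
    return sorted(jets, reverse=True)             # hardest jet first

# two nearby deposits merge into one jet; the far deposit becomes its own jet
print(anti_kt([(100, 0.0, 0.0), (20, 0.2, 0.1), (50, 1.5, 2.0)], R=0.4))
```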

At the LHC, electroweak heavy particles (W, Z, and Higgs bosons, and top quarks) can be produced with large Lorentz boosts. The boost of the electroweak particle determines the distance in the (η, φ) plane between its decay products. Highly boosted heavy electroweak particles can hence lead to signatures of jet substructure, consisting of single (large radius) jets containing multiple nearby sub-jets. For example, the experimental signature of a boosted W boson will contain two close-by sub-jets. Figure 2 shows an event display containing two boosted top quarks³. The lego view in the top right panel shows top-jet candidates as green circles, with red circles indicating individual sub-jets. The size of the yellow squares is proportional to the transverse component of the energy deposits. A three-prong structure is clearly visible for the jet labeled as the hadronic top candidate. In this case, the boosted jet corresponds to the process t → Wb → qqb.

Figure 2. ATLAS event display containing two boosted top-quark jets.

The most difficult challenge for the identification of boosted jets from heavy electroweak particles at the LHC is the rejection of the very large background from multi-jet production in QCD. The key idea in classifying boosted top or boson jets is to identify the internal structure of individual jets. The main observables used in jet substructure analyses that best separate signal (top and boson jets) from background (quark and gluon jets, generically referred to as QCD jets) are the jet mass, the n-prong structure of the jets, for example measured with the variable n-subjettiness [9], and the radiation pattern within the jet. For a review of jet substructure techniques see [10][11][12].
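For concreteness, the n-subjettiness of [9] is τ_N = (Σ_k pT,k · min(ΔR_{1,k}, …, ΔR_{N,k})) / (R₀ Σ_k pT,k), the pT-weighted distance of jet constituents to the nearest of N candidate subjet axes. A small sketch under the simplifying assumption that the axes are supplied externally (in practice they come from, e.g., exclusive kT clustering):

```python
import math

def tau_n(constituents, axes, R0=1.0):
    """n-subjettiness tau_N: constituents and axes are (pT, eta, phi) tuples;
    small tau_N means the jet radiation is aligned with the N axes."""
    def dr(a, b):
        dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
        return math.hypot(a[1] - b[1], dphi)
    num = sum(pt * min(dr((pt, eta, phi), ax) for ax in axes)
              for pt, eta, phi in constituents)
    return num / (R0 * sum(pt for pt, _, _ in constituents))

# tau21 = tau_2 / tau_1; small values indicate a genuine two-prong (W-like) jet
cons = [(60, -0.1, 0.0), (40, 0.3, 0.4), (5, 0.1, 0.2)]
tau1 = tau_n(cons, axes=[(0, 0.06, 0.16)])
tau2 = tau_n(cons, axes=[(0, -0.1, 0.0), (0, 0.3, 0.4)])
print(tau2 / tau1)   # ~0.05: strongly two-prong
```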

³ https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATLAS-CONF-2011-073/fig 20.png

3. Data samples
This review considers simulated jet data produced from Monte Carlo simulations of LHC collisions. Boosted W bosons were generated as signal using PYTHIA 8.170 [13] at √s = 14 TeV via the decay of a hypothetical heavy W' boson forced to decay to a hadronically decaying W boson (W → qq) and a Z boson which decays invisibly to neutrinos. Multijet production of quarks and gluons was generated as background using the same PYTHIA generator and center-of-mass energy as for the W bosons.

4. Jet images
In order to use image processing and computer vision approaches for jet tagging at the LHC, a new data representation is introduced: the jet-image [14]. Jet-images build on the notion of a detector as the pixels of a digital camera, and of jets as images, enabling the connection between the fields of computer vision and jet substructure physics. Jet-images are defined on a 25 × 25 grid of cell size (0.1 × 0.1) in (η × φ) space, centered around the axis of R = 1.0 anti-kt jets. The intensity of each pixel is given by the transverse momentum pT deposited in the corresponding cell. Prior to the application of computer vision classification techniques, a series of pre-processing steps is applied to account for the space-time symmetries of jet-images. Pre-processing for the specific case of the identification of 2-prong jets, such as those from the decay of W bosons, is defined by a translation that places the leading-pT subjet at the center of the jet-image, a rotation such that the second-leading pT subjet is placed directly below it, and a flip operation such that the right side of the image has higher total pT than the left. The goal of these pre-processing steps is to make it easier for image classification algorithms to focus on the important physics differences between images. Jet-image pre-processing is illustrated in Figure 3, which shows the average jet-image of W (signal) and QCD (background) jets in a narrow pT and mass bin before and after the pre-processing steps.
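A minimal NumPy sketch of the pixelisation and pre-processing just described, assuming the (η, φ) positions of the two leading subjets are already known; the bin choices and helper name are illustrative, not the paper's code:

```python
import numpy as np

def jet_image(constituents, subjet1, subjet2, n_pix=25, width=1.25):
    """Pixelate (pT, eta, phi) constituents into a 25x25 grid and apply the
    pre-processing of Section 4: translate, rotate, flip."""
    pt, eta, phi = (np.asarray(x, float) for x in zip(*constituents))
    # 1) translate: leading subjet (eta, phi) moves to the origin
    eta, phi = eta - subjet1[0], phi - subjet1[1]
    # 2) rotate so the second subjet points straight down in phi
    ang = np.arctan2(subjet2[1] - subjet1[1], subjet2[0] - subjet1[0]) + np.pi / 2
    eta, phi = (eta * np.cos(ang) + phi * np.sin(ang),
                -eta * np.sin(ang) + phi * np.cos(ang))
    # 3) pixelate 0.1 x 0.1 cells over |eta|,|phi| < 1.25, pT as intensity
    img, _, _ = np.histogram2d(eta, phi, bins=n_pix,
                               range=[[-width, width], [-width, width]], weights=pt)
    # 4) flip so the right (positive-eta) half carries more total pT
    if img[n_pix // 2 + 1:].sum() < img[:n_pix // 2].sum():
        img = img[::-1, :]
    return img

cons = [(60, 0.1, 0.2), (40, 0.0, -0.3), (5, 0.3, 0.0)]
img = jet_image(cons, subjet1=(0.1, 0.2), subjet2=(0.0, -0.3))
print(img.shape, img.sum())   # (25, 25), all 105 GeV retained
```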

[Figure 3: four average jet-images plotted in [Translated] Pseudorapidity (η) versus [Translated] Azimuthal Angle (φ), with pixel intensity Pixel pT [GeV] on a logarithmic scale; top row: Pythia 8, W' → WZ at √s = 13 TeV; bottom row: Pythia 8, QCD dijets; both with 240 < pT/GeV < 260 and 65 < mass/GeV < 95.]

Figure 3. Average jet-image for W jets (top panels) and QCD jets (bottom panels) with 240 < pT < 260 GeV and 65 < mass < 95 GeV, before (left) and after (right) pre-processing.

5. Jet tagging using deep neural networks
The concept of jet-images has enabled the use of image classification methods for the identification (tagging) of boosted W bosons and top quarks at the LHC. In the first case, through the use of Fisher jets [14]: a linear classifier inspired by facial recognition algorithms. In the second case, through the use of neural networks [15]. An earlier use of image-based event reconstruction by the OPAL Collaboration at the LEP collider is described in [16]. This paper focuses on the application of modern computer vision algorithms based on deep neural networks for jet-image classification.

Using the pre-processed jet-image representation of jets, two different deep neural network architectures for image classification were trained to separate W from QCD jets: Convolutional Neural Networks (CNN) and Fully Connected (FC) MaxOut networks. This represents a significant departure from state-of-the-art jet tagging at the LHC in that the full jet-image is used as input to the networks, rather than a set of physics-defined jet substructure variables. In analogy to image recognition problems in computer vision, deep neural networks trained on the full jet-image are expected to learn high-level representations of the data and capture new, more discriminating features that separate signal from background.

For a detailed description of the deep network architectures see [17]. The MaxOut architecture consists of two FC layers with MaxOut activation units followed by two FC layers with rectified linear units (ReLU), followed by a FC sigmoid layer for classification. The numbers of units per layer are 256, 128, 64, and 25. The CNN consists of three convolution layers followed by two fully connected layers. The convolution layers use 64 filters of sizes 11 × 11, 3 × 3, and 3 × 3, respectively. The large size of the filters in the first layer is motivated by the typical scale of the most prominent features in W jets, and by the fact that the images are very sparse, with an average occupancy of about 5%. Each convolutional layer consists of three sequential units: convolution, max-pool, and dropout, with ReLU activation functions. The max-pool down-sampling in each convolution layer is (2,2), (3,3), and (3,3). The FC hidden layer consists of 64 units. Sigmoid units are used in the final classification layer. Figure 4 shows a diagram illustrating the CNN architecture.
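As a sketch of this CNN in Keras (layer counts, filter sizes, and pooling follow the text; the padding choice, dropout rate, and training configuration are our assumptions — see [17] for the architecture actually used):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_convnet(n_pix=25):
    """CNN of Section 5: three conv blocks (64 filters of 11x11, 3x3, 3x3, each
    followed by max-pool and dropout), a 64-unit FC layer, sigmoid output."""
    inputs = keras.Input(shape=(n_pix, n_pix, 1))            # one-channel jet-image
    x = inputs
    for kernel, pool in [(11, 2), (3, 3), (3, 3)]:
        x = layers.Conv2D(64, kernel, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D(pool)(x)
        x = layers.Dropout(0.25)(x)                          # rate is an assumption
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation='relu')(x)
    outputs = layers.Dense(1, activation='sigmoid')(x)       # P(W jet | image)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=[keras.metrics.AUC()])
    return model

build_convnet().summary()
```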

[Figure 4 diagram (recovered labels): Jet Image → Convolution (ReLU) → Max-Pool (Dropout) → Convolution (ReLU) → Max-Pool (Dropout), repeated → Flatten → Fully Connected ReLU Unit → output, with local response normalization.]

Figure 4. Convolutional neural network architecture applied to a jet-image.

The convolution layers can be intuitively understood as follows. In the first convolution layer, filters of 11 × 11 pixels are scanned across the jet-image in a sliding-window fashion. At each point in the jet-image, the convolution between the filter and the patch of the image behind it is computed. This process results in a new image for each filter. The filter values are learned during the CNN training. The following max-pool layer then considers patches of size 2 × 2 in a sliding-window fashion, and only the maximum value within each 2 × 2 patch is kept to form the max-pooled image. In the next convolutional layer, new filters are run across all previous convolution images and a similar max-pooling is performed. A unique aspect of CNN architectures is that the first convolution layer only has access to local information on the scale of the filter size. In the subsequent layers, when the previous convolutions are combined, units have access to more global information, until reaching the final FC layer, where units combine information from the whole image. This architecture enables a hierarchical representation of the data in which the first layers focus on local features for discrimination, for example individual subjets or local radiation patterns, whereas deeper layers learn higher-level features such as distances between subjets or more global structural features within jets.
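The two sliding-window operations are simple enough to write out directly; a loop-based NumPy illustration (clarity over speed, with a toy ~5%-occupancy image standing in for a jet-image):

```python
import numpy as np

def conv2d_valid(image, filt):
    """Slide the filter across the image; each output pixel is the elementwise
    product of the filter with the patch behind it, summed ('valid' convolution)."""
    fh, fw = filt.shape
    out = np.empty((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * filt)
    return out

def max_pool(image, k=2):
    """Keep only the maximum within each k x k patch (stride k)."""
    h, w = (s // k for s in image.shape)
    return image[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
jet_img = rng.random((25, 25)) * (rng.random((25, 25)) < 0.05)   # ~5% occupancy
print(max_pool(conv2d_valid(jet_img, rng.random((11, 11)))).shape)  # (7, 7)
```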

6. W boson jet tagging performance
The performance of the deep neural networks in identifying W boson jets and rejecting QCD jets was evaluated for jets with 250 GeV < pT < 300 GeV and within a 65 GeV < m < 95 GeV mass window containing the peak of the W boson mass. The primary figure of merit used to compare the performance of different jet tagging techniques is the ROC curve, which measures the rejection of background QCD jets as a function of the efficiency to correctly classify signal W jets, for different thresholds applied to the output of the classifier. Background rejection is defined as the inverse of the efficiency to incorrectly classify background jets as signal.
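Given per-jet labels and classifier scores, the ROC curve and background rejection are computed as below; a sketch with scikit-learn, with toy scores standing in for real classifier outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve

# y_true: 1 for W jets (signal), 0 for QCD jets (background);
# y_score: classifier output, e.g. the CNN sigmoid -- toy overlapping values here
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 10000)
y_score = 0.3 * y_true + 0.7 * rng.random(10000)

fpr, tpr, _ = roc_curve(y_true, y_score)      # fpr = background efficiency
mask = fpr > 0                                # avoid dividing by zero
signal_eff, bkg_rejection = tpr[mask], 1.0 / fpr[mask]
# e.g. background rejection at 50% signal efficiency:
print(bkg_rejection[np.searchsorted(signal_eff, 0.5)])
```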

Figure 5 shows ROC curves comparing the performance of several single jet substructure variables (mass, n-subjettiness τ21, and the distance ∆R between the two leading-pT subjets) and the two neural network architectures. An additional normalized convolutional deep neural network (Convnet-norm) was also considered; in this setting, the pixel intensities were normalized by the total image intensity before training. Figure 5 shows that the three DNN classifiers significantly outperform all individual single jet substructure variables. We also observe that the MaxOut architecture performs slightly better than the convolutional networks. The curve denoted Fisher corresponds to the Fisher jet-image discriminant described in reference [14].

[Figure 5: 1/(Background Efficiency) versus Signal Efficiency for mass, τ21, ∆R, Fisher, MaxOut, Convnet, Convnet-norm, and Random; Pythia 8, √s = 13 TeV, 250 < pT/GeV < 300, 65 < mass/GeV < 95.]

Figure 5. ROC curves for state-of-the-art jet substructure variables and computer vision deep neural network discriminants.

Figure 6(a) compares the performance of the DNN classifiers with two-variable combinations of jet substructure variables. While two-variable combinations achieve better discrimination power than individual variables, the DNN classifiers still perform significantly better, indicating that the deep networks are learning new physics information not captured in the jet substructure variables considered.


In order to understand whether the DNNs have learned the jet substructure variables, Figure 6(b) shows ROC curves for combinations of the CNN classifier with each of the individual jet substructure variables. It can be seen that adding τ21 or ∆R to the CNN classifier does not improve the ROC curve, indicating that the network has fully learned these two variables. However, there is a clear improvement in classification performance when the CNN is combined with the jet mass. This suggests that, while the CNN is able to learn new physics features and thereby significantly outperform the combination of mass and τ21, it does not fully learn the jet mass.
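Such combinations can be realized, for example, by feeding the classifier output together with a substructure variable into a simple second-stage discriminant. The paper does not specify how its combinations were built, so the logistic-regression choice below, with toy inputs, is purely a hypothetical sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy stand-ins for the per-jet CNN output and jet mass (GeV); in practice these
# come from the trained network and the jet reconstruction.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 5000)                   # 1 = W jet, 0 = QCD jet
cnn_out = np.clip(labels * 0.3 + rng.random(5000), 0, 1)
mass = np.where(labels == 1, rng.normal(80, 8, 5000), rng.normal(60, 25, 5000))

X = np.column_stack([cnn_out, mass])
combo = LogisticRegression().fit(X, labels)         # two-variable combination
print('CNN alone :', roc_auc_score(labels, cnn_out))            # on the same toy
print('CNN + mass:', roc_auc_score(labels, combo.predict_proba(X)[:, 1]))  # sample
```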

[Figure 6: 1/(Background Efficiency) versus Signal Efficiency; panel (a): mass+τ21, mass+∆R, τ21+∆R, MaxOut, Convnet, Convnet-norm, Random; panel (b): mass+Convnet, τ21+Convnet, ∆R+Convnet, MaxOut, Convnet, Convnet-norm, Random; Pythia 8, √s = 13 TeV, 250 < pT/GeV < 300, 65 < mass/GeV < 95.]

Figure 6. (a) ROC curves for two-variable combinations of jet substructure features and computer vision deep neural network discriminants. (b) ROC curves for combinations of the CNN output with individual jet substructure features.

7. Visualization of learned features
Image-based approaches to LHC data analysis provide unique new ways to visualize what physics information is learned by the different classifiers. Here only a few examples are given; for a more detailed discussion see [17].

Figure 7 shows the conditional distributions of n-subjettiness τ21 and jet mass given the CNN output. The strong non-linear relationship between the CNN output and τ21 indicates that the CNN has learned the information contained in n-subjettiness. On the other hand, the correlation between jet mass and the CNN output is significantly weaker, in agreement with the previous observation that jet mass is not fully learned by the deep neural network architectures considered.

A very powerful way to visualize the physics information learned in deep representations is the deep correlation jet-image, constructed as the Pearson correlation coefficient of the pixel intensities with the deep neural network output, as shown in Figure 8. This image shows where discriminating information is located within the jet. In particular, one finds that the localization and size of the second-leading pT subjet (located at the bottom of the image) is highly correlated with high CNN activations corresponding to signal W jets. This is consistent with our expectation of soft gluon emission in QCD jets compared with hard emissions for W jets. Broad blue regions around and outside the leading two subjets are indicative of the gluon radiation expected in QCD jets.
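Computing such an image amounts to one Pearson correlation per pixel across the jet sample; a NumPy sketch with illustrative array names:

```python
import numpy as np

def deep_correlation_image(images, dnn_output):
    """Pearson correlation of each pixel intensity with the DNN output.
    images: (n_jets, 25, 25) jet-images; dnn_output: (n_jets,) scores."""
    x = images.reshape(len(images), -1)            # flatten pixels
    x_c = x - x.mean(axis=0)                       # center each pixel
    y_c = dnn_output - dnn_output.mean()
    cov = x_c.T @ y_c / len(y_c)
    std = x_c.std(axis=0) * y_c.std()
    with np.errstate(divide='ignore', invalid='ignore'):
        corr = cov / std                           # always-empty pixels -> NaN
    return corr.reshape(images.shape[1:])          # the red/blue map of Figure 8

# toy usage
rng = np.random.default_rng(0)
imgs, scores = rng.random((1000, 25, 25)), rng.random(1000)
print(deep_correlation_image(imgs, scores).shape)  # (25, 25)
```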


[Figure 7: conditional densities Pr(τ21 | DNN Output) (a) and Pr(Jet Mass / GeV | DNN Output) (b), with the DNN output on the vertical axis; QCD, Pythia 8, √s = 13 TeV, 250 < pT/GeV < 300, 65 < mass/GeV < 95.]

Figure 7. CNN output as a function of τ21 (a) and mass (b).

[Figure 8: correlation of the deep network output with pixel activations, plotted in [Transformed] Pseudorapidity (η) versus [Transformed] Azimuthal Angle (φ); Pearson Correlation Coefficient from −0.60 to 0.60; pT(W) ∈ [250, 300] GeV matched to QCD, m(W) ∈ [65, 95] GeV.]

Figure 8. Deep correlation jet-image computed as the linear correlation between the CNN output and the pixel intensity. Red regions are signal-like, whereas blue regions are background-like.

In order to understand what new physics information the DNNs are learning beyond jet mass and τ21, we considered a highly restricted phase space in which τ21 and mass do not vary significantly. Figure 9 shows the ROC curves for the deep neural networks and the individual jet substructure variables for a subset of jets with very narrow ranges of pT, mass, and τ21. The small range of τ21 between 0.19 and 0.21 selects both signal and background jets that have a very similar (and significant) 2-prong structure. In other words, this restrictive selection ensures that signal and background jets look almost identical in terms of jet mass and n-subjettiness. As expected, jet mass and τ21 provide almost no discrimination power in Figure 9, whereas both the CNN and MaxOut networks are still able to distinguish between signal and background within this restricted phase space, confirming that they are learning new information beyond mass and τ21.


[Figure 9: 1/(Background Efficiency) versus Signal Efficiency for mass, τ21, ∆R, MaxOut, Convnet, and Random; Pythia 8, √s = 13 TeV, 240 < pT/GeV < 260, 0.19 < τ21 < 0.21, 79 < mass/GeV < 81.]

Figure 9. ROC curves in a highly restrictive phase space of 79 GeV < m < 81 GeV and 0.19 < τ21 < 0.21.

Figure 10 shows the deep correlation jet-images obtained in three small windows of τ21. These images provide spatial information indicating where in the image the networks are finding discriminating features beyond mass and n-subjettiness. A prominent feature in these images is the red region in between the two leading-pT subjets for small and moderate τ21 values, related to the color-flow pattern differences between W and QCD jets. Large values of τ21, corresponding to jets that exhibit a strong one-prong structure, also show a particular radiation pattern not exploited by the two state-of-the-art jet substructure variables considered. These images can suggest the development of novel jet substructure variables able to capture these differences between signal and background.

[Figure 10: three deep correlation jet-images in [Translated] Pseudorapidity (η) versus [Translated] Azimuthal Angle (φ), Pearson Correlation Coefficient from −0.6 to 0.6; W' → WZ, Pythia 8, √s = 13 TeV, 240 < pT/GeV < 260, 79 < mass/GeV < 81, with 0.19 < τ21 < 0.21 (a), 0.39 < τ21 < 0.41 (b), and 0.59 < τ21 < 0.61 (c).]

Figure 10. Deep correlation jet-image in a highly restrictive phase space of 79 GeV < m < 81 GeV and 0.19 < τ21 < 0.21 (a), 0.39 < τ21 < 0.41 (b), and 0.59 < τ21 < 0.61 (c).


8. Conclusions
This paper presents a new paradigm for studying jets at the LHC. Representing jets as images enables the application of advanced computer vision and deep learning methods to jet classification, adding a new capability to understand LHC events and to design new, more powerful classification methods. Jet-image deep neural network classifiers significantly outperform state-of-the-art W boson jet classification methods, as they are able to learn new discriminating features not captured by the physics-motivated variables. Studies suggest that this additional information is related to the different color-flow patterns of W and QCD jets. While the DNN classifiers studied in this paper are very powerful, it was shown that they do not exploit all the information available for discrimination. Jet mass, in particular, is not fully learned by any of the DNN architectures considered. In addition to improved classification performance, DNN jet-image analysis provides new ways to visualize the information learned by the DNNs and to connect these visualizations with the physics of jets. The ideas presented in this paper have broad applicability and could be used for many different analyses where events can be represented as images, both within and beyond high energy physics.

9. Acknowledgements
This work was supported by the Stanford Data Science Initiative and by the US Department of Energy (DOE) grant DE-AC02-76SF00515.

10. References
[1] Cacciari M, Salam G P and Soyez G 2008 JHEP 0804 063
[2] ATLAS Collaboration 2016 JINST 11 P04008
[3] CMS Collaboration 2016 JINST 11 P01019
[4] CMS Collaboration 2015 JINST 10 P08010
[5] ATLAS Collaboration 2015 ATL-PHYS-PUB-2015-022
[6] Russakovsky O et al. 2015 arXiv:1409.0575 [cs.CV]
[7] LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W et al. 1989 Neural Comput. 1 541-551
[8] Salam G P 2010 Eur. Phys. J. C 67 637-686
[9] Thaler J and Van Tilburg K 2011 JHEP 1103 015
[10] Abdesselam A et al. 2011 Eur. Phys. J. C 71 1661
[11] Altheimer A et al. 2012 J. Phys. G 39 063001
[12] Altheimer A et al. 2014 Eur. Phys. J. C 74 2792
[13] Sjostrand T, Mrenna S and Skands P Z 2008 Comput. Phys. Commun. 178 852-867
[14] Cogan J, Kagan M, Strauss E and Schwartzman A 2015 JHEP 02 118
[15] Almeida L G, Backovic M, Cliche M, Lee S J and Perelstein M 2015 arXiv:1501.05968 [hep-ph]
[16] Thomson M A 1996 Nucl. Instrum. Meth. A 382 553-560
[17] de Oliveira L, Kagan M, Mackey L, Nachman B and Schwartzman A 2015 arXiv:1511.05190 [hep-ph]