AI-driven analysis and image acquisition of in-vivo neuronal network activity

Xingying Chen, Bastian Krämer, Michal Klein, René Romen

Department of Mathematics, Technical University of Munich

August 5, 2019

Table of Contents

1 Introduction

2 Data

3 Networks

4 Results

5 Discussion

1 Introduction

Zebrafish Larvae

Figure 1: Zebrafish[5]


Light Field Microscope

Figure 2: (a) A microscope with a microlens array[3]

Figure 3: Microlens array[1]


Advantages of Light Field Microscopy Images

Figure 4: Comparison of images at different focal planes[3]


Light Field Microscope

Figure 5: (b), (c) The red point represents a point source generating illumination. The red regions on top show the intensity of the illumination arriving at the sensor plane[3].


Light Field Microscope

Figure 6: Example light field of a sphere[3]


Light Field Transformations

[Diagram: lenslet image composed of 43 × 43 lenslet sub-images, each sub-image 23 × 23 pixels]

- Spatial resolution: 43 × 43
- Angular resolution: 23 × 23


Reordering Light Field

Figures 7-9: Step-by-step reordering of the lenslet image into a views image[6]


Light Field Transformations

[Diagram: views image composed of 23 × 23 views, each view 43 × 43 pixels]

- Spatial resolution: 43 × 43
- Angular resolution: 23 × 23


Light Field Image

[Diagram: the same light field arranged as a lenslet image (43 × 43 sub-images of 23 × 23 pixels each) and as a views image (23 × 23 views of 43 × 43 pixels each)]
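A minimal NumPy sketch of the lenslet-to-views reordering shown in Figures 7-9, assuming 43 × 43 lenslet sub-images of 23 × 23 pixels, tiled without gaps on the sensor (the array layout is our assumption, not the authors' code):

```python
import numpy as np

S, A = 43, 23  # spatial (lenslets per side), angular (pixels per lenslet side)

def lenslet_to_views(lenslet_img: np.ndarray) -> np.ndarray:
    """Rearrange an (S*A, S*A) lenslet image into an (A*S, A*S) views image."""
    # Split into (lenslet row, pixel row, lenslet col, pixel col) ...
    lf = lenslet_img.reshape(S, A, S, A)
    # ... then group by angular coordinate: (pixel row, lenslet row, pixel col, lenslet col).
    views = lf.transpose(1, 0, 3, 2)
    return views.reshape(A * S, A * S)

raw = np.random.rand(S * A, S * A)  # stand-in for a real lenslet capture
views_img = lenslet_to_views(raw)
assert views_img.shape == (A * S, A * S)
```

Each 43 × 43 block of the result is one view, i.e. the image seen from one angular position.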


Goal

[Figure sequence: estimating a depth map from a single light field microscopy image]

2 Data

Datasets - Real Images

- Fish eye (lenslet image, views image)

Figure 10: Fish eye


Datasets - Real Images

- Spheres (lenslet image, views image)

Figure 11: Spheres

Dataset - Training Data

- Created our own dataset containing geometric objects

Figure 12: Scene

Ground Truth Depth Map

Figure 13: Resulting depth map


Dataset

Figure 14: Resulting lenslet and views image

- 369 images


3 Networks

Overview of Networks

- Input: views image
  - Epinet
  - Views network
- Input: lenslet image
  - Lenslet network
  - Lenslet classification network


Epinet

Figure 15: Epinet architecture [9]

- Multi-stream network


Epinet

- 2.2 million parameters
- Multi-stream convolutional encoder
- Upscaling: nearest-neighbor interpolation
- Optimizer: RMSProp (no learning rate decay)
- Regularization: none
- Activation function: Leaky ReLU
- Learning rate: 10^-4
- Error: mean squared error
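For reference, EPINET [9] builds its multi-stream input from four directional view stacks (horizontal, vertical, and the two diagonals). A small NumPy sketch of slicing such stacks out of our 23 × 23 views; the (v, u, y, x) array layout and variable names are our assumptions:

```python
import numpy as np

views = np.zeros((23, 23, 43, 43))   # light field as (v, u, y, x) views
c = 23 // 2                          # index of the central view
idx = np.arange(23)

horizontal = views[c, :]             # 23 views along the u axis
vertical   = views[:, c]             # 23 views along the v axis
diag_main  = views[idx, idx]         # views with u == v
diag_anti  = views[idx, idx[::-1]]   # views with u == 22 - v

# Stack each direction as channels, e.g. as a Conv2D input of shape (43, 43, 23).
streams = [np.moveaxis(s, 0, -1)
           for s in (horizontal, vertical, diag_main, diag_anti)]
```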


Views Network

[Architecture diagram: 43 × 43 × 529 stacked-views input; four convolution layers (256, 256, 256, 128 channels); nearest-neighbor upscaling; three convolution layers (64, 32, 1 channels) producing a single-channel depth map]

Figure 16: Left to right: input, 4 convolution layers, upscaling by nearest-neighbor interpolation, 3 convolution layers.


Views Network

- 4.1 million parameters
- Convolutional encoder
- Upscaling: nearest-neighbor interpolation
- Optimizer: Adam (no learning rate decay)
- Regularization: none
- Activation function: Leaky ReLU
- Learning rate: 10^-6
- Error: absolute error
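A minimal Keras sketch of this network. The input shape and layer counts follow Figure 16; the channel counts are read off the extracted diagram, and kernel sizes and the exact upscaling factor are guesses, so the parameter count will not match the reported 4.1 million:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_views_network():
    # 23 x 23 views stacked as 529 channels over 43 x 43 pixels.
    inputs = layers.Input(shape=(43, 43, 529))
    x = inputs
    for filters in (256, 256, 256, 128):        # four convolution layers
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.LeakyReLU()(x)
    # Upscaling by nearest-neighbor interpolation (43 -> 258, cropped to 256).
    x = layers.UpSampling2D(size=6, interpolation="nearest")(x)
    x = layers.Cropping2D(1)(x)
    for filters in (64, 32, 1):                 # three convolution layers
        x = layers.Conv2D(filters, 3, padding="same")(x)
        if filters != 1:
            x = layers.LeakyReLU()(x)
    return models.Model(inputs, x)

model = build_views_network()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
              loss="mean_absolute_error")       # Adam, lr 10^-6, absolute error
```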


Lenslet Network

Figure 17: General structure of a convolutional encoder-decoder architecture. We use Inception-ResNet-v2 [10] blocks in the encoder.


Lenslet Network

- 7.6 million parameters
- Inception-ResNet-v2 block encoder
- Upscaling: fast up-convolution (×3)
- Optimizer: Adam (exponentially decaying learning rate (400 examples))
- Regularization: L2 (scale: 5 × 10^-6)
- Activation function: Leaky ReLU (after each block)
- Learning rate: 5 × 10^-6
- Error: mean squared error
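As a rough illustration of one encoder block, a simplified Inception-ResNet-style residual block in Keras; branch widths and kernel sizes here are stand-ins, not the actual Inception-ResNet-v2 [10] configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_resnet_block(x, filters=64, l2_scale=5e-6):
    reg = tf.keras.regularizers.l2(l2_scale)
    conv = lambda t, f, k: layers.Conv2D(f, k, padding="same",
                                         kernel_regularizer=reg)(t)
    # Parallel branches with different receptive fields.
    b1 = conv(x, filters, 1)
    b2 = conv(conv(x, filters, 1), filters, 3)
    b3 = conv(conv(conv(x, filters, 1), filters, 3), filters, 3)
    mixed = layers.Concatenate()([b1, b2, b3])
    # Project back to the input width and add the residual connection.
    up = conv(mixed, x.shape[-1], 1)
    out = layers.Add()([x, up])
    return layers.LeakyReLU()(out)   # Leaky ReLU after each block, per the slide
```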


Lenslet Classification Network

Figure 18: General structure of a convolutional encoder-decoder architecture. We use Inception-ResNet-v2 [10] blocks in the encoder.


Lenslet Classification Network

- 5 million parameters
- Inception-ResNet-v2 encoder
- Upscaling: fast up-convolution (×3)
- Optimizer: Adam (exponentially decaying learning rate (400 examples))
- Regularization: L2 (scale: 5 × 10^-6)
- Activation function: Leaky ReLU (after each block)
- Learning rate: 5 × 10^-6
- Error: mean absolute error
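The slides do not spell out how continuous depth values are turned into classes for this network; a common choice, shown here purely as an assumption, is to discretize the depth range into uniform bins:

```python
import numpy as np

K = 64                                  # number of depth classes (assumed)
depth = np.random.rand(43, 43)          # ground-truth depth map in [0, 1]

labels = np.minimum((depth * K).astype(int), K - 1)   # per-pixel class index
recovered = (labels + 0.5) / K          # bin centre, back on the depth scale

# Quantization error is bounded by half a bin width.
assert np.abs(recovered - depth).max() <= 0.5 / K + 1e-12
```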

4 Results

Quantitative Results

Network       | MSE        | BPR (δ = 0.15) | Input         | #params (millions)
Epinet        | 0.0049958  | 0.04304        | Stacked views | 2.2
Views         | 0.0122427  | 0.12022        | Stacked views | 4.1
Lenslet       | 0.00595414 | 0.05586        | Lenslet       | 7.6
Lenslet (cls) | 26.9925    | 0.99521        | Lenslet       | 5


Qualitative Results - Test Data

[Panels: Ground Truth, Epinet, Views, Lenslet, Lenslet (cls); shared color scale from 0.0 to 1.0]

Figure 19: Depth prediction of an image from the test set after 300 epochs.


Qualitative Results - Fish Eye

[Panels: Epinet, Views, Lenslet, Lenslet (cls); shared color scale from 0.0 to 1.0]

Figure 20: Depth prediction of the fish eye after 300 epochs.


Qualitative Results - Spheres

[Panels: Epinet, Views, Lenslet, Lenslet (cls); shared color scale from 0.0 to 1.0]

Figure 21: Depth prediction of the spheres after 300 epochs.


Discussion - Real Data

- Images of the fish eye and the spheres were taken with different configurations
- Networks that perform better on the test set also pick up more noise
- No ground truth to compare the results against


5 Discussion

Discussion - Generated Dataset

- Simulation needed to overcome the lack of data
- A general problem when using deep networks on light field data
- Geometric objects are not biologically plausible
- Real data contains noise
- Realistic simulation with an unrealistic scene


Discussion - Light Field Data

- Typical light field resolutions in research papers:
  - Spatial resolution of 512 × 512
  - Angular resolution of 3 × 3, 5 × 5 or 9 × 9
  - e.g. see EPINET [9], VommaNet [8]
- Light field microscopy data in our project:
  - Spatial resolution of 43 × 43
  - Angular resolution of 23 × 23


Discussion - Light Field Data

- Depth predictions with a resolution of 43 × 43 are too small
- Upsampling of the spatial resolution to 256 × 256
- Differences in baseline and depth range
- Architectures proposed in research papers were not directly suited to our data


Outlook

- Realistic scene generation
- Networks that are invariant to camera configurations
- Light field reconstruction using depth prediction


Summary

1 Introduction

2 Data

3 Networks

4 Results

5 Discussion


Thank you for your attention

Questions?


Bibliography

[1] https://www.rpcphotonics.com/product/mla-s100-f12/.

[2] Nearest neighbour interpolation.

[3] M. Broxton, L. Grosenick, S. Yang, N. Cohen, A. Andalman, K. Deisseroth, and M. Levoy. Wave optics theory and 3-D deconvolution for the light field microscope. Optics Express, 21(21):25418-25439, 2013.

[4] X. Duan, X. Ye, Y. Li, and H. Li. High quality depth estimation from monocular images based on depth prediction and enhancement sub-networks. In 2018 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6. IEEE, 2018.


[5] L. D. Ellis, E. C. Soo, J. C. Achenbach, M. G. Morash, and K. H. Soanes. Use of the zebrafish larvae as a model to study cigarette smoke condensate toxicity. PLoS One, 9(12):e115305, 2014.

[6] C. Hahne, A. Aggoun, V. Velisavljevic, S. Fiebig, and M. Pesch. Baseline and triangulation geometry in a standard plenoptic camera. International Journal of Computer Vision, 126(1):21-35, Jan 2018.

[7] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab. Deeper depth prediction with fully convolutional residual networks. In 2016 Fourth International Conference on 3D Vision (3DV), pages 239-248. IEEE, 2016.

[8] H. Ma, H. Li, Z. Qian, S. Shi, and T. Mu. VommaNet: an end-to-end network for disparity estimation from reflective and texture-less light field images. arXiv preprint arXiv:1811.07124, 2018.


[9] C. Shin, H.-G. Jeon, Y. Yoon, I. So Kweon, and S. Joo Kim. EPINET: A fully-convolutional neural network using epipolar geometry for depth from light field images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4748-4757, 2018.

[10] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.


Backup Slides


Depth Estimate

Figure 22: Example of a depth estimate [4].


Bad Pixel Ratio

BPR(x, x̂; δ) = (1/NM) Σ_{i=1}^{N} Σ_{j=1}^{M} 1[δ < |x_{i,j} − x̂_{i,j}|]

where 1[·] is the indicator function and N × M is the resolution of the depth map.
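A NumPy equivalent of this metric (a sketch; the function and variable names are ours):

```python
import numpy as np

def bad_pixel_ratio(x: np.ndarray, x_hat: np.ndarray, delta: float = 0.15) -> float:
    """Fraction of pixels whose absolute depth error exceeds delta."""
    return float(np.mean(np.abs(x - x_hat) > delta))

# The BPR column of the results table uses delta = 0.15 on depth maps
# scaled to [0, 1] (the color scale shown in the result figures).
gt = np.random.rand(43, 43)
pred = gt + 0.1 * np.random.randn(43, 43)
print(bad_pixel_ratio(gt, pred))
```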


Upsampling Strategies

Summary of the up-sampling blocks of Laina et al. [7] (3DV 2016), whose figures are reproduced below:

Up-projections extend simple up-convolutions into up-sampling res-blocks: a 3 × 3 convolution is introduced after the up-convolution, and a projection connection is added from the lower-resolution feature map to the result. Because of the different sizes, the small map must be up-sampled by another up-convolution in the projection branch; since the unpooling only needs to be applied once for both branches, the 5 × 5 convolutions are applied separately on the two branches. Chaining up-projection blocks lets high-level information pass forward more efficiently while progressively increasing feature map sizes.

Fast up-convolutions reformulate the up-convolution operation to be more efficient, decreasing the training time of the whole network by around 15%. After unpooling, 75% of the resulting feature map is zero, so the following 5 × 5 convolution mostly operates on zeros, which can be avoided: depending on the position of the 5 × 5 filter on the unpooled map, only certain weights meet potentially non-zero values, and these weights fall into four non-overlapping groups. The original 5 × 5 filter is therefore rearranged into four new filters of sizes (A) 3 × 3, (B) 3 × 2, (C) 2 × 3 and (D) 2 × 2; interleaving the four resulting feature maps yields exactly the same output as unpooling followed by convolution.

Figure 23: Different up-sampling strategies [7]

Figure 24: Faster up-convolution [7]

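A small NumPy check (our own, not the paper's code) that the four-filter decomposition reproduces unpooling followed by a 5 × 5 convolution exactly:

```python
import numpy as np

def corr(img, ker, cy, cx):
    """Cross-correlation with zero padding; (cy, cx) is the kernel anchor."""
    H, W = img.shape
    kh, kw = ker.shape
    padded = np.pad(img, ((cy, kh - 1 - cy), (cx, kw - 1 - cx)))
    out = np.zeros((H, W))
    for u in range(kh):
        for v in range(kw):
            out += ker[u, v] * padded[u:u + H, v:v + W]
    return out

rng = np.random.default_rng(0)
F = rng.standard_normal((6, 6))   # low-resolution feature map
K = rng.standard_normal((5, 5))   # 5 x 5 filter, anchored at its centre (2, 2)

# Naive path: unpool (interleave with zeros), then apply the 5 x 5 filter.
U = np.zeros((12, 12))
U[::2, ::2] = F
naive = corr(U, K, 2, 2)

# Fast path: the four parity groups of filter taps (A: 3x3, B: 3x2, C: 2x3,
# D: 2x2) are applied directly to F and the results are interleaved.
fast = np.zeros_like(U)
fast[0::2, 0::2] = corr(F, K[0::2, 0::2], 1, 1)  # A: even rows, even cols
fast[0::2, 1::2] = corr(F, K[0::2, 1::2], 1, 0)  # B: even rows, odd cols
fast[1::2, 0::2] = corr(F, K[1::2, 0::2], 0, 1)  # C: odd rows, even cols
fast[1::2, 1::2] = corr(F, K[1::2, 1::2], 0, 0)  # D: odd rows, odd cols

assert np.allclose(naive, fast)   # identical outputs, no zero multiplications
```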


Upsampling Strategies

- Deconvolution can cause checkerboard artifacts

Figure 25: Nearest neighbor upsampling[2]


Lenslet Network - Fish Eye - Fewer Epochs

Figure 26: Fish eye prediction (best epoch)

Figure 27: Fish eye prediction (last epoch)


Lenslet Network - Spheres - Fewer Epochs

Figure 28: Spheres prediction (best epoch)

Figure 29: Spheres prediction (last epoch)


Lenslet Classification Network - Fewer Epochs

Figure 30: Ground truth.

Figure 31: Predicted depth map.
