
Multi-view and multi-resolution real-time digital holographic microscopy

Tomoyoshi Shimobaba, Nobuyuki Masuda and Tomoyoshi Ito

Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan

DHM (Digital holographic microscopy) is attractive as a new microscopy technique because it allows both the amplitude and phase of a specimen to be observed simultaneously. This chapter describes a real-time digital holographic microscope that can simultaneously produce multiple reconstructed images with arbitrary resolutions, depths and positions, using the Shifted-Fresnel diffraction instead of the Fresnel diffraction. In this system, we used four GPUs (Graphics Processing Units) for multiple reconstructions in real time.

Keywords: holography; digital holographic microscopy; graphics processing unit; DHM; GPU

1. Introduction

DHM (Digital holographic microscopy) is a well-known, powerful method that allows both the amplitude and phase of a specimen to be observed simultaneously [1][2]. The technique obtains a hologram in which the information of a specimen is electronically recorded, using a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) image sensor. In order to obtain a reconstructed image from a hologram, a large number of calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm [3].

However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction with the FFT algorithm. For example, if we obtain a reconstructed image from a hologram of 512×512 grids using an Intel Core2Duo E6300 CPU, the Fresnel diffraction calculation takes about 1 second.

In order to obtain greater computational speed for the Fresnel diffraction, using a hardware accelerator is an effective approach. For example, a research group developed an FPGA (Field Programmable Gate Array)-based board, FFT-HORN, to accelerate the Fresnel diffraction in DHPIV (Digital Holographic Particle Image Velocimetry) [4][5]. Their latest machine, FFT-HORN2, can reconstruct images from 1,024×1,024-grid holograms captured by a DHPIV optical system in about 33 milliseconds. The FPGA approach achieves excellent computational speed; however, it has the following drawbacks: the high cost of developing the FPGA board, the long development time, and the technical know-how needed for the FPGA technology.

On the other hand, recent GPUs (Graphics Processing Units) with many stream processors are available as highly parallel processors. A stream processor is a simple scalar processor that can execute 32-bit or 64-bit floating-point addition, multiplication and multiply-add instructions. The approach of accelerating numerical calculations using a GPU chip is referred to as “GPGPU (General-Purpose computation on GPU)” or “GPU computing”. The merits of GPGPU are high computational power, the low cost of a GPU board, and short development time.

In the optics field, several studies using the GPGPU technique for fast calculation of CGHs (Computer-Generated Holograms) have been reported [6][7]. A well-known problem in CGH is the enormous calculation cost of generating a CGH from three-dimensional (3D) object data. These studies addressed this problem and can generate a CGH from a simple 3D object in real time.

In the first half of the chapter, we describe a real-time DHM system using the GPGPU technique [8]. The GPU calculates the Fresnel diffraction considerably faster than recent CPUs.

In order to obtain a reconstructed image from a hologram, the Fresnel diffraction calculation is required; however, the Fresnel diffraction has a restriction on the sampling spacing: in the convolution form, the same sampling spacing must be set on the hologram and the reconstructed image, while in the Fourier-transform form, the sampling spacing on the reconstructed plane depends on the propagation distance and the wavelength of the reference light [2][3]. Therefore, we cannot observe the reconstructed image with arbitrary resolution, due to this restriction on the sampling spacing. In addition, in current DHM, an objective lens is used to increase the resolution of the reconstructed image; however, using the objective lens sacrifices part of the viewing area of the reconstructed image.

In the latter half of the chapter, we describe a DHM observable in multi-view and multi-resolution [9], without using an objective lens. This DHM can obtain multiple reconstructed images with arbitrary resolutions, depths and positions by using the Shifted-Fresnel diffraction [10] instead of the Fresnel diffraction.

Microscopy: Science, Technology, Applications and Education A. Méndez-Vilas and J. Díaz (Eds.)

©FORMATEX 2010 1419


The Shifted-Fresnel diffraction, which is based on the Fresnel diffraction, can calculate a reconstructed image with different sampling spacings on the hologram and the reconstructed image, as well as with a shift away from the propagation axis. In addition, we used four GPU chips in order to observe four reconstructed images in real time from one hologram.

2. Real-time digital holographic microscopy using GPU

Figure 1 shows the set-up for our real-time DHM system. The system mainly consists of two parts: the optical system

for recording a hologram and the real-time calculation system using a GPU.

The optical system is a traditional DHM set-up. As shown in the figure, we used a 5-mW He-Ne laser (wavelength 632.8 nm) as the reference light. “BS” and “ND” indicate a beam splitter and a neutral density filter; “M” and “MO” indicate a mirror and an objective lens. We used a CCD camera made by ARTRAY, which has a resolution of 1,360×1,024 pixels and a pixel pitch of 4.65 µm × 4.65 µm. We used a USAF 1951 test target as a sample. The captured holograms are transferred to a PC (Personal Computer) via the USB 2.0 interface. The PC controls the GPU and the CCD camera.

The GPU, a “GeForce 8800 GTS” made by NVIDIA, has 96 stream processors running at a clock frequency of 1.2 GHz, and a 320-bit memory bus with a memory clock frequency of 1.6 GHz. The GPU can calculate the Fresnel diffraction at high speed, allowing us to obtain reconstructed images from holograms at about 24 frames per second.

2.1 Fresnel diffraction

Here, we briefly describe the Fresnel diffraction [2][3]. The Fresnel diffraction is expressed as:

u(x, y) = \frac{\exp(i 2\pi d/\lambda)}{i\lambda d} \iint a(\xi, \eta) \exp\left( \frac{i\pi}{\lambda d} \left[ (x - \xi)^2 + (y - \eta)^2 \right] \right) d\xi \, d\eta

= \frac{\exp(i 2\pi d/\lambda)}{i\lambda d} \left[ a(x, y) * \exp\left( \frac{i\pi}{\lambda d} (x^2 + y^2) \right) \right]

= \frac{\exp(i 2\pi d/\lambda)}{i\lambda d} F^{-1}\left[ F[a(x, y)] \times F\left[ \exp\left( \frac{i\pi}{\lambda d} (x^2 + y^2) \right) \right] \right]   (1)

where (x, y) and (ξ, η) are the coordinates on the reconstruction plane u(x, y) and on the hologram a(ξ, η) captured by the CCD, respectively, λ is the wavelength of the reference light, d is the distance from the hologram to the reconstruction plane, and * denotes convolution. The operators F[·] and F⁻¹[·] indicate the forward and inverse Fourier transforms, respectively. If we calculate the Fresnel diffraction on a computer or a GPU, we must discretize Eq. (1) and then use the FFT algorithm.
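As a minimal CPU sketch (not the GPU implementation described in this chapter), the discretized convolution form of Eq. (1) can be written in NumPy. The function name, the propagation distance, and the zero-padding to double size (described later, in Sec. 2.3) are our illustrative choices; the constant factor exp(i2πd/λ)/(iλd) is dropped because only the intensity is displayed:

```python
import numpy as np

def fresnel_reconstruct(hologram, wavelength, pitch, distance):
    # Discretized Eq. (1): u = IFFT[ FFT[a] * FFT[h] ], with the Fresnel
    # impulse response h = exp(i*pi*(x^2 + y^2)/(lambda*d)).  The constant
    # factor exp(i*2*pi*d/lambda)/(i*lambda*d) is omitted because it does
    # not affect the displayed intensity |u|^2.
    n = hologram.shape[0]                  # assume a square hologram
    m = 2 * n                              # doubled area avoids circular convolution
    a = np.zeros((m, m), dtype=complex)
    a[:n, :n] = hologram                   # zero-padded hologram
    x = (np.arange(m) - m // 2) * pitch    # centered sample coordinates
    xx, yy = np.meshgrid(x, x)
    h = np.exp(1j * np.pi * (xx ** 2 + yy ** 2) / (wavelength * distance))
    h = np.fft.ifftshift(h)                # put the h(0, 0) sample at index [0, 0]
    u = np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(h))
    return u[:n, :n]                       # crop back to the original grid

# Illustrative use with the chapter's parameters (the 0.1 m distance is ours)
holo = np.random.rand(512, 512)
img = np.abs(fresnel_reconstruct(holo, 632.8e-9, 4.65e-6, 0.1)) ** 2
```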

Fig. 1 The set-up for our real-time DHM system. The system mainly consists of two parts: the optical system for recording a hologram (He-Ne laser, beam splitters (BS), mirror (M), neutral density filters (ND), collimator, objective lens (MO), sample and CCD) and the real-time calculation system using a GPU with CUDA (PC, USB interface and monitor), which displays reconstructed images at 24 frames/sec.


2.2 Graphics processing unit (GPU)

GPU chips before the appearance of the unified shader architecture were designed mainly for accelerating 3D computer graphics; therefore, they were difficult to use for general-purpose computing. However, recent GPU chips with the unified shader architecture are designed for general-purpose computing as well as for accelerating 3D computer graphics. Figure 2 shows the outline of a GPU chip with the unified shader architecture.

As shown in the figure, a GPU chip contains several multiprocessors. As the GPU in Fig. 1, we used a GPU board made by GALAXY Technology, which mounts an NVIDIA “GeForce 8800 GTS” chip with 12 multiprocessors.

Each multiprocessor has eight stream processors (SPs). One SP can execute 32-bit floating-point addition, multiplication and multiply-add instructions. The GPU chip therefore has 96 SPs, all running at 1.2 GHz in parallel. The eight SPs in a multiprocessor execute the same instruction in parallel on different data; that is, a multiprocessor is a SIMD (Single Instruction Multiple Data)-like processor. Different multiprocessors can execute the same processing or different processing.

The GPU chip has a peak performance of 2 operations/SP × 96 SPs × 1.2 GHz = 230.4 Gflops (floating-point operations per second). Thus, we can use the GPU chip as a highly parallel processor.
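The peak-throughput figure follows from counting a multiply-add as two floating-point operations per stream processor per clock cycle:

```python
# Peak single-precision throughput of the GeForce 8800 GTS as computed in the
# text: a multiply-add counts as 2 floating-point operations per SP per cycle.
ops_per_sp = 2        # multiply-add = 2 flops
num_sps = 96          # 12 multiprocessors x 8 SPs
clock_ghz = 1.2       # SP clock frequency
peak_gflops = ops_per_sp * num_sps * clock_ghz   # about 230.4 Gflops
```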

The host computer controls the GPU board and communicates with it via the PCI Express bus. The host computer can directly access the 640 Mbytes of device memory on the GPU board. The device memory is used for storing input data and the results computed by the GPU chip. Each multiprocessor also has a 16-Kbyte shared memory, which has lower latency and is faster than the device memory. We need to use the device memory and the shared memory appropriately.

We use the CUDA (Compute Unified Device Architecture) compiler as the programming environment for the GPU chip. The CUDA compiler compiles C-like source code into the instruction set of the GPU chip; such a GPU program is referred to as a “kernel”. A kernel is downloaded to the GPU chip via the PCI Express bus.

The Fresnel diffraction described in the previous section is accelerated by the FFT algorithm. The CUDA compiler

allows us to accelerate the FFT algorithm using the GPU chip: the CUFFT library [11] is the FFT library on the GPU

chip, which is similar to the FFTW library [12].

Figure 3 shows how to use CUFFT. The function “cufftPlan2d” prepares an FFT on the GPU; in the figure, it prepares a two-dimensional 1,024×1,024 FFT whose input and output data are complex-valued. The function “cufftExecC2C” calculates the FFT on the GPU. The macro “CUFFT_FORWARD” selects the forward FFT; if we specify “CUFFT_INVERSE” instead, “cufftExecC2C” calculates the inverse FFT on the GPU. The function “cufftDestroy” releases the resources for the FFT operation on the GPU.

void fftExecute(cufftComplex *in, cufftComplex *out)
{
    cufftHandle plan;                            /* FFT plan handle */
    cufftPlan2d(&plan, 1024, 1024, CUFFT_C2C);   /* prepare a 1,024x1,024 complex-to-complex FFT */
    cufftExecC2C(plan, in, out, CUFFT_FORWARD);  /* execute the forward FFT on the GPU */
    cufftDestroy(plan);                          /* release the FFT resources */
}

Fig. 2 Outline of a GPU chip with the unified shader architecture. The host computer is connected via the PCI Express bus to the GPU board, which carries the GPU chip and the device memory. The GPU chip has several multiprocessors; each multiprocessor has eight stream processors (SPs) and a shared memory. One SP can execute 32-bit floating-point addition, multiplication and multiply-add instructions.

Fig. 3 How to use the CUFFT library.


2.3 Performance

The following table shows a comparison of the reconstruction rate between the CPU alone and the GPU. The unit of the reconstruction rate is frames per second (fps). The reconstruction rate covers the time from capturing a hologram to displaying a reconstructed image. The “CPU” in the table is an Intel Core2Duo E6300 with 2 Gbytes of memory, running Microsoft Windows XP Professional SP2.

The hologram and the reconstructed image are 512×512 grids; however, in order to avoid the circular convolution of Eq. (1), we expand the calculation area to double size (1,024×1,024 grids) during the calculation. At this size, the GPU can obtain reconstructed images about 20 times faster than the CPU. For this hologram size, single-precision floating-point arithmetic on the GPU chip is sufficient to obtain a reconstructed image from a hologram.

                              CPU (Intel Core2Duo E6300)    GPU (GeForce 8800 GTS)
  Reconstruction rate (fps)              1.24                       24.91

3. Multi-view and multi-resolution real-time digital holographic microscopy

The Fresnel diffraction calculation is required in order to reconstruct images from a hologram; however, the Fresnel diffraction has a restriction on the sampling spacing: in the convolution form, the same sampling spacing must be set on the hologram and the reconstructed image, while in the Fourier-transform form, the sampling spacing on the reconstructed plane depends on the propagation distance and the wavelength of the reference light. Therefore, we cannot observe the reconstructed image with arbitrary resolution, due to this restriction on the sampling spacing. In addition, in current DHM, an objective lens is used to increase the resolution of the reconstructed image; however, using the objective lens sacrifices part of the viewing area of the reconstructed image.

In this section, we describe a multi-view and multi-resolution real-time DHM [9] that does not use an objective lens. This DHM can obtain multiple reconstructed images with arbitrary resolutions, depths and positions by using the Shifted-Fresnel diffraction [10] instead of the Fresnel diffraction.

The Shifted-Fresnel diffraction, which is based on the Fresnel diffraction, can calculate a reconstructed image with different sampling spacings on the hologram and the reconstructed image, as well as with a shift away from the propagation axis. In addition, we used four GPU chips in order to observe four reconstructed images in real time from one hologram.

The concept of the proposed DHM is shown in Fig. 4. In this DHM, we can simultaneously observe reconstructed images at different depths along the depth direction (Fig. 4 (a) and (b)). In addition, while observing a wide viewing area of a reconstructed image (Fig. 4 (c)), we can simultaneously observe reconstructed images with user-set arbitrary resolutions at arbitrary positions (Fig. 4 (d) and (e)).

As shown in Fig. 1, an objective lens can increase the resolution of the reconstructed image, but it decreases the viewing area of the reconstructed image. For this reason, the proposed DHM system does not use an objective lens; instead, it uses an arbitrary sampling spacing on the reconstructed image. If we wish to observe a wide viewing area of a reconstructed image, we set a large sampling spacing on the reconstructed plane; conversely, in order to observe the reconstructed image in detail, we set a small sampling spacing on the reconstructed plane.
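To see why a fixed spacing is so limiting, consider the standard sampling relation of the Fourier-transform Fresnel method, Δx₂ = λd/(NΔξ) [2][3]: the reconstructed-plane spacing is dictated entirely by the wavelength, distance and hologram pitch. A quick check with the chapter's CCD parameters (the 0.1 m distance is our illustrative value):

```python
wavelength = 632.8e-9   # He-Ne laser (m)
pitch = 4.65e-6         # CCD pixel pitch (m)
N = 1024                # grid size after resizing the hologram
d = 0.1                 # propagation distance (m); illustrative value

# Sampling spacing on the reconstructed plane for the single-FFT Fresnel
# formula: fixed by d, lambda, N and the hologram pitch, not freely selectable.
delta = wavelength * d / (N * pitch)   # about 1.33e-5 m, i.e. roughly 13.3 um
```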

In order to realize the proposed DHM, the following methods are required:

1. A computational method for obtaining reconstructed images with arbitrary resolution from a hologram

2. A high-speed computational system for reconstructing from a hologram in real-time.

To solve these problems, we used the Shifted-Fresnel diffraction and multiple GPUs. An outline of the DHM system is shown in Fig. 5. The system consists of an optical system without an objective lens and a high-speed computational system using four GPU chips.


Fig. 4 The concept of the proposed DHM. We can simultaneously observe reconstructed images at different depths along the depth direction ((a) and (b)). In addition, while observing a wide viewing area of a reconstructed image ((c)), we can simultaneously observe reconstructed images with user-set arbitrary resolutions at arbitrary positions ((d) and (e)).


In the figure, we used a 5-mW He-Ne laser (wavelength 632.8 nm) as the reference light. We used a CCD camera made by ARTRAY, which has a resolution of 1,360×1,024 pixels and a pixel pitch of 4.65 µm × 4.65 µm. In the reconstruction calculation from a hologram captured by the CCD, we resize the hologram to 1,024×1,024 grids in order to use the FFT for the calculation. We used three samples: a USAF 1951 test target, the head of a mosquito, and a fly. Holograms captured by the CCD are recorded as in-line Gabor holograms and are transferred to a personal computer via the USB 2.0 interface. Then, four GPU chips calculate four reconstructed images from one hologram in real time. The GPU board, a “GeForce GTX 295” made by NVIDIA, carries two GPU chips; the board has 240×2 stream processors running at a clock frequency of 1.2 GHz and a 480-bit×2 memory bus with a memory clock frequency of about 2.0 GHz.

3.1 Shifted-Fresnel diffraction

Recently, a new diffraction calculation, the Shifted-Fresnel diffraction, has been proposed [10]. The method enables arbitrary sampling spacings to be set on the hologram and the reconstructed plane, as well as a shift away from the propagation axis. Other methods capable of changing the sampling spacing have also been studied [13][14]; we chose the Shifted-Fresnel diffraction because it allows a shift away from the propagation axis. The Shifted-Fresnel diffraction is expressed by the following equations:

u(m_2, n_2) = C_s \, \mathrm{FFT}^{-1}\big[ \mathrm{FFT}[A(m_1, n_1)] \times \mathrm{FFT}[B(m_1, n_1)] \big].   (2)

The operators FFT[·] and FFT⁻¹[·] indicate the forward FFT and the inverse FFT, respectively. The coefficient C_s and the functions A(m_1, n_1) and B(m_1, n_1) are as follows:

C_s = \frac{\exp(ikz)}{i\lambda z} \exp\left( \frac{i\pi}{\lambda z} (x_2^2 + y_2^2) \right) \exp\left( -\frac{i 2\pi}{\lambda z} (x_{o1} \Delta x_2 m_2 + y_{o1} \Delta y_2 n_2) \right)   (3)

A(m_1, n_1) = a(m_1, n_1) \exp\left( \frac{i\pi}{\lambda z} (x_1^2 + y_1^2) \right) \exp\left( -\frac{i 2\pi}{\lambda z} (x_1 x_{o2} + y_1 y_{o2}) \right) \exp\left( -i\pi (S_x m_1^2 + S_y n_1^2) \right)   (4)

B(m_1, n_1) = \exp\left( i\pi (S_x m_1^2 + S_y n_1^2) \right)   (5)

where (m_1, n_1) and (m_2, n_2) are the discretized coordinates on the hologram and on the reconstructed image, a(m_1, n_1) and u(m_2, n_2) are the hologram and the reconstructed image, \Delta x_1 and \Delta y_1 are the sampling spacings on the hologram, \Delta x_2 and \Delta y_2 are the sampling spacings on the reconstructed image, and (x_{o1}, y_{o1}) and (x_{o2}, y_{o2}) are the shift distances away from the propagation axis. In addition, we define

S_x = \frac{\Delta x_1 \Delta x_2}{\lambda z},   (6)

S_y = \frac{\Delta y_1 \Delta y_2}{\lambda z},   (7)

x_1 = m_1 \Delta x_1 + x_{o1},   (8)

Fig. 5 An outline of the DHM system: an optical system without an objective lens (He-Ne laser, ND filter, collimator, samples and CCD) and a high-speed computational system using four GPU chips on two GPU boards, controlled from a PC with a mouse. We used a 5-mW He-Ne laser (wavelength 632.8 nm) as the reference light and an ARTRAY CCD camera with a resolution of 1,360×1,024 pixels and a pixel pitch of 4.65 µm × 4.65 µm. The three samples are a USAF 1951 test target, the head of a mosquito, and a fly. Holograms are recorded as in-line Gabor holograms and transferred to the PC via the USB 2.0 interface; four GPU chips (two NVIDIA “GeForce GTX 295” boards) then calculate four reconstructed images from one hologram in real time.


y_1 = n_1 \Delta y_1 + y_{o1},   (9)

x_2 = m_2 \Delta x_2 + x_{o2},   (10)

y_2 = n_2 \Delta y_2 + y_{o2}.   (11)

For more details, see [10]. Finally, we obtain the light intensity of the reconstructed image using the following equation:

I(m_2, n_2) = \left| u(m_2, n_2) \right|^2   (12)

By calculating the above equations, we can obtain a reconstructed image with an arbitrary depth, resolution and shift by changing the parameters z, (\Delta x_2, \Delta y_2) and (x_{o2}, y_{o2}), respectively. Note that if we need only the intensity of the reconstructed image, we can neglect the coefficient C_s.

3.2 Shifted-Fresnel diffraction on multi-GPU

As shown in Eq. (2), the Shifted-Fresnel diffraction can be computed efficiently with two forward FFTs and one inverse FFT; however, recent central processing units (CPUs) do not have sufficient computational power for real-time calculation. Therefore, we use GPUs instead of the CPU.

Multiple reconstructions using the multiple GPUs proceed as follows:

1. The CPU sends a hologram captured by the CCD to the memories on the two GPU boards.

2. Each GPU chip expands the hologram to double size (2,048×2,048 grids) with zero-padding, to avoid the circular convolution.

3. Each GPU chip calculates the complex multiplication of the hologram a(m_1, n_1) and the exponential terms in Eq. (4), producing A(m_1, n_1).

4. Each GPU chip calculates the FFT of the result of Step 3 using the CUFFT library, the fast FFT library on NVIDIA GPUs.

5. Each GPU chip generates the propagation term B(m_1, n_1) in Eq. (5).

6. Each GPU chip calculates the FFT of the propagation term B(m_1, n_1).

7. Each GPU chip calculates the complex multiplication of the results of Steps 4 and 6.

8. Each GPU chip calculates the inverse FFT of the result of Step 7.

9. Each GPU chip reduces the area of the result of Step 8 to 1,024×1,024 grids.

10. Each GPU chip calculates the intensity (Eq. (12)) of the result of Step 9.

11. The CPU receives the four reconstructed images from the memory on each GPU.

We used the OpenMP library [15] to operate the GPU chips, because each GPU must be controlled by its own CPU thread. The GPUs operate in parallel. For real-time reconstruction, we repeat Steps 1 to 11.
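The steps above can be mimicked on a CPU, with Python threads playing the role of the OpenMP threads and NumPy FFTs standing in for the CUDA kernels. The per-view reconstruction below is reduced to a bare skeleton (the chirp and kernel multiplications of Steps 3-7 are omitted), and all parameter values are illustrative:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def reconstruct(hologram, z, pitch2, shift):
    # Placeholder for Steps 2-10 of one view; z, pitch2 and shift mark where
    # the depth, reconstruction pitch and window shift would enter.
    n = hologram.shape[0]
    padded = np.zeros((2 * n, 2 * n), dtype=complex)   # Step 2: zero-padding
    padded[:n, :n] = hologram
    spectrum = np.fft.fft2(padded)                     # Step 4 (chirps of Step 3 omitted)
    u = np.fft.ifft2(spectrum)                         # Steps 6-8 (kernel omitted)
    return np.abs(u[:n, :n]) ** 2                      # Steps 9-10: crop and intensity

# Step 1: one hologram shared by all workers; each worker computes one view
# with its own depth, reconstruction pitch and window shift (values are ours).
holo = np.random.rand(256, 256)
views = [(0.05, 2.0e-6, (0.0, 0.0)), (0.06, 2.0e-6, (1e-4, 0.0)),
         (0.07, 4.65e-6, (0.0, 1e-4)), (0.08, 9.3e-6, (0.0, 0.0))]
with ThreadPoolExecutor(max_workers=4) as pool:        # stands in for OpenMP threads
    images = list(pool.map(lambda v: reconstruct(holo, *v), views))
# Step 11: 'images' now holds the four reconstructed views
```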

3.3 Performance

The hologram and the four reconstructed images are 1,024×1,024 grids; however, in order to avoid the circular convolution of Eq. (2), we expand the calculation area to double size during the calculation. Under the condition that the number of multiple reconstructed images is four, the calculation times using the CPU alone and using the four GPU chips are shown in the following table.

                                                       CPU (Intel Core2Quad Q6600)    Four GPU chips (GeForce GTX 295 × 2 boards)
  Calculation time for four reconstructed images (ms)             9,542                                   60

In Fig. 6, we can observe four reconstructed images in real time (see the movie in Ref. [16]). “View 1” shows the reconstructed image of the head of a mosquito, “View 2” that of the fly, and “View 3” that of the USAF test target. “View 4” shows a reconstructed image with a wide viewing area of about 4.8 mm × 4.8 mm. All of the views can be observed at arbitrary resolutions, depths and positions.


Fig. 6 Four reconstructed images observed in real time: View 1 (head of a mosquito), View 2 (fly), View 3 (USAF 1951 test target) and View 4 (whole view).

4. Conclusion

In this chapter, we described a DHM system observable in multi-view and multi-resolution. For the multiple reconstructions, we used four GPU chips. In addition, we used the Shifted-Fresnel diffraction to obtain reconstructed images with arbitrary depths, resolutions and view positions. The method enables multi-view reconstructed images with both a large viewing area and high resolution to be observed.

Acknowledgements This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 21760308, 2009, and the Ministry of Internal Affairs and Communications, Strategic Information and Communications R&D Promotion Programme (SCOPE), 2009.

References

[1] Schnars U, Juptner W. Direct recording of holograms by a CCD target and numerical reconstruction. Appl. Opt. 1994;33(2):179-181.

[2] Schnars U, Jueptner W. Digital Holography - Digital Hologram Recording, Numerical Reconstruction, and Related Techniques. New York, NY: Springer; 2005.

[3] Ersoy OK. Diffraction, Fourier Optics and Imaging. Wiley-Interscience; 2006.

[4] Masuda N, Ito T, Kayama K, Kono H, Satake S, Kunugi T, Sato K. Special purpose computer for digital holographic particle tracking velocimetry. Opt. Express. 2006;14(2):603-608.

[5] Abe Y, Masuda N, Wakabayashi H, Kazo Y, Ito T, Satake S, Kunugi T, Sato K. Special purpose computer system for flow visualization using holography technology. Opt. Express. 2008;16(11):7686-7692.

[6] Masuda N, Ito T, Tanaka T, Shiraki A, Sugie T. Computer generated holography using a graphics processing unit. Opt. Express. 2006;14(2):587-592.

[7] Ahrenberg L, Benzie P, Magnor M, Watson J. Computer generated holography using parallel commodity graphics hardware. Opt. Express. 2006;14(17):7636-7641.

[8] Shimobaba T, Sato Y, Miura J, Takenouchi M, Ito T. Real-time digital holographic microscopy using the graphic processing unit. Opt. Express. 2008;16:11776-11781.

[9] Shimobaba T, Masuda N, Ichihashi Y, Ito T. Real-time digital holographic microscopy observable in multi-view and multi-resolution. Journal of Optics. In press.

[10] Muffoletto RP, Tyler JM, Tohline JE. Shifted Fresnel diffraction for computational holography. Opt. Express. 2007;15:5631-5640.

[11] NVIDIA. CUDA FFT Library. NVIDIA; 2007.

[12] http://www.fftw.org/

[13] Zhang F, Yamaguchi I, Yaroslavsky LP. Algorithm for reconstruction of digital holograms with adjustable magnification. Opt. Lett. 2004;29:1668-1670.

[14] Yu L, Kim MK. Pixel resolution control in numerical reconstruction of digital holography. Opt. Lett. 2006;31:897-899.

[15] http://openmp.org/

[16] http://www.youtube.com/watch?v=X2CuGw7Gr00

