
Structured Light In Sunlight

Mohit Gupta, Columbia University, New York, NY 10027

Qi Yin, Columbia University, New York, NY

Shree K. Nayar, Columbia University, New York, NY

Abstract

Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders of magnitude lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.

1. Introduction

Structured light 3D scanning, because of its accuracy and simplicity, is the method of choice for 3D reconstruction in several applications, including factory automation, performance capture, digitization of cultural heritage and autonomous vehicles. In many real-world settings, structured light sources have to compete with strong ambient illumination. In these scenarios, because of the limited dynamic range of image sensors, the signal (intensity due to structured light) in the captured images can be extremely low, resulting in poor 3D reconstructions.

This is especially true outdoors, where sunlight is often 2-5 orders of magnitude brighter than the projected structured light. For instance, it is known that Kinect, a popular structured light device, cannot recover 3D shape in strong sunlight [1]. While several optical techniques for ambient light reduction have been proposed [10], they achieve only moderate success. An example using an off-the-shelf laser 3D scanner is shown in Figure 1. Under strong ambient illumination, the reconstruction quality of an object placed outdoors degrades, even when spectral filtering is used.

One obvious solution to the ambient light problem is to increase the power of the light source. Unfortunately, this is not always possible. Especially in outdoor scenarios, vision systems often operate on a limited power budget. Moreover, low-cost hand-held projectors (e.g., pico projectors) are increasingly becoming popular as structured light sources. For these low-power devices to be useful outdoors, it is important to be able to handle strong ambient illumination on a tight power budget.

Figure 1. Effect of ambient illumination on structured light 3D scanning. (a) An object placed outdoors on a clear day receives strong ambient illuminance Ra from the sun and the sky. (b) Image of the sky at 9am. (c-e) 3D reconstructions using conventional methods at different times of the day (6 am, 9 am and 12 pm). From left to right, as the day progresses, Ra increases (2000 lux, 24,000 lux and 90,000 lux, respectively) and the reconstruction quality degrades.

In this paper, we introduce the concept of light concentration in order to deal with strong ambient illumination. The key idea is that even with a small light budget, signal level can be increased by concentrating the available projector light on a small portion of the scene. This is illustrated in Figure 2 (center). At first glance, it may appear that concentrating the light will require more measurements, as only a fraction of the scene is illuminated and encoded at a time. However, we show that it is possible to achieve significantly lower acquisition times by concentrating light, as compared to existing approaches that spread the available light over the entire projector image plane and then reduce image noise by frame-averaging. We call this the light-concentration advantage.

[Figure 2 panels, left to right in order of decreasing light spread: (Left) large spread, (Center) medium spread, (Right) narrow spread.]

Figure 2. Light redistribution for structured light. We consider different light distributions for designing structured light systems that

perform under strong ambient illumination. Given a fixed light budget, as the light spread decreases (from left to right), the intensity of

each projected stripe increases. Existing structured light techniques lie at the two extremes of the power distribution scale. (Left) Systems

where light is distributed over the entire projector image plane yield low signal strength and hence poor reconstruction quality. (Right)

Systems where all the light is concentrated into a single column require a large number of measurements. (Center) We show that by

concentrating the light appropriately, it is possible to achieve fast and high-quality 3D scanning even in strong ambient illumination.

The light-concentration advantage arises from the fact that frame-averaging increases signal-to-noise-ratio (SNR) by a factor of square-root of the number of averaged frames. However, the same time and power budget, if allocated into smaller scene regions, increases SNR linearly with the number of measurements. We show that for the same accuracy (SNR) level, while the number of measurements required by existing approaches is linear in the ambient illuminance level Ra, i.e., $O(R_a)$, the proposed approach requires only $O(\sqrt{R_a})$ measurements. For outdoor ambient illuminance levels, this translates into 1-2 orders of magnitude (10-100 times) lower acquisition time.

Scope and contributions: This paper introduces light redistribution as a new dimension in the design of structured light systems. We do not introduce a new structured light coding scheme. Instead, we show that by managing the light budget appropriately, it is possible to perform fast and accurate 3D scanning outdoors on a limited power budget. After determining the optimal light distribution based on the ambient illuminance level, any one of the existing high-SNR structured light coding schemes [14, 4, 6] can be used. The proposed approach can adapt to the ambient light level. For instance, as ambient illuminance decreases, the acquisition time required by our approach decreases. The proposed techniques are not restricted to ambient illumination due to sunlight. They are applicable in any scenario that has a wide range of ambient illumination.

Hardware prototype and practical implications: Existing projectors distribute light over the entire image plane; they do not have the ability to distribute light in a flexible manner. We have developed a prototype projector with flexible light distribution ability by using an off-the-shelf laser scanner. Different light distributions over the projector image plane are achieved by varying the scanning speed of the scanner. The proposed approach achieves fast and high-quality (pixel-level) 3D reconstruction for even the most extreme scenarios (direct sunlight, low-powered light source). These features make our approach especially suitable for moving platforms such as autonomous cars which need to operate outdoors under varying ambient illuminance levels on a limited power budget.

2. Related Work

Structured light 3D scanning: Structured light techniques are classified based on the coded patterns that they project on the scene. Some typical examples are single line stripes [3], sinusoidal patterns [13], binary patterns [11] and deBruijn codes [15]. For a comprehensive survey on existing coding schemes, the reader is referred to [12].

Significant work has been done towards designing high SNR structured light coding schemes [14, 4, 6]. It has been shown that in scenarios with extremely low SNR (such as strong ambient illumination), optimal SNR is achieved by using patterns with the fewest possible intensity levels (binary patterns with two intensity levels) [6]. In Figure 1, despite binary Gray code patterns being used, result quality degrades as ambient illumination increases. This is because using high SNR patterns without considering light redistribution is not sufficient to achieve high-quality results under strong ambient illumination.

Optical methods for suppressing ambient illumination: Examples of such methods include using a narrow spectral bandwidth laser (sunlight has broad bandwidth) with a narrow-band spectral filter [10] and a polarized light source (sunlight is unpolarized) with a polarization filter [10].

This paper proposes a different approach. Given a fixed level of ambient illuminance (after optical suppression), we determine the optimal distribution of the light (of the structured light source) in order to maximize the SNR. The increase in SNR achieved by our method is in addition to, and much higher than, that achieved by the optical methods. In order to deal with extreme ambient illumination scenarios, optical suppression techniques can be used in a complementary manner to our method.

Recently, a pulsed light source with a fast shutter [8] was used to suppress ambient illumination. Our approach is inspired by this work, which corresponds to the right extreme of the light distribution scale in Figure 2. In this method, all the light is concentrated into a single column. While effective, it requires a large number of images. In contrast, we consider the entire range of light distributions. Given the same power budget, our method, by distributing the available light efficiently, requires 10-100 times fewer images in most outdoor scenarios.

3. Structured Light In Ambient Illumination

We model the structured light source L as a projector that has an image plane with C columns. The projector projects spatio-temporally coded patterns on the scene so that a unique intensity code is assigned to each column.¹ The power of the light source is fixed at P Watts. If the available power is spread equally among all C columns, each column generates $P/C$ Watts of light. This is illustrated in Figure 2 (left).

The intensity of a scene point S in a captured image is:

$$I = I_l + I_a + \eta, \tag{1}$$

where Il and Ia are intensities corresponding to the light source L and ambient illumination A, respectively. η is the camera noise. The goal is to extract the signal component Il reliably from the captured images. The accuracy of the estimated signal Il (and hence the depth-accuracy) is proportional to the signal-to-noise-ratio: $\mathrm{SNR} = I_l / \eta$.

3.1. Ambient illumination and depth accuracy

The components Il and Ia are proportional to the illuminance values Rl and Ra at scene point S due to the light source L and ambient illumination A, respectively:

$$I_l = \alpha R_l, \qquad I_a = \beta R_a, \tag{2}$$

where α and β encapsulate the scene point's BRDF, light fall-off, and camera's spectral gain.² Rl is proportional to the source power P. We assume the affine camera noise model, with both signal-dependent and signal-independent terms [5]:

$$\eta^2 = \sigma_r^2 + \frac{\alpha R_l + \beta R_a}{g}, \tag{3}$$

where σr is the standard deviation of the signal-independent sensor read noise, and g is camera gain. In scenarios with strong ambient illumination, Ra >> Rl, and the dominant source of noise is the signal-dependent photon noise, i.e., $\sigma_r^2 \ll \beta R_a / g$. Then, the SNR is approximated as:

$$\mathrm{SNR} \approx \lambda \frac{R_l}{\sqrt{R_a}}, \tag{4}$$

where λ is a constant. In order to achieve a desired depth accuracy δ, the SNR should be higher than a threshold τ, i.e., SNR > τ.³ Substituting in Eq. 4:

$$\frac{R_l}{\sqrt{R_a}} \geq \frac{\tau}{\lambda}. \tag{5}$$

We call this the decodability condition. In order to achieve the desired depth accuracy, all the captured images must satisfy the decodability condition.

If Ra is significantly larger than Rl, the decodability condition is not satisfied. This results in large errors in the recovered shape, as illustrated in Figure 1. As Ra increases, the quality of the reconstructed shape deteriorates.

¹ Because of epipolar geometry between the projector and camera, only 1D coding (e.g., along the columns) on the projector plane is sufficient to perform depth recovery using triangulation. In the rest of the paper, all the pixels within a column are grouped together as a single entity - a column.

² β also includes the effect of any optical (e.g., spectral or polarization) filtering used for reducing ambient illumination. In all our experiments, we used a narrow-band laser light source and spectral filter in front of our camera. This suppresses ambient illumination by a factor of about 20.
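As a rough numerical illustration of Eqs. 3-5, the following Python sketch evaluates the affine noise model, the approximate SNR of Eq. 4, and the decodability condition. The function names and the default parameter values (α = β = g = 1, σr = 0) are assumptions made only for illustration; with those defaults the approximation in Eq. 4 reduces to λ = 1, whereas the values of λ and τ used in the paper come from the technical report [2].

```python
import math


def snr_affine(R_l, R_a, alpha=1.0, beta=1.0, sigma_r=0.0, g=1.0):
    """SNR = I_l / eta, with eta^2 = sigma_r^2 + (alpha*R_l + beta*R_a) / g (Eqs. 2-3).

    All default parameter values are illustrative assumptions.
    """
    signal = alpha * R_l
    noise = math.sqrt(sigma_r ** 2 + (alpha * R_l + beta * R_a) / g)
    return signal / noise


def snr_approx(R_l, R_a, lam):
    """Strong-ambient approximation of Eq. 4: SNR ~ lam * R_l / sqrt(R_a)."""
    return lam * R_l / math.sqrt(R_a)


def is_decodable(R_l, R_a, lam, tau):
    """Decodability condition of Eq. 5: R_l / sqrt(R_a) >= tau / lam."""
    return R_l / math.sqrt(R_a) >= tau / lam


if __name__ == "__main__":
    # Hypothetical inputs: a 50 lux source under 90,000 lux ambient illuminance.
    print(snr_affine(50.0, 90000.0), snr_approx(50.0, 90000.0, lam=1.0))
    print(is_decodable(50.0, 90000.0, lam=4.47, tau=3.0))
```

With Ra much larger than Rl, the two SNR values printed above should nearly coincide, which is the regime in which Eq. 4 is meant to apply.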

3.2. Increasing SNR by multiframe capture

A common technique for increasing SNR is to capture multiple frames per image⁴ and combine them into a single image. For instance, by capturing f frames [F1, . . . , Ff] for each image I, and computing the average image $\bar{I} = \frac{1}{f} \sum_i F_i$, noise can be reduced by a factor of $\sqrt{f}$.

How many frames should be combined so that the decodability condition is satisfied? Using Eq. 4, the SNR for an image computed by averaging f frames is $\mathrm{SNR}_{av} = \sqrt{f} \, \lambda \frac{R_l}{\sqrt{R_a}}$. Since $\mathrm{SNR}_{av}$ should be greater than τ, we get:

$$f \geq \left( \frac{\tau}{\lambda R_l} \right)^2 R_a. \tag{6}$$

Let $N_C$ be the number of images required by the particular structured light coding scheme used to encode all the projector columns uniquely, and f, as defined above, be the number of frames to be averaged per image. Then, the total number of measurements M is given as:

$$M = N_C \times f. \tag{7}$$

From Eqs. 6 and 7, we arrive at the following result:

Result 1 (Acquisition time for frame averaging) Given a fixed power budget P, the number of measurements M (and hence the acquisition time) using frame-averaging is linear in the ambient illuminance level Ra, i.e., $O(R_a)$.
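Eqs. 6 and 7 can be evaluated directly. The sketch below is a minimal illustration, assuming that illuminance values in lux can be plugged into Eq. 6 together with the constants λ = 4.47 and τ = 3.0 quoted later in the paper, and assuming $N_C = 10$ images (binary Gray codes over C = 1024 columns). The function names are hypothetical.

```python
import math


def frames_to_average(R_l, R_a, lam, tau):
    """Smallest integer f satisfying Eq. 6: f >= (tau / (lam * R_l))^2 * R_a."""
    return max(1, math.ceil((tau / (lam * R_l)) ** 2 * R_a))


def spread_and_average_measurements(R_l, R_a, lam, tau, n_c):
    """Total measurements of Eq. 7: M = N_C * f."""
    return n_c * frames_to_average(R_l, R_a, lam, tau)


if __name__ == "__main__":
    # Assumed inputs: 50 lux source; ambient levels taken from Figure 1.
    for R_a in (2000.0, 24000.0, 90000.0):
        m = spread_and_average_measurements(50.0, R_a, lam=4.47, tau=3.0, n_c=10)
        print(R_a, m)
```

Because f grows in proportion to Ra, the printed measurement counts grow roughly linearly with the ambient level, up to the rounding introduced by the ceiling.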

Thus, while frame-averaging can be an effective method for increasing SNR in weak ambient illumination (e.g., indoors), the acquisition time is prohibitively large for outdoor ambient illumination levels that are $10^2$-$10^3$ times the typical indoor illumination.

In view of this tradeoff between desired accuracy and acquisition time, we ask the following question: Is it possible to achieve high depth accuracy while also requiring a small number of measurements, even in extremely strong ambient light conditions and with a limited power budget?

³ The threshold τ depends on the structured light coding and decoding algorithms. It increases monotonically with δ. The analytical expressions for λ and τ are derived in a technical report available at [2].

⁴ In this paper, we distinguish images from frames. Images correspond to measurements captured under different illumination patterns. Frames are measurements captured under the same illumination pattern. Multiple frames may be combined to compute a single image.

4. The Light-Concentration Advantage

Suppose the SNR needs to be increased by a factor of s in order to satisfy the decodability condition. Our key observation is that SNR can be improved much more efficiently as compared to frame-averaging by concentrating light into a smaller region of the scene. This is different from blocking the light, which results in light-loss. The total light budget remains the same - it is just concentrated into a smaller region. This is illustrated in Figure 2 (center).

In particular, let the projector image plane be divided into s blocks of size $K = C/s$ columns each. Suppose all the available light is concentrated into a single block at a time, and each block is encoded independently. We call this the concentrate-and-scan strategy, as light is concentrated in a selected region of the scene, and then the illuminated region is scanned over the entire scene. The averaging strategy defined in the previous section is called spread-and-average, as it includes spreading all the light over the entire projector image plane, and then averaging frames.

While the concentrate-and-scan approach requires s times more images (as each of the s blocks is encoded independently), since each column receives s times more light, SNR is increased by a factor of s, without requiring any frame-averaging. Thus, the decodability condition is satisfied with only s times more measurements. In contrast, as discussed in the previous section, the spread-and-average approach would require $s^2$ times more measurements to increase the SNR by a factor of s. Thus, we get:

Result 2 (Light-concentration advantage) Given a fixed power budget, in order to achieve a desired accuracy level (SNR), it is always better to increase the signal directly by using the concentrate-and-scan approach, instead of reducing noise by the spread-and-average approach.
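The arithmetic behind Result 2 can be made concrete with a few lines of Python. This is only a worked illustration under the scaling rules stated above (SNR gain of $\sqrt{f}$ from averaging f frames, and a gain of s from concentrating light into one of s blocks); the function name is hypothetical.

```python
import math


def measurement_multipliers(target_snr_gain):
    """Return (concentrate_and_scan, spread_and_average) measurement multipliers.

    Concentrating light into one of s blocks gives an SNR gain of s at the
    cost of s times more images; averaging f frames gives a gain of sqrt(f),
    so a gain of s needs s^2 times more frames.
    """
    concentrate = math.ceil(target_snr_gain)
    average = math.ceil(target_snr_gain ** 2)
    return concentrate, average


if __name__ == "__main__":
    for gain in (2, 10, 30):
        print(gain, measurement_multipliers(gain))
```

For example, a required SNR gain of 10 costs 10 times more measurements with concentrate-and-scan but 100 times more with spread-and-average, and the gap widens as the required gain grows.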

The above result, called the light-concentration advantage (LCA), forms the basis of the proposed techniques. As we show later, as a consequence of the LCA, concentrate-and-scan requires a much lower acquisition time (1-2 orders of magnitude smaller), as compared to spread-and-average in extreme ambient illumination conditions. In the following, we formalize the concepts introduced above.

4.1. Concentrate-and-scan structured light

Suppose we could concentrate all the light into any block of size K columns, where K ($1 \leq K \leq C$) could be chosen arbitrarily. Then, given a fixed block size K, concentrate-and-scan structured light consists of dividing the projector image plane into $\lceil C/K \rceil$ non-overlapping blocks of K columns each. Let the blocks be $B_1, B_2, \ldots, B_{\lceil C/K \rceil}$. Then, for each block $B_i$, only the columns within $B_i$ are encoded (using any existing coding scheme) while spreading light only within that block. This step is repeated sequentially for all the blocks by concentrating light in a single block at a time. This is illustrated in Figure 2 (center).

[Figure 3 plots: optimal block size (Kopt) versus log ambient illumination (lux), over the range of real-world illumination from a moonless night to direct sunlight. (a) Different source powers (25 lux, 50 lux, 100 lux). (b) Different scene distances (1 m, 0.5 m, 0.2 m, 0.3 m).]

Figure 3. Optimal block size Kopt for the proposed concentrate-and-scan method. (a) Variation of Kopt with ambient illuminance level, for different light source powers P (resulting in illuminance of 25, 50 and 100 lux, respectively at a normally facing scene point 1 meter away). (b) Variation of Kopt for different scene-source distances, for the 50 lux light source.

[Figure 4 plots: number of images versus log ambient illumination, over the range of real-world illumination from a moonless night to direct sunlight. (a) Comparison of different methods (spread-and-average, scan-only, concentrate-and-scan [proposed]). (b) Different source powers (25 lux, 50 lux, 100 lux). (c) Different scene distances (1 m, 0.5 m, 0.2 m, 0.3 m).]

Figure 4. Number of measurements. (a) Comparison of the number of measurements required by different methods. Scene-source distance is assumed to be 1 meter, and source illuminance is 50 lux. For most scenarios, the concentrate-and-scan method requires 1-2 orders of magnitude fewer images than existing methods. Number of images required by our method, for (b) different source power ratings, and (c) different scene-source distances.

4.2. What is the optimal block size K?

The block size K determines the total number of measurements, and also the SNR of each measurement. A large block size K requires fewer measurements, but also results in low SNR per measurement. On the other hand, a small K requires more measurements, but yields higher SNR per measurement. Given this trade-off, what K should be used?

In order to fully exploit the light-concentration advantage, the block size K should be chosen so that the decodability condition is satisfied without requiring frame-averaging. Let Rl be the source illuminance when light is spread over the entire image plane. Then, the illuminance when light is concentrated into K columns is $R_l C / K$. Substituting in the decodability condition (Eq. 5), we get:

$$K \leq \frac{\lambda C}{\tau} \frac{R_l}{\sqrt{R_a}}. \tag{8}$$

On the other hand, K should be as large as possible (up to a maximum of C) in order to minimize the number of required images. Thus, the optimal block size Kopt is:

$$K_{opt} = \frac{\lambda C}{\tau} \frac{R_l}{\sqrt{R_a}}. \tag{9}$$

As expected, Kopt is inversely correlated with Ra. As Ra increases, Kopt becomes smaller. This ensures that the available light is concentrated into a smaller region so that the decodability condition is satisfied.
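Eq. 9 is straightforward to evaluate numerically. The sketch below is a minimal illustration, assuming lux values can be plugged in directly together with the constants λ = 4.47, τ = 3.0 and C = 1024 quoted in the paper; the function name, the rounding down to an integer, and the clamping to the range [1, C] are choices made here, not the paper's stated procedure.

```python
import math


def optimal_block_size(R_l, R_a, lam, tau, num_columns):
    """Evaluate Eq. 9, K_opt = (lam * C / tau) * R_l / sqrt(R_a).

    The result is rounded down to an integer and clamped to [1, C]; both
    choices are assumptions of this sketch.
    """
    k = (lam * num_columns / tau) * R_l / math.sqrt(R_a)
    return max(1, min(num_columns, math.floor(k)))


if __name__ == "__main__":
    # Assumed inputs: 50 lux source, C = 1024 columns.
    for R_a in (100.0, 22000.0, 94000.0):
        print(R_a, optimal_block_size(50.0, R_a, lam=4.47, tau=3.0, num_columns=1024))
```

Under these assumptions the sketch gives block sizes of 514 and 248 columns for ambient levels of 22,000 and 94,000 lux, close to the power-of-two values of 512 and 256 reported in the Figure 6 caption; how the paper rounds Kopt is not stated here.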

Figure 3 (a) shows the variation of Kopt with Ra, for different source powers. The three sources correspond approximately to a small laser pointer, a desktop laser scanner and a pico projector (resulting in illuminance of 25 lux, 50 lux and 100 lux respectively at a normally facing scene point 1 meter away). The number of projector columns is C = 1024. For these settings, λ = 4.47 (see the supplementary technical report [2] for details of computation of λ). The constant τ = 3.0 was calculated assuming binary structured light coding⁵ and an accuracy level of 0.5 pixels, where accuracy is defined in terms of the difference between the estimated projector column correspondence and the ground-truth. As the source power P increases, the curve shifts to the right. Similarly, increasing the source-scene distance effectively reduces the source power, and shifts the plot to the left, as shown in Figure 3 (b).

4.3. Acquisition time

Let $N_K$ be the number of images required to encode a block of size K columns. $N_K$ depends on the coding scheme used within each block. The number of measurements Mcs required for concentrate-and-scan is simply the product of $N_K$ and the number of blocks $C/K$: $M_{cs} = N_K \times \frac{C}{K}$. Note that no frame-averaging is required as the SNR of each measurement is sufficiently high to satisfy the decodability condition. Substituting the value of Kopt from Eq. 9, we get:

$$M_{cs} = N_K \frac{\tau}{\lambda R_l} \sqrt{R_a}. \tag{10}$$

Thus, we get the following result:

Result 3 (Acquisition time for concentrate-and-scan) Given a fixed power budget P, the number of measurements Mcs (and hence the acquisition time) using the concentrate-and-scan approach is proportional to $\sqrt{R_a}$, i.e., $O(\sqrt{R_a})$.

In contrast, recall from Result 1 that the number of measurements Msa required for the spread-and-average approach is $O(R_a)$. Thus, as Ra increases, Msa increases much more rapidly as compared to Mcs. Figure 4 (a) shows the number of measurements required by the concentrate-and-scan and spread-and-average techniques for a wide range of ambient illumination levels. The camera, scene settings and the accuracy level are the same as in Figure 3 (a). It was assumed that binary Gray codes are used, so that $N_K = \log_2 K$. The source illuminance is 50 lux.

⁵ Similar analysis can be performed for other structured light schemes such as phase-shifting [13] and N-ary coding [6]. See the supplementary technical report [2] for analysis and results for N-ary coding.

[Figure 5 panels: (a) Prototype illustration, with labeled components: laser diode, cylindrical lens, polygonal mirror, laser sheet, variable speed scanner, spectral filter and camera. (b) Prototype image. (c-e) Different light spreads. (f) Comparison plot of image intensity versus image column index.]

Figure 5. Hardware Prototype. (a-b) Our hardware system is based on an off-the-shelf laser scanner. The scanner has a rotating polygonal mirror that sweeps a laser sheet. Flexible light distribution capability is implemented by varying the mirror's rotation speed. (c-e) A scene illuminated at different rotation speeds. As the speed decreases (from left to right), the illuminated area decreases, but the illumination strength increases. (f) Comparison of the intensities along the marked scanlines. Because the total energy is the same, the area under the three plots is the same.

We also plot the number of images required for single line-striping, where all the light is concentrated into a single column (as illustrated in Figure 2 (right)). This scan-only technique [8] is a special case of the concentrate-and-scan approach with K = 1. The scan-only technique requires $M_s = C$ images, irrespective of the ambient illumination levels.
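The measurement count $M_{cs} = N_K \times C/K$ can be tabulated for a few block sizes. The sketch below assumes binary Gray codes with $N_K = \lceil \log_2 K \rceil$ images per block, and at least one image per block so that K = 1 reduces to the scan-only case; the function names are hypothetical.

```python
import math


def gray_code_images(block_size):
    """Images needed to encode one block with binary codes: ceil(log2 K), at least 1."""
    return max(1, (block_size - 1).bit_length())


def concentrate_and_scan_measurements(block_size, num_columns):
    """M_cs = N_K * ceil(C / K), with no frame-averaging."""
    num_blocks = math.ceil(num_columns / block_size)
    return gray_code_images(block_size) * num_blocks


if __name__ == "__main__":
    for k in (1024, 512, 256, 128, 1):
        print(k, concentrate_and_scan_measurements(k, 1024))
```

For C = 1024, block sizes of 512 and 256 columns give 18 and 32 images, which agrees with the image counts quoted in the Figure 6 caption, and K = 1 gives C = 1024 images as in the scan-only technique.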

Implications (from Figure 4 (a)): For typical low power projectors, the concentrate-and-scan approach requires 1-2 orders of magnitude (10-100 times) lower acquisition time than the existing schemes, for all outdoor ambient illuminance levels ($R_a > 10^4$).

Conversely, given the same time budget, the concentrate-and-scan approach achieves a significantly higher SNR and result quality. Figures 4 (b) and (c) show the variation of Mcs, for different source powers P and different scene-source distances $d_{ss}$. Again, the number of required images is relatively small even for the most extreme cases (direct sunlight, low-powered light source and large $d_{ss}$).

5. Hardware Prototype

In order to implement the concentrate-and-scan approach, we need a projector whose light could be distributed programmatically into any contiguous subset of K columns on the image plane. It should be possible to vary K. This functionality is not available in existing off-the-shelf projectors, which distribute light over the entire image plane. How can we achieve flexible light-distribution capability?

Scanning based projectors: While several existing projectors spread light spatially (e.g., using a lens), there is a class of projectors that raster-scan a narrow beam of light rapidly across the image plane. These are called scanning-projectors. For example, all laser-video projectors (e.g., MicroVision SHOWWX+™ Laser Projector) and laser scanners belong to this category. The scanning mechanism is rapid enough that the beam traverses the entire image plane within the duration of one projected image. There are several realizations of the scanning mechanism, e.g., a galvanometer or a rapidly rotating polygonal mirror. Our hardware system is based on an off-the-shelf laser scanner from Spacevision Ltd. (www.space-vision.jp). The scanner uses a rotating polygonal mirror, and is illustrated in Figure 5.

[Figure 6 panels, left to right: scene (inset - sky image), spread-and-average, scan-only, concentrate-and-scan [proposed]. Top row: 9am on a cloudy day, ambient illuminance Ra ≈ 22,000 lux, number of input images = 18. Bottom row: 1pm on a clear, sunny day, ambient illuminance Ra ≈ 94,000 lux, number of input images = 32.]

Figure 6. Results of 3D scanning in sunlight. (a) Objects placed outdoors in two different ambient illumination conditions - 9am on a cloudy day (top row) and 1pm on a bright sunny day (bottom row). 3D scanning results using (b) spread-and-average, (c) scan-only, and (d) the proposed concentrate-and-scan approaches, respectively. For each row, the same capture time and power budget were used for all three techniques. The spread-and-average method achieves a low SNR, resulting in large holes in the recovered shapes. The scan-only method results in low resolution, thus losing all the surface details. Moreover, there are holes due to discontinuities at the boundaries. In contrast, the proposed method achieves high-quality results. The optimal block sizes for the concentrate-and-scan approach are Kopt = 512 and 256 columns for the top and bottom rows, respectively. The total number of projector columns C = 1024.

The key observation is that it is possible to implement different light distributions by changing the speed of the scanning mechanism (in our case, the rotation velocity of the polygonal mirror).⁶ Let the total power of the source be P. Suppose the scanning frequency is S scans-per-second (sps). The camera frame rate is also S frames-per-second (fps) so that one image is captured for every scan. If the total number of projector columns is C, the energy radiated by a single column during a single image capture is $P_c = \frac{P}{S \times C}$. If the scanning speed is reduced by a factor ω, only $C/\omega$ columns are illuminated in every captured image. The energy radiated by a column increases to $\omega \times P_c$. Figure 5 (c-e) shows a scene illuminated at three different rotation speeds. As the speed decreases, the illuminated area decreases, but the illumination strength increases.
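The energy bookkeeping in the paragraph above can be checked with a short Python sketch. The function names and the numeric values (a 0.5 W source, 30 scans per second, C = 1024) are hypothetical and chosen only to illustrate that slowing the scanner by a factor ω raises the per-column energy by ω while the total energy per captured image stays the same.

```python
def column_energy(power_watts, scans_per_second, num_columns, slowdown=1.0):
    """Energy radiated by one column during one image capture: omega * P / (S * C)."""
    return slowdown * power_watts / (scans_per_second * num_columns)


def columns_per_image(num_columns, slowdown):
    """Number of columns illuminated in each captured image: C / omega."""
    return num_columns / slowdown


def scanner_speed_for_block(scans_per_second, block_size, num_columns):
    """Scanner speed that illuminates block_size columns per camera frame: S * K / C."""
    return scans_per_second * block_size / num_columns


if __name__ == "__main__":
    P, S, C = 0.5, 30.0, 1024  # hypothetical source power, scan rate, column count
    for omega in (1.0, 2.0, 4.0):
        per_column = column_energy(P, S, C, omega)
        total = per_column * columns_per_image(C, omega)
        print(omega, per_column, total)  # total per image stays at P / S
    print(scanner_speed_for_block(S, 256, C))
```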

Concentrate-and-scan structured light implementation: Let the optimal block size be Kopt; the image plane is divided into $C / K_{opt}$ blocks. Let the number of images required for encoding each block be $N_K$. Let the projected images be $\{T_i^j \mid 1 \leq i \leq N_K,\ 1 \leq j \leq C / K_{opt}\}$, where the subscript and the superscript are the image-index within a block, and the block index, respectively. Note that each $T_i^j$ has Kopt columns. Concentrate-and-scan structured light consists of the following steps (for a pictorial explanation of the algorithm, see the project video available at [2]):

1. Reduce the scanner speed from S to $S \times K_{opt} / C$ sps. The frame rate of the camera remains the same at S fps.

2. For every i, concatenate all $\{T_i^j \mid 1 \leq j \leq C / K_{opt}\}$ images into a single image $T_i^{cat}$ (having C columns). $T_i^{cat}$ is projected during a single projector scan. In this duration, the camera captures $C / K_{opt}$ images $I_i^j$, one corresponding to each block.

3. For each camera pixel x, identify the block j so that $I_i^j(x) > 0$ for some i. This is the corresponding block, which contains the corresponding column (no column is encoded with an all-zeros code). The corresponding column is estimated using the decoding algorithm for the coding scheme used within each block.

⁶ Different laser scanning speeds have been used for generating different camera exposures in a structured light setup [7].
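The concentrate-and-scan steps listed above can be sketched in Python for an idealized, noise-free, already-thresholded case. This is not the authors' implementation: the helper names are hypothetical, C is assumed to be divisible by the block size, and Gray codes are offset by one so that no column receives the all-zeros code, which costs one extra bit plane when the block size is a power of two.

```python
def gray_code_patterns(block_size):
    """Bit planes (most significant first) of Gray codes 1..block_size for one block."""
    num_bits = block_size.bit_length()  # enough bits for the codes 1..block_size
    codes = [(k + 1) ^ ((k + 1) >> 1) for k in range(block_size)]
    return [[(c >> b) & 1 for c in codes] for b in reversed(range(num_bits))]


def concatenated_patterns(block_size, num_columns):
    """Step 2: repeat each block pattern across all blocks to get C-column images."""
    num_blocks = num_columns // block_size
    return [plane * num_blocks for plane in gray_code_patterns(block_size)]


def simulate_pixel(column, block_size, num_columns):
    """Ideal observations obs[j][i] of a pixel that sees the given projector column."""
    patterns = gray_code_patterns(block_size)
    j_true, k_true = divmod(column, block_size)
    return [[plane[k_true] if j == j_true else 0 for plane in patterns]
            for j in range(num_columns // block_size)]


def decode_pixel(observations, block_size):
    """Step 3: find the block with a non-zero observation, then decode its Gray code."""
    for j, bits in enumerate(observations):
        if any(bits):
            gray = 0
            for bit in bits:
                gray = (gray << 1) | bit
            value = 0
            while gray:  # Gray-to-binary: XOR of all right shifts
                value ^= gray
                gray >>= 1
            return j * block_size + (value - 1)
    return None  # the pixel was not lit in any block


if __name__ == "__main__":
    obs = simulate_pixel(column=11, block_size=4, num_columns=16)
    print(decode_pixel(obs, block_size=4))
```

A real system would threshold noisy camera intensities before decoding; that step is omitted here.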

6. Results

Figure 6 shows 3D scanning results for objects placed outdoors under different ambient illuminance levels. The optimal block size was determined using Eq. 9. The constants λ = 4.47 and τ = 3.0 were estimated using the expressions given in the technical report [2]. Il and Ia were measured by capturing two HDR images of the scene - one with the projector on, and one with the projector off.⁷ Binary Gray code patterns were used as the structured light encoding scheme. Camera exposure times were chosen to be just below the saturation level.

We compare with the results of the spread-and-average and the scan-only methods. The same capture time and power budget were used for all the methods. Despite averaging, the spread-and-average method achieves low SNR, resulting in large holes in the recovered shapes. For the scan-only method, the width of the stripe was increased to $C/M$, where M is the number of measurements that can be captured within the time budget, so that the whole image plane is covered. Because the depth resolution is inversely correlated to the stripe-width, this method achieves a very low depth resolution. Notice that all the surface details are lost. Moreover, there are discontinuities at the stripe boundaries. In contrast, the proposed method achieves results with both high-resolution and high SNR.

[Figure 7 panels: scene and close-ups of 3D reconstructions. Top row: 12pm on a sunny day, ambient illuminance Ra ≈ 90,000 lux, number of input images = 32. Bottom row: 10am on a cloudy day, ambient illuminance Ra ≈ 22,000 lux, number of input images = 18.]

Figure 7. Structured light in the wild. 3D scanning results for two outdoor scenes with strong ambient light. In both cases, our method achieves highly detailed 3D structure with a limited power budget (illuminance from source ≈ 50 lux) and few (< 50) images.

Structured Light in the Wild: Figure 7 shows 3D scanning results for two outdoor scenes with strong ambient light (90,000 and 22,000 lux). In both cases, our method achieves highly detailed 3D structure with fewer than 50 images and a very limited power budget (source illuminance ≈ 50 lux).

Illumination-adaptive structured light: Since the optimal block size $K_{opt}$ can be determined automatically using image-based measurements, we have implemented an illumination-adaptive structured light system. Figure 8 shows a scene scanned at various times of the day. As the day progresses, ambient illuminance increases, and the number of measurements increases accordingly (10, 18, 18, 32 and 56). For each illumination level, we show a comparison with the spread-and-average method. For each instant, the capture time and power budget are the same for both methods. For low ambient illumination, $K_{opt} = C$. In this case, our method behaves like the spread-only method. As ambient illuminance increases, the result quality of the spread-and-average scheme deteriorates. For a time-lapse video of results, see the project video [2].

7 It is assumed that $R_a$ is constant across the scene. If there is large variation in $R_a$ (e.g., due to a shadow edge), different block sizes can be used for different parts of the scene.

7. Discussion

Contributions: This paper proposes light distribution as a new dimension in the design of structured light systems. We show that by controlling the distribution of the light, it is possible to develop fast and accurate 3D scanning systems that work in a wide range of outdoor scenarios with a limited time and power budget.

Limitations: Our approach assumes that the power of the light source, when completely concentrated into a single line, is sufficient for the decodability condition to be satisfied. While this is true in most settings even for a low-power light source, for parts of highly specular objects, the image component due to ambient illumination may be too strong. In this case, even concentrating all the light into a single column fails to overcome ambient illumination. An example is illustrated in Figure 9. It is possible to overcome this limitation partly by diffusing the projected patterns [9].

Figure 8. Illumination-adaptive structured light. 3D scanning results at different times of the day. Columns: 06:00 am, 08:00 am, 10:00 am, 2:00 pm, 12:00 pm. Rows: object placed outdoors (inset: sky image); 3D scanning results using the spread-and-average method; 3D scanning results using the proposed concentrate-and-scan method. For each instant (each column), the capture time and power budget are the same for both methods. For low ambient illuminance (left), both the concentrate-and-scan and spread-and-average methods produce good results. As the day progresses, the concentrate-and-scan method adapts to the ambient illuminance level (increasing from left to right) by choosing the appropriate block size, and achieves results of much higher quality.

Acknowledgments: This research was supported by NSF (grant IIS 09-64429) and ONR (grant N00014-11-1-0285). The authors are grateful to Yukio Sato of Space-Vision Inc. for making the laser scanner and associated software available for the experiments reported in this paper.

References

[1] Kinect outdoors. www.youtube.com/watch?v=rI6CU9aRDIo.

[2] Project webpage. http://www.cs.columbia.edu/CAVE/projects/StructuredLightInSunlight/.

[3] G. J. Agin and T. O. Binford. Computer description of curved objects. IEEE Transactions on Computers, 25(4), 1976.

[4] D. Caspi, N. Kiryati, and J. Shamir. Range imaging with adaptive color structured light. IEEE PAMI, 20(5), 1998.

[5] S. W. Hasinoff, F. Durand, and W. T. Freeman. Noise-optimal capture for high dynamic range photography. In CVPR, 2010.

[6] E. Horn and N. Kiryati. Toward optimal structured light patterns. Proc. 3DIM, 1997.

[7] I. Ihrke, I. Reshetouski, A. Manakov, A. Tevs, M. Wand, and H.-P. Seidel. A kaleidoscopic approach to surround geometry and reflectance acquisition. Proc. IEEE CCD Workshop, 2012.

[8] C. Mertz, S. Koppal, S. Sia, and S. Narasimhan. A low-power structured light sensor for outdoor scene reconstruction and dominant material identification. Proc. IEEE PROCAMS, 2012.

[9] S. K. Nayar and M. Gupta. Diffuse structured light. In ICCP, 2012.

[10] D. Padilla, P. Davidson, J. Carlson, and D. Novick. Advancements in sensing and perception using structured lighting techniques: An LDRD final report. Sandia National Lab Report, 2005.

[11] J. Posdamer and M. Altschuler. Surface measurement by space-encoded projected beam systems. Computer Graphics and Image Processing, 18(1), 1982.

[12] J. Salvi, S. Fernandez, T. Pribanic, and X. Llado. A state of the art in structured light patterns for surface profilometry. Pattern Recognition, 43(8), 2010.

[13] V. Srinivasan, H. Liu, and M. Halioua. Automated phase-measuring profilometry: A phase mapping approach. Applied Optics, 24, 1985.

[14] Y. Wang, K. Liu, D. Lau, Q. Hao, and L. Hassebrook. Maximum SNR pattern strategy for phase shifting methods in structured light illumination. JOSA A, 27(9), 2010.

[15] L. Zhang, B. Curless, and S. Seitz. Rapid shape acquisition using color structured light and multi-pass dynamic programming. Proc. IEEE 3DPVT, 2002.

Figure 9. Failure case. Panels: (a) specular metal hemisphere; (b) 3D reconstruction. (a) A metal hemisphere reflects sunlight specularly. (b) Inside the highlight, even concentrating all the projector light into a single column fails to overcome ambient illumination, resulting in a large hole in the reconstructed shape.

An SNR analysis of structured light techniques

in the presence of strong ambient illumination

Supplementary Technical Report for Paper ID 0100

In this technical report, we provide an analysis of the image formation of structured light based techniques in the presence of strong ambient illumination. In Section 1, we derive an analytical expression for the signal-to-noise ratio (SNR) of structured light coding schemes. We show that the SNR is proportional to the power ratio between the structured light signal and the ambient illumination. In Section 2, we formulate the theoretical relation between depth accuracy and SNR. Based on this relation, the concept of the decodability condition is presented for general coding schemes. In Section 3, we derive a theoretical lower bound on acquisition time for different structured light coding schemes.

1 SNR formulation of structured light coding schemes in strong ambient illumination

Structured light methods belong to the category of active triangulation techniques. The general setup consists of a structured light source (projector) and a camera. We model the structured light source $L$ as a projector that has an image plane with $C$ columns. The projector projects spatio-temporally coded patterns on the scene so that a unique intensity code is assigned to each column (see footnote 1).

The intensity of a scene point $S$ in a captured image is:

$$I = I_l + I_a + \eta, \tag{1}$$

where $I_l$ and $I_a$ are the intensities corresponding to the light source $L$ and the ambient illumination $A$, respectively, and $\eta$ is the camera noise. The goal is to extract the signal component $I_l$ reliably from the captured images.

The accuracy of the estimated signal $I_l$ is proportional to the signal-to-noise ratio:

$$\mathrm{SNR} = \frac{\Delta(I_l)}{\eta} = \frac{1}{L-1} \cdot \frac{I_l}{\eta}, \tag{2}$$

where $\Delta(I_l)$ is the minimal distance required to distinguish two different signals, and $L$ is the number of different intensities used for coding. For example, since binary methods [3] use only two different intensities, $L = 2$ in this case. For N-ary coding [1] or color coding [4], $L$ equals $N$ or the number of colors.

1 Because of epipolar geometry between the projector and camera, 1D coding (e.g., along the columns) on the projector plane is sufficient to perform depth recovery using triangulation.

The components $I_l$ and $I_a$ are proportional to the irradiance values $R_l$ and $R_a$ at scene point $S$ due to the light source $L$ and ambient illumination $A$, respectively:

$$I_l = \alpha R_l, \quad I_a = \beta R_a, \tag{3}$$

where $\alpha$ and $\beta$ encapsulate the scene point's BRDF, light fall-off, and camera gain. $\beta$ also includes the effect of any optical (e.g., spectral or polarization) filtering used for reducing ambient illumination.

We assume the affine camera noise model, with both signal-dependent and signal-independent terms [2]:

$$\eta^2 = \sigma_r^2 + \frac{\alpha R_l + \beta R_a}{g}, \tag{4}$$

where $\sigma_r$ is the standard deviation of the signal-independent sensor read noise, and $g$ is the camera gain. In scenarios with strong ambient illumination, $R_a \gg R_l$, and the dominant source of noise is the signal-dependent photon noise, i.e., $\sigma_r^2 \ll \beta R_a / g$. Then, the SNR is approximated as:

$$\mathrm{SNR} \approx \lambda \cdot \frac{1}{L-1} \cdot \frac{R_l}{\sqrt{R_a}}, \tag{5}$$

where $\lambda = \sqrt{\alpha^2 g / \beta}$. The reflection properties of the scene are assumed to be the same for structured light and ambient illumination. Also, we used a narrow-band laser light source and a spectral filter in front of our camera, which suppresses ambient illumination by a factor of 20. Specifically, α = 0.250, β = 0.0125, and g = 4.0 for our setup, so λ is calculated as 4.47.
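As a quick check of the stated constant, the following sketch evaluates $\lambda = \sqrt{\alpha^2 g / \beta}$ with the values reported for the setup and plugs it into the SNR approximation of Eq. 5. The function names are illustrative, and treating the irradiance arguments as being in the same (arbitrary) units is an assumption.

```python
import math


def snr_constant(alpha, beta, gain):
    """lambda = sqrt(alpha^2 * g / beta), as defined after Eq. 5."""
    return math.sqrt(alpha ** 2 * gain / beta)


def approx_snr(r_l, r_a, lam, levels=2):
    """Eq. 5: SNR ~ lambda * 1/(L-1) * R_l / sqrt(R_a)."""
    return lam / (levels - 1) * r_l / math.sqrt(r_a)


lam = snr_constant(alpha=0.250, beta=0.0125, gain=4.0)
print(round(lam, 2))  # 4.47
```

Here $\alpha^2 g / \beta = 0.0625 \times 4 / 0.0125 = 20$, so $\lambda = \sqrt{20} \approx 4.47$, in agreement with the value quoted in the text.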

2 Theoretical relation between SNR and depth accuracy

In this section, we derive the expression for depth accuracy given an SNR level. This relation depends on the structured light coding and decoding algorithms. For simplicity of analysis, we assume that minimum-distance decoding is used for all coding schemes. Also, we assume the smallest Hamming distance of a coding scheme is always 1, which means the signal can be decoded correctly only when all the code digits are completely correct.

Suppose $I_l^i$ is an intensity code; the captured intensity is $\hat{I}_l^i = I_l^i + \eta$ due to sensor noise. The condition that $\hat{I}_l^i$ can be decoded correctly with a minimum-distance decoder is:

$$|\hat{I}_l^i - I_l^i| < |\hat{I}_l^i - I_l^j|, \quad \text{for all } j \neq i \tag{6}$$

For Eq. 6 to hold for even the nearest two codes, we get the condition:

$$\min\left(\frac{I_l^i - I_l^j}{2}\right) > \frac{\tau}{2} \cdot \eta, \quad \text{for all } j \neq i \tag{7}$$

where $\tau$ is a parameter that controls the probability of correct decoding. Since photon noise is the dominating noise source in strong ambient illumination, we use a Gaussian distribution $N(\cdot)$ to approximate the noise distribution of each intensity code. Therefore, the probability $p_r(i)$ that a single intensity code is decoded correctly is:

$$p_r(i) = N\left(\min\left(\frac{I_l^i - I_l^j}{2}\right) > \frac{\tau}{2} \cdot \eta\right) = N\left(\frac{1}{L-1} \cdot I_l > \tau \cdot \eta\right) = N(\mathrm{SNR} > \tau) \tag{8}$$

where $L$ is the number of intensity levels as discussed in Section 1.

We assume the code length of a specific coding scheme is $N_c$ and all code bits are independent of each other. The probability that the code sequence is correctly decoded is:

$$P_r = \prod_{i=1}^{N_c} p_r(i) \tag{9}$$

The error tolerance of correspondence decoding is used as the measurement of depth accuracy:

$$\delta = \mu \cdot \left(1 - \prod_{i=1}^{N_c} p_r(i)\right) \tag{10}$$

where $\mu$ is a weight constant set to 10 in our setting.

In order to achieve a desired depth accuracy, the SNR should be higher than a threshold $\tau$, i.e., $\mathrm{SNR} > \tau$. The threshold $\tau$ increases monotonically with the desired accuracy, i.e., as the tolerated error $\delta$ decreases. Substituting in Eq. 5:

$$\frac{1}{L-1} \cdot \frac{R_l}{\sqrt{R_a}} \geq \frac{\tau}{\lambda}. \tag{11}$$

We call this the decodability condition. In order to achieve the desired depth accuracy, all the captured images must satisfy the decodability condition. In our experiments, we set the decoding error tolerance to 0.5 column on average. As a result, $\tau$ equals 3.0 in our experiments.
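The error tolerance of Eq. 10 and the decodability condition of Eq. 11 can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, the defaults λ = 4.47, τ = 3.0 and µ = 10 are the values quoted in this report, and passing illuminance values in lux as $R_l$ and $R_a$ is an assumption made for the example.

```python
import math


def decoding_error_tolerance(bit_probs, mu=10.0):
    """Eq. 10: delta = mu * (1 - prod_i p_r(i))."""
    return mu * (1.0 - math.prod(bit_probs))


def satisfies_decodability(r_l, r_a, lam=4.47, tau=3.0, levels=2):
    """Eq. 11: (1/(L-1)) * R_l / sqrt(R_a) >= tau / lambda."""
    return r_l / ((levels - 1) * math.sqrt(r_a)) >= tau / lam


# A source spread over the whole scene (assumed 50 units) against bright
# ambient light (assumed 90,000 units) does not meet the condition.
spread_ok = satisfies_decodability(r_l=50.0, r_a=90000.0)

# The same source against weak ambient light (assumed 100 units) does.
dim_ok = satisfies_decodability(r_l=50.0, r_a=100.0)
```

In the first case the left-hand side is 50 / 300 ≈ 0.17, below τ/λ ≈ 0.67, which is the situation that frame averaging or light concentration has to fix.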


3 Theoretical lower bound on acquisition time of structured light coding schemes

In this section, we discuss the theoretical lower bound on the acquisition time of existing structured light coding schemes. As introduced in Section 2, to achieve a required accuracy level, structured light coding schemes have to meet the decodability condition.

When the SNR is insufficient to satisfy the decodability condition, a common technique for increasing SNR is to capture and average multiple frames per image. This increases the SNR by a factor of $\sqrt{f}$ when $f$ frames are averaged per image. To meet the decodability condition using frame averaging, the minimal number of frames per image is:

$$f \geq \left(\frac{\tau \cdot (L-1)}{\lambda R_l}\right)^2 R_a. \tag{12}$$

Let $N_c$ be the number of images required by the particular structured light coding scheme to encode all the projector columns uniquely, and let $f$, as defined above, be the number of frames to be averaged per image. Then, the total number of measurements $M$ is given as:

$$M = N_c \times f. \tag{13}$$

If we use a constant exposure time for all the captured frames, we get a lower bound on the total number of measurements, which is also a lower bound on the acquisition time:

$$M \geq N_c \cdot \max\left(\left(\frac{\tau \cdot (L-1)}{\lambda R_l}\right)^2 R_a,\ 1\right). \tag{14}$$

Eq. 14 applies to existing structured light schemes in general. For example, setting $L = 2$ gives the lower bound for binary coding. Setting $L$ to the number of intensity levels in N-ary or color coding gives the lower bound for those schemes.
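Eqs. 12 to 14 translate into a few lines of arithmetic. The sketch below is a hedged illustration: the function names are hypothetical, rounding the frame count up to an integer is an addition not stated in the equations, and the numeric inputs (50 and 90,000, treated as $R_l$ and $R_a$ in the same units) are assumptions chosen to resemble the outdoor conditions reported in the paper.

```python
import math


def min_frames_per_image(r_l, r_a, lam=4.47, tau=3.0, levels=2):
    """Eq. 12, rounded up to a whole frame and never below one frame."""
    frames = (tau * (levels - 1) / (lam * r_l)) ** 2 * r_a
    return max(1, math.ceil(frames))


def min_measurements(num_images, r_l, r_a, lam=4.47, tau=3.0, levels=2):
    """Eqs. 13-14: M = N_c * f, with f from Eq. 12 and f >= 1."""
    return num_images * min_frames_per_image(r_l, r_a, lam, tau, levels)


# Binary coding (L = 2) with an assumed N_c = 10 patterns.
frames = min_frames_per_image(r_l=50.0, r_a=90000.0)
total = min_measurements(num_images=10, r_l=50.0, r_a=90000.0)
```

For these inputs Eq. 12 evaluates to about 16.2, so 17 frames are averaged per pattern and the 10-pattern sequence needs 170 captures in total.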

In our paper, we introduce the concept of light redistribution, which adds a new dimension to Eq. 14. If the total number of columns is $C$, and $K$ columns are illuminated at a time, Eq. 14 can be rewritten as:

$$M \geq \frac{C}{K} \cdot N'_c \cdot \max\left(\left(\frac{K \cdot \tau \cdot (L-1)}{C \cdot \lambda R_l}\right)^2 R_a,\ 1\right). \tag{15}$$

For more discussion of light redistribution, please see the paper.
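To see how the bound of Eq. 15 changes with the block size $K$, the sketch below evaluates it for every divisor of $C$ and picks the smallest value. This is a brute-force illustration, not the closed-form optimal block size derived in the paper. It assumes that $N'_c$ is the number of binary patterns needed to encode the $K$ columns of one block without using the all-zeros code, i.e. ceil(log2(K + 1)), and that $R_l$ is the source irradiance when the light is spread over all $C$ columns. All names and input values are hypothetical.

```python
import math


def measurement_bound(k, num_columns, r_l, r_a, lam=4.47, tau=3.0, levels=2):
    """Right-hand side of Eq. 15 for block size k."""
    # Assumption: binary codes inside a block, all-zeros code unused.
    images_per_block = math.ceil(math.log2(k + 1))
    frames = (k * tau * (levels - 1) / (num_columns * lam * r_l)) ** 2 * r_a
    return (num_columns / k) * images_per_block * max(frames, 1.0)


def best_block_size(num_columns, r_l, r_a, **kwargs):
    """Divisor of C that minimizes the Eq. 15 bound (exhaustive search)."""
    candidates = [k for k in range(1, num_columns + 1) if num_columns % k == 0]
    return min(
        candidates,
        key=lambda k: measurement_bound(k, num_columns, r_l, r_a, **kwargs),
    )


bright = best_block_size(1024, r_l=50.0, r_a=90000.0)  # strong ambient light
dim = best_block_size(1024, r_l=50.0, r_a=100.0)       # weak ambient light
```

Under these assumptions the search selects a block of 256 columns for the bright case and the full 1024 columns for the dim case, which matches the qualitative behavior described in the paper: the light is concentrated as ambient illumination grows, and spread over the whole scene when it is weak.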


References

[1] D. Caspi, N. Kiryati, and J. Shamir. Range imaging with adaptive color structured light. IEEE PAMI, 20(5), 1998.

[2] S. W. Hasinoff, F. Durand, and W. T. Freeman. Noise-optimal capture for high dynamic range photography. In CVPR, 2010.

[3] J. Posdamer and M. Altschuler. Surface measurement by space-encoded projected beam systems. Computer Graphics and Image Processing, 18(1), 1982.

[4] L. Zhang, B. Curless, and S. M. Seitz. Rapid shape acquisition using color structured light and multi-pass dynamic programming. Proc. IEEE 3DPVT, 2002.
