An Investigation of Using Random Dot Patterns to Achieve X-Ray Vision for Near-Field Applications of Stereoscopic

Video Based Augmented Reality Displays

by

Sanaz Ghasemi

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Department of Mechanical and Industrial Engineering University of Toronto

© Copyright by Sanaz Ghasemi 2018

An Investigation of Using Random Dot Patterns to Achieve X-Ray

Vision for Near-Field Applications of Stereoscopic Video Based

Augmented Reality Displays

Sanaz Ghasemi

Doctor of Philosophy

Department of Mechanical and Industrial Engineering

University of Toronto

2018

Abstract

As one of the most interesting applications of Augmented Reality, ‘X-ray vision’ involves the presentation of computer generated objects as if they lie behind a real object surface. Achieving this effect presents several challenges, including perceptual ambiguity about the ordinal and absolute depth of the virtual object relative to the real object’s surface, and the need to maintain sufficient information about both the virtual and the real object. This thesis investigates how random dots on an object’s surface can facilitate seeing a virtual object as behind the surface with stereoscopic displays.

Using a psychophysical method, Experiment 1 demonstrated the potential of this approach to improve ordinal depth judgements. Experiment 2 investigated the effect of dot size and dot density on transparency ratings of the real surface while preserving surface details. Paired Comparison results revealed an advantage of the proposed method over the ‘no pattern’ condition for the transparency ratings. Surface detail preservation was also descriptively shown to decrease with increasing dot density and dot size.

Experiment 3 explored the impact of variations in image sharpness, dot size and dot density of the random dot pattern, and the depth of the virtual object, on the accuracy and difficulty of absolute depth judgements about the virtual object. Compared to the ‘no pattern’ condition, the random dot patterns improved the accuracy of depth judgements. In estimating the depth of the virtual object when random dot patterns were used, no main effect of dot size was found. However, interactions suggested that higher dot densities lead to smaller errors. Moreover, subjective difficulty ratings in performing depth judgements with sharper patterns may indicate that the random dot patterns support the use of convergence as a beneficial depth cue. The implications of these findings for the design of ‘X-ray’ displays for near-field (including medical) stereoscopic AR are discussed.

Acknowledgments

First and foremost, I would like to express my gratitude to my PhD supervisor, Professor Paul Milgram, who provided me with insightful guidance throughout my studies, gave me the courage and taught me to stand up for my knowledge and beliefs as a researcher, and encouraged me to continue pursuing this research when I needed it. I am also particularly grateful for the meticulous attention and valuable time he gave to developing my written and presentation skills. What I learned from you will never be forgotten.

I would also like to sincerely thank Professor John Kennedy, who genuinely cared about and made valuable contributions to my research. I am grateful for the time and expertise that Professor Justin Hollands provided me with. I would also like to acknowledge the insightful comments and invaluable feedback I received from my external examiner, Professor Victoria Interrante. Her PhD thesis, vast expert knowledge and kindness have served and will continue to serve as an inspiration for me. I would also like to thank Professor Mark Chignell for serving as a voting member of my final examination.

Finally, I would like to express my utmost gratitude to my parents, who have provided me with a life I am grateful for every single day. If I have accomplished anything in my life, it has been because of you, your sacrifices, your encouragement and your wisdom. I would also like to thank Nazanin, my sister, who has been my truest companion and the most reliable and loving friend I know. You make life so much better. I am also thankful to Hooman Abbasian, who helped tremendously in guiding me through my data analysis.

Last but not least, I want to thank the love of my life, Nader Noroozi, who has been the strongest, most supportive and most caring partner and the greatest source of motivation for me through the thick and thin we’ve been through during the past year. Although you only joined me during the last year of my PhD journey, I am forever indebted to you for this accomplishment.

Table of Contents

Acknowledgments.......................................................................................................................... iv

Table of Contents .............................................................................................................................v

List of Tables ................................................................................................................................. ix

List of Figures ..................................................................................................................................x

List of Appendices ...................................................................................................................... xvii

Chapter 1 ..........................................................................................................................................1

Introduction & Overview ............................................................................................................1

Chapter 2 ..........................................................................................................................................7

Perceptual Background ...............................................................................................................7

2.1 Depth Cues ...........................................................................................................................7

2.1.1 Occlusion (Interposition) .........................................................................................8

2.1.2 Relative (Familiar) Size ...........................................................................................8

2.1.3 Accommodation .......................................................................................................9

2.1.4 (Binocular) Convergence .........................................................................................9

2.1.5 Binocular Disparity ..................................................................................................9

2.2 Integration of Depth Cues ..................................................................................................10

2.3 Surfaces ..............................................................................................................................13

2.4 Texture ...............................................................................................................................14

2.5 Transparency ......................................................................................................................14

Chapter 3 ........................................................................................................................................16

X-Ray Vision in AR: Literature Review ...................................................................................16

3.1 Challenges ..........................................................................................................................16

3.2 Review of Proposed Solutions ...........................................................................................20

3.2.1 Cutaway or Virtual Hole ........................................................................................20

3.2.2 Modified Opacity ...................................................................................................21

3.2.3 Context-preserving Techniques .............................................................................25

3.3 Criteria for Success ............................................................................................................26

Chapter 4 ........................................................................................................................................28

Our Method ...............................................................................................................................28

4.1 Use of Texture....................................................................................................................28

4.2 Stereo-Translucency ..........................................................................................................29

4.3 Information Preservation ...................................................................................................34

4.4 Computational Costs ..........................................................................................................34

4.5 Past Work ...........................................................................................................................35

Chapter 5 ........................................................................................................................................37

Experiments 1 and 2: Effect of Using Random Dot Patterns on Depth Order

Disambiguation, Perception of Transparency and Surface Information Preservation ..............37

5.1 Purpose ...............................................................................................................................38

5.2 Experimental Method.........................................................................................................38

5.2.1 Image Generation and Presentation .......................................................................39

5.2.2 Participants .............................................................................................................41

5.3 Experiment 1 ......................................................................................................................42

5.3.1 Objectives and Hypotheses ....................................................................................42

5.3.2 Procedure ...............................................................................................................43

5.3.3 Results and Discussion ..........................................................................................44

5.4 Experiment 2 ......................................................................................................................49

5.4.1 Objectives, Hypotheses and Procedure ..................................................................49

5.4.2 Results and Discussion ..........................................................................................57

5.5 Contributions, Limitations and Conclusions......................................................................60

Chapter 6 ........................................................................................................................................62

Experiment 3: Effect of Using Random Dot Patterns for Improving Accuracy of Depth

Judgements ................................................................................................................................62

6.1 Purposes .............................................................................................................................62

6.2 Experimental Method.........................................................................................................63

6.2.1 Image Generation and Presentation .......................................................................63

6.2.2 Participants .............................................................................................................72

6.2.3 Procedure ...............................................................................................................72

6.2.4 Depth Judgement Task ...........................................................................................74

6.3 Hypotheses .........................................................................................................................76

6.3.1 Estimated Depth of Virtual Object relative to real surface (EDVO) .....................77

6.3.2 Difficulty Rating of depth estimation task (DR)....................................................77

6.4 Results ................................................................................................................................78

6.4.1 Estimated Depth of Virtual Object (EDVO) ..........................................................79

6.4.2 Difficulty Rating of depth estimation task (DR)..................................................101

6.4.3 Correspondence between Average Absolute Errors in EDVO and DRs .............107

6.4.4 Responses to the Interview Questions .................................................................108

6.5 Discussion ........................................................................................................................111

6.5.1 Errors in EDVO ...................................................................................................111

6.5.2 DRs ......................................................................................................................113

6.5.3 Relationship between Average Absolute Errors in EDVO and DRs ...................115

6.5.4 Some Notes on the Responses to the Interview Questions ..................................115

6.6 Contributions and Limitations .........................................................................................117

Chapter 7 ......................................................................................................................................119

Conclusion ..............................................................................................................................119

7.1 Contributions....................................................................................................................119

7.2 Practical Implications.......................................................................................................120

7.3 Limitations and Suggested Improvements to Experiments .............................................121

7.3.1 Experiments 1 and 2.............................................................................................121

7.3.2 Experiment 3 ........................................................................................................122

7.4 Future Work .....................................................................................................................123

References ....................................................................................................................................125

Appendix A: Forms and Questionnaires ......................................................................................132

A1. Experiment 1 ....................................................................................................................132

A2. Experiment 2 ....................................................................................................................135

A3. Experiment 3 ....................................................................................................................139

Appendix B: Supplementary Material for Chapter 6 (Experiment 3)..........................................144

B.1. Summary of Insights Gained from Pilot Studies .............................................................144

B.2. Difficulty Rating of Depth Estimation Task ...................................................................145

B.3. Transcript of Interviews with Participants ......................................................................152

Appendix C: Enlarged Stereo Images ..........................................................................................162

C.1. Figure 1.3 .........................................................................................................................162

C.2. Figure 4.2 (a) ...................................................................................................................163

Appendix D: Depth Cues .............................................................................................................164

Object-centered Cues .......................................................................................................164

(Static) Observer-centered cues .......................................................................................167

Appendix E: List of Abbreviations ..............................................................................................169

List of Tables

Table 3.1: Summary of literature review on perceptual issues of X-ray vision in AR. ............... 19

Table 6.1: Contrast results for significant interaction effects between depth and pattern. The rows

are colour coded to aid in identification of patterns with the same dot size. ................................ 88

List of Figures

Figure 1.1: This simplified Reality-Virtuality continuum shows the various proportions with

which real (shown in blue) and virtual (shown in red) worlds can be combined to display

information. (Adapted from Milgram and Kishino, 1994). ............................................................ 2

Figure 1.2: Methods used by Schall et al. (a) (Schall et al., 2012), Lerotic et al. (b) (Lerotic et al.,

2007) and Mohr et al. (c) (Mohr et al., 2015) applying the X-ray vision notion to present internal

structures for different applications. Image (a): “AR view with superimposed enclosures and

base point of the building corner and a capping registered in 3D.” Reprinted by permission from

RightsLink: Springer, Personal and ubiquitous computing, Smart Vidente: advances in mobile

augmented reality for interactive visualization of underground infrastructure, Schall, G.,

Zollmann, S., & Reitmayr, G., Copyright 2012 by Springer. Image (b): “fused NPR AR with the

original video.” Reprinted by permission from RightsLink: Springer Berlin Heidelberg, Medical

Image Computing and Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-

photorealistic rendering for augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang,

G. Z., Copyright 2007 by Springer-Verlag Berlin Heidelberg. Image (c): AR view showing the

interior of a coffee machine to aid in maintenance procedures. Courtesy of Peter Mohr. ............. 3

Figure 1.3: Stereo pairs. The blue circle indicates a virtual object rendered behind the surface of

a real object (the face). In this case, although the occlusion cue suggests that the virtual object is

in front of the real surface, the addition of the random dot patterns is intended to aid the observer

in correctly perceiving the virtual object as being inside the person’s head. An enlarged

(landscape) version of this image is provided in Appendix C1 to help in perceiving the desired

percept. The observer’s left eye should find a rearward ring shifted to the right compared to the

nose, and the right eye should find it shifted to the left. To view the image in this figure (as well

as all other stereo pairs presented in this thesis) in stereo without the aid of any stereoscopic

viewing equipment, the reader is advised to free fuse the images, using the white squares at the

top as a fixation point. Depending on which method the reader finds easier, either a) cover the

right image and, while observing the left pair, allow your eyes to relax, as if looking into the

distance, until the two images fuse into one (parallel fusing); or b) cover the left image and,

while observing the right pair, cross (i.e. converge) your eyes until the two images fuse into one

(cross fusing). (Note that fusing this image is supposed to be difficult, as a consequence of the

cue conflict outlined above.) ........................................................................................................... 5

Figure 3.1: Example of virtual hole metaphor used by Rosenthal et al. (2002) for the task of

targeting needle biopsies in phantoms. The vertical lines are meant to show the sides of the

virtual hole. Reprinted from Medical Image Analysis, Vol. 6, Rosenthal et al., Augmented reality

guidance for needle biopsies: An initial randomized, controlled trial in phantoms, 313-320, 2002,

with permission from Elsevier. ..................................................................................................... 21

Figure 3.2: Example of the Modified Opacity visualization method used by Bichlmeier et al.

(2007). Reprinted by permission from IEEE 2007. ...................................................................... 22

Figure 3.3: The 7 evaluated visualizations studied by Sielhorst et al. (2006). Visualizations 2 and

3, corresponding respectively to surface rendering transparently superimposed and surface

rendering through a virtual window in the skin, were determined to be the best in terms of depth

perception and effectiveness. Reprinted by permission from RightsLink: Springer Berlin

Heidelberg, International Conference on Medical Image Computing and Computer-Assisted

Intervention, Depth perception–a major issue in medical AR: evaluation study by twenty

surgeons, Sielhorst, T., Bichlmeier, C., Heining, S. M., & Navab, N., Copyright 2006 by

Springer-Verlag Berlin Heidelberg. .............................................................................................. 24

Figure 3.4: A context-preserving visualization used by Lerotic et al. (2007). Reprinted by

permission from RightsLink: Springer Berlin Heidelberg, Medical Image Computing and

Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-photorealistic rendering for

augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang, G. Z., Copyright 2007 by

Springer-Verlag Berlin Heidelberg. .............................................................................................. 26

Figure 4.1: Stereo pairs. The blue circle indicates a virtual object rendered in front of the surface

of a real object (the face). In this case, the binocular disparity cue and the occlusion cue provide

consistent information, allowing the virtual object to be perceived unambiguously as being in

front of the person’s face. Note that the middle face shows less mouth than the left face and the

eyebrows are more extensive in the left face. ............................................................................... 29

Figure 4.2: In both sets of stereo pairs (a) and (b) (identical to Figure 1.3), the blue virtual circle

is stereoscopically rendered behind the face. In this case, the binocular disparity cue and the

occlusion cue provide inconsistent information, leading to a cue conflict: (a) Untreated image.

An enlarged (landscape) version of this image is provided in Appendix C2 to help in perceiving

the desired percept; (b) Addition of random dots onto the face (using a projector). If successful,

the reader should more easily perceive the virtual circle as being behind the face in (b), relative

to (a). ............................................................................................................................................. 31

Figure 4.3: Hypothesised percept when using a dot pattern as a means of surface manipulation.

The top portion of the image shows a magnified (2D) view of the real surface (skin), which has

been altered by adding a random dot pattern. The lower portion of the image shows the top view

of the observer as he/she may perceive the image if this percept is achieved. ............................. 34

Figure 4.4: Sample stimulus used by Otsuki and Milgram (2013). The blue circle indicates a

virtual object rendered beneath the depicted surface, which has been modified through the

addition of a pattern of random black dots. Reprinted by permission from IEEE 2013. .............. 35

Figure 5.1: Example of a stimulus stereo pair used in the experiment. The blue circle indicates a

virtual object rendered beneath a textured purple surface, which has been modified through the

addition of a pattern of random black dots. (The reader is referred to Figure 1.3 for instructions

on how to free fuse such stereo images.) ...................................................................................... 40

Figure 5.2: Stimuli used for Experiments 1 and 2. Only the 9 stimuli in the 40, 50 and 60%

columns were used in Experiment 1. All 12 stimuli were used in Experiment 2. .......................... 42

Figure 5.3: Psychophysical functions fitted to results of Experiment 1 for dot sizes of (a) 1/25,

(b) 1/50 and (c) 1/75. .................................................................................................................... 49

Figure 5.4: Samples of stereo pairs illustrating the shape matching task for assessment of surface

information. (a) The inner and outer yellow objects are both circles. (b) The inner yellow object

is an ellipse. (a) and (b) constitute the No Pattern condition. (c) Example of task with random dot

pattern present, and where inner yellow object is an ellipse. The orientation of the major axes of

the ellipses in (b) and (c) are 54º (corresponding to level 3) and 144º (corresponding to level 8),

respectively. .................................................................................................................................. 54

Figure 5.5: Options for designating the orientation of the major axis in ellipse conditions. This

image was provided as a guide for assisting participants in selecting their responses to the ellipse

axis orientation questions, in the form of numerals 0 to 10 on the computer keypad. ................. 54

Figure 5.6: Schematic illustration of hypotheses for both parts of Experiment 2. (a) effect of dot

density (H3); (b) effect of dot size (H4). ...................................................................................... 57

Figure 5.7: d’ and Transparency Rating (TR) results obtained from Experiment 2. The solid lines

join the d’ results, corresponding to the left hand axis, while the dashed lines join the

transparency ratings (TR), corresponding to the right hand axis. The yellow horizontal lines

correspond to the No Pattern condition. ....................................................................................... 58

Figure 5.8: Mean absolute offset errors as a function of dot size and dot density. The orange

horizontal line corresponds to the No Pattern condition. .............................................................. 59

Figure 6.1: Sample stereo pair of stimuli shown to participants. The pattern used in this example

consisted of random dots with sizes of 1/75 and distributed with 40% dot density. For guidance

on how to fuse these images, see explanation provided in caption of Figure 1.3. ........................ 64

Figure 6.2: Diagram presenting different parts of the stimulus. ................................................... 64

Figure 6.3: Stimuli with sharp random dot patterns used for Experiment 3. ................................ 66

Figure 6.4: Stimuli with blurry random dot patterns used for Experiment 3. ............................... 67

Figure 6.5: Front, side and top views of wireframe truncated cone (the virtual object) and

cylindrical bin (the real object). .................................................................................................... 68

Figure 6.6: Diagram showing real model used for generating virtual truncated cone. (It should be

noted that, as mentioned, this model consists of concentric circles. However, since the image

shows this model from the side, these circles appear in this figure as ellipses.) .......................... 69

Figure 6.7: Sequence of steps taken to generate virtual truncated cone. As expected, the

connecting rod between the two circles cannot be seen in these images, as it is perpendicular to

the line of sight.............................................................................................................................. 69

Figure 6.8: Schematic top view diagram of rod with circles placed at 6 different depths relative to

the surface of the bin. The black numbers noted on the rod are indicative of the proportion of the

truncated cone that was placed behind the bin’s surface. (The light blue lines joining the circles

represent the sides of the final virtual truncated cone.) ................................................................ 70

Figure 6.9: Schematic diagram showing the camera setup with respect to the bin (above), which

was replaced by the rod connecting the circles (below), which was placed along the red dashed

line (marking the surface of the bin’s location). ........................................................................... 71

Figure 6.10: Guide placed next to monitor for participants’ reference during experiment. ......... 74

Figure 6.11: Top view example of truncated cone’s position relative to the surface of the bin. In

this image ‘a’ and ‘b’ denote the distance of the larger and smaller circles of the truncated cone

relative to the bin’s surface, respectively. ..................................................................................... 76

Figure 6.12: Experimental hypotheses 6 and 7, illustrating expected changes in DR as a function

of dot size and dot density. ........................................................................................................... 78

Figure 6.13: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the

virtual object’s actual depth proportion for various dot sizes and dot density of 20%: (a) Sharp

condition, (b) Blurry condition. The sizes and colours of the dots are proportional to the number

of occurrences at each point. Each column adds up to 75 trials (15 participants*5 trials). A blue

trend line has been fitted to the data. The y=x and y=5 reference lines are also provided to show

perfect and chance performance, respectively. ............................................................................. 80



Figure 6.14: […] (a) Sharp condition, (b) Blurry condition. .................................................................................................... 81



Figure 6.15: […] (a) Sharp condition, (b) Blurry condition. .................................................................................................... 82

Figure 6.16: Scatterplot showing the ‘Estimated Depth of Virtual Object’ as a function of the

virtual object’s actual depth proportion for the No Pattern condition. ......................................... 83

Figure 6.17: Plot showing the Point of Subjective Equality as a function of dot density. The PSE

for the No Pattern condition is shown for reference. .................................................................... 84

Figure 6.18: Average absolute error in perceived depth as a function of the virtual object’s actual

depth relative to the real surface for dot size = 1/25. .................................................................... 86





Figure 6.21: Average absolute error in perceived depth of virtual object as a function of its actual

depth depicting the interaction effect of blur and depth. .............................................................. 92

Figure 6.22: Average absolute error in perceived depth of virtual object as a function of dot

density depicting the interaction effect of blur and dot density. ................................................... 93


depth, depicting the interaction effect of depth and dot density for dot size=1/25. ...................... 95


depth depicting the interaction effect of depth and dot density for dot size=1/50. ....................... 95


depth depicting the interaction effect of depth and dot density for dot size=1/75. ....................... 96


depth depicting the interaction effect of blur, depth and dot density for dot size=1/25. .............. 98


depth depicting the interaction effect of blur, depth and dot density for dot size=1/50. .............. 99


depth depicting the interaction effect of blur, depth and dot density for dot size=1/75. ............ 100

Figure 6.29: Average difficulty ratings as a function of the virtual object’s actual depth relative

to the real surface for dot size = 1/25. ......................................................................................... 102





Figure 6.32: Effect of depth on DRs. .......................................................................................... 105

Figure 6.33: DRs as a function of dot size depicting the interaction effect of blur and dot size. 107

Figure 6.34: Scatterplot showing average absolute errors in perceived depth as a function of the

difficulty rating for all trials. ....................................................................................................... 108

Figure B.1: Stereoscopic image showing the pyramid (=virtual object) placed halfway along

its length relative to the surface of the bin (=real surface). The apex in this image is pointed towards

the observer ............................................................................................................... 145

Figure B.2: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s

actual depth proportion for dot density 20% and various dot sizes: (a) Sharp condition, (b) Blurry

condition. The size and colour of the dots are proportional to the number of occurrences at each

point. ........................................................................................................................................... 147

Figure B.3: Scatterplots showing the ‘Difficulty Level’ as a function of the virtual object’s actual

depth for various dot sizes and dot density of 40%: (a) Sharp condition, (b) Blurry condition. 149

Figure B.4: Scatterplots showing the ‘Difficulty Level’ as a function of the virtual object’s actual

depth for various dot sizes and dot density of 60%: (a) Sharp condition, (b) Blurry condition. 151

Figure B.5: Scatterplot showing the ‘Difficulty Level’ as a function of the virtual object’s actual

depth for the No Pattern condition. ............................................................................................. 152

List of Appendices

Appendix A: Forms and Questionnaires ......................................................................................132

A1. Experiment 1 ....................................................................................................................132

A2. Experiment 2 ....................................................................................................................135

A3. Experiment 3 ....................................................................................................................139

Appendix B: Supplementary Material for Chapter 6 (Experiment 3)..........................................144

B.1. Summary of Insights Gained from Pilot Studies .............................................................144

B.2. Difficulty Rating of Depth Estimation Task ...................................................................145

B.3. Transcript of Interviews with Participants ......................................................................152

Appendix C: Enlarged Stereo Images ..........................................................................................162

C.1. Figure 1.3 .........................................................................................................................162

C.2. Figure 4.2 (a) ...................................................................................................................163

Appendix D: Depth Cues .............................................................................................................164

Object-centered Cues .......................................................................................................164

(Static) Observer-centered cues .......................................................................................167

Appendix E: List of Abbreviations ..............................................................................................169

Chapter 1

Introduction & Overview

The goal of this thesis is to show a tactic for improving stereovision that reveals objects behind a

surface. Three experiments will be reported – one on the potential of this approach to improve

ordinal depth judgements, one on achieving transparency of the surface while preserving surface

details, and one on the improvement of absolute depth judgements using this approach. To

provide an appropriate context for the understanding of these experiments, this chapter is focused

on preliminary background information and gives an overview of the chapters to follow.

Although its application areas are now expanding rapidly, Augmented Reality (AR) technology has

been around since the 1960s. The term “Augmented Reality” itself was coined only some 25

years ago, when Caudell and Mizell (1992) used it for training purposes in an industrial setting.

Since then, AR technologies have demonstrated success in a variety of medical, personal,

navigation, television, advertising and commerce, and gaming application domains (Schmalstieg

and Höllerer, 2017). This advance can be attributed to the wide range of applications that can

benefit from the addition of computer-generated (virtual) elements to images of the real world.

Milgram and Kishino (1994) used the Reality-Virtuality continuum, depicted in Figure 1.1, to

define AR displays as follows: “AR displays are those in which the image is of a

primarily real environment, which is enhanced, or augmented, with computer-generated

imagery”. There are three primary methods to achieve visual AR (Schmalstieg and Höllerer,

2016):

1. Optical see-through (OST) displays allow the user to see the real world through an

optical combiner that is used to reflect the computer generated virtual objects onto the

user’s eyes (Rolland & Fuchs, 2000).

2. Video-based displays¹ combine the real and virtual objects electronically. In other

words, the image of the real world captured by a camera is displayed on a conventional

viewing device (such as a monitor) and the virtual elements are added onto the image

using a graphics processor.

3. Spatial Projection refers to cases where a light projector is used to project a virtual

image directly onto a real object.

Figure 1.1: This simplified Reality-Virtuality continuum shows the various proportions with

which real (shown in blue) and virtual (shown in red) worlds can be combined to display

information. (Adapted from Milgram and Kishino, 1994).

One of the most intriguing applications of AR is the notion of “X-ray vision,” denoting the

ability to virtually “see through” a real object’s surface to present information that is not

otherwise visible to the user (Livingston, Dey, Sandor, & Thomas, 2013). In contrast to most AR

applications, which involve superimposing computer generated images onto real objects, the

present context involves adding images beneath, or behind, the surface of real objects.

The ability to ‘see through’ a surface or have ‘X-ray vision’ has a wide range of applications in

various realms. For example, in civil engineering and for surveying purposes, visualizations can

be used to reveal hidden subsurface infrastructures, such as underground pipes (Schall, Zollmann,

& Reitmayr, 2012). In medical applications, preoperative ultrasound images can be overlaid onto

the organ to show its underlying anatomy (Lerotic, Chung, Mylonas & Yang, 2007). In industrial

settings, seeing through machines can help maintenance engineers and other workers perform

¹ These displays are most commonly referred to as Video See-Through (VST) displays. However, to prevent

confusion with the concept of OST displays, where an optical element is actually looked through, we have refrained

from using this term.

various operations without the need to memorize manuals and documentation (Mohr et al.,

2015). Sample images of such applications are shown in Figure 1.2.

Figure 1.2: Methods used by (a) Schall et al. (2012), (b) Lerotic et al. (2007) and (c) Mohr et al.

(2015), applying the X-ray vision notion to present internal

structures for different applications. Image (a): “AR view with superimposed enclosures and

base point of the building corner and a capping registered in 3D.” Reprinted by permission from

RightsLink: Springer, Personal and ubiquitous computing, Smart Vidente: advances in mobile

augmented reality for interactive visualization of underground infrastructure, Schall, G.,

Zollmann, S., & Reitmayr, G., Copyright 2012 by Springer. Image (b): “fused NPR AR with the

original video.” Reprinted by permission from RightsLink: Springer Berlin Heidelberg, Medical

Image Computing and Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-

photorealistic rendering for augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang,

G. Z., Copyright 2007 by Springer-Verlag Berlin Heidelberg. Image (c): AR view showing the

interior of a coffee machine to aid in maintenance procedures. Courtesy of Peter Mohr.

Regardless of the realm of application, achieving the metaphor of X-ray vision is difficult since it

is ‘unnatural’ in the real world. One of the major challenges involved is the potential perceptual

ambiguity caused by simply superimposing a hidden virtual object onto the image of a real

object surface². The consequent blocking off of the real surface suggests to the observer that the

virtual object must be in front of the real surface, rather than behind it, thus contradicting the

notion of X-ray vision. Even with stereoscopic (3D) displays, simply rendering a virtual object at

the proper depth “correctly” behind a real object may nevertheless create the perception of a

floating virtual object in front of the surface of the real object (Drascic & Milgram, 1996;

Johnson, Edwards, & Hawkes, 2003). This is a consequence of the strength of the occlusion cue³

(Cutting & Vishton, 1995). Even when the ordinal depths of the virtual object and the real

surface are judged correctly, research has shown that the presence of a real surface in front of a

virtual object can lead to imprecise judgments of the absolute depth of the virtual object

(Edwards et al., 2004) and that the content of the real surface can reduce the distance within

which the virtual object can be placed from the real surface without leading to double vision

(Johnson et al., 2003).

To deal with the challenges involved in the simultaneous presentation of overlapping surfaces,

various researchers have suggested the addition of some sort of ‘texture’ to the real surface

(Interrante, Fuchs and Pizer, 1997; Zollmann, Kalkofen, Mendez and Reitmayr, 2010; Lerotic et

al., 2007; Avery, Sandor & Thomas, 2009). However, these methods either require precise

modelling of the real surface, are not applicable to cases where occlusion of the virtual object is

² For the sake of clarity, in describing this method we use the term ‘surface’ to refer to the surface of a real object,

which has been captured by some kind of a sensor and has been reproduced in the image. The computer-generated

object, on the other hand, will be referred to as the virtual ‘object’.

³ For an explanation of this cue, see Chapter 2.

difficult to realize, or require the real object’s surface to possess salient features in order for the

algorithms to function effectively.

In this thesis, we propose another method of dealing with the perceptual challenges involved in

X-ray vision. With this method, which we propose for near-field applications of

AR, an artificial non-uniform texture is added to the surface of a real object. The key differences

of our approach from others’ are that: (a) our texture involves randomly distributed (black) dots

(similar to those used in random dot stereograms); (b) the only depth cues that are present are the

occlusion and binocular disparity cues (which limits the application of this method to stereoscopic

displays only); and (c) the occlusion cue is not consistent with the binocular disparity cue (the

virtual object occludes the real surface). An example of the application of this method is

provided in Figure 1.3.

Figure 1.3: Stereo pairs. The blue circle indicates a virtual object rendered behind the surface of

a real object (the face). In this case, although the occlusion cue suggests that the virtual object is

in front of the real surface, the addition of the random dot patterns is intended to aid the observer

in correctly perceiving the virtual object as being inside the person’s head. An enlarged

(landscape) version of this image is provided in Appendix C.1 to help in achieving the desired

percept. The observer’s left eye should find a rearward ring shifted to the right compared to the

nose, and the right eye should find it shifted to the left. To view the image in this figure (as well

as all other stereo pairs presented in this thesis) in stereo without the aid of any stereoscopic

viewing equipment, the reader is advised to free fuse the images, using the white squares at the

top as a fixation point. Depending on which method the reader finds easier, either a) cover the

right image and, while observing the left pair, allow your eyes to relax, as if looking into the

distance, until the two images fuse into one (parallel fusing); or b) cover the left image and,

while observing the right pair, cross (i.e. converge) your eyes until the two images fuse into one

(cross fusing). (Note that fusing this image is supposed to be difficult, as a consequence of the

cue conflict outlined above.)

It was hypothesized that adding these random dot patterns would improve the observer’s ability

to perceive the correct depth of the virtual object, both relatively and absolutely. To investigate

and optimize the effect on depth perception, the dot sizes and dot densities were varied

throughout the experiments conducted.
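
To make the two manipulated parameters concrete, the following is a minimal Python sketch of how a random dot texture with a given dot size and dot density might be generated. It is an illustration only and not the procedure used in the experiments: the names `random_dot_mask` and `coverage`, the use of square dots on a regular grid, the definition of dot size as a fraction of the image width, and the definition of density as the probability that a grid cell is filled are all assumptions made for this sketch.

```python
import random


def random_dot_mask(width, height, dot_size_fraction, density, seed=None):
    """Return a height x width grid of 0/1 values; 1 marks a (black) dot pixel.

    Assumptions (not taken from the thesis): dots are squares laid out on a
    regular grid, the dot side is dot_size_fraction * width pixels, and
    density is the probability that any given grid cell is filled.
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    rng = random.Random(seed)
    dot = max(1, round(width * dot_size_fraction))  # dot side in pixels
    mask = [[0] * width for _ in range(height)]
    for top in range(0, height, dot):
        for left in range(0, width, dot):
            if rng.random() < density:
                # Fill one dot-sized cell, clipped at the image border.
                for y in range(top, min(top + dot, height)):
                    for x in range(left, min(left + dot, width)):
                        mask[y][x] = 1
    return mask


def coverage(mask):
    """Fraction of pixels that are covered by dots."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


if __name__ == "__main__":
    mask = random_dot_mask(100, 100, 1 / 25, 0.20, seed=1)
    print(f"covered fraction: {coverage(mask):.2f}")
```

Filling grid cells independently keeps the covered fraction close to the requested density, which would not hold if dots were allowed to overlap freely.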

While various methods have been shown to be effective for improving depth perception for

applications of X-ray vision with OST displays, there is a paucity of literature that focuses on

this issue with video-based AR displays. Moreover, the anticipated consumer applications of X-

ray vision in AR have led to most literature being focused on the challenges involved in

achieving X-ray vision for medium and far-field distances. In this project, I have designed and

conducted experiments that demonstrate the effectiveness of adding random dot patterns in

improving depth perception for near-field applications of video-based AR displays.

In the next chapter, the perceptual background and terminology necessary to understand the idea

behind this research are presented. Chapter 3 provides an overview of existing solutions that aim

to deal with challenges of X-ray vision in AR, followed by a discussion of the details of our idea

presented in Chapter 4. Chapters 5 and 6 provide the details of the three experiments conducted

to evaluate the proposed concept. The final chapter summarizes the contributions and limitations

of this research.

Chapter 2

Perceptual Background⁴

The effectiveness of any visualization technique heavily depends on the ability to understand and

exploit the way in which our visual system extracts and integrates information from the real

world through perception. It is therefore useful to sketch some fundamentals of perception as

they relate to the topic of this thesis before moving on to methods used in achieving X-ray vision

for AR applications (Chapter 3). In this chapter, we first discuss the basis of perceiving depth

(Wickens, Hollands, Banbury, & Parasuraman, 2000) as it relates to the topic of this thesis and

then move on to providing definitions of specific terms used in this context.

2.1 Depth Cues

To estimate the depth of objects, our visual system relies on various sources of depth

information, which are defined and categorized as depth cues. While some of these cues provide

information about the ordinal or relative depth of objects (e.g. which is closer or nearest), others

provide absolute⁵ depth information, which allows an observer to ascertain the absolute magnitude of a

distance (e.g. in meters).

In addition to providing different types of information, the relative ‘strengths’ of depth cues also

vary at different distances (Cutting and Vishton, 1995). Therefore, to understand how various

depth cues are used to perceive the 3D layout of our environment, Cutting and Vishton (1995)

divided the continuum of depth into three regions: personal space, action space and vista space.

These regions are also commonly referred to as near-field, medium-field and far-field distances,

respectively (Livingston et al., 2013). For example, for near-field distances, some of the most

effective depth cues are: occlusion, relative size, accommodation, convergence and binocular

⁴ Note: The reader should approach this chapter as a rather superficial review of perceptual literature as it pertains to

the goal and focus of this thesis. This chapter is meant only to provide the background necessary for understanding

the theory behind the method used in this investigation.

⁵ In some literature, ordinal and absolute depth are also referred to as ‘relative’ and ‘metric’ depth, respectively.

However, to avoid ambiguity, I have excluded these terms from our discussion and use only

ordinal and absolute as they pertain to our study.

disparity. Since the method investigated in this thesis is meant primarily for near-field

applications, only these depth cues are discussed in detail below. For a broader overview of

depth cues, the reader is referred to Appendix D.

2.1.1 Occlusion (Interposition)

Foreground-background occlusion occurs if an object intervenes between a vantage point and

another object. Both objects may project into the optic array at a vantage point. The front of the

foreground (or ‘occluder’) projects to the vantage point, and if it is opaque, either none or only

part of the other object can project to the vantage point. In this case, either the whole object or

the other part of it is hidden – ‘occluded’. In cases where the foreground is transparent, the

background object can either partially or completely project to the vantage point, with optic

arrays passing through the foreground’s surface. There are many kinds of optical information for

occlusion. Research on optical features encouraging the appearance of occlusion continues to

this day (Gillam and Grove, 2011; Kennedy, 1974; Peterson, 2015).

It is widely believed that occlusion is the most powerful depth cue at all distances where visual

perception holds. The reason for this is that our world is populated mostly by solid objects that

are opaque. However, transparent or translucent objects are also encountered regularly and can

be easily incorporated into our understanding.

In the context of X-ray vision applications of AR, various researchers have used the occlusion

cue by having features of the real surface occlude the virtual object, thus allowing the observer

to easily perceive the virtual object as being behind the real object (Lerotic et al., 2007; Avery et

al., 2009; Sandor, Cunningham, Dey & Mattila, 2010).

2.1.2 Relative (Familiar) Size

As objects move farther away, their projected sizes become smaller. Therefore, if an object is

recognized or if the absolute size of a depicted object is known, one can infer its distance from

its apparent size using the size-distance invariance hypothesis (Kilpatrick and Ittelson, 1953).

Additionally, if one knows the relative sizes of multiple different objects, then their ordinal

proximity can be inferred from their relative apparent sizes in the visual field. Thus, the

important point about this cue is that it is a relative cue. In other words, a basis for comparison

must exist, either from the scene or from the observer’s experience.
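
The geometric relation underlying this cue can be sketched in a few lines of Python. The sketch assumes a frontoparallel object viewed straight on; the function names `visual_angle` and `distance_from_familiar_size` are hypothetical and chosen only for illustration.

```python
import math


def visual_angle(size, distance):
    """Visual angle (radians) subtended by an object of a given physical size."""
    return 2.0 * math.atan(size / (2.0 * distance))


def distance_from_familiar_size(known_size, angle):
    """Distance implied by a known physical size and an observed visual angle."""
    return known_size / (2.0 * math.tan(angle / 2.0))


if __name__ == "__main__":
    # A 10 cm object subtends a smaller angle at 1 m than at 0.5 m.
    near = visual_angle(0.10, 0.5)
    far = visual_angle(0.10, 1.0)
    print(f"angle at 0.5 m: {math.degrees(near):.2f} deg")
    print(f"angle at 1.0 m: {math.degrees(far):.2f} deg")
    # Knowing the true size, the distance can be recovered from the angle.
    print(f"recovered distance: {distance_from_familiar_size(0.10, near):.3f} m")
```

Without the known size, the same visual angle is consistent with a small nearby object or a large distant one, which is why a basis for comparison is required.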

In the context of perceptual experiments, if the experimenter aims to prevent participants from

using this cue, it is essential that objects either be presented in the same size regardless of their

distance, or that size variations be independent of the object’s distance.

2.1.3 Accommodation

To bring images into focus on the retina, the curvature of the lenses of the eye requires

adjustment. This adjustment is referred to as accommodation. Closer objects require more

adjustment and, thus, sensing the amount of this adjustment might help in determining the

absolute depth of nearby objects.

Although a static focus distance may not provide much information, changes in focus can make

this depth cue effective. Moreover, this cue is generally described as a monocular depth

cue since it does not require the involvement of both eyes.

2.1.4 (Binocular) Convergence

The amount of inward turning of the eyes when a focal point is fixated determines the degree of

‘convergence,’ and thus sensing the extent of this inward turning can help in determining the

distance of an object. This cue is used to provide absolute depth information for nearby objects.
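
As a rough illustration of why convergence can carry absolute depth information, the sketch below recovers the fixation distance from the vergence angle. It assumes symmetric fixation on a point straight ahead of the observer and an interpupillary distance of 63 mm; both the default value and the function name `fixation_distance` are assumptions made for this example.

```python
import math


def fixation_distance(vergence_angle, ipd=0.063):
    """Distance (m) to a point fixated straight ahead.

    vergence_angle is the angle (radians) between the two lines of sight;
    ipd is the interpupillary distance in meters (assumed default).
    """
    return (ipd / 2.0) / math.tan(vergence_angle / 2.0)


if __name__ == "__main__":
    for degrees in (2.0, 5.0, 10.0):
        d = fixation_distance(math.radians(degrees))
        print(f"vergence {degrees:4.1f} deg -> distance {d:.2f} m")
```

Because the vergence angle changes rapidly with distance only when the fixated point is close, this relation is consistent with the cue being most useful for nearby objects.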

2.1.5 Binocular Disparity

The ability to perceive a scene from two eyes that are separated by an interpupillary distance

provides (95% of) humans with one of the most important and perceptually acute sources of

depth information (Coutant & Westheimer, 1993).

When a scene is viewed, the fixation point (also referred to as the focal point) will fall on a

particular location on the retina of each eye, resulting in zero disparity. One can furthermore

envisage an imaginary geometric arc called the horopter, comprising all points in space, including

the fixation point, that also have zero retinal disparity. Other points that are closer or farther from

this arc are mapped onto disparate locations on the two retinas, which are nevertheless fused into

a single image in depth. The horopter thus provides a reference plane from which the ordinal

depth of other objects can be judged. Objects that are in front of the horopter (closer to the

observer) will result in fused images with crossed disparity, whereas objects that are behind the

horopter (farther from the observer) will result in fused images with uncrossed disparity. Based

on the amount of retinal disparity in the projection of each point to each eye, the visual system is

thus able to discern the ordinal depths between two points in space via the binocular disparity

depth cue (Patterson, 2009).
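
The distinction between crossed and uncrossed disparity can be illustrated with a simplified calculation. The sketch below considers only points lying straight ahead of the observer and expresses disparity as the difference between the vergence angle of the point and that of the fixation point. The function names, the sign convention (positive = crossed) and the default interpupillary distance of 63 mm are assumptions made for illustration.

```python
import math


def vergence_angle(distance, ipd=0.063):
    """Angle (radians) between the lines of sight when fixating at `distance`."""
    return 2.0 * math.atan(ipd / (2.0 * distance))


def relative_disparity(point_distance, fixation_distance, ipd=0.063):
    """Disparity of a point relative to fixation, for points straight ahead.

    Positive values denote crossed disparity (point nearer than fixation),
    negative values denote uncrossed disparity (point farther than fixation).
    """
    return vergence_angle(point_distance, ipd) - vergence_angle(fixation_distance, ipd)


def classify_disparity(point_distance, fixation_distance, ipd=0.063):
    """Label the disparity of a point as crossed, uncrossed or zero."""
    d = relative_disparity(point_distance, fixation_distance, ipd)
    if d > 0:
        return "crossed"
    if d < 0:
        return "uncrossed"
    return "zero"


if __name__ == "__main__":
    for z in (0.4, 0.5, 0.6):
        print(z, classify_disparity(z, fixation_distance=0.5))
```

In this simplified setting, a virtual object rendered behind a fixated real surface would carry uncrossed disparity relative to that surface.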

The importance of binocular disparity in perceiving depth was first shown through the invention

of the stereoscope by Wheatstone (1838), where a pair of flat drawings were used to achieve a

three-dimensional percept of an object. Later, in 1960, by introducing the concept of random dot

stereograms, Julesz (1971) made a significant contribution to the science behind stereo vision. A

typical example of a random dot stereogram is one where two images consist of identical

randomly distributed dots, but with a central square region in one image that is shifted horizontally by a small

distance relative to the other image. When viewed individually, each image appears as a flat field

of random dots. However, when viewed stereoscopically, the central square region appears at a

depth that is different from the background plane of random dots. Random dot stereograms

provide evidence that binocular depth perception can be achieved without the need for

monocular form recognition.
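
The construction described above can be sketched in Python as follows. This is a minimal illustration rather than a reproduction of any stimulus used by Julesz or in this thesis: the function name `random_dot_stereogram`, the image size, the size of the square and the shift are arbitrary choices, and the strip uncovered by the shift is simply refilled with fresh random dots.

```python
import random


def random_dot_stereogram(size=64, square=24, shift=2, seed=None):
    """Return (left, right) images as lists of rows of 0/1 values.

    The right image is a copy of the left one in which a central square
    region is moved `shift` pixels to the left. The strip uncovered at the
    right edge of the square is filled with new random dots, so neither
    image on its own contains a visible outline of the square.
    """
    lo = (size - square) // 2
    hi = lo + square
    if shift < 0 or lo < shift:
        raise ValueError("shift must be non-negative and fit inside the margin")
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    for y in range(lo, hi):
        for x in range(lo, hi):
            right[y][x - shift] = left[y][x]  # shifted copy of the square
        for x in range(hi - shift, hi):
            right[y][x] = rng.randint(0, 1)   # refill the uncovered strip
    return left, right


if __name__ == "__main__":
    left, right = random_dot_stereogram(size=16, square=8, shift=2, seed=0)
    for l_row, r_row in zip(left, right):
        as_text = lambda row: "".join("#" if v else "." for v in row)
        print(as_text(l_row), " ", as_text(r_row))
```

With the left image presented to the left eye and the right image to the right eye, a leftward shift of the square in the right image corresponds to crossed disparity, so the square would be expected to appear in front of the background; shifting it in the opposite direction would place it behind.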

Although the neurophysiological processes through which the brain derives depth information

from binocular disparity are outside the scope of this thesis, it is nevertheless important to note

the importance of vergence eye movements for the effectiveness of this cue. As mentioned, the

brain uses the horizontal disparity of objects on the retina to estimate their depth relative to the

fixation point. Through the use of vergence eye movements, the fixation point (defined as the

intersection of the line of sight of the two eyes) changes, resulting in a corresponding shift in the

position of the horopter. By doing so, our visual system is able to increase the range within

which it is able to perceive depth through binocular disparity (Foley & Richards, 1972). In

addition, the brain is able to use the corresponding changes in ocular vergence as a depth cue in

its own right. Therefore, if it were possible to provide extra cues that facilitate the observer’s

ability to converge her eyes at different depths, it may be possible to use the feedback from

convergence to increase the accuracy of information obtained from binocular disparity.

2.2 Integration of Depth Cues

In natural environments, multiple depth cues typically provide both consistent and

complementary information. However, in specific cases and especially with the use of visual

displays (due to the technological limitations of implementing various depth cues), cue conflicts

do arise. In other words, two or more sources of depth information can in some cases provide

inconsistent and/or discrepant information about depth. An example of this was shown in Figure

1.3, where the occlusion cue suggests that the virtual object is in front of the person’s face while

the binocular disparity cue displays the virtual object as being inside the person’s head. Whether and

how consistent and inconsistent cues interact with each other to provide a single

depth map or shape estimate to the observer has been the topic of much research (e.g., Johnston,

Cumming & Parker, 1993; Young, Landy & Maloney, 1993; Landy, Maloney, Johnston &

Young, 1995; Kennedy, Juricevic and Bai, 2003; Wismeijer, Erkelens, van Ee & Wexler, 2010).

For example, as discussed, Cutting and Vishton (1995) divided the continuum of depth into three

regions and defined relative ‘strengths’ for each depth cue. However, while the division of this

continuum has served as a valuable and useful tool in many aspects, the complexities involved in

the interaction of depth cues led to the development of more complicated models.

One model of cue interaction, suggested by Johnston et al. (1993), is referred to as ‘weak fusion,’

or ‘weighted linear combination’. In this model, the so-called ‘weak observer’ processes the

information provided by each depth cue separately and then averages the separate depth

estimates (from each cue) by using different weights for each. The weighting of each cue

depends on its estimated reliability under the circumstances.
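
A weighted linear combination of this kind can be sketched as follows. The sketch is illustrative only: the function name `weak_fusion` is hypothetical, the weights are taken to be the reliabilities normalized to sum to one, and assigning a single numeric depth value to each cue (including an ordinal cue such as occlusion) is a simplifying assumption.

```python
def weak_fusion(estimates, reliabilities):
    """Combine per-cue depth estimates by a reliability-weighted average.

    estimates: depth suggested by each cue (same units, e.g. cm behind surface).
    reliabilities: non-negative reliability of each cue; weights are these
    values normalized to sum to one (an assumption of this sketch).
    """
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one cue must have positive reliability")
    return sum(e * r for e, r in zip(estimates, reliabilities)) / total


if __name__ == "__main__":
    # Hypothetical conflict: one cue suggests 1 cm in front of the surface
    # (-1.0), the other suggests 3 cm behind it (+3.0).
    estimates = [-1.0, 3.0]
    print(weak_fusion(estimates, [1.0, 1.0]))  # equal reliabilities
    print(weak_fusion(estimates, [1.0, 3.0]))  # second cue more reliable
```

Raising the reliability of one cue pulls the combined estimate toward the depth that cue suggests, which is the property exploited when conditions are created that change cue reliability.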

An alternative to the weak fusion model is ‘strong fusion’, which involves the cooperation of

depth cues prior to obtaining depth estimates. In other words, in contrast to the weak fusion

model, the depth cues are not processed separately; rather they interact and provide the ‘strong

observer’ with the most probable three-dimensional interpretation of the scene. Examples of this

include ‘promotion’ and ‘disambiguation’. In the former case, one cue provides compensating

information for another incomplete depth cue. In the latter, depth information provided from an

inherently ambiguous cue (e.g. kinetic depth) is disambiguated by another depth cue (Johnston et

al., 1993). According to Landy et al. (1995), models that are focused on modularity tend toward the

weak side whereas those that suggest more holistic interactions amongst cues tend toward the

strong side.

In the same paper, Landy et al. (1995) introduce the ‘modified weak fusion’ (MWF) model,

according to which interactions between different cues result in two types of information for each

cue: a commensurate depth map and an estimated measure of the cue’s reliability (which are

both based on a combination of information provided by the cue itself and those provided by

other cues). These estimates provide inputs to the final fusion (or weighted averaging) stage,

where the weights of each cue take the estimated reliabilities and the discrepancies between cues

into account. In other words, the MWF model can be simplified to the weak fusion model and

provides a means of constraining the strong fusion model to one that is able to be tested.

On the other hand, when the conflict between depth cues is large, some researchers have

suggested that usually one of two processes occurs: cue switching or cue dominance (also referred

to as ‘vetoing’) (Wismeijer et al., 2010). In the former, the visual system switches in time

between various depth percepts based on the information available from individual depth cues

(van Ee, van Dam & Erkelens, 2002). In the latter, one cue (usually the most reliable one)

overrides the other cue and depth judgements are made based on that single cue (Bülthoff &

Mallot, 1988; van Ee, Adams & Mamassian, 2003).
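
The two processes can be contrasted with a small sketch. It is a toy illustration under stated assumptions, not a model taken from the cited studies: the function names are hypothetical, dominance is modelled as selecting the single most reliable cue, and switching is modelled as drawing, at each moment, the percept of one cue with probability proportional to its reliability.

```python
import random


def cue_dominance(estimates, reliabilities):
    """Return the estimate of the most reliable cue; all other cues are vetoed."""
    best = max(range(len(estimates)), key=lambda i: reliabilities[i])
    return estimates[best]


def cue_switching(estimates, reliabilities, n_samples, seed=None):
    """Return a sequence of percepts, each taken from a single cue.

    Assumption of this sketch: the chance that a cue determines the percept
    at a given moment is proportional to its reliability.
    """
    rng = random.Random(seed)
    return rng.choices(estimates, weights=reliabilities, k=n_samples)


if __name__ == "__main__":
    estimates = [-1.0, 3.0]      # hypothetical depths suggested by two cues
    reliabilities = [0.7, 0.3]   # hypothetical reliabilities
    print("dominance:", cue_dominance(estimates, reliabilities))
    percepts = cue_switching(estimates, reliabilities, n_samples=10, seed=0)
    print("switching:", percepts)
```

In both cases the percept at any moment matches one cue exactly, in contrast to a weighted average, which generally yields an intermediate depth.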

Regardless of which model is used to describe the final percept that results from perceiving an

image such as the one depicted in Figure 1.3, an important implication of the discussed models is

that by creating conditions that result in changes in either cue reliability, cue availability or cue

inconsistency, it may be possible to aid observers in making more accurate depth judgements⁶. In

this case, with regard to the three possible models discussed above (MWF, cue switching and

cue dominance), we may be able to either:

• Increase the weighting of the cue that suggests the correct depth and reduce the weighting

of the cue that suggests the incorrect depth (if the MWF model applies), or;

• Increase the frequency with which an observer uses the cue that suggests the correct

depth (if cue switching occurs), or;

• Aid the observer in completely ignoring the cue that suggests the incorrect depth and

guide the observer towards using the cue that suggests the correct depth (if cue dominance

occurs).

⁶ In cases where the conflict of depth cues is created artificially (with the use of visual displays), an accurate depth

judgement may actually be the ‘desired’ depth percept/judgement.

Either way, creating such conditions allows us to achieve our desired depth percept. This idea is

the key to the theory behind the method investigated in this thesis and will be returned to

throughout the coming chapters.

While the issues involved in perception of depth can be discussed in much more detail, I will

now change the focus towards defining terms that are most relevant to the topic of my study.

More in-depth information about depth cues and their interactions can be found in the cited

publications as well as those not directly cited: e.g. Wickens et al. (2000), Bruce, Green &

Georgeson (2003), Parker et al. (1992), Interrante (1996).

2.3 Surfaces

Since the real object’s surface plays an important role in this study, it is important to define what

a surface is. Basically, two volumes meet at a surface. In the case of our study, the

real object meets the air at its surface. According to Kennedy and Wnuczko (2015), “a surface is

a polarized plane, that is, different on its two sides”. Generally, at any point on a non-planar and

non-spherical 3D surface there will be a unique direction in which the surface curves most

strongly, and that direction is referred to as the first principal direction. The orthogonal direction

at that point, the second principal direction, is either the direction in which the surface curves

most strongly in the opposite sense (if the surface is locally saddle-shaped) or the direction in

which the surface is most flat (if the surface is locally cylindrical or locally elliptical). In other

words, apart from the special cases of a (flat) plane (zero curvature in any direction) and a sphere

(equal curvature in all directions), there are only five generic categories of surface shapes,

defined by the signs of the curvatures along the first and second principal directions (same sign

= elliptical; both positive = convex; both

negative = concave; one positive/one negative = hyperbolic; and one zero/the other non-zero =

cylindrical). For the purposes of this thesis, we will categorize surfaces simply as flat or non-flat.
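The categories listed above can be expressed as a small decision rule on the two principal curvatures. The sketch below is illustrative only: the function names, the tolerance `tol`, and the convention that positive curvature means convex are assumptions made for this example.

```python
def classify_local_shape(k1, k2, tol=1e-9):
    """Label the local surface shape from the two principal curvatures k1 and k2."""
    zero1, zero2 = abs(k1) <= tol, abs(k2) <= tol
    if zero1 and zero2:
        return "plane"                      # zero curvature in any direction
    if zero1 or zero2:
        return "cylindrical"                # one zero, the other non-zero
    if (k1 > 0) != (k2 > 0):
        return "hyperbolic"                 # opposite signs (saddle)
    sense = "convex" if k1 > 0 else "concave"
    if abs(k1 - k2) <= tol:
        return "sphere, " + sense           # equal curvature in all directions
    return "elliptical, " + sense           # same sign, unequal magnitude


def is_flat(k1, k2, tol=1e-9):
    """The coarser flat / non-flat distinction adopted in this thesis."""
    return classify_local_shape(k1, k2, tol) == "plane"


for k1, k2 in [(0.0, 0.0), (0.5, 0.0), (0.5, -0.2), (0.5, 0.2), (-0.5, -0.2), (0.3, 0.3)]:
    print((k1, k2), classify_local_shape(k1, k2), is_flat(k1, k2))
```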

The limits to a surface are defined by its edges (Kennedy and Wnuczko, 2015). For example, in

the case of a cylinder placed against a wall, the cylinder’s edges can be observed because

wherever the cylinder’s surface ends, an occluding boundary is formed by the cylinder’s surface,

taken with respect to a vantage point. To one side of the boundary is the cylinder’s surface and to

the other side is the wall, as far as the vantage point is concerned. This occluding boundary projects

a straight line in the optic array at the vantage point.


In some ways, our vision can be considered as “superficial” since, when we look at objects

around us, what is almost always perceived is the front ‘surface’ of opaque objects (Kennedy and

Wnuczko, 2015). It is for this reason that, throughout this thesis, the ‘real object’ and the ‘real

object’s surface’ will be used interchangeably. For the sake of simplicity, it might also be

referred to as the ‘real surface’.

2.4 Texture

Gibson (1950) listed “the quality of being visually resistant or ‘hard’ ” as one of the most

essential properties of a surface. He equated this property to having ‘texture’ which can be

perceived when inhomogeneous retinal stimulation occurs. ‘Texture’ can also be thought of as

what is provided by the material underlying a surface (e.g. granite or wood). In addition, the

optical projection of the texture depends on the structure or shape formed by the material (e.g. a

ball or an oar), as well as the structure of the surface as bumps and hollows.

As it pertains to the research focus of this thesis, I have placed surfaces into three categories:

containing no visible texture, containing 2D (textural) elements, or containing 3D (textural)

elements. While a smooth surface can belong to one of the first two groups, a surface containing

bumps or ridges will belong to the last.

With regards to 2D textures, Rao and Lohse (1993) used a combination of statistical techniques

to identify the features that can be used to characterize different texture patterns. Their study

revealed two major dimensions that they subjectively interpreted as “periodicity vs. irregularity”

(periodic meaning that the statistical properties of local patches of the surface are uniform over

the surface) and “directionality vs. nondirectionality” (nondirectional meaning that the surface

elements have no orientation bias). These two dimensions accounted for 90% of the variability in

their subjects’ classification of 30 pictures from Brodatz’s album (Brodatz, 1966). They also

found a third dimension characterized as representing “structural complexity” which accounted

for another 6% of the variability.

2.5 Transparency

Referring to a summary provided by Tsirlin, Allison and Wilcox (2008), one can consider

transparency to have three different primary manifestations:


a) Glass-transparency, which is essentially what is observed when light passes through clear

materials such as glass;

b) Translucency, which is what occurs when light is diffused as it passes through a material

and causes objects to appear less clear on the other side; and

c) Pseudo-transparency, which is the result of light passing through gaps in non-transparent

objects, such as lace or wire fences7. Based on this concept, Julesz (1971) further defined

Stereo-Transparency as Pseudo-Transparency that is perceived in surfaces defined solely

by binocular disparity.

The common theme that ties these three groups together is the fact that an object that is known to

possess some form of transparency can not only be seen but can also be seen through. What

allows for such an odd co-existence, as asserted by Interrante (1996), is our perceptual and

cognitive ability to reconstruct a continuous representation of an opaque object and a transparent

object at any given location on the transparent surface as seen from the vantage point.

In terms of the present research topic, it can be concluded that if we were able to convey the

existence of an object behind another object’s surface while preserving sufficient information

about the two objects, we may be able to create the impression of ‘transparency’ of the closer

object’s surface.

Having provided this brief perceptual background and definitions of terms, we can now move on

to the next chapter, which provides an overview of existing solutions that aim to deal with the

challenges of X-ray vision in AR.

7 In fact, “transparency” in computer graphics is actually almost always pseudo-transparency, where an “additive”

mathematical model is used to suggest the presence of little holes on the surface of the occluder, through which the

occluded object is presented. The mathematical model used to realise this effect incorporates a linear combination of

the intensities $I_f$ and $I_b$ of the occluding (foreground) and occluded (background) surfaces respectively, weighted by

the relative concentration $\alpha$ of opaque material in the occluder: $I = \alpha I_f + (1 - \alpha) I_b$ (Interrante, 1996).
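As a worked illustration of the linear combination in footnote 7, the sketch below blends single intensity values and rows of intensities. The function names and the sample values are hypothetical; only the formula itself comes from the footnote.

```python
def blend(i_f, i_b, alpha):
    """Linear combination of foreground and background intensity for one pixel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * i_f + (1.0 - alpha) * i_b


def blend_row(foreground, background, alpha):
    """Apply the same blend to every pixel in a row of intensities."""
    return [blend(f, b, alpha) for f, b in zip(foreground, background)]


# alpha = 1 shows only the occluder, alpha = 0 only the occluded surface.
print(blend(200.0, 100.0, 1.0))
print(blend(200.0, 100.0, 0.0))
# An occluder with a quarter of its area opaque: 0.25 * 200 + 0.75 * 100.
print(blend(200.0, 100.0, 0.25))
print(blend_row([200.0, 0.0], [100.0, 100.0], 0.5))
```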


Chapter 3

X-Ray Vision in AR: Literature Review

As discussed in Chapter 1, one of the prominent applications of AR is X-ray vision, which

allows the observer to ‘see through’ a real surface and perceive images of the structures beneath

the surface. To achieve this, images of what is beneath the surface are usually graphically

combined with the image of the real surface8. Achieving the metaphor of X-ray vision with AR

displays involves several challenges. This chapter

provides a review of the literature related to challenges involved in the use of AR for X-ray

vision applications, based upon which an overview of existing solutions and gaps will be

presented.

3.1 Challenges

As mentioned, to achieve X-ray vision in AR, virtual objects that are placed behind real objects

are superimposed9 onto an image of the real object. If the observer is able to see through the real

object’s surface and perceive the virtual object, based on the definition provided for

‘transparency’, the observer will perceive the real object to be transparent. For this to happen,

two requirements must be met:

1. The observer must be able to perceive the correct depth order between the virtual and the

real object:

• Johnson et al. (2003) used an OST display for overlaying preoperative MRI/CT

scans onto the patient in surgery and reported that surgeons would sometimes

perceive the virtual image (scans) as floating above the surface, even though they

were rendered in depth using a stereoscopic display to be behind the body.

8 These internal structures that are ‘added onto’ the image of the scene (which consists of the real object) may be

images either that were previously recorded or that are entirely computer-generated. For simplicity, in either of these

cases, the structure of what is behind the real surface will be referred to as the virtual object.

9 Although superimposing is the operation that is actually being carried out when the virtual objects are being added

to the real image, in light of our goal of having the virtual objects appear to be behind the real ones, a better term to

use would arguably be ‘subposing’ or ‘subposition’.


2. In the context of stereoscopic displays, the observer must be able to fuse the virtual

objects and real objects simultaneously:

• Using the concept of ‘stereo-transparency10’, Akerstrom and Todd (1988)

investigated the challenges involved in perceiving transparent surfaces. The

results from their experiments revealed that perception of overlapping transparent

surfaces is more difficult and requires more time than non-transparent surfaces.

They also found that the perception of stereo transparency becomes more difficult

when the distance between the overlapping planes and/or the density of elements

on the planes is increased.

• Johnson et al. (2003) asked participants to look through real stereo images of a

smooth skull, comprising the eye sockets and nasal bone, a brain in the skull and

natural foliage, to see a random dot target rendered behind the surface. The target

was initially presented at a large distance from the real surface, resulting in double

vision. The participants were then asked to move the target closer to the surface

until fusion was possible. This distance was recorded as the maximum distance at

which an observer could still fuse a target rendered behind a transparent surface.

Results revealed that the real object’s surface image content affected this distance

(with no clear trend). Hou and Milgram (2003) also confirmed this finding by

performing an experiment where participants were asked to manipulate a virtual

object near the surface of a real object. The results of their experiments showed

that, as the texture density of the real surface increased, the maximum distance at

which an observer could still fuse the virtual object behind the real surface was

reduced.

Once the impression of X-ray vision is achieved, there is also the possibility that the accuracy of

absolute depth judgements about the real and virtual object will be adversely affected:

10 Stereo transparency is defined as Pseudo-Transparency that is perceived in surfaces defined solely by disparity

(Julesz, 1971).


• Ellis and Bucher (1994) asked participants to judge the depth of a virtual tetrahedron in

the absence and presence of a real checkerboard pattern placed in front of it. Results

showed that the introduction of the (opaque) checkerboard caused the mean position of

the virtual image of the tetrahedron to approach the viewers significantly.

• Ellis and Menges (1998) found that the presence of a visible real surface spatially close to

a virtual object significantly influenced the observer’s depth judgement of the virtual

object, resulting in the object appearing nearer than it really was. Singh et al. (2010) also

confirmed these results using a replication of Ellis and Menges’ experiments.

• Johnson et al. (2003) reported that, even when surgeons would see the virtual objects as

being below the real surface, they would perceive its position closer to the surface than

suggested by the binocular disparity cue.

• Edwards et al. (2004) also found reduced accuracy for depth judgements of a virtual

object when viewed through a physical transparent surface. In their experiments, the

virtual object was perceived as farther behind the surface compared to its actual position.

Their results also showed that this depth judgment error depended on the actual depth of

the virtual object.

Since all of these issues were identified in cases where OST displays were used, one may argue

whether these issues also apply to video based displays. However, it is important to reiterate that

the most relevant distinction between OSTs and video-based displays in this context is the

opacity of the virtual object (virtual objects cannot completely occlude real ones in OSTs).

Therefore, it is expected that, if these issues exist with OSTs, they should be further exacerbated

using video overlays, for which virtual objects tend to be completely opaque.

Another important point to make relates to the discrepancy between the findings of Edwards et

al. (2004), who noted that the distance to a virtual object tended to be overestimated when seen

behind a nearer transparent real surface, and those of Ellis and Bucher (1994) and Ellis and

Menges (1998), who reported that the distance to the virtual object was

underestimated when seen behind a nearer opaque real surface. One possible explanation for this

discrepancy can be the distance at which the real surface was placed from the virtual object. In

both Ellis and Bucher’s (1994) and Ellis and Menges’ (1998) experiments, the real surface was


placed 300 mm in front of the virtual object. In Edwards et al.’s (2004) experiments, however,

the real surface was placed anywhere from 80 mm in front of to 20 mm behind the virtual object, and the

overestimation of the virtual object’s depth occurred only when the real surface was placed at a

distance less than 20 mm in front of the virtual object. Therefore, as Edwards et al. found from

their experiments, a plausible explanation might be that the direction of the error in perceived

depth of the virtual object depends on the distance between the real surface and the virtual

object.

A summary of the above literature is provided in Table 3.1. As can be seen, the general trend

shows incorrect depth order and inaccurate depth judgements of the virtual object, regardless of

whether the real surface is flat or curved or whether the virtual object is solid or wireframe.

Table 3.1: Summary of literature review on perceptual issues of X-ray vision in AR.

| Publication | Display Used | Real Object | Virtual Object | Identified Issues |
|---|---|---|---|---|
| Ellis and Bucher (1994) | Stereo OST | Flat checkerboard pattern | Tetrahedron (wireframe or solid) | - Virtual object appearing closer to observer |
| Ellis and Menges (1998) | Stereo OST | Flat checkerboard pattern | Wireframe pyramid | - Virtual object appearing closer to observer |
| Johnson et al. (2003) | Stereo OST | - A smooth skull; - The front of the skull including the eye sockets and nasal bone; - Brain in the skull; - Natural foliage | Random dot target | - Real object’s surface content reduced distance with which virtual object could be rendered while maintaining ability to fuse image |
| Johnson et al. (2003) | Stereo OST | Real anatomy | MRI/CT scans | - Virtual object appeared to be floating above real object’s surface; - Virtual object appearing closer to observer |
| Edwards et al. (2004) | Stereo OST | Phantom mimicking skin and brain | Truncated cone | - Virtual object appearing farther from observer when placed just behind the real surface (distance less than 20 mm); - Reduced accuracy of depth judgements in presence of real object |


3.2 Review of Proposed Solutions

As mentioned in Chapter 2, the strengths of depth cues differ across different regions of space

(i.e., near-field, medium field, and far field distances). As a result, when investigating perceptual

challenges and solutions, it is important to consider the specific application and depth region for

which a solution is being proposed. Even though medical applications of AR justify the

importance of studying the relevant perceptual challenges in achieving X-ray vision for near-

field distances, there is relatively little research in this field. This is because most of the current

research in X-ray vision applications of AR is focused on mobile applications, presumably since

mobile use is considered potentially the major consumer application of this technology (Livingston et

al., 2013). Moreover, although a number of visualization techniques have been proposed to deal

with these challenges, there are very few studies that have provided experimental results

investigating the effectiveness of their proposed solutions. In this section, an overview of the

general methods used for dealing with perceptual issues in X-ray vision for near field

applications is presented.

3.2.1 Cutaway or Virtual Hole

To aid the observer in perceiving the virtual object as being placed behind the real surface, one

of the metaphors that has been used is referred to as the cutaway or virtual hole. In this method, a

hole appears to be carved on the real surface, through which the virtual object is presented. This

virtual hole may have a 3D structure showing the sides and bottom of the hole that is placed

behind the virtual object (Livingston et al., 2013). An example of this method is shown in Figure

3.1.

As part of their aforementioned experiments, Ellis and Menges (1998) used a virtual hole in the

real object’s surface as a visualization technique that could help in conveying correct depth

information. The results of their experiments showed that using this metaphor reduced the depth

judgment bias. Rosenthal et al. (2002) also used this method for comparing AR ultrasound

guidance systems for targeting needle biopsies in phantoms. An image showing the view from

the head-mounted display used in that study is presented in Figure 3.1. Their results showed

lower mean errors in needle placement when using this technique, compared to when AR

displays were not used.


Although their study was demonstrative of the potential benefits of using AR, it did not explicitly

address the advantage of this specific visualization technique. In general, in cases where

preservation of the content of the real object’s surface is important, this method can definitely

prove problematic, as the virtual hole clearly eliminates all the information of the real object’s

surface (Bajura, Fuchs & Ohbuchi, 1992).

Figure 3.1: Example of virtual hole metaphor used by Rosenthal et al. (2002) for the task of

targeting needle biopsies in phantoms. The vertical lines are meant to show the sides of the

virtual hole. Reprinted from Medical Image Analysis, Vol. 6, Rosenthal et al., Augmented reality

guidance for needle biopsies: An initial randomized, controlled trial in phantoms, 313-320, 2002,

with permission from Elsevier.

3.2.2 Modified Opacity

Another way to achieve the metaphor of X-ray vision is to depict the real object as being

partially transparent, by reducing the opacity of the real object pixels using image processing

techniques (Livingston et al., 2013). Rather than uniformly reducing the opacity of the entire real

object, Bichlmeier, Wimmer, Heining & Navab (2007) endeavoured to achieve a natural looking

transparency by defining an optimized opacity value, which was a function of the surface

curvature and the angle and distance between the observer and the image. In short, their model


involved assigning higher opacity values to regions with higher curvature and larger angle and

distance relative to the observer’s viewpoint. Sample images of their visualization technique,

implemented using a video based display, are shown in Figure 3.2. Although Bichlmeier et al.

(2007) demonstrated the feasibility of their suggested technique, they unfortunately did not

evaluate its efficiency regarding the correct perception of relative and absolute distances of

objects within the AR scene.

Figure 3.2: Example of the Modified Opacity visualization method used by Bichlmeier et al.

(2007). Reprinted by permission from IEEE 2007.

In general, there are two aspects to the limitations involved in the use of this technique:

computational and visual. Firstly, the use of this image processing technique is computationally

expensive and requires both a viewer tracking system and an accurate model of the real object.

For example, Bichlmeier et al. (2007) mentioned that high quality visualizations suffer from low

performance speed since the ‘quality of the transparency effect’ depends on the ‘accuracy level

of the surface model’. Secondly, the modified opacity technique can also be affected by the

display capabilities. In the case of OST displays, for example, virtual objects possess lower

brightness and are not able to completely cover real objects. On the other hand, video based

displays (which are intended for the use of the AR method described in this thesis) can allow

virtual objects to completely occlude real ones, which is the opposite of the desired effect for

conveying the metaphor of X-ray vision (Livingston et al., 2013).

However, it is worth mentioning that the two aforementioned techniques have been shown to be

superior to some alternative visualization techniques. For example, Sielhorst, Bichlmeier,

Heining & Navab (2006) evaluated 7 different visualization techniques using a stereoscopic

video based head-mounted display. The 7 techniques include: Surface rendering opaquely

superimposed; Surface rendering transparently superimposed; Surface rendering through a


virtual window in the skin; Triangle mesh; Volume rendering model through a virtual window in

the skin; Surface rendering with a glass effect of the skin; and Volume rendering superimposed.

These visualizations are presented in the same order (1-7) in Figure 3.3. Based on their

terminology (Sielhorst et al, 2006):

• Triangle mesh is the case where the surface of the bone structure “is stored in the

computer as a list of triangles” and visualized with the edges of these triangles (image 4

in Figure 3.3).

• Surface rendering involves the visualization of the bone structure surface with

“untextured but shaded solid triangles” (images 1, 2, 3 and 6 in Figure 3.3).

• Volume rendering “represents the whole volume rather than the surface” of the bone

structure with transparency values assigned to emphasize the bone structure (images 5

and 7 in Figure 3.3).

• Glass effect is the case where the surface of the skin “is rendered transparently and

achromatically” showing reflections of a virtual light source (image 6 in Figure 3.3).

In their experiments, 20 surgeons were given the task of moving a pointer to a specific point

on the surface of the spine (virtual object) inside of a phantom (real object). Amongst the

various visualization techniques they tested, they found the two best visualization modes to

be the transparent surface rendering and the virtual window surface rendering (images 2 and

3 respectively in Figure 3.3). These two methods can be considered analogous to the ‘virtual

hole’ and ‘modified opacity’ technique discussed above.


Figure 3.3: The 7 evaluated visualizations studied by Sielhorst et al. (2006). Visualizations 2 and

3, corresponding respectively to surface rendering transparently superimposed and surface

rendering through a virtual window in the skin, were determined to be the best in terms of depth

perception and effectiveness. Reprinted by permission from RightsLink: Springer Berlin

Heidelberg, International Conference on Medical Image Computing and Computer-Assisted

Intervention, Depth perception–a major issue in medical AR: evaluation study by twenty


surgeons, Sielhorst, T., Bichlmeier, C., Heining, S. M., & Navab, N., Copyright 2006 by

Springer-Verlag Berlin Heidelberg.

3.2.3 Context-preserving Techniques

A more advanced type of method that aims to deal with the challenges involved in the use of AR

displays for achieving X-ray vision is an image-based technique that is referred to as ‘context-

preserving’. Context-preserving refers to methods in which the removal of the real object surface

is controlled when imposing the virtual image onto it, such that certain details of the real surface

are preserved. In other words, the image of the real object is used to extract a partial model of the

real object that includes edges (Kalkofen, Mendez & Schmalstieg, 2007; Avery et al., 2009),

salient regions (Lerotic et al., 2007; Sandor et al., 2010) or a combination of salient regions,

edges and texture details (Zollmann et al., 2010). In addition to preserving the most important

information about the real object’s surface, these methods use these features to occlude the

virtual object, thereby suggesting the correct depth order between the real and virtual object.

Where such features do not exist, synthetic features can be added onto real surfaces. For

example, Zollmann et al. (2010) suggested adding synthetic features based on ‘tonal art maps’, to

provide compensation for surfaces where too few features exist. In their work, by adding a

hatching pattern to the surface of the pavement in an outdoor scene and having the pattern

occlude parts of the virtual underground pipes, they provided occlusion cues, which suggest that

the virtual pipes are in fact located underneath the pavement.
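The general idea behind the context-preserving family can be sketched as follows. This is not the algorithm of any of the cited works: it uses a crude forward-difference edge measure as a stand-in for their edge or saliency detectors, operates on small grayscale grids stored as nested lists, and all names (`edge_strength`, `composite`, `threshold`) are hypothetical. Real-image pixels that lie on a strong edge are kept and therefore occlude the virtual object; elsewhere the virtual object is shown.

```python
def edge_strength(image):
    """Sum of absolute forward differences; a crude stand-in for an edge detector."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = image[y][min(x + 1, width - 1)] - image[y][x]
            dy = image[min(y + 1, height - 1)][x] - image[y][x]
            out[y][x] = abs(dx) + abs(dy)
    return out


def composite(real, virtual, virtual_mask, threshold):
    """Show the virtual pixel only where the real image has no strong edge."""
    edges = edge_strength(real)
    result = []
    for y in range(len(real)):
        row = []
        for x in range(len(real[0])):
            if virtual_mask[y][x] and edges[y][x] < threshold:
                row.append(virtual[y][x])    # virtual object visible here
            else:
                row.append(real[y][x])       # preserved real detail occludes it
        result.append(row)
    return result


# A real image with one vertical step edge, fully covered by a uniform virtual object.
real = [[10, 10, 200, 200] for _ in range(3)]
virtual = [[99] * 4 for _ in range(3)]
mask = [[True] * 4 for _ in range(3)]
for row in composite(real, virtual, mask, threshold=50):
    print(row)
```

In this toy case only the column on the step edge retains the real image, so the edge appears to lie in front of the virtual object. A surface with no such edges would leave nothing to preserve, which is the situation that the synthetic features described above are meant to address.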

Within the literature mentioned above, the one method that has been applied to the medical

domain is that of Lerotic et al. (2007), who used the da Vinci system (consisting

of a video based stereoscopic display) to compare the effectiveness of their proposed

visualization technique with traditional overlays11. Their technique involved rendering the real

surface as ‘translucent’ by adjusting its opacity value while detecting and preserving its salient

features, as shown in Figure 3.4. In their first experiment, participants were asked to locate eight

virtual spheres placed along different depths of a model of a real thorax. Results showed

11 “Traditional overlays” refers to cases where the virtual object is overlaid onto the real object without any

modifications.


significant improvement in the accuracy of depth judgements using their solution in comparison

with traditional overlays.

Although demonstrated to be beneficial, extracting these partial models requires computationally

expensive rendering steps or special purpose hardware (Livingston et al., 2013). These methods

are also not applicable to cases where occlusion of the virtual object is difficult to realize, or is

not desired. Moreover, most of these methods require the real object’s surface to possess salient

features in order for the algorithms to function effectively.

Figure 3.4: A context-preserving visualization used by Lerotic et al. (2007). Reprinted by

permission from RightsLink: Springer Berlin Heidelberg, Medical Image Computing and

Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-photorealistic rendering for

augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang, G. Z., Copyright 2007 by

Springer-Verlag Berlin Heidelberg.

3.3 Criteria for Success

From the challenges and proposed solutions presented in Sections 3.1 and 3.2, it can be

concluded that the success of methods used to achieve X-ray vision with AR displays requires

considering several indicators. In particular, an effective method must:

1. Provide the observer with the information that permits her to understand the depth order

between the virtual and real objects: In simpler terms, the observer must be able to

perceive the virtual image as being behind the real object surface (and thus inside the real

object).


2. Preserve some amount of detail about both the virtual objects and the surface of the real

objects that is sufficient for carrying out one’s intended task: Not surprisingly, achieving

these two properties typically involves a compromise. If the real object surface is able to

occlude portions of the virtual object (allowing the observer easily to infer the virtual

object as being behind the real surface), at least some details of the virtual object may be

lost. On the other hand, if the virtual object is overlaid onto the real surface without

occlusion by the real surface, in addition to losing details of the real surface, the depth

order of the virtual and real objects may become incomprehensible.

3. Require a reasonable computational load for creating the final rendering: For instance, as

discussed, some methods require the computation of an accurate 3D model of the

physical environment to create a convincing composition of virtual and physical objects.

Therefore, to summarize, a convincing solution should be one that finds an appropriate level of

compromise between depth perception and information preservation (of both real and virtual

objects), while minimizing computational cost. In the following chapter, I present the reasoning

behind the idea of this thesis, while justifying its criteria for success by referring to the concepts

presented above.


Chapter 4

Our Method

4.1 Use of Texture

As discussed in Section 3.2, to deal with the challenges involved in achieving X-ray vision with

AR displays, various researchers have suggested the addition of some sort of ‘texture’ to the real

surface. For example, Zollmann et al. (2010) used the addition of synthetic features to images of

real pavement to occlude superimposed virtual pipes, thereby suggesting that the virtual pipes are

in fact located underneath the pavement.

In the context of stereoscopic displays, however, the research by Interrante et al. (1997) seems to

be most relevant to the topic of this thesis. In their work, Interrante et al. suggested using sparse

opaque textures that were specifically designed to convey intrinsic surface shape properties, to

improve perception of depth and spatial understanding of the surface. By adding grid lines or

strokes to the surface of a 3D computer-generated transparent object, Interrante et al. were able

to use a combination of the occlusion cue, the binocular disparity cue, the relative density cue

and motion parallax12 to improve depth perception. Their claim was based on the idea that

consistent depth cues reinforce each other, leading to improved depth perception (Interrante,

1996). Though their work was done in a completely virtual environment, the premise of their

work could be formulated as positing that adding texture to a surface can facilitate the veridical

perception of depth from binocular disparity.

If this is in fact true, for cases where the occlusion and binocular disparity cues are in conflict, the

addition of texture to a surface may result in an increase of the availability and/or reliability of

the binocular disparity cue so that it can dominate the occlusion cue. This reasoning is also in

line with the theory that was presented in Section 2.2 (on integration of depth cues). As may be

recalled, based on the perceptual models on depth cue integration, cues are either combined

using weights that are based on their respective reliabilities, switched between or ignored due to

the presence of a more reliable cue. Therefore, it may be possible to create conditions that result

12 For a brief explanation of these two latter cues, see Appendix D.


in changes in either cue reliability, cue availability or cue inconsistency, to aid observers in

making more accurate depth judgements. For example, in Figure 4.2(a) and Figure 4.2(b),

described below, even though the occlusion cue is suggesting that the virtual object is in front of

the real surface, it may be possible to reduce the weighting of the occlusion cue, reduce the

frequency with which the observer uses the occlusion cue or to help the observer in ignoring this

cue by increasing the availability and/or reliability of the binocular disparity cue. In the

following section, we propose that adding a random dot texture pattern to a real surface in a

stereoscopic display is a potentially effective means of increasing the availability and/or

reliability of the binocular disparity cue (through supporting vergence eye movements). If this is

done successfully, the observer should be able to perceive the virtual object as lying behind the

real surface.
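For readers who want a concrete picture of what a random dot pattern with a given dot size and dot density might look like computationally, the following is a minimal sketch. It is not the procedure used in this thesis. It assumes square dots placed on a regular grid of `dot_size` by `dot_size` cells, and it defines `density` as the probability that a cell contains a dot. Both the definition and the function names are hypothetical. The sketch also handles a single image only and ignores the stereoscopic placement of the dots on the surface.

```python
import random


def random_dot_mask(height, width, dot_size, density, seed=0):
    """Boolean mask in which True marks pixels covered by a dot."""
    rng = random.Random(seed)               # fixed seed so the pattern is repeatable
    mask = [[False] * width for _ in range(height)]
    for top in range(0, height, dot_size):
        for left in range(0, width, dot_size):
            if rng.random() < density:      # this cell receives a dot
                for y in range(top, min(top + dot_size, height)):
                    for x in range(left, min(left + dot_size, width)):
                        mask[y][x] = True
    return mask


def overlay_dots(image, mask, dot_value=0):
    """Paint the dots onto a grayscale image of the real surface."""
    return [
        [dot_value if covered else pixel for pixel, covered in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


surface = [[128] * 12 for _ in range(6)]    # hypothetical uniform gray surface
mask = random_dot_mask(6, 12, dot_size=2, density=0.3, seed=42)
for row in overlay_dots(surface, mask):
    print("".join("#" if p == 0 else "." for p in row))
```

Larger values of `dot_size` or `density` cover more of the surface, which gives a rough sense of the trade-off between supporting the binocular disparity cue and preserving surface detail.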

4.2 Stereo-Translucency

In the context of stereoscopic video based AR displays, when a virtual object is correctly

rendered (stereoscopically) in front of a real object, the binocular disparity cue and the occlusion

cue together provide consistent information, allowing the virtual object to be perceived

unambiguously as being in front, as illustrated in Figure 4.1 (see footnote 13).

Figure 4.1: Stereo pairs. The blue circle indicates a virtual object rendered in front of the surface

of a real object (the face). In this case, the binocular disparity cue and the occlusion cue provide

13 It is worth noting that this is not the case with stereoscopic OST displays. With OST displays, when the virtual

object is rendered in front of the real surface, the occlusion cue may be inconsistent with the binocular disparity cue

since the virtual object will appear to be transparent.


consistent information, allowing the virtual object to be perceived unambiguously as being in

front of the person’s face. Note that the middle face shows less mouth than the left face and the

eyebrows are more extensive in the left face.

The addition of random dot patterns to the real surface of Figure 4.1 should in this case have no

effect on how the virtual object is perceived relative to the real surface. However, in cases where

the virtual image is rendered stereoscopically behind a real object, even though the binocular

disparity cue is communicating that the virtual object is behind the real surface, the occlusion cue

nevertheless continues to suggest that the virtual object is in front (Drascic & Milgram, 1996).

An example of this situation is depicted in Figure 4.2(a). We refer to this case as being

incongruous, as a consequence of the conflict between these two very important depth cues –

occlusion and binocular disparity.


Figure 4.2: In both sets of stereo pairs (a) and (b) (identical to Figure 1.3), the blue virtual circle

is stereoscopically rendered behind the face. In this case, the binocular disparity cue and the

occlusion cue provide inconsistent information, leading to a cue conflict: (a) Untreated image.

An enlarged (landscape) version of this image is provided in Appendix C2 to help in perceiving

the desired percept; (b) Addition of random dots onto the face (using a projector). If successful,

the reader should more easily perceive the virtual circle as being behind the face in (b), relative

to (a).

To help the observer contend with the sometimes perplexing effects of incongruity, and to

facilitate perception of the correct depth order of the virtual object and the real surface, we

propose the addition of random dot patterns onto the real surface. By comparing Figure 4.2(b) (see footnote 14)

with Figure 4.2(a), one should get the impression that perceiving the virtual object as being

behind the surface is easier when the random dot pattern is present (Figure 4.2(b)) than

when it is not (Figure 4.2(a)) (see footnote 15).

Expanding upon what was discussed in the previous section, one explanation for the expected

effect is that by adding random dots to the real object surface, we are able to provide observers

with distinct fixation points (in the form of the edges of the dots), thus guiding them in making

vergence eye movements (between a virtual object and the real surface) and using the additional

vergence cue to make better depth judgements. By doing so, we should be able to increase the

availability and/or reliability of the binocular disparity cue such that the observer is more easily

able to perceive the virtual object as being behind the real surface (despite the conflicting

occlusion cue). Furthermore, because the virtual object is perceived as being behind the real

surface, which remains visible, observers are able to perceive the real surface as being

“transparent” – i.e. X-ray vision.

It is important to clarify the terminology we are using here. As discussed in Chapter 2, one of the

primary manifestations of transparency is Pseudo-Transparency, which is the result of light

14 Note that this figure is identical to Figure 1.3.

15 Note that, unless the reader is able to view these stereo pair images stereoscopically, it will not be possible to

perceive any differences with regards to where the virtual object is located relative to the real surface.


passing through gaps in non-transparent objects, such as lace or wire fences (Tsirlin et al., 2008).

Julesz (1971) also used this concept to define Stereo-Transparency, which is Pseudo-Transparency

that is perceived in surfaces defined solely by disparity. However, we have

hesitated to apply Julesz’s term, Stereo-Transparency (or Stereo-Pseudo-Transparency), to the

phenomenon described above, because that percept is due not to binocular disparity alone,

but rather to the conjunction of both binocular disparity and occlusion

cues16. Otherwise stated, what we observe is not due to light passing through gaps in non-

transparent surfaces, and thus does not fit the accepted constraints of Pseudo-Transparency. One

option might be to label the observed phenomenon as “Pseudo-Translucency” (or “Stereo-

Pseudo-Translucency”), a term that could be further justified by the fact that virtual objects that

are rendered stereoscopically behind a real surface but nevertheless occlude that surface give the

overall impression of a diffuse surface, somewhat akin to frosted glass. As discussed later on in

this thesis, however, we have avoided using the term “translucency” in the subjective judgement

components of our experiments, due to our (untested) suspicion that participants would likely

be confused by questions that are framed using that term. In the remainder of this thesis, we use

the term “transparency” in our discussion, to reflect the instructions given to participants.

Another hypothesized effect of the addition of a dot pattern onto a surface is the expected

creation of “holes” on the surface wherever the (black) dots are added. The proposed hypothesis

related to this is that, when observers are faced with the aforementioned cue conflict, they are

given the impression of looking through these holes in the real surface (the dots being the holes)

at the virtual object placed underneath the real surface. At the same time, however, because the

non-dotted parts are still occluded by the virtual object while remaining visible, this adds to the

impression of translucency, as discussed above.

Moreover, by using a uniform colour for the dots in the dot pattern (as shown in Figure 4.2 (b)),

it is postulated that a potential consequence of the virtual object occluding the dots may be the

illusion of a uniform background, of the same colour as the dots, lying behind the virtual object,

16 It is worth noting that the difference between the occlusion cue in our stimuli and Julesz’s random dot patterns is

its congruity with the binocular disparity cue. In other words, while the occlusion cue is in agreement with the

binocular disparity cue in Julesz’s random dot patterns, the incongruity of the occlusion cue and the binocular

disparity cue is what actually leads to the impression of ‘transparency’ in the case of our stimuli.


within the real object. As explained in Figure 4.3, the reasoning here is that, in contrast to the

non-dot portions of the pattern, which retain all of the original surface information, the black dot

portions occlude the information on the surface. Consequently, it may be possible for an observer

to perceive all of the black dot parts of the image as belonging to a large black background17.

This percept is likely to be reinforced further by the portions of the black background that are

occluded by the real surface and the virtual object that are clearly in front of that background.

17 Note that there is no reason for the random dots necessarily to be black, and thus for the background always to be

perceived as being black. In principle any colour of dots should produce the same effect, although obviously some

colours will be more appropriate than others, depending on the colours and features contained in the real surface.

For example, because of the ‘dark is deep’ bias, it is suspected that random dots with darker colours compared to the

real surface would work best.


Figure 4.3: Hypothesised percept when using a dot pattern as a means of surface manipulation.

The top portion of the image shows a magnified (2D) view of the real surface (skin), which has

been altered by adding a random dot pattern. The lower portion of the image shows the top view

of the observer as he/she may perceive the image if this percept is achieved.

4.3 Information Preservation

Although other patterns such as a checkerboard might also achieve the impression of

transparency, the randomness of random dot patterns, in contrast to regular patterns, is intended to aid

users in focusing their attention on the surface rather than on the pattern itself. In other words, the

use of prominent patterns that take on a character of their own may lead to adding visual noise

rather than enhancing the overall effectiveness of the presentation (Interrante, 1996). Moreover,

using a random dot pattern allows for independent experimental control over the density of the

black dots18.

4.4 Computational Costs

Since presentation of the real surface requires no image processing steps other than the

overlaying of the random dots, computational costs can be minimized. That is, unlike the use of

strokes for adding texture (Interrante et al., 1997), one does not need to have a detailed model of

the real object. The only extra step is to render the black dots of the pattern at depths

corresponding to points on the real surface, which can be done by obtaining a partial model,

using a depth map obtained from stereo pair images19. While it may be argued that adding grid

lines to the real object’s surface may require the same (relatively low) level of modelling of the

real object, as discussed in the previous section grid lines run the risk of forming a distracting

pattern, which would be undesirable. As an alternative to computationally overlaying the random

dot patterns on the real surface, it may also be appropriate under certain circumstances to use a

18 From a practical point of view, since the pattern is random, the ultimate user of such a display system could be

provided with the means to easily adjust the parameters of the random-dot mask (such as dot size, dot density, dot

distribution, etc.) in real-time in order to preserve the visibility of desired content on the real surface.

19 It is important to distinguish between different extents to which one can model a real object surface. In the

present case, we are considering a point cloud depth map obtained from scanning a real surface, or from performing

stereo matching, to comprise a relatively minimal extent of modelling that surface, in contrast to more extensive

models that involve quantitative relationships among all, or most, components of the object.


projector to project a pattern onto the real object surface, in which case no model at all would be

necessary.

4.5 Past Work

Before presenting the experimental work done for this thesis, it is important to provide a brief

overview of what was done to investigate the effectiveness of this idea prior to the

commencement of this PhD work. The first implementation of this idea was done by Otsuki and

Milgram (2013), where random dot patterns of different dot sizes and dot densities were overlaid

onto a pink (virtual) background, which was intended to represent a flat surface of an object, as

illustrated in Figure 4.4. Their results confirmed the bias found by Ellis & Bucher (1994), Ellis &

Menges (1998) and Johnson et al. (2003) towards perceiving the virtual circle to be closer to the

observer than it actually was. Furthermore, using Thurstonian scaling (Thurstone, 1927), Otsuki

and Milgram’s results showed higher ratings for smaller dot sizes and higher dot densities in

response to the questions:

- In which image is it easier to perceive that the circle is behind the masking window20?

- In which image does the masking window appear to be more transparent?

Figure 4.4: Sample stimulus used by Otsuki and Milgram (2013). The blue circle indicates a

virtual object rendered beneath the depicted surface, which has been modified through the

addition of a pattern of random black dots. Reprinted by permission from IEEE 2013.

20 Masking window in this experiment referred to the part of the pink background that was covered with the black

dots.


The limitations involved in the implementation and results of that series of experiments (which

will be discussed in depth in the following chapter) formed the motivation to further investigate

this idea. The next chapter presents the first set of experiments done as part of this PhD thesis.


Chapter 5

Experiments 1 and 2: Effect of Using Random Dot Patterns on Depth Order Disambiguation, Perception of Transparency and Surface Information Preservation21

As discussed earlier, to expand the potential application areas of X-ray vision with stereoscopic

displays, by means of offering a viable compromise between depth perception, surface

information preservation and minimal computational expense, we proposed adding random dot

patterns to the surface of real objects. Despite the potential advantages, this method is

nevertheless similar to the solutions presented in Chapter 3, in that it involves a trade-off

between depth information and real surface content preservation. As part of our effort to explore

that trade-off, and thereby the potential effectiveness of this method in dealing with the

challenges of X-ray vision with stereoscopic AR, the present chapter describes and presents the

results of a set of experiments which aimed to determine the effect of dot size and dot density on

both perceived transparency (related to perception of depth order) and perception of real surface

information22.

This set comprised two experiments, Experiment 1 and Experiment 2. Experiment

1 focused on investigating the feasibility of this display principle and assessing the effect of

random dot patterns in perceiving the correct depth order between a virtual object and a real

surface. Experiment 2, on the other hand, was aimed at examining the effect of relative dot size

and dot density on perceiving the impression of transparency of the same real surface while

preserving surface information. This chapter presents the detailed description and results of these

experiments.

21 Note that large portions of this chapter coincide with the 2017 publication by Ghasemi, Otsuki, Milgram &

Chellali in the journal Presence.

22 While it can be argued that surfaces contain features, optical arrays contain information about these features and

observers detect these features by using the optical information, for the sake of simplicity, surface features will be

referred to as surface information throughout this thesis.


5.1 Purpose

The collective purpose of the two experiments was to investigate the trade-off involved in

perceiving the correct depth order for a virtual object that is intended to appear behind a real

surface, and the perception of sufficient information about the real surface. In particular, these

experiments were designed to answer the following questions:

• Can the addition of a random dot pattern lead to disambiguation of the depth order

between the virtual object and the real surface?

• With the virtual object being perceived as behind the real surface, does the addition of a

random dot pattern lead to a more convincing impression of ‘transparency’ of the real

surface? If so, what are the effects of dot size and dot density of the random dot pattern in

achieving this impression?

• Does a trade-off exist between perceiving ‘transparency’ and preservation of surface

information?

• Based on these results, how can one optimize the dot size and dot density of the random

dot patterns to achieve X-ray vision while preserving sufficient surface information?

In the next section we provide a description of the experimental method that was used to address

the above questions.

5.2 Experimental Method

In investigating the effect of dot size and dot density on the ability to perceive both depth order

and surface information, it is important to use an appropriate distance between the real surface

and the virtual object, such that the virtual object can easily be perceived as being behind the real

surface. In other words, our primary objective here was not to examine participants’ ability to

discern different distances between the virtual object and the real object surface. Rather, our

objective was first to ensure that participants would be able to perceive that the virtual object was

behind the surface, and then to explore the factors that influence the resulting sense of the

transparency of that surface and their ability to perceive information on the object surface.

For this reason, two experiments were done. In addition to testing the effect of random dot

patterns on depth order disambiguation, Experiment 1 also aimed to determine an appropriate


distance for placing the virtual object in Experiment 2. In doing so, we aimed to reveal the

presence and sensitivity of any perceptual bias in localizing the virtual object within the vicinity

of the real surface. Experiment 2, on the other hand, was designed to investigate the trade-off

involved in perceiving the impression of transparency while also preserving surface information.

In this section, we discuss image generation and presentation and provide information about the

participants for both experiments. The sections after that discuss each experiment separately.

5.2.1 Image Generation and Presentation

An example of the stimuli used in the experiment is shown in Figure 5.1, which is a simplified

version of the more general case depicted in Figure 4.2 (b), but with the complex 3D face in

Figure 4.2 (b) replaced by a (purple) textured plane perpendicular to the line of sight. With

regard to the apparent similarity here to stimuli used in an earlier experiment reported by Otsuki

& Milgram (2013), as shown in Figure 4.4, we note that a primary goal of the present

experiments was to investigate the effectiveness of this method when applied to real surfaces (in

compliance with the definition of AR). For our real object, we employed a coloured photo of a

real textured surface that was extracted from a volume of professional photographs by P. Brodatz

(Abdelmounaime & Dong-Chen, 2013; Brodatz, 1966)23. In doing so, our intention at this point

was that the surface, as shown in Figure 5.1, would be flat and would comprise a visible 2D

texture. The absence of 3D textural elements on this surface24 was intended to provide us with

the means of evaluating our solution for specific surface types, such as those that might be

considered analogous to the smooth surface of organs containing 2D marks, spots or vessels.

Once the random dot patterns were generated (as explained below) and overlaid onto the real

surface, all images were rendered stereoscopically on a desktop computer (Windows 7

Professional OS with NVIDIA Quadro 600), using software written in Visual C++ 2010 and OpenGL. The

stimuli were presented to participants on a 23-inch LCD screen (ASUS VG236HE, 1920 x 1080

23 These textures are publicly available in support of research on image processing and image analysis.

24 Recall the distinction between these three types of textures, outlined in Section 2.4.


resolution, 120 Hz refresh rate). Stereo images were observed using the NVIDIA 3D Vision

system with 3D Vision 2 glasses.

Figure 5.1: Example of a stimulus stereo pair used in the experiment. The blue circle indicates a

virtual object rendered (0.35 mm) beneath a textured purple surface, which has been modified

through the addition of a pattern of random black dots. (The reader is referred to Figure 1.3 for

instructions on how to free fuse such stereo images.)

For all trials, the real object surface with the random dot pattern was presented at the same depth

as the display surface (i.e., with zero disparity)25. The blue virtual circle, on the other hand, was

rendered at different depths, based on an equivalent parallel camera orientation, depending on

the particular stimulus presentation. The on-screen horizontal disparities for the circle were

calculated based on a fixed viewer-to-display distance of 40 cm and an assumed average inter-

pupillary distance of 65 mm. To prevent the use of the relative size depth cue, the diameter of the

circle was kept constant, at 187 pixels, regardless of the distance from the surface. The line width

of the circle was also kept constant, at 2 pixels. Together with the selection of the real surface,

outlined above, the colour and line width of the virtual circle were chosen such that the stimulus

as a whole could be considered analogous to a partial endoscopic view of an organ with a virtual

vessel rendered beneath the surface.
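The calculation of on-screen horizontal disparity from the viewing geometry can be sketched as follows. This is a minimal illustration based on standard similar-triangles geometry for a point on the midline between the eyes; it is an assumption that the rendering software used exactly this relation. The function names are hypothetical, depth is taken here as positive behind the screen, and the pixel pitch is estimated from the nominal 23-inch diagonal and 1920 x 1080 resolution under the assumption of square pixels.

```python
import math


def screen_parallax_mm(depth_behind_mm, viewing_distance_mm=400.0, ipd_mm=65.0):
    """Horizontal separation on the screen between the left-eye and
    right-eye images of a point on the midline.

    Positive depth = behind the screen (uncrossed parallax);
    negative depth = in front of the screen (crossed parallax).
    """
    return ipd_mm * depth_behind_mm / (viewing_distance_mm + depth_behind_mm)


def pixel_pitch_mm(diagonal_inch=23.0, res_x=1920, res_y=1080):
    """Assumed pixel pitch: nominal diagonal length divided by the
    diagonal measured in pixels (square pixels assumed)."""
    return diagonal_inch * 25.4 / math.hypot(res_x, res_y)


# Virtual circle rendered 0.35 mm behind the display surface.
parallax_mm = screen_parallax_mm(0.35)
parallax_px = parallax_mm / pixel_pitch_mm()
print(parallax_mm, parallax_px)
```

Each eye's image of the circle would then be shifted horizontally by half of this parallax, in opposite directions, while the diameter and line width are left unchanged.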

In keeping with our goal of investigating the case of incongruous AR displays in this experiment

(as discussed in Section 4.2), no occlusion cues suggesting the blue virtual circle being behind

25 Because the real object surface was flat and was rendered with zero disparity for the present experiment, it was

functionally equivalent to a monoscopic image.


the real surface were present in the stimuli. In other words, as seen in Figure 5.1, the blue virtual

circle was continuous – even though it was stereoscopically rendered behind the surface.

In both experiments, the random dot patterns were generated using the MATLAB function

‘rand’. In all cases, the textured surface was square, with an area of 334x334 pixels, and the area

of the random dot pattern, also square, was 148x148 pixels.

Dot size (DS) and dot density (DD) were varied throughout both experiments, as illustrated in

Figure 5.2. The parameter that we are calling dot size should, technically speaking, be referred to

as ‘relative dot size’, since it refers to the fraction into which each dimension was divided, rather

than the actual physical size of the dots. For example, a (relative) dot size of 1/25 means that a

25x25 grid was used to generate the random dot pattern. For our 148x148 pixel pattern area, a dot size of

1/25 therefore meant that each dot had an area of approximately 6x6 pixels (148/25 ≈ 5.9). Dot density, on the

other hand, refers to the percentage of the entire random pattern area that was covered with dots.

It should be noted that these two parameters are independent of each other. In addition to the

stimuli presented in Figure 5.2, a ‘No Pattern’ condition was also presented.
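To make the two parameters concrete, the following sketch generates a binary dot mask from a relative dot size and a dot density. It is not the MATLAB code used for the experiments, whose details are not given here; in particular, selecting an exact number of grid cells (rather than thresholding a uniform random number per cell) and the handling of non-integer cell widths are assumptions made for illustration. All names are hypothetical.

```python
import random


def random_dot_pattern(cells_per_side, dot_density, area_px=148, seed=None):
    """Return an area_px x area_px mask (list of rows), 1 = black dot pixel.

    cells_per_side: reciprocal of the relative dot size (25 for DS = 1/25).
    dot_density:    fraction of grid cells filled with a dot (0.5 for 50%).
    """
    rng = random.Random(seed)
    n_cells = cells_per_side * cells_per_side
    n_dots = round(dot_density * n_cells)
    # Choose exactly n_dots distinct cells of the grid at random.
    dot_cells = set(rng.sample(range(n_cells), n_dots))

    mask = []
    for y in range(area_px):
        row_cell = y * cells_per_side // area_px   # grid row of this pixel
        row = []
        for x in range(area_px):
            col_cell = x * cells_per_side // area_px
            cell_index = row_cell * cells_per_side + col_cell
            row.append(1 if cell_index in dot_cells else 0)
        mask.append(row)
    return mask


# DS = 1/25, DD = 50% on the 148x148 pixel pattern area.
mask = random_dot_pattern(cells_per_side=25, dot_density=0.5, seed=1)
coverage = sum(map(sum, mask)) / (148 * 148)
print(coverage)
```

Because 148 is not an integer multiple of 25, the cells in this sketch are 5 or 6 pixels wide, so the covered area only approximates the requested density. Changing `cells_per_side` alters the dot size without changing the expected coverage, which reflects the independence of the two parameters.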

5.2.2 Participants

For each of the experiments, 15 students from the University of Toronto were recruited, all 18-39

years old (7 male and 8 female for Experiment 1 and 12 male and 3 female for Experiment 2).

All participants either had normal visual acuity or used corrective devices to achieve normal

visual acuity during the experiments. To confirm the absence of any stereoscopic vision

problems, the NVIDIA 3D stereo vision test26 was administered. After taking the stereo vision

test, participants were given an information sheet outlining the details of the experiment. They

were then given the consent form to sign, which was followed by a brief questionnaire. Copies of

these are included in Appendix A1 and A2. Participants of Experiment 1 were precluded from

26 The NVIDIA 3D stereo vision test is a simple application through which the ability to see in 3D can be verified.

When this application is launched, the letters in ‘nVIDIA’ and NVIDIA’s logo start moving back and forth in depth.

A participant who is able to see stereoscopically can attest to perceiving this motion in 3D.

Conversely, any potential participant who was unable to detect those depth changes would not be accepted

for participation in the experiment.


participating in Experiment 2 to prevent learning effects. As compensation, participants were

each paid $15/hour.

Figure 5.2: Stimuli used for Experiments 1 and 2. Only the 9 stimuli in the 40, 50 and 60%

columns were used in Experiment 1. All 12 stimuli were used in Experiment 2.

5.3 Experiment 1

5.3.1 Objectives and Hypotheses

The aim of this experiment was to test the basic premise of our AR X-ray vision concept –

whether adding random dot patterns is indeed able to facilitate the perception of an incongruous

virtual object located behind a real surface. In detail, our first hypothesis (H1) was that when

virtual objects are stereoscopically rendered behind, but very close to, the real surface, the

addition of random dot patterns can lead to disambiguation of the depth order between the virtual

object and the real surface.

Expanding further upon H1, it was hypothesized that, because all portions of the virtual circle

were always visible in the image (as opposed to portions of it being occluded by the real object

surface), the participants would be biased towards perceiving the virtual circle as being closer to

the viewer in comparison with its actual geometric location, as defined by its imposed


stereoscopic disparity. In other words, whenever the virtual circle was presented, by means of

on-screen disparity, to be in front of the real surface, it was hypothesized that this would be

unambiguously perceived as such. However, whenever the circle was rendered to be behind the

real surface, we hypothesized (H1a) that it would be perceived to be closer to the surface than its

actual distance behind it.

Moreover, considering our postulate that the addition of random dot patterns can lead to

disambiguation of the depth order between the virtual object and the real surface, we predicted

that, in cases where the random dot pattern was present, participants would be more accurate in

determining the virtual circle’s position (H1b).

In addition to testing the above hypotheses, a second goal of this experiment was to determine an

appropriate depth for positioning of the virtual circle for Experiment 2, to permit compensation

for the predicted bias. In other words, our aim was to increase the probability that participants in

Experiment 2 would consistently perceive the virtual circle as being placed behind the real

surface. Therefore, both accuracy, in terms of determining the presence of any perceptual bias in

localizing the virtual circle within the vicinity of the real surface, as well as precision, in terms of

estimating the sensitivity of perceiving the location of the circle, were investigated. To this end,

the psychophysical method of constant stimuli was used (Gescheider, 2013), comprising a series

of trials in which the virtual circle was presented at different distances both in front of and

behind the real surface.

5.3.2 Procedure

After getting acquainted with the software, participants were shown a series of stimuli, for each of

which they indicated whether they perceived the circle as being in front of or behind the

surface. The virtual circle was presented at 6 distances relative to the surface, three in front and

three behind. Relative to the physical setup of our experiment, the values used, all in mm, were:

{+0.2, +0.35, +0.5} in front and {-0.2, -0.35, -0.5} behind. (These distances were equivalent to

disparity angles of {-0.24, -0.49, -0.7} (in front) and {+0.24, +0.49, +0.7} (behind), in units of


arc-minutes27.) These values were selected based on pilot studies performed using the three dot

sizes {1/25, 1/50, 1/75} and the three dot densities {40%, 50%, 60%}, as well as the ‘No Pattern’

condition. The objective in choosing these particular values was to maximize the sensitivity for

identifying the associated thresholds of depth perception by emphasising values within the

expected transition zone of the resulting psychophysical functions, while avoiding any ‘floor’

and ‘ceiling’ effects associated with 100% certainty judgements, which would have been expected

had substantially larger distances in front and behind been selected.
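The conversion from depth to disparity angle can be checked with a short calculation using the equation given in footnote 27, r = (d*I)/(D*(D+d)), together with the viewing distance of 40 cm and inter-pupillary distance of 65 mm stated in Section 5.2.1. The function name is hypothetical, and only the magnitude of the angle is computed; the sign convention is left to the text.

```python
import math


def disparity_arcmin(depth_mm, viewing_distance_mm=400.0, ipd_mm=65.0):
    """Disparity angle for a point at depth_mm from the display surface,
    using r = (d * I) / (D * (D + d)) with r in radians, then converted
    to arc-minutes."""
    r_rad = (depth_mm * ipd_mm) / (
        viewing_distance_mm * (viewing_distance_mm + depth_mm))
    return math.degrees(r_rad) * 60.0


for depth in (0.35, 0.5):
    print(depth, disparity_arcmin(depth))
```

With these defaults, depths of 0.35 mm and 0.5 mm give approximately 0.49 and 0.70 arc-minutes respectively.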

With 5 trials for each combination of conditions, this led to 300 trials (6 x (3x3 +1) x 5) for each

participant. The stimuli containing the random dot patterns used are shown in the first three

columns of Figure 5.2. The presentation order of the stimuli was randomized. Participants had 4

seconds to reply to each presentation. (This time limit was chosen through extensive pilot testing,

to reduce speed-accuracy trade-off effects.) If participants ran out of time for a particular

stimulus, the subsequent stimulus would appear automatically, but the missed trial would

reappear, unbeknownst to participants, later on in the experiment. This would occur as many

times as required until the participant had successfully replied within the time limit for that

stimulus.

5.3.3 Results and Discussion

Figure 5.3 shows the results obtained from Experiment 1 (for each dot size), where each curve

represents a psychophysical function fitted to the associated set of experimental data

(Gescheider, 2013). It should be recalled that only the 9 stimuli in the 40, 50 and 60% columns

of Figure 5.2 were used in this experiment. The y-axis in Figure 5.3 represents the proportion of

times that the circle was perceived as being in front of the surface, averaged over participants.

The x-axis represents the actual position of the circle relative to the surface. The dashed vertical

line indicating x=0 (mm) corresponds to the Point of Objective Equality – that is, the

27 The disparity angles were obtained from the equation r=(d*I)/(D*(D+d)) where r, d, I and D correspond

respectively to disparity angle, predicted depth, inter-pupillary distance and viewing distance (Patterson, 2009). Note

that because the units in both the numerator and denominator of this equation cancel each other, the disparity angle,

r, is obtained in radians and can be converted to units such as arc-minutes. Note as well that reporting disparity

values when presenting results has been recommended by researchers in the 3D community, since it “affords more

efficient and accurate cross-study comparisons” (McIntire, Havig, & Geiselman, 2014).


(hypothetical) case for which the circle would be placed exactly at the depth of the real surface28.

For comparison purposes, the results for the “No Pattern” condition have also been included in

the graphs for all three dot size conditions.

Looking first at the No Pattern results (the same in all three graphs), we see clearly that the Point

of Subjective Equality (PSE), defined as the interpolated intersection of each fitted

psychophysical function with the 0.5 proportion level (shown as a dashed horizontal line in

Figure 5.3), lies 0.493 mm behind the plane of the real surface. This means that if the

virtual circle had actually been placed at this distance behind the real surface, participants would

have perceived it as being in front of the surface 50% of the time and behind it 50% of the time.

In other words, the PSE, that is, the hypothetical location at which participants believed on

average that the virtual object was located on the surface, was actually 0.493 mm behind the

surface (and farther from the participants). This result was thus in support of our hypothesis H1a.
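The way a PSE is read off a fitted psychophysical function can be illustrated with a minimal sketch. The functional form used for the fits in Figure 5.3 is not stated in this section, so a logistic function fitted by least squares on the logit scale is assumed here purely for illustration; the data values below are hypothetical and are not the experimental results.

```python
import math


def fit_logistic(x_mm, p_front):
    """Fit logit(p) = slope * x + intercept by ordinary least squares.

    x_mm:    stimulus positions (+ in front of the surface, - behind).
    p_front: proportion of 'in front' responses at each position.
    """
    eps = 1e-3  # keep proportions away from 0 and 1 before taking logits
    logits = []
    for p in p_front:
        p = min(max(p, eps), 1.0 - eps)
        logits.append(math.log(p / (1.0 - p)))
    n = len(x_mm)
    mean_x = sum(x_mm) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in x_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_mm, logits))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_point(slope, intercept, level=0.5):
    """Position at which the fitted function crosses the given proportion;
    level=0.5 gives the PSE."""
    return (math.log(level / (1.0 - level)) - intercept) / slope


# Hypothetical proportions of 'in front' responses (not thesis data).
x = [-0.5, -0.35, -0.2, 0.2, 0.35, 0.5]
p = [0.50, 0.62, 0.73, 0.92, 0.95, 0.97]
slope, intercept = fit_logistic(x, p)
pse = crossing_point(slope, intercept)
print(pse)
```

A negative PSE in this sign convention corresponds to a point behind the real surface, that is, a bias towards perceiving the circle as closer to the viewer than its geometric location.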

Referring now to the random dot pattern responses, for each of the relative dot sizes there do

not appear to be any obvious differences among the three dot density (DD) graphs. On the other

hand, for the DS = 1/25 graph, the PSE appears, for all three DD values, clearly to be behind the

surface, similarly to the No Pattern results. However, for the other two DS values (1/50 and

1/75), the PSE values appear to be very close to 0.

Comparing the random dot pattern psychophysical functions to those of the No Pattern condition,

one can observe that the PSE values for the two dot sizes of 1/50 and 1/75 lie closer to zero than

for the No Pattern condition. These observations suggest that, unless very large dot sizes are

used, the addition of random dot patterns can help with disambiguation of the depth order

between virtual objects and the real surfaces. This result thus supports hypothesis H1b29.

28 Note that this condition was not in fact part of the stimulus set.

29 It is worth noting that, given that these results were based on the psychophysical function, significance testing

was not feasible and, therefore, the support of H1a and H1b should not be deemed as statistically significant.

Moreover, another limitation involved with this experiment is that the psychophysical functions were fitted to data

that were too close to the Point of Objective Equality. Ideally, including stimulus distances that were both farther

behind and farther in front of the real surface would likely have resulted in more reliable estimated psychophysical

functions.

To determine the minimum distance that would ensure that the participants would ‘reliably’

perceive the virtual circle as being behind the real surface, a maximum error frequency of 25%

was chosen. Amongst the 10 conditions tested, the largest distance corresponding to the

intersection of the fitted psychophysical functions with the 0.25 proportion level belongs to the

largest relative dot size (1/25) and smallest dot density (40%), and is equivalent to 2.68 mm

behind the real surface. Therefore, for the next experiment, it was reasoned that, as long as the

displacement chosen places the virtual circle beyond this distance behind the real surface, one

could be confident that the circle would be consistently perceived as being behind the real

surface (with a maximum error frequency of 25%, for the DS=1/25, DD=40% condition, and a

much smaller error frequency for all of the other conditions). In fact, to reduce the error

frequency further, the blue virtual circle was presented even farther away, at a distance of 3 mm

(equivalent to 4.16 arc-minutes) behind the screen/surface for the next experiment30.
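As an illustrative sketch of how such crossing points can be read off a fitted function, the following Python fragment assumes a cumulative Gaussian form for the psychophysical function. The functional form, the parameter values (pse_mm, spread_mm) and the function names are assumptions made purely for illustration; they are not the fitted values or the fitting procedure of Experiment 1.

```python
from statistics import NormalDist

def p_front(x_mm, pse_mm, spread_mm):
    """Hypothetical fitted function: proportion of 'in front' responses for a
    circle placed x_mm behind the real surface (negative values = in front)."""
    return 1.0 - NormalDist(mu=pse_mm, sigma=spread_mm).cdf(x_mm)

def crossing(level, pse_mm, spread_mm, lo=-50.0, hi=50.0, tol=1e-9):
    """Bisection for the distance at which the (decreasing) function equals `level`."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p_front(mid, pse_mm, spread_mm) > level:
            lo = mid  # still too many 'in front' responses: look farther behind
        else:
            hi = mid
    return (lo + hi) / 2.0

# Illustrative (not measured) parameters
pse = crossing(0.50, pse_mm=0.5, spread_mm=2.0)       # Point of Subjective Equality
reliable = crossing(0.25, pse_mm=0.5, spread_mm=2.0)  # 25% maximum error frequency
print(f"PSE = {pse:.3f} mm, 25% level = {reliable:.3f} mm behind the surface")
```

Under this assumed form, the 0.25 crossing always lies farther behind the surface than the PSE, which is why the displacement chosen for the next experiment was based on the former rather than the latter.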

30 Care was taken not to place the virtual circle too far behind the real surface, by confirming that this value was

within Panum’s fusional area, to ensure that binocular fusion would be maintained. To do so, pilot testing was done

to confirm that no reports of difficulty in fusing the virtual circle were made.

Figure 5.3: Psychophysical functions fitted to results of Experiment 1 for dot sizes of (a) 1/25,

(b) 1/50 and (c) 1/75.

5.4 Experiment 2

5.4.1 Objectives, Hypotheses and Procedure

As explained above, the goal of this experiment was to investigate the trade-off involved

between concurrently perceiving surface transparency while preserving the ability to discern

surface information. We hypothesized (H2) that, whereas on the one hand it should be easier

relative to the No Pattern conditions tested to perceive transparency whenever random dots are

added (H2a), on the other hand surface information should be easier to preserve for the No

Pattern condition, for which there are no random dots to interfere with examining the content of

the surface (H2b).

We also hypothesized (H3) that increasing the dot density of the pattern would result in a

stronger impression of transparency (H3a) but a reduction in preservation of surface information

(H3b). The reasoning behind this is that, as previously explained, the black dots were expected to

give the impression of there being ‘holes’ in the surface, such that with a larger proportion of holes

in the surface, it should be easier to see through it (i.e. more perceived transparency) but harder

to retain information about the portions of the surface with the black dots.

On the other hand, it was also hypothesized (H4) that increasing the dot size (which is not the

same as increasing the dot density) should lead to a weaker sense of transparency (H4a), since

larger dots will yield a smaller number of dots (or holes) on the surface to be seen through.

Moreover, those larger chunks of coherent surface information being occluded by the pattern

were expected to lead to a reduction in surface information preservation (H4b).

To investigate these hypotheses, the experiment was conducted in two consecutive sections (1

and 2). For both sections, the blue virtual circle was presented at a constant disparity angle of

4.16 arc-minutes, as explained above. The independent parameters, illustrated in Figure 5.2, were

three relative dot sizes {1/25, 1/50, 1/75} and four dot densities {40%, 50%, 60%, 70%}, as well

as the ‘No Pattern’ condition.

It is worth pointing out some more of the important differences between the current experiment

and an earlier set of related experiments reported by our team (Otsuki & Milgram, 2013). In that

earlier experiment, although a similar psychophysical test was administered, there was no

attempt to employ it to compute an effective location for the virtual object for their subsequent

investigation of perceived transparency. This resulted in their placement of the virtual object too

close to the real surface to act as a reliable stimulus for exploring the transparency effect in their

investigation of the incongruous condition. In addition, the surface used in that experiment

contained no texture, which, in addition to the fact that it was simulated rather than real, made it

somewhat less realistic. Finally, there was no attempt in that experiment to explore the ability to

discern surface information, and thus to explore the hypothesized trade-off explained below.

5.4.1.1 Section 1: Perception of Surface Information

Section 1 of Experiment 2 aimed to assess the effect of the random dot pattern parameters in

terms of any potential loss of surface information. Since the surface, by itself, did not contain

any specific information to be preserved, there was a need to add elements onto the surface.

These additional elements were covered by the random dots just as any other surface containing

such elements would be. (An example of this, once again, could be the surface of an organ

containing visible vessels.) To investigate how much information was lost due to the addition of

the random dot patterns, a shape matching task was designed, to evaluate participants’ accuracy

in identifying information presented on the real object surface when covered by different random

dot patterns. To accomplish this, each real surface was modified by adding to it a pair of

concentric yellow shapes – either two circles or a circle and an ellipse – after which the random

dot patterns were added31. As shown in the example of Figure 5.4(c), this means that the black

dots occluded different parts of the yellow shapes in different ways, depending on the particular

random pattern, just as they occluded the rest of the surface. (Note that, although the blue virtual

circle was still present for the surface information task, and was rendered behind the real surface,

it did not play any role in the shape matching task.)

The outer yellow shape for this task was always a circle. However, the inner yellow shape had a

30% probability of also being a circle (Figure 5.4(a)) and a 70% probability of being an ellipse

(Figure 5.4(b)). The task was to determine, within 6 seconds, whether the inner yellow shape

was also a circle, like the outer circle, or whether it was an ellipse – that is, not a circle32.

To help participants do the shape matching task, they were advised during their training to

visually scan the whole image to examine the separation between the inner yellow shape and the

outer yellow circle. In other words, if the two shapes appeared to be equally separated from each

other around their circumferences, it was logical to conclude that they were both circles, whereas

if the separations appeared to vary, the conclusion should be that one shape was an ellipse. It

should be noted that, because we wanted this to be a relatively difficult task, the ellipses were

31 It should be noted that, although the yellow shapes were digitally added to the surface (and not, specifically,

captured by a sensor), they were meant to be considered as a ‘real’ feature present on the real object’s surface.

32 Should the reader, after examining Figure 5.4, be of the opinion that this was a difficult task, that was exactly the

intention!

designed to have very small eccentricities33. As can be seen in Figure 5.4(a) and Figure 5.4(b),

the difference between the two surface accuracy conditions was very slight.

Keeping in mind our overriding goal of evaluating whether an observer would be able

holistically to examine large parts of a real surface while employing our stereoscopic AR

display, we made the task even more difficult by preventing participants from focusing on only

one specific region of the stimulus. To accomplish this, the orientation of the major axis of each

ellipse was varied randomly and, in addition to pronouncing whether or not any particular

stimulus was an ellipse, participants were also asked to identify the direction of the major axis of

that perceived ellipse. (This was also intended to reduce the likelihood of guessing the

responses.) The orientations could possess any value from 0 to 180º, with 18º intervals, resulting

in 10 possible orientations. If participants perceived the inner yellow object as a circle, they

would press the ‘up’ arrow. On the other hand, if they perceived the inner object as an ellipse,

they were asked to indicate, using the numeric keypad, which of the 10 orientations of the major

axis of the ellipse they had observed, according to the response selection scheme presented to

them, as depicted in Figure 5.5.

For each combination of dot size (DS) and dot density (DD), as well as for the No Pattern

condition, 10 trials were randomly presented to each participant, of which 7 were ellipses (with a

10% chance for each orientation, unbeknownst to them) and 3 were circles. This led to a

minimum of 130 trials ((3 × 4 + 1) × 10) for each participant. The presentation order of the stimuli

was randomized. None of the shape matching conditions occurred more than once.
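The trial structure described above can be sketched as follows. This is a minimal illustration only: the function and field names are hypothetical, the orientation levels are assumed to run from 0º to 162º in 18º steps (treating 0º and 180º as the same axis), and each ellipse orientation is assumed to be drawn independently and uniformly.

```python
import random

DOT_SIZES = ["1/25", "1/50", "1/75"]
DOT_DENSITIES = [40, 50, 60, 70]            # percent
ORIENTATIONS_DEG = list(range(0, 180, 18))  # assumed levels: 0, 18, ..., 162

def build_trials(seed=0):
    """Build one participant's randomized list of shape matching trials."""
    rng = random.Random(seed)
    conditions = [(ds, dd) for ds in DOT_SIZES for dd in DOT_DENSITIES]
    conditions.append(("No Pattern", None))
    trials = []
    for ds, dd in conditions:
        for shape in ["ellipse"] * 7 + ["circle"] * 3:
            # Circles have no major axis; ellipses get a random orientation.
            angle = rng.choice(ORIENTATIONS_DEG) if shape == "ellipse" else None
            trials.append({"dot_size": ds, "dot_density": dd,
                           "shape": shape, "major_axis_deg": angle})
    rng.shuffle(trials)  # randomized presentation order
    return trials

trials = build_trials(seed=1)
print(len(trials))  # (3 x 4 + 1) x 10 = 130
```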

The parameter values for the experiment – namely eccentricity, number of response angles, time

limit duration – were selected on the basis of extensive pilot testing.

In trials where participants ran out of time, the experiment would automatically move on to the

next stimulus and the missed trial would repeat itself throughout the experiment as many times

as required until the participant had replied to all stimuli within the time limit.
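One simple way of realizing this repeat-until-answered rule is a queue, as in the sketch below. The callback `present` is a hypothetical stand-in for the stimulus presentation software, and re-queuing a missed trial at the end of the list is an assumption; the text above does not specify where in the sequence a missed trial reappeared.

```python
from collections import deque

def run_session(trials, present):
    """Present trials until each has been answered within the time limit.

    `present(trial)` is a hypothetical callback returning True if the
    participant responded in time and False on a time-out.
    """
    queue = deque(trials)
    presentations = 0
    while queue:
        trial = queue.popleft()
        presentations += 1
        if not present(trial):
            queue.append(trial)  # assumption: a missed trial rejoins the end of the queue
    return presentations

# Usage: a simulated participant who times out once, on trial 0.
missed = {0}

def simulated_present(trial):
    if trial in missed:
        missed.discard(trial)
        return False
    return True

total = run_session(range(130), simulated_present)
print(total)  # 131: the 130 scheduled trials plus one repeat
```

This is why 130 is a minimum rather than a fixed number of presentations per participant.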

33 In fact, the ellipses were not obtained according to the formal definition of eccentricity; rather, the ‘ellipses’ were

obtained by multiplying the x-axis of a corresponding circle by a factor of 0.95.

To motivate participants during the experiment, a lottery with a $50 gift card prize was

held after all experiments had been completed. The participants were informed that the number of

lottery ballots assigned to their name would be proportional to their respective performance

scores.

Figure 5.4: Samples of stereo pairs illustrating the shape matching task for assessment of surface

information. (a) The inner and outer yellow objects are both circles. (b) The inner yellow object

is an ellipse. (a) and (b) constitute the No Pattern condition. (c) Example of task with random dot

pattern present, and where inner yellow object is an ellipse. The orientations of the major axes of

the ellipses in (b) and (c) are 54º (corresponding to level 3) and 144º (corresponding to level 8),

respectively.

Figure 5.5: Options for designating the orientation of the major axis in ellipse conditions. This

image was provided as a guide for assisting participants in selecting their responses to the ellipse

axis orientation questions, in the form of numerals 0 to 10 on the computer keypad.

For analysis purposes, Signal Detection Theory (SDT) was used for assessing performance on

distinguishing circles from ellipses. In addition, the absolute offset errors in detecting the

orientation of the major axis of the ellipse (using the numerical responses shown in Figure 5.5)

were averaged across each condition. According to the hypotheses presented in the beginning of

this section, it was hypothesized that as both dot density and dot size increased, performance on

the surface identification task would decrease. In particular, it was hypothesized that d’ values,

which are indicative of detection sensitivity, would decrease, while average absolute offset errors

would increase. The reasoning behind these hypotheses (H3b and H4b) was that, as relatively

greater portions of the yellow objects were covered by dots, it would be more difficult to perform

the shape matching task. For obvious reasons, the No Pattern condition was expected to result in

the highest sensitivity and lowest average offset error, since the yellow shapes were completely

unobstructed (hypothesis H2b).
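To make the offset error measure concrete, the following sketch computes the absolute offset between the true and the reported orientation levels and averages it over a set of trials. Two aspects are assumptions rather than details taken from the analysis itself: the levels are indexed from 0 to 9, and the offset wraps around (since the orientation of an axis repeats every 180º, levels 0 and 9 are treated as one step apart). The function names are hypothetical.

```python
def axis_offset(true_level, response_level, n_levels=10):
    """Absolute offset between two orientation levels, in steps of 18 degrees."""
    diff = abs(true_level - response_level) % n_levels
    # Assumption: wrap around, because an axis orientation repeats every 180 degrees.
    return min(diff, n_levels - diff)

def mean_offset(trials):
    """Mean absolute offset over (true_level, response_level) pairs.

    `trials` should contain only ellipse trials that were reported as ellipses.
    """
    offsets = [axis_offset(t, r) for t, r in trials]
    return sum(offsets) / len(offsets)

example = [(3, 3), (8, 7), (0, 9), (5, 2)]  # illustrative responses, not data
print(mean_offset(example))  # offsets 0, 1, 1, 3 -> 1.25
```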

5.4.1.2 Section 2: Impression of Surface Transparency

Section 2 of the experiment, which was administered to the same participants directly following

completion of Section 1, focused on exploring the relative effectiveness of the random dot

pattern parameters for creating the perception of transparency. Prior to starting this section of the

experiment, the purpose of the research and the concept of ‘transparency’ in the present context

were explained and demonstrated to participants. In particular, they were instructed that they

would be shown a set of images similar to that illustrated here in Figure 5.2, in each of which the

blue wireframe circle should appear to them to be located behind the portion of the textured

purple surface containing a random dot pattern. They were also told that, due to the manner in

which the display had been created, it was likely that they would perceive the textured purple

surface as being transparent34, and that the goal of this part of the experiment was to explore the

manner in which they perceived this transparency effect.

Because we did not consider it feasible to estimate in a direct and objective way how participants

would be able to perceive ‘transparency’ in the present context, we instead deemed Thurstone’s

classical method of paired comparison scaling (Thurstone, 1927) to be the most viable means of

achieving this end. During the data gathering phase, participants were presented with all possible

pairs of the images shown in Figure 5.2 (plus the No Pattern condition), two at a time. They had

unlimited time to examine each pair of images and to respond to the question: “In which image

is the impression of transparency more convincing?” The 13 different conditions (3 dot sizes x 4

dot densities + no pattern condition) resulted in 78 paired comparisons for each participant,

which were then aggregated over all participants and transformed into an (equal interval) scale of

Transparency Ratings (TR).
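For readers unfamiliar with paired comparison scaling, the sketch below shows one common formulation, Thurstone's Case V, in which each choice proportion is converted to a standard normal deviate and averaged per item. Whether Case V was the variant applied here is an assumption, as are the function name, the clamping of unanimous proportions and the small count matrix, which is invented for illustration. Shifting the scale so that the lowest item sits at zero reflects the absence of a true zero on an interval scale.

```python
from itertools import combinations
from statistics import NormalDist

CONDITIONS = [f"DS={ds}, DD={dd}%" for ds in ("1/25", "1/50", "1/75")
              for dd in (40, 50, 60, 70)] + ["No Pattern"]
PAIRS = list(combinations(CONDITIONS, 2))  # 13 conditions -> 78 pairs

def thurstone_case_v(wins):
    """wins[i][j] = number of times item i was preferred over item j."""
    n = len(wins)
    z = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # an item compared with itself contributes z(0.5) = 0
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            p = min(max(p, 0.01), 0.99)  # assumption: clamp unanimous choices
            total += z(p)
        scale.append(total / n)
    lowest = min(scale)
    return [s - lowest for s in scale]  # interval scale anchored at zero

# Invented counts for three items, 15 judgements per pair
wins = [[0, 3, 2],
        [12, 0, 6],
        [13, 9, 0]]
ratings = thurstone_case_v(wins)
print(len(PAIRS), [round(r, 2) for r in ratings])
```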

It should be pointed out that the question presented to participants was designed such that, rather

than asking directly about the perceived ‘degree’ of transparency, the relative strength of their

impression about transparency was instead being questioned. It is also important to realize that

there is no real zero on the equal interval scale of values resulting from this procedure, such that

34 Note that, as explained earlier, we avoided using the term ‘translucency’ for this experiment, based on our

presumption that participants might be confused by that term.

high or low comparative impressions of transparency do not necessarily translate to high or low

absolute ratings of degree of transparency.

Based on previous findings (Otsuki & Milgram, 2013), it was hypothesized that larger dot

densities and smaller dot sizes would lead to higher ratings for impression of transparency

(hypotheses H3a and H4a respectively), and in addition that the No Pattern condition would

yield the lowest rating (hypothesis H2a). One explanation for this is that the black dots in the

random dot pattern were postulated to be perceived as holes in the surface, such that, by

increasing dot density, the increased proportion of perceived holes should lead to a stronger

sense of transparency. On the other hand, it was surmised that increasing the dot size would lead

to a weaker sense of transparency, since larger dots (at the same dot density) result in a smaller

number of perceived holes on the real surface. Based on the same reasoning, it was expected that

the control condition comprising no pattern would result in the lowest transparency ratings

(hypothesis H2a).

It should be noted that the extra 70% dot density conditions that were added to Experiment 2

were a result of pilot tests, which led to the prediction that including these conditions would

potentially provide a better manifestation of the expected trade-off, explained in the next

sub-section.

5.4.1.3 Hypothesized Trade-offs

Before examining the results of the experiment, it is important to understand the relationship

between the various hypotheses presented for the two sections. Figure 5.6 summarizes those

respective hypotheses and illustrates our a priori expectation about the relationship between

them. The primary message to be extracted from Figure 5.6(a) is the trade-off between what we

believe to be the two primary objectives of augmented reality X-ray vision: effectively

presenting the impression of a virtual object (in this case the blue circle) being inside of a real

object (i.e. effectively equivalent to conveying the impression of surface transparency) while

concurrently maintaining the ability to observe and understand any pertinent information (in this

case the yellow circle and/or ellipse) on the surface of that real object (i.e. perception of surface

information). Figure 5.6(b), on the other hand, suggests that having smaller dots should always

have the effect of better perceiving surface transparency, while also retaining surface

information. The results presented in the following section should be read in light of these two

sets of hypotheses.

Figure 5.6: Schematic illustration of hypotheses for both parts of Experiment 2. (a) effect of dot

density (H3); (b) effect of dot size (H4).

5.4.2 Results and Discussion

As mentioned, to assess participants’ performance in detecting ellipses, signal detection theory

(SDT) was used, where the occurrence of an inner ellipse was considered a “signal” event, and a

“hit” occurred whenever an ellipse was correctly detected as an ellipse35. To obtain a set of

average performance data over all participants, hits and false alarm rates were aggregated across

participants and then used to estimate the two collective SDT parameters, d’ and beta, for each

condition. The d’ results for different dot sizes and dot densities are shown as solid lines in

Figure 5.7.
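The computation of the two SDT parameters from aggregated response counts can be sketched as follows, assuming the standard equal-variance Gaussian model. That model, the function name and the counts used in the example are assumptions for illustration and are not the data of Experiment 2.

```python
from math import exp
from statistics import NormalDist

def sdt_parameters(hits, misses, false_alarms, correct_rejections):
    """Return (d', beta) under the equal-variance Gaussian SDT model.

    Hit or false alarm rates of exactly 0 or 1 have no finite z-score and
    would need a correction before calling this function.
    """
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z_hit = NormalDist().inv_cdf(hit_rate)
    z_fa = NormalDist().inv_cdf(fa_rate)
    d_prime = z_hit - z_fa                  # sensitivity
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2.0)  # likelihood ratio at the criterion
    return d_prime, beta

# Invented counts: 'signal' = ellipse, 'noise' = circle
d_prime, beta = sdt_parameters(hits=70, misses=35,
                               false_alarms=20, correct_rejections=25)
print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")
```

A d’ of zero in this formulation means that the hit rate equals the false alarm rate, i.e. ellipses and circles were not distinguished at all.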

35 Although there were 10 possible response angles (i.e. orientations) for the elliptical signal conditions, it is

important to note that these were all considered as having equivalent signal strengths. In other words, our

assumption was that there was one single value of d’ for the signal present case, rather than 10 different signal

strengths.

Figure 5.7: d’ and Transparency Rating (TR) results obtained from Experiment 2. The solid

lines join the d’ results, corresponding to the left hand axis, while the dashed lines join the

transparency ratings (TR), corresponding to the right hand axis. The yellow horizontal lines

correspond to the No Pattern condition.

The transparency rating (TR) measures are also included in Figure 5.7, as dashed lines. The No

Pattern condition results supported our hypothesis (H2a) of having the lowest TR value. (For

convenience, this value was assigned a value of zero on the scale derived from the paired

comparison data.) However, we were unable to identify a clear trend in the remaining TR values,

across either dot sizes or dot densities, of the kind hypothesized in Figure 5.6, and thus found no clear support for H3a or

H4a. Comparing these results to those of Otsuki and Milgram (2013), who carried out an

analogous test that included DD=25%, in comparison with DD=50%, it is suspected that

designing our experiment with lower dot densities (<40%) might have allowed us to observe the

hypothesized increasing trend of TR values with increased dot density, as depicted in Figure 5.6.

Nevertheless, the substantial difference between the TR value for the No Pattern condition and

the TR values for the pattern conditions in support of hypothesis H2a demonstrates at least to

some extent the potential effectiveness of this method for creating the percept of transparency.

With regards to discerning surface information, it was hypothesized that with increases in both

dot density and relative dot size, performance on the detection task should decrease (hypotheses

H3b and H4b). This appears to have been supported by the results shown in Figure 5.7, where

the d’ values do in fact decrease with increases in both DS and DD. However, it is important to

note that ‘good performance’ is manifested in Figure 5.7 by d’ values in the vicinity of 1,

whereas d’ values in the vicinity of 0 (and below) represent essentially chance performance. In

addition to implying that the difficulty of the shape matching task may have been too high, this

suggests that this observed trend may not be that strong. On the other hand, the No Pattern

condition conforms to the expectation of yielding the highest d’ value (hypothesis H2b).

The averages of the absolute offset errors for the ellipse orientation task were plotted as a

function of dot size and dot density (Figure 5.8). It is worth noting that these offset errors

correspond only to the cases in which participants correctly judged the presence of the ellipse. As

can be seen, dot density does not seem to affect these errors in a meaningful way. Dot size,

however, does seem to have had an effect on the error, with the largest dot size (1/25) leading to

smaller mean offset errors, even when compared to the No Pattern condition. To check the

significance of this apparent finding, a two-way ANOVA was carried out, followed by post hoc

tests. Results showed that average offset errors were indeed significantly affected by the dot size,

F(2,28)=16.37, p<.0001 but not by dot density, F(3,42)=0.329, p>.05. Contrasts revealed that

average offset errors for the 1/50 dot size, F(1,14)=18.55, and the 1/75 dot size, F(1,14)=22.34,

were significantly larger than those of the 1/25 dot size.

Figure 5.8: Mean absolute offset errors as a function of dot size and dot density. The orange

horizontal line corresponds to the No Pattern condition.

This interesting finding may initially seem to contradict the SDT results, which showed d’ values

reflecting essentially chance performance for the 1/25 and 1/50 dot size conditions. However,

because the offset errors correspond only to the cases in which participants

correctly judged the presence of the ellipse, the two results are compatible. In other words, it seems that it was

in cases where the larger black dots (with DS=1/25) did not occlude the intersection of the major

axis of the ellipse with the outer circle that participants were able to both correctly identify the

ellipse and be more accurate in determining its orientation. The smaller dot size (DS=1/75), on

the other hand, provided a better holistic representation of the surface information (resulting in

larger d’ values) without preserving the more detailed information (resulting in larger offset

errors). This finding suggests that, in cases where the location of the most essential surface

information is known, if it is feasible to find a random dot pattern that does not occlude this part

of the surface, achieving x-ray vision should be done using a larger dot size.

5.5 Contributions, Limitations and Conclusions

Results from this set of experiments showed that the use of random dot patterns can be effective

in contributing to the percept of transparency of real surfaces in 3D AR displays, with expected

relevance towards X-ray vision applications. In particular, the main contributions of these

experiments are:

• Random dot patterns (with appropriately designed dot sizes) were shown to be a

potentially effective method for disambiguating the depth order between virtual objects

and real surfaces with textures that lack 3D textural elements.

• By appropriately controlling the relative dot size and dot density of the patterns, it should

be possible to retain sufficient information about the real surface to enable a user both to

observe a virtual object being presented inside of a real one, while concurrently

examining the surface of the real object.

It is important, however, to point out that the experiments presented here were limited to the use

of a flat real surface with a 2D texture, and to a 2D wireframe virtual object being presented in

depth. Although such objects are easy to manipulate digitally, such conditions may be rare in

actual AR applications. For example, taking the medical domain as an important target

application, real objects are 3D organs that usually consist of convex surfaces. Such conditions

justify the need to determine whether the results observed for this flat real surface will also

pertain to convex real 3D surfaces. It would also be interesting to investigate the applicability of

these findings to cases where the virtual wireframe is presented in 3D.

Furthermore, the results of these experiments, which confirmed the potential of using random dot

patterns for improving depth order perception, serve as motivation to investigate this effect also

for improving absolute depth judgements. The conclusions thus provide the justifications and

framework for Experiment 3, which is the topic of the next chapter.

Chapter 6

Experiment 3: Effect of Using Random Dot Patterns for Improving Accuracy of Depth Judgements

As results from the previous set of experiments showed, the addition of random dot patterns to

real object surfaces can be effective in perceiving the virtual object as being behind the real

surface, which achieves the notion of X-ray vision. However, as described in Section 4.1, based

on theory, the addition of random dot patterns should not only allow for proper depth order

perception but it should also allow for more accurate absolute depth judgements. To investigate

this possibility, an experiment was designed in which participants were asked to judge the

absolute depth of a virtual object relative to a real surface. This chapter presents the detailed

description and results of this experiment.

6.1 Purposes

As mentioned, the primary purpose of this experiment was to investigate whether the addition of

random dot patterns can lead to improvements in the accuracy of absolute depth judgements

between the virtual object and the real surface. As may be recalled, the reasoning was that, by

adding random dot patterns to a real surface, we are able to provide observers with distinct

fixation points (being the edges of the dots), thus guiding them in making vergence eye

movements (between the virtual object and the real surface) and in making better depth

judgements (based on confirmatory information provided by the convergence cue). Therefore,

another goal of this experiment was to test this theory, by manipulating the distinctiveness of the

dots. Moreover, since developing a practically usable X-ray display requires a measure of the

user’s assessment about the difficulty of performing a depth judgement task, the other goal of

this experiment was to investigate this subjective difficulty.

Therefore, this experiment was designed to answer the following questions:

• Can the addition of random dot patterns lead to increased accuracy of absolute depth

judgements between the virtual object and real surface for non-flat surfaces?

• If so, is it true that the resulting increased accuracy of depth judgements is because

random dot patterns provide distinct edges?

• Does the addition of random dot patterns lead to an increase or decrease in the subjective

difficulty of performing a particular depth judgement task?

• Can design guidelines be formulated to assist in determining the dot size and dot density

of random dot patterns that achieve optimal depth judgement accuracies?

In the next section we provide a description of the experimental method that was used to address

the above questions.

6.2 Experimental Method

The same experimental platform described in the previous chapter was used to carry out this

depth judgement experiment. This section describes in detail the stimuli generation and

presentation, the experimental task, the procedures that were followed, as well as the

experimental hypotheses. (In the process of designing this experiment, several pilot studies were

performed; some of the key lessons learned from these pilot studies are presented in Appendix

B1.)

6.2.1 Image Generation and Presentation

An example of the stimuli used in the experiment is shown in Figure 6.1. The stimuli consisted

of a 3D image comprising several different parts, as shown in Figure 6.2. In this section, we

delve into the specific details of each of these parts.

Figure 6.1: Sample stereo pair of stimuli shown to participants. The pattern used in this example

consisted of random dots with sizes of 1/75 and distributed with 40% dot density. Four tenths of

the blue virtual truncated cone is presented behind the surface of the bin. For guidance on how to

fuse these images, see explanation provided in caption of Figure 1.3.

Figure 6.2: Diagram presenting different parts of the stimulus.

6.2.1.1 Real Object

For the real object, a circular dustbin was used. The reason for choosing a curved surface was to

investigate the effectiveness of our idea for non-flat (convex) surfaces, due to the fact that real

world objects, specifically in the medical domain, are rarely 2D, let alone flat. The inside

diameter of the bin was 43 cm. Since one of our claims is that our method is most effective for

real object surfaces without a prominent visible texture, the cylinder was covered with white

cardboard to ensure that this condition was met. Such a surface can be considered analogous to

the smooth surface of an organ, which does not contain distinct elements.

6.2.1.2 Random Dot Patterns

The random dot patterns were generated using the MATLAB function ‘rand’. As a means of

circumventing the challenge of writing software to superimpose the random dot patterns

digitally, the random dot patterns were projected onto the surface of the bin using an AAXA

P4-X pico projector. The distance of the projector from the bin was such that the projected pattern

was a 30 cm x 30 cm square. The position of the projected pattern remained constant throughout

all conditions. Throughout the experiment, dot size (DS) and dot density (DD) of the patterns

were varied. Recalling that, based on our definition, dot size refers to the fraction into which

each dimension is divided, dot sizes of 1/25, 1/50 and 1/75 corresponded to squares with 12, 6

and 4 mm sides, respectively.
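The relationship between relative dot size and physical dot size, together with one possible way of generating such a pattern, is sketched below in Python. The patterns themselves were generated with MATLAB's ‘rand’; the Python equivalent shown here, the function names and the rule that each cell is independently black with probability equal to the dot density are assumptions made for illustration.

```python
import random

PATTERN_SIDE_MM = 300  # the projected pattern was a 30 cm x 30 cm square

def dot_side_mm(cells_per_side):
    """Side length of one square dot when each dimension is divided into N cells."""
    return PATTERN_SIDE_MM / cells_per_side

def random_dot_pattern(cells_per_side, dot_density, seed=0):
    """Grid of booleans (True = black dot).

    Assumption: each cell is black independently with probability dot_density,
    obtained by thresholding uniform random numbers.
    """
    rng = random.Random(seed)
    return [[rng.random() < dot_density for _ in range(cells_per_side)]
            for _ in range(cells_per_side)]

for n in (25, 50, 75):
    print(f"DS = 1/{n}: {dot_side_mm(n):g} mm")  # 12, 6 and 4 mm

pattern = random_dot_pattern(75, 0.40, seed=1)
black_fraction = sum(map(sum, pattern)) / 75 ** 2
print(f"black fraction = {black_fraction:.3f}")  # close to, but not exactly, 0.40
```

With independent cells the realized proportion of black dots only approximates the nominal density; a pattern with an exact density would instead require shuffling a fixed number of black cells.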

Moreover, since one of the main purposes of this experiment was to investigate whether the

distinct edges of the random dot patterns were used as fixation points to guide observers in

making vergence eye movements and, therefore, more accurate depth judgements, each random

dot pattern was projected as either a sharp or a blurry image36. To blur the random dot patterns,

the Gaussian blur filter of Photoshop (with a blurring radius of 20 pixels) was used.

With 3 dot sizes, 3 dot densities and 2 blur conditions (sharp vs. blurry), 18 different patterns in total were projected onto the bin. Figure 6.3 and Figure 6.4 show monoscopic versions of the stimuli with the sharp and blurry patterns, respectively. In addition to the stimuli presented in Figure 6.3 and Figure 6.4, a ‘No Pattern’ condition was also presented.

36 It should be noted that eye tracking devices are necessary to investigate this phenomenon closely and definitively.

Blurring the patterns only allows for obtaining preliminary evidence for the possibility of this phenomenon

occurring.

Figure 6.3: Stimuli with sharp random dot patterns used for Experiment 3.

Figure 6.4: Stimuli with blurry random dot patterns used for Experiment 3.

6.2.1.3 Virtual Object

A sample of the stereo images used in the experiment is presented in Figure 6.1. The virtual

object was a truncated wireframe cone, with its top surface (smaller circle) placed closer to the

observer than its base (larger circle). Figure 6.5 depicts the front, side and top views of the

truncated cone. The decision to employ a 3D rather than a 2D virtual object was to investigate

the feasibility of extending the application of random dot patterns for achieving X-ray vision to

3D virtual objects. Additionally, using a 3D virtual object allowed for testing the accuracy of

depth judgements.

Figure 6.5: Front, side and top views of wireframe truncated cone (the virtual object) and

cylindrical bin (the real object).

Generally, in applications of stereoscopic AR, the 3D images taken from the real world are often

processed for purposes of camera calibration and to obtain depth maps. These depth maps are

then used to render a virtual object at a specific depth relative to the real object(s). However,

doing so requires a certain computational capacity (both software and hardware). Considering

that the focus of this PhD thesis was to investigate the human factors side of this approach rather

than its technical implementation, it was decided to simulate these conditions, without sacrificing

the validity of the obtained results, using real physical models. Therefore, to generate the virtual

object and render it at its appropriate depth a real model was used.

The steps to generate the virtual object were as follows:

a) Two concentric circles with different diameters were drawn on paper, cut out and

attached to the tips of a rod. Two perpendicular diameter lines were also drawn on the

circles. An illustration of this is shown in Figure 6.6. The rod was placed such that the

circles were perpendicular to the line of sight. A stereo image was taken of the circles

using a Fujifilm FinePix REAL 3D W3 stereo camera.

b) The left and right images were imported into Photoshop. For each pair, using the Custom

Shape Tool, rings were drawn onto the two circles. Corresponding points on the diameter

lines of the two circles were also connected using the Line Tool (with a thickness of 3

pixels). Once this virtual truncated cone was created, the rest of the image was deleted by

selecting the Inverse of the coloured truncated cone and saving the image as a .PNG file.

This sequence of steps is illustrated in Figure 6.7.

Figure 6.6: Diagram showing real model used for generating virtual truncated cone. (It should

be noted that, as mentioned, this model consists of concentric circles. However, since the image

shows this model from the side, these circles appear in this figure as ellipses.)

Figure 6.7: Sequence of steps taken to generate virtual truncated cone. As expected, the

connecting rod between the two circles cannot be seen in these images, as it is perpendicular to

the line of sight.

As explained below, the task involved estimating the distances between the proximal and distal

circles, as well as their distances to the surface of the cylinder, as shown in Figure 6.5. To

prevent participants from using the relative size depth cue in this task, two different truncated

cone lengths (the distances between the two circles) were presented, by using either a 15 or a 17

cm rod. In addition, the sizes of the base and top circles were also changed between trials, by

randomly varying the diameters of the blue circles drawn in Step 2 (shown in Figure 6.7).

For the experiment we wanted to present the truncated cone with its base at 6 different depths: at the surface (of the bin), two tenths, four tenths, six tenths and eight tenths of its length behind the surface, and completely behind the surface (with the proximal surface of the truncated cone touching the surface of the bin). The required stimuli are illustrated (schematically) in Figure 6.8.

Figure 6.8: Schematic top view diagram of rod with circles placed at 6 different depths relative

to the surface of the bin. The black numbers noted on the rod are indicative of the proportion of

the truncated cone that was placed behind the bin’s surface. (The light blue lines joining the

circles represent the sides of the final virtual truncated cone.)

To obtain the necessary stimuli, these placements had to be physically reproduced with the real model. To render

the virtual object at its correct depth relative to the surface of the bin, the following steps were

taken:

a) The bin was placed at a fixed location on a table and in front of the Fujifilm stereo

camera, which was fixed on a stand at a distance of 138 cm from the surface of the bin.

For each random dot pattern (as well as the No Pattern condition), a stereo image was

taken of the bin (as shown in Figure 6.3 and Figure 6.4). Once this was done, the bin was

removed from the scene. However, the location of the surface of the bin was marked on

the table.

b) Once the bin was removed, the rod connecting the two circles was put in its place in the scene.

A general illustration of the replacement of the bin with the rod connecting the circles is

shown in Figure 6.9. To obtain the virtual object images needed for the stimuli illustrated

in Figure 6.8, the length of each connecting rod (between the two concentric circles) was

divided into 5 sections – 3 cm apart for the 15 cm rod and 3.4 cm apart for the 17 cm rod.

Separate stereo images were taken of the model (as shown in Step 1 of Figure 6.7) for a

series of locations of the larger pink circle relative to the surface of the bin in order to

obtain images of the 6 conditions illustrated in Figure 6.8.

c) The stereo photos taken of the bin and of the model of the virtual object were split to left

and right images, and each of these went through the process illustrated in Figure 6.7.

d) Each left image of the virtual object was overlaid on top of each of the left images shown

in Figure 6.3 and Figure 6.4. The same was done for right images. This was done by

using the Apply Image option in Photoshop.

A sample of the resulting final stereo image is presented in Figure 6.1.
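The rod placements described in step b) amount to simple arithmetic, sketched below. The function name is hypothetical; the rod lengths and the six depth conditions are those given in the text.

```python
ROD_LENGTHS_CM = (15.0, 17.0)
DEPTH_TENTHS = (0, 2, 4, 6, 8, 10)  # proportion of the cone behind the bin surface


def base_offset_cm(rod_length_cm, tenths_behind):
    """Distance of the base (larger circle) behind the marked bin surface."""
    return rod_length_cm * tenths_behind / 10.0


for length in ROD_LENGTHS_CM:
    offsets = [round(base_offset_cm(length, t), 1) for t in DEPTH_TENTHS]
    print(f"{length:.0f} cm rod: base at {offsets} cm behind the surface")
```

Successive placements are therefore 3 cm apart for the 15 cm rod and 3.4 cm apart for the 17 cm rod, matching the section lengths stated above.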

Figure 6.9: Schematic diagram showing the camera setup with respect to the bin (above), which

was replaced by the rod connecting the circles (below), which was placed along the red dashed line (marking the location of the bin’s surface).

Once the images were generated, they were rendered stereoscopically on a desktop computer (Windows 7 Professional OS with an NVIDIA Quadro 600 graphics card) using software written in MATLAB. The stimuli were presented to participants at a size of 15 cm by 23 cm on a 23-inch LCD screen (ASUS

VG236HE, 1920 x 1080 resolution, 120 Hz refresh rate). Stereo images were observed using the

NVIDIA 3D vision system with 3D Vision 2 glasses. The participants’ task was to determine,

within a limited amount of time, the fraction of the truncated cone that was perceived to be

behind the cylinder’s surface.

6.2.2 Participants

Fifteen students from the University of Toronto (9 male and 6 female), all 18-49 years old, were recruited. Participants of Experiments 1 and 2 were precluded from participating in Experiment 3

to prevent learning effects. All participants either had normal visual acuity or used corrective

devices during the experiments to achieve normal visual acuity. To confirm the absence of any

stereoscopic vision problems, the NVIDIA 3D stereo vision test was administered. As

compensation, participants were each paid $15/hour. To motivate participants during the

experiment, a lottery with a $50 gift card prize was carried out after all experiments were done.

The participants were informed that the number of lottery ballots assigned to their names would

be proportional to their respective performance scores.

6.2.3 Procedure

After taking the stereo vision test, participants were given an information sheet outlining the

details of the experiment. They were then given the consent form to sign, which was followed by

a brief questionnaire, asking about their age, gender, use of corrective lenses and ability to

perceive stereoscopically. Copies of these are included in Appendix A3.

Once these steps were taken, 6 different samples of stimuli similar to the one shown in Figure

6.1 were presented to the participant one by one. During the first sample presentation,

participants were asked to describe what they saw. Once it was confirmed that they were seeing a

truncated cone that is partially behind the surface of a bin covered by random black dots, the

procedure of the experiment was further explained to them by showing the remaining 5 examples.

Participants were then taken through a brief training session (consisting of 6 trials) which

allowed them to become familiar with the experimental software. In contrast to the preceding

examples, however, no feedback was provided to participants. During this training session (as

well as the actual experiment), a chinrest was used to ensure a fixed viewer-to-display distance

of 40 cm.
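For reference, the visual angle subtended by the stimulus at this viewing distance follows from basic trigonometry. The sketch below assumes the 15 cm by 23 cm stimulus size reported for the display, a flat stimulus centred on the line of sight, and an illustrative function name.

```python
import math

VIEWING_DISTANCE_CM = 40.0  # fixed by the chinrest


def visual_angle_deg(extent_cm, distance_cm):
    """Full visual angle of an extent centred on, and perpendicular to, the line of sight."""
    return math.degrees(2.0 * math.atan(extent_cm / (2.0 * distance_cm)))


for extent in (15.0, 23.0):
    angle = visual_angle_deg(extent, VIEWING_DISTANCE_CM)
    print(f"{extent:.0f} cm side: {angle:.1f} degrees")
```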

Each trial consisted of a 2-second presentation of a stimulus. The stimulus would then disappear

and participants would be prompted with a screen asking for a response to “Determine what

proportion of the truncated cone (x-tenths) is behind the bin's surface.” Because we realised that

it might be easy to confuse our request to estimate the proportion behind the surface with a

potential request to estimate the proportion in front, a visual guide, printed on a sheet of paper

and shown in Figure 6.10, was placed next to the monitor. It is important to point out that, even

though the virtual object was presented at 6 discrete depths (0, 2, 4, 6, 8, and 10 tenths behind the

surface of the bin), participants received no feedback and were thus unaware of this constraint. In

order to obtain higher resolutions in the measured errors, we allowed their responses to take on

any of the 11 discrete values between 0 and 10. After choosing their response from the numbers

on top of the keyboard, participants would press Enter.

They were then prompted to answer a second question that asked: “On a scale of 1 (easiest) to 4

(most difficult), how difficult did you find the task?” The procedure to answer this question was

the same as before, using the numeric keyboard. Participants had unlimited time to answer both

questions. The rationale for including this question was to obtain a measure of participants’

assessment about the difficulty of the task, which is an important factor in developing a

practically usable X-ray display. The correspondence between these ratings and the accuracy of

the depth judgement task was also meant to provide a measure of the consistency between

participants’ objective performance and subjective experience. In other words, asking this

question was meant to provide us with information about whether more accurate responses were

associated with lower subjective difficulties and vice versa.

Figure 6.10: Guide placed next to monitor for participants’ reference during experiment.

Once the training was done, the actual experiment began. With 19 display conditions (3 dot

densities x 3 dot sizes x 2 blur (sharp and blurry) levels + 1 No Pattern condition) and 30 task

conditions (6 depths x 5 repetitions of each), participants went through a total of 570 (19x30)

trials. On average, the experiment took about 50-60 minutes, during which they were given 10

minutes to take one break.
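The trial count can be checked by enumerating the design. In the sketch below the factor levels are those stated in the text, while the dictionary layout, the function name and the shuffled presentation order are assumptions made for illustration.

```python
import itertools
import random

DOT_SIZES = ("1/25", "1/50", "1/75")
DOT_DENSITIES = (20, 40, 60)  # percent
BLUR_LEVELS = ("sharp", "blurry")
DEPTH_TENTHS = (0, 2, 4, 6, 8, 10)
REPETITIONS = 5


def build_trials(seed=None):
    """Enumerate every display condition x depth x repetition, then shuffle."""
    displays = [
        {"dot_size": ds, "dot_density": dd, "blur": blur}
        for ds, dd, blur in itertools.product(DOT_SIZES, DOT_DENSITIES, BLUR_LEVELS)
    ]
    # The No Pattern condition has no dot size, dot density or blur level.
    displays.append({"dot_size": None, "dot_density": None, "blur": None})
    trials = [
        {**display, "depth_tenths": depth, "repetition": rep}
        for display in displays
        for depth in DEPTH_TENTHS
        for rep in range(1, REPETITIONS + 1)
    ]
    random.Random(seed).shuffle(trials)  # assumption: trials presented in random order
    return trials


trials = build_trials(seed=0)
print(len(trials))  # 19 display conditions x 30 task conditions = 570
```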

After the trials were completed, participants were interviewed. The script for this interview is

also included in Appendix A3. The main topics covered during the interview were:

• whether there were any difficulties experienced in fusing images,

• whether there were any specific strategies used for making depth judgements, and

• the way in which the black dots were perceived.

6.2.4 Depth Judgement Task

Although the general experimental procedure was presented in the previous section, due to the

important role of the depth judgement task in the experimental design, this section focuses

exclusively on the details of this task.

Generally, the depth of objects can be considered in two ways: ordinal and absolute. Ordinal

depth pertains to the depth order between two (or more) objects (i.e., which is closer, which is

farther). For example, our first set of experiments (described in Chapter 5) asked about the

ordinal depth of the virtual object relative to the real surface37. Absolute depth, on the other

hand, can be ascertained by the observer using units such as meters (Livingston et al., 2013).

In addition, the absolute depth of objects can be defined in terms of egocentric distances or

exocentric distances. Egocentric distances refer to the absolute depth of an object relative to the

observer, while exocentric distances refer to the absolute depth of an object relative to another

object in the field of view (Swan et al., 2007). As previously mentioned, the focus of this

experiment was to determine whether adding random dot patterns can lead to an improvement in

the accuracy of exocentric distance estimations about the virtual object relative to the real

surface.

Considering the points mentioned above, by asking participants to “determine what proportion of

the truncated cone (x-tenths) is behind the bin's surface”, we are asking them to determine the

exocentric distance of the larger circle of the truncated cone relative to the bin’s surface (‘a’ in

Figure 6.11), given that the exocentric distance between the larger circle and the smaller circle of

the truncated cone (‘a+b’ in Figure 6.11) is 10 units38.
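In terms of the distances ‘a’ and ‘b’, the requested response is 10·a/(a+b). A minimal sketch with a hypothetical function name and made-up distances (not measurements from the experiment):

```python
def tenths_behind(a, b):
    """Proportion of the cone behind the surface, in tenths.

    a: distance of the larger circle behind the bin surface
    b: distance of the smaller circle in front of the bin surface
    """
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("a and b must be non-negative and not both zero")
    return 10.0 * a / (a + b)


# Made-up example: a 15 cm cone with 9 cm behind and 6 cm in front of the surface.
print(tenths_behind(9.0, 6.0))  # 6.0
```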

37 It should be pointed out though that, ultimately, the findings from the ordinal depth judgement task in this

experiment resulted in a psychophysical function that provided an estimate of the absolute depth of the virtual

object.

38 It is worth pointing out that even though asking for the proportion of the cone that is in front of the bin (that is,

b/10, where ‘b’ is shown in Figure 6.11) should technically achieve the same result, the reason we chose to ask for

the proportion of the cone that is behind the bin’s surface was that we are generally interested in applications of X-

ray vision where depth judgements about the objects that are behind the real surface are more pertinent.

Figure 6.11: Top view example of truncated cone’s position relative to the surface of the bin. In

this image ‘a’ and ‘b’ denote the distance of the larger and smaller circles of the truncated cone

relative to the bin’s surface, respectively.

6.3 Hypotheses

To recap, the independent parameters of this experiment were as follows:

• Dot Size: 1/25 (largest), 1/50, 1/75 (smallest)

• Dot Density: 20%, 40%, 60%

• Blur level: Sharp, Blurry

• Fraction behind bin surface: 0/10, 2/10, 4/10, 6/10, 8/10, 10/10

• Patterns vs No Pattern condition

On the other hand, the dependent parameters measured during the experiment (in addition to the

interview results) were:

• Estimated depth of virtual object relative to real surface

• Difficulty rating of associated depth estimation trial

As such, the hypotheses presented are with regards to these two dependent parameters.

6.3.1 Estimated Depth of Virtual Object relative to real surface (EDVO)

As discussed, the addition of random dot patterns is expected to help with making more accurate

depth judgements about the virtual object relative to the real surface. Thus, Hypothesis 1 was

that the addition of patterns would lead to lower errors in EDVO relative to the No Pattern

condition.

Based on the reasoning provided in Section 4.2, the logic behind our next hypothesis was that

random dot patterns provide distinct edges that assist observers in making vergence eye

movements, thus allowing for more accurate absolute depth judgements about the virtual object

relative to the real surface. Hypothesis 2a, therefore, was that sharp patterns would lead to smaller

errors in EDVO compared to blurry patterns. Moreover, considering that smaller dot sizes have a

higher dominant spatial frequency compared to larger dot sizes and since ‘blurring’, in effect,

attenuates high spatial frequencies, Hypothesis 2b was that the effect of blur on errors in EDVO

would be larger for smaller dot sizes. In other words, if EDVO errors were found to be different

for blurry and sharp patterns, it was hypothesised that this difference would be larger for smaller

dot sizes.
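The reasoning behind Hypothesis 2b can be illustrated numerically. A Gaussian blur with standard deviation σ scales the amplitude of a spatial frequency f by exp(-2π²σ²f²), so higher frequencies are attenuated more strongly. The sketch below rests on several assumptions that are not stated in the thesis: a hypothetical pattern width of 1500 pixels, the 20-pixel blur radius treated as σ, and a dominant frequency of one cycle per two dot widths.

```python
import math

IMAGE_SIDE_PX = 1500.0  # hypothetical pattern width in pixels (not given in the text)
SIGMA_PX = 20.0         # assumption: the 20-pixel blur radius is treated as the Gaussian sigma


def gaussian_gain(frequency_cycles_per_px, sigma_px):
    """Amplitude gain of a Gaussian blur at a given spatial frequency."""
    return math.exp(-2.0 * (math.pi * sigma_px * frequency_cycles_per_px) ** 2)


def dominant_frequency(cells_per_side, image_side_px=IMAGE_SIDE_PX):
    """Assumption: one cycle spans two dot widths (one dark and one light dot)."""
    dot_side_px = image_side_px / cells_per_side
    return 1.0 / (2.0 * dot_side_px)


for cells in (25, 50, 75):
    gain = gaussian_gain(dominant_frequency(cells), SIGMA_PX)
    print(f"DS = 1/{cells}: gain at dominant frequency = {gain:.3f}")
```

Under these assumptions the gain falls steeply from the largest to the smallest dot size, which is the sense in which blurring is expected to affect small dots more.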

As for the sharp patterns, two competing hypotheses were created with regards to the effect of

dot size. Since larger dot sizes provide more distinct edges for making vergence eye movements,

it can be hypothesized that they will lead to smaller errors in EDVO. On the other hand, smaller

dot sizes result in a larger number of edges available for making vergence eye movements and

can, therefore, be predicted to lead to smaller errors in EDVO. With regards to dot density,

higher dot densities of random dot stereograms increase the stimulus’ strength in attracting

vergence (Rashbass & Westheimer, 1961; Mallot, Roll & Arndt, 1995). Therefore, based on this

reasoning, Hypothesis 3 predicted that increasing dot densities would facilitate vergence eye

movements, leading to smaller errors in EDVO.

6.3.2 Difficulty Rating of depth estimation task (DR)

With regards to difficulty ratings (DRs), the following hypotheses were made:

• Hypothesis 4 was that the No Pattern condition would yield higher DRs compared to the

pattern conditions. The reason for this was that, without the pattern, it was expected to be

rather difficult to ascertain the depth of the real surface.

Moreover:

• Since the blurry patterns did not provide the distinct edges required for making vergence

eye movements, Hypothesis 5 was that blurry patterns would lead to higher difficulty

ratings compared to their sharp counterparts.

Two more hypotheses are illustrated in Figure 6.12:

• Hypothesis 6 predicted that larger dot sizes would lead to lower DRs since larger dots

provided edges that were more distinct.

• As mentioned above, since higher dot densities are more likely to facilitate vergence eye

movements, Hypothesis 7 was that higher dot densities would lead to lower DRs.

Figure 6.12: Experimental hypotheses 6 and 7, illustrating expected changes in DR as a function

of dot size and dot density.

6.4 Results

We start the data analysis process with a visual inspection of the data that were collected. The

statistical analyses of these data are then presented, followed by a discussion of the results and

their implications. In this section, we first focus on the results obtained for each of the dependent

parameters separately and then present the results related to the correspondence between these

two parameters. Finally, the responses to the interview questions are summarized and presented.

With regards to the statistical analyses, it should be noted that the experimental design - which

consisted of 30 trials for the No Pattern condition (6 depths x 5 repetitions) and 30 trials for each

combination of blur, dot size, and dot density (for a total of 540 trials) - led to an unbalanced

design. Therefore, to simplify the analyses, two series of repeated-measures ANOVAs were

performed on the two dependent parameters: EDVO and DR. The first repeated-measures

ANOVA treated each pattern independently and was meant to compare the results for the No

Pattern condition to each of the random dot pattern conditions. The second ANOVA was done

exclusively on the results pertaining to the random dot pattern conditions and was meant to

compare the various dot sizes, dot densities, depths and blur conditions.

6.4.1 Estimated Depth of Virtual Object (EDVO)

As previously mentioned, for each trial participants entered the proportion (in tenths) of the

truncated cone (virtual object) that they perceived as being behind the real bin surface. Figure

6.13, Figure 6.14 and Figure 6.15 present scatterplots showing the ‘Estimated Depth of Virtual

Object’ (EDVO) as a function of the virtual object’s actual depth proportion (relative to the real

surface) for the various patterns. Figure 6.16 shows the same scatterplot for the No Pattern

condition. In these figures, both the sizes and colours of the dots are proportional to the number

of occurrences at each point (higher number of occurrences are shown with larger and darker

circles).

The y=x line has also been added as a reference line, representing perfect performance. In

addition to acting as a reference for the values shown on each graph, it is important to understand

that all points above the y=x line represent estimates that are biased towards being behind the bin

surface, while all points below the y=x line represent estimates that are biased towards being in

front. Accordingly, the scenario for which depth perception would be “perfect” would result in a

‘scatterplot’ comprising only large red circles on the y=x line. On the other hand, complete

chance performance would lead to a scatterplot comprising all circles of the same size (and of

yellow colour) and equally distributed across all y-values for each depth. The resulting trend line

in this case would be y=5, which is shown with black dashed lines. As such, these scatterplots

should be compared to these two extreme situations.
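The two extreme situations can be made concrete with a small sketch. The data below are idealised and made up (they are not experimental results), the function names are hypothetical, and the figure of 75 responses per depth follows from 15 participants each completing 5 repetitions.

```python
from collections import Counter

DEPTHS = (0, 2, 4, 6, 8, 10)  # actual depth proportions, in tenths
RESPONSES = range(11)         # allowed responses: 0..10


def occurrence_counts(trials):
    """Count how often each (actual, estimate) pair occurs; this sets marker size and colour."""
    return Counter(trials)


def mean_estimate_by_depth(trials):
    """Mean estimate for each actual depth."""
    totals, counts = Counter(), Counter()
    for actual, estimate in trials:
        totals[actual] += estimate
        counts[actual] += 1
    return {depth: totals[depth] / counts[depth] for depth in counts}


# Perfect performance: all 75 responses at each depth fall on the y=x line.
perfect = [(d, d) for d in DEPTHS for _ in range(75)]
# Idealised chance performance: every response is chosen equally often at every depth.
chance = [(d, r) for d in DEPTHS for r in RESPONSES]

print(occurrence_counts(perfect)[(6, 6)])  # 75
print(mean_estimate_by_depth(chance))      # 5.0 at every depth, i.e. the y=5 line
```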

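Figure 6.13: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the virtual object’s actual depth proportion for the patterns with DD = 20%: (a) Sharp condition, (b) Blurry condition. The sizes and colours of the dots are proportional to the number of occurrences at each point. Each column adds up to 75 trials (15 participants*5 trials). A blue trend line has been fitted to the data. The y=x and y=5 reference lines are also provided to show perfect and chance performance, respectively.

Figure 6.14: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the virtual object’s actual depth proportion for the patterns with DD = 40%: (a) Sharp condition, (b) Blurry condition.

Figure 6.15: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the virtual object’s actual depth proportion for the patterns with DD = 60%: (a) Sharp condition, (b) Blurry condition.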

Figure 6.16: Scatterplot showing the ‘Estimated Depth of Virtual Object’ as a function of the

virtual object’s actual depth proportion for the No Pattern condition.

As a first approximation, a blue trend line has been fitted to the values for the EDVO. The point where the trend line intersects the y=5 line can be considered the ‘Point of Subjective Equality’ (PSE), where the observer perceives the virtual object as being halfway behind the bin’s surface (EDVO=5/10).

The actual depth proportions where these intersections occur are calculated and shown for the

various patterns in Figure 6.17.
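The PSE computation can be sketched as follows, assuming that the trend line is an ordinary least-squares straight line (the text does not specify the fitting method). The function names are hypothetical and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def point_of_subjective_equality(slope, intercept, criterion=5.0):
    """Actual depth at which the fitted line crosses the criterion (EDVO = 5/10)."""
    if slope == 0:
        raise ValueError("a flat trend line never crosses the criterion")
    return (criterion - intercept) / slope


# Made-up mean estimates that overestimate depth at shallow positions.
actual = [0, 2, 4, 6, 8, 10]
estimated = [2.0, 3.6, 5.2, 6.8, 8.4, 10.0]

slope, intercept = fit_line(actual, estimated)
print(f"PSE = {point_of_subjective_equality(slope, intercept):.2f} tenths")
```

For these made-up values the fitted line is y = 0.8x + 2, which crosses y=5 at x = 3.75; a PSE below 5 indicates that the object is judged to be halfway behind the surface before it actually is.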

Figure 6.17: Plot showing the Point of Subjective Equality (PSE) as a function of dot density.

The PSE for the No Pattern condition is shown for reference.

By examining the trends shown in Figure 6.17, several observations can be made:

• The PSE is smaller for all patterns compared to the No Pattern condition. This

observation is in line with Hypothesis 1 and could potentially serve as supporting

evidence that random dot patterns can help in improving ordinal and absolute depth

judgements.

• As dot density increases, the PSE appears to move towards the front of the bin.

• The blurring of the random dot patterns leads to a shift of the PSE towards the back of the

bin. In other words, (similar to the finding discussed in Section 5.3.3) blurry patterns

cause the virtual object to be perceived as closer to the observer compared to their sharp

counterparts.

• As dot size decreases (1/25 → 1/75), the difference in PSE between sharp and blurry

patterns increases. This observation is in line with Hypothesis 2b which stated that ‘if

EDVO errors were found to be different for blurry and sharp patterns, this difference

would be larger for smaller dot sizes’.

• The relative ordering of the patterns with respect to PSE reverses when they are blurred:

For the sharp patterns, the PSE of 1/75 < 1/50 < 1/25 but, for the blurry patterns, the PSE

of 1/75 > 1/50 > 1/25 (on average). In other words, as dot size decreases, the PSE moves

closer to the observer for sharp patterns and farther behind the bin for blurry patterns.

To further inspect the data for trends, the absolute values of the errors were also averaged for all

participants. Each error was calculated as the difference between the EDVO and the actual

proportion of the truncated cone behind the bin surface (perceived depth proportion minus actual

depth proportion). Figure 6.18, Figure 6.19 and Figure 6.20 illustrate the average absolute errors

for all participants as a function of the virtual object’s depth for the three dot sizes. The results

for the No Pattern condition are also provided for each set of results for comparison purposes.

Upon inspection of these figures, it seems that for depths≥6/10, differences between the No Pattern and the random dot pattern conditions become greater. This observation is in line with

Hypothesis 1. Moreover, for depths≥6/10, the sharp random dot patterns seem to lead to lower

errors than their blurry counterparts (as predicted by Hypothesis 2a). This effect also seems to be

more noticeable as dot size becomes smaller, which was also predicted by Hypothesis 2b. On the

other hand, no obvious trends can be inferred about the effect of dot size and dot density on the

average absolute errors in perceived depth.

To check whether any of these apparent effects were statistically significant, ANOVAs were

performed on the absolute values of errors averaged across the 5 (repetition) trials for each

condition and each participant. The following sections focus on the results of the two ANOVAs

that were performed.
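The averaging step that precedes the ANOVAs can be sketched as below. The record layout and names are hypothetical, and the responses are made up for illustration.

```python
from collections import defaultdict


def mean_absolute_errors(trials):
    """Average |estimate - actual| over repetitions, per participant, pattern and depth."""
    grouped = defaultdict(list)
    for trial in trials:
        key = (trial["participant"], trial["pattern"], trial["actual"])
        grouped[key].append(abs(trial["estimate"] - trial["actual"]))
    return {key: sum(errors) / len(errors) for key, errors in grouped.items()}


# Made-up responses of one participant to 5 repetitions of one condition.
trials = [
    {"participant": 1, "pattern": "DS=1/50, DD=40%, sharp", "actual": 8, "estimate": e}
    for e in (6, 7, 8, 9, 10)
]
print(mean_absolute_errors(trials))
```

Because absolute values are taken before averaging, over- and underestimates do not cancel: the five made-up responses above are unbiased on average but still yield a mean absolute error of 1.2 tenths.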

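Figure 6.18: Average absolute error in perceived depth as a function of the virtual object’s actual depth relative to the real surface for dot size = 1/25.

Figure 6.19: Average absolute error in perceived depth as a function of the virtual object’s actual depth relative to the real surface for dot size = 1/50.

Figure 6.20: Average absolute error in perceived depth as a function of the virtual object’s actual depth relative to the real surface for dot size = 1/75.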



6.4.1.1 Two-way Repeated Measures ANOVA

As previously mentioned, the first ANOVA was performed such that each pattern was treated

independently as a means of comparing the different patterns to the No Pattern condition. With 3

dot sizes, 3 dot densities, 2 blur conditions and a No Pattern condition, there were a total of 19

patterns. The two independent parameters considered were pattern and depth.

Mauchly’s test indicated that the assumption of sphericity had been violated for the main effect

of depth, χ²(14) = 102.08, p < .0005. Therefore, degrees of freedom were corrected using

Greenhouse-Geisser estimates of sphericity (ε = .32). Following this correction, there was a

marginally significant effect of depth (p=.06). As for pattern, results revealed a significant main

effect, F(18, 252) = 3.43, p<.0005. Contrasts, however, revealed that there was no significant

difference between the No Pattern condition and the two blurry patterns with dot density of 20% and

dot sizes of 1/50 and 1/75. This finding seems to make sense, considering that the small and

blurry dots with low dot densities (as shown in Figure 6.4) visually appear very similar to the No

Pattern condition. Other than these two patterns, significant main effects of pattern were found

for all other patterns.

On the other hand, there was a significant interaction effect between depth and pattern, F(90,

1260)=4.06, p<.0005. Table 6.1 outlines these significant interactions revealed by contrasts. For

the patterns not listed in Table 6.1, no significant interaction effect was found.
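The degrees of freedom reported above follow from the standard formulas for a fully within-subject design. The sketch below assumes complete data from all 15 participants; the function names are illustrative.

```python
def rm_anova_df(levels, n_participants):
    """Effect and error df for a within-subject effect.

    levels: number of levels of each factor in the effect
            (one entry for a main effect, several for an interaction).
    """
    effect_df = 1
    for k in levels:
        effect_df *= k - 1
    return effect_df, effect_df * (n_participants - 1)


def mauchly_df(effect_df):
    """Degrees of freedom of the chi-square statistic in Mauchly's test."""
    return effect_df * (effect_df + 1) // 2 - 1


def greenhouse_geisser(effect_df, error_df, epsilon):
    """Both df are multiplied by the Greenhouse-Geisser epsilon."""
    return effect_df * epsilon, error_df * epsilon


N = 15
print(rm_anova_df((19,), N))    # pattern: (18, 252)
print(rm_anova_df((19, 6), N))  # pattern x depth: (90, 1260)
depth_df, depth_error_df = rm_anova_df((6,), N)
print(mauchly_df(depth_df))     # 14
print(greenhouse_geisser(depth_df, depth_error_df, 0.32))
```

With ε = .32, the corrected degrees of freedom for the main effect of depth are approximately 1.6 and 22.4.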

Table 6.1: Contrast results for significant interaction effects between depth and pattern. The

rows are colour coded to aid in identification of patterns with the same dot size.

No Pattern to pattern with:    Blur condition    Depth = 0 to depths = ...
DS = 1/25, DD = 20%            Sharp             8
DS = 1/25, DD = 40%            Sharp             6, 8, 10
DS = 1/25, DD = 60%            Sharp             8
DS = 1/50, DD = 20%            Sharp             8
DS = 1/50, DD = 40%            Sharp             2, 4, 6, 8, 10
DS = 1/50, DD = 60%            Sharp             6, 8, 10
DS = 1/75, DD = 20%            Sharp             8, 10
DS = 1/75, DD = 40%            Sharp             6, 8, 10
DS = 1/75, DD = 60%            Sharp             2, 6, 8, 10
DS = 1/25, DD = 60%            Blurry            8
DS = 1/50, DD = 40%            Blurry            6
DS = 1/50, DD = 60%            Blurry            8, 10

To derive meaning from these results, we performed individual t-tests comparing each pattern

with the No Pattern condition at every depth. Although one may argue that doing so increases the

chances of making Type I errors, if found to be non-significant these tests provide supporting

evidence not to reject the null hypothesis. In other words, although we are not justified in

‘accepting the null hypothesis’, the existence of non-significant t-tests provides some supporting

evidence that the differences (in absolute error in perceived depth) between the two conditions

are not significant. Below, the final conclusive results are summarized and presented separately

for each dot size39:

• DS=1/25: As visually confirmed in Figure 6.18, no significant differences between the various patterns and the No Pattern condition were found for depths<6/10. Generally, the interaction effects that were found to be significant (for depth=0/10 to depths≥6/10) are a result of the differences between the patterns and the No Pattern condition becoming significant as the virtual object moves farther behind the real surface. In other words, patterns with the dot size of 1/25 lead to lower absolute errors in perceived depth compared to the No Pattern condition when the virtual object is placed at depths≥6/10. For depths≤4/10, no significant difference exists between these patterns and the No Pattern condition.

39 To aid in comprehension of these results, it is suggested to refer to Figure 6.18, Figure 6.19 and Figure 6.20.

• DS=1/50 (Figure 6.19): For the sharp patterns, there were no significant differences

between the No Pattern condition and the patterns with dot densities of 20 and 60% at

depth=0/10. Therefore, the depths where interactions exist (as presented in Table 6.1) are

the depths for which significant differences exist between these patterns and the No

Pattern condition. For example, based on the results presented in Table 6.1, we can

conclude that for the pattern with 60% dot density, the absolute errors in perceived depth

are significantly smaller compared to the No Pattern condition for depths≥6/10. For

smaller depths, there are no significant differences between this pattern and the No

Pattern condition. For the (sharp) pattern with 40% dot density, the mean absolute error

in perceived depth was found to be significantly larger than for the No Pattern condition

for depth=0/10. No significant difference was found for depths=2/10 and 4/10

(explaining why there are interaction effects present for these depths in Table 6.1).

However, for depths≥6/10, the mean absolute error in perceived depth is significantly

smaller compared to the No Pattern condition.

As for the blurry patterns, as previously mentioned, there was no significant main effect

of pattern for dot density 20%. For dot density 40%, there was no significant difference

in absolute error in perceived depth compared to the No Pattern condition for all depths

other than depth=6/10. However, when DD=60%, the absolute errors in perceived depth

were significantly smaller than those of the No Pattern condition for depths=8/10 and

10/10.

• DS=1/75 (Figure 6.20): For the sharp patterns, there were no significant differences

between the No Pattern condition and the patterns with dot densities of 20 and 40% at

depth≤4/10. Therefore, the depths where interactions exist (as presented in Table 6.1) are

the depths for which significant differences exist between these patterns and the No

Pattern condition. For example, based on the results presented in Table 6.1, we can

conclude that for the pattern with 40% dot density, the mean absolute errors in perceived

depth are significantly smaller compared to the No Pattern condition for depths≥6/10. For

smaller depths, there are no significant differences between this pattern and the No

Pattern condition. For the (sharp) pattern with 60% dot density, the mean absolute error

in perceived depth was found to be significantly larger than the No Pattern condition for

depth=0/10. No significant difference was found for depth=2/10 (which explains why

there is an interaction effect present for this depth in Table 6.1). However, for

depths≥6/10, the absolute error in perceived depth is significantly smaller compared to

the No Pattern condition.

As for the blurry patterns, as previously mentioned, there was no main effect of pattern

for dot density of 20%. For dot density of 40% and 60%, there were no significant

interaction effects present. However, main effects of patterns were found to be

significant. Referring to Figure 6.20, it can be inferred that these patterns generally led to

significantly lower absolute errors in perceived depth compared to the No Pattern

condition.

6.4.1.2 Four-way Repeated Measures ANOVA

As previously mentioned, a second set of ANOVAs was done exclusively on the results

pertaining to the random dot pattern conditions and was meant to compare the various dot sizes,

dot densities, depths and blur conditions40.

Mauchly’s test indicated that the assumption of sphericity had been violated for the interaction

effects of blur and depth (𝜒2(14) = 72.85, 𝑝 < .0005), depth and dot density, (𝜒2(54) =

89.83, 𝑝 < .005) and that of blur, depth and dot size, (𝜒2(54) = 120.49, 𝑝 < .0005). Therefore,

degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (𝜀 = .28

for the interaction effect between blur and depth, 𝜀 = .36 for the interaction effect between depth

and dot density, and 𝜀 = .37 for the interaction effect between blur, depth and dot size).

Considering these corrections, the effects that were found to be significant are summarized as

follows:

• Blur * Depth: F(1.38, 19.36) = 7.77, p<.05

40 Considering the very small ratio (1/18) of No Pattern condition trials relative to those of the random dot patterns,

excluding that set of results from this analysis was not deemed to diminish the validity of the obtained results.

• Blur * Dot Density: F(2, 28) = 11.81, p<.0005

• Depth * Dot Density: F(3.64, 50.94) = 7.83, p<.0005

• Blur * Depth * Dot Size: F(3.71, 51.94) = 3.84, p<.0005

• Blur * Depth * Dot Size * Dot Density: F(20, 280) = 1.82, p<.05

In the following sections, we focus on each of these significant effects individually.
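The corrected degrees of freedom listed above can be reproduced by scaling the uncorrected degrees of freedom of each effect by its Greenhouse-Geisser estimate. The following is a minimal sketch, assuming 15 participants (an inference from the F(1,14) contrasts), 6 depths, 3 dot sizes, 3 dot densities and 2 blur conditions. The function names are illustrative and are not taken from the statistical software used for the analysis.

```python
def rm_effect_df(levels, n_participants):
    """Uncorrected (effect, error) df for a within-subject effect.

    `levels` lists the number of levels of each factor in the effect.
    """
    df_effect = 1
    for k in levels:
        df_effect *= k - 1
    return df_effect, df_effect * (n_participants - 1)


def greenhouse_geisser_df(epsilon, df_effect, df_error):
    """Scale both df by the sphericity estimate epsilon."""
    return epsilon * df_effect, epsilon * df_error


N_PARTICIPANTS = 15  # assumption inferred from the reported F(1,14) contrasts

# Blur (2 levels) * Depth (6 levels): uncorrected df are (5, 70).
df_effect, df_error = rm_effect_df([2, 6], N_PARTICIPANTS)
corrected = greenhouse_geisser_df(0.28, df_effect, df_error)
print(df_effect, df_error, [round(v, 2) for v in corrected])
```

With the rounded estimate 𝜀 = .28, this gives approximately (1.4, 19.6), close to the reported F(1.38, 19.36); the remaining gap is attributable to rounding of 𝜀. The same helper gives (2, 28) for Blur * Dot Density and (20, 280) for the four-way interaction, for which no correction was applied.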

6.4.1.2.1 Interaction Effect of Blur and Depth

Results revealed a significant interaction effect between blur and depth, indicating that blur had

different effects on the average absolute error in perceived depth when the depth of the virtual

object changed. To break down this interaction, contrasts were performed comparing all the

depths the virtual object appeared at to depth = 0/10 (when the virtual object was completely in

front of the bin). These revealed significant interactions for all depths (with a marginally

significant interaction for depth=4/10, where p = .06):

• Depth=2/10 to Depth=0/10: F(1,14)=4.82, r=.51, p<.05

• Depth=4/10 to Depth=0/10: F(1,14)=4.11, r=.48, p=.06




The interaction graph is presented in Figure 6.21, which differs from Figure 6.18, Figure 6.19

and Figure 6.20 in that results are combined for all dot densities and dot sizes (resulting in 18

points at each depth). This plot illustrates that the effect of blur on ‘average absolute error in

perceived depth’ for depths≥2/10 was significantly different from this effect at depth=0/10. Upon

closer inspection of this graph, we can also notice that, while there doesn’t seem to be a

noticeable difference between the sharp and blurry conditions for depths=2/10, 4/10, and 6/10,

when depth≥8/10, the sharp patterns result in lower average absolute errors in perceived depth

than those of the blurry patterns.

A potential explanation could be that when the random dot pattern was blurry, it gave the

impression that the surface of the bin was farther away and thus participants were more likely to

perceive the virtual object (which appeared sharp and in focus) as being in front of the bin (as

verified by the results presented in Section 6.4.1 and the participants’ responses to interview

questions, discussed in Section 6.4.4). On the other hand, as the depth of the virtual object

increased (it appeared farther behind the surface of the bin), the sharp patterns provided edges

that were more distinct and, thus, resulted in depth judgements with lower absolute errors.

Figure 6.21: Average absolute error in perceived depth of virtual object as a function of its

actual depth depicting the interaction effect of blur and depth.

6.4.1.2.2 Interaction Effect of Blur and Dot Density

There was a significant interaction effect between blur and dot density. This indicates that blur

had different effects on the ‘average absolute error in perceived depth’ when the dot density of

the random dot patterns changed. To break down this interaction, contrasts were performed

comparing the effect of blur for all three dot densities (20%, 40% and 60%). Contrasts revealed

significant interactions when comparing the effect of blur both for dot density 40% to dot density

20% (F(1,14)=6.67,r=.57), and dot density 60% to dot density 20% (F(1,14)=31.07,r=.83).

Looking at the interaction graph (Figure 6.22), these effects reflect that the differences in the

‘average absolute errors in perceived depth proportion’ due to the blur are significantly larger for

dot density 20% compared to 40% and 60% (0.4 compared to 0.15 and 0.05). There was no

significant interaction between dot density and blur when dot density 40% was compared to

60%.
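The effect sizes r reported alongside the contrasts in this section are consistent with the common conversion r = sqrt(F / (F + df_error)) for contrasts with one numerator degree of freedom. The sketch below assumes that this conversion was the one used, since the formula is not stated explicitly here; the function name is illustrative.

```python
import math


def contrast_effect_size(f_value, df_error):
    """Effect size r for a contrast with 1 numerator df: sqrt(F / (F + df_error))."""
    return math.sqrt(f_value / (f_value + df_error))


# The two blur-by-dot-density contrasts reported above, both F(1,14).
print(round(contrast_effect_size(6.67, 14), 2))   # 0.57
print(round(contrast_effect_size(31.07, 14), 2))  # 0.83
```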

Figure 6.22: Average absolute error in perceived depth of virtual object as a function of dot

density depicting the interaction effect of blur and dot density.

Considering that the blurry patterns with the smaller dot sizes (1/50 and 1/75) and a dot density of 20% did not

yield any significant differences in ‘average absolute error in perceived depth’ compared to the

No Pattern condition (as discussed in Section 6.4.1.1), it may be reasonable to conclude

that the addition of these patterns failed to be perceived as a substantial change to the surface. On

the other hand, this interaction effect reveals that sharp patterns with 20% dot density led to

significantly lower average absolute errors in perceived depth proportion (compared to their

blurry counterparts). This result can also serve as potential evidence for Hypothesis 2a, by

suggesting that even with 20% dot densities, sharp patterns may be able to provide the distinct

edges required for making vergence eye movements.

6.4.1.2.3 Interaction Effect of Depth and Dot Density

There was a significant interaction effect between depth and dot density. This indicates that dot

density had different effects on the ‘average absolute error in perceived depth proportion’ when

the depth of the virtual object changed. To break down this interaction, contrasts were performed

comparing the effect of density for all depths compared to depth=0/10. Contrasts revealed

significant interactions when comparing the effect of depth both for dot density 40% to dot

density 20% and dot density 60% to dot density 20%. The onset of this interaction when

comparing dot density 40% to dot density 20% was at depths≥4/10, while the same effect when

comparing dot density 60% to dot density 20% was at depths≥6/10.

The significant interactions found when comparing dot density 40% to 20% were as follows:





When comparing dot density 60% to 20%, the following significant interactions were found:




Looking at the interaction graphs in Figure 6.23, Figure 6.24 and Figure 6.25, which are

presented separately for each dot size to prevent clutter, it can be seen that (other than for one

exception for dot size=1/25) when comparing depths≥6/10 to depth=0/10, the effect of dot

density reverses as dot density goes from 20% to 40% and from 20% to 60%.


Figure 6.23: Average absolute error in perceived depth of virtual object as a function of its

actual depth, depicting the interaction effect of depth and dot density for dot size=1/25.


Figure 6.24: Average absolute error in perceived depth of virtual object as a function of its

actual depth depicting the interaction effect of depth and dot density for dot size=1/50.


Figure 6.25: Average absolute error in perceived depth of virtual object as a function of its

actual depth depicting the interaction effect of depth and dot density for dot size=1/75.

Although these results do not provide sufficient evidence for rejecting the null hypothesis

regarding the inverse effect of dot density on errors in EDVO (Hypothesis 3 mentioned in

Section 6.3.1), a potential implication could be that with increased dot density, observers are

more likely to perceive the virtual object as being behind the surface, which causes larger errors

when it’s placed in front but leads to smaller errors as the object moves farther behind the

surface (as verified by the results presented in Section 6.4.1). Although the reason for this

is not clear, this finding may have important implications for the design of

random dot patterns for X-ray vision, which aims to convey the impression that the virtual object

is placed behind the real surface.

6.4.1.2.4 Interaction Effect of Blur, Depth and Dot Size

There was a significant interaction effect between blur, depth and dot size. This indicates that dot

size influenced the effect of blur on ‘average absolute error in perceived depth’ differently when

the depth of the virtual object changed. To break down this interaction, contrasts were

performed. Results revealed the following significant interactions:

When dot size=1/50 was compared to dot size=1/25:

• Depth=6 to Depth=0: F(1,14)=6.86, r=.57, p<.05




• Depth=8 to Depth=0: F(1,14)=7, r=.58, p<.05




Looking at the interaction graphs, which are presented separately for each dot size (Figure 6.26,

Figure 6.27 and Figure 6.28), it can be seen that the effect of blur for depths≥6 (when comparing

to its effect at depth=0) is different for various dot sizes. Generally, from what can be interpreted

from these graphs, as dot size becomes smaller, the difference in ‘average absolute error in

perceiving depth’ becomes larger between the blurry and the sharp condition for depths≥6

compared to depth=0. A possible explanation for this could be that the fixed amount of blur

applied to the various patterns results in a larger visible effect for smaller dots. This is because

smaller dot sizes have higher dominant spatial frequencies compared to larger dot sizes and since

‘blurring’, in effect, attenuates high spatial frequencies, the resulting effect becomes more visible

for smaller dot sizes. Therefore, it can be concluded that this result is in agreement with

Hypothesis 2b.
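The spatial frequency argument above can be illustrated numerically. The sketch below is not the blur procedure used in the experiment: it assumes a Gaussian blur with a hypothetical standard deviation, treats the dot sizes as widths in arbitrary units, and approximates the dominant spatial frequency of a pattern as one cycle per dot plus an equally wide gap. All names and values are illustrative.

```python
import math


def gaussian_blur_gain(frequency, sigma):
    """Amplitude gain of a Gaussian blur (standard deviation sigma) at a spatial frequency."""
    return math.exp(-2.0 * (math.pi * sigma * frequency) ** 2)


def dominant_frequency(dot_width):
    """Assumption: one dot plus one equally wide gap forms one cycle."""
    return 1.0 / (2.0 * dot_width)


SIGMA = 0.01  # hypothetical blur width, in the same arbitrary units as the dot widths

for label, width in (("1/25", 1 / 25), ("1/50", 1 / 50), ("1/75", 1 / 75)):
    gain = gaussian_blur_gain(dominant_frequency(width), SIGMA)
    print(f"dot size {label}: remaining contrast at dominant frequency = {gain:.2f}")
```

Under these assumptions, the same blur leaves progressively less contrast at the dominant frequency as the dots become smaller, which is the qualitative pattern described above.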


actual depth depicting the interaction effect of blur, depth and dot density for dot size=1/25.





6.4.1.2.5 Interaction Effect of Blur, Depth, Dot Size and Dot Density

There was a significant interaction effect between blur, depth, dot density and dot size

(F(20,280)=1.82, p<.05). This indicates that the effect observed above also differed as dot

density changed. To break down this interaction, contrasts were performed. Results revealed the

following significant interactions:


• Depth=2/10 to Depth=0/10 and Dot Density=40% to Dot Density=20%: F(1,14)=6.31,

r=.56, p<.05



r=.56, p<.05



r=.6, p<.05


r=.55, p<.05

6.4.2 Difficulty Rating of depth estimation task (DR)

As previously mentioned, for each trial, after having responded to the depth judgement task,

participants selected a response to the question: “On a scale of 1 (easiest) to 4 (most difficult),

how difficult did you find the task?” (Note that only discrete values of 1, 2, 3 or 4 were

accepted.) The scatterplots showing the ‘Difficulty Rating’ (DR) results as a function of the

virtual object’s actual depth proportion (relative to the real surface) for the various patterns and

the No Pattern condition are presented in Appendix B2. The major observation that can be made

from these scatterplots is that, for almost all patterns, the most common ratings are 2 and 3. For

the No Pattern condition, however, this seems not to be the case, as the ratings seem to be

distributed equally across 1-4.

To further inspect the data for trends, the average ratings across participants were plotted as a

function of the virtual object’s depth proportion for all conditions. These plots can be seen in

Figure 6.29, Figure 6.30 and Figure 6.31. The results for the No Pattern condition are also

provided for each set of results for comparison purposes.

Although it may initially seem that there is an inverted U effect present as the depth proportion

of the virtual object is increased, it should be noted that the Y-axis values in Figure 6.29, Figure

6.30 and Figure 6.31 show that the average difficulty ratings all vary only between 2 and 3 and

are, therefore, very close to each other. Perhaps the only potentially meaningful observations that

can be made are:

• The difficulty ratings for depth=0 are generally smaller than those at other depth

proportions.

• Difficulty ratings increase as depth is increased, although there doesn’t seem to be much

change after depth≥6.

• Difficulty ratings for the ‘No Pattern’ condition are inconsistent and peak around

midrange.

To check for effects of statistical significance, ANOVAs were performed on the difficulty ratings

averaged across the 5 (repetition) trials for each condition and each participant. The following

sections focus on the results of the two ANOVAs that were performed.


to the real surface for dot size = 1/25.





6.4.2.1 Two-way Repeated Measures ANOVA

As with the depth judgement task results, the first ANOVA was performed such that each pattern

was treated independently, as a means of comparing the different patterns to the No Pattern

condition. With 3 dot sizes, 3 dot densities, 2 blur conditions and a No Pattern condition, there

was a total of 19 patterns. The two independent parameters considered were pattern and depth.
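The count of 19 patterns follows from crossing the three pattern factors and adding the No Pattern condition. A minimal sketch of this enumeration is given below; the labels are illustrative, and the six depth levels (0/10 to 10/10 in steps of 2/10) are assumed from the depths referred to in this chapter.

```python
from itertools import product

DOT_SIZES = ("1/25", "1/50", "1/75")
DOT_DENSITIES = (20, 40, 60)  # percent
BLUR = ("sharp", "blurry")
DEPTHS = tuple(f"{d}/10" for d in range(0, 11, 2))  # assumed depth levels

# 3 x 3 x 2 random dot patterns plus the No Pattern condition.
patterns = [("No Pattern",)] + list(product(DOT_SIZES, DOT_DENSITIES, BLUR))

# Each pattern is crossed with depth in the two-way (pattern x depth) design.
cells = list(product(patterns, DEPTHS))

print(len(patterns), len(cells))  # 19 114
```

The 1:18 ratio of No Pattern to random dot pattern conditions noted in footnote 40 is visible directly in this enumeration.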

Mauchly’s test indicated that the assumption of sphericity had been violated for the main effect

of depth, 𝜒2(14) = 97.27, 𝑝 < .0005. Therefore, degrees of freedom were corrected using

Greenhouse-Geisser estimates of sphericity (𝜀 = .27). Based on this correction, there was a

significant effect of depth, F(1.34, 18.82)=4.87, p<.05. Contrasts revealed that this significant

main effect was due to the difference in difficulty ratings between depth=0/10 and depth=10/10,

F(1,14)=4.93, p<.05. There were no other significant differences in difficulty ratings when

comparing all other depths.
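As a consistency check on the reported sphericity tests, the degrees of freedom of Mauchly's chi-square statistic equal p(p + 1)/2 − 1, where p is the number of degrees of freedom of the tested within-subject effect. The sketch below assumes six depth levels (so p = 5 for the main effect of depth); the function name is illustrative.

```python
def mauchly_df(effect_df):
    """Degrees of freedom of Mauchly's chi-square for an effect with `effect_df` df."""
    return effect_df * (effect_df + 1) // 2 - 1


print(mauchly_df(5))   # main effect of depth, 6 levels -> 14
print(mauchly_df(10))  # effects with 10 df, e.g. depth * dot density -> 54
print(mauchly_df(2))   # effects with 2 df, e.g. blur * dot size -> 2
```

These values match the 𝜒2(14), 𝜒2(54) and 𝜒2(2) statistics reported in this chapter.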

As for the effect of pattern, results did not reveal a significant main effect. Therefore, we were

not able to support Hypothesis 4 (which predicted that the No Pattern condition would lead to

higher DRs compared to the random dot pattern conditions). Although this result may initially

seem counter-intuitive, it does point to the fact that, while participants rated the depth judgement

task as equally difficult for the random dot pattern conditions and the No Pattern condition, their

accuracy in performing the depth judgement task was, in fact, significantly different for these

conditions.

Figure 6.32 depicts the effect of depth proportion on the difficulty ratings, formed by combining

all the data presented in Figure 6.29, Figure 6.30 and Figure 6.31. As can be seen, this graph

supports the ANOVA result that performing the depth judgement task was deemed

significantly more difficult when the virtual object was completely behind the bin’s surface

(10/10) compared to when it was completely in front of it (0/10).

Figure 6.32: Effect of depth on DRs.

6.4.2.2 Four-way Repeated Measures ANOVA

A second set of ANOVAs was done exclusively on the results pertaining to the random dot

pattern conditions and was meant to compare the various dot sizes, dot densities, depths and blur

conditions.

Once again, Mauchly’s test indicated that the assumption of sphericity had been violated for the

main effect of depth, 𝜒2(14) = 92.48, 𝑝 < .0005, and the interaction effect of blur and dot

size, 𝜒2(2) = 7.45, 𝑝 < .05. Therefore, degrees of freedom were corrected using

Greenhouse-Geisser estimates of sphericity (𝜀 = .27 for the main effect of depth and 𝜀 = .7 for the

interaction effect between blur and dot size). Considering these corrections, the effects that were

found to be significant are summarized as follows:

• Depth: F(1.34, 18.76) = 5.28, p<.05

• Blur * Dot Size: F(1.39, 19.49) = 5.81, p<.05

In the following sections, we focus on each of these significant effects individually.

6.4.2.2.1 Main Effect of Depth

As mentioned, results revealed a significant main effect of depth proportion on DRs. To break

down this effect, contrasts were performed comparing the DR values for all depths. As expected

from the finding in Section 6.4.2.1, contrasts revealed only one significant main effect of depth

proportion, when comparing depth=10/10 to depth=0/10, F(1, 14)=5.47, p<.05.

6.4.2.2.2 Interaction Effect of Blur and Dot Size

Results revealed a significant interaction effect between blur and dot size. This indicates that blur

had different effects on the DRs when the dot size changed. To break down this interaction,

contrasts were performed comparing the sharp and blurry conditions for different dot sizes.

These revealed a significant interaction when comparing the effect of blur for dot sizes equal to

1/25 and 1/75, F(1,14)=7.02, p<.05. Figure 6.33 illustrates this interaction effect. As can be seen,

while the DRs for the sharp and blurry conditions do not differ much for dot size=1/25, DRs are

significantly larger for the sharp pattern with (small) dot size=1/75 compared to that of the blurry

pattern. In addition to confirming the larger effect of (a fixed amount of) blur on smaller dot

sizes (which was previously discussed), this result suggests that when dots are smaller in

size, the high number of sharp edges can cause the observer to deem the depth judgement task

more difficult. This could be because (as explained in Section 6.5.2)

stereoscopic images with higher dominant spatial frequencies (smaller dot sizes) are more likely

to lead to visual discomfort (Wöpking, 1995; Perrin, Fuchs, Roumes & Perret, 1998).

Figure 6.33: DRs as a function of dot size depicting the interaction effect of blur and dot size.

6.4.3 Correspondence between Average Absolute Errors in EDVO and DRs

Figure 6.34 depicts the scatterplot for average absolute errors in perceived depth as a function of

the difficulty ratings for all trials. As can be seen, there seems to be a positive correlation

between the two dependent parameters. To investigate whether this apparent correlation was

significant, Kendall’s tau was computed. It revealed a significant positive correlation between the average absolute

errors in perceived depth and average difficulty ratings, 𝜏 = .19, 𝑝 (one-tailed) < .0005.

Although this correlation was found to be significant, it should be pointed out that the small

value of τ is potentially due to the high number of occurrences where 2 and 3 were chosen as the

difficulty ratings.
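To illustrate how tied ratings enter the statistic, the sketch below computes Kendall's tau from pairwise comparisons. It assumes the tau-b variant, which adjusts for ties; the variant actually used is not stated here. The data are hypothetical and do not come from the experiment.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied on both variables count in neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical difficulty ratings (1-4) and absolute errors, for illustration only.
ratings = [1, 2, 2, 3, 3, 4]
errors = [0.5, 1.0, 2.0, 1.5, 3.0, 2.5]
print(round(kendall_tau_b(ratings, errors), 2))
```

Pairs that share the same rating contribute nothing to the numerator, so a rating scale with only four values, most of them 2 or 3, leaves many pairs uninformative.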

Figure 6.34: Scatterplot showing average absolute errors in perceived depth as a function of the

difficulty rating for all trials.

6.4.4 Responses to the Interview Questions

Once the participants were done with the experiment trials, they were asked to reply to several

questions. The answers to each of these topics are provided in Appendix B3 and summarized

below.

• Any difficulties experienced in fusing images

o While none of the participants claimed to have experienced double images when

observing the stimuli, some mentioned that as the cone moved farther inside the

bin, the more difficult it became to fuse in 3D. In fact, some claimed that this was

one of the strategies they used in judging the depth proportion to be large.

• Specific strategies used for making depth judgements

As mentioned before, by asking participants to “determine what proportion of the truncated

cone (x-tenths) is behind the bin's surface”, we are asking them to determine the exocentric

distance of the larger circle of the truncated cone relative to the bin’s surface (‘a’ in Figure

6.11), given that the exocentric distance between the larger circle and the smaller circle of the

truncated cone (‘a+b’ in Figure 6.11) is 10 units. To do so, participants needed to have a

rather quick estimate of ‘a’ and ‘b’. Therefore, as expected:

o Most participants claimed that they made their depth judgements by determining ‘a’

and ‘b’ (in no particular order) and then comparing the two values.

o Some mentioned that following their gaze along the lines connecting the two circles

also helped.

o Some also made their judgements based on the perceived change in the depth of the

virtual object relative to the previous trial.
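The comparison of ‘a’ and ‘b’ described above amounts to the ratio 10·a / (a + b). A minimal sketch follows, with hypothetical values and illustrative function names; expressing the absolute error in tenths is an assumption made for this example.

```python
def depth_in_tenths(a, b):
    """Portion of the truncated cone behind the surface, in tenths: 10 * a / (a + b)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("a and b must be non-negative and not both zero")
    return 10.0 * a / (a + b)


def absolute_error(reported_tenths, actual_tenths):
    """Absolute error of a depth judgement, in tenths (assumed unit)."""
    return abs(reported_tenths - actual_tenths)


# Hypothetical trial: 3 units of the cone behind the surface, 2 units in front.
actual = depth_in_tenths(3.0, 2.0)
print(actual, absolute_error(8, actual))  # 6.0 2.0
```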

On the other hand, even though participants were explicitly advised to not use the relative size

cue41 (since the size of the circles and lengths of the truncated cone changed randomly):

o Some participants claimed that they used the relative size of the large and small

circles to make their depth judgements. Some others also mentioned using the length

of the lines connecting the two circles as an indication of the length of the truncated

cone.

With regards to cues that helped them estimate the values for ‘a’ and ‘b’, participants used one or

a combination of several rules of thumb. In general, as the larger circle moved farther behind the

bin’s surface, participants used the following terms to describe their perception of the larger

circle:

o less clear,

o less stark,

41 As outlined in the Information Sheet provided.

o harder to fuse,

o more out of focus,

o jagged,

o blurrier,

and vice versa. Moreover:

o With the No Pattern condition, participants claimed that determining the depth

proportion of the truncated cone was most difficult. However, in cases where the cone

was completely in front of the surface of the bin, determining its depth was easy.

o Most participants found making depth judgements more difficult when dot densities

were high.

o In making their depth judgements, half of the participants found blurry patterns easier,

while the other half found sharp patterns easier.

o Some claimed when patterns were blurry, the surface of the bin seemed farther away.

o Some claimed that it was easier to perceive the larger circle as being behind the

surface of the bin when they focused on points where it was on the black dots (rather

than on the white dots).

• The way in which the black dots are perceived

o Almost all participants (except two) perceived the black dots as part of the surface

of the bin (i.e. as painted marks on the surface of the bin).

As for the other two:

o One participant viewed the pattern as a ‘cut-out’ where black dots were holes

through which the larger circle could be seen.

o One participant perceived the black dots as part of the inside of the bin (which

also implies that the black dots were perceived as holes).

6.5 Discussion

In this section, we provide a discussion of the results by summarizing their implications and

comparing them to the hypotheses presented in Section 6.3.

6.5.1 Errors in EDVO

HYPOTHESIS 1: The addition of (sharp) patterns will lead to lower errors in EDVO

relative to the No Pattern condition.

While the analysis presented in Section 6.4.1.1 revealed various interaction effects, the general

results supported this hypothesis for depths≥6/10. In other words, while there may be no

significant effect of using random dot patterns for improving the accuracy of depth judgements

for depths<6/10, results showed that as the depth of the virtual object increases (above 6), the

addition of random dot patterns onto the surface of the bin can significantly reduce errors in

EDVO. This result can also be visually confirmed by examining the plots presented in Figure

6.13, Figure 6.14 and Figure 6.15, where it can be seen that, for depths≥6/10, the median errors

in EDVO are consistently smaller than those of the No Pattern condition.

HYPOTHESIS 2a: Sharp patterns will lead to lower errors in EDVO compared to blurry

patterns.

The results presented in Section 6.4.1.2.1 revealed significant interactions between depth and

blur. Upon inspection of contrasts and Figure 6.21, it can be confirmed that as the depth of the

virtual object increased, the average absolute errors in EDVO for sharp patterns were

significantly smaller compared to those of blurry patterns. Therefore, we can claim that the

results support this hypothesis for depths≥8/10. However, as shown in Figure 6.21, at

depth=0/10, the blurry patterns seem to lead to lower average absolute errors compared to the

sharp patterns. A potential explanation could be that when the random dot pattern was blurry (or

in other words, ‘out of focus’ for the observer), it gave the impression that the surface of the bin

was farther away than the virtual object (which appeared ‘in focus’). In such cases, participants

were more likely to perceive the virtual object as being closer to them than the surface of the bin.

When depth=0/10, this was in fact the case, which is why average absolute errors are lower for

blurry patterns. As mentioned in Section 6.4.4, participants explicitly pointed this out when

replying to the interview questions.

Moreover, the interaction effect between blur and dot density (as discussed in Section 6.4.1.2.2)

revealed that this difference between the blurry and sharp conditions was significant for dot

density of 20% compared to those of 40 and 60%. This finding suggests that, even with 20% dot

density, sharp patterns may be able to provide the edges required for making vergence eye

movements.

HYPOTHESIS 2b: If EDVO errors were found to be different for blurry and sharp

patterns, it was hypothesised that this difference would be larger for smaller dot sizes.

The interaction effect between blur, dot size and depth (as discussed and presented in Section

6.4.1.2.4) also supported this hypothesis, by showing that as dot size decreased, the difference

discussed above became larger. As previously mentioned, the reason for this is that the fixed

amount of blur applied to the various patterns resulted in a larger visible effect for smaller dots.

EFFECT OF DOT SIZE

As previously mentioned, two competing hypotheses existed for the effect of dot size on average

absolute errors in EDVO: While larger dot sizes provide more distinct edges, smaller dot sizes

result in a larger number of edges available for making vergence eye movements. Perhaps this is

a possible reason why no main effect of dot size was found. The only interactions that dot size

appeared in were:

o Blur*depth* dot size: This effect was discussed above.

o Blur*depth* dot size*dot density: Since no particular trend can be found for this

interaction effect, it is difficult to propose an explanation for it.

HYPOTHESIS 3: Higher dot densities will increase the stimulus’ strength in attracting

vergence (Rashbass & Westheimer, 1961; Mallot, Roll & Arndt, 1995), leading to smaller

errors in EDVO.

With regard to the effect of dot density, some interaction effects did provide

supporting evidence for this hypothesis. First of all, the interaction effect of blur and dot density

(as presented in Section 6.4.1.2.2) on average absolute errors in EDVO showed that the effect of

blur was significant for dot density of 20% compared to that of 40 and 60%. What this result

may be implying is that when dot density is low (and, hence, there are probably fewer edges

available for guiding vergence eye movements), the sharpness of the random dot pattern matters.

However, as dot density increases, the effect of blur diminishes, possibly as a result of an

increase in the random dot pattern’s strength in attracting vergence. In other words, at densities

of 40% and 60%, the random dot pattern’s strength in attracting vergence may be high enough to

compensate for the absence of distinct edges. Moreover, the interaction effect of depth and dot

density (presented in Section 6.4.1.2.3 and illustrated in Figure 6.23, Figure 6.24 and Figure

6.25) shows that, for depths>6/10, as dot density increases, the average absolute errors in EDVO

decrease. Put simply, as the depth of the virtual object increases, higher dot densities are better

able to aid in estimating the depth of the virtual object. This result has potentially important

implications for the application of random dot patterns for X-ray vision purposes. However, as

discussed in the case of the previous series of experiments, one should also be wary about the

amount of information loss of the real surface due to the overlaying of the random dot pattern.

6.5.2 DRs

HYPOTHESIS 4: The No Pattern condition will yield higher DRs compared to the pattern

conditions.

As mentioned in Section 6.4.2.1, results did not reveal a significant main effect of pattern.

Therefore, we were not able to support Hypothesis 4. Considering that the average absolute

errors in EDVO were generally significantly lower for the random dot patterns (compared to

those of the No Pattern condition), we may attribute the absence of significant differences in DRs

for these conditions to the possibility that, when replying to the question of difficulty in

‘estimating the depth of the virtual object’, participants were also considering the difficulty in

‘fusing the image’ (or, rather, their ‘comfort when viewing the image’). In fact, the interaction

effect found between blur and dot size also supports this explanation. To further explain these

results, a brief discussion of the literature on this topic is required.

Various researchers have shown the dependence of fusional limits on the spatial frequency of the

perceived image (e.g. Felton, Richards & Smith, 1972; Schor, Wood, & Ogawa, 1984; Schor,

Heckmann, & Tyler, 1989). The results of these studies have shown that as the dominant spatial

frequency of an image increases, the fusion limit decreases in range. Moreover, other researchers

have investigated the effect of spatial frequency on visual comfort when viewing stereoscopic

images (e.g. Wöpking, 1995; Perrin, Fuchs, Roumes & Perret, 1998). Based on these studies,

stereoscopic images with higher dominant spatial frequencies received lower comfort ratings

compared to those with lower dominant spatial frequencies. In fact, in some studies, the use of

blur (as a means of decreasing spatial frequency of the image content) is suggested to increase

viewing comfort (e.g. Wöpking, 1995; Leroy, Fuchs & Moreau, 2012).

Therefore, it seems that, while the No Pattern condition did not provide any edges to help

participants determine the depth of the real surface, the higher viewing comfort it provided

(because of its lower dominant spatial frequency compared to that of the random dot pattern

conditions) compensated for the difficulty of the depth estimation task. Perhaps this is why the

No Pattern and the random dot pattern conditions did not lead to significantly different DRs.

Moreover, as discussed in Section 6.4.2.2.2 and illustrated in Figure 6.33, for the random dot

pattern with dot size of 1/75, DRs were significantly larger for the sharp condition (higher spatial

frequency) compared to those of the blurry condition (lower spatial frequency). The

consistency of this result with findings of the literature (discussed above) provides relatively

strong evidence that participants were, in fact, taking their viewing comfort into account when

assessing the difficulty of the depth estimation task.

Hypothesis 5: Blurry patterns will lead to higher difficulty ratings compared to their sharp

counterparts.

As mentioned, the results presented in Section 6.4.2.2.2 and illustrated in Figure 6.33 provided

evidence contradicting this hypothesis. The potential explanation is that blurring of random dot

patterns decreased the image’s spatial frequency, resulting in lower DRs (as discussed above). It

should also be reiterated that the reason the effect of blur on DRs was larger for smaller dot sizes

is that the fixed amount of blur applied to the various patterns resulted in a larger visible effect

for smaller dots (as previously explained).

Hypothesis 6: Larger dot sizes will lead to lower DRs.

As illustrated in Figure 6.33, while this hypothesis was supported for sharp patterns, this was not

the case for blurry patterns. As predicted, in the case of sharp random dot patterns, larger dot

sizes provided edges that were more distinct, thereby leading to lower DRs. Additionally, with

regard to the discussion above, since random dot patterns with larger dot sizes were of lower

spatial frequency, they were more easily fused. On the other hand, when the patterns were blurry,

the fixed amount of blur applied to the random dot patterns resulted in larger attenuation of high

spatial frequencies for smaller dot sizes, which explains the reduction in DRs.
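The argument that a fixed amount of blur attenuates patterns with smaller dots more strongly can be checked numerically. The sketch below is a minimal illustration under assumed values: a hypothetical square-dot generator, a 256-pixel image, 50% density and a fixed Gaussian blur of 2 pixels. None of these correspond to the actual stimulus parameters. For each dot size it reports the fraction of contrast (non-DC) spectral power that survives the blur.

```python
import numpy as np

def dot_pattern(size_px, dot_px, density, rng):
    """Hypothetical generator: square dots of side dot_px, each present with probability `density`."""
    cells = size_px // dot_px
    grid = (rng.random((cells, cells)) < density).astype(float)
    return np.kron(grid, np.ones((dot_px, dot_px)))

def retained_contrast_energy(image, sigma_px):
    """Fraction of non-DC spectral power that remains after a Gaussian blur of std sigma_px."""
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))
    # Blurring multiplies the amplitude spectrum by `transfer`, hence the power by transfer**2
    return float((power * transfer ** 2).sum() / power.sum())

rng = np.random.default_rng(0)
SIGMA_PX = 2.0  # assumed fixed blur, identical for every pattern
retained = {}
for dot_px in (2, 4, 8, 16):
    pattern = dot_pattern(256, dot_px, 0.5, rng)
    retained[dot_px] = retained_contrast_energy(pattern, SIGMA_PX)
    print(f"dot size {dot_px:2d} px: retained fraction = {retained[dot_px]:.3f}")
```

Under these assumptions the retained fraction grows with dot size, which is consistent with the explanation that the same blur produces a larger visible effect on smaller dots.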

Hypothesis 7: Higher dot densities will lead to lower DRs.

Contrary to this hypothesis, during the interview, most participants claimed that they found

making depth judgements more difficult when dot densities were high. However, statistical

analyses did not reveal a significant effect of dot density on DRs.

6.5.3 Relationship between Average Absolute Errors in EDVO and DRs

When forming the hypotheses for this experiment, it was expected that the parameters that would

lead to smaller errors in EDVO would do so by making the depth judgement task easier. In other

words, we expected the magnitude of the DRs to be proportional to errors in EDVO. While the

results presented in Section 6.4.3 do support the hypothesis that a correlation exists between

these two dependent parameters, the value of this correlation is rather small. This small value

serves as further evidence for the discussion presented on DRs above. We may, therefore,

conclude that while the wording of “On a scale of 1 (easiest) to 4 (most difficult), how difficult

did you find the task?” intended to assess the difficulty of the depth estimation task, participants

may have responded to it with a measure that combined their visual comfort rating with their

assessment of the difficulty of the depth estimation task.

6.5.4 Some Notes on the Responses to the Interview Questions

o As previously mentioned, participants used the following terms to describe their

perception of the larger circle as it moved farther behind the bin’s surface: less clear, less

stark, harder to fuse, more out of focus, jagged, blurrier. These descriptions seem to be in

agreement with the term Pseudo-Translucency (or Stereo-Pseudo-Translucency) that we

used in Chapter 4 to label the observed phenomenon when a virtual object is rendered

stereoscopically behind a real surface without being occluded by it. As may be recalled, a

pseudo-translucent surface gives the impression that one is looking through a diffuse

surface, somewhat akin to frosted glass, and can therefore result in perceiving the virtual

object (behind it) as ‘less clear’, ‘less stark’, etc.

o Upon inspection of figures illustrating average absolute errors in EDVO versus depth of

the virtual object (e.g. Figure 6.18, Figure 6.19 and Figure 6.20), one of the first

impressions that stands out is the apparent U-shaped trend of the results. In other words,

the errors seem to be largest for depths=0/10 and 10/10 and smallest in between these

depths. While this may seem counter-intuitive, two potential explanations can be

presented for this observation. Firstly, as previously mentioned, the average absolute

errors in EDVO were calculated by subtracting the actual depth of the virtual object from

its perceived depth. Therefore, the maximum values of errors for depths between 0/10

and 10/10 were smaller than those for depths=0/10 and 10/10. In addition to this, it is also

likely that, when faced with uncertainty, participants tended to choose an intermediate

value, which would obviously lead to smaller errors for the depths between 0/10 and

10/10.

o The second potential explanation, which is perhaps the most important, stems from the

statement that some participants made as one of their strategies in making depth

judgements: “Following my gaze along the lines connecting the two circles also helped.”

One of the lessons learned when trying out different ideas for the design of the virtual

object for this experiment was that when the virtual object’s wireframe structure

contained vertical components that passed through the real surface, observers would

usually be easily able to determine the proportion of that component that was behind the

surface. The reason for this is that, at the point where the conflict between occlusion and

binocular disparity cues occurs, the virtual object becomes rather ‘less clear’, ‘less stark’,

etc., as discussed above. For this reason, when creating the virtual object (through the

process described in Section 6.2.1.3), the lines used for connecting the circles of the

truncated cone were drawn as thin as possible to minimize the usefulness of this cue.

However, as the U-shaped trend of the errors in EDVO suggests, it seems that this cue was

perhaps still being used for making depth judgements for depths between 0/10 and 10/10.

For depths=0/10 and 10/10, this cue was not available and perhaps this is why the

average absolute errors are highest for these depths.

o In making their depth judgements, half of the participants found blurry patterns easier,

while the other half found sharp patterns easier. As discussed in Section 6.5.3, the reason

for the different opinions on this could potentially be that, while some focused mostly on

their visual comfort, others were considering their ability to make accurate depth

judgements.

o Referring to the two participants’ comments regarding ‘perceiving the random dot pattern

as a ‘cut-out’ where black dots were holes through which the larger circle could be seen’

and ‘perceiving the black dots as part of the inside of the bin’, it seems that these two

participants were able to conform to the anticipated percept of our X-ray vision

visualization method, which was presented in Chapter 4 (Figure 4.3).
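The first explanation offered above for the U-shaped trend, namely that a bounded response scale limits the possible error at intermediate depths, can be made concrete with a small worked example. The sketch below assumes that both actual and estimated depths lie on an 11-point scale from 0/10 to 10/10 (expressed in tenths); this scale, and the two guessing strategies, are assumptions made purely for illustration.

```python
DEPTHS = range(11)  # assumed scale: 0/10, 1/10, ..., 10/10, expressed in tenths

def max_abs_error(actual):
    """Largest absolute error that any response on the scale can produce."""
    return max(actual - 0, 10 - actual)

def midpoint_guess_error(actual, guess=5):
    """Absolute error of an observer who always answers with the middle of the scale."""
    return abs(guess - actual)

def uniform_guess_error(actual):
    """Expected absolute error of an observer who picks a response uniformly at random."""
    return sum(abs(response - actual) for response in DEPTHS) / len(DEPTHS)

for depth in DEPTHS:
    print(depth, max_abs_error(depth), midpoint_guess_error(depth),
          round(uniform_guess_error(depth), 2))
```

All three quantities are smallest at the centre of the scale and largest at depths 0/10 and 10/10, so a U-shaped error profile would be expected from an uncertain observer even in the absence of any perceptual effect.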

6.6 Contributions and Limitations

The main contributions of this experiment are as follows:

• Random dot patterns were shown to be a potentially effective method for increasing the

accuracy of depth judgements for application of X-ray vision with stereoscopic AR

displays.

• Sharp random dot patterns (with distinct edges) may be effective in guiding vergence eye

movements, by providing features that are easier to fixate.

• In choosing the appropriate dot size and density of the random dot pattern, the spatial

frequency content of the pattern should be considered to ensure that it does not

compromise visual comfort.

It is important, however, to point out that the experiments presented here were limited to the use

of a convex surface without a prominent visible texture, and to a 3D wireframe virtual object

being presented in depth. Therefore, the applicability and validity of these results in conditions

where the real object’s surface contains 2D or 3D textural elements, or in cases where the virtual

object is solid, are unknown.

Moreover, as discussed in Section 6.2.1.3, the generation of stimuli for this experiment was done

using physical models, which allowed for a simulation of an AR display. While it is expected

that the results obtained remain valid for an actual AR display replicating these experimental

conditions (by computationally adding the random dot pattern to the real surface and generating

and rendering the virtual object in depth), the possible imprecisions resulting from the use of

physical models should also be considered as a potential limitation of this experiment.

Chapter 7

Conclusion

The experimental evaluations performed throughout this research study support the premise of

random dot patterns improving the accuracy of ordinal and absolute depth judgements of virtual

objects relative to the surface of real objects. The dot size and dot density used

have also been shown to affect the perception of surface information and should, therefore, be

considered when using this technique to improve depth judgments for the purpose of achieving

X-ray vision with stereoscopic AR displays.

7.1 Contributions

The individual contributions of each experiment have been discussed in Sections 5.5 and 6.6. In

this section, I provide a summary of the overall contributions of this research project:

• Contributed to the advancement of a novel approach to achieving X-ray vision for use

with stereoscopic AR displays.

• Performed an experimental procedure that:

o Verified the benefit of using random dot patterns to help in disambiguation of the

depth order between a virtual object and a flat real surface with 2D textural

elements,

o Verified the benefit of using random dot patterns to help in achieving the

impression of transparency of the real surface, and

o Demonstrated that real surface information can be preserved even with the

addition of RDPs, depending on the sizes and densities of dots used.

• Contributed to the fundamental knowledge base in visual representation by providing

supporting evidence that occlusion (which is widely recognized as the strongest cue to

relative depth order) can be robustly overridden through the appropriate use of the

binocular disparity and convergence cues.

• Developed and implemented another experimental platform, software and procedure

specifically focused on absolute depth judgements, the results of which:

o Provided supporting evidence that distinct edges of (sharp) random dot patterns

may allow for improvements in the accuracy of absolute depth judgments,

possibly due to increased ease of fixation on those distinct edges, leading to

additional depth information from convergence cues;

o Confirmed the benefit of using random dot patterns to improve the accuracy of

absolute depth judgements, and

o Assessed the impact of random dot pattern parameters (dot size and dot density)

on accuracy of absolute depth judgements.

7.2 Practical Implications

From a practical point of view, the findings can be used as a basis for developing guidelines for

the use of random dot patterns to achieve X-ray vision and to improve (absolute and ordinal)

depth judgements for near-field applications of stereoscopic AR displays. Based on the

experimental results, for near-field applications of stereoscopic AR displays, the following

design guidelines are proposed:

• Higher dot densities are better able to aid in estimating the depth of the virtual object (as

illustrated by Figure 6.23, Figure 6.24 and Figure 6.25) for larger depths (in the present

experiment for depths ≥6/10). However, the preservation of real surface information is an

important factor to consider, as higher dot densities lead to more information loss (as

suggested by Figure 5.7).

• If preservation of real surface information is required, consideration should be given to

both dot size and dot density of the random dot pattern. For example, if the location of

the most important information on the real surface is known, using a larger dot size is

recommended (as suggested by the low absolute errors in Figure 5.8), since larger dots

can be configured such that important parts of the real surface are not occluded. In this

case, dot densities as low as 20% can still be used effectively (as shown in Figure

6.18 to Figure 6.22). On the other hand, if the location of the most important information on

the real surface is not known, smaller dot sizes with higher dot densities are

recommended (as suggested by Figure 6.25).

• In cases where factors such as the use of small dot sizes and high dot densities may lead

to subjective difficulty in making depth judgements (possibly due to the spatial frequency

content of such random dot patterns, which are potentially associated with visual

discomfort), blurring of the random dot pattern may be helpful (as suggested by Figure

6.33). However, since doing so may diminish the desired improvement in making depth

judgements, blurring of the random dot pattern is not recommended for dot densities

below 40% (as shown in Figure 6.22) and dot sizes smaller than 1/50 (as suggested by the

large difference between the sharp and blurry conditions for depth=10 in Figure 6.27 and

Figure 6.28).

• In cases where virtual objects are partially inside the real object, and where accurate

depth judgement is important for task execution, using a blurry pattern may be a better

alternative for small depths (as suggested by Figure 6.21 for depths≤4) but using sharp

patterns may be better for large depths (as suggested by the same figure for depths≥6).

7.3 Limitations and Suggested Improvements to Experiments

When interpreting the results and findings of this research one should be mindful of a number of

limitations related to the experiments performed. These limitations are divided into two

categories, those corresponding to Experiment 1 and 2 and those corresponding to Experiment 3,

and are followed by suggested improvements.

7.3.1 Experiments 1 and 2

As discussed, the ‘real surface’ that was used in these two experiments was a photograph of a flat

surface that consisted of 2D textural elements. Therefore, the results of these experiments are

limited to such surfaces and may not be applicable to non-flat surfaces with 2D textural elements

or surfaces that contain 3D textural elements. Using this experimental framework for different

surfaces would be worthwhile in investigating the effect of surface characteristics on the

effectiveness of using random dot patterns. The results of these experiments also apply only to

wireframe 2D virtual objects.

As for Experiment 1 (and as noted in Footnote 29), the psychophysical functions were fitted to

data that were in the close vicinity of the Point of Objective Equality. To obtain psychophysical

functions with better fits, it is suggested to also include stimuli where the virtual object is placed

at larger (stereoscopic) distances from the real surface (both in front of and behind).

With regard to Experiment 2, as shown in Figure 5.7, we were not able to identify a clear trend

for the Transparency Rating values as a function of dot density (which varied between 40, 50, 60

and 70%). Comparing our results to those of Otsuki and Milgram (2013), it is suspected that

choosing dot densities as low as 20% might have revealed such a trend.

For a few combinations of dot size and dot density, the d’ values obtained in this experiment

were very small (≤0.5) or close to 0, which indicates close-to-chance performance. One

implication of this result is that the shape matching task may have been too difficult. One way to

mitigate this when designing a similar experiment would be to increase

the eccentricity of the ellipses used.
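To indicate why d’ values at or below 0.5 correspond to near-chance performance, the sketch below computes d’ under the standard yes/no signal detection formulation, d’ = z(hit rate) − z(false alarm rate). This formulation is an assumption made for illustration and may differ from the exact computation used for the shape matching task; the rates shown are hypothetical values, not experimental data.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index for a yes/no task; both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal cumulative distribution
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates, chosen only to show the scale of d'
print(d_prime(0.50, 0.50))  # hits no more frequent than false alarms
print(d_prime(0.60, 0.40))  # slightly better than chance
print(d_prime(0.90, 0.10))  # clearly above chance
```

With these hypothetical rates, an observer who produces 60% hits against 40% false alarms already reaches a d’ of roughly 0.5, which gives a sense of how little discrimination such a value represents.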

7.3.2 Experiment 3

While Experiment 3 aimed to expand upon the work done in Experiments 1 and 2, there are also

limitations associated with this experiment. One of these is that the real object consisted of a

convex surface with no visible texture and the virtual object was a wireframe object. Therefore,

the results of this experiment are limited to such objects. As an example of this limitation, it is

predicted (but still remains open for investigation) that an analogous convex surface but with 3D

textural elements might not necessarily benefit from the addition of random dot patterns, as was

observed through a few samples tried out in pilot studies.

Yet another avenue of future investigation is to explore the applicability of this method when

using solid virtual objects.

The present experiment also consisted of conditions where the truncated (wireframe) cone was

placed either in front of, in between, or behind the real surface. As discussed in Section 6.5.4,

this resulted in some participants following their gaze along the lines connecting the two circles

to find ‘the point of cue conflict’ to make their depth judgement. Since this cue may not exist for

certain X-ray vision applications of AR, it is also necessary to investigate the effectiveness of

using random dot patterns for cases where the virtual object is placed at different depths but

always behind the real surface.

Moreover, as discussed in Section 6.6, the generation of stimuli for this experiment was done

using physical models, which allowed for a simulation of an AR display. As a result, the possible

imprecisions resulting from the use of physical models should be considered as a limitation of

this experiment. To eliminate such limitations, 3D software should be developed to permit the

random dot patterns to be correctly added to the real object surface by obtaining a depth map for

the real surface, and the virtual object should be generated and rendered in depth directly, also

using software.
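One possible shape for such software is sketched below in Python. It is a minimal illustration, not an implementation of the proposed system: it assumes a rectified stereo pair, an integer per-pixel disparity map for the left view derived from the depth map, and the convention that a left-view pixel at column x appears at column x − disparity in the right view. All function names and parameter values are hypothetical.

```python
import numpy as np

def dot_mask(height, width, dot_px, density, rng):
    """Boolean mask of square dots (side dot_px), each present with probability `density`."""
    rows = -(-height // dot_px)  # ceiling division
    cols = -(-width // dot_px)
    grid = rng.random((rows, cols)) < density
    mask = np.repeat(np.repeat(grid, dot_px, axis=0), dot_px, axis=1)
    return mask[:height, :width]

def warp_mask_to_right(mask_left, disparity_px):
    """Forward-warp the left-view dot mask so that the dots lie on the real surface in depth."""
    height, width = mask_left.shape
    ys, xs = np.nonzero(mask_left)
    xr = xs - disparity_px[ys, xs]          # assumed convention: x_right = x_left - disparity
    keep = (xr >= 0) & (xr < width)         # discard dots that fall outside the right image
    mask_right = np.zeros_like(mask_left)
    mask_right[ys[keep], xr[keep]] = True
    return mask_right

def paint_dots(image, mask, dot_value=0.0):
    """Return a copy of a greyscale image with the masked pixels set to dot_value (black by default)."""
    out = image.copy()
    out[mask] = dot_value
    return out

rng = np.random.default_rng(0)
left_image = np.full((120, 160), 0.8)                 # placeholder for the left camera image
right_image = np.full((120, 160), 0.8)                # placeholder for the right camera image
disparity = np.full((120, 160), 6, dtype=int)         # placeholder: flat surface, 6 px disparity
mask_l = dot_mask(120, 160, dot_px=4, density=0.4, rng=rng)
mask_r = warp_mask_to_right(mask_l, disparity)
left_out = paint_dots(left_image, mask_l)
right_out = paint_dots(right_image, mask_r)
```

Forward warping of this kind can leave small gaps in the right-view dots where the disparity changes rapidly, so a practical implementation would likely need to render the dots as a texture on the reconstructed surface instead.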

Additionally, even though the results of this experiment were in line with the hypothesis that

adding random dot patterns will provide additional fixation points on the real object surface, which

in turn will facilitate vergence eye movements and thus provide additional depth cues, making a

definitive conclusion in this regard is not possible, as eye convergence angles were not measured

in this experiment. Using appropriate experimental means, such as eye tracking devices, to

continuously measure exact angles of convergence is necessary to investigate this phenomenon

more closely and more definitively.
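For reference, the geometric relationship between convergence angle and fixation distance that such measurements would rely on can be sketched as follows. The example assumes symmetric fixation straight ahead and an interpupillary distance of 63 mm; both are illustrative assumptions, and the viewing distances are hypothetical near-field values rather than those of the experimental setup.

```python
import math

def vergence_angle_deg(ipd_m, distance_m):
    """Convergence angle (degrees) when both eyes fixate a point straight ahead at distance_m."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

def fixation_distance_m(ipd_m, angle_deg):
    """Inverse relation: fixation distance implied by a measured convergence angle."""
    return ipd_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))

IPD_M = 0.063  # assumed interpupillary distance in metres
for distance in (0.4, 0.5, 0.6):  # hypothetical near-field viewing distances in metres
    print(f"{distance:.1f} m -> {vergence_angle_deg(IPD_M, distance):.2f} deg")
```

Because the angle changes by only a fraction of a degree over several centimetres at these distances, any eye tracking device used for this purpose would need correspondingly fine angular resolution.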

Finally, it should be noted that these experiments were carried out in a controlled laboratory

environment and many simplifications were made in task complexity and environmental

conditions. In medical applications of AR, for example, many task and environmental

complexities exist, and therefore further studies under more realistic conditions are required to

determine the exact conditions under which this approach will prove effective.

7.4 Future Work

The following is a list of suggestions for future work that can build on what was found in this

research:

• Implementing the idea of adding random dot patterns onto a real surface and generating

and rendering virtual objects in depth computationally, for stereoscopic AR, to assess

potential challenges in its technical implementation. Once this is done:

• Introducing complexity to the experimental task by developing an experimental platform

and designing an interactive manual manipulation task that requires depth judgements to

go beyond the present strictly observational depth judgement tasks.

• Along similar lines, developing the capability for the virtual object to move in relation to

the real object surface (either autonomously or under the control of the user), in order to

investigate the premise that visual cues derived from motion of the virtual object relative

to the real object surface will significantly enhance the percept of the virtual object being

behind the real surface.

• Analogous to the point above, it is surmised that being able to manipulate the

superimposed random dot patterns, in terms of dot sizes, dot densities and dot

distributions (either autonomously or under the control of the user) is likely to increase

some of the advantages of this display concept, and thus merits investigation.

• Further evaluation of this approach outside of a controlled laboratory setting, in an

environment that is representative of near-field applications of X-ray vision in AR.

• Further to the point above, some of the environmental conditions that need to be studied

include: implications of real surfaces consisting of salient information to be preserved,

variations of colour of the virtual object and presence of other depth cues such as motion

parallax.

References

Abdelmounaime, S., & Dong-Chen, H. (2013). New Brodatz-based image databases for

grayscale color and multiband texture analysis. ISRN Machine Vision.

Akerstrom, R. A., & Todd, J. T. (1988). The perception of stereoscopic transparency. Perception

& Psychophysics, 44(5), 421-432.

Avery, B., Sandor, C., & Thomas, B. H. (2009, March). Improving spatial perception for

augmented reality x-ray vision. In Proceedings of the IEEE Virtual Reality Conference (pp. 79-

82). IEEE.

Bajura, M., Fuchs, H., & Ohbuchi, R. (1992). Merging virtual objects with the real world: Seeing

ultrasound imagery within the patient. In ACM SIGGRAPH Computer Graphics (Vol. 26, No. 2,

pp. 203-210). ACM.

Bichlmeier, C., Wimmer, F., Heining, S. M., & Navab, N. (2007). Contextual anatomic mimesis;

hybrid in-situ visualization method for improving multi-sensory depth perception in medical

augmented reality. In ISMAR 2007: The 6th IEEE and ACM (pp. 129-138). IEEE.

Brodatz, P. (1966). Textures: a photographic album for artists and designers. New York, USA:

Dover Publications.

Bruce, V., Green, P. R., & Georgeson, M. A. (2003). Visual perception: Physiology, psychology,

& ecology. Psychology Press.

Bülthoff, H. H., & Mallot, H. A. (1988). Integration of depth modules: stereo and shading. JOSA

A, 5(10), 1749-1758.

Caudell, T. P., & Mizell, D. W. (1992). Augmented reality: An application of heads-up display

technology to manual manufacturing processes. In System Sciences, 1992. Proceedings of the

Twenty-Fifth Hawaii International Conference on (Vol. 2, pp. 659-669). IEEE.

Coutant, B. E., & Westheimer, G. (1993). Population distribution of stereoscopic ability.

Ophthalmic and Physiological Optics, 13(1), 3-7.

Cumming, B. G., Johnston, E. B., & Parker, A. J. (1993). Effects of different texture cues on

curved surfaces viewed stereoscopically. Vision research, 33(5), 827-838.

Cutting, J. E. & Vishton, P. M. (1995). Perceiving layout and knowing distances: The

integration, relative potency, and contextual use of different information about depth. W.

Epstein, 69-117.

Drascic, D. & Milgram, P. (1996). Perceptual issues in augmented reality. In Proceedings of

SPIE: Stereoscopic Displays and Virtual Reality Systems III, San Jose, California, 123-134.

Edwards, P. J., Johnson, L. G., Hawkes, D. J., Fenlon, M. R., Strong, A. J., & Gleeson, M. J.

(2004). Clinical experience and perception in stereo augmented reality surgical navigation. In

Medical Imaging and Augmented Reality (pp. 369-376). Springer Berlin Heidelberg.

Ellis, S. R., & Bucher, U. J. (1994). Distance perception of stereoscopically presented virtual

objects optically superimposed on physical objects by a head-mounted see-through display. In

Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 38, No. 19,

pp. 1300-1304). SAGE Publications.

Ellis, S. R., & Menges, B. M. (1998). Localization of virtual objects in the near visual field.

Human Factors: The Journal of the Human Factors and Ergonomics Society, 40(3), 415-431.

Felton, T. B., Richards, W., & Smith, R. A. (1972). Disparity processing of spatial frequencies in

man. The Journal of physiology, 225(2), 349-362.

Foley, J. M., & Richards, W. (1972). Effects of voluntary eye movement and convergence on the

binocular appreciation of depth. Attention, Perception, & Psychophysics, 11(6), 423-427.

Gescheider, G. A. (2013). Psychophysics: The Fundamentals. New Jersey, USA: Psychology

Press.

Ghasemi, S., Otsuki, M., Milgram, P., & Chellali, R. (2017). Use of Random Dot Patterns in

Achieving X-Ray Vision for Near-Field Applications of Stereoscopic Video-Based Augmented

Reality Displays. PRESENCE: Teleoperators and Virtual Environments, 26(1), 42-65.

Gibson, J. J. (1950). The perception of visual surfaces. The American journal of psychology,

63(3), 367-384.

Gibson, J. J. (2014). The ecological approach to visual perception: classic edition. Psychology

Press.

Gillam, B.J. & Grove, P.M. (2011). Contour entropy: a new determinant of perceiving ground or

a hole. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 750-

757.

Hou, M., & Milgram, P. (2003, October). A sensitivity study of factors influencing real-virtual

object alignment performance in stereoscopic augmented reality environments. In Proceedings of

the Human Factors and Ergonomics Society Annual Meeting (Vol. 47, No. 13, pp. 1630-1634).

Sage CA: Los Angeles, CA: SAGE Publications.

Interrante, V. (1996). Illustrating Transparency: Communicating the 3D shape of layered

transparent surfaces via texture. Unpublished Doctoral dissertation, University of North

Carolina at Chapel Hill.

Interrante, V., Fuchs, H., & Pizer, S. M. (1997). Conveying the 3D shape of smoothly curving

transparent surfaces via texture. IEEE Transactions on visualization and computer graphics,

3(2), 98-117.

Johnson, L. G., Edwards, P., & Hawkes, D. (2003). Surface transparency makes stereo overlays

unpredictable: the implications for augmented reality. Studies in health technology and

informatics, 131-136.

Johnston, E. B., Cumming, B. G., & Parker, A. J. (1993). Integration of depth modules:

Stereopsis and texture. Vision research, 33(5), 813-826.

Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago, IL: University of Chicago

Press.

Kalkofen, D., Mendez, E., & Schmalstieg, D. (2007, November). Interactive focus and context

visualization for augmented reality. In Proceedings of the 2007 International Symposium on

Mixed and Augmented Reality (ISMAR) (pp. 1-10). IEEE Computer Society.

Kennedy, J. M. (1974). A psychology of picture perception. Jossey-Bass, San Francisco.

Kennedy, J. M., Juricevic, I., & Bai, J. (2003). Line and borders of surfaces: grouping and

foreshortening. In Reconceiving pictorial space. (Hecht, H., Schwartz, R. and Atherton M. Eds.)

(p.321-354). MIT Press: Cambridge, MA.

Kennedy, J. M., & Wnuczko, M. (2015). What Is a Surface? In the Real World? And Pictures? In

Investigations into the phenomenology and the ontology of the work of art (pp. 89-107). Springer

International Publishing.

Kilpatrick, F. P., & Ittelson, W. H. (1953). The size-distance invariance hypothesis.

Psychological Review, 60(4), 223.

Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling

of depth cue combination: in defense of weak fusion. Vision research, 35(3), 389-412.

Lerotic, M., Chung, A. J., Mylonas, G., & Yang, G. Z. (2007, October). Pq-space based non-

photorealistic rendering for augmented reality. In International Conference on Medical Image

Computing and Computer-Assisted Intervention (pp. 102-109). Springer, Berlin, Heidelberg.

Leroy, L., Fuchs, P., & Moreau, G. (2012). Visual fatigue reduction for immersive stereoscopic

displays by disparity, content, and focus-point adapted blur. IEEE Transactions on Industrial

Electronics, 59(10), 3998-4004.

Livingston, M. A., Dey, A., Sandor, C., & Thomas, B. H. (2013). Pursuit of “X-ray vision” for

augmented reality (pp. 67-107). Springer New York.

Mallot, H. A., Roll, A., & Arndt, P. A. (1995). Disparity-evoked vergence is directed towards

average depth.

McIntire, J. P., Havig, P. R., & Geiselman, E. E. (2014). Stereoscopic 3D displays and human

performance: A comprehensive review. Displays, 35(1), 18-26.

Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE

Transactions on Information and Systems, 77(12), 1321-1329.

Mohr, P., Kerbl, B., Donoser, M., Schmalstieg, D., & Kalkofen, D. (2015, April). Retargeting

technical documentation to augmented reality. In Proceedings of the 33rd Annual ACM

Conference on Human Factors in Computing Systems (pp. 3337-3346). ACM.

Otsuki, M., & Milgram, P. (2013, October). Psychophysical exploration of stereoscopic pseudo-

transparency. In Proceedings of the 2013 International Symposium on Mixed and Augmented

Reality (ISMAR) (pp. 1-6). IEEE.

Parker, A. J., Christou, C., Cumming, B. G., Johnston, E. B., Hawken, M., & Zisserman, A.

(1992). The analysis of 3-D shape: psychophysical principles and neural mechanisms. In

Approaches to Understanding Vision.

Patterson, R. (2009). Review Paper: Human factors of stereo displays: An update. Journal of the

Society for Information Display, 17(12), 987-996.

Perrin, J., Fuchs, P., Roumes, C., & Perret, F. (1998, July). Improvement of stereoscopic comfort

through control of the disparity and of the spatial frequency content. In Aerospace/Defense

Sensing and Controls (pp. 124-134). International Society for Optics and Photonics.

Peterson, M.A. (2015). Low-level and high-level contributions to figure-ground organization:

Evidence and theoretical implications. In J. Wagemans (ed.). The Oxford Handbook of

Perceptual Organization, 259-280. Oxford University Press.

Rao, A. R., & Lohse, G. L. (1993). Identifying high level features of texture perception. CVGIP:

Graphical Models and Image Processing, 55(3), 218-233.

Rashbass, C., & Westheimer, G. (1961). Disjunctive eye movements. The Journal of Physiology,

159(2), 339-360.

Rosenthal, M., State, A., Lee, J., Hirota, G., Ackerman, J., Keller, K., & Fuchs, H. (2002).

Augmented reality guidance for needle biopsies: an initial randomized, controlled trial in

phantoms. Medical Image Analysis, 6(3), 313-320.

Sandor, C., Cunningham, A., Dey, A., & Mattila, V. V. (2010). An augmented reality x-ray

system based on visual saliency. In Proceedings of the 2010 9th IEEE and ACM International

Symposium on Mixed and Augmented Reality on (pp. 27-36). IEEE.

Schall, G., Zollmann, S., & Reitmayr, G. (2013). Smart Vidente: advances in mobile augmented

reality for interactive visualization of underground infrastructure. Personal and ubiquitous

computing, 17(7), 1533-1549.

Schmalstieg, D., & Hollerer, T. (2016). Augmented reality: principles and practice. Addison-

Wesley Professional.

Schor, C., Heckmann, T., & Tyler, C. W. (1989). Binocular fusion limits are independent of

contrast, luminance gradient and component phases. Vision research, 29(7), 821-835.

Schor, C., Wood, I., & Ogawa, J. (1984). Binocular sensory fusion is limited by spatial

resolution. Vision research, 24(7), 661-665.

Sielhorst, T., Bichlmeier, C., Heining, S. M., & Navab, N. (2006, October). Depth perception–a

major issue in medical AR: evaluation study by twenty surgeons. In International Conference on

Medical Image Computing and Computer-Assisted Intervention (pp. 364-372). Springer Berlin

Heidelberg.

Singh, G., Swan II, J. E., Jones, J. A., & Ellis, S. R. (2010, July). Depth judgment measures and

occluding surfaces in near-field augmented reality. In Proceedings of the 7th Symposium on

Applied Perception in Graphics and Visualization (pp. 149-156). ACM.

Swan, J. E., Jones, A., Kolstad, E., Livingston, M. A., & Smallman, H. S. (2007). Egocentric

depth judgments in optical, see-through augmented reality. IEEE transactions on visualization

and computer graphics, 13(3), 429-442.

Thurstone, L. L. (1927). The method of paired comparisons for social values. The Journal of

Abnormal and Social Psychology, 21(4), 384.

Tsirlin, I., Allison, R. S., & Wilcox, L. M. (2008). Stereoscopic transparency: Constraints on the

perception of multiple surfaces. Journal of Vision, 8(5), 1-10.

van Ee, R., Adams, W. J., & Mamassian, P. (2003). Bayesian modeling of cue interaction:

bistability in stereoscopic slant perception. JOSA A, 20(7), 1398-1406.

Van Ee, R., Van Dam, L. C., & Erkelens, C. J. (2002). Bi-stability in perceived slant when

binocular disparity and monocular perspective specify different slants. Journal of Vision, 2(9),

597-607.

Wheatstone, C. (1838). Contributions to the physiology of vision.--Part the first. On some

remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical transactions

of the Royal Society of London, 371-394.

Wickens, C. D., Hollands, J. G., Banbury, S., & Parasuraman, R. (2015). Engineering

psychology & human performance. Psychology Press.

Wismeijer, D. A., Erkelens, C. J., van Ee, R., & Wexler, M. (2010). Depth cue combination in

spontaneous eye movements. Journal of vision, 10(6), 1-15.

Wöpking, M. (1995). Viewing comfort with stereoscopic pictures: An experimental study on the

subjective effects of disparity magnitude and depth of focus. Journal of the society for

information display, 3(3), 101-103.

Young, M. J., Landy, M. S., & Maloney, L. T. (1993). A perturbation analysis of depth

perception from combinations of texture and motion cues. Vision research, 33(18), 2685-2696.

Zollmann, S., Kalkofen, D., Mendez, E., & Reitmayr, G. (2010, October). Image-based ghostings

for single layer occlusions in augmented reality. In Proceedings of the 2010 International

Symposium on Mixed and Augmented Reality (ISMAR) (pp. 19-26). IEEE.

Appendix A: Forms and Questionnaires

A1. Experiment 1

INFORMATION SHEET FOR PARTICIPANTS

Thank you for agreeing to participate in this experiment, the purpose of which is to investigate

one’s ability to perceive a sense of transparency when viewing images using stereoscopic

displays (i.e. “3D displays”).

At the beginning of the session you will be asked to fill in a short questionnaire, after which you

will be given a brief visual test to confirm your ability to view images stereoscopically. If

accepted for participation, you will undergo a training period of about 4 minutes, to familiarise

yourself with the experiment. Please feel free to ask questions during or at the end of the training

period.

The essential stimulus that you will be shown, on a computer screen, is illustrated in the figure

below. In all cases you will observe these images using “3D glasses”, which will cause you to

perceive the images “in depth” – in other words, at different distances from the plane of the

computer monitor.

In each figure, you will observe a surface consisting of a coloured texture, a portion of which

may or may not be covered with a set of black dots. In the cases where the black dots are present,

the number and size of the dots will vary from presentation to presentation. In the centre of the

surface will be a circle. In some cases, the circle will appear to be closer to you than the surface;

in others it will appear to be farther away – that is, behind the surface. In both cases the circle

will look solid and uninterrupted.

In this experiment, you will be presented a series of images as described above. If you think that

the circle is farther away from you than the surface (in other words, if you think that the circle is

behind the surface), you will click on the corresponding key for “Behind” (8); otherwise you will

click on the corresponding key for “Front” (2). You will then be prompted to press the spacebar

to move on to the next trial. If you wish to take a break sometime between the trials, you may do so

after selecting your response and before pressing the spacebar. There will be 300 such images;

however, you should be able to make each judgement fairly rapidly, and finish this experiment in

about 30 minutes. It should be noted that, for each judgement, you have a total of 4 seconds to

enter your choice. If you fail to do so, you will be automatically presented with the next stimulus

and will, later on, be presented with that same trial again until you have succeeded in entering

your choice within the 4 second limit. Upon completion of the experiment, you will be offered

$10, as a token of our gratitude for your participation.

PARTICIPANT CONSENT FORM

"Development of a Method for Facilitating the Percept of Transparency in Stereoscopic Augmented Reality Environments"

I have read and understood the information sheet, and I hereby consent to participate in this

research project, with the understanding that participation involves:

• Filling out one questionnaire before the experiment.

• Performing a set of psychophysical tasks, which have been explained to me.

• The psychophysical tasks involve wearing 3D glasses to view a stereoscopic (3D)

display.

I understand that the experiment will comprise a single half hour session.

I also confirm that any questions I have asked have been answered to my satisfaction, but in the

future I may ask further questions I may have about the study or the research procedures.

I understand that my name will not appear on the questionnaire, that my performance data will

remain confidential, and that only the investigators of this study will have access to my

experimental data.

Furthermore, although aggregated results of this study may be presented at conferences or in scientific

journals, I also understand that no reference to the identity of any participant in this study will be

possible through publication of its results, thereby ensuring that all participants will remain

anonymous.

I understand that participation in this study is strictly voluntary. After completing the session, I

will be paid $10 for my participation.

I do, however, have the right to refuse to answer any questions asked on the questionnaire, as

well as the right to withdraw from the study at any time without any penalty and without the

need to provide any explanation for doing so.

I understand that there is a chance that I may experience some nausea or a headache as a result of

wearing the 3D glasses. If I do experience this and I do not wish to proceed as a consequence, I

shall be free to withdraw from the experiment.

In the event of early withdrawal, my remuneration will be calculated based on the actual time I

shall have spent in the study, at a rate of $15 per hour. As part of my right to withdraw from the

study, I may request that my data be destroyed. However, in the absence of such a request, I

understand that the investigators may elect to use those data, with no changes to the restrictions

to their use.

I understand that I may request a summary of the research results by contacting the investigators

directly.

I have been given a copy of this consent form. I understand what this study involves and agree to

participate.

Participant's Name: ____________________________

Signature: _________________________________ Date: ________________________

The persons who may be contacted about this research are: Sanaz Ghasemi and Prof. Paul

Milgram

Both may be reached at: 5 King’s College Rd., Toronto, ON M5S 3G8; Tel. 416-978-3662.

You may also contact the Ethics Review Office at [email protected] or 416-946-3273,

if you should have questions about your rights as a participant.

QUESTIONNAIRE


Subject number: _____ Date: _______________________

1. Gender (please circle one): Male Female

2. Age (please circle one): 18-19 20-29 30-39 40-49 ≥50

3. Do you ordinarily wear corrective lenses of any kind? Yes No

If yes, are you wearing your prescribed lenses right now? Yes No

4. To the best of your knowledge, are you able to view images stereoscopically (“in 3D”)?

Yes No

A2. Experiment 2



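INFORMATION SHEET FOR PARTICIPANTS

Thank you for agreeing to participate in this experiment, the purpose of which is to investigate

one’s ability to perceive a sense of transparency when viewing images using stereoscopic displays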

(i.e. “3D displays”).


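At the beginning of the session you will be asked to fill in a short questionnaire, after which you

will be given a brief visual test to confirm your ability to view images stereoscopically. If accepted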

for participation, you will undergo two separate sections to complete the experiment. The first

section will require you to perform a shape determination task which you should try to perform as

accurately as possible while the next section will be more subjective (i.e. there will be no right

answer to the selections you will be making).




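The essential stimulus that you will be shown, on a computer screen, is illustrated in the figure

below. In all cases you will observe these images using “3D glasses”, which will cause you to

perceive the images “in depth” – in other words, at different distances from the plane of the

computer monitor.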

Figure 1: Essential Stimulus

In each figure, you will observe a surface consisting of a coloured (purple) texture, a portion of

which may or may not be covered with a set of black dots. In the cases where the black dots are

present, the number and size of the dots will vary from presentation to presentation. In the centre

of the surface will be a blue circle that will always appear farther away – that is, behind the surface.

However, the circle will look solid and uninterrupted (as though the purple texture is transparent).

In addition to the blue circle, you will also be presented with two yellow shapes placed on the

purple texture covered with the black dots. In all of the stimuli, the outer yellow shape will be a

circle. However, the inner yellow shape may be a circle or an ellipse (the orientation of the major

axis of the ellipse will be different every time). In fact, there will be a 70% probability that it will

be an ellipse and a 30% probability that it will be a circle.

During a brief training session you will go through, you will become familiar with the experimental

procedure which will require you to determine whether the inner shape is a circle or an ellipse. If

you think that the inner yellow shape is a circle, you will press the “up” arrow; otherwise (if you

think that the inner yellow shape is an ellipse) you will choose the orientation of its major axis by

pressing the number corresponding to it (a guide will be provided to you).

You may choose to determine the shape of the inner yellow object based on the homogeneity of

its distance from the outer yellow circle. Obviously, the easiest case will be where the surface is

not covered by the black dots. The figures below can give you an idea about how subtle the

difference will be.

(a) (b)

Figure 2: The inner shape can be a circle (a) or an ellipse (b).

You will have 6 seconds to make your selection and will then be prompted to press the spacebar

to move on to the next trial. If you fail to select your answer within the time limit, the stimulus

will disappear and you will be presented with the next trial which will repeat itself again some

other time during the experiment. If you wish to take a break sometime between the trials, you

may do so after selecting your response and before pressing the spacebar. This part of the

experiment will take about 16 minutes to complete. Please try to make your choices as accurately

as possible. In case you choose to release your email in the consent form, your chances of winning

a $50 Amazon gift card (in the lottery to be held once the experiments are done) will increase

proportionally to your performance score.

Once you are done with this section, you will go through a short training session and move on to

the next part which will require you to make a comparison between two images similar to those in

Figure 1. In this section, for each pair of images, you will be asked to answer this question:

“In which image (left or right) does the impression of transparency look more convincing?”

(The black dots may or may not appear to you as “holes” in the coloured surface, and the surface

may or may not appear to you to be transparent. There are no “right answers” in this experiment;

please try simply to express to us whatever it is that you perceive. Please answer the question even

if neither surface seems to you to be particularly transparent.) There will be no time limit for this

task and you will be able to make your selection using the right or left arrow key.

There will be 78 repetitions of these trials; however, you should be able to make each judgement

fairly rapidly, and finish this experiment in less than 30 minutes. Upon completion of the

experiment, you will be offered $10, as a token of our gratitude for your participation.






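PARTICIPANT CONSENT FORM

I have read and understood the information sheet, and I hereby consent to participate in this research project, with the understanding that participation involves:

• Filling out one questionnaire before the experiment.

• Performing a set of psychophysical tasks, which have been explained to me.

• The psychophysical tasks involve wearing 3D glasses to view a stereoscopic (3D) display.

I understand that the experiment will comprise a single half hour session.

I also confirm that any questions I have asked have been answered to my satisfaction, but in the future I may ask further questions I may have about the study or the research procedures.

I understand that my name will not appear on the questionnaire, that my performance data will remain confidential, and that only the investigators of this study will have access to my experimental data.

Furthermore, although aggregated results of this study may be presented at conferences or in scientific journals, I also understand that no reference to the identity of any participant in this study will be possible through publication of its results, thereby ensuring that all participants will remain anonymous.

I understand that participation in this study is strictly voluntary. After completing the session, I will be paid $10 for my participation. By providing my email below, I agree to take part in a lottery for a $50 Amazon gift card with the chances of my winning determined based on my performance score.

I do, however, have the right to refuse to answer any questions asked on the questionnaire, as well as the right to withdraw from the study at any time without any penalty and without the need to provide any explanation for doing so.

I understand that there is a chance that I may experience some nausea or a headache as a result of wearing the 3D glasses. If I do experience this and I do not wish to proceed as a consequence, I shall be free to withdraw from the experiment.

In the event of early withdrawal, my remuneration will be calculated based on the actual time I shall have spent in the study, at a rate of $15 per hour. As part of my right to withdraw from the study, I may request that my data be destroyed. However, in the absence of such a request, I understand that the investigators may elect to use those data, with no changes to the restrictions to their use.

I understand that I may request a summary of the research results by contacting the investigators directly.

I have been given a copy of this consent form. I understand what this study involves and agree to participate.

Participant's Email: ____________________________

Signature: ____________________________________ Date: ________________________

The persons who may be contacted about this research are: Sanaz Ghasemi and Prof. Paul Milgram

Both may be reached at: 5 King’s College Rd., Toronto, ON M5S 3G8; Tel. 416-978-3662.

You may also contact the Ethics Review Office at [email protected] or 416-946-3273, if you should have questions about your rights as a participant.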




QUESTIONNAIRE


Date: _______________________






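1. Gender (please circle one): Male Female

2. Age (please circle one): 18-19 20-29 30-39 40-49 ≥50

3. Do you ordinarily wear corrective lenses of any kind? Yes No

If yes, are you wearing your prescribed lenses right now? Yes No

4. To the best of your knowledge, are you able to view images stereoscopically (“in 3D”)?

Yes No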

A3. Experiment 3



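INFORMATION SHEET FOR PARTICIPANTS

Thank you for agreeing to participate in this experiment, the purpose of which is to investigate

one’s ability to make correct depth judgements when viewing images using stereoscopic displays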

(i.e. “3D displays”).


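At the beginning of the session you will be asked to fill in a short questionnaire, after which you

will be given a brief visual test to confirm your ability to view images stereoscopically. If accepted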

for participation, you will start the experiment which will be divided into two 25-minute sections

separated by a 10-minute break. This section will require you to perform a depth judgement task

which you should try to perform as accurately as possible while providing a subjective measure of

your certainty regarding the decision you made (i.e. there will be no right answer to this latter

measure).




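The essential stimulus that you will be shown, on a computer screen, is illustrated in the figure

below. In all cases you will observe these images using “3D glasses”, which will cause you to

perceive the images “in depth” – in other words, at different distances from the plane of the

computer monitor.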

Figure 1: Essential Stimulus

In each figure, you will observe a garbage bin, a portion of which may or may not be covered with

a set of black dots. In the cases where the black dots are present, the number and size of the dots

will vary from presentation to presentation. In the centre of the bin will be a blue truncated cone

that will face towards you. When viewing this truncated cone in 3D, you may or may not

perceive parts of it behind the bin’s surface. In all cases, the truncated cone will look solid and

uninterrupted (as though the bin’s surface is transparent).

If viewed from above, what you see will resemble what is shown in Figure 2. Your task will be

to determine what portion of the truncated cone’s length is located behind the bin’s surface

(x-tenth). During a brief training session you will go through, you will become familiar with the

experimental procedure. The stimulus will be presented for 2 seconds and you will then be

prompted to make your selection. In determining your response (which will range from 1 to 10),

you will use the row of numbers on top of the keyboard and you will press ‘enter’ to move on to

the next trial. Following this, you will be presented with another question asking you to determine

how difficult it was for you to make that depth judgement. You may choose any number from 1

(easiest) to 4 (most difficult). Please note that the truncated cone will be presented with different

diameters and lengths. Therefore, in making your depth judgements refrain from using relative

size or visual height as a depth cue.

Figure 2: View from above.

After 20 minutes of going through the experiment, you will be given a 10-minute break after which

you will resume with the experiment. Please try to make your choices as accurately as possible. In

case you choose to release your email in the consent form, your chances of winning a $50 Amazon

gift card (in the lottery to be held once the experiments are done) will increase proportionally to

your performance score. There will be a total of 570 trials; however, you should be able to make

each judgement fairly rapidly, and finish this experiment in less than an hour.

Once you are done with this section, you will be interviewed about your experience while going

through the experiment. Your answers will be recorded to later on be transcribed and analyzed as

qualitative data. There are no “right answers” in this section; please try simply to express to us

whatever it is that you perceive. Upon completion of the experiment, you will be offered $20, as a

token of our gratitude for your participation.

PARTICIPANT CONSENT FORM


"Using Random Dot Patterns to Achieve X-ray Vision with Stereoscopic Augmented Reality Displays"




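I have read and understood the information sheet, and I hereby consent to participate in this research project, with the understanding that participation involves:

• Filling out one questionnaire before the experiment.

• Performing a set of depth judgement tasks, which have been explained to me.

• The depth judgement tasks involve wearing 3D glasses to view a stereoscopic (3D) display.

• Taking part in an interview for which my replies will be recorded.

I understand that the experiment will comprise a one-hour and 15-minute session.

I also confirm that any questions I have asked have been answered to my satisfaction, but in the future I may ask further questions I may have about the study or the research procedures.

I understand that my name will not appear on the questionnaire, that my performance data and responses to interview questions will remain confidential, and that only the investigators of this study will have access to my experimental data.

Furthermore, although aggregated results of this study may be presented at conferences or in scientific journals, I also understand that no reference to the identity of any participant in this study will be possible through publication of its results, thereby ensuring that all participants will remain anonymous.

I understand that participation in this study is strictly voluntary. After completing the session, I will be paid $20 for my participation. By providing my email below, I agree to take part in a lottery for a $50 Amazon gift card with the chances of my winning determined based on my performance score.

I do, however, have the right to refuse to answer any questions asked on the questionnaire, as well as the right to withdraw from the study at any time without any penalty and without the need to provide any explanation for doing so.

I understand that there is a chance that I may experience some nausea or a headache as a result of wearing the 3D glasses. If I do experience this and I do not wish to proceed as a consequence, I shall be free to withdraw from the experiment.

In the event of early withdrawal, my remuneration will be calculated based on the actual time I shall have spent in the study, at a rate of $15 per hour. As part of my right to withdraw from the study, I may request that my data be destroyed. However, in the absence of such a request, I understand that the investigators may elect to use those data, with no changes to the restrictions to their use.

I understand that I may request a summary of the research results by contacting the investigators directly.

I have been given a copy of this consent form. I understand what this study involves and agree to participate.

Participant's Email: ____________________________

Signature: _____________________________________ Date: ________________________

The persons who may be contacted about this research are: Sanaz Ghasemi and Prof. Paul Milgram

Both may be reached at: 5 King’s College Rd., Toronto, ON M5S 3G8; Tel. 416-978-3662.

You may also contact the Ethics Review Office at [email protected] or 416-946-3273, if you should have questions about your rights as a participant.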




QUESTIONNAIRE

"Using Random Dot Patterns to Achieve X-ray Vision with

Stereoscopic Augmented Reality Displays"

Date: _______________________






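1. Gender (please circle one): Male Female

2. Age (please circle one): 18-19 20-29 30-39 40-49 ≥50

3. Do you ordinarily wear corrective lenses of any kind? Yes No

If yes, are you wearing your prescribed lenses right now? Yes No

4. To the best of your knowledge, are you able to view images stereoscopically (“in 3D”)?

Yes No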

INTERVIEW SCRIPT42

1. At any time during the experiment, did you experience any difficulty in perceiving the stimuli

in 3D?

If so:

• Can you remember any specific type of pattern with which this occurred?

• Did it happen often?

• Are you able to describe that difficulty?

If no:

• Did you experience any double image or difficulty in fusing the images?

2. Do you remember using any specific strategy in making your judgments?

If so, can you describe it?

42 This script was used as a framework for the semi-structured interview of Experiment 3. Modifications to this script were made depending on the responses of the participant (interviewee).

3. Show a selection of stimuli (randomized order of high and low dot density and dot size; random

depths, not 0 or 100%; first sharp and then blurry) for the same duration as in the experiment, and

ask about strategy each time.

4. Repeat Question 3 but with unlimited time.

5. a) Do you have the impression that the black dots are part of the surface of the bin?

b) Please describe how you perceive them.

c) Do you perceive any holes on the surface of the bin?

Appendix B: Supplementary Material for Chapter 6 (Experiment 3)

B.1. Summary of Insights Gained from Pilot Studies

In the process of designing, fine-tuning, and finalizing Experiment 3 that was described in

Chapter 6, various pilot studies were conducted. This section provides a summary of some of the

key lessons learned from those studies that influenced the final design of the experiments:

Prior to opting for a truncated cone as the virtual object, a wireframe pyramid was tried, as shown in Figure B.1. As in the final version of Experiment 3, the depth judgement task in this case asked the observer to determine what proportion of the pyramid’s length was placed behind the real surface. Pilot tests involving 4 participants were run with this setup, and the results, observations and interviews showed two things. Firstly, the orientation of the pyramid was not always clear to the participants, i.e. whether the apex was pointing inwards or outwards43. Secondly, participants mainly used the tip (apex) and the lines along the edges of the pyramid to make their depth judgement.

Since using such cues was not anticipated to be possible for typical X-ray applications of AR, the idea of using a truncated cone was tried out. Because a truncated cone contains no wireframe outlines other than the two circles at the ends, there was initial concern that participants would have difficulty properly perceiving the shape as a truncated cone. Consequently, four thin longitudinal connecting lines were added, equally spaced around the circumference of the truncated cone, to reinforce perception of the proper shape. The pilot results showed that, by doing so, participants were more likely to make their depth judgements based on the distance of the larger and smaller bases of the truncated cone with respect to the real surface, which was the intended behaviour.

43 This was also the case with using a wireframe cube as the virtual object (as a result of the Necker cube illusion).

Figure B.1: Stereoscopic image showing the virtual pyramid placed halfway along its length relative to the surface of the bin (i.e. the real surface). The apex of the pyramid in this image is pointed towards the observer.

Other factors that were varied throughout the pilot studies to arrive at optimized values for the experiment were: the length of the truncated cone(s), the duration for which the stimulus was presented, the width of the lines connecting the circular bases of the truncated cone, the blur level of the random dot patterns, and the number of sections into which the length of the truncated cone was divided.

B.2. Difficulty Rating of Depth Estimation Task

As mentioned in Section 6.4.2, for each trial, after having responded to the depth judgement task,

participants selected a response to the question: “On a scale of 1 (easiest) to 4 (most difficult),

how difficult did you find the task?” Figure B.2, Figure B.3 and Figure B.4 present scatterplots

showing the ‘Difficulty Rating’ results as a function of the virtual object’s actual depth

proportion (relative to the real surface) for the various patterns. Figure B.5 shows corresponding

scatterplot data for the No Pattern condition. As before, in these figures, the sizes and colours of

the dots are proportional to the number of occurrences at each point (where more occurrences are

shown with larger and darker circles).
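
The aggregation behind such scatterplots can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the thesis figures: the function names `occurrence_counts` and `marker_areas`, the `base_area` scale factor and the sample trials are all hypothetical. The sketch counts how often each (depth proportion, difficulty rating) pair occurs and scales a marker area linearly with that count.

```python
from collections import Counter


def occurrence_counts(depth_proportions, difficulty_ratings):
    """Count how often each (depth proportion, difficulty rating) pair occurs."""
    if len(depth_proportions) != len(difficulty_ratings):
        raise ValueError("each trial needs one depth proportion and one rating")
    return Counter(zip(depth_proportions, difficulty_ratings))


def marker_areas(counts, base_area=20.0):
    """Scale each marker's area linearly with its number of occurrences.

    base_area is a hypothetical scale factor: the area of a point seen once.
    """
    return {point: base_area * n for point, n in counts.items()}


# Hypothetical trials: actual depth proportion and difficulty rating (1 to 4).
depths = [0.1, 0.1, 0.5, 0.5, 0.5, 1.0]
ratings = [1, 1, 3, 3, 2, 4]

counts = occurrence_counts(depths, ratings)
areas = marker_areas(counts)
```

The resulting areas (and the counts themselves, mapped to a colour scale) could then be passed to any plotting routine that accepts per-point sizes and colours.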

Figure B.2: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s actual depth proportion for dot density 20% and various dot sizes: (a) Sharp condition, (b) Blurry condition. The size and colour of the dots are proportional to the number of occurrences at each point.

Figure B.3: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s actual depth for various dot sizes and dot density of 40%: (a) Sharp condition, (b) Blurry condition.

Figure B.4: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s actual depth for various dot sizes and dot density of 60%: (a) Sharp condition, (b) Blurry condition.

Figure B.5: Scatterplot showing the ‘Difficulty Rating’ as a function of the virtual object’s

actual depth for the No Pattern condition.

B.3. Transcript of Interviews with Participants

Below, the approximate transcripts of participants’ responses to the interview questions are

provided. These responses include those obtained from the participants of the pilot studies44 as

well (Participants 1-4).

Participant 1 (Pilot Study):

1. All the time. Had a hard time seeing the cone in 3D almost all the time. The more the cone did

not look 3D or the more I had difficulty in focusing on it, the farther I assumed it was.

Had difficulty in fusing, particularly with stimuli with cone behind bin and with patterns because

my brain wasn’t used to it. If it felt weird to look at, it was behind the bin.

No double image though!

44 These pilot studies were done using the truncated cone as the virtual object.

3. Limited time:

Most of the time, the cone looked in front. Anytime it seemed blurry or I had a hard time fusing

(it didn’t look 3D), it was behind.

Front part looked clear, not completely behind.

Didn’t have time to build up a specific strategy. I was too focused on the cone so I wasn’t able to

register the bin properly.

Patterns definitely mattered, because when they weren’t there, I would automatically say 0 (see

the cone in front of the bin).

Look at one part of the bigger ring first. If I had a difficult time perceiving it, then it’s behind.

Then I would look at the front and make the same judgement about it. Then I would use that to

decide the squat of the cone (which I had a really hard time doing). Then I would make my

judgement.

Blurry patterns were confusing and were hard to use for making the depth judgements. The cone

would still look weird though, when it was placed behind the surface of the bin.

5. During the experiment, I didn’t see the patterns as part of the surface of the bin because I was

so focused on the cone that I wasn’t even registering the bin as a bin.

However, when I look at it now, I do see the dots as part of the surface of the bin. I don’t see holes.


Participant 2 (Pilot Study):

1. No difficulty.

2. Sometimes, I would compare it to the one before. Especially when it was at extremes. But other

than that I don’t think so.

3. Limited time: I would look at how protruded it looked. To do that, if the front or back looked

more solid, more protruding. If it was behind, it wasn’t as stark.

4. Unlimited time: very different when I look at it for a long time. I get even more uncertainty. My

eyes play tricks on me. I look at the edge of the cone then I might look for the midpoint as guidance.

I think I focus more on the circle that’s farther. Blurry patterns seemed more obvious. When there

are a lot of dots, it’s more noisy and more difficult. That’s why when it’s blurry it’s better cause I

can’t focus on a single point.

5. Dots are part of the surface, no holes.


Participant 3 (Pilot Study):

1. No double image.

2. I would look at the difference between the bigger and smaller circle of the cone.

3. Limited time: difference between small and big circle size and length and width of connecting

rods. If pattern was darker or blurrier, I would get confused. I would use the same strategy but with

more uncertainty.

4. Unlimited time: Width of connecting rods and relative size of circles!

5. Dots are part of surface, no holes.


Participant 4 (Pilot Study):

1. No difficulty, no double image.

2. I looked at the bigger circle and the lines coming out and using that. If bigger circle seemed

closer to dots, it was less inside. Compared bigger circle to surface and then looked at connecting

lines.

3. Limited time: Look at bigger circle and compare to background. Then, using the lines, I would

follow my gaze to the smaller circle and make my judgement.

4. Unlimited time: Same strategy. + compare depth of small circle to large circle.

Blurry patterns make it harder to localize the surface of the bin. More dots makes it even more

difficult.


Participant 5:

1. Some instances I would stop seeing in 3D for a few seconds. No double image.

2. Hard to sustain a strategy. I would make a flash judgement first. It would work better for me.

3. Limited time: Gut judgement. I looked at VO then the background. Holistic judgement. I looked

more at the farther circle and decided how far it was.

My strategy didn’t intentionally change with patterns. With no pattern, it was easier to tell if it was

in front. But when it wasn’t, it was much more difficult.

4. Unlimited time: Look at smaller circle first, if background was blurry, surface was farther. Sharp

patterns seem to be easier to look at.

With black points, it’s easier to tell if VO is behind than white points.

5. Dots look like a cut-out. Black points are holes cut out from surface.

Participant 6:

1. No double image.

2. I don’t remember using a specific strategy.

4. Unlimited time: Gut feeling. Contrast between background and cone. Holistic look at the cone.

Blurry patterns are easier to see.

5. In general, I see dots as part of the surface. No holes.

Participant 7:

1. No double image.

2. I tried a strategy and then I stopped. No clear strategy.

3. Limited time: I looked at how protruding the smaller circle compared to the bigger circle. I also

compared it to the edge of the bin.

I tried different strategies for each one.

4. Unlimited time: The black dots help me more. White dots look like they’re on top when large

circle is behind.

Sharp patterns were easier.

5. Seems like white dots are on the surface and black dots are showing what’s inside the bin (black

background!). Like holes.

Participant 8:

1. Some stimuli looked flatter than some others. No double image.

2. I looked at borders but also holistically and sometimes I tried to see the surface going through

it. If it were more in front, it was easier.

4. Unlimited time: Look at bigger circle, then smaller circle and then look at it holistically.

Sometimes VO looked like it was behind the dots.

Some of the blurrier ones were easier. They give me a better sense of the plane.

If VO looked jagged (changing shades passing through black and white dots), it’s farther behind.

With No Pattern and the VO wasn’t completely out, most difficult.


Participant 9:

1. Yes. When dots were large, it was difficult to see in 3D. When the larger circle is deeper into

the bin, I would see double.

2. I couldn’t use any specific strategy other than seeing in double.

4. Unlimited time: When larger circle was not as clear, it was farther inside.

It was easier to guess with the blurry patterns.


Participant 10:

1. No double image.

2. At the beginning, I used size (of circles) or length of connecting rods. But then I stopped. With

No Pattern, it was difficult to see what the depth is (seems half way in, halfway out).

4. Unlimited time: Edge of big circle seems ridged (?) between black and white points so it seems

to be out. Sharp patterns are easier. I look at the lines connecting the large and small circles. Is it

clear? If it’s clear, it’s closer.


Participant 11:

1. No double image.

2. If it’s obvious, it’s more outside. I looked at the size and position of the circle and the blur of

the pattern.

4. Unlimited time: More contrast, more outside.

Higher density is easier. Low density makes it difficult.

Blurry and lower density is relatively easy too.

5. Dots are part of the surface. No holes.

Participant 12:

1. No double image.

2. When No pattern, it seemed outside.

4. Unlimited time: I look at large circle first and make a judgement. Then I look at small circle and

make a judgement.

Lower dot densities were easier. But with no pattern, it was even more difficult.

Sharp ones were easier.

I look at edges of black and white dots.

5. Dots are part of the surface. No holes.

Participant 13:

1. With blurry ones, I wouldn’t see double images but I would have difficulty in fusing the image.

2. When the circle is larger it looks deeper inside. With black dots, the cone seemed deeper inside.

I looked at the left edge of the bigger circle to make my judgement. Blinking also helped.

4. Unlimited time: I looked at edges of circle.

Blurry was difficult to see depth with.

Lower dot densities were easier.


Participant 14:

1. No double image.

2. No Pattern is difficult. I looked at the relative size of the two circles.

4. Unlimited time: I looked at the line connecting the two circles. If it’s longer, it’s a longer cone.

Then I compare large circle to surface, then small circle to surface.

No difference between dot densities and sharp/blurry.


Participant 15:

1. No double image.

2. No Pattern was difficult and seemed like the cone was outside. No pattern also didn’t look very

3D.

If larger circle was out of focus, it made it look like it’s behind the bin. The closer it was, the more

focused (and clear) it looked.

4. Unlimited time: (same as above)

Smaller dot sizes and higher dot densities were more difficult.

Blurry patterns were easier.


Participant 16:

1. No double image.

When there were more black dots, it was harder to perceive in 3D.

2. I looked at the small circle and then I would compare the depth of the small circle to that of the

big circle.

4. Unlimited time: I looked at the depth of the front of the cone and then I would look at the

connecting lines.


Blurry was easier because of the limited time.


Participant 17:

1. No double image.


High dot densities were most difficult.

Blurry patterns were easier.

2. I compared trials to each other.

4. Unlimited time: I looked at smaller circle and decided based on its depth. I mostly focused on

the small circle and then, sometimes, I compared it to the depth of large circle.

I usually look at the black dots.


Participant 18:

1. No double image.

2. If background was blurry, the cone looked clearer and it looked more outside. But if pattern and

cone looked clear (sharp), I felt it was inside.

4. Unlimited time: Blurry pattern looked farther away.

I also considered length of cone.

I also sometimes looked at the line connecting the circles and how it looked next to the black dots.


Participant 19:

1. No double image.

2. Sometimes I would look at the size of the large and small circle. I had to resist comparing trials.

4. Unlimited time: I would try to determine the depth of the large and small circle. Mainly I would

look at small circle and sometimes I would focus on the center of the cone. If I had more time, I

would look at the pattern and try to find the position of the surface within the cone.

Higher densities were easier.


With No Pattern, I was either very certain or not certain at all.

With sharp patterns, the cone looked more opaque and I was more certain about my judgement.

I look at the black dots and the lines connecting the circles.


Participant 20:

1. No double image but when it was farthest, the small circle looked a bit blurry.

2. I had a reference for halfway, 0 and 10 in my head and compared it to that.

4. Unlimited time: I focus on the small circle and how much it’s coming out.


Sharper patterns are better.



Appendix C: Enlarged Stereo Images

C.1. Figure 1.3


C.2. Figure 4.2 (a)


Appendix D: Depth Cues⁴⁵

To infer the depth of objects, our visual system integrates various sources of available depth

information, which are defined and categorized as depth cues. While some of these cues provide

information about the ordinal or relative depth of objects (e.g. which of two objects is closer), others

provide absolute depth information, which allows an observer to ascertain a distance in absolute

units (e.g. in meters). Generally, depth cues can be categorized into two groups. Those

which are a property of the object being perceived are referred to as object-centered cues and

those which are a result of our own visual system are referred to as observer-centered cues.

Object-centered Cues

These cues are sometimes also referred to as pictorial cues because of their use by artists in

conveying a sense of depth in a two-dimensional medium. They consist of the following cues:

Occlusion (Interposition)

Foreground-background occlusion occurs if an object intervenes between a vantage point and

another object. Both objects may project into the optic array at a vantage point. The front of the

foreground (or ‘occluder’) projects to the vantage point, and if it is opaque, either none or only

part of the other object can project to the vantage point. In this case, either the whole object or

the other part of it is hidden – ‘occluded’. In cases where the foreground is transparent, the

background object can either partially or completely project to the vantage point, with optic

arrays passing through the foreground’s surface. There are many kinds of optical information for

occlusion. Research on optical features encouraging the appearance of occlusion continues to this

day (Gillam and Grove, 2011; Kennedy, 1974; Peterson, 2015).

It is widely believed that occlusion is the most powerful depth cue at all distances where visual

perception holds. The reason for this is that our world is populated mostly by solid objects that

are opaque. However, transparent or translucent objects are also encountered regularly and can

be easily incorporated into our understanding.

⁴⁵ This appendix follows the discussion of depth cues in ‘Engineering Psychology and Human Performance’ by

Wickens et al. (2000) closely. The reader is advised to approach this appendix as a rather superficial review of

perceptual literature as it pertains to ‘engineering psychology and human performance’.

In the context of X-ray vision applications of AR, various researchers have used the occlusion

cue by having features of the real surface occlude the virtual object, thus allowing the observer

easily to perceive the virtual object as being behind the real object (Lerotic et al., 2007; Avery et

al., 2009; Sandor, Cunningham, Dey & Mattila, 2010).

Linear Perspective

Through transformation of 3D information in a scene to a 2D image formed on our retina, one of

the phenomena that takes place is the conversion of two parallel lines to two lines converging

toward a single point receding in depth. In simpler terms, when two converging lines are seen,

they are ordinarily assumed to be two parallel lines extending away.
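The convergence of parallel lines under projection can be illustrated with a minimal pinhole-projection sketch. The function names, the focal length and the rail geometry below are hypothetical values chosen for illustration; they are not taken from the experiments in this thesis.

```python
def project(x, y, z, focal_length=1.0):
    """Pinhole projection of a 3D point; z is the distance along the line of sight."""
    if z <= 0:
        raise ValueError("point must lie in front of the viewer")
    return (focal_length * x / z, focal_length * y / z)


def rail_separation(z, half_gap=1.0, height=-1.5):
    """Projected horizontal gap between two parallel rails at distance z.

    The rails run parallel to the line of sight, half_gap to either side of
    the viewer and below eye level (assumed, illustrative geometry).
    """
    left = project(-half_gap, height, z)
    right = project(half_gap, height, z)
    return right[0] - left[0]


# The projected gap shrinks in proportion to 1/z, so the two images of the
# rails approach a single vanishing point as distance increases.
separations = [rail_separation(z) for z in (2.0, 4.0, 8.0, 16.0)]
```

In this sketch, each doubling of distance halves the projected separation, which is the image-plane regularity that the visual system is assumed to invert when it interprets converging lines as parallel lines receding in depth.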

Height in the Plane (Relative Height)

Since objects on a common horizontal ground plane are usually observed from above, more

distant objects appear higher in the visual field. Therefore, in interpreting ordinal depths, this cue

can be quite effective. However, in situations where the ground is uneven, the depth knowledge

obtained from this cue could be limited to ordinal information, specifically with increasing

distance to the object.

Relative (Familiar) Size

As objects move farther away, their projected sizes become smaller. Therefore, if an object is

recognized or if the absolute size of a depicted object is known, one can infer its distance from its

apparent size using the size-distance invariance hypothesis (Kilpatrick and Ittelson, 1953).

Additionally, if one knows the relative sizes of multiple different objects, then their ordinal

proximity can be inferred from their relative apparent sizes in the visual field. Thus, the

important point about this cue is that it is a relative cue. In other words, a basis for comparison

must exist, either from the scene or from the observer’s experience.
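As a rough geometric illustration of how a known physical size and an apparent (angular) size jointly constrain distance, the following sketch assumes a frontoparallel object viewed straight on. The function name and the example values are assumptions made for illustration, and the formula is simple trigonometry rather than a model of perceived size.

```python
import math


def distance_from_familiar_size(physical_size_m, visual_angle_deg):
    """Distance at which an object of known size subtends the given visual angle."""
    if physical_size_m <= 0 or not 0 < visual_angle_deg < 180:
        raise ValueError("size must be positive and angle must lie in (0, 180) degrees")
    half_angle = math.radians(visual_angle_deg) / 2.0
    return physical_size_m / (2.0 * math.tan(half_angle))


# Hypothetical example: the same 1 m object subtending 2 degrees and 1 degree.
near = distance_from_familiar_size(1.0, 2.0)
far = distance_from_familiar_size(1.0, 1.0)
```

Halving the visual angle approximately doubles the inferred distance, which is why a basis for comparison (a known size) is needed before apparent size can serve as a distance cue.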

Relative Density (Depth from Texture)


The characteristic spacing of a cluster of objects or features of a texture on the retina is referred

to as ‘relative density’. In the example of a textured plane, the optical projection of the grain will

grow finer at greater distances and, in these cases, this cue can also be termed a ‘textural

gradient’.

In fact, by projecting optical texture to the observer’s vantage point, the texture on the surface of

an object may also allow the observer to perceive its slant, distance and shape⁴⁶. What makes the

“shape from texture” cue possible is perspective, which results in smaller and more closely

spaced optical projections of the markings (Cumming, Johnston & Parker, 1993).

Proximity-luminance Covariance

Since objects and lines that are closer to us are typically perceived as brighter, reductions in the

illumination of an object and/or the intensity of the projected optic array can be used as a cue to

infer increasing distance.

Aerial Perspective (Atmospheric Attenuation)

Tiny particles such as pollutants and moisture act as a translucent medium that causes more

distant objects to appear hazier or to have less contrast. This cue, which is mostly effective for

far-field distances, is referred to as ‘aerial perspective’.

Light and Shadow (Shading)

While cast shadows are the result of luminance attenuation on a surface due to the occlusion of a

light source, shadings are the luminance distribution on a surface due to the presence of a non-

occluded light source. In both cases, shadows and shading provide us with information on

objects’ shapes as well as orientations relative to us and relative to each other.

⁴⁶ Slant, distance and shape are all related since slant is change of distance and differences in slant are part of shape.

Motion Parallax

When an observer moves relative to a 3D scene, the projections of closer objects in the optic array

change faster than those of objects that are farther away. In other words, our perceptual system inversely relates

an object’s distance to its relative degree of changes in angular motion. As such, motion parallax

is used to infer the shape and location of objects.
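The inverse relation between distance and angular motion can be sketched for the simplest case: an observer translating sideways past a stationary point, evaluated at the instant the point lies perpendicular to the direction of travel. The function names and numerical values are illustrative assumptions.

```python
def angular_speed(observer_speed_m_s, distance_m):
    """Angular speed (rad/s) of a stationary point lying perpendicular to the
    observer's direction of travel, at the instant it is passed."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return observer_speed_m_s / distance_m


def distance_from_parallax(observer_speed_m_s, angular_speed_rad_s):
    """Invert the relation above: slower angular motion implies a farther point."""
    if angular_speed_rad_s <= 0:
        raise ValueError("angular speed must be positive")
    return observer_speed_m_s / angular_speed_rad_s


# Hypothetical observer moving at 1 m/s past points 2 m and 20 m away.
near = angular_speed(1.0, 2.0)
far = angular_speed(1.0, 20.0)
```

Under these assumptions the point that is ten times farther away sweeps across the visual field ten times more slowly.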

(Static) Observer-centered cues

The following depth cues are a result of the characteristics of the human visual system.

Accommodation

To bring images into focus on the retina, the curvature of the lenses of the eye requires

adjustment. This adjustment is referred to as accommodation. Closer objects require more

adjustment and, thus, sensing the amount of this adjustment might help in determining the

absolute depth of nearby objects.

Although static focus distances may not provide much information, changes in focus are what

makes this depth cue effective. Moreover, this cue is generally described as a monocular depth

cue since it does not require the involvement of both eyes.

(Binocular) Convergence

The amount of inward turning of the eyes when a focal point is fixated determines the degree of

‘convergence,’ and thus sensing the extent of this inward turning can help in determining the

distance of an object. This cue is used to provide absolute depth information for nearby objects.
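The geometry behind this cue can be sketched under the assumption of symmetric fixation on a point straight ahead of the observer. The interpupillary distance of 0.063 m and the function names are assumptions made for illustration only.

```python
import math


def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    if distance_m <= 0 or ipd_m <= 0:
        raise ValueError("distance and interpupillary distance must be positive")
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def distance_from_vergence(angle_deg, ipd_m=0.063):
    """Fixation distance implied by a given vergence angle (inverse of the above)."""
    if not 0 < angle_deg < 180 or ipd_m <= 0:
        raise ValueError("angle must lie in (0, 180) degrees and ipd must be positive")
    return ipd_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


# Vergence angle at a few hypothetical fixation distances (in metres).
angles = {d: vergence_angle_deg(d) for d in (0.25, 0.5, 1.0, 2.0, 6.0)}
```

In this sketch the vergence angle changes far more per unit distance at 0.25 m than at 6 m, which is consistent with the cue being informative mainly for nearby objects.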

Binocular Disparity

The ability to perceive a scene from two eyes that are separated by an interpupillary distance

provides (95% of) humans with one of the most important and perceptually acute sources of

depth information (Coutant & Westheimer, 1993).

When a scene is viewed, the fixation point (also referred to as the focal point) will fall on a

particular location on the retina of each eye, resulting in zero disparity. One can furthermore

envisage an imaginary geometric arc called the horopter, comprising all points in space, including

the focal point, that also have zero retinal disparity. Other points that are closer or farther from

this arc are mapped onto disparate locations on the two retinas, which are nevertheless fused into

a single image in depth. The horopter thus provides a reference plane from which the ordinal


depth of other objects can be judged. Objects that are in front of the horopter (closer to the

observer) will result in fused images with crossed disparity, whereas objects that are behind the

horopter (farther from the observer) will result in fused images with uncrossed disparity. Based

on the amount of retinal disparity in the projection of each point to each eye, the visual system is

thus able to discern the ordinal depths between two points in space via the binocular disparity

depth cue (Patterson, 2009).
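The distinction between crossed and uncrossed disparity can be sketched for points on the midline, by expressing disparity as the difference between the vergence angle of a point and that of the fixation point. The sign convention (positive for crossed), the interpupillary distance of 0.063 m and the function names are assumptions made for this illustration.

```python
import math


def vergence_angle(distance_m, ipd_m=0.063):
    """Angle (radians) subtended at a midline point by the two eyes."""
    if distance_m <= 0 or ipd_m <= 0:
        raise ValueError("distance and interpupillary distance must be positive")
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def relative_disparity(point_distance_m, fixation_distance_m, ipd_m=0.063):
    """Disparity of a midline point relative to fixation (assumed sign: crossed > 0)."""
    return vergence_angle(point_distance_m, ipd_m) - vergence_angle(fixation_distance_m, ipd_m)


def classify(disparity_rad, tolerance=1e-12):
    """Label a disparity as crossed (nearer than fixation), uncrossed (farther) or zero."""
    if disparity_rad > tolerance:
        return "crossed"
    if disparity_rad < -tolerance:
        return "uncrossed"
    return "zero"


# Hypothetical fixation at 0.5 m with one nearer and one farther point.
nearer = classify(relative_disparity(0.4, 0.5))
farther = classify(relative_disparity(0.6, 0.5))
```

The sign of the computed disparity carries the ordinal information described above, while its magnitude grows with the depth separation from the fixation point.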

The importance of binocular disparity in perceiving depth was first shown through the invention

of the stereoscope by Wheatstone (1838), where a pair of flat drawings were used to achieve a

three-dimensional percept of an object. Later, in 1960, by introducing the concept of random dot

stereograms, Julesz (1971) made a significant contribution to the science behind stereo vision. A

typical example of a random dot stereogram is one where two images consist of identical

randomly distributed dots, but with a central square region that is shifted horizontally by a small

distance relative to the other image. When viewed individually, each image appears as a flat field

of random dots. However, when viewed stereoscopically, the central square region appears at a

depth that is different from the background plane of random dots. Random dot stereograms

provide evidence that binocular depth perception can be achieved without the need for

monocular form recognition.
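The construction of such a stereogram can be sketched in a few lines. This is a minimal illustration of the general technique, not the procedure used to generate the stimuli in this thesis; the image size, square size, shift and function name are all hypothetical.

```python
import random


def make_random_dot_stereogram(size=64, square=24, shift=4, seed=0):
    """Return (left, right) binary images as lists of rows.

    The right-eye image copies the left-eye image, except that a central
    square is shifted `shift` pixels to the left (crossed disparity) and
    the strip uncovered by the shift is refilled with fresh random dots.
    """
    start = (size - square) // 2
    stop = start + square
    if not 0 < shift < square or start < shift:
        raise ValueError("shift must be positive, smaller than the square, and fit in the margin")
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    for r in range(start, stop):
        # Copy the central square into the right image, displaced to the left.
        for c in range(start, stop):
            right[r][c - shift] = left[r][c]
        # Refill the uncovered strip so that neither image reveals the square.
        for c in range(stop - shift, stop):
            right[r][c] = rng.randint(0, 1)
    return left, right


left_image, right_image = make_random_dot_stereogram()
```

Each image on its own is a field of random dots. With the shift direction assumed here, the central square would be expected to appear in front of the background when the pair is fused; reversing the shift should place it behind.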

Although the neurophysiological processes through which the brain derives depth information

from binocular disparity are outside the scope of this thesis, it is nevertheless important to note

the importance of vergence eye movements for the effectiveness of this cue. As mentioned, the

brain uses the horizontal disparity of objects on the retina to estimate their depth relative to the

fixation point. Through the use of vergence eye movements, the fixation point (defined as the

intersection of the lines of sight of the two eyes) changes, resulting in a corresponding shift in the

position of the horopter. By doing so, our visual system is able to increase the range within

which it is able to perceive depth through binocular disparity (Foley & Richards, 1972). In

addition, the brain is able to use the corresponding changes in ocular vergence as a depth cue in

its own right. Therefore, if it were possible to provide extra cues that facilitate the observer’s

ability to converge her eyes at different depths, it may be possible to use the feedback from

convergence to increase the accuracy of information obtained from binocular disparity.


Appendix E: List of Abbreviations

AR Augmented Reality

OST Optical See-Through

MWF Modified Weak Fusion

DS Dot Size

DD Dot Density

PSE Point of Subjective Equality

SDT Signal Detection Theory

TR Transparency Rating

EDVO Estimated Depth of Virtual Object

DR Difficulty Rating
