An Investigation of Using Random Dot Patterns to Achieve X-Ray Vision for Near-Field Applications of Stereoscopic
Video Based Augmented Reality Displays
by
Sanaz Ghasemi
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Mechanical and Industrial Engineering University of Toronto
© Copyright by Sanaz Ghasemi 2018
An Investigation of Using Random Dot Patterns to Achieve X-Ray
Vision for Near-Field Applications of Stereoscopic Video Based
Augmented Reality Displays
Sanaz Ghasemi
Doctor of Philosophy
Department of Mechanical and Industrial Engineering
University of Toronto
2018
Abstract
As one of the most interesting applications of Augmented Reality, ‘X-ray vision’ involves the
presentation of computer generated objects as if they lie behind a real object surface. Achieving
this notion presents several challenges, including perceptual ambiguity about the ordinal and
absolute depth of the virtual object relative to the real object’s surface and maintaining sufficient
information about both the virtual and real objects. This thesis investigates how random dots on an
object’s surface can facilitate seeing a virtual object as behind the surface with stereoscopic
displays.
Using a psychophysical method, Experiment 1 demonstrated the potential of this approach to
improve ordinal depth judgements. Experiment 2 investigated the effect of dot size and dot
density on transparency ratings of the real surface while preserving surface details. Paired
Comparison results revealed an advantage of the proposed method in comparison with the ‘no
pattern’ condition for the transparency ratings. Surface detail preservation was also descriptively
shown to decrease with increasing dot density and dot size.
Experiment 3 explored the impact of variations in image sharpness, dot size and dot density of
the random dot pattern, and the depth of the virtual object, on the accuracy and difficulty of
absolute depth judgements about the virtual object. Compared to the ‘no-pattern’ condition, the
random dot patterns improved the accuracy of depth judgements. In estimating the depth of the
virtual object when random dot patterns were used, no main effect of dot size was found.
However, interactions suggested higher dot densities lead to smaller errors. Moreover, subjective
difficulty ratings in performing depth judgements with sharper patterns may indicate that the
random dot patterns support the use of convergence as a beneficial depth cue. The implications
of these findings for the design of ‘X-ray’ displays for near-field (including medical)
stereoscopic AR are discussed.
Acknowledgments
First and foremost, I would like to express my gratitude to my PhD supervisor, Professor Paul
Milgram, who provided me with insightful guidance throughout my studies, gave me the courage
and taught me to stand up for my knowledge and beliefs as a researcher and encouraged me to
continue pursuing this research when I needed it. I am also particularly grateful for the
meticulous attention and valuable time he gave to developing my written and presentation skills.
What I learned from you will never be forgotten.
I would also like to sincerely thank Professor John Kennedy who genuinely cared about and
made valuable contributions to my research. I am grateful for the time and expertise that
Professor Justin Hollands provided me with. I would also like to acknowledge the insightful
comments and invaluable feedback I received from my external examiner, Professor Victoria
Interrante. Her PhD thesis, vast expert knowledge and kindness have served and will continue to serve
as an inspiration for me. I would also like to thank Professor Mark Chignell for serving as a
voting member of my final examination.
Finally, I would like to express my utmost gratitude to my parents who have provided me with a
life I am grateful for every single day. If I have accomplished anything in my life, it has been
because of you, your sacrifices, your encouragement and your wisdom. I would also like to
thank Nazanin, my sister, who has been my truest companion and the most reliable and loving
friend I know. You make life so much better. I am also thankful to Hooman Abbasian who
helped tremendously in guiding me through my data analysis.
Last but not least, I want to thank the love of my life, Nader Noroozi, who has been the
strongest, most supportive and most caring partner and the greatest source of motivation for me
through thick and thin during the past year. Although you only joined me
during the last year of my PhD journey, I am forever indebted to you for this accomplishment.
Table of Contents
Acknowledgments.......................................................................................................................... iv
Table of Contents .............................................................................................................................v
List of Tables ................................................................................................................................. ix
List of Figures ..................................................................................................................................x
List of Appendices ...................................................................................................................... xvii
Chapter 1 ..........................................................................................................................................1
Introduction & Overview ............................................................................................................1
Chapter 2 ..........................................................................................................................................7
Perceptual Background ...............................................................................................................7
2.1 Depth Cues ...........................................................................................................................7
2.1.1 Occlusion (Interposition) .........................................................................................8
2.1.2 Relative (Familiar) Size ...........................................................................................8
2.1.3 Accommodation .......................................................................................................9
2.1.4 (Binocular) Convergence .........................................................................................9
2.1.5 Binocular Disparity ..................................................................................................9
2.2 Integration of Depth Cues ..................................................................................................10
2.3 Surfaces ..............................................................................................................................13
2.4 Texture ...............................................................................................................................14
2.5 Transparency ......................................................................................................................14
Chapter 3 ........................................................................................................................................16
X-Ray Vision in AR: Literature Review ...................................................................................16
3.1 Challenges ..........................................................................................................................16
3.2 Review of Proposed Solutions ...........................................................................................20
3.2.1 Cutaway or Virtual Hole ........................................................................................20
3.2.2 Modified Opacity ...................................................................................................21
3.2.3 Context-preserving Techniques .............................................................................25
3.3 Criteria for Success ............................................................................................................26
Chapter 4 ........................................................................................................................................28
Our Method ...............................................................................................................................28
4.1 Use of Texture....................................................................................................................28
4.2 Stereo-Translucency ..........................................................................................................29
4.3 Information Preservation ...................................................................................................34
4.4 Computational Costs ..........................................................................................................34
4.5 Past Work ...........................................................................................................................35
Chapter 5 ........................................................................................................................................37
Experiments 1 and 2: Effect of Using Random Dot Patterns on Depth Order
Disambiguation, Perception of Transparency and Surface Information Preservation ..............37
5.1 Purpose ...............................................................................................................................38
5.2 Experimental Method.........................................................................................................38
5.2.1 Image Generation and Presentation .......................................................................39
5.2.2 Participants .............................................................................................................41
5.3 Experiment 1 ......................................................................................................................42
5.3.1 Objectives and Hypotheses ....................................................................................42
5.3.2 Procedure ...............................................................................................................43
5.3.3 Results and Discussion ..........................................................................................44
5.4 Experiment 2 ......................................................................................................................49
5.4.1 Objectives, Hypotheses and Procedure ..................................................................49
5.4.2 Results and Discussion ..........................................................................................57
5.5 Contributions, Limitations and Conclusions......................................................................60
Chapter 6 ........................................................................................................................................62
Experiment 3: Effect of Using Random Dot Patterns for Improving Accuracy of Depth
Judgements ................................................................................................................................62
6.1 Purposes .............................................................................................................................62
6.2 Experimental Method.........................................................................................................63
6.2.1 Image Generation and Presentation .......................................................................63
6.2.2 Participants .............................................................................................................72
6.2.3 Procedure ...............................................................................................................72
6.2.4 Depth Judgement Task ...........................................................................................74
6.3 Hypotheses .........................................................................................................................76
6.3.1 Estimated Depth of Virtual Object relative to real surface (EDVO) .....................77
6.3.2 Difficulty Rating of depth estimation task (DR)....................................................77
6.4 Results ................................................................................................................................78
6.4.1 Estimated Depth of Virtual Object (EDVO) ..........................................................79
6.4.2 Difficulty Rating of depth estimation task (DR)..................................................101
6.4.3 Correspondence between Average Absolute Errors in EDVO and DRs .............107
6.4.4 Responses to the Interview Questions .................................................................108
6.5 Discussion ........................................................................................................................111
6.5.1 Errors in EDVO ...................................................................................................111
6.5.2 DRs ......................................................................................................................113
6.5.3 Relationship between Average Absolute Errors in EDVO and DRs ...................115
6.5.4 Some Notes on the Responses to the Interview Questions ..................................115
6.6 Contributions and Limitations .........................................................................................117
Chapter 7 ......................................................................................................................................119
Conclusion ..............................................................................................................................119
7.1 Contributions....................................................................................................................119
7.2 Practical Implications.......................................................................................................120
7.3 Limitations and Suggested Improvements to Experiments .............................................121
7.3.1 Experiments 1 and 2.............................................................................................121
7.3.2 Experiment 3 ........................................................................................................122
7.4 Future Work .....................................................................................................................123
References ....................................................................................................................................125
Appendix A: Forms and Questionnaires ......................................................................................132
A1. Experiment 1 ....................................................................................................................132
A2. Experiment 2 ....................................................................................................................135
A3. Experiment 3 ....................................................................................................................139
Appendix B: Supplementary Material for Chapter 6 (Experiment 3)..........................................144
B.1. Summary of Insights Gained from Pilot Studies .............................................................144
B.2. Difficulty Rating of Depth Estimation Task ...................................................................145
B.3. Transcript of Interviews with Participants ......................................................................152
Appendix C: Enlarged Stereo Images ..........................................................................................162
C.1. Figure 1.3 .........................................................................................................................162
C.2. Figure 4.2 (a) ...................................................................................................................163
Appendix D: Depth Cues .............................................................................................................164
Object-centered Cues .......................................................................................................164
(Static) Observer-centered cues .......................................................................................167
Appendix E: List of Abbreviations ..............................................................................................169
List of Tables
Table 3.1: Summary of literature review on perceptual issues of X-ray vision in AR. ............... 19
Table 6.1: Contrast results for significant interaction effects between depth and pattern. The rows
are colour coded to aid in identification of patterns with the same dot size. ................................ 88
List of Figures
Figure 1.1: This simplified Reality-Virtuality continuum shows the various proportions with
which real (shown in blue) and virtual (shown in red) worlds can be combined to display
information. (Adapted from Milgram and Kishino, 1994). ............................................................ 2
Figure 1.2: Methods used by Schall et al. (a) (Schall et al., 2012), Lerotic et al. (b) (Lerotic et al.,
2007) and Mohr et al. (c) (Mohr et al., 2015), applying the X-ray vision notion to present internal
structures for different applications. Image (a): “AR view with superimposed enclosures and
base point of the building corner and a capping registered in 3D.” Reprinted by permission from
RightsLink: Springer, Personal and ubiquitous computing, Smart Vidente: advances in mobile
augmented reality for interactive visualization of underground infrastructure, Schall, G.,
Zollmann, S., & Reitmayr, G., Copyright 2012 by Springer. Image (b): “fused NPR AR with the
original video.” Reprinted by permission from RightsLink: Springer Berlin Heidelberg, Medical
Image Computing and Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-
photorealistic rendering for augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang,
G. Z., Copyright 2007 by Springer-Verlag Berlin Heidelberg. Image (c): AR view showing the
interior of a coffee machine to aid in maintenance procedures. Courtesy of Peter Mohr. ............. 3
Figure 1.3: Stereo pairs. The blue circle indicates a virtual object rendered behind the surface of
a real object (the face). In this case, although the occlusion cue suggests that the virtual object is
in front of the real surface, the addition of the random dot patterns is intended to aid the observer
in correctly perceiving the virtual object as being inside the person’s head. An enlarged
(landscape) version of this image is provided in Appendix C1 to help in perceiving the desired
percept. The observer’s left eye should find a rearward ring shifted to the right compared to the
nose, and the right eye should find it shifted to the left. To view the image in this figure (as well
as all other stereo pairs presented in this thesis) in stereo without the aid of any stereoscopic
viewing equipment, the reader is advised to free fuse the images, using the white squares at the
top as a fixation point. Depending on which method the reader finds easier, either a) cover the
right image and, while observing the left pair, allow your eyes to relax, as if looking into the
distance, until the two images fuse into one (parallel fusing); or b) cover the left image and,
while observing the right pair, cross (i.e. converge) your eyes until the two images fuse into one
(cross fusing). (Note that fusing this image is supposed to be difficult, as a consequence of the
cue conflict outlined above.) ........................................................................................................... 5
Figure 3.1: Example of virtual hole metaphor used by Rosenthal et al. (2002) for the task of
targeting needle biopsies in phantoms. The vertical lines are meant to show the sides of the
virtual hole. Reprinted from Medical Image Analysis, Vol. 6, Rosenthal et al., Augmented reality
guidance for needle biopsies: An initial randomized, controlled trial in phantoms, 313-320, 2002,
with permission from Elsevier. ..................................................................................................... 21
Figure 3.2: Example of the Modified Opacity visualization method used by Bichlmeier et al.
(2007). Reprinted by permission from IEEE 2007. ...................................................................... 22
Figure 3.3: The 7 evaluated visualizations studied by Sielhorst et al. (2006). Visualizations 2 and
3, corresponding respectively to surface rendering transparently superimposed and surface
rendering through a virtual window in the skin, were determined to be the best in terms of depth
perception and effectiveness. Reprinted by permission from RightsLink: Springer Berlin
Heidelberg, International Conference on Medical Image Computing and Computer-Assisted
Intervention, Depth perception–a major issue in medical AR: evaluation study by twenty
surgeons, Sielhorst, T., Bichlmeier, C., Heining, S. M., & Navab, N., Copyright 2006 by
Springer-Verlag Berlin Heidelberg. .............................................................................................. 24
Figure 3.4: A context-preserving visualization used by Lerotic et al. (2007). Reprinted by
permission from RightsLink: Springer Berlin Heidelberg, Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-photorealistic rendering for
augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang, G. Z., Copyright 2007 by
Springer-Verlag Berlin Heidelberg. .............................................................................................. 26
Figure 4.1: Stereo pairs. The blue circle indicates a virtual object rendered in front of the surface
of a real object (the face). In this case, the binocular disparity cue and the occlusion cue provide
consistent information, allowing the virtual object to be perceived unambiguously as being in
front of the person’s face. Note that the middle face shows less mouth than the left face and the
eyebrows are more extensive in the left face. ............................................................................... 29
Figure 4.2: In both sets of stereo pairs (a) and (b) (identical to Figure 1.3), the blue virtual circle
is stereoscopically rendered behind the face. In this case, the binocular disparity cue and the
occlusion cue provide inconsistent information, leading to a cue conflict: (a) Untreated image.
An enlarged (landscape) version of this image is provided in Appendix C2 to help in perceiving
the desired percept; (b) Addition of random dots onto the face (using a projector). If successful,
the reader should more easily perceive the virtual circle as being behind the face in (b), relative
to (a). ............................................................................................................................................. 31
Figure 4.3: Hypothesised percept when using a dot pattern as a means of surface manipulation.
The top portion of the image shows a magnified (2D) view of the real surface (skin), which has
been altered by adding a random dot pattern. The lower portion of the image shows the top view
of the observer as he/she may perceive the image if this percept is achieved. ............................. 34
Figure 4.4: Sample stimulus used by Otsuki and Milgram (2013). The blue circle indicates a
virtual object rendered beneath the depicted surface, which has been modified through the
addition of a pattern of random black dots. Reprinted by permission from IEEE 2013. .............. 35
Figure 5.1: Example of a stimulus stereo pair used in the experiment. The blue circle indicates a
virtual object rendered beneath a textured purple surface, which has been modified through the
addition of a pattern of random black dots. (The reader is referred to Figure 1.3 for instructions
on how to free fuse such stereo images.) ...................................................................................... 40
Figure 5.2: Stimuli used for Experiments 1 and 2. Only the 9 stimuli in the 40, 50 and 60%
columns were used in Experiment 1. All 12 stimuli were used in Experiment 2. .......................... 42
Figure 5.3: Psychophysical functions fitted to results of Experiment 1 for dot sizes of (a) 1/25,
(b) 1/50 and (c) 1/75. .................................................................................................................... 49
Figure 5.4: Samples of stereo pairs illustrating the shape matching task for assessment of surface
information. (a) The inner and outer yellow objects are both circles. (b) The inner yellow object
is an ellipse. (a) and (b) constitute the No Pattern condition. (c) Example of task with random dot
pattern present, and where inner yellow object is an ellipse. The orientation of the major axes of
the ellipses in (b) and (c) are 54º (corresponding to level 3) and 144º (corresponding to level 8),
respectively. .................................................................................................................................. 54
Figure 5.5: Options for designating the orientation of the major axis in ellipse conditions. This
image was provided as a guide for assisting participants in selecting their responses to the ellipse
axis orientation questions, in the form of numerals 0 to 10 on the computer keypad. ................. 54
Figure 5.6: Schematic illustration of hypotheses for both parts of Experiment 2. (a) effect of dot
density (H3); (b) effect of dot size (H4). ...................................................................................... 57
Figure 5.7: d’ and Transparency Rating (TR) results obtained from Experiment 2. The solid lines
join the d’ results, corresponding to the left hand axis, while the dashed lines join the
transparency ratings (TR), corresponding to the right hand axis. The yellow horizontal lines
correspond to the No Pattern condition. ....................................................................................... 58
Figure 5.8: Mean absolute offset errors as a function of dot size and dot density. The orange
horizontal line corresponds to the No Pattern condition. .............................................................. 59
Figure 6.1: Sample stereo pair of stimuli shown to participants. The pattern used in this example
consisted of random dots with sizes of 1/75 and distributed with 40% dot density. For guidance
on how to fuse these images, see explanation provided in caption of Figure 1.3. ........................ 64
Figure 6.2: Diagram presenting different parts of the stimulus. ................................................... 64
Figure 6.3: Stimuli with sharp random dot patterns used for Experiment 3. ................................ 66
Figure 6.4: Stimuli with blurry random dot patterns used for Experiment 3. ............................... 67
Figure 6.5: Front, side and top views of wireframe truncated cone (the virtual object) and
cylindrical bin (the real object). .................................................................................................... 68
Figure 6.6: Diagram showing real model used for generating virtual truncated cone. (It should be
noted that, as mentioned, this model consists of concentric circles. However, since the image
shows this model from the side, these circles appear in this figure as ellipses.) .......................... 69
Figure 6.7: Sequence of steps taken to generate virtual truncated cone. As expected, the
connecting rod between the two circles cannot be seen in these images, as it is perpendicular to
the line of sight.............................................................................................................................. 69
Figure 6.8: Schematic top view diagram of rod with circles placed at 6 different depths relative to
the surface of the bin. The black numbers noted on the rod are indicative of the proportion of the
truncated cone that was placed behind the bin’s surface. (The light blue lines joining the circles
represent the sides of the final virtual truncated cone.) ................................................................ 70
Figure 6.9: Schematic diagram showing the camera setup with respect to the bin (above), which
was replaced by the rod connecting the circles (below), which was placed along the red dashed
line (marking the surface of the bin’s location). ........................................................................... 71
Figure 6.10: Guide placed next to monitor for participants’ reference during experiment. ......... 74
Figure 6.11: Top view example of truncated cone’s position relative to the surface of the bin. In
this image ‘a’ and ‘b’ denote the distance of the larger and smaller circles of the truncated cone
relative to the bin’s surface, respectively. ..................................................................................... 76
Figure 6.12: Experimental hypotheses 6 and 7, illustrating expected changes in DR as a function
of dot size and dot density. ........................................................................................................... 78
Figure 6.13: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for various dot sizes and dot density of 20%: (a) Sharp
condition, (b) Blurry condition. The sizes and colours of the dots are proportional to the number
of occurrences at each point. Each column adds up to 75 trials (15 participants × 5 trials). A blue
trend line has been fitted to the data. The y=x and y=5 reference lines are also provided to show
perfect and chance performance, respectively. ............................................................................. 80
Figure 6.14: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for various dot sizes and dot density of 40%: (a) Sharp
condition, (b) Blurry condition. .................................................................................................... 81
Figure 6.15: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for various dot sizes and dot density of 60%: (a) Sharp
condition, (b) Blurry condition. .................................................................................................... 82
Figure 6.16: Scatterplot showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for the No Pattern condition. ......................................... 83
Figure 6.17: Plot showing the Point of Subjective Equality as a function of dot density. The PSE
for the No Pattern condition is shown for reference.
Figure 6.18: Average absolute error in perceived depth as a function of the virtual object’s actual
depth relative to the real surface for dot size = 1/25.
Figure 6.19: Average absolute error in perceived depth as a function of the virtual object’s actual
depth relative to the real surface for dot size = 1/50.
Figure 6.20: Average absolute error in perceived depth as a function of the virtual object’s actual
depth relative to the real surface for dot size = 1/75.
Figure 6.21: Average absolute error in perceived depth of virtual object as a function of its actual
depth depicting the interaction effect of blur and depth.
Figure 6.22: Average absolute error in perceived depth of virtual object as a function of dot
density depicting the interaction effect of blur and dot density.
Figure 6.23: Average absolute error in perceived depth of virtual object as a function of its actual
depth, depicting the interaction effect of depth and dot density for dot size=1/25.
Figure 6.24: Average absolute error in perceived depth of virtual object as a function of its actual
depth depicting the interaction effect of depth and dot density for dot size=1/50.
Figure 6.25: Average absolute error in perceived depth of virtual object as a function of its actual
depth depicting the interaction effect of depth and dot density for dot size=1/75.
Figure 6.26: Average absolute error in perceived depth of virtual object as a function of its actual
depth depicting the interaction effect of blur, depth and dot density for dot size=1/25.
Figure 6.27: Average absolute error in perceived depth of virtual object as a function of its actual
depth depicting the interaction effect of blur, depth and dot density for dot size=1/50.
Figure 6.28: Average absolute error in perceived depth of virtual object as a function of its actual
depth depicting the interaction effect of blur, depth and dot density for dot size=1/75.
Figure 6.29: Average difficulty ratings as a function of the virtual object’s actual depth relative
to the real surface for dot size = 1/25.
Figure 6.30: Average difficulty ratings as a function of the virtual object’s actual depth relative
to the real surface for dot size = 1/50.
Figure 6.31: Average difficulty ratings as a function of the virtual object’s actual depth relative
to the real surface for dot size = 1/75.
Figure 6.32: Effect of depth on DRs.
Figure 6.33: DRs as a function of dot size depicting the interaction effect of blur and dot size.
Figure 6.34: Scatterplot showing average absolute errors in perceived depth as a function of the
difficulty rating for all trials.
Figure B.1: Stereoscopic image showing the pyramid (=virtual object) placed halfway along
its length relative to the surface of the bin (=real surface). The apex in this image is pointed
towards the observer.
Figure B.2: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s
actual depth proportion for dot density 20% and various dot sizes: (a) Sharp condition, (b) Blurry
condition. The size and colour of the dots are proportional to the number of occurrences at each
point.
Figure B.3: Scatterplots showing the ‘Difficulty Level’ as a function of the virtual object’s actual
depth for various dot sizes and dot density of 40%: (a) Sharp condition, (b) Blurry condition.
Figure B.4: Scatterplots showing the ‘Difficulty Level’ as a function of the virtual object’s actual
depth for various dot sizes and dot density of 60%: (a) Sharp condition, (b) Blurry condition.
Figure B.5: Scatterplot showing the ‘Difficulty Level’ as a function of the virtual object’s actual
depth for the No Pattern condition.
List of Appendices
Appendix A: Forms and Questionnaires
A1. Experiment 1
A2. Experiment 2
A3. Experiment 3
Appendix B: Supplementary Material for Chapter 6 (Experiment 3)
B.1. Summary of Insights Gained from Pilot Studies
B.2. Difficulty Rating of Depth Estimation Task
B.3. Transcript of Interviews with Participants
Appendix C: Enlarged Stereo Images
C.1. Figure 1.3
C.2. Figure 4.2 (a)
Appendix D: Depth Cues
Object-centered Cues
(Static) Observer-centered cues
Appendix E: List of Abbreviations
Chapter 1
Introduction & Overview
The goal of this thesis is to show a tactic for improving stereovision that reveals objects behind a
surface. Three experiments will be reported – one on the potential of this approach to improve
ordinal depth judgements, one on achieving transparency of the surface while preserving surface
details, and one on the improvement of absolute depth judgements using this approach. To
provide an appropriate context for the understanding of these experiments, this chapter is focused
on preliminary background information and gives an overview of the chapters to follow.
Although its application areas are now expanding rapidly, Augmented Reality (AR) has
been around since the 1960s. The term “Augmented Reality” itself, however, was coined only 25
years ago, when Caudell and Mizell (1992) used it for training purposes in an industrial setting.
Since then, AR technologies have demonstrated success in a variety of medical, personal,
navigation, television, advertising and commerce, and gaming application domains (Schmalstieg
and Höllerer 2017). This advance can be attributed to the wide range of applications that can
benefit from the addition of computer-generated (virtual) elements to images of the real world.
Milgram and Kishino (1994) used the Reality-Virtuality continuum, as depicted in Figure 1.1, to
define AR displays, based on their definition: “AR displays are those in which the image is of a
primarily real environment, which is enhanced, or augmented, with computer-generated
imagery”. There are three primary methods to achieve visual AR (Schmalstieg and Höllerer,
2016):
1. Optical see-through (OST) displays allow the user to see the real world through an
optical combiner that is used to reflect the computer generated virtual objects onto the
user’s eyes (Rolland & Fuchs, 2000).
2. Video based displays1 combine the real and virtual objects electronically. In other
words, the image of the real world captured by a camera is displayed on a conventional
viewing device (such as a monitor) and the virtual elements are added onto the image
using a graphics processor.
3. Spatial Projection refers to cases where a light projector is used to project a virtual
image directly onto a real object.
Figure 1.1: This simplified Reality-Virtuality continuum shows the various proportions with
which real (shown in blue) and virtual (shown in red) worlds can be combined to display
information. (Adapted from Milgram and Kishino, 1994).
One of the most intriguing applications of AR is the notion of “X-ray vision,” denoting the
ability to virtually “see through” a real object’s surface to present information that is not
otherwise visible to the user (Livingston, Dey, Sandor, & Thomas, 2013). In contrast to most AR
applications, which involve superimposing computer generated images onto real objects, the
present context involves adding images beneath, or behind, the surface of real objects.
The ability to ‘see through’ a surface or have ‘X-ray vision’ has a wide range of applications in
various realms. For example, in civil engineering and for surveying purposes, visualizations can
be used to reveal hidden subsurface infrastructures, such as underground pipes (Schall, Zollmann,
& Reitmayr, 2012). In medical applications, preoperative ultrasound images can be overlaid onto
the organ to show its underlying anatomy (Lerotic, Chung, Mylonas & Yang, 2007). In industrial
settings, seeing through machines can help maintenance engineers and other workers perform
1 These displays are most commonly referred to as Video See-Through (VST) displays. However, to prevent
confusion with the concept of OST displays, where an optical element is actually looked through, we have refrained
from using this term.
various operations without the need to memorize manuals and documentation (Mohr et al.,
2015). Sample images of such applications are shown in Figure 1.2.
(a) (b)
(c)
Figure 1.2: Methods used by (a) Schall et al. (2012), (b) Lerotic et al. (2007) and
(c) Mohr et al. (2015), applying the X-ray vision notion to present internal
structures for different applications. Image (a): “AR view with superimposed enclosures and
base point of the building corner and a capping registered in 3D.” Reprinted by permission from
RightsLink: Springer, Personal and ubiquitous computing, Smart Vidente: advances in mobile
augmented reality for interactive visualization of underground infrastructure, Schall, G.,
Zollmann, S., & Reitmayr, G., Copyright 2012 by Springer. Image (b): “fused NPR AR with the
original video.” Reprinted by permission from RightsLink: Springer Berlin Heidelberg, Medical
Image Computing and Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-
photorealistic rendering for augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang,
G. Z., Copyright 2007 by Springer-Verlag Berlin Heidelberg. Image (c): AR view showing the
interior of a coffee machine to aid in maintenance procedures. Courtesy of Peter Mohr.
Regardless of the realm of application, achieving the metaphor of X-ray vision is difficult since it
is ‘unnatural’ in the real world. One of the major challenges involved is the potential perceptual
ambiguity caused by simply superimposing a hidden virtual object onto the image of a real
object surface2. The consequent blocking off of the real surface suggests to the observer that the
virtual object must be in front of the real surface, rather than behind it, thus contradicting the
notion of X-ray vision. Even with stereoscopic (3D) displays, simply rendering a virtual object at
the proper depth “correctly” behind a real object may nevertheless create the perception of a
floating virtual object in front of the surface of the real object (Drascic & Milgram, 1996;
Johnson, Edwards, & Hawkes, 2003). This is a consequence of the strength of the occlusion cue3
(Cutting & Vishton, 1995). Even when the ordinal depths of the virtual object and the real
surface are judged correctly, research has shown that the presence of a real surface in front of a
virtual object can lead to imprecise judgments of the absolute depth of the virtual object
(Edwards et al., 2004) and that the content of the real surface can reduce the distance within
which the virtual object can be placed from the real surface without leading to double vision
(Johnson et al., 2003).
To deal with the challenges involved in the simultaneous presentation of overlapping surfaces,
various researchers have suggested the addition of some sort of ‘texture’ to the real surface
(Interrante, Fuchs and Pizer, 1997; Zollmann, Kalkofen, Mendez and Reitmayr, 2010; Lerotic et
al., 2007; Avery, Sandor & Thomas, 2009). However, these methods either require precise
modelling of the real surface, are not applicable to cases where occlusion of the virtual object is
2 For the sake of clarity, in describing this method we use the term ‘surface’ to refer to the surface of a real object,
which has been captured by some kind of a sensor and has been reproduced in the image. The computer-generated
object, on the other hand, will be referred to as the virtual ‘object’.
3 For an explanation of this cue, see Chapter 2.
difficult to realize, or require the real object’s surface to possess salient features in order for the
algorithms to function effectively.
In this thesis, we propose another method of dealing with the perceptual challenges involved in
X-ray vision. With this method, which we are proposing be used for near-field applications of
AR, an artificial non-uniform texture is added to the surface of a real object. The key differences
of our approach from others’ are that: (a) our texture involves randomly distributed (black) dots
(similar to those used in random dot stereograms); (b) the only depth cues that are present are the
occlusion and binocular disparity cues (which limits the application of this method to stereoscopic
displays only); and (c) the occlusion cue is not consistent with the binocular disparity cue (the
virtual object occludes the real surface). An example of the application of this method is
provided in Figure 1.3.
Figure 1.3: Stereo pairs. The blue circle indicates a virtual object rendered behind the surface of
a real object (the face). In this case, although the occlusion cue suggests that the virtual object is
in front of the real surface, the addition of the random dot patterns is intended to aid the observer
in correctly perceiving the virtual object as being inside the person’s head. An enlarged
(landscape) version of this image is provided in Appendix C.1 to help in achieving the desired
percept. The observer’s left eye should find a rearward ring shifted to the right compared to the
nose, and the right eye should find it shifted to the left. To view the image in this figure (as well
as all other stereo pairs presented in this thesis) in stereo without the aid of any stereoscopic
viewing equipment, the reader is advised to free fuse the images, using the white squares at the
top as a fixation point. Depending on which method the reader finds easier, either a) cover the
right image and, while observing the left pair, allow your eyes to relax, as if looking into the
distance, until the two images fuse into one (parallel fusing); or b) cover the left image and,
while observing the right pair, cross (i.e. converge) your eyes until the two images fuse into one
(cross fusing). (Note that fusing this image is supposed to be difficult, as a consequence of the
cue conflict outlined above.)
It was hypothesized that adding these random dot patterns would improve the observer’s ability
to perceive the correct depth of the virtual object, both relatively and absolutely. To investigate
and optimize the effect on depth perception, the dot sizes and dot densities were varied
throughout the experiments conducted.
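As a rough illustration of the two parameters varied here, the following Python sketch overlays black square dots on a grayscale image. It is not the procedure used in the experiments: the function name `add_random_dots`, the reading of dot size as a fraction of the image width, and the reading of dot density as the probability that a grid cell contains a dot are all assumptions made for this example.

```python
import random


def add_random_dots(image, dot_fraction, density, seed=0):
    """Return a copy of `image` with black square dots placed at random.

    image        -- list of rows of grayscale values (1.0 = white, 0.0 = black)
    dot_fraction -- dot side as a fraction of image width (assumed meaning)
    density      -- probability that a grid cell holds a dot (assumed meaning)
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    side = max(1, round(width * dot_fraction))  # dot side length in pixels
    out = [row[:] for row in image]             # leave the input untouched

    # Walk over a grid of side-by-side cells and blacken a random subset.
    for top in range(0, height - side + 1, side):
        for left in range(0, width - side + 1, side):
            if rng.random() < density:
                for y in range(top, top + side):
                    for x in range(left, left + side):
                        out[y][x] = 0.0
    return out


# Hypothetical usage: a white 100 x 100 surface, dot size 1/25, density 40%.
surface = [[1.0] * 100 for _ in range(100)]
dotted = add_random_dots(surface, 1 / 25, 0.4, seed=1)
```

Under these assumptions, the fraction of blackened pixels approaches the chosen density as the number of grid cells grows, which makes the density parameter easy to check numerically.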
While various methods have been shown to be effective for improving depth perception for
applications of X-ray vision with OST displays, there is a paucity of literature that focuses on
this issue with video-based AR displays. Moreover, the anticipated consumer applications of X-
ray vision in AR have led to most literature being focused on the challenges involved in
achieving X-ray vision for medium and far-field distances. In this project, I have designed and
conducted experiments that demonstrate the effectiveness of adding random dot patterns in
improving depth perception for near-field applications of video-based AR displays.
In the next chapter, the perceptual background and terminology necessary to understand the idea
behind this research are presented. Chapter 3 provides an overview of existing solutions that aim
to deal with challenges of X-ray vision in AR, followed by a discussion of the details of our idea
presented in Chapter 4. Chapters 5 and 6 provide the details of the three experiments conducted
to evaluate the proposed concept. The final chapter summarizes the contributions and limitations
of this research.
Chapter 2
Perceptual Background4
The effectiveness of any visualization technique heavily depends on the ability to understand and
exploit the way in which our visual system extracts and integrates information from the real
world through perception. It is therefore useful to sketch some fundamentals of perception as
they relate to the topic of this thesis before moving on to methods used in achieving X-ray vision
for AR applications (Chapter 3). In this chapter, we first discuss the basis of perceiving depth
(Wickens, Hollands, Banbury, & Parasuraman, 2000) as it relates to the topic of this thesis and
then move on to providing definitions of specific terms used in this context.
2.1 Depth Cues
To estimate the depth of objects, our visual system relies on various sources of depth
information, which are defined and categorized as depth cues. While some of these cues provide
information about the ordinal or relative depth of objects (e.g. which of two objects is closer), others
provide absolute5 depth information, which allows an observer to ascertain the magnitude of a
distance (e.g. in meters).
In addition to providing different types of information, the relative ‘strengths’ of depth cues also
vary at different distances (Cutting and Vishton, 1995). Therefore, to understand how various
depth cues are used to perceive the 3D layout of our environment, Cutting and Vishton (1995)
divided the continuum of depth into three regions: personal space, action space and vista space.
These terms are also commonly referred to as near-field, medium-field and far-field distances,
respectively (Livingston et al., 2013). For example, for near-field distances, some of the most
effective depth cues are: occlusion, relative size, accommodation, convergence and binocular
4 Note: The reader should approach this chapter as a rather superficial review of perceptual literature as it pertains to
the goal and focus of this thesis. This chapter is meant only to provide the background necessary for understanding
the theory behind the method used in this investigation.
5 In some literature, ordinal and absolute depth are also referred to as ‘relative’ and ‘metric’ depth, respectively.
However, to avoid ambiguity, I have excluded these terms from the discussion and use only
‘ordinal’ and ‘absolute’ as they pertain to this study.
disparity. Since the method investigated in this thesis is meant primarily for near-field
applications, only these depth cues are discussed in detail below. For a broader overview of
depth cues, the reader is referred to Appendix D.
2.1.1 Occlusion (Interposition)
Foreground-background occlusion occurs if an object intervenes between a vantage point and
another object. Both objects may project into the optic array at a vantage point. The front of the
foreground (or ‘occluder’) projects to the vantage point, and if it is opaque, either none or only
part of the other object can project to the vantage point. In this case, either the whole object or
the other part of it is hidden – ‘occluded’. In cases where the foreground is transparent, the
background object can either partially or completely project to the vantage point, with optic
arrays passing through the foreground’s surface. There are many kinds of optical information for
occlusion. Research on optical features encouraging the appearance of occlusion continues to
this day (Gillam and Grove, 2011; Kennedy, 1974; Peterson, 2015).
It is widely believed that occlusion is the most powerful depth cue at all distances where visual
perception holds. The reason for this is that our world is populated mostly by solid objects that
are opaque. However, transparent or translucent objects are also encountered regularly and can
be easily incorporated into our understanding.
In the context of X-ray vision applications of AR, various researchers have used the occlusion
cue by having features of the real surface occlude the virtual object, thus allowing the observer
to easily perceive the virtual object as being behind the real object (Lerotic et al., 2007; Avery et
al., 2009; Sandor, Cunningham, Dey & Mattila, 2010).
2.1.2 Relative (Familiar) Size
As objects move farther away, their projected sizes become smaller. Therefore, if an object is
recognized or if the absolute size of a depicted object is known, one can infer its distance from
its apparent size using the size-distance invariance hypothesis (Kilpatrick and Ittelson, 1953).
Additionally, if one knows the relative sizes of multiple different objects, then their ordinal
proximity can be inferred from their relative apparent sizes in the visual field. Thus, the
important point about this cue is that it is a relative cue. In other words, a basis for comparison
must exist, either from the scene or from the observer’s experience.
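The geometry behind this cue can be sketched in a few lines of Python. The example below assumes a simple pinhole-style relation between an object's physical size, its distance and the visual angle it subtends; the function names are hypothetical and the numbers are illustrative only.

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


def distance_from_known_size(size_m, angle_deg):
    """Distance (metres) implied by a known size and an observed visual angle."""
    return size_m / (2 * math.tan(math.radians(angle_deg) / 2))


# Illustrative values: a 10 cm object viewed at 0.5 m and at 1.0 m.
near_angle = visual_angle_deg(0.10, 0.5)
far_angle = visual_angle_deg(0.10, 1.0)

# If the size is known, the distance can be recovered from the angle alone.
recovered = distance_from_known_size(0.10, near_angle)
```

The farther object subtends the smaller angle, and inverting the relation recovers the distance only when the physical size is known, which is why a basis for comparison is required.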
In the context of perceptual experiments, if the experimenter aims to prevent participants from
using this cue, it is essential that objects either be presented in the same size regardless of their
distance, or that size variations be independent of the object’s distance.
2.1.3 Accommodation
To bring images into focus on the retina, the curvature of the lenses of the eye requires
adjustment. This adjustment is referred to as accommodation. Closer objects require more
adjustment and, thus, sensing the amount of this adjustment might help in determining the
absolute depth of nearby objects.
Although static focus distances may not provide much information, changes in focus are what
make this depth cue effective. Moreover, this cue is generally described as a monocular depth
cue since it does not require the involvement of both eyes.
2.1.4 (Binocular) Convergence
The amount of inward turning of the eyes when a focal point is fixated determines the degree of
‘convergence,’ and thus sensing the extent of this inward turning can help in determining the
distance of an object. This cue is used to provide absolute depth information for nearby objects.
2.1.5 Binocular Disparity
The ability to perceive a scene from two eyes that are separated by an interpupillary distance
provides (95% of) humans with one of the most important and perceptually acute sources of
depth information (Coutant & Westheimer, 1993).
When a scene is viewed, the fixation point (also referred to as the focal point) will fall on a
particular location on the retina of each eye, resulting in zero disparity. One can furthermore
envisage an imaginary geometric arc called the horopter, comprising all points in space, including
the focal point, that also have zero retinal disparity. Other points that are closer to or farther from
this arc are mapped onto disparate locations on the two retinas, which are nevertheless fused into
a single image in depth. The horopter thus provides a reference plane from which the ordinal
depth of other objects can be judged. Objects that are in front of the horopter (closer to the
observer) will result in fused images with crossed disparity, whereas objects that are behind the
horopter (farther from the observer) will result in fused images with uncrossed disparity. Based
on the amount of retinal disparity in the projection of each point to each eye, the visual system is
thus able to discern the ordinal depths between two points in space via the binocular disparity
depth cue (Patterson, 2009).
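A minimal numerical sketch of the convergence and disparity geometry described above is given below. It assumes points lying straight ahead of the observer, an interpupillary distance of 63 mm, and a sign convention in which positive relative disparity means crossed; these values, the convention and the function names are assumptions for illustration rather than quantities taken from this thesis.

```python
import math

IPD_M = 0.063  # assumed interpupillary distance in metres


def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))


def relative_disparity_deg(object_m, fixation_m, ipd_m=IPD_M):
    """Disparity of an object relative to the fixation point (assumed sign: + = crossed)."""
    return vergence_angle_deg(object_m, ipd_m) - vergence_angle_deg(fixation_m, ipd_m)


def disparity_type(object_m, fixation_m, ipd_m=IPD_M):
    """Classify an object as having crossed, uncrossed or zero disparity."""
    disparity = relative_disparity_deg(object_m, fixation_m, ipd_m)
    if disparity > 0:
        return "crossed"    # object nearer than the fixation point
    if disparity < 0:
        return "uncrossed"  # object farther than the fixation point
    return "zero"           # object at the fixation distance


# Illustrative near-field case: fixating a surface at 0.5 m.
in_front = disparity_type(0.4, 0.5)
behind = disparity_type(0.6, 0.5)
```

Because the vergence angle changes most steeply at short distances, the same depth separation produces a larger disparity in the near field than farther away, which is consistent with these cues being most effective at near-field distances.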
The importance of binocular disparity in perceiving depth was first shown through the invention
of the stereoscope by Wheatstone (1838), where a pair of flat drawings were used to achieve a
three-dimensional percept of an object. Later, in 1960, by introducing the concept of random dot
stereograms, Julesz (1971) made a significant contribution to the science behind stereo vision. A
typical example of a random dot stereogram is one in which two images consist of identical
randomly distributed dots, except that a central square region in one image is shifted horizontally by a small
distance relative to the other image. When viewed individually, each image appears as a flat field
of random dots. However, when viewed stereoscopically, the central square region appears at a
depth that is different from the background plane of random dots. Random dot stereograms
provide evidence that binocular depth perception can be achieved without the need for
monocular form recognition.
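The construction of such a stereogram can be sketched as follows. This is a generic illustration of the idea, not code from this thesis; the image size, square size, shift and dot density are arbitrary choices, and the function name is hypothetical.

```python
import random


def random_dot_stereogram(size=64, square=24, shift=2, density=0.5, seed=0):
    """Build a left/right pair of binary dot images (1 = dot, 0 = blank).

    The right image copies the left one, except that a central square is
    displaced `shift` pixels to the left and the uncovered strip is refilled
    with fresh random dots.
    """
    start = (size - square) // 2
    end = start + square
    if shift < 0 or shift > start:
        raise ValueError("shift must keep the square inside the image")

    rng = random.Random(seed)
    left = [[1 if rng.random() < density else 0 for _ in range(size)]
            for _ in range(size)]
    right = [row[:] for row in left]

    for y in range(start, end):
        # Displace the central square horizontally in the right image.
        for x in range(start, end):
            right[y][x - shift] = left[y][x]
        # Refill the strip exposed by the displacement with new random dots.
        for x in range(end - shift, end):
            right[y][x] = 1 if rng.random() < density else 0
    return left, right


left_image, right_image = random_dot_stereogram()
```

Neither image contains a visible square on its own, since both are fields of random dots. Whether the square appears nearer or farther than the background when the pair is fused depends on the direction of the shift and on the viewing method (parallel or cross fusing).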
Although the neurophysiological processes through which the brain derives depth information
from binocular disparity are outside the scope of this thesis, it is nevertheless important to note
the importance of vergence eye movements for the effectiveness of this cue. As mentioned, the
brain uses the horizontal disparity of objects on the retina to estimate their depth relative to the
fixation point. Through the use of vergence eye movements, the fixation point (defined as the
intersection of the line of sight of the two eyes) changes, resulting in a corresponding shift in the
position of the horopter. By doing so, our visual system is able to increase the range within
which it is able to perceive depth through binocular disparity (Foley & Richards, 1972). In
addition, the brain is able to use the corresponding changes in ocular vergence as a depth cue in
its own right. Therefore, if it were possible to provide extra cues that facilitate the observer’s
ability to converge her eyes at different depths, it may be possible to use the feedback from
convergence to increase the accuracy of information obtained from binocular disparity.
2.2 Integration of Depth Cues
In natural environments, multiple depth cues typically provide both consistent and
complementary information. However, in specific cases and especially with the use of visual
displays (due to the technological limitations of implementing various depth cues), cue conflicts
do arise. In other words, two or more sources of depth information can in some cases provide
inconsistent and/or discrepant information about depth. An example of this was shown in Figure
1.3, where the occlusion cue suggests that the virtual object is in front of the person’s face while
the binocular disparity cue displays the virtual object as being inside the person’s head. Whether
and how consistent and inconsistent cues interact with each other to provide a single
depth map or shape estimate to the observer has been the topic of much research (e.g., Johnston,
Cumming & Parker, 1993; Young, Landy & Maloney, 1993; Landy, Maloney, Johnston &
Young, 1995; Kennedy, Juricevic and Bai, 2003; Wismeijer, Erkelens, van Ee & Wexler, 2010).
For example, as discussed, Cutting and Vishton (1995) divided the continuum of depth into three
regions and defined relative ‘strengths’ for each depth cue. However, while the division of this
continuum has served as a valuable and useful tool in many aspects, the complexities involved in
the interaction of depth cues led to the development of more complicated models.
One model of cue interaction, suggested by Johnston et al. (1993), is referred to as ‘weak fusion,’
or ‘weighted linear combination’. In this model, the so-called ‘weak observer’ processes the
information provided by each depth cue separately and then averages the separate depth
estimates (from each cue) by using different weights for each. The weighting of each cue
depends on its estimated reliability under the circumstances.
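The weighted linear combination can be made concrete with a short sketch. The example below assumes that each cue's weight is its reliability divided by the sum of all reliabilities, which is one common way of normalising the weights but is not specified in the description above; the cue names, depth values and reliabilities are hypothetical.

```python
def weak_fusion(estimates, reliabilities):
    """Combine per-cue depth estimates by a reliability-weighted average.

    estimates     -- dict mapping cue name to its separate depth estimate
    reliabilities -- dict mapping cue name to a non-negative reliability
    Returns the fused estimate and the normalised weights (assumed scheme).
    """
    if set(estimates) != set(reliabilities):
        raise ValueError("every cue needs both an estimate and a reliability")
    total = sum(reliabilities.values())
    if total <= 0:
        raise ValueError("at least one cue must have positive reliability")

    # Normalise so that the weights sum to one.
    weights = {cue: r / total for cue, r in reliabilities.items()}
    fused = sum(weights[cue] * estimates[cue] for cue in estimates)
    return fused, weights


# Hypothetical depth estimates (cm behind a surface) from two cues.
estimates = {"binocular_disparity": 3.0, "relative_size": 5.0}
reliabilities = {"binocular_disparity": 3.0, "relative_size": 1.0}
fused_depth, cue_weights = weak_fusion(estimates, reliabilities)
```

With these illustrative numbers the more reliable cue receives a weight of 0.75, so the fused estimate of 3.5 cm lies closer to that cue's estimate than to the other's.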
An alternative to the weak fusion model is ‘strong fusion’, which involves the cooperation of
depth cues prior to obtaining depth estimates. In other words, in contrast to the weak fusion
model, the depth cues are not processed separately; rather they interact and provide the ‘strong
observer’ with the most probable three-dimensional interpretation of the scene. Examples of this
include ‘promotion’ and ‘disambiguation’. In the former case, one cue provides compensating
information for another incomplete depth cue. In the latter, depth information provided from an
inherently ambiguous cue (e.g. kinetic depth) is disambiguated by another depth cue (Johnston et
al., 1993). Based on Landy et al. (1995), models that are focused on modularity tend toward the
weak side whereas those that suggest more holistic interactions amongst cues tend toward the
strong side.
In the same paper, Landy et al. (1995) introduce the ‘modified weak fusion’ (MWF) model,
based on which interactions between different cues result in two types of information for each
cue: a commensurate depth map and an estimated measure of the cue’s reliability (which are
both based on a combination of information provided by the cue itself and those provided by
other cues). These estimates provide inputs to the final fusion (or weighted averaging) stage,
where the weights of each cue take the estimated reliabilities and the discrepancies between cues
into account. In other words, the MWF model can be simplified to the weak fusion model and
provides a means of constraining the strong fusion model to one that is able to be tested.
On the other hand, when the conflict between depth cues is large, some researchers have
suggested that usually one of two processes occurs: cue switching or cue dominance (also referred
to as ‘vetoing’) (Wismeijer et al., 2010). In the former, the visual system switches in time
between various depth percepts based on the information available from individual depth cues
(van Ee, van Dam & Erkelens, 2002). In the latter, one cue (usually the most reliable one)
overrides the other cue and depth judgements are made based on that single cue (Bülthoff &
Mallot, 1988; van Ee, Adams & Mamassian, 2003).
Regardless of which model is used to describe the final percept that results from perceiving an
image such as the one depicted in Figure 1.3, an important implication of the discussed models is
that by creating conditions that result in changes in either cue reliability, cue availability or cue
inconsistency, it may be possible to aid observers in making more accurate depth judgements6. In
this case, with regard to the three possible models discussed above (MWF, cue switching and
cue dominance), we may be able to either:
• Increase the weighting of the cue that suggests the correct depth and reduce the weighting
of the cue that suggests the incorrect depth (if the MWF model applies), or;
• Increase the frequency with which an observer uses the cue that suggests the correct
depth (if cue switching occurs), or;
• Aid the observer in completely ignoring the cue that suggests the incorrect depth and
guide the observer towards using the cue that suggests the correct depth (if cue dominance
occurs).
6 In cases where the conflict of depth cues is created artificially (with the use of visual displays), an accurate depth
judgement may actually be the ‘desired’ depth percept/judgement.
Either way, creating such conditions allows us to achieve our desired depth percept. This idea is
the key to the theory behind the investigated method in this thesis and will be returned to
throughout the coming chapters.
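To make the first of the options listed above concrete, the following minimal sketch assumes a simple reliability-weighted average in which each cue's weight is proportional to the inverse of its variance. This formalisation, the function name `combine_cues`, and all numerical values are illustrative assumptions; they are not taken from the cited models or from the experiments reported in this thesis.

```python
def combine_cues(estimates, variances):
    """Fuse depth estimates using reliability-based weights.

    Assumption: the reliability of a cue is modelled as 1 / variance, and
    weights are the normalised reliabilities. This is one common way to
    formalise weighted cue combination, used here for illustration only.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    fused = sum(w * d for w, d in zip(weights, estimates))
    return fused, weights


# Hypothetical depths in mm relative to the real surface (negative = behind).
disparity_depth = -20.0  # binocular disparity: virtual object is behind
occlusion_depth = 5.0    # occlusion: virtual object is in front

# Equally reliable cues: the fused estimate lies midway (about -7.5 mm).
fused_equal, _ = combine_cues([disparity_depth, occlusion_depth], [4.0, 4.0])

# Disparity made more reliable (smaller variance): the fused estimate
# moves towards the disparity-specified depth (about -15 mm).
fused_boosted, weights = combine_cues(
    [disparity_depth, occlusion_depth], [1.0, 4.0]
)

print(fused_equal, fused_boosted, weights)
```

Under these assumptions, lowering the variance of the cue that signals the correct depth is enough to pull the combined estimate towards that depth, without removing the conflicting cue.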
While the issues involved in perception of depth can be discussed in much more detail, I will
now change the focus towards defining terms that are most relevant to the topic of my study.
More in-depth information about depth cues and their interactions can be found in the cited
publications as well as those not directly cited: e.g. Wickens et al. (2000), Bruce, Green &
Georgeson (2003), Parker et al. (1992), Interrante (1996).
2.3 Surfaces
Since the real object’s surface plays an important role in this study, it is important to define what
a surface is. In essence, a surface is where two volumes meet. In the case of our study, the real
object meets the air at its surface. According to Kennedy and Wnuczko (2015), “a surface is
a polarized plane, that is, different on its two sides”. Generally, at any point on a non-planar and
non-spherical 3D surface there will be a unique direction in which the surface curves most
strongly, and that direction is referred to as the first principal direction. The orthogonal direction
at that point, the second principal direction, will be either the direction in which the surface
curves most strongly in the opposite sense (if the surface is locally saddle-shaped) or the
direction in which the surface is
most flat (if the surface is locally cylindrical or locally elliptical). In other words, apart from the
special cases of a (flat) plane (zero curvature in any direction) and a sphere (equal curvature in
all directions), there are only five generic categories of surface shapes, defined by the signs of
the curvatures in the first and second principal directions (same sign = elliptical; both positive = convex; both
negative = concave; one positive/one negative = hyperbolic; and one zero/the other non-zero =
cylindrical). For the purposes of this thesis, we will categorize surfaces simply as flat or non-flat.
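The categorization above can be sketched as a small function over the two principal curvatures. The function name `classify_surface`, the tolerance `tol`, and the sign convention (positive curvature taken to mean convex) are assumptions made for illustration only.

```python
def classify_surface(k1, k2, tol=1e-9):
    """Classify local surface shape from the two principal curvatures.

    Assumed sign convention: positive curvature = convex, negative = concave.
    Values with magnitude at or below `tol` are treated as zero.
    """
    zero1 = abs(k1) <= tol
    zero2 = abs(k2) <= tol
    if zero1 and zero2:
        return "planar"                 # zero curvature in any direction
    if zero1 or zero2:
        return "cylindrical"            # one zero, the other non-zero
    if abs(k1 - k2) <= tol:
        return "spherical"              # equal curvature in all directions
    if k1 > 0 and k2 > 0:
        return "elliptical (convex)"    # same sign, both positive
    if k1 < 0 and k2 < 0:
        return "elliptical (concave)"   # same sign, both negative
    return "hyperbolic"                 # opposite signs (saddle-shaped)


def is_flat(k1, k2, tol=1e-9):
    """Simplified flat / non-flat split used for the rest of the discussion."""
    return classify_surface(k1, k2, tol) == "planar"


print(classify_surface(0.5, -0.2), is_flat(0.0, 0.0))
```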
The limits to a surface are defined by its edges (Kennedy and Wnuczko, 2015). For example, in
the case of a cylinder placed against a wall, the cylinder’s edges can be observed because
wherever the cylinder’s surface ends, an occluding boundary is formed by the cylinder’s surface,
taken with respect to a vantage point. To one side of the boundary is the cylinder’s surface and to
the other side is the wall, as far as the vantage point is concerned. This occluding boundary projects
a straight line in the optic array at the vantage point.
In some ways, our vision can be considered as “superficial” since, when we look at objects
around us, what is almost always perceived is the front ‘surface’ of opaque objects (Kennedy and
Wnuczko, 2015). It is for this reason that, throughout this thesis, the ‘real object’ and the ‘real
object’s surface’ will be used interchangeably. For the sake of simplicity, it might also be
referred to as the ‘real surface’.
2.4 Texture
Gibson (1950) listed “the quality of being visually resistant or ‘hard’ ” as one of the most
essential properties of a surface. He equated this property to having ‘texture’ which can be
perceived when inhomogeneous retinal stimulation occurs. ‘Texture’ can also be thought of as
what is provided by the material underlying a surface (e.g. granite or wood). In addition, the
optical projection of the texture depends on the structure or shape formed by the material (e.g. a
ball or an oar), as well as the structure of the surface as bumps and hollows.
As it pertains to the research focus of this thesis, I have placed surfaces into three categories:
containing no visible texture, containing 2D (textural) elements, or containing 3D (textural)
elements. While a smooth surface can belong to one of the first two groups, a surface containing
bumps or ridges will belong to the last.
With regards to 2D textures, Rao and Lohse (1993) used a combination of statistical techniques
to identify the features that can be used to characterize different texture patterns. Their study
revealed two major dimensions that they subjectively interpreted as “periodicity vs. irregularity”
(periodic meaning that the statistical properties of local patches of the surface are uniform over
the surface) and “directionality vs. nondirectionality” (nondirectional meaning that the surface
elements have no orientation bias). These two dimensions accounted for 90% of the variability in
their subjects’ classification of 30 pictures from Brodatz’s album (Brodatz, 1966). They also
found a third dimension characterized as representing “structural complexity” which accounted
for another 6% of the variability.
2.5 Transparency
Referring to a summary provided by Tsirlin, Allison and Wilcox (2008), one can consider
transparency to have three different primary manifestations:
a) Glass-transparency, which is essentially what is observed when light passes through clear
materials such as glass;
b) Translucency, which is what occurs when light is diffused as it passes through a material
and causes objects to appear less clear on the other side; and
c) Pseudo-transparency, which is the result of light passing through gaps in non-transparent
objects, such as lace or wire fences (see footnote 7). Based on this concept, Julesz (1971) further defined
Stereo-Transparency as Pseudo-Transparency that is perceived in surfaces defined solely
by binocular disparity.
The common theme that ties these three groups together is the fact that an object that is known to
possess some form of transparency can not only be seen but can also be seen through. What
allows for such an odd co-existence, as asserted by Interrante (1996), is our perceptual and
cognitive ability to reconstruct a continuous representation of an opaque object and a transparent
object at any given location on the transparent surface as seen from the vantage point.
In terms of the present research topic, it can be concluded that if we were able to convey the
existence of an object behind another object’s surface while preserving sufficient information
about the two objects, we may be able to create the impression of ‘transparency’ of the closer
object’s surface.
Having provided this brief perceptual background and definitions of terms, we can now move on
to the next chapter, which provides an overview of existing solutions that aim to deal with the
challenges of X-ray vision in AR.
7 In fact, “transparency” in computer graphics is actually almost always pseudo-transparency, where an “additive”
mathematical model is used to suggest the presence of little holes on the surface of the occluder, through which the
occluded object is presented. The mathematical model used to realise this effect incorporates a linear combination of
the intensities $I_f$ and $I_b$ of the occluding (foreground) and occluded (background) surfaces respectively, weighted by
the relative concentration of opaque material in the occluder: $I = \alpha I_f + (1 - \alpha) I_b$ (Interrante, 1996).
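As a worked illustration of the linear combination described in this footnote, the sketch below blends a single foreground intensity with a single background intensity. The function name `blend` and the example intensities are hypothetical; only the formula itself comes from the text.

```python
def blend(i_f, i_b, alpha):
    """Additive transparency model: alpha * foreground + (1 - alpha) * background.

    alpha is the relative concentration of opaque material in the occluder
    (0 = fully see-through, 1 = fully opaque).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * i_f + (1.0 - alpha) * i_b


# Hypothetical 8-bit intensities for the occluder and the occluded surface.
occluder, occluded = 200.0, 100.0
for alpha in (0.0, 0.25, 1.0):
    print(alpha, blend(occluder, occluded, alpha))
```

With alpha at 0 only the occluded surface contributes, and with alpha at 1 only the occluder does; intermediate values mix the two in proportion.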
Chapter 3
X-Ray Vision in AR: Literature Review
As discussed in Chapter 1, one of the prominent applications of AR is X-ray vision, which
allows the observer to ‘see through’ a real surface and perceive images of the structures beneath
the surface. To achieve this, images of what is beneath the surface are usually graphically
combined with the image of the real surface (see footnote 8). Achieving the metaphor of X-ray vision with AR
displays involves several challenges. This chapter
provides a review of the literature related to challenges involved in the use of AR for X-ray
vision applications, based upon which an overview of existing solutions and gaps will be
presented.
3.1 Challenges
As mentioned, to achieve X-ray vision in AR, virtual objects that are placed behind real objects
are superimposed (see footnote 9) onto an image of the real object. If the observer is able to see through the real
object’s surface and perceive the virtual object, based on the definition provided for
‘transparency’, the observer will perceive the real object to be transparent. For this to happen,
two requirements must be met:
1. The observer must be able to perceive the correct depth order between the virtual and the
real object:
• Johnson et al. (2003) used an OST display for overlaying preoperative MRI/CT
scans onto the patient in surgery and reported that surgeons would sometimes
perceive the virtual image (scans) as floating above the surface, even though they
were rendered in depth using a stereoscopic display to be behind the body.
8 These internal structures that are ‘added onto’ the image of the scene (which consists of the real object) may be
images either that were previously recorded or that are entirely computer-generated. For simplicity, in either of these
cases, the structure of what is behind the real surface will be referred to as the virtual object.
9 Although superimposing is the operation that is actually being carried out when the virtual objects are being added
to the real image, in light of our goal of having the virtual objects appear to be behind the real ones, a better term to
use would arguably be ‘subposing’ or ‘subposition’.
2. In the context of stereoscopic displays, the observer must be able to fuse the virtual
objects and real objects simultaneously:
• Using the concept of ‘stereo-transparency’ (see footnote 10), Akerstrom and Todd (1988)
investigated the challenges involved in perceiving transparent surfaces. The
results from their experiments revealed that perception of overlapping transparent
surfaces is more difficult and requires more time than non-transparent surfaces.
They also found that the perception of stereo transparency becomes more difficult
when the distance between the overlapping planes and/or the density of elements
on the planes is increased.
• Johnson et al. (2003) asked participants to look through real stereo images of a
smooth skull, comprising the eye sockets and nasal bone, a brain in the skull and
natural foliage, to see a random dot target rendered behind the surface. The target
was initially presented at a large distance from the real surface, resulting in double
vision. The participants were then asked to move the target closer to the surface
until fusion was possible. This distance was recorded as the maximum distance at
which an observer could still fuse a target rendered behind a transparent surface.
Results revealed that the real object’s surface image content affected this distance
(with no clear trend). Hou and Milgram (2003) also confirmed this finding by
performing an experiment where participants were asked to manipulate a virtual
object near the surface of a real object. The results of their experiments showed
that, as the texture density of the real surface increased, the maximum distance at
which an observer could still fuse the virtual object behind the real surface was
reduced.
Once the impression of X-ray vision is achieved, there is also the possibility that the accuracy of
absolute depth judgements about the real and virtual object will be adversely affected:
10 Stereo transparency is defined as Pseudo-Transparency that is perceived in surfaces defined solely by disparity
(Julesz, 1971).
• Ellis and Bucher (1994) asked participants to judge the depth of a virtual tetrahedron in
the absence and presence of a real checkerboard pattern placed in front of it. Results
showed that the introduction of the (opaque) checkerboard caused the mean position of
the virtual image of the tetrahedron to move significantly closer to the viewer.
• Ellis and Menges (1998) found that the presence of a visible real surface spatially close to
a virtual object significantly influenced the observer’s depth judgement of the virtual
object, resulting in the object appearing nearer than it really was. Singh et al. (2010) also
confirmed these results using a replication of Ellis and Menges’ experiments.
• Johnson et al. (2003) reported that, even when surgeons would see the virtual objects as
being below the real surface, they would perceive its position closer to the surface than
suggested by the binocular disparity cue.
• Edwards et al. (2004) also found reduced accuracy for depth judgements of a virtual
object when viewed through a physical transparent surface. In their experiments, the
virtual object was perceived as farther behind the surface compared to its actual position.
Their results also showed that this depth judgment error depended on the actual depth of
the virtual object.
Since all of these issues were identified in cases where OST displays were used, one may question
whether these issues also apply to video based displays. However, it is important to reiterate that
the most relevant distinction between OSTs and video-based displays in this context is the
opacity of the virtual object (virtual objects cannot completely occlude real ones in OSTs).
Therefore, it is expected that, if these issues exist with OSTs, they should be further exacerbated
using video overlays, for which virtual objects tend to be completely opaque.
Another important point to make relates to the discrepancy between the findings of Edwards et
al. (2004), who noted that the distance to a virtual object tended to be overestimated when seen
behind a nearer transparent real surface, and those of Ellis and Bucher (1994) and Ellis and
Menges (1998), who reported that the distance to the virtual object was
underestimated when seen behind a nearer opaque real surface. One possible explanation for this
discrepancy can be the distance at which the real surface was placed from the virtual object. In
both Ellis and Bucher’s (1994) and Ellis and Menges’ (1998) experiments, the real surface was
placed 300 mm in front of the virtual object. In Edwards et al.’s (2004) experiments, however,
the real surface was placed between 80 mm in front to 20 mm behind the virtual object and the
overestimation of the virtual object’s depth occurred only when the real surface was placed at a
distance less than 20 mm in front of the virtual object. Therefore, as Edwards et al. found from
their experiments, a plausible explanation might be that the direction of the error in perceived
depth of the virtual object depends on the distance between the real surface and the virtual
object.
A summary of the above literature is provided in Table 3.1. As can be seen, the general trend
shows incorrect depth order and inaccurate depth judgements of the virtual object, regardless of
whether the real surface is flat or curved or whether the virtual object is solid or wireframe.
Table 3.1: Summary of literature review on perceptual issues of X-ray vision in AR.
| Publication | Display Used | Real Object | Virtual Object | Identified Issues |
| --- | --- | --- | --- | --- |
| Ellis and Bucher (1994) | Stereo OST | Flat checkerboard pattern | Tetrahedron (wireframe or solid) | Virtual object appearing closer to observer |
| Ellis and Menges (1998) | Stereo OST | Flat checkerboard pattern | Wireframe pyramid | Virtual object appearing closer to observer |
| Johnson et al. (2003) | Stereo OST | A smooth skull; the front of the skull including the eye sockets and nasal bone; brain in the skull; natural foliage | Random dot target | Real object’s surface content reduced the distance at which the virtual object could be rendered while maintaining the ability to fuse the image |
| Johnson et al. (2003) | Stereo OST | Real anatomy | MRI/CT scans | Virtual object appeared to be floating above real object’s surface; virtual object appearing closer to observer |
| Edwards et al. (2004) | Stereo OST | Phantom mimicking skin and brain | Truncated cone | Virtual object appearing farther from observer when placed just behind the real surface (distance less than 20 mm); reduced accuracy of depth judgements in presence of real object |
3.2 Review of Proposed Solutions
As mentioned in Chapter 2, the strengths of depth cues differ across different regions of space
(i.e., near-field, medium field, and far field distances). As a result, when investigating perceptual
challenges and solutions, it is important to consider the specific application and depth region for
which a solution is being proposed. Even though medical applications of AR justify the
importance of studying the relevant perceptual challenges in achieving X-ray vision for near-
field distances, there is relatively little research in this field. This is because most of the current
research in X-ray vision applications of AR is focused on mobile applications, presumably since
these are considered to be the potentially major consumer application of this technology (Livingston et
al., 2013). Moreover, although a number of visualization techniques have been proposed to deal
with these challenges, there are very few studies that have provided experimental results
investigating the effectiveness of their proposed solutions. In this section, an overview of the
general methods used for dealing with perceptual issues in X-ray vision for near-field
applications is presented.
3.2.1 Cutaway or Virtual Hole
To aid the observer in perceiving the virtual object as being placed behind the real surface, one
of the metaphors that has been used is referred to as the cutaway or virtual hole. In this method, a
hole appears to be carved on the real surface, through which the virtual object is presented. This
virtual hole may have a 3D structure showing the sides and bottom of the hole that is placed
behind the virtual object (Livingston et al., 2013). An example of this method is shown in Figure
3.1.
As part of their aforementioned experiments, Ellis and Menges (1998) used a virtual hole in the
real object’s surface as a visualization technique that could help in conveying correct depth
information. The results of their experiments showed that using this metaphor reduced the depth
judgment bias. Rosenthal et al. (2002) also used this method for comparing AR ultrasound
guidance systems for targeting needle biopsies in phantoms. An image showing the view from
the head-mounted display used in that study is presented in Figure 3.1. Their results showed
lower mean errors in needle placement when using this technique, compared to when AR
displays were not used.
Although their study was demonstrative of the potential benefits of using AR, it did not explicitly
address the advantage of this specific visualization technique. In general, in cases where
preservation of the content of the real object’s surface is important, this method can definitely
prove problematic, as the virtual hole clearly eliminates all the information of the real object’s
surface (Bajura, Fuchs & Ohbuchi, 1992).
Figure 3.1: Example of virtual hole metaphor used by Rosenthal et al. (2002) for the task of
targeting needle biopsies in phantoms. The vertical lines are meant to show the sides of the
virtual hole. Reprinted from Medical Image Analysis, Vol. 6, Rosenthal et al., Augmented reality
guidance for needle biopsies: An initial randomized, controlled trial in phantoms, 313-320, 2002,
with permission from Elsevier.
3.2.2 Modified Opacity
Another way to achieve the metaphor of X-ray vision is to depict the real object as being
partially transparent, by reducing the opacity of the real object pixels using image processing
techniques (Livingston et al., 2013). Rather than uniformly reducing the opacity of the entire real
object, Bichlmeier, Wimmer, Heining & Navab (2007) endeavoured to achieve a natural looking
transparency by defining an optimized opacity value, which was a function of the surface
curvature and the angle and distance between the observer and the image. In short, their model
involved assigning higher opacity values to regions with higher curvature and larger angle and
distance relative to the observer’s viewpoint. Sample images of their visualization technique,
implemented using a video based display, are shown in Figure 3.2. Although Bichlmeier et al.
(2007) demonstrated the feasibility of their suggested technique, they unfortunately did not
evaluate its effectiveness regarding the correct perception of relative and absolute distances of
objects within the AR scene.
Figure 3.2: Example of the Modified Opacity visualization method used by Bichlmeier et al.
(2007). Reprinted by permission from IEEE 2007.
In general, there are two aspects to the limitations involved in the use of this technique:
computational and visual. Firstly, the use of this image processing technique is computationally
expensive and requires both a viewer tracking system and an accurate model of the real object.
For example, Bichlmeier et al. (2007) mentioned that high quality visualizations suffer from low
performance speed since the ‘quality of the transparency effect’ depends on the ‘accuracy level
of the surface model’. Secondly, the modified opacity technique can also be affected by the
display capabilities. In the case of OST displays, for example, virtual objects possess lower
brightness and are not able to completely cover real objects. On the other hand, video based
displays (which are intended for the use of the AR method described in this thesis) can allow
virtual objects to completely occlude real ones, which is the opposite of the desired effect for
conveying the metaphor of X-ray vision (Livingston et al., 2013).
However, it is worth mentioning that the two aforementioned techniques have been shown to be
superior to some alternative visualization techniques. For example, Sielhorst, Bichlmeier,
Heining & Navab (2006) evaluated 7 different visualization techniques using a stereoscopic
video based head-mounted display. The 7 techniques include: Surface rendering opaquely
superimposed; Surface rendering transparently superimposed; Surface rendering through a
virtual window in the skin; Triangle mesh; Volume rendering model through a virtual window in
the skin; Surface rendering with a glass effect of the skin; and Volume rendering superimposed.
These visualizations are presented in the same order (1-7) in Figure 3.3. Based on their
terminology (Sielhorst et al., 2006):
• Triangle mesh is the case where the surface of the bone structure “is stored in the
computer as a list of triangles” and visualized with the edges of these triangles (image 4
in Figure 3.3).
• Surface rendering involves the visualization of the bone structure surface with
“untextured but shaded solid triangles” (images 1, 2, 3 and 6 in Figure 3.3).
• Volume rendering “represents the whole volume rather than the surface” of the bone
structure with transparency values assigned to emphasize the bone structure (images 5
and 7 in Figure 3.3).
• Glass effect is the case where the surface of the skin “is rendered transparently and
achromatically” showing reflections of a virtual light source (image 6 in Figure 3.3).
In their experiments, 20 surgeons were given the task of moving a pointer to a specific point
on the surface of the spine (virtual object) inside of a phantom (real object). Amongst the
various visualization techniques they tested, they found the two best visualization modes to
be the transparent surface rendering and the virtual window surface rendering (images 2 and
3 respectively in Figure 3.3). These two methods can be considered analogous to the ‘modified
opacity’ and ‘virtual hole’ techniques discussed above, respectively.
Figure 3.3: The 7 evaluated visualizations studied by Sielhorst et al. (2006). Visualizations 2 and
3, corresponding respectively to surface rendering transparently superimposed and surface
rendering through a virtual window in the skin, were determined to be the best in terms of depth
perception and effectiveness. Reprinted by permission from RightsLink: Springer Berlin
Heidelberg, International Conference on Medical Image Computing and Computer-Assisted
Intervention, Depth perception–a major issue in medical AR: evaluation study by twenty
surgeons, Sielhorst, T., Bichlmeier, C., Heining, S. M., & Navab, N., Copyright 2006 by
Springer-Verlag Berlin Heidelberg.
3.2.3 Context-preserving Techniques
A more advanced type of method that aims to deal with the challenges involved in the use of AR
displays for achieving X-ray vision is an image-based technique that is referred to as ‘context-
preserving’. Context-preserving refers to methods in which the removal of the real object surface
is controlled when imposing the virtual image onto it, such that certain details of the real surface
are preserved. In other words, the image of the real object is used to extract a partial model of the
real object that includes edges (Kalkofen, Mendez & Schmalstieg, 2007; Avery et al., 2009),
salient regions (Lerotic et al., 2007; Sandor et al., 2010) or a combination of salient regions,
edges and texture details (Zollmann et al., 2010). In addition to preserving the most important
information about the real object’s surface, these methods use these features to occlude the
virtual object, thereby suggesting the correct depth order between the real and virtual object.
Even in cases where such features do not exist, synthetic features are added onto real surfaces. For
example, Zollmann et al. (2010) suggested adding synthetic features based on ‘tonal art maps’, to
provide compensation for surfaces where too few features exist. In their work, by adding a
hatching pattern to the surface of the pavement in an outdoor scene and having the pattern
occlude parts of the virtual underground pipes, they provided occlusion cues, which suggest that
the virtual pipes are in fact located underneath the pavement.
Within the literature mentioned above, the one method that has been applied to the medical
domain is that of Lerotic et al. (2007). Lerotic et al. (2007) used the da Vinci system (consisting
of a video based stereoscopic display) to compare the effectiveness of their proposed
visualization technique with traditional overlays (see footnote 11). Their technique involved rendering the real
surface as ‘translucent’ by adjusting its opacity value while detecting and preserving its salient
features, as shown in Figure 3.4. In their first experiment, participants were asked to locate eight
virtual spheres placed along different depths of a model of a real thorax. Results showed
11 “Traditional overlays” refers to cases where the virtual object is overlaid onto the real object without any
modifications.
significant improvement in the accuracy of depth judgements using their solution in comparison
with traditional overlays.
Although demonstrated to be beneficial, extracting these partial models requires computationally
expensive rendering steps or special purpose hardware (Livingston et al., 2013). These methods
are also not applicable to cases where occlusion of the virtual object is difficult to realize, or is
not desired. Moreover, most of these methods require the real object’s surface to possess salient
features in order for the algorithms to function effectively.
Figure 3.4: A context-preserving visualization used by Lerotic et al. (2007). Reprinted by
permission from RightsLink: Springer Berlin Heidelberg, Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2007, Pq-space based non-photorealistic rendering for
augmented reality, Lerotic, M., Chung, A. J., Mylonas, G., & Yang, G. Z., Copyright 2007 by
Springer-Verlag Berlin Heidelberg.
3.3 Criteria for Success
From the challenges and proposed solutions presented in Sections 3.1 and 3.2, it can be
concluded that the success of methods used to achieve X-ray vision with AR displays requires
considering several indicators. In particular, an effective method must:
1. Provide the observer with the information that permits her to understand the depth order
between the virtual and real objects: In simpler terms, the observer must be able to
perceive the virtual image as being behind the real object surface (and thus inside the real
object).
2. Preserve some amount of detail about both the virtual objects and the surface of the real
objects that is sufficient for carrying out one’s intended task: Not surprisingly, achieving
these two properties typically involves a compromise. If the real object surface is able to
occlude portions of the virtual object (allowing the observer easily to infer the virtual
object as being behind the real surface), at least some details of the virtual object may be
lost. On the other hand, if the virtual object is overlaid onto the real surface without
occlusion by the real surface, in addition to losing details of the real surface, the depth
order of the virtual and real objects may become incomprehensible.
3. Require a reasonable computational load for creating the final rendering: For instance, as
discussed, some methods require the computation of an accurate 3D model of the
physical environment to create a convincing composition of virtual and physical objects.
Therefore, to summarize, a convincing solution should be one that finds an appropriate level of
compromise between depth perception and information preservation (of both real and virtual
objects), while minimizing computational cost. In the following chapter, I present the reasoning
behind the idea of this thesis, while justifying its criteria for success by referring to the concepts
presented above.
Chapter 4
Our Method
4.1 Use of Texture
As discussed in Section 3.2, to deal with the challenges involved in achieving X-ray vision with
AR displays, various researchers have suggested the addition of some sort of ‘texture’ to the real
surface. For example, Zollmann et al. (2010) used the addition of synthetic features to images of
real pavement to occlude superimposed virtual pipes, thereby suggesting that the virtual pipes are
in fact located underneath the pavement.
In the context of stereoscopic displays, however, the research by Interrante et al. (1997) seems to
be most relevant to the topic of this thesis. In their work, Interrante et al. suggested using sparse
opaque textures that were specifically designed to convey intrinsic surface shape properties, to
improve perception of depth and spatial understanding of the surface. By adding grid lines or
strokes to the surface of a 3D computer-generated transparent object, Interrante et al. were able
to use a combination of the occlusion cue, the binocular disparity cue, the relative density cue
and motion parallax (see footnote 12) to improve depth perception. Their claim was based on the idea that
consistent depth cues reinforce each other, leading to improved depth perception (Interrante,
1996). Though their work was done in a completely virtual environment, the premise of their
work could be formulated as positing that adding texture to a surface can facilitate the veridical
perception of depth from binocular disparity.
If this is in fact true, for cases where the occlusion and binocular disparity cue are in conflict, the
addition of texture to a surface may result in an increase of the availability and/or reliability of
the binocular disparity cue so that it can dominate the occlusion cue. This reasoning is also in
line with the theory that was presented in Section 2.2 (on integration of depth cues). As may be
recalled, based on the perceptual models on depth cue integration, cues are either combined
using weights that are based on their respective reliabilities, switched between or ignored due to
the presence of a more reliable cue. Therefore, it may be possible to create conditions that result
12 For a brief explanation of these two latter cues, see Appendix D.
in changes in either cue reliability, cue availability or cue inconsistency, to aid observers in
making more accurate depth judgements. For example, in Figure 4.2(a) and Figure 4.2(b),
described below, even though the occlusion cue is suggesting that the virtual object is in front of
the real surface, it may be possible to reduce the weighting of the occlusion cue, reduce the
frequency with which the observer uses the occlusion cue or to help the observer in ignoring this
cue by increasing the availability and/or reliability of the binocular disparity cue. In the
following section, we propose that adding a random dot texture pattern to a real surface in a
stereoscopic display is a potentially effective means of increasing the availability and/or
reliability of the binocular disparity cue (through supporting vergence eye movements). If this is
done successfully, the observer should be able to perceive the virtual object as lying behind the
real surface.
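As a rough illustration of the reliability-weighted combination rule recalled above, the following Python sketch weights each cue's depth estimate by its reliability, taken here as the inverse of an assumed variance. The function name, the variance values and the treatment of occlusion as a signed metric estimate (occlusion is in reality an ordinal cue) are assumptions made purely for illustration; they are not taken from the models cited in Section 2.2.

```python
def combine_cues(estimates_mm, variances):
    """Reliability-weighted average of depth estimates.

    estimates_mm: signed depth of the virtual object relative to the real
                  surface suggested by each cue (+ in front, - behind).
    variances:    assumed noise variance of each cue; reliability = 1 / variance.
    Returns the combined estimate and the normalised weights.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates_mm))
    return combined, weights


# Hypothetical values: disparity says "0.35 mm behind", occlusion says "in front".
equal, _ = combine_cues([-0.35, +0.35], [1.0, 1.0])
boosted, weights = combine_cues([-0.35, +0.35], [0.25, 1.0])
print(equal, boosted, weights)
```

Under these assumed numbers, equal reliabilities leave the combined estimate at the surface, whereas making the disparity cue four times as reliable as the occlusion cue moves the combined estimate behind the surface.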
4.2 Stereo-Translucency
In the context of stereoscopic video based AR displays, when a virtual object is correctly
rendered (stereoscopically) in front of a real object, the binocular disparity cue and the occlusion
cue together provide consistent information, allowing the virtual object to be perceived
unambiguously as being in front, as illustrated in Figure 4.1 (see footnote 13).
Figure 4.1: Stereo pairs. The blue circle indicates a virtual object rendered in front of the surface
of a real object (the face). In this case, the binocular disparity cue and the occlusion cue provide
13 It is worth noting that this is not the case with stereoscopic OST displays. With OST displays, when the virtual
object is rendered in front of the real surface, the occlusion cue may be inconsistent with the binocular disparity cue
since the virtual object will appear to be transparent.
consistent information, allowing the virtual object to be perceived unambiguously as being in
front of the person’s face. Note that the middle face shows less mouth than the left face and the
eyebrows are more extensive in the left face.
The addition of random dot patterns to the real surface of Figure 4.1 should in this case have no
effect on how the virtual object is perceived relative to the real surface. However, in cases where
the virtual image is rendered stereoscopically behind a real object, even though the binocular
disparity cue is communicating that the virtual object is behind the real surface, the occlusion cue
nevertheless continues to suggest that the virtual object is in front (Drascic & Milgram, 1996).
An example of this situation is depicted in Figure 4.2(a). We refer to this case as being
incongruous, as a consequence of the conflict between these two very important depth cues –
occlusion and binocular disparity.
Figure 4.2: In both sets of stereo pairs (a) and (b) (identical to Figure 1.3), the blue virtual circle
is stereoscopically rendered behind the face. In this case, the binocular disparity cue and the
occlusion cue provide inconsistent information, leading to a cue conflict: (a) Untreated image.
An enlarged (landscape) version of this image is provided in Appendix C2 to help in perceiving
the desired percept. (b) Addition of random dots onto the face (using a projector). If successful,
the reader should more easily perceive the virtual circle as being behind the face in (b), relative
to (a).
To help the observer contend with the sometimes perplexing effects of incongruity, and to
facilitate perception of the correct depth order of the virtual object and the real surface, we
propose the addition of random dot patterns onto the real surface. By comparing Figure 4.2 (b)14
with Figure 4.2(a), one should get the impression that perceiving the virtual object as being
behind the surface is easier when the random dot pattern is present (Figure 4.2 (b)) compared to
when it is not (Figure 4.2 (a))15.
Expanding upon what was discussed in the previous section, one explanation for the expected
effect is that by adding random dots to the real object surface, we are able to provide observers
with distinct fixation points (in the form of the edges of the dots), thus guiding them in making
vergence eye movements (between a virtual object and the real surface) and using the additional
vergence cue to make better depth judgements. By doing so, we should be able to increase the
availability and/or reliability of the binocular disparity cue such that the observer is more easily
able to perceive the virtual object as being behind the real surface (despite the conflicting
occlusion cue). Furthermore, because the virtual object is perceived as being behind the real
surface, which remains visible, observers are able to perceive the real surface as being
“transparent” – i.e. X-ray vision.
It is important to clarify the terminology we are using here. As discussed in Chapter 2, one of the
primary manifestations of transparency is Pseudo-Transparency, which is the result of light
14 Note that this figure is identical to Figure 1.3.
15 Note that, unless the reader is able to view these stereo pair images stereoscopically, it will not be possible to
perceive any differences with regard to where the virtual object is located relative to the real surface.
passing through gaps in non-transparent objects, such as lace or wire fences (Tsirlin et al., 2008).
Julesz (1971) also used this concept to define Stereo-Transparency, which is Pseudo-
Transparency that is perceived in surfaces defined solely by disparity. However, we have
hesitated to use Julesz’s term to refer to the phenomenon described above as Stereo-
Transparency (or Stereo-Pseudo-Transparency), due to the fact that the percept is not due only to
binocular disparity, but rather to the conjunction of both binocular disparity and occlusion
cues16. Otherwise stated, what we observe is not due to light passing through gaps in non-
transparent surfaces, and thus does not fit the accepted constraints of Pseudo-Transparency. One
option might be to label the observed phenomenon as “Pseudo-Translucency” (or “Stereo-
Pseudo-Translucency”), a term that could be further justified by the fact that virtual objects that
are rendered stereoscopically behind a real surface but nevertheless occlude that surface give the
overall impression of a diffuse surface, somewhat akin to frosted glass. As discussed later on in
this thesis, however, we have avoided using the term “translucency” in the subjective judgement
components of our experiments, due to our (untested) expectation that participants would likely
be confused by questions that are framed using that term. In the remainder of this thesis, we use
the term “transparency” in our discussion, to reflect the instructions given to participants.
Another hypothesized effect of the addition of a dot pattern onto a surface is the expected
creation of “holes” on the surface wherever the (black) dots are added. The proposed hypothesis
related to this is that, when observers are faced with the aforementioned cue conflict, they are
given the impression of looking through these holes in the real surface (the dots being the holes)
at the virtual object placed underneath the real surface. At the same time, however, because the
non-dotted parts are still occluded by the virtual object while remaining visible, this adds to the
impression of translucency, as discussed above.
Moreover, by using a uniform colour for the dots in the dot pattern (as shown in Figure 4.2 (b)),
it is postulated that a potential consequence of the virtual object occluding the dots may be the
illusion of a uniform background, of the same colour as the dots, lying behind the virtual object,
16 It is worth noting that the difference between the occlusion cue in our stimuli and Julesz’s random dot patterns is
its congruity with the binocular disparity cue. In other words, while the occlusion cue is in agreement with the
binocular disparity cue in Julesz’s random dot patterns, the incongruity of the occlusion cue and the binocular
disparity cue is what actually leads to the impression of ‘transparency’ in the case of our stimuli.
within the real object. As explained in Figure 4.3, the reasoning here is that, in contrast to the
non-dot portions of the pattern, which retain all of the original surface information, the black dot
portions occlude the information on the surface. Consequently, it may be possible for an observer
to perceive all of the black dot parts of the image as belonging to a large black background17.
This percept is likely to be reinforced further by the portions of the black background that are
occluded by the real surface and the virtual object that are clearly in front of that background.
17 Note that there is no reason for the random dots necessarily to be black, and thus for the background always to be
perceived as being black. In principle any colour of dots should produce the same effect, although obviously some
colours will be more appropriate than others, depending on the colours and features contained in the real surface.
For example, because of the ‘dark is deep’ bias, it is suspected that random dots with darker colours compared to the
real surface would work best.
[Figure 4.3 image labels: Real Object Surface; Black dots]
Figure 4.3: Hypothesised percept when using a dot pattern as a means of surface manipulation.
The top portion of the image shows a magnified (2D) view of the real surface (skin), which has
been altered by adding a random dot pattern. The lower portion of the image shows the top view
of the observer as he/she may perceive the image if this percept is achieved.
4.3 Information Preservation
Although other patterns, such as a checkerboard, might also achieve the impression of
transparency, the randomness of the dot patterns proposed here is intended, in contrast to regular
patterns, to help users focus their attention on the surface rather than on the pattern itself. In
other words, the use of prominent patterns that take on a character of their own may add visual
noise rather than enhance the overall effectiveness of the presentation (Interrante, 1996). Moreover,
using a random dot pattern allows for independent experimental control over the density of the
black dots18.
4.4 Computational Costs
Since presentation of the real surface requires no image processing steps other than the
overlaying of the random dots, computational costs can be minimized. That is, unlike the use of
strokes for adding texture (Interrante et al., 1997), one does not need to have a detailed model of
the real object. The only extra step is to render the black dots of the pattern at depths
corresponding to points on the real surface, which can be done by obtaining a partial model,
using a depth map obtained from stereo pair images19. While it may be argued that adding grid
lines to the real object’s surface may require the same (relatively low) level of modelling of the
real object, as discussed in the previous section grid lines run the risk of forming a distracting
pattern, which would be undesirable. As an alternative to computationally overlaying the random
dot patterns on the real surface, it may also be appropriate under certain circumstances to use a
18 From a practical point of view, since the pattern is random, the ultimate user of such a display system could be
provided with the means to easily adjust the parameters of the random-dot mask (such as dot size, dot density, dot
distribution, etc.) in real-time in order to preserve the visibility of desired content on the real surface.
19 It is important to distinguish between different extents to which one can model a real object surface. In the
present case, we are considering a point cloud depth map obtained from scanning a real surface, or from performing
stereo matching, to comprise a relatively minimal extent of modelling that surface, in contrast to more extensive
models that involve quantitative relationships among all, or most, components of the object.
projector to project a pattern onto the real object surface, in which case no model at all would be
necessary.
4.5 Past Work
Before presenting the experimental work done for this thesis, it is important to provide a brief
overview of what was done to investigate the effectiveness of this idea prior to the
commencement of this PhD work. The first implementation of this idea was done by Otsuki and
Milgram (2013), where random dot patterns of different dot sizes and dot densities were overlaid
onto a pink (virtual) background, which was intended to represent a flat surface of an object, as
illustrated in Figure 4.4. Their results confirmed the bias found by Ellis & Bucher (1994), Ellis &
Menges (1998) and Johnson et al. (2003) towards perceiving the virtual circle to be closer to the
observer than it actually was. Furthermore, using Thurstonian scaling (Thurstone, 1927), Otsuki
and Milgram’s results showed higher ratings for smaller dot sizes and higher dot densities in
response to the questions:
- In which image is it easier to perceive that the circle is behind the masking window20?
- In which image does the masking window appear to be more transparent?
Figure 4.4: Sample stimulus used by Otsuki and Milgram (2013). The blue circle indicates a
virtual object rendered beneath the depicted surface, which has been modified through the
addition of a pattern of random black dots. Reprinted by permission from IEEE 2013.
20 Masking window in this experiment referred to the part of the pink background that was covered with the black
dots.
The limitations involved in the implementation and results of that series of experiments (which
will be discussed in depth in the following chapter) formed the motivation to further investigate
this idea. The next chapter presents the first set of experiments done as part of this PhD thesis.
Chapter 5
Experiments 1 and 2: Effect of Using Random Dot Patterns on Depth Order Disambiguation, Perception of Transparency and Surface Information Preservation21
As discussed earlier, to expand the potential application areas of X-ray vision with stereoscopic
displays, by means of offering a viable compromise between depth perception, surface
information preservation and minimal computational expense, we proposed adding random dot
patterns to the surface of real objects. Despite the potential advantages, this method is
nevertheless similar to the solutions presented in Chapter 3, in that it involves a trade-off
between depth information and real surface content preservation. As part of our effort to explore
that trade-off, and thereby the potential effectiveness of this method in dealing with the
challenges of X-ray vision with stereoscopic AR, the present chapter describes and presents the
results of a set of experiments which aimed to determine the effect of dot size and dot density on
both perceived transparency (related to perception of depth order) and perception of real surface
information22.
This set comprised two experiments: Experiment 1 and Experiment 2. Experiment
1 focused on investigating the feasibility of this display principle and assessing the effect of
random dot patterns in perceiving the correct depth order between a virtual object and a real
surface. Experiment 2, on the other hand, was aimed at examining the effect of relative dot size
and dot density on perceiving the impression of transparency of the same real surface while
preserving surface information. This chapter presents the detailed description and results of these
experiments.
21 Note that large portions of this chapter coincide with the 2017 publication by Ghasemi, Otsuki, Milgram &
Chellali in the journal Presence.
22 While it can be argued that surfaces contain features, optical arrays contain information about these features and
observers detect these features by using the optical information, for the sake of simplicity, surface features will be
referred to as surface information throughout this thesis.
5.1 Purpose
The collective purpose of the two experiments was to investigate the trade-off involved in
perceiving the correct depth order for a virtual object that is intended to appear behind a real
surface, and the perception of sufficient information about the real surface. In particular, these
experiments were designed to answer the following questions:
• Can the addition of a random dot pattern lead to disambiguation of the depth order
between the virtual object and the real surface?
• With the virtual object being perceived as behind the real surface, does the addition of a
random dot pattern lead to a more convincing impression of ‘transparency’ of the real
surface? If so, what are the effects of dot size and dot density of the random dot pattern in
achieving this impression?
• Does a trade-off exist between perceiving ‘transparency’ and preservation of surface
information?
• Based on these results, how can one optimize the dot size and dot density of the random
dot patterns to achieve X-ray vision while preserving sufficient surface information?
In the next section we provide a description of the experimental method that was used to address
the above questions.
5.2 Experimental Method
In investigating the effect of dot size and dot density on the ability to perceive both depth order
and surface information, it is important to use an appropriate distance between the real surface
and the virtual object, such that the virtual object can easily be perceived as being behind the real
surface. In other words, our primary objective here was not to examine participants’ ability to
discern different distances between the virtual object and the real object surface. Rather, our
objective was first to ensure that participants would be able to perceive that the virtual object was
behind the surface, and then to explore the factors that influence the resulting sense of the
transparency of that surface and their ability to perceive information on the object surface.
For this reason, two experiments were done. In addition to testing the effect of random dot
patterns on depth order disambiguation, Experiment 1 also aimed to determine an appropriate
distance for placing the virtual object in Experiment 2. In doing so, we aimed to reveal the
presence and sensitivity of any perceptual bias in localizing the virtual object within the vicinity
of the real surface. Experiment 2, on the other hand, was designed to investigate the trade-off
involved in perceiving the impression of transparency while also preserving surface information.
In this section, we discuss image generation and presentation and provide information about the
participants for both experiments. The sections after that discuss each experiment separately.
5.2.1 Image Generation and Presentation
An example of the stimuli used in the experiment is shown in Figure 5.1, which is a simplified
version of the more general case depicted in Figure 4.2 (b), but with the complex 3D face in
Figure 4.2 (b) replaced by a (purple) textured plane perpendicular to the line of sight. With
regard to the apparent similarity here to stimuli used in an earlier experiment reported by Otsuki
& Milgram (2013), as shown in Figure 4.4, we note that a primary goal of the present
experiments was to investigate the effectiveness of this method when applied to real surfaces (in
compliance with the definition of AR). For our real object, we employed a coloured photo of a
real textured surface that was extracted from a volume of professional photographs by P. Brodatz
(Abdelmounaime & Dong-Chen, 2013; Brodatz, 1966)23. In doing so, our intention at this point
was that the surface, as shown in Figure 5.1, would be flat and would comprise a visible 2D
texture. The absence of 3D textural elements on this surface24 was intended to provide us with
the means of evaluating our solution for specific surface types, such as those that might be
considered analogous to the smooth surface of organs containing 2D marks, spots or vessels.
Once the random dot patterns were generated (as explained below) and overlaid onto the real
surface, all images were rendered stereoscopically using a desktop computer (Windows 7
Professional OS with NVIDIA Quadro 600), coded using Visual C++ 2010 and OpenGL. The
stimuli were presented to participants on a 23-inch LCD screen (ASUS VG236HE, 1920 x 1080
23 These textures are publicly available in support of research on image processing and image analysis.
24 Recall the distinction between these three types of textures, outlined in Section 2.4.
resolution, 120 Hz refresh rate). Stereo images were observed using the NVIDIA 3D vision
system with 3D Vision 2 glasses.
Figure 5.1: Example of a stimulus stereo pair used in the experiment. The blue circle indicates a
virtual object rendered (0.35 mm) beneath a textured purple surface, which has been modified
through the addition of a pattern of random black dots. (The reader is referred to Figure 1.3 for
instructions on how to free fuse such stereo images.)
For all trials, the real object surface with the random dot pattern was presented at the same depth
as the display surface (i.e., with zero disparity)25. The blue virtual circle, on the other hand, was
rendered at different depths, based on an equivalent parallel camera orientation, depending on
the particular stimulus presentation. The on-screen horizontal disparities for the circle were
calculated based on a fixed viewer-to-display distance of 40 cm and an assumed average inter-
pupillary distance of 65 mm. To prevent the use of the relative size depth cue, the diameter of the
circle was kept constant, at 187 pixels, regardless of the distance from the surface. The line width
of the circle was also kept constant, at 2 pixels. Together with the selection of the real surface,
outlined above, the colour and line width of the virtual circle were chosen such that the stimulus
as a whole could be considered analogous to a partial endoscopic view of an organ with a virtual
vessel rendered beneath the surface.
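The on-screen disparity calculation can be sketched with similar triangles. The thesis does not reproduce the formula that was actually used, so the relation below, the function name and the sign convention (positive depth means behind the screen plane) are assumptions; only the 40 cm viewing distance and the 65 mm inter-pupillary distance are taken from the text.

```python
def screen_parallax_mm(depth_behind_mm, ipd_mm=65.0, viewing_distance_mm=400.0):
    """Horizontal separation, on the screen, between the left-eye and
    right-eye images of a point rendered depth_behind_mm behind the screen.

    Similar triangles give p / I = d / (D + d), i.e. p = I * d / (D + d).
    Positive values are uncrossed (behind the screen), negative values are
    crossed (in front of the screen), zero lies in the screen plane.
    """
    return ipd_mm * depth_behind_mm / (viewing_distance_mm + depth_behind_mm)


# Illustrative call for a circle rendered 0.35 mm behind the surface,
# which itself sits in the screen plane (zero disparity).
print(screen_parallax_mm(0.35))
```

Because the real surface was presented with zero disparity, only the virtual circle needs a non-zero separation in this sketch; converting the result to pixels would additionally require the pixel pitch of the display, which is not assumed here.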
In keeping with our goal of investigating the case of incongruous AR displays in this experiment
(as discussed in Section 4.2), no occlusion cues suggesting the blue virtual circle being behind
25 Because the real object surface was flat and was rendered with zero disparity for the present experiment, it was
functionally equivalent to a monoscopic image.
the real surface were present in the stimuli. In other words, as seen in Figure 5.1, the blue virtual
circle was continuous – even though it was stereoscopically rendered behind the surface.
In both experiments, the random dot patterns were generated using the MATLAB function
‘rand’. In all cases, the textured surface was square, with an area of 334x334 pixels, and the area
of the random dot pattern, also square, was 148x148 pixels.
Dot size (DS) and dot density (DD) were varied throughout both experiments, as illustrated in
Figure 5.2. The parameter that we are calling dot size should, technically speaking, be referred to
as ‘relative dot size’, since it refers to the fraction into which each dimension was divided, rather
than the actual physical size of the dots. For example, a (relative) dot size of 1/25 means that a
25x25 grid was used to generate the random dot pattern. For our 148x148 pixel pattern area, a dot
size of 1/25 therefore meant that each dot had an area of approximately 6x6 pixels. Dot density, on
the other hand, refers to the percentage of the entire random pattern area that was covered with dots.
It should be noted that these two parameters are independent of each other. In addition to the
stimuli presented in Figure 5.2, a ‘No Pattern’ condition was also presented.
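To make the two parameters concrete, the following Python sketch builds a random dot mask from a relative dot size and a dot density. It is not the MATLAB code used for the experiments: the text states only that the function 'rand' was used, so the way cells are selected here (an exact number of randomly chosen grid cells), the function names and the nearest-cell mapping from grid to pixels are all assumptions.

```python
import random


def random_dot_cells(cells_per_side, density, seed=None):
    """Return a cells_per_side x cells_per_side grid of booleans in which
    a fraction `density` of the cells (rounded to a whole number of cells)
    is marked True, i.e. filled with a dot. A relative dot size of 1/25
    corresponds to cells_per_side = 25."""
    rng = random.Random(seed)
    total = cells_per_side * cells_per_side
    n_dots = round(density * total)
    chosen = set(rng.sample(range(total), n_dots))
    return [[(row * cells_per_side + col) in chosen
             for col in range(cells_per_side)]
            for row in range(cells_per_side)]


def cells_to_pixels(cells, size_px):
    """Expand the cell grid to a size_px x size_px pixel mask by assigning
    each pixel to the grid cell that contains it."""
    n = len(cells)
    return [[cells[(y * n) // size_px][(x * n) // size_px]
             for x in range(size_px)]
            for y in range(size_px)]


# Relative dot size 1/25, dot density 40%, 148x148 pixel pattern area.
cells = random_dot_cells(25, 0.40, seed=1)
mask = cells_to_pixels(cells, 148)
```

Because dot size fixes only the grid and dot density fixes only the number of filled cells, the two parameters can be varied independently in this sketch. Since 148 is not an exact multiple of 25, individual dots come out 5 or 6 pixels wide here.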
5.2.2 Participants
For each of the experiments, 15 students from the University of Toronto were recruited, all 18-39
years old (7 male and 8 female for Experiment 1 and 12 male and 3 female for Experiment 2).
All participants either had normal visual acuity or used corrective devices to achieve normal
visual acuity during the experiments. To confirm the absence of any stereoscopic vision
problems, the NVIDIA 3D stereo vision test26 was administered. After taking the stereo vision
test, participants were given an information sheet outlining the details of the experiment. They
were then given the consent form to sign, which was followed by a brief questionnaire. Copies of
these are included in Appendix A1 and A2. Participants of Experiment 1 were precluded from
26 The NVIDIA 3D stereo vision test is a simple application through which the ability to see in 3D can be verified.
When this application is launched, the letters in ‘nVIDIA’ and NVIDIA’s logo start moving back and forth in depth.
If the participant is able to see stereoscopically, they can attest to their ability to perceive this motion in 3D.
Conversely, any potential participant unable to detect those depth changes would not be accepted
for participation in the experiment.
participating in Experiment 2 to prevent learning effects. As compensation, participants were
each paid $15/hour.
Figure 5.2: Stimuli used for Experiments 1 and 2. Only the 9 stimuli in the 40, 50 and 60%
columns were used in Experiment 1. All 12 stimuli were used in Experiment 2.
5.3 Experiment 1
5.3.1 Objectives and Hypotheses
The aim of this experiment was to test the basic premise of our AR X-ray vision concept –
whether adding random dot patterns is indeed able to facilitate the perception of an incongruous
virtual object located behind a real surface. In detail, our first hypothesis (H1) was that when
virtual objects are stereoscopically rendered behind, but very close to, the real surface, the
addition of random dot patterns can lead to disambiguation of the depth order between the virtual
object and the real surface.
Expanding further upon H1, it was hypothesized that, because all portions of the virtual circle
were always visible in the image (as opposed to portions of it being occluded by the real object
surface), the participants would be biased towards perceiving the virtual circle as being closer to
the viewer in comparison with its actual geometric location, as defined by its imposed
stereoscopic disparity. In other words, whenever the virtual circle was presented, by means of
on-screen disparity, to be in front of the real surface, it was hypothesized that this would be
unambiguously perceived as such. However, whenever the circle was rendered to be behind the
real surface, we hypothesized (H1a) that it would be perceived to be closer to the surface than its
actual distance behind it.
Moreover, considering our postulate that the addition of random dot patterns can lead to
disambiguation of the depth order between the virtual object and the real surface, we predicted
that, in cases where the random dot pattern was present, participants would be more accurate in
determining the virtual circle’s position (H1b).
In addition to testing the above hypotheses, a second goal of this experiment was to determine an
appropriate depth for positioning of the virtual circle for Experiment 2, to permit compensation
for the predicted bias. In other words, our aim was to increase the probability that participants in
Experiment 2 would consistently perceive the virtual circle as being placed behind the real
surface. Therefore, both accuracy, in terms of determining the presence of any perceptual bias in
localizing the virtual circle within the vicinity of the real surface, as well as precision, in terms of
estimating the sensitivity of perceiving the location of the circle, were investigated. To this end,
the psychophysical method of constant stimuli was used (Gescheider, 2013), comprising a series
of trials in which the virtual circle was presented at different distances both in front of and
behind the real surface.
5.3.2 Procedure
After getting acquainted with the software, participants were shown a series of stimuli; for each
stimulus, they indicated whether they perceived the circle as being in front of or behind the
surface. The virtual circle was presented at 6 distances relative to the surface, three in front and
three behind. Relative to the physical setup of our experiment, the values used, all in mm, were:
{+0.2, +0.35, +0.5} in front and {-0.2, -0.35, -0.5} behind. (These distances were equivalent to
disparity angles of {-0.24, -0.49, -0.7} (in front) and {+0.24, +0.49, +0.7} (behind), in units of
arc-minutes27.) These values were selected based on pilot studies performed using the three dot
sizes {1/25, 1/50, 1/75} and the three dot densities {40%, 50%, 60%}, as well as the ‘No Pattern’
condition. The objective in choosing these particular values was to maximize the sensitivity for
identifying the associated thresholds of depth perception by emphasising values within the
expected transition zone of the resulting psychophysical functions, while avoiding any ‘floor’
and ‘ceiling’ effects associated with 100% certainty judgements, which would have been expected
had substantially larger distances in front and behind been selected.
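The conversion from depth to disparity angle, r = (d*I)/(D*(D+d)) as given in the accompanying footnote, can be sketched in Python as follows. The viewing distance (400 mm) and inter-pupillary distance (65 mm) are the values stated in Section 5.2.1; the function name and the convention that positive depth means behind the surface (matching the positive disparity angles reported for the 'behind' cases) are assumptions.

```python
import math

ARCMIN_PER_RADIAN = 180.0 * 60.0 / math.pi


def disparity_arcmin(depth_behind_mm, ipd_mm=65.0, viewing_distance_mm=400.0):
    """Disparity angle r = (d * I) / (D * (D + d)), converted from radians
    to arc-minutes. d is the depth relative to the surface (positive means
    behind), I the inter-pupillary distance and D the viewing distance."""
    r_radians = (depth_behind_mm * ipd_mm) / (
        viewing_distance_mm * (viewing_distance_mm + depth_behind_mm))
    return r_radians * ARCMIN_PER_RADIAN


for depth in (0.2, 0.35, 0.5):
    print(depth, round(disparity_arcmin(depth), 2), round(disparity_arcmin(-depth), 2))
```

With these assumed inputs, the 0.35 mm and 0.5 mm offsets evaluate to roughly 0.49 and 0.70 arc-minutes, in line with the values quoted above. The 0.2 mm offset evaluates to roughly 0.28 rather than 0.24, so the parameters behind that particular figure may differ from those assumed in this sketch.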
With 5 trials for each combination of conditions, this led to 300 trials (6 x (3x3 +1) x 5) for each
participant. The stimuli containing the random dot patterns used are shown in the first three
columns of Figure 5.2. The presentation order of the stimuli was randomized. Participants had 4
seconds to reply to each presentation. (This time limit was chosen through extensive pilot testing,
to reduce speed-accuracy trade-off effects.) If participants ran out of time for a particular
stimulus, the subsequent stimulus would appear automatically, but the missed trial would
reappear, unbeknownst to participants, later on in the experiment. This would occur as many
times as required until the participant had successfully replied within the time limit for that
stimulus.
5.3.3 Results and Discussion
Figure 5.3 shows the results obtained from Experiment 1 (for each dot size), where each curve
represents a psychophysical function fitted to the associated set of experimental data
(Gescheider, 2013). It should be recalled that only the 9 stimuli in the 40, 50 and 60% columns
of Figure 5.2 were used in this experiment. The y-axis in Figure 5.3 represents the proportion of
times that the circle was perceived as being in front of the surface, averaged over participants.
The x-axis represents the actual position of the circle relative to the surface. The dashed vertical
line indicating x=0 (mm) corresponds to the Point of Objective Equality – that is, the
27 The disparity angles were obtained from the equation r=(d*I)/(D*(D+d)) where r, d, I and D correspond
respectively to disparity angle, predicted depth, inter-pupillary distance and viewing distance (Patterson, 2009). Note
that because the units in both the numerator and denominator of this equation cancel each other, the disparity angle,
r, is obtained in radians and can be converted to units such as arc-minutes. Note as well that reporting disparity
values when presenting results has been recommended by researchers in the 3D community, since it “affords more
efficient and accurate cross-study comparisons” (McIntire, Havig, & Geiselman, 2014).
(hypothetical) case for which the circle would be placed exactly at the depth of the real surface28.
For comparison purposes, the results for the “No Pattern” condition have also been included in
the graphs for all three dot size conditions.
Looking first at the No Pattern results (the same in all three graphs), we see clearly that the Point
of Subjective Equality (PSE), defined as the interpolated intersection of each fitted
psychophysical function with the 0.5 proportion level (shown as a dashed horizontal line in
Figure 5.3) lies at 0.493 mm behind the plane of the real surface. What this means is that if the
virtual circle had actually been placed at this distance behind the real surface, participants would
have perceived it as being in front of the surface 50% of the time and as being behind the surface
50% of the time. In other words, the PSE, i.e. the hypothetical location at which participants
believed on average that the virtual object was located at the surface, was actually 0.493 mm behind
the surface (and farther from the participants). This result was thus in support of our hypothesis H1a.
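The way a PSE is read off a fitted psychophysical function can be sketched as follows. The thesis does not state which function family or fitting procedure was used, so the logistic form, the least-squares fit on logit-transformed proportions and all names below are assumptions; maximum-likelihood fitting would be the more usual choice in practice.

```python
import math


def logistic(x, pse, scale):
    """Proportion of 'in front' responses at position x (mm, + in front)."""
    return 1.0 / (1.0 + math.exp(-(x - pse) / scale))


def fit_logistic(positions, proportions, eps=1e-3):
    """Estimate (pse, scale) by ordinary least squares on the logits.
    Proportions are clipped to [eps, 1 - eps] so the logit stays finite."""
    logits = []
    for p in proportions:
        p = min(max(p, eps), 1.0 - eps)
        logits.append(math.log(p / (1.0 - p)))
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, logits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # logit(p) = (x - pse) / scale = slope * x + intercept
    return -intercept / slope, 1.0 / slope


def level_crossing(pse, scale, level):
    """Position at which the fitted function equals `level`
    (level = 0.5 returns the PSE itself)."""
    return pse + scale * math.log(level / (1.0 - level))
```

In this sketch a negative PSE corresponds to a point behind the surface, which is the direction of the bias reported for the No Pattern condition. Positions at other proportion levels follow from `level_crossing` in the same way.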
Referring now to the random dot pattern responses, for each of the relative dot sizes there do
not appear to be any obvious differences among the three dot density (DD) graphs. On the other
hand, for the DS = 1/25 graph, the PSE appears, for all three DD values, clearly to be behind the
surface, similarly to the No Pattern results. However, for the other two DS values (1/50 and
1/75), the PSE values appear to be very close to 0.
Comparing the random dot pattern psychophysical functions to those of the No Pattern condition,
one can observe that the PSE values for the two dot sizes of 1/50 and 1/75 lie closer to zero than
for the No Pattern condition. These observations suggest that, unless very large dot sizes are
used, the addition of random dot patterns can help with disambiguation of the depth order
between virtual objects and the real surfaces. This result thus supports hypothesis H1b29.
28 Note that this condition was not in fact part of the stimulus set.
29 It is worth noting that, given that these results were based on the psychophysical function, significance testing
was not feasible and, therefore, the support of H1a and H1b should not be deemed as statistically significant.
Moreover, another limitation involved with this experiment is that the psychophysical functions were fitted to data
that were too close to the Point of Objective Equality. Ideally, including stimulus distances that were both farther
behind and farther in front of the real surface would likely have resulted in more reliable estimated psychophysical
functions.
To determine the minimum distance that would ensure that the participants would ‘reliably’
perceive the virtual circle as being behind the real surface, a maximum error frequency of 25%
was chosen. Amongst the 10 conditions tested, the largest distance corresponding to the
intersection of the fitted psychophysical functions with the 0.25 proportion level belongs to the
largest relative dot size (1/25) and smallest dot density (40%), and is equivalent to 2.68 mm
behind the real surface. Therefore, for the next experiment, it was reasoned that, as long as the
displacement chosen places the virtual circle beyond this distance behind the real surface, one
could be confident that the circle would be consistently perceived as being behind the real
surface (with a maximum error frequency of 25%, for the DS=1/25, DD=40% condition, and a
much smaller error frequency for all of the other conditions). In fact, to reduce the error
frequency further, the blue virtual circle was presented even farther away, at a distance of 3 mm
(equivalent to 4.16 arc-minutes) behind the screen/surface for the next experiment30.
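The conversion from a displacement in millimetres to a disparity angle follows the relation r = (d*I)/(D*(D+d)) given in the earlier footnote. The sketch below applies it to the 3 mm displacement. The inter-pupillary distance (65 mm) and viewing distance (400 mm) are assumed values chosen only for illustration, since neither is stated in this section; with them the result comes out close to the 4.16 arc-minutes quoted above.

```python
import math


def disparity_arcmin(depth_mm, ipd_mm, viewing_distance_mm):
    """Disparity angle r = (d * I) / (D * (D + d)), returned in arc-minutes.

    The ratio is dimensionless, so r is in radians before conversion.
    """
    r_rad = (depth_mm * ipd_mm) / (
        viewing_distance_mm * (viewing_distance_mm + depth_mm)
    )
    return math.degrees(r_rad) * 60.0


# Assumed values for illustration only (not taken from this section):
# inter-pupillary distance I = 65 mm, viewing distance D = 400 mm.
r = disparity_arcmin(depth_mm=3.0, ipd_mm=65.0, viewing_distance_mm=400.0)
```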
30 Care was taken not to place the virtual circle too far behind the real surface, by confirming that this value was
within Panum’s fusional area, to ensure that binocular fusion would be maintained. To do so, pilot testing was done
to confirm that no reports of difficulty in fusing the virtual circle were made.
Figure 5.3: Psychophysical functions fitted to results of Experiment 1 for dot sizes of (a) 1/25,
(b) 1/50 and (c) 1/75.
5.4 Experiment 2
5.4.1 Objectives, Hypotheses and Procedure
As explained above, the goal of this experiment was to investigate the trade-off between
perceiving surface transparency and preserving the ability to discern
surface information. We hypothesized (H2) that, whereas on the one hand transparency should be
easier to perceive whenever random dots are added than in the No Pattern condition (H2a), on the
other hand surface information should be easier to preserve in the No
Pattern condition, for which there are no random dots to interfere with examining the content of
the surface (H2b).
We also hypothesized (H3) that increasing the dot density of the pattern would result in a
stronger impression of transparency (H3a) but a reduction in preservation of surface information
(H3b). The reasoning behind this is that, as previously explained, the black dots were expected to
give the impression of there being ‘holes’ in the surface, such that with a larger proportion of holes
in the surface, it should be easier to see through it (i.e. more perceived transparency) but harder
to retain information about the portions of the surface with the black dots.
On the other hand, it was also hypothesized (H4) that increasing the dot size (which is not the
same as increasing the dot density) should lead to a weaker sense of transparency (H4a), since
larger dots will yield a smaller number of dots (or holes) on the surface to be seen through.
Moreover, those larger chunks of coherent surface information being occluded by the pattern
were expected to lead to a reduction in surface information preservation (H4b).
To investigate these hypotheses, the experiment was conducted in two consecutive sections (1
and 2). For both sections, the blue virtual circle was presented at a constant disparity angle of
4.16 arc-minutes, as explained above. The independent parameters, illustrated in Figure 5.2, were
three relative dot sizes {1/25, 1/50, 1/75} and four dot densities {40%, 50%, 60%, 70%}, as well
as the ‘No Pattern’ condition.
It is worth pointing out some more of the important differences between the current experiment
and an earlier set of related experiments reported by our team (Otsuki & Milgram, 2013). In that
earlier experiment, although a similar psychophysical test was administered, there was no
attempt to employ it to compute an effective location for the virtual object for their subsequent
investigation of perceived transparency. This resulted in their placement of the virtual object too
close to the real surface to act as a reliable stimulus for exploring the transparency effect in their
investigation of the incongruous condition. In addition, the surface used in that experiment
contained no texture, which, in addition to the fact that it was simulated rather than real, made it
somewhat less realistic. Finally, there was no attempt in that experiment to explore the ability to
discern surface information, and thus to explore the hypothesized trade-off explained below.
5.4.1.1 Section 1: Perception of Surface Information
Section 1 of Experiment 2 aimed to assess the effect of the random dot pattern parameters in
terms of any potential loss of surface information. Since the surface, by itself, did not contain
any specific information to be preserved, there was a need to add elements onto the surface.
These additional elements were covered by the random dots just as any other surface containing
such elements would be. (An example of this, once again, could be the surface of an organ
containing visible vessels.) To investigate how much information was lost due to the addition of
the random dot patterns, a shape matching task was designed, to evaluate participants’ accuracy
in identifying information presented on the real object surface when covered by different random
dot patterns. To accomplish this, each real surface was modified by adding to it a pair of
concentric yellow shapes – either two circles or a circle and an ellipse – after which the random
dot patterns were added31. As shown in the example of Figure 5.4(c), this means that the black
dots occluded different parts of the yellow shapes in different ways, depending on the particular
random pattern, just as they occluded the rest of the surface. (Note that, although the blue virtual
circle was still present for the surface information task, and was rendered behind the real surface,
it did not play any role in the shape matching task.)
The outer yellow shape for this task was always a circle. The inner yellow shape, however, had a
30% probability of also being a circle (Figure 5.4(a)) and a 70% probability of being an ellipse
(Figure 5.4(b)). The task was to determine, within 6 seconds, whether the inner yellow shape
was also a circle, like the outer circle, or whether it was an ellipse – that is, not a circle32.
To help participants do the shape matching task, they were advised during their training to
visually scan the whole image to examine the separation between the inner yellow shape and the
outer yellow circle. In other words, if the two shapes appeared to be equally separated from each
other around their circumferences, it was logical to conclude that they were both circles, whereas
if the separations appeared to vary, the conclusion should be that one shape was an ellipse. It
should be noted that, because we wanted this to be a relatively difficult task, the ellipses were
31 It should be noted that, although the yellow shapes were digitally added to the surface (and not, specifically,
captured by a sensor), they were meant to be considered as a ‘real’ feature present on the real object’s surface.
32 Should the reader, after examining Figure 5.4, be of the opinion that this was a difficult task, that was exactly the
intention!
designed to have very small eccentricities33. As can be seen in Figure 5.4(a) and Figure 5.4(b),
the difference between the two surface accuracy conditions was very slight.
Keeping in mind our overriding goal of evaluating whether an observer would be able
holistically to examine large parts of a real surface while employing our stereoscopic AR
display, we made the task even more difficult by preventing participants from focusing on only
one specific region of the stimulus. To accomplish this, the orientation of the major axis of each
ellipse was varied randomly and, in addition to pronouncing whether or not any particular
stimulus was an ellipse, participants were also asked to identify the direction of the major axis of
that perceived ellipse. (This was also intended to reduce the likelihood of guessing the
responses.) The orientations could possess any value from 0 to 180º, with 18º intervals, resulting
in 10 possible orientations. If participants perceived the inner yellow object as a circle, they
would press the ‘up’ arrow. On the other hand, if they perceived the inner object as an ellipse,
they were asked to indicate, using the numeric keypad, which of the 10 orientations of the major
axis of the ellipse they had observed, according to the response selection scheme presented to
them, as depicted in Figure 5.5.
For each combination of dot size (DS) and dot density (DD), as well as for the No Pattern
condition, 10 trials were randomly presented to each participant, of which 7 were ellipses (with a
10% chance for each orientation, unbeknownst to them) and 3 were circles. This led to a
minimum of 130 trials ((3x4 +1) *10) for each participant. The presentation order of the stimuli
was randomized. None of the shape matching conditions occurred more than once.
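The trial structure described above can be sketched as follows. This is only an illustration under stated assumptions, not the software used in the experiment: orientation levels are taken to be 0 to 9 with level k standing for k × 18º, the seven ellipse orientations within a condition are assumed to be drawn without repetition, and the function and variable names are hypothetical.

```python
import random

DOT_SIZES = ["1/25", "1/50", "1/75"]
DOT_DENSITIES = [40, 50, 60, 70]   # percent
N_ORIENTATIONS = 10                # assumed levels 0..9, level k = k * 18 degrees


def build_trials(rng):
    """Return a shuffled list of trial tuples for one participant.

    Each tuple is (dot_size, dot_density, shape, orientation_deg).
    """
    conditions = [(ds, dd) for ds in DOT_SIZES for dd in DOT_DENSITIES]
    conditions.append(("No Pattern", None))
    trials = []
    for dot_size, dot_density in conditions:
        # Assumption: the 7 ellipse orientations are drawn without repetition.
        for level in rng.sample(range(N_ORIENTATIONS), 7):
            trials.append((dot_size, dot_density, "ellipse", level * 18))
        for _ in range(3):
            trials.append((dot_size, dot_density, "circle", None))
    rng.shuffle(trials)  # randomized presentation order
    return trials


trials = build_trials(random.Random(0))
n_ellipses = sum(1 for t in trials if t[2] == "ellipse")
```

With 13 conditions and 10 trials per condition, the list contains the 130 trials mentioned above; trials repeated after a timeout would be appended on top of this minimum.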
The parameter values for the experiment – namely eccentricity, number of response angles, time
limit duration – were selected on the basis of extensive pilot testing.
In trials where participants ran out of time, the experiment would automatically move on to the
next stimulus, and the missed trial would be presented again later in the experiment, as many times
as required, until the participant had responded to all stimuli within the time limit.
33 In fact, the ellipses were not obtained according to the formal definition of eccentricity; rather, the ‘ellipses’ were
obtained by multiplying the x-axis of a corresponding circle by a factor of 0.95.
To motivate participants during the experiment, a lottery with a $50 gift card prize was
held after all experiments were done. The participants were informed that the number of
lottery ballots assigned to their name would be proportional to their respective performance
scores.
Figure 5.4: Samples of stereo pairs illustrating the shape matching task for assessment of surface
information. (a) The inner and outer yellow objects are both circles. (b) The inner yellow object
is an ellipse. (a) and (b) constitute the No Pattern condition. (c) Example of task with random dot
pattern present, and where inner yellow object is an ellipse. The orientations of the major axes of
the ellipses in (b) and (c) are 54º (corresponding to level 3) and 144º (corresponding to level 8),
respectively.
Figure 5.5: Options for designating the orientation of the major axis in ellipse conditions. This
image was provided as a guide for assisting participants in selecting their responses to the ellipse
axis orientation questions, in the form of numerals 0 to 10 on the computer keypad.
For analysis purposes, Signal Detection Theory (SDT) was used for assessing performance on
distinguishing circles from ellipses. In addition, the absolute offset errors in detecting the
orientation of the major axis of the ellipse (using the numerical responses shown in Figure 5.5)
were averaged within each condition. In line with the hypotheses presented at the beginning of
this section, performance on the surface identification task was expected to decrease as dot
density and dot size increased. In particular, it was hypothesized that d’ values,
which are indicative of detection sensitivity, would decrease, while average absolute offset errors
would increase. The reasoning behind these hypotheses (H3b and H4b) was that, as relatively
greater portions of the yellow objects were covered by dots, it would be more difficult to perform
the shape matching task. For obvious reasons, the No Pattern condition was expected to result in
the highest sensitivity and lowest average offset error, since the yellow shapes were completely
unobstructed (hypothesis H2b).
5.4.1.2 Section 2: Impression of Surface Transparency
Section 2 of the experiment, which was administered to the same participants directly following
completion of Section 1, focused on exploring the relative effectiveness of the random dot
pattern parameters for creating the perception of transparency. Prior to starting this section of the
experiment, the purpose of the research and the concept of ‘transparency’ in the present context
were explained and demonstrated to participants. In particular, they were instructed that they
would be shown a set of images similar to that illustrated here in Figure 5.2, in each of which the
blue wireframe circle should appear to them to be located behind the portion of the textured
purple surface containing a random dot pattern. They were also told that, due to the manner in
which the display had been created, it was likely that they would perceive the textured purple
surface as being transparent34, and that the goal of this part of the experiment was to explore the
manner in which they perceived this transparency effect.
Because we did not consider it feasible to estimate in a direct and objective way how participants
would be able to perceive ‘transparency’ in the present context, we instead deemed Thurstone’s
classical method of paired comparison scaling (Thurstone, 1927) to be the most viable means of
achieving this end. During the data gathering phase, participants were presented with all possible
pairs of the images shown in Figure 5.2 (plus the No Pattern condition), two at a time. They had
unlimited time to examine each pair of images and to respond to the question: “In which image
is the impression of transparency more convincing?” The 13 different conditions (3 dot sizes x 4
dot densities + no pattern condition) resulted in 78 paired comparisons for each participant,
which were then aggregated over all participants and transformed into an (equal interval) scale of
Transparency Ratings (TR).
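To make the scaling step more concrete, the sketch below shows one common way of turning aggregated paired-comparison counts into an equal-interval scale, namely Thurstone's Case V solution. Whether Case V was the variant actually applied here is an assumption, as are the clipping of extreme proportions and the three-item win counts, which are hypothetical and serve only to exercise the function.

```python
from itertools import combinations
from statistics import NormalDist


def thurstone_case_v(wins, n_judgements, clip=0.01):
    """Scale values from a paired-comparison win-count matrix (Case V).

    wins[i][j]   -- number of times item i was preferred over item j
    n_judgements -- number of judgements collected for each pair
    clip         -- proportions are kept within [clip, 1 - clip] so that
                    z-scores stay finite (an assumption of this sketch)
    """
    n = len(wins)
    norm = NormalDist()
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # the diagonal contributes z = 0
            p = min(max(wins[i][j] / n_judgements, clip), 1.0 - clip)
            z_sum += norm.inv_cdf(p)
        scale.append(z_sum / n)
    lowest = min(scale)
    return [s - lowest for s in scale]  # anchor the lowest item at zero


# Number of distinct pairs among the 13 conditions.
n_pairs = len(list(combinations(range(13), 2)))

# Hypothetical counts for three items, 20 judgements per pair.
wins = [
    [0, 4, 2],
    [16, 0, 8],
    [18, 12, 0],
]
scale = thurstone_case_v(wins, n_judgements=20)
```

Because the resulting values form an interval scale, only differences between them are meaningful; shifting the lowest item to zero is a matter of convenience.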
It should be pointed out that the question presented to participants was designed such that, rather
than asking directly about the perceived ‘degree’ of transparency, the relative strength of their
impression about transparency was instead being questioned. It is also important to realize that
there is no real zero on the equal interval scale of values resulting from this procedure, such that
34 Note that, as explained earlier, we avoided using the term ‘translucency’ for this experiment, based on our
presumption that participants might be confused by that term.
high or low comparative impressions of transparency do not necessarily translate to high or low
absolute ratings of degree of transparency.
Based on previous findings (Otsuki & Milgram, 2013), it was hypothesized that larger dot
densities and smaller dot sizes would lead to higher ratings for impression of transparency
(hypotheses H3a and H4a respectively). One explanation for this is that the black dots in the
random dot pattern were postulated to be perceived as holes in the surface, such that, by
increasing dot density, the increased proportion of perceived holes should lead to a stronger
sense of transparency. On the other hand, it was surmised that increasing the dot size would lead
to a weaker sense of transparency, since larger dots (at the same dot density) result in a smaller
number of perceived holes on the real surface. By the same reasoning, the No Pattern control
condition, which presents no holes at all, was expected to yield the lowest transparency rating
(hypothesis H2a).
It should be noted that the extra 70% dot density conditions that were added to Experiment 2
were a result of pilot tests, which led to the prediction that including these conditions would
potentially provide a better manifestation of the expected trade-off, explained in the next
sub-section.
5.4.1.3 Hypothesized Trade-offs
Before examining the results of the experiment, it is important to understand the relationship
between the various hypotheses presented for the two sections. Figure 5.6 summarizes those
respective hypotheses and illustrates our a priori expectation about the relationship between
them. The primary message to be extracted from Figure 5.6(a) is the trade-off between what we
believe to be the two primary objectives of augmented reality X-ray vision: effectively
presenting the impression of a virtual object (in this case the blue circle) being inside of a real
object (i.e. effectively equivalent to conveying the impression of surface transparency) while
concurrently maintaining the ability to observe and understand any pertinent information (in this
case the yellow circle and/or ellipse) on the surface of that real object (i.e. perception of surface
information). Figure 5.6(b), on the other hand, suggests that smaller dots should always
both strengthen the impression of surface transparency and better retain surface
information. The results presented in the following section should be read in light of these two
sets of hypotheses.
Figure 5.6: Schematic illustration of hypotheses for both parts of Experiment 2. (a) effect of dot
density (H3); (b) effect of dot size (H4).
5.4.2 Results and Discussion
As mentioned, to assess participants’ performance in detecting ellipses, signal detection theory
(SDT) was used, where the occurrence of an inner ellipse was considered a “signal” event, and a
“hit” occurred whenever an ellipse was correctly detected as an ellipse35. To obtain a set of
average performance data over all participants, hits and false alarm rates were aggregated across
participants and then used to estimate the two collective SDT parameters, d’ and beta, for each
condition. The d’ results for different dot sizes and dot densities are shown as solid lines in
Figure 5.7.
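For reference, the two SDT parameters can be computed from aggregated response counts using the standard equal-variance Gaussian formulas, as in the minimal sketch below. The counts are hypothetical and do not correspond to any condition of the experiment; the function name is likewise illustrative.

```python
from math import exp
from statistics import NormalDist


def sdt_parameters(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) under the equal-variance Gaussian model.

    A "signal" trial is one with an inner ellipse; a "hit" is an ellipse
    reported as an ellipse. Rates of exactly 0 or 1 are not handled here
    and would raise an error in inv_cdf.
    """
    norm = NormalDist()
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z_hit = norm.inv_cdf(hit_rate)
    z_fa = norm.inv_cdf(fa_rate)
    d_prime = z_hit - z_fa                       # sensitivity
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2.0)   # likelihood-ratio criterion
    return d_prime, beta


# Hypothetical aggregated counts for a single condition.
d_prime, beta = sdt_parameters(
    hits=70, misses=35, false_alarms=15, correct_rejections=30
)
```

A d’ of 0 corresponds to equal hit and false alarm rates, that is, chance-level discrimination between circles and ellipses.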
35 Although there were 10 possible response angles (i.e. orientations) for the elliptical signal conditions, it is
important to note that these were all considered as having equivalent signal strengths. In other words, our
assumption was that there was one single value of d’ for the signal present case, rather than 10 different signal
strengths.
Figure 5.7: d’ and Transparency Rating (TR) results obtained from Experiment 2. The solid
lines join the d’ results, corresponding to the left hand axis, while the dashed lines join the
transparency ratings (TR), corresponding to the right hand axis. The yellow horizontal lines
correspond to the No Pattern condition.
The transparency rating (TR) measures are also included in Figure 5.7, as dashed lines. The No
Pattern condition results supported our hypothesis (H2a) of having the lowest TR value. (For
convenience, this condition was assigned a value of zero on the scale derived from the paired
comparison data.) However, among the remaining TR values we were unable to identify a clear trend
across either dot sizes or dot densities of the kind predicted in Figure 5.6, so that hypotheses H3a
and H4a could not be confirmed. Comparing these results to those of Otsuki and Milgram (2013), who carried out an
analogous test that included DD=25%, in comparison with DD=50%, it is suspected that
designing our experiment with lower dot densities (<40%) might have allowed us to observe the
hypothesized increasing trend of TR values with increased dot density, as depicted in Figure 5.6.
Nevertheless, the substantial difference between the TR value for the No Pattern condition and
the TR values for the pattern conditions in support of hypothesis H2a demonstrates at least to
some extent the potential effectiveness of this method for creating the percept of transparency.
With regards to discerning surface information, it was hypothesized that with increases in both
dot density and relative dot size, performance on the detection task should decrease (hypotheses
H3b and H4b). This appears to have been supported by the results shown in Figure 5.7, where
the d’ values do in fact decrease with increases in both DS and DD. However, it is important to
note that ‘good performance’ is manifested in Figure 5.7 by d’ values in the vicinity of 1,
whereas d’ values in the vicinity of 0 (and below) represent essentially chance performance. In
addition to implying that the difficulty of the shape matching task may have been too high, this
suggests that this observed trend may not be that strong. On the other hand, the No Pattern
condition conforms to the expectation of yielding the highest d’ value (hypothesis H2b).
The averages of the absolute offset errors for the ellipse orientation task were plotted as a
function of dot size and dot density (Figure 5.8). It is worth noting that these offset errors
correspond only to the cases in which participants correctly judged the presence of the ellipse. As
can be seen, dot density does not seem to affect these errors in a meaningful way. Dot size,
however, does seem to have had an effect on the error, with the largest dot size (1/25) leading to
smaller mean offset errors, even when compared to the No Pattern condition. To check the
significance of this apparent finding, a two-way ANOVA was carried out, followed by post hoc
tests. Results showed that average offset errors were indeed significantly affected by the dot size,
F(2,28)=16.37, p<.0001 but not by dot density, F(3,42)=0.329, p>.05. Contrasts revealed that
average offset errors for the 1/50 dot size, F(1,14)=18.55, and the 1/75 dot size, F(1,14)=22.34,
were significantly larger than those of the 1/25 dot size.
Figure 5.8: Mean absolute offset errors as a function of dot size and dot density. The orange
horizontal line corresponds to the No Pattern condition.
This interesting finding may initially seem to contradict the SDT results, which showed d’ values
reflecting essentially chance performance for the 1/25 and 1/50 dot size conditions. However,
referring to the fact that the offset errors correspond only to the cases in which participants
correctly judged the presence of the ellipse, this makes sense. In other words, it seems that it was
in cases where the larger black dots (with DS=1/25) did not occlude the intersection of the major
axis of the ellipse with the outer circle that participants were able to both correctly identify the
ellipse and be more accurate in determining its orientation. The smallest dot size (DS=1/75), on
the other hand, provided a better holistic representation of the surface information (resulting in
larger d’ values) without preserving the more detailed information (resulting in larger offset
errors). This finding suggests that, in cases where the location of the most essential surface
information is known, if it is feasible to find a random dot pattern that does not occlude this part
of the surface, X-ray vision should be achieved using a larger dot size.
5.5 Contributions, Limitations and Conclusions
Results from this set of experiments showed that the use of random dot patterns can be effective
in contributing to the percept of transparency of real surfaces in 3D AR displays, with expected
relevance towards X-ray vision applications. In particular, the main contributions of these
experiments are:
• Random dot patterns (with appropriately designed dot sizes) were shown to be a
potentially effective method for disambiguating the depth order between virtual objects
and real surfaces with textures that lack 3D textural elements.
• By appropriately controlling the relative dot size and dot density of the patterns, it should
be possible to retain sufficient information about the real surface to enable a user to
observe a virtual object presented inside a real one while concurrently
examining the surface of the real object.
It is important, however, to point out that the experiments presented here were limited to the use
of a flat real surface with a 2D texture, and to a 2D wireframe virtual object being presented in
depth. Although such objects are easy to manipulate digitally, such conditions may be rare in
actual AR applications. For example, taking the medical domain as an important target
application, real objects are 3D organs that usually consist of convex surfaces. Such conditions
justify the need to determine whether the results observed for this flat real surface will also
pertain to convex real 3D surfaces. It would also be interesting to investigate the applicability of
these findings to cases where the virtual wireframe is presented in 3D.
Furthermore, the results of these experiments, which confirmed the potential of using random dot
patterns for improving depth order perception, serve as motivation to investigate this effect also
for improving absolute depth judgements. The conclusions thus provide the justifications and
framework for Experiment 3, which is the topic of the next chapter.
Chapter 6
Experiment 3: Effect of Using Random Dot Patterns for Improving Accuracy of Depth Judgements
As results from the previous set of experiments showed, the addition of random dot patterns to
real object surfaces can be effective in causing the virtual object to be perceived as being behind the real
surface, which achieves the notion of X-ray vision. However, as described in Section 4.1, based
on theory, the addition of random dot patterns should not only allow for proper depth order
perception but it should also allow for more accurate absolute depth judgements. To investigate
this possibility, an experiment was designed in which participants were asked to judge the
absolute depth of a virtual object relative to a real surface. This chapter presents a detailed
description of this experiment and its results.
6.1 Purposes
As mentioned, the primary purpose of this experiment was to investigate whether the addition of
random dot patterns can lead to improvements in the accuracy of absolute depth judgements
between the virtual object and the real surface. As may be recalled, the reasoning was that, by
adding random dot patterns to a real surface, we are able to provide observers with distinct
fixation points (being the edges of the dots), thus guiding them in making vergence eye
movements (between the virtual object and the real surface) and in making better depth
judgements (based on confirmatory information provided by the convergence cue). Therefore,
another goal of this experiment was to test this theory, by manipulating the distinctiveness of the
dots. Moreover, since developing a practically usable X-ray display requires a measure of the
user’s assessment about the difficulty of performing a depth judgement task, the other goal of
this experiment was to investigate this subjective difficulty.
Therefore, this experiment was designed to answer the following questions:
• Can the addition of random dot patterns lead to increased accuracy of absolute depth
judgements between the virtual object and real surface for non-flat surfaces?
• If so, is it true that the resulting increased accuracy of depth judgements is because
random dot patterns provide distinct edges?
• Does the addition of random dot patterns lead to an increase or decrease in the subjective
difficulty of performing a particular depth judgement task?
• Can design guidelines be formulated to assist in determining the dot size and dot density
of random dot patterns that achieve optimal depth judgement accuracies?
In the next section we provide a description of the experimental method that was used to address
the above questions.
6.2 Experimental Method
The same experimental platform described in the previous chapter was used to carry out this
depth judgement experiment. This section describes in detail the stimuli generation and
presentation, the experimental task, the procedures that were followed, as well as the
experimental hypotheses. (In the process of designing this experiment, several pilot studies were
performed; some of the key lessons learned from these pilot studies are presented in Appendix
B1.)
6.2.1 Image Generation and Presentation
An example of the stimuli used in the experiment is shown in Figure 6.1. The stimuli consisted
of a 3D image comprising several different parts, as shown in Figure 6.2. In this section, we
delve into the specific details of each of these parts.
Figure 6.1: Sample stereo pair of stimuli shown to participants. The pattern used in this example
consisted of random dots with sizes of 1/75 and distributed with 40% dot density. Four tenths of
the blue virtual truncated cone is presented behind the surface of the bin. For guidance on how to
fuse these images, see explanation provided in caption of Figure 1.3.
Figure 6.2: Diagram presenting different parts of the stimulus.
6.2.1.1 Real Object
For the real object, a circular dustbin was used. The reason for choosing a curved surface was to
investigate the effectiveness of our idea for non-flat (convex) surfaces, due to the fact that real
world objects, specifically in the medical domain, are rarely 2D, let alone flat. The inside
diameter of the bin was 43 cm. Since one of our claims is that our method is most effective for
real object surfaces without a prominent visible texture, the cylinder was covered with white
cardboard to ensure that this condition was met. Such a surface can be considered analogous to
the smooth surface of an organ, which does not contain distinct elements.
6.2.1.2 Random Dot Patterns
The random dot patterns were generated using the MATLAB function ‘rand’. As a means of
circumventing the challenge of writing software to superimpose the random dot patterns
digitally, the random dot patterns were projected onto the surface of the bin using an AAXA
P4-X pico projector. The distance of the projector from the bin was such that the projected pattern
was a 30 cm x 30 cm square. The position of the projected pattern remained constant throughout
all conditions. Throughout the experiment, dot size (DS) and dot density (DD) of the patterns
were varied. Recalling that, based on our definition, dot size refers to the fraction into which
each dimension is divided, dot sizes of 1/25, 1/50 and 1/75 corresponded to squares with 12, 6
and 4 mm sides, respectively.
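The relation between relative dot size, physical dot size and dot density can be illustrated with a short sketch. It is written in Python rather than MATLAB, and the thresholding of independent uniform random numbers against the dot density is an assumed generation scheme; the text states only that MATLAB's ‘rand’ function was used. The function names are hypothetical.

```python
import random


def dot_side_mm(pattern_side_mm, divisions_per_side):
    """Side length of one dot when each dimension is split into equal cells."""
    return pattern_side_mm / divisions_per_side


def random_dot_pattern(divisions_per_side, dot_density, rng):
    """Square grid of booleans; True marks a black dot.

    Assumption: each cell is made black independently with probability
    `dot_density`, so the realised density only approximates the target.
    """
    return [
        [rng.random() < dot_density for _ in range(divisions_per_side)]
        for _ in range(divisions_per_side)
    ]


# Dot sides for the 30 cm x 30 cm projected pattern (300 mm per side).
sides = [dot_side_mm(300, n) for n in (25, 50, 75)]

# Example: DS = 1/75 at a target dot density of 40%.
pattern = random_dot_pattern(75, 0.40, random.Random(0))
realised_density = sum(cell for row in pattern for cell in row) / (75 * 75)
```

If an exact proportion of black cells were required, the cells could instead be shuffled from a list containing the desired number of black entries; which approach was taken is not stated in the text.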
Moreover, since one of the main purposes of this experiment was to investigate whether the
distinct edges of the random dot patterns were used as fixation points to guide observers in
making vergence eye movements and, therefore, more accurate depth judgements, each random
dot pattern was projected as either a sharp or a blurry image36. To blur the random dot patterns,
the Gaussian blur filter of Photoshop (with a blurring radius of 20 pixels) was used.
With 3 dot sizes, 3 dot densities and 2 sharpness conditions (sharp vs. blurry), 18 different patterns in total
were projected onto the bin. Figure 6.3 and Figure 6.4 show monoscopic versions of the stimuli
with the sharp and blurry patterns, respectively. In addition to the stimuli presented in
Figure 6.3 and Figure 6.4, a ‘No Pattern’ condition was also presented.
36 It should be noted that eye tracking devices are necessary to investigate this phenomenon closely and definitively.
Blurring the patterns only allows for obtaining preliminary evidence for the possibility of this phenomenon
occurring.
Figure 6.3: Stimuli with sharp random dot patterns used for Experiment 3.
Figure 6.4: Stimuli with blurry random dot patterns used for Experiment 3.
6.2.1.3 Virtual Object
A sample of the stereo images used in the experiment is presented in Figure 6.1. The virtual
object was a truncated wireframe cone, with its top surface (smaller circle) placed closer to the
observer than its base (larger circle). Figure 6.5 depicts the front, side and top views of the
truncated cone. The decision to employ a 3D rather than a 2D virtual object was to investigate
the feasibility of extending the application of random dot patterns for achieving X-ray vision to
3D virtual objects. Additionally, using a 3D virtual object allowed for testing the accuracy of
depth judgements.
Figure 6.5: Front, side and top views of wireframe truncated cone (the virtual object) and
cylindrical bin (the real object).
Generally, in applications of stereoscopic AR, the 3D images taken from the real world are often
processed for purposes of camera calibration and to obtain depth maps. These depth maps are
then used to render a virtual object at a specific depth relative to the real object(s). However,
doing so requires a certain computational capacity (both software and hardware). Considering
that the focus of this PhD thesis was to investigate the human factors side of this approach rather
than its technical implementation, it was decided to simulate these conditions, without sacrificing
the validity of the obtained results, using real physical models. Therefore, to generate the virtual
object and render it at its appropriate depth a real model was used.
The steps to generate the virtual object were as follows:
a) Two concentric circles with different diameters were drawn on paper, cut out and
attached to the tips of a rod. Two perpendicular diameter lines were also drawn on the
circles. An illustration of this is shown in Figure 6.6. The rod was placed such that the
circles were perpendicular to the line of sight. A stereo image was taken of the circles
using a Fujifilm FinePix REAL 3D W3 stereo camera.
b) The left and right images were imported into Photoshop. For each pair, using the Custom
Shape Tool, rings were drawn onto the two circles. Corresponding points on the diameter
lines of the two circles were also connected using the Line Tool (with a thickness of 3
pixels). Once this virtual truncated cone was created, the rest of the image was deleted by
selecting the Inverse of the coloured truncated cone and saving the image as a .PNG file.
This sequence of steps is illustrated in Figure 6.7.
Figure 6.6: Diagram showing real model used for generating virtual truncated cone. (It should
be noted that, as mentioned, this model consists of concentric circles. However, since the image
shows this model from the side, these circles appear in this figure as ellipses.)
Figure 6.7: Sequence of steps taken to generate virtual truncated cone. As expected, the
connecting rod between the two circles cannot be seen in these images, as it lies along
the line of sight.
As explained below, the task involved estimating the distances between the proximal and distal
circles, as well as their distances to the surface of the cylinder, as shown in Figure 6.5. To
prevent participants from using the relative size depth cue in this task, two different truncated
cone lengths (the distances between the two circles) were presented, by using either a 15 or a 17
cm rod. In addition, the sizes of the base and top circles were also changed between trials, by
randomly varying the diameters of the blue circles drawn in Step 2 (shown in Figure 6.7).
For the experiment, we wanted to present the truncated cone with its base at 6 different depths: at
the surface (of the bin); two tenths, four tenths, six tenths and eight tenths of its length behind
the surface; and completely behind the surface (with the proximal surface of the truncated cone
touching the surface of the bin). The required stimuli are illustrated (schematically) in Figure 6.8.
Figure 6.8: Schematic top view diagram of rod with circles placed at 6 different depths relative
to the surface of the bin. The black numbers noted on the rod are indicative of the proportion of
the truncated cone that was placed behind the bin’s surface. (The light blue lines joining the
circles represent the sides of the final virtual truncated cone.)
To obtain the necessary stimuli, these placements had to be reproduced physically with the real model. To render
the virtual object at its correct depth relative to the surface of the bin, the following steps were
taken:
a) The bin was placed at a fixed location on a table and in front of the Fujifilm stereo
camera, which was fixed on a stand at a distance of 138 cm from the surface of the bin.
For each random dot pattern (as well as the No Pattern condition), a stereo image was
taken of the bin (as shown in Figure 6.3 and Figure 6.4). Once this was done, the bin was
removed from the scene. However, the location of the surface of the bin was marked on
the table.
b) Once the bin was removed, the rod connecting the two circles was put in its place in the scene.
A general illustration of the replacement of the bin with the rod connecting the circles is
shown in Figure 6.9. To obtain the virtual object images needed for the stimuli illustrated
in Figure 6.8, the length of each connecting rod (between the two concentric circles) was
divided into 5 equal sections: 3 cm each for the 15 cm rod and 3.4 cm each for the 17 cm rod.
Separate stereo images were taken of the model (as shown in Step 1 of Figure 6.7) for a
series of locations of the larger pink circle relative to the surface of the bin in order to
obtain images of the 6 conditions illustrated in Figure 6.8.
c) The stereo photos taken of the bin and of the model of the virtual object were split to left
and right images, and each of these went through the process illustrated in Figure 6.7.
d) Each left image of the virtual object was overlaid on top of each of the left images shown
in Figure 6.3 and Figure 6.4. The same was done for right images. This was done by
using the Apply Image option in Photoshop.
A sample of the resulting final stereo image is presented in Figure 6.1.
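As a rough check on the spacing described in step (b), the sketch below lists how far the base (larger circle) of the model would sit behind the marked bin surface for each of the 6 depth conditions. The function name is hypothetical, and the offsets are derived only from the rod lengths and the 'tenths behind' fractions given above; they are not measurements reported in the thesis.

```python
from fractions import Fraction

TENTHS_BEHIND = (0, 2, 4, 6, 8, 10)  # the 6 depth conditions of Figure 6.8


def base_offsets_cm(rod_length_cm: Fraction) -> list[Fraction]:
    """Distance of the base (larger circle) behind the bin surface, one value per condition."""
    return [rod_length_cm * Fraction(tenths, 10) for tenths in TENTHS_BEHIND]


if __name__ == "__main__":
    for length in (Fraction(15), Fraction(17)):
        offsets = [float(offset) for offset in base_offsets_cm(length)]
        print(f"{float(length):.0f} cm rod: {offsets}")
```

Consecutive offsets differ by one fifth of the rod length, which matches the 3 cm and 3.4 cm steps quoted for the two rods.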
Figure 6.9: Schematic diagram showing the camera setup with respect to the bin (above), which
was replaced by the rod connecting the circles (below), which was placed along the red dashed
line (marking the surface of the bin’s location).
Once the images were generated, they were rendered stereoscopically on a desktop computer
(Windows 7 Professional OS with NVIDIA Quadro 600) using software coded in MATLAB. The stimuli
were presented to participants with a size of 15 cm by 23 cm on a 23-inch LCD screen (ASUS
VG236HE, 1920 x 1080 resolution, 120 Hz refresh rate). Stereo images were observed using the
NVIDIA 3D vision system with 3D Vision 2 glasses. The participants’ task was to determine,
within a limited amount of time, the fraction of the truncated cone that was perceived to be
behind the cylinder’s surface.
6.2.2 Participants
Fifteen students from the University of Toronto were recruited, all 18-49 years old (9 male and 6
female). Participants of Experiments 1 and 2 were precluded from participating in Experiment 3
to prevent learning effects. All participants either had normal visual acuity or used corrective
devices during the experiments to achieve normal visual acuity. To confirm the absence of any
stereoscopic vision problems, the NVIDIA 3D stereo vision test was administered. As
compensation, participants were each paid $15/hour. To motivate participants during the
experiment, a lottery with a $50 gift card prize was carried out after all experiments were done.
The participants were informed that the number of lottery ballots assigned to their names would
be proportional to their respective performance scores.
6.2.3 Procedure
After taking the stereo vision test, participants were given an information sheet outlining the
details of the experiment. They were then given the consent form to sign, which was followed by
a brief questionnaire, asking about their age, gender, use of corrective lenses and ability to
perceive stereoscopically. Copies of these are included in Appendix A3.
Once these steps were taken, 6 different samples of stimuli similar to the one shown in Figure
6.1 were presented to the participant one by one. During the first sample presentation,
participants were asked to describe what they saw. Once it was confirmed that they were seeing a
truncated cone that was partially behind the surface of a bin covered by random black dots, the
procedure of the experiment was further explained to them by showing the remaining 5
examples.
Participants were then taken through a brief training session (consisting of 6 trials) which
allowed them to become familiar with the experimental software. In contrast to the preceding
examples, however, no feedback was provided to participants. During this training session (as
well as the actual experiment), a chinrest was used to ensure a fixed viewer-to-display distance
of 40 cm.
Each trial consisted of a 2-second presentation of a stimulus. The stimulus would then disappear
and participants would be prompted with a screen asking for a response to “Determine what
proportion of the truncated cone (x-tenths) is behind the bin's surface.” Because we realised that
it might be easy to confuse our request to estimate the proportion behind the surface with a
potential request to estimate the proportion in front, a visual guide, printed on a sheet of paper
and shown in Figure 6.10, was placed next to the monitor. It is important to point out that, even
though the virtual object was presented at 6 discrete depths (0, 2, 4, 6, 8, and 10 tenths behind the
surface of the bin), participants received no feedback and were thus unaware of this constraint. To
obtain a finer resolution in the measured errors, we allowed their responses to take on
any of the 11 integer values from 0 to 10. After choosing their response from the numbers
on top of the keyboard, participants would press Enter.
They were then prompted to answer a second question that asked: “On a scale of 1 (easiest) to 4
(most difficult), how difficult did you find the task?” The procedure to answer this question was
the same as before, using the numeric keyboard. Participants had unlimited time to answer both
questions. The rationale for including this question was to obtain a measure of participants’
assessment about the difficulty of the task, which is an important factor in developing a
practically usable X-ray display. The correspondence between these ratings and the accuracy of
the depth judgement task was also meant to provide a measure of the consistency between
participants’ objective performance and subjective experience. In other words, asking this
question was meant to provide us with information about whether more accurate responses were
associated with lower subjective difficulties and vice versa.
Figure 6.10: Guide placed next to monitor for participants’ reference during experiment.
Once the training was done, the actual experiment began. With 19 display conditions (3 dot
densities x 3 dot sizes x 2 blur (sharp and blurry) levels + 1 No Pattern condition) and 30 task
conditions (6 depths x 5 repetitions of each), participants went through a total of 570 (19x30)
trials. On average, the experiment took about 50-60 minutes, during which they were given 10
minutes to take one break.
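The trial count above can be reproduced by enumerating the design. The sketch below is a minimal illustration with hypothetical names; in particular, the fully randomised presentation order is an assumption, since the ordering of trials is not described in this section.

```python
import itertools
import random

DOT_SIZES = ("1/25", "1/50", "1/75")
DOT_DENSITIES = (20, 40, 60)           # percent
BLUR_LEVELS = ("sharp", "blurry")
DEPTHS = (0, 2, 4, 6, 8, 10)           # tenths of the cone behind the bin surface
REPETITIONS = 5


def build_trials(seed: int = 0) -> list[tuple]:
    """Return every (display condition, depth, repetition) exactly once, in shuffled order."""
    displays = list(itertools.product(DOT_SIZES, DOT_DENSITIES, BLUR_LEVELS))
    displays.append((None, None, "no pattern"))  # the single No Pattern condition
    trials = [(display, depth, repetition)
              for display in displays
              for depth in DEPTHS
              for repetition in range(REPETITIONS)]
    random.Random(seed).shuffle(trials)  # assumption: fully randomised order
    return trials


if __name__ == "__main__":
    print(len(build_trials()))  # 19 display conditions x 30 task conditions = 570
```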
After the trials were completed, participants were interviewed. The script for this interview is
also included in Appendix A3. The main topics covered during the interview were:
• whether there were any difficulties experienced in fusing images,
• whether there were any specific strategies used for making depth judgements, and
• the way in which the black dots were perceived.
6.2.4 Depth Judgement Task
Although the general experimental procedure was presented in the previous section, due to the
important role of the depth judgement task in the experimental design, this section focuses
exclusively on the details of this task.
Generally, the depth of objects can be considered in two ways: ordinal and absolute. Ordinal
depth pertains to the depth order between two (or more) objects (i.e., which is closer, which is
farther). For example, our first set of experiments (described in Chapter 5) asked about the
ordinal depth of the virtual object relative to the real surface (see footnote 37). Absolute depth, on the other
hand, can be ascertained by the observer using units such as meters (Livingston et al., 2013).
In addition, the absolute depth of objects can be defined in terms of egocentric distances or
exocentric distances. Egocentric distances refer to the absolute depth of an object relative to the
observer, while exocentric distances refer to the absolute depth of an object relative to another
object in the field of view (Swan et al., 2007). As previously mentioned, the focus of this
experiment was to determine whether adding random dot patterns can lead to an improvement in
the accuracy of exocentric distance estimations about the virtual object relative to the real
surface.
Considering the points mentioned above, by asking participants to “determine what proportion of
the truncated cone (x-tenths) is behind the bin's surface”, we are asking them to determine the
exocentric distance of the larger circle of the truncated cone relative to the bin’s surface (‘a’ in
Figure 6.11), given that the exocentric distance between the larger circle and the smaller circle of
the truncated cone (‘a+b’ in Figure 6.11) is 10 units (see footnote 38).
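In the notation of Figure 6.11, the requested response can be written compactly. The symbol x for the response in tenths is introduced here for illustration only and does not appear in the original text:

```latex
x = 10 \cdot \frac{a}{a + b}, \qquad x \in \{0, 1, \dots, 10\},
\qquad \text{so that} \qquad 10 - x = 10 \cdot \frac{b}{a + b}.
```

For example, for the 15 cm truncated cone with its base placed 9 cm behind the bin's surface, a = 9 and b = 6, so the correct response would be x = 6.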
37 It should be pointed out though that, ultimately, the findings from the ordinal depth judgement task in that
experiment resulted in a psychophysical function that provided an estimate of the absolute depth of the virtual
object.
38 It is worth pointing out that even though asking for the proportion of the cone that is in front of the bin (that is,
b/10, where ‘b’ is shown in Figure 6.11) should technically achieve the same result, the reason we chose to ask for
the proportion of the cone that is behind the bin’s surface was that we are generally interested in applications of X-
ray vision where depth judgements about the objects that are behind the real surface are more pertinent.
Figure 6.11: Top view example of truncated cone’s position relative to the surface of the bin. In
this image ‘a’ and ‘b’ denote the distance of the larger and smaller circles of the truncated cone
relative to the bin’s surface, respectively.
6.3 Hypotheses
To recap, the independent parameters of this experiment were as follows:
• Dot Size: 1/25 (largest), 1/50, 1/75 (smallest)
• Dot Density: 20%, 40%, 60%
• Blur level: Sharp, Blurry
• Fraction behind bin surface: 0/10, 2/10, 4/10, 6/10, 8/10, 10/10
• Patterns vs No Pattern condition
On the other hand, the dependent parameters measured during the experiment (in addition to the
interview results) were:
• Estimated depth of virtual object relative to real surface
• Difficulty rating of associated depth estimation trial
As such, the hypotheses presented are with regards to these two dependent parameters.
6.3.1 Estimated Depth of Virtual Object relative to real surface (EDVO)
As discussed, the addition of random dot patterns is expected to help with making more accurate
depth judgements about the virtual object relative to the real surface. Thus, Hypothesis 1 was
that the addition of patterns would lead to lower errors in EDVO relative to the No Pattern
condition.
Based on the reasoning provided in Section 4.2, the logic behind our next hypothesis was that
random dot patterns provide distinct edges that assist observers in making vergence eye
movements, thus allowing for more accurate absolute depth judgements about the virtual object
relative to the real surface. Hypothesis 2a, therefore, was that sharp patterns would lead to smaller
errors in EDVO compared to blurry patterns. Moreover, considering that smaller dot sizes have a
higher dominant spatial frequency compared to larger dot sizes and since ‘blurring’, in effect,
attenuates high spatial frequencies, Hypothesis 2b was that the effect of blur on errors in EDVO
would be larger for smaller dot sizes. In other words, if EDVO errors were found to be different
for blurry and sharp patterns, it was hypothesised that this difference would be larger for smaller
dot sizes.
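The reasoning behind Hypothesis 2b can be illustrated numerically. The sketch below blurs one-dimensional black and white stripe profiles of different dot widths with the same Gaussian kernel and reports how much luminance contrast survives. It is a simplified illustration under stated assumptions: the profiles are regular stripes rather than random dots, the widths are in arbitrary samples, and the kernel width (sigma) is an arbitrary choice that is not meant to reproduce the 20-pixel Photoshop radius. All function names are hypothetical.

```python
import math


def square_wave(dot_width: int, periods: int = 4) -> list[float]:
    """1-D luminance profile: alternating black (0) and white (1) runs of `dot_width` samples."""
    one_period = [0.0] * dot_width + [1.0] * dot_width
    return one_period * periods


def gaussian_blur_circular(signal: list[float], sigma: float) -> list[float]:
    """Blur with a normalised Gaussian kernel, wrapping around at the ends."""
    radius = int(math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    n = len(signal)
    return [sum(w * signal[(i + k) % n] for k, w in zip(offsets, kernel)) / total
            for i in range(n)]


def residual_contrast(dot_width: int, sigma: float) -> float:
    """Peak-to-trough luminance difference that survives the blur (1.0 = unblurred)."""
    blurred = gaussian_blur_circular(square_wave(dot_width), sigma)
    return max(blurred) - min(blurred)


if __name__ == "__main__":
    for width in (12, 6, 4):  # wide, medium and narrow dots, in samples
        print(f"dot width {width}: residual contrast {residual_contrast(width, 3.0):.2f}")
```

Under these assumptions, the narrowest stripes lose the most contrast, which is the sense in which blurring is expected to affect smaller dot sizes more strongly.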
As for the sharp patterns, two competing hypotheses were created with regards to the effect of
dot size. Since larger dot sizes provide more distinct edges for making vergence eye movements,
it can be hypothesized that they will lead to smaller errors in EDVO. On the other hand, smaller
dot sizes result in a larger number of edges available for making vergence eye movements and
can, therefore, be predicted to lead to smaller errors in EDVO. With regards to dot density,
higher dot densities of random dot stereograms increase the stimulus’ strength in attracting
vergence (Rashbass & Westheimer, 1961; Mallot, Roll & Arndt, 1995). Therefore, based on this
reasoning, Hypothesis 3 predicted that increasing dot densities would facilitate vergence eye
movements, leading to smaller errors in EDVO.
6.3.2 Difficulty Rating of depth estimation task (DR)
With regards to difficulty ratings (DRs), the following hypotheses were made:
• Hypothesis 4 was that the No Pattern condition would yield higher DRs compared to the
pattern conditions. The reason for this was that, without the pattern, it was expected to be
rather difficult to ascertain the depth of the real surface.
Moreover:
• Since the blurry patterns did not provide the distinct edges required for making vergence
eye movements, Hypothesis 5 was that blurry patterns would lead to higher difficulty
ratings compared to their sharp counterparts.
Two more hypotheses are illustrated in Figure 6.12:
• Hypothesis 6 predicted that larger dot sizes would lead to lower DRs since larger dots
provided edges that were more distinct.
• As mentioned above, since higher dot densities are more likely to facilitate vergence eye
movements, Hypothesis 7 was that higher dot densities would lead to lower DRs.
Figure 6.12: Experimental hypotheses 6 and 7, illustrating expected changes in DR as a function
of dot size and dot density.
6.4 Results
We start the data analysis process with a visual inspection of the data that were collected. The
statistical analyses of these data are then presented, followed by a discussion of the results and
their implications. In this section, we first focus on the results obtained for each of the dependent
parameters separately and then present the results related to the correspondence between these
two parameters. Finally, the responses to the interview questions are summarized and presented.
With regards to the statistical analyses, it should be noted that the experimental design - which
consisted of 30 trials for the No Pattern condition (6 depths x 5 repetitions) and 30 trials for each
combination of blur, dot size, and dot density (for a total of 540 trials) - led to an unbalanced
design. Therefore, to simplify the analyses, two series of repeated-measures ANOVAs were
performed on the two dependent parameters: EDVO and DR. The first repeated-measures
ANOVA treated each pattern independently and was meant to compare the results for the No
Pattern condition to each of the random dot pattern conditions. The second ANOVA was done
exclusively on the results pertaining to the random dot pattern conditions and was meant to
compare the various dot sizes, dot densities, depths and blur conditions.
6.4.1 Estimated Depth of Virtual Object (EDVO)
As previously mentioned, for each trial participants entered the proportion (in tenths) of the
truncated cone (virtual object) that they perceived as being behind the real bin surface. Figure
6.13, Figure 6.14 and Figure 6.15 present scatterplots showing the ‘Estimated Depth of Virtual
Object’ (EDVO) as a function of the virtual object’s actual depth proportion (relative to the real
surface) for the various patterns. Figure 6.16 shows the same scatterplot for the No Pattern
condition. In these figures, both the sizes and colours of the dots scale with the number
of occurrences at each point (higher numbers of occurrences are shown with larger and darker
circles).
The y=x line has also been added as a reference line, representing perfect performance. In
addition to acting as a reference for the values shown on each graph, it is important to understand
that all points above the y=x line represent estimates that are biased towards being behind the bin
surface, while all points below the y=x line represent estimates that are biased towards being in
front. Accordingly, the scenario for which depth perception would be “perfect” would result in a
‘scatterplot’ comprising only large red circles on the y=x line. On the other hand, complete
chance performance would lead to a scatterplot comprising all circles of the same size (and of
yellow colour) and equally distributed across all y-values for each depth. The resulting trend line
in this case would be y=5, which is shown with black dashed lines. As such, these scatterplots
should be compared to these two extreme situations.
Figure 6.13: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for various dot sizes and dot density of 20%: (a) Sharp
condition, (b) Blurry condition. The sizes and colours of the dots are proportional to the number
of occurrences at each point. Each column adds up to 75 trials (15 participants*5 trials). A blue
trend line has been fitted to the data. The y=x and y=5 reference lines are also provided to show
perfect and chance performance, respectively.
Figure 6.14: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for various dot sizes and dot density of 40%: (a) Sharp
condition, (b) Blurry condition.
Figure 6.15: Scatterplots showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for various dot sizes and dot density of 60%: (a) Sharp
condition, (b) Blurry condition.
Figure 6.16: Scatterplot showing the ‘Estimated Depth of Virtual Object’ as a function of the
virtual object’s actual depth proportion for the No Pattern condition.
As a first approximation, a blue trend line has been fitted to the values for the EDVO. The point where
the trend line intersects the y=5 line can be considered as the ‘Point of Subjective Equality’ (PSE),
where the observer perceives the virtual object as being halfway behind the bin’s surface (EDVO=5/10).
The actual depth proportions where these intersections occur are calculated and shown for the
various patterns in Figure 6.17.
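The PSE calculation described above can be sketched as follows. This is a minimal illustration which assumes that the trend line is an ordinary least-squares straight line (the fitting method is not specified in the text); the function names are hypothetical and the data in the usage example are invented purely to demonstrate the arithmetic.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def point_of_subjective_equality(slope: float, intercept: float,
                                 criterion: float = 5.0) -> float:
    """Actual depth (in tenths) at which the fitted line crosses EDVO = criterion."""
    return (criterion - intercept) / slope


if __name__ == "__main__":
    actual = [0, 2, 4, 6, 8, 10]      # actual depth, in tenths behind the surface
    estimated = [2, 3, 4, 5, 6, 7]    # invented mean estimates, for illustration only
    slope, intercept = fit_line(actual, estimated)
    print(point_of_subjective_equality(slope, intercept))  # 6.0 for this invented data
```

A PSE above 5 means the cone must be placed more than halfway behind the surface before it is judged to be halfway behind, i.e. it tends to be perceived as closer to the observer than it is.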
Figure 6.17: Plot showing the Point of Subjective Equality (PSE) as a function of dot density.
The PSE for the No Pattern condition is shown for reference.
By examining the trends shown in Figure 6.17, several observations can be made:
• The PSE is smaller for all patterns compared to the No Pattern condition. This
observation is in line with Hypothesis 1 and could potentially serve as supporting
evidence that random dot patterns can help in improving ordinal and absolute depth
judgements.
• As dot density increases, the PSE appears to move towards the front of the bin.
• The blurring of the random dot patterns leads to a shift of the PSE towards the back of the
bin. In other words, (similar to the finding discussed in Section 5.3.3) blurry patterns
cause the virtual object to be perceived as closer to the observer compared to their sharp
counterparts.
• As dot size decreases (1/25 → 1/75), the difference in PSE between sharp and blurry
patterns increases. This observation is in line with Hypothesis 2b which stated that ‘if
EDVO errors were found to be different for blurry and sharp patterns, this difference
would be larger for smaller dot sizes’.
• The relative ordering of the patterns with respect to PSE reverses when they are blurred:
For the sharp patterns, the PSE of 1/75 < 1/50 < 1/25 but, for the blurry patterns, the PSE
of 1/75 > 1/50 > 1/25 (on average). In other words, as dot size decreases, the PSE moves
closer to the observer for sharp patterns and farther behind the bin for blurry patterns.
To further inspect the data for trends, the absolute values of the errors were also averaged for all
participants. Each error was calculated as the difference between the EDVO and the actual
proportion of the truncated cone behind the bin surface (perceived depth proportion minus actual
depth proportion). Figure 6.18, Figure 6.19 and Figure 6.20 illustrate the average absolute errors
for all participants as a function of the virtual object’s depth for the three dot sizes. The results
for the No Pattern condition are also provided for each set of results for comparison purposes.
Upon inspection of these figures, it seems that for depths≥6/10, the differences between the No
Pattern and the random dot pattern conditions become greater. This observation is in line with
Hypothesis 1. Moreover, for depths≥6/10, the sharp random dot patterns seem to lead to lower
errors than their blurry counterparts (as predicted by Hypothesis 2a). This effect also seems to be
more noticeable as dot size becomes smaller, as predicted by Hypothesis 2b. On the
other hand, no obvious trends can be inferred about the effect of dot size and dot density on the
average absolute errors in perceived depth.
To check whether any of these apparent effects were statistically significant, ANOVAs were
performed on the absolute values of errors averaged across the 5 (repetition) trials for each
condition and each participant. The following sections focus on the results of the two ANOVAs
that were performed.
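The error measure that enters the ANOVAs can be sketched as below: the absolute error of each trial is computed and then averaged over the repetitions of the same participant, condition and depth. The data layout and names are hypothetical, and the trial values in the usage example are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_absolute_errors(trials):
    """Average |estimate - actual| over repetitions.

    `trials` is an iterable of (participant, condition, actual_tenths, estimated_tenths);
    this layout is an assumption made for the sketch.
    Returns a dict keyed by (participant, condition, actual_tenths).
    """
    grouped = defaultdict(list)
    for participant, condition, actual, estimate in trials:
        grouped[(participant, condition, actual)].append(abs(estimate - actual))
    return {key: mean(errors) for key, errors in grouped.items()}


if __name__ == "__main__":
    # Five invented repetitions for one participant, one pattern, actual depth 6/10.
    example = [("P01", "DS=1/50, DD=40, sharp", 6, estimate) for estimate in (4, 7, 6, 8, 5)]
    print(mean_absolute_errors(example))
```

Taking absolute values before averaging prevents over- and underestimates from cancelling, so the measure reflects the size of the errors rather than their direction.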
Figure 6.18: Average absolute error in perceived depth as a function of the virtual object’s
actual depth relative to the real surface for dot size = 1/25.
Figure 6.19: Average absolute error in perceived depth as a function of the virtual object’s
actual depth relative to the real surface for dot size = 1/50.
Figure 6.20: Average absolute error in perceived depth as a function of the virtual object’s
actual depth relative to the real surface for dot size = 1/75.
6.4.1.1 Two-way Repeated Measures ANOVA
As previously mentioned, the first ANOVA was performed such that each pattern was treated
independently as a means of comparing the different patterns to the No Pattern condition. With 3
dot sizes, 3 dot densities, 2 blur conditions and a No Pattern condition, there were a total of 19
patterns. The two independent parameters considered were pattern and depth.
Mauchly’s test indicated that the assumption of sphericity had been violated for the main effect
of depth, χ²(14) = 102.08, p < .0005. Therefore, degrees of freedom were corrected using
Greenhouse-Geisser estimates of sphericity (ε = .32). Following this correction, there was a
marginally significant effect of depth (p = .06). As for pattern, results revealed a significant main
effect, F(18, 252) = 3.43, p < .0005. Contrasts, however, revealed that there was no significant
difference between the No Pattern condition and the two blurry patterns with a dot density of 20%
and dot sizes of 1/50 and 1/75. This finding seems to make sense, considering that the small and
blurry dots with low dot densities (as shown in Figure 6.4) visually appear very similar to the No
Pattern condition. For all other patterns, contrasts revealed a significant difference from the No
Pattern condition.
On the other hand, there was a significant interaction effect between depth and pattern, F(90,
1260)=4.06, p<.0005. Table 6.1 outlines these significant interactions revealed by contrasts. For
the patterns not listed in Table 6.1, no significant interaction effect was found.
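The degrees of freedom reported above follow directly from the design (15 participants, 19 patterns, 6 depths). The sketch below recomputes them with hypothetical helper functions; the Greenhouse-Geisser-corrected values for depth are derived here from ε = .32 and are not figures reported in the text.

```python
def rm_anova_df(n_participants: int, levels_a: int, levels_b: int) -> dict:
    """(effect df, error df) for a two-way fully repeated-measures ANOVA."""
    subjects = n_participants - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "A": (a, a * subjects),
        "B": (b, b * subjects),
        "AxB": (a * b, a * b * subjects),
    }


def mauchly_df(levels: int) -> int:
    """Degrees of freedom of the chi-square statistic in Mauchly's test."""
    return levels * (levels - 1) // 2 - 1


def greenhouse_geisser(df: tuple, epsilon: float) -> tuple:
    """Scale both degrees of freedom by the sphericity estimate epsilon."""
    return (epsilon * df[0], epsilon * df[1])


if __name__ == "__main__":
    df = rm_anova_df(n_participants=15, levels_a=19, levels_b=6)  # A = pattern, B = depth
    print(df["A"])        # (18, 252)
    print(df["AxB"])      # (90, 1260)
    print(mauchly_df(6))  # 14
    corrected = greenhouse_geisser(df["B"], 0.32)
    print(tuple(round(value, 1) for value in corrected))
```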
Table 6.1: Contrast results for significant interaction effects between depth and pattern. The
rows are colour coded to aid in identification of patterns with the same dot size.
No Pattern to Pattern with: …    Blur Condition    Depth=0 to Depths = ...
DS=1/25, DD=20                   Sharp             8
DS=1/25, DD=40                   Sharp             6, 8, 10
DS=1/25, DD=60                   Sharp             8
DS=1/50, DD=20                   Sharp             8
DS=1/50, DD=40                   Sharp             2, 4, 6, 8, 10
DS=1/50, DD=60                   Sharp             6, 8, 10
DS=1/75, DD=20                   Sharp             8, 10
DS=1/75, DD=40                   Sharp             6, 8, 10
DS=1/75, DD=60                   Sharp             2, 6, 8, 10
DS=1/25, DD=60                   Blurry            8
DS=1/50, DD=40                   Blurry            6
DS=1/50, DD=60                   Blurry            8, 10
To derive meaning from these results, we performed individual t-tests comparing each pattern
with the No Pattern condition at every depth. Although one may argue that doing so increases the
chances of making Type I errors, tests that are found to be non-significant provide supporting
evidence not to reject the null hypothesis. In other words, although we are not justified in
‘accepting the null hypothesis’, the existence of non-significant t-tests provides some supporting
evidence that the differences (in absolute error in perceived depth) between the two conditions
are not significant. Below, the final results are summarized and presented separately for each dot
size (to aid in comprehension of these results, it is suggested to refer to Figure 6.18, Figure 6.19
and Figure 6.20):
• DS=1/25: As visually confirmed in Figure 6.18, no significant differences between the
various patterns and the No Pattern condition were found for depths<6/10. Generally, the
interaction effects that were found to be significant (for depth=0/10 to depths≥6/10) are a
result of the differences between the patterns and the No Pattern condition becoming
significant as the virtual object moves farther behind the real surface. In other words,
patterns with the dot size of 1/25 lead to lower absolute errors in perceived depth
compared to the No Pattern condition when the virtual object is placed at depths≥6/10.
For depths≤4/10, no significant difference exists between these patterns and the No
Pattern condition.
• DS=1/50 (Figure 6.19): For the sharp patterns, there were no significant differences
between the No Pattern condition and the patterns with dot densities of 20 and 60% at
depth=0/10. Therefore, the depths where interactions exist (as presented in Table 6.1) are
the depths for which significant differences exist between these patterns and the No
Pattern condition. For example, based on the results presented in Table 6.1, we can
conclude that for the pattern with 60% dot density, the absolute errors in perceived depth
are significantly smaller compared to the No Pattern condition for depths≥6/10. For
smaller depths, there are no significant differences between this pattern and the No
Pattern condition. For the (sharp) pattern with 40% dot density, the mean absolute error
in perceived depth was found to be significantly larger than for the No Pattern condition
for depth=0/10. No significant difference was found for depths=2/10 and 4/10
(explaining why there are interaction effects present for these depths in Table 6.1).
However, for depths≥6/10, the mean absolute error in perceived depth is significantly
smaller compared to the No Pattern condition.
As for the blurry patterns, as previously mentioned, there was no significant main effect
of pattern for dot density 20%. For dot density 40%, there was no significant difference
in absolute error in perceived depth compared to the No Pattern condition for all depths
other than depth=6/10. However, when DD=60%, the absolute errors in perceived depth
were significantly smaller than those of the No Pattern condition for depths=8/10 and
10/10.
• DS=1/75 (Figure 6.20): For the sharp patterns, there were no significant differences
between the No Pattern condition and the patterns with dot densities of 20 and 40% at
depth≤4/10. Therefore, the depths where interactions exist (as presented in Table 6.1) are
the depths for which significant differences exist between these patterns and the No
Pattern condition. For example, based on the results presented in Table 6.1, we can
conclude that for the pattern with 40% dot density, the mean absolute errors in perceived
depth are significantly smaller compared to the No Pattern condition for depths≥6/10. For
smaller depths, there are no significant differences between this pattern and the No
Pattern condition. For the (sharp) pattern with 60% dot density, the mean absolute error
in perceived depth was found to be significantly larger than the No Pattern condition for
depth=0/10. No significant difference was found for depth=2/10 (which explains why
there is an interaction effect present for this depth in Table 6.1). However, for
depths≥6/10, the absolute error in perceived depth is significantly smaller compared to
the No Pattern condition.
As for the blurry patterns, as previously mentioned, there was no main effect of pattern
for dot density of 20%. For dot density of 40% and 60%, there were no significant
interaction effects present. However, main effects of patterns were found to be
significant. Referring to Figure 6.20, it can be inferred that these patterns generally led to
significantly lower absolute errors in perceived depth compared to the No Pattern
condition.
6.4.1.2 Four-way Repeated Measures ANOVA
As previously mentioned, a second set of ANOVAs was done exclusively on the results
pertaining to the random dot pattern conditions and was meant to compare the various dot sizes,
dot densities, depths and blur conditions40.
Mauchly’s test indicated that the assumption of sphericity had been violated for the interaction
effects of blur and depth (χ²(14) = 72.85, p < .0005), depth and dot density (χ²(54) =
89.83, p < .005), and that of blur, depth and dot size (χ²(54) = 120.49, p < .0005). Therefore,
degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = .28
for the interaction effect between blur and depth, ε = .36 for the interaction effect between depth
and dot density, and ε = .37 for the interaction effect between blur, depth and dot size).
Considering these corrections, the effects that were found to be significant are summarized as
follows:
• Blur * Depth: F(1.38, 19.36) = 7.77, p<.05
40 Considering the very small ratio (1/18) of No Pattern condition trials relative to those of the random dot patterns,
excluding that set of results from this analysis was not deemed to diminish the validity of the obtained results.
• Blur * Dot Density: F(2, 28) = 11.81, p<.0005
• Depth * Dot Density: F(3.64, 50.94) = 7.83, p<.0005
• Blur * Depth * Dot Size: F(3.71, 51.94) = 3.84, p<.0005
• Blur * Depth * Dot Size * Dot Density: F(20, 280) = 1.82, p<.05
In the following sections, we focus on each of these significant effects individually.
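The corrected degrees of freedom listed above follow from multiplying the uncorrected degrees of freedom by the Greenhouse-Geisser estimate ε. The sketch below illustrates this arithmetic. It assumes 15 participants, 2 blur levels, 6 depths, 3 dot sizes and 3 dot densities; the participant count is inferred from the reported F(1,14) and F(20, 280) values rather than stated in this section, and all function names are hypothetical.

```python
from math import prod

def interaction_df(levels, n_participants):
    """Uncorrected (effect, error) df for a fully within-subject effect.

    `levels` holds the number of levels of each factor in the effect.
    """
    df_effect = prod(k - 1 for k in levels)
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error

def gg_corrected(df_effect, df_error, epsilon):
    """Greenhouse-Geisser correction: scale both df by epsilon."""
    return epsilon * df_effect, epsilon * df_error

def implied_epsilon(reported_df_effect, df_effect):
    """Recover epsilon from a reported (corrected) numerator df."""
    return reported_df_effect / df_effect

N = 15  # assumption: inferred from F(1,14), not stated in this section
BLUR, DEPTH, DOT_SIZE, DOT_DENSITY = 2, 6, 3, 3  # assumed numbers of levels

print(interaction_df([BLUR, DEPTH], N))                         # (5, 70)
print(interaction_df([DEPTH, DOT_DENSITY], N))                  # (10, 140)
print(interaction_df([BLUR, DEPTH, DOT_SIZE, DOT_DENSITY], N))  # (20, 280)

# Epsilon implied by the reported corrected df, rounded to two decimals.
print(round(implied_epsilon(1.38, 5), 2))   # 0.28 (Blur * Depth)
print(round(implied_epsilon(3.64, 10), 2))  # 0.36 (Depth * Dot Density)
print(round(implied_epsilon(3.71, 10), 2))  # 0.37 (Blur * Depth * Dot Size)
```

Because the reported ε values are rounded to two decimals, multiplying them back out reproduces the listed degrees of freedom only approximately.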
6.4.1.2.1 Interaction Effect of Blur and Depth
Results revealed a significant interaction effect between blur and depth, indicating that blur had
different effects on the average absolute error in perceived depth when the depth of the virtual
object changed. To break down this interaction, contrasts were performed comparing all the
depths the virtual object appeared at to depth = 0/10 (when the virtual object was completely in
front of the bin). These revealed significant interactions for all depths (with a marginally
significant interaction for depth=4/10, where p = .06):
• Depth=2/10 to Depth=0/10: F(1,14)=4.82, r=.51, p<.05
• Depth=4/10 to Depth=0/10: F(1,14)=4.11, r=.48, p=.06
• Depth=6/10 to Depth=0/10: F(1,14)=5.87, r=.54, p<.05
• Depth=8/10 to Depth=0/10: F(1,14)=10.83, r=.66, p<.05
• Depth=10/10 to Depth=0/10: F(1,14)=9.26, r=.63, p<.05
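The effect sizes r reported with these contrasts are consistent with the common conversion r = sqrt(F / (F + df_error)) for a contrast with one numerator degree of freedom. This formula is not stated in this section, so the sketch below is an assumption that merely reproduces the reported values; the function name is hypothetical.

```python
import math

def contrast_effect_size(f_value, df_error):
    """Effect size r for a single-df contrast: sqrt(F / (F + df_error))."""
    return math.sqrt(f_value / (f_value + df_error))

# (F value, reported r) pairs taken from the Blur * Depth contrasts above.
reported = [(4.82, 0.51), (4.11, 0.48), (5.87, 0.54), (10.83, 0.66), (9.26, 0.63)]

for f_value, r_reported in reported:
    r = contrast_effect_size(f_value, df_error=14)
    print(f"F(1,14)={f_value}: r={r:.2f} (reported {r_reported})")
```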
The interaction graph is presented in Figure 6.21, which differs from Figure 6.18, Figure 6.19
and Figure 6.20 in that results are combined for all dot densities and dot sizes (resulting in 18
points at each depth). This plot illustrates that the effect of blur on ‘average absolute error in
perceived depth’ for depths≥2/10 was significantly different from this effect at depth=0/10. Upon
closer inspection of this graph, we can also notice that, while there doesn’t seem to be a
noticeable difference between the sharp and blurry conditions for depths=2/10, 4/10, and 6/10,
when depth≥8/10, the sharp patterns result in lower average absolute errors in perceived depth
than those of the blurry patterns.
A potential explanation could be that when the random dot pattern was blurry, it gave the
impression that the surface of the bin was farther away and thus participants were more likely to
perceive the virtual object (which appeared sharp and in focus) as being in front of the bin (as
verified by the results presented in Section 6.4.1 and the participants’ responses to interview
questions, discussed in Section 6.4.4). On the other hand, as the depth of the virtual object
increased (it appeared farther behind the surface of the bin), the sharp patterns provided edges
that were more distinct and, thus, resulted in depth judgements with lower absolute errors.
Figure 6.21: Average absolute error in perceived depth of virtual object as a function of its
actual depth depicting the interaction effect of blur and depth.
6.4.1.2.2 Interaction Effect of Blur and Dot Density
There was a significant interaction effect between blur and dot density. This indicates that blur
had different effects on the ‘average absolute error in perceived depth’ when the dot density of
the random dot patterns changed. To break down this interaction, contrasts were performed
comparing the effect of blur for all three dot densities (20%, 40% and 60%). Contrasts revealed
significant interactions when comparing the effect of blur both for dot density 40% to dot density
20% (F(1,14)=6.67, r=.57), and dot density 60% to dot density 20% (F(1,14)=31.07, r=.83).
Looking at the interaction graph (Figure 6.22), these effects reflect that the differences in the
‘average absolute errors in perceived depth proportion’ due to the blur are significantly larger for
dot density 20% compared to 40% and 60% (0.4 compared to 0.15 and 0.05). There was no
significant interaction between dot density and blur when dot density 40% was compared to
60%.
Figure 6.22: Average absolute error in perceived depth of virtual object as a function of dot
density depicting the interaction effect of blur and dot density.
Considering that the blurry patterns with the smaller dot sizes (1/50 and 1/75) and 20% dot density did not
yield any significant differences in ‘average absolute error in perceived depth’ compared to the
No Pattern condition (as discussed in Section 6.4.1.1), it may be reasonable to conclude
that the addition of these patterns failed to be perceived as a substantial change to the surface. On
the other hand, this interaction effect reveals that sharp patterns with 20% dot density led to
significantly lower average absolute errors in perceived depth proportion (compared to their
blurry counterparts). This result can also serve as potential evidence for Hypothesis 2a, by
suggesting that even with 20% dot densities, sharp patterns may be able to provide the distinct
edges required for making vergence eye movements.
6.4.1.2.3 Interaction Effect of Depth and Dot Density
There was a significant interaction effect between depth and dot density. This indicates that dot
density had different effects on the ‘average absolute error in perceived depth proportion’ when
the depth of the virtual object changed. To break down this interaction, contrasts were performed
comparing the effect of density for all depths compared to depth=0/10. Contrasts revealed
significant interactions when comparing the effect of depth both for dot density 40% to dot
density 20% and dot density 60% to dot density 20%. The onset of this interaction when
comparing dot density 40% to dot density 20% was at depths≥4/10, while the same effect when
comparing dot density 60% to dot density 20% was at depths≥6/10.
The significant interactions found when comparing dot density 40% to 20% were as follows:
• Depth=4/10 to Depth=0/10: F(1,14)=13.8, r=.7, p<.05
• Depth=6/10 to Depth=0/10: F(1,14)=9.43, r=.63, p<.05
• Depth=8/10 to Depth=0/10: F(1,14)=8.25, r=.61, p<.05
• Depth=10/10 to Depth=0/10: F(1,14)=9.09, r=.63, p<.05
When comparing dot density 60% to 20%, the following significant interactions were found:
• Depth=6/10 to Depth=0/10: F(1,14)=7.03, r=.58, p<.05
• Depth=8/10 to Depth=0/10: F(1,14)=10.06, r=.65, p<.05
• Depth=10/10 to Depth=0/10: F(1,14)=14.47, r=.71, p<.05
Looking at the interaction graphs in Figure 6.23, Figure 6.24 and Figure 6.25, which are
presented separately for each dot size to prevent clutter, it can be seen that (other than for one
exception for dot size=1/25) when comparing depths≥6/10 to depth=0/10, the effect of dot
density reverses as dot density goes from 20% to 40% and from 20% to 60%.
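One way to read such a contrast is as follows: for each participant, take the difference in error between two dot densities at a given depth, subtract the same difference at depth=0/10, and test whether the mean of these difference-of-differences scores departs from zero. For a single-degree-of-freedom contrast in a fully within-subject design, F(1, n−1) equals the square of the one-sample t statistic on those scores. The sketch below uses made-up numbers for three hypothetical participants; it is neither the thesis data nor necessarily the exact procedure of the statistics software that was used.

```python
import math

def difference_of_differences(a_depth, b_depth, a_ref, b_ref):
    """Per-participant interaction score: (A - B at depth) - (A - B at reference)."""
    return [(ad - bd) - (ar - br)
            for ad, bd, ar, br in zip(a_depth, b_depth, a_ref, b_ref)]

def contrast_f(scores):
    """F(1, n-1) for the contrast, computed as the squared one-sample t statistic."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t * t, n - 1

# Hypothetical mean absolute errors per participant (illustrative values only).
dd40_depth6 = [0.10, 0.20, 0.15]
dd20_depth6 = [0.30, 0.35, 0.40]
dd40_depth0 = [0.20, 0.25, 0.20]
dd20_depth0 = [0.10, 0.15, 0.10]

scores = difference_of_differences(dd40_depth6, dd20_depth6, dd40_depth0, dd20_depth0)
f_value, df_error = contrast_f(scores)
print(f"F(1,{df_error}) = {f_value:.2f}")
```

A negative mean score in this toy example would mean that the higher density reduces error at the larger depth relative to its effect at depth=0/10, which is the kind of reversal described in the text.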
Figure 6.23: Average absolute error in perceived depth of virtual object as a function of its
actual depth, depicting the interaction effect of depth and dot density for dot size=1/25.
Figure 6.24: Average absolute error in perceived depth of virtual object as a function of its
actual depth depicting the interaction effect of depth and dot density for dot size=1/50.
Figure 6.25: Average absolute error in perceived depth of virtual object as a function of its
actual depth depicting the interaction effect of depth and dot density for dot size=1/75.
Although these results do not provide sufficient evidence for rejecting the null hypothesis
regarding the inverse effect of dot density on errors in EDVO (Hypothesis 3 mentioned in
Section 6.3.1), a potential implication could be that with increased dot density, observers are
more likely to perceive the virtual object as being behind the surface, which causes larger errors
when it’s placed in front but leads to smaller errors as the object moves farther behind the
surface (as verified by the results presented in Section 6.4.1). Although the reason for this
is not clear, this finding may have important implications for the design of
random dot patterns for X-ray vision, which aims to convey the impression that the virtual object
is placed behind the real surface.
6.4.1.2.4 Interaction Effect of Blur, Depth and Dot Size
There was a significant interaction effect between blur, depth and dot size. This indicates that dot
size influenced the effect of blur on ‘average absolute error in perceived depth’ differently when
the depth of the virtual object changed. To break down this interaction, contrasts were
performed. Results revealed the following significant interactions:
When dot size=1/50 was compared to dot size=1/25:
• Depth=6/10 to Depth=0/10: F(1,14)=6.86, r=.57, p<.05
• Depth=10/10 to Depth=0/10: F(1,14)=5.82, r=.54, p<.05
When dot size=1/75 was compared to dot size=1/25:
• Depth=6/10 to Depth=0/10: F(1,14)=5.46, r=.53, p<.05
• Depth=8/10 to Depth=0/10: F(1,14)=7, r=.58, p<.05
• Depth=10/10 to Depth=0/10: F(1,14)=8.23, r=.61, p<.05
When dot size=1/75 was compared to dot size=1/50:
• Depth=8/10 to Depth=0/10: F(1,14)=5.91, r=.54, p<.05
Looking at the interaction graphs, which are presented separately for each dot size (Figure 6.26,
Figure 6.27 and Figure 6.28), it can be seen that the effect of blur for depths≥6/10 (when compared
to its effect at depth=0/10) differs across dot sizes. Generally, from what can be interpreted
from these graphs, as dot size becomes smaller, the difference in ‘average absolute error in
perceived depth’ between the blurry and the sharp conditions becomes larger for depths≥6/10
compared to depth=0/10. A possible explanation for this could be that the fixed amount of blur
applied to the various patterns results in a larger visible effect for smaller dots. This is because
smaller dot sizes have higher dominant spatial frequencies compared to larger dot sizes and since
‘blurring’, in effect, attenuates high spatial frequencies, the resulting effect becomes more visible
for smaller dot sizes. Therefore, it can be concluded that this result is in agreement with
Hypothesis 2b.
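The spatial-frequency argument above can be illustrated numerically. The sketch below builds one-dimensional random dot rows with different dot widths, applies the same fixed blur to each, and reports the fraction of luminance variance (a simple stand-in for contrast) that survives. All specifics are assumptions made for illustration: the pixel widths 12, 6 and 4 only mimic the 6:3:2 ratio of dot sizes 1/25, 1/50 and 1/75, and the box blur is a stand-in for whatever blur was actually applied to the stimuli.

```python
import random

def random_dot_row(n_pixels, dot_px, density, seed=0):
    """1-D random dot row: each dot is `dot_px` wide and is 'on' with probability `density`."""
    rng = random.Random(seed)
    row = []
    while len(row) < n_pixels:
        value = 1.0 if rng.random() < density else 0.0
        row.extend([value] * dot_px)
    return row[:n_pixels]

def box_blur(row, radius):
    """Fixed-size moving average (circular at the borders); attenuates high frequencies."""
    n = len(row)
    width = 2 * radius + 1
    return [sum(row[(i + j) % n] for j in range(-radius, radius + 1)) / width
            for i in range(n)]

def variance(row):
    mean = sum(row) / len(row)
    return sum((v - mean) ** 2 for v in row) / len(row)

def contrast_retained(dot_px, radius=3, n_pixels=3000, density=0.4):
    """Fraction of the row's variance that remains after the fixed blur."""
    row = random_dot_row(n_pixels, dot_px, density)
    return variance(box_blur(row, radius)) / variance(row)

for dot_px in (12, 6, 4):  # hypothetical pixel widths for dot sizes 1/25, 1/50, 1/75
    print(f"dot width {dot_px:2d} px: variance retained = {contrast_retained(dot_px):.2f}")
```

Under these assumptions, the narrowest dots lose the largest share of their variance to the same blur, which matches the direction of the explanation given in the text.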
Figure 6.26: Average absolute error in perceived depth of virtual object as a function of its
actual depth depicting the interaction effect of blur, depth and dot density for dot size=1/25.
Figure 6.27: Average absolute error in perceived depth of virtual object as a function of its
actual depth depicting the interaction effect of blur, depth and dot density for dot size=1/50.
Figure 6.28: Average absolute error in perceived depth of virtual object as a function of its
actual depth depicting the interaction effect of blur, depth and dot density for dot size=1/75.
6.4.1.2.5 Interaction Effect of Blur, Depth, Dot Size and Dot Density
There was a significant interaction effect between blur, depth, dot density and dot size
(F(20,280)=1.82, p<.05). This indicates that the effect observed above also differed as dot
density changed. To break down this interaction, contrasts were performed. Results revealed the
following significant interactions:
When dot size=1/50 was compared to dot size=1/25:
• Depth=2/10 to Depth=0/10 and Dot Density=40% to Dot Density=20%: F(1,14)=6.31,
r=.56, p<.05
When dot size=1/75 was compared to dot size=1/25:
• Depth=6/10 to Depth=0/10 and Dot Density=60% to Dot Density=40%: F(1,14)=6.53,
r=.56, p<.05
When dot size=1/75 was compared to dot size=1/50:
• Depth=2/10 to Depth=0/10 and Dot Density=40% to Dot Density=20%: F(1,14)=8.08,
r=.6, p<.05
• Depth=2/10 to Depth=0/10 and Dot Density=60% to Dot Density=40%: F(1,14)=6.07,
r=.55, p<.05
6.4.2 Difficulty Rating of depth estimation task (DR)
As previously mentioned, for each trial, after having responded to the depth judgement task,
participants selected a response to the question: “On a scale of 1 (easiest) to 4 (most difficult),
how difficult did you find the task?” (Note that only discrete values of 1, 2, 3 or 4 were
accepted.) The scatterplots showing the ‘Difficulty Rating’ (DR) results as a function of the
virtual object’s actual depth proportion (relative to the real surface) for the various patterns and
the No Pattern condition are presented in Appendix B2. The major observation that can be made
from these scatterplots is that, for almost all patterns, the most common ratings are 2 and 3. For
the No Pattern condition, however, this seems not to be the case, as the ratings seem to be
distributed equally across 1-4.
To further inspect the data for trends, the average ratings across participants were plotted as a
function of the virtual object’s depth proportion for all conditions. These plots can be seen in
Figure 6.29, Figure 6.30 and Figure 6.31. The results for the No Pattern condition are also
provided for each set of results for comparison purposes.
Although it may initially seem that there is an inverted U effect present as the depth proportion
of the virtual object is increased, it should be noted that the Y-axis values in Figure 6.29, Figure
6.30 and Figure 6.31 show that the average difficulty ratings all vary only between 2 and 3 and
are, therefore, very close to each other. Perhaps the only potentially meaningful observations that
can be made are:
• The difficulty ratings for depth=0/10 are generally smaller than those at other depth
proportions.
• Difficulty ratings increase as depth is increased, although there doesn’t seem to be much
change for depths≥6/10.
• Difficulty ratings for the ‘No Pattern’ condition are inconsistent and peak around
midrange.
To check for effects of statistical significance, ANOVAs were performed on the difficulty ratings
averaged across the 5 (repetition) trials for each condition and each participant. The following
sections focus on the results of the two ANOVAs that were performed.
Figure 6.29: Average difficulty ratings as a function of the virtual object’s actual depth relative
to the real surface for dot size = 1/25.
Figure 6.30: Average difficulty ratings as a function of the virtual object’s actual depth relative
to the real surface for dot size = 1/50.
Figure 6.31: Average difficulty ratings as a function of the virtual object’s actual depth relative
to the real surface for dot size = 1/75.
6.4.2.1 Two-way Repeated Measures ANOVA
As with the depth judgement task results, the first ANOVA was performed such that each pattern
was treated independently, as a means of comparing the different patterns to the No Pattern
condition. With 3 dot sizes, 3 dot densities, 2 blur conditions and a No Pattern condition, there
was a total of 19 patterns. The two independent parameters considered were pattern and depth.
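The count of 19 patterns can be reproduced by enumerating the factor combinations. The sketch below is only an illustration of the design as described in this section; the label strings and the six depth levels (0/10 to 10/10 in steps of 2/10) are assumptions based on the values that appear in the reported results.

```python
from itertools import product

DOT_SIZES = ["1/25", "1/50", "1/75"]
DOT_DENSITIES = [20, 40, 60]          # percent
BLUR_CONDITIONS = ["sharp", "blurry"]
DEPTHS = [0, 2, 4, 6, 8, 10]          # tenths; assumed from the depths reported

# Every random dot pattern is one (dot size, dot density, blur) combination.
dot_patterns = list(product(DOT_SIZES, DOT_DENSITIES, BLUR_CONDITIONS))
patterns = [("No Pattern",)] + dot_patterns

print(len(dot_patterns))            # 18 random dot patterns
print(len(patterns))                # 19 levels of the 'pattern' factor
print(len(patterns) * len(DEPTHS))  # 114 pattern-by-depth cells
```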
Mauchly’s test indicated that the assumption of sphericity had been violated for the main effect
of depth, χ²(14) = 97.27, p < .0005. Therefore, degrees of freedom were corrected using
Greenhouse-Geisser estimates of sphericity (ε = .27). Based on this correction, there was a
significant effect of depth, F(1.34, 18.82)=4.87, p<.05. Contrasts revealed that this significant
main effect was due to the difference in difficulty ratings for depth=0/10 and depth=10/10,
F(1,14)=4.93, p<.05. There were no other significant differences in difficulty ratings when
comparing all other depths.
As for the effect of pattern, results did not reveal a significant main effect. Therefore, we were
not able to support Hypothesis 4 (which predicted that the No Pattern condition would lead to
higher DRs compared to the random dot pattern conditions). Although this result may initially
seem counter-intuitive, it does point to the fact that, while participants rated the depth judgement
task as equally difficult for the random dot pattern conditions and the No Pattern condition, their
accuracy in performing the depth judgement task was, in fact, significantly different for these
conditions.
Figure 6.32 depicts the effect of depth proportion on the difficulty ratings, formed by combining
all the data presented in Figure 6.29, Figure 6.30 and Figure 6.31. As can be seen, this graph
supports the ANOVA result that performing the depth judgement task was deemed
significantly more difficult when the virtual object was completely behind the bin’s surface
(10/10) compared to when it was completely in front of it (0/10).
Figure 6.32: Effect of depth on DRs.
6.4.2.2 Four-way Repeated Measures ANOVA
A second set of ANOVAs was done exclusively on the results pertaining to the random dot
pattern conditions and was meant to compare the various dot sizes, dot densities, depths and blur
conditions.
Once again, Mauchly’s test indicated that the assumption of sphericity had been violated for the
main effect of depth, χ²(14) = 92.48, p < .0005, and the interaction effect of blur and dot
size, χ²(2) = 7.45, p < .05. Therefore, degrees of freedom were corrected using Greenhouse-
Geisser estimates of sphericity (ε = .27 for the main effect of depth and ε = .7 for the
interaction effect between blur and dot size). Considering these corrections, the effects that were
found to be significant are summarized as follows:
• Depth: F(1.34, 18.76) = 5.28, p<.05
• Blur * Dot Size: F(1.39, 19.49) = 5.81, p<.05
In the following sections, we focus on each of these significant effects individually.
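The degrees of freedom of the chi-square statistics reported for Mauchly’s test depend only on the uncorrected numerator degrees of freedom p of the effect being tested, through df = p(p + 1)/2 − 1. The sketch below checks this against the values reported here, assuming 6 depth levels, 2 blur levels and 3 dot sizes; the function name is hypothetical.

```python
def mauchly_chi2_df(effect_df):
    """Chi-square df of Mauchly's test for an effect with `effect_df` numerator df."""
    return effect_df * (effect_df + 1) // 2 - 1

depth_df = 6 - 1                     # main effect of depth (assumed 6 levels)
blur_by_size_df = (2 - 1) * (3 - 1)  # blur * dot size interaction

print(mauchly_chi2_df(depth_df))         # 14, as in the reported chi-square(14)
print(mauchly_chi2_df(blur_by_size_df))  # 2, as in the reported chi-square(2)
```

An effect with a single numerator degree of freedom (such as a two-level factor on its own) yields df = 0, which is why sphericity is not an issue for such effects.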
6.4.2.2.1 Main Effect of Depth
As mentioned, results revealed a significant main effect of depth proportion on DRs. To break
down this effect, contrasts were performed comparing the DR values for all depths. As expected
from the finding in Section 6.4.2.1, contrasts revealed only one significant main effect of depth
proportion, when comparing depth=10/10 to depth=0/10, F(1, 14)=5.47, p<.05.
6.4.2.2.2 Interaction Effect of Blur and Dot Size
Results revealed a significant interaction effect between blur and dot size. This indicates that blur
had different effects on the DRs when the dot size changed. To break down this interaction,
contrasts were performed comparing the sharp and blurry conditions for different dot sizes.
These revealed a significant interaction when comparing the effect of blur for dot sizes equal to
1/25 and 1/75, F(1,14)=7.02, p<.05. Figure 6.33 illustrates this interaction effect. As can be seen,
while the DRs for the sharp and blurry conditions do not differ much for dot size=1/25, DRs are
significantly larger for the sharp pattern with (small) dot size=1/75 compared to that of the blurry
pattern. In addition to confirming the larger effect of (a fixed amount of) blur on smaller dot
sizes (which was previously discussed), this result suggests that when dots are smaller in
size, the high number of sharp edges can cause the observer to deem the depth judgement task as
more difficult. This could potentially be due to the fact that (as explained in Section 6.5.2)
stereoscopic images with higher dominant spatial frequencies (smaller dot sizes) are more likely
to lead to visual discomfort (Wöpking, 1995; Perrin, Fuchs, Roumes & Perret, 1998).
Figure 6.33: DRs as a function of dot size depicting the interaction effect of blur and dot size.
6.4.3 Correspondence between Average Absolute Errors in EDVO and DRs
Figure 6.34 depicts the scatterplot for average absolute errors in perceived depth as a function of
the difficulty ratings for all trials. As can be seen, there seems to be a positive correlation
between the two dependent parameters. To investigate whether this apparent correlation was
significant, Kendall’s tau was computed. It revealed a significant positive correlation between the average absolute
errors in perceived depth and the average difficulty ratings, τ = .19, p (one-tailed) < .0005.
Although this correlation was found to be significant, it should be pointed out that the small
value of τ is potentially due to the high number of occurrences where 2 and 3 were chosen as the
difficulty ratings.
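Because the difficulty ratings take only four discrete values, the data contain many tied pairs, which is the situation rank correlations such as Kendall’s tau are suited to. The sketch below computes the tie-adjusted variant (tau-b) in plain Python. Which variant was used in the analysis is not stated in this section, so tau-b is an assumption, and the ratings and errors in the example are invented for illustration.

```python
import math
from collections import Counter

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties in x and y."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # pairs tied in x or y count toward neither total
    n_pairs = n * (n - 1) // 2
    ties_x = sum(c * (c - 1) // 2 for c in Counter(x).values())
    ties_y = sum(c * (c - 1) // 2 for c in Counter(y).values())
    return (concordant - discordant) / math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))

# Hypothetical difficulty ratings (1-4) and absolute errors, dominated by ratings 2 and 3.
ratings = [1, 2, 2, 2, 3, 3, 3, 4]
errors = [0.05, 0.10, 0.20, 0.05, 0.15, 0.30, 0.10, 0.25]
print(f"tau-b = {kendall_tau_b(ratings, errors):.2f}")
```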
Figure 6.34: Scatterplot showing average absolute errors in perceived depth as a function of the
difficulty rating for all trials.
6.4.4 Responses to the Interview Questions
Once the participants were done with the experiment trials, they were asked to reply to several
questions. The answers to each of these questions are provided in Appendix B3 and summarized
below.
• Any difficulties experienced in fusing images
o While none of the participants claimed to have experienced double images when
observing the stimuli, some mentioned that the farther the cone moved inside the
bin, the more difficult it became to fuse in 3D. In fact, some claimed that this was
one of the strategies they used in judging the depth proportion to be large.
• Specific strategies used for making depth judgements
As mentioned before, by asking participants to “determine what proportion of the truncated
cone (x-tenths) is behind the bin's surface”, we are asking them to determine the exocentric
distance of the larger circle of the truncated cone relative to the bin’s surface (‘a’ in Figure
6.11), given that the exocentric distance between the larger circle and the smaller circle of the
truncated cone (‘a+b’ in Figure 6.11) is 10 units. To do so, participants needed to have a
rather quick estimate of ‘a’ and ‘b’. Therefore, as expected:
o Most participants claimed that they made their depth judgements by determining ‘a’
and ‘b’ (in no particular order) and then comparing the two values.
o Some mentioned that following their gaze along the lines connecting the two circles
also helped.
o Some also made their judgements based on the perceived change in the depth of the
virtual object relative to the previous trial.
On the other hand, even though participants were explicitly advised to not use the relative size
cue41 (since the size of the circles and lengths of the truncated cone changed randomly):
o Some participants claimed that they used the relative size of the large and small
circles to make their depth judgements. Some others also mentioned using the length
of the lines connecting the two circles as an indication of the length of the truncated
cone.
With regards to cues that helped them estimate the values for ‘a’ and ‘b’, participants used one or
a combination of several rules of thumb. In general, as the larger circle moved farther behind the
bin’s surface, participants used the following terms to describe their perception of the larger
circle:
o less clear,
o less stark,
41 As outlined in the Information Sheet provided.
o harder to fuse,
o more out of focus,
o jagged,
o blurrier,
and vice versa. Moreover:
o With the No Pattern condition, participants claimed that determining the depth
proportion of the truncated cone was most difficult. However, in cases where the cone
was completely in front of the surface of the bin, determining its depth was easy.
o Most participants found making depth judgements more difficult when dot densities
were high.
o In making their depth judgements, half of the participants found blurry patterns easier,
while the other half found sharp patterns easier.
o Some claimed when patterns were blurry, the surface of the bin seemed farther away.
o Some claimed that it was easier to perceive the larger circle as being behind the
surface of the bin when they focused on points where it was on the black dots (rather
than on the white dots).
• The way in which the black dots are perceived
o Almost all participants (except two) perceived the black dots as part of the surface
of the bin (i.e. as painted marks on the surface of the bin).
As for the other two:
o One participant viewed the pattern as a ‘cut-out’ where black dots were holes
through which the larger circle could be seen.
o One participant perceived the black dots as part of the inside of the bin (which
also implies that the black dots were perceived as holes).
6.5 Discussion
In this section, we provide a discussion of the results by summarizing their implications and
comparing them to the hypotheses presented in Section 6.3.
6.5.1 Errors in EDVO
HYPOTHESIS 1: The addition of (sharp) patterns will lead to lower errors in EDVO
relative to the No Pattern condition.
While the analysis presented in Section 6.4.1.1 revealed various interaction effects, the general
results supported this hypothesis for depths≥6/10. In other words, while there may be no
significant effect of using random dot patterns for improving the accuracy of depth judgements
for depths<6/10, results showed that as the depth of the virtual object increases (to 6/10 and beyond), the
addition of random dot patterns onto the surface of the bin can significantly reduce errors in
EDVO. This result can also be visually confirmed by examining the plots presented in Figure
6.13, Figure 6.14 and Figure 6.15, where it can be seen that, for depths≥6/10, the median errors
in EDVO are consistently smaller than those of the No Pattern condition.
HYPOTHESIS 2a: Sharp patterns will lead to lower errors in EDVO compared to blurry
patterns.
The results presented in Section 6.4.1.2.1 revealed significant interactions between depth and
blur. Upon inspection of contrasts and Figure 6.21, it can be confirmed that as the depth of the
virtual object increased, the average absolute errors in EDVO for sharp patterns were
significantly smaller compared to those of blurry patterns. Therefore, we can claim that the
results support this hypothesis for depths≥8/10. However, as shown in Figure 6.21, at
depth=0/10, the blurry patterns seem to lead to lower average absolute errors compared to the
sharp patterns. A potential explanation could be that when the random dot pattern was blurry (or
in other words, ‘out of focus’ for the observer), it gave the impression that the surface of the bin
was farther away than the virtual object (which appeared ‘in focus’). In such cases, participants
were more likely to perceive the virtual object as being closer to them than the surface of the bin.
When depth=0/10, this was in fact the case, which is why average absolute errors are lower for
blurry patterns. As mentioned in Section 6.4.4, participants explicitly pointed this out when
replying to the interview questions.
Moreover, the interaction effect between blur and dot density (as discussed in Section 6.4.1.2.2)
revealed that this difference between the blurry and sharp conditions was significant for dot
density of 20% compared to those of 40 and 60%. This finding suggests that, even with 20% dot
density, sharp patterns may be able to provide the edges required for making vergence eye
movements.
HYPOTHESIS 2b: If EDVO errors were found to be different for blurry and sharp
patterns, it was hypothesised that this difference would be larger for smaller dot sizes.
The interaction effect between blur, dot size and depth (as discussed and presented in Section
6.4.1.2.4) also supported this hypothesis, by showing that as dot size decreased, the difference
discussed above became larger. As previously mentioned, the reason for this is that the fixed
amount of blur applied to the various patterns resulted in a larger visible effect for smaller dots.
EFFECT OF DOT SIZE
As previously mentioned, two competing hypotheses existed for the effect of dot size on average
absolute errors in EDVO: While larger dot sizes provide more distinct edges, smaller dot sizes
result in a larger number of edges available for making vergence eye movements. Perhaps this is
a possible reason why no main effect of dot size was found. The only interactions that dot size
appeared in were:
o Blur*depth* dot size: This effect was discussed above.
o Blur*depth* dot size*dot density: Since no particular trend can be found for this
interaction effect, it is difficult to propose an explanation for it.
HYPOTHESIS 3: Higher dot densities will increase the stimulus’ strength in attracting
vergence (Rashbass & Westheimer, 1961; Mallot, Roll & Arndt, 1995), leading to smaller
errors in EDVO.
With regard to the effect of dot density, some interaction effects did provide
supporting evidence for this hypothesis. First of all, the interaction effect of blur and dot density
(as presented in Section 6.4.1.2.2) on average absolute errors in EDVO showed that the effect of
blur was significant for dot density of 20% compared to that of 40 and 60%. What this result
may be implying is that when dot density is low (and, hence, there are probably fewer edges
available for guiding vergence eye movements), the sharpness of the random dot pattern matters.
However, as dot density increases, the effect of blur diminishes, possibly as a result of an
increase in the random dot pattern’s strength in attracting vergence. In other words, at densities
of 40% and 60%, the random dot pattern’s strength in attracting vergence may be high enough to
compensate for the absence of distinct edges. Moreover, the interaction effect of depth and dot
density (presented in Section 6.4.1.2.3 and illustrated in Figure 6.23, Figure 6.24 and Figure
6.25) shows that, for depths>6/10, as dot density increases, the average absolute errors in EDVO
decrease. Put simply, as the depth of the virtual object increases, higher dot densities are better
able to aid in estimating the depth of the virtual object. This result has potentially important
implications for the application of random dot patterns for X-ray vision purposes. However, as
discussed in the case of the previous series of experiments, one should also be wary about the
amount of information loss of the real surface due to the overlaying of the random dot pattern.
6.5.2 DRs
HYPOTHESIS 4: The No Pattern condition will yield higher DRs compared to the pattern
conditions.
As mentioned in Section 6.4.2.1, results did not reveal a significant main effect of pattern.
Therefore, we were not able to support Hypothesis 4. Considering that the average absolute
errors in EDVO were generally significantly lower for the random dot patterns (compared to
those of the No Pattern condition), we may attribute the absence of significant differences in DRs
for these conditions to the possibility that, when replying to the question of difficulty in
‘estimating the depth of the virtual object’, participants were also considering the difficulty in
‘fusing the image’ (or, rather, their ‘comfort when viewing the image’). In fact, the interaction
effect found between blur and dot size also supports this explanation. To further explain these
results, a brief discussion of the literature on this topic is required.
Various researchers have shown the dependence of fusional limits on the spatial frequency of the
perceived image (e.g. Felton, Richards & Smith, 1972; Schor, Wood, & Ogawa, 1984; Schor,
Heckmann, & Tyler, 1989). The results of these studies have shown that as the dominant spatial
frequency of an image increases, the fusion limit decreases in range. Moreover, other researchers
have investigated the effect of spatial frequency on visual comfort when viewing stereoscopic
images (e.g. Wöpking, 1995; Perrin, Fuchs, Roumes & Perret, 1998). Based on these studies,
stereoscopic images with higher dominant spatial frequencies received lower comfort ratings
compared to those with lower dominant spatial frequencies. In fact, in some studies, the use of
blur (as a means of decreasing spatial frequency of the image content) is suggested to increase
viewing comfort (e.g. Wöpking, 1995; Leroy, Fuchs & Moreau, 2012).
Therefore, it seems that, while the No Pattern condition did not provide any edges to help
participants determine the depth of the real surface, the higher viewing comfort it provided
(because of its lower dominant spatial frequency compared to that of the random dot pattern
conditions) compensated for the difficulty of the depth estimation task. Perhaps this is why the
No Pattern and the random dot pattern conditions did not lead to significantly different DRs.
Moreover, as discussed in Section 6.4.2.2.2 and illustrated in Figure 6.33, for the random dot
pattern with dot size of 1/75, DRs were significantly larger for the sharp condition (higher spatial
frequency) compared to those of the blurry condition (lower spatial frequency). The
consistency of this result with findings of the literature (discussed above) provides relatively
strong evidence that participants were, in fact, taking their viewing comfort into account when
assessing the difficulty of the depth estimation task.
HYPOTHESIS 5: Blurry patterns will lead to higher difficulty ratings compared to their sharp
counterparts.
As mentioned, the results presented in Section 6.4.2.2.2 and illustrated in Figure 6.33 provided
evidence contradicting this hypothesis. The potential explanation is that blurring of random dot
patterns decreased the image’s spatial frequency resulting in lower DRs (as discussed above). It
should also be reiterated that the reason the effect of blur on DRs was larger for smaller dot sizes
is that the fixed amount of blur applied to the various patterns resulted in a larger visible effect
for smaller dots (as previously explained).
HYPOTHESIS 6: Larger dot sizes will lead to lower DRs.
As illustrated in Figure 6.33, while this hypothesis was supported for sharp patterns, this was not
the case for blurry patterns. As predicted, in the case of sharp random dot patterns, larger dot
sizes provided edges that were more distinct, thereby leading to lower DRs. Additionally, with
regard to the discussion above, since random dot patterns with larger dot sizes were of lower
spatial frequency, they were more easily fused. On the other hand, when the patterns were blurry,
the fixed amount of blur applied to the random dot patterns resulted in larger attenuation of high
spatial frequencies for smaller dot sizes, which explains the reduction in DRs.
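The argument that a fixed amount of blur attenuates smaller dots more strongly can be illustrated with a minimal sketch. The type and amount of blur are not restated in this section, so the sketch assumes a Gaussian blur, whose amplitude gain at spatial frequency f is exp(-2π²σ²f²). The image width, the blur value SIGMA_PX and the approximation of a dot pattern's fundamental frequency as one cycle per two dot widths are all hypothetical choices made for illustration only.

```python
import math

def gaussian_attenuation(freq_cycles_per_px: float, sigma_px: float) -> float:
    """Amplitude gain of a Gaussian blur (standard deviation sigma_px)
    at a given spatial frequency, i.e. exp(-2 * pi^2 * sigma^2 * f^2)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma_px ** 2 * freq_cycles_per_px ** 2)

def fundamental_frequency(dot_width_px: float) -> float:
    """Rough fundamental frequency of a dot/gap alternation:
    one full cycle spans two dot widths (an approximation)."""
    return 1.0 / (2.0 * dot_width_px)

IMAGE_WIDTH_PX = 900  # hypothetical image width
SIGMA_PX = 3.0        # hypothetical fixed blur, identical for every pattern

for denominator in (75, 50):  # dot sizes 1/75 and 1/50, assumed relative to image width
    dot_width = IMAGE_WIDTH_PX / denominator
    gain = gaussian_attenuation(fundamental_frequency(dot_width), SIGMA_PX)
    print(f"dot size 1/{denominator}: width {dot_width:.0f} px, gain {gain:.2f}")
```

Under these assumptions the smaller dot size (1/75) retains less of its fundamental frequency than the larger one (1/50), which is consistent with the larger visible effect of blur reported for smaller dots.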
HYPOTHESIS 7: Higher dot densities will lead to lower DRs.
Contrary to this hypothesis, during the interview, most participants claimed that they found
making depth judgements more difficult when dot densities were high. However, statistical
analyses did not reveal a significant effect of dot density on DRs.
6.5.3 Relationship between Average Absolute Errors in EDVO and DRs
When forming the hypotheses for this experiment, it was expected that the parameters that would
lead to smaller errors in EDVO would do so by making the depth judgement task easier. In other
words, we expected the magnitude of the DRs to be proportional to errors in EDVO. While the
results presented in Section 6.4.3 do support the hypothesis that a correlation exists between
these two dependent parameters, the value of this correlation is rather small. This small value
serves as further evidence for the discussion presented on DRs above. We may, therefore,
conclude that while the wording of “On a scale of 1 (easiest) to 4 (most difficult), how difficult
did you find the task?” was intended to assess the difficulty of the depth estimation task, participants
may have responded to it with a measure that combined their visual comfort rating with their
assessment of the difficulty of the depth estimation task.
6.5.4 Some Notes on the Responses to the Interview Questions
o As previously mentioned, participants used the following terms to describe their
perception of the larger circle as it moved farther behind the bin’s surface: less clear, less
stark, harder to fuse, more out of focus, jagged, blurrier. These descriptions seem to be in
agreement with the term Pseudo-Translucency (or Stereo-Pseudo-Translucency) that we
used in Chapter 4 to label the observed phenomenon when a virtual object is rendered
stereoscopically behind a real surface without being occluded by it. As may be recalled, a
pseudo-translucent surface gives the impression that one is looking through a diffuse
surface, somewhat akin to frosted glass, and can therefore result in perceiving the virtual
object (behind it) as ‘less clear’, ‘less stark’, etc.
o Upon inspection of figures illustrating average absolute errors in EDVO versus depth of
the virtual object (e.g. Figure 6.18, Figure 6.19 and Figure 6.20), one of the first
impressions that stands out is the apparent U-shaped trend of the results. In other words,
the errors seem to be largest for depths=0/10 and 10/10 and smallest in between these
depths. While this may seem counter-intuitive, two potential explanations can be
presented for this observation. Firstly, as previously mentioned, the average absolute
errors in EDVO were calculated by subtracting the actual depth of the virtual object from
its perceived depth. Therefore, the maximum values of errors for depths between 0/10
and 10/10 were smaller than those for depths=0/10 and 10/10. In addition to this, it is also
likely that, when faced with uncertainty, participants tended to choose an intermediate
value, which would obviously lead to smaller errors for the depths between 0/10 and
10/10.
o The second potential explanation, which is potentially most important, stems from the
statement that some participants made as one of their strategies in making depth
judgements: “Following my gaze along the lines connecting the two circles also helped.”
One of the lessons learned when trying out different ideas for the design of the virtual
object for this experiment was that when the virtual object’s wireframe structure
contained vertical components that passed through the real surface, observers would
usually be easily able to determine the proportion of that component that was behind the
surface. The reason for this is that, at the point where the conflict between occlusion and
binocular disparity cues occurs, the virtual object becomes rather ‘less clear’, ‘less stark’,
etc., as discussed above. For this reason, when creating the virtual object (through the
process described in Section 6.2.1.3), the lines used for connecting the circles of the
truncated cone were drawn as thin as possible to minimize the usefulness of this cue.
However, as the U-shaped trend of the errors in EDVO suggests, it seems that this cue was
perhaps still being used for making depth judgements for depths between 0/10 and 10/10.
For depths=0/10 and 10/10, this cue was not available and perhaps this is why the
average absolute errors are highest for these depths.
o In making their depth judgements, half of the participants found blurry patterns easier,
while the other half found sharp patterns easier. As discussed in Section 6.5.3, the reason
for the different opinions on this could potentially be that, while some focused mostly on
their visual comfort, others were considering their ability to make accurate depth
judgements.
o Referring to the two participants’ comments regarding ‘perceiving the random dot pattern
as a ‘cut-out’ where black dots were holes through which the larger circle could be seen’
and ‘perceiving the black dots as part of the inside of the bin’, it seems that these two
participants were able to conform to the anticipated percept of our X-ray vision
visualization method, which was presented in Chapter 4 (Figure 4.3).
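The first explanation offered above for the U-shaped trend (a bounded response scale combined with a tendency to choose intermediate values) can be made concrete with a small sketch. It assumes that responses are confined to the same 0/10 to 10/10 scale as the actual depths, and it models the "intermediate value" strategy as always answering 5/10; both are simplifying assumptions, and the depths are stepped by 2 purely for brevity.

```python
SCALE_MAX = 10  # depths are expressed as 0/10 ... 10/10

def max_abs_error(actual_depth, scale_max=SCALE_MAX):
    """Largest possible |perceived - actual| when responses stay within 0..scale_max."""
    return max(actual_depth, scale_max - actual_depth)

def midpoint_response_error(actual_depth, scale_max=SCALE_MAX):
    """Absolute error of a hypothetical observer who always answers mid-scale."""
    return abs(scale_max / 2 - actual_depth)

for depth in range(0, SCALE_MAX + 1, 2):
    print(f"depth {depth}/10: max error {max_abs_error(depth)}/10, "
          f"midpoint-response error {midpoint_response_error(depth):.0f}/10")
```

Both quantities are largest at depths of 0/10 and 10/10 and smallest in the middle of the range, so a U-shaped error curve can arise from the response scale alone, independently of any perceptual cue.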
6.6 Contributions and Limitations
The main contributions of this experiment are as follows:
• Random dot patterns were shown to be a potentially effective method for increasing the
accuracy of depth judgements in applications of X-ray vision with stereoscopic AR
displays.
• Sharp random dot patterns (with distinct edges) may be effective in guiding vergence eye
movements, by providing features that are easier to fixate.
• In choosing the appropriate dot size and density of the random dot pattern, the spatial
frequency content of the pattern should be considered to ensure that it does not
compromise visual comfort.
It is important, however, to point out that the experiments presented here were limited to the use
of a convex surface without a prominent visible texture, and to a 3D wireframe virtual object
being presented in depth. Therefore, the applicability and validity of these results in conditions
where the real object’s surface contains 2D or 3D textural elements, or in cases where the virtual
object is solid, are unknown.
Moreover, as discussed in Section 6.2.1.3, the generation of stimuli for this experiment was done
using physical models, which allowed for a simulation of an AR display. While it is expected
that the results obtained remain valid for an actual AR display replicating these experimental
conditions (by computationally adding the random dot pattern to the real surface and generating
and rendering the virtual object in depth), the possible imprecisions resulting from the use of
physical models should also be considered as a potential limitation of this experiment.
Chapter 7
Conclusion
The experimental evaluations performed throughout this research study support the premise of
random dot patterns improving the accuracy of ordinal and absolute depth judgements of virtual
objects relative to the surface of real objects. The dot size and dot density used
have also been shown to affect the perception of surface information and should, therefore, be
considered when using this technique to improve depth judgements for the purpose of achieving
X-ray vision with stereoscopic AR displays.
7.1 Contributions
The individual contributions of each experiment have been discussed in Sections 5.5 and 6.6. In
this section, I provide a summary of the overall contributions of this research project:
• Contributed to the advancement of a novel approach to achieving X-ray vision for use
with stereoscopic AR displays.
• Performed an experimental procedure that:
o Verified the benefit of using random dot patterns to help in disambiguation of the
depth order between a virtual object and a flat real surface with 2D textural
elements,
o Verified the benefit of using random dot patterns to help in achieving the
impression of transparency of the real surface, and
o Demonstrated that real surface information can be preserved even with the
addition of RDPs, depending on the sizes and densities of dots used.
• Contributed to the fundamental knowledge base in visual representation by providing
supporting evidence that occlusion (which is widely recognized as the strongest cue to
relative depth order) can be robustly overridden through the appropriate use of the
binocular disparity and convergence cues.
• Developed and implemented another experimental platform, software and procedure
specifically focused on absolute depth judgements, the results of which:
o Provided supporting evidence that distinct edges of (sharp) random dot patterns
may allow for improvements in the accuracy of absolute depth judgements,
possibly due to increased ease of fixation on those distinct edges, leading to
additional depth information from convergence cues;
o Confirmed the benefit of using random dot patterns to improve the accuracy of
absolute depth judgements, and
o Assessed the impact of random dot pattern parameters (dot size and dot density)
on accuracy of absolute depth judgements.
7.2 Practical Implications
From a practical point of view, the findings can be used as a basis for developing guidelines for
the use of random dot patterns to achieve X-ray vision and to improve (absolute and ordinal)
depth judgements for near-field applications of stereoscopic AR displays. Based on the
experimental results, for near-field applications of stereoscopic AR displays, the following
design guidelines are proposed:
• Higher dot densities are better able to aid in estimating the depth of the virtual object (as
illustrated by Figure 6.23, Figure 6.24 and Figure 6.25) for larger depths (in the present
experiment for depths ≥6/10). However, the preservation of real surface information is an
important factor to consider, as higher dot densities lead to more information loss (as
suggested by Figure 5.7).
• If preservation of real surface information is required, consideration should be given to
both dot size and dot density of the random dot pattern. For example, if the location of
the most important information on the real surface is known, using a larger dot size is
recommended (as suggested by the low absolute errors in Figure 5.8), since larger dots
can be configured such that important parts of the real surface are not occluded. In this
case, dot densities as low as 20% can still be used effectively (as shown in Figure
6.18 and Figure 6.22). On the other hand, if the location of the most important information on
the real surface is not known, smaller dot sizes with higher dot densities are
recommended (as suggested by Figure 6.25).
• In cases where factors such as the use of small dot sizes and high dot densities may lead
to subjective difficulty in making depth judgements (possibly due to the spatial frequency
content of such random dot patterns, which are potentially associated with visual
discomfort), blurring of the random dot pattern may be helpful (as suggested by Figure
6.33). However, since doing so may diminish the desired improvement in making depth
judgements, blurring of the random dot pattern is not recommended for dot densities
below 40% (as shown in Figure 6.22) and dot sizes smaller than 1/50 (as suggested by the
large difference between the sharp and blurry conditions for depth=10/10 in Figure 6.27 and
Figure 6.28).
• In cases where virtual objects are partially inside the real object, and where accurate
depth judgement is important for task execution, using a blurry pattern may be a better
alternative for small depths (as suggested by Figure 6.21 for depths≤4/10) but using sharp
patterns may be better for large depths (as suggested by the same figure for depths≥6/10).
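To make the parameters in these guidelines concrete, the following is a minimal sketch of how a random dot pattern with a given dot size, dot density and optional blur might be generated. It is not the procedure used to create the experimental stimuli (which relied on physical models). It assumes square dots placed on a regular grid, dot size expressed as a fraction of image width, dot density as the fraction of grid cells filled, and a simple box blur standing in for the unspecified blur; the function names are hypothetical.

```python
import random

def random_dot_pattern(width_px, height_px, dot_size_fraction, density, seed=None):
    """Return a 2D list of luminance values: 0.0 for dots, 1.0 for background.
    Exactly round(density * number_of_cells) grid cells are filled with a dot."""
    rng = random.Random(seed)
    dot_px = max(1, round(width_px * dot_size_fraction))
    cols, rows = width_px // dot_px, height_px // dot_px
    n_cells = rows * cols
    dot_cells = rng.sample(range(n_cells), round(density * n_cells))
    image = [[1.0] * width_px for _ in range(height_px)]
    for cell in dot_cells:
        r, c = divmod(cell, cols)
        for y in range(r * dot_px, (r + 1) * dot_px):
            for x in range(c * dot_px, (c + 1) * dot_px):
                image[y][x] = 0.0
    return image

def box_blur(image, radius_px):
    """Separable box blur; the averaging window is truncated at the image borders."""
    def blur_rows(img):
        blurred = []
        for row in img:
            n = len(row)
            blurred.append([
                sum(row[max(0, x - radius_px):min(n, x + radius_px + 1)])
                / (min(n, x + radius_px + 1) - max(0, x - radius_px))
                for x in range(n)
            ])
        return blurred

    def transpose(img):
        return [list(col) for col in zip(*img)]

    # Blur along rows, then along columns.
    return transpose(blur_rows(transpose(blur_rows(image))))

# Example: dot size 1/75 of the image width, 40% density, then a mild blur.
sharp = random_dot_pattern(150, 150, 1 / 75, 0.40, seed=1)
blurry = box_blur(sharp, radius_px=1)
```

Fixing the number of filled cells, rather than filling each cell with a given probability, keeps the realised density identical across patterns, which makes comparisons between dot sizes easier to interpret.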
7.3 Limitations and Suggested Improvements to Experiments
When interpreting the results and findings of this research one should be mindful of a number of
limitations related to the experiments performed. These limitations are divided into two
categories, those corresponding to Experiments 1 and 2 and those corresponding to Experiment 3,
and are followed by suggested improvements.
7.3.1 Experiments 1 and 2
As discussed, the ‘real surface’ that was used in these two experiments was a photograph of a flat
surface that consisted of 2D textural elements. Therefore, the results of these experiments are
limited to such surfaces and may not be applicable to non-flat surfaces with 2D textural elements
or surfaces that contain 3D textural elements. Using this experimental framework for different
surfaces would be worthwhile in investigating the effect of surface characteristics on the
effectiveness of using random dot patterns. The results of these experiments also apply only to
wireframe 2D virtual objects.
As for Experiment 1 (and as noted in Footnote 29), the psychophysical functions were fitted to
data that were in the close vicinity of the Point of Objective Equality. To obtain psychophysical
functions with better fits, it is suggested to also include stimuli where the virtual object is placed
at larger (stereoscopic) distances from the real surface (both in front of and behind).
With regard to Experiment 2, as shown in Figure 5.7, we were not able to identify a clear trend
for the Transparency Rating values as a function of dot density (which took the values 40, 50, 60
and 70%). Comparing our results to those of Otsuki and Milgram (2013), it is suspected that
including dot densities as low as 20% might have revealed such a trend.
For a few combinations of dot size and dot density, the d’ values obtained in this experiment
were very small (≤0.5) or close to 0, which indicates close-to-chance performance. One
implication of this result is that the shape matching task may have been too difficult. One way
to mitigate this when designing a similar experiment would be to increase the eccentricity of
the ellipses used.
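As a reminder of why small d’ values indicate near-chance performance, the sketch below computes d’ under the standard equal-variance signal detection model for a yes/no task, d’ = z(hit rate) − z(false alarm rate). How d’ was derived for the shape matching task is not restated in this section, so this model and the example rates are assumptions used for illustration only.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index under the equal-variance Gaussian model.
    Both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates: equal hit and false alarm rates give d' = 0 (chance);
# a modest separation of 0.60 vs 0.40 gives a d' of roughly 0.5.
print(d_prime(0.50, 0.50))
print(d_prime(0.60, 0.40))
```

In this model, a d’ of about 0.5 corresponds to hit and false alarm rates that differ by only around 20 percentage points, which illustrates how little the stimuli were being discriminated in those conditions.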
7.3.2 Experiment 3
While Experiment 3 aimed to expand upon the work done in Experiments 1 and 2, there are also
limitations associated with this experiment. One of these is that the real object consisted of a
convex surface with no visible texture and the virtual object was a wireframe object. Therefore,
the results of this experiment are limited to such objects. As an example of this limitation, it is
predicted (but still remains open for investigation) that an analogous convex surface but with 3D
textural elements might not necessarily benefit from the addition of random dot patterns, as was
observed through a few samples tried out in pilot studies.
Yet another avenue of future investigation is to explore the applicability of this method when
using solid virtual objects.
The present experiment also consisted of conditions where the truncated (wireframe) cone was
placed either in front of, in between, or behind the real surface. As discussed in Section 6.5.4,
this resulted in some participants following their gaze along the lines connecting the two circles
to find ‘the point of cue conflict’ to make their depth judgement. Since this cue may not exist for
certain X-ray vision applications of AR, it is also necessary to investigate the effectiveness of
using random dot patterns for cases where the virtual object is placed at different depths but
always behind the real surface.
Moreover, as discussed in Section 6.6, the generation of stimuli for this experiment was done
using physical models, which allowed for a simulation of an AR display. As a result, the possible
imprecisions resulting from the use of physical models should be considered as a limitation of
this experiment. To eliminate such limitations, 3D software should be developed to permit the
random dot patterns to be correctly added to the real object surface by obtaining a depth map for
the real surface, and the virtual object should be generated and rendered in depth directly, also
using software.
Additionally, even though the results of this experiment were in line with the hypothesis that
adding random dot patterns will provide additional fixation points on the real object surface, which
in turn will facilitate vergence eye movements and thus provide additional depth cues, making a
definitive conclusion in this regard is not possible, as eye convergence angles were not measured
in this experiment. Using appropriate experimental means, such as eye tracking devices, to
continuously measure exact angles of convergence is necessary to investigate this phenomenon
more closely and more definitively.
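For orientation on the magnitudes that such eye tracking measurements would need to resolve, the geometric relationship between fixation distance and convergence angle can be sketched as follows. The sketch assumes symmetric fixation on a point straight ahead and a hypothetical interpupillary distance of 63 mm; the viewing distances are illustrative and are not taken from the experimental setup.

```python
import math

def vergence_angle_deg(distance_m: float, ipd_m: float = 0.063) -> float:
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at distance_m (symmetric convergence)."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

# Hypothetical near-field distances: a surface at 0.50 m and a point 5 cm behind it.
surface = vergence_angle_deg(0.50)
behind = vergence_angle_deg(0.55)
print(f"surface: {surface:.2f} deg, behind: {behind:.2f} deg, "
      f"difference: {surface - behind:.2f} deg")
```

Under these assumptions, a 5 cm depth difference at about half a metre changes the convergence angle by well under one degree, which gives a sense of the precision an eye tracker would need for this purpose.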
Finally, it should be noted that these experiments were carried out in a controlled laboratory
environment and many simplifications were made in task complexity and environmental
conditions. In medical applications of AR, for example, many task and environmental
complexities exist, and therefore further studies under more realistic conditions are required to
determine the exact conditions under which this approach will prove effective.
7.4 Future Work
The following is a list of suggestions for future work that can build on what was found in this
research:
• Implementing the idea of adding random dot patterns onto a real surface and generating
and rendering virtual objects in depth computationally, for stereoscopic AR, to assess
potential challenges in its technical implementation. Once this is done:
• Introducing complexity to the experimental task by developing an experimental platform
and designing an interactive manual manipulation task that requires depth judgements to
go beyond the present strictly observational depth judgement tasks.
• Along similar lines, developing the capability for the virtual object to move in relation to
the real object surface (either autonomously or under the control of the user), in order to
investigate the premise that visual cues derived from motion of the virtual object relative
to the real object surface will significantly enhance the percept of the virtual object being
behind the real surface.
• Analogous to the point above, it is surmised that being able to manipulate the
superimposed random dot patterns, in terms of dot sizes, dot densities and dot
distributions (either autonomously or under the control of the user) is likely to increase
some of the advantages of this display concept, and thus merits investigation.
• Further evaluation of this approach outside of a controlled laboratory setting, in an
environment that is representative of near-field applications of X-ray vision in AR.
• Further to the point above, some of the environmental conditions that need to be studied
include: implications of real surfaces consisting of salient information to be preserved,
variations of colour of the virtual object and presence of other depth cues such as motion
parallax.
References
Abdelmounaime, S., & Dong-Chen, H. (2013). New Brodatz-based image databases for
grayscale color and multiband texture analysis. ISRN Machine Vision.
Akerstrom, R. A., & Todd, J. T. (1988). The perception of stereoscopic transparency. Perception
& Psychophysics, 44(5), 421-432.
Avery, B., Sandor, C., & Thomas, B. H. (2009, March). Improving spatial perception for
augmented reality x-ray vision. In Proceedings of the IEEE Virtual Reality Conference (pp. 79-
82). IEEE.
Bajura, M., Fuchs, H., & Ohbuchi, R. (1992). Merging virtual objects with the real world: Seeing
ultrasound imagery within the patient. In ACM SIGGRAPH Computer Graphics (Vol. 26, No. 2,
pp. 203-210). ACM.
Bichlmeier, C., Wimmer, F., Heining, S. M., & Navab, N. (2007). Contextual anatomic mimesis;
hybrid in-situ visualization method for improving multi-sensory depth perception in medical
augmented reality. In ISMAR 2007: The 6th IEEE and ACM (pp. 129-138). IEEE.
Brodatz, P. (1966). Textures: a photographic album for artists and designers. New York, USA:
Dover Publications.
Bruce, V., Green, P. R., & Georgeson, M. A. (2003). Visual perception: Physiology, psychology,
& ecology. Psychology Press.
Bülthoff, H. H., & Mallot, H. A. (1988). Integration of depth modules: stereo and shading. JOSA
A, 5(10), 1749-1758.
Caudell, T. P., & Mizell, D. W. (1992). Augmented reality: An application of heads-up display
technology to manual manufacturing processes. In System Sciences, 1992. Proceedings of the
Twenty-Fifth Hawaii International Conference on (Vol. 2, pp. 659-669). IEEE.
Coutant, B. E., & Westheimer, G. (1993). Population distribution of stereoscopic ability.
Ophthalmic and Physiological Optics, 13(1), 3-7.
Cumming, B. G., Johnston, E. B., & Parker, A. J. (1993). Effects of different texture cues on
curved surfaces viewed stereoscopically. Vision research, 33(5), 827-838.
Cutting, J. E., & Vishton, P. M. (1995). Perceiving layout and knowing distances: The
integration, relative potency, and contextual use of different information about depth. W.
Epstein, 69-117.
Drascic, D., & Milgram, P. (1996). Perceptual issues in augmented reality. In Proceedings of
SPIE: Stereoscopic Displays and Virtual Reality Systems III, San Jose, California, 123-134.
Edwards, P. J., Johnson, L. G., Hawkes, D. J., Fenlon, M. R., Strong, A. J., & Gleeson, M. J.
(2004). Clinical experience and perception in stereo augmented reality surgical navigation. In
Medical Imaging and Augmented Reality (pp. 369-376). Springer Berlin Heidelberg.
Ellis, S. R., & Bucher, U. J. (1994). Distance perception of stereoscopically presented virtual
objects optically superimposed on physical objects by a head-mounted see-through display. In
Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 38, No. 19,
pp. 1300-1304). SAGE Publications.
Ellis, S. R., & Menges, B. M. (1998). Localization of virtual objects in the near visual field.
Human Factors: The Journal of the Human Factors and Ergonomics Society, 40(3), 415-431.
Felton, T. B., Richards, W., & Smith, R. A. (1972). Disparity processing of spatial frequencies in
man. The Journal of physiology, 225(2), 349-362.
Foley, J. M., & Richards, W. (1972). Effects of voluntary eye movement and convergence on the
binocular appreciation of depth. Attention, Perception, & Psychophysics, 11(6), 423-427.
Gescheider, G. A. (2013). Psychophysics: The Fundamentals. New Jersey, USA: Psychology
Press.
Ghasemi, S., Otsuki, M., Milgram, P., & Chellali, R. (2017). Use of Random Dot Patterns in
Achieving X-Ray Vision for Near-Field Applications of Stereoscopic Video-Based Augmented
Reality Displays. PRESENCE: Teleoperators and Virtual Environments, 26(1), 42-65.
Gibson, J. J. (1950). The perception of visual surfaces. The American journal of psychology,
63(3), 367-384.
Gibson, J. J. (2014). The ecological approach to visual perception: classic edition. Psychology
Press.
Gillam, B. J., & Grove, P. M. (2011). Contour entropy: a new determinant of perceiving ground or
a hole. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 750-
757.
Hou, M., & Milgram, P. (2003, October). A sensitivity study of factors influencing real-virtual
object alignment performance in stereoscopic augmented reality environments. In Proceedings of
the Human Factors and Ergonomics Society Annual Meeting (Vol. 47, No. 13, pp. 1630-1634).
Los Angeles, CA: SAGE Publications.
Interrante, V. (1996). Illustrating Transparency: Communicating the 3D shape of layered
transparent surfaces via texture. Unpublished Doctoral dissertation, University of North
Carolina at Chapel Hill.
Interrante, V., Fuchs, H., & Pizer, S. M. (1997). Conveying the 3D shape of smoothly curving
transparent surfaces via texture. IEEE Transactions on visualization and computer graphics,
3(2), 98-117.
Johnson, L. G., Edwards, P., & Hawkes, D. (2003). Surface transparency makes stereo overlays
unpredictable: the implications for augmented reality. Studies in health technology and
informatics, 131-136.
Johnston, E. B., Cumming, B. G., & Parker, A. J. (1993). Integration of depth modules:
Stereopsis and texture. Vision research, 33(5), 813-826.
Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago, IL: University of Chicago
Press.
Kalkofen, D., Mendez, E., & Schmalstieg, D. (2007, November). Interactive focus and context
visualization for augmented reality. In Proceedings of the 2007 International Symposium on
Mixed and Augmented Reality (ISMAR) (pp. 1-10). IEEE Computer Society.
Kennedy, J. M. (1974). A psychology of picture perception. San Francisco: Jossey-Bass.
Kennedy, J. M., Juricevic, I., & Bai, J. (2003). Line and borders of surfaces: grouping and
foreshortening. In H. Hecht, R. Schwartz, & M. Atherton (Eds.), Reconceiving pictorial space
(pp. 321-354). Cambridge, MA: MIT Press.
Kennedy, J. M., & Wnuczko, M. (2015). What Is a Surface? In the Real World? And Pictures? In
Investigations into the phenomenology and the ontology of the work of art (pp. 89-107). Springer
International Publishing.
Kilpatrick, F. P., & Ittelson, W. H. (1953). The size-distance invariance hypothesis.
Psychological Review, 60(4), 223.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. (1995). Measurement and modeling
of depth cue combination: in defense of weak fusion. Vision research, 35(3), 389-412.
Lerotic, M., Chung, A. J., Mylonas, G., & Yang, G. Z. (2007, October). Pq-space based non-
photorealistic rendering for augmented reality. In International Conference on Medical Image
Computing and Computer-Assisted Intervention (pp. 102-109). Springer, Berlin, Heidelberg.
Leroy, L., Fuchs, P., & Moreau, G. (2012). Visual fatigue reduction for immersive stereoscopic
displays by disparity, content, and focus-point adapted blur. IEEE Transactions on Industrial
Electronics, 59(10), 3998-4004.
Livingston, M. A., Dey, A., Sandor, C., & Thomas, B. H. (2013). Pursuit of “X-ray vision” for
augmented reality (pp. 67-107). Springer New York.
Mallot, H. A., Roll, A., & Arndt, P. A. (1995). Disparity-evoked vergence is directed towards
average depth.
McIntire, J. P., Havig, P. R., & Geiselman, E. E. (2014). Stereoscopic 3D displays and human
performance: A comprehensive review. Displays, 35(1), 18-26.
Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE
Transactions on Information and Systems, 77(12), 1321-1329.
Mohr, P., Kerbl, B., Donoser, M., Schmalstieg, D., & Kalkofen, D. (2015, April). Retargeting
technical documentation to augmented reality. In Proceedings of the 33rd Annual ACM
Conference on Human Factors in Computing Systems (pp. 3337-3346). ACM.
Otsuki, M., & Milgram, P. (2013, October). Psychophysical exploration of stereoscopic pseudo-
transparency. In Proceedings of the 2013 International Symposium on Mixed and Augmented
Reality (ISMAR) (pp. 1-6). IEEE.
Parker, A. J., Christou, C., Cumming, B. G., Johnston, E. B., Hawken, M., & Zisserman, A.
(1992). The analysis of 3-D shape: psychophysical principles and neural mechanisms. In
Approaches to Understanding Vision.
Patterson, R. (2009). Review Paper: Human factors of stereo displays: An update. Journal of the
Society for Information Display, 17(12), 987-996.
Perrin, J., Fuchs, P., Roumes, C., & Perret, F. (1998, July). Improvement of stereoscopic comfort
through control of the disparity and of the spatial frequency content. In Aerospace/Defense
Sensing and Controls (pp. 124-134). International Society for Optics and Photonics.
Peterson, M. A. (2015). Low-level and high-level contributions to figure-ground organization:
Evidence and theoretical implications. In J. Wagemans (Ed.), The Oxford Handbook of
Perceptual Organization (pp. 259-280). Oxford University Press.
Rao, A. R., & Lohse, G. L. (1993). Identifying high level features of texture perception. CVGIP:
Graphical Models and Image Processing, 55(3), 218-233.
Rashbass, C., & Westheimer, G. (1961). Disjunctive eye movements. The Journal of Physiology,
159(2), 339-360.
Rosenthal, M., State, A., Lee, J., Hirota, G., Ackerman, J., Keller, K., & Fuchs, H. (2002).
Augmented reality guidance for needle biopsies: an initial randomized, controlled trial in
phantoms. Medical Image Analysis, 6(3), 313-320.
Sandor, C., Cunningham, A., Dey, A., & Mattila, V. V. (2010). An augmented reality x-ray
system based on visual saliency. In Proceedings of the 2010 9th IEEE and ACM International
Symposium on Mixed and Augmented Reality (pp. 27-36). IEEE.
Schall, G., Zollmann, S., & Reitmayr, G. (2013). Smart Vidente: advances in mobile augmented
reality for interactive visualization of underground infrastructure. Personal and ubiquitous
computing, 17(7), 1533-1549.
Schmalstieg, D., & Höllerer, T. (2016). Augmented reality: principles and practice.
Addison-Wesley Professional.
Schor, C., Heckmann, T., & Tyler, C. W. (1989). Binocular fusion limits are independent of
contrast, luminance gradient and component phases. Vision research, 29(7), 821-835.
Schor, C., Wood, I., & Ogawa, J. (1984). Binocular sensory fusion is limited by spatial
resolution. Vision research, 24(7), 661-665.
Sielhorst, T., Bichlmeier, C., Heining, S. M., & Navab, N. (2006, October). Depth perception–a
major issue in medical AR: evaluation study by twenty surgeons. In International Conference on
Medical Image Computing and Computer-Assisted Intervention (pp. 364-372). Springer Berlin
Heidelberg.
Singh, G., Swan II, J. E., Jones, J. A., & Ellis, S. R. (2010, July). Depth judgment measures and
occluding surfaces in near-field augmented reality. In Proceedings of the 7th Symposium on
Applied Perception in Graphics and Visualization (pp. 149-156). ACM.
Swan, J. E., Jones, A., Kolstad, E., Livingston, M. A., & Smallman, H. S. (2007). Egocentric
depth judgments in optical, see-through augmented reality. IEEE transactions on visualization
and computer graphics, 13(3), 429-442.
Thurstone, L. L. (1927). The method of paired comparisons for social values. The Journal of
Abnormal and Social Psychology, 21(4), 384.
Tsirlin, I., Allison, R. S., & Wilcox, L. M. (2008). Stereoscopic transparency: Constraints on the
perception of multiple surfaces. Journal of Vision, 8(5), 1-10.
van Ee, R., Adams, W. J., & Mamassian, P. (2003). Bayesian modeling of cue interaction:
bistability in stereoscopic slant perception. JOSA A, 20(7), 1398-1406.
Van Ee, R., Van Dam, L. C., & Erkelens, C. J. (2002). Bi-stability in perceived slant when
binocular disparity and monocular perspective specify different slants. Journal of Vision, 2(9),
597-607.
Wheatstone, C. (1838). Contributions to the physiology of vision.--Part the first. On some
remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical transactions
of the Royal Society of London, 371-394.
Wickens, C. D., Hollands, J. G., Banbury, S., & Parasuraman, R. (2015). Engineering
psychology & human performance. Psychology Press.
Wismeijer, D. A., Erkelens, C. J., van Ee, R., & Wexler, M. (2010). Depth cue combination in
spontaneous eye movements. Journal of vision, 10(6), 1-15.
Wöpking, M. (1995). Viewing comfort with stereoscopic pictures: An experimental study on the
subjective effects of disparity magnitude and depth of focus. Journal of the society for
information display, 3(3), 101-103.
Young, M. J., Landy, M. S., & Maloney, L. T. (1993). A perturbation analysis of depth
perception from combinations of texture and motion cues. Vision research, 33(18), 2685-2696.
Zollmann, S., Kalkofen, D., Mendez, E., & Reitmayr, G. (2010, October). Image-based ghostings
for single layer occlusions in augmented reality. In Proceedings of the 2010 International
Symposium on Mixed and Augmented Reality (ISMAR) (pp. 19-26). IEEE.
Appendix A: Forms and Questionnaires
A1. Experiment 1
INFORMATION SHEET FOR PARTICIPANTS
Thank you for agreeing to participate in this experiment, the purpose of which is to investigate
one’s ability to perceive a sense of transparency when viewing images using stereoscopic
displays (i.e. “3D displays”).
At the beginning of the session you will be asked to fill in a short questionnaire, after which you
will be given a brief visual test to confirm your ability to view images stereoscopically. If
accepted for participation, you will undergo a training period of about 4 minutes, to familiarise
yourself with the experiment. Please feel free to ask questions during or at the end of the training
period.
The essential stimulus that you will be shown, on a computer screen, is illustrated in the figure
below. In all cases you will observe these images using “3D glasses”, which will cause you to
perceive the images “in depth” – in other words, at different distances from the plane of the
computer monitor.
In each figure, you will observe a surface consisting of a coloured texture, a portion of which
may or may not be covered with a set of black dots. In the cases where the black dots are present,
the number and size of the dots will vary from presentation to presentation. In the centre of the
surface will be a circle. In some cases, the circle will appear to be closer to you than the surface;
in others it will appear to be farther away – that is, behind the surface. In both cases the circle
will look solid and uninterrupted.
In this experiment, you will be presented a series of images as described above. If you think that
the circle is farther away from you than the surface (in other words, if you think that the circle is
behind the surface), you will click on the corresponding key for “Behind” (8); otherwise you will
click on the corresponding key for “Front” (2). You will then be prompted to press the spacebar
to move on to the next trial. If you wish to take a break sometime between the trials, you may do so
after selecting your response and before pressing the spacebar. There will be 300 such images;
however, you should be able to make each judgement fairly rapidly, and finish this experiment in
about 30 minutes. It should be noted that, for each judgement, you have a total of 4 seconds to
enter your choice. If you fail to do so, you will be automatically presented with the next stimulus
and will, later on, be presented with that same trial again until you have succeeded in entering
your choice within the 4 second limit. Upon completion of the experiment, you will be offered
$10, as a token of our gratitude for your participation.
PARTICIPANT CONSENT FORM
"Development of a Method for Facilitating the Percept of Transparency in Stereoscopic Augmented Reality Environments"
I have read and understood the information sheet, and I hereby consent to participate in this
research project, with the understanding that participation involves:
• Filling out one questionnaire before the experiment.
• Performing a set of psychophysical tasks, which have been explained to me.
• The psychophysical tasks involve wearing 3D glasses to view a stereoscopic (3D)
display.
I understand that the experiment will comprise a single half hour session.
I also confirm that any questions I have asked have been answered to my satisfaction, but in the
future I may ask further questions I may have about the study or the research procedures.
I understand that my name will not appear on the questionnaire, that my performance data will
remain confidential, and that only the investigators of this study will have access to my
experimental data.
Furthermore, although aggregated results of this may be presented at conferences or in scientific
journals, I also understand that no reference to the identity of any participant in this study will be
possible through publication of its results, thereby ensuring that all participants will remain
anonymous.
I understand that participation in this study is strictly voluntary. After completing the session, I
will be paid $10 for my participation.
I do, however, have the right to refuse to answer any questions asked on the questionnaire, as
well as the right to withdraw from the study at any time without any penalty and without the
need to provide any explanation for doing so.
I understand that there is a chance that I may experience some nausea or a headache as a result of
wearing the 3D glasses. If I do experience this and I do not wish to proceed as a consequence, I
shall be free to withdraw from the experiment.
In the event of early withdrawal, my remuneration will be calculated based on the actual time I
shall have spent in the study, at a rate of $15 per hour. As part of my right to withdraw from the
study, I may request that my data be destroyed. However, in the absence of such a request, I
understand that the investigators may elect to use those data, with no changes to the restrictions
to their use.
I understand that I may request a summary of the research results by contacting the investigators
directly.
I have been given a copy of this consent form. I understand what this study involves and agree to
participate.
Participant's Name: ____________________________
Signature: _________________________________ Date: ________________________
The persons who may be contacted about this research are: Sanaz Ghasemi and Prof. Paul
Milgram
Both may be reached at: 5 King’s College Rd., Toronto, ON M5S 3G8; Tel. 416-978-3662.
You may also contact the Ethics Review Office at [email protected] or 416-946-3273,
if you should have questions about your rights as a participant.
QUESTIONNAIRE
"Development of a Method for Facilitating the Percept of Transparency in Stereoscopic Augmented Reality Environments"
Subject number: _____ Date: _______________________
1. Gender (please circle one): Male Female
2. Age (please circle one): 18-19 20-29 30-39 40-49 ≥50
3. Do you ordinarily wear corrective lenses of any kind? Yes No
If yes, are you wearing your prescribed lenses right now? Yes No
4. To the best of your knowledge, are you able to view images stereoscopically (“in 3D”)?
Yes No
A2. Experiment 2
INFORMATION SHEET FOR PARTICIPANTS
Thank you for agreeing to participate in this experiment, the purpose of which is to investigate
one’s ability to perceive a sense of transparency when viewing images using stereoscopic displays
(i.e. “3D displays”).
At the beginning of the session you will be asked to fill in a short questionnaire, after which you
will be given a brief visual test to confirm your ability to view images stereoscopically. If accepted
for participation, you will undergo two separate sections to complete the experiment. The first
section will require you to perform a shape determination task which you should try to perform as
accurately as possible while the next section will be more subjective (i.e. there will be no right
answer to the selections you will be making).
The essential stimulus that you will be shown, on a computer screen, is illustrated in the figure
below. In all cases you will observe these images using “3D glasses”, which will cause you to
perceive the images “in depth” – in other words, at different distances from the plane of the
computer monitor.
Figure 1: Essential Stimulus
In each figure, you will observe a surface consisting of a coloured (purple) texture, a portion of
which may or may not be covered with a set of black dots. In the cases where the black dots are
present, the number and size of the dots will vary from presentation to presentation. In the centre
of the surface will be a blue circle that will always appear farther away – that is, behind the surface.
However, the circle will look solid and uninterrupted (as though the purple texture is transparent).
In addition to the blue circle, you will also be presented with two yellow shapes placed on the
purple texture covered with the black dots. In all of the stimuli, the outer yellow shape will be a
circle. However, the inner yellow shape may be a circle or an ellipse (the orientation of the major
axis of the ellipse will be different every time). In fact, there will be a 70% probability that it will
be an ellipse and a 30% probability that it will be a circle.
During a brief training session you will go through, you will become familiar with the experimental
procedure which will require you to determine whether the inner shape is a circle or an ellipse. If
you think that the inner yellow shape is a circle, you will press the “up” arrow; otherwise (if you
think that the inner yellow shape is an ellipse) you will choose the orientation of its major axis by
pressing the number corresponding to it (a guide will be provided to you).
You may choose to determine the shape of the inner yellow object based on the homogeneity of
its distance from the outer yellow circle. Obviously, the easiest case will be where the surface is
not covered by the black dots. The figures below can give you an idea about how subtle the
difference will be.
Figure 2: The inner shape can be a circle (a) or an ellipse (b).
You will have 6 seconds to make your selection and will then be prompted to press the spacebar
to move on to the next trial. If you fail to select your answer within the time limit, the stimulus
will disappear and you will be presented with the next trial which will repeat itself again some
other time during the experiment. If you wish to take a break sometime between the trials, you
may do so after selecting your response and before pressing the spacebar. This part of the
experiment will take about 16 minutes to complete. Please try to make your choices as accurately
as possible. In case you choose to release your email in the consent form, your chances of winning
a $50 Amazon gift card (in the lottery to be held once the experiments are done) will increase
proportionally to your performance score.
Once you are done with this section, you will go through a short training session and move on to
the next part which will require you to make a comparison between two images similar to those in
Figure 1. In this section, for each pair of images, you will be asked to answer this question:
“In which image (left or right) does the impression of transparency look more convincing?”
(The black dots may or may not appear to you as “holes” in the coloured surface, and the surface
may or may not appear to you to be transparent. There are no “right answers” in this experiment;
please try simply to express to us whatever it is that you perceive. Please answer the question even
if neither surface seems to you to be particularly transparent.) There will be no time limit for this
task and you will be able to make your selection using the right or left arrow key.
There will be 78 repetitions of these trials; however, you should be able to make each judgement
fairly rapidly, and finish this experiment in less than 30 minutes. Upon completion of the
experiment, you will be offered $10, as a token of our gratitude for your participation.
PARTICIPANT CONSENT FORM
"Development of a Method for Facilitating the Percept of Transparency in Stereoscopic Augmented Reality Environments"
I have read and understood the information sheet, and I hereby consent to participate in this
research project, with the understanding that participation involves:
• Filling out one questionnaire before the experiment.
• Performing a set of psychophysical tasks, which have been explained to me.
• The psychophysical tasks involve wearing 3D glasses to view a stereoscopic (3D)
display.
I understand that the experiment will comprise a single half hour session.
I also confirm that any questions I have asked have been answered to my satisfaction, but in the
future I may ask further questions I may have about the study or the research procedures.
I understand that my name will not appear on the questionnaire, that my performance data will
remain confidential, and that only the investigators of this study will have access to my
experimental data.
Furthermore, although aggregated results of this may be presented at conferences or in scientific
journals, I also understand that no reference to the identity of any participant in this study will be
possible through publication of its results, thereby ensuring that all participants will remain
anonymous.
I understand that participation in this study is strictly voluntary. After completing the session, I
will be paid $10 for my participation. By providing my email below, I agree to take part in a
lottery for a $50 Amazon gift card with the chances of my winning determined based on my
performance score.
I do, however, have the right to refuse to answer any questions asked on the questionnaire, as
well as the right to withdraw from the study at any time without any penalty and without the
need to provide any explanation for doing so.
I understand that there is a chance that I may experience some nausea or a headache as a result of
wearing the 3D glasses. If I do experience this and I do not wish to proceed as a consequence, I
shall be free to withdraw from the experiment.
In the event of early withdrawal, my remuneration will be calculated based on the actual time I
shall have spent in the study, at a rate of $15 per hour. As part of my right to withdraw from the
study, I may request that my data be destroyed. However, in the absence of such a request, I
understand that the investigators may elect to use those data, with no changes to the restrictions
to their use.
I understand that I may request a summary of the research results by contacting the investigators
directly.
I have been given a copy of this consent form. I understand what this study involves and agree to
participate.
Participant's Name: ____________________________
Participant's Email: ____________________________
Signature: ____________________________________ Date: ________________________
The persons who may be contacted about this research are: Sanaz Ghasemi and Prof. Paul
Milgram
Both may be reached at: 5 King’s College Rd., Toronto, ON M5S 3G8; Tel. 416-978-3662.
You may also contact the Ethics Review Office at [email protected] or 416-946-3273,
if you should have questions about your rights as a participant.
QUESTIONNAIRE
"Development of a Method for Facilitating the Percept of Transparency in Stereoscopic Augmented Reality Environments"
Date: _______________________
1. Gender (please circle one): Male Female
2. Age (please circle one): 18-19 20-29 30-39 40-49 ≥50
3. Do you ordinarily wear corrective lenses of any kind? Yes No
If yes, are you wearing your prescribed lenses right now? Yes No
4. To the best of your knowledge, are you able to view images stereoscopically (“in 3D”)?
Yes No
A3. Experiment 3
INFORMATION SHEET FOR PARTICIPANTS
Thank you for agreeing to participate in this experiment, the purpose of which is to investigate
one’s ability to make correct depth judgements when viewing images using stereoscopic displays
(i.e. “3D displays”).
At the beginning of the session you will be asked to fill in a short questionnaire, after which you
will be given a brief visual test to confirm your ability to view images stereoscopically. If accepted
for participation, you will start the experiment which will be divided into two 25-minute sections
separated by a 10-minute break. This section will require you to perform a depth judgement task
which you should try to perform as accurately as possible while providing a subjective measure of
your certainty regarding the decision you made (i.e. there will be no right answer to this latter
measure).
The essential stimulus that you will be shown, on a computer screen, is illustrated in the figure
below. In all cases you will observe these images using “3D glasses”, which will cause you to
perceive the images “in depth” – in other words, at different distances from the plane of the
computer monitor.
Figure 1: Essential Stimulus
In each figure, you will observe a garbage bin, a portion of which may or may not be covered with
a set of black dots. In the cases where the black dots are present, the number and size of the dots
will vary from presentation to presentation. In the centre of the bin will be a blue truncated cone
that will be facing towards you. When viewing this truncated cone in 3D, you may or may not
perceive parts of it behind the bin’s surface. In all cases, the truncated cone will look solid and
uninterrupted (as though the bin’s surface is transparent).
If viewed from above, what you see will resemble what is shown in Figure 2. Your task will be
to determine what portion of the truncated cone’s length is located behind the bin’s surface (x-
tenth). During a brief training session you will go through, you will become familiar with the
experimental procedure. The stimulus will be presented for 2 seconds and you will then be
prompted to make your selection. In determining your response (which will range from 1 to 10),
you will use the row of numbers on top of the keyboard and you will press ‘enter’ to move on to
the next trial. Following this, you will be presented with another question asking you to determine
how difficult it was for you to make that depth judgement. You may choose any number from 1
(easiest) to 4 (most difficult). Please note that the truncated cone will be presented with different
diameters and lengths. Therefore, in making your depth judgements refrain from using relative
size or visual height as a depth cue.
Figure 2: View from above.
After 20 minutes of going through the experiment, you will be given a 10-minute break after which
you will resume with the experiment. Please try to make your choices as accurately as possible. In
case you choose to release your email in the consent form, your chances of winning a $50 Amazon
gift card (in the lottery to be held once the experiments are done) will increase proportionally to
your performance score. There will be a total of 570 trials; however, you should be able to make
each judgement fairly rapidly, and finish this experiment in less than an hour.
Once you are done with this section, you will be interviewed about your experience while going
through the experiment. Your answers will be recorded to later on be transcribed and analyzed as
qualitative data. There are no “right answers” in this section; please try simply to express to us
whatever it is that you perceive. Upon completion of the experiment, you will be offered $20, as a
token of our gratitude for your participation.
PARTICIPANT CONSENT FORM
"Using Random Dot Patterns to Achieve X-ray Vision with Stereoscopic Augmented Reality Displays"
I have read and understood the information sheet, and I hereby consent to participate in this
research project, with the understanding that participation involves:
• Filling out one questionnaire before the experiment.
• Performing a set of depth judgement tasks, which have been explained to me.
• The depth judgement tasks involve wearing 3D glasses to view a stereoscopic (3D)
display.
• Taking part in an interview for which my replies will be recorded.
I understand that the experiment will comprise a one-hour and 15-minute session.
I also confirm that any questions I have asked have been answered to my satisfaction, but in the
future I may ask further questions I may have about the study or the research procedures.
I understand that my name will not appear on the questionnaire, that my performance data and
responses to interview questions will remain confidential, and that only the investigators of this
study will have access to my experimental data.
Furthermore, although aggregated results of this may be presented at conferences or in scientific
journals, I also understand that no reference to the identity of any participant in this study will be
possible through publication of its results, thereby ensuring that all participants will remain
anonymous.
I understand that participation in this study is strictly voluntary. After completing the session, I
will be paid $20 for my participation. By providing my email below, I agree to take part in a
lottery for a $50 Amazon gift card with the chances of my winning determined based on my
performance score.
I do, however, have the right to refuse to answer any questions asked on the questionnaire, as
well as the right to withdraw from the study at any time without any penalty and without the
need to provide any explanation for doing so.
I understand that there is a chance that I may experience some nausea or a headache as a result of
wearing the 3D glasses. If I do experience this and I do not wish to proceed as a consequence, I
shall be free to withdraw from the experiment.
In the event of early withdrawal, my remuneration will be calculated based on the actual time I
shall have spent in the study, at a rate of $15 per hour. As part of my right to withdraw from the
study, I may request that my data be destroyed. However, in the absence of such a request, I
understand that the investigators may elect to use those data, with no changes to the restrictions
to their use.
I understand that I may request a summary of the research results by contacting the investigators
directly.
I have been given a copy of this consent form. I understand what this study involves and agree to
participate.
Participant's Name: ____________________________
Participant's Email: ____________________________
Signature: _____________________________________ Date: ________________________
The persons who may be contacted about this research are: Sanaz Ghasemi and Prof. Paul
Milgram
Both may be reached at: 5 King’s College Rd., Toronto, ON M5S 3G8; Tel. 416-978-3662.
You may also contact the Ethics Review Office at [email protected] or 416-946-3273,
if you should have questions about your rights as a participant.
QUESTIONNAIRE
"Using Random Dot Patterns to Achieve X-ray Vision with
Stereoscopic Augmented Reality Displays"
Date: _______________________
1. Gender (please circle one): Male Female
2. Age (please circle one): 18-19 20-29 30-39 40-49 ≥50
3. Do you ordinarily wear corrective lenses of any kind? Yes No
If yes, are you wearing your prescribed lenses right now? Yes No
4. To the best of your knowledge, are you able to view images stereoscopically (“in 3D”)?
Yes No
INTERVIEW SCRIPT (see footnote 42)
42 This script was used as a framework for the semi-structured interview of Experiment 3. Modifications to this
script were made depending on the responses of the participant (interviewee).
1. At any time during the experiment, did you experience any difficulty in perceiving the stimuli
in 3D?
If so:
• Can you remember any specific type of pattern with which this occurred?
• Did it happen often?
• Are you able to describe that difficulty?
If not:
• Did you experience any double image or difficulty in fusing the images?
2. Do you remember using any specific strategy in making your judgments?
If so, can you describe it?
3. Show a selection of stimuli (randomized order of high and low dot density and dot size; random
depths, not 0 or 100%; first sharp and then blurry) for the same duration as in the experiment, and
ask about strategy each time.
4. Repeat Question 3 but with unlimited time.
5. a) Do you have the impression that the black dots are part of the surface of the bin?
b) Please describe how you perceive them.
c) Do you perceive any holes on the surface of the bin?
Appendix B: Supplementary Material for Chapter 6 (Experiment 3)
B.1. Summary of Insights Gained from Pilot Studies
In the process of designing, fine-tuning, and finalizing Experiment 3 that was described in
Chapter 6, various pilot studies were conducted. This section provides a summary of some of the
key lessons learned from those studies that influenced the final design of the experiments:
Prior to opting for a truncated cone as the virtual object, a wireframe pyramid was tried out, as
shown in Figure B.1. As in the final version of Experiment 3, the depth judgement task in this
case asked the observer to determine what proportion of the pyramid’s length was placed behind
the real surface. Pilot tests involving 4 participants were run with this setup, and the results,
observations and interviews showed two things. Firstly, the orientation of the pyramid was not
always clear to the participants – i.e. whether the apex was pointing inwards or outwards (see
footnote 43). Secondly, participants mainly used the tip (apex) and the lines along the edges of
the pyramid to make their depth judgement.
Since using such cues was not anticipated to be possible for typical X-ray applications of AR, the
idea of using a truncated cone was tried out. Because a truncated cone contains no wireframe
outlines other than the two circles at the ends, there was initial concern
that participants would have difficulty properly perceiving the shape as a truncated cone.
Consequently, four thin longitudinal connecting lines were added, equally spaced
around the circumference of the truncated cone, to reinforce perception of the proper shape. The
pilot results showed that, by doing so, participants were more likely to make their depth
judgements based on the distance of the larger and smaller bases of the truncated cone with
respect to the real surface, which was the intended outcome.
43 This was also the case with using a wireframe cube as the virtual object (as a result of the Necker cube illusion).
Figure B.1: Stereoscopic image showing the virtual pyramid placed halfway along its length
relative to the surface of the bin (i.e. the real surface). The apex of the pyramid in this image is
pointed towards the observer.
Some other factors that were also varied throughout the pilot studies, to arrive at optimized
values for the experiment, were: the length of the truncated cone(s), the time duration for
which the stimulus was presented, the width of the lines connecting the circular bases of the
truncated cone, the blur level of the random dot patterns, and the number of sections into which
to divide the length of the truncated cone.
B.2. Difficulty Rating of Depth Estimation Task
As mentioned in Section 6.4.2, for each trial, after having responded to the depth judgement task,
participants selected a response to the question: “On a scale of 1 (easiest) to 4 (most difficult),
how difficult did you find the task?” Figure B.2, Figure B.3 and Figure B.4 present scatterplots
showing the ‘Difficulty Rating’ results as a function of the virtual object’s actual depth
proportion (relative to the real surface) for the various patterns. Figure B.5 shows corresponding
scatterplot data for the No Pattern condition. As before, in these figures, the sizes and colours of
the dots are proportional to the number of occurrences at each point (where more occurrences are
shown with larger and darker circles).
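The aggregation behind such scatterplots can be sketched as follows. This is only an illustrative sketch, not the plotting code used for the thesis: the function name `bubble_data`, the `max_area` parameter and the sample responses are all hypothetical. Each distinct (depth proportion, difficulty rating) pair is counted once, and both the marker area and its darkness are scaled linearly by the count relative to the most frequent pair.

```python
from collections import Counter


def bubble_data(responses, max_area=300.0):
    """Collapse repeated (depth, rating) pairs into one marker per distinct point.

    Marker area and darkness are both proportional to the number of
    occurrences, normalised by the most frequent point. The linear scaling
    and the max_area value are assumptions made for this sketch.
    """
    counts = Counter(responses)
    peak = max(counts.values())
    bubbles = []
    for (depth, rating), n in sorted(counts.items()):
        share = n / peak  # 1.0 for the most frequent point
        bubbles.append({
            "depth": depth,       # actual depth proportion, in tenths
            "rating": rating,     # difficulty rating, 1 (easiest) to 4
            "count": n,
            "area": max_area * share,
            "darkness": share,    # 0 = lightest, 1 = darkest
        })
    return bubbles


# Hypothetical responses: (actual depth proportion in tenths, difficulty rating)
responses = [(3, 1), (3, 1), (3, 2), (7, 4), (7, 4), (7, 4), (7, 3)]
bubbles = bubble_data(responses)
for b in bubbles:
    print(b)
```

The resulting list can be passed to any plotting tool that accepts per-point sizes and colours; the aggregation step is kept separate so that the counting logic can be checked independently of the rendering.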
Figure B.2: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s
actual depth proportion for dot density 20% and various dot sizes: (a) Sharp condition, (b) Blurry
condition. The size and colour of the dots are proportional to the number of occurrences at each
point.
Figure B.3: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s
actual depth for various dot sizes and dot density of 40%: (a) Sharp condition, (b) Blurry
condition.
Figure B.4: Scatterplots showing the ‘Difficulty Rating’ as a function of the virtual object’s
actual depth for various dot sizes and dot density of 60%: (a) Sharp condition, (b) Blurry
condition.
Figure B.5: Scatterplot showing the ‘Difficulty Rating’ as a function of the virtual object’s
actual depth for the No Pattern condition.
B.3. Transcript of Interviews with Participants
Below, the approximate transcripts of participants’ responses to the interview questions are
provided. These responses include those obtained from the participants of the pilot studies as
well (Participants 1-4; see footnote 44).
44 These pilot studies were done using the truncated cone as the virtual object.
Participant 1 (Pilot Study):
1. All the time. Had a hard time seeing the cone in 3D almost all the time. The more the cone did
not look 3D or the more I had difficulty in focusing on it, the farther I assumed it was.
Had difficulty in fusing, particularly with stimuli with cone behind bin and with patterns because
my brain wasn’t used to it. If it felt weird to look at, it was behind the bin.
No double image though!
3. Limited time:
Most of the time, the cone looked in front. Anytime it seemed blurry or I had a hard time fusing
(it didn’t look 3D), it was behind.
Front part looked clear, not completely behind.
Didn’t have time to build up a specific strategy. I was too focused on the cone so I wasn’t able to
register the bin properly.
Patterns definitely mattered, because when they weren’t there, I would automatically say 0 (see
the cone in front of the bin).
Look at one part of the bigger ring first. If I had a difficult time perceiving it, then it’s behind.
Then I would look at the front and make the same judgement about it. Then I would use that to
decide the squat of the cone (which I had a really hard time doing). Then I would make my
judgement.
Blurry patterns were confusing and were hard to use for making the depth judgements. The cone
would still look weird though, when it was placed behind the surface of the bin.
5. During the experiment, I didn’t see the patterns as part of the surface of the bin because I was
so focused on the cone that I wasn’t even registering the bin as a bin.
However, when I look at it now, I do see the dots as part of the surface of the bin. I don’t see holes.
Participant 2 (Pilot Study):
1. No difficulty.
2. Sometimes, I would compare it to the one before. Especially when it was at extremes. But other
than that I don’t think so.
3. Limited time: I would look at how protruded it looked. To do that, if the front or back looked
more solid, more protruding. If it was behind, it wasn’t as stark.
4. Unlimited time: very different when I look at it for a long time. I get even more uncertainty. My
eyes play tricks on me. I look at the edge of the cone then I might look for the midpoint as guidance.
I think I focus more on the circle that’s farther. Blurry patterns seemed more obvious. When there
are a lot of dots, it’s more noisy and more difficult. That’s why when it’s blurry it’s better cause I
can’t focus on a single point.
5. Dots are part of the surface, no holes.
Participant 3 (Pilot Study):
1. No double image.
2. I would look at the difference between the bigger and smaller circle of the cone.
3. Limited time: difference between small and big circle size and length and width of connecting
rods. If pattern was darker or blurrier, I would get confused. I would use the same strategy but with
more uncertainty.
4. Unlimited time: Width of connecting rods and relative size of circles!
5. Dots are part of surface, no holes.
Participant 4 (Pilot Study):
1. No difficulty, no double image.
2. I looked at the bigger circle and the lines coming out and using that. If bigger circle seemed
closer to dots, it was less inside. Compared bigger circle to surface and then looked at connecting
lines.
3. Limited time: Look at bigger circle and compare to background. Then, using the lines, I would
follow my gaze to the smaller circle and make my judgement.
4. Unlimited time: Same strategy. + compare depth of small circle to large circle.
Blurry patterns make it harder to localize the surface of the bin. More dots makes it even more
difficult.
5. Dots are part of the surface, no holes.
Participant 5:
1. Some instances I would stop seeing in 3D for a few seconds. No double image.
2. Hard to sustain a strategy. I would make a flash judgement first. It would work better for me.
3. Limited time: Gut judgement. I looked at VO then the background. Holistic judgement. I looked
more at the farther circle and decided how far it was.
My strategy didn’t intentionally change with patterns. With no pattern, it was easier to tell if it was
in front. But when it wasn’t, it was much more difficult.
4. Unlimited time: Look at smaller circle first, if background was blurry, surface was farther. Sharp
patterns seem to be easier to look at.
With black points, it’s easier to tell if VO is behind than white points.
5. Dots look like a cut-out. Black points are holes cut out from surface.
Participant 6:
1. No double image.
2. I don’t remember using a specific strategy.
4. Unlimited time: Gut feeling. Contrast between background and cone. Holistic look at the cone.
Blurry patterns are easier to see.
5. In general, I see dots as part of the surface. No holes.
Participant 7:
1. No double image.
2. I tried a strategy and then I stopped. No clear strategy.
3. Limited time: I looked at how protruding the smaller circle compared to the bigger circle. I also
compared it to the edge of the bin.
I tried different strategies for each one.
4. Unlimited time: The black dots help me more. White dots look like they’re on top when large
circle is behind.
Sharp patterns were easier.
5. Seems like white dots are on the surface and black dots are showing what’s inside the bin (black
background!). Like holes.
Participant 8:
1. Some stimuli looked flatter than some others. No double image.
2. I looked at borders but also holistically and sometimes I tried to see the surface going through
it. If it were more in front, it was easier.
4. Unlimited time: Look at bigger circle, then smaller circle and then look at it holistically.
Sometimes VO looked like it was behind the dots.
Some of the blurrier ones were easier. They give me a better sense of the plane.
If VO looked jagged (changing shades passing through black and white dots), it’s farther behind.
With No Pattern and the VO wasn’t completely out, most difficult.
5. Dots are part of the surface, no holes.
Participant 9:
1. Yes. When dots were large, it was difficult to see in 3D. When the larger circle is deeper into
the bin, I would see double.
2. I couldn’t use any specific strategy other than seeing in double.
4. Unlimited time: When larger circle was not as clear, it was farther inside.
It was easier to guess with the blurry patterns.
5. Dots are part of the surface, no holes.
Participant 10:
1. No double image.
2. At the beginning, I used size (of circles) or length of connecting rods. But then I stopped. With
No Pattern, it was difficult to see what the depth is (seems half way in, halfway out).
4. Unlimited time: Edge of big circle seems ridged (?) between black and white points so it seems
to be out. Sharp patterns are easier. I look at the lines connecting the large and small circles. Is it
clear? If it’s clear, it’s closer.
5. Dots are part of the surface, no holes.
Participant 11:
1. No double image.
2. If it’s obvious, it’s more outside. I looked at the size and position of the circle and the blur of
the pattern.
4. Unlimited time: More contrast, more outside.
Higher density is easier. Low density makes it difficult.
Blurry and lower density is relatively easy too.
5. Dots are part of the surface. No holes.
Participant 12:
1. No double image.
2. When No pattern, it seemed outside.
4. Unlimited time: I look at large circle first and make a judgement. Then I look at small circle and
make a judgement.
Lower dot densities were easier. But with no pattern, it was even more difficult.
Sharp ones were easier.
I look at edges of black and white dots.
5. Dots are part of the surface. No holes.
Participant 13:
1. With blurry ones, I wouldn’t see double images but I would have difficulty in fusing the image.
2. When the circle is larger it looks deeper inside. With black dots, the cone seemed deeper inside.
I looked at the left edge of the bigger circle to make my judgement. Blinking also helped.
4. Unlimited time: I looked at edges of circle.
Blurry was difficult to see depth with.
Lower dot densities were easier.
5. Dots are part of the surface, no holes.
Participant 14:
1. No double image.
2. No Pattern is difficult. I looked at the relative size of the two circles.
4. Unlimited time: I looked at the line connecting the two circles. If it’s longer, it’s a longer cone.
Then I compare large circle to surface, then small circle to surface.
No difference between dot densities and sharp/blurry.
5. Dots are part of the surface, no holes.
Participant 15:
1. No double image.
2. No Pattern was difficult and seemed like the cone was outside. No pattern also didn’t look very
3D.
If larger circle was out of focus, it made it look like it’s behind the bin. The closer it was, the more
focused (and clear) it looked.
4. Unlimited time: (same as above)
Smaller dot sizes and higher dot densities were more difficult.
Blurry patterns were easier.
5. Dots are part of the surface, no holes.
Participant 16:
1. No double image.
When there were more black dots, it was harder to perceive in 3D.
2. I looked at the small circle and then I would compare the depth of the small circle to that of the
big circle.
4. Unlimited time: I looked at the depth of the front of the cone and then I would look at the
connecting lines.
Lower dot densities were easier.
Blurry was easier because of the limited time.
5. Dots are part of the surface, no holes.
Participant 17:
1. No double image.
High dot densities were most difficult.
Blurry patterns were easier.
2. I compared trials to each other.
4. Unlimited time: I looked at smaller circle and decided based on its depth. I mostly focused on
the small circle and then, sometimes, I compared it to the depth of large circle.
I usually look at the black dots.
5. Dots are part of the surface, no holes.
Participant 18:
1. No double image.
2. If background was blurry, the cone looked clearer and it looked more outside. But if pattern and
cone looked clear (sharp), I felt it was inside.
4. Unlimited time: Blurry pattern looked farther away.
I also considered length of cone.
I also sometimes looked at the line connecting the circles and how it looked next to the black dots.
5. Dots are part of the surface, no holes.
Participant 19:
1. No double image.
2. Sometimes I would look at the size of the large and small circle. I had to resist comparing trials.
4. Unlimited time: I would try to determine the depth of the large and small circle. Mainly I would
look at small circle and sometimes I would focus on the center of the cone. If I had more time, I
would look at the pattern and try to find the position of the surface within the cone.
Higher densities were easier.
With No Pattern, I was either very certain or not certain at all.
With sharp patterns, the cone looked more opaque and I was more certain about my judgement.
I look at the black dots and the lines connecting the circles.
5. Dots are part of the surface, no holes.
Participant 20:
1. No double image but when it was farthest, the small circle looked a bit blurry.
2. I had a reference for halfway, 0 and 10 in my head and compared it to that.
4. Unlimited time: I focus on the small circle and how much it’s coming out.
Lower dot densities were easier.
Sharper patterns are better.
5. Dots are part of the surface, no holes.
Appendix C: Enlarged Stereo Images
C.1. Figure 1.3
C.2. Figure 4.2 (a)
Appendix D: Depth Cues45
To infer the depth of objects, our visual system integrates various sources of available depth
information, which are defined and categorized as depth cues. While some of these cues provide
information about the ordinal or relative depth of objects (e.g. which of two objects is closer),
others provide absolute depth information, which allows an observer to ascertain a distance in
absolute units (e.g. in meters). Generally, depth cues can be categorized into two groups. Those
which are a property of the object being perceived are referred to as object-centered cues and
those which are a result of our own visual system are referred to as observer-centered cues.
Object-centered Cues
These cues are sometimes also referred to as pictorial cues because of their use by artists in
conveying a sense of depth in a two-dimensional medium. They consist of the following cues:
Occlusion (Interposition)
Foreground-background occlusion occurs if an object intervenes between a vantage point and
another object. Both objects may project into the optic array at a vantage point. The front of the
foreground (or ‘occluder’) projects to the vantage point, and if it is opaque, either none or only
part of the other object can project to the vantage point. In this case, either the whole object or
the other part of it is hidden – ‘occluded’. In cases where the foreground is transparent, the
background object can either partially or completely project to the vantage point, with optic
arrays passing through the foreground’s surface. There are many kinds of optical information for
occlusion. Research on optical features encouraging the appearance of occlusion continues to this
day (Gillam and Grove, 2011; Kennedy, 1974; Peterson, 2015).
It is widely believed that occlusion is the most powerful depth cue at all distances where visual
perception holds. The reason for this is that our world is populated mostly by solid objects that
are opaque. However, transparent or translucent objects are also encountered regularly and can
be easily incorporated into our understanding.
45 This appendix follows the discussion of depth cues in ‘Engineering Psychology and Human Performance’ by
Wickens et al. (2000) closely. The reader is advised to approach this appendix as a rather superficial review of
perceptual literature as it pertains to ‘engineering psychology and human performance’.
In the context of X-ray vision applications of AR, various researchers have used the occlusion
cue by having features of the real surface occlude the virtual object, thus allowing the observer
easily to perceive the virtual object as being behind the real object (Lerotic et al., 2007; Avery et
al., 2009; Sandor, Cunningham, Dey & Mattila, 2010).
Linear Perspective
Through transformation of 3D information in a scene to a 2D image formed on our retina, one of
the phenomena that takes place is the conversion of two parallel lines to two lines converging
toward a single point receding in depth. In simpler terms, when two converging lines are seen,
they are ordinarily assumed to be two parallel lines extending away.
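The convergence of parallel lines can be illustrated with a simple pinhole projection, in which a point at lateral offset x and distance z projects to f·x/z. The sketch below is an illustration only; the pinhole model, the unit focal length and the function names are assumptions made for the example rather than part of the original discussion.

```python
def project(x, y, z, focal_length=1.0):
    """Pinhole projection of a 3D point; z > 0 is the distance along the line of sight."""
    return (focal_length * x / z, focal_length * y / z)

def projected_separation(half_width, z, focal_length=1.0):
    """Image-plane separation of two parallel lines at x = -half_width and
    x = +half_width, sampled at distance z."""
    left = project(-half_width, 0.0, z, focal_length)
    right = project(half_width, 0.0, z, focal_length)
    return right[0] - left[0]

# The separation halves each time the distance doubles, so the projected
# lines converge toward a single point as z grows.
separations = [projected_separation(1.0, z) for z in (1.0, 2.0, 4.0, 8.0)]
```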
Height in the Plane (Relative Height)
Since objects on a common horizontal ground plane are usually observed from above, more
distant objects appear higher in the visual field. Therefore, in interpreting ordinal depths, this cue
can be quite effective. However, in situations where the ground is uneven, the depth knowledge
obtained from this cue could be limited to ordinal information, specifically with increasing
distance to the object.
Relative (Familiar) Size
As objects move farther away, their projected sizes become smaller. Therefore, if an object is
recognized or if the absolute size of a depicted object is known, one can infer its distance from its
apparent size using the size-distance invariance hypothesis (Kilpatrick and Ittelson, 1953).
Additionally, if one knows the relative sizes of multiple different objects, then their ordinal
proximity can be inferred from their relative apparent sizes in the visual field. Thus, the
important point about this cue is that it is a relative cue. In other words, a basis for comparison
must exist, either from the scene or from the observer’s experience.
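Under the size-distance relation, an object of known physical size S that subtends a visual angle θ lies at a distance of S / (2·tan(θ/2)). A minimal sketch of this geometric relation is given below; the function name and the example values are assumptions made for illustration, and the sketch describes the geometry only, not how the visual system actually performs the inference.

```python
import math

def distance_from_visual_angle(physical_size, visual_angle_deg):
    """Distance at which an object of the given physical size subtends the
    given visual angle (result is in the same units as physical_size)."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return physical_size / (2.0 * math.tan(half_angle))

# Hypothetical example: a 1.7 m tall person subtending 1 degree versus 0.5 degrees.
near = distance_from_visual_angle(1.7, 1.0)
far = distance_from_visual_angle(1.7, 0.5)
```

For small angles, halving the apparent size roughly doubles the inferred distance, which is why a known or familiar size is needed as the basis for comparison.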
Relative Density (Depth from Texture)
The characteristic spacing of a cluster of objects or features of a texture on the retina is referred
to as ‘relative density’. In the example of a textured plane, the optical projection of the grain will
grow finer at greater distances and, in these cases, this cue can also be termed as ‘textural
gradient’.
In fact, by projecting optical texture to the observer’s vantage point, the texture on the surface of
an object may allow the observer also to perceive its slant, distance and shape46. What makes the
“shape from texture” cue possible is perspective, which results in smaller and more closely
spaced optical projections of the markings (Cumming, Johnston & Parker, 1993).
Proximity-luminance Covariance
Since objects and lines that are closer to us are typically perceived as brighter, reductions in the
illumination of an object and/or the intensity of the projected optic array can be used as a cue to
infer increasing distance.
Aerial Perspective (Atmospheric Attenuation)
Tiny particles such as pollutants and moisture act as a translucent medium that causes more
distant objects to appear hazier or to have less contrast. This cue, which is mostly effective for
far-field distances, is referred to as ‘aerial perspective’.
Light and Shadow (Shading)
While cast shadows are the result of luminance attenuation on a surface due to the occlusion of a
light source, shading is the luminance distribution on a surface due to the presence of a
non-occluded light source. In both cases, shadows and shading provide us with information on
objects’ shapes as well as orientations relative to us and relative to each other.
46 Slant, distance and shape are all related since slant is change of distance and differences in slant are part of shape.
Motion Parallax
When an observer moves relative to a 3D scene, the projections of closer objects in the optic array
change faster than those of objects that are farther away. In other words, our perceptual system
inversely relates an object’s distance to its relative degree of changes in angular motion. As such,
motion parallax is used to infer the shape and location of objects.
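The inverse relation between distance and angular motion in motion parallax can be sketched for the simplest case: an observer translating sideways at speed v, looking at a stationary point that is momentarily perpendicular to the direction of travel at distance d. In that case the point's angular velocity is v/d. The function name and example values below are assumptions made for illustration.

```python
import math

def angular_speed_deg(observer_speed, distance):
    """Instantaneous angular speed (degrees per second) of a stationary point
    lying perpendicular to the observer's direction of travel."""
    return math.degrees(observer_speed / distance)

# Hypothetical example: observer moving at 1 m/s, points at 2 m and 4 m.
near_point = angular_speed_deg(1.0, 2.0)
far_point = angular_speed_deg(1.0, 4.0)
```

Doubling the distance halves the angular speed, so the nearer point sweeps across the visual field twice as fast.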
(Static) Observer-centered cues
The following depth cues are a result of the characteristics of the human visual system.
Accommodation
To bring images into focus on the retina, the curvature of the lenses of the eye requires
adjustment. This adjustment is referred to as accommodation. Closer objects require more
adjustment and, thus, sensing the amount of this adjustment might help in determining the
absolute depth of nearby objects.
Although static focus distances may not provide much information, changes in focus are what
make this depth cue effective. Moreover, this cue is generally described as a monocular depth
cue since it does not require the involvement of both eyes.
(Binocular) Convergence
The amount of inward turning of the eyes when a focal point is fixated determines the degree of
‘convergence,’ and thus sensing the extent of this inward turning can help in determining the
distance of an object. This cue is used to provide absolute depth information for nearby objects.
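For symmetric fixation on a point straight ahead, the convergence angle is 2·atan(IPD / (2·d)), where d is the fixation distance. The sketch below illustrates why this cue is mainly informative for nearby objects. The default interpupillary distance of 0.063 m is an assumed typical value, not a figure from this thesis, and the function name is hypothetical.

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Convergence angle (degrees) for symmetric fixation at distance_m.
    The default ipd_m is an assumed typical interpupillary distance."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

# Change in convergence angle over a 10 cm step in the near field
# versus a 1 m step farther away.
near_change = vergence_angle_deg(0.3) - vergence_angle_deg(0.4)
far_change = vergence_angle_deg(3.0) - vergence_angle_deg(4.0)
```

The angle changes far more per unit of distance in the near field than at larger distances, which is consistent with convergence being described as a cue for nearby objects.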
Binocular Disparity
The ability to perceive a scene from two eyes that are separated by an interpupillary distance
provides (95% of) humans with one of the most important and perceptually acute sources of
depth information (Coutant & Westheimer, 1993).
When a scene is viewed, the fixation point (also referred to as the focal point) will fall on
corresponding locations on the retinas of the two eyes, resulting in zero disparity. One can
furthermore envisage an imaginary geometric arc called the horopter, comprising all points in
space, including the fixation point, that also project with zero retinal disparity. Other points
that are closer or farther than this arc are mapped onto disparate locations on the two retinas,
which are nevertheless fused into a single image in depth. The horopter thus provides a reference
from which the ordinal depth of other objects can be judged. Objects that are in front of the
horopter (closer to the observer) will result in fused images with crossed disparity, whereas
objects that are behind the horopter (farther from the observer) will result in fused images with
uncrossed disparity. Based on the amount of retinal disparity in the projection of each point to
each eye, the visual system is thus able to discern the ordinal depths between two points in space
via the binocular disparity depth cue (Patterson, 2009).
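For points on the midline, the disparity of a point relative to fixation can be approximated as the difference between the angle the point subtends at the two eyes and the angle the fixation point subtends. The sketch below uses that approximation to label a point as crossed or uncrossed. The sign convention (positive meaning crossed), the assumed interpupillary distance of 0.063 m and the function names are all assumptions made for illustration.

```python
import math

def subtended_angle(distance_m, ipd_m=0.063):
    """Angle (radians) subtended at the two eyes by a midline point."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))

def relative_disparity_deg(point_m, fixation_m, ipd_m=0.063):
    """Disparity of a midline point relative to the fixation point, in degrees.
    Positive values are treated as crossed (assumed convention)."""
    return math.degrees(subtended_angle(point_m, ipd_m) - subtended_angle(fixation_m, ipd_m))

def classify(point_m, fixation_m):
    """Label a midline point as crossed, uncrossed or zero disparity."""
    disparity = relative_disparity_deg(point_m, fixation_m)
    if disparity > 0:
        return "crossed"      # nearer than the fixation point
    if disparity < 0:
        return "uncrossed"    # farther than the fixation point
    return "zero"
```

For example, with fixation at 0.5 m, a point at 0.4 m is labelled crossed and a point at 0.6 m is labelled uncrossed.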
The importance of binocular disparity in perceiving depth was first shown through the invention
of the stereoscope by Wheatstone (1838), where a pair of flat drawings were used to achieve a
three-dimensional percept of an object. Later, in 1960, by introducing the concept of random dot
stereograms, Julesz (1971) made a significant contribution to the science behind stereo vision. A
typical example of a random dot stereogram is one where two images consist of identical
randomly distributed dots, but with a central square region that is shifted horizontally by a small
distance relative to the other image. When viewed individually, each image appears as a flat field
of random dots. However, when viewed stereoscopically, the central square region appears at a
depth that is different from the background plane of random dots. Random dot stereograms
provide evidence that binocular depth perception can be achieved without the need for
monocular form recognition.
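The construction of such a stereogram can be sketched as follows. This is a minimal illustration, not the procedure used by Julesz or in this thesis: the image size, square size, shift, dot probability and function name are assumptions. The central square is shifted leftward in the right-eye image, which under the usual geometry should produce crossed disparity, so the square would be expected to appear in front of the background.

```python
import random

def make_random_dot_stereogram(size=64, square=24, shift=4, seed=0):
    """Return (left, right) binary dot images as lists of rows.
    The central square of the left image appears shifted `shift` columns
    to the left in the right image; everything else is identical."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    start = (size - square) // 2
    end = start + square
    for r in range(start, end):
        # Copy the central square into the right image, shifted left.
        for c in range(start, end):
            right[r][c - shift] = left[r][c]
        # Fill the strip uncovered by the shift with fresh random dots.
        for c in range(end - shift, end):
            right[r][c] = rng.randint(0, 1)
    return left, right

left_image, right_image = make_random_dot_stereogram()
```

Neither image contains a visible square on its own, since both are uniformly random; the square is defined only by the horizontal offset between the two images.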
Although the neurophysiological processes through which the brain derives depth information
from binocular disparity are outside the scope of this thesis, it is nevertheless important to note
the importance of vergence eye movements for the effectiveness of this cue. As mentioned, the
brain uses the horizontal disparity of objects on the retina to estimate their depth relative to the
fixation point. Through the use of vergence eye movements, the fixation point (defined as the
intersection of the line of sight of the two eyes) changes, resulting in a corresponding shift in the
position of the horopter. By doing so, our visual system is able to increase the range within
which it is able to perceive depth through binocular disparity (Foley & Richards, 1972). In
addition, the brain is able to use the corresponding changes in ocular vergence as a depth cue in
its own right. Therefore, if it were possible to provide extra cues that facilitate the observer’s
ability to converge her eyes at different depths, it may be possible to use the feedback from
convergence to increase the accuracy of information obtained from binocular disparity.
Appendix E: List of Abbreviations
AR Augmented Reality
OST Optical See-Through
MWF Modified Weak Fusion
DS Dot Size
DD Dot Density
PSE Point of Subjective Equality
SDT Signal Detection Theory
TR Transparency Rating
EDVO Estimated Depth of Virtual Object
DR Difficulty Rating