Nuremberg Institute of Technology
Faculty of Electrical Engineering, Precision Engineering
and Information Technology
Study Programme
Media Engineering
Bachelor Thesis
submitted by
Patrick Werner
Integration of Light Field Data in
a Computer Generated 3D
Environment
Winter Semester 2016/2017
supervised by
Prof. Dr. Matthias Hopf
Prof. Dr. Stefan Röttger
Dipl.-Ing. Matthias Ziegler, Fraunhofer IIS
Keywords: light field, virtual reality, 3D, computer graphics
Plagiarism Declaration in Accordance with Examination Rules
I herewith declare that I worked on this thesis independently. Furthermore, it was not
submitted to any other examining committee. All sources and aids used in this thesis,
including literal and analogous citations, have been identified.
Signature
Foreword
I would like to thank everybody who helped and supported me during the creation of this
thesis, especially my supervisor at Fraunhofer IIS, Dipl.-Ing. Matthias Ziegler, who advised
me throughout this work. Additionally I would like to thank the group CIA, which enabled
this thesis and gave me the opportunity to display my work at a convention. Finally I
thank my supervising professors Dr. Matthias Hopf and Dr. Stefan Röttger.
Contents
Acronyms 5
1. Introduction 6
1.1. State-of-the-art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2. Basics 9
2.1. Light field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1. Capturing methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2. Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3. Depth image based rendering . . . . . . . . . . . . . . . . . . . . . . 15
2.2. 3D computer graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1. Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2. Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.3. Graphics pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3. Virtual reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4. Digital video . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.1. Codec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3. Theory 24
3.1. Transforming the CG coordinates . . . . . . . . . . . . . . . . . . . . . . 25
3.2. Intersection with the light field canvas . . . . . . . . . . . . . . . . . . 26
3.3. Calculation of the perceived position . . . . . . . . . . . . . . . . . . . 28
4. Implementation in the Unreal Engine 29
4.1. General information about the Unreal Engine . . . . . . . . . . . . . . . . 29
4.1.1. Actor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.2. Graphics Programming . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.3. Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.4. Media Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.5. Blueprints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2. Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3. FViewrenderer plugin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.1. Compute shader implementation: FForwardWarpDeclaration . . . . . . . . 33
4.3.2. Quality improvements . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4. ALightfieldActor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1. C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.2. Blueprint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.3. LightfieldMaterial . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5. Embedding of video data . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5.1. High resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.2. High frame rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5.3. Quality improvements . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6. Preparation for CES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5. Evaluation 43
5.1. Comparison with a 3D object . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2. Impact of video compression . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4. User feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6. Conclusion 48
6.1. Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Bibliography 49
Appendices
A. Blueprints 52
B. Code Listings of public members 58
C. List of Figures 60
D. List of Tables 62
E. List of Listings 63
Acronyms
xD x dimensional
UHD 3840 × 2160 resolution
API application programming interface
CES Consumer Electronics Show
CIA Computational Imaging and Algorithms
CG computer generated
CPU central processing unit
CRF constant rate factor
D6 six-sided die
D8 eight-sided die
DIBR depth image based rendering
FOV field of view
FHD 1920 × 1080 resolution
HMD head mounted display
IIS Institute for Integrated Circuits
GPU graphics processing unit
MSE mean squared error
PNG Portable Network Graphics
RGB red, green, blue
RHI rendering hardware interface
VR virtual reality
YCbCr luminance, blue-yellow chrominance, red-green chrominance
1. Introduction
“Virtual Reality refers to immersive, interactive, multi-sensory, viewer-centered,
three-dimensional computer generated environments and the combination of
technologies required to build these environments”
— Carolina Cruz-Neira, SIGGRAPH ’93 Course Notes “Virtual Reality Overview” No. 23, pp. 1.1–1.18 (1993)
“Virtual Reality refers to the use of three-dimensional display and interaction
devices to explore real-time computer-generated environments.”
— Steve Bryson, Call for Participation, 1993 IEEE Symposium on Research Frontiers in Virtual Reality
Virtual reality (VR) has come a long way since its beginnings in 1962, when the Sensorama
first introduced multi-modal experiences with scent cards, wind and vibrations [12], albeit
still lacking interactivity. This was addressed by “The Sword of Damocles”, which featured
the first head mounted display (HMD) with positional and orientational tracking [30].
Nowadays virtual reality is even accessible to consumers, through the HTC Vive, Oculus
Rift or PlayStation VR.
With this breakthrough into the consumer world, new technologies regarding VR gain in
importance. There are several approaches to combining reality with virtual reality, in
so called mixed or augmented reality applications. Augmented reality strives to improve
experiences for the user through additional virtual objects in the real world, projected
through an HMD [2]. Another approach is improving computer generated worlds with
real objects. 3 dimensional (3D) reconstruction is possible through 3D scanning or
photogrammetry; however, these technologies are mostly limited to static objects [18]. A
different way to reconstruct 3D data is through computer vision, where the results are
purely based on the captured data.
One computer vision approach uses light fields, which capture the flow of light in a
scene in multiple directions. Light fields can be created by capturing a large number of
viewpoints of the same scene, e.g. by a plenoptic approach like the consumer camera
Lytro Illum [16] or the newer Lytro Immerge [17], or by camera arrays as proposed by
Stanford University [31]. While Stanford University used dense camera arrays to capture
light fields, the Fraunhofer Institute for Integrated Circuits (IIS) acquires light field data
by capturing with sparse arrays. By using fewer cameras, and therefore lowering the
amount of created data, processing times can be improved.
This type of footage allows for multiple new post processing effects, like the selection of a
new focal point with synthetic aperture, or light field rendering. The latter allows the
selection of an arbitrary viewpoint inside the light field bounds, for which a virtual
camera view can be computed.
Patrick Werner 6 of 63
On the basis of this light field rendering, an approach that brings light field data into a
computer generated (CG) world seems plausible.
1.1. State-of-the-art
The current capturing methods to create 3D objects were briefly touched upon in chapter 1.
Another possible solution would be modelling by hand; this, however, requires a skilled
artist to accurately represent the real object. The disadvantage of most of these approaches
is the limitation to static objects, or the massively increased effort needed to capture video
data.
There are also different ways to represent real objects in 3D. Most capturing techniques
supply depth data, which can be used to generate point clouds. However, these are
often hard to render convincingly, because of the high number of points necessary to
completely fill the object. Another way is to generate mesh data, either directly from the
captured data or by surface reconstruction from point clouds. Here, though, the accuracy
is often limited and movement is not easily translated into animations.
1.2. Motivation
With the rising popularity of VR and the logical connection to light field data, a real time
light field renderer has been developed at Fraunhofer IIS’ group Computational Imaging
and Algorithms (CIA). This renderer runs as shaders on the graphics processing unit
(GPU) and encompasses a full depth image based rendering (DIBR) process.
The goal of this thesis is to combine the real time light field renderer with a CG environment,
enabling convincing light field data inside of a 3D world. From the given basics a new
projection model should be derived that produces correct positions for the light field
rendering. Furthermore, this model, together with the shaders, should be implemented
in a CG environment. At first the light field itself should be integrated, but a special
focus lies on the addition of light field video data. This may lead to highly immersive
applications improved by live action footage. The result should be a demonstrator proving
the seamless connection of light fields and CG.
During the course of this thesis, group CIA decided that the resulting demonstrator should
be presented at the Consumer Electronics Show (CES) 2017 in Las Vegas. Considering
this, additional content was created, which should be highlighted with small interactive
elements.
Figure 1.1.: Overview of this thesis
1.3. Overview
In chapter 2 the necessary foundations for this thesis are explained, starting with light
field technology: how a light field is captured and converted into usable disparity maps.
Following that, the basics of 3D computer graphics are described: firstly the modelling
process, secondly how rendering works and finally the graphics pipeline executed on the
GPU. Afterwards, virtual reality and digital video are briefly introduced.
The following chapter 3 describes the theory behind the connection of the light field to
the CG world. Here the individually taken steps, as well as their formulas are covered.
Continuing in chapter 4, the chosen 3D environment, the Unreal Engine, is described,
followed by an insight into the implementation: starting with the FViewrenderer plugin,
a compute shader that produces viewrendered images given the appropriate input data;
then the ALightfieldActor, the canvas that the resulting image gets drawn on; and finally
the methods to get light field video data into the engine.
Next, the results are evaluated in chapter 5. In this chapter the effects of chapter 3 on
the rendering algorithms are demonstrated using a 3D object. Furthermore, the impact of
video compression on the rendering quality is illustrated, and the settings necessary for
smooth performance are described. Additionally, subjective feedback from colleagues is
assessed.
Finally the work is summarized in chapter 6 and an outlook explores future possibilities.
2. Basics
In this chapter the foundations needed for this bachelor thesis are described and illustrated.
Starting with an introduction to light field technology, different capturing methods
are demonstrated. Next, the light field processing workflow used at Fraunhofer IIS, as
well as light field rendering, are described. This is followed by an explanation of 3D
computer graphics and vector spaces. Finally the concepts of virtual reality and digital
video are briefly touched upon.
2.1. Light field
Figure 2.1.: An eye gathering light rays
A theoretical light field contains all information about the flow of light in a 3D space,
as depicted in figure 2.1. The radiance as well as the direction of every light ray inside
that space is known; this is represented by the 5D plenoptic function [1]. As this function
contains redundant information (outside the bounds of convex objects), it can be reduced
to a 4D function [21]. This can be captured by a digital camera, given multiple viewpoints
of the same scene; as such, a “light field can be interpreted as a 2D collection of 2D images,
each taken from a different observer position.” [21]
By photographically capturing an object from multiple varying viewpoints, the 4D light
field is created. With this light field data, multiple computer generated effects can be
realised. One of them is synthetic aperture, which allows refocusing after the picture has
already been taken; this is also featured by the consumer light field camera Lytro Illum [16].
Another possibility is view rendering, also called light field rendering [22], which allows
rendering of virtual camera positions across the light field.
2.1.1. Capturing methods
There are several approaches to capturing light field data, which are described in the
following sections. Dedicated plenoptic cameras, camera arrays or gantry systems are
possible, all working on the same principle of capturing a scene from multiple different
viewpoints.
Robotic linear axis
Figure 2.2.: Line gantry at Fraunhofer IIS
One of the first systems for capturing light fields used a moving camera, which periodically
took pictures from different viewpoints [22]. This is achievable through manual camera
dollies used in film making, or through motorized gantries. A gantry, normally used in
manufacturing, can be programmed to capture the light field as desired. With this capturing
method a high number of samples can be created for the light field, with the disadvantage
of only allowing static objects.
At Fraunhofer IIS a 2D gantry with a high quality camera is used to capture planar light
fields, shown in figure 2.2. It has 4.0 meters of horizontal and 0.5 meters of vertical freedom
of movement.
Plenoptic cameras
Because a gantry is not very convenient, another way to capture light field data is through a
plenoptic camera. The Lytro Illum [16] provides access to light field capturing to consumers.
Figure 2.3.: The Lytro Illum plenoptic camera [16]
Plenoptic cameras work by capturing a dense light field through a micro-lens array inside
of the handheld device [25]. These lenses spread the light rays coming through the main lens
into micro images captured by the camera sensor. As a result, this enables the aforementioned
synthetic aperture after the shooting of the scene itself, through plenoptic image processing
[25]. However, this method of capturing is limited by its narrow perspective, which does
not enable effects based on the geometry of the scene. Similar to the gantry, these consumer
cameras only work for still images, but Lytro currently offers a plenoptic video
camera for rent [15]. Additionally, Lytro offers a prototype of a 360° light field camera
that can also capture moving scenes [17].
Camera arrays
Figure 2.4.: Blackmagic 3 × 3 camera array
To overcome some of the limitations of the robotic approach and plenoptic cameras, one of
the first inexpensive-to-produce dense camera arrays was suggested by Stanford University
in 2005 [31].
Multi camera arrays benefit from modular setup possibilities, with the simplest setup
being planar, which allows high resolution, high dynamic range video, or even virtual dolly
shots [31].
By diverting from the densely packed array to a more spaced out one, the available range of
motion for geometry based effects is increased, while keeping the data rates manageable.
Other possible setups include a 360° rig [13], or convex and concave placements. Even
arbitrarily placed cameras are being researched [34].
Of course, the cameras need to have matching intrinsic parameters. For video purposes
these cameras also need to be synchronizable.
Fraunhofer IIS group CIA has different types of planar arrays available, the most recent
one being a 3 × 3 array of Blackmagic Studio 4K devices supporting 3840 × 2160 resolution
(UHD), shown in figure 2.4.
Figure 2.5.: The Fraunhofer IIS light field rendering workflow (image acquisition, image rectification, greenscreen keying, disparity estimation, disparity post-processing, depth image based rendering)
2.1.2. Workflow
As a consequence of using sparse camera arrays, direct light field rendering does not work.
By creating disparity maps from the sparse camera array images, a denser light field can
be reconstructed [19]. This is sufficient for several post production effects [32].
One such approach is presented by Foessel et al. in [9] as well as by Zilly et al. in [33] and
consists of four essential steps, cf. figure 2.5.
In the following, this approach is described in detail on a data set created by the previously
mentioned line gantry. The subject is a Cleopatra figurine placed in front of a green
screen. It was sampled simulating a 21 × 11 camera array; one image is shown in
figure 2.6.
Figure 2.6.: The Cleopatra object captured
Image rectification
In order to simplify the following step of disparity estimation, the images have to be rectified.
Although the cameras in an array are mounted with high precision, small imperfections
cause corresponding pixels to be misaligned.
Rectification adjusts the images so that corresponding pixels can be found in the same
row or column of every neighbouring image, cf. figure 2.7.
Figure 2.7.: Rectified images of the Cleopatra figurine. The rectangle shows the aligned pixels.
Green screen keying
Now the images may be keyed in order to remove the green screen. This removes unnecessary
areas beforehand, as these may negatively affect the following steps.
Disparity estimation
With the source material rectified, the next step can be approached. In order to create
appropriate disparity maps, several intermediate steps are needed. To reduce complexity,
the following step is illustrated on a stereo pair of the Cleopatra data set; the same
principles can be extended for use in a multi image setup.
Disparity describes the distance in pixels between two corresponding pixels of neighbouring
images. E.g. the cat ear in figure 2.6 can be found x pixels shifted to the right in its
left adjacent image, as can be seen in figure 2.8. Ideally a disparity map contains this
information for every pixel of an image, which provides implicit geometric information
about the scene.
There are several different disparity estimation algorithms, with different strengths and
weaknesses [28][27]. Even real time disparity estimation is possible, e.g. with multi image
correspondences [6].
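The core idea of such correspondence search can be sketched as a naive block-matching scan along the epipolar line. This is purely illustrative (function name and parameters are invented for this sketch) and far simpler than the estimators cited above:

```python
def block_match_disparity(left, right, x, y, window=1, max_disp=8):
    """Find the disparity of pixel (x, y) in `left` by searching the same
    row of `right` for the most similar window (sum of absolute differences).
    Images are 2D lists of grey values; an illustrative sketch only."""
    def sad(d):
        total = 0
        for dy in range(-window, window + 1):
            for dx in range(-window, window + 1):
                total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
        return total
    # try every candidate disparity along the same row and keep the best match
    return min(range(max_disp + 1), key=sad)
```

Real estimators add sub-pixel refinement, robust cost functions and global smoothness constraints; the search-along-a-row structure, however, is the same.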
Figure 2.8.: Disparity estimation of the cat ear. The ear can be found in the adjacent image shifted by the disparity.
Commonly, 3D CG software supplies depth data in the form of depth maps. These describe
the distance of each pixel to the camera based on the rendering range. Depth maps can be
calculated from disparity maps using the equation

\[
\rho = \frac{B \cdot f}{\delta \cdot d_u}
\tag{2.1}
\]

where ρ describes the depth, B the baseline, f the focal length, δ the disparity and d_u the
width of one pixel on the image sensor.
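Equation 2.1 translates directly into code; the camera parameters below are made-up example values, not those of the Fraunhofer setup:

```python
def depth_from_disparity(disparity_px, baseline_m, focal_m, pixel_width_m):
    """Depth from disparity, cf. equation 2.1: rho = (B * f) / (delta * du)."""
    return (baseline_m * focal_m) / (disparity_px * pixel_width_m)

# illustrative numbers: 10 cm baseline, 50 mm lens, 5 micrometre pixel pitch,
# 100 px disparity
depth = depth_from_disparity(100, 0.10, 0.050, 5e-6)  # -> 10.0 metres
```

Note the reciprocal relationship: halving the disparity doubles the computed depth, which is why distant objects (small disparities) are the most sensitive to estimation errors.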
Because the viewpoints differ in perspective, not all disparities can be found through image
based algorithms. These occlusions can be significantly reduced in the multi camera case
by merging the disparity maps of all available views.
Disparity post-processing
As most algorithms have problems in areas with low fidelity or uniform surfaces, wrong or
no disparities may be produced. Most of the erroneous disparities can be sorted out by cross
checking with neighbouring images for differing values, through so called consistency checks.
The missing disparities may then be filled with surrounding values through filtering.
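The consistency check and subsequent hole filling can be sketched on 1D disparity rows. This is a simplified illustration with invented helper names, not the actual Fraunhofer implementation:

```python
def consistency_check(disp_left, disp_right, threshold=1):
    """Invalidate (set to None) disparities whose counterpart in the
    neighbouring view disagrees by more than `threshold` pixels."""
    checked = []
    for x, d in enumerate(disp_left):
        xr = x - d  # corresponding pixel in the neighbouring view
        if 0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= threshold:
            checked.append(d)
        else:
            checked.append(None)  # inconsistent: mark as missing
    return checked

def fill_holes(disp):
    """Fill missing disparities with the nearest valid value to the left
    (a crude stand-in for the filtering mentioned above)."""
    filled, last = [], 0
    for d in disp:
        last = d if d is not None else last
        filled.append(last)
    return filled
```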
(a) Raw disparity map (b) Post processed disparity map
Figure 2.9.: Comparison of disparity post-processing outcome
In the end a fully filled disparity map without errors, shown in figure 2.9, is desired, as
the quality of subsequent effects correlates with the quality of the disparity maps.
Because disparity maps contain floating point data, the images are saved in a specific
way. Formats that can handle floating point data are often not fully supported; as a
workaround, the single channel data is split into 8 bit red, green, blue (RGB)¹ data, with each
channel containing specific information. Each value of red represents a disparity of 256,
green contains the range between 1 and 256, and finally blue saves the fractional digits.
This representation can be saved in many widely supported formats, e.g. Portable Network
Graphics (PNG). However, in the context of this thesis the disparity maps are converted
to a coloured representation, highlighting the full disparity range.
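One plausible reading of this channel packing is sketched below; the exact offsets and scaling of the Fraunhofer format may differ:

```python
def pack_disparity(d):
    """Pack a floating point disparity into three 8 bit channels:
    red counts multiples of 256, green holds the remaining integer part,
    blue stores the fractional part scaled to 8 bits (one interpretation
    of the scheme described in the text)."""
    integer, fraction = divmod(d, 1.0)
    r, g = divmod(int(integer), 256)
    b = round(fraction * 255)
    return r, g, b

def unpack_disparity(r, g, b):
    """Reverse of pack_disparity; exact only up to the 1/255 quantisation."""
    return r * 256 + g + b / 255
```

The round trip loses at most half a quantisation step in the fractional part, which is why such packed maps survive lossless formats like PNG intact.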
2.1.3. Depth image based rendering
One of the possible effects is free viewpoint rendering. An arbitrary position inside the
light field can be chosen, and a virtual camera view from that position will be rendered.
This is achieved through multiple steps that build on top of each other.
A virtual image is created by warping a set number of surrounding images to that position
and then merging the results. The warping is split into forward and backward warping,
which happens for every viewpoint individually. Then these warped images are combined
in the merge process [26][8][7].
Figure 2.10.: Viewrendering in a 2 × 2 array
As an example a 2 × 2 subset of the Cleopatra data set is used. In order to exaggerate
the effect a higher baseline distance was chosen. The virtual camera is positioned in the
centre, cf. figure 2.10.
Forward warp
At first every pixel in the disparity map is shifted by its disparity value d, based on the
virtual camera position CA. CA is the position between the cameras, as shown in figure
2.10, relative to the real camera position that is currently warped.
¹ additive colour space based on the trichromatic colour vision theory
This results in a shifted pixel position \vec{p}_{LF} for each of the disparity values, based on their
original pixel position described by u and v, cf. figure 2.11. Additionally, the forward
warped disparity maps may be filtered to decrease rendering artefacts.
\[
\vec{p}_{LF} =
\begin{pmatrix}
u \\ v \\ 1
\end{pmatrix}
+ d\,\vec{C}_A
\tag{2.2}
\]
(a) Incoming disparity (b) Forward warped disparity
Figure 2.11.: Forward warping
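Restricted to a purely horizontal camera offset, the forward warp of equation 2.2 can be sketched on a single disparity row. This is illustrative only; the real shader works on full 2D maps and includes the filtering mentioned above:

```python
def forward_warp_row(disp_row, cam_offset):
    """Scatter every disparity to the target u + d * cam_offset (equation 2.2
    restricted to one dimension). Where several pixels land on the same
    target, the larger disparity (the nearer surface) wins."""
    warped = [None] * len(disp_row)
    for u, d in enumerate(disp_row):
        target = round(u + d * cam_offset)
        if 0 <= target < len(warped):
            if warped[target] is None or d > warped[target]:
                warped[target] = d
        # pixels warped outside the image are discarded
    return warped
```

The `None` entries are exactly the holes visible in figure 2.11 (b): positions no source pixel was warped to, which later steps must fill or inpaint.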
Backward warp
Next, these forward warped disparity maps are used to map the RGB values back onto
those pixels, interpolating when needed, cf. figure 2.12.
(a) Incoming image (b) Backward warped image
Figure 2.12.: Backward warping
Merge
After these steps have been performed for all available views, they are combined based on
different factors, like disparity or distance, to form the new virtual camera view, shown in
figure 2.13.
Figure 2.13.: Merged colour image
In the end, the pixels that could not be found in any view may be filled by an inpainting
algorithm [3], which fills the missing pixels with surrounding colours.
2.2. 3D computer graphics
Another important topic is 3D computer graphics in general. It consists of multiple
subcategories, the most important of which are described in the following sections.
2.2.1. Modelling
At first, the 3D objects have to be created, with the most common method being modelling
by hand. Other methods may include 3D scanning or procedural generation.
As an accurate reproduction of the real world, with objects consisting of trillions of atoms,
would be infeasible, objects are approximated by their hulls. These hulls consist of multiple
3D points, also called vertices. Two vertices connected to each other form an edge; multiple
edges form faces.
E.g. a die with perfectly flat sides can be described with 8 vertices connected by 12 edges,
which form 6 polygonal or 12 triangular faces.
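The die example can be written down directly as mesh data. Below is a minimal sketch of a unit cube with 8 vertices and 12 triangular faces (vertex order and triangulation chosen for illustration; a real engine would also care about winding order and normals):

```python
# the 8 corners of a unit cube (model space)
cube_vertices = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# 12 triangles, two per side, as index triples into cube_vertices
cube_triangles = [
    (0, 1, 3), (0, 3, 2),  # x = 0 side
    (4, 6, 7), (4, 7, 5),  # x = 1 side
    (0, 4, 5), (0, 5, 1),  # y = 0 side
    (2, 3, 7), (2, 7, 6),  # y = 1 side
    (0, 2, 6), (0, 6, 4),  # z = 0 side
    (1, 5, 7), (1, 7, 3),  # z = 1 side
]
```

Triangulating each square face adds one diagonal per side, so the triangle mesh has 12 + 6 = 18 unique edges rather than the cube's 12.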
Each of these primitives can also have additional attributes, like colour, normals or texture
coordinates. Colour is used for the albedo of the rendered objects; additionally, normals and
texture coordinates can be used for advanced effects like lighting or texturing.
Point clouds
Point clouds are a special kind of model, consisting only of vertices. These are most often
the result of 3D scanning or reconstruction techniques that do not provide fully conclusive
information about the objects.
Vector spaces
Every vertex of a model an artist creates is relative to the model’s origin; this is called
model space. Every model has its own origin and, respectively, its own model space.
In order to combine different models, e.g. placing a die on a table, they have to be in a
common vector space, most often referred to as world space.
There always has to be an active space, to whose origin the standard transformations are
applied. E.g. consider two dice, one six-sided die (D6) and one eight-sided die (D8). In
order to combine these two, a world space has to be defined. The D8 is placed inside the
D6 space, translated 10 units along the x axis. Then the world space is rotated by 90°
around the y axis. This results in the D8’s rotation being relative to the D6 origin, not its
own, as can be seen in figure 2.14.
Figure 2.14.: Dice before (left) and after (right) the rotation around the y axis
All these transformations can be described by a 4 × 4 homogeneous transformation matrix
\vec{M}_M, consisting of the translation matrix, the scale matrix and the three rotation matrices,
which describe the rotation around the respective axes, cf. equation 2.3.
\[
\vec{T} =
\begin{pmatrix}
1 & 0 & 0 & T_x \\
0 & 1 & 0 & T_y \\
0 & 0 & 1 & T_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
\vec{S} =
\begin{pmatrix}
S_x & 0 & 0 & 0 \\
0 & S_y & 0 & 0 \\
0 & 0 & S_z & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
\[
\vec{R}_x =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & -\sin\theta & 0 \\
0 & \sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
\vec{R}_y =
\begin{pmatrix}
\cos\theta & 0 & \sin\theta & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta & 0 & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]
\[
\vec{R}_z =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 & 0 \\
\sin\theta & \cos\theta & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{2.3}
\]
These transformations are combined by multiplication, creating \vec{M}_M, which describes a
model’s position and orientation in its entirety. As matrix multiplication is not commutative
and column vectors are used, transformations are read from right to left.
E.g. the D8 described before can only be correctly represented if the transformations are
in the right order, as can be seen in equation 2.4.

\[
\vec{M}_{M,\text{right}} = \vec{R}_y \vec{T} =
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
-1 & 0 & 0 & -10 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
\vec{M}_{M,\text{wrong}} = \vec{T} \vec{R}_y =
\begin{pmatrix}
0 & 0 & 1 & 10 \\
0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{2.4}
\]
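The effect of the multiplication order can be verified numerically; a small sketch with plain Python lists, using θ = 90° and a translation of 10 units along the x axis as in the dice example:

```python
import math

def mat_mul(a, b):
    """Multiply two 4 x 4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

theta = math.radians(90)
c, s = round(math.cos(theta)), round(math.sin(theta))  # exact 0 and 1
T  = [[1, 0, 0, 10], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Ry = [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

# rotate the already translated D8: the x offset ends up on the z axis
right = mat_mul(Ry, T)
# translate after rotating: the offset stays on the x axis
wrong = mat_mul(T, Ry)
```

Inspecting the fourth columns of `right` and `wrong` reproduces the two matrices of equation 2.4.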
In order to reverse a transformation the inverse matrix of that transformation can be
multiplied to its current transformation matrix.
2.2.2. Rendering
Now that the models are in their 3D scene, their usual purpose is to be projected to a
screen. For this, a virtual camera in the 3D scene is necessary, which is usually described
by its position \vec{c}_{VCG}, viewing direction \vec{g} and up vector \vec{t}. The following section is based on
Fundamentals of Computer Graphics [29].
Figure 2.15.: Camera with up(y), forward(-z) and right(x) vector
As an intermediate step the scene is transformed into view space to simplify much of the
following mathematics. In view space the camera position is the origin and the viewing
direction is aligned with one of the axes. The result can be seen in figure 2.16.
The matrix is calculated using the intermediate variables \vec{w}, \vec{u} and \vec{v} based on the camera.

\[
\vec{w} = -\frac{\vec{g}}{\lVert\vec{g}\rVert} \qquad
\vec{u} = \frac{\vec{t} \times \vec{w}}{\lVert\vec{t} \times \vec{w}\rVert} \qquad
\vec{v} = \vec{w} \times \vec{u}
\tag{2.5}
\]
Figure 2.16.: View transformation
\[
\vec{M}_V =
\begin{pmatrix}
u_x & u_y & u_z & 0 \\
v_x & v_y & v_z & 0 \\
w_x & w_y & w_z & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & -c_{VCG,x} \\
0 & 1 & 0 & -c_{VCG,y} \\
0 & 0 & 1 & -c_{VCG,z} \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{2.6}
\]
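Equations 2.5 and 2.6 combine into a classic look-at construction; a compact sketch with function and variable names invented for this illustration:

```python
import math

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def view_matrix(cam, gaze, up):
    """Build M_V from camera position, viewing direction g and up vector t,
    following equations 2.5 and 2.6."""
    w = [-x for x in normalize(gaze)]   # w = -g / ||g||
    u = normalize(cross(up, w))         # u = (t x w) / ||t x w||
    v = cross(w, u)                     # v = w x u
    # rotation rows, each with the translation by -cam folded in (eq. 2.6)
    return [
        u + [-sum(u[i] * cam[i] for i in range(3))],
        v + [-sum(v[i] * cam[i] for i in range(3))],
        w + [-sum(w[i] * cam[i] for i in range(3))],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

For a camera at (0, 0, 5) looking down the negative z axis, the rotation part is the identity and the world origin lands 5 units in front of the camera, as expected.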
Now the scene must be projected onto the virtual camera’s sensor. For this, the last
transformation, into projection space, must be applied. The required transformation
matrix depends on the type of projection, either orthographic or perspective. Additionally
the frustum parameters near plane n and far plane f are needed; these describe the depth
range of the 3D scene that should be projected. For both projections the width n_x and
height n_y of the projection screen are needed.
\[
\vec{M}_{P,\text{Orth}} =
\begin{pmatrix}
\frac{2}{n_x} & 0 & 0 & 0 \\
0 & \frac{2}{n_y} & 0 & 0 \\
0 & 0 & -\frac{2}{f-n} & \frac{f+n}{f-n} \\
0 & 0 & 0 & 1
\end{pmatrix}
\tag{2.7}
\]
Beyond these the perspective projection needs the vertical field of view (FOV) φ of the
virtual camera.
\[
\vec{M}_{P,\text{Persp}} =
\begin{pmatrix}
\frac{n_y}{n_x}\cot\frac{\varphi}{2} & 0 & 0 & 0 \\
0 & \cot\frac{\varphi}{2} & 0 & 0 \\
0 & 0 & \frac{f+n}{n-f} & \frac{2fn}{f-n} \\
0 & 0 & 1 & 0
\end{pmatrix}
\tag{2.8}
\]
These transformations can be combined into a Model View Projection matrix \vec{M}_{MVP},
which can subsequently be used to fully transform and project a model.
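The perspective matrix of equation 2.8 can likewise be written out directly; a sketch following the sign conventions of the matrix as printed above, with parameter names invented for this illustration:

```python
import math

def perspective_matrix(fov_deg, nx, ny, n, f):
    """Perspective projection matrix, cf. equation 2.8: vertical field of
    view in degrees, screen width/height nx/ny, near and far planes n/f."""
    cot = 1.0 / math.tan(math.radians(fov_deg) / 2)
    return [
        [ny / nx * cot, 0.0, 0.0, 0.0],
        [0.0, cot, 0.0, 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2 * f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ]
```

Chained with the model and view matrices, this yields \vec{M}_{MVP}; note that for a square screen and a 90° field of view the cotangent term becomes 1, so x and y pass through unscaled.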
2.2.3. Graphics pipeline
Modern GPUs have a programmable graphics pipeline; one example is depicted in figure
2.17. It consists of the steps to create a 2D pixel image from a 3D scene.
Prior to the GPU processing, the 3D software prepares the models and calculates their
accompanying \vec{M}_{MVP}. These are then sent to the GPU for rendering.
Figure 2.17.: OpenGL rendering pipeline [14]
Basically, the standard pipeline has two main programmable
components, the vertex or geometry shader and the fragment
or pixel shader.
At first the vertex shader is called for every vertex of the object; here \vec{M}_{MVP} is applied
to the vertex in order to transform and project it. Other geometry based effects, like
vertex based lighting, can also be applied here.
In the next step these vertices go through post-processing. Vertices outside of the view
frustum are removed by clipping.
Afterwards, the primitives are assembled out of the remaining vertices. Primitives not
meeting specific criteria, like facing the camera, can then be removed; this is called culling.
Next, the triangle primitives are rasterized into fragments.
These contain interpolated data of the surrounding vertices.
Now, in the fragment shader effects can be applied to each
fragment individually such as per-fragment-lighting or tex-
turing.
At last, the fragments run through different tests which determine their visibility; e.g. of
overlapping fragments, the one nearest to the camera wins. Additionally, the fragments may
be blended using alpha values. Finally, the coloured pixel is generated as output.
More modern graphics cards also support compute shaders. These can manipulate data
directly in a programmable environment similar to the fragment shader. Overhead is saved
by not executing the now unnecessary intermediate steps.
Rasterization rendering can also be interpreted as each pixel of the resulting image sending
out one ray. This ray transfers the colour of the first object it hits, together with possible
lighting computations, back to the pixel. This is similar to the way pinhole cameras work:
the back side of the box is the image and the pinhole is comparable to the virtual CG
camera, except that instead of light rays coming into the box, they are sent out from it,
cf. figure 2.18. However, this does not compare to full ray tracing rendering methods,
which aim to realistically replicate light rays bouncing off surfaces.
Patrick Werner 21 of 63
Figure 2.18.: Comparison between pinhole camera and rasterization rendering
2.3. Virtual reality
Consumer VR started with Nintendo's Virtual Boy, a failure because of a lack of colour
fidelity as well as the uncomfortable viewing position [4]. But with current HMDs, like the Oculus
Rift or HTC Vive for the PC market, as well as the Gear VR or Google Cardboard for mobile,
these limitations have mostly been lifted.
A modern HMD features at least a 1920 × 1080 (FHD) resolution across its two screens with
a refresh rate of about 90 Hz, which is necessary for an experience free of simulation sickness.
These HMDs have built-in accelerometers as well as gyroscopes for rotational tracking, and
the PC or console variants add external tracking devices for positional tracking. With a specific
optical design a large FOV is achieved. As this leads to pincushion-distorted images, the
rendered images have to be barrel distorted in order to appear correct to the viewer. With
these prerequisites and suitable controllers, VR can be highly immersive.
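The barrel pre-distortion can be sketched with a common radial polynomial model. This is an illustration of the principle only: the coefficients below are made up, whereas real HMD compositors use per-device calibration data.

```cpp
#include <utility>

// Barrel-distort a point given in lens-centred normalized coordinates.
// k1 and k2 are radial distortion coefficients (illustrative values only;
// actual values come from the HMD's optical calibration).
std::pair<double, double> barrelDistort(double x, double y,
                                        double k1 = 0.22, double k2 = 0.24) {
    const double r2 = x * x + y * y;             // squared radius from lens centre
    const double scale = 1.0 + k1 * r2 + k2 * r2 * r2;
    return {x * scale, y * scale};
}
```

Points near the lens centre stay almost unchanged, while points further out are displaced progressively more, which is exactly what cancels the pincushion distortion of the lens.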
VR adds a few extra steps to the standard 3D CG workflow. The 3D
application can access new information from the HMD, such as its position or rotation,
and use these as a basis for the camera position and orientation. It also has to render
one image for each eye, to enable a stereoscopic impression for the viewer. In the end a
compositor, provided by the HMD application programming interface (API), applies the
barrel distortion, as well as other VR specific effects.
2.4. Digital video
A digital raster image consists of columns and rows of pixels, each of which contains variable
information, usually RGB and optionally alpha. These so-called colour channels can use a
different number of bits to represent their values, e.g. an 8 bit single channel image allows
for 256 distinct colour values. This means a 500 × 500 single channel image has a size of
250,000 bytes (about 244 kB).
This is extended in digital video, where videos basically consist of a multitude of raster
images behind each other. A single image is now called a frame. As a result, a four second
video with 25 frames per second amounts to 25,000,000 bytes. This brings up a problem of
digital video: file size.
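The quoted sizes follow directly from the pixel count; a quick sketch of the arithmetic, assuming one byte per pixel for a single 8-bit channel:

```cpp
#include <cstdint>

// Uncompressed size of a raster image in bytes.
std::uint64_t imageBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t channels = 1,
                         std::uint64_t bytesPerChannel = 1) {
    return width * height * channels * bytesPerChannel;
}

// Uncompressed size of a video: one such image per frame.
std::uint64_t videoBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t fps, std::uint64_t seconds) {
    return imageBytes(width, height) * fps * seconds;
}
```

For the examples in the text, `imageBytes(500, 500)` yields 250,000 bytes and `videoBytes(500, 500, 25, 4)` yields 25,000,000 bytes.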
However, traditional video has many attributes that can be exploited to reduce file size.
2.4.1. Codec
In general, a codec is a process to encode and decode data streams, enabling encryption
or compression, which can be lossy or lossless. For media streams, both of these come
into use.
One of the simpler concepts of compression for video data is to save the full image
data only for every n-th frame; the frames in between are described just by their differences
to the previous frame. This significantly reduces the file size.
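The n-th-frame scheme can be sketched as follows. This is a toy model only: real codecs additionally use motion-compensated blocks and entropy coding, which is where the actual size reduction happens; the delta frames merely become highly compressible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Frame = std::vector<std::int16_t>; // wide enough to hold differences

// Encode: every n-th frame is kept complete (keyframe),
// the rest become pixel-wise deltas to their predecessor.
std::vector<Frame> encodeDelta(const std::vector<Frame>& frames, std::size_t n) {
    std::vector<Frame> out;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (i % n == 0) {
            out.push_back(frames[i]); // keyframe: stored as-is
        } else {
            Frame delta(frames[i].size());
            for (std::size_t p = 0; p < delta.size(); ++p)
                delta[p] = frames[i][p] - frames[i - 1][p];
            out.push_back(delta);
        }
    }
    return out;
}

// Decode by accumulating deltas onto the last reconstructed frame.
std::vector<Frame> decodeDelta(const std::vector<Frame>& stream, std::size_t n) {
    std::vector<Frame> out;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (i % n == 0) {
            out.push_back(stream[i]);
        } else {
            Frame f = out.back();
            for (std::size_t p = 0; p < f.size(); ++p)
                f[p] += stream[i][p];
            out.push_back(f);
        }
    }
    return out;
}
```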
Another differentiation to be made is lossless versus visually lossless. The human eye does
not collect visual information uniformly: the photoreceptors in the eye gather more data
from the colour green, a result of its high amount of luminosity information.
Truly lossless codecs support a total reconstruction of the input single frame data, whereas
visually lossless codecs leave out information that cannot be seen or distinguished by the
human eye.
This is also reflected in the luminance, blue-yellow chrominance, red-green chrominance
(YCbCr)2 colour model. This model differs from RGB models, as it does not represent
colour directly, but luminosity and chrominance.
One example of a visually lossless codec, and one of the current industry standards, is
H.264, part of the MPEG-4 standard. Its main use is the compression of high definition
video data for internet streaming, disc storage, video telephony and broadcasting. With an
extension it also supports multi-view coding, which can be used for light field data, cf. [20].
However, these extended features are currently not widely supported. Additionally, dedicated
light field codecs are being researched [5], and multi-view support is directly implemented in
the H.264 successor H.265 [24].
2colour space used in digital television
3. Theory
In order to integrate light field data seamlessly into a 3D world, the light field canvas has
to behave like a standard 3D object. These objects are basically rendered
like the pinhole camera described in figure 2.18. Because the virtual camera in light field
rendering changes its position, the captured objects move in the resulting renderings,
as can be seen in figure 3.1. The amount of movement scales with the disparity.
Figure 3.1.: Resulting object movement when rendering virtual camera views (V) between real cameras (R)
The result is an incorrectly perceived position for the light field data in the CG scene. A
solution to this problem is provided in this chapter. To correctly render the light field data
based on the viewer position, full 3D rendering would be required. However, the real time
shaders of group CIA currently only support 2D rendering. Therefore, the following is a
2D approximation that will be applied to the 2D rendering.
The CG scene consists of a wall with an opening where the light field canvas is placed;
various CG objects may be placed in the scene, cf. figure 3.2.
In order to compensate for the movement, the standard forward warping formula 2.2 has to
be modified. For that reason the global offset \(d_w\) is introduced, based on the position
of the CG camera relative to the light field canvas.
Furthermore, as the light field should be integrated in a VR environment with two
viewpoints, the problem of stereoscopic convergence arises. When the two cameras are
parallel, the image is perceived as lying in front of the canvas. This, however, goes against the
desired effect. A solution is provided through horizontal image translation, represented
in the additional offset \(d_{conv}\), based on the eye distance \(d_{IPD}\). \(d_{conv}\) is used to shift the
stereoscopic convergence point onto the object.
Figure 3.2.: Top down view of the 3D scene
With these two additional parameters, the modified formulas 3.1 are developed. These
represent the left (L) and right (R) views to be rendered, respectively.
\[
\vec{p}_{LF_L} = \begin{pmatrix} u_L \\ v_L \\ 1 \end{pmatrix}
+ d\,\vec{c}_{A_L}
+ \begin{pmatrix} \frac{1}{2} d_{conv_x} \\ \frac{1}{2} d_{conv_y} \\ 0 \end{pmatrix}
+ \begin{pmatrix} d_{w_x} \\ d_{w_y} \\ 0 \end{pmatrix}
\qquad
\vec{p}_{LF_R} = \begin{pmatrix} u_R \\ v_R \\ 1 \end{pmatrix}
+ d\,\vec{c}_{A_R}
- \begin{pmatrix} \frac{1}{2} d_{conv_x} \\ \frac{1}{2} d_{conv_y} \\ 0 \end{pmatrix}
+ \begin{pmatrix} d_{w_x} \\ d_{w_y} \\ 0 \end{pmatrix}
\tag{3.1}
\]
3.1. Transforming the CG coordinates
At first, the CG viewer camera position \(\vec{c}_{V_{CG}}\) is given. This camera also has the right vector
\(\vec{r}_{CG}\), which is perpendicular to its viewing and up directions, shown as the x axis in figure
2.15. Additionally, the interpupillary distance \(d_{IPD_{CG}}\) is available.
Because the following calculations depend on the origin point being in the middle of the
canvas, all these variables have to be converted to their counterparts in the model space of
the light field canvas, by multiplying with its inverse transformation matrix \(M_C^{-1}\).
As these should also be decoupled from the size of the canvas, they have to be divided
by its extent \(\vec{e}_C = \begin{pmatrix} e_{C_1} & e_{C_2} & e_{C_3} & 1 \end{pmatrix}^\top\). This results in the normalized variables shown in
equation 3.2.
\[
\vec{c}_V = \frac{\vec{c}_{V_{CG}}\, M_C^{-1}}{\vec{e}_C}
= \begin{pmatrix} c_{V_1} \\ c_{V_2} \\ c_{V_3} \\ 1 \end{pmatrix}
\qquad
\vec{r} = \frac{\vec{r}_{CG}\, M_C^{-1}}{\vec{e}_C}
= \begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ 1 \end{pmatrix}
\qquad
d_{IPD} = \frac{d_{IPD_{CG}}\, M_{C_2}^{-1}}{e_{C_2}}
\tag{3.2}
\]
3.2. Intersection with the light field canvas
With dIP D and the right vector, the positions for the left and right eye cameras can be
calculated.
\[
\vec{c}_{V_L} = \vec{c}_V - \vec{r}\, d_{IPD}
\qquad
\vec{c}_{V_R} = \vec{c}_V + \vec{r}\, d_{IPD}
\tag{3.3}
\]
From these positions, a ray is sent from each eye to the convergence point
\(\vec{p}_{conv} = \begin{pmatrix} p_{conv_1} & p_{conv_2} & p_{conv_3} & 1 \end{pmatrix}^\top\).
This point lies in the middle of the canvas, offset on the z
axis. Because of the previously presumed parallel camera configuration, which is corrected
through \(d_{conv}\), the offset is theoretically at infinity.
From these rays and their intersection with the light field canvas, the virtual camera
positions are defined, cf. figure 3.3.
Figure 3.3.: Intersection with the canvas
The straight line going through \(\vec{p}_{conv}\) and \(\vec{c}_{V_L}\) is described by the equation
\[
\vec{l} = \vec{c}_{V_L} + t_L \left( \vec{p}_{conv} - \vec{c}_{V_L} \right)
\tag{3.4}
\]
and to find the intersection, this must be inserted into the plane normal equation
\[
(\vec{l} - \vec{p}_0) \cdot \vec{n} = 0
\tag{3.5}
\]
with \(\vec{p}_0 = \vec{0}\), \(\vec{n} = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}^\top\) and \(\cdot\) denoting the dot product.
Then, this is solved for
\[
t_L = \frac{-c_{V_{L_3}}}{p_{conv_3} - c_{V_{L_3}}}
\tag{3.6}
\]
which is then inserted into \(\vec{l}\). Thus, the intersection point
\[
\vec{c}_{LF_L} = \vec{l} = \vec{c}_{V_L} + \frac{-c_{V_{L_3}}}{p_{conv_3} - c_{V_{L_3}}} \left( \vec{p}_{conv} - \vec{c}_{V_L} \right)
\tag{3.7}
\]
is calculated. The same steps apply to \(\vec{c}_{LF_R}\).
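The derivation in equations 3.4 to 3.7 translates directly into code. The following is a standalone sketch with a minimal vector type; names are illustrative and not taken from the thesis implementation.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

// Intersect the line from the eye position cV through the convergence
// point pConv with the canvas plane z = 0 (normal (0,0,1), p0 = origin),
// following equations 3.4 to 3.7.
Vec3 intersectCanvas(const Vec3& cV, const Vec3& pConv) {
    const double t = -cV[2] / (pConv[2] - cV[2]); // equation 3.6
    return {cV[0] + t * (pConv[0] - cV[0]),
            cV[1] + t * (pConv[1] - cV[1]),
            0.0};                                 // lies on the canvas
}
```

With the convergence point pushed far behind the canvas, the rays become nearly parallel, which matches the presumed parallel camera configuration.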
As the rendering requires these coordinates to be in an array coordinate system, the [0, 1]
range is converted to a [0, AmountCameras] range in the x and y axes. This can be ignored
here, as it is a simple scaling.
\[
\vec{c}_{A_L} = \begin{pmatrix} c_{LF_{L_1}} \\ c_{LF_{L_2}} \\ c_{LF_{L_3}} \end{pmatrix}
\qquad
\vec{c}_{A_R} = \begin{pmatrix} c_{LF_{R_1}} \\ c_{LF_{R_2}} \\ c_{LF_{R_3}} \end{pmatrix}
\tag{3.8}
\]
Now the result of equation 3.1 must be converted to homogeneous 3D coordinates in order
to combine it with the other 3D positions.
\[
\vec{P}_{LF_L} = \begin{pmatrix} p_{LF_{L_1}} \\ p_{LF_{L_2}} \\ 0 \\ 1 \end{pmatrix}
\qquad
\vec{P}_{LF_R} = \begin{pmatrix} p_{LF_{R_1}} \\ p_{LF_{R_2}} \\ 0 \\ 1 \end{pmatrix}
\tag{3.9}
\]
3.3. Calculation of the perceived position
The desired result is the light field position being perceived at a specific position. The
perceived position is formed by the shift from the left to the right eye.
This is achieved by equating the lines starting from the left, and respectively right eye, to
the world point of the object.
\[
\vec{c}_{V_L} + k \left( \vec{P}_{LF_L} - \vec{c}_{V_L} \right)
= \vec{c}_{V_R} + j \left( \vec{P}_{LF_R} - \vec{c}_{V_R} \right)
\tag{3.10}
\]
With the desired position \(\vec{P}_0 = \begin{pmatrix} X_0 & Y_0 & Z_0 \end{pmatrix}^\top\), this can then be solved for \(d_w\) as well as
\(d_{conv}\). The resulting formulas 3.11 and 3.12 can then be used to render light field objects at the
desired position \(\vec{P}_0\) when inserted into 3.1.
\[
d_w = \begin{pmatrix}
\dfrac{-Z_0 d\, c_{V_{L_1}} - d\, c_{V_{L_1}} c_{V_{L_3}} + X_0 c_{V_{L_3}} + Z_0 u_L - Z_0 c_{V_{L_1}} - u_L c_{V_{L_3}}}{Z_0 - c_{V_{L_3}}} \\[2ex]
\dfrac{-Z_0 d\, c_{V_{L_2}} - d\, c_{V_{L_2}} c_{V_{L_3}} + Y_0 c_{V_{L_3}} + Z_0 v_L - Z_0 c_{V_{L_2}} - v_L c_{V_{L_3}}}{Z_0 - c_{V_{L_3}}}
\end{pmatrix}
\tag{3.11}
\]
\[
d_{conv} = \begin{pmatrix}
\dfrac{d_{IPD_1} \left( Z_0 d - d\, c_{V_{L_3}} - Z_0 \right)}{Z_0 - c_{V_{L_3}}} \\[2ex]
\dfrac{d_{IPD_2} \left( Z_0 d - d\, c_{V_{L_3}} - Z_0 \right)}{Z_0 - c_{V_{L_3}}}
\end{pmatrix}
\tag{3.12}
\]
4. Implementation in the Unreal Engine
As the implementation of a full CG 3D environment would go beyond the scope of this
thesis, an existing environment was chosen. In order for the shader based DIBR to
be implementable, the 3D environment had to provide access to the programmable graphics
pipeline, as well as an implementation of a VR API. Additionally, media functionality
with video support had to be available, in order to support video light fields.
These prerequisites were met by the engine Unity as well as by the Unreal Engine 4;
subsequently, the Unreal Engine was chosen as the preferred development environment.
First, the engine itself and its basics are explained. Next, the architecture, as well as the
implementation itself and its multiple classes are demonstrated, followed by the integration
of video data. At last, the additional preparations for the CES are briefly presented.
4.1. General information about the Unreal Engine
The Unreal Engine is a full featured game development tool by Epic Games. It allows
game development in a 2D, 3D or VR environment. Furthermore, it deploys to mobile,
web, console or PC platforms. Most of the following information is taken from the Unreal
Engine website [10], as well as the documentation [11].
It was first publicly released in 1998 along with the game Unreal, supporting development
for high end PCs. The following versions introduced multi-platform support while remaining
mainly focused on the high end spectrum. In 2014 the current version 4 was released and
has since been updated to 4.14.3 as of now.
For beginners it features multiple example projects and free-to-use assets, which can be
expanded with offers from a marketplace. The engine itself is free to use up to $3000 of
revenue; afterwards, 5% royalties on earnings are required.
Epic Games provides debugging symbols for C++ development and additionally the full
engine source code, which can be extended as desired. Support is provided through
forums, AnswerHub (a question-and-answer portal) and an extensive documentation.
An Unreal Engine project has its own folder structure with all content, configurations,
plugins and sources needed. These projects are developed inside the Unreal Editor and
can then be packaged into stand-alone games.
Developers are aided by the UnrealBuildTool, which provides various macros
replacing boilerplate code. Additionally, all engine objects are subject to garbage
collection.
The development for this thesis started in version 4.13.
4.1.1. Actor
An Actor is the most basic element that can be placed or spawned in a level. It can be
extended with custom functionality and components. These can each feature different
functionality, e.g. light, physics, movement or audio.
During its life cycle it provides different functions accessible via Blueprints, a node based
visual scripting system, or C++ code. For the purposes of this thesis the most important
ones are:
• BeginPlay()
This is executed when the game begins to play, but after the actor is spawned.
• Tick()
This function is executed every frame for every actor.
• BeginDestroy()
This is called when the garbage collection is executed for this actor.
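The life cycle can be mimicked in plain C++ to show the order in which the engine invokes these hooks. This is a mock for illustration only; the real AActor and its methods are defined in the engine headers.

```cpp
#include <string>
#include <vector>

// A plain C++ mock of the Unreal Actor life cycle, recording the call
// order of the three hooks. Names mirror the engine API for clarity.
class MockActor {
public:
    std::vector<std::string> log;

    void BeginPlay()    { log.push_back("BeginPlay"); }      // once, after spawning
    void Tick(float dt) { (void)dt; log.push_back("Tick"); } // every frame
    void BeginDestroy() { log.push_back("BeginDestroy"); }   // on garbage collection
};

// A hypothetical game loop driving the hooks in engine order.
void runFrames(MockActor& actor, int frames) {
    actor.BeginPlay();
    for (int i = 0; i < frames; ++i) actor.Tick(1.0f / 60.0f);
    actor.BeginDestroy();
}
```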
4.1.2. Graphics Programming
The Unreal Engine features the rendering hardware interface (RHI), an abstraction layer
between the system's graphics API and the engine. Together with a cross compiler from
HLSL1 to other shading languages, it enables multi-platform development with mostly low
effort. The architecture of the RHI is mainly based on the DirectX syntax.
There are different feature levels which correspond to different DirectX 11 Shader Models
or OpenGL versions.
In order to keep logic and drawing isolated, the main game thread is separate from the
render thread. The render thread can be accessed by the game thread by enqueuing
rendering commands through macros.
4.1.3. Plugins
The Unreal Engine is highly extensible through plugins, either adding or modifying engine
functionality. These plugins are separate from the engine code and as such separately
compiled. They can consist of multiple modules, e.g. game specific or editor specific
functionality.
Plugins can be specific to a project or made accessible engine-wide, depending on the
install directory.
1DirectX shader language
4.1.4. Media Framework
In order to integrate light field video, media support is necessary. This is available in
Unreal Engines Media Framework, which has recently been completely reworked (4.13)
and as such misses extensive documentation. As the code is completely free to view and
commented, development is still possible. Advanced problems that require more insight
may be described on AnswerHub and are often answered by the developer in a few days.
The available codecs are based on the ones installed on the platform, with basic H.264
profiles being supported best.
Additionally a plugin by the Media Framework developer, providing the VLC2 decoder
functionalities to the engine, is available.
4.1.5. Blueprints
Because the Unreal Engine also aims to appeal to non-programmers and should be
accessible even to the artistic game development staff, it features a visual scripting
system called Blueprints. It encloses most of the engine functionalities in a node based
environment, allowing for complete gameplay scripting from within the Unreal Editor. It can
define classes and objects for the engine, using object oriented patterns, cf. figure 4.1.
Figure 4.1.: Blueprint of the SetupSplitMaterialInstances function
2open-source media player by the VideoLAN project
4.2. Architecture
Figure 4.2.: Architecture of the implementation
The foundation of the Unreal Engine based shader structure was supplied by the project
UE4ShaderPluginDemo by Fredrik “Temaran” Lindh [23]. It provided the basic knowledge
needed to implement shaders in the Unreal Engine 4.11. In the scope of this thesis it was
ported to version 4.13.
In order to keep everything structured and portable, the engine code should not be modified,
keeping the code on a project basis.
The Fraunhofer IIS DIBR shader code is provided in separate shader files. These shaders
are wrapped by a plugin called FViewrenderer, that implements the complete light field
rendering inside of an independent plugin.
Connection to the game itself is provided by the class ALightfieldActor, the representa-
tion of the canvas object in the game. It collects all needed parameters and passes them
to the FViewrenderer plugin, which returns the finished image (cf. figure 4.2).
For video functionality a workaround, thoroughly discussed in 4.5, is provided in the
ULightfieldMediaTexture class.
In addition to the general classes, there is also a LightfieldEditor subdirectory where
Editor specific classes are defined, such as the ULightfieldMediaTextureFactory which
provides access to the ULightfieldMediaTexture in the Editor user interface.
All of the used Blueprint functions, as well as the publicly available members of the most
important classes can be found in the appendices A and B.
4.3. FViewrenderer plugin
The FViewrenderer plugin wraps the Fraunhofer IIS DIBR shaders. Its aim is to decouple
the rendering process from the game logic, by exposing a public API. Through this the
rendering can be started and the results accessed. As the DIBR shaders provide optional
filtering, an enum of filtering types is supplied.
In theory the constructor accepts all invariant parameters for a light field data set, while
the public ExecuteComputeShader() method accepts the varying parameters.
However, some of the typically constant parameters are supplied in the execution method,
for the sake of dynamically changing and testing the parametrization. Furthermore, the resulting
rendered RGBA3 images or disparity maps can be accessed through the corresponding
getter methods.
In the constructor all of the needed textures are created. As described in chapter 3, the
most important parameters of ExecuteComputeShader() are the eye positions \(\vec{c}_{LF_L}\), \(\vec{c}_{LF_R}\)
and the desired position \(\vec{P}_0\). Other parameters include the needed textures, as well as
various parameters for the calculation of the virtual camera positions and the nearest
cameras. At first, the method checks if any textures have changed and recreates them if
needed. Next, the virtual camera positions are calculated from the passed positions and the
array setup defined in the constructor. Then the specified number of cameras neighbouring
these positions is determined by sorting the cameras by their distances.
The next step is the calculation of \(d_w\) and \(d_{conv}\) as described in equations 3.11 and 3.12.
Up until now, everything was executed on the game thread. In order to execute
rendering code, a transition to the render thread must be performed. The private function
ExecuteComputeShaderInternal() is therefore enqueued onto the render thread's task list
with a macro.
In this method the individual shader implementations are invoked in the order explained
in 2.1.3.
As the FViewrenderer plugin uses compute shaders, the latest feature level SM5 has to
be supported by the hardware, which roughly corresponds to D3D11 Shader Model 5 or
OpenGL 4.3.
4.3.1. Compute shader implementation: FForwardWarpDeclaration
Because the implementations of the compute shaders are very similar, they will be explained
using the example of the forward warp shader.
The shaders provided by Fraunhofer IIS have different input and output surfaces, as
well as parameters according to the step represented. The class itself has to extend the
FGlobalShader class, which supplies a constructor parameter that can be used to bind
these surfaces to the shader. On the other hand the parameters are supplied through a
uniform buffer struct.
Checks for outdated surfaces are possible through the Serialize() function. To set these
input and output surfaces to their respective textures the method SetSurfaces() is
provided.
Similarly, methods that bind and unbind the parameter uniform buffer structs are imple-
mented.
3additional alpha channel which describes translucency
In order to be accepted by the engine as a shader, it also needs to supply functions pointing
to the specific shader files, as well as their main function’s name.
These steps are nearly analogous for the other compute shader types.
4.3.2. Quality improvements
At first, the disparity maps loaded into the engine deviated from the source material,
which led to artefacts in the end results.
This was caused by multiple factors. As the source disparity maps are raw data, no sRGB4
conversion should be applied to them, as it heavily alters the disparity values.
Another quality deficiency was caused by the texture filtering method of the GPU, which is
set to bi-linear by default. Because this creates non-existent disparities through interpolation,
it has to be disabled. The last flaw was caused by the lossy compression applied to the texture,
which created a streaking effect in the disparity maps.
These settings can be set for each texture individually in the Texture Editor. Here sRGB
can be disabled, the filtering can be set to Nearest and finally the compression can
be set to an uncompressed format like VectorDisplacementMap, which does not affect the
texture quality.
Figure 4.3.: Correct disparity map
The effects of these settings were tested on the Cleopatra data set after the forward warping
step. As can be seen in figures 4.4, 4.5 and 4.6 the sRGB conversion has the biggest impact
on the quality, followed by the compression and at last the filtering. Interestingly the
compression artefacts can be clearly distinguished.
Furthermore, a mean squared error (MSE) comparison verified these results, cf. table
4.1.
Method sRGB compression filtering
MSE 12520 562 149
Table 4.1.: MSEs for texture settings
4colour space with increased luminance, used for accurate representation on computer monitors
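The MSE underlying table 4.1 is the mean of the squared per-pixel differences between the faulty and the reference disparity map. A sketch of the metric (not the exact evaluation script used for the table):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mean squared error between two images of equal size,
// given as flat per-pixel value arrays.
double meanSquaredError(const std::vector<double>& a,
                        const std::vector<double>& b) {
    if (a.size() != b.size() || a.empty())
        throw std::invalid_argument("images must have the same non-zero size");
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum / static_cast<double>(a.size());
}
```

Because the differences are squared, a few large deviations (such as the sRGB alteration of raw disparity values) dominate the score, which matches the ranking in table 4.1.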
(a) Faulty disparity map (b) Difference image
Figure 4.4.: Comparison of sRGB effects on disparity maps
(a) Faulty disparity map (b) Difference image
Figure 4.5.: Comparison of compression effects on disparity maps
(a) Faulty disparity map (b) Difference image
Figure 4.6.: Comparison of bilinear filtering effects on disparity maps
4.4. ALightfieldActor
The ALightfieldActor defines the physical representation of a light field inside the game
engine. Its foundations are laid in a C++ class and extended via a Blueprint. In general
it connects the canvas object and its intrinsic parameters to the FViewrenderer plugin
and displays its results on the canvas. All needed parameters are either specifiable in the
Unreal Editor for each ALightfieldActor or derivable from the environment.
The canvas model was created in Blender5 and features special canvas texture coordinates,
which map the texture fully on the front and back sides.
4.4.1. C++
ALightfieldActor is derived from AStaticMeshActor, a base class for actors that always
have a mesh attached. In this case it is the aforementioned canvas mesh.
Every class that should be available in game needs the UCLASS() macro before the class
declaration, which helps generate Unreal’s own representation of a class. Here the specifier
Blueprintable is passed, as this allows the class to be extended by a Blueprint.
Similarly all the methods or member variables have macros defining the visibility and
accessibility in the Unreal Editor environment. C++ specific members can be declared
without macros.
Most of the variables are initialized in the constructor. However, some of them depend on
the game running; these are initialized later in the BeginPlay() method.
As the viewrenderer depends on the resolution of the textures, which is not always
available once BeginPlay() has executed, it is initialized in the
InitViewrenderer() function. This function may also be called from a Blueprint and
as such is marked as BlueprintCallable. Here the FViewrenderer plugin and a pixel
shader plugin [23], used to convert the compute shader output, are instanced and assigned
to pointers.
In the Tick() function the textures that should be passed to the viewrenderer are first filled
depending on the selected SourceType. Then the parameters are converted as described
in chapter 3 and finally passed to the FViewrenderer plugin.
At last, the pointers to the plugin instances need to be freed. The BeginDestroy() method
does this at the end of the actor's life cycle.
5an open-source 3D computer graphics software
4.4.2. Blueprint
The Blueprint class extends most of the functionalities based in the C++ code. In the
Blueprint most of the initialization of the textures is done. Depending on the source type,
either the video or the still images are loaded.
Then the resulting render targets are created and assigned to the variables created in
C++.
In the end an instance of the light field material is created, which uses the render targets
assigned before.
4.4.3. LightfieldMaterial
Materials are applied to meshes and handle their appearance in the engine. They provide
a variety of input data, which can be used to create physically based shading. Additionally,
different shading models, like Default Lit, Unlit or Subsurface, can be determined. Similar
to other features in the engine, a custom editor interface is provided to create these
materials in a node based fashion.
The LightfieldMaterial is in charge of displaying the correct texture on the light field
actor. It uses the texture coordinates supplied by the canvas mesh to map the result of
the viewrendering to the mesh.
It also enables the stereo support for VR by displaying different textures for each eye.
This is possible through the ScreenPosition coordinates, these span the screen in x and
y direction in a range from 0.0 to 1.0. By applying one texture from 0.0 to 0.5 and one
from 0.5 to 1.0 horizontally, the eyes can be separated.
In addition it handles the alpha support for the light field depending on the alpha channels
supplied by the FViewrenderer plugin.
4.5. Embedding of video data
The next step is to provide the viewrenderer with video data instead of static images. At
first, the video data has to be created in a sensible way. Due to synchronisation problems
and the high inefficiency of accessing multiple video files at once, a different approach had
to be taken. Because the Unreal Engine does not supply easy access to multi-view data,
as it might be supported through a multi-view codec described in section 2.4, solutions
using existing functionalities were necessary.
Two approaches meeting these requirements were implemented: on one hand, combining all
videos into one, possibly higher resolution, video with all the needed images and
disparity maps; on the other hand, placing the views behind each other on the timeline.
Another peculiarity is that the media file has to be opened explicitly. Because the FViewrenderer
plugin depends on the media's resolution, this has to happen beforehand. A solution is
provided through a delegate that registers to the OnMediaOpened event, which is called as
soon as the media file has finished opening. Then the viewrenderer can be initialized.
The next data set, called reporter, is a 4 × 4 shot recorded with a camera array composed
of modified GoPro cameras. It then went through the light field processing chain, as well
as a keying process, resulting in a data set with transparencies. The following sections are
explained on the basis of this reporter data set.
For video conversion and coding the FFmpeg software was used. To ensure support by the
Windows 7 codecs, while maintaining portability, the H.264 codec was used with the high
profile and level 4.0.
With this codec Windows 7 only supports resolutions up to FHD, while the Windows
specific WMV6 format supports up to UHD. However, a bug in version 4.14, which makes
textures appear too bright, renders this format unusable for now.
4.5.1. High resolution
The first approach works by combining all needed images and disparity maps into
one video. Because the incoming images are now made up of multiple views, as shown in
figure 4.7, they have to be split up again.
Figure 4.7.: Incoming video data (increased gamma)
This is solved via an Unreal Engine specific approach: a splitting material that renders
only tiles of the input texture onto a render target. This material has multiple
parameters through which it renders only parts of the input texture.
6Windows Media Video
The necessary initializations are provided in Blueprints, starting at the SetupForHighResVideo
function. Here the different material instances needed for every view are instanced; then
for each instance a render target is created.
In the SetupSplitMaterialInstances function the number of cameras, currently only
horizontal, is used to create and parametrize the SplitMaterial instances. In the end
these are added to an array which was previously declared in C++.
Next, for each material instance a render target is created with the special settings
described in section 4.3.2. Similar to the instances, these are also saved in an array.
In the C++ Tick() function these arrays are then used to draw to the render targets
with the DrawMaterialToRendertarget() function.
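The splitting itself boils down to computing one tile rectangle per view inside the combined frame. A standalone sketch of that arithmetic follows (the engine-side SplitMaterial parametrization may differ):

```cpp
#include <cstddef>

// Pixel rectangle of one view inside the combined frame.
struct Tile { std::size_t x, y, w, h; };

// Rectangle of view (col, row) inside a frame that packs cols x rows
// equally sized views, e.g. the 4 x 4 reporter data set.
Tile viewTile(std::size_t frameW, std::size_t frameH,
              std::size_t cols, std::size_t rows,
              std::size_t col, std::size_t row) {
    const std::size_t w = frameW / cols;
    const std::size_t h = frameH / rows;
    return {col * w, row * h, w, h};
}
```

For an FHD frame packing a 4 × 4 grid, each view ends up at 480 × 270 pixels, which illustrates why the FHD decoder limit caps the usable number of views.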
A problem arose where the last texture passed to the viewrenderer was black. Assuming
this is a result of the draw process not finishing before the targets are assigned, GPU
synchronisation is required. This is done with the FlushRenderingCommands() function.
However, the last texture was still black, so a workaround is now implemented, which draws
a material to a dummy texture after the last render target.
The targets are then used as the textures passed to the viewrenderer.
Limitations
This approach currently has some limitations. The first one is the overhead of the Unreal
Engine specific SplitMaterial method. Instead, the textures might be split using RHI
functions.
Another limitation is the supported resolution. FFmpeg uses 4:4:4 predictive coding for
resolutions higher than FHD, which is not supported by the standard Windows 7 H.264
decoder implementation. Because of this, the resolution is limited to FHD, which limits the
number of camera views that can be used while keeping the quality acceptable.
A possible fix is provided by the previously mentioned VLC plugin; with this, the resolution
can be raised to UHD, resulting in higher individual view resolutions.
In general, both methods share a common limitation: the data loss when the disparity
is encoded. Because the original data is made up of floating point values, the resulting
disparities cannot be fully correct. This is further explored in section 5.2.
4.5.2. High frame rate
Instead of arranging the images next to each other, the other approach is to place them
behind each other. The resulting video is then read at a higher frame rate than the
original data, depending on the amount of views used.
The decoding of the media happens on its own thread, which supplies the textures according
to the media frame rate. Because of this, the individual textures cannot easily be accessed
in the game thread Tick() method. The solution was to extend the existing UMediaTexture
with a specialized ULightfieldMediaTexture.
Instead of providing one public resource, it provides an array of the last few decoded
textures. These can then be accessed by the game thread to pass the textures to the
viewrenderer.
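The idea behind that array can be sketched as a small ring buffer. This plain-C++ stand-in uses ints in place of texture resources and omits the cross-thread synchronisation the real ULightfieldMediaTexture needs; it only illustrates the access pattern.

```cpp
#include <array>
#include <cstddef>

// The decoder thread pushes each finished frame; the game thread reads
// the last few entries instead of a single shared resource.
template <std::size_t N>
class FrameRing {
public:
    void push(int frame) {
        buffer[head] = frame;
        head = (head + 1) % N;
        if (count < N) ++count;
    }

    // ago = 0 returns the most recently pushed frame, ago = 1 the one
    // before it, and so on.
    int last(std::size_t ago) const {
        return buffer[(head + N - 1 - ago) % N];
    }

    std::size_t size() const { return count; }

private:
    std::array<int, N> buffer{};
    std::size_t head = 0;
    std::size_t count = 0;
};
```

Once the buffer is full, each push silently overwrites the oldest entry, so the game thread always sees the N most recent decoded frames.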
Because a UMediaTexture can be created in the Editor environment, and the functionality
should be equivalent in the ULightfieldMediaTexture, a factory class has to be
created. This factory class is only available in the editor and provides access to the
ULightfieldMediaTexture as an asset.
Limitations
The main limitation is the decoding speed. This (central processing unit (CPU) or GPU)
bottleneck limits the number of concurrently usable views. Another is the relatively
high amount of “workaround” code, which compromises maintainability. Occasionally the
high frame rate video is offset by one frame, leading to errors in the viewrendering.
4.5.3. Quality improvements
At first, the standard colour coded variants of the disparity maps, described in section
2.1.2, were used. The resulting renderings had significant artefacts.
The H.264 codec converts the RGB values to the YCbCr colour model, assuming the video
consists of standard image content rather than data. When this is decoded, colour values
appear in channels where none existed before.
Instead of colour coding the disparity maps, they are spread to fully span the available
8-bit colour range in all three channels, as shown in figure 4.8. These values are then
passed to the shader, where the disparities are converted back to their true values. The
results are significantly better, as can be seen in the forward warped disparity maps in
figure 4.9.
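The full-range coding can be sketched as a pair of mapping functions. This is an illustrative reconstruction, not the thesis code; the per-sequence disparity minimum and maximum are assumed to be known constants shared between encoder and shader.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Spread a floating point disparity linearly over the 8-bit range [0, 255]
// before video encoding.
uint8_t encodeDisparity(float d, float dispMin, float dispMax) {
    float t = (d - dispMin) / (dispMax - dispMin);  // normalise to [0, 1]
    t = std::min(1.0f, std::max(0.0f, t));
    return static_cast<uint8_t>(std::lround(t * 255.0f));
}

// Inverse mapping, as the shader would apply it after decoding.
float decodeDisparity(uint8_t v, float dispMin, float dispMax) {
    return dispMin + (v / 255.0f) * (dispMax - dispMin);
}
```

The round trip quantises the disparities to 256 levels, so the worst-case error is half a quantisation step, (dispMax − dispMin) / 510, which is the unavoidable data loss mentioned for both video methods.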
4.6. Preparation for CES
In order to make the CES demonstrator more interactive, some smaller Blueprint classes
were created: interactable objects such as buttons, switches or blocks, which activate
objects implementing the Activatable interface, such as lights or moving blocks. Interaction
is provided mostly via the MotionControllerPawn from the VR example map. Additionally,
a torch was added to one of the VR controllers, drastically increasing the immersion in
the Cleopatra scene.
(a) Colour coded disparity map (b) Full range disparity map
Figure 4.8.: Comparison of disparity map representations (increased gamma)
(a) Result with colour coded disparity map (b) Result with luminosity only disparity map
Figure 4.9.: Comparison of forward warped disparity maps
Also, an Egyptian tomb themed scene was created (cf. figure 4.11) in order to highlight
the captured scenes. In addition to the Cleopatra data set, a complete room filled with
Egyptian artefacts, called the burial chamber data set, was created, cf. figure 4.10. This
new data set highlighted the need for a positional correction even more, as the captured
walls needed to line up with the CG walls to create an immersive experience.
Figure 4.10.: Burial chamber data set
Figure 4.11.: Egyptian tomb setting with torch (increased gamma)
5. Evaluation
The aim of this chapter is to evaluate how well the theory explained in chapter 3 succeeds
in integrating the light field objects into the CG environment. First, a 3D cube is placed
at specific points in front of or behind the light field canvas, and the behaviour of the
light field relative to the object is examined. Then the impact of compression on the
resulting disparity maps is evaluated, followed by the performance. Finally, subjective
user feedback is examined.
5.1. Comparison with a 3D object
In order to test the light field rendering, a cube is placed at different points in front of,
behind and inside the 3D canvas. The cube is given a material that is always visible to the
camera. The camera is then moved in different directions and the relative movement of
the light field is compared to that of the cube. Everything works as intended when the light
field behaves in the same way as the cube. For this a 5 × 3 segment of the Cleopatra data set
is used.
The cube is placed on Cleopatra’s right eye in the same plane as the canvas. At first no
additional parameters are passed to the FViewrenderer plugin; here the object behaves
as if it were in front of the cube, cf. figure 5.1.
The parameters can now be modified so that the cube matches the eye, cf. figure 5.2.
Figure 5.1.: Camera movement from top left to bottom right without modification (increased gamma)
Figure 5.2.: Camera movement from top left to bottom right with modification (increased gamma)
5.2. Impact of video compression
Because of its better stability, the high frame rate variant was chosen for this section.
In order to accurately evaluate the effects of video compression on the disparity maps, a
single uncompressed file was downscaled to 480 × 540 with FFmpeg and upscaled back
to 900 × 900 by the Unreal Engine. The videos were encoded with the high 4.0 profile
using the Lavc encoder and the yuv420p pixel format. The variable is the constant rate
factor (CRF), which defines the quality level from 0 (best) to 51 (worst). The MSE is
used as comparison metric. The compression does not have a large impact at lower
CRF values; however, values higher than 20 should have a significant visual impact, as can
be observed from the values listed in table 5.1.
CRF                  1       10      20      30      40
Bitrate in kB/s      39717   15262   4847    1066    218
MSE                  0.56    1.53    2.80    12.58   43.02
Table 5.1.: Comparison of compression impact on disparity maps
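The MSE metric used above can be stated as a minimal sketch: the mean of the squared per-pixel differences between the original and the decoded disparity map. Pixel values are plain doubles here; the concrete value range depends on the disparity representation.

```cpp
#include <cstddef>
#include <vector>

// Mean squared error between two images of equal size, stored as flat
// arrays of pixel values.
double meanSquaredError(const std::vector<double>& a,
                        const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;  // squared per-pixel difference
    }
    return a.empty() ? 0.0 : sum / static_cast<double>(a.size());
}
```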
5.3. Performance
The demonstrator was mostly developed on the same PC. It features an NVIDIA GeForce
1060 GPU with 6 GB of memory, which is the most important component for the performance.
Its CPU is an Intel Core i7-6700 at 3.40 GHz. The VR aspect was covered by an HTC
Vive.
Performance was tested on the Cleopatra, burial chamber and reporter data sets.
Furthermore, two dynamic light sources had a high impact on the performance, but
immensely improved the immersion. For the video the high resolution variant was chosen,
with a frame rate of 25 frames per second and FHD resolution. The goal was to achieve a
stable frame rate of 90 frames per second with the available settings. Results are listed
in table 5.2.
Dataset                 Cleopatra     Cleopatra    burial chamber   reporter
Resolution              1920 × 1080   1100 × 618   750 × 422        480 × 540
Amount of views used    1             5            12               4
Filtering method        none          median       none             median
Table 5.2.: Settings for stable frame rate with two dynamic light sources
As can be observed, a compromise between resolution and the number of views used has to
be made. Higher resolutions produce crisper images, but with more views the resulting
rendering has fewer occlusions. As a consequence, the specific settings should depend on
the input data and its requirements.
5.4. User feedback
During the course of this thesis the demo was tested by a multitude of people. Without the
modifications described in chapter 3, users reported that the Cleopatra object felt out of
place and moved in strange ways. The burial chamber in particular produced peculiar effects,
due to the added walls that did not line up with the CG walls.
The modification vastly improved the integration: users reported that Cleopatra definitely
felt like it was there in the CG world. This was amplified by the addition of the torch,
which enabled Cleopatra to cast shadows. Similarly, the burial chamber felt a lot more
natural and integrated into the scene.
However, the stereoscopic effect is static when moving towards or away from the canvas,
leading to a flatter impression when the viewer is close and, conversely, a deeper one when
the viewer is further away. This may be a result of the missing third dimension in the light
field rendering.
6. Conclusion
In this thesis the creation of a demonstrator that enables the integration of light field
data in the 3D environment of the Unreal Engine 4 was described.
First, the necessary foundations were examined, consisting of light field, 3D and digital
video basics. Then these were used to devise a projection model for the seamless integration
of light field data. With this model, a position for the captured object relative to the
light field canvas can be chosen.
This was followed by an explanation of the Unreal Engine and the implementation of
the demonstrator itself. Using the available tools of the engine and the Fraunhofer DIBR
shaders, the demonstrator was realised. By applying the projection model, the seamless
connection of the coordinate systems was ensured. The integration of light field video
data in particular was explored by devising two distinct methods. Together with new
content this resulted in an immersive demonstrator, presented at the CES.
Finally, the demonstrator was evaluated piece by piece. The projection model’s functionality
was demonstrated by means of a 3D object as well as user feedback. Furthermore, the effects
of video compression on the light field video data and the demonstrator’s performance were
measured to explore the demonstrator’s boundaries.
6.1. Outlook
The demonstrator devised in this thesis has a lot of room to grow. While many small
improvements can be made to the code base, the main factors lie in the algorithms. As
the formulas derived in chapter 3 are not definitive, their enhancement may produce an
even more immersive experience. Additionally, the light field rendering may be extended
to feature full 3D rendering.
The input quality and efficiency of video data may be increased with the newer H.265 codec
and multi-view coding, making more data available to the renderer. This may improve
rendering quality and freedom of movement. Because of its relative independence from the
shaders, the demonstrator may also be used to display different light field rendering
techniques. Different array layouts may further broaden the usage scenarios, e.g. by
enabling full 360° walk-arounds of dynamic light field objects.
Interactive light fields, e.g. a real barkeeper reacting to prompts, may enrich gaming or
VR applications. A different approach might be artificial relighting inside the light field.
By using the torch to correctly relight objects inside the light field, immersion could be
improved immensely.
As can be seen, future possibilities are plentiful, especially with ever-evolving light
field and VR technologies.
Bibliography
[1] Edward H Adelson and James R Bergen. The plenoptic function and the elements of
early vision. 1991.
[2] Ronald T Azuma. A survey of augmented reality. Presence: Teleoperators and virtual
environments, 6(4):355–385, 1997.
[3] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image
inpainting. In Proceedings of the 27th annual conference on Computer graphics and
interactive techniques, pages 417–424. ACM Press/Addison-Wesley Publishing Co.,
2000.
[4] Steven Boyer. A virtual failure: Evaluating the success of Nintendo’s Virtual Boy.
The Velvet Light Trap, (64):23–33, 2009.
[5] Jie Chen, Junhui Hou, and Lap-Pui Chau. Light field compression with disparity
guided sparse coding based on structural key views. arXiv preprint arXiv:1610.03684,
2016.
[6] Łukasz Dąbała, Matthias Ziegler, Piotr Didyk, Frederik Zilly, Joachim Keinert, Karol
Myszkowski, H-P Seidel, Przemysław Rokita, and Tobias Ritschel. Efficient multi-
image correspondences for on-line light field video processing. In Computer Graphics
Forum, volume 35, pages 401–410. Wiley Online Library, 2016.
[7] Christoph Fehn. A 3D-TV approach using depth-image-based rendering (DIBR). In
Proc. of VIIP, volume 3, 2003.
[8] Christoph Fehn. Depth-image-based rendering (DIBR), compression, and transmission
for a new approach on 3D-TV. In Electronic Imaging 2004, pages 93–104. International
Society for Optics and Photonics, 2004.
[9] Siegfried Foessel, Frederik Zilly, Michael Schöberl, Peter Schäfer, Matthias Ziegler, and
Joachim Keinert. Light-field acquisition and processing system for film productions.
In Annual Technical Conference & Exhibition, SMPTE 2013, pages 1–8. SMPTE,
2013.
[10] Epic Games. Unreal Engine. https://www.unrealengine.com/
what-is-unreal-engine-4/. [Online; accessed 20.01.2017].
[11] Epic Games. Unreal Engine Documentation. https://docs.unrealengine.com/.
[Online; accessed 20.01.2017].
[12] M Heilig. Sensorama simulator. United States Patent and Trade Office, Virginia,
USA, US-3,050,870, 1962.
[13] Google Inc. Google Jump. https://vr.google.com/jump/. [Online; accessed
20.01.2017].
[14] Khronos Group Inc. OpenGL Rendering Pipeline. https://www.khronos.org/
opengl/wiki/Rendering_Pipeline_Overview. [Online; accessed 20.01.2017].
[15] Lytro Inc. Lytro Cinema. https://www.lytro.com/cinema/. [Online; accessed
20.01.2017].
[16] Lytro Inc. Lytro Illum. https://www.lytro.com/illum/. [Online; accessed
11.01.2017].
[17] Lytro Inc. Lytro Immerge. https://www.lytro.com/immerge/. [Online; accessed
20.01.2017].
[18] Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe,
Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison,
et al. Kinectfusion: real-time 3D reconstruction and interaction using a moving depth
camera. In Proceedings of the 24th annual ACM symposium on User interface software
and technology, pages 559–568. ACM, 2011.
[19] Peter Kauff, Nicole Atzpadin, Christoph Fehn, Marcus Müller, Oliver Schreer, Aljoscha
Smolic, and Ralf Tanger. Depth map creation and image-based rendering for advanced
3DTV services providing interoperability and scalability. Signal Processing: Image
Communication, 22(2):217–234, 2007.
[20] Péter Tamás Kovács, Zsolt Nagy, Attila Barsi, Vamsi Kiran Adhikarla, and Robert
Bregovic. Overview of the applicability of H.264/MVC for real-time light-field
applications. In 3DTV-Conference: The True Vision-Capture, Transmission and
Display of 3D Video (3DTV-CON), 2014, pages 1–4. IEEE, 2014.
[21] Marc Levoy. Light fields and computational imaging. Computer, 39(8):46–55, 2006.
[22] Marc Levoy and Pat Hanrahan. Light field rendering. In Proceedings of the 23rd
annual conference on Computer graphics and interactive techniques, pages 31–42.
ACM, 1996.
[23] Fredrik Lindh. UE4ShaderPluginDemo. https://github.com/Temaran/
UE4ShaderPluginDemo/. [Online; accessed 11.01.2017].
[24] Karsten Müller, Heiko Schwarz, Detlev Marpe, Christian Bartnik, Sebastian Bosse,
Heribert Brust, Tobias Hinz, Haricharan Lakshman, Philipp Merkle, Franz Hunn
Rhee, et al. 3D high-efficiency video coding for multi-view video and depth data.
IEEE Transactions on Image Processing, 22(9):3366–3378, 2013.
[25] Ren Ng, Marc Levoy, Mathieu Brédif, Gene Duval, Mark Horowitz, and Pat Hanrahan.
Light field photography with a hand-held plenoptic camera. Computer Science
Technical Report CSTR, 2(11):1–11, 2005.
[26] Christian Riechert, Frederik Zilly, Marcus Müller, and Peter Kauff. Advanced inter-
polation filters for depth image based rendering. In 3DTV-Conference: The True
Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2012, pages
1–4. IEEE, 2012.
[27] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-
frame stereo correspondence algorithms. International journal of computer vision,
47(1-3):7–42, 2002.
[28] Daniel Scharstein, Richard Szeliski, and Heiko Hirschmüller. Middlebury Stereo
Evaluation. http://vision.middlebury.edu/stereo/eval3/. [Online, accessed
20.01.2017].
[29] Peter Shirley, Michael Ashikhmin, and Steve Marschner. Fundamentals of Computer
Graphics. CRC Press, 2009.
[30] Ivan E Sutherland. A head-mounted three dimensional display. In Proceedings of the
December 9-11, 1968, fall joint computer conference, part I, pages 757–764. ACM,
1968.
[31] Bennett Wilburn, Neel Joshi, Vaibhav Vaish, Eino-Ville Talvala, Emilio Antunez,
Adam Barth, Andrew Adams, Mark Horowitz, and Marc Levoy. High performance
imaging using large camera arrays. In ACM Transactions on Graphics (TOG),
volume 24, pages 765–776. ACM, 2005.
[32] Frederik Zilly, Michael Schöberl, Peter Schäfer, Matthias Ziegler, Joachim Keinert, and
Siegfried Foessel. Lightfield media production system using sparse angular sampling.
In ACM SIGGRAPH 2013 Posters, page 102. ACM, 2013.
[33] Frederik Zilly, Matthias Ziegler, Joachim Keinert, Michael Schoberl, and Siegfried
Foessel. Computational imaging for stop-motion animated video productions. SMPTE
Motion Imaging Journal, 125(1):42–47, 2016.
[34] C Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and
Richard Szeliski. High-quality video view interpolation using a layered representation.
In ACM Transactions on Graphics (TOG), volume 23, pages 600–608. ACM, 2004.
A. Blueprints
Figure A.1.: Node layout of the LightfieldMaterial
Figure A.2.: Cut out of the ALightfieldActor blueprint BeginPlay node tree
Figure A.3.: Blueprint of the SetupForHighResVideo function
Figure A.4.: Blueprint of the SetupLightfieldMaterial function
Figure A.5.: Blueprint of the SetResultRenderTarget function
Figure A.7.: Blueprint of the OpenMediaSource function
Figure A.8.: Blueprint of the SetupSplitRenderTargets function
Figure A.9.: Loop body of the SetupSplitMaterialInstances function
B. Code Listings of public members
Listing B.1: Public methods of the ALightfieldActor class

// Sets default values for this actor's properties
ALightfieldActor();

// Called when the game starts or when spawned
virtual void BeginPlay() override;
virtual void BeginDestroy() override;

// Called every frame
virtual void Tick(float DeltaSeconds) override;

UFUNCTION(BlueprintCallable, Category = "Lightfield", meta = (BlueprintProtected))
void InitViewrenderer();
Listing B.2: Public members of the ULightfieldMediaTexture class

bool CheckIfLoadedAndInit(const uint32 NumTex);

TArray<class FTextureResource*> Resources;

virtual void UpdateTextureSinkBuffer(const uint8* Data, uint32 Pitch = 0) override;
virtual void ShutdownTextureSink() override;
Listing B.3: Public methods of the FViewrenderer plugin

FViewrenderer(const uint8 NX, const uint8 NY, const float N,
              const int32 SizeX, const int32 SizeY,
              const bool bIsLinearColor, const bool bIsStereo, const bool bIsVideoData,
              const ERHIFeatureLevel::Type ShaderFeatureLevel);

~FViewrenderer();

void ExecuteComputeShader(const TArray<FTexture2DRHIRef>& InputRgbaTextures,
                          const TArray<FTexture2DRHIRef>& InputDispTextures,
                          const uint8 Amount, const float Multiplier,
                          const EFilterType FilterType, const bool bInpaint,
                          const FVector& CameraPositionLeft,
                          const FVector& CameraPositionRight,
                          const FVector2D& DesiredPosition,
                          const float DesiredZ_X, const float DesiredZ_Y,
                          const float DesiredDisparity,
                          const FVector& EyeDistance, const float ConvergenceDistance);

void ExecuteComputeShaderInternal(); // Only execute from render thread

FORCEINLINE FTexture2DRHIRef GetRgbaTexture(const uint8 Eye) const
{
    return ResultRgbaTextures[Eye];
}

FORCEINLINE FTexture2DRHIRef GetDispTexture(const uint8 Eye) const
{
    return ResultDispTextures[Eye];
}
Listing B.4: Public methods of the FForwardWarpDeclaration compute shader

FForwardWarpDeclaration() {}

explicit FForwardWarpDeclaration(const ShaderMetaType::CompiledShaderInitializerType& Initializer);

static bool ShouldCache(EShaderPlatform Platform)
{
    return IsFeatureLevelSupported(Platform, ERHIFeatureLevel::SM5);
}

static void ModifyCompilationEnvironment(EShaderPlatform Platform,
    FShaderCompilerEnvironment& OutEnvironment);

virtual bool Serialize(FArchive& Ar) override
{
    bool bShaderHasOutdatedParams = FGlobalShader::Serialize(Ar);

    Ar << InputDispSurface << OutputDispSurfaceLeft << OutputDispSurfaceRight;

    return bShaderHasOutdatedParams;
}

// This function is required to let us bind our runtime surface to the shader using a UAV
void SetSurfaces(FRHICommandList& RHICmdList,
                 FShaderResourceViewRHIRef InputDispSurfaceSRV,
                 FUnorderedAccessViewRHIRef OutputDispSurfaceUAV);

void SetSurfacesStereo(FRHICommandList& RHICmdList,
                       FShaderResourceViewRHIRef InputDispSurfaceSRV,
                       FUnorderedAccessViewRHIRef OutputDispSurfaceUAVLeft,
                       FUnorderedAccessViewRHIRef OutputDispSurfaceUAVRight);

// This function is required to bind our constant/uniform buffers to the shader.
void SetUniformBuffers(FRHICommandList& RHICmdList,
                       const FForwardWarpVariableParameters& VariableParameters);

// This is used to clean up the buffer binds after each invocation to let them be changed
// and used elsewhere if needed.
void UnbindBuffers(FRHICommandList& RHICmdList);

static const TCHAR* GetSourceFilename() {
    return TEXT("ForwardWarp");
}

static const TCHAR* GetFunctionName() {
    return TEXT("ForwardWarpComputeShader");
}
C. List of Figures
1.1. Overview of this thesis . . . 8

2.1. An eye gathering light rays . . . 9
2.2. Line gantry at Fraunhofer IIS . . . 10
2.3. The Lytro Illum plenoptic camera [16] . . . 10
2.4. Black magic 3 × 3 camera array . . . 11
2.5. The Fraunhofer IIS light field rendering workflow . . . 12
2.6. The Cleopatra object captured . . . 12
2.7. Rectified images of the Cleopatra figurine. The rectangle shows the aligned pixels. . . . 13
2.8. Disparity estimation of the cat ear. The ear can be found in the adjacent image shifted by the disparity. . . . 14
2.9. Comparison of disparity post-processing outcome . . . 14
2.10. Viewrendering in a 2 × 2 array . . . 15
2.11. Forward warping . . . 16
2.12. Backward warping . . . 16
2.13. Merged colour image . . . 17
2.14. Dice before (left) and after (right) the rotation around the y axis . . . 18
2.15. Camera with up (y), forward (−z) and right (x) vector . . . 19
2.16. View transformation . . . 20
2.17. OpenGL rendering pipeline [14] . . . 21
2.18. Comparison between pinhole camera and rasterization rendering . . . 22

3.1. Resulting object movement when rendering virtual camera views (V) between real cameras (R) . . . 24
3.2. Top down view of the 3D scene . . . 25
3.3. Intersection with the canvas . . . 26

4.1. Blueprint of the SetupSplitMaterialInstances function . . . 31
4.2. Architecture of the implementation . . . 32
4.3. Correct disparity map . . . 34
4.4. Comparison of sRGB effects on disparity maps . . . 35
4.5. Comparison of compression effects on disparity maps . . . 35
4.6. Comparison of bilinear filtering effects on disparity maps . . . 35
4.7. Incoming video data (increased gamma) . . . 38
4.8. Comparison of disparity map representations (increased gamma) . . . 41
4.9. Comparison of forward warped disparity maps . . . 41
4.10. Burial chamber data set . . . 42
4.11. Egyptian tomb setting with torch (increased gamma) . . . 42

5.1. Camera movement from top left to bottom right without modification (increased gamma) . . . 44
5.2. Camera movement from top left to bottom right with modification (increased gamma) . . . 45
A.1. Node layout of the LightfieldMaterial . . . 52
A.2. Cut out of the ALightfieldActor blueprint BeginPlay node tree . . . 53
A.3. Blueprint of the SetupForHighResVideo function . . . 53
A.4. Blueprint of the SetupLightfieldMaterial function . . . 54
A.5. Blueprint of the SetResultRenderTarget function . . . 54
A.6. The SplitMaterial setup . . . 55
A.7. Blueprint of the OpenMediaSource function . . . 56
A.8. Blueprint of the SetupSplitRenderTargets function . . . 56
A.9. Loop body of the SetupSplitMaterialInstances function . . . 57
D. List of Tables
4.1. MSEs for texture settings . . . 34

5.1. Comparison of compression impact on disparity maps . . . 46
5.2. Settings for stable frame rate with two dynamic light sources . . . 46
E. List of Listings
B.1. Public methods of the ALightfieldActor class . . . 58
B.2. Public members of the ULightfieldMediaTexture class . . . 58
B.3. Public methods of the FViewrenderer plugin . . . 58
B.4. Public methods of the FForwardWarpDeclaration compute shader . . . 59