
Universidade Federal da Bahia
Universidade Salvador

Universidade Estadual de Feira de Santana

DOCTORAL THESIS

An Adaptive Approach to Real-Time 3D Non-Rigid Registration

Antonio Carlos dos Santos Souza

Programa Multiinstitucional de Pós-Graduação em Ciência da Computação – PMCC

Salvador, December 19, 2014

PMCC-Dsc-0000

ANTONIO CARLOS DOS SANTOS SOUZA

AN ADAPTIVE APPROACH TO REAL-TIME 3D NON-RIGID REGISTRATION

Thesis presented to the Programa Multiinstitucional de Pós-Graduação em Ciência da Computação of the Universidade Federal da Bahia, Universidade Estadual de Feira de Santana and Universidade Salvador, in partial fulfillment of the requirements for the degree of Doctor in Computer Science.

Advisor: Antonio Lopes Apolinario Junior

Salvador, December 19, 2014


Cataloging-in-publication record.

Souza, Antonio Carlos dos Santos

An Adaptive Approach to Real-Time 3D Non-Rigid Registration / Antonio Carlos dos Santos Souza – Salvador, December 19, 2014.

65 p.: ill.

Advisor: Antonio Lopes Apolinario Junior.
Thesis (doctorate) – Universidade Federal da Bahia, Instituto de Matemática, December 19, 2014.

1. Non-rigid registration 2. Adaptive algorithms 3. Augmented reality. I. Apolinario, Antonio Lopes. II. Universidade Federal da Bahia. Instituto de Matemática. III. Title.

CCD 20.ed. 000.00


APPROVAL SHEET

ANTONIO CARLOS DOS SANTOS SOUZA

AN ADAPTIVE APPROACH TO REAL-TIME 3D NON-RIGID REGISTRATION

This thesis was judged adequate for obtaining the title of Doctor in Computer Science and approved in its final form by the Programa Multiinstitucional de Pós-Graduação em Ciência da Computação of UFBA-UEFS-UNIFACS.

Salvador, December 19, 2014

Prof. Dr. Antonio Lopes Apolinario Junior
Universidade Federal da Bahia

Prof. Dr. Vinicius Moreira Mello
Universidade Federal da Bahia

Prof. Dr. Thales Miranda de Almeida Vieira
Universidade Federal de Alagoas

Prof. Dr. Ricardo Farias
Universidade Federal do Rio de Janeiro

Prof. Dr. Luiz Marcos Garcia Goncalves
Universidade Federal do Rio Grande do Norte

ACKNOWLEDGEMENTS

First, I would like to thank God for all the blessings given during my journey.

This work was made possible by the enthusiastic support, suggestions, encouragement, and guidance of many individuals. I am greatly indebted to my academic advisor, chair and director of this work, Prof. Dr. Antonio Lopes Apolinario Jr, for instilling in me the joy of conducting outstanding research in computer graphics. These six years have been an intense period in my career. Your support and vigilance have allowed me to achieve results that I could not have imagined.

Thank you very much to the Committee for the direction, feedback, and all the enlightening advice. Thank you, Prof. Dr. Gilson Giraldi, Prof. Dr. Vinícius Mello and Prof. Dr. Perfilino Ferreira.

Thank you, Prof. Dr. Lynn Alves, for the unforgettable times during my master's degree.

Furthermore, I would like to acknowledge my friend Marcio Cerqueira de Farias Macedo for the great partnership and his excellent markerless augmented reality environment for on-patient medical data visualization. Working with you has been a wonderful experience and a great source of inspiration. I really wonder what my thesis would have been without this environment.

Many individuals also provided support in myriad ways. Special thanks go to Aline Machado, Sabrina, Osmar, Rosalba, Thalles Caribe, Prof. Dr. Eduardo Telmo, Prof. Dr. Lurimar, Prof. Andre, Prof. Jowaner, Prof. Cesar, Lilia, Prof. Dr. Marcelo Veras, Prof. Dr. Jairo Dantas, Bruno, Leo, Everton, Toninho, Dona Vilma, Marilene (Fortona), Dona Mary, Sr. Deja, Cita, Fabiana, Carol, Janio, Eliakin, Igor, Rodrigo, Jony, Katia, Anderson, Leandro, Edilson and Rita.

I am also much indebted for the insightful discussions and fun times with all the Labrasoft friends: Luiz Claudio Machado, Valentim, Romilson, Ronaldo, Antonio Maurício, Simone, Josildo, Marcelo, Felipe, Amilton, Vanessa, Diego, Letícia, Luiz Henrique, Pedro, Jorge, Fabiano and Aderbal.

Finally, I would like to acknowledge my family for their constant support and encouragement during this graduate journey: Antonio Porfírio, Reinaldo, Ricardo, Maisa, Tissi, Lucia, Danilo, my uncles Ze Ribeiro, Manoel, Walter and Carlito, my aunts Belita, Esmera, Judith and Decinha, my cousins Fatim, Alva and Bel, Alan, Zeo, Caio, Vivian, Manoel, Jorge, Nandi, Rena, Luis, and my family away from home Chicao, Rosinha, Juca, Leo, Mito and Kelly. I wish to thank Aline Requiao for everything. Aline, I appreciate your motherly love.

I dedicate this work to the loving memory of my mother Antonia, to my son Arthur and to my nieces Bruna and Eduarda.


RESUMO

3D non-rigid registration is fundamental for tracking and/or reconstructing deformable three-dimensional models. However, most non-rigid registration algorithms are not as fast as those developed in the field of rigid registration. Fast methods for 3D non-rigid registration are particularly interesting for markerless augmented reality applications, in which an object used as a natural marker may undergo deformations while the application runs. This thesis presents a multi-frame adaptive algorithm, implemented on the GPU, for the non-rigid registration of deformable three-dimensional models captured by an RGB-D camera. Adaptive approaches tend to optimize algorithms by concentrating effort where it is most relevant, producing an overall improvement of the solution. The proposed method uses adaptivity in three steps of the algorithm. First, to guide the distribution of regions of influence based on the deformation intensity computed over the object. Second, during constraint selection, in which the sampling performed over the object for the optimization phase is based on the currently measured deformation. Third, to apply the algorithm in a multi-frame scheme only when the rigid tracking error exceeds a given threshold, indicating that a rigid transformation no longer produces a satisfactory alignment. By exploiting adaptivity and the parallelism of the GPU implementation, we obtained results showing that the proposed method is capable of running in real time while being as accurate as existing approaches in the literature.

Keywords: Non-Rigid Registration, Adaptive Algorithms, Augmented Reality.


ABSTRACT

3D non-rigid registration is fundamental for tracking or reconstructing deformable 3D shapes. However, most non-rigid registration methods are not as fast as those developed in the field of rigid registration. Fast methods for 3D non-rigid registration are particularly interesting for markerless augmented reality applications, in which the object used as a natural marker may undergo non-rigid user interaction. Here, we present a multi-frame adaptive algorithm for 3D non-rigid registration implemented on the GPU, where the 3D data is captured by an RGB-D camera. In general, adaptive algorithms optimize the solution by focusing on the most relevant aspects of the problem, resulting in a global improvement of the final solution. Our approach uses adaptivity in three stages of the process. First, to guide the distribution of regions of influence based on the deformation intensity measured over regions of the shape. Second, during the selection of constraints, where the sampling performed over the object for the optimization is based on the current deformation. Third, to apply the algorithm in a multi-frame manner only when the rigid tracking error is above a predefined threshold, indicating that a rigid transformation can no longer produce a satisfactory result. Taking advantage of this adaptivity and of the parallelism of the GPU, the results obtained show that the proposed algorithm is capable of achieving real-time performance while being as accurate as the approaches proposed in the literature.

Keywords: Non-Rigid Registration, Adaptive Algorithms, Augmented Reality.


CONTENTS

Chapter 1 – Introduction

1.1 Hypothesis
1.2 Contributions
1.3 Organization

Chapter 2 – Fundamentals and Related Work

2.1 Augmented Reality
2.2 3D Registration

2.2.1 Rigid Registration
2.2.2 Non-Rigid Registration

2.3 Summary

Chapter 3 – Markerless Augmented Reality Environment

3.1 Surface Acquisition
3.2 3D Reference Model Reconstruction
3.3 Tracking
3.4 Summary

Chapter 4 – GPU-Based Adaptive Non-Rigid Registration

4.1 Deformation Model
4.2 Matching of Points
4.3 Selection of Nodes
4.4 Weighting the Influence of Nodes
4.5 Selection of Constraints
4.6 Error Minimization
4.7 Updating the Source Object
4.8 Multi-Frame Non-Rigid Tracking
4.9 Summary

Chapter 5 – Non-Rigid Registration Evaluation

5.1 Methodology
5.2 Accuracy Evaluation
5.3 Performance Evaluation
5.4 Discussion


5.5 Summary

Chapter 6 – Non-Rigid Support Evaluation for a Markerless Augmented Reality Environment

6.1 Methodology
6.2 Evaluation
6.3 Summary

Chapter 7 – Conclusion and Future Work

7.1 Conclusion
7.2 Future Directions

LIST OF FIGURES

2.1 Reality-Virtuality Continuum (Milgram; Kishino, 1994).

2.2 Marker-based (left image) and markerless (right images) augmented reality. Left image is courtesy of ARToolKit library (Kato; Billinghurst, 1999) and right images are courtesy of KinectFusion (Izadi et al., 2011).

3.1 Overview of the proposed approach from 3D reference model reconstruction to tracking solution. Adapted from (Souza; Macedo; Apolinario, 2014).

3.2 Overview of KinectFusion's pipeline (Izadi et al., 2011).

3.3 Left image: the user moved his face quickly, only a small number of points remained at the same image coordinates, and ICP failed. Right image: by using the pose estimation algorithm, the problem can be solved (Macedo; Apolinario; Souza, 2013).

4.1 Overview of the proposed approach from the depth map acquisition to the final non-rigidly aligned surface.

4.2 Building of the deformation graph (right) over the source object (left) based on the measured residual error (center).

4.3 Refinement of the deformation graph (right) over the cheeks region of the source object (left) based on the measured residual error (center).

4.4 Collapsing of the deformation graph (right) over the cheeks region of the source object (left) after updating the residual error (center).

4.5 Constraint selection based on the initial non-rigid error between source and target surfaces.

5.1 Overview of the libraries used for each step of our approach.

5.2 Datasets used for evaluation of the non-rigid registration algorithm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks.

5.3 The resulting color-coded error from the registration between source and target surfaces. In all situations, the proposed algorithm AdNodes + AdCons obtained an average accuracy below 3 mm and a standard deviation below 3.5 mm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks.

5.4 Accuracy (in mm) obtained by AdNodes and AdNodes + AdCons in comparison with the Embedded Deformation (ED) algorithm and the initial error for each of the datasets used.


5.5 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset II.

5.6 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset III-1.

5.7 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset II.

5.8 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset III-1.

5.9 Accuracy (in mm) as a function of the parameter k for each of the datasets used.

5.10 Accuracy (in mm) obtained for each level of the quadtree and for each of the datasets used. The maximum number of nodes for a level l is 4^l.

5.11 Performance (in FPS) obtained by AdNodes and AdNodes + AdCons in comparison with the ED algorithm for each of the datasets used.

5.12 Performance (in ms) obtained by our approach for each of the most computationally expensive methods. MM - Matrix Multiplication (A = J^T J); Jacobian - computation of J; Cholesky - LL^T decomposition; Solver - linear solver Strsm from the CUBLAS library; ACS - Adaptive Constraint Selection; ANS - Adaptive Node Selection; Weights - computation of the influence of G on P_s; MV - Matrix-vector multiplication (b = −J^T r).

6.1 Neutral and deformed reference models based on the user's facial expression.

6.2 Neutral and deformed reference models for a different user.

6.3 Neutral and deformed reference models based on challenging deformation scenarios.

6.4 Cheeks tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.5 Color-coded cheeks tracking error measured for both rigid and non-rigid solutions.

6.6 Cheeks-2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.7 Color-coded cheeks-2 tracking error measured for both rigid and non-rigid solutions.

6.8 Smile tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.9 Color-coded smile tracking error measured for both rigid and non-rigid solutions.

6.10 Smile-2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.


6.11 Color-coded smile-2 tracking error measured for both rigid and non-rigid solutions.

6.12 Kiss tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.13 Color-coded kiss tracking error measured for both rigid and non-rigid solutions.

6.14 Kiss-2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.15 Color-coded kiss-2 tracking error measured for both rigid and non-rigid solutions.

6.16 Open Mouth tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.17 Color-coded open mouth tracking error measured for both rigid and non-rigid solutions.

6.18 Angry tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.19 Color-coded angry tracking error measured for both rigid and non-rigid solutions.

6.20 Bag tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.21 Color-coded bag tracking error measured for both rigid and non-rigid solutions.

6.22 Limitation of the proposed method. The user's body (A) is reconstructed (B), but the algorithm cannot track the user's arms (C), integrating all the movement into the 3D reference model (D).

6.23 Body tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.24 Color-coded body tracking error measured for both rigid and non-rigid solutions.

LIST OF TABLES

5.1 Number of constraints (C), accuracy (A, in mm), standard deviation (SD, in mm) and performance (P, in FPS) according to the step size (from 1 to 32) or sampling scheme (Adap for adaptive) used to select constraints for optimization.

6.1 Average accuracy (A, in mm) and standard deviation (SD, in mm) according to the weight used to update the 3D reference model.

6.2 Average accuracy (Avg., in mm), standard deviation (Std. Dev., in mm) and performance (Perf., in FPS) for each of the tracking algorithms tested in the presence of specific user deformations. NRn: non-rigid registration applied every n frames (regardless of rigid tracking failure); NRAdaptive: non-rigid registration applied whenever the rigid algorithm fails.

6.3 Average accuracy (Avg., in mm), standard deviation (Std. Dev., in mm) and performance (Perf., in FPS) for each of the thresholds used to detect rigid tracking failure.


Chapter 1

INTRODUCTION

In this first chapter, we briefly contextualize the problem we want to solve, describe the objectives and contributions of the proposed work, and outline the organization of the thesis.

Augmented Reality (AR) is a technology in which the view of a real scene is augmented with additional virtual information. As stated by Azuma (1997), an AR application must have three basic characteristics:

1. Combination of virtual object(s) into a real scene;

2. Real-time performance;

3. 3D registration for accurate tracking of the augmented scene.

Since the beginning, tracking has been one of the main problems limiting the development of successful AR applications. Virtual and real worlds must be properly aligned so that they seem to coexist at the same location for the user. For some applications, such as those proposed for medical AR in surgical environments, accurate registration of the virtual medical data onto the patient is especially important; otherwise, the success of the surgery may be compromised.

Tracking plays an important role not only in AR, but also in 3D reconstruction. Several viewpoints of the same object or scene of interest are captured by an appropriate sensor, and these must be registered to the same coordinate system. After this registration step, the different viewpoints must be integrated into a single 3D model. Therefore, if the viewpoints are incorrectly aligned, visible artifacts will appear in the final reconstructed model.

Computer vision techniques have been proposed to solve the problem of registration; however, they are not robust enough under some illumination conditions (Teichrieb et al., 2007). With the availability of depth sensors, 3D registration techniques that use depth information to improve tracking robustness have been proposed. However, with low-cost depth sensors, noise may affect the accuracy of the registration.


In scenarios such as on-patient craniofacial medical data visualization (Lee et al., 2012; Macedo et al., 2014), it is especially important for a markerless AR (MAR) environment to support non-rigid tracking, which adds a level of interactivity for the user and improves the robustness of the tracking algorithm under both rigid and non-rigid patient interactions. The main issue related to this support is that AR requires real-time interactivity, and most current state-of-the-art works in the field of 3D non-rigid registration do not provide such performance. Here, we assume that an application runs in real time if its performance is equal to or above 15 frames per second (Akenine-Moller; Moller; Haines, 2002). This notion of real time is tied to user interactivity: the user must interact with the application and receive feedback from it without noticeable delay.
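To make the 15 FPS criterion concrete, the per-frame time budget it implies can be worked out directly; every stage of the per-frame pipeline (for instance tracking, non-rigid registration and rendering) must fit within it:

```latex
t_{\text{frame}} \le \frac{1000\ \text{ms}}{15\ \text{frames}} \approx 66.7\ \text{ms per frame}
```

For comparison, a 30 FPS target would halve this budget to about 33.3 ms per frame.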

Several approaches exist for accurate 3D non-rigid registration; however, only a few of them allow interactive registration. Apart from real-time techniques that rely on strong priors about a specific scenario (Weise et al., 2011; Chen; Izadi; Fitzgibbon, 2012; Bouaziz; Wang; Pauly, 2013; Li et al., 2013), few methods have been proposed for fast general-purpose non-rigid registration (Sumner; Schmid; Pauly, 2007; Nutti et al., 2014). Their common characteristic is the way they represent the deformation of a given surface: a deformation graph. Each node of this graph holds a 3D affine transformation, and together these transformations allow the source surface to be deformed toward the target surface. The deformation is modelled in terms of an energy function, which is minimized with a non-linear optimization algorithm to find the best affine transformation for each node of the graph.
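As an illustration of the deformation-graph representation, the NumPy sketch below shows how a graph in the style of Sumner, Schmid and Pauly (2007) deforms a surface once the node transformations are known: each point is moved by blending the affine transformations of its k nearest nodes, with weights that decay with distance. The function name `deform_points`, the default k = 4 and the brute-force neighbour search are illustrative assumptions rather than the implementation of this thesis, and the energy minimization that estimates each node's transformation is not shown.

```python
import numpy as np


def deform_points(points, nodes, affines, translations, k=4):
    """Embedded-deformation-style warp (hypothetical helper, Sumner et al. 2007 weighting).

    points:       (P, 3) source surface points v
    nodes:        (N, 3) node positions g_j, with N >= 2
    affines:      (N, 3, 3) per-node affine matrices A_j
    translations: (N, 3) per-node translations t_j
    Each point becomes sum_j w_j(v) * (A_j (v - g_j) + g_j + t_j).
    """
    d = np.linalg.norm(points[:, None, :] - nodes[None, :, :], axis=2)  # (P, N)
    order = np.argsort(d, axis=1)
    k = min(k, nodes.shape[0] - 1)  # one extra node is needed to set d_max
    nearest = order[:, :k]                                              # (P, k)
    d_near = np.take_along_axis(d, nearest, axis=1)                     # (P, k)
    d_max = np.take_along_axis(d, order[:, k:k + 1], axis=1) + 1e-12    # (P, 1)
    # Weights decay to zero at the (k+1)-th nearest node, then are normalized.
    w = (1.0 - d_near / d_max) ** 2
    w /= w.sum(axis=1, keepdims=True)
    g = nodes[nearest]                      # (P, k, 3)
    A = affines[nearest]                    # (P, k, 3, 3)
    t = translations[nearest]               # (P, k, 3)
    local = points[:, None, :] - g          # positions relative to each node
    moved = np.einsum('pkij,pkj->pki', A, local) + g + t
    return np.einsum('pk,pki->pi', w, moved)


# Identity affines with a common translation simply shift the surface.
rng = np.random.default_rng(0)
pts, graph_nodes = rng.random((10, 3)), rng.random((5, 3))
identity = np.tile(np.eye(3), (5, 1, 1))
shift = np.tile([0.0, 0.0, 0.1], (5, 1))
print(np.allclose(deform_points(pts, graph_nodes, identity, shift), pts + [0.0, 0.0, 0.1]))
```

Because the weights of each point sum to one, identity affines with zero translations leave the surface unchanged; non-trivial deformations only appear when neighbouring nodes carry different transformations, which is what the optimization estimates.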

In this doctoral work, we address the problem of fast 3D non-rigid registration by applying adaptive techniques that reduce the computational cost of the registration while keeping it accurate.

1.1 HYPOTHESIS

Our main research question is: is it possible to track, interactively and with sufficient accuracy, objects that undergo deformation across sequential frames in a markerless augmented reality application?

To answer this question, we build upon an adaptive approach for fast non-rigid registration in scenarios where noisy real surfaces are captured by a low-cost depth sensor. This thesis aims to solve the problem of fast, interactive 3D non-rigid registration for MAR environments. In this sense, the proposed approach must be as accurate as state-of-the-art solutions while supporting real-time performance and being robust to noisy and missing data.

1.2 CONTRIBUTIONS

The main contributions of this thesis are:

• A markerless augmented reality environment based on a low-cost RGB-D sensor;

• A dynamic subdivision approach for node selection on the source object;


• An adaptive algorithm to select, for each iteration, samples from the source object to be used as constraints for optimization;

• A multi-frame adaptive approach in which non-rigid registration is applied only when the rigid tracking error is above a certain threshold and a 3D rigid representation of the object is updated to take into account the current deformation;

• A full framework for non-rigid registration implemented entirely on the Graphics Processing Unit (GPU).
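The multi-frame adaptive idea listed among the contributions can be sketched as a simple control loop: rigid tracking runs on every frame, and the more expensive non-rigid registration is triggered only when the rigid error exceeds a threshold. The class `AdaptiveTracker`, its callbacks and the 2.0 mm threshold in the usage example are hypothetical placeholders; the error metric and the thresholds actually evaluated are discussed in Chapter 6.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AdaptiveTracker:
    """Hypothetical sketch: non-rigid registration only when rigid tracking is not enough."""

    rigid_step: Callable[[int], float]    # aligns a frame rigidly, returns its error (mm)
    nonrigid_step: Callable[[int], None]  # refines the alignment non-rigidly
    threshold_mm: float                   # error above which a rigid fit is deemed unsatisfactory
    nonrigid_frames: List[int] = field(default_factory=list)

    def process(self, frame: int) -> None:
        error = self.rigid_step(frame)
        if error > self.threshold_mm:
            # A rigid transformation no longer explains the data: pay for non-rigid registration.
            self.nonrigid_step(frame)
            self.nonrigid_frames.append(frame)


# Usage with made-up per-frame rigid errors (mm) and an illustrative 2.0 mm threshold.
errors = [1.0, 1.5, 3.2, 1.1, 4.0]
tracker = AdaptiveTracker(rigid_step=lambda f: errors[f],
                          nonrigid_step=lambda f: None,
                          threshold_mm=2.0)
for frame in range(len(errors)):
    tracker.process(frame)
print(tracker.nonrigid_frames)  # [2, 4]
```

Keeping the trigger outside the registration code makes the trade-off explicit: a lower threshold improves accuracy on deforming objects at the cost of running the non-rigid solver on more frames.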

1.3 ORGANIZATION

This thesis is organized as follows:

Chapter 2, Fundamentals and Related Work. This chapter formalizes the concepts of augmented reality, 3D registration and their challenges. It also provides an extensive review of related work in the fields of rigid and non-rigid registration, focusing on the interactive methods developed so far.

Chapter 3, Markerless Augmented Reality Environment. The focus of this thesis is to add support for non-rigid tracking to a markerless augmented reality environment. Therefore, in this chapter we present the environment in which the proposed non-rigid registration was applied and validated.

Chapter 4, GPU-Based Adaptive Non-Rigid Registration. In this chapter we present the proposed adaptive non-rigid registration algorithm and its adaptation to take advantage of the parallelism of the GPU, as well as the multi-frame scheme adopted to improve the algorithm's performance.

Chapter 5, Non-Rigid Registration Evaluation. In this chapter, non-rigid registration is evaluated in terms of accuracy and performance for several datasets.

Chapter 6, Non-Rigid Support Evaluation for a Markerless Augmented Reality Environment. In this chapter, non-rigid tracking is evaluated in the context of the markerless augmented reality environment in terms of accuracy, performance and tracking robustness for several datasets.

Chapter 7, Conclusion and Future Work. The thesis concludes with a summary and a discussion of future directions.

Chapter 2

FUNDAMENTALS AND RELATED WORK

This chapter formalizes the concepts of augmented reality, 3D registration and their challenges. It also provides a review of related work in the fields of rigid and non-rigid registration, focusing on the interactive methods developed so far.

2.1 AUGMENTED REALITY

The concept of virtual environments has been discussed since the 1990s. They can be defined as environments in which only virtual objects are present. Milgram and Kishino proposed a taxonomy to identify where applications are located within the so-called Reality-Virtuality Continuum (Milgram; Kishino, 1994) (Figure 2.1). The extremes of this taxonomy are the real world and Virtual Reality (VR). Between them lie Augmented Reality and Augmented Virtuality. In the former, the real world predominates over the virtual one, while in the latter the virtual world prevails over the real one.

Figure 2.1 Reality-Virtuality Continuum (Milgram; Kishino, 1994).

Both AR and VR use virtual objects, but they differ in important ways. AR changes the real world by adding virtual elements. Thus, it is fundamental for an AR application to maintain contact with the view of the real world, which is its basis. Although authors such as Vallino and Azuma state that the main goal of AR is the seamless integration of virtual objects into the real scene (Vallino, 1998; Azuma et al., 2001), such systems are not required to be realistic. Another central distinction between AR and VR is the registration, or tracking, problem. This process is crucial in AR: combining real and virtual objects in the augmented scene requires accurate positioning of the virtual objects over the real world.


The motivation for developing applications and research in the field of AR comes from the benefits that such techniques may bring to several other fields. In the specific field of registration, AR methods have been attracting a lot of attention in medicine, because they extend the possibilities of study and practice for many techniques and medical procedures related to medical images generated from the patient's current condition, such as angiographic visualization (Wang et al., 2012), liver surgery (Haouchine et al., 2013, 2014) and uterine laparosurgery (Collins et al., 2014). However, registration is a crucial problem in AR applications: objects misplaced in the scene appear to float over the real scene. Accurate registration becomes even more crucial in applications that demand high precision, such as surgery.

Tracking in AR is performed based on the color or depth data of the object being tracked by the application. For color-based tracking, features are computed from the color image of the scene captured by the sensor and tracked during the application's live stream (Horn; Schunck, 1981; Lucas; Kanade, 1981). The first solution proposed to solve this issue was based on fiducial markers, used as points of reference positioned in the real scene for tracking (Figure 2.2, left image). Due to their intrusiveness (i.e., the marker is artificial content introduced into the scene), methods for color-based tracking without markers were proposed. However, the main drawback of this kind of registration remains the same: its susceptibility to illumination conditions. To overcome this problem, depth-based tracking was proposed, registering two surfaces of the real scene captured by a real-time 3D depth sensor (Besl; McKay, 1992; Chen; Medioni, 1992). This kind of tracking has grown in popularity due to its accuracy, its robustness to illumination conditions and the recent availability of low-cost depth sensors.
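Depth-based tracking of this kind is commonly built on the Iterative Closest Point (ICP) algorithm (Besl; McKay, 1992), which alternates between matching each source point to its closest target point and solving for the rigid transformation that best aligns the matched pairs. The sketch below is a minimal point-to-point variant, assuming brute-force closest-point search, a fixed iteration count and the SVD-based closed-form (Kabsch) solution for each step; the function names are illustrative. Real-time systems such as KinectFusion instead use projective data association and the point-to-plane metric of Chen and Medioni (1992).

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t (Kabsch/SVD)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t


def icp(src, dst, iterations=30):
    """Point-to-point ICP with brute-force closest-point matching (O(P*N) per iteration)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = src.copy()
    for _ in range(iterations):
        # 1) Match every current source point to its closest target point.
        dists = np.linalg.norm(current[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[dists.argmin(axis=1)]
        # 2) Solve for the rigid motion that best aligns the matched pairs.
        R, t = best_rigid_transform(current, matched)
        current = current @ R.T + t
        # 3) Accumulate the incremental motion into the total transformation.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

ICP converges only locally, so it relies on small inter-frame motion; this is one reason why fast user motion can make it fail, as discussed for the tracking environment later in the thesis.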

In general, AR applications can be divided into two groups: marker-based and markerless. Marker-based AR uses a fiducial marker as a point of reference in the field of view to help the system estimate the camera pose (Figure 2.2, left image) (Kato; Billinghurst, 1999). Markerless AR (MAR) uses a part of the real scene as a natural marker (Figure 2.2, right images) (Izadi et al., 2011). When this natural marker is a deformable object (e.g., face, body, hand), non-rigid motion of the marker can be expected.

2.2 3D REGISTRATION

3D registration is a fundamental problem in fields such as 3D reconstruction and augmented reality. Most depth sensors provide partial surface data (i.e. acquired from one viewpoint) that must be aligned, to allow camera pose estimation, and merged, to obtain a complete digital representation of the object or scene of interest.

Several functional models have been proposed in the literature to solve the registration problem with good performance:

1. Rigid Registration: In this kind of registration, a single Euclidean transformation is used to align two objects (Rusinkiewicz; Levoy, 2001). This transformation has the following properties: (1) it is global (i.e. it remains the same for every point); (2) it can be uniquely defined by three non-collinear pairs of correspondences; (3) it is low-dimensional (i.e. it has only six degrees of freedom). Real-time performance is easily achieved due to the low number of parameters required to solve the rigid registration;

Figure 2.2 Marker-based (left image) and markerless (right images) augmented reality. The left image is courtesy of the ARToolKit library (Kato; Billinghurst, 1999) and the right images are courtesy of KinectFusion (Izadi et al., 2011).

2. Articulated Deformation: For surfaces which are mainly characterized by articulations, a skeleton is typically used as the basis for deformation. In this representation, a skeleton is defined by a combination of bones and joints. Each joint is associated with some degrees of freedom (DoF) (i.e. joint angles) and is related to other joints by rigid transformations (Allen; Curless; Popovic, 2002). In an alternative representation, joint deformation is obtained by blending the transformations of two adjacent bones in the overlap regions (Chang; Zwicker, 2008, 2011). The advantage of this representation is that it requires a low number of parameters to be estimated, which depends on the number of available bones or joints;

3. Local Affine Deformation: For several real-world datasets, it is desirable for the non-rigid registration algorithm to support general deformations, without prior knowledge about the objects or the kind of deformation they undergo. To achieve real-time performance, models such as articulated registration rely on prior knowledge about the scenario (e.g. skeleton tracking) and lose generality. To keep non-rigid registration fast, accurate and general, solutions based on local affine transformations are frequently employed: they allow the preservation of fine surface details while decoupling the complexity of the geometry from the complexity of the deformation, by using a deformation graph as the basis representation (Sumner; Schmid; Pauly, 2007).

Other functional models, such as rigid registration with non-rigid correctives (Brown; Rusinkiewicz, 2007) and isometric deformation (Lipman; Funkhouser, 2009), have also been proposed in the literature; however, their computational cost is too high for our approach.


2.2.1 Rigid Registration

Rigid registration estimates a single transformation, composed of a rotation and a translation, to align two different viewpoints of the same object. Rigid registration is a challenging problem because, in real-world scenarios, it must deal with noise, outliers and non-overlapping regions between two surfaces captured from commodity depth sensors. Noise refers to the presence of unwanted points near the captured surface. Outliers are noisy points far from the surface, which must be rejected; otherwise, they may affect the optimization phase. As the object is captured from a single camera view, non-overlapping regions between two surfaces are expected; however, holes and other artifacts may significantly decrease the region of overlap (Tam et al., 2013).

To limit the search space for optimization and correspondence estimation, constraints must be defined. In the field of rigid registration, transformation-induced constraints such as the closest point criterion are commonly employed. This criterion constrains potential correspondences by computing and matching closest points at every iteration of the registration algorithm, and it is used in the standard Iterative Closest Point (ICP) algorithm (Besl; Mckay, 1992; Chen; Medioni, 1992). To reduce the search space for correspondences, specific approaches have been proposed: the project and project-and-walk methods (Rusinkiewicz; Levoy, 2001) restrict the search for a new closest point to the same 2D projection (i.e. pixel) and to its local neighbourhood, respectively, avoiding a global exhaustive search. Other constraints, such as features (Johnson; Hebert, 1999) and saliency (Gelfand et al., 2005), have also been proposed in the literature. They provide more reliable correspondences, and consequently more accurate convergence to the final result, but they require high processing time, which makes them inadequate for our proposal.

In fact, rigid registration has been researched for several years and is now a well-defined problem with a small number of parameters to be estimated. Hence, real-time high-quality methods have already been proposed in the literature. The most popular algorithm for 3D rigid registration is the ICP, which consists of six steps:

• Selection of Points: Points from the source and target objects are selected as samples for the algorithm;

• Matching of Points: Corresponding points from the source and target objects are associated;

• Weighting of Correspondences: Correspondences are weighted according to their level of reliability, so that the most reliable ones receive more weight;

• Rejection of Correspondences: Outliers are rejected from the pairs of corresponding points;

• Error Association: A point-to-point or point-to-plane error metric is defined for the optimization step;

• Error Minimization: The energy function built in the previous step is (commonly) minimized by solving a linear system.


As the ICP algorithm provides high accuracy and real-time performance for rigidregistration (Rusinkiewicz; Levoy, 2001), it is used in our approach.
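To make the six ICP steps concrete, the following Python sketch runs a minimal ICP loop on 2D point sets. It only illustrates the general scheme under simplifying assumptions, and it is not the real-time variant used in this thesis. Every source point is selected. Matching is a brute-force closest-point search. Weights are constant, and pairs farther apart than a hypothetical `max_dist` are rejected. The error metric is point-to-point, and it is minimized in closed form: the rotation angle comes from the summed dot and cross products of the centred pairs.

```python
import math


def transform(points, theta, t):
    """Rotate 2D points by theta (radians) about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def best_rigid_2d(src, dst):
    """Closed-form least-squares rigid motion (angle, translation) mapping src onto dst."""
    n = len(src)
    cxs, cys = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cxd, cyd = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - cxs, py - cys, qx - cxd, qy - cyd
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the target centroid.
    return theta, (cxd - (c * cxs - s * cys), cyd - (s * cxs + c * cys))


def icp_2d(source, target, iterations=20, max_dist=1.0):
    current = list(source)
    for _ in range(iterations):
        pairs = []
        for p in current:                      # selection: every source point
            # Matching: brute-force closest target point.
            q = min(target, key=lambda cand: (cand[0] - p[0]) ** 2 + (cand[1] - p[1]) ** 2)
            if math.dist(p, q) <= max_dist:    # rejection by distance
                pairs.append((p, q))           # weighting: constant (implicit)
        if len(pairs) < 2:
            break
        # Error metric: point-to-point; minimization: closed-form solution.
        theta, t = best_rigid_2d([p for p, _ in pairs], [q for _, q in pairs])
        current = transform(current, theta, t)
    return current
```

When the initial guess is good enough for the closest points to be the true correspondences, a single closed-form step recovers the motion exactly. Larger initial misalignments need more iterations, and the rejection threshold must then be chosen with care.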

2.2.2 Non-Rigid Registration

Non-rigid registration requires more attention because it faces the issues of rigid registration plus the problem of deformation, which itself increases the number of parameters to be estimated and the space of solutions that can be found. Unlike the rigid scenario, where every point of a given source object must be moved by a single transformation estimated by the algorithm, in the non-rigid scenario every point may undergo a different, interconnected deformation. Therefore, more reliable correspondences must be computed for every region of the source object so that the registration is sufficiently accurate and realistic (Tam et al., 2013).

Traditionally, commercial systems have used markers to provide sparse reliable correspondences for non-rigid registration; however, markers are intrusive in the scene (Bermano et al., 2014). Templates have been used for applications based on part-to-whole alignment, where they provide strong shape priors that help handle noise and missing data (Li et al., 2009). For scenarios such as facial non-rigid registration, blendshapes can be applied to capture a basis set of user expressions (Weise et al., 2011; Bouaziz; Wang; Pauly, 2013; Li et al., 2013). Other constraints, induced by deformation, features, signatures and saliency, require too much processing time. The closest point criterion can be used for rigid and non-rigid registration in a similar way. However, regularization constraints are commonly employed to improve the optimization phase by avoiding local minima, taking advantage of a priori information. Orthonormality (Sumner; Schmid; Pauly, 2007) and handling of holes (Li; Sumner; Pauly, 2008) are among the most widely used regularization schemes for non-rigid shapes.

In general, a non-linear optimization solver is employed for both rigid and non-rigid registration. Many techniques have been used to find the best transformations and correspondences. Local deterministic optimization methods compute a solution that maximizes or minimizes an energy function locally. These techniques do not produce the most accurate solutions, but they are widely used due to their low processing time. Gradient descent, Newton, Gauss-Newton, quasi-Newton and Levenberg-Marquardt are often employed for non-rigid registration (Madsen; Bruun; Tingleff, 2004). Singular Value Decomposition, quaternions, orthonormal matrices and dual quaternions are the most frequently used for rigid registration (Lorusso; Eggert; Fisher, 1995). As local optimization techniques may only find a local minimum, global optimization can solve this problem by searching for a global solution. Alternatively, stochastic optimization addresses this problem by using statistical and probabilistic models. While stochastic and global deterministic optimization tend to be more accurate, in this thesis we use a technique based on local optimization because of its low running time. Moreover, as we assume that there is spatial and temporal coherence between the sequential frames used for registration, local optimization converges after a few iterations (Sumner; Schmid; Pauly, 2007).
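As a minimal illustration of the local deterministic optimizers listed above, the following Python sketch applies Gauss-Newton to a hypothetical toy problem: fitting y = a·exp(b·x) with two parameters. The model and data are unrelated to the registration energies. The sketch only shows the mechanics of one Gauss-Newton iteration, which builds and solves the normal equations JᵀJ δ = −Jᵀr, and why a good initial guess leads to convergence in a few iterations. In registration, temporal coherence between frames provides such a guess.

```python
import math


def gauss_newton_exp(xs, ys, a, b, iterations=10):
    """Fit y = a * exp(b * x) by Gauss-Newton, starting from the guess (a, b)."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations J^T J delta = -J^T r.
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y                  # residual of this sample
            j = (e, a * x * e)             # partial derivatives w.r.t. a and b
            for u in range(2):
                jtr[u] += j[u] * r
                for v in range(2):
                    jtj[u][v] += j[u] * j[v]
        # Solve the 2x2 system with Cramer's rule.
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        da = (-jtr[0] * jtj[1][1] + jtr[1] * jtj[0][1]) / det
        db = (-jtr[1] * jtj[0][0] + jtr[0] * jtj[1][0]) / det
        a, b = a + da, b + db
    return a, b
```

Starting near the solution, for example from (1.9, 0.48) for data generated with a = 2 and b = 0.5, the iteration converges to the true parameters within a few steps. A distant initial guess may instead lead to a local minimum or to divergence.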

Surfaces such as faces, hands and bodies may undergo deformation during a 3D reconstruction process, for instance, and rigid registration cannot handle it. One solution is to apply non-rigid registration to align such deformable objects.

One of the first works in the field of fast non-rigid registration applied to computer graphics is Embedded Deformation (ED), a real-time deformation algorithm for object manipulation and the creation of 3D animation (Sumner; Schmid; Pauly, 2007). The goal of this technique is to allow intuitive surface editing by the user while preserving the surface's features. Deformation is represented by a graph. Each node of this graph is associated with an affine transformation that influences the deformation of the nearby space. The great advantage of this approach is that it can be applied to a wide range of objects, articulated or not.

Although its main goal is user-driven object manipulation, the algorithm proposed by Sumner et al. can also be seen as a non-rigid registration algorithm in which the source and target surfaces are the object before and after user manipulation. In this sense, many other works have used or improved this approach for the specific problem of non-rigid surface registration.

Li et al. adapted Sumner's algorithm to the registration of partial range scans acquired from a 3D scanner (Li; Sumner; Pauly, 2008). They augmented the ED algorithm with a rigid registration step and designed an energy function to penalize unreliable correspondences. Later on, Li et al. extended this approach (Li; Sumner; Pauly, 2008) into an algorithm for high-quality template-based non-rigid surface registration and reconstruction that uses dynamic graph refinement and multi-frame stabilization (Li et al., 2009).

Li et al. presented a method for the temporally coherent completion of surfaces captured from real-time dynamic performances (Li et al., 2012). They extended the non-rigid registration proposed in their previous work (Li et al., 2009) by adding texture constraints to the optimization. Dou and colleagues proposed an algorithm to track dynamic objects acquired from real-time commodity depth cameras, such as the Microsoft Kinect sensor (Dou; Fuchs; Frahm, 2013). Basically, they extended the KinectFusion algorithm (Izadi et al., 2011) to deal with non-rigid registration. Their non-rigid registration is based on the ED algorithm, with color consistency and dense point cloud alignment added to the original energy function. All these approaches improve the accuracy of the ED algorithm, but they require execution times on the order of minutes to register two point clouds. Thus, they are not suitable for an AR application.

Few methods have achieved real-time performance in 3D non-rigid registration. Chen et al. proposed a method for non-rigid registration of skeletons captured from the user's body (Chen; Izadi; Fitzgibbon, 2012). Their approach runs at 30 frames per second (FPS), but it uses a small number of constraints for registration and depends on a skeleton definition.

Nutti et al. proposed a method to track tumors based on the patient's body position, which presumes prior knowledge about the scenario (Nutti et al., 2014). Their algorithm runs at 10 FPS using a multi-threaded CPU implementation of (Li et al., 2009).

Zollhofer et al. proposed a method for real-time non-rigid registration of arbitrary meshes captured from the real scene (Zollhofer et al., 2014). Relying on hardware specialized for high-quality surface acquisition, their approach generates a 3D template model of the object of interest and uses a hierarchical non-rigid registration algorithm fully implemented on the GPU. The implementation runs at 30 FPS with high accuracy.

In this work, we present an approach that is also based on the ED algorithm and shares some characteristics with (Zollhofer et al., 2014), such as requiring no special configuration or prior knowledge of the object and exploiting GPU parallelism to achieve real-time performance. However, no special hardware is assumed; on the contrary, our approach is based on a simple off-the-shelf RGB-D sensor, with noise and low accuracy. As proposed in related work (Li et al., 2009), we use an adaptive graph refinement to improve non-rigid registration accuracy. Unlike other approaches, the algorithm proposed here runs entirely on the GPU and is based on a quadtree which operates over the 2D projection of the object to be registered. Also, the main goal of our algorithm is to be incorporated in a MAR environment, as a tool to improve tracking of the deformable object.

2.3 SUMMARY

Augmented reality is a technology which has been used in several fields, such as medicine and entertainment. For some applications, markerless technology is useful to remove the intrusiveness of traditional marker-based approaches. When the object used as the basis for markerless tracking is deformable, it is desirable for the application to support non-rigid motion to improve tracking robustness.

Many methods inspired by the ED algorithm have been proposed for accurate 3D non-rigid registration; however, only a few of them support real-time performance, and these still require prior knowledge about the scenario. To overcome this limitation, in this thesis we propose an alternative method for fast 3D non-rigid registration which extends the ED algorithm by using a three-level adaptive approach implemented entirely on the GPU.

Chapter 3

MARKERLESS AUGMENTED REALITY ENVIRONMENT

The focus of this thesis is to add support for non-rigid tracking in a markerless augmented reality environment. Therefore, in this chapter we present the environment in which the proposed non-rigid registration was applied and validated.

In this chapter we present the MAR environment on which this work is based. An overview of the proposed MAR environment can be seen in Figure 3.1. An RGB-D sensor is used to capture color and depth information of the scene. The object of interest is localized, segmented from the scene and reconstructed in real-time. Then, real-time tracking is performed by using the previously reconstructed 3D reference model and the current 3D object captured by the sensor. The final registered 3D object is integrated into the 3D reference model to account for new viewpoints or changes in the object's shape due to deformations. A detailed explanation of the environment is given in the next sections of this chapter.

3.1 SURFACE ACQUISITION

In this environment, an RGB-D sensor is used to capture color and depth information from the real scene for every input frame (Figure 3.1). Color information is encoded as a color map, an image which stores for each pixel the red, green and blue intensities of the captured scene. Depth information is encoded as a depth map (D), an image which stores for each pixel the distance (i.e. depth) from the corresponding 3D point in the scene to the depth sensor.

Our approach is based on a low-cost RGB-D sensor which provides noisy depth data. As described in Section 2.2, unwanted points on the captured surface may reduce registration accuracy. To minimize this problem, a bilateral filter is applied over D (Tomasi; Manduchi, 1998), as shown in Equation 3.1. To reduce noise while preserving features (i.e. discontinuities) of the raw depth data, this technique uses a non-linear combination of nearby image intensities based on geometric proximity and photometric similarity.

Figure 3.1 Overview of the proposed approach from 3D reference model reconstruction to the tracking solution. Adapted from (Souza; Macedo; Apolinario, 2014).

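$$D_f(\mathbf{p}) = \frac{1}{W(\mathbf{p})} \sum_{\mathbf{q} \in S} G_{\sigma_d}(\|\mathbf{p} - \mathbf{q}\|)\, G_{\sigma_c}(\|D(\mathbf{p}) - D(\mathbf{q})\|)\, D(\mathbf{q}),$$
$$W(\mathbf{p}) = \sum_{\mathbf{q} \in S} G_{\sigma_d}(\|\mathbf{p} - \mathbf{q}\|)\, G_{\sigma_c}(\|D(\mathbf{p}) - D(\mathbf{q})\|) \tag{3.1}$$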

where D(p) and D(q) are the pixel values at positions p and q in image D, σd and σc are the standard deviations of the Gaussian functions G for the space (i.e. distance) and range (i.e. depth value) domains, respectively, W(p) is a normalization factor, S is the neighbourhood of pixel p and Df is the filtered depth map. Based on empirical tests, we set σd = 4.5 and σc = 30.
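Equation 3.1 can be illustrated with the following plain-CPU Python sketch. It is not the GPU implementation used in this environment. The sketch makes three assumptions: the depth map is a list of rows with depth in millimetres, the neighbourhood S is a square window with a hypothetical `radius` clipped at the image borders, and the Gaussians are left unnormalized because their constants cancel in W(p). The default σd = 4.5 and σc = 30 are the values given above.

```python
import math


def gaussian(x, sigma):
    # Unnormalized Gaussian; the normalization constant cancels out in W(p).
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def bilateral_filter(depth, sigma_d=4.5, sigma_c=30.0, radius=2):
    """Apply Equation 3.1 to a depth map stored as a list of rows."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = norm = 0.0
            # S: square window around (x, y), clipped at the image borders.
            for qy in range(max(0, y - radius), min(h, y + radius + 1)):
                for qx in range(max(0, x - radius), min(w, x + radius + 1)):
                    spatial = gaussian(math.hypot(x - qx, y - qy), sigma_d)
                    rng = gaussian(abs(depth[y][x] - depth[qy][qx]), sigma_c)
                    acc += spatial * rng * depth[qy][qx]
                    norm += spatial * rng        # accumulates W(p)
            out[y][x] = acc / norm
    return out
```

The range term is what preserves discontinuities. At a depth step of 1000 mm, the weight of pixels across the edge is effectively zero, so each side is smoothed only with its own values.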

Unwanted points are also located in the background of the scene; they can be removed from Df by using a depth threshold. In our experiments, we used a threshold of 1.3 meters, assuming that the object of interest is close to the depth sensor.

Two methods can be used to detect and segment the object of interest in the scene (Figure 3.1). The first method relies on a classifier to detect the object on the appropriate map. If the classifier is applied on the color map, intrinsic and extrinsic calibrations must be performed to allow the mapping of the segmented region from the color map to the depth map. In practice, we have tested the approach in some scenarios where the object is the user's head. In these cases, the Viola-Jones face detector (Viola; Jones, 2004) implemented on the GPU is used to locate and segment the face in the color map (Figure 3.1). This detector takes advantage of a representation called the integral image to compute Haar-like features quickly. In an integral image, each pixel contains the sum of the pixels above and to the left of the original position. After the computation of the Haar-like features, a combination of simple classifiers built using the AdaBoost learning algorithm is employed to detect faces in color images (Freund; Schapire, 1995). If the classifier is not available, an alternative method can be used: a 2D bounding box that contains the foreground object is computed from D, and every position outside the bounding box is discarded.

Using the intrinsic calibration, a point cloud P is computed from D. The normal vector (n) of each point p ∈ P is the eigenvector associated with the smallest eigenvalue of a covariance matrix built from the neighbourhood of p (Holzer et al., 2012).
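The normal estimation above can be illustrated with the following Python sketch. It assumes that the neighbourhood of each point is given explicitly. Instead of a full 3 × 3 eigendecomposition, it runs power iteration on trace(C)·I − C. Because the covariance C is symmetric positive semi-definite, the dominant eigenvector of that matrix is the eigenvector of C with the smallest eigenvalue. This technique was chosen here for brevity; a GPU implementation would more likely use a closed-form or library solver. The sign (orientation) of the resulting normal is left undetermined.

```python
def covariance(points):
    """3x3 covariance matrix of a list of 3D points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    c = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                c[i][j] += d[i] * d[j] / n
    return c


def estimate_normal(neighbourhood, iterations=100):
    """Unit normal of a neighbourhood: eigenvector of the smallest covariance eigenvalue."""
    c = covariance(neighbourhood)
    trace = c[0][0] + c[1][1] + c[2][2]
    # M = trace*I - C has eigenvalues trace - lambda_i >= 0, so its dominant
    # eigenvector belongs to the smallest eigenvalue of C.
    m = [[(trace if i == j else 0.0) - c[i][j] for j in range(3)] for i in range(3)]
    v = [1.0, 2.0, 3.0]   # arbitrary start, assumed not orthogonal to the normal
    for _ in range(iterations):
        v = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v
```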

Once the 3D object is obtained for every frame, markerless rigid registration is performed based on the iterative alignment of two consecutive source (Ps) and target (Pt) point clouds captured from the real scene. In fact, Ps is represented by a 3D reference model generated from the object of interest in a previous pose, and Pt is the current point cloud acquired by the depth sensor.

To achieve real-time performance, all the steps of this MAR environment must run on the GPU. Hence, all the algorithms were carefully designed and implemented to exploit the full parallelism provided by the hardware.

3.2 3D REFERENCE MODEL RECONSTRUCTION

To reconstruct the 3D reference model of the object of interest in real-time (Figure 3.1), the KinectFusion algorithm is employed (Izadi et al., 2011; Newcombe et al., 2011). An overview of this algorithm can be seen in Figure 3.2.

Figure 3.2 Overview of KinectFusion’s pipeline (Izadi et al., 2011).

Once the object is detected in the scene, the region that contains it is fixed, and the object is constrained to move only inside this region. A single 3D reference model can then be generated from the different viewpoints captured of the same object. To do so, KinectFusion integrates raw depth data captured from an RGB-D sensor into a 3D grid to produce a high-quality 3D reconstruction of the object of interest. For each voxel, the grid stores the signed distance to the closest surface within a narrow region (i.e. a TSDF, Truncated Signed Distance Function) and a weight that indicates the uncertainty of the surface measurement. This volumetric representation and integration scheme is based on the VRIP algorithm (Curless; Levoy, 1996). To extract the implicit surface of the 3D reconstructed model, zero-crossings (i.e. positions where the TSDF sign changes) are detected on the grid through raycasting.
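The per-voxel integration step can be illustrated as a weighted running average, in the spirit of the VRIP scheme. The following Python sketch is not KinectFusion's code, and it rests on several assumptions. The truncation distance `mu` and the weight cap `max_weight` are hypothetical parameters, and each new measurement is given weight 1. The signed distance is taken along the viewing ray as the measured depth minus the voxel's depth, so it is positive in front of the surface. Voxels far behind the measured surface are treated as unobserved and left unchanged.

```python
def tsdf_update(tsdf, weight, voxel_depth, measured_depth, mu=0.03, max_weight=64.0):
    """Fuse one depth measurement (metres) into a voxel; returns the new (tsdf, weight)."""
    if measured_depth <= 0:
        return tsdf, weight               # no depth measurement for this pixel
    sdf = measured_depth - voxel_depth    # > 0: voxel lies in front of the surface
    if sdf < -mu:
        return tsdf, weight               # far behind the surface: unobserved, skip
    f = min(1.0, sdf / mu)                # truncate and normalize to [-1, 1]
    new_weight = weight + 1.0
    new_tsdf = (tsdf * weight + f) / new_weight
    return new_tsdf, min(new_weight, max_weight)
```

The surface lies where the fused TSDF changes sign, which is what the raycasting step detects. Capping the weight lets the model keep adapting to newer measurements instead of freezing.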

By extracting the reference model in a previous pose and aligning it to the current 3D model captured by the depth sensor, the incremental motion (Trigid) between frames can be estimated. This solution allows accurate markerless tracking without error accumulation, as the high-quality 3D reference model is used as the basis for tracking.

3.3 TRACKING

Rigid motion is estimated by the ICP algorithm described in Section 2.2. Each of the ICP steps was designed to achieve real-time performance while providing good accuracy for the rigid registration. This real-time variant of the algorithm is described as follows:

• Selection of Points: All the points from Ps and Pt are selected for optimization;

• Matching of Points: Corresponding points between Ps and Pt are associated by using projective data association (i.e. reverse calibration) (Rusinkiewicz; Levoy, 2001), which matches the points that are located at the same 2D projection position (i.e. the same pixel in Ds and Dt);

• Weighting of Pairs: A constant weight is assigned to each association;

• Rejection of Pairs: Pairs are rejected if the Euclidean distance between corresponding points is greater than 10 mm or the angle between corresponding normals is greater than 20 degrees;

• Error Metric: The point-to-plane metric (Equation 3.2) is used to guide the optimization;

$$\arg\min_{T_{rigid}} \sum_{\mathbf{p}\ \text{selected}} \left( (T_{rigid}\,\mathbf{p}_s - \mathbf{p}_t) \cdot \mathbf{n}_t \right)^2 \tag{3.2}$$

• Error Minimization: The error metric is minimized by solving the linear system derived from Equation 3.2 with Cholesky decomposition (Chen; Medioni, 1992).
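One common way to obtain a linear system from Equation 3.2 is to linearize the rotation with the small-angle approximation R ≈ I + [ω]×. Each correspondence then yields one linear equation in the six unknowns (ω, t): (p_s × n_t)·ω + n_t·t = (p_t − p_s)·n_t. The normal equations AᵀA x = Aᵀb are finally solved with Cholesky factorization. The following Python sketch shows a single such step. It is a CPU illustration that assumes this linearization; it is not the GPU implementation of the environment, and it omits the weighting and rejection of pairs.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cholesky_solve(a, b):
    """Solve a x = b for a symmetric positive-definite matrix a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s ** 0.5 if i == j else s / l[j][j]
    y = [0.0] * n                          # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n                          # back substitution: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def point_to_plane_step(sources, targets, normals):
    """One linearized point-to-plane step; returns (omega, t)."""
    ata = [[0.0] * 6 for _ in range(6)]
    atb = [0.0] * 6
    for p, q, n in zip(sources, targets, normals):
        row = cross(p, n) + tuple(n)       # coefficients of (omega, t)
        rhs = dot([qi - pi for pi, qi in zip(p, q)], n)
        for i in range(6):
            atb[i] += row[i] * rhs
            for j in range(6):
                ata[i][j] += row[i] * row[j]
    x = cholesky_solve(ata, atb)
    return x[:3], x[3:]
```

For the system to be positive definite, the normals and the moments p × n must span all six directions. For example, points sampled on several non-parallel planes satisfy this; points on a single plane make the system singular.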

The real-time variant of the ICP algorithm uses projective data association to find correspondences. The ICP fails, or does not converge to a correct registration, when there is a large pose variation between consecutive frames. To improve tracking robustness, a real-time pose estimator is used to give a new initial guess to the tracking algorithm when it fails (Figure 3.3). For situations where the object is the user's head, the head pose estimation algorithm proposed by Fanelli et al. was used (Fanelli et al., 2011). However, even with this algorithm, tracking may fail if the user interacts non-rigidly with the application. Non-rigid tracking support can be added by applying a real-time non-rigid surface registration algorithm to align the 3D reference model and the currently captured model, as discussed in the next chapter.

Figure 3.3 Left image: The user moved his face quickly; only a few points remained at the same image coordinates and the ICP failed. Right image: By using the pose estimation algorithm, the problem can be solved (Macedo; Apolinario; Souza, 2013).

3.4 SUMMARY

One solution to provide accurate markerless tracking for an augmented reality environment is to generate a 3D reference model of the object of interest and track it in real-time. The KinectFusion algorithm is used to reconstruct such a model in real-time, and the ICP algorithm is used to track it in the scene by registering the 3D reference model in a previous pose with the current 3D model captured from a depth sensor. To add support for non-rigid tracking, a real-time non-rigid registration algorithm is necessary to maintain user interaction with the application.

Chapter 4

GPU-BASED ADAPTIVE NON-RIGID REGISTRATION

In this chapter we present the proposed adaptive non-rigid registration algorithm and its adaptation to take advantage of the parallelism of the GPU. Our approach is evaluated in terms of accuracy and performance for several datasets. Part of the content of this chapter was published in (Souza; Macedo; Apolinario, 2014).

In this chapter we present the adaptive non-rigid registration algorithm. An overview of the full process to register two point clouds can be seen in Figure 4.1. The non-rigid algorithm builds a deformation graph (G) on Ps to deform it iteratively towards Pt. Each node g ∈ G consists of a point of Ps associated with a 3D affine transformation (i.e. a 3 × 3 matrix R, constrained to be close to a rotation, and a 3D translation vector t) which influences the deformation of the nearby space. The current deformation between Ps and Pt is modelled in terms of an energy function, and a non-linear optimization algorithm is applied to minimize this energy with respect to the affine transformations of G. To reduce the computational cost of the non-linear solver, a sub-sample of Ps is selected as the set of constraints used during optimization. Next, the algorithm iteratively refines G according to the previously measured energy. This refinement is based on a quadtree. The registration stops when the residual error between the deformed Ps and Pt is sufficiently low. To achieve good performance, the full pipeline runs entirely on the GPU, and the non-rigid registration algorithm is applied in a multi-frame manner only when rigid tracking fails.

Our deformation model is inspired by the ED algorithm (Sumner; Schmid; Pauly, 2007). However, we have added a three-level adaptive approach to improve the accuracy and performance of the original solution. Moreover, we have implemented it on the GPU to boost performance even further. The proposed algorithm consists of several stages (Figure 4.1), which are described in the next sections of this chapter.

4.1 DEFORMATION MODEL

Using the deformation graph, a point p is deformed by G according to the following equation:

$$\mathbf{p}' = \sum_{j=1}^{k} w_j(\mathbf{p}) \left[ R_j(\mathbf{p} - \mathbf{g}_j) + \mathbf{g}_j + \mathbf{t}_j \right] \tag{4.1}$$

where the sum runs over the k nearest nodes of p and wj(p) is a weight that measures the influence of node gj on the point.

Figure 4.1 Overview of the proposed approach from the depth map acquisition to the final non-rigid aligned surface.
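Equation 4.1 can be illustrated with the following Python sketch. The weights wj(p) are not defined in this section. As an assumption, the sketch uses the weighting of the original ED algorithm (Sumner; Schmid; Pauly, 2007): wj(p) = (1 − ||p − gj|| / d_max)², normalized to sum to one, where d_max is the distance to the (k+1)-th nearest node. Node matrices are plain nested lists, and all names are illustrative. The graph is assumed to have at least k + 1 nodes.

```python
import math


def mat_vec(r, v):
    return [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]


def deform_point(p, nodes, rotations, translations, k=4):
    """Apply Equation 4.1 to point p using its k nearest graph nodes."""
    # Node indices sorted by distance to p; the (k+1)-th nearest defines d_max.
    order = sorted(range(len(nodes)), key=lambda j: math.dist(p, nodes[j]))
    d_max = math.dist(p, nodes[order[k]])
    nearest = order[:k]
    raw = [(1.0 - math.dist(p, nodes[j]) / d_max) ** 2 for j in nearest]
    total = sum(raw)                      # normalization so the weights sum to one
    out = [0.0, 0.0, 0.0]
    for j, wr in zip(nearest, raw):
        w = wr / total
        local = [p[i] - nodes[j][i] for i in range(3)]
        rotated = mat_vec(rotations[j], local)
        for i in range(3):
            out[i] += w * (rotated[i] + nodes[j][i] + translations[j][i])
    return out
```

Because the weights sum to one, a graph in which every node carries the same global rigid motion (R, tj = R gj − gj + t0) maps p exactly to R p + t0. This is a useful sanity check for any implementation of Equation 4.1.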

To solve the problem of non-rigid registration using this representation, we use three energy functions, Erot, Ereg and Econ (Sumner; Schmid; Pauly, 2007):

• Energy function for rotation (Erot): For a 3 × 3 matrix to represent a rotation in SO(3), it must satisfy six conditions: each of its three columns must have unit length, and all columns must be orthogonal to one another (Grassia, 1998). The squared deviation from these conditions is given by the function Rot(R):

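$$\mathrm{Rot}(R) = (\mathbf{c}_1 \cdot \mathbf{c}_2)^2 + (\mathbf{c}_1 \cdot \mathbf{c}_3)^2 + (\mathbf{c}_2 \cdot \mathbf{c}_3)^2 + (\mathbf{c}_1 \cdot \mathbf{c}_1 - 1)^2 + (\mathbf{c}_2 \cdot \mathbf{c}_2 - 1)^2 + (\mathbf{c}_3 \cdot \mathbf{c}_3 - 1)^2 \tag{4.2}$$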

where c1, c2 and c3 are the column vectors of a given rotation matrix.

The term Erot is defined as the sum of the rotation error over all affine transformations of G:

$$E_{rot} = \sum_{j=1}^{m} \mathrm{Rot}(R_j) \tag{4.3}$$

where m is the number of nodes in G.

• Energy function for regularization (Ereg): To obtain a sufficiently smooth deformation, the affine transformations of adjacent nodes in G must be consistent with each other. Ereg is the sum of the squared distances between each node's transformation applied to its neighbours and the actual transformed neighbour positions:

$$E_{reg} = \sum_{j=1}^{m} \sum_{k \in N(j)} \left\| R_j(\mathbf{g}_k - \mathbf{g}_j) + \mathbf{g}_j + \mathbf{t}_j - (\mathbf{g}_k + \mathbf{t}_k) \right\|_2^2 \tag{4.4}$$

where N(j) consists of all nodes connected to the node gj.

• Energy function for constraints (Econ): This energy function deals directly with Ps and Pt and measures how far apart they are. Econ is the sum of the squared Euclidean distances between the deformed source points and their corresponding points on the target object:

$$E_{con} = \sum_{i=1}^{n} \left\| \mathbf{p}'_i - \mathbf{q}_i \right\|_2^2 \tag{4.5}$$

where qi is the target point corresponding to pi, p′i is pi after deformation (Equation 4.1) and n is the total number of points in Ps.

The total energy function Etot is defined by the following equation:

$$E_{tot} = w_{rot} E_{rot} + w_{reg} E_{reg} + w_{con} E_{con} \tag{4.6}$$

We used wrot = 1, wreg = 10 and wcon = 100 in all our experiments, as suggested in related work (Sumner; Schmid; Pauly, 2007). We tested other weights and alternative strategies for relaxing them at each iteration; however, we did not obtain better results.
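Equations 4.2 to 4.6 can be evaluated as in the following Python sketch. Its inputs are the node positions, their 3 × 3 matrices and translations, a node adjacency list N(j), and the deformed source points with their target correspondences. The sketch only evaluates Etot for a candidate set of transformations. Minimizing it, for example with Gauss-Newton, is not shown. The data layout and function names are illustrative assumptions.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rot_error(r):
    """Equation 4.2: squared deviation of a 3x3 matrix from orthonormal columns."""
    c = [[r[i][j] for i in range(3)] for j in range(3)]   # c[j] is column j
    return (_dot(c[0], c[1]) ** 2 + _dot(c[0], c[2]) ** 2 + _dot(c[1], c[2]) ** 2
            + (_dot(c[0], c[0]) - 1) ** 2 + (_dot(c[1], c[1]) - 1) ** 2
            + (_dot(c[2], c[2]) - 1) ** 2)


def e_rot(rotations):
    return sum(rot_error(r) for r in rotations)                 # Equation 4.3


def e_reg(nodes, rotations, translations, neighbours):
    total = 0.0                                                  # Equation 4.4
    for j, g_j in enumerate(nodes):
        r, t_j = rotations[j], translations[j]
        for k in neighbours[j]:
            local = [nodes[k][i] - g_j[i] for i in range(3)]
            # Where node j's transformation predicts neighbour k should go.
            pred = [_dot(r[i], local) + g_j[i] + t_j[i] for i in range(3)]
            # Where neighbour k actually goes under its own transformation.
            actual = [nodes[k][i] + translations[k][i] for i in range(3)]
            total += sum((a - b) ** 2 for a, b in zip(pred, actual))
    return total


def e_con(deformed, targets):
    return sum(sum((a - b) ** 2 for a, b in zip(p, q))          # Equation 4.5
               for p, q in zip(deformed, targets))


def e_tot(nodes, rotations, translations, neighbours, deformed, targets,
          w_rot=1.0, w_reg=10.0, w_con=100.0):
    return (w_rot * e_rot(rotations)                             # Equation 4.6
            + w_reg * e_reg(nodes, rotations, translations, neighbours)
            + w_con * e_con(deformed, targets))
```

The default weights are the values given above (wrot = 1, wreg = 10, wcon = 100).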

4.2 MATCHING OF POINTS

After object detection and segmentation, points from Ps and Pt are associated. In the MAR environment described in the previous chapter, temporal and spatial coherence between frames is assumed: the rigid registration has already been applied, so Ps and Pt are relatively close to each other. Hence, projective data association (Section 3.3) is used to match the points.

To adapt this step to GPU processing, each GPU thread transforms a single point ps into image coordinates and associates it with the point pt at the same image coordinates.
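The per-thread logic of projective data association can be emulated sequentially in Python, where each loop iteration plays the role of one GPU thread. The sketch makes several assumptions. It uses a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, since the actual sensor calibration is not given here. Target points and normals are stored in dictionaries keyed by pixel. The optional rejection rule of Section 3.3 (10 mm, 20 degrees) is applied with distances in metres and unit normals.

```python
import math


def project(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-space 3D point to an integer pixel (x, y)."""
    x, y, z = p
    return math.floor(fx * x / z + cx + 0.5), math.floor(fy * y / z + cy + 0.5)


def projective_association(sources, source_normals, target_map, target_normals,
                           intrinsics, max_dist=0.010, max_angle_deg=20.0):
    """Return (source index, pixel) pairs matched at the same image coordinates."""
    pairs = []
    cos_limit = math.cos(math.radians(max_angle_deg))
    for i, p in enumerate(sources):            # one GPU thread per source point
        if p[2] <= 0:
            continue                           # invalid depth
        u = project(p, *intrinsics)
        q = target_map.get(u)                  # target point at the same pixel
        if q is None:
            continue
        if math.dist(p, q) > max_dist:         # reject: points too far apart
            continue
        if sum(a * b for a, b in zip(source_normals[i], target_normals[u])) < cos_limit:
            continue                           # reject: normals too different
        pairs.append((i, u))
    return pairs
```

No search is performed at all, and this is what makes the step cheap enough for real-time use. It also explains why the association breaks down under large inter-frame motion, as discussed in Section 3.3.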

4.3 SELECTION OF NODES

After the matching of points, the nodes of G are selected. A quadtree is built on the GPU to perform the selection of nodes based on the 2D projection of G. As the nodes of G are also points in Ps, we can convert them from world to image coordinates by using the same process used to reproject Ps into Ds. Ps may be an object with holes distributed along the surface. In this case, selecting nodes based only on the 2D space may place nodes in regions where there is no depth data. To solve this problem, we use what we call virtual nodes to represent the space where there is no depth data. Virtual nodes allow the quadtree to expand in regions that do contain depth data, even though there is no depth sample at the node's own position. It is worth mentioning that virtual nodes do not have an affine transformation; they are just leaves of the quadtree that can be refined to generate real leaf nodes if necessary. Therefore, we restrict the use of virtual nodes to the first two levels of the quadtree.

To build the quadtree, some information must be stored in GPU memory, such as the level of each node in G and, for each position, whether a node of G exists there, whether that node has children (i.e. is a parent node) and whether a virtual node exists there.

The algorithm can be divided into three steps: the building of the quadtree (Algorithm 1), and the adaptive refinement (Algorithm 2) and collapse (Algorithm 3) of nodes in G.


Algorithm 1 Building a quadtree

1: for each thread of index idx in parallel do
2:   u ← getPixel(idx, currentLevel)
3:   if depth(v(u)) > 0 then
4:     insertNodeInGraph(u)
5:     setLevel(u, currentLevel)
6:   else if currentLevel <= 2 then
7:     insertVirtualNodeInGraph(u)
8:     setLevel(u, currentLevel)
9:   end if
10:  if currentLevel > 1 and hasNode(u) then
11:    parentIdx ← idx/4
12:    u ← getPixel(parentIdx, currentLevel − 1)
13:    removeNodeFromGraph(u)
14:    removeVirtualNodeFromGraph(u)
15:    insertNodeInParentList(u)
16:  end if
17: end for

Figure 4.2 Building of the deformation graph (right) over the source object (left) based on the measured residual error (center).

We build the quadtree in the first iteration of our algorithm. This process is shown as pseudocode in Algorithm 1, and one result is illustrated in Figure 4.2. We call the GPU kernel that selects the nodes once per level, from the first level up to the level required by the user. Each GPU thread computes, in parallel, the position u at which a node may be selected (line 2). To compute u, we need the thread id and the quadtree level currently being processed: the method getPixel maps the thread id to the center of the 2D region that will be represented by the node. If the point at u is visible, it becomes a new node in G (lines 3-5). Otherwise, it may become a new virtual node (lines 6-9). In this way, we allow the quadtree to be refined even in regions where there are just a few points. If the node was selected and the current level is not the first one (line 10), the thread removes the parent node from G, whether it is a real or a virtual node, and inserts it into a parent list, indicating that it has already been expanded (lines 13-15). In this case, getPixel computes the position of the parent node from the previous level in the quadtree hierarchy and the parent thread id (as each parent is expanded into four children, we simply divide the current thread id by 4 to obtain the parent id).
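To make the thread-to-cell mapping concrete, here is a sequential Python/NumPy emulation of Algorithm 1, offered as a sketch rather than the thesis implementation. It assumes (this is not stated in the text) that thread ids are Z-order (Morton) indices, which is one layout under which dividing the id by 4 yields the parent id, and that level l has 2^l × 2^l cells. The names `morton_decode`, `get_pixel` and `build_quadtree` are hypothetical, and the depth map is indexed as depth[y, x].

```python
import numpy as np

def morton_decode(idx):
    """Split a Z-order (Morton) index into cell coordinates (cx, cy).

    Assumption: thread ids are Morton indices, so idx // 4 is the parent cell.
    """
    cx = cy = 0
    bit = 0
    while idx:
        cx |= (idx & 1) << bit   # even bits -> x
        idx >>= 1
        cy |= (idx & 1) << bit   # odd bits -> y
        idx >>= 1
        bit += 1
    return cx, cy

def get_pixel(idx, level, width, height):
    """Center pixel (x, y) of cell idx at a level with 2^level x 2^level cells."""
    cells = 2 ** level
    cx, cy = morton_decode(idx)
    return int((cx + 0.5) * width / cells), int((cy + 0.5) * height / cells)

def build_quadtree(depth, max_level, virtual_levels=2):
    """Sequential emulation of Algorithm 1; each inner iteration plays one GPU thread."""
    height, width = depth.shape
    nodes, virtual, parents, level = set(), set(), set(), {}
    for current in range(1, max_level + 1):          # one kernel launch per level
        for idx in range(4 ** current):
            u = get_pixel(idx, current, width, height)
            if depth[u[1], u[0]] > 0:                 # visible point: real node (lines 3-5)
                nodes.add(u)
                level[u] = current
            elif current <= virtual_levels:           # hole: virtual node (lines 6-9)
                virtual.add(u)
                level[u] = current
            if current > 1 and u in nodes:            # expand the parent (lines 10-16)
                parent = get_pixel(idx // 4, current - 1, width, height)
                nodes.discard(parent)
                virtual.discard(parent)
                parents.add(parent)
    return nodes, virtual, parents, level
```

With a fully valid 64 × 64 depth map and three levels, every level-1 and level-2 cell ends up in the parent list and the 64 level-3 cells remain as nodes; with an empty depth map only the virtual nodes of the first two levels are created.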

Algorithm 2 Refinement of nodes

1: for each thread of index idx in parallel do
2:   u ← getPixel(idx, currentLevel)
3:   if (hasNode(u) or hasVirtualNode(u)) and getLevel(u) = currentLevel then
4:     evaluateEcon(u)
5:     if region around u must be refined then
6:       for each child node at pixel uc do
7:         if depth(v(uc)) > 0 then
8:           insertNodeInGraph(uc)
9:           setLevel(uc, currentLevel + 1)
10:        end if
11:      end for
12:      removeNodeFromGraph(u)
13:      removeVirtualNodeFromGraph(u)
14:      insertNodeInParentList(u)
15:    end if
16:  end if
17: end for

Figure 4.3 Refinement of the deformation graph (right) over the cheeks region of the source object (left) based on the measured residual error (center).

After the quadtree is built, the nodes of G can be refined or collapsed according to the residual error measured in the previous iteration. Refinement is shown as pseudocode in Algorithm 2, and one result is illustrated in Figure 4.3. Again, we call the GPU kernel that refines the nodes once per level, from the first level of the quadtree to the maximum level, so that the nodes are refined in a top-down fashion. Each GPU thread computes its position in the 2D space and checks whether there is a node at this position at the level currently being processed (lines 2-3). If so, we compute the average error over a region C around the node, as explained before (line 4). If this average is above a certain threshold, the node must be refined. For each child position computed from the node position (line 6), we check whether there is a point at that position (line 7); if there is, it becomes a new child node in G (line 8). Finally, the thread removes the refined node from G (lines 12-13) and inserts it into the parent list, indicating that it has already been expanded (line 14).
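A sequential Python/NumPy emulation of Algorithm 2 is sketched below. To keep it short, nodes are keyed by quadtree cell (level, cx, cy) rather than by pixel, and region C is assumed to be the node's own cell; neither choice comes from the text, and `refine_nodes`, `cell_bounds` and `cell_center` are hypothetical names. Iterating over a snapshot of the node sets mimics all threads of one level reading the same state.

```python
import numpy as np

def cell_bounds(cell, shape):
    """Pixel bounds (y0, y1, x0, x1) of cell (level, cx, cy) on an image of the given shape."""
    level, cx, cy = cell
    h, w = shape
    n = 2 ** level
    return cy * h // n, (cy + 1) * h // n, cx * w // n, (cx + 1) * w // n

def cell_center(cell, shape):
    """Center pixel (y, x) of a cell."""
    y0, y1, x0, x1 = cell_bounds(cell, shape)
    return (y0 + y1) // 2, (x0 + x1) // 2

def refine_nodes(nodes, virtual, parents, depth, error, threshold, max_level):
    """Top-down emulation of Algorithm 2 on cells (level, cx, cy)."""
    for current in range(1, max_level):            # children never exceed max_level
        for cell in sorted(nodes | virtual):       # snapshot: all "threads" see the same state
            level, cx, cy = cell
            if level != current:                   # lines 2-3
                continue
            y0, y1, x0, x1 = cell_bounds(cell, depth.shape)
            # Assumption: region C is the node's own cell (evaluateEcon, line 4).
            if error[y0:y1, x0:x1].mean() <= threshold:
                continue
            for dy in (0, 1):                      # the four children (line 6)
                for dx in (0, 1):
                    child = (current + 1, 2 * cx + dx, 2 * cy + dy)
                    if depth[cell_center(child, depth.shape)] > 0:   # lines 7-9
                        nodes.add(child)
            nodes.discard(cell)                    # lines 12-14
            virtual.discard(cell)
            parents.add(cell)
    return nodes, virtual, parents
```

When only the top-left quadrant of a 64 × 64 error map is above the threshold, that quadrant is refined down to 16 level-3 nodes, while the other three level-1 nodes are left untouched.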

Collapsing is shown as pseudocode in Algorithm 3, and one result is illustrated in Figure 4.4. Again, we call the GPU kernel that collapses the nodes once per level, this time from the maximum level of the quadtree up to the root, so that the nodes are collapsed in a bottom-up fashion. Each GPU thread computes its position in the 2D space and checks whether the node at this position has children and is at the level currently being processed (lines 2-3). If so, given a region C around u, we compute the average of the error Econ (Equation 4.5) over all ps ∈ C (line 4). If this average is below a certain threshold, the child nodes in C must be collapsed. To do so, we check whether child nodes exist and whether all of them are leaves (line 6). In that case, they are removed from G (lines 7-9) and C is again represented by the former parent node (lines 10-11).

Algorithm 3 Collapsing of nodes

1: for each thread of index idx in parallel do
2:   u ← getPixel(idx, currentLevel)
3:   if hasChildren(u) and getLevel(u) = currentLevel then
4:     evaluateEcon(u)
5:     if region around u must be collapsed then
6:       if child nodes exist and all of them are leaves then
7:         for each child node at pixel uc do
8:           removeNodeFromGraph(uc)
9:         end for
10:        insertNodeInGraph(u)
11:        removeNodeFromParentList(u)
12:      end if
13:    end if
14:  end if
15: end for
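Algorithm 3 can be emulated sequentially as a sketch. Nodes are keyed by quadtree cell (level, cx, cy), region C is assumed to be the parent's own cell, and "all children are leaves" is read as "none of the four children is itself in the parent list"; these are our assumptions, and `collapse_nodes` and `cell_bounds` are hypothetical names.

```python
import numpy as np

def cell_bounds(cell, shape):
    """Pixel bounds (y0, y1, x0, x1) of cell (level, cx, cy)."""
    level, cx, cy = cell
    h, w = shape
    n = 2 ** level
    return cy * h // n, (cy + 1) * h // n, cx * w // n, (cx + 1) * w // n

def collapse_nodes(nodes, parents, error, threshold, max_level):
    """Bottom-up emulation of Algorithm 3 on cells (level, cx, cy)."""
    for current in range(max_level - 1, 0, -1):    # parents sit one level above their children
        for cell in sorted(parents):               # snapshot of the parent list
            level, cx, cy = cell
            if level != current:                   # lines 2-3
                continue
            y0, y1, x0, x1 = cell_bounds(cell, error.shape)
            # Assumption: region C is the parent's cell (evaluateEcon, line 4).
            if error[y0:y1, x0:x1].mean() >= threshold:
                continue
            children = [(current + 1, 2 * cx + dx, 2 * cy + dy) for dy in (0, 1) for dx in (0, 1)]
            # Line 6: some child exists and none of them has been expanded further.
            if any(c in nodes for c in children) and not any(c in parents for c in children):
                nodes.difference_update(children)  # lines 7-9
                nodes.add(cell)                    # line 10
                parents.discard(cell)              # line 11
    return nodes, parents
```

Because levels are visited bottom-up, a subtree that became accurate everywhere collapses one level per pass until its top-level parent is a leaf again.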

4.4 WEIGHTING THE INFLUENCE OF NODES

In this step, the influence of the k nearest nodes on each point ps is computed. The weight wj of node gj on a point p is given by:

$$w_j(p) = \left(1 - \frac{\lVert p - g_j \rVert}{dist_{max}}\right)^2 \qquad (4.7)$$


Figure 4.4 Collapsing of the deformation graph (right) over the cheeks region of the source object (left) after the residual error is updated (center).

The weights are then normalized to sum to one. Here, distmax is the distance from p to its (k + 1)-th nearest node. Since the weight in Equation 4.7 decreases as the distance to the node grows, nearer nodes have more influence on the deformation of p. Also, as the nodes are themselves points of Ps, they are deformed by other nodes of G.
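Equation 4.7 and the normalization can be checked with a small NumPy sketch for a single point. It uses a brute-force nearest-node search, which is an assumption made for brevity (the text does not say how neighbors are found on the GPU), and it requires at least k + 1 nodes so that distmax is defined; `node_weights` is a hypothetical name.

```python
import numpy as np

def node_weights(p, nodes, k=4):
    """Equation 4.7 for one point p: indices of its k nearest nodes and normalized weights.

    Brute-force search (assumption); needs at least k + 1 nodes so that distmax exists.
    """
    d = np.linalg.norm(nodes - p, axis=1)
    order = np.argsort(d)
    knn = order[:k]
    dist_max = d[order[k]]                 # distance to the (k + 1)-th nearest node
    w = (1.0 - d[knn] / dist_max) ** 2     # Equation 4.7
    return knn, w / w.sum()                # normalize to sum to one
```

For nodes at distances 1 to 5 from p and k = 4, distmax = 5 and the raw weights are 0.64, 0.36, 0.16 and 0.04 before normalization, so the nearest node dominates.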

To compute the weights efficiently on the GPU, we create an array that contains only the selected nodes. Direct access to this array spares us from explicitly checking, on the surface, whether a point is also a node. Then, each GPU thread computes the influence of a specific node in G.
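The text does not say how this array of selected nodes is built. One common GPU technique is stream compaction with an exclusive prefix sum over a per-pixel node flag; the NumPy sketch below uses np.cumsum in place of a parallel scan and a Python loop in place of one thread per flagged pixel. Both the choice of technique and the name `compact_nodes` are assumptions.

```python
import numpy as np

def compact_nodes(node_mask, points):
    """Gather the points flagged in node_mask into a dense (num_nodes, dim) array.

    Assumption: stream compaction via an exclusive prefix sum (np.cumsum stands in
    for a parallel GPU scan).
    """
    flags = node_mask.ravel().astype(np.int64)
    slots = np.cumsum(flags) - flags              # exclusive prefix sum: output slot per pixel
    flat_points = points.reshape(-1, points.shape[-1])
    out = np.empty((int(flags.sum()), flat_points.shape[1]), dtype=points.dtype)
    for i in np.flatnonzero(flags):               # one "thread" per flagged pixel
        out[slots[i]] = flat_points[i]
    return out, slots
```

Because every flagged pixel knows its own output slot from the scan, the writes are independent and need no synchronization.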

4.5 SELECTION OF CONSTRAINTS

To compute the best affine transformations that align Ps and Pt, we must:

1. Select the constraints (i.e. points from Ps that will be used during the optimization phase);

2. Convert the affine rotations from Euler to quaternion representation;

3. Compute the energy function Etot (Equation 4.6) that models the constraints to guide the proper registration of the objects;

4. Use a non-linear solver to minimize Etot.

Instead of using the full dense point cloud as constraints for the optimization, or asking the user to select the constraints, we use an adaptive algorithm that selects constraints based on the previously measured residual error (Equation 4.6). Within a region of the source surface, the higher the error, the more points are selected as constraints for the optimization, as can be seen in Figure 4.5.

In the first iteration of the optimization algorithm, when the residual error has not yet been measured, a uniform sampling is used to select the constraints. To do that, an n × n mask, moved with step n, is scanned through the 2D projection of Ps in xy coordinates.


Figure 4.5 Constraint selection based on the initial non-rigid error between source and target surfaces (panels: source surface, target surface, initial error on a color scale from 0 to max, selected constraints).

The point at the center of this mask is selected as a constraint if it exists in Ps (i.e. it is not in a hole). In empirical tests, n = 4 produced the best results. A discussion of the most appropriate value for n is given in Chapter 5, Section 5.2.

In the remaining iterations of the optimization, we use the same n × n mask to scan the 2D projection of Ps and its residual error Etot (Equation 4.6). First, the algorithm evaluates the average residual error over the n × n region being scanned. This average, which we call Eavg, is compared with a pre-defined threshold thc to define the number of points selected in that region. There are three cases:

1. If Eavg > thc, all the n² points are selected;

2. If thc/2 ≤ Eavg ≤ thc, n points uniformly distributed over the mask are selected;

3. If Eavg < thc/2, only the point at the center of the mask is selected.

Therefore, we select more constraints in regions where the deformation is high and must be minimized, while still representing regions where the deformation is small or absent with a small number of constraints. In empirical tests, setting thc to half of the average root mean squared error measured for the dataset produced the best results.
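The three cases above, together with the first-iteration uniform sampling, can be sketched in Python as follows. The text does not define which n points are "uniformly distributed over the mask"; we assume a regular sub-grid with stride √n, which yields exactly n points when n is a perfect square, such as the n = 4 used here. `select_constraints` is a hypothetical name, and blocks that do not fit entirely inside the image are ignored in this sketch.

```python
import math
import numpy as np

def select_constraints(valid, error, n=4, thc=None):
    """Return (row, col) constraint pixels; thc=None means first-iteration uniform sampling.

    valid marks pixels that exist in Ps (False = hole); error is the per-pixel residual.
    """
    h, w = valid.shape
    s = math.isqrt(n)       # assumption: sqrt(n) x sqrt(n) sub-grid gives the "n points"
    picked = []
    for y0 in range(0, h - n + 1, n):          # partial border blocks are skipped here
        for x0 in range(0, w - n + 1, n):
            center = [(y0 + n // 2, x0 + n // 2)]
            if thc is None:
                cand = center
            else:
                e_avg = float(error[y0:y0 + n, x0:x0 + n].mean())
                if e_avg > thc:                                # case 1: all n^2 points
                    cand = [(y0 + i, x0 + j) for i in range(n) for j in range(n)]
                elif e_avg >= thc / 2:                         # case 2: n points
                    cand = [(y0 + i, x0 + j) for i in range(0, n, s) for j in range(0, n, s)]
                else:                                          # case 3: center only
                    cand = center
            picked.extend(c for c in cand if valid[c])         # skip holes in Ps
    return picked
```

On an 8 × 8 map with n = 4 and thc = 1, a block with average error 2.0 contributes 16 constraints, a block at 0.7 contributes 4, and error-free blocks contribute only their center.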

4.6 ERROR MINIMIZATION

In this stage, the affine transformation A = [R|t], where R is a 3 × 3 rotation matrix and t is a 3D translation vector, is estimated for each node by a non-linear Gauss-Newton solver using the previously selected constraints.

After the selection of constraints, we convert the affine rotations from Euler to quaternion representation. The motivation is our non-linear solver, which operates faster with quaternions (three unknowns, with the scalar component w fixed to 1) than with the rotation matrix form (nine unknowns).
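The sketch below shows one way to obtain the three unknowns per node: convert Euler angles to a unit quaternion, rescale it so that w = 1, and rebuild the rotation matrix from the three remaining components. The ZYX (yaw-pitch-roll) convention and the function names are assumptions, since the text does not state which Euler convention is used. Note that fixing w = 1 cannot represent rotations of exactly 180°, where w = 0.

```python
import numpy as np

def euler_to_quaternion(roll, pitch, yaw):
    """Unit quaternion (w, x, y, z) for R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    cr, sr = np.cos(roll / 2), np.sin(roll / 2)
    cp, sp = np.cos(pitch / 2), np.sin(pitch / 2)
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    return np.array([cr * cp * cy + sr * sp * sy,
                     sr * cp * cy - cr * sp * sy,
                     cr * sp * cy + sr * cp * sy,
                     cr * cp * sy - sr * sp * cy])

def three_param(q):
    """Scale q so that w = 1 and keep (x, y, z): the three rotation unknowns of a node."""
    return q[1:] / q[0]            # undefined for 180-degree rotations (w = 0)

def rotation_from_three_param(v):
    """Rotation matrix of the unnormalized quaternion (1, v); dividing by |q|^2 normalizes it."""
    w, (x, y, z) = 1.0, v
    n2 = w * w + x * x + y * y + z * z
    return np.array([
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), w * w - x * x - y * y + z * z],
    ]) / n2
```

Because the matrix is divided by |q|², any three values of (x, y, z) produce a valid rotation, so the solver can update them freely without a unit-norm constraint.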

To store the affine transformations to be estimated, we create two arrays: one stores six parameters for each node (three from the quaternion and three for the translation), and the other is a hash that maps each node in G to the location of its parameters in the first array. We compute the array and the hash elements using atomic operations on the GPU.
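The following sketch emulates the two arrays on the CPU. The atomic counter is modeled by a plain integer (on the GPU, each thread would obtain its slot from an atomic increment of a global counter, in an unpredictable order), and the hash is a Python dict from node pixel to slot. The names `allocate_parameters`, `node_params` and `PARAMS_PER_NODE` are hypothetical.

```python
import numpy as np

PARAMS_PER_NODE = 6   # 3 quaternion components (w fixed to 1) + 3 translation components

def allocate_parameters(node_pixels):
    """Give each node the next free slot, as an atomic counter would, and record it in a hash."""
    counter = 0                           # stands in for a global counter in GPU memory
    slot_of = {}                          # hash: node pixel -> row in params
    for pixel in node_pixels:             # on the GPU: one thread per node, arbitrary order
        slot_of[pixel] = counter          # an atomic increment returns the old counter value
        counter += 1
    params = np.zeros((counter, PARAMS_PER_NODE))
    return params, slot_of

def node_params(params, slot_of, pixel):
    """Look up a node's (quaternion vector part, translation) through the hash."""
    row = params[slot_of[pixel]]
    return row[:3], row[3:]
```

Since the slot order depends on thread scheduling, the hash is what lets later kernels find a node's parameters without assuming any particular ordering.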

Given Etot, we must solve the optimization step to obtain the affine transformations that align Ps to Pt. To achieve this goal, we use the Gauss-Newton solver (Madsen; Bruun; Tingleff, 2004). Our objective is to solve the normal equation $J^T J \Delta = -J^T r$. We compute the residual vector r, whose entries are the terms of Etot evaluated for the x, y and z coordinates of each constraint, and the Jacobian J, which holds the first-order partial derivatives of these residuals with respect to each parameter. Δ is the update of the unknown parameters that minimizes Etot at the current linearization. To compute J efficiently, we compute only the partial derivatives with respect to the parameters that affect the constraint being differentiated. Given J and r, we write the normal equation as the linear system $A\Delta = -b$, with $A = J^T J$ and $b = J^T r$. After solving this linear system, we add Δ to the array of parameters (i.e. quaternions and translation vectors) and repeat the optimization until the maximum number of iterations is reached or the error stabilizes (changes by no more than 5%).

On the GPU, r and J are computed in parallel. A and b are computed with the matrix-matrix and matrix-vector multiplication routines of the CUBLAS library (Nvidia, 2008). The linear system is solved with the GPU implementation of the $LL^T$ (Cholesky) decomposition proposed by Henry (2009), together with the triangular solver Strsm from the CUBLAS library.
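The Gauss-Newton loop described above can be sketched on the CPU with NumPy. This is a generic dense sketch, not the thesis GPU code: np.linalg.cholesky stands in for the GPU $LL^T$ decomposition, np.linalg.solve stands in for the Strsm triangular solves, and the toy problem (recovering a 2D rotation angle and translation from exact correspondences) is our own example rather than Etot. The stopping rule follows the text: a maximum number of iterations, or an error change of at most 5%.

```python
import numpy as np

def gauss_newton(residual_and_jacobian, x0, max_iters=10, rel_tol=0.05):
    """Minimize ||r(x)||^2 by repeatedly solving (J^T J) delta = -J^T r via Cholesky."""
    x = np.asarray(x0, dtype=float)
    prev_err = None
    for _ in range(max_iters):
        r, J = residual_and_jacobian(x)
        err = float(r @ r)
        if prev_err is not None and abs(prev_err - err) <= rel_tol * prev_err:
            break                              # error changed by at most rel_tol: stabilized
        prev_err = err
        A = J.T @ J                            # dense product (CUBLAS in the text)
        b = J.T @ r
        L = np.linalg.cholesky(A)              # A = L L^T
        y = np.linalg.solve(L, -b)             # L y = -b (a trsm routine would exploit triangularity)
        delta = np.linalg.solve(L.T, y)        # L^T delta = y
        x = x + delta                          # add the update to the parameter array
    return x

# Toy problem (our own): recover angle a and translation (tx, ty) with R(a) p + t = q.
P = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

Q = P @ rot(0.4).T + np.array([0.5, -0.2])

def residual_and_jacobian(x):
    a, tx, ty = x
    r = (P @ rot(a).T + np.array([tx, ty]) - Q).ravel()   # layout: [x0, y0, x1, y1, ...]
    d_rot = np.array([[-np.sin(a), -np.cos(a)], [np.cos(a), -np.sin(a)]])
    J = np.zeros((r.size, 3))
    J[:, 0] = (P @ d_rot.T).ravel()            # d r / d a
    J[0::2, 1] = 1.0                           # d r_x / d tx
    J[1::2, 2] = 1.0                           # d r_y / d ty
    return r, J

x_est = gauss_newton(residual_and_jacobian, [0.0, 0.0, 0.0], max_iters=20)
```

Each row of this toy Jacobian depends on only a few parameters, which mirrors the text's remark that only the derivatives of parameters affecting a constraint need to be computed.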

4.7 UPDATING THE SOURCE OBJECT

The affine transformations computed in the previous step are applied to Ps according to Equation 4.1, and the algorithm returns to its second step until the maximum number of iterations is reached (we use three iterations to limit processing time). Each GPU thread applies Equation 4.1 to one point ps of the source object.
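Equation 4.1 itself is not reproduced in this section. Assuming it takes the standard Embedded Deformation form (Sumner; Schmid; Pauly, 2007), p' = Σj wj(p) [Rj (p − gj) + gj + tj], the per-point update could be sketched with NumPy as below, vectorized over points instead of using one GPU thread per point; `deform_points` is a hypothetical name.

```python
import numpy as np

def deform_points(points, nodes, rotations, translations, knn, weights):
    """Assumed ED-style blend: p' = sum_j w_j [R_j (p - g_j) + g_j + t_j].

    points (N, 3); nodes (M, 3); rotations (M, 3, 3); translations (M, 3);
    knn (N, k) node indices per point; weights (N, k), each row summing to one.
    """
    out = np.zeros_like(points)
    for j in range(knn.shape[1]):                   # accumulate one neighbor slot at a time
        idx = knn[:, j]
        g = nodes[idx]
        local = np.einsum("nab,nb->na", rotations[idx], points - g) + g + translations[idx]
        out += weights[:, j:j + 1] * local
    return out
```

With identity rotations and zero translations every point stays in place, and a common translation of all nodes moves every point by the same amount, because the weights sum to one.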

4.8 MULTI-FRAME NON-RIGID TRACKING

To add support for non-rigid tracking, one solution is to apply it whenever rigid tracking fails, enhancing the robustness of the MAR environment. However, applying non-rigid tracking at every frame has a computational cost that makes it unsuitable for real-time applications. Consequently, if rigid tracking keeps failing in consecutive frames, non-rigid tracking is used more frequently, reducing user interactivity.

To solve this problem, we take advantage of the volumetric representation of the KinectFusion algorithm to update the 3D reference model in real time based on the currently measured deformation. When rigid tracking fails (i.e. the measured error is above a certain threshold), non-rigid registration is applied and the deformed surface of the 3D reference model is sent to KinectFusion's grid with a high weight. The 3D reference model is updated in the grid representation by the TSDF computation, and the grid is then raycast to generate a new source surface for the next iteration. The high weight allows the previously stored 3D reference model to adapt quickly to the new deformed shape. As a consequence, by deforming the 3D reference model, non-rigid tracking converges faster and with higher accuracy in the next iterations than the rigid-only solution (i.e. the one in which only rigid tracking is applied and KinectFusion's volume is not updated).
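To illustrate why a high weight speeds up the adaptation of the reference model, the sketch below applies a KinectFusion-style per-voxel weighted running average. It is a simplification under stated assumptions: the signed distances are already truncated, NaN marks voxels without a measurement, and the weight cap `max_weight` and the name `integrate_tsdf` are hypothetical.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, new_sdf, new_weight, max_weight=64.0):
    """Weighted running average of (already truncated) signed distances, per voxel.

    new_sdf holds NaN where the current surface gives no measurement; those voxels
    are left untouched. max_weight is a hypothetical cap on the accumulated weight.
    """
    fused, acc = tsdf.copy(), weight.copy()
    m = ~np.isnan(new_sdf)
    fused[m] = (weight[m] * tsdf[m] + new_weight * new_sdf[m]) / (weight[m] + new_weight)
    acc[m] = np.minimum(weight[m] + new_weight, max_weight)
    return fused, acc

# A voxel at 0.0 with accumulated weight 32 receives the deformed-model value 1.0.
for w_new in (1.0, 8.0, 16.0):
    f, _ = integrate_tsdf(np.array([0.0]), np.array([32.0]), np.array([1.0]), w_new)
    print(w_new, round(float(f[0]), 3))
```

With an accumulated weight of 32, the voxel moves to 1/33 ≈ 0.03 with weight 1, to 8/40 = 0.2 with weight 8 and to 16/48 ≈ 0.33 with weight 16, so a higher weight pulls the stored surface toward the deformed one in fewer frames.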

4.9 SUMMARY

In this chapter we have presented a fast method for non-rigid registration that is able to register, with high accuracy, two noisy point clouds captured from a depth sensor. We have proposed an adaptive strategy for node distribution and constraint selection. In this context, it is fundamental to evaluate the algorithm in a real MAR environment, in order to validate its tracking robustness over many frames as well as its average accuracy and performance; this is done in the following chapters.

Chapter 5

NON-RIGID REGISTRATION EVALUATION

In this chapter, non-rigid registration is evaluated in terms of accuracy and performance for several datasets.

We first describe the experimental setup and then analyse the accuracy and performance of the proposed algorithm. In the tests, we compare our results with those of related work, in particular the ED algorithm.

5.1 METHODOLOGY

For all tests, we ran our algorithm on an Intel® Core™ i7-3770 CPU @ 3.50 GHz with 8 GB of RAM and an NVIDIA GeForce GTX 660. Kinect is used as the RGB-D sensor due to its accessibility and low cost (Cruz; Lucio; Velho, 2012). It consists of a structured-light depth sensor (IR emitter and camera), an RGB camera, an accelerometer, a motor and a multi-array microphone. Both cameras operate at 30 Hz and deliver 640×480 images. Although the sensor provides depth maps in real time, the depth data is noisy and inaccurate.

To implement the approach proposed in this thesis, we have used several libraries and toolkits. Their role in our approach is illustrated in Figure 5.1. OpenNI was used to capture the depth and color streams provided by the Kinect sensor (Occipital, 2015). Object detection and segmentation were done with the OpenCV library (Bradski; Kaehler, 2008). We have implemented 3D reference model reconstruction and non-rigid registration on the GPU using the NVIDIA CUDA architecture (Kirk; Hwu, 2010). We also used the open-source C++ implementation of KinectFusion released by the PCL project (Rusu; Cousins, 2011).

The 3D reference model was reconstructed with KinectFusion using a grid with a volume size of 70 cm × 70 cm × 140 cm and a resolution of 512³, as suggested in related work (Macedo et al., 2014). The non-rigid registration optimization takes ≈ 20 ms per iteration. Therefore, to achieve real-time performance (≥ 15 frames per second), we have chosen to use only three iterations of the optimization. As the optimization converges quickly, such a small number of iterations still provides a good balance between accuracy and performance. As each dataset has its own minimum and maximum errors, we set the thresholds for adaptive node and constraint selection to half of the average root mean squared error measured.

Figure 5.1 Overview of the libraries used for each step of our approach (pipeline stages: Kinect live stream, image processing, reference model reconstruction, non-rigid registration).

Outside the MAR environment, we have tested the proposed non-rigid registration algorithm on four different datasets, which can be seen in Figure 5.2:

I Synthetic dataset: used to perform a ground-truth evaluation of the non-rigid registration on objects free from noise and holes. This dataset contains models with 10k points;

II Real dataset with high precision and low noise: used to evaluate the non-rigid registration on objects with a low level of noise. This dataset was used by Weise et al. and consists of a deforming hand with 80k points (Weise; Leibe; Gool, 2007). Although this is not the kind of data we will find in the markerless AR environment, it is a real dataset common in the literature. Therefore, it was used to compare our approach with ED on a common model;

III Real datasets with medium precision and noise: used to evaluate the non-rigid registration on objects with noise and holes. The source and target surfaces were captured by our markerless AR environment. This scenario contains two different datasets: a user deforming his face by smiling (III-1) and by inflating his cheeks (III-2). These two datasets have objects with 30k and 40k points, respectively.


Figure 5.2 Datasets used for evaluation of the non-rigid registration algorithm (panels: source and target surfaces for each dataset). I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks.

In the tests comparing our approach with the ED algorithm, both used the same number of nodes in the first iteration. While the number of nodes did not change in the ED algorithm, in our approach it changed according to the error reduction. We have used Etot as the measure for refinement/collapse of nodes. The following evaluations compare three algorithms: ED implemented on the GPU (GPU-ED), our approach based only on adaptive refinement of nodes (AdNodes), in which all the points are selected as constraints, and our full approach based on adaptive refinement of nodes and constraints (AdNodes + AdCons).

5.2 ACCURACY EVALUATION

Figure 5.3 shows the final error distribution obtained by AdNodes + AdCons for the datasets of Figure 5.2. For the synthetic dataset (top of the figure), the only deformation is a semi-sphere located at the center of the object; in this situation, our algorithm achieved a high accuracy of 1 mm. For the second surface in Figure 5.3, the hand deformation starts at the fingers, where the error is highest; the algorithm reduced the average error below 2 mm. The bottom surfaces were captured with the Kinect. For the first of them, the user was asked to deform his face by smiling in front of the Kinect; moreover, the user moved his face slightly away from the camera. Therefore, the model has a high error, since it was both deformed and rigidly translated. For the bottom surface, the user was asked to deform his face by inflating his cheeks, so the main deformation error is located in the cheeks region. In both cases, the algorithm achieved an accuracy of 2.6 mm.


Figure 5.3 The resulting color-coded error from the registration between source and target surfaces (columns: source surface, target surface, initial error, final error; color scale from 0 to 10 mm). In all situations the proposed algorithm AdNodes + AdCons obtained an average accuracy below 3 mm and a standard deviation below 3.5 mm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks.

Figure 5.4 Accuracy (in mm) obtained by AdNodes and AdNodes + AdCons in comparison with the Embedded Deformation (ED) algorithm and the initial error for each of the datasets used.

The accuracy improvement of AdNodes + AdCons over the ED algorithm can be seen in Figure 5.4. AdNodes + AdCons obtained better accuracy than ED because of the adaptive selection of nodes, which redistributes the nodes in the deformation space, increasing their number in regions where the residual error is high and decreasing it otherwise. Another way to improve accuracy is to select more constraints for the non-linear solver, which, however, decreases the performance of the algorithm. This trade-off can be seen for AdNodes, which uses all points as constraints, in Figures 5.4 and 5.11.

Figure 5.5 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset II (panels: target object, source object, objects registered by Embedded Deformation with 7 and 33 nodes, object registered with adaptive node selection using 19 nodes; color scale from 0 to 10 mm).

Figure 5.6 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset III-1 (panels: target object, source object, objects registered with 16 and 64 nodes, object registered with adaptive node selection using 20 nodes; color scale from 0 to 10 mm).

A visual comparison between AdNodes and ED can be seen in Figures 5.5 and 5.6. With adaptive node selection, the accuracy is comparable to that of the ED algorithm using two to three times as many nodes.

An accuracy evaluation with respect to AdCons can be seen in Table 5.1 and in Figures 5.7 and 5.8. By using adaptivity instead of uniform sampling with a fixed step size, non-rigid registration achieves accuracy comparable to that obtained by using all the points of the source object as constraints (i.e. step size 1), while running at a frame rate close to that of the coarser, less accurate fixed step sizes (e.g. step size 4). However, for the adaptivity to perform properly, we still must define the value of n for the n × n mask used to scan the 2D projection of Ps. Based on Table 5.1, step size 4 produces good results for uniform sampling with a fixed step size. Therefore, we use this value for n, aiming to combine the accuracy of dense sampling with the performance of sparse fixed-step sampling.

Figure 5.7 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset II (panels include: target surface, source surface, surface registered with constraint sampling factor 1, surface registered with adaptive constraint sampling).

An accuracy evaluation with respect to the number of nodes that influence the deformation of a given point (k) is illustrated in Figure 5.9. As stated in previous work (Sumner; Schmid; Pauly, 2007), k = 4 is a good choice for this deformation problem. Higher values of k may restrict the deformation space of G, because too many nodes then influence each region of Ps.

The influence of each quadtree level on the accuracy of AdNodes can be seen in Figure 5.10. As the number of levels l increases, more nodes are selected (at most 4^l) and the accuracy improves. From the tests conducted, three quadtree levels are needed for building and refinement to register two objects accurately.

Figure 5.8 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset III-1 (panels include: target surface, source surface, surface registered with adaptive constraint sampling).

Dataset   I                        II                       III-1                    III-2
Sampling  C      A     SD    P     C      A     SD    P     C      A     SD    P     C      A     SD    P
1         10K    0.44  0.1   15    86.3K  0.53  0.66  3     30K    2.53  2.44  6     41K    3     3.97  7
2         2.5K   0.55  0.3   22    21.5K  0.66  0.86  5     7.7K   3.14  3.15  12.5  10K    3.2   4.52  12.5
4         625    0.7   0.52  33    5.3K   0.85  1.06  8     2K     4.8   4.28  22    2.5K   3.8   5.27  17
8         156    0.87  0.78  33    1.3K   1.05  1.22  12.5  494    7.36  4.91  33    644    4.6   5.85  22
16        42     1.12  0.95  33    336    1.26  1.19  15    130    9.6   5.25  33    168    5     5.56  22
32        12     1.28  1.1   33    84     1.44  1.58  15    30     11    6.72  33    42     5.5   6.12  33
Adap      1.7K   0.47  0.26  27    19K    0.68  0.66  9     9K     2.7   2.56  17    8.5K   2.59  3.37  17

Table 5.1 Number of constraints (C), accuracy (A, in mm), standard deviation (SD, in mm) and performance (P, in FPS) according to the step size (from 1 to 32) or sampling scheme (Adap for adaptive) used to select constraints for optimization.

Figure 5.9 Accuracy (in mm) related to the parameter k for each of the datasets used.

Figure 5.10 Accuracy (in mm) obtained for each level of the quadtree and for each of the datasets used. The maximum number of nodes for a level l is 4^l.

5.3 PERFORMANCE EVALUATION

In terms of performance, a comparison between the algorithms can be seen in Figure 5.11. As the graph shows, AdNodes + AdCons does not run at full real-time rates, but it achieves 15 FPS in the real cases, half of the frame rate considered ideal for real-time applications in computer graphics. Nevertheless, it is up to three times faster than the ED algorithm.


Figure 5.11 Performance (in FPS) obtained by AdNodes and AdNodes + AdCons in comparison with the ED algorithm for each of the datasets used.

The use of adaptivity for constraint selection greatly reduces the processing time originally demanded by the ED algorithm (Table 5.1, step size 1). Optimization is a common bottleneck in non-rigid registration algorithms (Sumner; Schmid; Pauly, 2007; Li et al., 2009), and the number of selected constraints directly determines the time required by the optimization phase. Therefore, by adaptively reducing the number of constraints, we achieve good performance even in the optimization phase. Moreover, as the error is minimized over the surface, nodes are dynamically removed from G; with fewer parameters to estimate, the optimization algorithm converges faster.

On dataset II, the ED algorithm performs better than AdNodes. This can be explained by the number of nodes used: in this case, AdNodes did not change the initial number of nodes much. With almost the same number of nodes, ED is faster than AdNodes because it neither builds nor refines the quadtree.

We also analysed the performance cost of each step of AdNodes + AdCons by measuring the average processing time of each step over the four datasets. The results can be seen in Figure 5.12. The most time-consuming step is the non-linear optimization, which requires 30 ms (10 ms per iteration). It actually consists of several steps: matrix-matrix and matrix-vector multiplications, computation of J, $LL^T$ decomposition and linear solving. Adaptive selection of nodes and constraints requires only 5 ms. Therefore, the performance gain of our approach comes from the reduced dimensionality of the optimization problem, which is directly related to the size of G and the number of constraints selected.

Figure 5.12 Performance (in ms) obtained by our approach for each of the most computationally expensive methods. MM - Matrix multiplication ($A = J^T J$); Jacobian - computation of J; Cholesky - $LL^T$ decomposition; Solver - linear solver Strsm from the CUBLAS library; ACS - Adaptive Constraint Selection; ANS - Adaptive Node Selection; Weights - computation of the influence of G on Ps; MV - Matrix-vector multiplication ($b = J^T r$).

As J is a sparse matrix, one way to improve the performance of the matrix product would be to use a GPU sparse matrix product, for example from the CUSPARSE library (Nvidia, 2014). However, in the tests conducted, the sparsity of J was not sufficiently high (< 90%), and the dense CUBLAS matrix product ran faster than the CUSPARSE-based one.
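A rough bound suggests why the sparsity stays moderate. Assuming, as in the weighting step, that each data-term residual row depends only on the k nearest nodes of its constraint, and that each node carries six parameters, a row of J has at most 6k nonzero entries out of 6|G| columns. This is our own estimate, valid only for those rows; the regularization rows and the measured density of J are not given in the text.

```latex
\text{fraction of zeros in a data row} \;\ge\; 1 - \frac{6k}{6\,|G|} \;=\; 1 - \frac{k}{|G|},
\qquad k = 4,\ |G| = 20:\quad 1 - \frac{4}{20} = 0.8
```

With k = 4 and about 20 nodes (the adaptive node count reported for dataset III-1 in Figure 5.6), the bound only guarantees about 80% zeros in these rows, so a sparsity below 90% is plausible when G holds a few tens of nodes.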

5.4 DISCUSSION

Based on Table 5.1 and Figure 5.11, our algorithm is up to three times faster and about 1.5 to 2 times more accurate than the traditional ED algorithm implemented on the GPU. Adaptive node and constraint selection proved useful in this context, improving the performance of the original ED by 2 to 6 times while keeping the registration accurate. Also based on Table 5.1, we highlight that our algorithm achieved promising results on real noisy datasets: performance is improved 2 to 3 times over the scenario where all points are selected as constraints, with minimal (dataset III-1) or no (dataset III-2) loss in accuracy. The stability of the algorithm is reinforced by its low standard deviation compared to the other scenarios evaluated.

The focus of our approach is to add non-rigid tracking support to a MAR environment. We take advantage of this scenario, in which there is temporal and spatial coherence and the deformation between consecutive frames is expected to be small, to find correspondences with a simple projection algorithm. This matching algorithm does not compromise our results, since our goal is to minimize the deformation between consecutive frames, which we assume to be predominantly small for every input frame. Moreover, to boost the application's performance and reach full real-time performance, the algorithm does not need to be applied to every frame when the current error is sufficiently low.

5.5 SUMMARY

In this chapter we have evaluated the non-rigid registration algorithm and compared it against related work. Using four different datasets, we have shown that the proposed adaptive non-rigid registration outperforms the GPU implementation of the ED algorithm in terms of both accuracy and performance.

Chapter 6

NON-RIGID SUPPORT EVALUATION FOR A MARKERLESS AUGMENTED REALITY ENVIRONMENT

In this chapter, non-rigid tracking is evaluated in the context of the markerless augmented reality environment in terms of accuracy, performance and tracking robustness for several datasets.

We describe the datasets used and analyse the accuracy, performance and tracking robustness of the proposed algorithm in the markerless augmented reality environment.

6.1 METHODOLOGY

The same hardware described in Chapter 5, Section 5.1 is used in the following tests. In the MAR environment, we have tested the approach in a scenario where the user's head is our natural marker. As simple non-rigid interactions, we asked the user to perform three different facial expressions after the 3D reconstruction: inflating his cheeks, smiling and simulating a "kiss" expression, as shown in Figure 6.1.

Moreover, to evaluate the proposed environment in challenging deformation scenarios, we have tested the algorithm with different objects and under different deformation conditions. First, we have tested the same expressions with a different user (Figure 6.2) to evaluate the proposed approach on different faces. We use the terms Cheeks-2, Smile-2 and Kiss-2 to denote these expressions and to distinguish them from the ones shown in Figure 6.1. Next, we have tested different deformations: open mouth and angry facial expressions, and a deformation applied to a bag, as can be seen in Figure 6.3. Compared to the scenarios presented in Figure 6.1, these deformations pose additional challenges for the non-rigid registration algorithm:

• Open mouth expression poses a challenging scenario for point matching, because corresponding points become too distant from each other during the deformation. Also, there is a large hole in the deformed model, which makes matching even more difficult;


Figure 6.1 Neutral and deformed reference models based on the user's facial expression (panel labels: as-rigid-as-possible, expression; cheeks inflated, smile, kiss).

• Angry expression has a higher tracking error than the open mouth expression, but it introduces fewer holes in the deformed model. In this case, the user not only performed the facial expression but also rigidly rotated his head in front of the sensor. Therefore, the environment must handle both the rigid tracking required to solve the rigid motion and the non-rigid tracking required to solve the user's non-rigid facial expression;

• Deformed bag presents a high error and is an object different from a face. Therefore, this dataset is fundamental to evaluate the robustness of the algorithm for distinct objects.

6.2 EVALUATION

In this section, the deformation scenarios presented in Figures 6.1, 6.2 and 6.3 are evaluated in the context of the MAR environment.

Figure 6.2 Neutral and deformed reference models for a different user (panel labels: As-rigid-as-possible, Expression, Cheeks Inflated, Smile, Kiss).

As explained in Chapter 4, we need to update the 3D reference model to minimize the use of the non-rigid registration algorithm. One way to accomplish this is to re-integrate the deformed 3D reference model into the grid with a high weight. As explained in Section 3.2, the KinectFusion algorithm integrates raw depth data into a grid based on the TSDF computation and a weight that indicates uncertainty. The higher the weight, the faster the shape of the 3D reference model is updated based on the current measurement. Therefore, to accommodate the current deformation and to stabilize the tracking faster, a high weight must be used to update the 3D reference model. We tested the influence of this update on the tracking accuracy, and the results are shown in Table 6.1. While weight 1 does not update the shape of the 3D reference model quickly, accuracy stabilizes with weights between 8 and 16. We used weight 8 for all the other tests performed in this section because it provides more stable results than weight 16 (see the standard deviations in Table 6.1). The exception is the scenario in which the accumulated error is too high (in our tests, the bag deformation). In this case, a weight higher than 8 is required to minimize the estimated error.
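The role of the weight can be made concrete with the weighted running average that KinectFusion-style fusion applies to every voxel (Curless and Levoy, 1996; Newcombe et al., 2011). The sketch below is a minimal per-voxel illustration, not the thesis's GPU implementation. The function names, the weight cap `max_weight`, the initial accumulated weight and the tolerance are assumptions chosen for the example.

```python
def fuse_voxel(tsdf, weight, measurement, new_weight, max_weight=64.0):
    """Fuse one new signed-distance sample into a voxel (weighted running average).

    tsdf, weight: value and accumulated weight currently stored in the grid.
    measurement: signed distance derived from the incoming (deformed) data.
    new_weight: weight given to the incoming sample (1 means plain averaging).
    max_weight: cap on the accumulated weight (assumed value).
    """
    fused = (weight * tsdf + new_weight * measurement) / (weight + new_weight)
    # Capping the accumulated weight keeps the voxel able to follow later changes.
    return fused, min(weight + new_weight, max_weight)


def frames_to_converge(new_weight, old_value=0.0, new_value=1.0,
                       initial_weight=10.0, tol=0.1):
    """Frames needed for a voxel holding old_value to move within tol of new_value."""
    tsdf, weight, frames = old_value, initial_weight, 0
    while abs(tsdf - new_value) > tol:
        tsdf, weight = fuse_voxel(tsdf, weight, new_value, new_weight)
        frames += 1
    return frames


if __name__ == "__main__":
    for w in (1, 2, 4, 8, 16):
        print(f"weight {w:2d}: {frames_to_converge(w)} frames")
```

With these assumed values, a voxel fused with weight 8 reaches the new surface in far fewer frames than one fused with weight 1. This is the qualitative effect behind the faster stabilization reported in Table 6.1.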

From the tests conducted on all the cases mentioned at the beginning of this section in which the 3D reference model is a face, we estimated an average accuracy of 1.5 mm for rigid tracking during the reconstruction of the rigid 3D reference model. For the bag, the average accuracy was 2 mm for the same step.

As can be seen in Table 6.2, when non-rigid user interaction is present, the average accuracy of rigid tracking decreases. We tested different scenarios for non-rigid registration to find the multi-frame strategy that best balances accuracy and performance. Applying non-rigid registration every few frames (e.g., NR4, NR8) is a good strategy, whereas applying it to almost every frame (e.g., NR1, NR2) reduces performance and is sometimes unnecessary. Likewise, applying it only after a large number of frames (e.g., NR16, NR32, NR64) slightly improves the average tracking accuracy, while the application's performance stays almost the same as that of the rigid solution. However, if a large deformation occurs between these frames, the tracking fails (i.e., the measured error rises above a predefined threshold used to detect rigid tracking failure). Applying non-rigid registration whenever rigid tracking fails (NRAdaptive) solves every deformation that occurs between frames while maintaining good accuracy, even for the bag scenario, when the relative error reduction with respect to the rigid solution is considered.
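The strategies compared in Table 6.2 differ only in the rule that decides, frame by frame, whether non-rigid registration runs. A minimal sketch of these rules follows. The policy names and the 2 mm default threshold are illustrative assumptions (threshold values are evaluated in Table 6.3), and the error trace in the usage block is hypothetical.

```python
def should_run_nonrigid(frame_index, rigid_error_mm, policy, threshold_mm=2.0):
    """Decide whether non-rigid registration runs on this frame.

    policy: "rigid" (never), "adaptive" (NRAdaptive: only when rigid tracking
    is flagged as failed) or an integer n (NRn: every n-th frame, regardless
    of the rigid tracking error).
    """
    if policy == "rigid":
        return False
    if policy == "adaptive":
        return rigid_error_mm > threshold_mm
    return frame_index % policy == 0


def count_invocations(rigid_errors_mm, policy, threshold_mm=2.0):
    """How many frames of a sequence would trigger non-rigid registration."""
    return sum(should_run_nonrigid(i, e, policy, threshold_mm)
               for i, e in enumerate(rigid_errors_mm))


if __name__ == "__main__":
    # Hypothetical per-frame rigid tracking errors (mm), for illustration only.
    errors = [1.6, 1.7, 2.4, 1.8, 1.7, 2.6, 1.9, 1.8]
    for policy in ("rigid", 1, 4, "adaptive"):
        print(policy, count_invocations(errors, policy))
```

This simplification treats the error trace as fixed. In the real pipeline, each non-rigid update also changes the reference model and therefore lowers the subsequent rigid errors.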

When the 3D reference model is continuously updated in a case where the deformed region is small, the model becomes increasingly smooth with each frame. In this case, continuous updating may not be the most accurate solution, because the 3D reference model loses information in regions where there is no deformation. Table 6.2 shows this effect in the tests conducted on the “kiss”, “cheeks-2” and “open mouth” expressions, where non-rigid registration applied every 1 or 2 frames did not produce the best results. This issue can be minimized by using the adaptive approach.

Figure 6.3 Neutral and deformed reference models based on challenging deformation scenarios (panel labels: Rigid Object, Open Mouth, Angry, Deformed Bag).

Figure 6.4 Cheeks tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

The evolution of the tracking error can be seen in Figures 6.4, 6.6, 6.8, 6.10, 6.12, 6.14, 6.16, 6.18 and 6.20. When there is significant non-rigid user interaction, the error grows considerably and the non-rigid solution minimizes it. The 3D reference model is then updated to stabilize the tracking based on the current deformation. Non-rigid registration and 3D reference model updating are performed only when the deformation changes in intensity (i.e., the error rises above the threshold, shown as a dashed line) and the rigid tracking fails.
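The failure test described above compares a per-frame tracking error with a threshold. This section does not spell out the error formula. The sketch below therefore assumes a mean point-to-point distance (in mm) between corresponding model and input points; a point-to-plane metric would be an equally plausible choice. The function names are hypothetical.

```python
import math


def mean_registration_error_mm(model_points, input_points):
    """Mean Euclidean distance (mm) between corresponding 3D points.

    model_points[i] is assumed to correspond to input_points[i].
    """
    if not model_points or len(model_points) != len(input_points):
        raise ValueError("expected two equally sized, non-empty point lists")
    total = sum(math.dist(p, q) for p, q in zip(model_points, input_points))
    return total / len(model_points)


def rigid_tracking_failed(model_points, input_points, threshold_mm):
    """Flag a frame for non-rigid registration when the error exceeds the threshold."""
    return mean_registration_error_mm(model_points, input_points) > threshold_mm
```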

TSDF weight   Cheeks        Smile         Kiss
1             2.4 / 0.15    2.23 / 0.19   1.90 / 0.17
2             2.2 / 0.14    2.09 / 0.16   1.83 / 0.16
4             2 / 0.1       2.03 / 0.14   1.81 / 0.16
8             1.9 / 0.1     1.99 / 0.12   1.75 / 0.15
16            1.9 / 0.1     1.99 / 0.13   1.75 / 0.16

TSDF weight   Cheeks-2      Smile-2       Kiss-2
1             2.43 / 0.37   2.07 / 0.24   2.74 / 0.3
2             2.34 / 0.27   1.95 / 0.17   2.57 / 0.23
4             2.36 / 0.17   1.9 / 0.16    2.43 / 0.15
8             2.35 / 0.15   1.85 / 0.14   2.42 / 0.12
16            2.4 / 0.22    1.85 / 0.14   2.72 / 0.60

TSDF weight   Open Mouth    Angry         Bag
1             2.11 / 0.29   4.02 / 0.67   13.17 / 5.12
2             2.01 / 0.23   3.49 / 0.62   10.75 / 4.92
4             1.95 / 0.14   3.07 / 0.37   8.49 / 2.35
8             1.91 / 0.18   2.95 / 0.25   6.91 / 1.56
16            2.29 / 0.41   2.95 / 0.26   6.80 / 1.03

Table 6.1 Average accuracy (A, in mm) and standard deviation (SD, in mm) according to the weight used to update the 3D reference model; each cell gives A / SD.

A test to determine the best threshold for detecting rigid tracking failure was performed for the simple deformation scenarios shown in Figures 6.1 and 6.2, and the results can be seen in Table 6.3. As mentioned before, rigid tracking has an average accuracy of 1.5 mm. Therefore, when this value is used as the threshold, the algorithm applies non-rigid tracking to almost every new frame. Conversely, with a threshold of 3 mm, the algorithm uses rigid tracking almost exclusively. The best threshold therefore lies between 2 mm and 2.5 mm, which provides fast and accurate tracking. For the challenging deformation scenarios shown in Figure 6.3, the tracking error caused by the deformation is much higher than in the simple scenarios, so with a 2 mm threshold the non-rigid registration algorithm would be applied to almost every input frame. For this reason, we chose an appropriate threshold for each of these datasets to validate our approach. In our tests, thresholds of 2 mm for the open mouth expression, 3 mm for the angry expression and 7 mm for the bag deformation allowed our approach to achieve the best results.
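The threshold choice can be phrased as a small selection problem: among the thresholds that keep the application in real time, take the one with the lowest average error. The sketch below encodes two columns of Table 6.3. The 20 FPS real-time bound and the selection rule itself are assumptions for illustration, not a procedure stated in the thesis.

```python
# Threshold (mm) -> (average accuracy in mm, performance in FPS),
# copied from Table 6.3 for two of the simple deformation scenarios.
CHEEKS = {3.0: (2.8, 28), 2.5: (2.3, 21), 2.0: (1.9, 20), 1.5: (1.8, 12)}
CHEEKS_2 = {3.0: (2.69, 28), 2.5: (2.35, 24), 2.0: (2.31, 10), 1.5: (2.26, 10)}


def pick_threshold(results, min_fps=20):
    """Most accurate threshold among those that keep at least min_fps."""
    realtime = {t: acc for t, (acc, fps) in results.items() if fps >= min_fps}
    if not realtime:
        raise ValueError("no threshold reaches the requested frame rate")
    return min(realtime, key=realtime.get)


if __name__ == "__main__":
    print(pick_threshold(CHEEKS), pick_threshold(CHEEKS_2))
```

With these values the rule selects 2 mm for cheeks and 2.5 mm for cheeks-2. Both fall within the 2 to 2.5 mm range identified in the text.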

In terms of visual quality and accuracy, Figures 6.5, 6.7, 6.9, 6.11, 6.13, 6.15, 6.17, 6.19 and 6.21 show that the algorithm captures the main deformation present in the deformed expressions throughout the sequence of frames. This improves accuracy in regions where rigid registration alone cannot solve the tracking. In this context, our main contribution is that the non-rigid registration algorithm runs in real time, which allows its application in an AR environment.

As a pre-processing step, the 3D reference model is reconstructed at 30 frames per second (FPS). When applied, non-rigid registration takes 60 ms per frame. The most time-consuming step in each frame is the non-linear optimization, which takes 45 ms per frame on average.
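A back-of-the-envelope estimate helps relate the 60 ms cost to the frame rates in Table 6.2. The sketch makes two simplifying assumptions. First, the non-rigid step adds its 60 ms on top of a 30 FPS rigid frame. Second, this cost is spread evenly when the step runs once every n frames.

```python
RIGID_FRAME_MS = 1000.0 / 30.0  # rigid tracking and reconstruction at 30 FPS
NONRIGID_MS = 60.0              # cost of one non-rigid registration (this section)


def expected_fps(every_n_frames):
    """Average frame rate if non-rigid registration runs once every n frames.

    Assumes the non-rigid cost is added on top of the rigid frame time and
    is amortized evenly over the n frames.
    """
    average_ms = RIGID_FRAME_MS + NONRIGID_MS / every_n_frames
    return 1000.0 / average_ms


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"NR{n}: {expected_fps(n):.1f} FPS")
```

Under these assumptions the estimate is about 10.7 FPS for NR1 and about 24.5 FPS for NR8. These values are close to the measured 10 FPS and 22 to 24 FPS reported for the cheeks, smile and kiss sequences in Table 6.2.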


Tracking      Cheeks              Smile               Kiss
Rigid         3 / 0.34 / 30       2.72 / 0.33 / 30    2.18 / 0.15 / 30
NR64          2.63 / 0.2 / 30     2.66 / 0.29 / 30    2.1 / 0.16 / 29
NR32          2.5 / 0.16 / 28     2.53 / 0.25 / 28    1.99 / 0.18 / 27
NR16          2.48 / 0.17 / 26    2.42 / 0.25 / 27    1.9 / 0.14 / 26
NR8           2.11 / 0.21 / 22    2.17 / 0.21 / 24    1.75 / 0.2 / 24
NRAdaptive    1.9 / 0.1 / 20      1.99 / 0.12 / 20    1.75 / 0.15 / 27
NR4           1.87 / 0.17 / 18    2.11 / 0.15 / 21    1.67 / 0.14 / 20
NR2           1.73 / 0.15 / 14    1.99 / 0.13 / 15    1.75 / 0.14 / 15
NR1           1.7 / 0.19 / 10     1.96 / 0.28 / 10    1.92 / 0.14 / 10

Tracking      Cheeks-2            Smile-2             Kiss-2
Rigid         3.74 / 0.6 / 30     2.54 / 0.48 / 30    3.09 / 0.48 / 30
NR64          3.45 / 0.5 / 30     2.45 / 0.42 / 28    3.03 / 0.46 / 30
NR32          3.22 / 0.44 / 26    2.34 / 0.37 / 26    2.97 / 0.42 / 28
NR16          2.84 / 0.41 / 24    2.21 / 0.31 / 25    2.88 / 0.38 / 27
NR8           2.43 / 0.38 / 22    1.96 / 0.21 / 24    2.73 / 0.3 / 24

Tracking      Open Mouth          Angry               Bag
Rigid         2.41 / 0.34 / 30    5.19 / 0.91 / 30    11.3 / 3.59 / 30
NR64          2.16 / 0.31 / 30    4.70 / 0.85 / 30    9.43 / 4.26 / 30
NR32          2.07 / 0.30 / 27    4.34 / 0.86 / 30    14.19 / 6.25 / 27
NR16          1.98 / 0.25 / 26    4.02 / 0.95 / 27    13.58 / 5.45 / 17
NR8           1.94 / 0.24 / 21    3.42 / 0.53 / 22    11.40 / 5.97 / 12

Table 6.2 Average accuracy (Avg., in mm), standard deviation (Std. Dev., in mm) and performance (Perf., in FPS) for each of the tracking algorithms tested in the presence of specific user deformations; each cell gives Avg. / Std. Dev. / Perf. NRn: non-rigid registration applied every n frames (independently of rigid tracking failure); NRAdaptive: non-rigid registration applied whenever the rigid algorithm fails.


Figure 6.5 Color-coded cheeks tracking error measured for both rigid and non-rigid solutions (rows: rigid tracking error, non-rigid tracking error; frames 8, 20, 50, 80, 101 and 143; color scale from 0 mm to 10 mm).

Figure 6.6 Cheeks-2 tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Threshold (mm)   Cheeks              Smile               Kiss
3                2.8 / 0.21 / 28     2.62 / 0.33 / 30    2.18 / 0.15 / 30
2.5              2.3 / 0.15 / 21     2.28 / 0.36 / 26    2.15 / 0.16 / 30
2                1.9 / 0.1 / 20      1.99 / 0.12 / 20    1.75 / 0.15 / 27
1.5              1.8 / 0.19 / 12     1.96 / 0.28 / 10    1.92 / 0.14 / 10

Threshold (mm)   Cheeks-2            Smile-2             Kiss-2
3                2.69 / 0.25 / 28    2.54 / 0.48 / 30    2.71 / 0.20 / 24
2.5              2.35 / 0.15 / 24    2.21 / 0.28 / 28    2.42 / 0.12 / 20
2                2.31 / 0.14 / 10    1.85 / 0.14 / 26    2.25 / 0.11 / 10
1.5              2.26 / 0.19 / 10    1.61 / 0.09 / 12    2.24 / 0.11 / 10

Table 6.3 Average accuracy (Avg., in mm), standard deviation (Std. Dev., in mm) and performance (Perf., in FPS) for each of the thresholds used to detect rigid tracking failure; each cell gives Avg. / Std. Dev. / Perf.

Figure 6.7 Color-coded cheeks-2 tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).

Figure 6.8 Smile tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Table 6.2 shows that the NRAdaptive approach achieves real-time performance (at least 20 FPS) for almost all the deformations. The exception is the bag, which sends many more points to the optimization than the user’s head, so the non-rigid registration runs slower in this scenario. It is worth mentioning that the algorithm is not applied to every frame, because the 3D reference model is updated based on the present deformation. This reduces the chances of rigid tracking failure in the following iterations. As can be seen in the plots of Figures 6.4, 6.6, 6.8, 6.10, 6.12, 6.14, 6.16, 6.18 and 6.20, the algorithm is applied 21 times (≈ every 8 frames) for the cheeks deformation, 29 times (≈ every 8 frames) for cheeks-2, 66 times (≈ every 2.5 frames) for smile, 8 times (≈ every 15 frames) for smile-2, 16 times (≈ every 10 frames) for kiss, 25 times (≈ every 6 frames) for kiss-2, 73 times (≈ every 2.2 frames) for open mouth, 71 times (≈ every 2.2 frames) for angry and 80 times (≈ every 1.5 frames) for the bag deformation.

Figure 6.9 Color-coded smile tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).

Figure 6.10 Smile-2 tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

A limitation of this adaptive algorithm is that it does not track non-rigid motions in which the 2D projections of corresponding parts of the object are far apart in consecutive frames. Figure 6.22 shows an example of this situation. If the arms undergo a large motion between sequential frames (Figures 6.22-A and C), the projective data association algorithm will not match them, because their corresponding pixels are not close enough. In this case, the rigid 3D reference model is reconstructed from the user’s body (Figure 6.22-B). As the user moves his arms in front of the sensor (Figure 6.22-C), they cannot be tracked properly because of the projective association. As a result, the whole trajectory of the movement is integrated into the 3D reference model (Figure 6.22-D). As stated before, our multi-frame adaptive non-rigid registration solution integrates the current depth data into the 3D reference model when deformation occurs. Therefore, when the algorithm cannot register the object’s movement, the residual error is integrated into the reference model through the TSDF update. This update averages the current 3D reference model implicitly stored in the grid with the current depth data captured by the sensor. Because the topology of the 3D reference model is not updated, the hole that opens between the body and the arms as the arms open (a change of genus) is not transferred to the 3D reference model. Even in this case, the adaptive approach produces better results than rigid registration alone (Figures 6.23 and 6.24).
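The limitation follows directly from how projective data association finds correspondences. A model point is projected into the current depth image and matched only with pixels at, or very near, that location. The sketch below illustrates this mechanism. The pinhole intrinsics (typical Kinect-like values), the set-of-pixels representation and the search `window` are assumptions for illustration. KinectFusion itself uses the single projected pixel (window 0), followed by distance and normal compatibility checks.

```python
def project(point, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Pinhole projection of a camera-space point (x, y, z) to pixel coordinates.

    The intrinsics are typical values for a 640x480 Kinect-like sensor (assumed).
    """
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def projective_match(model_point, valid_pixels, window=1):
    """Pixel of the current depth map associated with model_point, or None.

    valid_pixels: set of (u, v) pixels holding a valid depth value.
    Only pixels within `window` of the projected location are considered,
    so a surface that moved far across the image is never matched.
    """
    u, v = project(model_point)
    u0, v0 = round(u), round(v)  # nearest pixel to the projection
    best, best_d2 = None, None
    for du in range(-window, window + 1):
        for dv in range(-window, window + 1):
            candidate = (u0 + du, v0 + dv)
            d2 = du * du + dv * dv
            if candidate in valid_pixels and (best is None or d2 < best_d2):
                best, best_d2 = candidate, d2
    return best
```

A fast-moving arm whose pixels land far from the projected model points returns `None` for every point, so those points never receive a correspondence. This is the situation shown in Figure 6.22-C.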


Figure 6.11 Color-coded smile-2 tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).

Figure 6.12 Kiss tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.3 SUMMARY

In this chapter we have evaluated the multi-frame adaptive non-rigid registration algorithm in a MAR environment. To validate our approach, tests were performed mainly using the user’s face as a natural marker and the user’s facial expressions as non-rigid interactions. The tests show that the non-rigid registration, applied in a multi-frame manner, can run in real time on consumer hardware. Moreover, it improves the tracking accuracy of the MAR environment when compared to the rigid-only solution or to other real-time non-rigid registration techniques, such as the ED algorithm.


Figure 6.13 Color-coded kiss tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).

Figure 6.14 Kiss-2 tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.15 Color-coded kiss-2 tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).


Figure 6.16 Open mouth tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.17 Color-coded open mouth tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).

Figure 6.18 Angry tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.


Figure 6.19 Color-coded angry tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).

Figure 6.20 Bag tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.21 Color-coded bag tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; color scale from 0 mm to 10 mm).


Figure 6.22 Limitation of the proposed method (panels A to D). The user’s body (A) is reconstructed (B), and the algorithm cannot track the user’s arms (C), integrating all the movement into the 3D reference model (D).

Figure 6.23 Body tracking error measured for both rigid and rigid + non-rigid solutions (x axis: frame; y axis: error in mm). Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.


Figure 6.24 Color-coded body tracking error measured for both rigid and non-rigid solutions (panel: rigid tracking error; frames 50, 150, 300 and 500; color scale from 0 mm to 10 mm).

Chapter 7

CONCLUSION AND FUTURE WORK

In this chapter the thesis is concluded with a summary and a discussion of future directions.

7.1 CONCLUSION

In this thesis we have proposed a new adaptive non-rigid registration algorithm that interactively tracks, with high accuracy, 3D deformable objects captured from the real scene in a markerless augmented reality environment. The approach is robust to noisy datasets and also handles holes.

To validate the proposed approach, we designed a markerless augmented reality environment based on a low-cost RGB-D sensor. To support real-time non-rigid markerless tracking, a graph-based deformation approach is used because of its generality and real-time performance.
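The graph-based deformation referred to here can be illustrated with the embedded deformation model of Sumner et al. (2007). In that model, each graph node g_j carries a rotation R_j and a translation t_j, and a vertex is moved by blending the nodes' transformations. The sketch below is a minimal version of that blending under simplifying assumptions. All nodes within `d_max` influence the vertex (the original formulation uses the k nearest nodes), and the rotations are supplied directly rather than estimated by the optimization. The helper names are hypothetical.

```python
import math


def rotation_z(angle):
    """3x3 rotation about the z axis (rows as tuples)."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def mat_vec(m, v):
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


def node_weights(vertex, nodes, d_max):
    """Weights (1 - |v - g_j| / d_max)^2, clamped at zero and normalized."""
    raw = [max(0.0, 1.0 - math.dist(vertex, g) / d_max) ** 2 for g in nodes]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("vertex is outside the influence of every node")
    return [w / total for w in raw]


def deform_vertex(vertex, nodes, rotations, translations, d_max):
    """v' = sum_j w_j(v) * (R_j (v - g_j) + g_j + t_j)."""
    out = [0.0, 0.0, 0.0]
    for w, g, R, t in zip(node_weights(vertex, nodes, d_max),
                          nodes, rotations, translations):
        local = mat_vec(R, tuple(vertex[i] - g[i] for i in range(3)))
        for i in range(3):
            out[i] += w * (local[i] + g[i] + t[i])
    return tuple(out)
```

Because the weights sum to one, identity rotations with a common translation simply shift the vertex by that translation. Non-rigid behaviour appears when neighbouring nodes carry different transformations.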

The three levels of adaptivity used in the non-rigid registration significantly improve the accuracy and performance of the tracking solution. The adaptive selection of nodes focuses the registration where it is needed. The adaptive selection of constraints reduces the computational cost of the optimization, improving the application’s performance. Moreover, the multi-frame way in which the non-rigid registration is applied keeps real-time performance and improves tracking accuracy when compared to the rigid-only solution.
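The sketch below illustrates why selecting nodes and constraints adaptively shrinks the optimization. It uses a hypothetical rule: a node is active when its local residual exceeds a threshold, and only regularization edges touching an active node are kept. The thesis's actual selection criteria are not reproduced here and may differ. The count of 12 unknowns per node assumes the affine node transformation of embedded deformation (a 3x3 matrix plus a translation).

```python
UNKNOWNS_PER_NODE = 12  # 3x3 matrix + translation per node, as in embedded deformation


def select_active_nodes(node_residuals_mm, threshold_mm):
    """Hypothetical rule: a node is active if its local residual exceeds the threshold."""
    return {j for j, r in enumerate(node_residuals_mm) if r > threshold_mm}


def select_constraints(edges, active):
    """Keep only regularization edges (node pairs) that touch an active node.

    Inactive nodes are assumed to keep their current transformation, so they
    enter the kept edges as constants rather than as unknowns.
    """
    return [(a, b) for a, b in edges if a in active or b in active]


def problem_size(active):
    """Number of unknowns left in the non-linear optimization."""
    return UNKNOWNS_PER_NODE * len(active)


if __name__ == "__main__":
    residuals = [0.5, 3.0, 1.0, 2.5]          # hypothetical per-node residuals (mm)
    edges = [(0, 1), (1, 2), (2, 3), (0, 2)]  # hypothetical deformation graph
    active = select_active_nodes(residuals, 2.0)
    print(sorted(active), select_constraints(edges, active), problem_size(active))
```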

To achieve even better performance, the algorithms were designed in a novel, parallel way so that they can be easily implemented on the GPU, exploiting the parallelism that such hardware provides.

As a result, we have designed a non-rigid registration technique that outperforms existing related work in terms of performance, while keeping moderate accuracy in the final solution. To reach this conclusion, tests were conducted on four datasets characterized by different levels of noise and precision. To validate the deformable registration in the context of the markerless augmented reality environment, tracking was evaluated in several challenging scenarios. The multi-frame way in which the algorithm was applied achieved an average accuracy between 2 and 3 mm, together with real-time performance.


7.2 FUTURE DIRECTIONS

One way to improve performance is to optimize the implementation of the Gauss-Newton non-linear solver, as done in related work (Zollhofer et al., 2014). Since our approach already reduces the size of the problem to be solved, using such an optimized representation for the non-linear solver may further increase the performance of this step.
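For reference, a Gauss-Newton iteration has a fixed structure. It linearizes the residuals, builds J^T J and J^T r, solves the normal equations for the update, and repeats. The toy problem below, estimating a single 2D rotation angle from point pairs, is an assumption chosen so that the normal equations are 1x1. In the non-rigid registration problem, J^T J is a large sparse matrix, and assembling and solving that system is what optimized solvers accelerate.

```python
import math


def rotate(theta, p):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def gauss_newton_angle(src, dst, theta=0.0, iterations=20):
    """Estimate the 2D rotation angle that maps src points onto dst points.

    Residuals r_i(theta) = R(theta) p_i - q_i; each Gauss-Newton step solves
    the (here 1x1) normal equations J^T J delta = -J^T r.
    """
    for _ in range(iterations):
        jtj = jtr = 0.0
        for p, q in zip(src, dst):
            rp = rotate(theta, p)
            r = (rp[0] - q[0], rp[1] - q[1])
            j = (-rp[1], rp[0])  # derivative of R(theta) p with respect to theta
            jtj += j[0] * j[0] + j[1] * j[1]
            jtr += j[0] * r[0] + j[1] * r[1]
        if jtj == 0.0:
            break
        theta -= jtr / jtj
    return theta
```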

Another way to improve performance is to use a hybrid approach that distributes processing between the GPU and a multi-threaded CPU (Nutti et al., 2014). In this case, one must investigate how to balance the processing between these two units, analysing which algorithms perform well on each of them.

To improve accuracy, the matching algorithm can be changed from projective association to project-and-walk or to a k-d tree-based search. However, performance is expected to decrease with either of these methods because they are inherently costly. For instance, the k-d tree must be built and updated for the point clouds to be registered.
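The cost mentioned above is visible even in a minimal k-d tree. The tree must be rebuilt (here by repeated sorting) whenever the point cloud changes. Each query descends the tree and then backtracks into subtrees that may still hold a closer point. The sketch below is a plain Python illustration of an exact nearest-neighbour search, not a GPU-ready structure. The tuple-based node layout is an assumption.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a 3D k-d tree as nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])  # re-sorted at every level
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Exact nearest neighbour of query, pruning subtrees that cannot be closer."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # The far side can only help if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best
```

By contrast, projective association needs no build step and answers each query with a single pixel lookup. This difference is why it is preferred for real-time performance despite the limitation discussed in Chapter 6.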

In its current state, our approach does not support non-rigid registration between non-isometric structures (e.g., faces of different people, or models that have grown) or objects undergoing complex topological changes. State-of-the-art solutions for accurate shape matching can solve the correspondence problem between such structures; however, these methods are far from real-time performance. Hence, there is a need for simplified shape matching algorithms that provide real-time performance suitable for interactive applications, such as augmented reality.

BIBLIOGRAPHY

Akenine-moller, T.; Moller, T.; Haines, E. Real-Time Rendering. 2nd. ed. Natick, MA, USA: A. K. Peters, Ltd., 2002. ISBN 1568811829.

Allen, B.; Curless, B.; Popovic, Z. Articulated body deformation from range scan data. ACM Trans. Graph., ACM, New York, NY, USA, v. 21, n. 3, p. 612–619, Jul. 2002. ISSN 0730-0301.

Azuma, R. et al. Recent advances in augmented reality. IEEE Comput. Graph. Appl., IEEE Computer Society Press, Los Alamitos, CA, USA, v. 21, n. 6, p. 34–47, Nov. 2001. ISSN 0272-1716.

Azuma, R. T. A survey of augmented reality. Presence: Teleoperators and Virtual Environments, v. 6, n. 4, p. 355–385, Aug. 1997.

Bermano, A. H. et al. Facial performance enhancement using dynamic shape space analysis. ACM Trans. Graph., ACM, New York, NY, USA, v. 33, n. 2, p. 13:1–13:12, Apr. 2014. ISSN 0730-0301.

Besl, P.; McKay, N. D. A method for registration of 3-D shapes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, v. 14, n. 2, p. 239–256, Feb. 1992. ISSN 0162-8828.

Bouaziz, S.; Wang, Y.; Pauly, M. Online modeling for realtime facial animation. ACM Trans. Graph., ACM, New York, NY, USA, v. 32, n. 4, p. 40:1–40:10, Jul. 2013. ISSN 0730-0301.

Bradski, D. G. R.; Kaehler, A. Learning OpenCV, 1st Edition. First. [S.l.]: O’Reilly Media, Inc., 2008. ISBN 9780596516130.

Brown, B.; Rusinkiewicz, S. Global non-rigid alignment of 3-D scans. ACM Transactions on Graphics (Proc. SIGGRAPH), v. 26, n. 3, Aug. 2007.

Chang, W.; Zwicker, M. Automatic registration for articulated shapes. In: Proceedings of the Symposium on Geometry Processing. Aire-la-Ville, Switzerland: Eurographics Association, 2008. (SGP ’08), p. 1459–1468.

Chang, W.; Zwicker, M. Global registration of dynamic range scans for articulated model reconstruction. ACM Trans. Graph., ACM, New York, NY, USA, v. 30, n. 3, p. 26:1–26:15, May 2011. ISSN 0730-0301.

Chen, J.; Izadi, S.; Fitzgibbon, A. Kinetre: Animating the world with the human body. In: . New York, NY, USA: ACM, 2012. (UIST ’12), p. 435–444. ISBN 978-1-4503-1580-7.


Chen, Y.; Medioni, G. Object modelling by registration of multiple range images. Image Vision Comput., Butterworth-Heinemann, Newton, MA, USA, v. 10, n. 3, p. 145–155, Apr. 1992. ISSN 0262-8856.

Collins, T. et al. Computer-assisted laparoscopic myomectomy by augmenting the uterus with pre-operative MRI data. In: Mixed and Augmented Reality (ISMAR), 2014 IEEE International Symposium on. [S.l.: s.n.], 2014. p. 243–248.

Cruz, L.; Lucio, D.; Velho, L. Kinect and RGBD images: Challenges and applications. In: Graphics, Patterns and Images Tutorials (SIBGRAPI-T), 2012 25th SIBGRAPI Conference on. [S.l.: s.n.], 2012. p. 36–49.

Curless, B.; Levoy, M. A volumetric method for building complex models from range images. In: . New York, NY, USA: ACM, 1996. (SIGGRAPH ’96), p. 303–312. ISBN 0-89791-746-4.

Dou, M.; Fuchs, H.; Frahm, J.-M. Scanning and tracking dynamic objects with commodity depth cameras. In: . [S.l.]: IEEE Computer Society, 2013. (ISMAR ’13).

Fanelli, G. et al. Real time head pose estimation from consumer depth cameras. In: Proceedings of the 33rd International Conference on Pattern Recognition. [S.l.: s.n.], 2011. (DAGM’11). ISBN 978-3-642-23122-3.

Freund, Y.; Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. In: Proceedings of the Second European Conference on Computational Learning Theory. London, UK: Springer-Verlag, 1995. (EuroCOLT ’95), p. 23–37. ISBN 3-540-59119-2.

Gelfand, N. et al. Robust global registration. In: Proceedings of the Third Eurographics Symposium on Geometry Processing. Aire-la-Ville, Switzerland: Eurographics Association, 2005. (SGP ’05). ISBN 3-905673-24-X.

Grassia, F. S. Practical parameterization of rotations using the exponential map. J. Graph. Tools, A. K. Peters, Ltd., Natick, MA, USA, v. 3, n. 3, p. 29–48, Mar. 1998. ISSN 1086-7651.

Haouchine, N. et al. Single view augmentation of 3D elastic objects. In: Mixed and Augmented Reality (ISMAR), 2014 IEEE International Symposium on. [S.l.: s.n.], 2014. p. 229–236.

Haouchine, N. et al. Image-guided simulation of heterogeneous tissue deformation for augmented reality during hepatic surgery. In: Mixed and Augmented Reality (ISMAR), 2013 IEEE International Symposium on. [S.l.: s.n.], 2013. p. 199–208.

Henry, S. Parallelizing Cholesky’s decomposition algorithm. INRIA Bordeaux Technical Report, 2009.


Holzer, S. et al. Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images. In: Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. [S.l.: s.n.], 2012. p. 2684–2689. ISSN 2153-0858.

Horn, B. K.; Schunck, B. G. Determining optical flow. Artificial Intelligence, v. 17, n. 1–3, p. 185–203, 1981. ISSN 0004-3702.

Izadi, S. et al. KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: . USA: ACM, 2011. (UIST ’11), p. 559–568. ISBN 978-1-4503-0716-1.

Johnson, A.; Hebert, M. Using spin images for efficient object recognition in cluttered 3D scenes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, v. 21, n. 5, p. 433–449, May 1999. ISSN 0162-8828.

Kato, H.; Billinghurst, M. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In: Augmented Reality, 1999. (IWAR ’99) Proceedings. 2nd IEEE and ACM International Workshop on. [S.l.: s.n.], 1999. p. 85–94.

Kirk, D. B.; Hwu, W.-m. W. Programming Massively Parallel Processors: A Hands-on Approach. 1st. ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2010. ISBN 0123814723, 9780123814722.

Lee, J.-D. et al. Medical augment reality using a markerless registration framework. Expert Syst. Appl., v. 39, n. 5, p. 5286–5294, 2012.

Li, H. et al. Robust single-view geometry and motion reconstruction. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2009), ACM, v. 28, n. 5, Dec. 2009.

Li, H. et al. Temporally coherent completion of dynamic shapes. ACM Transactions on Graphics, ACM, v. 31, n. 1, Jan. 2012.

Li, H.; Sumner, R. W.; Pauly, M. Global correspondence optimization for non-rigid registration of depth scans. Computer Graphics Forum (Proc. SGP’08), Eurographics Association, ETH Zurich, v. 27, n. 5, Jul. 2008.

Li, H. et al. Realtime facial animation with on-the-fly correctives. ACM Transactions on Graphics, ACM, v. 32, n. 4, Jul. 2013.

Lipman, Y.; Funkhouser, T. Mobius voting for surface correspondence. ACM Trans. Graph., ACM, New York, NY, USA, v. 28, n. 3, p. 72:1–72:12, Jul. 2009. ISSN 0730-0301.

Lorusso, A.; Eggert, D. W.; Fisher, R. B. A comparison of four algorithms for estimating 3-D rigid transformations. In: Proceedings of the 1995 British Conference on Machine Vision (Vol. 1). Surrey, UK: BMVA Press, 1995. (BMVC ’95), p. 237–246. ISBN 0-9521898-2-8.


Lucas, B. D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1981. (IJCAI’81), p. 674–679.

Macedo, M.; Apolinario, A.; Souza, A. A robust real-time face tracking using head pose estimation for a markerless AR system. In: Virtual and Augmented Reality (SVR), 2013 XV Symposium on. [S.l.: s.n.], 2013. p. 224–227.

Macedo, M. et al. A Semi-Automatic Markerless Augmented Reality Approach for On-Patient Volumetric Medical Data Visualization. In: SVR. Brazil: [s.n.], 2014.

Madsen, K.; Bruun, H.; Tingleff, O. Methods for non-linear least squares problems. 2004.

Milgram, P.; Kishino, F. A taxonomy of mixed reality visual displays. IEICE Trans. Information Systems, E77-D, n. 12, p. 1321–1329, Dec. 1994.

Newcombe, R. A. et al. KinectFusion: Real-time dense surface mapping and tracking. In: . Washington, DC, USA: IEEE Computer Society, 2011. (ISMAR ’11), p. 127–136. ISBN 978-1-4577-2183-0.

Nutti, B. et al. Depth sensor-based realtime tumor tracking for accurate radiation therapy. Proc. of Eurographics 2014 Short Papers, Eurographics Association, Apr. 2014.

Nvidia. CUBLAS Library. 2008. http://docs.nvidia.com/cuda/cublas/. [Online; accessed 06-Oct-2014].

Nvidia. CUSPARSE Library. 2014. https://developer.nvidia.com/cusparse. [Online; accessed 14-Nov-2014].

Occipital. OpenNI. 2015. http://structure.io/openni. Accessed 05 January 2015.

Rusinkiewicz, S.; Levoy, M. Efficient variants of the ICP algorithm. In: 3DIM. [S.l.: s.n.], 2001.

Rusu, R.; Cousins, S. 3D is here: Point Cloud Library (PCL). In: ICRA. [S.l.: s.n.], 2011. p. 1–4. ISSN 1050-4729.

Souza, A. C.; Macedo, M.; Apolinario, A. Multi-Frame Adaptive Non-Rigid Registration for Markerless Augmented Reality. In: VRCAI. China: [s.n.], 2014.

Sumner, R. W.; Schmid, J.; Pauly, M. Embedded deformation for shape manipulation. ACM Trans. Graph., ACM, New York, NY, USA, v. 26, n. 3, Jul. 2007. ISSN 0730-0301.

Tam, G. K. L. et al. Registration of 3D point clouds and meshes: A survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics, IEEE Computer Society, Los Alamitos, CA, USA, v. 19, n. 7, p. 1199–1217, 2013. ISSN 1077-2626.


Teichrieb, V. et al. A survey of online monocular markerless augmented reality. International Journal of Modeling and Simulation for the Petroleum Industry, v. 1, n. 1, p. 1–7, 2007.

Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In: ICCV. [S.l.: s.n.], 1998. p. 839–846.

Vallino, J. R. Interactive Augmented Reality. Rochester, NY, USA, 1998.

Viola, P.; Jones, M. J. Robust real-time face detection. Int. J. Comput. Vision, Kluwer Academic Publishers, Hingham, MA, USA, v. 57, n. 2, p. 137–154, May 2004. ISSN 0920-5691.

Wang, J. et al. Augmented reality during angiography: Integration of a virtual mirror for improved 2D/3D visualization. In: Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on. [S.l.: s.n.], 2012. p. 257–264.

Weise, T. et al. Realtime performance-based facial animation. ACM Transactions on Graphics, ACM, v. 30, n. 4, Jul. 2011.

Weise, T.; Leibe, B.; Gool, L. V. Fast 3D scanning with automatic motion compensation. In: CVPR. [S.l.: s.n.], 2007.

Zollhofer, M. et al. Real-time non-rigid reconstruction using an RGB-D camera. ACM Transactions on Graphics (TOG), ACM, 2014.
