research article marching cubes algorithm for fast 3d

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2013 Article ID 203609 7 pageshttpdxdoiorg1011552013203609

Research ArticleMarching Cubes Algorithm for Fast 3D Modeling ofHuman Face by Incremental Data Fusion

Xiangsheng Huang Xinghao Chen Tao Tang and Ziling Huang

Institute of Automation Chinese Academy of Sciences Beijng 100090 China

Correspondence should be addressed to Xiangsheng Huang xiangshenghuangiaaccn

Received 22 December 2012 Accepted 20 January 2013

Academic Editor Sheng-Yong Chen

Copyright copy 2013 Xiangsheng Huang et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

We present a 3D reconstruction system to realize fast 3D modeling using a vision sensor The system can automatically detect theface region and obtain the depth data as well as color image data once a person appears in front of the sensorWhen the user rotateshis head around the systemwill track the pose and integrate the new data incrementally to obtain a complete model of the personalhead quickly In the system iterative closest point (ICP) algorithm is first used to track the pose of the head and then a volumetricintegration method is used to fuse all the data obtained Third ray casting algorithm extracts the final vertices of the model andfinally marching cubes algorithm generates the polygonal mesh of the reconstructed face model for displaying During the processwe also make improvements to speed up the system for human face reconstruction The system is very convenient for real-worldapplications since it can run very quickly and be easily operated

1 Introduction

3D reconstruction has always been an interesting topic sinceMicrosoft Kinect camera came to use The depth data ofhuman face can be easily acquired using depth cameraHowever it is still difficult to get a perfect whole face modelOne probable method to solve this problem is to obtaindepth maps from different cameras in different directionssimultaneously [1ndash5] Another solution is to get data withonly one camera in different time The next step is to doreconstruction with all the data from different directionsfinally generating a mesh model of human face A lot ofalgorithms for reconstruction have been proposed recentlybut hardly get perfect results Moreover many of them arenot so conveniently or easily used in practice

In this paper we present a system that realizes fast humanface tracking and 3D reconstruction with only one Kinectsensor collecting depth data (Figure 1) The depth data isacquired from the Kinect sensor and is converted to vertexmap and normal map as there is a corresponding relationbetween the 2D depth map coordinate system and the 3Dcamera coordinate system The ICP algorithm is used totrack the relative pose between the current data map and the

previous one and then the transformation matrix frame-to-frame can be estimated Thereby the pose of current camerain the global coordinate can be obtained which is actuallythe human face coordinate system By doing the volumetricintegrating using truncated signed distance function thedepth data of all the frames can be integrated into a volumeand the zero-crossing surface can be extracted for the nextICP tracking The reconstructed 3D model is rendered usingthe marching cubes algorithm in the reconstructing processso the user can adjust his face pose to get a better humanface model by filling the hole This system allows users toreconstruct human face model with only one depth cameraand incrementally get higher quality 3D human face modelby rotating the head

2 Related Works

The research on real-time 3D reconstruction is a hot topicin computer vision An accurate and robust 3D surfaceregistration method for In-hand modeling was proposed by[6] A complete 3D model of an object can be obtained bysimply turning around and being scanned by a camera usingiterative closest points (ICPs) algorithm [1] for coarse and

2 Mathematical Problems in Engineering

Figure 1 User sits in front of a fixed Kinect sensor and the recon-struction can be done

fine registration The authors also proposed a method fordetecting registration failure based on both geometric andtexture consistences [6]With the user performing slight headrotation while keeping the facial expression unchanged thesystem proposed byWeise et al [7] aggregatedmultiple scansinto one 3D model of the face A method for automaticallyregistering multiple 3D data sets without any knowledge ofinitial pose was proposed by [8] Jaeggli et al [9] presenteda system which produces complete 3D model using a high-speed acquisition equipment and a registrationmoduleTheyused pairwise registration as well as multiview refinementto get better results Azernikov and Fischer [10] proposedvolume warping method for surface reconstruction

KinectFusion project [11ndash13] presented a system that usesonly one moving depth camera to create detailed 3D recon-struction of a complex and arbitrary indoor scenes in realtime with GPU It also enables advanced augmented realityand multitouch on any indoor scene with arbitrary surfacegeometries Zollhofer et al [14] proposed an algorithm forcomputing a personalized avatar using a single color imageand corresponding depth image It obtains a high-quality3D reconstruction model of the face that has one-to-onecorresponding geometry and texture with the generic facemodel

3 Method and Implementation

The flow chart of our 3D face reconstruction system is shownin Figure 2 The whole system mainly consists of six stagesface detection and segmentation depth map conversion facetracking volumetric integration recasting and marchingcubes Each stage will be described in the following sections

31 Face Detection and Segmentation The Kinect sensoracquires the 640lowast480 color and depth image at 30Hz Firstlythe face region is detected using a Haar classifier [15] To getmore stable results we use the frontal-face and profile-faceclassifiers to detect the face twice

A more sophisticated extraction is used to ensure thatonly the face data in consideration is carried out after gettingthe region of the faceWe search the depth image in a windowaround the central point of the face region to get a valid

depth value Then the depth image will be traversed withinthe face region and every depth value will be compared withthe central depth value If the depth value changes more thanthe specific threshold it will be given an invalid value todistinguish nonface region from face region

In order to run the algorithm fast the face region is onlydetected at the beginning of the algorithm or after resettingOnce a valid bounding rectangle of face is obtained the facedetection phase is omitted and only segmentation task isexecuted

32 Fast Face Depth Map Conversion Firstly a bilateral filteris applied to the raw depth map in order to obtain noise-reduced face depth map and maintain the depth boundariessimultaneously The filter can be described as follows [12](raw map is denoted by 119877

119894 and the filtered depth map is

denoted by119863119894)

119863119894 () =

1

119882119901

sum

isin119908

119873120590(1003817100381710038171003817 minus

10038171003817100381710038172)119873120590(1003817100381710038171003817119877119894 () minus 119877119894 ()

10038171003817100381710038172) 119877119894()

(1)

where = (119906 V)119879 is the depth image pixel and isin 119908 (119908 is awindow to reduce computation complexity)119882

119901and119873

120590(119905) =

exp(minus1199052120590minus2) are normalizing constantsIn a real-time system the time complexity is an important

factor In the filter previously mentioned the exponentarithmetic is computationally expensive Since the distancebetween the depth image pixels is an integer we can use alookup table to speed up the computation Only the pixelswithin a certain window are considered so the size of lookuptable is not large

Given the camera intrinsic parameters (119891119909 119891119910 119888119909 119888119910

which respectively stand for the focal length in 119909 and 119910 axesthe focal point coordinate in 119909 and 119910 axes) the depth mapcan be converted into the vertex map (denoted by 119881

119894) and

corresponding normal vectors for each vertex can be easilycomputed using neighboring points

Assuming that (119906 V) is a pixel in the raw depth map and119911 = 119863

119894(119906 V) is the depth of filtered depth map then the coor-

dinate of 119881119894can be computed as follows

119881119894 (119909) =

119911 (119906 minus 119888119909)

119891119909

119881119894(119910) =

119911 (119906 minus 119888119910)

119891119910

119881119894 (119911) = 119911

(2)

With the vertex data obtained the normal vectors of eachvertex are computed with the following equation [12]

119873119894= (119881119894 (119906 + 1 V) minus 119881119894 (119906 V)) times (119881119894 (119906 V + 1) minus 119881119894 (119906 V))

(3)

and then normalized to the unit length using119873119894119873119894

Assuming that the image being processed has119898times119899 pixelsand the window of bilateral filter has a size of 119908 times 119908 the

Mathematical Problems in Engineering 3

Raw depthand rgb data

captured from kinect

sensor

Face detection and

segmention

Conversionbilateral filter

compute vertex and normal mapsof each level of

the pyramid

Trackinguse ICP to estimate the

globaltransform

Volumetric integration

compute the tsdf value for each voxel of

the volume

Raycastingraycast from the

vloume to getthe accumulated

vertex andnormal maps

Marching cubesget the polygon

mesh of thereconstructed 3D

model

Data capture

Data ready

No

Yes

Accumulated data used for ICP

Figure 2 Overall system workflow

computational complexity of filtering is 119874(119898 times 119899 times 1199082) The

Kinect sensor acquires a 640 times 640 depth image In the facesegmentation stage a smaller image that contains the faceregion is extracted which has a size of 196 times 196 in ourexperiment Besides 119908 = 7 and a lookup table is used toreplace the exponent arithmetic A complexity of119874(119898times 119899) isneeded to generate the normal vectors So the computationalcomplexity of this stage is quite low

33 Face Tracking The global model of the face is stored as astatue which means that the vertex map and the normal mapdo not move and rotate in the coordinate In the meantimethe sensor is fixed in the front of the user However becausethe userrsquos head rotates (and moves) in the modeling processthe view point of the depth data changes from frame to frameIn order to integrate the history model and the live datathe transformation matrix (a 3D matrix with 6 degrees offreedom) between them is required otherwise the result willbe chaos if we put the data directly Fortunately because ofthe preextraction the live data do not contain environmentand other references So we come out with the solution thatwe suppose that the camera is moving around to scan astatic head of the user In this stage the face depth datais processed frame-to-frame to track the face motion TheIterative Closest Point (ICP) algorithm [1] is used here tocompute the transformation matrix [12]

In ICP algorithm each iteration could be divided intothree stages first corresponding point pairs are chosenwithin the two frames then a point-to-plane Euclid distanceerror metric is used to calculate an optimal solution whichminimizes the error metric and finally the previous frameis transformed using the matrix obtained in the previous

stage preparing for the next iteration Solution becomesmoreaccurate after each cycle and an optimal transform matrixcan be obtained after finishing ICP algorithm

In the first stage the corresponding points between thecurrent frame and the previous frame should be found firstlyIn this paper points in the current frame are projected to theprevious one to obtain the corresponding points Given theprevious global camera pose 119879

119894minus1and the estimated current

pose 119879119894(which is initialized with 119879

119894minus1before the iteration

and is updated with an incremental transform calculatedper iteration of ICP) we then transform the current vertexmap into the global coordinate using 119881119892

119894= 119879119894119881119894() (Here

the human face coordinate system is regarded as the globalsystem Since our target is to obtain a complete facemodel sothat the data is required in face coordinate system)The pixelcorresponding to the global position119881119892

119894are required here so

we transform119881119892

119894into the previous camera coordinate system

to get 119881119888119894minus1

= 119879minus1

119894minus1119881119892

119894and then project it into the image plane

to get the pixel 119901

119901 (119906) =119881119888

119894minus1(119909) times 119891119909

119881119888

119894minus1(119911)

+ 119888119909

119901 (V) =119881119888

119894minus1(119910) times 119891

119910

119881119888

119894minus1(119911)

+ 119888119910

(4)

We look up the previous global vertexmap119881119892119894minus1

and globalnormal map 119873119892

119894minus1in the pixel 119901 obviously 119881

119894minus1() and 119881

119894()

are the correspondencesAlso thresholds of Euclidean distance and angles should

be set between them to reject outliers


Given these sets of corresponding oriented points thenext step is to minimize the point-to-plane error metric 119864to get the transformation matrix 119879 [16] Here the metric 119864stands for the sum of squared distances between the currentpoints and the tangent plane of the corresponding points

119864 = sum

1003817100381710038171003817(119879119881119894 () minus 119881119892

119894minus1())119873

119892

119894minus1()

10038171003817100381710038172 (5)

As there is only an incremental transformation betweenframes the rigid transform matrix can be written as follows

119879 = [119877 | 119905] = (

1 120572 minus120574 119905119909

minus120572 1 120573 119905119910

120574 minus120573 1 119905119911

) (6)

In each cycle an incremental transformation 119879119911

inc (119911represents the current iteration number) that minimizes theerrormetric can be estimatedThe desired global transforma-tion matrix can be simply updated by 119879119911

119892= 119879119911

inc sdot 119879119911minus1

119892

Weupdate the current global frame vertex estimates usingthe global transformation 119879

119911minus1

119892computed in the previous

iteration 119881119892119894() = 119879

119911minus1

119892119881119894() The increment transformation

can also be written as follow = (120573 120574 120572 119905

119909 119905119910 119905119911) isin R

6

119879119911

119892119881119894 () = 119877

119911119881119892

119894() + 119905

119911

= 119866 () + 119881119892

119894()

(7)

Assuming that 119881119892119894() = (119909 119910 119911)

119879 119866 can be representedas follows

119866 () = (

0 minus119911 119910 1 0 0

119911 0 minus119909 0 1 0

minus119910 119909 0 0 0 1

) (8)

By solving the following expression

min119909isinR6

sum

1198642119888

119864 = 119873119892

119894minus1()119879sdot (119866 () + 119881

119892

119894() minus 119881

119892

119894minus1())

(9)

An optimal transformation matrix needed can be com-puted

Then with the equations previously mentioned theexpression can be expressed as follows

sum

(119860119879119860) = sum

119860119879119887 (10)

where119860119879=119866119879()119873119892119894minus1() isin R6times1 and 119887 = 119873119892

119894minus1()119879(119881119892

119894minus1()minus

119881119892

119894()) isin R1times1 and we can easily compute the vector using

a Cholesky decompositionAfter the incremental matrix is obtained we enter the

next iteration where the operations previouslymentioned areused again And after all the iterations we can get the finalcamera pose 119879

119892larr 119879119911 max119892

A projecting strategy is used so that it takes 119874(1) com-

putational complexity for a point to find its correspondingpoint To get the transformation matrix between two framesapproximately a computational complexity of 119874(119898 times 119899) isacquired

34 Small Size Face Volumetric Integration Using the trans-formation matrix obtained in the last step the depth mapdata can be converted to point cloud in global coordinateData of this form is hard to integrate to form a face modelHere we convert them into a volumetric representation [17]A 3D volume of specific resolution that corresponds to thephysical space is divided into a 3D grid of voxels Truncatedsigned distance function (TSDF) is used here to convert the3D vertices into voxels of the volume [12]

In the volumetric integration phase each voxel in thevolume is traversed and the corresponding TSDF value isupdated using weighted average strategy For a voxel (119909 119910 119911)in the volume we first convert it into the global 3D positionV119892

V119892 (119909) = (119909 + 05119891) times cell size119909

V119892(119910) = (119910 + 05119891) times cell size119910

V119892 (119911) = (119911 + 05119891) times cell size119911

(11)

where cell size represents the size of the cell in the volume

cell size119909 = VOLUME SIZE 119883

VOLUME 119883 (12)

where VOLUME 119883 indicates how many cells are there in119909 axis of the volume and VOLUME SIZE 119883 indicates thecorresponding actual length

Subsequently the global ordinate V119892is transformed into

the camera ordinate V and the vertex V is projected into theimage plane to get the corresponding pixel 119901

Assuming that the translation vector of the global cameratransformation is denoted as 119905

119894 the distance between voxel

(119909 119910 119911) of the volume and the original point of the cameraordinates system can be calculated as V

119892minus 119905119894 Since we have

got the corresponding pixel 119901 we can get the actual depthmeasurement 119863

119894(119901) It should be pointed out that 119863

119894(119901) is

not equal to the vertexmap119881119894(119901) since the former represents

the distance between the original point and the specific pointwhile the latter only represents the 119911 value so a conversion isnecessary to get119863

119894(119901)

The SDF value of the voxel can be computed using SDF119894=

V119892minus 119905119894 minus 119863

119894(119901) This is normalized to a TSDF using the

truncating strategy The TSDF value of the voxel is updatedusing a simple running weighted average

tsdf avg =tsdf119894minus1

sdot 119908119894minus1

+ tsdf119894sdot 119908119894

119908119894minus1

+ 119908119894

(13)

Actually in practice we simply let 119908119894= 1 and can achieve

good resultsGiven that the goal of our work is fast tracking and recon-

struction time complexitymust be considered In volumetricintegration phase there are VOLUME 119883 times VOLUME 119884 times

VOLUME 119885 voxels that should be traversed so the volumesize cannot be too large for higher frame rate In case offace reconstruction the useful global size is approximately03 times 03 times 03 m3 However in the volumetric integratingalgorithm described previously the 119911 = 0 voxel of the volume


lies in the original position of the camera ordinate system Inother words if the distance between the face and the Kinectsensor is 07m VOLUME SIZE 119885 cannot be less than 07to ensure valid integration And at the same time all the0 lt 119911 lt 07 voxels are not used Consequently to ensureenough volume resolution VOLUME 119885 cannot be too smallwhich could result in higher time complexity and do toomuch useless work

Thus we introduce an offset to the volume We move thevolume along the 119911 axis a distance of offset as the 119911 = 0

plane can get close to but cannot reach the mesh of the faceThen the conversion from voxel into global position shouldbe modified as

V119892 (119911) = (119911 + 05119891) times cell size119911 + offset (14)

Some littlemodification should also bemade in the other partof the algorithm

In this stage a computational complexity of 119874(1198993) (119899 isthe length of the volume) is needed to update all the voxels inthe volume In our experiment we use 64 times 64 times 64 (119899 = 64

volume and 03 times 03 times 03 volume size along with an offset of06m and get rather good fast reconstruction results

35 Ray Casting The ray casting algorithm [18] appliedhere is to generate views of implicit surface for renderingand tracking For each pixel (119909 119910) of the output image asingle ray is emitted from the original point of the cameracoordinate system and goes through the point (119909 119910) of theimage plane With the direction of the ray we can extractthe surface position by traversing along the ray And thesurface intersection point can be easily obtained using linearinterpolation Then we can easily obtain the normal mapwith TSDF There are two contributions of the ray castingalgorithm one is the ability to view the implicit surface ofthe reconstructed 3D model and the other is to generatehigher quality data for ICP camera tracking When renderedto screen the noise shadows and holes will bemuch less thanthe raw depth data

We need to traverse all the voxels in the volume toextract the zero-crossing surface Therefore a computationalcomplexity of 119874(1198993) (119899 is the length of the volume) isacquired

36 Marching Cubes In our work we use marching cubesalgorithm [19 20] to obtain the mesh of the reconstructed 3Dmodel Each voxel of the volume is traversed and an indexis created to a precalculated array of 256 possible polygonconfigurationswithin the cube by treating each of the 8 scalarvalues as a bit in an 8-bit integer If the scalarrsquos value is higherthan the iso value (it is inside the surface) then the appropriatebit is set to one while if it is lower (outside) it is set to zeroThe final value after all 8 scalars are checked is the actualindex to the polygon indices array Finally each vertex ofthe generated polygons is placed on the appropriate positionalong the cubersquos edge by linearly interpolating the two scalarvalues that are connected by that edge

Table 1 Results of human face model reconstruction

Raw depth Reconstructed mesh Time

Shortly after the startof the reconstruction

Begin to scan theright side of the face

The right side of theface has almost been

reconstructed

Begin to scan the leftside of the face

The left side of theface has almost been

reconstructed

Reconstruction done


In marching cubes algorithm the computational com-plexity that is approximately 119874(1198993) (119899 is the length of thevolume) is acquired

4 Results

We test our 3D reconstruction system on a computer with32GHz CPU and 4GB memory We set the volume resolu-tion to 64 times 64 times 64 and the volumetric size to 03 times 03 times03m3 with an offset distance of 04m Note that running thereconstruction algorithm for one new frame only costs about180ms which is quite acceptable in practice

The results of our 3D reconstruction system are shown inTable 1 As shown in the table the 3D face model keeps beingrefined while the userrsquos head rotates in order to let the Kinectscan the whole face

We can find that the reconstruction result is very goodand is much smoother than the raw depth data And thereconstruction speed is also very acceptable

5 Conclusions

After depth cameras like Kinect sensor appear users caneasily obtain the depth data of an object 3D reconstructionespecially human face reconstruction has always been achallenging problem In this paper we represent a novel wayto create a 3D face reconstruction model Just sitting in frontof the Kinect camera and rotating his head the user can geta perfect human face model We use a volumetric integratingstrategy to fuse all the data so the reconstructed face modelbecomes more and more clear

We contribute the method to fast human face 3D recon-struction Our efforts to speed up the system are threefoldFirst we decrease the frequency of detecting face by onlydetecting when the shift of the face exceeds a specific thresh-old Second we use a lookup table to replace the computa-tionally expensive exponent arithmetic and try hard to reducerepeated computationThird we introduce some variances tothe volumetric integration algorithm to use less voxels whilekeeping the good resolution Using the methods previouslymentioned we get a well-performed face 3D reconstructionsystem

We will focus on larger object such as full body 3Dreconstruction and add color information to the model tomake the visualization better in the future work

Acknowledgment

This work was partially supported by the National NaturalScience Foundation of China (NSFC) under the Project61175034F030410

References

[1] P J Besl and N D McKay ldquoA method for registration of 3-D shapesrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 14 no 2 pp 239ndash256 1992

[2] C ShengyongW Yuehui andC Carlo ldquoKey issues inmodelingof complex 3D structures from video sequencesrdquoMathematicalProblems in Engineering vol 2012 Article ID 856523 17 pages2012

[3] S Y Chen and Y F Li ldquoVision sensor planning for 3-Dmodel acquisitionrdquo IEEE Transactions on Systems Man andCybernetics B vol 35 no 5 pp 894ndash904 2005

[4] C Carlo C Shengyong and A Gani ldquoInformation and model-ing in complexityrdquo Mathematical Problems in Engineering vol2012 Article ID 868413 4 pages 2012

[5] S Y Chen Y F Li Q Guan and G Xiao ldquoReal-time three-dimensional surface measurement by color encoded light pro-jectionrdquo Applied Physics Letters vol 89 no 11 Article ID 1111082006

[6] T Weise B Leibe and L Van Gool ldquoAccurate and robustregistration for in-hand modelingrdquo in Proceedings of the 26thIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo08) June 2008

[7] T Weise S Bouaziz and H Li ldquoRealtime performance-basedfacial animationrdquo in Proceedings of the 38th Special InterestGroup on Computer Graphics and Interactive Techniques (SIG-GRAPH rsquo11) Vancouver Canada August 2011

[8] F Huber and M Hebert ldquoFully automatic registration ofmultiple 3D data setsrdquo in Proceedings of the IEEE Workshopon Computer Vision Beyond the Visible Spectrum Methods andApplications (CVBVS rsquo01) Kauai Hawaii USA December 2001

[9] T Jaeggli T Koninckx and L V Gool ldquoOnline 3d acquisitionand model integrationrdquo in Proceedings of IEEE InternationalWorkshop on Projector-Camera Systems (PROCAMS rsquo03) NiceFrance 2003

[10] S Azernikov and A Fischer ldquoA new volume warping methodfor surface reconstructionrdquo Virtual and Physical Prototypingvol 1 no 2 pp 65ndash71 2006

[11] S Izadi D Kim O Hilliges D Molyneaux R Newcombe PKohli et al ldquoKinectfusion real-time 3D reconstruction andinteraction using a moving depth camerardquo in Proceedings of the24th Annual ACM Symposium on User Interface Software andTechnology (UIST rsquo11) pp 559ndash568 ACM New York NY USA2011

[12] R A Newcombe S Izadi O Hilliges et al Kinect Fusion Real-Time Dense Surface Mapping and Tracking IEEE ISMAR 2011

[13] L Yong-Wan L Hyuk-Zae Y Na-Eun et al ldquo3-D reconstruc-tion using the kinect sensor and its application to a visualizationsystemrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics (SMC rsquo12) pp 3361ndash3366 2012

[14] M Zollhofer M Martinek G Greiner M Stamminger andJ Suszligmuth ldquoAutomatic reconstruction of personalized avatarsfrom 3D face scansrdquo Computer Animation and Virtual Worldsvol 22 no 2-3 pp 195ndash202 2011

[15] R Lienhart A Kuranov and V Pisarevsky ldquoEmpirical analysisof detection cascades of boosted classifiers for rapid objectdetectionrdquo Tech Rep Microprocessor Research Lab 2002

[16] K Low ldquoLinear least-squares optimization for point-to-planeICP surface registrationrdquo Tech Rep TR04-004 University ofNorth Carolina 2004

[17] B Curless and M Levoy ldquoVolumetric method for buildingcomplex models from range imagesrdquo in Proceedings of the 1996Special Interest Group on Computer Graphics and InteractiveTechniques (SIGGRAPH rsquo96) pp 303ndash312 August 1996

[18] S D Roth ldquoRay casting for modeling solidsrdquo Computer Graph-icsand Image Processing vol 18 no 2 pp 109ndash144 1982


[19] W E Lorensen and H E Cline ldquoMarching cubes a high res-olution 3D surface construction algorithmrdquo ACM Transactionson Graphics vol 21 no 4 pp 163ndash169 1987

[20] T S Newman and H Yi ldquoA survey of the marching cubesalgorithmrdquo Computers and Graphics vol 30 no 5 pp 854ndash8792006

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of


Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of


Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of


Mathematical PhysicsAdvances in

Complex AnalysisJournal of


OptimizationJournal of


CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of


Operations ResearchAdvances in

Journal of


Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences


The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Algebra

Discrete Dynamics in Nature and Society



Decision SciencesAdvances in

Discrete MathematicsJournal of


Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of


Figure 1 User sits in front of a fixed Kinect sensor and the recon-struction can be done

fine registration The authors also proposed a method fordetecting registration failure based on both geometric andtexture consistences [6]With the user performing slight headrotation while keeping the facial expression unchanged thesystem proposed byWeise et al [7] aggregatedmultiple scansinto one 3D model of the face A method for automaticallyregistering multiple 3D data sets without any knowledge ofinitial pose was proposed by [8] Jaeggli et al [9] presenteda system which produces complete 3D model using a high-speed acquisition equipment and a registrationmoduleTheyused pairwise registration as well as multiview refinementto get better results Azernikov and Fischer [10] proposedvolume warping method for surface reconstruction

KinectFusion project [11ndash13] presented a system that usesonly one moving depth camera to create detailed 3D recon-struction of a complex and arbitrary indoor scenes in realtime with GPU It also enables advanced augmented realityand multitouch on any indoor scene with arbitrary surfacegeometries Zollhofer et al [14] proposed an algorithm forcomputing a personalized avatar using a single color imageand corresponding depth image It obtains a high-quality3D reconstruction model of the face that has one-to-onecorresponding geometry and texture with the generic facemodel

3 Method and Implementation

The flow chart of our 3D face reconstruction system is shownin Figure 2 The whole system mainly consists of six stagesface detection and segmentation depth map conversion facetracking volumetric integration recasting and marchingcubes Each stage will be described in the following sections

31 Face Detection and Segmentation The Kinect sensoracquires the 640lowast480 color and depth image at 30Hz Firstlythe face region is detected using a Haar classifier [15] To getmore stable results we use the frontal-face and profile-faceclassifiers to detect the face twice

A more sophisticated extraction is used to ensure thatonly the face data in consideration is carried out after gettingthe region of the faceWe search the depth image in a windowaround the central point of the face region to get a valid

depth value Then the depth image will be traversed withinthe face region and every depth value will be compared withthe central depth value If the depth value changes more thanthe specific threshold it will be given an invalid value todistinguish nonface region from face region

In order to run the algorithm fast the face region is onlydetected at the beginning of the algorithm or after resettingOnce a valid bounding rectangle of face is obtained the facedetection phase is omitted and only segmentation task isexecuted

32 Fast Face Depth Map Conversion Firstly a bilateral filteris applied to the raw depth map in order to obtain noise-reduced face depth map and maintain the depth boundariessimultaneously The filter can be described as follows [12](raw map is denoted by 119877

119894 and the filtered depth map is

denoted by119863119894)

119863119894 () =

1

119882119901

sum

isin119908

119873120590(1003817100381710038171003817 minus

10038171003817100381710038172)119873120590(1003817100381710038171003817119877119894 () minus 119877119894 ()

10038171003817100381710038172) 119877119894()

(1)

where = (119906 V)119879 is the depth image pixel and isin 119908 (119908 is awindow to reduce computation complexity)119882

119901and119873

120590(119905) =

exp(minus1199052120590minus2) are normalizing constantsIn a real-time system the time complexity is an important

factor In the filter previously mentioned the exponentarithmetic is computationally expensive Since the distancebetween the depth image pixels is an integer we can use alookup table to speed up the computation Only the pixelswithin a certain window are considered so the size of lookuptable is not large

Given the camera intrinsic parameters (119891119909 119891119910 119888119909 119888119910

which respectively stand for the focal length in 119909 and 119910 axesthe focal point coordinate in 119909 and 119910 axes) the depth mapcan be converted into the vertex map (denoted by 119881

119894) and

corresponding normal vectors for each vertex can be easilycomputed using neighboring points

Assuming that (119906 V) is a pixel in the raw depth map and119911 = 119863

119894(119906 V) is the depth of filtered depth map then the coor-

dinate of 119881119894can be computed as follows

119881119894 (119909) =

119911 (119906 minus 119888119909)

119891119909

119881119894(119910) =

119911 (119906 minus 119888119910)

119891119910

119881119894 (119911) = 119911

(2)

With the vertex data obtained the normal vectors of eachvertex are computed with the following equation [12]

119873119894= (119881119894 (119906 + 1 V) minus 119881119894 (119906 V)) times (119881119894 (119906 V + 1) minus 119881119894 (119906 V))

(3)

and then normalized to the unit length using119873119894119873119894

Assuming that the image being processed has119898times119899 pixelsand the window of bilateral filter has a size of 119908 times 119908 the




sensor

Face detection and

segmention



the pyramid


globaltransform



the volume






model

Data capture

Data ready

No

Yes













119894= 119879119894119881119894() (Here





to get 119881119888119894minus1

= 119879minus1

119894minus1119881119892



119901 (119906) =119881119888

119894minus1(119909) times 119891119909

119881119888

119894minus1(119911)

+ 119888119909

119901 (V) =119881119888

119894minus1(119910) times 119891

119910

119881119888

119894minus1(119911)

+ 119888119910

(4)




119894minus1() and 119881

119894()





119864 = sum

1003817100381710038171003817(119879119881119894 () minus 119881119892

119894minus1())119873

119892

119894minus1()

10038171003817100381710038172 (5)


119879 = [119877 | 119905] = (

1 120572 minus120574 119905119909

minus120572 1 120573 119905119910

120574 minus120573 1 119905119911

) (6)



119892= 119879119911


119892


119911minus1


iteration 119881119892119894() = 119879

119911minus1



119909 119905119910 119905119911) isin R

6

119879119911

119892119881119894 () = 119877

119911119881119892

119894() + 119905

119911

= 119866 () + 119881119892

119894()

(7)

Assuming that 119881119892119894() = (119909 119910 119911)


119866 () = (

0 minus119911 119910 1 0 0

119911 0 minus119909 0 1 0

minus119910 119909 0 0 0 1

) (8)


min119909isinR6

sum

1198642119888

119864 = 119873119892

119894minus1()119879sdot (119866 () + 119881

119892

119894() minus 119881

119892

119894minus1())

(9)



sum

(119860119879119860) = sum

119860119879119887 (10)


119894minus1()119879(119881119892

119894minus1()minus

119881119892




119892larr 119879119911 max119892





V119892 (119909) = (119909 + 05119891) times cell size119909

V119892(119910) = (119910 + 05119891) times cell size119910

V119892 (119911) = (119911 + 05119891) times cell size119911

(11)



VOLUME 119883 (12)










119894(119901) is



119894(119901)


V119892minus 119905119894 minus 119863




sdot 119908119894minus1

+ tsdf119894sdot 119908119894

119908119894minus1

+ 119908119894

(13)





















reconstructed



reconstructed

Reconstruction done



4 Results




5 Conclusions




Acknowledgment


References





























Volume 2014




Journal of











Journal of


Function Spaces






Algebra












sensor

Face detection and

segmention



the pyramid


globaltransform



the volume






model

Data capture

Data ready

No

Yes













119894= 119879119894119881119894() (Here





to get 119881119888119894minus1

= 119879minus1

119894minus1119881119892



119901 (119906) =119881119888

119894minus1(119909) times 119891119909

119881119888

119894minus1(119911)

+ 119888119909

119901 (V) =119881119888

119894minus1(119910) times 119891

119910

119881119888

119894minus1(119911)

+ 119888119910

(4)




119894minus1() and 119881

119894()





119864 = sum

1003817100381710038171003817(119879119881119894 () minus 119881119892

119894minus1())119873

119892

119894minus1()

10038171003817100381710038172 (5)


119879 = [119877 | 119905] = (

1 120572 minus120574 119905119909

minus120572 1 120573 119905119910

120574 minus120573 1 119905119911

) (6)



119892= 119879119911


119892


119911minus1


iteration 119881119892119894() = 119879

119911minus1



119909 119905119910 119905119911) isin R

6

119879119911

119892119881119894 () = 119877

119911119881119892

119894() + 119905

119911

= 119866 () + 119881119892

119894()

(7)

Assuming that 119881119892119894() = (119909 119910 119911)


119866 () = (

0 minus119911 119910 1 0 0

119911 0 minus119909 0 1 0

minus119910 119909 0 0 0 1

) (8)


min119909isinR6

sum

1198642119888

119864 = 119873119892

119894minus1()119879sdot (119866 () + 119881

119892

119894() minus 119881

119892

119894minus1())

(9)



sum

(119860119879119860) = sum

119860119879119887 (10)


119894minus1()119879(119881119892

119894minus1()minus

119881119892




119892larr 119879119911 max119892





V119892 (119909) = (119909 + 05119891) times cell size119909

V119892(119910) = (119910 + 05119891) times cell size119910

V119892 (119911) = (119911 + 05119891) times cell size119911

(11)



VOLUME 119883 (12)










119894(119901) is



119894(119901)


V119892minus 119905119894 minus 119863




sdot 119908119894minus1

+ tsdf119894sdot 119908119894

119908119894minus1

+ 119908119894

(13)





















reconstructed



reconstructed

Reconstruction done



4 Results




5 Conclusions




Acknowledgment


References





























Volume 2014




Journal of











Journal of


Function Spaces






Algebra











119864 = sum

1003817100381710038171003817(119879119881119894 () minus 119881119892

119894minus1())119873

119892

119894minus1()

10038171003817100381710038172 (5)


119879 = [119877 | 119905] = (

1 120572 minus120574 119905119909

minus120572 1 120573 119905119910

120574 minus120573 1 119905119911

) (6)



119892= 119879119911


119892


119911minus1


iteration 119881119892119894() = 119879

119911minus1



119909 119905119910 119905119911) isin R

6

119879119911

119892119881119894 () = 119877

119911119881119892

119894() + 119905

119911

= 119866 () + 119881119892

119894()

(7)

Assuming that 119881119892119894() = (119909 119910 119911)


119866 () = (

0 minus119911 119910 1 0 0

119911 0 minus119909 0 1 0

minus119910 119909 0 0 0 1

) (8)


min119909isinR6

sum

1198642119888

119864 = 119873119892

119894minus1()119879sdot (119866 () + 119881

119892

119894() minus 119881

119892

119894minus1())

(9)



sum

(119860119879119860) = sum

119860119879119887 (10)


119894minus1()119879(119881119892

119894minus1()minus

119881119892




119892larr 119879119911 max119892





V119892 (119909) = (119909 + 05119891) times cell size119909

V119892(119910) = (119910 + 05119891) times cell size119910

V119892 (119911) = (119911 + 05119891) times cell size119911

(11)



VOLUME 119883 (12)










119894(119901) is



119894(119901)


V119892minus 119905119894 minus 119863




sdot 119908119894minus1

+ tsdf119894sdot 119908119894

119908119894minus1

+ 119908119894

(13)





















reconstructed



reconstructed

Reconstruction done



4 Results




5 Conclusions




Acknowledgment


References





























Volume 2014




Journal of











Journal of


Function Spaces






Algebra

























reconstructed



reconstructed

Reconstruction done



4 Results




5 Conclusions




Acknowledgment


References





























Volume 2014




Journal of











Journal of


Function Spaces






Algebra











4 Results




5 Conclusions




Acknowledgment


References





























Volume 2014




Journal of











Journal of


Function Spaces






Algebra



















Volume 2014




Journal of











Journal of


Function Spaces






Algebra









research article marching cubes algorithm for fast 3d

Documents