po-hsiang chen advisor: sheng-jyh wang 2/13/2012

Po-Hsiang Chen

Advisor: Sheng-Jyh Wang

How Does Kinect Work?

2/13/2012


Major References

• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation

• CVPR 2011 Best Paper

• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1

• PrimeSense Patent


Outline

• What is Kinect?

• Kinect Architecture

• From IR to depth image

• History of Structured Light

• PrimeSense Invented Structured Light

• From depth image to joint positions

• Body Part Inference

• Joint Proposals

• Experiments and Results

• Conclusion

• References


What is Kinect?

• Motion sensing input device by Microsoft

• Depth camera tech. developed by PrimeSense

• Invented in 2005

• Software tech. developed by Rare

• First announced at E3 2009 as “Project Natal”

• Windows SDK Releases


http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx








Kinect IR Structured Light


Kinect Architecture

IR Structured Light → Depth Image → (Random Decision Forest) → Body Parts → (Mean Shift) → Joint Position


3D Imaging of surface

Triangulation

• Main Problem

• To recover shape from multiple views, we need CORRESPONDENCES between the images

• The matching/correspondence problem is hard

• Occlusions, texture, colors, etc.

• Solution: Structured light

• Idea: Simplify matching

• Strategy: Use illumination to create your own correspondences

Structured Light

• Basic Principle

• Use a projector to create unambiguous correspondences

• Light projection

• If we project a single point, matching is unique

Structured Light

• Line projection (Line Scan)

• For calibrated cameras, the epipolar geometry is known

• Project a line instead of a single point

Structured Light

• Project Multiple Stripes or Grids

• Which stripe matches which?

• Correspondence again

Structured Light

• Answer 1: Assume Surface Continuity

• Ordering Constraint

Structured Light

• Answer 2: Coloured stripes (De Bruijn)

• Difficult to use for coloured surfaces

Structured Light

• Answer 2: Coloured dots (M-array)

• Difficult to use for coloured surfaces

Structured Light

• Answer 3: Pattern dots (M-array)

• Difficult for industrial manufacturing

Structured Light

• Answer 4: Time-coded light patterns (time multiplexing)

• Use a sequence of binary patterns → about log₂ N images for N stripes

• Each stripe has a unique binary illumination code
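A minimal Python sketch of the time-multiplexing idea above, assuming plain binary codes and N stripes indexed from 0. The function names `binary_patterns` and `decode_stripe` are illustrative and do not come from the slides; practical systems often use Gray codes instead, which is not shown here.

```python
def binary_patterns(num_stripes):
    """Return one on/off pattern per bit; patterns[b][s] is the state of stripe s in image b."""
    # Number of images needed: ceil(log2(num_stripes)), computed with integers only.
    num_bits = max(1, (num_stripes - 1).bit_length())
    return [
        [(stripe >> (num_bits - 1 - bit)) & 1 for stripe in range(num_stripes)]
        for bit in range(num_bits)
    ]


def decode_stripe(bits_seen):
    """Recover the stripe index from the bits a camera pixel observed, most significant first."""
    index = 0
    for bit in bits_seen:
        index = (index << 1) | bit
    return index


patterns = binary_patterns(8)
# A pixel lit by stripe 5 sees the bits 1, 0, 1 over the three projected images.
observed_bits = [pattern[5] for pattern in patterns]
stripe_index = decode_stripe(observed_bits)
```

With 1024 stripes the same function returns 10 patterns, which matches the log N image count on the slide.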

Structured Light

• All of the above are categorized as Discrete Methods

• There are many more Continuous Structured Light Methods, such as phase shifting

• Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680

Structured Light

• All of the above are human-designed patterns.

• Random Speckle

• Structured light using randomly generated patterns

• May obtain denser depth information by solving correspondence problem
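To illustrate how a random pattern helps with correspondence, here is a small Python sketch that matches a window of an observed 1D pattern against a reference pattern using the sum of absolute differences (SAD). The window matching scheme, the function name `match_window`, and the synthetic data are assumptions for illustration only; the slides do not state which matching method the Kinect uses.

```python
import random


def match_window(reference, observed, start, size, max_shift):
    """Return the shift in [0, max_shift] whose reference window best matches the observed window."""
    window = observed[start:start + size]
    best_shift, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        candidate = reference[start + shift:start + shift + size]
        if len(candidate) < size:
            break  # ran off the end of the reference pattern
        cost = sum(abs(a - b) for a, b in zip(window, candidate))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


# Synthetic example: a random binary pattern and a copy displaced by 7 samples.
rng = random.Random(0)
reference = [rng.randint(0, 1) for _ in range(200)]
true_shift = 7
observed = reference[true_shift:] + [0] * true_shift

estimated_shift = match_window(reference, observed, start=50, size=32, max_shift=20)
```

Because the pattern is random, a 32-sample window is very unlikely to match anywhere except at the true shift, so every window can be matched independently.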

What can we do better?

• A projector is just the inverse of a camera

• One projector and one camera are enough for triangulation

• Need Calibration

PrimeSense Patents

• US 2010/0118123

• Projector-Camera system

• Already calibrated structure

• δZ results in δX in 32
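The link between a depth change δZ and an image shift δX can be sketched with the standard triangulation relation for a rectified projector-camera pair, Z = f·b/d. This is a textbook simplification, not the patent's exact formulation. The focal length (580 px) and baseline (0.075 m) used below are illustrative values, not calibrated Kinect parameters.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z in metres for a pattern feature displaced by disparity_px pixels (Z = f * b / d)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def shift_for_depth_change(depth_m, delta_depth_m, focal_px, baseline_m):
    """Change in image position (pixels) when a point moves from depth_m to depth_m + delta_depth_m."""
    return focal_px * baseline_m * (1.0 / (depth_m + delta_depth_m) - 1.0 / depth_m)


FOCAL_PX = 580.0    # assumed focal length in pixels (illustrative)
BASELINE_M = 0.075  # assumed projector-camera baseline in metres (illustrative)

depth = depth_from_disparity(58.0, FOCAL_PX, BASELINE_M)
shift = shift_for_depth_change(1.0, 1.0, FOCAL_PX, BASELINE_M)
```

Under these assumptions the shift for a fixed δZ shrinks roughly as 1/Z², which is why depth resolution degrades with distance.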

• Structured Light-1

• Pseudo-random distribution

• Local: Random

• Global: Gray level decreases

• Can make a rough estimate in a low resolution image

• Structured Light-2

• Quasi-periodic pattern

• Five-fold symmetry

• Results in distinct peaks in the frequency domain

• Contains no unit cell repeats over the spatial domain

• Used to reduce noise and ambient light in the environment


Kinect IR Structured Light



• Uses a special (“astigmatic”) lens with different focal lengths in the x and y directions

• Orientation of the circle indicates depth


From depth to joints

• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images."

• Treat body segmentation as a per-pixel classification task (no pairwise term or CRF is used)

• The algorithm runs in 5 ms per frame on the Xbox GPU

• Novelty: Intermediate body parts representation

Body Part Inference

• Body part labeling

• 31 body parts

• Distinct parts for left and right allow classifier to disambiguate the left and right sides of the body

Body Part Inference

• Depth image features

• dI(x) is the depth at pixel x in image I

• θ=(u,v) describes offsets u and v

• Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations
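In Shotton et al. (2011) the feature compares the depth at two probe pixels whose offsets are scaled by the depth at x: fθ(I,x) = dI(x + u/dI(x)) − dI(x + v/dI(x)). Below is a minimal Python sketch of that comparison on a depth image stored as a list of rows. The helper names, the rounding of probe positions, and the value of `BACKGROUND_DEPTH` are assumptions for illustration.

```python
BACKGROUND_DEPTH = 1e6  # assumed large constant returned for probes that fall off the image


def probe(depth, x, y):
    """Read the depth at (x, y), or a large constant when the probe is outside the image."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND_DEPTH


def depth_feature(depth, x, y, u, v):
    """Depth difference between two probes offset by u and v, each scaled by 1 / depth at (x, y)."""
    d = depth[y][x]  # first pixel read
    ux, uy = x + int(round(u[0] / d)), y + int(round(u[1] / d))
    vx, vy = x + int(round(v[0] / d)), y + int(round(v[1] / d))
    return probe(depth, ux, uy) - probe(depth, vx, vy)  # second and third pixel reads


depth_image = [
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 2.0, 4.0, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]
# Offset u = (2, 0) becomes a 1-pixel step at depth 2.0; v = (0, 0) probes the pixel itself.
response = depth_feature(depth_image, 1, 1, (2.0, 0.0), (0.0, 0.0))
```

Dividing the offsets by the depth makes the probe pattern cover the same region of the body whether the person stands near to or far from the camera.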

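Randomized Decision Forests

• Fast and effective multi-class classifier

• Each split node consists of a feature fθ and a threshold τ

• At the leaf node reached in tree t, a learned distribution Pt(c|I,x) over body part labels c is given

• Final classification: P(c|I,x) = (1/T) Σt Pt(c|I,x), the average over the T trees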
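A small Python sketch of how a forest could be evaluated for one pixel: each tree is descended by comparing a feature response with the node threshold, and the leaf distributions are averaged over the trees. The dictionary layout of the trees and the `feature_fn` callback are hypothetical choices made for this example, not the data structures used in the paper.

```python
def classify_with_tree(node, feature_fn):
    """Descend one tree and return the class distribution stored at the reached leaf."""
    while "dist" not in node:
        # Split node: go left when the feature response is below the threshold tau.
        branch = "left" if feature_fn(node["theta"]) < node["tau"] else "right"
        node = node[branch]
    return node["dist"]


def classify_with_forest(trees, feature_fn):
    """Average the leaf distributions of all trees."""
    distributions = [classify_with_tree(tree, feature_fn) for tree in trees]
    num_classes = len(distributions[0])
    return [
        sum(dist[c] for dist in distributions) / len(distributions)
        for c in range(num_classes)
    ]


# Two toy depth-1 trees over two classes; theta is a plain number here for simplicity.
tree_a = {"theta": 1.0, "tau": 2.0, "left": {"dist": [0.8, 0.2]}, "right": {"dist": [0.1, 0.9]}}
tree_b = {"theta": 5.0, "tau": 2.0, "left": {"dist": [0.7, 0.3]}, "right": {"dist": [0.4, 0.6]}}

# Stand-in feature: the response is just theta itself.
posterior = classify_with_forest([tree_a, tree_b], feature_fn=lambda theta: theta)
```

In the real system `feature_fn` would evaluate the depth comparison feature at the pixel being classified.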

Combining Models

• Multiple classifiers work together

• Committees: e.g. averaging the predictions of a set of individual models, or taking a majority vote

• Boosting: classifiers trained in sequence, e.g. AdaBoost

• Decision Tree: binary selections corresponding to the traversal of a tree

Decision Tree

• Three major aspects

• A splitting criterion

• A stop-splitting rule

• A rule to assign each leaf to a specific class

• Decision Forests

• A Decision Tree Committee

Randomized Decision Forests: How to train?

• Training

• Each tree is trained on a different set of images

• 2000 example pixels are picked from each image

• Algorithm

Randomized Decision Forests

• Algorithm (cont.)

• Shannon entropy given Z on Y
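A hedged Python sketch of how a split could be chosen with Shannon entropy: for each candidate threshold, the examples are divided into a left and a right set and the threshold with the largest information gain wins. The function names and the toy data are illustrative, and only a single feature with a handful of candidate thresholds is shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the label histogram."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples, threshold):
    """Entropy reduction from splitting (feature_value, label) pairs at the threshold."""
    left = [label for value, label in examples if value < threshold]
    right = [label for value, label in examples if value >= threshold]
    labels = [label for _, label in examples]
    total = len(examples)
    return (
        entropy(labels)
        - (len(left) / total) * entropy(left)
        - (len(right) / total) * entropy(right)
    )


def best_threshold(examples, candidates):
    """Pick the candidate threshold with the highest information gain."""
    return max(candidates, key=lambda t: information_gain(examples, t))


examples = [(0.1, "hand"), (0.2, "hand"), (0.8, "head"), (0.9, "head")]
chosen = best_threshold(examples, candidates=[0.15, 0.5, 0.85])
```

A threshold of 0.5 separates the two labels perfectly here, so its gain equals the full 1 bit of entropy in the toy set.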

Randomized Decision Forests

• Algorithm (cont.)

• Training takes a lot of effort

• Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster

Where does the training data come from?

Training Data

• Depth imaging

• Simplifies the task of background subtraction

• Most important: easy to synthesize!

• Take real images → learn synthesis parameters → generate lots of training data


Joint Position Proposals

• From the previous section, each pixel has a body part classification

• Use Mean Shift with a weighted Gaussian kernel

Mean Shift

• Kernel density estimator

• Discrete points -> Continuous function

• Calculate the gradient at the initial point and shift

• Iterate until convergence
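The iteration above can be sketched in Python as a weighted mean shift with a Gaussian kernel: the estimate moves to the kernel-weighted mean of the points until the shift becomes negligible. The bandwidth, tolerance, and toy points are illustrative assumptions. In the Kinect setting the weights would plausibly come from the per-pixel body part probabilities, which this sketch does not compute.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, tol=1e-6, max_iter=100):
    """Climb to a local mode of the weighted Gaussian kernel density estimate."""
    x = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(x)
        denominator = 0.0
        for point, weight in zip(points, weights):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, point))
            k = weight * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
            denominator += k
            for i, coord in enumerate(point):
                numerator[i] += k * coord
        if denominator == 0.0:
            break  # all kernel values underflowed; keep the current estimate
        new_x = [n / denominator for n in numerator]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# A tight cluster near the origin plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (5.0, 5.0)]
weights = [1.0, 1.0, 1.0, 1.0]

mode_near_origin = mean_shift_mode(points, weights, start=(0.5, 0.5), bandwidth=0.5)
mode_far = mean_shift_mode(points, weights, start=(4.8, 5.2), bandwidth=0.5)
```

Different starting points converge to different modes, so running the procedure from several starts yields several joint proposals.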


Experiments and Results

• Synthetic

• Real

Experiments and Results

• Failure

Experiments and Results

• Training parameters vs. classification accuracy

Experiments and Results

• Comparisons


Conclusion

• Depth images may contain enough information to solve human pose problems

• Depth images are color and texture invariant, which greatly simplifies the correspondence problem

• A deep combined model with sufficient training data can become a good classifier even with simple features

• Buy a Kinect for the lab


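References

• Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation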


• Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US 2010/0118123A1

• Freedman, B., A. Shpunt, et al. (2008). Distance-Varying Illumination and Imaging Techniques for Depth Mapping, US 2010/0290698A1

• Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680.

• Albitar, I., P. Graebling, et al. (2007). “Robust structured light coding for 3D reconstruction,” IEEE.

• Scharstein, D. and R. Szeliski (2003). “High-accuracy stereo depth maps using structured light,” IEEE.

• Breiman, L. (2001). "Random forests." Machine learning 45(1): 5-32.

• Amit, Y. and D. Geman (1997). "Shape quantization and recognition with randomized trees." Neural computation 9(7): 1545-1588.

• John MacCormick, “How does the Kinect work?” users.dickinson.edu/~jmac/selected-talks/kinect.pdf

• “Structured Light”, www.igp.ethz.ch/photogrammetry/.../MV-SS2011-structured.pdf

• http://en.wikipedia.org/wiki/Kinect

• http://en.wikipedia.org/wiki/Structured-light_3D_scanner

• http://en.wikipedia.org/wiki/Triangulation

• http://dms.irb.hr/tutorial/tut_dtrees.php

• http://www.anandtech.com/show/4057/microsoft-kinect-the-anandtech-review/2

• Chen, Y. S. and B. T. Chen (2003). "Measuring of a three-dimensional surface by use of a spatial distance computation." Applied optics 42(11): 1958-1972.
