Kinect 2 Hands On
Luigi Oliveto
Researcher, Developer, IT Consultant
Email: [email protected]
Twitter: @LuigiOliveto
LinkedIn: https://it.linkedin.com/in/luigioliveto
Agenda
• The Sensor
• System Requirements
• Architecture
• Data Sources
• Kinect Studio
• Gesture Recognition
Kinect 2 Sensor
Depth resolution: 512×424 pixels
RGB resolution: 1920×1080 pixels (16:9)
Frame rate: 30 FPS
Mic frequency: 48 kHz
Range: from 0.5 to 4.5 m
[Figure: Kinect 2 hardware. Labels: sensor (3D depth sensor, RGB camera, multi-array mic), USB hub, power supply]
Kinect 1 vs Kinect 2
Feature | Kinect for Windows 1 | Kinect for Windows 2
Color Camera | 640 x 480 @ 30 fps | 1920 x 1080 @ 30 fps
Depth Camera | 320 x 240 | 512 x 424
Max Depth Distance | ~4.0 m | ~4.5 m
Min Depth Distance | 80 cm (40 cm in near mode) | 50 cm
Horizontal Field of View | 57 degrees | 70 degrees
Vertical Field of View | 43 degrees | 60 degrees
Tilt Motor | yes | no
Skeleton Joints Defined | 20 joints | 25 joints
Full Skeletons Tracked | 2 | 6
USB Standard | 2.0 | 3.0
Supported OS | Win 7, Win 8 | Win 8-8.1 (WSA)
Price (sensor + adapter) | ~ €160 | ~ €200
System Requirements
• Operating System
    • Windows 8/8.1 (x64)
    • Windows 8/8.1 Embedded Standard (x64)
• Hardware
    • 64-bit (x64) processor, i7 3.1 GHz (or higher)
    • 4 GB memory (or more)
    • Built-in USB 3.0 host controller
    • DirectX 11 capable graphics adapter: ATI Radeon (HD 5400 series, HD 6570, HD 7800), NVidia Quadro (600, K1000M), NVidia GeForce (GT 640, GTX 660), Intel HD 4400
    • Kinect v2 sensor (with power supply and USB hub)
• Software
    • .NET Framework 4.5
    • Visual Studio 2012 or higher
    • Microsoft Speech Platform Software Development Kit (Version 11)
    • Kinect for Windows SDK: http://www.microsoft.com/en-us/download/details.aspx?id=44561
• Applications
    • Windows Presentation Foundation (WPF)
    • Windows Store App
• Programming languages
    • C++, C#, VB.NET, …
https://dev.windows.com/en-us/kinect
Architecture (2)
• The sensor is a shared resource: many applications can access it simultaneously
• The sensor exposes a set of sources (functionalities)
• Readers can be opened on every source
• Every reader raises events that give references to the device's frames
• Every frame gives access to the data of its source (e.g. color image, body data, etc.)
Sensor → Sources → Reader → Frame Ref → Frame
Sensor
• Sensor usage
    • Get an instance of KinectSensor
    • Open the sensor
    • Use the sensor
    • Close the sensor
• If the device is unplugged
    • The KinectSensor instance remains valid
    • No more frames are sent/received
    • The sensor's IsAvailable property becomes false
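The sensor lifecycle can be sketched as a small console program. This is a minimal sketch, assuming a project that references the Microsoft.Kinect assembly from the Kinect for Windows SDK 2.0; the class name SensorLifecycle is illustrative only.

```csharp
using System;
using Microsoft.Kinect;

class SensorLifecycle
{
    static void Main()
    {
        // Get an instance: this succeeds even when no device is plugged in.
        KinectSensor sensor = KinectSensor.GetDefault();

        // Raised when the device is plugged in or unplugged; the instance stays valid.
        sensor.IsAvailableChanged += (s, e) =>
            Console.WriteLine("Sensor available: " + e.IsAvailable);

        sensor.Open();                 // open the sensor
        Console.WriteLine("Press Enter to quit.");
        Console.ReadLine();            // use the sensor (sources and readers go here)
        sensor.Close();                // close the sensor
    }
}
```

Unplugging the device while the program runs should print a change to false without invalidating the sensor object.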
Sources
• The sensor exposes a source for each functionality
    • Color source
    • Depth source
    • Infrared source
    • Body Index source
    • Body source (skeleton, hand tracking, lean…)
    • Audio source
Readers
• Give access to frames
    • Events
    • Polling
• Multiple readers can be created for each source
• A reader can be paused
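Both access styles can be shown on the depth source. This is a sketch, assuming the Microsoft.Kinect assembly is referenced and a sensor is connected; the class and handler names are illustrative.

```csharp
using System;
using Microsoft.Kinect;

class ReaderSketch
{
    static void Main()
    {
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();

        using (DepthFrameReader reader = sensor.DepthFrameSource.OpenReader())
        {
            // Event style: the handler runs whenever a new frame is available.
            reader.FrameArrived += OnDepthFrameArrived;
            Console.WriteLine("Receiving frames, press Enter to poll once.");
            Console.ReadLine();

            // Polling style: returns null when no new frame is ready.
            using (DepthFrame frame = reader.AcquireLatestFrame())
            {
                Console.WriteLine(frame == null ? "No new frame" : "Polled a frame");
            }

            // Pausing stops frame delivery without closing the reader.
            reader.IsPaused = true;
        }

        sensor.Close();
    }

    static void OnDepthFrameArrived(object sender, DepthFrameArrivedEventArgs e)
    {
        using (DepthFrame frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null) return;   // the frame may already have expired
            Console.WriteLine("Depth frame at " + frame.RelativeTime);
        }
    }
}
```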
Frame References
• Access current frame through AcquireFrame() method
• The frame contains metadata (e.g., for color: format, width, height)
• Frames MUST be processed quickly and then released: while a frame is held, new frames may not arrive
Frame
• Access frame data
    • Access raw buffer directly
    • Take a local copy
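The two ways of reading frame data, together with the acquire-and-release pattern, might look like this for a depth frame. It is a sketch under the assumption that the handler is attached to a DepthFrameReader; the per-frame allocation is kept only for brevity, and a real application would likely reuse the array.

```csharp
using System;
using Microsoft.Kinect;

static class FrameAccessSketch
{
    public static void OnDepthFrameArrived(object sender, DepthFrameArrivedEventArgs e)
    {
        // "using" releases the frame as soon as the block ends.
        using (DepthFrame frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null) return;

            // Frame metadata.
            FrameDescription desc = frame.FrameDescription;

            // Option 1: take a local copy (safe to keep after the frame is released).
            ushort[] copy = new ushort[desc.LengthInPixels];
            frame.CopyFrameDataToArray(copy);

            // Option 2: access the raw buffer directly (valid only while locked).
            using (KinectBuffer buffer = frame.LockImageBuffer())
            {
                Console.WriteLine("{0}x{1}, {2} bytes at {3}",
                    desc.Width, desc.Height, buffer.Size, buffer.UnderlyingBuffer);
            }
        }
    }
}
```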
MultiSourceFrameReader
• Delivers a matched set of frames from multiple sources in a single event
• Delivers frames at the lowest FPS of the selected sources
MultiSourceFrameReader MultiReader = Sensor.OpenMultiSourceFrameReader(
    FrameSourceTypes.Color | FrameSourceTypes.BodyIndex | FrameSourceTypes.Body);

// Inside the MultiSourceFrameArrived handler (args is MultiSourceFrameArrivedEventArgs):
var frame = args.FrameReference.AcquireFrame();
if (frame != null)
{
    using (var colorFrame = frame.ColorFrameReference.AcquireFrame())
    using (var bodyFrame = frame.BodyFrameReference.AcquireFrame())
    using (var bodyIndexFrame = frame.BodyIndexFrameReference.AcquireFrame())
    {
        // Each frame may be null: check before use.
    }
}
Kinect Data Sources – Color
• 1920 x 1080 array of color pixels
• 30 or 15 fps, based on lighting conditions
• Converted image formats: RGBA, BGRA, YUY2, …
• Raw format: YUY2
• Frame data can be:
    • Used in raw format
    • Converted to other formats (with a computational cost)
• The buffer is a byte array
• The number of bytes per pixel depends on the format: 2 for raw YUY2, 4 for RGBA/BGRA
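Converting a color frame to a 4-bytes-per-pixel format could be sketched as follows. The helper name ToBgra is hypothetical, and the sketch assumes the caller has already acquired a non-null ColorFrame.

```csharp
using Microsoft.Kinect;

static class ColorSketch
{
    // Returns the frame as BGRA bytes (4 bytes per pixel).
    public static byte[] ToBgra(ColorFrame frame)
    {
        // Describes the frame as it would look in BGRA, including bytes per pixel.
        FrameDescription desc = frame.CreateFrameDescription(ColorImageFormat.Bgra);
        byte[] pixels = new byte[desc.Width * desc.Height * (int)desc.BytesPerPixel];

        if (frame.RawColorImageFormat == ColorImageFormat.Bgra)
        {
            // Already in the target format: plain copy, no conversion cost.
            frame.CopyRawFrameDataToArray(pixels);
        }
        else
        {
            // Raw data is typically YUY2, so this path pays the conversion cost.
            frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);
        }
        return pixels;
    }
}
```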
Kinect Data Sources – Infrared
• 512 x 424 pixel @ 30 fps
• Same physical sensor as the depth source
• Two sources:
    • Infrared: single infrared frame
    • LongExposureInfrared: combination of 3 frames (better signal-to-noise ratio, but blurrier images)
• Each pixel is 2 bytes (16 bits) and represents the IR intensity value
• Ambient light is removed: the SDK only sees the reflection of the infrared light projected by the device
Kinect Data Sources – Depth
• 512 x 424 pixels @ 30 fps
• Range: 0.5 – 4.5 meters (extended depth up to 8 m)
• Each pixel is 2 bytes (16 bits) and contains the distance in millimeters from the sensor's focal plane
• No player index in the depth data (see the Body Index source)
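Reading one depth value from a copied frame is plain array arithmetic. The sketch below does not touch the SDK; DepthMath and MetersAt are hypothetical names, and treating values outside 0.5 to 4.5 m (including 0) as "no reliable reading" is an assumption based on the range stated above.

```csharp
using System;

static class DepthMath
{
    public const int Width = 512;
    public const int Height = 424;
    const ushort MinReliableMm = 500;    // 0.5 m
    const ushort MaxReliableMm = 4500;   // 4.5 m

    // Distance in meters at pixel (x, y), or null when the value is 0
    // or outside the reliable range.
    public static double? MetersAt(ushort[] depthMm, int x, int y)
    {
        if (depthMm == null || depthMm.Length != Width * Height)
            throw new ArgumentException("Expected a 512 x 424 depth buffer.");
        if (x < 0 || x >= Width || y < 0 || y >= Height)
            throw new ArgumentOutOfRangeException("x", "Pixel outside the depth frame.");

        ushort mm = depthMm[y * Width + x];   // row-major layout
        if (mm < MinReliableMm || mm > MaxReliableMm) return null;
        return mm / 1000.0;
    }
}
```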
Kinect Data Sources – Body Index
• 512 x 424 @ 30 fps
• Each pixel is 1 byte
• Pixel data
    • 0 to 5: index of the corresponding body, as tracked by the body source
    • > 5: no tracked body at that pixel
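The pixel encoding above can be used, for example, to measure how much of the image each body covers. A minimal SDK-independent sketch; BodyIndexMath and CountPixelsPerBody are hypothetical names.

```csharp
static class BodyIndexMath
{
    public const int MaxBodies = 6;

    // Counts, for each body index 0..5, how many pixels belong to that body.
    // Any value above 5 means "no tracked body" and is ignored.
    public static int[] CountPixelsPerBody(byte[] bodyIndex)
    {
        int[] counts = new int[MaxBodies];
        foreach (byte value in bodyIndex)
        {
            if (value < MaxBodies) counts[value]++;
        }
        return counts;
    }
}
```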
Kinect Data Sources – Body
• Range is 0.5-4.5 meters
• 30 fps
• Frame data is a collection of Body objects
• Each body has
    • 25 joints (each joint has a position in 3D space and an orientation)
    • Hand tracking (open, closed, "lasso")
    • Face tracking and expressions
    • Bone orientations
• Up to 6 simultaneous bodies
• Hand state on 2 bodies
Body information
• The Body class contains useful properties:
    • ClippedEdges: edges of the field of view that clip the body
    • HandState [Left/Right]: { Unknown, NotTracked, Closed, Open, Lasso }
    • HandConfidence [Left/Right]: { High, Low }
    • IsRestricted
    • IsTracked
    • TrackingId: 64-bit unique id
    • Joints: position in space of each joint
    • JointOrientations: orientation in space of each joint
    • Lean: inclination vector of the body
    • LeanTrackingState: { Inferred, NotTracked, Tracked }
• Up to 6 bodies simultaneously
• Up to 2 players' hands simultaneously
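Reading a few of these properties from the body source could look like the sketch below. It assumes the Microsoft.Kinect assembly and a connected sensor; class and handler names are illustrative.

```csharp
using System;
using Microsoft.Kinect;

class BodySketch
{
    static Body[] bodies;

    static void Main()
    {
        KinectSensor sensor = KinectSensor.GetDefault();
        bodies = new Body[sensor.BodyFrameSource.BodyCount];   // 6 slots
        sensor.Open();

        using (BodyFrameReader reader = sensor.BodyFrameSource.OpenReader())
        {
            reader.FrameArrived += OnBodyFrameArrived;
            Console.ReadLine();
        }
        sensor.Close();
    }

    static void OnBodyFrameArrived(object sender, BodyFrameArrivedEventArgs e)
    {
        using (BodyFrame frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null) return;
            frame.GetAndRefreshBodyData(bodies);   // copy, then release the frame
        }

        foreach (Body body in bodies)
        {
            if (body == null || !body.IsTracked) continue;

            CameraSpacePoint head = body.Joints[JointType.Head].Position;
            Console.WriteLine(
                "Body {0}: head=({1:F2}, {2:F2}, {3:F2}) m, right hand={4} ({5}), lean=({6:F2}, {7:F2})",
                body.TrackingId, head.X, head.Y, head.Z,
                body.HandRightState, body.HandRightConfidence,
                body.Lean.X, body.Lean.Y);
        }
    }
}
```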
Kinect Data Sources – Audio
• Frame data is an audio beam
• Readers and events work as for the other sources
• Acquire frames through the AcquireBeamFrames() method
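For audio, the reader pattern is the same but the frame reference yields a list of beam frames. A sketch, assuming the Microsoft.Kinect assembly and a connected sensor; it only prints the beam direction and does not copy any samples.

```csharp
using System;
using Microsoft.Kinect;

class AudioSketch
{
    static void Main()
    {
        KinectSensor sensor = KinectSensor.GetDefault();
        sensor.Open();

        using (AudioBeamFrameReader reader = sensor.AudioSource.OpenReader())
        {
            reader.FrameArrived += OnAudioFrameArrived;
            Console.ReadLine();
        }
        sensor.Close();
    }

    static void OnAudioFrameArrived(object sender, AudioBeamFrameArrivedEventArgs e)
    {
        AudioBeamFrameList frames = e.FrameReference.AcquireBeamFrames();
        if (frames == null) return;

        using (frames)   // release the frames as soon as possible
        {
            // Each beam frame is split into sub-frames carrying the audio data.
            foreach (AudioBeamSubFrame sub in frames[0].SubFrames)
            {
                Console.WriteLine("Beam angle {0:F2} rad, confidence {1:F2}",
                    sub.BeamAngle, sub.BeamAngleConfidence);
            }
        }
    }
}
```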
Coordinate System
• ColorSpace (coordinate system of the color image)
    • … Color
• DepthSpace (coordinate system of the depth data)
    • … Depth, Infrared, BodyIndex
• CameraSpace (coordinate system with the origin located at the sensor)
    • … Body (Joint)
Coordinate Mapper
• Three coordinate systems
• Coordinate mapper provides conversions between each system
• Convert single or multiple points
Name | Applies to | Dimensions | Units | Range | Origin
ColorSpacePoint | Color | 2 | pixels | 1920x1080 | Top left corner
DepthSpacePoint | Depth, Infrared, Body index | 2 | pixels | 512x424 | Top left corner
CameraSpacePoint | Body | 3 | meters | – | Infrared/depth camera
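A typical use of the coordinate mapper is drawing a joint on top of the color or depth image. The helper names below are hypothetical; the sketch assumes a tracked Body and an open sensor, and the returned point may lie outside the image, so it should be checked before use.

```csharp
using Microsoft.Kinect;

static class MappingSketch
{
    // Joint position (CameraSpace, meters) to a pixel in the 1920x1080 color image.
    public static ColorSpacePoint JointToColorPixel(KinectSensor sensor, Body body, JointType jointType)
    {
        CameraSpacePoint position = body.Joints[jointType].Position;
        return sensor.CoordinateMapper.MapCameraPointToColorSpace(position);
    }

    // Same joint to a pixel in the 512x424 depth/infrared/body index image.
    public static DepthSpacePoint JointToDepthPixel(KinectSensor sensor, Body body, JointType jointType)
    {
        CameraSpacePoint position = body.Joints[jointType].Position;
        return sensor.CoordinateMapper.MapCameraPointToDepthSpace(position);
    }
}
```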
Kinect Region & User Controls
• The KinectRegion user control defines a part of the user interface (XAML) where the user can interact with a hand pointer
• The region must be connected to the sensor instance
• Available gestures ("out-of-the-box") usable in a KinectRegion:
    • Click
    • Grab
    • Pan
    • Zoom
• KinectUserViewer gives visual feedback on the tracked state of the users
• Re-use default user controls
Recordable Data Sources
• Infrared: 13 MB/s
• Depth: 13 MB/s
• BodyFrame
• BodyIndex
• Color: 120 MB/s
• Audio: 32 KB/s
Legend: Record/Play, Record Only
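The depth, infrared and color rates follow from resolution × bytes per pixel × frame rate. The sketch below reproduces the arithmetic, assuming uncompressed streams at 30 fps and raw YUY2 color (2 bytes per pixel); DataRates is a hypothetical helper.

```csharp
static class DataRates
{
    // Uncompressed stream size in bytes per second.
    public static long BytesPerSecond(int width, int height, int bytesPerPixel, int fps)
    {
        return (long)width * height * bytesPerPixel * fps;
    }
}

// Depth or infrared: DataRates.BytesPerSecond(512, 424, 2, 30)   = 13,025,280 B/s, about 13 MB/s
// Color (raw YUY2):  DataRates.BytesPerSecond(1920, 1080, 2, 30) = 124,416,000 B/s, roughly the 120 MB/s listed
```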
Gesture Recognition
• Heuristic
    • Gesture is a coding problem
    • Quick to do simple gestures/poses (hand over head)
    • ML can also be useful to find good signals for a heuristic approach
• Machine Learning (ML) with Gesture Builder (G.B.)
    • Gesture is a data problem
    • Signals which may not be easily human understandable (progress in a baseball swing)
    • Large investment for production
    • Danger of over-fitting: the detector becomes too specific and stops recognizing generic cases
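The "hand over head" pose is a good example of the heuristic approach: a single comparison on joint positions. A minimal sketch, assuming the Microsoft.Kinect assembly; the method name is hypothetical, and a production detector would probably add a margin and some smoothing over time.

```csharp
using Microsoft.Kinect;

static class HandOverHead
{
    // True when the right hand is above the head (CameraSpace Y axis points up).
    public static bool IsRightHandOverHead(Body body)
    {
        if (body == null || !body.IsTracked) return false;

        Joint hand = body.Joints[JointType.HandRight];
        Joint head = body.Joints[JointType.Head];

        // Ignore joints the sensor could not locate at all.
        if (hand.TrackingState == TrackingState.NotTracked ||
            head.TrackingState == TrackingState.NotTracked)
        {
            return false;
        }

        return hand.Position.Y > head.Position.Y;
    }
}
```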
Visual Gesture Builder (1)
• New tool integrated with v2 SDK
• Organize data using projects and solutions
• Give meaning to data by tagging gestures
• Build gestures using machine learning technology
    • Adaptive Boosting (AdaBoost) Trigger
        • Determines if player is performing gesture
    • Random Forest Regression (RFR) Progress
        • Determines the progress of the gesture performed by player
• Analyze / test the results of gesture detection
• Live preview of results
Resources
• General Info & Blog https://dev.windows.com/en-us/Kinect
• Purchase Sensor http://goo.gl/ZsMtBx
• Developer Forums https://goo.gl/bpptyq
• Twitter Account @KinectWindows
• A Facebook Group http://on.fb.me/1LSflbX
• A LinkedIn Group http://linkd.in/1J9gFcY
• A Twitter Account @KinectDevelop
• A Google Plus Page http://bit.ly/1SHtduT