Becoming a Kinect Hacker Innovator

Upload: jeff-sipko

Post on 15-May-2015


DESCRIPTION

The slides accompanying the "Becoming a Kinect Hacker^h Innovator" talk from SAPO Codebits 2011. https://codebits.eu/intra/s/session/223

TRANSCRIPT

Page 1: Becoming a kinect hacker innovator v2

Becoming a Kinect Hacker Innovator

Page 3: Becoming a kinect hacker innovator v2

Who We Are

Jeff Sipko
– SDE on KinectShare / KinectHack
– Part of Good Science, working on Fun Labs

KinectShare

Page 4: Becoming a kinect hacker innovator v2

Overview

• History
• Capabilities
• Code
• Demo

Page 7: Becoming a kinect hacker innovator v2

Atari 2600
Best video game system… ever…

Page 24: Becoming a kinect hacker innovator v2

Started as a $30,000 prototype

Vision: Shift the world from thinking "We need to understand technology" to "Technology needs to understand us"

Page 25: Becoming a kinect hacker innovator v2

Why Kinect?

Option A:

Page 26: Becoming a kinect hacker innovator v2

Why Kinect?

Option You:

Page 27: Becoming a kinect hacker innovator v2

The Challenge

• Find the people in the scene, ignore background
• Find their limbs and joints, which person is which
• Find and track their gestures
• Map the gestures to meaning and commands

• Also: recognize faces
• Also: recognize voices and commands
• Also: reduce hardware costs to consumer levels

P.S. And play the game!

Page 28: Becoming a kinect hacker innovator v2

“What are those things?”

Multi-array Microphone

RGB Camera

IR Camera

Page 29: Becoming a kinect hacker innovator v2

IR Camera: Depth Computation

Page 30: Becoming a kinect hacker innovator v2

IR Camera: Depth Map

Page 31: Becoming a kinect hacker innovator v2

IR Camera: Provided Data
Depth and segmentation map

Page 32: Becoming a kinect hacker innovator v2

Skeletal - Provided Data

Page 33: Becoming a kinect hacker innovator v2

Vision Algorithm (Summary)

• Quickly and accurately predict 3D positions of body joints
• From a single depth image, using no temporal information
• Object recognition approach
• Intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem
• Large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc.
• Generate confidence-scored 3D proposals of several body joints by re-projecting the classification result and finding local modes
• System runs at 200 frames per second on consumer hardware
• Evaluation shows high accuracy on both synthetic and real test sets
• State-of-the-art accuracy in comparison with related work, and improved generalization over exact whole-skeleton nearest neighbor matching
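The "finding local modes" step can be sketched in a few lines. The slides do not name the mode-finding method; the Python sketch below assumes a weighted Gaussian mean shift, which is one common way to do it. The function name `find_joint_mode`, the bandwidth, and the sample data are all illustrative, not taken from the Kinect pipeline.

```python
import math

def _kernel_weights(points, weights, center, bandwidth):
    """Per-pixel weight: classifier probability times a Gaussian falloff."""
    result = []
    for point, weight in zip(points, weights):
        dist_sq = sum((point[i] - center[i]) ** 2 for i in range(3))
        result.append(weight * math.exp(-dist_sq / bandwidth ** 2))
    return result

def find_joint_mode(points, weights, start, bandwidth=0.1, iterations=20):
    """Shift `start` toward the nearest density peak of the weighted points.

    points  : re-projected 3D positions (metres) of pixels labelled as one body part
    weights : per-pixel probability of that body part (hypothetical classifier output)
    Returns (mode, confidence), where confidence is the kernel mass at the mode.
    """
    mode = tuple(start)
    for _ in range(iterations):
        kernel = _kernel_weights(points, weights, mode, bandwidth)
        total = sum(kernel)
        if total == 0.0:
            break  # no support near the current estimate
        mode = tuple(
            sum(k * p[i] for k, p in zip(kernel, points)) / total
            for i in range(3)
        )
    confidence = sum(_kernel_weights(points, weights, mode, bandwidth))
    return mode, confidence

# Three pixels agree on a joint near (0, 1, 2); one far-away pixel is an outlier.
pixels = [(0.00, 1.0, 2.0), (0.02, 1.0, 2.0), (-0.02, 1.0, 2.0), (1.0, 0.0, 3.0)]
probabilities = [0.9, 0.9, 0.9, 0.8]
joint, score = find_joint_mode(pixels, probabilities, start=(0.02, 1.0, 2.0))
```

The outlier has almost no influence because the Gaussian kernel discounts distant pixels, which is why a mode is more robust here than a plain average of all labelled pixels.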

Page 34: Becoming a kinect hacker innovator v2

In Practice

• Collect training data – thousands of visits to global households, filming real users; the Hollywood motion capture studio generated billions of images
• Apply state-of-the-art object recognition research
• Apply state-of-the-art real-time semantic segmentation
• Build a training set – classify each pixel’s probability of being in any of 32 body segments, determine the probabilistic cluster of body configurations consistent with those, present the most probable
• Millions of training images, millions of classifier parameters
• Hard to parallelize: new algorithm for distributed decision-tree training
• Fun Fact: Major use of DryadLINQ (large-scale distributed cluster computing)

Page 35: Becoming a kinect hacker innovator v2

Motorized Tilt

±28° up / down

Page 36: Becoming a kinect hacker innovator v2

The Audio System

Page 37: Becoming a kinect hacker innovator v2

Demo: Multichannel Echo Cancellation (MEC)

Input Stream (what the mic array hears) → MEC → Post-MEC (what the APIs present)

Page 38: Becoming a kinect hacker innovator v2

Beam Forming / Source Localization

Automatically points to loudest sound source

Manually steer the direction of the listening beam in 10° increments
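As a rough illustration of steering in 10° increments, the Python sketch below snaps a requested direction to the nearest allowed beam angle. The ±50° limit is an assumption about the microphone array's steering range, not something the slide states, and `snap_beam_angle` is a hypothetical helper rather than an SDK call.

```python
import math

def snap_beam_angle(requested_deg, step_deg=10, limit_deg=50):
    """Clamp to the assumed steering range, then round to the nearest step.

    limit_deg=50 is an assumed range; only the 10-degree step comes from the slide.
    Ties (e.g. 25 degrees) round up toward the larger angle.
    """
    clamped = max(-limit_deg, min(limit_deg, requested_deg))
    return int(math.floor(clamped / step_deg + 0.5)) * step_deg

beam_for_speaker = snap_beam_angle(23)    # a talker slightly right of centre
beam_for_far_left = snap_beam_angle(-90)  # outside the assumed range
```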

Page 39: Becoming a kinect hacker innovator v2

Speech Recognition

Page 40: Becoming a kinect hacker innovator v2

Kinect Speech Recognition Languages (as of June 2011 SDK)

Acoustic model   Language   Countries/Regions
de-DE            German     Germany
en-AU            English    Australia, New Zealand
en-GB            English    Ireland, United Kingdom
en-US            English    Canada, United States
es-ES            Spanish    Spain
es-MX            Spanish    Mexico
fr-CA            French     Canada
fr-FR            French     France
it-IT            Italian    Italy
ja-JP            Japanese   Japan

Page 41: Becoming a kinect hacker innovator v2

Sample Grammar (from Simple Speech Recognition sample)

<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics/1.0-literals"
         sapi:alphabet="x-microsoft-ups"
         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"
         xml:lang="en-US" root="rootrule" version="1.0">
  <rule id="rootrule" scope="public">
    <one-of>
      <item>view inventory <tag>view inventory</tag> </item>
      <item>show quests <tag>show quests</tag> </item>
      <item>pause game <tag>pause game</tag> </item>
      <item>open <token sapi:pron="S P EH L B UH K">spellbook</token> <tag>open spellbook</tag> </item>
      […]
    </one-of>
  </rule>
</grammar>

Custom pronunciation (the sapi:pron attribute on <token>)

Property tag (the <tag> element)

Page 42: Becoming a kinect hacker innovator v2

Sample Grammar: Localized (from Simple Speech Recognition sample)

<?xml version="1.0" encoding="utf-16"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics/1.0-literals"
         sapi:alphabet="x-microsoft-ups"
         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"
         xml:lang="es-ES" root="rootrule" version="1.0">
  <rule id="rootrule" scope="public">
    <one-of>
      <item>ver inventario <tag>view inventory</tag> </item>
      <item>muestra aventuras <tag>show quests</tag> </item>
      <item>pausa el juego <tag>pause game</tag> </item>
      <item>abre libros de hechizos <tag>open spellbook</tag> </item>
      […]
    </one-of>
  </rule>
</grammar>

Property tag (the <tag> element)
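The point of the property tag is that the application can key off the tag rather than the spoken words, so a localized grammar needs no code changes. The Python sketch below illustrates that idea by reading abbreviated copies of the two grammars with the standard library XML parser. It is only an offline illustration: `GRAMMAR_TEMPLATE` and `phrases_to_tags` are hypothetical names, and a real application would get the tag from the speech engine rather than parse the file itself.

```python
import xml.etree.ElementTree as ET

SRGS = "{http://www.w3.org/2001/06/grammar}"  # SRGS namespace from the slides

# Abbreviated grammar shell; only two items per language are filled in below.
GRAMMAR_TEMPLATE = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
  xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"
  tag-format="semantics/1.0-literals" sapi:alphabet="x-microsoft-ups"
  xml:lang="{lang}" root="rootrule" version="1.0">
  <rule id="rootrule" scope="public"><one-of>{items}</one-of></rule>
</grammar>"""

EN_ITEMS = (
    '<item>pause game <tag>pause game</tag> </item>'
    '<item>open <token sapi:pron="S P EH L B UH K">spellbook</token> '
    '<tag>open spellbook</tag> </item>'
)
ES_ITEMS = (
    '<item>pausa el juego <tag>pause game</tag> </item>'
    '<item>abre libros de hechizos <tag>open spellbook</tag> </item>'
)

def phrases_to_tags(grammar_xml):
    """Map each spoken phrase in the grammar to its property tag."""
    mapping = {}
    for item in ET.fromstring(grammar_xml).iter(SRGS + "item"):
        tag = item.find(SRGS + "tag")
        if tag is None:
            continue
        # Collect the spoken words: item text plus any <token> text, minus the tag.
        parts = [item.text or ""]
        for child in item:
            if child is not tag:
                parts.append(child.text or "")
            parts.append(child.tail or "")
        phrase = " ".join("".join(parts).split())
        mapping[phrase] = (tag.text or "").strip()
    return mapping

english = phrases_to_tags(GRAMMAR_TEMPLATE.format(lang="en-US", items=EN_ITEMS))
spanish = phrases_to_tags(GRAMMAR_TEMPLATE.format(lang="es-ES", items=ES_ITEMS))
```

Both dictionaries end up with the same set of values, so a command dispatcher keyed on the tag text would handle either language unchanged.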

Page 43: Becoming a kinect hacker innovator v2

What does this allow?
• Watch, track, and render people’s motion
• Recognize faces and facial expressions
• Recognize voices, words, and tone
• What else?

What is coming?
• Background Removal
• Seated Skeletal Tracking
• Finger Tracking
• Head Tracking
• Digital Object Creation
• ???? = You

Emotionally Aware

Page 44: Becoming a kinect hacker innovator v2

The Kinect SDK

Provides both Unmanaged and Managed API
• Unmanaged API – Concepts work in C++
• Managed API – Concepts work in both VB/C#

Samples & documentation to get you started

Assumes some programming experience

http://kinectforwindows.org

Page 45: Becoming a kinect hacker innovator v2

What’s in the SDK?

Raw sensor streams
Access to raw data streams from the depth sensor, color camera sensor, and four-element microphone array enables developers to build upon the low-level streams that are generated by the Kinect sensor.

Skeletal tracking
The capability to track the skeleton image of one or two people moving within the Kinect field of view makes it easy to create gesture-driven applications.

Advanced audio capabilities
Audio processing capabilities include sophisticated acoustic noise suppression and echo cancellation, beam formation to identify the current sound source, and integration with the Windows speech recognition API.

Sample code and documentation
The SDK includes more than 100 pages of technical documentation. In addition to built-in help files, the documentation includes detailed walkthroughs for most samples provided with the SDK.

Easy installation
The SDK installs quickly and requires no complex configuration, and the complete installer is smaller than 100 MB. Developers can get up and running in just a few minutes with a standard standalone Kinect sensor unit (widely available at retail outlets).

Designed for non-commercial purposes; a commercial version is expected later.

Windows 7 – C++, C#, or Visual Basic in Microsoft Visual Studio 2010.

Page 46: Becoming a kinect hacker innovator v2

Windows SDK: Architecture

Page 47: Becoming a kinect hacker innovator v2

The Tools

• Visual Studio 2010 (Express or other)
• .NET Framework 4.0
• Kinect for Windows SDK
• Microsoft Speech Framework
• Coding4Fun Kinect Toolkit
• XNA Game Studio

Page 48: Becoming a kinect hacker innovator v2

Let’s Get Started!

… Literally!

Runtime nui = new Runtime();
Runtime.Initialize();

And

nui.VideoStream.Open(…);
nui.DepthStream.Open(…);

… and then later:

Runtime.Shutdown();

Page 49: Becoming a kinect hacker innovator v2

Gimme Data

Polling Method
    nui.DepthStream.GetNextFrame(timeout)
    nui.VideoStream.GetNextFrame(timeout)
    nui.SkeletonEngine.GetNextFrame(timeout)

Event Method
    nui.DepthFrameReady += new EventHandler<ImageFrameReadyEventArgs>(fn)
    nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(fn)
    nui.VideoFrameReady += new EventHandler<ImageFrameReadyEventArgs>(fn)

Page 50: Becoming a kinect hacker innovator v2

What you get

Color Image:
• Byte array in B8G8R8 format by default
• Also supports YUV

Depth Image:
• 13 high-order bits contain the distance in mm
• 3 low-order bits contain the player index
• Only valid values are 0, 1, 2
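The depth layout above amounts to a shift and a mask on each 16-bit pixel. The Python sketch below shows the arithmetic. It assumes each pixel arrives as two bytes with the low byte first, and the function names are illustrative rather than part of the SDK.

```python
def unpack_depth_pixel(value):
    """Split one 16-bit depth pixel into (distance_mm, player_index)."""
    distance_mm = value >> 3      # 13 high-order bits
    player_index = value & 0x07   # 3 low-order bits
    return distance_mm, player_index

def unpack_depth_bytes(low_byte, high_byte):
    """Same split, starting from the two raw bytes (assumed little-endian)."""
    return unpack_depth_pixel(low_byte | (high_byte << 8))

# A pixel 1200 mm away that belongs to player 2.
packed = (1200 << 3) | 2
distance, player = unpack_depth_pixel(packed)
same_pixel = unpack_depth_bytes(packed & 0xFF, packed >> 8)
```

Going by the slide's list of valid values, a player index of 0 would mean the pixel belongs to no tracked player, so background pixels still carry a usable distance.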

Page 51: Becoming a kinect hacker innovator v2

Speech

Start the Recognition Engine

var sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

Create a grammar

Choices colors = new Choices();
colors.Add(new string[] {"red"});
colors.Add(new string[] {"green"});
colors.Add(new string[] {"blue"});
GrammarBuilder gb = new GrammarBuilder();
gb.Append(colors);
// Create the Grammar instance.
Grammar g = new Grammar(gb);
sre.LoadGrammar(g);

Page 52: Becoming a kinect hacker innovator v2

Speech – Grammar From File

FileStream fs = new FileStream(grammarPath + "foobar.cfg", FileMode.Create);
SrgsGrammarCompiler.Compile(grammarPath + "foobar.grxml", (Stream)fs);
fs.Close();
Grammar g = new Grammar(grammarPath + "foobar.cfg", "rootrule");

Page 53: Becoming a kinect hacker innovator v2

Speech – cont

Register for speech events

sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

and

void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    MessageBox.Show(e.Result.Text);
}

Page 54: Becoming a kinect hacker innovator v2

Show me the demo!

Page 55: Becoming a kinect hacker innovator v2

Kinect Fun Labs…

Page 57: Becoming a kinect hacker innovator v2

Q & A

Bueller?…Bueller?

Page 58: Becoming a kinect hacker innovator v2

Thank You! Obrigado!

Forums: http://kinectforwindows.org/resources

Email me [email protected]