Training Kinect Mihai Budiu Microsoft Research, Silicon Valley UCSD CNS 2012 RESEARCH REVIEW February 8, 2012


Training Kinect

Mihai Budiu
Microsoft Research, Silicon Valley

UCSD CNS 2012 RESEARCH REVIEW February 8, 2012


Label body parts in depth map

Parallelizing the Training of the Kinect Body Parts Labeling Algorithm
Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio
Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011


Solution: Learn from Data

Training examples → Machine learning → Classifier


Big data

• 1M training examples
• 300,000 pixels/image
• 100,000 features
• < 2^20 tree nodes/tree
• 31 body parts
• 3 trees

Dryad

DryadLINQ

Decision forest inference

Classifier
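
Decision forest inference can be illustrated with a minimal sketch. The slide gives no implementation details, so everything here is an assumption made for illustration: each pixel is represented as a plain feature vector, internal nodes apply a threshold test on one feature, leaves store a probability per body part, and the forest averages the leaf distributions of its trees. The names `Node`, `classify_with_tree`, and `classify_with_forest` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    # Internal node: compare one feature against a threshold (assumed test form).
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: probability of each body part; None marks an internal node.
    distribution: Optional[List[float]] = None


def classify_with_tree(root: Node, x: Sequence[float]) -> List[float]:
    """Walk from the root to a leaf and return that leaf's distribution."""
    node = root
    while node.distribution is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.distribution


def classify_with_forest(trees: Sequence[Node], x: Sequence[float]) -> Tuple[int, List[float]]:
    """Average the per-tree distributions and return (best label, average)."""
    dists = [classify_with_tree(t, x) for t in trees]
    avg = [sum(col) / len(dists) for col in zip(*dists)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return best, avg


# Two tiny hand-built trees over two labels (0 and 1).
tree_a = Node(feature=0, threshold=0.5,
              left=Node(distribution=[1.0, 0.0]),
              right=Node(distribution=[0.0, 1.0]))
tree_b = Node(feature=1, threshold=0.5,
              left=Node(distribution=[0.5, 0.5]),
              right=Node(distribution=[0.0, 1.0]))
label, probabilities = classify_with_forest([tree_a, tree_b], [0.2, 0.2])
```

With the figures on this slide, the forest would hold 3 such trees and each leaf distribution would have 31 entries, one per body part.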


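Data-Parallel Computation

• Application
• Language: SQL (Parallel Databases); Sawzall, FlumeJava (Map-Reduce); DryadLINQ (Dryad); Pig, Hive (Hadoop)
• Execution: Parallel Databases; Map-Reduce; Dryad; Hadoop
• Storage: GFS, BigTable (Map-Reduce); Cosmos, Azure, HPC (Dryad); HDFS, S3 (Hadoop)
• Other language labels on the slide: ≈SQL, LINQ, Sawzall, Java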


Dryad = 2-D Piping

• Unix Pipes: 1-D
  grep | sed | sort | awk | perl

• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
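
The 2-D piping idea, in which every stage of a pipeline runs as many parallel instances, can be sketched in a few lines. This is only an illustration and not how Dryad is implemented: threads stand in for cluster machines, data is redistributed round-robin between stages, and the names `partition`, `run_stage`, and `run_pipeline` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, width):
    """Split items round-robin into `width` partitions."""
    return [items[i::width] for i in range(width)]


def run_stage(func, items, width):
    """Run `width` copies of one stage in parallel, one per partition."""
    parts = partition(items, width)
    with ThreadPoolExecutor(max_workers=width) as pool:
        results = list(pool.map(func, parts))  # map preserves partition order
    return [x for part in results for x in part]


def run_pipeline(stages, items):
    """Chain stages; each stage is a (function, number of instances) pair."""
    for func, width in stages:
        items = run_stage(func, items, width)
    return items


def grep_stage(lines):
    return [line for line in lines if "kinect" in line]


def upper_stage(lines):
    return [line.upper() for line in lines]


def sort_stage(lines):
    return sorted(lines)


# The final sort uses a single instance here; a parallel sort would need
# range partitioning or a merge step, which this sketch leaves out.
stages = [(grep_stage, 4), (upper_stage, 2), (sort_stage, 1)]
lines = ["kinect depth", "dryad", "train kinect", "linq", "kinect body"]
output = run_pipeline(stages, lines)
```

The horizontal dimension is the sequence of stages, as in a Unix pipe; the vertical dimension is the number of instances per stage.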


Virtualized 2-D Pipelines
• 2D DAG
• multi-machine
• virtualized


Fault Tolerance


LINQ

Dryad

=> DryadLINQ


LINQ = .Net + Queries

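Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

// Keep the records with a legal key; project each to (hashed key, value).
// An anonymous-type member initialized from a method call needs an explicit name.
var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };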
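
For readers less familiar with C#, the same filter-and-project query can be approximated with a Python list comprehension. This is a rough analogue under stated assumptions, not DryadLINQ: the `Record` type and the bodies of `is_legal` and `hash_key` are hypothetical stand-ins, since the slide declares `IsLegal` and `Hash` without defining them.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    value: int


def is_legal(key: str) -> bool:
    # Hypothetical predicate: the slide does not define IsLegal.
    return key.isalnum()


def hash_key(key: str) -> str:
    # Hypothetical hash: the slide does not define Hash.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]


collection = [Record("abc", 1), Record("bad key!", 2), Record("xyz9", 3)]

# where IsLegal(c.key)  ->  the `if` clause
# select new { ... }    ->  the dictionary built for each kept record
results = [
    {"hash": hash_key(c.key), "value": c.value}
    for c in collection
    if is_legal(c.key)
]
```

The query reads as a declarative description of the result and does not say how or where it is evaluated.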


DryadLINQ Data Model

[Diagram: a collection is divided into partitions, each holding .Net objects.]


Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

DryadLINQ = LINQ + Dryad

[Diagram: the C# query over collection is translated into a query plan (a Dryad job) and C# vertex code; the job reads the data in collection and produces results.]


Kinect Training Pipeline

20x


Query plan for one tree layer

Partial tree + Images + Features → split → New partial tree

Parallelize on:
• Features
• Images
• Tree nodes
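
One way to picture a single layer step is as a map over image partitions followed by a reduce that picks the best split for each frontier node. The sketch below is an assumption-laden illustration, not the authors' query plan: examples are plain feature vectors with a label, candidate features are (feature index, threshold) pairs, children of node n are numbered 2n+1 and 2n+2, and splits are chosen by minimum weighted child entropy (equivalent to maximum information gain). It parallelizes over images only; all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def route(tree, x):
    """Follow existing splits; return the frontier node that x reaches."""
    node = 0
    while node in tree:  # tree maps node id -> (feature index, threshold)
        feature, threshold = tree[node]
        node = 2 * node + (1 if x[feature] < threshold else 2)
    return node


def partition_histograms(tree, candidates, examples):
    """Map step: label counts per (node, candidate, side) for one partition."""
    hist = defaultdict(Counter)
    for x, label in examples:
        node = route(tree, x)
        for c, (feature, threshold) in enumerate(candidates):
            side = "L" if x[feature] < threshold else "R"
            hist[(node, c, side)][label] += 1
    return hist


def merge(hists):
    """Reduce step: sum the histograms from all partitions."""
    total = defaultdict(Counter)
    for h in hists:
        for key, counts in h.items():
            total[key].update(counts)
    return total


def entropy(counts):
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(v / n * log2(v / n) for v in counts.values() if v)


def grow_layer(tree, candidates, partitions):
    """Return a new partial tree with one more split per frontier node."""
    total = merge(partition_histograms(tree, candidates, p) for p in partitions)
    frontier = {node for node, _, _ in total}
    new_tree = dict(tree)
    for node in frontier:
        def cost(c):
            left, right = total[(node, c, "L")], total[(node, c, "R")]
            return (sum(left.values()) * entropy(left)
                    + sum(right.values()) * entropy(right))
        new_tree[node] = candidates[min(range(len(candidates)), key=cost)]
    return new_tree


candidates = [(0, 0.5), (1, 0.5)]
partitions = [
    [([0.1, 0.2], "hand"), ([0.9, 0.8], "head")],
    [([0.9, 0.1], "hand"), ([0.2, 0.9], "head")],
]
layer_one = grow_layer({}, candidates, partitions)
```

Because the histograms are additive, the map step can run independently on each partition of images, and the same counts could equally be split across candidate features or frontier nodes.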


High cluster utilization

[Figure: cluster utilization chart; axes: Time, Machine]


CONCLUSIONS


Huge Commercial Success


Tremendous Interest from Developers


Consumer Technologies Push The Envelope

Price: $6000

Price: $150


Unique Opportunity for Technology Transfer


I can finally explain to my son what I do for a living…


BACKUP SLIDES


Training efficiency

[Plot: core × hours / image (y-axis) versus number of training images (x-axis, log scale); series: 8 core machine, 1000 core cluster]


Cluster usage for one tree

[Plot: axes: Time (s), Machine (235); phase labels: Preprocess, 1 to 19 (19 appears twice, the first marked failed), Normalize Tree; annotation: 14400 processes]

18.3 hours, 137.2 CPU days, 107421 processes, 29.56 TB data, average parallelism = 140


DryadLINQ Language Summary

• Where
• Select
• GroupBy
• OrderBy
• Aggregate
• Join