POINT CLOUD DEEP LEARNING - NVIDIA

Innfarn Yoo, 3/29/2018

POINT CLOUD DEEP LEARNING

AGENDA

• Introduction

• Previous Work

• Method

• Result

• Conclusion

INTRODUCTION

2D OBJECT CLASSIFICATION

• Convolutional Neural Network (CNN) for 2D images works really well

• AlexNet, ResNet, & GoogLeNet

• R-CNN → Fast R-CNN → Faster R-CNN → Mask R-CNN

• Recent 2D image classification methods can even extract precise object boundaries (FCN → Mask R-CNN)

Deep Learning for 2D Object Classification

[1] He et al., Mask R-CNN (2017)

3D OBJECT CLASSIFICATION

• 3D object classification approaches are getting more attention

• Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors)

• Size of data is bigger than 2D images

• Open datasets are increasing

• Recent research approaches human-level detection accuracy

• MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble

Deep Learning for 3D Object Classification

[2] Zhou and Tuzel, VoxelNet (2017)

GOALS

• Evaluating & comparing different types of Neural Network models for 3D object classification

• Providing a generic framework to test multiple 3D neural network models

• Simple & easy to implement neural network models

• Fast preprocessing (remove bottleneck of loading, sampling, & jittering 3D data)

The goals of our method

PREVIOUS WORK

3D POINT-BASED APPROACHES

• PointNet

• First 3D point-based classification

• Unordered dataset

• Transform → Multi-Layer Perceptron (MLP) → Max Pool (MP) → Classification

3D Points → Neural Nets

[5] Qi et al., PointNet (2017)
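
The pairing of "Unordered dataset" with a Max Pool stage can be illustrated with a small sketch: when every point goes through the same per-point layer and the results are max-pooled per channel, the pooled vector does not depend on the order of the points. The layer size, random weights, and function names below are illustrative assumptions, not the PointNet architecture itself.

```python
import random


def per_point_features(point, weights):
    """Shared per-point layer: one ReLU(w . p) value per weight row."""
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]


def global_max_pool(points, weights):
    """Max over all points, per feature channel (a symmetric function)."""
    feats = [per_point_features(p, weights) for p in points]
    return [max(channel) for channel in zip(*feats)]


rng = random.Random(0)
# Hypothetical 3 -> 4 shared layer and a toy cloud of 16 points.
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(16)]

shuffled = cloud[:]
rng.shuffle(shuffled)

pooled_a = global_max_pool(cloud, weights)
pooled_b = global_max_pool(shuffled, weights)
print(pooled_a == pooled_b)
```

Because the maximum is taken over the same set of per-point values in both cases, shuffling the input leaves the pooled feature vector unchanged.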

PIXEL-BASED APPROACHES

• Multi-Layer Perceptron (MLP)

• Convolutional Neural Network (CNN)

• Multi-View Convolutional Neural Network (MVCNN)

3D → 2D Projections → Neural Nets

[3] Su et al., MVCNN (2015)

[4] Krizhevsky et al., AlexNet (2012)

VOXEL-BASED APPROACHES

• VoxNet

• VRN Ensemble

• VoxelNet

3D Points → Voxels → Neural Nets

[6] Maturana and Scherer, VoxNet (2015)

[7] Brock et al., VRN Ensemble (2016)

[2] Zhou and Tuzel, VoxelNet (2017)

METHOD

PREPROCESSING

• Loading 3D polygonal objects

• Required Operations on 3D objects

• Sampling, Shuffling, Jittering, Scaling, & Rotating

• Projection & Voxelization

• Python is poorly suited to multi-core processing (or multi-threading)

• The number of objects is too large for single-core processing

Requirement

FRAMEWORK

Basic Pipeline

nn3d_trainer (C++)

• Trainer: C++ program

• Load 3D objects

• Sampler threads (Thread 1, Thread 2, Thread 3, …, Thread N): each produces a 3D point sample and passes it to a converter (Pixel, Point, Voxel)

• Call Python NN model functions: Train, Test, Eval, Report, & Save

• Epoch #i: increase epoch

3D DATASETS

MODELNET10

• Princeton ModelNet Data: http://modelnet.cs.princeton.edu/

• 10 Categories

• 4,930 Objects (2 GB)

• OFF (CAD) File Format

MODELNET40

• Princeton ModelNet Data: http://modelnet.cs.princeton.edu/

• 40 Categories

• 12,431 Objects (10 GB)

• OFF (CAD) File Format

SHAPENET CORE V2

• ShapeNet: https://www.shapenet.org/

• 55 Categories

• 51,191 Objects (90 GB)

• OBJ File Format

NEURAL NETWORK MODELS

Point-Based Models

Pixel-Based Models

Voxel-Based Models

POINT-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Rotate randomly

• Scale randomly

• Uniform sampling on 3D object surfaces

• Sample 2048 points

• Shuffle points
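
The sampling and shuffling steps listed above can be sketched as follows. This is a minimal illustration, assuming a triangle mesh given as vertex and face lists: triangles are picked with probability proportional to their area, then a point is drawn uniformly inside each picked triangle. The function names and the toy square mesh are assumptions for the example, not the deck's C++ sampler.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle from half the cross-product magnitude."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = [ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0]]
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_surface(vertices, faces, n_points, rng):
    """Area-weighted uniform sampling on a triangle mesh, then shuffle."""
    tris = [(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    areas = [triangle_area(*t) for t in tris]
    # random.choices requires Python 3.6 or later.
    chosen = rng.choices(tris, weights=areas, k=n_points)
    points = []
    for a, b, c in chosen:
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)  # square root keeps the density uniform in the triangle
        u, v, w = 1.0 - s, s * (1.0 - r2), s * r2
        points.append([u * a[i] + v * b[i] + w * c[i] for i in range(3)])
    rng.shuffle(points)
    return points


# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
faces = [(0, 1, 2), (0, 2, 3)]
points = sample_surface(vertices, faces, 2048, random.Random(0))
print(len(points))
```

Weighting by area matters because picking triangles uniformly would oversample regions of the mesh that happen to be finely tessellated.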

Types of Models

• Tested Models

• Multi-Layer Perceptron (MLP)

• Multi Rotational MLPs

• Single Orientation CNN

• Multi Rotational CNNs

• Multi Rotational Resample & Max Pool Layers

• ResNet-like

POINT-BASED NEURAL NETWORK MODELS

MLP

[Architecture diagram. Blocks shown: 3D points, Flatten Vector, Fully Connected Layer, ReLU + Dropout, Class Onehot Vector, Softmax Cross Entropy]
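
The Softmax Cross Entropy loss named on this slide compares the network's output scores with the class one-hot vector. A minimal sketch in plain Python, using the log-sum-exp form for numerical stability; the logits and the three-class setup are illustrative values only:

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def softmax_cross_entropy(logits, onehot):
    """Cross entropy between a one-hot target and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(onehot, log_softmax(logits)))


onehot = [1.0, 0.0, 0.0]                      # true class is index 0
loss = softmax_cross_entropy([2.0, 0.5, -1.0], onehot)
uniform_loss = softmax_cross_entropy([0.0, 0.0, 0.0], onehot)
print(loss, uniform_loss)
```

With all-equal logits the loss equals log(number of classes), which is a useful sanity check for an untrained classifier.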

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational MLPs

[Architecture diagram. Blocks shown: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout, Class Onehot Vector, Softmax Cross Entropy]

POINT-BASED NEURAL NETWORK MODELS

Single Orientation CNN

[Architecture diagram. Blocks shown: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (twice), Class Onehot Vector, Softmax Cross Entropy]

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational CNNs

[Architecture diagram. Blocks shown: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (twice), Class Onehot Vector, Softmax Cross Entropy]

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational Resample & Max Pool Layers

[Architecture diagram. Blocks shown: 3D points, Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (twice), Class Onehot Vector, Softmax Cross Entropy]

POINT-BASED NEURAL NETWORK MODELS

ResNet-like

[Architecture diagram. Blocks shown: 3D points, Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (three times), Class Onehot Vector, Softmax Cross Entropy]

PIXEL-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Sample 8192 points

• Same as point-based models

• Depth-only orthogonal projection

• 32x32 or 64x64

• Generating multiple rotations

• 64x64x5 & 64x64x10
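
The depth-only orthogonal projection and the multi-rotation stacking listed above can be sketched as follows. The conventions here are assumptions made for illustration: coordinates are normalized to [-1, 1], the view direction is along the z axis, empty pixels are 0.0, the stored depth is (z + 1) / 2 with the largest value kept per pixel, and the extra views come from evenly spaced rotations about the y axis.

```python
import math


def to_index(coord, size):
    """Map a coordinate in [-1, 1] to a clamped pixel index in [0, size - 1]."""
    idx = int((coord + 1.0) * 0.5 * size)
    return min(max(idx, 0), size - 1)


def depth_image(points, size=32):
    """Orthographic projection along z; each pixel keeps its largest depth."""
    image = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        col, row = to_index(x, size), to_index(y, size)
        depth = min(max((z + 1.0) * 0.5, 0.0), 1.0)
        image[row][col] = max(image[row][col], depth)
    return image


def rotate_y(points, angle):
    """Rotate points about the y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def multi_view(points, n_views=5, size=32):
    """Stack of depth images from evenly spaced rotations (assumed view layout)."""
    return [depth_image(rotate_y(points, 2.0 * math.pi * k / n_views), size)
            for k in range(n_views)]


points = [(-1.0, -1.0, 0.0), (-1.0, -1.0, 1.0), (0.999, 0.999, -0.5)]
image = depth_image(points, size=32)
views = multi_view(points, n_views=5, size=32)
print(len(views), len(views[0]), len(views[0][0]))
```

Stacking five 32x32 views gives the 32x32x5 input shape; the same code with size=64 and n_views of 5 or 10 gives the 64x64x5 and 64x64x10 shapes named in the list.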

Types of Models

• Tested Models:

• MLP

• Depth-Only Orthogonal MVCNN

PIXEL-BASED NEURAL NETWORK MODELS

MLP

[Architecture diagram. Blocks shown: Images (32x32x5), Flatten Vector, Fully Connected Layer, Class Onehot Vector, Softmax Cross Entropy]

PIXEL-BASED NEURAL NETWORK MODELS

Depth-Only Orthogonal MVCNN

[Architecture diagram. Blocks shown: Images (32x32x5), Image Separation, 3D Conv Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer, Class Onehot Vector, Softmax Cross Entropy]

VOXEL-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Sample 8192 points

• Same as point-based models

• Voxelization

• 3D points → Voxels

• Each voxel has intensity 0.0 ~ 1.0

• Based on how many points hit the same voxel

• 32x32x32 & 64x64x64
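
The voxelization step above can be sketched as a hit-count grid. The slide only states that intensity lies in 0.0 ~ 1.0 and depends on how many points hit the same voxel; dividing by the largest hit count is an assumption made here to get that range, as is the normalization of coordinates to [-1, 1].

```python
def to_index(coord, dim):
    """Map a coordinate in [-1, 1] to a clamped voxel index in [0, dim - 1]."""
    idx = int((coord + 1.0) * 0.5 * dim)
    return min(max(idx, 0), dim - 1)


def voxelize(points, dim=32):
    """Dense dim^3 grid whose intensity is the per-voxel hit count / max count."""
    counts = {}
    for x, y, z in points:
        key = (to_index(x, dim), to_index(y, dim), to_index(z, dim))
        counts[key] = counts.get(key, 0) + 1
    peak = max(counts.values()) if counts else 1  # assumed normalization
    grid = [[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
    for (i, j, k), n in counts.items():
        grid[i][j][k] = n / peak
    return grid


# Three points fall in the same center voxel, one point in another voxel.
points = [(0.0, 0.0, 0.0)] * 3 + [(0.9, 0.9, 0.9)]
grid = voxelize(points, dim=32)
print(grid[16][16][16], grid[30][30][30])
```

A 32x32x32 grid has 32,768 cells and a 64x64x64 grid has 262,144, so the memory and compute cost grows by a factor of eight when the resolution doubles.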

Types of Models

• Tested Models:

• MLP

• CNN

• ResNet-like

VOXEL-BASED NEURAL NETWORK MODELS

MLP

[Architecture diagram. Blocks shown: Images (32x32x5), Flatten Vector, Fully Connected Layer, Class Onehot Vector, Softmax Cross Entropy]

VOXEL-BASED NEURAL NETWORK MODELS

CNN

[Architecture diagram. Blocks shown: Voxels (32x32x32), 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, Class Onehot Vector, Softmax Cross Entropy]

VOXEL-BASED NEURAL NETWORK MODELS

ResNet-like

[Architecture diagram. Blocks shown: Voxels (32x32x32), 3D Conv Layer, Avg Pooling Layer, Resample Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer, Class Onehot Vector, Softmax Cross Entropy]

IMPLEMENTATION

• System: Ubuntu 16.04, RAM 32 GB & 64 GB, & SSD 512 GB

• NVIDIA Quadro P6000, Quadro M6000, & GeForce Titan X

• GCC 5.2.0 for C++11

• Python 3.5

• TensorFlow-GPU v1.5.0

• NumPy 1.0

System Setup

HYPER PARAMETERS

• Object Perturbation

• Random Rotations: -25 ~ 25 degrees

• Random Scaling: 0.7 ~ 1.0

• Learning Rate: 0.0001

• Keep Probability (Dropout layer): 0.7

• Max Epochs: 1000

• Batch Size: 32

• Number of Random Rotations: 20

• Voxel Dim: 32x32x32

• MVCNN Number of Views: 5
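
The object perturbation settings above (random rotation within ±25 degrees, random scaling between 0.7 and 1.0) can be sketched as a single augmentation function. The deck does not say which axis the rotation uses; rotating about the z axis here is an assumption, and the function name is hypothetical.

```python
import math
import random


def perturb(points, rng, max_deg=25.0, scale_range=(0.7, 1.0)):
    """Rotate about the z axis (assumed) and scale uniformly, both at random."""
    angle = math.radians(rng.uniform(-max_deg, max_deg))
    scale = rng.uniform(*scale_range)
    c, s = math.cos(angle), math.sin(angle)
    moved = [(scale * (c * x - s * y), scale * (s * x + c * y), scale * z)
             for x, y, z in points]
    return moved, angle, scale


rng = random.Random(0)
cloud = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
         for _ in range(8)]
perturbed, angle, scale = perturb(cloud, rng)
print(math.degrees(angle), scale)
```

Since rotation preserves length, each perturbed point's distance from the origin is the original distance multiplied by the sampled scale, which gives a quick check that the transform is applied as intended.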

RESULT

MODELNET10: # OF TEST & TRAIN OBJECTS

[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 1000). Categories: table, toilet, monitor, bathtub, sofa, chair, desk, dresser, night_stand, bed]

MODELNET10 ACCURACY (Iter: 1000)

[Bar chart: Train Accu, Test Accu, and mAP in % (y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

MODELNET40: # OF TEST & TRAIN OBJECTS

[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]

MODELNET40 ACCURACY (Iter: 1000)

[Bar chart: Train Accu, Test Accu, and mAP in % (y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

MODELNET40 ACCURACY: 7 CATEGORIES (# OF TRAIN OBJECTS > 400)

[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]

MODELNET40, 7 CATEGORIES ACCURACY (Iter: 1000)

[Bar chart: Train Accu, Test Accu, and mAP in % (y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

MODELNET40 ACCURACY: 10 CATEGORIES (# OF TRAIN OBJECTS > 300)

[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]

MODELNET40, 10 CATEGORIES ACCURACY

[Bar chart: Train Accu, Test Accu, and mAP (y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

MODELNET40 ACCURACY: 17 CATEGORIES (# OF TRAIN OBJECTS > 200)

[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]

MODELNET40, 17 CATEGORIES ACCURACY (Iter: 1000)

[Bar chart: Train Accu, Test Accu, and mAP in % (y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

MODELNET10 PERFORMANCE

[Bar chart: Total Training Time in hours (y-axis 0.00 to 3.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

[Bar chart: Inference Time Per Batch in seconds (y-axis 0 to 0.07) for the same models]

MODELNET40 PERFORMANCE

[Bar chart: Total Training Time in hours (y-axis 0.00 to 8.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

[Bar chart: Inference Time Per Batch in seconds (y-axis 0 to 0.07) for the same models]

SHAPENET CORE V2 ACCURACY

• ShapeNet is a fairly large dataset

• 90 GB of data

• Only tested 3 NN models

• Point MP

• Depth-Only MVCNN

• Voxel CNN

Iter: 600

[Bar chart: Train Accu, Test Accu, and mAP in % (y-axis 0 to 100) for PC MP, PX MVCNN, and VX CNN]

TRAIN ACCURACY GRAPH

[Eleven line plots, one per model: PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet. y-axis: 0 to 1; x-axis: 50 to 15410]

TEST ACCURACY GRAPH

[Eleven line plots, one per model: PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet. y-axis: 0 to 1; x-axis: 0 to 4000]

CROSS ENTROPY CONVERGENCE GRAPH

[Eleven line plots, one per model: PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet. y-axis: 0 to 2.5; x-axis: 50 to 15410]

CONCLUSION

CONCLUSION

• Voxel ResNet and Voxel CNN provide the best results on 3D object classification

• Unordered vs. Ordered: Unordered data (point clouds) can be learned, but with lower accuracy and higher computation cost than ordered data

• Projection vs. Voxelization: Voxelization provides better results at a similar computation cost (converges faster and is more precise)

• Strict comparisons with previous methods are not done yet

• Sampling and jittering methods are different (cannot directly compare yet)

Models

CONCLUSION

• ModelNet10 & ModelNet40

• Some categories have too few training and testing objects

• This lowers classification accuracy

• ModelNet40, 7 categories (# of training objects > 400)

• Achieved 98% accuracy

• Partial dataset from ModelNet40 (Categories that have more than 400 3D objects)

• The number of training 3D objects in a category matters

• ShapeNetCore.v2

• The number of 3D objects is large enough, but many objects are duplicated
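
The partial ModelNet40 datasets described above are defined by a threshold on the number of training objects per category. A minimal sketch of that selection step follows; the category counts are made-up placeholders for illustration and are not the real ModelNet40 numbers.

```python
def select_categories(train_counts, min_train):
    """Keep categories with strictly more than `min_train` training objects."""
    return sorted(name for name, n in train_counts.items() if n > min_train)


# Made-up counts for illustration only; not the real ModelNet40 statistics.
train_counts = {"chair": 900, "sofa": 650, "bowl": 60, "cup": 80, "bed": 500}
selected = select_categories(train_counts, min_train=400)
print(selected)
```

Changing `min_train` to 300 or 200 would produce the larger subsets in the same way.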

Dataset Evaluation

FUTURE WORK

• Running the entire procedure in CUDA

• Using NVIDIA GVDB would bring the advantages of sparse voxels

• Strict comparisons with previous methods

• PointNet, VRN Ensemble, etc

• Extending the methods to captured datasets (e.g. KITTI)

• Train set is too small

• Need to investigate whether Generative Adversarial Networks (GAN) can alleviate this problem or not
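
The advantage of sparse voxels mentioned in this list can be illustrated with a dictionary that stores only occupied voxels. This is a generic sketch of the idea, not the GVDB API; the coordinate range of [-1, 1] and the flat test surface are assumptions for the example.

```python
import random


def to_index(coord, dim):
    """Map a coordinate in [-1, 1] to a clamped voxel index in [0, dim - 1]."""
    idx = int((coord + 1.0) * 0.5 * dim)
    return min(max(idx, 0), dim - 1)


def sparse_voxelize(points, dim=32):
    """Hit counts keyed by voxel index; empty voxels are simply not stored."""
    counts = {}
    for x, y, z in points:
        key = (to_index(x, dim), to_index(y, dim), to_index(z, dim))
        counts[key] = counts.get(key, 0) + 1
    return counts


rng = random.Random(0)
# 8192 points on a flat surface (z = 0), standing in for a sampled 3D object.
surface = [(rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0) for _ in range(8192)]
sparse = sparse_voxelize(surface, dim=32)
dense_cells = 32 ** 3
print(len(sparse), dense_cells)
```

Points sampled from an object surface occupy only a thin shell of the grid, so the number of stored voxels stays far below the dense cell count, and the gap widens as the resolution increases.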

REFERENCE

• [1] He et al., 2017, Mask R-CNN

• [2] Zhou and Tuzel, 2017, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

• [3] Su et al., 2015, Multi-view convolutional neural networks for 3d shape recognition

• [4] Krizhevsky et al., 2012, ImageNet Classification with Deep Convolutional Neural Networks

• [5] Qi et al., 2017, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

• [6] Maturana and Scherer, 2015, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

• [7] Brock et al., 2016, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks

• [8] He et al., 2016, Deep residual learning for image recognition

• [9] Goodfellow et al., 2014, Generative adversarial nets

• [10] Girshick, 2015, Fast R-CNN

• [11] Ren et al., 2015, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

• [12] Chang et al., 2015, ShapeNet: An Information-Rich 3D Model Repository

• [13] Wu et al., 2015, 3D ShapeNets: A Deep Representation for Volumetric Shapes

• [14] Wu et al., 2014, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction
