
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
Supplementary Material

Shuran Song    Samuel P. Lichtenberg    Jianxiong Xiao
Princeton University

http://rgbd.cs.princeton.edu

1. Segmentation Results

Columns: wall, floor, cabinet, bed, chair, sofa, table, door, window, bookshelf, picture, counter, blinds, desk, shelves, curtain, dresser, pillow, mirror
RGB NN       37.8 45.0 17.4 21.8 16.9 12.8 18.5  6.1  9.6  9.4  4.6  2.2  2.4  7.3  1.0  4.3  2.2  2.3  6.9
Depth NN     32.1 42.6  2.9  6.4 21.5  4.1 12.5  3.4  5.0  0.8  3.3  1.7 14.8  2.0 15.3  2.0  1.4  1.2  0.9
RGB-D NN     36.4 45.8 15.4 23.3 19.9 11.6 19.3  6.0  7.9 12.8  3.6  5.2  2.2  7.0  1.7  4.4  5.4  3.1  5.6
RGB flow     38.9 47.2 18.8 21.5 17.2 13.4 20.4  6.8 11.0  9.6  6.1  2.6  3.6  7.3  1.2  6.9  2.4  2.6  6.2
Depth flow   33.3 43.8  3.0  6.3 22.3  3.9 12.9  3.8  5.6  0.9  3.8  2.2 32.6  2.0 10.1  3.6  1.8  1.1  1.0
RGB-D flow   37.8 48.3 17.2 23.6 20.8 12.1 20.9  6.8  9.0 13.1  4.4  6.2  2.4  6.8  1.0  7.8  4.8  3.2  6.4
RGBD         43.2 78.6 26.2 42.5 33.2 40.6 34.3 33.2 43.6 23.1 57.2 31.8 42.3 12.1 18.4 59.1 31.4 49.5 24.8

Columns: floor mat, clothes, ceiling, books, fridge, tv, paper, towel, shower curtain, box, whiteboard, person, nightstand, toilet, sink, lamp, bathtub, bag, mean
RGB NN        0.0  1.2 27.9  4.1  7.0  1.6  1.5  1.9  0.0  0.6  7.4  0.0  1.1  8.9 14.0  0.9  0.6  0.9  8.3
Depth NN      0.0  0.3  9.7  0.6  0.0  0.9  0.0  0.1  0.0  1.0  2.7  0.3  2.6  2.3  1.1  0.7  0.0  0.4  5.3
RGB-D NN      0.0  1.4 35.8  6.1  9.5  0.7  1.4  0.2  0.0  0.6  7.6  0.7  1.7 12.0 15.2  0.9  1.1  0.6  9.0
RGB flow      0.0  1.3 39.1  5.9  7.1  1.4  1.5  2.2  0.0  0.7 10.4  0.0  1.5 12.3 14.8  1.3  0.9  1.1  9.3
Depth flow    0.0  0.6 13.9  0.5  0.0  0.9  0.4  0.3  0.0  0.7  3.5  0.3  1.5  2.6  1.2  0.8  0.0  0.5  6.0
RGB-D flow    0.0  1.6 49.2  8.7 10.1  0.6  1.4  0.2  0.0  0.8  8.6  0.8  1.8 14.9 16.8  1.2  1.1  1.3 10.1
RGBD          5.6 27.0 84.5 35.7 24.2 36.5 26.8 19.2  9.0 11.7 51.4 35.7 25.0 64.1 53.0 44.2 47.0 18.6 36.3

Table 1. Semantic segmentation. We evaluate performance for 37 object categories; we report the accuracy for each category and the mean accuracy. The categories are: wall, floor, cabinet, bed, chair, sofa, table, door, window, bookshelf, picture, counter, blinds, desk, shelves, curtain, dresser, pillow, mirror, floor mat, clothes, ceiling, books, fridge, tv, paper, towel, shower curtain, box, whiteboard, person, nightstand, toilet, sink, lamp, bathtub, bag.
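Concretely, the per-category accuracy reported above can be computed from ground-truth and predicted label maps as follows. This is a minimal sketch with a hypothetical `per_category_accuracy` helper; the exact evaluation script may differ in detail:

```python
import numpy as np

def per_category_accuracy(gt, pred, num_classes):
    """For each class c: fraction of ground-truth pixels of class c that
    are predicted as c, in percent. Classes absent from the ground truth
    get NaN so they do not distort the mean."""
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = gt == c
        if mask.any():
            acc[c] = 100.0 * (pred[mask] == c).mean()
    return acc

# toy 2x3 label maps with 3 classes
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
acc = per_category_accuracy(gt, pred, 3)
mean_acc = float(np.nanmean(acc))  # mean over categories present
```

Per-image results would be accumulated over the whole test set before taking the mean over categories.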

2. Annotation Tool

3D object annotation To perform the 3D object annotations, we instruct the oDesk workers as follows. The user interface of the annotation tool is shown in Figure 1, which is divided into 4 sections. We ask the oDesk workers to label all objects that they can identify. Labeling a single object consists of three main steps: defining the length and width of the object's box, giving the object a name, and defining the height of the object's box.

In the first step, the worker considers section 2, a top view of the room, and begins to label the object by clicking three corners of the rectangular box. The first line segment created is considered to be the "front" of the object, and an orthogonal arrow will appear on that line segment in section 2. If the object has a natural orientation (such as a chair), then we ask the oDesk worker to begin from the correct "front" side; if it does not (such as a round table), then they can begin from any side they choose. Once the top view of the box is defined, the worker needs to type in the object category name. Each object is given its own label, even if multiple objects have the same semantic class, so as to discern individual instances. The oDesk workers are selected specifically for high English proficiency as part of their interview process (it is extremely difficult to find suitable applicants from within the U.S.), so the name labels are far less noisy than they would be if obtained using Amazon Mechanical Turk. Finally, in the third step, the worker considers sections 3 and 4, which are orthogonal side views of the room, in order to adjust the top and bottom of the object's box appropriately. We require that the boxes the oDesk workers draw be tight. If the object is not fully visible in the image, we ask the workers to try to guess the full size of the object when drawing the box.
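Geometrically, the three clicked top-view corners plus the adjusted height range fully determine the 3D box. The `box_from_clicks` helper below is hypothetical, sketching the construction rather than reproducing the tool's actual (in-browser) code:

```python
import numpy as np

def box_from_clicks(p1, p2, p3, z_min, z_max):
    """Hypothetical helper: 8 corners of a 3D box from three clicked
    top-view (x, y) corners and a height range."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    front = p2 - p1                      # first edge = the object's "front"
    n = np.array([-front[1], front[0]])  # orthogonal direction (the arrow)
    n /= np.linalg.norm(n)
    depth = float(np.dot(p3 - p2, n))    # third click fixes the box depth
    base = [p1, p2, p2 + depth * n, p1 + depth * n]  # top-view rectangle
    # extrude the rectangle between the two adjusted heights
    return np.array([(x, y, z) for z in (z_min, z_max) for x, y in base])

corners = box_from_clicks((0, 0), (2, 0), (2, 1), 0.0, 1.5)
```

Note that the third click only needs to indicate the depth; it is snapped onto the rectangle implied by the front edge.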


Figure 1. Annotation tool for 3D objects (sections: 1. Image; 2. Top; 3. Side; 4. Front).

Figure 2. Annotation tool for room layout (sections: 1. Image; 2. Top; 3. Side; 4. Front).

After these three steps are completed, the worker has a completed box for the object. The labeling tool colors not only the box but also the points contained in that box onto the RGB image in section 1, which enables the worker to correct any mistakes. This real-time feedback is essential for obtaining high-quality boxes. This process is then repeated for every object in the image, for every image in the dataset.
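The containment test behind this feedback can be sketched as a point-in-oriented-box check. The `points_in_box` helper is illustrative and assumes a yaw-only (gravity-aligned) box, as produced by the annotation steps above:

```python
import numpy as np

def points_in_box(points, center, half_size, yaw):
    """Boolean mask of 3D points inside an oriented (yaw-only) box,
    obtained by rotating the points into the box's local frame."""
    c, s = np.cos(-yaw), np.sin(-yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    local = (np.asarray(points, dtype=float) - center) @ R.T  # box frame
    return np.all(np.abs(local) <= half_size, axis=1)

pts = np.array([[0.5, 0.0, 0.0], [2.0, 0.0, 0.0]])
mask = points_in_box(pts, np.zeros(3), np.array([1.0, 0.5, 1.0]), 0.0)
```

The selected points can then be projected back and highlighted on the RGB image.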

3D Room Layout Annotation The 3D room layout labeling process is very similar to that of the 3D object annotation. The worker sees an identical interface, except that a gray area indicates the field of view. The room layout labeling again follows similar steps.

First, the worker draws the boundary of the room, starting and ending at the camera position, in the top-down view of section 2. This time, the worker can connect line segments to draw an arbitrary polygon (not just a rectangle) to express the room structure. The worker is presented with a white area that covers all points in the room, and a gray border (outside the field of view). Any line segments used to help draw the room layout box, but that do not actually represent walls of the room, should lie only in the gray boundary area.

There is an additional complicating factor: oftentimes rooms will have hallways or other passageways leading out of the field of view of the camera. To handle this case, we ask workers to draw the walls defining such a passageway into the gray boundary area and to connect them there with a line segment, outside the white area considered to represent the room layout.

Then, the worker adjusts the height of the room layout box. The bottom of the box should align with the floor, and the top with the ceiling. If the floor is not clearly visible in the image, we ask the workers to make the best guess they can. If the ceiling is not visible in the image, we ask the workers to draw the ceiling up into a gray boundary area at the top of section 4. The process is repeated for each image in the dataset.
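The resulting room layout is effectively a vertical prism over the annotated floor polygon. A containment check for it can be sketched with standard ray casting; the `point_in_room` helper is illustrative, not the dataset's code:

```python
def point_in_room(x, y, z, polygon, floor_z, ceiling_z):
    """True if (x, y, z) lies inside the vertical prism over `polygon`
    (a list of (x, y) vertices) between the floor and ceiling heights."""
    if not (floor_z <= z <= ceiling_z):
        return False
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray's level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            inside ^= x < x_cross  # flip parity on each crossing
    return inside

room = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

Because the boundary is an arbitrary polygon, this works for non-rectangular rooms as well.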

3. Dataset

In this section, we show more statistics of our SUN RGB-D dataset. Figure 3 shows the distributions of object 3D location for various categories. Figure 4 shows the distribution of object orientation. In Figure 5, we show the distribution of object 3D sizes. In Figure 6 and Figure 7, we show some examples of 2D and 3D annotations.
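Statistics such as the size distributions in Figure 5 can be recomputed from the annotated boxes. The sketch below assumes per-instance (length, width, height) tuples in meters; the bin count is an assumption, not the paper's exact binning:

```python
import numpy as np

def size_histograms(boxes, num_bins=9):
    """Per-dimension (counts, bin_edges) histograms of box sizes."""
    sizes = np.asarray(boxes, dtype=float)  # shape (N, 3)
    return {name: np.histogram(sizes[:, i], bins=num_bins)
            for i, name in enumerate(("length", "width", "height"))}

# three toy instances of one category
hists = size_histograms([(1.0, 0.5, 0.4), (2.0, 1.0, 0.8), (1.5, 0.7, 0.6)],
                        num_bins=3)
```

Location and orientation priors (Figures 3 and 4) follow the same pattern, binning box centers and yaw angles instead of extents.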

Figure 3. 3D location priors for selected object classes (bathtub, bed, box, chair, bookshelf, desk, dresser, door, garbage bin, lamp, monitor, night stand, pillow, sink, sofa, table). For each object category we show its 3D location distribution from the top view (left) and the distribution of its center's height (right), in meters. We draw the Kinect v1's FOV (in yellow) for reference.

Figure 4. 3D orientation priors for selected object classes (toilet, bed, table, chair, sofa, sink, pillow, night stand, monitor, lamp, garbage bin, dresser, door, desk).

Figure 5. Object 3D size distribution: histograms of length, width, and height (in meters) for selected object classes (bathtub, bed, bookshelf, box, chair, counter, desk, door, dresser, garbage bin, lamp, night stand, sink, sofa, table, toilet).

Figure 6. Examples of 2D Segmentation Annotation in our SUN RGB-D dataset

Figure 7. Examples of 3D Annotation in our SUN RGB-D dataset