SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
Supplementary Material
Shuran Song    Samuel P. Lichtenberg    Jianxiong Xiao
Princeton University
http://rgbd.cs.princeton.edu
1. Segmentation Results
            wall  floor cabinet bed  chair sofa table door window bookshelf picture counter blinds desk shelves curtain dresser pillow mirror
RGB NN      37.8  45.0  17.4   21.8  16.9  12.8 18.5  6.1   9.6    9.4      4.6     2.2     2.4   7.3   1.0     4.3     2.2    2.3    6.9
Depth NN    32.1  42.6   2.9    6.4  21.5   4.1 12.5  3.4   5.0    0.8      3.3     1.7    14.8   2.0  15.3     2.0     1.4    1.2    0.9
RGB-D NN    36.4  45.8  15.4   23.3  19.9  11.6 19.3  6.0   7.9   12.8      3.6     5.2     2.2   7.0   1.7     4.4     5.4    3.1    5.6
RGB flow    38.9  47.2  18.8   21.5  17.2  13.4 20.4  6.8  11.0    9.6      6.1     2.6     3.6   7.3   1.2     6.9     2.4    2.6    6.2
Depth flow  33.3  43.8   3.0    6.3  22.3   3.9 12.9  3.8   5.6    0.9      3.8     2.2    32.6   2.0  10.1     3.6     1.8    1.1    1.0
RGB-D flow  37.8  48.3  17.2   23.6  20.8  12.1 20.9  6.8   9.0   13.1      4.4     6.2     2.4   6.8   1.0     7.8     4.8    3.2    6.4
RGBD        43.2  78.6  26.2   42.5  33.2  40.6 34.3 33.2  43.6   23.1     57.2    31.8    42.3  12.1  18.4    59.1    31.4   49.5   24.8

            floormat clothes ceiling books fridge tv   paper towel shower box  board person nightstand toilet sink lamp bathtub bag | mean
RGB NN       0.0      1.2    27.9     4.1   7.0   1.6   1.5   1.9   0.0  0.6   7.4   0.0    1.1        8.9  14.0  0.9   0.6   0.9 |  8.3
Depth NN     0.0      0.3     9.7     0.6   0.0   0.9   0.0   0.1   0.0  1.0   2.7   0.3    2.6        2.3   1.1  0.7   0.0   0.4 |  5.3
RGB-D NN     0.0      1.4    35.8     6.1   9.5   0.7   1.4   0.2   0.0  0.6   7.6   0.7    1.7       12.0  15.2  0.9   1.1   0.6 |  9.0
RGB flow     0.0      1.3    39.1     5.9   7.1   1.4   1.5   2.2   0.0  0.7  10.4   0.0    1.5       12.3  14.8  1.3   0.9   1.1 |  9.3
Depth flow   0.0      0.6    13.9     0.5   0.0   0.9   0.4   0.3   0.0  0.7   3.5   0.3    1.5        2.6   1.2  0.8   0.0   0.5 |  6.0
RGB-D flow   0.0      1.6    49.2     8.7  10.1   0.6   1.4   0.2   0.0  0.8   8.6   0.8    1.8       14.9  16.8  1.2   1.1   1.3 | 10.1
RGBD         5.6     27.0    84.5    35.7  24.2  36.5  26.8  19.2   9.0 11.7  51.4  35.7   25.0       64.1  53.0 44.2  47.0  18.6 | 36.3
Table 1. Semantic segmentation. We evaluate performance on 37 object categories, reporting the accuracy for each category and the mean accuracy. The categories are: wall, floor, cabinet, bed, chair, sofa, table, door, window, bookshelf, picture, counter, blinds, desk, shelves, curtain, dresser, pillow, mirror, floor mat, clothes, ceiling, books, fridge, tv, paper, towel, shower curtain, box, whiteboard, person, nightstand, toilet, sink, lamp, bathtub, bag.
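The mean accuracy in the last column is the unweighted average over the 37 per-class accuracies. As a quick sanity check, here is a small sketch using values copied from the RGB-D flow row; because the printed mean was computed from unrounded per-class values, recomputing it from the rounded table entries agrees only to within rounding error:

```python
# Per-class accuracies copied from the RGB-D flow row of Table 1.
rgbd_flow = [
    37.8, 48.3, 17.2, 23.6, 20.8, 12.1, 20.9, 6.8, 9.0, 13.1, 4.4, 6.2,
    2.4, 6.8, 1.0, 7.8, 4.8, 3.2, 6.4, 0.0, 1.6, 49.2, 8.7, 10.1, 0.6,
    1.4, 0.2, 0.0, 0.8, 8.6, 0.8, 1.8, 14.9, 16.8, 1.2, 1.1, 1.3,
]
assert len(rgbd_flow) == 37  # one entry per category

# Unweighted mean over classes; close to the reported 10.1.
mean_acc = sum(rgbd_flow) / len(rgbd_flow)
print(mean_acc)
```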
2. Annotation Tool

3D Object Annotation To perform the 3D object annotations, we instruct the oDesk workers as follows. The user interface of the annotation tool, shown in Figure 1, is divided into four sections. We ask the oDesk workers to label all objects that they can identify. Labeling a single object consists of three main steps: defining the length and width of the object's box, giving the object a name, and defining the height of the object's box.
In the first step, the worker considers section 2, a top-down view of the room, and begins to label the object by clicking three corners of the rectangular box. The first line segment created is considered to be the "front" of the object, and an orthogonal arrow appears on that line segment in section 2. If the object has a natural orientation (such as a chair), we ask the oDesk worker to begin from the correct "front" side; if it does not (such as a round table), they can begin from any side they choose. In the second step, once the top view of the box is defined, the worker types in the object category name. Each object is given its own label, even if multiple objects share the same semantic class, so that individual instances can be distinguished. The oDesk workers are selected specifically for high English proficiency as part of their interview process (it is extremely difficult to find suitable applicants from within the U.S.), so the name labels are far less noisy than they would be if obtained using Amazon Mechanical Turk. Finally, in the third step, the worker considers sections 3 and 4, which are orthogonal side views of the room, and adjusts the top and bottom of the object's box. We require that the boxes the oDesk workers draw be tight. If the object is not fully visible in the image, we ask the workers to guess the full size of the object when drawing the box.
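The three steps above determine an oriented 3D box: the three clicked top-view corners fix the footprint and the "front" direction, and the two side views fix the vertical extent. The geometry can be sketched as follows (hypothetical helper names, not the actual tool's code):

```python
# Hypothetical sketch of the box geometry implied by the annotation steps.

def box_from_clicks(p1, p2, p3, z_bottom, z_top):
    """p1, p2, p3: clicked (x, y) top-view corners; p1->p2 is the "front" edge.
    z_bottom, z_top: heights adjusted in the two side views.
    Returns the 8 (x, y, z) corners of the box."""
    # The fourth top-view corner completes the parallelogram (a rectangle
    # when p3 is clicked orthogonally to the front edge).
    p4 = (p1[0] + p3[0] - p2[0], p1[1] + p3[1] - p2[1])
    top_view = [p1, p2, p3, p4]
    # Extrude the footprint between the two heights from the side views.
    return [(x, y, z) for z in (z_bottom, z_top) for x, y in top_view]

def front_direction(p1, p2):
    """Unit normal of the front edge p1->p2 (the arrow shown in section 2)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    n = (dx ** 2 + dy ** 2) ** 0.5
    return (dy / n, -dx / n)
```

For example, clicking (0, 0), (2, 0), (2, 1) and setting heights 0.0 and 0.5 yields a 2 x 1 x 0.5 box whose front faces along the normal of the first edge.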
[Interface panels: 1. Image, 2. Top, 3. Side, 4. Front]
Figure 1. Annotation tool for 3D objects
[Interface panels: 1. Image, 2. Top, 3. Side, 4. Front]
Figure 2. Annotation tool for room layout
After these three steps are completed, the worker has a complete box for the object. The labeling tool colors not only the box but also the points contained in that box onto the RGB image in section 1, which enables the worker to correct any mistakes. This real-time feedback is essential for obtaining high-quality boxes. The process is then repeated for every object in the image, for every image in the dataset.
3D Room Layout Annotation The 3D room layout labeling process is very similar to that of the 3D object annotation. The worker sees an identical interface, except that a gray area indicates the region outside the field of view. The labeling process again follows similar steps.
First, the worker draws the boundary of the room, starting and ending at the camera position, in the top-down view of section 2. This time, the worker can connect line segments to draw an arbitrary polygon (not just a rectangle) to express the room structure. The worker is presented with a white area that covers all points in the room and a gray border outside the field of view. Any line segments used to help draw the room layout box, but that do not actually represent walls of the room, should lie only in the gray boundary area.
There is an additional complicating factor: rooms often have hallways or other passageways leading out of the camera's field of view. To handle this case, we ask workers to draw the walls defining such a passageway into the gray boundary area and to connect them there with a line segment, outside the white area that represents the room layout.
Then, the worker adjusts the height of the room layout box. The bottom of the box should align with the floor, and the top with the ceiling. If the floor is not clearly visible in the image, we ask the workers to make the best guess they can. If the ceiling is not visible in the image, we ask the workers to draw the ceiling up into the gray boundary area at the top of section 4. The process is repeated for each image in the dataset.
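Given an annotated layout (the top-view polygon plus floor and ceiling heights), deciding whether a 3D point falls inside the room reduces to a 2D point-in-polygon test plus a height check. A minimal sketch under those assumptions (hypothetical function name, not part of any released toolbox):

```python
# Hypothetical sketch: test whether a 3D point lies inside an annotated
# room layout, given the top-view polygon and the floor/ceiling heights.

def point_in_room(pt, polygon, z_floor, z_ceiling):
    """pt: (x, y, z); polygon: list of (x, y) wall corners, closed implicitly."""
    x, y, z = pt
    if not (z_floor <= z <= z_ceiling):
        return False
    inside = False
    n = len(polygon)
    for i in range(n):  # ray-casting parity test over the polygon edges
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

For example, with a 4 m x 4 m square room and a 2.5 m ceiling, a point at (2, 2, 1.0) is inside, while points beyond a wall or above the ceiling are not.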
3. Dataset

In this section, we show more statistics of our SUN RGB-D dataset. Figure 3 shows the distributions of 3D object locations for various categories. Figure 4 shows the distributions of object orientations. In Figure 5, we show the distributions of object 3D sizes. In Figures 6 and 7, we show examples of 2D and 3D annotations.
[Figure 3 panels, one per category: bathtub, bed, box, chair; bookshelf, desk, dresser, door; garbage bin, lamp, monitor, night stand; pillow, sink, sofa, table]
Figure 3. 3D location priors for selected object classes. For each object category, we show its 3D location distribution from the top view (left) and the distribution of its center height (right), in meters. The Kinect v1's field of view is drawn in yellow for reference.
[Figure 4 panels (polar orientation histograms), one per category: toilet, bed, table, chair, sofa, sink, pillow; night stand, monitor, lamp, garbage bin, dresser, door, desk]
Figure 4. 3D orientation priors for selected object classes.
[Figure 5 panels (length, width, and height histograms), one per category: bathtub, bed, book shelf, box, chair, counter, desk, door, dresser, garbage bin, lamp, night stand, sink, sofa, table, toilet]
Figure 5. Object 3D size distributions (length, width, and height) for selected object classes.
Figure 6. Examples of 2D segmentation annotation in our SUN RGB-D dataset.
Figure 7. Examples of 3D annotation in our SUN RGB-D dataset.