graph interaction network for scene parsingthe ginet con gured with 64 nodes in the gi unit can...

3
Supplementary Material for “Graph Interaction Network for Scene Parsing” Tianyi Wu 1,2? , Yu Lu 3? , Yu Zhu 1,2 , Chuang Zhang 3 , MingWu 3 , Zhanyu Ma 3 , and Guodong Guo 1,2* 1 Institute of Deep Learning, Baidu Research, Beijing, China {wutianyi01, zhuyu05, guoguodong01}@baidu.com 2 National Engineering Laboratory for Deep Learning Technology and Application, Beijing, China 3 Beijing University of Posts and Telecommunications, Beijing, China {aniki, zhangchuang, wuming, mazhanyu}@bupt.edu.cn 1 Qualitative Results This section provides more qualitative results of the proposed approach on ADE20k[3] and COCO Stuff [1]. GINet(ours) GloRe Baseline GT Image Fig. 1. Results illustration of the proposed GINet on ADE20k validation set. From left to right are: input image, ground truth, prediction of the Baseline, prediction of the GloRe[2], and prediction of the proposed GINet. (Best viewed in color.) ? Equal contribution * Corresponding author

Upload: others

Post on 06-Mar-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Graph Interaction Network for Scene ParsingThe GINet con gured with 64 nodes in the GI unit can obtain the best performance. This means that a larger number of nodes does not result

Supplementary Material for “Graph InteractionNetwork for Scene Parsing”

Tianyi Wu1,2?, Yu Lu3?, Yu Zhu1,2,Chuang Zhang3, MingWu3, Zhanyu Ma3, and Guodong Guo1,2∗

1 Institute of Deep Learning, Baidu Research, Beijing, China{wutianyi01, zhuyu05, guoguodong01}@baidu.com

2 National Engineering Laboratory for Deep Learning Technology and Application,Beijing, China

3 Beijing University of Posts and Telecommunications, Beijing, China{aniki, zhangchuang, wuming, mazhanyu}@bupt.edu.cn

1 Qualitative Results

This section provides more qualitative results of the proposed approach onADE20k[3] and COCO Stuff [1].

GINet(ours)GloReBaselineGTImage

Fig. 1. Results illustration of the proposed GINet on ADE20k validation set. From leftto right are: input image, ground truth, prediction of the Baseline, prediction of theGloRe[2], and prediction of the proposed GINet. (Best viewed in color.)

? Equal contribution ∗ Corresponding author

Page 2: Graph Interaction Network for Scene ParsingThe GINet con gured with 64 nodes in the GI unit can obtain the best performance. This means that a larger number of nodes does not result

2 Tianyi Wu et al.

GINet(ours)GloReBaselineGTImage

Fig. 2. Results illustration of the proposed GINet on COCO Stuff test set. From leftto right are: input image, ground truth, prediction of the Baseline, prediction of theGloRe[2], and prediction of the proposed GINet. (Best viewed in color.)

Table 1. Performance comparison of using different numbers of nodes on Pascal Con-text.“N” indicates the number of nodes in visual graph.

N Backbone mIoU

16 Res50 51.032 Res50 51.164 Res50 51.7128 Res50 51.1

2 Effect of different numbers of nodes for Visual Graph

Table 1 shows the effects of using different numbers of nodes for the visualgraph. The GINet configured with 64 nodes in the GI unit can obtain the bestperformance. This means that a larger number of nodes does not result in ahigher performance, and using the right number of nodes is also very importantfor our model.

Page 3: Graph Interaction Network for Scene ParsingThe GINet con gured with 64 nodes in the GI unit can obtain the best performance. This means that a larger number of nodes does not result

Graph Interaction Network for Scene Parsing 3

References

1. Caesar, H., Uijlings, J., Ferrari, V.: Coco-stuff: Thing and stuff classes in context.In: CVPR (2018)

2. Chen, Y., Rohrbach, M., Yan, Z., Shuicheng, Y., Feng, J., Kalantidis, Y.: Graph-based global reasoning networks. In: CVPR (2019)

3. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsingthrough ade20k dataset. In: CVPR (2017)