Places Challenge 2017 Scene Parsing · presentations.cocodataset.org/Places17-WinterIsComing.pdf · ...
TRANSCRIPT
![Page 1: Places Challenge 2017 Scene Parsingpresentations.cocodataset.org/Places17-WinterIsComing.pdf · 2020. 4. 1. · • ImageNet and Places2 pretraining • Batch Size is critical •](https://reader034.vdocument.in/reader034/viewer/2022051809/6012fe747146ac11a608f78a/html5/thumbnails/1.jpg)
Places Challenge 2017 Scene Parsing
WinterIsComing
Riwei Chen, Qi Chen, Xinglong Wu, Yifan Lu, Yudong Jiang, Linfu Wen
Outline
• Single Model Results
• Method Overview
• Method Details
  • Model Pretraining
  • Pyramid Pooling
  • Batch Size & BN
  • Other Details
  • Submissions
• Visual Results
• Future Direction
Features of the ADE20K Dataset: Scene Parsing
• Number of images
  • Training: 20K
  • Validation: 2K
  • Testing: 3K
• Number of categories
  • Semantic categories: 150
Single Model Results on Validation Set
• Single model
• Compared with the best single-model results of 2016

| Team | mIoU | Pixel accuracy |
|---|---|---|
| SenseCuSceneParsing [1] | 43.39% | 80.90% |
| Adelaide [2]* | 43.06% | 80.53% |
| WinterIsComing (ours) | 43.98% | 81.13% |

[1] Zhao H, Shi J, Qi X, et al. Pyramid Scene Parsing Network. CVPR 2017.
[2] Wu Z, Shen C, Hengel A V D. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. arXiv 2016.
* The result of "Model C, 2 conv"
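
For reference, the two metrics in the table can be computed from a confusion matrix. The following is a minimal sketch using the standard definitions, not the challenge's official evaluation script; the function names and the rule of ignoring out-of-range ground-truth labels are assumptions made for illustration.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    # Assumption: ground-truth labels outside [0, num_classes) are ignored.
    valid = (gt >= 0) & (gt < num_classes)
    idx = gt[valid] * num_classes + pred[valid]
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of valid pixels whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping classes that never occur."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return (tp[present] / union[present]).mean()

# Tiny 2-class example: one of four pixels is misclassified.
gt = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
cm = confusion_matrix(gt, pred, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

In practice the confusion matrix would be accumulated over the whole validation set before the two metrics are computed, and predictions are assumed to lie in the valid class range.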
Method Overview
• Base network: ResNet38
• Pyramid Pooling
• ImageNet and Places2 pretraining
• Batch size is critical
• Ensemble of models trained for different numbers of epochs
Network Structure
[1] Wu Z, Shen C, Hengel A V D. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. arXiv 2016.
[2] Zhao H, Shi J, Qi X, et al. Pyramid Scene Parsing Network. CVPR 2017.
* Our implementation is based on: https://github.com/itijyou/ademxapp
Building Blocks
Res-MobileNet

| Model | Computation (MACC) |
|---|---|
| ResNet50 | 109.4G |
| Res-MobileNet | 32.5G |
| ResNet38 | 415.5G |
| VGG16 | 618.0G |

* Computation cost of the models when the input size is 512x512
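
To illustrate how multiply-accumulate (MACC) counts like those in the table arise, here is a minimal sketch that counts MACCs for a single convolution layer. It assumes that Res-MobileNet uses MobileNet-style depthwise separable convolutions, which the name suggests but the slide does not state; the layer sizes below are hypothetical and are not taken from the authors' network.

```python
def conv_macc(h_out, w_out, c_in, c_out, k):
    """MACCs of a standard k x k convolution (bias ignored)."""
    return h_out * w_out * c_in * c_out * k * k

def depthwise_separable_macc(h_out, w_out, c_in, c_out, k):
    """MACCs of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 64 x 64 output map, 256 -> 256 channels, 3 x 3 kernel.
standard = conv_macc(64, 64, 256, 256, 3)
separable = depthwise_separable_macc(64, 64, 256, 256, 3)
print(f"standard:  {standard / 1e9:.3f} GMACC")
print(f"separable: {separable / 1e9:.3f} GMACC")
print(f"reduction: {standard / separable:.1f}x")
```

Summing such per-layer counts over a whole network at a 512x512 input gives totals of the kind reported above, where Res-MobileNet needs roughly 3.4x fewer MACCs than ResNet50 and roughly 12.8x fewer than ResNet38.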
Model Performance
[1] Zhao H, Shi J, Qi X, et al. Pyramid Scene Parsing Network. CVPR 2017.
[2] Szegedy C, Ioffe S, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016.
[3] Wu Z, Shen C, Hengel A V D. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. arXiv 2016.
Pyramid Pooling
Pyramid Pooling
• Pyramid Pooling improves the integrity of segmentation
[Figure columns: Image, Ground Truth, without Pyramid Pooling, with Pyramid Pooling]
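
The idea behind Pyramid Pooling can be sketched in a few lines of NumPy: pool the feature map onto several coarse grids, upsample each result back to the original size, and concatenate everything with the input features. This is an illustrative sketch, not the team's implementation. The bin sizes (1, 2, 3, 6) follow the cited PSPNet paper, the slides do not say which sizes were used here, and the 1x1 convolutions and bilinear upsampling of the original module are replaced by nearest-neighbor upsampling for brevity.

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) array onto a (C, bins, bins) grid."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins), dtype=float)
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = -((-(i + 1) * h) // bins)   # ceil((i + 1) * h / bins)
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = -((-(j + 1) * w) // bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def upsample_nearest(x, h, w):
    """Nearest-neighbor upsampling of a (C, h', w') array to (C, h, w)."""
    _, ph, pw = x.shape
    rows = (np.arange(h) * ph) // h
    cols = (np.arange(w) * pw) // w
    return x[:, rows[:, None], cols[None, :]]

def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context at each bin size."""
    _, h, w = features.shape
    branches = [features]
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(features, bins)
        branches.append(upsample_nearest(pooled, h, w))
    return np.concatenate(branches, axis=0)

features = np.random.rand(8, 12, 12)
print(pyramid_pooling(features).shape)  # (40, 12, 12): 8 input + 4 x 8 context channels
```

Because every pixel receives context averaged over large regions, including the whole image for the 1x1 bin, a classifier on top of these features has access to scene-level cues in addition to local evidence.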
Pretraining
• ResNet50 without ImageNet pretraining has the lowest accuracy
• Places2 pretraining helps improve accuracy
Batch Size & Batch Norm
• Training batch size is critical
• Experiment with Res-MobileNet
• ResNet38 w/o PP, batch size = 6
• After adding PP, batch size = 2
• Usually use 4 GTX 1080 Ti GPUs

| Training batch size per GPU | Testing pixel accuracy |
|---|---|
| 1 | 68.4% |
| 2 | 69.7% |
| 4 | 70.7% |
| finetune with fixed BN | 72.9% |
| finetune ImageNet pretrained model with fixed BN | 74.1% |
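
The difference between ordinary batch normalization and "fixed BN" can be sketched as follows. The sketch assumes that "fixed BN" means normalizing with stored (pretrained) statistics instead of the statistics of the current mini-batch, which is one common reading; the slides do not spell out the exact procedure, and all names below are hypothetical.

```python
import numpy as np

def bn_batch_stats(x, gamma, beta, eps=1e-5):
    """Training-mode BN: normalize with the mean and variance of the current batch."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_fixed(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fixed BN: normalize with stored statistics, independent of the batch contents."""
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))              # (N, C, H, W)
gamma = np.ones((1, 3, 1, 1))
beta = np.zeros((1, 3, 1, 1))
running_mean = np.zeros((1, 3, 1, 1))          # stand-ins for pretrained statistics
running_var = np.ones((1, 3, 1, 1))

# With batch statistics, the output for the same two images changes with the batch.
out_batch4 = bn_batch_stats(x, gamma, beta)[:2]
out_batch2 = bn_batch_stats(x[:2], gamma, beta)
print(np.abs(out_batch4 - out_batch2).max())   # non-zero

# With fixed statistics, the output does not depend on the batch size.
out_fixed4 = bn_fixed(x, gamma, beta, running_mean, running_var)[:2]
out_fixed2 = bn_fixed(x[:2], gamma, beta, running_mean, running_var)
print(np.abs(out_fixed4 - out_fixed2).max())   # 0.0
```

With only one or two images per GPU, batch statistics are noisy estimates, which is consistent with the accuracy rising as the batch size grows and with fixed BN helping when the batch must stay small.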
Other Details
• Training augmentation
  • Multi-scale: [0.7, 1.3]
  • Flip
  • Random crop to 512x512
• Testing augmentation
  • Flip
  • No multi-scale
• SGD solver with lr = 1e-4 for 64 epochs
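
The augmentation steps listed above could be sketched as below. This is an illustration under several assumptions that the slides do not confirm: the scale is drawn uniformly from [0.7, 1.3], images smaller than the crop are zero-padded, padded label pixels get a hypothetical ignore value of 255, nearest-neighbor resizing is used for both image and label to keep the code short, and test-time flipping averages the scores of the original and mirrored image.

```python
import numpy as np

IGNORE_LABEL = 255  # assumption: value marking padded pixels to skip in the loss

def resize_nearest(arr, scale):
    """Nearest-neighbor resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    new_h, new_w = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return arr[rows][:, cols]

def train_augment(image, label, rng, crop=512, scale_range=(0.7, 1.3)):
    """Random scale, random horizontal flip, then random crop to crop x crop."""
    scale = float(rng.uniform(*scale_range))
    image, label = resize_nearest(image, scale), resize_nearest(label, scale)
    if rng.random() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]
    h, w = label.shape
    pad_h, pad_w = max(crop - h, 0), max(crop - w, 0)
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    label = np.pad(label, ((0, pad_h), (0, pad_w)), constant_values=IGNORE_LABEL)
    top = int(rng.integers(0, image.shape[0] - crop + 1))
    left = int(rng.integers(0, image.shape[1] - crop + 1))
    return (image[top:top + crop, left:left + crop],
            label[top:top + crop, left:left + crop])

def predict_with_flip(predict_fn, image):
    """Average class scores of the image and its mirror; predict_fn is hypothetical
    and maps an (H, W, 3) image to (H, W, K) scores."""
    scores = predict_fn(image)
    flipped = predict_fn(image[:, ::-1])[:, ::-1]
    return (scores + flipped) / 2

rng = np.random.default_rng(0)
image = np.zeros((300, 400, 3), dtype=np.uint8)
label = np.zeros((300, 400), dtype=np.int64)
aug_image, aug_label = train_augment(image, label, rng)
print(aug_image.shape, aug_label.shape)  # (512, 512, 3) (512, 512)
```

A real pipeline would typically resize the image with bilinear interpolation and keep nearest-neighbor resizing only for the label map, so that class indices are never blended.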
Submissions
• Submit 1: train with only the ADE20K training set
  • We get 81.13% / 43.98% pixel accuracy / mIoU on the validation set
• Submit 2-4: finetune the model with both the training and validation sets for 5, 22, and 29 epochs, respectively
• Submit 5: ensemble the Submit 1-4 models by voting
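
Ensembling by voting can be sketched as a per-pixel majority vote over the label maps predicted by the individual models. The slides only say "by voting", so the use of hard labels (as opposed to averaged probabilities) and the tie-breaking rule below are assumptions.

```python
import numpy as np

def vote_ensemble(predictions, num_classes=150):
    """Per-pixel majority vote over a list of (H, W) integer label maps.

    Assumption: ties are resolved in favor of the lowest class index.
    """
    stacked = np.stack(predictions)                       # (M, H, W)
    votes = np.zeros((num_classes,) + stacked.shape[1:], dtype=np.int32)
    for c in range(num_classes):
        votes[c] = (stacked == c).sum(axis=0)             # votes for class c per pixel
    return votes.argmax(axis=0)

# Four toy 2 x 2 predictions standing in for the Submit 1-4 models.
preds = [
    np.array([[0, 1], [2, 2]]),
    np.array([[0, 1], [1, 2]]),
    np.array([[1, 1], [1, 0]]),
    np.array([[0, 2], [2, 2]]),
]
print(vote_ensemble(preds))
```

With an even number of models, ties like the one at the bottom-left pixel of this example do occur, so the tie-breaking rule matters in practice.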
Summary
• Pretraining is critical, and datasets from similar tasks work better
• Batch size should be large enough
• Fixing BN params can further improve results (when batch size is small)
• Pyramid Pooling can improve the region integrity of segmentation
Visual Results
[Figure columns: Image, Ground Truth, without Pyramid Pooling, with Pyramid Pooling]
Future Work
• Memory-efficient deep learning framework
• Well-pretrained Res-MobileNet
• Focal loss
• Expert model
Thanks & Questions