Semi-supervised Video Object Segmentation
![Page 1: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/1.jpg)
![Page 2: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/2.jpg)
Semi-supervised Video Object Segmentation
• Benchmarks & Metrics
  • Benchmarks
    • DAVIS 2016: popular single-object VOS benchmark
    • DAVIS 2017: multi-object VOS benchmark with high-quality annotations and higher resolution
    • YouTube-VOS: the largest and most complex VOS dataset
![Page 3: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/3.jpg)
• Benchmarks & Metrics
  • Metrics
    • Jaccard Score (J): IoU of the predicted mask and the ground-truth mask
    • Contour Accuracy (F): F1 score between the boundary pixels of the predicted mask and those of the ground-truth mask
    • J&F: average (arithmetic mean) of the above two indicators
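The metrics above can be sketched in a few lines of plain Python. This is a minimal illustration, not the official evaluation code: masks are assumed to be equally sized 2D lists of 0/1 values, the function names are made up for this example, and the boundary matching that produces contour precision and recall is omitted (the two values are passed in directly). The combined score is computed as the arithmetic mean of J and F, which is how the DAVIS benchmark reports J&F.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def f_measure(precision, recall):
    """Contour accuracy F: F1 score of boundary precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity and contour accuracy."""
    return (j + f) / 2


pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 0],
      [0, 1, 0]]

j = jaccard(pred, gt)        # 3 shared pixels out of 5 in the union
f = f_measure(0.5, 1.0)      # illustrative precision/recall values
score = j_and_f(j, f)
```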
![Page 4: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/4.jpg)
Semi-supervised Video Object Segmentation
• Semi-supervised
  • Given one or more annotated frames, propagate the manual labeling to the entire video
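The propagation setting can be pictured as a simple loop over frames. The sketch below is an assumption-laden outline: `propagate_masks`, `segment_fn`, and `copy_previous` are hypothetical names, and the toy model merely reuses the previous mask where a real network would predict a new one.

```python
def propagate_masks(frames, first_mask, segment_fn):
    """Propagate the first-frame annotation through the rest of the video.

    segment_fn is a hypothetical stand-in for the segmentation model. It
    receives the current frame, the annotated reference frame and mask,
    and the previous prediction, and returns a mask for the current frame.
    """
    masks = [first_mask]
    for frame in frames[1:]:
        masks.append(segment_fn(frame, frames[0], first_mask, masks[-1]))
    return masks


def copy_previous(frame, ref_frame, ref_mask, prev_mask):
    """Toy model: assume nothing moves and reuse the previous mask."""
    return prev_mask


frames = ["frame0", "frame1", "frame2"]  # placeholders for image data
first_mask = [[0, 1], [1, 1]]
masks = propagate_masks(frames, first_mask, copy_previous)
```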
![Page 5: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/5.jpg)
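• Multi-object Scenarios
  • Post-ensemble manner: earlier methods match and segment each object separately, then merge the per-object results
  • AOT (Associating Objects with Transformers) associates and segments multiple objects jointly within an end-to-end framework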
![Page 6: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/6.jpg)
Identity Assignment
• Identity Embedding
• Identity Decoding
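One way to picture the two steps: each object is assigned a distinct vector from a shared identity bank, every pixel is embedded with the vector of the object it belongs to, and decoding reads out per-pixel probabilities over the assigned identities only. The sketch below rests on assumptions not stated on the slide: the bank is random rather than learned, the label map is a flat list of object indices, background handling is left out, and all function names are hypothetical.

```python
import math
import random


def make_identity_bank(num_ids, dim, seed=0):
    """A bank of identity vectors (random here, learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]


def assign_identities(num_objects, num_ids, seed=0):
    """Pick a distinct bank slot for every object."""
    rng = random.Random(seed)
    return rng.sample(range(num_ids), num_objects)


def identity_embedding(label_map, assignment, bank):
    """Give each pixel the identity vector of the object it belongs to."""
    return [bank[assignment[obj]] for obj in label_map]


def identity_decoding(id_logits, assignment):
    """Per pixel: keep the logits of assigned identities, then softmax."""
    probs = []
    for logits in id_logits:
        picked = [logits[slot] for slot in assignment]
        peak = max(picked)
        exps = [math.exp(v - peak) for v in picked]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


bank = make_identity_bank(num_ids=10, dim=4)
assignment = assign_identities(num_objects=2, num_ids=10)
label_map = [0, 1, 1, 0]  # object index of each pixel
embedding = identity_embedding(label_map, assignment, bank)

# One pixel whose logits favour the identity assigned to object 1.
logits = [0.0] * 10
logits[assignment[1]] = 5.0
probs = identity_decoding([logits], assignment)
```

Because all objects share one embedding space, a single forward pass can carry every object at once, which is the point of contrast with the post-ensemble approach.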
![Page 7: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/7.jpg)
Long Short-Term Transformer (LSTT)
• Long-Term Attention
• Short-Term Attention
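The difference between the two attention types can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions: positions are laid out in 1D instead of a 2D feature map, there is a single head, no learned projections, and the function names and the `radius` parameter are hypothetical. Long-term attention lets every query see all positions of all memory frames, while short-term attention restricts each query to a local window in the previous frame.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def long_term_attention(queries, mem_keys, mem_values):
    """Every query attends to every position of every memory frame."""
    keys = [k for frame in mem_keys for k in frame]
    values = [v for frame in mem_values for v in frame]
    return [attend(q, keys, values) for q in queries]


def short_term_attention(queries, prev_keys, prev_values, radius=1):
    """Each query attends only to a local window in the previous frame."""
    out = []
    for i, q in enumerate(queries):
        lo = max(0, i - radius)
        hi = min(len(prev_keys), i + radius + 1)
        out.append(attend(q, prev_keys[lo:hi], prev_values[lo:hi]))
    return out


queries = [[1.0, 0.0], [0.0, 1.0]]
prev_keys = [[1.0, 0.0], [0.0, 1.0]]
prev_values = [[10.0], [20.0]]

# With radius 0 each position only sees its own location in the previous frame.
short_out = short_term_attention(queries, prev_keys, prev_values, radius=0)

# Two memory frames that store the same value everywhere.
mem_keys = [prev_keys, prev_keys]
mem_values = [[[3.0], [3.0]], [[3.0], [3.0]]]
long_out = long_term_attention(queries, mem_keys, mem_values)
```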
![Page 8: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/8.jpg)
![Page 9: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/9.jpg)
Architecture Overview
• Encoder
  • MobileNetV2
• Decoder
  • FPN
• Loss Function
  • Binary Cross-Entropy Loss
  • IoU Loss
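The two loss terms listed above could be combined roughly as follows. This is a sketch under assumptions: predictions are per-pixel foreground probabilities in a flat list, the IoU term is the common soft (differentiable) variant, and the equal weighting is a placeholder choice for illustration, not a value taken from the slides.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_iou_loss(probs, targets, eps=1e-7):
    """1 - soft IoU, using probabilities instead of hard 0/1 decisions."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    return 1.0 - (inter + eps) / (union + eps)


def combined_loss(probs, targets, bce_weight=0.5, iou_weight=0.5):
    """Weighted sum of both terms (weights are an assumption)."""
    return (bce_weight * bce_loss(probs, targets)
            + iou_weight * soft_iou_loss(probs, targets))


targets = [1, 1, 0, 0]
good = combined_loss([0.9, 0.8, 0.1, 0.2], targets)
bad = combined_loss([0.2, 0.1, 0.9, 0.8], targets)
```

The cross-entropy term scores every pixel independently, while the IoU term scores the overlap of the whole mask, so small objects are not drowned out by the background.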
![Page 10: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/10.jpg)
AOT-Tiny: L=1, m=1
AOT-Small: L=2, m=1
AOT-Base: L=3, m=1
AOT-Large: L=3, m={1,7,13,…}
AOT-Base is roughly 4.5 times faster than CFBI (15.2 fps vs. 3.4 fps)
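The variants can be written down as a small configuration table. The sketch below reads L as the number of LSTT layers and m as the set of frame indices kept in long-term memory; that reading, the key names, and the helper `long_term_frames` are assumptions made for illustration. For AOT-Large, the listed indices 1, 7, 13 are assumed to continue with a step of 6.

```python
def long_term_frames(num_frames, stride=None):
    """1-based indices of the frames kept in long-term memory.

    stride=None keeps only the first frame; an integer stride also
    keeps every stride-th frame after it.
    """
    if stride is None:
        return [1]
    return list(range(1, num_frames + 1, stride))


# Hypothetical config layout; values follow the list above.
AOT_VARIANTS = {
    "AOT-Tiny": {"lstt_layers": 1, "memory_stride": None},
    "AOT-Small": {"lstt_layers": 2, "memory_stride": None},
    "AOT-Base": {"lstt_layers": 3, "memory_stride": None},
    "AOT-Large": {"lstt_layers": 3, "memory_stride": 6},  # assumed step
}

base_memory = long_term_frames(20, AOT_VARIANTS["AOT-Base"]["memory_stride"])
large_memory = long_term_frames(20, AOT_VARIANTS["AOT-Large"]["memory_stride"])

# Speed ratio from the two frame rates quoted above.
speedup = 15.2 / 3.4
```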
![Page 11: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/11.jpg)
Ablation Study
![Page 12: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/12.jpg)
Interpretability: Identity Bank
![Page 13: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/13.jpg)
Interpretability: Long-Term & Short-Term Memory
![Page 14: Semi-supervised Video Object Segmentation](https://reader035.vdocument.in/reader035/viewer/2022081619/62ce614cb0bbe9198a2b9026/html5/thumbnails/14.jpg)
Thanks for watching!