![Page 1: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/1.jpg)
Visual Dialog
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M.F. Moura, Devi Parikh, Dhruv Batra
Presented by: Alan Luo
![Page 2: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/2.jpg)
Introduction
Natural Language Processing + Computer Vision
● Aiding visually impaired users in understanding their surroundings or social media content
● Interacting with an AI assistant
![Page 3: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/3.jpg)
Related Work
Image/Video Captioning
● Image Captioning
● Video Captioning
![Page 4: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/4.jpg)
Related Work
Visual-Semantic Alignments
● Datasets
![Page 5: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/5.jpg)
Related Work
Visual Q&A
![Page 6: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/6.jpg)
Contributions
1. Propose a new AI task: Visual Dialog
2. Develop a novel two-person chat data-collection protocol and introduce a new dataset
3. Introduce a family of neural encoder-decoder models for Visual Dialog
![Page 7: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/7.jpg)
Technical Details
With Late Fusion Encoder
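The slide names the Late Fusion encoder but the extracted text does not show how it works. As described in the Visual Dialog paper, the question and the concatenated dialog history are encoded separately, joined with the image feature, and projected to a joint representation. The sketch below illustrates only that fusion step, assuming the three embeddings are already computed. The function names, vector sizes, random weights, and the `tanh` nonlinearity are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def late_fusion(image_vec, question_vec, history_vec, weights, bias):
    """Concatenate the three input embeddings, then apply one linear layer and tanh."""
    fused = image_vec + question_vec + history_vec  # list concatenation
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the concatenated input size")
    return [
        math.tanh(sum(w * x for w, x in zip(row, fused)) + b)
        for row, b in zip(weights, bias)
    ]


def random_layer(in_dim, out_dim, seed=0):
    """Hypothetical helper: small random weights standing in for learned parameters."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


# Toy stand-ins for a CNN image feature and the final states of two
# separate sequence encoders (all sizes are made up for illustration).
image_vec = [0.5, -0.2, 0.1, 0.9]
question_vec = [0.3, 0.3, -0.7]
history_vec = [-0.1, 0.4, 0.2]

weights, bias = random_layer(in_dim=10, out_dim=5)
joint = late_fusion(image_vec, question_vec, history_vec, weights, bias)
print(len(joint))
```

The fusion is "late" because each input is encoded on its own and the three are combined only at the final projection; a decoder would then consume `joint` to produce or rank an answer.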
![Page 8: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/8.jpg)
Dataset
VisDial
● Qualitative
● Quantitative
![Page 9: Visual Dialog - Stanford University](https://reader033.vdocument.in/reader033/viewer/2022042303/5ece26edcebd7c0f84040fab/html5/thumbnails/9.jpg)
Results
Qualitative Results
Quantitative Results