movie meets ai - chinese university of hong kong

43
26/03/2019 Qingqiu Huang Movie Meets AI

Upload: others

Post on 23-Oct-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Movie Meets AI - Chinese University of Hong Kong

26/03/2019

Qingqiu Huang

Movie Meets AI

Page 2: Movie Meets AI - Chinese University of Hong Kong

1. Introduction

2. Tag-based Understand

3. Story-based Understand

4. Conclusion

CONTENTS

Page 3: Movie Meets AI - Chinese University of Hong Kong

Introduction

Page 4: Movie Meets AI - Chinese University of Hong Kong

4

1. To understand movies is to understand our world

2. Cross-modal & rich resources

……

Why movies?

Page 5: Movie Meets AI - Chinese University of Hong Kong

A Large Scale Dataset

5

• 130K+ Movie Meta Data• Cast• Genres• ……

• 50K+ Trailers• 45K+ Plot• 100M+ Images

• Poster• Profile

• 4000+ Movies• 1000+ Script

Titanic11/18/1997

Drama, Romance, Thriller

7.5

In 1996, treasure hunter Brock Lovett and his team aboard the research vessel Akademik Mstislav Keldysh search the wreck of Titanic for a necklace with a rare diamond, the Heart ...

Leonardo DiCaprioJack

Kate WinsletRose

Frances FisherRuth

Billy ZaneCaledon

Kathy BatesMolly

Page 6: Movie Meets AI - Chinese University of Hong Kong

Tag-based Understand

Page 7: Movie Meets AI - Chinese University of Hong Kong

7

Titanic11/18/1997

Drama, Romance, Thriller

7.5

In 1996, treasure hunter Brock Lovett and his team aboard the research vessel Akademik Mstislav Keldysh search the wreck of Titanic for a necklace with a rare diamond, the Heart ...

Leonardo DiCaprioJack

Kate WinsletRose

Frances FisherRuth

Billy ZaneCaledon

Kathy BatesMolly

Tag-based Understand

Tag: genres, plot keywords

Page 8: Movie Meets AI - Chinese University of Hong Kong

Tag-based Understand

8

Solution• Take shot as unit

...8 shots

VisualModel

pooling

VisualModel

VisualModel

pooling

... ... ...• Train on trailers• Sparse sampling on training

Challenge• Movie is too long! 90min vs. 1min

• Only tag for the whole movie!

Page 9: Movie Meets AI - Chinese University of Hong Kong

Tag-based Understand

9

lovedystopia

VisualModel

VisualModel

VisualModel

VisualModel

VisualModel

VisualModel

LSTM

From Trailers to Storylines: An Efficient Way to Learn from MoviesQingqiu Huang, Yuanjun Xiong, Yu Xiong, Yuqi Zhang, Dahua Lin

Page 10: Movie Meets AI - Chinese University of Hong Kong

Tag-based Understand

10

Page 11: Movie Meets AI - Chinese University of Hong Kong

11

Clips Retrieval by Tags

Page 12: Movie Meets AI - Chinese University of Hong Kong

Story-based Understand

Page 13: Movie Meets AI - Chinese University of Hong Kong

who

whatwhere

Cast

Event

Scene

13

Elements of Story

Page 14: Movie Meets AI - Chinese University of Hong Kong

14

Cast

Page 15: Movie Meets AI - Chinese University of Hong Kong

Cast in Movies (CIM)

15

• 3348 cast from 630 movies• More than 1.2M instances• Bounding box and identity

are manually annotated

Page 16: Movie Meets AI - Chinese University of Hong Kong

16

Leonardo DiCaprio in CIM

Page 17: Movie Meets AI - Chinese University of Hong Kong

17

Kate Winslet in CIM

Page 18: Movie Meets AI - Chinese University of Hong Kong

Cast Recognition

18

• Face Recognition

-- from MS-Celeb-1M

• Person Re-identification

-- from MARS

Page 19: Movie Meets AI - Chinese University of Hong Kong

Cast Recognition

19

• Most of the instances in movie are without frontal faces

• Clothing and makeup would change a lot

-- Face Recognition Failed

-- Person Re-id Failed

Page 20: Movie Meets AI - Chinese University of Hong Kong

Cast Recognition with Context

20

With Face + Visual Context + Social Context

Person-Event

Person-Person

Rose Jack Caledon Molly Ruth BrockEdward

Page 21: Movie Meets AI - Chinese University of Hong Kong

Cast Recognition with Context

21

X1 X2 X3 X4Y

X1 X4

X3X2

Y

RANet

FC

FC

FC

FC

CNN

CNN

CNN

CNN

Conv FC

Visual Context Social Context

• Learn instance-specific weights for different regions with a Region Attention Network (RANet)

𝑠𝑠 𝑖𝑖, 𝑗𝑗 = �𝑟𝑟=1

𝑅𝑅

𝑤𝑤𝑖𝑖𝑟𝑟𝑤𝑤𝑗𝑗𝑟𝑟𝑠𝑠𝑟𝑟 𝑖𝑖, 𝑗𝑗

𝐽𝐽 𝑿𝑿,𝒀𝒀; �𝑭𝑭,𝑷𝑷, |𝑸𝑸 𝑺𝑺,𝑭𝑭 = 𝜓𝜓𝜈𝜈 |𝑿𝑿 𝑺𝑺 + 𝛼𝛼 ⋅ 𝜙𝜙𝑒𝑒𝑒𝑒 𝒀𝒀,𝑿𝑿; �𝑭𝑭, |𝑷𝑷 𝑭𝑭 + 𝛽𝛽 ⋅ 𝜙𝜙𝑒𝑒𝑒𝑒(𝑿𝑿;𝐐𝐐)

• Join person identification with social context learning, including person-person and event-person relations

Unifying Identification and Context Learning for Person RecognitionQingqiu Huang, Yu Xiong, Dahua Lin Conference of Computer Vision and Pattern Recognition (CVPR) 2018

Page 22: Movie Meets AI - Chinese University of Hong Kong

Visual Matching

22

RANet

FC

FC

FC

FC

CNN

CNN

CNN

CNN

Conv FC

Region specific Weights

Page 23: Movie Meets AI - Chinese University of Hong Kong

Unified Formulation with Social Context

23

X1 X4

X3X2

YX1 X2 X3 X4Y

Page 24: Movie Meets AI - Chinese University of Hong Kong

Unified Formulation with Social Context

24

X1 X4

X3X2

Y

Page 25: Movie Meets AI - Chinese University of Hong Kong

Experiments

25

Dataset Split

Existing Methods on PIPA Ours

PIPER Naeil RNN MLC Baseline RANetFusion

Full Model

PIPA

Original 83.05 86.78 84.93 88.20 82.79 87.33 89.73

Album - 78.72 78.25 83.02 75.24 82.59 85.33

Time - 69.29 66.43 77.04 66.55 76.52 80.42

Day - 46.61 43.73 59.77 47.09 65.49 67.16

CIM - - - - - 68.12 71.93 72.56

Page 26: Movie Meets AI - Chinese University of Hong Kong

Experiments

26

Query Face Recognition RANet Fusion Full Model

Experiments of Recognition Results Events Discovered by Our Approach

Page 27: Movie Meets AI - Chinese University of Hong Kong

Conclusion

27

• A new framework• Region Attention Network to adaptively combine visual cues• Unify person identification and context learning in joint inference

• Get state-of-the-art performance on PIPA and CIM

X1 X2 X3 X4Y

X1 X4

X3X2

Y

RANet

FC

FC

FC

FC

CNN

CNN

CNN

CNN

Conv FC

Visual Context Social Context

Page 28: Movie Meets AI - Chinese University of Hong Kong

Cast Search with One Portrait

28

Query

Database

Page 29: Movie Meets AI - Chinese University of Hong Kong

Cast Search with One Portrait

29

?

Page 30: Movie Meets AI - Chinese University of Hong Kong

Cast Search with One Portrait

30

Visual LinkTemporal Link

Person Search in Videos with one Portrait through Visual and Temporal LinksQingqiu Huang, Wentao Liu, Dahua Lin European Conference of Computer Vision (ECCV) 2018

Page 31: Movie Meets AI - Chinese University of Hong Kong

31

Cast Search with One Portrait

• Competitive Consensus

0.4

0.3

0.3

Linear Diffusion

Competitive Consensus

0.9

0.1

0.2

0.8

0.3

0.7

?

?

mean 0.5

0.50.4*

0.9

0.10.3*

0.2

0.8

0.3

0.70.3*

0.36

0.04

0.06

0.24

0.09

0.21

concat

0.36

0.04

0.06

0.24

0.09

0.21

concat max 0.36

0.24

0.8

0.2

softmax

0.4*0.9

0.10.3*

0.2

0.8

0.3

0.70.3*

• Progressive Propagation

• mAP: 33.66% -> 47.41%

Page 32: Movie Meets AI - Chinese University of Hong Kong

Experiments

32

IN ACROSSmAP R@1 R@3 mAP R@1 R@3

FACE 53.55 76.19 91.11 42.16 53.15 61.12LP 8.19 39.70 70.11 0.37 0.41 1.60

PPCC 63.49 83.44 94.40 62.27 62.54 73.86

Page 33: Movie Meets AI - Chinese University of Hong Kong

33

Cast Search with One Portrait

Page 34: Movie Meets AI - Chinese University of Hong Kong

34

Cast Search in a Whole Movie

Page 35: Movie Meets AI - Chinese University of Hong Kong

Future Work

35

• Memory

• Speech & Subtitle

Page 36: Movie Meets AI - Chinese University of Hong Kong

36

Event

Page 37: Movie Meets AI - Chinese University of Hong Kong

Event Retrieval and Localization by Natural Language

37Find and Focus: Retrieve and Localize Video Events with Natural Language QueriesDian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Dahua Lin European Conference of Computer Vision (ECCV) 2018

As the van doors are closed the sandstorm zooms in like a swarm of angry bees.

Everyone looks up as a string of sand whizzes past like an express train.

The weight of the sand presses the accelerator on the van, picks up speed.

… Everyone looks up as a string of sand whizzes past like an express train. As the van doors are closed the sandstorm zooms in like a swarm of angry bees. The weight of the sand presses the accelerator on the van, picks up speed. …

Page 38: Movie Meets AI - Chinese University of Hong Kong

38

: Two men are talking outside a building.

: A woman and another man walk away as the two men continue their conversation.

: The men engage in a game of pool, shooting the balls into the corner pockets and taking turns.

FindNo.1

No.4(Ground-truth video)

… …Localization

Ground-truth

Localization

Focus

No.1 → No.5

… …

𝑠𝑠1

𝑠𝑠2

𝑠𝑠𝑀𝑀

𝑠𝑠1

𝑠𝑠2

𝑠𝑠𝑀𝑀… …

Find and Focus: Retrieve and Localize Video Events with Natural Language QueriesDian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Dahua Lin European Conference of Computer Vision (ECCV) 2018

Event Retrieval and Localization by Natural Language

Page 39: Movie Meets AI - Chinese University of Hong Kong

39

Event Retrieval and Localization by Natural Language

Page 40: Movie Meets AI - Chinese University of Hong Kong

Future Work

40

• Story-based Summary

• Caption (Story Telling)

Page 41: Movie Meets AI - Chinese University of Hong Kong

Conclusion

Page 42: Movie Meets AI - Chinese University of Hong Kong

Conclusion

42

• A Large-scale Movie Dataset• Tag-based Understand

• Learn from trailers to get shot-level tag response• Story-based Understand

• Cast• Cast recognition with context• Cast search through visual and temporal links

• Event• Hieratical framework for video retrieval by natural language

• ……

Page 43: Movie Meets AI - Chinese University of Hong Kong

Thank You

25/03/2019Qingqiu Huang