

Introduction

Problem: Current computer vision methods provide isolated object classes from an image, but how do the objects interact with each other? What are they made of?

Goal:
• Detecting the relationships between pairs of objects in a given image.
• Detecting objects that are made of a certain material (attribute relationship).

We did not resort to ensemble methods.

Evaluation

          Relationship   Attribute   All
Public    0.17934        0.04183     0.21774
Private   0.15776        0.04167     0.19666

Scores on the leaderboard:
1. Mean Average Precision (mAP) at IoU > 0.5, focusing on relationships
2. Recall@50, focusing on relationships
3. Mean Average Precision (mAP) at IoU > 0.5, focusing on phrases

The weights applied to the three metrics are [0.4, 0.2, 0.4].
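The mAP metrics above count a detection as correct only when its IoU with a ground-truth box exceeds 0.5, and the final leaderboard score is the weighted sum of the three metrics. A minimal sketch of both computations (the `[x1, y1, x2, y2]` box format and the function names are our assumptions, not part of the challenge code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in [x1, y1, x2, y2] format."""
    # Intersection rectangle (empty intersections clamp to zero area)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def challenge_score(map_rel, recall50, map_phrase):
    """Weighted combination of the three metrics, weights [0.4, 0.2, 0.4]."""
    return 0.4 * map_rel + 0.2 * recall50 + 0.4 * map_phrase

# Half-overlapping boxes: intersection 50, union 150 -> IoU = 1/3 < 0.5,
# so this detection would not count as a match.
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))
```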

Sho Inayoshi (1), Keita Otani (1), Masahiro Igarashi (1), Kentaro Takemoto (1), Sukrut Rao (2), Kuniaki Saito (1), Antonio Tejero-de-Pablos (1), Yoshitaka Ushiku (1), Tatsuya Harada (1, 3)
(1) The University of Tokyo, (2) IIT Hyderabad, (3) RIKEN

Bounding Box Layers for Relationship Detection

[Example images: an object-object relationship and an attribute relationship]

1. Object detection

We use an FPN (Feature Pyramid Network) [Lin, et al., 2017] to detect small objects. Intermediate outputs from the FPN are also used in relationship detection.

[Method overview diagram: 1. Object Detection, 2. Relationship Detection, 3. Attribute Detection. Relationship detection outputs object-object relationship triplets (e.g., "Wine glass on table"); attribute detection outputs object-attribute relationship triplets (e.g., "Table is wooden").]

[Attribute detection diagram: inputs are the object image, the object label (one-hot vector), and the bounding box of one object. A pretrained ResNet50 and learned Linear layers feed a Sigmoid that outputs the attribute confidences.]

2. Relationship detection

Merits of our "bounding box layer" network• Our "bounding box layers" allow associating bounding boxes

with their respective features.• Our "bounding box layers" allow emphasizing the features

contained in the bounding box of the detected objects.• Sorting our “bounding box layers” by class allows the

network to learn different object types in a different way.

Difference with related work in relationship detection
Previous works take as input vectorized bounding boxes ([<x coordinate of an object>, <ratio of the widths of two objects>, …]) that also encode the object labels [Zhuang, et al., 2017].
Our method instead uses bounding boxes as "bounding box layers" that indicate the position of each object in the feature maps, and uses the labels to sort the "bounding box layers" according to object class.

We calculate relationship confidences for all ordered pairs of detected objects (n objects → n(n − 1) confidences).
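The n(n − 1) count comes from enumerating ordered (subject, object) pairs; a quick sketch (the object names are illustrative):

```python
from itertools import permutations

objects = ["wine glass", "table", "chair"]  # n = 3 detected objects

# Each ordered (subject, object) pair gets its own relationship confidence,
# since "glass on table" and "table on glass" are different triplets.
pairs = list(permutations(objects, 2))
print(len(pairs))  # n * (n - 1) = 3 * 2 = 6
```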

[Relationship detection diagram: the inputs are the feature maps and the bounding boxes of two objects. The surrounding box (dashed line) enclosing both objects is clipped, and RoI alignment extracts the features in the region of interest as 10×10 feature maps. For each object, a stack of layers is built, one per class: the layer corresponding to the object's class becomes the "bounding box layer", filled with 1.5 inside the bounding box and 0.5 in the other area and then resized to the same size as the feature maps; the rest are filled with 0. These 3 chunks of layers (the feature maps plus the two objects' layer stacks) are stacked and passed through Convolution & ReLU ×3, Linear & ReLU ×3, and a Sigmoid that outputs the relationship confidences.]
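The bounding box layers described above can be sketched as follows; the fill values (1.5 inside the box, 0.5 outside, 0 for non-matching classes) follow the poster, while the function names, channel count, and the assumption that the box is already in feature-map coordinates are ours:

```python
import numpy as np

def bounding_box_layers(box, obj_class, n_classes, size=10):
    """Build a (n_classes, size, size) layer stack for one detected object.

    Only the layer at index `obj_class` becomes the bounding box layer:
    1.5 inside the box, 0.5 elsewhere. All other layers stay 0.
    Here `box` = (x1, y1, x2, y2) is assumed to already be in
    feature-map coordinates; the poster resizes from image coordinates.
    """
    layers = np.zeros((n_classes, size, size), dtype=np.float32)
    x1, y1, x2, y2 = box
    layer = np.full((size, size), 0.5, dtype=np.float32)
    layer[y1:y2, x1:x2] = 1.5
    layers[obj_class] = layer
    return layers

# Two detected objects -> stack their layers with the 10x10 feature maps
feat = np.random.rand(256, 10, 10).astype(np.float32)  # RoI-aligned features
bbl_1 = bounding_box_layers((1, 1, 5, 5), obj_class=2, n_classes=6)
bbl_2 = bounding_box_layers((4, 3, 9, 8), obj_class=0, n_classes=6)
stacked = np.concatenate([feat, bbl_1, bbl_2], axis=0)  # 3 chunks of layers
print(stacked.shape)  # (256 + 6 + 6, 10, 10)
```

The stacked tensor is what the Convolution & ReLU ×3 stage would consume.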

Conclusion
• Our “bounding box layer” network improves relationship detection between pairs of objects by emphasizing their features.
• Our “bounding box layers” can be used in other networks that build on object detection results.

[Object detection diagram: ResNet50+FPN and Faster R-CNN produce the feature maps.]

3. Attribute detection

Apart from the well-known ResNet classifier, our attribute detector also uses the object label to restrict the possible attributes (e.g., a glass is likely to be transparent).
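A toy sketch of such a label-conditioned attribute head (numpy with random stand-in weights; the layer sizes and names are our assumptions, not the poster's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_classes, n_attributes, feat_dim = 6, 4, 2048

# Stand-ins for learned weights: one linear head on the ResNet50 features,
# one on the one-hot object label, so the label can bias the attributes
# (e.g. "glass" toward "transparent").
w_feat = rng.normal(size=(n_attributes, feat_dim)) * 0.01
w_label = rng.normal(size=(n_attributes, n_classes))

resnet_features = rng.normal(size=feat_dim)  # pretrained ResNet50 output (simulated)
label = np.eye(n_classes)[2]                 # one-hot label of the detected object

# Independent per-attribute confidences via a sigmoid (not a softmax):
# an object can be both "wooden" and "textured" at once.
attr_conf = sigmoid(w_feat @ resnet_features + w_label @ label)
print(attr_conf.shape)  # (n_attributes,)
```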

The number of layers is equal to the number of classes.
