Outline: Introduction · Method overview · 1. Object detection · 2. Relationship detection · 3. Attribute detection · Conclusion
Problem
Current computer vision methods provide isolated object classes from an image, but how do those objects interact with each other? What are they made of?
Goal
• Detecting the relationships between pairs of objects in the given image.
• Detecting objects that are made of a certain material (attribute relationship).
We did not resort to ensemble methods.
Evaluation

Scores on the leaderboard:
           Relationship  Attribute  All
Public     0.17934       0.04183    0.21774
Private    0.15776       0.04167    0.19666

Metrics:
1. Mean Average Precision (mAP) at IoU > 0.5, focusing on relationships
2. Recall@50, focusing on relationships
3. Mean Average Precision (mAP) at IoU > 0.5, focusing on phrases
The weights applied to each of the 3 metrics are [0.4, 0.2, 0.4].
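Since the overall challenge score is the weighted sum of the three metrics with weights [0.4, 0.2, 0.4], it can be computed as below (the per-metric values here are placeholders for illustration, not the poster's results):

```python
# Weights for (relationship mAP, Recall@50, phrase mAP), as in the challenge.
weights = [0.4, 0.2, 0.4]
scores = [0.20, 0.15, 0.25]  # hypothetical per-metric scores

overall = sum(w * s for w, s in zip(weights, scores))
print(round(overall, 3))  # 0.4*0.20 + 0.2*0.15 + 0.4*0.25 = 0.21
```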
Sho Inayoshi¹, Keita Otani¹, Masahiro Igarashi¹, Kentaro Takemoto¹, Sukrut Rao², Kuniaki Saito¹, Antonio Tejero-de-Pablos¹, Yoshitaka Ushiku¹, Tatsuya Harada¹,³
¹The University of Tokyo, ²IIT Hyderabad, ³RIKEN
Bounding Box Layers for Relationship Detection
[Figures: an example of an object-object relationship and an example of an attribute relationship]
1. Object detection
We use an FPN (Feature Pyramid Network) [Lin et al., 2017] to detect small objects. Intermediate outputs from the FPN are also used in relationship detection.
Pipeline: 1. Object Detection → 2. Relationship Detection → 3. Attribute Detection
Object-object relationship triplets, e.g. “Wine glass on table”.
Object-attribute relationship triplets, e.g. “Table is wooden”.
[Attribute detector architecture: the object image is fed to a pretrained ResNet50; the object label (one-hot vector) and the bounding box of the object are each fed to learned linear layers; the combined features pass through a learned linear layer and a sigmoid to produce the attribute confidences.]
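The attribute head can be sketched as follows. This is a minimal NumPy illustration, assuming a 2048-d ResNet50 feature, illustrative class/attribute counts, and randomly initialized weights standing in for the learned linear layers (none of these values are the authors' settings):

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, n_attributes = 5, 4
img_feat = rng.standard_normal(2048)      # stand-in for ResNet50 features
label = np.eye(n_classes)[2]              # one-hot object label
bbox = np.array([0.1, 0.2, 0.5, 0.6])     # normalized box (x1, y1, x2, y2)

# Learned linear projections (randomly initialized for illustration).
W_img = rng.standard_normal((64, 2048)) * 0.01
W_lab = rng.standard_normal((64, n_classes)) * 0.1
W_box = rng.standard_normal((64, 4)) * 0.1
W_out = rng.standard_normal((n_attributes, 64)) * 0.1

# Combine the three input branches, then apply the output layer + sigmoid.
hidden = W_img @ img_feat + W_lab @ label + W_box @ bbox
confidences = 1.0 / (1.0 + np.exp(-(W_out @ hidden)))

print(confidences.shape)  # one confidence per attribute
```

The sigmoid (rather than softmax) output lets an object carry several attributes at once.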
2. Relationship detection
Merits of our "bounding box layer" network
• Our "bounding box layers" allow associating bounding boxes with their respective features.
• Our "bounding box layers" allow emphasizing the features contained in the bounding box of the detected objects.
• Sorting our "bounding box layers" by class allows the network to learn different object types in different ways.
Difference with related work in relationship detection
Previous works take as input vectorized bounding boxes ([<x coordinate of an object>, <ratio of the width of two objects>, …]) that also encode object labels [Zhuang et al., 2017]. Our method instead uses bounding boxes as "bounding box layers" that indicate the position of the object in the feature maps, and uses the labels to sort the "bounding box layers" according to the object class.
We calculate relationship confidences for all ordered pairs of detected objects (n objects → n(n − 1) confidences).
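Because relationships are directed ("glass on table" ≠ "table on glass"), every ordered pair is scored, giving n(n − 1) pairs for n objects (the object names below are illustrative):

```python
from itertools import permutations

# Ordered pairs of detected objects: n * (n - 1) for n objects.
objects = ["wine glass", "table", "chair"]
pairs = list(permutations(objects, 2))
print(len(pairs))  # 3 objects -> 3 * 2 = 6 ordered pairs
```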
Architecture of the relationship detector:
• Input: the feature maps and the bounding boxes of two objects (O1, O2).
• Take the surrounding box of the two objects (dashed line), clip it to size 10×10, and extract the features in this Region of Interest with RoIAlign, yielding feature maps of size 10×10.
• For each object, build "bounding box layers": one layer per class. The layer corresponding to the object class is filled with 1.5 inside the bounding box and 0.5 in the other area, then resized to the same size as the feature maps; the remaining layers are filled with 0.
• Stack these 3 chunks of layers (the feature maps plus the bounding box layers of O1 and O2), then apply Convolution & ReLU ×3, Linear & ReLU ×3, and a Sigmoid to obtain the relationship confidences. These convolution and linear layers are learned.
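The construction of class-sorted bounding box layers can be sketched as below. This is a minimal NumPy sketch assuming the box is already expressed on the 10×10 feature grid (so the resize step is omitted) and an illustrative class count; the function name and signature are hypothetical, not the authors' code:

```python
import numpy as np

def bounding_box_layers(box, obj_class, n_classes=6, size=10):
    """Build class-sorted bounding box layers (illustrative sketch).

    The layer at index `obj_class` is filled with 1.5 inside the box and
    0.5 outside; all other class layers are zero. `box` is given as
    (row0, col0, row1, col1) on the `size` x `size` feature grid.
    """
    layers = np.zeros((n_classes, size, size))
    r0, c0, r1, c1 = box
    layer = np.full((size, size), 0.5)
    layer[r0:r1, c0:c1] = 1.5
    layers[obj_class] = layer
    return layers

# One object of class 2 occupying rows 2-5, cols 3-7 on a 10x10 grid.
layers = bounding_box_layers((2, 3, 5, 7), obj_class=2)
print(layers.shape)     # (6, 10, 10)
print(layers[2, 3, 4])  # 1.5 (inside the box)
print(layers[2, 0, 0])  # 0.5 (outside the box)
print(layers[0].sum())  # 0.0 (layer of a different class)
```

The 1.5/0.5 contrast (rather than 1/0) keeps features outside the box attenuated but not erased, while placing each layer at its class index is what lets the network treat different object types differently.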
Conclusion
• Our “bounding box layer” network improves relationship detection between pairs of objects by emphasizing their features.
• Our “bounding box layers” can be used in other networks that build on object detection results.
[Object detector: ResNet50 + FPN backbone with Faster R-CNN, producing feature maps.]
Apart from the well-known ResNet classifier, our attribute detector also uses the object label to restrict the possible attributes (e.g., a glass is likely to be transparent).
The number of layers is equal to the number of classes.