big data - university of pittsburghpeople.cs.pitt.edu/~kovashka/cs3710_sp15/bigdata_phuong.pdfbased...

Post on 13-Oct-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Big dataPhuong Pham

April 9, 15

Learning Everything about Anything:Webly-Supervised Visual Concept

LearningBased on Santosh Divvala’s slide

Motivation

3

Rearing horse Rolling horse Reining horse Eye horse

What are these?

“horse”

A system that can

4

Without human supervision• Biased, non-comprehensive• Concept-specific expertise• Scalability

Approach

5

Overall questions? Or we will go into detail of each component

Retrieve concept’s variations

• Approximately 5000 n-grams per concept

• Several visually non-salient n-grams e.g. “last horse”, “particular horse”

• Classifier-based pruning (5000 -> 1000)• Train/test thumbnail images, if A.P. < threshold, discard

n-gram

• Find good quality n-gram

6

Good quality n-grams

7

quality coverage

Not for finding super-ngrams!

Merge into S if i,j are different?

Super-ngrams

8

Merging similar “good quality” ngrams

Training models

• Train separate DPM per super-ngram

• Pruning noisy components

• Merging similar appearance clustering (~ 50%)

9

Pruning noisy components

10

Noisy components (pruned based on A.P. threshold and frequency)

Merging similar components (as super-ngram)11

Object detection

12

• Their system is comparable to WS_vid_CVPR12 (better in avg)• WS_Img_ICCV11 is the best model at the cost of fully supervised learning (limitation of annotation)

Action detection

13The webly learning give reasonable good result?

Future work

14

Summary

• Advantages• Cut annotation cost

• Combine linguistic and visual semantic may lead to more interesting research topics

• ?

• Disadvantages• Still not defeat state of the art even with training set at

large scale

• ?

15

Questions

16

Scene Completion using Millions of Photographs

Based on James Hays’ slides

Motivation

18

The algorithm

19

Input image Scene Descriptor Image Collection

200 matches20 completionsContext matching+ blending

Semantic scene matching

20

→Gist scene descriptor from 6 orientation and 5 scales

Find its L2 nearest neighbors in 2.3 million images

21

… 200 total

22Graph cut + Poisson blending Hays and Efros, SIGGRAPH 2007

Context matching

Graph cut seam finding

23

For missing regions of the existing image

For regions of the image not covered be the scene match

For other pixels

Result ranking

24

We assign each of the 200 results a score which is the sum of:

The scene matching distance

The context matching distance (color + texture)

The graph cut cost

Qualitative

25

Qualitative - failures

26

Quantitive

• Our: 37%

• Baseline: 10%

27

Human evaluation

28

Discussions

29

top related