Making Computers See

ECE 5554: Computer Vision

Devi Parikh

Assistant Professor

ECE, Virginia Tech

Disclaimer: Many slides have been borrowed from Kristen Grauman, who may have borrowed some of them from others. Any time a slide did not already have a credit on it, I have credited it to Kristen. So there is a chance some of these credits are inaccurate.

Plan for today

Topic overview

Introductions

Little bit about me

Little bit about you

Course overview:

Logistics and requirements

Coming up

Please interrupt at any time with questions or comments

Slide credit: Devi Parikh

What Is Computer Vision?

Slide credit: Devi Parikh

Computer Vision: Making Computers See

Image from: http://kirkh.deviantart.com/art/BioMech-Eye-168367549

Computer Vision

Automatic understanding of images and video

Computing properties of the 3D world from visual data (measurement)

Slide credit: Kristen Grauman

1. Vision for measurement

Real-time stereo

Structure from motion

NASA Mars Rover

Tracking

Demirdjian et al.

Snavely et al.

Wang et al.

Slide credit: Kristen Grauman

Computer Vision

Automatic understanding of images and video

Computing properties of the 3D world from visual data (measurement)

Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)

Slide credit: Kristen Grauman

sky

water

Ferris wheel

amusement park

Cedar Point

12 E

tree

tree

tree

carousel

deck

people waiting in line

ride

ride

ride

umbrellas

pedestrians

maxair

bench

tree

Lake Erie

people sitting on ride

Objects

Activities

Scenes

Locations

Text / writing

Faces

Gestures

Motions

Emotions

The Wicked Twister

2. Vision for perception, interpretation

Slide credit: Kristen Grauman

Computer Vision

Automatic understanding of images and video

Computing properties of the 3D world from visual data (measurement)

Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)

Algorithms to mine, search, and interact with visual data (search and organization)

Slide credit: Kristen Grauman

3. Visual search, organization

Image or video archives

?

Query

1

2

3

Relevant content

Slide credit: Kristen Grauman

Related disciplines

Cognitive science

Algorithms

Image processing

Artificial intelligence

Graphics

Machine learning

Computer vision

Slide credit: Kristen Grauman

Vision and graphics

Model

Images

Vision

Graphics

Inverse problems: analysis and synthesis.

Slide credit: Kristen Grauman

Slide credit: Larry Zitnick

What humans see

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

[fig: grid of pixel intensity values]

Slide credit: Larry Zitnick

What computers see


Slide credit: Larry Zitnick

What do humans see?


How many object categories are there?

~10,000 to 30,000

Biederman 1987

Slide credit: Fei-Fei, Fergus, Torralba CVPR07 Short Course

Slide credit: Devi Parikh

Torralba et al. PAMI 2008

Slide credit: Larry Zitnick

What do humans see?

Torralba et al. PAMI 2008

chair

table setting

light

picture

Slide credit: Larry Zitnick

What do humans see?

Slide credit: Larry Zitnick

What do humans see?

Why is vision difficult?

Ill-posed problem: real world much more complex than what we can measure in images

3D → 2D

Impossible to literally invert image formation process

Slide credit: Kristen Grauman
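To make the 3D → 2D point concrete, here is a minimal Python sketch (Python rather than the course's Matlab, purely for illustration). It assumes an idealized pinhole camera with focal length 1, which the slide does not specify; `project`, `near`, and `far` are hypothetical names. Two different 3D points on the same viewing ray land on the same image coordinates, so depth cannot be read back from a single image.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (X, Y, Z) onto the image plane.

    Assumption: ideal pinhole camera at the origin looking along +Z.
    """
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same viewing ray as `near`, twice as far away

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points give the same 2D location, which is one way to see why image formation cannot be literally inverted.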

Challenges: many nuisance parameters

Illumination

Object pose

Clutter

Viewpoint

Intra-class appearance

Occlusions

Slide credit: Kristen Grauman

Challenges: intra-class variation

Slide credit: Fei-Fei, Fergus & Torralba

Challenges: importance of context

Slide credit: Fei-Fei, Fergus & Torralba

Challenges: complexity

Thousands to millions of pixels in an image

3,000-30,000 human recognizable object categories

30+ degrees of freedom in the pose of articulated objects (humans)

Billions of images indexed by Google Image Search

18 billion+ prints produced from digital camera images in 2004

295.5 million camera phones sold in 2005

About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]

Slide credit: Kristen Grauman

"spend the summer linking a camera to a computer and getting the computer to describe what it saw"

Marvin Minsky (1966), MIT

Turing Award (1969)

47 years later

Slide credit: Devi Parikh

How hard is computer vision?

Gerald Sussman, MIT

"You'll notice that Sussman never worked in vision again!" - Berthold Horn

Slide credit: Devi Parikh

How hard is computer vision?

Progress so far

Slide credit: Devi Parikh

Progress so far

Slide credit: Devi Parikh

Progress so far

Slide credit: Devi Parikh

Progress so far

Slide credit: Devi Parikh

Location

AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua

Slide credit: Devi Parikh

Slide credit: Devi Parikh

AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua

3D Models

Rome

2 million photos

Dubrovnik

58 thousand photos

Slide credit: Devi Parikh

Dubrovnik

Slide credit: Devi Parikh

AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua

Progress so far

Slide credit: Devi Parikh

Progress so far

Slide credit: Devi Parikh

Progress so far

Slide credit: Devi Parikh

L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.

Visual data in 1963

Slide credit: Kristen Grauman

Personal photo albums

Surveillance and security

Movies, news, sports

Medical and scientific images

Visual data in 2013

Slide credit: Svetlana Lazebnik

Why vision?

As image sources multiply, so do applications

Relieve humans of boring, easy tasks

Enhance human abilities

Advance human-computer interaction, visualization

Perception for robotics / autonomous agents

Organize and give access to visual content

Slide credit: Kristen Grauman

Applications

Post-disaster family reunification

Law enforcement

Surveillance

Robotics

Autonomous driving

Medical imaging

Photo organization

Image search

E-commerce

cell phone cameras, social media, Google Glass, etc.

Slide credit: Devi Parikh

Summary

Computer Vision is a hard problem.

Lots of cool and important applications.

A growing and exciting field.

New teams in existing companies, new companies, etc.

Slide credit: Devi Parikh

Introductions

Instructor

Devi Parikh

[email protected]

Joined ECE, Virginia Tech in January 2013

Research Assistant Professor at TTI-Chicago for 3.5 years

Ph.D. from Carnegie Mellon University in 2009

Research area: Computer Vision

Recognition

Human-machine communication

Slide credit: Devi Parikh

Introductions

TA:

Neelima Chavali

[email protected]

M.S. student in ECE

Slide credit: Devi Parikh

Introductions

You

Name?

Department?

What are you hoping to get out of this course?

Do you have any experience in computer vision?

Slide credit: Devi Parikh

This course

ECE 5554

TR 3:30 pm to 4:45 pm

McBryde Hall (MCB) 307

My office hours: F 2:30 pm to 3:30 pm

Neelima's office hours: MW 11:00 am to noon

Course webpage:

http://filebox.ece.vt.edu/~F13ECE5554/

(Google me → My homepage → Teaching)

Slide credit: Devi Parikh

This course

Introductory Computer Vision course

Basics and fundamentals

Hands-on assignments and projects

Views of vision as a research area

Slide credit: Devi Parikh

Other courses

Advanced Computer Vision

Devi Parikh

Spring semesters

Introduction to Machine Learning and Perception

Dhruv Batra

Fall semesters

Advanced Machine Learning

Dhruv Batra

Spring semesters

Slide credit: Devi Parikh

Topics overview

Features & filters

Grouping & fitting

Multiple views and motion

Recognition

Video processing

Focus is on algorithms, rather than specific systems.

Slide credit: Kristen Grauman

Features and filters

Transforming and describing images; textures, colors, edges

Slide credit: Kristen Grauman

Grouping & fitting

[fig from Shi et al]

Clustering, segmentation, fitting; what parts belong together?

Slide credit: Kristen Grauman

Multiple views and motion

Hartley and Zisserman

Lowe

Multi-view geometry, matching, invariant features, stereo vision

Fei-Fei Li

Slide credit: Kristen Grauman

Recognition and learning

Recognizing objects and categories, learning techniques

Slide credit: Kristen Grauman

Video processing

Tomas Izo

Tracking objects, video analysis, low level motion, optical flow

Slide credit: Kristen Grauman

Textbooks

Recommended book:

Computer Vision:

Algorithms and Applications

By Rick Szeliski

http://szeliski.org/Book/

Lectures will be posted online

Slide credit: Kristen Grauman

Requirements / Grading

Problem sets (55%)

Project (25%)

Final exam (15%)

Class participation, including attendance (5%)

Slide credit: Devi Parikh

Problem sets

Some short answer concept questions

Programming problems

Implementation

Explanation, results

Follow instructions. Points will be deducted if we can't run your code on our data, can't run our code on your data, etc.

Ask questions on Scholar forum first

Code in Matlab

These assignments are substantial.

They will take significant time to do.

Start early.

Slide credit: Kristen Grauman

Matlab

Built-in toolboxes for low-level image processing, visualization

Compact programs

Intuitive interactive debugging

Widely used in engineering

Slide credit: Kristen Grauman

PS0

PS0: Matlab warmup + basic image manipulation

Out today, due in ~ a week (Monday night)

Slide credit: Kristen Grauman

Digital images

Images as matrices

Slide credit: Kristen Grauman

im[176][201] has value 164

im[194][203] has value 37

width 520, indexed by j (starting at j=1)

height 500, indexed by i (starting at i=1)

Intensity: [0, 255]

Digital images

Slide credit: Kristen Grauman
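A minimal Python sketch of the "image as a matrix" idea (the course itself uses Matlab; Python is used here only for illustration). The tiny 3 x 4 image and the helper `pixel` are hypothetical. The sketch assumes that the first index is the row and the second the column, and it mimics the slide's 1-based i, j labels even though Python lists are 0-based.

```python
# A tiny grayscale image: a list of rows, each a list of intensities in [0, 255].
# The values are made up for illustration; they are not the slide's image.
im = [
    [0, 50, 100, 150],
    [25, 75, 125, 175],
    [37, 164, 200, 255],
]

height = len(im)    # number of rows (index i)
width = len(im[0])  # number of columns (index j)


def pixel(image, i, j):
    """Return the intensity at 1-based row i, column j (slide-style indexing)."""
    return image[i - 1][j - 1]


print(height, width)    # 3 4
print(pixel(im, 3, 2))  # 164
print(pixel(im, 3, 1))  # 37
```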

R

G

B

Color images, RGB color space

Slide credit: Kristen Grauman
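As a rough illustration of the RGB representation, the Python sketch below stores a color image as rows of (R, G, B) triples and pulls out each channel as its own intensity matrix. The 2 x 2 image and the helper `channel` are hypothetical, and storing pixels as tuples is just one possible layout (Matlab would typically use a height x width x 3 array).

```python
# A hypothetical 2 x 2 color image: each pixel is an (R, G, B) triple in [0, 255].
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def channel(img, k):
    """Extract channel k (0 = R, 1 = G, 2 = B) as a 2D intensity matrix."""
    return [[px[k] for px in row] for row in img]


red = channel(image, 0)
green = channel(image, 1)
blue = channel(image, 2)

print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```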

Preview of some problem sets

Slide credit: Devi Parikh

resize: castle squished

crop: castle cropped

content aware resizing:

seam carving
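A rough Python sketch of the idea behind seam carving, next to a plain crop. This is not the problem set's reference solution: the energy function (sum of absolute differences to the four neighbors) is an assumption chosen for simplicity, and all function names are hypothetical. The sketch finds one minimum-energy connected vertical path with dynamic programming and removes it, narrowing the image by one column.

```python
def energy_map(img):
    """Per-pixel energy: sum of absolute differences to the 4 neighbors (borders clamped)."""
    h, w = len(img), len(img[0])
    energy = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neighbors = [
                img[i][max(j - 1, 0)], img[i][min(j + 1, w - 1)],
                img[max(i - 1, 0)][j], img[min(i + 1, h - 1)][j],
            ]
            energy[i][j] = sum(abs(img[i][j] - n) for n in neighbors)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row forming a minimum-energy connected path."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 1, w - 1)
            cost[i][j] += min(cost[i - 1][lo:hi + 1])
    # Backtrack from the cheapest bottom pixel up to the top row.
    j = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [j]
    for i in range(h - 1, 0, -1):
        lo, hi = max(j - 1, 0), min(j + 1, w - 1)
        j = min(range(lo, hi + 1), key=lambda c: cost[i - 1][c])
        seam.append(j)
    return seam[::-1]


def remove_vertical_seam(img, seam):
    """Delete the seam pixel from each row, making the image one column narrower."""
    return [row[:j] + row[j + 1:] for row, j in zip(img, seam)]


# Flat background (10) with a bright "object" (200) in the last column.
img = [[10, 10, 10, 10, 200] for _ in range(3)]

carved = remove_vertical_seam(img, find_vertical_seam(energy_map(img)))
cropped = [row[:-1] for row in img]  # crop: drop the last column

print(carved[0])   # [10, 10, 10, 200]
print(cropped[0])  # [10, 10, 10, 10]
```

In this toy case the crop throws away the object, while the seam passes through the flat background and the object survives.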

Preview of some problem sets

Grouping

Slide credit: Kristen Grauman

Preview of some problem sets

Image mosaics / stitching

Slide credit: Kristen Grauman, Image from: Fei-Fei Li

Preview of some problem sets

Object search and recognition

Slide credit: Kristen Grauman

Preview of some problem sets

Tracking, activity recognition

Slide credit: Kristen Grauman

Assignment deadlines

Assignments in by 11:59 PM the day before class

Follow submission instructions given in assignment

Submit to scholar

No hard copy submissions

Deadlines are firm. We'll use the Scholar timestamp. Even 1 minute late is late.

4 total free late days for the semester

Use them wisely: first two assignments are easier than others

If your program doesn't work, clean up the code, comment it well, explain what you have, and still submit. Draw our attention to this in your answer sheet.

Slide adapted from Kristen Grauman

Projects

Possibilities:

Apply any technique we studied in class (or a related one) to a real-world problem

Extend a technique

Empirically analyze a technique

Compare approaches

Design and evaluate a novel approach

Novel application

Be creative!

Publication?

Can work with a partner

Talk to me if you need help with ideas

Slide credit: Devi Parikh

Project timeline (tentative)

Project proposals (1 page) [10%]

October 1st

Project presentations (5-10 minutes) [40%]

November 23rd (Saturday)

If you anticipate this being a problem, talk to me well in advance to make alternate arrangements

Project reports (4 pages) [50%]

December 10th

Slide credit: Devi Parikh

Collaboration policy

All responses and code must be written individually.

Students submitting answers or code found to be identical or substantially similar (due to inappropriate collaboration) risk failing the course.

Slide adapted from Kristen Grauman

Miscellaneous

Check class website regularly

No laptops, phones, etc. open in class please.

Use our office hours!

Please interrupt with questions at any time.

Slide adapted from Kristen Grauman

Coming up

Now: read the class webpage carefully

Now: check out Matlab tutorial online

Now: PS0 is out

Thursday August 29th : first lecture on linear filters

Monday September 2nd : PS0 due

Slide credit: Devi Parikh

Questions?

See you Thursday!

Big triangles

Little triangles

?

Slide credit: Devi Parikh

Example: Big triangles vs. Little triangles

Slide credit: Devi Parikh

Example: Big triangles vs. Little triangles

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

1

1

1

0

0

0

0

0

0

0

1

1

1

1

1

0

0

0

0

0

1

1

1

1

1

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

1

1

1

0

0

1

0

0

0

0

1

1

1

1

1

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

Sum = 16

Sum = 13

Sum = 2

Sum = 3

Rule

If sum > 10

Answer = Big triangle

Else

Answer = Little triangle
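The rule above can be sketched in a few lines of Python (used here only for illustration; the course uses Matlab). The threshold of 10 comes from the slide. The helper names, the 8 x 11 grid size, and the two example images are assumptions made for this sketch.

```python
def pixel_sum(image):
    """Add up all pixel values of a binary image given as a list of rows."""
    return sum(sum(row) for row in image)


def classify(image, threshold=10):
    """Apply the slide's rule: sum > threshold means big, otherwise little."""
    return "Big triangle" if pixel_sum(image) > threshold else "Little triangle"


def blank(rows=8, cols=11):
    """Create an all-zero image (grid size is an assumption for this sketch)."""
    return [[0] * cols for _ in range(rows)]


# Hypothetical little triangle: two "on" pixels.
little = blank()
little[3][5] = little[4][5] = 1

# Hypothetical big triangle: rows of width 1, 3, 5, 7 centered on column 5.
big = blank()
for r, width in enumerate([1, 3, 5, 7], start=1):
    start = 5 - width // 2
    for c in range(start, start + width):
        big[r][c] = 1

print(pixel_sum(little), classify(little))  # 2 Little triangle
print(pixel_sum(big), classify(big))        # 16 Big triangle
```

The rule looks only at the total count of "on" pixels, so stray noise pixels shift the sum and could flip the answer for images near the threshold.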

0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0

?

Sum = 2

Little triangle

Slide credit: Devi Parikh

Example: Big triangles vs. Little triangles
