parikh lecture1 intro

Making Computers See

ECE 5554: Computer Vision

Devi Parikh

Assistant Professor

ECE, Virginia Tech

Disclaimer: Many slides have been borrowed from Kristen Grauman, who may have borrowed some of them from others. Any time a slide did not already have a credit on it, I have credited it to Kristen. So there is a chance some of these credits are inaccurate.

Plan for today

Topic overview

Introductions

Little bit about me

Little bit about you

Course overview:

Logistics and requirements

Coming up

Please interrupt at any time with questions or comments

Slide credit: Devi Parikh

What Is Computer Vision?


Computer Vision: Making Computers See

Image from: http://kirkh.deviantart.com/art/BioMech-Eye-168367549

Computer Vision

Automatic understanding of images and video

Computing properties of the 3D world from visual data (measurement)

Slide credit: Kristen Grauman

1. Vision for measurement

Real-time stereo

Structure from motion

NASA Mars Rover

Tracking

Demirdjian et al.

Snavely et al.

Wang et al.


Computer Vision



Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)


[Figure: annotated photo with labels: sky, water, Ferris wheel, amusement park, Cedar Point, 12 E, tree, carousel, deck, people waiting in line, ride, umbrellas, pedestrians, maxair, bench, Lake Erie, people sitting on ride, The Wicked Twister]

Kinds of labels: Objects, Activities, Scenes, Locations, Text / writing, Faces, Gestures, Motions, Emotions

2. Vision for perception, interpretation


Computer Vision



Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)

Algorithms to mine, search, and interact with visual data (search and organization)


3. Visual search, organization

[Figure: a query is run against image or video archives, which return ranked relevant content (1, 2, 3)]


Related disciplines

Cognitive science

Algorithms

Image processing

Artificial intelligence

Graphics

Machine learning

Computer vision


Vision and graphics

Vision: Images → Model

Graphics: Model → Images

Inverse problems: analysis and synthesis.


Slide credit: Larry Zitnick

What humans see

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

[Figure: the image shown as a grid of raw pixel intensity values]

What computers see



What do humans see?


How many object categories are there?

~10,000 to 30,000

Biederman 1987

Slide credit: Fei-Fei, Fergus, Torralba CVPR07 Short Course


Torralba et al. PAMI 2008


What do humans see?


Torralba et al. PAMI 2008

chair

table setting

light

picture


What do humans see?


What do humans see?


Why is vision difficult?

Ill-posed problem: real world much more complex than what we can measure in images

3D → 2D

Impossible to literally invert image formation process
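
One way to see why the 3D-to-2D step cannot be inverted is a small numeric sketch. It assumes an idealized pinhole camera, which the slide does not specify; the function name `project` and the focal length `f` are hypothetical choices made for illustration.

```python
# A minimal sketch, assuming an idealized pinhole camera (not stated on the slide).
# Projection divides by depth, so depth is lost in the image.

def project(point, f=1.0):
    """Project a 3D point (X, Y, Z) with Z > 0 onto the image plane at focal length f."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # on the same viewing ray, twice as far from the camera

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points land on the same image location, so the image alone cannot say which one was there. Recovering the 3D world needs extra assumptions or extra views.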


Challenges: many nuisance parameters

Illumination

Object pose

Clutter

Viewpoint

Intra-class appearance

Occlusions


Challenges: intra-class variation

Slide credit: Fei-Fei, Fergus & Torralba

Challenges: importance of context

Slide credit: Fei-Fei, Fergus & Torralba

Challenges: complexity

Thousands to millions of pixels in an image

3,000-30,000 human recognizable object categories

30+ degrees of freedom in the pose of articulated objects (humans)

Billions of images indexed by Google Image Search

18 billion+ prints produced from digital camera images in 2004

295.5 million camera phones sold in 2005

About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]


"spend the summer linking a camera to a computer and getting the computer to describe what it saw"

Marvin Minsky (1966), MIT

Turing Award (1969)

47 years later



How hard is computer vision?


Gerald Sussman, MIT

"You'll notice that Sussman never worked in vision again!" - Berthold Horn


How hard is computer vision?


Progress so far


Progress so far



Progress so far


Progress so far



Location

AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua




3D Models

Rome

2 million photos

Dubrovnik

58 thousand photos



Dubrovnik




Progress so far


Progress so far



Progress so far



L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.

Visual data in 1963


Personal photo albums

Surveillance and security

Movies, news, sports

Medical and scientific images

Visual data in 2013

Slide credit: Svetlana Lazebnik

Why vision?

As image sources multiply, so do applications

Relieve humans of boring, easy tasks

Enhance human abilities

Advance human-computer interaction, visualization

Perception for robotics / autonomous agents

Organize and give access to visual content



Applications

Post-disaster family reunification

Law enforcement

Surveillance

Robotics

Autonomous driving

Medical imaging

Photo organization

Image search

E-commerce

cell phone cameras, social media, Google Glass, etc.



Summary

Computer Vision is a hard problem.

Lots of cool and important applications.

A growing and exciting field.

New teams in existing companies, new companies, etc.



Introductions

Instructor

Devi Parikh

[email protected]

Joined ECE, Virginia Tech in January 2013

Research Assistant Professor at TTI-Chicago for 3.5 years

Ph.D. from Carnegie Mellon University in 2009

Research area: Computer Vision

Recognition

Human-machine communication



Introductions

TA:

Neelima Chavali

[email protected]

M.S. student in ECE



Introductions

You

Name?

Department?

What are you hoping to get out of this course?

Do you have any experience in computer vision?



This course

ECE 5554

TR 3:30 pm to 4:45 pm

McBryde Hall (MCB) 307

My office hours: F 2:30 pm to 3:30 pm

Neelima's office hours: MW 11:00 am to noon

Course webpage:

http://filebox.ece.vt.edu/~F13ECE5554/

(Google me → My homepage → Teaching)


This course

Introductory Computer Vision course

Basics and fundamentals

Hands-on assignments and projects

Views of vision as a research area


Other courses

Advanced Computer Vision

Devi Parikh

Spring semesters

Introduction to Machine Learning and Perception

Dhruv Batra

Fall semesters

Advanced Machine Learning

Dhruv Batra

Spring semesters


Topics overview

Features & filters

Grouping & fitting

Multiple views and motion

Recognition

Video processing

Focus is on algorithms, rather than specific systems.


Features and filters

Transforming and describing images; textures, colors, edges


Grouping & fitting

[fig from Shi et al]

Clustering, segmentation, fitting; what parts belong together?


Multiple views and motion

Hartley and Zisserman

Lowe

Multi-view geometry, matching, invariant features, stereo vision

Fei-Fei Li


Recognition and learning

Recognizing objects and categories, learning techniques


Video processing

Tomas Izo

Tracking objects, video analysis, low level motion, optical flow


Textbooks

Recommended book:

Computer Vision:

Algorithms and Applications

By Rick Szeliski

http://szeliski.org/Book/

Lectures will be posted online


Requirements / Grading

Problem sets (55%)

Project (25%)

Final exam (15%)

Class participation, including attendance (5%)


Problem sets

Some short answer concept questions

Programming problems

Implementation

Explanation, results

Follow instructions. Points will be deducted if we can't run your code on our data, can't run our code on your data, etc.

Ask questions on Scholar forum first

Code in Matlab

These assignments are substantial.

They will take significant time to do.

Start early.


Matlab

Built-in toolboxes for low-level image processing, visualization

Compact programs

Intuitive interactive debugging

Widely used in engineering


PS0

PS0: Matlab warmup + basic image manipulation

Out today, due in about a week (Monday night)


Digital images

Images as matrices


im[176][201] has value 164

im[194][203] has value 37

Image size: width 520, height 500; indices i and j start at 1

Intensity: [0, 255]

Digital images
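
To make the "images as matrices" idea concrete, here is a minimal sketch in plain Python (the course itself uses Matlab). The tiny 4 x 5 image, its pixel values, and the helper names `image_size` and `pixel` are all made up for illustration. Python lists index from 0, so the helper converts from the 1-based indices used on the slide.

```python
# A minimal sketch: a grayscale image stored as a height x width grid of intensities.
# The pixel values below are hypothetical, chosen only for illustration.
image = [
    [12, 40, 80, 120, 160],
    [20, 60, 100, 140, 180],
    [30, 70, 110, 150, 190],
    [37, 90, 164, 200, 255],
]

def image_size(im):
    """Return (height, width) of an image stored as a list of rows."""
    return len(im), len(im[0])

def pixel(im, i, j):
    """Return the intensity at 1-based row i and column j."""
    return im[i - 1][j - 1]

height, width = image_size(image)

# Every intensity should lie in the range [0, 255].
assert all(0 <= value <= 255 for row in image for value in row)

print(height, width)       # 4 5
print(pixel(image, 4, 3))  # 164
```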


R

G

B

Color images, RGB color space
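
As a rough sketch of the RGB representation, a color image can be stored with three values per pixel, one each for R, G and B. The 2 x 2 image and the helper names below are hypothetical. The grayscale conversion uses the BT.601 luma weights (0.299, 0.587, 0.114), which is one common choice and is an assumption here, not something the slide prescribes.

```python
# A minimal sketch: a color image as rows of (R, G, B) triples, each value in [0, 255].
# The 2 x 2 image is made up for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def channel(im, k):
    """Extract channel k (0 = R, 1 = G, 2 = B) as a single-channel matrix."""
    return [[px[k] for px in row] for row in im]

def to_gray(im):
    """Convert to grayscale with a weighted sum of R, G, B (BT.601 weights, assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in im
    ]

red = channel(image, 0)
gray = to_gray(image)
print(red)   # [[255, 0], [0, 255]]
print(gray)
```

Each extracted channel is itself a matrix of intensities, so anything that works on a grayscale image can be applied per channel.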


Preview of some problem sets



resize: castle squished

crop: castle cropped

content-aware resizing: seam carving
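
The first two options, cropping and plain resizing, can be contrasted with a short sketch. Both functions below narrow an image to a target width; the function names and the toy image are hypothetical, and the resize uses simple nearest-neighbor column sampling as an assumed stand-in for whatever interpolation a real resize would use. Seam carving itself is not implemented here.

```python
# A minimal sketch contrasting two ways of narrowing an image stored as a list of rows.

def crop_width(im, new_w):
    """Keep only the leftmost new_w columns; everything to the right is discarded."""
    return [row[:new_w] for row in im]

def resize_width(im, new_w):
    """Sample new_w evenly spaced columns; all regions survive but are squished."""
    w = len(im[0])
    cols = [j * w // new_w for j in range(new_w)]
    return [[row[c] for c in cols] for row in im]

image = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
]

print(crop_width(image, 3))    # [[0, 1, 2], [10, 11, 12]]
print(resize_width(image, 3))  # [[0, 2, 4], [10, 12, 14]]
```

Cropping loses whatever falls outside the window, and resizing distorts everything uniformly. Content-aware resizing tries to avoid both by choosing which pixels to remove based on image content.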


Grouping



Image mosaics / stitching

Slide credit: Kristen Grauman, Image from: Fei-Fei Li


Object search and recognition



Tracking, activity recognition


Assignment deadlines

Assignments in by 11:59 PM the day before class

Follow submission instructions given in assignment

Submit to Scholar

No hard copy submissions

Deadlines are firm. We'll use the Scholar timestamp. Even 1 minute late is late.

4 total free late days for the semester

Use them wisely: first two assignments are easier than others

If your program doesn't work, clean up the code, comment it well, explain what you have, and still submit. Draw our attention to this in your answer sheet.

Slide adapted from Kristen Grauman

Projects

Possibilities:

Apply any technique we studied in class, or a related one, to a real-world problem

Extend a technique

Empirically analyze a technique

Compare approaches

Design and evaluate a novel approach

Novel application

Be creative!

Publication?

Can work with a partner

Talk to me if you need help with ideas


Project timeline (tentative)

Project proposals (1 page) [10%]

October 1st

Project presentations (5-10 minutes) [40%]

November 23rd (Saturday)

If you anticipate this being a problem, talk to me well in advance about alternate arrangements

Project reports (4 pages) [50%]

December 10th


Collaboration policy

All responses and code must be written individually.

Students submitting answers or code found to be identical or substantially similar (due to inappropriate collaboration) risk failing the course.


Miscellaneous

Check class website regularly

No laptops, phones, etc. open in class please.

Use our office hours!

Please interrupt with questions at any time.


Coming up

Now: read the class webpage carefully

Now: check out Matlab tutorial online

Now: PS0 is out

Thursday, August 29th: first lecture on linear filters

Monday, September 2nd: PS0 due


Questions?

See you Thursday!

Big triangles

Little triangles

?



Example: Big triangles vs. Little triangles


[Figure: four example binary images, each a grid of 88 pixel values (0 or 1): two big triangles followed by two little triangles]

Sum = 16

Sum = 13

Sum = 2

Sum = 3

Rule

If sum > 10

Answer = Big triangle

Else

Answer = Little triangle

[Figure: a new, unlabeled binary image given as a grid of 88 pixel values (0 or 1)]

?

Sum = 2

Little triangle
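
The rule from this example can be written as a short sketch: add up the pixel values of a binary image and compare the total with the threshold of 10 from the slide. The function names and the two small 5 x 7 test images are hypothetical; they are not the images from the slide.

```python
# A minimal sketch of the sum-of-pixels rule: sum > 10 means "Big triangle".

THRESHOLD = 10  # threshold taken from the rule on the slide

def pixel_sum(im):
    """Total of all pixel values in a binary image stored as a list of rows."""
    return sum(sum(row) for row in im)

def classify(im, threshold=THRESHOLD):
    """Apply the rule: sums above the threshold are big triangles, the rest little."""
    return "Big triangle" if pixel_sum(im) > threshold else "Little triangle"

# Hypothetical images, made up for illustration.
big = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
little = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

print(pixel_sum(big), classify(big))        # 16 Big triangle
print(pixel_sum(little), classify(little))  # 4 Little triangle
```

The rule only counts "on" pixels, so it ignores shape entirely: any image with more than 10 on pixels would be called a big triangle, whatever it depicts.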


