comp 776: computer vision - computer sciencelazebnik/spring11/lec01_intro.pdfwhy study computer...

60
COMP 776: Computer Vision COMP 776: Computer Vision

Upload: others

Post on 25-May-2020

21 views

Category:

Documents


0 download

TRANSCRIPT

COMP 776: Computer VisionCOMP 776: Computer Vision

TodayToday

I t d ti t t i i• Introduction to computer vision• Course overview• Course requirements• Course requirements

The goal of computer visionThe goal of computer vision

T t t “ i ” f i l• To extract “meaning” from pixels

What we see What a computer seesSource: S. Narasimhan

The goal of computer visionThe goal of computer vision

T t t “ i ” f i l• To extract “meaning” from pixels

Humans are remarkably good at this…

Source: “80 million tiny images” by Torralba et al.

Humans are remarkably good at this…

What kind of information can be extracted What kind of information can be extracted f ?f ?from an image?from an image?

M t i 3D i f ti• Metric 3D information• Semantic information

Vision as measurement deviceVision as measurement device

Real-time stereo Structure from motionReconstruction from

Internet photo collections

NASA Mars RoverNASA Mars Rover

Pollefeys et al. Goesele et al.

Vision as a source of semantic information

slide credit: Fei-Fei, Fergus & Torralba

Object categorization

sky

building

flag

wallbannerface

bus busstreet lamp

cars slide credit: Fei-Fei, Fergus & Torralba

Scene and context categorizationtd• outdoor

• city• traffic• …

slide credit: Fei-Fei, Fergus & Torralba

Qualitative spatial information

slantedslanted

non-rigid moving objectobject

rigid moving

vertical

rigid movingrigid moving object

horizontal slide credit: Fei-Fei, Fergus & Torralba

rigid moving object

Why study computer vision?Why study computer vision?• Vision is useful: Images and video are everywhere!Vision is useful: Images and video are everywhere!

Personal photo albums Movies, news, sports

Surveillance and security Medical and scientific images

Why study computer vision?Why study computer vision?

• Vision is useful• Vision is interesting• Vision is difficult

– Half of primate cerebral cortex is devoted to visual processing– Achieving human-level visual perception is probably “AI-complete”Achieving human level visual perception is probably AI complete

Why is computer vision difficult?Why is computer vision difficult?

Challenges: viewpoint variation

Michelangelo 1475-1564 slide credit: Fei-Fei, Fergus & Torralba

Challenges: illumination

image credit: J. Koenderink

Challenges: scale

slide credit: Fei-Fei, Fergus & Torralba

Challenges: deformation

Xu, Beihong 1943

slide credit: Fei-Fei, Fergus & Torralba

Challenges: occlusion

Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba

Challenges: background clutter

Challenges: Motion

Challenges: object intra-class variationvariation

slide credit: Fei-Fei, Fergus & Torralba

Challenges: local ambiguity

slide credit: Fei-Fei, Fergus & Torralba

Challenges: local ambiguity

Source: Rob Fergus and Antonio Torralba

Challenges: local ambiguity

Source: Rob Fergus and Antonio Torralba

Challenges or opportunities?Challenges or opportunities?

I f i b t th l l th t t f• Images are confusing, but they also reveal the structure of the world through numerous cues

• Our job is to interpret the cues!Our job is to interpret the cues!

Image source: J. Koenderink

Depth cues: Linear perspectiveDepth cues: Linear perspective

Depth cues: Aerial perspectiveDepth cues: Aerial perspective

Depth ordering cues: OcclusionDepth ordering cues: Occlusion

Source: J. Koenderink

Shape cues: Texture gradientShape cues: Texture gradient

Shape and lighting cues: ShadingShape and lighting cues: Shading

Source: J. Koenderink

Position and lighting cues: Cast shadowsPosition and lighting cues: Cast shadows

Source: J. Koenderink

Grouping cues: Similarity (color, texture,Grouping cues: Similarity (color, texture,proximity)proximity)p y)p y)

Grouping cues: “Common fate”Grouping cues: “Common fate”

Image credit: Arthus-Bertrand (via F. Durand)

Inherent ambiguity of the problemInherent ambiguity of the problem

M diff t 3D ld h i i t• Many different 3D scenes could have given rise to a particular 2D picture

Inherent ambiguity of the problemInherent ambiguity of the problem

M diff t 3D ld h i i t• Many different 3D scenes could have given rise to a particular 2D picture

• Possible solutions– Bring in more constraints (more images)

Use prior knowledge about the structure of the world– Use prior knowledge about the structure of the world

• Need a combination of geometric and statistical methods

Connections to other disciplinesConnections to other disciplines

Artificial Intelligence

Machine LearningRobotics

Computer Vision

Cognitive scienceNeuroscience

Computer Graphics

Image Processing

Neuroscience

Origins of computer visionOrigins of computer vision

L. G. Roberts, Machine Perception of Three Dimensional Solids,Ph.D. thesis, MIT Department of pElectrical Engineering, 1963.

Successes of computer vision to date

Optical character recognition (OCR)

Digit recognition License plate readersg gyann.lecun.com

License plate readershttp://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Sudoku grabberhttp://sudokugrab.blogspot.com/

Source: S. Seitz, N. SnavelyAutomatic check processing

Biometrics

Fingerprint scanners on many new laptops

Face recognition systems now beginning to appear more widely

htt // ibl i i /many new laptops, other devices

http://www.sensiblevision.com/

Source: S. Seitz

Biometrics

How the Afghan Girl was Identified by Her Iris Patterns

Source: S. Seitz

Mobile visual search: Google Goggles

Face detection

Many new digital cameras now detect faces• Canon, Sony, Fuji, …

Source: S. Seitz

Smile detection

Sony Cyber-shot® T70 Digital Still Camera Source: S. Seitz

Face recognition: Apple iPhoto software

http://www.apple.com/ilife/iphoto/

Automotive safety

Mobileye: Vision systems in high-end BMW, GM, Volvo models • Pedestrian collision warning• Forward collision warning• Forward collision warning• Lane departure warning• Headway monitoring and warning Source: A. Shashua, S. Seitz

Vision-based interaction: Xbox Kinect

http://electronics.howstuffworks.com/microsoft-kinect.htmhttp://blogs.howstuffworks.com/2010/11/05/how-microsoft-

http://www.xbox.com/en-US/Live/EngineeringBlog/122910-HowYouBecometheController

kinect-works-an-amazing-use-of-infrared-light/

http://www.ismashphone.com/2010/12/kinect-hacks-more-interesting-than-the-devices-original-intention.html

Special effects: shape and motion capture

Source: S. Seitz

3D visualization: Microsoft Photosynth

http://photosynth.net Source: S. Seitz

Vision for robotics, space exploration

NASA'S Mars Exploration Rover Spirit captured this westward view from atop

Vision systems (JPL) used for several tasks

a low plateau where Spirit spent the closing months of 2007.

y ( )• Panorama stitching• 3D terrain modeling

Obstacle detection position tracking• Obstacle detection, position tracking• For more, read “Computer Vision on Mars” by Matthies et al.

Source: S. Seitz

The computer vision industryThe computer vision industry

A li t f i h• A list of companies here:

http://www.cs.ubc.ca/spider/lowe/vision.htmlp p

Basic InfoBasic Info• Instructor: Svetlana Lazebnik (lazebnik@cs unc edu)• Instructor: Svetlana Lazebnik ([email protected])• Office hours: By appointment, FB 244• Class webpage: http://www.cs.unc.edu/~lazebnik/spring11Class webpage: http://www.cs.unc.edu/ lazebnik/spring11

• Textbooks (suggested): F h & P C t Vi i A M d A hForsyth & Ponce, Computer Vision: A Modern ApproachRichard Szeliski, Computer Vision: Algorithms and Applications (available online)pp ( )

Course requirementsCourse requirements• Philosophy: computer vision is best experienced hands-onPhilosophy: computer vision is best experienced hands on

• Programming assignments: 50%– About four assignments– Expect the first one in the next couple of classes– Brush up on your MATLAB skills (see web page for tutorial)

• Final assignment: 30% – Recognition competitionRecognition competition– Winner gets a prize!

• Participation: 20%• Participation: 20% – Come to class regularly– Ask questions

A ti– Answer questions

Collaboration policyCollaboration policy

F l f t di i t ith h th b t di• Feel free to discuss assignments with each other, but coding must be done individually

• Feel free to incorporate code or tips you find on the Web, provided this doesn’t make the assignment trivial and you explicitly acknowledge your sourcesexplicitly acknowledge your sources

• Remember: I can Google too (and I have the copies of g ( peverybody’s assignments from the last three years this class was offered)

Course overviewCourse overview

I E l i i I f ti d iI. Early vision: Image formation and processingII. Mid-level vision: Grouping and fittingIII Multi view geometryIII. Multi-view geometryIV. RecognitionV. Advanced topicsd a ced top cs

I. Early visionI. Early vision

B i i f ti d i• Basic image formation and processing

* =

Cameras and sensorsLight and color

Linear filteringEdge detection

Light and color

Feature extraction: corner and blob detection

II. “MidII. “Mid--level vision”level vision”

Fitti d i• Fitting and grouping

Alignment

Fitting: Least squaresHough transformHough transform

RANSAC

III. MultiIII. Multi--view geometryview geometry

Stereo Epipolar geometry

Tomasi & Kanade (1993)

Projective structure from motionAffine structure from motion

IV. RecognitionIV. Recognition

Patch description and matching Clustering and visual vocabularies

Bag-of-features models ClassificationBag of features models

Sources: D. Lowe, L. Fei-Fei

V. Advanced TopicsV. Advanced TopicsTi itti• Time permitting…

Segmentation Face detection

Articulated models Motion and tracking