
Posted on 24-Feb-2021


Computer Vision

EE837, CS867, CE803

Introduction

Lecture 01 Computer Vision

Prerequisites

• Basic linear algebra, probability and calculus: required

• Basic data structures and programming knowledge: required

• Working knowledge of MATLAB: required

• Knowledge and understanding of basic image processing: preferable

Text and Reading

• Class slides, research papers, tutorials and supplemental material

• Linda G. Shapiro and George Stockman, “Computer Vision”, Upper Saddle River, NJ: Prentice Hall, 2001.

• David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”, 2nd edition, Prentice Hall, Inc., 2003.

• Richard Szeliski, “Computer Vision: Algorithms and Applications”, Springer, 2011. Made available online by the author: http://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf

Extra

• List of CV books: http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm

Broader course topics

• Camera geometry and basic transformations

• Camera calibration and camera-parameter estimation

• Sources, shadows, shading and shape from shading

• Feature extraction

• Texture synthesis

• Template matching and image registration

• Segmentation

• Vision-based tracking

• Multiple-view geometry

Grading policy

Homework (mostly programming assignments): 15%

Midterm/hourly: 15%

Surprise quizzes/attendance: 10%

Final project: 30%

Final exam: 30%

• Homework/programming assignments:
  – Reports should be typewritten
  – Code and program output are required

• Final project:
  – Brainstorm on project ideas
  – Project highlights: 10 minutes per group
  – Individual, or a group of at most two with individual roles clearly defined
  – Typewritten report of up to 10 pages in CVPR format, with additional pages of commented code as an appendix
  – Project presentation

• Late policy:
  – No credit for late submissions


• Plagiarism is strictly prohibited

• Cite the source

• Negative marking will be applied wherever plagiarism is found


Material citations

• Dr George Stockman, Professor Emeritus, Michigan State University

• Dr Mubarak Shah, Professor, University of Central Florida

• The Robotics Institute, Carnegie Mellon University

Any queries?

By appointment only:

Cdr Dr Hammad

PG 111

Preferably Tuesday and Wednesday, 11 AM to noon

Email: cvnustpnec@gmail.com

Let's start!

What is an image?

What we see

What a computer sees
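To make "what a computer sees" concrete: a grayscale image is just a grid of numbers, one intensity per pixel. The sketch below is in Python rather than MATLAB, and the pixel values and the helper name `rgb_to_gray` are made up for illustration; the gray conversion assumes the common ITU-R BT.601 luma weights.

```python
# A tiny 3x4 grayscale image: each entry is one pixel's intensity in [0, 255].
# The values are made up purely for illustration.
image = [
    [0, 50, 100, 150],
    [30, 80, 130, 180],
    [60, 110, 160, 210],
]

height, width = len(image), len(image[0])
print(height, width)   # 3 4
print(image[1][2])     # intensity at row 1, column 2 (0-based): 130


def rgb_to_gray(r, g, b):
    """Convert one RGB pixel to a gray level using ITU-R BT.601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# A color image is three such grids (R, G, B); gray is a weighted sum per pixel.
print(rgb_to_gray(255, 0, 0))  # 76
```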


Where is the Sun?

Image Processing

• Fourier transform

• Sampling

• Convolution

• Image enhancement

• Feature detection
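Convolution is the workhorse among the image-processing operations listed above. A minimal pure-Python sketch of "valid" 2-D convolution (no padding) follows; the function name and the example kernels are illustrative choices, not taken from the slides.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution of a grayscale image with a small kernel.

    The kernel is flipped in both directions and slid over the image; only
    positions where it fits entirely inside the image are kept, so the output
    has size (H - kh + 1) x (W - kw + 1).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Flipping the kernel is what distinguishes convolution from correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for i in range(h - kh + 1):
        out_row = []
        for j in range(w - kw + 1):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += flipped[u][v] * image[i + u][j + v]
            out_row.append(acc)
        out.append(out_row)
    return out


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3x3 box (sum) filter, a simple smoother
print(convolve2d(img, box))  # [[54, 63]]
```

Dividing the box kernel by 9 would turn the sum into a local average, which is the usual blurring filter.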

What is Computer Vision?

• Inverse Optics

• Intelligent interpretation of Imagery

• Building a visual cortex (the part of the cerebral cortex responsible for processing visual information)

• No matter what your definition is, vision is complex, but it is fun!

Difference between CV and IP

• Image processing processes the output of sensors. Computer vision relates the output of the sensors to the real world.

• Image processing: the output is a transformed image. Computer vision: the output is usually a decision.

• Image processing: signal processing. Computer vision: artificial intelligence.

• Does defect detection or automatic driving relate to image processing or to computer vision? What about enhancing an image?

Components of a Computer Vision System

• Lighting

• Scene

• Camera

• Computer

• Scene interpretation

Image acquisition

Video clip

Sequence of images

16 images in succession that show motion

Shape from shading

• Shading can deceive the human visual system

• It changes the perceived 3D shape

• Gradual variation of the shading gives 3D information

(1,0,1) (-1,1,1) (-1,-1,1)
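Why shading carries 3D information can be sketched with a Lambertian reflectance model (an assumption; the slides do not name a model), where brightness is albedo times max(0, n · s) for unit surface normal n and unit light direction s. We read the three triples above as light-source directions, which is our interpretation of the figure. Function names are illustrative.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambertian(normal, light, albedo=1.0):
    """Brightness of a Lambertian patch: albedo * max(0, n . s).

    `normal` is the surface normal and `light` the direction towards the
    light; both are normalized here. A negative dot product means the patch
    faces away from the light (attached shadow), so the brightness is 0.
    """
    n = normalize(normal)
    s = normalize(light)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, s)))


# Two patches of one surface, one facing the viewer (+z) and one tilted
# towards +x, under the three directions listed on the slide.
for light in [(1, 0, 1), (-1, 1, 1), (-1, -1, 1)]:
    front = lambertian((0, 0, 1), light)
    tilted = lambertian((1, 0, 1), light)
    print(light, round(front, 3), round(tilted, 3))
```

Because brightness depends on the normal, a smooth change in shading implies a smooth change in surface orientation; shape from shading inverts this relationship.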


3D from Shading

Shape from Shading

Shape from texture

• The same shape (a circle) repeated forms a texture

• Circles appear as ellipses where the surface is slanted away from the viewer

• This foreshortening gives a 3D cue

• Texture can therefore be used to recover 3D shape
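The circle-to-ellipse cue can be turned into a number. Assuming orthographic projection (an assumption; real cameras add perspective effects), a circle on a plane slanted by angle σ images as an ellipse whose minor-to-major axis ratio is cos σ. The function name below is illustrative.

```python
import math


def slant_from_ellipse(major_axis, minor_axis):
    """Estimate surface slant (degrees) from the image of one circular texel.

    Assumes orthographic projection: a circle on a plane slanted by sigma
    appears as an ellipse with minor/major = cos(sigma).
    """
    if not 0 < minor_axis <= major_axis:
        raise ValueError("need 0 < minor_axis <= major_axis")
    return math.degrees(math.acos(minor_axis / major_axis))


print(slant_from_ellipse(20, 20))  # circle seen head-on: slant 0
print(slant_from_ellipse(20, 10))  # axis ratio 0.5: slant of about 60 degrees
```

A single ellipse cannot tell a slant of +σ from −σ; combining many texels across the surface helps resolve this.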

Shape from motion

• From the dots alone, we cannot tell what the object is

• Humans, however, can understand it once the dots move: we have this capability to understand motion


Optical flow

• Flow direction and magnitude are visualized with a color wheel

• Computing pixel-wise motion

Sequence Raw optical flow

Optical Flow
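The slides do not name an optical flow method. One classic choice is Lucas-Kanade, sketched below for a single window: brightness constancy gives Ix·u + Iy·v + It = 0 at every pixel, and the flow (u, v) is the least-squares solution of those equations. This is a swapped-in illustration, and the function name is hypothetical.

```python
def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow (u, v) for one window, Lucas-Kanade style.

    Ix, Iy, It are equal-length lists of the spatial and temporal image
    derivatives sampled at the pixels of the window. Each pixel contributes
    Ix*u + Iy*v + It = 0; we solve the resulting 2x2 normal equations.
    """
    sxx = sum(a * a for a in Ix)
    syy = sum(b * b for b in Iy)
    sxy = sum(a * b for a, b in zip(Ix, Iy))
    sxt = sum(a * t for a, t in zip(Ix, It))
    syt = sum(b * t for b, t in zip(Iy, It))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Aperture problem: all gradients point the same way (or vanish).
        raise ValueError("window is not textured enough to recover 2-D flow")
    # Cramer's rule for [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


# Synthetic check: build It from a known motion and recover it.
Ix = [1, 0, 1, 2]
Iy = [0, 1, 1, -1]
true_u, true_v = 0.5, -0.25
It = [-(a * true_u + b * true_v) for a, b in zip(Ix, Iy)]
print(lucas_kanade_window(Ix, Iy, It))
```

Repeating this for a window around every pixel yields a dense, pixel-wise flow field like the one visualized with the color wheel.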

Microsoft photosynth

• Panorama stitching

• Can capture scenes in high resolution and in full 3D

• For anyone with a DSLR (digital single-lens reflex) or a point-and-shoot camera

• https://photosynth.net/preview/about/

Video clip and mosaic

• Stitching images together
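Stitching starts with registration: finding how two overlapping images line up. The minimal sketch below assumes the only difference between the two images is a pure horizontal shift (real panoramas need rotation, perspective and exposure handling), finds the shift that minimizes the mean squared difference over the overlap, and then pastes the images together. Function names are illustrative.

```python
def best_horizontal_offset(left, right, min_overlap=2):
    """Column of `left` at which `right` starts, chosen by minimizing the
    mean squared difference over the overlapping columns.

    Assumes both images have the same height and differ only by a pure
    horizontal translation.
    """
    h, wl, wr = len(left), len(left[0]), len(right[0])
    best_offset, best_cost = None, float("inf")
    for offset in range(wl - min_overlap + 1):
        overlap = min(wl - offset, wr)
        cost = 0.0
        for y in range(h):
            for x in range(overlap):
                d = left[y][offset + x] - right[y][x]
                cost += d * d
        cost /= h * overlap
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset


def stitch(left, right, offset):
    """Paste `right` onto `left` from column `offset`; assumes `right`
    extends past the end of `left` (right wins in the overlap)."""
    return [lrow[:offset] + rrow for lrow, rrow in zip(left, right)]


scene = [[1, 2, 3, 4, 5, 6, 7],
         [7, 6, 5, 4, 3, 2, 1]]
left = [row[:5] for row in scene]   # columns 0..4
right = [row[3:] for row in scene]  # columns 3..6
offset = best_horizontal_offset(left, right)
print(offset, stitch(left, right, offset))
```

The `min_overlap` guard stops the search from favoring a tiny overlap that matches by chance.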

Applications of Computer Vision

• Face Recognition

• Object Recognition

• Video Surveillance and Monitoring

• Object detection, tracking and behavior analysis

• Remote Sensing: UAVs

• Robotics

• Computer Graphics

• And more

Face Recognition

• Principal Component Analysis (PCA)

• Fisher Linear Discriminant (FLD)
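In PCA-based face recognition, each face image is flattened into a vector and the principal components of the training set give a compact feature space. The pure-Python sketch below finds only the leading component by power iteration on the sample covariance; it uses tiny 2-D vectors in place of real face images, the names are illustrative, and a practical system would use an SVD routine from a numerical library.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def covariance(vectors, mean):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(vectors), len(mean)
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_principal_component(vectors, iterations=200):
    """Leading eigenvector of the sample covariance, via power iteration.

    The all-ones starting vector fails if it happens to be orthogonal to
    the leading eigenvector; that case is ignored in this sketch.
    """
    mean = mean_vector(vectors)
    cov = covariance(vectors, mean)
    d = len(mean)
    b = [1.0] * d
    for _ in range(iterations):
        nb = [sum(cov[i][j] * b[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nb) ** 0.5
        b = [x / norm for x in nb]
    return mean, b


def project(vector, mean, component):
    """Coordinate of `vector` along `component`: its 1-D PCA feature."""
    return sum((x - m) * c for x, m, c in zip(vector, mean, component))


# Toy "faces": points lying along the direction (1, 2).
faces = [[0, 0], [1, 2], [2, 4]]
mean, pc = top_principal_component(faces)
print(pc, project([2, 4], mean, pc))
```

Recognition then compares the projected features of a new face with those of known faces, for example by nearest neighbor.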

Face recognition

Facial expression

Surprised Smiling

Detecting driver alertness

Human detection

• Left: UAV image

• Bounding boxes mark the detected humans

• We will learn basic techniques for tracking these moving objects
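One simple way to track detected objects from frame to frame is to match bounding boxes by their overlap, measured as intersection over union (IoU). This greedy IoU association is a swapped-in illustration, not necessarily a technique taught in this course; boxes are assumed to be (x1, y1, x2, y2) tuples and the threshold is a hypothetical value.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def associate(prev_boxes, new_boxes, threshold=0.3):
    """Greedy matching: each previous box takes the unused new box with
    the highest IoU above `threshold` (a hypothetical, hand-tuned value).

    Returns a dict {index in prev_boxes: index in new_boxes}.
    """
    matches, used = {}, set()
    for i, p in enumerate(prev_boxes):
        best_j, best = None, threshold
        for j, n in enumerate(new_boxes):
            if j in used:
                continue
            score = iou(p, n)
            if score > best:
                best_j, best = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches


prev = [(0, 0, 10, 10), (20, 20, 30, 30)]
new = [(21, 21, 31, 31), (1, 1, 11, 11)]  # detections arrive in a different order
print(associate(prev, new))
```

Unmatched boxes can be treated as objects entering or leaving the scene; occlusions require more than overlap alone.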

Video surveillance and monitoring

• Automated surveillance systems – Detection and tracking

Object detection, object tracking, object classification

Activity recognition

Airport surveillance

Aerial imagery - UAVs

• Drones: military use

Many prefer to brand the technology as "Unmanned Aerial Systems" (UAS) rather than "drones."

• Aerial surveying of crops

• Acrobatic aerial footage in filmmaking

• Search and rescue operations

• Inspecting power lines and pipelines

• Counting wildlife

• Delivering medical supplies to inaccessible regions

Aerial imagery

Object tracking

Kernel tracking + blob tracking + occlusion handling

Motion detection

Frame differencing + background modeling + object segmentation

Camera motion compensation

Feature-based + gradient-based

Event detection and tracking
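The motion-detection stage above combines frame differencing with background modeling. A minimal sketch follows: a running-average background and a thresholded absolute difference. The update rate `alpha` and the threshold are hypothetical values, and real systems add noise filtering and segmentation of the resulting mask.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(reference, frame, threshold=25):
    """1 where |frame - reference| exceeds the threshold, else 0.

    With the previous frame as `reference` this is frame differencing;
    with the background model as `reference` it is background subtraction.
    """
    return [[1 if abs(f - r) > threshold else 0 for r, f in zip(rrow, frow)]
            for rrow, frow in zip(reference, frame)]


background = [[10, 10], [10, 10]]
frame = [[10, 200], [12, 10]]  # one bright moving pixel, one small noise change
print(foreground_mask(background, frame))  # [[0, 1], [0, 0]]
background = update_background(background, frame, alpha=0.5)
print(background)
```

A small `alpha` makes the background adapt slowly, so briefly moving objects are not absorbed into it.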

Aerial imagery – Registration results

Aerial imagery – Detection results

Aerial imagery – Tracking results

Wide area surveillance


Tracking results

Unmanned Ground Vehicle

• Falls under robot vision

• Google self-driving car

• The system combines information from Google Street View with artificial intelligence software that fuses input from:

• Video cameras inside the car, to identify pedestrians and moving obstacles

• A LIDAR sensor on top of the vehicle, to build a 3D map

• Radar sensors on the front of the vehicle, to find the position of distant objects

• A position sensor attached to one of the rear wheels, to locate the car's position on the map

Unmanned Ground Vehicle

• Defense Advanced Research Projects Agency (DARPA) Urban Challenge

Human activity recognition

• Involves:
  – Events
  – Actions
  – Activities

• Different datasets available for analysis

Human activity recognition - datasets

• Weizmann action dataset
  – 10 actions
  – 9 actors per action

• KTH dataset
  – 6 categories
  – 25 actors
  – 4 instances
  – 600 clips
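Reading the KTH numbers as every actor performing every category in every instance (our interpretation of the slide), the clip count is consistent:

```latex
6 \text{ categories} \times 25 \text{ actors} \times 4 \text{ instances} = 600 \text{ clips}
```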

Human activity recognition - datasets

• UCF Sports dataset
  – 9 actions
  – 142 videos

• IXMAS multi-view dataset

Example actions: bench swing, dive, swing, run, kick, lift, ride, golf swing, skate

Human activity recognition - datasets

• UCF 50

Stereo

• A regular camera loses 3D information

• Microsoft Kinect sensor: a game changer

• Gives direct 3D information plus an RGB image

• 50,000 different gestures: the challenge is whether you can identify all or some of them

Kinect components: 3D depth sensors, RGB camera, IR LED emitter, array of microphones, tilt motor

Binocular Stereo

• A regular camera loses 3D information
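Binocular stereo recovers the depth a single camera loses. For a rectified stereo pair with pinhole cameras (an assumption), a point's depth is Z = f·B/d, where f is the focal length in pixels, B the baseline between the camera centres and d the disparity in pixels. The function name and example numbers are illustrative.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: distance between the two
    camera centres in metres; disparity_px: x_left - x_right in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


print(depth_from_disparity(700, 0.1, 35))  # about 2.0 metres
print(depth_from_disparity(700, 0.1, 70))  # doubling the disparity halves the depth
```

Because depth is inversely proportional to disparity, depth resolution degrades quickly for distant points, where disparities are only a few pixels.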

Range Scanning and Structured Light

High density crowded scenes

• Tracking required for:

• Crowd management

• Public space design

• Virtual environments

• Visual surveillance

• Intelligent environments

• And more

High density crowded scenes

• Can we do tracking in this kind of crowd?

Examples: political rallies, religious festivals, marathons, high-density moving objects


• Average chip size: 14 x 22 pixels

• 492 frames

• 199 athletes selected for tracking

• 143 athletes successfully tracked
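From the figures above, the fraction of selected athletes that were successfully tracked is roughly 72%:

```latex
\frac{143}{199} \approx 0.719 = 71.9\%
```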


• Average chip size: 14 x 17 pixels

• 453 frames

• 50 athletes selected for tracking


Behaviors in crowded scenes

• Can we identify the behavior of the crowd?

Image localization

Location in terms of Longitude (40.4419) Latitude (40.4419)

Input Output

• Image compared with database of images
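Comparing an image with a database of geotagged images can be sketched as nearest-neighbor retrieval: describe each image by a feature vector and return the location of the closest match. The feature vectors and coordinates below are hypothetical toy values, and how the descriptors are computed is outside this sketch.

```python
import math


def localize(query_feature, database):
    """Return the (latitude, longitude) tag of the database image whose
    feature vector is closest, in Euclidean distance, to the query's.

    `database` is a list of (feature_vector, (latitude, longitude)) pairs;
    the feature vectors stand in for hypothetical global image descriptors.
    """
    best_tag, best_dist = None, math.inf
    for feature, gps in database:
        dist = math.dist(query_feature, feature)
        if dist < best_dist:
            best_tag, best_dist = gps, dist
    return best_tag


# Toy database with made-up features and coordinates.
database = [([1.0, 0.0], (1.0, 2.0)),
            ([0.0, 1.0], (3.0, 4.0))]
print(localize([0.9, 0.1], database))
```

Applying the same lookup to each frame of a sequence gives a series of locations, which is the idea behind geospatial trajectory extraction.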

Geospatial trajectory extraction

• Sequence of images compared with database

Computer graphics

• CV is used in movies such as Harry Potter, Avatar and The Matrix

Layer based image composition

• Segmentation method

• Green chroma key screen

• Green and blue differ the most in hue from skin colors

Virtual studio: http://en.wikipedia.org/wiki/Chroma_key
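Chroma keying is segmentation by color: pixels that look like the green screen are replaced with a new background. The sketch below uses a simple RGB rule (green must exceed both red and blue by a margin) instead of the hue comparison mentioned on the slide; the rule, the threshold and the function name are all assumptions for illustration.

```python
def chroma_key(foreground, background, dominance=40):
    """Replace green-screen pixels of `foreground` with `background`.

    Both images are same-sized lists of rows of (r, g, b) tuples. A pixel
    counts as screen if its green channel exceeds both red and blue by more
    than `dominance` (a hypothetical, hand-tuned threshold).
    """
    out = []
    for frow, brow in zip(foreground, background):
        row = []
        for (r, g, b), bg_pixel in zip(frow, brow):
            is_screen = g - max(r, b) > dominance
            row.append(bg_pixel if is_screen else (r, g, b))
        out.append(row)
    return out


fg = [[(0, 255, 0), (200, 150, 120)]]  # a screen pixel and a skin-like pixel
bg = [[(1, 2, 3), (4, 5, 6)]]
print(chroma_key(fg, bg))
```

Skin tones have red greater than green, so they fail the green-dominance test and are kept, which is why green screens work well for people.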

Layer based video composition

• Segmentation method


Industrial robots vs. low-skilled workers
