introduction to computer vision - university of...

Introduction to Computer VisionRODNEY DOCKTER, PH.D.

MARCH 2018

Rodney Dockter (me)• Ph.D. in Mechanical Engineering from the University of Minnesota

• Worked in Dr. Tim Kowalewski’s lab

• Medical robotics and machine learning

• Now working at Danfoss Power Solutions developing autonomous systems in agriculture and heavy machinery

Contact Information•Office: ME 2110

•Email: dockt036@umn.edu

•Office Hours: W, F 1:15pm – 2:00 pm

Book References (optional)• David A. Forsyth and Jean Ponce: Computer vision: a modern approach. Prentice Hall, 2002.

• Peter Corke: Robotics, Vision and Control: Fundamental Algorithms In MATLAB. Springer, 2017

• R.C. Gonzalez and R.E. Woods: Digital Image Processing, Prentice-Hall, 2002

• Richard Hartley and Andrew Zisserman: Multiple View Geometry in Computer Vision. Cambridge2004

• Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer, 2011

Class Organization• All computer vision from here on out!

• ~10 lectures, 3 assignments, 30% of total grade

• “Prerequisites”:• Linear algebra

• Statistics

• Vector Calculus

• Coding!

• Course does not presume prior computer vision experience

• Emphasis on coding!

• Matlab will be required for all homework assignments

Class Organization Cont.• Collaboration Policy:

- You are encouraged to discuss assignments with your peers. However, all written work and coding must be done individually. Individuals found submitting duplicate or substantially similar materials due to inappropriate collaboration may get an “F” in this class and may receive other serious consequences.

- In the real world no one will write your code for you!

• Libraries and Code Found Online:

- None of the Matlab libraries for computer vision may be used in assignments. You will writing your own computer vision code from the ground up.

- Students found plagiarizing code found on the internet will fail the course (we know how to use google too)

Course Syllabus (Vision Portion)March 30 F 1 - Introduction to computer vision and image processing

April 04 W 2 - Image Formation Camera Fundamentals

April 06 F 3 - Digital Image Representation (Quiz #2 review)

April 11 W 4 - Spatial Domain + Computer Vision in Matlab

April 13 F QUIZ #2 – No Lecture

April 18 W 5 - Image Histograms

April 20 F 6 - Edge Detection

April 25 W 7 - Edge Detection Cont. – Canny and Sobel

April 27 F 8 - Interest Point Detection

May 02 W 9 - Line Detection. Hough Transform.

May 04 F 10 - The future. Deep Learning.

Outline • What is computer vision? What is image processing?

• Basics of Computer Vision Processing

• Interesting examples of computer vision in the wild?

• Brief introduction to Matlab for computer vision use.

Part 1 - What is computer vision?Allowing robots and machines to see the world around them.

What is computer vision?Once robots can see the world, they can interact with it.

What is computer vision?Machines can learn to perform tasks just like humans.

What is computer vision?• Deals with the formation, analyses and interpretation of images

• Integral to robotics and Artificial Intelligence (AI)

• Interdisciplinary subject area:• Robotics

• Autonomous Vehicles

• Medical Applications

• Security

• Practical and Useful

• Challenging and continuously evolving

Difficulties in Computer Vision• Images are ambiguous: dependent on perspective

• Images are affected by many factors:• Sensor model

• Illumination (lighting)

• Shape of object(s)

• Color of object (s)

• Texture of object (s)

• No “universal solution” to vision and sensing

• Many theories and potential algorithms

• Gold standard: Human Vision!

Human Vision:

Difficulties in Computer Vision• It is a many-to-mapping:• Different objects with different material and geometric properties, possibly under different lighting

conditions could lead to identical images.

• The same object, viewed from different perspectives or with different lighting can result in vastlydifferent images (under constrained mapping)

• Information is lost in the transformation from the 3D world to a 2D image.

Difficulties in Computer Vision• It is computationally intensive.

• Usually requires high end processors or graphics cards to achieve real time performance.

• We still do not fully understand the recognition problem.

Nomenclature• Many names for mostly similar things:

• Computer Vision• General – All encompassing term

• Computational Vision• Modeling of biological vision

• Image Processing• Generally refers to static images (frame to frame). Building block for computer vision.

• Machine Vision• Industrial or factory applications for inspection and measurement.

Related Fields• Pattern Recognition

• Machine Learning

• Robotics (obviously)

• Medical Imaging

• Computer Graphics

• Human – Robot Interaction

Why study computer vision• Images and videos are everywhere

• Mobile phones, cheap cameras, real time streaming

• New applications all the time

• Face recognition

• 3D representations from pictures

• Automatic surveillance

• Driverless vehicles

• Shipping and warehouse management

• Various deep scientific mysteries• Human vision system etc

Brief History of Computer Vision• B.C. (Before Computers)

• Philosophy

• Optics

• Physics

• Neurophysiology

• Early Cameras (Late 1800s)

• Earliest way to represent the real world in a consistent manner

• Precursor to video and digital representations of the world around us

Brief History of Computer Vision• Early Computer Vision Systems (1960’s)

• Minsky at MIT• Attempted to solve vision in a summer (lol)

• Empirical approaches• Ad hoc, “bag of tricks”

• Image processing plus+

• Solutions tailored to each problem.

• Simplified worlds• “Blocks worlds” then generalize

• Known perspectives, known illumination

Brief History of Computer Vision• 1980’s improved computational power + robotics innovations

• 1990’s personal computing brings computer vision possibilities to researchers everywhere.

• 2000’s OpenCV, ROS, and Matlab provide generalized computer vision computing abilities toresearchers

• 2010’s – Computer vision comes to the masses:• Microsoft Kinect

• Leap Motion

• Google Tango

• Augmented Reality

• iPhone Face ID

Progress in Computer Vision• First Generation: Military/ Early Research• Few systems, custom built. Cost: $1 Million+

• Users have PhDs

• Slow (1 hour per frame)

• Second Generation: Industrial / Medical• Numerous systems (1000+). Cost: $10,000+

• Users have bachelor degrees

• Specialized hardware

• Third Generation: Consumer• 100000 + systems worldwide. Cost: $100

• Users have little or no training

• Raspberry + webcam = $50 = limitless potential

The Future• Software:• 1 solution for all vision problems

• Provide unexperienced users with an easy way to detect/track objects.

• Does it exist?

• Convolutional Neural Networks?

• Tensorflow?

• Hardware:• Cheap cameras (cell phone)

• Depth cameras (Kinect, stereo cameras)

• Cheap, powerful, mobile processing (getting there)

Part 2 – Computer Vision Processing• Processing Levels:• Low Level

• Mid Level

• High Level

• Image Formulation

Three Processing Levels – Low Level• Low Level Processing:

• Standard procedures to improve image quality and format

• No “intelligence”

Three Processing Levels - Mid Level• Intermediate Processing:

• Extract features and characterize components

• Some “intelligence”

Three Processing Levels - Mid Level• Representing small patches in an image• Want to establish correspondence between points or regions in different

images, so we need to mathematically describe the neighborhood around apoint

• Sharp changes are important features! (Known as “edges”)

• Representing texture by giving some statistics of the different types of patchespresent in a region

• i.e. Tigers have lots of stripes, few spots

• Leapords have lots of spots, no stripes

• How can we mathematically describe this?

Three Processing Levels - Mid Level• Filter Outputs• Filters form a dot-product between a pattern and an image, while shifting the pattern

across the image

• Strong Response -> image region matches a pattern

• e.g. derivatives measured by filtering with a kernel that looks like a big derivative (bright line next to a dark line would yield a large derivative)

Three Processing Levels - Mid Level• Many objects are distinguished by their texture• Tigers, cheetahs, grass, trees…

• We represent texture with statistics from filter outputs• For tigers, a stripes filter at a coarse scale responds strongly

• For cheetahs, a spot filter with a coarse scale responds strongly

• For grass, long narrow lines

• For the leaves of a trees, extended spots

• Objects with different textures can be segmented

• The variation in textures in a cue to a shape

Three Processing Levels - Mid Level• Geometry of multiple views (images)• Where might an object appear in camera #2 (or #3,…) given its position in

camera 1

• Stereopsis• What we know about the world from having 2 eyes (cameras)

• Depth and 3D information

• Structure from motion• What we know about the world from having many eyes (a single eye, moving

quickly)

Three Processing Levels - Mid Level• Finding coherent structures in an image to break it into smaller units• Segmentation:

• Breaking images and videos into useful pieces

• e.g. finding image components that are coherent in appearance

• e.g. separating image foreground from background

• Tracking:

• Keeping track of a moving object through a sequence of views

Three Processing Levels – High Level• High-Level Processing:

• Recognizing patterns, comparing to known models

• High “intelligence” (ie machine learning)

Three Processing Levels - High Level• Relationship between object geometry and image geometry• Model Based Vision

• Find the position and orientation of known objects (e.g. squares, circles)

• Smooth surfaces and outlines

• How the outline of a curved object is formed and what it looks like (contours)

• Aspect graphs:

• How the outline of a curved object changes as you view it from different directions

• Range data

Three Processing Levels - High Level• Using classifiers and probability to match and recognize objects• Templates and classifiers

• How to find objects that look the same from view to view with a classifier

• Relations

• Break up objects into big, simple parts, find the parts with a classifier, thenidentify relationships between parts to find the object

• Geometric templates from spatial relations

• Extend this trick so that templates are formed from relations betweensmaller parts

• Bordering on machine learning…

Image Formulation

Simplistic View

The Physics of Imaging• How images are formed:

• Cameras

• How a camera creates an image

• How to tell where the camera was

• Light

• How to measure light

• What light does at surfaces

• How the brightness values we see in cameras are determined

• Color

• The underlying mechanisms of color

• How to describe and measure color

The Physics of Imaging

introduction to computer vision - university of...

Documents

collaborative robot technical specification iso/ts...

matlab for image...

homogeneous transformation matrices example: puma...

static force analysis: another role for the...

lecture9 hough transform and thresholding...

lecture 2: image formation and cameras -...

introduction to...

lecture 8: interest point detection - university of...

#1 lecture 3: digital image representation and color...

collaborative robot technical specification iso/ts 15066...

lecture 7: most common edge...

lecture 8: interest point detection - university of...

lecture 8: interest point detection - university of...

lecture 4: spatial domain...

(incorporates material from many sources) department of...

#1 lecture 1: computer vision - university of...

lecture 1 - university of...

strategic planning - cogic.org · strategic planning: a...

robots in the field, part 5 - university of...

homogeneous transformation matrices example: puma...