Case Study on Machine Vision 2011

CHAPTER 1

INTRODUCTION

1.1 What is machine vision?

Machine vision (MV) is a branch of engineering that uses computer vision in the context of manufacturing. While the scope of MV is broad and a comprehensive definition is difficult to distil, a generally accepted definition of machine vision is "the analysis of images to extract data for controlling a process or activity." Put another way, MV processes are targeted at recognizing the actual objects in an image and assigning properties to those objects--understanding what they mean.

The first step in the MV process is acquisition of an image, typically using cameras, lenses, and lighting that has been designed to provide the differentiation required by subsequent processing. MV software packages then employ various digital image processing techniques to allow the hardware to recognize what it is looking at.

Techniques used in MV include: thresholding (converting an image with gray tones to black and white), segmentation, blob extraction, pattern recognition, barcode reading, optical character recognition, gauging (measuring object dimensions), edge detection, and template matching (finding, matching, and/or counting specific patterns).

Fig 1.1 Machine Vision Camera Used in Robots

CHAPTER 2

THRESHOLDING

2.1 What is thresholding?

Thresholding is the simplest method of image segmentation. From a grayscale image, thresholding can be used to create binary images.

2.2 Method

During the thresholding process, individual pixels in an image are marked as "object" pixels if their value is greater than some threshold value (assuming an object to be brighter than the background) and as "background" pixels otherwise. This convention is known as threshold above. Variants include threshold below, which is the opposite of threshold above; threshold inside, where a pixel is labeled "object" if its value is between two thresholds; and threshold outside, which is the opposite of threshold inside. Typically, an object pixel is given a value of "1" while a background pixel is given a value of "0." Finally, a binary image is created by coloring each pixel white or black, depending on the pixel's label.
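
As a small illustration of the threshold above convention, the binarization step can be written in a few lines of NumPy. This is a minimal sketch, not part of the original case study; the array values, the threshold of 100, and the function name are illustrative assumptions.

import numpy as np

def threshold_above(image, t):
    # Label pixels brighter than t as object (1) and everything else as background (0).
    return (image > t).astype(np.uint8)

# Tiny grayscale example: a few bright "object" pixels on a dark background.
img = np.array([[ 10,  12, 200],
                [ 11, 210, 205],
                [  9,  13,  14]], dtype=np.uint8)

print(threshold_above(img, t=100))
# [[0 0 1]
#  [0 1 1]
#  [0 0 0]]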

2.3 Threshold selection

The key parameter in the thresholding process is the choice of the threshold value (or values, as mentioned earlier). Several different methods for choosing a threshold exist; users can manually choose a threshold value, or a thresholding algorithm can compute a value automatically, which is known as automatic thresholding. A simple method would be to choose the mean or median value, the rationale being that if the object pixels are brighter than the background, they should also be brighter than the average. In a noiseless image with uniform background and object values, the mean or median will work well as the threshold; however, this will generally not be the case. A more sophisticated approach might be to create a histogram of the image pixel intensities and use the valley point as the threshold. The histogram approach assumes that there is some average value for the background and object pixels, but that the actual pixel values have some variation around these average values. However, this may be computationally expensive, and image histograms may not have clearly defined valley points, often making the selection of an accurate threshold difficult. One method that is relatively simple, does not require much specific knowledge of the image, and is robust against image noise, is the following iterative method:

1. An initial threshold T is chosen; this can be done randomly or according to any other method desired.

2. The image is segmented into object and background pixels as described above, creating two sets:

   G1 = {f(m,n) : f(m,n) > T} (object pixels)
   G2 = {f(m,n) : f(m,n) ≤ T} (background pixels)

   (Note: f(m,n) is the value of the pixel located in the mth column, nth row.)

3. The average of each set is computed:

   m1 = average value of G1
   m2 = average value of G2

4. A new threshold is created that is the average of m1 and m2:

   T' = (m1 + m2)/2

5. Go back to step two, now using the new threshold computed in step four; keep repeating until the new threshold matches the one before it (i.e. until convergence has been reached).

This iterative algorithm is a special one-dimensional case of the k-means clustering algorithm, which has been proven to converge at a local minimum—meaning that a different initial threshold may give a different final result.
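
The iterative procedure above can be sketched directly in Python/NumPy. This is a minimal sketch under stated assumptions: the starting threshold (the global mean), the convergence tolerance, and the function name are our own illustrative choices.

import numpy as np

def iterative_threshold(image, tol=0.5):
    # Step 1: choose an initial threshold (here, the mean intensity).
    f = image.astype(np.float64)
    t = f.mean()
    while True:
        g1 = f[f > t]                       # Step 2: object pixels
        g2 = f[f <= t]                      # Step 2: background pixels
        m1 = g1.mean() if g1.size else t    # Step 3: average of each set
        m2 = g2.mean() if g2.size else t
        t_new = (m1 + m2) / 2.0             # Step 4: new threshold
        if abs(t_new - t) < tol:            # Step 5: stop once it no longer changes
            return t_new
        t = t_new

# Synthetic image with two intensity populations around 20 and 200.
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(20, 5, 500), rng.normal(200, 5, 500)]).reshape(20, 50)
print(iterative_threshold(img))   # converges to a value roughly midway between the populations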

2.4 Adaptive thresholding

Thresholding is called adaptive thresholding when a different threshold is used for different regions in the image. This may also be known as local or dynamic thresholding.
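
One simple local rule is to compare each pixel against the mean of its surrounding window. The sketch below assumes that rule, a 15x15 window, and a small offset; scipy is used only for the local-mean filter, and none of these choices are prescribed by the text.

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(image, window=15, offset=5):
    # Object pixels are those exceeding the local mean of a window x window
    # neighbourhood by at least `offset`.
    local_mean = uniform_filter(image.astype(np.float64), size=window)
    return (image > local_mean + offset).astype(np.uint8)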

2.5 Categorizing thresholding methods

Sezgin and Sankur (2004) categorize thresholding methods into the following six groups based on the information the algorithm manipulates:

1. "histogram shape-based methods, where, for example, the peaks, valleys and curvatures of the smoothed histogram are analyzed

2. clustering-based methods, where the gray-level samples are clustered in two parts as background and foreground (object), or alternately are modeled as a mixture of two Gaussians

3. Entropy-based methods result in algorithms that use the entropy of the foreground and background regions, the cross-entropy between the original and binarized image, etc.

4. Object attribute-based methods search a measure of similarity between the gray-level and the binarized images, such as fuzzy shape similarity, edge coincidence, etc.

5. spatial methods [that] use higher-order probability distribution and/or correlation between pixels

6. Local methods adapt the threshold value on each pixel to the local image characteristics."

2.6 Multiband thresholding

Colour images can also be thresholded. One approach is to designate a separate threshold for each of the RGB components of the image and then combine them with an AND operation. This reflects the way the camera works and how the data is stored in the computer, but it does not correspond to the way that people recognize colour. Therefore, the HSL and HSV colour models are more often used. It is also possible to use the CMYK colour model.
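
A per-channel sketch of multiband thresholding in NumPy, combining the three RGB tests with a logical AND as described above; the threshold arguments are arbitrary illustrations, not recommended settings.

import numpy as np

def rgb_threshold(image, t_r, t_g, t_b):
    # image is an H x W x 3 array; threshold each channel and AND the results.
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return ((r > t_r) & (g > t_g) & (b > t_b)).astype(np.uint8)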

CHAPTER 3

SEGMENTATION

3.1 What is segmentation?

In computer vision, segmentation refers to the process of partitioning a digital image into multiple segments (sets of pixels, also known as super pixels). The goal of segmentation is to simplify or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s).[1] When applied to a stack of images, as is typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.

Some of the practical applications of image segmentation are:

1. Medical imaging (locate tumors and other pathologies, measure tissue volumes, computer-guided surgery, diagnosis, treatment planning, study of anatomical structure)
2. Locate objects in satellite images (roads, forests, etc.)
3. Face recognition
4. Fingerprint recognition
5. Traffic control systems
6. Brake light detection
7. Machine vision
8. Agricultural imaging – crop disease detection

3.2 Thresholding

The simplest method of image segmentation is called the thresholding method. This method is based on a clip-level (or a threshold value) to turn a gray-scale image into a binary image. The key to this method is to select the threshold value (or values when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method and Otsu's method (maximum variance); k-means clustering can also be used.

3.3 Histogram-based methods

Histogram-based methods are very efficient when compared to other image segmentation methods because they typically require only one pass through the pixels. In this technique, a histogram is computed from all of the pixels in the image, and the peaks and valleys in the histogram are used to locate the clusters in the image; colour or intensity can be used as the measure.

A refinement of this technique is to recursively apply the histogram-seeking method to clusters in the image in order to divide them into smaller clusters. This is repeated with smaller and smaller clusters until no more clusters are formed.

One disadvantage of the histogram-seeking method is that it may be difficult to identify significant peaks and valleys in the image. In this technique of image classification, distance metrics and integrated region matching are commonly used.

Histogram-based approaches can also be quickly adapted to apply over multiple frames, while maintaining their single-pass efficiency. The histogram can be computed in multiple ways when multiple frames are considered. The same approach that is taken with one frame can be applied to multiple frames, and after the results are merged, peaks and valleys that were previously difficult to identify are more likely to be distinguishable. The histogram can also be applied on a per-pixel basis, where the resulting information is used to determine the most frequent colour for the pixel location. This approach segments based on active objects and a static environment, resulting in a different type of segmentation useful in video tracking.
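
One way to act on the histogram idea is to take the deepest valley between the two largest peaks as the threshold. The sketch below is illustrative only: the bin count, the smoothing width, and the peak-suppression window are assumptions, and real histograms may need a more careful peak search.

import numpy as np

def histogram_valley_threshold(image, bins=256):
    hist, edges = np.histogram(image.ravel(), bins=bins)
    # Light smoothing so one-bin dips are not mistaken for valleys.
    smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")
    # Find the two dominant peaks, then the minimum between them.
    peak1 = int(np.argmax(smooth))
    masked = smooth.copy()
    masked[max(0, peak1 - 10):peak1 + 10] = -1   # suppress the first peak's neighbourhood
    peak2 = int(np.argmax(masked))
    a, b = sorted((peak1, peak2))
    valley = a + int(np.argmin(smooth[a:b + 1]))
    return edges[valley]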

3.5 Edge Detection

Edge detection is a well-developed field on its own within image processing. Region boundaries and edges are closely related, since there is often a sharp adjustment in intensity at the region boundaries. Edge detection techniques have therefore been used as the base of another segmentation technique.

The edges identified by edge detection are often disconnected. To segment an object from an image however, one needs closed region boundaries.

3.6 Connected Component Labeling

Connected component labeling (alternatively connected component analysis, blob extraction, region labeling, blob discovery, or region extraction) is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected component labeling is not to be confused with segmentation.

Connected component labeling is used in computer vision to detect connected regions in binary digital images, although colour images and data with higher dimensionality can also be processed. When integrated into an image recognition system or human-computer interaction interface, connected component labeling can operate on a variety of information. Blob extraction is generally performed on the resulting binary image from a thresholding step. Blobs may be counted, filtered, and tracked.

Overview:

Diagrams: 4-connectivity and 8-connectivity pixel neighbourhoods.

A graph, containing vertices and connecting edges, is constructed from relevant input data. The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbours'. An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbours. Connectivity is determined by the medium; image graphs, for example, can be 4-connected or 8-connected.

Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed.

3.7 Algorithms

The algorithms discussed can be generalized to arbitrary dimensions, albeit with increased time and space complexity.

Two-pass

Relatively simple to implement and understand, the two-pass algorithm iterates through 2-dimensional, binary data. The algorithm makes two passes over the image: one pass to record equivalences and assign temporary labels and the second to replace each temporary label by the label of its equivalence class.

The input data can be modified in situ (which carries the risk of data corruption), or labeling information can be maintained in an additional data structure.

Connectivity checks are carried out by checking the labels of pixels that are North-East, North, North-West and West of the current pixel (assuming 8-connectivity). 4-connectivity uses only the North and West neighbours of the current pixel. The following conditions are checked to determine the value of the label to be assigned to the current pixel (4-connectivity is assumed):

Conditions to check:

1. Does the pixel to the left (West) have the same value?
   1. Yes - We are in the same region. Assign the same label to the current pixel.
   2. No - Check the next condition.

2. Do the pixels to the North and West of the current pixel have the same value but not the same label?
   1. Yes - We know that the North and West pixels belong to the same region and must be merged. Assign the current pixel the minimum of the North and West labels, and record their equivalence relationship.
   2. No - Check the next condition.

3. Does the pixel to the left (West) have a different value and the one to the North the same value?
   1. Yes - Assign the label of the North pixel to the current pixel.
   2. No - Check the next condition.

4. Do the pixel's North and West neighbours have different pixel values?
   1. Yes - Create a new label id and assign it to the current pixel.

The algorithm continues this way, and creates new region labels whenever necessary. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure, which provides excellent performance for keeping track of equivalence relationships.[7] Union-find essentially stores labels which correspond to the same blob in a disjoint-set data structure, making it easy to remember the equivalence of two labels through an interface method such as findSet(l), which returns the minimum label value that is equivalent to the function argument l.

Once the initial labeling and equivalence recording is completed, the second pass merely replaces each pixel label with its equivalent disjoint-set representative element.

Raster Scanning Algorithm for connected region extraction is presented below.

On the first pass:

1. Iterate through each element of the data by column, then by row (Raster Scanning)

2. If the element is not the background:
   1. Get the neighbouring elements of the current element.
   2. If there are no neighbours, uniquely label the current element and continue.
   3. Otherwise, find the neighbour with the smallest label and assign it to the current element.
   4. Store the equivalence between neighbouring labels.

On the second pass:

1. Iterate through each element of the data by column, then by row.
2. If the element is not the background:
   1. Relabel the element with the lowest equivalent label.

Here, the background is a classification, specific to the data, used to distinguish salient elements from the foreground. If the background variable is omitted, then the two-pass algorithm will treat the background as another region.
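
A compact version of the two-pass algorithm in Python, with a small union-find table for the equivalence classes, might look like the following. It is a sketch rather than the report's own implementation: 4-connectivity (North and West neighbours) is assumed, and the function and variable names are ours.

import numpy as np

def two_pass_label(binary):
    # Two-pass connected component labeling of a 2-D binary array (4-connectivity).
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = {}                                   # union-find: label -> parent label

    def find(x):                                  # representative (smallest) equivalent label
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):                              # record that labels a and b are equivalent
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # First pass: assign provisional labels and record equivalences.
    for i in range(rows):
        for j in range(cols):
            if not binary[i, j]:
                continue
            north = labels[i - 1, j] if i > 0 and binary[i - 1, j] else 0
            west = labels[i, j - 1] if j > 0 and binary[i, j - 1] else 0
            neighbours = [n for n in (north, west) if n]
            if not neighbours:
                parent[next_label] = next_label   # new region
                labels[i, j] = next_label
                next_label += 1
            else:
                labels[i, j] = min(neighbours)
                if len(neighbours) == 2:
                    union(north, west)
    # Second pass: replace each label with its equivalence-class representative.
    for i in range(rows):
        for j in range(cols):
            if labels[i, j]:
                labels[i, j] = find(labels[i, j])
    return labels

img = np.array([[1, 1, 0, 0, 1],
                [0, 1, 0, 1, 1],
                [0, 0, 0, 0, 1]])
print(two_pass_label(img))
# [[1 1 0 0 2]
#  [0 1 0 2 2]
#  [0 0 0 0 2]]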

Graphical Example of Two-pass Algorithm

1. The array from which connected regions are to be extracted is given below

2. After the first pass, the following labels are generated. Note that a total of 7 labels are generated in accordance with the conditions highlighted above.

The label equivalence relationships generated are,

Set ID    Equivalent Labels
1         1,2
2         1,2
3         3,4,5,6,7
4         3,4,5,6,7
5         3,4,5,6,7
6         3,4,5,6,7
7         3,4,5,6,7

3. Array generated after the merging of labels is carried out. Here, the label value that was the smallest for a given region "floods" throughout the connected region and gives two distinct labels, and hence two distinct regions.

4. Final result in colour to clearly see two different regions that have been found in the array.

CHAPTER 4

PATTERN RECOGNITION

4.1 What is pattern recognition?

In machine learning, pattern recognition is the assignment of some sort of output value (or label) to a given input value (or instance), according to some specific algorithm. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (for example, determine whether a given email is "spam" or "non-spam"). However, pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part of speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.

Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to do "fuzzy" matching of inputs. This is opposed to pattern matching algorithms, which look for exact matches in the input with pre-existing patterns. A common example of a pattern-matching algorithm is regular expression matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and word processors. In contrast to pattern recognition, pattern matching is generally not considered a type of machine learning, although pattern-matching algorithms (especially with fairly general, carefully tailored patterns) can sometimes succeed in providing similar-quality output to the sort provided by pattern-recognition algorithms.
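
For contrast, exact pattern matching with a regular expression takes only a couple of lines of Python; the pattern and the sample text below are made up purely for illustration.

import re

text = "Order codes: AB-1234, XY-9876, and one malformed code AB12."
print(re.findall(r"[A-Z]{2}-\d{4}", text))   # ['AB-1234', 'XY-9876']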

Pattern recognition is studied in many fields, including psychology, psychiatry, ethology, cognitive science and computer science.

4.2 Overview

Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value. Supervised learning assumes that a set of training data has been provided, consisting of a set of instances that have been properly labelled by hand with the correct output. A learning procedure then generates a model that attempts to meet two sometimes conflicting objectives: Perform as well as possible on the training data, and generalize as well as possible to new data (usually, this means being as simple as possible, for some technical definition of "simple", in accordance with Occam's Razor). Unsupervised learning, on the other hand, assumes training data that has not been hand-labelled, and attempts to find inherent patterns in the data that can then be used to determine the correct output value for new data instances. A combination of the two that has recently been explored is semi-supervised learning, which uses a combination of labelled and unlabeled data (typically a small set of labelled data combined with a large amount of unlabeled data). Note that in cases of unsupervised learning, there may be no training data at all to speak of; in other words, the data to be labelled is the training data.

Note that sometimes different terms are used to describe the corresponding supervised and unsupervised learning procedures for the same type of output. For example, the unsupervised equivalent of classification is normally known as clustering, based on the common perception of the task as involving no training data to speak of, and of grouping the input data into clusters based on some inherent similarity measure (e.g. the distance between instances, considered as vectors in a multi-dimensional vector space), rather than assigning each input instance into one of a set of pre-defined classes. Note also that in some fields, the terminology is different: For example, in community ecology, the term "classification" is used to refer to what is commonly known as "clustering".

The piece of input data for which an output value is generated is formally termed an instance. The instance is formally described by a vector of features, which together constitute a description of all known characteristics of the instance. (These feature vectors can be seen as defining points in an appropriate multidimensional space, and methods for manipulating vectors in vector spaces can be correspondingly applied to them, such as computing the dot product or the angle between two vectors.) Typically, features are either categorical (also known as nominal, i.e. consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g. "large", "medium" or "small"), integer-valued (e.g. a count of the number of occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure). Often, categorical and ordinal data are grouped together; likewise for integer-valued and real-valued data. Furthermore, many algorithms work only in terms of categorical data and require that real-valued or integer-valued data be discretized into groups (e.g. less than 5, between 5 and 10, or greater than 10).

Many common pattern recognition algorithms are probabilistic in nature, in that they use statistical inference to find the best label for a given instance. Unlike other algorithms, which simply output a "best" label, probabilistic algorithms often also output a probability of the instance being described by the given label. In addition, many probabilistic algorithms output a list of the N best labels with associated probabilities, for some value of N, instead of simply a single best label. When the number of possible labels is fairly small (e.g. in the case of classification), N may be set so that the probability of all possible labels is output. Probabilistic algorithms have many advantages over non-probabilistic algorithms:

1. They output a confidence value associated with their choice. (Note that some other algorithms may also output confidence values, but in general, only for probabilistic algorithms is this value mathematically grounded in probability theory. Non-probabilistic confidence values can in general not be given any specific meaning, and can only be used for comparison against other confidence values output by the same algorithm.)

2. Correspondingly, they can abstain when the confidence of choosing any particular output is too low.

3. Because of the probabilities output, probabilistic pattern-recognition algorithms can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of error propagation.

Techniques to transform the raw feature vectors are sometimes used prior to application of the pattern-matching algorithm. For example, feature extraction algorithms attempt to reduce a large-dimensionality feature vector into a smaller-dimensionality vector that is easier to work with and encodes less redundancy, using mathematical techniques such as principal components analysis (PCA). Feature selection algorithms attempt to directly prune out redundant or irrelevant features. The distinction between the two is that the resulting features after feature extraction has taken place are of a different sort than the original features and may not easily be interpretable, while the features left after feature selection are simply a subset of the original features.
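
As a sketch of feature extraction (as opposed to feature selection, which only keeps a subset of the original columns), a PCA projection can be computed from the singular value decomposition of the centred data. The random data and the choice of two components are illustrative assumptions.

import numpy as np

def pca_project(features, k):
    # Project feature vectors (one per row) onto their first k principal components.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # rows of vt = principal directions
    return centered @ vt[:k].T

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 6))      # 100 instances, 6 raw features
print(pca_project(x, k=2).shape)   # (100, 2): each instance reduced to 2 extracted features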

CHAPTER 5

BARCODE

5.1 What is barcode?

A barcode is an optical machine-readable representation of data, which shows data about the object to which it attaches. Originally, barcodes represented data by varying the widths and spacings of parallel lines, and may be referred to as linear or one-dimensional (1D). Later they evolved into rectangles, dots, hexagons and other geometric patterns in two dimensions (2D). Although 2D systems use a variety of symbols, they are generally referred to as barcodes as well. Barcodes were originally scanned by special optical scanners called barcode readers; today, scanners and interpretive software are available on devices including desktop printers and smartphones.

The first use of barcodes was to label railroad cars, but they were not commercially successful until they were used to automate supermarket checkout systems, a task for which they have become almost universal. Their use has spread to many other tasks that are generically referred to as Auto ID Data Capture (AIDC). The very first scanning of the now ubiquitous Universal Product Code (UPC) barcode was on a pack of Wrigley Company chewing gum in June 1974.

5.2 Scanners (barcode readers)

The earliest, and still the cheapest, barcode scanners are built from a fixed light and a single photosensor that is manually "scrubbed" across the barcode.

Barcode scanners can be classified into three categories based on their connection to the computer. The older type is the RS-232 barcode scanner. This type requires special programming for transferring the input data to the application program.

"Keyboard interface scanners" connect to a computer using a PS/2 or AT keyboard–compatible adaptor cable. The barcode's data is sent to the computer as if it had been typed on the keyboard.

Like the keyboard interface scanner, USB scanners are easy to install and do not need custom code for transferring input data to the application program.

CHAPTER 6

EDGE DETECTION

6.1 What is edge detection?

Edge detection is a fundamental tool in image processing and computer vision, particularly in the areas of feature detection and feature extraction, which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.

6.2 Edge properties

The edges extracted from a two-dimensional image of a three-dimensional scene can be classified as either viewpoint dependent or viewpoint independent. A viewpoint independent edge typically reflects inherent properties of the three-dimensional objects, such as surface markings and surface shape. A viewpoint dependent edge may change as the viewpoint changes, and typically reflects the geometry of the scene, such as objects occluding one another.

A typical edge might for instance be the border between a block of red colour and a block of yellow. In contrast a line (as can be extracted by a ridge detector) can be a small number of pixels of a different colour on an otherwise unchanging background. For a line, there may therefore usually be one edge on each side of the line.

6.3 A simple edge model

Although certain literature has considered the detection of ideal step edges, the edges obtained from natural images are usually not at all ideal step edges. Instead they are normally affected by one or several of the following effects:

1. Focal blur caused by a finite depth-of-field and finite point spread function.
2. Penumbral blur caused by shadows created by light sources of non-zero radius.
3. Shading at a smooth object.

A number of researchers have used a Gaussian smoothed step edge (an error function) as the simplest extension of the ideal step edge model for modeling the effects of edge blur in practical applications.[3][5] Thus, a one-dimensional image f which has exactly one edge placed at x = 0 may be modeled as:

f(x) = (I_r - I_l)/2 * ( erf( x / (sqrt(2) * σ) ) + 1 ) + I_l

At the left side of the edge, the intensity is I_l = lim_{x → -∞} f(x), and to the right of the edge it is I_r = lim_{x → +∞} f(x). The scale parameter σ is called the blur scale of the edge.

6.4 Why is edge detection a non-trivial task?

To illustrate why edge detection is not a trivial task, consider the problem of detecting edges in the following one-dimensional signal. Here, we may intuitively say that there should be an edge between the 4th and 5th pixels.

5 7 6 4 152 148 149

If the intensity difference were smaller between the 4th and the 5th pixels and if the intensity differences between the adjacent neighbouring pixels were higher, it would not be as easy to say that there should be an edge in the corresponding region. Moreover, one could argue that this case is one in which there are several edges.

5 7 6 41 113 148 149

Hence, to firmly state a specific threshold on how large the intensity change between two neighbouring pixels must be for us to say that there should be an edge between these pixels is not always simple.[3] Indeed, this is one of the reasons why edge detection may be a non-trivial problem unless the objects in the scene are particularly simple and the illumination conditions can be well controlled.

6.5 Approaches

There are many methods for edge detection, but most of them can be grouped into two categories, search-based and zero-crossing based. The search-based methods detect edges by first computing a measure of edge strength, usually a first-order derivative expression such as the gradient magnitude, and then searching for local directional maxima of the gradient magnitude using a computed estimate of the local orientation of the edge, usually the gradient direction. The zero-crossing based methods search for zero crossings in a second-order derivative expression computed from the image in order to find edges, usually the zero-crossings of the Laplacian or the zero-crossings of a non-linear differential expression. As a pre-processing step to edge detection, a smoothing stage, typically Gaussian smoothing, is almost always applied (see also noise reduction).

The edge detection methods that have been published mainly differ in the types of smoothing filters that are applied and the way the measures of edge strength are computed. As many edge detection methods rely on the computation of image gradients, they also differ in the types of filters used for computing gradient estimates in the x- and y-directions.

6.6 Canny edge detection

John Canny considered the mathematical problem of deriving an optimal smoothing filter given the criteria of detection, localization and minimizing multiple responses to a single edge. He showed that the optimal filter given these assumptions is a sum of four exponential terms. He also showed that this filter can be well approximated by first-order derivatives of Gaussians. Canny also introduced the notion of non-maximum suppression, which means that given the presmoothing filters, edge points are defined as points where the gradient magnitude assumes a local maximum in the gradient direction. Looking for the zero crossing of the 2nd derivative along the gradient direction was first proposed by Haralick. It took less than two decades to find a modern geometric variational meaning for that operator that links it to the Marr-Hildreth (zero crossing of the Laplacian) edge detector. That observation was presented by Ron Kimmel and Alfred Bruckstein.

Although his work was done in the early days of computer vision, the Canny edge detector (including its variations) is still a state-of-the-art edge detector. Unless the preconditions are particularly suitable, it is hard to find an edge detector that performs significantly better than the Canny edge detector.

The Canny-Deriche detector was derived from similar mathematical criteria as the Canny edge detector, although starting from a discrete viewpoint and then leading to a set of recursive filters for image smoothing instead of exponential filters or Gaussian filters.

The differential edge detector described below can be seen as a reformulation of Canny's method from the viewpoint of differential invariants computed from a scale-space representation leading to a number of advantages in terms of both theoretical analysis and sub-pixel implementation.

Other first-order methods

For estimating image gradients from the input image or a smoothed version of it, different gradient operators can be applied. The simplest approach is to use central differences:

L_x(x, y) ≈ ( L(x+1, y) - L(x-1, y) ) / 2
L_y(x, y) ≈ ( L(x, y+1) - L(x, y-1) ) / 2

corresponding to the application of the following filter masks to the image data:

[-1/2  0  +1/2] in the x-direction, and its transpose in the y-direction.

The well-known and earlier Sobel operator is based on the following filters:

Gx = [[-1  0  +1],
      [-2  0  +2],
      [-1  0  +1]]

Gy = [[-1  -2  -1],
      [ 0   0   0],
      [+1  +2  +1]]

Given such estimates of first-order derivatives, the gradient magnitude is then computed as:

|∇L| = sqrt( L_x^2 + L_y^2 )

while the gradient orientation can be estimated as:

θ = atan2( L_y, L_x )

Other first-order difference operators for estimating image gradient have been proposed in the Prewitt operator and Roberts cross.
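
The gradient estimates, magnitude, and orientation described above can be sketched with the Sobel masks and a standard convolution routine; scipy's convolve is used here for brevity, and the function name is our own.

import numpy as np
from scipy.ndimage import convolve

def sobel_gradients(image):
    # Gradient magnitude and orientation estimated with the Sobel masks.
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)   # x-derivative mask
    ky = kx.T                                  # y-derivative mask
    lx = convolve(image.astype(float), kx)
    ly = convolve(image.astype(float), ky)
    magnitude = np.hypot(lx, ly)               # sqrt(Lx^2 + Ly^2)
    orientation = np.arctan2(ly, lx)           # gradient direction in radians
    return magnitude, orientation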

6.7 Thresholding and Linking

Once we have computed a measure of edge strength (typically the gradient magnitude), the next stage is to apply a threshold, to decide whether edges are present or not at an image point. The lower the threshold, the more edges will be detected, and the result will be increasingly susceptible to noise and detecting edges of irrelevant features in the image. Conversely a high threshold may miss subtle edges, or result in fragmented edges.

If the edge thresholding is applied to just the gradient magnitude image, the resulting edges will in general be thick and some type of edge thinning post-processing is necessary. For edges detected with non-maximum suppression however, the edge curves are thin by definition and the edge pixels can be linked into edge polygon by an edge linking (edge tracking) procedure. On a discrete grid, the non-maximum suppression stage can be implemented by estimating the gradient direction using first-order derivatives, then rounding off the gradient direction to multiples of 45 degrees, and finally comparing the values of the gradient magnitude in the estimated gradient direction.

A commonly used approach to handle the problem of appropriate thresholds for thresholding is by using thresholding with hysteresis. This method uses multiple thresholds to find edges. We begin by using the upper threshold to find the start of an edge. Once we have a start point, we then trace the path of the edge through the image pixel by pixel, marking an edge whenever we are above the lower threshold. We stop marking our edge only when the value falls below our lower threshold. This approach makes the assumption that edges are likely to be in continuous curves, and allows us to follow a faint section of an edge we have previously seen, without meaning that every noisy pixel in the image is marked down as an edge. Still, however, we have the problem of choosing appropriate thresholding parameters, and suitable thresholding values may vary over the image.
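
A sketch of hysteresis thresholding on a gradient-magnitude image: strong pixels seed the edges, and weaker pixels are kept only where they are connected to a strong pixel. The connectivity test is done here with scipy's connected-component labeling, and both threshold values are illustrative.

import numpy as np
from scipy.ndimage import label

def hysteresis_threshold(magnitude, low, high):
    # Keep weak edge pixels (>= low) only in regions that also contain a strong pixel (>= high).
    weak = magnitude >= low
    strong = magnitude >= high
    regions, n = label(weak)                  # group weak pixels into connected regions
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(regions[strong])] = True   # regions touched by at least one strong pixel
    keep[0] = False                           # index 0 is the background, never an edge
    return keep[regions]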

6.8 Edge thinning

Edge thinning is a technique used to remove the unwanted spurious points on the edge of an image. This technique is employed after the image has been filtered for noise (using median, Gaussian filter etc.), the edge operator has been applied (like the ones described above) to detect the edges and after the edges have been smoothed using an appropriate threshold value. This removes all the unwanted points and if applied carefully, results in one pixel thick edge elements.

Advantages:

1. Sharp and thin edges lead to greater efficiency in object recognition.
2. If Hough transforms are used to detect lines and ellipses, then thinning could give much better results.
3. If the edge happens to be the boundary of a region, then thinning could easily give the image parameters like perimeter without much algebra.

There are many popular algorithms used to do this, one such is described below:

1) Choose a type of connectivity, like 8, 6 or 4.

2) 8 connectivity is preferred, where all the immediate pixels surrounding a particular pixel are considered.

3) Remove points from North, south, east and west.

4) Do this in multiple passes, i.e. after the north pass, use the same semi processed image in the other passes and so on.

5) Remove a point if:

The point has no neighbours in the North.

The point is not the end of a line.

The point is isolated.

Removing the point will not cause its neighbours to become disconnected in any way.

6) Else keep the point. The number of passes across directions should be chosen according to the level of accuracy desired.

Second-order approaches to edge detection

Some edge-detection operators are instead based upon second-order derivatives of the intensity. This essentially captures the rate of change in the intensity gradient. Thus, in the ideal continuous case, detection of zero-crossings in the second derivative captures local maxima in the gradient.

The early Marr-Hildreth operator is based on the detection of zero-crossings of the Laplacian operator applied to a Gaussian-smoothed image. It can be shown, however, that this operator will also return false edges corresponding to local minima of the gradient magnitude. Moreover, this operator will give poor localization at curved edges. Hence, this operator is today mainly of historical interest.

6.9 Differential edge detection

A more refined second-order edge detection approach which automatically detects edges with sub-pixel accuracy uses the following differential approach of detecting zero-crossings of the second-order directional derivative in the gradient direction:

Following the differential geometric way of expressing the requirement of non-maximum suppression proposed by Lindeberg, let us introduce at every image point a local coordinate system (u, v), with the v-direction parallel to the gradient direction. Assuming that the image has been pre-smoothed by Gaussian smoothing and a scale-space representation L(x, y; t) at scale t has been computed, we can require that the gradient magnitude of the scale-space representation, which is equal to the first-order directional derivative in the v-direction Lv, should have its first-order directional derivative in the v-direction equal to zero:

∂_v(L_v) = L_vv = 0

while the second-order directional derivative in the v-direction of Lv should be negative, i.e.

L_vvv < 0.

Written out as an explicit expression in terms of local partial derivatives Lx, Ly, ..., Lyyy, this edge definition can be expressed as the zero-crossing curves of the differential invariant

L_v^2 L_vv = L_x^2 L_xx + 2 L_x L_y L_xy + L_y^2 L_yy = 0,

that satisfy a sign-condition on the following differential invariant

L_v^3 L_vvv = L_x^3 L_xxx + 3 L_x^2 L_y L_xxy + 3 L_x L_y^2 L_xyy + L_y^3 L_yyy < 0,

where Lx, Ly ... Lyyy denote partial derivatives computed from a scale-space representation L obtained by smoothing the original image with a Gaussian kernel. In this way, the edges will be automatically obtained as continuous curves with sub-pixel accuracy. Hysteresis thresholding can also be applied to these differential and subpixel edge segments.

In practice, first-order derivative approximations can be computed by central differences as described above, while second-order derivatives can be computed from the scale-space representation L according to:

L_xx(x, y) ≈ L(x-1, y) - 2 L(x, y) + L(x+1, y)
L_xy(x, y) ≈ ( L(x-1, y-1) - L(x-1, y+1) - L(x+1, y-1) + L(x+1, y+1) ) / 4
L_yy(x, y) ≈ L(x, y-1) - 2 L(x, y) + L(x, y+1)

corresponding to the following filter masks: [1  -2  1] for L_xx (and its transpose for L_yy), and a 3 x 3 mask with values ±1/4 in its four corners (and zeros elsewhere) for L_xy.

Higher-order derivatives for the third-order sign condition can be obtained in an analogous fashion.

6.10 Phase congruency-based edge detection

A recent development in edge detection techniques takes a frequency domain approach to finding edge locations. Phase congruency (also known as phase coherence) methods attempt to find locations in an image where all sinusoids in the frequency domain are in phase. These locations will generally correspond to the location of a perceived edge, regardless of whether the edge is represented by a large change in intensity in the spatial domain. A key benefit of this technique is that it responds strongly to Mach bands, and avoids false positives typically found around roof edges. A roof edge is a discontinuity in the first-order derivative of a grey-level profile.

CHAPTER 7

TEMPLATE MATCHING

7.1 What is template matching?

Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used in manufacturing as a part of quality control, a way to navigate a mobile robot, or as a way to detect edges in images.

Template matching can be subdivided into two approaches: feature-based and template-based matching. The feature-based approach uses the features of the search and template image, such as edges or corners, as the primary match-measuring metrics to find the best matching location of the template in the source image. The template-based, or global, approach uses the entire template, with generally a sum-comparing metric (using SAD, SSD, cross-correlation, etc.) that determines the best location by testing all or a sample of the viable test locations within the search image that the template image may match up to.

7.2 Feature-based approach

If the template image has strong features, a feature-based approach may be considered; the approach may prove further useful if the match in the search image might be transformed in some fashion. Since this approach does not consider the entirety of the template image, it can be more computationally efficient when working with source images of larger resolution, as the alternative approach, template-based, may require searching potentially large amounts of points in order to determine the best matching location.

7.3 Template-based approach

For templates without strong features, or for when the bulk of the template image constitutes the matching image, a template-based approach may be effective. As aforementioned, since template-based template matching may potentially require sampling of a large number of points, it is possible to reduce the number of sampling points by reducing the resolution of the search and template images by the same factor and performing the operation on the resultant downsized images (multiresolution, or pyramid, image processing), providing a search window of data points within the search image so that the template does not have to search every viable data point, or a combination of both.

7.4 Motion tracking and occlusion handling

In instances where the template may not provide a direct match, it may be useful to implement the use of eigenspaces – templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, colour contrasts, or acceptable matching object "poses". For example, if the user was looking for a face, the eigenspaces may consist of images (templates) of faces in different positions to the camera, in different lighting conditions, or with different expressions.

It is also possible for the matching image to be obscured, or occluded by an object; in these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search image may be a playing card, and in some of the search images, the card is obscured by the fingers of someone holding the card, or by another card on top of it, or any object in front of the camera for that matter. In cases where the object is malleable or poseable, motion also becomes a problem, and problems involving both motion and occlusion become ambiguous. In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision.

7.5 Template-based matching and convolution

A basic method of template matching uses a convolution mask (template), tailored to a specific feature of the search image, which we want to detect. This technique can be easily performed on grey images or edge images. The convolution output will be highest at places where the image structure matches the mask structure, where large image values get multiplied by large mask values.

This method is normally implemented by first picking out a part of the search image to use as a template: We will call the search image S(x, y), where (x, y) represent the coordinates of each pixel in the search image. We will call the template T(xt, yt), where (xt, yt) represent the coordinates of each pixel in the template. We then simply move the center (or the origin) of the template T(xt, yt) over each (x, y) point in the search image and calculate the sum of products between the coefficients in S(x, y) and T(xt, yt) over the whole area spanned by the template. As all possible positions of the template with respect to the search image are considered, the position with the highest score is the best position. This method is sometimes referred to as 'Linear Spatial Filtering' and the template is called a filter mask.

For example, one way to handle translation problems on images, using template matching is to compare the intensities of the pixels, using the SAD (Sum of absolute differences) measure.

A pixel in the search image with coordinates (xs, ys) has intensity Is(xs, ys) and a pixel in the template with coordinates (xt, yt) has intensity It(xt, yt). Thus the absolute difference in the pixel intensities is defined as Diff(xs, ys, xt, yt) = | Is(xs, ys) - It(xt, yt) |.

The mathematical representation of the idea about looping through the pixels in the search image as we translate the origin of the template at every pixel and take the SAD measure is the following:

SAD(x, y) = Σ_{i=0..Trows-1} Σ_{j=0..Tcols-1} Diff(x + i, y + j, i, j)

Srows and Scols denote the rows and the columns of the search image and Trows and Tcols denote the rows and the columns of the template image, respectively. In this method the lowest SAD score gives the estimate for the best position of the template within the search image. The method is simple to implement and understand, but it is one of the slowest methods.
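
A brute-force version of this SAD search mirrors the formula with a double loop; it is meant only as an illustration (the arrays and function name are ours), not as an optimized implementation.

import numpy as np

def sad_match(search, template):
    # Return the top-left (row, col) where the template has the lowest SAD score.
    s_rows, s_cols = search.shape
    t_rows, t_cols = template.shape
    best_pos, best_score = None, np.inf
    for y in range(s_rows - t_rows + 1):
        for x in range(s_cols - t_cols + 1):
            window = search[y:y + t_rows, x:x + t_cols]
            score = np.abs(window.astype(int) - template.astype(int)).sum()   # SAD
            if score < best_score:
                best_pos, best_score = (y, x), score
    return best_pos, best_score

search = np.array([[10, 10, 10, 10],
                   [10, 50, 60, 10],
                   [10, 70, 80, 10],
                   [10, 10, 10, 10]], dtype=np.uint8)
template = np.array([[50, 60],
                     [70, 80]], dtype=np.uint8)
print(sad_match(search, template))   # ((1, 1), 0): exact match at row 1, column 1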

REFERENCES:

1. Introduction to Robotics by Saeed B. Niku.

2. Industrial Robotics by Mikell P. Groover.
