Segmentation slides adopted from Svetlana Lazebnik


Introduction

Segmentation (slides adopted from Svetlana Lazebnik)

Segmentation as clustering
(Figure: image, intensity-based clusters, color-based clusters)
- K-means clustering based on intensity or color is essentially vector quantization of the image attributes
- Clusters don't have to be spatially coherent

I gave each pixel the mean intensity or mean color of its cluster --- this is basically just vector quantizing the image intensities/colors. Notice that there is no requirement that clusters be spatially localized, and they're not.

K-means for segmentation
Pros:
- Very simple method
- Converges to a local minimum of the error function
Cons:
- Memory-intensive
- Need to pick K
- Sensitive to initialization
- Sensitive to outliers
- Only finds spherical clusters
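A minimal sketch of this idea in Python, assuming a hypothetical (H, W, 3) color array `image`; scikit-learn's KMeans is used here purely for illustration:

```python
# K-means color quantization of an image: cluster pixel colors, then give
# each pixel the mean color of its cluster. Cluster assignments need not be
# spatially coherent. `image` and `k` are hypothetical inputs.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, k=5, seed=0):
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)   # one feature vector per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    labels = km.labels_.reshape(h, w)                   # cluster id per pixel
    quantized = km.cluster_centers_[km.labels_].reshape(h, w, c)
    return labels, quantized
```

Appending pixel coordinates to the color features is one common way to make the resulting clusters more spatially coherent.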

Mean shift clustering/segmentation
1. Find features (color, gradients, texture, etc.)
2. Initialize windows at individual feature points
3. Perform mean shift for each window until convergence
4. Merge windows that end up near the same peak or mode
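A sketch of the same procedure with scikit-learn's MeanShift, assuming color-only features from a hypothetical (H, W, 3) array `image`; the bandwidth plays the role of the window size:

```python
# Mean shift segmentation sketch: each pixel's color is a feature point;
# windows converge to modes of the feature density and nearby modes are merged.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def mean_shift_segment(image, quantile=0.1):
    h, w, c = image.shape
    X = image.reshape(-1, c).astype(np.float64)
    # The window size (bandwidth) is the single key parameter.
    bandwidth = estimate_bandwidth(X, quantile=quantile, n_samples=1000)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
    labels = ms.labels_.reshape(h, w)
    return labels, ms.cluster_centers_   # one center (mode) per discovered cluster
```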

Mean shift segmentation results: http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html

Mean shift pros and cons
Pros:
- Does not assume spherical clusters
- Just a single parameter (window size)
- Finds a variable number of modes
- Robust to outliers
Cons:
- Output depends on window size
- Computationally expensive
- Does not scale well with dimension of feature space

Segmentation by graph partitioning
- Break the graph into segments by deleting links that cross between segments
- Easiest to break links that have low affinity: similar pixels should be in the same segments, dissimilar pixels should be in different segments

(Figure: pixels A, B, C as graph nodes with affinity weights w_ij. Source: S. Seitz)

Measuring affinity
Suppose we represent each pixel by a feature vector x and define a distance function appropriate for this feature representation. Then we can convert the distance between two feature vectors into an affinity with the help of a generalized Gaussian kernel:

w(i, j) = exp(-dist(x_i, x_j)^2 / (2 sigma^2))
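A sketch of this affinity computation, assuming an (N, d) array `features` of per-pixel feature vectors and a chosen scale parameter `sigma`; the dense N x N matrix is only practical for small images or for superpixels:

```python
import numpy as np

def gaussian_affinity(features, sigma=1.0):
    # Squared Euclidean distance between every pair of feature vectors.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    # Small distances map to affinities near 1, large distances to near 0.
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```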

Minimum cut
We can do segmentation by finding the minimum cut in a graph. Efficient algorithms exist for doing this.
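As a toy illustration (not the formulation from any particular slide), here is an s-t minimum cut on a tiny hand-made pixel graph using networkx, with two seed pixels tied to source and sink terminals; the node names and weights are invented for the example:

```python
import networkx as nx

G = nx.DiGraph()
# Symmetric affinity edges between neighboring "pixels" (higher = more similar).
for u, v, w in [("p1", "p2", 3.0), ("p2", "p3", 0.2), ("p3", "p4", 3.0), ("p1", "p3", 0.1)]:
    G.add_edge(u, v, capacity=w)
    G.add_edge(v, u, capacity=w)
# Tie seed pixels to the terminals with large capacities.
G.add_edge("source", "p1", capacity=100.0)
G.add_edge("p4", "sink", capacity=100.0)

cut_value, (seg_a, seg_b) = nx.minimum_cut(G, "source", "sink")
print(cut_value)                               # total weight of the deleted links
print(seg_a - {"source"}, seg_b - {"sink"})    # the two segments of pixels
```

The cut deletes the low-affinity links (p2-p3 and p1-p3), separating {p1, p2} from {p3, p4}.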

Minimum cut example
(Figure: the ideal cut vs. cuts with lesser weight than the ideal cut. Slide from Khurram Hassan-Shafique, CAP5415 Computer Vision 2003)

Normalized cut
Drawback: minimum cut tends to cut off very small, isolated components. This can be fixed by normalizing the cut by the weight of all the edges incident to the segment. The normalized cut cost is

ncut(A, B) = w(A, B) / w(A, V) + w(A, B) / w(B, V)

where w(A, B) = sum of weights of all edges between A and B, and V is the full set of nodes.

Solution: the generalized eigenvalue problem (D - W) y = λ D y, where W is the affinity matrix and D the diagonal matrix of node degrees; the second smallest eigenvector gives the partition.

J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI 2000.
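A compact sketch of a two-way normalized cut in the spirit of Shi and Malik, assuming a dense symmetric affinity matrix `W` (e.g. from `gaussian_affinity` above); real implementations use sparse matrices and sparse eigensolvers:

```python
import numpy as np
from scipy.linalg import eigh

def ncut_value(W, in_a):
    cut = W[np.ix_(in_a, ~in_a)].sum()          # weight of edges crossing the cut
    return cut / W[in_a, :].sum() + cut / W[~in_a, :].sum()

def two_way_ncut(W, n_thresholds=30):
    D = np.diag(W.sum(axis=1))
    # Generalized eigenproblem (D - W) y = lambda D y, eigenvalues ascending.
    vals, vecs = eigh(D - W, D)
    y = vecs[:, 1]                              # second smallest eigenvector
    best, best_cost = None, np.inf
    # Try uniformly spaced thresholds and keep the partition with minimum ncut.
    for t in np.linspace(y.min(), y.max(), n_thresholds):
        in_a = y > t
        if in_a.all() or not in_a.any():
            continue                            # skip degenerate partitions
        cost = ncut_value(W, in_a)
        if cost < best_cost:
            best, best_cost = in_a, cost
    return best, best_cost
```

Recursively re-partitioning each side until the ncut value exceeds a threshold gives the multi-segment version described later in these slides.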

Example result

Challenge
How to segment images that are a mosaic of textures?

Using texture features for segmentation
- Convolve the image with a bank of filters
- Find textons by clustering vectors of filter bank outputs (figure: image and its texton map)
- The final texture feature is a texton histogram computed over image windows at some local scale

J. Malik, S. Belongie, T. Leung and J. Shi. "Contour and Texture Analysis for Image Segmentation". IJCV 43(1), 7-27, 2001.
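A rough sketch of the texton pipeline, assuming a 2-D grayscale array `gray`. The small Gaussian-derivative filter bank here is only a stand-in for the bank used by Malik et al., and the window size is arbitrary:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter
from sklearn.cluster import KMeans

def texton_map(gray, n_textons=32, sigmas=(1, 2, 4)):
    # 1. Convolve the image with a bank of filters (here: Gaussian + derivatives).
    responses = []
    for s in sigmas:
        responses.append(gaussian_filter(gray, s))                # smoothing
        responses.append(gaussian_filter(gray, s, order=(0, 1)))  # x-derivative
        responses.append(gaussian_filter(gray, s, order=(1, 0)))  # y-derivative
    stack = np.stack(responses, axis=-1)                  # (H, W, n_filters)
    # 2. Find textons by clustering the vectors of filter bank outputs.
    X = stack.reshape(-1, stack.shape[-1])
    km = KMeans(n_clusters=n_textons, n_init=4, random_state=0).fit(X)
    return km.labels_.reshape(gray.shape)                 # texton id per pixel

def texton_histograms(textons, n_textons=32, window=15):
    # 3. Texture feature: normalized texton histogram over a local window.
    hists = [uniform_filter((textons == t).astype(np.float64), size=window)
             for t in range(n_textons)]
    return np.stack(hists, axis=-1)                       # (H, W, n_textons)
```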

Pitfall of texture features
Possible solution: check for intervening contours when computing connection weights.

An example implementation
1. Compute an initial segmentation from the locally estimated weight matrix:
   a) Compute the eigen-decomposition of the connectivity graph.
   b) Run pixel-wise K-means with K = 30 on the 11-dimensional subspace defined by eigenvectors 2-12.
   c) Reduce K until an error threshold is reached.
2. Update the weights using the initial segmentation: build histograms by considering pixels in the intersection of the segmentation and the local window.
3. Coarsen the graph with the updated weights to reduce the segmentation to a much simpler problem. Each segment is now a node in the graph; weights are computed by aggregation over the original graph weights.
4. Compute a final segmentation using the coarsened graph:
   a) Compute the second smallest eigenvector of the generalized eigensystem using the weights of the coarsened graph.
   b) Threshold the eigenvector to produce a bipartitioning of the image. 30 different values uniformly spaced within the range of the eigenvector are tried as the threshold; the one producing the partition that minimizes the normalized cut value is chosen, and the corresponding partition is the best way to segment the image into two regions (cf. the two-way ncut sketch above).
   c) Recursively repeat steps (a) and (b) for each partition until the normalized cut value is larger than 0.1.

Example results

Results: Berkeley Segmentation Engine

http://www.cs.berkeley.edu/~fowlkes/BSE/

Normalized cuts: pros and cons
Pros:
- Generic framework, can be used with many different features and affinity formulations
Cons:
- High storage requirement and time complexity
- Bias towards partitioning into equal segments

Segmentation (many slides from Svetlana Lazebnik, Anat Levin)

Segments as primitives for recognition (J. Tighe and S. Lazebnik, ECCV 2010)

Bottom-up segmentation: normalized cuts, mean shift

Bottom-up approaches: use low-level cues to group similar pixels

Bottom-up segmentation is ill-posed
(Segmentation examples, e.g. horses from Eran Borenstein's paper)

Many possible segmentations are equally good based on low-level cues alone. (Images from Borenstein and Ullman 02)

Top-down segmentation
- Class-specific, top-down segmentation (Borenstein & Ullman, ECCV 02)
- Winn and Jojic 05
- Leibe et al. 04
- Yuille and Hallinan 02
- Liu and Sclaroff 01
- Yu and Shi 03

Combining top-down and bottom-up segmentation
Find a segmentation that:
- is similar to the top-down model
- aligns with image edges

Why learn top-down and bottom-up models simultaneously?
- The large number of degrees of freedom in the tentacles' configuration requires a complex, deformable top-down model
- On the other hand, the colors are rather uniform, so low-level segmentation is easy

Learn top-down and bottom-up models simultaneously
- Reduces at run time to energy minimization with binary labels (graph min-cut)

Combined learning approach

Energy model
- Consistency with the fragments segmentation
- Segmentation alignment with image edges
- The resulting segmentation is obtained by min-cut
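To make the two terms concrete, here is a toy evaluation of such an energy (my own simplification, not the paper's exact formulation): `labels` is a binary (H, W) segmentation, `fragment_masks` is a list of (binary mask, weight) pairs giving top-down fragment segmentations, and `edge_strength` is an (H, W) edge map in [0, 1]. Minimizing an energy of this pairwise form over binary labels is what the graph min-cut mentioned above does.

```python
import numpy as np

def segmentation_energy(labels, fragment_masks, edge_strength, lam=1.0):
    labels = labels.astype(np.float64)
    # Consistency with the fragment segmentations: penalize disagreeing pixels.
    unary = sum(w * np.abs(labels - m).sum() for m, w in fragment_masks)
    # Alignment with image edges: label changes are cheap where edges are strong.
    dx = np.abs(np.diff(labels, axis=1)) * (1.0 - edge_strength[:, 1:])
    dy = np.abs(np.diff(labels, axis=0)) * (1.0 - edge_strength[1:, :])
    return unary + lam * (dx.sum() + dy.sum())
```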

Learning from segmented class images
- Training data: class images with their segmentations
- Learn fragments for an energy function

Fragments selection
- Candidate fragments pool
- Greedy energy design

Fragments selection challenges
Straightforward computation of the likelihood improvement is impractical:
2000 fragments × 50 training images × 10 fragment-selection iterations = 1,000,000 inference operations!

Fragments selection
Use a first-order approximation to the log-likelihood gain. It favors fragments with low error on the training set that are not accounted for by the existing model. Evaluating the approximation with respect to all fragments requires only a single inference pass on the previous iteration's energy, and the evaluation is linear in the fragment size.

Training horses model

Training horses model: one fragment

Training horses model: two fragments

Training horses model: three fragments

Results: horses dataset

Results: horses dataset
(Plot: percent of mislabeled pixels vs. number of fragments; comparable to previous work but with far fewer fragments)

Results: artificial octopi

Top-down segmentation
E. Borenstein and S. Ullman, "Class-specific, top-down segmentation", ECCV 2002.
A. Levin and Y. Weiss, "Learning to Combine Bottom-Up and Top-Down Segmentation", ECCV 2006.

Visual motion

Many slides adapted from S. Seitz, R. Szeliski, M. Pollefeys

Motion and perceptual organization
Sometimes, motion is the only cue

Motion and perceptual organization
Sometimes, motion is the foremost cue

Motion and perceptual organization
Even impoverished motion data can evoke a strong percept.

G. Johansson, "Visual Perception of Biological Motion and a Model for Its Analysis", Perception and Psychophysics 14, 201-211, 1973.

Uses of motion
- Estimating 3D structure
- Segmenting objects based on motion cues
- Learning and tracking dynamical models
- Recognizing events and activities

Motion field
The motion field is the projection of the 3D scene motion into the image.

Motion field and parallax
- X(t) is a moving 3D point
- Velocity of the scene point: V = dX/dt
- x(t) = (x(t), y(t)) is the projection of X in the image
- Apparent velocity v in the image, with components v_x = dx/dt and v_y = dy/dt
- These components are known as the motion field of the image
(Figure: 3D points X(t), X(t+dt) with velocity V, and their projections x(t), x(t+dt) with image velocity v)

To find the image velocity v, differentiate x = (x, y) with respect to t using the quotient rule, d(f/g) = (g df - f dg) / g^2. Image motion is a function of both the 3D motion (V) and the depth of the 3D point (Z), as the derivation below shows.
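Carrying the differentiation through under perspective projection (a standard derivation consistent with the slides, written out here for completeness):

\[
x = f\,\frac{X}{Z}, \qquad
v_x = \frac{dx}{dt} = f\,\frac{\dot{X} Z - X \dot{Z}}{Z^2} = \frac{f V_x - x V_z}{Z}, \qquad
v_y = \frac{f V_y - y V_z}{Z}
\]

The 1/Z factor is the inverse dependence on depth noted below, and setting v = 0 gives the vanishing point x_0 = (f V_x / V_z, f V_y / V_z) used on the next slide.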

Motion field and parallax
Pure translation: V is constant everywhere.
- The length of the motion vectors is inversely proportional to the depth Z
- If V_z is nonzero, every motion vector points toward (or away from) the vanishing point of the translation direction; at that vanishing point the visual motion is zero, i.e. v = 0 at x = (f V_x / V_z, f V_y / V_z)
- If V_z is zero, the motion is parallel to the image plane and all the motion vectors are parallel

(Figure from Michael Black, Ph.D. thesis: the length of the flow vectors is inversely proportional to the depth Z of the 3D point; points closer to the camera move more quickly across the image plane)

Motion field + camera motion
(Figures: flow fields for zoom out, zoom in, and pan from right to left; rigid motion with a rotation component; rotation optical flow vs. translation optical flow)

Motion estimation techniques
Feature-based methods:
- Extract visual features (corners, textured areas) and track them over multiple frames
- Sparse motion fields, but more robust tracking
- Suitable when the image motion is large (tens of pixels)

Direct methods:
- Directly recover the image motion at each pixel from spatio-temporal image brightness variations
- Dense motion fields, but sensitive to appearance variations
- Suitable for video and when the image motion is small

Optical flow based segmentation
Segment the flow vectors using the techniques above:
- mean shift
- normalized cuts
- top-down approaches
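A sketch of the simplest version of this, assuming two consecutive grayscale uint8 frames `prev_gray` and `next_gray`; OpenCV's Farneback flow gives a dense motion field, and K-means stands in here for the mean shift / normalized cut options listed above:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def flow_segmentation(prev_gray, next_gray, k=3):
    # Dense optical flow: one (vx, vy) vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w, _ = flow.shape
    # Cluster the flow vectors; pixels moving together end up in the same segment.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(flow.reshape(-1, 2))
    return labels.reshape(h, w)
```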

Applications of segmentation to video
Background subtraction:
- A static camera is observing a scene
- Goal: separate the static background from the moving foreground
- How to come up with a background frame estimate without access to the empty scene? (One common answer is sketched below.)
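The moving foreground rarely covers the same pixel for long, so the per-pixel temporal median over many frames is a reasonable background estimate; the frame list and threshold below are assumptions, not values from the slides:

```python
import numpy as np

def estimate_background(frames):
    # frames: list of grayscale images from the static camera.
    stack = np.stack(frames, axis=0).astype(np.float64)
    return np.median(stack, axis=0)          # per-pixel temporal median

def foreground_mask(frame, background, threshold=25.0):
    diff = np.abs(frame.astype(np.float64) - background)
    return diff > threshold                  # True where a moving object is likely
```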

Applications of segmentation to video
Shot boundary detection:
- Commercial video is usually composed of shots, i.e. sequences showing the same objects or scene
- Goal: segment video into shots for summarization and browsing (each shot can be represented by a single keyframe in a user interface)
- Difference from background subtraction: the camera is not necessarily stationary
