A talk to be presented at the Oxford Robotics Research Group Seminar

Hongdong Li, PhD

ANU and NICTA, Canberra, Australia

September 2013

Outline

• 1. A quick introduction to computer vision research at ANU (about 3 to 5 minutes).

• 2. A technical talk on “mid-level vision”: image contour extraction and region segmentation.

Canberra & ANU

ANU: Australian National University

• A member of Go8

• A member of IARU

ANU, Berkeley, Cambridge, Copenhagen, ETH, NUS, Oxford, Peking, Tokyo, Yale.

http://www.go8.edu.au/go8-members/go8-member-profiles

http://www.iaruni.org/index.php

The ANU/NICTA Computer Vision Group (led by Prof. Richard Hartley)

• Multi-View Geometry & Optimization

• Bionic Eyes

• Medical Image Analysis

• Statistical Machine Learning (kernel methods, etc.)

• Embedded Robotic Vision

Main Research Themes

Multiple View Geometry

• Multi-View Geometry has been a “success story” in computer vision research.

• Major milestones in MVG research, completed by the VGG group, Oxford

• 1993 Marr Prize: Rothwell, Forsyth, Zisserman, and Mundy, Extracting Projective Structure from Single Perspective Views of 3D Point Sets, ICCV 1993.

• 1998 Marr Prize: Phil Torr, A. Fitzgibbon, and A. Zisserman, The Problem of Degeneracy in Structure and Motion Recovery from Uncalibrated Image Sequences, ICCV 1998.

• 2003 Marr Prize: Andrew Fitzgibbon, Yonatan Wexler, and Andrew Zisserman, Image-based Rendering using Image-based Priors, ICCV 2003.

http://en.wikipedia.org/wiki/Andrew_Zisserman

At ANU, we continue to explore some new, unconventional directions in MVG…

• Novel camera sensors:

– Generalized camera model

– Multiple camera rigs

– Non-central camera

– Light field camera

• Multiple-body motion segmentation:

– Subspace clustering

– Multiple model fitting

• Non-rigid, deformable, articulated SFM:

– Template-based approach

– Template-free approach

• Large-scale and efficient optimization: L-infinity norm, etc.

Multiple Camera-Rig Video Odometry

Li and Hartley, CVPR 2008; Kim, Li, and Pollefeys, CVPR 2008; Liu and Hartley (camera and planar mirror), CVPR 2013; Kim, Li et al. (camera with symmetric mirror), ICCV 2013.

Multi-Body Motion Segmentation

The multi-body scenario is more realistic in practice, but substantially harder.

In collaboration with JHU/Adelaide (Hartley, Li, CVPR’07, CVPR’08)

Non-rigid structure-from-motion (Dai and Li, CVPR 2012)

Deformable surface modelling

• Mathieu Salzmann & Hartley, CVPR’09, CVPR’10.

Australia Bionic Eye

• The first stage: a budget of 42 million over 4 years, starting in 2011.

• 5 BVA members and 2 contributing universities.

The picture above is for concept illustration only; it is not our actual system.

Australia Bionic Eye Project: Who will benefit

– Profoundly blind patients.

– AMD patients (Age-related Macular Degeneration).

– RP patients (Retinitis Pigmentosa).

In today’s technical talk, I will focus on…

• Mid-level vision processing (10%)

1. Perceptual contour grouping or completion

2. Joint contour-region figure-ground segmentation

Topic-1: Perceptual Boundary Extraction

Topic-2: Joint contour-region segmentation

Joint work with Yansheng Ming, 2nd-year PhD student, and Dr. Xuming He.

Today’s technical talk

Topic-1:

Perceptual Boundary Extraction

Image Boundary Extraction

• Given an image, we want to extract salient contours (boundaries) inside the image.

• This is an important vision task, because boundaries often carry important shape information for visual recognition, and for other vision tasks as well.

http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/BSDS300/html/images/human/normal/overlay/color/1105/385039.jpg

http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/BSDS300/html/images/human/normal/outline/color/1105/385039.jpg

The task

• There is no rigorous definition of “boundary”, but the task is closely related to:

• Edge Detection

• Contour Extraction

• Region Segmentation

– In particular, in this work we use human observers’ labelling results (in BSDS) as the “ground-truth”,

– and we want to produce boundaries that are “good-looking” and resemble human perception.

What do we mean by “good-looking” ?

[Figure panels: original image, Canny edges, Pb boundaries, human labelling in BSDS300]

Clean, smooth, closed, connected, …

Preview of Our Results (by using the method to be described in this talk)

We deliberately use edge information only, without any region segmentation.

How do we achieve this?

• We treat contour extraction as a mid-level perceptual grouping task (more specifically, a contour completion task).

• We develop a new CRF model for the task, which incorporates various mid-level perceptual grouping rules, a.k.a. Gestalt principles. We focus especially on the closure-effect principle.

Why Closure ?

– Numerous pieces of psychological evidence,

– ecological justifications (e.g. J. Gibson), and

– cognitive vision experiments (e.g. B. Julesz) all show that:

• The contour closure effect plays an important role in human visual perception and figure completion.

• Therefore, we want to develop a new contour extraction algorithm that mimics human perception by respecting this closure effect.

Challenges

• However, compared with relatively local Gestalt principles such as “good continuation” or “proximity”, the “closure effect” is harder to encode and harder to compute efficiently.

• This is because representing the concept that “a contour is closed” requires long-range inter-pixel interactions, or even global, topological, high-order properties.

• Most previous work on contour completion has focused on “relatively local” Gestalt principles (e.g. small clique sizes and pair-wise potential terms: the Potts model, not the P^N model).

• In contrast, far fewer works have been devoted to exploiting the “contour-closure” principle.

Specific goal of this work

• The goal of this work is to develop a new computational model for contour completion:

• a new CRF model that not only encodes the contour closure effect, but also allows efficient inference.

A high level description of our method

• To solve the image contour-extraction problem, we:

– design a graphical model on bottom-up detected boundary segments,

– define a CRF (Conditional Random Field) on it with suitable potential energy functions, and

– formulate contour extraction as an energy-minimization problem (i.e. an inference problem).

• Solving the inference problem yields the contour map.

Our new method explained: two key components

• The CRF model

– Graphical model construction.

– Potential functions design.

• The Inference Algorithm.

Processing flow

[Figure: input image, Pb boundary, boundary graph, inference algorithm, result]

CRF potential energy function design

Reminiscent of CDT (Ren et al, ICCV’05, IJCV’08)

• Our completion-edge proposal method is inspired by, and similar to, Ren et al.’s CDT (Constrained Delaunay Triangulation) method, yet more flexible.

• X. Ren, C. Fowlkes and J. Malik, Scale-Invariant Contour Completion using Conditional Random Fields, ICCV ’05.

The energy function

The decision variable Y = {y_i} indicates whether or not the i-th edgelet should be kept on the final contour.

y_i = 1: on a contour (to keep); y_i = 0: otherwise (to remove).

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\bigl(-E(Y \mid X)\bigr)$$

$$E(Y \mid X) = \sum_{i=1}^{N} \psi_D(y_i, x_i) + \sum_{q \in J} \psi_S(Y_q, X_q) + \sum_{c \in C} \psi_\Gamma(Y_c) + \psi_M(Y)$$

The four terms are, in order: data term, junction term, closure term, complexity term.

Presenter

Presentation Notes

We have to use some formulas. As is standard, the probability of the MRF is related to the energy function. The energy function is composed of four parts, which we discuss one by one.

Data Term

Data term: weighted Pb (local evidence)

Presenter

Presentation Notes

The unary data term is defined for every gradient edge, and is designed as a linear function of the edge’s features.
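A minimal sketch of such a unary term, assuming a two-element feature vector (mean Pb along the edgelet plus a constant bias) and a sign convention where strong evidence lowers the energy. The function name, features, and weights are hypothetical; the slides do not give them.

```python
from typing import Sequence


def data_potential(y_i: int, features: Sequence[float], weights: Sequence[float]) -> float:
    """Unary energy of one gradient edge: linear in its features, zero if the edge is dropped."""
    score = sum(w * x for w, x in zip(weights, features))
    # Assumed sign convention: a high score makes keeping the edge (y_i = 1) cheaper.
    return -score * y_i


# Hypothetical weights and features: [mean Pb along the edgelet, constant bias]
weights = [2.0, -1.0]
strong_edge = [0.9, 1.0]
weak_edge = [0.2, 1.0]

print(data_potential(1, strong_edge, weights))  # negative energy: keeping is rewarded
print(data_potential(1, weak_edge, weights))    # positive energy: keeping is penalized
```

With these made-up weights, a weak edge is only kept if other terms (junction, closure) compensate for its positive data energy.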

Junction Term

The junction potential measures the likelihood of local Gestalt principles such as “good continuation” and “smoothness”.

Presenter

Presentation Notes

The junction potential function is defined for every completion edge and its neighbouring gradient edges. Again, the weight is a linear function of the features.

We propose two types of “completion junctions” to fill in the gap between edgelets

Junction potentials

• The L-junction potential measures the likelihood of the smoothness and good-continuation Gestalt principles;

• The T-junction potential captures the likelihood of occluding/occluded relationships, defined on triplets.

Presenter

Presentation Notes

The second column shows the features for the junction potential function of an L-junction edge. They are based on the bridging line segment between the two neighbouring edgelets: the length of the corner completion and the turning angles. The intuition is that a smooth continuation of contours requires shorter gaps and smaller turning angles. This is how our model encodes the Gestalt law of continuity.
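The gap-length and turning-angle features described in these notes can be sketched as follows. This is an illustration under assumed conventions (each edgelet is summarised by one endpoint and a tangent direction); the function names are hypothetical and the actual feature set in the paper may differ.

```python
import math


def angle_between(u, v):
    """Unsigned angle in radians between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.atan2(cross, dot))


def l_junction_features(end_a, dir_a, end_b, dir_b):
    """Return (gap length, turning angle at A, turning angle at B).

    end_a, end_b: (x, y) endpoints to be bridged.
    dir_a: tangent of edgelet A pointing out of A at end_a.
    dir_b: tangent of edgelet B pointing into B at end_b.
    """
    bridge = (end_b[0] - end_a[0], end_b[1] - end_a[1])
    gap = math.hypot(*bridge)
    if gap == 0.0:
        # The endpoints touch: only the direct turn from A into B remains.
        return 0.0, angle_between(dir_a, dir_b), 0.0
    return gap, angle_between(dir_a, bridge), angle_between(bridge, dir_b)


# Collinear edgelets with a short gap: a good continuation.
smooth = l_junction_features((0, 0), (1, 0), (2, 0), (1, 0))
# A right-angle corner with a longer gap: a poorer continuation.
corner = l_junction_features((0, 0), (1, 0), (0, 3), (0, 1))
print(smooth, corner)
```

Shorter gaps and smaller angles would then map to a lower junction energy through the learned linear weights.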

Model-complexity Term

Model complexity potential: a weighted sum of all labels,

$$\psi_M(Y) = \tau \sum_{i=1}^{N} y_i$$

• Similar to the MDL idea: it controls the sparseness of the contour map.

Presenter

Presentation Notes

Finally, the model complexity potential is the weighted sum of all labels. The parameter τ (tau) controls the level of complexity of the contours. We will see the effect of this parameter in the experiments.

Summary: the potential terms

• Data term: Pb value

• Model-complexity term: MDL, sparsity

The closure-effect term ?

Design the closure-effect potential

– As noted, the closure effect is global and therefore hard to encode.

– Instead, we approximate (relax) it by a series of local connectedness constraints.

– The main intuition is to penalize any floating, isolated, or disconnected boundaries or edge segments (i.e. edgelets) that violate the connectedness condition.

Presenter

Presentation Notes

The Gestalt law of closure requires a set of contours to enclose a certain region, and vice versa. However, our model does not involve any concept of regions. We approximate the closure principle by a series of connectedness constraints:

• To do so, we enumerate, locally, all possible types of (spatial) configurations that violate the Gestalt principle of closure.

• We represent such violations by mathematical conditions (specifically, linear inequalities).

• We can therefore measure the closure effect quantitatively, simply by counting the number of violations.

• Finally, the closure-effect potential function is written in pseudo-Boolean form.

Two basic types of violations

Completion inequality requires that each of the (virtual) completion edgelets must connect to real, observed edgelets; Extension inequality requires that a real edgelet must connect to virtual completion edgelets.

Presenter

Presentation Notes

The completion constraints require a completion edge to connect to the gradient edges at each end. Mathematically, this can be written as an inequality. y_i is … and the y_j are … The left figure shows a set of completion edges whose completion constraints are satisfied. The right figure shows that if we remove one edge, the constraint is violated.
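One way to picture the two inequality types and the violation count is the sketch below. It assumes each constraint has the form "an active edge needs at least one active neighbour of the other kind at each end", i.e. y_edge ≤ Σ y_neighbours; the exact inequalities are in the paper, and the data layout and names here are hypothetical.

```python
def count_closure_violations(y, completion_ends, extension_ends):
    """Count violated completion and extension inequalities.

    y: dict edge_id -> 0/1 label.
    completion_ends: completion edge -> list (one entry per end) of the real
        edgelets it may attach to at that end.
    extension_ends: real edgelet -> list (one entry per end) of the completion
        edges that may extend it at that end.
    Assumed constraint form at every end: y[edge] <= sum of neighbour labels.
    """
    violations = 0
    for edge, ends in list(completion_ends.items()) + list(extension_ends.items()):
        for neighbours in ends:
            if y[edge] > sum(y[j] for j in neighbours):
                violations += 1
    return violations


# Two real edgelets "a" and "b", bridged on both sides by completion edges "c" and "d".
completion_ends = {"c": [["a"], ["b"]], "d": [["a"], ["b"]]}
extension_ends = {"a": [["c"], ["d"]], "b": [["c"], ["d"]]}

closed_loop = {"a": 1, "b": 1, "c": 1, "d": 1}
open_chain = {"a": 1, "b": 1, "c": 1, "d": 0}  # one bridge removed
print(count_closure_violations(closed_loop, completion_ends, extension_ends))
print(count_closure_violations(open_chain, completion_ends, extension_ends))
```

Removing one bridge leaves two dangling ends on the real edgelets, so the count rises from zero; the empty labelling also has zero violations, which is why the data term is needed alongside.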

We simply use the count #(violated inequalities) as the closure potential

We use a Pseudo-Boolean potential function to encode the closure effect.

Presenter

Presentation Notes

The contour closure potential function has a more complex form, since it is derived from a set of inequalities.

Piece them together: factor graph topology

Presenter

Presentation Notes

Each edge we detected or proposed is associated with a binary variable in the MRF. First of all, there are certain observations associated with each edge; we will explain later what they are. In the middle, we have the junction potential function ψ_S and the closure potential ψ_Γ. These potentials are meant to encode the Gestalt laws.

Parameter learning scheme

• The parameters are learned by logistic regression with a piecewise learning strategy.

• The BSDS300 human labelling results are used as the ground-truth.

• Training time: on the order of minutes.

Where are we now

• We mentioned that, in contrast to other, local Gestalt principles, the “closure effect” is hard to encode and hard to compute.

• Now, we have encoded the “closure effect” in our CRF model.

• Next, we consider how to compute (i.e. do inference on) the CRF, hopefully in an efficient way.

Our new method explained

• The CRF model

– Graphical model construction.

– Potential functions design.

• The Inference Algorithm

Inference

• Direct (conventional) inference on our CRF is very hard and inefficient (the average clique size is 15 to 20).

• Reasons:

– Higher-order nonlinear terms: cubic and higher-order terms.

– (Very) large, global clique size: not well suited to LBP or QPBO.

– Non-submodularity: not well suited to max-flow graph-cut.

Presenter

Presentation Notes

After setting up the model and learning its parameters from the data, a natural question is how to do inference. In the standard form, our CRF has many large cliques, due to the extension constraint. The left histogram shows that an endpoint of a gradient edge can connect to as many as 25 completion edges. Therefore, the CRF has a clique size as large as 25, so we cannot use BP. We also prove that our energy function is non-submodular, which excludes the possibility of using graph-cut.
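As a reminder of what submodularity means in the simplest setting, the sketch below checks the standard regularity condition for a single pairwise binary term, θ(0,0) + θ(1,1) ≤ θ(0,1) + θ(1,0). The energy in this talk has higher-order terms, so this pairwise check is only an illustration of the concept, not the proof referred to in the notes; the example tables are made up.

```python
def is_submodular_pairwise(theta):
    """theta[a][b] is the energy of assigning labels (a, b) to two binary variables."""
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]


# A smoothness-style term that penalizes disagreement: graph-cut friendly.
smoothness = [[0.0, 1.0],
              [1.0, 0.0]]
# A term that penalizes agreement instead: not submodular.
repulsion = [[1.0, 0.0],
             [0.0, 1.0]]

print(is_submodular_pairwise(smoothness))
print(is_submodular_pairwise(repulsion))
```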

Proof of the non-submodularity

• For a detailed proof, please see the supplementary material of the paper.

Our solution

• Since graph-cut, BP, and QPBO are not suitable,

• and standard combinatorial optimization is too slow,

• we propose a hand-crafted, tailored inference algorithm that exploits the special structure of the problem’s solution space.

Energy Function Reduction

• Key observation:

– If none of the closure constraints is violated, the junction potentials reduce from third-order (cubic) terms to linear terms.

– In addition, all the closure-effect terms are kept in their original linear inequality form.

Now the inference problem becomes an (integer) linear program

• Minimize a linear function, subject to a set of linear inequality constraints, over binary variables Y.

• A (0-1 integer) linear program (ILP):

$$\min_{Y} \; E(Y \mid X)$$

Presenter

Presentation Notes

Our solution is to formulate the inference problem as an ILP. We represent the extension and completion constraints as inequalities of the ILP, and simplify the triple term into a linear term using the completion constraints. However, general ILP is still NP-hard.
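To make the shape of such a 0-1 ILP concrete, here is a toy instance solved by exhaustive enumeration. This is only an illustration: the costs and constraints are invented, and brute force is exponential in the number of variables, so it stands in for, and is not, the tailored inference algorithm of the talk.

```python
from itertools import product


def solve_binary_ilp(cost, constraints):
    """Minimize sum(cost[k] * y[k]) over y in {0,1}^n subject to a . y <= b
    for every (a, b) in constraints. Exhaustive search, for tiny problems only."""
    best_y, best_val = None, float("inf")
    for y in product((0, 1), repeat=len(cost)):
        feasible = all(sum(a_k * y_k for a_k, y_k in zip(a, y)) <= b
                       for a, b in constraints)
        if feasible:
            val = sum(c_k * y_k for c_k, y_k in zip(cost, y))
            if val < best_val:
                best_y, best_val = y, val
    return best_y, best_val


# Variables: (y_a, y_b, y_c). "a" and "b" are real edgelets, "c" a completion edge.
cost = [-2.0, -1.5, 0.5]  # hypothetical: negative cost means evidence for keeping
constraints = [
    ([-1, 0, 1], 0),  # completion: y_c <= y_a
    ([0, -1, 1], 0),  # completion: y_c <= y_b
    ([1, 0, -1], 0),  # extension:  y_a <= y_c
    ([0, 1, -1], 0),  # extension:  y_b <= y_c
]
print(solve_binary_ilp(cost, constraints))
print(solve_binary_ilp(cost, []))  # without constraints the completion edge is dropped
```

In this toy case the constraints force the completion edge to be kept whenever the two real edgelets are kept, which is the connectedness behaviour the closure term is meant to encourage.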

Experiments and Results

Result: on synthetic images

• Our contour extraction results

Presenter

Presentation Notes

First, these are some results on synthetic images. The left figure shows that our model is robust against occlusion and noise; the right figure shows its preference for closed contours on Gestalt figures. These are raw outputs. The gradient edgelets are in blue, the L-junction edges are in red, and the T-junction edges are in green.

Step-by-step sample result

Sample Results

[Figure panels: Probability of Boundary (Pb), our result, human labelling]

Presenter

Presentation Notes

Our model, which encodes some of the Gestalt laws of grouping, has partially corrected these topological errors.

More results

Clean, smooth, closed, connected, …

Results: change of model-complexity parameter

[Figure: panels labelled 0, 1, 2, 3]

Presenter

Presentation Notes

This experiment shows the effect of the complexity parameter. As this parameter increases, our model produces fewer contours, but contour connectivity is maintained.

More results: natural images

Presenter

Presentation Notes

These are results on natural images. The third column shows binary contour images derived from the raw output.

More results

Qualitative Visual Comparison

[Figure columns: Pb, CDT, C-cut, Ours]

Presenter

Presentation Notes

These are some comparisons between the outputs of our model and related models. The second column shows the Pb detection. The third column shows the result of the previous CRF model. The fourth shows the result of the C-cut algorithm. The last shows our result. The improvement in contour topology is evident.

Quantitative benchmarking (against BSDS300 human ground-truth labelling)

Presenter

Presentation Notes

We also compare our model quantitatively with other approaches, in terms of pixel-based recall and precision on BSDS. We obtain a sizable improvement over the previous CRF model, and a longer working range than C-cut.
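For reference, pixel-based precision, recall, and the F-measure on binary boundary maps can be sketched as below. This simplified version requires exact pixel overlap, whereas the BSDS benchmark tolerates small localisation errors when matching boundary pixels, so the numbers it gives are not comparable to benchmark scores. The function name and toy maps are hypothetical.

```python
def boundary_precision_recall(predicted, truth):
    """Precision, recall and F-measure for two binary maps (lists of 0/1 rows)."""
    tp = n_pred = n_true = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            n_pred += p
            n_true += t
            tp += p * t  # counted only where both maps mark a boundary pixel
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


predicted = [[1, 1, 0],
             [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(boundary_precision_recall(predicted, truth))
```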

Efficacy of the “closure effect”

Presenter

Presentation Notes

This figure demonstrates the effectiveness of the closure potentials, i.e. the constraints. Without these potentials the edges are independent of each other, which leads to poor performance.

Our results: visually pleasing

Drawbacks? Topologically not meaningful!

Drawbacks

• Although there are no floating, isolated contours or boundaries, a found boundary does not necessarily enclose a sensible region.

• Topologically meaningless!

• If region information is used in conjunction, the result should be much improved.

We want to get a more meaningful region/contour segmentation

• Picture taken from OBJCUT

Today’s technical talk

Topic-2: Figure-ground segmentation by combining contour cues and region cues

Motivation

• Contour extraction is a challenging problem due to noise, low contrast, etc.

• Boundary cues and region cues are complementary visual cues that need to be used jointly.

Left: boundary cue for image segmentation; Right: Regional cue for image segmentation (figure credit: T. Cour).

Formulation: joint energy minimization

• $\min_{x,\,y} \; E_{\text{contour}}(x) + \lambda \, E_{\text{region}}(y)$

• Contour cues: intensity contrast, colour changes, continuity, smoothness, curvature, …

– Methods: snake, active contour, …

• Region cues: intra-region colour similarity, homogeneity, spatial closeness.

– Methods: clustering, normalized-cut, graph-cut, …

An issue

• Why, and how, do we ensure that region labels and contour labels are consistent (compatible)?

$$\min \; \text{EnergyFunction}(\text{region\_labels}, \text{contour\_labels}) \quad \text{s.t. region\_labels and contour\_labels are consistent}$$

Region-Contour Consistency Condition

• If a contour label is active, its two adjacent regions must have different region labels;

• If two adjacent regions have different labels, the boundary in between must be active.
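The two conditions above together say that a boundary is active if and only if the regions on its two sides carry different labels. A minimal checker, assuming a hypothetical representation in which each boundary segment records its two adjacent regions:

```python
def is_consistent(region_label, contour_active, boundaries):
    """Check the region-contour consistency condition.

    region_label: dict region_id -> label.
    contour_active: dict boundary_id -> 0/1.
    boundaries: dict boundary_id -> (region_u, region_v), the two adjacent regions.
    """
    for b, (u, v) in boundaries.items():
        regions_differ = region_label[u] != region_label[v]
        if bool(contour_active[b]) != regions_differ:
            return False
    return True


# Three regions in a row: r1 | r2 | r3, separated by boundaries b12 and b23.
boundaries = {"b12": ("r1", "r2"), "b23": ("r2", "r3")}
labels = {"r1": 0, "r2": 1, "r3": 1}

print(is_consistent(labels, {"b12": 1, "b23": 0}, boundaries))  # consistent
print(is_consistent(labels, {"b12": 1, "b23": 1}, boundaries))  # b23 encloses nothing
print(is_consistent(labels, {"b12": 0, "b23": 0}, boundaries))  # r1/r2 split lacks a boundary
```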

What if not consistent ?

• A region without proper boundary.

• A contour that does not enclose any meaningful region.

• Cannot provide meaningful partition of the image.

Our goal


Existing solutions are mostly based on local, microscopic analysis

• E.g.:

This leads to a large number of constraints to handle, and many degenerate cases to resolve.

Another example

Our idea: a global topological solution

How to properly fence a house ?

• Is the house securely fenced ?

Flood-fill the yard ?

A quicker way to test

Works on a directed graph too !

• Interior point test, point-in-polygon test.

Winding number

• By definition, the winding number of a closed curve in the plane around a given point is an integer representing the total number of times that the curve travels counter-clockwise around the point.

• The winding number depends on the orientation of the curve, and is negative if the curve travels around the point clockwise.

Winding number explained

• Suppose we are given a closed, oriented curve in the xy plane.

• We can imagine the curve as the path of motion of a point, with the orientation indicating the direction in which the object moves.

• Then the winding number of the curve is equal to the total number of counter-clockwise turns that the object makes around the origin.
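For a closed polygonal curve, this "count the counter-clockwise turns" description can be sketched by summing the signed angles subtended at the point by each edge. The function name and the polygon representation are assumptions for illustration; the point must not lie on the curve.

```python
import math


def winding_number(polygon, point=(0.0, 0.0)):
    """Winding number of a closed polygon (list of (x, y) vertices) around a point."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for k in range(n):
        x0, y0 = polygon[k][0] - px, polygon[k][1] - py
        x1, y1 = polygon[(k + 1) % n][0] - px, polygon[(k + 1) % n][1] - py
        # Signed angle swept when moving along this edge, as seen from the point.
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return round(total / (2 * math.pi))


ccw_square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]  # counter-clockwise
cw_square = list(reversed(ccw_square))             # clockwise

print(winding_number(ccw_square))           # one counter-clockwise turn
print(winding_number(cw_square))            # one clockwise turn
print(winding_number(ccw_square, (5, 0)))   # the point is outside the curve
print(winding_number(ccw_square * 2))       # the same loop traversed twice
```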

Math definition: residue theorem

The winding number of a closed contour $\gamma$ about a point $z_0$ is defined by:

$$n(\gamma, z_0) = \frac{1}{2\pi i} \oint_{\gamma} \frac{dz}{z - z_0}$$

Winding number: fast computation

• Winding number for region i.

• Contour-crossing counts: crossings from the right and crossings from the left.
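The idea of obtaining a winding number by counting signed contour crossings can be illustrated with the classic ray-crossing test: shoot a ray from the point towards +x, add one for each edge that crosses it upwards and subtract one for each edge that crosses it downwards. This is a generic sketch of crossing counting, not the per-region formula on the slide, whose symbols did not survive; all names are hypothetical.

```python
def is_left(a, b, p):
    """Positive if p lies to the left of the directed line a -> b, negative if to the right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])


def winding_by_crossings(polygon, point):
    """Winding number of a closed polygon around a point via signed ray crossings."""
    wn = 0
    n = len(polygon)
    for k in range(n):
        a, b = polygon[k], polygon[(k + 1) % n]
        if a[1] <= point[1]:
            # Upward edge that passes to the right of the point.
            if b[1] > point[1] and is_left(a, b, point) > 0:
                wn += 1
        else:
            # Downward edge that passes to the right of the point.
            if b[1] <= point[1] and is_left(a, b, point) < 0:
                wn -= 1
    return wn


ccw_square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
print(winding_by_crossings(ccw_square, (0, 0)))
print(winding_by_crossings(list(reversed(ccw_square)), (0, 0)))
print(winding_by_crossings(ccw_square, (5, 0)))
```

Only integer additions and comparisons are needed, which is what makes crossing counts attractive as linear constraints.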

Our key intuition

We observe that, in region-contour segmentation, if we use a region’s winding number to denote its region-label, then the following set of linear constraints automatically ensures the region-label and the incident contour-labels are consistent.

• This is because:

• Now, we have the dependency.

Recall: Our goal


Formulation: ratio optimization

Solve the ratio optimization by LP

• Optimization of this linear-fractional objective function, subject to linear winding number constraints, can be done as a Linear Program (by 0-1 relaxation, and the Charnes-Cooper transformation).

Let $x$ denote all the variables, and $c$, $d$, $e$, $f$ the coefficients.
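For reference, the Charnes-Cooper transformation in its generic textbook form is sketched below. The variable vector $x$, the constraint data $A$, $b$, and the assumption that the denominator is positive on the feasible set are all generic placeholders, not the specific winding-number formulation of the talk.

$$\min_{x} \; \frac{c^{\top} x + d}{e^{\top} x + f} \quad \text{s.t.} \quad A x \le b, \;\; 0 \le x \le 1, \;\; e^{\top} x + f > 0$$

Substituting $t = \dfrac{1}{e^{\top} x + f}$ and $u = t\,x$ gives an ordinary linear program in $(u, t)$:

$$\min_{u,\,t} \; c^{\top} u + d\,t \quad \text{s.t.} \quad A u \le b\,t, \;\; 0 \le u \le t, \;\; e^{\top} u + f\,t = 1, \;\; t \ge 0$$

Whenever the optimum has $t > 0$, the solution of the original relaxed problem is recovered as $x = u / t$.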

Alternatively, it may also be solvable via “parametric max-flow algorithm” or “negative cycle algorithm”

• But we have not explored these options further, as our method (a single LP) is rather efficient.

Implementation

• Start from a superpixel over-segmentation.

Experiment results comparison

F-index on the horse dataset (1 solution competes with 10 solutions)

F-index on the shape dataset

Add human interaction

Current ongoing/future work

• Extend to multiple-label case (multiple regions).

• Add high-level semantic (category-specific) information to the salient contour extraction task.

What I have covered in this talk

• Topic-1: Contour extraction w.r.t. closure-effect.

• Topic-2: Joint contour-region segmentation w/ winding number.

[Figure: low-level vision, mid-level vision, high-level vision]

More results can be found in CVPR 2012, CVPR 2013, ICIP 2014, CVPR 2014 (in preparation).

Thank you!
