scenes from video workshop talk

What’s so good about pieces, Lego and understanding?

Anton van den Hengel

Australian Centre for Visual Technologies (ACVT), The University of Adelaide, South Australia

People think in 3D

It has been a theme …

"the perception of solid objects is a process which can be based on the properties of three-dimensional transformations and the laws of nature"

Larry Roberts (1965)

Geometry is not enough

Structure and semantics interact

Structure and geometry interact

WHY PLANTS ARE LIKE LEGO

Developmental changes in response to drought

Boris Parent, ACPFG

[Figure: absolute growth rate [mm2 d-1] against time after sowing [d] (30 to 65 d), drought vs well watered; annotations at 39 d and 46 d after sowing]

The escape response of Clipper under drought is reflected in an earlier time of absolute maximum growth

Morphological changes in response to drought

Boris Parent, ACPFG

[Figure: relative ratio of shoot area / height against time after sowing [d] (30 to 60 d), barley cv Clipper, drought vs well watered]

The reduced number of tillers under drought is reflected in the area/height ratio

Deep reasoning

• Try to explain as much as possible

• Fine-grained and detailed

• Deep semantics

• And the implied constraints

• Shape is only an intermediate step

Deconstruction

Silhouettes

• We’re only interested in shape (at least for now)

Deconstruction

• Render every possible building block in every possible position, and recover the silhouette of each rendering

• Then reconstruct object silhouettes from templates

• Requires enough camera information to achieve this

Template shapes

• nTemplates = nShapes x nPositions x nRotations

• So there are lots of them

• But they are sparsely used
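To give a feel for the numbers, the product above can be evaluated for a small example. All sizes below are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
# Hypothetical sizes, chosen only to illustrate the product on the slide.
n_shapes = 5                 # distinct building-block shapes
n_positions = 20 * 20 * 10   # a 20 x 20 x 10 grid of candidate positions
n_rotations = 4              # quarter turns about the vertical axis

n_templates = n_shapes * n_positions * n_rotations

# Only a handful of templates correspond to blocks that are really present.
n_used = 30                  # hypothetical number of blocks in one model
fraction_used = n_used / n_templates

print(n_templates)    # 80000
print(fraction_used)  # well under 0.1% of the coefficients are non-zero
```

Even this small grid gives tens of thousands of templates, of which only a tiny fraction is active, which is what motivates treating the problem as sparse recovery.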

Sparse recovery

• \alpha a vector of binary template coefficients

• \Pi a matrix with one template silhouette per column

• y the silhouette of the shape to be recovered

• NP hard and fragile
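The formula on this slide did not survive extraction. As a rough sketch only, assume the goal is the smallest set of templates whose silhouettes sum exactly to y, that is, a binary \alpha with as few ones as possible such that \Pi \alpha = y. Under that assumption, an exact solution needs a search over subsets, which is what makes the problem combinatorial. The function name and the toy data are hypothetical.

```python
from itertools import combinations

def sparsest_binary_solution(templates, y):
    """Exhaustively search for the smallest template subset that sums to y.

    `templates` is a list of silhouettes (one list of 0/1 pixels each),
    i.e. the columns of Pi. Returns a binary alpha, or None if no subset fits.
    """
    n = len(templates)
    for k in range(n + 1):                      # try the sparsest subsets first
        for subset in combinations(range(n), k):
            total = [sum(templates[j][p] for j in subset) for p in range(len(y))]
            if total == y:
                return [1 if j in subset else 0 for j in range(n)]
    return None

# Toy problem: five templates over six pixels (hypothetical data).
templates = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
y = [1, 1, 1, 1, 1, 1]

alpha = sparsest_binary_solution(templates, y)
print(alpha)  # [0, 0, 1, 1, 0]
```

The search visits up to 2^n subsets, and the exact-equality test fails as soon as a single pixel of y is wrong, which illustrates both the cost and the fragility noted above.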

Sparse recovery – L_1 norm

• But there may still be millions of templates, and they’re enormous (|Pixels| x |Images|)

Sparse recovery – Random projections

• Random projection by DxS matrix \Phi

• D << S

• \Phi is sparsely sampled from N(0,1)

• But there are still too many templates
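A minimal sketch of the projection step, assuming "sparsely sampled" means that each entry of \Phi is drawn from N(0,1) with some small probability and is zero otherwise. The `density` parameter, the sizes and the toy silhouettes are all assumptions made for illustration.

```python
import random

def sparse_gaussian_matrix(d, s, density, rng):
    """D x S matrix whose entries are N(0,1) with probability `density`, else 0."""
    return [[rng.gauss(0.0, 1.0) if rng.random() < density else 0.0
             for _ in range(s)] for _ in range(d)]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

rng = random.Random(0)
S, D = 1000, 20                      # D << S
phi = sparse_gaussian_matrix(D, S, density=0.1, rng=rng)

# Two hypothetical, non-overlapping template silhouettes and their composite.
t1 = [1.0 if 100 <= p < 300 else 0.0 for p in range(S)]
t2 = [1.0 if 600 <= p < 700 else 0.0 for p in range(S)]
y = [a + b for a, b in zip(t1, t2)]

# Project the observation and the templates into D dimensions.
y_low = matvec(phi, y)
t1_low, t2_low = matvec(phi, t1), matvec(phi, t2)

# The projection is linear, so the low-dimensional templates still add up
# to the low-dimensional observation (up to floating-point rounding).
residual = max(abs(a - (b + c)) for a, b, c in zip(y_low, t1_low, t2_low))
print(len(y), "->", len(y_low))
```

Because the projection is linear, the recovery problem can be posed on the D-dimensional vectors instead of the full S-dimensional silhouettes.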

Sparse recovery - Cropping

• Eliminate templates with a footprint that extends significantly beyond that of the object

• Reduces the number of templates by at least an order of magnitude

• Down to somewhere between tens and tens of thousands of templates
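The cropping rule can be sketched as a simple filter. Here a template is represented by the set of pixels it covers, and "extends significantly beyond" is interpreted as more than a fraction `max_outside` of the template's pixels falling outside the object silhouette. That threshold and its default value are assumptions, not figures from the talk.

```python
def crop_templates(templates, silhouette, max_outside=0.1):
    """Return the indices of templates that lie mostly inside the silhouette.

    `templates` is a list of pixel sets, `silhouette` a pixel set.
    `max_outside` is a hypothetical tolerance on the outside fraction.
    """
    kept = []
    for index, footprint in enumerate(templates):
        outside = len(footprint - silhouette)    # pixels beyond the object
        if outside <= max_outside * len(footprint):
            kept.append(index)
    return kept

# Toy object: a 4 x 4 square of pixels.
silhouette = {(r, c) for r in range(4) for c in range(4)}
templates = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},   # entirely inside
    {(3, 3), (3, 4), (4, 3), (4, 4)},   # three of four pixels outside
    {(2, 2), (2, 3), (3, 2), (3, 3)},   # entirely inside
    {(8, 8), (8, 9)},                   # entirely outside
]

print(crop_templates(templates, silhouette))  # [0, 2]
```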

Binarising the solution

• Solutions are not binary

• Randomly generate binary hypotheses from non-binary \alpha

• Evaluate using an accurate composition model
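One way to read these steps, as a sketch: treat each non-binary coefficient (clipped to [0, 1]) as the probability that its template is present, draw random binary hypotheses, and score each one with a composition model in which the silhouette is the union of the selected templates. Both the sampling rule and the union model are assumptions here; the talk does not spell out either.

```python
import random

def union_error(hypothesis, templates, target):
    """Number of pixels that differ between the union of chosen templates and the target."""
    covered = set()
    for chosen, footprint in zip(hypothesis, templates):
        if chosen:
            covered |= footprint
    return len(covered ^ target)

def binarise(alpha, templates, target, n_samples=200, seed=0):
    """Sample binary hypotheses from alpha and keep the one with the lowest error."""
    rng = random.Random(seed)
    probs = [min(1.0, max(0.0, a)) for a in alpha]   # clip to valid probabilities
    best, best_error = None, None
    for _ in range(n_samples):
        hypothesis = tuple(1 if rng.random() < p else 0 for p in probs)
        error = union_error(hypothesis, templates, target)
        if best_error is None or error < best_error:
            best, best_error = hypothesis, error
    return best, best_error

# Toy data: templates 0 and 2 overlap at pixel 2 and together form the target.
templates = [{0, 1, 2}, {5, 6}, {2, 3, 4}, {0, 7}]
target = {0, 1, 2, 3, 4}
alpha = [0.9, 0.1, 0.8, 0.2]          # hypothetical relaxed solution

best, best_error = binarise(alpha, templates, target)
print(best, best_error)
```

Scoring with the union means the overlap at pixel 2 is not counted twice, which a purely additive model would get wrong.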

Results

Plants

Results

[Figure: fraction of true leaves recovered against number of templates (200 to 1000); series: Max, Search, Viable]

Results

[Figure: fraction of pixels explained against noise level (fraction of pixels changed, 0 to 0.06); series: Max, Search]

Composition problems

Not a true model of silhouette formation

So doesn’t deal well with template overlap

Working on this by subtracting overlaps and by graph-based approaches

Somewhat overcome by…

Inequality

• Isn’t physically accurate for foreground pixels, so split

• Background (0) pixels

• And foreground pixels

Practicality again

• Only interested in the number of pixels outside the object silhouette, not the location

• So not

• but

Practicality again

• Want to ensure that

• Need to project to a lower dimension

• But \Phi_I must have only positive elements

A better model of composition

• Left with

Constraints - Intersection

Constraints - Intersection

• Form J where every row represents a constraint

• If templates i and k intersect then insert a row in J with only elements i and k set to 1
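The construction of J can be sketched as follows. Each template is represented by the set of cells it occupies, and the constraint is assumed to be J \alpha <= 1 row by row, so that at most one template of each intersecting pair is selected. The inequality and the cell representation are assumptions, since the slide only describes how the rows are formed.

```python
from itertools import combinations

def intersection_constraints(volumes):
    """Build J: one row per intersecting pair (i, k), with ones at columns i and k."""
    n = len(volumes)
    rows = []
    for i, k in combinations(range(n), 2):
        if volumes[i] & volumes[k]:          # the two templates share a cell
            row = [0] * n
            row[i] = row[k] = 1
            rows.append(row)
    return rows

def satisfies(J, alpha):
    """Check the assumed constraint J alpha <= 1 for a binary alpha."""
    return all(sum(j * a for j, a in zip(row, alpha)) <= 1 for row in J)

# Toy templates as sets of occupied cells (hypothetical data).
volumes = [
    {(0, 0, 0), (1, 0, 0)},
    {(1, 0, 0), (2, 0, 0)},   # shares cell (1, 0, 0) with template 0
    {(5, 0, 0)},
]

J = intersection_constraints(volumes)
print(J)                            # [[1, 1, 0]]
print(satisfies(J, [1, 0, 1]))      # True
print(satisfies(J, [1, 1, 0]))      # False
```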

Constraints - Support

• Form K where every row represents a constraint

• If template i needs support t set K_ii = t

• If template j provides s support to template i then K_ij = -s
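A sketch of how K might be assembled and used, with template j supporting template i. The feasibility test K \alpha <= 0 is an assumption: it says that a selected template's support requirement t must be met by the support s contributed by the other selected templates. The dictionary-based inputs are hypothetical.

```python
def support_constraints(n, needs, provides):
    """Build the n x n matrix K.

    needs:    {i: t}       template i needs support t        -> K[i][i] = t
    provides: {(i, j): s}  template j gives s support to i    -> K[i][j] = -s
    """
    K = [[0.0] * n for _ in range(n)]
    for i, t in needs.items():
        K[i][i] = t
    for (i, j), s in provides.items():
        K[i][j] = -s
    return K

def supported(K, alpha):
    """Check the assumed constraint K alpha <= 0 for a binary alpha."""
    return all(sum(k * a for k, a in zip(row, alpha)) <= 0 for row in K)

# Toy example: templates 0 and 1 rest on the ground; template 2 needs 1.0
# units of support, of which template 0 offers 1.0 and template 1 only 0.5.
needs = {2: 1.0}
provides = {(2, 0): 1.0, (2, 1): 0.5}
K = support_constraints(3, needs, provides)

print(supported(K, [1, 0, 1]))   # True: template 0 fully supports template 2
print(supported(K, [0, 1, 1]))   # False: 0.5 support is not enough
print(supported(K, [0, 0, 1]))   # False: template 2 would float
```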

Measurement benefit tails off

[Figure: Accuracy vs noise for varying numbers of measurements. Accuracy (fraction of true blocks recovered) against noise level (added to camera extrinsics, 0 to 0.4); one series each for 49, 441, 1225, 2401, 3969, 5929, 8281 and 11025 measurements]

Results

Limitations

• One template per value per parameter

• Fixable?
