Download - Silicon retina technology - IMSims.unipv.it/CASWS2017/Slides/Delbruck.pdf · Silicon retina technology Tobi Delbruck Inst. of Neuroinformatics, University of Zurich and ETH Zurich

1

Sensors Group sensors.ini.uzh.ch

Sponsors: Swiss National Science Foundation NCCR Robotics project, EU projects SEEBETTER and

VISUALISE, Samsung, DARPA

Silicon retina technologyTobi DelbruckInst. of Neuroinformatics, University of Zurich and ETH Zurich

http://sensors.ini.uzh.ch/

2

Sponsors: Swiss National Science Foundation NCCR Robotics, EU projects CAVIAR, SEEBETTER,

VISUALISE, Samsung, DARPA, University of Zurich and ETH Zurich

sensors.ini.uzh.ch

inilabs.com

Golf-guides.blogspot.com

Conventional cameras (Static vision sensors) output a stroboscopic sequence of frames

Muybridge 1878

(150 years ago)

GoodCompatible with 50+years of machine vision

Allows small pixels (1um for consumer, 3-5um

for machine vision)

BadRedundant output

Temporal aliasing

Limited dynamic range (60dB)

Fundamental “latency vs. power” trade-off 3

100M photoreceptors

1M output fibers carrying max

100Hz spike rates

180dB (109) operating range

>20 different “eyes”

Many GOPs computing

3mW power consumption

Output is sparse,

asynchronous stream of digital

spike events4

The Human Eye as a digital camera

This talk has 4 parts

•Dynamic Vision Sensor Silicon Retinas

•Simple object tracking by algorithmic processing of events

•Using probabilistic methods for state estimation

•“Data-driven” deep inference with CNNs 5

DVS (Dynamic Vision Sensor) Pixel

logI

photoreceptor

I

ON

OFF

comparators

(ganglion cells)

threshold

Brightness

change

eventsEvent

reset

change amplifier(bipolar cells)

Lichtsteiner et al., ISSCC 2007, JSSC 2009

From Rodieck 1998 6

±𝛥log𝐼

Edmund 0.1 density chart

Illumination ratio=135:1780 lux 5.8 lux

780 luxON

events 5.8 lux

DVS pixel has wide dynamic range

ISSCC 2007

Using DVS for high speed (low data rate) imaging

Data rate <1MBps

“Frame rate”

equivalent to 10 kHz

but 100x less data

(10 kHz image sensor

x 16k pixels = 160

MBps)

ISSCC 2007

DAVIS (Dynamic and Active Pixel Vision Sensor) Pixel

logI

photoreceptor

I

ON

OFF

comparators

(ganglion cells)

threshold

Change

events

Event

reset


Brandli et al., Symp VLSI, JSSC 2014

From Rodieck 1998

Intensity

reset

Intensity

value

9

±𝛥log𝐼

DVS/DAVIS +IMU demo

11Brandli, Berner, Delbruck et al., Symp. VLSI 2013, JSSC 2014, ISCAS 2015

Start

DAVIS

Demo

DAVIS (Dynamic and Active Pixel Vision Sensor) Pixel

logI

photoreceptor

I

ON

OFF

comparators

(ganglion cells)

threshold

Change

events

Event

reset


Brandli et al., Symp VLSI, JSSC 2014

From Rodieck 1998

Intensity

reset

Intensity

value

12

±𝛥log𝐼

13

DAVIS346

Bias generator

AER DVS asynch. event readout

APS col-parallel ADCs and scanner

180nm CIS

346x260 18.5um

pixel DAVIS

8mm

14

Important layout

considerations

1. Post layout

simulations to

minimize parasitic

coupling

2. Shielding parasitic

photodiodes

Event threshold matching measurementExperiment: Apply slow triangle wave LED stimulus to entire array,

measure njmber of events that pixels generate

Conclusion: Pixels generate 11±3 events per factor 3.3 contrast.

Since ln(3.3)=1.19 and 1.19/11=0.11, contrast threshold=11% ± 4%

latency

jitter

time

Conclusion: Pixels can have minimum latency of about 12us under bright illumination. But “real

world” latencies are more like 100us-1ms.

Measuring DVS pixel latencyExperiment: Stimulate small area of sensor with flashing LED spot, measure

response latencies from recorded event stream

DVS pixel has built-in temperature compensation

17

Photoreceptor

VpT ln(Ip)

Threshold

onT ln(Ion/Id)

Since photoreceptor gain and threshold voltage both

scale with absolute temperature T, it cancels outNozaki, Delbruck 2017 (unpublished)

Integrated bias generator and circuit design enables operation over extended temperature range

18Nozaki, Delbruck 2017 (submitted)

350nm

90nm

180nm

350/180nm

Global shutter APSRolling shutter consumer APS

DVS pixel size trend

20https://docs.google.com/spreadsheets/d/1pJfybCL7i_wgH3qF8zsj1JoWMtL0zHKr9eygikBdElY/edit#gid=0

Event camera silicon retina developments

21

DVS/DAVIS

CeleX

ATIS/CCAM

DVSCommercial entities

Inilabs (Zurich) – R&D prototoypes

Insightness (Zurich) – Drones and Augmented Reality

Samsung (S Korea) – Consumer electronics

Pixium Vision (Paris) – Retinal implants

Inivation (Zurich) – Industrial applications, Automotive

Chronocam (Paris) - Automotive

Hillhouse (Singapore) - Automotive

22

Neuromorphic sensor R&D prototypes

Open source software, user guides,

app notes, sample data

Shipped devices based on multiproject wafer

silicon to 100+ organizations

Founded 2009

www.iniLabs.com

Run as not-for-profit

http://www.inilabs.com/support

http://www.inilabs.com/

23

• Dynamic Vision Sensor Silicon Retinas

• Simple object tracking by algorithmic processing of events

• Using probabilistic methods for state estimation

• “Data-driven” deep inference with CNNs

Tracking objects from DVS events using spatio-temporal coherence

24/30

1. For each event, find nearest cluster

• If event within a cluster, move

cluster

• If event not within cluster, seed

new cluster

2. Periodically prune starved clusters,

merge clusters, etc (lifetime mgmt)

Advantages

1. Low computational cost (e.g. <5% CPU)

2. No frame memory (~100 bytes/object).

3. No frame correspondence problem

Litzenberger 2007

Robo Goalie

25Delbruck et al, ISCAS 2007, Frontiers 2013

Using DVS allows 2 ms reaction time at 4% processor load with USB bus connections

26

Simultaneous Mosaicing and Tracking with DVS

28Hanme Kim, A. Handa, … Andy J. Davison, BMVC 2014.

http://www.doc.ic.ac.uk/~ajd/Publications/kim_etal_bmvc2014.pdf

29Hanme Kim, A. Handa, … Andy J. Davison, BMVC 2014.

Simultaneous Mosaicing and Tracking with DVS

http://www.doc.ic.ac.uk/~ajd/Publications/kim_etal_bmvc2014.pdf

Goal: To do event-based, semi-dense visual odometry

•We want to estimate State vector 𝑠 (camera pose, visual scene spatial brightness gradients and sensor event thresholds) using Bayesian filtering from the events 𝑒: 𝑝 𝑠 𝑒

•Sensor likelihood 𝑝 𝑒 𝑠 is modeled as mixture of inlier Gaussian distribution and outlier uniform distribution

•A tractable posterior 𝑞 𝑠 𝑒 ≈ 𝑝(𝑠|𝑒) is approximated by Kullback-Leibler (KL) divergence

•Leads to closed-form update equations in the form of a classical Kalman filter, thus computationally efficient (unlike particle filtering)

G. Gallego E. Mueggler D. Scaramuzza(submitted to PAMI, 2016)

Towards event-based, semi-dense SLAM: 6-DOF pose estimation

G. Gallego et al., PAMI (submitted 2016).

Demo - RoShamBo

33

RoShamBo CNN architecture

Conv 5x5

16x60x60Total 18MOp (~9M MAC)

Compute times:

On 150W Core i7 PC in Caffe: 2ms

On 1W CNN accelerator on FPGA: 8ms

PaperScissorsRockBackground

64x64

DVS 2D rectified

histogram of

2k events

(0.1Hz – 1kHz rate)

MaxPool

2x2

16x30x30

Conv 3x3

32x28x28

Conv 1x1

+

MaxPool 2x2

128x1x1

MaxPool 2x2

32x14x14

Conv 3x3

64x12x12

MaxPool 2x2

64x6x6

Conv 3x3

128x4x4

MaxPool

2x2

128x2x2

240x180 DVS “frames”

Conventional 5-layer LeNet with ReLU/MaxPool and 1 FC layer before output.

I.-A. Lungu, F. Corradi, and T. Delbruck, “Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing

RoShamBo,” in 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.

RoShamBo training images

I.-A. Lungu, F. Corradi, and T. Delbruck, “Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing

RoShamBo,” in 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.

36

A. Aimar, E. Calabrese, H. Mostafa, A. Rios-Navarro, R. Tapiador, I.-A. Lungu, A. Jimenez-Fernandez, F.

Corradi, S.-C. Liu, A. Linares-Barranco, and T. Delbruck, “Nullhop: Flexibly efficient FPGA CNN accelerator

driven by DAVIS neuromorphic vision sensor,” in NIPS 2016, Barcelona, 2016.

Conclusions1. The DVS was developed by following a neuromorphic approach of

emulating key properties of biological retinas

2. The wide dynamic range and sparse, quick output make these sensors useful in real time uncontrolled conditions

3. Applications could include vision prosthetics, surveillance, robotics and consumer electronics

4. The precise timing could improve learning and inference

5. The main challenges are to reduce pixel size and to develop effective algorithms. Only industry can do the first but academia has plenty of room to play for the second.

6. Event sensors can nicely drive deep inference. There is a lot of room for improvement of deep inference power efficiency at the system level! 37

Download - Silicon retina technology - IMSims.unipv.it/CASWS2017/Slides/Delbruck.pdf · Silicon retina technology Tobi Delbruck Inst. of Neuroinformatics, University of Zurich and ETH Zurich

Top Related