1
Sensors Group sensors.ini.uzh.ch
Sponsors: Swiss National Science Foundation NCCR Robotics project, EU projects SEEBETTER and
VISUALISE, Samsung, DARPA
Silicon retina technologyTobi DelbruckInst. of Neuroinformatics, University of Zurich and ETH Zurich
2
Sponsors: Swiss National Science Foundation NCCR Robotics, EU projects CAVIAR, SEEBETTER,
VISUALISE, Samsung, DARPA, University of Zurich and ETH Zurich
sensors.ini.uzh.ch
inilabs.com
Golf-guides.blogspot.com
Conventional cameras (Static vision sensors) output a stroboscopic sequence of frames
Muybridge 1878
(150 years ago)
GoodCompatible with 50+years of machine vision
Allows small pixels (1um for consumer, 3-5um
for machine vision)
BadRedundant output
Temporal aliasing
Limited dynamic range (60dB)
Fundamental “latency vs. power” trade-off 3
100M photoreceptors
1M output fibers carrying max
100Hz spike rates
180dB (109) operating range
>20 different “eyes”
Many GOPs computing
3mW power consumption
Output is sparse,
asynchronous stream of digital
spike events4
The Human Eye as a digital camera
This talk has 4 parts
•Dynamic Vision Sensor Silicon Retinas
•Simple object tracking by algorithmic processing of events
•Using probabilistic methods for state estimation
•“Data-driven” deep inference with CNNs 5
DVS (Dynamic Vision Sensor) Pixel
logI
photoreceptor
I
ON
OFF
comparators
(ganglion cells)
threshold
Brightness
change
eventsEvent
reset
change amplifier(bipolar cells)
Lichtsteiner et al., ISSCC 2007, JSSC 2009
From Rodieck 1998 6
±𝛥log𝐼
Edmund 0.1 density chart
Illumination ratio=135:1780 lux 5.8 lux
780 luxON
events 5.8 lux
DVS pixel has wide dynamic range
ISSCC 2007
Using DVS for high speed (low data rate) imaging
Data rate <1MBps
“Frame rate”
equivalent to 10 kHz
but 100x less data
(10 kHz image sensor
x 16k pixels = 160
MBps)
ISSCC 2007
DAVIS (Dynamic and Active Pixel Vision Sensor) Pixel
logI
photoreceptor
I
ON
OFF
comparators
(ganglion cells)
threshold
Change
events
Event
reset
change amplifier(bipolar cells)
Brandli et al., Symp VLSI, JSSC 2014
From Rodieck 1998
Intensity
reset
Intensity
value
9
±𝛥log𝐼
10
DVS/DAVIS +IMU demo
11Brandli, Berner, Delbruck et al., Symp. VLSI 2013, JSSC 2014, ISCAS 2015
Start
DAVIS
Demo
DAVIS (Dynamic and Active Pixel Vision Sensor) Pixel
logI
photoreceptor
I
ON
OFF
comparators
(ganglion cells)
threshold
Change
events
Event
reset
change amplifier(bipolar cells)
Brandli et al., Symp VLSI, JSSC 2014
From Rodieck 1998
Intensity
reset
Intensity
value
12
±𝛥log𝐼
13
DAVIS346
Bias generator
AER DVS asynch. event readout
APS col-parallel ADCs and scanner
180nm CIS
346x260 18.5um
pixel DAVIS
8mm
14
Important layout
considerations
1. Post layout
simulations to
minimize parasitic
coupling
2. Shielding parasitic
photodiodes
Event threshold matching measurementExperiment: Apply slow triangle wave LED stimulus to entire array,
measure njmber of events that pixels generate
Conclusion: Pixels generate 11±3 events per factor 3.3 contrast.
Since ln(3.3)=1.19 and 1.19/11=0.11, contrast threshold=11% ± 4%
latency
jitter
time
Conclusion: Pixels can have minimum latency of about 12us under bright illumination. But “real
world” latencies are more like 100us-1ms.
Measuring DVS pixel latencyExperiment: Stimulate small area of sensor with flashing LED spot, measure
response latencies from recorded event stream
DVS pixel has built-in temperature compensation
17
Photoreceptor
VpT ln(Ip)
Threshold
onT ln(Ion/Id)
Since photoreceptor gain and threshold voltage both
scale with absolute temperature T, it cancels outNozaki, Delbruck 2017 (unpublished)
Integrated bias generator and circuit design enables operation over extended temperature range
18Nozaki, Delbruck 2017 (submitted)
350nm
90nm
180nm
350/180nm
Global shutter APSRolling shutter consumer APS
DVS pixel size trend
20https://docs.google.com/spreadsheets/d/1pJfybCL7i_wgH3qF8zsj1JoWMtL0zHKr9eygikBdElY/edit#gid=0
Event camera silicon retina developments
21
DVS/DAVIS
CeleX
ATIS/CCAM
DVSCommercial entities
Inilabs (Zurich) – R&D prototoypes
Insightness (Zurich) – Drones and Augmented Reality
Samsung (S Korea) – Consumer electronics
Pixium Vision (Paris) – Retinal implants
Inivation (Zurich) – Industrial applications, Automotive
Chronocam (Paris) - Automotive
Hillhouse (Singapore) - Automotive
22
Neuromorphic sensor R&D prototypes
Open source software, user guides,
app notes, sample data
Shipped devices based on multiproject wafer
silicon to 100+ organizations
Founded 2009
www.iniLabs.com
Run as not-for-profit
23
• Dynamic Vision Sensor Silicon Retinas
• Simple object tracking by algorithmic processing of events
• Using probabilistic methods for state estimation
• “Data-driven” deep inference with CNNs
Tracking objects from DVS events using spatio-temporal coherence
24/30
1. For each event, find nearest cluster
• If event within a cluster, move
cluster
• If event not within cluster, seed
new cluster
2. Periodically prune starved clusters,
merge clusters, etc (lifetime mgmt)
Advantages
1. Low computational cost (e.g. <5% CPU)
2. No frame memory (~100 bytes/object).
3. No frame correspondence problem
Litzenberger 2007
Robo Goalie
25Delbruck et al, ISCAS 2007, Frontiers 2013
Using DVS allows 2 ms reaction time at 4% processor load with USB bus connections
26
This talk has 4 parts
•Dynamic Vision Sensor Silicon Retinas
•Simple object tracking by algorithmic processing of events
•Using probabilistic methods for state estimation
•“Data-driven” deep inference with CNNs 27
Simultaneous Mosaicing and Tracking with DVS
28Hanme Kim, A. Handa, … Andy J. Davison, BMVC 2014.
29Hanme Kim, A. Handa, … Andy J. Davison, BMVC 2014.
Simultaneous Mosaicing and Tracking with DVS
Goal: To do event-based, semi-dense visual odometry
•We want to estimate State vector 𝑠 (camera pose, visual scene spatial brightness gradients and sensor event thresholds) using Bayesian filtering from the events 𝑒: 𝑝 𝑠 𝑒
•Sensor likelihood 𝑝 𝑒 𝑠 is modeled as mixture of inlier Gaussian distribution and outlier uniform distribution
•A tractable posterior 𝑞 𝑠 𝑒 ≈ 𝑝(𝑠|𝑒) is approximated by Kullback-Leibler (KL) divergence
•Leads to closed-form update equations in the form of a classical Kalman filter, thus computationally efficient (unlike particle filtering)
G. Gallego E. Mueggler D. Scaramuzza(submitted to PAMI, 2016)
Towards event-based, semi-dense SLAM: 6-DOF pose estimation
G. Gallego et al., PAMI (submitted 2016).
This talk has 4 parts
•Dynamic Vision Sensor Silicon Retinas
•Simple object tracking by algorithmic processing of events
•Using probabilistic methods for state estimation
•“Data-driven” deep inference with CNNs 32
Demo - RoShamBo
33
RoShamBo CNN architecture
Conv 5x5
16x60x60Total 18MOp (~9M MAC)
Compute times:
On 150W Core i7 PC in Caffe: 2ms
On 1W CNN accelerator on FPGA: 8ms
PaperScissorsRockBackground
64x64
DVS 2D rectified
histogram of
2k events
(0.1Hz – 1kHz rate)
MaxPool
2x2
16x30x30
Conv 3x3
32x28x28
Conv 1x1
+
MaxPool 2x2
128x1x1
MaxPool 2x2
32x14x14
Conv 3x3
64x12x12
MaxPool 2x2
64x6x6
Conv 3x3
128x4x4
MaxPool
2x2
128x2x2
240x180 DVS “frames”
Conventional 5-layer LeNet with ReLU/MaxPool and 1 FC layer before output.
I.-A. Lungu, F. Corradi, and T. Delbruck, “Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing
RoShamBo,” in 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.
RoShamBo training images
I.-A. Lungu, F. Corradi, and T. Delbruck, “Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing
RoShamBo,” in 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.
36
A. Aimar, E. Calabrese, H. Mostafa, A. Rios-Navarro, R. Tapiador, I.-A. Lungu, A. Jimenez-Fernandez, F.
Corradi, S.-C. Liu, A. Linares-Barranco, and T. Delbruck, “Nullhop: Flexibly efficient FPGA CNN accelerator
driven by DAVIS neuromorphic vision sensor,” in NIPS 2016, Barcelona, 2016.
Conclusions1. The DVS was developed by following a neuromorphic approach of
emulating key properties of biological retinas
2. The wide dynamic range and sparse, quick output make these sensors useful in real time uncontrolled conditions
3. Applications could include vision prosthetics, surveillance, robotics and consumer electronics
4. The precise timing could improve learning and inference
5. The main challenges are to reduce pixel size and to develop effective algorithms. Only industry can do the first but academia has plenty of room to play for the second.
6. Event sensors can nicely drive deep inference. There is a lot of room for improvement of deep inference power efficiency at the system level! 37