AutoCalib: Automatic Calibration of Traffic Cameras at Scale
Romil Bhardwaj†, Gopi Krishna Tummala*, Ganesan Ramalingam†, Ramachandran Ramjee†, Prasun Sinha*
†Microsoft Research, *The Ohio State University
-
Number of Security Cameras Worldwide
[Figure: number of cameras (millions) per year, 2012 to 2016; y-axis from 50 to 450]
Source: IHS
-
Conventional Traffic Camera Uses
Post-facto Incident Review
Manual Surveillance
-
Emerging Traffic Camera Use Cases
Vehicle Speed Measurement (without dedicated sensors)
Traffic Analytics
Near Miss Stats
All require distance measurements in the scene
-
Measuring Distances in an Image
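220 px = 8 m
220 px = 34 m
The same pixel length corresponds to different real-world distances depending on where it lies in the image.
Camera Calibration: Real-world Coordinates (m) ↔ Image Coordinates (px)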
-
Camera Calibration
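$$
y =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
x
$$
$y$: Image Coordinates
$x$: Real World Coordinates
First matrix: Intrinsic Matrix (Focal length, camera center)
Second matrix: Extrinsic Matrix (Rotation $R$, Translation $T$)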
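A minimal sketch of the projection equation on this slide, in plain Python. The focal length, image size and camera pose are assumed values chosen only for illustration, and `project` is a hypothetical helper name. The division by the third homogeneous component is the step that turns the matrix product into pixel coordinates.

```python
def project(K, R, T, X):
    """Project a world point X (metres) to pixel coordinates (u, v)."""
    # Extrinsic matrix: world frame -> camera frame, Xc = R X + T
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    # Intrinsic matrix: camera frame -> homogeneous image coordinates
    h = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Divide by the third component to obtain pixels
    return (h[0] / h[2], h[1] / h[2])


# Assumed values: 1000 px focal length, 1280x720 image, camera at the origin
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]

# The same 2 m segment, seen 10 m and 40 m in front of the camera
near_px = project(K, R, T, [2.0, 0.0, 10.0])[0] - project(K, R, T, [0.0, 0.0, 10.0])[0]
far_px = project(K, R, T, [2.0, 0.0, 40.0])[0] - project(K, R, T, [0.0, 0.0, 40.0])[0]
print(near_px, far_px)
```

With these assumed numbers the 2 m segment spans 200 px when near and 50 px when far, which is the effect the "220 px = 8 m" versus "220 px = 34 m" slide illustrates.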
-
“Hard” Calibration
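$$
y =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
x
$$
$y$: Image Coordinates
$x$: Real World Coordinates
First matrix: Intrinsic Matrix (Focal length, camera center)
Second matrix: Extrinsic Matrix (Rotation, Translation)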
Not Scalable!
-
“Soft” Calibration
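$$
y =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
x
$$
$y$: Image Coordinates
$x$: Real World Coordinates
First matrix: Intrinsic Matrix (Focal length, camera center)
Second matrix: Extrinsic Matrix (Rotation, Translation)
≈ EPnP Solver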
-
“Soft” Calibration - Prior Art
Chessboard Calibration: No Chessboard Patterns in Traffic Views
Vanishing Points: Assumption of Straight Line Motion
Geometric Landmarks: Assumption of Landmarks
-
AutoCalib Overview
Traffic Video → AutoCalib → Calibration Estimate $(R, T)$
AutoCalib: no humans-in-the-loop, robust calibration
-
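AutoCalib - Pipeline
Video Frames → Vehicle Detection → Cropped Image → Keypoint Extraction → Vehicle Keypoints
Vehicle Keypoints + Vehicle Geometric Dimensions → Calibration → Calibrations Set $(R^1, T^1), (R^2, T^2), \dots$
Calibrations Set → Geometry based filters → Calibration Values $(R, T)$
[Figure: coordinate axes $X_C, Y_C, Z_C$ and $X_G, Y_G, Z_G$]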
-
Vehicle Detection
Video Frames → Vehicle Detection → Keypoint Extraction → Calibration → Calibrations Set → Geometry based filters → Calibration Values
-
Vehicle Detection
• Off-the-shelf DNNs (Fast-RCNN, YOLO) promise state of the art accuracy
  • Expensive, scene often empty
• Background Subtraction is fast
  • Inaccurate
Solution - Trigger the DNN with Background Subtraction
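One way to picture this trigger, as a sketch only: a cheap per-pixel difference against a background image decides whether the expensive detector runs at all. The static background, both thresholds and the `detector` callback are assumptions for illustration; the slides do not say which background subtraction method or trigger rule AutoCalib uses.

```python
def changed_fraction(frame, background, threshold=25):
    """Fraction of pixels whose grey level differs from the background."""
    total = 0
    changed = 0
    for row, bg_row in zip(frame, background):
        for pixel, bg_pixel in zip(row, bg_row):
            total += 1
            if abs(pixel - bg_pixel) > threshold:
                changed += 1
    return changed / total


def detect_vehicles(frames, background, detector, min_fraction=0.05):
    """Run the expensive detector only on frames with enough foreground."""
    results = []
    for index, frame in enumerate(frames):
        if changed_fraction(frame, background) >= min_fraction:
            results.append((index, detector(frame)))
    return results


# Tiny 4x4 grey-level frames: two empty scenes and one with a bright object
background = [[0] * 4 for _ in range(4)]
empty = [[0] * 4 for _ in range(4)]
busy = [[0] * 4 for _ in range(2)] + [[200] * 4 for _ in range(2)]

calls = []

def dummy_detector(frame):
    # Stand-in for a DNN detector; records how often it is invoked
    calls.append(1)
    return ["vehicle"]

hits = detect_vehicles([empty, busy, empty], background, dummy_detector)
print(hits, len(calls))
```

In this toy run the stand-in detector is invoked once for three frames, which is the saving the slide is after when the scene is often empty.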
-
Key-point Extraction
-
Key-point Selection
Desired Properties
1. Visually Distinct
• Ease of detection
2. Non-planar
• Robust Calibrations
-
Key-point Extraction
• Statistical vision based techniques aren’t robust to lighting variations
• DNNs require a lot of labelled data
  • No datasets available
Transfer learn a DNN on a smaller dataset
-
Transfer Learning - Primer
Convolution and Pooling Layers (Generic Features)
Fully Connected Layers (Car Model Classification)
Output: BMW 3 Series
-
Transfer Learning - Primer
Convolution and Pooling Layers (Generic Features)
Fully Connected Layers (now detecting key-points)
Output: Key-points (x, y)
Transfer Learning - Less Data, Faster Training
-
Key-point DNN Dataset
• Manually labelled key-points on 486 car images
• Image Augmentation
[Figure: augmentation examples. Panels: Original Img, Original Crop, Original Rotate, Horz Mirror, Horz Mirror Crop, Horz Mirror Rotate]
Total of 10,344 images post augmentation
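When a labelled image is augmented, its key-point labels have to be transformed with it. The sketch below shows this for the horizontal mirror only. The key-point names and the `swap_pairs` table are hypothetical, since the slides do not list which key-points were labelled; the left/right swap is an assumption that applies whenever the labels distinguish the two sides of the car.

```python
def mirror_sample(image, keypoints, swap_pairs):
    """Horizontally mirror an image and its key-point labels.

    image: list of pixel rows
    keypoints: dict mapping a key-point name to (x, y) in pixels
    swap_pairs: dict mapping each left/right name to its counterpart
    """
    width = len(image[0])
    mirrored_image = [row[::-1] for row in image]
    mirrored_keypoints = {}
    for name, (x, y) in keypoints.items():
        # A mirrored left-side point now looks like a right-side point
        new_name = swap_pairs.get(name, name)
        mirrored_keypoints[new_name] = (width - 1 - x, y)
    return mirrored_image, mirrored_keypoints


# Hypothetical 4x2 image and key-point names, for illustration only
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
keypoints = {"left_headlamp": (0, 1), "right_headlamp": (2, 1)}
swap_pairs = {"left_headlamp": "right_headlamp",
              "right_headlamp": "left_headlamp"}

flipped_image, flipped_keypoints = mirror_sample(image, keypoints, swap_pairs)
print(flipped_image, flipped_keypoints)
```

Crops and rotations need the same treatment: the key-point coordinates are shifted or rotated by the same transform as the pixels.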
-
Key-point DNN Training
• GoogLeNet architecture trained on CUHK CompCars dataset (CVPR '15) for Car make/model classification
• Replaced last two fully connected layers with keypoint regression outputs
-
Key-point DNN Performance
~80% of Key-points < 10% error
-
Calibration Estimation
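$$
y =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
x
$$
$y$: Image Coordinates
$x$: Real World Coordinates
First matrix: Intrinsic Matrix (Focal length, camera center)
Second matrix: Extrinsic Matrix (Rotation, Translation)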
-
Vehicle Identification at low resolution…
… is hard! (for both humans and machines)
-
Can’t identify… so, approximate!
n Models (Toyota Prius, Toyota Corolla, Honda Civic, Volkswagen Jetta, BMW 320i, Audi A4, etc.) → Calibrate → n Calibrations $(R^1, T^1), (R^2, T^2), (R^3, T^3)$
Calibrate with most popular cars
-
Errors in Calibration
Key-point Prediction Errors
Model Approximation Errors
Statistical filters to remove outliers and average
-
Key Insight 1
Ground plane should be consistent across all Calibrations
-
The Orientation Filter
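1. For calibration $(R^i, T^i)$, its Z-axis orientation $\vec{z}^{\,i}$ is defined by the vector $R^i_{*,3}$ (the third column of $R^i$)
2. Let $\vec{z}_{avg} = \mathrm{Average}(R^i_{*,3})$
3. Pick the $n\%$ of calibrations with the least deviation between $\vec{z}^{\,i}$ and $\vec{z}_{avg}$
[Figure: calibrations $(R^1, T^1)$ and $(R^2, T^2)$]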
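The three steps of the orientation filter can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the average is a component-wise mean that is then normalised, and that "deviation" means the angle between the two vectors. `orientation_filter` and `rot_x` are hypothetical names.

```python
import math


def z_axis(R):
    """Third column of R, i.e. the Z-axis orientation of a calibration."""
    return [R[0][2], R[1][2], R[2][2]]


def orientation_filter(calibrations, keep_fraction):
    """Keep the calibrations whose Z axis deviates least from the average.

    calibrations: list of (R, T) pairs, with R a 3x3 nested list
    """
    axes = [z_axis(R) for R, _ in calibrations]
    # Step 2: average the Z axes, then normalise to unit length
    avg = [sum(a[k] for a in axes) / len(axes) for k in range(3)]
    norm = math.sqrt(sum(c * c for c in avg))
    avg = [c / norm for c in avg]

    def deviation(axis):
        # Angle to the average axis; rotation matrix columns are unit vectors
        cos_angle = sum(a * b for a, b in zip(axis, avg))
        return math.acos(max(-1.0, min(1.0, cos_angle)))

    # Step 3: keep the fraction with the smallest deviation, in input order
    order = sorted(range(len(calibrations)), key=lambda i: deviation(axes[i]))
    keep = max(1, int(keep_fraction * len(calibrations)))
    return [calibrations[i] for i in sorted(order[:keep])]


def rot_x(angle):
    """Rotation about the X axis, used here only to build test calibrations."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


# Three calibrations that roughly agree and one outlier tilted by 1 radian
calibrations = [(rot_x(a), [0.0, 0.0, 10.0]) for a in (0.0, 0.05, -0.05, 1.0)]
kept = orientation_filter(calibrations, 0.75)
print(len(kept))
```

With a 75% keep fraction the tilted outlier is the one calibration dropped from this toy set.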
-
Key Insight 2
Distance to a fixed point must be consistent across Calibrations
[Figure: fixed point $p$ at distance $d$ from the camera]
-
The Displacement Filter
• Focus region: Region where cars are detected
• For each Calibration:
  1. Point $p_i$ = projection of the center of the focus region onto the ground plane using $(R^i, T^i)$
  2. $d_i$ = distance of $p_i$ to the camera
• Pick the middle $n\%$ and filter out the rest
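A sketch of the displacement filter under stated assumptions: the ground plane is taken to be Z = 0 in the real-world frame, the intrinsic matrix has the zero-skew form shown in the calibration equation, and the viewing ray is assumed to hit the ground in front of the camera. All function names are hypothetical.

```python
import math


def camera_center(R, T):
    """Camera position in the real-world frame: C = -R^T T."""
    return [-sum(R[j][i] * T[j] for j in range(3)) for i in range(3)]


def ground_point(K, R, T, pixel):
    """Intersect the viewing ray through `pixel` with the plane Z = 0."""
    u, v = pixel
    # Ray direction in the camera frame (K has zero skew)
    ray_cam = [(u - K[0][2]) / K[0][0], (v - K[1][2]) / K[1][1], 1.0]
    # Rotate the ray into the real-world frame: R^T * ray_cam
    ray = [sum(R[j][i] * ray_cam[j] for j in range(3)) for i in range(3)]
    C = camera_center(R, T)
    s = -C[2] / ray[2]  # ray parameter at which Z reaches 0
    return [C[k] + s * ray[k] for k in range(3)]


def displacement_filter(K, calibrations, focus_center, keep_fraction):
    """Keep the middle `keep_fraction` of calibrations ranked by d_i."""
    ranked = []
    for R, T in calibrations:
        p = ground_point(K, R, T, focus_center)
        d = math.dist(p, camera_center(R, T))
        ranked.append((d, (R, T)))
    ranked.sort(key=lambda item: item[0])
    keep = max(1, int(keep_fraction * len(ranked)))
    start = (len(ranked) - keep) // 2
    return [cal for _, cal in ranked[start:start + keep]]


def looking_down(height):
    """Test calibration: camera `height` metres above the ground, facing down."""
    R = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    return (R, [0.0, 0.0, height])


# Assumed intrinsics; the focus-region center is placed at the image center
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
calibrations = [looking_down(h) for h in (40.0, 10.0, 5.0, 11.0)]
kept = displacement_filter(K, calibrations, (640.0, 360.0), 0.5)
print([T[2] for _, T in kept])
```

In this toy set the distances are 5, 10, 11 and 40 m, so the middle 50% keeps the 10 m and 11 m calibrations and discards both extremes.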
-
Filtering Overview
[Figure: filtering flow diagram]
Input: $(R^1, T^1), (R^2, T^2), (R^3, T^3), \dots$
Stages: Orientation Filter (75%), Displacement Filter (50%), Average Rotation Matrix, Orientation Filter (75%), Displacement Filter (Pick median)
Intermediate: $(R_{avg}, T^1), (R_{avg}, T^2), \dots$
Output: $(R_{final}, T_{final})$
-
Implementation
Azure Service – 4 Tesla K80s, 224 GB RAM
< 12% error with ~8 minutes of video
-
Evaluation - Dataset
• 350+ hours from 10 traffic cameras in Seattle
• Resolution - 640x360 to 1280x720
• Ground truth distances and calibration estimated using Google Earth
[Figure: points A, B, D, E, F, G marked in the Camera Image and in the Google Earth View, with ground distances of 8 m, 8 m, 12 m and 9 m]
-
Evaluation
-
AutoCalib vs Manual Calibration
Ground Distance Measurement, RMS Error (%)
Camera               C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
Manual Calibration   4.8   5.3   5.1   5.5   8.2   1.8   3.0   5.1   1.5   5.9
AutoCalib Estimate   9.8   12.3  7.9   10.6  11.1  6.7   10.2  11.1  5.1   5.1
AutoCalib achieves
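The slides do not spell out how the RMS error (%) is computed. A plausible reading, shown here as an assumption, is the root mean square of the relative errors between estimated and ground-truth distances. The ground-truth values reuse the 8 m, 8 m, 12 m and 9 m distances from the dataset slide; the estimated values are made up for illustration.

```python
import math


def rms_error_percent(estimated, ground_truth):
    """RMS of the relative distance errors, expressed in percent."""
    relative = [(e - g) / g for e, g in zip(estimated, ground_truth)]
    return 100.0 * math.sqrt(sum(r * r for r in relative) / len(relative))


ground_truth = [8.0, 8.0, 12.0, 9.0]   # metres, from the Google Earth view
estimated = [8.4, 7.6, 12.6, 9.45]     # hypothetical calibrated measurements
error = rms_error_percent(estimated, ground_truth)
print(round(error, 2))
```

Each made-up estimate is off by 5% of its true length, so the RMS error for this example is 5%.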
-
AutoCalib vs Prior Art
Ground Distance Measurement, RMS Error (%)
Camera                  C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
AutoCalib Calibration   9.8   12.3  7.9   10.6  11.1  6.7   10.2  11.1  5.1   5.1
VP Approach [1]         16.8  14.9  20.3  28.8  15.8  5.4   23.0  19.4  14.7  56.8
[1] Dubská et al., Fully automatic Roadside Camera Calibration for Traffic Surveillance. IEEE ITS 2015
AutoCalib outperforms prior state of the art approaches
-
Does more video data help?
AutoCalib converges with increasing vehicle detections
-
Application – Speed Measurement
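Once a calibration maps image points to ground coordinates, speed follows from distance over time. The sketch below assumes the vehicle's ground-plane positions (in metres) at two frames are already available, for example by back-projecting a tracked image point with the calibration. The function name, the positions and the frame rate are illustrative assumptions.

```python
import math


def speed_kmh(p_start, p_end, frame_start, frame_end, fps):
    """Average speed between two ground-plane positions given in metres."""
    distance_m = math.dist(p_start, p_end)
    elapsed_s = (frame_end - frame_start) / fps
    return 3.6 * distance_m / elapsed_s


# A vehicle that covers 20 m of road in 30 frames of 30 fps video
speed = speed_kmh((0.0, 0.0), (0.0, 20.0), frame_start=0, frame_end=30, fps=30.0)
print(speed)
```

Here 20 m in one second gives 20 m/s, or 72 km/h. Any error in the calibrated distance carries over proportionally to the speed estimate.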
-
AutoCalib Summary
• Camera Calibration
  • Enables distance measurements
  • Highly manual today
• AutoCalib
  • Scalable automatic calibration
  • Uses DNNs to analyze vehicle geometry
• Experiments
  • < 12% error in measuring distances
  • Calibrates with few hundred detections