autocalib: automatic calibration of traffic cameras at...

38
AutoCalib: Automatic Calibration of Traffic Cameras at Scale Romil Bhardwaj , Gopi Krishna Tummala*, Ganesan Ramalingam , Ramachandran Ramjee , Prasun Sinha* Microsoft Research, *The Ohio State University

Upload: others

Post on 23-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

  • AutoCalib: Automatic Calibration of Traffic Cameras at Scale

    Romil Bhardwaj†, Gopi Krishna Tummala*, Ganesan Ramalingam†, Ramachandran Ramjee†, Prasun Sinha*

    †Microsoft Research, *The Ohio State University

  • 50

    150

    250

    350

    450

    2012 2013 2014 2015 2016

    Nu

    mb

    er

    of

    Cam

    era

    s (M

    illio

    n)

    Number of Security Cameras Worldwide

    Source: IHS

  • Conventional Traffic Camera Uses

    Post-facto Incident ReviewManual Surveillance

  • Emerging Traffic Camera Use Cases

    Vehicle Speed Measurement(without dedicated sensors)

    Traffic Analytics Near Miss Stats

    All require distance measurements in the scene

  • Measuring Distances in an Image

    220 px = 8 m

    220 px = 34 m

    Camera CalibrationReal-world Coordinates (m) Image Coordinates (px)

  • Camera Calibration

    𝑦 =𝑓𝑥 0 𝑐𝑥0 𝑓𝑦 𝑐𝑦0 0 1

    𝑟11 𝑟12 𝑟13 𝑡1𝑟21 𝑟22 𝑟23 𝑡2𝑟31 𝑟32 𝑟33 𝑡3

    𝑥

    Intrinsic Matrix(Focal length, camera center)

    Extrinsic Matrix(Rotation, Translation)

    ImageCoordinates

    Real WorldCoordinates

    𝑇

    𝑅

  • “Hard” Calibration

    𝑦 =𝑓𝑥 0 𝑐𝑥0 𝑓𝑦 𝑐𝑦0 0 1

    𝑟11 𝑟12 𝑟13 𝑡1𝑟21 𝑟22 𝑟23 𝑡2𝑟31 𝑟32 𝑟33 𝑡3

    𝑥

    Intrinsic Matrix(Focal length, camera center)

    Extrinsic Matrix(Rotation, Translation)

    ImageCoordinates

    Real WorldCoordinates

    Not Scalable!

  • “Soft” Calibration

    𝑦 =𝑓𝑥 0 𝑐𝑥0 𝑓𝑦 𝑐𝑦0 0 1

    𝑟11 𝑟12 𝑟13 𝑡1𝑟21 𝑟22 𝑟23 𝑡2𝑟31 𝑟32 𝑟33 𝑡3

    𝑥

    Intrinsic Matrix(Focal length, camera center)

    Extrinsic Matrix(Rotation, Translation)

    ImageCoordinates

    Real WorldCoordinates

    ≈EPnP Solver

  • “Soft” Calibration - Prior Art

    Chessboard Calibration

    Vanishing Points

    Geometric Landmarks

    No Chessboard Patternsin Traffic Views

    Assumption ofStraight Line Motion

    Assumption ofLandmarks

  • AutoCalib Overview

    AutoCalib𝑇

    𝑅

    Traffic Video Calibration Estimate

    AutoCalib: no humans-in-the-loop, robust calibration

  • Video FramesVehicle

    DetectionKeypoint

    Extraction

    Calibrations Set

    Vehicle Geometric Dimensions

    Calibration

    Geometry based filters

    Calibration Values

    Cropped Image Vehicle Keypoints

    𝑌𝐶

    𝑋𝐶𝑍𝐶

    𝑋𝐺

    𝑌𝐺

    𝑍𝐺

    𝑅, 𝑇𝑹𝟏, 𝑻𝟏𝑹𝟐, 𝑻𝟐

    :

    AutoCalib - Pipeline

  • Vehicle Detection

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Vehicle Detection

    • Off-the-shelf DNNs (Fast-RCNN, YOLO) promise state of the art accuracy• Expensive, scene often empty

    • Background Subtraction is fast• Inaccurate

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    Solution - Trigger the DNN with Background Subtraction

  • Key-point Extraction

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Key-point Selection

    Desired Properties

    1. Visually Distinct

    • Ease of detection

    2. Non-planar

    • Robust Calibrations

    vs

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Key-point Extraction

    • Statistical vision based techniques aren’t robust to lighting variations

    • DNNs require a lot of labelled data• No datasets available

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    Transfer learn a DNN on a smaller dataset

  • Transfer Learning - Primer

    Convolution and Pooling Layers(Generic Features)

    Fully Connected Layers(Car Model Classification)

    Output:BMW 3 Series

  • Transfer Learning - Primer

    Convolution and Pooling Layers(Generic Features)

    Fully Connected Layers(now detecting key-points)

    Output:Key-points (x,y)

    Transfer Learning - Less Data, Faster Training

  • Key-point DNN Dataset

    • Manually labelled key-points on 486 car images

    • Image Augmentation

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    Original Img Horz Mirror Horz Mirror Rotate

    Horz Mirror Crop

    Original Crop

    Original Rotate

    Total of 10,344 images post augmentation

  • Key-point DNN Training• GoogLeNet architecture trained on CUHK CompCars dataset (CVPR ‘15)

    for Car make/model classification

    • Replaced last two fully connected layers with keypoint regression outputs

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Key-point DNN Performance

    ~80% of Key-points < 10% error

  • Calibration Estimation

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    𝑦 =𝑓𝑥 0 𝑐𝑥0 𝑓𝑦 𝑐𝑦0 0 1

    𝑟11 𝑟12 𝑟13 𝑡1𝑟21 𝑟22 𝑟23 𝑡2𝑟31 𝑟32 𝑟33 𝑡3

    𝑥

    Intrinsic Matrix(Focal length, camera center)

    Extrinsic Matrix(Rotation, Translation)

    ImageCoordinates

    Real WorldCoordinates

  • Vehicle Identification at low resolution…

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    … is hard!(for both, humans and machines)

  • Can’t identify… so, approximate!

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    R1, T1

    R3, T3

    R2, T2

    Calibrate

    n Modelsn Calibrations

    (Toyota Prius, Toyota Corolla, Honda Civic, Volkswagen Jetta, BMW 320i, Audi A4, etc.)

    Calibrate with most popular cars

  • Errors in Calibration

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    Key-point Prediction ErrorsModel Approximation Errors

    Statistical filters to remove outliers and average

  • Key Insight 1

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

    Ground plane should be consistent across all Calibrations

  • The Orientation Filter

    1. For calibration 𝑅𝑖 , 𝑇𝑖 , its Z-axis orientation Ԧ𝑧

    is defined by vector 𝑅∗,3𝑖

    2. Let Ԧ𝑧𝑎𝑣𝑔 = 𝐴𝑣𝑒𝑟𝑎𝑔𝑒(𝑅∗,3𝑖 )

    3. Pick 𝑛% calibrations with the least deviation

    between Ԧ𝑧 and Ԧ𝑧𝑎𝑣𝑔

    𝑅1, 𝑇1

    𝑅2, 𝑇2

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Key Insight 2

    Distance to a fixed point must be consistent across Calibrations

    𝑑

    𝑝

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • The Displacement Filter

    • Focus region: Region where cars are detected

    • For each Calibration:

    1. Point 𝑝𝑖 = projection of center of focus region on the ground plane

    using (𝑅𝑖 , 𝑇𝑖)

    2. 𝑑𝑖 = Distance of 𝑝𝑖 to camera

    • Pick middle 𝑛% and filter the rest

    𝑑𝑖

    𝑝𝑖

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Filtering Overview

    Orientation Filter (75%)

    Displacement Filter (50%)

    Average Rotation Matrix

    Orientation Filter (75%)

    (𝑅𝑓𝑖𝑛𝑎𝑙 , 𝑇𝑓𝑖𝑛𝑎𝑙)

    (𝑅1, 𝑇1) (𝑅2, 𝑇2) (𝑅3, 𝑇3)

    … . .

    Displacement Filter (Pick median)

    (𝑅𝑎𝑣𝑔, 𝑇1) (𝑅𝑎𝑣𝑔, 𝑇

    2) … . .

    Video Frames Vehicle DetectionKeypoint

    ExtractionCalibrations SetCalibration

    Geometry based filters

    Calibration Values

  • Implementation

    Azure Service – 4 Tesla K80s, 224 GB RAM

    < 12% error with ~8 minutes of video

  • Evaluation - Dataset

    • 350+ hours from 10 traffic cameras in

    Seattle

    • Resolution - 640x360 to 1280x720

    • Ground truth distances and calibration

    estimated using Google Earth

    A

    B D

    EF G

    Camera Image

    A

    B D

    EF G

    8m

    8m

    12m9m

    Google Earth View

  • Evaluation

  • AutoCalib vs Manual Calibration

    4.8 5.3 5.1 5.5

    8.2

    1.83.0

    5.1

    1.5

    5.9

    9.8

    12.3

    7.9

    10.6 11.1

    6.7

    10.211.1

    5.1 5.1

    0

    4

    8

    12

    16

    20

    C1 C2 C3 C4 C5 C6 C7 C8 C9 C10

    RM

    S E

    rror

    (%)

    Ground Distance Measurement, RMS Error (%)

    Manual Calibration AutoCalib EstimateAutoCalib achieves

  • AutoCalib vs Prior Art

    9.812.3

    7.910.6 11.1 6.7 10.2

    11.1

    5.15.1

    16.8 14.920.3

    28.8

    15.8

    5.4

    23.019.4

    14.7

    56.8

    0

    10

    20

    30

    40

    50

    60

    C1 C2 C3 C4 C5 C6 C7 C8 C9 C10

    RM

    S E

    rro

    r (%

    )

    Ground Distance Measurement, RMS Error (%)

    AutoCalib Calibration VP Approach [1]

    [1] Dubská et al., Fully automatic Roadside Camera Calibration for Traffic Surveillance. IEEE ITS 2015

    AutoCalib outperforms prior state of the art approaches

  • Does more video data help?

    AutoCalib converges with increasing vehicle detections

  • Application – Speed Measurement

  • AutoCalib Summary

    • Camera Calibration

    • Enables distance measurements

    • Highly manual today

    • AutoCalib

    • Scalable automatic calibration

    • Uses DNNs to analyze vehicle geometry

    • Experiments

    • < 12% error in measuring distances

    • Calibrates with few hundred detections