eta: extensible toolkit for video analytics

68
voxel51.com 410 N 4th Ave., 3rd Floor, Ann Arbor Jason Corso, PhD [email protected] ETA: Extensible Toolkit for Video Analytics PSCR Public Safety Broadband Stakeholder Meeting Brian Moore, PhD [email protected]

Upload: others

Post on 18-Dec-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Jason Corso, [email protected]

ETA: Extensible Toolkit for Video AnalyticsPSCR Public Safety Broadband Stakeholder Meeting

Brian Moore, [email protected]

2

DISCLAIMER

This presentation was produced by guest speaker(s) and presented at the National Institute of Standards

and Technology’s 2019 Public Safety Broadband Stakeholder Meeting. The contents of this

presentation do not necessarily reflect the views or policies of the National Institute of Standards and

Technology or the U.S. Government.

Posted with permission

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

How do we get from here

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

To here?And from here

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

We train models. To here?

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

By creating an

open-platform for

video analytics

6

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

NIST Grant: The ETA Project

The Extensible Toolkit for Analytics (ETA) is our open-source library for video analytics.

Our Technical Goal● Establish an extensible video analytics infrastructure that will support next-generation

capabilities in public safety applications.

Our Programmatic Goal● Catalyze a community of innovation that brings public safety video analytics up-to-date with

cutting edge computer vision research.

7

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

What is Hard About This?

● How to leverage the state of the art in computer vision and machine learning?● How to spread these capabilities to broad uses in public safety?● How to integrate with existing and proprietary stacks?● How to scale?● How to incentivize stakeholders?

8

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Video-First Analytics

Standard Approaches are Frame-Based Voxel51 is Video First

Walking!Running? Walking? Standing?

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Video-First Analytics

Standard Approaches are Frame-Based Voxel51 is Video First

Walking!Standing? Walking?

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Video-First Analytics

12.8% improvement

Standard Approaches are Frame-Based Voxel51 is Video First

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Streaming Analytics

Time

Compute Nodes

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Extensible Analytics

● Platform-SDK○ Enables third-party party

analytic developers to easily deploy and integrate their analytics on our platform.

○ Directly integrates into the overall ETA infrastructure.

● Available on GitHub

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Example Third-Party: Privacy-Preserving Analytic

Original Anonymized Faces

● Handle FOIA requests with automated ease.

https://egovid.com

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

VehicleSense: fine-grained vehicle recognition

https://voxel51.com/senses/vehicle-sense

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

RoadSense: adds road signs and road scenes

https://voxel51.com/senses/road-sense

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

18

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

ETA System Design19

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

ETA System I/O

Video Input● Users connect data asynchronously to the Voxel51 Platform

Data-Store● Stores user-uploaded data, internal metadata, models, parameter files, and analytics outputs● Storage is implemented as a collection of scalable buckets in Google Cloud or AWS

Analytics Output● Returned in a simple, flexible JSON format● Users can download analytics asynchronously from the Voxel51 Platform

20

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

ETA System Analytics

Core Video Processing● Low-level video operations such as sampling, resizing, optical flow, and segmentation● Preprocessing step for all input video● Generates an internal representation of the video that is amenable for subsequent analysis

Core Video Analytics● Implements core computer vision analytics such as detection, tracking, classification, and

recognition● Relies heavily on open-source libraries (OpenCV, TensorFlow, Keras, etc.) and publicly available

research implementations of modern vision algorithms

21

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

API Design22

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

PlatformConsole

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

PlatformConsole

JobsBoard

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Scoop pulls this all together25

https://voxel51.com/demo

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Faceted Search● Direct access to ontology

○ Search meaningful

○ Create complex logical queries

● Labels schema

○ Schema enforced by system

○ User-uploadable

○ Automatically generated (later)

● Backend builds an index of content and returns slice of dataset in real-time

● Let’s you drill into specific concerns

voxel51.com410 N 4th Ave., 3rd Floor, Ann Arbor

Insights

● Backend index provides faceted-search results

● Summarized

○ Occurrence statistics across ontology

○ Co-occurrence statistics with facet

○ Spatial information

● All individual resources can be studied independently as well

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Open-Source

● ETA – Vision Engine Code

● API-PY – Python API Client Library

● API-JS – Javascript API Client Lib.

● Platform-SDK – Wrap your custom vision methods to be deployed on the platform at scale.

● Coming Soon!○ Player51 – State of the art Javascript

video player

● All at https://github.com/voxel51

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

PlatformDocs

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arborhttps://voxel51.com

Case Study: Mapping Smart Cities

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Case Study

Data: GPS Tagged Dashcam Video Goal: Map all road signs in the city

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Existing Frame-based Methods

Many spurious detections Unusable for mapping

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Voxel51 Video-Based Model

More accurate Enables mapping

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Newton, MA

2.6 Days of Video

Few Hundred Miles

10 Hours of Compute

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Frame-Based Models Voxel51 Video-Based Model

Unable to determine number and location of signs Correctly localizes the signs

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Frame-BasedRoad Sign Map

Unusable information

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Voxel51Road Sign Map

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arborhttps://voxel51.com

Case Study Teaser: Baltimore CitiWatch

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Fight Detected

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

[email protected]://voxel51.com/demo

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Come back for the

Next Session

2:25 PM

41#PSCR2019

voxel51.com410 N 4th Ave, 3rd Floor, Ann ArborBackup

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Deployment Strategy 43

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

The ETA Library

44

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

45ETA Library

• Modular software infrastructure• Core infrastructure implemented in Python to maximize accessibility to the

vision community• Modules are generic and can be implemented in arbitrary languages (e.g.

C++) and even hosted remotely in third-party clouds• Modules are connected to form pipelines that are run to compute the

requested analytics• Analytics are stored in a simple, flexible JSON format

The ETA System is a dynamically configurable graph whose nodes are modules and whose edges represent data flowing through the system

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Open-Source on GitHub! 46

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

47ETA Modules

• ETA modules are simply executables that take JSON files as input and write their outputs to disk

• Input configuration JSON files specify:– The locations of the input data on disk to process– The parameter values to use– The locations on disk to write the output data

• The ETA library defines a rich type system that all modules use to declare their I/O types

– Allows modules to communicate seamlessly

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

48

• Each module provides a metadata JSON file– Enumerates the module’s inputs,

outputs, and parameters– Defines the type of each field

• Complete description of a module’s operation

• Enables modules to be implemented in any language

• Generated automatically from source for modules built on ETA

Module Metadata Files

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

49

• Block diagram representation of a simple object detector:

An ETA Module Visualized

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

50Developing Modules in ETA

• Powerful yet simple Python framework for building modules

• Built on popular open-source libraries (OpenCV, TensorFlow)

• Extensive core video library– Reading, writing, iterating over video

frames– Supports all common video codecs

• Robust interface for serializing data to disk

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

51Writing Modules in ETA

ETA Module Definition

Example JSON Config

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

52Core Video Analytics: NowCurrent Core Video Analytics

• Deep (VGG-16) and standard (SURF, ORB, weighted histogram, etc.) embeddings• Deep (C3D) and standard (per-frame-voting) video embedding• Object detection• Object tracking• Per-frame and clip-level event detection• Visual image similarity search

Current Road-Related Analytics• Vehicle and road sign detection• Fine-grained vehicle recognition (type, make, model, color)

Current Surveillance-Related Analytics• Action recognition• Target reacquisition

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

53Core Video Analytics: Soon!

Upcoming Road-Related Analytics• Improved fine-grained vehicle recognition

– Using object tracking to increase accuracy– Integrating license plate recognition (LPR)

• Road scene understanding (lanes, markings, intersections)

Upcoming Surveillance-Related Analytics• Action recognition models trained on a database of criminal activity video• Assessing building exterior quality from aerial/street images• Target search via linguistic description

Related Efforts• Support for streaming video

We are willing and able to train custom models on your data!

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

54

• Pipelines are defined by simple metadata JSON files

• Describes the module connections that define the execution graph

• Sets parameters for the constituent modules

• Exposes tunable parameters to the user

Pipeline Metadata Files

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

55

• Block diagram representation of a simple pipeline:

An ETA Pipeline Visualized

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

The Voxel51 Vision Services API

56

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

57Vision Services API

• Canonical cloud-based interface through which users interact with ETA

• Accessible via standard HTTPS requests

• Also provide client libraries to access the API programmatically from Python and JavaScript

• Lives in Google Cloud, with plans to provide a parallel deployment in Amazon Web Services

• Full documentation at https://voxel51.com/docs/api

The Vision Services API is currently in private beta. Visit our website at https://voxel51.com to request an invitation to use our platform!

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

58Vision Services API Design

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

59Voxel51 Vision Engine

• Scalable cluster of Docker containers, each of which contains a copy of the ETA Library

• Deployed via Kubernetes, a open-source platform for managing containerized workloads

• Cluster scales automatically to allocate resources on-demand in response to user job requests

• Runs jobs asynchronously as resources become available

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

60Typical User Workflow

• Register for our API and obtain an authentication token (one-time only)• Upload data to the cloud via the API• Schedule analytics to be run on your data by issuing job requests via the

API

(Tasks run asynchronously in the Voxel51 Cloud)

• Periodically ping the API to check the status of your jobs• When tasks finish, issue an API request to download the outputs, which are

concisely represented in a flexible JSON format

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

61Example ApplicationsTarget Reacquisition• User provides a video corpus (e.g. surveillance camera network) and

example image(s) of an entity of interest• System queries the corpus and returns best-matches for the entity across

the corpusLinguistic Description Retrieval• User provides a video corpus and a natural language description of an entity

of interest– Ex: “red pickup with a dented front bumper”

• System queries the corpus and returns best-matchesTarget reacquisition is currently available on our beta platform!

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Target Reacquisition

• Toy example of a target reacquisition query:

62

Visual Query Video Corpus

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

Linguistic Description Query

• Toy example of a linguistic description query:

63

Linguistic Query

“blue sedan with tape holding on the front-bumper”

Video Corpus

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

API Usage: Data Upload

• Uploading data in Python:

64

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

API Usage: Job Request

• Example request for a target reacquisition query:

65

• Query image and video corpus were previously uploaded to the Voxel51 Cloud

• Data referenced by IDs in the job request

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

API Usage: Job Request

• Example request for a linguistic retrieval query:

66

Note that data can be accessed from an external cloud via signed URLs

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

API Usage: Running a Job

• Running a job and checking its status:

67

voxel51.com410 N 4th Ave, 3rd Floor, Ann Arbor

API Usage: Job Output

• Example output for a target reacquisition query:

68