Online Security Analytics On Large Scale Video Surveillance System
Yu Cao, Xiaoyan Guo
EMC Corporation
Security Analytics On Video Surveillance
-- Retail Industry Customer Cases
• Search across video systems in all store locations to identify the customer of a fraudulent card transaction and his/her other transactions
• Correlate register transactions and surveillance video to identify employee fraud transactions where there is no customer present
• If multiple stores in a region are robbed, identify any faces that were in all of those stores in the weeks leading up to the events
Challenges In Big & Fast Data Era
• Cloud Integration
• M&O
• Fast Data Ingestion
• Multi-Latency Analytics
• Scalable Data Storage
EMC Video Analytics Data Lake
Where Spark Resides @ VADL
[Figure: VADL architecture diagram showing where Spark sits. Labels recovered from the diagram:]
• Offline Video Analytics & Model Training
• Online Video Processing and Detection
• Ad-Hoc Video Content Search
• Functions: Object Detection, Feature Extraction, Classification, Abnormal Detection, Face Recognition, Feature Indexing
• Video & Feature Storage
• Analytics Model
• Spark components: Streaming, MLlib & GraphX, Core & SQL
• Deep Learning on Deep Learning Framework
Enable Spark to Process Raw Video Data
• Spark has no built-in video processing capability
• Combine the Spark program (Scala, Java) with a video processing library (C++)
PipedRDD: Invoking External Programs
• PipedRDD[T]: T => Linux command(T) => String, where T is a line of text
• Spark pipe
– The pipe interface takes an external command and executes it outside Spark. The content of the RDD is fed to the program's input stream, and the program's output forms a new RDD
• Java API
– JavaRDD<String> pipe(String command)
– JavaRDD<String> pipe(java.util.List<String> command)
– JavaRDD<String> pipe(java.util.List<String> command, java.util.Map<String,String> env)
– Returns an RDD created by piping elements to a forked external process
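The pipe behaviour described above can be illustrated without a cluster. The following is a minimal local sketch, not Spark code: `pipeLines` is a hypothetical helper that does for a plain list of lines what `pipe` does for the elements of one partition, including the optional environment map. It assumes a Unix-like host where the external command is on the PATH.

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

// Hypothetical helper (not part of Spark): writes each element of `lines`
// to the stdin of an external command, one element per line, and returns
// the lines the command prints to stdout.
def pipeLines(lines: Seq[String],
              command: Seq[String],
              env: Map[String, String] = Map.empty): Seq[String] = {
  val stdin = new ByteArrayInputStream(
    lines.map(_ + "\n").mkString.getBytes("UTF-8"))
  // Process(command, cwd, extraEnv*) forks the program with extra
  // environment variables; #< connects our bytes to its stdin.
  // !! returns stdout and throws if the command exits with a non-zero code.
  val out = (Process(command, None, env.toSeq: _*) #< stdin).!!
  if (out.isEmpty) Seq.empty else out.split("\n").toSeq
}
```

For example, `pipeLines(Seq("frame-001", "frame-002"), Seq("tr", "a-z", "A-Z"))` upper-cases each line with the standard `tr` utility. In a real Spark job the corresponding Scala call would be `rdd.pipe("tr a-z A-Z")`.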
Video Processing Function Implementation
• OpenCV
– Popular open source computer vision library
• Home-grown algorithms, e.g. CNN
• Video Processing Functions
– video file => video transcoding => list of frame images
– frame image => background extraction => background image
– frame image => object detection => list of objects
– object => feature extraction => object features
– ……
Pipeline Video Processing Tasks
• Steps
– Implement all required video processing sub-components as external programs
– Pipeline these processing units by utilizing PipedRDD in Spark jobs
• Pseudo-code (Chaining & Pipeline)
// fromCameraStream and writeToHBase are placeholders, not Spark API calls
sc.fromCameraStream("rtsp://10.67.89.10/road?fps=15")
  .pipe("video_transcoding")
  .pipe("object_detection")
  .pipe("feature_extraction")
  .writeToHBase()
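The chaining idea in the pseudo-code can be tried locally with ordinary Unix tools standing in for the video programs. This is a sketch under stated assumptions: `runStage`, `runPipeline`, and `demoStages` are hypothetical names, and `tr`, `sort`, and `uniq` are only placeholders for the C++ stages, which the slides do not show.

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

// One pipeline stage: run an external command over text lines (stdin -> stdout).
def runStage(lines: Seq[String], command: Seq[String]): Seq[String] = {
  val stdin = new ByteArrayInputStream(
    lines.map(_ + "\n").mkString.getBytes("UTF-8"))
  val out = (Process(command) #< stdin).!!
  if (out.isEmpty) Seq.empty else out.split("\n").toSeq
}

// Chain stages left to right, in the spirit of rdd.pipe(a).pipe(b).pipe(c):
// the output lines of one stage become the input lines of the next.
def runPipeline(input: Seq[String], stages: Seq[Seq[String]]): Seq[String] =
  stages.foldLeft(input)(runStage)

// Stand-in commands only; the real stages would be the video programs.
val demoStages: Seq[Seq[String]] = Seq(
  Seq("tr", "a-z", "A-Z"), // placeholder for video_transcoding
  Seq("sort"),             // placeholder for object_detection
  Seq("uniq")              // placeholder for feature_extraction
)
```

Because every stage boundary is a stream of text lines, each external program presumably has to exchange frames and objects as text (for instance file paths or encoded records) rather than raw binary data.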
Online Video Processing During Ingestion
[Figure: Video Ingestion System]
[Figure: online processing during ingestion. Labels: video streams; Object Detection; Feature Extraction; Classification/Recognition; Indexing/Storing; Deep Learning Platform; Model; Real-time Detection; Real-time Dashboard]
Video Processing in Spark Streaming
• Receive Video Stream
– val snapList = stream.queueStream(rddQueue)
– Read the video stream at a fixed time interval and put the data into msgQueue
– rddQueue += sc.makeRDD(msgQueue)
– Then process snapList
• Spark job run by Spark Streaming
rdd.pipe("video_transcoding")
  .pipe("object_detection")
  .pipe("feature_extraction")
  .writeToHBase()
[Figure labels: Spark Streaming; Spark Job; Feature & object store; Online Video Analytics App]
Video Content Search
• Content-based video object search
– Search for similar objects, given an object instance
– E.g. search for a suspect in historical video records, given the suspect's identification photo
• Semantic-based video object search
– Search for matching objects, given a semantic description
– E.g. search by keywords: "Red Porsche", "a woman sitting and smoking", etc.
Video Content Search Workflow
• Video Pre-processing and Feature Extraction
• Scalable Storage
• Object-based Indexing
• Similar Object Search Engine
[Figure: video content search workflow. Labels: camera streams; video pre-processing (object detection and feature extraction); Object Detection; Feature Extraction; Index Building; HBase Ingestion; Index Ingestion; Web Dashboard; Web Backend; Search Engine; HBase Client; HBase; HDFS; Multi-Tier Video Storage; Feature & Index; Object Info; Original Video Data; query image; features; similar object search; object information query; top-k objects; similar objects]
Video Object Similarity
• Similarity of Features == Similarity of Video Objects
– Color, Texture, Shape
– SIFT
  • 160 features, each is a vector of 128 dimensions
– Deep Learning Features …
[Figure: Local Binary Pattern (LBP)]
[Figure: Deep Learning Features in Different Layers]
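The slide names Local Binary Pattern as a texture feature without giving details. As an illustration only, here is the textbook basic 3x3 LBP on a grayscale image stored as `Array[Array[Int]]`; whether the system uses this variant, and the function names `lbpCode` and `lbpHistogram`, are assumptions.

```scala
// Basic 3x3 LBP: each interior pixel gets an 8-bit code, one bit per
// neighbour, set when the neighbour is at least as bright as the centre.
def lbpCode(img: Array[Array[Int]], r: Int, c: Int): Int = {
  // Neighbours visited clockwise, starting at the top-left corner.
  val offsets = Seq((-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1))
  offsets.zipWithIndex.foldLeft(0) { case (code, ((dr, dc), bit)) =>
    if (img(r + dr)(c + dc) >= img(r)(c)) code | (1 << bit) else code
  }
}

// The texture feature is the 256-bin histogram of codes over all
// interior pixels (border pixels lack a full neighbourhood and are skipped).
def lbpHistogram(img: Array[Array[Int]]): Array[Int] = {
  val hist = new Array[Int](256)
  for (r <- 1 until img.length - 1; c <- 1 until img(r).length - 1)
    hist(lbpCode(img, r, c)) += 1
  hist
}
```

Two image patches can then be compared by applying any vector similarity measure to their histograms, which is the sense in which similarity of features stands in for similarity of video objects.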
Feature Dimensionality Reduction
• PCA
– MLlib version PCA (when D is small)
• Scalable PCA
– Distributed PPCA implemented atop Spark (when D is large)
• LSH (Locality-Sensitive Hashing)
– LSH hashes input items so that similar items map to the same "buckets" with high probability
[Figure labels: Resize, Grayscale, SIFT, PCA]
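To make the LSH bullet concrete, here is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. The slides do not say which LSH family is used, so the choice of family, the class name `HyperplaneLsh`, and its parameters are assumptions.

```scala
import scala.util.Random

// Random-hyperplane LSH (assumed variant): one signature bit per random
// hyperplane, set when the vector lies on the non-negative side of it.
// Vectors separated by a small angle agree on most bits, so they tend
// to land in the same bucket.
final class HyperplaneLsh(dim: Int, numBits: Int, seed: Long) {
  require(numBits > 0 && numBits <= 30, "signature must fit in an Int")
  private val rng = new Random(seed)
  private val planes: Array[Array[Double]] =
    Array.fill(numBits, dim)(rng.nextGaussian())

  def bucket(v: Array[Double]): Int = {
    require(v.length == dim, "vector has wrong dimension")
    planes.zipWithIndex.foldLeft(0) { case (sig, (plane, bit)) =>
      val dot = plane.zip(v).map { case (a, b) => a * b }.sum
      if (dot >= 0) sig | (1 << bit) else sig
    }
  }
}
```

A search would hash the query feature with `bucket` and compare it only against stored features in the same bucket, instead of scanning every feature. More bits give smaller buckets at the cost of more missed neighbours.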
Spark Top-K Query Pipeline
[Figure: top-k query pipeline across Spark workers]
• workers: each worker holds a partition of stored features f(i)
• map: each f(i) is compared with the query feature qf
• order: each worker produces Array[s1, s2, …, sn]
• top-k: the top-k most similar features
Scala Code Example
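import org.apache.spark.SparkContext

def topRankScore(sc: SparkContext, top: Int, queryInput: String, trainPath: String,
                 useMethod: (Array[Double], Array[Double]) => Double) = {
  // Parse the space-separated query feature vector
  val query = queryInput.split(" ").map(_.toDouble)
  // featureFile is not defined on the slide; assumed to be the HDFS feature file at trainPath,
  // with one "<name> <v1> <v2> ..." record per line
  val featureFile = sc.textFile(trainPath)
  featureFile.filter(_.length > 0).map { line =>
    val parts = line.split(" ", 2)
    (useMethod(query, parts(1).split(" ").map(_.toDouble)), parts(0))
  }.takeOrdered(top) // smallest scores first; use .top(top) if a larger score means more similar
    .map(i => (i._2, i._1))
}
topRankScore(sc, topNumber.toInt, imageFeaturesStr, names, cosScore).foreach(println)
Parameters: (sc, top-k, queried feature, HDFS feature file, similarity computing method)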
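The `cosScore` function passed in the call above is not shown on the slide. The sketch below assumes it is plain cosine similarity and adds a hypothetical local helper, `topKLocal`, that applies the same score-then-rank logic to an in-memory list of feature lines so the ranking can be checked without Spark.

```scala
// Assumed definition of cosScore: cosine similarity, where 1.0 means the
// two feature vectors point in the same direction and 0.0 means orthogonal.
def cosScore(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "feature vectors must have equal length")
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
  if (norm == 0.0) 0.0 else dot / norm
}

// Local stand-in for the Spark job (hypothetical helper): score every
// "<name> <v1> <v2> ..." line against the query, keep the k best names.
def topKLocal(lines: Seq[String], query: Array[Double], k: Int): Seq[(String, Double)] =
  lines.filter(_.nonEmpty).map { line =>
    val parts = line.split(" ", 2)
    (parts(0), cosScore(query, parts(1).split(" ").map(_.toDouble)))
  }.sortBy { case (_, score) => -score }.take(k)
```

With a similarity such as this one, a larger score means a closer match, so the ranking is descending. A distance measure would need the ascending order that `takeOrdered` provides.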
Deep Learning @ VADL
• Feature extraction for detected video objects
– faces
– humans
• Classification of video objects
– With trained model
• Suspect detection and recognition
Training neural networks with many layers
Deep Learning With Spark
• DeepLearning4J (DL4J)
– Open source
– Variety of NNs & Flexibility
– Cross-platform & Scale
– Java Implementation
• parallelization (Yarn, Spark)
• GPU support
– Also supports multi-GPU per host node
• DeepDist
– Open source
– Deep belief networks
– Asynchronous stochastic gradient descent for data stored on HDFS / Spark
– Python Implementation
THANK YOU