![Page 1: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/1.jpg)
Caesar: Cross-Camera Complex Activity Recognition
Xiaochen Liu University of Southern CaliforniaPradipta Ghosh University of Southern CaliforniaOytun Ulutan University of California, Santa BarbaraB.S. Manjunath University of California, Santa BarbaraKevin Chan ARLRamesh Govindan University of Southern California
![Page 2: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/2.jpg)
Cameras are the Most Powerful and Ubiquitous Sensors Today
2
![Page 3: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/3.jpg)
Cameras Can be Used to Detect Various Activities
3
Public safety
Surveillance that captures the Las Vegas mass shooter in the hotel
Smart shoppingTraffic monitoring
![Page 4: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/4.jpg)
Today’s Activity Analytics is Highly Manual
4
![Page 5: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/5.jpg)
Ideal: Automated Activity Analytics
5
Monitoring System
Detected frames and objects
User-defined activity
“A person gets into a car then walk with bag”
Camera Cluster
![Page 6: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/6.jpg)
Caesar for Complex Activity Detection
6
User-defined activity
“A person gets into a car then walks with bag”
No prior work on complex activity detection
But there is work on atomic activity detection
Caesar fills the gap
- Gets into a car- Walk - With bag
![Page 7: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/7.jpg)
Atomic Activity & Complex Activity
7
- Use Phone (Camera 1)- Take Bag (Camera 1)- Give Bag (Camera 2)
Atomic Activities
Camera 1 Camera 2
Complex Activity“A person uses phone then takes bag and gives bag”
![Page 8: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/8.jpg)
Prior Work on Atomic Activity Detection
8
BoundingBoxes
Object Detector
Object Detection: Detect people/objects as bounding boxes
Prior Work: YOLO (CVPR’17), SSD (ECCV’16), ResNet (CVPR’16)
![Page 9: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/9.jpg)
9
Tubes
Tracker
6
BoundingBoxes
Object Detector
Tracking: Track each bounding box’s movement as a tube
Prior Work: DeepSORT (WACV’18), RPP (ECCV’18), DG-Net (CVPR’19)
Prior Work on Atomic Activity Detection
![Page 10: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/10.jpg)
10
Action Detector
Tubes Results
Tracker
6
BoundingBoxes
Object Detector
Action DetectionDetect atomic actions in a tube using DNN or rules
Prior Work: ACAM (WACV’20), RPNN (ICCV’19), ODAS (ECCV’18)
Prior Work on Atomic Activity Detection
![Page 11: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/11.jpg)
Tube clips with labels
DNN-Based Action Detection
11
Result:Use Phone
Training
Use Phone
Talking
Reading
… …
Input
Images from AVA dataset
![Page 12: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/12.jpg)
Pros and Cons of DNN for Action Detection
12
Can detect various atomic activities with high accuracy
No training data for complex activities, which have multi-tubes and long duration
Even with data, training a separate DNN for each complex activity is an overkill and not generalizable
![Page 13: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/13.jpg)
Rule-Based Activity Detection
13
Person 2 Tube
Results:Two person getting close
Simple Rule: The pixel distance between two tubes keeps decreasing
Person 1 Tube
![Page 14: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/14.jpg)
Intuitive and simple input (just bounding box)
Generalizable to describe a wide range of interactions between tubes
No clue about the content in the box so can only detect very limited types of tube actions
Pros and Cons of Simple Rule-Based Action Detection
14
![Page 15: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/15.jpg)
Caesar: An Framework with DNNs + Rules
15
Caesar’s Rule: “A person skateboarding moves with a person riding a bike”
Person 1 Tube
Person 2 Tube
From DNNs
From simple rules
Define many more complex activities than before
![Page 16: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/16.jpg)
Flexibility: Detects user-defined complex activities correctly
Design Goals & Challenges
16
How to specify and detect complex activities?
![Page 17: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/17.jpg)
Flexibility: Detects user-defined complex activities correctly
Scalability: Supports concurrent video streams in realtime
Design Goals & Challenges
17
How to specify and detect complex activities?
How to optimize the hardware resource usage?
![Page 18: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/18.jpg)
Flexibility: Detects user-defined complex activities correctly
Scalability: Supports concurrent video streams in realtime
Efficiency: Saves energy and wireless data on mobile platforms
Design Goals & Challenges
18
How to specify and detect complex activities?
How to optimize the hardware resource usage?
How to collaborate the server and the mobile efficiently?
![Page 19: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/19.jpg)
Flexibility: Specify complex activities as spatio-temporal relationships over an extensible vocabulary
Caesar’s Contributions
Efficiency: Lazily retrieve frames from mobile devices
Scalability: Use lazy graph matching to reduce DNN usage on GPUs
19
![Page 20: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/20.jpg)
Represent Complex Activities as Graphs
20
Talking Use Phone ...Move/StopFrom Camera XAtomic Actions:
Caesar uses different approaches’ atomic actions as graph elements
Cross-camera person Re-identification DNN
Single-camera action detection DNN
Single-camera rule-based
action detection
![Page 21: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/21.jpg)
Represent Complex Activities as Graphs
21
Talking Use Phone ...Move/StopFrom Camera XAtomic Actions:
Then ...And NotLogic Relation:
...Near Approach LeaveSpatial Relation: Close
Caesar also uses pre-defined spatial and logic relations to express rules
![Page 22: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/22.jpg)
Represent Complex Activities as Graphs
22
Then ...And NotLogic Relation:
With Bag With Bag
ThenMove
And
Person: Use PhoneAn Example ofActivity Graph Not
...Near Approach LeaveSpatial Relation: Close
Talking Use Phone ...Move/StopFrom Camera XAtomic Actions:
![Page 23: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/23.jpg)
Workflow: Rule Definition and Parsing
23
(1) User inputs action definition (2) Parse definition into a graph
With Bag
ThenMove
And
Person: Use Phone
With Bag
Not
Action Name: use_phone_then_move_with_bag
Subjects: Person p
Action Definition: (p use_phone) and (not p with_bag) then (p move) and (p with_bag)
![Page 24: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/24.jpg)
Workflow: Graph Matching
24
(3) Matching the graph to the tubes and actions
With Bag
ThenMove
And
Use Phone
With Bag
Not
THEN
With Bag
ThenMove
And
Use Phone
With Bag
Not
![Page 25: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/25.jpg)
Flexibility: Specify complex activities as spatio-temporal relationships over an extensible vocabulary
Caesar’s Contributions
Efficiency: Lazily retrieve frames from mobile devices
Scalability: Use lazy graph matching to reduce DNN usage on GPUs
25
![Page 26: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/26.jpg)
The Action DNN is the Major Scalability Bottleneck
26
DNN FPS
Object Detection 20~30
Tracking/ReID 30~45
Action Detection 50~60 (per tube)Too slow when
- 10 people: 5~6 FPS- 20 people: 2~3 FPS
Due to the nature of per-tube detection, the action DNN becomes the bottleneck in the processing pipeline
![Page 27: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/27.jpg)
Lazy Graph Matching
27
Match spatiotemporal actions in a graph before any DNN action matching
Simple-rule actions: ● stop, move, close, disappear...
DNN actions: ● talk, use phone, sit, hug...
Skip heavy nodes and try to match the rest of the graph
Complete matching
![Page 28: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/28.jpg)
An Example of Lazy Graph Matching
28
With Bag
ThenMove
And
Use Phone
With Bag
Not
With Bag
ThenMove
And
Use Phone
With Bag
Not
THEN
With Bag
ThenMove
And
Use Phone
With Bag
Not
Go back and run DNN
Action DNN Usage (with lazy matching)
Action DNN Usage (without lazy matching)
![Page 29: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/29.jpg)
Flexibility: Specify complex activities as spatio-temporal relationships over an extensible vocabulary
Caesar’s Contributions
Efficiency: Lazily retrieve frames from mobile devices
Scalability: Use lazy graph matching to reduce DNN usage on GPUs
29
![Page 30: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/30.jpg)
Different modules have different data requirements:
● The spatial-temporal action detection
○ Needs bounding box positions in every frame
● The DNN action detection and tracking/ReID
○ Needs bounding box images of some tubes
● Visualization and video storage
○ Needs full frames
On-Demand Data Fetching for Mobile Devices
Data Usage
30
![Page 31: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/31.jpg)
Caesar Mobile uploads bounding boxes by default, and uploads
requested bounding box images or full frames
Data Fetching Logic on Caesar Mobile
Caesar Mobile
Input Frames
Object Detection
Image Cache
Boxes Boxes Frames
Regular Upload:- Obj BBox - Obj Type- Frame ID- Device ID- ...
On-Demand Upload:BBox Images / Full Frames
On-Demand Query from Server: Start TimeEnd TimeData Type
31
![Page 32: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/32.jpg)
System Architecture
Edge Server
Caesar Server
32
Caesar Mobile
Fixed Camera
Caesar Mobile
![Page 33: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/33.jpg)
Evaluation Setup
33
Caesar Mobile: Eight Nvidia TX1/TX2 boards, connected with WiFi
Caesar Server: Equipped with three Nvidia 2080 GPUs
Dataset: 30-min video clips from eight cameras in DukeMTMC dataset
- We manually labeled these in the videos:
- 1289 atomic activities
- 149 complex activities (73.8% are cross-camera)
Accuracy Metric: The ground truth is matched only when both the person
ID and all atomic actions are matched
![Page 34: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/34.jpg)
- Caesar achieves 61.0% recall and 59.5% precision
- All errors come from imperfect DNNs:
- Better action and ReID DNNs could improve the accuracy to 100%
Caesar’s Accuracy & Analysis
34
![Page 35: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/35.jpg)
Scalability Optimization
35
- Strawman: without lazy graph matching and on-demand data fetching- For spatial-only activities (only non-DNN nodes), Caesar has FPS > 17- For mixed activities (contain some DNN nodes), Caesar keeps FPS > 15
![Page 36: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/36.jpg)
Data Usage & Energy Consumption on Caesar Mobile
On-demand data fetching reduces- 3x wireless data usage for mixed activities; 15x for spatial-only activities- 6x energy consumption for mixed activities; 10x for spatial-only activities
36
![Page 38: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/38.jpg)
● Cross-camera complex activity recognition is essential in many fields
● Caesar combines DNNs and rule-based metrics to detect customized complex activities across cameras
● Caesar leverages lazy matching and on-demand data retrieval to improve efficiency and scalability
● Caesar reduces data usage and detection latency by >3X, up to 10X
Takeaway
38
![Page 39: Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh](https://reader034.vdocument.in/reader034/viewer/2022051812/602ecb2081e79c53826dde1a/html5/thumbnails/39.jpg)
Caesar: Cross-Camera Complex Activity RecognitionXiaochen Liu University of Southern CaliforniaPradipta Ghosh University of Southern CaliforniaOytun Ulutan University of California, Santa BarbaraB.S. Manjunath University of California, Santa BarbaraKevin Chan ARLRamesh Govindan University of Southern California
https://github.com/USC-NSL/Caesar