Graph Stream Processing
Next Generation of Processing for ML/DL Applications
Shawn Holiday
Sr. Director Customer [email protected]
Sept 19, 2019
© 2019 Thinci. All rights reserved. Thinci Confidential
• Who/What is Thinci?
• AI Challenges
• Graph Stream Computing (GSP)
• Software Tools for GSP
• GSP Smart Vision Automotive Applications
• Examples
• Wrap-up
Who/What is Thinci?
Founded in 2010 by Industry Leading CPU & GPU Experts (ex. Intel)
• Dinakar Munagala (CEO), Satyaki Koneru (CTO), Ke Yin (VP Engineering), Val Cook (Chief SW Architect)
~250+ Employees Worldwide (95% Eng & Ops)
• El Dorado Hills, CA (HQ), Campbell, CA (Silicon Valley)
• Hyderabad, India; Kings Langley & Leeds, UK
Focused on AI Solutions & Tools
• GSP architecture for efficient computing
• Comprehensive SDK & development tools
• SoCs, modules, boards & platforms
• Client, edge & data center applications
Major Corporate, Gov & VC Investors
• Denso, Magna, Daimler, Temasek, Mirai Creation/SPARX Group (Toyota), GGV Capital, SGInnovate, Wavemaker, Samsung
What We Do
Graph Streaming Processor™ (GSP) Architecture
• Up to 10-100x more efficient than traditional CPU/GPU solutions
• Scalable across data center, edge and client devices for enterprise and consumer
• Broad AI applicability in Machine Learning, Deep Learning, Neural Networks and Vision Processing
Powerful Software Development Kit (SDK)
• Popular AI frameworks such as TensorFlow, PyTorch & Caffe2
• Direct programming access for custom/proprietary development
Comprehensive Platform Roadmap
• Built around powerful Thinci GSP SOC modules
• SoCs, PCIe cards, M.2 cards and appliances for data center, edge and clients
[Figure: graph nodes A, B, C, D with numbered tasks 0-6 along a time axis]
The Challenges of Developing AI Solutions
Compute efficiency, energy efficiency, software infrastructure, flexibility, resources
[Figure: AI Development Process, with labels: Define Problem, Data Collection, Data Labeling, Train & Optimize, Data, Rapidly Evolve AI Models, Encapsulate AI Models in Larger Analytics Applications]
Compute Challenge
• Power/performance/cost efficiency for AI workloads
• Existing processor architectures inadequate
SW Infrastructure Challenge
• How to get AI workloads seamlessly running on HW
• Manage computational load
Rapid Flexibility Challenge
• Models evolve faster than hardware can be produced
• Need for greater programmability
Edge to Core Scalability
• Single architecture from Edge to Core
• Dodging dead ends, point solutions
Thinci Hardware Solutions: Ready for Development
GSP Silicon, Boards, Software & Systems
• Discrete SOCs, PCIe Accelerators & Embedded Systems
• Deskside Servers: Linux PC-based Development Station with SDK
• Small FF Accelerators: M.2 & EDSFF
• System-On-Module (SOM) & SOM Carriers
Graph-Based Computing
The next inflection point in cooperative computational parallelism
Cooperative task-level parallelism allows multiple neural network layers to run simultaneously, without demanding microarchitecture knowledge or low-level programming effort
• Task Level Parallelism: Graph Processors
• Thread Level Parallelism: GPUs, Multi-threaded Processors
• Data Level Parallelism: DSPs, Vector Machines
• Instruction Level Parallelism: CPUs, MCUs
Graph Model Processing Steps
Vision Processing/Object Detection Example Processing Comparison
Legacy Sequential Processing (CPU/GPU)
• Data parallel only
• Tasks completed before data sent to next node
• High memory utilization to store data while tasks complete
CPUs/GPUs: Use of intermediate storage between process steps slows processing and increases power consumption.
[Figure: nodes A, B, C, D and tasks 1-6 on a time axis; Instructions, Programs, Off-chip memory (DRAM)]
Thinci Graph Streaming Processor (GSP): Less Memory, Less Time, Lower Power
Graph Streaming Processing
• Data & task parallel
• Data sent to next node when ready
• Low memory utilization, as data is sent directly to the next node, which reduces power & costs
Thinci GSP massively shrinks offload to memory on host CPU, lowering power demands
[Figure: nodes A, B, C, D and tasks 1-6 on a time axis; INPUT (Pixels), OUTPUT (Image), Object Model]
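Illustration (not from the original slides): a rough way to see why streaming reduces off-chip traffic is to count full-frame transfers. The sketch below is a simplified model, not a description of Thinci hardware. It assumes a linear chain of nodes, frames of equal size at every stage, and that streamed intermediates stay entirely on-chip. The function name and the 1920x1080 RGB frame size are hypothetical.

```python
def dram_traffic_bytes(num_nodes, frame_bytes, streaming):
    """Bytes moved to or from off-chip memory for one frame in a linear chain.

    Assumption: every node consumes and produces a frame of the same size.
    """
    if streaming:
        # Read the input once and write the final output once;
        # intermediates are assumed to pass from node to node on-chip.
        return 2 * frame_bytes
    # Each node reads its input from DRAM and writes its output back.
    return 2 * num_nodes * frame_bytes


frame = 1920 * 1080 * 3  # hypothetical 8-bit RGB HD frame
sequential = dram_traffic_bytes(4, frame, streaming=False)  # nodes A, B, C, D
streamed = dram_traffic_bytes(4, frame, streaming=True)
print(sequential, streamed, sequential // streamed)
```

Under these assumptions the sequential chain moves four times as many bytes as the streamed one, and the gap grows with the number of nodes in the graph.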
GSP vs Other Traditional Architectures
• CPU: Any task, unoptimized. 100% Programmability
• GPU: One task, many times. Pervasive Programmability (Brute force ecosystem)
• FPGA: Multiple tasks, exploration. Custom Programmability (Each App a new stack)
• Thinci GSP: Multiple tasks, optimized. Good Programmability (Single SW stack)
• Fixed Function: A fixed task, many times. Limited Programmability
Software Advances Workflow to Speed Deployment
Streamlines path to AI workloads running on hardware
[Figure: four-stage workflow; labels include Network Optimizations, Graph, Application, Parameters, Kernels]
• Network Design (AI Researchers)
• Deployment (Optimization Tools): Optimize, Compress and Convert Networks to OpenVX C/C++ Graphs
• Development (Developers and Researchers): Develop OpenVX C/C++ AI Enabled Applications
• Production (Platform)
GSP Smart Vision & Smart City Applications
Thinci solves the essential challenges in each level of these multi-tiered problems
Tiers: Edge, Core, Cloud / Data Center
• Sensor Edge: Intelligent Cameras (Thinci SOC). In-Camera Analytics. AI Cameras: Initial Detection & FOV Tracking
• Edge Network: Intelligent Network Equipment (Thinci PCIe / EDSFF). Network-Based Analytics. AI Switch/Router: Cross Camera Tracking & Object Recognition
• Edge Compute: Server/NVR/VMS Appliance (Thinci PCIe / EDSFF). Server/NVR Analytics. AI Servers/VMS Appliance: Cross Site/Floor/Zone Tracking, Object Recognition, & Analytics
• Cloud Compute: Server/NVR/VMS Appliance (Thinci PCIe / EDSFF Cards). AI Servers: Cross Enterprise & Unified Smart City Tracking, Object Recognition, & Analytics
[Figure: data flow labels: Legacy Cameras, H.264 Stream, Initial Detection & FOV Tracking, Metadata & Frames of Interest, Server/NVR Analytics, Results, Alarms, Data, Actions...]
Thinci allows easy migration of analytics across computing tiers
GSP Automotive/Mobility Applications
One GSP architecture. Many automotive applications.
• Safety/Security Systems Control & Monitoring
• Localized Vision Pre & Post Processing
• Centralized Sensor Fusion for Cameras, LiDAR, Radar…
• Centralized Compute & AI Acceleration…
• In-Cabin Monitoring
• Power/Drivetrain Control & Monitoring
• Infotainment & Driver UX
• Localized Pre & Post Sensor Processing
Replacing power-hungry, cumbersome-to-develop GPU & FPGA platforms
Example: Auto Smart Vision Object Recognition
Implementation for ADAS or autonomous driving module
[Figure: SoC block diagram with the following blocks]
• GSP Cluster, ARM A53 MP 2-core (1000 MHz), ARM TrustZone CryptoCell
• ARM NIC-400, AXI bridge, Peripheral DMA, Debug unit (JTAG)
• RAM, Boot ROM, LPDDR (x2)
• MIPI CSI-2 x4 (in), MIPI CSI-2 (out), MIPI DSI (out)
• H264/H265 Decode, H264/H265 Encode
• Ethernet, USB 3.0, PCIe 3.1 4-lane
• CAN Bus Control x3, QSPI, SD, GPIO x32, I2C x5, UART x2, SPI slave, I2S
• ISP functions combine input from multiple cameras to provide 360-degree view
• AI segmentation and detection of pedestrians, road signs and other objects
• Encode video for in-cabin display, other modules or distribution to the cloud
• Automotive grade ASIL B
• Sensor acquisition
• Ethernet and CAN bus control
[Figure: highlighted blocks: MIPI CSI-2 x4 (in), LPDDR, ARM NIC-400, GSP Cores, H264/H265 Encode, Ethernet, In-car control]
Example: Auto Smart Vision Object Recognition
GSP collapses workload pipelines into a single programmable model
Pipeline: Camera feed → Demosaic → Regions of Interest (Object Detector) → Crop (Image Manipulation) → Image Filtering/Enhancement → Scale / Format → Classification
Example classification outputs: Pedestrian (97%), Car (96%), No Left (93%)
Other Architectures: Multiple HW/SW elements (ISP, FF/GPU, CPU/GPU, GPU) linked by DMA, with heavy data movement, memory usage, power & mgmt
Thinci’s GSP (One Visual Graph…): ROI, Crop, Filter, Scale, Format, NN (Inference)
Thinci GSP: Single processing element that can process the entire pipeline flow, reducing data movement, memory utilization, power & mgmt
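Illustration (not from the original slides): the idea of expressing the whole pipeline as one graph, rather than as separate programs for separate hardware blocks, can be sketched as a list of stage functions run by a single executor. Everything below is hypothetical: the stage functions are toy stand-ins (the "detector" is a bounding box of nonzero pixels, the "classifier" is a brightness threshold), and none of the names belong to the Thinci SDK.

```python
from functools import reduce


def find_roi(frame):
    """Stand-in object detector: bounding box of the nonzero pixels.

    Assumes the frame contains at least one nonzero pixel.
    """
    rows = [r for r, row in enumerate(frame) if any(row)]
    cols = [c for c in range(len(frame[0])) if any(row[c] for row in frame)]
    return frame, (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def crop(state):
    frame, (top, left, bottom, right) = state
    return [row[left:right] for row in frame[top:bottom]]


def clamp_filter(img):
    """Stand-in filter: clamp pixel values to the 0..255 range."""
    return [[min(255, max(0, p)) for p in row] for row in img]


def scale(img, size=2):
    """Nearest-neighbour resize to size x size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_float(img):
    return [[p / 255.0 for p in row] for row in img]


def classify(img):
    """Stand-in for NN inference: (label, confidence) from mean brightness."""
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return ("bright", mean) if mean >= 0.5 else ("dark", 1.0 - mean)


# One graph definition covers ROI, crop, filter, scale, format and inference.
GRAPH = [find_roi, crop, clamp_filter, scale, to_float, classify]


def run_graph(frame, graph=GRAPH):
    """Feed the frame through every node of the graph in order."""
    return reduce(lambda data, node: node(data), graph, frame)


frame = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 300, 0],
    [0, 0, 0, 0],
]
print(run_graph(frame))
```

The point of the sketch is the single `GRAPH` definition: adding or swapping a stage changes one list, not the glue between separate processing elements.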
Example: Auto Smart Vision Object Recognition
• Graph of kernels operating on large image frames
[Figure: graph of kernels f1 to f6: YUV-to-RGB, WB/Gamma, WDR Adaptive local tone mapping, Dehaze retinex enhancement, Noise reduction, Edge enhancement]
Example: Auto Smart Vision Object Recognition
• Depth-first scheduling ensures data is used as soon as it is available
• Streaming minimizes intermediate data, enabling on-chip storage and low bandwidth
[Figure: kernels f1 to f6 (YUV-to-RGB, WB/Gamma, WDR Adaptive local tone mapping, Dehaze retinex enhancement, Noise reduction, Edge enhancement) with DRAM and On-chip storage labeled]
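Illustration (not from the original slides): the effect of depth-first scheduling on intermediate data can be sketched with a toy scheduler. It assumes a linear chain of kernels applied to independent image tiles, where task (kernel, tile) becomes ready once the previous kernel has finished that tile. The `schedule` function and the tile model are assumptions for illustration, not the GSP scheduler.

```python
def schedule(num_kernels, num_tiles, depth_first):
    """Run all (kernel, tile) tasks and report the order and the peak number
    of intermediate tiles that were produced but not yet consumed."""
    ready = [(0, t) for t in range(num_tiles)]  # first kernel is ready on every tile
    order, live, peak = [], 0, 0
    while ready:
        if depth_first:
            # Prefer the deepest ready kernel, so fresh data is consumed at once.
            task = min(ready, key=lambda kt: (-kt[0], kt[1]))
        else:
            # Breadth-first: finish a kernel on all tiles before moving deeper.
            task = min(ready)
        ready.remove(task)
        order.append(task)
        kernel, tile = task
        if kernel > 0:
            live -= 1  # consumed the previous kernel's output tile
        if kernel < num_kernels - 1:
            live += 1  # produced an intermediate tile for the next kernel
            peak = max(peak, live)
            ready.append((kernel + 1, tile))
    return order, peak


order_df, peak_df = schedule(num_kernels=6, num_tiles=4, depth_first=True)
order_bf, peak_bf = schedule(num_kernels=6, num_tiles=4, depth_first=False)
print(peak_df, peak_bf)
```

In this model the depth-first order pushes each tile through f1 to f6 before starting the next tile, so only one intermediate tile is live at a time, while the breadth-first order holds one intermediate per tile. A small, bounded working set is what makes on-chip storage feasible.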
Auto Smart Vision: Pixel Level Segmentation
Features:
• Pixel Level Semantic Segmentation
• Up to 30 on-road object types
• Driveable free space detection
• ~25 FPS on HD Images
Description:
Semantic segmentation lets the vehicle perceive its environment in more detail than object detection alone. For example, when the system needs to find a safe corridor for the ego vehicle, semantic segmentation provides the free-space information.
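Illustration (not from the original slides): one simple way to turn a per-pixel label mask into free-space information is to measure, for each image column, how far the drivable surface extends upward from the bottom of the image. The class id `ROAD` and the function below are hypothetical; a production system would use its own label set and camera geometry.

```python
ROAD = 1  # hypothetical class id for drivable surface


def free_space_rows(mask):
    """For each column, count contiguous ROAD pixels upward from the bottom row."""
    height, width = len(mask), len(mask[0])
    depth = []
    for col in range(width):
        n = 0
        for row in range(height - 1, -1, -1):
            if mask[row][col] != ROAD:
                break  # first non-road pixel ends the free corridor in this column
            n += 1
        depth.append(n)
    return depth


mask = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 2],
    [1, 1, 1],
]
print(free_space_rows(mask))
```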
Auto Smart Vision: Sensor Data Dense Fusion (LiDAR + Front Camera)
Features:
• Driveable free space detection
• ~30 FPS
Description:
Sensor fusion is the process of merging data from multiple sensors to reduce the uncertainty involved in a navigation or motion task. It helps build a more accurate world model, so that the vehicle can navigate and behave more successfully.
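Illustration (not from the original slides): the claim that fusion reduces uncertainty can be shown with the textbook inverse-variance weighting of two independent estimates of the same quantity. This scalar sketch is not the dense LiDAR and camera fusion shown on the slide; the measurement values are made up for the example.

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Inverse-variance weighted fusion of two independent estimates."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused_var = 1.0 / (w_a + w_b)  # always smaller than either input variance
    fused_mean = fused_var * (w_a * mean_a + w_b * mean_b)
    return fused_mean, fused_var


# Hypothetical range to an obstacle: LiDAR is precise, the camera estimate is coarse.
lidar_range, lidar_var = 20.0, 0.04
camera_range, camera_var = 21.0, 0.36
mean, var = fuse(lidar_range, lidar_var, camera_range, camera_var)
print(round(mean, 3), round(var, 3))
```

The fused estimate sits closer to the more precise sensor, and its variance is lower than that of either sensor alone.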
Auto Smart Vision: On-Road Objects Volumetric Analysis
Features:
• 360° 3D bounding box detection
• Orientation estimation
• Speed estimation
• Vehicle detection
• Vehicle classification
• Vehicle localization
• >30 FPS
Description:
3D object detection is one of the most important prerequisites for autonomous navigation, because it is what allows the car controller to account for obstacles when considering possible future trajectories.
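Illustration (not from the original slides): a 3D bounding box with an orientation estimate is commonly described by a center, three dimensions, and a yaw angle. The sketch below expands that description into the eight box corners. The parameterization (yaw about a vertical z axis, length along the heading) is an assumed convention, not one stated in the slides.

```python
import math


def box_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a 3D box rotated by yaw (radians) about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the horizontal offset, then translate to the box center.
                x = cx + c * dx - s * dy
                y = cy + s * dx + c * dy
                corners.append((x, y, cz + dz))
    return corners


# Hypothetical vehicle 10 m ahead, 4 m long, 2 m wide, 2 m high, heading along x.
corners = box_corners(10.0, 0.0, 1.0, 4.0, 2.0, 2.0, yaw=0.0)
print(corners[0], corners[-1])
```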
Auto Smart Vision: Multi Objects Tracking
Features:
• Multi Vehicle 3D Tracking
• Speed Estimation
• >50 FPS
• EKF based tracking
Description:
3D object tracking is one of the most important prerequisites for autonomous navigation. Tracking reduces the load and processing time of heavy object detection, and it also helps estimate the trajectory, behaviour and speed of the leading vehicle.
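Illustration (not from the original slides): the predict and update cycle behind filter-based tracking and speed estimation can be sketched in one dimension. The code below is a linear Kalman filter with a constant-velocity model, which is the special case an EKF reduces to when both the motion and measurement models are linear. The noise values, the initial covariance, and the function names are assumptions chosen for the example, not Thinci parameters.

```python
def predict(x, P, dt, q):
    """Propagate state (position, velocity) and covariance over dt."""
    pos, vel = x
    (a, b), (c, d) = P
    x_new = (pos + vel * dt, vel)
    P_new = (
        (a + dt * (b + c) + dt * dt * d + q, b + dt * d),
        (c + dt * d, d + q),
    )
    return x_new, P_new


def update(x, P, z, r):
    """Correct the state with a position measurement z of variance r."""
    pos, vel = x
    (a, b), (c, d) = P
    s = a + r                    # innovation variance
    k_pos, k_vel = a / s, c / s  # Kalman gain
    y = z - pos                  # innovation
    x_new = (pos + k_pos * y, vel + k_vel * y)
    P_new = (
        ((1 - k_pos) * a, (1 - k_pos) * b),
        (c - k_vel * a, d - k_vel * b),
    )
    return x_new, P_new


def track(measurements, dt, q=1e-4, r=0.01):
    """Run the filter over position measurements; return (position, velocity)."""
    x, P = (0.0, 0.0), ((1.0, 0.0), (0.0, 10.0))
    for z in measurements:
        x, P = predict(x, P, dt, q)
        x, P = update(x, P, z, r)
    return x


# Hypothetical leading vehicle pulling away at a steady 2 m/s, sampled every 0.1 s.
dt = 0.1
zs = [2.0 * dt * k for k in range(1, 51)]
pos, vel = track(zs, dt)
print(round(pos, 2), round(vel, 2))
```

Speed is never measured directly here; the filter infers it from successive positions, which is how a tracker can report speed from detections alone.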
Auto Smart Vision: Low Visibility Assistance
Features:
• Highlights lane markings and leading vehicles in low-light conditions
• HDMaps & Reinforcement Learning based Thinci algorithms
Description:
This driver-assistance system supports safe driving with real-time, accurate digital maps that reveal nearby objects. It supports lane keeping and higher driving speeds under poor visibility such as fog, snow, rain, and darkness, helping to prevent crashes and save lives.
Wrapping Up
Graph Streaming Processor™ (GSP) Architecture
• Up to 10-100x more efficient than traditional CPU/GPU solutions
• Scalable across data center, edge and client devices for enterprise and consumer
• Broad AI applicability in Machine Learning, Deep Learning, Neural Networks and Vision Processing
Powerful Software Development Kit (SDK)
• Popular AI frameworks such as TensorFlow, PyTorch & Caffe2
• Direct programming access for custom/proprietary development
Comprehensive Platform Roadmap
• Built around powerful Thinci GSP SOC modules
• SoCs, PCIe cards, M.2 cards and appliances for data center, edge and clients
[Figure: graph nodes A, B, C, D with numbered tasks 0-6 along a time axis]
Thank You!