
Graph Stream Processing
Next Generation of Processing for ML/DL Applications

Shawn Holiday
Sr. Director Customer [email protected]

Sept 19, 2019

© 2019 Thinci. All rights reserved. Thinci Confidential

• Who/What is Thinci?
• AI Challenges
• Graph Stream Computing (GSP)
• Software Tools for GSP
• GSP Smart Vision Automotive Applications
• Examples
• Wrap-up


http://www.thinci.com/

https://www.linkedin.com/in/shawnholiday/

Who/What is Thinci?

Founded in 2010 by Industry Leading CPU & GPU Experts (ex. Intel)
• Dinakar Munagala (CEO), Satyaki Koneru (CTO), Ke Yin (VP Engineering), Val Cook (Chief SW Architect)

~250+ Employees World Wide (95% Eng & Ops)
• El Dorado Hills, CA (HQ), Campbell, CA (Silicon Valley)
• Hyderabad, India, Kings Langley & Leeds, UK

Focused on AI Solutions & Tools
• GSP architecture for efficient computing
• Comprehensive SDK & development tools
• SoCs, modules, boards & platforms
• Client, edge & data center applications

Major Corporate, Gov & VC Investors
• Denso, Magna, Daimler, Temasek, Mirai Creation/SPARX Group (Toyota), GGV Capital, SGInnovate, Wavemaker, Samsung


What We Do

Graph Streaming Processor™ (GSP) Architecture
• Up to 10-100x more efficient than traditional CPU/GPU solutions
• Scalable across data center, edge and client devices for enterprise and consumer
• Broad AI applicability in Machine Learning, Deep Learning, Neural Networks and Vision Processing

Powerful Software Development Kit (SDK)
• Popular AI frameworks such as TensorFlow, PyTorch & Caffe2
• Direct programming access for custom/proprietary development

Comprehensive Platform Roadmap
• Built around powerful Thinci GSP SOC modules
• SoCs, PCIe cards, M.2 cards and appliances for data center, edge and clients

[Figure: graph of nodes A, B, C, D with tasks 0 to 6 scheduled over time]


The Challenges of Developing AI Solutions
Compute efficiency, energy efficiency, software infrastructure, flexibility, resources

AI Development Process: Rapidly Evolve AI Models
• Define Problem
• Data Collection
• Data Labeling
• Train & Optimize
• Data
• Encapsulate AI Models in Larger Analytics Applications

Compute Challenge
• Power/performance/cost efficiency for AI workloads
• Existing processor architectures inadequate

SW Infrastructure Challenge
• How to get AI workloads seamlessly running on HW
• Manage computational load

Rapid Flexibility Challenge
• Models evolve faster than hardware can be produced
• Need for greater programmability

Edge to Core Scalability
• Single architecture from Edge to Core
• Dodging dead ends, point solutions


Thinci Hardware Solutions: Ready for Development
GSP Silicon, Boards, Software & Systems

• Discrete SOCs
• PCIe Accelerators & Embedded Systems
• Deskside Servers: Linux PC-based Development Station with SDK
• Small FF Accelerators: M.2 & EDSFF
• System-On-Module (SOM) & SOM Carriers


Graph-Based Computing
The next inflection point in cooperative computational parallelism

Cooperative task level parallelism allows multiple neural network layers to run simultaneously, without demanding microarchitecture knowledge or low-level programming effort.

• Task Level Parallelism: Graph Processors
• Thread Level Parallelism: GPUs, Multi-threaded Processors
• Data Level Parallelism: DSPs, Vector Machines
• Instruction Level Parallelism: CPUs, MCUs
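Task level parallelism can be illustrated on an ordinary CPU with a short sketch. It assumes a hypothetical four-node graph in which B and C both consume the output of A, and D consumes both; the node functions are placeholders, and the thread pool only stands in for hardware that dispatches graph nodes as soon as their inputs are ready.

```python
from concurrent.futures import ThreadPoolExecutor


# Placeholder node functions (hypothetical, chosen only to be checkable).
def node_a(x):
    return x + 1


def node_b(a):
    return a * 2


def node_c(a):
    return a * 3


def node_d(b, c):
    return b + c


def run_graph(x):
    """Run A, then B and C concurrently, then D."""
    a = node_a(x)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # B and C are independent tasks, so both can be in flight at once.
        future_b = pool.submit(node_b, a)
        future_c = pool.submit(node_c, a)
        return node_d(future_b.result(), future_c.result())


result = run_graph(1)
```

The caller describes only the dependencies between nodes; which independent nodes overlap in time is left to the executor.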


Graph Model Processing Steps
Vision Processing/Object Detection Example Processing Comparison

Thinci Graph Streaming Processor (GSP): Less Memory, Less Time, Lower Power

Legacy Sequential Processing (CPU/GPU)
• Data parallel only
• Tasks completed before data sent to next node
• High memory utilization to store data while tasks complete
CPUs/GPUs: Use of intermediate storage between process steps slows processing and increases power consumption.

Graph Streaming Processing
• Data & task parallel
• Data sent to next node when ready
• Low memory utilization, as data is sent directly to the next node, reduces power & costs
Thinci GSP massively shrinks offload to memory on host CPU, lowering power demands.

[Figure: object model graph with nodes A, B, C, D and tasks 1 to 6, shown as a sequential timeline vs a graph streaming timeline. Labels: Instructions, Programs, Off-chip memory (DRAM), INPUT (Pixels), OUTPUT (Image)]
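The difference between the two execution models can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not Thinci's scheduler: the nodes form a simple linear chain, each node is its own processing unit, every tile of data takes one time unit per node, and the function names are invented for illustration.

```python
def finish_times(num_nodes, num_tiles, streaming):
    """Return finish[n][t], the time at which node n finishes tile t."""
    finish = [[0] * num_tiles for _ in range(num_nodes)]
    for n in range(num_nodes):
        for t in range(num_tiles):
            if n == 0:
                input_ready = 0
            elif streaming:
                # Start as soon as this tile arrives from the previous node.
                input_ready = finish[n - 1][t]
            else:
                # Wait until the previous node has finished every tile.
                input_ready = finish[n - 1][-1]
            node_free = finish[n][t - 1] if t > 0 else 0
            finish[n][t] = max(input_ready, node_free) + 1
    return finish


def peak_live_tiles(finish):
    """Peak number of intermediate tiles held between nodes.

    A tile on the edge n -> n+1 is live from the moment node n produces it
    until node n+1 has finished consuming it.
    """
    events = []
    for n in range(len(finish) - 1):
        for t in range(len(finish[n])):
            events.append((finish[n][t], 1))       # produced
            events.append((finish[n + 1][t], -1))  # consumed
    live = peak = 0
    for _, delta in sorted(events):  # at equal times, frees sort first
        live += delta
        peak = max(peak, live)
    return peak


sequential = finish_times(4, 6, streaming=False)
streamed = finish_times(4, 6, streaming=True)
sequential_result = (sequential[-1][-1], peak_live_tiles(sequential))
streamed_result = (streamed[-1][-1], peak_live_tiles(streamed))
```

With four nodes and six tiles, the sequential model takes 24 time units and holds up to 6 intermediate tiles, while the streaming model takes 9 time units and holds at most 3. The absolute numbers depend entirely on the assumptions above; only the direction of the difference is the point.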


GSP vs Other Traditional Architectures

• CPU: Any task, unoptimized. 100% Programmability
• GPU: One task, many times. Pervasive Programmability (Brute force ecosystem)
• FPGA: Multiple tasks, exploration. Custom Programmability (Each App a new stack)
• Thinci GSP: Multiple tasks, optimized. Good Programmability (Single SW stack)
• Fixed Function: A fixed task, many times. Limited Programmability

Software Advances Workflow to Speed Deployment
Streamlines path to AI workloads running on hardware

Workflow stages: Network Design, Deployment, Development, Production
Participants: AI Researchers, Optimization Tools, Developers and Researchers, Platform
Diagram labels: Network Optimizations, Graph, Application, Parameters, Kernels

• Optimize, Compress and Convert Networks to OpenVX C/C++ Graphs
• Develop OpenVX C/C++ AI Enabled Applications
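As a rough illustration of what converting a network into a graph involves, the sketch below describes a small network as a dependency graph and derives a valid execution order with Python's standard `graphlib` module (Python 3.9 or later). The layer names are hypothetical, and this is neither the Thinci SDK nor the OpenVX API; it only shows the kind of structure such tools work with.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
# Layer names are made up for illustration.
network = {
    "conv1": {"input"},
    "relu1": {"conv1"},
    "pool1": {"relu1"},
    "fc1": {"pool1"},
    "softmax": {"fc1"},
}


def execution_order(graph):
    """Return one valid node order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(network)
```

Checking that the graph is acyclic and fully connected before running it is the same kind of validation a graph API performs when a graph is verified ahead of execution.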


GSP Smart Vision & Smart City Applications
Thinci solves the essential challenges in each level of these multi-tiered problems

Tiers: Edge, Core, Cloud / Data Center
Analytics levels: In-Camera Analytics, Network-Based Analytics, Server/NVR Analytics

• Sensor Edge: Intelligent Cameras with Thinci SOC. AI Cameras: Initial Detection & FOV Tracking
• Edge Network: Intelligent Network Equipment with Thinci PCIe / EDSFF. AI Switch/Router: Cross Camera Tracking & Object Recognition
• Edge Compute: Server/NVR/VMS Appliance with Thinci PCIe / EDSFF. AI Servers/VMS Appliance: Cross Site/Floor/Zone Tracking, Object Recognition, & Analytics
• Cloud Compute: Server/NVR/VMS Appliance with Thinci PCIe / EDSFF Cards. AI Servers: Cross Enterprise & Unified Smart City Tracking, Object Recognition, & Analytics

Data flow labels: Legacy Cameras, H.264 Stream; Initial Detection & FOV Tracking; Metadata & Frames of Interest; Server/NVR Analytics; Results, Alarms, Data, Actions...

Thinci allows easy migration of analytics across computing tiers
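The idea of an AI camera forwarding metadata and frames of interest, rather than a continuous video stream, can be sketched as a simple filter. Everything here is an assumption made for illustration: the `Detection` record, the confidence threshold and the sample values are not taken from any Thinci product.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    frame_id: int
    label: str
    confidence: float


def frames_of_interest(detections, threshold=0.9):
    """Group confident detections by frame; other frames are not forwarded."""
    selected = {}
    for det in detections:
        if det.confidence >= threshold:
            selected.setdefault(det.frame_id, []).append(
                (det.label, det.confidence)
            )
    return selected


# Made-up detections from three consecutive frames.
detections = [
    Detection(1, "pedestrian", 0.97),
    Detection(1, "car", 0.96),
    Detection(2, "car", 0.41),
    Detection(3, "no left", 0.93),
]
forwarded = frames_of_interest(detections)
```

In this example only frames 1 and 3 would be sent upstream together with their metadata; frame 2 carries no confident detection and stays at the edge.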

GSP Automotive/Mobility Applications
One GSP architecture. Many automotive applications.

• Safety/Security Systems Control & Monitoring
• Localized Vision Pre & Post Processing
• Centralized Sensor Fusion for Cameras, LiDAR, Radar…
• Centralized Compute & AI Acceleration…
• In-Cabin Monitoring
• Power/Drivetrain Control & Monitoring
• Infotainment & Driver UX
• Localized Pre & Post Sensor Processing

Replacing power-hungry, development-cumbersome GPU & FPGA platforms


Example: Auto Smart Vision Object Recognition
Implementation for ADAS or autonomous driving module

• ISP functions combine input from multiple cameras to provide 360-degree view
• AI segmentation and detection of pedestrians, road signs and other objects
• Encode video for in-cabin display, other modules or distribution to the cloud
• Automotive grade ASIL B
• Sensor acquisition
• Ethernet and CAN bus control

[Figure: SoC block diagram]
• Compute: GSP Cluster (GSP Cores), ARM A53 MP 2-core at 1000 MHz, ARM TrustZone CryptoCell
• Interconnect and debug: ARM NIC-400, AXI bridge, Peripheral DMA, Debug unit (JTAG)
• Memory and storage: RAM, Boot ROM, LPDDR (x2), QSPI, SD
• Camera and display: MIPI CSI-2 x4 (in), MIPI CSI-2 (out), MIPI DSI (out)
• Video: H264/H265 Decode, H264/H265 Encode
• I/O: CAN Bus Control x3, GPIO x32, I2C x5, UART x2, SPI slave, I2S, Ethernet, USB 3.0, PCIe 3.1 (4 lane)

Blocks highlighted for this use case: MIPI CSI-2 x4 (in), Ethernet, LPDDR, ARM NIC-400, In-car control, H264/H265 Encode, GSP Cores


Example: Auto Smart Vision Object Recognition
GSP collapses workload pipelines into a single programmable model

Other Architectures: Multiple HW/SW elements with heavy data movement, memory usage, power & mgmt
• Stages: Camera feed, Demosaic, Regions of Interest (Object Detector), Crop (Image Manipulation), Image Filtering/Enhancement, Scale / Format, Classification
• Hardware blocks as labeled on the slide: FF/GPU, CPU/GPU, FF/GPU, ISP, GPU, GPU

Thinci's GSP (One Visual Graph…)
• Stages: DMA, ROI, Crop, Filter, Scale, Format, NN (Inference), DMA
• Thinci GSP: Single processing element that can process the entire pipeline flow, reducing data movement, memory utilization, power & mgmt

Example classification results: Pedestrian (97%), Car (96%), No Left (93%)
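Collapsing a pipeline into one programmable model can be pictured as composing the stages into a single callable. The sketch below is an illustration only: the stage functions are simplified stand-ins written for this example (a crop, a nearest-neighbour downscale and a format conversion), images are plain nested lists, and no inference stage is included.

```python
from functools import reduce


def crop(image, top, left, height, width):
    """Cut a region of interest out of a 2D image."""
    return [row[left:left + width] for row in image[top:top + height]]


def scale_half(image):
    """Nearest-neighbour downscale by 2 in both directions."""
    return [row[::2] for row in image[::2]]


def to_unit_range(image):
    """Convert 8-bit pixel values to floats in [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def build_graph(*stages):
    """Chain the stages so that one call runs the whole pipeline."""
    def run(data):
        return reduce(lambda value, stage: stage(value), stages, data)
    return run


# A made-up 4x4 frame with pixel values 0..15.
frame = [[row * 4 + col for col in range(4)] for row in range(4)]
pipeline = build_graph(
    lambda image: crop(image, 1, 1, 2, 2),
    scale_half,
    to_unit_range,
)
output = pipeline(frame)
```

The application sees a single entry point, and the intermediate images never need to leave the composed pipeline.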


Example: Auto Smart Vision Object Recognition

• Graph of kernels operating on large image frames
• Depth first scheduling ensures data is used as soon as it is available
• Streaming minimizes intermediate data, enabling on-chip storage and low bandwidth

[Figure: graph of six kernels (f1 to f6) between DRAM input and DRAM output, with intermediate data held in on-chip storage]
Kernel labels: YUV-to-RGB, WB/Gamma, WDR Adaptive local tone mapping, Dehaze retinex enhancement, Noise reduction, Edge enhancement
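The effect of depth first scheduling on intermediate data can be shown with a toy scheduler. This is a sketch under simplifying assumptions, not the GSP hardware scheduler: a single execution unit, a linear chain of kernels, and tiles that become ready for the next kernel as soon as the previous one has run. The function and variable names are invented for illustration.

```python
def run_schedule(num_kernels, num_tiles, depth_first):
    """Run every (kernel, tile) task; return (order, peak intermediate tiles)."""
    # Kernel 0 reads its input tiles from DRAM, so all of them start ready.
    ready = [(0, tile) for tile in range(num_tiles)]
    order = []
    peak_intermediate = 0
    while ready:
        if depth_first:
            # Prefer the deepest ready kernel, so fresh data is consumed first.
            task = min(ready, key=lambda kt: (-kt[0], kt[1]))
        else:
            # Breadth first: finish a whole kernel before starting the next.
            task = min(ready)
        ready.remove(task)
        kernel, tile = task
        order.append(task)
        if kernel + 1 < num_kernels:
            ready.append((kernel + 1, tile))
        # Tiles waiting for a kernel other than the first are intermediate data.
        intermediate = sum(1 for k, _ in ready if k > 0)
        peak_intermediate = max(peak_intermediate, intermediate)
    return order, peak_intermediate


depth_order, depth_peak = run_schedule(6, 4, depth_first=True)
breadth_order, breadth_peak = run_schedule(6, 4, depth_first=False)
```

With six kernels and four tiles, the depth first order never holds more than one intermediate tile, while the breadth first order holds all four. A small, bounded amount of intermediate data is what makes it feasible to keep that data in on-chip storage.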

Auto Smart Vision: Pixel Level Segmentation

Features:
• Pixel Level Semantic Segmentation
• Up to 30 on-road object types
• Driveable free space detection
• ~25 FPS on HD Images

Description:

Semantic segmentation describes the environment with richer information than detection alone. For example, when the safe corridor for the ego-vehicle must be found, semantic segmentation provides the free-space information.
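The final step of pixel-level segmentation, and the way free space falls out of it, can be sketched in a few lines. The sketch assumes that a network has already produced a score per class for every pixel; the three-class label set and the score values are hypothetical.

```python
CLASSES = ["road", "vehicle", "pedestrian"]  # hypothetical label set


def segment(scores):
    """scores[row][col] is a list of per-class scores; return class indices."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def free_space_mask(labels, road_index=0):
    """Mark the pixels labeled as road, i.e. driveable free space."""
    return [[label == road_index for label in row] for row in labels]


# Made-up scores for a 2x2 image.
scores = [
    [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10]],
    [[0.80, 0.10, 0.10], [0.10, 0.20, 0.70]],
]
labels = segment(scores)
mask = free_space_mask(labels)
```

A detector would only report boxes around the vehicle and the pedestrian; the label map additionally states which remaining pixels are road.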


Auto Smart Vision: Sensor Data Dense Fusion (LiDAR + Front Camera)

Features:

• Driveable free space detection
• ~30 FPS

Description:

Sensor fusion merges data from multiple sensors to reduce the uncertainty involved in navigation or task execution. It helps build a more accurate world model so that the vehicle can navigate and behave more successfully.
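How fusion reduces uncertainty can be seen in the simplest possible case: two independent measurements of the same distance, combined by inverse-variance weighting. This is a textbook illustration, not the dense LiDAR and camera fusion shown on the slide, and the sensor values are made up.

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Combine two independent estimates of the same quantity."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    fused_var = 1.0 / (weight_a + weight_b)
    fused_mean = fused_var * (weight_a * mean_a + weight_b * mean_b)
    return fused_mean, fused_var


# Hypothetical range to an obstacle, in metres, with variances in m^2.
lidar_mean, lidar_var = 20.0, 0.04
camera_mean, camera_var = 21.0, 0.36
fused_mean, fused_var = fuse(lidar_mean, lidar_var, camera_mean, camera_var)
```

The fused variance is always smaller than either input variance, and the fused mean lies closer to the more precise sensor.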


Auto Smart Vision: On-Road Objects Volumetric Analysis

Features:
• 360° 3D bounding box detection
• Orientation estimation
• Speed estimation
• Vehicle detection
• Vehicle classification
• Vehicle localization
• >30 FPS

Description:

3D object detection is one of the most important prerequisites to autonomous navigation, because it allows the car controller to account for obstacles when considering possible future trajectories.
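A 3D bounding box with an orientation estimate is commonly described by a center, a size and a yaw angle. The sketch below converts such a description into its eight corner points; the parameterization and the example values are assumptions for illustration, not the output format of the Thinci demo.

```python
import math


def box_corners(center, size, yaw):
    """Return the 8 corners of a box rotated by `yaw` radians about the z axis.

    center = (x, y, z); size = (length, width, height).
    """
    cx, cy, cz = center
    length, width, height = size
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the offset in the ground plane, then translate.
                x = cx + dx * cos_yaw - dy * sin_yaw
                y = cy + dx * sin_yaw + dy * cos_yaw
                corners.append((x, y, cz + dz))
    return corners


# A hypothetical 4 m x 2 m x 1.5 m vehicle, first axis-aligned, then turned 90 degrees.
aligned = box_corners((10.0, 2.0, 0.75), (4.0, 2.0, 1.5), 0.0)
turned = box_corners((0.0, 0.0, 0.0), (4.0, 2.0, 1.5), math.pi / 2)
```

A planner can test candidate trajectories against these corners to decide whether a path clears the obstacle.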


Auto Smart Vision: Multi Objects Tracking

Features:

• Multi Vehicle 3D Tracking
• Speed Estimation
• >50 FPS
• EKF based tracking

Description:

3D object tracking is one of the most important prerequisites to autonomous navigation. It reduces the burden and processing time of heavy object detection, and it helps estimate the leading vehicle's trajectory, behaviour and speed.
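Speed estimation from tracked positions can be sketched with a one-dimensional constant-velocity filter. For a linear motion and measurement model like this one, an extended Kalman filter reduces to the standard Kalman filter, which is what the code implements. The noise parameters, the initial covariance and the test data are assumptions chosen for illustration; the slide does not describe the actual tracker.

```python
def track(measurements, dt, meas_var=0.25, process_var=1e-4):
    """Filter position measurements; return (position, speed) estimates."""
    pos, vel = 0.0, 0.0
    # Covariance entries, initialised large because the state is unknown.
    p00, p01, p10, p11 = 100.0, 0.0, 0.0, 100.0
    for z in measurements:
        # Predict with the constant-velocity model.
        pos += vel * dt
        p00, p01, p10, p11 = (
            p00 + dt * (p01 + p10) + dt * dt * p11 + process_var,
            p01 + dt * p11,
            p10 + dt * p11,
            p11 + process_var,
        )
        # Update with a position-only measurement.
        residual = z - pos
        innovation_var = p00 + meas_var
        k0, k1 = p00 / innovation_var, p10 / innovation_var
        pos += k0 * residual
        vel += k1 * residual
        p00, p01, p10, p11 = (
            (1 - k0) * p00,
            (1 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
    return pos, vel


# A hypothetical lead vehicle pulling away at 15 m/s, sampled every 0.1 s.
dt = 0.1
measurements = [15.0 * dt * k for k in range(1, 51)]
est_pos, est_speed = track(measurements, dt)
```

The filter never measures speed directly; it infers it from how the position measurements change, which is why tracking can supply a speed estimate without running a heavy detector on every frame.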


Auto Smart Vision: Low Visibility Assistance

Features:

• Highlight lane markings and leading vehicles in low-light conditions

• HDMaps & Reinforcement Learning based Thinci algorithms

Description:

This driver assistance system supports clear and safe driving in poor visibility such as fog, snow, rain, and darkness. Real-time, accurate digital maps reveal nearby objects, which supports lane keeping and higher driving speeds and helps prevent crashes and save lives.


Wrapping Up

Graph Streaming Processor™ (GSP) Architecture
• Up to 10-100x more efficient than traditional CPU/GPU solutions
• Scalable across data center, edge and client devices for enterprise and consumer
• Broad AI applicability in Machine Learning, Deep Learning, Neural Networks and Vision Processing

Powerful Software Development Kit (SDK)
• Popular AI frameworks such as TensorFlow, PyTorch & Caffe2
• Direct programming access for custom/proprietary development

Comprehensive Platform Roadmap
• Built around powerful Thinci GSP SOC modules
• SoCs, PCIe cards, M.2 cards and appliances for data center, edge and clients

[Figure: graph of nodes A, B, C, D with tasks 0 to 6 scheduled over time]


Thank You!
