CLUSTAR: AI Training Platform Powered by High Performance Networking
Junxue ZHANG, EVP CLUSTAR, PhD, SING Lab, HKUST
August 1, 2018


TRANSCRIPT

  • Junxue ZHANG

    EVP CLUSTAR, PhD, SING Lab, HKUST

    AUGUST 1, 2018

    CLUSTAR: AI Training Platform Powered by High Performance Networking

  • Deep Learning Is Becoming Increasingly Important

    27

    Computer Vision · Natural Language Processing · Autonomous Driving

  • How does Deep Learning Work?

    28

    Model: y = a * x + b

    [Diagram: input layer with nodes x and a constant 1, weighted by a and b, feeding a single sum node in the output layer]

    Mini batch:
    x | y | y_pred
    1 | 5 |
    2 | 7 |

    Initial parameters: a = 1, b = 1

  • How does Deep Learning Work?

    29

    Model: y = a * x + b

    [Diagram: input layer with nodes x and a constant 1, weighted by a and b, feeding a single sum node in the output layer]

    Forward Pass (with a = 1, b = 1):
    x | y | y_pred
    1 | 5 | 2
    2 | 7 | 3

  • How does Deep Learning Work?

    30

    Model: y = a * x + b

    [Diagram: input layer with nodes x and a constant 1, weighted by a and b, feeding a single sum node in the output layer]

    Forward Pass (with a = 1, b = 1):
    x | y | y_pred
    1 | 5 | 2
    2 | 7 | 3

    Calculating Loss:
    L = Σ C(y, y_pred) = (1/2) Σ (y − y_pred)²

  • How does Deep Learning Work?

    31

    Model: y = a * x + b

    [Diagram: input layer with nodes x and a constant 1, weighted by a and b, feeding a single sum node in the output layer]

    Forward Pass (with a = 1, b = 1):
    x | y | y_pred
    1 | 5 | 2
    2 | 7 | 3

    Calculating Loss:
    L = Σ C(y, y_pred) = (1/2) Σ (y − y_pred)²

    Backpropagation:
    ∂L/∂a = ∂L/∂y_pred × ∂y_pred/∂a = Σ (y_pred − y) · x = −11
    ∂L/∂b = ∂L/∂y_pred × ∂y_pred/∂b = Σ (y_pred − y) = −7

    Parameter update (with learning rate r = 0.1):
    a = a − r · ∂L/∂a = 1 − 0.1 * (−11) = 2.1
    b = b − r · ∂L/∂b = 1 − 0.1 * (−7) = 1.7

    (This single step is reproduced in the sketch below.)
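    The single training step above (forward pass, loss, gradients, update) can be checked with a few lines of plain Python. This is an illustrative sketch, not CLUSTAR code; it reproduces the numbers on the slide (loss 12.5, gradients −11 and −7, updated a ≈ 2.1 and b ≈ 1.7).

    ```python
    # One gradient-descent step for the linear model y = a * x + b,
    # matching the worked example on this slide (illustrative sketch).

    xs, ys = [1.0, 2.0], [5.0, 7.0]   # mini batch
    a, b, r = 1.0, 1.0, 0.1           # initial parameters and learning rate

    # Forward pass
    preds = [a * x + b for x in xs]                             # -> [2.0, 3.0]

    # Loss: L = 1/2 * sum((y - y_pred)^2)
    loss = 0.5 * sum((y - p) ** 2 for y, p in zip(ys, preds))   # -> 12.5

    # Backpropagation (chain rule)
    dL_da = sum((p - y) * x for x, y, p in zip(xs, ys, preds))  # -> -11.0
    dL_db = sum(p - y for y, p in zip(ys, preds))               # -> -7.0

    # Parameter update
    a, b = a - r * dL_da, b - r * dL_db
    print(loss, dL_da, dL_db, a, b)   # 12.5 -11.0 -7.0 ~2.1 ~1.7
    ```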

  • How does Deep Learning Work?

    32

    Model: y = a * x + b

    [Diagram: input layer with nodes x and a constant 1, weighted by a and b, feeding a single sum node in the output layer]

    Calculating Loss:
    L = Σ C(y, y_pred) = (1/2) Σ (y − y_pred)²

    Backpropagation:
    ∂L/∂a = ∂L/∂y_pred × ∂y_pred/∂a = Σ (y_pred − y) · x = −11
    ∂L/∂b = ∂L/∂y_pred × ∂y_pred/∂b = Σ (y_pred − y) = −7

    Parameter update:
    a = a − r · ∂L/∂a = 1 − 0.1 * (−11) = 2.1
    b = b − r · ∂L/∂b = 1 − 0.1 * (−7) = 1.7

    Next Iteration, with a new mini batch:
    x | y  | y_pred
    3 | 9  |
    5 | 13 |

  • How does Deep Learning Work?

    33

    [Diagram: network with an input layer, a hidden layer, and an output layer]

  • How does Deep Learning Work?

    34

    [Diagram: network with an input layer, a hidden layer, and an output layer; every edge carries its own weight w]

    The Forward Pass runs layer by layer, Calculating Loss happens at the output, and Backpropagation runs back through every layer (see the sketch below).
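    To make the layer-by-layer forward pass and backpropagation concrete, here is a minimal NumPy sketch of a network with one hidden layer. The layer sizes, tanh activation, and squared loss are assumptions chosen only for illustration, not details taken from the slide.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny network: 2 inputs -> 3 hidden units -> 1 output (illustrative sizes)
    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

    x = np.array([[1.0, 2.0]])   # one mini-batch row
    y = np.array([[5.0]])
    r = 0.1                      # learning rate

    # Forward pass, layer by layer
    h = np.tanh(x @ W1 + b1)     # hidden layer
    y_pred = h @ W2 + b2         # output layer

    # Calculating loss: L = 1/2 * (y - y_pred)^2
    loss = 0.5 * np.sum((y - y_pred) ** 2)

    # Backpropagation, back through every layer (chain rule)
    d_out = y_pred - y                       # dL/dy_pred
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)      # through tanh
    dW1, db1 = x.T @ d_h, d_h.sum(0)

    # Gradient-descent update for every weight
    W1, b1 = W1 - r * dW1, b1 - r * db1
    W2, b2 = W2 - r * dW2, b2 - r * db2
    print(loss)
    ```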

  • The Big Data Drives a New Paradigm for Training

    35

    1. Data is too large to fit in a single machine

    2. The training time is too long. Uber reports that training usually takes weeks or longer to complete [1]

  • Networking Plays an Important Role

    36

    [Diagram: a Parameter Server holding parameters w_1, w_2, … is connected over the network to Worker 1 (Data Partition 1) and Worker 2 (Data Partition 2)]

  • Networking Plays an Important Role

    37

    [Diagram: same setup; each worker pulls copies of w_1, w_2 from the Parameter Server over the network]

    Pull Parameters From Servers

  • Networking Plays an Important Role

    38

    [Diagram: same setup; each worker now holds w_1, w_2 and runs a Forward Pass on input from its own data partition]

    Forward Pass on each worker

  • Networking Plays an Important Role

    39

    [Diagram: same setup; after the Forward Pass, each worker is Calculating Loss on its own data partition]

    Forward Pass, then Calculating Loss on each worker

  • Networking Plays an Important Role

    40

    [Diagram: same setup; each worker runs Backpropagation, producing its own parameter updates w_1', w_2' and w_1'', w_2'']

    Backpropagation on each worker

  • Networking Plays an Important Role

    41

    [Diagram: same setup; after Backpropagation, each worker pushes its parameter updates back to the Parameter Server]

    Backpropagation, then Push parameters to Servers

  • Networking Plays an Important Role

    42

    [Diagram: same setup; every pull and push of parameters crosses the network]

    Networking is critical to performance! (A sketch of this worker loop follows below.)
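    The worker-side loop built up across the slides above (pull, forward pass, loss, backpropagation, push) can be written down compactly. The following Python mock keeps the parameter server in-process; the ParameterServer class and its pull/push methods are placeholders for illustration, not a CLUSTAR or TensorFlow API, and in a real deployment every pull and push crosses the network.

    ```python
    # In-process mock of data-parallel training with a parameter server
    # (illustrative only; real workers pull/push over the network).

    class ParameterServer:
        def __init__(self, params):
            self.params = dict(params)          # e.g. {"a": 1.0, "b": 1.0}

        def pull(self):
            return dict(self.params)            # workers pull current parameters

        def push(self, grads, lr):
            for k, g in grads.items():          # workers push gradients
                self.params[k] -= lr * g

    def worker_step(ps, batch, lr=0.1):
        params = ps.pull()                                     # 1. pull
        a, b = params["a"], params["b"]
        preds = [a * x + b for x, _ in batch]                  # 2. forward pass
        loss = 0.5 * sum((y - p) ** 2
                         for (_, y), p in zip(batch, preds))   # 3. calculating loss
        grads = {"a": sum((p - y) * x for (x, y), p in zip(batch, preds)),
                 "b": sum(p - y for (_, y), p in zip(batch, preds))}  # 4. backprop
        ps.push(grads, lr)                                     # 5. push
        return loss

    ps = ParameterServer({"a": 1.0, "b": 1.0})
    print(worker_step(ps, [(1, 5), (2, 7)]), ps.params)  # 12.5 {'a': ~2.1, 'b': ~1.7}
    ```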

  • Networking Plays an Important Role

    43

    Model:   Logistic Regression | Multi-layer Perceptron | AlexNet | VGG-16 | ResNet-50
    Speedup: 2.59x               | 3.45x                  | 1.6x    | 1.33x  | 1.03x

    The speedup achieved after utilizing 40Gbps networking bandwidth with CLUSTAR

  • CLUSTAR: AI Training Platform Powered by High Performance Networking

    44

    Key Technology (World-leading Research Achievements)

    Smart Networking Scheduling
    • Co-flow scheduling
    • Elephant & mice flow scheduling

    GDR
    • Towards 0-copy data flow
    • Utilize RDMA and GPUDirect
    • Integrated with TensorFlow

    ParaExpress
    • Resilient and adaptive parameter aggregation
    • Tackles the disadvantages of Parameter Server & Ring AllReduce

    MLT
    • Utilizes the SGD nature of AI training
    • Semi-loss tolerance
    • Model quality awareness

    Analogy: between 2 machines, wider roads; across multiple machines, traffic scheduling; the AI protocol, new traffic rules for AI

    Networking is as important to an AI system as the traffic system is to a city



  • CLUSTAR Platform

    47

    [Platform architecture diagram]

    Applications: Computer Vision · Speech Recognition · Natural Language Processing · Autonomous Driving · Intelligent Anti-fraud · Intelligent Drones
    Industry solutions: Finance · Security · Internet · Manufacturing · Healthcare · Government

    Nebula Platform: Data Preprocessing · Offline Training · Online Training · Multi-tenant Management · Job Scheduling · Operations Monitoring
    Software stack: Spark optimization · TensorFlow optimization · Container orchestration engine · Interactive programming interface

    Clustar AI Fabrics: RoCE · Smart NICs · Programmable Networking · Storage

    Infrastructure (commodity hardware): CPU · GPU · FPGA · ASIC · RDMA networking · All-flash storage
    Vendors: Intel · Nvidia · AMD · Cambricon · Mellanox · Broadcom · P4E8

  • GDR: Towards Zero Copy Data Flow

    48

    [Diagram: Server 1 and Server 2, each with two CPU sockets, their memory, several GPUs, and an RDMA NIC, connected through data center networking]

    The unnecessary copies between RNIC/host memory and GPU RAM/host memory enlarge latency, degrade throughput, and burn CPU cycles


  • GDR: Towards Zero Copy Data Flow

    56

    [Diagram: the same two-server topology; with GDR, data moves between the RDMA NIC and GPU memory without staging through host memory]

    GDR removes the unnecessary copy to boost performance


  • Memory Management

    60

    [Diagram: OS-managed application memory alongside a pinned RDMA memory sending buffer]

    Unnecessary data copies between the pinned buffer and application memory degrade performance

    [Diagram: a pinned RDMA memory sending buffer managed directly by GDR]

    GDR further reduces data copies by managing objects manually over pinned memory

  • Memory Management

    61

    [Diagram: an allocated object in OS-managed application memory, plus a pinned RDMA memory sending buffer]

    Unnecessary data copies between the pinned buffer and application memory degrade performance

    [Diagram: a pinned RDMA memory sending buffer managed directly by GDR]

    GDR further reduces data copies by managing objects manually over pinned memory


  • Memory Management

    63

    [Diagram: an allocated object in OS-managed application memory is data-copied into the pinned RDMA memory sending buffer]

    Unnecessary data copies between the pinned buffer and application memory degrade performance

    [Diagram: a pinned RDMA memory sending buffer managed directly by GDR]

    GDR further reduces data copies by managing objects manually over pinned memory

  • Memory Management

    64

    [Diagram: an allocated object in OS-managed application memory is data-copied into the pinned RDMA memory sending buffer]

    Unnecessary data copies between the pinned buffer and application memory degrade performance

    [Diagram: a pinned RDMA memory sending buffer over which GDR manually manages allocation]

    Manually manage malloc() and free() over pre-pinned memory

    GDR further reduces data copies by managing objects manually over pinned memory

  • Memory Management

    65

    [Diagram: an allocated object in OS-managed application memory is data-copied into the pinned RDMA memory sending buffer]

    Unnecessary data copies between the pinned buffer and application memory degrade performance

    [Diagram: objects are allocated directly inside the pinned RDMA memory sending buffer, with malloc() and free() managed manually over the pre-pinned region]

    GDR further reduces data copies by managing objects manually over pinned memory (see the allocator sketch below)

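    As a rough illustration of "manually manage malloc() and free() over pre-pinned memory" (not CLUSTAR's actual allocator), the sketch below carves objects out of one pre-allocated buffer using a free list. In a real GDR setup that buffer would be registered once with the RDMA NIC (for example via ibv_reg_mr), so objects placed in it can be sent without an extra staging copy.

    ```python
    # Toy arena allocator over a single pre-allocated (conceptually pinned) buffer.
    # Illustrative sketch only: real code would register `buf` with the RDMA NIC once.

    class PinnedArena:
        def __init__(self, size):
            self.buf = bytearray(size)          # stands in for pinned, registered memory
            self.free_blocks = [(0, size)]      # (offset, length) free list

        def malloc(self, length):
            """Return the offset of a block carved out of the pinned buffer."""
            for i, (off, ln) in enumerate(self.free_blocks):
                if ln >= length:
                    remainder = (off + length, ln - length)
                    self.free_blocks[i:i + 1] = [remainder] if remainder[1] else []
                    return off
            raise MemoryError("arena exhausted")

        def free(self, offset, length):
            """Return a block to the free list (no coalescing, for brevity)."""
            self.free_blocks.append((offset, length))

        def write(self, offset, data):
            self.buf[offset:offset + len(data)] = data   # object lives in pinned memory

    arena = PinnedArena(4096)
    off = arena.malloc(11)
    arena.write(off, b"tensor data")   # no staging copy into a separate send buffer
    arena.free(off, 11)
    ```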

  • TensorFlow GDR

    67

    GDR has been contributed to the TensorFlow community (a commercial version is also available): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/gdr
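    For reference, a minimal distributed-TensorFlow sketch using the contrib GDR transport might look like the following (TensorFlow 1.x API; the host names, job layout, and task index are placeholders, and the "grpc+gdr" protocol string follows the contrib module's documentation).

    ```python
    import tensorflow as tf

    # Hypothetical cluster layout (placeholders, not real hosts)
    cluster = tf.train.ClusterSpec({
        "ps": ["ps0.example.com:2222"],
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    })

    # Selecting the GDR transport is done via the server protocol string.
    server = tf.train.Server(cluster, job_name="worker", task_index=0,
                             protocol="grpc+gdr")
    server.join()
    ```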

  • Benchmark

    68

    [Benchmark charts: AlexNet, VGG16, BERT]

  • The Evil of Parameter Server & Ring AllReduce

    69

    [Diagram, left: Workers A, B, C connected to a Parameter Server; right: Workers A, B, C, D arranged in a ring]

    The Parameter Server approach degrades badly on congested links due to over-subscribed networking

  • The Evil of Parameter Server & Ring AllReduce

    70

    [Diagram: the link into the Parameter Server becomes the bottleneck link; the ring of Workers A–D is shown alongside]

    The Parameter Server approach degrades badly on congested links due to over-subscribed networking


  • The Evil of Parameter Server & Ring AllReduce

    73

    [Diagram, left: the link into the Parameter Server is the bottleneck link; right: in the ring of Workers A–D, one hop is delayed due to congestion, so the following workers cannot start transferring]

    The Parameter Server approach degrades badly on congested links due to over-subscribed networking

  • The Evil of Parameter Server & Ring AllReduce

    74

    [Diagram, left: the link into the Parameter Server is the bottleneck link; right: in the ring of Workers A–D, one hop is delayed due to congestion, so the following workers cannot start transferring]

    The Parameter Server approach degrades badly on congested links due to over-subscribed networking

    The long dependency chain of Ring AllReduce may stall the whole job once a single hop blocks (see the timing sketch below)
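    To make the dependency chain concrete, here is a small Python timing sketch of the reduce-scatter phase of a generic ring all-reduce (not CLUSTAR code). Each worker can only finish a step after both it and its sender finished the previous step, so one congested link (delays below are made-up numbers) delays every worker downstream of it.

    ```python
    # Toy timing model of the reduce-scatter phase of ring all-reduce (N-1 steps).
    # Worker i receives from worker (i-1) % N; one congested link stalls everyone
    # that sits after it in the dependency chain.

    N = 4
    link_delay = {(0, 1): 0.1, (1, 2): 0.1, (2, 3): 5.0, (3, 0): 0.1}  # (2,3) is congested

    done = [0.0] * N                      # time at which each worker finished the last step
    for step in range(N - 1):             # ring all-reduce needs N-1 reduce-scatter steps
        new_done = []
        for i in range(N):
            sender = (i - 1) % N
            # Worker i finishes this step only after it and its sender finished the
            # previous step, plus the transfer time over the incoming link.
            new_done.append(max(done[i], done[sender]) + link_delay[(sender, i)])
        done = new_done
        print(f"after step {step + 1}: {done}")

    print("job finishes at", max(done))   # dominated by the single slow link
    ```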

  • ParaExpress: Networking-aware Parameter Aggregation

    75

    [Diagram: workers 1–8 spread across Rack 1 and Rack 2, each rack with its own ToR switch; real-time networking conditions are fed into a generator that produces the optimal parameter-aggregation topology of leaves, aggregators, and a root]

    The generated parameter aggregation topology has the advantages of both the tree structure (Parameter Server) and the ring structure (Ring AllReduce); a simplified rack-aware sketch follows below
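    As a loose illustration of topology-aware aggregation (a generic rack-aware, two-stage reduction, not the ParaExpress algorithm itself), the sketch below sums gradients inside each rack first and only then across racks, keeping most traffic under each ToR switch. The worker-to-rack placement and gradient values are made up.

    ```python
    # Generic rack-aware hierarchical aggregation (illustration only,
    # NOT the ParaExpress algorithm): reduce within each rack first,
    # then across racks, to keep traffic off over-subscribed inter-rack links.

    from collections import defaultdict

    # Hypothetical placement: worker id -> rack id
    placement = {1: "rack1", 2: "rack1", 3: "rack1", 4: "rack1",
                 5: "rack2", 6: "rack2", 7: "rack2", 8: "rack2"}

    # Each worker's local gradient (scalars for brevity)
    gradients = {w: float(w) for w in placement}

    # Stage 1: aggregate inside each rack (cheap, stays under one ToR switch)
    per_rack = defaultdict(float)
    for worker, rack in placement.items():
        per_rack[rack] += gradients[worker]

    # Stage 2: aggregate the per-rack partial sums at a root aggregator
    global_sum = sum(per_rack.values())

    print(dict(per_rack))   # {'rack1': 10.0, 'rack2': 26.0}
    print(global_sum)       # 36.0
    ```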

  • ParaExpress Architecture

    76

    [Architecture diagram: a ParaExpress Master (embedding plan, prioritization) coordinates per-node ParaExpress Agents; each agent holds a task queue, completion queue, execution graph, resolver, operation pool, and request manager, issues MPI requests over a high-speed network interface, and a traffic prioritization module changes the DSCP-to-priority mapping; tensors flow through receive/aggregate/send operations R1…Rn, A1…An, S1…Sn]

  • Highlighted Results

    77

    • Compared with TensorFlow PS, Baidu Ring AllReduce, and Horovod, the software optimization of ParaExpress achieves 1.5x-4.3x better performance.

    • In real environments, ParaExpress achieves 2.6x better results than Parameter Server and 3x better results than Ring AllReduce.

  • About CLUSTAR

  • World-leading Research Achievements

    79

    9 papers appeared in top-tier networking conferences (SIGCOMM/NSDI) in the recent 5 years, ranking first in Asia.

    • "AuTO: Scaling Deep Reinforcement Learning to Enable Datacenter-Scale Automatic Traffic Optimization", ACM SIGCOMM 2018

    • "PowerMan: An Out-of-Band Management Network for Datacenters using Power Line Communication", USENIX NSDI 2018

    • "Resilient Datacenter Load Balancing in the Wild", ACM SIGCOMM 2017

    • "Enabling Wide-spread Communications on Optical Fabric with MegaSwitch", USENIX NSDI 2017

    • "Scheduling Mix-flows in Commodity Datacenters with Karuna", ACM SIGCOMM 2016

    • "CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark", ACM SIGCOMM 2016

    • "Enabling ECN in Multi-Service Multi-Queue Data Centers", USENIX NSDI 2016

    • "Information-Agnostic Flow Scheduling for Commodity Data Centers", USENIX NSDI 2015

    • "Explicit Path Control in Commodity Data Centers: Design and Applications", USENIX NSDI 2015

    Statistics for universities in greater China:

    University                             | Number of accepted papers
    HKUST                                  | 9 (all from CLUSTAR's teams)
    Tsinghua University                    | 5 (from different professors and labs)
    Chinese Academy of Sciences            | 3 (from different professors and labs)
    Peking University                      | 1
    Fudan University                       | 1
    National Supercomputing Center in Wuxi | 1

  • Selected Clients

    80

    GDR (selected clients)
    Utilize GDR to boost the performance of Moments classification for WeChat and the performance of CV workloads for SAIC.
    Achievements: ~3x for WeChat; ~1.6x for SAIC.

    ParaExpress (selected clients)
    Utilize ParaExpress to improve the performance of AI training in a sophisticated cloud environment.
    Progress: POC.

    Utilize GDR to boost the performance of Federated Learning; utilize MLT to boost long-distance communication.
    Progress: developing.

    High-speed networking virtualization for an AI unicorn [1].
    Progress: developing.

    AI Consulting (selected clients)
    Smart customer support system; utilize the CLUSTAR platform to speed up AI training.
    Progress: developing the next-gen AI platform together.

    [1] NDA issue

  • CLUSTAR Team

    81

    Kai CHEN, Founder
    • PhD, Northwestern University
    • Associate Professor, HKUST
    • 50+ top-tier networking conference papers (SIGCOMM/NSDI)
    • 10+ years of research experience on DCN
    • Director of SING Lab, HKUST
    • Director of WHAT Lab, HKUST

    Qiang YANG, Co-founder
    • PhD, University of Maryland
    • Chair Professor and Department Head of CSE, HKUST
    • President of IJCAI
    • Pioneer of transfer learning
    • IEEE/ACM/AAAI Fellow
    • Founding director of the Huawei Noah's Ark Research Lab

    Shuihai HU, VP of Technology
    • PhD, HKUST
    • Expertise in RDMA

    Pin LYU, Director of Algorithm
    • 7 years of IBM software development

    Junxue ZHANG, EVP
    • PhD, HKUST
    • Architect of the CLUSTAR platform

    Yajing LYU, VP of Business
    • MBA, ESSEC
    • 6+ years of business experience

    Junhuan SUN, VP of Engineering
    • 10+ years of engineering experience

    Weiyan WANG, AI Scientist
    • PhD, HKUST
    • AutoML systems

  • Milestone

    82

    2018.03  Angel Funding
    2018.05  CLUSTAR is founded!
    2018.09  Join the Nvidia Inception Program
    2018.11  CLUSTAR v1.0 launched! Cooperation with SAIC; cooperation with Sunshine Insurance
    2019.01  CLUSTAR v1.1 launched! Cooperation with WeChat

    (The slide timeline also marks 2017.08 and 2018.01.)

  • THANK YOU!

    [email protected]

    https://www.clustarai.com