
IBM Power Systems

Artificial Intelligence with IBM POWER9 and PowerAI / AI Vision

Ulrich Walter

IBM Systems

A New Era of Computing Has Emerged

[Timeline graphic: Centralized Computing (office productivity) → Distributed Computing (personal computer, client/server, e-business, data warehousing, smarter planet) → Cognitive Era (big data & predictive analytics, cognitive).]

[Diagram: the path from data to actionable insight in context: transactional database (reporting) → business intelligence → big data & advanced analytics → actionable insight in context.]

Why Deep Learning Is Happening Now

• Big data
• Deep neural networks
• Powerful accelerators
• High-performance POWER CPUs, increased bandwidth and storage capacities

Sic Transit Gloria Mundi

• Google Brain, 2012: 16,000 servers, ~8 MW, ~50 TFLOPS
• 2015: 3 NVIDIA Pascal GPUs, ~0.9 kW, ~62 TFLOPS
• 2017: 1 NVIDIA Volta GPU, ~0.3 kW, ~120 TFLOPS
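The efficiency jump in these figures can be made concrete with a quick calculation. The numbers are the slide's own rough order-of-magnitude values (power draw is approximate, TFLOPS are peak):

```python
# Rough compute-efficiency comparison using the slide's figures.
systems = {
    "Google Brain 2012 (16,000 servers)": {"tflops": 50, "kw": 8000},  # ~8 MW
    "2015: 3x NVIDIA Pascal GPUs":        {"tflops": 62, "kw": 0.9},
    "2017: 1x NVIDIA Volta GPU":          {"tflops": 120, "kw": 0.3},
}

for name, s in systems.items():
    eff = s["tflops"] / s["kw"]  # TFLOPS per kW of power drawn
    print(f"{name}: {eff:,.2f} TFLOPS/kW")
```

The 2017 single-GPU system delivers on the order of 400 TFLOPS/kW versus well under 0.01 TFLOPS/kW for the 2012 cluster: several orders of magnitude in five years.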

Obama: My Successor Will Govern a Country Being Transformed by AI


[Diagram: data sources (image & video, voice & sound, ComInt/ELInt/SigInt, text, sensor) feed a pipeline: from data to information (time to analyse), from information to knowledge (time to reason & conclude), and from knowledge to action/reaction (time to action, decision, inference).]

The four phases of DL/AI in the data Supply Chain – technical view

[Diagram: the four phases applied to the data sources (image & video, voice & sound, ComInt/ELInt/SigInt, text, sensor):
1. Detect and Collect: compress/map-reduce, tag/aggregate. Drivers: data, storage bandwidth, latency, interval, network interface.
2. Store/Analyze: knowledge base. Drivers: data, storage bandwidth, latency, I/O block size, network interface, storage type.
3. Learn: distributed deep learning, comparison and interpretation, combine, conclude/reason. Drivers: compute, storage bandwidth, latency, I/O block size, locality.
4. Applied Knowledge (inference): platforms, FPGAs, applications, appliances.
Infrastructure: CAPI (Coherent Accelerator Processor Interface) for data, InfiniBand network, disk/storage-rich servers, Flash/NVMe flash for high I/O, data locality and low latency; IBM Spectrum family and other storage options.]

The four phases of DL/AI in the data Supply Chain

[Diagram: the same four phases (Detect and Collect → Store/Analyze → Learn → Applied Knowledge/Inference) mapped onto the IBM Platform for Deep Learning / Artificial Intelligence, complemented by cloud services and by IBM AI Vision for automation and scale-out DDL.
• Compute: IBM POWER AC922 with breakthrough performance for DL/AI and HPC and native NVLink; IBM Systems and PowerAI framework with deep learning frameworks, distributed frameworks, and supporting libraries such as OpenBLAS.
• Storage: IBM Storage for Big Data and Analytics; analytic frameworks and solutions (Hadoop); filesystems: IBM Spectrum Scale, BeeGFS, CEPH/XFS.
  – IBM Elastic Storage Server (ESS): extreme scalability, breakthrough performance, integrated solution, InfiniBand and Ethernet support.
  – IBM Power System S822LC: scalable technology, OpenPOWER design, Linux only, Flash and SAS SSD, InfiniBand and Ethernet support.
• Inference (applied knowledge): platforms, FPGAs, applications, appliances, APIs.]

POWER9: the only processor specifically designed for the AI era

• 4x more threads for high-performance cores vs. x86
• 5x+ more I/O bandwidth than x86 (OpenCAPI, NVLink 2.0, PCIe Gen4)
• >2x more memory bandwidth

AC922: 2-socket, 4-GPU (SXM2) air-cooled node

▪ Processor
– Two POWER9 SCMs; TDP 190 W, RDP 250 W; 16 or 20 cores each
▪ Memory
– Maximum of 16 DDR4 IS RDIMM slots, direct attach to the POWER9 module
– 128 GB to 2048 GB configurations; 8, 16, 32, 64, and 128 GB DIMMs
▪ Internal storage
– RAID 0, 1
– 0 to 2 SFF SATA HDDs or SSDs; default is 0 disks
– PCIe adapter form-factor NVMe flash adapter
▪ PCIe Gen4 slots
– 2 PCIe x16 Gen4 LP slots, 1 PCIe x4 Gen4 LP slot
– 1 dual-port IB EDR NIC, shared by the two POWER9 sockets
▪ GPU
– NVIDIA Volta GPUs (300 W max), SXM2 form factor, NVLink 2.0
– 2 GPUs per POWER9 socket; optimized GPU bandwidth, all POWER9 NVLinks to GPUs
– 2 GPUs base (1 per P9), feature upgrade to 4 GPUs
▪ Native I/O
– Host 2-port USB 3.0 (1 front, 1 rear); management port
▪ Enclosure form factor
– 2U 19" rack enclosure, ~710 mm (28") deep
▪ O/S: Linux only
– Ubuntu 17.10 (with updates); RHEL for POWER9/CORAL (limited availability in 2017)
▪ Sapphire, Opal
▪ KVM: not required
▪ RAS
– 5-year MTBF; concurrent maintenance of HDDs (when installed); N+1 redundant hot-swap cooling
– 2x 2200 W bulk power supplies, N+1 redundant in the 4-GPU configuration; 200 VAC, 277 VAC, 400 VDC input voltage support
▪ Energy efficiency: on-chip power management, power gating, advanced thermal management; 80+ Platinum power supply compliant
▪ Service interface: industry BMC service controller with OpenBMC firmware; operator interface and FRU LEDs
▪ Certifications: FCC Class A for servers; acoustics: rack configuration not to exceed 9.5 B; environment: ASHRAE A3 (5-40 °C, 8-85% RH, 3050 m max)
▪ Offering: air cooled only (air-cooled version shown)

The NVLink difference, CPU-GPU

• P9 with 2nd-generation NVLink enables 5.6x faster CPU-to-GPU data movement in a 4-GPU system.
• In a 6-GPU system the bandwidth is minimally reduced, but this is balanced by higher compute capability.
• Results are based on IBM internal measurements running the CUDA H2D bandwidth test.
• IBM hardware: Power AC922, 32 cores (2 x 16c chips), POWER9 with NVLink 2.0, 2.25 GHz, 1024 GB memory, 4x Tesla V100 GPUs, Ubuntu 16.04; S822LC for HPC, 20 cores (2 x 10c chips), POWER8 with NVLink, 2.86 GHz, 512 GB memory, Tesla P100 GPUs.
• Competitive hardware: 2x Intel Xeon E5-2640 v4, 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 1024 GB memory, 4x Tesla V100 GPUs, Ubuntu 16.04.

POWER9 – Data Capacity and Throughput

Big caches for massively parallel compute and heterogeneous interaction; extreme switching bandwidth for the most demanding compute and accelerated workloads.

L3 cache: 120 MB shared-capacity NUCA cache
• 10 MB capacity + 512 KB L2 per SMT8 core
• Enhanced replacement with reuse and data-type awareness
• 12 x 20-way associativity

High-throughput on-chip fabric
• Over 7 TB/s on-chip switch
• Moves data in/out at 256 GB/s per SMT8 core

POWER9 – Premier Acceleration Platform

State-of-the-art I/O and acceleration attachment signaling:
• PCIe Gen4 x 48 lanes: 192 GB/s duplex bandwidth
• 25G link x 48 lanes: 300 GB/s duplex bandwidth
• Robust accelerated compute options with open standards
• On-chip acceleration: gzip x1, 842 compression x2, AES/SHA x2
• CAPI 2.0: 4x the bandwidth of POWER8 using PCIe Gen4; high bandwidth and low latency using the 25G link
• NVLink 2.0: next generation of CPU/GPU bandwidth

NVLink Evolution in IBM Power AC922: 5x Faster Data Communication

[Diagram comparing CPU-GPU attach topologies:
• x86: NVIDIA GPU attached over PCIe x16 at 16+16 GB/s between system memory and graphics memory.
• IBM POWER9, bandwidth optimized: the POWER9 chip with NVLink 2.0 links to each V100 GPU at 50+50 GB/s, for 150 GB/s aggregate CPU-GPU bandwidth, with 170 GB/s system memory bandwidth and up to 1 TB of system memory: store large models in system memory, fast transfer via NVLink.
• IBM POWER9, scalability optimized: the same 50+50 GB/s NVLink bricks spread across more GPUs.]

Train larger, more complex models: Large Model Support

[Diagram: traditional model support attaches the GPU to the CPU's DDR4 memory over PCIe; limited memory on the GPU forces a trade-off in model size / data resolution. Large Model Support attaches the GPU to the POWER CPU over NVLink and uses system memory and GPU memory together to support more complex models and higher-resolution data.]

• Leveraging NVLink and coherence enables larger and more complex models
• Improves model accuracy with more images and higher-resolution images

Large AI models train ~4 times faster on POWER9 servers with NVLink to GPUs vs. x86 servers with PCIe to GPUs.

[Chart: Caffe with LMS (Large Model Support), runtime of 1000 iterations, GoogLeNet model on an enlarged ImageNet dataset (2240x2240). Xeon E5-2640 v4 with 4x V100 GPUs: 3.1 hours; Power AC922 with 4x V100 GPUs: 49 minutes; 3.8x faster.]
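The headline speedup follows directly from the two quoted runtimes:

```python
# Speedup implied by the LMS benchmark numbers on the slide
# (Caffe with Large Model Support, 1000 iterations, GoogLeNet on 2240x2240 images).
x86_seconds = 3.1 * 3600   # 3.1 hours on Xeon E5-2640 v4 + 4x V100 over PCIe
ac922_seconds = 49 * 60    # 49 minutes on Power AC922 + 4x V100 over NVLink

speedup = x86_seconds / ac922_seconds
print(f"Speedup: {speedup:.1f}x")  # ~3.8x, matching the slide
```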

Distributed Deep Learning (DDL)

Deep learning training takes days to weeks, and scaling to multiple x86 servers is limited. PowerAI with DDL enables scaling to 100s of servers: 16 days on 1 system down to 7 hours on 64 systems, 58x faster (ResNet-101, ImageNet-22K).

[Chart: near-ideal scaling to 256 GPUs; speedup vs. number of GPUs (4 to 256), ideal scaling vs. DDL actual scaling, 95% scaling efficiency with 256 GPUs. Caffe with PowerAI DDL running on Minsky (S822LC) Power systems; ResNet-50, ImageNet-1K.]
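Scaling efficiency here means actual speedup divided by ideal (linear) speedup. The 58x and 95% figures come from IBM's measurements; the rough arithmetic below, using only the quoted wall-clock times, lands in the same ballpark:

```python
# Scaling-efficiency helper applied to the DDL numbers quoted on the slide.
def scaling_efficiency(t_base_hours, n_base, t_scaled_hours, n_scaled):
    """Fraction of ideal (linear) speedup achieved when scaling
    from n_base to n_scaled systems."""
    actual = t_base_hours / t_scaled_hours
    ideal = n_scaled / n_base
    return actual / ideal

# 16 days on 1 system down to 7 hours on 64 systems:
actual = (16 * 24) / 7                       # ~54.9x measured speedup
eff = scaling_efficiency(16 * 24, 1, 7, 64)
print(f"{actual:.0f}x speedup, {eff:.0%} of ideal")
```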

IBM PowerAI Vision

Productivity, accuracy, enterprise scale, cost efficiency, end-to-end capability.

AI infrastructure stack:
• Applications: segment-specific (finance, retail, healthcare)
• Cognitive APIs (e.g. Watson) and in-house APIs: speech, vision, NLP, sentiment
• Machine & deep learning libraries & frameworks: TensorFlow, Caffe, SparkML
• Transform & prep data (ETL); distributed computing: Spark, MPI
• Data lake & data stores: Hadoop HDFS, NoSQL DBs
• Accelerated infrastructure (PowerAI): accelerated servers and storage

PowerAI Enterprise

• PowerAI Vision: auto-ML for images & video: label, train, deploy
• Deep Learning Impact (DLI) module: data & model management, ETL, visualization, advice
• IBM Spectrum Conductor with Spark: cluster virtualization, auto hyper-parameter optimization
• PowerAI: open-source ML frameworks, Large Model Support (LMS), Distributed Deep Learning (DDL), Auto ML
• Accelerated infrastructure: accelerated servers and storage

Auto Hyper-Parameter Tuning

Hyper-parameters:
– Learning rate
– Decay rate
– Batch size
– Optimizer (gradient descent, Adadelta, Momentum, RMSProp, …)
– Momentum (for some optimizers)
– LSTM hidden unit size

Search strategies: random, tree-based Parzen estimator (TPE), Bayesian.

Spark search jobs are generated dynamically and executed in parallel on a multi-tenant Spark cluster (IBM Spectrum Conductor with Spark).
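A minimal sketch of the random-search strategy listed above. The toy objective and parameter ranges are illustrative, not PowerAI's actual implementation; in the product, Spectrum Conductor generates each trial as its own Spark job and runs them in parallel:

```python
import random

# Hypothetical search space over the hyper-parameters named on the slide.
SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "decay_rate":    lambda: random.uniform(0.9, 0.999),
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
    "optimizer":     lambda: random.choice(["sgd", "adadelta", "momentum", "rmsprop"]),
}

def sample_config():
    return {name: draw() for name, draw in SPACE.items()}

def toy_objective(cfg):
    # Stand-in for validation loss after a training run; lower is better.
    return abs(cfg["learning_rate"] - 0.01) + (1 - cfg["decay_rate"])

def random_search(n_trials=50, seed=0):
    random.seed(seed)
    trials = [sample_config() for _ in range(n_trials)]
    # In PowerAI, each trial would be evaluated by a separate parallel Spark job.
    return min(trials, key=toy_objective)

best = random_search()
print("best config:", best)
```

TPE and Bayesian optimization replace the uniform sampling with a model that proposes promising configurations based on the trials seen so far.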


Snap Machine Learning (ML) Library: distributed GPU-accelerated machine learning

• libGLM (C++ / CUDA-optimized primitive library)
• Distributed training: logistic regression, linear regression, support vector machines (SVM)
• Distributed hyper-parameter optimization
• APIs for popular ML frameworks
• More coming soon
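Snap ML's GLM primitives cover models like the logistic regression listed above. As a conceptual illustration only (this is not Snap ML's API, which is GPU-accelerated and distributed), here is plain batch gradient descent on a toy 1-D dataset:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, lr=0.5, epochs=200):
    """Batch gradient descent for 1-D logistic regression."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # prediction error for this sample
            gw += err * x / n
            gb += err / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data: label 1 when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(xs, ys)
print(sigmoid(w * 2.0 + b) > 0.5)   # a clearly positive point classifies as 1
```

Snap ML implements the same objective with GPU-resident primitives (libGLM) and partitions the data across nodes for distributed training.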

PowerAI Vision: the deep learning development platform for image/video analysis

[Architecture diagram, bottom to top:
• Infrastructure: accelerators (GPU/FPGA), data store (distributed FS and object store), network; Docker, KVM (POWER).
• Resource management layer (CPU/GPU/FPGA): Docker, Kubernetes.
• ML/DL computation layer: model training component (Caffe and others), inference component (Caffe and others), data preprocessing component.
• Service management layer: data set management, training task management, model management, inference API management, data label management, image preprocessing management.
• Vision recognition service layer: image labeling and preprocessing, video labeling service, custom learning for image classification, custom learning for object detection, self-defined training with graphic/visual monitoring, inference API deployment.
• DL-Insight: monitoring and optimization tool for deep learning.]

PowerAI Vision makes enterprise-level DNN development easier

• PowerAI Vision automates the deep learning development cycle for developers.
• Deep knowledge of ML/DL and computer vision is embedded into PowerAI Vision.

Pipeline: Start → define training task → prepare training data → data preprocessing → DNN model selection → DL training framework preparation → configure the training hyper-parameters → DNN model training → package the new DNN model together with preprocessing into an inference process and deploy the API → inference.

The steps from data preprocessing through framework preparation, model selection, hyper-parameter configuration, and model training are done automatically by PowerAI Vision. The user can then call the deployed API for visual recognition; online training loops the inference and training capabilities together.

Capabilities along the pipeline:
• User-defined categories; data set management
• Format transformation; support for both training and evaluation sets
• Support for different preprocessing plugins
• Base models provided for different scenarios; predicted training time
• Training process visualization; training with GPUs
• Scalability and HA deployment are supported
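The automated portion of the cycle above can be sketched as a chain of stage functions. The stage names follow the slide; the bodies are placeholders, since the real work happens inside PowerAI Vision:

```python
# Hypothetical sketch of the PowerAI Vision training pipeline.
# Each stage takes and returns a context dict; real stages would do actual work.
def preprocess(ctx):          ctx["preprocessed"] = True; return ctx
def select_model(ctx):        ctx["model"] = "base-model-for-" + ctx["task"]; return ctx
def prepare_framework(ctx):   ctx["framework_ready"] = True; return ctx
def configure_hparams(ctx):   ctx["hparams"] = {"lr": 0.01, "batch": 64}; return ctx
def train(ctx):               ctx["trained"] = True; return ctx
def package_and_deploy(ctx):  ctx["api_url"] = "/inference/" + ctx["task"]; return ctx

AUTOMATED_STAGES = [preprocess, select_model, prepare_framework,
                    configure_hparams, train, package_and_deploy]

def run_pipeline(task):
    ctx = {"task": task}
    for stage in AUTOMATED_STAGES:   # the steps the platform runs automatically
        ctx = stage(ctx)
    return ctx

result = run_pipeline("object-detection")
print(result["api_url"])   # the user only consumes the deployed API
```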

Optimizing the development of AI with IBM AI Vision

Pipeline: define training task → prepare data → data processing → DNN model selection → DL framework preparation → configure training parameters → DNN model training → package the new DNN model together with preprocessing into an inference process → application API.

Typical challenges in AI projects:
• Time consuming, expensive, and with questionable outcomes
• No experience in DNN design and development
• No experience in computer vision
• No experience in building a platform to support enterprise-scale deep learning, including data preparation, training, and inference

Automation done by IBM AI Vision:
• AI Vision automates the deep learning development cycle for developers.
• Deep knowledge of ML/DL and computer vision is embedded into AI Vision.
• Reduces time, cost, and complexity of AI integration.

Features: Data Labeling

• Labeling for classification: the user can upload image data and label the categories.
• Labeling for object detection: the user can upload image data and define bounding boxes.
• Data upload is supported for different purposes and different formats.
• All labelled data can be exported as a zip file and downloaded.

Feature: Data Labeling / Video Data Platform

• Supports multi-level management (Figure 1): labeling task, video stream, label data
• Supports manual labeling and auto-labeling (Figure 2): manual capture and periodic capture; user-defined new attributes can be added
• Provides tag management and statistics (Figure 3)

Feature: Data Preprocessing

Multiple types of data preprocessing are supported for image data (e.g. faces, cars, pedestrians).
• Data preprocessing is very important for some domain-specific learning.
• Data preprocessing can be extended with customized algorithms.

Feature: Training

• Training task management
• Custom learning for classification and object detection: transfer learning is used for high accuracy and efficiency
• Supports different training strategies: precision first, accuracy first, and customized configuration
• Provides a time estimate for each new training task
• Visualization of training progress

Feature: Inference for Different Deployments

• "One click" packages the model and its preprocessing into a new Docker instance and launches it in a container cloud.
• A REST API is automatically exposed and managed for each model.
• Easy testing of the new API is provided: a heat map is shown, and "negative" is supported for classification (Fig. 3: classification; Fig. 4: object detection).
• Inference deployment is supported on different architectures and locations: CPU, GPU, and FPGA; data center, branch, and edge.

PowerAI Inference Engine (AccDNN): automatically generate deep learning accelerators

A trained Caffe CNN model in the data center is turned into an FPGA accelerator bit-file for the edge by the PowerAI Inference Engine tool:

net model file → (translation) → Verilog file → (synthesis) → FPGA bit file → (download) → FPGA execution

  name: "dummy-net"
  layers { name: "data" …}
  layers { name: "conv" …}
  layers { name: "pool" …}
  … more layers …
  layers { name: "loss" …}

  --input module--
  conv conv_instance(…)
  pool pool_instance(…)
  … more layers
  loss loss_instance(…)
  --output module--

  Net.bit

FPGA chips range from $20 to $1K. Automatically enabling deep learning from cloud to edge enhances productivity.
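The translation step starts from a Caffe net definition like the one shown above. A hedged sketch of its first task, extracting the layer names that become Verilog module instances (this is an illustration, not AccDNN's actual parser):

```python
import re

# A prototxt-like net definition, mirroring the slide's example.
PROTOTXT = '''
name: "dummy-net"
layers { name: "data" }
layers { name: "conv" }
layers { name: "pool" }
layers { name: "loss" }
'''

def layer_names(prototxt):
    """Pull the name out of every 'layers { ... }' block."""
    return re.findall(r'layers\s*\{\s*name:\s*"([^"]+)"', prototxt)

def to_verilog_instances(names):
    # One module instance per compute layer, mirroring the '--input module--' listing.
    return [f"{n} {n}_instance(...);" for n in names if n not in ("data", "loss")]

names = layer_names(PROTOTXT)
print(names)                        # ['data', 'conv', 'pool', 'loss']
print(to_verilog_instances(names))  # ['conv conv_instance(...);', 'pool pool_instance(...);']
```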

The value of AI in connected data islands for a hyperconnected and cognitive universe

Security, defence and protection against cyber crime · Health & research · Weather, climate research & agriculture · Car2X, autonomous vehicles and intelligent traffic systems · Retail and marketing · Banking, finance & insurance · Industry 4.0 · Energy, utilities and smart cities · Wearables & mobility · Infotainment, industrial & military · Health and fitness · Connected home


Copyright © 2016 by International Business Machines Corporation. All rights reserved.

No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.

Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This document could

include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) described

herein at any time without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and

represent goals and objectives only. References in this document to IBM products, programs, or services do not imply that IBM intends to make such

products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this

document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that does not infringe

IBM's intellectually property rights, may be used instead.

THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY

DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. IBM shall have no

responsibility to update this information. IBM products are warranted, if at all, according to the terms and conditions of the agreements (e.g., IBM

Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. Information

concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources.

IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims

related to non-IBM products. IBM makes no representations or warranties, expressed or implied, regarding non-IBM products and services.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or copyrights.

Inquiries regarding patent or copyright licenses should be made, in writing, to:

IBM Director of Licensing

IBM Corporation

North Castle Drive

Armonk, NY 10504-1785

U.S.A.

Legal Notices


IBM, the IBM logo, ibm.com, IBM System Storage, IBM Spectrum Storage, IBM Spectrum Control, IBM Spectrum Protect, IBM Spectrum Archive, IBM Spectrum Virtualize, IBM Spectrum

Scale, IBM Spectrum Accelerate, Softlayer, and XIV are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. A current list of IBM trademarks

is available on the Web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml

The following are trademarks or registered trademarks of other companies.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

IT Infrastructure Library is a Registered Trade Mark of AXELOS Limited.

Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of

Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

ITIL is a Registered Trade Mark of AXELOS Limited.

UNIX is a registered trademark of The Open Group in the United States and other countries.

* All other products may be trademarks or registered trademarks of their respective companies.

Notes:

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that

any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the

workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have

achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject

to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the

performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

This presentation and the claims outlined in it were reviewed for compliance with US law. Adaptations of these claims for use in other geographies must be reviewed

by the local country counsel for compliance with local laws.

Legal Notices