Artificial Intelligence with IBM POWER9 and PowerAI / AI Vision
TRANSCRIPT
IBM Systems
Eras of computing: Centralized Computing → Distributed Computing → Cognitive Era
Milestones: E-Business · Smarter Planet · Office Productivity · Client/Server · Personal Computer · Data Warehousing · Big Data & Predictive Analytics · Cognitive
A New Era of Computing has Emerged
Data → Insight → Context
• Transactional database: reporting
• Business intelligence
• Big data & advanced analytics: actionable insight in context
Why Deep Learning Is Happening Now
• Big Data
• Deep Neural Networks
• Powerful Accelerators
• High-performance POWER CPUs, increased bandwidth and storage capacities
Sic Transit Gloria Mundi
• Google Brain, 2012: 16,000 servers, ~8 MW, ~50 TFLOPS
• 2015: 3 NVIDIA Pascal GPUs, ~0.9 kW, ~62 TFLOPS
• 2017: 1 NVIDIA Volta GPU, ~0.3 kW, ~120 TFLOPS
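The jump in compute density on this slide can be made concrete as throughput per kilowatt. A quick back-of-the-envelope check using the slide's approximate figures (a rough sketch, not a benchmark):

```python
# Approximate figures from the slide: (power draw in kW, throughput in TFLOPS).
systems = {
    "Google Brain 2012 (16,000 servers)": (8000.0, 50.0),
    "2015 (3x NVIDIA Pascal GPUs)": (0.9, 62.0),
    "2017 (1x NVIDIA Volta GPU)": (0.3, 120.0),
}

for name, (kw, tflops) in systems.items():
    # Efficiency: how many TFLOPS per kilowatt of power draw.
    print(f"{name}: {tflops / kw:.4g} TFLOPS/kW")
```

A single Volta GPU delivers roughly 400 TFLOPS/kW against about 0.006 TFLOPS/kW for the 2012 cluster, which is the point of the slide title.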
Obama: My Successor Will Govern a Country Being Transformed by AI
From data to information → from information to knowledge → from knowledge to action/reaction
Time to analyse → time to reason & conclude → time to action → decision
Inputs: Image & Video · Voice & Sound · ComInt, ELInt, SigInt · Text · Sensor
The four phases of DL/AI in the Data Supply Chain – technical view
Inputs: Image & Video · Voice & Sound · ComInt, ELInt, SigInt · Text · Sensor
1. Detect and Collect: Compress/Map Reduce
2. Store/Analyze: Tag/Aggregate; Knowledge Base
3. Learn: Distributed Deep Learning; comparison and interpretation; Combine; Conclude/Reason
4. Inference: Applied Knowledge; Platforms, FPGA, Applications, Appliances
Infrastructure considerations: data (storage bandwidth, latency, interval, I/O block size, network interface, storage type); compute (storage bandwidth, latency, I/O block size, locality); CAPI (Coherent Accelerator Processor Interface) for data; InfiniBand network; disk storage, storage-rich servers, flash, NVMe flash; high I/O, data locality and low latency; IBM Spectrum family and other storage options.
IBM Platform for Deep Learning / Artificial Intelligence
Inputs: Image & Video · Voice & Sound · ComInt, ELInt, SigInt · Text · Sensor
The four phases of DL/AI in the Data Supply Chain:
1. Detect and Collect: Compress/Map Reduce
2. Store/Analyze: Tag/Aggregate; Knowledge Base; IBM Storage for Big Data and Analytics
3. Learn: Distributed Deep Learning; comparison and interpretation; Combine; Conclude/Reason
4. Inference: Applied Knowledge; Platforms, FPGA, Applications, Appliances, APIs
Complementing: cloud services; IBM AI Vision for automation and scale-out DDL.
IBM POWER AC922: breakthrough performance for DL/AI and HPC with native NVLink.
IBM Systems and PowerAI framework: deep learning frameworks, distributed frameworks, supporting libraries (e.g. OpenBLAS).
IBM Storage for Analytics & Deep Learning:
• Analytic frameworks and solutions: Hadoop
• Filesystems: IBM Spectrum Scale, BeeGFS, CEPH/XFS
• IBM Elastic Storage Server (ESS): extreme scalability, breakthrough performance, integrated solution, InfiniBand and Ethernet support
• IBM Power System S822LC: scalable technology, OpenPOWER design, Linux only, flash and SAS SSD, InfiniBand and Ethernet support
POWER9: the only processor specifically designed for the AI era.
• 4x more threads for high-performance cores vs. x86
• 5x+ more I/O bandwidth than x86
• >2x more memory bandwidth
• OpenCAPI, NVLink 2.0, PCIe Gen4
AC922: 2-socket, 4-GPU (SXM2) air-cooled node
▪ Processor: two POWER9 SCMs; TDP 190 W, RDP 250 W; 16 or 20 cores
▪ Memory: up to 16 DDR4 IS RDIMM slots, direct attach to the POWER9 module; 128 GB to 2048 GB configurations; 8, 16, 32, 64, or 128 GB DIMMs
▪ Internal storage: RAID 0, 1; 0 to 2 SFF SATA HDD or SSD (default 0 disks); PCIe adapter form-factor NVMe flash adapter
▪ PCIe Gen4 slots: 2 PCIe x16 Gen4 LP slots; 1 PCIe x4 Gen4 LP slot; 1 dual-port IB EDR NIC, shared by the two POWER9 sockets
▪ GPU: NVIDIA Volta GPUs (300 W max), SXM2 form factor, NVLink 2.0; 2 GPUs per POWER9 socket; optimized GPU bandwidth, all POWER9 NVLinks to GPUs; 2 GPUs base (1 per P9), feature upgrade to 4 GPUs
▪ Native I/O: host 2-port USB 3.0 (1 front, 1 rear); management port
▪ Enclosure form factor: 2U 19" rack enclosure, ~710 mm (28") deep
▪ O/S: Linux only; Ubuntu 17.10 (with updates); RHEL for POWER9/CORAL (limited availability in 2017)
▪ Firmware: Sapphire, Opal
▪ KVM: not required
▪ RAS: 5-year MTBF; concurrent-maintenance HDD (when installed); N+1 redundant hot-swap cooling; 2x 2200 W bulk power supplies, N+1 redundant in the 4-GPU configuration; 200 VAC, 277 VAC, 400 VDC input voltage support
Air-cooled version shown
Energy efficiency: on-chip power management, power gating, advanced thermal management; 80+ Platinum power supply compliant
Service interface: industry BMC service controller with OpenBMC firmware; operator interface and FRU LEDs
Certifications: FCC Class A for servers; acoustics: rack configuration not to exceed 9.5 B; environment: ASHRAE A3 (5-40 °C, 8-85% RH, 3050 m max)
Offering: air-cooled only
The NVLink difference CPU-GPU
• P9 with 2nd Gen NVLink enables 5.6x faster data movement from CPU-GPU in 4 GPU system
• In 6 GPU system bandwidth is minimally reduced but balanced by higher compute capability • Results are based on IBM Internal Measurements running the CUDA H2D Bandwidth Test
• Hardware: Power AC922; 32 cores (2 x 16c chips), POWER9 with NVLink 2.0; 2.25 GHz, 1024 GB memory, 4xTesla V100 GPU; Ubuntu 16.04. S822LC for HPC; 20 cores (2 x 10c chips), POWER8 with NVLink; 2.86 GHz, 512 GB memory, Tesla P100 GPU
• Competitive HW: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 1024 GB memory, 4xTesla V100 GPU, Ubuntu 16.04
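The raw link numbers translate directly into staging time for a training batch. The sketch below uses nominal per-direction bandwidths (roughly 16 GB/s for a PCIe Gen3 x16 link, and roughly 75 GB/s for the aggregated NVLink 2.0 bricks to one GPU in the 4-GPU configuration); these figures are illustrative assumptions, and measured speedups such as the 5.6x above also reflect protocol and software effects:

```python
def transfer_seconds(payload_gb, bandwidth_gb_s):
    """Idealized host-to-device copy time, ignoring latency and protocol overhead."""
    return payload_gb / bandwidth_gb_s

payload = 16.0  # GB staged from system memory to one GPU (hypothetical batch)
pcie = transfer_seconds(payload, 16.0)    # x86: PCIe Gen3 x16, ~16 GB/s per direction
nvlink = transfer_seconds(payload, 75.0)  # AC922: aggregated NVLink 2.0, ~75 GB/s
print(f"PCIe: {pcie:.2f} s, NVLink: {nvlink:.2f} s, ratio: {pcie / nvlink:.1f}x")
```

The idealized ratio (about 4.7x) is in the same range as the measured 5.6x, which comes from the CUDA H2D bandwidth test cited above.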
POWER9 – Data Capacity and Throughput
Big caches for massively parallel compute and heterogeneous interaction; extreme switching bandwidth for the most demanding compute and accelerated workloads.
L3 cache – 120 MB shared-capacity NUCA cache
• 10 MB capacity + 512 KB L2 per SMT8 core
• Enhanced replacement with reuse and data-type awareness
• 12 × 20-way associativity
High-throughput on-chip fabric
• Over 7 TB/s on-chip switch
• Moves data in/out at 256 GB/s per SMT8 core
POWER9 – Premier Acceleration Platform
State-of-the-art I/O and acceleration attachment signaling:
• PCIe Gen4 × 48 lanes – 192 GB/s duplex bandwidth
• 25G link × 48 lanes – 300 GB/s duplex bandwidth
• Robust accelerated compute options with open standards
• On-chip acceleration – gzip ×1, 842 compression ×2, AES/SHA ×2
• CAPI 2.0 – 4× the bandwidth of POWER8, using PCIe Gen4; high bandwidth, low latency using the 25G link
• NVLink 2.0 – next generation of CPU/GPU bandwidth
NVLink Evolution in IBM Power AC922 – 5x Faster Data Communication
[Diagram: IBM POWER9, bandwidth optimized – POWER9 chip with NVLink 2.0 linked to V100 GPUs (each with NVLink and its own graphics memory) over 50+50 GB/s links; 1 TB system memory; 150 GB/s graphics-memory bandwidth. IBM POWER9, scalability optimized – the same NVLink links shared across more GPUs. x86 – NVIDIA GPU attached over PCIe x16 at 16+16 GB/s; system memory at 170 GB/s. Store large models in system memory; fast transfer via NVLink.]
Train larger, more complex models
Large Model Support: POWER CPU (DDR4 system memory) connected to the GPU over NVLink; use system memory and the GPU together to support more complex models and higher-resolution data.
Traditional model support: CPU (DDR4) connected to the GPU over PCIe; limited memory on the GPU forces a trade-off in model size / data resolution.
• Leveraging NVLink and coherence enables larger and more complex models
• Improves model accuracy with more images and higher-resolution images
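The idea behind Large Model Support can be sketched framework-agnostically: keep the full model in large host memory and stage only the working set into the much smaller device memory at each step, evicting what no longer fits. Everything below (the sizes, the `DeviceMemory` class) is a hypothetical illustration of the concept, not the PowerAI LMS API:

```python
class DeviceMemory:
    """Toy model of a GPU memory pool with a hard capacity limit."""
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.resident = {}  # layer name -> size in GB (insertion-ordered)

    def used(self):
        return sum(self.resident.values())

    def stage(self, name, size_gb):
        # Evict the oldest-staged layers until the new one fits,
        # mimicking LMS swapping tensors back to host memory over NVLink.
        while self.used() + size_gb > self.capacity_gb:
            oldest = next(iter(self.resident))
            del self.resident[oldest]
        self.resident[name] = size_gb

# A "model" whose total footprint (24 GB) exceeds device memory (16 GB).
layers = [(f"layer{i}", 3.0) for i in range(8)]
gpu = DeviceMemory(capacity_gb=16.0)
for name, size in layers:  # forward pass: stage layers one by one
    gpu.stage(name, size)
    assert gpu.used() <= gpu.capacity_gb
print(f"resident after pass: {sorted(gpu.resident)}")
```

The faster the CPU-GPU link, the cheaper this staging is, which is why LMS pairs naturally with NVLink rather than PCIe.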
Large AI Models Train ~4 Times Faster
POWER9 servers with NVLink to GPUs vs. x86 servers with PCIe to GPUs
Caffe with LMS (Large Model Support), runtime of 1000 iterations; GoogleNet model on an enlarged ImageNet dataset (2240×2240):
• Xeon E5-2640 v4 with 4× V100 GPUs: 3.1 hours
• Power AC922 with 4× V100 GPUs: 49 minutes (3.8x faster)
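The 3.8x headline follows directly from the two runtimes:

```python
x86_secs = 3.1 * 3600   # 3.1 hours on the Xeon system
ac922_secs = 49 * 60    # 49 minutes on the AC922
speedup = x86_secs / ac922_secs
print(f"{speedup:.1f}x faster")  # → 3.8x faster
```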
Distributed Deep Learning (DDL)
Deep learning training takes days to weeks, with limited scaling to multiple x86 servers. PowerAI with DDL enables scaling to 100s of servers: from 1 system to 64 systems, 16 days down to 7 hours (58x faster).
Near-ideal scaling to 256 GPUs (ResNet-101, ImageNet-22K): 95% scaling efficiency with 256 GPUs, DDL actual scaling vs. ideal scaling.
Caffe with PowerAI DDL, running on Minsky (S822LC) Power Systems; ResNet-50, ImageNet-1K.
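Data-parallel distributed training boils down to each worker computing gradients on its shard of the batch and then averaging them across all workers (an all-reduce) before every update. A minimal single-process simulation of that step, with plain Python lists standing in for gradient tensors (this sketches the concept, not the PowerAI DDL API):

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors element-wise (simulated all-reduce)."""
    n = len(worker_grads)
    length = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(length)]

# Four "workers", each with a gradient from its local mini-batch shard.
grads = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]
avg = allreduce_mean(grads)
print(avg)  # → [2.0, 2.0, 2.0]
```

Scaling efficiency then depends on overlapping this communication with compute; the 95% figure above means the all-reduce cost stays almost hidden even at 256 GPUs.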
IBM PowerAI Vision
PowerAI Vision: Productivity · Accuracy · Enterprise scale · Cost efficiency · End-to-end capability
AI Infrastructure Stack
• Applications: segment specific (Finance, Retail, Healthcare)
• Cognitive APIs (e.g. Watson) and in-house APIs: Speech, Vision, NLP, Sentiment
• Machine & deep learning libraries & frameworks: TensorFlow, Caffe, SparkML
• Distributed computing: Spark, MPI
• Data lake & data stores: Hadoop HDFS, NoSQL DBs
• Transform & prep data (ETL)
• Accelerated infrastructure: accelerated servers and storage
PowerAI covers the ML/DL framework and distributed-computing layers of the stack.
PowerAI Enterprise
• Deep Learning Impact (DLI) module: data & model management, ETL, visualize, advise
• IBM Spectrum Conductor with Spark: cluster virtualization, auto hyper-parameter optimization
• PowerAI, open-source ML frameworks: Large Model Support (LMS), Distributed Deep Learning (DDL), Auto-ML
• PowerAI Vision, Auto-ML for images & video: label, train, deploy
• Accelerated infrastructure: accelerated servers and storage
Auto Hyper-Parameter Tuning
Hyper-parameters:
– Learning rate
– Decay rate
– Batch size
– Optimizer (gradient descent, Adadelta, Momentum, RMSProp, …)
– Momentum (for some optimizers)
– LSTM hidden unit size
Search strategies: random, tree-based Parzen estimator (TPE), Bayesian.
Multi-tenant Spark cluster (IBM Spectrum Conductor with Spark): Spark search jobs are generated dynamically and executed in parallel.
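Of the search strategies listed, random search is the simplest to sketch: draw independent samples from each hyper-parameter's range and keep the best-scoring trial. The objective below is a hypothetical stand-in for "train briefly and return validation loss"; nothing here is the Spectrum Conductor API:

```python
import random

# Sampling rules per hyper-parameter (names match the slide's list).
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),  # log-uniform
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
    "decay_rate":    lambda: random.uniform(0.8, 0.99),
}

def objective(params):
    # Hypothetical proxy for validation loss; a real tuner would train a model.
    return abs(params["learning_rate"] - 0.01) + abs(params["decay_rate"] - 0.9)

random.seed(0)
trials = [{k: draw() for k, draw in search_space.items()} for _ in range(50)]
best = min(trials, key=objective)
print(f"best loss {objective(best):.4f} at lr={best['learning_rate']:.4g}")
```

In the Spectrum Conductor setup each trial would be one dynamically generated Spark job, so the 50 evaluations run in parallel rather than in this sequential loop.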
Snap ML: Distributed GPU-Accelerated Machine Learning Library
Snap Machine Learning (ML) Library:
• libGLM (C++ / CUDA-optimized primitive library)
• Distributed training
• Logistic regression, linear regression, support vector machines (SVM)
• Distributed hyper-parameter optimization
• APIs for popular ML frameworks
• More coming soon
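Snap ML's headline workload, logistic regression, fits a linear decision boundary by gradient descent on the log-loss. A dependency-free sketch of that training loop (Snap ML runs this math GPU-accelerated and distributed; this is only the underlying algorithm on a toy data set):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on log-loss; X is a list of feature rows."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Tiny linearly separable problem: label is 1 when x0 + x1 > 1.
X = [[0.0, 0.0], [0.2, 0.3], [1.0, 0.9], [0.8, 1.1]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
preds = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5) for xi in X]
print(preds)  # → [0, 0, 1, 1]
```

libGLM's contribution is making exactly this kind of generalized-linear-model update fast on GPUs and across nodes.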
PowerAI Vision: the deep learning development platform for image/video analysis
Layers, bottom up:
• Infrastructure: accelerator (GPU/FPGA), data store (distributed FS and object store), network; Docker, KVM (POWER)
• Resource management layer (CPU/GPU/FPGA): Docker, Kubernetes
• Service management layer: data set management, training task management, model management, inference API management
• ML/DL computation layer: data preprocessing component; model training component (Caffe and others); inference component (Caffe and others)
• Vision recognition service layer: image labeling and preprocessing; image preprocessing management; video labeling service; data label management; custom learning for image classification; custom learning for object detection; self-defined training with graphic/visual monitoring; inference API deployment
• DL-Insight: monitoring and optimization tool for deep learning
PowerAI Vision makes enterprise-level DNN development easier
• PowerAI Vision automates the deep learning development cycle for developers.
• Deep knowledge of ML/DL and computer vision has been embedded into PowerAI Vision.
Workflow: Start → Prepare training data → Define training task → Data pre-processing → DNN model selection → DL training framework preparation → Configure the training hyper-parameters → DNN model training → Package the new DNN model together with preprocessing into an inference procedure and deploy the API → Inference.
The steps from data pre-processing through DNN model training are done automatically by PowerAI Vision; the user then calls the deployed API for visual recognition. On-line training loops the inference and training capabilities together.
Step details:
• User-defined categories; data set management
• Format transformation; support for both training and evaluation sets
• Support for different preprocessing plugins
• Base models provided for different scenarios; training time prediction
• Training process visualization; training on GPU
• Scalability and HA deployment are supported
Optimizing the development of AI with IBM AI Vision
Workflow: Define training task → Prepare data → Data processing → DNN model selection → DL framework preparation → Configure training parameters → DNN model training → Package the new DNN model together with preprocessing into an inference procedure → Application API.
Typical challenges in AI projects:
• Time-consuming, expensive, with questionable outcomes
• No experience in DNN design and development
• No experience in computer vision
• No experience in building a platform to support enterprise-scale deep learning, including data preparation, training, and inference
Automation done by IBM AI Vision:
• AI Vision automates the deep learning development cycle for developers.
• Deep knowledge of ML/DL and computer vision has been embedded into AI Vision.
• Reduces time, cost, and complexity of AI integration.
Features: Data Labeling
▪ Labeling for classification: allows the user to upload image data and label the categories.
▪ Labeling for object detection: allows the user to upload image data and define bounding boxes.
▪ All labelled data can be exported as a zip file and downloaded.
▪ Supports data upload for different purposes and in different formats.
Feature: Data Labeling / Video Data Platform
▪ Supports multi-level management (Figure 1): labeling task, video stream, label data
▪ Supports manual labeling and auto-labeling (Figure 2): manual and periodic capture; user-defined new attributes can be added
▪ Provides tag management and statistics (Figure 3)
Feature: Data Preprocessing
▪ Multiple types of data preprocessing are supported for image data, e.g. faces, cars, and pedestrians.
▪ Data preprocessing is very important for some domain-specific learning.
▪ Data preprocessing can be extended with customized algorithms.
Feature: Training
▪ Training task management
▪ Custom learning for classification and object detection; transfer learning is used for high accuracy and efficiency
▪ Supports different training strategies (precision first, accuracy first, and customized configuration)
▪ Provides a time estimate for each new training task
▪ Visualizes training progress
Feature: Inference for Different Deployments
▪ "One click" packages the model and preprocessing into a new Docker instance and launches it in a container cloud (Fig. 1, Fig. 2)
▪ A REST API is automatically exposed and managed for each model
▪ Easy testing of the new API: a heat-map is provided; "negative" is supported for classification (Fig. 3: classification; Fig. 4: object detection)
▪ Supports inference deployment on different architectures and locations: CPU, GPU, and FPGA; data center, branch, and edge
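"Packaging the model together with preprocessing" is, at its core, composing the two into a single callable that the deployed REST endpoint then wraps. A minimal sketch of that composition; all names here (`make_inference_proc`, the toy preprocessor and model) are hypothetical illustrations, and the real packaging builds a Docker image around this idea:

```python
def make_inference_proc(preprocess, model):
    """Bundle preprocessing and model into one deployable callable."""
    def infer(raw_input):
        features = preprocess(raw_input)
        scores = model(features)
        # Return a REST-style payload: best label plus all scores.
        best = max(scores, key=scores.get)
        return {"label": best, "scores": scores}
    return infer

# Hypothetical stand-ins for a real preprocessor and trained classifier.
def normalize(pixels):
    return [p / 255.0 for p in pixels]

def toy_model(features):
    mean = sum(features) / len(features)
    return {"bright": mean, "dark": 1.0 - mean}

endpoint = make_inference_proc(normalize, toy_model)
result = endpoint([255, 255, 255, 0])
print(result)
```

Because the preprocessing travels with the model, the same bundle behaves identically whether it is deployed in the data center, at a branch, or on an edge device.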
PowerAI Inference Engine (AccDNN): automatically generate a deep learning accelerator
Flow: trained Caffe CNN model in the data center → PowerAI Inference Engine tool → FPGA accelerator bit-file for the edge.
Net model file → (translation) → Verilog file → (synthesis) → FPGA bit file → (download) → FPGA execution.
Input (net model file):
name: "dummy-net"
layers { name: "data" …}
layers { name: "conv" …}
layers { name: "pool" …}
… more layers …
layers { name: "loss" …}
Generated Verilog (input module):
conv conv_instance(…)
pool pool_instance(…)
… more layers
loss loss_instance(…)
Output module: Net.bit
FPGA chips range from $20 to $1K. Automatically enabling deep learning from cloud to edge enhances productivity.
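The translation step shown (net model file → Verilog) can be sketched as a mapping from layer declarations to module instantiations. The parser below handles only the toy "dummy-net" syntax from the slide and invents its own instance-naming convention; AccDNN's real generator is far more involved:

```python
import re

prototxt = '''
name: "dummy-net"
layers { name: "data" }
layers { name: "conv" }
layers { name: "pool" }
layers { name: "loss" }
'''

def layers_to_verilog(text):
    """Emit one Verilog module instantiation per non-data layer."""
    names = re.findall(r'layers\s*{\s*name:\s*"(\w+)"', text)
    lines = []
    for name in names:
        if name == "data":
            continue  # the data layer becomes the module's input port, not an instance
        lines.append(f"{name} {name}_instance(/* ports elided */);")
    return lines

for line in layers_to_verilog(prototxt):
    print(line)
```

Synthesis and bit-file download then happen in the ordinary FPGA toolchain, which is what lets the same trained model run on chips from the $20 to the $1K range.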
The value of AI in connected data islands, for a hyperconnected and cognitive universe:
• Security, defence, protection against cyber crime
• Health & research
• Weather, climate research & agriculture
• Car2X, autonomous vehicles, and intelligent traffic systems
• Retail and marketing
• Banking, finance & insurance
• Industry 4.0
• Energy, utilities, and smart cities
• Wearables & mobility
• Infotainment, industrial & military
• Health and fitness
• Connected home
Copyright © 2016 by International Business Machines Corporation. All rights reserved.
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This document could
include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) described
herein at any time without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and
represent goals and objectives only. References in this document to IBM products, programs, or services does not imply that IBM intends to make such
products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this
document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that does not infringe
IBM's intellectually property rights, may be used instead.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY
DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT. IBM shall have no
responsibility to update this information. IBM products are warranted, if at all, according to the terms and conditions of the agreements (e.g., IBM
Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. Information
concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources.
IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims
related to non-IBM products. IBM makes no representations or warranties, expressed or implied, regarding non-IBM products and services.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or copyrights.
Inquiries regarding patent or copyright licenses should be made, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
Legal Notices
IBM, the IBM logo, ibm.com, IBM System Storage, IBM Spectrum Storage, IBM Spectrum Control, IBM Spectrum Protect, IBM Spectrum Archive, IBM Spectrum Virtualize, IBM Spectrum
Scale, IBM Spectrum Accelerate, Softlayer, and XIV are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. A current list of IBM trademarks
is available on the Web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml
The following are trademarks or registered trademarks of other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
IT Infrastructure Library is a Registered Trade Mark of AXELOS Limited.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of
Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
ITIL is a Registered Trade Mark of AXELOS Limited.
UNIX is a registered trademark of The Open Group in the United States and other countries.
* All other products may be trademarks or registered trademarks of their respective companies.
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that
any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the
workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have
achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject
to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the
performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
This presentation and the claims outlined in it were reviewed for compliance with US law. Adaptations of these claims for use in other geographies must be reviewed
by the local country counsel for compliance with local laws.
Legal Notices