keynote 12, dr. martin schmatz, ibm
TRANSCRIPT
© 2015 IBM Corporation
High-Performance Compute Systems: The road ahead
A broad view on the high-performance IT environment
IBM Research – Zurich
Dr. Martin Schmatz – Manager, Cloud Server Technologies
13 May 2015
GUIDE SHARE EUROPE: Jahrestagung DACH, Hamburg, May 11-13, 2015
Moore’s Law: From thousands to billions
1971: Intel 4004, 2’300 transistors
Photo: “Designer Delves into Complexities of World’s First Microprocessor” by Intel Free Press
2014: IBM Power8, 4’200’000’000 transistors
40+ years
http://www.extremetech.com/extreme/191453-ibm-unveils-new-power8-servers-in-last-gasp-effort-to-battle-intels-x86-dominion
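The two data points on this slide imply a doubling period, which the following sketch works out. It is a back-of-the-envelope check and not part of the talk: the function name `doubling_period_years` is illustrative, and steady exponential growth between 1971 and 2014 is an assumption.

```python
import math

def doubling_period_years(count_start, count_end, years):
    """Years per doubling, assuming steady exponential growth (an assumption)."""
    doublings = math.log2(count_end / count_start)
    return years / doublings

# Slide figures: Intel 4004 (1971, 2'300 transistors), IBM Power8 (2014, 4'200'000'000).
period = doubling_period_years(2_300, 4_200_000_000, 2014 - 1971)
print(f"{period:.2f} years per doubling")
```

Under these assumptions the transistor count doubled roughly every two years, which is the usual statement of Moore's law.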
Evolution of IT over 50+ years: Past & present
[Chart: IT efficiency over time]
CLOCK ERA (scaling, from the 1970’s): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.
Moore’s law siblings
Calculations per Second for $100: > 15 orders of magnitude in 100 years
http://www.kurzweilai.net/ask-ray-the-future-of-moores-law
Transistors per 1$: x8 in 10 years; flat or less [2012]
http://www.economist.com/news/21589080-golden-rule-microchips-appears-be-coming-end-no-moore
Evolution of IT over 50+ years: Past, present, future
10 years, 2004 to 2014:
x 24 threads
x 50 caches
x 15 mem BW (Z13: 832 GB/s system memory BW)
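To put the ten-year factors on a per-year footing, here is a minimal sketch that converts each factor into a compound annual growth rate. The helper name `annual_growth` is illustrative, and smooth year-over-year growth is an assumption; the slide only quotes the overall factors.

```python
def annual_growth(factor, years):
    """Compound annual growth rate for an overall factor, assuming smooth growth."""
    return factor ** (1.0 / years) - 1.0

# Ten-year factors quoted on the slide (2004 to 2014).
factors = {"threads": 24, "caches": 50, "mem BW": 15}
rates = {name: annual_growth(f, 10) for name, f in factors.items()}

for name, rate in rates.items():
    print(f"{name}: x{factors[name]} in 10 years is about {rate:.0%} per year")
```

Read this way, the quoted factors correspond to roughly 30% to 50% growth per year over the decade.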
Numbers of cores in the system
Single core Pentium 2 [1997]
http://encyclopedia2.thefreedictionary.com/Pentium+II
http://intel.com
18C/36T E5 [2015]
18 years
IBM Z13: Up to 141C/282T per system (plus SIMD)
Impact of workload aggregation on provisioning
Aggregating ~256 workloads improves utilization & over-provisioning ~ x5
[Chart: utilization and over-provisioning factor (0 to 6) vs. number of consolidated workloads (1 to 1000, log scale); over-provisioning ÷5, utilization x5]
Diminishing ROI when going beyond 500-1000 aggregated workloads
[Chart detail, factor range 0.5 to 1.5: almost no gain between 144 and 256 workloads, despite ~ doubling the number of consolidated workloads]
Evolution of IT over 50+ years: Past & present
[Chart: IT efficiency over time]
CLOCK ERA (scaling, 1970’s to ~2004): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.
CLOUD V1.0 ERA (multi-core, multi-thread; from ~2004): Performance increase by consolidating large numbers of parallel workloads. Only marginal single-thread performance gain.
IT Performance Evolution
[Chart: system capacity (capability), from single device (low, med, high) to device clusters (10 to 100K), vs. system density (1/latency end-to-end: low, med, high, extreme), bounded by physical limits]
Scale-down: Maximize feature density (atom transistor, atom storage)
Scale-up: Maximize device capacity (terabyte HDD, POWER 7 / POWER 8)
Scale-out: Maximize system capacity (NAS, blade server); cloud computing; optimum cost/VM; embarrassingly parallel workloads
Scale-in: Maximize system density, minimize end-to-end latency (FLASH SSD, 3D chips, FPGA/GPU, manycore, BPRAM/SCM, interconnect, in-mem DB, DAS); workload optimized cost/solution
Evolution of IT over 50+ years: Are you ready for workload optimized systems?
http://www.optimisation-conversion.com/infographies/acquisition-cross-canal-les-tendances-marketing-digital-e-commerce-infographie/attachment/no-thanks-were-too-busy-optimisation-conversion/
Evolution of IT over 50+ years: Past, present, future
[Chart: IT efficiency over time]
CLOCK ERA (scaling, 1970’s to ~2004): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.
CLOUD V1.0 ERA (multi-core, multi-thread; ~2004 to ~2015): Performance increase by consolidating large numbers of parallel workloads. Only marginal single-thread performance gain.
CLOUD WOS ERA (workload optimized systems; ~2015 to >>2025): Performance increase by holistic optimization of entire IT stack. More than linear single-thread performance gain.
Delivered via CLOUD
Evolution of IT over 50+ years: Workload optimized systems
https://www-304.ibm.com/connections/blogs/systemz/entry/see_ibm_system_z_mainframe_at_the_javaone_2014_conference?lang=en_us
IBM 2015 mainframe: Characteristics and benefits
Workload optimization (1): Small infrastructure enhancements produce significant performance increase for the application & solution.
Workload optimization (2): Increased use of hybrid systems.
Evolution of IT over 50+ years: Hybrid systems
[Diagram: CPU, GPU, GPU]
Share data structures at CPU memory speeds; use the compute structure which fits best to the workload.
Adapted from http://nvidia.com
Evolution of IT over 50+ years: Hybrid HPC systems
AURORA: Intel, Cray; 180 PF @ 2018
SUMMIT & SIERRA: IBM, Nvidia; 100 - 300 PF @ 2017
Adapted from http://intel.com
Evolution of IT over 50+ years: Hybrid chips
Intel E3-12xxV3, Nvidia Tegra X1, AMD APU
[Die photos: each chip is labeled with its CPU and graphics areas]
http://nvidia.com
http://amd.com
http://intel.com
Evolution of IT over 50+ years: Past, present, future
[IT efficiency chart repeated, with an added image]
https://sherlockxxi.wikispaces.com/Funny+things+and+curiosities+about+Sherlock+Holmes
Evolution of IT over 50+ years: Past, present, future
Original IBM Watson System
IBM True North Synapse Chip
Continuum of “COGNITIVE COMPUTING”
BlueBrain Project, Human Brain Project: Detailed understanding of how a brain works and reacts
Synapse Project: Efficient use of structures similar to those found in a brain
WATSON: Holistic combinations of machine learning algorithms
Statistics: Cross-correlation and intuitive presentation of large amounts of data
www.matchmove.com www.ucl.ac.uk breast-cancer-research.com www.ibm.com
The Synapse Project: Short Introduction
[Diagram of a neuron: dendrites, cell, synapse]
~7000 synapses/neuron (~10^4)
Operations in O(ms).
Human Brain: 10^10 neurons / 10^14 synapses
The Synapse Project
Transistors are used to implement a mixed analog/digital circuit which behaves like a large number of very simple neurons which are highly interconnected.
Why is this of interest? Because it’s extremely power efficient (analog) and highly scalable!!!
Ref: “A Digital Neurosynaptic Core Using Embedded Crossbar Memory with 45pJ per Spike in 45nm”; Paul Merolla et al, CICC, 2011
The Synapse Project
Ref: “A million spiking-neuron integrated circuit with a scalable communication network and interface”; Paul A. Merolla et al, SCIENCE, Aug 2014
Example Result: Image recognition on a 1920 x 1024 pixel 30 fps video stream. Analysis aperture of 400 x 240 pixels consumed 63mW of electrical power!
Simplified COMPARISON: The Synapse-Chip “TrueNorth” can deliver 46 billion SOPS (synaptic operations per second) per watt for a typical network, and 400 billion SOPS per watt for networks with high spike rates and high number of active synapses, whereas today’s most energy-efficient supercomputer achieves [only] 4.5 billion FLOPS per watt!
10x – 100x speedup!
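The "10x – 100x" figure can be reproduced from the numbers on this slide. The sketch below is only a check of that arithmetic: the constant and function names are illustrative, and it treats one synaptic operation and one floating-point operation as comparable units, which is the simplification the slide itself makes. What it yields is a ratio of operations per watt, not of raw speed.

```python
# Figures quoted on the slide, in operations per second per watt.
TRUENORTH_TYPICAL_SOPS_PER_W = 46e9
TRUENORTH_HIGH_ACTIVITY_SOPS_PER_W = 400e9
SUPERCOMPUTER_FLOPS_PER_W = 4.5e9

def efficiency_ratio(ops_per_watt, baseline_ops_per_watt):
    """How many times more operations per watt than the baseline."""
    return ops_per_watt / baseline_ops_per_watt

typical = efficiency_ratio(TRUENORTH_TYPICAL_SOPS_PER_W, SUPERCOMPUTER_FLOPS_PER_W)
high = efficiency_ratio(TRUENORTH_HIGH_ACTIVITY_SOPS_PER_W, SUPERCOMPUTER_FLOPS_PER_W)

# Energy per video frame for the 63 mW, 30 fps example:
# milliwatts divided by frames per second gives millijoules per frame.
energy_per_frame_mj = 63.0 / 30.0

print(f"typical network: {typical:.1f}x, high-activity network: {high:.1f}x")
print(f"{energy_per_frame_mj:.1f} mJ per analyzed frame")
```

The two ratios come out at roughly 10x and just under 90x, which matches the range quoted on the slide.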
IT Performance Evolution: Outlook to the future
[Chart: IT efficiency over time]
CLOCK ERA (scaling, 1970’s to ~2004): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.
CLOUD V1.0 ERA (multi-core, multi-thread; ~2004 to ~2015): Performance increase by consolidating large numbers of parallel workloads. Only marginal single-thread performance gain.
CLOUD WOS ERA (workload optimized systems; ~2015 to >>2025): Performance increase by holistic optimization of entire IT stack. More than linear single-thread performance gain.
QUANTUM ERA (>>2025): Quantum, DNA, neuronal or ??? computing. Whatever it will be, be assured that IBM Research is already working on IT...
Delivered via Cloud.
Summary
The ‘golden age of performance scaling’ is long over
- Dennard scaling ended roughly 2004
Growing number of transistors brought us threads – lots of them
- This brought us to the Cloud era
However: From Moore’s law siblings, we learn that there is a diminishing return on investment looming
- Exception: Embarrassingly parallel workloads (µ-servers)
If you want to have performance, you need to go hybrid WOS
- ISAs, big/little cores, GPGPU, FPGA, programmable state-machines…
New compute paradigms (= non-von Neumann) are rapidly growing and will enable completely new approaches to a growing range of applications
- Highly application specific, BUT terribly fast and/or low power!
Why is 2025 an important year?
[Chart: Energy Dissipated per Logic Operation]
Oops!
Thank you!
High-Performance Compute Systems: The road ahead
A broad view on the high-performance IT environment
Abstract
High-Performance Compute Systems: The road ahead
High-end computing is facing change: While traditional scientific computations
are ever growing and enable deep and novel insights based on ab-initio
simulations, rapidly emerging alternative fields of applications increasingly
require high-performance computing as well. Big-data analytics is seen as having
the potential to change the way, speed and precision of decisions, for example in
business environment, medicine, environment, public management etc. Cognitive
computing, as for example seen in the recently founded IBM Watson group, brings
additional architectural concepts to the table. This presentation gives a current
status overview and a view on future hybrid concepts reunifying different
technologies and architectures.
Presented by: Dr. Martin L. Schmatz
Martin L. Schmatz received the diploma in Electrical Engineering in 1993 and the
Ph.D. degree in 1998, both from the Swiss Federal Institute of Technology (ETH).
He joined the IBM Research Laboratory in Zurich in 1999, where he currently
manages the Cloud Server Technology research activities with focus on scalable,
hybrid server system architectures.
Dr. Schmatz has published 50+ external papers at premier conferences and
refereed journals in the field and holds more than 40 patents. He is a member of
the IBM Academy, the IBM Technical Experts Council and has an MBA degree from
the Henley Management College in the UK.
Evolution of Server Microprocessors
Pentium P5, 1993: 3.1M transistors
POWER8: 12 cores, 8 threads/core; 22nm CMOS SOI; 96 MB L3 cache; 650 mm², 4+ GHz; >>250 GFLOPS / chip; up to 32 socket SMP
Transistors are invested to increase the number of available threads
Tradeoff to be made: Number of threads vs single thread performance
Impact of workload aggregation on provisioning
Example: Server loading for single workload, assuming a 95% SLA
[Chart: theoretical workload server loading over time (~us), with the mean (‘average demand’) and the 95% line (‘capacity’)]
For a single workload we require a machine capacity 6.0x the average demand
When we consolidate 4 workloads we only require 3.5x average demand
When we consolidate 256 workloads we only require 1.3x average demand
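The three capacity figures quoted for 1, 4 and 256 workloads (6.0x, 3.5x, 1.3x) are consistent with a square-root law, capacity / average demand = 1 + k / sqrt(n) with k = 5. That model is an assumption made here, not something the slides state; it is roughly what one would expect if the consolidated workloads were independent and statistically alike. The function names are illustrative.

```python
import math

def overprovisioning(n, single_workload_factor=6.0):
    """Capacity / average demand for n consolidated workloads.

    Assumed model: 1 + k / sqrt(n), with k fitted to the single-workload case.
    """
    k = single_workload_factor - 1.0
    return 1.0 + k / math.sqrt(n)

def utilization(n, single_workload_factor=6.0):
    """Average demand / capacity, the inverse of the over-provisioning factor."""
    return 1.0 / overprovisioning(n, single_workload_factor)

for n in (1, 4, 144, 256, 1000):
    print(f"{n:5d} workloads: capacity {overprovisioning(n):.2f}x, "
          f"utilization {utilization(n):.0%}")
```

Under this model, going from 144 to 256 workloads lowers the factor only from about 1.42 to about 1.31, which illustrates why the return diminishes as more workloads are aggregated.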
Aggregating ~256 workloads improves utilization & over-provisioning ~ x5
[Chart: utilization and over-provisioning factor (0 to 6) vs. number of consolidated workloads (1 to 1000, log scale); over-provisioning ÷5, utilization x5]
Effect of workload consolidation:
Small performance over-provisioning
Very high utilization
Small memory over-provisioning
Prerequisite: Balanced system design of a sizable SMP shared everything server
Diminishing ROI when going beyond 500-1000 aggregated workloads
[Chart detail, factor range 0.5 to 1.5: almost no gain between 144 and 256 workloads, despite ~ doubling the number of consolidated workloads]
What options out of this dilemma do we have?
Reduce cost (= cloud server)
Scale-up/in (= increase performance to reduce cost)
Workload optimization
Any HYBRID combination of the above!
[Chart axes: cost, performance]
On the road to Software Defined Environments (SDE)
Traditional: Few, stable, and well known workloads; fixed system hardware, manual scaling; hardwired workload, minimal configuration
Current: Diverse workload, limited patterns; homogeneous resource pooling; expert configuration and mapping of workload
Future: Rapidly changing workloads, dynamic patterns; dynamic automatic composition of heterogeneous system; autonomic and proactive management
[Diagram labels: W1 to W4; R1 to R3; V1 to Vn; C]
Workload types are growing and are becoming more volatile
Cloud infrastructure is becoming programmable to meet the requirements in efficiency and resiliency
Heterogeneity is increasingly present and important
HYBRID !!!
Human Brain Project
Future Neuroscience
Achieve a unified, multi-level understanding of the human brain that integrates
data and knowledge about the healthy and diseased brain across all levels of
biological organisation, from genes to behaviour; establish in silico
experimentation as a foundational methodology for understanding the brain.
Future Medicine
Develop an objective, biologically grounded map of neurological and psychiatric
diseases based on multilevel clinical data; use the map to classify and diagnose
brain diseases and to configure models of these diseases; use in silico
experimentation to understand the causes of brain diseases and develop new
drugs and other treatments; establish personalised medicine for neurology and
psychiatry.
Future Computing
Develop novel neuromorphic and neurorobotic technologies based on the brain's
circuitry and computing principles; develop supercomputing technologies for
brain simulation, robot and autonomous systems control and other data intensive
applications.
Ref: https://www.humanbrainproject.eu/discover/the-project/strategic-objectives
The Synapse Project
Like the brain and unlike von Neumann computing,
the SyNAPSE chip architecture:
• is event-driven and shies away from ever-increasing clock rates, the need for
cooling, and dark silicon;
• uses local computation and is massively parallel and distributed;
• integrates memory with computation and so avoids the memory wall and minimizes
overall average wire length;
• has exceptionally low-power and so can be ubiquitously embedded;
• uses implicit addressing for synapses thus significantly reducing memory and
communication;
• is fault-tolerant and so degrades gracefully;
• is simple but fundamental, with canonical cores using canonical learning
interconnected via a canonical fabric;
• is a learning system beyond programming that can autonomously extract its
“program” (synapses, structure, and neuron thresholds) from complex,
spatiotemporal, real-world environments with multiple sensory and motor
modalities to mine the boundary between digital and physical worlds.
The Synapse Project
Where does it lead us?
Super power efficient sensor data analytics: Examples
Roller Bot: Autonomous bots could be deployed in a disaster area to sense location of victims in search and rescue operations.
Thermometers that can smell: Sensors in future medical devices could recognize odors from certain bacteria.
Jellyfish Sensors: Buoys could monitor shipping lanes for safety and environmental protection.
Transforming Mobile: Low power chips could make your mobile phone as powerful as a supercomputer.
The Synapse Project: Scaling
Ref: “A million spiking-neuron integrated circuit with a scalable communication network and interface”; Paul A. Merolla et al, SCIENCE, Aug 2014
Watson
Holistic combinations of machine learning algorithms
Complex arrangement of many diverse workloads
Workload optimized system required for optimum performance
Evolution of IT over 50+ years: Past, present, future
100 x 100 Mbps = 10 Gbps
12 x 10 Gbps = 120 Gbps
Same trend from several vendors: http://www.intelfreepress.com/news/revolutionizing-computing-with-lasers/57/
Evolution of IT over 50+ years: New Workloads
2004: < 1 Transaction per mobile user per day [Facebook had 1M members]
2014: > 37 Transactions per mobile user per day [Facebook had 1.2B members]
2017: > 50 Transactions per mobile user per day