Keynote 12, Dr. Martin Schmatz, IBM

© 2015 IBM Corporation High-Performance Compute Systems: The road ahead A broad view on the high-performance IT environment IBM Research Zurich Dr. Martin Schmatz Manager, Cloud Server Technologies 13 May 2015 GUIDE SHARE EUROPE: Jahrestagung DACH, Hamburg, May 11-13, 2015

Upload: guide-share-europe-austracee

Post on 15-Aug-2015

© 2015 IBM Corporation

High-Performance Compute Systems: The road ahead

A broad view on the high-performance IT environment

IBM Research – Zurich

Dr. Martin Schmatz – Manager, Cloud Server Technologies

13 May 2015

GUIDE SHARE EUROPE: Jahrestagung DACH, Hamburg, May 11-13, 2015

Moore’s Law: From thousands to billions

1971: Intel 4004, 2’300 transistors

Photo: “Designer Delves into Complexities of World’s First Microprocessor” by Intel Free Press

2014: IBM Power8, 4’200’000’000 transistors

40+ years

http://www.extremetech.com/extreme/191453-ibm-unveils-new-power8-servers-in-last-gasp-effort-to-battle-intels-x86-dominion
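The two transistor counts on this slide are enough to estimate the doubling period behind "from thousands to billions". The sketch below assumes steady exponential growth between 1971 and 2014; the function name `doubling_time_years` is made up for illustration and is not from the talk.

```python
import math


def doubling_time_years(count_start, count_end, years):
    """Years per doubling, assuming steady exponential growth over the period."""
    doublings = math.log2(count_end / count_start)
    return years / doublings


# Figures quoted on the slide
intel_4004 = 2_300             # transistors, 1971
ibm_power8 = 4_200_000_000     # transistors, 2014

doublings = math.log2(ibm_power8 / intel_4004)
period = doubling_time_years(intel_4004, ibm_power8, 2014 - 1971)

print(f"number of doublings: {doublings:.1f}")
print(f"doubling time: {period:.2f} years")
```

Under that assumption the result is close to one doubling every two years, which is the usual statement of Moore's law.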

Evolution of IT over 50+ years: Past & present

[Chart: IT efficiency over time]

CLOCK ERA (scaling, from the 1970’s): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.

Moore’s law siblings

Calculations per Second for $100: > 15 orders of magnitude in 100 years

http://www.kurzweilai.net/ask-ray-the-future-of-moores-law

Transistors per 1$: x8 in 10 years, then flat or less (chart marker: 2012)

http://www.economist.com/news/21589080-golden-rule-microchips-appears-be-coming-end-no-moore
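To compare the two "sibling" trends on the same footing, each can be converted into a constant yearly growth factor. This is a minimal sketch that assumes smooth compounding over the stated period; `annual_growth_factor` is a hypothetical helper name.

```python
def annual_growth_factor(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


# "> 15 orders of magnitude" in 100 years (calculations per second for $100)
calc_per_dollar = annual_growth_factor(10 ** 15, 100)

# "x8 in 10 years" (transistors per $1)
transistors_per_dollar = annual_growth_factor(8, 10)

print(f"calculations per $: x{calc_per_dollar:.3f} per year")
print(f"transistors per $:  x{transistors_per_dollar:.3f} per year")
```

Read this way, the long-run trend corresponds to roughly 40% improvement per year and the ten-year transistor trend to roughly 23% per year, before the curve goes flat.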

Evolution of IT over 50+ years: Past, present, future

10 years, 2004 to 2014:

x 24 threads

x 50 caches

x 15 mem BW

Z13: 832 GB/s system memory BW

Number of cores in the system

Single core Pentium 2 [1997]

18C/36T E5 [2015]

18 years

IBM Z13: Up to 141C/282T per system (plus SIMD)

http://encyclopedia2.thefreedictionary.com/Pentium+II

http://intel.com

Impact of workload aggregation on provisioning

Aggregating ~256 workloads improves utilization & over-provisioning ~ x5

[Chart: utilization and over-provisioning factor (0 to 6) vs. number of consolidated workloads (axis ticks 1, 10, 100, 1000); over-provisioning ÷5, utilization x5]

Diminishing ROI when going beyond 500-1000 aggregated workloads

[Chart detail: over-provisioning factor between 0.5 and 1.5; almost no gain from 144 to 256 workloads, despite nearly doubling the number of consolidated workloads]
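The shape of the over-provisioning curve can be reproduced with a simple square-root model. The sketch below is an assumption, not the author's stated method: it treats the workloads as independent and identically distributed, so the mean demand grows with N while the headroom above the mean grows only with sqrt(N). The starting value of 6.0x for a single workload is the figure quoted in the deck's backup slides.

```python
import math

SINGLE_WORKLOAD_FACTOR = 6.0  # capacity / average demand for one workload (backup slides)


def overprovisioning_factor(n_workloads, single=SINGLE_WORKLOAD_FACTOR):
    """Capacity per unit of average demand when n independent workloads share a server.

    Assumption: mean demand adds up linearly, the headroom above the mean
    only grows with sqrt(n).
    """
    return 1.0 + (single - 1.0) / math.sqrt(n_workloads)


def utilization(n_workloads):
    """Average demand divided by provisioned capacity."""
    return 1.0 / overprovisioning_factor(n_workloads)


for n in (1, 4, 144, 256, 1000):
    print(f"{n:5d} workloads: over-provisioning {overprovisioning_factor(n):.2f}x, "
          f"utilization {utilization(n):.0%}")

gain = utilization(256) / utilization(1)
print(f"utilization gain at 256 workloads: x{gain:.1f}")
```

With these assumptions the model gives 3.5x at 4 workloads and about 1.3x at 256, and the step from 144 to 256 workloads changes the factor only from about 1.42 to 1.31, which is the flattening the chart points out.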

Evolution of IT over 50+ years: Past & present

[Chart: IT efficiency over time, with markers at 1970’s and ~2004]

CLOCK ERA (scaling): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.

CLOUD V1.0 ERA (multi-core, multi-thread): Performance increase by consolidating large numbers of parallel workloads. Only marginal single-thread performance gain.

IT Performance Evolution

[Chart: system capacity (capability), from a single device to device clusters of 10 to 100K devices, vs. system density (1/latency end-to-end), from low to extreme, bounded by physical limits]

Scale-down: Maximize feature density (atom transistor, atom storage)

Scale-up: Maximize device capacity (Terabyte HDD, POWER 8)

Scale-out: Maximize system capacity (NAS, blade server)

Scale-in: Maximize system density, minimize end-to-end latency (FLASH SSD, 3D chips, FPGA/GPU, manycore, BPRAM/SCM, interconnect, in-mem DB, DAS)

Chart annotations: Cloud computing; optimum cost/VM; embarrassingly parallel workloads; workload optimized cost/solution

Evolution of IT over 50+ years: Are you ready for workload optimized systems?

http://www.optimisation-conversion.com/infographies/acquisition-cross-canal-les-tendances-marketing-digital-e-commerce-infographie/attachment/no-thanks-were-too-busy-optimisation-conversion/

Evolution of IT over 50+ years: Past, present, future

[Chart: IT efficiency over time, with markers at 1970’s, ~2004, ~2015 and >>2025]

CLOCK ERA (scaling): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.

CLOUD V1.0 ERA (multi-core, multi-thread): Performance increase by consolidating large numbers of parallel workloads. Only marginal single-thread performance gain.

CLOUD WOS ERA (workload optimized systems, delivered via cloud): Performance increase by holistic optimization of the entire IT stack. More than linear single-thread performance gain.

Evolution of IT over 50+ years: Workload optimized systems

https://www-304.ibm.com/connections/blogs/systemz/entry/see_ibm_system_z_mainframe_at_the_javaone_2014_conference?lang=en_us

IBM 2015 mainframe: Characteristics and benefits

Workload optimization (1): Small infrastructure enhancements produce significant performance increase for the application & solution.

Workload optimization (2): Increased use of hybrid systems.

Evolution of IT over 50+ years: Hybrid systems

[Diagram labels: CPU, GPU, GPU]

Share data structures at CPU memory speeds; use the compute structure which best fits the workload.

Adapted from http://nvidia.com

Evolution of IT over 50+ years: Hybrid HPC systems

AURORA: Intel, Cray; 180 PF @ 2018

SUMMIT & SIERRA: IBM, Nvidia; 100 - 300 PF @ 2017

Adapted from http://intel.com

Evolution of IT over 50+ years: Hybrid chips

Intel E3-12xxV3, Nvidia Tegra X1, AMD APU

[Die images with the CPU and graphics areas labeled on each chip]

http://nvidia.com

http://amd.com

http://intel.com

Evolution of IT over 50+ years: Past, present, future

https://sherlockxxi.wikispaces.com/Funny+things+and+curiosities+about+Sherlock+Holmes

Original IBM Watson System

IBM True North Synapse Chip

Continuum of “COGNITIVE COMPUTING”

BlueBrain Project, Human Brain Project: Detailed understanding of how a brain works and reacts

Synapse Project: Efficient use of structures similar to those found in a brain

WATSON: Holistic combinations of machine learning algorithms

Statistics: Cross-correlation and intuitive presentation of large amounts of data

www.matchmove.com www.ucl.ac.uk breast-cancer-research.com www.ibm.com

The Synapse Project: Short Introduction

[Diagram of neurons: dendrites, cell, synapse]

~7000 synapses/neuron (~10^4)

Operations in O(ms).

Human Brain: 10^10 neurons / 10^14 synapses
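A quick consistency check of the brain-scale numbers on this slide, plus a rough sense of how many neuromorphic chips that scale would imply. The per-chip figures (1 million neurons, 256 million synapses per TrueNorth chip) are not on this slide; they are an assumption taken from the 2014 Science paper cited later in the deck, and the chip counts ignore interconnect, topology and everything else that matters in practice.

```python
NEURONS_IN_BRAIN = 10 ** 10      # from the slide
SYNAPSES_PER_NEURON = 7_000      # from the slide, rounded there to ~10^4

synapses_exact = NEURONS_IN_BRAIN * SYNAPSES_PER_NEURON
synapses_rounded = NEURONS_IN_BRAIN * 10 ** 4

# Assumed per-chip figures (not on this slide)
CHIP_NEURONS = 1_000_000
CHIP_SYNAPSES = 256_000_000

chips_by_neurons = NEURONS_IN_BRAIN // CHIP_NEURONS
chips_by_synapses = synapses_rounded // CHIP_SYNAPSES

print(f"synapses with 7000 per neuron: {synapses_exact:.1e}")
print(f"synapses with 10^4 per neuron: {synapses_rounded:.1e}")
print(f"chips needed by neuron count:  {chips_by_neurons:,}")
print(f"chips needed by synapse count: {chips_by_synapses:,}")
```

The synapse count, not the neuron count, dominates such an estimate, which is one way to read the slide's emphasis on highly interconnected structures.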

The Synapse Project

Transistors are used to implement a mixed analog/digital circuit which behaves like a large number of very simple, highly interconnected neurons.

Why is this of interest? Because it’s extremely power efficient (analog) and highly scalable!!!

Ref: “A Digital Neurosynaptic Core Using Embedded Crossbar Memory with 45pJ per Spike in 45nm”; Paul Merolla et al, CICC, 2011

The Synapse Project

Ref: “A million spiking-neuron integrated circuit with a scalable communication network and interface”; Paul A. Merolla et al, SCIENCE, Aug 2014

Example result: Image recognition on a 1920 x 1024 pixel, 30 fps video stream. An analysis aperture of 400 x 240 pixels consumed 63mW of electrical power!

Simplified COMPARISON: The Synapse chip “TrueNorth” can deliver 46 billion SOPS per watt for a typical network, and 400 billion SOPS per watt for networks with high spike rates and a high number of active synapses, whereas today’s most energy-efficient supercomputer achieves [only] 4.5 billion FLOPS per watt!

10x – 100x speedup!
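The "10x – 100x" figure follows from dividing the quoted operations-per-watt numbers. The sketch below only redoes that arithmetic and converts each figure into energy per operation; it keeps the slide's simplification of treating one synaptic operation (SOPS) and one floating-point operation (FLOPS) as comparable units of work, which they are not in general.

```python
# Figures quoted on the slide (operations per second per watt)
TRUENORTH_TYPICAL = 46e9     # SOPS/W, typical network
TRUENORTH_PEAK = 400e9       # SOPS/W, high spike rates and many active synapses
SUPERCOMPUTER = 4.5e9        # FLOPS/W, most energy-efficient supercomputer at the time


def picojoules_per_op(ops_per_watt):
    """Energy per operation in pJ: 1 W = 1 J/s, so J/op = 1 / (ops per second per watt)."""
    return 1e12 / ops_per_watt


ratio_typical = TRUENORTH_TYPICAL / SUPERCOMPUTER
ratio_peak = TRUENORTH_PEAK / SUPERCOMPUTER

print(f"typical network: x{ratio_typical:.0f} operations per watt")
print(f"best case:       x{ratio_peak:.0f} operations per watt")
for name, value in (("TrueNorth typical", TRUENORTH_TYPICAL),
                    ("TrueNorth best case", TRUENORTH_PEAK),
                    ("supercomputer", SUPERCOMPUTER)):
    print(f"{name}: {picojoules_per_op(value):.1f} pJ per operation")
```

Note that the ratio describes operations per watt, so it is an energy-efficiency gain rather than a speedup in the usual sense.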

IT Performance Evolution: Outlook to the future

[Chart: IT efficiency over time, with markers at 1970’s, ~2004, ~2015 and >>2025]

CLOCK ERA (scaling): Performance increase by device size scaling, which resulted in faster clocks. Linear single-thread performance gain.

CLOUD V1.0 ERA (multi-core, multi-thread): Performance increase by consolidating large numbers of parallel workloads. Only marginal single-thread performance gain.

CLOUD WOS ERA (workload optimized systems, delivered via cloud): Performance increase by holistic optimization of the entire IT stack. More than linear single-thread performance gain.

QUANTUM ERA (quantum, DNA, neuronal or ??? computing): Whatever it will be, be assured that IBM Research is already working on IT...

Summary

The ‘golden age of performance scaling’ is long over
- Dennard scaling ended roughly 2004

Growing number of transistors brought us threads – lots of them
- This brought us to the Cloud era

However: From Moore’s law siblings, we learn that there is a diminishing return on investment looming
- Exception: Embarrassingly parallel workloads (µ-Servers)

If you want to have performance, you need to go hybrid WOS
- ISAs, big/little cores, GPGPU, FPGA, programmable state-machines…

New compute paradigms (= non Von Neumann) are rapidly growing and will enable completely new approaches to a growing range of applications
- Highly application specific, BUT terribly fast and/or low power!

Innovation

Why is 2025 an important year?

[Chart: Energy Dissipated per Logic Operation]

Oops!

Thank you!

High-Performance Compute Systems: The road ahead

A broad view on the high-performance IT environment

BACKUP

Abstract

High-Performance Compute Systems: The road ahead

High-end computing is facing change: While traditional scientific computations are ever growing and enable deep and novel insights based on ab-initio simulations, rapidly emerging alternative fields of application increasingly require high-performance computing as well. Big-data analytics is seen as having the potential to change the way, speed and precision of decisions, for example in business, medicine, the environment, public management etc. Cognitive computing, as for example seen in the recently founded IBM Watson group, brings additional architectural concepts to the table. This presentation gives an overview of the current status and a view on future hybrid concepts reunifying different technologies and architectures.

Presented by: Dr. Martin L. Schmatz

Martin L. Schmatz received the diploma in Electrical Engineering in 1993 and the Ph.D. degree in 1998, both from the Swiss Federal Institute of Technology (ETH). He joined the IBM Research Laboratory in Zurich in 1999, where he currently manages the Cloud Server Technology research activities with a focus on scalable, hybrid server system architectures.

Dr. Schmatz has published 50+ external papers at premier conferences and in refereed journals in the field and holds more than 40 patents. He is a member of the IBM Academy and the IBM Technical Experts Council, and has an MBA degree from the Henley Management College in the UK.

Evolution of Server Microprocessors

Pentium P5, 1993: 3.1M transistors

POWER8: 12 cores, 8 threads/core; 22nm CMOS SOI; 96 MB L3 cache; 650 mm^2, 4+ GHz; >>250 GFLOPS / chip; up to 32 socket SMP

Transistors are invested to increase the number of available threads

Tradeoff to be made: Number of threads vs single thread performance

Impact of workload aggregation on provisioning

Example: Server loading for a single workload, assuming a 95% SLA

[Chart: workload server loading (theoretical) over time (~us), with the mean (‘average demand’) and the 95% line (‘capacity’) marked]

For a single workload we require a machine capacity 6.0x the average demand

When we consolidate 4 workloads we only require 3.5x average demand

When we consolidate 256 workloads we only require 1.3x average demand
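The effect can be illustrated with a small simulation: draw a bursty load for each workload, add them up, and compare the 95th-percentile load (the capacity needed for a 95% SLA) with the mean. This is a sketch under stated assumptions, not a reconstruction of the slide's data: the workloads are independent and exponentially distributed, which is a choice made purely for illustration, so the absolute factors differ from the 6.0x / 3.5x / 1.3x on the slides while showing the same trend.

```python
import random
import statistics


def capacity_factor(n_workloads, samples=4000, sla=0.95, seed=1):
    """Ratio of the SLA-percentile load to the mean load for n aggregated workloads."""
    rng = random.Random(seed)
    # Each sample is the total demand of n independent, exponentially distributed loads.
    totals = sorted(
        sum(rng.expovariate(1.0) for _ in range(n_workloads))
        for _ in range(samples)
    )
    # Load level that is not exceeded in (about) 95% of the samples.
    capacity = totals[round(sla * samples) - 1]
    return capacity / statistics.fmean(totals)


factors = {n: capacity_factor(n) for n in (1, 4, 256)}
for n, factor in factors.items():
    print(f"{n:4d} workloads: capacity = {factor:.2f} x average demand")
```

The factor shrinks toward 1 as more workloads are aggregated, because peaks of independent workloads rarely coincide.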

Impact of workload aggregation on provisioning

Aggregating ~256 workloads improves utilization & over-provisioning ~ x5

[Chart: utilization and over-provisioning factor (0 to 6) vs. number of consolidated workloads (axis ticks 1, 10, 100, 1000); over-provisioning ÷5, utilization x5]

Effect of workload consolidation:
- Small performance over-provisioning
- Very high utilization
- Small memory over-provisioning

Prerequisite: Balanced system design of a sizable SMP shared-everything server

Diminishing ROI when going beyond 500-1000 aggregated workloads

[Chart detail: over-provisioning factor between 0.5 and 1.5; almost no gain from 144 to 256 workloads, despite nearly doubling the number of consolidated workloads]

What options out of this dilemma do we have?
- Reduce cost (= cloud server)
- Scale-up/in (= increase performance to reduce cost)
- Workload optimization
- Any HYBRID combination of the above!

[Diagram axes: cost, performance]

On the road to Software Defined Environments (SDE)

Future
- Rapidly changing workloads, dynamic patterns
- Dynamic automatic composition of heterogeneous system
- Autonomic and proactive management

Current
- Diverse workload, limited patterns
- Homogeneous resource pooling
- Expert configuration and mapping of workload

Traditional
- Few, stable, and well known workloads
- Fixed system hardware, manual scaling
- Hardwired workload, minimal configuration

[Diagram labels: W1 W2 W3 W4; R1 R2 R3; V1 V2 V3 V4 V5 … Vn; C]

Workload types are growing and are becoming more volatile

Cloud infrastructure is becoming programmable to meet the requirements in efficiency and resiliency

Heterogeneity is increasingly present and important

HYBRID !!!

Human Brain Project

Future Neuroscience: Achieve a unified, multi-level understanding of the human brain that integrates data and knowledge about the healthy and diseased brain across all levels of biological organisation, from genes to behaviour; establish in silico experimentation as a foundational methodology for understanding the brain.

Future Medicine: Develop an objective, biologically grounded map of neurological and psychiatric diseases based on multilevel clinical data; use the map to classify and diagnose brain diseases and to configure models of these diseases; use in silico experimentation to understand the causes of brain diseases and develop new drugs and other treatments; establish personalised medicine for neurology and psychiatry.

Future Computing: Develop novel neuromorphic and neurorobotic technologies based on the brain's circuitry and computing principles; develop supercomputing technologies for brain simulation, robot and autonomous systems control and other data intensive applications.

Ref: https://www.humanbrainproject.eu/discover/the-project/strategic-objectives

The Synapse Project

Like the brain and unlike von Neumann computing, the SyNAPSE chip architecture:

• is event-driven and shies away from ever-increasing clock rates, the need for cooling, and dark silicon;

• uses local computation and is massively parallel and distributed;

• integrates memory with computation and so avoids the memory wall and minimizes overall average wire length;

• has exceptionally low power consumption and so can be ubiquitously embedded;

• uses implicit addressing for synapses, thus significantly reducing memory and communication;

• is fault-tolerant and so degrades gracefully;

• is simple but fundamental, with canonical cores using canonical learning interconnected via a canonical fabric;

• is a learning system beyond programming that can autonomously extract its “program” (synapses, structure, and neuron thresholds) from complex, spatiotemporal, real-world environments with multiple sensory and motor modalities to mine the boundary between digital and physical worlds.

The Synapse Project

Where does it lead us? Super power efficient sensor data analytics: Examples

Roller Bot: Autonomous bots could be deployed in a disaster area to sense location of victims in search and rescue operations.

Thermometers that can smell: Sensors in future medical devices could recognize odors from certain bacteria.

Jellyfish Sensors: Buoys could monitor shipping lanes for safety and environmental protection.

Transforming Mobile: Low power chips could make your mobile phone as powerful as a supercomputer.

The Synapse Project: Scaling

Ref: “A million spiking-neuron integrated circuit with a scalable communication network and interface”; Paul A. Merolla et al, SCIENCE, Aug 2014

Watson

Holistic combinations of machine learning algorithms

Complex arrangement of many diverse workloads

Workload optimized system required for optimum performance

Evolution of IT over 50+ years: Past, present, future

100 x 100 Mbps = 10 Gbps

12 x 10 Gbps = 120 Gbps

Same trend from several vendors: http://www.intelfreepress.com/news/revolutionizing-computing-with-lasers/57/
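The two aggregate bandwidth figures are simple lane-count arithmetic. The sketch below spells it out and adds one derived number, how many of the slower lanes would be needed to match the faster bundle; the helper name `aggregate_gbps` is made up for illustration, and the calculation ignores encoding and protocol overhead.

```python
def aggregate_gbps(lanes, lane_rate_mbps):
    """Total raw bandwidth in Gbps of `lanes` parallel links at `lane_rate_mbps` each."""
    return lanes * lane_rate_mbps / 1000


older_bundle = aggregate_gbps(100, 100)       # 100 x 100 Mbps
newer_bundle = aggregate_gbps(12, 10_000)     # 12 x 10 Gbps

# Number of 100 Mbps lanes needed to carry the newer bundle's bandwidth
lanes_needed = int(newer_bundle * 1000 / 100)

print(f"100 x 100 Mbps = {older_bundle:g} Gbps")
print(f"12 x 10 Gbps   = {newer_bundle:g} Gbps")
print(f"100 Mbps lanes needed for {newer_bundle:g} Gbps: {lanes_needed}")
```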

Evolution of IT over 50+ years: New workloads

2004: < 1 transaction per mobile user per day [Facebook had 1M members]

2014: > 37 transactions per mobile user per day [Facebook had 1.2B members]

2017: > 50 transactions per mobile user per day