infrastructures for machine learning - dell technologies · 2020. 8. 23. · big data...

21
Infrastructures for machine learning Antonio Cisternino (@cisterni) IT Center @Unipisa

Upload: others

Post on 13-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Infrastructures for machine learning

Antonio Cisternino (@cisterni)IT Center @Unipisa

Page 2: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Machine Learning hype in the news

Page 3: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Machine Learning hype in the news

Page 4: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Machine Learning hype in the news

Page 5: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

The second machine age: the automation of control

World transformation

Energy

Control

Page 6: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Generating tables

We will initially need trainedmathematicians writing tables (i.e.

computer programs) but eventuallycomputers will generate tables (programs)

automatically

Alan Turing working on Colossus (not literal citation)

Page 7: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Percept

Update envinformation

Makedecision

Act

A «functional» element

Page 8: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

A «functional» element

Perception

Analysis

Classification/predictionAction

Feedback

Page 9: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Cloud, HPC and the problem size

HPC

Problem requiresmultiple nodes

Homogeneousstructure (jobs

and schedulers…)

Mainly in memory

Frequent sync(order of latency

in communications)

Cloud

Single nodeaddressingmultiple problems

Heterogeneousstructure (just

x86…)

Any memory and disk accesspatterns

Communicationmay happen even

through L2 incapsulated in L3

Big data

Problemrequiresmultiple nodes

Not-so-homogeneous

structure

Mainly disk bound

Communicationis rare

Machine learning

Not (yet) a single system

Highly variablestructure

Highly variablemodel size

It may benefit from multiple

nodes

Page 10: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

(Traditional?) Machine Learning process

Acquiredata

Training set / Test

setTraining

Model validation

Production of the model

Data and Computationally intensive

Model small wrt to data

Fast execution

Page 11: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Everyone looks for adaptive systems

Reward/punishment

Update the model

Page 12: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Deep learning ≠ Machine learning

«Future is deep learning and systems will learn like human brain»

«Accelerators are needed for efficient DNN»

«Half precision is the solutionto everything»

ML techniques are activelyused that are not based on DNN and may include treesstructures

HW acceleration may help speed depending on the sizeof the ML function

Sometimes filtering training data using less information may help generalization butit’s not always true

Belie

fsR

eality ch

eck

Page 13: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Compute is important in ML

Ok, ok, you may need

memory and

computational power

in some format and

shape

Page 14: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Data persistence is important in ML

ML Model is important Often hard or impossible to recreate (especially in adaptive systems) Should be fast to access

Page 15: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Fabric is important for ML

ML is a functional element that has to be placed in a larger computationalinfrastructure

Bandwidth is important to support data ingestion and output Latency may be even more important in production

Page 16: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Cloud is important for ML

Some ML primitives are simply too big to be executed on prem Users are a key part of on-line learning and adaptive systems

Page 17: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Is this architecture suitable for ML?

Bandwidth is limited from

the architecture

Page 18: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

Is this architecture suitable for ML?

Bandwidth is limited from

the architecture

Page 19: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

A new dawn of computing

Computation

Processing

Storage

Communication

Programming

Disks are

getting only

100x slower

than

memory

Software & PL

are getting

distant from

real

architectures

Many core, accelerators

Low latency high

bandwidth

Page 20: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

A more reasonable interconnection…

Cloud

Front

end

HPC

Big Data

Private cloud

f

f

Edge (IoT and mobile)

f

f

f

Page 21: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is

(My) Conclusions ML is a functional primitive not a system The discipline is evolving and the balance between compute/storage/fabric may vary

significantly over problems and time Hardware acceleration will be an important part and reconfiguration of hardware will

be more and more important Edge ML will be part of the picture of Fog computing ML should not be considered as a physical entity (like a cluster) in your data center