systemml: declarative machine learning on sparktozsu/courses/cs848/w19... · accelerates model...

20
SystemML: Declarative Machine Learning on Spark Presented by: Juan Carrillo Candidate for MASc. in Computer Software Department of Electrical & Computer Engineering University of Waterloo 05/03/19

Upload: others

Post on 20-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark

Presented by: Juan CarrilloCandidate for MASc. in Computer SoftwareDepartment of Electrical & Computer EngineeringUniversity of Waterloo

05/03/19

Page 2: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

Agenda

1. Introduction

2. SystemML core features

3. Experiments

4. Conclusions

5. Discussion

SystemML: Declarative Machine Learning on Spark PAGE 2

Page 3: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark PAGE 3

Introduction1

Page 4: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

1. Introduction

SystemML: Declarative Machine Learning on Spark PAGE 4

Machine Learning for Big Data Analytics

Page 5: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

1. Introduction

SystemML: Declarative Machine Learning on Spark PAGE 5

The problem, and the SystemML approach

Usual workflow SystemML approach

Time consuming Error prone

Accelerates model developmentSimplifies deployment

DML

Source: Spark Summit. Inside Apache SystemML

Page 6: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

1. Introduction

SystemML: Declarative Machine Learning on Spark PAGE 6

SystemML background

2010

Creation

By researchers at the IBM Almaden Research Center

2015

Open-source

Spark Summit in San Francisco

2017

Top Level Project

Apache Software Foundation Board

2018

Current release 1.2

Deep learning functions Ultra-sparse data

Page 7: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark PAGE 7

SystemML core features

2

Page 8: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

2. SystemML core features

SystemML: Declarative Machine Learning on Spark PAGE 8

Optimizer integration

Source: Spark Summit. Inside Apache SystemML

Page 9: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

2. SystemML core features

SystemML: Declarative Machine Learning on Spark PAGE 9

Optimizer integration

Source: Spark Summit. Inside Apache SystemML

Page 10: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

2. SystemML core features

SystemML: Declarative Machine Learning on Spark PAGE 10

Optimizer integration

Source: Spark Summit. Inside Apache SystemML

Page 11: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

2. SystemML core features

Distributed Matrix Representation

SystemML: Declarative Machine Learning on Spark PAGE 11

Runtime integration

Buffer Pool Integration

Page 12: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

2. SystemML core features

SystemML: Declarative Machine Learning on Spark PAGE 12

Runtime integration

Specific Runtime

Optimizations

● Lazy Spark-Context Creation● Short-Circuit Read● Short-Circuit Collect

+

Dynamic recompilation● Adapt the runtime plan to changing or

initially unknown data characteristics+

Partitioning Operations● Partitioning-Preserving Operations● Partitioning-Exploiting Operations+

Page 13: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark PAGE 13

Experiments3

Page 14: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

3. Experiments

SystemML: Declarative Machine Learning on Spark PAGE 14

End-to-End Performance

Page 15: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

3. Experiments

SystemML: Declarative Machine Learning on Spark PAGE 15

Runtime per Iteration

Page 16: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark PAGE 16

Conclusions 4

Page 17: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

4. Conclusions

✓ Importance of DML as a high-level language to improve

interoperability and scalability of Machine Learning models on Spark

✓ Multiple layers of abstraction and optimizations make SystemML a

powerful tool for accelerating the development of Machine Learning models over Big Data

✓ Experimental evaluation on multiple ML models and datasets

SystemML: Declarative Machine Learning on Spark PAGE 17

Takeaways and paper contributions

Page 18: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark PAGE 18

Thanks for your attention

Page 19: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

SystemML: Declarative Machine Learning on Spark PAGE 19

Discussion 5

Page 20: SystemML: Declarative Machine Learning on Sparktozsu/courses/CS848/W19... · Accelerates model development Simplifies deployment DML Source: Spark Summit. Inside Apache SystemML

5. Discussion

1. Optimizer. How to optimize ML models over data streams?2. Runtime. In dynamic recompilation, what could be unknown data

characteristics?3. Experiments. How SystemML might perform for the KNN algorithm?

SystemML: Declarative Machine Learning on Spark PAGE 20

Research

Industry

5. Current capabilities compared to other tools such as Numpy, Scikit Learn, or TensorFlow?

6. Adoption in the current ML and Big Data user base?7. SystemML in Cloud computing infrastructure. Beyond IBM?