DICE: a Model-Driven DevOps Framework for Big Data (2017/12/17)



Page 1: DICE: a Model-Driven DevOps Framework for Big Data

Giuliano Casale
Imperial College London

DICE Horizon 2020 Project, Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union

Page 2: DevOps & Big data

• DevOps for Big data
  • DataOps: DevOps & data engineering
  • DevOps for data-intensive applications
• DevOps with Big data
  • Mining monitoring data for feedback and root cause analysis (RCA)
  • Novel uses of data science to address software engineering problems

Page 3: DICE: DevOps for DIAs

Horizon 2020 research project (4M€, 2015-18)
9 partners (academia & SMEs), 7 EU countries

Mission: support SMEs in developing high-quality cloud-based data-intensive applications (DIAs)

Page 4: Challenges

1. Deliver a DevOps toolchain for DIAs
   • From requirements to CI/CD
   • Shared view of the application across roles
2. Increase automation of quality analysis during DIA development and operation
   • Analysis of efficiency, reliability, correctness
   • Architectural optimisation for DIAs

Page 5: What do we mean by Quality?

o Reliability: availability, fault-tolerance
o Efficiency: performance, costs
o Correctness: privacy & security, temporal metrics

Page 6: DIA dimensions

o Data and its Properties: Location, Velocity, Volume, Variety, Privacy
o Platforms: NoSQL, Hadoop, Spark, Storm
o Methods & Tools: MDE, Delivery, DevOps, Cloud, Integration, QA

Page 7: Our case for model-driven DevOps

1. Model-driven orchestration
   o Big Data technologies are a moving target
2. QA through M2M transformation of requirements/SLAs/architectural information
   o Test (and stress test) generation
   o Simulation
   o Verification
   o Optimization (costs, SLAs, config)
   o Architectural anti-pattern detection

Eases the QA skill shortage

Page 8: DICE: Quality-Aware DevOps for Big Data

Cycle: Design → Prototype → Deploy → Monitor → Enhance → Design

Page 9: DICE - Overview

[Figure: the DICE Framework]

o DICE IDE: Profile, Plugins
o UML-Based MDE Methodology
o Quality Analysis
o Feedback & Iterative Enhancement
o Continuous Delivery & Testing
o Data Intensive Application (DIA)
o Big Data Technologies
o Cloud (Priv/Pub)

Page 10: DICE Workflow - Dev

DICE Eclipse IDE:
Requirements & Design → Transform to Formal Models → Simulate & Verify → Optimize & Anti-patterns → Enhance Architecture

Page 11: Why simulate DIAs?

o Early-stage scalability analysis
o Effective what-if analysis for in-memory workloads

Example: SAP HANA, Sim vs Response surfaces
[Figure: time labels 4 days, 2 days, 1 day, <1h]

Page 12: Why verification in DIAs?

o Privacy-by-design (GDPR)
o M2M transformation propagates changes to runtime log verification

Page 13: DICE Workflow - Ops

Deploy (TOSCA) → Quality Testing → Fault injection → Configuration Optimization

Monitoring feedback

Page 14: TOSCA technology library for DIA

StormCluster:
  type: dice.components.storm.Nimbus
  relationships:
    - {type: dice.relationships.ContainedIn, target: StormMasterVM}
    - {type: dice.relationships.storm.ConnectedToZookeeperQuorum, target: ZookeeperCluster}
  properties:
    configuration: {taskTimeout: '30', supervisorTimeout: '60', monitorFrequency: '10', queueSize: '100000', retryTimes: '5', retryInterval: '2000'}
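A node template of this shape can also be produced from a small in-memory model, which is the kind of automation that model-driven orchestration relies on. The Python sketch below is illustrative only: the helper `storm_nimbus_node` and its parameters are hypothetical and not part of the DICE tooling; only the type names, relationship names, and configuration keys are taken from the slide. It prints JSON, which YAML 1.2 parsers generally accept.

```python
import json

def storm_nimbus_node(host, zookeeper, configuration):
    """Build a node template shaped like the StormCluster example on the slide.

    Hypothetical helper: only the type and relationship names come from the slide.
    """
    return {
        "type": "dice.components.storm.Nimbus",
        "relationships": [
            {"type": "dice.relationships.ContainedIn", "target": host},
            {"type": "dice.relationships.storm.ConnectedToZookeeperQuorum",
             "target": zookeeper},
        ],
        # The slide quotes every configuration value as a string, so do the same.
        "properties": {
            "configuration": {key: str(value) for key, value in configuration.items()}
        },
    }

blueprint = {
    "StormCluster": storm_nimbus_node(
        host="StormMasterVM",
        zookeeper="ZookeeperCluster",
        configuration={"taskTimeout": 30, "supervisorTimeout": 60,
                       "monitorFrequency": 10, "queueSize": 100000,
                       "retryTimes": 5, "retryInterval": 2000},
    )
}
print(json.dumps(blueprint, indent=2))
```

Keeping the node in a plain dictionary means a configuration change is a one-line edit to the model rather than a hand edit of the blueprint.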

Page 15: DevOps for BD: Lessons learned

1. Choosing the right Big data technology upfront remains an open problem.
2. Model-based orchestration is an easier sell than model-based design.
3. We are still far from convergence of DevOps with DataOps.
4. The number of tools can be a problem; methodology is key.

Page 16: DevOps with BD: Configuration

• Code changes mean that workloads change
• Wrong configurations can have high performance costs

Page 17: Configuring a DIA architecture

o Example: a Storm-based application

Page 18: How to evolve configuration?

o Batch exploration of alternative DIA configurations
o Algorithmic use of automated deployment
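Batch exploration on top of automated deployment might look like the loop below. This is a minimal Python sketch under stated assumptions: `fake_deploy_and_measure` is a stand-in for a real deploy-and-test step and returns made-up latencies, and `workers` is a hypothetical parameter name (`queueSize` appears in the TOSCA example).

```python
import itertools

def enumerate_configurations(parameter_ranges):
    """Yield every combination of the given parameter values as a dict."""
    names = sorted(parameter_ranges)
    for values in itertools.product(*(parameter_ranges[n] for n in names)):
        yield dict(zip(names, values))

def explore(parameter_ranges, deploy_and_measure):
    """Deploy each alternative in turn and return (configuration, measurement) pairs."""
    return [(cfg, deploy_and_measure(cfg))
            for cfg in enumerate_configurations(parameter_ranges)]

def fake_deploy_and_measure(cfg):
    # Stand-in for an automated deployment plus a test run (assumption):
    # returns a made-up latency so that the sketch runs without a cluster.
    return abs(cfg["workers"] - 4) * 10 + abs(cfg["queueSize"] - 1000) / 100

ranges = {"workers": [1, 2, 4, 8], "queueSize": [100, 1000, 10000]}
results = explore(ranges, fake_deploy_and_measure)
best_cfg, best_latency = min(results, key=lambda pair: pair[1])
print(len(results), best_cfg, best_latency)
```

The number of alternatives is the product of the range sizes, so exhaustive exploration stops being practical after a handful of parameters.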

Page 19: Configuration optimization (CO)

o Formally, a sequential black-box optimization
  o Objective function is unknown before the experiments (e.g., response time)
o Fixed budget of experiments
  o User inputs range of parameters
o Several techniques available
  o Design of experiments
  o Bayesian optimization
  o …
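One way to write the problem down is sketched below. The symbols are introduced here for illustration and do not appear on the slides: $\mathcal{X}$ is the user-supplied parameter range, $f$ the unknown objective (e.g., response time), $N$ the experiment budget, $\varepsilon_t$ measurement noise, and $\pi$ the rule that picks the next experiment from the results so far.

```latex
\begin{aligned}
& x^\star = \arg\min_{x \in \mathcal{X}} f(x), \\
& x_t = \pi\bigl(\{(x_i, y_i)\}_{i=1}^{t-1}\bigr), \qquad
  y_t = f(x_t) + \varepsilon_t, \qquad t = 1, \dots, N, \\
& \hat{x} = \arg\min_{x_t,\; 1 \le t \le N} y_t .
\end{aligned}
```

Under this reading, a design of experiments fixes all $x_t$ before any measurement is taken, whereas Bayesian optimization lets $\pi$ depend on a model fitted to the earlier $(x_i, y_i)$ pairs.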

Page 20: CO case study: Storm-based DIA

• 2.5 months to exhaustively test 4000 alternatives
• Bayesian optimization can find good configurations in just a few tens of tests

[Figure: BO, distance to optimal config (y-axis) vs. tests at 5 minutes each (x-axis)]
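To make the idea concrete, here is a minimal Bayesian optimization loop in plain Python. It is a sketch, not the tool used in the case study: the Gaussian-process surrogate with a squared-exponential kernel, the lower-confidence-bound selection rule, the `kappa` and `length_scale` values, and the toy objective are all assumptions chosen for illustration, and the search runs over a single normalised parameter.

```python
import math
import random

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_new, solve(gram, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_new, solve(gram, k_new)))
    return mean, math.sqrt(max(var, 0.0))

def bayesian_optimization(objective, candidates, budget, kappa=2.0, n_init=3, seed=0):
    """Minimise objective over candidates using at most `budget` evaluations."""
    rng = random.Random(seed)
    tested = rng.sample(candidates, n_init)          # random initial design
    values = [objective(x) for x in tested]
    while len(tested) < budget:
        offset = sum(values) / len(values)           # centre the observations
        centred = [v - offset for v in values]
        remaining = [x for x in candidates if x not in tested]

        def lower_bound(x):
            mean, std = gp_posterior(tested, centred, x)
            return mean - kappa * std                # larger kappa explores more

        nxt = min(remaining, key=lower_bound)
        tested.append(nxt)
        values.append(objective(nxt))
    best = min(range(len(tested)), key=values.__getitem__)
    return tested[best], values[best], tested

candidates = [i / 40 for i in range(41)]             # one normalised parameter
toy_latency = lambda x: (x - 0.3) ** 2 + 0.1         # made-up response-time curve
best_x, best_y, history = bayesian_optimization(toy_latency, candidates, budget=12)
print(best_x, best_y, len(history))
```

Each iteration spends one test from the budget, so the loop costs `budget` deployments regardless of how many candidates exist. `kappa` is one possible knob for trading exploration against exploitation.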

Page 21: Cassandra example

• Transfer learning methods help reuse performance data from prior DIA releases
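The simplest form of such reuse is a warm start: measurements from an earlier release choose the first configurations to test on the new one. The Python sketch below shows only that idea and is much cruder than transfer learning methods proper. The helper `warm_start` is hypothetical, the latencies are made up, and `concurrent_reads` and `concurrent_writes` are used merely as example Cassandra settings.

```python
def warm_start(prior_measurements, k):
    """Return the k configurations that performed best in a prior release.

    prior_measurements maps a configuration (a tuple of (name, value) pairs)
    to its measured latency; lower is better.
    """
    ranked = sorted(prior_measurements.items(), key=lambda item: item[1])
    return [dict(cfg) for cfg, _ in ranked[:k]]

# Made-up latencies from a hypothetical previous release.
prior = {
    (("concurrent_reads", 16), ("concurrent_writes", 32)): 41.0,
    (("concurrent_reads", 32), ("concurrent_writes", 32)): 35.5,
    (("concurrent_reads", 64), ("concurrent_writes", 64)): 52.3,
}
initial_design = warm_start(prior, k=2)
print(initial_design)
```

The returned configurations would replace a random initial design, so that the first tests on the new release start from settings that were already known to work well.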

Page 22: DevOps with BD: Lessons learned

1. Works out of the box, little expertise needed
2. Risk needs to be a tunable parameter
3. People still want to understand how it works
4. Scales to tens, not hundreds, of parameters

Page 23: Questions?

Thanks!