principles of software-defined elastic systems for big data analytics

21
Principles of Software-defined Elastic Systems for Big Data Analytics SDS@IC2E 2014 Hong-Linh Truong, Schahram Dustdar Distributed Systems Group Vienna University of Technology [email protected] http://dsg.tuwien.ac.at/research/viecom SDS@IC2E 2014, 11 Mar 2014, Boston, USA 1

Upload: hong-linh-truong

Post on 18-Nov-2014

315 views

Category:

Education


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Principles of Software-defined Elastic Systems for Big Data Analytics

Principles of Software-defined Elastic

Systems for Big Data Analytics

SDS@IC2E 2014

Hong-Linh Truong, Schahram Dustdar

Distributed Systems Group

Vienna University of Technology

[email protected]

http://dsg.tuwien.ac.at/research/viecom

SDS@IC2E 2014, 11 Mar 2014,

Boston, USA

1

Page 2: Principles of Software-defined Elastic Systems for Big Data Analytics

Acknowledgements

Muhammad Candra, Georigana Copil, Duc-

Hung Le, Aitor Murguzur, Daniel Moldovan, Tien-

Dung Nguyen, Johannes Schleicher

Research Projects:

SDS@IC2E 2014, 11 Mar 2014,

Boston, USA

2

Page 3: Principles of Software-defined Elastic Systems for Big Data Analytics

Outline

Motivation

Dependences in data analytics processes

Elasticity principles for big data analytics

Software-defined elastic systems for data

Analytics

Case studies

Conclusions and future work

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

3

Page 4: Principles of Software-defined Elastic Systems for Big Data Analytics

Motivation (1)

Current trends for big data analytics

Big computing infrastructure, scalable data mining,

scalable computation frameworks

Our observations

Data analytics done by software algorithms and

humans, e.g., simulation workflows

Different cost/quality requirements from a consumer

for the same analytics process at different contexts,

e.g., in healthcare

Different „outputs“ from the same type of analytics,

E.g., in urban planning

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

4

Page 5: Principles of Software-defined Elastic Systems for Big Data Analytics

Motivation (2)

Few discussions focused

on dynamic and flexible

data analytics based on

multi-dimensional elasticity

properties

Few discussions focused

on dynamic and flexible

data analytics based on

multi-dimensional elasticity

properties

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

5

Resource elasticity

Software / human-based

computing elements,

multiple clouds

Quality elasticity

Non-functional parameters e.g.,

performance, quality of data,

service availability, human

trust

Costs & Benefit

elasticity

rewards, incentives

Elasticity

Resource, quality

and cost elasticity

can be managed

using software-

defined concepts

Resource, quality

and cost elasticity

can be managed

using software-

defined concepts

Page 6: Principles of Software-defined Elastic Systems for Big Data Analytics

Dependencies in data analytics

processes

Analytics

Analytics (structured or unstructured) processes

include analytics tasks

Processes and tasks rely on „computational models“

which implement algorithmic steps to analyze data

Computational models executed by „compute units“

Data resources feed data into computational models.

Analytics Results

Quality of result (QoR) = (Time, Cost, Quality of data

of analytics output)

QoR is not fixed

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

6

Page 7: Principles of Software-defined Elastic Systems for Big Data Analytics

Diverse types of analytics for a

given subjects

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

7

Examples

Page 8: Principles of Software-defined Elastic Systems for Big Data Analytics

Complex dependencies in (big)

data analytics More data more

computational resources

(e.g. more VMs)

More types of data

more computational models

more analytics

processes

Change quality of results

Change quality of data

Change response time

Change cost

Change types of result

(form of the data

output, e.g. tree, visual,

story, etc.)

More data more

computational resources

(e.g. more VMs)

More types of data

more computational models

more analytics

processes

Change quality of results

Change quality of data

Change response time

Change cost

Change types of result

(form of the data

output, e.g. tree, visual,

story, etc.)

8

Data

Computational

Model

Analytics

Process

Analytics Result

Data

Data

DataxDatax

DatayDatay

DatazDataz

Computational

Model

Computational

ModelComputational

Model

Computational

ModelComputational

Model

Computational

Model

Analytics

Process

Analytics

ProcessAnalytics

Process

Analytics

ProcessAnalytics

Process

Analytics

Process

Quality of

Result

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Page 9: Principles of Software-defined Elastic Systems for Big Data Analytics

Complex dependencies in (big)

data analytics

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

9

Elasticity principles

can be used to

support this!

Elasticity principles

can be used to

support this!

Page 10: Principles of Software-defined Elastic Systems for Big Data Analytics

Elasticity Principles: Elasticity of

data and computational models

Multiple types of objects from different sources with

complex dependencies, relevancies, and quality

Different data and computational models the same

analytics subject

New analytics subjects can be defined and analytics

goals can be changed

Decide/select/define/compose not only computational

models for analytics subjects but also data models

based on existing ones

10SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Management and modeling of elasticity of data and computational

model during the analytics

Management and modeling of elasticity of data and computational

model during the analytics

Page 11: Principles of Software-defined Elastic Systems for Big Data Analytics

Elasticity Principles: Elasticity of

data resources

Data provided, managed and shared by different

providers

Data associated with different concerns (cost, quality

of data, privacy, contract, etc.

Static data, open data, data-as-a-service, opportunistic

data (from sensors and human sensing)

Not just centralized big data and total data ownership

11SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Data resources can be taken into account in an elastic

manger: similar to VMs, based on their quality,

relevancy, pricing, etc.

Data resources can be taken into account in an elastic

manger: similar to VMs, based on their quality,

relevancy, pricing, etc.

Page 12: Principles of Software-defined Elastic Systems for Big Data Analytics

Elasticity Principles: Elasticity of

humans and software as computing

units

Human in the loop to solve analytics tasks that software

cannot solve

Human-based compute units can be scaled up/down

with different cost, availability, performance models

Human-based compute units + software-based

compute units for executing computational models

Elasticity controls can be also done by humans

12SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Provisioning hybrid compute units in an elastic way for

computational/data/network tasks as well as for

monitoring/control tasks in the analytics process

Provisioning hybrid compute units in an elastic way for

computational/data/network tasks as well as for

monitoring/control tasks in the analytics process

Page 13: Principles of Software-defined Elastic Systems for Big Data Analytics

Elasticity Principles: Elasticity of

quality of results

Definition of quality of results

Trade-offs of time, cost, quality of data, forms of

output

Using quality of results to select suitable

computational models, data resources,

computing units

Multi-level control for the elasticity based on

quality of results

13SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Able to cope with changes in quality of data,

performance, cost and types of results at runtime

Able to cope with changes in quality of data,

performance, cost and types of results at runtime

Page 14: Principles of Software-defined Elastic Systems for Big Data Analytics

Software-defined elastic systems for

data analytics High-level elastic object models and SD APIs

14SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Page 15: Principles of Software-defined Elastic Systems for Big Data Analytics

Software-defined elastic systems for

data analytics Conceptual architecture view

15SDS@IC2E 2014, 11 Mar

2014, Boston, USA

Page 16: Principles of Software-defined Elastic Systems for Big Data Analytics

...

events stream

Event Analyzer on

PaaS

Event Analyzer on

PaaS

Peak OperationNormal Operation

Human Analysts Human Analysts

Peak OperationNormal Operation

Machine/Human

Event Analyzers

Critical

situation 1

ExpertsExperts

SCU

Situation Data

analytics

Situation Data

analytics

Wf. A

Wf. B

Critical

situation 2

Cloud DaaSCloud DaaS

Operation data

analytics

Operation data

analytics

M2M PaaSM2M PaaS

Cloud IaaS

Operation

problem

Other stakeholdersOther stakeholders

Maintenance

process

Maintenance

process

Case study – data analytics in

smart cities

Near-realtime

sensor data

and several

historical data

Several

services for

analyzing data

Near-realtime

and offline

data analytics

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

16

Page 17: Principles of Software-defined Elastic Systems for Big Data Analytics

Case study – data analytics in

smart cities

Elastic analytics process

Most near-realtime analytics done by complex

software processes

Resource and cost elasticity applied for these

processes

Using high level language to invoke humans in

critical situation, e.g., when accuracy is too low

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

17

#for a service unit analyzing chiller status

#SYBL.ServiceUnitLevel

Mon1 MONITORING accuracy = Quality.Accuracy

Cons1 CONSTRAINT accuracy < 0.7

Str1 STRATEGY CASE Violated(Cons1):

Notify(Incident.DEFAULT, ServiceUnitType.HBS)

Page 18: Principles of Software-defined Elastic Systems for Big Data Analytics

Case study – data analytics in

smart cities

Data resource elasticity

For offline analytics: e.g., maintaining analytics

processes of the chiller operation

high quality of results take lot of historical data

and computational resource

Near-realtime analytics: e.g., detecting operation

failures of a chiller

Accept lower quality of results but fast response

Take only recently data

Control sensors to increase sampling rates

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

18

Page 19: Principles of Software-defined Elastic Systems for Big Data Analytics

Case study – data analytics in

smart cities

Elastic Hybrid Compute Units

Results from the failure operation analytics need to

be checked by human compute units

Human compute units include people from the M2M

cloud provider, equipment operator/maintenance

company, equipment manufactures and software

companies

When a problem detected, human compute units can

be scaled up and downed, depending on how the

complexity and evolution of problems being solved

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

19

Page 20: Principles of Software-defined Elastic Systems for Big Data Analytics

Conclusions and future work

Consider elastic quality of results in data analytics

Support principles of elasticity to achieve flexible and

dynamic data analytics

Need runtime elasticity techniques for dealing with

Data and computational models

Distributed and diverse data resources

Hybrid compute units of software and people

Future work

Quality of results aware data analytics processes

Elasticity control of software and humans

Programming models for data elasticity

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

20

Page 21: Principles of Software-defined Elastic Systems for Big Data Analytics

Thanks for your attention!

Hong-Linh Truong

Distributed Systems GroupTU Wien

dsg.tuwien.ac.at/research/viecom

[email protected]

SDS@IC2E 2014, 11 Mar

2014, Boston, USA

21