Principles of Software-defined Elastic
Systems for Big Data Analytics
SDS@IC2E 2014
Hong-Linh Truong, Schahram Dustdar
Distributed Systems Group
Vienna University of Technology
http://dsg.tuwien.ac.at/research/viecom
SDS@IC2E 2014, 11 Mar 2014,
Boston, USA
Acknowledgements
Muhammad Candra, Georgiana Copil, Duc-Hung Le,
Aitor Murguzur, Daniel Moldovan, Tien-Dung Nguyen,
Johannes Schleicher
Research Projects:
Outline
Motivation
Dependencies in data analytics processes
Elasticity principles for big data analytics
Software-defined elastic systems for data analytics
Case studies
Conclusions and future work
Motivation (1)
Current trends for big data analytics
Big computing infrastructure, scalable data mining,
scalable computation frameworks
Our observations
Data analytics done by software algorithms and
humans, e.g., simulation workflows
Different cost/quality requirements from a consumer
for the same analytics process in different contexts,
e.g., in healthcare
Different "outputs" from the same type of analytics,
e.g., in urban planning
Motivation (2)
Few discussions focused
on dynamic and flexible
data analytics based on
multi-dimensional elasticity
properties
Resource elasticity
Software / human-based computing elements, multiple clouds
Quality elasticity
Non-functional parameters, e.g., performance, quality of data,
service availability, human trust
Cost and benefit elasticity
Rewards, incentives
Elasticity
Resource, quality, and cost elasticity can be managed
using software-defined concepts
Dependencies in data analytics
processes
Analytics
Analytics processes (structured or unstructured)
include analytics tasks
Processes and tasks rely on "computational models",
which implement algorithmic steps to analyze data
Computational models are executed by "compute units"
Data resources feed data into computational models
Analytics Results
Quality of result (QoR) = (Time, Cost, Quality of data
of analytics output)
QoR is not fixed
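The QoR tuple above can be written down as a small data structure. This is only a minimal sketch: the class name `QoR`, the field names, the units, the [0, 1] quality scale, and the `satisfies` helper are assumptions made for illustration and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoR:
    """Quality of result as a (time, cost, quality of data) tuple."""
    time_s: float        # response time in seconds (assumed unit)
    cost: float          # monetary cost (assumed unit)
    data_quality: float  # quality of data of the output, in [0, 1] (assumed scale)

    def satisfies(self, required: "QoR") -> bool:
        """True if this result is at least as good as `required` on every dimension."""
        return (self.time_s <= required.time_s
                and self.cost <= required.cost
                and self.data_quality >= required.data_quality)


# QoR is not fixed: the same analytics process may face different
# requirements in different contexts (hypothetical numbers).
emergency = QoR(time_s=5.0, cost=100.0, data_quality=0.6)
routine = QoR(time_s=3600.0, cost=10.0, data_quality=0.9)

measured = QoR(time_s=3.0, cost=40.0, data_quality=0.7)
print(measured.satisfies(emergency))  # True
print(measured.satisfies(routine))    # False
```

The same measured result is acceptable in one context and not in the other, which is the sense in which QoR is not fixed.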
Diverse types of analytics for a
given subject
Examples
Complex dependencies in (big)
data analytics
More data → more computational resources
(e.g., more VMs)
More types of data → more computational models
→ more analytics processes
Change quality of results
Change quality of data
Change response time
Change cost
Change types of result
(form of the data output, e.g., tree, visual,
story, etc.)
[Figure: dependencies among data (Data x, Data y, Data z), computational models, analytics processes, the analytics result, and quality of result]
Elasticity principles
can be used to
support this!
Elasticity Principles: Elasticity of
data and computational models
Multiple types of objects from different sources with
complex dependencies, relevancies, and quality
Different data and computational models for the same
analytics subject
New analytics subjects can be defined and analytics
goals can be changed
Decide/select/define/compose not only computational
models for analytics subjects but also data models
based on existing ones
Management and modeling of the elasticity of data and computational
models during analytics
Elasticity Principles: Elasticity of
data resources
Data provided, managed and shared by different
providers
Data associated with different concerns (cost, quality
of data, privacy, contract, etc.)
Static data, open data, data-as-a-service, opportunistic
data (from sensors and human sensing)
Not just centralized big data and total data ownership
Data resources can be taken into account in an elastic
manner: similar to VMs, based on their quality,
relevancy, pricing, etc.
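One way to picture treating data resources "similar to VMs" is a selection step that ranks candidates by quality, relevancy, and price. The sketch below is an assumption-laden illustration, not the authors' method: `DataResource`, `select_resources`, the [0, 1] scales, the greedy value-per-price ranking, and all resource names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    name: str
    quality: float    # quality of data, in [0, 1] (assumed scale)
    relevancy: float  # relevancy to the analytics subject, in [0, 1] (assumed scale)
    price: float      # price per use (assumed unit)


def select_resources(candidates, budget, min_quality=0.0):
    """Greedy pick: best (quality * relevancy) per unit price within the budget."""
    usable = [r for r in candidates if r.quality >= min_quality]

    def rank(r):
        # Free resources rank first; priced ones by value per unit price.
        value = r.quality * r.relevancy
        return float("inf") if r.price == 0 else value / r.price

    chosen, spent = [], 0.0
    for r in sorted(usable, key=rank, reverse=True):
        if spent + r.price <= budget:
            chosen.append(r)
            spent += r.price
    return chosen


catalog = [
    DataResource("open-weather-data", quality=0.6, relevancy=0.5, price=0.0),
    DataResource("sensor-daas", quality=0.9, relevancy=0.9, price=8.0),
    DataResource("historical-archive", quality=0.8, relevancy=0.7, price=5.0),
    DataResource("crowd-reports", quality=0.4, relevancy=0.8, price=1.0),
]

low = [r.name for r in select_resources(catalog, budget=6.0, min_quality=0.5)]
high = [r.name for r in select_resources(catalog, budget=15.0, min_quality=0.5)]
print(low)   # ['open-weather-data', 'historical-archive']
print(high)  # ['open-weather-data', 'historical-archive', 'sensor-daas']
```

Raising the budget pulls in more data resources, much as a larger budget would allow more VMs.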
Elasticity Principles: Elasticity of
humans and software as computing
units
Human in the loop to solve analytics tasks that software
cannot solve
Human-based compute units can be scaled up/down
with different cost, availability, performance models
Human-based compute units + software-based
compute units for executing computational models
Elasticity control can also be done by humans
Provisioning hybrid compute units in an elastic way for
computational/data/network tasks as well as for
monitoring/control tasks in the analytics process
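A rough sketch of elastic provisioning for hybrid compute units: tasks that software cannot solve are routed to human-based units, and the number of units of each kind follows the workload. The `Task` class, the `plan_units` function, and the capacity figures are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_human: bool  # True when software alone cannot solve the task


def plan_units(tasks, tasks_per_software_unit=10, tasks_per_human_unit=2):
    """Return how many software-based and human-based compute units to provision."""
    human_tasks = sum(1 for t in tasks if t.needs_human)
    software_tasks = len(tasks) - human_tasks
    return {
        "software": math.ceil(software_tasks / tasks_per_software_unit),
        "human": math.ceil(human_tasks / tasks_per_human_unit),
    }


normal = [Task(f"event-{i}", needs_human=False) for i in range(12)]
peak = normal + [Task(f"incident-{i}", needs_human=True) for i in range(5)]

print(plan_units(normal))  # {'software': 2, 'human': 0}
print(plan_units(peak))    # {'software': 2, 'human': 3}
```

Human-based units scale up only when tasks requiring them appear, and drop back to zero afterwards.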
Elasticity Principles: Elasticity of
quality of results
Definition of quality of results
Trade-offs of time, cost, quality of data, forms of
output
Using quality of results to select suitable
computational models, data resources,
computing units
Multi-level control for the elasticity based on
quality of results
Able to cope with changes in quality of data,
performance, cost and types of results at runtime
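Using quality of results to select computational models, data resources, and computing units can be sketched as a constrained choice over candidate combinations. Everything here is hypothetical: the `Option` class, the `choose` function, the "cheapest feasible option" policy, and the estimated numbers are assumptions, not something stated in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One candidate combination with its estimated quality of result."""
    model: str
    data: str
    units: str
    time_s: float
    cost: float
    data_quality: float


def choose(options, max_time_s, max_cost, min_quality):
    """Pick the cheapest option that meets all three QoR bounds, or None."""
    feasible = [o for o in options
                if o.time_s <= max_time_s
                and o.cost <= max_cost
                and o.data_quality >= min_quality]
    return min(feasible, key=lambda o: o.cost, default=None)


options = [
    Option("simple-threshold", "recent-window", "1 VM", 2.0, 1.0, 0.6),
    Option("full-regression", "full-history", "8 VMs", 600.0, 20.0, 0.9),
    Option("full-regression", "full-history", "8 VMs + expert", 900.0, 60.0, 0.97),
]

# Re-running the choice when requirements change at runtime
# yields a different combination.
fast = choose(options, max_time_s=10, max_cost=5, min_quality=0.5)
accurate = choose(options, max_time_s=3600, max_cost=100, min_quality=0.95)
nothing = choose(options, max_time_s=10, max_cost=5, min_quality=0.95)

print(fast.model, fast.units)          # simple-threshold 1 VM
print(accurate.model, accurate.units)  # full-regression 8 VMs + expert
print(nothing)                         # None
```

When no combination meets the bounds, the caller has to relax one dimension of the trade-off (time, cost, or quality of data).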
Software-defined elastic systems for
data analytics
High-level elastic object models and SD APIs
Software-defined elastic systems for
data analytics
Conceptual architecture view
Case study – data analytics in
smart cities
Near-realtime sensor data and various historical data
Several services for analyzing data
Near-realtime and offline data analytics
[Figure: smart city analytics scenario. Labels: events stream; Event Analyzer on PaaS (normal operation, peak operation); human analysts; machine/human event analyzers; critical situations 1 and 2; experts (SCU); situation data analytics (Wf. A, Wf. B); operation problem; operation data analytics; maintenance process; other stakeholders; Cloud DaaS; M2M PaaS; Cloud IaaS]
Case study – data analytics in
smart cities
Elastic analytics process
Most near-realtime analytics done by complex
software processes
Resource and cost elasticity applied for these
processes
Using a high-level language to invoke humans in
critical situations, e.g., when accuracy is too low
#for a service unit analyzing chiller status
#SYBL.ServiceUnitLevel
Mon1 MONITORING accuracy = Quality.Accuracy
Cons1 CONSTRAINT accuracy < 0.7
Str1 STRATEGY CASE Violated(Cons1):
Notify(Incident.DEFAULT, ServiceUnitType.HBS)
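Read alongside the slide text ("invoke humans in critical situations, e.g., when accuracy is too low"), the rule above monitors accuracy and notifies a human-based service when accuracy drops below 0.7. The Python below is a rough paraphrase of that reading, not SYBL and not its runtime semantics: `check_accuracy`, `record`, and the use of plain strings for the incident and unit type are assumptions, and HBS is taken here to mean a human-based service.

```python
ACCURACY_THRESHOLD = 0.7  # threshold taken from Cons1 in the slide


def check_accuracy(accuracy, notify):
    """Call `notify` when the monitored accuracy is too low; return True if notified."""
    if accuracy < ACCURACY_THRESHOLD:
        notify("Incident.DEFAULT", "ServiceUnitType.HBS")
        return True
    return False


sent = []  # collects notifications instead of contacting real people


def record(incident, unit_type):
    sent.append((incident, unit_type))


check_accuracy(0.92, record)  # accuracy is fine, nothing happens
check_accuracy(0.55, record)  # accuracy too low, a human-based service is notified
print(sent)  # [('Incident.DEFAULT', 'ServiceUnitType.HBS')]
```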
Case study – data analytics in
smart cities
Data resource elasticity
For offline analytics: e.g., maintaining analytics
processes of the chiller operation
High quality of results takes a lot of historical data
and computational resources
Near-realtime analytics: e.g., detecting operation
failures of a chiller
Accept lower quality of results but fast response
Take only recent data
Control sensors to increase sampling rates
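The contrast between offline and near-realtime analytics can be sketched as two data intake policies. This is an illustrative sketch only: the function names, the one-hour "recent" window, the 60-second base interval, the 4x speed-up, and the sample readings are all assumptions.

```python
from datetime import datetime, timedelta


def select_readings(readings, now, mode, recent=timedelta(hours=1)):
    """Pick the (timestamp, value) readings an analytics run should consume."""
    if mode == "offline":
        # Favor quality of results: use the full history.
        return list(readings)
    if mode == "near-realtime":
        # Favor response time: keep only recent readings.
        return [(ts, v) for ts, v in readings if now - ts <= recent]
    raise ValueError(f"unknown mode: {mode}")


def sampling_interval(mode, base=timedelta(seconds=60), speedup=4):
    """Shorter interval (higher sampling rate) for near-realtime analytics."""
    return base / speedup if mode == "near-realtime" else base


now = datetime(2014, 3, 11, 12, 0)
history = [(now - timedelta(hours=h), 20.0 + h) for h in (48, 24, 2, 0.5, 0.1)]

print(len(select_readings(history, now, "offline")))        # 5
print(len(select_readings(history, now, "near-realtime")))  # 2
print(sampling_interval("near-realtime"))                   # 0:00:15
```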
Case study – data analytics in
smart cities
Elastic Hybrid Compute Units
Results from the operation failure analytics need to
be checked by human compute units
Human compute units include people from the M2M
cloud provider, equipment operator/maintenance
company, equipment manufacturers and software
companies
When a problem is detected, human compute units can
be scaled up and down, depending on the
complexity and evolution of the problems being solved
Conclusions and future work
Consider elastic quality of results in data analytics
Support principles of elasticity to achieve flexible and
dynamic data analytics
Need runtime elasticity techniques for dealing with
Data and computational models
Distributed and diverse data resources
Hybrid compute units of software and people
Future work
Quality-of-results-aware data analytics processes
Elasticity control of software and humans
Programming models for data elasticity
Thanks for your attention!
Hong-Linh Truong
Distributed Systems Group, TU Wien
dsg.tuwien.ac.at/research/viecom