Architectural Knowledge for Designing and Evolving Big Data Applications

M. Ali Babar
CREST – Centre for Research on Engineering Software Technologies
University of Adelaide, Australia (http://crest-centre.net)
Big Data Meetup, Adelaide, Australia
September 28, 2015

What is Software Architecture and Why is it Critical?

What is Software Architecture?

•  Architecture is the fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution (IEEE 1471-2000).

•  A software system’s architecture is the set of principal design decisions made about the system (Taylor, R., et al., 2010).

•  It’s all about design DECISIONS – bad, good and better ones

•  Context – good decisions may become the bad ones

Software architecture should provide intellectual control and specifications for meaningful reasoning by stakeholders

[Figure: location-tracking architecture with Acquisition, Collection, and Monitoring parts. Labelled elements: location-sensing technologies, time-ordered queues of raw location objects, collation of related location objects, collections of unnamed and named tracked-entity location objects, a query subsystem, reusable monitors, an app-specific monitor, and applications.]

Source: Cooney et al., 2007

Software Architecture: Key Design Decisions

Knowledge Centric Design & Evaluation

[Figure: architecture design process. Activities: Specifying ASRs, Architecture design, Documenting, and Architecture Evaluation. Labelled inputs and outputs: stakeholders; requirements and constraints; prioritized quality attribute scenarios; patterns and tactics; “sketches” of candidate views, determined by patterns; chosen, combined views plus documentation beyond views.]

Adapted from: Hofmeister, C., et al., A general model of software architecture design derived from five industrial approaches. Journal of Systems and Software 80(1): 106-126 (2007).

Utility

•  Maintainability/Modifiability/Extensibility
–  Integrating with other systems
M1 (H, H): Add the ability to interact with a new university records system (to validate the authenticity of a degree) within 2 weeks of work by 2 people
M2 (H, M): Add the ability for a financial institution to access QVS to report the details of received payments within 2 weeks of work by 2 people
M3 (M, H): Add the ability to connect to DIMA and check working visa conditions within 4 weeks of work by 2 people
–  New browser
M4 (M, L): Add support for a new browser within two weeks

•  Performance
–  Response Time
P1 (H, M): Users need to be able to register within 5 seconds during heavy load (e.g. 500 requests per second)
P2 (H, M): Users should be able to submit a verification request within 10 seconds during peak hours (e.g. 500 requests per second)
–  Throughput
P3 (H, H): System demand exceeds the initially planned capacity

•  Security
–  Data confidentiality, Data integrity
S1 (H, L): The system must provide a secure mechanism to allow users to retrieve their password
S2 (H, M): Customers’ sensitive information (e.g. credit card details) should not be accessible even if the web interface security is compromised
S3 (M, M): Ability to report an audit trail of modifications and users’ activities (e.g. attempted access)
S4 (M, H): Ability to make online payments using commercial-grade encryption mechanisms

•  Usability
–  Normal operations, Customization, Proficiency training
U1 (M, L): Allow users to save work-in-progress information (e.g. candidate information) so that work can be completed in stages without needing to complete the whole process at once
U2 (H, M): Allow users to cancel work in progress (e.g. cancel a verification request after data entry and before submitting the request)
U3 (L, M): Request verification for multiple candidates with minimum data entry (e.g. select multiple candidates and request the same verification services)
U4 (M, L): Ability to personalize the look and feel of the QVS web site
U5 (H, L): Ability to use the system without any assistance, i.e. the system needs to be easy to learn and use

M1 (H, H): Add the ability to interact with a new University record system to validate the authenticity of a degree within 2-person day.

User Stories + Quality Attribute Scenarios

•  Scenarios are useful for evaluating multiple quality attributes of software architecture

•  Key scenarios can drive the evaluation
–  describe the behavior of architecture
–  set the context for particular quality attributes

•  Knowledge of patterns is always handy for quickly evaluating design alternatives

•  Lightweight and agile process
–  Only two roles involved
–  Repository of architectural knowledge

[Figure: Proxy pattern class diagram with Client, AbstractService, Service, and Proxy; AbstractService, Service, and Proxy each declare a service operation.]

Exploiting Scenarios and Patterns

•  Sharding Pattern –  Divides the entire data set into horizontal shards that share the same schema, where each shard contains a distinct subset of the data.

•  Cache Aside (Proxy) Pattern –  Loads static or dynamic data from a data store into a cache. It helps improve performance and keeps the data held in the cache consistent with the data residing in the data store.

•  Priority Queue Pattern –  Assigns a priority level to each job request based on its significance and reorders all job requests according to their assigned priority. With this pattern, high-priority jobs can be processed earlier than low-priority jobs.
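The three patterns above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design shown in the talk: the class names (`ShardedStore`, `CacheAside`, `JobQueue`), the hash-based shard routing, the invalidate-on-write policy, and the convention that a lower number means a higher priority are all hypothetical choices made for the example.

```python
import hashlib
import heapq
import itertools


def shard_for(key: str, num_shards: int) -> int:
    """Sharding: a stable hash sends a given key to the same shard every time."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Each shard is a dict with the same 'schema'; each holds a distinct subset of keys."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


class CacheAside:
    """Cache aside: read through the cache, load from the store on a miss."""

    def __init__(self, store: ShardedStore):
        self.store = store
        self.cache = {}
        self.store_reads = 0  # counts how often the backing store was hit

    def read(self, key: str):
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key: str, value):
        # Assumed policy: update the store, then invalidate the cached copy
        # so the next read reloads a consistent value.
        self.store.put(key, value)
        self.cache.pop(key, None)


class JobQueue:
    """Priority queue: lower number = higher priority (an assumption of this sketch)."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, priority: int, job: str):
        heapq.heappush(self._heap, (priority, next(self._order), job))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]
```

A caller would wrap one `ShardedStore` in a `CacheAside` for reads and writes, and submit jobs to a `JobQueue` with an integer priority; jobs with equal priority come out in submission order.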

Example Patterns

•  Index Table Pattern –  Improves query performance by creating indexes (secondary keys) over the data fields that are frequently accessed. Attributes (data fields) are chosen as secondary keys based on their significance for querying, which speeds up query response time.

•  Pipe and Filter Pattern –  Data to be processed is decomposed into discrete elements which can be reused. Elements are processed in a pipeline manner, where the output of one element becomes the input of the next element. Improves performance, scalability and reusability.

•  Health Check Pattern –  Implements various functional checks to verify that applications and services are performing correctly. It is mainly a combination of two factors: the checks performed by the application, and the analysis of the results performed by the tool or framework that handles the health check.
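A minimal Python sketch of these three patterns follows. It assumes in-memory data and hypothetical names (`IndexedTable`, `run_pipeline`, `run_health_checks`, and the sample filters); it is meant to show the shape of each pattern, not a production design.

```python
from collections import defaultdict


class IndexedTable:
    """Index table: a secondary index maps a field value to the matching primary keys."""

    def __init__(self, indexed_field: str):
        self.indexed_field = indexed_field
        self.rows = {}                 # primary key -> row
        self.index = defaultdict(set)  # field value -> set of primary keys

    def insert(self, pk, row: dict):
        old = self.rows.get(pk)
        if old is not None:
            # Keep the index consistent when a row is overwritten.
            self.index[old[self.indexed_field]].discard(pk)
        self.rows[pk] = row
        self.index[row[self.indexed_field]].add(pk)

    def find_by_field(self, value):
        # Lookup via the index instead of scanning every row.
        return [self.rows[pk] for pk in sorted(self.index.get(value, ()))]


def run_pipeline(items, filters):
    """Pipe and filter: the output of one filter is the input of the next."""
    for f in filters:
        items = f(items)
    return list(items)


def parse(lines):
    for line in lines:
        yield line.strip().split(",")


def drop_incomplete(records):
    return (r for r in records if all(r))


def to_dict(records):
    for name, value in records:
        yield {"name": name, "value": int(value)}


def run_health_checks(checks: dict) -> dict:
    """Health check: run each named check and analyse the results in one place."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a failing or crashing check counts as unhealthy
    return {"healthy": all(results.values()), "checks": results}
```

Because each filter only consumes and produces an iterable, filters such as `parse` or `drop_incomplete` can be reordered or reused in other pipelines, which is the reusability benefit the pattern description mentions.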

Example Patterns

Building and Leveraging a Body of Knowledge

Design Knowledge Support

[Figure: architectural knowledge model. Knowledge types: Reasoning Knowledge, General Knowledge, Design Knowledge, Context Knowledge. Actors: Producer, Consumer. Activities: Architect (A), Integrate (B), Share (C), Trace (D), Learn (E), Evaluate (F), Synthesize (G), Distill (H), Apply (I), Search / Retrieve (J). Traceability is created by producers and used by consumers.]

•  Architectural knowledge and architecture lifecycle
–  Architecture design & analysis
–  Maintenance and evolution

•  Guidance and tools
–  Types of architectural knowledge
–  Manage and share knowledge
–  Architectural description for reuse

•  Building an infrastructure
–  A characterization scheme of architectural design knowledge
–  An infrastructure for capturing and sharing architectural knowledge

Classifying Architecture Knowledge

Discovering & Cataloguing Architecture Styles

Knowledge Based Support for Architecting

[Figure: knowledge-based support for architecting. Labelled elements: functional and verification requirements, Specification Define, Quality Define, problem description, desired quality attribute measures, scenarios, design tactics, patterns, architecture design, architecture description, analytical model / reasoning framework, and quality attribute measures and risks.]

Architecture Design Knowledge Ecosystems

[Figure: architecture design knowledge ecosystems. Private ecosystems A, B, and C (company, employee) and a public ecosystem. Labelled activities: create customized AK input form, share AK, view AK, collaboration, AK extraction, AK consumption during modeling and implementing. Integrated tools: IDE, modeling tool, requirement and CM/issue tracking tools, and a knowledge base (KBase).]

Patterns Captured in a Knowledge Base

Detailed Description of Patterns

Design Using Shard Pattern

Design Using Priority Queue Pattern

Design Using Cache Pattern

Applying Architectural Knowledge

Hackystat: Data Collection and Analysis

•  A framework for collecting and analyzing process and product data

•  Provides IDE plugins, called sensors, that send information to a service called SensorBase

•  Several services compute metrics using the data from SensorBase

•  Metrics are viewed using ProjectBrowser, which uses services like SensorBase, Telemetry, DailyProjectData, and TickerTape

Architecture of Hackystat

•  Provides visualization of different metrics through GUIs

•  Generates reports for external clients

•  Provides weekly, monthly and yearly abstractions of metrics

•  Provides daily abstraction of data

•  Receives and stores data and provides daily abstractions

Quality Attributes & Architectural Decisions

•  Scalability
–  Amazon EC2 & S3: Replication of system services to meet performance requirements. Separation of the database layer into a new service that utilizes platform-specific persistency features.
–  Google App Engine: No action required; scalability is handled by the platform. Refactoring of persistency components to make them compatible with Google Datastore persistence.

•  Portability
–  Amazon EC2 & S3: A wrapper layer is added to ensure platform independence. A separate database layer provides seamless transfer of the database layer.
–  Google App Engine: Portability to other platforms is not possible.

•  Compatibility
–  Amazon EC2 & S3: System features are exposed through the original REST API. A wrapper layer is added to provide abstraction over the services cluster and its deployment configuration.
–  Google App Engine: System features are exposed through the original REST API.

•  Reliability & Autonomous Scalability
–  Amazon EC2 & S3: Façade/wrapper layer to provide abstraction. Amazon’s Elastic Load Balancer ensures autonomous scalability.
–  Google App Engine: Ensured by the platform.

•  Efficient & effective deployments
–  Amazon EC2 & S3: Amazon Elastic Load Balancer ensures auto scaling as well as an efficient and cost-effective deployment configuration.
–  Google App Engine: Deployment of application components on the cloud is managed by the platform.

Architectural Views of Hackystat

Tools as a Service (TaaS)

High Level Architectural Solution

•  Tools Hosted in Public or Private Clouds

•  Data (Content Elements) Integration through Common Semantic Model Using Ontologies

Core Elements of TaaS Space

High-level Architecture Overview

Semantic Integration Among Tools

•  Explicitly describing common concepts

•  Mapping between tool-specific and common concepts

[Figure: end users working with an ASR and Knowledge Management Tool and a Modeling Tool.]

Building Semantically Integrated Data Model

End to End Integration

•  Probes and plugins to map data of tools onto the aggregated Ontology model.

•  Generating RDF graphs from the aggregated Ontology model.
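The mapping step described above can be illustrated with a small Python sketch. Everything specific here is an assumption made for the example: the tool names, the field-to-concept table `TOOL_TO_COMMON`, the `http://example.org/taas#` namespace, and the N-Triples-style output are hypothetical and do not describe the actual TaaS ontology or probes.

```python
# Hypothetical mapping from tool-specific fields to common concepts.
TOOL_TO_COMMON = {
    "requirements_tool": {"req_id": "identifier", "req_text": "description"},
    "modeling_tool": {"element_id": "identifier", "notes": "description"},
}

BASE = "http://example.org/taas#"  # placeholder namespace, not a real ontology


def escape_literal(value) -> str:
    """Escape backslashes and quotes so the value can sit inside a quoted literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def to_triples(tool: str, record: dict, subject_field: str):
    """Map one tool record onto (subject, predicate, object) triples over common concepts."""
    mapping = TOOL_TO_COMMON[tool]
    subject = f"<{BASE}{tool}/{record[subject_field]}>"
    triples = []
    for field, value in record.items():
        concept = mapping.get(field)
        if concept is None:
            continue  # fields without a common concept are not exported
        triples.append((subject, f"<{BASE}{concept}>", f'"{escape_literal(value)}"'))
    return triples


def to_ntriples(triples) -> str:
    """Serialize triples one per line in an N-Triples-like form."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Two tools that use different field names (`req_text` and `notes`) end up with the same predicate (`description`), which is what lets data from both be queried together in one graph.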

[Figure: ASR and KM concepts alongside modeling concepts.]

Architecture of Integration Systems

•  Subsystem for Annotation, Semantic Integration and Collaboration Notifications based on the Ontology Model

•  A pipeline architecture is used to distribute data elements between the private and public cloud.

•  A monitor observes the data elements and distributes them between the private and public cloud process pipelines according to the distribution rules.

•  Re-projection components combine the data elements after processing.
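The monitor, pipeline, and re-projection roles described above can be sketched as follows. This is a hedged illustration only: the function names, the single distribution rule (a `sensitive` flag decides private versus public processing), and the position tags used to restore the original order are assumptions, not details from the talk.

```python
def monitor(elements, is_private):
    """Apply the distribution rule and tag each element with its original position."""
    private, public = [], []
    for position, element in enumerate(elements):
        target = private if is_private(element) else public
        target.append((position, element))
    return private, public


def run_filters(tagged_elements, filters):
    """Run one pipeline: every element passes through each filter in turn."""
    processed = []
    for position, element in tagged_elements:
        for f in filters:
            element = f(element)
        processed.append((position, element))
    return processed


def reproject(*streams):
    """Combine the processed streams back into the original element order."""
    merged = sorted(
        (item for stream in streams for item in stream),
        key=lambda pair: pair[0],
    )
    return [element for _, element in merged]


def tag(site):
    """Example filter factory: records where an element was processed."""
    return lambda element: {**element, "processed_in": site}


elements = [
    {"id": 1, "sensitive": True},
    {"id": 2, "sensitive": False},
    {"id": 3, "sensitive": True},
]
private, public = monitor(elements, lambda e: e["sensitive"])
result = reproject(
    run_filters(private, [tag("private")]),
    run_filters(public, [tag("public")]),
)
```

Keeping the position alongside each element is one simple way to let the re-projection step merge the two streams without either pipeline knowing about the other.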

Pipe and Filter Architectural Style

•  Multi-tenancy is handled at four layers of abstraction.
–  Service layers
–  Data access layer
–  Data store layer
–  Virtual infrastructure layer

•  The combination used depends upon tenants’ requirements.

Multi-tenancy Management

Concluding Remarks

•  As in any other large-scale system, software architecture plays a vital role in big data systems

•  Gaps in architecture design knowledge can lead to designs with huge technical debt

•  Proven design principles and knowledge need to be adapted for the architectural challenges of big data

•  Building architectural design knowledge is important for developing big data systems

Acknowledgements

•  Slides are based on the work that is being carried out in my group in close collaboration with several colleagues, students, and industrial partners.

•  Some research challenges and promising solutions have been developed for joint research proposals.

•  Wiki for big data architecture knowledge has been developed through a project carried out by Jing, Atbin, and Liang under my supervision.

•  TaaS Platform work is being driven by Aufeef Chauhan.

Thank You!

Questions M. Ali Babar [email protected] malibabar.wordpress.com