Architectural Knowledge for Designing and Evolving Big Data Applications
M. Ali Babar
CREST – Centre for Research on Engineering Software Technologies
University of Adelaide, Australia (http://crest-centre.net)
Big Data Meetup, Adelaide, Australia, September 28, 2015
What is Software Architecture?
• Architecture is the fundamental organization of a system, embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution (IEEE 1471-2000).
• A software system’s architecture is the set of principal design decisions made about the system (Taylor, R., et al., 2010).
• It’s all about design DECISIONS: bad, good and better ones
• Context matters: good decisions may become bad ones when the context changes
Software architecture should provide intellectual control and specifications for meaningful reasoning by stakeholders
[Figure: Layered architecture of a location-tracking system. Acquisition: several location-sensing technologies feed time-ordered queues of raw location objects. Collection: related location objects are collated into collections of unnamed and named tracked-entity location objects, with a query subsystem. Monitoring: reusable and application-specific monitors serve applications.]
Source: Cooney et al., 2007
Software Architecture: Key Design Decisions
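The acquisition, collection and monitoring layers in the Cooney et al. figure can be illustrated with a small Python sketch. It is not their design: `LocationObject`, `acquire`, `collate` and `proximity_monitor` are hypothetical names, a binary heap stands in for the time-ordered queue of raw location objects, and the monitor is a simple proximity check.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(order=True)
class LocationObject:
    timestamp: float                      # only the timestamp takes part in ordering
    entity_id: str = field(compare=False)
    x: float = field(compare=False)
    y: float = field(compare=False)

def acquire(readings):
    """Acquisition: merge raw readings from several sensing technologies into one time-ordered queue."""
    queue = []
    for reading in readings:
        heapq.heappush(queue, reading)
    return queue

def collate(queue):
    """Collection: pop objects in time order and group them per tracked entity."""
    tracks = defaultdict(list)
    while queue:
        obj = heapq.heappop(queue)
        tracks[obj.entity_id].append(obj)
    return dict(tracks)

def proximity_monitor(tracks, x, y, radius):
    """A reusable monitor (hypothetical): entities whose latest position lies within radius of (x, y)."""
    hits = []
    for entity_id, objs in tracks.items():
        latest = objs[-1]
        if (latest.x - x) ** 2 + (latest.y - y) ** 2 <= radius ** 2:
            hits.append(entity_id)
    return sorted(hits)

readings = [
    LocationObject(2.0, "badge-7", 5.0, 5.0),
    LocationObject(1.0, "badge-7", 0.0, 0.0),
    LocationObject(1.5, "badge-9", 1.0, 1.0),
]
tracks = collate(acquire(readings))
print(proximity_monitor(tracks, 0.0, 0.0, 2.0))  # ['badge-9']
```

Keeping the monitor as a plain function over collated tracks mirrors the figure's separation: monitors never touch raw sensor data, so a new sensing technology only affects acquisition.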
Knowledge-Centric Design & Evaluation
[Figure: Activities: specifying ASRs, architecture design, architecture evaluation and documenting. Inputs and artifacts: stakeholders; requirements and constraints; prioritized quality attribute scenarios; patterns and tactics; “sketches” of candidate views, determined by patterns; chosen, combined views plus documentation beyond views.]
Adapted from: Hofmeister, C., et al., A general model of software architecture design derived from five industrial approaches. Journal of Systems and Software 80(1): 106-126 (2007).
Utility
• Maintainability/Modifiability/Extensibility
  – Integrating with other systems:
    M1 (H, H): Add the ability to interact with a new university records system (to validate the authenticity of a degree) within 2 weeks with 2 people
    M2 (H, M): Add the ability for a financial institution to access QVS to report the details of received payments within 2 weeks with 2 people
    M3 (M, H): Add the ability to connect to DIMA and check working visa conditions within 4 weeks with 2 people
  – New browser:
    M4 (M, L): Add support for a new browser within two weeks
• Performance
  – Response time:
    P1 (H, M): Users need to be able to register within 5 seconds under heavy load (e.g. 500 requests per second)
    P2 (H, M): Users should be able to submit a verification request within 10 seconds during peak hours (e.g. 500 requests per second)
  – Throughput:
    P3 (H, H): System demand exceeds the initially planned capacity
• Security
  – Data confidentiality, data integrity:
    S1 (H, L): The system must provide a secure mechanism that allows users to recover their password
    S2 (H, M): Customers’ sensitive information (e.g. credit card details) should not be accessible even if the web interface security is compromised
    S3 (M, M): Ability to report an audit trail of modifications and users’ activities (e.g. attempted access)
    S4 (M, H): Ability to make online payments using commercial-grade encryption mechanisms
• Usability
  – Normal operations:
    U1 (M, L): Allow users to save work-in-progress information (e.g. candidate information) so that work can be completed in stages without needing to complete the whole process at once
    U2 (H, M): Allow users to cancel work in progress (e.g. cancel a verification request after data entry and before submitting it)
    U3 (L, M): Request verification for multiple candidates with minimum data entry (e.g. select multiple candidates and request the same verification services)
  – Customization:
    U4 (M, L): Ability to personalize the look and feel of the QVS web site
  – Proficiency training:
    U5 (H, L): Ability to use the system without any assistance, i.e. the system needs to be easy to learn and use
Example scenario: M1 (H, H): Add the ability to interact with a new university records system to validate the authenticity of a degree within 2 person-days.
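Once scenarios carry ratings, ordering them for evaluation can be made mechanical. The Python sketch below assumes the common ATAM reading of each rating pair as (importance, difficulty), which the slide itself does not state; the `Scenario` type and the ranking rule are hypothetical.

```python
from dataclasses import dataclass

RANK = {"H": 3, "M": 2, "L": 1}

@dataclass(frozen=True)
class Scenario:
    sid: str
    attribute: str
    importance: str   # assumed meaning of the first rating (H/M/L)
    difficulty: str   # assumed meaning of the second rating (H/M/L)

def prioritize(scenarios):
    """Order by importance, then difficulty, highest first; sorted() keeps ties in input order."""
    return sorted(scenarios, key=lambda s: (RANK[s.importance], RANK[s.difficulty]), reverse=True)

tree = [
    Scenario("M4", "Modifiability", "M", "L"),
    Scenario("P1", "Performance", "H", "M"),
    Scenario("M1", "Modifiability", "H", "H"),
    Scenario("P3", "Performance", "H", "H"),
    Scenario("S1", "Security", "H", "L"),
]
print([s.sid for s in prioritize(tree)])  # ['M1', 'P3', 'P1', 'S1', 'M4']
```

(H, H) scenarios such as M1 and P3 surface first, which matches the intuition that important and hard scenarios deserve evaluation time before easy ones.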
User Stories + Quality Attribute Scenarios
• Scenarios are useful for evaluating multiple quality attributes of a software architecture
• Key scenarios can drive the evaluation
– describe the behavior of the architecture
– set the context for particular quality attributes
• Knowledge of patterns is valuable for quickly evaluating design alternatives
• Lightweight and agile process
– Only two roles involved
– Repository of architectural knowledge
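[Figure: Proxy pattern. A Client uses an AbstractService offering service(); both Service and Proxy implement it, and the Proxy holds a one-to-one reference to the Service to which it forwards service() calls.]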
Exploiting Scenarios and Patterns
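The Proxy structure on this slide can be sketched in Python using the diagram's class names (Client, AbstractService, Service, Proxy). The call counter is a hypothetical example of the extra behaviour a proxy may add; the slide does not specify one.

```python
from abc import ABC, abstractmethod

class AbstractService(ABC):
    @abstractmethod
    def service(self, request: str) -> str: ...

class Service(AbstractService):
    """The real subject that does the work."""
    def service(self, request: str) -> str:
        return f"result for {request}"

class Proxy(AbstractService):
    """Stands in for exactly one Service (the 1..1 link) and forwards calls to it."""
    def __init__(self, real: Service):
        self._real = real
        self.calls = 0  # hypothetical added behaviour: count forwarded calls

    def service(self, request: str) -> str:
        self.calls += 1
        return self._real.service(request)

def client(svc: AbstractService, request: str) -> str:
    # The client depends only on the abstraction, so Proxy and Service are interchangeable.
    return svc.service(request)

proxy = Proxy(Service())
print(client(proxy, "degree-check"))  # result for degree-check
print(proxy.calls)                    # 1
```

Because the client only sees `AbstractService`, a proxy can later add caching, access control or metering without any change on the client side.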
• Sharding Pattern – Divides the entire data set into horizontal shards that share the same schema, where each shard contains a distinct subset of the data.
• Cache-Aside (Proxy) Pattern – Loads static or dynamic data from a data store into a cache on demand. It helps improve performance and helps maintain consistency between the data held in the cache and the data residing in the data store.
• Priority Queue Pattern – Assigns a priority level to each job request based on its significance and reorders all job requests by their assigned priority. With this pattern, high-priority data jobs can be processed earlier than low-priority jobs.
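Sharding and cache-aside often appear together: a cache in front of a sharded store. The Python sketch below is illustrative only. The slide names no sharding key or cache policy, so hashing the record key and "write to the store, then invalidate the cache entry" are assumptions, and in-memory dicts stand in for real shards and a real cache.

```python
import hashlib

class ShardedStore:
    """Horizontal shards sharing one (key -> record) schema; each key lives in exactly one shard."""
    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Stable hash, so the same key maps to the same shard in every process.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

class CacheAside:
    """Application-managed cache: load from the store on a miss, invalidate on write."""
    def __init__(self, store: ShardedStore):
        self.store = store
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.store.get(key)      # miss: read from the data store
        if value is not None:
            self.cache[key] = value      # populate the cache for later reads
        return value

    def put(self, key, value):
        self.store.put(key, value)       # write to the data store first...
        self.cache.pop(key, None)        # ...then invalidate so the next read reloads fresh data

app = CacheAside(ShardedStore(num_shards=4))
app.put("user:42", {"name": "Ada"})
app.get("user:42")
app.get("user:42")
print(app.misses)          # 1
app.put("user:42", {"name": "Ada L."})
print(app.get("user:42"))  # {'name': 'Ada L.'}
```

Invalidating rather than updating the cache on writes keeps the store as the single source of truth; a reader may still briefly see stale data between the write and the invalidation in a concurrent setting.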
Example Patterns
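The Priority Queue Pattern described above can be sketched with Python's `heapq`. This is a single-process illustration under the assumption that a lower number means higher priority; the job names are hypothetical, and a distributed system would use a message broker's priority support instead.

```python
import heapq
import itertools

class PriorityJobQueue:
    """Lower number = higher priority; a counter keeps FIFO order within a priority level."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, job: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def next_job(self) -> str:
        _, _, job = heapq.heappop(self._heap)
        return job

    def __len__(self) -> int:
        return len(self._heap)

q = PriorityJobQueue()
q.submit("nightly-report", priority=3)
q.submit("fraud-alert", priority=0)
q.submit("dashboard-refresh", priority=1)
q.submit("second-alert", priority=0)
print([q.next_job() for _ in range(len(q))])
# ['fraud-alert', 'second-alert', 'dashboard-refresh', 'nightly-report']
```

The insertion counter is the tie-breaker: without it, two jobs with equal priority would be compared by their payload, which is both arbitrary and fails for payloads that cannot be ordered.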
• Index Table Pattern – Improves query performance by creating indexes (secondary keys) over frequently queried data fields. Attributes are chosen as secondary keys based on their significance for querying, which speeds up query response time.
• Pipe and Filter Pattern – Processing is decomposed into discrete elements (filters) that can be reused. Elements are arranged in a pipeline in which the output of one element becomes the input of the next, improving performance, scalability and reusability.
• Health Check Pattern – Implements functional checks to verify that applications and services are performing correctly. It combines two factors: the checks performed by the application, and the analysis of their results by the tool or framework that handles the health check.
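The Index Table Pattern can be sketched as a primary store plus one index table per frequently queried field. This in-memory Python version is a hypothetical illustration: it assumes every row carries the indexed fields and omits updates and deletes, which a real index table must also keep in sync.

```python
from collections import defaultdict

class IndexedTable:
    """Primary store keyed by id, plus index tables mapping a secondary-key value to row ids."""
    def __init__(self, indexed_fields):
        self.rows = {}
        self.indexes = {name: defaultdict(set) for name in indexed_fields}

    def insert(self, row_id, row: dict):
        self.rows[row_id] = row
        for field_name, index in self.indexes.items():
            index[row[field_name]].add(row_id)   # assumes the row has every indexed field

    def find_by(self, field_name, value):
        if field_name in self.indexes:
            # Index lookup: touches only matching ids (.get does not create empty entries).
            return sorted(self.indexes[field_name].get(value, set()))
        # No index for this field: fall back to a full scan.
        return sorted(rid for rid, r in self.rows.items() if r.get(field_name) == value)

t = IndexedTable(indexed_fields=["city"])
t.insert(1, {"name": "Ann", "city": "Adelaide"})
t.insert(2, {"name": "Bo", "city": "Perth"})
t.insert(3, {"name": "Cy", "city": "Adelaide"})
print(t.find_by("city", "Adelaide"))  # [1, 3]
print(t.find_by("name", "Bo"))        # [2], found by full scan
```

The trade-off is the usual one: each index speeds up reads on its field at the cost of extra storage and extra work on every write.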
Example Patterns
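The two halves of the Health Check Pattern, checks run by the application and analysis by the monitoring tool, can be sketched in Python. The check functions, the latency threshold and the status strings below are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Check = Callable[[], Tuple[bool, str]]

def run_health_checks(checks: Dict[str, Check], max_latency_s: float = 1.0) -> dict:
    """Application side: run each functional check, recording pass/fail and treating slow checks as failures."""
    results = {}
    for name, check in checks.items():
        start = time.perf_counter()
        try:
            ok, detail = check()
        except Exception as exc:  # a check that crashes counts as a failure
            ok, detail = False, f"error: {exc}"
        elapsed = time.perf_counter() - start
        if ok and elapsed > max_latency_s:
            ok, detail = False, f"too slow ({elapsed:.2f}s)"
        results[name] = {"ok": ok, "detail": detail}
    return results

def analyse(results: dict) -> str:
    """Monitoring-tool side: reduce individual results to an overall status."""
    failed = sorted(name for name, r in results.items() if not r["ok"])
    return "HEALTHY" if not failed else "UNHEALTHY: " + ", ".join(failed)

def db_check():
    return True, "connection ok"

def queue_check():
    raise ConnectionError("broker unreachable")

print(analyse(run_health_checks({"database": db_check, "queue": queue_check})))
# UNHEALTHY: queue
```

Keeping analysis separate from the checks lets the monitoring tool apply its own policy (for example, tolerating one failing non-critical check) without changing application code.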
Design Knowledge Support
[Figure: Architectural knowledge types: context knowledge, general knowledge, design knowledge and reasoning knowledge. Activities: Architect (A), Integrate (B), Share (C), Trace (D), Learn (E), Evaluate (F), Synthesize (G), Distill (H), Apply (I), Search/Retrieve (J). Actors are producers and consumers; the key distinguishes producing and consuming activities. Traceability is created by producers and used by consumers.]
• Architectural knowledge and the architecture lifecycle
– Architecture design & analysis
– Maintenance and evolution
• Guidance and tools
– Types of architectural knowledge
– Managing and sharing knowledge
– Architectural description for reuse
• Building an infrastructure
– A characterization scheme of architectural design knowledge
– An infrastructure for capturing and sharing architectural knowledge
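A minimal infrastructure for capturing and sharing architectural knowledge needs a characterization of knowledge items, trace links between them, and search. The Python sketch below takes its four knowledge types from the figure (context, general, design, reasoning); everything else, including `AKRepository` and its methods, is a hypothetical illustration rather than the group's actual tooling.

```python
from dataclasses import dataclass, field
from enum import Enum

class KnowledgeType(Enum):
    CONTEXT = "context"
    GENERAL = "general"
    DESIGN = "design"
    REASONING = "reasoning"

@dataclass
class KnowledgeItem:
    item_id: str
    ktype: KnowledgeType
    text: str
    traces_to: list = field(default_factory=list)  # ids of the items this one was derived from

class AKRepository:
    def __init__(self):
        self.items = {}

    def share(self, item: KnowledgeItem) -> None:
        """Producer side: store an item; its trace links must point at items already shared."""
        missing = [t for t in item.traces_to if t not in self.items]
        if missing:
            raise ValueError(f"unknown trace targets: {missing}")
        self.items[item.item_id] = item

    def search(self, keyword: str, ktype=None):
        """Consumer side: retrieve item ids by keyword, optionally filtered by knowledge type."""
        kw = keyword.lower()
        return [i.item_id for i in self.items.values()
                if kw in i.text.lower() and (ktype is None or i.ktype == ktype)]

    def rationale(self, item_id: str):
        """Follow trace links back to everything an item was derived from (depth-first)."""
        seen, stack = [], list(self.items[item_id].traces_to)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.items[current].traces_to)
        return seen

repo = AKRepository()
repo.share(KnowledgeItem("ctx1", KnowledgeType.CONTEXT, "Peak load of 500 requests per second"))
repo.share(KnowledgeItem("gen1", KnowledgeType.GENERAL, "Cache-aside pattern reduces read latency"))
repo.share(KnowledgeItem("dec1", KnowledgeType.DESIGN, "Add a cache in front of the records store",
                         traces_to=["ctx1", "gen1"]))
repo.share(KnowledgeItem("why1", KnowledgeType.REASONING, "Cache chosen to meet P1 under load",
                         traces_to=["dec1"]))
print(repo.search("cache"))     # ['gen1', 'dec1', 'why1']
print(repo.rationale("why1"))   # ['dec1', 'gen1', 'ctx1']
```

Requiring trace targets to exist at share time is one way to make "traceability created by producers and used by consumers" enforceable rather than aspirational.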
Knowledge Based Support for Architecting
[Figure: Knowledge-based support for architecting. Elements: requirements (functional, verification); specification; problem description; desired quality attribute measures; BRedB; scenarios, design tactics and patterns; architecture design; architecture description; quality attribute measures and risks; analytical model / reasoning framework.]
Architecture Design Knowledge Ecosystems
[Figure: Elements: private ecosystems A, B and C; public ecosystem; company; employee; create customized AK input form; share AK; view AK; integration with IDE and modeling tools (collaboration, modeling, implementing); AK consumption; AK extraction; integration with requirements and CM/issue-tracking tools; knowledge base (KBase).]
Hackystat: Data Collection and Analysis
• A framework for collecting and analyzing process and product data
• Provides IDE plugins, called sensors, that send information to a service called SensorBase
• Several services compute metrics using the data from SensorBase
• Metrics are viewed using ProjectBrowser, which uses services such as SensorBase, Telemetry, DailyProjectData and TickerTape
Architecture of Hackystat
[Figure: Layered services:
• Provides visualization of different metrics through GUIs; generates reports for external clients
• Provides weekly, monthly and yearly abstractions of metrics
• Provides daily abstraction of data
• Receives and stores data and provides daily abstractions]
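The layering above (raw sensor data, then daily abstractions, then weekly and longer abstractions) can be illustrated with a short Python roll-up. This is not Hackystat's API: the sensor-event fields (`tool`, `type`, `timestamp`, `value`) and the rule that an event without a `value` counts as one unit of activity are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

def daily_abstraction(sensor_data):
    """Roll raw sensor events up into per-day totals (an event without 'value' counts as 1)."""
    per_day = defaultdict(int)
    for event in sensor_data:
        per_day[event["timestamp"].date()] += event.get("value", 1)
    return dict(per_day)

def weekly_telemetry(daily):
    """Roll daily totals up into ISO-week totals keyed by (iso_year, iso_week)."""
    per_week = defaultdict(int)
    for day, total in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        per_week[(iso_year, iso_week)] += total
    return dict(sorted(per_week.items()))

events = [
    {"tool": "Eclipse", "type": "Build", "timestamp": datetime(2015, 9, 21, 10, 0)},
    {"tool": "Eclipse", "type": "Build", "timestamp": datetime(2015, 9, 21, 15, 30)},
    {"tool": "JUnit", "type": "UnitTest", "timestamp": datetime(2015, 9, 28, 9, 0), "value": 12},
]
daily = daily_abstraction(events)
print(daily[date(2015, 9, 21)])  # 2
print(weekly_telemetry(daily))   # {(2015, 39): 2, (2015, 40): 12}
```

Computing weekly figures from daily abstractions, rather than from raw events, is what lets each layer be cached and served independently.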
Quality Attributes & Architectural Decisions (Amazon EC2 & S3 vs. Google App Engine)
• Scalability
– Amazon EC2 & S3: Replication of system services to meet performance requirements. Separation of the database layer into a new service that utilizes platform-specific persistence features.
– Google App Engine: No action required; scalability is handled by the platform. Refactoring of persistence components to make them compatible with Google Datastore persistence.
• Portability
– Amazon EC2 & S3: A wrapper layer is added to ensure platform independence, and a separate database layer provides seamless transfer of the database layer.
– Google App Engine: Portability to other platforms is not possible.
• Compatibility
– Amazon EC2 & S3: System features are exposed through the original REST API. A wrapper layer is added to abstract the services cluster and its deployment configuration.
– Google App Engine: System features are exposed through the original REST API.
• Reliability & autonomous scalability
– Amazon EC2 & S3: A façade/wrapper layer provides abstraction. Amazon’s Elastic Load Balancer ensures autonomous scalability.
– Google App Engine: Ensured by the platform.
• Efficient & effective deployments
– Amazon EC2 & S3: Amazon Elastic Load Balancer ensures auto scaling as well as an efficient and cost-effective deployment configuration.
– Google App Engine: Deployment of application components on the cloud is managed by the platform.
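The portability decision (a wrapper layer that keeps the application platform-independent) can be sketched as an interface with one adapter per platform. In this Python sketch the adapters are in-memory stand-ins: they make no AWS or Google Cloud calls, and `BlobStore`, `SensorDataService` and the key layout are hypothetical.

```python
from abc import ABC, abstractmethod

class BlobStore(ABC):
    """Platform-neutral persistence interface that the application codes against."""
    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, key: str) -> bytes: ...

class InMemoryS3Like(BlobStore):
    """Stand-in for an S3-backed adapter; a real one would call the AWS SDK here."""
    def __init__(self):
        self._objects = {}

    def save(self, key, data):
        self._objects[key] = data

    def load(self, key):
        return self._objects[key]

class InMemoryDatastoreLike(BlobStore):
    """Stand-in for a Datastore-backed adapter; entities are modelled as dicts."""
    def __init__(self):
        self._entities = {}

    def save(self, key, data):
        self._entities[key] = {"payload": data}

    def load(self, key):
        return self._entities[key]["payload"]

class SensorDataService:
    """Application service that depends only on BlobStore, so changing platform means changing adapter."""
    def __init__(self, store: BlobStore):
        self.store = store

    def record(self, project: str, data: bytes) -> None:
        self.store.save(f"sensordata/{project}", data)

    def fetch(self, project: str) -> bytes:
        return self.store.load(f"sensordata/{project}")

for backend in (InMemoryS3Like(), InMemoryDatastoreLike()):
    svc = SensorDataService(backend)
    svc.record("qvs", b"build-events")
    print(type(backend).__name__, svc.fetch("qvs"))
```

The wrapper only buys portability if platform-specific features stay behind it; code that reaches past the interface (for example, Datastore-specific queries) reintroduces the lock-in noted for Google App Engine.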
High-Level Architectural Solution
• Tools hosted in public or private clouds
• Data (content elements) integration through a common semantic model using ontologies
Core Elements of TaaS Space: High-Level Architecture Overview
Semantic Integration Among Tools
• Explicitly describing common concepts
• Mapping between tool-specific and common concepts
[Figure: ASR and knowledge management tool and modeling tool, each with its own end users.]
Building a Semantically Integrated Data Model
End-to-End Integration
• Probes and plugins to map tool data onto an aggregated ontology model
• Generating RDF graphs from the aggregated ontology model
[Figure: ASR and KM concepts and modeling concepts.]
Architecture of Integration Systems
• Subsystem for annotation, semantic integration and collaboration notifications based on the ontology model
• A pipeline architecture is used to distribute data elements between the private and public clouds.
• A monitor observes the data elements and distributes them between the private and public cloud processing pipelines according to the distribution rules.
• Re-projection components combine the data elements after processing.
Pipe and Filter Architectural Style
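The pipe-and-filter arrangement described above (a monitor routes elements to a private or public pipeline by a distribution rule, and re-projection merges the results) can be sketched in Python. The filters, the "sensitive elements stay private" rule and the element fields are hypothetical.

```python
from typing import Callable, List

Filter = Callable[[dict], dict]

def pipeline(*filters: Filter) -> Callable[[dict], dict]:
    """Compose filters so that the output of one becomes the input of the next."""
    def run(element: dict) -> dict:
        for f in filters:
            element = f(element)
        return element
    return run

def tag(site: str) -> Filter:
    """Filter factory: record which cloud pipeline processed the element."""
    return lambda e: {**e, "site": site}

def annotate(e: dict) -> dict:
    return {**e, "annotated": True}

def anonymise(e: dict) -> dict:
    return {k: v for k, v in e.items() if k != "owner"}

private_pipe = pipeline(annotate, tag("private"))
public_pipe = pipeline(anonymise, annotate, tag("public"))

def monitor(elements: List[dict], rule: Callable[[dict], bool]):
    """Route each element to the private (rule true) or public pipeline and run it."""
    private = [private_pipe(e) for e in elements if rule(e)]
    public = [public_pipe(e) for e in elements if not rule(e)]
    return private, public

def reproject(*streams: List[dict]) -> List[dict]:
    """Re-projection: combine processed elements back into one stream, ordered by id."""
    return sorted((e for s in streams for e in s), key=lambda e: e["id"])

elements = [
    {"id": 2, "owner": "bob", "sensitive": False},
    {"id": 1, "owner": "alice", "sensitive": True},
]
merged = reproject(*monitor(elements, rule=lambda e: e["sensitive"]))
print([(e["id"], e["site"], "owner" in e) for e in merged])
# [(1, 'private', True), (2, 'public', False)]
```

Each filter is a pure function over one element, so filters can be reordered, reused across both pipelines, or moved between clouds without changing their code.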
• Multi-tenancy is handled at four layers of abstraction:
– Service layer
– Data access layer
– Data store layer
– Virtual infrastructure layer
• The combination used depends on the tenants’ requirements.
Multi-tenancy Management
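How per-tenant choices across the four layers might be expressed can be sketched in Python. The layer names come from the slide; the two-level `Isolation` choice, the tenant profiles and `TenantAwareDataAccess` are hypothetical, and only the data-store choice is actually enforced in this sketch.

```python
from dataclasses import dataclass
from enum import Enum

class Isolation(Enum):
    SHARED = "shared"
    DEDICATED = "dedicated"

@dataclass(frozen=True)
class TenantProfile:
    """Per-tenant isolation choice at each of the four layers."""
    tenant_id: str
    service: Isolation
    data_access: Isolation
    data_store: Isolation
    infrastructure: Isolation

class TenantAwareDataAccess:
    """Data access layer: every query is scoped to the calling tenant."""
    def __init__(self):
        self._rows = []        # shared data store of (tenant_id, record) pairs
        self._dedicated = {}   # tenant_id -> private store for tenants that need one

    def _store_for(self, profile: TenantProfile) -> list:
        if profile.data_store is Isolation.DEDICATED:
            return self._dedicated.setdefault(profile.tenant_id, [])
        return self._rows

    def insert(self, profile: TenantProfile, record: dict) -> None:
        self._store_for(profile).append((profile.tenant_id, record))

    def query(self, profile: TenantProfile) -> list:
        # The tenant filter is applied even in a dedicated store, as defence in depth.
        return [r for t, r in self._store_for(profile) if t == profile.tenant_id]

S, D = Isolation.SHARED, Isolation.DEDICATED
small = TenantProfile("uni-a", S, S, S, S)
bank = TenantProfile("bank-b", D, S, D, D)
dal = TenantAwareDataAccess()
dal.insert(small, {"req": 1})
dal.insert(bank, {"req": 2})
print(dal.query(small), dal.query(bank))  # [{'req': 1}] [{'req': 2}]
```

A tenant with strict confidentiality needs (such as the hypothetical bank) opts into dedicated layers, while smaller tenants share everything, which is one reading of "the combination depends on tenants' requirements".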
Concluding Remarks
• As in any other large-scale system, software architecture plays a vital role in big data systems
• Gaps in architecture design knowledge can lead to designs with huge technical debt
• Proven design principles and knowledge need to be adapted to the architectural challenges of big data systems
• Building architectural design knowledge is important for developing big data systems
Acknowledgements
• Slides are based on the work that is being carried out in my group in close collaboration with several colleagues, students, and industrial partners.
• Some research challenges and promising solutions have been developed for joint research proposals.
• Wiki for big data architecture knowledge has been developed through a project carried out by Jing, Atbin, and Liang under my supervision.
• TaaS Platform work is being driven by Aufeef Chauhan.