Distributed Architectures

Software Architecture VO (706.706)

Roman Kern

Version 1.3.2

Institute for Interactive Systems and Data Science, TU Graz

Outline

Introduction

Distributed Architectures Basics

Asynchronous Architectures

Lambda Architecture

Kappa Architecture

Introduction

Goals

Main goal: Scalability

• In the optimal case a system scales linearly with the objective

• e.g., number of transactions, size of the data, users

Solutions

Main solutions

1. More efficient algorithms

2. Use faster hardware (e.g., more powerful machines)
• Scale vertically

3. Add more machines
• Scale horizontally (scale out)
• Parallel computing, distributed computing

# More efficient algorithms are in many cases either not possible, or very expensive (development cost).

# More powerful machines are often a pragmatic solution, and often even cheaper than the development costs - but this might only “buy time” until the limit is reached.

# For truly scalable solutions, scaling horizontally is the best/only option.
# (Ignoring quantum computing here.)

Distributed Architectures

Parallel computing vs. distributed computing

• In parallel computing all components share a common memory, typically threads within a single program

• In distributed computing each component has its own memory
• Typically in distributed computing the individual components are connected over a network
• Dedicated programming languages (or extensions) exist for parallel computing

# This lecture mainly deals with distributed computing.

Distributed Architectures

Distributed architectures

• Most complex solution
• Due to the added parallelism of data and processing
• Increased risk of errors

• Overall latency will be that of the slowest machine
• Latency cannot be decreased via distributed solutions

• Therefore the architecture needs to be sound

• … focus on abstraction and composition

# In practice people are often biased towards technologies they know (e.g., SQL).

Distributed Architectures

http://nighthacks.com/roller/jag/resource/Fallacies.html

# Known fallacies (traps) that should be avoided.

Distributed Architectures

Different levels of complexity

• Lowest complexity for operations which can easily be distributed
• If they are independent and short enough to be executed independently from each other
• And if the data can be partitioned into independent parts

• Higher degree of complexity for operations which compute a single result on multiple nodes

• Synchronisation of data access also raises the complexity

Distributed Architectures

Additional aspects of complexity

• Complexity = intrinsic complexity + accidental complexity

• Intrinsic complexity: the problem itself (i.e., how hard the problem is)
• Accidental complexity arises from the implementation

• Low accidental complexity is good for maintenance
• With high accidental complexity, you can never be sure it has been implemented correctly

Risk: The risk of errors rises with the complexity.

Distributed Architectures - Practical Advice

General advice to deal with complexity

• Design for failure
• E.g., hardware will fail, human errors (bugs)
• Limit their consequences

• Push complexity into a single place
• Also called “complexity isolation”
• Effect of bugs will be minimised

• Avoid tricky operations

• Avoid storing aggregates
• Prefer to deal with raw data
• Treat the data as immutable

# Aggregates need to be recomputed as soon as the raw data changes.

Distributed Architectures - Example Failure

Example failure unique to distributed architectures

• Compaction
• A maintenance task which needs to take place from time to time
• Might need a couple of minutes to execute
• … comparable to garbage collection

• Typically a machine will be less responsive during this phase

• If multiple machines happen to conduct compaction at the same time

• … the whole cluster may stall

Solution: Try to get rid of the compaction, if possible.

Distributed Architectures - Theory

CAP theorem

• Not possible to achieve all three properties:

1. Consistency
• Reads are guaranteed to incorporate all previous writes (all nodes see the same data at the same time)

2. Availability
• Every query returns an answer instead of an error (failures do not prevent the remaining system from being operational)

3. Partition tolerance
• The system runs even if a part of it is not reachable (e.g., due to network failure or message loss)

Implications of CAP

One needs to find a trade-off between the properties, e.g., choose availability over consistency.

# Many relations to other software architecture concepts, e.g.,
# Consistency via data-centric architectures,
# Availability via reliability of the system,
# Partition tolerance via distributed architectures.

Distributed Architectures - Theory

Eventual consistency

• “Best effort” for consistency

• In practice, the best consistency property achievable

• For highly available systems

• BASE instead of ACID
• BASE (Basically Available, Soft state, Eventual consistency)
• ACID (Atomicity, Consistency, Isolation, Durability)

Examples of eventual consistency

• Sloppy quorums
– Temporary replicas for partitions → extreme availability
– Some distributed database systems support this, high complexity

• Conflict-free replicated data types (CRDTs)
– Specific CRDTs for specific use cases
– Example: counting (see the sketch below)
• Increment only one replica
• Sum over all replicas
• If a replica is unavailable, simply take the maximum when merging

– For more complex use-cases
• Read/repair algorithms
• Complex → error prone
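# A minimal sketch of such a grow-only counter (G-counter), matching the counting example above; the class and method names are invented for illustration, not taken from a specific library.

    // G-counter sketch: one slot per replica; increment only the local slot,
    // read by summing all slots, merge by taking the per-replica maximum.
    import java.util.HashMap;
    import java.util.Map;

    public class GCounter {
        private final String replicaId;
        private final Map<String, Long> slots = new HashMap<>();

        public GCounter(String replicaId) { this.replicaId = replicaId; }

        // Increment only the slot of the local replica.
        public void increment() { slots.merge(replicaId, 1L, Long::sum); }

        // The counter value is the sum over all replica slots.
        public long value() {
            return slots.values().stream().mapToLong(Long::longValue).sum();
        }

        // Merge with another replica's state: per-replica maximum.
        // Commutative, associative and idempotent, so replicas converge
        // regardless of the order in which states are exchanged.
        public void merge(GCounter other) {
            other.slots.forEach((id, c) -> slots.merge(id, c, Math::max));
        }
    }

# Because merging is idempotent, states can be exchanged repeatedly; a replica that was temporarily unavailable simply catches up on the next merge.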

Distributed Architectures - Theory

Eventual accuracy

• Approximations instead of exact results

• Needed if it is expensive or not possible to compute exactly

• Might be temporary
• Exact results after a period of approximations

Examples of approximate accuracy

• Often a true value is not needed, but a rough estimate

– e.g., hit count for a web search

• Examples of approximate algorithms (a Bloom filter sketch follows below)

– Counting distinct elements → HyperLogLog
– Frequencies of members → Count-Min Sketch
– Set membership → Bloom filters
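# As an illustration of the approximate trade-off, a minimal Bloom filter sketch; the sizes and the double-hashing scheme are illustrative, not tuned for production use.

    // Bloom filter sketch: approximate set membership in fixed space.
    // May return false positives, but never false negatives.
    import java.util.BitSet;

    public class BloomFilter {
        private final BitSet bits;
        private final int size;
        private final int numHashes;

        public BloomFilter(int size, int numHashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.numHashes = numHashes;
        }

        // Derive k positions from two base hashes (double hashing).
        private int position(Object item, int i) {
            int h1 = item.hashCode();
            int h2 = Integer.rotateLeft(h1, 16) ^ 0x9e3779b9;
            return Math.floorMod(h1 + i * h2, size);
        }

        public void add(Object item) {
            for (int i = 0; i < numHashes; i++) bits.set(position(item, i));
        }

        public boolean mightContain(Object item) {
            for (int i = 0; i < numHashes; i++)
                if (!bits.get(position(item, i))) return false;
            return true;
        }
    }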

Distributed Architectures - Theory

Locality of Reference

• “Keep related data close”

• Granularity: from CPU caches to the storage position within the data centre

• In a distributed context: bring the code to the data

• Best practice: position the data in the way it is accessed
• Partition the data according to some criteria
• e.g., rack awareness of the computing infrastructure

# Related data - typically data needed (i) within one operation, (ii) to serve the same/similar use cases.

Architecture for Simple Applications

Shared-Nothing Architecture

• No centralised data storage

• Can scale almost infinitely

• Used since the beginning of the 1980s, popularised by Google

• Only a few systems allow for such an architecture

Implications: If a system requires some sort of shared resources or orchestrated processing, the complexity rises.

Distributed Architectures Basics

Distributed Architectures Basics

Number of issues to address

1. Serialisation

2. Group membership

3. Leader election

4. Distributed locks

5. Barriers

6. Shared resources

7. Configuration

Distributed Architectures Basics - Serialisation

Serialisation

• Transform an object into a byte array (and back)
• Needed to transfer objects between nodes in a distributed environment
• Used to store objects, e.g., in databases

• Should work across programming languages

• Therefore serialisation frameworks provide a Schema Definition Language

• Examples: Thrift, Protocol Buffers, Avro
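# A minimal sketch using Avro's generic API to illustrate schema-based serialisation; the "User" schema is invented for this example.

    // Define a schema, serialise a record to bytes, and read it back.
    // Any language with an Avro implementation and the schema can read the bytes.
    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.*;
    import org.apache.avro.io.*;

    public class AvroExample {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"long\"},"
              + "{\"name\":\"name\",\"type\":\"string\"}]}");

            GenericRecord user = new GenericData.Record(schema);
            user.put("id", 42L);
            user.put("name", "alice");

            // Object -> byte array
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
            encoder.flush();
            byte[] bytes = out.toByteArray();

            // Byte array -> object
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
            GenericRecord copy = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
            System.out.println(copy.get("name"));
        }
    }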

Distributed Architectures Basics - Group Membership

Group membership

• When a single node comes online…

• How does it know where to connect to?

• How do the other members know of an added node?

Distributed Architectures Basics - Group Membership

• Peer-to-peer architectural style

• Each node is a client as well as a server

• Part of the bootstrapping mechanism

• Dynamic vs. static

• Fully dynamic via broadcast/multicast within local area networks (UDP)

• Centralised P2P - e.g., central login components/servers

• Static lists of group members (needs to be configurable)

Distributed Architectures Basics - Leader Election

Leader election

• Not all nodes are equal, e.g., centralised components in P2P networks

• Single node acts as master, others are workers

• Some nodes have additional responsibilities (supernodes)

• Having centralised components makes some functionality easier to implement
• e.g., assigning workload

• Disadvantage: might lead to a single point of failure

Distributed Architectures Basics - Leader Election

• Client-server architectural style

• Once the leader has been elected, it takes over the role of the server

• All other group members then act as clients

Distributed Architectures Basics - Leader Election

# Single master (for coordination) and several workers (which are independent from each other).
# But a single master node is also a single point of failure, i.e., if it fails the system is no longer working.
# Thus a mechanism is needed to (i) detect the failed master node, (ii) select a new master node (e.g., start a cold spare backup), (iii) inform all nodes of the new master node.

Distributed Architectures Basics - Leader Election

# If the master fails, a backup needs to take over.

Distributed Architectures Basics - Leader Election

# If the master is distributed, there is no single point of failure.
# But the complexity of realising a distributed master is higher (not advisable to implement yourself on a small budget).

Distributed Architectures Basics - Locks

Distributed locks

• Restrict access to shared resources to only a single node at a time
• e.g., allow only a single node to write to a file

• May yield many non-trivial problems, for example deadlocks or race conditions

• Distributed locks without central component are very complex to realise

Distributed Architectures Basics - Locks

Distributed locks example

• Blackboard architectural style

• The shared repository is responsible for orchestrating the access to locks

• Notifies waiting nodes once the lock has been lifted

• This functionality is o�en coupled with the elected leader

Distributed Architectures Basics - Barriers

Barriers

• Specific type of distributed lock

• Synchronise multiple nodes

• e.g., multiple nodes should wait until a certain state has been reached

• Used when a part of the processing can be done in parallel and some parts cannot be distributed (see the local sketch below)
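# For intuition, the same pattern exists locally in java.util.concurrent; a distributed barrier provides analogous semantics across nodes instead of threads (typically built on a coordination service). A minimal local sketch:

    // N workers compute in parallel; all wait at the barrier before the
    // non-parallelisable step runs.
    import java.util.concurrent.CyclicBarrier;

    public class BarrierExample {
        public static void main(String[] args) {
            int parties = 3;
            // The barrier action runs once, after all parties have arrived.
            CyclicBarrier barrier = new CyclicBarrier(parties,
                () -> System.out.println("all workers done, running sequential part"));

            for (int i = 0; i < parties; i++) {
                final int id = i;
                new Thread(() -> {
                    System.out.println("worker " + id + " processing its partition");
                    try {
                        barrier.await();  // wait until every worker reaches this point
                    } catch (Exception e) {
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }
        }
    }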

Distributed Architectures Basics - Shared Resources

Shared Resources

• If all nodes need to be able to access a common data-structure

• Read-only vs. read-write

• If read-write, the complexity rises due to synchronisation issues

Distributed Architectures Basics - Zookeeper

Example: Apache Zookeeper

• Zookeeper is a framework/library

• Used by LinkedIn, Facebook

• Initially developed by Yahoo!, now managed by Apache

• Features
• Coordination kernel
• File-system like API
• Synchronisation, Watches, Locks
• Configuration
• Shared data
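# A condensed sketch of the common leader-election recipe on top of the Zookeeper API (ephemeral sequential znodes; the candidate with the smallest sequence number is the leader). The path and connect string are invented; error handling and watch re-registration are omitted.

    // Assumes the parent znode "/election" already exists.
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.*;

    public class LeaderElection {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 5000, event -> {});

            // Ephemeral + sequential: the znode disappears automatically
            // when this node's session dies.
            String me = zk.create("/election/n_", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

            List<String> candidates = zk.getChildren("/election", false);
            Collections.sort(candidates);

            if (me.endsWith(candidates.get(0))) {
                System.out.println("elected as leader");
            } else {
                // A full recipe would set a watch on the next-smaller znode
                // to detect leader failure and re-run the election.
                System.out.println("acting as worker");
            }
        }
    }

# The same ephemeral-znode mechanism underlies Zookeeper-based distributed locks: holding the smallest znode means holding the lock, and a crashed holder releases it automatically.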

Distributed Architectures Basics - DFS

Distributed File Systems

• Virtual file system distributed over multiple machines
• Based on a local file system

• Same semantics as traditional file systems
• Folders & files

• Files are internally split into smaller blocks (e.g., 64 MB)

• Blocks are redundantly stored on multiple machines

• Logic to record which block is stored on which machine

# Optimised for large files

# Can be used like a traditional file system.

# Works best if the data needed for a single operation (use case) is located on the same machine (locality of reference).

Distributed Architectures Basics - Sharding

Sharding

• Split the data horizontally

• Each node in a network may manage a separate chunk of the data

• For example in web search engines
• Each node is responsible for a number of web-pages
• Returns search results from the local collection
• All results from all shards are then combined into a single result

Distributed Architectures Basics - Sharding

Sharding example

# One partition of the data (shard) contains all web sites from .at, and another all sites from .de.

# The appropriate partitioning logic depends on the application/use case.
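# A minimal sketch of such partitioning logic, here routing by top-level domain with a hash fallback for everything else; all names and shard counts are invented.

    // Decide which shard is responsible for a document, based on its host.
    import java.util.Map;

    public class ShardRouter {
        private final Map<String, Integer> tldToShard = Map.of(".at", 0, ".de", 1);
        private final int numShards = 4;

        public int shardFor(String host) {
            for (Map.Entry<String, Integer> e : tldToShard.entrySet()) {
                if (host.endsWith(e.getKey())) return e.getValue();
            }
            // Fallback: stable hash partitioning over the remaining shards.
            return 2 + Math.floorMod(host.hashCode(), numShards - 2);
        }
    }

# At query time the query is sent to all shards and the local result lists are merged into a single ranking.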

Distributed Architectures Basics - Sharding

Sharding - Properties

• Need redundancy, in case a node goes down

• Level of redundancy depends on the data

• e.g., if a node with low-traffic web-pages goes down, it might not even have an impact on the quality of the search results (at least on the first page)

Distributed Architectures Basics - Quality Attribute

Anarchic Scalability

• Design for a large distributed system

• Parts of the system are developed independently from each other

• Therefore the system needs to be designed for malfunctioning or even malicious components

• The web is an example where anarchic scalability is one of the most important aspects

• Thus clients are not expected to know all servers
• … and servers are not supposed to know all clients
• Another consequence concerns link integrity → links are only one-directional

Asynchronous Architectures

Asynchronous Architectures - Motivation

Synchronous architectures

• Each call terminates after the request has been completely processed
• e.g., traditional data-centric architectures (databases)

• Pro: Easy to use and predictable behaviour

• Con: Does not deal well with load (need to plan for the worst case)

Asynchronous architectures

• The call returns before the request has been processed
• The processing happens in the background

• Con: Non-predictable behaviour

• Pro: Load can be distributed over time, thus better scalability

Asynchronous Architectures - Worker

Asynchronous worker architecture

The client issues a call without waiting for the result

# The load balancer dispatches the call (request) to one of the workers,
# e.g., the one with the currently lowest load.

Asynchronous Architectures - Worker

Asynchronous worker architecture

• The client does not wait for the end of the processing
• I.e., it does not track the result

• If the worker fails, all currently processed requests will fail

Queue motivation: Introduce a new component to track the processing of the requests, e.g., a queueing system.

Asynchronous Architectures - Queue & Worker

Asynchronous queue & worker architecture

The client puts the request into the queue, the workers poll the queue for requests

# The queue also tracks the processing of the requests.

Asynchronous Architectures - Queue & Worker

Queue and worker architecture

• The client is decoupled from the worker via a queue component

• The queue component is (usually) responsible for tracking the status of the requests

• The size of the queue depends on the current load

• If a worker fails, the request will be put back into the queue

• Default architecture choice for many distributed systems

# Often, the queue itself might be distributed and might store the requests (e.g., in a database).

# See also publish-subscribe architectural style (producer-consumer pattern).
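# A minimal local sketch of the pattern with a blocking queue and worker threads; in a real deployment the queue is a separate, often distributed component (e.g., a message broker), but the control flow is the same.

    // Clients enqueue requests and return immediately; workers poll the
    // queue and process the requests in the background.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class QueueWorkerExample {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>();

            for (int i = 0; i < 2; i++) {
                final int id = i;
                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            String request = queue.take(); // blocks until work arrives
                            System.out.println("worker " + id + " handles " + request);
                            // On worker failure, a real system would put the
                            // request back into the queue.
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                worker.setDaemon(true);
                worker.start();
            }

            // The "client": enqueue without waiting for the result.
            for (int n = 0; n < 5; n++) queue.add("request-" + n);
            Thread.sleep(500); // give the demo workers time to drain the queue
        }
    }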

Asynchronous Architectures - Queue & Worker

Queue and worker architecture aspects

• The number of workers/applications may vary
• Single-consumer vs. multi-consumer queues
• Multiple independent workers that execute the same code vs.
• … each worker has a different task

• Typically queues are FIFO (first in, first out)
• Some queues support items with higher priority

• In some configurations the application is responsible to track the status

• A typical application may consist of multiple layers of queues and workers
• I.e., the output of a worker is fed as input to another queue

Asynchronous Architectures - Queue & Worker

Queue & worker properties

• The architecture is straightforward, but not simple

• Race conditions may occur

• Partitioning might be hard to implement
• Bad for fault tolerance

• Tedious to build
• Much code necessary for serialisation/routing logic/monitoring of queues/etc.

• Complex deployment for all workers and queues

Asynchronous Architectures - Stream Processing

Stream processing

• The queue & worker architecture can be used for stream processing
• Not suitable for real-time applications, due to the delay from the queue

• Continuous stream of incoming requests
• I.e., the queue is perpetually filled
• Often these requests reflect events in such settings

Asynchronous Architectures - Stream Processing

Two basic stream processing types

• One-at-a-time

• Each event is processed individually

• Micro-batch

• Multiple events are combined into a single batch

Asynchronous Architectures - Stream Processing

Basic execution semantics

• At-least-once

• Each event is guaranteed to be processed
• But might be processed more than once (thus the same result might be reported multiple times)
• Might introduce inaccuracies

• At-most-once

• Each event is processed no more than once, but might not be processed at all

• Exactly-once

• Each event is guaranteed to be processed exactly once

Asynchronous Architectures - Stream Processing

Trade-off between the stream processing types

                              One-at-a-time   Micro-Batch
Lower latency                      X
Higher throughput                                  X
At-least-once semantics            X               X
Exactly-once semantics        (sometimes)          X
Simpler programming model          X

# Depends on the application, which one is better.
# http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/

Asynchronous Architectures - Stream Processing

Implications

• Exactly-once can be achieved via strictly ordered processing
• Fully process an event before continuing

• More efficient for micro-batches
• Multiple batches in parallel
• Need to store the batch-id of the last successfully processed batch
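# A sketch of the batch-id idea: state updates become idempotent because the id of the last applied batch is remembered, so a batch that is re-delivered after a failure is skipped. All names are invented; in a real system the state and the batch id must be committed atomically together.

    import java.util.List;

    public class BatchState {
        private long lastAppliedBatchId = -1; // persisted together with the state
        private long counter = 0;             // the actual state being updated

        public void apply(long batchId, List<Long> batch) {
            if (batchId <= lastAppliedBatchId) {
                return; // replayed batch (at-least-once delivery): skip it
            }
            counter += batch.stream().mapToLong(Long::longValue).sum();
            lastAppliedBatchId = batchId; // commit state + id atomically in practice
        }
    }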

Asynchronous Architectures - Stream Processing

                Strength           Examples
Batch           High throughput    Hadoop, Spark
One-at-a-time   Low latency        Storm
Micro-batch     Trade-off          Trident, Spark

Table 1: Comparison of the main processing types

Lambda Architecture

# The Lambda Architecture is an example of how to deal with large amounts of data.

Lambda Architecture - Motivation

Target scenario

• Large amount of data
• Too big for a single machine → distributed system

• Data is continuously updated
• Mostly just additions, i.e., new data

• Majority of operations are read-only

• E�ectively, queries on the data

Lambda Architecture - Motivation

Typical solution

• The typical solution would be a data-centric architecture

• Data is stored in distributed traditional RDBMS or NoSQL databases
• Updates are written into the database (via transactions)
• The queries are computed in a distributed manner, over all the data

• As some queries are too slow, they need to be precomputed
• … and the precomputed results need to be incrementally updated (as soon as new data comes in)

Incremental architecture: The incremental architecture is too complex, mainly because of (i) the (distributed) transaction support and (ii) the complex algorithms to merge the new data.

Lambda Architecture - Motivation

Target properties

• Robustness & fault tolerance

• Scalability

• Generalisation

• Extensibility

• Ad hoc queries

• Minimal maintenance

• Debuggability

# Ad-hoc queries for exploratory data analysis.

Lambda Architecture - Overview

# All data is stored in the master dataset, which is immutable (i.e., no updates); new data is merged in only at specific events (checkpoints), thus the master dataset will be out-dated most of the time.
# Instead of pre-computing the results, dedicated data structures are filled, which can be queried (but only specific queries are possible) → known as batch views.
# e.g., for each use case there is one precomputed batch view.
# To deal with new data, there is the speed layer, which is updated as soon as new data is available (always up-to-date).
# For each query, both the speed layer and the serving layer (batch views) are used and the results merged.
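# A sketch of that query path: the (complete but slightly stale) batch view and the (small but fresh) realtime view are combined per query. The interfaces and the additive merge rule are invented for illustration; the correct merge rule depends on the use case.

    import java.util.HashMap;
    import java.util.Map;

    interface View {
        // e.g., page-view counts per URL
        Map<String, Long> query(String q);
    }

    public class LambdaQuery {
        private final View batchView; // serving layer: complete up to the last batch run
        private final View speedView; // speed layer: only data since the last batch run

        public LambdaQuery(View batchView, View speedView) {
            this.batchView = batchView;
            this.speedView = speedView;
        }

        public Map<String, Long> query(String q) {
            Map<String, Long> result = new HashMap<>(batchView.query(q));
            // Here the values are counts, so merging means adding them up.
            speedView.query(q).forEach((k, v) -> result.merge(k, v, Long::sum));
            return result;
        }
    }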

Lambda Architecture - Overview

# The realtime views in the speed layer, as well as the batch views in the serving layer, are often (traditional) databases.
# On a regular basis the master dataset gets updated:
# 1. All pending changes get merged into the master dataset
# 2. The batch views get updated
# 3. The speed layer gets reset

Kappa Architecture

# Another example of a distributed architecture.

Kappa Architecture - Motivation

Basic idea behind Kappa Architecture

• The Lambda Architecture is relatively complex

• Most of the problems can be solved via a streaming approach

• → treat the batch processing as if it were a stream of data
• Have the framework optimise for the two cases (stream vs. batch)

8 Rules of Stream Processing (Stonebraker et al., 2005)

1. Keep the Data Moving

2. Query using SQL on Streams (StreamSQL)

3. Handle Stream Imperfections (Delayed, Missing and Out-of-Order Data)

4. Generate Predictable Outcomes

5. Integrate Stored and Streaming Data

6. Guarantee Data Safety and Availability

7. Partition and Scale Applications Automatically

8. Process and Respond Instantaneously

# http://blog.acolyer.org/2014/12/03/the-8-requirements-of-real-time-stream-processing/

# Stonebraker, M., Cetintemel, U., & Zdonik, S. (2005). The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4), 42-47.

Kappa Architecture - Example

Example of streaming framework: Apache Kafka

• Message queueing system

• Stores all data for a given timespan, e.g., 2 days

• Topics
• Partitions within topics
• Partitions are chosen by the application, i.e., both producers and consumers need to agree on the partitions

• Broker

• Each message within a partition has an offset

• Clients have a unique id and remember their offset
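# A minimal producer/consumer sketch against the Kafka client API; the broker address, topic name and group id are invented, and the configuration is reduced to the essentials.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;

    public class KafkaExample {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "broker:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (Producer<String, String> producer = new KafkaProducer<>(p)) {
                // The key determines the partition (same key -> same partition).
                producer.send(new ProducerRecord<>("events", "user-42", "clicked"));
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "broker:9092");
            c.put("group.id", "example-group"); // consumers in a group share partitions
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("auto.offset.reset", "earliest");
            try (Consumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(List.of("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // Each record carries its partition and offset.
                    System.out.println(r.partition() + "@" + r.offset() + ": " + r.value());
                }
            }
        }
    }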

The End

# https://www.enterpriseintegrationpatterns.com/patterns/messaging/