sdims: a scalable distributed information management system

25
June 14, 2022 Department of Computer Sciences, UT Austin 1 SDIMS: A Scalable Distributed Information Management System Praveen Yalagandula Mike Dahlin Laboratory for Advanced Systems Research (LASR) University of Texas at Austin

Upload: abedi

Post on 19-Jan-2016

44 views

Category:

Documents


0 download

DESCRIPTION

SDIMS: A Scalable Distributed Information Management System. Praveen Yalagandula Mike Dahlin Laboratory for Advanced Systems Research (LASR) University of Texas at Austin. Goal. A Distributed Operating System Backbone Information collection and management - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 1

SDIMS: A Scalable Distributed Information Management System

Praveen Yalagandula Mike Dahlin

Laboratory for Advanced Systems Research (LASR)

University of Texas at Austin

Page 2: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 2

Goal

A Distributed Operating System Backbone Information collection and management

Core functionality of large distributed systems Monitor, query and react to changes in the system Examples:

Benefits Ease development of new services Facilitate deployment Avoid repetition of same task by different services Optimize system performance

System administration and management Service placement and location Sensor monitoring and control Distributed Denial-of-Service attack detection

File location service Multicast tree construction Naming and request routing …………

Page 3: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 3

Contributions – SDIMS

Provides an important basic building block Information collection and management

Satisfies key requirements Scalability

• With both nodes and attributes• Leverage Distributed Hash Tables (DHT)

Flexibility• Enable applications to control the aggregation• Provide flexible API: install, update and probe

Autonomy• Enable administrators to control flow of information• Build Autonomous DHTs

Robustness• Handle failures gracefully• Perform re-aggregation upon failures

– Lazy (by default) and On-demand (optional)

Page 4: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 4

Outline

SDIMS: a distributed operating system backbone

Aggregation abstraction Our approach

Design Prototype Simulation and experimental results

Conclusions

Page 5: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 5

Aggregation Abstraction

Attributes Information at machines

Aggregation tree Physical nodes are leaves Each virtual node represents a logical group of nodes

• Administrative domains, groups within domains, etc.

Aggregation function, f, for attribute A Computes the aggregated value Ai for level-i subtree

• A0 = locally stored value at the physical node or NULL• Ai = f(Ai-1

0, Ai-11, …, Ai-1

k) for virtual node with k children Each virtual node is simulated by one or more

machines Example: Total users logged in the system

Attribute: numUsers Aggregation Function: Summation

a b c dA0

A1

A2

f(a,b) f(c,d)

f(f(a,b), f(c,d))

4 1 2 2

5 4

9Astrolabe [VanRenesse et al TOCS’03]

Page 6: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 6

Outline

SDIMS: a distributed operating system backbone

Aggregation abstraction Our approach

Design Prototype Simulation and experimental results

Conclusions

Page 7: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 7

Scalability with nodes and attributes

To be a basic building block, SDIMS should support Large number of machines

• Enterprise and global-scale services• Trend: Large number of small devices

Applications with a large number of attributes• Example: File location system

– Each file is an attribute– Large number of attributes

Challenges: build aggregation trees in a scalable way Build multiple trees

• Single tree for all attributes load imbalance Ensure small number of children per node in the tree

• Reduces maximum node stress

Page 8: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 8

Building aggregation trees: Exploit DHTs

A DHT can be viewed as multiple aggregation trees Distributed Hash Tables (DHTs)

Each node assigned an ID from large space (160-bit) For each prefix (k), each node has a link to another node

• Matching k bits• Not matching on the k+1 bit

Each key from ID space mapped to a node with closest ID Routing for a key: Bit correction

• Example: from 0101 for key 1001• 0101 1110 1000 1000 1001

Lots of different algorithms with different properties• Pastry, Tapestry, CAN, CHORD, SkipNet, etc.• Load-balancing, robustness, etc.

A DHT can be viewed as a mesh of trees Routes from all nodes to a particular key form a tree Different trees for different keys

Page 9: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 9

DHT trees as aggregation trees

Key 111

010 001100 101110 111011000

1xx

11x

111

Page 10: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 10

DHT trees as aggregation trees

Keys 111 and 000

010 001100 101110 111011000

1xx

11x

111

010 001100 101110 111011000

0xx

00x

000

Page 11: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 11

API – Design Goals

Expose scalable aggregation trees from DHT Flexibility: Expose several aggregation

mechanisms Attributes with different read-to-write ratios

• “CPU load” changes often: a write-dominated attribute– Aggregate on every write too much communication cost

• “NumCPUs” changes rarely: a read-dominated attribute– Aggregate on reads unnecessary latency

Spatial and temporal heterogeneity• Non-uniform and changing read-to-write rates across tree• Example: a multicast session with changing membership

Support sparse attributes of same functionality efficiently Examples: file location, multicast, etc. Not all nodes are interested in all attributes

Page 12: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 12

Design of Flexible API

New abstraction: separate attribute type from attribute name Attribute = (attribute type, attribute name) Example: type=“fileLocation”, name=“fileFoo”

Install: an aggregation function for a type Amortize installation cost across attributes of same

type Arguments up and down control aggregation on update

Update: the value of a particular attribute Aggregation performed according to up and down Aggregation along tree with key=hash(Attribute)

Probe: for an aggregated value at some level If required, aggregation done to produce this result Two modes: one-shot and continuous

Page 13: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 13

Scalability and Flexibility

Install time aggregation and propagation strategy Applications can specify up and down Examples

• Update-Local up=0, down=0• Update-All up=ALL, down=ALL• Update-Up

up=ALL, down=0

Spatial and temporal heterogeneity Applications can exploit continuous mode in probe API

• To propagate aggregate values to only interested nodes• With expiry time, to propagate for a fixed time

Also implies scalability with sparse attributes

Page 14: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 14

Scalability and Flexibility

Install time aggregation and propagation strategy Applications can specify up and down Examples

• Update-Local up=0, down=0• Update-All up=ALL, down=ALL• Update-Up

up=ALL, down=0

Spatial and temporal heterogeneity Applications can exploit continuous mode in probe API

• To propagate aggregate values to only interested nodes• With expiry time, to propagate for a fixed time

Also implies scalability with sparse attributes

Page 15: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 15

Scalability and Flexibility

Install time aggregation and propagation strategy Applications can specify up and down Examples

• Update-Local

up=0, down=0

• Update-All up=ALL, down=ALL• Update-Up

up=ALL, down=0

Spatial and temporal heterogeneity Applications can exploit continuous mode in probe API

• To propagate aggregate values to only interested nodes• With expiry time, to propagate for a fixed time

Also implies scalability with sparse attributes

Page 16: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 16

Scalability and Flexibility

Install time aggregation and propagation strategy Applications can specify up and down Examples

• Update-Local

up=0, down=0• Update-All

up=ALL, down=ALL

• Update-Up up=ALL, down=0

Spatial and temporal heterogeneity Applications can exploit continuous mode in probe API

• To propagate aggregate values to only interested nodes• With expiry time, to propagate for a fixed time

Also implies scalability with sparse attributes

Page 17: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 17

Scalability and Flexibility

Install time aggregation and propagation strategy Applications can specify up and down Examples

• Update-Local up=0, down=0• Update-All up=ALL, down=ALL• Update-Up

up=ALL, down=0

Spatial and temporal heterogeneity Applications can exploit continuous mode in probe API

• To propagate aggregate values to only interested nodes• With expiry time, to propagate for a finite time

Also implies scalability with sparse attributes

Page 18: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 18

Autonomy

Systems spanning multiple administrative domains Allow a domain administrator control information flow

• Prevent external observer from observing the information in the domain

• Prevent external failures from affecting the operations in the domain

Support for efficient domain wise aggregation

DHT trees might not conform Autonomous DHT

Two properties• Path Locality• Path Convergence

Reach domain root first 010 001

100

101

110

111

011000

Domain 1 Domain 2 Domain 3

Page 19: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 19

Outline

SDIMS: a distributed operating system backbone

Aggregation abstraction Our approach

Leverage Distributed Hash Tables Separate attribute type from name Flexible API Prototype and evaluation

Conclusions

Page 20: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 20

Prototype and Evaluation

SDIMS prototype Built on top of FreePastry [Druschel et al, Rice U.] Two layers

• Bottom: Autonomous DHT• Top: Aggregation Management Layer

Methodology Simulation

• Scalability• Flexibility

Prototype• Micro-benchmarks on real networks

– PlanetLab – CS Department

Page 21: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 21

Methodology Sparse attributes: multicast sessions with small size membership Node Stress = Amt. of incoming and outgoing info

Two points Max Node Stress an order of magnitude less than Astrolabe Max Node Stress decreases as the number of nodes increases

1

10

100

1000

10000

100000

1e+06

1e+07

1 10 100 1000 10000 100000

Nod

e S

tres

s

Number of sessions

MAX node stress with participant size 8

Simulation Results - Scalability

Astrolabe 256

Astrolabe 4096

Astrolabe 65536

SDIMS 256

SDIMS 4096

SDIMS 65536

Page 22: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 22

Simulation Results - Flexibility

Simulation with 4096 nodes Attributes with different up and down

strategies

0.1

1

10

100

1000

10000

0.0001 0.01 1 100 10000

Num

ber

of m

essa

ges

per

oper

atio

n

Read to Write ratio

Update-Local

Update-Up

Update-All

Up=5, down=0

Up=all, down=5

Page 23: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 23

Prototype Results

CS department: 180 machines (283 SDIMS nodes)

PlanetLab: 70 machines

Department Network

0

100

200

300

400

500

600

700

800

Update - All Update - Up Update - Local

Lat

ency

(m

s)

Planet Lab

0

500

1000

1500

2000

2500

3000

3500

Update - All Update - Up Update - Local

Page 24: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 24

Conclusions

SDIMS – basic building block for large-scale distributed services Provides information collection and management

Our Approach Scalability and Flexibility through

• Leveraging DHTs• Separating attribute type from attribute name• Providing flexible API: install, update and probe

Autonomy through • Building Autonomous DHTs

Robustness through• Default lazy reaggregation• Optional on-demand reaggregation

Page 25: SDIMS: A Scalable Distributed Information Management System

April 21, 2023 Department of Computer Sciences, UT Austin 25

For more information:

http://www.cs.utexas.edu/users/ypraveen/sdims