routing on flat labels

Routing on Flat Labels 1

Routing on Flat Labels

Matthew Caesar, Tyson Condie, Jayanthkumar Kannan, Karthik Lakshminarayanan, Ion Stoica, Scott Shenker

Appeared in Sigcomm 2006

Presented by:Jackson Pang

CS 217B, Spring 2009


What’s wrong with Internet addressing today?

Hierarchical addressing allows excellent scaling properties

But, forces addressing to conform to network topology

Since topology is not static, addresses can’t persistently identify hosts


Topology-Based Addressing

Disadvantages: complicates Access control Topology changes Multi-homing Mobility

Advantage Scalability Scalability Scalability …

Area 1

Area 2

Area 4

Area 3

B

J

S

K

Q

F

V

X

A

1.11.2

2.12.2

4.2

4.1

3.33.2

3.1


What’s wrong with Internet addressing today?

The concept of location vs. identity in IP is mixed Most network applications today require persistent

identity It’s hard to provide persistent identity in presence of

hierarchical addressing Need to decouple identity from addressing Drastically complicates network configuration,

mobility, address assignment Is hierarchy the only way to scale routing?


Motivation for Flat Identifiers

Stable references Shouldn’t have to change when object moves

Object replication Store object at many different locations

Avoid fighting over names Avoid cyber squatting, typo squatting, …

<A HREF=http://isp.com/dog.jpg>my friend’s dog</A>

<A HREF=http://isp.com/dog.jpg>my friend’s dog</A>

<A HREF=http://f0120123112/ >my friend’s dog</A>

<A HREF=http://f0120123112/ >my friend’s dog</A>

Current Proposed


Is there an alternative?

Why not route on “flat” host identifiers? Assign addresses independently of network topology

Benefits: No separate name resolution system required Simpler network config/allocation/mobility Simpler network-layer access controls


Basic idea behind ROFL

Scalable routing on flat identifiers

Goal #1: Scale to Internet topologies How do you fetch a web page by typing google.com?

Goal #2: Support for BGP policies How do you preserve the customer – provider, peering

relationships?

Highly challenging problem, but solution would give a number of benefits


Basic mechanisms behind ROFL

Goal #1: Scale to Internet topologies Mechanism: DHT-style routing, maintain source-routes to

successors/fingers Provides: Scalable network routing without aggregation

Goal #2: Support for BGP policies Mechanism: Intelligently choose successors/fingers to

conform to ISP relationships Provides: Support for policies, operational model of BGP


How ROFL works

ISPISP

0xFA2910x3B57E(joining host)

0x3F6C0

0x3BAC80x3B57E

0x3F6C0

Successor list: 0x3F6C0

Pointer list: 0x3F6C0 0x3BAC8

Pointer cache: 0x3B57E

2. hosting routers participate in ROFL on behalf of hosts

3. hosting routers maintain pointers with source-routes to attached hosts’ successors/fingers

4. intermediate routers may cache pointers

5. external pointers provide reachability across domains

1. hosts are assigned topology-independent “flat” identifiers

0x3BAC8


Background Ideas

Distributed Hash Tables Hashing, Consistent Hashing, DHT goals

Chord Join, leave, index fingers, caching Stoica I., et al., “Chord: A Scalable Peer-to-peer Lookup Protocol

for Internet Applications”, IEEE/ACM Transactions on networking, 2003

Canon Support for hierarchy so that we can do BGP-style routing Ganesan, P., et al., “Canon in G Major: Designing DHTs with

Hierarchical Structure”, ACM Distributed Computing Systems, 2004.


DHT Background: Hash Table

Name-value pairs (or key-value pairs) E.g., www.cnn.com/foo.html (key) and the Web page

(value)

Hash table Data structure that associates keys with values

lookup(key) valuekey value


DHT Background: Hash Functions

Hashing Transform the key into a number And use the number to index an array

Example hash function Hash(x) = x mod 101, mapping to 0, 1, …, 100 More sophisticated ones: MD5, SHA1, SHA256

Challenges (collisions, joins and leaves) What if there are more than 101 nodes? Fewer? Which nodes correspond to each hash value? What if nodes come and go over time? But ROFL assumes that we’ve got the hashing issue

covered.


DHT Background: Consistent Hashing

Large, sparse identifier space (e.g., 128 bits) Hash a set of keys x uniformly to large id space Hash nodes to the id space as well

0 1

Hash(name) object_idHash(IP_address) node_id

Id space represented

as a ring.

2128-1


DHT (Distributed Hash Table):

Hash table spread over many nodes Distributed over a wide area

Main design goals Decentralization: no central coordinator Scalability: efficient even with large # of nodes Fault tolerance: tolerate nodes joining/leaving

Two key design decisions How do we map names on to nodes? How do we route a request to that node?


Chord Background: What is Chord? What does it do?

Supports just one operation: given a key, it maps the key onto a node

In short: a peer-to-peer lookup service

Solves problem of locating a data item in a collection of distributed nodes, considering frequent node arrivals and departures

Core operation in most p2p systems is efficient location of data items


Chord Characteristics (O(LogN))

Simplicity, provable correctness, and provable performance

Each Chord node needs routing information about only a few other nodes

Resolves lookups via messages to other nodes (iteratively or recursively)

Maintains routing information as nodes join and leave the system


Chord Characteristics

Simplicity, provable correctness, and provable performance

Each Chord node needs routing information about only a few other nodes

Resolves lookups via messages to other nodes (iteratively or recursively)

Maintains routing information as nodes join and leave the system


DNS vs. Chord

DNS

provides a host name to IP address mapping

relies on a set of special root servers

names reflect administrative boundaries

is specialized to finding named hosts or services

Chord

can provide same service: Name = key, value = IP

requires no special servers

imposes no naming structure

can also be used to find data objects that are not tied to certain machines

But that doesn’t mean we replace DNS in ROFL


The Base Chord Protocol

Specifies how to find the locations of keys

How new nodes join the system

How to recover from the failure or planned departure of existing nodes


Chord: Recall Consistent Hashing

Hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1 ID(node) = hash(IP, Port)

ID(key) = hash(key)

Practically, we hash the node’s public key to get the hash key

Leverage on the benefits of consistent hashing


Chord Definitions: Successor Nodes

6

1

2

6

0

4

26

5

1

3

7

2identifier

circle

identifier

node

X key

successor(1) = 1

successor(2) = 3successor(6) = 0

Shows which node holds which keys


Node Joins and Departures

6

1

2

0

4

26

5

1

3

7successor(6) = 7

6

1

successor(1) = 3


Scalable Key Location

A very small amount of routing information suffices to implement consistent hashing in a distributed environment

Each node need only be aware of its successor node on the circle

Queries for a given identifier can be passed around the circle via these successor pointers

Resolution scheme correct, BUT inefficient: it may require traversing all N nodes!


Acceleration of Lookups

Lookups are accelerated by maintaining additional routing information

Each node maintains a routing table with (at most) m entries (where N=2m) called the finger table

ith entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2i-1 on the identifier circle (clarification on next slide)

s = successor(n + 2i-1) (all arithmetic mod 2)

s is called the ith finger of node n, denoted by n.finger(i).node


Finger Tables (of successors)

0

4

26

5

1

3

7

124

[1,2)[2,4)[4,0)

130

finger tablestart int. succ.

keys1

235

[2,3)[3,5)[5,1)

330


keys2

457

[4,5)[5,7)[7,3)

000


keys6


Finger Tables - Node Joins

0

4

26

5

1

3

7

124

[1,2)[2,4)[4,0)

130


keys1

235

[2,3)[3,5)[5,1)

330


keys2

457

[4,5)[5,7)[7,3)

000


keys


keys

702

[7,0)[0,2)[2,6)

003

6

6

66

6


Finger Tables - Node Departure

0

4

26

5

1

3

7

124

[1,2)[2,4)[4,0)

130


keys1

235

[2,3)[3,5)[5,1)

330


keys2

457

[4,5)[5,7)[7,3)

660


keys


keys

702

[7,0)[0,2)[2,6)

003

6

6

6

0

3


Source of Inconsistencies:Concurrent Operations and Failures

Basic “stabilization” protocol is used to keep nodes’ successor pointers up to date, which is sufficient to guarantee correctness of lookups

Those successor pointers can then be used to verify the finger table entries

Every node runs stabilize periodically to find newly joined nodes


Stabilization after Join

n joins predecessor = nil n acquires ns as successor via some n’

n notifies ns being the new predecessor

ns acquires n as its predecessor

np runs stabilize

np asks ns for its predecessor (now n)

np acquires n as its successor

np notifies n

n will acquire np as its predecessor

all predecessor and successor pointers are now correct

fingers still need to be fixed, but old fingers will still work

np

su

cc(n

p)

= n

s

ns

n

pre

d(n

s)

= n

p

nil

pre

d(n

s)

= n

su

cc(n

p)

= n


Failure Recovery

Key step in failure recovery is maintaining correct successor pointers

To help achieve this, each node maintains a successor-list of its r nearest successors on the ring

If node n notices that its successor has failed, it replaces it with the first live entry in the list

stabilize will correct finger table entries and successor-list entries pointing to failed node

Performance is sensitive to the frequency of node joins and leaves versus the frequency at which the stabilization protocol is invoked


Hierarchical DHT: Support for Inter-domain Routing and BGP-ness

What is a Hierarchical DHT?

UCLA MIT

USA

CS EE

Path Locality

Efficiency, Security,

Fault isolation

Path Convergence

Caching, Bandwidth

Optimization

Local DHTs

Access Control


Crescendo:Convert flat DHTs to hierarchical (Canon-ization)

Key idea: Recursive structure Construct bottom-up; merge smaller DHTs Lowest level: Chord

CS EE


Crescendo: Merging two Chord rings

0

12

10

52

13

8

3

2

13

8

3Black node x: Connect to y iff

• y closer than any other black node

• y = succ(x + 2^i)


Now Back to the ROFL…. How ROFL works

ISPISP

0xFA2910x3B57E(joining host)

0x3F6C0

0x3BAC80x3B57E

0x3F6C0

Successor list: 0x3F6C0

Pointer list: 0x3F6C0 0x3BAC8

Pointer cache: 0x3B57E

2. hosting routers participate in ROFL on behalf of hosts

3. hosting routers maintain pointers with source-routes to attached hosts’ successors/fingers

4. intermediate routers may cache pointers

5. external pointers provide reachability across domains

1. hosts are assigned topology-independent “flat” identifiers

0x3BAC8


Identifiers

Identity tied to public/private key pair Everyone can know the public key Only authorized parties know the private key

Self-certifying identifier: hash of public key Host associates with a hosting router

Proves it knows private key, to prevent spoofing Router joins the ring on the host’s behalf

Anycast Multiple nodes have the same identifier


AS and BGP Policies (A review) Economic relationships

Peering Provider/customer

Isolation routing contained within hierarchy


Isolation in ROFL (Canon)

• Traffic between two hosts traverses no higher than their lowest common provider in the AS hierarchy• Accomplished by carefully merging the rings

Joininghost

InternalSuccessor

ExternalSuccessor

ExternalSuccessor

Source Destination


Policy support in ROFL (BGP support)

B CPeering link

Mechanism:Convert peeringrelationships to Virtual ASes

Source Destination

A

Source Destination

VirtAS

A

CB

Goal: prefer peer route over

provider route


Scalability in ROFL

Two extensions to improve locality: Maintain proximity-based fingers in a policy-safe fashion Pointer caching strategies: prefer nearby, popular pointers

0x3B57

0x3BAC0x3B57

0xA153

0xF64B

0x3BAC

0xA153

0xF64B

pc: 0xA153 pc: 0x3BAC 0xF64B


Evaluation

Intra-domain Trace based on “Rocketfuel” over 4 large ISPs with 2-3

millions of hosts Each host w/ 128-bit ID Routers have 9Mbits of memory for pointer cache and

finger tables

Inter-domain Using Routeviews, inter-AS topology graph are derived Ran simulations with 30 thousand nodes and extrapolated

results to to 600 million hosts


Metric ROFL BGP+DNS

Join overhead

450 per-host

“lightweight joins”: 14 per-host

typically 0 per host

40,000 per prefix

Latency Baseline: 135ms

With pointer caches: 70ms

With lookup: 137ms

No lookup: 54ms

State 2 million pointer cache entries in core,

~100 entries at edge

150 thousand entries at core and edge

77 million DNS entries at root servers

Failure Cached pointers equivalent of backup paths, failures only affect successors and fingers

Undergoes convergence process, failures have global impact


Metric ROFL BGP+DNS

Join overhead

450 per-host



40,000 per prefix



With lookup: 137ms

No lookup: 54ms








Contributions: Clean-slate design with novel ideas

A world without RIR’s to issue IP addresses! The design includes both inter-domain and intra-domain routing

issues. Did not side-step essential the features. ROFL is very attractive for new Internet usage patterns such as:

multicast, mobility, multi-homing. Secret Sauces:

Brings up design issues we will face in a clean-slate approach Put up a good fight to challenge the traditional IP and DNS-based

infrastructure Extension of already powerful ideas such as: DHT, Chord, Hierarchical

DHT (combined 1000+ citations). Honesty about “conservation of dirt” – didn’t bother to sweep them

elsewhere Fundamental computer science ideas: hashing, caching, recursion,

doubly linked-lists


ROFL Weaknesses:

Identity tied to public / private key pair PKI notion in Internet: who are the trusted authorities?

Unlikely but possible hash collision Identity -> hash translation

Per-AS link-state information assumed to be current and precise. Assumes existence of OSPF-like link-state information Accomplished via the “control messages”

Needs large pointer cache, but ok with current technology? Inter-AS routing needs large bloom filter for each neighboring

AS Issues with quickly updating bloom filters on joins and leaves?


References

Stoica I., et al., “Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications”, IEEE/ACM Transactions on networking, 2003

Ganesan, P., et al., “Canon in G Major: Designing DHTs with Hierarchical Structure”, Distributed Computing Systems, 2004. Proceedings.

Special Thanks for the presentation material: Jennifer Rexford, Princeton Markus Böhning, UC Berkeley

routing on flat labels

Documents