the data-centric revolution in networking?

49
1 The Data-Centric Revolution in Networking? Scott Shenker International Computer Science Institute U. C. Berkeley Liberally stealing the insight and work of others, particularly Hari Balakrishnan, Deborah Estrin, Ramesh Govindan, Joe Hellerstein, and Ion Stoica

Upload: conan

Post on 24-Feb-2016

40 views

Category:

Documents


2 download

DESCRIPTION

The Data-Centric Revolution in Networking?. Scott Shenker International Computer Science Institute U. C. Berkeley. Liberally stealing the insight and work of others, particularly Hari Balakrishnan, Deborah Estrin, Ramesh Govindan, Joe Hellerstein, and Ion Stoica. Two Communities Apart. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: The Data-Centric Revolution in Networking?

1

The Data-Centric Revolution in Networking?

Scott ShenkerInternational Computer Science Institute

U. C. Berkeley

Liberally stealing the insight and work of others, particularly Hari Balakrishnan, Deborah Estrin,

Ramesh Govindan, Joe Hellerstein, and Ion Stoica

Page 2: The Data-Centric Revolution in Networking?

2

Two Communities Apart

Networking (Internet) researchers:- don’t know and don’t care about databases

Vast gap between communities- much more overlap with other systems communities

But data-centrism has narrowed the gap- metaphors and algorithms

This talk will tell that story: in reverse order- Internet, then sensornets

Page 3: The Data-Centric Revolution in Networking?

3

Our Central Mission

Get data from here to there from here to there

Page 4: The Data-Centric Revolution in Networking?

4

Host-centric Protocols

Protocols defined in terms of IP addresses:- Unicast: IP address = host- Multicast: IP address = set of hosts

Destination address is given to protocol

Protocol delivers data from one host to another- unicast: conceptually trivial- multicast: address is logical, not physical

Page 5: The Data-Centric Revolution in Networking?

5

Host-centric Applications

Classic applications: destination is “intrinsic”- telnet: target machine- FTP: location of files- electronic mail: email address turns into mail server- multimedia conferencing: machines of participants

Destination is specified by user (not network)- Usually specified by hostname not address

DNS translates names into addresses

Page 6: The Data-Centric Revolution in Networking?

6

Domain Name System (DNS)

DNS is built around recursive delegation- Top level domains (TLDs): .com, .net, .edu, etc.- TLDs delegate authority to subdomains

• berkeley.edu- Subdomains can further delegate

• cs.berkeley.edu

Hierarchy fits host administrative structure- Local decentralized control- Crucial to efficient hostname resolution

Page 7: The Data-Centric Revolution in Networking?

7

Network Research in Early 90’s

Consumed by a few obsessions:- Quality of service for streaming media- Multicast- Congestion control

But nobody questioned host-centricity:- assumed to be the only way to build Internet

Page 8: The Data-Centric Revolution in Networking?

8

Surprise #1: The web catches on!

But we don’t....

Page 9: The Data-Centric Revolution in Networking?

9

The Web

Web URLs have host-name/path format- Essentially the same information as FTP

Early web:- browsers basically a GUI for FTP- URLs were easily transmitted pointers

Early web was host-centric- and largely ignored (but used) by net researchers

Page 10: The Data-Centric Revolution in Networking?

10

Modern Web

URLs often function as names of data- users think of www.cnn.com as data, not a host- Fact that www.cnn.com is a hostname is irrelevant

Users want data, not access to particular host

The web is now data-centric

Page 11: The Data-Centric Revolution in Networking?

11

Data-centric App in Host-centric World

Data still associated with host names (URLs)- administrative structure of data same as hosts- weak point in current web

Key enabler: search engines- Searchable databases map keywords to URLs- Allowed users to find desired data

Networkers focused on technical problems:- HTTP, persistence (URNs), replication (CDNs), ...

Page 12: The Data-Centric Revolution in Networking?

12

We Missed the Point!

We thought: web was an aberration search engines were a sufficient hack

No networker (except Jacobson) articulated that: web had gone from host-centric to data-centric it was a harbinger of future applications

Page 13: The Data-Centric Revolution in Networking?

13

Surprise #2: Stolen Music is Popular!

And we finally get the message...

Page 14: The Data-Centric Revolution in Networking?

14

The P2P Filesharing Phenomena

Napster: “Fastest growing Internet application”

Music sharing is intrinsically data-centric- data never associated with hosts

Centralized searchable database- listed IP addresses where content could be found- analogous to Google+DNS in the web

Legal problems forced decentralization- Led to Gnutella and other distributed programs

Page 15: The Data-Centric Revolution in Networking?

15

Gnutella-style File Sharing

Gnutella nodes form an overlay network- each node has a few “neighbors” in a virtual network- virtual link: node knows other’s IP address- do app-level “networking” on this graph

Page 16: The Data-Centric Revolution in Networking?

16

Gnutella-style Searching

Keyword queries are flooded (within scope)- query is processed locally at each node- all nodes having hits respond to source- many variations on this theme (freenet, etc.)

Clearly not scalable

P2P traffic now sizable fraction of overall load

We finally realize that we need a scalable way to find data for data-centric applications

Page 17: The Data-Centric Revolution in Networking?

17

Is there life outside the Internet?

Yes, and we should have been listening!

Page 18: The Data-Centric Revolution in Networking?

18

Sensornets (predating P2P)

Vision:- Many sensing devices with radio and processor- Enable fine-grained measurements over large areas- Huge potential impact on science, and society

Technical challenges:- untethered: power consumption must be limited- unattended: robust and self-configuring- wireless: ad hoc networking

Page 19: The Data-Centric Revolution in Networking?

19

Conceptual Challenge

Sensornets are inherently data-centric- Users know what data they want, not where it is- Estrin, Govindan, Heidemann (2000, etc.)

Centralized database infeasible- vast amount of data, constantly being updated- small fraction of data will ever be queried- sending to single site expends too much energy

Page 20: The Data-Centric Revolution in Networking?

20

Flood-then-Aggregate

General class of methods: Flood query to all nodes (or in region) Nodes with data matching query respond Responses are aggregated as appropriate

Examples: Directed diffusion: reinforce based on data TAG: tree for flood and return-path aggregation Etc....

Page 21: The Data-Centric Revolution in Networking?

21

Scaling Problems

This approach suffers as: systems get bigger queries more frequent and more specific

For current deployments, not an issue: systems are small, queries primitive

But if technology progresses as hoped: want to get relevant data without flooding similar to situation in Internet

Page 22: The Data-Centric Revolution in Networking?

22

Is Data-centric Flooding Necessary?

The initial decentralized data-centric designs (in both Internet and sensornets) used flooding

- unscalable and unsustainable

Since data-centrism is here to stay, we can’t ignore this problem

We had to broaden our research charter

Page 23: The Data-Centric Revolution in Networking?

23

Our Revised Mission

Get data from here to thereGet data from here to there

Page 24: The Data-Centric Revolution in Networking?

24

A DNS for Data?

Can we map data names into addresses?- a data-centric DNS, distributed and scalable- doesn’t alter net protocols, but aids data location- not just about stolen music, but a general facility

A formidable challenge:- Data does not have a clear administrative hierarchy- Likely need to support a flat namespace- Can one do this scalably?

Data-centrism requires scalable flat lookups

Page 25: The Data-Centric Revolution in Networking?

25

Distributed Hash Tables (DHTs)

The latest networking fad....

Presented from the Internet perspective but applies to sensornets as well

Page 26: The Data-Centric Revolution in Networking?

26

An Internet-scale Distributed Index

Interface: put(key,object), get(key)

DHTs form a structured overlay network nodes choose particular neighbors all objects have keys, usually hash(name) each node responsible for range of keys puts/gets routed to appropriate node

Page 27: The Data-Centric Revolution in Networking?

27

Example Design: Chord

Node and object keys: - random location around a circle

Neighbors: - nodes 2-i around the circle- found by routing to desired key

Routing: greedy- pick nbr closest to destination

Storage: “own” interval- node owns key range between

her key and previous node’s key

Ownershiprange

Page 28: The Data-Centric Revolution in Networking?

28

Key Properties

Large aggregate capacity: O(n) storage/b’width

Scalable: - O(log n) routing hops and state- O((log n)2) update costs for node join/leaves

Robust: self-configuring and resilient to failures

Nonproperty: strict guarantees when failures

Page 29: The Data-Centric Revolution in Networking?

29

Our Version of “Data Independence”

DHT interface allows us to get data by name- We no longer care where data is

A radical transition in databases- perhaps it will be one in networking as well

Apologies to Joe Hellerstein...- see latest SIGMOD Record for his article

Page 30: The Data-Centric Revolution in Networking?

30

Caveat!

DHTs are a work-in-progress

A flurry of research activity on:- security- replication- proximity- real operational experience- .....

For rest of talk, we put these worries aside...

Page 31: The Data-Centric Revolution in Networking?

31

Why Not Centralized Solutions?

Ugh! (and infeasible for sensornets)

Fault tolerance: avoid single point of failure

Economic:- DNS: “donated” machines, scales organically- Centralized solutions require business model

Issue still open....but irrelevant to data-centrism- need to support interface- DHTs allow us to choose between cent. and decent.

Page 32: The Data-Centric Revolution in Networking?

32

Multiple Roles for DHTs

Application-specific- rolled into P2P application, run on “peers”

General-purpose service- run on managed nodes

Intrinsic part of Internet architecture- run on managed nodes

Page 33: The Data-Centric Revolution in Networking?

33

Multiple Roles for DHTs

Application-specific- rolled into P2P app, run on “peers”

General-purpose service- run on managed nodes

Intrinsic part of Internet architecture- run on managed infrastructure nodes

Page 34: The Data-Centric Revolution in Networking?

34

Some Applications using DHTs

Partial list: File sharing Storage repositories and file systems Backup systems Event notification systems Electronic mail App-layer multicast and streaming media .....Useful substrate for many (not all) large distributed

applications because HTs are useful

Page 35: The Data-Centric Revolution in Networking?

35

Multiple Roles for DHTs

Application-specific- rolled into P2P app, run on “peers”

General-purpose service- run on managed nodes

Intrinsic part of Internet architecture- run on managed infrastructure nodes

Page 36: The Data-Centric Revolution in Networking?

36

Internet-scale Query Processing

Superficial motivation:- Joins can be implemented with hash tables so...- Distributed joins can be implemented with DHTs- Scaling: latency O(log n) while computation O(n)

PIER (talk later today in session A9!):- joins, aggregation, recursive and continuous queries

Intended targets:- data “in the wild” (filesharing, net monitoring, etc.)- schema provided by standardized protocols- no need for ACID semantics

Page 37: The Data-Centric Revolution in Networking?

37

More Complex Queries

Range search: using “prefix hash table” no need to walk tree

Keyword search: engineering the boolean approach

Active research on DHT-based distributed data structures for search (net and db communities)

Page 38: The Data-Centric Revolution in Networking?

38

Multiple Roles for DHTs

Application-specific- rolled into P2P app, run on “peers”

General-purpose service- run on managed nodes

Intrinsic part of Internet architecture- run on managed infrastructure nodes

Page 39: The Data-Centric Revolution in Networking?

39

Cleaning Up the Architecture

Making URNs a reality- webNG based on flat and opaque DHT keys- enables persistence and eliminates branding

Host identifiers versus routing information- IP addresses currently (and stupidly) serve as both- DHT key = host id, resolves to routing address- Architectural challenge for basic protocols

Page 40: The Data-Centric Revolution in Networking?

40

Subverting the Architecture

Use DHT for forwarding, not just lookup!- e.g., Internet Indirection Infrastructure (i3)- similar in spirit to multicast (logical addressing)- transcends current naming/addressing structures

Make overlay the real “network” layer- turn IP into a link layer technology

Leverages, not limited by, current infrastructure

New network layer is still simple, but not IP

Page 41: The Data-Centric Revolution in Networking?

41

New Generation of Networking?

Current Internet relies on hierarchies to scale:- DNS naming, IP addressing, etc.

Hierarchies limit flexibility:- addresses and names have to fit given structure- need to care “where” data/machines are

Scalable flat lookup avoids hierarchy- network would be “structure independent”

Less of a distinction between hosts and data

Page 42: The Data-Centric Revolution in Networking?

42

Do DHTs Apply to Sensornets?

Can we build them?Do they help?

Page 43: The Data-Centric Revolution in Networking?

43

Finding Sensornet Data w/o Flooding

Extract high-level features or “events”- Temperature spikes, toxins, animal sightings- Name these events

Store/Access events with DHT-like structure- Can later get detailed data from specific nodes

Call this “data-centric storage” (DCS)- Good for frequent specific queries- Not good for long-running or aggregate queries

But how do you build a sensornet DHT?

Page 44: The Data-Centric Revolution in Networking?

44

Geographic Routing

Nodes know own and neighbors’ positions Packets routed to geographic destination

- Greedy forwarding, when possible- If greedy fails at a void, use the right hand rule to

navigate around the void

A (x,y)

B

Page 45: The Data-Centric Revolution in Networking?

45

“Geographic” Hash Table (GHT)

Keys hashed to random coordinates Likely no node exists at that location! Forwarding ends at node closest to destination Closest node stores the data

A (x,y)

Page 46: The Data-Centric Revolution in Networking?

46

Additional Algorithms

Caching and replication- Cache around perimeter, replicate independently

Structured replication (SR)- Hierarchical decomposition of key space- Tree of “mirror images”

Hash(event) Mirror Images

Page 47: The Data-Centric Revolution in Networking?

47

More Complex Queries

Using GHT+SR: (which has spatial structure) Range searches in space and value Wavelet analysis

New data structures: Higher-dimensional range searches

Active research in distributed data structures for sensornet queries (in net and db communities)

Page 48: The Data-Centric Revolution in Networking?

48

We are finally on our way to the land of “data independence”...

We ask for your guidance....

Page 49: The Data-Centric Revolution in Networking?

49

Areas of Common Interest

Algorithmic:- distributed data structures for search

Metaphoric:- thinking about data independence