PROTOTYPES
T-110.6120
17.11.2011
Jimmy Kjällman
Ericsson Research, NomadicLab
Prototypes
• Two research prototypes will be described in this presentation
• Blackadder
– Developed in PURSUIT
– Channel-oriented base implementation
– Demonstrated at the end of the lecture
• Blackhawk
– Originates from PSIRP
– Document-oriented implementation
Original slides: George Parisis, Computer Laboratory, University of Cambridge, 2011
BLACKADDER
Blackadder
• Realizes PURSUIT’s functional model for information-centric networking
[Figure: PURSUIT functional model. Labels: Rendezvous, Topology, Forwarding; Pub/Sub Service Model; SId, RId; Functional scoping, Information scoping; Dissemination Strategy; Recursion]
Information Structure
• Scopes, subscopes, information items
• Information is structured as a directed acyclic graph
• IDs are (statistically) unique within a scope
– (Possibly) self-generated, flat labels
– Same ID space for both subscopes and information items
• “Complete” identifier: Prefix + ID
– One or more paths, each starting from one of the graph’s roots
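The "Prefix + ID" rule can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `full_ids` helper, the 4-character labels and the "/"-separated notation are assumptions made for this example, not part of Blackadder's API.

```python
class Node:
    """A scope or information item; `label` is its flat ID within a parent scope."""
    def __init__(self, label, parents=()):
        self.label = label
        self.parents = list(parents)   # empty for a root scope

def full_ids(node):
    """Return every complete identifier (prefix + ID) of `node`, sorted."""
    if not node.parents:
        return ["/" + node.label]
    ids = []
    for parent in node.parents:
        for prefix in full_ids(parent):
            ids.append(prefix + "/" + node.label)
    return sorted(ids)

# Hypothetical graph: one subscope published under two different root scopes
root_a = Node("0001")
root_b = Node("0002")
shared = Node("0001", parents=[root_a, root_b])
item = Node("AAA1", parents=[shared])

print(full_ids(shared))
print(full_ids(item))
```

Because the graph is a DAG rather than a tree, a node with several parents ends up with several complete identifiers, one per path from a root.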
Information Structure
[Figure: example information graph with scopes labelled 0001, 0002 and 0003 on several levels and information items AAA0, AAA1 and AAA2]
• Information ID: /0003/0002/AAA2
• Scope ID: /0001/0001/0001, /0002/0001/0001, /0003/0001/0001
Core Functions
• Simplified example
[Figure: Rendezvous, Topology and Forwarding functions between a publisher (P) and a subscriber (S)]
Dissemination Strategies
• Defines the methods used for implementation (of a scope)
– Architectural components
– Data formats
– Governance structures
– Etc.
• Can be “overridden” for sub-items, if permitted
– Strategies have to be aligned
• Usually engineered at design time
• Larger problem solutions through the assembly of smaller ones
Service Model
• Publish/Subscribe
• For example:
– publish_scope(id, prefix, strategy), publish_info(id, prefix, strategy)
– unpublish_scope(id, prefix, strategy), unpublish_info(id, prefix, strategy)
– subscribe_scope(id, prefix, strategy), subscribe_info(id, prefix, strategy)
– unsubscribe_scope(id, prefix, strategy), unsubscribe_info(id, prefix, strategy)
– publish_data(id, strategy, data, data_len)
– getEvent(&event)
Blackadder Architecture
• Click is an external framework that Blackadder uses
[Figure: Blackadder node architecture. Labels: Click; IPC Element; Local Proxy; Rendezvous; Forwarding; Communication Elements (/dev/eth0, /dev/eth1, Raw IP Sockets); applications App1 … AppN; Topology Manager]
Background Information: The Click Modular Router
• Open source platform for building packet processing configurations that consist of connected elements
– Language for describing router configurations
– Ready-made elements
– Libraries for creating new elements as C++ classes
• Portable code
– Kernel and userlevel
– Linux, FreeBSD, Mac OS X, etc.
• Modular design approach
– Reuse of elements in different configurations (e.g., in different prototypes or experiments)
• Basic operation: packets are pushed or pulled between elements
Click Router Configuration
• Example: Ping (nothing to do with Blackadder, just illustrates a Click router)
define($DEV eth0, $DADDR 8.8.8.8, $GW $DEV:gw)
FromDevice($DEV, SNIFFER false)
    -> c :: Classifier(12/0800, 12/0806 20/0002)
    -> CheckIPHeader(14)
    -> ip :: IPClassifier(icmp echo-reply)
    -> ping :: ICMPPingSource($DEV, $DADDR)
    -> SetIPAddress($GW)
    -> arpq :: ARPQuerier($DEV)
    -> IPPrint
    -> q :: Queue
    -> ToDevice($DEV);
arpq[1] -> q;
c[1] -> [1] arpq;
IPC Element
• Implements a Netlink socket for receiving pub/sub requests from applications (or an API library) and for sending back pub/sub events and published data
– These are sent as messages through the socket
– In user space, the IPC element utilizes the selection mechanism provided by Click
– In kernel space, the element receives sk_buffs in the context of the running process; the buffers are wrapped into Click packets that are later processed by a Click task
• Everything is asynchronous – like an event-based system
API (Service Model): Functions and Messages
• publish_scope(id, prefix, strategy), publish_info(id, prefix, strategy)
• unpublish_scope(id, prefix, strategy), unpublish_info(id, prefix, strategy)
• subscribe_scope(id, prefix, strategy), subscribe_info(id, prefix, strategy)
• unsubscribe_scope(id, prefix, strategy), unsubscribe_info(id, prefix, strategy)
• publish_data(id, strategy, data, data_len)
(These messages are only used node-internally)
[Figure: message layout for the scope and info requests. Fields, with sizes as given on the slide: Type (1), ID length (1), ID (variable length), ID length (1), Prefix ID (variable length), Strategy (1), LIPSIN Identifier (LID size)]
[Figure: message layout for publish_data. Fields: Type (1), ID length (1), ID (variable length), Strategy (1), LIPSIN Identifier (LID size), Data]
API: Events
• Start Publishing, Stop Publishing
• New Scope, Deleted Scope
[Figure: event layout. Fields: Type (1), ID length (1), ID (variable length)]
• Published Data
[Figure: event layout. Fields: Type (1), ID length (1), ID (variable length), Data]
Accessing the network
• Standard Click elements for network communication
– ToDevice and FromDevice for directly sending and receiving Ethernet frames (suitable, e.g., when experimenting over high-speed LANs)
– RawSocket for sending and receiving IP (UDP) packets over raw sockets (suitable, e.g., when experimenting in the PlanetLab testbed or VPNs; the IP network is used as an underlay)
Network Packet Format
[Figure: network packet layout. Fields: LIPSIN Identifier (LID size), No. IDs (1), ID1 length (1), ID1, ID2 length (1), ID2, …, IDn length (1), IDn, Payload]
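A packet with this kind of layout (a LIPSIN identifier, a count of IDs, length-prefixed IDs, then the payload) could be encoded and decoded roughly as follows. This is a sketch under stated assumptions: the 8-byte `LID_SIZE`, the interpretation of the one-unit fields as single bytes counting bytes, and the function names are chosen for illustration and are not taken from the Blackadder sources.

```python
LID_SIZE = 8  # assumed LIPSIN identifier size in bytes

def encode_packet(fid: bytes, ids: list, payload: bytes) -> bytes:
    """LIPSIN identifier | No. IDs (1) | [IDi length (1) | IDi] ... | payload."""
    if len(fid) != LID_SIZE:
        raise ValueError("FID must be LID_SIZE bytes")
    if len(ids) > 255 or any(len(i) > 255 for i in ids):
        raise ValueError("count and lengths must fit in one byte")
    out = bytearray(fid)
    out.append(len(ids))
    for ident in ids:
        out.append(len(ident))
        out += ident
    return bytes(out) + payload

def decode_packet(packet: bytes):
    """Inverse of encode_packet: returns (fid, ids, payload)."""
    fid = packet[:LID_SIZE]
    count = packet[LID_SIZE]
    pos = LID_SIZE + 1
    ids = []
    for _ in range(count):
        length = packet[pos]
        ids.append(packet[pos + 1:pos + 1 + length])
        pos += 1 + length
    return fid, ids, packet[pos:]

fid = bytes([0x82, 0, 0, 0, 0, 0, 0, 0])
pkt = encode_packet(fid, [b"\x00\x03\x00\x02", b"\xaa\xa2"], b"hello")
print(decode_packet(pkt))
```

Length-prefixing each ID lets a receiver skip over identifiers of different lengths without knowing the information structure in advance.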
Forwarding
• Receives packets from the network communication elements
– Matches the FID with all outgoing links and forwards the packets
– A separate LID is assigned to the “internal link” between the Forwarding element and the Local Proxy element; this implements the notion of destination
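The matching step can be illustrated with the usual LIPSIN-style test, in which a packet is forwarded on a link when every bit of that link's LID is also set in the FID. The sketch below assumes that rule; the 8-bit identifiers, the link names and the `matching_links` helper are hypothetical and chosen only to keep the example readable.

```python
def matching_links(fid: int, links: dict) -> list:
    """Return the names of links whose LID bits are all present in the FID."""
    return [name for name, lid in links.items() if (fid & lid) == lid]

# Hypothetical 8-bit LIDs, one bit per link
links = {
    "eth0":     0b10000000,
    "eth1":     0b00000100,
    "internal": 0b00000001,   # "internal link" towards the Local Proxy
}

# An FID is formed by OR-ing the LIDs of all links the packet should traverse
fid = links["eth0"] | links["internal"]
print(matching_links(fid, links))
```

With LIDs that set several bits each, the same test can occasionally match a link that was never added to the FID (a Bloom-filter false positive), which this one-bit-per-link example does not show.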
Sample forwarding configurations
• Click configurations – can be auto-generated
Forwarder (MAC, 1,1, 08:00:00:00:00:01, 08:00:00:00:00:11,
10000000000000000000000000000000000000000000000000000000000000001, 08:00:00:00:00:02, 08:00:00:00:00:12,
10000010000000000000000000000000000000000000000000000000000000002, 08:00:00:00:00:03, 08:00:00:00:00:13,
1000001000000000001000000000000000000000000000000000000000000000);
fw[1] -> Queue(1000) -> ToDevice(eth0);
fw[2] -> Queue(1000) -> ToDevice(eth1);
FromDevice(eth0, SNIFFER false) -> Classifier(12/080a)[0] -> [1]fw;
FromDevice(eth1, SNIFFER false) -> Classifier(12/080a)[0] -> [2]fw;
Forwarder (IP, 1,1, 192.168.0.1, 192.168.0.2, 10000000000000000000000000000000000000000000000000000000000000001, 192.168.0.1, 192.168.0.6, 10000010000000000000000000000000000000000000000000000000000000002, 192.168.1.1, 192.168.1.2, 1000001000000000001000000000000000000000000000000000000000000000);
fw[1] -> Queue(1000) -> RawSocket(UDP) -> IPClassifier(dst udp port 9999)[0] -> [1]fw;
fw[2] -> Queue(1000) -> RawSocket(UDP) -> IPClassifier(dst udp port 9999)[0] -> [2]fw;
Local Proxy
• “The heart of a network node”: everything goes through it
• Receives all pub/sub requests from applications and other Click elements
• Keeps track of
– Pending subscriptions
– Advertised information items (and assigns FIDs)
• Receives
– Published data and notifications about new or deleted scopes, which it pushes to subscribers (applications or Click elements)
– Notifications to start or stop publishing data, which it pushes to one (of the potentially many) publishers
RV Function
• The same element runs in all nodes
• Every node can create an information structure that will be known and maintained by the local RV function
• Other nodes can send pub/sub requests to that node if they know a path to it
• Usual scenarios
– A network node (its RV function) maintains a local structure for IPC (node-local strategy)
– A network node (its RV function) maintains a structure accessible by physical neighbours (link-local strategy)
– One or more dedicated RV nodes run in a domain – end hosts know how to reach them (domain-local scenario)
RV IPC
• The RV Element accesses the world the same way applications do
• It subscribes to root scope FFFF where all pub/sub requests are published
• It publishes Topology Formation requests to scope FFFE to which the TM has subscribed
• Topology formation is required when:
– A set of publishers needs to be notified with Forwarding IDs that point to a set of subscribers
– A set of subscribers needs to be notified about a new or deleted scope
The Topology Manager
• An application
– Calculates shortest paths in a network to derive forwarding information
– Uses (e.g.) the igraph library for this
• How the TM does IPC
– Subscribes locally to scope FFFE
– Receives requests from the RV node as publications
– Publishes responses directly to publishers and subscribers using the Information ID /FFFD/destinationNodeID
– Utilizes an implicit rendezvous dissemination strategy where information is published with a specific FID
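The path calculation can be sketched as follows. This is not the Topology Manager's code: it uses a plain breadth-first search from the Python standard library in place of igraph, and the topology, the per-link LIDs and the helper names `shortest_path` and `build_fid` are assumptions made for illustration. The FID is formed here by OR-ing the LIDs along the path, as in LIPSIN-style forwarding.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neigh in adj.get(node, ()):
            if neigh not in prev:
                prev[neigh] = node
                queue.append(neigh)
    return None

def build_fid(path, lids):
    """OR together the LIDs of the directed links along the path."""
    fid = 0
    for a, b in zip(path, path[1:]):
        fid |= lids[(a, b)]
    return fid

# Hypothetical topology: publisher P, subscriber S, two forwarding nodes
adj = {"P": ["N1"], "N1": ["P", "N2", "S"], "N2": ["N1", "S"], "S": ["N1", "N2"]}
lids = {("P", "N1"): 0b0001, ("N1", "N2"): 0b0010,
        ("N1", "S"): 0b0100, ("N2", "S"): 0b1000}

path = shortest_path(adj, "P", "S")
print(path, bin(build_fid(path, lids)))
```

For several subscribers, the FIDs of the individual paths could be OR-ed together in the same way to describe a delivery tree.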
Dissemination Strategies
• Currently 5 strategies are implemented
– These strategies are used for choosing the scope of information visibility in a network
1. Node-local
– IPC
2. Link-local
– A node can create information graphs
a) locally: accessible to physical neighbours
b) remotely: accessible to this node
– Link IDs are provided by applications
Dissemination Strategies
3. Intra-domain
– End-hosts use an FID to a dedicated RV to create information graphs and to subscribe to scopes and information items
– Publishers assign FIDs (to subscribers) to individual information items
4. Subscribe locally
– Do not send anything to any RV
5. Implicit rendezvous
– Publish the data immediately using the provided FID
A Blackadder Network
• All network nodes run the same software
– Blackadder runs in user space or kernel space in the nodes
• Configurations can be different
– End-nodes are configured to have link access (LID) and access to dedicated rendezvous (RV) nodes (with an FID)
– Dedicated forwarding nodes run only the forwarding element (and other elements if additional functionality, e.g. caching, is required)
– Dedicated RV and TM nodes: any node can be an RV node (an FID is required to reach it); TM nodes run a Topology Manager (TM) application
– A deployment tool can be used for generating configuration files and deploying them in a network
– Network attachment component for dynamic settings
Simple API Example
Publisher
ba = Blackadder(True)
ba.publish_scope(sid, "", DOMAIN_LOCAL, None)
ba.publish_info(rid, sid, DOMAIN_LOCAL, None)
ev = Event()
ev.type = 0
# Wait for the START_PUBLISH event
while ev.type != START_PUBLISH:
    ba.getEvent(ev)
# Publish each line read from standard input
while True:
    data = raw_input()
    ba.publish_data(sid+rid, DOMAIN_LOCAL, None, data, len(data))
(This example uses a Python API that is wrapped on top of a C++ API library that translates API calls to messages that are passed through IPC sockets.)
Subscriber
ba = Blackadder(True)
ba.subscribe_info(rid, sid, DOMAIN_LOCAL, None)
ev = Event()
# Print the payload of every PUBLISHED_DATA event
while True:
    ba.getEvent(ev)
    if ev.type == PUBLISHED_DATA:
        print ev.data[:ev.data_len]
Blackadder availability
• Open source (GPLv2 / BSD)
• Code, documentation, etc.
– http://www.fp7-pursuit.eu/
– https://github.com/georgeparisis/blackadder
• Current release: v0.2beta (in GitHub)
BLACKHAWK
Blackhawk
• Pub/Sub prototype that implements the core ideas from PSIRP
• Blackboard-based architecture
• Integrated with the OS kernel
– E.g., virtual memory management
• Objectives: efficiency, natural interface, object deduplication, etc.
• Works in FreeBSD
Publications as Memory Objects
• A publication is an object in the blackboard, i.e., in the computer’s memory
– A (concept) publication is identified by an RId
– A version is a specific piece of data identified by a vRId (version-RId: hash tree root)
– A page is a block of data identified by a pRId (page-RId: hash of content)
• Sub-object relationships
– Concept publications can have several different versions
– Versions have a specific set of pages in a specific order
• Scopes are special publications that are identified by SIds and store collections of RIds
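Content-derived page and version identifiers can be sketched like this. The slides do not say which hash function or tree shape Blackhawk uses, so SHA-256, the binary hash tree with a duplicated last node, the 4096-byte `PAGE_SIZE` and the function names are all assumptions made for this illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes

def page_ids(data: bytes):
    """Split data into pages and return the SHA-256 digest of each page."""
    pages = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)] or [b""]
    return [hashlib.sha256(p).digest() for p in pages]

def version_id(prids):
    """Root of a binary hash tree built over the ordered page IDs."""
    level = list(prids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Two versions that differ only in their second page
v1 = b"A" * PAGE_SIZE + b"B" * PAGE_SIZE
v2 = b"A" * PAGE_SIZE + b"C" * PAGE_SIZE
p1, p2 = page_ids(v1), page_ids(v2)
print(p1[0] == p2[0], version_id(p1) == version_id(p2))
```

Identical pages hash to the same identifier, which is what makes deduplication between versions possible, while any changed page changes the version identifier.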
Blackboard: Objects
• Publication
– A piece of content
– Related metadata (identifiers, size, type, …)
• Objects have their own identifiers
– E.g. 256 bits; an opaque or a hierarchical structure
– Could be tied to the data and/or an entity
– Single global identifier space assumed (by default)
• Scope
– Collection of data publications (their IDs)
– Information aggregation, access control
• Data
– Placeholder for a “concept”, i.e., mutable content
• Version
– Immutable instance of a data publication
• Page
– A chunk of actual data (e.g. in the OS kernel or in network packets)
– E.g., 4096 bytes
Object Hierarchy
[Figure: object hierarchy. Levels from top to bottom: root scope (Scope 0); subscopes (Scope 1, Scope 2); publications (Pub 1 to Pub 4); versions (Version 1 to Version 5); pages (Page 1 to Page 12, ...)]
Pub/Sub API: Operations
• Create
– Create a piece of content to be published
– I.e., allocate virtual memory objects for data and metadata
• Publish
– Make content available to others
– Results in a new version
• Subscribe
– Request and get content
• Register, Listen
– Get notified about publication events (e.g., when a new version appears)
Conceptual API
handle := create(size)
publish(sid, rid, handle)
handle := subscribe(sid, rid)
events := listen(handles[])
• handle: pointers to data and metadata of a memory object
• sid: identifies a scope
• rid: identifies a publication
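To make the create / publish / subscribe cycle concrete, here is a toy in-memory model of the conceptual API. It is only a sketch: the `Handle` and `Blackboard` classes are hypothetical, data is copied rather than mapped through the virtual memory system, and `listen` is left out.

```python
class Handle:
    """Stands in for the pointers to data and metadata of a memory object."""
    def __init__(self, size):
        self.data = bytearray(size)
        self.meta = {"size": size, "version": None}

class Blackboard:
    def __init__(self):
        self.scopes = {}   # sid -> {rid -> list of immutable versions}

    def create(self, size):
        return Handle(size)

    def publish(self, sid, rid, handle):
        versions = self.scopes.setdefault(sid, {}).setdefault(rid, [])
        versions.append(bytes(handle.data))   # each publish adds a new version
        handle.meta["version"] = len(versions)

    def subscribe(self, sid, rid):
        versions = self.scopes[sid][rid]
        handle = Handle(len(versions[-1]))
        handle.data[:] = versions[-1]         # private, modifiable copy
        handle.meta["version"] = len(versions)
        return handle

bb = Blackboard()
h = bb.create(2)
h.data[:] = b"hi"
bb.publish("sid1", "rid1", h)

s = bb.subscribe("sid1", "rid1")
s.data[0] = ord("H")                          # modify the subscribed copy
bb.publish("sid1", "rid1", s)                 # re-publish as a new version
print(bb.scopes["sid1"]["rid1"])
```

The earlier version stays untouched after the second publish, mirroring the idea that versions are immutable and that publishing always results in a new one.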
System Architecture
[Figure: system architecture. Labels: RZV node; Kernel-level Click; Blackboard; Userlevel Click; Data publisher; Data subscriber; RZV client; pub/sub API; fs; RZV if; Forwarder; socket; Kernel interface; Userlevel interface; TM; Network devices]
Blackboard
[Figure: blackboard components. Labels: Blackboard; System call interface; Virtual memory system; Kernel-level interface; File system; Kernel events; Internal data structures; File system, kevents; Pub/Sub API library; Pub/Sub applications]
VM System Integration
• Motivation: We want to achieve efficiency, a natural interface and object deduplication
• Existing FreeBSD VM system data structures utilized:
– vm_page_t
– vm_object_t
– vm_map_t, vm_map_entry_t
– ...
• In our system, for each publication, we have a VM object for metadata and data
VM System Integration
• Metadata object
– One page (currently)
– Object’s own ID, its size, etc.
– List of sub-object IDs (Pub: versions; Version: pages)
• Data object
– Pages contain actual content
VM System Integration
• Metadata and data objects mapped to applications’ memory spaces (when created or subscribed to)
• Data is copy-on-write
– Can be modified: this results in a new shadow object, and unmodified pages are shared (they don’t need to be copied)
– Re-publishing results in a new version that can be subscribed to
File System Integration
• Each publication has a corresponding vnode in the kernel
• Applications get an open file descriptor in the “handle”
– After publish or in subscribe
• Enables the use of kevents
– We use it to get notifications when somebody publishes (or subscribes to) something
File System Integration
• A new file system type, psfs
• File system view to the blackboard
– E.g.: /pubsub/sid/rid/vrid/prid/data
– Data/metadata can be accessed on different levels in the object hierarchy
– In theory, we can also map file system ops to pub/sub ops
• Could be used for enabling demand paging over the network as well
– Together with a pull-based caching-enabled transport protocol
[Figure: directory tree under /pubsub with entries /sid1, /sid2, /rid1, data, meta, /vrid1 ...]
In-kernel Rendezvous
• Publication Index (pubi)
– Each scope, data publication and version (and page) has this small additional data structure for auxiliary in-kernel metadata
– Holds pointers to metadata and data VM objects and a vnode, filesystem-related information, etc.
• Publication Index Table (PIT)
– UMA zone-based storage
– Hash table with ID → pubi mappings
– All identifiers are accessible on the same hierarchical level
– Used for (recursive) object lookups in the blackboard: ID → pubi → metadata and/or data → sub-obj. ID → …
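The lookup chain ID → pubi → metadata → sub-object ID can be mimicked with an ordinary dictionary. The following is a user-space sketch, not the kernel data structure: the `Pubi` class, the string identifiers and the helpers `resolve` and `version_data` are hypothetical names introduced for this example.

```python
class Pubi:
    """Toy publication index entry: object kind, sub-object IDs, optional data."""
    def __init__(self, kind, sub_ids=(), data=None):
        self.kind = kind              # "scope", "pub", "version" or "page"
        self.sub_ids = list(sub_ids)  # sub-object IDs taken from the metadata
        self.data = data

# Flat table: every identifier is stored on the same level
pit = {
    "sid1":  Pubi("scope", ["rid1"]),
    "rid1":  Pubi("pub", ["vrid1"]),
    "vrid1": Pubi("version", ["prid1", "prid2"]),
    "prid1": Pubi("page", data=b"hello "),
    "prid2": Pubi("page", data=b"world"),
}

def resolve(pit, *ids):
    """Walk ID -> pubi -> sub-object ID, checking each parent/child link."""
    pubi = pit[ids[0]]
    for next_id in ids[1:]:
        if next_id not in pubi.sub_ids:
            raise KeyError(next_id)
        pubi = pit[next_id]
    return pubi

def version_data(pit, vrid):
    """Concatenate the pages of a version in their stored order."""
    return b"".join(pit[prid].data for prid in pit[vrid].sub_ids)

print(resolve(pit, "sid1", "rid1", "vrid1").kind)
print(version_data(pit, "vrid1"))
```

Keeping all identifiers in one flat table means any object can be found in a single hash lookup, while the hierarchy itself lives only in the sub-object ID lists.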
In-kernel Rendezvous
[Figure: PIT pages hold ID → pubi entries; a Publication Index (pubi) holds a pointer to the metadata; the metadata page holds the ID, size, sub-object count, etc., followed by the sub-object IDs]
Networking: Basic Protocol
[Figure: message sequence between P, R and S. Phases: Publish, Subscribe, Rendezvous. Messages: Subscriber set update (MD SUB), Version metadata, Data subscription, Data. Other labels: RC, RN, DP, DS]
API
• Native C API: the libpsirp pub/sub library
• Wrappers for Python and Ruby
– Generated with SWIG and additional C and Python/Ruby code
– The API for Python is object-oriented
C API
• Header
– #include <libpsirp.h>
• Types
– Identifiers: psirp_id_t (array)
– Handle: psirp_pub_t (pointer)
• Primitives
– psirp_create(), psirp_subscribe(), psirp_subscribe_sync(), psirp_publish(), psirp_free()
• Accessors
– For data, length, identifiers, fd, …
– psirp_pub_data(psirp_pub_t pub), psirp_pub_data_len(psirp_pub_t pub), …
• Events
– psirp_kq_t
– or standard kqueue() and kevent() calls
Very Simple API Example
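#include <stdint.h>
#include <libpsirp.h>
void pubsub(psirp_id_t *sid, psirp_id_t *rid) {
    psirp_pub_t pub;
    /* Subscribe to get a handle; give up if the call fails */
    if (psirp_subscribe(sid, rid, &pub, 0x0) != 0)
        return;
    /* Modify the first two bytes of the data */
    uint8_t *data = (uint8_t *)psirp_pub_data(pub);
    data[0] = 'a';
    data[1] = 'b';
    /* Re-publish the modified content */
    psirp_publish(sid, rid, pub, 0x0);
    psirp_free(pub);
}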
Blackhawk
• Open source (GPLv2 / BSD)
• Documentation, source code, VM images, etc.
– http://www.psirp.org
– http://code.psirp.org
– http://users.piuha.net/blackhawk/
• Current release: v0.3 – in this presentation we described a more developed version
Summary
• Two information-centric pub/sub prototypes
• Different approaches
– Channel vs Document
– Not presented: Algorithmic IDs
• Blackadder implements PURSUIT’s functional model
• Blackhawk implements PSIRP’s memory object model
• Similar APIs, similar architectural components
– Ongoing work: Integration
BLACKADDER DEMO