distributed systems - philadelphia

Distributed Systems: Naming + locating

Ville Leppanen

based on slide material by prof. Penttonen

Distributed systems, V. Leppanen & M. Penttonen, 2007 – p.1/52

Lectures

Introduction, 2h

Communication, 2h

Processes, 2h

Naming, 2h

Synchronization, 2h

Consistency and replication, 4h

Fault tolerance, 2h

Security, 2h

Example systems, 4h (CORBA, NFS?, WWW?)

Contents

Basic concepts

Structured naming

Name space; Name resolution; Implementation

Example: Domain Name System (DNS)

Flat naming and locating

Mobile entities

Problems

Solutions: simple, home-based, hierarchical

Removing unreferenced entities

Tracking methods; Actual removal

Summary

(Excluding attribute-based naming: as homework.)


Some background

Names of distributed entities – for whom? For humans?

When a computer system uses shared resources, there is a need for naming. If the system is distributed, naming is not trivial.

In order to access the named resources, there must be a method for resolving the names.

An increasing number of systems have mobile components, which brings new problems of naming and resolving names.

One more problem is the management of the naming system as new systems are born and old ones die.


Concepts

Identifiers are used to name distributed entities, e.g. hosts, printers, disks, files, directories, users, mailboxes, newsgroups, web pages, ...

To operate on an entity, the entity needs to provide an access point. The name of an access point is often called an address.

It might be tempting to use an address as the name of a distributed entity. Wrong! The address changes when the entity moves or gets a new access point.

Instead, use location-independent names as the human-friendly names of entities.


True identifier

The purpose of a name is to identify an entity and act as an access point to it. To fulfil the role of a true identifier,

a name refers to only one entity

different names refer to different entities

a name always refers to the same entity (i.e. it is not reused)

Names are very seldom true identifiers.


Structured naming

Instead of a flat name (= identifier), an entity can have a structured name. Structured names are often composed of human-readable names.

Name spaces

Name resolution

Implementation

Example: DNS


Name spaces 1/2

A name space is the set of all possible names in a naming system.

One method to define structured names is to use a naming graph, which is a directed acyclic graph. The nodes of the naming graph correspond to the entities of the system.

There are a number of root nodes of indegree zero, and leaf nodes of outdegree zero. Nonleaf nodes are called directory nodes. Nodes are labelled with node identifiers.


Illustration

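[Figure: a naming graph with root node n0. Arcs labelled "home" and "keys" leave n0, leading to directory node n1 and leaf node n5. Directory node n1 stores the table (n2: "elke", n3: "max", n4: "steen"). Leaf labels include ".twmrc" and "mbox". Leaf n5 is reachable as both "/keys" and "/home/steen/keys"; "/home/steen/mbox" names a leaf below n4. Legend: directory node, leaf node.]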


Name spaces 2/2

The directed arcs of a naming graph are labelled with arc names. For name resolution, each directory node stores a directory table consisting of entries of the form (node identifier, arc label). If the directory node n1 contains (n2, lab), there is an arc from n1 to n2 labelled with lab.

absolute / relative path name

Examples of such naming systems are the Internet Domain Name System and the file directory tree of Unix.
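The directory tables described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tables are keyed by arc label for convenience, the node numbering loosely follows the example figure, and node n7 (the mbox leaf) and the function name resolve are made up for the sketch.

```python
# Hypothetical naming graph: each directory node stores a table that maps
# an arc label to the identifier of the node the arc leads to.
DIRECTORY_TABLES = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"elke": "n2", "max": "n3", "steen": "n4"},
    "n4": {"mbox": "n7", "keys": "n5"},   # n7 is an assumed identifier
}
ROOT = "n0"


def resolve(path, start=ROOT):
    """Follow the arc labels of `path`; an absolute path starts at ROOT."""
    node = ROOT if path.startswith("/") else start
    for label in filter(None, path.split("/")):
        table = DIRECTORY_TABLES.get(node)
        if table is None or label not in table:
            raise KeyError(f"cannot resolve {label!r} at node {node}")
        node = table[label]
    return node
```

With these tables, the absolute names "/keys" and "/home/steen/keys" both resolve to n5, and the relative name "steen/mbox" resolves from n1.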


Name resolution

In order to access an entity, we do name resolution, where we have to know

where to start

how to proceed

A closure mechanism determines how and where to start. For example, it may tell how to find the root node of a network, or the root of a file system.

If the name is given as a path in the naming graph, the requested entity is found by following the path. A path name can be relative, starting at the current node, or absolute, starting at the root determined by the closure mechanism.


Aliases, linking

An alias is another name for the same entity. Two common methods for aliasing are hard links and symbolic links.

In the previous figure, /keys and /home/steen/keys are hard links to n5.

A symbolic link from node n to node m can be implemented by storing the absolute path of m in node n. In the following figure, node n6 is a symbolic link to n5.


Illustration

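[Figure: the same naming graph with a symbolic link. Node n6, reached as "/home/steen/keys", is a leaf node whose stored data is the path name "/keys"; resolving that path leads to n5. Directory node n1 still stores the table (n2: "elke", n3: "max", n4: "steen"). Legend: directory node, leaf node.]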
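A symbolic link can be handled during resolution by restarting the resolution on the stored path. The following is a minimal sketch, assuming a reduced version of the example graph; the table layout, the max_links guard and the function name are illustrative choices, not part of the slides.

```python
# Hypothetical reduced naming graph. n6 is a symbolic link: instead of
# data it stores the absolute path name of the node it refers to.
TABLES = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"keys": "n6"},
}
SYMLINKS = {"n6": "/keys"}


def resolve(path, max_links=8):
    """Resolve an absolute path, following symbolic links on the way."""
    node = "n0"
    for label in filter(None, path.split("/")):
        node = TABLES[node][label]
        if node in SYMLINKS:
            if max_links == 0:
                raise RuntimeError("too many symbolic links")
            # Restart resolution on the path stored in the link node.
            node = resolve(SYMLINKS[node], max_links - 1)
    return node
```

Both "/keys" (a direct arc) and "/home/steen/keys" (through the link n6) end at n5. The link counter only guards against links that point at each other.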


Mounting in file systems

A name space NS2 can be merged into a name space NS1 by mounting the root of NS2 on a node of NS1. With Sun's NFS (Network File System), a remote file system may be mounted. For example, the directory denoted by the URL nfs://flits.cs.vu.nl/home/steen can be mounted as if it were the local node /remote/vu, see figure.

Almost transparent!

The only problem is that the naming on the remote machine might be different!


Mounting illustration

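[Figure: mounting a remote name space. Machine A runs a name server whose name space contains the nodes keys and remote/vu; the node /remote/vu stores the reference "nfs://flits.cs.vu.nl//home/steen" to the foreign name space. Machine B, reached over the network, runs the name server for the foreign name space, which contains home/steen/mbox.]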


unix.utu.fi

unix.utu.fi:~> df
/ (/dev/dsk/c0t0d0s0 ): 4407460 blocks 558530 files
/proc (/proc ): 0 blocks 15146 files
/etc/mnttab (mnttab ): 0 blocks 0 files
/dev/fd (fd ): 0 blocks 0 files
/var (/dev/dsk/c0t0d0s3 ): 3330528 blocks 485056 files
/var/run (swap ): 7290896 blocks 97134 files
/tmp (swap ): 7290896 blocks 97134 files
/export (/dev/dsk/c0t0d0s7 ): 6583290 blocks 403504 files
/usr/local/contrib (/dev/dsk/c0t0d0s4 ): 403998 blocks 413798 files
/winhome (filer.utu.fi:/vol/vol0/winhome): 25907080 blocks 9329185
/home (filer.utu.fi:/vol/vol0/unixhome): 68758456 blocks
/www/users (filer.utu.fi:/vol/vol1/wwwusers): 19181200 blocks 14442352
/www/docs (filer.utu.fi:/vol/vol1/wwwdocs): 41308776 blocks 14442352
/net/filer/vol/vol0/software (filer:/vol/vol0/software): 6201936 blocks


Implementing name space

Name resolution and name space management of a large distributed system are organised hierarchically:

The global level has the highest-level directory nodes, which are rarely changed. They are managed by different organizations.

The administrational level consists of nodes that are managed within a single organization. The nodes are relatively stable.

The managerial level contains low-level nodes of a single organization. They may be managed by system administrators but also by single users of distributed systems.


Name space distribution

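[Figure: an example partitioning of the DNS name space into three layers. The global layer holds the top-level nodes com, edu, gov, mil, org, net, jp, us and nl. Below them, the administrational and managerial layers hold nodes such as sun, yale, acm, ieee, ac, co, keio, nec, oce, vu, eng, cs, csl, ai, linda, robot, jack, jill, pc24, ftp, www, pub, globe and index.txt. One subtree is marked as a zone.]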


Characteristics of the layers

Item                            Global      Administrational   Managerial
Geographical scale of network   Worldwide   Organisation       Department
Total number of nodes           Few         Many               Vast numbers
Responsiveness to lookups       Seconds     Milliseconds       Immediate
Update propagation              Lazy        Immediate          Immediate
Number of replicas              Many        None or few        None
Client-side caching used?       Yes         Yes                Sometimes


Implementation of name resolution

Two basic strategies for local name resolver:

In iterative name resolution, the client first asks the root node of the graph, which returns the address of its descendant node. This is repeated until all arcs of the name path have been processed. In the next figure the path ftp.cs.vu.nl is resolved.

In recursive name resolution, partial results are not immediately returned; instead, a node forwards the unresolved part of the path to the respective descendant node, see figure.

Recursive resolution demands more performance from the name servers, but it has advantages: it is easier for the client, needs less communication, and allows the servers to cache results for future name resolutions.

(Other: hybrid; local cache; local name server first.)


Iterative resolution

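[Figure: iterative resolution of <nl,vu,cs,ftp>. The client's name resolver talks in turn to the root name server and to the name servers of the nl, vu and cs nodes. #<x> denotes the address of the node named x. The cs and ftp nodes are managed by the same server.]

1. Resolver to root name server: <nl,vu,cs,ftp>

2. Root name server to resolver: #<nl>, <vu,cs,ftp>

3. Resolver to name server of nl node: <vu,cs,ftp>

4. Name server of nl node to resolver: #<vu>, <cs,ftp>

5. Resolver to name server of vu node: <cs,ftp>

6. Name server of vu node to resolver: #<cs>, <ftp>

7. Resolver to name server of cs node: <ftp>

8. Name server of cs node to resolver: #<ftp>

The resolver receives <nl,vu,cs,ftp> from the client and returns #<nl,vu,cs,ftp>.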


Recursive resolution

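[Figure: recursive resolution of <nl,vu,cs,ftp>. The client's name resolver only talks to the root name server; each name server forwards the rest of the path to the next one.]

1. Resolver to root name server: <nl,vu,cs,ftp>

2. Root name server to name server of nl node: <vu,cs,ftp>

3. Name server of nl node to name server of vu node: <cs,ftp>

4. Name server of vu node to name server of cs node: <ftp>

5. Name server of cs node to name server of vu node: #<ftp>

6. Name server of vu node to name server of nl node: #<cs,ftp>

7. Name server of nl node to root name server: #<vu,cs,ftp>

8. Root name server to resolver: #<nl,vu,cs,ftp>

The resolver receives <nl,vu,cs,ftp> from the client and returns #<nl,vu,cs,ftp>.

Caching of resolved names possible!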
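The two strategies can be compared with a small simulation. This is a sketch under assumptions: the SERVERS table, the "#..." address strings and the message counting are invented for illustration, and every request/reply pair is counted as two messages, as in the figures.

```python
# Hypothetical name servers: each maps a label to (address, next server).
SERVERS = {
    "root": {"nl": ("#nl", "nl")},
    "nl": {"vu": ("#vu", "vu")},
    "vu": {"cs": ("#cs", "cs")},
    "cs": {"ftp": ("#ftp", None)},
}
PATH = ["nl", "vu", "cs", "ftp"]
CACHE = {}   # (server, remaining path) -> address, filled by recursive()


def iterative(path):
    """The client contacts every server itself; returns (address, messages)."""
    server, messages, address = "root", 0, None
    for label in path:
        address, server = SERVERS[server][label]   # one request, one reply
        messages += 2
    return address, messages


def recursive(path, server="root"):
    """Each server forwards the rest of the path and caches the answer."""
    key = (server, tuple(path))
    if key in CACHE:
        return CACHE[key], 2                       # request + cached reply
    address, child = SERVERS[server][path[0]]
    messages = 2
    if len(path) > 1:
        address, more = recursive(path[1:], child)
        messages += more
    CACHE[key] = address
    return address, messages
```

In this model both strategies need eight messages for the first lookup of PATH. A repeated recursive lookup is answered from the root server's cache with two messages, while the iterative client (which has no cache here) needs eight again.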

DNS

Probably the most widely used naming system is the Domain Name System (DNS) of the Internet.

The name space of DNS is a hierarchically organized rooted tree. A subtree consisting of the descendants of a node is a domain. The path name from this node to the root is the domain name of this domain. Important information needed in name resolution is stored in the resource records of the node. The most important types of resource records are given in the following table. In the table, SOA (Start of Authority) starts a zone administered by this node. Canonical names can be aliased.


Syntax of DNS records

Type    Entity   Description
SOA     Zone     Information about the represented zone
A       Host     IP address of the host this node represents
MX      Domain   Mail server serving this node
SRV     Domain   Server handling a specific service
NS      Zone     Name server implementing the represented zone
CNAME   Node     Symbolic link with the primary name of the node
PTR     Host     Contains the canonical name of a host
HINFO   Host     Some information about the host represented
TXT     Any      Any entity-specific information


Example of DNS records in a NS

Name              Type    Value
cs.vu.nl          SOA     star (1999121502,7200,3600,2419200,86400)
cs.vu.nl          NS      star.cs.vu.nl
cs.vu.nl          NS      top.cs.vu.nl
cs.vu.nl          NS      solo.cs.vu.nl
cs.vu.nl          TXT     Vrije Universiteit - Math and Comp. Sci
cs.vu.nl          MX      1 zephyr.cs.vu.nl
cs.vu.nl          MX      2 tornado.cs.vu.nl
cs.vu.nl          MX      3 star.cs.vu.nl
star.cs.vu.nl     HINFO   Sun Unix
star.cs.vu.nl     MX      1 star.cs.vu.nl
star.cs.vu.nl     MX      10 zephyr.cs.vu.nl
star.cs.vu.nl     A       130.37.24.6
star.cs.vu.nl     A       192.31.231.42
zephyr.cs.vu.nl   HINFO   Sun Unix
zephyr.cs.vu.nl   MX      1 zephyr.cs.vu.nl
zephyr.cs.vu.nl   MX      2 tornado.cs.vu.nl
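To show how such records are used, the sketch below stores a few rows of the example table as tuples and looks them up by name and type. The tuple layout and the helper names lookup and mail_servers are assumptions made for this illustration; a real resolver queries name servers rather than a local list. The number in front of an MX value is a preference, and the lowest number is tried first.

```python
# A few rows from the example table, as (name, type, value) tuples.
RECORDS = [
    ("star.cs.vu.nl", "MX", "1 star.cs.vu.nl"),
    ("star.cs.vu.nl", "MX", "10 zephyr.cs.vu.nl"),
    ("star.cs.vu.nl", "A", "130.37.24.6"),
    ("star.cs.vu.nl", "A", "192.31.231.42"),
    ("zephyr.cs.vu.nl", "HINFO", "Sun Unix"),
]


def lookup(name, rtype):
    """Return the values of all records with the given name and type."""
    return [value for n, t, value in RECORDS if n == name and t == rtype]


def mail_servers(name):
    """Return the mail servers for `name`, most preferred first."""
    pairs = (value.split() for value in lookup(name, "MX"))
    ranked = sorted((int(pref), host) for pref, host in pairs)
    return [host for _, host in ranked]
```

Here a host with two A records simply has two addresses, and mail for star.cs.vu.nl is offered to star itself before zephyr.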


DNS implementation

All zones with their DNS records for local name servers represent the administrational layer.

Zones often have several secondary name servers besides one primary. The primary accesses the database of DNS records directly. A secondary only accesses a copy of it (obtained by a zone transfer).

13 servers for the root zone.

Root servers hold the list of addresses of the authoritative servers for the top-level domains.


Mobile entities

More and more computer systems contain mobile components. How should they be named and how can the names be resolved?

Human-friendly names are often domain related, which causes problems when the domain changes. A solution is to separate the naming service and the location service!


Two level mapping

[Figure: (a) direct, single-level mapping from several names to several addresses. (b) Two-level mapping: a naming service maps the names to one entity ID, and a location service maps the entity ID to the addresses.]

Based on identifiers!


Locating under flat naming

Broadcasting / multicasting (local area only)

Forwarding pointers

Home-based approaches

Hierarchical approaches


Broadcasting / multicasting

Broadcasting: A query “who has this address?” is broadcast to the network, and the station having that address answers with its lower-level address. This technique, used by the Address Resolution Protocol (ARP), works in local area networks but is inefficient in large networks. (IP address → Ethernet address)

Multicasting can be used to decrease network traffic. Moving entities register themselves with a multicast group, and queries are sent to the multicast group.
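The broadcast idea can be mimicked with a toy model. This is not the ARP packet format, only a sketch of the principle; the Station class, the example addresses and the returned count of disturbed stations are assumptions for illustration.

```python
class Station:
    """A host on a local network with an IP and an Ethernet (MAC) address."""

    def __init__(self, ip, mac):
        self.ip, self.mac = ip, mac

    def on_query(self, wanted_ip):
        # Only the station that owns the address answers the broadcast.
        return self.mac if wanted_ip == self.ip else None


def broadcast_resolve(stations, wanted_ip):
    """Ask every station; return (mac or None, stations that had to listen)."""
    answers = [s.on_query(wanted_ip) for s in stations]
    mac = next((a for a in answers if a is not None), None)
    return mac, len(answers)


# Hypothetical three-host LAN using documentation addresses.
LAN = [
    Station("192.0.2.1", "aa:aa:aa:aa:aa:01"),
    Station("192.0.2.2", "aa:aa:aa:aa:aa:02"),
    Station("192.0.2.3", "aa:aa:aa:aa:aa:03"),
]
```

Every query interrupts all stations, which is why the cost grows with the size of the network. With multicasting, only the members of the group would appear in the list passed to broadcast_resolve.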


Forwarding pointers

Idea: When a mobile station comes to a new zone (e.g. ftp.cs.vu.nl comes to cs.unisa.edu.au), it is given a new name under the new zone, and the old name becomes a link to the new name. If the station moves again, another link is added.

Expensive to manage and access. Chains can become very long.

Chains can break.

Applies to distributed objects.
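A chain of forwarding pointers, and the common trick of shortening it after a successful lookup, can be sketched as follows. The location names and the dictionary representation are hypothetical; a missing entry models a broken chain and raises KeyError.

```python
# Hypothetical chain: every old location points to the next one; the
# location that currently holds the entity points to None.
forward = {"vu.nl": "unisa.edu.au", "unisa.edu.au": "utu.fi", "utu.fi": None}


def locate(start, forward):
    """Follow the pointers from `start`; return (location, hops taken)."""
    visited, here = [], start
    while forward[here] is not None:
        visited.append(here)
        here = forward[here]
    for old in visited:
        # Shortcut: let every visited location point directly to the end.
        forward[old] = here
    return here, len(visited)
```

The first lookup from "vu.nl" takes two hops; after the shortcut has been installed, the same lookup takes one. The sketch assumes the chain has no cycles.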


Forwarding: proxy / skeleton

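[Figure: forwarding pointers implemented as (proxy, skeleton) pairs across processes P1 to P4. A process invokes its proxy p locally; the proxy reaches a skeleton in another process through interprocess communication, and the chain ends at the object. An identical proxy p' refers to the same skeleton as proxy p, and an identical skeleton may exist as well.]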


Forwarding: redirecting

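[Figure: redirecting a forwarding pointer. (a) An invocation request is sent to the object along the chain. (b) The skeleton at the object's current process returns the current location, and the client proxy sets a shortcut. The intermediate skeleton is then no longer referenced by any proxy.]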


Home-based location

In home-based name resolution, servers maintain visitor registers. Resolution works as follows:

Whenever a mobile host moves to another network, it gets a temporary care-of address.

The care-of address is forwarded to the home location, where a home agent is set up.

Whenever a request/packet arrives for the mobile host, the home agent knows how to forward it to the care-of address. It also informs the sender of the care-of address.

Efficiency can be improved by checking the local visitor register first.

Used as a fall-back mechanism for e.g. forwarding pointers.


Principle of Mobile IP

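[Figure: messages between the client's location, the host's home location and the host's present location.]

1. The client sends a packet to the host at its home location.

2. The home location returns the address of the host's current location to the client.

3. The home location tunnels the packet to the current location.

4. The client sends successive packets to the current location.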
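The role of the home agent in these steps can be sketched as a small lookup table. This is not the Mobile IP protocol itself; the class, the method names and the example addresses are assumptions chosen for illustration.

```python
class HomeAgent:
    """Keeps the current care-of address of each mobile host it serves."""

    def __init__(self):
        self.care_of = {}                 # home address -> care-of address

    def register(self, home_addr, care_of_addr):
        """Called when the mobile host obtains a care-of address."""
        self.care_of[home_addr] = care_of_addr

    def route(self, home_addr):
        """Return (where the packet is sent, address reported to the client)."""
        current = self.care_of.get(home_addr)
        if current is None:
            return home_addr, None        # host is at home, nothing to report
        return current, current           # tunnel the packet, inform the client
```

After the first packet the client knows the care-of address and can send successive packets directly, so the home agent is involved only once per move.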


About hierarchical approaches

Basic idea: Information is maintained and searched hierarchically.

If the local leaf domain has no knowledge of the target, the request is forwarded to the enclosing domain.

The request is forwarded until knowledge about the whereabouts of the target is found.

Actual information is maintained in leaf domains.

Internal domains only know the “direction” of the information.

Distributed “object” system Globe: http://www.cs.vu.nl/globe/


Hierarchy of domains

[Figure: a tree of domains. The top-level domain T has the root directory node dir(T). A subdomain S of T (S is contained in T) has its own directory node dir(S), and leaf domains are contained in S.]

Real info is kept at leaf domains; other domains just have pointers.


Two addresses in different domains

[Figure: entity E has addresses in two leaf domains, D1 and D2. The location record for E at node M has one field per subdomain: a field for domain dom(N) with a pointer to N, and fields with no data. A leaf node holds a location record with only one field, containing an address.]

(Replicated node.)

Searching location

[Figure: a look-up request starts in leaf domain D. A node that has no record for E forwards the request to its parent. A node M that knows about E forwards the request to its child.]
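The search described above can be sketched with a tree of directory nodes. The class and function names are made up for this illustration, and each entity is assumed to have a single address; replicated entities with several addresses are left out.

```python
class DirNode:
    """Directory node of a domain; the root node has no parent."""

    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        # entity -> child DirNode (pointer), or an address at a leaf node
        self.records = {}


def insert(leaf, entity, address):
    """Store the address at the leaf and pointers on the path to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(start_leaf, entity):
    """Go up until a node knows the entity, then follow pointers down."""
    node = start_leaf
    while entity not in node.records:
        if node.parent is None:
            raise KeyError(entity)
        node = node.parent
    target = node.records[entity]
    while isinstance(target, DirNode):
        target = target.records[entity]
    return target
```

A lookup that starts in the entity's own leaf domain never leaves it, which is the locality benefit of the hierarchy; a lookup from a distant leaf climbs only as far as the first common ancestor.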


Garbage collection 1/2

Naming and location provide a global referencing service for entities. As long as an entity is referred to, it can be accessed. On the other hand, as soon as an entity can no longer be accessed, it should be removed. In other words, garbage entities should be collected. How?

Accessible entities can be characterized by the reference graph: The nodes of the graph are the entities. If an entity M refers to an entity N, there is an arc from M to N. An entity is accessible if there is a path from a root to it.


Problem in general

[Figure: a reference graph showing the root set, entities reachable from the root set, entities unreachable from the root set, and entities forming an unreachable cycle.]


Garbage collection 2/2

Another method to check the accessibility of an entity is to count references to it. Each entity has a reference counter. Each time a reference to the entity is created, its reference counter is incremented. When a reference is removed, the reference counter is decremented. When the reference counter reaches zero, the entity can be removed.

Garbage collection is important in distributed systems, since they are supposed to run for a long time.

There is often also a huge number of distributed entities.

Basically, some kind of tracking information must be maintained.

Garbage collection then uses that information.


Tracking information

We must somehow keep information about the various proxy objects existing in the distributed system.

Simple reference counting

Advanced reference counting

Reference listing


Simple reference counting

Basic idea: just maintain the number of referencing proxy objects.

The information is maintained in the skeleton object.

Increment/decrement → requires communication.

Unreliable communication creates many kinds of problems.

Request lost, reply lost, repeated request. The effect of repeated messages must be cancelled.

A special problem: forwarding a proxy object.
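One way to cancel the effect of repeated messages is to tag each increment or decrement with a message identifier and apply it only once. The sketch below assumes such identifiers exist and that the skeleton may remember them indefinitely; both are simplifications introduced here, not something the slides prescribe.

```python
class Skeleton:
    """Skeleton-side reference counter that ignores retransmissions."""

    def __init__(self):
        self.count = 0
        self.seen = set()              # ids of messages already applied

    def handle(self, msg_id, delta):
        """Apply +1 or -1 once per message id and return the new count."""
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.count += delta
        return self.count              # the reply; it may be lost and re-requested
```

A proxy whose reply was lost can safely resend the same message: the counter changes only the first time the identifier is seen.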


Proxy moving problem

[Figure: two time lines for processes P1, P2 and object O. (a) P1 sends a reference to P2 and then deletes its own reference to O. The decrement (-1) reaches O before P2 informs O that it has a reference (+1), so O has already been removed. (b) P1 first tells O that it will pass a reference to P2 (+1), O acknowledges (ACK) that it knows about P2's reference, and only then does P1 delete its reference to O (-1).]


Advanced reference counting

Basic idea: each proxy has a weight, and the skeleton knows the total weight outside. When passing a proxy object to some place, divide the original weight so that the total remains the same. When a proxy is deleted, inform the skeleton of the lost weight.

Advantage: less communication.

Notice: a name server can be seen as holding a weight for all the names of distributed objects that it maintains.

What if the weight is already the minimum possible? Forwarding solution. Reference generations solution.

Failure of a process ??


Maintaining weights

[Figure: (a) the skeleton of object O starts with total weight 128 and partial weight 128. (b) When a proxy is created in process P, the partial weight at the skeleton is halved to 64 and the proxy gets partial weight 64. (c) P1 passes the reference to P2: P2 gets half of the weight of the proxy at P1 (32 each), while the total and partial weight at the skeleton remain the same.]
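The weight bookkeeping can be sketched as follows. The class and method names are hypothetical, message passing is replaced by direct method calls, and running out of weight simply raises an error instead of applying one of the two solutions mentioned above.

```python
class Skeleton:
    def __init__(self, total=128):
        self.total = total      # weight handed out and not yet returned
        self.partial = total    # weight the skeleton can still give away

    def new_proxy(self):
        """Create a proxy: give away half of the partial weight."""
        if self.partial < 2:
            raise ValueError("skeleton has no partial weight left")
        half = self.partial // 2
        self.partial -= half
        return Proxy(self, half)

    def returned(self, weight):
        """A deleted proxy reports its weight back."""
        self.total -= weight

    def garbage(self):
        # No weight is outside any more once total equals partial.
        return self.total == self.partial


class Proxy:
    def __init__(self, skeleton, weight):
        self.skeleton, self.weight = skeleton, weight

    def copy(self):
        """Pass the reference on: split own weight, no message to skeleton."""
        if self.weight < 2:
            raise ValueError("weight exhausted; cannot split further")
        half = self.weight // 2
        self.weight -= half
        return Proxy(self.skeleton, half)

    def delete(self):
        self.skeleton.returned(self.weight)   # the only message needed
        self.weight = 0
```

Copying a reference involves only the two processes that exchange it, which is where the saving in communication comes from. The price is that a weight of 1 cannot be split again.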


Forwarding solution

[Figure: process P1 has run out of weight and creates a new skeleton s'; P2 refers to the object via P1. The object has no more partial weight left. Legend: total weight, partial weight.]

Not a good solution.


Reference generations

[Figure: P1 passes a reference to P2. Each proxy stores a generation number and a copy counter.]

Skeleton maintains G[i]: the number of copies of generation i. If all are zero, the object is isolated.
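A possible reading of this scheme, written as a sketch: a copy of a generation-k proxy has generation k+1, the copying proxy increments its own copy counter, and a deleted proxy reports its generation and copy counter to the skeleton. The update rule in proxy_deleted is the usual formulation of generation reference counting and is assumed here; the class and method names are invented.

```python
from collections import defaultdict


class Skeleton:
    def __init__(self):
        self.G = defaultdict(int)      # generation -> outstanding copies

    def first_proxy(self):
        self.G[0] += 1
        return Proxy(self, generation=0)

    def proxy_deleted(self, generation, copies):
        self.G[generation] -= 1        # this proxy is gone
        self.G[generation + 1] += copies   # its copies are now accounted for

    def isolated(self):
        return all(v == 0 for v in self.G.values())


class Proxy:
    def __init__(self, skeleton, generation):
        self.skeleton, self.generation, self.copies = skeleton, generation, 0

    def copy(self):
        """Pass the reference on without contacting the skeleton."""
        self.copies += 1
        return Proxy(self.skeleton, self.generation + 1)

    def delete(self):
        self.skeleton.proxy_deleted(self.generation, self.copies)
```

Entries of G can be temporarily negative when a copy is deleted before the proxy it was copied from; the object counts as isolated only when every entry is zero.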


Reference listing

Basic idea: The skeleton maintains a list of all corresponding proxies.

Advantage: adding and removing can be made idempotent, i.e. it does not matter if adding/deleting is done several times.

Unreliable communication is not such a big problem.

Java RMI uses this method.

It is possible to react to process failures.

Disadvantage: a lot to maintain. Scalability?
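Idempotent adding and removing falls out naturally when the list is kept as a set. The sketch below identifies a proxy by a (process, number) pair, which is an assumption made for illustration, and is not a description of how Java RMI implements it.

```python
class Skeleton:
    """Skeleton that keeps a set of the proxies referring to its object."""

    def __init__(self):
        self.proxies = set()

    def add(self, proxy_id):
        self.proxies.add(proxy_id)       # repeating the request changes nothing

    def remove(self, proxy_id):
        self.proxies.discard(proxy_id)   # no error if it is already gone

    def drop_process(self, process):
        """React to a failed process by forgetting all of its proxies."""
        self.proxies = {p for p in self.proxies if p[0] != process}

    def garbage(self):
        return not self.proxies
```

Because a retransmitted add or remove has no further effect, the sender can simply repeat a request until it is acknowledged.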


Identifying the unreachable

Maintaining tracking information works well for non-cyclic references.

Unreachable cycles are possible: they need other methods.

There is really no other choice but to trace what can be reached from the root set. (Then remove what was not reached.)

Two-phase mark-and-sweep garbage collection. In the mark phase, the references are followed starting from the roots, and reached entities are marked. In the sweep phase, the name space is swept exhaustively and unmarked entities are removed.

Very expensive, and not really a scalable solution.

This “stop-the-world” behaviour while doing garbage collection is not good!

There are attempts to solve the problem by dividing distributed entities into groups and operating on one group at a time. Better.
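The two phases can be sketched on a reference graph held in one dictionary. This is a centralized, single-process illustration with invented entity names; it shows the principle, not a distributed collector.

```python
def mark_and_sweep(refs, roots):
    """refs maps each entity to the entities it refers to.

    Removes every entity not reachable from `roots` and returns them.
    """
    marked, stack = set(), list(roots)
    while stack:                              # mark phase
        entity = stack.pop()
        if entity not in marked:
            marked.add(entity)
            stack.extend(refs.get(entity, ()))
    garbage = set(refs) - marked              # sweep phase
    for entity in garbage:
        del refs[entity]
    return garbage


# Hypothetical graph: c and d refer to each other but are unreachable.
REFS = {"root": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
```

The cycle formed by c and d is collected even though each of them still has a non-zero reference count, which is exactly the case that counting alone cannot handle.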


Summary

Distributed entities usually have a human-friendly name (not a true identifier).

Addresses + human-friendly names (+ entity identifiers): name space.

Traditional name space implementations are not good for mobile entities. There are several possible solutions. The naming system of mobile phone networks is a good example.

Distributed systems are supposed to run for a very long time. Thus, removal of unreferenced entities becomes important!

It is harder in distributed systems. There are various ways to keep tracking information.

System failures. Only partial solutions?

