Distributed Systems: Naming + locating
Ville Leppanen
based on slide material by prof. Penttonen
Distributed systems, V. Leppanen & M. Penttonen, 2007 – p.1/52
Lectures
Introduction, 2h
Communication, 2h
Processes, 2h
Naming, 2h
Synchronization, 2h
Consistency and replication, 4h
Fault tolerance, 2h
Security, 2h
Example systems, 4h (CORBA, NFS?, WWW?)
Contents
Basic concepts
Structured naming
Name space; Name resolution; Implementation
Example: Domain Name System (DNS)
Flat naming and locating
Mobile entities
Problems
Solutions: simple, home-based, hierarchical
Removing unreferenced entities
Tracking methods; Actual removal
Summary
(Excluding attribute-based naming: as homework.)
Some background
Names of distributed entities – for whom? For humans?
When a computer system uses shared resources, there is a need for naming. If the system is distributed, naming is not trivial.
In order to access the named resources, there must be a method for resolving the names.
An increasing number of systems have mobile components, which implies new problems of naming and resolving names.
One more problem is the management of the naming system, when new systems are born and old ones die.
Concepts
Identifiers are used to name distributed entities, e.g. hosts, printers, disks, files, directories, users, mailboxes, newsgroups, web pages, ...
To operate on an entity, the entity needs to provide an access point. The name of an access point is often called an address.
It might be tempting to use an address as the name of a distributed entity. Wrong! An entity may move or change its access point, and then the address no longer refers to it.
Instead, use location-independent names as human-friendly names of entities.
True identifier
The purpose of a name is to identify an entity and act as an access point to it.
To fulfil the role of a true identifier,
a name refers to only one entity
different names refer to different entities
a name always refers to the same entity (i.e. it is not reused)
Names are very seldom true identifiers.
Structured naming
Instead of a flat name (= identifier), an entity can have a structured name. Structured names are often composed of human-readable names.
Name spaces
Name resolution
Implementation
Example: DNS
Name spaces 1/2
Name space is the set of all possible names in a naming system.
A method to define structured names is to use a naming graph, which is a directed acyclic graph. The nodes of the naming graph correspond to the entities of the system.
There are a number of root nodes of indegree zero, and leaf nodes of outdegree zero. Nonleaf nodes are called directory nodes. Nodes are labelled with node identifiers.
Illustration
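[Figure: a naming graph with directory nodes and leaf nodes. Root n0 has arcs "home" to directory node n1 and "keys" to node n5. Data stored in n1: n2: "elke", n3: "max", n4: "steen". Node n4 has arcs "mbox" (path "/home/steen/mbox") and "keys" to n5, so "/keys" and "/home/steen/keys" lead to the same node. A leaf labelled ".twmrc" is also shown.]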
Name spaces 2/2
The directed arcs of a naming graph are labelled with arc names. For name resolution, each directory node stores a directory table consisting of entries of the form (node identifier, arc label). If the directory node n1 contains (n2, lab), there is an arc from n1 to n2 labelled with lab.
absolute / relative path name
Examples of such naming systems are the Internet Domain Name System and the file directory tree of Unix.
Name resolution
In order to access an entity, we do name resolution, where we have to know
where to start
how to proceed
The closure mechanism determines how and where to start. For example, it may tell how to find the root node of a network, or the root of a file system.
If the name is given as a path in the naming graph, the requested entity is found by following the path. A path name can be relative, starting at the current node, or absolute, starting at the root determined by the closure mechanism.
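The directory tables and the closure mechanism can be sketched in a few lines of Python. This is only an illustration under assumptions: the tables follow the naming graph illustrated earlier, while the function name resolve, the dictionary layout, and the identifier "mbox_leaf" (the figure gives no identifier for that leaf) are made up for the example.

```python
# Directory tables of the naming graph: node -> {arc label: node identifier}.
# Nodes without a table are treated as leaves. "mbox_leaf" is a hypothetical
# identifier; the figure does not name that node.
DIRECTORY = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"elke": "n2", "max": "n3", "steen": "n4"},
    "n4": {"mbox": "mbox_leaf", "keys": "n5"},
}
ROOT = "n0"  # closure mechanism: absolute path names start here


def resolve(path, current=ROOT):
    """Follow a path name arc by arc and return the node identifier reached."""
    # Absolute names start at the root, relative names at the current node.
    node = ROOT if path.startswith("/") else current
    for label in filter(None, path.split("/")):
        table = DIRECTORY.get(node)
        if table is None or label not in table:
            raise KeyError(f"cannot resolve {label!r} at node {node}")
        node = table[label]
    return node
```

For example, resolve("/home/steen/keys") and resolve("/keys") both end at n5, and resolve("steen/mbox", current="n1") resolves a relative name.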
Aliases, linking
An alias is another name for the same entity. Two common methods for aliasing are hard links and symbolic links.
In the previous figure, /keys and /home/steen/keys are hard links to n5.
A symbolic link from node n to node m can be implemented by storing the absolute path of m in node n. In the following figure, node n6 is a symbolic link to n5.
Illustration
[Figure: the same naming graph with directory nodes and leaf nodes, where "/home/steen/keys" now leads to a new leaf node n6 instead of n5. Data stored in n6: "/keys". Data stored in n1: n2: "elke", n3: "max", n4: "steen".]
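A symbolic link can be followed during resolution as in the sketch below. It assumes the graph of the figure above, with n6 storing the absolute path "/keys"; the names resolve, SYMLINK, and max_links are hypothetical, and the link limit is an added safeguard against cyclic links, not something the slides prescribe.

```python
# Directory tables: node -> {arc label: node identifier}.
DIRECTORY = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"elke": "n2", "max": "n3", "steen": "n4"},
    "n4": {"keys": "n6"},
}
# A symbolic-link node stores the absolute path of its target.
SYMLINK = {"n6": "/keys"}


def resolve(path, max_links=8):
    """Resolve an absolute path, following symbolic links when they are met."""
    node = "n0"
    for label in filter(None, path.split("/")):
        node = DIRECTORY[node][label]
        if node in SYMLINK:
            if max_links == 0:
                raise RuntimeError("too many symbolic links")
            # Restart resolution from the root with the stored path.
            node = resolve(SYMLINK[node], max_links - 1)
    return node
```

With a hard link, the directory table of n4 would contain n5 directly; with the symbolic link, resolution of "/home/steen/keys" reaches n6 first and then restarts with "/keys".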
Mounting in file systems
A name space NS2 can be merged into a name space NS1 by mounting the root of NS2 on a node of NS1. With Sun's NFS (Network File System), a remote file system may be mounted. For example, the directory denoted by the URL nfs://flits.cs.vu.nl/home/steen can be mounted as if it were the local node /remote/vu, see figure.
Almost transparent!
The only problem is that on the remote machine the naming might need to be different!
Mounting illustration
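[Figure: mounting a remote name space. On Machine A, the name server has a node vu under remote that holds the reference "nfs://flits.cs.vu.nl//home/steen" to a foreign name space. The reference is followed over the network to Machine B, where the name server for the foreign name space holds home, steen, and mbox. Other labels in the figure: OS, keys.]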
unix.utu.fi
unix.utu.fi:~> df
/ (/dev/dsk/c0t0d0s0 ): 4407460 blocks 558530 files
/proc (/proc ): 0 blocks 15146 files
/etc/mnttab (mnttab ): 0 blocks 0 files
/dev/fd (fd ): 0 blocks 0 files
/var (/dev/dsk/c0t0d0s3 ): 3330528 blocks 485056 files
/var/run (swap ): 7290896 blocks 97134 files
/tmp (swap ): 7290896 blocks 97134 files
/export (/dev/dsk/c0t0d0s7 ): 6583290 blocks 403504 files
/usr/local/contrib (/dev/dsk/c0t0d0s4 ): 403998 blocks 413798 files
/winhome (filer.utu.fi:/vol/vol0/winhome): 25907080 blocks 9329185
/home (filer.utu.fi:/vol/vol0/unixhome): 68758456 blocks
/www/users (filer.utu.fi:/vol/vol1/wwwusers): 19181200 blocks 14442352
/www/docs (filer.utu.fi:/vol/vol1/wwwdocs): 41308776 blocks 14442352
/net/filer/vol/vol0/software (filer:/vol/vol0/software): 6201936 blocks
Implementing name space
Name resolution and name space management of a large distributed system are organised hierarchically:
Global level has the highest level directory nodes, which are rarely changed. Managed by different organizations.
Administrational level consists of nodes that are managed within a single organization. Nodes are relatively stable.
Managerial level contains low level nodes of a single organization. They may be managed by system administrators but also by single users of distributed systems.
Name space distribution
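[Figure: an example DNS name space divided into a global layer, an administrational layer, and a managerial layer, with one zone marked. Top-level nodes: com, edu, gov, mil, org, net, jp, us, nl. Nodes below them include sun, yale, acm, ieee, ac, co, oce, vu, and further down eng, cs, ai, linda, robot, jack, jill, keio, nec, csl, pc24, ftp, www, pub, globe, index.txt.]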
Characteristics of the layers
Item                           Global     Administrational  Managerial
Geographical scale of network  Worldwide  Organisation      Department
Total number of nodes          Few        Many              Vast numbers
Responsiveness to lookups      Seconds    Milliseconds      Immediate
Update propagation             Lazy       Immediate         Immediate
Number of replicas             Many       None or few       None
Client-side caching used?      Yes        Yes               Sometimes
Implementation of name resolution
Two basic strategies for the local name resolver:
In iterative name resolution, the client first asks the root node of the graph, which returns the address of its descendant node. The client then asks that node, and this is repeated until all arcs of the name path have been processed. In the next figure the path ftp.cs.vu.nl is resolved.
In recursive name resolution, partial results are not immediately returned; instead, a node forwards the unresolved part of the path to the respective descendant node, see figure.
Recursive resolution requires higher performance from the servers, but it has advantages: it is easier for the client, it needs less communication, and it allows the use of caches for future name resolutions.
(Other: Hybrid; local cache; local name server first.)
Iterative resolution
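[Figure: iterative resolution of <nl,vu,cs,ftp>; #<x> denotes the address of the node for x.]
1. Client's name resolver to root name server: <nl,vu,cs,ftp>
2. Root name server to resolver: #<nl>, <vu,cs,ftp>
3. Resolver to name server of the nl node: <vu,cs,ftp>
4. Name server of the nl node to resolver: #<vu>, <cs,ftp>
5. Resolver to name server of the vu node: <cs,ftp>
6. Name server of the vu node to resolver: #<cs>, <ftp>
7. Resolver to name server of the cs node: <ftp>
8. Name server of the cs node to resolver: #<ftp>
The client passes <nl,vu,cs,ftp> to its resolver and receives #<nl,vu,cs,ftp>. The cs and ftp nodes are managed by the same server.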
Recursive resolution
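[Figure: recursive resolution of <nl,vu,cs,ftp>; #<x> denotes the address of the node for x.]
1. Client's name resolver to root name server: <nl,vu,cs,ftp>
2. Root name server to name server of the nl node: <vu,cs,ftp>
3. Name server of the nl node to name server of the vu node: <cs,ftp>
4. Name server of the vu node to name server of the cs node: <ftp>
5. Name server of the cs node to name server of the vu node: #<ftp>
6. Name server of the vu node to name server of the nl node: #<cs,ftp>
7. Name server of the nl node to root name server: #<vu,cs,ftp>
8. Root name server to resolver: #<nl,vu,cs,ftp>
The client passes <nl,vu,cs,ftp> to its resolver and receives #<nl,vu,cs,ftp>.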
Caching of resolved names possible!
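The difference between the two strategies can be sketched as follows. This is a toy model under assumptions: the SERVERS table, the "#x" address strings, and the function names are invented for illustration, and every label is handled by its own server (in the figures, the cs and ftp nodes share one).

```python
# Each hypothetical name server knows only the next label on the path.
# "#x" stands for the address of node x, as in the figures.
SERVERS = {
    "root": {"nl": "#nl"},
    "#nl": {"vu": "#vu"},
    "#vu": {"cs": "#cs"},
    "#cs": {"ftp": "#ftp"},
}


def iterative(path):
    """The client's resolver contacts every server itself."""
    server, messages = "root", 0
    for label in path:
        server = SERVERS[server][label]  # one request and one reply
        messages += 2
    return server, messages


def recursive(path, server="root", cache=None):
    """Each server forwards the unresolved rest and may cache the answer."""
    if not path:
        return server
    key = (server, tuple(path))
    if cache is not None and key in cache:
        return cache[key]
    result = recursive(path[1:], SERVERS[server][path[0]], cache)
    if cache is not None:
        cache[key] = result
    return result
```

In the recursive variant every server on the path sees the final answer, so intermediate servers can cache entries such as ("#vu", ("cs", "ftp")) for later requests; in the iterative variant only the client learns the partial results.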
DNS
Probably the most widely used naming system is the Domain Name System (DNS) of the Internet.
The name space of DNS is a hierarchically organized rooted tree. A subtree, consisting of a node and its descendants, is a domain. The path name from this node to the root is the domain name of this domain. Important information needed in name resolution is stored in the resource records of the node. The most important types of resource records are given in the following table. In the table, SOA (Start of Authority) starts a zone administered by this node. Canonical names can be aliased.
Syntax of DNS records
Type   Entity  Description
SOA    Zone    Information about the represented zone
A      Host    IP address of the host this node represents
MX     Domain  Mail server serving this node
SRV    Domain  Server handling a specific service
NS     Zone    Name server implementing the represented zone
CNAME  Node    Symbolic link with the primary name of the node
PTR    Host    Contains the canonical name of a host
HINFO  Host    Some information about the host represented
TXT    Any     Any entity-specific information
Example of DNS records in a NS
Name             Type   Value
cs.vu.nl         SOA    star (1999121502,7200,3600,2419200,86400)
cs.vu.nl         NS     star.cs.vu.nl
cs.vu.nl         NS     top.cs.vu.nl
cs.vu.nl         NS     solo.cs.vu.nl
cs.vu.nl         TXT    Vrije Universiteit - Math and Comp. Sci
cs.vu.nl         MX     1 zephyr.cs.vu.nl
cs.vu.nl         MX     2 tornado.cs.vu.nl
cs.vu.nl         MX     3 star.cs.vu.nl
star.cs.vu.nl    HINFO  Sun Unix
star.cs.vu.nl    MX     1 star.cs.vu.nl
star.cs.vu.nl    MX     10 zephyr.cs.vu.nl
star.cs.vu.nl    A      130.37.24.6
star.cs.vu.nl    A      192.31.231.42
zephyr.cs.vu.nl  HINFO  Sun Unix
zephyr.cs.vu.nl  MX     1 zephyr.cs.vu.nl
zephyr.cs.vu.nl  MX     2 tornado.cs.vu.nl
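How a resolver might use such records can be sketched as below. The record triples are copied from the star.cs.vu.nl rows of the table; the functions lookup and mail_servers are hypothetical helpers, not part of any DNS library. The sketch relies on the DNS convention that the MX value starts with a preference number and that lower numbers are tried first.

```python
# (name, type, value) triples copied from the star.cs.vu.nl rows of the table.
RECORDS = [
    ("star.cs.vu.nl", "HINFO", "Sun Unix"),
    ("star.cs.vu.nl", "MX", "1 star.cs.vu.nl"),
    ("star.cs.vu.nl", "MX", "10 zephyr.cs.vu.nl"),
    ("star.cs.vu.nl", "A", "130.37.24.6"),
    ("star.cs.vu.nl", "A", "192.31.231.42"),
]


def lookup(name, rtype):
    """Return all values of the given record type for a name."""
    return [value for n, t, value in RECORDS if n == name and t == rtype]


def mail_servers(name):
    """Return mail servers ordered by MX preference (lowest first)."""
    pairs = [value.split() for value in lookup(name, "MX")]
    return [host for pref, host in sorted(pairs, key=lambda p: int(p[0]))]
```

The two A records show that one name can map to several addresses, and the MX records show that mail for star.cs.vu.nl falls back to zephyr.cs.vu.nl.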
DNS implementation
All zones, with their DNS records for local name servers, represent the administrational layer.
Zones often have several secondary name servers besides one primary. The primary accesses the database of DNS records. A secondary only accesses a copy of it (obtained by a zone transfer).
There are 13 servers for the root zone.
Root servers hold the list of addresses of the authoritative servers for the top-level domains.
Mobile entities
More and more computer systems contain mobile components. How should they be named and how can the names be resolved?
Often human-friendly names are domain related, which causes problems when the domain changes. A solution is to separate the naming service and the location service!
Two level mapping
[Figure: (a) single-level mapping, where names are mapped directly to addresses; (b) two-level mapping, where the naming service maps names to an entity ID and the location service maps the entity ID to addresses.]
Based on identifiers!
Locating under flat naming
Broadcasting / multicasting (local area only)
Forwarding pointers
Home-based approaches
Hierarchical approaches
Broadcasting / multicasting
Broadcasting: A query “who has this identifier?” is broadcast to the network, and the station having that identifier answers with its address. This technique is used by the Address Resolution Protocol (ARP) in local area networks (IP address ↦ Ethernet address), but it is inefficient in large networks.
Multicasting can be used to decrease network traffic. Moving entities register themselves with a multicast group. Queries are sent to the multicast group.
Forwarding pointers
Idea: When a mobile station comes to a new zone (e.g. ftp.cs.vu.nl comes to cs.unisa.edu.au), it is given a new name under the new zone, and the old name becomes a link to the new name. If the station moves again, another link is added.
Expensive to manage and access. Chains can become very long.
A chain can break.
Applies to distributed objects.
Forwarding: proxy / skeleton
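[Figure: forwarding pointers implemented as (proxy, skeleton) pairs across processes P1 to P4. A proxy refers to a skeleton in another process through interprocess communication; the skeleton in the process that holds the object uses local invocation. Identical proxies and identical skeletons are shown: proxy p' refers to the same skeleton as proxy p.]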
Forwarding: redirecting
[Figure: redirecting a forwarding pointer. (a) An invocation request is sent to the object; the skeleton at the object's current process returns the current location. (b) The client proxy sets a shortcut; a skeleton on the old chain is no longer referenced by any proxy.]
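The chain of forwarding pointers and the shortcut of the figure can be sketched as follows. The classes Process and ClientProxy and the function locate are hypothetical; real proxies and skeletons would exchange messages instead of following Python references.

```python
class Process:
    """A hypothetical process that either hosts the object or forwards."""

    def __init__(self, name):
        self.name = name
        self.has_object = False
        self.forward_to = None  # skeleton pointing to the next process


def locate(start):
    """Follow the chain from `start`; return (current host, hops made)."""
    node, hops = start, 0
    while not node.has_object:
        if node.forward_to is None:
            raise LookupError("broken chain: object unreachable")
        node, hops = node.forward_to, hops + 1
    return node, hops


class ClientProxy:
    def __init__(self, first):
        self.target = first

    def invoke(self):
        host, hops = locate(self.target)
        self.target = host  # shortcut: skip the chain next time
        return host.name, hops
```

The first invocation walks the whole chain, the second goes straight to the object. If one process in the chain loses its pointer, locate raises an error, which is the "chain can break" problem.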
Home-based location
In home-based name resolution, servers maintain visitor registers. Resolution works as follows:
Whenever a mobile host moves to another network, it gets a temporary care-of address.
The care-of address is forwarded to the home location, where it is registered with a home agent.
Whenever a request/packet is sent to the mobile host, the home agent knows how to forward it to the care-of address. It also informs the sender of the care-of address.
Efficiency can be improved by checking the local visitor register first.
Used as a fall-back mechanism for e.g. forwarding pointers.
Principle of Mobile IP
[Figure: the client's location, the host's home location, and the host's present location, with the following steps.]
1. Send packet to host at its home
2. Return address of current location
3. Tunnel packet to current location
4. Send successive packets to current location
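The four steps can be sketched as below. This is not the Mobile IP protocol itself, only a toy model of the principle: the classes HomeAgent and Client are hypothetical, and packets are plain Python values.

```python
class HomeAgent:
    """Hypothetical home agent keeping one care-of address per mobile host."""

    def __init__(self):
        self.care_of = {}

    def register(self, home_address, care_of_address):
        self.care_of[home_address] = care_of_address

    def receive(self, home_address, packet):
        """Steps 2 and 3: report the current address and tunnel the packet."""
        current = self.care_of.get(home_address, home_address)
        tunnelled = (current, packet)
        return current, tunnelled


class Client:
    def __init__(self, agent):
        self.agent = agent
        self.known = {}  # home address -> last known care-of address

    def send(self, home_address, packet):
        if home_address in self.known:  # step 4: send directly
            return "direct", (self.known[home_address], packet)
        current, tunnelled = self.agent.receive(home_address, packet)  # step 1
        self.known[home_address] = current
        return "via home", tunnelled
```

Only the first packet takes the detour through the home location. The sketch ignores what happens when the host moves again while the client still holds the old care-of address.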
About hierarchical approaches
Basic idea: Information is maintained and searched hierarchically.
If the local leaf domain has no knowledge of the target, the request is forwarded to the enclosing domain.
It is forwarded until knowledge is found about the whereabouts of the target.
Actual information is maintained in leaf domains.
Internal domains only know the “direction” of information.
Distributed “object” system Globe: http://www.cs.vu.nl/globe/
Hierarchy of domains
[Figure: hierarchy of domains. The top-level domain T has the root directory node dir(T). A subdomain S of top-level domain T (S is contained in T) has the directory node dir(S). A leaf domain is contained in S.]
Real info at leaf domains; other domains just have pointers.
Two addresses in different domains
[Figure: an entity E with addresses in two leaf domains, D1 and D2. In a leaf domain, the location record has only one field, containing an address. The location record for E at node M has one field per subdomain: a field for domain dom(N) with a pointer to N, and fields with no data.]
(Replicated node.)
Searching location
[Figure: a look-up request issued in domain D. A node that has no record for E forwards the request to its parent. Node M knows about E, so the request is forwarded to a child.]
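The look-up of the figure can be sketched as follows. The class Node and the functions insert and lookup are hypothetical, and the sketch simplifies by storing only one address per entity, so replicated entities with addresses in several leaf domains are not covered.

```python
class Node:
    """Directory node of a domain; leaf nodes store addresses."""

    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.records = {}  # entity -> child Node (internal) or address (leaf)


def insert(leaf, entity, address):
    """Store the address at the leaf and pointers on the path to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(start_leaf, entity):
    """Go up until some node knows the entity, then follow pointers down."""
    node = start_leaf
    while entity not in node.records:
        if node.parent is None:
            raise LookupError(f"{entity} is not registered")
        node = node.parent
    while isinstance(node.records[entity], Node):
        node = node.records[entity]
    return node.records[entity]
```

A look-up that starts near the entity finds a record after a few steps upwards, so the cost depends on how far apart the requester and the entity are in the hierarchy.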
Garbage collection 1/2
Naming and location provide a global referencing service for entities. As long as an entity is referred to, it can be accessed. On the other hand, as soon as an entity can no longer be accessed, it should be removed. In other words, garbage entities should be collected. How?
Accessible entities can be characterized by the reference graph: The nodes of the graph are the entities. If an entity M refers to an entity N, there is an arc from M to N. An entity is accessible if there is a path from a root to it.
Problem in general
[Figure: a reference graph showing the root set, entities reachable from the root set, entities unreachable from the root set, and entities forming an unreachable cycle.]
Garbage collection 2/2
Another method to check the accessibility of an entity is to count references to it. Each entity has a reference counter. Each time a reference to the entity is created, its reference counter is incremented. When a reference is removed, the reference counter is decremented. When the reference counter reaches zero, the entity can be removed.
Garbage collection is important in distributed systems, since they should keep running for a long time.
Often there is also a huge number of distributed entities.
Basically, some kind of tracking information must be maintained.
Garbage collection then uses that information.
Tracking information
The system must somehow keep information about the various proxy objects existing in the distributed system.
Simple reference counting
Advanced reference counting
Reference listing
Simple reference counting
Basic idea: just maintain the number of referencing proxy objects.
Information is maintained in the skeleton object.
Each increment/decrement requires communication.
Unreliable communication creates many kinds of problems.
Request lost, reply lost, repeated request. Must cancel the effect of repeated messages.
A special problem: forwarding a proxy object.
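One way to cancel the effect of repeated messages is to tag every increment or decrement with a unique identifier, as in the sketch below. This is an assumed technique for illustration, not something the slides prescribe; the class Skeleton and the message identifiers are hypothetical.

```python
class Skeleton:
    """Hypothetical skeleton that counts the proxies referring to it."""

    def __init__(self):
        self.count = 0
        self.seen = set()  # identifiers of messages already processed

    def handle(self, message_id, delta):
        """Apply +1/-1 once; a retransmitted message is acknowledged only."""
        if message_id not in self.seen:
            self.seen.add(message_id)
            self.count += delta
        return "ACK"

    def is_garbage(self):
        return self.count == 0
```

If the acknowledgement of an increment is lost and the proxy retransmits, the counter is still incremented only once. Without the set of seen identifiers, the repeated request would leave the counter too high and the object would never be removed.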
Proxy moving problem
[Figure: time lines for P1, P2, and object O. (a) P1 sends a reference to P2 and then deletes its own reference to O (-1). P2 informs O that it has a reference (+1), but by then O has been removed. (b) P1 tells O that it will pass a reference to P2 (+1) and sends the reference to P2; O acks that it knows about P2's reference (ACK); P1 deletes its reference to O (-1).]
Advanced reference counting
Basic idea: each proxy has a weight, and the skeleton knows the total weight outside. When passing a proxy object to some place, divide the original weight so that the total remains the same. When a proxy is deleted, inform the skeleton of the lost weight.
Advantage: less communication.
Notice: a name server can be seen to have a weight for all maintained names of distributed objects.
What if the weight is the minimum possible? Forwarding solution. Reference generations solution.
Failure of a process?
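Weighted reference counting can be sketched as follows. The classes and method names are hypothetical, the initial weight of 128 is only an example value, and the garbage test (total weight equals the partial weight kept at the skeleton) is an assumption chosen to match the bookkeeping of a total and a partial weight.

```python
class Skeleton:
    def __init__(self, total=128):
        self.total = total    # weight handed out plus weight still kept
        self.partial = total  # weight still kept at the skeleton

    def new_proxy(self):
        half = self.partial // 2
        if half == 0:
            raise RuntimeError("skeleton has no partial weight left")
        self.partial -= half
        return Proxy(self, half)

    def release(self, weight):
        self.total -= weight

    def is_garbage(self):
        return self.total == self.partial  # no weight is held by proxies


class Proxy:
    def __init__(self, skeleton, weight):
        self.skeleton, self.weight = skeleton, weight

    def copy(self):
        """Pass the reference on without contacting the skeleton."""
        half = self.weight // 2
        if half == 0:
            raise RuntimeError("proxy weight cannot be divided further")
        self.weight -= half
        return Proxy(self.skeleton, half)

    def delete(self):
        self.skeleton.release(self.weight)
        self.weight = 0
```

Copying a proxy involves no message to the skeleton, which is where the saving in communication comes from. The RuntimeError in copy marks the point where a weight of 1 cannot be halved, the case that the forwarding and reference generation solutions address.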
Maintaining weights
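[Figure: maintaining weights. (a) The skeleton of object O starts with total weight 128 and partial weight 128. (b) When a proxy is created in process P, the partial weight at the skeleton is halved: the skeleton has total weight 128 and partial weight 64, and the proxy has partial weight 64. (c) When P1 passes the reference to P2, P2 gets half of the weight of the proxy at P1 (32 each); total and partial weight at the skeleton remain the same.]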
Forwarding solution
[Figure: the object has no more partial weight left. P1 has run out of weight and creates a skeleton s' with its own total and partial weight; P2 refers to the object via P1.]
Not a good solution.
Reference generations
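[Figure: P1 passes a reference to P2. Each proxy carries a generation number and a copy counter; the copy given to P2 belongs to the next generation, and the copy counter of the proxy at P1 is incremented.]
The skeleton maintains G[i]: the number of copies of generation i. If all are zero, the object is isolated.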
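The bookkeeping with G[i] can be sketched as follows, assuming the usual rule of generation reference counting: when a proxy of generation g that has made c copies is deleted, the skeleton decrements G[g] and adds c to G[g+1]. The classes and method names are hypothetical.

```python
from collections import defaultdict


class Skeleton:
    def __init__(self):
        self.G = defaultdict(int)  # G[i]: outstanding copies of generation i

    def first_proxy(self):
        self.G[0] += 1
        return Proxy(self, generation=0)

    def on_delete(self, generation, copies):
        self.G[generation] -= 1           # this proxy is gone
        self.G[generation + 1] += copies  # but it made this many copies

    def is_garbage(self):
        return all(n == 0 for n in self.G.values())


class Proxy:
    def __init__(self, skeleton, generation):
        self.skeleton, self.generation, self.copies = skeleton, generation, 0

    def copy(self):
        """Copying needs no message to the skeleton."""
        self.copies += 1
        return Proxy(self.skeleton, self.generation + 1)

    def delete(self):
        self.skeleton.on_delete(self.generation, self.copies)
```

An entry of G may temporarily become negative when a copy is deleted before the proxy it was copied from; the object counts as isolated only when every entry is zero.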
Reference listing
Basic idea: The skeleton maintains a list of all corresponding proxies.
Advantage: adding and removing can be made idempotent, i.e. it does not matter if adding/deleting is done several times.
Unreliable communication is not such a big problem.
Java RMI uses this method.
Possible to react to process failures.
Disadvantage: a lot to maintain. Scalability?
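Why reference listing tolerates repeated messages can be seen in a small sketch. The class Skeleton and its methods are hypothetical and are not the Java RMI interface; the point is only that set insertion and removal are idempotent.

```python
class Skeleton:
    """Hypothetical skeleton keeping the set of processes that hold a proxy."""

    def __init__(self):
        self.holders = set()

    def add(self, process_id):
        self.holders.add(process_id)      # repeating this changes nothing

    def remove(self, process_id):
        self.holders.discard(process_id)  # repeating this changes nothing

    def process_failed(self, process_id):
        """React to a crash by dropping the failed process from the list."""
        self.holders.discard(process_id)

    def is_garbage(self):
        return not self.holders
```

Because the skeleton knows who holds a reference, it can also drop a crashed process, which a plain counter cannot do. The price is one entry per referencing process.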
Identifying the unreachable
Maintaining tracking information works well for non-cyclic references.
Unreachable cycles are possible: other methods are needed.
There is really no other choice but to trace what can be reached from the root set. (Then remove the unreached.)
Two-phase mark-and-sweep garbage collection. In the mark phase, the references are followed starting from the roots, and reached entities are marked. In the sweep phase, the name space is swept exhaustively and unmarked entities are removed.
Very expensive, not really a scalable solution.
This “stop-the-world” while doing garbage collection is not good!
There are attempts to solve the problem by dividing distributed entities into groups and operating on one group at a time. Better.
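The two phases can be sketched on a reference graph held in one place. This is a centralized toy version with a hypothetical function name; a distributed collector would have to exchange messages to follow references between machines.

```python
def mark_and_sweep(references, roots):
    """Return (reachable, garbage) for a reference graph.

    `references` maps each entity to the entities it refers to.
    """
    marked, stack = set(), list(roots)
    while stack:  # mark phase: follow references from the root set
        entity = stack.pop()
        if entity not in marked:
            marked.add(entity)
            stack.extend(references.get(entity, ()))
    # sweep phase: every entity that was not marked is garbage
    garbage = set(references) - marked
    return marked, garbage
```

With the graph {"root": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": ["a"]} and root set ["root"], the entities c and d form an unreachable cycle. Their reference counters would never reach zero, but tracing finds them, together with e.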
Summary
Distributed entities usually have a human-friendly name (not a true identifier).
Addresses + human-friendly names (+ entity identifiers): name space.
Traditional name space implementations are not good for mobile entities. Several solutions are possible. The naming system for mobile phone networks is a good example.
Distributed systems are supposed to be running for a very long time. Thus, removal of unreferenced entities becomes important!
This is harder in distributed systems. There are various ways to keep tracking information.
System failures. Only partial solutions?