Introduction · Networking · Distributed systems · Lab 2
SMD149 - Operating Systems - Networking
Roland Parviainen
December 5, 2005
Overview
Networking
Distributed systems
Getting started with Lab 2
OSI
TCP/IP
TCP/IP protocol stack
4 layers
Application layer
Highest level
Provides protocols for applications to communicate
Transport layer
End-to-end communication
Relies on the network layer to determine the proper path from one end of the communication to the other
Network layer
Moving data between computers
Link layer
Interface between the network layer and the physical medium
Transport layer
TCP
Connection-oriented protocol
Segments sent will arrive
Undamaged
In order
Error control, congestion control, retransmission
UDP
Connectionless
Minimum overhead
Best effort
Datagrams can be reordered or lost
No congestion control
OSI
Communication Process Fundamentals
Socket: application programming interface (API) that interfaces the application and transport layers
Addressing
Name or address of an end system (IP address)
Identifier that specifies the receiving process in the destination (port number)
Examples: HTTP server: TCP port 80; SMTP: TCP port 25
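The addressing idea above (IP address plus port number) can be sketched with the standard Java socket API. This is a minimal illustration, not part of the course material: the class name `EchoSketch` and the "echo: " reply format are made up for the example, and the server binds to port 0 so that the operating system picks a free port instead of a well-known one such as 80 or 25.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a one-shot TCP echo over the loopback interface.
class EchoSketch {
    // Starts a server, sends one line from a client, and returns the reply.
    static String echoOnce(String message) {
        // Port 0 lets the OS choose a free port (assumption for the demo).
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOne(server));
            serverThread.start();
            // The (IP address, port) pair identifies the receiving process.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 BufferedReader in = reader(client);
                 PrintWriter out = writer(client)) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server side: accept one connection, echo one line, then close.
    private static void serveOne(ServerSocket server) {
        try (Socket conn = server.accept();
             BufferedReader in = reader(conn);
             PrintWriter out = writer(conn)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true so println() pushes the line onto the connection
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

Calling `EchoSketch.echoOnce("hello")` sends the line over TCP and returns whatever the server wrote back.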
Client/server model
Distributed systems
Remote computers cooperate via a network to appear as a local machine
Users are given the impression that they are interacting with just one machine
Spread computation and storage throughout a network of computers
Applications are able to execute code and to share data, files and other resources among local and remote machines
Attributes of distributed systems
Internet has made distributed systems common
Attributes of distributed systems:
Performance
Scalability
Connectivity
Security
Reliability
Fault tolerance
Performance and scalability
Centralized system
A single server handles all user requests
Distributed system
User requests can be sent to different servers working in parallel to increase performance
Scalability
Allows a distributed system to grow
Connectivity and security
Distributed systems
Susceptible to attacks by malicious users if they rely on insecure communications media
To improve security
Allow only authorized users to access resources
Ensure that information transmitted over the network is readable only by the intended recipients
Provide mechanisms to protect resources from attack
Reliability and fault tolerance
Fault tolerance
Implemented by providing replication of resources
Replication
Offers increased reliability and availability
Consistency
Transparency
Access transparency
Location transparency
Failure transparency
Replication transparency
Persistence transparency
Migration and relocation transparency
Transaction transparency
Network operating system
Access resources on remote computers that run independent operating systems
Not responsible for resource management at remote locations
Distributed functions are explicit rather than transparent
Lack of transparency in network OSs
Does not provide some of the benefits of distributed OSs
Easier to implement
Distributed operating systems
Manage resources located in multiple networked computers
Many of the same communication methods, file system structures, etc. as network operating systems
Transparent communication
Objects are unaware of the separate computers
Not many “truly” distributed systems
Distributed operating systems
Access and manage resources located in multiple networked computers
Employ many of the same communication methods, file system structures and other protocols found in network operating systems
Transparent communication
Objects in the system are unaware of the separate computers that provide the service (unlike network operating systems)
User has the illusion of working on a single local system
Implementation is generally much more complex than for network operating systems
Communication in distributed systems
Designers must establish interoperability between heterogeneous computers and applications
Interoperability
Permits software components to interact among different
Hardware and software platforms
Programming languages
Communication protocols
Standardized interface
Allows each client/server pair to communicate using a single, common interface that is understood by both sides
Middleware
Software in distributed systems that helps provide:
Portability
Transparency
Interoperability
Provides standard programming interfaces to enable interprocess communication between remote computers
Remote Procedure Call
Allows a process executing on one computer to invoke a procedure in a process executing on another computer
Goal of RPC
To simplify the process of writing distributed applications by preserving the syntax of a local procedure call while transparently initiating network communication
Based on the application-oriented design paradigm
Begin by designing a conventional program (that runs on a single machine) to solve a problem
Divide the program into two or more pieces, and add communication protocols that allow the pieces to execute on separate computers
RPC
To issue an RPC:
Client process makes a call to the procedure in the client stub
Client stub marshals the data (procedure name, arguments) into a message for transmission over the network
Client stub passes the message to the server
Server passes the message to the server stub
Server stub unmarshals the message
Server stub passes the parameters to the appropriate local procedure
Server stub marshals the result and sends it back to the client
Client stub unmarshals the result, notifies the process and passes it the result
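The stub steps above can be sketched in a few lines of Java. This is only an in-process illustration under stated assumptions: the class `RpcSketch`, the procedure `add` and the wire layout (a UTF string followed by two ints) are invented for the example, and a direct method call stands in for the network between the two stubs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: client stub and server stub around one procedure.
class RpcSketch {
    // The actual procedure, which lives on the "server".
    static int add(int a, int b) {
        return a + b;
    }

    // Client stub: same signature as a local call, but it marshals the
    // procedure name and arguments into a message and unmarshals the reply.
    static int addRemote(int a, int b) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF("add");
            out.writeInt(a);
            out.writeInt(b);
            // A direct call stands in for sending the message over a network.
            byte[] reply = serverStub(buffer.toByteArray());
            return new DataInputStream(new ByteArrayInputStream(reply)).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server stub: unmarshals the message, calls the local procedure,
    // and marshals the result for the trip back.
    static byte[] serverStub(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        String procedure = in.readUTF();
        if (!procedure.equals("add")) {
            throw new IllegalArgumentException("unknown procedure: " + procedure);
        }
        int a = in.readInt();
        int b = in.readInt();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeInt(add(a, b));
        return buffer.toByteArray();
    }
}
```

From the caller's point of view `RpcSketch.addRemote(2, 3)` looks like an ordinary method call, which is the transparency RPC aims for.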
RPC
RPC
ONC RPC - Open Network Computing Remote Procedure Call (Sun RPC)
Initially developed for the Network File System
ONC RPC supports procedure calls over both UDP and TCP
Data is encoded with XDR (eXternal Data Representation); interfaces are described in the RPC language, an extension of the XDR language
Access to RPC services on a machine is provided via a port mapper that listens for queries on a well-known port, port 111, over UDP and TCP
RMI
Java Remote Method Invocation
RPC for Java
Object serialization
Marshalling
Java objects can be used for parameters/return values
CORBA
Common Object Request Broker Architecture
Open standard
Objects as parameters or return values
Object Request Broker (ORB) marshals data
Programming language independence through Interface Definition Language (IDL)
GIOP (General Inter-ORB Protocol), IIOP (Internet Inter-ORB Protocol)
DCOM
Distributed Component Object Model
Objects are accessed via interfaces
Objects may have multiple interfaces
Uses DCE/RPC (Distributed Computing Environment/Remote Procedure Calls)
MSRPC
Web services
CORBA, DCOM, etc. have problems with network firewalls, among other things
Web services much more successful
Open standards
HTTP and XML
XML RPC
Uses XML to encode calls
HTTP as transport
Very simple, two-page standard
<?xml version="1.0"?>
<methodCall>
  <methodName>examples.getStateName</methodName>
  <params>
    <param>
      <value><i4>41</i4></value>
    </param>
  </params>
</methodCall>

<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value><string>South Dakota</string></value>
    </param>
  </params>
</methodResponse>
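A call and response like the pair above can be produced and consumed with the XML parser that ships with the JDK. The sketch below is illustrative only: `XmlRpcSketch` and its two methods are hypothetical names, only a single `<i4>` parameter and a single `<string>` result are handled, and the HTTP transport is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: encode one call, decode one string response.
class XmlRpcSketch {
    // Builds a methodCall with one integer parameter.
    // Note: the method name is not XML-escaped in this sketch.
    static String buildCall(String methodName, int argument) {
        return "<?xml version=\"1.0\"?>"
             + "<methodCall><methodName>" + methodName + "</methodName>"
             + "<params><param><value><i4>" + argument + "</i4></value></param></params>"
             + "</methodCall>";
    }

    // Extracts the text of the first <string> element in a methodResponse.
    static String parseStringResponse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("string").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed XML-RPC response", e);
        }
    }
}
```

`XmlRpcSketch.buildCall("examples.getStateName", 41)` yields the request body; in a real client it would be sent as the body of an HTTP POST.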
SOAP
SOAP
Originally: Simple Object Access Protocol
XML/HTTP
Lengthy syntax:
Easy to read
Complex
Slow processing times (10x slower than RMI/IIOP)
Synchronization
Determining the order in which events occur is difficult
Communication delays in a distributed network are unpredictable
Causal ordering
Ensures that all processes recognize that a causally dependent event must occur only after the event on which it is dependent
Implemented by the happens-before relation:
If events a and b belong to the same process, then a → b if a occurred before b
If event a is the sending of a message and event b is the receiving of that message, then a → b
This relation is transitive
Synchronization
Causal ordering only ensures a partial ordering
Events for which it cannot be determined which occurred earlier are said to be concurrent
Total ordering
Ensures that all events are ordered and that causality is preserved
Can be implemented through a logical clock that assigns a timestamp to each event that occurs in the system
Scalar logical clocks synchronize the logical clocks on remote hosts and ensure causality
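A scalar logical clock of the kind described here is commonly implemented as a Lamport clock. The following is a minimal sketch, not taken from the slides: the class and method names are hypothetical, and breaking timestamp ties by process id is one conventional way (assumed here) to turn the partial order into a total order.

```java
// Hypothetical sketch of a scalar (Lamport) logical clock.
class LamportClock {
    private long time = 0;

    // Local event or message send: advance the clock and use the new
    // value as the event's timestamp.
    synchronized long tick() {
        return ++time;
    }

    // Message receive: move past both the local clock and the sender's
    // timestamp, so that send -> receive implies a smaller timestamp
    // on the send.
    synchronized long receive(long senderTimestamp) {
        time = Math.max(time, senderTimestamp) + 1;
        return time;
    }

    // Total order on events: compare timestamps first, then break ties
    // with the process id (an assumption of this sketch).
    static int compare(long ts1, int pid1, long ts2, int pid2) {
        return ts1 != ts2 ? Long.compare(ts1, ts2) : Integer.compare(pid1, pid2);
    }
}
```

If process P stamps a send with `p.tick()` and process Q calls `q.receive(...)` with that value, the receive always gets a larger timestamp than the send, which is the causality guarantee the slide refers to.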
Mutual exclusion
In environments with no shared memory, mutual exclusion must be implemented via message passing
Message passing relies on clock synchronization
FIFO broadcast
Guarantees that when two messages are sent from one process to another, the message that was sent first will arrive first
Causal broadcast
Ensures that when message M1 is causally dependent on message M2, then no process delivers M1 before delivering M2
Atomic broadcast (aka totally ordered or agreed broadcast)
Guarantees that all messages in a system are received in the same order at each process
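The FIFO guarantee is often obtained with per-sender sequence numbers and a hold-back queue at the receiver. The sketch below assumes exactly that technique; the slides do not say how it is implemented, and `FifoReceiver` is a hypothetical class that tracks a single sender (a full receiver would keep one instance per sender).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: FIFO delivery for messages from one sender.
class FifoReceiver {
    private int nextSeq = 0;  // next sequence number expected from the sender
    private final Map<Integer, String> heldBack = new HashMap<>();

    // Called when a message arrives from the network, possibly out of order.
    // Returns the messages that can now be delivered, in sending order.
    List<String> receive(int seq, String payload) {
        List<String> deliverable = new ArrayList<>();
        if (seq < nextSeq) {
            return deliverable;  // duplicate of an already delivered message
        }
        heldBack.put(seq, payload);
        // Deliver as long as the next expected message is available.
        while (heldBack.containsKey(nextSeq)) {
            deliverable.add(heldBack.remove(nextSeq));
            nextSeq++;
        }
        return deliverable;
    }
}
```

A message that arrives early is held back until every earlier message from the same sender has been delivered.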
Deadlock
Three types of distributed deadlocks
Resource deadlock
Inter-process communication deadlock
Circular waiting for signals
Phantom deadlock
Due to delay
Deadlocks that do not exist are detected
Deadlock prevention
Wound-wait strategy
Breaks deadlock by denying the no-preemption condition
Recently created processes are forced to wait if they need a resource held by an earlier created process
Recently created processes are restarted if they hold a resource requested by an earlier created process
Wound-wait
Wait-die
Wait-die strategy
Breaks deadlock by denying the wait-for condition
Recently created processes “die” instead of waiting for a resource (currently being held by an earlier created process)
An earlier created process will wait if it needs a resource being held by recently created processes
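Both strategies, wound-wait and wait-die, boil down to one comparison of creation timestamps when a process requests a resource that another process holds. A minimal sketch, assuming that a smaller timestamp means an earlier created process; the class, enum and method names are hypothetical:

```java
// Hypothetical sketch: the decision rule of each prevention strategy.
class DeadlockPrevention {
    enum Action { WAIT, RESTART_REQUESTER, RESTART_HOLDER }

    // Wound-wait: an earlier created requester preempts ("wounds") a more
    // recent holder; a more recent requester waits.
    static Action woundWait(long requesterCreated, long holderCreated) {
        return requesterCreated < holderCreated ? Action.RESTART_HOLDER : Action.WAIT;
    }

    // Wait-die: an earlier created requester waits; a more recent
    // requester "dies" (is restarted) instead of waiting.
    static Action waitDie(long requesterCreated, long holderCreated) {
        return requesterCreated < holderCreated ? Action.WAIT : Action.RESTART_REQUESTER;
    }
}
```

In both schemes waiting only ever goes in one direction of age (recent waits for earlier in wound-wait, earlier waits for recent in wait-die), so a cycle of waiting processes cannot form.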
Wait-die
Deadlock detection
Central deadlock detection
Hierarchical deadlock detection
Distributed deadlock detection
Distributed file systems
Distributed file server
Stateful
The server keeps state information of the client requests so that subsequent access to the file is easier
Stateless
The client must specify which file to access in each request
Transparency is a key feature
Complete file location transparency means that the user is unaware of the physical location of a file within a distributed file system
The user sees only a global file system
Distributed file systems
Many distributed systems implement client caching to avoid the overhead of multiple RPCs
Clients keep a local copy of a file and flush modified copies of it to the server from time to time
Because there are multiple copies of the same file, files can become inconsistent
Distributed file systems are designed to share information among large groups of computers
New computers should be able to be added to the system easily
Distributed file systems should be scalable
Security concerns in distributed file systems
Ensuring secure communications
Access control
Trusted/untrusted client machines?
Network File System
Network File System
NFS versions 2 and 3
Assume a stateless server implementation
Fault tolerance easier to implement than with a stateful server
With stateless servers, if the server crashes, the client can simply retry its request until the server responds
NFS-4
Stateful
Allows faster access to files
However, if the server crashes, all the state information of the client is lost, so the client needs to rebuild its state on the server before retrying
Network File System
NFS-4 extends the client-caching scheme through delegation
Allows the server to temporarily transfer the control of a file to a client
When the server grants a read delegation of a particular file to the client, then no other client can write to that file
When the server grants a write delegation of a particular file to the client, then no other client can read or write to that file
If another client requests a file that has been delegated, the server will revoke the delegation and request that the original client flush the file to disk
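The delegation rules above amount to simple bookkeeping on the server. The sketch below models only those rules and is not the NFS-4 protocol: `DelegationTable` and its methods are hypothetical, a refused grant is reported with `false`, and `revoke` marks the point where a real server would recall the delegation and ask the holder to flush the file.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: which client holds which delegation on which file.
class DelegationTable {
    private final Map<String, Set<String>> readers = new HashMap<>(); // file -> read delegation holders
    private final Map<String, String> writer = new HashMap<>();       // file -> write delegation holder

    // A read delegation is refused while another client holds the write delegation.
    boolean grantRead(String file, String client) {
        String w = writer.get(file);
        if (w != null && !w.equals(client)) {
            return false;
        }
        readers.computeIfAbsent(file, f -> new HashSet<>()).add(client);
        return true;
    }

    // A write delegation is refused while any other client holds a delegation.
    boolean grantWrite(String file, String client) {
        String w = writer.get(file);
        if (w != null && !w.equals(client)) {
            return false;
        }
        for (String reader : readers.getOrDefault(file, new HashSet<>())) {
            if (!reader.equals(client)) {
                return false;
            }
        }
        writer.put(file, client);
        return true;
    }

    // Drops every delegation on the file; a real server would first recall
    // them and wait for modified data to be flushed.
    void revoke(String file) {
        readers.remove(file);
        writer.remove(file);
    }
}
```

A server following the slide would call `revoke` when a conflicting request arrives and then retry the grant for the new client.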
Multicomputer system architecture
Clustering
Takes advantage of distributed systems and parallel systems to build powerful computers
Peer-to-peer distributed computing model
Used to remove many central points of failure in applications like instant messengers
Grid computing
Exploits unused computer power to solve complex problems
Clustering
Clustering
Interconnecting nodes (single-processor computers or multiprocessor computers) within a high-speed LAN to function as a single parallel computer
Cluster: Set of nodes that forms the single parallel machine
Multiple computers working together to solve large and complex problems
Clustering types
High-performance clusters
All nodes in the cluster work to improve performance
High-availability clusters
Only some of the nodes in the cluster are active while others serve as backups
If working nodes fail, the backup nodes immediately start running and take over the jobs that were being executed on the failed nodes, without interrupting service
Load-balancing clusters
A particular node works as a load balancer to distribute the load to a set of nodes so that all hardware is utilized efficiently
Clustering benefits
Economically interconnects relatively inexpensive components
Reduces the cost of building a clustered system compared to a single parallel computer with the same capability
High performance
Each node in the clustering system shares the workload
Fast communications among nodes in a cluster
Faster than those in unclustered distributed computing systems due to the high-speed LAN between nodes
Clustering benefits
Scalability
A cluster is able to add or remove nodes (or the components of nodes) to adjust its capabilities without affecting the existing nodes in the cluster
Better scalability than multiprocessors
Reliability and fault tolerance by providing backups or redundancy for the services and resources
The failure of any one computer will not affect the availability of the rest of the system’s resources
Grid computing
Links computational resources that are distributed over a wide area network (such as computers, data storage and scientific devices) to solve complex problems
Uses unused resources (CPU cycles and/or disk storage) of large numbers of disparate, often desktop, computers
Treated as a virtual, distributed cluster
Emphasizes public collaboration
Individuals and research institutes coordinate resources that are not subject to centralized control
Example: SETI@home
Grid computing
Grid computing has the same advantage as clustering
High performance, achieved by cost-effectively using spare computer power and collaborating resources
However, grid computing requires advanced software to manage distributed computing tasks efficiently and reliably
Fallacies of distributed computing
The Eight Fallacies of Distributed Computing - P. Deutsch
1 The network is reliable
2 Latency is zero
3 Bandwidth is infinite
4 The network is secure
5 Topology doesn’t change
6 There is one administrator
7 Transport cost is zero
8 The network is homogeneous
Lab 2 - getting started
Unpack and get nachos running (make sure you have run “source .smd149rc”!)
If not, make sure to run gmake clean first
join:
Put thread to sleep
Wake up thread when other thread finishes
Sleep: See KThread.sleep()...
Mark thread as blocked
How do we know when to wake up?
Store a reference to the calling thread in the called thread
Wake the thread up when the called thread finishes (see KThread.finish())
KThread.ready()
Condition variables
Maintain a queue of waiting threads (LinkedList or Vector)
See Condition.java and Semaphore.java for hints
wakeAll: while wait queue is not empty, call wake
wake: remove first thread from wait queue, wake thread up (KThread.ready())
Protect access to queue with boolean intStatus = Machine.interrupt().disable() and Machine.interrupt().restore(intStatus);
sleep
disable interrupts
release condition lock
add current thread to queue
block current thread (KThread.sleep())
acquire condition lock
restore interrupts
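The sleep/wake/wakeAll structure can be tried outside Nachos with plain Java threads. The sketch below is an analogy under stated assumptions, not the lab solution: `ConditionSketch` is a hypothetical class, a per-waiter `java.util.concurrent.Semaphore` stands in for `KThread.sleep()` and `KThread.ready()`, and the wait queue is protected by the condition lock itself rather than by disabling interrupts (which is why the waiter is queued before the lock is released).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a condition variable built from a queue of waiters.
class ConditionSketch {
    private final ReentrantLock conditionLock;
    private final Queue<Semaphore> waitQueue = new ArrayDeque<>();

    ConditionSketch(ReentrantLock conditionLock) {
        this.conditionLock = conditionLock;
    }

    // All methods must be called with conditionLock held.
    void sleep() {
        Semaphore waiter = new Semaphore(0);
        waitQueue.add(waiter);            // add current thread to queue
        conditionLock.unlock();           // release condition lock
        waiter.acquireUninterruptibly();  // block until woken
        conditionLock.lock();             // acquire condition lock again
    }

    void wake() {
        Semaphore waiter = waitQueue.poll();  // remove first thread from queue
        if (waiter != null) {
            waiter.release();                 // wake it up
        }
    }

    void wakeAll() {
        while (!waitQueue.isEmpty()) {
            wake();
        }
    }

    int waiting() {
        return waitQueue.size();
    }

    // Small self-check: one thread sleeps, the caller wakes it.
    static boolean demo() {
        ReentrantLock lock = new ReentrantLock();
        ConditionSketch cond = new ConditionSketch(lock);
        AtomicBoolean woke = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                cond.sleep();
                woke.set(true);
            } finally {
                lock.unlock();
            }
        });
        waiter.start();
        boolean signalled = false;
        while (!signalled) {              // poll until the waiter is queued
            lock.lock();
            try {
                if (cond.waiting() == 1) {
                    cond.wakeAll();
                    signalled = true;
                }
            } finally {
                lock.unlock();
            }
            Thread.yield();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return woke.get();
    }
}
```

In Nachos the same shape applies, with the interrupt disable/restore pair around the queue operations as listed above.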
Alarm
waitUntil:
disable/enable interrupts
add sleep time for current thread (add variable to KThread?)
add to sleep queue
block current thread
timerInterrupt:
Disable interrupts
Remove and wake up all threads whose sleep time has expired (Machine.timer().getTime())
Enable interrupts
Yield
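The bookkeeping behind waitUntil and timerInterrupt can be sketched as a small simulation without Nachos. Everything here is an assumption made for illustration: `AlarmSketch` is a hypothetical class, thread names stand in for KThread objects, the `now` argument stands in for `Machine.timer().getTime()`, and the returned list stands in for calling `ready()` on each thread. Keeping the wake time in a priority queue is one possible design; storing it in a KThread field, as the slide suggests, works as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: simulated alarm with explicit time values.
class AlarmSketch {
    // One sleeping thread together with the time at which it may wake.
    private static final class Sleeper implements Comparable<Sleeper> {
        final String thread;
        final long wakeTime;

        Sleeper(String thread, long wakeTime) {
            this.thread = thread;
            this.wakeTime = wakeTime;
        }

        @Override
        public int compareTo(Sleeper other) {
            return Long.compare(wakeTime, other.wakeTime);
        }
    }

    // Ordered by wake time, so the earliest sleeper is always at the head.
    private final PriorityQueue<Sleeper> sleepQueue = new PriorityQueue<>();

    // waitUntil: record the wake time and put the thread on the sleep queue.
    // (In Nachos the thread would then block, with interrupts disabled.)
    void waitUntil(String thread, long now, long delay) {
        sleepQueue.add(new Sleeper(thread, now + delay));
    }

    // timerInterrupt: remove every thread whose sleep time has expired and
    // return them in wake-time order.
    List<String> timerInterrupt(long now) {
        List<String> woken = new ArrayList<>();
        while (!sleepQueue.isEmpty() && sleepQueue.peek().wakeTime <= now) {
            woken.add(sleepQueue.poll().thread);
        }
        return woken;
    }
}
```

Because the queue is ordered, the interrupt handler stops at the first thread that is not yet due instead of scanning all sleepers.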
Part 2
Add debugging statements
Write a simple program to copy a file (to test the file system implementation)
Make sure the file is larger than one page
Remember to protect data structures, e.g. by disabling interrupts
Free page management
FileSystem fs
fs = Machine.stubFileSystem();
Example
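// Kernel-side helper that copies data into the user process's memory
public int writeVirtualMemory(int vaddr, byte[] data, int offset, int length)
// C prototype of the system call as seen by user programs
int read(int fd, char *buffer, int size)
case syscallRead:
    fd = a0;                                    // a0 = file descriptor
    of = fdTable.getFile(fd);                   // look up the open file
    if (of == null) return -1;                  // unknown descriptor
    buf = new byte[a2];                         // a2 = number of bytes requested
    rd1 = of.read(buf, 0, a2);                  // bytes actually read from the file
    if (rd1 < 0) return -1;                     // read failed
    rd2 = writeVirtualMemory(a1, buf, 0, rd1);  // a1 = address of the user buffer
    return rd2;                                 // bytes copied to user memory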
Summary
Next time: security