
IT 344: Operating Systems

Distributed Systems

What is a “distributed system”?

• Very broad definition
  – loosely-coupled to tightly-coupled

• Nearly all systems today are distributed in some way
  – they use email
  – they access files over a network
  – they access printers over a network
  – they’re backed up over a network
  – they share other physical or logical resources
  – they cooperate with other people on other machines
  – they access the web
  – they receive video, audio, etc.

Distributed systems are now a requirement

• Economics dictate that we buy small computers
• Everyone needs to communicate
• We need to share physical devices (printers) as well as information (files, etc.)
• Many applications are by their nature distributed (bank teller machines, airline reservations, ticket purchasing)
• To solve the largest problems, we will need to get large collections of small machines to cooperate (parallel programming)

Loosely-coupled systems

• Earliest systems used simple explicit network programs
  – FTP (rcp): file transfer program
  – telnet (rlogin/rsh): remote login program
  – mail (SMTP)

• Each system was a completely autonomous independent system, connected to others on the network

• Even today, most distributed systems are loosely-coupled
  – each CPU runs an independent autonomous OS
  – computers don’t really trust each other
  – some resources are shared, but most are not
  – the system may look different from different hosts
  – typically, communication times are long

Closely-coupled systems

• A distributed system becomes more “closely-coupled” as it
  – appears more uniform in nature
  – runs a “single” operating system
  – has a single security domain
  – shares all logical resources (e.g., files)
  – shares all physical resources (CPUs, memory, disks, printers, etc.)

• In the limit, a distributed system looks to the user as if it were a centralized timesharing system, except that it’s constructed out of a distributed collection of hardware and software components

Tightly-coupled systems

• A “tightly-coupled” system usually refers to a multiprocessor
  – runs a single copy of the OS with a single job queue
  – has a single address space
  – usually has a single bus or backplane to which all processors and memories are connected
  – has very low communication latency
  – processors communicate through shared memory

Some issues in distributed systems

• Transparency (how visible is the distribution)
• Security
• Reliability
• Performance
• Scalability
• Programming models
• Communication models

Distributed File Systems

• The most common distributed services:
  – printing
  – email
  – files
  – computation

• Basic idea of distributed file systems
  – support network-wide sharing of files and devices (disks)

• Generally provide a “traditional” view
  – a centralized shared local file system

• But with a distributed implementation
  – read blocks from remote hosts, instead of from local disks

Issues

• What is the basic abstraction
  – remote file system?
    • open, close, read, write, …
  – remote disk?
    • read block, write block

• Naming
  – how are files named?
  – are those names location transparent?
    • is the file location visible to the user?
  – are those names location independent?
    • do the names change if the file moves?
    • do the names change if the user moves?

• Caching
  – caching exists for performance reasons
  – where are file blocks cached?
    • on the file server?
    • on the client machine?
    • both?

• Sharing and coherency
  – what are the semantics of sharing?
  – what happens when a cached block/file is modified?
  – how does a node know when its cached blocks are out of date?

• Replication
  – replication can exist for performance and/or availability
  – can there be multiple copies of a file in the network?
  – if multiple copies, how are updates handled?
  – what if there’s a network partition and clients work on separate copies?

• Performance
  – what is the cost of remote operation?
  – what is the cost of file sharing?
  – how does the system scale as the number of clients grows?
  – what are the performance limitations: network, CPU, disks, protocols, data copying?

Example: SUN Network File System (NFS)

• The Sun Network File System (NFS) has become a common standard for distributed UNIX file access

• NFS runs over LANs (even over WANs – slowly)

• Basic idea
  – allow a remote directory to be “mounted” (spliced) onto a local directory
  – gives access to that remote directory and all its descendants as if they were part of the local hierarchy

• Pretty much exactly like a “local mount” or “link” on UNIX
  – except for implementation and performance …
  – no, we didn’t really learn about these, but they’re obvious

• For instance (sketched in code below):
  – I mount /u4/teng on Node1 onto /students/foo on Node2
  – users on Node2 can then access this directory as /students/foo
  – if I had a file /u4/teng/myfile, users on Node2 would see it as /students/foo/myfile

• Just as, on a local system, I might link /groups/it344/www/10wi/ as /u4/teng/it344 to allow easy access to my web data from my class home directory
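To make the “splicing” concrete, here is a minimal sketch in Python (illustrative only, not how NFS is actually implemented) of how a mount table maps a local path such as /students/foo/myfile to the remote file /u4/teng/myfile on Node1. The table layout and function name are invented for this example.

# Illustrative mount table: local mount point -> (remote host, remote directory)
MOUNT_TABLE = {
    "/students/foo": ("Node1", "/u4/teng"),
}

def resolve(local_path):
    """Translate a local path into (host, remote_path) if it crosses a mount point."""
    for mount_point, (host, remote_dir) in MOUNT_TABLE.items():
        if local_path == mount_point or local_path.startswith(mount_point + "/"):
            suffix = local_path[len(mount_point):]
            return host, remote_dir + suffix
    return "localhost", local_path      # not under a mount point: stays local

print(resolve("/students/foo/myfile"))  # ('Node1', '/u4/teng/myfile')
print(resolve("/etc/passwd"))           # ('localhost', '/etc/passwd')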

NFS implementation

• NFS defines a set of RPC operations for remote file access (sketched below):
  – searching a directory
  – reading directory entries
  – manipulating links and directories
  – reading/writing files

• Every node may be both a client and a server
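As a rough illustration of what those remote file-access operations might look like, here is a small Python sketch of server-side handlers. Real NFS uses opaque file handles and XDR-encoded RPC messages; the names and the path-based handle scheme below are simplifying assumptions.

import os

class FileServer:
    """Illustrative, stateless server-side handlers for NFS-style operations."""
    def __init__(self, export_root):
        self.root = export_root

    def lookup(self, dir_handle, name):
        """Search a directory for a name; return a handle for the entry."""
        path = os.path.join(self.root, dir_handle, name)
        if not os.path.exists(path):
            raise FileNotFoundError(name)
        return os.path.relpath(path, self.root)

    def readdir(self, dir_handle):
        """Read directory entries."""
        return os.listdir(os.path.join(self.root, dir_handle))

    def read(self, file_handle, offset, count):
        """Read 'count' bytes at 'offset'; stateless, so no open() is required."""
        with open(os.path.join(self.root, file_handle), "rb") as f:
            f.seek(offset)
            return f.read(count)

    def write(self, file_handle, offset, data):
        """Write 'data' at 'offset' in the given file."""
        with open(os.path.join(self.root, file_handle), "r+b") as f:
            f.seek(offset)
            return f.write(data)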

• NFS defines new layers in the Unix file system

[Figure: the system call interface sits above the virtual file system (VFS); the VFS routes local files to UFS through the buffer cache / i-node table, and remote files to NFS, which issues RPCs to other (server) nodes and also answers RPC requests from remote clients]

• The virtual file system (VFS) provides a standard interface, using v-nodes as file handles. A v-node describes either a local or remote file. (A small dispatch sketch follows.)
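The v-node idea can be sketched in a few lines: each v-node wraps either a local (UFS-style) file or a remote (NFS-style) file, and the layer above calls the same interface either way. This is illustrative Python, not the real kernel interface; the class and method names are invented.

class UFSFile:
    """Local file: reads go to the local disk (through the buffer cache in a real kernel)."""
    def __init__(self, path):
        self.path = path
    def read(self, offset, count):
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(count)

class NFSFile:
    """Remote file: reads become RPCs to the server holding the file."""
    def __init__(self, server, handle):
        self.server, self.handle = server, handle
    def read(self, offset, count):
        return self.server.read(self.handle, offset, count)

class VNode:
    """A v-node describes either a local or a remote file; callers cannot tell which."""
    def __init__(self, impl):
        self.impl = impl                # a UFSFile or an NFSFile
    def read(self, offset, count):
        return self.impl.read(offset, count)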

NFS caching / sharing

• On an open, the client asks the server whether its cached blocks are up to date.

• Once a file is open, different clients can write it and get inconsistent data.

• Modified data is flushed back to the server every 30 seconds.
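A minimal sketch of these client-side rules, in illustrative Python rather than the real NFS protocol (the server object, its getattr/write methods, and the class name are assumptions):

import time

FLUSH_INTERVAL = 30.0          # modified data is pushed back roughly every 30 seconds

class NFSClientCache:
    def __init__(self, server):
        self.server = server   # assumed to provide getattr(path) and write(path, block, data)
        self.blocks = {}       # (path, block#) -> data
        self.mtimes = {}       # path -> modification time seen when blocks were cached
        self.dirty = set()
        self.last_flush = time.monotonic()

    def open(self, path):
        # On open, ask the server whether our cached blocks are still up to date.
        mtime = self.server.getattr(path)["mtime"]
        if self.mtimes.get(path) != mtime:
            self.blocks = {k: v for k, v in self.blocks.items() if k[0] != path}
            self.mtimes[path] = mtime

    def write(self, path, block_no, data):
        self.blocks[(path, block_no)] = data
        self.dirty.add((path, block_no))
        # Other clients may read stale data until the next periodic flush.
        if time.monotonic() - self.last_flush >= FLUSH_INTERVAL:
            for p, b in sorted(self.dirty):
                self.server.write(p, b, self.blocks[(p, b)])
            self.dirty.clear()
            self.last_flush = time.monotonic()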

04/21/23 18

Example: CMU’s Andrew File System (AFS)

• Developed at CMU to support all of its student computing

• Consists of workstation clients and dedicated file server machines (differs from NFS)

• Workstations have local disks, used to cache files being used locally (originally whole files, subsequently 64K file chunks) (differs from NFS)

• Andrew has a single name space – your files have the same names everywhere in the world (differs from NFS)

• Andrew is good for distant operation because of its local disk caching: after a slow startup, most accesses are to local disk


AFS caching/sharing

• Need for scaling required reduction of client-server message traffic

• Once a file is cached, all operations are performed locally

• On close, if the file has been modified, it is replaced on the server

• The client assumes that its cache is up to date, unless it receives a callback message from the server saying otherwise
  – on file open, if the client has received a callback on the file, it must fetch a new copy; otherwise it uses its locally-cached copy (differs from NFS)
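A minimal sketch of the callback idea in Python (illustrative only; the class and method names are invented, and the real AFS protocol has many more details):

class AFSClient:
    def __init__(self, server):
        self.server = server        # assumed to provide fetch(path, client) and store(path, data)
        self.cache = {}             # path -> file contents cached on the local disk
        self.callback_ok = {}       # path -> True while the server's callback promise holds

    def open(self, path):
        if path in self.cache and self.callback_ok.get(path):
            return self.cache[path]             # no server traffic at all
        data = self.server.fetch(path, client=self)
        self.cache[path] = data
        self.callback_ok[path] = True           # server promises to call back if the file changes
        return data

    def close(self, path, data):
        self.cache[path] = data
        self.server.store(path, data)           # a modified file is replaced on the server on close

    def break_callback(self, path):
        # Invoked (via RPC) by the server when another client updates the file.
        self.callback_ok[path] = False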

Example: Berkeley Sprite File System

• Unix file system developed for diskless workstations with large memories at UCB (differs from NFS, AFS)

• Considers memory as a huge cache of disk blocks
  – memory is shared between file system and VM

• Files are permanently stored on servers
  – servers have a large memory that acts as a cache as well

• Several workstations can cache blocks for read-only files

• If a file is being written by more than 1 machine, client caching is turned off – all requests go to the server (differs from NFS, AFS)

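A minimal sketch of the concurrent write-sharing rule in Python (illustrative only; the real Sprite server also tells clients that are already caching the file to flush and stop caching):

from collections import defaultdict

class SpriteServer:
    def __init__(self):
        self.readers = defaultdict(set)     # path -> clients with the file open for reading
        self.writers = defaultdict(set)     # path -> clients with the file open for writing

    def open(self, path, client, mode):
        (self.writers if mode == "w" else self.readers)[path].add(client)
        openers = self.readers[path] | self.writers[path]
        write_shared = bool(self.writers[path]) and len(openers) > 1
        # The reply tells the client whether it may cache blocks locally; if the
        # file is write-shared, caching is turned off and all requests go to the server.
        return {"caching_allowed": not write_shared}

    def close(self, path, client):
        self.readers[path].discard(client)
        self.writers[path].discard(client)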

Other Approaches

• Serverless
  – xFS, Farsite

• Highly Available
  – GFS

• Mostly Read Only
  – WWW

• State, not Files
  – SQL Server
  – BigTable

Administrivia

• Case study

http://www.et.byu.edu/groups/it344/10wi/casestudies.htm#_Assignments

• Lab 8 – last one of the semester, no write-up; pass-off gives you full credit
• No lab next week – catch up on past-due work and work on BYOOS
• BYOOS part 3 – write-up just to show progress
• HW 10 – last one of the semester
• Final exam, on Blackboard, date TBD, probably 1st week of April
• HONOR CODE

© 2007 Gribble, Lazowska, Levy, Zahorjan

Example: Google File System (GFS)

[Figure: “Google” circa 1997 (google.stanford.edu), on a spectrum from Independence (small scale, many users, many programs) to Cooperation (large scale, few users, few programs)]

Google (circa 1999)

Google data center (circa 2000)

Google new data center 2001

Google data center (3 days later)

GFS: Google File System

• Why did Google build its own FS?

• Google has unique FS requirements
  – Huge read/write bandwidth
  – Reliability over thousands of nodes
  – Mostly operating on large data blocks
  – Need efficient distributed operations

• Unfair advantage
  – Google has control over applications, libraries and operating system

GFS Ideology

• Huge amount of data
• Ability to efficiently access data with low locality; a typical query reads 100s of MB of data
• Large quantity of cheap machines: performance vs. performance/$, performance/W
• Replication: scalability and h/w failure
• BW more important than latency
• Component failures are the norm rather than the exception
• Atomic append operation so that multiple clients can append concurrently (see the sketch below)
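To illustrate why an atomic append primitive matters, here is a small Python sketch (not GFS code; the names and the locking scheme are invented) in which a primary serializes appends so that many clients can append concurrently without coordinating offsets among themselves:

import threading

class PrimaryChunk:
    def __init__(self, size_limit=64 * 2**20):      # one 64 MB chunk
        self.data = bytearray()
        self.lock = threading.Lock()
        self.size_limit = size_limit

    def record_append(self, record):
        """Append atomically; return the offset at which the record landed."""
        with self.lock:
            if len(self.data) + len(record) > self.size_limit:
                raise RuntimeError("chunk full; client would retry on a new chunk")
            offset = len(self.data)
            self.data += record
            return offset

chunk = PrimaryChunk()
# Many clients (threads here) append without agreeing on offsets beforehand.
threads = [threading.Thread(target=chunk.record_append, args=(b"log line\n",))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(len(chunk.data))      # 8 records * 9 bytes = 72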

GFS Usage @ Google

• 200+ clusters
• Filesystem clusters of 1000s of machines
• Pools of 1000+ clients
• 4+ PB filesystems
• 40 GB/s read/write load
• (in the presence of frequent HW failures)

Files in GFS

• Files are huge by traditional standards
• Most files are mutated by appending new data rather than overwriting existing data
• Once written, the files are only read, and often only sequentially
• Appending becomes the focus of performance optimization and atomicity guarantees

GFS Setup

• Master manages metadata
• Data transfers happen directly between clients and chunkservers
• Files broken into chunks (typically 64 MB)

[Figure: clients contact the GFS master (and its replicas) for metadata, supported by miscellaneous servers; clients then transfer data directly with chunkservers 1..N, where chunks such as C0, C1, C2, C3, and C5 are each replicated on several chunkservers]

Architecture

• GFS cluster consists of a single master and multiple chunk servers and is accessed by multiple clients.

• Each of these is typically a commodity Linux machine running a user-level server process.

• Files are divided into fixed-size chunks identified by an immutable and globally unique 64 bit chunk handle

• For reliability, each chunk is replicated on multiple chunk servers

• The master maintains all file system metadata.
• The master periodically communicates with each chunk server in HeartBeat (timer) messages to give it instructions and collect its state.
• Neither the client nor the chunk server caches file data, eliminating cache coherence issues.
• Clients do cache metadata, however.

Architecture

[Figure: GFS architecture diagram]

Read Process

• Single master vastly simplifies the design

• Clients never read or write file data through the master. Instead, a client asks the master which chunk servers it should contact.

• Using the fixed chunk size, the client translates the file name and byte offset specified by the application into a chunk index within the file.

• It sends the master a request containing the file name and chunk index. The master replies with the corresponding chunk handle and locations of the replicas. The client caches this information using the file name and chunk index as the key.

• The client then sends a request to one of the replicas, most likely the closest one. The request specifies the chunk handle and a byte range within that chunk. (This read path is sketched below.)
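The read path above can be sketched in a few lines of Python (illustrative only; the master and chunkserver interfaces, and the method names, are assumptions):

CHUNK_SIZE = 64 * 2**20     # 64 MB fixed chunk size

class GFSClient:
    def __init__(self, master, chunkservers):
        self.master = master                # assumed: lookup(name, index) -> (handle, replica list)
        self.chunkservers = chunkservers    # assumed: address -> object with read(handle, off, n)
        self.location_cache = {}            # (file name, chunk index) -> (handle, replicas)

    def read(self, filename, offset, length):
        # 1. Translate (file name, byte offset) into a chunk index.
        chunk_index = offset // CHUNK_SIZE
        key = (filename, chunk_index)
        # 2. Ask the master (or our cache) for the chunk handle and replica locations.
        if key not in self.location_cache:
            self.location_cache[key] = self.master.lookup(filename, chunk_index)
        handle, replicas = self.location_cache[key]
        # 3. Read the byte range directly from one replica, ideally the closest one.
        #    (A read that crosses a chunk boundary would need a second request.)
        server = self.chunkservers[replicas[0]]
        return server.read(handle, offset % CHUNK_SIZE, length)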

Specifications

• Chunk size = 64 MB
• Chunks are stored as plain Unix files on the chunk server
• A persistent TCP connection to the chunk server is kept over an extended period of time (reduces network overhead)
• Clients cache all the chunk location information to facilitate small random reads
• The master keeps the metadata in memory
• Disadvantage – small files become hotspots
• Solution – higher replication for such files

Microsoft Data Center 4.0

• http://www.youtube.com/watch?v=PPnoKb9fTkA


Data center container

• Microsoft $500M Chicago data center (2009)
• > 2000 servers/container (40 ft)
• 150 containers
• 11 diesel generators, each 2.8 megawatts
• 12 chillers, each 1260 tons

• Google
• IBM
• HP
• …

Cloud Computing Platforms


Client/server computing

• Mail server/service
• File server/service
• Print server/service
• Compute server/service
• Game server/service
• Music server/service
• Web server/service
• etc.

Peer-to-peer (p2p) systems

• Napster
• Gnutella (LimeWire)
  – example technical challenge: self-organizing overlay network
  – technical advantage of Gnutella?
  – er … legal advantage of Gnutella?

[Chart omitted; data source: Digital Music News Research Group]

Summary

• There are a number of issues to deal with:
  – what is the basic abstraction
  – naming
  – caching
  – sharing and coherency
  – replication
  – performance

• No right answer! Different systems make different tradeoffs!

• Performance is always an issue
  – always a tradeoff between performance and the semantics of file operations (e.g., for shared files)

• Caching of file blocks is crucial in any file system
  – maintaining coherency is a crucial design issue

• Newer systems are dealing with issues such as disconnected operation for mobile computers

Service Oriented Architecture

• How do you allow hundreds of developers to work on a single website?

Amazon.com: The Beginning

• Initially, one web server (Obidos) and one database

[Figure: Internet → Obidos → Database]

Details: the front end consists of a web server (Apache) and “business logic” (Obidos)

Amazon: Success Disaster!

[Figure: Internet → load balancer → multiple Obidos instances → replicated databases]

Use redundancy to scale up, improve availability

Obidos

• Obidos was a single monolithic C application that comprised most of Amazon.com’s functionality

• During scale-up, this model began to break down

Problem #1: Branch Management

• Merging code across branches becomes untenable

[Figure: HelloWorld.c on development and release branches; blue changes depend on red changes (which may depend on other changes…)]

Problem #2: Debugging

• On a failure, we would like to inspect what happened “recently”
  – But, the change log contains numerous updates from many groups

• Bigger problem: lack of isolation
  – Change by one group can impact others

Problem #3: Linker Failure

• Obidos grew so large that standard build tools were failing

Service-Oriented Architecture (1)

• First, decompose the monolithic web site into a set of smaller modules
  – Called services

• Examples:
  – Recommendation service
  – Price service
  – Catalogue service
  – And MANY others

Sidebar: Modularity

• Information hiding (Parnas 1972): The purpose of a module is to hide secrets

• Benefits of modularity
  – Groups can work independently
    • Less “synchronization overhead”
  – Ease of change
    • We are free to change the hidden secrets
  – Ease of comprehension
    • Can study the system at a high level of abstraction

public interface List {
    // This can be an array, a linked-list,
    // or something else
}

Systems and Information Hiding

• There is often a tension between performance and information hiding

• In OS’s, performance often wins:

struct buffer {
    // DO NOT MOVE these fields!
    // They are accessed by inline assembly that
    // assumes the current ordering.
    struct buffer* next;
    struct buffer* prev;
    int size;
    …
}

Service Oriented Architectures (2)

• Modularity + a network

• Services live on disjoint sets of machines

• Services communicate using RPC
  – Remote procedure call

Remote Procedure Call

• RPC exposes a programming interface across machines:

interface PriceService {
    float getPrice(long uniqueID);
}

[Figure: the client invokes getPrice(), which is executed by PriceImpl on the server]
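A minimal, runnable sketch of the same idea using Python's standard-library XML-RPC modules (this is not Amazon's RPC stack; the service name, port, and prices are invented):

from xmlrpc.server import SimpleXMLRPCServer
import threading
import xmlrpc.client

# --- Server side: the "PriceImpl" that really knows the price ---------------
def get_price(unique_id):
    return 9.99 if unique_id == 42 else 19.99

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False, allow_none=True)
server.register_function(get_price, "getPrice")
threading.Thread(target=server.serve_forever, daemon=True).start()

# --- Client side: looks like a local call, but runs on the server -----------
price_service = xmlrpc.client.ServerProxy("http://localhost:8000")
print(price_service.getPrice(42))   # 9.99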

SOA, Visualized

[Figure: the Website service makes remote calls to the ShoppingCart, Price, Recommendation, and Catalogue services]

• All services reside on separate machines
• All invocations are remote procedure calls

Benefits of SOA

• Modularity and service isolation
  – This extends all the way down to the OS, programming language, build tools, etc.

• Better visibility
  – Administrators can monitor the interactions between services

• Better resource accounting
  – Who is using which resources?

Performance Issues

• A webpage can require dozens of service calls
  – RPC system must be high performance

• Metrics of interest:
  – Throughput
  – Latency
    • Both the average and the variance

SLAs

• Service performance is dictated by contracts called Service Level Agreements
  – e.g., Service Foo must
    • have 4 9’s of availability
    • have a median latency of 50 ms
    • have a 3 9’s latency of 200 ms
  – a sketch of checking such percentile targets follows
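A small Python sketch of checking the latency side of such an agreement, using a nearest-rank percentile over a window of measurements (the numbers are made up; real systems aggregate far more samples):

def percentile(samples, p):
    """Nearest-rank percentile (p in [0, 100]) of a list of latencies in ms."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 35, 48, 50, 51, 49, 47, 180, 52, 46, 44, 950]   # made-up data

median = percentile(latencies_ms, 50)
p999 = percentile(latencies_ms, 99.9)
print("meets median SLA (<= 50 ms):", median <= 50)
print("meets 3 9's SLA  (<= 200 ms):", p999 <= 200)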

Amazon and Web Services

• Allow third-parties to use some (but not all) of the Amazon platform

[Figure: Amazon.com’s front-end website uses internal services such as Catalogue, Order Processing, and Shopping Carts; a third party (Sleds.com) is given web-service access to some of these, such as the Catalogue]

Searching on a Web Site

Searching Through a Web Service

class Program {
    static void Main(string[] args) {
        AWSECommerceService service = new AWSECommerceService();

        ItemSearch request = new ItemSearch();
        request.SubscriptionId = "0525E2PQ81DD7ZTWTK82";

        request.Request = new ItemSearchRequest[1];
        request.Request[0] = new ItemSearchRequest();
        request.Request[0].ResponseGroup = new string[] { "Small" };
        request.Request[0].SearchIndex = "Books";
        request.Request[0].Author = "Tom Clancy";

        ItemSearchResponse response = service.ItemSearch(request);

        Console.WriteLine(response.Items[0].Item.Length + " books written by Tom Clancy found.");
    }
}

Other Web Services

• Google
  – Calendar
  – Maps
  – Charts

• Amazon infrastructure services (cloud)
  – Simple storage (disk)
  – Elastic compute cloud (virtual machines)
  – SimpleDB

• Facebook
• eBay
• …
