© 2013 a. haeberlen, z. ives nets 212: scalable and cloud computing 1 university of pennsylvania...
TRANSCRIPT
![Page 1: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/1.jpg)
1University of Pennsylvania
© 2013 A. Haeberlen, Z. Ives
NETS 212: Scalable and Cloud Computing
Internet basics; Faults and failures
September 10, 2013
![Page 2: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/2.jpg)
© 2013 A. Haeberlen, Z. Ives
2University of Pennsylvania
Announcements HW0 is due at 10:00pm today
If you haven't received your svn account info yet, please see me after class
HW1 will be available tonight MS1 due 9/17; MS2 due 9/26 [MS2 is still beta!] AWS credit codes will be mailed out later today Please start early!
Why not start tomorrow? Don't wait until the last moment -- you will need some time for
debugging! Getting an AWS account may take some time (days) -- please
sign up for the account soon!
Anyone still on the waiting list? Please come see me after class (one last time)
![Page 3: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/3.jpg)
© 2013 A. Haeberlen, Z. Ives
3University of Pennsylvania
Plan for today
Parallel programming and its challenges Parallelization and scalability, Amdahl's law Synchronization, consistency Mutual exclusion, locking, issues related to locking Architectures: SMP, NUMA, Shared-nothing
All about the Internet in 30 minutes Structure; packet switching; some important
protocols Latency, packet loss, bottlenecks, and why they
matter
Distributed programming and its challenges
Network partitions and the CAP theorem Faults, failures, and what we can do about them
NEXT
![Page 4: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/4.jpg)
© 2013 A. Haeberlen, Z. Ives
4University of Pennsylvania
The life and times of a web request
What happens when I open the web page 'www.google.com' in my browser?
First approximation: My computer contacts another computer in California and requests the web page from there
"Please give me the web page"
Server inCalifornia
![Page 5: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/5.jpg)
© 2013 A. Haeberlen, Z. Ives
5University of Pennsylvania
"Please give me the web page"
HTTP and HTML
There are standardized protocols: Hypertext Transfer Protocol (HTTP): Describes how
web pages are requested Hypertext Markup Language (HTML): The language
the actual web page is written in
How does the request make it to California?
GET / HTTP/1.1
Server inCalifornia
<html> <head><title>Google</title></head> <body>...</body></html>
HTTP/1.1 200 OK
![Page 6: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/6.jpg)
© 2013 A. Haeberlen, Z. Ives
6University of Pennsylvania
The Internet
The Internet consists of tens of thousands of interconnected networks
Routers and switches forward the data from one network link to the next
Request and response travel along a path through these networks (usually, but not always the 'shortest' path)
Server inCalifornia
Google UPenn
Cogent
AT&TLevel 3 Router
Switch
NetworksIndividual
network link
Path
Client
![Page 7: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/7.jpg)
© 2013 A. Haeberlen, Z. Ives
7University of Pennsylvania
Packet switching
Communication consists of packets Each packet traverses the path independently No dedicated connection like in the telephone
network Packets are relatively small (typically up to 1,500
bytes)
Why is this a good idea?
Google UPenn
Cogent
AT&TLevel 3
Server inCalifornia
Client
![Page 8: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/8.jpg)
© 2013 A. Haeberlen, Z. Ives
8University of Pennsylvania
IP addresses
How do routers know where to send a packet?
Each machine is assigned an IP address Machines in the same network are given similar addresses,
usually from an IP range (Example: Penn's IP range is 158.130.0.0/16)
Each packet has a source and a destination address Each router has a forwarding table that maps ranges
to links over which packets in that range should be sent
Google UPenn
Cogent
AT&TLevel 3
173.194.34.104158.130.53.72
?
4Bit 0 Bit 31
Source IPDestination IP
(data)
Indicates this isan IPv4 packet
![Page 9: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/9.jpg)
© 2013 A. Haeberlen, Z. Ives
9University of Pennsylvania
AAAA
IP routing
Networks exchange routing information If a connection or router fails, this information is
updated Result: Global reachability. Any machine on the
Internet can (in principle) communicate with any other machine.
LL
MM
II
JJ
NN
EE
KK
GG
CC
BB
DD
FF
HH
I know how to
get to A
Networks
![Page 10: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/10.jpg)
© 2013 A. Haeberlen, Z. Ives
10University of Pennsylvania
Plan for today
Parallel programming and its challenges Parallelization and scalability, Amdahl's law Synchronization, consistency Mutual exclusion, locking, issues related to locking Architectures: SMP, NUMA, Shared-nothing
All about the Internet in 30 minutes Structure; packet switching; some important
protocols Latency, packet loss, bottlenecks, and why they
matter
Distributed programming and its challenges
Network partitions and the CAP theorem Faults, failures, and what we can do about them
NEXT
![Page 11: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/11.jpg)
© 2013 A. Haeberlen, Z. Ives
11University of Pennsylvania
Path properties: Bottleneck capacity
How fast can we send data on our path? Limited by the bottleneck capacity What else does the available capacity depend on? Which links are usually the bottleneck links?
ServerClient
Bottleneck
![Page 12: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/12.jpg)
© 2013 A. Haeberlen, Z. Ives
12University of Pennsylvania
Path properties: Propagation delay
Speed of light: 299 792 458 m/s Latency matters!
[ahae@ds01 ~]$ traceroute www.mpi-sws.orgtraceroute to www.mpi-sws.org (139.19.1.156), 30 hops max, 60 byte packets 1 SUBNET-46-ROUTER.seas.UPENN.EDU (158.130.46.1) 1.744 ms 2.134 ms 2.487 ms 2 158.130.21.34 (158.130.21.34) 5.327 ms 5.395 ms 5.649 ms 3 isc-uplink-2.seas.upenn.edu (158.130.128.2) 5.671 ms 5.825 ms 6.175 ms 4 external3-core1.dccs.UPENN.EDU (128.91.9.2) 6.007 ms 6.283 ms 6.362 ms 5 external-core2.dccs.upenn.edu (128.91.10.1) 6.830 ms 6.990 ms 7.080 ms 6 local.upenn.magpi.net (216.27.100.73) 7.250 ms 3.429 ms 3.533 ms 7 remote.internet2.magpi.net (216.27.100.54) 4.487 ms 3.002 ms 2.925 ms 8 198.32.11.51 (198.32.11.51) 90.557 ms 90.806 ms 91.028 ms 9 so-6-2-0.rt1.fra.de.geant2.net (62.40.112.57) 97.403 ms 97.473 ms 97.766 ms10 dfn-gw.rt1.fra.de.geant2.net (62.40.124.34) 98.834 ms 98.890 ms 99.043 ms11 xr-fzk1-te2-3.x-win.dfn.de (188.1.145.50) 100.627 ms 101.034 ms 101.387 ms12 xr-kai1-te1-1.x-win.dfn.de (188.1.145.102) 103.985 ms 104.383 ms 104.528 ms13 xr-saa1-te1-1.x-win.dfn.de (188.1.145.97) 103.636 ms 103.903 ms 104.139 ms14 kr-0unisb.x-win.dfn.de (188.1.234.38) 103.983 ms 103.746 ms 103.853 ms15 mpi2rz-hsrp2.net.uni-saarland.de (134.96.6.28) 104.469 ms 104.355 ms 104.491 ms[ahae@ds01 ~]$
~6,270km (one way)
Round-triptime
![Page 13: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/13.jpg)
© 2013 A. Haeberlen, Z. Ives
13University of Pennsylvania
RTT versus geographical distance
Sourc
e:
htt
p:/
/ww
w.c
aid
a.o
rg/p
roje
cts/
ark
/sta
tist
ics/
otp
-ro/r
tt_v
s_d
ista
nce
.htm
l
Theoretical best(given speed of light
in fiber)
![Page 14: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/14.jpg)
© 2013 A. Haeberlen, Z. Ives
14University of Pennsylvania
Path properties:
What if we send packets too quickly? Router stores the packets in a queue until it can send
them Consequence : End-to-end delay increases Where does this matter?
What if the router runs out of queue space?
Packets are dropped and lost
Other reasons why packets might be dropped?
Queueing delay, loss
![Page 15: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/15.jpg)
© 2013 A. Haeberlen, Z. Ives
15University of Pennsylvania
TCP
Transmission Control Protocol (TCP) provides abstraction of a reliable stream of bytes
Ensures packets are delivered to application in correct order
Retransmits lost packets Tracks available capacity and prevents packets from
being sent too fast (congestion control) Prevents sender from overwhelming the receiver (flow
control)
1 2 3 4IP 1 24 IP
Sender Receiver
TCP TCPData packets
ACK 1 ACK 2Acknowledgments
![Page 16: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/16.jpg)
© 2013 A. Haeberlen, Z. Ives
16University of Pennsylvania
TCP congestion control
How fast should the sender send? Problem: Available capacity not known (and can
vary)
Solution: Congestion control Maintain a congestion window of max #packets in
flight Slow start: Exponential increase until threshold
Increase cwnd by one packet for each incoming ACK Congestion avoidance: Additive increase,
multiplicative decrease (AIMD)
Congestionwindow (cwnd)
Time
-50% -50%
-50%
"Slow start" phase(actually fast!)
ssthresh
packet loss
![Page 17: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/17.jpg)
© 2013 A. Haeberlen, Z. Ives
17University of Pennsylvania
Another reason why latency matters
The higher the RTT, the slower the process
Sender Receiver Sender Receiver
...
1-1460
1461-2920
ACK 1460
1-1460
ACK 1460
1461-29202921-4380
ACK 2920
ACK 4380
![Page 18: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/18.jpg)
© 2013 A. Haeberlen, Z. Ives
18University of Pennsylvania
Recap: The Internet in 30 minutes
What is the Internet? Tens of thousands of interconnected networks Technology: Packet switching (not like telephone
network!)
How does the network matter to applications?
Propagation delay Good to be physically close to customer
Bottlenecks Transfer speed is limited Queueing delays, loss, reordering Delay can vary Network can partition Problem for
consistency/availability Some of these can be taken care of by TCP
![Page 19: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/19.jpg)
© 2013 A. Haeberlen, Z. Ives
19University of Pennsylvania
A quick look at HW1
![Page 20: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/20.jpg)
© 2013 A. Haeberlen, Z. Ives
20University of Pennsylvania
A quick look at HW1
Task: Build a cloud-based "Image search"
Goal: Get experience with JavaScript, jQuery, Node.js, SimpleDB; start working with large data sets (DBpedia)
We've provided most of the code (and detailed instructions), but you have to fill in some missing pieces
![Page 21: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/21.jpg)
© 2013 A. Haeberlen, Z. Ives
21University of Pennsylvania
How does this work?
Your VM
Browser
<html><head><body>…
Web page (home.ejs)
function foo() {$("#id").html("x");}
Script (app.js)
DOMaccesses
Serverrequire('http');http.createServer(…)
Server (HW1.js)
Amazon SimpleDBInternet
![Page 22: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/22.jpg)
© 2013 A. Haeberlen, Z. Ives
22University of Pennsylvania
What is JavaScript?
A widely-used programming language Started out at Netscape in 1995 Widely used on the web; supported by every major
browser Also used in many other places: PDFs, certain
games, ... ... and now even on the server side (Node.js)!
What is it like? Dynamic typing, duck typing Object-based, but associative arrays instead of
'classes' Prototypes instead of inheritance Supports run-time evaluation via eval() First-class functions
![Page 23: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/23.jpg)
© 2013 A. Haeberlen, Z. Ives
23University of Pennsylvania
Running JavaScript in the browser
Web pages can contain JavaScript code Example: Pop up a dialog box when user clicks a
button The code can receive user inputs (e.g., clicks) and
produce outputs, e.g., by changing the web page in which it runs
This is done via the DOM (Document Object Model) Not just a toy language: Entire applications are being
written in it (think Google Apps!)
<html><head><script>function update(){ document.getElementById("color"). innerHTML = "Green";}</script></head><body><div id="color">Red</div><br><input onclick="update()" type="button" value="Change color"></body></html>
Uses DOMto change
text on page
Eventhandle
r
![Page 24: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/24.jpg)
© 2013 A. Haeberlen, Z. Ives
24University of Pennsylvania
What is jQuery?
A lightweight JavaScript library Makes many common functions, such as DOM
manipulation or AJAX, much easier to implement (typically single line)
Examples: $("#id").html(), $("#id").click(), $.getJSON(), ... Widely used (Google, Microsoft, IBM, Netflix, ...)
<html><head><script src="jquery.min.js"></script><script src="app.js"></script></head><body><p id="test">This is some <b>bold</b> text in a paragraph.</p><button id="btn1">Show Text</button><button id="btn2">Show HTML</button></body></html>
$(document).ready(function(){ $("#btn1").click(function(){ alert("Text: " + $("#test").text()); }); $("#btn2").click(function(){ alert("HTML: " + $("#test").html()); });});
test.html app.js
![Page 25: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/25.jpg)
© 2013 A. Haeberlen, Z. Ives
25University of Pennsylvania
Bootstrap
A toolbox for creating web sites HTML/CSS-based design templates for typography,
forms, buttons, navigation, and other interface elements
Can do popups, navbars, progress bars, ...
![Page 26: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/26.jpg)
© 2013 A. Haeberlen, Z. Ives
26University of Pennsylvania
What is Node.js?
A platform for JavaScript-based network apps
Based on Google's JavaScript engine from Chrome Comes with a built-in HTTP server library Lots of libraries and tools available; even has its own
package manager (npm)
Event-driven programming model There is a single "thread", which must never block If your program needs to wait for something (e.g., a
response from some server you contacted), it must provide a callback function
![Page 27: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/27.jpg)
© 2013 A. Haeberlen, Z. Ives
27University of Pennsylvania
"Hello World" with Node.js
Uses built-in HTTP library to create a server
The server will listen on port 8080 createServer() is given a callback function that is
called whenever someone requests a web page Callback writes the required HTTP header followed by "Hello
World" To view the result, open http://localhost:8080/ in a
browser
var http = require('http');
http.createServer( function (request, response) { response.writeHead(200, {'Content-Type': 'text/plain'}); response.end('Hello World\n'); } ).listen(8080);
console.log('Server running' + ' at http://localhost:8080/');
GET / HTTP/1.1
HTTP/1.1 200 OKContent-Type: text/plain
Hello World
Callbackfunction
![Page 28: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/28.jpg)
© 2013 A. Haeberlen, Z. Ives
28University of Pennsylvania
What is JSON?
A standard format for data interchange "JavaScript Object Notation"; MIME type
application/json Basically legal JavaScript code; can be parsed with
eval() Often used in AJAX-style applications Data types: Numbers, strings, booleans, arrays,
"objects"
{ "firstName": "John", "lastName": "Smith", "age": 25, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": 10021 }, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] }
Array (ordered sequence of
values; can be different types)
"Object": Unorderedcollection of
key-value pairs
![Page 29: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/29.jpg)
© 2013 A. Haeberlen, Z. Ives
29University of Pennsylvania
Calling the server$.getJSON('/search/' + $("#inputfield").val(), function(data) { $("#status").html(data.num_results+" result(s)"); });
var express = require('express');var app = express();...app.get('/search/:word', function(req, res) { var n = findResults(req.params.word); res.send(JSON.stringify({num_results: n, foo: 123}));});
Client code(in your browser)
Server code(Node.js)
GET /search/clouds HTTP/1.1
{ num_results: 5, foo: 123 }
Status: 5 result(s)
Web page (in your browser)
Waiting...
![Page 30: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/30.jpg)
© 2013 A. Haeberlen, Z. Ives
30University of Pennsylvania
Fear not!
You don't have to learn this all at once For HW1, we've already written most of the code for
you All you need to do for milestone #1 is implement a
small part of the application But it'll be useful to see how the pieces fit together
The TAs are trying to organize a lab session on Friday; stay tuned for announcements on Piazza
![Page 31: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/31.jpg)
© 2013 A. Haeberlen, Z. Ives
31University of Pennsylvania
Plan for today
Parallel programming and its challenges Parallelization and scalability, Amdahl's law Synchronization, consistency Mutual exclusion, locking, issues related to locking Architectures: SMP, NUMA, Shared-nothing
All about the Internet in 30 minutes Structure; packet switching; some important
protocols Latency, packet loss, bottlenecks, and why they
matter
Distributed programming and its challenges
Faults, failures, and what we can do about them Network partitions, CAP theorem, relaxed
consistency
NEXT
![Page 32: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/32.jpg)
© 2013 A. Haeberlen, Z. Ives
32University of Pennsylvania
Complications in wide-area networks
Communication is slower, less reliable Latencies are higher, more variable Bottleneck capacity is lower Packet loss, reordering, queueing delays
Faults are more common Broken or malfunctioning nodes Network partitions
![Page 33: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/33.jpg)
© 2013 A. Haeberlen, Z. Ives
33University of Pennsylvania
Faults and failures
Terminology: Fault: Some component is not working correctly Failure: System as a whole is not working correctly
X=5
X=5
Set X:=5
X=5
X=5
What is X?
X=5
X=5
What is X?
X=5
X=3
What is X?
X=3
Fault(masked)
Faultscausingfailure
Correct
![Page 34: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/34.jpg)
© 2013 A. Haeberlen, Z. Ives
34University of Pennsylvania
Faults in distributed systems
What could possibly go wrong? Node loses power Hard disk fails Administrator accidentally erases data Administrator configures node incorrectly Software bug triggers Network overloaded, drops lots of packets Hacker breaks into some of the nodes Disgruntled employee manipulates node Fire breaks out in data center where node resides Police confiscates node because of illegal activity ... ...
![Page 35: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/35.jpg)
© 2013 A. Haeberlen, Z. Ives
35University of Pennsylvania
Common misconceptions about faults
"Faults are rare exceptions" NO! At scale, faults are occurring all the time Stopping the system while handling the fault is NOT
an option - system needs to continue despite the fault
"Faulty machines always stop/crash" NO! There are many types of faults with different
effects If your system is designed to handle only crash faults
and another type of fault occurs, things can become very bad
![Page 36: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/36.jpg)
© 2013 A. Haeberlen, Z. Ives
36University of Pennsylvania
Types of faults Crash faults
Node simply stops Examples: OS crash, power loss
Rational behavior Owner manipulates node to increase profit Example: Traffic attraction attack (see next
slide)
Byzantine faults Arbitrary - faulty node could do anything
(stop, tamper with data, tell lies, attack other nodes, send spam, spy on user...)
Example: Node compromised by a hacker, data corruption, hardware defect...
![Page 37: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/37.jpg)
© 2013 A. Haeberlen, Z. Ives
37University of Pennsylvania
Rational fault example
System + control are distributed
$$$$$$$
$$$
$$$$
$$$$
$$$
I need connectivit
y!
Who knows how to get to YouTube?
I have a good route
I have an okay route
Alice's provider can choose between several routes to the same destination
Alice
Traffic attraction: see Goldberg et al., "Rationality and Traffic Attraction: Incentives for honestly announcing paths in BGP", SIGCOMM 2010
![Page 38: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/38.jpg)
© 2013 A. Haeberlen, Z. Ives
38University of Pennsylvania
Rational fault example
Networks have an incentive to make their routes appear better than they are
$$$$
$$$
I have a GREAT route to
YouTube
I wish my route had
been chosen
Alice
![Page 39: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/39.jpg)
© 2013 A. Haeberlen, Z. Ives
39University of Pennsylvania
Some examples of Byzantine faults
http://consumerist.com/2007/08/lax-meltdown-caused-by-a-single-network-interface-card.html
![Page 40: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/40.jpg)
© 2013 A. Haeberlen, Z. Ives
40University of Pennsylvania
Some examples of Byzantine faults
http://status.aws.amazon.com/s3-20080720.html
![Page 41: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/41.jpg)
© 2013 A. Haeberlen, Z. Ives
41University of Pennsylvania
Some examples of Byzantine faults
http://www.wired.com/epicenter/2009/01/magnolia-suffer/
![Page 42: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/42.jpg)
© 2013 A. Haeberlen, Z. Ives
42University of Pennsylvania
Some examples of Byzantine faults
http://groups.google.com/group/google-appengine/msg/ba95ded980c8c179
![Page 43: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/43.jpg)
© 2013 A. Haeberlen, Z. Ives
43University of Pennsylvania
Some examples of Byzantine faults
Disgruntled UBS PaineWebber Employee Charged with Allegedly Unleashing “Logic Bomb” on Company Computers
NEWARK - A disgruntled computer systems administrator for UBS PaineWebber was charged today with using a “logic bomb” to cause more than $3 million in damage to the company’s computer network, and with securities fraud for his failed plan to drive down the company’s stock with activation of the logic bomb, U.S. Attorney Christopher J. Christie announced. Roger Duronio, 60, of Bogota, N.J., was charged today a two-count Indictment returned by a federal grand jury, according to Assistant U.S. Attorney William Devaney. The Indictment alleges that Duronio, who worked at PaineWebber’s offices in Weehawken, N.J., planted the logic bomb in some 1,000 of PaineWebber’s approximately 1,500 networked computers in branch offices around the country. Duronio, who repeatedly expressed dissatisfaction with his salary and bonuses at Paine Webber resigned from the company on Feb. 22, 2002. The logic bomb Duronio allegedly planted was activated on March 4, 2002. In anticipation that the stock price of UBS PaineWebber’s parent company, UBS, A.G., would decline in response to damage caused by the logic bomb, Duronio also purchased more than $21,000 of “put option” contracts for UBS, A.G.’s stock, according to the charging document. A put option is a type of security that increases in value when the stock price drops. Market conditions at the time suggest there was no such impact on the UBS, A.G. stock price. [...] The Indictment alleges that, from about November 2001 to February, Duronio constructed the logic bomb computer program. On March 4, as planned, Duronio’s program activated and began deleting files on over 1,000 of UBS PaineWebber’s computers. It cost PaineWebber more than $3 million to assess and repair the damage, according to the Indictment. As one of the company’s computer systems administrators, Duronio had responsibility for, and access to, the entire UBS PaineWebber computer network, according to the Indictment. He also had access to the network from his home computer via secure Internet access. [...]
Source: http://www.justice.gov/criminal/cybercrime/duronioIndict.htm
![Page 44: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/44.jpg)
© 2013 A. Haeberlen, Z. Ives
44University of Pennsylvania
Correlated faults
A single problem can cause many faults Example: Overloaded machine crashes, increases
load on other machines domino effect Example: Bug is triggered in a program that is used
on lots of machines Example: Hacker manages to break into many
computers due to a shared vulnerability Example: Machines may be connected to the same
power grid, cooled by the same A/C, managed by the same admin
...
Why is this problematic?
![Page 45: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/45.jpg)
© 2013 A. Haeberlen, Z. Ives
45University of Pennsylvania
Recap: Faults and failures Faults happen all the time
Hardware malfunction, software bug, manipulation, hacker break-ins, misconfiguration, ...
NOT a rare occurrence at scale - must design system to handle them
All faults are NOT independent crash faults
Faults can be correlated Rational and Byzantine faults are real
Three common fault models: Crash fault model: Faulty machines simply stop Rational model: Machines manipulated by selfish
owners Byzantine fault model: Faulty machines could do
anything
![Page 46: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/46.jpg)
© 2013 A. Haeberlen, Z. Ives
46University of Pennsylvania
So what can we do?
![Page 47: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/47.jpg)
© 2013 A. Haeberlen, Z. Ives
47University of Pennsylvania
What can we do? Prevention and avoidance
Example: Prevent crashes with software verification Example: Provide incentives for participation
Detection Example: Cross-check network's route
announcements with other information to see whether it is lying, and hold it accountable if it is (e.g., sue for breach of contract)
Masking Example: Store replicas of the data on multiple
nodes; if data is lost or corrupted on one of them, we still have the other copies
Mitigation
![Page 48: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/48.jpg)
© 2013 A. Haeberlen, Z. Ives
48University of Pennsylvania
Masking faults with replication
Server A
Server BAlice
Bob
Alice can store her data on both servers Bob can get the data from either server
A single crash fault on a server does not lead to a failure
Availability is maintained What about other types faults, or multiple faults?
![Page 49: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/49.jpg)
© 2013 A. Haeberlen, Z. Ives
49University of Pennsylvania
Problem: Maintaining consistency
What if multiple clients are accessing the same set of replicas?
Requests may be ordered differently by different replicas
Result: Inconsistency! (remember race conditions?) For what types of requests can this happen? What do we need to do to maintain consistency?
Server A
Server BAlice
Bob
X:=5X:=7X:=5
X:=7
![Page 50: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/50.jpg)
© 2013 A. Haeberlen, Z. Ives
50University of Pennsylvania
Types of consistency
Strong consistency After an update completes, any subsequent access
will return the updated value
Weak consistency Updated value not guaranteed to be returned
immediately, only after some conditions are met (inconsistency window)
Eventual consistency A specific type of weak consistency If no new updates are made to the object, eventually
all accesses will return the last updated value
![Page 51: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/51.jpg)
© 2013 A. Haeberlen, Z. Ives
51University of Pennsylvania
Example: Storage system
Scenario: Replicated storage We have N nodes that can store data Data contains a monotonically
increasing timestamp
To write a value: Pick W replicas and write the value to each, using a
fresh timestamp (say, the current wallclock time)
To read a value: Pick R replicas and read the value from each Return the value with the highest timestamp If any replicas had a lower timestamp, send them
the newer value
X=3v1
X=3v1
X=3v1
X=5v2
X=2v4
X=5v2
Replica
![Page 52: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/52.jpg)
© 2013 A. Haeberlen, Z. Ives
52University of Pennsylvania
How to set N, R, and W
For strong consistency? What happens otherwise? Will the data ever become
consistent again?
To avoid conflicting writes? To make reads fast? Writes? To minimize the risk of
data loss? Let's do some examples!
N=2, W=2, R=1 N=2, W=1, R=1
Read set
Write set
![Page 53: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/53.jpg)
© 2013 A. Haeberlen, Z. Ives
53University of Pennsylvania
Consensus
Replicas need to agree on a single order in which to execute client requests
How can we do this? Does the specific order matter?
Problem: What if some replicas are faulty?
Crash fault: Replica does not respond; no progress (bad)
Byzantine fault: Replica might tell lies, corrupt order (worse)
Solution: Consensus protocol Paxos (for crash faults), PBFT (for Byzantine faults) Works as long as no more than a certain fraction of
the replicas are faulty (PBFT: one third)
![Page 54: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/54.jpg)
© 2013 A. Haeberlen, Z. Ives
54University of Pennsylvania
How do consensus protocols work?
Idea: Correct replicas 'outvote' faulty ones
Clients send requests to each of the replicas Replicas coordinate and each return a result Client chooses one of the results, e.g., the one that is
returned by the largest number of replicas If a small fraction of the replicas returns the wrong
result, or no result at all, they are 'outvoted' by the other replicas
![Page 55: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/55.jpg)
© 2013 A. Haeberlen, Z. Ives
55University of Pennsylvania
Plan for today
Parallel programming and its challenges Parallelization and scalability, Amdahl's law Synchronization, consistency Mutual exclusion, locking, issues related to locking Architectures: SMP, NUMA, Shared-nothing
All about the Internet in 30 minutes Structure; packet switching; some important
protocols Latency, packet loss, bottlenecks, and why they
matter
Distributed programming and its challenges
Faults, failures, and what we can do about them Network partitions, CAP theorem, relaxed
consistency
NEXT
![Page 56: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/56.jpg)
© 2013 A. Haeberlen, Z. Ives
56University of Pennsylvania
Network can partition Hardware fault, router misconfigured, undersea cable
cut, ... Result: Gobal connectivity is lost What does this mean for the properties of our
system?
Server A
Server B
What if this linkbreaks?
Alice
Bob
Network partitions
![Page 57: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/57.jpg)
© 2013 A. Haeberlen, Z. Ives
57University of Pennsylvania
The CAP theorem
What we want from a web system: Consistency: All clients share the same view of the
data, even in the presence of concurrent updates Availability: All clients can access at least one replica
of the data, even when faults occur Partition-tolerance: Consistency and availability hold
even when the network partitions
Can we get all three? CAP theorem: We can get at most two out of the
three Which ones should we choose for a given system?
Conjecture by Brewer; proven by Gilbert and Lynch
![Page 58: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/58.jpg)
© 2013 A. Haeberlen, Z. Ives
58University of Pennsylvania
Common CAP choices
Example #1: Consistency & Partition tolerance
Many replicas + consensus protocol Do not accept new write requests during partitions Certain functions may become unavailable
Example #2: Availability & Partition tolerance
Many replicas + relaxed consistency Continue accepting write requests Clients may see inconsistent state during partitions
![Page 59: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/59.jpg)
© 2013 A. Haeberlen, Z. Ives
59University of Pennsylvania
Relaxed consistency: ACID vs. BASE
Classical database systems: ACID semantics
Atomicity Consistency Isolation Durability
Modern Internet systems: BASE semantics
Basically Available Scalable Eventually consistent
![Page 60: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/60.jpg)
© 2013 A. Haeberlen, Z. Ives
60University of Pennsylvania
Eventual consistency
Idea: Optimistically allow updates Don't coordinate with ALL replicas before returning
response But ensure that updates reach all replicas eventually
What do we do if conflicting updates were made to different replicas?
Good: Decouples replicas. Better performance, availability under partitions
(Potentially) bad: Clients can see inconsistent state
Server A Server B
Alice
![Page 61: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/61.jpg)
© 2013 A. Haeberlen, Z. Ives
61University of Pennsylvania
Recap: Consistency and partitions
Use replication to mask limited # of faults
Can achieve strong consistency by having replicas agree on a common request ordering
Even non-crash faults can be handled, as long as there are not too many of them (typical limit: 1/3)
Partition tolerance, availability, consistency?
Can't have all three (CAP theorem) For some services, need to drop one (usually
availability) If service works with weaker consistency guarantees,
such as eventual consistency, can get a compromise (BASE)
Example: Shopping cart
![Page 62: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/62.jpg)
© 2013 A. Haeberlen, Z. Ives
62University of Pennsylvania
Plan for today
Parallel programming and its challenges Parallelization and scalability, Amdahl's law Synchronization, consistency Mutual exclusion, locking, issues related to locking Architectures: SMP, NUMA, Shared-nothing
All about the Internet in 30 minutes Structure; packet switching; some important
protocols Latency, packet loss, bottlenecks, and why they
matter
Distributed programming and its challenges
Faults, failures, and what we can do about them Network partitions, CAP theorem, relaxed
consistency
![Page 63: © 2013 A. Haeberlen, Z. Ives NETS 212: Scalable and Cloud Computing 1 University of Pennsylvania Internet basics; Faults and failures September 10, 2013](https://reader035.vdocument.in/reader035/viewer/2022062417/5516a3a7550346f6208b4d59/html5/thumbnails/63.jpg)
© 2013 A. Haeberlen, Z. Ives
63University of Pennsylvania
Stay tuned
Next time you will learn about: Cloud basics; Amazon AWS