ece fall 2015 prof. john copeland
Quiz 2 -1
Review for Quiz-2
ECE3600 - Fall 2015
Prof. John Copeland
Computer Networking: A Top Down Approach Featuring the Internet, 5th edition. Jim Kurose, Keith Ross. Addison-Wesley, July 2004.
Base material copyright 1996-2006 J.F. Kurose and K.W. Ross, All Rights Reserved
10-8-2015
Quiz 2 -2
Chapter 3 - Transport Layer
Chapter 4a – IP Addresses
TCP and UDP, Ports and Sockets
TCP Flow and Congestion Control
TCP Flags, Sequence and Ack. No.s
IP Subnets, Routers, Address Blocks
Quiz 2 -3
Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
– send side: breaks app messages into segments, passes to network layer
– rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
– Internet: TCP and UDP
[Figure: two end systems with the full stack (application, transport, network, data link, physical) connected through routers that have only network, data link, and physical layers; logical end-end transport runs between the end systems.]
Quiz 2 -4
Internet transport-layer protocols
• reliable, in-order delivery (TCP)
– congestion control
– flow control
– connection setup
• unreliable, unordered delivery: UDP
– no-frills extension of “best-effort” IP
• services not available:
– delay guarantees
– bandwidth guarantees
[Figure: two end systems with the full stack (application, transport, network, data link, physical) connected through routers that have only network, data link, and physical layers; logical end-end transport runs between the end systems.]
Quiz 2 -5
Multiplexing / demultiplexing
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
Demultiplexing at rcv host: delivering received segments to correct socket
[Figure: host 1, host 2, and host 3, each with the application / transport / network / link / physical stack; processes P1 to P4 attach to the transport layer through sockets.]
Quiz 2 -6
Connection-oriented demux
• TCP socket identified by 4-tuple:
– source IP address
– source port number
– dest IP address
– dest port number
• recv host uses all four values to direct segment to appropriate socket
• Server host may support many simultaneous TCP sockets:
– each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
– non-persistent HTTP will have different socket for each request
Quiz 2 -7
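Connectionless demux
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: clients at IP A and IP B exchange UDP segments with a server at IP C whose socket is bound to port 6428.]
One client uses port 9157: request SP: 9157, DP: 6428; reply SP: 6428, DP: 9157
The other client uses port 5775: request SP: 5775, DP: 6428; reply SP: 6428, DP: 5775
Source Port, SP, (and Source IP) provides “return address”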
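The difference between the two demultiplexing rules can be sketched with two lookup tables. This is only an illustration: the table names, the string "sockets", and the TCP port 80 are assumptions, not part of the slides. The UDP table is keyed on the usual (dest IP, dest port) pair, the TCP table on the full 4-tuple.

```python
# Hypothetical demultiplexing tables; names and values are illustrative only.
udp_sockets = {}  # key: (dest IP, dest port)
tcp_sockets = {}  # key: (source IP, source port, dest IP, dest port)


def udp_demux(dst_ip, dst_port):
    """Connectionless demux: only the destination pair selects the socket."""
    return udp_sockets.get((dst_ip, dst_port))


def tcp_demux(src_ip, src_port, dst_ip, dst_port):
    """Connection-oriented demux: all four values select the socket."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


# One UDP socket on the server (IP C, port 6428) serves every client.
udp_sockets[("C", 6428)] = "server UDP socket"

# Each TCP client gets its own socket (port 80 is an assumed example).
tcp_sockets[("A", 9157, "C", 80)] = "TCP socket for client A"
tcp_sockets[("B", 5775, "C", 80)] = "TCP socket for client B"
```

Segments from both clients reach the same UDP socket, so the application must read the source port and source IP to know where to reply.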
Quiz 2 -8
UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
– lost
– delivered out of order to app
• connectionless:
– no handshaking between UDP sender, receiver
– each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
Quiz 2 -9
UDP: more
• often used for streaming multimedia apps
– loss tolerant
– rate sensitive
• other UDP uses
– DNS
– SNMP
• reliable transfer over UDP: add reliability at application layer
– application-specific error recovery!
UDP segment format (32 bits wide):
source port # | dest port #
length | checksum
Application data (message)
Length: length, in bytes, of UDP segment, including header
Quiz 2 -10
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of the UDP header, the data, and some parts of the IP header
• sender puts bit-wise complement (-checksum) value into UDP checksum field
Receiver:
• compute checksum of received segment, including the checksum field.
• check if computed checksum equals zero:
– NO - error detected
– YES - no error detected. But maybe errors nonetheless? More later ….
Quiz 2 -11
Internet Checksum Example
• Note
– When adding numbers, a carry out from the most significant bit needs to be shifted (>>16) and added to the result
• Example: add two 16-bit integers (1's complement), BINARY ADD

              1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1   (bit-invert of sum)
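The wraparound addition and the final bit inversion can be sketched in a few lines. This is a minimal illustration over a list of 16-bit words; the function names are made up for this example, and a real UDP checksum would also cover the pseudo-header words.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        # Shift the carry down (>> 16) and add it to the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """Bit-invert the 1's complement sum to get the checksum field value."""
    return ~ones_complement_sum16(words) & 0xFFFF


def receiver_accepts(words, checksum):
    """Receiver test: checksum over data plus checksum field must be zero."""
    return internet_checksum(list(words) + [checksum]) == 0


data = [0b1110011001100110, 0b1101010101010101]
print(format(ones_complement_sum16(data), "016b"))  # 1011101110111100
print(format(internet_checksum(data), "016b"))      # 0100010001000011
```

The two printed values match the sum and checksum rows of the slide's example.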
Quiz 2 -12
Reliable Data Transport
Problem -> Solution
• Packet may arrive with errors. -> Add checksum, CRC, or hash.
• Packet may not arrive. -> Receiver sends “ACK” back. If ACK not received, packet re-sent.
• Sender may wait forever for ACK. -> Timeout timer added to sender.
• ACK may not arrive, dup. sent. -> Add sequence no.s to detect dups.
• Packets may arrive out-of-order. -> Buffer packets to rearrange order.
• Inefficient to send one pkt per RTT. -> Have a “window” to send before ACK (pipelining).
• Missing packet early in window. -> “Go-Back-N” to last in-order packet.
• “Go-Back-N” inefficient. -> “Selective Repeat” to fill in gaps only.
---- Also in TCP ----
• Packets may be different sizes. -> Sequence number for each byte.
• Slow down when network congested (as detected by RTO or triple duplicate ACKs). -> “Slow-Start”, or "Multiplicative Decrease" to reduce transmit window.
• Know when receiver buffer will be full. -> Receiver includes “space left” in every ACK.
Quiz 2 -13
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
• full duplex data:
– bi-directional data flow in same connection
– MSS: maximum segment size
• connection-oriented:
– handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
– sender will not overwhelm receiver
• point-to-point:
– one sender, one receiver
• reliable, in-order byte stream:
– no “message boundaries”
• pipelined:
– TCP congestion and flow control set window size
• send & receive buffers
Quiz 2 -14
TCP segment structure (32 bits wide)
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pointer
Options (variable length, MSS)
application data (variable length)
Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (end of block)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• Receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
Quiz 2 -15
TCP seq. #’s and ACKs
Seq. #’s:
– byte stream “number” of first byte in segment’s data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how receiver handles out-of-order segments
– A: TCP spec doesn’t say, - up to implementor
Simple telnet scenario (in time order):
1. User types ‘C’. Host A -> Host B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’. Host B -> Host A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’. Host A -> Host B: Seq=43, ACK=80
Quiz 2 -16
Maximum Segment Size (MSS), in bytes
The initial segments (the SYN and SYN-ACK) contain the MSS in an option field. It stays constant after this.
This tells the other host the maximum size of a segment that can be handled by their local network (without fragmentation).
Example: one host may say its MSS value is 1400, the other may say its MSS value is 1420.
Since segments have to traverse both local networks, the smaller MSS value is used for the connection.
TCP rules involving Window sizes are in units of MSS (bytes), not number of segments.
For simplification, examples may say "the host is sending maximum size segments," so that 1 MSS = 1 segment. Sometimes this is implied without being stated in problems.
MSS counts only the data bytes of a segment; it excludes the TCP header bytes (20 to 60) and the IP header bytes (20). Since Ethernet and WiFi limit datagram size to 1500 bytes, MSS is never larger than 1460 bytes when either host is on a LAN.
Quiz 2 -17
TCP Round Trip Time and Timeout
EstimatedRTT[new] = (1 - α) * EstimatedRTT[old] + α * SampleRTT[new]
• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Setting the timeout
• EstimatedRTT plus “safety margin”
– large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT[new] = (1 - β) * DevRTT[old] + β * |SampleRTT[new] - EstimatedRTT[old]|
(typically, β = 0.25; note absolute value bars, ||)
TimeoutInterval: RTO = EstimatedRTT + 4 * DevRTT
Also note: Old value of EstimatedRTT is used.
Quiz 2 -18
Running Average for Calculating the Retransmit Time Out, RTO
Round results up to 1 ms. Alpha = 1/8 (0.125) and Beta = 1/4 (0.250)
Table columns: SampleRTT | EstimatedRTT | DevRTT | TimeOut | SampleRTT[new] - EstimatedRTT[old]
Row 1: A = 0.875 * 60 + 0.125 * 90 = 63.75 -> 64; D = 0.75 * 10 + 0.25 * | 30 | = 15; TimeOut = 64 + 4 * 15 = 124
Row 2: A = 0.875 * 64 + 0.125 * 30 = 59.75 -> 60; D = 0.75 * 15 + 0.25 * | -34 | = 19.75 -> 20; SampleRTT[new] - EstimatedRTT[old] = 30 - 64 = -34
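The running-average update can be sketched as one function that takes the old estimates and a new sample. The function name is hypothetical, and rounding each result up to a whole millisecond follows the note on this slide rather than any particular TCP implementation.

```python
import math

ALPHA = 0.125  # weight of the new sample in EstimatedRTT
BETA = 0.25    # weight of the new deviation in DevRTT


def update_rto(est_rtt, dev_rtt, sample_rtt):
    """Return (EstimatedRTT, DevRTT, RTO) in ms after one new SampleRTT."""
    new_est = math.ceil((1 - ALPHA) * est_rtt + ALPHA * sample_rtt)
    # The deviation uses the OLD EstimatedRTT, as the slide points out.
    new_dev = math.ceil((1 - BETA) * dev_rtt + BETA * abs(sample_rtt - est_rtt))
    rto = new_est + 4 * new_dev
    return new_est, new_dev, rto


print(update_rto(60, 10, 90))  # (64, 15, 124)
```

Feeding the result back in with the next sample, `update_rto(64, 15, 30)`, reproduces the second row of the worked example: EstimatedRTT 60 and DevRTT 20.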
Quiz 2 -19
TCP Flow control
flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• Receiver-Window = spare room in buffer = LastByteInBuffer - LastByteACKed
• Receiver advertises spare room by including value of RcvWindow in every segment (TCP header "Window" field)
• Sender limits data to RcvWindow
– guarantees receive buffer doesn’t overflow
[Figure: receive buffer drawn along a byte-number axis, with LastByteACKed and LastByteInBuffer marked.]
Quiz 2 -20
Causes / costs of congestion
Each host sends λin data (average bits per second). Buffer output is λout (maximum rate is C).
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• maximum achievable throughput
[Figure: Host A and Host B send original data at rate λin into a router with unlimited shared output link buffers; the output rate is λout.]
Quiz 2 -21
TCP Congestion Control
• sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
• Roughly*, rate = CongWin / RTT Bytes/sec
• CongWin is dynamic, function of perceived network congestion.
• The sender uses the smaller value of CongWin and Window (receiver's Window)
How does sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
three mechanisms:
– AIMD (additive increase, multiplicative decrease)
– slow start initially (exponential growth until threshold reached)
– conservative after timeout events (slow-start up to 1/2 of the old CongWin)
* This is true when this window-limited rate is less than the media bandwidth in bytes per second.
Quiz 2 -22
TCP Slow Start
• When connection begins*, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by adding a byte to CongWin for every new byte ACK'ed.
• Summary: initial rate is slow but data rate ramps up exponentially fast (until the Receiver Window is reached)
*Also done after a Time Out, but changes to Additive Increase when the Threshold is reached.
SYN and SYN-ACK: TCP headers contain MSS values (in option field) and initial Sequence Numbers.
[Figure: Host A sends one (MSS) segment, then two segments, then four segments to Host B in successive RTTs.]
Quiz 2 -23
TCP congestion control: sender congestion window: "CongWin"
• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
– multiplicative decrease: cut CongWin in half after loss indicated by 3 duplicate ACKs (to 1 MSS after Time Out*).
– additive increase: increase CongWin by 1 MSS every RTT until loss detected by 3 duplicate ACKs (or Time Out*)
[Figure: congestion window size versus time; saw tooth behavior: probing for bandwidth.]
*After a Time Out, CongWin increases by doubling every RTT until 1/2 old CongWin reached
Quiz 2 -24
Refinement
Q: When should the exponential increase switch to linear (after RTO)?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin value just before loss event.
If a Fast Retransmit fixes the gap in ACKs before a timeout, TCP can skip the Slow-Start and immediately use Additive Increase, starting at half the previous CongWin.
[Figure: TCP Reno, CongWin (MSS) versus Time/RTT, marking Fast Recovery after 3 dups (Fast Retransmission replaced missing segment) and a Time-Out (dt > RTO) at CongWin = 16.]
Quiz 2 -25
Fast Retransmit (to avoid Timeout)
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs.
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs.
• If sender receives 4 ACKs for the same data (3 dups), it supposes that segment after ACKed data was lost:
– Fast Retransmit: resend segment before timer expires.
When resent packet is ACKed before a timeout, go to Fast Recovery Mode:
- Halve Sender-Window, "CongWin"
- Increase CongWin by 1 MSS per CongWin bytes sent and Acked.
Quiz 2 -26
[Figure: CongWin / mss versus time, with labels Threshold = 20, Time Out, 3 Dup. ACKs, 12, and 6.]
CongWin <= Threshold: Doubles each RTT (add MSS for each ACK)
CongWin > Threshold: Adds MSS each RTT
Time Out: Threshold = 1/2 CongWin, CongWin = 1 (Slow-Start)
3-Dup Ack: Threshold = 1/2 CongWin, CongWin = Threshold (Fast Recovery)
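The four rules above can be sketched as three small functions working in units of MSS. The function names are hypothetical, the rules are applied literally (doubling is not capped at Threshold), and the integer halving with a floor of 1 is an assumption made so the values stay whole.

```python
def on_rtt(congwin, threshold):
    """Growth per RTT: double at or below Threshold, else add one MSS."""
    if congwin <= threshold:
        return congwin * 2
    return congwin + 1


def on_timeout(congwin):
    """Time Out: Threshold = 1/2 CongWin, CongWin = 1 (Slow-Start)."""
    threshold = max(congwin // 2, 1)  # floor of 1 is an assumption
    return 1, threshold


def on_triple_dup_ack(congwin):
    """3-Dup Ack: Threshold = 1/2 CongWin, CongWin = Threshold."""
    threshold = max(congwin // 2, 1)  # floor of 1 is an assumption
    return threshold, threshold


# Each loss handler returns (new CongWin, new Threshold).
print(on_triple_dup_ack(12))  # (6, 6)
print(on_timeout(12))         # (1, 6)
```

Calling `on_rtt` repeatedly after a loss shows the two regimes: exponential growth up to the new Threshold, then one MSS per RTT.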
Quiz 2 -27
Fairness
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• Research area: make UDP more TCP friendly
– Solution: reserve 50% of router buffer space for TCP segments (excess UDP segments dropped).
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts.
• Web browsers do this
• Example: link of rate R supporting 9 connections;
– new app starts 1 TCP, gets rate R/10
– new app starts 9 TCPs, gets R/2 !
Quiz 2 -28
Chapter 4: Network Layer
• 4.1 Introduction
• 4.2 Virtual circuit and datagram networks
• 4.3 What’s inside a router
• 4.4 IP: Internet Protocol
– Datagram format
– IPv4 addressing
– ICMP
– IPv6
• 4.5-6 Routing
– Distance Vector (RIP)
– Link state (OSPF)
– Hierarchical routing (BGP)
• 4.7 Broadcast and multicast routing
Quiz 2 -29
Longest prefix matching
Prefix Match                      Link Interface
11001000 00010111 0001 0          0
11001000 00010111 0001 1000       1
11001000 00010111 0001 1          2
otherwise                         3
Examples (which interface?)
DA: 11001000 00010111 0001 0110 1010 0001  Only matches the prefix for Link 0.
DA: 11001000 00010111 0001 1000 1010 1010  Matches the prefixes for Links 1 and 2; Link 1 is the longest match.
DA: 11001000 00010111 0001 1100 1010 1010  Only matches the prefix for Link 2.
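A longest-prefix lookup over this table can be sketched with plain bit strings. The names `TABLE`, `DEFAULT_INTERFACE`, and `lookup` are made up for the example, and results are reported by the number in the Link Interface column; a real router would use a trie or hardware lookup instead of a linear scan.

```python
# Forwarding table from the slide: (prefix bits, link interface number).
TABLE = [
    ("11001000 00010111 0001 0", 0),
    ("11001000 00010111 0001 1000", 1),
    ("11001000 00010111 0001 1", 2),
]
DEFAULT_INTERFACE = 3  # the "otherwise" row


def lookup(dest_bits):
    """Return the interface whose prefix is the longest match for dest_bits."""
    dest = dest_bits.replace(" ", "")
    best_len, best_iface = -1, DEFAULT_INTERFACE
    for prefix_bits, iface in TABLE:
        prefix = prefix_bits.replace(" ", "")
        if dest.startswith(prefix) and len(prefix) > best_len:
            best_len, best_iface = len(prefix), iface
    return best_iface


print(lookup("11001000 00010111 0001 1000 1010 1010"))  # 1
```

The 24-bit prefix wins over the 21-bit prefix for that address even though both match, which is the whole point of the rule.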
Quiz 2 -30
IP datagram format (32 bits wide)
ver | head. len | type of service | length
16-bit identifier | flgs | fragment offset
time to live | upper layer | header checksum
32 bit source IP address
32 bit destination IP address
Options (if any)
data (variable length, typically a TCP or UDP segment)
Field notes:
• ver: IP protocol version number
• head. len: header length (bytes)
• type of service: “type” of data
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• time to live: max number remaining hops (decremented at each router)
• upper layer: upper layer protocol to deliver payload to
• Options: e.g. timestamp, record route taken, specify list of routers to visit.
how much overhead with TCP?
• 20 bytes of TCP
• 20 bytes of IP
• = 40 bytes + app layer overhead
Quiz 2 -31
IP Fragmentation and Reassembly
One large datagram becomes several smaller datagrams
Example: 4000 byte datagram, MTU = 1500 bytes
Original:   length=4000  ID=x  fragflag=0  offset=0
Fragment 1: length=1500  ID=x  fragflag=1  offset=0
Fragment 2: length=1500  ID=x  fragflag=1  offset=185
Fragment 3: length=1040  ID=x  fragflag=0  offset=370
1480 bytes in data field, so offset = 1480/8 = 185
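The fragment sizes and offsets in this example can be reproduced with a short calculation. The function name and the dictionary layout are hypothetical, and a fixed 20-byte IP header with no options is assumed.

```python
def fragment(total_length, mtu, header_len=20):
    """Split one datagram into fragments that each fit within the MTU."""
    payload = total_length - header_len
    # Data per fragment must be a multiple of 8: offsets count 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more_fragments = 1 if sent + data < payload else 0
        fragments.append({
            "length": data + header_len,
            "offset": sent // 8,
            "fragflag": more_fragments,
        })
        sent += data
    return fragments


for frag in fragment(4000, 1500):
    print(frag)
```

For the 4000-byte datagram the 3980 payload bytes split as 1480 + 1480 + 1020, which gives lengths 1500, 1500, and 1040 once each fragment gets its own 20-byte header.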
Quiz 2 -32
Subnets – have a contiguous block of IP addresses which have the first N bits in common (a "/N").
Examples: 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24
Recipe
• To determine the subnets, detach each interface from its host or router, creating islands of isolated networks. Each isolated network is called a subnet.
Subnet mask: /24 (24 1's followed by (32-24) = 8 0's)
11111111 11111111 11111111 00000000
Dotted decimal notation: 255.255.255.0
Quiz 2 -33
IP addressing: CIDR
CIDR: Classless InterDomain Routing
– subnet portion of address of arbitrary length
– address format: a.b.c.d/x, where x is # bits in subnet portion of address
Addr.: 11001000 00010111 00010000 00000000   200.23.16.0/23
Mask:  11111111 11111111 11111110 00000000   255.255.254.0 (Inverted = 0.0.1.255)
       (first 23 bits: subnet part; last 9 bits: host part)
Network Address = Host Address (Bitwise AND) Network Mask = A & M
Minimum Host Address* = Network Address
Maximum Host Address* = Network Address (OR) [Inverted Network Mask]
* (reserved, not assigned to host)
Quiz 2 -34
The (sub)network mask can be used to change:
• an IP address into the corresponding network address (for comparison in a router forwarding table).
Match[i] = {(IP & mask[i]) == Network_addr[i]}
– “==” means “TRUE if equals”
• an IP address (or network address) into the network Broadcast Address:
Broadcast_addr = IP | ~mask
“&” bitwise AND, “|” bitwise OR, “~” bitwise inversion (0->1, 1->0)
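These mask operations map directly onto integer bitwise operators. The sketch below uses the standard `ipaddress` module only to convert between dotted decimal and integers; the helper names and the sample host 200.23.17.42 are assumptions for illustration.

```python
import ipaddress


def to_int(dotted):
    """Dotted decimal -> 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def to_dotted(value):
    """32-bit integer -> dotted decimal."""
    return str(ipaddress.IPv4Address(value))


def mask_from_prefix(prefix_len):
    """/N -> mask with N leading 1 bits."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def network_addr(ip, mask):
    return ip & mask


def broadcast_addr(ip, mask):
    # Python's ~ yields a negative number, so trim the result to 32 bits.
    return ip | (~mask & 0xFFFFFFFF)


mask = mask_from_prefix(23)
ip = to_int("200.23.17.42")
print(to_dotted(mask))                      # 255.255.254.0
print(to_dotted(network_addr(ip, mask)))    # 200.23.16.0
print(to_dotted(broadcast_addr(ip, mask)))  # 200.23.17.255
```

A forwarding-table match is then the comparison `network_addr(ip, mask) == to_int("200.23.16.0")`.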
Quiz 2 -35
IP Address Bitwise Calculations
200.23.16.0/20
11001000 00010111 0001xxxx xxxxxxxx
From this, to get Network Mask, 0 or 1 -> 1, x -> 0
11111111 11111111 11110000 00000000
Minimum Host Address: x -> 0
11001000 00010111 00010000 00000000
Maximum Host Address: x -> 1
11001000 00010111 00011111 11111111
Minimum host address is the “Network Address”
Maximum host address is the “Broadcast Addr.”
Quiz 2 -36
Split a subnet - Bitwise Calculations
To split a subnet into 2 half-size subnets: 200.23.16.0/20 (max IP = 200.23.31.255)
Add 1 more bit to the prefix, a '0' or a '1':
11001000 00010111 0001(0/1)xxx xxxxxxxx
The lower subnet will have "0", so the network address is the same, except the prefix size is 21:
200.23.16.0/21 (max IP = 200.23.23.255)
The higher subnet will have "1", so the network address split-byte is higher by the value of that bit, 0001 xxxx -> 0001 1xxx = 16 + 8 = 24:
200.23.24.0/21 (max IP = 200.23.31.255)
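The split can be checked with Python's standard `ipaddress` module, whose `subnets(prefixlen_diff=1)` call adds one bit to the prefix. The wrapper function name is hypothetical.

```python
import ipaddress


def split_in_half(cidr):
    """Return (subnet, max IP) for the two half-size subnets of cidr."""
    network = ipaddress.ip_network(cidr)
    return [
        (str(half), str(half.broadcast_address))
        for half in network.subnets(prefixlen_diff=1)
    ]


for subnet, max_ip in split_in_half("200.23.16.0/20"):
    print(subnet, "max IP =", max_ip)
# 200.23.16.0/21 max IP = 200.23.23.255
# 200.23.24.0/21 max IP = 200.23.31.255
```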
Quiz 2 -37
NAT: Network Address Translation
[Figure: LAN hosts 10.0.0.1, 10.0.0.2, and 10.0.0.3 behind a NAT router with LAN-side address 10.0.0.4 and WAN-side address 138.76.29.7.]
NAT translation table
WAN side addr        LAN side addr
138.76.29.7, 5001    10.0.0.1, 3345
……                   ……
1: host 10.0.0.1 sends datagram to 128.119.40.186, 80 (S: 10.0.0.1, 3345; D: 128.119.40.186, 80)
2: NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table (S: 138.76.29.7, 5001; D: 128.119.40.186, 80)
3: Reply arrives, dest. address: 138.76.29.7, 5001 (S: 128.119.40.186, 80; D: 138.76.29.7, 5001)
4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345 (S: 128.119.40.186, 80; D: 10.0.0.1, 3345)
NAT Table for Internal Servers must be configured manually (Port Forwarding).
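The four steps can be sketched as a small translation table with one method per direction. The class name, method names, and the choice to hand out WAN ports sequentially from 5001 are assumptions for illustration; real NAT devices also track protocol and entry timeouts.

```python
import itertools


class NatRouter:
    """Toy NAT: maps (LAN IP, LAN port) to a port on one WAN address."""

    def __init__(self, wan_ip, first_port=5001):
        self.wan_ip = wan_ip
        self._next_port = itertools.count(first_port)
        self.lan_to_wan = {}  # (lan_ip, lan_port) -> wan_port
        self.wan_to_lan = {}  # wan_port -> (lan_ip, lan_port)

    def outbound(self, src, dst):
        """Step 2: rewrite the source address and update the table."""
        if src not in self.lan_to_wan:
            port = next(self._next_port)
            self.lan_to_wan[src] = port
            self.wan_to_lan[port] = src
        return (self.wan_ip, self.lan_to_wan[src]), dst

    def inbound(self, src, dst):
        """Step 4: rewrite the destination address using the table."""
        lan_side = self.wan_to_lan.get(dst[1])
        if dst[0] != self.wan_ip or lan_side is None:
            return None  # no table entry: the datagram is dropped
        return src, lan_side


nat = NatRouter("138.76.29.7")
print(nat.outbound(("10.0.0.1", 3345), ("128.119.40.186", 80)))
print(nat.inbound(("128.119.40.186", 80), ("138.76.29.7", 5001)))
```

An unsolicited inbound datagram finds no table entry, which is why internal servers need a manually configured (port forwarding) entry.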
Quiz 2 -38
ICMP: Internet Control Message Protocol
• used by hosts & routers to communicate network-level information
– error reporting: unreachable host, network, port, protocol
– echo request/reply (used by ping)
• network-layer “above” IP:
– ICMP msgs carried in IP datagrams
• ICMP message: type, code plus first 8 bytes of IP datagram causing error
Type  Code  description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header
Quiz 2 -39
Traceroute and ICMP
• Source sends series of UDP segments to dest
– First has TTL = 1
– Second has TTL = 2, etc.
– Unlikely port number
• When nth datagram arrives at nth router:
– Router discards datagram
– And sends to source an ICMP message (type 11, code 0)
– Message includes name of router & IP address
• When ICMP message arrives, source calculates RTT
• Traceroute does this 3 times
Stopping criterion
• UDP segment eventually arrives at destination host
• Destination returns ICMP “port unreachable” packet (type 3, code 3)
• When source gets this ICMP, it stops.
Quiz 2 -40
IPv6 Header (Cont)
Priority: identify priority among datagrams in flow
Flow Label: identify datagrams in same “flow.” (concept of “flow” not well defined).
Next header: identify upper layer protocol for data
New: “flow label”, “longer addresses”
Missing: fragmentation (flags, ID, offset)
“6to4 Translation” 4-byte IPv4 -> 16-byte IPv6
A.B.C.D -> 2002:aabb:ccdd::/48, where “aa” = “A in 2-char hex”, “bb” = “B in 2-char hex”, etc.
IPv4 address can become an IPv6 sub-net with 80 bits for “host” addresses (1e24 hosts)
http://en.wikipedia.org/wiki/6to4
Quiz 2 -41
IPv6 Addressing
• Routable Address is 64 bits or less.
• Last 64 bits is the host, or port, ID
Address parts: Subnet prefix (64 bits) | Interface identifier (64 bits)
e.g. fe80::290:9eff: fe9a:a1bf:21a1::
Note that fe80::290:9eff: expands to fe80:0000:0290:9eff:(leading hex 0's can be omitted)
The "Link Local" Address is in the address block fe80::/10. The "Identifier" can be the MAC Address (WiFi or Ethernet). Used as source IP when requesting DHCP configuration.
The IPv4 "Link Local" Address is chosen randomly from the 65,536 possible 169.254.0.0/16 addresses.
IPv6 has no broadcast address; the all-nodes multicast address that takes its place is always ff02::1 (ff02:0000:0000:0000:0000:0000:0000:0001)
Quiz 2 -42
“6to4 Translation” 4-byte IPv4 -> 16-byte IPv6
A.B.C.D -> 2002:aabb:ccdd::/48, where “aa” = “A in 2-char hex”, “bb” = “B in 2-char hex”, etc.
IPv4 address can become an IPv6 sub-net with 80 bits for “host” addresses (1e24 hosts)
http://en.wikipedia.org/wiki/6to4
Example – 6to4: convert 130.207.17.25 to an IPv6 address.
Convert the decimal byte-representations to hex: 130 = 0x82, 207 = 0xCF, 17 = 0x11, 25 = 0x19
IPv6 addresses are written with colons separating every 16 bits (4 hex characters). A run of all-zero groups (:0000:) can be written ::
The first 16 bits, 0x2002, are a /16 block of addresses reserved for IPv4 translations: 2002::/16
Next add the IPv4 32 bits: 2002:82CF:1119::/48
This is not just a single IPv6 address, but a block of 2^80 possible host addresses that can replace private subnet addresses, like 192.168.0.0/16
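The conversion in this example can be sketched with the standard `ipaddress` module: place 0x2002 in the top 16 bits, the IPv4 address in the next 32, and keep a /48 prefix. The function name is hypothetical.

```python
import ipaddress


def six_to_four_prefix(ipv4):
    """Return the 6to4 /48 IPv6 block that corresponds to an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 0x2002 fills bits 127..112; the IPv4 address fills bits 111..80.
    prefix = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((prefix, 48))


block = six_to_four_prefix("130.207.17.25")
print(block)                         # 2002:82cf:1119::/48
print(block.num_addresses == 2**80)  # True
```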
Quiz 2 -43
Other Changes from IPv4
• Checksum: removed entirely to reduce processing time at each hop
• Options: allowed, but outside of header, indicated by “Next Header” field
• (fragmentation is done in an Options Header, only if needed)
• ICMPv6: new version of ICMP
– additional message types, e.g. “Packet Too Big”
– multicast group management functions
Quiz 2 -44
Tunneling
Logical view: A (IPv6) - B (IPv6) == tunnel == E (IPv6) - F (IPv6)
Physical view: A (IPv6) - B (IPv6) - IPv4 - IPv4 - E (IPv6) - F (IPv6)
Outside the tunnel, the datagram is: [IPv6 A->F | TCP data]
Inside the tunnel, the datagram is: [IPv4 B->E | IPv6 A->F | TCP data]
An IPv4 Header is prepended to carry the IPv6 Datagram across the IPv4 network from B to E.