1
Network Layer
- IP (Internet Protocol) -
2
IP (Internet Protocol) ~ Functions and Features ~
• Functions
– Identify the interface
• 32-bit (IPv4) or 128-bit (IPv6) ID
• An IP address is assigned to an interface
– IP packet forwarding
• Exchange IP packets
• Router determines the route
– Fragmentation of IP packets
• A packet that is too large must be fragmented
• Fragments are reassembled at the destination
• Features
– Best-effort delivery
• Datagram / connectionless
• No guarantee of delivery
• No per-connection state
– Scale-free
• Independent of the number of data flows
• Recursive routing configuration
– Open technical specification
• RFC791
3
Routing the IP Packet
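[Figure: a router connected to Net1, Net2, Net3, and Net4 through neighbor routers R1, R2, and R3.]
Routing Table
Net1 → R1
Net2 → R1
Net3 → R2
Net4 → R3
• The router determines the appropriate neighbor node to which a received IP packet should be forwarded so that it reaches the destination interface.
[Figure: travel analogy with the labels Tokyo Station; Car, Train, Bike, Foot; Narita; Haneda; Shin-Yokohama Station; Pacific routes; European routes; domestic routes; Shinkansen.]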
4
Router processes IP functions
• IP packet (= human-being)
• IP at core, and TCP at the edges
IP TCP APP
IP IP TCP APP
net-A net-B
Source address
Destination address
Data (=Payload)
net-C
5
IP Routing - What a dynamic routing protocol does -
Web server: 202.249.10.122
[Figure: routers R1-R9 between 133.196.16.19 and the Web server.]
From 133.196.16.19 to 202.249.10.122
A. Calculate routes (background)
1. Get routing information in the domain
2. Export and import routing information
(inside/outside the routing domain)
Import & export available networks:
202.249.xxx.yyy → R4
202.249.xxx.yyy → R6
202.249.xxx.yyy → R8
B. L3 packet transmission (on demand)
→ Dest. = 202.249.10.122 (best matching)
→ Next hop = “R2” for 202.249.xxx.yyy
6
IP packet forwarding - Comparison with a transport system -
L4: Application data == contents for delivery
L3: IP packet == human being
L2: Data-link frame == car, train, ship, airplane
L1: Physical media == road, railroad, airline
1. Data link and physical media are frequently tightly coupled.
2. We can replace the data link whenever we want.
3. The router does not need to remember each human being that passes through, i.e., stateless operation.
7
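IP version 4 (IPv4) Header Format
Bit position: 0 7 8 15 16 23 24 31
Word 0: version (4 bits) | header length (4 bits) | TOS (8 bits) | total length in bytes (16 bits)
Word 1: datagram-id (16 bits) | flag (3 bits) | fragment offset (13 bits)
Word 2: TTL (8 bits) | protocol-id (8 bits) | header checksum (16 bits)
Word 3: source IP address (32 bits)
Word 4: destination IP address (32 bits)
options (if any)
TCP/UDP data
Words 0-4 form the fixed 20-byte header.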
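The field layout above can be checked with a short decoding sketch. It is only an illustration: the function name `parse_ipv4_header` and the sample header values are assumptions made for this example, not part of the lecture.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header (words 0-4), ignoring options."""
    (ver_ihl, tos, total_length, datagram_id, flags_frag,
     ttl, protocol_id, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,                     # upper 4 bits of byte 0
        "header_length_bytes": (ver_ihl & 0x0F) * 4,  # counted in 32-bit words
        "tos": tos,
        "total_length": total_length,
        "datagram_id": datagram_id,
        "flags": flags_frag >> 13,                   # upper 3 bits
        "fragment_offset": flags_frag & 0x1FFF,      # lower 13 bits
        "ttl": ttl,
        "protocol_id": protocol_id,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hypothetical sample header (checksum left at 0 for simplicity).
sample = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
    socket.inet_aton("133.196.16.19"), socket.inet_aton("202.249.10.122"))
header = parse_ipv4_header(sample)
```

The header length field counts 32-bit words, which is why the sketch multiplies it by 4 to obtain bytes.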
IP version 6 (IPv6) Header Format
Bit position: 0 7 8 15 16 23 24 31
Word 0: version (4 bits) | priority | flow label
Word 1: payload length (16 bits) | next header type (8 bits) | hop limit (8 bits)
Words 2-5: source IP address (128 bits)
Words 6-9: destination IP address (128 bits)
options (if any)
TCP/UDP data
Words 0-9 form the fixed 40-byte header.
The Data, we will put on-line
1. Japanese in general
– http://v6pc.jp/jp/spread/ipv6spread2013.phtml
– IPv6 over NGN http://v6pc.jp/jp/spread/ipv6spread2013_03.phtml
2. Government institutes in Japan
– http://www.attn.jp/ipv6status/jp/go/
11
Copyright (c) 2013 Internet Initiative Japan
Inc.
13 2013/6/13
Number of source nodes for queries
Source: Mr. Y. Matsuzaki of IIJ
[Figure: plot with the series A and AAAA.]
14 2013/6/13
AAAA (IPv6) vs A (IPv4)
Source: Mr. Y. Matsuzaki of IIJ
50% of DNS queries are already dual stack!
15
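TLV (Type, Length and Value) format
Option Type | Option Len | Option Data
Option Type, first two bits (action when the option is not recognized):
00 - skip this option and continue processing the header
01 - drop the packet
10 - drop the packet and send an ICMP error message
11 - drop the packet and send an ICMP error message, only if it is not a multicast packet
Option Type, third bit:
0 - the option does not change until the destination
1 - the option can change en route, if needed
Option Type, remaining bits: option code point
Option Len: length of the Option Data in octets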
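A minimal sketch of how the Option Type octet splits into the three sub-fields described on this slide. The function name `decode_option_type` and the action strings are assumptions chosen for illustration.

```python
# Action encoded in the two highest-order bits of the Option Type octet.
UNRECOGNIZED_ACTIONS = {
    0b00: "skip option",
    0b01: "drop packet",
    0b10: "drop packet, send ICMP error",
    0b11: "drop packet, send ICMP error unless multicast",
}


def decode_option_type(option_type: int) -> dict:
    """Split an 8-bit Option Type into action, change flag, and code point."""
    return {
        "action": UNRECOGNIZED_ACTIONS[(option_type >> 6) & 0b11],
        "may_change_en_route": bool((option_type >> 5) & 0b1),
        "code_point": option_type & 0b11111,
    }


decoded = decode_option_type(0xC2)  # bits 11 0 00010
```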
16
IP address design: good or bad?
• Two meanings in one instance
1. Identifier of the destination interface
2. Routing hint for forwarding the IP packet to the destination interface
• Address structure for “effective” routing
– Hierarchy through recursive and aggregative configuration
• Address structure for “flexible” routing
– Provider-independent address space
– Best-match policy, e.g., super-netting or hole punching
17
IP Routing Control - A self-similar (recursive) network -
Web server
202.249.10.122
Mail server
133.196.16.10
OSPF Backbone Network
(Area0)
AS=1210
AS=1391
AS=2913
AS=2014
AS=1998
AS=1018 IX
Area1 Area2 AreaN
Campus Net.
Campus Net. Backbone Network
POP
: Border Router
Dial-up access
POP
18
Aggregating routing prefixes
to reduce the number of routing entries
• Contiguous routing prefixes can be aggregated.
Network prefix (first 22 bits) | 00 | Host ID : one /24 (C)
Network prefix (first 22 bits) | 01 | Host ID : one /24 (C)
Network prefix (first 22 bits) | 10 | Host ID : one /24 (C)
Network prefix (first 22 bits) | 11 | Host ID : one /24 (C)
Network prefix (22 bits) | Host ID : one /22 (4C)
- Address Aggregation
- 4 x (/24 address) → 1 x (/22 address)
19
Aggregating Routing Prefixes
to reduce the number of routing entries
192.24.0.0 - 192.24.7.0 = 192.24.0.0/21
192.24.16.0 - 192.24.31.0 = 192.24.16.0/20
192.24.8.0 - 192.24.11.0 = 192.24.8.0/22
192.24.34.0 - 192.24.35.0 = 192.24.34.0/23
192.32.0.0 - 192.32.15.0 = 192.32.0.0/20
192.24.12.0 - 192.24.15.0 = 192.24.12.0/22
192.24.32.0 - 192.24.33.0 = 192.24.32.0/23
192.24.32.0/22 192.24.0.0/19 192.32.0.0/20
192.24.32.0/22 192.24.0.0/19 192.32.0.0/20
RA RB
20
                        0         1          2          3
                        12345678 90123456 78901234 56789012
[1]  192.32.0.0/20    : 11000000.00100000.0000---- --------
[2]  192.24.34.0/23   : 11000000.00011000.0010001- --------
[3]  192.24.32.0/23   : 11000000.00011000.0010000- --------
[4]  192.24.16.0/20   : 11000000.00011000.0001---- --------
[5]  192.24.0.0/21    : 11000000.00011000.00000--- --------
[6]  192.24.8.0/22    : 11000000.00011000.000010-- --------
[7]  192.24.12.0/22   : 11000000.00011000.000011-- --------
Aggregate: [2] + [3] = [8] (.34/23 + .32/23); [6] + [7] = [9] (.8/22 + .12/22)
                        0         1          2          3
                        12345678 90123456 78901234 56789012
[1]  192.32.0.0/20    : 11000000.00100000.0000---- --------
[8]  192.24.32.0/22   : 11000000.00011000.001000-- --------
[4]  192.24.16.0/20   : 11000000.00011000.0001---- --------
[5]  192.24.0.0/21    : 11000000.00011000.00000--- --------
[9]  192.24.8.0/21    : 11000000.00011000.00001--- --------
Aggregate: [5] + [9] = [10] (.0/21 + .8/21)
21
After aggregating [5] + [9] = [10] (.0/21 + .8/21):
                        0         1          2          3
                        12345678 90123456 78901234 56789012
[1]  192.32.0.0/20    : 11000000.00100000.0000---- --------
[8]  192.24.32.0/22   : 11000000.00011000.001000-- --------
[4]  192.24.16.0/20   : 11000000.00011000.0001---- --------
[10] 192.24.0.0/20    : 11000000.00011000.0000---- --------
Aggregate: [4] + [10] = [11] (.16/20 + .0/20)
                        0         1          2          3
                        12345678 90123456 78901234 56789012
[1]  192.32.0.0/20    : 11000000.00100000.0000---- --------
[8]  192.24.32.0/22   : 11000000.00011000.001000-- --------
[11] 192.24.0.0/19    : 11000000.00011000.000----- --------
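The step-by-step aggregation can be cross-checked with Python's standard `ipaddress` module. This is a sketch for verification only; the variable names are assumptions, and the seven input prefixes are the ones listed on the slides.

```python
import ipaddress

# The seven prefixes before aggregation, as listed on the slides.
prefixes = [
    "192.24.0.0/21",
    "192.24.8.0/22",
    "192.24.12.0/22",
    "192.24.16.0/20",
    "192.24.32.0/23",
    "192.24.34.0/23",
    "192.32.0.0/20",
]

# collapse_addresses merges contiguous sibling prefixes repeatedly.
aggregated = [
    str(network)
    for network in ipaddress.collapse_addresses(
        ipaddress.ip_network(prefix) for prefix in prefixes)
]
print(aggregated)
```

The result should contain the same three prefixes as the final table: 192.24.0.0/19, 192.24.32.0/22, and 192.32.0.0/20.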
22
23
Super-netting Needs Best-Matching
[Figure: R3 forwards toward 192.24.0.0/20 via R1 and toward 192.24.0.0/23 via R2.]
<< Routing table in R3 >>
                   0         1          2          3
                   12345678 90123456 78901234 56789012            Next-hop
192.24.0.0/20    : 11000000.00011000.0000---- --------            R1
192.24.0.0/23    : 11000000.00011000.0000000- --------            R2
Destination → next-hop
(1) 192.24.1.122 : R2 (matches both entries; the longer /23 prefix wins)
(2) 192.24.8.36 : R1 (matches only the /20 entry)
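Best matching (longest-prefix match) on the R3 table above can be sketched as follows. The names `ROUTES` and `next_hop` are assumptions for this illustration; a real router uses a much faster lookup structure than a linear scan.

```python
import ipaddress

# Routing table in R3, taken from the slide.
ROUTES = [
    (ipaddress.ip_network("192.24.0.0/20"), "R1"),
    (ipaddress.ip_network("192.24.0.0/23"), "R2"),
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the matching entry with the longest prefix."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if address in net]
    if not matches:
        return None  # no entry and no default route in this sketch
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]


print(next_hop("192.24.1.122"))
print(next_hop("192.24.8.36"))
```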
24
Routing
[Types of Routing]
(a) Static (static_routing, default_routing)
(b) Dynamic
[Jobs of the Routing Process (Dynamic Routing)]
(i) Advertise routing information to all neighbor nodes
(ii) Calculate and establish the routing table
(iii) Receive and forward IP packets
[Procedure on receiving an IP packet]
(1) Look up an entry for the host_address
(2) If none is found, look up an entry for the network_address
(3) If none is found, use the default_entry
25
Implication of routing
protocol/algorithm
• Mathematical definition
– Create a unique tree whose root is the destination node. In general there are multiple such trees; among them, the unique tree is selected by a defined cost function.
– A tree must be created for every possible destination, including the “default destination” as a last resort.
– Creating the complete global tree is impossible, so a recursive structure and default routing are applied.
26
Implication of routing
protocol/algorithm
• Engineering observation
– Fully distributed path calculation, i.e., no fate-sharing and no single point of failure
– State management is independent of the application data flow, i.e., it does not care about “end-to-end”
end-to-end principle on transparency, but on path management.
27
Dynamic Routing Protocols
[Calculation algorithms]
(1) Distance Vector: DV type
(2) Link State: LS type
(3) Path Vector: PV type
(4) Source Routing
Routing  Type  Object   IPv6
RIP      DV    routed   RIPng
OSPF     LS    gated    OSPFv3
BGP4     PV    gated    BGP4+
DVMRP    DV    mrouted  -
MOSPF    LS    -        -
PIM      n/a   -        -
MBGP     PV    -        -
28
Distance Vector Routing Protocol
・ Bellman-Ford Algorithm (or Ford-Fulkerson)
- Deployed around 1969 in ARPANET
- Developed by Xerox PARC as XNS-RIP
- Each node maintains and exchanges a distance vector that gives the distance from itself to every reachable destination node. For every possible destination, it picks the neighbor node that lies on the shortest path to that destination.
- Nodes do not hold network topology information
- The order of calculation and of routing information is O(n), rather than O(n^2)
- Distance Vector: {dst_net, distance, next_hop_node}
29
Routing Information Protocol (RIP)
・ RIP for IPv4: RFC 1058/1721/1722/1723/1724
RIP for IPv6: RFC 2080
・ routed in BSD, SunOS
- Maximum hops: 15
- Distance vectors are exchanged every 30 sec.
- Cold start: max. 450 sec. (= 15 x 30 sec.) to synchronize the tables
- Uses UDP (port = 520)
- Keep-alive algorithm:
a failure is detected when no keep-alive message arrives for 180 sec.
- Calculation algorithm:
- D(i,j): distance vector
- d(i,j): distance between node_i and node_j
D(i,j) = min [d(i,k) + D(k,j)] (for all k)
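The update rule D(i,j) = min [d(i,k) + D(k,j)] can be sketched as a round-based simulation, where one round stands for one 30-second exchange. The five-node topology and link numbers are assumed to match the A-E example used in this lecture (links (1) A-B, (2) B-C, (3) A-D, (4) B-E, (5) C-E, (6) D-E), every link has cost 1, and the names `LINKS`, `neighbors`, and `run_distance_vector` are illustrative assumptions.

```python
INFINITY = 16  # RIP allows at most 15 hops, so 16 means unreachable

# Assumed topology: link-id -> (endpoint, endpoint), each with cost 1.
LINKS = {1: ("A", "B"), 2: ("B", "C"), 3: ("A", "D"),
         4: ("B", "E"), 5: ("C", "E"), 6: ("D", "E")}


def neighbors(node):
    """Yield (neighbor, link_id) pairs of a node in link-id order."""
    for link_id, (u, v) in sorted(LINKS.items()):
        if u == node:
            yield v, link_id
        elif v == node:
            yield u, link_id


def run_distance_vector(rounds):
    """Return {node: {destination: (cost, link)}} after the given rounds."""
    nodes = sorted({n for pair in LINKS.values() for n in pair})
    tables = {n: {n: (0, "local")} for n in nodes}
    for _ in range(rounds):
        # All nodes advertise the tables they held at the start of the round.
        snapshot = {n: dict(table) for n, table in tables.items()}
        for node in nodes:
            for neighbor, link_id in neighbors(node):
                for dest, (cost, _) in snapshot[neighbor].items():
                    new_cost = min(cost + 1, INFINITY)
                    best = tables[node].get(dest, (INFINITY, None))[0]
                    if new_cost < best:  # keep the first of equal-cost paths
                        tables[node][dest] = (new_cost, link_id)
    return tables


table_at_a = run_distance_vector(2)["A"]
```

Under these assumptions node A reaches B and D in one hop after the first round, and C and E in two hops after the second; the equal-cost tie for E is resolved in favour of link (1) simply because it is examined first.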
30
Routing Information Protocol (RIP)
[Figure: topology with nodes A, B, C, D, E connected by links (1)-(6).]
(1) 0 sec
From A to  Link   Cost
A          local  0
From B to  Link   Cost
B          local  0
From D to  Link   Cost
D          local  0
B → A, D → A
(2) 30 sec
From A to  Link   Cost
A          local  0
B          (1)    1
D          (3)    1
31
Routing Information Protocol (RIP)
[Figure: topology with nodes A, B, C, D, E connected by links (1)-(6).]
(2) 30 sec
From A to  Link   Cost
A          local  0
B          (1)    1
D          (3)    1
From B to  Link   Cost
A          (1)    1
B          local  0
C          (2)    1
E          (4)    1
From D to  Link   Cost
A          (3)    1
D          local  0
E          (6)    1
B → A, D → A
(3) 60 sec
From A to  Link   Cost
A          local  0
B          (1)    1
C          (1)    2
D          (3)    1
E          (1)    2
Select one of the two equal-cost paths to E
→ {(1), 2} vs {(3), 2}
32
Routing Information Protocol (RIP)
[Figure: topology with nodes A, B, C, D, E connected by links (1)-(6).]
(3) 60 sec
From A to  Link   Cost
A          local  0
B          (1)    1
C          (1)    2
D          (3)    1
E          (1)    2
From B to  Link   Cost
A          (1)    1
B          local  0
C          (2)    1
D          (1)    2
E          (4)    1
From D to  Link   Cost
A          (3)    1
B          (3)    2
C          (6)    2
D          local  0
E          (6)    1
B → A, D → A
(4) 90 sec
From A to  Link   Cost
A          local  0
B          (1)    1
C          (1)    2
D          (3)    1
E          (1)    2
33
Link State Routing Protocol
・ SPF (Shortest Path First), as in OSPF and IS-IS
- Deployed around 1970 in ARPANET
→ Motivated by the scaling issue of Distance Vector
- Every node in the routing domain has the topology information and the attribute values of all the links connecting the nodes.
- The link-state databases in the nodes are synchronized and identical.
- Every node calculates the next-hop node for every destination, using the identical routing information and the identical calculation algorithm.
- The shortest path, i.e., the one with the minimum cost value, is defined as the appropriate path to reach the destination nodes/networks.
→ Calculating the best path to each destination node/network creates the Spanning_Tree
34
Open Shortest Path First (OSPF)
[Figure: topology with nodes A, B, C, D, E connected by links (1)-(6).]
Exchange LS information between neighbor nodes
[ Link State Data-Base ]
From  To  Link-id  Distance
A     B   1        1
A     D   3        1
B     A   1        1
B     C   2        1
B     E   4        1
C     B   2        1
C     E   5        1
D     A   3        1
D     E   6        1
E     B   4        1
E     C   5        1
E     D   6        1
Calculate spanning tree
[Figure: spanning tree over nodes A-E that keeps links (1), (2), (3), and (4).]
Build routing table
At Node “A”
To  Next-Hop  Link-id
B   B         1
C   B         1
D   D         3
E   B         1
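A sketch of the shortest-path-first calculation that turns the link-state database above into the routing table at node A. The function name `routing_table` and the tie-breaking rule are assumptions for this illustration: with a heap ordered on (distance, node, first hop, link), the equal-cost tie for E falls to the path via B, which agrees with the table on the slide.

```python
import heapq

# Link-state database from the slide: (from, to, link-id, distance).
LSDB = [
    ("A", "B", 1, 1), ("A", "D", 3, 1),
    ("B", "A", 1, 1), ("B", "C", 2, 1), ("B", "E", 4, 1),
    ("C", "B", 2, 1), ("C", "E", 5, 1),
    ("D", "A", 3, 1), ("D", "E", 6, 1),
    ("E", "B", 4, 1), ("E", "C", 5, 1), ("E", "D", 6, 1),
]


def routing_table(lsdb, source):
    """Dijkstra's algorithm; returns {destination: (next_hop, link_id)}."""
    adjacency = {}
    for src, dst, link_id, cost in lsdb:
        adjacency.setdefault(src, []).append((dst, link_id, cost))
    table = {}
    visited = {source}
    # Each heap entry remembers the first hop and first link of its path.
    heap = [(cost, dst, dst, link_id)
            for dst, link_id, cost in adjacency.get(source, [])]
    heapq.heapify(heap)
    while heap:
        dist, node, first_hop, first_link = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        table[node] = (first_hop, first_link)
        for nxt, _link, cost in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (dist + cost, nxt, first_hop, first_link))
    return table


table_at_a = routing_table(LSDB, "A")
```

Because every node runs the same calculation on an identical database, each one arrives at a consistent set of next hops without exchanging the results.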
35
OSPF Area Configuration Example
[Figure: OSPF area configuration example. Routers (RT), networks (N), and host H1 are grouped into Area 0, Area 1, Area 2, and Area 3; link costs are shown in parentheses, Ia and Ib label the ends of a point-to-point link, and a legend marks ASes.]
36
OSPF Link State DB Example
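- routers in Area 1 -
[Table: link-state database matrix. ** From ** columns: RT1, RT2, RT3, RT4, RT5, RT7, N3. ** To ** rows: RT1, RT2, RT3, RT4, RT5, RT7, N1, N2, N3, N4, Ia/Ib, N6, N7, N8, N9-N11/H1, N12, N13, N14, N15. Recoverable entries: rows RT1-RT4 each hold 0; row RT5 holds 14 and 8; row RT7 holds 20 and 14. Remaining values, column positions not recoverable: 3; 3 1 1 1 1; 2; 15 22 16 15 20 19 18 18; 19 16 8 2 8 8; 9.]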
Information that RT3/RT4 advertise to the routers in Area_0 (RT5/RT6):
Network  RT3  RT4
N1       4    4
N2       4    4
N3       1    1
N4       2    3
RT: Router
N : Network
I : p2p link
H : Host
Example:
RT3 → RT7 : 20 = 8 + 8 + 6
(RT3, RT6, RT5, RT7)
37
Path Vector Routing Protocol
- Defined in RFC827 (EGP; Exterior Gateway Protocol), deployed since 1982
→ Designed as the routing protocol to interconnect the NSFNET backbone and regional networks (RFC1093)
→ Designed to be able to implement and reflect the different routing policies applied at the regional networks, i.e., policy routing.
- Routing among ASes (Autonomous Systems)
An AS is identified by a 16-bit number (now 32 bits). The path to the destination AS is represented by the “path vector”, which consists of a sequence of AS numbers. Each AS Border Router advertises all its reachable path-vector information to all neighbor nodes. All the border routers in the Internet advertise and exchange AS path information, so as to create the global AS Path entries for spanning trees that cover the globe.
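How a border router might process one received path vector can be sketched as below. This is a simplified illustration, not BGP itself: the function name `receive_path`, the use of node letters in place of AS numbers, and the "shortest path wins" rule are assumptions, and real policy routing applies many more criteria.

```python
def receive_path(own_as, table, destination, advertised_path):
    """Process one advertisement; return True if the table was updated.

    table maps a destination network to the path (list of ASes) in use.
    """
    if own_as in advertised_path:
        return False  # our own AS is already on the path: this would loop
    candidate = [own_as] + list(advertised_path)
    current = table.get(destination)
    if current is None or len(candidate) < len(current):
        table[destination] = candidate
        return True
    return False


# Hypothetical advertisements arriving at node A.
table_a = {}
first = receive_path("A", table_a, "C1", ["B", "C"])
longer = receive_path("A", table_a, "C1", ["D", "E", "C"])
looped = receive_path("A", table_a, "A1", ["B", "A"])
```

Carrying the whole path in each advertisement is what makes loop detection a simple membership test, in contrast to the distance-only information of a distance vector.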
38
PV Routing Protocol: BGP4+
・ BGP (Border Gateway Protocol): RFC1771
(gated daemon in UNIX systems)
(1) IBGP & EBGP
- IBGP (Internal BGP): Intra-AS
- EBGP (External BGP): Inter-AS
(2) Multiple Layer-3 protocols by BGP4+
(3) Policy routing
(4) Full mesh or Route Reflector for IBGP
(5) Route-Server operation for EBGP
→ Avoids full-mesh peering (star peering)
(6) Route aggregation
(7) Runs over TCP (port=179)
(8) Multi-accessibility
- EBGP: MED (Multi-Exit Discriminator)
- IBGP: Local Preference
39
A1,A2 B1,B2
D1 E1,E2
C1,C2
Path Vector Routing
A
D
B
E
C
(AS=5)
(AS=1) (AS=3) (AS=2)
(AS=4)
[ Path Vector Data-Base in node A ]
From  To  Path     Next-Router
A     B1  A,B      B
A     B2  A,B      B
A     C1  A,B,C    B
A     C2  A,B,C    B
A     D1  A,D      D
A     E1  A,D,E    D
A     E2  A,B,E    B
[ Path Vector Data-Base in node C ]
From  To  Path     Next-Router
C     B1  C,B      B
C     B2  C,B      B
C     A1  C,B,A    B
C     A2  C,E,D,A  E
C     D1  C,E,D    E
C     E1  C,E      E
C     E2  C,B,E    B
40
A1,A2 B1,B2
D1 E1,E2
C1,C2
Path Vector Routing
A
D
B
E
C
(AS=5)
(AS=1) (AS=3) (AS=2)
(AS=4)
(1) Step 1
From To Path Next-Router
A B1 A,B B
A B2 A,B B
A D1 A,D D
(2) Step 2
From To Path Next-Router
A B1 A,B B
A B2 A,B B
A C1 A,B,C B
A C2 A,B,C B
A D1 A,D D
A E1 A,D,E D
A E2 A,B,E B
[ From B]
From To Path Next-Router
B E1 B,E E
B E2 B,E E
B C1 B,C C
B C2 B,C C
[ From D]
From To Path Next-Router
D A1 D,A A
D A2 D,A A
D E1 D,E E
D E2 D,E E
41
B1,B2 C1,C2
Path Vector Routing ; IBGP
A
D
B
E
C
(AS=1) (AS=3)
(AS=2)
(AS=4)
Network;
A1,A2,D1
; Inter-AS (EBGP)
; Intra-AS (IBGP)
From B1 to E2 ; {AS=2,AS=1,AS=4}
Path; B → (A → D) → E
From C2 to B2 ; {AS=3, AS=4, AS=1, AS=2}
Path; C → (F → E) → (D → A) → B
F
Network;
E1,E2,F1
(*) IBGP; (A,D) & (E,F)
42
Avoiding Full Mesh Peering
[Figure: three peering arrangements among nodes A, B, C, D, E: Full-Mesh Peering; a Route Server (RS) for “EBGP”; and a Route Reflector (RR) for “IBGP”.]
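The benefit of a route server or route reflector can be seen by counting peering sessions. The sketch below assumes that every node peers only with the central RS or RR in the hub arrangement; the function names are illustrative.

```python
def full_mesh_sessions(nodes: int) -> int:
    """Every pair of nodes peers directly: n * (n - 1) / 2 sessions."""
    return nodes * (nodes - 1) // 2


def hub_sessions(nodes: int) -> int:
    """Every node peers only with the route server or route reflector."""
    return nodes


print(full_mesh_sessions(5), hub_sessions(5))
```

For the five nodes A-E in the figure this gives 10 sessions for the full mesh against 5 with a central RS or RR, and the gap grows quadratically with the number of nodes.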
43
NAT
(Network Address Translation)
44
IP Address for Private Use
• Reserved by IANA (RFC1918)
– This is not a globally unique address space.
– Many nodes can have the same IP address.
10.0.0.0 - 10.255.255.255
172.16.0.0 - 172.31.255.255
192.168.0.0 - 192.168.255.255
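The three ranges correspond to the prefixes 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. A minimal membership check, with the names `PRIVATE_BLOCKS` and `is_rfc1918` chosen for this illustration:

```python
import ipaddress

# The three private-use blocks listed above, written as prefixes.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address lies in one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


print(is_rfc1918("192.168.3.5"), is_rfc1918("202.249.10.122"))
```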
45
NAT (Network Address Translation)
・ The private IP address and the related port number are translated at the
NAT router, which is located at the edge of the network segment that uses
private IP addresses.
(1) Private segment → Global
- DNS: as it is (resolves to the global IP address of the destination)
- Source IP address → IP address of the NAT router’s global interface
- Source port number → pick one and register it in the translation table
(2) Global → Private
- DNS: resolves to the NAT router’s global interface
- Destination IP address: translated into the private IP address
selected by the destination port number
[Note] A session cannot be initiated from the global space to the private space.
We cannot place a server that is accessed from the global space in the NAT segment.
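The translation-table behaviour described above can be sketched as a small class. Everything here is a simplified assumption for illustration: the class name `Napt`, the sequential port allocation starting at 150, and the symbolic addresses "A", "N", and "C" (private host, NAT global interface, and global host).

```python
class Napt:
    """Toy address-and-port translator with a two-way translation table."""

    def __init__(self, global_ip, first_port=150):
        self.global_ip = global_ip
        self.next_port = first_port
        self.outbound_map = {}  # (private_ip, private_port) -> global_port
        self.inbound_map = {}   # global_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Private -> global: rewrite the source address and port."""
        key = (src_ip, src_port)
        if key not in self.outbound_map:
            self.outbound_map[key] = self.next_port
            self.inbound_map[self.next_port] = key
            self.next_port += 1
        return (self.global_ip, self.outbound_map[key], dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Global -> private: rewrite the destination, or drop (None)."""
        entry = self.inbound_map.get(dst_port)
        if dst_ip != self.global_ip or entry is None:
            return None  # no table entry: the session was not started inside
        return (src_ip, src_port, entry[0], entry[1])


nat = Napt("N")
sent = nat.outbound("A", 100, "C", 200)
reply = nat.inbound("C", 200, "N", 150)
unsolicited = nat.inbound("C", 200, "N", 999)
```

The `None` result for the unsolicited packet mirrors the note on this slide: without an existing table entry, a session from the global side has nowhere to go.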
46
Traditional Basic NAT
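[Figure: host A in the NAT segment reaches host C on the Global Internet through a NAT whose global address is N.]
Outbound: Source address A, Destination address C → Replace A→N → Source address N, Destination address C
Inbound: Source address C, Destination address N → Replace N→A → Source address C, Destination address A
Translation table:
for NAT segment: IP address A (ports not used)
for global: IP address N (ports not used)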
47
Traditional NAT
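[Figure: host A in the NAT segment reaches host C on the Global Internet through a NAT whose global address is N.]
Basic NAT
Outbound, NAT segment: Src IP A, Destination IP C, Src port number 100, Destination port 200
Replace A→N
Outbound, Global Internet: Src IP N, Destination IP C, Src port number 100, Destination port 200
Inbound, Global Internet: Src IP C, Destination IP N, Src port number 200, Destination port 100
Replace N→A
Inbound, NAT segment: Src IP C, Destination IP A, Src port number 200, Destination port 100
NAPT
Outbound, NAT segment: Src IP A, Destination IP C, Src port number 100, Destination port 200
IP: A→N, Port: 100→150
Outbound, Global Internet: Src IP N, Destination IP C, Src port number 150, Destination port 200
Inbound, Global Internet: Src IP C, Destination IP N, Src port number 200, Destination port 150
IP: N→A, Port: 150→100
Inbound, NAT segment: Src IP C, Destination IP A, Src port number 200, Destination port 100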
48
Bi-directional NAT
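[Figure: host A in the NAT segment and host C on the Global Internet, connected through a NAT whose global address is N.]
DNS Query: IP for host A?
Reply: it is N.
Packets in the NAT segment: A → C (Src port 100, Dest port 200) and C → A (Src port 200, Dest port 100)
Packets on the Global Internet: N → C (Src port 100, Dest port 200) and C → N (Src port 200, Dest port 100)
Replace A→N toward the Global Internet
Replace N→A toward the NAT segment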
49
[Figure: host 192.168.3.5 in 192.168.0.0/16 sits behind NAT-R1 (192.168.32.1 inside, 198.29.10.23 global); host 192.20.2.24 (bill.whitehouse.gov) in 192.20.0.0/16 sits behind NAT-R2 (198.30.40.50 global, 192.20.61.1 inside).]
(*) DNS address resolution: bill.whitehouse.gov → 198.30.40.50
Sent by the host: src=192.168.3.5, dst=198.30.40.50
After NAT-R1: src=198.29.10.23, dst=198.30.40.50
After NAT-R2: src=198.29.10.23, dst=192.20.2.24
<Translation Table in NAT-R2>
#1 input: source 198.29.10.23 (port 2012), destination 198.30.40.50 (port n/a) → output: source 198.29.10.23 (port n/a), destination 192.20.2.24 (port n/a)
#2 input: source 192.20.2.24 (port n/a), destination 198.29.10.23 (port n/a) → output: source 198.30.40.50 (port 2122), destination 198.29.10.23 (port n/a)
#1: src=198.29.10.23, port=2012 → dst=192.20.2.24
However……
• Limitation on the number of session states for NAT operation
• Each user can use only a certain number of sessions
– How many sessions?
– Even in the best case, 65,536 is the maximum number of sessions, shared by the customers accommodated behind a single IPv4 address
When the number of users is 2,000, each user gets only about 30 sessions.
This means……
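The figure of about 30 sessions follows from dividing the 16-bit port space among the users. A small sketch of that arithmetic, assuming an even split and ignoring reserved ports; the names are illustrative:

```python
PORTS_PER_ADDRESS = 65536  # size of the 16-bit port number space


def sessions_per_user(users: int, ports: int = PORTS_PER_ADDRESS) -> int:
    """Best-case number of simultaneous sessions available to each user."""
    return ports // users


print(sessions_per_user(2000))
```

With 2,000 users the result is 32, which the slide rounds to 30.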
50
Limitation of NAT Solution
[Figure: several hosts behind one NAT share its maximum number of sessions.]
51
Limitation of NAT Solution
[Figure: several hosts behind one NAT share its maximum number of sessions.]
You may have already experienced this!
52
Max 30 Connections
53
Max 20 Connections
54
Max 15 Connections
55
Max 10 Connections
56
Max 5 Connections
57
Yet another, serious problem
1. Security…… with NAT……
2. Operational overhead (OPEX) by NAT….
58
59
Some other routing mechanisms
• Anycast
– You can advertise the same routing information
from multiple routers (or nodes).
– The IP packet is automatically delivered to the
nearest network segment (or node).
– First applied at the Nagano Olympic Games, and
now applied in root DNS servers and in CDNs.
• MIP (Mobile IP) / NEMO (Network Mobility)
– The MIP/NEMO router acts as a rendezvous point for
the mobile node/network.
60
Some other routing mechanisms
61
Mobile IP operational example
[Figure: a Mobile Node moves to a new location and is connected to its Home Agent by a tunnel.]
• Set the home address as the destination
• Have both home address and CoA
• Only the routing information for the home address is advertised
• Can select a route either via the Home Agent or via a direct peer-to-peer path
62
When we observe MIP…
• It is a separation of Locator-ID (= MPLS label, CoA address) and Host-Identifier (= MPLS FEC, home address)
It may lead to the “future Internet” architecture discussion…
HIP (Host Identity Protocol) and RISP (Routing and Identifier Separation Protocol)
Yet another approach
• NDN; Named Data Networking
– http://www.named-data.net/
– File/Content based routing
– Background architecture
• P2P
• DTN
• OpenFlow
– http://www.openflow.org/
– Special Lecture on June 10th
63