Post on 21-Apr-2018
Reliability Strategies for Network Function Virtualization and Cloud Networks
Massimo Tornatore
Department of Electronics, Information and Bioengineering
Politecnico di Milano, Italy
IEEE 2017 Emerging Technologies Reliability Roundtable (ETR-RT17), Bologna, Italy, July 2nd
Outline
Cloud/Content and Reliability
1. Virtual Network Mapping in Cloud Networks: Content Connectivity vs. Network Connectivity
2. Network Function Virtualization (NFV): Reliable Service Chaining Problem
Conclusion and Future Directions
M. Tornatore - Reliability Strategies in NFV and Cloud Networks
Cloud Network
M. Tornatore - Protection Strategies in Next Generation Cloud Networks
[Figure: users send requests to cloud services (data, content, social networking, storage, web browsing, videos, e‐mail); chart of cloud data-center traffic growth (source: [1])]
[1] Cisco Global Cloud Index: Forecast and Methodology, 2015–2020 White Paper
Any Content, Anywhere, Any Time
• 90% of the total Internet traffic is generated by content dissemination [2]
• What really matters is the connectivity to content
• End‐to‐End → End‐to‐Content
[2] CISCO. Cisco Visual Networking Index: Forecast and Methodology, 2011‐2016. in White Paper, May 2012
Are Cloud Networks Reliable?
• Data loss, service disconnection, security in cloud are still open issues
• A big obstacle to the adoption of cloud services by business users
• Some numbers from [3]
• In June 2012, a lightning storm hit the Amazon Virginia data center, taking Netflix as well as Pinterest, Instagram and other sites offline for hours
• Two Sprint fiber‐optic cuts disrupted Alaska Airlines’ operations in Oct. 2012
• A recent survey ranks “data loss” no. 2 on the list of top cloud threats
• A survey shows that, in 2011, 19% of the businesses that experienced data loss experienced it in the cloud
[3] J. Sterbenz et al. Resilience and Survivability in Communication Networks: Strategies, Principles, and Survey of Disciplines. Computer Networks, vol. 54, no. 8, pp. 1245 ‐ 1265, June 2010
From Cloud Computing to Edge Computing
• 5G networks must provide 99.999% service availability [4]
• Enablers: Fog Computing, Mobile Edge Computing (MEC), surrogate servers, caches, edge cloud…
• Latency? Traffic offloading? … Reliability!
• Most cloud services can be accessed even in case of network disconnection!
[4] NGMN Alliance "5G white paper." Next generation mobile networks, white paper (2015).
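To put the 99.999% (“five nines”) target of [4] in perspective, the minimal Python sketch below converts an availability figure into the maximum yearly downtime it allows. It assumes a 365‐day year; the function name is illustrative only.

```python
# Convert an availability target into the yearly downtime it permits.
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year (assumption): 525,600 minutes


def max_downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime, in minutes, allowed by an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        minutes = max_downtime_minutes_per_year(target)
        print(f"{target * 100:g}% -> {minutes:.2f} min/year")
```

Five nines leaves only about 5.26 minutes of outage per year.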
M. Tornatore - Reliability Strategies for NFV and Cloud Networks
1) Content Connectivity
New Survivability Metric: Content Connectivity
Traditional metric: Network connectivity (NC)
• Reachability of all nodes from any other node in the network
New metric: Content connectivity (CC)
• Reachability of content from any node in the network
[5] “Fault‐Tolerant Virtual Network Mapping to Provide Content Connectivity in Optical Networks,” M. F. Habib et al.
[Figure: origin server, proxy servers and an enterprise network]
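The two metrics can be contrasted on a toy virtual topology. The Python sketch below is illustrative only: it assumes the network is an undirected adjacency map in which every node appears as a key, and that `replicas` marks the nodes hosting content (origin or surrogate servers). The helper names are hypothetical.

```python
from collections import deque


def reachable(adj, src):
    """Return the set of nodes reachable from src (breadth-first search)."""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def network_connected(adj):
    """NC: every node can reach every other node."""
    if not adj:
        return True
    return reachable(adj, next(iter(adj))) == set(adj)


def content_connected(adj, replicas):
    """CC: every node can reach at least one node hosting a content replica."""
    targets = set(replicas)
    return all(reachable(adj, u) & targets for u in adj)


# A topology split into two islands.
split = {1: [2], 2: [1], 3: [4], 4: [3, 5], 5: [4]}
print(network_connected(split))          # False: node 1 cannot reach node 3
print(content_connected(split, {1, 4}))  # True: each island contains a replica
print(content_connected(split, {1}))     # False: nodes 3, 4, 5 see no replica
```

A failure may therefore break NC while CC survives, as long as every fragment of the network still contains a replica.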
Questions on Content Connectivity
• How do traditional network survivability problems evolve when we introduce Content Connectivity?
• Virtual Network Mapping (Multi‐layer protection)
• Can we save network resources with Content Connectivity?
Survivable Virtual Network Embedding (SVNE)
Initial failure of physical elements
Vertically correlated cascading failures cause failures on upper layers
[Figure: physical layer and virtual layer]
• Note: Embedding vs. Mapping
Cut‐Set Definition
[Figure: example virtual network with cuts Cut 1 to Cut 6]
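For reference, here is the usual graph‐theoretic definition, in notation assumed here rather than taken from the slides. For a virtual network $G_V = (N_V, E_V)$ and any non‐empty proper subset $S$ of its nodes, the cut‐set $\delta(S)$ collects the virtual links with exactly one endpoint in $S$. Because $S$ and $N_V \setminus S$ define the same cut, the number of distinct cuts is:

```latex
\[
\delta(S) = \bigl\{ \{u,v\} \in E_V : |\{u,v\} \cap S| = 1 \bigr\},
\qquad \emptyset \neq S \subsetneq N_V ,
\]
\[
\#\,\text{cuts} = \frac{2^{|N_V|} - 2}{2} = 2^{|N_V| - 1} - 1 .
\]
```

A five‐node virtual network already has 15 cuts, so exhaustive cut enumeration only suits small virtual topologies.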
SVNE: Condition For Network Connectivity
We must ensure that, for every cutset of the virtual network, no single physical link supports all the virtual links in that cutset
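This condition (the cut‐set condition credited to [6] in the backup slides) can be checked by brute force on small instances. The Python sketch below assumes that the mapping is a dictionary from each virtual link (u, v) to the physical links of its lightpath, with physical links written consistently (e.g., as sorted tuples). All names are hypothetical. Because the sketch enumerates every cut, its cost grows exponentially with the number of virtual nodes. It illustrates the condition and is not the authors' ILP.

```python
from itertools import combinations


def cutsets(virtual_nodes, virtual_links):
    """Yield the cut-set (crossing virtual links) of every bipartition (S, N_V - S)."""
    nodes = sorted(virtual_nodes)
    first, rest = nodes[0], nodes[1:]
    # Keeping `first` in S lists each unordered bipartition exactly once;
    # r < len(rest) guarantees that S is a proper subset of the nodes.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            yield [(u, v) for (u, v) in virtual_links if (u in side) != (v in side)]


def survives_any_single_link_failure(virtual_nodes, mapping):
    """mapping: virtual link (u, v) -> physical links used by its lightpath.

    The virtual network stays connected after any single physical-link
    failure iff no physical link carries all the virtual links of a cut-set.
    """
    for cut in cutsets(virtual_nodes, list(mapping)):
        if not cut:
            return False  # empty cut-set: the virtual network is already disconnected
        shared = set.intersection(*(set(mapping[link]) for link in cut))
        if shared:
            return False  # failing any link in `shared` disconnects this cut
    return True


# Virtual triangle 1-2-3 embedded on a physical triangle.
bad = {(1, 2): [(1, 2)], (1, 3): [(1, 2), (2, 3)], (2, 3): [(2, 3)]}
good = {(1, 2): [(1, 2)], (1, 3): [(1, 3)], (2, 3): [(2, 3)]}
print(survives_any_single_link_failure({1, 2, 3}, bad))   # False: link (1, 2) carries cut {1}
print(survives_any_single_link_failure({1, 2, 3}, good))  # True
```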
Non‐Survivable Embedding for Network Connectivity: Example
[Figure: non‐survivable embedding of a virtual network with nodes 1 to 5; Cut 1 highlighted]
e.g., physical link (1-2) supports all virtual links of Cut 1
Survivable Embedding for Network Connectivity: Example
[Figure: survivable embedding of a virtual network with nodes 1 to 5]
No physical link supports all the virtual links of any cutset
SVNE: Condition for Content Connectivity (K)
We must ensure that every virtual node can still reach at least one surrogate server after any K failures at the physical layer
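A minimal brute‐force check of this condition is sketched below in Python. It assumes that the mapping is a dictionary from each virtual link (u, v) to the physical links of its lightpath, that only physical link failures are considered, and that `replicas` is the set of virtual nodes attached to a datacenter or surrogate server. All names are hypothetical. The number of failure scenarios grows combinatorially with K, so the sketch only suits small examples.

```python
from collections import deque
from itertools import combinations


def content_connected_after_k_failures(virtual_nodes, mapping, replicas, k):
    """True if, for every set of k failed physical links, each virtual node
    can still reach at least one replica over the surviving virtual links."""
    physical_links = sorted({p for path in mapping.values() for p in path})
    targets = set(replicas)
    for failed in map(set, combinations(physical_links, k)):
        # A virtual link survives only if none of its physical links failed.
        adj = {u: set() for u in virtual_nodes}
        for (u, v), path in mapping.items():
            if not failed.intersection(path):
                adj[u].add(v)
                adj[v].add(u)
        # Visit each connected component of the surviving virtual topology.
        unvisited = set(virtual_nodes)
        while unvisited:
            start = unvisited.pop()
            component, queue = {start}, deque([start])
            while queue:
                u = queue.popleft()
                for v in adj[u] - component:
                    component.add(v)
                    queue.append(v)
            unvisited -= component
            if not component & targets:
                return False  # this component lost access to every replica
    return True


# Virtual triangle 1-2-3, each virtual link on its own physical link.
triangle = {(1, 2): [(1, 2)], (1, 3): [(1, 3)], (2, 3): [(2, 3)]}
print(content_connected_after_k_failures({1, 2, 3}, triangle, {1}, 1))        # True
print(content_connected_after_k_failures({1, 2, 3}, triangle, {1, 2}, 2))     # False: node 3 can be isolated
print(content_connected_after_k_failures({1, 2, 3}, triangle, {1, 2, 3}, 2))  # True
```

With a single replica, every component must contain that replica, so the check reduces to network connectivity. This matches the "same as network connectivity" case with one datacenter.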
SVNE: Content Connectivity
Scenario A: 1 failure, 1 datacenter (trivial)
Same as network connectivity
[Figure: embedding of a virtual network with nodes 1 to 5]
SVNE: Content Connectivity
Scenario B: 1 failure, 2 datacenters
Network connectivity is not guaranteed, but content connectivity is guaranteed
[Figure: embedding of a virtual network with nodes 1 to 5]
SVNE: Content Connectivity, K=2
[Figure: two embeddings of a virtual network with nodes 1 to 5 under K=2, one with 2 replicas and one with 4 replicas; panel labels: “Content connectivity guaranteed” and “Nonsurvivable content‐connected embedding”]
Number of Replicas vs. Number of Virtual Links
• Which strategy is better to ensure content connectivity?
• Increase the number of replicas (more datacenters)?
• Increase the connectivity of the virtual network (more links)?
• Which is the best choice?
This issue is currently being addressed by members of our team
Classification of Analyzed Approaches
Approaches against single‐link failures:
• Network Connectivity (NC1)
• Content Connectivity (CC1)
Approaches against double‐link failures:
• Network Connectivity (NC2)
• Content Connectivity (CC2)
Can we provide NC1 after the first failure and maintain CC2 after the second failure, until failure recovery is complete? → NC1 + CC2
The problem and how we solved it
• Inputs: physical topology, logical topology, fixed datacenter locations
• Outputs: survivable virtual network mapping
• Objective: minimize the resource usage (i.e., wavelengths)
• Solution approaches: Integer Linear Programming (ILP) and heuristics
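The slides do not show the ILP itself. Purely as an assumption for illustration, a generic wavelength‐minimizing routing model for the virtual links could take the form below. Here $N_P$ is the set of physical nodes, $A_P$ the set of directed physical arcs, $x^{uv}_{ij} \in \{0,1\}$ equals 1 if virtual link $(u,v) \in E_V$ is routed on arc $(i,j)$, and $W$ is the number of wavelengths per link direction (20 in the numerical results). The survivability constraints (cut‐set or content‐connectivity conditions) would be added on top of this model and are not written out here.

```latex
\begin{align}
\min\; & \sum_{(u,v) \in E_V} \sum_{(i,j) \in A_P} x^{uv}_{ij} \\
\text{s.t.}\; & \sum_{j:(i,j) \in A_P} x^{uv}_{ij} - \sum_{j:(j,i) \in A_P} x^{uv}_{ji} =
  \begin{cases} 1 & i = u \\ -1 & i = v \\ 0 & \text{otherwise} \end{cases}
  && \forall (u,v) \in E_V,\ i \in N_P \\
& \sum_{(u,v) \in E_V} x^{uv}_{ij} \le W && \forall (i,j) \in A_P \\
& x^{uv}_{ij} \in \{0,1\} && \forall (u,v) \in E_V,\ (i,j) \in A_P
\end{align}
```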
Numerical Results
• Physical topology: NSFNET (14 nodes, 22 bidirectional links)
• Logical topologies: different connectivity degrees β (0.29, 0.47, 0.57, 0.71, 1)
• Number of datacenters: 2 (DC1, DC2)
• Number of wavelengths per link: 20
[Figure: number of wavelength channels vs. β for CC1, CC2, NC1, NC1+CC2 and NC2]
Lesson Learned
• With a small additional effort in the design phase, we can ensure network connectivity against single‐link failures, augmented with content connectivity against double‐link failures (NC1 + CC2), using minimal resources and a limited number of datacenters
2) Reliable Service Chaining
Network Function Virtualization
• Network functions are implemented as virtual network functions (VNFs), e.g., virtual machines, running on general‐purpose hardware
• No more “middle-boxes”
[7] “Virtualizing Network Security with NFV and SDN Explored in New Whitepaper and Webinar”, www.infonetics.com
Service Chain
User → NAT → FW → DPI → WOC → Web Server
• VNFs are chained to set up a Service Chain (SC)
• Example: Web‐Service SC
• Each SC has its own requirements in terms of bandwidth, latency and resiliency
NAT: Network Address Translator; FW: Firewall; DPI: Deep Packet Inspection; WOC: WAN Optimization Controller
VNF Placement for Service Chaining
• A VNF can be shared by different SCs
• Different VNFs can share the same node
• Each SC has an end‐to‐end latency requirement
• Each VNF is characterized by its processing requirement (# of CPUs)
[Figure: service chains 1 to n, each from a start point through a sequence of VNFs to an end point, placed on the physical topology]
Resilient VNF “placement”: questions to be answered
Where do we place VNFs and route traffic to ensure resiliency against link/node failures?
Which protection schemes shall we apply?
Protection schemes: several possible combinations/choices
• Unprotected
• (Virtual) Link Protection (Vl‐P)
• (Virtual) Node Protection (Vn‐P)
• End‐to‐End Protection (E2E‐P)
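The Python sketch below gives some intuition on how the four schemes compare in terms of availability, using a textbook series/parallel model. It is an illustration under explicit assumptions and not the optimization model presented in these slides. The assumptions are that failures are independent, switchover to a backup is instantaneous, and each virtual link is modeled as a single link. The per‐element availabilities `A_NODE` and `A_LINK` are hypothetical values.

```python
from math import prod

# Hypothetical availabilities of one NFV node and one (virtual) link.
A_NODE, A_LINK = 0.999, 0.9999


def series(values):
    """Every element must be up: availabilities multiply."""
    return prod(values)


def parallel(a, b):
    """At least one of two independent elements must be up."""
    return 1 - (1 - a) * (1 - b)


def unprotected(n_vnfs, n_links):
    return series([A_NODE] * n_vnfs + [A_LINK] * n_links)


def link_protected(n_vnfs, n_links):
    # Vl-P: each virtual link has a disjoint backup path; VNF nodes are not duplicated.
    return series([A_NODE] * n_vnfs + [parallel(A_LINK, A_LINK)] * n_links)


def node_protected(n_vnfs, n_links):
    # Vn-P: each VNF has a backup on another node; links are not duplicated (simplified).
    return series([parallel(A_NODE, A_NODE)] * n_vnfs + [A_LINK] * n_links)


def end_to_end_protected(n_vnfs, n_links):
    # E2E-P: a second, fully disjoint copy of the whole chain.
    chain = unprotected(n_vnfs, n_links)
    return parallel(chain, chain)


# Web-service example chain: User -> NAT -> FW -> DPI -> WOC -> Web Server (4 VNFs, 5 links).
for scheme in (unprotected, link_protected, node_protected, end_to_end_protected):
    print(f"{scheme.__name__:>22}: {scheme(4, 5):.6f}")
```

Under these assumptions E2E‐P gives the highest availability. The costs of each scheme in NFV nodes, bandwidth and latency are what the placement results quantify.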
Numerical settings (1)
• NSFNET network topology (14 nodes, 22 links at 1 Gb/s)
• 5 different types of SCs
NAT: Network Address Translator; FW: Firewall; TM: Traffic Monitor; VOC: Video Optimization Controller; IDPS: Intrusion Detection and Prevention System; WOC: WAN Optimization Controller
[8] M. Claypool and K. Claypool, “Latency and player actions in online games,” Commun. ACM, vol. 49, no. 11, pp. 40‐45, Nov. 2006
[9] A. Hmaity et al., “Virtual Network Function placement for resilient Service Chain provisioning,” 2016 8th International Workshop on Resilient Networks Design and Modeling (RNDM), Halmstad, 2016, pp. 245‐252
• WS: User → NAT → FW → TM → WOC → IDPS → Web Server
• VS: User → NAT → FW → TM → VOC → IDPS → Video Server
• VoIP: User → NAT → FW → TM → FW → NAT → Voice Server
• OG: User → NAT → FW → VOC → WOC → IDPS → Game Server
Service Chain | Bandwidth (kb/s) | Max latency (ms)
Web Service (WS) | 100 | 500
Video Streaming (VS) | 4000 | 100
VoIP | 64 | 100
Online Gaming (OG) | 50 | 60
Results – Number of Required NFV Nodes
[Figure: number of required NFV nodes for Web Service and Online Gaming SCs]
Results ‐ Average Hop Count
• Average path length (number of hops)
• At high values of node capacity, Vl‐P produces the longest paths, because many pairs of disjoint paths must be computed (hard disjointness constraint)
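The extra length comes from the disjointness requirement, because a backup path must avoid every link of its primary. The Python sketch below illustrates this with a simple two‐step heuristic, chosen here for illustration: first compute the shortest path by hop count, then compute the shortest path that avoids the primary's links. This heuristic can fail on "trap" topologies where a disjoint pair does exist. Exact methods such as Suurballe's algorithm avoid that problem. The function names are hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst, banned=frozenset()):
    """Hop-count shortest path (BFS) that avoids links listed in `banned`."""
    parent, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and frozenset((u, v)) not in banned:
                parent[v] = u
                queue.append(v)
    return None


def two_step_disjoint_paths(adj, src, dst):
    """Primary = shortest path; backup = shortest path avoiding every primary link.

    Heuristic only: it may report no pair although a link-disjoint pair exists.
    """
    primary = shortest_path(adj, src, dst)
    if primary is None:
        return None
    used = {frozenset(hop) for hop in zip(primary, primary[1:])}
    backup = shortest_path(adj, src, dst, banned=used)
    return None if backup is None else (primary, backup)


# Six-node ring 0-1-2-3-4-5-0.
ring = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
primary, backup = two_step_disjoint_paths(ring, 0, 2)
print(primary, backup)  # [0, 1, 2] [0, 5, 4, 3, 2]
```

In this example the backup needs 4 hops instead of 2. When many chains require such pairs, the average path length grows accordingly.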
Lesson learned
• Applications have such diverse requirements (latency, computing intensity, bandwidth, reliability) that there is no one‐size‐fits‐all solution
→ «Slicing», application‐aware resource/protection provisioning
Other research directions: 1) Self‐diagnosed networks (i.e., machine learning/analytics)
• Machine learning for fault diagnosis
• A fault has a set of symptoms (warnings, alarms, other faults)
• Fault diagnosis correlates observed symptoms to determine their root cause(s)
• It leverages monitoring data (e.g., collected by the operator’s hotline [8]): counters, powers, temperatures, …
• Machine learning for fault localization
• Authors in [9] use network kriging
[8] S. Gosselin et al., “Application of Probabilistic Modeling and Machine Learning to the Diagnosis of FTTH GPON Networks,” ONDM 2017
[9] K. Christodoulopoulos et al., “Exploiting network kriging for fault localization,” in Optical Fiber Communication Conference, paper W1B‐5
Other research directions: 2) SDN control resiliency
[Figure: control plane with controllers C1, C2, C3 mapped onto the data plane]
• Determining the number of controllers and their placement
• Determining the logical control‐plane topology
• Mapping the control plane (routing) onto the physical network
• Controller‐to‐switch assignments
[10] S. Savas, M. Tornatore, M. F. Habib, P. Chowdhury and B. Mukherjee, “Disaster-resilient control plane design and mapping in software-defined networks,” in High Performance Switching and Routing (HPSR), 2015
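The first and last tasks in the list (how many controllers to deploy, where, and which switches each one serves) can be illustrated with the Python sketch below. It solves a small k‐center variant by exhaustive search: it places k controllers so as to minimize the worst switch‐to‐controller hop count, then assigns each switch to its closest controller. This generic formulation is assumed here for illustration. The disaster‐resilient design of [10] also covers disaster failures and control‐plane mapping, which this sketch ignores. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, src):
    """Breadth-first hop counts from src to every reachable node."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def place_controllers(adj, k):
    """Exhaustively choose k controller nodes minimizing the worst
    switch-to-controller hop count; return (worst, controllers, assignment)."""
    dist = {u: hop_distances(adj, u) for u in adj}
    inf = float("inf")
    best = None
    for controllers in combinations(sorted(adj), k):
        # Assign every switch to its closest controller (ties go to the first in sorted order).
        assignment = {s: min(controllers, key=lambda c: dist[c].get(s, inf)) for s in adj}
        worst = max(dist[assignment[s]].get(s, inf) for s in adj)
        if best is None or worst < best[0]:
            best = (worst, controllers, assignment)
    return best


# Linear topology 0-1-2-3-4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
worst, controllers, assignment = place_controllers(line, 1)
print(worst, controllers)  # 2 (2,)
```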
Thank You!
…and thanks to them!
Biswanath Mukherjee, Farhan Habib, Sedef Savas
Achille Pattavina, Ali Hmaity, Francesco Musumeci
My publications on these topics
CONTENT CONNECTIVITY
• M. F. Habib, M. Tornatore, and B. Mukherjee, "Fault‐Tolerant Virtual Network Mapping to Provide Content Connectivity in Optical Networks," in Optical Fiber Communication Conference/National Fiber Optic Engineers Conference 2013, paper OTh3E.4.
• A. Hmaity, F. Musumeci and M. Tornatore, "Survivable virtual network mapping to provide content connectivity against double‐link failures," 2016 12th International Conference on the Design of Reliable Communication Networks (DRCN), Paris, 2016, pp. 160‐166
RELIABLE SERVICE CHAINING
• A. Hmaity, M. Savi, F. Musumeci, M. Tornatore and A. Pattavina, "Virtual Network Function placement for resilient Service Chain provisioning," 2016 8th International Workshop on Resilient Networks Design and Modeling (RNDM), Halmstad, 2016, pp. 245‐252.
BACKUP SLIDES
Service chains modelling
[Figure: service chain model]
Mapping VNF Requests
• Phase 1: mapping VNFs to NFV nodes
• Phase 2: mapping VNF requests to the NFV nodes that host the VNFs
[Figure: a service chain (start point → VNF1 → VNF2 → VNF3 → end point) whose VNF requests are mapped onto VNFs hosted in NFV nodes]
Problem statement (2/2)
Objective function:
Minimize total number of “NFV nodes” (i.e., nodes hosting VNFs)
Groups of constraints:
• VNF request placement
• VNF routing constraints
• Performance (i.e., latency) constraints
• Protection constraints
[Figure: MILP that minimizes the number of active NFV nodes. Inputs: physical topology, SCs to be deployed, SCs and VNFs parameters. Outputs: active NFV nodes, physical path for each SC (routing), size and position of VNFs (VNF placement)]
ILP Sets and parameters
[Figure: ILP sets, parameters and variables, including VNF request node mapping, VNF requests to physical paths mapping, and mapping of NFV to VNF requests]
Constraints (E2E‐P)
[Figures: E2E‐P constraint equations over four slides, the last covering latency and capacity constraints]
Whose Problem(s) Are We Addressing?
Consumers (you and I)
Enterprises
Cloud‐service providers
Carriers
What Kind Of Issues Are We Addressing?
• Traffic Engineering (TE): “Put the traffic where the bandwidth is”
• Network Engineering (NE): “Put the bandwidth where the traffic is”
• Network Planning (NP): “Put the bandwidth where the traffic is forecasted to be”
TE – online, dynamic, provisioning problem, ms time scale
NE – intermediate problem, months time scale
NP – offline, static, dimensioning problem, 5‐yr time scale
Summary of Necessary and Sufficient Conditions
Single‐link failures:
• Network connectivity with K=1 (NC1): «CutSet» condition [6]
• Content connectivity with K=1 (CC1): with 1 replica, same as NC1; with > 1 replica, reachability of at least one replica under any single failure (much simpler)
Double‐link failures:
• Network connectivity with K=2 (NC2): the CutSet condition applies to each pair of physical links (very hard condition)
• Content connectivity with K=2 (CC2): with 1 replica, same as NC2; with > 1 replica, reachability of at least one replica under any double failure (much simpler)
[6] K. Lee and E. Modiano. Cross Layer Survivability in WDM‐based Networks. in IEEE INFOCOM, Rio de Janeiro, Brazil, April 2009
SCs latency requirements
Different latency “contributions”:
• Propagation and transmission delay (network links)
• Processing delay for VNFs in NFV nodes, considering resource sharing: upscaling, i.e., a VNF is shared by different SCs (not considered), and context switching, i.e., two or more VNFs share the same hardware resources (processors), which incurs context‐switching costs
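The latency contributions listed above can be summed per service chain and compared with the maximum latency of each SC type. In this Python sketch, several values are assumptions for illustration: the ~5 µs/km fiber propagation delay, the 1500‐byte packet size, the link lengths, and the per‐VNF processing and context‐switching delays. Only the 1 Gb/s link rate and the 60 ms online‐gaming bound come from the numerical settings. The function name is hypothetical.

```python
FIBER_MS_PER_KM = 0.005  # ~5 microseconds per km of fiber (assumed)


def chain_latency_ms(link_km, link_rate_bps, packet_bits,
                     vnf_processing_ms, context_switch_ms=0.0):
    """End-to-end latency of one service-chain path, in milliseconds.

    link_km: length of each physical link traversed by the chain.
    vnf_processing_ms: processing delay of each VNF along the chain.
    context_switch_ms: extra delay per VNF when VNFs share CPU cores (hypothetical).
    """
    propagation = sum(link_km) * FIBER_MS_PER_KM
    # Store-and-forward: one packet transmission time per traversed link.
    transmission = len(link_km) * packet_bits / link_rate_bps * 1e3
    processing = sum(vnf_processing_ms) + context_switch_ms * len(vnf_processing_ms)
    return propagation + transmission + processing


# Online gaming chain (5 VNFs) over 6 links of 1000 km each, 1 Gb/s, 1500-byte packets.
latency = chain_latency_ms([1000] * 6, 1e9, 1500 * 8, [1.0] * 5)
print(f"{latency:.3f} ms, meets 60 ms bound: {latency <= 60}")
```

With these numbers the chain stays within 60 ms. An extra 5 ms of context switching per VNF would push it past the bound, which is why resource sharing shows up as a latency contribution.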
Problem statement
The VNF Placement problem for survivable SC provisioning
• Given a physical topology, a set of SCs (with latency requirements) to deploy in the network, and the required resilience level (nodes and/or links)
• Decide the optimal placement of VNFs and the mapping of virtual links onto the physical topology
• Minimizing the number of active NFV nodes
• Subject to latency, protection, routing and (capacity) constraints
[Figure: MILP that minimizes the number of active NFV nodes, taking the physical topology, the SCs to be deployed and the SC/VNF parameters, and producing the active NFV nodes, a physical path for each SC (routing) and the VNF placement]
SVNE: Content Connectivity, K=2 (More virtual links)
[Figure: two embeddings of a virtual network with nodes 1 to 5 under K=2, one with 1 replica and one with 2 replicas; panel labels: “Nonsurvivable content‐connected embedding” and “Content connectivity guaranteed”]
Content connectivity against double‐link failures can be guaranteed with a limited number of replicas if the connectivity degree of the virtual network is > 2
Problem Statement
• Inputs: physical topology, logical topology, fixed datacenter locations
• Outputs: survivable virtual network mapping
• Objective: minimize the resource usage (i.e., wavelengths)
• Constraints: flow constraints, placement constraints, capacity constraints
• Solution approaches: Integer Linear Programming (ILP) and heuristics
Numerical Results
• Connectivity degree = 0.57
• All nodes are assumed to hold a datacenter
[Figure: number of wavelength channels vs. number of datacenters (1 to 8) for CC1, CC2, NC1, NC1+CC2 and NC2]
Numerical results – Web Service SCs
[Figure: number of active NFV nodes vs. node capacity (2 to 12 CPU cores per NFV node) for Unprotected, Vl‐P, Vn‐P and E2E‐P]
• E2E‐P and Vn‐P activate twice as many NFV nodes as the Vl‐P and Unprotected scenarios when NFV‐node capacity is high, and fewer than twice as many when NFV‐node capacity is small
• For loose latency requirements (WS), Vl‐P resiliency comes at no additional cost in terms of NFV nodes with respect to the Unprotected case
Numerical results – Online Gaming SCs
[Figure: number of active NFV nodes vs. node capacity (2 to 12 CPU cores per NFV node) for Unprotected, Vl‐P, Vn‐P and E2E‐P]
• For small values of NFV‐node capacity, only the Unprotected scenario is feasible
• Vl‐P is infeasible regardless of node capacity, and Vn‐P comes at the same cost as E2E‐P
• The operator is constrained to place backup VNFs off‐site to provide resiliency against link failures
Numerical settings (2)
VNF CPU requirements (per user)
VNF | CPU requirement per user
NAT | 0.00092
FW | 0.0009
TM | 0.0133
WOC | 0.0054
IDPS | 0.0107
VOC | 0.0054
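The per‐user figures can be turned into a core count per VNF instance for a given number of users. The minimal Python sketch below reads the values in the table as CPU cores per user, which is an assumption about the unit. It also assumes that each VNF instance is sized independently and rounded up to whole cores. The function names are hypothetical.

```python
import math

# Per-user CPU requirements from the table, read here as CPU cores per user (assumed unit).
CPU_PER_USER = {"NAT": 0.00092, "FW": 0.0009, "TM": 0.0133,
                "WOC": 0.0054, "IDPS": 0.0107, "VOC": 0.0054}


def cores_needed(vnf, users):
    """Whole CPU cores one VNF instance needs to serve `users` users."""
    # Round first so that floating-point noise (a product that should be an
    # exact integer but comes out slightly above it) does not add a core.
    return math.ceil(round(CPU_PER_USER[vnf] * users, 9))


def chain_cores(chain, users):
    """(VNF, cores) pairs for one service chain; a list keeps repeated VNFs (e.g. FW in VoIP)."""
    return [(vnf, cores_needed(vnf, users)) for vnf in chain]


# Web Service chain NAT -> FW -> TM -> WOC -> IDPS serving 1000 users.
print(chain_cores(["NAT", "FW", "TM", "WOC", "IDPS"], 1000))
```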
Protection schemes: Unprotected
• No resiliency against link/node failures
+ Low cost
‐ Low reliability
Protection schemes: Virtual Link Protection (Vl‐P)
• Resiliency against link failures
+ High node consolidation
+ Low recovery time
‐ Large bandwidth usage, long paths (latency)
Protection schemes: Virtual Node Protection (Vn‐P)
• Resiliency against node failures
+ High flexibility to meet SC latency requirements
‐ Large number of NFV nodes
‐ High recovery time
Protection schemes: End‐to‐End Protection (E2E‐P)
• Resiliency against both node and link failures
+ High flexibility to meet SC latency requirements
+ Highest resiliency
‐ Large number of nodes