survivable optical network design 1-2opti.tmit.bme.hu/netplan/2011en/slides/vitmm215... · 1b....
Survivable optical network design 1-2
Péter [email protected]
(János Tapolcai [email protected])
Outline
• Basic notions in survivable optical networks
• Main failure sources
• Optical networks – overview
• Equipment availability
• System availability
Motivation behind survivable network design
Reliability
• Failure
– is the termination of the ability of a network element to perform a required function. Hence, a network failure happens at one particular moment t_f.
• Reliability, R(t)
– continuous operation of a system or service
– refers to the probability that the system is adequately operational (i.e. failure-free operation) for the intended period of time [0, t] in the presence of network failures
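Reliability (2)
• Reliability, R(t)
– Defined as 1 − F(t), where F(t) is the cumulative distribution function (cdf) of the time to failure
– Simple model: exponentially distributed time to failure
$$R(t) = 1 - F(t) = 1 - (1 - e^{-\lambda t}) = e^{-\lambda t}$$
• Properties:
– non-increasing
– $R(0) = 1$
– $\lim_{t \to \infty} R(t) = 0$
[Figure: R(t) plotted against t, starting at 1 for t = 0 and decaying towards 0; the value R(a) is marked at t = a]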
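As a minimal sketch of the exponential model on the Reliability (2) slide, the following Python snippet evaluates R(t) = e^{-λt}. The function name `reliability` and the failure rate value are illustrative assumptions, not taken from the slides.

```python
import math


def reliability(t: float, failure_rate: float) -> float:
    """R(t) = exp(-lambda * t) for an exponentially distributed time to failure."""
    if t < 0 or failure_rate < 0:
        raise ValueError("t and failure_rate must be non-negative")
    return math.exp(-failure_rate * t)


# Hypothetical failure rate (assumed for illustration): 1e-5 failures per hour.
LAMBDA = 1e-5

if __name__ == "__main__":
    for hours in (0, 1_000, 10_000, 100_000):
        print(f"R({hours} h) = {reliability(hours, LAMBDA):.4f}")
```

The values printed for growing t illustrate the listed properties: R(0) = 1, and R(t) never increases.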
Network with repairable subsystems
[Figure: timeline of a device alternating between UP (device is operational) and DOWN (the network element has failed, repair action is in progress); each transition from UP to DOWN is marked as a failure]
• Measures to characterize a repairable system are:
– Availability, A(t)
• refers to the probability that a repairable system is found in the operational state at some time t in the future
• A(t) = P(time = t, system = UP)
– Unavailability, U(t)
• refers to the probability that a repairable system is found in the faulty state at some time t in the future
• U(t) = P(time = t, system = DOWN)
• A(t) + U(t) = 1 at any time t
Element Availability Assignment
• The most commonly used measures are
– MTTR - Mean Time To Repair
– MTTF - Mean Time To Failure
• MTTR << MTTF
– MTBF - Mean Time Between Failures
• MTBF = MTTF + MTTR
• if the repair is fast, MTBF is approximately the same as MTTF
• Sometimes given in FITs (Failures in Time), MTBF[h] = 10^9 / FIT
• Another notation
– MUT - Mean Up Time
• Like MTTF
– MDT - Mean Down Time
• Like MTTR
– MCT - Mean Cycle Time
• MCT = MUT + MDT
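The relations between these measures can be sketched in a few lines of Python. The function names and the numeric inputs are hypothetical. The steady-state availability MTTF / (MTTF + MTTR) is the standard textbook relation; it is added here as an assumption, since this slide does not state it.

```python
def mtbf_hours(mttf_h: float, mttr_h: float) -> float:
    """MTBF = MTTF + MTTR (all values in hours)."""
    return mttf_h + mttr_h


def mtbf_hours_from_fit(fit: float) -> float:
    """MTBF[h] = 10^9 / FIT, where FIT counts failures per 10^9 hours."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return 1e9 / fit


def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Long-run fraction of time in the UP state (assumed textbook relation)."""
    return mttf_h / (mttf_h + mttr_h)


if __name__ == "__main__":
    # Hypothetical element: MTTF of 50,000 h and MTTR of 5 h.
    print(mtbf_hours(50_000, 5))
    print(mtbf_hours_from_fit(1_000))
    print(f"{steady_state_availability(50_000, 5):.6f}")
```

Because MTTR << MTTF in this example, MTBF differs from MTTF by only 5 hours, which is the approximation noted above.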
Availability in hours
Availability   Nines                         Outage time/year   Outage time/month   Outage time/week
90%            1 nine                        36.52 day          73.04 hour          16.80 hour
95%            -                             18.26 day          36.52 hour          8.40 hour
98%            -                             7.30 day           14.60 hour          3.36 hour
99%            2 nines (maintained)          3.65 day           7.30 hour           1.68 hour
99.5%          -                             1.83 day           3.65 hour           50.40 min
99.8%          -                             17.53 hour         87.66 min           20.16 min
99.9%          3 nines (well maintained)     8.77 hour          43.83 min           10.08 min
99.95%         -                             4.38 hour          21.91 min           5.04 min
99.99%         4 nines                       52.59 min          4.38 min            1.01 min
99.999%        5 nines (failure protected)   5.26 min           25.9 sec            6.05 sec
99.9999%       6 nines (high reliability)    31.56 sec          2.62 sec            0.61 sec
99.99999%      7 nines                       3.16 sec           0.26 sec            0.06 sec
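The outage times in the table follow from multiplying the unavailability (1 − A) by the length of the period. The sketch below reproduces a few entries. It assumes a 365.25-day year, which matches the yearly column; `outage_hours` is an illustrative name.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: 365.25-day year, as the yearly column suggests
HOURS_PER_WEEK = 7 * 24


def outage_hours(availability: float, period_hours: float) -> float:
    """Expected downtime in hours over a period, given availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for a in (0.9, 0.99, 0.999, 0.9999, 0.99999):
        per_year = outage_hours(a, HOURS_PER_YEAR)
        per_week = outage_hours(a, HOURS_PER_WEEK)
        print(f"A = {a}: {per_year:.2f} h/year, {per_week * 60:.2f} min/week")
```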
Failure sources – HW failures
• Network element failures
– Type failures
• Manufacturing or design failures
• Turn up during the testing phase
– Wear out
• Processor, memory, main board, interface cards
• Components with moving parts:
– Cooling fans, hard disk, power supply
– Natural phenomena mostly affect and damage these devices (e.g. high humidity, high temperature, earthquake)
• Circuit breakers, transistors, etc.
Failure sources – SW failures
• Design errors
• High complexity and compound failures
• Faulty implementations
• Typos in variable names
– The compiler detects most of these failures
• Failed memory read/write operations
Failure sources – Operator errors (1)
• Unplanned maintenance
– Misconfiguration
• Routing and addressing
– misconfigured addresses or prefixes, interface identifiers, link metrics, and timers and queues (Diffserv)
• Traffic conditioners
– Policers, classifiers, markers, shapers
• Wrong security settings
– Block legacy traffic
– Other operation faults:
• Accidental errors (unplug, reset)
• Access denial (forgotten password)
• Planned maintenance
• Upgrade takes longer than planned
Failure sources – Operator errors (2)
• Topology/Dimensioning/Implementation design errors
– Weak processor in routers
– High BER in long cables
– Topology is not meshed enough (not enough redundancy in protection path selection)
• Compatibility errors
– Between different vendors and versions
– Between service providers or ASs (Autonomous Systems)
• Different routing settings and Admission Control between two ASs
Failure sources – Operator errors (3)
• Operation and maintenance errors
[Figure: chart of operation and maintenance error categories: updates and patches, misconfiguration, device upgrade, maintenance, data mirroring or recovery, monitoring and testing, teaching users, other]
Failure sources – User errors
• Failures from malicious users
– Physical devices
• Robbery, damaging the device
– Against nodes
• Viruses
– DoS (denial-of-service) attack (i.e. used on the Internet)
• Routers are overloaded
• At once from many addresses
• IP address spoofing
• Example: Ping of Death – the maximum size of a ping packet is 65535 bytes. In 1996 computers could be frozen by receiving larger packets.
• Unexpected user behavior
– Short term
• Extreme events (mass calling)
• Mobility of users (e.g. after a football match the given cell is congested)
– Long term
• New popular sites and killer applications
Failure sources – Environmental causes
• Cable cuts
– Road construction (‘Universal Cable Locator’)
– Rodent bites
• Fading of radio waves
– New skyscraper (e.g. CN Tower)
– Clouds, fog, smog, etc.
– Birds, planes
• Electromagnetic interference
– Electromagnetic noise
– Solar flares
• Power outage
• Humidity and temperature
– Air-conditioner fault
• Natural disasters
– Fires, floods, terrorist attacks, lightning, earthquakes, etc.
Michnet ISP Backbone 11/97 – 11/98
[Figure: chart of outage causes: maintenance, power outage, fiber cut/circuit/carrier problem, hardware problem, routing problems, interface down, congestion/sluggish, malicious attack, software problem]
• Which failures are the most probable ones?
Michnet ISP Backbone 11/97 – 11/98
[Figure: pie chart of outage share by type: Operator 35%, Environmental 31%, Hardware 15%, Unknown 11%, User 5%, Malice 2%, Software 1%]
Cause                               Type            #      [%]
Maintenance                         Operator        272    16.2
Power Outage                        Environmental   273    16.0
Fiber Cut/Circuit/Carrier Problem   Environmental   261    15.3
Unreachable                         Operator        215    12.6
Hardware Problem                    Hardware        154    9.0
Interface Down                      Hardware        105    6.2
Routing Problems                    Operator        104    6.1
Miscellaneous                       Unknown         86     5.9
Unknown/Undetermined/No problem     Unknown         32     5.6
Congestion/Sluggish                 User            65     4.6
Malicious Attack                    Malice          26     1.5
Software Problem                    Software        23     1.3
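The per-type shares in the pie chart can be recovered by summing the percentage column of the table by type. The sketch below does this with the percentages exactly as listed; the names `OUTAGES` and `share_by_type` are illustrative. The sums agree with the pie chart up to rounding.

```python
from collections import defaultdict

# (cause, type, share in percent), copied from the table above
OUTAGES = [
    ("Maintenance", "Operator", 16.2),
    ("Power Outage", "Environmental", 16.0),
    ("Fiber Cut/Circuit/Carrier Problem", "Environmental", 15.3),
    ("Unreachable", "Operator", 12.6),
    ("Hardware Problem", "Hardware", 9.0),
    ("Interface Down", "Hardware", 6.2),
    ("Routing Problems", "Operator", 6.1),
    ("Miscellaneous", "Unknown", 5.9),
    ("Unknown/Undetermined/No problem", "Unknown", 5.6),
    ("Congestion/Sluggish", "User", 4.6),
    ("Malicious Attack", "Malice", 1.5),
    ("Software Problem", "Software", 1.3),
]


def share_by_type(rows):
    """Sum the percentage share of all causes belonging to the same type."""
    totals = defaultdict(float)
    for _cause, kind, percent in rows:
        totals[kind] += percent
    return dict(totals)


if __name__ == "__main__":
    totals = share_by_type(OUTAGES)
    for kind, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{kind:<14}{percent:5.1f}%")
```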
Case study - 2002
• D. Patterson et al.: “Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies”, UC Berkeley Computer Science Technical Report UCB//CSD-02-1175, March 15, 2002
Failure sources - Summary
• Operator errors (misconfiguration)
– Simple solutions needed
– Sometimes reach 90% of all failures
• Planned maintenance
– Running at night
– Sometimes reach 20% of all failures
• DoS attack
– It will be worse in the future
• Software failures
– Source code of 10 million lines
• Link failures
– Anything that causes a point-to-point connection to fail (not only cable cuts)
– Protection needed!
Telecommunication Networks
[Figure: telecommunication network overview; labels: High Speed Backbone, Service providers (PSTN, Internet, Video), Backbone, Mobile access, Metro, Business]
Current network architecture in backbone networks
• IP: applications and services
• ATM: traffic engineering
• SDH: transport and protection
• WDM: high bandwidth
IP - Internet Protocol
• Packet switched
– Packets are forwarded based on forwarding tables
• Routing is performed via link-state protocols
– OSPF, IS-IS (Intermediate System to Intermediate System)
– Link states (delays) are flooded through the network
• Packets are forwarded on the shortest path tree
• Widespread, its role is straightforward
– From a technical point of view not very popular
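To illustrate how a link-state router turns the flooded link states into a shortest path tree, here is a minimal sketch using Dijkstra's algorithm. The topology, the link costs and the function names are hypothetical; real OSPF or IS-IS implementations are considerably more involved.

```python
import heapq


def shortest_path_tree(links, source):
    """Dijkstra on an undirected graph; returns (distance, parent) maps."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neigh, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                parent[neigh] = node
                heapq.heappush(heap, (nd, neigh))
    return dist, parent


def next_hop(parent, source, destination):
    """First node after the source on the tree path to the destination."""
    if destination == source or destination not in parent:
        raise ValueError("destination must be reachable and differ from source")
    node = destination
    while parent[node] != source:
        node = parent[node]
    return node


# Hypothetical link-state database: (node, node) -> link cost (e.g. delay).
LINKS = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 5, ("C", "D"): 1}

if __name__ == "__main__":
    dist, parent = shortest_path_tree(LINKS, "A")
    print(dist)
    print("next hop from A towards D:", next_hop(parent, "A", "D"))
```

The forwarding table of router A would hold one `next_hop` entry per destination, derived from the tree rooted at A.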
ATM - Asynchronous Transfer Mode
• Complex
• Capable of traffic engineering
• Packets are switched (instead of forwarded) based on labels
• At the edge of the domain
– Ingress LER (Label Edge Router) attaches a label to the packet header
– LSR (Label Switch Router) switches the packet based on the label
– Egress LER removes the label from the packet
IP/MPLS - Multiprotocol Label Switching
• MPLS instead of ATM
• Switching tables
– Link state protocols
• Distribute topology information
• OSPF-TE, IS-IS-TE
– Control plane and label distribution
• LSPs (Label Switched Paths)
• LDP, RSVP-TE, CR-LDP
[Figure: control and forwarding components compared]
Device       Control              Forwarding
IP Router    IP Router Software   Longest-match Lookup
MPLS         IP Router Software   Label Swapping
ATM Switch   ATM Forum Software   Label Swapping
MPLS overview
1a. Routing protocols (e.g. OSPF-TE, IS-IS-TE) distribute topology information
1b. Label Distribution Protocol (LDP) configures the packet forwarding tables
2. Ingress LER receives a packet and attaches a label to it
3. LSR forwarding and “label swapping”
4. Egress LER removes the MPLS label from the packet
[Figure: packet labels along the path: IP (unlabeled), IP 10, IP 20, IP 40, IP (unlabeled)]
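Steps 2 to 4 of the MPLS overview can be sketched as a push, a series of swaps, and a pop. The label values 10, 20 and 40 follow the figure; the two LSR swap tables and all function names are assumptions made for illustration. A single label is used rather than a label stack.

```python
def ingress_push(packet, label):
    """Step 2: the ingress LER attaches a label to the packet."""
    return {**packet, "label": label}


def lsr_swap(packet, swap_table):
    """Step 3: an LSR replaces the incoming label with the outgoing one."""
    return {**packet, "label": swap_table[packet["label"]]}


def egress_pop(packet):
    """Step 4: the egress LER removes the label."""
    return {**packet, "label": None}


# Hypothetical swap tables, as LDP might have configured them (step 1b).
LSR_TABLES = [{10: 20}, {20: 40}]


def forward(payload):
    """Carry a packet along the LSP and record the label seen on each hop."""
    packet = {"payload": payload, "label": None}
    trace = [packet["label"]]
    packet = ingress_push(packet, 10)
    trace.append(packet["label"])
    for table in LSR_TABLES:
        packet = lsr_swap(packet, table)
        trace.append(packet["label"])
    packet = egress_pop(packet)
    trace.append(packet["label"])
    return packet, trace


if __name__ == "__main__":
    packet, trace = forward("IP datagram")
    print(trace)
```

The payload is never inspected between ingress and egress; forwarding decisions depend only on the label.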
SONET/SDH
• Pros
– Fast recovery
• With self-healing rings
• 50 ms guaranteed recovery time
– As a widely deployed and working technology it is cheap
• Cons
– Coarse granularity, provisioned (no control plane)
– Framing overhead
– Good for voice traffic, but IP/MPLS/SDH for data…
• New solutions in order to make it more flexible
– Next generation SDH/SONET (ngSDH/SONET)
• GFP: Generic Framing Procedure => enables statistical multiplexing
• VCat: Virtual Concatenation => allows finer granularity
• LCAS: Link Capacity Adjustment Scheme => meets the application's bandwidth need
WDM - Wavelength division multiplexed optical networks
– Centralized
– Permanent point-to-point connections
• Statically configured via the management system
[Figure: management system connected to three cross-connects; labels: Management system, CxC, PC]
Evolution of networks
[Figure: evolution of the layered architecture over 1999, 2003 and 200x. Layers 3 to 0: IP, ATM, SONET, Optics; then IP, MPLS, Thin SONET, Optics; then Packet IP/MPLS (layer 2/3) inter-working with Smart Optical (layer 0/1), with IP, GMPLS and ASON]
• BGP-4: 15 – 30 minutes
• OSPF: 10 seconds to minutes
• SONET: 50 milliseconds
Evolution of optical devices
[Figure: roadmap of optical devices across the short-term, medium-term and long-term network scenarios; row labels: Signal processing, Transmitter, Receiver, Switching, Transmission]
• Burst mode receivers
• Adaptive receivers - bit rate, format
– Electrical compensation of CD, PMD – FFE, DFE, MLSE
– Adaptation to power, noise
• EDFA and Raman amplification under dynamic operation
• Contention resolution – fibre delay lines, RAM
• Tunable filters
• Tunable lambda converter & AWG based OXC
• Tunable optical compensators (CD, PMD)
• Burst mode Tx
• Reconfigurable optical cross connects
• Advanced modulation formats (DPSK, duobinary, etc.)
• Tunable lasers, later fast tunable lasers
• SOA space switches, MEMS, thermo-optical
• ROADM - reconfigurable optical add-drop multiplexer
• All-optical wavelength conversion
• Optical regeneration
• Optical performance monitoring
• Forward error correction
All-optical (or transparent) view
• A lightpath is an optical path established between two nodes of the network, carrying only optical signals. Two lightpaths can use the same link if and only if they use different wavelengths.
• Dynamically built and released connections
– Distributed
– Lightpaths are built via user-initiated signals
• Conversion to the electrical domain is needed only at the endpoints
– Transparency in the data plane
– Fast
• no need to wait for O/E/O conversion at intermediate nodes
– Wavelength converting transponders
• 3R function: re-time, re-transmit, re-shape in the optical domain
– Status information from a lightpath is available only at the endpoints
• Localizing a failure is harder than in opaque optical networks
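The rule that two lightpaths may share a link only on different wavelengths can be sketched as a small wavelength assignment routine. The first-fit strategy, the assumption that a lightpath keeps one wavelength end to end (no converters), the undirected links and all names are illustrative choices, not taken from the slides.

```python
def path_links(path):
    """Undirected links traversed by a node sequence such as ['A', 'B', 'C']."""
    return [frozenset(pair) for pair in zip(path, path[1:])]


def first_fit_wavelength(links, usage, num_wavelengths):
    """Lowest wavelength index that is free on every link, or None."""
    for w in range(num_wavelengths):
        if all(w not in usage.get(link, set()) for link in links):
            return w
    return None


def establish_lightpath(path, usage, num_wavelengths):
    """Reserve one wavelength along the whole path; returns it or None if blocked."""
    links = path_links(path)
    w = first_fit_wavelength(links, usage, num_wavelengths)
    if w is None:
        return None
    for link in links:
        usage.setdefault(link, set()).add(w)
    return w


if __name__ == "__main__":
    usage = {}  # link -> set of wavelengths in use
    print(establish_lightpath(["A", "B", "C"], usage, 2))  # first lightpath
    print(establish_lightpath(["B", "C", "D"], usage, 2))  # shares link B-C
    print(establish_lightpath(["A", "B", "C"], usage, 2))  # may be blocked
```

A request is blocked when no single wavelength is free on all links of its path, even if every link still has some free wavelength.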
Evolution of networks
[Figure: timeline from 1970 through 1995 and today to 20xx, with three rows.
Transport: copper (analog), copper (digital), fiber cable
Transport technology: point-to-point, SDH, Optical Transport Network (OTN), optical (circuit) switching, optical packet switching (OPS)
Signalling system: centralized network management system, distributed signalling system (CP), ??]
Generalized MPLS
[Figure: LSP hierarchy. Packet LSPs 1..N (PSC cloud) are carried in TDM slots 1..N as TDM LSPs (TDM cloud), these in lambdas 1..N as lambda LSPs (LSC cloud), and these in fibers 1..N of a fiber bundle as fiber LSPs (FSC cloud). One side of the figure shows combining low-order LSPs, the other splitting high-order LSPs.]
Cross-connects
• Generalized MPLS functions
– Packet Switch Capable (PSC)
• Router/ATM Switch/Frame Relay Switch
– Time Division Multiplexing Capable (TDMC)
• SONET/SDH ADM/Digital Cross-connects
– Lambda Switch Capable (LSC)
• All-optical ADM or Optical Cross-connects (OXC)
– Fiber-Switch Capable (FSC)
[Figure: nodes labeled FSC, LSC, TDMC and PSC]
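The ordering of the four switching capabilities can be sketched as an ordered enumeration. The check below assumes, for simplicity, that an LSP may only be nested into an LSP of a strictly higher order (packet into TDM, TDM into lambda, lambda into fiber); the class and function names are hypothetical.

```python
from enum import IntEnum


class SwitchingCapability(IntEnum):
    """Switching capabilities from the slide, ordered from low to high order."""
    PSC = 1   # Packet Switch Capable
    TDMC = 2  # Time Division Multiplexing Capable
    LSC = 3   # Lambda Switch Capable
    FSC = 4   # Fiber-Switch Capable


def can_nest(inner: SwitchingCapability, outer: SwitchingCapability) -> bool:
    """Simplifying assumption: nesting only into a strictly higher-order LSP."""
    return inner < outer


def possible_carriers(inner: SwitchingCapability):
    """All capabilities whose LSPs could carry an LSP of the given type."""
    return [cap for cap in SwitchingCapability if can_nest(inner, cap)]


if __name__ == "__main__":
    for cap in SwitchingCapability:
        carriers = ", ".join(c.name for c in possible_carriers(cap)) or "none"
        print(f"{cap.name} LSP can be carried by: {carriers}")
```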