survivable optical network design 1-2opti.tmit.bme.hu/netplan/2011en/slides/vitmm215... · 1b....

Survivable optical network design 1-2

Péter Babarczi

(János Tapolcai)

Outline

• Basic notions in survivable optical networks
• Main failure sources
• Optical networks – overview
• Equipment availability
• System availability

Motivation behind survivable network design

Reliability

• Failure
– is the termination of the ability of a network element to perform a required function. Hence, a network failure happens at one particular moment t_f
• Reliability, R(t)
– continuous operation of a system or service
– refers to the probability of the system being adequately operational (i.e. failure-free operation) for the intended period of time [0, t] in the presence of network failures

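Reliability (2)

• Reliability, R(t)
– Defined as 1 - F(t), where F(t) is the cumulative distribution function (cdf)
– Simple model: exponentially distributed variables

$$R(t) = 1 - F(t) = 1 - (1 - e^{-\lambda t}) = e^{-\lambda t}$$

• Properties:
– non-increasing
– $R(0) = 1$
– $\lim_{t \to \infty} R(t) = 0$

[Figure: R(t) plotted against t, starting at 1 for t = 0 and decreasing towards 0; the value R(a) is marked at t = a.]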
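As a quick numeric illustration of the exponential model, the following sketch evaluates R(t) = e^{-λt} at a few points in time. The function name and the failure rate of one failure per 10,000 hours are assumptions chosen for the example, not values from the slides.

```python
import math

def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """R(t) = exp(-lambda * t) for exponentially distributed failure times."""
    if t_hours < 0 or failure_rate_per_hour < 0:
        raise ValueError("time and failure rate must be non-negative")
    return math.exp(-failure_rate_per_hour * t_hours)

# Hypothetical device: on average one failure per 10,000 hours.
lam = 1.0 / 10_000
for t in (0, 1_000, 10_000, 50_000):
    print(f"R({t} h) = {reliability(t, lam):.4f}")
```

The printed values start at 1 and only decrease as t grows, which matches the listed properties of R(t).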

Network with reparable subsystems

[Figure: timeline of a device alternating between the UP state ("Device is operational") and the DOWN state ("The network element is failed, repair action is in progress"); each transition from UP to DOWN is marked as a failure.]

• Measures to characterize a reparable system are:
– Availability, A(t)
  • refers to the probability of a reparable system to be found in the operational state at some time t in the future
  • A(t) = P(time = t, system = UP)
– Unavailability, U(t)
  • refers to the probability of a reparable system to be found in the faulty state at some time t in the future
  • U(t) = P(time = t, system = DOWN)
• A(t) + U(t) = 1 at time t


Element Availability Assignment

• The mainly used measures are
– MTTR - Mean Time To Repair
– MTTF - Mean Time To Failure
  • MTTR << MTTF
– MTBF - Mean Time Between Failures
  • MTBF = MTTF + MTTR
  • if the repair is fast, MTBF is approximately the same as MTTF
  • Sometimes given in FITs (Failures in Time), MTBF[h] = 10^9/FIT
• Another notation
– MUT - Mean Up Time
  • Like MTTF
– MDT - Mean Down Time
  • Like MTTR
– MCT - Mean Cycle Time
  • MCT = MUT + MDT
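The relations between these measures can be sketched in a few lines. The FIT conversion and MTBF = MTTF + MTTR come from the slide. The steady-state availability MTTF / (MTTF + MTTR) is a standard result that is not stated on this slide, and the component values (2000 FIT, 4 hours repair time) are hypothetical.

```python
def mtbf_hours_from_fit(fit: float) -> float:
    """MTBF in hours from a FIT value (failures per 10^9 device-hours)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return 1e9 / fit

def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time spent in the UP state: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical component: 2000 FIT and 4 hours mean time to repair.
mtbf = mtbf_hours_from_fit(2000)
mttr = 4.0
mttf = mtbf - mttr  # because MTBF = MTTF + MTTR
print(f"MTBF = {mtbf:.0f} h, A = {steady_state_availability(mttf, mttr):.6f}")
```

Because MTTR is much smaller than MTTF here, MTBF and MTTF differ only by a few hours out of half a million, which is the "fast repair" case mentioned above.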

Availability in hours

Availability   Nines                          Outage time/year   Outage time/month   Outage time/week
90%            1 nine                         36.52 day          73.04 hour          16.80 hour
95%            -                              18.26 day          36.52 hour          8.40 hour
98%            -                              7.30 day           14.60 hour          3.36 hour
99%            2 nines (maintained)           3.65 day           7.30 hour           1.68 hour
99.5%          -                              1.83 day           3.65 hour           50.40 min
99.8%          -                              17.53 hour         87.66 min           20.16 min
99.9%          3 nines (well maintained)      8.77 hour          43.83 min           10.08 min
99.95%         -                              4.38 hour          21.91 min           5.04 min
99.99%         4 nines                        52.59 min          4.38 min            1.01 min
99.999%        5 nines (failure protected)    5.26 min           25.9 sec            6.05 sec
99.9999%       6 nines (high reliability)     31.56 sec          2.62 sec            0.61 sec
99.99999%      7 nines                        3.16 sec           0.26 sec            0.06 sec
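The yearly outage column follows directly from the unavailability 1 - A multiplied by the length of a year. A minimal sketch, assuming an average year of 365.25 days (this assumption reproduces the 36.52 days shown for 90%); the function name is hypothetical:

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length including leap years

def outage_hours_per_year(availability_percent: float) -> float:
    """Expected yearly downtime in hours for a given availability in percent."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * HOURS_PER_YEAR

for a in (90.0, 99.0, 99.9, 99.99, 99.999):
    hours = outage_hours_per_year(a)
    print(f"{a}% -> {hours:.2f} hours/year ({hours * 60:.2f} minutes)")
```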

Outline

• Basic notions in survivable optical networks
• Main failure sources
• Optical networks – overview
• Equipment availability
• System availability

Failure sources – HW failures

• Network element failures
– Type failures
  • Manufacturing or design failures
  • Usually revealed in the testing phase
– Wear out
  • Processor, memory, main board, interface cards
  • Components with moving parts:
    – Cooling fans, hard disk, power supply
    – Natural phenomena mostly influence and damage these devices (e.g. high humidity, high temperature, earthquake)
  • Circuit breakers, transistors, etc.

Failure sources – SW failures

• Design errors
  • High complexity and compound failures
• Faulty implementations
  • Typos in variable names
    – Compiler detects most of these failures
  • Failed memory reading/writing operation

Failure sources – Operator errors (1)

• Unplanned maintenance
– Misconfiguration
  • Routing and addressing
    – misconfigured addresses or prefixes, interface identifiers, link metrics, and timers and queues (Diffserv)
  • Traffic Conditioners
    – Policers, classifiers, markers, shapers
  • Wrong security settings
    – Block legacy traffic
– Other operation faults:
  • Accidental errors (unplug, reset)
  • Access denial (forgotten password)
• Planned maintenance
  • Upgrade is longer than planned


Failure sources – Operator errors (2)

• Topology/Dimensioning/Implementation design errors
– Weak processor in routers
– High BER in long cables
– Topology is not meshed enough (not enough redundancy in protection path selection)
• Compatibility errors
– Between different vendors and versions
– Between service providers or AS (Autonomous System)
  • Different routing settings and Admission Control between two ASs

Failure sources – Operator errors (3)

• Operation and maintenance errors

[Figure: chart of operation and maintenance error categories: Updates and patches, Misconfiguration, Device upgrade, Maintenance, Data mirroring or recovery, Monitoring and testing, Teach users, Other.]

Failure sources – User errors

• Failures from malicious users
– Physical devices
  • Robbery, damage to the device
– Against nodes
  • Viruses
– DoS (denial-of-service) attack (i.e. used on the Internet)
  • Routers are overloaded
  • At once from many addresses
  • IP address spoofing
  • Example: Ping of Death – the maximal size of a ping packet is 65535 bytes. In 1996 computers could be frozen by receiving larger packets.
• Unexpected user behavior
– Short term
  • Extreme events (mass calling)
  • Mobility of users (e.g. after a football match the given cell is congested)
– Long term
  • New popular sites and killer applications

Failure sources – Environmental causes

• Cable cuts
– Road construction (‘Universal Cable Locator’)
– Rodent bites
• Fading of radio waves
– New skyscraper (e.g. CN Tower)
– Clouds, fog, smog, etc.
– Birds, planes
• Electro-magnetic interference
– Electro-magnetic noise – solar flares
• Power outage
• Humidity and temperature
– Air-conditioner fault
• Natural disasters
– Fires, floods, terrorist attacks, lightning, earthquakes, etc.

Michnet ISP Backbone 11/97 – 11/98

[Figure: chart of outage causes: Maintenance, Power Outage, Fiber Cut/Circuit/Carrier Problem, Hardware Problem, Routing Problems, Interface Down, Congestion/Sluggish, Malicious Attack, Software Problem.]

• Which failures are the most probable ones?

Michnet ISP Backbone 11/97 – 11/98

[Figure: pie chart of outages by type: Operator 35%, Environmental 31%, Hardware 15%, Unknown 11%, User 5%, Malice 2%, Software 1%.]

Cause                               Type            #     [%]
Maintenance                         Operator        272   16.2
Power Outage                        Environmental   273   16.0
Fiber Cut/Circuit/Carrier Problem   Environmental   261   15.3
Unreachable                         Operator        215   12.6
Hardware Problem                    Hardware        154   9.0
Interface Down                      Hardware        105   6.2
Routing Problems                    Operator        104   6.1
Miscellaneous                       Unknown         86    5.9
Unknown/Undetermined/No problem     Unknown         32    5.6
Congestion/Sluggish                 User            65    4.6
Malicious Attack                    Malice          26    1.5
Software Problem                    Software        23    1.3
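The per-type shares in the pie chart can be reproduced by summing the percentage column of the table per failure type. A minimal sketch; the percentages are copied from the table, and the function and variable names are hypothetical:

```python
from collections import defaultdict

# (cause, type, share in percent) as listed in the table
rows = [
    ("Maintenance", "Operator", 16.2),
    ("Power Outage", "Environmental", 16.0),
    ("Fiber Cut/Circuit/Carrier Problem", "Environmental", 15.3),
    ("Unreachable", "Operator", 12.6),
    ("Hardware Problem", "Hardware", 9.0),
    ("Interface Down", "Hardware", 6.2),
    ("Routing Problems", "Operator", 6.1),
    ("Miscellaneous", "Unknown", 5.9),
    ("Unknown/Undetermined/No problem", "Unknown", 5.6),
    ("Congestion/Sluggish", "User", 4.6),
    ("Malicious Attack", "Malice", 1.5),
    ("Software Problem", "Software", 1.3),
]

def share_by_type(table):
    """Sum the percentage column per failure type."""
    totals = defaultdict(float)
    for _cause, failure_type, percent in table:
        totals[failure_type] += percent
    return dict(totals)

totals = share_by_type(rows)
for failure_type, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{failure_type:<14}{percent:5.1f}%")
```

Operator-related causes come out as the largest group, followed by environmental causes, in line with the pie chart.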


Case study - 2002

• D. Patterson et al.: “Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies”, UC Berkeley Computer Science Technical Report UCB//CSD-02-1175, March 15, 2002

Failure sources - Summary

• Operator errors (misconfiguration)
– Simple solutions needed
– Sometimes reach 90% of all failures
• Planned maintenance
– Running at night
– Sometimes reaches 20% of all failures
• DoS attack
– It will be worse in the future
• Software failures
– Source code of 10 million lines
• Link failures
– Anything from which a point-to-point connection fails (not only cable cuts)
– Protection needed!

Outline

• Basic notions in survivable optical networks
• Main failure sources
• Optical networks – overview
• Equipment availability
• System availability

Telecommunication Networks

[Figure: network overview with the labels High Speed Backbone, Service providers (PSTN, Internet, Video), Backbone, Mobile access, Metro, Business.]

Current network architecture in backbone networks

• IP: Applications and services
• ATM: Traffic engineering
• SDH: Transport and protection
• WDM: High bandwidth

IP - Internet Protocol

• Packet switched
– Packets are forwarded based on forwarding tables
• Routing is performed via link-state protocols
– OSPF, IS-IS (Intermediate System to Intermediate System)
– Link states (delays) are spread into the network
  • Packets are forwarded on the shortest path tree
• Widespread, its role is straightforward
– From a technical point of view not very popular
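To illustrate forwarding on the shortest path tree, the sketch below runs Dijkstra's algorithm on a small link-state database and derives the next hop towards a destination. The four-node topology, the link costs and the function names are hypothetical. Real OSPF or IS-IS implementations add much more (areas, equal-cost paths, timers), so this is only the core idea.

```python
import heapq

def shortest_path_tree(graph, source):
    """Dijkstra's algorithm; returns (distance, parent) maps describing the tree."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, cost in graph[node].items():
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                parent[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    return dist, parent

def next_hop(parent, source, destination):
    """Walk back from the destination to find the first hop out of the source."""
    if destination == source:
        raise ValueError("destination is the source itself")
    node = destination
    while parent[node] != source:
        node = parent[node]
    return node

# Hypothetical link-state database: node -> {neighbor: link cost}
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, parent = shortest_path_tree(graph, "A")
print(dist)
print("next hop from A to D:", next_hop(parent, "A", "D"))
```

Each router would run the same computation with itself as the source, and its forwarding table is the set of next hops obtained this way.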


ATM - Asynchronous Transfer Mode

• Complex
• Capable of traffic engineering
• Packets are switched (instead of forwarded) based on labels
• At the edge of the domain
– Ingress LER (Label Edge Router) attaches a label to the packet header
– LSR (Label Switch Router) switches the packet based on the label
– Egress LER removes the label from the packet

IP/MPLS - Multiprotocol Label Switching

• MPLS instead of ATM
• Switching tables
– Link state protocols
  • Distribute topology information
  • OSPF-TE, IS-IS-TE
– Control plane and label distribution
  • LSPs (Label Switched Paths)
  • LDP, RSVP-TE, CR-LDP

[Figure: control and forwarding planes of three device types]

Device       Control              Forwarding
IP Router    IP Router Software   Longest-match Lookup
MPLS         IP Router Software   Label Swapping
ATM Switch   ATM Forum Software   Label Swapping

MPLS overview

1a. Routing protocols (e.g. OSPF-TE, IS-IS-TE) distribute topology information
1b. Label Distribution Protocol (LDP) configures the packet forwarding tables
2. Ingress LER receives a packet and attaches a label to it
3. LSR forwarding and “label swapping”
4. Egress LER removes the MPLS label from the packet

[Figure: MPLS domain with the packet shown as IP, IP 10, IP 20, IP 40 and IP at different points along its path.]
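Steps 2 to 4 can be sketched with plain dictionaries as label tables. The label values 10, 20 and 40 are taken from the figure, but the assumption that they are used on consecutive hops, the router names, the address prefix and all function names are hypothetical. A real ingress LER would classify the packet with a longest-match lookup, which is skipped here by passing the prefix directly.

```python
# Hypothetical label tables for a path ingress -> LSR1 -> LSR2 -> egress.
INGRESS_FEC_TABLE = {"192.0.2.0/24": 10}  # destination prefix -> label to attach
LSR_TABLES = {
    "LSR1": {10: 20},  # incoming label -> outgoing label
    "LSR2": {20: 40},
}

def ingress_push(packet, prefix):
    """Step 2: the ingress LER attaches a label to the packet."""
    return {**packet, "label": INGRESS_FEC_TABLE[prefix]}

def lsr_swap(router, packet):
    """Step 3: an LSR replaces the incoming label with the outgoing one."""
    return {**packet, "label": LSR_TABLES[router][packet["label"]]}

def egress_pop(packet):
    """Step 4: the egress LER removes the label."""
    return {key: value for key, value in packet.items() if key != "label"}

packet = {"payload": "IP"}
trace = []
packet = ingress_push(packet, "192.0.2.0/24")
trace.append(packet["label"])
for router in ("LSR1", "LSR2"):
    packet = lsr_swap(router, packet)
    trace.append(packet["label"])
packet = egress_pop(packet)
print("labels seen along the path:", trace)
print("packet after egress:", packet)
```

Note that the LSRs never inspect the IP payload; the forwarding decision is a single exact-match lookup on the label.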

SONET/SDH

• Pros
– Fast recovery
  • With self-healing rings
  • 50 ms guaranteed recovery time
– As a widely deployed and working technology it is cheap
• Cons
– Coarse granularity, provisioned (no control plane)
– Framing overhead
– Good for voice traffic, but IP/MPLS/SDH for data…
• New solutions in order to make it more flexible
– Next generation SDH/SONET (ngSDH/SONET)
  • GFP: Generic Framing Procedure => enables statistical multiplexing
  • VCat: Virtual concatenation => allows finer granularity
  • LCAS: Link capacity adjustment scheme => meets application bandwidth needs

WDM - Wavelength division multiplexed optical networks

– Centralized
– Permanent point-to-point connections
  • Statically configured via the management system

[Figure: a management system (PC) connected to three cross-connects (CxC).]

Evolution of networks

[Figure: evolution of the layered network stack along a timeline (1999, 2003, 200x). Labels in the figure: IP, ATM, SONET, Optics (layers 3, 2, 1, 0); IP, MPLS, Thin SONET, Optics; Packet IP/MPLS (layer 2/3) inter-working with Smart Optical (layer 0/1); GMPLS, ASON.]

• BGP-4: 15 – 30 minutes
• OSPF: 10 seconds to minutes
• SONET: 50 milliseconds


Evolution of optical devices

[Figure: roadmap of optical device technologies across short-term, medium-term and long-term network scenarios, grouped by function: signal processing, transmitter, receiver, switching, transmission. Technologies named in the figure:]

– Optical regeneration
– Burst mode receivers
– Adaptive receivers: bit rate, format; electrical compensation of CD, PMD (FFE, DFE, MLSE); adaptation to power, noise
– EDFA and Raman amplification under dynamic operation
– Contention resolution: fibre delay lines, RAM
– Tunable filters
– Tunable lambda converter and AWG based OXC
– Tunable optical compensators (CD, PMD)
– Burst mode Tx
– Reconfigurable optical cross-connects
– Advanced modulation formats (DPSK, Duobinary, etc.)
– Tunable lasers, later fast tunable lasers
– SOA space switches, MEMS, thermo-optical switches
– ROADM - reconfigurable optical add-drop multiplexer
– All-optical wavelength conversion
– Optical performance monitoring
– Forward error correction

All-optical (or transparent) view

• A lightpath is an optical path established between two nodes of the network, carrying only optical signals. Two lightpaths can use the same link if and only if they use different wavelengths.
• Dynamically built and released connections
– Distributed
– Lightpaths are built via user initiated signals
• Conversion to the electrical domain is needed only at the endpoints
– Transparency in the data plane
– Fast
  • no need to wait for O/E/O conversion at intermediate nodes
– Wavelength converting transponders
  • 3R function: re-time, re-transmit, re-shape in the optical domain
– Status information from the lightpath is available only at the endpoints
  • Localizing a failure is harder than in opaque optical networks
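The wavelength rule for lightpaths (a shared link requires different wavelengths) can be checked mechanically. The sketch below assumes that each lightpath keeps a single wavelength end to end (no wavelength conversion) and that a link is a shared resource regardless of direction. The node names, wavelength indices and function names are hypothetical.

```python
from itertools import combinations

def link_set(path):
    """Undirected links traversed by a sequence of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def conflicts(lightpaths):
    """Return pairs of lightpath names that share a link and a wavelength."""
    clashes = []
    for (name_a, a), (name_b, b) in combinations(lightpaths.items(), 2):
        same_wavelength = a["wavelength"] == b["wavelength"]
        shared_links = link_set(a["path"]) & link_set(b["path"])
        if same_wavelength and shared_links:
            clashes.append((name_a, name_b))
    return clashes

# Hypothetical lightpaths: lp1 and lp2 both use wavelength 1 on link B-C.
lightpaths = {
    "lp1": {"path": ["A", "B", "C"], "wavelength": 1},
    "lp2": {"path": ["B", "C", "D"], "wavelength": 1},
    "lp3": {"path": ["B", "C", "D"], "wavelength": 2},
}
print(conflicts(lightpaths))
```

Here lp3 follows the same route as lp2 but is valid next to both other lightpaths because it uses a different wavelength.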

Evolution of networks

[Figure: timeline from 1970 through 1995 and today to 20xx. Transport: copper (analog), copper (digital), fiber cable. Transport technology: point-to-point, SDH, Optical Transport Network (OTN), optical (circuit) switching, optical packet switching (OPS). Control: signalling system, centralized network management system, distributed signalling system (CP), "??".]

Generalized MPLS

[Figure: LSP hierarchy with the labels Packet LSP 1 … Packet LSP N, TDM Slot 1 … TDM Slot N, Lambda 1 … Lambda N, Fiber 1 … Fiber N and Fiber Bundle; Packet LSPs, TDM LSPs, Lambda LSPs and Fiber LSPs; PSC Cloud, TDM Cloud, LSC Cloud and FSC Cloud. One side is captioned "Combining Low-Order LSPs", the other "Splitting High-Order LSPs".]

Cross-connects

• Generalized MPLS functions
– Packet Switch Capable (PSC)
  • Router/ATM Switch/Frame Relay Switch
– Time Division Multiplexing Capable (TDMC)
  • SONET/SDH ADM/Digital Cross-connects
– Lambda Switch Capable (LSC)
  • All Optical ADM or Optical Cross-connects (OXC)
– Fiber-Switch Capable (FSC)

[Figure: interfaces labeled PSC, TDMC, LSC and FSC along a path.]

References

• Andrea Bobbio, “Dependability & Maintainability Theory and Methods”
• Jim Gray, “Dependability in the Internet Era”
• J.-P. Vasseur, M. Pickavet, P. Demeester, “Network Recovery. Protection and Restoration of Optical, SONET-SDH, IP, and MPLS”, Morgan Kaufmann Publishers, San Francisco, 2004.
• S. Verbrugge, D. Colle, P. Demeester, R. Huelsermann, M. Jaeger, “General Availability Model for Multilayer Transport Networks”, DRCN 2005.
• Máthé Dániel, “Hálózatok rendelkezésre állásának vizsgálata”, diploma thesis, BME, 2007
• Li Yin, “MPLS and GMPLS”
• Kefei Wang, “Protection & Restoration for Optical Ethernet”
• Dimitri Papadimitriou, “Generalized MPLS”
• Ling Huang, “Protection and Restoration in Optical Network”
• Jesús F. Lobo, “Impact of GMPLS on an integrated operator”
• Andrew G. Malis, “Using Multi-Layer Routing to Provision Services across MPLS/GMPLS Domain Boundaries”