Scalable Resilient Overlay Networks

Sameer Hashmat QAZI

A dissertation submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy

The School of Electrical Engineering and Telecommunications

The University of New South Wales

October 2009


ABSTRACT

The Internet has scaled massively over the past 15 years to extend to billions of users. These users increasingly require extensive applications and capabilities from the Internet, such as Quality of Service (QoS) optimized paths between end hosts. When default Internet paths do not meet their requirements adequately, there is a need to facilitate the discovery of such QoS optimized paths. Fortunately, even though the route offered by the Internet may not work (to the required level of performance), there often exist alternate routes that do. For instance, when the direct Internet path between two hosts is sub-optimal (according to a specific user-defined criterion), the direct paths from both hosts to a third host may not suffer from the same problem, owing to path disjointness. Overlay networks facilitate the discovery of such composite alternate paths through third-party hosts.

To discover such alternate paths, overlay hosts regularly monitor Internet path quality and choose better alternate paths via other hosts. Such measurements are costly and pose scalability problems for large overlay networks. This thesis asserts and shows that these overheads could be lowered substantially if the network layer path information between overlay hosts could be obtained, since this information facilitates selection of disjoint paths. This thesis further demonstrates that obtaining such network layer path information is very challenging. As opposed to path monitoring, which only requires cooperation of overlay hosts, disjoint path selection depends on the accuracy of information about the underlay, which is outside the overlay's domain of control and so may contain inaccuracies. This thesis investigates how such information could be gleaned at different granularities for optimal tradeoffs between spatial and/or temporal methods for selection of alternate paths.

The main contributions of this thesis are: (i) investigation of scalable techniques to facilitate alternate path computation using network layer path information; (ii) a review of the realistic performance gains achievable using such alternate paths; and (iii) investigation of techniques for revealing the presence of incorrect network layer path information, and proposal of new techniques for its removal.

Keywords:

Quality of Service, Overlay Networks, Peer-to-Peer Systems, Service-oriented Networks


ACKNOWLEDGEMENTS

First, I would like to thank the Almighty. I am also profoundly grateful to my advisor, Dr. Timothy Moors, for his trust in me throughout the last four years and for his unconditional support, patience and guidance, without which I could not have accomplished this long research journey. I would also like to thank my co-adviser, Dr. Aruna Seneviratne, for guiding me in the initial stages of my PhD.

I would also like to thank the National University of Science and Technology (NUST), Pakistan, for extending its generous financial support for three years of my PhD candidature. I thank my supervisor and the Head of the Electrical Engineering School (UNSW), Dr. Timothy Hesketh, for providing me with a PhD completion scholarship for partial financial support during the fourth year of my candidature. My thanks also go to the Graduate Research School (UNSW) for the postgraduate travel grants that helped fund my conference travel.

I would also like to thank all the fellow Networks Group members (present and former): Arun, Arvind, Bo, Jack, John, Nick, Nixian, Mohammad, Nick, Shuo, Zawar; and other friends, Mark, Phu and Adeel, for their companionship and help throughout the PhD journey. I would especially like to thank Dr. Eric D. Kolaczyk (Boston University) for his helpful comments on the work on the removal of Routing Matrix Inconsistencies to improve statistical path estimation. I would also like to acknowledge the help extended to me by Thierry Rakotoarivelo from NICTA, with whom I shared fruitful discussions on the availability and use of Internet datasets. I also thank Ido Nevat for helpful discussions on robust regression techniques. I profoundly thank Jack Tsai and Arun Vishwanath for proofreading this dissertation. I would also like to thank Phil Allen, who looked after our research tools, namely our PCs and software applications, whenever we had any issues.

Finally, I would like to express my profound gratitude to my parents for their hard work and sacrifices, and to my sister and my late grandmother. They all encouraged and inspired me in many ways. I would never have made it through this journey without their love and their continuous prayers.


LIST OF ABBREVIATIONS

AMP Active Measurement Project

AS Autonomous System

ASN Autonomous System Number

BGP Border Gateway Protocol

BLP Best Linear Predictor

CAIDA The Cooperative Association for Internet Data Analysis

CDN Content Distribution Network

CO Convex Optimization

CORR Correlation

COV Covariance

DHT Distributed Hash Table

EDR Earliest Divergence Rule

EID Endpoint Identifier

FEC Forward Error Correction

GPS Global Positioning System

HLP Hybrid Link-state Path-vector

IP Internet Protocol

ISP Internet Service Provider

KBR Key Based Routing

MIRO Multipath Interdomain Routing


MST Minimum Spanning Tree

NCC Network Coordination Center

NLANR The National Laboratory for Applied Network Research

NIRA New Internet Routing Architecture

NP Nondeterministic Polynomial time

QoS Quality of Service

RD Rank Deficiency

RIPE Réseaux IP Européens

RMI Routing Matrix Inconsistencies

RON Resilient Overlay Networks

RPE Relative Prediction Error

RTT Round Trip Time

SVD Singular Value Decomposition

TCP Transmission Control Protocol

ToR Type Of Relationship

TTM Test Traffic Measurement

UDP User Datagram Protocol

VAR Variance

VoIP Voice over IP


ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed: SAMEER QAZI

Date: 16 October 2009


OUTLINE

Part I – Introduction and Background

1 Introduction
2 Literature Review
3 Description of Internet Datasets used in this dissertation

Part II – Scalable Heuristics for Selecting Disjoint Paths in Overlay Networks

4 An Architecture for Selecting Disjoint Paths - Globally Scalable RON Service
5 Disjoint Path Selection in Overlay Networks using ToR Graphs

Part III – Path Monitoring in Overlay Networks

6 Issues of Statistical Path Monitoring in Overlay Networks
7 Conclusions and Proposals for Future Directions of Research


TABLE OF CONTENTS

Abstract
Acknowledgements
List of Abbreviations
Originality Statement
Outline
Table of Contents
List of Figures
List of Tables
List of Publications

Part I – Introduction and Background

1 Introduction
    1.1 Why Overlay Networks?
    1.2 Dissertation Overview
2 Literature Review
    2.1 Introduction
    2.2 Exploiting Path Diversity in the Internet through Overlay Networks
        2.2.1 Overlay Topology
        2.2.2 Monitoring Overlay Links
        2.2.3 Selecting Overlay Paths
        2.2.4 Detouring Packets
        2.2.5 (In-)Feasibility of Selfish-Routing on Overlay-Networks
        2.2.6 Open Research-issues with Overlay-Networks
    2.3 Proposals To Modify Underlay Routing Mechanisms
        2.3.1 Re-Engineering BGP-4
        2.3.2 Enhancing network level packet forwarding decisions to exploit path diversity
        2.3.3 Fast Re-Route (FRR) construction to reduce failover times
        2.3.4 Open Research-issues with proposals to modify underlay routing mechanisms
    2.4 Multi-Homing Solutions
        2.4.1 Open Research-issues with Multi-homing
    2.5 Chapter Summary
3 Description of Internet Datasets Used in This Dissertation
    3.1 Datasets considered and methodology for obtaining the datasets
    3.2 Network Layer Characteristics of Overlay Paths Vs Direct Paths
    3.3 When is the Direct Internet path degraded?

Part II – Scalable Heuristics for Selecting Disjoint Paths In Overlay Networks

4 An Architecture for Selecting Disjoint Paths - Globally Scalable RON Service
    4.1 Introduction
    4.2 Relationship between Overlay Network size and path diversity it offers
    4.3 Are some overlay paths preferred more often than others?
    4.4 DG-RON Clients and Services
    4.5 Overlay Infrastructure
    4.6 Online Path Selection - Dynamic Path Monitoring
    4.7 Offline Path Selection - Landmark Based Heuristics
    4.8 Performance Evaluation
        4.8.1 Impact of Detour Set Size
        4.8.2 Evaluation of Offline Path Heuristics
        4.8.3 Comparison with SPAD
    4.9 Discussion
    4.10 Conclusion
5 Disjoint Path Selection In Overlay Networks using ToR Graphs
    5.1 Introduction
    5.2 ToR (Type-of-Relationship) Graphs
    5.3 Maximally-Disjoint Path Computation Using a Greedy approach
        5.3.1 Finding Valley-Free Edge-Disjoint Paths
        5.3.2 Finding Maximally-Disjoint Valley-Free Paths
        5.3.3 Comparison with Earliest Divergence Rule (EDR)
    5.4 Performance Evaluation
        5.4.1 Methodology used to construct ToR-graph
        5.4.2 Network layer path characteristics inferred from ToR-graph
        5.4.3 Performance-Evaluation of the Greedy-Approach
    5.5 Chapter Summary

Part III – Path Monitoring in Overlay Networks

6 Issues of Statistical Path Monitoring In Overlay Networks
    6.1 Introduction
    6.2 Algebraic Notation
    6.3 Routing matrices and Eigen Spectra of AMP and RIPE data sets
        6.3.1 Extent of rank-deficiency
    6.4 Selecting a Subset of Paths for Monitoring and Predicting the Unmonitored Paths Using Best Linear Predictor
    6.5 Routing Matrix Inconsistencies
        6.5.1 How RMI occurs?
        6.5.2 Can RMI be eliminated?
        6.5.3 Quantification of RMI
    6.6 Statistical Techniques to Mitigate the Effects of RMI
    6.7 Improvement in Path Prediction and Anomaly Detection for AMP and RIPE networks after application of Robust Statistical Techniques
    6.8 Discussion
    6.9 Conclusion
7 Conclusions And Proposals For Future Directions Of Research
    7.1 Reviewing the Goal
        7.1.1 Architecture
        7.1.2 Path Selection
        7.1.3 Path Monitoring
    7.2 Future Research Directions
        7.2.1 More accurate overlay topology ‘modeling’
        7.2.2 Accurate depiction of Internet failure models
        7.2.3 Investigation of synergy between competing overlays

Appendix
References


LIST OF FIGURES

Figure 1.1 Resilient Overlay Networks. Establishing Alternate paths via an overlay host when the path between two Internet hosts fails.
Figure 1.2 Logical Overlay topology (top) and Network Layer Overlay topology inferred from traceroutes.
Figure 2.1 Direct path between UNSW and example.com and a one-hop overlay path via CMU.
Figure 2.2 (a) (top) Possible one-hop overlay path between end-hosts when the direct Internet path suffers from outage/service degradations. (b) Overlay tunnel establishment.
Figure 2.3 (a) (top) Full-Mesh Overlay topology and corresponding network layer topology. (b) Constructing Minimum-Weight spanning tree to prune overlay topology by removing edges.
Figure 2.4 (a) (top) Probing overlay links. Each overlay host probes paths to all other overlay hosts for measurement of path-metrics such as latency, throughput and loss rates. (b) Link-State Dissemination Protocol is used to share such measurements between all overlay hosts.
Figure 2.5 Algebraic method of path monitoring (assuming path symmetry).
Figure 2.6 Earliest-Divergence Heuristic to select disjoint alternate paths.
Figure 2.7 Using Key-Based Routing (KBR) to find paths between two end-hosts [36].
Figure 2.8 ‘Drafting’ behind Akamai servers. One-hop indirection through an overlay node. The overlay node is selected based on preference of Akamai to serve content from one of its servers.
Figure 2.9 Contention for same set of underlay links. Three overlay networks decide to use the same set of underlay links to improve QoS on end-to-end paths, increasing network load (congestion) on those links and possibly causing oscillations in the quest for better paths.
Figure 2.10 (a) (top) A single link-failure invalidates several valid routes (shown by bold arrows). (b) Appending path-withdrawal messages with ‘cause-of-failure’ tags helps eliminate all invalid routes and converge to a valid route quickly.
Figure 2.11 MIRO routing example [76].
Figure 2.12 Path deflection decision made at router level can exploit the path diversity in the underlay network.
Figure 2.13 Inter-domain MPLS path construction.
Figure 2.14 Single-homing Vs Multi-homing.
Figure 3.1 Location of AMP monitors in North America [100].
Figure 3.2 Location of RIPE monitors in Europe and the rest of the world [101].
Figure 3.3 Network layer path length at IP level and AS level (AMP-146-30/Jun/2006 (top) and RIPE-40-05/Sep/2007).
Figure 3.4 Percentage of one-hop overlay paths which diverge from the direct path at or before nth AS-hop (AMP-146-30/Jun/2006).
Figure 3.5 Percentage of one-hop overlay paths which diverge from the direct path at or before nth IP-hop (AMP-146-30/Jun/2006).
Figure 3.6 CDF of the difference between the mean path delay on direct Internet path and the mean delay on the best one-hop overlay path.
Figure 3.7 Probability plots for paths to show incidence of path outages and performance failures (RIPE (top) and AMP).
Figure 4.1 Relationship between size of an overlay network and AS degree distributions. X-axis depicts ASes sorted according to their degree (descending order) normalized by total number of ASes.
Figure 4.2 Overlay hosts sorted in descending order ‘z’ (x-axis) according to percentage of failures masked, and failures masked as Cumulative function ‘F[z]’ (y-axis).
Figure 4.3 Finding Topologically diverse detours for underlay destinations.
Figure 4.4 Offline Detour Selection based on Maximum Divergence Principle.
Figure 4.5 Delay Gain Comparison between DGRON and RON with variation in detour set size.
Figure 4.6 Delay Gain Comparison between DGRON and SPAD (|T|=12).
Figure 5.1 Network layer paths between source-destination at AS level topology.
Figure 5.2 Example of valid and invalid valley-free paths in ToR-graphs [61, 118].
Figure 5.3 (Top) Example of valid valley-free path in the original ToR-graph (G). Dotted lines show concatenation of a set of C-P (forward) and P-C (backward) edges forming a valley-free s-t path. (Bottom) Relaxation using the 2 layer model consisting only of forward edges.
Figure 5.4 Optimal solution to the Edge-Disjoint Path problem in the Two-Layer ToR-graph.
Figure 5.5 Path inflation between (a) AMP and (b) RIPE hosts (AS-hops).
Figure 5.6 Number of disjoint paths between (a) AMP (top) and (b) RIPE hosts using ToR-graph.
Figure 5.7 Number of candidate paths selected by greedy-approach for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06.
Figure 5.8 Delay gain of best path selected for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06.
Figure 5.9 Correlation of path-delay characteristics between direct-path and best-alternate-path selected using Greedy Path Selection (Path Outages for AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006).
Figure 6.1 (a) (left) How overlay resilience depends on topology of the underlay network. (b) Inferring maximum information about all virtual overlay links.
Figure 6.2 Additive Network Metrics.
Figure 6.3 Algebraic method of path monitoring.
Figure 6.4 Eigen Spectra of AMP and RIPE Networks.
Figure 6.5 AS degree for RIPE and AMP networks.
Figure 6.6 Problems in estimating second order link metrics from traceroutes; link correlation matrices for AMP-30-30/Jun/2006. (a) (top) intra-AS links; (b) inter-AS links.
Figure 6.7 L1 error for RIPE and AMP networks as a function of monitored paths.
Figure 6.8 Load balancing inside an AS.
Figure 6.9 Incorrect path inference: some links are missed while other false links are added.
Figure 6.10 Frequency of path variation in AMP networks over 24 hr period.
Figure 6.11 Adjusting path inside AS11537 causes significant delay reduction on path between amp-upenn and amp-hawaii.
Figure 6.12 Load balancing inside AS11096 causes anomalous delay measurements at 6th and last hop on path between amp-fiu and amp-emory.
Figure 6.13 Dynamic Load balancing inside AS11537 for paths to amp-hawaii seems to affect some paths at different times but not others.
Figure 6.14 Comparison of performance of CO estimator for AMP networks.
Figure 6.15 Removal of Routing Matrix Inconsistencies (RMI) using the DWI and DWR Heuristic for removal of false links.
Figure 6.16 Comparison of performance of CO estimator before and after removal of RMI for AMP networks.
Figure 6.17 Computed value of c as the number of sampled paths increases for AMP50 and RIPE-40.
Figure 6.18 Comparison of the L1-error metric of BL and Robust predictor.
Figure 6.19 Comparison of performance of BL and Robust estimator for AMP networks.
Figure 6.20 Improvement in Variance of Relative Prediction Error using BL-ridge and Robust estimator for AMP networks.
Figure 6.21 Actual, BL, BL-ridge and Robust predictor delay profile for a selected (unmonitored) path in AMP-50-30/Jun/2006.


LIST OF TABLES

Table 2-1 Factors affecting resilience and performance of overlay networks.
Table 3-1 NLANR-AMP and RIPE-NCC Datasets.
Table 4-1 Path stretch incurred by selecting overlay paths based on offline path heuristics (|T|=12).
Table 4-2 Average Performance of offline path heuristics in masking failures (|T|=12). Path outages for AMP-146-30/June/2006 and Performance Failures for AMP-133-31/Aug/2006.
Table 6-1 Dimensions and rank of AMP and RIPE routing matrices.


LIST OF PUBLICATIONS

Journals

S. Qazi and T. Moors, “On the impact of Routing Matrix Inconsistencies on Statistical Path Monitoring in Overlay Networks”, submitted for 2nd round of reviews to Elsevier Computer Networks (ComNet) journal.

S. Qazi and T. Moors, “Finding Alternate Paths in the Internet: A Survey of Techniques for End-to-End Path Discovery”, submitted for 2nd round of reviews to IEEE Communications Surveys and Tutorials journal.

Conferences

S. Qazi and T. Moors, “Practical Issues of Statistical Path Monitoring in Overlay Networks with Large, Rank-Deficient Path Matrices”, In Proceedings of IEEE BROADNETS, 2008.

S. Qazi and T. Moors, “Disjoint-Path Selection in Overlays Networks using Type-of-Relationship (ToR) graphs”, In Proceedings of IEEE GLOBECOM, 2007.

S. Qazi and T. Moors, “A Robust Wide Area Routing Overlay Using Destination-Guided Detouring”, In Proceedings of IEEE ICC 2007.

J. Risson, S. Qazi, T. Moors, A. Harwood, “A Dependable Global Location Service using Rendezvous on Hierarchic Distributed Hash Tables”, In Proceedings of IEEE ICN 2006.


PART I

INTRODUCTION AND BACKGROUND


1 INTRODUCTION

1.1 Why Overlay Networks?

The Internet has expanded to a massive scale, incorporating millions of devices belonging to tens

of thousands of networks [1]. One feature that has enabled this scaling has been its use of

hierarchical routing, in which separately administered Autonomous Systems (ASes) can

independently choose their own interior routing protocol (e.g. OSPF or IGRP) and are

interconnected by a single exterior routing protocol, the Border Gateway Protocol (BGP). Whereas

interior routing protocols can choose paths based on performance metrics chosen by the

administrator, BGP neglects such performance metrics, and only considers routing policies in trying

to find a route. This design of BGP is partially a response to the difficulty of reaching consensus

across all ASes as to what performance metrics should be used and optimized, partly because

merely accounting for service provider policies is sufficiently challenging in itself, and partly

because link and device performance are dynamic, and accounting for their variations would limit

the scalability of BGP. Consequently, routes across the Internet are often not optimized for

performance. Yet many applications are sensitive to route performance. At one extreme, if a route

simply does not work, in that it fails to deliver packets, then that will clearly impinge on

applications that communicate across that route. BGP will eventually detect and recover from such

faults, but to permit it to scale, BGP does not frequently disseminate path availability information,

e.g. it may sometimes take several minutes to learn and apply path updates [2]. As a result,

applications may experience lengthy network outages. A less extreme example of sensitivity to

performance is real-time applications such as Voice over IP (VoIP) that are sensitive to the delay

with which information is transferred across the network. For these applications, the connectivity

that BGP provides may be insufficient, since they seek a certain Quality of Service (QoS) in terms

of the performance of the route.

Fortunately, even though the route offered by BGP may not work (to the level of performance

required by an application), often there exist alternate routes in the Internet that do work. The

question then is how can applications tap into the existing path diversity in the Internet which goes

unexploited by BGP? This is complicated by the fact that source applications have little control of

the route – source routing is often blocked since it poses a security threat and is also incompatible

with the Internet routing model in which ISPs set routing policies based on destination addresses [3].

One approach is to use “Resilient Overlay Networks (RONs)”, in which the source does not address


its packets directly to the destination, but initially addresses them to a third party (Figure 1.1), in the

expectation that the path between it and the third party, and then from the third party to the

destination, gives better performance than the direct path. Clearly this can be extended to multiple

intermediate parties. The question then becomes how does the source determine which

intermediate parties to send its packets through?

The first pioneering study [4] demonstrated the application of resilient overlay networks to

improve the reliability with which the Internet can meet application performance metrics. It

involved participating hosts periodically probing the performance of the underlay paths between

each other, and so identifying which alternate path provides the best performance between any two

hosts via a third host. Such a path between two overlay hosts using a third overlay host as an

intermediary is often referred to as a one-hop overlay path. Note that the direct Internet (underlay)

path between two overlay hosts is also referred to as an overlay link [5]. Also note the distinction between

an overlay link and the one-hop overlay path described earlier. A one-hop overlay path is formed by

the concatenation of two overlay links (underlay paths). Throughout this thesis, we interchangeably

use the terms overlay links, underlay paths or just paths to denote the end-to-end paths between any

two overlay hosts. The mention of an overlay path or simply an alternate path strictly means a one-

hop overlay path even when it is not mentioned explicitly for the sake of brevity.

While such path probing can ensure that the alternate path does not suffer the same degradation

that may affect the primary path chosen by Internet routing protocols, it does require participating

hosts to frequently probe performance (so that they can rapidly detect and respond to degradations),

and this ultimately limited the scalability of RONs to tens of hosts.

Figure 1.1 Resilient Overlay Networks. Establishing alternate paths via an overlay host when the path between two Internet hosts fails.

[Figure labels: the direct path between overlay hosts A and B fails; overlay links A,C and C,B form a one-hop overlay path using an intermediate overlay host C.]


To reduce path probing overheads, an alternate mechanism would be to select paths based on

their network layer disjointness. For example, two overlay links may seem disjoint when we view

the logical topology of the overlay network (Figure 1.2) but in reality may share many links in the

underlay network with other overlay links. The logical topology of the overlay network consists of

the set of all overlay hosts and end-to-end paths between them. To be able to see the extent of

underlay link sharing amongst overlay links, one would need to know the network layer (underlay)

topology of the overlay network. A snapshot of the full underlay topology is impossible to get, as

ISPs rarely make such information publicly available. More feasible is to map all routing

information of paths between overlay hosts and piece this information together, to obtain a routing

graph (routing topology). In this dissertation, any references made to network layer overlay

topology would strictly refer to an overlay routing graph, $G = (V, E)$, where the vertex set

$V = (v_1, v_2, \ldots, v_r)$ refers to IP routers and overlay hosts, and the set of links

$E = (e_1, e_2, \ldots, e_s)$, with $e = \langle v_{a,m}, v_{b,n} \rangle$, represents the set of directed underlay links used on paths between

overlay hosts as determined by some path measurement technique, such as traceroute (where $v_{a,m}$

refers to the $m$-th interface of router $a$). The overlay routing graph can sometimes also be

represented in matrix notation as a routing matrix (Chapters 2 & 6). Note that inferring the

network layer overlay topology in this manner is sometimes challenging as this requires

Figure 1.2. Logical Overlay topology (top) and Network Layer Overlay topology inferred from traceroutes.

[Figure labels: overlay hosts A, B, C and D; Traceroute A,B = [a b c d e]; overlay link A,B = underlay path [a b c d e].]


information about the underlay, which is out of the domain of control of the overlay and so may

contain inaccuracies. These issues will be described in more detail in Chapter 6. In this

dissertation, references made to just the overlay topology (e.g. Chapter 2, Section 2.2) would

pertain to the logical overlay topology while references made to the network layer overlay topology

will be made explicit through the terms underlay topology, routing graph or routing matrix.
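
The routing graph described above can be illustrated with a minimal sketch. It assumes that each traceroute is already available as an ordered list of router interface identifiers, and it ignores the measurement inaccuracies discussed in Chapter 6. The function names and the second traceroute (C to D) are hypothetical; only the path A to B = [a b c d e] is taken from Figure 1.2.

```python
from itertools import combinations

def build_routing_graph(traceroutes):
    """Build a directed routing graph G = (V, E) from traceroute hop lists.

    `traceroutes` maps an overlay link (src_host, dst_host) to the ordered
    list of router interface identifiers seen on that underlay path.
    """
    vertices = set()
    edges = set()
    links_on_path = {}
    for (src, dst), hops in traceroutes.items():
        nodes = [src, *hops, dst]
        # Consecutive hops form the directed underlay links of this path.
        path_edges = set(zip(nodes, nodes[1:]))
        vertices.update(nodes)
        edges.update(path_edges)
        links_on_path[(src, dst)] = path_edges
    return vertices, edges, links_on_path

def shared_underlay_links(links_on_path):
    """Return the underlay links shared by each pair of overlay links."""
    shared = {}
    for p, q in combinations(sorted(links_on_path), 2):
        common = links_on_path[p] & links_on_path[q]
        if common:
            shared[(p, q)] = common
    return shared

# Hypothetical input: the C -> D traceroute is invented for illustration.
traceroutes = {
    ("A", "B"): ["a", "b", "c", "d", "e"],
    ("C", "D"): ["f", "b", "c", "g"],
}
V, E, links = build_routing_graph(traceroutes)
shared = shared_underlay_links(links)
print(shared)
```

In this toy input the two overlay links look disjoint in the logical topology, yet they share the underlay link from b to c, which is the kind of hidden sharing the routing graph is meant to expose.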

The first contribution of this thesis is the implementation of a scalable RON service,

DGRON, using a distributed architecture. In classical RON [4], all N overlay hosts need to

maintain overlay links with every other RON host, thus generating $O(N^2)$ overheads, which poses

scalability issues. In DGRON, an overlay host typically needs to establish overlay links with a

small (fixed) number of overlay hosts independent of the size of the overlay network. These hosts

are chosen with special consideration to their geographical diversity in the network and their past

performance in providing good alternate paths. Thus, the path monitoring overheads for an overlay

network with N participating hosts can be reduced from $O(N^2)$ to $O(N)$. We evaluate the

tradeoffs in performance vis-à-vis topology maintenance and path monitoring overheads. Our results

using real world Internet datasets show that even with a huge reduction in path monitoring

overheads, DGRON’s performance matches closely that of classical RON in finding alternate paths;

matching performance of the best possible alternate path for a majority (90%) of path degradations

encountered.

The second contribution of this thesis is to propose heuristics with which disjoint alternate

paths can be discovered, so reducing the candidate alternate paths to be considered. This thesis

takes the approach of examining the topology of the underlying network at the AS level so as to

estimate viable alternate paths that are likely to be unaffected by a degradation in a direct path.

Because the network topology does not vary as frequently as link and device performance, this

technique enables RONs to scale to larger populations of participating hosts by lowering path

monitoring overheads. Previously proposed techniques such as the Earliest Divergence Rule (EDR)

[6] aim to select AS disjoint paths which separate earliest from the direct path. This can still yield a

large number of candidate paths from which a selection needs to be made. We propose more

elegant graph-based algorithms using ToR (Type-Of-Relationship) graphs, which shorten the

candidate path list relative to EDR by between one half and an order of magnitude in up to 60-70% of cases

while yielding alternate paths with similar delay benefits to EDR.

The third contribution of this thesis is to establish methods to detect and reduce the effects

of topology estimation errors. While path measurement only requires the services of overlay


hosts, routing matrix estimation requires the information about the underlay network, which is out

of the domain of control of the overlay and so may contain inaccuracies; e.g. routers may reveal

inaccurate or false traceroute information. We first propose a lightweight algorithm to detect false

routing information from traceroutes. We also propose heuristics aimed at perfecting statistical

path measurement techniques based on the accuracy of such routing matrix estimation. Such

techniques leverage topology information inferred from the routing matrix to select a few paths for

monitoring that can lead to path quality estimation for unmonitored paths [7-8]. However, if the

routing matrix cannot be determined accurately, these techniques can yield large path estimation

errors. Our work shows that removal or mitigation of such routing matrix inconsistencies (RMI)

using robust statistical methods alone can improve such path metric prediction by 10-20% and yield

non-negligible benefits for anomaly detection on unmonitored paths.

1.2 Dissertation Overview

The remainder of this dissertation is organized as follows. Chapter 2 presents an in-depth

overview of techniques for alternate path exploration in the Internet, including a rigorous analysis of

design criteria for Overlay Networks. Chapter 3 describes the Internet datasets used for the

trace-based simulations throughout this thesis. The next three chapters of this thesis are divided into

two separate parts addressing the issues of scalable architectures for alternate path selection

(Chapters 4 and 5) and path monitoring in Resilient Overlay Networks (Chapter 6). Finally,

Chapter 7 concludes this dissertation and outlines some future research directions.


2 LITERATURE REVIEW

2.1 Introduction

The Internet seems to work most of the time but sometimes recovery from failures is painfully

slow. For many of the user perceived performance failures/faults, e.g. delay in loading a web page

or patchy audio in a VoIP session, there exists a possibility that using an alternate path may offer

better QoS.

Often, such alternate routes remain unexploited due to the scalability objectives of the Border

Gateway Protocol (BGP), the de facto Internet inter-domain routing protocol that connects all

networks into one giant Internet. BGP is primarily designed for scalable dissemination of network

reachability information according to shortest paths compliant with the commercial traffic transit

policies of ISPs. Incorporating QoS based routing decisions in BGP route selection would defeat its

primary purpose of scalability, as QoS checks on paths need to be made more frequently and

individually than mere reachability checks on aggregate IP blocks. There are also no inter-ISP

benchmarks for acceptable levels of QoS which are defined by individual user applications that may

be sensitive in different ways to the levels of delay, throughput and packet loss. Moreover, if such

QoS based routing decisions could be incorporated into BGP it could cause route flapping; a

phenomenon in which many path updates are triggered when one of the advertised routes repeatedly

updates itself due to the distributed nature of BGP for learning global paths. This problem is bad

enough in BGP when exchanging network reachability information alone; and to prevent this

problem BGP inhibits frequent path updates; this can sometimes cause BGP to take several minutes

to learn and apply path updates [2].

Internet applications, e.g. VoIP applications, need to meet their QoS demands, so they could

benefit by tapping into the existing path diversity in the Internet for better paths which go

unexploited by BGP as explained earlier. Research focuses on several interesting solutions for

scalable end-to-end path discovery on the Internet without modifying the underlay framework;

these techniques include deployment of overlay networks [4, 9], providing redundant network

connections to end users through multi-homing to several ISPs [10] or a combination of the two

[11]. Other proposals call for changes to the underlay network routing mechanisms [12-14]. We

consider each of these proposals in Section 2.2. These proposals have already been experimentally

deployed over the Internet but it will still be some time before their use becomes widespread. We

then review proposals that call for changes to underlay routing mechanisms (Section 2.3). These

proposals are still in the early stages with no experimental deployments. Finally, we review the


benefit of multi-homing (Section 2.4) which, although it emerged as the first solution to create path

diversity in the Internet, now faces stagnation.

2.2 Exploiting Path Diversity in the Internet through Overlay Networks

A natural approach to evaluate the extent of path diversity in the Internet would be to see how

many different end-to-end paths are possible between all hosts. Figure 2.1 shows the path between

an end host in University of New South Wales (UNSW), Sydney, Australia and a host,

www.example.com, located in California, US. UNSW typically uses the services of a bigger provider

ISP such as AARNET (Australia's Academic and Research Network) for its connection

to hosts in the continental US. Most service providers, like AARNET, using the hot potato routing

principle [15], will try to kick this traffic outside itself quickly at its nearest inter-domain egress

point to send it to its US based destination. Traceroute shows that the original path uses an egress

point of AARNET at Sydney that takes the packets to www.example.com via a router in Honolulu,

Hawaii to an ingress point in Los Angeles in the US.

Overlay Networks can exploit Internet path redundancy by deflecting packets away from the

original path if it suffers from an outage. Consider the situation in which the end host in UNSW and

the host www.example.com form part of an overlay network together with another host inside

CMU (Carnegie Mellon University). Suppose that CMU is used as the intermediate relay host

because there is a fiber optic link fault on the default path via Honolulu, or because this path has become

congested due to a sudden surge in traffic. The new path uses an AARNET egress point

at Sydney as before, but takes the packets to a different ingress point inside the US:

Seattle in the northwest instead of Los Angeles in the southwest.

Under normal circumstances, the original path has a delay of 150 ms. The one-hop alternate path

has a delay of 318 ms (=234+84ms). This is expected as we span the width of the continental US

twice in going to CMU, causing large path inflation. On the other hand, if we had picked an

intermediary host situated very close to www.example.com (instead of CMU), it would have most

likely used the same path as the direct one, as it would be highly unlikely to impact the traffic

routing policy of AARNET. Thus, in choosing such a host, one must be very careful to get the

optimal compromise between achieving path diversity and reducing path inflation.


This simple example demonstrates how alternate path selection via overlay networks can help in

tapping into the Internet path diversity. Furthermore, it also makes it clear that overlay networks

help in exploiting the path diversity by changing ingress or egress points through ASes and thus

routing through other ASes disjoint from the original path. This will become clearer in the

following sections. It also highlights the importance of wisely choosing the intermediate host that

acts as a detour.

Several independent research findings [16-17] have shown evidence of path diversity in the

Internet. Savage et al. [17] showed that for almost 80% of the paths used in the Internet there is an

alternate route with a lower probability of packet loss, and for 15% of the paths, there is an

alternative that offers an improvement in latency better than 25%. Similarly, Gummadi et al. [16]

Figure 2.1 Direct path between UNSW and example.com and a one-hop overlay path via CMU.

[Figure labels: UNSW (Sydney), Honolulu (Hawaii), Los Angeles, Seattle, Pittsburgh (CMU) and www.example.com; direct path to example.com from Sydney, 150 ms; alternate path via CMU, with segments of 234 ms and 84 ms.]


showed that 54% of random path and performance failures could be masked by detouring packets to

an intended destination via an intermediate host.

Overlay networks [4] provide a systemic framework for exploiting the path redundancy in the

Internet. Overlay networks are a group of end hosts in the Internet that agree to route packets

between each other to exploit the topological redundancy in the Internet. For example, when the

direct Internet path between the source x and destination y may fail or undergo a performance

failure, it may be possible to use an alternate path by first detouring packets towards an

intermediate host z before sending them towards the destination. Such a path is called a one-hop

overlay path as described in Chapter 1. This is possible if the Internet paths between the source and

the intermediate host, and the intermediate host and the destination are not affected by the failure

due to being spatially disjoint (Figure 2.2a).

Then the aim of the overlay network is to find an intermediate (relay) overlay host $z$ ($z \neq x, y$) to

act as a relay in between source x and destination y such that the composite overlay path

$x \to z \to y$ can optimize some path metric, e.g. reduce path delay or packet loss rates, or

increase bandwidth or data throughput.
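
The relay selection just described can be sketched as follows for the delay metric. This is a minimal illustration rather than the selection procedure of any particular overlay: the function name, the host labels and the dictionary layout are assumptions. The delays of 150 ms, 234 ms and 84 ms come from the UNSW example above, but assigning 234 ms to the UNSW to CMU link and 84 ms to the CMU to example.com link is an assumption made here.

```python
def best_relay(delay, x, y, hosts):
    """Pick the relay z (z != x, y) minimizing delay over x -> z -> y.

    `delay[(u, v)]` is the measured delay of overlay link u -> v in ms.
    A missing entry means the link is down or unmeasured.
    Returns (z, one_hop_delay), or (None, direct_delay) if no relay
    beats the direct path.
    """
    direct = delay.get((x, y), float("inf"))
    best_z, best_delay = None, direct
    for z in hosts:
        if z in (x, y):
            continue
        if (x, z) not in delay or (z, y) not in delay:
            continue
        via_z = delay[(x, z)] + delay[(z, y)]
        if via_z < best_delay:
            best_z, best_delay = z, via_z
    return best_z, best_delay

hosts = ["UNSW", "CMU", "example"]
# Assumed split of the 318 ms one-hop delay between the two overlay links.
delay = {
    ("UNSW", "example"): 150,
    ("UNSW", "CMU"): 234,
    ("CMU", "example"): 84,
}

# Normal operation: the direct path (150 ms) beats the detour (318 ms).
normal = best_relay(delay, "UNSW", "example", hosts)

# Direct path outage: drop the direct link and fall back to the relay.
outage = {k: v for k, v in delay.items() if k != ("UNSW", "example")}
failover = best_relay(outage, "UNSW", "example", hosts)

# Path stretch of the detour relative to the healthy direct path.
stretch = failover[1] / delay[("UNSW", "example")]
print(normal, failover, round(stretch, 2))
```

Under these assumed inputs the detour via CMU is chosen only once the direct path is unavailable, at a path stretch of 318/150, or about 2.12, which reflects the trade-off between path diversity and path inflation noted earlier.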


Figure 2.2 (a) (top) Possible one-hop overlay path between end-hosts when the direct Internet path suffers from outage/service degradations. (b) Overlay tunnel establishment

[Figure labels: (a) source x and destination y are end-hosts at the edge of the network; the direct Internet path suffers from outage/service degradation; possible one-hop overlay paths pass through z1 or z2. (b) non-overlay source x and non-overlay destination y at the edge of the network; an alternate tunnel is established between RON hosts z1 and z2.]


Overlay networks may be used to find and use such one-hop alternate paths to route around path

failures. Several factors (Table 2-1) affect the resilience and performance of an overlay network as

described in following subsections. The degree to which such alternate paths can be spatially

disjoint from original paths between hosts is a function of the physical geometry (spatial

characteristics) of the overlay network relative to the underlay network (Internet). For example, a

one-hop overlay path via an intermediate host may be seemingly disjoint from the direct Internet

path but may share several underlay links in the underlay network. Similarly, the efficacy with

which one of several alternate paths is selected depends on the ability to monitor the metrics of all

one-hop overlay paths in the network.

Note that the architecture just described assumes that selecting alternate paths to avoid path

degradation is limited to the intra overlay paths. This poses an obvious question, “Can non overlay

based source-destination pairs benefit from such path diversity?” For non-overlay sources and/or

destinations, the alternate path computation described earlier could take the form of alternate tunnel

computation between overlay hosts closest to source/destination (Figure 2.2b). Such non-overlay

hosts intending to optimize their path selection would then have to subscribe to such a RON service

where the packet forwarding along an alternate tunnel would be handled by them.

The first decision in designing an overlay network is where to place overlay hosts. Often

hosts cannot control their location, so the next decision is which hosts to select to use for a one-hop

overlay path. After overlay construction, comes the main (and inter-twined) task of overlay link

monitoring and path selection. Sometimes the path monitoring and path selection decisions are

application centric, as different Internet applications may have different QoS needs for which

specialized packet detouring techniques need to be addressed.


Table 2-1 Factors affecting resilience and performance of overlay networks.

| Overlay Network Property | Techniques discussed in literature | Why Important? |
| --- | --- | --- |
| Overlay topology | (i) Full-Mesh (Clique) Topology [4]; (ii) Tree-based topologies [18-19]; (iii) Bottom-up Approaches [20] | Overlay Resilience, Scalability of Monitoring Paths (Section 2.2.1) |
| Monitoring overlay links | (i) Topology-Unaware approaches [4]; (ii) Topology-Aware approaches [8, 21-26] | Knowledge of path-performance to make timely decision for switching to better paths (Section 2.2.2) |
| Selecting overlay paths | (i) Disjoint Paths [6, 27]; (ii) Path-Ranking based on Performance Metrics [4, 28]; (iii) Using Path Diversity in Large CDNs [29-31]; (iv) Using paths preferred by large CDNs [32] | To select maximally-disjoint path in the overlay network with least probability to fail when failure on primary path between two hosts (Section 2.2.3) |
| Detouring Packets | (i) Active and Reactive Schemes [4, 28]; (ii) Multi-path routing schemes [9, 31] | Meet application-specific QoS demands (e.g. latency, throughput, loss-rate, multicasting such as for gaming, video conferencing) (Section 2.2.4) |


2.2.1 Overlay Topology

The topology of an overlay plays an essential role in the scalability of path monitoring and the

accuracy in predicting alternate paths. An overlay network basically starts out as a group of

participating hosts willing to route traffic for each other. A logical topology is formed based on

decisions of establishing links between some or all hosts. Such links, often described as overlay-

links, may traverse several underlay links and two overlay links may share underlay links. This

section surveys several proposals that have been made in this regard, ranging from the full-mesh

topology, i.e. establishing a link between all pairs of overlay hosts, to more scalable tree-based and

distributed approaches.

Full-Mesh (Clique) Topology

RON [4] used a full-mesh architecture, in which individual overlay hosts are connected with all

other hosts in a logical mesh. Each peer probes overlay links connecting it with all other hosts, and

the measured path characteristics are disseminated in the network through link-state flooding

(Figure 2.3(a)). This architecture is ‘ideal’ in the sense that each individual peer can find an

alternate path with high probability by knowing the current performance of all overlay links.

However, the associated overheads in such an architecture are $O(N^2)$ for N overlay-hosts, which

limits the scale of such overlay networks to 50 hosts [4].

Tree-based topologies

Alternate overlay topologies have been proposed [18-19, 23], for achieving scalable overlay link

monitoring. Monitoring overlay links between all pairs of overlay hosts is clearly inefficient when

we observe that a large number of links may actually be shared amongst overlay links due to the

power law topology of the Internet [33], which suggests that a few links are used by many paths.

Tang and Nakao [19, 24] showed that it is possible to prune the overlay topology to remove

redundant links. For example, one of two overlay links that have a large number of underlay links

in common can be removed, as can an overlay link which is unlikely to be selected by the

overlay routing algorithm. For example, several overlay links between hosts in North America and

Europe may traverse the same intercontinental fiber optic link. Monitoring only one such path could

yield bounded performance estimation on all paths, since the major portion of the path delays on all

such paths would be encountered on the intercontinental fiber optic link. Using the same argument,

Li [18] and Nakao [19] proposed that mesh topologies can be reduced to a single tree or multiple

sub-trees by pruning redundant overlay links. Overlay links are redundant when they overlap with

each other at the network layer, as outlined by our previous example.


Figure 2.3 (a) (top) Full-Mesh Overlay topology and corresponding network layer topology. (b) Constructing Minimum-Weight spanning tree to prune overlay topology by removing edges.

[Figure labels: (a) physical topology and logical overlay topology (full mesh) over hosts A to F; (b) physical topology over nodes A to J (non-overlay nodes excluded for clarity) with weighted edges, and the minimum spanning tree used to prune edges.]


Li and Mohapatra [18] used a minimum-weight spanning tree (MST) algorithm to connect all

overlay hosts which minimizes overall connection cost, i.e.

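$$\text{Minimize} \sum_{e \in E} c_e \qquad \text{(2-1)}$$

where $E = (e_1, e_2, \ldots, e_k)$ is the set of overlay links, $e = \langle v_i, v_j \rangle$ with $v_i, v_j \in V$, $V = (v_1, v_2, \ldots, v_n)$ is the set of overlay nodes, and $\sum_{e \in E} c_e$ is the sum of the weights of the edges, representing any desirable metric, e.g. latency, as shown in Figure 2.3(b).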
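
As an illustration of objective (2-1), the following minimal sketch builds a minimum-weight spanning tree over a small overlay. Kruskal's algorithm is used here only as one standard way of computing an MST; the source does not state which MST algorithm Li and Mohapatra [18] used. The function name, the host labels and the latency values are hypothetical.

```python
def minimum_spanning_tree(nodes, weighted_links):
    """Kruskal's algorithm over overlay links.

    `weighted_links` is an iterable of (cost, u, v) tuples, where cost is
    the link weight c_e (e.g. latency in ms). Returns (tree_links, total).
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Find the component representative, halving paths as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree, total = [], 0
    for cost, u, v in sorted(weighted_links):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the link joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v))
            total += cost
    return tree, total

# Hypothetical full mesh over four overlay hosts with latencies in ms.
links = [
    (6, "A", "B"), (9, "A", "C"), (19, "A", "D"),
    (3, "B", "C"), (14, "B", "D"), (7, "C", "D"),
]
tree, total = minimum_spanning_tree(["A", "B", "C", "D"], links)
print(tree, total)
```

The six links of the full mesh are pruned to three, which is the scalability gain of the tree; the three discarded links are exactly the information whose loss can reduce resilience.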

However, removing overlay edges may achieve desired scalability at the cost of resilience, as

some crucial overlay link information is lost while pruning edges. Topology-aware heuristics can

play a crucial role in the decision to remove or retain an overlay link when constructing such trees.

For example, Eriksson et al. [34] provided evidence that it is possible to cluster hosts that share

network paths which can help towards constructing sparser spanning trees. Another problem with

MST construction is its dependence on accurate link costs, which may again vary due to differing

levels of network congestion. This would require path probing on all overlay links, even if

infrequently, to update link costs for recomputing the MST.

Distributed topologies

Both mesh and tree based topologies are aimed at connecting all or the majority of overlay hosts

together. This may sometimes not be feasible for very large networks. A more scalable approach

here would be to adopt a distributed architecture like CDNs [35-36], where each overlay node has a

low degree, of the order of $\lg N$ for a network with N overlay hosts. Another architecture

is proposed by Lee et al. [37] and Rakotoarivelo et al. [38], where overlay hosts record their path

measurements to a few super-hosts in the network and the super-hosts maintain a database of

network path measurements. This database can be later queried by all hosts seeking to optimize

QoS between them and other hosts. Load balancing concerns may also warrant careful choice of

super nodes. An obvious caveat with such an architecture as proposed by [37] is its shift from the

aggressive path monitoring approach of RON to a more passive one. For example, querying a

database may waste valuable time and then there is also the issue of staleness of the path

information fetched. For example, a database having recorded a path as good may not have

registered it going bad when such path queries are made. We present a distributed architecture in

Chapter 4 where super nodes and detour sets are selected using a combination of landmark based

approach and data mining. We show that it is possible to tactically choose a small set of detouring


nodes in order to find a reasonably good QoS optimized path with high probability, reducing

$O(N^2)$ path monitoring overheads for N overlay-hosts to just $O(N)$.

Topology based on Evolutionary Approach

Early works (e.g. [4]) chose arbitrary locations/sites for overlay hosts which already gave them

remarkable performance gains. Andersen et al. [4] were already able to recover from around 60% of

Internet failures successfully. The authors of some studies (e.g. [39-40]) focused on optimizing

overlay node selection and proposed bottom-up strategies. Chun [39] considered overlay

construction as a ‘non-cooperative game’ played by selfish hosts where each tries to minimize the

number of overlay links it establishes by utilizing links established by other hosts. Slight

modifications to the rules of the game result in wide ranging overlay topologies, from complete

meshes to trees and node-degree distributions that range from exponential to power-law. Han et al.

[40] also considered a bottom-up approach, and considered the problem of picking overlay hosts for

maximum path diversity in the overlay network. They found that for minimal sharing amongst

overlay links, overlay hosts should be in diverse ISPs that have no peering relationships with each

other.

2.2.2 Monitoring Overlay Links

Dynamic overlay link monitoring is essential in order to quickly recover from a failure in the

underlay network through the use of an alternate one-hop overlay path. The literature [4, 28] suggests

that monitoring path quality is best when using dynamic-online algorithms. However, the overheads

of such techniques are large and are not scalable beyond a modest overlay size. There are a few

proposals [18-19, 23-24] to reduce such overheads using topology-aware approaches.

Topology-Unaware Approaches

The pioneering work in RON [4] connected overlay hosts in a full-mesh topology. Path quality is

monitored by probing all overlay links between hosts in the network (Figure 2.4a); and distributing

such measurements between hosts using link-state protocols (Figure 2.4b). Probing all overlay

links aggressively and subsequent link-state flooding generates a large overhead. The routing

overhead in an overlay topology with n hosts and average node degree d is [18]:

$$n \times d \times (\text{number of probing messages}) + n \times (n-1) \times (\text{number of link-state messages}) \qquad \text{(2-2)}$$
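
The routing overhead expression (2-2) can be evaluated with a small sketch to compare a full mesh with a sparser topology. The function and parameter names are hypothetical, and treating both message counts as one message per link per round is an assumption made only to keep the arithmetic simple.

```python
def routing_overhead(n, d, probing_msgs, link_state_msgs):
    """Evaluate expression (2-2): messages per monitoring round.

    n: number of overlay hosts
    d: average node degree (overlay links probed per host)
    probing_msgs: number of probing messages (assumed per link, per round)
    link_state_msgs: number of link-state messages (assumed per host pair)
    """
    return n * d * probing_msgs + n * (n - 1) * link_state_msgs

n = 50
# Full mesh: every host probes all n - 1 other hosts.
full_mesh = routing_overhead(n, d=n - 1, probing_msgs=1, link_state_msgs=1)
# Sparser topology: each host probes a fixed, assumed degree of 5.
sparse = routing_overhead(n, d=5, probing_msgs=1, link_state_msgs=1)
print(full_mesh, sparse)
```

With these assumed counts, lowering the degree shrinks only the probing term; the link-state term still grows as n(n-1) unless dissemination is also restricted.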


Anderson et al. [4] found that the probing overhead for 50 hosts (in a mesh topology) is

approximately 30 Kbits/s of outgoing bandwidth per node when the path probing interval is 12 seconds.

We will use two Internet datasets, RIPE [41] and AMP [42] (see Chapter 3 for details), to evaluate the

heuristics presented in this thesis where end-to-end path measurements (path delays) are made at

average intervals of 30 seconds and 1 minute, respectively.

Other approaches for monitoring paths include distributed approaches [37] (described earlier)

where overlay hosts report their path measurements to super hosts, which can be queried later by

other overlay hosts.

Figure 2.4 (a) (top) Probing overlay links. Each overlay host probes paths to all other overlay hosts for measurement of path-metrics such as latency, throughput and loss rates (figure label: Packet loss rate, Throughput, Latency). (b) Link-State Dissemination Protocol is used to share such measurements between all overlay hosts (figure label: Disseminate all measurements).

Topology-Aware Approaches

Several research papers [5, 8, 19, 21, 23-24, 43-44] aim to reduce path monitoring overheads in

overlay networks by leveraging network layer topology information. Several works propose graph-based

approaches to reduce the mesh topology to a tree-based overlay topology with fewer overlay

links to monitor. Tang and McKinley [23] proposed monitoring overlay links based on the

application of normal and weighted variants of the set-cover algorithm, i.e. selecting overlay links

which include as many unshared underlay links as possible. Finding the set cover is a known NP-

hard problem [23]. In this approach, they used a greedy algorithm for an approximate solution. This

leads to performance estimation for a large number of overlay links while actually monitoring a

small subset. A similar approach has been used by Madhyastha et al. in the iPlane project [45-47] to

develop a distributed path monitoring system that predicts path metrics from the components shared

between paths, clusters end hosts based on BGP atoms [48], and builds a compact library of

Internet measurements for peer-to-peer applications. Such

techniques can yield good upper-bounds on path estimation while reducing overall path-monitoring

overheads.
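The greedy idea behind the set-cover formulation can be sketched as follows. This is a minimal illustration of the unweighted greedy heuristic, not the exact algorithm of Tang and McKinley [23]; the input format, the function name and the example star topology (hosts A to D, each attached through one access link a to d) are assumptions.

```python
def greedy_link_cover(overlay_links):
    """Pick overlay links greedily until every underlay link is covered.

    overlay_links: dict mapping an overlay link name to the set of
    underlay links it traverses (hypothetical input format).
    """
    uncovered = set().union(*overlay_links.values())
    chosen = []
    while uncovered:
        # Take the overlay link covering the most still-uncovered underlay links.
        best = max(overlay_links, key=lambda name: len(overlay_links[name] & uncovered))
        chosen.append(best)
        uncovered -= overlay_links[best]
    return chosen


# Hypothetical star topology: overlay link X-Y traverses access links x and y.
links = {
    "A-B": {"a", "b"}, "A-C": {"a", "c"}, "A-D": {"a", "d"},
    "B-C": {"b", "c"}, "B-D": {"b", "d"}, "C-D": {"c", "d"},
}
monitored = greedy_link_cover(links)
print(monitored)
```

Here two of the six overlay links already touch every underlay link, which is the kind of reduction the set-cover approaches aim for.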

Chen et al. [21] developed an approach to find a set of k paths which can be used to calculate the

performance of all $n^2$ end-to-end paths between overlay hosts (overlay links) for n overlay hosts

with $k \ll n^2$, based on network tomography principles.

Expressing the problem mathematically, let the vector b denote link measurements e.g. link

delays. Then, the vector Y of path delay measurements is given by:

$$Y = Mb \qquad (2\text{-}3)$$

where

$Y \in \Re^{n_p}$, with $n_p = O(n^2)$ the number of all possible overlay links (underlay paths),

$b \in \Re^{n_e}$, with $n_e$ the total number of underlay links, and

$M \in \{0,1\}^{n_p \times n_e}$ is a binary routing matrix in which:

$$M_{i,j} = \begin{cases} 1 & \text{if overlay link } i \text{ traverses underlay link } j \\ 0 & \text{otherwise} \end{cases}$$

Figure 2.5 gives an example of a network and corresponding routing matrix and measurement

vectors assuming path symmetry. The rank of the routing matrix identifies the set of linearly

independent paths which can reveal the characteristics of all paths, so if one measures

$r = \mathrm{rank}(M)$ paths, then the path metrics of the entire network can be determined exactly.

Previous research shows that the routing matrices for large overlay networks are ‘rank deficient’, in

the sense that their rank is smaller than either dimension of the matrix, i.e. $r < \min(n_p, n_e)$.


Finally, matrix-decomposition techniques (e.g. QR, SVD) can be used to find the set of r basis

paths corresponding to the linearly independent equations in Equation 2-3. Chen et al. [21] argued that

the number of monitored paths can be expected to fall to $O(n \lg n)$ from the original $O(n^2)$ paths if $n > 100$

because of the power-law topology of the Internet.
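The selection of basis paths can be illustrated with a small sketch. It swaps in plain Gaussian elimination with exact fractions for the QR or SVD decompositions mentioned above, and it is not the implementation of [21]; the function name and the chain example are assumptions.

```python
from fractions import Fraction


def select_basis_paths(routing_matrix):
    """Return indices of a maximal set of linearly independent rows (paths)."""
    reduced = []   # (pivot_column, reduced_row) for every path kept so far
    chosen = []
    for index, row in enumerate(routing_matrix):
        vec = [Fraction(x) for x in row]
        # Eliminate the components already explained by the kept paths.
        for pivot, kept in reduced:
            if vec[pivot] != 0:
                factor = vec[pivot] / kept[pivot]
                vec = [v - factor * k for v, k in zip(vec, kept)]
        pivot = next((c for c, v in enumerate(vec) if v != 0), None)
        if pivot is not None:          # something new remains: keep this path
            reduced.append((pivot, vec))
            chosen.append(index)
    return chosen


# Routing matrix of Figure 2.5: all three paths are independent.
M_fig = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
# Hypothetical chain A-B-C: path A-C is the sum of paths A-B and B-C.
M_chain = [[1, 0], [0, 1], [1, 1]]
print(select_basis_paths(M_fig), select_basis_paths(M_chain))
```

In the chain example only two of the three paths need to be measured, since the delay of the third follows from the other two.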

Chua et al. [8] provided evidence that the r paths carry disproportionate amounts of

information, so that a small subset of the r paths can be used to statistically predict path metrics of all

remaining unmonitored paths to predefined tolerance levels. Similarly, Song et al. [26] also

reported substantial gains when using Bayesian estimation. Naidu et al. [49] claimed that since the

main aim of overlay path monitoring is anomaly detection, further reduction is possible over the set

of paths necessitated by Chen et al. [21]. They showed that up to 50% path reduction was possible

by formulating an LP problem for selecting paths based on the knowledge of joint probability

distribution of link delays.

Coates et al. [22] studied the problem of path reduction in further detail and found that the

number of paths required by [8] could be reduced by a further order of magnitude if certain signal compression

techniques, e.g. diffusion wavelets, were applied to incorporate both temporal and spatial path

correlation.

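Figure 2.5 Algebraic method of path monitoring (assuming path symmetry). The example shows hosts A, B and C, underlay links l1, l2, l3 with link delays β1, β2, β3, and path measurements y1, y2, y3 related by Y = Mb, where Y = (y1, y2, y3), b = (β1, β2, β3) and M has rows (1 0 1), (1 1 0), (0 1 1) over columns l1, l2, l3.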

One shortcoming of many of the above approaches is that while routing matrices, link and path

characteristics may be easy to accurately obtain for some individual large ISPs and overlay test

beds used in their case studies, they are not very easy to obtain for overlay networks with hosts

deployed across different ISPs [50]. As we mentioned earlier, while path measurements require

coordination between overlay hosts only, topology estimation requires

participation by non-overlay elements, e.g. routers. As a consequence, topology estimation is

often inaccurate or incomplete. In Chapter 6, we review the impact of incorrect topology estimation

on such techniques [8, 21] using evidence from real-world Internet datasets, and propose

ways to identify and alleviate such errors.

2.2.3 Selecting Overlay Paths

As discussed in the previous section, monitoring overlay links can help in alternate path

selection. In the worst case, path decisions may need to be made with stale or no link

performance information. Here we highlight a few key ideas used for end-to-end path selection

using overlay-based techniques.

Disjoint Overlay Paths

Several researchers [6, 27] have argued that since Internet paths are often stable on time-scales

of days [51], maintaining complete topology information of the overlay network allows one to

select the most disjoint alternate path without the need for path monitoring. This approach

may work for path outages but may not always be effective in meeting strict application-specific

metrics such as delay and throughput. For example, path delays may not always be a simple

function of fiber delays but a combination of fiber delays, congestion on individual links and packet

queuing delays in routers. This makes path monitoring to meet application-specific QoS demands

more difficult than merely ensuring spatial diversity. Nevertheless, the bulk of new

research is centered on improving design heuristics to choose disjoint overlay paths, which is a key

factor in reducing the overheads and improving resilience at the same time. However, such

disjointness needs to be established at the network layer; two overlay links that are

seemingly disjoint at the overlay layer could still share a link in the underlying IP layer. A failure of

the shared IP link then renders both overlay links useless.

A previous study [6] showed that an Earliest Divergence Rule (EDR) (Figure 2.6) can work well

by selecting the alternate path which diverges at the earliest point from the default-path near the

source. This technique assumes availability of AS level paths (from source overlay hosts to

detouring overlay hosts). In Chapter 6, we show that traceroute and other tools used for mapping

paths can reveal path information inaccurately [50, 52-57]. A second assumption of this

technique is that the one-hop overlay paths that diverge earliest will also be the ones that converge

latest with the direct paths. In Chapter 5, we present a more flexible Maximum Divergence Rule to

pick an alternate path most divergent from both the source and the destination part of the original

path using an AS Type-of-Relationship (ToR) graph that can be built with partial AS path

information. Chapter 5 reveals that such an approach can reduce the number of candidate paths

compared to using EDR [6].
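The Earliest Divergence Rule can be sketched in a few lines, assuming that AS-level paths from the source to each candidate detour host are available as lists. The function names, relay names and AS labels below are hypothetical and serve only to illustrate the rule described in [6].

```python
def divergence_index(default_path, alternate_path):
    """Number of leading AS hops the two paths share before they diverge."""
    shared = 0
    for a, b in zip(default_path, alternate_path):
        if a != b:
            break
        shared += 1
    return shared


def earliest_divergence(default_path, candidates):
    """Return the relay whose AS path leaves the default path soonest.

    candidates: dict mapping a relay (detour host) name to the AS-level
    path from the source to that relay (hypothetical input format).
    """
    return min(candidates, key=lambda relay: divergence_index(default_path, candidates[relay]))


default = ["AS_A", "AS_B", "AS_C", "AS_D", "AS_E"]
candidates = {
    "relay1": ["AS_A", "AS_B", "AS_X"],
    "relay2": ["AS_A", "AS_P", "AS_Q", "AS_R"],
    "relay3": ["AS_A", "AS_B", "AS_C", "AS_Y"],
}
print(earliest_divergence(default, candidates))
```

The sketch looks only at the source side of the path, which is exactly the limitation of EDR noted above: it says nothing about where the detour converges with the direct path again.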

New directions in research focus on making the overlay ‘topology aware’. One study [58]

proposed utilizing routing-underlays to give better information about the underlying IP topology of

the overlay network, so that only a subset of the overlay hosts (with orthogonal IP links) would be

probed and considered for disjoint path selection.

Interestingly, instead of using dynamic online algorithms to monitor overlay paths, offline

processing of path measurements can reveal spatial relationships (disjointness) between paths. Cui

et al. [59] proposed a method which establishes performance-related correlations among the

behavior of overlay links, e.g. link-latency. Such correlations can then be used to find a backup-

path for a given primary-path between two overlay-hosts with least correlated-failure probability by

solving the following optimization problem:

$$\text{Minimize} \sum_{(i,j) \in E_0} \sum_{(m,n) \in E_0} x_{ij}\, y_{mn} \Pr(L_{ij}, L_{mn}) \qquad (2\text{-}4)$$

where:

$E_0$ is the set of all overlay links,

$x_{ij}$ and $y_{mn}$ are flows on the primary and backup paths, respectively, which are set to 1 if overlay links $L_{ij}$ and $L_{mn}$ are used by the primary and backup paths respectively, else 0, and

$\Pr(L_{ij}, L_{mn})$ is the joint failure probability of overlay links on the primary and backup paths.

Figure 2.6 Earliest-Divergence Heuristic to select disjoint alternate paths (figure labels: AS A, AS B, AS C, AS D, AS E; AS P, AS Q, AS R; Default Internet Path; Alternate Path via an overlay host whose path diverges earliest from direct path; End-host A, End-host B, End-host C).

The above minimization problem can be coupled with other constraints such as delay bounds on

the backup path. Such optimization problems are NP-hard, so they become intractable for a large number of variables and

constraints and are suitable only for small networks. Moreover, the technique requires synchronization of

participating hosts which may be somewhat difficult to achieve in large networks. A similar idea

with a slightly different objective has been pursued by Antonova et al. [44], with the aim of finding

the optimal way to split a video stream over multiple paths with bounded delay requirements.
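For a handful of candidate backup paths, the objective of Equation 2-4 can simply be evaluated exhaustively. The sketch below does this; it is not the optimization procedure of Cui et al. [59], and the data layout, function names and joint failure probabilities are all hypothetical.

```python
def correlated_failure_cost(primary, backup, joint_fail):
    """Objective of Equation 2-4 for one primary/backup pair.

    primary, backup: lists of overlay links, each link a (host, host) tuple.
    joint_fail: dict mapping a frozenset of two links to their joint
    failure probability (hypothetical input; missing pairs count as 0).
    """
    return sum(
        joint_fail.get(frozenset((p_link, b_link)), 0.0)
        for p_link in primary
        for b_link in backup
    )


def pick_backup(primary, candidates, joint_fail):
    """Exhaustively choose the candidate backup path with the lowest cost."""
    return min(candidates, key=lambda path: correlated_failure_cost(primary, path, joint_fail))


primary = [("A", "B")]
via_c = [("A", "C"), ("C", "B")]
via_d = [("A", "D"), ("D", "B")]
joint_fail = {
    frozenset({("A", "B"), ("A", "C")}): 0.05,
    frozenset({("A", "B"), ("C", "B")}): 0.02,
    frozenset({("A", "B"), ("A", "D")}): 0.01,
}
print(pick_backup(primary, [via_c, via_d], joint_fail))
```

Exhaustive evaluation is only workable for a few candidates, which is consistent with the remark above that such formulations suit small networks.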

Path-Ranking based on Performance Metrics

A large amount of research discusses choices of an appropriate performance metric such as

latency, throughput and loss rates for selecting backup paths in the overlay. Paths are ranked on the

basis of these metrics using scoring functions; these range from weighted-moving averages over

finite temporal windows to statistical approaches [4, 28, 59]. RON [4] distinguished between

different paths on the basis of latency, throughput and loss rates, making the choice of ranking paths

application-specific. Similarly, Kawahara et al. [60] and Uchida et al. [61] proposed selection of

alternate paths by ranking the overlay nodes in order of frequency with which they provide an

optimal path by acting as a relay node.
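A windowed weighted moving average is one of the simpler scoring functions mentioned above. The following sketch ranks paths by such an average of latency samples; the window size, the linear weights, the class and function names and the sample values are assumptions, and this is not the scoring function of RON [4].

```python
from collections import deque


class PathScore:
    """Weighted moving average of latency samples over a finite window."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # oldest samples drop out automatically

    def add(self, latency_ms):
        self.samples.append(latency_ms)

    def score(self):
        if not self.samples:
            return float("inf")  # an unmeasured path ranks last
        # Linearly increasing weights: the newest sample counts the most.
        weights = range(1, len(self.samples) + 1)
        total = sum(w * s for w, s in zip(weights, self.samples))
        return total / sum(weights)


def rank_paths(scores):
    """Return path names ordered from lowest to highest average latency."""
    return sorted(scores, key=lambda name: scores[name].score())


scores = {"direct": PathScore(), "via_C": PathScore()}
for sample in (100, 120, 200):
    scores["direct"].add(sample)
for sample in (130, 130, 130):
    scores["via_C"].add(sample)
print(rank_paths(scores))
```

Because recent samples weigh more, the direct path in this example is ranked below the detour once its latency starts to rise.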

Zhu [28] used available-bandwidth for alternate path selection claiming latency, loss rates and

throughput metrics could be ‘misleading’ as they often depend on the protocol implementations,

network heterogeneity or temporal effects. Zhu argued that throughput is a function of TCP

parameters and thresholds set for detection of allowable loss rate and latency could be misleading

because of the dynamism and heterogeneity experienced by the network. Similarly, Lee et al. [37]

measured capacity of overlay paths and selected paths based on available bandwidth criteria.

Hu and Steenkiste [62] showed that, in contrast to delay estimation on end-to-end paths,

bandwidth is often bounded by the (bandwidth of) bottleneck links. Identifying such

bottleneck links is often easy, as they usually lie within three to four IP hops of the

end hosts, because links in the core of the Internet tend to be over-provisioned. Measuring the

performance of only the bottleneck links, combined with certain rules used in Internet path decisions

such as shortest, valley-free paths (Chapter 5) [27, 63], can reduce the $O(N^2)$ overheads for an N-host

overlay network to linear overheads of $O(N)$.


Using Path Diversity in Large CDNs

CDNs [35-36] were motivated by the desire for scalable content distribution using cooperative

hosts. Such overlays often use logical topologies based on Distributed Hash Tables

(DHTs) [36, 64-65]. Every participating node and content (files) stored on the network is identified

by a unique identifier (key) in the DHT identifier space. Each peer also maintains small distributed

localized routing tables having entries for a small number of neighboring hosts (also identified by

unique identifiers). Routing involves forwarding a search for a key (query) to a neighboring peer

whose identifier is closer to the key value than that of the present node (Figure 2.7). Alternate paths between two edge hosts can

thus be found in such CDNs via intermediate peer(s) in a similar fashion to RON, once the direct

Internet path experiences an outage. However, one issue warrants attention: the DHT-based identifier

mapping does not ensure that two neighboring hosts are also close in the underlying physical

network. A landmark-based approach is proposed in Brocade [30] to counter this problem, using a

small number of super-hosts and short-cuts between distant routing domains to ensure that overlay

routing does not incur large path stretch. New design proposals [29, 64] try both to lower

the number of overlay hops and to optimize path metrics such as latency and throughput.

Figure 2.7 Using Key-Based Routing (KBR) to find paths between two end-hosts [36] (figure labels: Direct Internet Path; Alternate Path through Structured-Overlay).

Using paths preferred by large CDNs to serve content

Studies, e.g. [32], showed that it is possible for small overlay providers to use network

observations from large CDNs, e.g. Akamai [7, 66]. The study in [32] shows that a single-hop indirection through

an overlay node close to the ‘preferred’ Akamai server to serve content can be effective in

establishing an end-to-end path between hosts with desirable end-to-end path performance (Figure

2.8). Large CDNs already optimize the path selection problem and this can be leveraged by

overlays. The same study found that in some instances up to 200

CDN mirror sites were used to serve content over a 48-hour period, and that CDN content

was sometimes served by an outside mirror even when an Akamai server was close to the source, owing to

time-of-day effects [32]. However, two major issues remain: (i) ensuring that an adequate level of service

from the CDN is available near all overlay hosts; (ii) large CDNs may use techniques to hide

locality information about the servers and served-content to prevent exploitation.

2.2.4 Detouring Packets

While the previous section addressed generic alternate path selection problems, path selection

decisions could be driven by more application-specific objectives. Different flows in the

Internet may have different application-specific QoS demands [67-69]. A packet of a real-time application

such as VoIP can tolerate some loss but not delay, and requires a different packet-detouring

strategy than a packet in an FTP session, which can tolerate delay. Similarly, applications using

different transport mechanisms (UDP or TCP) may require application-specific packet-detouring

strategies. The following are the two main schemes that we identified in the published literature.


Reactive and Proactive Schemes

There have been two popular schemes to detour packets on alternate paths. Primary Internet paths

and alternate (overlay) paths between end hosts may be aggressively monitored for performance

metrics. Reactive schemes [4] use an alternate path only when the primary Internet path fails to

deliver the required QoS. Proactive schemes tend to be ‘selfish’ and may opt for the best path using

a greedy approach. While the proactive scheme optimizes path selection for some flows, Zhu [28]

showed that it may cause: (i) oscillations in the network due to frequent path swapping hurting non-

overlay traffic; (ii) use of longer paths often for minor performance gains and thus, increasing the

traffic-load on the network. This suggests that the proactive scheme, while intuitively desirable,

appears to be detrimental to global network welfare.

Figure 2.8 ‘Drafting’ behind Akamai servers. One-hop indirection through an overlay node. The overlay node is selected based on Akamai's preference to serve content from one of its servers (figure labels: Direct Internet Path suffers from outage/service degradation; One-hop indirection using an overlay node near a server preferred by Akamai to serve content; Servers preferred by Akamai; Servers NOT preferred by Akamai; Overlay hosts).

Multi-path Routing Schemes

Research [4] indicates that alternate paths between end hosts may fail independently of each

other, since routing domains which are independently administered rarely share underlay links.

Some studies [9, 31] investigated the reduction in path probing overheads possible by sending

redundant packets along multiple overlay paths. Assuming the probability of packet loss on one

such path to be $p_i$, the probability that a packet will be lost if sent on N redundant paths is:

$$P_{\text{redundant}} = \prod_{i=1}^{N} p_i \qquad (2\text{-}5)$$
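Equation 2-5 holds when losses on the redundant paths are independent. The short sketch below evaluates it and contrasts it with a two-path case in which losses are correlated, using the chain rule of probability; the function names and all numerical values are hypothetical and are not measurements from [9] or [31].

```python
import math


def redundant_loss_probability(loss_probs):
    """Equation 2-5: probability that the copies on all paths are lost,
    assuming losses on different paths are independent."""
    return math.prod(loss_probs)


def correlated_loss_probability(p_first, p_second_given_first_lost):
    """Two-path case with correlated losses (chain rule of probability)."""
    return p_first * p_second_given_first_lost


# Two paths, each losing 10% of packets independently.
independent = redundant_loss_probability([0.1, 0.1])
# Same first path, but the second copy is lost 40% of the time once the first is lost.
correlated = correlated_loss_probability(0.1, 0.4)
print(independent, correlated)
```

With these illustrative numbers, correlation makes the loss of both copies roughly four times more likely than the independent model predicts.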

To further reduce the probability of packet loss, advanced encoding schemes e.g. Forward Error

Correction (FEC) schemes may be used to detect and correct errors, and hence tolerate packet loss.

While Zhao [31] reported positive results from using constrained multicast to maintain end-to-end

paths in the face of failures, Anderson et al. [9] concluded that such schemes prove useful only

when links are suffering from low levels of congestion. Another alarming finding of the

same study is that failures on alternate paths in an overlay network are often more

correlated than previously imagined; a packet loss on one path reduces the conditional

probability of success of the redundant packet on an alternate path to about 60 percent. Even

packet-encoding schemes such as FEC lose their effectiveness when path-failures are correlated.

Moreover, a large number of packets sent on the network unnecessarily consume network

resources, increase network load and rob other non-overlay/overlay based flows of their true share.

This technique requires critical information about the underlying IP-level structure of the overlay

topology in order to achieve optimum benefits.

2.2.5 (In-)Feasibility of Selfish-Routing on Overlay-Networks

There are several commercial concerns regarding widespread use of overlay networks: ISPs do

not want users to participate (as overlay hosts) due to concerns that overlay networks may impact

the underlay routing policies such as Traffic Engineering [20], hurt non-overlay based traffic due to

greedy utilization of network resources, or introduce oscillations in the Internet due to interaction of

several overlay networks whose traffic rapidly switches paths based on performance benefits [70].

One study [20] observed that selfish-routing using overlays can harm traffic-engineering goals.

Overlays choose paths which are longer than direct Internet paths and may prefer certain links more

than others. This increases network load and increases congestion on some links as investigated by

[20].


Debates [39, 70] on the coexistence of multiple overlays with each other and with (non-overlay)

Internet traffic have raised doubts about the effectiveness of overlays in the long term. It

is well understood now that overlay routing networks can provide required performance benefits

leveraging upon the inherent path redundancy in the Internet. However, they actually transfer the

traffic from one subset of paths to another. Keralapura et al. [70] claimed that multiple overlays

performing the same function using their own greedy and selfish routing metrics in selection of

overlay paths could introduce race conditions leading to unwanted routing oscillations (Figure 2.9).

They found that the probability of two overlay networks becoming synchronized increases if the

multiple interacting overlays are aggressive i.e. have short path probing intervals or path outage

detection times close to each other. This can happen if the overlay hosts of multiple overlay

networks are situated close to each other leading to similar path round trip times used for probe

timeouts, an indicator of path failure. The more dissimilar the overlay networks are in terms of

locality of hosts and path probing parameters, the smaller the probability of routing oscillations

[70].

Figure 2.9 Contention for the same set of underlay links. Three overlay networks decide to use the same set of underlay links to improve QoS on end-to-end paths, increasing network load (congestion) on those links and possibly causing oscillations in the quest for better paths.

2.2.6 Open Research-issues with Overlay-Networks

Most major research related to the study of overlay-network behavior revolves around simulations

using Internet-like topology generators [33, 71-72] or a few overlay test beds [4, 73]. A majority of

these topology generators use the hierarchical power-law model [72]. However, some works [74-75]

provided substantial evidence that such static power-law models may not capture the Internet

topology accurately enough, because Internet evolution is a dynamic process shaped by several

interconnected variables; the results derived from such models could therefore be inaccurate and

misleading. For example, Chang et al. [74] showed that the Internet topology arises from a

multi-parameter optimization problem that incorporates AS geography, AS-specific business models and

AS evolution history. Similarly, Jaiswal et al. [75] dispelled the notion that ASes ranked higher in the

tier structure always have higher connectivity than those in the lower tiers. This thesis uses Internet datasets to

avoid problems arising from artificially simulated topologies or from testbeds that are too small.

2.3 Proposals To Modify Underlay Routing Mechanisms

2.3.1 Re-Engineering BGP-4

Overlay networks aim to overcome the shortcomings of BGP, leveraging the native path

redundancy present in the Internet. Some studies [76-82] argue that instead of turning to new

avenues for solving problems associated with the shortcomings of the Internet in handling failures

efficiently, BGP-4 could be modified to meet the requirements. Some concerns [2] about delayed

BGP routing-convergence after failures mainly stem from: (i) complicated path exploration through

several paths which already may have been invalidated by a single failure (Figure 2.10a); (ii)

suppression of new route updates [12, 83] to prevent routing oscillations, or “route flapping”. The

authors of a few papers [81, 84] suggest that path-withdrawal or other route-update messages

should be tagged with the cause of failure (Figure 2.10b), to simplify path exploration by

invalidating all defunct routes. Similarly, Bremler-Barr et al. [77] proposed that in the event of

failure, path-withdrawal messages can be expedited through the whole network to rid it of

unreachable routes and speed up convergence.


Figure 2.10 (a) (top) A single link-failure invalidates several valid routes (shown by bold arrows). (b) Appending ‘cause-of-failure’ tags to path-withdrawal messages helps eliminate all invalid routes and converge to a valid route quickly (figure labels: BGP speakers; Source; Destination; On detecting failure try alternate paths one by one; Routes invalidated by failure (dashed); Route Withdrawal messages, appended with cause-of-failure tags).

Subramanian et al. [12] proposed a Hybrid Link-state Path-vector (HLP) protocol, which makes

several architectural design changes to BGP to counter its churn issues. HLP uses a hierarchical

approach instead of the flat architecture of BGP; the network is divided into several domains and

sub-domains; each sub-domain uses a link-state protocol, which has much better convergence

properties than path-vector protocols. The sub-domains then use a path-vector protocol to

disseminate the routing information amongst themselves. HLP also specifies a routing granularity

based on the AS level rather than the IP-prefix level used by BGP. The paper shows that by adopting

their architecture, BGP churn could be reduced by a factor of 400.

The previous proposals may reduce BGP churn, but they still leave open the debate on alternate

path discovery through explicit mechanisms. Kushman et al. [85] specifically tackle this problem

and propose an architecture in which alternate disjoint failover routes are also announced by BGP,

ensuring quick failover (if possible) and guaranteed BGP convergence without any routing

loops. They provide detailed insight into this problem and explain which failover routes are

appropriate to announce and where in the AS hierarchy they should be announced.

Similarly, Quoitin et al. [86] propose that several of the BGP inter-domain path selection

parameters could actually be used for traffic engineering purposes, e.g. forced selection of one of

several alternate paths. This could be achieved by selectively advertising destinations on different

paths based on IP prefixes, artificially inflating cost on one of the paths (AS path-prepending) to

discourage its selection or advertising preference for a path to a neighboring AS explicitly through

the MED (multi-exit discriminator) attribute. Similarly, the Local-preference attribute that BGP uses to assign

fixed weights to paths through inter-domain links of dissimilar bandwidth could be made more

sensitive to dynamic performance through active path measurements. Another technique for an AS

to exploit inter-domain path diversity is to tweak its own Interior Gateway Protocol (IGP), which is

used to select the inter-domain path that leads to the least internal (intra-domain) cost. This may

end up constantly selecting one of several egress points towards other ASes. More granular IGP

weight tuning could exploit path diversity by choosing other paths.

While individual works have addressed single problems using individual solutions, Multi-Path

Inter-domain Routing (MIRO) [78] addressed these issues together, proposing several architectural

modifications to BGP. The architecture shows how ASes can advertise multiple

routes for destination prefixes through on-demand path announcements (pull-based route retrieval).

Pull-based route retrieval consists of two main steps: (i) a route-negotiation step, in which an

interested BGP speaker floods a route request and the queried hosts may return such paths

through selective export policies so that other hosts stay oblivious to this information exchange; and

(ii) routing-tunnel establishment where hosts flood information amongst themselves for any

successfully negotiated route (Figure 2.11). This technique ensures that all such negotiated paths

meet BGP policy constraints through selective export policies. Not only does the architecture meet

all design objectives but it also proposes an evolutionary design-approach; offering attractive

incentives to network-administrators adopting MIRO while at the same time making it possible for

native-BGP users to co-exist.


Yang [14] proposed a New Internet Routing Architecture (NIRA) in which users have the

flexibility of choosing inter-domain routes by using a new IP addressing scheme that includes

intra-domain and inter-domain sub-addressing. However, it leaves open the question of the

revenue model ISPs will need to adopt to benefit when users have the power to choose inter-domain

routes.

Figure 2.11 MIRO routing example [78]

2.3.2 Enhancing network level packet forwarding decisions to exploit path diversity

The authors of one study [3] proposed that instead of BGP (and ISPs) deciding the complete

inter-domain and intra-domain sections of the paths, packet forwarding decisions made at the router

level could be augmented to allow choosing one of multiple potential next-hop candidates, to

provide more ‘choice’ for exploiting the path diversity (Figure 2.12). Path deflection is possible

while forwarding packets at routers by selecting one of the candidate choices. Moreover, the study shows

that such deflections are possible while selecting shorter loop-free paths without violating ISP rules.

Routers only need to consider a few simple deflection rules while forwarding packets. Similarly,

Motiwala et al. [87] proposed path splicing, where the main underlying idea is that instead of

deciding upon packet deflection hop wise, a more scalable approach would be to do it at the

granularity of path segments and allow traffic to switch paths at intermediate hops. Such (alternate)

path segments are often known but not used, e.g. BGP records multiple paths between two points

but selects only one based on routing policies. For other protocols, e.g. OSPF, IGRP etc, which

recompute new paths after a failure, multiple paths could be recorded by running multiple instances

of the routing protocol after altering network parameters used for path computation, e.g. by slightly

perturbing link costs. Both of these techniques require packets to carry a shim header (in

between the network and transport headers) to inform path deflection decisions, which

potentially incurs non-negligible packet processing overhead. This question is left open

by these studies [3, 87], and hence the scalability of such techniques needs to be investigated.

Also, such studies have so far only investigated the feasibility of exploiting path diversity in a few

large ISPs, e.g. Sprint and Abilene, where their results for path diversity might be exaggerated. The

practical benefits and deployment issues over the wide-area Internet remain a challenge when we

consider that, owing to the power-law structure, there is a large degree of link sharing amongst paths

[8, 21, 33, 72], suggesting that there may not be as many path deflection choices as the studies

indicate.

Figure 2.12 Path deflection decision made at router level can exploit the path diversity in the underlay network (figure labels: ISP A, ISP B, ISP C, ISP D, src, dst).

2.3.3 Fast Re-Route (FRR) construction to reduce failover times

The previous section dealt with exploiting path diversity by adding flexibility to routers in

forwarding packets, e.g. by adding randomization in selecting a next-hop neighbor to forward the

packet to. This may help in exploring alternate paths, but it does not address whether those

alternate paths would be disjoint from the native route and thus effectively bypass the failed element

(link/router). This issue can be addressed by knowing the topological diversity of the paths and

pre-computing all possible alternate paths that bypass the failed elements. This technique is

known as FRR (Fast Re-Route) construction [88]. It aims at quick recovery from

faults through pre-computed failover paths.

Shand and Bryant [88] highlight several key challenges in FRR construction for purely IP

networks. The first is how to choose failover paths that can be used by the router that first

detects the fault, without consulting its neighbors or waiting for the protocol (e.g. IGP) to

converge on new paths that reflect the topology change caused by the fault. The second is the

computational complexity of computing such paths without overloading routers. The question is

then how to achieve an optimal tradeoff between the two.

Such FRR techniques can be implemented at both intra-domain level (IP-FRR for IGP) as well as

inter-domain level (MPLS-FRR) (Francois and Bonaventure [89]).

IP-FRR for IGP

Link state protocols (e.g. OSPF/IS-IS) used as IGPs (Interior Gateway Protocols) converge much

faster than BGP, a path vector protocol, owing to the small scale of the network. Recovery times

below 200 ms are not uncommon [89]. Such small delays often go unnoticed even by VoIP

customers demanding quick failover times. Interestingly, the majority of these few hundred

milliseconds is not spent on detecting the failure, flooding new routing information

(updates) and recomputing routing information, but on loading the revised forwarding tables into the

router’s Forwarding Information Base (FIB) [88]. Having pre-computed alternate path information

that avoids failed components can therefore help in quick recovery.

Failover paths inside a domain are considered so that individual routers can try alternate

paths instead of waiting to send/receive routing updates to/from neighboring routers. For example,

routers could identify Shared Risk Link Groups (SRLGs), i.e. sets of links that fail together owing

to a physical commonality between them, e.g. being adjacent to the same router. Various proposals have

been made for selecting such paths which include: Equal Cost Multi-Paths (ECMP), loop-free

alternate paths or multi-hop repair paths [88]. ECMPs are paths that do not traverse the failure

while loop-free alternate paths are established through a direct neighbor of a router adjacent to the

failure. Multi-hop paths are more complex to compute. Such paths often cannot be

computed/decided wholly by one router alone; for example, they can be specified using a loose-hops

approach, or by multiple routers using their repair FIBs and employing label based mechanisms for path

discovery (label based path switching is described in more detail in the next MPLS-FRR section).

Often the majority of destinations are reachable using the first two basic path selection

techniques with multi-hop path construction methods required for the remaining [88].
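As an illustration of the loop-free alternate idea, the following minimal Python sketch applies the commonly used loop-free condition: a neighbor N of router S is a usable alternate towards destination D if cost(N, D) < cost(N, S) + cost(S, D), i.e. the shortest path from N to D does not lead back through S. The topology, link costs and function names are hypothetical and are not taken from [88].

```python
import heapq

def shortest_costs(graph, source):
    """Dijkstra: cost of the shortest path from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def loop_free_alternates(graph, s, dest):
    """Return (primary next hop, list of loop-free alternate neighbors) of s towards dest."""
    dist = {n: shortest_costs(graph, n) for n in graph}
    primary = min(graph[s], key=lambda n: graph[s][n] + dist[n][dest])
    alternates = [
        n for n in graph[s]
        if n != primary and dist[n][dest] < dist[n][s] + dist[s][dest]
    ]
    return primary, alternates

# Hypothetical symmetric topology: node -> {neighbor: link cost}.
GRAPH = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 2},
    "C": {"S": 1},            # C can only reach D back through S
    "D": {"A": 1, "B": 2},
}

print(loop_free_alternates(GRAPH, "S", "D"))
```

In this toy topology, B qualifies as an alternate for S because its own shortest path to D avoids S, whereas C does not, since its only route to D returns through S.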

In fact, it is not just fast recovery that can be obtained; traffic engineering information can

also be gleaned and paths selected accordingly to meet QoS requirements or load balancing on the

links. For example, some IGP protocols often build up a Traffic Engineering Database. This

database is typically used to optimize utilization of links inside the domain and minimize the cost of

inter–domain traffic intended for an outside destination traversing its network. However, optimizing

these intra-domain parameters may lead to a sub-optimal inter-domain path; e.g. handing off

packets on an inter-domain segment which is experiencing congestion. Even if the primary intra-

domain path satisfies the QoS requirements for its share of the inter-domain paths it does not

guarantee that its chosen failover path would too due to the constraints of other external domains

contributing to the inter-domain path. Pre-computing such failover paths and apprising neighboring

domains can yield quick and optimal failover.

MPLS-FRR

MPLS (Multi-protocol Label Switching) is another emerging solution

to inter-domain traffic engineering for appropriate path selection using IGP FRR. Instead of

switching (routing) packets at network layer based on the inspection of destination addresses,

routes are negotiated at the outset according to the demands of the application. Once

such a path has been found, the negotiated path segments and all packets belonging to the

application are assigned specific labels, and routing takes place on the basis of these labels.

Although this proposal is not new and is similar in concept to previous solutions like ATM

[90], current efforts are now more dedicated towards improving its scalability and extending MPLS

solutions to an inter-domain level.

Figure 2.13 Inter-domain MPLS path construction. [Diagram labels: src, dst, head end nodes, PCE, PCC, TED, LSRs. PCC = Path Computation Client; PCE = Path Computation Element; TED = Traffic Engineering Database; LSR = Label Switching Router.]
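The label-based forwarding described above can be sketched as follows. This is a minimal illustration only: the router names, label values and table layout are hypothetical, and real MPLS label distribution and forwarding are considerably more involved. Each router looks up only the incoming label, swaps it for the outgoing label and hands the packet to the next hop, without inspecting the destination address.

```python
# Hypothetical label forwarding tables:
# router -> {incoming label: (outgoing label, next hop)}
LFIB = {
    "ingress": {None: (17, "lsr1")},   # head end node assigns the first label
    "lsr1": {17: (42, "lsr2")},        # swap 17 -> 42
    "lsr2": {42: (None, "egress")},    # last LSR removes the label
}

def forward(lfib, router, label=None):
    """Follow a label switched path; return the (router, label on arrival) sequence."""
    hops = [(router, label)]
    while router in lfib:
        label, router = lfib[router][label]
        hops.append((router, label))
    return hops

for hop in forward(LFIB, "ingress"):
    print(hop)
```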

The proposed technique [91] uses an infrastructure-based approach to exploiting path diversity in

accordance with user specified path performance demands (Figure 2.13). A separate entity known

as a Path Computation Element (PCE) [91-92] handles this task. The head end node, also called a

Path Computation Client (PCC), submits a request for a primary (and possibly a back-up) Label Switched

Path (LSP) satisfying the user specified path constraints to the PCE. The PCE responds with the

criteria the LSRs (Label Switching Routers) should apply to search for the paths. Searching for

paths is somewhat similar to tweaking protocol parameters such as IGP weight tuning (as explained

in the previous section) for exploitation of path diversity inside a domain. Note that not all

implementations of IGP/IS-IS may provide for such tuning, and a PCE may help in such

circumstances. The primary novelty of MPLS-TE is in these three areas: (a) extending these

concepts to an inter-domain level; (b) its approach to consider more dynamic path properties than

just exploiting path diversity and (c) computation of back-up LSPs when primary LSPs fail.

To cater for the extension to inter-domain LSP computation, it incorporates a special crankback

mechanism [91-92]. Put simply, each domain (AS) is responsible for computing a segment of the

LSP that passes through it, using the services of a PCE, without revealing its internal structure

or routing policies. Large domains may have more than one PCE. When one of the Next Hop

(NH) domains (ASes) is unable to find such a path, it may return a failure message to the

adjacent predecessor domain (AS). This message will then be conveyed to the PCE (of this

predecessor domain) which will re-compute path selection criteria so as to exploit different egress

point/s to different NH domain/s (AS). To select a path conforming to the QoS requirements

of the LSP request, the PCE uses the TED (traffic engineering database) maintained by IGP/IS-IS

protocols with TE extensions. The PCE may also return primary and backup LSPs for failover if

requested.
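The crankback behaviour described above amounts to a backtracking search over domains. The sketch below is a simplified illustration under stated assumptions: the AS names, the `next_hops` preference lists and the boolean `can_carry` map (standing in for each domain's private PCE decision) are all hypothetical. A domain that cannot provide a segment returns a failure to its predecessor, which then tries a different egress towards another next-hop domain.

```python
def compute_lsp(next_hops, can_carry, domain, dst, visited=None):
    """Return the list of domains from `domain` to `dst`, or None as a crankback signal."""
    visited = (visited or set()) | {domain}
    if not can_carry[domain]:
        return None                    # this domain's PCE finds no segment: crank back
    if domain == dst:
        return [domain]
    for nh in next_hops[domain]:       # egress choices in order of preference
        if nh in visited:
            continue
        tail = compute_lsp(next_hops, can_carry, nh, dst, visited)
        if tail is not None:
            return [domain] + tail
        # nh reported failure: the predecessor falls through to its next egress
    return None

# Hypothetical AS-level adjacency and per-domain outcome of the local computation.
NEXT_HOPS = {"AS1": ["AS2", "AS3"], "AS2": ["AS4"], "AS3": ["AS4"], "AS4": []}
CAN_CARRY = {"AS1": True, "AS2": False, "AS3": True, "AS4": True}

print(compute_lsp(NEXT_HOPS, CAN_CARRY, "AS1", "AS4"))
```

Note that each domain only reports success or failure, which mirrors the point that no domain needs to reveal its internal structure or routing policies.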

2.3.4 Open Research-issues with proposals to modify underlay routing mechanisms

Proposals to modify underlay routing mechanisms seem attractive at the outset, however, they

pose some challenges. For example, are path deflection decisions as proposed by [3, 87] able to


scale well enough at individual packet levels? Other core issues relate to the feasibility of

implementation of the proposed changes to routers to support path deflection decisions. Also, these

studies address the exploitation of the path diversity of the Internet, but a core problem is still

monitoring path quality, which has hampered the deployment of large overlay networks due to

scalability concerns. Another area of practical concern is that redesigning underlay routing

mechanisms such as those suggested by [3, 87] including changes to BGP [77, 79-81] exposes

underlay routing to several security vulnerabilities [3]. At present, end systems do not exercise any

control over the paths their packets take, which are determined solely by the network routers.

Equipping end systems with the power to influence paths may allow the network to be compromised by

an adversary or cause breach of commercial traffic transit policies between ISPs causing conflicts

over revenue.

The primary motivation of the MPLS-TE solutions is not only to exploit inter-domain path diversity

but also to find paths that fulfill specific QoS requirements. It is based on the premise that

neighboring domains can establish trust for finding such QoS optimized paths. Since each

individual domain does not have to reveal its internal structure, this trust will be weak

unless there is some monetary incentive attached for it to do so. Another related issue is that if the

primary LSP fails, each domain may have its own priorities for computing restoration paths that may

not be acceptable to other participating domains [92].

2.4 Multi-Homing Solutions

Multi-homing refers to solutions which allow hosts at the edge or transit providers in the core of

the Internet to maintain redundant connections to the Internet which can be exploited for the

purposes of fault tolerance, traffic engineering or optimizing QoS. Thus, multi-homing can

be categorized into two types: site multi-homing and ISP multi-homing. Figure 2.14 shows an

example of site multi-homing. End-host A which is multi-homed via three distinct ISPs stands a

higher chance of reachability in the event of failures on one of the access links, compared with end-

host B which is single-homed. Site multi-homing is more challenging than ISP multi-homing due

to the scalability issues arising from the huge number of Internet hosts when compared with transit

providers. Another challenging issue is to be able to switch paths of longer packet flows so that path

changeover remains transparent to the flow without resetting the connection, i.e. to maintain

transport-layer survivability.


Site domain multi-homing can take one of several forms. Host (stub) domains may announce

single/multiple connections to single/multiple ISPs over single/multiple IP addresses [93].

Previously, the approach towards multi-homing was more liberal. Stub domains could acquire

special Provider Independent (PI) addresses from the Regional Internet Registry (RIR). PI

addresses are globally unique IP addresses which are not assigned by transit providers from their

own address blocks. For example, if a stub domain multi-homed to two provider networks is

assigned a PI address, then it can advertise this to both of its transit providers, which will propagate

it to their own upstream providers, where it will reach other parts of the Internet for the dual

connectivity of the host domain.

Using PI addresses was a simple approach to multi-homing. However, this led to scalability

issues together with the problem of depleting IP address space in IPv4. Presently, stub domains are

only allowed to use Provider Aggregatable (PA) addresses. Stub domains thus consider one of their

immediate provider networks to be their primary ISP and the remaining ones as secondary. The PA address

obtained from the primary ISP is then advertised to the secondary ISPs. However, using PA addresses becomes less useful

since, due to scalability issues, BGP routers do not accept destination prefixes more specific than /24.

This means although the secondary ISPs would advertise the PA address of the multi-homed site

separately in addition to its own (as it cannot be merged with its own aggregate), the address block

advertised by the primary ISP would be a stronger match for the destination since the Internet uses

longest prefix matching when routing to destinations. Thus the primary ISP will be used to connect

to the stub network for inbound packets until there is something wrong with its connection to the

primary ISP when the secondary ISPs will be used to connect to the stub domain. Thus, the

redundant paths cannot be used simultaneously to meet Traffic Engineering (TE) objectives or to

achieve quick failover as dictated by the stub domain as this traditional approach to multi-homing

will again depend on BGP reaction time to provide a failover path. Also, note that even using PI

addresses introduces one additional routing entry per multi-homed host. Huston [94] and Bu et al.

[95] note that the number of BGP routing entries in the Internet increased by an order of magnitude

between 1995 and 2005.

Figure 2.14 Single-homing Vs Multi-homing. [Diagram labels: ISP A, ISP B, ISP C; Core (Tier-1 ISPs); end-hosts A and B.]
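Since the discussion above relies on longest prefix matching, a minimal Python sketch of the lookup rule may help. The prefixes and next-hop labels are hypothetical (private address space is used purely for illustration): among all routing entries that contain the destination address, the entry with the longest prefix is selected.

```python
import ipaddress

def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing `address`, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(net, next_hop) for net, next_hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical routing entries: an aggregate block and a more specific block inside it.
TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), "via aggregate"),
    (ipaddress.ip_network("10.1.2.0/24"), "via more-specific"),
]

print(longest_prefix_match(TABLE, "10.1.2.7"))
print(longest_prefix_match(TABLE, "10.1.9.9"))
```

Every separately announced prefix of this kind occupies its own entry in the table, which is the routing table growth referred to above.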

Many new proposals have been considered by the research community for multi-homing in IPv6

as surveyed by De Launois and Bagnulo [96], learning from the mistakes and shortcomings of

multi-homing approaches in IPv4, namely to provide fault tolerance, traffic engineering, route

aggregation and multi-homing independence. These include: middle box tunneling approaches

through use of NAT or MHTP (Multi-homing Translation Protocol) boxes which convert PA

addresses to PI addresses and newer transport protocols like SCTP, TCP-MH and DCCP [97] that

enable using multiple IP addresses associated with multi-homed hosts to ensure transport layer

survivability.

2.4.1 Open Research-issues with Multi-homing

While multi-homing can improve availability at the edge of the Internet, overlay networks can

also improve availability within the core as well as improving the performance of end-to-end paths.

Effective multi-homing only requires that the customer network be reachable through two or more

topologically diverse ISPs so that it can connect to the outside ‘world’ with reasonable assurance.

Akella et al. [10] and Tao et al. [98] considered performance using key path metrics, delay (RTT),

loss-rate and throughput when edge hosts are multi-homed via multiple providers and also have

a choice of overlay paths when the direct path undergoes degradation. The results from such studies

may be somewhat biased as they report the results from the ISPs which gave the best results across all

destinations considered. Akella et al. [10] reported that the performance-advantage is 20-40% for

delay and 15-25% for throughput, when the edge host is multi-homed via three providers;

increasing the number of providers beyond three results in marginal benefits. The same study [10]

however, also concluded that multi-homing has only limited benefits compared to when end-hosts

have a choice of overlay paths between them. This is because end-to-end path diversity in the core

of the Internet can be leveraged effectively through use of overlay networks.

Another paper [99] reported similar results when considering the number of shared routers and

underlay links on alternate paths provided by multi-homing solutions, but interestingly also shows

that overlay paths may not offer as much path diversity as previously thought. It reveals that even if

the edge ASes, where overlay links most likely merge, were removed from consideration, there are

still many overlay links which share physical routers and links with other overlay links. Randomly

selecting overlay hosts for disjoint backup paths has little probability of success.

Multi-homing provides physical redundancy while working within the BGP framework.

However, multi-homed hosts announce their multiple routes within the BGP framework through

announcements of routes using different upstream-provider ISPs. Multi-homing has been blamed as


one of the leading factors for the exponential increase in the size of BGP routing tables since 1999

[95, 100]. Multi-homing creates ‘holes’ in the routing table [95] because certain IP sub-blocks

already contained within the prefix set of one provider of a multi-homed AS are

announced again by one of the multi-homed AS’s providers for the purpose of fault tolerance.

2.5 Chapter Summary

In this chapter we provided a rigorous literature review discussing the three main approaches for

providing QoS to end users; namely overlay network approaches, proposals to modify the underlay

routing mechanism and multi-homing. Although the main aim of all three is identification of a path

anomaly and switching over to better alternate paths, their implementation methods differ.

Multi-homing has limited benefits and proposals to modify underlay routing mechanisms are still in

their infancy, requiring the efforts of the broader community. This leaves overlay networks as the

most promising way to tap into the path diversity of the Internet. This thesis also looks into two core

issues, namely the selection of disjoint paths and reducing path monitoring overheads by exploiting

overlay topology information and overcoming challenges posed when such information is not

available or is inaccurate.


3 DESCRIPTION OF INTERNET DATASETS USED IN THIS DISSERTATION

3.1 Datasets considered and methodology for obtaining the datasets

The main focus of this dissertation is to present scalable heuristics for monitoring and

selecting alternate overlay paths when the direct underlay path fails. To analyze the performance of

these heuristics, we only require the requisite end to end path metric and topology information.

Fortunately records of such information are publicly available from several experimental overlay

networks already deployed throughout Europe and North America. Throughout the remainder of

this thesis (Chapters 4-6) we analyze the performance of overlay networks using real Internet

datasets, so it is important that the methodology of obtaining these datasets is explicitly described

before proceeding any further. Our datasets include two experimental networks. The first is a US

based project, Active Measurement Project (AMP) [42], managed by National Laboratory for

Applied Network Research (NLANR) and the second, a European project, managed by RIPE-NCC

(Réseaux IP Européens Network Coordination Centre) [41]. Starting July 2006, CAIDA [101]

took over operational stewardship of all NLANR machines and data. Our choice for these two

datasets is driven by two main reasons. Both of these datasets provide (a) end-to-end measurements

at small intervals (order of 30 sec to a minute), e.g. path delays; (b) network layer path information

using traceroutes.

Another popular overlay network dataset, PlanetLab’s All Pair Ping project [73], only provides

regular end-to-end measurements; traceroutes are only conducted if an end to end measurement

registers a path fault. Also, All Pair Ping’s end-to-end path measurements are made at 15 minute

intervals (2005), which makes it infeasible to make accurate path selection using this dataset alone.

This is because both path outages and performance failures occur on much smaller time scales; a

path outage may be defined as an extended period of disconnectivity lasting a few minutes in the

Internet between two hosts due to a major event like a link failure (e.g. fiber cut) while a

performance failure may be defined as a minor transient failure (e.g. due to router queues being

congested) leading to a degradation in latency, throughput or loss rate by a factor of two or three [4].

Research shows that, following an outage, BGP may take up to 15 minutes [2]

before converging to alternate paths, whereas the AMP dataset shows most path delay degradations last less

than a minute.

NLANR’s Active Measurement Project (AMP) performs active measurements between hosts

connected by high performance IPv4 networks. 150 AMP monitors take site-to-site measurements.

AMP monitors are mainly deployed throughout the United States (Figure 3.1). Some monitors are


however located outside the US in Taiwan, Switzerland, Chile and Korea. The hosts considered are

connected inside two virtual mesh-topologies. One is the AMP-HPC (High Performance

Connection) Network comprising AMP-hosts located in US academic institutions and the second is

the AMP-International Network comprising hosts external to the US. These datasets provide one

round-trip time (RTT) delay measurement for each pair of hosts per minute, and IP traceroute

information obtained around once every ten minutes. AMP avoids probing outside its own network.

An IPv6 version of AMP performs traceroutes between eleven sites. The datasets used in this

dissertation are from 30th June 2006 and 31st August 2006, when this work was undertaken,

reporting the data for 146 and 133 AMP hosts respectively. The datasets for an available 24-hr

snapshot can be obtained as compressed .gz files, with delay and traceroutes between pairs of AMP

hosts (Table 3.1).

RIPE-NCC’s Test Traffic Measurement (TTM) measures key parameters of the connectivity

between a given site and other test boxes. Like NLANR AMP, the RIPE NCC TTM system

performs probing only inside its own network. It also provides routing vectors both at the AS level

and the IP level from traceroutes, but does not report hop wise delays. In addition to the routing

vector information, the TTM system also records, among others, one-way delay, packet loss and

bandwidth. One-way measurement is possible because each box in the system has a GPS receiver, which keeps the clocks synchronized. Measurements have been made

approximately twice a minute, starting October 2002. RIPE monitors are mainly deployed

throughout Europe, with a few in the United States and Asia (Figure 3.2). These datasets however,

are not available as individualized 24 hr snapshots as with AMP but are available according to user

supplied queries for a particular pair of RIPE hosts and a date/time tuple. Hence to obtain the delay

and traceroute data in bulk we implemented automated “GET http://” queries using shell scripts.

We downloaded a 24 hr snapshot (5th September 2007) for 40 selected RIPE hosts (mostly from

Europe).

Figure 3.1 Location of AMP monitors in North America [102].

Both datasets suffer from some missing data, e.g. a probe being lost for an RTT measurement

or a one-way delay measurement in the AMP and RIPE networks, respectively. Both datasets register

missing data with specific flags and timestamps. Similarly, missing traceroute hops are marked with

asterisks (*). We filter the data to remove the impact of such missing path delay data (Section 3.3)

and traceroutes (Chapter 6) by neglecting such paths.

In this dissertation, we select all or a subset of the AMP and RIPE monitors to behave as virtual

RONs; subsets are selected especially where we need to compare results across similar sized RIPE

and AMP networks. Such subset selection is random without any preference for some hosts unless

mentioned otherwise. We denote such virtual RONs as AMP-SIZE-dd/mmm/yyyy or

RIPE-SIZE-dd/mmm/yyyy, where SIZE specifies the size of the RON and is followed by the date of the dataset.
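The subset selection and naming convention can be illustrated with a short Python sketch. The function name, the host identifiers and the optional seed are hypothetical; the sketch only assumes uniform random sampling without replacement, consistent with the statement that no host is preferred.

```python
import random
from datetime import date

# Month abbreviations are spelled out to avoid locale-dependent formatting.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def virtual_ron(network, hosts, size, snapshot, seed=None):
    """Pick `size` hosts uniformly at random and build the NETWORK-SIZE-dd/mmm/yyyy name."""
    rng = random.Random(seed)
    members = sorted(rng.sample(hosts, size))
    name = f"{network}-{size}-{snapshot.day:02d}/{MONTHS[snapshot.month - 1]}/{snapshot.year}"
    return name, members

# Hypothetical host identifiers standing in for the 146 AMP monitors.
amp_hosts = [f"amp-host-{i:03d}" for i in range(146)]
name, members = virtual_ron("AMP", amp_hosts, 40, date(2006, 6, 30), seed=1)
print(name, len(members))
```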

Table 3-1 NLANR-AMP and RIPE-NCC Datasets.

Dataset      No of Hosts   Date
NLANR-AMP    146           30-Jun-06
NLANR-AMP    133           31-Aug-06
RIPE-NCC     40            5-Sep-07

Figure 3.2 Location of RIPE monitors in Europe and the rest of the world [103].

3.2 Network Layer Characteristics of Overlay Paths Vs Direct Paths

In this Section we consider the characteristics for overlay paths vs direct paths as seen from the

datasets used in this dissertation. We present the results here for AMP networks behaving as virtual

RONs. We look at the network layer properties of direct Internet paths and all possible one-hop

overlay paths. Figure 3.3 shows that most of the AMP host-pairs have paths which traverse four

Autonomous Systems or more. The corresponding length of the path in the underlying IP network

is between 10 and 20 hops at an average of two to three IP hops per AS. RIPE gives similar results.

Note that the RIPE datasets record routing vectors more frequently than AMP, and we have recorded AS

and IP level path lengths for all such paths. This is the reason that the number of paths exceeds the

actual number of source-destination pairs.

Figure 3.3 Network layer path length at IP level and AS level (AMP-146-30/Jun/2006 (top) and RIPE-40-05/Sep/2007 (bottom)). [Plots: path length (hops) against source-destination pairs (top) and paths (bottom); series: AS path-length, IP path-length.]

Figures 3.4 and 3.5 depict the distribution of one-hop alternate paths between AMP host-pairs via

a third AMP host which diverge from the direct path at the nth hop at AS and IP granularity

respectively. A majority of the alternate paths diverge at the fourth or fifth IP hop (Figure 3.5) or

second AS hop (Figure 3.4). This reveals non-negligible path sharing between direct and one-hop

overlay paths. Similar results are obtained for AMP-133-31/Aug/2006 (not shown). We neglect

RIPE data here because the dataset contains missing routing vectors between RIPE host pairs as

bulk downloading of complete datasets is not possible.
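The divergence analysis above can be sketched in a few lines of Python. This is an illustrative sketch only: the hop identifiers are hypothetical, and a path is assumed to be the ordered list of IP addresses (or AS numbers) taken from a traceroute, with the overlay path formed by concatenating the two legs via the third host.

```python
def divergence_hop(direct, overlay):
    """1-based index of the first hop where the two paths differ, or None if they never do."""
    for n, (a, b) in enumerate(zip(direct, overlay), start=1):
        if a != b:
            return n
    return None

def fraction_diverging_by(direct, overlays, n):
    """Fraction of overlay paths that diverge from the direct path at or before hop n."""
    hops = [divergence_hop(direct, o) for o in overlays]
    return sum(1 for h in hops if h is not None and h <= n) / len(overlays)

# Hypothetical hop lists for one source-destination pair and three one-hop overlay paths.
direct = ["r1", "r2", "r3", "r4"]
overlays = [
    ["r1", "r2", "r5", "r6"],   # diverges at hop 3
    ["r1", "r7", "r8", "r9"],   # diverges at hop 2
    ["r1", "r2", "r3", "r9"],   # diverges at hop 4
]

print([divergence_hop(direct, o) for o in overlays])
print(fraction_diverging_by(direct, overlays, 3))
```

A late divergence hop means the overlay path shares its first links with the direct path, so a failure on those links affects both.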

Figure 3.4 Percentage of one-hop overlay paths which diverge from the direct path at or before nth AS-hop (AMP-146-30/Jun/2006). [Plot: % overlay paths against source-destination pairs; curves for n=1 to n=4.]

Figure 3.5 Percentage of one-hop overlay paths which diverge from the direct path at or before nth IP-hop (AMP-146-30/Jun/2006). [Plot: % alternate paths against source-destination pairs; curves for n=1 to n=9 and n=10-20.]

Figure 3.6 shows the delay benefit of using an alternate one-hop overlay path even if the direct

path has not degraded in performance. For 80% of the paths there is a (one-hop) alternate path

providing a lower value of mean delay than the mean delay on the direct path in both RIPE and

AMP networks. For AMP a majority of these alternate paths can provide up to 75 ms lower mean

delay than the mean delay on the direct path. For RIPE, a majority of these alternate paths can

provide up to 150ms lower mean delay than the mean delay on the direct path. The disparity in

these figures is due to the fact that most of the AMP hosts are connected by high speed links on the

US academic network (AMP-HPC).
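The quantity plotted in Figure 3.6 can be computed per host pair as sketched below. The sketch assumes, as a simplification, that the mean delay of a one-hop overlay path is the sum of the mean delays of its two legs; the host names and delay values are hypothetical.

```python
def best_one_hop_gain(mean_delay, src, dst):
    """Mean delay on the direct path minus mean delay on the best one-hop overlay path (ms)."""
    direct = mean_delay[src][dst]
    relays = [k for k in mean_delay if k not in (src, dst)]
    best_overlay = min(mean_delay[src][k] + mean_delay[k][dst] for k in relays)
    return direct - best_overlay

# Hypothetical mean delays in ms between four hosts.
MEAN_DELAY = {
    "A": {"B": 100.0, "C": 30.0, "D": 80.0},
    "B": {"A": 100.0, "C": 40.0, "D": 90.0},
    "C": {"A": 30.0, "B": 40.0, "D": 60.0},
    "D": {"A": 80.0, "B": 90.0, "C": 60.0},
}

print(best_one_hop_gain(MEAN_DELAY, "A", "B"))
```

A positive value means that some relay offers a lower mean delay than the direct path; a negative value means the direct path is better than every one-hop alternative.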

Figure 3.6 CDF of the difference between the mean path delay on direct Internet path and the mean delay on the best one-hop overlay path. [Plot: fraction of paths against delay (ms); series: AMP-30/Jun/2006, AMP-31/Aug/2006, RIPE-05/Sep/2007.]

3.3 When is the Direct Internet path degraded?

The direct path between hosts in the Internet is usually chosen to minimize the number of hops

(both AS and IP), which also often leads to minimizing delay. Hence, using a one-hop overlay path

will usually increase delay, and so only makes sense if the current delay on the direct path is much

more than the expected delay on the overlay paths. However, in some instances the Internet path

itself may be inflated as shown by a previous study [104]. In such cases, a one-hop overlay path

may be likely to provide a lower delay path even when the direct Internet path is not actually degraded.

However, using one-hop overlay paths in such a manner whenever available can lead to

oscillations/instability as explained in Chapter 2. Hence, we might want to add some hysteresis to

reduce the switching frequency as we explain later.

We use the same definition of a path anomaly as used by [6]. We define an anomaly as occurring

when the path metric (delay) exceeds its average value by more than k times the standard deviation (σ) of

the delay values in the previous 60 epochs (one hour for AMP and 30 minutes for RIPE):

\[ \mathrm{PathDelay} > \mathrm{PathDelay}_{\mathrm{average}} + k\sigma \tag{3-1} \]

where k = 1, 2, 3, ... is a tunable parameter to trigger an anomaly for small to large delay variations

with increasing values of k, respectively. These values for k and the one-hour window in determining

a path anomaly are typical of those used by Chua et al. [8] and Fei et al. [6]. Chua et al. [8] worked

with the Abilene network; the authors collected their network path delay measurements using

NLANR AMP project measurements since a subset of AMP hosts are from the Abilene network.

Similarly, Fei et al. [6] worked with the RIPE dataset. Note that the criterion for flagging a path anomaly

on direct paths does not affect the relative goodness or badness of one-hop overlay paths that will

be chosen to improve performance. Fei et al. [6] conjectured, “…which paths are good alternates

to avoid delay degradations is relatively insensitive to the exact definition of delay degradation”. In

the remainder of this thesis, we refer to the particular degradations considered as kσ degradations based

on the value of k used. We only select anomalies for which the immediately preceding 60-epoch

window does not contain any missing data. We select k = 3 to emulate performance failures and

k = 10 to emulate path outages.
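The anomaly criterion of (3-1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: missing measurements are represented as None, the population standard deviation is used (the text does not specify which estimator is applied), and the function name and sample delay series are hypothetical.

```python
from statistics import mean, pstdev

def anomalies(delays, k, window=60):
    """Indices t where delays[t] > mean + k * std of the previous `window` epochs.

    Epochs whose preceding window (or own sample) has missing data are skipped.
    """
    flagged = []
    for t in range(window, len(delays)):
        history = delays[t - window:t]
        if delays[t] is None or any(d is None for d in history):
            continue
        if delays[t] > mean(history) + k * pstdev(history):
            flagged.append(t)
    return flagged

# Hypothetical delay series (ms): 60 quiet epochs, then a mild and a larger excursion.
delays = [10.0, 12.0] * 30 + [11.5, 16.0]

print(anomalies(delays, k=3))    # flags the epoch with the 16 ms sample
print(anomalies(delays, k=10))   # the same sample is too small for the outage threshold
```

With k = 3 the sample at index 61 exceeds the threshold of roughly 14 ms, while with k = 10 the threshold of roughly 21 ms is not reached, which mirrors the distinction between performance failures and path outages.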


Figure 3.7 shows probability plots for some paths on AMP and RIPE networks with thresholds

for performance failures and path outages. (The averages and standard deviation are computed over

the entire path delay profile). The probability of a performance failure is approximately 1-3% while

the probability of a path outage is less than 0.5%.

Figure 3.7 Probability plots for paths to show incidence of path outages and performance failures (RIPE (top) and AMP (bottom)). [Plots: probability against delay (ms).]

PART II

SCALABLE HEURISTICS FOR SELECTING DISJOINT PATHS IN OVERLAY NETWORKS


4 AN ARCHITECTURE FOR SELECTING DISJOINT PATHS - GLOBALLY SCALABLE RON SERVICE

4.1 Introduction

In this Chapter, we first provide evidence of path diversity in the Internet at both the IP and AS

level but show that fully edge (or node) disjoint paths are often not possible between end hosts even

using overlay networks. This makes it necessary to choose wisely amongst the available partially

disjoint paths. We then proceed to describe an architecture for a best-effort RON service,

Destination Guided RON (DG-RON), which simplifies the path exploration problem by finding

topologically diverse detours, using small candidate detour sets. We also present three offline

heuristics which complement each other under different spatial distributions of failures in finding

available paths via DG-RON with a high probability. We show that landmark based heuristics can

work well for power-law networks like the Internet for finding topologically diverse alternate paths.

Our analysis using real Internet datasets, shows that it is possible to find alternate paths with a high

probability while incurring low measurement and maintenance overheads.

Before we proceed any further, we give a brief overview of this Chapter. The initial sections

describe some findings which lead to the motivation for developing a scalable architecture of DG-

RON. In Section 4.2, we look at the relationship between overlay network size and the path

diversity it offers. Section 4.3 discusses if some overlay hosts are better than others to mask Internet

path failures. Sections 4.4-4.6 describe the architecture of DG-RON based on these observations. In

Section 4.7 we present scalable landmark based heuristics in selecting an overlay host based on

disjointness criteria. In Section 4.8 we evaluate the performance of the proposed architecture using

trace based simulations using real Internet datasets. Section 4.9 discusses

the findings from this study. Section 4.10 concludes the chapter.

4.2 Relationship between Overlay Network size and path diversity it offers

The Internet topology evolves as a power-law network [72, 105]. In power-law networks, the

outdegree $d_v$ of a node $v$ is proportional to the rank of the node, $r_v$, to the power of a constant $R$,

i.e. $d_v \propto r_v^{R}$ [105], where $r_v$ is the index of a node in a sequence when nodes are sorted in

decreasing outdegree sequence (ties in sorting are broken arbitrarily) and a typical value for $R$ is

$-0.8$ [105]. This means that there is a very small minority of well connected nodes which have a

huge outdegree while the majority of the nodes have a very small outdegree. This power law

topology phenomenon is visible in the AS level topology of the Internet; there are a few tier-1 ASes

which alone constitute the majority of the inter-AS links in the Internet [105]. Customer networks

are unit degree ASes (i.e. only connected to their immediate ISPs if not multi-homed) typically

located at the outward fringes of the network with sparse connectivity. We next see the impact of

selecting a small subset of Internet hosts for tapping into this path diversity as opposed to the

billions of hosts possible. Figure 4.1 shows the AS degree distribution of a large number (3828) of

ASes from [106] and the degree distribution of ASes sighted on overlay paths (using traceroutes) in

average sized overlay networks consisting of a few tens to hundreds of AMP hosts. Notice that

when even as few as 20 overlay hosts are selected to comprise an overlay network, the overlay

paths already pass through the largest tier-1 AT&T network (AS 7018 with a degree of 2351). This

shows that even small overlay networks can offer a substantial amount of path diversity provided

the overlay hosts are in diverse ISPs to enable as much connectivity to the tier-1 & 2 networks to

expose them to the AS level path redundancy in the Internet. Physically the ASes comprising the

overlay network contribute to a topology that resembles a micro model of the Internet with a

densely connected core and sparse connectivity at the edges. However, due to the power-law model

of the Internet only a few tier-1 ASes with high connectivity are present; a majority of the customer

networks are stub networks with degree of just one, i.e. only connected to their immediate ISPs

which in turn rely on the large tier-1 and tier-2 ASes for connectivity to different parts (IP blocks)

of the Internet. Clearly, as the number of hosts comprising the overlay network increases, the network layer topology of the overlay network tends towards the crude Internet model depicted in Figure 4.1. From AMP-20 to the crude Internet model, the percentage of ASes with high degree grows smaller and smaller, a reduction of two orders of magnitude in ASes with degree greater than 1000. This has the effect of stretching the graph towards the left. Because the AMP dataset contains the larger number of hosts, we present the results for AMP here; we expect RIPE to produce similar results.
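The rank-degree relation above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `powerlaw_degrees` is hypothetical, the proportionality constant is fixed by pinning the top-ranked node to degree 2351 (the AT&T degree quoted above), and degrees are floored at 1 to mimic stub ASes. It does not reproduce the measured distribution of Figure 4.1.

```python
# Illustrative rank-degree power law: d_v proportional to r_v ** R, with R = -0.8.
# Assumption: the constant of proportionality is chosen so that rank 1 has
# degree `top_degree`; real AS degree data will deviate from this ideal curve.

def powerlaw_degrees(num_nodes, top_degree, exponent=-0.8):
    """Return synthetic out-degrees for ranks 1..num_nodes (minimum degree 1)."""
    return [max(1, round(top_degree * rank ** exponent))
            for rank in range(1, num_nodes + 1)]

degrees = powerlaw_degrees(num_nodes=3828, top_degree=2351)
high = sum(1 for d in degrees if d > 1000)   # very well connected nodes
low = sum(1 for d in degrees if d <= 10)     # sparsely connected nodes
print(f"degree > 1000: {high} nodes; degree <= 10: {low} of {len(degrees)} nodes")
```

Under these assumptions only a couple of nodes exceed degree 1000 while most nodes have a degree of 10 or less, which matches the qualitative picture of a small, densely connected core and a sparse edge.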


4.3 Are some overlay paths preferred more often than others?

One previous study [61] has shown that some overlay paths are preferred more often than others.

In their particular case, the overlay network considered was in Japan, with overlay hosts attached to geographically separated ISPs. They found that only 25% of overlay hosts were preferred more often than others, alleviating around 90% of the total failures. Similarly, Kawahara et al. [60] develop an approach for reducing the number of transit overlay hosts based on their frequency of selection. This approach can help in selecting the optimum overlay path that provides the maximum performance benefit in a cost effective and scalable manner.

[Figure 4.1: plot on logarithmic axes of AS degree (y-axis, 1 to 10000) against ASes sorted according to degree, normalized (x-axis, 0.0001 to 1); series: AMP-20-30/Jun/2006, AMP-40-31/Aug/2006, AMP-146-30/Jun/2006 and Crude Internet Model-3828 ASes.]

Figure 4.1 Relationship between size of an overlay network and AS degree distributions. X-axis depicts ASes sorted according to their degree (descending order), normalized by total number of ASes.

We performed the same analysis on our North American and European datasets to see if this trend continued for other geographically diverse overlay networks. Let the source node be denoted by $\nu_i$ and the destination node be denoted by $\nu_j$ ($i, j = 0, 1, 2, \ldots, N$; $i \neq j$), where $N$ is the total number of hosts in the overlay network connected in a mesh topology. Let us define the intermediate overlay hosts (i.e. detours) from $\nu_i$ to $\nu_j$ through $\nu_z$ ($z = 0, 1, 2, 3, \ldots, N$; $z \neq i, j$) at time $t$ as $\nu_{t,z,i,j}$, where $z$ denotes the $z^{th}$ relay node and $t$ the time at which the direct path between $\nu_i$ and $\nu_j$ becomes degraded according to the criteria explained above. These paths are ranked in descending order of their delay gain metric, as shown below:

$$\text{Delay gain} = \frac{Delay_{Direct\text{-}path} - Delay_{n^{th}\,Overlay\text{-}path}}{Delay_{Direct\text{-}path}} \tag{4-1}$$

where $Delay_{Direct\text{-}path}$ refers to the delay on the direct Internet path between $\nu_i$ and $\nu_j$, and $Delay_{n^{th}\,Overlay\text{-}path}$ refers to the delay on the $n^{th}$ one-hop overlay path between $\nu_i$ and $\nu_j$ through an intermediate overlay host $\nu_z$.
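As a minimal sketch of how Equation 4-1 can be used to rank candidate relays, the following computes the delay gain for each one-hop overlay path and sorts the relays in descending order of gain. The function names, relay labels and delay values are hypothetical and chosen only for illustration.

```python
# Delay gain (Eq. 4-1) and ranking of one-hop overlay paths by descending gain.
# All names and numbers below are illustrative assumptions, not measured data.

def delay_gain(direct_delay_ms, overlay_delay_ms):
    """Fractional delay reduction of an overlay path relative to the direct path."""
    return (direct_delay_ms - overlay_delay_ms) / direct_delay_ms

def rank_relays(direct_delay_ms, overlay_delays_ms):
    """Return (relay, gain) pairs sorted by descending delay gain."""
    gains = {relay: delay_gain(direct_delay_ms, delay)
             for relay, delay in overlay_delays_ms.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)

# Degraded direct path of 200 ms and three candidate one-hop overlay paths.
ranked = rank_relays(200.0, {"z1": 120.0, "z2": 90.0, "z3": 260.0})
for relay, gain in ranked:
    print(f"{relay}: delay gain {gain:.0%}")
```

A negative gain (relay `z3` here) simply means that the overlay path is slower than the degraded direct path and is therefore not a useful detour.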

We computed the frequency with which a particular AMP or RIPE host, in AMP-40 and RIPE-40 respectively, was the best relay node for a source-destination pair whose path was degraded. For the results presented next, we use $3\sigma$ degradations for AMP-40-31/Aug/2006 and $10\sigma$ degradations for RIPE-40-05/Sep/2007 to emulate performance failures and path outages, respectively, based on the definition in Eq. 3-1, Section 3.2. Similar values have been used by the authors of [8] for the Abilene network. Most AMP hosts considered in this dissertation are from North America, and are on networks with a connection to the Abilene network [8]. Let us denote by $H = \{(t, i, j)\}$ the set of those source-destination pairs $(\nu_i, \nu_j)$ whose paths were degraded at time $t$ according to our earlier definition, and denote by $f_z$ the frequency with which an overlay host $\nu_z$ ($z = 0, 1, 2, \ldots, N$) is selected between $\nu_i$ and $\nu_j$, as shown below.

$$f_z = \frac{\sum_{(t,i,j) \in H} I(\nu_{t,z,i,j} = \nu_z)}{PD}, \quad z \neq (i, j) \tag{4-2}$$

where $PD$ is the total number of path degradations observed during the 24-hr periods over which the datasets were collected ($|H| = PD$), and

$$I(\nu_{t,z,i,j} = \nu_z) = \begin{cases} 1 & \text{if } \nu_{t,z,i,j} = \nu_z, \; z \neq (i, j) \\ 0 & \text{otherwise} \end{cases} \tag{4-3}$$

In addition, let us denote by $f_{[z]}$ the arrangement of $f_z$ in descending order of value ($z = 0, 1, 2, 3, \ldots, N$; $N = 39$ for AMP-40 and RIPE-40). Then the cumulative value of $f_{[z]}$ is defined by:

$$F[z] = \sum_{x=0}^{z} f_{[x]}, \tag{4-4}$$

where $F[N] = 1$ holds for both AMP-40 and RIPE-40.
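The computation in Equations 4-2 to 4-4 can be sketched as follows, assuming the best relay for every degradation event $(t, i, j)$ has already been identified. The event list and host labels are hypothetical; hosts that are never selected have $f_z = 0$ and are simply omitted from the sorted list.

```python
from collections import Counter
from itertools import accumulate

# f_z: fraction of degradation events for which host z was the best relay (Eq. 4-2).
# F[z]: cumulative sum of the f_z values sorted in descending order (Eq. 4-4).
# The event list below is an illustrative assumption, not trace data.

def relay_frequencies(best_relays):
    """best_relays holds one best-relay label per degradation event (t, i, j)."""
    total = len(best_relays)                      # PD = |H|
    return {host: count / total for host, count in Counter(best_relays).items()}

def cumulative_share(frequencies):
    """Return F[0], F[1], ... for frequencies sorted in descending order."""
    return list(accumulate(sorted(frequencies.values(), reverse=True)))

events = ["h3", "h1", "h3", "h2", "h3", "h1", "h3", "h4", "h3", "h1"]
f = relay_frequencies(events)
F = cumulative_share(f)
print(f)
print(F)
```

In this toy example the single most frequently selected host accounts for half of all degradation events, and the cumulative share reaches 1 once all selected hosts are included.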

We find that $F[0]$ is about 0.1 for both AMP-40 and RIPE-40 (Figure 4.2). This indicates that 10% of the optimal routes can be found using only one transit node. Furthermore, 50% of the optimal routes can be found using only 8 and 6 hosts in AMP-40 and RIPE-40, respectively. Around 90% of the failures can be masked using only 50% of the overlay hosts.

These results, although a little less striking, are consistent with the results of Uchida et al. [61] and with our findings in Figure 4.1. We attribute the difference to the greater ISP diversity inside the larger geographical regions of North America and Europe as compared to Japan, which allows more overlay hosts to participate in better routes. The results also show that even in large overlay networks, due to clustering of overlay hosts in the same BGP atoms [48], several overlay hosts provide similar levels of path diversity. This will also be addressed in Chapter 5.

[Figure 4.2: plot of F[z] (y-axis, 0 to 1) against z (x-axis, 0 to 40); series: RIPE-40-05/Sep/2007 and AMP-40-31/Aug/2006.]

Figure 4.2 Overlay hosts sorted in descending order ‘z’ (x-axis) according to percentage of failures masked, and failures masked as cumulative function ‘F[z]’ (y-axis).

4.4 DG-RON Clients and Services

We assume that DG-RON clients subscribe to the service from the nearest DG-RON edge node, stipulating the services required (e.g. connectivity to popular destinations), but use the services on a ‘pay per use’ basis, where packet detouring requests are only made once the default path suffers a performance or path failure. This is to ensure that overlay based path switching does not affect non-overlay traffic or cause oscillations by frequently swapping paths for minor performance gains [107]. We assume that the packet to be routed enters the overlay via its nearest overlay proxy after encapsulation and departs at another that is chosen by the path selection algorithm.

4.5 Overlay Infrastructure

The purpose of a resilient routing overlay is to provide improved connectivity between any two

arbitrary hosts on the underlay network in the face of failures. Such a service should be scalable,

provide satisfactory performance guarantees, be able to handle overlay churn and provide good load

balancing on underlay links. Keeping these global objectives in mind we start with a bottom-up

approach in overlay construction.

BGP has demonstrated the importance of hierarchy for global scalability. We choose to use

architectural hierarchy to meet this objective. The architecture uses n landmarks to divide the

overlay network into n logical zones and at the same time into an n dimensional co-ordinate space

for inter-host distance estimation (Figure 4.3). Each of the landmarks is responsible for the

bootstrapping of new hosts in its own logical zone. Landmark hosts only play a role in forming the

infrastructure of the overlay but do not participate in routing. It must be ensured that the landmarks

are sufficiently spaced apart for accurate distance estimation and binning of hosts. We choose landmarks based on topological diversity (Section 4.8). In our simulations we set $n = 7$, i.e. 7 landmarks, which gives the best results for inter-host distance estimation [108-109]. The

landmarks could become a potential performance bottleneck in the system so a single landmark

could actually be a logical abstraction of a group of machines collocated together or in close

proximity of each other [110].

Each overlay node measures its distance from each of the landmarks as the RTT (in milliseconds) between a ping request and its reply, and stores the result as an $n$-dimensional network vector $[RTT_1 \; RTT_2 \; RTT_3 \; \ldots \; RTT_n]$ (where $n$ is the number of reference landmarks used in

simulation). Such network coordinate mechanisms embed a network into a continuous space which

is Euclidean [109]. Each overlay node then contacts its nearest landmark node to join its logical

zone and to request a detour set. The members in the detour set of each overlay node are selected in

the DG-RON architecture using the binning technique proposed in [110]. Each peer requests a total

of T relay hosts from its nearest landmark (as explained previously). In this peer selection method

a landmark returns x short distance (intra-zone) hosts from its own logical zone, and the remaining

$y = (T - x)$ are long distance (inter-zone) hosts requested from other landmarks. The distance

estimation function used by landmarks is similar to the Cartesian Distance estimation method

IP2GEO [109]. The network distance is estimated between the network vectors of different hosts in

different zones for each of the landmarks and the network vector of the requesting node. The

network distance in terms of RTT metric between two arbitrary hosts a and b is estimated from

their network vectors as shown by equation below.

$$Dist_{ab} = \sqrt{(RTT_{a1} - RTT_{b1})^2 + (RTT_{a2} - RTT_{b2})^2 + \ldots + (RTT_{an} - RTT_{bn})^2} \tag{4-5}$$

where $RTT_{an}$ is the round trip time in milliseconds of node $a$ from landmark $n$.

[Figure 4.3: schematic of the overlay showing the landmarks, the RON nodes, the nodes maintained as the detour set (labelled MAX, MIN and MIDWAY), the source and destination RON nodes, the default underlay path, and the overlay paths (one hop indirection via nodes in the detour set).]

Figure 4.3 Finding topologically diverse detours for underlay destinations.

Selecting relay hosts using the binning heuristic ensures that the overlay connectivity is

maintained and the average routing latency on the overlay is low [110].
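The network vectors and the distance estimate of Equation 4-5 can be sketched as follows. This is a minimal illustration that assumes the Euclidean form of the distance (consistent with the Euclidean embedding mentioned above); the function names and the RTT values are hypothetical.

```python
import math

# Network vector: RTT in milliseconds from a host to each of the n landmarks.
# Distance between two hosts: Euclidean distance between their vectors (Eq. 4-5).
# The vectors below are made-up values for n = 7 landmarks.

def network_distance(vec_a, vec_b):
    """Estimated network distance between two hosts from their RTT vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

def nearest_landmark(vec):
    """Index of the landmark with the smallest RTT, i.e. the host's logical zone."""
    return min(range(len(vec)), key=lambda k: vec[k])

host_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
host_b = [13.0, 24.0, 30.0, 40.0, 50.0, 60.0, 70.0]
print(network_distance(host_a, host_b))   # distance between the two vectors
print(nearest_landmark(host_a))           # zone that host_a would join
```

Each host only needs $n$ RTT measurements to obtain its vector, after which distances to any other host with a known vector can be estimated without further probing.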

4.6 Online Path Selection-Dynamic Path Monitoring

To achieve scalability we propose using both online path probing and offline path selection heuristics. By using a small detour set the path monitoring overheads are reduced from $O(N^2)$ to $O(Nd)$, i.e. a constant overhead per overlay node, where $d$ is the average node degree (the size of the

detour set). Some additional overheads, such as those caused by churn in the overlay network (hosts joining and departing), also need to be catered for, since churn requires reformation of zones, redistribution of hosts to landmarks and updating of detour sets. However, such events are not very common; the associated overhead is very low and there are established scalable gossip protocols for this [36, 65].

We propose scalable offline mechanisms to find alternate paths where we do not need to address the

actual composition of the original underlay path suffering from a performance failure event. Once a

peer obtains its detour set from the landmark it probes this detour set only. Instead of probing

aggressively it may use a randomized probing scheme, e.g. monitoring paths that are more prone

to changes (degradations) than others. As we highlighted earlier, the motivation of our design is

scalability at Internet proportions. The randomized probing scheme does not require that overlay

hosts probe aggressively for detection of path outages and failover mechanisms like [4]. If a peer

from the detour set is deemed as failed for a considerable interval then a new peer can be eventually

requested from the landmark. However, in the simulations we do not implement any repairs to the

detour set of the overlay hosts and investigate only the static resilience of the overlay using only the

live members of the detour set. This assumption is reasonable in a real deployment of DG-RON

with non-aggressive probing epochs. The landmark based decentralized architecture eliminates the

need for any information flooding in the network as required in link-state protocols making the

design scalable for large overlay networks. Online link probing techniques such as those used by [4]

are still required for performance measurements to determine dynamic performance; however such

overheads are significantly reduced owing to the distributed architecture.
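The reduction in monitoring overhead can be made concrete with a small calculation. The sketch below counts probed overlay links per probing round, assuming one probe per ordered pair of hosts in a full mesh and one probe per (host, detour) pair in DG-RON; the values $N = 139$ and $d = 12$ are taken from the evaluation in Section 4.8, and the function names are hypothetical.

```python
# Probes per round: full mesh grows as N^2, detour-set probing grows as N*d.
# Assumption: one probe per ordered host pair, with no probe aggregation.

def full_mesh_probes(num_hosts):
    """Every host probes every other host."""
    return num_hosts * (num_hosts - 1)

def detour_set_probes(num_hosts, detour_set_size):
    """Every host probes only the members of its detour set."""
    return num_hosts * detour_set_size

n_hosts, d = 139, 12
mesh = full_mesh_probes(n_hosts)
dgron = detour_set_probes(n_hosts, d)
print(mesh, dgron, round(mesh / dgron, 1))
```

With these numbers the per-host cost stays fixed at $d$ probes as the overlay grows, whereas the full-mesh cost per host grows linearly with $N$.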

Overlay links between an overlay peer and hosts in its detour set are monitored for performance

characteristics such as latency, throughput and loss rates. Note that we only probe overlay links to

candidate detours; [32] shows that predicting good detouring nodes can yield acceptable upper-

bounds for end-to-end path metrics. We conjecture that the underlying reason for this is the small


probability for many Internet links on spatially diverse paths to undergo congestion at similar times.

Moreover, unlike [32] we can combine disjointness criteria (discussed next) with absolute

performance merits to optimize the selection of candidate detouring nodes. To improve scalability

further, we propose that probing could be replaced by passive monitoring of traffic traversing

overlay links between a peer and its detour set to improve dynamic estimation of path performance

without introducing any probing traffic and subsequent probing overheads. Techniques for both

active network probing and passive traffic monitoring have been studied in the past e.g. [4].

4.7 Offline Path Selection-Landmark Based Heuristics

Several papers [4, 16, 32] showed that in most cases a performance failure can be bypassed using

single hop indirection using an overlay node. We use the Maximum Divergence Heuristic to find

such one hop detours, in which the peer chooses the next hop based on the Cartesian distance [108,

111-112] of the destination from the eligible next hop candidate relay hosts. The underlying idea is

similar to the Earliest Divergence Rule (EDR) [6], which aims to select a path which diverges from

the default path near the source and converges near the destination in order to avoid a failed or

congested link. However, EDR assumed the availability of complete AS path information between

the source-destination pair and candidate alternate paths. This is sometimes challenging, as it requires accurate information from non-overlay components, e.g. routers. Our architecture relies only on end-to-end path metrics, which require only the cooperation of the overlay hosts.

Our divergence criterion is to select with good probability an alternate path that diverges from the

defunct portion of the default path near the location of the failure, e.g. a congested link. Eligibility

of such overlay paths may further be based on underlying network characteristics, e.g. loss rates,

latency or throughput, obtained through monitoring (as explained in Section 4.6). We need to capture the entire spectrum of disjoint paths possible from amongst the detour set overlay hosts. The first heuristic we use in searching for such divergent paths is MAX, in which we choose an overlay peer

which has the maximum network distance from the destination. The underlying reason for using

MAX is to select with high probability an overlay peer which leads to a topologically diverse

alternate path to reach the destination. We also search for alternate paths using MIN, where we use overlay hosts close to the destination as detours. The rationale for this rule, in contrast to the previously mentioned MAX rule, is the observation that many paths in the Internet violate the triangle inequality due to routing policies [113]. Thus, it is also possible to find a disjoint path using a peer in proximity to the destination. Instead of choosing the detours based on their distance from the destination we could similarly use their distances from the source, since the underlying idea is to exploit the whole spectrum of available disjoint paths. We refer to the heuristic where we choose an overlay peer roughly midway between the source and destination as MIDWAY. Figure 4.3 shows the underlying idea in the selection of detours. There may be other

landmark based heuristics which we may have neglected here and may work better than the ones

presented here; our main objective is to investigate if such schemes can work to select disjoint paths

when the cause and location of the path failure on the primary path is not known in advance. The

generic algorithm for offline detour selection is presented in Figure 4.4.

The offline heuristics for selecting topologically diverse detours only require that the destination

be mapped into the network co-ordinate space. This mapping can easily be managed by the

landmarks for popular destinations to which DG-RON clients have subscribed. For unfamiliar

destinations the landmarks could extrapolate the approximate co-ordinate vector using vectors of

other hosts from its nearest landmark optionally utilizing services of a third party e.g. WHOIS

servers. Only the knowledge of the destination IP address is required for both and should suffice to

find the overlay based detour. Such information could also be cached as frequent requests to popular destinations are made, so that the peer can incrementally learn about these. The three offline schemes described above (MAX, MIN and MIDWAY) select an overlay ‘detour’ node once the position of the destination has been determined in the co-ordinate space.

Algorithm: Find_candidate_alternate_paths
Input: Network coordinates for Source $S$, Destination $D$ and detour set $T = \{T_1, T_2, \ldots, T_n\}$
Output: Candidate alternate paths for Destination $D$ from Source $S$
Define: $Cost = Distance(\nu(A), \nu(B))$ (where $\nu(A)$ and $\nu(B)$ are network vectors for any arbitrary hosts $A$, $B$ in the network coordinate space)
for $i = 1$ to $|T|$
    $Cost_D(i) = Distance(\nu(D), \nu(T_i))$, $Cost_S(i) = Distance(\nu(S), \nu(T_i))$
endfor
$A = \arg\min_i Cost_D(i), \; \forall i \in 1, 2, \ldots, |T|$
$B = \arg\max_i Cost_D(i), \; \forall i \in 1, 2, \ldots, |T|$
$C = \arg\min_i |1 - Cost_D(i)/Cost_S(i)|, \; \forall i \in 1, 2, \ldots, |T|$
If MIN, NextHop $= T_A$; If MAX, NextHop $= T_B$; If MIDWAY, NextHop $= T_C$

Figure 4.4 Offline Detour Selection based on Maximum Divergence Principle.
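The three offline heuristics can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the simulations: the function names are hypothetical, the distance is taken to be Euclidean in the landmark co-ordinate space, MIDWAY is interpreted as the relay whose distances to source and destination are most nearly equal, and the example uses only two landmarks with made-up RTT vectors for readability.

```python
import math

# Offline detour selection in the network co-ordinate space.
# MIN: relay closest to the destination; MAX: relay farthest from the destination;
# MIDWAY: relay whose distances to source and destination are most nearly equal.

def distance(vec_a, vec_b):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

def select_detours(source, destination, detour_set):
    """detour_set maps relay name -> network vector; returns heuristic -> relay name."""
    cost_d = {name: distance(destination, vec) for name, vec in detour_set.items()}
    cost_s = {name: distance(source, vec) for name, vec in detour_set.items()}

    def imbalance(name):
        # A relay sharing the source's co-ordinates cannot be "midway".
        if cost_s[name] == 0:
            return math.inf
        return abs(1 - cost_d[name] / cost_s[name])

    return {
        "MIN": min(cost_d, key=cost_d.get),
        "MAX": max(cost_d, key=cost_d.get),
        "MIDWAY": min(cost_d, key=imbalance),
    }

source, destination = [10.0, 90.0], [90.0, 10.0]
relays = {"near_dst": [80.0, 20.0], "mid": [50.0, 50.0], "far": [20.0, 150.0]}
choice = select_detours(source, destination, relays)
print(choice)
```

Only the co-ordinates of the source, the destination and the detour set members are needed, so no knowledge of where the failure occurred on the default path is required.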

The offline methods based on network co-ordinates (discussed above) can embed only latency but not failure or congestion information, and thus may not adapt well for dynamic performance estimation on alternate paths. Thus, to supplement the offline path selection process, online path monitoring is necessary in DG-RON.

4.8 Performance Evaluation

We use trace-based simulation driven by real-world Internet datasets to validate the DG-RON architecture presented in this chapter. We investigate the performance benefits of DG-RON for finding QoS enhanced paths. For this study, we use measurement data between 146 and 133 AMP hosts (mainly from North and South America) from two 24-hr datasets [114], obtained on June 30 and August 31, 2006. The details of these datasets were explained in Chapter 3. We deliberately avoid the RIPE measurement data here because of the small number of hosts for which we collected data (bulk download of the dataset is not available), and because the purpose here is to investigate the performance of a proposed architecture that aims to enable RONs to scale beyond 50 hosts [4].

We let the AMP networks behave as a virtual RON. The selection of landmarks ($n = 7$) is done by selecting 7 AMP hosts which are topologically diverse, to enable good network distance estimation, and which have delay measurements to all other AMP hosts. Accurate distance estimation is not the goal here; the goal is only to predict network distance with sufficient accuracy for selecting topologically

diverse detours. We cluster AMP-HPC hosts as belonging to 7 geographical regions: North, North

East, North West, South, South East, South West and Central US. Each of the 7 landmarks is

chosen randomly from these 7 clusters so that they are spread throughout the continental United

States. Hosts forming part of AMP-International network are not connected as a full mesh with all

other AMP hosts; these are deliberately excluded from being chosen as landmarks. Selection of detour-set hosts and computation of Cartesian distance are done exactly as explained before (Section 4.5), but using the RTT measurements from the trace files; the only differences are: (1) the overlay comprises all the nodes in the AMP datasets other than the landmarks, e.g. $N = 139$ ($146 - 7$) hosts for AMP-146-30/Jun/2006; (2) we pick a third of the detour set hosts from the short distance (intra-zone) overlay hosts, another third from the long distance (inter-zone) overlay hosts, and the remaining third are chosen randomly from the set of overlay hosts which were responsible for alleviating the majority of the total failures (as discussed in Section 4.3).
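The three-way composition of the detour set described in item (2) can be sketched as follows. The sketch assumes that the intra-zone and inter-zone candidate lists are already ordered by preference and that a list of frequently selected relays (Section 4.3) is available; the function name, its parameters and the host labels are hypothetical.

```python
import random

# Detour set built from three equal parts: intra-zone hosts, inter-zone hosts,
# and a random draw from hosts that frequently masked failures (Section 4.3).
# Candidate lists and their ordering are assumptions made for this sketch.

def build_detour_set(size, intra_zone, inter_zone, frequent_relays, rng=None):
    rng = rng or random.Random()
    third = size // 3
    chosen = list(intra_zone[:third])
    chosen += [host for host in inter_zone if host not in chosen][:third]
    pool = [host for host in frequent_relays if host not in chosen]
    chosen += rng.sample(pool, min(size - len(chosen), len(pool)))
    return chosen

intra = [f"a{k}" for k in range(10)]      # short distance (intra-zone) candidates
inter = [f"b{k}" for k in range(10)]      # long distance (inter-zone) candidates
frequent = [f"c{k}" for k in range(10)]   # relays that masked most failures
detours = build_detour_set(12, intra, inter, frequent, rng=random.Random(7))
print(detours)
```

For a detour set of 12 this yields four hosts from each category, with duplicates across categories avoided.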

Path failures are defined as given in Equation 3-1 (Chapter 3). We pick $k = 3$ to identify performance failures and $k = 10$ to identify path outages, as before. Because of the way we define failure, instead of observing only the fraction of underlay failures successfully masked by both schemes, we use the delay gain metric (Section 4.3) to quantify the delay reduction when using the alternate paths in the DG-RON architecture.

4.8.1 Impact of Detour Set Size

Figure 4.5 compares the delay gain of the best possible path (as found by RON) with that of the best path selected from amongst the detour set of a DG-RON node, as the detour set size is varied. For both datasets, when there are path degradations (e.g. on 19321 (= 139 × 139) possible paths for AMP-146) there is at least one QoS optimized indirect (one-hop overlay) path in the RON. Using only a carefully selected detour set, as we outlined earlier, each overlay node can find a QoS optimized path for all path outages and performance failures encountered. The results are more impressive for path outages than for performance failures. As explained in Section 3.2, one-hop overlay paths normally have delays much larger than direct Internet paths. If the magnitude of path degradation is larger (path outage), the number of one-hop overlay paths which provide better delay increases. Consequently, even a small detour set of 6 overlay nodes can provide exceptionally good delay gains (Figure 4.5a): a delay gain of 40% or more for 90% of path outages. Figure 4.5b shows that at least 12 detouring options are required to be able to select a path providing delay gains of 40% or more when direct Internet paths suffer from performance failures. As the detour set size is further increased to 48, the additional performance gains are marginal.

[Figure 4.5 (two panels): Delay Gain (%) (y-axis, 0 to 100) against Path Outages (%) (top panel) and Performance Failures (%) (bottom panel) (x-axis, 0 to 100); series: RON, DGRON (|T|=6), DGRON (|T|=12), DGRON (|T|=48).]

Figure 4.5 Delay Gain Comparison between DGRON and RON with variation in detour set size. (AMP-146-30/Jun/2006 (top) and AMP-133-31/Aug/2006 (bottom).)

4.8.2 Evaluation of Offline Path Heuristics

We also evaluate the efficacy of our offline heuristics. We select three overlay relay hosts using each heuristic: MAX, MIN and MIDWAY. We compare the characteristics of the best-of-three paths, i.e. the three paths selected using each of MAX, MIN and MIDWAY after sorting based on distances.

We first measure physical path stretch on the QoS enhanced one-hop overlay paths selected by

each of the offline path selection heuristics.

$$\text{Path Stretch} = \frac{\text{Router level hops (one-hop overlay path selected by offline path heuristic)}}{\text{Router level hops (direct Internet path)}} \tag{4-6}$$

We find that MAX may look for longer paths with average path stretch of 2.2 compared to 1.8

for MIN and MIDWAY for a detour set size of 12 (Table 4-1). Note that physically longer one-hop overlay paths can still provide lower delay alternate paths if the direct path is suffering from congestion, i.e. a violation of the triangle inequality [60]. We also evaluate the delay benefits obtained on paths selected by the offline path heuristics using Equation 4-1, where the delay of the one-hop overlay path is that through the overlay host selected by the offline heuristic. MAX accounts for finding about 45-60% of QoS enhanced paths for performance failures and path outages, respectively (Table 4-2). MIN and MIDWAY are most efficient in finding good QoS optimized paths, with substantially higher delay gains than MAX, accounting for finding approximately 75-99% of QoS enhanced paths for performance failures and path outages, respectively. These QoS enhanced paths provide delay gains of 40% or higher in all cases. This shows that landmark-based heuristics can

aid in selection of disjoint alternate paths and thus filter good paths from bad ones. In situations

where monitoring of all paths is not desirable or feasible due to scalability issues, such heuristics

can predict alternate path availability with a very high probability.
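Equation 4-6 can be sketched as follows, assuming the router-level hop count of a one-hop overlay path is the sum of the hop counts of its two segments (source to relay and relay to destination). The function name and the hop counts are hypothetical and serve only to show how the mean and standard deviation of path stretch would be computed.

```python
from statistics import mean, stdev

# Path stretch (Eq. 4-6): router-level hops on the overlay path divided by
# router-level hops on the direct path. Hop counts below are made-up examples.

def path_stretch(hops_src_to_relay, hops_relay_to_dst, hops_direct):
    return (hops_src_to_relay + hops_relay_to_dst) / hops_direct

# (source->relay hops, relay->destination hops, direct-path hops) per degraded pair
samples = [(10, 12, 12), (8, 10, 10), (14, 16, 15)]
stretches = [path_stretch(*sample) for sample in samples]
print(stretches)
print(mean(stretches), stdev(stretches))
```

A stretch close to 2 means the detour roughly doubles the number of router-level hops, which can still be worthwhile when the direct path is congested.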


4.8.3 Comparison with SPAD

To investigate the effectiveness of the landmark based heuristics in the construction of DG-RON

for selecting geographically diverse detours, we compare DG-RON with SPAD [115] (Super-Peer

based Alternate Path Discovery). Several related works [8, 21] investigate lowering path monitoring overheads by monitoring a small number of paths and predicting performance on the unmonitored paths, thereby still emulating RON. Very few works, e.g. SPAD, consider the problem of selecting a subset of peers for finding QoS enhanced paths using a landmark based distributed architecture similar to DG-RON.

To emulate SPAD, we follow a similar scheme as used by the authors of [115]. A new overlay

host contacts a super-peer (nearest landmark) for bootstrapping which gives it a list of 50 candidate

hosts (selected randomly from all overlay hosts). From these the new overlay host selects 12

overlay hosts which are closest to it in terms of RTT. This is done based on minimum network

distance in the network coordinate space. For comparison of DG-RON with SPAD we compare the

performance of the best path from the detour set of each whenever a path outage or performance

failure occurred. From Figure 4.6 it is evident that DG-RON can find paths with better delay gains

than SPAD owing to its selection of more geographically diverse detouring options for both path

outages and performance failures.
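The SPAD emulation described above (50 random candidates, of which the 12 closest are kept) can be sketched as follows. This is only an illustration of the selection step as described in this section, not SPAD's actual implementation: the function name is hypothetical, `rtt` stands for any callable returning the estimated distance between two hosts, and the toy distance used in the example is made up.

```python
import random

# Emulated SPAD peer selection: the super-peer returns `candidates` hosts drawn
# at random from all overlay hosts, and the new host keeps the `keep` closest.
# The distance callable and the host numbering are assumptions for this sketch.

def spad_detour_set(new_host, all_hosts, rtt, candidates=50, keep=12, rng=None):
    rng = rng or random.Random()
    others = [host for host in all_hosts if host != new_host]
    sample = rng.sample(others, min(candidates, len(others)))
    return sorted(sample, key=lambda host: rtt(new_host, host))[:keep]

hosts = list(range(139))

def toy_rtt(a, b):
    return abs(a - b)          # stand-in for the estimated network distance

peers = spad_detour_set(0, hosts, toy_rtt, rng=random.Random(1))
print(peers)
```

Because only the closest candidates are retained, the resulting detour set tends to cluster near the new host, which is the contrast with the more diverse DG-RON detour set drawn in the comparison above.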

Table 4-1. Path stretch incurred by selecting overlay paths based on offline path heuristics (|T|=12).

                      MAX     MIN     MDW
Path Stretch          2.18    1.78    1.84
Standard Deviation    1.53    0.50    0.56

Table 4-2. Average Performance of offline path heuristics in masking failures (|T|=12). Path outages for AMP-146-30/June/2006 and Performance Failures for AMP-133-31/Aug/2006.

                              MAX                       MIN                       MDW
                              Performance   Path        Performance   Path        Performance   Path
                              Failures      Outages     Failures      Outages     Failures      Outages
Average Delay Gain            41.86         57.13       46.11         61.15       40.29         57.60
Percentage Failures Masked    43.72         59.15       89.37         95.31       74.32         99.66

[Figure 4.6 (two panels): Delay Gain (%) (y-axis, 0 to 100) against Path Outages (%) (top panel) and Performance Failures (%) (bottom panel) (x-axis, 0 to 100); series: RON, DGRON, SPAD.]

Figure 4.6 Delay Gain Comparison between DGRON and SPAD (|T|=12). (AMP-146-30/Jun/2006 (top) and AMP-133-31/Aug/2006 (bottom).)

4.9 Discussion

The simulation results presented in the previous section reveal that landmark based offline path searching methods can work well in power-law topologies such as the Internet, and can supplement aggressive online path selection algorithms or reduce their overheads. The results show there is ample opportunity for finding alternate paths even if overlay hosts are not connected in a full mesh. Considering that performance failures are short duration events, making it highly unlikely for a large fraction of links to undergo congestion or suffer from other performance degradations at the same time, DG-RON can predict good alternate paths among candidate hosts in the detour set.

The proposed design for offline path selection does have some obvious caveats; the most glaring is that path exploration could incur some delay in alternate path discovery. We argue that this problem is unavoidable if scalability is to be achieved. BGP has taught us that scalability comes at the price of marching through all possible alternate paths after a failure has been detected. The landmark based architecture can effectively predict availability of good alternate paths.

4.10 Conclusion

As the Internet continues to grow, so does the diversity of the connectivity between the hosts. In

this chapter we presented the first contribution of this thesis investigating the possibility of a

globally scalable RON service for discovering infrastructural redundancy and robustness potentially

present in the Internet. RON unnecessarily searches through a large path exploration space and the

subsequent overheads associated with aggressive path monitoring pose scalability issues. To

address this issue several previous works [6, 24] have focused on topology aware heuristics in

overlay construction and link monitoring which make it possible to both monitor and select

alternate paths using distributed approaches. Our work is similar to such approaches in that we aim

to lower both path monitoring overheads and reduce the candidate path exploration space. In

addition our work presents a platform for harnessing the findings of previous literature [6, 16].


5 DISJOINT PATH SELECTION IN OVERLAY NETWORKS USING TOR GRAPHS

5.1 Introduction

In Chapter 4 we highlighted the fact that path diversity in the Internet and overlay networks exists

at both the IP and AS levels. IP level paths inside ASes are totally under the domain of the AS.

However, tapping into AS level path diversity can also allow us to exploit the IP level path

diversity.

This chapter presents the second contribution of this thesis, namely the selection of maximally

disjoint alternate paths at the AS level by using Type-Of-Relationship (ToR) graphs [116]. We

again validate our findings using real-world Internet-data from the Active Measurement Project

(AMP) [2] to quantify the benefits of choosing paths that are disjoint in terms of the ASes they

traverse.

First, Section 5.2 briefly describes ToR-graphs. In Section 5.3 we present a greedy-approach for

finding maximal AS-disjoint overlay paths. In Section 5.4 we evaluate the performance of this

approach using real-world Internet data. Section 5.5 summarizes the key findings of the study.

5.2 ToR (Type-of-Relationship) Graphs

The Internet is composed of a large number of autonomous networks (ASes). Each AS is

independently administered. To route a packet from one host to another it must pass via several

different ASes. ASes can be characterized into two broad categories, transit ASes and stub ASes

(Figure 5.1). Stub-ASes are located on the edges of the Internet and typically have few connections

to neighboring ASes (usually one, perhaps a few if multi-homed) whereas transit ASes usually have

more connections to neighboring ASes. Each sub-network learns about global reachability to

different hosts in the network by exchanging route advertisements with immediate neighbors.

Gao [63] first showed that within the ‘generic’ transit-stub architecture, three dominant types of

commercial-relationships occurred between ASes, namely customer-provider (C-P), peer-peer (P-

P) and sibling-sibling (S-S). Customers depend on their respective provider networks for

connectivity (the providers acting as a transit for them), usually in exchange for a fee. Peers (and

siblings) are networks which are similar in scope and can exchange traffic (destined for each other’s

customers) between each other without a fee, for mutual benefit. C-P, P-P and S-S relationships are

all relative in that a particular AS can have different relationships with different adjacent ASes. Gao

[63] found that the percentage of C-P, P-P and S-S relationships are roughly 90.5%, 8% and 1.5%

respectively.


Gao [63] also showed that the Internet uses “valley-free” paths between hosts which are defined

by policies. The term “valley-free” refers to the hierarchy formed by customer-provider

relationships between ASes (as explained below). All ASes are classified into five tiers, numbered so that lower numbers denote higher tiers (more central ASes). Tier-1 includes ASes belonging to global ISPs and Tier-5 includes ASes from local ISPs. Intuitively, a customer AS sits in a lower tier (a larger tier number) than its provider. ASes with a C-P relationship should ideally be on different tiers, though in practice it is not always possible to create a consistent model of AS relationships which achieves this simple structure. Traffic is permitted to pass up the hierarchy from customers to their providers (i.e. from larger to smaller tier numbers) at the source end of the path, but can only pass down the hierarchy (i.e. from smaller to larger tier numbers) in order to approach the destination; a provider cannot use one of its customers to connect to another provider, since that would form a valley. This favors the commercial-relationships between providers and customers so as to: (a) maximize the provider profit; and (b) avoid routing loops.

Figure 5.2 shows some examples of valid valley-free paths.

Figure 5.1 Network layer paths between source-destination at AS level topology. [Diagram labels: Source, Destination, Core, Stub-AS, Transit-AS.]

Formally, let $\mathrm{Tier}(AS_i)$ denote the tier number of $AS_i$; then an AS path $(AS_0, AS_1, \dots, AS_n)$ is said to be valley-free iff there exist $i, j$ $(0 \le i \le j \le n)$ satisfying:

$$\mathrm{Tier}(AS_0) \ge \dots \ge \mathrm{Tier}(AS_{i-1}) > \mathrm{Tier}(AS_i) = \dots = \mathrm{Tier}(AS_j) < \mathrm{Tier}(AS_{j+1}) \le \dots \le \mathrm{Tier}(AS_n). \qquad (5\text{-}1)$$

The maximal uphill path is then $(AS_0, AS_1, \dots, AS_i)$ and the maximal downhill path is $(AS_j, AS_{j+1}, \dots, AS_n)$. The AS(es) in the highest tier $(AS_i, \dots, AS_j)$ are called top AS(es).
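Condition (5-1) can be checked mechanically. The following is a minimal sketch, assuming the path is given as a list of tier numbers (one per AS, source first); the function name is hypothetical and not part of the original work. Since lower tier numbers are closer to the core, the condition reduces to: once the tier number has risen, it must never fall again.

```python
def is_valley_free(tiers):
    """Return True if a sequence of tier numbers satisfies condition (5-1).

    tiers[k] is the tier number of the k-th AS on the path (source first).
    Lower numbers are closer to the core, so the uphill part has falling
    tier numbers and the downhill part has rising tier numbers.
    """
    going_downhill = False
    for prev, cur in zip(tiers, tiers[1:]):
        if cur > prev:
            going_downhill = True      # moved away from the core
        elif cur < prev and going_downhill:
            return False               # climbing again after descending: a valley
    return True
```

For example, the tier sequence 3, 2, 1, 1, 2, 3 passes the check, whereas 2, 1, 2, 1 does not.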

Type-of-Relationship (ToR) graphs [116-117] show the customer-provider, peer-peer and sibling-sibling relationships between adjacent ASes, using directed edges for C-P relationships (directed from customer to provider, Figure 5.2) and undirected edges for P-P and S-S relationships [63]. For consistency, and

without loss of generality, P-P and S-S relationships can be represented by two directed edges by

introducing a virtual-provider node in between them [5]. We adopt this technique to map P-P and

S-S edges in the ToR-graph (Figure 5.2). Note the ToR graph only depicts whether ASes are

connected and (if so) their relationship (C-P, P-P or S-S) and it does not depict any performance

metrics of the connection, such as delay.
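The virtual-provider mapping described above can be sketched as follows. This is an illustration under assumptions: the graph is held as a dictionary from each node to the set of its providers, and the tuple used to name a virtual provider is a hypothetical convention chosen here, not one taken from [5].

```python
def build_tor_graph(cp_edges, undirected_edges):
    """Return a dict mapping each node to the set of its providers.

    cp_edges: iterable of (customer, provider) AS pairs.
    undirected_edges: iterable of (a, b) pairs in a P-P or S-S relationship.
    Each undirected pair is replaced by a virtual provider node that both
    endpoints point to, so the result contains only directed C-P edges.
    """
    providers = {}
    for customer, provider in cp_edges:
        providers.setdefault(customer, set()).add(provider)
        providers.setdefault(provider, set())
    for a, b in undirected_edges:
        # Hypothetical naming scheme for the virtual provider node.
        virtual = ("virtual", min(a, b), max(a, b))
        providers.setdefault(a, set()).add(virtual)
        providers.setdefault(b, set()).add(virtual)
        providers.setdefault(virtual, set())
    return providers
```

With this representation a peering link between two ASes becomes an uphill edge followed by a downhill edge through the virtual node, so it fits the same uphill-then-downhill pattern as any other valley-free path.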

C-P, P-P and S-S relationships are never explicitly revealed because of commercial-agreements.

By accessing BGP advertised routes (BGP dumps), one can access AS paths (described above)

which can help in inferring the type-of-relationships between adjacent AS-pairs using simple

intuitive rules specified by the valley-free routing model. For example, previous works [63, 117-

118] use simple rules to identify valley-free paths as those having either (a) an uphill path, a P-P

edge, and a downhill path in order; or (b) an uphill path and a downhill path in order (Figure 5.2).
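Rules (a) and (b) can also be expressed as a check over edge labels rather than tiers. The sketch below assumes each hop of an AS path has already been labelled 'c2p' (customer to provider), 'p2p' (peer to peer) or 'p2c' (provider to customer); these label strings and the function name are illustrative choices, and sibling edges are left out for brevity.

```python
def is_valley_free_by_edges(edge_types):
    """Check the pattern: zero or more c2p, at most one p2p, zero or more p2c."""
    phase = 0  # 0 = still uphill, 1 = peer edge just crossed, 2 = downhill
    for edge in edge_types:
        if edge == "c2p":
            if phase != 0:
                return False       # climbing after the peak
        elif edge == "p2p":
            if phase != 0:
                return False       # a second peer edge, or a peer edge after descending
            phase = 1
        elif edge == "p2c":
            phase = 2
        else:
            raise ValueError(f"unknown edge label: {edge!r}")
    return True
```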

Existing research finds that intuitive approaches like the Earliest Divergence Rule [6] can help in

finding disjoint-paths using the knowledge of AS paths between hosts (through trace-routes). We

find that only by mapping such AS information into a ToR-graph we can use more elegant

algorithms for computation of AS-disjoint paths that can give non-negligible improvement over

such approaches.

5.3 Maximally-Disjoint Path Computation Using a Greedy approach

5.3.1 Finding Valley-Free Edge-Disjoint Paths

To bypass a failure affecting a path, we need an alternate path which is physically-disjoint from

this primary path. Given a ToR-graph $G = (V, E)$ and two hosts $s$ and $t$, disjoint paths between $s$ and $t$ can be either vertex-disjoint or edge-disjoint.

Our focus is on computing edge-disjoint valley-free paths, since this problem is shown to be

solvable in polynomial-time while the corresponding vertex variant of the problem is NP-hard

[116]. Our main purpose is to identify ASes not used on the shortest valley-free paths (selected by


BGP) and thus explore alternate disjoint paths. Finding edge-disjoint paths in graphs is a well

known problem and the focus of several previous works [119-120]. Computing all edge-disjoint

paths between all possible pairs of vertices in a graph is an NP-complete problem [8]. However, if

we are only interested in computing edge-disjoint paths between two hosts s and t, then the problem

becomes tractable [8].

Figure 5.2 Example of valid and invalid valley-free paths in ToR-graphs [63, 118]. [Diagram: example paths over nodes u0 to u8 spanning Tier-1 to Tier-3, marked valid or invalid, with customer-provider edges, maximal uphill and maximal downhill paths, and a valley indicated.]

To search for valley-free edge-disjoint paths in a ToR graph, Erlebach et al. [116] proposed a two-layer graph ($H$), constructed from a ToR-graph $G = (V, E)$ and $s, t \in V$ (see Figure 5.4). $H$ is a directed graph obtained by making two copies of the original graph $G$, called the lower and upper layers. In the upper layer all edge directions are reversed. Every node $v$ in the lower layer is connected with $n$ artificial edges to the corresponding copy of that node, denoted by $v'$, in the upper layer. These edges are directed from $v$ to $v'$. The justification of Erlebach et al.’s two-layer graph is as follows, and comes from the previously stated view of valid valley-free paths as being the concatenation of a set of forward edges (uphill-path) and a subsequent set of backward edges (downhill-path). A valid path $p = v_1, \dots, v_r$ in $G$ with $v_1 = s$ and $v_r = t$ is equivalent to a path in the directed graph $H$ in the following way. The forward part of $p$, i.e. all edges $(v_i, v_{i+1}) \in p$ that are directed from $v_i$ to $v_{i+1}$, is routed in the lower layer. Then there is a possible switch to the upper layer (there can be at most one such switch, enforced by directed artificial links between $G$ and its reverse). The backward part of $p$ is routed in the upper layer (see Figure 5.3). The $n$ parallel artificial edges of type $(v, v')$ going from each node of the lower layer to its corresponding copy in the upper layer have been added to $H$ so as to ensure that an arbitrary number of paths arising from edge-disjoint paths in $G$ can switch from the lower layer to the upper layer.
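The construction of the two-layer graph can be sketched in a few lines. This is an illustrative reading of the description above, not the implementation of [116]: it assumes the ToR-graph is a dictionary mapping every AS (including ASes without providers) to the set of its providers, tags lower-layer nodes as (v, 0) and upper-layer copies as (v, 1), and models the parallel artificial edges as a single edge whose capacity equals the number of parallel edges.

```python
def build_two_layer_graph(providers, n_parallel):
    """Build the two-layer graph H as a dict: (tail, head) -> capacity.

    providers: dict mapping each AS to the set of its providers (edges of G).
               Every AS must appear as a key, even with an empty set.
    n_parallel: number of parallel artificial edges between v and its copy v'.
    """
    capacity = {}
    for v, ups in providers.items():
        # Artificial edges v -> v', merged into one edge of capacity n_parallel.
        capacity[((v, 0), (v, 1))] = n_parallel
        for p in ups:
            capacity[((v, 0), (p, 0))] = 1   # lower layer: customer -> provider
            capacity[((p, 1), (v, 1))] = 1   # upper layer: the same edge reversed
    return capacity
```

A valley-free path from s to t in G then corresponds to a directed path from (s, 0) to (t, 1) in this structure: it climbs in layer 0, switches once, and descends in layer 1.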

The two-layer graph has twice the number of vertices and edges (excluding edges between the layers) compared to the original ToR-graph. This may lead one to believe that the cardinality of the solution could be up to twice that of the optimal solution, i.e. that the model only yields a 2-approximation. However, Erlebach et al. [116] show that the two-layer model yields an optimal solution to finding the maximum number of valley-free edge-disjoint paths.

We mention the proof briefly in this dissertation and refer the reader to [116] for the detailed proof. Assume two edge-disjoint paths $p_1$ and $p_2$, and an edge-cut consisting of a forward edge $e$ and its backward copy $e'$ (Figure 5.4). Since $e$ and $e'$ form the edge-cut, their removal should disconnect $s$ from $t$, leaving no valley-free path between them. However, if we remove $e$ and $e'$, there is still a valley-free path using the forward-edges in path $p_1$ from $s$ to $u$, and backward-edges from $u$ to $t$; this contradiction concludes the proof.

Figure 5.3 (Top) Example of valid valley-free path in the original ToR-graph (G). Dotted lines show concatenation of a set of C-P (forward) and P-C (backward) edges forming a valley free s-t path. (Bottom) Relaxation using the 2 layer model consisting only of forward edges. [Diagram labels: G = original ToR graph, G (layer 1), Rev G (layer 2), nodes s, t, A and A’.]

5.3.2 Finding Maximally-Disjoint Valley-Free Paths

To identify maximally-disjoint valley-free paths between any two hosts using the ToR-graph, we use a greedy-approach. The aim of the greedy approach is to identify paths passing through ASes not used by the default Internet path, aiding the selection of disjoint overlay paths. The greedy-approach finds shortest valley-free paths between hosts (in each iteration) by initiating an expanding-ring search around the source node towards the target node. Since the Internet selects shortest valley-free paths (dictated by routing policies) between hosts, by eliminating shortest paths first, the path found in the last iteration is most likely to be maximally disjoint from the primary path and identifies ASes not used on the direct path. One point of concern is that selecting overlay paths based on ASes on the most disjoint valley-free path will select more circuitous paths. In practice this concern is limited, as the ToR-graph is constructed using Customer-Provider relationships between ASes which are sighted on paths between overlay hosts. Consequently, the number of disjoint paths between any two hosts in the ToR-graph is not very large (two to three) (Section 5.4.2, Figure 5.6).

Figure 5.4 Optimal solution to the Edge-Disjoint Path problem in the Two-Layer ToR-graph. [Diagram: layers G (layer 1) and Rev G (layer 2) with nodes s, t, u and v, paths p1 and p2, and edges e and e’.]

Computing the shortest path in the AS graph to approximate the shortest Internet path is a challenging problem, as argued by [19]. This is due to two facts: sometimes the Internet does not select shortest paths due to BGP policies, and there may be more than one shortest-path with the same number of AS hops. However, these issues can be resolved as suggested in [19] by using additional criteria, such as making use of the fact that AS-paths are transitive and that 70% of AS-paths are symmetric. Since in this dissertation the ToR graph is constructed using only AS-paths between overlay end-hosts instead of reading BGP dumps, the ToR-graph is sparse and hence the number of paths between any pair of hosts is not large. Also, note that the aim of the greedy-approach is not to predict the shortest-path between hosts likely used by the Internet but, on the contrary, only to identify the ASes on the most-disjoint valley-free path.

We briefly formalize our technique for searching for edge-disjoint valley-free paths between source-destination hosts. Given a directed ToR-graph $G = (V, E)$ (where $v \in V$, $e \in E$) and two hosts $s$ (the source) and $t$ (the destination), the search-algorithm starts out with an empty solution set $S$ and in each subsequent iteration, the shortest available path is found between $s$ and $t$. Once a path $p_x$ is found, it is added to $S$, the edges used in the current path are deleted, and the process is repeated on the remaining graph until no further $s$-$t$ path can be found. The path found in the last iteration is taken as the candidate path which is maximally disjoint from the primary (direct) path between hosts.
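The iteration described above can be sketched as follows. This is a minimal illustration under assumptions, not the thesis implementation: the graph is a plain directed adjacency dictionary (node to list of successors) in which valley-freeness is assumed to be already encoded, for instance by running it on a two-layer graph, and breadth-first search stands in for the expanding-ring search. All names are hypothetical.

```python
from collections import deque


def shortest_path(adj, s, t):
    """Breadth-first search; returns a list of nodes from s to t, or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def greedy_disjoint_paths(adj, s, t):
    """Repeatedly take the shortest s-t path and delete the edges it used."""
    if s == t:
        raise ValueError("source and destination must differ")
    remaining = {u: list(vs) for u, vs in adj.items()}  # work on a copy
    solution = []
    while True:
        path = shortest_path(remaining, s, t)
        if path is None:
            return solution
        solution.append(path)
        for u, v in zip(path, path[1:]):
            remaining[u].remove(v)
```

The last element of the returned list plays the role of the candidate path, i.e. the path found in the final iteration.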

The time complexity of the implementation of this greedy-approach follows that of finding the maximum number of edge-disjoint paths between any two given hosts $s$ and $t$ in a graph $G = (V, E)$ through the Max-flow/Min-cut algorithm [17] and is $O(|E| \times |V|)$, where $|E|$ is the total number of edges and $|V|$ the total number of vertices in a graph. To quantify $|E|$ and $|V|$, we assume an overlay network with $N$ hosts; the number $P$ of AS-paths between hosts is $N^2$. Also, assume that the average number of ASes traversed on AS-paths between each pair of overlay hosts is $n$ (equivalently $n-1$ AS hops); $n$ is a small number, typically three to seven, since most end-hosts are within three to five AS-hops of the so-called Tier-1 ISPs in the core of the network (Figure 3.5). The worst-case time-complexity would be when all such $N^2$ AS paths between overlay hosts are completely vertex-disjoint (excepting terminal hosts) and hence would be $O(P^2) = O(N^4)$. In practice, it is much less because of the power-law model of the Internet [18] which shows sparse connectivity for a large number of hosts in the Internet; only about 1-2% of hosts are well connected at the AS level. Chen et al. [21] show that the number of paths $k$ which can be used to monitor the quality of all $N^2$ paths in an $N$-host overlay network is $O(N \lg N)$. Thus, the worst-case time-complexity of the greedy-approach for finding a maximally-disjoint alternate-path between a source-destination pair becomes $O(k^2)$ where $k = N \log N$. We consider this topic further in Chapter 6.
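To give a feel for the gap between the two bounds, the sketch below evaluates both growth expressions for a given number of hosts. It is only an illustration: it assumes a base-2 logarithm and treats the asymptotic expressions as if they were exact counts with a constant factor of one, which the analysis above does not claim. The function name is hypothetical.

```python
import math


def worst_case_costs(num_hosts):
    """Evaluate the N^4 and (N lg N)^2 growth expressions for N hosts."""
    n_paths = num_hosts ** 2                   # P = N^2 AS-paths
    naive = n_paths ** 2                       # P^2 = N^4
    k = num_hosts * math.log2(num_hosts)       # k = N lg N monitored paths
    reduced = k ** 2                           # k^2
    return naive, reduced
```

For a hypothetical overlay of 100 hosts, the first expression evaluates to 10^8 while the second is roughly 4.4 x 10^5, a reduction of more than two orders of magnitude.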


AS-path information can also be obtained by reading BGP dumps [20]. This is a strong motivation for the approach we propose here, since we do not want to trade one type of overhead (probing) for another (trace-routes). As this information is already distributed by routers in the network, it will not introduce additional traffic in the network. Also, such AS-path information needs to be updated only at infrequent intervals, since the majority of Internet paths are stable [51].

5.3.3 Comparison with Earliest Divergence Rule (EDR)

Fei et al. [6] showed that an Earliest Divergence Rule (EDR) (Chapter 2, Figure 2.5) can work well by selecting, from a list of potential alternate paths, an alternate path from the source to the destination which diverges from the default-path at the earliest point near the source. This technique assumes availability of AS level path information (from source overlay hosts to detouring overlay hosts). To show how finding maximally disjoint paths by using ToR graphs can yield better performance than EDR, we use anecdotal evidence from one of the datasets (AMP-146-30/Jun/2006). This Internet dataset has been described in detail in Chapter 3. Here we consider the direct path and the possible 120 one-hop overlay paths between two AMP monitors installed at the two extreme ends of the continental US: amp-ucb (at University of California, Berkeley) and amp-uvm (at University of Vermont). The direct AS-level path between amp-ucb and amp-uvm is:

Src --- 25 2152 2914 19548 19094 1351 --- Dst

This path has an average delay of 123 ms. Using the ToR graph, we find two disjoint (at AS-level) paths between amp-ucb and amp-uvm:

a.) 25 2152 3356 19094 1351

b.) 25 2153 11537 10578 1351

Note that the two paths are of equal length in this case, each traversing five ASes. Also, the direct AS level path is longer than both of the disjoint paths found by the greedy approach. We especially present this case to show that even when the underlying assumption about the shorter Internet paths is not met, a greedy strategy can still work. If we use the EDR in selecting a one-hop overlay path, we would normally go for paths diverging at the second AS, i.e. paths using AS 2153 instead of AS 2152 which is used in the direct Internet path. However, this discriminates poorly, as only 13 paths go through AS 2152 at the second AS hop and the remaining 107 go through AS 2153 at the second AS hop, so EDR admits almost all one-hop overlay paths. If, however, we further distinguish amongst paths based on the second disjoint path shown above and keep only the paths which go through ASes 11537 and 10578, our candidate path set is reduced from 107 to 7. Since these paths are disjoint from the direct path, the proportion of good paths among them is likely to be higher than with EDR, where we tend to choose almost all one-hop overlay paths. For example, the one-hop overlay path between amp-ucb and amp-uvm via amp-mit (at MIT) is one of these 7 paths. The average delay between amp-ucb and amp-uvm through amp-mit is 127 ms, just 4 ms greater than the (shorter) direct path delay. Thus, it can be expected to provide a good backup path should the direct path become congested.
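The two selection rules compared in this section can be sketched side by side. The code below is an illustration, assuming every AS path is a list of AS numbers ordered from source to destination; the function names are hypothetical, and the EDR function is a simplified reading of the rule in [6] rather than its original implementation.

```python
def divergence_index(direct, candidate):
    """Index of the first position at which candidate differs from direct."""
    for k, (a, b) in enumerate(zip(direct, candidate)):
        if a != b:
            return k
    return min(len(direct), len(candidate))


def edr_select(direct, candidates):
    """Keep the candidates that diverge earliest from the direct path."""
    if not candidates:
        return []
    earliest = min(divergence_index(direct, c) for c in candidates)
    return [c for c in candidates if divergence_index(direct, c) == earliest]


def through_all(candidates, required_ases):
    """Keep the candidates that traverse every AS in required_ases."""
    required = set(required_ases)
    return [c for c in candidates if required <= set(c)]
```

When many candidates diverge at the same early position, `edr_select` returns nearly all of them, whereas `through_all` narrows the set using ASes taken from a disjoint path found in the ToR graph.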

5.4 Performance Evaluation

5.4.1 Methodology used to construct ToR-graph

For this study, we use path and delay measurements collected between AMP [2] hosts. This Internet dataset has been described in detail in Chapter 3. While the aim of an overlay

network may only be to optimize the one-way delay, which may differ for different directions due

to asymmetric Internet paths, two-way delay-measurements, such as RTTs, have been shown [121]

to be strongly correlated (with a correlation-coefficient of 0.87) to one-way delays, and so form a

reasonable basis for inferring one-way delays.

To construct the ToR graph for the AMP dataset, we first identify all ASes used by paths between all possible AMP hosts in the AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006 virtual RONs. Note that we used the trace-route information between hosts for the purpose of this study, but it is also possible to obtain this information by reading BGP dumps as explained earlier; the only requirement is to have a reasonably good number of vantage points. AS-paths not found by this method can also be deduced indirectly using the fact that AS-paths are transitive [19]. We identified a total of 4400 unique IP-addresses from the IP trace-route information. Only a small fraction (7%) of total paths had incomplete or partially-complete trace-routes in the dataset. The next step was to map these IP addresses to AS numbers, for which we use the IP-to-ASN Whois Service from Cymru [122], which can provide mappings for user-specified dates using the GNU netcat utility [123]. Using the results from this service we identified a total of 275 unique ASNs. The RIPE dataset records paths at both the IP and AS level. We identified a total of 118 unique ASNs for the RIPE dataset. To find the relationships between these ASes (C-P, P-P, or S-S), we used the AS-relationships data from CAIDA [106], which is based on RouteViews [124]. We obtained the AS relationship data from dates close enough to match the datasets. For AMP, the AS relationship data used was obtained on 5th June 2006; for RIPE, it was obtained on 2nd August 2007.

To construct the ToR-graph, we identify all observed AS pairs in the AS-level paths between AMP hosts, and map edges between them based on C-P, P-P and S-S relationships. We use a similar procedure when computing the ToR graphs for the RIPE dataset, except that the extra IP-to-AS mapping step is not needed because the RIPE dataset already records paths at the AS level.
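The edge-mapping step can be sketched as follows. This is an illustration under assumptions: AS paths are lists of AS numbers, and the relationship data has already been loaded into a dictionary keyed by ordered AS pair. The dictionary layout and label strings are hypothetical and do not reproduce the file format of the CAIDA dataset.

```python
def label_observed_edges(as_paths, relationship):
    """Label every adjacent AS pair seen on the given paths.

    as_paths: iterable of AS-level paths (lists of AS numbers).
    relationship: dict mapping (as_a, as_b) to a label such as
                  'c2p', 'p2c', 'p2p' or 's2s', as seen from as_a.
    Returns (edges, unknown): labelled edges and pairs with no known label.
    """
    edges = {}
    unknown = set()
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            if a == b:
                continue          # consecutive hops inside the same AS
            label = relationship.get((a, b))
            if label is None:
                unknown.add((a, b))
            else:
                edges[(a, b)] = label
    return edges, unknown
```

Keeping the unlabelled pairs in a separate set makes it easy to see how many observed adjacencies are missing from the relationship data, which bears on the accuracy concerns discussed next.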

One important source of concern is the accuracy of Customer-Provider (and Peering)

relationships inferred from [106] as used in the ToR-graph. The methodology to obtain customer-

provider and peering relationships is based on collecting AS level paths through looking glass

servers recording BGP path advertisements and assigning customer, provider and peering/sibling

relationships to adjacent AS pairs so as to minimize anomalous paths (paths that violate the valley-

free routing principle) as shown by Gao et al. [63] and Battista et al. [117]. However, we note that

our ToR graphs are very sparse; they are constructed using customer-provider and peering

relationships between only 275 ASes for AMP and 118 ASes for RIPE. This minimizes the source

of such errors.

5.4.2 Network layer path characteristics inferred from ToR-graph

Since we use a heuristic approach for finding maximally-disjoint overlay paths, we first look at

AMP and RIPE data to evaluate the effectiveness of our proposed techniques. Chiefly, we are

interested in network layer path-characteristics between AMP and RIPE hosts such as the impact of

routing-policies on path-inflation and path-diversity using only the data that can be inferred from the ToR-graph.

To see the impact of routing-policies on path-inflation, i.e. to see if shortest paths were selected more often than not, we measured path-inflation on direct paths. We compute the shortest-paths

between AMP and RIPE hosts in the ToR-graph and compare them with the actual number of AS

hops on the direct-path using the trace-route information from the dataset. We find that the

majority of paths between AMP hosts (53%) and RIPE hosts (58%) were shortest-possible AS

paths. Only 27% of AMP paths and 31% of RIPE paths were inflated by one AS hop (Figure 5.5).


We also measure the total number of edge-disjoint paths found per source-destination pair

(Figure 5.6). Around 60% of AMP host pairs and RIPE host pairs have two or more edge-disjoint

paths. Note that these figures are very conservative estimates when we observe that about 10% of

the source destination pairs of the AMP dataset and 20% of source-destination pairs of RIPE dataset

do not have complete trace routes and so may have more than one edge disjoint path.

The ToR-graph may have some missing peering links or erroneous customer-provider links as

discussed in the previous section. Consequently, our results for path inflation and number of

disjoint paths between source-destination AS pairs may be slightly skewed in certain cases. For

example, some source-destination pairs may have shorter paths than those indicated due to missing

peering or customer provider links. Likewise, some source-destination pairs may have more disjoint

paths than those identified. However, we reiterate that the source of such errors is minimized due to the sparse nature of the ToR-graphs, formed with customer-provider and peering relationships between only 275 ASes for AMP and 118 ASes for RIPE.

Figure 5.5 Path inflation between (a) AMP and (b) RIPE hosts (AS-hops). [Bar charts: (a) AMP-146-30/Jun/2006 and (b) RIPE-40-05/Sep/2007; x-axis: path inflation (AS hops); y-axis: % of total number of paths.]

5.4.3 Performance-Evaluation of the Greedy-Approach Selection of Alternate Paths

The greedy-approach selects alternate-paths between source-destination pairs by ranking them on

the basis of their degree-of-disjointness from direct-paths. For this, we first use the traceroute

information on all possible one-hop indirect paths and compare the number of ASes which are

common between the indirect path and the candidate-path selected by our algorithm.

Figure 5.6 Number of disjoint paths between (a) AMP (top) and (b) RIPE hosts using ToR-graph. [Bar charts: (a) AMP-146-30/Jun/2006 and (b) RIPE-40-05/Sep/2007; x-axis: number of disjoint paths, with a separate bar for incomplete traceroutes; y-axis: percentage of source-destination pairs.]

We define the degree of disjointness ($\sigma_n$) of the $n$th overlay path as the ratio of the number of ASes that are common to the candidate valley-free disjoint-path computed by the greedy-approach (cdp) and the $n$th overlay path, to the number of ASes on the $n$th overlay path. We use this degree of disjointness to rank overlay paths. Thus, given the candidate-disjoint-path (cdp) between two AMP-hosts ($s$ and $d$) selected by the greedy-approach using the ToR-graph as the set $AS_{cdp} = [AS_s\ AS_w\ AS_x\ AS_y \dots AS_d]$, and the corresponding one-hop indirect-path between the same host-pair as another set $AS^{n}_{1\text{-}hop} = [AS_s\ AS_p\ AS_q\ AS_r \dots AS_d]$ (for the $n$th indirect-path), the degree-of-disjointness coefficient ($\sigma_n$) is given by (5-2):

$$\sigma_n = \frac{|AS^{n}_{1\text{-}hop} \cap AS_{cdp}|}{|AS^{n}_{1\text{-}hop}|} \qquad (5\text{-}2)$$

where $|X|$ denotes the number of elements in a set $X$.

An alternate path $n$ is selected by the greedy-approach if its degree of disjointness is greater than or equal to some threshold value $\sigma$, i.e. the $n$th alternate-path is selected if $\sigma_n \ge \sigma$. (Note that $\sigma$ used here is different from $\sigma$ in Equation 3-1.) We found that most $\sigma_n$ values were in the range of 0.2-0.7.
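The ranking coefficient and the threshold test can be sketched as below. The sketch assumes that the coefficient is the number of ASes shared between the one-hop path and the candidate disjoint path, divided by the number of distinct ASes on the one-hop path; the function names are hypothetical.

```python
def degree_of_disjointness(one_hop_path, cdp):
    """Share of the one-hop path's ASes that also lie on the candidate path."""
    one_hop = set(one_hop_path)
    return len(one_hop & set(cdp)) / len(one_hop)


def select_alternates(one_hop_paths, cdp, threshold):
    """Keep the one-hop paths whose coefficient meets the threshold."""
    return [p for p in one_hop_paths
            if degree_of_disjointness(p, cdp) >= threshold]
```

For instance, a one-hop path traversing four ASes, three of which also lie on the candidate disjoint path, has a coefficient of 0.75 and would be admitted at a threshold of 0.5.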

An interesting observation is that if there is only one edge-disjoint path in the ToR-graph between

a given source-destination pair (Figure 5.6), the greedy-approach may actually choose the shortest

path (if the direct-path is also not inflated) as opposed to more circuitous disjoint-path; greedy-

approach will thence select less-circuitous indirect-paths with better delay characteristics. Note that

this does not invalidate the effectiveness of the greedy-approach, since even selecting a shorter-

path between AMP hosts can still yield a path that is disjoint from the primary-route if the direct-

path is inflated (Fig 5.5); if the direct path is not inflated then it will admit almost all overlay paths.

Interestingly, we found out that for such source-destination pairs showing little or no path diversity,

even the most intuitive strategy like the EDR [6] was unable to select a small number of candidate

alternate-paths because a large number of alternate-paths diverged at the same AS hop. In such

situations, [6] proposed selecting paths based on additional path-performance criteria such as delay

constraints; the focus of this chapter is not to investigate such criteria; the performance is evaluated

strictly under the disjointness criteria mentioned previously.


Delay Gain of Selected Paths

We designate a direct-path as degraded using the definition of a path anomaly introduced in

Section 3.5. We next carried out simulations to analyze the fault-tolerance properties of maximally-

disjoint paths when the direct-path undergoes an outage. For this final performance evaluation we

consider the AMP dataset because as mentioned earlier in Chapter 3, RIPE datasets only provide

routing vectors as aggregate summary (number of times sighted between time intervals etc) so it is

difficult to ascertain what paths were exactly being used at specific time intervals between RIPE

hosts when the anomaly occurs on a direct path. Knowing this path information is very crucial for

the framework highlighted. For all AMP hosts, we observed intervals when the path between them suffered from outage or path degradation. We consider $k = 10$ to emulate outages and $k = 3$ to emulate performance failures as before (Chapter 4). We investigate which indirect-paths offer better

performance during the entire period when the direct path is degraded by using the time-stamps in

the RTT trace files in the AMP dataset [42]. We show the results in Figures 5.7 & 5.8. Figure 5.7

shows the reduction in the number of alternate paths selected and Figure 5.8 compares the delay

gain metric (Chapter 3) of the greedy-approach and that of EDR.

The first interesting observation is that EDR was unable to find a better alternate path for 10% of

the path outages and performance failures. This is because AS path information was not available

between all pairs of AMP hosts due to asymmetric nature of path probing/measurements between

some AMP-HPC and AMP-International hosts (Chapter 3).

The greedy-approach reduces the number of candidate paths selected compared to EDR, at a disjointness threshold of $\sigma = 0.5$, for 60% of the degradations encountered (subtracting the 10% of the cases where no path is selected by either technique because of incomplete/unavailable AS path

information). These figures agree with our previous observations in Figure 5.5. We had observed

that around 60-70% of the source-destination paths were shortest; inflated by at most one AS hop.

Moreover, we also observed that around the same percentage of source-destination pairs had

multiple (greater than one) edge-disjoint paths in the ToR graph. We plot the delay gain for the best

path from amongst those selected using greedy-approach. For performance comparison, we also

show the corresponding results of the EDR criteria [6] where those alternate paths are considered

whose AS paths separate from direct path nearest to the source.

Figure 5.7 Number of candidate paths selected by greedy-approach for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06. [CDF plots titled Path Outages and Performance Failures; x-axis: number of candidate paths selected; y-axis: CDF; series: Greedy, EDR.]

Figure 5.8 Delay gain of best path selected for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06. [Plots: x-axis: path outages (%) and performance failures (%); y-axis: delay gain (%); series: Best-Alternate-Path, Best-using-Greedy, Best-Using-EDR.]

Overall we observe that selecting alternate indirect-paths on the basis of AS disjointness, not only

reduced the number of potential choices drastically from 144 to fewer than 20 (Figure 5.7) in a

large majority of cases but it also finds the paths offering better delay gains in up to 90% of path

outages and performance failures (Figure 5.8). Note that both techniques were unable to find an

alternate path for around 10-15% of the outages and performance failures (Figure 5.7) because of

the incomplete/unavailable AS information (Figure 5.6a).

One interesting point worth noting based on the results of Figures 5.7 and 5.8 is that both EDR

and the greedy approach are able to find paths offering better delay gains for path outages emulated

for AMP-146-30/Jun/2006 indicated by the greater convexity of the curves in (Figure 5.8a) but do

not perform as well for finding paths for performance degradations (AMP-133-31/Aug/2006, Figure

5.8b). This is because both of these techniques tend to look for more disjoint, hence, more

circuitous paths which may tend to have higher delay than the degraded direct path if the magnitude

of degradation is small. Still, we observe that the greedy approach can select a path with a

performance very close to EDR for 90% of the performance degradations encountered while

selecting a smaller number of candidate paths.

Correlation of best-selected path with direct path

We also calculate the correlation of path delays of the best selected path using the greedy

approach with the direct paths. We compute the correlation as:

$$\mathrm{CORR}(X, Y) = \frac{\mathrm{COV}(X, Y)}{\sqrt{\mathrm{VAR}(X)\,\mathrm{VAR}(Y)}} \qquad (5\text{-}3)$$

where X and Y represent the random variable given by the path delays of direct path and best

selected path respectively.

For each pair of measured end hosts $a$ and $b$, we define $Z_{ab}(t)$ as the path delay between them at time $t$ and $Z_{acb}(t)$ as the delay of the path between $a$ and $b$ through an intermediate host $c$ at time $t$. If the total number of measurements is $K$, then we compute expected values as given below:

$$E[Z_{ab}] = \frac{1}{K}\sum_{t} Z_{ab}(t) \qquad (5\text{-}4)$$

$$E[Z_{ab} Z_{acb}] = \frac{1}{K}\sum_{t} Z_{ab}(t)\,Z_{acb}(t) \qquad (5\text{-}5)$$

Since delay measurements on the direct path and the selected alternate path may not be perfectly synchronized, the computed correlation may contain some error. However, the AMP-datasets used have timestamps for each recorded value of delay between AMP hosts, so we discard samples which are not within a window of 25 seconds.
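The correlation computation of Equations (5-3) to (5-5), together with the 25-second timestamp window, can be sketched as follows. This is a minimal illustration rather than the code used for the thesis results: the function names, the nearest-timestamp pairing rule and the sample values are all assumptions.

```python
from math import sqrt

def pair_samples(direct, alternate, window=25.0):
    """Pair each direct-path sample with the alternate-path sample closest in
    time, keeping only pairs whose timestamps differ by at most `window` s.
    (Nearest-timestamp pairing is an assumption made for this sketch.)"""
    pairs = []
    for t_d, z_d in direct:
        t_a, z_a = min(alternate, key=lambda s: abs(s[0] - t_d))
        if abs(t_a - t_d) <= window:
            pairs.append((z_d, z_a))
    return pairs

def correlation(pairs):
    """CORR(X, Y) = COV(X, Y) / sqrt(VAR(X) VAR(Y)), using sample averages
    over the K retained pairs as the expected values."""
    k = len(pairs)
    ex = sum(x for x, _ in pairs) / k
    ey = sum(y for _, y in pairs) / k
    exy = sum(x * y for x, y in pairs) / k
    var_x = sum(x * x for x, _ in pairs) / k - ex * ex
    var_y = sum(y * y for _, y in pairs) / k - ey * ey
    return (exy - ex * ey) / sqrt(var_x * var_y)

# Hypothetical (timestamp in seconds, delay in ms) samples.
direct = [(0, 100.0), (600, 120.0), (1200, 110.0), (1800, 130.0)]
alternate = [(10, 80.0), (605, 70.0), (1300, 75.0), (1810, 65.0)]

pairs = pair_samples(direct, alternate)  # the sample at t=1200 has no partner within 25 s
rho = correlation(pairs)                 # strongly negative for these made-up values
```

In this toy data the alternate-path delay falls whenever the direct-path delay rises, so the retained pairs are negatively correlated.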

Figure 5.9 shows the correlation of path-delay characteristics between the actual direct paths between AMP hosts undergoing degradation and the best alternate path selected using the greedy approach based path ranking. Results are shown for path outages in AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006. As can be seen from the figure, around 20% of the alternate paths selected exhibit negative correlation with the path-delay characteristics of the direct path, and 80% of the alternate paths show a correlation of less than 0.2. Only about 10% of the alternate paths exhibit a correlation of 0.6 or higher. This is because some one-hop overlay paths inevitably share underlay links with the direct path.

[Figure 5.9 plot: y-axis Correlation, -1 to 1; x-axis % Degradations (Normalized), 0 to 100; series Jun-06, Aug-06]

Figure 5.9 Correlation of path-delay characteristics between direct-path and best-alternate-path selected using Greedy Path Selection (Path Outages for AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006).

5.5 Chapter Summary

This chapter presented the second contribution of this thesis, the analysis of computing maximally-disjoint paths in overlay networks using ToR graphs. Disjoint path computation can be used as an offline heuristic to supplement measurement-based approaches [4], which are not scalable, or for alternate indirect-path computation when the direct path between two hosts is affected by a performance failure or an outage. We proposed and analyzed the performance of a greedy approach for computing such disjoint paths using real world Internet datasets. Our results show that such heuristics can be used to select alternate paths to bypass path outages or degradations.


PART III

PATH MONITORING IN OVERLAY NETWORKS


6 ISSUES OF STATISTICAL PATH MONITORING IN OVERLAY NETWORKS

6.1 Introduction

The previous part of this dissertation discussed scalable architectures that exploit the network layer topology of the overlay for disjoint path selection, thus reducing or eliminating path monitoring overheads. However, disjoint path selection does not always succeed. Recalling from the previous chapter, EDR and greedy selection failed to select a better alternate path, even though one was present, for about 10-15% of outages and performance failures. This is because the best path might not always be the maximally disjoint path. Path monitoring could be used as a fallback in these cases. Moreover, selecting alternate paths based on disjointness normally yields a smaller list of candidate paths (Chapter 5) rather than a single path, so the final selection still has to be made on the basis of path monitoring.

Path monitoring can help meet dynamic QoS demands better than merely ensuring path disjointness. For example, a longer and less congested disjoint overlay path between a source and destination host may still give higher delay than a shorter, congested direct Internet path. Also, path disjointness may vary with time because the underlay network has its own mechanism for rectifying problems in the Internet by switching over to alternate paths (even if it does so lazily). Consequently, overlay paths selected on physical disjointness criteria could already have become congested because the underlay network has shifted traffic from congested links to the uncongested links available on the selected disjoint overlay path.

Revisiting the problem, Andersen et al. [4] showed that when the direct-path between two

Internet hosts fails, an alternate path between them can be established using an overlay host whose

direct-paths to the source and destination host have not failed due to the spatial diversity of paths

(Figure 1.1).

The previous paragraphs emphasized the importance of overlay path monitoring. To recap, we go through a simple example which highlights both the importance of path monitoring and, as we will see later, the possibility of making it scalable. An overlay can find good detours only by aggressive path monitoring, because an overlay link is a logical abstraction of multiple underlay links. Two overlay links may seem disjoint at the application layer, yet share a link in the underlying IP layer. A failure of the shared IP link renders both useless. For example, consider the network in Figure 6.1(a). Assume that each link has unit weight and shortest paths are selected between two nodes. If link $l$ fails, it disconnects source $S$ from destination $D$. It also renders both overlay hosts $R_1$ and $R_2$ useless for $S$ to reach $D$ using a single overlay hop, as $S$ needs $l$ to reach $R_2$ and $R_1$ needs it to reach $D$. In this case $S$ can only reach $D$ through $R_3$ or through the two-hop overlay route $S \rightarrow R_1 \rightarrow R_3 \rightarrow D$. This requires that overlay hosts constantly monitor individual overlay links to successfully detour the traffic via an appropriate overlay node in the event of failure on the underlay network.
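The detour reasoning in this example can be sketched in a few lines. The mapping from overlay links to underlay links below is hypothetical (the link names other than $l$ are invented and do not reproduce Figure 6.1(a)); it is chosen only so that the shared link $l$ behaves as described in the text.

```python
# Underlay links traversed by each (directed) overlay link. Hypothetical
# topology: only the role of the shared link "l" follows the text.
underlay = {
    ("S", "D"): {"l"},
    ("S", "R1"): {"a"},
    ("R1", "D"): {"b", "l"},
    ("S", "R2"): {"l", "c"},
    ("R2", "D"): {"d"},
    ("S", "R3"): {"e"},
    ("R3", "D"): {"f"},
    ("R1", "R3"): {"g"},
}

def usable(hop, failed):
    """An overlay link works if it exists and avoids every failed underlay link."""
    return hop in underlay and not (underlay[hop] & failed)

def one_hop_relays(src, dst, relays, failed):
    """Relays r such that both src->r and r->dst avoid the failed links."""
    return [r for r in relays
            if usable((src, r), failed) and usable((r, dst), failed)]

def two_hop_routes(src, dst, relays, failed):
    """Ordered relay pairs (r1, r2) such that src->r1->r2->dst is usable."""
    return [(r1, r2) for r1 in relays for r2 in relays
            if r1 != r2
            and usable((src, r1), failed)
            and usable((r1, r2), failed)
            and usable((r2, dst), failed)]

relays = ["R1", "R2", "R3"]
failed = {"l"}
direct_ok = usable(("S", "D"), failed)
single = one_hop_relays("S", "D", relays, failed)
double = two_hop_routes("S", "D", relays, failed)
```

With link `l` failed, the direct path and the relays R1 and R2 are unusable for a single overlay hop, leaving R3 and the two-hop route through R1 and R3.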

To be able to establish such alternate paths quickly in overlay networks it is important to monitor all such possible indirect paths through probing. However, when the size of the overlay network is large, probing generates excessive overhead [4]. Maintaining complete state about all overlay links requires, in the ideal case, that all $N$ hosts be connected as a logical mesh or clique (Figure 6.1(b)). Subsequent probing to measure end-to-end path metrics between overlay hosts, and dissemination of the results via a link state protocol, incurs maintenance overheads of $O(N^2)$. This poor scalability limits the size of deployed overlay networks. On the other hand, maintaining complete overlay state without knowledge of the topological diversity of individual overlay hosts may be counterintuitive when we consider that the locations of path and performance failures are not known a priori, are often correlated and vary on very small time scales.

RON [4] aimed to bypass path failures using application specific metrics e.g. throughput, loss

rate, latency and routing through any of the possible indirect overlay hosts which are probed

aggressively incurring large overheads. Such path exploration techniques are not scalable above

modest network sizes. Previous works [7, 125] showed that the large degree of underlay link

sharing among paths enables an overlay to only monitor a carefully selected subset of the paths and

then to statistically predict the path metrics of the remaining paths.

[Figure 6.1 diagram: source S, destination D, overlay hosts R1, R2 and R3, and underlay link l]

Figure 6.1 (a) (left) How overlay resilience depends on the topology of the underlay network. (b) Inferring maximum information about all virtual overlay links.


This chapter presents the third main contribution of this thesis, namely detecting and identifying the cause of statistical path prediction errors. First, Section 6.2 describes the related algebraic notation. Section 6.3 evaluates the degree of independence of paths in AMP and RIPE networks by determining the rank of their Routing Matrices (previously introduced in Section 1.1). In Section 6.4 we present the technique for monitoring a subset of paths and predicting the remaining path metrics using the Best Linear (BL) statistical prediction algorithm (proposed earlier [8]), and apply it to RIPE and AMP routing matrices. We find that BL statistical path prediction can suffer from errors that are due to inconsistencies in routing matrices. So in Section 6.5 we review what causes these Routing Matrix Inconsistencies (RMI), quantify the extent of RMI in RIPE and AMP datasets, and discover that RMI can be difficult to remove. Consequently, in Section 6.6 we introduce statistical prediction techniques that are robust against the effects of RMI. Section 6.7 reviews the practical improvement in anomaly prediction in the presence of RMI using our proposed technique. Section 6.8 summarizes the key findings of the chapter by providing a brief discussion. Section 6.9 concludes the chapter.

6.2 Algebraic Notation

We begin by establishing some relevant notation and definitions. Let $G = (\nu, \varepsilon)$ be a strongly connected directed graph, where the vertices in $\nu$ represent network devices (routers and end-hosts) and the edges in $\varepsilon$ represent links between those devices. Additionally, let $\rho$ be the set of all paths between end-hosts in the network (pre-determined by commercial Internet routing policies), and let $n_v = |\nu|$, $n_e = |\varepsilon|$ and $n_p = |\rho|$ denote, respectively, the number of devices, links, and paths.

Many network path characteristics are additive over their constituent elements; e.g. the delay of a path can be represented as the sum of its constituent link delays $l_i$ (Figure 6.2):

$$\text{Path delay } P_d = \sum_{i=1}^{m} l_i \quad \text{where } l_i \in P \qquad (6\text{-}1)$$

Packet loss rates, on the other hand, are not additive but multiplicative in nature. If each of the constituent links on a path drops packets with a probability $p_i$, then the probability $\Pr$ with which packets will be dropped on the path satisfies $1 - \Pr = \prod_{i=1}^{m} (1 - p_i)$. However, such multiplicative metrics can also be converted into additive metrics by taking logarithms on both sides, i.e. $\lg(1 - \Pr) = \sum_{i=1}^{m} \lg(1 - p_i)$.
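The additive and multiplicative cases above can be checked numerically. The sketch below uses made-up per-link delays and loss probabilities and assumes that losses on different links are independent; it is an illustration, not measured data.

```python
from math import log10, prod

# Hypothetical per-link values for a three-link path.
link_delays_ms = [2.0, 5.5, 12.5]
link_loss = [0.01, 0.02, 0.05]

# Additive metric (Equation 6-1): path delay is the sum of link delays.
path_delay = sum(link_delays_ms)

# Multiplicative metric: a packet survives the path only if it survives
# every link, so 1 - Pr = prod(1 - p_i).
path_delivery = prod(1 - p for p in link_loss)
path_loss = 1 - path_delivery

# Taking logarithms turns the product into a sum, which is additive again.
lhs = log10(1 - path_loss)
rhs = sum(log10(1 - p) for p in link_loss)
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is what allows loss rates to be handled by the same linear algebra as delays.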

Other network characteristics can also be concave in nature; for example, bandwidth. The bandwidth available on a path is the bandwidth of the bottleneck link, i.e. the link with the least bandwidth, and so cannot be expressed in the algebraic manner explained above. The statistical path estimation approaches outlined in this chapter are primarily concerned with additive network characteristics, where the sole objective is to predict end-to-end network characteristics while measuring only a subset of end-to-end paths. Non-additive network characteristics such as bandwidth require measurements at finer granularity than end-to-end path measurements, which is outside the scope of this chapter. Other studies, e.g. iPlane [126], have developed techniques for estimating the bandwidth on a path using vantage points inside the network that measure link attributes by probing paths from the vantage points to intermediate routers in the network.

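[Figure 6.2 diagram: path delay $P_d$ spanning links $l_1, l_2, l_3, \dots, l_m$]

Figure 6.2 Additive Network Metrics.

If we use the vector $b \in \mathbb{R}^{n_e}$ to denote the measurement of a metric on each edge $j \in \varepsilon$ of the graph, then the vector $y \in \mathbb{R}^{n_p}$ of path measurements is given by:

$$y = Mb \qquad (6\text{-}2)$$

where $M \in \{0,1\}^{n_p \times n_e}$ is a routing matrix in which:

$M_{i,j} = 1$ if path $i$ traverses link $j$

$M_{i,j} = 0$ otherwise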

Figure 6.3 gives an example of a network and the corresponding routing matrix and measurement vectors. The measurements could be of any performance metric such as delays or loss rates.

The column (or row) rank of a matrix such as $M$ is the number of linearly independent columns (or rows) in that matrix. If one measures $r = \mathrm{Rank}(M)$ linearly independent paths, then the path metrics of the entire network can be determined exactly. Section 6.3 will show that the routing matrices for large Internet overlay networks are 'rank deficient', in the sense that their rank is smaller than either dimension of their matrices, i.e. $r < \min(n_p, n_e)$. For such networks, it is only necessary to measure as many paths as the rank of the routing matrix [7]. When limited resources force measurement of fewer than $r$ paths, the performance of the other paths can be estimated statistically to predefined tolerance levels [125].
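To make the notion of rank deficiency concrete, the sketch below finds a maximal set of linearly independent rows (paths) of a small routing matrix by exact Gaussian elimination. The three-path matrix and the link metrics are hypothetical, and the greedy row scan is a plain illustration rather than the path selection algorithm used later in this chapter.

```python
from fractions import Fraction

def row_basis(matrix):
    """Return indices of a maximal set of linearly independent rows, found by
    scanning rows in order and reducing each against the rows kept so far."""
    basis = []    # (pivot_column, reduced_row) for every row kept
    chosen = []
    for idx, row in enumerate(matrix):
        r = [Fraction(x) for x in row]
        for pivot, b in basis:
            if r[pivot] != 0:
                factor = r[pivot] / b[pivot]
                r = [ri - factor * bi for ri, bi in zip(r, b)]
        pivot = next((j for j, x in enumerate(r) if x != 0), None)
        if pivot is not None:       # row adds a new independent direction
            basis.append((pivot, r))
            chosen.append(idx)
    return chosen

# Hypothetical routing matrix: path 3 traverses exactly the links of
# path 1 plus the link of path 2, so its row is the sum of the other two.
M = [[1, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
independent = row_basis(M)
rank = len(independent)

# Hypothetical link metrics b give path metrics y = M b.
b = [3, 4, 5]
y = [sum(m * x for m, x in zip(row, b)) for row in M]
```

Here the rank is 2, which is smaller than min(n_p, n_e) = 3, and the metric of the unmeasured third path follows exactly from the two measured ones (`y[2] == y[0] + y[1]`).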

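[Figure 6.3 diagram: three hosts A, B and C connected by links $l_1$, $l_2$, $l_3$ with link metrics $\beta_1$, $\beta_2$, $\beta_3$ and path measurements $y_1$, $y_2$, $y_3$]

$$M = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad b = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \quad Y = Mb$$

$$D = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} \quad \text{(rows and columns indexed by } l_1, l_2, l_3 \text{)}$$

Figure 6.3 Algebraic method of path monitoring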


6.3 Routing matrices and Eigen Spectra of AMP and RIPE data sets

We use path and delay measurements collected between AMP and RIPE hosts. For estimating the routing matrix from traceroutes, we treat the virtual IP interface-pair links as real router-to-router links. The details have been described in Chapter 3. The datasets considered in this chapter were collected during three 24-hr periods on June 30 and August 31, 2006 (AMP) and September 5, 2007 (RIPE). RIPE uses (i) one-way path delay values, owing to the provision of GPS synchronization in its hosts, compared to RTT estimates for path delays in AMP, and (ii) dedicated software for estimation of the routing vectors (IP and AS level), compared to traceroute estimation in AMP. As a result, it yields far better results than the AMP datasets for prediction of unmonitored path properties, which suggests that ordinary traceroutes may yield less than satisfactory results in computing a routing matrix, as we will see later.

6.3.1 Extent of rank-deficiency

Table 6.1 shows the dimensions of the routing matrices in terms of the number of paths and underlay links, together with their ranks. The Rank Deficiency (RD) of a routing matrix is defined as:

$$\text{Rank Deficiency } (RD) = \log(\min(n_p, n_e) - r), \quad r = \mathrm{Rank}(M) \qquad (6\text{-}3)$$

To get a feel for the extent to which the number of measured paths can be reduced below $r$, we can consider the eigen-spectrum of the routing matrix, which indicates the degree of linear dependence between the rows of a matrix. The eigen-spectrum is obtained through Singular Value Decomposition (SVD) of the matrix $D = M^T M$, and the spectra for the two Internet datasets AMP and RIPE are shown in Figure 6.4.

Table 6.1 Dimensions and rank of AMP and RIPE routing matrices.

Dataset               Paths (np)   Links (ne)   Rank (r)   RD = log(min(np,ne)-r)
RIPE-40-05/Sep/2007   1499         2690         673        2.92
RIPE-30-05/Sep/2007   622          1693         385        2.37
AMP-50-30/Jun/2006    1700         1239         485        2.88
AMP-40-31/Aug/2006    935          812          350        2.66
AMP-30-30/Jun/2006    594          747          249        2.55
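The rank deficiency of Equation (6-3) is straightforward to evaluate. The sketch below assumes a base-10 logarithm, which is an inference from the tabulated values rather than something stated in the text, and checks it against two rows of Table 6.1.

```python
from math import log10

def rank_deficiency(n_p, n_e, r):
    """RD = log(min(n_p, n_e) - r); base 10 is assumed here because it
    matches the values tabulated for the rows below."""
    return log10(min(n_p, n_e) - r)

# Dimensions and ranks taken from Table 6.1.
rd_ripe40 = round(rank_deficiency(1499, 2690, 673), 2)
rd_amp50 = round(rank_deficiency(1700, 1239, 485), 2)
```

For RIPE-40 the smaller dimension is the number of paths (1499), whereas for AMP-50 it is the number of links (1239).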


The diagonal elements of the matrix $D$ are precisely the number of paths routed over their respective links, referred to as the betweenness of the links. Likewise, the off-diagonal elements measure the number of paths routed simultaneously over pairs of links, referred to as the co-betweenness of the links. The co-betweenness $D_{i,j}$ of any two edges $i$ and $j$ will always be bounded above by the smaller of the two edges' betweennesses, i.e. $D_{i,j} \le \min(D_{i,i}, D_{j,j})$. Chua et al. [8] showed that the behavior of the eigen-spectrum is related to the diagonal; the spectral decay of $M$ at worst parallels the edge betweenness in the graph $G$.
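As a small check of these definitions, the sketch below forms $D = M^T M$ for the three-path routing matrix of Figure 6.3 and reads off the betweenness and co-betweenness values. The helper name `gram` is ours.

```python
# Routing matrix from Figure 6.3 (rows are paths, columns are links l1..l3).
M = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]

def gram(m):
    """Compute D = M^T M: entry (i, j) counts the paths using both link i and link j."""
    n_e = len(m[0])
    return [[sum(row[i] * row[j] for row in m) for j in range(n_e)]
            for i in range(n_e)]

D = gram(M)
betweenness = [D[i][i] for i in range(len(D))]

# Co-betweenness can never exceed the betweenness of either link.
bound_holds = all(D[i][j] <= min(D[i][i], D[j][j])
                  for i in range(len(D)) for j in range(len(D)))
```

Each link carries two of the three paths, and every pair of links shares exactly one path, which reproduces the matrix $D$ shown in Figure 6.3.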

The rapid decay of the spectrum shows the degree of non-trivial link sharing amongst paths; the knee occurs when only 1% of the rank $r$ paths have been included, and it is interesting to note that only 20-50% of the rank $r$ paths (note the log scale) can be used to draw meaningful inference about the path metrics. Also note that the eigen-spectra of AMP networks show faster decay than those of similarly sized RIPE networks. Subsets of AMP and RIPE hosts are selected to make the comparison more meaningful, as discussed earlier in Section 3.1. This means that the amount of linear dependence amongst paths is greater in AMP networks than in RIPE networks. To support this point further, we show in Figure 6.5 the degree of the ASes of the RIPE and AMP datasets considered, on a normalized scale to cater for the differences in the number of ASes in the two datasets. The AS degrees for AMP fall more sharply than those for RIPE, showing that there is more path sharing in AMP networks than in RIPE networks. As we see later, routing matrix inconsistencies can amplify the effects of statistical path prediction errors in AMP networks due to the greater degree of path sharing as compared to RIPE networks.

[Figure 6.4 plots (two panels, each titled Eigen Spectra): y-axis Eigen Values of M'M (Normalized), log scale, 0.01 to 1; x-axis Fraction of rank, log scale, 0.001 to 1, with 0.2 and 0.5 marked; series RIPE-30-05/Sep/2007 and AMP-30-30/Jun/2006 (first panel), RIPE-40-05/Sep/2007 and AMP-40-31/Aug/2006 (second panel)]

Figure 6.4 Eigen Spectra of AMP and RIPE Networks.


6.4 Selecting a Subset of Paths for Monitoring and Predicting the Unmonitored Paths Using Best Linear Predictor

As described in the previous section, in order to completely infer network performance one needs to monitor paths corresponding to the $r$ largest (i.e. all non-zero) singular values (the square roots of the eigenvalues). To save monitoring overheads, we can monitor a subset of $k$ paths ($k \le r$) out of the rank $r$ paths, corresponding to the $k$ largest singular values. From this subset of paths we can estimate the link metrics vector, from which we can estimate the metrics for the remaining paths. Finding such a subset of paths is an NP-complete problem; however, approximation algorithms [8, 127] exist for selecting paths approximating the $k$ largest singular dimensions.

[Figure 6.5 plot: y-axis AS degree, log scale, 1 to 10000; x-axis ASes sorted according to degree (normalized), log scale, 0.001 to 1; series RIPE-40-05/Sep/2007, AMP-40-31/Aug/2006]

Figure 6.5 AS degree for RIPE and AMP networks.

Picking a subset of paths ($k < r = \mathrm{Rank}(M)$) involves selecting paths that have the highest singular dimensions, as explained earlier. We use the same algorithm as [8], which is an adaptation of the subset selection algorithm to the case where the path metrics are a sum of link metrics. We denote the routing matrix by $M$ and weight it by a matrix $C$ derived from the link covariance matrix $\Sigma$ (with $CC^T = \Sigma$), in order to assign higher weights to paths that are more variable. The algorithm first factorizes $MC \in \mathbb{R}^{n_p \times n_e}$ using SVD into two orthogonal matrices $U$ and $V$:

$$\mathrm{SVD}(MC) = U S V^T \qquad (6\text{-}4)$$

where $CC^T = \Sigma$ ($\Sigma$ is the link covariance matrix), $U \in \mathbb{R}^{n_p \times n_p}$ and $V \in \mathbb{R}^{n_e \times n_e}$, such that

$$U^T (MC) V = S = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_p) \in \mathbb{R}^{n_p \times n_e},$$

$p = \min(n_p, n_e)$ and $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0$.

The left singular vectors (i.e. the columns of $U = [u_1, u_2, \dots, u_{n_p}]$) form an orthogonal basis for the range of $MC$, and the magnitude of their corresponding singular values indicates their relative importance. Note that these singular values are the square roots of the eigenvalues of $(MC)^T MC$. The algorithm makes heuristic use of QR factorization with column pivoting to find $k$ ($k \le r$) rows of $M$ that approximate the span of the first $k$ left singular vectors of $MC$:

$$U_k^T P_k = QR \qquad (6\text{-}5)$$

where $U_k \in \mathbb{R}^{n_p \times k}$ is formed by the first $k$ columns of $U$, and $P_k \in \mathbb{R}^{n_p \times n_p}$ is the permutation matrix. $M_s$ is then the submatrix formed by the first $k$ rows of $P_k^T M$. The complete algorithm is described in Algorithm 1.

The GLS based estimate of the link metrics vector is used in the Best Linear (BL) prediction of unmonitored path delays as in [8]. We use the following equation from [8] to obtain the estimated delays on unmonitored paths (see the Appendix for its derivation from the estimated link-metrics vector (A-7)):

$$E(l_r^T y_r \mid y_s) = l_r^T V_{rs} V_{ss}^{-1} y_s \qquad (6\text{-}6)$$

where $l_r$ is a column vector for selecting one particular unmonitored path, and $V_{rs} = M_r \Sigma M_s^T$ and $V_{ss} = M_s \Sigma M_s^T$ are the covariance between the unmonitored and monitored paths and between the monitored paths, respectively. $\Sigma$ is the link covariance matrix.
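A toy instance of Equation (6-6) may help. The sketch below assumes an identity link covariance matrix ($\Sigma = I$), so that $V_{rs} = M_r M_s^T$ and $V_{ss} = M_s M_s^T$, and uses a hypothetical three-link network with two monitored paths; the helper names are ours and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction

def solve(a, rhs):
    """Solve a x = rhs for a square, invertible matrix a (Gauss-Jordan)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs[i])]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def bl_predict(M_s, M_r, y_s):
    """Predict unmonitored path metrics as V_rs V_ss^{-1} y_s with Sigma = I."""
    V_ss = [[sum(a * b for a, b in zip(ri, rj)) for rj in M_s] for ri in M_s]
    V_rs = [[sum(a * b for a, b in zip(ri, rj)) for rj in M_s] for ri in M_r]
    w = solve(V_ss, y_s)                      # w = V_ss^{-1} y_s
    return [sum(v * wi for v, wi in zip(row, w)) for row in V_rs]

# Hypothetical example: link metrics (3, 4, 5); monitored paths use
# links {1, 2} and {3}, so their measured metrics are 7 and 5.
M_s = [[1, 1, 0], [0, 0, 1]]
y_s = [7, 5]

in_span = bl_predict(M_s, [[1, 1, 1]], y_s)       # path over all three links
out_of_span = bl_predict(M_s, [[1, 0, 0]], y_s)   # path over link 1 only
```

The first unmonitored path lies in the span of the monitored rows and is recovered exactly (12). The second does not, and the predictor splits the measured value 7 evenly between links 1 and 2, giving 3.5 against a true value of 3.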


Using only path information obtained from traceroutes it is difficult to infer second order link characteristics such as link covariance or link correlation. We present in Figure 6.6 the link correlation matrices for AMP-30 for all links exhibiting a correlation of 0.25 or more. Figure 6.6a shows the correlation matrix for intraAS links (with links inside one AS grouped together). Figure 6.6b shows the correlation between interAS links; besides the main diagonal, where each element is one, links in different ASes and the interAS links (the off-diagonal elements) erroneously seem to show significant correlation because of insufficient traceroute information. There is more correlation between intraAS links than between interAS links. The RIPE datasets only report routing vectors, so a similar analysis of RIPE is not possible. Thus, the performance of the BL predictor is evaluated under an identity link covariance matrix for both AMP and RIPE datasets. Chua et al. [8] find that using an identity link covariance matrix gives satisfactory results.

Algorithm 1 (Based on Algorithm 12.2.1 [127]). Given a path matrix $M \in \{0,1\}^{n_p \times n_e}$ and the corresponding path delay vector $y \in \mathbb{R}^{n_p}$, where $n_p$ and $n_e$ are the number of paths and links respectively in the network, the following algorithm computes a subset $M_s$ of the path matrix $M$ by selecting the $k$ rows that approximate the span of the first $k$ left singular vectors.

Compute the Singular Value Decomposition (SVD) of $MC$:

    $\mathrm{SVD}(MC) = U S V^T$, where $CC^T = \Sigma$ ($\Sigma$ is the link covariance matrix)

    ($U$ and $V$ hold the left and right singular vectors and $S$ is a diagonal matrix whose diagonal elements hold the singular values in sorted order.)

for k = 1:1:rank r

    Apply QR factorization with column pivoting to $U_k^T$, where $U_k = U(:, 1{:}k)$ (i.e. the first $k$ columns):

        $QR = U_k^T P_k$

    $M_{new} = P_k^T M$ and $y_{new} = P_k^T y$

    $M_s = M_{new}(1{:}k, :)$ and $M_r = M_{new}(k{+}1{:}n_p, :)$, where $y_s$ and $M_s$ refer to the monitored paths / path matrix rows

    $y_s = y_{new}(1{:}k, :)$ and $y_r = y_{new}(k{+}1{:}n_p, :)$, where $y_r$ and $M_r$ refer to the unmonitored paths / path matrix rows

endfor

[Figure 6.6 plots (two panels, each titled Correlation matrix - link(i,j)): axes link i and link j]

Figure 6.6 Problems in estimating of second order link metrics from traceroutes; link correlation matrices for AMP-30-30/Jun/2006. (a)(top) intra AS links; (b) interAS links


To quantify the accuracy of BL path prediction (Equation 6-6), we use the L1 error metric, which is defined as:

$$L_1\text{-error} = \frac{\lVert \text{actual delay vector} - \text{predicted delay vector} \rVert_1}{\lVert \text{actual delay vector} \rVert_1} \qquad (6\text{-}7)$$

where $\lVert \cdot \rVert_1$ represents the $l_1$-norm of a vector.
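The L1 error metric of Equation (6-7) can be written directly. The delay values in this sketch are made up for illustration.

```python
def l1_error(actual, predicted):
    """Relative L1 error: ||actual - predicted||_1 / ||actual||_1."""
    num = sum(abs(a - p) for a, p in zip(actual, predicted))
    den = sum(abs(a) for a in actual)
    return num / den

# Hypothetical actual and predicted path delays in ms.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
err = l1_error(actual, predicted)   # (2 + 2 + 0) / 60
```

An error of 0 means every unmonitored path delay was predicted exactly; because the metric is normalized by the total actual delay, values for networks of different sizes are comparable.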

Figure 6.7 shows the L1 error for RIPE and AMP networks as the number of monitored paths is increased. While the L1 error for RIPE decreases monotonically, AMP shows anomalous behavior, contrary to expectations, in the form of erratic spikes as the number of monitored paths is increased. This is due to errors in the estimation of routing matrices for AMP networks, which we explain in detail in the next section.

[Figure 6.7 plots (two panels): y-axis L1-error, 0 to 1; x-axis Number of monitored paths, 0 to 1; series RIPE-40-05/Sep/2007 and RIPE-30-05/Sep/2007 (first panel), AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006 (second panel)]

Figure 6.7. L1 error for RIPE and AMP networks as a function of monitored paths


6.5 Routing Matrix Inconsistencies

6.5.1 How RMI occurs?

Traceroute is the simplest and most common tool for inferring topological information about the network. However, it is also notorious for revealing inaccurate or even false information about the topology of the IP network, as found by the previous Internet topology mapping projects Skitter (now Ark) [52], RocketFuel [128] and Mercator [55]. Specialized probing combined with heuristics such as MaxDelta [55] and Maximum Likelihood Estimation [54] can resolve many of the topology mapping errors, but requires intensive network measurements. We also note that estimating a topology is a different problem from estimating a routing matrix, because mismapping of even a few links can cause algebraic/statistical methods for path prediction to return large prediction errors, as we show later.

Consider two simple examples. In Figure 6.8, consider an AS with six routers employing load

balancing. This AS sends probes between two edge routers S and D , either using the path

SABD or the path SXYD according to its internal routing policies based on internal link

congestion.

Figure 6.8 Load balancing inside an AS.

In Figure 6.9, the traceroute infers the incorrect path $SAYD$. This is attributed to load balancing decisions by routers inside the AS. While the probes with TTL=1 and 3 are sent on one path, a probe with TTL=2 is sent on a different path. This leads to the insertion of a false link $AY$ in the routing matrix. Load balancing decisions are typically based on packet headers. Traceroute is known to modify the Destination Port field when sending UDP probes and the Sequence Number field when sending ICMP Echo probes, so that it can match router responses with the probes which elicited them. Some newer routers, e.g. Juniper, allow up to 16 equal cost paths for load balancing inside ASes [129]. The path inference problem in the presence of routers using load balancing is further exacerbated when traceroutes send multiple probes per hop; Augustin et al. [129] found that up to 79% of the paths were incorrectly inferred in their study due to the effects of multiple probing. We refer to all such issues as Routing Matrix Inconsistencies (RMI) in the remainder of this chapter.
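The way load balancing produces a false link can be reproduced with a toy model. The sketch below takes the two internal paths $SABD$ and $SXYD$ from the text and assumes a simplified traceroute in which the probe with TTL $t$ reports the $t$-th router of whichever path that probe happened to follow; the labels `upper` and `lower` are ours.

```python
# The two equal-cost paths through the load-balanced AS (from the text).
paths = {"upper": ["S", "A", "B", "D"], "lower": ["S", "X", "Y", "D"]}

def traceroute(path_for_ttl):
    """Simplified traceroute: the probe with TTL t expires at the t-th
    router of the path that the load balancer chose for that probe."""
    hops = ["S"]
    for ttl in sorted(path_for_ttl):
        hops.append(paths[path_for_ttl[ttl]][ttl])
    return hops

def links(hops):
    """Consecutive router pairs of a hop list."""
    return set(zip(hops, hops[1:]))

true_links = links(paths["upper"]) | links(paths["lower"])

# Probes with TTL 1 and 3 follow one path, the probe with TTL 2 the other.
inferred = traceroute({1: "upper", 2: "lower", 3: "upper"})
false_links = links(inferred) - true_links
missed_links = links(paths["upper"]) - links(inferred)
```

The inferred hop list is S, A, Y, D: the link A-Y does not exist in the network, while the real links A-B and B-D are missed, so the corresponding row of the routing matrix is wrong in three columns.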

Figure 6.10 shows the frequency of path changes observed at 10 minute intervals over a 24 hr period in AMP networks. While most paths are stable, around 30 and 100 paths exhibit high variation for AMP-30 and AMP-50 respectively. Note that the AS level paths do not vary here; it is only the hops inside one (or more) of the ASes that vary. This suggests that load balancing may be employed in some of the networks.
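Counting path variations of this kind amounts to comparing successive traceroute snapshots of the same path. The sketch below uses the hops inside AS11537 on the amp-upenn to amp-hawaii path at three consecutive 10 minute snapshots, as listed in Figure 6.13; the function name is ours.

```python
def count_path_changes(snapshots):
    """Number of consecutive snapshot pairs whose hop lists differ."""
    return sum(1 for prev, cur in zip(snapshots, snapshots[1:]) if prev != cur)

# Hops inside AS11537 for amp-upenn -> amp-hawaii (Figure 6.13),
# at 12:18:04, 12:28:02 and 12:38:10.
snapshots = [
    ["198.32.8.82", "198.32.8.77", "198.32.8.81", "198.32.8.13",
     "198.32.8.1", "198.32.8.94"],
    ["198.32.8.82", "198.32.8.77", "198.32.8.81", "198.32.8.13",
     "198.32.8.49"],
    ["198.32.8.82", "198.32.8.77", "198.32.8.81", "198.32.8.13",
     "198.32.8.49"],
]
changes = count_path_changes(snapshots)
```

Only the first transition differs, so this path contributes one variation over these three snapshots.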

[Figure 6.9 diagram: inferred path through routers S, A, Y and D]

Figure 6.9 Incorrect path inference: some links are missed while other false links are added.


Figure 6.11 shows anecdotal evidence of RMI from the AMP June dataset. Consider the first example, where an RMI can occur on the path between amp-upenn and amp-hawaii. Possibly due to some load balancing mechanism inside AS 11537, the total number of hops decreases from 17 to 16. At the same time, we notice the path delay decreasing from 154 ms to 122 ms. This 32 ms decrease could be attributed to the selection of a lower delay path inside AS11537 or a different egress point from AS11537 towards AS7575. Note that we could not ascertain the AS number of the IP hop 207.231.240.4, which may be a router inside either AS. This example illustrates a risk in using path measurements to infer link measurements: if the path changes, the delay can change, which may lead to incorrect inference of link measurements until it is recognized (by the traceroute run every 10 minutes) that the path has changed. It also shows a case of the diamond anomaly [129], caused by traceroute probes probing multiple paths between two routers inside a load balanced AS (Figure 6.9).

[Figure 6.10 plot: y-axis Number of path variations, 0 to 40; x-axis Paths (log scale), 1 to 10000; series AMP-30-30/Jun/2006, AMP-50-30/Jun/2006]

Figure 6.10 Frequency of path variation in AMP networks over 24 hr period


amp-upenn->amp-hawaii

Fri Jun 30 12:18:04 PDT 2006
(Hop)  (IP address)      (AS)    (delay1)     (delay2)     (delay3)
1      128.91.40.1       55      0.453 ms     0.348 ms     0.245 ms
2      128.91.240.37     55      0.447 ms     0.417 ms     0.416 ms
3      128.91.10.2       55      0.500 ms     0.555 ms     0.574 ms
4      128.91.9.1        55      0.526 ms     0.551 ms     0.480 ms
5      198.32.42.249     10466   0.711 ms     0.489 ms     0.604 ms
6      216.27.100.221    10466   0.754 ms     0.752 ms     0.762 ms
7      216.27.100.22     10466   2.918 ms     2.922 ms     2.884 ms
8      198.32.8.82       11537   27.168 ms    22.923 ms    23.003 ms
9      198.32.8.77       11537   26.731 ms    26.865 ms    26.859 ms
10     198.32.8.81       11537   36.031 ms    36.413 ms    39.260 ms
11     198.32.8.13       11537   51.321 ms    46.827 ms    46.684 ms
12     198.32.8.1        11537   71.453 ms    78.490 ms    71.323 ms
13     198.32.8.94       11537   83.078 ms    79.029 ms    78.806 ms
14     207.231.241.4     ?       103.852 ms   103.839 ms   103.754 ms
15     202.158.194.109   7575    154.747 ms   154.626 ms   154.645 ms
16     128.171.64.102    6360    154.646 ms   154.824 ms   154.580 ms
17     205.166.205.222   6360    154.480 ms   154.543 ms   154.520 ms

Fri Jun 30 12:28:02 PDT 2006
(Hop)  (IP address)      (AS)    (delay1)     (delay2)     (delay3)
1      128.91.40.1       55      0.460 ms     0.343 ms     0.251 ms
2      128.91.240.37     55      0.588 ms     0.466 ms     0.629 ms
3      128.91.10.2       55      0.518 ms     0.684 ms     0.504 ms
4      128.91.9.1        55      0.596 ms     0.496 ms     0.900 ms
5      198.32.42.249     10466   0.658 ms     0.500 ms     0.518 ms
6      216.27.100.221    10466   0.687 ms     0.698 ms     0.804 ms
7      216.27.100.22     10466   2.933 ms     2.892 ms     2.947 ms
8      198.32.8.82       11537   23.459 ms    22.920 ms    35.553 ms
9      198.32.8.77       11537   35.045 ms    37.135 ms    26.826 ms
10     198.32.8.81       11537   35.984 ms    39.350 ms    36.221 ms
11     198.32.8.13       11537   48.276 ms    50.821 ms    46.609 ms
12     198.32.8.49       11537   72.286 ms    72.243 ms    72.234 ms
13     207.231.240.4     ?       72.282 ms    72.313 ms    72.383 ms
14     202.158.194.109   7575    123.036 ms   122.996 ms   123.036 ms
15     128.171.64.102    6360    140.257 ms   123.085 ms   123.163 ms
16     205.166.205.222   6360    122.984 ms   122.960 ms   122.990 ms

Figure 6.11 Adjusting path inside AS11537 causes significant delay reduction on path between amp-upenn and amp-hawaii

amp-fiu->amp-emory

Fri Jun 30 03:50:37 PDT 2006
(Hop)  (IP address)      (AS)    (delay1)     (delay2)     (delay3)
1      131.94.191.2      3681    0.428 ms     0.630 ms     0.271 ms
2      131.94.192.10     3681    0.496 ms     0.269 ms     0.715 ms
3      198.32.155.77     11096   0.775 ms     0.709 ms     0.705 ms
4      198.32.155.5      11096   7.689 ms     7.567 ms     7.571 ms
5      198.32.155.65     11096   7.700 ms     7.648 ms     7.598 ms
6      198.32.155.66     11096   13.719 ms    13.678 ms    13.744 ms
7      170.140.14.37     10490   13.893 ms    13.839 ms    13.827 ms
8      170.140.127.97    3591    13.980 ms    13.851 ms    13.838 ms

Fri Jun 30 04:00:23 PDT 2006
(Hop)  (IP address)      (AS)    (delay1)     (delay2)     (delay3)
1      131.94.191.2      3681    0.436 ms     0.630 ms     0.271 ms
2      131.94.192.10     3681    0.466 ms     0.270 ms     0.437 ms
3      198.32.155.77     11096   1.049 ms     0.698 ms     0.684 ms
4      198.32.155.5      11096   7.742 ms     7.565 ms     7.598 ms
5      198.32.173.125    11096   7.781 ms     7.607 ms     7.594 ms
6      198.32.173.126    11096   14.330 ms    14.068 ms    14.078 ms
7      199.77.193.2      10490   13.700 ms    13.871 ms    13.692 ms
8      170.140.14.37     10490   13.887 ms    13.824 ms    13.970 ms
9      170.140.127.97    3591    13.855 ms    13.819 ms    13.798 ms

Figure 6.12 Load balancing inside AS11096 causes anomalous delay measurements at 6th and last hop on path between amp-fiu and amp-emory


Our second example, in Figure 6.12, shows the traceroute snippet on the path between amp-fiu and amp-emory. Here load balancing inside AS11096 introduces an anomalous measurement on the sixth hop, which is greater than the round-trip delay to the seventh hop. This could be due to the case highlighted in Figure 6.9. Note that the two IP addresses 198.32.173.125 and 198.32.173.126 are contiguous and may have belonged to the same router here, but the large difference in delay measurements between the two suggests otherwise.
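An anomaly of this kind can be flagged mechanically by checking whether the round-trip delay to a hop exceeds the delay to the following hop. The sketch below applies this to the second traceroute of Figure 6.12, using the minimum of the three samples per hop; the 0.1 ms tolerance is an arbitrary assumption added to ignore measurement jitter, not a value from the thesis.

```python
def anomalous_hops(hop_rtts, tolerance_ms=0.1):
    """Return 1-based hop numbers whose best RTT exceeds the best RTT of the
    next hop by more than `tolerance_ms` (an assumed jitter allowance)."""
    best = [min(samples) for samples in hop_rtts]
    return [i + 1 for i in range(len(best) - 1)
            if best[i] - best[i + 1] > tolerance_ms]

# Delay samples (ms) per hop, amp-fiu -> amp-emory, 04:00:23 (Figure 6.12).
rtts = [
    (0.436, 0.630, 0.271),
    (0.466, 0.270, 0.437),
    (1.049, 0.698, 0.684),
    (7.742, 7.565, 7.598),
    (7.781, 7.607, 7.594),
    (14.330, 14.068, 14.078),
    (13.700, 13.871, 13.692),
    (13.887, 13.824, 13.970),
    (13.855, 13.819, 13.798),
]
flagged = anomalous_hops(rtts)
```

With this tolerance only the sixth hop is flagged; the much smaller inversion between the eighth and ninth hops falls below the threshold.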

The third example (Figure 6.13) is a more classic case of dynamic load balancing inside AS11537, where the path through this AS changes within the same 10 minute window (12:20 to 12:30) used for conducting traceroute measurements. Several paths to amp-hawaii (from amp-bu, amp-upenn, amp-princeton, etc.) flipped to the newer paths inside AS11537 at differing times and, based on the traceroute data (at 10 min intervals), seemingly stayed that way for the remainder of the day, whereas the path between amp-nyu and amp-hawaii seemed to be immune to this change. Apparently the load balancing decision here incorporates some routing policy.


(Hop)  amp-bu->              amp-princeton->       amp-upenn->           amp-nyu->
       amp-hawaii            amp-hawaii            amp-hawaii            amp-hawaii

       Fri Jun 30 12:12:31   Fri Jun 30 12:13:41   Fri Jun 30 12:18:04   Fri Jun 30 12:14:51
1)     128.197.160.1         140.180.128.1         128.91.40.1           192.76.177.177
2)     128.197.254.161       128.112.12.6          128.91.240.37         199.109.4.21
3)     128.197.254.122       198.32.42.65          128.91.10.2           199.109.7.97
4)     192.5.89.201          216.27.100.22         128.91.9.1            199.109.7.9
5)     192.5.89.10           198.32.8.82           198.32.42.249         199.109.2.2
6)     198.32.8.82           198.32.8.77           216.27.100.221        198.32.8.77
7)     198.32.8.77           198.32.8.81           216.27.100.22         198.32.8.81
8)     198.32.8.81           198.32.8.13           198.32.8.82           198.32.8.13
9)     198.32.8.13           198.32.8.1            198.32.8.77           198.32.8.1
10)    198.32.8.1            198.32.8.94           198.32.8.81           198.32.8.94
11)    198.32.8.94           207.231.241.4         198.32.8.13           207.231.241.4
12)    207.231.241.4         202.158.194.109       198.32.8.1            202.158.194.109
13)    202.158.194.109       128.171.64.102        198.32.8.94           128.171.64.102
14)    128.171.64.102        205.166.205.222       207.231.241.4         205.166.205.222
15)    205.166.205.222       -                     202.158.194.109       -
16)    -                     -                     128.171.64.102        -
17)    -                     -                     205.166.205.222       -

       Fri Jun 30 12:22:26   Fri Jun 30 12:23:53   Fri Jun 30 12:28:02   Fri Jun 30 12:24:46
1)     128.197.160.1         140.180.128.1         128.91.40.1           192.76.177.177
2)     128.197.254.161       128.112.12.6          128.91.240.37         199.109.4.21
3)     128.197.254.122       198.32.42.65          128.91.10.2           199.109.7.97
4)     192.5.89.201          216.27.100.22         128.91.9.1            199.109.7.9
5)     192.5.89.10           198.32.8.82           198.32.42.249         199.109.2.2
6)     198.32.8.82           198.32.8.77           216.27.100.221        198.32.8.77
7)     198.32.8.77           198.32.8.81           216.27.100.22         198.32.8.81
8)     198.32.8.81           198.32.8.13           198.32.8.82           198.32.8.13
9)     198.32.8.13           198.32.8.1            198.32.8.77           198.32.8.1
10)    198.32.8.49           198.32.8.94           198.32.8.81           198.32.8.94
11)    207.231.240.4         207.231.241.4         198.32.8.13           207.231.241.4
12)    202.158.194.109       202.158.194.109       198.32.8.49           202.158.194.109
13)    128.171.64.102        128.171.64.102        207.231.240.4         128.171.64.102
14)    205.166.205.222       205.166.205.222       202.158.194.109       205.166.205.222
15)    -                     -                     128.171.64.102        -
16)    -                     -                     205.166.205.222       -
                             No change!                                  No change!

       Fri Jun 30 12:32:30   Fri Jun 30 12:33:44   Fri Jun 30 12:38:10   Fri Jun 30 12:34:58
1)     128.197.160.1         140.180.128.1         128.91.40.1           192.76.177.177
2)     128.197.254.161       128.112.12.6          128.91.240.37         199.109.4.21
3)     128.197.254.122       198.32.42.65          128.91.10.2           199.109.7.97
4)     192.5.89.201          216.27.100.22         128.91.9.1            199.109.7.9
5)     192.5.89.10           198.32.8.82           198.32.42.249         199.109.2.2
6)     198.32.8.82           198.32.8.77           216.27.100.221        198.32.8.77
7)     198.32.8.77           198.32.8.81           216.27.100.22         198.32.8.81
8)     198.32.8.81           198.32.8.13           198.32.8.82           198.32.8.13
9)     198.32.8.13           198.32.8.49           198.32.8.77           198.32.8.1
10)    198.32.8.49           207.231.240.4         198.32.8.81           198.32.8.94
11)    207.231.240.4         202.158.194.109       198.32.8.13           207.231.241.4
12)    202.158.194.109       128.171.64.102        198.32.8.49           202.158.194.109
13)    128.171.64.102        205.166.205.222       207.231.240.4         128.171.64.102
14)    205.166.205.222       -                     202.158.194.109       205.166.205.222
15)    -                     -                     128.171.64.102        -
16)    -                     -                     205.166.205.222       -
                                                                         No change!

Figure 6.13 Dynamic Load balancing inside AS11537 for paths to amp-hawaii seems to affect some paths at different times but not others

135

To demonstrate the effects of routing matrix inconsistencies we formulated the problem as a

linear optimization problem to estimate the link metric vector as explained below.

Link-Metric Vector Estimation based on the $\ell_1$-norm minimization (Least Norm / Sparse Solution)

Coates et al. [22] showed that estimating the link-metrics vector can be based on the underlying idea that only a few links in the network have significant delays and the remaining links have very insignificant delays close to zero. Previous works e.g. [130] showed that such combinatorial problems can be relaxed to an optimization problem and one approach to obtaining a sparse (least norm) estimate of $\beta$ is to solve an $\ell_0$ optimization problem of the form,

$$\hat{\beta} = \arg\min \|\beta\|_0 \quad \text{subject to} \quad y_s = M_s\beta \qquad (6\text{-}8)$$

where $y_s$ and $M_s$ respectively denote the rows of $y$ and $M$ to be monitored and $\|\beta\|_0$ counts the number of the non-zero entries of $\beta$.

It is well known that this problem is NP-hard, requiring one to enumerate all possible subsets of non-zero coefficients. Candes et al. [131] showed that if certain conditions on $M_s$ and $\beta$ are met, the $\ell_0$ optimization problem is equivalent to the following simpler $\ell_1$ optimization problem:

$$\hat{\beta} = \arg\min \|\beta\|_1 \quad \text{subject to} \quad y_s = M_s\beta \qquad (6\text{-}9)$$

where $\|\beta\|_1 = \sum_{i=1}^{n} |\beta_i|$.

Because the $\ell_1$ optimization is convex, it is computationally tractable, and a solution can be obtained using linear programming. In addition to the constraints of (6-6) we also impose positivity constraints on the estimate of $\beta$, i.e. $\beta_i > 0$ for $1 < i < n_e$. This is because, if the routing matrix does not contain any inconsistencies, the optimizer should ideally be able to assign positive values to all links.
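One way to see why linear programming suffices is sketched below. Under the positivity constraints the absolute values in the $\ell_1$ norm can be dropped, so the objective becomes linear. This is an illustrative reformulation rather than a statement from the cited works; it assumes the constraints are relaxed to $\beta_i \ge 0$ so that the feasible set is closed, and $\mathbf{1}$ denotes the all-ones vector.

```latex
\begin{aligned}
\|\beta\|_1 = \sum_{i=1}^{n} |\beta_i| &= \mathbf{1}^{T}\beta
  \quad \text{whenever } \beta \ge 0, \\
\hat{\beta} = \arg\min_{\beta}\; \mathbf{1}^{T}\beta
  \quad &\text{subject to} \quad M_s\beta = y_s,\;\; \beta \ge 0 .
\end{aligned}
```

In this form the problem has a linear objective, linear equality constraints and simple bounds, which is the standard input of an LP solver. If no non-negative $\beta$ satisfies $M_s\beta = y_s$, the program is infeasible.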

In addition, Donoho [132] further comments on $\ell_1$-optimization, “in “most” applications in science and technology, of course, the underlying model will not be perfectly correct and measurements will not be perfectly accurate. It is essential to use procedures which are robust against the effects of measurement noise and modelling error.” He further comments that when matrices underlying underdetermined systems have a sufficiently sparse near-solution, “…the near-solution with minimal $\ell_1$ norm is a good approximation to it”. Bruckstein et al. in [133] show that if we further impose non-negativity constraints on the solution in addition to its sparsity, we get a solution that is unique.

For AMP networks (Figure 6.14) we observe that, as the number of monitored paths increases and the constraints on the CO estimator become more stringent, sharp spikes appear where the predictor yields a high prediction error because the optimizer fails to assign non-negative delays to all links and terminates prematurely. Moreover, the L1 error does not reach zero even when all rank-$r$ paths are selected for monitoring, and the algorithm diverges for AMP-50 after 150 paths are selected for monitoring. This strengthens our initial suspicion that these errors are due to the presence of routing matrix inconsistencies.

[Figure 6.14 plot: two panels, AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006; x-axis: Number of monitored paths; y-axis: L1-error]

Figure 6.14 Comparison of performance of CO estimator for AMP networks.

6.5.2 Can RMI be eliminated?

The next question is whether we can identify the rows of the routing matrix $M$ that are the source of RMI. However, finding such rows is an NP-hard problem, since it would require enumerating all possible subsets of rows and plugging each into the CO estimator to find the rows containing the inconsistencies. Since it is very difficult to infer the actual topology by identifying RMI using only a traceroute snapshot of the network, we propose a straw-man algorithm. Our algorithm for inferring a consistent routing matrix is centered on the removal of false links, as highlighted earlier. We first tabulate all link delays over each 10-minute interval as recorded by the traceroutes. Of the three values reported for the $n$th hop, the link delay between the $(n-1)$th and the $n$th hop is calculated from the smallest non-negative value. This is because it is well known that ICMP replies by routers to TTL-expired packets sent by traceroutes are often rate limited. Similarly, when all three yield a negative value for the link delay, we take the least negative value (the one closest to zero). We then map the topology discovered by the traceroutes into a directed graph $G=(V,E)$, where the vertex set $V$ represents routers (using the router interface IP address) and the edge set $E$ represents directed edges between two routers.
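The per-link delay rule and the mapping to a directed graph can be sketched as follows. This is a minimal illustration, not the implementation used in this thesis. The function names `link_delay` and `build_graph` and the input layout are hypothetical, and the three candidate values are assumed to be the probe-wise RTT differences between consecutive hops.

```python
def link_delay(prev_rtts, curr_rtts):
    """Pick one delay for the link between hop n-1 and hop n.

    Assumption: each hop carries three RTT samples (ms) and the three
    candidate link delays are the probe-wise differences.
    """
    candidates = [c - p for p, c in zip(prev_rtts, curr_rtts)]
    non_negative = [d for d in candidates if d >= 0]
    if non_negative:
        return min(non_negative)   # smallest non-negative value
    return max(candidates)         # all negative: least negative value


def build_graph(traceroutes):
    """Map traceroutes to a directed graph {(u, v): [link delays]}.

    Each traceroute is a list of (interface_ip, [rtt1, rtt2, rtt3]) hops;
    vertices are router interface IP addresses.
    """
    edges = {}
    for hops in traceroutes:
        for (u, rtts_u), (v, rtts_v) in zip(hops, hops[1:]):
            edges.setdefault((u, v), []).append(link_delay(rtts_u, rtts_v))
    return edges
```

Keeping a list of delays per edge makes it possible to check later whether a link shows a negative delay in the majority of the traceroutes.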

We introduced the concept of false links in the preceding section (Figure 6.9). Some of these false links connect the source to a vertex in the graph for which a path, albeit a different one, already exists; others connect two vertices for which there is no path in the actual graph (and the real network!). Such false links can be detected easily when a link is sighted with a seemingly negative delay in the majority of the traceroutes, so that we can be almost sure that it is not due to a router delaying an ICMP response. For each such negative-delay link, we check whether there exists another set of links joining the same two vertices without encountering a negative-delay link. If so, the negative-delay link is removed; we call this the Deletion With Replacement (DWR) heuristic. If not, we delete the false link $\langle \nu_i, \nu_{i+1} \rangle$ and replace it by inserting a new edge between the previous vertex $\nu_{i-1}$ and $\nu_{i+1}$, yielding a longer link $\langle \nu_{i-1}, \nu_{i+1} \rangle$ with a non-negative delay, as shown in Figure 6.15. We call this the Deletion With Insertion (DWI) heuristic. We take care not to detect or delete any inter-AS link in this manner so as not to destroy the connectivity of the graph. We use an iterative greedy algorithm for the detection and removal of such false links exhibiting negative delay values, removing in turn the links whose removal leads to the greatest reduction in anomalous paths, until all anomalies have been resolved. We find that this naïve algorithm works only for the smaller AMP-30 network and fails for AMP-50 (Figure 6.16). Statistical techniques will be introduced in Section 6.7 to mitigate the effects of RMI.
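A single DWR/DWI step for one negative-delay link can be sketched as below. This is an illustrative reading of the heuristics, not the thesis implementation: the names `has_clean_detour` and `remove_false_link` are hypothetical, the graph is assumed to be a dictionary mapping directed edges to one representative delay, and the greedy ordering over all false links is left out.

```python
from collections import deque


def has_clean_detour(delay, src, dst):
    """BFS from src to dst over non-negative edges, ignoring the edge (src, dst)."""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for (a, b), d in delay.items():
            if a != u or d < 0 or (a, b) == (src, dst) or b in seen:
                continue
            if b == dst:
                return True
            seen.add(b)
            queue.append(b)
    return False


def remove_false_link(delay, path, i, inter_as=()):
    """Handle the link (path[i], path[i+1]) if its delay is negative.

    Returns "DWR", "DWI", or None when the link is left untouched.
    """
    u, v = path[i], path[i + 1]
    if delay.get((u, v), 0) >= 0 or (u, v) in inter_as:
        return None                   # not a false link, or a protected inter-AS link
    if has_clean_detour(delay, u, v):
        del delay[(u, v)]             # DWR: another set of links already joins u and v
        return "DWR"
    if i == 0:
        return None                   # no previous vertex to merge with
    p = path[i - 1]
    merged = delay[(p, u)] + delay[(u, v)]
    if merged < 0:
        return None                   # the longer link would still be negative
    del delay[(u, v)]
    delay[(p, v)] = merged            # DWI: longer link <v_{i-1}, v_{i+1}>
    return "DWI"
```

The merged delay in the DWI branch is assumed to be the sum of the two original link delays, which is what makes the longer link non-negative when the false link is only mildly negative.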


Figure 6.15 Removal of Routing Matrix Inconsistencies (RMI) using the DWI and DWR Heuristic for removal of false links


[Figure 6.16 plot: two panels, AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006; x-axis: Number of monitored paths; y-axis: L1-error; series: CO original, CO removal of RMI]

Figure 6.16 Comparison of performance of CO estimator before and after removal of RMI for AMP networks.

6.5.3 Quantification of RMI

Rosen et al. [134] derived both necessary and sufficient conditions for estimation of the correct solution $x_c$ of an over-determined system of algebraic equations. Let

$$Ax \approx b \qquad (6\text{-}10)$$

with $A \in \Re^{m \times n}$ having full column rank $n$ ($m > n$) and $b \in \Re^{m}$, where there are large errors in some rows of $[A \;\; b]$, the underlying assumption being that there is a correct (but unknown) matrix $A_c$ and vector $b_c$. They find that the probability $P$ that the calculated solution $x^*$ will be close to the correct solution $x_c$ depends largely on the size of the measurement data, i.e. the parameter $m - n$. Using an empirical model they find that, as an upper bound,

$$P = 1 \quad \text{when} \quad \frac{(m-n)}{n} \ge \frac{2k}{\sigma} \quad [134] \qquad (6\text{-}11)$$

where

$k$ is the number of rows of $[A \;\; b]$ containing large errors (independent of the number of errors in any particular row of $A$); and,

$\sigma > 0$ is the lower bound on the singular values related to $A$.

A probability $P = 0.995$ can be achieved with $m - n \ge 22 + 2k$ [134]. Although, as highlighted earlier, the main goal is to predict unmonitored paths as accurately as possible rather than to estimate the link metrics vector accurately, estimating the correct link metrics vector helps towards this goal. We see later that most of our algebraic systems of equations are underdetermined. This is due to partial network observations: for our analysis we select only complete traceroutes, for which each probe received a response (a lack of response is often shown as stars). However, the routing matrix $M$ is rank-deficient, so $m - n$ relates to the quantity $p_n - r$ ($r$ being the rank of the routing matrix $M$) in our situation. The probability $P$, and with it the ability to estimate the link metrics vector accurately, depends strongly on avoiding the selection of any row of $M, b$ with a large error in the subset chosen for monitoring.

If careful techniques are not employed to mitigate RMI, problems can be encountered in the estimation of the link-metrics vector. We saw earlier that the BL predictor returns large prediction errors if the link metrics vector is not estimated carefully to cater for RMI. These measurement artifacts of routing matrix estimation using traceroutes necessitate a procedure to infer the correct (or a more consistent) routing matrix, as the methods described previously may break down completely or return large path estimation errors that could offset any benefit of monitoring fewer paths than the rank of the routing matrix. We analyze and propose methods to deal with the mitigation of such errors.

Since our knowledge of the routing matrix is limited to the traceroute measurements conducted between the AMP and RIPE hosts, with no external vantage points for measurements, it is not always possible to remove all inconsistencies from the routing matrix.

In statistical systems involving a large number of variables, or modeling errors due to RMI, collinear relationships can develop between correlated variables, a phenomenon often referred to as multicollinearity. Such problems can be mitigated by regularization of the linear statistical model, a technique often referred to as Ridge Regression or Tikhonov regularization. For example, in our case, collinear relationships could exist between parallel paths selected as a result of load balancing employed by large ASes (Figures 6.6 and 6.7). Thus, when variables in $M_s$ are correlated amongst themselves, multicollinearity is said to exist [135]. In this case $V_{ss} = M_s \Sigma M_s^T$ has a determinant that is very close to zero, and this will cause:

(a) Round-off errors in the intermediate stages of the matrix calculations. These are especially serious when the number of predictor variables is large.

(b) In the extreme case, a breakdown of the computations in intermediate stages of the matrix calculations if $V_{ss}$ becomes singular in terms of the precision of the calculation, making it impossible to compute its inverse, i.e. $(V_{ss})^{-1}$. Such errors also impact the accuracy of path predictions.

Such effects can be mitigated by adding a small bias term to the equation for the BL prediction. Since the collinear relationships between variables change as different subsets of paths are selected using Algorithm 1, we use regularization (ridge regression) of the statistical model estimate $\beta$ so as to mitigate the effects of multicollinearity and RMI. Here a small bias term, in the form of a diagonal matrix, is added to the $V_{ss}$ matrix before taking its inverse:

$$E(l_r^T y_r \mid y_s) = l_r^T V_{sr} (V_{ss} + cI)^{-1} y_s \qquad (6\text{-}12)$$

where $I_{t \times t}$ is an identity matrix; $0 \le c \le 1$; $V_{ss} \in \Re^{t \times t}$; and $t$ is the number of monitored paths.

For the linear system of monitored paths, we estimate $\beta^R$ using:

$$\beta^R = M_s^T (V_{ss} + cI)^{-1} y_s \qquad (6\text{-}13)$$

To calculate the value of the constant $c$, we follow the normal judgmental procedure based on the analysis of ridge-traces [135-136]. We increment $c$ from 0 to 1 in steps of 0.01. We select the value of $c$ that causes the coefficients $\beta^R$ of equation (6-13) to become stable, i.e. we stop increasing $c$ once we reach the stop condition:

$$\frac{\mathrm{abs}\left(\|\beta^R_{old}\|_2 - \|\beta^R_{new}\|_2\right)}{\|\beta^R_{old}\|_2} \le 0.01 \qquad (6\text{-}14)$$

where $\|\cdot\|_2$ represents the $\ell_2$-norm of a vector.
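The ridge estimate of (6-13) and the stop condition of (6-14) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the code used for the results in this chapter: the function names are hypothetical, the link covariance $\Sigma$ defaults to the identity, and the sweep starts at $c = 0.01$ rather than 0 so that a singular $V_{ss}$ is never inverted.

```python
import numpy as np


def ridge_estimate(M_s, y_s, c, Sigma=None):
    """beta_R = M_s^T (V_ss + c I)^{-1} y_s with V_ss = M_s Sigma M_s^T."""
    t, n_e = M_s.shape
    Sigma = np.eye(n_e) if Sigma is None else Sigma   # assumption: identity covariance
    V_ss = M_s @ Sigma @ M_s.T
    return M_s.T @ np.linalg.solve(V_ss + c * np.eye(t), y_s)


def select_c(M_s, y_s, step=0.01, tol=0.01):
    """Increase c in steps until ||beta_R||_2 changes by at most `tol` (relative)."""
    c = step
    old = np.linalg.norm(ridge_estimate(M_s, y_s, c))
    while c + step <= 1.0 + 1e-12:
        c += step
        new = np.linalg.norm(ridge_estimate(M_s, y_s, c))
        if old > 0 and abs(old - new) / old <= tol:
            return c                  # coefficients have stabilized
        old = new
    return 1.0                        # upper limit of the sweep
```

Using `np.linalg.solve` instead of forming the inverse explicitly avoids part of the round-off problem described in (a) above.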

We find that $c$ increases almost monotonically from 0.02 to 1 (for AMP-50) and only from 0.02 to 0.56 (for RIPE-40) as the number of monitored paths increases beyond 10% and 50% of the rank-$r$ paths respectively (Figure 6.17). This indicates that AMP networks suffered more severely from multicollinearity and RMI than RIPE networks, as was observed from Figures 6.7 and 6.14.

[Figure 6.17 plot: x-axis: Number of monitored paths as fraction of rank (r); y-axis: value of ridge-coefficient; series: RIPE-40-05/Sep/2007, AMP-50-30/Jun/2006]

Figure 6.17 Computed value of c as the number of sampled paths increases for AMP-50 and RIPE-40

6.6 Statistical Techniques to Mitigate the Effects of RMI

Note that even measuring a subset of the paths reveals information about the end-to-end path metrics, such as path delay/loss rates, but does not necessarily reveal any information about the individual link metrics on those paths. Thus the problem is to estimate both the monitored and the unmonitored link metrics so as to minimize the prediction error on the unmonitored paths. Moreover, the rank-deficient system of linear equations does not have a unique solution for the link metrics vector, so it has to be estimated. The literature [8, 22] proposes several estimation techniques. The two most common are based on the minimum-norm (sparse) solution ($\min \|\beta\|_0$ or $\min \|\beta\|_1$) and on minimization of the $\ell_2$-norm of the error, i.e. $\min \|y_s - M_s\beta\|_2$, using the Least Squares (LS) method. The minimum-norm (sparse) solution used earlier can only help towards finding the optimum solution that reduces the overall path prediction error, but may not track individual path properties efficiently. For mitigating the effects of RMI, more robust estimation of the link metric vector is required.

Link-Metric Vector Estimation based on Iteratively Re-weighted Least-Squares method

Statistical theory for Best Linear Prediction (BLP) [136] suggests estimating $\beta$ by solving the following generalized least-squares problem:

$$\min_{\beta} \; (y_s - M_s\beta)^T V_{ss}^{-1} (y_s - M_s\beta) \qquad (6\text{-}15)$$

where $y_s$ and $M_s$ respectively denote the rows of $y$ and $M$ to be monitored, and $V_{ss} = M_s \Sigma M_s^T$ is the covariance between the selected paths, where $\Sigma$ is the link covariance matrix. The solution to the above is the Generalized Least-Squares (GLS) estimate $\hat{\beta}$, given by [136]:

$$\hat{\beta} = (M_s^T V_{ss}^{-1} M_s)^{-} M_s^T V_{ss}^{-1} y_s \qquad (6\text{-}16)$$

where $R^{-}$ denotes the generalized inverse of matrix $R$.
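Equation (6-16) translates almost directly into NumPy. The sketch below is illustrative only: the function name `gls_estimate` is hypothetical, and the Moore-Penrose pseudo-inverse is assumed as the particular generalized inverse, which is one valid choice among several.

```python
import numpy as np


def gls_estimate(M_s, y_s, Sigma):
    """beta_hat = (M_s^T V_ss^{-1} M_s)^- M_s^T V_ss^{-1} y_s, V_ss = M_s Sigma M_s^T."""
    V_ss = M_s @ Sigma @ M_s.T
    V_inv = np.linalg.pinv(V_ss)            # tolerates a near-singular V_ss
    A = M_s.T @ V_inv @ M_s                 # rank-deficient when M_s is
    # Assumption: Moore-Penrose pseudo-inverse as the generalized inverse;
    # singular values below 1e-10 (relative) are treated as zero.
    return np.linalg.pinv(A, rcond=1e-10) @ M_s.T @ V_inv @ y_s
```

With $\Sigma = I$ and a full-row-rank $M_s$, this choice of generalized inverse returns the minimum $\ell_2$-norm solution of $y_s = M_s\beta$.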

One drawback of the GLS-based estimation of $\hat{\beta}$ in [8] for BL prediction with

$$y_s = M_s\beta + \varepsilon \qquad (6\text{-}17)$$

is that it gives equal weight to all observations, including the outliers, thus penalizing each outlier equally. Robust regression techniques such as Iteratively Re-weighted Least Squares (IRLS) attempt to assign small weights to the outliers. Thus, instead of the GLS-based estimation proposed in [8] for the BL estimator, we use a weighted version of generalized least-squares minimization, which can yield superior results in such cases. We use a variant of the specific method of Daubechies et al. [137] for IRLS in estimating the link metrics vector (Eq. 6-18). The algorithm keeps iterating (iterations are labelled $t = 1, 2, \ldots, 50$) until it converges. Each iteration of the algorithm tries to find the new solution $\beta^{t+1}$ at the $(t+1)$th iteration:

$$\beta^{t+1} = D_t M_s^T (M_s D_t M_s^T)^{-1} y_s \qquad (6\text{-}18)$$

where $D_t$ is an $n_e \times n_e$ diagonal matrix at the $t$th iteration. We denote the $j$th diagonal entry of $D_t$ as $w_j^t$. Once $\beta^{t+1}$ is found, the new weight $w^{t+1}$ is found by:

$$w_j^{t+1} = \left((\beta_j^{t+1})^2 + \varepsilon_{t+1}^2\right)^{-1/2}, \quad j = 1, 2, 3, \ldots, n_e \qquad (6\text{-}19)$$

Here

$$\varepsilon_{t+1} = \min\left(\varepsilon_t, \frac{r(\beta^{t+1})_{K+1}}{n_e}\right) \qquad (6\text{-}20)$$

and $r \in \Re^{n_e}$; $r(\beta^{t+1})$ is the non-increasing rearrangement of the absolute values of the entries of $\beta^{t+1}$. Thus, $r(\beta^{t+1})_i$ is the $i$th largest element of the set $\{|\beta_j^{t+1}|,\; j = 1, 2, 3, \ldots, n_e\}$. The algorithm terminates when $\varepsilon_{t+1} = 0$ or $\varepsilon_{t+1}$ stabilizes at some non-negative value. At the start of the algorithm, $w^0 = (1, 1, 1, \ldots, 1)$ and $\varepsilon_0 = 1$. To initialize $K$, we compute the number of non-zero elements $p$ in the initial solution $\beta^0$ (using $w^0$) and set $K = cp$. We find that the algorithm converges better when $0.5 \le c \le 0.6$.
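The iteration of (6-18) to (6-20) can be sketched as below. This is a hedged illustration, not the variant implemented for this thesis. The function name `irls` and the tolerances are hypothetical. The sketch also follows the formulation of Daubechies et al. [137], in which the diagonal of $D_t$ holds the reciprocals $1/w_j^t$; that reading is an assumption made here.

```python
import numpy as np


def irls(M_s, y_s, c=0.5, max_iter=50, tol=1e-9):
    """Sparse solution of y_s = M_s beta by iteratively re-weighted least squares."""
    n_e = M_s.shape[1]
    w = np.ones(n_e)                 # w^0 = (1, ..., 1)
    eps = 1.0                        # eps_0 = 1
    beta, K = None, None
    for _ in range(max_iter):
        D = np.diag(1.0 / w)         # assumption: D_t holds 1 / w_j^t
        beta = D @ M_s.T @ np.linalg.solve(M_s @ D @ M_s.T, y_s)   # Eq. (6-18)
        if K is None:                # K = c * p from the initial solution
            p = int(np.count_nonzero(np.abs(beta) > 1e-12))
            K = min(max(int(c * p), 1), n_e - 1)
        r = np.sort(np.abs(beta))[::-1]        # non-increasing rearrangement
        new_eps = min(eps, r[K] / n_e)         # Eq. (6-20); r[K] is the (K+1)th largest
        stalled = abs(eps - new_eps) <= tol    # eps has stabilized
        eps = new_eps
        if eps == 0.0 or stalled:
            break
        w = 1.0 / np.sqrt(beta ** 2 + eps ** 2)                    # Eq. (6-19)
    return beta
```

For a single path over two links with $M_s = [1 \;\; 2]$ and $y_s = 2$, the first iterate is the minimum $\ell_2$-norm solution $(0.4, 0.8)$, and later iterates shift the delay onto the second link, approaching the sparse solution $(0, 1)$.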

There are other robust regression techniques besides IRLS-based estimation, such as LMS (Least Median of Squares), which aims to minimize the median of the squared errors instead of their sum (or average). However, unlike IRLS, LMS does not have a closed-form expression and requires a brute-force search over combinatorial subsets of solutions (obtained by removing rows from the set of linear equations that may be the cause of large overall estimation errors), and is thus not feasible for regression problems of large dimensions. We refer to the predictor that uses the IRLS-based robust regression estimate of the link metrics vector (Equation 6-18) as the Robust Predictor in the remainder of this chapter, to distinguish it from the BLP [8]. We call it the Robust Predictor because the method works iteratively, minimizing the residual errors by improving on the previous estimate of the link metrics vector and so removing larger outliers more aggressively. By computing a sparse solution it mimics $\|\beta\|_1$ minimization, albeit without the positivity constraints of the Convex Optimizer used earlier (Section 6.6).

Link-Metric Vector Estimation based on Least-Squares method after regularization of the statistical model

Regularization of the statistical model (Section 6.5.3) can also act as a simple tool to mitigate the effects of large statistical errors. We use the estimate of the link metrics vector obtained by Ridge Regression (Tikhonov Regularization) in the BL predictor. We call this predictor BL-ridge to differentiate it from the BLP [8].

6.7 Improvement in Path Prediction and Anomaly Detection for AMP and RIPE networks after application of Robust Statistical Techniques

Figures 6.18 and 6.19 show the L1 error for RIPE and AMP networks after application of robust statistical prediction techniques. The L1 error for RIPE networks decreases more sharply. For AMP networks, the overall L1-error is reduced and the spikes are diminished in magnitude when using the Robust estimator.

The iterative nature of robust prediction using the IRLS-based estimate of the link metric vector may raise concerns about its path tracking properties. We show that the robust prediction technique not only lowers the overall path prediction error on unmonitored paths but also improves individual path prediction. Figure 6.20 shows the improvement in the variance of the Relative Prediction Error (RPE), defined below, as the number of monitored paths increases (for AMP-50 and AMP-30).

$$RPE = \frac{\mathrm{abs}(\text{actual delay} - \text{predicted delay})}{\text{actual delay}} \qquad (6\text{-}21)$$
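As a small worked illustration of (6-21), the RPE and its variance over a set of unmonitored paths could be computed as follows. The function names are hypothetical, and the use of the population variance is an assumption, since the text does not say which variance estimator was used.

```python
from statistics import pvariance


def relative_prediction_error(actual_delay, predicted_delay):
    """RPE = abs(actual delay - predicted delay) / actual delay."""
    return abs(actual_delay - predicted_delay) / actual_delay


def rpe_variance(actual_delays, predicted_delays):
    """Variance of the RPE over a set of paths (assumption: population variance)."""
    errors = [relative_prediction_error(a, p)
              for a, p in zip(actual_delays, predicted_delays)]
    return pvariance(errors)
```

For example, an actual delay of 50 ms predicted as 45 ms gives an RPE of 0.1.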

We next select the subset of paths which resulted in large prediction errors based on our results from Figures 6.7 and 6.14. Figure 6.21 shows a sample variation of path delays on one unmonitored path and its prediction using the BL, BL-ridge and Robust estimators. We observe that all three predictors (BL, BL-ridge and Robust) are good at tracking path anomalies, showing peaks (in either direction) corresponding to major path variations even though the granularity of path measurements on the monitored paths is of the order of 60-second intervals. Furthermore, since the path monitoring is not GPS synchronized in the datasets considered, we estimate $y_s(t)$ ($y_r(t)$) as belonging to (one of the) windows of successive one-minute intervals. Hence, the peaks of the predicted path metrics are sometimes offset by one such window interval (on either side) from the actual path anomaly. We see that the BL-ridge and Robust estimators are more sensitive to path anomalies than BL prediction.

[Figure 6.18 plot: two panels, RIPE-30-05/Sep/2007 and RIPE-40-05/Sep/2007; x-axis: Number of monitored paths; y-axis: L1-error; series: BL, Robust]

Figure 6.18 Comparison of the L1-error metric of BL and Robust predictor.

[Figure 6.19 plot: two panels, AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006; x-axis: Number of monitored paths; y-axis: L1-error; series: BL, Robust]

Figure 6.19 Comparison of performance of BL and Robust estimator for AMP networks.

[Figure 6.20 plot: two panels; x-axis: Number of monitored paths as fraction of rank (r); y-axis: Variance of Relative Prediction Error; series: AMP-30-30/Jun/2006 (BL) and AMP-30-30/Jun/2006 (BL-Ridge) in the first panel, AMP-50-30/Jun/2006 (BL) and AMP-50-30/Jun/2006 (Robust) in the second]

Figure 6.20 Improvement in Variance of Relative Prediction Error using BL-ridge and Robust estimator for AMP networks

6.8 Discussion

In this section we discuss the impact of routing matrix inconsistencies on algebraic and statistical path prediction methods and ask the question: do inconsistencies in the routing matrices pose a real problem?

The combined work of [7, 125] showed that one can determine completely, or estimate to predefined tolerance levels, the path metrics on all unmonitored paths by probing only a small subset $S$ of paths, because of extensive underlay link sharing among Internet paths. However, Chen et al. [7] showed that maximum benefits occur only when the number of overlay hosts $N$ exceeds 100, so that $|S|$ is in the range $O(N \lg N)$. We have seen that statistical path prediction errors due to RMI begin to appear when the network size is much smaller (50 hosts).

[Figure 6.21 plot: x-axis: Time (sec since start); y-axis: Delay (msec); series: Actual Path Delay, BL, BL-Ridge, Robust]

Figure 6.21 Actual, BL, BL-ridge and Robust predictor delay profile for a selected (unmonitored) path in AMP-50-30/Jun/2006.

We saw from Figures 6.7 and 6.14 that the effects of RMI are most pronounced after approximately 30% of the linearly independent rank-$r$ paths have been included in the path monitoring set. From Figure 6.7 this roughly corresponds to an L1-error of 0.2 for both AMP and RIPE in spite of the rapidly decaying trend; this is clearly not good enough for accurate path prediction or anomaly detection. For example, in AMP-50, estimating path metrics to within 10% L1-error requires that at least 62% of the linearly independent rank-$r$ paths be monitored. The first major spike due to routing matrix inconsistencies occurs when monitoring a small fraction of the linearly independent rank-$r$ paths, and this problem is exacerbated as more paths are selected for monitoring. This shows that by the time we are able to achieve good path prediction, routing matrix inconsistencies begin to cause large, seemingly random path prediction errors. These in turn can cause problems in predicting path anomalies (Figure 6.19), which is one of the prime objectives of RONs: to alleviate path outages/degradations before a user can detect them. Thus the techniques described in this chapter for removing RMI are essential in order to allow a subset of paths to be monitored, and so reduce the monitoring overheads that would otherwise limit the scalability of RONs.

This chapter concludes the third contribution of this thesis, namely an investigation of the practical problems in the area of algebraic and statistical path monitoring when applied to practical networks. We presented a constrained convex optimization technique to show how RMI can be identified, and also showed how it is related to inaccurate routing knowledge of the network, providing anecdotal evidence from network traceroutes. In addition, we quantified the statistical prediction errors due to RMI through regularization of the linear model. We also studied the impact of RMI on path prediction; the use of robust statistical techniques reduces the path prediction error (L1-error) by 10-20% over BL estimation. Anomaly detection is also improved through robust statistical techniques.

6.9 Conclusion

Research aimed at reducing path monitoring overheads by leveraging topological knowledge seems, at first sight, to be the most promising area of research at the moment [7-8, 21-22]. Unfortunately, the performance benefits claimed are based only on limited deployments over a few selected ISPs, e.g. Abilene and Sprint [8, 22, 87], PlanetLab [26] or simulated topologies [5, 23-24]. The underlying assumption is that the routing (network layer) topology of the network is accurately known. These issues need to be addressed in detail using real heterogeneous overlay deployments in the Internet with limited topological knowledge [50] to fully ascertain their benefits beyond the theoretical claims.

Our primary aim in this chapter was to investigate the source of practical problems in the area of algebraic and statistical path monitoring. These mainly stem from incorrect topology estimation due to the measurement artifacts of traceroutes, which can result in inaccurate estimation of path metrics. More advanced topology estimation techniques using more robust route path tracing, e.g. [129], or exploiting techniques to correct such inaccurate path information [52, 138], can help towards improving such statistical path estimation techniques.


7 CONCLUSIONS AND PROPOSALS FOR FUTURE DIRECTIONS OF RESEARCH

7.1 Reviewing the Goal

BGP can suffer from delayed convergence after failure, and Internet flows seeking QoS guarantees may seek alternate paths to mask such failures. Resilient Overlay Networks can quickly provide such alternate paths. However, this requires large path monitoring overheads in order to select the best alternate routes. The thesis of this dissertation is to investigate heuristics that make Resilient Overlay Network management more scalable. We established this thesis in terms of three intertwined yet competing aspects of scalability: architecture, path selection and path monitoring overheads.

7.1.1 Architecture

RON suffers from scalability problems. Aggressive path probing on all end-to-end paths between overlay hosts (overlay links) does not scale well beyond tens of hosts [4]. In Chapter 4, we showed a landmark-based distributed architecture that can enable overlay networks to scale well while using a very sparse topology: $O(N)$ instead of $O(N^2)$ overlay links. The sparse topology equates to an equal reduction in path monitoring overheads. We presented techniques for determining how overlay hosts should select a small set of geographically diversified detours. We showed that, in spite of such a sparse topology, the architecture can find a good working path with a very high probability.

7.1.2 Path Selection

Path selection in Resilient Overlay Networks is directly tied to path monitoring overheads. These path monitoring overheads could be traded against heuristics that enable disjoint path selection. Previous studies, e.g. [6], show that an intuitive method of selecting disjoint paths is simply to select the one which diverges earliest from the direct path. However, this should be based on AS-level paths, which are easier to obtain than IP-level paths. In Chapters 3 and 5, we showed that a significant percentage of one-hop overlay paths shared similar levels of path disjointness, thus making the process of path selection even more challenging. In Chapter 5, we presented a more elegant graph-based algorithm to cater for this problem, i.e. to filter out a small set of disjoint paths to make path selection easier. We then presented our technique of greedy selection using a ToR graph [27, 116]. We showed that not only could the number of candidate paths be brought down to a small number, but the technique also singled out the good paths, picking a path performing close to the best possible path in a large majority of the cases.

7.1.3 Path Monitoring

Previous research has shown the possibility of statistical techniques [8] of monitoring paths based

on network tomography principles [7]. Such techniques depend on an accurate snapshot of the

routing topology of the overlay network. Previous works [8] have investigated such statistical path

prediction techniques for networks whose topology was well known e.g. Abilene. In Chapter 6, we

highlighted how such an accurate snapshot of large networks using only commodity tools, e.g.

traceroutes, is impossible to obtain. We then presented methods to reduce or eliminate the effects of

such topology estimation errors by (a) identifying and fixing topology estimation errors; and (b)

harnessing techniques in statistics, e.g. robust estimation, to deal with them when they cannot be

identified and removed completely.

7.2 Future Research Directions

Future research in overlay networks will revolve around the same three aspects of enhancing and improving RON management described above.

7.2.1 More accurate overlay topology ‘modeling’

Research aimed at reducing path monitoring overheads by leveraging topological knowledge seems, at first sight, to be the most promising area of research at the moment [7-8, 21-22]. Unfortunately, the performance benefits claimed are based only on limited deployments over a few selected ISPs, e.g. Abilene and Sprint [8, 22, 87], PlanetLab [26] or simulated topologies [23-25]. The underlying assumption is that the routing (network layer) topology of the network is accurately known. These issues need to be addressed in detail using real heterogeneous overlay deployments in the Internet with limited topological knowledge [50] (Chapter 6) to fully ascertain their benefits beyond the theoretical claims.

7.2.2 Accurate depiction of Internet failure models Due to unavailability of real Internet failure information, some studies employ analytical models

for generating failure scenarios on Internet paths; e.g. LM1 model [23] and exponentially

distributed failures [139]. This may lead to an overestimation of the efficiency of overlay networks

in computing alternate paths. Naidu et al. [49] claim that anomalies are far rarer events in the Internet

than suggested by prior studies. Exploiting this fact could lead to a non-negligible reduction of path

monitoring overheads relative to those achieved by the conservative methods of other researchers [8, 21]. Also, it will

be difficult to compare results across different studies unless accurate modeling of Internet failure

occurrence is dealt with seriously.

7.2.3 Investigation of synergy between competing overlays

There has been overt criticism in the research community of selfish routing by

overlay networks [39, 70] (Chapter 2, Section 2.2.5). However, such claims have again been made

on emulated hypothetical situations, in which path monitoring and path switching decisions in two or

more overlays cause them to synchronize, i.e. switch traffic to the same path simultaneously. There

is an urgent need for large scale deployment of multiple overlays to see the impact on the underlay

network mechanisms when they compete for bandwidth on the same set of underlay links. For example,

previous studies [140-141] have shown that content distribution overlays that are locality aware do

not hurt ISP objectives as they are optimized to fetch content from the nearest location, e.g. within

the ISP, which an ISP would also prefer from a commercial point of view. Locality-aware RONs would similarly try to

shift traffic within a certain radius in the network, e.g. by choosing a relay node (detour) very close to

the source or destination, and thus may not cause appreciable harm to other traffic flows.

In addition, routing overlays such as RONs could be made to sense the presence of other overlays

around them by monitoring the behavior of their frequently occurring path switching cycles, and to

employ a randomized hysteresis algorithm that varies their anomaly detection and path switching

behavior to prevent any type of synchronization with other overlays. There is also an urgent need

to study the business models that will evolve out of competing overlays. A large RON operator may

actually be willing to provide its services to smaller RON operators in exchange for a fee.
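The randomized hysteresis idea mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name, the parameters (`base_margin`, `jitter`, `base_hold`) and the use of latency as the path metric are hypothetical and are not taken from any RON implementation. The sketch only shows the principle that both the switching threshold and the hold-down period are drawn at random, so that two overlays observing the same degradation are less likely to switch in lock-step.

```python
import random


class RandomizedHysteresisSwitcher:
    """Hypothetical path switcher; names and defaults are illustrative only."""

    def __init__(self, base_margin=0.10, jitter=0.5, base_hold=3, rng=None):
        self.base_margin = base_margin  # nominal relative improvement needed to switch
        self.jitter = jitter            # fraction by which margin and hold time are randomized
        self.base_hold = base_hold      # nominal number of probe rounds to wait after a switch
        self.rng = rng or random.Random()
        self.current = None             # path currently in use
        self.hold_left = 0              # remaining rounds in which switching is suppressed

    def _randomized(self, base):
        # Draw uniformly from [base * (1 - jitter), base * (1 + jitter)].
        return base * (1 + self.rng.uniform(-self.jitter, self.jitter))

    def update(self, latencies):
        """latencies: dict mapping path id -> measured latency. Returns the chosen path."""
        best = min(latencies, key=latencies.get)
        if self.current is None or self.current not in latencies:
            self.current = best
            return self.current
        if self.hold_left > 0:
            # Still inside the randomized hold-down period: do not switch.
            self.hold_left -= 1
            return self.current
        margin = self._randomized(self.base_margin)
        if latencies[best] < latencies[self.current] * (1 - margin):
            self.current = best
            self.hold_left = round(self._randomized(self.base_hold))
        return self.current
```

With the defaults, a path that is only marginally better (less than 5% lower latency) never triggers a switch, while the exact threshold between 5% and 15% and the hold-down of two to four rounds differ from one decision to the next.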

APPENDIX

We introduce some matrix notation before deriving the equation for the BL-estimator. We first

sort the rows of the matrix M according to the largest singular values using a row permutation

(detailed later). The values of the column vector y are sorted in the same way. Let us denote the new matrix

and column vector as $M_{new} \in [0,1]^{n_p \times n_e}$ and $y_{new} \in \mathbb{R}^{n_p}$ respectively. Let $M_s$ represent the rows

(paths) of $M_{new}$ that are selected for monitoring because they capture the largest

singular dimensions and so approximate the complete path matrix M well enough for reasonably

predicting the unmonitored paths.

$$
\begin{aligned}
M_s &= M_{new}(1:k,\,:) \\
y_s &= y_{new}(1:k,\,:)
\end{aligned}
\qquad \text{(A.1)}
$$

where the notation $J(a:b,\,:)$ and $J(:,\,a:b)$ refer to rows $a$ through $b$ and columns $a$ through $b$

($a$ and $b$ inclusive) respectively of matrix $J$.

Similarly, the unmonitored paths and path metrics are the remaining rows of $M_{new}$ and $y_{new}$, as

shown below.

$$
\begin{aligned}
M_r &= M_{new}(k+1:n_p,\,:) \\
y_r &= y_{new}(k+1:n_p,\,:)
\end{aligned}
\qquad \text{(A.2)}
$$
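As a small illustration of the partition in (A.1) and (A.2), the sketch below splits an already permuted path matrix into its monitored and unmonitored parts. It assumes the rows have been sorted beforehand (the row permutation itself is not shown), and the function name `split_monitored` and the toy data are hypothetical. Note that the 1-indexed, inclusive range `1:k` corresponds to the Python slice `[:k]`.

```python
def split_monitored(M_new, y_new, k):
    """Return (M_s, y_s, M_r, y_r): the first k rows are monitored, the rest are not."""
    if len(M_new) != len(y_new):
        raise ValueError("M_new and y_new must have the same number of rows")
    if not 0 <= k <= len(M_new):
        raise ValueError("k must lie between 0 and the number of paths")
    # Rows 1..k (1-indexed, inclusive) are M_new[0:k] in Python.
    return M_new[:k], y_new[:k], M_new[k:], y_new[k:]


# Toy example (made-up numbers): 4 paths over 3 links, 2 paths monitored.
M_new = [[1, 1, 0],
         [0, 1, 1],
         [1, 1, 1],
         [1, 0, 0]]
y_new = [30.0, 25.0, 35.0, 10.0]

M_s, y_s, M_r, y_r = split_monitored(M_new, y_new, k=2)
```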

The vectors $y$, $y_s$ and $y_r$ will vary over time, so references to $y_s$ and $y_r$ relate to the values of $y_s(t)$

and $y_r(t)$ at some instant $t$. If we let $\beta$ and $\Sigma$ be the mean and covariance of link delays

respectively, then the mean ($\nu$) and covariance ($V$) of $y$ can be expressed as:

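$$
\nu = \begin{bmatrix} \nu_s \\ \nu_r \end{bmatrix}
    = \begin{bmatrix} M_s \beta \\ M_r \beta \end{bmatrix}
\qquad \text{(A.3)}
$$

$$
V = \begin{bmatrix} V_{ss} & V_{sr} \\ V_{rs} & V_{rr} \end{bmatrix}
  = \begin{bmatrix} M_s \Sigma M_s^T & M_s \Sigma M_r^T \\ M_r \Sigma M_s^T & M_r \Sigma M_r^T \end{bmatrix}
\qquad \text{(A.4)}
$$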
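The relations in (A.3) and (A.4) can be checked on a toy example. The sketch below is illustrative only: the helper names and all numbers are made up, and plain Python lists are used instead of a matrix library. It computes the path mean $M\beta$ and path covariance $M \Sigma M^T$ for two paths that share one link.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def path_moments(M, beta, Sigma):
    """Return (nu, V) with nu = M beta and V = M Sigma M^T."""
    nu = [sum(m * b for m, b in zip(row, beta)) for row in M]
    V = matmul(matmul(M, Sigma), transpose(M))
    return nu, V


# Toy example (made-up numbers): 2 paths over 3 links; both paths use link 2.
M = [[1, 1, 0],
     [0, 1, 1]]
beta = [10, 20, 5]                          # mean link delays
Sigma = [[4, 0, 0], [0, 1, 0], [0, 0, 9]]   # diagonal link covariance

nu, V = path_moments(M, beta, Sigma)
```

In this example each path variance is the sum of the variances of its links, and the off-diagonal entry of `V` equals the variance of the shared link. If the first path is monitored and the second is not, the entries of `V` are exactly the blocks $V_{ss}$, $V_{sr}$, $V_{rs}$ and $V_{rr}$ of (A.4).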

Chua et al. [125] found the link-covariance matrix $\Sigma$ to be dominated by the diagonal elements

(the variances of the link delay values) for the considered Abilene network, with other elements

mainly zero. For the datasets we consider, we find that the link covariance cannot be calculated

efficiently by using traceroutes alone to infer link delays, as some traceroutes anomalously report

a smaller path delay to the $(n+1)^{th}$ hop than to the $n^{th}$ hop, implying a negative link delay at the $(n+1)^{th}$ IP hop.

Due to these measurement artifacts, we assume $\Sigma$ to be an identity matrix, as in [142]. However, we

find that some nontrivial interrelationships between link properties can arise in practical situations

as we discuss in the next section.
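The traceroute artifact described above is easy to detect once per-hop delays are differenced. The following is a minimal sketch under stated assumptions: the input is a list of cumulative delays to successive hops of one traceroute, the function names are hypothetical, and the sample values are made up.

```python
def link_delays(hop_delays):
    """Infer per-link delays by differencing cumulative delays to successive hops."""
    if not hop_delays:
        return []
    return [hop_delays[0]] + [b - a for a, b in zip(hop_delays, hop_delays[1:])]


def negative_hops(hop_delays):
    """Return the 1-based hop indices whose inferred link delay is negative."""
    return [i + 1 for i, d in enumerate(link_delays(hop_delays)) if d < 0]


# Made-up traceroute: the reported delay to hop 3 is smaller than the delay to hop 2.
sample = [1.2, 5.0, 4.1, 9.3]
bad = negative_hops(sample)
```

Any hop flagged this way yields a negative inferred link delay, which is why a covariance estimate built from such differences cannot be trusted.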

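The BL estimator for an unknown parameter $y$ given $x$ [136] is:

$$
E(y \mid x) = \mu_y + (x - \mu_x)\, c^* \qquad \text{(A.5)}
$$

where $\mu_x = E(x)$, $\mu_y = E(y)$, and $c^*$ is the solution to $V_{xx} c = V_{xy}$, with $V_{xx} = \mathrm{Cov}(x)$ and

$V_{xy} = \mathrm{Cov}(x, y)$ ([136], Section 6.3).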

Similarly, the BL-estimator for path metrics on the unmonitored paths ($y_r$) given the path metrics

on monitored paths ($y_s$) is given by:

$$
E(l_r^T y_r \mid y_s) = l_r^T M_r \beta + l_r^T c^* (y_s - M_s \beta) \qquad \text{(A.6)}
$$

where $c^*$ is any solution to $c^* V_{ss} = V_{rs}$, and $l_r$ is a column vector with one element set to 1 (and

the others to 0) so as to select the one row of $M_r$ corresponding to a particular unmonitored path.

Since the BL-estimator in (A.6) cannot be realized without knowledge of $\beta$, one natural solution is

to estimate it from the data. Statistical theory [136] suggests estimating $\beta$ by solving the

following generalized least-squares problem:

$$
\min_{\beta}\; (y_s - M_s \beta)^T V_{ss}^{-1} (y_s - M_s \beta) \qquad \text{(A.7)}
$$

The generalized least-squares estimate $\hat{\beta}$ is given by [136]:

$$
\hat{\beta} = (M_s^T V_{ss}^{-1} M_s)^{-} M_s^T V_{ss}^{-1} y_s \qquad \text{(A.8)}
$$

where $R^{-}$ denotes the generalized inverse of matrix $R$.

After substituting $\hat{\beta}$ in (A.6) and simplifying, the BL estimator becomes:

$$
E(l_r^T y_r \mid y_s) = l_r^T V_{rs} V_{ss}^{-1} y_s \qquad \text{(A.9)}
$$
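The final form (A.9) can be sketched numerically for the case assumed in this appendix, where $\Sigma$ is the identity matrix, so that $V_{ss} = M_s M_s^T$ and $V_{rs} = M_r M_s^T$. The code below is an illustrative sketch only, not the implementation used in this thesis: the function names and toy data are made up, and it further assumes that $V_{ss}$ is invertible so that an ordinary linear solve can stand in for the generalized inverse. Exact rational arithmetic is used to keep the example free of rounding effects.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination (A must be square and invertible)."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("V_ss is singular; a generalized inverse would be needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def bl_estimate(M_s, M_r, y_s):
    """Predict unmonitored path metrics as V_rs V_ss^{-1} y_s, assuming Sigma = I."""
    V_ss = matmul(M_s, transpose(M_s))
    V_rs = matmul(M_r, transpose(M_s))
    z = solve(V_ss, y_s)  # z = V_ss^{-1} y_s
    return [sum(v * zi for v, zi in zip(row, z)) for row in V_rs]


# Toy example (made-up numbers): the unmonitored path is the union of the two monitored paths.
M_s = [[1, 1, 0],
       [0, 0, 1]]
M_r = [[1, 1, 1]]
y_s = [30, 12]

y_r_hat = bl_estimate(M_s, M_r, y_s)
```

Here the unmonitored row of the path matrix is the sum of the two monitored rows, so the prediction is simply the sum of the two monitored delays. When an unmonitored path is not a linear combination of monitored ones, the estimate is only an approximation.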

REFERENCES

[1] The AS Number Report. See http://www.potaroo.net/tools/asn32/.

[2] C. Labovitz, et al., "Delayed Internet routing convergence," in SIGCOMM '00: Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2000, pp. 175-187.

[3] X. Yang and D. Wetherall, "Source selectable path diversity via routing deflections," in SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, 2006, pp. 159-170.

[4] D. Andersen, et al., "Resilient overlay networks," in SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles, 2001, pp. 131-145.

[5] C. Tang and P. K. McKinley, "Improving multipath reliability in topology-aware overlay networks," in Distributed Computing Systems Workshops, 2005. 25th IEEE International Conference on, 2005, pp. 82-88.

[6] T. Fei, et al., "How to Select a Good Alternate Path in Large Peer-to-Peer Systems?," in Infocom '06, Barcelona, Spain, 2006.

[7] Y. Chen, et al., "Tomography-based overlay network monitoring," in IMC '03: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, 2003, pp. 216-231.

[8] D. B. Chua, et al., "Network Kriging," Selected Areas in Communications, IEEE Journal on, vol. 24, pp. 2263-2272, 2006.

[9] D. Andersen, et al., "Best-path vs. multi-path overlay routing," in IMC '03: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, 2003, pp. 91-100.

[10] A. Akella, et al., "A comparison of overlay routing and multihoming route control," in SIGCOMM '04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, 2004, pp. 93-106.

[11] D. G. Andersen, et al., "Improving Web Availability for Clients with MONET," in 2nd Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA 2005.

[12] L. Subramanian, et al., "HLP: a next generation inter-domain routing protocol," in SIGCOMM '05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, 2005, pp. 13-24.

[13] W. Xu and J. Rexford, "MIRO: multi-path interdomain routing," SIGCOMM Comput. Commun. Rev., vol. 36, pp. 171-182, 2006.

[14] X. Yang, "NIRA: a new Internet routing architecture," in FDNA '03: Proceedings of the ACM SIGCOMM workshop on Future directions in network architecture, 2003, pp. 301-312.

[15] R. Teixeira, et al., "Network sensitivity to hot-potato disruptions," in SIGCOMM '04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, 2004, pp. 231-244.

[16] K. Gummadi, et al., "Improving the Reliability of Internet Paths with One-hop Source Routing," in OSDI '04, 2004, pp. 183-198.

[17] S. Savage, et al., "Detour: a Case for Informed Internet Routing and Transport," IEEE Micro, vol. Vol 19, no 1 pp. 50-59, January 1999.

[18] Z. Li and P. Mohapatra, "The Impact of Topology on Overlay Routing Service," in Infocom, Hong Kong, 2004.

[19] A. Nakao, et al., "Scalable routing overlay networks," SIGOPS Oper. Syst. Rev., vol. 40, pp. 49-61, 2006.

[20] H. H. Song, et al., "NetQuest: a flexible framework for large-scale network measurement," SIGMETRICS Perform. Eval. Rev., vol. 34, pp. 121-132, 2006.

[21] Y. Chen, et al., "Algebra-based scalable overlay network monitoring: algorithms, evaluation, and applications," IEEE/ACM Trans. Netw., vol. 15, pp. 1084-1097, 2007.

[22] M. Coates, et al., "Compressed network monitoring for ip and all-optical networks," in IMC '07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007, pp. 241-252.

[23] C. Tang and P. K. McKinley, "On the cost-quality tradeoff in topology-aware overlay path probing," in Network Protocols, 2003. Proceedings. 11th IEEE International Conference on, 2003, pp. 268-279.

[24] C. Tang and P. K. McKinley, "A distributed approach to topology-aware overlay path monitoring," in Distributed Computing Systems, 2004. Proceedings. 24th International Conference on, 2004, pp. 122-131.

[25] C. Tang and P. K. McKinley, "Improving Multipath Reliability in Topology-Aware Overlay Networks," in Proceedings of the Fourth International Workshop on Assurance in Distributed Systems and Networks (ADSN 2005) (in conjunction with IEEE ICDCS), Columbus, Ohio, USA, 2005.

[26] H. H. Song, "Scalable and Flexible Network Measurement (Masters Thesis) ", Department of Computer Science, University of Texas at Austin, 2006.

[27] S. Qazi and T. Moors, "Using Type-of-Relationship (ToR) Graphs to Select Disjoint Paths in Overlay Networks," in GLOBECOM 2007, pp. 2602-2606.

[28] Y. Zhu, et al., "Dynamic overlay routing based on available bandwidth estimation: a simulation study," Comput. Networks, vol. 50, pp. 742-762, 2006.

[29] G. Kwon and K. Ryu, "BYPASS: topology-aware lookup overlay for DHT-based P2P file locating services," in Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on, 2004, pp. 297-304.

[30] B. Y. Zhao, et al., "Brocade: Landmark Routing on Overlay Networks," in IPTPS '02, MIT Faculty Club, Cambridge, MA, USA., 2002.

[31] B. Y. Zhao, et al., "Exploiting Routing Redundancy via Structured Peer-to-Peer Overlays," in IEEE International Conference on Network Protocols (ICNP 2003), Atlanta, Georgia, USA, 2003.

[32] A.-J. Su, et al., "Drafting behind Akamai (travelocity-based detouring)," in SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, 2006, pp. 435-446.

[33] M. Faloutsos, et al., "On Power-Law Relationships in Internet topology," in SIGCOMM '99, Cambridge, MA, USA, 1999.

[34] B. Eriksson, et al., "Network discovery from passive measurements," SIGCOMM Comput. Commun. Rev., vol. 38, pp. 291-302, 2008.

[35] S. Ratnasamy, et al., "A Scalable Content Addressable Network," in SIGCOMM '01, San Diego, USA, 2001.

[36] I. Stoica, et al., "Chord: a scalable peer-to-peer lookup protocol for Internet applications," Networking, IEEE/ACM Transactions on, vol. 11, pp. 17-32, 2003.

[37] S.-J. Lee, et al., "Bandwidth-Aware Routing in Overlay Networks," in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008, pp. 1732-1740.

[38] T. Rakotoarivelo, et al., "A Super-Peer based Method to Discover QoS Enhanced Alternate Paths," in Communications, 2005 Asia-Pacific Conference on, 2005, pp. 454-458.

[39] B.-G. Chun, et al., "Characterizing Selfishly Constructed Overlay Routing Networks," in Proceedings of the 23rd IEEE International Conference on Computer Communications (INFOCOM 2004), 2004.

[40] J. Han, et al., "Topology aware overlay networks," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2005, pp. 2554-2565 vol. 4.

[41] "RIPE, Test Traffic Measurements (TTM) Home Page. See http://www.ripe.net/projects/ttm/data.html."

[42] Active Measurement Project (AMP). See http://watt.nlanr.net/.

[43] D. Andersen, et al., "Best Path Vs Multi-path Overlay Routing," in IMC '03, Miami Beach, Florida, USA, 2003.

[44] D. Antonova, et al., "Managing a portfolio of overlay paths," in NOSSDAV '04: Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video, 2004, pp. 30-35.

[45] H. Madhyastha, et al., "iPlane: an information plane for distributed services," in OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation, Seattle, Washington, 2006, pp. 367-380.

[46] H. V. Madhyastha, et al., "A Structural Approach to Latency Prediction," presented at IMC 2006, 2006.

[47] H. V. Madhyastha, et al., "iPlane Nano: Path Prediction for Peer-to-Peer Applications," in NSDI 2009, 2009.

[48] A. Broido and k. Claffy, "Analysis of RouteViews BGP data: policy atoms," presented at the Network Resource Data Management Workshop, 2001.

[49] K. V. M. Naidu, et al., "Detecting Anomalies Using End-to-End Path Measurements," in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008, pp. 1849-1857.

[50] S. Qazi and T. Moors, "Practical Issues of Statistical Path Monitoring in Overlay Networks with Large, Rank-Deficient Routing Matrices," in Broadnets, London, UK, 2008.

[51] Y. Zhang and N. Duffield, "On the constancy of internet path properties," in IMW '01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, 2001, pp. 197-211.

[52] The Skitter Project (CAIDA), 2002. http://www.caida.org/tools/measurement/skitter/.

[53] C.-M. Cheng, et al., "Path probing relay routing for achieving high end-to-end performance," in Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE, 2004, pp. 1359-1365 Vol.3.

[54] M. Coates, et al., "Maximum likelihood network topology identification from edge-based unicast measurements," SIGMETRICS Perform. Eval. Rev., vol. 30, pp. 11-20, 2002.

[55] R. Govindan and H. Tangmunarunkit, "Heuristics for Internet map discovery," in INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 2000, pp. 1371-1380 vol.3.

[56] F. Viger, et al., "Detection, understanding, and prevention of traceroute measurement artifacts," Comput. Netw., vol. 52, pp. 998-1018, 2008.

[57] M. Luckie, et al., "Traceroute Probe Method and Forward IP Path Inference," presented at the Internet Measurement Conference (IMC '08), Vouliagmeni, Greece, 2008.

[58] A. Nakao, et al., "A Routing Underlay for Overlay Networks," in SIGCOMM '03, Karlsruhe, Germany, 2003.

[59] W. Cui, et al., "Backup path allocation based on a correlated link failure probability model in overlay networks," in Proceedings of 10th IEEE International Conference on Network Protocols (ICNP’02), Paris, France, 2002, pp. 236-247.

[60] R. Kawahara, et al., "On the Quality of Triangle Inequality Violation Aware Routing Overlay Architecture," in INFOCOM 2009. The 28th Conference on Computer Communications. IEEE, Rio de Janeiro, 2009, pp. 2761-2765.

[61] M. Uchida, et al., "QoS-Aware Overlay Routing with Limited Number of Alternative Route Candidates and Its Evaluation," IEICE Trans Commun, vol. E89-B, pp. 2361-2374, 2006.

[62] N. Hu and P. Steenkiste, "Exploiting internet route sharing for large scale available bandwidth estimation," in IMC '05: Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, Berkeley, CA, 2005, pp. 16-16.

[63] L. Gao, "On inferring autonomous system relationships in the internet," IEEE/ACM Trans. Netw., vol. 9, pp. 733-745, 2001.

[64] F. Dabek, et al., "Designing a DHT for Low Latency and High Throughput," in NSDI '04, 2004, pp. 85-98.

[65] S. Ratnasamy, et al., "A scalable content-addressable network," in SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, 2001, pp. 161-172.

[66] (2000) Fast Internet Content Delivery with FreeFlow, Akamai see www.cs.washington.edu/homes/ratul/akamai/freeflow.pdf

[67] Z. Li and P. Mohapatra, "QRON: QoS-aware routing in overlay networks," Selected Areas in Communications, IEEE Journal on, vol. 22, pp. 29-40, 2004.

[68] S. D. Patek, et al., "Enhancing aggregate QoS through alternate routing," in Global Telecommunications Conference, 2000. GLOBECOM '00. IEEE, 2000, pp. 611-615 vol.1.

[69] L. Subramanian, et al., "OverQoS: offering Internet QoS using overlays," SIGCOMM Comput. Commun. Rev., vol. 33, pp. 11-16, 2003.

[70] R. Keralapura, et al., "Race Conditions in Coexisting Overlay Networks," Networking, IEEE/ACM Transactions on, vol. 16, pp. 1-14, 2008.

[71] H. Tangmunarunkit, et al., "Network Topology Generators: Degree based vs Structural," in Sigcomm '02, Pittsburgh, Pennsylvania, USA, 2002.

[72] S. Zhou and R. J. Mondragon, "The rich club phenomenon in internet topology," IEEE Communication letters, vol. 8, pp. 180-182, March 2004.

[73] PlanetLab. See http://www.planet-lab.org/.

[74] H. Chang, et al., "Internet connectivity at the AS-level: an optimization-driven modeling approach," in MoMeTools '03: Proceedings of the ACM SIGCOMM workshop on Models, methods and tools for reproducible network research, 2003, pp. 33-46.

[75] S. Jaiswal, et al., "Comparing the structure of power-law graphs and the Internet AS graph," in Network Protocols, 2004. ICNP 2004. Proceedings of the 12th IEEE International Conference on, 2004, pp. 294-303.

[76] S. Agarwal, et al., "OPCA: robust interdomain policy routing and traffic control," in Open Architectures and Network Programming, 2003 IEEE Conference on, 2003, pp. 55-64.

[77] A. Bremler-Barr, et al., "Improved BGP Convergence via Ghost Flushing," in Infocom '03, San Francisco, USA, 2003.

[78] W. Xu and J. Rexford, "MIRO: multi-path interdomain routing," in SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, 2006, pp. 171-182.

[79] J. Chandrashekar, et al., "Limiting path exploration in BGP," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2005, pp. 2337-2348 vol. 4.

[80] J. Chandrashekar, et al., "Fixing BGP, one as at a time," in NetT '04: Proceedings of the ACM SIGCOMM workshop on Network troubleshooting, 2004, pp. 295-300.

[81] D. Pei, et al., "BGP-RCN: improving BGP convergence through root cause notification," Comput. Netw. ISDN Syst., vol. 48, pp. 175-194, 2004.

[82] O. Bonaventure, et al., "Achieving sub-50 milliseconds recovery upon BGP peering link failures," IEEE/ACM Trans. Netw., vol. 15, pp. 1123-1135, 2007.

[83] C. Labovitz, et al., "Delayed Internet routing convergence," Networking, IEEE/ACM Transactions on, vol. 9, pp. 293-306, 2001.

[84] J. Luo, et al., "An Approach to Accelerate Convergence for Path Vector Protocol," in Globecom '02, Taipei, Taiwan, ROC, 2002.

[85] N. Kushman, et al., "R-BGP: Staying Connected in a Connected World," in 4th USENIX Symposium on Networked Systems Design & Implementation 2007, pp. 341-354.

[86] B. Quoitin, et al., "Interdomain traffic engineering with BGP," Communications Magazine, IEEE, vol. 41, pp. 122-128, 2003.

[87] M. Motiwala, et al., "Path splicing," in SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication, Seattle, WA, USA, 2008, pp. 27-38.

[88] M. Shand and S. Bryant, "IP Fast Reroute Framework," draft-ietf-rtgwg-ipfrr-framework-10, work in progress, Feb 27 2009.

[89] P. Francois and O. Bonaventure, "An evaluation of IP-based fast reroute techniques," in CoNEXT '05: Proceedings of the 2005 ACM conference on Emerging network experiment and technology, Toulouse, France, 2005, pp. 244-245.

[90] S. Singh, et al., "Asynchronous Transfer Mode (ATM) over Layer 2 Tunneling Protocol Version 3 (L2TPv3), RFC 4454," May 2006.

[91] A Path Computation Element (PCE)-Based Architecture, IETF RFC 4655, 2006.

[92] M. Yannuzzi, et al., "On the challenges of establishing disjoint QoS IP/MPLS paths across multiple domains," Communications Magazine, IEEE, vol. 44, pp. 60-66, 2006.

[93] I. van Beijnum. (2002). A Look at Multihoming and BGP. Available: http://www.oreillynet.com/pub/a/network/2002/08/12/multihoming.html

[94] G. Huston. (2004). BGP Routing Table Analysis Reports. Available: http://bgp.potaroo.net/

[95] T. Bu, et al., "On characterizing BGP routing table growth," Comput. Netw., vol. 45, pp. 45-54, 2004.

[96] C. De Launois and M. Bagnulo, "The paths toward IPv6 multihoming," Communications Surveys & Tutorials, IEEE, vol. 8, pp. 38-51, 2006.

[97] O. Antonova, "Introduction and Comparison of SCTP, TCP-MH, DCCP protocols," 2004.

[98] S. Tao, et al., "Exploring the performance benefits of end-to-end path switching," in Network Protocols, 2004. ICNP 2004. Proceedings of the 12th IEEE International Conference on, 2004, pp. 304-315.

[99] J. Han, et al., "An Experimental Study of Internet Path Diversity," Dependable and Secure Computing, IEEE Transactions on, vol. 3, pp. 273-288, 2006.

[100] G. Huston. The growth of the BGP table - 1994 to present. Available: http://bgp.potaroo.net

[101] CAIDA, The Cooperative Association for Internet Data Analysis. See http://www.caida.org/home/.

[102] NLANR-AMP, "Location of AMP monitors." See http://watt.nlanr.net/.

[103] RIPE-NCC, "Location of RIPE monitors." See http://www.ripe.net/projects/ttm/Plots/locations.cgi.

[104] S. Savage, et al., "The end-to-end effects of Internet path selection," in SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, 1999, pp. 289-299.

[105] M. Faloutsos, et al., "On power-law relationships of the Internet topology," in SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, 1999, pp. 251-262.

[106] CAIDA AS Relationships Dataset. See http://www.caida.org/data/active/as-relationships/.

[107] R. Keralapura, et al., "Can ISPs Take the Heat from Overlay Networks?," presented at HotNets '04, San Diego, CA, USA, 2004.

[108] T. S. E. Ng and H. Zhang, "Predicting Internet network distance with coordinates-based approaches," in INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 2002, pp. 170-179 vol.1.

[109] V. Padmanabhan and L. Subramanian, "An investigation of geographic mapping techniques for internet hosts," in SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, 2001, pp. 173-185.

[110] S. Ratnasamy, et al., "Topologically-Aware Overlay Construction and Server Selection," in Infocom, New York, NY, USA, 2002.

[111] M. Costa, et al., "PIC: practical Internet coordinates for distance estimation," in Distributed Computing Systems, 2004. Proceedings. 24th International Conference on, 2004, pp. 178-187.

[112] P. Francis, et al., "IDMaps: A Global Internet Host Distance Estimation Service," 2000.

[113] L. Tang and M. Crovella, "Virtual Landmarks for the Internet," in IMC '03, Miami Beach, Florida, USA, 2003.

[114] G. Mohan, et al., "Efficient algorithms for routing dependable connections in WDM optical networks," Networking, IEEE/ACM Transactions on, vol. 9, pp. 553-566, 2001.

[115] T. Rakotoarivelo, et al., "A structured peer-to-peer method to discover QoS enhanced alternate paths," in Information Technology and Applications, 2005. ICITA 2005. Third International Conference on, 2005, pp. 671-676 vol.2.

[116] T. Erlebach, et al., "Cuts and Disjoint Paths in the Valley-Free Path Model," presented at the Proceedings of the First Workshop on Combinatorial and Algorithmic Aspects of Networking (CAAN), 2004

[117] G. Di Battista, et al., "Computing the types of the relationships between autonomous systems," in INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies. IEEE, 2003, pp. 156-165 vol.1.

[118] J. Xia and L. Gao, "On the evaluation of AS relationship inferences [Internet reachability/traffic flow applications]," in Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE, 2004, pp. 1373-1377 Vol.3.

[119] J. W. Suurballe and R. E. Tarjan, "A Quick Method for Finding Shortest Pairs of Disjoint Paths," Networks, vol. 14, pp. 325-336, 1984.

[120] J. Kleinberg, "Approximation Algorithms for Disjoint Paths Problems," PhD thesis, Dept. of EECS, MIT, 1996.

[121] T. Rakotoarivelo, et al., "Enhancing QoS Through Alternate Path: An End-to-End Framework," in ICN 2005, 4th International Conference on Networking, Reunion Island, France, 2005, pp. 125-132.

[122] Cymru IP TO ASN Whois Service. http://www.cymru.com/.

[123] GNU netcat. See http://netcat.sourceforge.net.

[124] RouteViews. Available: http://www.routeviews.org/

[125] D. B. Chua, et al., "Efficient monitoring of end-to-end network properties," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2005, pp. 1701-1711 vol. 3.

[126] H. V. Madhyastha, et al., "iPlane: An Information Plane for Distributed Services," in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, 2006, pp. 367-380.

[127] G. H. Golub and C. F. Van Loan, Matrix Computations, Third ed.: Johns Hopkins, 1996.

[128] N. Spring, et al., "Measuring ISP topologies with rocketfuel," in SIGCOMM '02: Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, 2002, pp. 133-145.

[129] B. Augustin, et al., "Avoiding traceroute anomalies with Paris traceroute," in IMC '06: Proceedings of the 6th ACM SIGCOMM on Internet measurement, 2006, pp. 153-158.

[130] D. Dobson and F. Santosa, "Recovery of blocky images from noisy and blurred data," SIAM J. Appl. Math., vol. 56, pp. 1181-1198, 1996.

[131] E. J. Candes, et al., "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," Information Theory, IEEE Transactions on, vol. 52, pp. 489-509, 2006.

[132] D. Donoho, "For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution," Communications on pure and applied mathematics, vol. 59, pp. 907-934, 2006.

[133] A. Bruckstein, et al., "On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations," IEEE Transactions on Information Theory, vol. 54, pp. 4813-4820, 2008.

[134] Rosen, et al., "Accurate Solution to Overdetermined Linear Equations with Errors Using L1 Norm Minimization," Computational Optimization and Applications, vol. 17, pp. 329-341, 2000.

[135] J. Neter, et al., Applied Linear Regression Models, Third ed.: Irwin, 1996.

[136] R. Christensen, Plane Answers to Complex Questions: The Theory of Linear Models, Third ed.: Springer, 2002.

[137] I. Daubechies, et al., "Iteratively Re-weighted Least Squares minimization: Proof of faster than linear rate for sparse recovery," in Information Sciences and Systems, 2008. CISS 2008. 42nd Annual Conference on, 2008, pp. 26-29.

[138] The Archipelago Project (CAIDA). http://www.caida.org/projects/ark/.

[139] W. Cui, et al., "Backup path allocation based on a correlated link failure probability model in overlay networks," in Network Protocols, 2002. Proceedings. 10th IEEE International Conference on, 2002, pp. 236-245.

[140] V. Aggarwal, et al., "Can ISPs and P2P users cooperate for improved performance?," SIGCOMM Comput. Commun. Rev., vol. 37, pp. 29-40, 2007.

[141] T. Karagiannis, et al., "Should internet service providers fear peer-assisted content distribution?," in IMC '05: Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, Berkeley, CA, 2005, pp. 6-6.

[142] D. B. Chua, et al., "A Statistical Framework for Efficient Monitoring of End-to-End Network Properties," CoRR, vol. abs/cs/0412037, 2004.