Parameterising and Modelling

the Internet Topology

Shi Zhou

Department of Electronic Engineering

Queen Mary, University of London

A thesis submitted to the University of London

for the degree of Doctor of Philosophy.

July 2004

To my family.

Abstract

Simulation plays a vital role in studying the complex behaviour of both existing
telecommunications networks and proposed future architectures. When modelling
the behaviour of the Internet it is crucial to obtain a good description of its
structure, because structure fundamentally affects function. The aim of this work
is to provide quantitative parameters that fully characterise network structures
and to propose realistic models that accurately reproduce the Internet topology
at the autonomous system (AS) level.

This thesis introduces the novel concept of the rich-club phenomenon to describe
the Internet’s hierarchical structure, in which a small number of highly connected
nodes are tightly interconnected with each other. This structure is quantitatively
characterised by the rich-club connectivity and the node-node link distribution.
The rich-club connectivity metric is a milestone in parameterising the Internet
topology. Using this metric, the author reports that the existing degree-based
models do not match the Internet’s hierarchical structure. The author shows that
an appreciation of the rich-club connectivity is essential for a proper examination
of network behaviours such as routing efficiency, redundancy and robustness. The
author also uses this metric to reveal the major topological disparities between
Internet measurements obtained using different methodologies.

The author introduces an original Interactive Growth (IG) model, which closely
reproduces both the power-law degree distribution and the rich-club connectivity
of the AS-level Internet. Based on observations of Internet history data, the
author improves the IG model and introduces the Positive-Feedback Preference
(PFP) model, which is doubtlessly the most complete and detailed model to date.
The PFP model accurately reproduces all the relevant topological properties of
the Internet, including degree distribution, rich-club connectivity, the maximum
degree, shortest path length, short cycles, disassortative mixing and betweenness
centrality. The PFP model’s non-linear preference mechanism provides a novel
insight into the basic dynamics that could be responsible for the evolving topology
of complex networks.

This successful research has provided a number of promising contributions.
These achievements represent a profound extension of the state-of-the-art
knowledge in the area of parameterising and modelling the Internet topology.


Acknowledgements

The author would like to express his deepest gratitude to the many people who
have kindly supported and assisted his work, including Dr. Chris Phillips and
Dr. Matthew Woolf, and especially to his supervisor, Dr. Raul J. Mondragon, for
his great help and guidance through every step of the author’s research. Thanks
also to Dr. Andre Broido (CAIDA) for the inspiring discussions. The author is
grateful for the hospitality and support of the Department of Electronic
Engineering, Queen Mary, University of London.

This work was funded by the U.K. Engineering and Physical Sciences Research

Council (EPSRC) under Grant No. GR-R30136-01.


Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Challenges
  1.2 Contributions of this thesis
    1.2.1 Parameterising The Internet Topology
    1.2.2 Modelling The Internet Topology
  1.3 Structure of this thesis

2 Preliminaries
  2.1 Internet Topology
  2.2 Topological Properties
    2.2.1 Network Size
    2.2.2 Degree
    2.2.3 Degree Distribution
      2.2.3.1 Poisson Degree Distribution
      2.2.3.2 Power-Law Degree Distribution
    2.2.4 Shortest Path Length
    2.2.5 Node Betweenness Centrality
    2.2.6 Clustering Coefficient
    2.2.7 Disassortative Mixing (Degree Correlations)
  2.3 Random Networks
  2.4 Small-World Networks
  2.5 Summary

3 Measurements and Models Of The AS-Level Internet
  3.1 Introduction
  3.2 Topology Measurements Of The AS-Level Internet
    3.2.1 Passive Measurement - BGP AS Graph
    3.2.2 Extended BGP AS Graph
    3.2.3 Active Measurement - Traceroute AS Graph
    3.2.4 Discovery Of The Internet Power-Law Degree Distribution
    3.2.5 Which AS Graph?
  3.3 Topology Models Of The AS-Level Internet
    3.3.1 Tiers Model
    3.3.2 GT-ITM Model
    3.3.3 User-Provider Model
    3.3.4 Inet Model
    3.3.5 Barabasi and Albert Model
    3.3.6 Fitness BA Model
    3.3.7 Generalised BA Model
    3.3.8 BRITE Model
    3.3.9 Dorogovtsev-Mendes Model
    3.3.10 Generalised Linear Preference Model
    3.3.11 Generalised Network Growth Model
    3.3.12 Highly Optimised Tolerance Model
  3.4 Discussions
    3.4.1 Structure-Based Models vs Degree-Based Models
    3.4.2 Accuracy vs Simplicity
  3.5 Summary

4 Rich-Club Phenomenon
  4.1 Introduction
    4.1.1 Internet Hierarchical Structure
    4.1.2 Connectivity Of The Core
    4.1.3 Motivation
  4.2 Rich-Club Phenomenon
    4.2.1 Rich-Club Connectivity
    4.2.2 Node-Node Link Distribution
  4.3 Discussion
    4.3.1 Rich-Club Subgraph
    4.3.2 Rich-Club Phenomenon Is Relevant
    4.3.3 Modelling The Rich-Club
  4.4 Summary

5 Interactive Growth Model
  5.1 Introduction
  5.2 Interactive Growth Model
  5.3 Model Validation
    5.3.1 Degree Distribution
      5.3.1.1 Degree Distribution
      5.3.1.2 Degree vs Rank
    5.3.2 Rich-club Phenomenon
      5.3.2.1 Rich-Club Connectivity
      5.3.2.2 Node-Node Link Distribution
  5.4 Discussion
    5.4.1 Maximum Degree
    5.4.2 Rich-Club Connectivity
  5.5 Summary

6 Structure Affects Functions
  6.1 Introduction
  6.2 Routing Efficiency
  6.3 Network Redundancy
  6.4 Network Robustness
    6.4.1 Node Error
    6.4.2 Node Attack
    6.4.3 Link Error
    6.4.4 Link Attack
  6.5 Discussion
  6.6 Summary

7 Topological Disparities Between Internet Measurements
  7.1 Introduction
  7.2 Comparison
    7.2.1 Degree Distribution
    7.2.2 Rich-Club Connectivity
    7.2.3 Shortest Path Length
    7.2.4 Short Cycles
    7.2.5 Disassortative Mixing
    7.2.6 Betweenness Centrality
  7.3 Discussion
  7.4 Summary

8 The Positive-Feedback Preference Model
  8.1 Introduction
  8.2 Modelling The Maximum Degree
  8.3 The Positive-Feedback Preference Model
  8.4 Model Validation
    8.4.1 Degree Distribution, Rich-Club Connectivity and Maximum Degree
    8.4.2 Short Cycles
    8.4.3 Disassortative Mixing
    8.4.4 Shortest Path Length
    8.4.5 Betweenness Centrality
  8.5 Discussion
    8.5.1 The Positive-Feedback Preferential Attachment
    8.5.2 Critical Assessment of The PFP Model
  8.6 Summary

9 Discussion and Conclusion
  9.1 Discussion
  9.2 Future Work
  9.3 Conclusion

Appendix I. Queen Mary Topology Simulator
Appendix II. Author’s Publications
Glossary
Bibliography


List of Figures

2.1 Structure of the Internet
2.2 The motorway network of the USA
2.3 Poisson degree distribution
2.4 The air traffic route network of the USA
2.5 Power-law degree distribution
2.6 Three Networks
2.7 Small-world properties
3.1 A map of the AS-level Internet
3.2 Degree Frequency
3.3 Growth of the BA model
3.4 The growth of the GLP model
4.1 Two network structures
4.2 Cumulative distribution of degree. For each model, ten networks are generated and averaged.
4.3 Rich-club connectivity
4.4 Node-node link distribution
4.5 Degree distribution inside the rich-club subgraph
5.1 The interactive growth mechanism of the IG model
5.2 Degree distribution. For each model, ten networks are generated and averaged.
5.3 Degree vs rank
5.4 Rich-club connectivity
5.5 Node-node link distribution
5.6 Node-node link distribution
5.7 A network generated by the IG model
5.8 Time-evolution of node degree
6.1 Cumulative distribution of degree
6.2 Rich-club connectivity
6.3 Cumulative distribution of shortest path length
6.4 Distribution of triangle coefficient
6.5 Cumulative distribution of triangle coefficient
6.6 Distribution of quadrangle coefficient
6.7 Cumulative distribution of quadrangle coefficient
6.8 Node attack
6.9 Link attack
6.10 Network robustness under node error
6.11 Network robustness under node attack
6.12 Network robustness under link error
6.13 Network robustness under link attack
6.14 A conical structure model
7.1 Cumulative degree distribution
7.2 Degree distribution
7.3 Degree vs rank
7.4 Rich-club connectivity φ(k) as a function of degree
7.5 Rich-club connectivity φ(r/N) as a function of normalised rank
7.6 Cumulative distribution of shortest path length
7.7 Correlation between shortest path length and degree
7.8 Cumulative distribution of clustering coefficient
7.9 Correlation between clustering coefficient and degree
7.10 Cumulative distribution of triangle coefficient
7.11 Correlation between triangle coefficient and degree
7.12 Cumulative distribution of quadrangle coefficient
7.13 Correlation between quadrangle coefficient and degree
7.14 Correlation between nearest-neighbours average degree and degree
7.15 Cumulative distribution of betweenness
7.16 Correlation between betweenness and degree
7.17 The three AS graph measurements
8.1 Degree vs rank
8.2 Rich-club connectivity
8.3 Three degree functions
8.4 Degree growth
8.5 Degree distribution
8.6 Cumulative degree distribution
8.7 Cumulative distribution of triangle coefficient
8.8 Cumulative distribution of quadrangle coefficient
8.9 Correlation between triangle coefficient and degree
8.10 Correlation between quadrangle coefficient and degree
8.11 Cumulative distribution of nearest-neighbours average degree
8.12 Correlations between nearest-neighbours average degree and degree
8.13 Cumulative distribution of shortest path length
8.14 Correlation between shortest path length and degree
8.15 Cumulative distribution of betweenness centrality
8.16 Correlations between betweenness centrality and degree
8.17 Network properties of a growing PFP model
10.1 Function flowchart of the QMUL Topology Simulator
10.2 Window of “Parameters for generating networks”
10.3 Window of the main interface
10.4 Window of “Save plot data files”


List of Tables

4.1 Distribution of ASes in the Internet hierarchy [78]
4.2 Network parameters
4.3 Rich-club properties
5.1 Network properties
5.2 Node-node link distribution
6.1 Network Parameters
6.2 Network Short Cycles
7.1 Parameters of the three AS graphs
7.2 Rich-club connectivity as a function of degree
7.3 Rich-club connectivity as a function of normalised rank
7.4 Parameters of the three AS graphs (continued)
8.1 Network Parameters


Chapter 1

Introduction

Recently there have been considerable efforts to understand the topology of
complex systems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Of
particular interest is the Internet, as it is so influential in our daily lives.

1.1 Challenges

Effective engineering of the Internet is predicated on a detailed understanding

of issues such as the large-scale structure of its underlying physical topology,

the manner in which it evolves over time, and the way in which its constituent

components contribute to its overall function [17].

In the last three decades, the Internet has experienced a fascinating evolution,
with both exponential growth in its traffic and continual expansion of its
topology [18]. This emphasises the need for more thorough and rigorous analysis
of the nature of the Internet topology.

Unfortunately, developing a deep understanding of these issues has proven

to be a challenging task [19, 20, 21, 22, 23], since it in turn involves solving

difficult problems such as mapping the actual topology [24], characterising it, and

developing models that capture its emergent behaviour.

Reliable measurements of the Internet topology became available only
recently [25, 26, 27, 28, 29, 30, 31]. Based on measurement data, Faloutsos et al.
reported in 1999 that the Internet has a power-law degree distribution [32]. This
discovery invalidated all previous research results on modelling the Internet
topology, because they were based on random network theories [33, 34, 35]. Even
though the networking community and physicists have since proposed a number of
Internet topology models [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], it
remains an open question how representative the topologies they generate
are [21].

1.2 Contributions of this thesis

The aim of this work is to provide quantitative parameters that fully characterise
the network structure and to propose realistic models that accurately reproduce
the Internet topology at the autonomous system (AS) level. The author’s own
research contributions are presented in Chapters 4, 5, 6, 7 and 8. All of the
research is based on actual measurements of the Internet.

1.2.1 Parameterising The Internet Topology

Chapter 4 introduces the novel concept of the rich-club phenomenon to describe the
hierarchical structure of the AS-level Internet, i.e. highly connected nodes not
only have large numbers of links but are also tightly interconnected with each
other. Two metrics are defined to quantitatively characterise this structural
property: the rich-club connectivity and the node-node link distribution. The
calculation of the two parameters is rather simple and is based only on the
network connectivity information. Using the two parameters, the author shows that
degree-based models may not reproduce the Internet’s hierarchical structure. The
rich-club connectivity metric is a milestone in parameterising the Internet
topology and provides a new criterion for network models.

Inspired by the rich-club phenomenon, Chapter 5 proposes an original Interactive
Growth (IG) model, which adopts a so-called interactive growth mechanism that has
been observed in Internet history data. The IG model closely reproduces both the
power-law degree distribution and the rich-club connectivity of the AS-level
Internet.

Using the IG model as an example of networks containing a rich-club, Chapter 6
shows that it is relevant to reproduce the Internet’s rich-club structure, because
an Internet model that does not contain a rich-club underestimates the actual
network’s routing efficiency (shortest path length) and routing flexibility
(alternative reachable paths), and overestimates the network robustness under
node attack. This result highlights the importance of studying the Internet
topological structure, because structure fundamentally affects function.

Chapter 7 provides a novel comparison of different Internet data sources obtained
using distinct measuring methodologies. Results show that the measurements
contain non-trivial topological differences. The major structural discrepancy is
revealed by the rich-club connectivity.

1.2.2 Modelling The Internet Topology

Using the IG model as a precursor, Chapter 8 introduces the Positive-Feedback
Preference (PFP) model. The PFP model is superior to any other currently known
Internet topology generator. The PFP model accurately reproduces all the relevant
topological properties of the AS-level Internet, including degree distribution,
rich-club connectivity, the maximum degree, shortest path length, short cycles,
disassortative mixing and betweenness centrality. Moreover, the two growth
mechanisms of the PFP model, namely the appearance of new internal links and the
positive-feedback preference, are based on (and supported by) observations of
Internet history data. The model’s unique non-linear preference provides a novel
insight into the basic dynamics that could be responsible for the evolving
topology of complex networks. The PFP model is a significant achievement in
modelling the Internet topology.

In summary, the author’s successful research has provided a number of promising
contributions. The two main achievements are the metric of rich-club connectivity
and the PFP model. These novel contributions represent a significant extension of
the state-of-the-art knowledge in the area of parameterising and modelling the
Internet topology.

1.3 Structure of this thesis

Chapter 2 defines a number of topological properties that are used in network
research, including network size, degree, rank, degree distribution, shortest
path length, node betweenness centrality, clustering coefficient and
disassortative mixing (degree correlation). Chapter 2 also introduces two
classical network theories used before the discovery of the power-law degree
distribution of the Internet topology.

Chapter 3 provides the up-to-date background of this research. It introduces
data sources of the Internet topology and their measuring methodologies. Chapter
3 describes a number of existing topology models that have been used for
generating Internet-like graphs, then discusses the problems of these models and
points out the objectives of the research.

Chapters 4, 5, 6, 7 and 8 present the author’s research contributions on
parameterising and modelling the Internet topology.

Chapter 9 reviews the methodology used in this research and provides possible
directions for future work. Appendix I provides a brief introduction to the QMUL
Topology Simulator, a software kit developed by the author, which is used to
conduct simulations and obtain numerical results. Appendix II lists the author’s
publications. Most of the material presented in this thesis has been published in
peer-reviewed journals and conferences.


Chapter 2

Preliminaries

This chapter introduces the Internet topology and a number of topological
properties that have been widely used by the network research community. It also
introduces two important classes of networks, namely random networks and
small-world networks, which were used in studying and modelling the Internet
topology until practical measurements of the Internet became available.

2.1 Internet Topology

In general terms, the Internet is a global net of computers, which are
interconnected by wires (links) [8]. This network provides electronic
transmission of information between computers.

The connections in the Internet can be abstracted along the dimension of network
administration, which groups IP addresses into subnetworks, subnetworks into
network prefixes and prefixes into autonomous systems (AS). Figure 2.1 [32] shows
a schematic of the structure of the Internet. The vertices (nodes) of the
Internet are:

• Hosts, which are the computers of users.
• Servers, which are computers or programs providing a network service and which
  can also be hosts.
• Routers, which distribute traffic across the Internet.
• Domains (autonomous systems), where routers are grouped into subnetworks.

Figure 2.1: Structure of the Internet [32]. The global structure of the Internet
is determined by the routers (the router level) and domains (the AS level).

In 2001 the Internet contained about 100 million ($10^8$) hosts. However, it is
not the hosts that determine the structure of the Internet but the routers and
domains. One can therefore consider the topology of the Internet at the router
level or at the AS level. The net of routers is much larger than the net of
autonomous systems: in 2001 there were roughly 228,000 routers in total, while
the total number of autonomous systems was about $10^4$ [18].

An autonomous system is the term that the Border Gateway Protocol (BGP) [48]
gives to an entity that manages one or more networks and has a coherent policy
for routing IP traffic both internally and to other autonomous systems. Within an
autonomous system, routing is governed by internal rules and algorithms (internal
protocols). In principle, the internal protocols of distinct autonomous systems
need not coincide. Therefore the network structure inside an autonomous system
affects only local traffic behaviour.

This thesis focuses on the AS-level Internet topology, in which each node is an
autonomous system, because the delivery of IP traffic through the Internet
depends on the complex interactions between thousands of autonomous systems that
exchange routing information using the Border Gateway Protocol [49, 50]. For
example, research [51] has shown that the topology of the AS-level Internet has
a major impact on delayed BGP routing convergence.

When studying the topology of the Internet, the network connectivity information
is represented as a graph, in which nodes are connected by links. Usually nodes
and links in the graph do not carry physical properties, such as the buffer
volume of a router or the length of an optical cable. The following assumptions
are made about the Internet graph: all links are undirected, no link connects a
node to itself (self-loop) and each node has at least one link ($k \geq 1$). The
graph is also connected: no portion is separated from the network, so any node is
reachable from any other node.
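
The assumptions on the Internet graph can be checked mechanically on any edge list. The following is a minimal sketch, assuming the graph is supplied as a list of node identifiers and a list of node pairs; the function name `check_graph_assumptions`, the result keys and the toy graph are illustrative and do not come from the thesis.

```python
from collections import deque

def check_graph_assumptions(nodes, edges):
    """Check the graph assumptions for an undirected edge list.

    `nodes` is an iterable of node identifiers and `edges` an iterable of
    (u, v) pairs; (u, v) and (v, u) are treated as the same undirected link.
    """
    neighbours = {node: set() for node in nodes}
    no_self_loops = True
    for u, v in edges:
        if u == v:
            no_self_loops = False  # a link from a node to itself
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Every node must have at least one link (k >= 1).
    min_degree_ok = all(len(adjacent) >= 1 for adjacent in neighbours.values())

    # Breadth-first search from an arbitrary node: the graph is connected
    # exactly when the search reaches every node.
    connected = False
    if neighbours:
        start = next(iter(neighbours))
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for adjacent in neighbours[current]:
                if adjacent not in seen:
                    seen.add(adjacent)
                    queue.append(adjacent)
        connected = len(seen) == len(neighbours)

    return {
        "no_self_loops": no_self_loops,
        "min_degree_ok": min_degree_ok,
        "connected": connected,
    }

# Toy four-node chain (hypothetical data): satisfies all three checks.
toy_report = check_graph_assumptions([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(toy_report)
```

Storing neighbours in sets makes the links undirected by construction and silently merges duplicate links, which is one reasonable reading of the assumptions rather than the only one.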

2.2 Topological Properties

2.2.1 Network Size

The size of a network is given by the total number of nodes $N$ and the total
number of links $L$. For example, in 1999 the AS-level Internet had 6374 nodes
and 13641 links [26], and in 2001 it had 11122 nodes and 30054 links [113].

2.2.2 Degree

The degree k of a node, also called node connectivity, is the number of links which

have the node as an end-point, or equivalently, the number of nearest neighbours

of the node.

The average degree of a network, $\langle k \rangle$, is given by
$\langle k \rangle = 2L/N$, where $L$ is the number of links and $N$ is the
number of nodes. The average degree of the AS-level Internet was 4.28 in 1999
and 5.4 in 2001.
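
As a quick arithmetic check, the two average degrees quoted here follow from the network sizes given in Section 2.2.1. The sketch below uses only those figures; the helper name `average_degree` is illustrative.

```python
def average_degree(num_nodes, num_links):
    """Average degree <k> = 2L/N: each undirected link adds one to two node degrees."""
    return 2 * num_links / num_nodes

avg_1999 = round(average_degree(6374, 13641), 2)   # 1999 AS graph
avg_2001 = round(average_degree(11122, 30054), 2)  # 2001 AS graph
print(avg_1999, avg_2001)  # 4.28 5.4
```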


The maximum degree of a network, $k_{max}$, is the largest degree of any node in
the network. In 2001 the maximum degree of the AS-level Internet was 2839, which
was roughly a quarter of the number of nodes: $k_{max} \approx N/4$, where
$N = 11122$.

The concept of rank is often used when studying the property of degree. The

rank r of a node denotes its position on a list of all nodes sorted in decreasing

degree. The node with rank r = 1 has the largest degree. When a group of nodes

have the same degree, they are arbitrarily assigned a position within that group.

Therefore r ∈ [1, N ], where N is the number of nodes a network has.
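The degree quantities defined above can be computed directly from a list of links. The following is a minimal sketch, assuming the graph is given as a list of undirected node pairs with no self-loops or duplicate links; the function name `degree_stats` and the toy edge list are illustrative and not part of the thesis.

```python
from collections import Counter

def degree_stats(edges):
    """Return (degrees, average degree, maximum degree, nodes ranked by degree)."""
    degree = Counter()
    for u, v in edges:                      # each undirected link adds one to both ends
        degree[u] += 1
        degree[v] += 1
    avg = 2 * len(edges) / len(degree)      # <k> = 2L / N
    k_max = max(degree.values())
    # rank r = 1 is the highest-degree node; ties keep an arbitrary order
    ranked = sorted(degree, key=degree.get, reverse=True)
    return dict(degree), avg, k_max, ranked

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
degrees, avg_k, k_max, ranked = degree_stats(edges)
```

The same formula reproduces the figures quoted in the text: 2 × 13641 / 6374 is about 4.28 for the 1999 graph.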

2.2.3 Degree Distribution

If p(k, s,N) is defined as the probability that the node s in the network of size N

has a degree k, the degree distribution is

$$ P(k, N) = \frac{1}{N} \sum_{s=1}^{N} p(k, s, N) \ \text{[8]}, \tag{2.1} $$

which is often denoted as P (k). While degree is a local property, the probability

distribution of the degree gives important information of the global properties

of a network and can be used to characterise different network topologies. For

example the so-called complex networks [2, 5, 7, 8] are characterised by highly

heterogeneous degree distributions [52].

2.2.3.1 Poisson Degree Distribution

Figure 2.2 shows the motorway network of the USA. Most cities (nodes) have 3 or

4 motorway connections, only a few cities have many motorway connections and

only a few cities have only one or two motorway connections.

This motorway network is characterised by a Poisson degree distribution as

shown in Figure 2.3. The distribution curve is symmetric and the majority of

nodes are distributed around the average degree of the network, 〈k〉. Networks

with a Poisson degree distribution are often referred to as exponential networks.


Figure 2.2: The motorway network of the USA.

Figure 2.3: Poisson degree distribution.

2.2.3.2 Power-Law Degree Distribution

Figure 2.4 shows the air traffic route network of the USA. There are a very large

number of airports in the USA, but most airports have just a few airline

connections. Only a few hub cities have huge numbers of airline connections and

they dominate the whole network traffic.

This network is characterised by a power-law degree distribution as shown in

Figure 2.5. On a log-log scale the distribution curve is a straight line, which

suggests that the power-law degree distribution has the form P(k) ∼ k^{−γ},

where the constant γ is the power-law exponent. Networks with a power-law

degree distribution are often referred to as scale-free networks [38]. Both scale-free

networks and exponential networks widely exist in nature and human society [7].

Faloutsos et al [32] reported in 1999 that the AS-level Internet topology exhibits


Figure 2.4: The air traffic route network of the USA.

Figure 2.5: Power-law degree distribution (on a log-log scale).

a power-law degree distribution, P(k) ∼ k^{−γ}, where γ ≃ 2.22.

2.2.4 Shortest Path Length

The shortest path is a route connecting two nodes with the least number of hops.

In a graph, the number of hops along a route is called the length of the path.

The average shortest path length l of a node is defined as the average length of

the shortest paths from the node to all other nodes in the network. In this thesis,

the shortest path length is calculated using Dijkstra’s algorithm [53].

The characteristic path length l∗ of a network is the average length of the

shortest paths over all pairs of nodes. The characteristic path length indicates

the network’s overall routing efficiency. A network with a smaller value of l∗ may

achieve better dynamic performance [54, 55, 56]. The characteristic path length

of the AS-level Internet was 3.7 in 1999 and it was 3.13 in 2001, which are very


small considering the huge size of the network.

The maximum value of the shortest paths over all pairs of nodes is the net-

work’s diameter, D. A network’s diameter may not proportionally increase with

the network size and it mainly depends on the topological structure of the network.
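As an illustration of these definitions, the sketch below computes the characteristic path length and the diameter of a small connected graph. The thesis uses Dijkstra's algorithm; because every link here counts as one hop, this sketch swaps in a breadth-first search, which yields the same hop counts on an unweighted graph. The function names and the six-node ring are assumptions made for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj):
    """Return (characteristic path length, diameter) of a connected graph."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for d, hops in bfs_distances(adj, s).items():
            if d != s:
                total += hops               # sum over all ordered pairs
                pairs += 1
                diameter = max(diameter, hops)
    return total / pairs, diameter

# Hypothetical example: a ring of six nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
l_star, diameter = path_length_stats(ring)
```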

2.2.5 Node Betweenness Centrality

On a network, there are nodes that are more prominent than others because they

are highly used when transferring information. A way to measure this “impor-

tance” is by using the concept of node betweenness centrality, also called between-

ness, which measures the proportion of shortest paths which visit a certain node.

The betweenness centrality is defined as the total number of data packets passing

through that node when every pair of nodes sends and receives a data packet along

the shortest path connecting the pair. When more than one shortest path exists

between a pair of nodes, the data packet is divided evenly among them.

Given a source node s and a destination node d, the number of different short-

est paths is g(s, d). The number of shortest paths that contain the node w is

g(w; s, d). The proportion of shortest paths, from s to d, which contain node

w is p_{s,d}(w) = g(w; s, d)/g(s, d). The betweenness centrality of node w is calculated

[57, 58, 59] as

$$ C_B(w) = \sum_{s} \sum_{d \neq s} p_{s,d}(w), \tag{2.2} $$

where the sum is over all possible pairs of nodes with s ≠ d.

If all pairs of nodes of a network communicate at the same rate, and the traffic

goes by the shortest paths, then the traffic through a node is proportional to the

betweenness of the node. In other words, the betweenness estimates the capacity

of each node needed for a free-flow state [57].

A node with a large CB is “important” because it carries a large traffic load.

If this node fails or gets congested, the consequences to the network traffic can


be drastic [59, 60]. Naturally, one expects the betweenness of a node to

correlate strongly with its degree.

In this thesis the betweenness centrality is normalised by N, the total number

of nodes, and denoted as C_B^*. The average betweenness centrality over all nodes

is 〈C_B^*〉 = l∗ + 1 [59], where l∗ is the network’s characteristic path length.
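Equation 2.2 can be evaluated by counting shortest paths with a breadth-first search from every node. The sketch below is a simple cubic-time illustration for small graphs, not the implementation used in the thesis. It assumes that the end-points s and d count as nodes visited by a path, an assumption under which the normalised average comes out as (N − 1)(l∗ + 1)/N, close to the l∗ + 1 quoted above. All names are illustrative.

```python
from collections import deque

def bfs_counts(adj, s):
    """Hop distance and number of shortest paths from s to every node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:      # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """C_B(w) summed over ordered pairs (s, d), end-points included."""
    info = {s: bfs_counts(adj, s) for s in adj}
    cb = {w: 0.0 for w in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for d in adj:
            if d == s:
                continue
            for w in adj:
                dist_w, sigma_w = info[w]
                # w lies on a shortest s-d path iff the two distances add up
                if dist_s[w] + dist_w[d] == dist_s[d]:
                    cb[w] += sigma_s[w] * sigma_w[d] / sigma_s[d]
    return cb

# Hypothetical example: the three-node path 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
cb = betweenness(path)
normalised_mean = sum(cb.values()) / len(path) / len(path)
```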

2.2.6 Clustering Coefficient

If a node has k neighbours, then at most k(k − 1)/2 inter-neighbour links can

exist between the neighbours. If n_c denotes the number of inter-neighbour links the

node has, then the clustering coefficient c of the node is defined as the fraction of

the allowable links that actually exist [61],

$$ c = \frac{n_c}{k(k - 1)/2}. \tag{2.3} $$

Clustering coefficient reflects the extent to which neighbours of a node are

also neighbours of each other, and thus it measures the cliquishness of a typical

neighbour circle. In other words, it characterises the ‘density’ of connections in

the environment close to a node.

When a node has only one neighbour (k = 1), the value of c is zero. The

maximum value of c is one, which means all neighbours are connected to each

other and the maximum linkage in this cluster (the maximum ‘clustering’) is

reached. The average clustering coefficient of a network, 〈c〉, is the average value of

clustering coefficient over all nodes. Depending on the measurement data sources,

the average clustering coefficient of the AS-level Internet is between 0.24 and

0.49 [11].
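Equation 2.3 translates into a few lines of code. The sketch below assumes the graph is stored as a dictionary mapping each node to the set of its neighbours; the function names and the toy graph are illustrative.

```python
def clustering_coefficient(adj, node):
    """Fraction of possible links between the neighbours of node that exist."""
    neighbours = sorted(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0                          # c is zero when k = 1, as in the text
    n_c = sum(1
              for a in range(k)
              for b in range(a + 1, k)
              if neighbours[b] in adj[neighbours[a]])
    return n_c / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of c over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
mean_c = average_clustering(graph)
```

In this toy graph node 0 has three neighbours sharing one link, so its coefficient is 1/3, while nodes 1 and 2 each have a coefficient of one.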

2.2.7 Disassortative Mixing (Degree Correlations)

Complex networks can be grouped into assortative, disassortative and neutral

networks [62, 63, 64, 65]. Social networks (e.g. the co-authorship network) are


assortative networks, in which high-degree nodes prefer to attach to other high-

degree nodes. Information networks (e.g. the World Wide Web and the Internet)

and biological networks (e.g. protein interaction networks) have been classified

as disassortative networks, in which high-degree nodes tend to connect with low-

degree ones.

A network’s degree mixing pattern is identified by the conditional probability

p_c(k′|k) that a link connects a node with degree k to a node with degree

k′. This joint degree-degree distribution is inconvenient for empirical analysis

due to the poor statistics obtained using the limited data sources.

Pastor-Satorras et al [66, 67] found that the conditional probability can be indicated

by the nearest-neighbours average degree k_nn of a node with degree k. In this

dependence, only one variable (degree k) is present. A disassortative network ex-

hibits a negative correlation between the nearest-neighbours average degree and

the degree.

The degree correlations are absent in classical random graphs, but are natural

in growing networks. For example the AS-level Internet exhibits the disassortative

mixing behaviour [63, 66, 67, 64], where high-degree nodes tend to connect to

peripheral nodes with low degrees.
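The nearest-neighbours average degree can be sketched as follows, assuming the graph is stored as a dictionary of neighbour sets. The function name `knn_by_degree` and the star graph are illustrative; a star is used because it is an extreme case of disassortative mixing.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Average nearest-neighbour degree, grouped by node degree k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k == 0:
            continue                        # isolated nodes have no neighbours
        node_knn = sum(len(adj[v]) for v in neighbours) / k
        sums[k] += node_knn
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical example: one hub connected to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
knn = knn_by_degree(star)
```

Here the degree-4 hub sees neighbours of degree 1 and the degree-1 leaves see a neighbour of degree 4, so k_nn decreases as k increases, which is the negative correlation described above.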

2.3 Random Networks

The classical random network theory was introduced by Erdos and Renyi [33, 34,

68]. There are two main constructions of Erdos-Renyi graphs with a fixed number

of nodes N :

1. Each pair of nodes of the network is connected by a link with probability p.

Naturally, this link is absent with probability 1− p.

2. The nodes are randomly connected by a given number L of links. One can

realise this construction procedure by repeatedly adding new links between


pairs of randomly chosen nodes. In graph theory, this is called a random

graph process.

These two constructions define two equivalent statistical ensembles of graphs.

The set of graphs in construction (1) is all 2^{N(N−1)/2} graphs with any number of

links smaller than or equal to N(N − 1)/2. The set of graphs in construction (2)

consists of all possible graphs with N nodes and a given number L of links.

The constructions above naturally generate uncorrelated graphs. In other

words, correlations between their nodes are absent. Each node in the graph with

N nodes is in the same situation. It can have any number of links attached, from

zero (a “bare” node) to N−1. If a node is of degree k, then its k links can occupy

N − 1 possible positions. Standard combinatorics readily lead to the following

degree distribution of the classical random graph:

$$ P(k) = \binom{N-1}{k} p^{k} (1 - p)^{N-1-k}, \tag{2.4} $$

that is the binomial distribution, so that the average degree is 〈k〉 = p(N − 1) and

the network contains, on average, pN(N − 1)/2 links. For large N and fixed 〈k〉,

the degree distribution takes the Poisson form

$$ P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}. \tag{2.5} $$
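Construction (1) and Equations 2.4 and 2.5 can be sketched in a few lines. This is an illustration only, with function names, seeds and sizes chosen for the example rather than taken from the thesis.

```python
import math
import random

def erdos_renyi(n, p, seed=0):
    """Construction (1): link every pair of nodes independently with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def binomial_pk(k, n, p):
    """Equation 2.4: exact degree distribution of the random graph."""
    return math.comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)

def poisson_pk(k, mean):
    """Equation 2.5: large-N limit with fixed average degree."""
    return math.exp(-mean) * mean**k / math.factorial(k)

# Hypothetical parameters: 400 nodes with an expected average degree near 4.
graph = erdos_renyi(400, 0.01)
avg_degree = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
```

For a large network the two formulas give nearly the same values, for example `binomial_pk(4, 2000, 4 / 1999)` against `poisson_pk(4, 4.0)`.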

The Erdos-Renyi model generates statistically homogeneous networks in which,

despite the fundamental randomness of the model, most nodes have the same

number of links, 〈k〉 (the average degree). In particular, the connectivity follows

a Poisson distribution that peaks strongly at 〈k〉, implying that the probability of

finding a highly connected node decays exponentially (P(k) ≃ e^{−k} for k ≫ 〈k〉).

The Waxman model [35] provides another construction for random networks

with Poisson degree distribution and has been widely used to generate random

topologies for network simulations. It starts by placing N nodes uniformly on an

n by n plane. Once all nodes have been placed on the plane, the model computes


the probability of creating a link between two nodes µ and υ with the following

probability function:

$$ P(\mu, \upsilon) = \alpha \, e^{-d(\mu, \upsilon)/(\beta L)}, \tag{2.6} $$

where d(µ, υ) is the Euclidean distance between µ and υ, L is the maximum

Euclidean distance between two nodes, α and β are parameters in the range (0, 1].

Then a random number is generated between 0 and 1. A link is created between

µ and υ only if the random number is smaller than P (µ, υ).
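The Waxman procedure described above can be sketched as follows. The sketch assumes that L is taken as the largest distance between any two of the placed nodes (some generators may instead use the diagonal of the plane), and the names `link_probability` and `waxman_graph` are illustrative.

```python
import math
import random

def link_probability(distance, max_distance, alpha, beta):
    """Equation 2.6: P = alpha * exp(-d / (beta * L))."""
    return alpha * math.exp(-distance / (beta * max_distance))

def waxman_graph(n_nodes, side, alpha, beta, seed=0):
    """Place nodes uniformly on a side-by-side plane and link them by Equation 2.6."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    dist = {(i, j): math.dist(pos[i], pos[j]) for i, j in pairs}
    max_distance = max(dist.values())   # assumption: L measured on the placed nodes
    edges = [pair for pair in pairs
             if rng.random() < link_probability(dist[pair], max_distance, alpha, beta)]
    return pos, edges

# Hypothetical parameters: 30 nodes on a 100 by 100 plane.
pos, edges = waxman_graph(30, 100.0, alpha=1.0, beta=1.0, seed=3)
```

A larger α raises the overall link density, while a smaller β makes long links rarer relative to short ones.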

The above random networks are static, in the sense that they have a fixed size.

Starting with a constant set of N disconnected nodes, these networks are defined

by the rules assigning links between pairs of nodes. These networks share a random

nature in the process of placing the links, which is in general independent of the

local properties of nodes. Despite this extreme simplification, however, random

networks have provided for a long time the theoretical reference framework in

network modelling, including the Internet.

The characteristic path length of a random network can be approximated [33,

34] by

l∗ ≈ ln(N)/ ln〈k〉, (2.7)

where N is the number of nodes and 〈k〉 is the average degree.
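As a rough worked example (an illustration rather than a result reported in the thesis), inserting the 2001 AS-level figures quoted in Section 2.2, N = 11122 and 〈k〉 ≈ 5.4, into Equation 2.7 gives:

```latex
l^{*} \approx \frac{\ln N}{\ln \langle k \rangle}
      = \frac{\ln 11122}{\ln 5.4}
      \approx \frac{9.32}{1.69}
      \approx 5.5
```

This estimate for an equivalent random network is larger than the value of 3.13 quoted in Section 2.2.4 for the measured 2001 graph.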

2.4 Small-World Networks

A regular network is characterised by its neighbour clustering. For example the

ring-lattice network shown in Figure 2.6-a has a large number of triangles and the

grid-lattice network has a large number of quadrangles. This structural property

gives the network a large number of alternative routing choices and makes the

network as a whole highly fault-tolerant.

A random network is characterised by its random connections, which provide

routing shortcuts and make the characteristic path length l∗ (see Equation 2.7)


Figure 2.6: Three networks with the same number of nodes, the same number of links and the same placement of nodes. a. Regular network (ring-lattice). b. Small-world network (the Watts-Strogatz Model). c. Random network.

of the network significantly smaller than that of an equivalent regular network.

A small-world network [61, 69, 70, 71, 72] has the following properties:

• The clustering coefficient c is much larger than that of a random graph with

the same number of nodes and the same average degree.

• The characteristic path length l∗ is almost as small as l∗ for the correspond-

ing random graph.

This means a small-world network has a large number of triangles and quadran-

gles and also has random connections. The AS-level Internet is regarded as a

good example of a small-world network because, despite the immense size of the

network, it has a very small characteristic path length (l∗ = 3 ∼ 4) and fairly

large average clustering coefficient (〈c〉 = 0.30 ∼ 0.49).

Watts [61] demonstrated that a regular lattice can be transformed into a small-

world network by making a small fraction of the connections random. Figure 2.6-a

shows a ring-lattice regular network, in which each node is connected

to its 4 closest neighbours. If a small fraction p of the links is made random,

the network turns into a small-world network (Figure 2.6-b). If all the links are

made random, the network becomes a random network (Figure 2.6-c).


Figure 2.7: Small-world properties [61], plotted as l∗(p)/l∗(0) and c(p)/c(0) against p. c(p) is the average clustering coefficient and l∗(p) is the characteristic path length of a network with a fraction p of links randomly rewired.

As shown in Figure 2.7, when only a fraction p = 0.01 of the links is rewired

randomly, the network’s average clustering coefficient is nearly the same as that

of the ring-lattice regular network, c(p = 0.01)/c(0) ≃ 1, while the network’s characteristic

path length is significantly smaller than that of the ring-lattice regular

network, l∗(p = 0.01)/l∗(0) ≃ 0.18, and close to that of the random network.
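The rewiring experiment can be sketched in code. This is a simplified illustration in the spirit of the Watts-Strogatz procedure, not a reproduction of [61]: it assumes an even number of neighbours per node, moves one end of a link to a uniformly chosen node, and averages path lengths over reachable pairs in case rewiring disconnects the graph. All function names and parameters are illustrative.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, seed=0):
    """With probability p, move one end of each link to a random node."""
    rng = random.Random(seed)
    new = {u: set(vs) for u, vs in adj.items()}
    nodes = sorted(new)
    for u in nodes:
        for v in sorted(adj[u]):
            if u < v and rng.random() < p:      # visit each original link once
                # candidate targets avoid self-loops and duplicate links
                candidates = [w for w in nodes if w != u and w not in new[u]]
                if candidates:
                    w = rng.choice(candidates)
                    new[u].discard(v)
                    new[v].discard(u)
                    new[u].add(w)
                    new[w].add(u)
    return new

def average_clustering(adj):
    total = 0.0
    for nbrs in adj.values():
        nbrs = sorted(nbrs)
        k = len(nbrs)
        if k < 2:
            continue                            # such nodes contribute c = 0
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in adj[nbrs[a]])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1                  # reachable pairs only
    return total / pairs

# Hypothetical experiment: 200 nodes, 4 neighbours each, 5% of links rewired.
lattice = ring_lattice(200, 4)
small_world = rewire(lattice, 0.05, seed=1)
c_ratio = average_clustering(small_world) / average_clustering(lattice)
l_ratio = average_path_length(small_world) / average_path_length(lattice)
```

The ratios `c_ratio` and `l_ratio` correspond to c(p)/c(0) and l∗(p)/l∗(0) in Figure 2.7; their exact values depend on the seed and the network size.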

2.5 Summary

This chapter defined the following topological properties: network size, degree,

degree distribution, shortest path length, node betweenness centrality, clustering

coefficient and disassortative mixing. These are used throughout the rest of

this thesis. This chapter also introduced the concepts of random networks and

small-world networks.


Chapter 3

Measurements and Models Of

The AS-Level Internet

3.1 Introduction

This Chapter introduces three types of data sources that predominate in the In-

ternet research and two methodologies used to obtain the data sets. The Chapter

also introduces a number of recently proposed network models which are of relevance

to this work. This Chapter sets out the immediate context of this research

and points out what the challenges are.

3.2 Topology Measurements Of The AS-Level Internet

There are currently two primary methods of inferring the Internet structure at the

AS-level: the passive measurement, which uses BGP inter-domain routing tables,

and the active measurement, which actively probes IP addresses to get the actual

paths that packets travel from a source to a destination. The strength of this

research is that it is based on the real measurement data of the Internet topology.


3.2.1 Passive Measurement - BGP AS Graph

The Internet passive measurement [25, 26, 27] produces the BGP AS graphs,

which are constructed from Internet inter-domain BGP routing tables. The BGP

tables contain the information of connections from an AS to its immediate AS

neighbours.

The widely used BGP data are available from the Active Measurement Project

at the National Laboratory for Applied Network Research [25] and the Route Views

Project at the University of Oregon [26]. Both projects connect to a number of operational

routers within the Internet for the purpose of collecting BGP routing

tables.

The Measurement and Network Analysis Group of the US National Laboratory

for Applied Network Research (NLANR) [25] has developed the Network

Analysis Infrastructure (NAI). The NAI is the largest project of its kind that

makes all data publicly available for use by other network researchers. On its

web site, http://most.nlanr.net/, one can find extensive Internet routing related

information collected since November 1997. For nearly every day NLANR has a

complete map of connections of operating autonomous systems.

BGP tables have the advantage that they are relatively easy to parse, process

and comprehend. However, despite widespread public availability, BGP data has

several limitations [73]. BGP tables do not reflect how traffic actually travels in the

network and provide only a local perspective from a router toward a destination.

3.2.2 Extended BGP AS Graph

The Topology Project at the University of Michigan [30] provided the extended version

[74, 75] of BGP AS graphs by using additional data sources, such as the

Internet Routing Registry (IRR) data and the Looking Glass (LG) data. The

IRR maintains the routing information of individual Internet Service Providers (ISPs)

in several public repositories to coordinate global routing policy. The LG sites are


maintained by individual ISPs to help troubleshoot Internet-wide routing prob-

lems. Extended BGP AS graphs typically have 20-50% more links than the origi-

nal BGP AS graphs and provide more complete pictures of the Internet topology.

3.2.3 Active Measurement - Traceroute AS Graph

Figure 3.1: A map of the AS-level Internet measured by the Internet Mapping Project [29] of Bell Labs.

The Internet active measurement [28, 29, 76] produces the Traceroute AS

graphs. From 1998, the Cooperative Association for Internet Data Analysis

(CAIDA [28]) began its Macroscopic Topology Project to collect and analyse

Internet-wide topology and latency data at a representatively large scale. In the

course of this project CAIDA has created several innovative measurement, anal-

ysis and visualisation tools. The primary topology measurement tool is skitter,

which uses the Internet Control Message Protocol (ICMP) to collect the

forward path from the monitor to a given destination and capture the addresses of

intermediate routers in the path. Skitter runs on more than 20 monitors around

the globe and actively collects forward IP paths to over half a million destinations.

The Traceroute AS graph is built [28, 29, 76] by extracting the interconnection information of ASes from

the massive traceroute data collected by skitter.


3.2.4 Discovery Of The Internet Power-Law Degree Distribution

Figure 3.2: Degree Frequency [32] of a BGP AS graph measured on 5th December 1998.

Based on the BGP measurement data, Faloutsos et al [32] reported in 1999

that the degree distribution of the AS-level Internet (see Figure 3.2) and the

router-level Internet are described by a power-law

$$ P(k) \sim k^{-\gamma}, \tag{3.1} $$

where the power-law exponent is γ ≃ 2.2 for the AS-level Internet. The discovery

of the Internet power-law degree distribution is of fundamental importance

because it showed that the Internet topology cannot be modelled by network

models with a Poisson degree distribution, such as random networks and small-world

networks. In fact, this property literally invalidated all previous research

on modelling the Internet topology.
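A power-law exponent is often estimated from the slope of the degree frequency on a log-log plot. The sketch below fits that slope by ordinary least squares; it is a rough illustration and not necessarily the procedure used in [32]. It assumes every degree is at least one, as in the Internet graph considered here, and the function name and synthetic data are hypothetical.

```python
import math
from collections import Counter

def fit_power_law_exponent(degrees):
    """Least-squares slope of log P(k) against log k; returns gamma = -slope."""
    counts = Counter(degrees)
    n = len(degrees)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic degree sequence whose frequencies follow k^-2 exactly.
synthetic = [1] * 4096 + [2] * 1024 + [4] * 256 + [8] * 64
gamma = fit_power_law_exponent(synthetic)
```

On measured data the tail of the distribution is noisy, so a fit of this kind should be treated as a rough estimate.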

3.2.5 Which AS Graph?

Most recent studies on the AS-level Internet topology were based on the

BGP AS graphs and the Extended AS graphs, such as the power-law de-

gree distribution [32], the error and attack tolerance [77] and other research

works [66, 67, 78, 37, 79].


Comparison studies [80, 81, 82, 73] have shown that the Traceroute AS graph

is more complete and reliable than the BGP AS graph. However it is not clear

whether the Traceroute AS graph is more complete than the Extended BGP AS

graph, which has captured even more Internet connections than the Traceroute

AS graph.

Chapter 7 will compare the three types of AS graphs in detail by examining all

the topological properties. Based on the comparison results, the author suggests

that the Traceroute AS graphs are more realistic measurements for the Internet

research. In this thesis, Chapters 4 and 5 are based on an Extended AS graph

measured in 2001. Chapters 6 and 8 are based on a Traceroute AS graph measured

in 2002.

3.3 Topology Models Of The AS-Level Internet

This section introduces a selection of the existing Internet models which have been

widely used in the study of the Internet. The Tiers model, the GT-ITM model

and the User-Provider model focus on the Internet hierarchical structure [83].

The Inet model, the Barabasi and Albert (BA) model and the modifications of

the BA model are degree-oriented models. The BRITE model, the Dorogovtsev-

Mendes Model, the Generalised Network Growth (GNG) Model, the Generalised

Linear Preference (GLP) Model and Highly Optimised Tolerance (HOT) Model

are examples of models using more complex growth mechanisms. This thesis will

further study the Inet model, the BA model, the Fitness BA model and the GLP

model in the following chapters.

3.3.1 Tiers Model

The Tiers generator [84] is based on a three level hierarchy that represents Wide

Area Networks (WAN), Metropolitan Area Networks (MAN), and Local Area

Networks (LAN). To generate a random topology using Tiers, one specifies a


target number of LANs and MANs. Currently Tiers cannot generate more than

one WAN per random topology. For each level of hierarchy, one also specifies

a fixed number of nodes per network. A minimum spanning tree is computed

to connect all nodes, then other links are created based on user-specified average

inter-level and intra-level redundancy. The link formation favours close-by nodes,

resulting in topologies with large diameters (see Section 2.2.4).

3.3.2 GT-ITM Model

The GT-ITM (Transit-Stub) model [85, 83] generates topologies based on several different

models. The connectivity used to generate each connected graph can be selected

from one of six methods: PureRandom, Waxman1, Waxman2, Doar-Leslie,

Exponential, or Locality [85, 83]. Similar to Tiers, the model has a well-defined

hierarchical structure. It generates topologies with two levels of hierarchy: one

consisting of transit ASes, and the other consisting of stub ASes. Also similar to

Tiers, the GT-ITM model allows for extra links to be added between stub ASes

and between stub and transit ASes.

3.3.3 User-Provider Model

The User-Provider model [36] generates networks using a self-organised interaction

between users and providers, where the interactions can be rearranged during the

network growth. All nodes in the model are divided into two roles: providers and

users. Providers can have several links, pointing to other sites which correspond

to users. Users have a single link pointing to their providers. At each time-step,

a node is added to the network. The new node can be either a provider with a

probability r or a user with probability 1−r. When a provider is added, D(t) users

in the network are chosen at random, and rewired to the new provider. Links to the

previous providers are removed. It is assumed that the integer number D(t) is a

random variable with Poisson distribution and each user has the same probability


(1/k) to be rewired.

3.3.4 Inet Model

The Inet model¹ [86, 37] was designed to match the degree distribution as measured

in the BGP AS graphs. The model generates networks in three steps:

• Build a spanning tree with all nodes that have degrees greater than one.

• Connect all nodes with degree one to nodes in the spanning tree with a

linear preference.

• Connect the remaining free links in the spanning tree.

The number of links generated by the model depends on two parameters, which

are the total number of nodes and the percentage of nodes with degree k = 1.

Since the model is based on the original BGP AS graph, it typically generates

26% fewer links than the extended BGP AS graph.

3.3.5 Barabasi and Albert Model

Pursuing a very different class of dynamic graph models, Barabasi and Al-

bert [38, 87] showed that power-law graphs can arise from a simple dynamic model

that combines incremental growth with a preference for new nodes to connect to

existing ones that are already well connected.

The BA model starts with a small random network. The system “grows” by

attaching a new node with m links² to m different nodes that are already present in

the system (see Figure 3.3), and the attachment is “preferential” [88] because the

probability that a new node connects to node i with degree k_i is

$$ \Pi(k_i) = \frac{k_i}{\sum_j k_j}, \tag{3.2} $$

¹ During the research on this thesis, the author found that the Inet-2.1 model contains redundant links in the output. According to his report, the Inet research group (http://topology.eecs.umich.edu/inet/) identified the programming bug and updated the model to version 2.2 and later Inet-3.0.

² Use m = 3 to obtain Internet-like networks.


Figure 3.3: Growth of the BA model, in which a new node attaches to the existing network.

which is a linear function of ki.

The BA model has generated great interest in various research areas [89, 90,

91, 92]. Barabasi and Albert state [40, 93] that this intuitively appealing growth

model applies to the Internet’s AS graph and therefore explains why AS graphs exhibit

power-law degree distributions. The model has also been used as a starting

point in research into the error and attack tolerance of the Internet [77, 94].

Simplicity and parsimony are the two advantages of the BA model. The BA

model is important also because the model can be mathematically analysed. Using

mean-field theory, Barabasi and Albert [95] showed that the BA model generates

networks with a degree distribution of P(k) ∼ k^{−γ} with the power-law exponent

of γ = 3.0, which is independent of network size (growth time) and the parameter

m.
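The growth and preferential attachment rules of the BA model can be sketched as below. The thesis states only that the model starts from a small random network; this sketch assumes, for simplicity, a complete graph on m + 1 nodes as the seed. The function name is illustrative.

```python
import random

def barabasi_albert(n_nodes, m, seed=0):
    """Grow a network by linear preferential attachment (Equation 3.2)."""
    rng = random.Random(seed)
    m0 = m + 1                          # assumption: seed is a complete graph
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each node appears in `stubs` once per link end, so a uniform draw
    # from the list selects node i with probability k_i / sum_j k_j.
    stubs = [node for edge in edges for node in edge]
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:         # m different existing nodes
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

# Hypothetical run with m = 3, the value suggested for Internet-like networks.
edges = barabasi_albert(200, 3, seed=1)
```

Drawing from the list of link ends avoids recomputing the sum of degrees at every step, which keeps the sketch short.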

Mean-field theory for scale-free random networks

After t time-steps, the network has N = t + m0 nodes and mt links. Time de-

pendence of the connectivity ki of a given node i can be calculated analytically

using a mean-field approach. Assume that k is continuous, and thus the probability

Π(k_i) = k_i / Σ_j k_j can be interpreted as a continuous rate of change of k_i.

Consequently,

$$ \frac{\partial k_i}{\partial t} = m \Pi(k_i) = m \frac{k_i}{\sum_{j=1}^{N-1} k_j}. $$


Taking into account the total growth in the number of links, Σ_j k_j = 2mt, then

∂k_i/∂t = k_i/(2t). The solution of this equation, with the initial condition that node

i was added to the system at time t_i with connectivity k_i(t_i) = m, is

$$ k_i(t) = m \left( \frac{t}{t_i} \right)^{\beta}, \qquad \beta = 1/2. $$

The probability that a node has a degree ki(t) smaller than k, P (ki(t) < k), can

be written as:

$$ P(k_i(t) < k) = P\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right). $$

If the nodes are added at equal time intervals, the probability density of ti is

P (ti) = 1/(m0 + t). Then,

$$ P\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right) = 1 - \frac{m^{1/\beta} t}{(m_0 + t)\, k^{1/\beta}}. $$

The degree probability distribution is

$$ P(k) = \frac{\partial P(k_i(t) < k)}{\partial k} = \frac{2 m^{1/\beta} t}{(m_0 + t)\, k^{1/\beta + 1}}, $$

where 1/β + 1 = 3, so that P(k) ∼ k^{−3}.
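Two steps of the derivation above can be made explicit. The rate equation is solved by separating variables, and the inequality k_i(t) < k is then inverted to give a condition on the arrival time t_i. The following is a worked sketch using the symbols of this section, with τ introduced here as an integration variable:

```latex
\begin{align*}
\frac{\partial k_i}{\partial t} = \frac{k_i}{2t}
  &\;\Longrightarrow\;
  \int_{m}^{k_i(t)} \frac{dk}{k} = \frac{1}{2} \int_{t_i}^{t} \frac{d\tau}{\tau}
  \;\Longrightarrow\;
  \ln \frac{k_i(t)}{m} = \frac{1}{2} \ln \frac{t}{t_i}
  \;\Longrightarrow\;
  k_i(t) = m \left( \frac{t}{t_i} \right)^{1/2}, \\
k_i(t) < k
  &\;\Longleftrightarrow\;
  \left( \frac{t}{t_i} \right)^{\beta} < \frac{k}{m}
  \;\Longleftrightarrow\;
  t_i > \frac{m^{1/\beta}\, t}{k^{1/\beta}}.
\end{align*}
```

The factor 2 in the final expression for P(k) is the value of 1/β that appears when k^{−1/β} is differentiated with respect to k.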

3.3.6 Fitness BA Model

The Fitness BA (FBA) model [39] is a modification of the BA model. It uses

generalised preferential attachment, which ensures that even a relatively young

node with a small number of links can acquire new links at a higher rate if it has

a large fitness parameter. The reason the author studies this model is that, for

the uniform fitness parameter distribution, the network generated by this model

has a power-law exponent similar to that of the AS graph.

The FBA model [39] is identical to the BA model except that a new parameter,

fitness, is introduced into the calculation of the probability Π. In the real Internet,


the probability that a new node will be connected to node i does not depend only

on the node’s connectivity k. The node’s fitness describes its ability to compete

for links at the expense of other nodes. The Fitness BA model generates networks with

a power-law degree distribution whose exponent is closer to that of the

actual Internet degree distribution.

A fixed fitness parameter η is assigned to each node, where η is chosen uniformly

from the interval [0, 1]. The preferential probability becomes:

$$ \Pi(i) = \frac{\eta_i k_i}{\sum_j \eta_j k_j}. \tag{3.3} $$

Using mean-field theory, Bianconi and Barabasi [39] showed that the Fitness

BA model generates networks with a power-law degree distribution of P(k) ∼ k^{−γ},

where the exponent γ = 2.25, which is closer to that of the Internet (γ ≃ 2.2).
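Equation 3.3 can be illustrated with a short calculation. The sketch below computes the preferential probabilities for a given list of degrees and fitness values; the function name and the numbers are hypothetical.

```python
def fitness_preference(degrees, fitness):
    """Equation 3.3: probability of choosing node i is eta_i * k_i / sum_j eta_j * k_j."""
    weights = [eta * k for eta, k in zip(fitness, degrees)]
    total = sum(weights)
    return [w / total for w in weights]

# A young node (degree 2) with high fitness against an older hub (degree 10)
# with low fitness.
probs = fitness_preference(degrees=[10, 2], fitness=[0.1, 0.9])
```

In this example the low-degree node has the larger weight (1.8 against 1.0), so it is the more likely target for the next link. When all fitness values are equal the expression reduces to the BA preference of Equation 3.2.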

3.3.7 Generalised BA Model

The Generalised BA model [40] is an extension of the BA model. It can generate

networks with power-law exponents between 2 and 4. In the Generalised BA

model, three possible activities could happen in every growth step:

• With probability p (0 ≤ p < 1), m (m < m0) new links are added.

• With probability q (0 ≤ q < 1 − p), m links are rewired.

• With probability 1− p− q, a new node is added.

The preferential probability is

$$ \Pi_i = \frac{k_i + 1}{\sum_j (k_j + 1)}, \tag{3.4} $$

which is proportional to k_i + 1, such that there is a nonzero probability that

isolated nodes (k_i = 0) acquire new links. Albert and Barabasi [40] showed that

the network’s power-law degree distribution is:

$$ P(k) = \frac{t}{m_0 + t} D(p, q, m) \left( k + A(p, q, m) + 1 \right)^{-1 - B(p, q, m)}, \tag{3.5} $$


where

$$ A(p, q, m) = (p - q) \left( \frac{2m(1 - q)}{1 - p - q} + 1 \right), \tag{3.6} $$

$$ B(p, q, m) = \frac{2m(1 - q) + 1 - p - q}{m} \tag{3.7} $$

and

$$ D(p, q, m) = \left( m + A(p, q, m) + 1 \right)^{B(p, q, m)} B(p, q, m). \tag{3.8} $$

The power-law exponent is γ = 1 + B(p, q, m), which varies between 2 and 4.
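Equations 3.6 to 3.8 are straightforward to evaluate numerically. The sketch below is an illustration with a hypothetical function name; it simply transcribes the formulas and checks the parameter ranges given in the growth rules above.

```python
def generalised_ba_parameters(p, q, m):
    """Evaluate A, B and D (Equations 3.6 to 3.8) and the exponent gamma = 1 + B."""
    if not (0 <= p < 1 and 0 <= q < 1 - p):
        raise ValueError("require 0 <= p < 1 and 0 <= q < 1 - p")
    a = (p - q) * (2 * m * (1 - q) / (1 - p - q) + 1)
    b = (2 * m * (1 - q) + 1 - p - q) / m
    d = (m + a + 1) ** b * b
    return a, b, d, 1 + b

# With p = q = 0 every step adds a new node, and m = 1 gives gamma = 4.
a, b, d, gamma = generalised_ba_parameters(0.0, 0.0, 1)
```

Increasing p, the probability of adding links between existing nodes, lowers the exponent; for example p = 0.5, q = 0 and m = 1 give γ = 3.5.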

3.3.8 BRITE Model

BRITE [41, 96] is an approach towards universal topology generation. BRITE

combines a number of topology generation tools, which can be used to flexibly

control various parameters (such as connectivity and growth models) and study

various properties of generated network topologies (such as power laws, average path

length, etc). It has the following features:

• Flexible: BRITE supports multiple generation models. Models can be en-

hanced by assigning links attributes such as bandwidth and delay.

• Extensible: BRITE’s object-oriented architecture provides researchers with

the ability to add new models of generation and with the ability to import

from and export to custom topology files.

• Interoperable: BRITE allows importing topologies from other topology gen-

erators and extending or combining them with other topologies.

3.3.9 Dorogovtsev-Mendes Model

Dorogovtsev and Mendes [42] introduced a model using the addition of new in-

ternal links. If the parameter m is the number of new internal links that appear

at each growth time-step, the model evolves according to the following rules.


• At each time-step, a new node is added and linked with node i with the

probability given by the BA model (see Equation 3.2 on page 39).

• In addition,

– m ≥ 0 new internal links are added between unconnected pairs of old nodes i and j, with probability proportional to the product of their degrees, ki × kj.

– In the case of m ≤ 0, some old links between old nodes are removed with equal probability.

The parameter m may also be non-integer. Dorogovtsev and Mendes showed that, over a wide range of m, this model generates networks with power-law degree distributions and that the power-law exponent γ can be adjusted through m. However, this model produces the wrong kind of degree-degree correlation.

3.3.10 Generalised Linear Preference Model

Bu et al [44] recently introduced the Generalised Linear Preference (GLP) model.

This model is a modification of the BA model. It reflects the fact that the evolution

of the Internet topology is mostly due to two operations, the addition of new nodes

and the addition of new links between existing nodes.

Figure 3.4: The growth of the GLP model. (a) Addition of new nodes. (b) Addition of new links. The two operations are independent.


The model starts with m0 nodes connected through m0 − 1 links. As shown

in Figure 3.4, at each time-step, one of the following two operations is performed:

• With probability ρ ∈ [0, 1], m (m < m0) new links are added between m

pairs of nodes chosen from existing nodes;

• With probability 1− ρ, one new node is added connecting to m old nodes.

The GLP model uses the generalised linear preference that the probability

Π(i) to choose node i with degree ki is given by

$$\Pi(i) = \frac{k_i - \beta}{\sum_j (k_j - \beta)}, \qquad \beta \in (-\infty, 1). \tag{3.9}$$

The parameter β can be adjusted such that nodes have a stronger preference for being

connected to high degree nodes than predicted by the linear preference of the BA

model given by Equation 3.2 (on page 39).
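The effect of β in Equation 3.9 can be illustrated numerically. The sketch below is only an illustration: the function name `glp_preference` is hypothetical, and it assumes every node has degree at least 1, so that all weights ki − β are positive for β < 1.

```python
def glp_preference(degrees, beta):
    """Return the selection probability of each node under Equation 3.9.

    degrees: list of node degrees, each assumed to be >= 1
    beta:    preference parameter, beta < 1 (beta = 0 gives linear preference)
    """
    if beta >= 1:
        raise ValueError("beta must be smaller than 1")
    if any(k < 1 for k in degrees):
        raise ValueError("this sketch assumes all degrees are >= 1")

    weights = [k - beta for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    degrees = [1, 2, 5, 20]                    # illustrative degrees only
    print(glp_preference(degrees, 0.0))        # linear preference
    print(glp_preference(degrees, 0.6447))     # value recommended by Bu et al
```

With a positive β the share of the highest-degree node grows relative to the linear case, because subtracting β removes a larger fraction of the weight of low-degree nodes.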

Bu et al showed that the GLP model, using the recommended parameter values

(ρ = 0.66, m = 1, m0 = 10 and β = 0.6447), resembles the characteristic path

length and the clustering coefficient of a BGP AS graph measured in September

2000.

3.3.11 Generalised Network Growth Model

The Generalised Network Growth (GNG) Model [45, 97] is similar to the GLP

model. The basic idea of the GNG model is to allow both the addition of a vertex

(with probability p) and the addition of a link (with probability 1 − p), but the model applies a new preference scheme. According to its definition, at each time-step,

• either a node is added and linked with node i with probability

$$\Pi(i) = p \cdot \frac{k_i}{\sum_{j=1}^{N} k_j},$$

• or a link is added (if absent) between nodes i and j, which are already present in the system, with probability

$$\Pi(i) = (1 - p) \cdot \frac{k_i}{\sum_{k=1}^{N} k_k} \cdot \frac{|k_i - k_j|}{\sum_{k=1,\, k \neq i}^{N} |k_i - k_k|}.$$

The resulting network is a scale-free one, with the power-law exponent γ(p) = 2 + p/(2 − p). From the above rules, the case p = 1 (no link creation) gives γ(1) = 3 and corresponds to a traditional BA model where only one connection is added per time-step.

This model exhibits some agreement with the Internet measurements for the de-

gree distribution, the betweenness distribution, the clustering coefficient and the

correlation functions for the degrees. However the growth dynamics of the GNG

model are not supported by the real measurements.

3.3.12 Highly Optimised Tolerance Model

Carlson et al [46, 98] introduced another mechanism for generating power-law

distributions, referred to as Highly Optimised Tolerance (HOT), which is moti-

vated by biological organisms and advanced engineering technologies. Their focus

is on systems which are optimised, either through natural selection or engineering design, to provide robust performance despite an uncertain environment. They

suggest that power-laws in these systems are due to tradeoffs between yield, cost

of resources, and tolerance to risks. The characteristic features of HOT systems

include: 1) high efficiency, performance, and robustness to designed-for uncer-

tainties; 2) hypersensitivity to design flaws and unanticipated perturbations; 3)

nongeneric, specialised, structured configurations; and 4) power-laws.


3.4 Discussions

3.4.1 Structure-Based Models vs Degree-Based Models

Following the long-held belief that the Internet is hierarchical, the network topol-

ogy generators most widely used by the Internet research community, e.g. the

Tiers model and the GT-ITM model, create networks with a deliberately hierar-

chical structure.

However, in 1999 Faloutsos et al [32] revealed that the Internet’s degree dis-

tribution is a power-law and Tangmunarunkit et al [99] showed that the degree

distributions produced by structure-based topology generators are not power-laws.

Since then the research community has largely dismissed the structure-based mod-

els as inadequate and proposed new network generators that attempt to generate

graphs with power-law degree distributions.

Tangmunarunkit et al [99, 100] also discovered, much to their surprise, that

network generators based on the degree distribution more accurately capture the

Internet large-scale structure (such as the hierarchical structure measured by Sub-

ramanian et al [78]). However their judgements were based on simple qualita-

tive comparisons and heuristic assumptions. Tangmunarunkit et al and other

researchers recognised [20, 21] that there is a need for further studies to charac-

terise the network topology structures.

One objective of this thesis is to provide parameters to quantitatively char-

acterise and differentiate the hierarchical structure of Internet-like scale-free net-

works.

3.4.2 Accuracy vs Simplicity

Since the discovery of the power-law degree distribution in the Internet, the num-

ber of models trying to explain the power-law has been growing very rapidly.

However, there is still no Internet evolution model that would be satisfactory


from both the physical and networking standpoints [101]. As a result, the laws

governing the Internet evolution remain unclear.

The Barabasi-Albert (BA) model and its derivatives, popular among physi-

cists, have seen a lot of criticism from the networking community for being too

general, not incorporating any domain specifics, and, hence, failing to predict cor-

rectly many characteristics of the Internet topology and evolution. For example,

by examining the AS graph data sets from the Topology Project at the University of Michigan, Chen et al [74] show that the available historical data of the AS-level Internet do not support the connectivity-based dynamics assumed in the BA model. Moreover, the detailed dynamics underlying the BA modelling approach do not explain the complex structure of the AS maps. The modified BA models have

similar problems. The same type of argument has been actively used against the

BA model by biologists.

On the other hand, the models proposed by the networking community try to

incorporate Internet evolution specifics by introducing a number of non-physical

parameters allowing one to easily fit the output of a model to the observed data

(e.g. [102]). It is easy to see that any model with a sufficient number of external

parameters can be forced to produce any required output by parameter manipu-

lations. A model can be of some theoretical value only when all its parameters

can be expressed via physical variables.

All the existing Internet models only focus on selected network properties and

no model is capable of accurately capturing all the relevant topological properties

of the Internet topology. Furthermore, it is uncertain which model is better than the others, and researchers are not even sure whether it is feasible at all to accurately

reproduce the Internet topology with an evolving model using fairly simple and

realistic mechanisms.

Because of the above inadequacy and uncertainty in the research on the

Internet topology, random networks and regular lattice graphs are still often used

by the Internet engineering community in practical studies on routing behaviours


and protocol simulations [103].

Another objective of this thesis is to provide realistic models to accurately

reproduce the AS-level Internet topology.

3.5 Summary

This chapter introduces recent measurements of the AS-level Internet topology and a number of Internet topology generators. It also discusses the challenges in parameterising and modelling the Internet topology and sets out the immediate context for this research.


Chapter 4

Rich–Club Phenomenon

4.1 Introduction

Inspired by detailed measurements on the Internet hierarchical structure, this

chapter introduces the concept of the rich-club phenomenon, which describes an

overlooked hierarchical structure of the AS-level Internet, that high-degree nodes

are tightly interconnected with each other. Two metrics are provided to quanti-

tatively characterise this structural property.

4.1.1 Internet Hierarchical Structure

It is well known that the Internet topology has a hierarchical structure. However, descriptions of this structure have been merely qualitative and vague. Recently, based on measurements, Subramanian et al [78] classified the ASes and identified the details of the tier structure of the AS-level Internet topology. They studied the topology structure in terms of customer-provider and peer-peer relationships between autonomous systems as manifested in the BGP routing policies. Using heuristic arguments based on the commercial relationships [104] between ASes, they proposed a five-level classification of ASes.

Dense Core: For every AS present in the dense core, all of its peers and its

provider should also be present in the core. The core of the network should


include the small number of so-called tier-1 providers. In practice, the term

Tier-1 provider is loosely defined as a “large” AS or as an AS that does not

have any upstream provider. These ASes could be identified by looking for

all provider-free nodes. The dense core consists of 20 ASes, including the

large Internet Service Providers (ISP) such as Genuity, Sprint, AT&T, and

UUNet. The top 20 ASes have a very dense connectivity of 312 peering

links. The top 15 of the 20 ASes almost form a clique with only three links

missing from the clique.

Transit Core: ASes in the transit core are large national providers and hosting

companies that have peering relationships with many of the ASes in the

dense core.

Outer Core: The remaining ASes in the core form the outer core. The members of

the outer core typically represent regional ISPs which have a few customer

ASes and a few peering relationships with other such regional ISPs.

Small Regional ISPs: Small Regional ISPs are ASes that have one or more customers and no peering relationships with other ASes.

Customers: Customers are those stub networks which are origins and sinks of

traffic and which do not carry any transit traffic.

Table 4.1: Distribution of ASes in the Internet hierarchy [78]

Level                      Number of ASes
Dense core (0)                         20
Transit core (1)                      129
Outer core (2)                        897
Small regional ISPs (3)               971
Customers (4)                        8898


4.1.2 Connectivity Of The Core

Subramanian et al showed that the graph constructed from ten BGP dumps on 18 April 2001 has 10,915 ASes, of which 8,898 are customers and 971 are small regional ISPs (see Table 4.1). The remainder of the network is the core, a connected component with just 1,046 ASes and 6,249 connections. This represents approximately 25% of the total number of connections in the graph. The nodes in the core have an average degree of 6. The key result is that the Internet has a tier structure, where Tier 1 consists of a “core” of ASes which are well interconnected with each other.

However, the network research community did not pay sufficient attention to this hierarchical property, because the approach used in Subramanian et al’s analysis has a number of limitations. Firstly, it is a time-consuming process, which involves scrutinising large amounts of data from various information sources. Secondly, it is based on a number of heuristic assumptions about the commercial relationships between network elements. Thirdly, the result is represented as several tables of numbers. Thus the analysis applies only to this specific case and provides no means of comparison with other networks.

4.1.3 Motivation

The author noticed Subramanian et al’s work and was very interested in the fact that highly connected nodes are tightly interconnected with each other.

It is known that the AS-level Internet has a power-law degree distribution,

therefore it contains a small number of nodes which have very large numbers of

links. The AS-level Internet also exhibits the disassortative mixing behaviour [66,

67], where high-degree nodes tend to connect to nodes with low degrees. However

neither the power-law degree distribution nor the disassortative mixing indicates

whether the high-degree nodes are tightly or loosely interconnected with each

other.


Figure 4.1: Two disassortative networks. (a) High-degree nodes are loosely interconnected. (b) High-degree nodes are tightly interconnected.

As shown in Figure 4.1, two networks having similar degree distributions and

disassortative mixing behaviours can exhibit different structures. In Figure 4.1-a

the high-degree nodes are not directly interconnected, whereas in Figure 4.1-b the

high-degree nodes are tightly interconnected. One can see that this structural

difference is relevant because the network routing is much more efficient when the

high-degree nodes have direct connections among each other.

The author realised that Subramanian et al’s measurement of the connectivity of the core actually revealed a structural property that had not been characterised by the existing topological parameters. The author recognised the need for further studies to characterise this critical structural feature, and expected that measuring the interconnectivity among the high-degree nodes with a quantitative metric might lead to a deeper understanding of the Internet topology, namely by answering the following two questions:

• How to quantitatively characterise the rich-club phenomenon?

• Do networks having power-law degree distributions, such as maps of the AS-

level Internet and synthetic scale-free networks generated by models, show

similar hierarchical structures?

4.2 Rich-Club Phenomenon

In 2002 the author introduced the concept of rich-club phenomenon [105] to de-

scribe the above hierarchical structure of the AS-level Internet. The rich-club


phenomenon has two meanings. Firstly the network contains a small number of

highly connected nodes. These nodes are called “rich” nodes. Secondly the rich

nodes are tightly interconnected with each other and form a tight group, which

is called the “rich-club”. The term rich-club alludes to a familiar phenomenon in human society, where rich upper-class people form an exclusive club to promote social and business connections among the club members.

Note that the rich-club phenomenon does not imply that the majority of the

rich nodes’ links are directed to other club members. Indeed, rich nodes have

very large numbers of links and only a few of them are enough to provide the

connectivity to other club members, whose number is anyway small.

After many calculations and tests on various possible candidate parame-

ters [106, 107], the author provided two metrics to quantitatively characterise the

rich-club phenomenon, which are the rich-club connectivity and the node-node

link distribution. These two parameters are not associated with any heuristic as-

sumption but based only on the network connectivity information. The calculation

of the metrics is fairly simple and their topological meanings are straightforward.

The Four Networks

In this section, the two metrics of the rich-club phenomenon are defined and measured in four different networks: an Extended BGP AS graph measured in May 2001 [30] and three synthetic networks generated by the Barabasi-Albert (BA) model, the Fitness BA (FBA) model and the Inet-3.0 model. For each model, ten networks are generated with different seed numbers and all results are averaged over the ten networks.

As shown in Table 4.2, the four networks have the same number of nodes and

similar numbers of links (except the Inet-3.0 network). Figure 4.2 shows that the

cumulative degree distributions Pcum(k) of the four networks follow power-laws.

The Pcum(k) of the AS graph is characterised by a power-law of slope -1.22, which


Table 4.2: Network parameters

                           AS Graph   Inet-3.0   Fitness BA   BA Model
Number of nodes, N            11461      11461        11461      11461
Number of links, L            32730      24171        34366      34366
Average degree, ⟨k⟩             5.7        4.2          6.0        6.0
Max. degree, kmax              2432       2010         1793        329
Power-law exponent, γ          2.22       2.22        2.255        3.0

Figure 4.2: Cumulative distribution of degree (log-log axes: cumulative distribution against degree) for the Extended BGP AS graph, the Inet-3.0 model, the FBA model and the BA model. For each model, ten networks are generated and averaged.

yields the power-law degree distribution of $P(k) \sim k^{-\gamma}$, $\gamma \simeq 2.22$. Table 4.2 shows

that the Inet-3.0 model and the FBA model have similar power-law exponents as

the AS graph, whereas the power-law slope of the BA model is 3.0.
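The step from the slope of the cumulative distribution to the exponent γ can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly: Pcum(k) is taken as the fraction of nodes with degree at least k, and the sum over degrees is approximated by an integral, which is reasonable for large k.

```latex
P_{cum}(k) \;=\; \sum_{k' \ge k} P(k')
\;\approx\; \int_{k}^{\infty} C\, k'^{-\gamma}\, dk'
\;=\; \frac{C}{\gamma - 1}\, k^{-(\gamma - 1)}, \qquad \gamma > 1,

\text{so a cumulative slope of } -(\gamma - 1) = -1.22
\text{ corresponds to } \gamma = 2.22 .
```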

The author chose these three models because the BA model is the most widely studied scale-free model, the FBA model generates networks with a power-law exponent similar to that of the AS graph, and the Inet-3.0 model is designed to resemble the AS graph’s degree distribution. Note that the author is not trying to characterise all the existing power-law network generators, but to show that it is possible to distinguish between them by studying the properties of the rich-club.


4.2.1 Rich-Club Connectivity

A quantitative assessment of the rich-club phenomenon is obtained by measuring

the rich-club connectivity φ, defined as the fraction of allowable links¹ that actually exist among members of a rich-club. The rich-club membership can be specified in two ways: as nodes with degrees higher than k (“guys richer than k”), or as nodes with ranks less than r (“the top r richest guys”). Thus the rich-club connectivity can

be plotted as a function of node degree or node rank. In order to be independent

from the scale of the network size, the rich-club connectivity is often plotted as a

function of node rank that is normalised by the number of network nodes. The

rich-club connectivity measures how well the members of the rich-club “know”

each other. A rich-club connectivity of 100% means that every member has a direct link to every other member. Lower percentages mean fewer

connections between them.
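The definition above translates almost directly into code. The following is a minimal sketch for membership specified by rank; the function name `rich_club_connectivity`, the edge-list input format and the tie-breaking rule (equal degrees are ordered by node label) are assumptions of this example, since the text does not specify them.

```python
from itertools import combinations


def rich_club_connectivity(edges, r):
    """Fraction of allowable links that exist among the r highest-degree nodes.

    edges: iterable of (u, v) pairs of an undirected graph with sortable labels
    r:     club size (number of top-ranked nodes), 2 <= r <= number of nodes
    """
    # Build neighbour sets; duplicate links and self-loops are ignored.
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    if not 2 <= r <= len(neighbours):
        raise ValueError("r must satisfy 2 <= r <= number of nodes")

    # Rank nodes by decreasing degree. Ties are broken by node label, which is
    # an assumption of this sketch.
    ranked = sorted(neighbours, key=lambda node: (-len(neighbours[node]), node))
    club = ranked[:r]

    existing = sum(1 for u, v in combinations(club, 2) if v in neighbours[u])
    allowable = r * (r - 1) / 2        # n(n - 1)/2 links in an n-node subgraph
    return existing / allowable


if __name__ == "__main__":
    # Toy graph: three hubs forming a triangle, each with two leaf neighbours.
    toy = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (1, 5), (1, 6), (2, 7), (2, 8)]
    print(rich_club_connectivity(toy, 3))
```

To plot against the normalised rank r/N, one would call the function for a range of club sizes r and divide each r by the number of nodes.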

Figure 4.3: Rich-club connectivity φ(r/N) as a function of normalised rank r/N (log-log axes), for the AS graph, the Inet-3.0 model, the Fitness BA model and the BA model.

Figure 4.3 shows the rich-club connectivity φ(r/N) as a function of normalised rank r/N. The figure illustrates that in all four networks, the rich-club subgraphs formed by nodes of higher degrees are progressively more interconnected. However, it is clear that the four networks exhibit profound structural differences in the tendency of high-degree nodes to be well interconnected with each other. For example, the rich nodes of the AS graph are significantly more tightly interconnected than those of the three synthetic networks. As shown in Table 4.3, the top 1% rich nodes in the AS graph have 32% of the allowable links, compared with φ(0.01) = 18% for the Inet-3.0 model and only φ(0.01) = 5% for the BA model and the Fitness BA model.

¹ The number of allowable links in an n-node subgraph is n(n − 1)/2.

Table 4.3: Rich-club properties

                              AS Graph   Inet-3.0   Fitness BA   BA Model
φ(r/N = 0.01)                      32%        18%           5%         5%
Σ_rj l(ri ≤ 5%, rj)              28602      22620        20929      15687
l(ri ≤ 5%, rj ≤ 5%)               8919       3697         1426       1511

φ(r/N = 0.01) is the rich-club connectivity among the top 1% richest nodes.
Σ_rj l(ri ≤ 5%, rj) is the number of links connecting to the top 5% rich nodes.
l(ri ≤ 5%, rj ≤ 5%) is the number of links connecting among the top 5% rich nodes.

4.2.2 Node-Node Link Distribution

The node-node link distribution is introduced to provide a more detailed view

of the network rich-club structure.

Network nodes are divided into subsets according to their ranks, for example

ranks are normalised by the total number of nodes and divided into 5% bins. Then

the node-node link distribution l(ri, rj) is defined as the number of links connecting

from nodes in the subset ri to nodes in the subset rj, where ri ≤ rj. Figure 4.4

illustrates the node-node link distribution l(ri, rj), against corresponding rank

bins ri and rj.
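The binning procedure can be sketched as follows. The function name `node_node_link_distribution`, the zero-based bin indices, the tie-breaking by node label and the assumption that the edge list contains each undirected link once (with no self-loops) are choices made for this example and are not prescribed by the text.

```python
from collections import Counter


def node_node_link_distribution(edges, n_bins=20):
    """Count links between rank bins; n_bins = 20 corresponds to 5% bins.

    edges: list of (u, v) pairs, each undirected link listed once
    Returns a Counter mapping (i, j), with i <= j, to the number of links
    between rank bin i and rank bin j (bin 0 holds the highest-degree nodes).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Rank nodes by decreasing degree; ties are broken by node label
    # (an assumption of this sketch).
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    n_nodes = len(ranked)

    # Zero-based rank position mapped onto equal-width bins of normalised rank.
    bin_of = {node: pos * n_bins // n_nodes for pos, node in enumerate(ranked)}

    counts = Counter()
    for u, v in edges:
        i, j = sorted((bin_of[u], bin_of[v]))
        counts[(i, j)] += 1
    return counts


if __name__ == "__main__":
    toy = [(0, 1), (0, 2), (0, 3), (1, 2)]
    print(node_node_link_distribution(toy, n_bins=2))
```

In this indexing, `counts[(0, 0)]` with `n_bins=20` plays the role of l(ri ≤ 5%, rj ≤ 5%), the number of links among the top 5% rich nodes.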

In the Extended BGP AS graph (Figure 4.4a), rich nodes (see columns in the

row of ri = 5%) are connected preferentially to other rich nodes: the number of links interconnecting the top 5% rich nodes (the far corner column) is significantly larger than the numbers of links connecting the rich nodes to less rich nodes.


Figure 4.4: Node-node link distribution l(ri, rj) against rank bins ri and rj (5% to 100%). (a) Ext. BGP AS graph. (b) Inet-3.0 model. (c) BA model. (d) FBA model.

The node-node link distribution of the Inet-3.0 model (Figure 4.4b) is similar to that of the AS graph; however, the number of links interconnecting the top 5% rich nodes (far corner, 3697 links) is significantly smaller than that of the AS graph (8919 links, see Table 4.3).

The node-node link distributions of the BA and the Fitness BA graphs (Fig-

ure 4.4c, 4.4d) are fundamentally different from that of the AS graph. The top 5%

rich nodes of the BA and the Fitness BA graphs are connected to all node sub-

sets with similar probabilities regardless of the rank range of subsets. Networks

generated by these two models do not contain a tightly interconnected rich-club

at all.

4.3 Discussion

The rich-club phenomenon describes a hierarchical property of the AS-level In-

ternet that high-degree nodes are tightly interconnected with each other. Un-

til recently this structural feature has been overlooked by the network research

community. The author’s novel contribution is the introduction of the rich-club


connectivity and the node-node link distribution, which for the first time pro-

vide a realistic way to quantitatively characterise and differentiate this structural

property of networks having power-law degree distributions. Results show that

synthetic scale-free networks generated by degree-based models may exhibit dif-

ferent hierarchical structures.

4.3.1 Rich-Club Subgraph

Figure 4.5: Degree distribution inside the rich-club subgraph consisting of the top 1% rich nodes of the AS graph (horizontal axis: degree, 0 to 100; vertical axis: distribution, 0 to 0.06).

As shown in Figure 4.5, if the rich-club comprises the top 1% rich nodes of

the Internet AS graph, the probability distribution of degrees among the club

members is not a power-law; in fact, it peaks around degree k = 25. Calculation shows that the average distance between rich nodes is 1.73 hops, which is very small and means that if two club members do not have a direct link between them, they very likely share a neighbouring member.

4.3.2 Rich-Club Phenomenon Is Relevant

The rich-club consists of highly connected nodes, which are well interconnected with each other, and the average hop distance among the club members is very

small (1 to 2 hops). The rich-club is a “super” traffic hub of the network and the


Internet’s disassortative mixing property ensures that peripheral nodes are always

near the hub. These two structural properties contribute to the routing efficiency

of a network.

Modelling the rich-club phenomenon is relevant [108], because an Internet model that does not reproduce the properties of the rich-club will underestimate the actual network’s routing efficiency (shortest path length) and routing flexibility (alternative reachable paths), and will also overestimate the network’s robustness under node attack [77]. Chapter 6 investigates the impact of the Internet rich-club structure in more detail.

4.3.3 Modelling The Rich-Club

Results show that the Inet-3.0 model does not exhibit the rich-club phenomenon as strongly as the Extended BGP AS graph. The reason is that the Inet-3.0 model is designed to resemble the original BGP AS graph. For example, networks generated by the model typically have 27% fewer links than the Extended BGP AS graph.

The BA model and the Fitness BA model generate strict power-law degree distributions, which are very different from that of the AS-level Internet. Moreover, they do not show the rich-club phenomenon of the AS graph at all. This is due to

the growth dynamics of the models. In both models, new links are brought into

the system by the addition of new nodes. New nodes are preferentially connected

to high degree nodes. Thus inter-rich links can only appear when some new nodes

grow into rich nodes. However, due to the preferential attachment, the probabil-

ity for a new node to become a rich node decreases as the network grows. As a

result, rich nodes are not well interconnected with each other. This suggests a

simple modification to these models to generate a rich-club: as the network grows,

new internal links appear which are preferentially attached between the existing

nodes. An example is the Interactive Growth model, which will be introduced in

Chapter 5.


The above analysis on the three network models demonstrates that the rich-

club connectivity is useful in revealing structural details of complex networks

and provides a new perspective for analysing the growth mechanisms of evolving

network models. In the following chapters, the rich-club connectivity is used as

both a new criterion for validating network structures and a practical guideline

for proposing new models.

4.4 Summary

The rich-club phenomenon describes the hierarchical structure of the AS-level

Internet where high-degree nodes are tightly interconnected with each other. This

structural property is quantitatively characterised by the rich-club connectivity

and the node-node link distribution. The calculation of the two metrics is simple

and solely based on graph connectivity information. The rich-club connectivity

is a critical complement to the existing topology parameters to explicitly and

thoroughly characterise large-scale complex network structures and it provides a

new criterion for network models.


Chapter 5

Interactive Growth Model

5.1 Introduction

Chapter 4 shows that the rich-club connectivity quantitatively characterises the hierarchical structure of the AS-level Internet and that a number of degree-based Internet models do not reproduce the rich-club connectivity of the actual network.

This chapter introduces the Interactive Growth (IG) model [109, 110], which uses a growth mechanism based on observations of the Internet history data. The model is validated against an Extended BGP AS graph and is also compared with a number of other Internet models. Results show that the IG model compares favourably with other models because it closely resembles both the power-law degree distribution and the rich-club connectivity of the AS-level Internet. The chapter also discusses the reasons for the topological differences between the network models.

The IG model, as an example of networks containing a rich-club, will be used

in the next chapter to investigate the impact of the network structures on the

network behaviours. The IG model is also the precursor of the Positive Feedback

Preference (PFP) model which will be introduced in Chapter 8.


5.2 Interactive Growth Model

The Interactive Growth (IG) model modifies the Barabasi and Albert (BA) model

(see Section 3.3.5 on page 39) by using a so-called interactive growth mechanism,

which is based on a number of dynamic behaviours observed [74, 66, 67, 79] on the

Internet history data. Firstly there are two main operations that account for the

evolution of the Internet graph: the addition of new nodes and the appearance

of new internal links between already existing nodes (old nodes). Secondly the

majority of new nodes are added to the system by attaching them to only one or

two old nodes. Thirdly the degree distribution of the AS-level Internet is not a

strict power-law, for example it has more nodes with degree two than nodes with

degree one (P(k = 2) > P(k = 1)). Lastly, the majority of nodes (with low degrees)

in the AS-level Internet exhibit a linear preferential attachment as described in

the BA model (see Equation 3.2 on page 39).

Figure 5.1: The interactive growth mechanism of the IG model. (a) A new node is attached to one old node and at the same time-step two new internal links appear. (b) A new node is attached to two old nodes and one new internal link appears.

The interactive growth mechanism is shown in Figure 5.1. The IG model starts with a small random network. At each time-step,

• with probability p ∈ [0, 1] (see Figure 5.1-a), a new node is attached to one old node (host node), and at the same time two new internal links appear connecting the host node to two other old nodes (peer nodes),

• with probability 1 − p (see Figure 5.1-b), a new node is attached to two host nodes and one new internal link appears connecting one of the two host nodes to one peer node.

The linear preference probability given by the BA model is used for the attach-

ment of new nodes and the appearance of new internal links. From numerical

simulations, the author found that p = 0.4 produces the best result to fit the

degree distribution and the rich-club connectivity of the AS-level Internet.
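The growth rules above can be sketched in a few lines of Python. This is an illustrative reading, not the author’s implementation, and it rests on several assumptions that the text leaves open: the seed is a short path of `seed_size` nodes standing in for the “small random network”; internal links are only created between pairs that are not yet connected; in the second case the host that receives the internal link is chosen uniformly from the two hosts; and its peer may coincide with the other host. The function name `interactive_growth` and its parameters are hypothetical.

```python
import random


def interactive_growth(n, p=0.4, seed_size=5, rng=None):
    """Grow an n-node network with the interactive growth rules (sketch).

    Returns a dict mapping each node label (0 .. n-1) to its set of neighbours.
    """
    if seed_size < 5 or n < seed_size:
        raise ValueError("need seed_size >= 5 and n >= seed_size")
    rng = rng or random.Random()

    adj = {i: set() for i in range(seed_size)}
    stubs = []  # each node appears once per link end: linear preference

    def add_link(u, v):
        adj[u].add(v)
        adj[v].add(u)
        stubs.extend((u, v))

    for i in range(seed_size - 1):          # seed: a simple path (assumption)
        add_link(i, i + 1)

    def pick(exclude):
        """Preferentially pick an existing node that is not in `exclude`."""
        while True:
            node = rng.choice(stubs)
            if node not in exclude:
                return node

    for new in range(seed_size, n):
        n_old = new                         # existing nodes are 0 .. new-1
        if rng.random() < p:
            # One host, two new internal links from the host to two peers.
            while True:
                host = rng.choice(stubs)
                if n_old - 1 - len(adj[host]) >= 2:   # two non-neighbours exist
                    break
            peer1 = pick(adj[host] | {host})
            peer2 = pick(adj[host] | {host, peer1})
            links = [(new, host), (host, peer1), (host, peer2)]
        else:
            # Two hosts, one new internal link from one host to a peer.
            while True:
                host1 = rng.choice(stubs)
                host2 = pick({host1})
                linker = rng.choice((host1, host2))
                if n_old - 1 - len(adj[linker]) >= 1:  # a non-neighbour exists
                    break
            peer = pick(adj[linker] | {linker})
            links = [(new, host1), (new, host2), (linker, peer)]

        adj[new] = set()
        for u, v in links:
            add_link(u, v)
    return adj


if __name__ == "__main__":
    net = interactive_growth(2000, p=0.4, rng=random.Random(1))
    n_links = sum(len(nb) for nb in net.values()) // 2
    print(len(net), n_links, n_links / len(net))
```

Because every step adds one node and exactly three links in this sketch, the ratio of links to nodes approaches 3 as the network grows. Keeping one list entry per link end makes a uniform draw from the list equivalent to the linear preference of the BA model.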

The interactive growth mechanism satisfies all the above observations on the

Internet evolution. Since the two growth operations are interdependent, at each

time-step the number of nodes of the network increases by one and the number

of links increases by three. Therefore the model produces a similar ratio of links

over nodes (L/N ≃ 3) as the AS-level Internet.
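The growth rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the name `ig_model`, the ring-shaped seed network (standing in for the "small random network"), the uniform choice of which host receives the internal link in case b, and the rule that a peer must not already be a neighbour of the host are all assumptions made for the example.

```python
import random


def ig_model(n_nodes, p=0.4, n_seed=10, seed=None):
    """Grow a network with the interactive growth rules.

    Returns an adjacency dict {node: set(neighbours)}.
    """
    if n_seed < 3:
        raise ValueError("n_seed must be at least 3")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_seed)}
    ends = []  # one entry per link end, so a uniform draw is proportional to degree

    def add_link(u, v):
        adj[u].add(v)
        adj[v].add(u)
        ends.extend((u, v))

    # Assumption: a ring of n_seed nodes replaces the "small random network".
    for i in range(n_seed):
        add_link(i, (i + 1) % n_seed)

    def pick(exclude):
        """Linear-preference choice of an old node outside `exclude` (None if none exists)."""
        for _ in range(100):  # cheap rejection sampling first
            v = rng.choice(ends)
            if v not in exclude:
                return v
        pool = [v for v in ends if v not in exclude]
        return rng.choice(pool) if pool else None

    for new in range(n_seed, n_nodes):
        if rng.random() < p:
            hosts = [pick(set())]            # case a: one host, two internal links
            n_peers = 2
        else:
            first = pick(set())              # case b: two hosts, one internal link
            hosts = [first, pick({first})]
            n_peers = 1
        link_host = rng.choice(hosts)        # assumption: uniform choice among the hosts
        taken = adj[link_host] | {link_host}
        for _ in range(n_peers):
            peer = pick(taken)
            if peer is None:                 # no eligible peer left: skip this internal link
                break
            add_link(link_host, peer)
            taken.add(peer)
        adj[new] = set()
        for host in hosts:
            add_link(new, host)
    return adj
```

Because every time-step adds one node and (barring the rare skipped internal link) three links, the ratio L/N of a network grown this way approaches 3 as the network becomes large.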

5.3 Model Validation

The IG model is compared against an Extended BGP AS graph measured in May

2001. The IG model is also compared with synthetic networks generated by other

Internet models, such as the BA model, the Inet-3.0 model (see Section 3.3.4 on

page 39) and the GLP model (see Section 3.3.10 on page 44). For each model, ten

networks are generated with different seed numbers and all results are the average

over the ten networks.

As shown in Table 5.1, all the model networks have the same number of nodes

and similar numbers of links as the AS graph. The GLP(1) network is generated

using parameters of ρ = 0.66, m = 1, m0 = 10 and β = 0.6447, as recommended

by Bu et al. [44]. The GLP(2) network uses the same parameters except β = 0,

which makes the GLP model’s generalised linear preference of Equation 3.9 (on

page 45) equivalent to the linear preference of Equation 3.2 (on page 39) used

by the BA model and the IG model.

Table 5.1: Network properties

Network     N      L      γ     kmax  〈k〉  P(k = 1)  P(k = 2)  P(k = 3)
AS graph    11461  32730  2.22  2432  5.7   28.9%     40.3%     11.6%
IG model    11461  34363  2.22  842   6.0   26.0%     33.8%     10.5%
GLP(1)      11461  34363  2.20  517   6.0   68.4%     11.3%     5.1%
GLP(2)      11461  34363  2.20  524   6.0   52.0%     16.3%     7.9%
Inet-3.0    11461  24171  2.22  2010  4.2   40.0%     36.7%     8.2%
BA model    11461  34363  3.0   329   6.0   0%        0%        40.0%

N - number of nodes. L - number of links. γ - power-law exponent. kmax - maximum degree. 〈k〉 - average degree. P(k) - degree distribution, percentage of nodes with degree k.

5.3.1 Degree Distribution

5.3.1.1 Degree Distribution

Figure 5.2: Degree distribution, P(k) against degree k on log-log axes, for the AS graph, IG model, GLP(1), GLP(2), Inet model and BA model. For each model, ten networks are generated and averaged.

Figure 5.2 and Table 5.1 show that the IG model and the Inet-3.0 model closely

match the degree distribution of the AS graph, particularly the low-range degree

distributions, where the percentage of nodes with degree one P (1), is actually

smaller than the percentage of nodes with degree two P (2). The low-range degree

distribution is important because nodes with degree one and two account for nearly

70% of the total number of nodes in the AS graph.

The IG model is a dynamic growing model and it is the growth mechanism that

defines the model’s topological properties, including the degree distribution. The

Inet-3.0 model matches the AS graph’s degree distribution well because this

static model is designed to resemble the Internet measurements:

links are attached to nodes according to pre-assigned node degrees.

The BA model is based solely on the addition of new nodes. In order to obtain

a similar ratio of links over nodes as the AS-level Internet, each new node in the

BA model is attached to three old nodes (m = 3) and therefore P (1) = P (2) = 0.

Bu et al recommend the parameter m = 1 for the GLP model, thus each new

node is attached to only one old node. As a result the fraction of nodes with

degree one in the two GLP networks is significantly larger than that in the actual

network (see Table 5.1). For example, P(1) of the GLP(1) network is as high as 68.4%,

which is more than twice that of the AS graph.

5.3.1.2 Degree vs Rank

Figure 5.3: Degree k as a function of rank r, on log-log axes, for the AS graph, IG model, GLP(1), GLP(2), Inet model and BA model.

Figure 5.3 shows degree k as a function of rank r on a log-log scale. The

AS graph has a nearly strict power-law relationship between degree and rank,

k ∼ r^{−0.85}. The curves of the two GLP networks are not power-laws. The BA

model exhibits a power-law behaviour between degree and rank, but the power-law

exponent is significantly different from that of the AS graph. The curve of

the Inet-3.0 network deviates from the AS graph between k = 10^1 ∼ 10^3. Apart

from a few richest nodes (r ≤ 10^1), the IG model generally matches the

correlation between degree and rank of the AS graph well.

5.3.2 Rich-club Phenomenon

Networks generated using the IG model and the GLP model should exhibit a

higher rich-club connectivity than the BA model, because new internal links added

in the IG model and the GLP model preferentially connect among already well

connected nodes.

5.3.2.1 Rich-Club Connectivity

Figure 5.4: Rich-club connectivity, φ(r/N), as a function of normalised rank, r/N, on log-log axes, for the AS graph, IG model, GLP(1), GLP(2), Inet model and BA model.

Figure 5.4 shows the rich-club connectivity φ(r/N) as a function of normalised

rank r/N on a log-log scale. The plot shows that only the IG model closely

matches the rich-club connectivity of the AS graph. The rich-club connectivity of

the Inet-3.0 model and the BA model is significantly lower than that of the AS

graph. It is interesting to notice that the rich-club connectivity of the two GLP

networks is higher than that of the AS graph. This means the rich nodes in

these two models are even more tightly interconnected with each other than in

the AS graph. For example, the AS graph and the IG model have φ(1%) = 32%,

compared with φ(1%) = 72% for the GLP(1) network and φ(1%) = 50% for the GLP(2) network.
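To make the quantity plotted in Figure 5.4 concrete, the following sketch computes the rich-club connectivity of the r best-connected nodes as the ratio of the links that actually exist among them to the r(r − 1)/2 links that could exist. The function name, the adjacency-dict representation and the toy graph are assumptions made for illustration; ties in degree are broken arbitrarily.

```python
def rich_club_connectivity(adj, r):
    """Fraction of the r(r-1)/2 possible links that exist among the r best-connected nodes."""
    if not 2 <= r <= len(adj):
        raise ValueError("r must be between 2 and the number of nodes")
    ranked = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    club = set(ranked[:r])
    # Each link inside the club is seen from both of its ends, hence the division by two.
    links = sum(1 for u in club for v in adj[u] if v in club) // 2
    return links / (r * (r - 1) / 2)


# Toy graph: a fully connected core (nodes 0-3), each core node with two degree-one leaves.
toy = {u: set() for u in range(12)}


def _link(u, v):
    toy[u].add(v)
    toy[v].add(u)


for u in range(4):
    for v in range(u + 1, 4):
        _link(u, v)
    _link(u, 4 + 2 * u)
    _link(u, 5 + 2 * u)

core_phi = rich_club_connectivity(toy, 4)    # the four core nodes form the rich-club
whole_phi = rich_club_connectivity(toy, 12)  # the whole network taken as the "club"
```

For a normalised rank r/N, as used on the horizontal axis of Figure 5.4, r would be obtained by multiplying the fraction by the number of nodes and rounding.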

5.3.2.2 Node-Node Link Distribution

Figure 5.5: Node-node link distribution l(ri, rj), which is normalised by L, the total number of links, plotted against ri and rj. a) AS graph. b) IG model.

Figure 5.5 shows that the IG model closely resembles the node-node link

distribution of the Extended BGP AS graph.

Table 5.2: Node-node link distribution

                        AS graph  IG     GLP(1)  GLP(2)  Inet   BA
Number of links, L      32730     34363  34363   34363   24171  34363
Σ_rj l(ri ≤ 5%, rj)     29602     26422  32376   29073   22620  15687
l(ri ≤ 5%, rj ≤ 5%)     8919      7806   16210   11540   3697   1511

Σ_rj l(ri ≤ 5%, rj) is the number of links connecting to the top 5% rich nodes; l(ri ≤ 5%, rj ≤ 5%) is the number of links connecting among the top 5% rich nodes.

Figure 5.6: Node-node link distribution, l(ri ≤ 5%, rj), plotted against rj for the GLP(1), GLP(2), AS graph, IG, Inet-3.0 and BA networks.

In order to compare all the networks together, Figure 5.6 shows a simplified version of the node-node link distribution, l(ri ≤ 5%, rj), which has only one variable, rj, and illustrates where the top 5% rich nodes (ri ≤ 5%) are connected to. Figure 5.6 shows that only the IG model reproduces the node-node link distribution of the AS graph. The two GLP networks exhibit a rich-club phenomenon notably stronger than the AS graph. As shown in Table 5.2, the GLP(1) network has l(ri ≤ 5%, rj ≤ 5%) = 16210 links connecting among the top 5% rich nodes, nearly twice that of the AS graph.
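The two link counts reported in Table 5.2 can be obtained with a single pass over the links. The sketch below is illustrative only: the function name is hypothetical, node labels are assumed to be comparable (for example integers) so that each undirected link is visited once, and ties in degree are broken arbitrarily when ranking.

```python
def rich_node_link_counts(adj, r):
    """Return (links touching the top-r nodes, links among the top-r nodes)."""
    ranked = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    rich = set(ranked[:r])
    touching = among = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected link once
                ends_in_rich = (u in rich) + (v in rich)
                touching += ends_in_rich >= 1
                among += ends_in_rich == 2
    return touching, among


# Toy graph: a triangle (0, 1, 2), one leaf per triangle node, and a link between leaves 3 and 4.
toy_links = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1, 5}, 3: {0, 4}, 4: {1, 3}, 5: {2}}
counts = rich_node_link_counts(toy_links, 3)
```

With r set to 5% of the number of nodes, the first value corresponds to the row Σ_rj l(ri ≤ 5%, rj) and the second to the row l(ri ≤ 5%, rj ≤ 5%).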

5.4 Discussion

Based on observations on the Internet history data, the IG model uses the in-

teractive growth mechanism, in which the attachment of new nodes and the ap-

pearance of new internal links are interdependent. It is the growth mechanism

that defines the topological structure of the model. The simple and dynamic

IG model compares favourably with other Internet topology generators because

it closely resembles both the degree distribution and the rich-club phenomenon

of the AS-level Internet. Networks generated using the IG model, as illustrated

in Figure 5.7, have already been used in simulation studies on the TCP packet

traffic [111, 112].

The IG model is different from other models that also use the appearance of new internal links (see Chapter 3), such as Dorogovtsev and Mendes’ model, Bu and Towsley’s Generalised Linear Preference (GLP) model and Caldarelli’s Generalised Network Growth (GNG) model. These models have explored various schemes of preference probability. However these schemes are not supported by measurements on the Internet. In addition, these models do not satisfy the growth dynamics that have been observed on the Internet history data.

Figure 5.7: A network generated using the IG model. The size of a node is proportional to the number of its degree-one neighbours, which have been removed to simplify the graph.

5.4.1 Maximum Degree

The IG model still has problems. For example the AS-level Internet features a

very large maximum degree (kmax = 2432, see Table 5.1), which is nearly three

times that of the IG model (842).

Figure 5.8: Time-evolution: degree growth k(t) against time-step t, on log-log axes, of a node added in an early time-step, for the IG model and the BA model.

Figure 5.8 shows that the time-evolution of node degree in the BA model and the IG model obeys a power-law, k(t) ∼ t^θ. As predicted by Barabasi et al. [95], θ of the BA model is 0.5. The author’s calculation shows that θ of the IG model is 0.6. This means the node degree in the IG model increases at a higher rate than in the BA model. The reason is that in the IG model old nodes (with high degrees) have more chances to acquire new connections than those in the BA model. As shown in Figure 5.1, at each time-step of the IG model, statistically 4 or 5 old nodes acquire new links, whereas at each time-step of the BA model, only three old nodes acquire new links. As a result, although using the same linear preference, the maximum degree of the IG model (842) is higher than that of the BA model (329).
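An exponent such as θ can be estimated from a recorded degree history by a straight-line fit on log-log axes. The sketch below uses ordinary least squares; the thesis does not state how θ was calculated, so the fitting method and the function name are assumptions made for illustration.

```python
import math


def fit_power_law_exponent(ts, ks):
    """Least-squares slope of log k against log t, i.e. theta in k(t) ~ t**theta."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(k) for k in ks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check: degrees generated from an exact power law with exponent 0.6.
timesteps = [100, 1000, 10000, 100000]
degrees = [2.0 * t ** 0.6 for t in timesteps]
theta = fit_power_law_exponent(timesteps, degrees)
```

On measured or simulated data the points scatter around the line, so the fitted value depends on the range of time-steps included.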

The two GLP networks use the same growth mechanisms except the preference

probability. The GLP(1) network uses the recommended parameter value of β =

0.6447 and the generalised linear preference increases “the preference of being

connected to high-degree nodes”. The GLP(2) network uses the parameter value

of β = 0, which makes the generalised linear preference equivalent to the linear

preference. However as shown in Table 5.1, the maximum degrees of the GLP(1)

network (517) and the GLP(2) network (524) are similar. This implies that the

generalised linear preference does not effectively increase the maximum degree of

the generated network.

The author has noticed that, as shown in Table 4.2 on page 55, the Fitness

BA model generates networks with a large maximum degree. The reason is that

when a high-degree node in the FBA model is assigned a high value of the fitness

parameter, the node obtains a much stronger ability to acquire links than other

nodes. This suggests a possible way of reproducing a large maximum degree by

increasing the high-degree nodes’ preference probability (to be stronger than the

generalised linear preference). Chapter 8 studies the maximum

degree further.

5.4.2 Rich-Club Connectivity

The IG model closely reproduces the rich-club phenomenon of the AS-level Internet,

whereas the two GLP networks exhibit a rich-club phenomenon significantly

stronger than the actual network. The reason is that the IG model and the GLP

model have different numbers of new internal links being added during the net-

work growth, even though the two models contain the same numbers of nodes and

links.

In the IG model, the addition of new nodes (attached by new external links)

and the appearance of new internal links (between old nodes) are interdependent.

According to the interactive growth mechanism, at each time-step, statistically

the number of nodes increases by one and the number of links increases by three,

of which the number of newly added internal links is p × 2 + (1 − p) × 1 = 1.4,

where p = 0.4, and the number of newly added external links is 3 − 1.4 = 1.6.

Thus the ratio of new internal links over new external links is 1.4/1.6 = 0.875.

In the GLP model, the addition of new nodes and the appearance of new

internal links are independent. According to the GLP model’s growth mechanism,

if ρ = 0.66 and m = 1, statistically when the number of nodes increases by one, the

number of newly added external links also increases by one, whereas the number

of newly added internal links increases by two. Thus the ratio of new internal

links over new external links is 2/1 = 2, which is significantly larger than that of

the IG model.
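The two ratios can be checked with a few lines of arithmetic. In this sketch the IG expressions follow the text directly; the GLP expression assumes that each GLP time-step either adds m internal links (with probability ρ) or adds one new node with m external links (with probability 1 − ρ), which is an interpretation of the growth mechanism rather than something stated in this section. The function names are hypothetical.

```python
def ig_links_per_node(p):
    """Expected (internal, external) links added per new node in the IG model."""
    internal = p * 2 + (1 - p) * 1
    external = p * 1 + (1 - p) * 2
    return internal, external


def glp_links_per_node(rho, m=1):
    """Expected (internal, external) links per new node under the assumed GLP reading."""
    # On average rho / (1 - rho) internal-link steps occur for every node-adding step.
    internal = m * rho / (1 - rho)
    external = float(m)
    return internal, external


ig_internal, ig_external = ig_links_per_node(0.4)
glp_internal, glp_external = glp_links_per_node(0.66, m=1)
ig_ratio = ig_internal / ig_external      # about 0.875
glp_ratio = glp_internal / glp_external   # about 1.94, i.e. roughly 2
```

Under this reading the GLP model adds roughly two internal links for every external link, against fewer than one in the IG model, which is the difference the discussion above attributes the stronger rich-club to.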

Due to the preferential attachment, new internal links tend to connect be-

tween already well connected nodes. Therefore networks generated by the GLP

model exhibit a rich-club phenomenon stronger than the IG model. The rich-club

connectivity (see Figure 5.4) of the GLP(1) network deviates from the AS graph

more than the GLP(2) network does. This is because the GLP(2) network uses

the linear preference probability while the GLP(1) network uses the generalised

linear preference, which increases the preference of being connected to high-degree

nodes.

The above discussion of the reasons for the topological

differences between the network models provides novel insights into how network

growth mechanisms shape the generated topologies.

5.5 Summary

The Interactive Growth model is based on observations on the Internet history

data. This simple and dynamic model closely resembles both the degree distri-

bution and the hierarchical structure characterised by the rich-club connectivity.

The IG model is a good step forward towards realistically modelling the Internet

topology.

Chapter 6

Structure Affects Functions

6.1 Introduction

Chapter 4 shows that networks having similar degree distributions may exhibit

different hierarchical structures characterised by the rich-club connectivity. Chapter 5

proposed the Interactive Growth (IG) model, which reproduces both the

degree distribution and the rich-club connectivity of the AS-level Internet. This

Chapter investigates whether the rich-club structure is relevant and how the network

structure affects the network functionality [108].

Three network behaviours are analysed and simulated, which are the network

routing efficiency, redundancy and robustness. The analyses and simulations are

based on three networks, including a Traceroute AS graph measured in April

2002 [28, 113] and two synthetic networks generated by the Interactive Growth

(IG) model and the Fitness Barabasi-Albert (FBA) model (see Section 3.3.6 on

page 41). For each model, ten networks are generated with different seed numbers

and all results are the average over the ten networks.

As shown in Table 6.1, Figure 6.1 and Figure 6.2, the three networks have similar sizes and power-law degree distributions. The IG model is an example of networks having a densely interconnected rich-club like the AS graph, whereas the FBA model is an example of networks that do not exhibit the rich-club phenomenon.

Table 6.1: Network Parameters

                                AS graph  IG graph  FBA graph
Number of nodes, N              11122     11122     11122
Number of links, L              30054     33349     33349
Average degree, 〈k〉             5.4       6.0       6.0
Max. degree, kmax               2839      842       1793
Power-law exponent, γ           2.2       2.22      2.255
P(k = 1)                        26.1%     26.0%     0
P(k = 2)                        37.9%     33.3%     0
P(k = 3)                        14.0%     10.3%     50.5%
Characteristic path length, l∗  3.13      3.56      3.86

Figure 6.1: Cumulative distribution of degree, on log-log axes, for the AS graph, the IG model and the FBA model.

Figure 6.2: Rich-club connectivity against normalised rank, on log-log axes, for the AS graph, the IG model and the FBA model.

6.2 Routing Efficiency

One of the important features of the AS-level Internet is that the network is a

small-world network, which features a very small characteristic path length (see

section 2.2.4). It is possible for a network with a small characteristic path length

to achieve better routing efficiency. Please note that the routing efficiency of a

network is not only determined by the network structure, but also by the routing

protocol and many other engineering factors.

Figure 6.3: Cumulative distribution of shortest path length, l, for the AS graph, the IG model and the FBA model. For each model, ten networks are generated and averaged.

As shown in Figure 6.3, the cumulative distributions of shortest path length of the AS graph and the IG model exhibit similar patterns and they are displaced to the left of the FBA model. As mentioned in Chapter 4, the tightly interconnected rich-club of the Internet provides a large selection of shortcuts for the network traffic. It is not surprising that, compared with the FBA model, the IG model better models the shortest path length of the AS graph, because the IG model reproduces the actual network’s rich-club structure.

However the IG model still does not accurately reproduce the characteristic

path length of the AS graph. The characteristic path length of the IG model is

0.43 hops longer than that of the AS graph. The 0.43-hop difference is notable in

terms of network routing efficiency considering the fact that the characteristic path

length of the networks is less than 4 hops. More details on accurately reproducing

the characteristic path length will be provided in Chapter 8.
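For reference, the characteristic path length of an unweighted network can be computed with a breadth-first search from every node. The sketch below assumes the usual definition, the average shortest path length over all pairs of mutually reachable nodes; the function name and the toy graph are illustrative.

```python
from collections import deque


def characteristic_path_length(adj):
    """Average shortest path length (in hops) over all ordered pairs of reachable nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop counts in an unweighted graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graph: a hub (node 0) with three leaves.
star3 = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
l_star = characteristic_path_length(star3)
```

The cost is one search per node, which is affordable for networks of around ten thousand nodes such as those compared here.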

6.3 Network Redundancy

Cycles [114, 97, 115, 116] encode the redundant information in the network structure.

The numbers of short cycles (triangles and quadrangles) are relevant properties

because the multiplicity of paths between any two nodes increases with the

density of short cycles (note that an alternative path between two nodes can be

longer than their shortest path). The triangle coefficient, kt, is defined as the

number of triangles that a node shares. Similarly the quadrangle coefficient, kq, is the

number of quadrangles that a node shares.

The clustering coefficient c of a node (see section 2.2.6) can be expressed as a

function of the node’s degree k and triangle coefficient kt,

$$ c = \frac{k_t}{k(k-1)/2}. \qquad (6.1) $$

The reason that the author studied short cycles instead of clustering coefficient is

that short cycles have the advantage of providing neighbour clustering information

of nodes with different degrees [117].
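The triangle coefficient and its relation to the clustering coefficient in Equation 6.1 can be illustrated as follows. This is a sketch with hypothetical function names; the clustering coefficient of a node with fewer than two neighbours is set to zero here, which is a convention chosen for the example.

```python
def triangle_coefficient(adj, u):
    """Number of triangles that node u shares: pairs of its neighbours that are linked."""
    neighbours = list(adj[u])
    return sum(
        1
        for i, a in enumerate(neighbours)
        for b in neighbours[i + 1:]
        if b in adj[a]
    )


def clustering_coefficient(adj, u):
    """Equation 6.1: triangles through u divided by the k(k-1)/2 possible neighbour pairs."""
    k = len(adj[u])
    if k < 2:
        return 0.0  # convention for this sketch
    return triangle_coefficient(adj, u) / (k * (k - 1) / 2)


# Toy graph: a triangle (0, 1, 2) with one extra leaf (3) attached to node 0.
toy_tri = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
kt_0 = triangle_coefficient(toy_tri, 0)
c_0 = clustering_coefficient(toy_tri, 0)
```

Node 0 shares one triangle but has three possible neighbour pairs, so its clustering coefficient is lower than that of nodes 1 and 2 even though all three lie on the same triangle.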

Table 6.2: Network Short Cycles

                                   AS graph  IG graph  FBA graph
Maximum triangle coef., kt−max     7482      4962      1191
Average triangle coef., 〈kt〉       12.7      10.0      0.6
Maximum quadrangle coef., kq−max   9648      9247      4638
Average quadrangle coef., 〈kq〉     227.4     108.0     10.4

Figure 6.4: Distribution of triangle coefficient, on log-log axes, for the AS graph, the IG model and the FBA model.

Figure 6.5: Cumulative distribution of triangle coefficient, on log-log axes, for the AS graph, the IG model and the FBA model.

Figures 6.4 to 6.7 and Table 6.2 show that the AS graph and the IG model have significantly more triangles and quadrangles than the FBA model. This implies that the AS graph and the IG model have more possible alternative routing paths, are more flexible in traffic routing and hence show higher degrees of network redundancy than the FBA model.

Figure 6.6: Distribution of quadrangle coefficient, on log-log axes, for the AS graph, the IG model and the FBA model.

Figure 6.7: Cumulative distribution of quadrangle coefficient, on log-log axes, for the AS graph, the IG model and the FBA model.

By reproducing the rich-club phenomenon, the IG model resembles the AS

graph’s network redundancy property. The reason is that the tightly intercon-

nected rich-club increases the number of short cycles in the network.

6.4 Network Robustness

Barabasi et al [77] showed that it is difficult to divide power-law topologies into

separate subnetworks by removing nodes at random (error), but it is very easy to

split them into subnetworks by removing specific nodes (attack, see Figure 6.8).

Barabasi et al.’s study was based on the BA model, which does not exhibit the

rich-club phenomenon of the AS-level Internet.

Figure 6.8: Node attack (diagram labels: node-attack, largest cluster).

Here the author studies whether the rich-club structure has an impact on the

network robustness property in four scenarios:

• Node error – randomly remove a node and its links. Node error resembles

the scenario when a node is out of service due to unpredictable technical

problems, such as hardware failure.

• Node attack (see Figure 6.8) – first remove the best-connected node and

its links, then continue to select and remove nodes in decreasing order of their

degrees. Node attack resembles the scenario in the actual Internet when a

node, AS or router, has “collapsed” (is out of service) due to infection by a malicious

virus, or is severely congested (denial of service) due to a targeted traffic

surge. Node attack vulnerability is of great interest for network research.

• Link error [118] – randomly remove a link.

• Link attack (see Figure 6.9) – remove links in decreasing order of the degrees of

their end nodes. For example, if a link connects node i and node j with

degrees ki ≤ kj, then the first link removed is the one with

the largest ki.

Figure 6.9: Link attack.

The network robustness is measured by the size of the largest cluster in the

remaining network after the error or attack operations (see Figure 6.8), where a

cluster is defined as a subnetwork in which all nodes are reachable via paths of

links. Please note that in the real Internet, there are other engineering factors

which might affect the network robustness.
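The robustness measure can be sketched as follows for the two node-removal scenarios. The function names are hypothetical, and the attack order is computed once from the degrees in the original network; whether degrees are recalculated after each removal is not specified above, so this is an assumption of the sketch.

```python
import random
from collections import deque


def largest_cluster(adj, removed):
    """Size of the largest connected cluster once the nodes in `removed` are taken out."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def error_order(adj, seed=None):
    """Node error: nodes in a random order."""
    order = list(adj)
    random.Random(seed).shuffle(order)
    return order


def attack_order(adj):
    """Node attack: nodes in decreasing order of their original degree (assumption)."""
    return sorted(adj, key=lambda u: len(adj[u]), reverse=True)


def node_removal_curve(adj, order, counts):
    """Normalised largest-cluster size S after removing the first m nodes, for m in counts."""
    n = len(adj)
    return [largest_cluster(adj, order[:m]) / n for m in counts]


# Toy graph: a hub (node 0) with five leaves.
star5 = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
attack_curve = node_removal_curve(star5, attack_order(star5), [0, 1])
```

On the toy star, removing the hub first leaves only isolated leaves, whereas a randomly removed node is most likely a leaf and leaves the rest connected; this is the contrast between attack and error that the scenarios above are designed to expose.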

6.4.1 Node Error

Figure 6.10: Network robustness under node error: normalised size of the largest cluster, S, against f, the fraction of nodes removed randomly, for the AS graph, the IG model and the FBA model.

Figure 6.10 plots S, the normalised size of the largest cluster (shown as a

fraction of the number of nodes in the original network), against f, the fraction of

nodes randomly removed. The figure shows that the three networks display high

degrees of robustness under the node error. When 10% of nodes are randomly

removed, 90% of the networks can still communicate. The reason is that the

three networks have power-law degree distributions, which means most randomly

removed nodes are low-degree nodes and therefore limited node error has little

impact on the network integrity.

6.4.2 Node Attack

Figure 6.11 shows that the AS graph and the IG model are extremely vulnerable

under the node attack. The removal of only a few of their best-connected nodes can

result in a disconnected network. For example, when the top 5%–10% rich nodes

are under attack, both networks collapse into small pieces. It is notable that when

only the 1% best-connected nodes of the AS graph are removed, nearly 40% of the nodes

are detached from the network. By comparison, the FBA model shows a fairly

high degree of robustness under the node attack. When the top 10% rich nodes

are removed, the largest cluster still contains 65% of the nodes.

Figure 6.11: Network robustness under node attack: normalised size of the largest cluster, S, against f, the fraction of nodes under attack, for the AS graph, the IG model and the FBA model.

This vulnerability arises because the node attack is equivalent to removing members of the rich-club.

The rich-club plays a dominant role in the network’s connectivity and the

segmentation of the rich-club can break down the whole network’s integrity.

6.4.3 Link Error

Figure 6.12: Network robustness under link error: normalised size of the largest cluster, S, against f, the fraction of links removed randomly, for the AS graph, the IG model and the FBA model.

Figure 6.12 shows that the three networks are fairly resilient to the link error.

Compared with the AS graph and the IG model, the FBA model exhibits a higher

degree of resilience to the link error. The reason is that, as introduced in Chapter

3, during the growth of the FBA model each new node is attached to the network

with m = 3 connections and therefore all nodes in the FBA model have at least 3

links. If a node loses one or two links, it is still connected to the rest of the network.

However as shown in Table 6.1, nearly 70% of the nodes of the AS graph and the IG model

have only one or two links. These nodes are more easily isolated from the

system under the process of link error.

6.4.4 Link Attack

Figure 6.13: Network robustness under link attack: normalised size of the largest cluster, S, against f, the fraction of links under attack, for the AS graph, the IG model and the FBA model.

Figure 6.13 shows that the three networks are tolerant to the link attack due

to large numbers of redundant links among high-degree nodes. The networks’

overall reachability is not damaged even after about 60% of the links are removed in

the attack mode. When the link attack continues, the networks start to experience

a percolation transition [119], in which the networks suddenly collapse from a

single network cluster into small disconnected subnetworks.
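The link attack can be sketched in the same style. Links are ranked by the smaller of their two end-node degrees, following the description of the link-attack scenario; using the larger degree to break ties, and keeping the original degrees fixed during the attack, are assumptions of this example, as are the function names. Node labels are assumed to be comparable (for example integers).

```python
def link_attack_order(adj):
    """Links sorted so that those joining the best-connected nodes come first."""
    links = {(min(u, v), max(u, v)) for u in adj for v in adj[u]}

    def key(link):
        ku, kv = len(adj[link[0]]), len(adj[link[1]])
        return (min(ku, kv), max(ku, kv))  # smaller end degree first, then the larger one

    return sorted(links, key=key, reverse=True)


def largest_cluster_without_links(adj, removed_links):
    """Size of the largest connected cluster once the given links are ignored."""
    removed = set(removed_links)
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen and (min(u, v), max(u, v)) not in removed:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Toy graph: a triangle of well-connected nodes (0, 1, 2), each with one leaf.
toy_core = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1, 5}, 3: {0}, 4: {1}, 5: {2}}
order = link_attack_order(toy_core)
after_attack = largest_cluster_without_links(toy_core, order[:3])
```

In the toy graph the three core links are attacked first. Real networks with many redundant links among high-degree nodes tolerate far more removals before the largest cluster shrinks, which is the behaviour reported for Figure 6.13.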

6.5 Discussion

This Chapter showed it is necessary to reproduce the rich-club structure because

it has significant impacts on the network dynamic properties. An Internet model

that does not reproduce the rich-club properties underestimates the actual network’s

routing efficiency in terms of shortest path length and routing flexibility

in terms of alternative reachable paths, and overestimates the network robustness

under node-attack.

Figure 6.14: A conical structure model (diagram labels: Top, Base).

To help understand these impacts, Figure 6.14 illustrates the rich-club struc-

ture of the AS graph as a conical hierarchical structure, which can be regarded

as a simplified version of the conceptual Jellyfish model [120]. At the top of the

cone is the rich-club, which contains the 5% richest nodes and 27% of the total links in

the network. At the base of the cone are the remaining 95% of the nodes, which have only 13% of the

links connecting among them. The other 60% of the links in the network connect the

rich-club to the nodes at the base.

The conical structure reveals a number of interesting features of the AS graph.

Firstly members of the rich-club in the cone are tightly interconnected and the

average path length among them is very small. Secondly the majority of the

peripheral low-degree nodes are only one step away from the rich-club. Thus the

rich-club acts as a super traffic hub by providing a large number of shortcuts

for peripheral-to-peripheral communications and therefore improves the network

routing efficiency. Also the rich-club improves the network redundancy because

the rich-club interconnections significantly increase the density of short cycles

and then form a large number of alternative routing paths. However the domi-

nant role of the rich-club makes the network very fragile under node-attack. When

the integrity of the rich-club is undermined by the removal of a few of its richest

members, the whole network’s integrity breaks down as well. It is interesting to

notice that improved network redundancy does not necessarily result in improved

network robustness.

6.6 Summary

In summary, comparison results show that the rich-club plays a dominant role in

the network. It improves the network routing efficiency and redundancy, but at

the cost of the network robustness under the node attack. Realistic models of the

Internet topology should correctly reproduce the rich-club phenomenon.

Chapter 7

Topological Disparities Between

Internet Measurements

7.1 Introduction

Reliable measurements of the AS-level Internet topology have only recently become

available. As introduced in Chapter 3, there are three major data sources produced

by two measuring methodologies. Many studies on the Internet topology

are based on the BGP AS graphs and the Extended BGP AS graphs, which are

produced by the passive measurements [25, 26, 27] using the BGP routing tables.

The Extended BGP AS graphs [30, 74] use additional information sources, such

as the Internet Routing Registry (IRR) data and the Looking Glass (LG) data,

and obtain 40% more links than the BGP AS graphs [121]. The Traceroute AS

graphs are produced by the active measurement methodology [28, 29] using the

traceroute probing data. The Traceroute AS graphs have about 30% more links

than the BGP AS graphs.

There have been comparisons between the BGP AS graph and the Extended

BGP AS graph [74, 121] and there have also been comparisons between the BGP

AS graph and the Traceroute AS graph [80, 81, 73].

In this Chapter the author provides a systematic comparison among all of

the three measurements. The author investigates a BGP AS graph, an Extended

BGP AS graph and a Traceroute AS graph, which are measured recently and have

similar numbers of nodes. The author examines a number of statistical topology

properties [122] and tries to find out whether the Extended BGP AS graph and the

Traceroute AS graph are structurally equivalent and which measurement is more

complete or realistic.

Results show that the three AS graphs have non-trivial structural differences.

The major topological disparity, which is quantified by the metric of rich-club

connectivity, is that the two BGP-based graphs have fewer links connecting among

highly connected nodes than the Traceroute AS graph. The Traceroute AS graph

and the Extended BGP AS graph both have a notable number of links that are

not present in the other graph. The extra links contained in the Traceroute AS graph

are connections among the high-degree nodes. Although few in number, these

links are relevant to network behaviours, such as routing efficiency (shortest path

length) and routing flexibility (density of short cycles). By contrast, the extra links

contained in the Extended BGP AS graph do not have significant impacts on the

network behaviours.

The author suggests that the traceroute-derived data, by comparison, are more

realistic measurements of the Internet topology. Chapter 8 will use the Traceroute

AS graph to validate the PFP model.

7.2 Comparison

The Traceroute AS graph was measured in April 2002 [28, 113]. The BGP AS

graph and the Extended BGP AS graph were both measured in May 2001 [30]. As shown

in Table 7.1, the three AS graphs have similar numbers of nodes but different

numbers of links. The Extended BGP AS graph has 40% more links than the BGP

AS graph, and the Traceroute AS graph has 28% more.

Table 7.1: Parameters of the three AS graphs

                          Traceroute   Extended BGP   BGP
Number of nodes N         11122        11461          11174
Number of links L         30054        32730          23409
Average degree 〈k〉        5.4          5.7            4.2
Max. degree kmax          2839         2432           2389
Power-law exponent γ      2.22         2.22           2.22

7.2.1 Degree Distribution

Figure 7.1: Cumulative degree distribution. [Plot: cumulative distribution against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs, with a reference line of slope -1.22.]

As shown in Figure 7.1, the cumulative degree distributions of the three AS
graphs are characterised by slope -1.22, which yields the degree distribution
$P(k) \sim k^{-\gamma}$ with power-law exponent $\gamma \simeq 2.22$ (see Table 7.1).
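The step from the cumulative slope to the exponent can be written out explicitly. The following is a short derivation under two assumptions that this section does not state: the cumulative distribution counts nodes with degree at least $k$, and the sum over degrees is approximated by an integral.

```latex
P_{cum}(k) \;=\; \sum_{k' \ge k} P(k')
\;\approx\; \int_{k}^{\infty} k'^{-\gamma}\, dk'
\;\propto\; k^{-(\gamma - 1)},
\qquad
\gamma - 1 = 1.22 \;\Rightarrow\; \gamma \simeq 2.22 .
```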

Figure 7.2: Degree distribution. [Plot: distribution against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.2 shows further details that the three non-strict power-law degree
distributions have in common. There are more nodes with degree two than nodes
with degree one ($P(k=2)$ = 37% to 40%, against $P(k=1)$ = 26% to 34%), and the
distributions have heavy tails where the maximum degrees kmax are very large
(see Table 7.1).

Figure 7.3: Degree vs rank. [Plot: degree against rank on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs, with a reference line of slope -0.85.]

Figure 7.3 shows that all three AS graphs exhibit a power-law relationship
between degree and rank ($k \sim r^{-0.85}$).

In general, the three AS graphs have fairly similar degree distributions despite
having different numbers of links.

7.2.2 Rich-Club Connectivity

Table 7.2: Rich-club connectivity φ(k) as a function of degree, k.

Degree   Traceroute graph   Extended BGP graph   BGP graph
1000     100%               100%                 100%
300      100%               75%                  80%
100      60%                46%                  50%
30       14%                11%                  14%

Figure 7.4: Rich-club connectivity φ(k) as a function of degree. [Plot: rich-club connectivity against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs, with a reference line of slope 1.3.]

Since the three AS graphs have similar maximum degrees, Figure 7.4 shows the
rich-club connectivity as a function of degree, φ(k). In general the rich-club
connectivity φ(k) of the three AS graphs follows a power law
$\phi(k) \sim k^{\upsilon}$ with $\upsilon = 1.2 \pm 0.1$. The difference is that
the high-degree nodes ($k > 10^2$) in the Traceroute AS graph are more tightly
interconnected than those in the two BGP AS graphs. For example, as shown in
Table 7.2, nodes with degrees larger than 300 in the Traceroute AS graph form a
fully connected mesh, whereas in the two BGP graphs the rich-club connectivity
is 75% to 80%. It is interesting to notice that, although the Extended BGP AS
graph has 40% more links than the BGP AS graph, the rich-club connectivities
φ(k) of the two graphs are fairly close.
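To make the quantity in Table 7.2 concrete, here is a minimal Python sketch of a degree-thresholded rich-club connectivity. It assumes, following the wording above, that the club consists of nodes with degree larger than k and that φ(k) is the ratio of actual links among club members to the maximum possible number n(n-1)/2. The function name and the edge-list input format are illustrative choices, not taken from the thesis.

```python
def rich_club_by_degree(edges, k):
    """Rich-club connectivity phi(k) for an undirected edge list.

    Assumption: the club is the set of nodes with degree > k, and phi is
    (links among club members) / (n * (n - 1) / 2).
    Returns None when the club has fewer than two members.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    club = {node for node, nbrs in adj.items() if len(nbrs) > k}
    n = len(club)
    if n < 2:
        return None

    # Each club-internal link is seen from both endpoints, hence the halving.
    links = sum(len(adj[u] & club) for u in club) // 2
    return links / (n * (n - 1) / 2)


# Toy graph: three mutually linked hubs, each with a few degree-one leaves.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("a", "d"), ("b", "e"), ("c", "f"), ("c", "g")]
print(rich_club_by_degree(toy_edges, 2))  # hubs only: fully connected mesh
print(rich_club_by_degree(toy_edges, 0))  # every node is a member
```

In the toy graph the three hubs form a fully connected mesh, so the club of nodes with degree larger than 2 has connectivity 1, while lowering the threshold dilutes the club with leaves and the connectivity drops.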

Table 7.3: Rich-club connectivity φ(r/N) as a function of normalised rank, r/N.

r/N      Traceroute graph   Extended BGP graph   BGP graph
0.001    100.0%             80.3%                83.3%
0.005    53.9%              48.3%                32.8%
0.008    35.1%              35.1%                21.5%
0.02     12.6%              16.2%                7.9%
0.05     3.8%               5.4%                 2.4%
0.1      1.5%               2.0%                 1.0%

Figure 7.5: Rich-club connectivity φ(r/N) as a function of normalised rank. [Plot: rich-club connectivity against normalised rank r/N on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs, with r/N = 0.008 marked.]

Figure 7.5 reveals more details by illustrating the rich-club connectivity as a

function of rank normalised by the number of nodes, φ(r/N).

• In the Traceroute AS graph, the rich nodes are tightly interconnected and

the rich-club of r/N = 0.001 is a fully connected mesh (see Table 7.3).

• In the BGP AS graph, the rich-club connectivity φ(r/N) is notably smaller
than in the other two graphs because it has significantly fewer links.

• Rich nodes (r/N < 0.008) of the Extended BGP AS graph are not as tightly
interconnected as those of the Traceroute AS graph. It is interesting to notice
that the lesser rich nodes ($10^{-2} < r/N < 10^{-1}$) of the Extended BGP AS
graph actually have a larger rich-club connectivity than those of the
Traceroute AS graph.

This suggests that although the Extended BGP AS graph uses additional
information sources and has 40% more links than the BGP AS graph, it does not
capture all the inter-rich links that are present in the Traceroute AS graph.
Nevertheless the Extended BGP AS graph has a notable number of links among
lesser rich nodes, and these links are contained in neither the BGP AS graph
nor the Traceroute AS graph.

7.2.3 Shortest Path Length

Figure 7.6: Cumulative distribution of shortest path length. [Plot: cumulative distribution against shortest path length for the Traceroute, Extended BGP and original BGP AS graphs.]

Figure 7.7: Correlation between shortest path length l and degree, where l is the average of nodes with the same degree. [Plot: shortest path length against degree (logarithmic degree axis) for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.6 shows that the Extended BGP AS graph and the BGP AS graph have
nearly the same cumulative distribution of shortest path length, which is
displaced to the right of that of the Traceroute AS graph. Figure 7.7 shows
that in general the shortest path length of a node in the two BGP AS graphs is
about half a hop longer than that of a node with the same degree in the
Traceroute AS graph.

Table 7.4: Parameters of the three AS graphs (continued)

                                    Traceroute   Extended BGP   BGP
Characteristic path length l∗       3.13         3.56           3.62
Average clustering coef. 〈c〉        0.49         0.35           0.30
Average triangle coef. 〈kt〉         12.7         23.4           5.3
Max. triangle coef. kt−max          7482         7150           3638
Average quadrangle coef. 〈kq〉       277.4        206.8          128.5
Max. quadrangle coef. kq−max        9648         8474           5506
Average betweenness 〈C∗B〉           4.13         4.56           4.62
Max. betweenness C∗B−max            3236.8       3555.5         3596.3

The Traceroute AS graph has more links than the BGP AS graph but fewer links
than the Extended BGP AS graph. However it is clear that the number of links
does not necessarily contribute to the network routing efficiency. As shown in
Table 7.4, the characteristic path lengths of the two BGP AS graphs are roughly
half a hop longer than that of the Traceroute AS graph (3.56 and 3.62 against
3.13). The half-hop difference is significant in terms of network routing
efficiency, considering that the average distance between a pair of nodes in
these networks is less than 4 hops. The difference can be explained by the
above measurements of the rich-club connectivity: the tightly interconnected
rich-club of the Traceroute AS graph provides a large selection of shortcuts
for network routing. The Extended BGP AS graph and the BGP AS graph do not have
as many inter-rich links as the Traceroute AS graph, and therefore on average
packet traffic travels further in the two BGP AS graphs.
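The effect of a shortcut on path length can be illustrated with a small breadth-first-search sketch. It assumes that the characteristic path length is the mean shortest-path hop count over all connected node pairs; the exact definition used for l∗ is given elsewhere in the thesis, so treat this as an illustration rather than the author's procedure. The function name and the ring example are hypothetical.

```python
from collections import deque


def characteristic_path_length(edges):
    """Mean shortest-path length (in hops) over all connected node pairs.

    Assumption: unweighted, undirected graph; one BFS per source node.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total_hops = 0
    pair_count = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total_hops += sum(dist.values())
        pair_count += len(dist) - 1  # exclude the source itself
    return total_hops / pair_count if pair_count else 0.0


# A six-node ring, then the same ring with one chord acting as a shortcut.
ring = [(i, (i + 1) % 6) for i in range(6)]
ring_with_chord = ring + [(0, 3)]
print(characteristic_path_length(ring))
print(characteristic_path_length(ring_with_chord))
```

Adding a single chord to the ring shortens the average distance, which mirrors on a tiny scale how a few inter-rich links can reduce the characteristic path length of a whole network.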

Figure 7.8: Cumulative distribution of clustering coefficient. [Plot: cumulative distribution against clustering coefficient for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.9: Correlation between clustering coefficient c and degree, where c is the average of nodes with the same degree. [Plot: clustering coefficient against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

7.2.4 Short Cycles

In general, nodes of the Traceroute AS graph have larger clustering
coefficients than those of the Extended BGP AS graph and the BGP AS graph (see
Figure 7.8 and 〈c〉 in Table 7.4). There are curve distortions in Figure 7.8.
This is because the three graphs have large numbers of nodes sharing the same
typical values of the clustering coefficient, e.g. c = 1/3 and c = 2/3 when
nodes with degree three have one or two inter-neighbour links.
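The typical values mentioned above follow directly from the definition of the clustering coefficient as the fraction of possible links among a node's neighbours that actually exist. A minimal sketch, assuming that standard definition and (as a convention chosen here, not stated in the text) a value of zero for nodes with fewer than two neighbours:

```python
def clustering_coefficient(adj, node):
    """Fraction of possible inter-neighbour links that exist around `node`.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Assumption: nodes with fewer than two neighbours get 0.0.
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Each inter-neighbour link is counted from both ends, hence the halving.
    links = sum(len(adj[u] & neighbours) for u in neighbours) // 2
    return links / (k * (k - 1) / 2)


# Node "x" has degree three; one, then two, of its neighbour pairs are linked.
one_link = {"x": {"a", "b", "c"}, "a": {"x", "b"}, "b": {"x", "a"}, "c": {"x"}}
two_links = {"x": {"a", "b", "c"}, "a": {"x", "b"},
             "b": {"x", "a", "c"}, "c": {"x", "b"}}
print(clustering_coefficient(one_link, "x"))
print(clustering_coefficient(two_links, "x"))
```

A degree-three node has three possible inter-neighbour links, so one or two existing links give exactly one third or two thirds, which is why many nodes pile up at those values in Figure 7.8.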

Figure 7.9 shows that in the three graphs the clustering coefficient decreases
with degree, although nodes with high degrees might have a fairly large number
of inter-neighbour links. To infer the neighbour clustering information of
nodes with different degrees, the short-cycle properties introduced in Chapter 6
are studied as follows.

Figure 7.10: Cumulative distribution of triangle coefficient. [Plot: cumulative distribution against triangle coefficient on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.11: Correlation between triangle coefficient kt and degree, where kt is the average of nodes with the same degree. [Plot: triangle coefficient against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs, with k = 700 marked.]

Figure 7.10 and Table 7.4 show that the average triangle coefficient of the
Extended BGP AS graph is larger than that of the other two graphs. However, as
shown in Figure 7.11, the Extended BGP AS graph exhibits a distortion in the
correlation between triangle coefficient and degree. For nodes with degrees
larger than 10 and smaller than 700, the triangle coefficient of the Extended
BGP AS graph is larger than that of the BGP AS graph and the Traceroute AS
graph. For other nodes, the Extended BGP AS graph and the BGP AS graph have
nearly the same triangle coefficient, which is smaller than that of the
Traceroute AS graph. This observation echoes the above analysis of the
rich-club connectivity and the shortest path length: the extra links contained
in the Extended BGP AS graph are mainly links among lesser rich nodes.

Figure 7.12: Cumulative distribution of quadrangle coefficient. [Plot: cumulative distribution against quadrangle coefficient on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.13: Correlation between quadrangle coefficient kq and degree, where kq is the average of nodes with the same degree. [Plot: quadrangle coefficient against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.12, Figure 7.13 and Table 7.4 show that the three AS graphs have
similar quadrangle coefficient properties, and that the Traceroute AS graph has
more quadrangles than the two BGP graphs.

It is interesting to notice that the plot curves of the BGP AS graph and the
Traceroute AS graph often follow parallel patterns (see Figure 7.5, Figure 7.10
and Figure 7.12), while those of the Extended BGP AS graph have different
shapes and sometimes cross those of the Traceroute AS graph.

7.2.5 Disassortative Mixing

Figure 7.14: Correlation between nearest-neighbours average degree knn and degree, where knn is the average of nodes with the same degree. [Plot: nearest-neighbours average degree against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

Figure 7.14 shows that the three AS graphs have similar negative degree-degree

correlations and therefore all exhibit the disassortative mixing behaviour. In gen-

eral for nodes with the same degree, the nearest-neighbours average degree of the

Traceroute graph is larger than that of the two BGP graphs.
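The degree-degree correlation plotted in Figure 7.14 can be computed in a few lines. The sketch below assumes that knn of a node is the mean degree of its neighbours and that the plotted value for a degree k is the mean of knn over all nodes with that degree; the function name and input format are illustrative.

```python
from collections import defaultdict


def knn_by_degree(edges):
    """Map each degree k to the mean nearest-neighbours average degree.

    Assumption: knn(node) = mean degree of the node's neighbours, then
    averaged over all nodes sharing the same degree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, neighbours in adj.items():
        k = len(neighbours)
        knn = sum(len(adj[n]) for n in neighbours) / k
        sums[k] += knn
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# A star is an extreme case of disassortative mixing:
# low-degree leaves attach to the single high-degree hub.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
print(knn_by_degree(star))
```

In the star, degree-one nodes see a neighbour of degree four while the hub sees only degree-one neighbours, so knn falls as degree rises; the AS graphs show the same negative trend in a far less extreme form.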

7.2.6 Betweenness Centrality

As shown in Figure 7.15, the cumulative distributions of betweenness
$P_{cum}(C_B^*)$ of the three AS graphs follow similar power-law behaviours
characterised by slope −1.1, which yields the betweenness distribution
$P(C_B^*) \sim (C_B^*)^{-2.1}$. Figure 7.16 shows that the three AS graphs have
similar correlations between betweenness and degree.

Figure 7.15: Cumulative distribution of betweenness. [Plot: cumulative distribution against betweenness centrality on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs, with a reference line of slope -1.11.]

Figure 7.16: Correlation between betweenness $C_B^*$ and degree, where $C_B^*$ is the average of nodes with the same degree. [Plot: betweenness against degree on logarithmic axes for the Traceroute, Extended BGP and BGP AS graphs.]

7.3 Discussion

Comparison results show that the three AS graphs have a number of similar

topological properties, including the degree distribution, the disassortative mixing

behaviour and the betweenness centrality.

However they also exhibit non-trivial structural differences. The principal
topological disparity, which is characterised by the metric of rich-club
connectivity, is that the two BGP-based graphs have fewer inter-rich links than
the Traceroute AS graph. This structural difference is relevant because the
extra inter-rich links contained in the Traceroute graph, although few in
number, are responsible for the differences in performance-related network
properties, such as routing efficiency (shortest path length) and routing
flexibility (density of short cycles). Results also indicate that the Extended
BGP AS graph collects a notable number of links among lesser rich nodes, and
these links are not present in the other two graphs.

Figure 7.17: The three AS graph measurements. [Diagram with regions labelled Traceroute AS graph, Extended BGP AS graph and BGP AS graph.]

The above analysis suggests that the BGP AS graph is a subset of both the

Extended BGP AS graph and the Traceroute AS graph, which do not fully overlap

each other (see Figure 7.17).

The Traceroute AS graph and the Extended BGP AS graph each contain some links
that are not present in the other graph. The inter-rich links contained in the
Traceroute AS graph are critical for the network structure and functionality,
whereas the extra links present in the Extended BGP AS graph do not
significantly affect the network properties. Considering the limitations of
the passive measurement based on BGP tables [80, 81, 73], we suggest that the
traceroute-derived data, by comparison, are more realistic measurements of the
Internet topology. Further study is needed to investigate whether all links
inferred from the various data sources are actual Internet connections.

7.4 Summary

There are three primary measurements of the AS-level Internet topology based

on different data-collecting methodologies. By examining a number of statistical

topological properties, we identified that the major structural discrepancy among

the three AS graphs is that the Traceroute AS graph has more interconnections

among the high-degree nodes than the other two BGP-based AS graphs. This

structural discrepancy is non-trivial because it can be critical for performance-

related network properties. We suggested that by comparison, the traceroute-

derived data are more realistic measurements of the Internet.

Chapter 8

The Positive-Feedback Preference Model

8.1 Introduction

During the last few years, various Internet models have been proposed. Yet no

existing model can generate a network that matches all the relevant topological

properties of the Internet. For example, Chapter 4 shows that network models

based solely on the degree distribution (e.g. the BA model) do not reproduce

the rich-club connectivity. Although the Interactive Growth model proposed in

Chapter 5 resembles both the degree distribution and the rich-club connectivity

of the AS-level Internet, the model still has its limitations. For example the IG

model does not produce a maximum degree as large as the actual measurements.

In this Chapter, based on further measurements of the Internet history data,

the author found out that, in addition to the interactive growth mechanism used

in the IG model, there is another mechanism which is necessary for the cor-

rect modelling of the AS-level Internet topology: a nonlinear preferential growth,

where the growth is described by a positive-feedback mechanism. The author

proposed the Positive-Feedback Preference (PFP) model, which uses both of the

two mechanisms.

Validation results show that the PFP model accurately reproduces all the rele-

vant topological properties of the AS-level Internet, including degree distribution,

rich-club connectivity, the maximum degree, shortest path length, short cycles,

disassortative mixing and betweenness centrality. The PFP model provides a

novel insight into the evolutionary dynamics of real complex networks.

8.2 Modelling The Maximum Degree

Figure 8.1: Degree k vs rank r. [Plot: degree against rank on logarithmic axes for the AS graph, PFP model, IG model and BA model, with a reference line of slope -0.85.]

The IG model has its limitations. As shown in Figure 8.1, the AS graph has a
nearly strict power-law relationship between degree and rank. The maximum node
degree $k_{max} = 2839$ in the AS graph is nearly a quarter of the number of
nodes ($k_{max} \simeq N/4$) and is significantly larger than those obtained by
the IG model ($k_{max} = 700$) and the BA model ($k_{max} = 292$). Both the IG
model and the BA model use linear preferential attachment.

To overcome this shortfall, it is possible to replace the linear preference
given by the BA model

$$\Pi(i) = \frac{k_i}{\sum_j k_j} \qquad (8.1)$$

by a nonlinear preferential probability [42, 123]

$$\Pi(i) = \frac{k_i^{\alpha}}{\sum_j k_j^{\alpha}}, \quad \alpha > 1, \qquad (8.2)$$

which favours high-degree nodes.

A numerical experiment (called the Test* model) using Equation (8.2) instead

of Equation (8.1) in the IG model showed that, when α = 1.15 ± 0.01, this

nonlinear preferential growth creates a network with a maximum degree kmax

similar to the AS graph.
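The way Equation (8.2) shifts attachment probability towards high-degree nodes can be seen with a small numerical sketch. The function below evaluates Equations (8.1) and (8.2) for a given list of degrees; the function name and the three-node example are illustrative and not part of the Test* experiment itself.

```python
def preference(degrees, alpha=1.0):
    """Attachment probabilities k_i**alpha / sum_j k_j**alpha.

    alpha = 1 gives the linear preference of Equation (8.1);
    alpha > 1 gives the nonlinear preference of Equation (8.2).
    """
    weights = [k ** alpha for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical degrees of three existing nodes.
degrees = [1, 10, 100]
linear = preference(degrees, alpha=1.0)
nonlinear = preference(degrees, alpha=1.15)
print(linear)
print(nonlinear)
```

With α = 1.15 the highest-degree node receives a larger share of the probability than under the linear rule, at the expense of the low-degree nodes, which is the mechanism that lets the maximum degree grow towards that of the AS graph.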

Figure 8.2: Rich-club connectivity φ(r/N) vs normalised rank r/N. [Plot: rich-club connectivity against normalised rank on logarithmic axes for the AS graph, PFP model, IG model, BA model and Test* model, with r/N = 0.01 marked.]

However, as shown in Figure 8.2, the rich-club connectivity produced by the
Test* model deviates from that of the AS graph. For example, the 1% best
connected nodes of the Test* model have 42% of the allowable interconnections,
compared with 27% in the AS graph.
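The "1% best connected nodes" figure is a rank-based rich-club connectivity. A minimal sketch of that measurement follows, assuming nodes are ranked by decreasing degree, the club is the top fraction r/N of that ranking, and ties in degree are broken arbitrarily by sort order (an assumption made here for simplicity). The function name and toy graph are hypothetical.

```python
def rich_club_by_rank(edges, fraction):
    """Rich-club connectivity of the top `fraction` of nodes ranked by degree.

    Assumption: phi = (links among club members) / (r * (r - 1) / 2),
    where r = round(fraction * N). Returns None if r < 2.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    ranked = sorted(adj, key=lambda node: len(adj[node]), reverse=True)
    r = round(fraction * len(ranked))
    if r < 2:
        return None

    club = set(ranked[:r])
    links = sum(len(adj[u] & club) for u in club) // 2
    return links / (r * (r - 1) / 2)


# Ten-node toy graph: three interlinked hubs plus seven degree-one leaves.
hub_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("A", "a1"), ("A", "a2"), ("A", "a3"),
             ("B", "b1"), ("B", "b2"), ("C", "c1"), ("C", "c2")]
print(rich_club_by_rank(hub_edges, 0.3))  # top three nodes
print(rich_club_by_rank(hub_edges, 1.0))  # whole graph
```

For an AS-sized graph the same call with `fraction=0.01` would give the quantity quoted above, subject to the tie-breaking assumption.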

8.3 The Positive-Feedback Preference Model

Based on the Internet history data, Pastor-Satorras et al [66] and Vazquez et
al [67, 124] found that the probability that a new node links with a low-degree
old node indeed follows the linear preferential attachment given by
Equation (8.1). Chen et al [74], however, reported that high-degree nodes have
a stronger ability to acquire new links than predicted by Equation (8.1). The
Internet history data also show that at early times node degree increases very
slowly; later on, node degree grows more and more rapidly. Taking these
observations into account, we modified the IG model by using the nonlinear
preferential attachment

$$\Pi(i) = \frac{k_i^{\,1+\delta \log_{10} k_i}}{\sum_j k_j^{\,1+\delta \log_{10} k_j}}, \quad \delta \in [0, 1]. \qquad (8.3)$$

Equation (8.3) is used for the attachment of new nodes and the appearance of
new internal links. We call this the Positive-Feedback Preference (PFP)
model [125, 126]. From numerical simulations, we found that δ = 0.048 produces
the best result. It is interesting to notice that, for δ = 0.048 and
$k_{max} = 2839$, the exponent $1 + \delta \log_{10} k_{max} \simeq 1.166$ is
close to the value of α used in the Test* model to reproduce the AS graph's
maximum degree. The PFP model also modifies the IG model's interactive growth
mechanism. The PFP model starts with a small random network. At each time-step,

• with probability p ∈ [0, 1], a new node is attached to one old node; and

at the same time with probability q ∈ [0, 1] one new internal link appears

between old nodes and with probability 1− q two new internal links appear.

• with probability 1 − p, a new node is attached to two old nodes; and at

the same time with probability q one new internal link appears and with

probability 1− q two new internal links appear.

When p = 0.4 and q = 0.9, the generated PFP networks have the same ratio of

links over nodes as the AS graph (see Table 8.1).
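The growth rule above can be sketched in Python. This is a hedged illustration, not the author's implementation: the section does not say which pair of old nodes receives each internal link, so the sketch assumes (in the spirit of interactive growth) that every internal link runs from the first host node to a peer chosen by Equation (8.3). Node labels, function names and the seed network are also hypothetical.

```python
import math
import random


def pfp_weight(k, delta=0.048):
    """Unnormalised preference k**(1 + delta*log10(k)) from Equation (8.3)."""
    return k ** (1.0 + delta * math.log10(k))


def expected_links_per_step(p, q):
    """Mean number of links added per time-step (new-node plus internal links)."""
    return (p * 1 + (1 - p) * 2) + (q * 1 + (1 - q) * 2)


def pick(adj, exclude, rng):
    """Choose one old node with probability proportional to pfp_weight."""
    nodes = [n for n in adj if n not in exclude]
    weights = [pfp_weight(len(adj[n])) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def pfp_step(adj, p=0.4, q=0.9, rng=random):
    """One time-step on `adj` (dict: node -> set of neighbours), in place.

    Assumptions: nodes are labelled 0..N-1, every node has degree >= 1,
    and internal links start at the first host (not stated in the text).
    """
    n_hosts = 1 if rng.random() < p else 2
    n_internal = 1 if rng.random() < q else 2

    hosts = []
    for _ in range(n_hosts):
        hosts.append(pick(adj, set(hosts), rng))

    for _ in range(n_internal):
        host = hosts[0]
        taken = adj[host] | {host}
        if len(taken) < len(adj):  # skip if the host already links to everyone
            peer = pick(adj, taken, rng)
            adj[host].add(peer)
            adj[peer].add(host)

    new = len(adj)  # label of the new node
    adj[new] = set()
    for host in hosts:
        adj[new].add(host)
        adj[host].add(new)


# Grow a small network from a triangle seed.
rng = random.Random(1)
net = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
for _ in range(200):
    pfp_step(net, rng=rng)
print(len(net), sum(len(nbrs) for nbrs in net.values()) // 2)
```

Two quick checks connect the sketch to the numbers in the text: `expected_links_per_step(0.4, 0.9)` evaluates to 1.6 + 1.1 = 2.7 links per new node, which matches the AS graph's ratio 30054/11122 ≈ 2.70, and `1 + 0.048 * math.log10(2839)` is approximately 1.166.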

Figure 8.3: Three degree functions: $k$, $k^{\alpha}$ with α = 1.15 and $k^{1+\delta \log_{10} k}$ with δ = 0.048. [Plot: degree functions against degree on logarithmic axes.]

Figure 8.4: Degree growth of a node added in an early time-step. [Plot: degree against age (time-step) on logarithmic axes for the PFP model, IG model and BA model.]

The PFP model is consistent with the observations of Pastor-Satorras et al,
Vazquez et al and Chen et al. For low-degree nodes, the attachment preference
is approximated by the linear preference of Equation (8.1). For high-degree
nodes, the attachment preference increases as a nonlinear function of the node
degree (see Figure 8.3). As a result, as time passes, the rate of degree growth
in the PFP model becomes faster than in the IG model and the BA model (see
Figure 8.4).

8.4 Model Validation

In this Chapter, the analysis is based on the Traceroute AS graph measured

in April 2002 by CAIDA [28, 113]. The AS graph is compared with networks

generated by the PFP model, the IG model and the BA model. For each model,

ten networks are generated with different seed numbers and all results are the

average over the ten networks. The networks had the same number of nodes and

similar numbers of links as the AS graph (see Table 8.1).

Table 8.1: Network Parameters

                                         AS graph   PFP model   IG      BA
Number of nodes N                        11122      11122       11122   11122
Number of links L                        30054      30151       33349   33349
Power-law exponent γ                     2.22       2.22        2.22    3
Degree distribution P(k = 1)             26%        28%         26%     0%
Degree distribution P(k = 2)             38%        36%         34%     0%
Degree distribution P(k = 3)             14%        12%         11%     40%
Average degree 〈k〉                       5.4        5.4         6.0     6.0
Max. degree kmax                         2839       2686        700     292
Rich-club connectivity φ(r/N = 0.01)     27%        30%         32%     4.5%
Avg. triangle coef. 〈kt〉                 12.7       12          10.4    0.1
Max. triangle coef. kt−max               7482       8611        4123    64
Avg. quadrangle coef. 〈kq〉               277        247         105.4   1.3
Max. quadrangle coef. kq−max             9648       9431        8780    527
Charact. path length l∗                  3.13       3.14        3.6     4.3
Average knn 〈knn〉                        660        482         103     20
Avg. betweenness 〈C∗B〉                   4.13       4.14        4.6     5.3
Max. betweenness C∗Bmax                  3237       3419        1002    1064

8.4.1 Degree Distribution, Rich-Club Connectivity and Maximum Degree

The PFP model closely matches the degree distribution (see Figures 8.5 and
8.6), the rich-club connectivity (see Figure 8.2) and the maximum degree (see
Table 8.1) of the AS graph. The PFP model also has the same power-law
relationship between degree and rank, $k \sim r^{-0.85}$, as the AS graph (see
Figure 8.1).

Figure 8.5: Degree distribution. [Plot: degree distribution against degree on logarithmic axes for the AS graph, PFP model, IG model and BA model, with a reference line of slope -2.22.]

Figure 8.6: The cumulative degree distribution. [Plot: cumulative degree distribution against degree on logarithmic axes for the AS graph, PFP model, IG model and BA model, with a reference line of slope -1.22.]

In a certain respect, the accuracy of the PFP model in reproducing these
properties is not a surprise. After all, the model was designed to match them.

8.4.2 Short Cycles

Figure 8.7: Cumulative distribution of triangle coefficient. [Plot: cumulative distribution against triangle coefficient on logarithmic axes for the AS graph, PFP model, IG model and BA model.]

Figure 8.8: Cumulative distribution of quadrangle coefficient. [Plot: cumulative distribution against quadrangle coefficient on logarithmic axes for the AS graph, PFP model, IG model and BA model.]

Figures 8.7 and 8.8 show that the AS graph and the PFP model have similar
cumulative distributions of short cycles.

Figure 8.9: Correlation between triangle coefficient kt and degree, where kt is the average over nodes with the same degree. [Plot: triangle coefficient against degree on logarithmic axes for the AS graph, PFP model, IG model and BA model.]

Figure 8.10: Correlation between quadrangle coefficient kq and degree, where kq is the average over nodes with the same degree. [Plot: quadrangle coefficient against degree on logarithmic axes for the AS graph, PFP model, IG model and BA model.]

Figures 8.9 and 8.10 show that the AS graph and the PFP networks also exhibit
similar correlations between short cycles and degree.

The AS graph and the PFP model have higher densities of short cycles (see 〈kt〉
and 〈kq〉 in Table 8.1) than the IG model and the BA model, and therefore
exhibit higher degrees of network routing flexibility.

8.4.3 Disassortative Mixing

Figure 8.11: Cumulative distribution of nearest-neighbours average degree. [Plot: cumulative distribution against nearest-neighbours average degree (logarithmic horizontal axis) for the AS graph, PFP model, IG model and BA model.]

Figure 8.12: Correlations between nearest-neighbours average degree knn and degree, where knn is the average over nodes with the same degree. [Plot: nearest-neighbours average degree against degree on logarithmic axes for the AS graph, PFP model, IG model and BA model.]

The AS graph and the PFP model have similar cumulative distributions of the
nearest-neighbours average degree (see Figure 8.11). Figure 8.12 shows that the
AS graph and the PFP networks exhibit similar negative correlations between
the nearest-neighbours average degree and degree, and therefore show similar
disassortative mixing behaviours.

8.4.4 Shortest Path Length

Figure 8.13: Cumulative distribution of shortest path length. [Plot: cumulative distribution against shortest path length for the AS graph, PFP model, IG model and BA model.]

Figure 8.14: Correlation between shortest path length l and degree, where l is the average over nodes with the same degree. [Plot: shortest path length against degree (logarithmic degree axis) for the AS graph, PFP model, IG model and BA model.]

Figure 8.13 and 8.14 show that the PFP model accurately reproduces the cu-

mulative distribution of shortest path length and the correlation between shortest

path length and degree of the AS graph. Table 8.1 shows that the AS graph

and the PFP model have nearly the same characteristic path length, which is

significantly shorter than that of the IG model and the BA model.

The PFP model accurately reproduces the routing efficiency properties
(shortest path length and characteristic path length) of the AS graph because
it correctly resembles both the rich-club connectivity and the disassortative
mixing of the AS graph. The rich-club consists of highly connected nodes, which
are well interconnected with each other, and the average hop distance among the
club members is very small (1 to 2 hops). The rich-club is a “super” traffic
hub of the network, and the disassortative mixing property ensures that
peripheral nodes are always near the hub. These two structural properties
together contribute to the routing efficiency of a network. By contrast, the BA
model does not reproduce the two structural properties and therefore
underestimates the actual network’s routing efficiency.

8.4.5 Betweenness Centrality

Figure 8.15: Cumulative distribution of betweenness centrality, $P_{cum}(C_B^*)$. [Plot: cumulative distribution against betweenness centrality on logarithmic axes for the AS graph, PFP model, IG model and BA model, with a reference line of slope -1.1.]

Figure 8.15 shows that the cumulative distributions of betweenness centrality
$P_{cum}(C_B^*)$ of the four networks exhibit similar power-law behaviours
characterised by slope −1.1. However, as shown in Table 8.1, the maximum values
of the betweenness centrality $C_{B\,max}^*$ of the AS graph and the PFP model
are significantly larger than those of the IG model and the BA model.
Figure 8.16 also shows that only the PFP model closely matches the correlation
between betweenness centrality and degree of the AS graph.

Figure 8.16: Correlations between betweenness centrality $C_B^*$ and degree, where $C_B^*$ is the average over nodes with the same degree. [Plot: betweenness against degree on logarithmic axes for the AS graph, PFP model, IG model and BA model.]

8.5 Discussion

8.5.1 The Positive-Feedback Preferential Attachment

The positive-feedback preferential attachment means that, as a node acquires
new links, its relative advantage in competing for more new links increases in
a nonlinear feedback loop. This implies that the inequality in link-acquiring
ability between rich nodes and non-rich nodes widens as the network evolves.
Rich nodes not only become richer; they become disproportionately richer.

8.5.2 Critical Assessment of The PFP Model

The PFP model accurately reproduces the AS-level Internet topology. Compared
with other existing Internet models, the PFP model has a number of advantages.

• Firstly the model closely matches all the topological properties that are
widely studied by the network research community, including degree
distribution, rich-club connectivity, the maximum degree, shortest path length,
short cycles, disassortative mixing and betweenness centrality.

• Secondly the model reproduces these properties with remarkable accuracy.

• Thirdly the two growth mechanisms used in the model, namely the interactive
growth and the positive-feedback preference, are based on (and supported by)
observations of the Internet history data.

• Finally, the validation of the model was conducted with the
traceroute-derived AS graph, which is regarded as more realistic than
measurements based on the BGP tables (see Chapter 7).

Figure 8.17: Network properties of a growing PFP model with the number of nodes N = 1000 (1K), 3000 (3K) and 11122 (11K). (a) Degree distribution. (b) Degree vs rank. (c) Rich-club connectivity. (d) Nearest-neighbours average degree vs degree. [Four panels on logarithmic axes: P(k) in % against k; k against r; φ in % against r/N in %; knn against k.]

While the initial motivation was to create a model that can accurately re-

produce the rich-club connectivity and the maximum degree of the AS graph,

the PFP model actually captures all other topological properties as well. This

suggests that the Internet structure can be described by only three topological

properties.

The PFP model is a phenomenological model. Further studies are needed to

explain why the Internet growth seems to follow the non-linear preferential attach-

ment given by the PFP model and what are the consequences of the PFP growth

mechanism for the future of the Internet. Figure 8.17 shows a number of network

properties of a growing PFP model with different numbers of nodes. It would be

interesting to investigate whether the PFP model also resembles other evolution

stages of the Internet topology without customising the model parameters.

8.6 Summary

There are two mechanisms that are necessary for the correct modelling of the

Internet topology at the AS level: the interactive growth and a nonlinear prefer-

ential growth, where the growth is described by a positive-feedback mechanism.

The Positive-Feedback Preference model uses the two mechanisms and accurately

reproduces all the topological properties of the AS-level Internet. The PFP model

is superior to other Internet models.
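As a rough illustration of the nonlinear preferential growth summarised above, the following Python sketch chooses an existing node with a probability that grows faster than linearly with its degree. The weight form k^(1 + δ log10 k) and the value δ = 0.048 are assumptions made for this illustration, because this summary does not restate the formula; the function names are hypothetical and the sketch covers only the preference step, not the interactive growth.

```python
import math
import random


def pfp_weight(degree, delta=0.048):
    """Superlinear weight k**(1 + delta*log10(k)); assumed form (see text)."""
    return degree ** (1.0 + delta * math.log10(degree))


def preference_probabilities(degrees, delta=0.048):
    """Map each node to its probability of being chosen.

    degrees: dict mapping node id -> degree (every degree must be >= 1).
    """
    weights = {node: pfp_weight(k, delta) for node, k in degrees.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def choose_node(degrees, rng, delta=0.048):
    """Draw one node according to the superlinear preference."""
    nodes = list(degrees)
    weights = [pfp_weight(degrees[n], delta) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

With delta set to 0 the weight reduces to the degree itself, i.e. the linear preference of the BA model, which makes the effect of the feedback term easy to compare.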


Chapter 9

Discussion and Conclusion

9.1 Discussion

Three years ago the research on the Internet topology was still at a preliminary stage. The Internet has a power-law degree distribution. This means that the network contains a small number of nodes with very large numbers of links, and that the average degree cannot characterise this heterogeneous nature. The discovery of the power-law degree distribution invalidated the previous research on the Internet topology that was based on random network theories.
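The heterogeneity described above can be inspected directly from an edge list. The sketch below computes the degree distribution P(k) as a percentage of nodes; the edge-list input and the function name are illustrative assumptions, and the graph is assumed to contain no duplicate links.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: percentage of nodes with degree k} for an undirected graph."""
    degree = Counter()
    for u, v in edges:
        if u != v:  # ignore self-loops
            degree[u] += 1
            degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: 100.0 * c / n for k, c in sorted(counts.items())}
```

For a star with four leaves the mean degree is 1.6, yet one node has degree 4 and the rest have degree 1, which is the kind of spread that the average degree alone hides.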

Many degree-based Internet models have been proposed. However, no model accurately reproduces the full picture of the Internet topology. Some models are not based on real measurement data, and some even use non-physical growth mechanisms to produce selected network properties that happen to interest the researchers.

During his year-long literature survey, the author developed an intuition that the difficulties in modelling the Internet are due to the lack of means to thoroughly describe the complex structure of the Internet. There might be some hidden

properties that have not been explicitly characterised by the existing topology

parameters. Therefore, the author did not follow the normal way of starting the

research by examining and comparing all the existing models, which of course


would be a daunting job. Instead the author started his research by searching for

the hidden structure in the Internet topology.

Researchers have looked for other topological properties to characterise the

Internet topology. For example, by studying the correlation between degree and

nearest-neighbours average degree, researchers have reported that the Internet

exhibits the disassortative mixing behaviour, where high-degree nodes tend to

connect to low-degree nodes. However the disassortative mixing does not charac-

terise how high-degree nodes are connected with each other.
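The degree correlation mentioned above can be made concrete with a short sketch. It computes, for each degree k, the mean degree of the neighbours of all degree-k nodes; a curve that decreases with k indicates disassortative mixing. The edge-list input and the function name are illustrative assumptions.

```python
from collections import defaultdict


def knn_by_degree(edges):
    """Return {k: nearest-neighbours average degree of nodes with degree k}."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}

    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in neighbours.items():
        # Average degree of this node's neighbours.
        average = sum(degree[m] for m in nbrs) / degree[node]
        sums[degree[node]] += average
        counts[degree[node]] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

Note that this quantity says nothing about links among the high-degree nodes themselves, which is the gap the rich-club connectivity addresses.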

Preliminary measurement data suggested that the Internet has a large number of links among high-degree nodes. The author realised that this is a key property of the Internet hierarchical structure. The author then introduced the concept of the rich-club phenomenon to describe this overlooked structure, i.e. highly connected nodes not only have large numbers of links but are also tightly interconnected with each other. The rich-club phenomenon is quantitatively characterised by the rich-club connectivity and the node-node link distribution.
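A minimal sketch of the rich-club connectivity follows, assuming the definition that it is the ratio of the actual number of links to the maximum possible number of links, r(r - 1)/2, among the r best-connected nodes. The function name, the edge-list input and the tie-breaking by node identifier are assumptions made for this illustration.

```python
def rich_club_connectivity(edges, r):
    """Fraction of possible links present among the r highest-degree nodes."""
    neighbours = {}
    for u, v in edges:
        if u == v:  # ignore self-loops
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Rank nodes by decreasing degree; ties are broken by node id (assumption).
    ranked = sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))
    if r < 2 or r > len(ranked):
        raise ValueError("r must satisfy 2 <= r <= number of nodes")

    club = set(ranked[:r])
    # Each link inside the club is seen from both of its end nodes.
    links = sum(1 for n in club for m in neighbours[n] if m in club) // 2
    return 2.0 * links / (r * (r - 1))
```

A value of 1.0 means that the r richest nodes form a fully connected mesh; evaluating the function for increasing r gives the rich-club connectivity as a function of rank.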

The metric of the rich-club connectivity is a milestone in parameterising the Internet topology. Using the rich-club connectivity, the author discovered the structural deficiencies of the Internet models and also revealed the structural discrepancies between different Internet measurements. Moreover, the author showed that the rich-club connectivity is relevant to network behaviours such as routing efficiency, redundancy and robustness.

Inspired by the rich-club properties, the author introduced the IG model, which closely reproduces both the power-law degree distribution and the rich-club connectivity of the AS-level Internet. The IG model uses the interactive growth mechanism, which is abstracted from observations on the Internet history data. An important contribution of the IG model is that it demonstrates a possible way to capture more structural properties by adopting realistic mechanisms derived from measurements of the Internet evolution.

The author noticed that the IG model still had limitations. For example, the


model does not reproduce the maximum degree of the AS graph. The author

found that this shortfall could be responsible for not accurately reproducing other

topological properties, such as disassortative mixing. In fact it is well known that

the Internet features a very large maximum degree, but no model using evolving

mechanisms can reproduce this property. The author discovered that by increas-

ing the preference probability, the modified IG model can reproduce the maximum

degree. However the rich-club connectivity of the generated network deviates from

the AS graph. After painstaking study of the Internet history data and with some inspiration, the author introduced the PFP model. The model modifies the IG model by using the so-called Positive-Feedback Preference, which only favours

high-degree nodes. As a result the model accurately reproduces the maximum

degree, the degree distribution and the rich-club connectivity at the same time.

While the initial motivation was to reproduce three degree-related structural prop-

erties, the PFP model accurately captures all other topological properties as well,

including properties of short cycles, shortest path length, disassortative mixing

and betweenness centrality. The PFP model is doubtlessly the most complete

and accurate model to date.

The author is confident in the above results because, as an important methodology that guided the whole research, the author bases the research only on actual measurements of the Internet. The author uses the Internet measurement data to study the network structure and validate the Internet models.

Moreover the growth mechanisms adopted by the IG model and the PFP model

are abstracted from (and supported by) the observations on Internet history data.

9.2 Future Work

The immediate work is to study the phenomenological PFP model to explain why

the preferential attachment is given by a non-linear feedback loop and what are

the consequences of this growth mechanism for the future of the Internet.


Future research should take into account two major challenges facing the Internet.

• Due to its rapid growth, the Internet has reached such an immense scale that the existing methods are no longer adequate for practical simulations, e.g. to test new routing protocols.

• The Internet is constantly disrupted by traffic congestion, facility failures and malicious attacks. Quality-of-Service (QoS) issues are of increasing concern when deploying future network infrastructures.

Considering the above challenges and based on the research achievements pre-

sented in this thesis, we propose two possible future directions as follows:

1. Scaling problem [127]. Can the network simulation be simplified by using

models with smaller size and less complexity? Are all scales important at

all?

2. Cascading effects [128, 129]. Does local disorder cause a cascading disruption

of the whole network? How to predict and prevent this? How long will it

take to recover?

9.3 Conclusion

The Internet topology has been measured at two different levels. By inferring

router adjacencies it is possible to measure the Internet Router (IR) level graph.

At another level, the graph of the Internet is obtained from the AS routing path

information. These two measurements are related but describe the Internet at

different levels. The AS level describes the aggregation of the routers and links

at a given domain. The two ways to measure the AS Internet are (1) passive

measurements obtained from the BGP routing tables and (2) active measurements

where a probe traces the routers that an IP packet visits when traversing the


network (that is at the IR level). The AS graph is obtained by mapping the router

information obtained by the probe with its AS domain. The active measurements are considered to give a better description of the Internet connectivity because they can collect ephemeral adjacencies not captured by only looking at the BGP tables.

In summary the AS graph is a heterogeneous network characterised by a power-

law degree distribution. The majority of nodes have only a few links, whereas a

small number of rich nodes have large numbers of links, in particular the best

connected node has links to nearly a quarter of nodes in the network. Based on

the Internet measurement data, the author concluded that the AS graph exhibits a rich-club phenomenon where the highly connected nodes are tightly interconnected with each other. In fact the top 100 richest nodes form a fully connected mesh.

The existence of a rich-club is critical for the description and understanding of the AS Internet. The rich-club is a “super” traffic hub of the network and the disassortative mixing property ensures that peripheral nodes are always near the hub. Thus the rich-club structure together with the disassortative mixing explains why the network has a very small characteristic path length. Scale-free models without the rich-club structure may under-estimate the flexibility of the traffic routing in the Internet. Moreover, modelling networks without the rich-club has a counter-intuitive consequence: such a model may over-estimate the robustness of the network to a node attack, in which the removal of a small percentage of the richest club members can break down the network integrity.
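The node attack discussed above can be sketched as follows: remove the best-connected nodes and measure the size of the largest connected component that remains. This is an illustrative sketch and not the procedure used in the thesis; the function name is hypothetical, and nodes are ranked once by their degree in the intact network (degrees are not recalculated after each removal).

```python
from collections import deque


def largest_component_after_attack(edges, n_remove):
    """Size of the largest component after removing the n_remove richest nodes."""
    neighbours = {}
    for u, v in edges:
        if u == v:  # ignore self-loops
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    ranked = sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))
    removed = set(ranked[:n_remove])

    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search over the surviving nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best
```

Comparing the result for a measured AS graph with the result for a model network of the same size would show whether the model over-estimates the robustness.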

The PFP model demonstrates that the degree distribution, the maximum de-

gree and the rich-club connectivity can be accurately reproduced by using two

realistic growth mechanisms based on the Internet history data, namely the interactive growth and the positive-feedback preference. Moreover, when the above

three structural properties are closely resembled, all other topological properties

of the AS graph are also reproduced at the same time. The PFP model is the most

precise and complete Internet topology generator to date. The PFP model not


only is a practical model for representative Internet simulation but also provides

insights on the fundamental rules that govern the evolution of complex networks.

The above novel contributions represent a profound extension of the state-

of-the-art knowledge in the research field of parameterising and modelling the

Internet topology.


Appendix I.

QMUL Topology Simulator

The QMUL Topology Simulator produced all the calculation and simulation results presented in this thesis. The motivation for developing the topology simulation tool was that no suitable toolkit was available for this research, which involves generating self-designed models and calculating self-defined properties. The simulator was developed by the author using Microsoft Visual C++ 6.0 and runs on the MS Windows 2000 operating system. It has the following functions (see Figure 10.1):

• It grows scale-free networks using the BA model series, including the BA model, the Fitness BA model and the Generalised BA model, with various settings of initial status and parameters. It also imports topology data generated by the Inet model, versions 2.1 to 3.0.

• It generates Internet-like networks using self-designed models, such as the Interactive Growth model and the Positive-Feedback Preference model.

• It parses and imports the Internet measurement data (AS graphs).

• It calculates all the topological properties used in this thesis, such as clus-

tering coefficient, degree distribution, shortest path length, betweenness,

nearest-neighbours average degree, rich-club connectivity, triangle coefficient

and quadrangle coefficient.


• It exports topology data into the Pajek [130] file format, which can be used to visualise the network graphs, e.g. Figure 5.7. It also exports plot data files in the Gnuplot [131] format to create scientific plot figures.

• It saves the network connectivity information and all the calculation results

of topological properties in the ‘∗.topo’ file format, which can be restored

for further uses.
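The Pajek export listed above can be illustrated with a minimal sketch. It writes only the basic subset of the Pajek network format (a vertex list followed by an undirected edge list, with vertices numbered from 1); it is not the simulator's own exporter, and the function name is hypothetical.

```python
def to_pajek(edges):
    """Return the text of a minimal Pajek .net file for an undirected edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    # Pajek numbers vertices consecutively starting from 1.
    index = {node: i for i, node in enumerate(nodes, start=1)}

    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[node]} "{node}"' for node in nodes]
    lines.append("*Edges")
    lines += [f"{index[u]} {index[v]}" for u, v in edges]
    return "\n".join(lines) + "\n"
```

The original node identifiers (for example AS numbers) are kept as vertex labels, so that they remain visible after the renumbering.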

Figure 10.1: Function flowchart of the QMUL Topology Simulator. The flowchart shows the simulator generating networks from an initial status, with parameter settings, using the BA, FBA, GBA, GLP, IG and PFP models; importing Inet data files produced by the Inet generator and parsing BGP and traceroute AS graphs from the Internet raw data; saving and loading ∗.topo topology files; calculating topological properties (degree distribution, rich-club connectivity, shortest path length, triangle coefficient, degree-degree correlations, betweenness, ...); and exporting standard Pajek network data files (visualised graphs) and standard Gnuplot plot data files (scientific plot figures).


The strength of this simulator is that, by using a compact linear data structure to store the topology information, it balances fast calculation against economical memory use. Running on a Dell desktop computer with merely 256MB RAM and an Intel 1.0GHz CPU, the simulator takes only 30 seconds to generate a BA model network of 11K nodes and 33K links. The author also improved Dijkstra’s algorithm [53] for calculating the shortest path length between every pair of nodes, so that the same process also calculates the betweenness centrality of every node. It takes only about 5 hours to calculate the two properties. The QMUL Topology Simulator also has the following features:

• Flexible. The simulator supports multiple evolving network models.

• Extensible. The simulator uses an object-oriented architecture, which pro-

vides the ability to add new network models and to handle customised file

formats.

• Large-scale. The simulator is capable of processing large scale networks with

up to 100K nodes and 4.5M links.

• User friendly. The simulator provides a Graphical User Interface as shown in Figures 10.2, 10.3 and 10.4.
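The idea of obtaining shortest path lengths and betweenness centrality in a single pass, mentioned above, can be sketched as follows. The sketch uses Brandes' algorithm with breadth-first search on an unweighted graph; this is a substitute technique chosen for illustration and is not necessarily the author's modification of Dijkstra's algorithm. The function name is hypothetical, and the input is assumed to be a symmetric adjacency dictionary.

```python
from collections import deque


def paths_and_betweenness(neighbours):
    """Return (mean shortest path length, {node: betweenness}).

    neighbours: dict node -> iterable of adjacent nodes (undirected graph).
    """
    betweenness = {node: 0.0 for node in neighbours}
    total_length = 0
    pair_count = 0
    for source in neighbours:
        dist = {source: 0}
        sigma = {node: 0 for node in neighbours}  # number of shortest paths
        sigma[source] = 1
        preds = {node: [] for node in neighbours}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in neighbours[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    sigma[nb] += sigma[node]
                    preds[nb].append(node)
        total_length += sum(dist.values())
        pair_count += len(dist) - 1

        # Accumulate path dependencies from the farthest nodes back to source.
        delta = {node: 0.0 for node in neighbours}
        for node in reversed(order):
            for p in preds[node]:
                delta[p] += sigma[p] / sigma[node] * (1.0 + delta[node])
            if node != source:
                betweenness[node] += delta[node]

    # Every unordered pair was counted once from each end.
    betweenness = {node: b / 2.0 for node, b in betweenness.items()}
    mean_length = total_length / pair_count if pair_count else 0.0
    return mean_length, betweenness
```

The distances found by each search feed the path-length average, and the same search order is reused for the betweenness accumulation, so no second traversal of the graph is needed.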

Figure 10.2: Window of “Parameters for generating networks”.


Figure 10.3: Window of the main interface.

It took the author four months to design, code and debug the first version of the QMUL Topology Simulator in late 2001. Since then the simulator has been updated and optimised many times to fix bugs, add new functions and improve the calculation speed. The latest version of the program has more than 5000 lines of code and has proved to be an efficient and powerful network simulation tool. The following is a list of functions defined in the C++ class “CQMUL_Topo”.

long Do_UnifyData(long thisManyXdata);
long Do_ReadPlotFile(CString thisPlotName, int thisPlotFormat);
long Do_GenerateGrowOneNonLineal(long thisOneEnd, long thisCreateTime);
long Do_GenerateGetNonLinealPreferential(long thisException, double thisAlpha);
long Do_PlotAverageErrorBar(long thisManyData);
long Do_GenerateGrowOnePreference(long thisOneEnd, long thisCreateTime);
long Do_GenerateGrowOneRandom(long thisOneEnd, long thisCreateTime);


Figure 10.4: Window of “Save plot data files”.

long Do_GenerateGetLinealPreferential();
long Do_GenerateGetLinealPreferentialFBA(double thisTotal);
long Do_ReadBarabasiActor(CString thisFileName);
long Do_GetLinkDistDataRank(long X1, long X2, long Y1, long Y2);
long Do_GetLinkDistDataDegree(long X1, long X2, long Y1, long Y2);
long Do_GetRichClubRankLink(long thisRank);
long Do_PlotPercentage(long thisManyData);
long Do_ArrangeRawData(long thisManyData);
long Do_ReadDataFile(CString thisFileName, int thisFileFormat);
long GetSmallestLabel();
double Do_GetRandom(double theBase);
void Do_GenerateNLP();
void Do_CalculateLength();
void Do_Plot70();
void Do_GetBetweenThis();
void Do_Plot40_K(long thisPlot);
void Do_PlotValueRank(long thisMany);
void Do_Plot10_Rank(long thisPlot);
void Do_CalcalateLocal();


void Do_GenerateDoro();
void Do_GenerateIG();
void Do_GenerateFBA();
void Do_GenerateInitialStatus();
void Do_GenerateBA();
void Do_GenerateRandom();
void Do_Plot63();
void Do_Plot62();
void Do_PlotSortData(long thisMany);
void Do_PlotCumulative(long thisMany);
void Do_GetTopoInfo();
void Do_Plot20_Distribution(long thisPlot);
void Do_Plot60();
void Do_Plot61();
void Do_Plot51();
void Do_Plot00_ID(long thisPlot);
void Do_ArrangeData();
void Do_InitNetwork();
void Do_WritePlotFile(CString thisFileName, long thisLongData,
    BOOL ifAverageErrorBar, BOOL ifAllLong);
CString Do_GetCString(double thisData);
CString Do_ComposeFileName();
BOOL DoScan(long thisSmallest);
BOOL Do_IfHasLink(long thisStart, long thisEnd);
BOOL Do_IfHasLinkAfterSort(long thisStart, long thisEnd);
BOOL Do_AddNewLink(long thisLinkID, long thisStart,
    long thisEnd, long thisCreateTime, BOOL thisIfCheck);
BOOL Do_AddNewNode(long thisNodeID, long thisCreateTime);
BOOL Do_CheckData();


Appendix II.

Author’s Publications

Journal Papers

1. S. Zhou and R. J. Mondragon. The rich-club phenomenon in the Internet

topology. IEEE Communications Letters, volume 8, page 180, March 2004.

2. S. Zhou and R. J. Mondragon. Redundancy and robustness of the AS-level Internet topology and its models. IEE Electronics Letters, volume 40, page 151, January 2004.

3. S. Zhou and R. J. Mondragon. Accurately modelling the Internet topology.

Accepted by Physical Review E, 2004.

4. M. Woolf, D. K. Arrowsmith, S. Zhou, R. J. Mondragon and J. M. Pitts.

Dynamical modelling of TCP packet traffic on scale-free networks. Submit-

ted to Physical Review E, 2004.

Conference Papers

5. S. Zhou and R. J. Mondragon. Towards modelling the Internet topology

- the Interactive Growth model. In J. Charzinski, editor, Proc. of 18th


International Teletraffic Congress (ITC18), volume 5a of Teletraffic Science and Engineering (Elsevier), pages 121–130, Berlin, Germany, Sept. 2003.

6. S. Zhou and R. J. Mondragon. The missing links in the BGP-based AS

connectivity maps. In Proc. of Passive and Active Measurement Workshop (PAM2003), pages 219–222, San Diego, USA, April 2003.

7. S. Zhou and R. J. Mondragon. Analyzing and modelling the AS-level Internet topology. In Proc. of 1st International Working Conference on Performance Modelling and Evaluation of Heterogeneous Networks (HET-NETs’03), Ilkley, West Yorkshire, UK, July 2003.

8. S. Zhou and R. J. Mondragon. Topological properties of the AS-level Internet. In Proc. of IEEE & IEE International Conference on Telecommunications (ICT2002), volume 3, pages 497–501, Beijing, China, June 2002.

9. S. Zhou and R. J. Mondragon. Connectivity in the Internet topology. In

Proc. of PGNet2002, pages 157–162, Liverpool, UK, May 2002.

10. S. Zhou and R. J. Mondragon. The Positive-Feedback Preference model of

the AS-level Internet topology. Submitted to IEEE ICC, 2005.

11. S. Zhou and R. J. Mondragon. Sampling methodologies and structural deficiencies of the AS-level Internet topology measurements. Submitted to The International Conference on Information Networking (ICOIN) 2005.


Glossary

AS Autonomous System, a collection of routers operated in a coordinated way

so that the routers implement the same routing policy; typically operated

by a single administrative entity.

ASN Autonomous System Number, a two-byte number that uniquely identifies

an AS.

BGP Border Gateway Protocol, the primary inter-domain routing protocol used

in the Internet.

ICMP Internet Control Message Protocol, the diagnostic part of the network

layer used in the Internet for reporting status information, checking connec-

tivity, and so on.

IP Internet Protocol, the network layer protocol used by the Internet.

ISPs Internet Service Providers

LANs Local Area Networks.

MANs Metropolitan Area Networks.

Protocol A standard procedure for regulating data transmission between com-

puters.

Router A computer that typically has two or more interfaces on different net-

works and provides forwarding of packets between those networks.


Routing The process by which a router calculates a forwarding table by using

its knowledge of the network taken from local configurations.

Routing Table A conceptual data structure used to hold routing information.

Server A hardware and software device designed to perform a specific function

for many users.

TCP Transmission Control Protocol, the principal reliable transport protocol

used in the Internet.

UDP User Datagram Protocol.

WANs Wide Area Networks.

WWW World Wide Web.


Bibliography

[1] L. A. Adamic and B. A. Huberman, “Power-law distribution of the world

wide web,” Science, vol. 287, p. 2115, 2000.

[2] S. H. Strogatz, “Exploring complex networks,” Nature (London), vol. 410,

p. 268, 2001.

[3] P. L. Krapivsky and S. Redner, “Organization of growing random networks,”

Phys. Rev. E, vol. 63, p. 066123, 2001.

[4] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, “Search

in power-law networks,” Physical Review E, vol. 64, p. 046135, 2001.

[5] A. L. Barabasi, Linked: The New Science of Networks. Perseus Publishing,

2002.

[6] R. Albert and A. L. Barabasi, “Statistical mechanics of complex networks,”

Rev. Mod. Phys., vol. 74, pp. 47–97, 2002.

[7] S. Bornholdt and H. G. Schuster, Handbook of Graphs and Networks - From

the Genome to the Internet. Weinheim Germany: Wiley-VCH, 2002.

[8] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks - From

Biological Nets to the Internet and WWW. Oxford University Press, 2003.

[9] A. Vazquez, M. Boguna, Y. Moreno, R. Pastor-Satorras, and A. Vespignani, “Topology and correlations in structured scale-free networks,” Physical Review E, vol. 67, no. 046111, 2003.


[10] R. Cohen and S. Havlin, “Scale-free networks are ultrasmall,” Physical Review Letters, vol. 90, no. 5, p. 058701, 2003.

[11] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Inter-

net - A Statistical Physics Approach. Cambridge University Press, 2004.

[12] S. T. Park, D. Pennock, and C. L. Giles, “Comparing static and dynamic measurements and models of the Internet’s AS topology,” in Proc. of IEEE INFOCOM 2004, 2004.

[13] A. Medina, I. Matta, and J. Byers, “On the origin of power laws in Internet

topologies,” ACM SIGCOMM Computer Communication Review, 2000.

[14] K. I. Goh, B. Kahng, and D. Kim, “Fluctuation-driven dynamics of the

Internet topology,” Physical Review Letters, 2002.

[15] A. C. Zorach and R. E. Ulanowicz, “Quantifying the complexity of flow

networks: how many roles are there?” Complexity, vol. 8, no. 3, 2003.

[16] A. L. Barabasi, Z. Deszo, E. Ravasz, S. H. Yook, and Z. Oltvai, “Scale-

free and hierarchical structures in complex networks,” to appear in Sitges

Proceedings on Complex Networks, 2004.

[17] S. Floyd, “Simulation is crucial,” IEEE Spectrum, January 2001.

[18] G. F. Riley and M. H. Ammar, “Simulating large networks - how big is big

enough?” in Proc. of 1st Intl. Conf. on Grand Challenges for Modeling and

Simulation, 2002.

[19] V. Paxson and S. Floyd, “Why we don’t know how to simulate the Internet,”

in Proc. of the 1997 Winter Simulation Conference, 1997.

[20] S. Floyd and V. Paxson, “Difficulties in simulating the Internet,”

IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 392–403, Au-

gust 2001.


[21] S. Floyd and E. Kohler, “Internet research needs better models,” ACM

SIGCOMM Computer Communications Reviews, vol. 33, no. 1, pp. 29–34,

January 2003.

[22] W. Willinger and V. Paxson, “Where mathematics meets the Internet,”

Notices of the American Mathematical Society, vol. 45, no. 8, 1998.

[23] B. Yao, R. Viswanathan, F. Chang, and D. Waddington, “Topology infer-

ence in the presence of anonymous routers,” in Proc. IEEE INFOCOM,

2003.

[24] T. Petermann and P. D. L. Rios, “Exploration of scale-free networks – do

we measure the real exponents?” Eur. Phys. J., vol. 38, pp. 201–204, 2004.

[25] NLANR (National Laboratory for Applied Network Research),

http://moat.nlanr.net/.

[26] Route Views Project, University of Oregon, Eugene.

http://www.routeviews.org/.

[27] Routing Information Service, RIPE Network Coordination Center.

http://www.ripe.net/.

[28] CAIDA (Cooperative Association For Internet Data Analysis),

http://www.caida.org/.

[29] Internet Mapping Project, Lumeta, http://research.lumeta.com/ches/map/.

[30] Topology Project, University of Michigan, Ann Arbor.

http://topology.eecs.umich.edu/.

[31] M. Murray and kc claffy, “Measuring the immeasurable: global Internet measurement infrastructure,” in Proc. of PAM2001, 2001.


[32] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships

of the Internet topology,” Comput. Commun. Rev., vol. 29, pp. 251–262,

1999.

[33] P. Erdos and A. Renyi, “On random graphs,” Publ. Math. Debrecen, vol. 6,

p. 290, 1959.

[34] P. Erdos and A. Renyi, “On the evolution of random graphs,” Publ. Math.

Inst. Hung. Acad. Sci., vol. 5, p. 17, 1960.

[35] B. M. Waxman, “Routing of multipoint connections,” IEEE Journal of Se-

lected Areas in Communications, vol. 6, no. 9, pp. 1617–1622, 1988.

[36] A. Capocci, G. Caldarelli, R. Marchetti, and L. Pietronero, “Growing dy-

namics of Internet providers,” Physical Review E, vol. 64, no. 035105, 2001.

[37] J. Winick and S. Jamin, “Inet-3.0 Internet topology generator,” University of Michigan, Tech. Rep. UM-CSE-TR-456-02, 2002.

[38] A. L. Barabasi and R. Albert, “Emergence of scaling in random networks,”

Science, vol. 286, pp. 509–512, 1999.

[39] G. Bianconi and A. L. Barabasi, “Competition and multiscaling in evolving

networks,” Europhysics Letters, vol. 54, no. 4, pp. 436–442, 2001.

[40] R. Albert and A. L. Barabasi, “Topology of evolving networks: local events

and universality,” Physical Review Letters, vol. 85, no. 24, pp. 5234–5237,

2000.

[41] A. Medina and I. Matta, “Brite: A flexible generator of Internet topologies,”

Boston University, Tech. Rep. BU-CS-TR-2000-005, 2000.

[42] S. N. Dorogovtsev and J. F. F. Mendes, “Scaling behaviour of developing

and decaying networks,” EuroPhys. Lett., vol. 52, no. 33, p. 33, 2000.


[43] D. Vukadinovic, P. Huang, and T. Erlebach, “A spectral analysis of the Internet topology,” Technical report ETH TIK-NR. 118, 2001.

[44] T. Bu and D. Towsley, “On distinguishing between Internet power law topol-

ogy generators,” in Proc. of IEEE INFOCOM 2002, 2002, p. 638.

[45] G. Caldarelli, P. D. L. Rios, and L. Pietronero, “Generalized network

growth: from microscopic strategies to the real Internet properties,”

arXiv:cond-mat/0307610 v1, 2004.

[46] J. M. Carlson and J. C. Doyle, “Highly optimized tolerance: A mechanism

for power laws in designed systems,” Physical Review E, vol. 60, pp. 1412–

1428, 1999.

[47] I. Norros and H. Reittu, “Architectural features of the power-law random

graph model of Internet: nodes on soft hierarchy, vulnerability and multi-

casting,” in Proceedings of the 18th International Teletraffic Congress - ITC

18, Elsevier, 2003.

[48] C. P. B. Quoitin and L. Swinnen, “Interdomain traffic engineering with BGP,” IEEE Communications Magazine, May 2003.

[49] D. K. Arrowsmith and M. Woolf, “Modelling of TCP packet traffic in a large interactive growth network,” IEEE Proc. of Systems and Circuits, 2004.

[50] M. Barenco and D. K. Arrowsmith, “The autocorrelation of double inter-

mittency maps and the simulation of computer packet traffic,” to appear in

Jnl of Dyn. Sys, 2004.

[51] C. Labovitz, A. Ahuja, R. Wattenhofer, and S. Venkatachary, “The impact of Internet policy and topology on delayed routing convergence,” in Proc. of INFOCOM 2001, 2001.


[52] R. V. Sole and S. Valverde, “Information theory of complex networks: On

evolution and architectural constraints,” Santa Fe Institute, Tech. Rep. DOI:

SFI-WP 03-11-061, 2003.

[53] A. Kershenbaum, Telecommunications network design algorithms.

McGraw-Hill, Inc., 1993.

[54] M. Steenstrup, Routing in communications networks. Prentice Hall, 1995.

[55] H. Tangmunarunkit, R. Govindan, S. Shenker, and D. Estrin, “The impact of routing policy on Internet paths,” in Proc. of IEEE INFOCOM 2001, 2001.

[56] R. Guerin and A. Orda, “Computing shortest paths for any number of hops,”

IEEE/ACM Transactions on Networking, vol. 10, no. 5, October 2002.

[57] K. I. Goh, B. Kahng, and D. Kim, “Universal behavior of load distribution

in scale-free networks,” Phys. Rev. Lett., vol. 87, no. 278701, 2001.

[58] K. I. Goh, E. Oh, B. Kahng, and D. Kim, “Betweenness centrality correla-

tion in social networks,” Phys. Rev. E, vol. 67, no. 017101, 2003.

[59] P. Holme and B. J. Kim, “Vertex overload breakdown in evolving networks,”

Phys. Rev. E, vol. 65, no. 066109, 2002.

[60] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, “Attack vulnerability of

complex networks,” Phys. Rev. E, vol. 65, no. 056109, 2002.

[61] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ net-

works,” Nature, vol. 393, 1998.

[62] M. E. J. Newman, “Assortative mixing in networks,” Phys. Rev. Lett.,

vol. 89, no. 208701, 2002.

[63] M. E. J. Newman, “Mixing patterns in networks,” Phys. Rev. E, vol. 67,

no. 026126, 2003.


[64] S. Maslov, K. Sneppen, and A. Zaliznyak, “Detection of topological patterns

in complex networks: correlation profile of the Internet,” Physica A, vol. 333,

p. 529, 2004.

[65] R. Xulvi-Brunet, W. Pietsch, and I. M. Sokolov, “Correlations in scale-free

networks: Tomography and percolation,” Phys Rev E, vol. 68, no. 036119,

2003.

[66] R. Pastor-Satorras, A. Vazquez, and A. Vespignani, “Dynamical and cor-

relation properties of the Internet,” Phys. Rev. Lett., vol. 87, no. 258701,

2001.

[67] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, “Large-scale topological

and dynamical properties of Internet,” Phys. Rev. E, vol. 65, no. 066130,

2002.

[68] S. Janson, T. Luczak, and A. Rucinski, Random Graphs. Wiley-

Interscience, 2000.

[69] D. J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness. New Jersey, USA: Princeton University Press, 1999.

[70] L. Adamic, “The small world web,” in Proceedings of ECDL’99, 1999, pp.

443–452.

[71] M. E. J. Newman and D. J. Watts, “Scaling and percolation in the small-

world network model,” Phys. Rev. E, vol. 60, p. 7332, 1999.

[72] M. E. J. Newman and D. J. Watts, “Renormalization group analysis of the

small-world network model,” Physics Letters A, vol. 263, pp. 341–346, 1999.

[73] Y. Hyun, A. Broido, and k. claffy, “Traceroute and BGP AS path incon-

gruities,” http://www.caida.org/outreach/papers/2003/ASP/.

[74] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger,

“The origin of power laws in Internet topologies (revisited),” in Proc. of

IEEE INFOCOM 2002, 2002, pp. 608–617.

[75] H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger, “Towards

capturing representative AS-level Internet topology,” Computer Networks

Journal, vol. 44, no. 6, pp. 737–755, 2004.

[76] R. Govindan and H. Tangmunarunkit, “Heuristics for Internet map discovery,”

in Proc. of IEEE INFOCOM 2000, 2000.

[77] R. Albert, H. Jeong, and A. L. Barabasi, “Error and attack tolerance of

complex networks,” Nature, vol. 406, pp. 378–381, 2000.

[78] L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, “Characteriz-

ing the Internet hierarchy from multiple vantage points,” in Proc. of IEEE

INFOCOM 2002, 2002, pp. 618–627.

[79] S. T. Park, A. Khrabrov, D. M. Pennock, S. Lawrence, C. L. Giles, and

L. H. Ungar, “Static and dynamic analysis of the Internet’s susceptibility to

faults and attacks,” in Proc. of IEEE INFOCOM 2003, vol. 3, April 2003,

pp. 2144–2154.

[80] A. Broido and kc Claffy, “Internet topology: connectivity of IP graphs,” in

SPIE International symposium on Convergence of IT and Communication

2001, 2001.

[81] B. Huffaker, D. Plummer, D. Moore, and kc Claffy, “Topology discovery by

active probing,” in Proc. of the 2002 Symposium on Applications and the

Internet, 2002.

[82] A. Broido, E. Nemeth, and kc Claffy, “Internet expansion, refinement and churn,”

European Transactions on Telecommunications, 2002.

[83] K. L. Calvert, M. B. Doar, and E. W. Zegura, “Modeling Internet topology,”

IEEE Communications Magazine, June 1997.

[84] M. Doar, “A better model for generating test networks,” Proc. of IEEE

GLOBECOM 1996, Nov. 1996.

[85] E. W. Zegura, K. L. Calvert, and M. J. Donahoo, “A quantitative compari-

son of graph-based models for Internet topology,” ACM/IEEE Transactions

on Networking, vol. 5, no. 6, pp. 770–783, 1997.

[86] C. Jin, Q. Chen, and S. Jamin, “Inet: Internet topology generator,” Uni-

versity of Michigan, Tech. Rep. UM-CSE-TR-433-00, 2000.

[87] A. L. Barabasi, “The architecture of complexity: From the diameter of the

WWW to the structure of the cell,” http://www.nd.edu/~networks/.

[88] H. Jeong, Z. Neda, and A. L. Barabasi, “Measuring preferential attachment in

evolving networks,” Europhysics Letters, vol. 61, no. 4, pp. 567–572, 2003.

[89] A. L. Barabasi, “The physics of the web,” Physics World, July 2001.

[90] D. Cohen, “All the world is a net,” New Scientist, April 2002.

[91] R. Cohen and S. Havlin, “Scale-free networks are ultrasmall,” Phys. Rev.

Lett., vol. 90, no. 5, p. 058701, 2003.

[92] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani, “Critical

load and congestion instabilities in scale-free networks,” Europhys. Lett.,

vol. 62, p. 292, 2002.

[93] S. H. Yook, H. Jeong, and A. L. Barabasi, “Modelling the Internet’s large-scale

topology,” Proc. of the Nat’l Academy of Sciences, vol. 99, pp. 13 382–13 386,

2002.

[94] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free

networks,” Physical Review Letters, vol. 86, no. 14, pp. 3200–3203, 2001.

[95] A. L. Barabasi, R. Albert, and H. Jeong, “Mean-field theory for scale-free

random networks,” Physica A, vol. 272, pp. 173–187, 1999.

[96] A. Medina, A. Lakhina, I. Matta, and J. Byers, “Brite: Universal topology

generation from a user’s perspective,” Boston University, Tech. Rep. BUCS-

TR-2001-003, 2001.

[97] G. Bianconi, G. Caldarelli, and A. Capocci, “Number of h-cycles in the

Internet at the autonomous system level,” arXiv:cond-mat/0310339, 2003.

[98] A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou, “Heuristically op-

timized trade-offs: A new paradigm for power laws in the Internet,” in Proc.

of ICALP 2002, 2002.

[99] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Will-

inger, “Network topology generators: Degree-based vs. structural,” Proc.

of ACM/SIGCOMM 2002, pp. 147–159, 2002.

[100] H. Tangmunarunkit, J. Doyle, R. Govindan, and S. Jamin, “Does AS size

determine degree in AS topology?” ACM SIGCOMM Computer Commu-

nication Review, 2001.

[101] D. Krioukov, http://www.krioukov.net/~dima/rs.html.

[102] J. Spencer and L. Sacks, “Modelling IP network topologies by emulating

network development processes,” in IEEE Softcom 2002, 2002.

[103] H. Fuks and A. T. Lawniczak, “Performance of data networks with random

links,” Mathematics and Computers in Simulation, vol. 51, pp. 103–119,

1999.

[104] L. Gao, “On inferring autonomous system relationships in the Internet,” in

Proc. of IEEE Global Internet, 2000.

[105] S. Zhou and R. J. Mondragon, “The rich-club phenomenon in the Internet

topology,” IEEE Comm. Lett., vol. 8, no. 3, pp. 180–182, March 2004.

[106] S. Zhou and R. J. Mondragon, “Connectivity in the Internet topology,” in

Proc. of PGNet2002. Liverpool, UK: EPSRC, May 2002, pp. 157–162.

[107] S. Zhou and R. J. Mondragon, “Topological properties of the AS-level Internet,”

in Proc. of Int. Conf. on Telecommunications (ICT) 2002, vol. 3.

Beijing, China: IEEE and IEE, June 2002, pp. 497–501.

[108] S. Zhou and R. J. Mondragon, “Redundancy and robustness of the AS-level

Internet topology and its models,” IEE Elec. Lett., vol. 40, no. 2, pp. 151–

152, January 2004.

[109] S. Zhou and R. J. Mondragon, “Analyzing and modelling the AS-level Internet

topology,” in Proc. of 1st Int. Working Conf. on Performance Modelling

and Evaluation of Heterogeneous Networks (HET-NETs’03), Ilkley, West

Yorkshire, UK, July 2003, arXiv:cs.NI/0303030.

[110] S. Zhou and R. J. Mondragon, “Towards modelling the Internet topology

- the interactive growth model,” in Proc. of 18th Int. Teletraffic Congress

(ITC18), ser. Teletraffic Science and Engineering, J. Charzinski, Ed., vol. 5a.

Berlin, Germany: Elsevier, Sept. 2003, pp. 121–130.

[111] M. Woolf and D. K. Arrowsmith, “Modelling of TCP packet traffic in a large

interactive growth network,” in IEEE Int. Symposium on Circuits and Sys-

tems (ISCAS), Vancouver, Canada, May 2004.

[112] M. Woolf, D. K. Arrowsmith, S. Zhou, R. J. Mondragon, and J. M. Pitts,

“Dynamical modelling of TCP packet traffic on scale-free networks,” (submitted),

2004.

[113] The Data Kit #0204 was collected as part of CAIDA’s Skitter initiative,

http://www.caida.org. Support for Skitter is provided by DARPA, NSF,

and CAIDA membership.

[114] P. M. Gleiss, P. F. Stadler, A. Wagner, and D. A. Fell, “Small cycles in

small worlds,” SFI Working Paper 00-10-058, 2000.

[115] G. Bianconi and A. Capocci, “Number of loops of size h in growing scale-free

networks,” Phys. Rev. Lett., vol. 90, no. 078701, 2003.

[116] G. Caldarelli, R. Pastor-Satorras, and A. Vespignani, “Structure of cycles and local

ordering in complex networks,” The European Physical Journal B, vol. 28,

no. 2, pp. 183–186, 2004.

[117] C. Gkantsidis, M. Mihail, and E. Zegura, “Spectral analysis of Internet topologies,”

in Proc. of IEEE INFOCOM 2003, 2003.

[118] G. Iannaccone, C. N. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot,

“Analysis of link failures in an IP backbone,” Proc. of the Second ACM

SIGCOMM Workshop on Internet Measurement, 2002.

[119] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, “Network

robustness and fragility: Percolation on random graphs,” Physical Review

Letters, vol. 85, no. 25, p. 5468, December 2000.

[120] S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos, “A simple conceptual

model for the Internet topology,” in Proc. of Global Internet, San Antonio,

Texas, 2001.

[121] S. Zhou and R. J. Mondragon, “The missing links in the BGP-based AS connectivity

maps,” in Proc. of Passive and Active Measurement (PAM) Workshop

2003. San Diego, USA: NLANR, April 2003, pp. 219–222,

arXiv:cs.NI/0303028.

[122] S. Zhou and R. J. Mondragon, “On measuring and modeling the Internet

topology at the autonomous systems level,” Submitted to ACM/IMC2004,

2004.

[123] P. L. Krapivsky, S. Redner, and F. Leyvraz, “Connectivity of growing ran-

dom networks,” Phys. Rev. Lett., vol. 85, no. 4629, 2000.

[124] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, “Internet topology at the router and

autonomous system level,” arXiv:cond-mat/0206084, 2002.

[125] S. Zhou and R. J. Mondragon, “The positive-feedback preference model of

the AS-level Internet topology,” Submitted to IEEE Communications Letters,

2004.

[126] S. Zhou and R. J. Mondragon, “Accurately modelling the Internet topology,”

2004, preprint: arXiv:cs.NI/0402011.

[127] K. Psounis, R. Pan, B. Prabhakar, and D. Wischik, “The scaling hypoth-

esis: simplifying the prediction of network performance using scaled-down

simulations,” ACM SIGCOMM Computer Communication Review, vol. 33,

no. 1, 2003.

[128] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani, “Critical load

and congestion instabilities in scale-free networks,” Europhys. Lett., vol. 62,

no. 2, pp. 292–298, 2003.

[129] S. Agarwal, C. N. Chuah, and R. H. Katz, “OPCA: Robust interdomain

policy routing and traffic control,” in Proc. of the 6th International Conference

on Open Architectures and Network Programming (OPENARCH

2003), 2003.

[130] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/.

[131] Gnuplot, http://t16web.lanl.gov/Kawano/gnuplot/.