Parameterising and Modelling
the Internet Topology
Shi Zhou
Department of Electronic Engineering
Queen Mary, University of London
A thesis submitted to the University of London
for the degree of Doctor of Philosophy.
July 2004
To my family.
Abstract
Simulation plays a vital role in studying the complex behaviour of both existing
telecommunications networks and proposed future architectures. When modelling
the behaviour of the Internet it is crucial to obtain a good description of its
structure, because structure fundamentally affects function. The aim of this work
is to provide quantitative parameters to fully characterise network structures and
propose realistic models which can accurately reproduce the Internet topology at
the autonomous systems (AS) level.
This thesis introduces the novel concept of the rich-club phenomenon to describe
the Internet hierarchical structure, where a small number of highly connected
nodes are tightly interconnected with each other. This structure is quantitatively
characterised by the rich-club connectivity and the node-node link distribution.
The metric of the rich-club connectivity is a milestone in parameterising the
Internet topology. Using this unique metric, the author reports that the existing
degree-based models do not match the Internet hierarchical structure. The author
shows that an appreciation of the rich-club connectivity is essential for a proper
examination of the network behaviours, such as routing efficiency, redundancy
and robustness. The author also uses this metric to reveal the major topological
disparities between the Internet measurements obtained using different method-
ologies.
The author introduces an original Interactive Growth (IG) model, which
closely resembles both the power-law degree distribution and the rich-club
connectivity of the AS-level Internet. Based on observations of the Internet history data,
the author improves the IG model and introduces the Positive-Feedback Preference
(PFP) model, which is undoubtedly the most complete and detailed model to date.
The PFP model accurately reproduces all the relevant topological properties of
the Internet, including degree distribution, rich-club connectivity, the maximum
degree, shortest path length, short cycles, disassortative mixing and betweenness
centrality. The PFP model’s non-linear preference mechanism provides a novel
insight into the basic dynamics that could be responsible for the evolving topology
of complex networks.
This successful research has provided a number of promising contributions.
These achievements represent a profound extension of the state-of-the-art knowl-
edge in the area of parameterising and modelling the Internet topology.
Acknowledgements
The author would like to express his deepest gratitude to the many people who
have kindly supported and assisted his work, including Dr. Chris Phillips and
Dr. Matthew Woolf, and especially to his supervisor, Dr. Raul J. Mondragon, for his
great help and guidance through every step of the author’s research. Thanks also
to Dr. Andre Broido (CAIDA) for the inspiring discussions. The author is grateful for
the hospitality and support of the Department of Electronic Engineering, Queen Mary,
University of London.
This work was funded by the U.K. Engineering and Physical Sciences Research
Council (EPSRC) under Grant No. GR-R30136-01.
Contents
Abstract 3
Acknowledgements 5
Contents 6
List of Figures 11
List of Tables 15
1 Introduction 16
1.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2 Contributions of this thesis . . . . . . . . . . . . . . . . . . . . . . 17
1.2.1 Parameterising The Internet Topology . . . . . . . . . . . 17
1.2.2 Modelling The Internet Topology . . . . . . . . . . . . . . 18
1.3 Structure of this thesis . . . . . . . . . . . . . . . . . . . . . . . . 19
2 Preliminaries 20
2.1 Internet Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Topological Properties . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Network Size . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.2 Degree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 Degree Distribution . . . . . . . . . . . . . . . . . . . . . . 23
2.2.3.1 Poisson Degree Distribution . . . . . . . . . . . . 23
2.2.3.2 Power-Law Degree Distribution . . . . . . . . . . 24
2.2.4 Shortest Path Length . . . . . . . . . . . . . . . . . . . . . 25
2.2.5 Node Betweenness Centrality . . . . . . . . . . . . . . . . 26
2.2.6 Clustering Coefficient . . . . . . . . . . . . . . . . . . . . . 27
2.2.7 Disassortative Mixing (Degree Correlations) . . . . . . . . 27
2.3 Random Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Small-World Networks . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3 Measurements and Models Of The AS-Level Internet 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Topology Measurements Of The AS-Level Internet . . . . . . . . . 33
3.2.1 Passive Measurement - BGP AS Graph . . . . . . . . . . . 34
3.2.2 Extended BGP AS Graph . . . . . . . . . . . . . . . . . . 34
3.2.3 Active Measurement - Traceroute AS Graph . . . . . . . . 35
3.2.4 Discovery Of The Internet Power-Law Degree Distribution 36
3.2.5 Which AS Graph? . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Topology Models Of The AS-Level Internet . . . . . . . . . . . . . 37
3.3.1 Tiers Model . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.2 GT-ITM Model . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.3 User-Provider Model . . . . . . . . . . . . . . . . . . . . . 38
3.3.4 Inet Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.5 Barabasi and Albert Model . . . . . . . . . . . . . . . . . 39
3.3.6 Fitness BA Model . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.7 Generalised BA Model . . . . . . . . . . . . . . . . . . . . 42
3.3.8 BRITE Model . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.9 Dorogovtsev-Mendes Model . . . . . . . . . . . . . . . . . 43
3.3.10 Generalised Linear Preference Model . . . . . . . . . . . . 44
3.3.11 Generalised Network Growth Model . . . . . . . . . . . . . 45
3.3.12 Highly Optimised Tolerance Model . . . . . . . . . . . . . 46
3.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.1 Structure-Based Models vs Degree-Based Models . . . . . 47
3.4.2 Accuracy vs Simplicity . . . . . . . . . . . . . . . . . . . . 47
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4 Rich–Club Phenomenon 50
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1.1 Internet Hierarchical Structure . . . . . . . . . . . . . . . 50
4.1.2 Connectivity Of The Core . . . . . . . . . . . . . . . . . . 52
4.1.3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2 Rich-Club Phenomenon . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.1 Rich-Club Connectivity . . . . . . . . . . . . . . . . . . . 56
4.2.2 Node-Node Link Distribution . . . . . . . . . . . . . . . . 57
4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3.1 Rich-Club Subgraph . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 Rich-Club Phenomenon Is Relevant . . . . . . . . . . . . . 59
4.3.3 Modelling The Rich-Club . . . . . . . . . . . . . . . . . . 60
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5 Interactive Growth Model 62
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 Interactive Growth Model . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3.1 Degree Distribution . . . . . . . . . . . . . . . . . . . . . . 65
5.3.1.1 Degree Distribution . . . . . . . . . . . . . . . . 65
5.3.1.2 Degree vs Rank . . . . . . . . . . . . . . . . . . . 66
5.3.2 Rich-club Phenomenon . . . . . . . . . . . . . . . . . . . . 67
5.3.2.1 Rich-Club Connectivity . . . . . . . . . . . . . . 67
5.3.2.2 Node-Node Link Distribution . . . . . . . . . . . 68
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4.1 Maximum Degree . . . . . . . . . . . . . . . . . . . . . . . 70
5.4.2 Rich-Club Connectivity . . . . . . . . . . . . . . . . . . . 72
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6 Structure Affects Functions 74
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2 Routing Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3 Network Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.4 Network Robustness . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.4.1 Node Error . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4.2 Node Attack . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4.3 Link Error . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4.4 Link Attack . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7 Topological Disparities Between Internet Measurements 85
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2.1 Degree Distribution . . . . . . . . . . . . . . . . . . . . . . 87
7.2.2 Rich-Club Connectivity . . . . . . . . . . . . . . . . . . . 89
7.2.3 Shortest Path Length . . . . . . . . . . . . . . . . . . . . . 91
7.2.4 Short Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.2.5 Disassortative Mixing . . . . . . . . . . . . . . . . . . . . . 96
7.2.6 Betweenness Centrality . . . . . . . . . . . . . . . . . . . . 96
7.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 The Positive-Feedback Preference Model 100
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.2 Modelling The Maximum Degree . . . . . . . . . . . . . . . . . . 101
8.3 The Positive-Feedback Preference Model . . . . . . . . . . . . . . 102
8.4 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8.4.1 Degree Distribution, Rich-Club Connectivity
and Maximum Degree . . . . . . . . . . . . . . . . . . . . 105
8.4.2 Short Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.4.3 Disassortative Mixing . . . . . . . . . . . . . . . . . . . . . 109
8.4.4 Shortest Path Length . . . . . . . . . . . . . . . . . . . . . 110
8.4.5 Betweenness Centrality . . . . . . . . . . . . . . . . . . . . 111
8.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
8.5.1 The Positive-Feedback Preferential Attachment . . . . . . 112
8.5.2 Critical Assessment of The PFP Model . . . . . . . . . . . 112
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9 Discussion and Conclusion 115
9.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Appendix I. Queen Mary Topology Simulator 121
Appendix II. Author’s Publications 127
Glossary 129
Bibliography 131
List of Figures
2.1 Structure of the Internet . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 The motorway network of the USA. . . . . . . . . . . . . . . . . . 24
2.3 Poisson degree distribution. . . . . . . . . . . . . . . . . . . . . . 24
2.4 The air traffic route network of the USA. . . . . . . . . . . . . . . 25
2.5 Power-law degree distribution . . . . . . . . . . . . . . . . . . . . 25
2.6 Three Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.7 Small-world properties . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1 A map of the AS-level Internet. . . . . . . . . . . . . . . . . . . 35
3.2 Degree Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Growth of the BA model. . . . . . . . . . . . . . . . . . . . . . . 40
3.4 The growth of the GLP model. . . . . . . . . . . . . . . . . . . . 44
4.1 Two network structures. . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Cumulative distribution of degree. For each model, ten networks
are generated and averaged. . . . . . . . . . . . . . . . . . . . . . 55
4.3 Rich-club connectivity . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4 Node-node link distribution . . . . . . . . . . . . . . . . . . . . . 58
4.5 Degree distribution inside the rich-club subgraph . . . . . . . . . 59
5.1 The interactive growth mechanism of the IG model. . . . . . . . . 63
5.2 Degree distribution. For each model, ten networks are generated
and averaged. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3 Degree vs rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.4 Rich-club connectivity . . . . . . . . . . . . . . . . . . . . . . . . 67
5.5 Node-node link distribution . . . . . . . . . . . . . . . . . . . . . 68
5.6 Node-node link distribution . . . . . . . . . . . . . . . . . . . . . 69
5.7 A network generated by the IG model. . . . . . . . . . . . . . . . 70
5.8 Time-evolution of node degree . . . . . . . . . . . . . . . . . . . . 71
6.1 Cumulative distribution of degree. . . . . . . . . . . . . . . . . . . 75
6.2 Rich-club connectivity. . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3 Cumulative distribution of shortest path length . . . . . . . . . . 76
6.4 Distribution of triangle coefficient . . . . . . . . . . . . . . . . . . 77
6.5 Cumulative distribution of triangle coefficient . . . . . . . . . . . 77
6.6 Distribution of quadrangle coefficient . . . . . . . . . . . . . . . . 78
6.7 Cumulative distribution of quadrangle coefficient . . . . . . . . . . 78
6.8 Node attack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.9 Link attack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.10 Network robustness under node error. . . . . . . . . . . . . . . . . 80
6.11 Network robustness under node attack. . . . . . . . . . . . . . . . 81
6.12 Network robustness under link error. . . . . . . . . . . . . . . . . 81
6.13 Network robustness under link attack. . . . . . . . . . . . . . . . 82
6.14 A conical structure model. . . . . . . . . . . . . . . . . . . . . . . 83
7.1 Cumulative degree distribution. . . . . . . . . . . . . . . . . . . . 87
7.2 Degree distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.3 Degree vs rank. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.4 Rich-club connectivity φ(k) as a function of degree. . . . . . . . . 89
7.5 Rich-club connectivity φ(r/N) as a function of normalised rank. . 90
7.6 Cumulative distribution of shortest path length. . . . . . . . . . . 91
7.7 Correlation between shortest path length and degree. . . . . . . . 91
7.8 Cumulative distribution of clustering coefficient . . . . . . . . . . 93
7.9 Correlation between clustering coefficient and degree. . . . . . . . 93
7.10 Cumulative distribution of triangle coefficient . . . . . . . . . . . 94
7.11 Correlation between triangle coefficient and degree. . . . . . . . . 94
7.12 Cumulative distribution of quadrangle coefficient . . . . . . . . . . 95
7.13 Correlation between quadrangle coefficient and degree. . . . . . . 95
7.14 Correlation between nearest-neighbours average degree and degree 96
7.15 Cumulative distribution of betweenness. . . . . . . . . . . . . . . 97
7.16 Correlation between betweenness and degree . . . . . . . . . . . . 97
7.17 The three AS graph measurements. . . . . . . . . . . . . . . . . . 98
8.1 Degree vs rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2 Rich-club connectivity . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Three degree functions . . . . . . . . . . . . . . . . . . . . . . . . 104
8.4 Degree growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.5 Degree distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.6 Cumulative degree distribution . . . . . . . . . . . . . . . . . . . 106
8.7 Cumulative distribution of triangle coefficient. . . . . . . . . . . . 107
8.8 Cumulative distribution of quadrangle coefficient. . . . . . . . . . 107
8.9 Correlation between triangle coefficient and degree . . . . . . . . . 108
8.10 Correlation between quadrangle coefficient and degree . . . . . . . 108
8.11 Cumulative distribution of nearest-neighbours average degree. . . 109
8.12 Correlations between nearest-neighbours average degree and degree 109
8.13 Cumulative distribution of shortest path length. . . . . . . . . . . 110
8.14 Correlation between shortest path length and degree . . . . . . . 110
8.15 Cumulative distribution of betweenness centrality . . . . . . . . . 111
8.16 Correlations between betweenness centrality and degree . . . . . . 112
8.17 Network properties of a growing PFP model . . . . . . . . . . . . 113
10.1 Function flowchart of the QMUL Topology Simulator. . . . . . . . 122
10.2 Window of “Parameters for generating networks”. . . . . . . . . . 123
10.3 Window of the main interface. . . . . . . . . . . . . . . . . . . . . 124
10.4 Window of “Save plot data files”. . . . . . . . . . . . . . . . . . . 125
List of Tables
4.1 Distribution of ASes in the Internet hierarchy [78] . . . . . . . . . 51
4.2 Networks parameters . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3 Rich-club properties . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1 Network properties . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Node-node link distribution . . . . . . . . . . . . . . . . . . . . . 68
6.1 Network Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2 Network Short Cycles . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.1 Parameters of the three AS graphs . . . . . . . . . . . . . . . . . 87
7.2 Rich-club connectivity as a function of degree . . . . . . . . . . . 89
7.3 Rich-club connectivity as a function of normalised rank . . . . . . 90
7.4 Parameters of the three AS graphs (continued) . . . . . . . . . . . 92
8.1 Network Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 105
Chapter 1
Introduction
Recently there have been considerable efforts to understand the topology of com-
plex systems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Of particular
interest is the Internet as it is so influential in our daily life.
1.1 Challenges
Effective engineering of the Internet is predicated on a detailed understanding
of issues such as the large-scale structure of its underlying physical topology,
the manner in which it evolves over time, and the way in which its constituent
components contribute to its overall function [17].
In the last three decades, the Internet has experienced a fascinating evolution,
with both exponential growth in its traffic and continual expansion of its topology [18].
This underlines the need for a more thorough and rigorous
analysis of the nature of the Internet topology.
Unfortunately, developing a deep understanding of these issues has proven
to be a challenging task [19, 20, 21, 22, 23], since it in turn involves solving
difficult problems such as mapping the actual topology [24], characterising it, and
developing models that capture its emergent behaviour.
Reliable measurements of the Internet topology became available only
recently [25, 26, 27, 28, 29, 30, 31]. Based on measurement data, Faloutsos et al.
reported in 1999 that the Internet has a power-law degree distribution [32]. This
discovery invalidated previous research results on modelling the Internet
topology, because they were based on random network theories [33, 34, 35]. Even
though the networking community and physicists have since then proposed a
number of Internet topology models [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
it remains an open question as to how representative the topologies they generate
are [21].
1.2 Contributions of this thesis
The aim of this work is to provide quantitative parameters to fully characterise
the network structure and propose realistic models to accurately reproduce the
Internet topology at the autonomous systems (AS) level. The author’s own
research contributions are presented in Chapters 4, 5, 6, 7 and 8. All of this research
is based on actual measurements of the Internet.
1.2.1 Parameterising The Internet Topology
Chapter 4 introduces the novel concept of the rich-club phenomenon to describe the
hierarchical structure of the AS-level Internet, i.e. highly connected nodes not only
have large numbers of links but are also tightly interconnected with each other.
Two metrics are defined to quantitatively characterise this structural property:
the rich-club connectivity and the node-node link distribution. The
calculation of the two parameters is rather simple and is based only on the
network connectivity information. Using the two parameters, the author shows that
degree-based models may not reproduce the Internet hierarchical structure. The
metric of the rich-club connectivity is a milestone in parameterising the Internet
topology and it provides a new criterion for network models.
Inspired by the rich-club phenomenon, Chapter 5 proposes an original Interactive
Growth (IG) model, which adopts a so-called interactive growth mechanism
that has been observed in the Internet history data. The IG model closely
resembles both the power-law degree distribution and the rich-club connectivity of the
AS-level Internet.
Using the IG model as an example of networks containing a rich-club, Chapter
6 shows it is relevant to reproduce the Internet’s rich-club structure because an
Internet model that does not contain a rich-club underestimates the actual
network’s routing efficiency (shortest path length) and routing flexibility (alternative
reachable paths), and overestimates the network robustness under node attack.
This result highlights the importance of studying the Internet topological struc-
ture because structure fundamentally affects function.
Chapter 7 provides a novel comparison of different Internet data sources ob-
tained by using distinct measuring methodologies. Results show that the mea-
surements contain non-trivial topological differences. The major structural dis-
crepancy is revealed by the rich-club connectivity.
1.2.2 Modelling The Internet Topology
Using the IG model as a precursor, Chapter 8 introduces the Positive-Feedback
Preference (PFP) model. The PFP model is superior to any other currently known
Internet topology generator. The PFP model accurately reproduces all the rele-
vant topological properties of the AS-level Internet, including degree distribution,
rich-club connectivity, the maximum degree, shortest path length, short cycles,
disassortative mixing and betweenness centrality. Moreover, the two growth
mechanisms of the PFP model, namely the appearance of new internal links and the
positive-feedback preference, are based on (and supported by) observations
of the Internet history data. The model’s unique non-linear preference provides
a novel insight into the basic dynamics that could be responsible for the evolving
topology of complex networks. The PFP model is a significant achievement in
modelling the Internet topology.
In summary, the author’s successful research has provided a number of promis-
ing contributions. The two main achievements are the metric of rich-club connec-
tivity and the PFP model. These novel contributions represent a significant exten-
sion of the state-of-the-art knowledge in the area of parameterising and modelling
the Internet topology.
1.3 Structure of this thesis
Chapter 2 defines a number of topological properties that are used in the net-
work research, including network size, degree, rank, degree distribution, shortest
path length, node betweenness centrality, clustering coefficient and disassortative
mixing (degree correlation). Chapter 2 also introduces two classical network theo-
ries used before the discovery of the power-law degree distribution of the Internet
topology.
Chapter 3 provides the up-to-date background of this research. It introduces
data sources of the Internet topology and their measuring methodologies. Chapter
3 describes a number of existing topology models that have been used for generat-
ing Internet-like graphs and then discusses the problems of the models and points
out the objectives of the research.
Chapters 4, 5, 6, 7 and 8 present the author’s research contributions on pa-
rameterising and modelling the Internet topology.
Chapter 9 reviews the methodology used in this research and provides
possible directions for future work. Appendix I provides a brief introduction
to the QMUL Topology Simulator, a software kit developed by the author, which is used
to conduct simulations and obtain numerical results. Appendix II lists the
author’s publications. Most of the material presented in this thesis has been published in
peer-reviewed journals and conferences.
Chapter 2
Preliminaries
This chapter introduces the Internet topology and a number of topological
properties that have been widely used by the network research community. This chapter
also introduces two important classes of networks, namely random networks
and small-world networks, which were used in studying and modelling the
Internet topology until practical measurements of the Internet became
available.
2.1 Internet Topology
In general terms, the Internet is a global net of computers, which are intercon-
nected by wires (links) [8]. This network provides electronic transmission of in-
formation between computers.
The connections in the Internet can be abstracted in the dimension of network
administration, which groups IP addresses into subnetworks, subnetworks into
network prefixes and prefixes into autonomous systems (AS). Figure 2.1 [32] shows
a scheme of the structure of the Internet. The vertices (nodes) of the Internet are:
• Hosts that are the computers of users.
• Servers that are computers or programs providing a network service, which
also can be hosts.
• Routers that distribute traffic across the Internet.
• Domains (autonomous systems), where routers are grouped into
subnetworks.
Figure 2.1: Structure of the Internet [32]. The global structure of the Internet is determined by the routers (the router level) and domains (the AS level).
In 2001 the Internet contained about 100 million ($10^8$) hosts. However, it is not
the hosts that determine the structure of the Internet but routers and domains.
So, one can consider the topology of the Internet at the router level or the AS level.
The net of routers is much larger than the net of autonomous systems. In 2001
there were roughly 228,000 routers in total and the total number of autonomous
systems was about $10^4$ [18].
An autonomous system is the term that the Border Gateway Protocol
(BGP) [48] gives to an entity that manages one or more networks and has a
coherent policy for routing IP traffic both internally and to other autonomous
systems. Within autonomous systems, the routing of information is advertised by
some internal rules and algorithms (internal protocols). In principle, the internal
protocols of distinct autonomous systems should not coincide. Therefore the net-
work structure inside an autonomous system only affects local traffic behaviours.
This thesis focuses on the AS-level Internet topology, in which each node is an
autonomous system, because the delivery of IP traffic through the Internet de-
pends on the complex interactions between thousands of autonomous systems that
exchange routing information using the Border Gateway Protocol [49, 50]. For
example, research [51] has shown that the topology of the AS-level Internet has
a major impact on the delayed BGP routing convergence.
When studying the topology of the Internet, the network connectivity informa-
tion is represented with a graph, in which nodes are connected by links. Usually
nodes and links in the graph do not contain physical properties, such as the buffer
volume of a router or the length of an optical cable. The Internet graph is assumed
to satisfy the following: all links are undirected, no link connects a node to
itself (no self-loops) and each node has at least one link (k ≥ 1). The graph is also
connected: no portion is separated from the rest of the network, so any node is
reachable from any other node.
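The assumptions above can be checked mechanically on a link list. The following is only a minimal sketch, assuming the graph is supplied as a list of node pairs; the function name `satisfies_graph_assumptions` and the input format are illustrative choices and are not taken from the thesis or its simulator.

```python
from collections import deque


def satisfies_graph_assumptions(links):
    """Return True if an undirected link list has no self-loops and is connected.

    `links` is a hypothetical input format: an iterable of (node_a, node_b) pairs.
    Every node that appears in a link automatically has degree k >= 1.
    """
    adjacency = {}
    for a, b in links:
        if a == b:
            return False  # a link from a node to itself is a self-loop
        # Store each link in both directions, i.e. treat it as undirected.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if not adjacency:
        return False  # an empty graph is rejected in this sketch

    # Breadth-first search from an arbitrary node to test reachability.
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Connected if every node was reached from the start node.
    return len(seen) == len(adjacency)
```

For example, a chain of three nodes passes the check, whereas two disjoint links fail it because one portion is separated from the other.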
2.2 Topological Properties
2.2.1 Network Size
The size of a network is given by the total number of nodes N and the total
number of links L. For example, in 1999 the AS-level Internet had 6374 nodes and
13641 links [26], and in 2001 it had 11122 nodes and 30054 links [113].
2.2.2 Degree
The degree k of a node, also called node connectivity, is the number of links which
have the node as an end-point, or equivalently, the number of nearest neighbours
of the node.
The average degree of a network, 〈k〉, is given by 〈k〉 = 2L/N, where
L is the number of links and N is the number of nodes. The average degree of
the AS-level Internet was 4.28 in 1999 and 5.4 in 2001.
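As a quick check of the figures quoted above, the average degree can be recomputed from the node and link counts given in Section 2.2.1. This is a small illustrative sketch; the function name `average_degree` is an assumption made here.

```python
def average_degree(num_nodes, num_links):
    """Average degree of an undirected graph: each link contributes to two nodes."""
    return 2 * num_links / num_nodes


# Node and link counts of the AS-level Internet as quoted in Section 2.2.1.
print(round(average_degree(6374, 13641), 2))   # 1999 measurement -> 4.28
print(round(average_degree(11122, 30054), 2))  # 2001 measurement -> 5.4
```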
The maximum degree of a network, $k_{\max}$, is the largest degree of any node
in the network. In 2001 the maximum degree of the AS-level Internet was
2839, which was roughly a quarter of the number of nodes, $k_{\max} \simeq N/4$, where
N = 11122.
The concept of rank is often used when studying the property of degree. The
rank r of a node denotes its position on a list of all nodes sorted in decreasing
degree. The node with rank r = 1 has the largest degree. When a group of nodes
have the same degree, they are arbitrarily assigned a position within that group.
Therefore r ∈ [1, N ], where N is the number of nodes a network has.
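The rank definition can be sketched as follows. This is a minimal illustration, assuming the degrees are held in a dictionary keyed by node; the name `rank_nodes` is hypothetical. Ties are broken by the input order, which is one arbitrary but reproducible choice among many.

```python
def rank_nodes(degrees):
    """Map each node to its rank r, where r = 1 is the node of largest degree.

    `degrees` is a dict {node: degree}. Python's sort is stable, so nodes of
    equal degree keep their input order, an arbitrary tie-breaking rule.
    """
    ordered = sorted(degrees, key=lambda node: -degrees[node])
    return {node: position for position, node in enumerate(ordered, start=1)}


# Toy example with a tie between nodes "a" and "c".
ranks = rank_nodes({"a": 3, "b": 5, "c": 3, "d": 1})
```

In this toy example node "b" receives rank 1, node "d" receives rank 4, and the tied nodes "a" and "c" share positions 2 and 3 between them.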
2.2.3 Degree Distribution
If $p(k, s, N)$ is defined as the probability that the node $s$ in the network of size $N$
has a degree $k$, the degree distribution is
\[
P(k, N) = \frac{1}{N} \sum_{s=1}^{N} p(k, s, N) \quad \text{[8]}, \tag{2.1}
\]
which is often denoted as P (k). While degree is a local property, the probability
distribution of the degree gives important information of the global properties
of a network and can be used to characterise different network topologies. For
example the so-called complex networks [2, 5, 7, 8] are characterised by highly
heterogeneous degree distributions [52].
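For a single measured graph, one common reading of Equation (2.1) is that p(k, s, N) equals 1 when node s has degree k and 0 otherwise, so that P(k) becomes the fraction of nodes with degree k. The sketch below follows that reading, which is an assumption made here for illustration; the function name `degree_distribution` and the link-list input are also illustrative.

```python
from collections import Counter


def degree_distribution(links):
    """Empirical P(k): fraction of nodes having degree k, from an undirected link list."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    num_nodes = len(degree)
    # Count how many nodes have each degree value, then normalise by N.
    nodes_per_degree = Counter(degree.values())
    return {k: nodes_per_degree[k] / num_nodes for k in sorted(nodes_per_degree)}


# A star with one hub and three leaves: three nodes of degree 1, one of degree 3.
star = degree_distribution([(0, 1), (0, 2), (0, 3)])
```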
2.2.3.1 Poisson Degree Distribution
Figure 2.2 shows the motorway network of the USA. Most cities (nodes) have 3 or
4 motorway connections, only a few cities have many motorway connections and
only a few cities have only one or two motorway connections.
This motorway network is characterised by a Poisson degree distribution, as
shown in Figure 2.3. The distribution curve is symmetric and the majority of
nodes are distributed around the average degree of the network, 〈k〉. Networks
with a Poisson degree distribution are often referred to as exponential networks.
Figure 2.2: The motorway network of the USA.
Figure 2.3: Poisson degree distribution.
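To make the shape of a Poisson degree distribution concrete, the standard Poisson probability mass function can be evaluated directly. This is a sketch under stated assumptions: the mean degree of 3.5 is a hypothetical value chosen to echo the "3 or 4 connections" of the motorway example, not a measured quantity.

```python
import math


def poisson_pmf(k, mean_degree):
    """Standard Poisson probability of observing degree k for a given mean degree."""
    return math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)


mean_degree = 3.5  # hypothetical value, for illustration only
pmf = [poisson_pmf(k, mean_degree) for k in range(30)]
most_likely_degree = max(range(30), key=lambda k: pmf[k])
```

The probabilities concentrate around the mean and decay rapidly on both sides, which is why very highly connected nodes are practically absent from such networks.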
2.2.3.2 Power-Law Degree Distribution
Figure 2.4 shows the air traffic route network of the USA. There are a very large
number of airports in the USA, but most airports have just a few airline
connections. Only a few hub cities have huge numbers of airline connections and
they dominate the whole network traffic.
This network is characterised by a power-law degree distribution, as shown in
Figure 2.5. The distribution curve on a logarithmic scale is a straight line, which
suggests that the power-law degree distribution has the form $P(k) \sim k^{-\gamma}$,
where the constant $\gamma$ is the power-law exponent. Networks with a power-law
degree distribution are often referred to as scale-free networks [38]. Both scale-free
networks and exponential networks widely exist in nature and human society [7].
Faloutsos et al. [32] reported in 1999 that the AS-level Internet topology exhibits
a power-law degree distribution, $P(k) \sim k^{-\gamma}$, where $\gamma \simeq 2.22$.
Figure 2.4: The air traffic route network of the USA.
Figure 2.5: Power-law degree distribution (on a log-log scale).
2.2.4 Shortest Path Length
The shortest path is a route connecting two nodes with the least number of hops.
In a graph, the number of hops along a route is called the length of the path.
The average shortest path length l of a node is defined as the average length of
the shortest paths from the node to all other nodes in the network. In this thesis,
the shortest path length is calculated using Dijkstra’s algorithm [53].
The characteristic path length l∗ of a network is the average length of the
shortest paths over all pairs of nodes. The characteristic path length indicates
the network’s overall routing efficiency. A network with a smaller value of l∗ may
achieve better dynamic performance [54, 55, 56]. The characteristic path length
of the AS-level Internet was 3.7 in 1999 and it was 3.13 in 2001, which are very
small considering the huge size of the network.
The maximum value of the shortest paths over all pairs of nodes is the net-
work’s diameter, D. A network’s diameter may not proportionally increase with
the network size and it mainly depends on the topological structure of the network.
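The quantities defined in this section can be sketched for a small unweighted graph as follows. This is only an illustration: it uses a breadth-first search instead of Dijkstra’s algorithm (the two give the same hop counts when every link has unit length), it assumes a connected graph, and the names `bfs_distances`, `path_statistics` and the six-node ring are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop count of the shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_statistics(adj):
    """Return (characteristic path length, diameter) of a connected graph."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for d, hops in bfs_distances(adj, s).items():
            if d != s:
                total += hops
                pairs += 1
                diameter = max(diameter, hops)
    return total / pairs, diameter

# Hypothetical example: a ring of six nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
l_star, diameter = path_statistics(ring)
print(l_star, diameter)
```

In the ring each node sees the other five at distances 1, 1, 2, 2 and 3, so the characteristic path length is 9/5 and the diameter is 3.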
2.2.5 Node Betweenness Centrality
On a network, there are nodes that are more prominent than others because they
are highly used when transferring information. A way to measure this “impor-
tance” is by using the concept of node betweenness centrality, also called between-
ness, which measures the proportion of shortest paths which visit a certain node.
The betweenness centrality is defined as the total number of data packets passing
through that node when every pair of nodes sends and receives a data packet along
the shortest path connecting the pair. When more than one shortest path exists
between a pair of nodes, the data packet is divided evenly among them.
Given a source node s and a destination node d, the number of different shortest
paths is g(s, d). The number of shortest paths that contain the node w is
g(w; s, d). The proportion of shortest paths, from s to d, which contain node
w is $p_{s,d}(w) = g(w; s, d)/g(s, d)$. The betweenness centrality of node w is
calculated [57, 58, 59] as
$$C_B(w) = \sum_{s} \sum_{d \neq s} p_{s,d}(w), \qquad (2.2)$$
where the sum is over all possible pairs of nodes with $s \neq d$.
If all pairs of nodes of a network communicate at the same rate, and the traffic
goes by the shortest paths, then the traffic through a node is proportional to the
betweenness of the node. In other words, the betweenness estimates the capacity
of each node needed for a free-flow state [57].
A node with a large CB is “important” because it carries a large traffic load.
If this node fails or gets congested, the consequences to the network traffic can
be drastic [59, 60]. As is natural, one can suggest that the betweenness of a node
strongly correlates with its degree.
In this thesis the betweenness centrality is normalised by N, the total number
of nodes, and denoted as $C_B^*$. The average betweenness centrality over all nodes
is $\langle C_B^* \rangle = l^* + 1$ [59], where $l^*$ is the network’s characteristic path length.
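A minimal sketch of Equation 2.2 for a small connected, undirected graph is given below. It rests on two assumptions that are not stated explicitly in the text: the end nodes s and d are counted as lying on their own shortest paths, and the normalisation is a plain division by N. Under these assumptions the average of the normalised betweenness equals (N − 1)(l∗ + 1)/N exactly, which approaches l∗ + 1 for large N. The function names are hypothetical, and the triple loop is chosen for readability, not efficiency.

```python
from collections import deque

def bfs_counts(adj, source):
    """Distances from source and the number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """C_B(w): sum over ordered pairs s != d of g(w; s, d) / g(s, d)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    cb = {w: 0.0 for w in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for d in adj:
            if d == s or d not in dist_s:
                continue
            dist_d, sigma_d = info[d]
            for w in adj:
                # w is on a shortest s-d path iff the two legs add up exactly.
                if w in dist_s and w in dist_d and dist_s[w] + dist_d[w] == dist_s[d]:
                    cb[w] += sigma_s[w] * sigma_d[w] / sigma_s[d]
    return cb

# Hypothetical example: a path of three nodes, 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
cb = betweenness(path)
print(cb)
```

In the three-node path the middle node lies on the shortest path of every ordered pair, so it obtains the largest betweenness.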
2.2.6 Clustering Coefficient
If a node has k neighbours, then at most k(k − 1)/2 inter-neighbour links can
exist between the neighbours. If $n_c$ denotes the number of inter-neighbour links the
node has, then the clustering coefficient c of the node is defined as the fraction of
the allowable links that actually exist [61],
$$c = \frac{n_c}{k(k-1)/2}. \qquad (2.3)$$
Clustering coefficient reflects the extent to which neighbours of a node are
also neighbours of each other, and thus it measures the cliquishness of a typical
neighbour circle. In other words, it characterises the ‘density’ of connections in
the environment close to a node.
When a node has only one neighbour (k = 1), the value of c is zero. The
maximum value of c is one, which means all neighbours are connected to each
other and the maximum linkage in this cluster (the maximum ‘clustering’) is
reached. The average clustering coefficient of a network, 〈c〉, is the average value of
clustering coefficient over all nodes. Depending on the measurement data sources,
the average clustering coefficient of the AS-level Internet is between 0.24 and
0.49 [11].
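Equation 2.3 and the network average 〈c〉 can be sketched as follows. The adjacency sets and function names are hypothetical, and the sketch extends the convention c = 0 for k = 1 to nodes without any neighbour as well, which is an assumption.

```python
def clustering_coefficient(adj, node):
    """Fraction of the possible links among the neighbours of `node` that exist."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention in the text for k = 1 (assumed here for k = 0 too)
    n_c = sum(
        1
        for i, u in enumerate(neighbours)
        for v in neighbours[i + 1:]
        if v in adj[u]
    )
    return n_c / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of the clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to node 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(toy, 0), average_clustering(toy))
```

Node 0 has three neighbours with only one link among them, so c = 1/3, while nodes 1 and 2 have c = 1 and the pendant node has c = 0.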
2.2.7 Disassortative Mixing (Degree Correlations)
Complex networks can be grouped into assortative, disassortative and neutral
networks [62, 63, 64, 65]. Social networks (e.g. the co-authorship network) are
assortative networks, in which high-degree nodes prefer to attach to other high-
degree nodes. Information networks (e.g. the World Wide Web and the Internet)
and biological networks (e.g. protein interaction networks) have been classified
as disassortative networks, in which high-degree nodes tend to connect with low-
degree ones.
A network’s degree mixing pattern is identified by the conditional probability
$p_c(k'|k)$ that a link connects a node with degree k to a node with degree
k′. This joint degree-degree distribution is inconvenient for empirical analysis
due to the poor statistics obtained using the limited data sources.
Pastor-Satorras et al [66, 67] found that the conditional probability can be indicated
by the nearest-neighbours average degree $k_{nn}$ of a node with degree k. In this
dependence, only one variable (degree k) is present. A disassortative network
exhibits a negative correlation between the nearest-neighbours average degree and
the degree.
The degree correlations are absent in classical random graphs, but are natural
in growing networks. For example the AS-level Internet exhibits the disassortative
mixing behaviour [63, 66, 67, 64], where high-degree nodes tend to connect to
peripheral nodes with low degrees.
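The nearest-neighbours average degree can be computed as in the following sketch, which averages, over all nodes of degree k, the mean degree of their neighbours. The function name `knn_by_degree` and the star graph are hypothetical.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Nearest-neighbours average degree k_nn as a function of the degree k."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    sums, counts = defaultdict(float), defaultdict(int)
    for n, nbrs in adj.items():
        if not nbrs:
            continue  # isolated nodes have no neighbours to average over
        sums[degree[n]] += sum(degree[v] for v in nbrs) / degree[n]
        counts[degree[n]] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Hypothetical toy graph: hub 0 linked to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(knn_by_degree(star))
```

The star is an extreme disassortative case: the degree-1 nodes have a neighbour of degree 4, while the degree-4 hub has neighbours of degree 1, so the value decreases as k increases.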
2.3 Random Networks
The classical random network theory was introduced by Erdos and Renyi [33, 34,
68]. There are two main constructions of Erdos-Renyi graphs with a fixed number
of nodes N :
1. Each two nodes of the network are connected by a link with probability p.
Naturally, this link is absent with probability 1− p.
2. The nodes are randomly connected by a given number L of links. One can
realise this construction procedure by repeatedly adding new links between
pairs of randomly chosen nodes. In graph theory, this is called a random
graph process.
These two constructions define two equivalent statistical ensembles of graphs.
The set of graphs in construction (1) is all $2^{N(N-1)/2}$ graphs with any number of
links smaller than or equal to N(N − 1)/2. The set of graphs in construction (2)
consists of all possible graphs with N nodes and a given number L of links.
The constructions above naturally generate uncorrelated graphs. In other
words, correlations between their nodes are absent. Each node in the graph with
N nodes is in the same situation. It can have any number of links attached, from
zero (a “bare” node) to N−1. If a node is of degree k, then its k links can occupy
N − 1 possible positions. Standard combinatorics readily lead to the following
degree distribution of the classical random graph:
$$P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}, \qquad (2.4)$$
that is the binomial distribution, so that the average degree is $\langle k \rangle = p(N-1)$ and
the network contains, on average, $pN(N-1)/2$ links. For large N and fixed $\langle k \rangle$,
the degree distribution takes the Poisson form
$$P(k) = e^{-\langle k \rangle} \langle k \rangle^k / k!. \qquad (2.5)$$
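The passage from the binomial form (Equation 2.4) to the Poisson form (Equation 2.5) can be checked numerically with the sketch below. The network size of 10001 nodes and the average degree of 4 are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

def binomial_degree_pmf(k, n_nodes, p):
    """Equation 2.4: probability that a node of the random graph has degree k."""
    return math.comb(n_nodes - 1, k) * p**k * (1 - p) ** (n_nodes - 1 - k)

def poisson_degree_pmf(k, mean_degree):
    """Equation 2.5: Poisson limit for large N at fixed average degree."""
    return math.exp(-mean_degree) * mean_degree**k / math.factorial(k)

n_nodes, mean_degree = 10_001, 4.0
p = mean_degree / (n_nodes - 1)  # so that <k> = p (N - 1)
for k in range(0, 11, 2):
    print(k, binomial_degree_pmf(k, n_nodes, p), poisson_degree_pmf(k, mean_degree))
```

For these values the two columns agree closely, and both peak near the average degree.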
The Erdos-Renyi model generates statistically homogeneous networks in which,
despite the fundamental randomness of the model, most nodes have approximately
the same number of links, $\langle k \rangle$ (the average degree). In particular, the connectivity follows
a Poisson distribution that peaks strongly at $\langle k \rangle$, implying that the probability of
finding a highly connected node decays exponentially ($P(k) \simeq e^{-k}$ for $k \gg \langle k \rangle$).
The Waxman model [35] provides another construction for random networks
with Poisson degree distribution and has been widely used to generate random
topologies for network simulations. It starts by placing N nodes uniformly on an
n by n plane. Once all nodes have been placed on the plane, the model computes
the probability of creating a link between two nodes µ and υ with the following
probability function:
$$P(\mu, \upsilon) = \alpha e^{-d(\mu, \upsilon)/(\beta L)}, \qquad (2.6)$$
where d(µ, υ) is the Euclidean distance between µ and υ, L is the maximum
Euclidean distance between two nodes, and α and β are parameters in the range (0, 1].
Then a random number is generated between 0 and 1. A link is created between
µ and υ only if the random number is smaller than P (µ, υ).
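The Waxman construction described above can be sketched as follows. The sketch assumes that L is the largest distance between any two of the placed nodes (it could also be read as the diagonal of the plane), and the function name, the parameter names and the example values are hypothetical.

```python
import math
import random

def waxman_graph(n_nodes, side, alpha, beta, seed=None):
    """Place nodes uniformly on a side x side plane and link them by Equation 2.6."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]
    # Assumption: L is the maximum distance between two placed nodes.
    max_dist = max(
        math.dist(pos[u], pos[v])
        for u in range(n_nodes)
        for v in range(u + 1, n_nodes)
    )
    links = []
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            prob = alpha * math.exp(-math.dist(pos[u], pos[v]) / (beta * max_dist))
            if rng.random() < prob:  # create the link with probability P(u, v)
                links.append((u, v))
    return pos, links

pos, links = waxman_graph(n_nodes=50, side=100.0, alpha=0.4, beta=0.2, seed=1)
print(len(links), "links among", len(pos), "nodes")
```

A larger α raises the overall link density, while a larger β increases the share of long links relative to short ones.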
The above random networks are static, in the sense that they have a fixed size.
Starting with a constant set of N disconnected nodes, these networks are defined
by the rules assigning links between pairs of nodes. These networks share a random
nature in the process of placing the links, which is in general independent of the
local properties of nodes. Despite this extreme simplification, random
networks have long provided the theoretical reference framework for
network modelling, including that of the Internet.
The characteristic path length of a random network can be approximated [33,
34] by
l∗ ≈ ln(N)/ ln〈k〉, (2.7)
where N is the number of nodes and 〈k〉 is the average degree.
2.4 Small-World Networks
A regular network is characterised by its neighbour clustering. For example the
ring-lattice network shown in Figure 2.6-a has a large number of triangles and the
grid-lattice network has a large number of quadrangles. This structural property
provides the network a large number of alternative routing choices and makes the
network as a whole highly fault-tolerant.
A random network is characterised by its random connections, which provide
routing shortcuts and make the characteristic path length l∗ (see Equation 2.7)
of the network significantly smaller than that of an equivalent regular network.
Figure 2.6: Three networks with the same number of nodes, the same number of links and the same placement of nodes. a. Regular network (ring-lattice). b. Small-world network (the Watts-Strogatz model). c. Random network.
A small-world network [61, 69, 70, 71, 72] has the following properties:
• The clustering coefficient c is much larger than that of a random graph with
the same number of nodes and the same average degree.
• The characteristic path length l∗ is almost as small as l∗ for the correspond-
ing random graph.
This means a small-world network has a large number of triangles and quadran-
gles and also has random connections. The AS-level Internet is regarded as a
good example of a small-world network because, despite the immense size of the
network, it has a very small characteristic path length (l∗ = 3 ∼ 4) and fairly
large average clustering coefficient (〈c〉 = 0.30 ∼ 0.49).
Watts [61] demonstrated that a regular lattice can be transformed into a
small-world network by making a small fraction of the connections random. Figure 2.6-a
shows a ring-lattice regular network, in which each node is connected
to its 4 closest neighbours. If a small fraction p of the links are made random,
the network turns into a small-world network (Figure 2.6-b). If all the links are
made random, the network becomes a random network (Figure 2.6-c).
Figure 2.7: Small-world properties [61]. c(p) is the average clustering coefficient and l∗(p) is the characteristic path length of a network with a fraction p of links randomly rewired. The plot shows l∗(p)/l∗(0) and c(p)/c(0) against p.
As shown in Figure 2.7, when only a fraction p = 0.01 of the links are rewired
randomly, the network’s average clustering coefficient is nearly the same as that
of the ring-lattice regular network, $c(p = 0.01)/c(0) \simeq 1$, while the network’s
characteristic path length is significantly smaller than that of the ring-lattice regular
network, $l^*(p = 0.01)/l^*(0) \simeq 0.18$, and close to that of the random network.
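The rewiring procedure can be sketched as below. This is a simplified reading of the Watts-Strogatz construction and not necessarily the exact procedure of [61]: each link keeps one end and, with probability p, moves its other end to a uniformly chosen node, avoiding self-loops and duplicate links so that the number of links is preserved. The function names and the 20-node example are hypothetical.

```python
import random

def ring_lattice(n_nodes, k):
    """Link every node to its k closest neighbours on a ring (k even, k < n_nodes)."""
    links = set()
    for i in range(n_nodes):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n_nodes
            links.add((min(i, j), max(i, j)))
    return links

def rewire(links, n_nodes, p, seed=None):
    """Rewire each link with probability p, keeping the number of links fixed."""
    rng = random.Random(seed)
    result = set(links)
    for u, v in sorted(links):
        if rng.random() < p:
            # Candidate new ends: not u itself and not already linked to u.
            candidates = [
                w for w in range(n_nodes)
                if w != u and (min(u, w), max(u, w)) not in result
            ]
            if candidates:
                w = rng.choice(candidates)
                result.remove((u, v))
                result.add((min(u, w), max(u, w)))
    return result

lattice = ring_lattice(20, 4)               # regular network, as in Figure 2.6-a
small_world = rewire(lattice, 20, 0.1, 0)   # a few shortcuts
random_net = rewire(lattice, 20, 1.0, 0)    # every link rewired
print(len(lattice), len(small_world), len(random_net))
```

All three networks have the same number of nodes and links, which is the condition under which Figure 2.6 compares them.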
2.5 Summary
This chapter defined the following topological properties: network size, degree,
degree distribution, shortest path length, node betweenness centrality, clustering
coefficient and disassortative mixing. They are going to be used in the rest of
this thesis. This chapter also introduced the concepts of random networks and
small-world networks.
Chapter 3
Measurements and Models Of
The AS-Level Internet
3.1 Introduction
This Chapter introduces three types of data sources that predominate in the In-
ternet research and two methodologies used to obtain the data sets. The Chapter
also introduces a number of recently proposed network models which are of
relevance to this work. This Chapter sets out the immediate context of this research
and points out what the challenges are.
3.2 Topology Measurements Of The AS-Level
Internet
There are currently two primary methods of inferring the Internet structure at the
AS-level: the passive measurement, which uses BGP inter-domain routing tables,
and the active measurement, which actively probes IP addresses to get the actual
paths that packets travel from a source to a destination. The strength of this
research is that it is based on the real measurement data of the Internet topology.
3.2.1 Passive Measurement - BGP AS Graph
The Internet passive measurement [25, 26, 27] produces the BGP AS graphs,
which are constructed from Internet inter-domain BGP routing tables. The BGP
tables contain the information of connections from an AS to its immediate AS
neighbours.
The widely used BGP data are available from the Active Measurement Project
at National Laboratory for Applied Network Research [25] and the Route Views
Project at University of Oregon [26]. Both projects connect to a number of op-
erational routers within the Internet for the purpose of collecting BGP routing
tables.
The Measurement and Network Analysis Group of the US National Labora-
tory for Applied Network Research (NLANR) [25], has developed the Network
Analysis Infrastructure (NAI). The NAI is the largest project of its kind that
makes all data publicly available for use by other network researchers. On its
web site, http://most.nlanr.net/, one can find extensive Internet routing related
information collected since November 1997. For nearly every day NLANR has a
complete map of connections of operating autonomous systems.
BGP tables have the advantage that they are relatively easy to parse, process
and comprehend. However, despite widespread public availability, BGP data has
several limitations [73]. BGP tables do not reflect how traffic actually travels in
the network and provide only a local perspective from a router toward a destination.
3.2.2 Extended BGP AS Graph
The Topology Project at University of Michigan [30] provided the extended ver-
sion [74, 75] of BGP AS graphs by using additional data sources, such as the
Internet Routing Registry (IRR) data and the Looking Glass (LG) data. The
IRR maintains individual ISP’s (Internet Service Provider) routing information
in several public repositories to coordinate global routing policy. The LG sites are
maintained by individual ISPs to help troubleshoot Internet-wide routing prob-
lems. Extended BGP AS graphs typically have 20-50% more links than the origi-
nal BGP AS graphs and provide more complete pictures of the Internet topology.
3.2.3 Active Measurement - Traceroute AS Graph
Figure 3.1: A map of the AS-level Internet measured by the Internet Mapping Project [29] of Bell Labs.
The Internet active measurement [28, 29, 76] produces the Traceroute AS
graphs. From 1998, the Cooperative Association for Internet Data Analysis
(CAIDA [28]) began its Macroscopic Topology Project to collect and analyse
Internet-wide topology and latency data at a representatively large scale. In the
course of this project CAIDA has created several innovative measurement, anal-
ysis and visualisation tools. The primary topology measurement tool is skitter,
which implements the Internet Control Message Protocol (ICMP) to collect the
forward path from the monitor to a given destination and capture the addresses of
intermediate routers in the path. Skitter runs on more than 20 monitors around
the globe and actively collects forward IP paths to over half a million destinations.
The Traceroute AS graph extracts [28, 29, 76] the interconnection information of ASes from
the massive traceroute data collected by skitter.
3.2.4 Discovery Of The Internet Power-Law Degree
Distribution
Figure 3.2: Degree Frequency [32] of a BGP AS graph measured on 5th December 1998.
Based on the BGP measurement data, Faloutsos et al [32] reported in 1999
that the degree distributions of the AS-level Internet (see Figure 3.2) and the
router-level Internet are described by a power-law
$$P(k) \sim k^{-\gamma}, \qquad (3.1)$$
where the power-law exponent is $\gamma \simeq 2.2$ for the AS-level Internet. The discovery
of the Internet power-law degree distribution is of fundamental importance
because it showed that the Internet topology cannot be modelled by network
models with a Poisson degree distribution, such as random networks and
small-world networks. In fact, this property literally invalidated all previous research
on modelling the Internet topology.
3.2.5 Which AS Graph?
Most recent studies on the AS-level Internet topology were based on the
BGP AS graphs and the Extended AS graphs, such as the power-law de-
gree distribution [32], the error and attack tolerance [77] and other research
works [66, 67, 78, 37, 79].
Comparison studies [80, 81, 82, 73] have shown that the Traceroute AS graph
is more complete and reliable than the BGP AS graph. However it is not clear
whether the Traceroute AS graph is more complete than the Extended BGP AS
graph, which has captured even more Internet connections than the Traceroute
AS graph.
Chapter 7 will compare the three types of AS graphs in detail by examining all
the topological properties. Based on the comparison results, the author suggests
that the Traceroute AS graphs are more realistic measurements for the Internet
research. In this thesis, Chapters 4 and 5 are based on an Extended AS graph
measured in 2001. Chapters 6 and 8 are based on a Traceroute AS graph measured
in 2002.
3.3 Topology Models Of The AS-Level Internet
This section introduces a selection of the existing Internet models which have been
widely used in studying the Internet. The Tiers model, the GT-ITM model
and the User-Provider model focus on the Internet hierarchical structure [83].
The Inet model, the Barabasi and Albert (BA) model and the modifications of
the BA model are degree-oriented models. The BRITE model, the Dorogovtsev-
Mendes Model, the Generalised Network Growth (GNG) Model, the Generalised
Linear Preference (GLP) Model and Highly Optimised Tolerance (HOT) Model
are examples of models using more complex growth mechanisms. This thesis will
further study the Inet model, the BA model, the Fitness BA model and the GLP
model in the following chapters.
3.3.1 Tiers Model
The Tiers generator [84] is based on a three level hierarchy that represents Wide
Area Networks (WAN), Metropolitan Area Networks (MAN), and Local Area
Networks (LAN). To generate a random topology using Tiers, one specifies a
target number of LANs and MANs. Currently Tiers cannot generate more than
one WAN per random topology. For each level of hierarchy, one also specifies
a fixed number of nodes per network. A minimum spanning tree is computed
to connect all links, then other links are created based on user-specified average
inter-level and intra-level redundancy. The link formation favours close-by nodes,
resulting in topologies with large diameters (see Section 2.2.4 on page 26).
3.3.2 GT-ITM Model
GT-ITM (Transit-Stub) model [85, 83] generates topologies based on several dif-
ferent models. The connectivity used to generate each connected graph can be se-
lected from one of six methods: PureRandom, Waxman1, Waxman2, Doar-Leslie,
Exponential, or Locality [85, 83]. Similar to Tiers, the model has a well-defined
hierarchical structure. It generates topologies with two levels of hierarchy: one
consisting of transit ASes, and the other consisting of stub ASes. Also similar to
Tiers, the GT-ITM model allows for extra links to be added between stub ASes
and between stub and transit ASes.
3.3.3 User-Provider Model
The User-Provider model [36] generates networks using a self-organised interaction
between users and providers, where the interactions can be rearranged during the
network growth. All nodes in the model are divided into two roles: providers and
users. Providers can have several links, pointing to other sites which correspond
to users. Users have a single link pointing to their providers. At each time-step,
a node is added to the network. The new node can be either a provider with a
probability r or a user with probability 1−r. When a provider is added, D(t) users
in the network are chosen at random, and rewired to the new provider. Links to the
previous providers are removed. It is assumed that the integer number D(t) is a
random variable with Poisson distribution and each user has the same probability
(1/k) to be rewired.
3.3.4 Inet Model
The Inet model [86, 37] (see footnote 1) was designed to match the degree distribution as
measured in the BGP AS graphs. The model generates networks in three steps:
• Build a spanning tree with all nodes that have degrees greater than one.
• Connect all nodes with degree one to nodes in the spanning tree with a
linear preference.
• Connect the remaining free links in the spanning tree.
The number of links generated by the model depends on two parameters, which
are the total number of nodes and the percentage of nodes with degree k = 1.
Since the model is based on the original BGP AS graph, it typically generates
26% fewer links than the extended BGP AS graph.
3.3.5 Barabasi and Albert Model
Pursuing a very different class of dynamic graph models, Barabasi and Al-
bert [38, 87] showed that power-law graphs can arise from a simple dynamic model
that combines incremental growth with a preference for new nodes to connect to
existing ones that are already well connected.
The BA model starts with a small random network. The system “grows” by
attaching a new node with m links (see footnote 2) to m different nodes that are already present in
the system (see Figure 3.3), and the attachment is “preferential” [88] because the
probability that a new node connects to node i with degree $k_i$ is
$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}, \qquad (3.2)$$
which is a linear function of $k_i$.
Figure 3.3: Growth of the BA model. A new node attaches to the existing network.
Footnote 1: During the research on this thesis, the author found that the Inet-2.1 model contains redundant links in the output. According to his report, the Inet research group (http://topology.eecs.umich.edu/inet/) identified the programming bug and updated the model to version 2.2 and later Inet-3.0.
Footnote 2: Use m = 3 to obtain Internet-like networks.
The BA model has generated great interest in various research areas [89, 90,
91, 92]. Barabasi and Albert state [40, 93] that this intuitively appealing growth
model applies to the Internet’s AS graph and therefore explains why AS graphs
exhibit power-law degree distributions. The model has also been used as a
starting-point in research into the error and attack tolerance of the Internet [77, 94].
Simplicity and parsimony are the two advantages of the BA model. The BA
model is also important because it can be analysed mathematically. Using
mean-field theory, Barabasi and Albert [95] showed that the BA model generates
networks with a degree distribution of $P(k) \sim k^{-\gamma}$ with the power-law exponent
of $\gamma = 3.0$, which is independent of network size (growth time) and the parameter
m.
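The growth and preferential attachment rules of the BA model can be sketched as follows. Two choices in the sketch are assumptions: the seed network is a ring of m0 nodes instead of a small random network, and the m distinct targets are obtained by redrawing duplicates, which only approximates Equation 3.2 for the second and later links of a new node. The function name and parameter defaults are hypothetical, apart from m = 3.

```python
import random

def ba_model(n_nodes, m=3, m0=4, seed=None):
    """Grow a network by linear preferential attachment; returns the link list."""
    rng = random.Random(seed)
    # Assumption: the seed network is a ring of m0 nodes (needs m0 >= 3, m <= m0).
    links = [(i, (i + 1) % m0) for i in range(m0)]
    # Every node appears in `stubs` once per link end, so a uniform draw from
    # the list selects node i with probability k_i / sum_j k_j.
    stubs = [n for link in links for n in link]
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:  # redraw until m different old nodes are chosen
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            links.append((new, t))
            stubs.extend((new, t))
    return links

links = ba_model(200, m=3, m0=4, seed=7)
print(len(links), "links")
```

After t = 196 growth steps the network has the 4 seed links plus mt = 588 new links, in line with the link count used in the mean-field analysis.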
Mean-field theory for scale-free random networks
After t time-steps, the network has $N = t + m_0$ nodes and $mt$ links. Time
dependence of the connectivity $k_i$ of a given node i can be calculated analytically
using a mean-field approach. Assume that k is continuous, and thus the
probability $\Pi(k_i) = k_i / \sum_j k_j$ can be interpreted as a continuous rate of change of $k_i$.
Consequently,
$$\frac{\partial k_i}{\partial t} = m \Pi(k_i) = m \frac{k_i}{\sum_{j=1}^{N-1} k_j}.$$
Taking into account the total growth in the number of links, $\sum_j k_j = 2mt$, then
$\partial k_i / \partial t = k_i / 2t$. The solution of this equation, with the initial condition that node
i was added to the system at time $t_i$ with connectivity $k_i(t_i) = m$, is
$$k_i(t) = m \left( \frac{t}{t_i} \right)^{\beta}, \quad \beta = 1/2.$$
The probability that a node has a degree $k_i(t)$ smaller than k, $P(k_i(t) < k)$, can
be written as:
$$P(k_i(t) < k) = P\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right).$$
If the nodes are added at equal time intervals, the probability density of $t_i$ is
$P(t_i) = 1/(m_0 + t)$. Then,
$$P\left( t_i > \frac{m^{1/\beta} t}{k^{1/\beta}} \right) = 1 - \frac{m^{1/\beta} t}{(m_0 + t) k^{1/\beta}}.$$
The degree probability distribution is
$$P(k) = \frac{\partial P(k_i(t) < k)}{\partial k} = \frac{2 m^{1/\beta} t}{(m_0 + t) k^{1/\beta + 1}},$$
where $\frac{1}{\beta} + 1 = 3$, so that $P(k) \sim k^{-3}$.
3.3.6 Fitness BA Model
The Fitness BA (FBA) model [39] is a modification of the BA model. It uses
generalised preferential attachment, which ensures that even a relatively young
node with a small number of links can acquire new links at a higher rate if it has
a large fitness parameter. The reason the author studies this model is that, for
the uniform fitness parameter distribution, the network generated by this model
has a power-law exponent similar to that of the AS graph.
The FBA model [39] is identical to the BA model except that a new parameter,
fitness, is introduced into the calculation of the probability Π. In the real Internet,
the probability that a new node will be connected to node i does not depend only
on the node’s connectivity k. The node’s fitness describes its ability to compete
for links at the expense of other nodes. The Fitness BA model generates networks with
a power-law degree distribution whose exponent is closer to that of the
actual Internet degree distribution.
A fixed fitness parameter η is assigned to each node, where η is chosen
uniformly from the interval [0, 1]. The preferential probability becomes:
$$\Pi(i) = \frac{\eta_i k_i}{\sum_j \eta_j k_j}. \qquad (3.3)$$
Using mean-field theory, Bianconi and Barabasi [39] showed that the Fitness
BA model generates networks with a power-law degree distribution of $P(k) \sim k^{-\gamma}$,
where the slope $\gamma = 2.25$, which is closer to that of the Internet ($\gamma \simeq 2.2$).
3.3.7 Generalised BA Model
The Generalised BA model [40] is an extension of the BA model. It can generate
networks with power-law exponents between 2 and 4. In the Generalised BA
model, three possible activities could happen in every growth step:
• With probability p (0 ≤ p < 1), m (m < m0) new links are added.
• With probability q (0 ≤ q < 1 − p), m links are rewired.
• With probability 1 − p − q, a new node is added.
The preferential probability is
$$\Pi_i = \frac{k_i + 1}{\sum_j (k_j + 1)}, \qquad (3.4)$$
which is proportional to $k_i + 1$, such that there is a nonzero probability that
isolated nodes ($k_i = 0$) acquire new links. Albert and Barabasi [40] showed that
the network’s power-law degree distribution is:
$$P(k) = \frac{t}{m_0 + t} D(p, q, m) \left( k + A(p, q, m) + 1 \right)^{-1-B(p,q,m)}, \qquad (3.5)$$
where
$$A(p, q, m) = (p - q) \left( \frac{2m(1-q)}{1-p-q} + 1 \right), \qquad (3.6)$$
$$B(p, q, m) = \frac{2m(1-q) + 1 - p - q}{m} \qquad (3.7)$$
and
$$D(p, q, m) = \left( m + A(p, q, m) + 1 \right)^{B(p,q,m)} B(p, q, m). \qquad (3.8)$$
The power-law exponent is $\gamma = 1 + B(p, q, m)$ and varies between 2 and 4.
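The coefficients of Equations 3.6 to 3.8 and the resulting exponent can be evaluated with the short sketch below. The function name `gba_coefficients` is hypothetical, and the range check simply restates the conditions on p and q given in the list above.

```python
def gba_coefficients(p, q, m):
    """Return (A, B, D, gamma) following Equations 3.6 to 3.8."""
    if not (0 <= p < 1 and 0 <= q < 1 - p):
        raise ValueError("need 0 <= p < 1 and 0 <= q < 1 - p")
    a = (p - q) * (2 * m * (1 - q) / (1 - p - q) + 1)
    b = (2 * m * (1 - q) + 1 - p - q) / m
    d = (m + a + 1) ** b * b
    return a, b, d, 1 + b

# With no link addition or rewiring (p = q = 0) and m = 1.
a, b, d, gamma = gba_coefficients(0.0, 0.0, 1)
print(a, b, d, gamma)
```

For p = q = 0 and m = 1 the coefficients are A = 0, B = 3 and D = 24, giving γ = 4, the upper end of the range quoted above.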
3.3.8 BRITE Model
BRITE [41, 96] is an approach towards universal topology generation. BRITE
combines a number of topology generation tools, which can be used to flexibly
control various parameters (such as connectivity and growth models) and study
various properties of generated network topologies (such as power-laws, average path
length, etc). It has the following features:
• Flexible: BRITE supports multiple generation models. Models can be en-
hanced by assigning links attributes such as bandwidth and delay.
• Extensible: BRITE’s object-oriented architecture provides researchers with
the ability to add new models of generation and with the ability to import
from and export to custom topology files.
• Interoperable: BRITE allows importing topologies from other topology gen-
erators and extending or combining them with other topologies.
3.3.9 Dorogovtsev-Mendes Model
Dorogovtsev and Mendes [42] introduced a model using the addition of new in-
ternal links. If the parameter m is the number of new internal links that appear
at each growth time-step, the model evolves according to the following rules.
• At each time-step, a new node is added and linked with node i with the
probability given by the BA model (see Equation 3.2 on page 39).
• In addition,
– m ≥ 0 new internal links are added between unconnected pairs of
old node i and j with probability proportional to the product of their
degrees, ki × kj.
– In the case of m ≤ 0, some old links between old nodes are removed
with equal probability.
The parameter m may also be non-integer. Dorogovtsev and Mendes showed that,
for a wide range of m, this model can generate networks with power-law degree
distributions and the power-law exponent γ can be adjusted by m. However,
this model produces the wrong kind of degree-degree correlation.
3.3.10 Generalised Linear Preference Model
Bu et al [44] recently introduced the Generalised Linear Preference (GLP) model.
This model is a modification of the BA model. It reflects the fact that the evolution
of the Internet topology is mostly due to two operations, the addition of new nodes
and the addition of new links between existing nodes.
Figure 3.4: The growth of the GLP model. a. Addition of new nodes. b. Addition of new links. The two operations are independent.
The model starts with m0 nodes connected through m0 − 1 links. As shown
in Figure 3.4, at each time-step, one of the following two operations is performed:
• With probability ρ ∈ [0, 1], m (m < m0) new links are added between m
pairs of nodes chosen from existing nodes;
• With probability 1− ρ, one new node is added connecting to m old nodes.
The GLP model uses the generalised linear preference, in which the probability
Π(i) to choose node i with degree $k_i$ is given by
$$\Pi(i) = \frac{k_i - \beta}{\sum_j (k_j - \beta)}, \quad \beta \in (-\infty, 1). \qquad (3.9)$$
The parameter β can be adjusted such that nodes have a stronger preference for being
connected to high-degree nodes than predicted by the linear preference of the BA
model given by Equation 3.2 (on page 39).
Bu et al showed that the GLP model, using the recommended parameter values
(ρ = 0.66, m = 1, m0 = 10 and β = 0.6447), resembles the characteristic path
length and the clustering coefficient of a BGP AS graph measured in September
2000.
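A minimal sketch of the GLP growth rules with the recommended parameter values is given below. It is an illustration under stated assumptions, not the implementation of Bu et al: the seed network is taken to be a chain of m0 nodes (which uses m0 − 1 links), both ends of a new internal link are drawn with the preference of Equation 3.9, and self-loops and duplicate links are avoided. The function and helper names are hypothetical.

```python
import random

def glp_model(n_nodes, m=1, m0=10, rho=0.66, beta=0.6447, seed=None):
    """Grow a GLP-style network; returns (links, degree)."""
    rng = random.Random(seed)
    degree = {i: 0 for i in range(m0)}
    links = set()

    def add_link(u, v):
        links.add((min(u, v), max(u, v)))
        degree[u] += 1
        degree[v] += 1

    # Assumption: the m0 seed nodes form a chain, which uses m0 - 1 links.
    for i in range(m0 - 1):
        add_link(i, i + 1)

    def pick(exclude=frozenset()):
        """Draw a node with probability proportional to k_i - beta (Eq. 3.9)."""
        nodes = [n for n in degree if n not in exclude]
        weights = [degree[n] - beta for n in nodes]
        return rng.choices(nodes, weights=weights)[0]

    while len(degree) < n_nodes:
        if rng.random() < rho:
            for _ in range(m):  # add m links between existing nodes
                u = pick()
                taken = {n for n in degree if (min(u, n), max(u, n)) in links}
                taken.add(u)
                if len(taken) < len(degree):
                    add_link(u, pick(exclude=taken))
        else:  # add one new node linked to m different old nodes
            targets = set()
            while len(targets) < m:
                targets.add(pick())
            new = len(degree)
            degree[new] = 0
            for t in targets:
                add_link(new, t)
    return links, degree

links, degree = glp_model(60, seed=3)
print(len(degree), "nodes,", len(links), "links")
```

Because every node has a degree of at least one and β < 1, all weights $k_i - \beta$ stay positive, and subtracting β shifts the preference further towards high-degree nodes than the linear preference of the BA model.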
3.3.11 Generalised Network Growth Model
The Generalised Network Growth (GNG) Model [45, 97] is similar to the GLP
model. The basic idea of the GNG model is to allow both the addition of a vertex
(with probability p) and the addition of a link (with probability 1 − p), but the
model applies a new preference scheme. According to its definition, at each
time-step,
• either a node is added and linked with node i with probability
$$\Pi(i) = p \cdot \frac{k_i}{\sum_{n=1}^{N} k_n},$$
• or a link is added (if absent) between nodes i and j, which are already
present in the system, with probability
$$\Pi(i) = (1 - p) \cdot \frac{k_i}{\sum_{n=1}^{N} k_n} \cdot \frac{|k_i - k_j|}{\sum_{n \neq i} |k_i - k_n|}.$$
The resulting network is a scale-free one, with the power-law exponent
$\gamma(p) = 2 + \frac{p}{2-p}$. From the above rules, the case p = 1 (no link creation) corresponds
to a traditional BA model where only one connection is added per time-step.
This model exhibits some agreement with the Internet measurements for the de-
gree distribution, the betweenness distribution, the clustering coefficient and the
correlation functions for the degrees. However the growth dynamics of the GNG
model are not supported by the real measurements.
3.3.12 Highly Optimised Tolerance Model
Carlson et al [46, 98] introduced another mechanism for generating power-law
distributions, referred to as Highly Optimised Tolerance (HOT), which is moti-
vated by biological organisms and advanced engineering technologies. Their focus
is on systems which are optimised, either through natural selection or engineering
design, to provide robust performance despite an uncertain environment. They
suggest that power-laws in these systems are due to tradeoffs between yield, cost
of resources, and tolerance to risks. The characteristic features of HOT systems
include: 1) high efficiency, performance, and robustness to designed-for uncer-
tainties; 2) hypersensitivity to design flaws and unanticipated perturbations; 3)
nongeneric, specialised, structured configurations; and 4) power-laws.
3.4 Discussions
3.4.1 Structure-Based Models vs Degree-Based Models
Following the long-held belief that the Internet is hierarchical, the network topol-
ogy generators most widely used by the Internet research community, e.g. the
Tiers model and the GT-ITM model, create networks with a deliberately hierar-
chical structure.
However, in 1999 Faloutsos et al [32] revealed that the Internet’s degree dis-
tribution is a power-law and Tangmunarunkit et al [99] showed that the degree
distributions produced by structure-based topology generators are not power-laws.
Since then the research community has largely dismissed the structure-based mod-
els as inadequate and proposed new network generators that attempt to generate
graphs with power-law degree distributions.
Tangmunarunkit et al [99, 100] also discovered, much to their surprise, that
network generators based on the degree distribution more accurately capture the
Internet large-scale structure (such as the hierarchical structure measured by Sub-
ramanian et al [78]). However their judgements were based on simple qualita-
tive comparisons and heuristic assumptions. Tangmunarunkit et al and other
researchers recognised [20, 21] that there is a need for further studies to charac-
terise the network topology structures.
One objective of this thesis is to provide parameters to quantitatively char-
acterise and differentiate the hierarchical structure of Internet-like scale-free net-
works.
3.4.2 Accuracy vs Simplicity
Since the discovery of the power-law degree distribution in the Internet, the num-
ber of models trying to explain the power-law has been growing very rapidly.
However, there is still no Internet evolution model that would be satisfactory
from both the physical and networking standpoints [101]. As a result, the laws
governing the Internet evolution remain unclear.
The Barabasi-Albert (BA) model and its derivatives, popular among physicists,
have been widely criticised by the networking community for being too general,
not incorporating any domain specifics, and, hence, failing to predict correctly
many characteristics of the Internet topology and evolution. For example, by
examining the AS graph data sets from the Topology Project of the University
of Michigan, Chen et al [74] show that the available historical data of the
AS-level Internet do not support the connectivity-based dynamics assumed in the
BA model, and that the detailed dynamics underlying the BA modelling approach
do not explain the complex structure of the AS maps. The modified BA models have
similar problems. The same type of argument has been actively used against the
BA model by biologists.
On the other hand, the models proposed by the networking community try to
incorporate Internet evolution specifics by introducing a number of non-physical
parameters allowing one to easily fit the output of a model to the observed data
(e.g. [102]). It is easy to see that any model with a sufficient number of external
parameters can be forced to produce any required output by parameter
manipulation. A model can be of some theoretical value only when all its parameters
can be expressed via physical variables.
All the existing Internet models focus only on selected network properties and
no model is capable of accurately capturing all the relevant properties of the
Internet topology. Furthermore it is uncertain which model is better than the
others, and researchers are not even sure whether it is feasible at all to
accurately reproduce the Internet topology with an evolving model using fairly
simple and realistic mechanisms.
Because of the above inadequacy and uncertainty of the research on the
Internet topology, random networks and regular lattice graphs are still often used
by the Internet engineering community in practical studies on routing behaviours
and protocol simulations [103].
Another objective of this thesis is to provide realistic models to accurately
reproduce the AS-level Internet topology.
3.5 Summary
This Chapter introduces the recent measurements of the AS-level Internet topol-
ogy and a number of the Internet topology generators. This Chapter also discusses
the challenges in parameterising and modelling the Internet topology and sets out
the immediate context for this research.
Chapter 4
Rich–Club Phenomenon
4.1 Introduction
Inspired by detailed measurements on the Internet hierarchical structure, this
chapter introduces the concept of the rich-club phenomenon, which describes an
overlooked hierarchical structure of the AS-level Internet in which high-degree
nodes are tightly interconnected with each other. Two metrics are provided to
quantitatively characterise this structural property.
4.1.1 Internet Hierarchical Structure
It is well-known that the Internet topology has a hierarchical structure. However
the description of this structure is merely qualitative and vague. Recently, based
on measurements, Subramanian et al [78] have classified and identified the exact
details of the tier structure of the AS-level Internet topology. Subramanian et al
studied the topology structure in terms of customer-provider and peer-peer
relationships between autonomous systems as manifested in the BGP routing
policies. Using heuristic arguments based on the commercial relationships [104]
between ASes, they proposed a five-level classification of ASes.
Dense Core: For every AS present in the dense core, all of its peers and its
provider should also be present in the core. The core of the network should
include the small number of so-called tier-1 providers. In practice, the term
Tier-1 provider is loosely defined as a “large” AS or as an AS that does not
have any upstream provider. These ASes could be identified by looking for
all provider-free nodes. The dense core consists of 20 ASes, including the
large Internet Service Providers (ISP) such as Genuity, Sprint, AT&T, and
UUNet. The top 20 ASes have a very dense connectivity of 312 peering
links. The top 15 of the 20 ASes almost form a clique with only three links
missing from the clique.
Transit Core: ASes in the transit core are large national providers and hosting
companies that have peering relationships with many of the ASes in the
dense core.
Outer Core: The remaining ASes in the core are classified as the outer core. The
members of the outer core typically represent regional ISPs which have a few
customer ASes and a few peering relationships with other such regional ISPs.
Small Regional ISPs: Small Regional ISPs are ASes that have one or more
customers and no peering relationships with other ASes.
Customers: Customers are those stub networks which are origins and sinks of
traffic and which do not carry any transit traffic.
Table 4.1: Distribution of ASes in the Internet hierarchy [78]
Level                      Number of ASes
Dense core (0)             20
Transit core (1)           129
Outer core (2)             897
Small regional ISPs (3)    971
Customers (4)              8898
4.1.2 Connectivity Of The Core
Subramanian et al showed that the graph constructed from ten BGP dumps
on 18 April 2001 has 10,915 ASes, of which 8,898 are customers and 971 are
small regional ISPs (see Table 4.1). The remainder of the network is the core,
consisting of a connected component with just 1046 ASes and 6249 connections.
This represents approximately 25% of the total number of connections in the
graph. The nodes in the core have an average degree of 6. The key result is that
the Internet has a tier structure, where the Tier 1 consists of a “core” of ASes
which are well interconnected to each other.
However the network research community did not pay sufficient attention to
this hierarchical property, because the approach used in Subramanian et al’s
analysis has a number of limitations. Firstly it is a time-consuming process,
which involves scrutinising large amounts of data from various information sources.
Secondly it is based on a number of heuristic assumptions on the commercial
relationships between network elements. Thirdly the result is represented as
several tables of numbers. Thus this analysis only applies to this specific case and
provides no comparison with other networks.
4.1.3 Motivation
The author noticed Subramanian et al’s work and was very interested in the fact
that highly connected nodes are tightly interconnected with each other.
It is known that the AS-level Internet has a power-law degree distribution,
therefore it contains a small number of nodes which have very large numbers of
links. The AS-level Internet also exhibits the disassortative mixing behaviour [66,
67], where high-degree nodes tend to connect to nodes with low degrees. However
neither the power-law degree distribution nor the disassortative mixing indicates
whether the high-degree nodes are tightly or loosely interconnected with each
other.
Figure 4.1: Two disassortative networks. (a) High-degree nodes are loosely interconnected. (b) High-degree nodes are tightly interconnected.
As shown in Figure 4.1, two networks having similar degree distributions and
disassortative mixing behaviours can exhibit different structures. In Figure 4.1-a
the high-degree nodes are not directly interconnected, whereas in Figure 4.1-b the
high-degree nodes are tightly interconnected. One can see that this structural
difference is relevant because the network routing is much more efficient when the
high-degree nodes have direct connections among each other.
The author realised that Subramanian et al’s measurement of the connectivity
of the core actually revealed a structural property that had not been characterised
by the existing topological parameters. The author recognised that there was
a need for further studies to characterise this critical structural feature, and
expected that measuring the interconnectivity among the high-degree nodes using
a quantitative metric might provide a clue for a deeper understanding of the
Internet topology, namely to answer the following two questions:
• How to quantitatively characterise the rich-club phenomenon?
• Do networks having power-law degree distributions, such as maps of the AS-
level Internet and synthetic scale-free networks generated by models, show
similar hierarchical structures?
4.2 Rich-Club Phenomenon
In 2002 the author introduced the concept of rich-club phenomenon [105] to de-
scribe the above hierarchical structure of the AS-level Internet. The rich-club
phenomenon has two meanings. Firstly the network contains a small number of
highly connected nodes. These nodes are called “rich” nodes. Secondly the rich
nodes are tightly interconnected with each other and form a tight group, which
is called the “rich-club”. The term rich-club alludes to a familiar phenomenon
in human society, where rich upper-class people form an exclusive
club to promote social and business connections among the club members.
Note that the rich-club phenomenon does not imply that the majority of the
rich nodes’ links are directed to other club members. Indeed, rich nodes have
very large numbers of links and only a few of them are enough to provide the
connectivity to other club members, whose number is anyway small.
After many calculations and tests on various possible candidate parame-
ters [106, 107], the author provided two metrics to quantitatively characterise the
rich-club phenomenon, which are the rich-club connectivity and the node-node
link distribution. These two parameters are not associated with any heuristic as-
sumption but based only on the network connectivity information. The calculation
of the metrics is fairly simple and their topological meanings are straightforward.
The Four Networks
In this section, the two metrics of the rich-club phenomenon are defined and
measured in four different networks, which include an Extended BGP AS graph
measured in May 2001 [30] and three synthetic networks generated by the
Barabasi-Albert (BA) model, the Fitness BA (FBA) model and the Inet-3.0 model.
For each model, ten networks are generated with different seed numbers and all
results are averaged over the ten networks.
As shown in Table 4.2, the four networks have the same number of nodes and
similar numbers of links (except the Inet-3.0 network). Figure 4.2 shows that the
cumulative degree distributions Pcum(k) of the four networks follow power-laws.
The Pcum(k) of the AS graph is characterised by a power-law of slope −1.22, which
Table 4.2: Network parameters
                          AS Graph   Inet-3.0   Fitness BA   BA Model
Number of nodes, N        11461      11461      11461        11461
Number of links, L        32730      24171      34366        34366
Average degree, 〈k〉        5.7        4.2        6.0          6.0
Max. degree, kmax         2432       2010       1793         329
Power-law exponent, γ     2.22       2.22       2.255        3.0
Figure 4.2: Cumulative distribution of degree. For each model, ten networks are generated and averaged.
[Log-log plot of the cumulative distribution against degree for the Extended BGP AS graph, the Inet-3.0 model, the FBA model and the BA model.]
yields the power-law degree distribution P(k) ∼ k^{−γ}, γ ≃ 2.22. Table 4.2 shows
that the Inet-3.0 model and the FBA model have power-law exponents similar to
that of the AS graph, whereas the power-law exponent of the BA model is 3.0.
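The step from the slope of the cumulative distribution to the exponent γ can be made explicit. The following is a standard continuous-approximation sketch; treating the degree as a continuous variable and writing the normalisation constant as C are assumptions made here for illustration.

```latex
P_{cum}(k) \;=\; \sum_{k' \ge k} P(k')
\;\approx\; \int_{k}^{\infty} C\,k'^{-\gamma}\,dk'
\;=\; \frac{C}{\gamma - 1}\,k^{-(\gamma - 1)}, \qquad \gamma > 1 .
```

A cumulative slope of −1.22 therefore corresponds to γ − 1 = 1.22, that is γ = 2.22.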
The author chose these three models for comparison because the BA model is the
most widely-studied scale-free model, the FBA model generates networks with a
power-law exponent similar to that of the AS graph, and the Inet-3.0 model is
designed to resemble the AS graph’s degree distribution. Notice that the author
is not trying to characterise all the existing power-law network generators, but
to show that it is possible to distinguish between them by studying
the properties of the rich-club.
4.2.1 Rich-Club Connectivity
A quantitative assessment of the rich-club phenomenon is obtained by measuring
the rich-club connectivity φ, defined as the fraction of allowable links¹ that
actually exist among members of a rich-club. The rich-club membership is specified
in two ways: nodes with degrees higher than k (“guys richer than k”), or nodes
with ranks no greater than r (“the top r richest guys”). Thus the rich-club
connectivity can be plotted as a function of node degree or node rank. In order to
be independent of the network size, the rich-club connectivity is often plotted as
a function of node rank that is normalised by the number of network nodes. The
rich-club connectivity measures how well the members of the rich-club “know”
each other. A rich-club connectivity of 100% means that every member has a
direct link to every other member. A lower percentage means fewer connections
between them.
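The definition above can be sketched in a few lines of code. This is a minimal illustration and not the author's implementation: the function name `rich_club_connectivity`, the edge-list input format and the rule for breaking ties between nodes of equal degree are all assumptions, and the input is assumed to be a simple undirected graph without duplicate links or self-loops.

```python
def rich_club_connectivity(edges, r):
    """Fraction of the r(r-1)/2 allowable links that exist among the
    r highest-degree ("richest") nodes of an undirected graph."""
    if r < 2:
        raise ValueError("a club needs at least two members")
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Rank nodes by decreasing degree; ties are broken by node label,
    # which is an arbitrary choice made for this sketch.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    club = set(ranked[:r])
    links_inside = sum(1 for u, v in edges if u in club and v in club)
    return links_inside / (r * (r - 1) / 2)


# Two hypothetical toy graphs: hubs 0 and 1 are not linked in the first,
# while hubs 0, 1 and 2 form a triangle in the second.
loose = [(0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]
tight = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]

print(rich_club_connectivity(loose, 2))  # no link between the two hubs
print(rich_club_connectivity(tight, 3))  # the three hubs form a clique
```

To plot φ against normalised rank r/N, the function would be evaluated for a range of r and each value paired with r divided by the number of nodes.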
Figure 4.3: Rich-club connectivity φ(r/N) as a function of normalised rank r/N.
[Log-log plot of rich-club connectivity against normalised rank (r/N) for the AS graph, Inet-3.0, Fitness BA and BA model.]
Figure 4.3 shows the rich-club connectivity φ(r/N) as a function of normalised
rank r/N. The figure illustrates that in the four networks, the rich-club subgraphs
formed by nodes of higher degrees are progressively more interconnected. However
it is clear that the four networks exhibit profound structural differences in the
tendency of high-degree nodes to be well interconnected with each other. For
example the rich nodes of the AS graph are significantly more tightly interconnected
than those of the three synthetic networks. As shown in Table 4.3, the top 1% rich
nodes in the AS graph have 32% of the allowable links, compared with φ(0.01) =
18% for the Inet-3.0 model and only φ(0.01) = 5% for the BA model and the Fitness
BA model.
¹ The number of allowable links in an n-node subgraph is n(n − 1)/2.
Table 4.3: Rich-club properties
                          AS Graph   Inet-3.0   Fitness BA   BA Model
φ(r/N = 0.01)             32%        18%        5%           5%
∑rj l(ri ≤ 5%, rj)        28602      22620      20929        15687
l(ri ≤ 5%, rj ≤ 5%)       8919       3697       1426         1511
φ(r/N = 0.01) is the rich-club connectivity among the top 1% richest nodes.
∑rj l(ri ≤ 5%, rj) is the number of links connecting to the top 5% rich nodes.
l(ri ≤ 5%, rj ≤ 5%) is the number of links connecting among the top 5% rich nodes.
4.2.2 Node-Node Link Distribution
The node-node link distribution is introduced to provide a more detailed view
of the network rich-club structure.
Network nodes are divided into subsets according to their ranks, for example
ranks are normalised by the total number of nodes and divided into 5% bins. Then
the node-node link distribution l(ri, rj) is defined as the number of links connecting
nodes in the subset ri to nodes in the subset rj, where ri ≤ rj. Figure 4.4
illustrates the node-node link distribution l(ri, rj) against the corresponding rank
bins ri and rj.
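The binning procedure described above can be sketched as follows. The function name `node_node_link_distribution`, the zero-based bin indices and the tie-breaking rule for equal degrees are assumptions made for illustration; the input is assumed to be a simple undirected edge list.

```python
from collections import Counter


def node_node_link_distribution(edges, n_bins=20):
    """Count links between rank bins. Bin 0 holds the richest nodes;
    n_bins=20 corresponds to 5% bins of normalised rank.
    Returns a Counter keyed by (i, j) with i <= j."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Rank by decreasing degree; ties broken by node label (arbitrary choice).
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    n = len(ranked)
    # Integer arithmetic maps zero-based rank position to its bin index.
    bin_of = {node: (pos * n_bins) // n for pos, node in enumerate(ranked)}
    counts = Counter()
    for u, v in edges:
        i, j = sorted((bin_of[u], bin_of[v]))
        counts[(i, j)] += 1
    return counts


# Hypothetical four-node example split into two bins (top 50%, bottom 50%).
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
dist = node_node_link_distribution(edges, n_bins=2)
print(dist[(0, 0)], dist[(0, 1)], dist[(1, 1)])
```

Under these conventions, the quantity l(ri ≤ 5%, rj ≤ 5%) used in the tables corresponds to the entry (0, 0) when n_bins is 20.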
In the Extended BGP AS graph (Figure 4.4a), rich nodes (see columns in the
row of ri = 5%) are connected preferentially to other rich nodes, where the number
of links interconnecting the top 5% rich nodes (the far corner column) is
significantly larger than the numbers of links connecting the rich nodes to other,
less rich nodes.
57
l(ri,rj)
9000
05%
100%
5%
100%
rjri
Ext. BGP AS graph
(a)
l(ri,rj)
4000
05%
100%
5%
100%
rjri
Inet-3.0 model
(b)
l(ri,rj)
2000
05%
100%
5%
100%
rjri
BA model
(c)
l(ri,rj)
2000
05%
100%
5%
100%
rjri
FBA model
(d)
Figure 4.4: Node-node link distribution.
The node-node link distribution of the Inet-3.0 (Figure 4.4b) is similar to that
of the AS graph, however, the number of links interconnecting among the top
5% rich nodes (far corner, 3697 links) is significantly smaller than that of the AS
graph (8919 links, see Table 4.3).
The node-node link distributions of the BA and the Fitness BA graphs (Fig-
ure 4.4c, 4.4d) are fundamentally different from that of the AS graph. The top 5%
rich nodes of the BA and the Fitness BA graphs are connected to all node sub-
sets with similar probabilities regardless of the rank range of subsets. Networks
generated by these two models do not contain a tightly interconnected rich-club
at all.
4.3 Discussion
The rich-club phenomenon describes a hierarchical property of the AS-level In-
ternet that high-degree nodes are tightly interconnected with each other. Un-
til recently this structural feature has been overlooked by the network research
community. The author’s novel contribution is the introduction of the rich-club
connectivity and the node-node link distribution, which for the first time pro-
vide a realistic way to quantitatively characterise and differentiate this structural
property of networks having power-law degree distributions. Results show that
synthetic scale-free networks generated by degree-based models may exhibit dif-
ferent hierarchical structures.
4.3.1 Rich-Club Subgraph
Figure 4.5: Degree distribution inside the rich-club subgraph consisting of the top 1% rich nodes of the AS graph.
[Plot of distribution (0 to 0.06) against degree (0 to 100).]
As shown in Figure 4.5, if the rich-club comprises the top 1% rich nodes of
the Internet AS graph, the probability distribution of degrees among the club
members is not a power-law, in fact it peaks around degree k = 25. Calculation
shows that the average distance between rich nodes is 1.73 hops, which is very
small and means if two club members do not have a direct link between them,
very likely they share a neighbour member.
4.3.2 Rich-Club Phenomenon Is Relevant
The rich-club consists of highly connected nodes, which are well interconnected
with each other, and the average hop distance among the club members is very
small (1 to 2 hops). The rich-club is a “super” traffic hub of the network and the
Internet’s disassortative mixing property ensures that peripheral nodes are always
near the hub. These two structural properties contribute to the routing efficiency
of a network.
Modelling the rich-club phenomenon is relevant [108], because an Internet
model that does not reproduce the properties of the rich-club will underestimate
the actual network’s routing efficiency (shortest path length) and routing
flexibility (alternative reachable paths), and will also overestimate the network
robustness under node-attack [77]. Chapter 6 investigates the impact of the
Internet rich-club structure in more detail.
4.3.3 Modelling The Rich-Club
Results show that the Inet-3.0 model does not exhibit the rich-club phenomenon as
strongly as the Extended BGP AS graph. The reason is that the Inet-3.0 model is
designed to resemble the original BGP AS graph. For example, networks generated
by the model typically have 27% fewer links than the Extended BGP AS graph.
The BA model and Fitness BA model generate strict power-law degree
distributions, which are very different from that of the AS-level Internet. Moreover,
they do not show the rich-club phenomenon of the AS graph at all. This is due to
the growth dynamics of the models. In both models, new links are brought into
the system by the addition of new nodes. New nodes are preferentially connected
to high degree nodes. Thus inter-rich links can only appear when some new nodes
grow into rich nodes. However, due to the preferential attachment, the probabil-
ity for a new node to become a rich node decreases as the network grows. As a
result, rich nodes are not well interconnected between each other. This suggests a
simple modification to these models to generate a rich-club: as the network grows,
new internal links appear which are preferentially attached between the existing
nodes. An example is the Interactive Growth model, which will be introduced in
Chapter 5.
The above analysis on the three network models demonstrates that the rich-
club connectivity is useful in revealing structural details of complex networks
and provides a new perspective for analysing the growth mechanisms of evolving
network models. In the following chapters, the rich-club connectivity is used as
both a new criterion for validating network structures and a practical guideline
for proposing new models.
4.4 Summary
The rich-club phenomenon describes the hierarchical structure of the AS-level
Internet where high-degree nodes are tightly interconnected with each other. This
structural property is quantitatively characterised by the rich-club connectivity
and the node-node link distribution. The calculation of the two metrics is simple
and solely based on graph connectivity information. The rich-club connectivity
is a critical complement to the existing topology parameters to explicitly and
thoroughly characterise large-scale complex network structures and it provides a
new criterion for network models.
Chapter 5
Interactive Growth Model
5.1 Introduction
Chapter 4 shows that the rich-club connectivity quantitatively characterises the
hierarchical structure of the AS-level Internet and that a number of degree-based
Internet models do not reproduce the rich-club connectivity of the actual network.
This chapter introduces the Interactive Growth (IG) model [109, 110], which
uses a growth mechanism that is based on observations on the Internet history
data. The model is validated against an Extended BGP AS graph and is also
compared with a number of other Internet models. Results show that
the IG model compares favourably with other models because it closely resembles
both the power-law degree distribution and the rich-club connectivity of the
AS-level Internet. The chapter also discusses the reasons for the
topological differences between the network models.
The IG model, as an example of networks containing a rich-club, will be used
in the next chapter to investigate the impact of the network structures on the
network behaviours. The IG model is also the precursor of the Positive Feedback
Preference (PFP) model which will be introduced in Chapter 8.
5.2 Interactive Growth Model
The Interactive Growth (IG) model modifies the Barabasi and Albert (BA) model
(see Section 3.3.5 on page 39) by using a so-called interactive growth mechanism,
which is based on a number of dynamic behaviours observed [74, 66, 67, 79] on the
Internet history data. Firstly there are two main operations that account for the
evolution of the Internet graph: the addition of new nodes and the appearance
of new internal links between already existing nodes (old nodes). Secondly the
majority of new nodes are added to the system by attaching them to only one or
two old nodes. Thirdly the degree distribution of the AS-level Internet is not a
strict power-law, for example it has more nodes with degree two than nodes with
degree one (P(k = 2) > P(k = 1)). Lastly the majority of nodes (those with low
degrees) in the AS-level Internet exhibit a linear preferential attachment as
described in the BA model (see Equation 3.2 on page 39).
Figure 5.1: The interactive growth mechanism of the IG model. a) A new node is attached to one old node and at the same time-step two new internal links appear. b) A new node is attached to two old nodes and one new internal link appears.
The interactive growth mechanism is shown in Figure 5.1. The IG model starts
with a small random network. At each time-step,
• with probability p ∈ [0, 1] (see Figure 5.1-a), a new node is attached to one
old node (host node), and at the same time two new internal links appear
connecting the host node to two other old nodes (peer nodes),
• with probability 1 − p (see Figure 5.1-b), a new node is attached to two host
nodes and one new internal link appears connecting one of the two host nodes
to one peer node.
The linear preference probability given by the BA model is used for the attach-
ment of new nodes and the appearance of new internal links. From numerical
simulations, the author found that p = 0.4 produces the best result to fit the
degree distribution and the rich-club connectivity of the AS-level Internet.
The interactive growth mechanism satisfies all the above observations on the
Internet evolution. Since the two growth operations are interdependent, at each
time-step the number of nodes of the network increases by one and the number
of links increases by three. Therefore the model produces a ratio of links
to nodes (L/N ≃ 3) similar to that of the AS-level Internet.
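The two growth operations described in this section can be sketched as a small generator. This is a hedged illustration, not the author's code. The function name `ig_model`, the seed network (a ring of `m0` nodes), the random choice of which host receives the internal link in the second case, and the rule of skipping an internal link when no unlinked peer is found within `max_tries` attempts are all assumptions that the text does not specify.

```python
import random


def ig_model(n, p=0.4, m0=5, seed=None, max_tries=100):
    """Grow an n-node network with the interactive growth mechanism.
    Returns an adjacency dict {node: set(neighbours)}."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(m0)}
    stubs = []  # node i appears deg(i) times, so rng.choice is linear preference

    def add_link(u, v):
        adj[u].add(v)
        adj[v].add(u)
        stubs.extend((u, v))

    def pick_peer(host):
        # Preferentially choose an old node not yet linked to the host.
        for _ in range(max_tries):
            cand = rng.choice(stubs)
            if cand != host and cand not in adj[host]:
                return cand
        return None  # assumption: give up and skip this internal link

    for i in range(m0):  # assumed seed network: a ring of m0 nodes
        add_link(i, (i + 1) % m0)

    for new in range(m0, n):
        hosts = [rng.choice(stubs)]
        n_internal = 2                 # with probability p: one host, two internal links
        if rng.random() >= p:          # with probability 1 - p: two hosts, one internal link
            second = rng.choice(stubs)
            while second == hosts[0]:
                second = rng.choice(stubs)
            hosts.append(second)
            n_internal = 1
        adj[new] = set()
        for h in hosts:
            add_link(new, h)
        source = rng.choice(hosts)     # assumption: the host that gains internal links
        for _ in range(n_internal):
            peer = pick_peer(source)
            if peer is not None:
                add_link(source, peer)
    return adj


net = ig_model(2000, p=0.4, seed=1)
n_nodes = len(net)
n_links = sum(len(nbrs) for nbrs in net.values()) // 2
print(n_nodes, n_links, round(n_links / n_nodes, 2))
```

Each time-step adds one node and at most three links, so the ratio of links to nodes stays close to three, as stated above. Internal links that are skipped under the fallback rule make the ratio slightly smaller.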
5.3 Model Validation
The IG model is compared against an Extended BGP AS graph measured in May
2001. The IG model is also compared with synthetic networks generated by other
Internet models, such as the BA model, the Inet-3.0 model (see Section 3.3.4 on
page 39) and the GLP model (see Section 3.3.10 on page 44). For each model, ten
networks are generated with different seed numbers and all results are the average
over the ten networks.
As shown in Table 5.1, all the model networks have the same number of nodes
and similar numbers of links as the AS graph. The GLP(1) network is generated
using parameters of ρ = 0.66, m = 1, m0 = 10 and β = 0.6447, as recommended
by Bu et al [44]. The GLP(2) network uses the same parameters except β = 0,
which makes the GLP model’s generalised linear preference of Equation 3.9 (on
page 45) equivalent to the linear preference of Equation 3.2 (on page 39) used
by the BA model and the IG model.
Table 5.1: Network properties
            N       L       γ      kmax   〈k〉    P(k = 1)   P(k = 2)   P(k = 3)
AS graph    11461   32730   2.22   2432   5.7    28.9%      40.3%      11.6%
IG model    11461   34363   2.22   842    6.0    26.0%      33.8%      10.5%
GLP(1)      11461   34363   2.20   517    6.0    68.4%      11.3%      5.1%
GLP(2)      11461   34363   2.20   524    6.0    52.0%      16.3%      7.9%
Inet-3.0    11461   24171   2.22   2010   4.2    40.0%      36.7%      8.2%
BA model    11461   34363   3.0    329    6.0    0%         0%         40.0%
N - number of nodes. L - number of links. γ - power-law exponent. kmax - maximum degree.
〈k〉 - average degree. P(k) - degree distribution, percentage of nodes with degree k.
5.3.1 Degree Distribution
5.3.1.1 Degree Distribution
Figure 5.2: Degree distribution. For each model, ten networks are generated and averaged.
[Log-log plot of P(k) against k for the AS graph, IG model, GLP (1), GLP (2), Inet model and BA model.]
Figure 5.2 and Table 5.1 show that the IG model and the Inet-3.0 model closely
match the degree distribution of the AS graph, particularly the low-range degree
distributions, where the percentage of nodes with degree one P (1), is actually
smaller than the percentage of nodes with degree two P (2). The low-range degree
distribution is important because nodes with degree one and two account for more
than 70% of the total number of nodes in the AS graph.
The IG model is a dynamic growing model and it is the growth mechanism that
defines the model’s topological properties, including the degree distribution. The
Inet-3.0 model matches the AS graph’s degree distribution well because this
static model is designed to resemble the Internet measurements: links are
attached to nodes according to pre-assigned node degrees.
The BA model is based solely on the addition of new nodes. In order to obtain
a similar ratio of links over nodes as the AS-level Internet, each new node in the
BA model is attached to three old nodes (m = 3) and therefore P (1) = P (2) = 0.
Bu et al recommend the parameter m = 1 for the GLP model, thus each new
node is attached to only one old node. As a result the fractions of nodes with
degree one in the two GLP networks are significantly larger than that of the actual
network (see Table 5.1). For example, P(1) of the GLP(1) is as high as 68.4%,
which is more than twice that of the AS graph.
5.3.1.2 Degree vs Rank
Figure 5.3: Degree k as a function of rank r.
[Log-log plot of k against r for the AS graph, IG model, GLP (1), GLP (2), Inet model and BA model.]
Figure 5.3 shows degree k as a function of rank r on a log-log scale. The
AS graph has a nearly strict power-law relationship between degree and rank,
k ∼ r^{−0.85}. The curves of the two GLP networks are not power-laws. The BA
model exhibits a power-law behaviour between degree and rank, but the power-law
exponent is significantly different from that of the AS graph. The curve of
the Inet-3.0 network deviates from the AS graph between k = 10^1 and k = 10^3.
Apart from a few richest nodes (r ≤ 10^1), the IG model in general closely matches
the correlation between degree and rank of the AS graph.
5.3.2 Rich-club Phenomenon
Networks generated using the IG model and the GLP model should exhibit a
higher rich-club connectivity than the BA model, because new internal links added
in the IG model and the GLP model preferentially connect among already well
connected nodes.
5.3.2.1 Rich-Club Connectivity
Figure 5.4: Rich-club connectivity, φ(r/N), as a function of normalised rank, r/N.
[Log-log plot of φ(r/N) against r/N for the AS graph, IG model, GLP (1), GLP (2), Inet model and BA model.]
Figure 5.4 shows the rich-club connectivity φ(r/N) as a function of normalised
rank r/N on a log-log scale. The plot shows that only the IG model closely
matches the rich-club connectivity of the AS graph. The rich-club connectivities of
the Inet-3.0 model and the BA model are significantly lower than that of the AS
graph. It is interesting to notice that the rich-club connectivities of the two GLP
networks are higher than that of the AS graph. This means the rich nodes in
these two models are even more tightly interconnected with each other than in
the AS graph. For example, the AS graph and the IG model have φ(0.01) = 32%,
compared with φ(0.01) = 72% for the GLP(1) and φ(0.01) = 50% for the GLP(2).
5.3.2.2 Node-Node Link Distribution
Figure 5.5: Node-node link distribution l(ri, rj), which is normalised by L, the total number of links.
[Two 3D bar charts of l(ri, rj)/L (0% to 25%) against rank bins ri and rj (5% to 100%): a) AS graph; b) IG model.]
Figure 5.5 shows that the IG model closely resembles the node-node link
distribution of the Extended BGP AS graph.
Table 5.2: Node-node link distribution
                          AS graph   IG      GLP(1)   GLP(2)   Inet    BA
Number of links, L        32730      34363   34363    34363    24171   34363
∑_rj l(ri ≤ 5%, rj)       29602      26422   32376    29073    22620   15687
l(ri ≤ 5%, rj ≤ 5%)       8919       7806    16210    11540    3697    1511
∑_rj l(ri ≤ 5%, rj) is the number of links connecting to the top 5% rich nodes;
l(ri ≤ 5%, rj ≤ 5%) is the number of links connecting among the top 5% rich nodes.
In order to compare all the networks together, Figure 5.6 shows a simplified
version of the node-node link distribution, l(ri ≤ 5%, rj), which has only one
variable, rj, and illustrates where the top 5% rich nodes (ri ≤ 5%) are connected
to. Figure 5.6 shows that only the IG model reproduces the node-node link
distribution of the AS graph. The two GLP networks exhibit a rich-club phenomenon
notably stronger than the AS graph. As shown in Table 5.2, the GLP(1) has
l(ri ≤ 5%, rj ≤ 5%) = 16210 links connecting among the top 5% rich nodes,
nearly twice as many as the AS graph.
Figure 5.6: Node-node link distribution, l(ri ≤ 5%, rj). [Curves: GLP model (1), GLP model (2), AS graph, IG model, Inet-3.0 model, BA model.]
5.4 Discussion
Based on observations on the Internet history data, the IG model uses the in-
teractive growth mechanism, in which the attachment of new nodes and the ap-
pearance of new internal links are interdependent. It is the growth mechanism
that defines the topological structure of the model. The simple and dynamic
IG model compares favourably with other Internet topology generators because
it closely resembles both the degree distribution and the rich-club phenomenon
of the AS-level Internet. Networks generated using the IG model, as illustrated
in Figure 5.7, have already been used in simulation studies on the TCP packet
traffic [111, 112].
The IG model is different from other models that also use the appearance
of new internal links (see Chapter 3), such as Dorogovtsev and Mendes’ model,
Bu and Towsley’s Generalised Linear Preference (GLP) model and Caldarelli’s
Generalised Network Growth (GNG) model. These models have explored various
schemes of preference probability. However, these schemes are not supported by
measurements on the Internet. In addition, these models do not satisfy the growth
dynamics that have been observed in the Internet history data.
Figure 5.7: A network generated using the IG model. The size of a node is proportional to the number of its degree-one neighbours, which have been removed to simplify the graph.
5.4.1 Maximum Degree
The IG model still has problems. For example, the AS-level Internet features a
very large maximum degree (kmax = 2432, see Table 5.1), which is about three
times that of the IG model (842).
Figure 5.8 shows that the time-evolution of node degree in the BA model and
the IG model obeys a power-law, k(t) ∼ t^θ. As predicted by Barabasi et al [95],
θ of the BA model is 0.5. The author’s calculation shows that θ of the IG model
is 0.6. This means the node degree in the IG model increases at a higher rate
than in the BA model. The reason is that in the IG model old nodes (with high
degrees) have more chances to acquire new connections than those in the BA
model. As shown in Figure 5.1, at each time-step of the IG model, statistically 4
or 5 old nodes acquire new links, whereas at each time-step of the BA model, only
three old nodes acquire new links. As a result, although both models use the same
linear preference, the maximum degree of the IG model is higher than that of the
BA model (329).
Figure 5.8: Time-evolution: degree growth k(t) vs time t of a node added in an early time-step.
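One common way to estimate an exponent such as θ from a degree-growth curve is a least-squares fit of log k against log t. The sketch below illustrates that approach on synthetic data; it is not necessarily the procedure used for the calculation reported above, and the function name and sample values are assumptions.

```python
import math


def fit_power_law_exponent(ts, ks):
    """Slope of the least-squares line through the points (log t, log k)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(k) for k in ks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic curve following k(t) = 3 * t^0.5, the BA-like case.
ts = [100, 1000, 10000, 100000]
ks = [3 * t ** 0.5 for t in ts]
theta = fit_power_law_exponent(ts, ks)  # close to 0.5 for this synthetic input
```

On measured curves the fit is usually restricted to the range of t over which the log-log plot is visibly straight.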
The two GLP networks use the same growth mechanism and differ only in the
preference probability. The GLP(1) network uses the recommended parameter value
of β = 0.6447, for which the generalised linear preference increases “the
preference of being connected to high-degree nodes”. The GLP(2) network uses the
parameter value of β = 0, which makes the generalised linear preference equivalent
to the linear preference. However, as shown in Table 5.1, the maximum degrees of
the GLP(1) network (517) and the GLP(2) network (524) are similar. This implies
that the generalised linear preference does not effectively increase the maximum
degree of the generated network.
The author has noticed that, as shown in Table 4.2 on page 55, the Fitness
BA (FBA) model generates networks with a large maximum degree. The reason is that
when a high-degree node in the FBA model is assigned a high value of the fitness
parameter, the node obtains a much stronger ability to acquire links than other
nodes. This suggests a possible way of reproducing a large maximum degree by
increasing the high-degree nodes’ preference probability (to be stronger than the
generalised linear preference). Chapter 8 will study the maximum degree further.
5.4.2 Rich-Club Connectivity
The IG model closely reproduces the rich-club phenomenon of the AS-level Internet,
whereas the two GLP networks exhibit a rich-club phenomenon significantly
stronger than the actual network. The reason is that the IG model and the GLP
model have different numbers of new internal links being added during the network
growth, even though the two models contain the same numbers of nodes and
links.
In the IG model, the addition of new nodes (attached by new external links)
and the appearance of new internal links (between old nodes) are interdependent.
According to the interactive growth mechanism, at each time-step, statistically
the number of nodes increases by one and the number of links increases by three,
of which the number of newly added internal links is p × 2 + (1 − p) × 1 = 1.4,
where p = 0.4, and the number of newly added external links is 3 − 1.4 = 1.6.
Thus the ratio of new internal links over new external links is 1.4/1.6 = 0.875.
In the GLP model, the addition of new nodes and the appearance of new
internal links are independent. According to the GLP model’s growth mechanism,
if ρ = 0.66 and m = 1, statistically when the number of nodes increases by one, the
number of newly added external links also increases by one, whereas the number
of newly added internal links increases by two. Thus the ratio of new internal
links over new external links is 2/1 = 2, which is significantly larger than that of
the IG model.
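The two ratios can be checked with a short Python sketch. The IG figures follow the arithmetic given above. The GLP figures assume the usual step rule of that model (with probability ρ add m internal links, otherwise add one new node with m external links), which is not restated in this section; the function names are hypothetical.

```python
def ig_link_ratio(p=0.4, links_per_step=3):
    """Expected internal and external links per time-step of the IG model."""
    internal = p * 2 + (1 - p) * 1          # 1.4 for p = 0.4
    external = links_per_step - internal    # 1.6
    return internal, external, internal / external


def glp_link_ratio(rho=0.66, m=1):
    """Expected internal and external links per new node (assumed GLP step rule)."""
    # Each step adds a node with probability (1 - rho), so one new node takes
    # 1 / (1 - rho) steps on average, of which rho / (1 - rho) add internal links.
    internal = rho * m / (1 - rho)
    external = m
    return internal, external, internal / external


ig_internal, ig_external, ig_ratio = ig_link_ratio()
glp_internal, glp_external, glp_ratio = glp_link_ratio()
```

Under this assumption the GLP ratio evaluates to about 1.94, which the text rounds to 2, against 0.875 for the IG model.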
Due to the preferential attachment, new internal links tend to connect
already well connected nodes. Therefore networks generated by the GLP
model exhibit a rich-club phenomenon stronger than the IG model. The rich-club
connectivity (see Figure 5.4) of the GLP(1) network deviates from the AS graph
more than the GLP(2) network does. This is because the GLP(2) network uses
the linear preference probability while the GLP(1) network uses the generalised
linear preference, which increases the preference of being connected to
high-degree nodes.
The above discussion of the reasons for the topological differences between
the network models provides novel insights into how network growth mechanisms
shape the generated topologies.
5.5 Summary
The Interactive Growth model is based on observations on the Internet history
data. This simple and dynamic model closely resembles both the degree distri-
bution and the hierarchical structure characterised by the rich-club connectivity.
The IG model is a good step forward towards realistically modelling the Internet
topology.
Chapter 6
Structure Affects Functions
6.1 Introduction
Chapter 4 shows that networks having similar degree distributions may exhibit
different hierarchical structures characterised by the rich-club connectivity.
Chapter 5 proposed the Interactive Growth (IG) model, which can reproduce both the
degree distribution and the rich-club connectivity of the AS-level Internet. This
Chapter investigates whether the rich-club structure is relevant and how the
network structure affects the network functionality [108].
Three network behaviours are analysed and simulated, which are the network
routing efficiency, redundancy and robustness. The analyses and simulations are
based on three networks, including a Traceroute AS graph measured in April
2002 [28, 113] and two synthetic networks generated by the Interactive Growth
(IG) model and the Fitness Barabasi-Albert (FBA) model (see Section 3.3.6 on
page 41). For each model, ten networks are generated with different seed numbers
and all results are the average over the ten networks.
As shown in Table 6.1, Figure 6.1 and Figure 6.2, the three networks
have similar sizes and power-law degree distributions. The IG model is an
example of networks having a densely interconnected rich-club like the AS graph,
whereas the FBA model is an example of networks that do not exhibit the rich-club
phenomenon.
Table 6.1: Network Parameters
                                 AS graph   IG graph   FBA graph
Number of nodes, N               11122      11122      11122
Number of links, L               30054      33349      33349
Average degree, 〈k〉             5.4        6.0        6.0
Max. degree, kmax                2839       842        1793
Power-law exponent, γ            2.2        2.22       2.255
P(k = 1)                         26.1%      26.0%      0
P(k = 2)                         37.9%      33.3%      0
P(k = 3)                         14.0%      10.3%      50.5%
Characteristic path length, l∗   3.13       3.56       3.86
Figure 6.1: Cumulative distribution of degree.
Figure 6.2: Rich-club connectivity.
6.2 Routing Efficiency
One of the important features of the AS-level Internet is that the network is a
small-world network, which features a very small characteristic path length (see
section 2.2.4). It is possible for a network with a small characteristic path length
to achieve better routing efficiency. Please note that the routing efficiency of a
network is not only determined by the network structure, but also by the routing
protocol and many other engineering factors.
As shown in Figure 6.3, the cumulative distributions of shortest path length of
the AS graph and the IG model exhibit similar patterns and they are displaced to
the left of the FBA model. As mentioned in Chapter 4, the tightly interconnected
rich-club of the Internet provides a large selection of shortcuts for the network
traffic. It is not surprising that, compared with the FBA model, the IG model
better models the shortest path length of the AS graph, because the IG model
reproduces the actual network’s rich-club structure.
Figure 6.3: Cumulative distribution of shortest path length. For each model, ten networks are generated and averaged.
However the IG model still does not accurately reproduce the characteristic
path length of the AS graph. The characteristic path length of the IG model is
0.43 hop longer than that of the AS graph. The 0.43 hop difference is notable in
terms of network routing efficiency considering the fact that the characteristic path
length of the networks is less than 4 hops. More details on accurately reproducing
the characteristic path length will be provided in Chapter 8.
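For reference, the characteristic path length can be computed with a breadth-first search from every node. The Python sketch below assumes the usual definition, the average shortest path length over all pairs of mutually reachable nodes, and an adjacency-list dictionary as input; both the function name and the representation are illustrative.

```python
from collections import deque


def characteristic_path_length(adj):
    """Average shortest path length over all ordered pairs of reachable nodes."""
    total = 0
    pairs = 0
    for source in adj:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


# Toy example: a star with one hub and three leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
l_star = characteristic_path_length(star)
```

In the star every leaf-to-leaf path passes through the hub, which is the same shortcut effect that the rich-club provides on a much larger scale.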
6.3 Network Redundancy
Cycles [114, 97, 115, 116] encode the redundant information in the network
structure. The numbers of short cycles (triangles and quadrangles) are relevant
properties because the multiplicity of paths between any two nodes increases with
the density of short cycles (note that an alternative path between two nodes can be
longer than their shortest path). The triangle coefficient, kt, is defined as the
number of triangles that a node shares. Similarly the quadrangle coefficient, kq,
is the number of quadrangles that a node shares.
The clustering coefficient c of a node (see section 2.2.6) can be expressed as a
function of the node’s degree k and triangle coefficient kt,
c = \frac{k_t}{k(k-1)/2}.    (6.1)
The reason that the author studied short cycles instead of the clustering
coefficient is that short cycles have the advantage of providing neighbour
clustering information of nodes with different degrees [117].
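Equation (6.1) can be illustrated with a minimal Python sketch. The graph is stored as a dictionary mapping each node to the set of its neighbours; the function names and the toy graph are assumptions made for illustration, and node labels are assumed to be comparable.

```python
def triangle_coefficient(adj, node):
    """Number of triangles the node shares: linked pairs among its neighbours."""
    nbrs = adj[node]
    return sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])


def clustering_coefficient(adj, node):
    """Equation (6.1): c = kt / (k(k - 1)/2); taken as 0 for nodes with k < 2."""
    k = len(adj[node])
    if k < 2:
        return 0.0
    return triangle_coefficient(adj, node) / (k * (k - 1) / 2)


# Node 0 has three neighbours, of which only 1 and 2 are linked to each other.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
kt_0 = triangle_coefficient(toy, 0)
c_0 = clustering_coefficient(toy, 0)
```

Here node 0 shares one triangle out of three possible neighbour pairs, so its clustering coefficient is 1/3.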
Table 6.2: Network Short Cycles
                                    AS graph   IG graph   FBA graph
Maximum triangle coef., kt−max      7482       4962       1191
Average triangle coef., 〈kt〉       12.7       10.0       0.6
Maximum quadrangle coef., kq−max    9648       9247       4638
Average quadrangle coef., 〈kq〉     227.4      108.0      10.4
Figure 6.4: Distribution of triangle coefficient.
Figure 6.5: Cumulative distribution of triangle coefficient.
Figures 6.4, 6.5, 6.6 and 6.7 and Table 6.2 show that the AS graph and the IG
model have significantly more triangles and quadrangles than the FBA model.
This implies that the AS graph and the IG model have more possible alternative
routing paths, are more flexible in traffic routing and hence show higher degrees
of network redundancy than the FBA model.
Figure 6.6: Distribution of quadrangle coefficient.
Figure 6.7: Cumulative distribution of quadrangle coefficient.
By reproducing the rich-club phenomenon, the IG model resembles the AS
graph’s network redundancy property. The reason is that the tightly intercon-
nected rich-club increases the number of short cycles in the network.
6.4 Network Robustness
Barabasi et al [77] showed that it is difficult to divide power-law topologies into
separate subnetworks by removing nodes at random (error), but it is very easy to
split them into subnetworks by removing specific nodes (attack, see Figure 6.8).
Barabasi et al’s study was based on the BA model, which does not exhibit the
rich-club phenomenon of the AS-level Internet.
Figure 6.8: Node attack.
Here the author studies whether the rich-club structure has an impact on the
network robustness in four scenarios:
• Node error – randomly remove a node and its links. Node error resembles
the scenario when a node is out of service due to unpredictable technical
problems, such as hardware failure.
• Node attack (see Figure 6.8) – first remove the best-connected node and
its links, then continue to select and remove nodes in decreasing order of their
degrees. Node attack resembles the scenario in the actual Internet when a
node, AS or router, has “collapsed” (out of service) due to infection by a
malicious virus, or is severely congested (denial of service) due to a targeted
traffic surge. Node attack vulnerability is of great interest for network research.
• Link error [118] – randomly remove a link.
• Link attack (see Figure 6.9) – remove the link connecting the best-connected
nodes, i.e. if a link connects node i and node j with degrees ki ≤ kj, then the
first link removed is the one with the largest ki.
Figure 6.9: Link attack.
The network robustness is measured by the size of the largest cluster in the
remaining network after the error or attack operations (see Figure 6.8), where a
cluster is defined as a subnetwork in which all nodes are reachable via paths of
links. Please note that in the real Internet, there are other engineering factors
which might affect the network robustness.
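The robustness measure for the two node scenarios can be sketched in Python as follows. The sketch assumes that the attack order is fixed by the degrees in the original network (the text does not say whether degrees are recalculated after each removal) and that S is normalised by the original number of nodes. All function names are illustrative.

```python
import random


def largest_cluster_size(adj, removed):
    """Size of the largest connected cluster once the given nodes are removed."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def node_attack_order(adj):
    """Nodes in decreasing order of their degree in the original network."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))


def node_error_order(adj, seed=0):
    """Nodes in a random order; the seed only makes the example repeatable."""
    return random.Random(seed).sample(list(adj), len(adj))


def robustness_curve(adj, order, steps):
    """Normalised largest-cluster size S after removing 0, 1, ..., steps nodes."""
    n = len(adj)
    return [largest_cluster_size(adj, order[:i]) / n for i in range(steps + 1)]


# Toy example: a star with one hub (node 0) and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
attack_curve = robustness_curve(star, node_attack_order(star), 2)
error_curve = robustness_curve(star, node_error_order(star), 2)
```

Removing the hub first leaves only isolated leaves, whereas a random removal usually hits a leaf and leaves the rest of the star connected.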
6.4.1 Node Error
Figure 6.10: Network robustness under node error. [S, normalised size of the largest cluster, against f, fraction of nodes removed randomly.]
Figure 6.10 is a plot of the normalised size of the largest cluster S, shown as a
fraction of the number of nodes in the original network against f , the fraction of
nodes randomly removed. The figure shows that the three networks display high
degrees of robustness under the node error. When 10% of nodes are randomly
removed, 90% of the networks can still communicate. The reason is that the
three networks have power-law degree distributions, which means most randomly
removed nodes are low-degree nodes and therefore limited node error has little
impact on the network integrity.
6.4.2 Node Attack
Figure 6.11 shows that the AS graph and the IG model are extremely vulnerable
under the node attack. The removal of only a few of their best-connected nodes can
result in a disconnected network. For example, when the top 5%–10% rich nodes
are under attack, both networks collapse into small pieces. It is notable that when
only the 1% best-connected nodes of the AS graph are removed, nearly 40% of the
nodes are detached from the network. By comparison, the FBA model shows a fairly
high degree of robustness under the node attack. When the top 10% rich nodes
are removed, the largest cluster still contains 65% of the nodes.
Figure 6.11: Network robustness under node attack. [S, normalised size of the largest cluster, against f, fraction of nodes under attack.]
This is because the node attack is equivalent to removing members of the
rich-club. The rich-club plays a dominant role in the network’s connectivity and the
segmentation of the rich-club can break down the whole network’s integrity.
6.4.3 Link Error
Figure 6.12: Network robustness under link error. [S, normalised size of the largest cluster, against f, fraction of links removed randomly.]
Figure 6.12 shows that the three networks are fairly resilient to the link error.
Compared with the AS graph and the IG model, the FBA model exhibits a higher
degree of resilience to the link error. The reason is that, as introduced in Chapter
3, during the growth of the FBA model each new node is attached to the network
with m = 3 connections and therefore all nodes in the FBA model have at least 3
links. If a node loses one or two links, it is still connected to the rest of the
network. However, as shown in Table 6.1, nearly 70% of the nodes of the AS graph
and the IG model have only one or two links. These nodes are more easily isolated
from the system under the process of link error.
6.4.4 Link Attack
Figure 6.13: Network robustness under link attack. [S, normalised size of the largest cluster, against f, fraction of links under attack.]
Figure 6.13 shows that the three networks are tolerant to the link attack due
to large numbers of redundant links among high-degree nodes. The networks’
overall reachability is not damaged even after about 60% of the links are removed
in the attack mode. When the link attack continues, the networks start to experience
a percolation transition [119], in which the networks suddenly collapse from a
single network cluster into small disconnected subnetworks.
6.5 Discussion
This Chapter showed that it is necessary to reproduce the rich-club structure
because it has significant impacts on the network dynamic properties. An Internet
model that does not reproduce the rich-club properties underestimates the actual
network’s routing efficiency in terms of shortest path length and routing flexibility
in terms of alternative reachable paths, and overestimates the network robustness
under node attack.
Figure 6.14: A conical structure model.
To help understand these impacts, Figure 6.14 illustrates the rich-club structure
of the AS graph as a conical hierarchical structure, which can be regarded
as a simplified version of the conceptual Jellyfish model [120]. At the top of the
cone is the rich-club, which contains the 5% richest nodes and 27% of the total
links in the network. On the base of the cone are the remaining 95% of the nodes,
which have only 13% of the links connecting among them. The other 60% of the links
in the network connect the rich-club to the nodes on the base.
The conical structure reveals a number of interesting features of the AS graph.
Firstly, members of the rich-club in the cone are tightly interconnected and the
average path length among them is very small. Secondly, the majority of the
peripheral low-degree nodes are only one step away from the rich-club. Thus the
rich-club acts as a super traffic hub by providing a large number of shortcuts
for peripheral-to-peripheral communications and therefore improves the network
routing efficiency. Also the rich-club improves the network redundancy because
the rich-club interconnections significantly increase the density of short cycles
and thus form a large number of alternative routing paths. However, the dominant
role of the rich-club makes the network very fragile under node attack. When
the integrity of the rich-club is undermined by the removal of a few of its richest
members, the whole network’s integrity breaks down as well. It is interesting to
notice that improved network redundancy does not necessarily result in improved
network robustness.
6.6 Summary
In summary, comparison results show that the rich-club plays a dominant role in
the network. It improves the network routing efficiency and redundancy, but at
the cost of the network robustness under the node attack. Realistic models of the
Internet topology should correctly reproduce the rich-club phenomenon.
Chapter 7
Topological Disparities Between
Internet Measurements
7.1 Introduction
Reliable measurements on the AS-level Internet topology have only recently become
available. As introduced in Chapter 3, there are three major data sources produced
by two measuring methodologies. Many studies on the Internet topology
are based on the BGP AS graphs and the Extended BGP AS graphs, which are
produced by the passive measurements [25, 26, 27] using the BGP routing tables.
The Extended BGP AS graphs [30, 74] use additional information sources, such
as the Internet Routing Registry (IRR) data and the Looking Glass (LG) data,
and obtain 40% more links than the BGP AS graphs [121]. The Traceroute AS
graphs are produced by the active measurement methodology [28, 29] using the
traceroute probing data. The Traceroute AS graphs have about 30% more links
than the BGP AS graphs.
There have been comparisons between the BGP AS graph and the Extended
BGP AS graph [74, 121] and there have also been comparisons between the BGP
AS graph and the Traceroute AS graph [80, 81, 73].
In this Chapter the author provides a systematic comparison among all
three measurements. The author investigates a BGP AS graph, an Extended
BGP AS graph and a Traceroute AS graph, which were measured recently and have
similar numbers of nodes. The author examines a number of statistical topology
properties [122] and tries to find out whether the Extended BGP AS graph and the
Traceroute AS graph are structurally equivalent and which measurement is more
complete or realistic.
Results show that the three AS graphs have non-trivial structural differences.
The major topological disparity, which is quantified by the metric of rich-club
connectivity, is that the two BGP-based graphs have fewer links connecting among
highly connected nodes than the Traceroute AS graph. The Traceroute AS graph
and the Extended BGP AS graph both have a notable number of links that are
not present in the other graph. The extra links contained in the Traceroute graph
are connections among the high-degree nodes. Although small in number, these
links are relevant to network behaviours, such as routing efficiency (shortest path
length) and routing flexibility (density of short cycles), whereas the extra links
contained in the Extended BGP AS graph do not have significant impacts on the
network behaviours.
The author suggests that the traceroute-derived data, by comparison, are more
realistic measurements of the Internet topology. Chapter 8 will use the Traceroute
AS graph to validate the PFP model.
7.2 Comparison
The Traceroute AS graph was measured in April 2002 [28, 113]. The BGP AS
graph and the Extended BGP AS graph were both measured in May 2001 [30]. As shown
in Table 7.1, the three AS graphs have similar numbers of nodes but different
numbers of links. Compared with the BGP AS graph, the Extended BGP AS graph
has 40% more links and the Traceroute AS graph has 28% more links.
Table 7.1: Parameters of the three AS graphs
                         Traceroute   Extended BGP   BGP
Number of nodes N        11122        11461          11174
Number of links L        30054        32730          23409
Average degree 〈k〉      5.4          5.7            4.2
Max. degree kmax         2839         2432           2389
Power-law exponent γ     2.22         2.22           2.22
7.2.1 Degree Distribution
Figure 7.1: Cumulative degree distribution. [Log-log plot; reference line with slope −1.22.]
As shown in Figure 7.1, the cumulative degree distributions of the three AS
graphs are characterised by slope −1.22, which yields the degree distribution
P(k) ∼ k^−γ with power-law exponent γ ≃ 2.22 (see Table 7.1).
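The step from the slope of the cumulative distribution to the exponent γ follows from a standard relation for continuous power laws, sketched here in LaTeX notation. The symbol P_c(k) for the cumulative distribution is introduced for illustration only.

```latex
P_c(k) = \int_k^{\infty} P(k')\,dk'
       \;\sim\; \int_k^{\infty} k'^{-\gamma}\,dk'
       \;\sim\; k^{-(\gamma-1)}, \qquad \gamma > 1,
\qquad\text{so}\qquad
-(\gamma - 1) = -1.22 \;\Rightarrow\; \gamma = 2.22 .
```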
Figure 7.2: Degree distribution.
Figure 7.2 shows further details common to the non-strict power-law degree
distributions. There are more nodes with degree two than nodes with degree one
(P(k = 2) = 37 ∼ 40% > P(k = 1) = 26 ∼ 34%) and the distributions have
heavy tails where the maximum degrees kmax are very large (see Table 7.1).
Figure 7.3: Degree vs rank. [Log-log plot; reference line with slope −0.85.]
Figure 7.3 shows that all three AS graphs exhibit a power-law relationship
between degree and rank (k ∼ r^−0.85).
In general, the three AS graphs have fairly similar degree distributions despite
having different numbers of links.
7.2.2 Rich-Club Connectivity
Table 7.2: Rich-club connectivity φ(k) as a function of degree, k.
Degree   Traceroute graph   Extended BGP graph   BGP graph
1000     100%               100%                 100%
300      100%               75%                  80%
100      60%                46%                  50%
30       14%                11%                  14%
Figure 7.4: Rich-club connectivity φ(k) as a function of degree. [Log-log plot; reference line with slope 1.3.]
Since the three AS graphs have similar maximum degrees, Figure 7.4 shows
the rich-club connectivity as a function of degree, φ(k). In general the rich-club
connectivity φ(k) of the three AS graphs follows a power-law behaviour of
φ(k) ∼ k^υ with υ = 1.2 ± 0.1. The difference is that the high-degree nodes
(k > 10^2) in the Traceroute AS graph are more tightly interconnected with each
other than in the two BGP-based AS graphs. For example, as shown in Table 7.2,
nodes with degrees larger than 300 in the Traceroute AS graph form a fully
connected mesh, whereas in the two BGP-based graphs the rich-club connectivity is
75 ∼ 80%. It is interesting to notice that, although the Extended BGP AS graph has
40% more links than the BGP AS graph, the rich-club connectivities φ(k) (as a
function of degree) of the two graphs are fairly close.
Table 7.3: Rich-club connectivity φ(r/N) as a function of normalised rank, r/N.
r/N     Traceroute graph   Extended BGP graph   BGP graph
0.001   100.0%             80.3%                83.3%
0.005   53.9%              48.3%                32.8%
0.008   35.1%              35.1%                21.5%
0.02    12.6%              16.2%                7.9%
0.05    3.8%               5.4%                 2.4%
0.1     1.5%               2.0%                 1.0%
Figure 7.5: Rich-club connectivity φ(r/N) as a function of normalised rank. [Log-log plot; marker at r/N = 0.008.]
Figure 7.5 reveals more details by illustrating the rich-club connectivity as a
function of rank normalised by the number of nodes, φ(r/N).
• In the Traceroute AS graph, the rich nodes are tightly interconnected and
the rich-club of r/N = 0.001 is a fully connected mesh (see Table 7.3).
• In the BGP AS graph, the rich-club connectivity φ(r/N) is notably smaller
than in the other two graphs because it has significantly fewer links.
• Rich nodes (r/N < 0.008) of the Extended BGP AS graph are not as tightly
interconnected as those of the Traceroute AS graph. It is interesting to notice
that lesser rich nodes (10^−2 < r/N < 10^−1) of the Extended BGP AS graph
actually have a larger rich-club connectivity than those of the Traceroute AS graph.
This suggests that although the Extended BGP AS graph uses additional
information sources and has 40% more links than the BGP AS graph, it is not
capable of capturing all the inter-rich links that are present in the Traceroute AS
graph. Nevertheless the Extended BGP AS graph has a notable number of links
connecting among lesser rich nodes, and these links are not contained in the BGP
AS graph or the Traceroute AS graph.
7.2.3 Shortest Path Length
Figure 7.6: Cumulative distribution of shortest path length.
Figure 7.7: Correlation between shortest path length l and degree, where l is the average of nodes with the same degree.
Figure 7.6 shows that the Extended BGP AS graph and the BGP AS graph
have nearly the same cumulative distribution of shortest path length, which are
displaced to the right of the Traceroute AS graph. Figure 7.7 shows that in general
the shortest path length of a node in the two BGP AS graphs is half-hop longer
than that of a node with the same degree in the Traceroute AS graph.
Table 7.4: Parameters of the three AS graphs (continued)
                                  Traceroute   Extended BGP   BGP
Characteristic path length l∗     3.13         3.56           3.62
Average clustering coef. 〈c〉     0.49         0.35           0.30
Average triangle coef. 〈kt〉      12.7         23.4           5.3
Max. triangle coef. kt−max        7482         7150           3638
Average quadrangle coef. 〈kq〉    277.4        206.8          128.5
Max. quadrangle coef. kq−max      9648         8474           5506
Average betweenness 〈C∗B〉        4.13         4.56           4.62
Max. betweenness C∗B−max          3236.8       3555.5         3596.3
The Traceroute AS graph has more links than the BGP AS graph but fewer
links than the Extended BGP AS graph. However, it is clear that a larger number of
links does not necessarily improve the network routing efficiency. As shown
in Table 7.4, the characteristic path lengths of the two BGP-based AS graphs are
about 0.5 hop longer than that of the Traceroute AS graph. The half-hop difference
is significant in terms of network routing efficiency considering that the average
distance between a pair of nodes in the networks is less than 4 hops. This
difference can be explained by the above measurements of the rich-club
connectivity: the tightly interconnected rich-club of the Traceroute AS
graph provides a large selection of shortcuts for the network routing, whereas the
Extended BGP AS graph and the BGP AS graph do not have as many inter-rich
links as the Traceroute AS graph, and therefore on average packet traffic travels
further in the two BGP-based AS graphs.
Figure 7.8: Cumulative distribution of clustering coefficient.
Figure 7.9: Correlation between clustering coefficient c and degree, where c is the average of nodes with the same degree.
7.2.4 Short Cycles
In general, nodes of the Traceroute AS graph have larger clustering coefficients
than those of the Extended BGP AS graph and the BGP AS graph (see Figure 7.8 and
〈c〉 in Table 7.4). There are curve distortions in Figure 7.8. This is because the
three graphs have large numbers of nodes sharing the same typical values of the
clustering coefficient, e.g. c = 0.33 and c = 0.66 when nodes with degree three
have one or two inter-neighbour links.
Figure 7.9 shows that in the three graphs the clustering coefficient decreases
with degree, although nodes with high degrees might have a fairly large number of
inter-neighbour links. To infer the neighbour clustering information of nodes with
different degrees, the short cycles properties introduced in Chapter 6 are studied
as follows.
Figure 7.10: Cumulative distribution of triangle coefficient. [Plot: cumulative distribution vs triangle coefficient, log-log; curves: Traceroute, Extended BGP, BGP.]
Figure 7.11: Correlation between triangle coefficient kt and degree, where kt is the average of nodes with the same degree. [Plot: triangle coefficient vs degree, log-log; curves: Traceroute, Extended BGP, BGP; marker at k = 700.]
Figure 7.10 and Table 7.4 show that the average triangle coefficient of the
Extended BGP AS graph is larger than that of the other two graphs. However, as
shown in Figure 7.11, the Extended BGP AS graph exhibits a distortion in the
correlation between triangle coefficient and degree. For nodes with degrees larger
than 10 and smaller than 700, the triangle coefficient of the Extended BGP AS
graph is larger than that of the BGP AS graph and the Traceroute AS graph. For
other nodes, the Extended BGP AS graph and the BGP AS graph have nearly the same
triangle coefficient, which is smaller than that of the Traceroute AS graph. This
observation echoes the above analysis of the rich-club connectivity and the
shortest path length: the extra links contained in the Extended BGP AS graph are
mainly links among less rich nodes.
Figure 7.12: Cumulative distribution of quadrangle coefficient. [Plot: cumulative distribution vs quadrangle coefficient, log-log; curves: Traceroute, Extended BGP, BGP.]
Figure 7.13: Correlation between quadrangle coefficient kq and degree, where kq is the average of nodes with the same degree. [Plot: quadrangle coefficient vs degree, log-log; curves: Traceroute, Extended BGP, BGP.]
Figure 7.12, Figure 7.13 and Table 7.4 show that the three AS graphs have
similar properties on the quadrangle coefficient, and that the Traceroute AS graph
has more quadrangles than the two BGP graphs.
It is interesting to notice that the plot curves of the BGP AS graph and the
Traceroute AS graph often follow parallel patterns (see Figure 7.5, Figure 7.10
and Figure 7.12), while those of the Extended BGP AS graph have different shapes
and sometimes cross those of the Traceroute AS graph.
7.2.5 Disassortative Mixing
Figure 7.14: Correlation between nearest-neighbours average degree knn and degree, where knn is the average of nodes with the same degree. [Plot: nearest-neighbours average degree vs degree, log-log; curves: Traceroute, Extended BGP, BGP.]
Figure 7.14 shows that the three AS graphs have similar negative degree-degree
correlations and therefore all exhibit the disassortative mixing behaviour. In gen-
eral for nodes with the same degree, the nearest-neighbours average degree of the
Traceroute graph is larger than that of the two BGP graphs.
7.2.6 Betweenness Centrality
As shown in Figure 7.15, the cumulative distributions of betweenness $P_{cum}(C^*_B)$ of
the three AS graphs follow similar power-law behaviours characterised by a slope of
−1.1, which yields the betweenness distribution $P(C^*_B) \sim (C^*_B)^{-2.1}$.
Figure 7.16 shows that the three AS graphs have similar correlations between
betweenness and degree.
Figure 7.15: Cumulative distribution of betweenness. [Plot: cumulative distribution vs betweenness centrality, log-log; curves: Traceroute, Extended BGP, BGP; reference line: slope -1.11.]
Figure 7.16: Correlation between betweenness C∗B and degree, where C∗B is the average of nodes with the same degree. [Plot: betweenness vs degree, log-log; curves: Traceroute, Extended BGP, BGP.]
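The step from the cumulative slope to the exponent of the betweenness distribution in Section 7.2.6 can be written out explicitly. This is a sketch that assumes the cumulative distribution is the complementary one (the fraction of nodes whose betweenness is at least x) and that x can be treated as a continuous variable:

```latex
P_{cum}(x) = \int_{x}^{\infty} P(x')\,dx' \sim x^{-1.1}
\quad\Longrightarrow\quad
P(x) = -\frac{d P_{cum}(x)}{dx} \sim x^{-1.1-1} = x^{-2.1},
\qquad x = C^{*}_{B}.
```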
7.3 Discussion
Comparison results show that the three AS graphs have a number of similar
topological properties, including the degree distribution, the disassortative mixing
behaviour and the betweenness centrality.
However, they also exhibit non-trivial structural differences. The principal
topological disparity, which is characterised by the metric of rich-club
connectivity, is that the two BGP-based graphs have fewer inter-rich links than the
Traceroute AS graph. This structural difference is relevant because the extra
inter-rich links contained in the Traceroute graph, although small in number, are
responsible for the differences in performance-related network properties, such as
routing efficiency (shortest path length) and routing flexibility (density of short
cycles). Results also indicate that the Extended BGP AS graph collects a notable
number of links among less rich nodes, and that these links are not present in the
other two graphs.
Figure 7.17: The three AS graph measurements. [Diagram: regions labelled Traceroute AS graph, Extended BGP AS graph and BGP AS graph.]
The above analysis suggests that the BGP AS graph is a subset of both the
Extended BGP AS graph and the Traceroute AS graph, which do not fully overlap
each other (see Figure 7.17).
The Traceroute AS graph and the Extended BGP AS graph each contain
some links that are not present in the other graph. The inter-rich links contained
in the Traceroute AS graph are critical for the network structure and functionality,
whereas the extra links present in the Extended BGP AS graph do not
significantly affect the network properties. Considering the limitations of the passive
measurement based on BGP tables [80, 81, 73], we suggest that the
traceroute-derived data are, by comparison, more realistic measurements of the
Internet topology. Further study is needed to investigate whether all links inferred
from the various data sources are actual Internet connections.
7.4 Summary
There are three primary measurements of the AS-level Internet topology based
on different data-collecting methodologies. By examining a number of statistical
topological properties, we identified that the major structural discrepancy among
the three AS graphs is that the Traceroute AS graph has more interconnections
among the high-degree nodes than the other two BGP-based AS graphs. This
structural discrepancy is non-trivial because it can be critical for performance-
related network properties. We suggested that by comparison, the traceroute-
derived data are more realistic measurements of the Internet.
Chapter 8
The Positive-Feedback Preference Model
8.1 Introduction
During the last few years, various Internet models have been proposed. Yet no
existing model can generate a network that matches all the relevant topological
properties of the Internet. For example, Chapter 4 shows that network models
based solely on the degree distribution (e.g. the BA model) do not reproduce
the rich-club connectivity. Although the Interactive Growth model proposed in
Chapter 5 resembles both the degree distribution and the rich-club connectivity
of the AS-level Internet, the model still has its limitations. For example the IG
model does not produce a maximum degree as large as the actual measurements.
In this Chapter, based on further measurements of the Internet history data,
the author found out that, in addition to the interactive growth mechanism used
in the IG model, there is another mechanism which is necessary for the cor-
rect modelling of the AS-level Internet topology: a nonlinear preferential growth,
where the growth is described by a positive-feedback mechanism. The author
proposed the Positive-Feedback Preference (PFP) model, which uses both of the
two mechanisms.
Validation results show that the PFP model accurately reproduces all the rele-
vant topological properties of the AS-level Internet, including degree distribution,
rich-club connectivity, the maximum degree, shortest path length, short cycles,
disassortative mixing and betweenness centrality. The PFP model provides a
novel insight into the evolutionary dynamics of real complex networks.
8.2 Modelling The Maximum Degree
Figure 8.1: Degree k vs rank r. [Plot: degree vs rank, log-log; curves: AS graph, PFP model, IG model, BA model; reference line: slope -0.85.]
The IG model has its limitations. As shown in Figure 8.1, the AS graph has
a nearly strict power-law relationship between degree and rank. The maximum
node degree kmax = 2839 present in the AS graph is nearly a quarter of the number
of nodes (kmax ≈ N/4) and is significantly larger than the ones obtained by the
IG model (kmax = 700) and the BA model (kmax = 292). The IG model and the
BA model use linear preferential attachment.
To overcome this shortfall, it is possible to replace the linear preference given
by the BA model
$$\Pi(i) = \frac{k_i}{\sum_j k_j} \qquad (8.1)$$
by a nonlinear preferential probability [42, 123]
$$\Pi(i) = \frac{k_i^{\alpha}}{\sum_j k_j^{\alpha}}, \quad \alpha > 1, \qquad (8.2)$$
which favours high-degree nodes.
A numerical experiment (called the Test* model) using Equation (8.2) instead
of Equation (8.1) in the IG model showed that, when α = 1.15 ± 0.01, this
nonlinear preferential growth creates a network with a maximum degree kmax
similar to the AS graph.
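The effect of replacing Equation (8.1) by Equation (8.2) can be seen on a toy example. The sketch below evaluates both preference probabilities for three hypothetical node degrees; the function name and the degree values are illustrative only and are not taken from the simulator used in this thesis.

```python
def preference(degrees, alpha=1.0):
    """Return Pi(i) = k_i**alpha / sum_j k_j**alpha for every node i."""
    weights = [k ** alpha for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


degrees = [1, 10, 100]                   # hypothetical node degrees
linear = preference(degrees)             # Equation (8.1), alpha = 1
nonlinear = preference(degrees, 1.15)    # Equation (8.2), alpha = 1.15

for k, a, b in zip(degrees, linear, nonlinear):
    print(k, round(a, 4), round(b, 4))
```

With α > 1 the share of the highest-degree node grows at the expense of the low-degree nodes, which is the sense in which Equation (8.2) favours high-degree nodes.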
Figure 8.2: Rich-club connectivity φ(r/N) vs normalized rank r/N. [Plot: rich-club connectivity vs normalized rank, log-log; curves: AS graph, PFP model, IG model, BA model, Test* model; marker at r/N = 0.01.]
However, as shown in Figure 8.2, the rich-club connectivity produced by the
Test* model deviates from that of the AS graph. For example, the 1% best connected
nodes of the Test* model have 42% of the allowable interconnections, compared with
27% for the AS graph.
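The percentages above can be computed from an edge list as follows. This is a minimal sketch that assumes the rich-club connectivity of the r best-connected nodes is the ratio of the links actually present among them to the maximum possible r(r − 1)/2, as the phrase "allowable interconnections" suggests. It also assumes a simple undirected graph without duplicate links and r ≥ 2; the tie-breaking rule between nodes of equal degree is an arbitrary choice made here.

```python
def rich_club_connectivity(edges, r):
    """Fraction of the r(r-1)/2 possible links that exist among the
    r highest-degree nodes of an undirected graph given as an edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Rank by decreasing degree; ties are broken by label (an assumption).
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    club = set(ranked[:r])
    links = sum(1 for u, v in edges if u in club and v in club)
    return 2.0 * links / (r * (r - 1))


# Toy graph: a triangle of rich nodes 0, 1, 2, each with one pendant node.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]
print(rich_club_connectivity(edges, 3))
print(round(rich_club_connectivity(edges, 4), 3))
```

For the AS graph with N = 11122, the 1% point of Figure 8.2 corresponds to calling such a function with r of about 111.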
8.3 The Positive-Feedback Preference Model
Based on the Internet history data, Pastor-Satorras et al [66] and Vazquez et al [67,
124] measured that the probability that a new node links with a low-degree old
node indeed follows the linear preferential attachment given by Equation (8.1),
whereas Chen et al [74] reported that high-degree nodes have a stronger ability to
acquire new links than predicted by Equation (8.1). The Internet history data
also show that at early times, node degree increases very slowly; later on, node
degree grows more and more rapidly. Taking into account these observations, we
modified the IG model by using the nonlinear preferential attachment
$$\Pi(i) = \frac{k_i^{1+\delta \log_{10} k_i}}{\sum_j k_j^{1+\delta \log_{10} k_j}}, \quad \delta \in [0, 1]. \qquad (8.3)$$
Equation (8.3) is used for the attachment of new nodes and the appearance of new
internal links. We call this the Positive-Feedback Preference (PFP) model [125,
126]. From numerical simulations, we found that δ = 0.048 produces the best
result. It is interesting to notice that, for δ = 0.048 and kmax = 2839, the
exponent 1 + δ log10 kmax ≈ 1.166 is close to the value of α used in the Test* model
to reproduce the AS graph’s maximum degree. The PFP model also modifies the
IG model’s interactive growth mechanism. The PFP model starts with a small
random network. At each time-step,
• with probability p ∈ [0, 1], a new node is attached to one old node; and
at the same time with probability q ∈ [0, 1] one new internal link appears
between old nodes and with probability 1− q two new internal links appear.
• with probability 1 − p, a new node is attached to two old nodes; and at
the same time with probability q one new internal link appears and with
probability 1− q two new internal links appear.
When p = 0.4 and q = 0.9, the generated PFP networks have the same ratio of
links over nodes as the AS graph (see Table 8.1).
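The link-to-node ratio can be checked by hand: each time-step adds one node, on average p·1 + (1 − p)·2 = 2 − p = 1.6 node-attachment links and q·1 + (1 − q)·2 = 2 − q = 1.1 internal links, i.e. about 2.7 links per node, against 30054/11122 ≈ 2.70 for the AS graph. The growth procedure itself can be sketched as below. This is not the simulator used in this thesis, and several details are assumptions made only for illustration: the seed network is a triangle, each internal link joins one of the host nodes to a peer chosen with the same preference, duplicate links are avoided, and an internal link is skipped when no eligible peer exists.

```python
import math
import random


def pfp_weight(k, delta):
    """Unnormalised preference of Equation (8.3): k ** (1 + delta*log10(k))."""
    return k ** (1.0 + delta * math.log10(k))


def choose(adj, exclude, delta, rng):
    """Pick one node outside `exclude`, weighted by pfp_weight of its degree."""
    nodes = [n for n in adj if n not in exclude]
    if not nodes:
        return None
    weights = [pfp_weight(len(adj[n]), delta) for n in nodes]
    return rng.choices(nodes, weights=weights)[0]


def add_link(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)


def grow_pfp(n_nodes, p=0.4, q=0.9, delta=0.048, seed=1):
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}   # assumed seed network: a triangle
    while len(adj) < n_nodes:
        n_hosts = 1 if rng.random() < p else 2      # new node: 1 or 2 hosts
        n_internal = 1 if rng.random() < q else 2   # 1 or 2 internal links
        hosts = []
        for _ in range(n_hosts):
            hosts.append(choose(adj, set(hosts), delta, rng))
        for _ in range(n_internal):
            host = rng.choice(hosts)                # assumption: starts at a host
            peer = choose(adj, adj[host] | {host}, delta, rng)
            if peer is not None:                    # skipped if host is saturated
                add_link(adj, host, peer)
        new = len(adj)
        adj[new] = set()
        for host in hosts:
            add_link(adj, new, host)
    return adj


network = grow_pfp(500)
n_links = sum(len(nbrs) for nbrs in network.values()) // 2
print(len(network), n_links, round(n_links / len(network), 2))
```

Each choice scans all existing nodes, so the sketch runs in quadratic time; this is adequate for small illustrations but not for generating networks of the size of the AS graph efficiently.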
Figure 8.3: Three degree functions: $k$, $k^{\alpha}$ with α = 1.15 and $k^{1+\delta \log_{10} k}$ with δ = 0.048. [Plot: degree functions vs degree, log-log.]
Figure 8.4: Degree growth of a node added in an early time-step. [Plot: degree vs age (time-step), log-log; curves: PFP model, IG model, BA model.]
The PFP model satisfies the observations of Pastor-Satorras et al, Vazquez et al
and Chen et al. For low-degree nodes, the attachment preference is approximated
by the linear preference of Equation (8.1). For high-degree nodes, the attachment
preference increases as a nonlinear function of the node degree (see Figure 8.3).
As a result, as time passes, the rate of degree growth in the PFP model is
faster than in the IG model and the BA model (see Figure 8.4).
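The transition from nearly linear to clearly nonlinear preference can be made concrete by evaluating the degree-dependent exponent 1 + δ log10 k of Equation (8.3). The following sketch uses the reported values δ = 0.048 and α = 1.15; the function and constant names are illustrative.

```python
import math

DELTA = 0.048   # value reported for the PFP model
ALPHA = 1.15    # value reported for the Test* model


def pfp_exponent(k, delta=DELTA):
    """Degree-dependent exponent 1 + delta * log10(k) of Equation (8.3)."""
    return 1.0 + delta * math.log10(k)


for k in (1, 10, 100, 1000, 2839):
    e = pfp_exponent(k)
    # Second column: exponent; third column: PFP weight relative to linear k.
    print(k, round(e, 3), round(k ** e / k, 2))
```

The exponent stays close to 1 for low-degree nodes and only approaches the Test* value of α near the maximum degree of the AS graph, so the PFP preference boosts the richest nodes without over-connecting the rest.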
8.4 Model Validation
In this Chapter, the analysis is based on the Traceroute AS graph measured
in April 2002 by CAIDA [28, 113]. The AS graph is compared with networks
generated by the PFP model, the IG model and the BA model. For each model,
ten networks are generated with different seed numbers and all results are the
average over the ten networks. The networks had the same number of nodes and
similar numbers of links as the AS graph (see Table 8.1).
Table 8.1: Network Parameters
                                        AS graph   PFP model   IG      BA
Number of nodes N                       11122      11122       11122   11122
Number of links L                       30054      30151       33349   33349
Power-law exponent γ                    2.22       2.22        2.22    3
Degree distribution P(k = 1)            26%        28%         26%     0%
Degree distribution P(k = 2)            38%        36%         34%     0%
Degree distribution P(k = 3)            14%        12%         11%     40%
Average degree 〈k〉                      5.4        5.4         6.0     6.0
Max. degree kmax                        2839       2686        700     292
Rich-club connectivity φ(r/N = 0.01)    27%        30%         32%     4.5%
Avg. triangle coef. 〈kt〉                12.7       12          10.4    0.1
Max. triangle coef. kt−max              7482       8611        4123    64
Avg. quadrangle coef. 〈kq〉              277        247         105.4   1.3
Max. quadrangle coef. kq−max            9648       9431        8780    527
Charact. path length l∗                 3.13       3.14        3.6     4.3
Average knn 〈knn〉                       660        482         103     20
Avg. betweenness 〈C∗B〉                  4.13       4.14        4.6     5.3
Max. betweenness C∗Bmax                 3237       3419        1002    1064
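As a quick consistency check on Table 8.1, the average degree of an undirected graph is 〈k〉 = 2L/N, since every link contributes to the degree of two nodes. The sketch below recomputes it from the node and link counts in the table; the function name is illustrative.

```python
def average_degree(n_links, n_nodes):
    """Average degree of an undirected graph: every link has two ends."""
    return 2.0 * n_links / n_nodes


N = 11122
links = {"AS graph": 30054, "PFP model": 30151, "IG": 33349, "BA": 33349}
for name, n_links in links.items():
    print(name, round(average_degree(n_links, N), 1))
```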
8.4.1 Degree Distribution, Rich-Club Connectivity and
Maximum Degree
The PFP model closely matches the degree distribution (see Figures 8.5 and 8.6),
the rich-club connectivity (see Figure 8.2) and the maximum degree (see Table 8.1) of
the AS graph. The PFP model also has the same power-law relationship between
degree and rank, $k \sim r^{-0.85}$, as the AS graph (see Figure 8.1).
Figure 8.5: Degree distribution. [Plot: degree distribution vs degree, log-log; curves: AS graph, PFP model, IG model, BA model; reference line: slope -2.22.]
Figure 8.6: The cumulative degree distribution. [Plot: cumulative degree distribution vs degree, log-log; curves: AS graph, PFP model, IG model, BA model; reference line: slope -1.22.]
In a certain respect, the accuracy of the PFP model in reproducing these properties
is not a surprise. After all, the model was designed to match these properties.
8.4.2 Short Cycles
Figure 8.7: Cumulative distribution of triangle coefficient. [Plot: cumulative distribution vs triangle coefficient, log-log; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.8: Cumulative distribution of quadrangle coefficient. [Plot: cumulative distribution vs quadrangle coefficient, log-log; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.7 and 8.8 show that the AS graph and the PFP model have similar
cumulative distributions of short cycles.
Figure 8.9: Correlation between triangle coefficient kt and degree, where kt is the average over nodes with the same degree. [Plot: triangle coefficient vs degree, log-log; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.10: Correlation between quadrangle coefficient kq and degree, where kq is the average over nodes with the same degree. [Plot: quadrangle coefficient vs degree, log-log; curves: AS graph, PFP model, IG model, BA model.]
Figures 8.9 and 8.10 show that the AS graph and the PFP networks also exhibit
similar correlations between short cycles and degree.
The AS graph and the PFP model have higher densities of short cycles (see
〈kt〉 and 〈kq〉 in Table 8.1) than the IG model and the BA model, and therefore
exhibit higher degrees of network routing flexibility.
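The triangle coefficient is defined in Chapter 6; the sketch below assumes it is the number of triangles that a node belongs to, which equals the number of links among the node's neighbours. The adjacency-dictionary representation, the function name and the toy graph are illustrative choices, and node labels are assumed to be sortable.

```python
def triangle_coefficient(adj, node):
    """Number of triangles that `node` belongs to, i.e. the number of
    links among its neighbours (assumed definition of k_t)."""
    nbrs = sorted(adj[node])
    return sum(1 for i, u in enumerate(nbrs)
               for v in nbrs[i + 1:] if v in adj[u])


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
for node in adj:
    print(node, triangle_coefficient(adj, node))
```

Under this definition a node that sits on more triangles has more alternative two-hop detours around any one of its links, which is the sense of routing flexibility used above.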
8.4.3 Disassortative Mixing
Figure 8.11: Cumulative distribution of nearest-neighbours average degree. [Plot: cumulative distribution vs nearest-neighbours average degree; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.12: Correlations between nearest-neighbours average degree knn and degree, where knn is the average over nodes with the same degree. [Plot: nearest-neighbours average degree vs degree, log-log; curves: AS graph, PFP model, IG model, BA model.]
The AS graph and the PFP model have close cumulative distributions of the
nearest-neighbours average degree (see Figure 8.11). Figure 8.12 shows that the
AS graph and the PFP networks exhibit similar negative correlations between the
nearest-neighbours average degree and degree, and therefore show similar
disassortative mixing behaviours.
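The quantity plotted in Figure 8.12 can be computed as sketched below: for every node, average the degrees of its neighbours, then average that value over all nodes of the same degree. The adjacency-dictionary representation and the function name are illustrative.

```python
from collections import defaultdict


def knn_by_degree(adj):
    """Map each degree k to the nearest-neighbours average degree,
    averaged over all nodes of degree k."""
    by_degree = defaultdict(list)
    for node, nbrs in adj.items():
        if nbrs:
            knn = sum(len(adj[m]) for m in nbrs) / len(nbrs)
            by_degree[len(nbrs)].append(knn)
    return {k: sum(v) / len(v) for k, v in sorted(by_degree.items())}


# Toy star graph: hub 0 with four leaves, an extreme disassortative case.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(knn_by_degree(star))
```

In the star, the low-degree leaves see a high-degree neighbour and the hub sees only low-degree neighbours, so knn decreases with degree, which is the signature of disassortative mixing.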
8.4.4 Shortest Path Length
Figure 8.13: Cumulative distribution of shortest path length. [Plot: cumulative distribution vs shortest path length; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.14: Correlation between shortest path length l and degree, where l is the average over nodes with the same degree. [Plot: shortest path length vs degree; curves: AS graph, PFP model, IG model, BA model.]
Figure 8.13 and 8.14 show that the PFP model accurately reproduces the cu-
mulative distribution of shortest path length and the correlation between shortest
path length and degree of the AS graph. Table 8.1 shows that the AS graph
and the PFP model have nearly the same characteristic path length, which is
significantly shorter than that of the IG model and the BA model.
The PFP model accurately reproduces the routing efficiency
properties (shortest path length and characteristic path length) of the AS graph
because the model correctly resembles both the rich-club connectivity and the
disassortative mixing of the AS graph. The rich-club consists of highly connected
nodes, which are well interconnected with each other, and the average hop
distance among the club members is very small (1 to 2 hops). The rich-club
is a “super” traffic hub of the network and the disassortative mixing property
ensures that peripheral nodes are always near the hub. These two structural
properties together contribute to the routing efficiency of a network. In
contrast, the BA model does not reproduce the two structural properties and
therefore underestimates the actual network’s routing efficiency.
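The effect of a hub on routing efficiency can be illustrated with a small sketch that computes the characteristic path length, taken here as the average shortest path length over all pairs of nodes of a connected, unweighted graph, using breadth-first search. The function names and the two toy graphs are illustrative.

```python
from collections import deque


def distances_from(adj, source):
    """Hop distances from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Average shortest path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in distances_from(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Two graphs with five nodes and four links each.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(characteristic_path_length(star), characteristic_path_length(chain))
```

With the same numbers of nodes and links, the graph in which every peripheral node is adjacent to a hub has the shorter characteristic path length.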
8.4.5 Betweenness Centrality
Figure 8.15: Cumulative distribution of betweenness centrality, Pcum(C∗B). [Plot: cumulative distribution vs betweenness centrality, log-log; curves: AS graph, PFP model, IG model, BA model; reference line: slope -1.1.]
Figure 8.15 shows that the cumulative distributions of betweenness centrality
Pcum(C∗B) of the four networks exhibit similar power-law behaviours characterised
by slope −1.1. However, as shown in Table 8.1, the maximum values of the
betweenness centrality C∗Bmax of the AS graph and the PFP model are significantly
larger than those of the IG model and the BA model. Figure 8.16 also shows that only
the PFP model closely matches the correlation between betweenness centrality
and degree of the AS graph.
Figure 8.16: Correlations between betweenness centrality C∗B and degree, where C∗B is the average over nodes with the same degree. [Plot: betweenness vs degree, log-log; curves: AS graph, PFP model, IG model, BA model.]
8.5 Discussion
8.5.1 The Positive-Feedback Preferential Attachment
The positive-feedback preferential attachment means that, as a node acquires new
links, the node’s relative advantage in competing for more new links increases as
a non-linear feedback loop. This implies that the inequality in the link-acquiring
ability between rich nodes and non-rich nodes widens as the network evolves.
Rich nodes not only become richer; they become disproportionately richer.
8.5.2 Critical Assessment of The PFP Model
The PFP model accurately reproduces the AS-level Internet topology. Compared
with other existing Internet models, the PFP model has a number of advantages.
• Firstly, the model closely matches all the topological properties that are
widely studied by the network research community, including degree
distribution, rich-club connectivity, the maximum degree, shortest path length,
short cycles, disassortative mixing and betweenness centrality.
• Secondly, the model reproduces these properties with remarkable accuracy.
• Thirdly, the two growth mechanisms used in the model, namely the
interactive growth and the positive-feedback preference, are based on (and
supported by) the observations on the Internet history data.
• Finally, the validation of the model was conducted with the
traceroute-derived AS graph, which is regarded as more realistic than measurements
based on the BGP tables (see Chapter 7).
While the initial motivation was to create a model that can accurately
reproduce the rich-club connectivity and the maximum degree of the AS graph,
the PFP model actually captures all other topological properties as well. This
suggests that the Internet structure can be described by only three topological
properties (the degree distribution, the rich-club connectivity and the maximum
degree).
Figure 8.17: Network properties of a growing PFP model with the number of nodes N = 1000 (1K), 3000 (3K) and 11122 (11K). (a) Degree distribution. (b) Degree vs rank. (c) Rich-club connectivity. (d) Nearest-neighbours average degree vs degree.
The PFP model is a phenomenological model. Further studies are needed to
explain why the Internet growth seems to follow the non-linear preferential attach-
ment given by the PFP model and what are the consequences of the PFP growth
mechanism for the future of the Internet. Figure 8.17 shows a number of network
properties of a growing PFP model with different numbers of nodes. It would be
interesting to investigate whether the PFP model also resembles other evolution
stages of the Internet topology without customising the model parameters.
8.6 Summary
There are two mechanisms that are necessary for the correct modelling of the
Internet topology at the AS level: the interactive growth and a nonlinear prefer-
ential growth, where the growth is described by a positive-feedback mechanism.
The Positive-Feedback Preference model uses the two mechanisms and accurately
reproduces all the topological properties of the AS-level Internet. The PFP model
is superior to other Internet models.
Chapter 9
Discussion and Conclusion
9.1 Discussion
Three years ago the research on the Internet topology was still in a preliminary
stage. The Internet has a power-law degree distribution. This means the network
contains a small number of nodes with very large numbers of links, and the average
degree cannot characterise this heterogeneous nature. The discovery of the
power-law degree distribution invalidated all previous research on the Internet
topology because it was based on random network theories.
Many degree-based Internet models have been proposed. However no model ac-
curately reproduces the full picture of the Internet topology. Some models are not
based on real measurement data and some models even use non-physical growth
mechanisms to produce selected network properties that are of the researcher’s
own interests.
During his year-long literature survey, the author developed an intuition that
the difficulties in modelling the Internet are due to the lack of means to thoroughly
describe the complex structure of the Internet. There might be some hidden
properties that have not been explicitly characterised by the existing topology
parameters. Therefore, the author did not follow the normal way of starting the
research by examining and comparing all the existing models, which of course
would be a daunting job. Instead the author started his research by searching for
the hidden structure in the Internet topology.
Researchers have looked for other topological properties to characterise the
Internet topology. For example, by studying the correlation between degree and
nearest-neighbours average degree, researchers have reported that the Internet
exhibits the disassortative mixing behaviour, where high-degree nodes tend to
connect to low-degree nodes. However the disassortative mixing does not charac-
terise how high-degree nodes are connected with each other.
Preliminary measurement data suggested that the Internet has a large number
of links connecting high-degree nodes. The author realised that this is a key
property of the Internet hierarchical structure. The author then introduced the
concept of the rich-club phenomenon to describe this overlooked structure, i.e. highly
connected nodes not only have large numbers of links but also are tightly
interconnected with each other. The rich-club phenomenon is quantitatively
characterised by the rich-club connectivity and the node-node link distribution.
The metric of the rich-club connectivity is a milestone in parameterising
the Internet topology. Using the rich-club connectivity, the author discovered
the structural deficiencies of the Internet models and also revealed
the structural discrepancies between different Internet measurements. Moreover,
the author showed that the rich-club connectivity is relevant to network
behaviours, such as routing efficiency, redundancy and robustness.
Inspired by the rich-club properties, the author introduced the IG model,
which closely resembles both the power-law degree distribution and the rich-club
connectivity of the AS-level Internet. The IG model uses the interactive growth
mechanism that is abstracted from observations on the Internet history data. An
important contribution of the IG model is that it demonstrates a possible way to
capture more structural properties by adopting realistic mechanisms originated
from measurements on the Internet evolution.
The author noticed that the IG model still had limitations. For example, the
model does not reproduce the maximum degree of the AS graph. The author
found that this shortfall could be responsible for not accurately reproducing other
topological properties, such as disassortative mixing. In fact it is well known that
the Internet features a very large maximum degree, but no previous model using
evolving mechanisms can reproduce this property. The author discovered that by
increasing the preference probability, the modified IG model can reproduce the
maximum degree. However, the rich-club connectivity of the generated network
deviates from that of the AS graph. After painstaking study of the Internet history
data and with some inspiration, the author introduced the PFP model. The model
modifies the IG model by using the so-called Positive-Feedback Preference, which
only favours high-degree nodes. As a result the model accurately reproduces the
maximum degree, the degree distribution and the rich-club connectivity at the same
time. While the initial motivation was to reproduce three degree-related structural
properties, the PFP model accurately captures all other topological properties as
well, including properties of short cycles, shortest path length, disassortative
mixing and betweenness centrality. The PFP model is doubtlessly the most complete
and accurate model to date.
The author is confident in the above results because, as an important
methodology that guided the whole research, the author bases the research only on
actual measurements of the Internet. The author uses the Internet
measurement data to study the network structure and validate the Internet models.
Moreover, the growth mechanisms adopted by the IG model and the PFP model
are abstracted from (and supported by) observations on Internet history data.
9.2 Future Work
The immediate work is to study the phenomenological PFP model to explain why
the preferential attachment is given by a non-linear feedback loop and what are
the consequences of this growth mechanism for the future of the Internet.
Future research work should take into account the two major challenges of
the Internet.
• Due to its rapid growth, the Internet has evolved to such an immense
scale that the existing methods are no longer valid for carrying out practical
simulations, e.g. to test new routing protocols.
• The Internet is constantly disrupted by traffic congestion, facility
failures and malicious attacks. Quality-of-Service (QoS) issues are of
increasing concern when deploying future network infrastructures.
Considering the above challenges and based on the research achievements pre-
sented in this thesis, we propose two possible future directions as follows:
1. Scaling problem [127]. Can the network simulation be simplified by using
models with smaller size and less complexity? Are all scales important at
all?
2. Cascading effects [128, 129]. Does local disorder cause a cascading disruption
of the whole network? How to predict and prevent this? How long will it
take to recover?
9.3 Conclusion
The Internet topology has been measured at two different levels. By inferring
router adjacencies it is possible to measure the Internet Router (IR) level graph.
At another level, the graph of the Internet is obtained from the AS routing path
information. These two measurements are related but describe the Internet at
different levels. The AS level describes the aggregation of the routers and links
at a given domain. The two ways to measure the AS Internet are (1) passive
measurements obtained from the BGP routing tables and (2) active measurements
where a probe traces the routers that an IP packet visits when traversing the
network (that is, at the IR level). The AS graph is obtained by mapping the router
information obtained by the probe to its AS domain. The active measurements
are considered to give a better description of the Internet connectivity because they
can collect ephemeral adjacencies not captured by only looking at the BGP tables.
In summary, the AS graph is a heterogeneous network characterised by a
power-law degree distribution. The majority of nodes have only a few links, whereas a
small number of rich nodes have large numbers of links; in particular, the best
connected node has links to nearly a quarter of the nodes in the network. Based on
the Internet measurement data, the author concluded that the AS graph exhibits a
rich-club phenomenon where the highly connected nodes are tightly interconnected
with each other. In fact the top 100 richest nodes form a fully connected mesh.
The existence of a rich-club is critical for the description and understanding
of the AS Internet. The rich-club is a “super” traffic hub of the network and the
disassortative mixing property ensures that peripheral nodes are always near the
hub. Thus the rich-club structure together with the disassortative mixing explains
why the network has a very small characteristic path length. Scale-free models
without the rich-club structure may under-estimate the flexibility of the traffic
routing in the Internet. Moreover, there is also a counter-intuitive consequence of
modelling networks without the rich-club. A model without the rich-club may
over-estimate the robustness of the network to a node attack, where the removal
of a small percentage of its richest club members can break down the network
integrity.
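A node attack of this kind can be sketched as follows: remove the highest-degree nodes and measure the size of the largest connected component that remains. The procedure, the ranking by initial degree, the tie-breaking rule and the toy graph are assumptions made for illustration and do not describe the experiments reported in this thesis.

```python
def largest_component_after_attack(adj, n_removed):
    """Size of the largest connected component after removing the
    n_removed highest-degree nodes (ranked once, by initial degree)."""
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), str(n)))
    removed = set(ranked[:n_removed])
    seen = set(removed)          # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:             # depth-first search over surviving nodes
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Toy graph: two linked hubs (0 and 1), each serving three peripheral nodes.
adj = {0: {1, 2, 3, 4}, 1: {0, 5, 6, 7},
       2: {0}, 3: {0}, 4: {0}, 5: {1}, 6: {1}, 7: {1}}
for n in range(3):
    print(n, largest_component_after_attack(adj, n))
```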
The PFP model demonstrates that the degree distribution, the maximum
degree and the rich-club connectivity can be accurately reproduced by using two
realistic growth mechanisms based on the Internet history data, namely the
interactive growth and the positive-feedback preference. Moreover, when the above
three structural properties are closely resembled, all other topological properties
of the AS graph are also reproduced at the same time. The PFP model is the most
precise and complete Internet topology generator to date. The PFP model is not
only a practical model for representative Internet simulation but also provides
insights into the fundamental rules that govern the evolution of complex networks.
The above novel contributions represent a profound extension of the state-
of-the-art knowledge in the research field of parameterising and modelling the
Internet topology.
Appendix I.
QMUL Topology Simulator
The QMUL Topology Simulator produced all the calculation and simulation results
presented in this thesis. The simulator was developed because no suitable tool was
available for this research, which involves generating self-designed models and
calculating self-defined properties. The author developed the simulator using
Microsoft Visual C++ 6.0 on the MS Windows 2000 operating system. It has the
following functions (see Figure 10.1):
• It grows scale-free networks using the BA model series, including the BA
model, the Fitness BA model and the Generalised BA model, with various
settings of initial status and parameters. It also imports topology data
generated by versions 2.1 to 3.0 of the Inet model.
• It generates Internet-like networks using self-designed models, such as the
Interactive Growth model and the Positive-Feedback Preference model.
• It parses and imports Internet measurement data (AS graphs).
• It calculates all the topological properties used in this thesis, such as clus-
tering coefficient, degree distribution, shortest path length, betweenness,
nearest-neighbours average degree, rich-club connectivity, triangle coefficient
and quadrangle coefficient.
• It exports topology data in the Pajek [130] file format, which can be used
to visualise the network graphs, e.g. Figure 5.7. It also exports plot data files
in the gnuplot [131] format to create scientific plot figures.
• It saves the network connectivity information and all the calculation results
of topological properties in the ‘∗.topo’ file format, which can be reloaded
for further use.
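To make the export step in the list above concrete, the following C++ fragment is a minimal sketch of writing an undirected graph in the basic Pajek network format (a "*Vertices" section followed by an "*Edges" section, with vertices numbered from 1). The function names and the edge-list representation are assumptions of this example; the simulator's own export routine is not reproduced in this thesis.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes an undirected graph in the basic Pajek network format.
// Node indices are 0-based in memory and 1-based in the file.
void writePajek(std::ostream& out, std::size_t nodeCount,
                const std::vector<std::pair<std::size_t, std::size_t>>& links) {
    out << "*Vertices " << nodeCount << '\n';
    for (std::size_t i = 0; i < nodeCount; ++i)
        out << (i + 1) << " \"" << (i + 1) << "\"\n";  // number and label
    out << "*Edges\n";
    for (const auto& l : links)
        out << (l.first + 1) << ' ' << (l.second + 1) << '\n';
}

// Convenience wrapper that returns the file content as a string.
std::string toPajekString(
    std::size_t nodeCount,
    const std::vector<std::pair<std::size_t, std::size_t>>& links) {
    std::ostringstream out;
    writePajek(out, nodeCount, links);
    return out.str();
}
```

Passing a std::ofstream to writePajek produces a file that Pajek can open; the string wrapper is convenient for testing.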
[Flowchart. Inputs: parameter settings, Inet data files from the Inet generator,
BGP and traceroute AS graphs parsed from Internet raw data, and saved ‘∗.topo’
topology files. Operations: grow networks using the BA, FBA, GBA, GLP, IG and
PFP models; calculate topological properties (degree distribution, rich-club
connectivity, shortest path length, triangle coefficient, degree-degree
correlations, betweenness, ...). Outputs: ‘∗.topo’ topology files, standard Pajek
network data files (visualised graphs) and standard gnuplot plot data files
(scientific plot figures).]
Figure 10.1: Function flowchart of the QMUL Topology Simulator.
The strength of this simulator is that, by using a compact linear data structure
to store the topology information, it balances calculation speed against the
amount of memory required. Running on a Dell desktop computer with only 256MB RAM
and an Intel 1.0GHz CPU, the simulator takes only 30 seconds to generate a
BA model network of 11K nodes and 33K links. The author also extended Dijkstra’s
algorithm [53] for calculating the shortest path length between every pair of
nodes, so that the same process also calculates the betweenness centrality of every
node. It takes only about 5 hours to calculate the two properties. The QMUL
Topology Simulator also has the following features:
• Flexible. The simulator supports multiple evolving network models.
• Extensible. The simulator uses an object-oriented architecture, which pro-
vides the ability to add new network models and to handle customised file
formats.
• Large-scale. The simulator is capable of processing large scale networks with
up to 100K nodes and 4.5M links.
• User Friendly. The simulator provides a Graphical User Interface as shown
in Figures 10.2, 10.3 and 10.4.
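The idea of obtaining shortest path lengths and betweenness centrality in a single pass, mentioned above, can be sketched as follows. This fragment is not the author's modification of Dijkstra's algorithm, which is not listed in this thesis; it is a minimal sketch for unweighted graphs that uses breadth-first search with Brandes-style dependency accumulation. The struct, the function name and the adjacency-list input are assumptions of this example.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct PathStats {
    double averagePathLength;         // mean over connected node pairs
    std::vector<double> betweenness;  // per node, end points excluded
};

// One breadth-first search per source node yields both the shortest path
// lengths and, by back-propagating path counts, the betweenness.
PathStats pathLengthAndBetweenness(
    const std::vector<std::vector<std::size_t>>& adj) {
    const std::size_t n = adj.size();
    PathStats s{0.0, std::vector<double>(n, 0.0)};
    double totalLength = 0.0;
    std::size_t pairCount = 0;

    for (std::size_t src = 0; src < n; ++src) {
        std::vector<long> dist(n, -1);      // -1 means not reached
        std::vector<double> sigma(n, 0.0);  // shortest paths from src
        std::vector<double> delta(n, 0.0);  // dependency of src on node
        std::vector<std::size_t> order;     // nodes by increasing distance
        std::queue<std::size_t> q;

        dist[src] = 0;
        sigma[src] = 1.0;
        q.push(src);
        while (!q.empty()) {
            const std::size_t v = q.front();
            q.pop();
            order.push_back(v);
            for (std::size_t w : adj[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
                if (dist[w] == dist[v] + 1) sigma[w] += sigma[v];
            }
        }
        for (std::size_t v : order)
            if (v != src) { totalLength += dist[v]; ++pairCount; }

        // Visit nodes farthest-first and pass dependencies to predecessors.
        for (std::size_t i = order.size(); i-- > 0;) {
            const std::size_t w = order[i];
            for (std::size_t v : adj[w])
                if (dist[v] + 1 == dist[w])
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            if (w != src) s.betweenness[w] += delta[w];
        }
    }
    for (double& b : s.betweenness) b /= 2.0;  // each pair was counted twice
    s.averagePathLength = pairCount ? totalLength / pairCount : 0.0;
    return s;
}
```

For a path of three nodes, the middle node lies on the only shortest path between the two end nodes and so has betweenness 1, while the end nodes have betweenness 0.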
Figure 10.2: Window of “Parameters for generating networks”.
Figure 10.3: Window of the main interface.
It took the author four months to design, code and debug the first version
of the QMUL Topology Simulator in late 2001. Since then the simulator has been
updated and optimised many times to fix bugs, add new
functions and improve the calculation speed. The latest version of the program
has more than 5000 lines of code and has proved to be an efficient and
powerful network simulation tool. The following is a list of functions defined in
the C++ class “CQMUL_Topo”.
long Do_UnifyData(long thisManyXdata);
long Do_ReadPlotFile(CString thisPlotName, int thisPlotFormat);
long Do_GenerateGrowOneNonLineal(long thisOneEnd, long thisCreateTime);
long Do_GenerateGetNonLinealPreferential(long thisException, double thisAlpha);
long Do_PlotAverageErrorBar(long thisManyData);
long Do_GenerateGrowOnePreference(long thisOneEnd, long thisCreateTime);
long Do_GenerateGrowOneRandom(long thisOneEnd, long thisCreateTime);
Figure 10.4: Window of “Save plot data files”.
long Do_GenerateGetLinealPreferential();
long Do_GenerateGetLinealPreferentialFBA(double thisTotal);
long Do_ReadBarabasiActor(CString thisFileName);
long Do_GetLinkDistDataRank(long X1, long X2, long Y1, long Y2);
long Do_GetLinkDistDataDegree(long X1, long X2, long Y1, long Y2);
long Do_GetRichClubRankLink(long thisRank);
long Do_PlotPercentage(long thisManyData);
long Do_ArrangeRawData(long thisManyData);
long Do_ReadDataFile(CString thisFileName, int thisFileFormat);
long GetSmallestLabel();
double Do_GetRandom(double theBase);
void Do_GenerateNLP();
void Do_CalculateLength();
void Do_Plot70();
void Do_GetBetweenThis();
void Do_Plot40_K(long thisPlot);
void Do_PlotValueRank(long thisMany);
void Do_Plot10_Rank(long thisPlot);
void Do_CalcalateLocal();
void Do_GenerateDoro();
void Do_GenerateIG();
void Do_GenerateFBA();
void Do_GenerateInitialStatus();
void Do_GenerateBA();
void Do_GenerateRandom();
void Do_Plot63();
void Do_Plot62();
void Do_PlotSortData(long thisMany);
void Do_PlotCumulative(long thisMany);
void Do_GetTopoInfo();
void Do_Plot20_Distribution(long thisPlot);
void Do_Plot60();
void Do_Plot61();
void Do_Plot51();
void Do_Plot00_ID(long thisPlot);
void Do_ArrangeData();
void Do_InitNetwork();
void Do_WritePlotFile(CString thisFileName, long thisLongData,
BOOL ifAverageErrorBar, BOOL ifAllLong);
CString Do_GetCString(double thisData);
CString Do_ComposeFileName();
BOOL DoScan(long thisSmallest);
BOOL Do_IfHasLink(long thisStart, long thisEnd);
BOOL Do_IfHasLinkAfterSort(long thisStart, long thisEnd);
BOOL Do_AddNewLink(long thisLinkID, long thisStart,
long thisEnd, long thisCreateTime, BOOL thisIfCheck);
BOOL Do_AddNewNode(long thisNodeID, long thisCreateTime);
BOOL Do_CheckData();
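The bodies of these functions are not listed in this thesis. As a rough illustration of what a compact linear topology store with add-node, add-link and has-link operations might look like, the following C++ class is a hypothetical sketch. The class name, the member names and the flat endpoint array are all assumptions of this example and should not be read as the simulator's actual data structure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compact topology store: all links live in one flat array,
// two consecutive entries per link.
class LinearTopology {
public:
    // Adds a node and returns its index.
    std::size_t addNode() {
        degree_.push_back(0);
        return degree_.size() - 1;
    }

    // Adds an undirected link. Rejects self-loops, unknown nodes and,
    // when checkDuplicate is true, links that already exist.
    bool addLink(std::size_t a, std::size_t b, bool checkDuplicate = true) {
        if (a == b || a >= degree_.size() || b >= degree_.size()) return false;
        if (checkDuplicate && hasLink(a, b)) return false;
        ends_.push_back(a);
        ends_.push_back(b);
        ++degree_[a];
        ++degree_[b];
        return true;
    }

    // Linear scan over the flat endpoint array.
    bool hasLink(std::size_t a, std::size_t b) const {
        for (std::size_t i = 0; i + 1 < ends_.size(); i += 2)
            if ((ends_[i] == a && ends_[i + 1] == b) ||
                (ends_[i] == b && ends_[i + 1] == a))
                return true;
        return false;
    }

    std::size_t nodeCount() const { return degree_.size(); }
    std::size_t linkCount() const { return ends_.size() / 2; }
    std::size_t degree(std::size_t v) const { return degree_[v]; }

    // Each node appears in the endpoint array once per link it has, so a
    // uniformly random entry selects a node with probability proportional
    // to its degree (linear preference).
    std::size_t endpointCount() const { return ends_.size(); }
    std::size_t endpointAt(std::size_t i) const { return ends_[i]; }

private:
    std::vector<std::size_t> ends_;    // two entries per link
    std::vector<std::size_t> degree_;  // degree of each node
};
```

One design consequence of the flat array is that linear preferential attachment needs no extra bookkeeping: drawing a uniformly random entry with endpointAt already favours nodes in proportion to their degree. The linear scan in hasLink trades speed for memory and could be replaced by a sorted or hashed index if duplicate checks dominate the running time.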
Appendix II.
Author’s Publications
Journal Papers
1. S. Zhou and R. J. Mondragon. The rich-club phenomenon in the Internet
topology. IEEE Communications Letters, volume 8, page 180, March 2004.
2. S. Zhou and R. J. Mondragon. Redundancy and robustness of the AS-
level Internet topology and its models. IEE Electronics Letters, volume 40,
page 151, January 2004.
3. S. Zhou and R. J. Mondragon. Accurately modelling the Internet topology.
Accepted by Physical Review E, 2004.
4. M. Woolf, D. K. Arrowsmith, S. Zhou, R. J. Mondragon and J. M. Pitts.
Dynamical modelling of TCP packet traffic on scale-free networks. Submit-
ted to Physical Review E, 2004.
Conference Papers
5. S. Zhou and R. J. Mondragon. Towards modelling the Internet topology
- the Interactive Growth model. In J. Charzinski, editor, Proc. of 18th
International Teletraffic Congress (ITC18), volume 5a of Teletraffic Science
and Engineering (Elsevier), pages 121–130, Berlin, Germany, Sept. 2003.
6. S. Zhou and R. J. Mondragon. The missing links in the BGP-based AS
connectivity maps. In Proc. of Passive and Active Measurement Workshop
(PAM2003), pages 219–222, San Diego, USA, April 2003.
7. S. Zhou and R. J. Mondragon. Analyzing and modelling the AS-level
Internet topology. In Proc. of 1st International Working Conference on
Performance Modelling and Evaluation of Heterogeneous Networks
(HET-NETs’03), Ilkley, West Yorkshire, UK, July 2003.
8. S. Zhou and R. J. Mondragon. Topological properties of the AS-level
Internet. In Proc. of IEEE & IEE International Conference on
Telecommunications (ICT2002), volume 3, pages 497–501, Beijing, China, June
2002.
9. S. Zhou and R. J. Mondragon. Connectivity in the Internet topology. In
Proc. of PGNet2002, pages 157–162, Liverpool, UK, May 2002.
10. S. Zhou and R. J. Mondragon. The Positive-Feedback Preference model of
the AS-level Internet topology. Submitted to IEEE ICC, 2005.
11. S. Zhou and R. J. Mondragon. Sampling methodologies and structural
deficiencies of the AS-level Internet topology measurements. Submitted to
The International Conference on Information Networking (ICOIN) 2005.
Glossary
AS Autonomous System, a collection of routers operated in a coordinated way
so that the routers implement the same routing policy; typically operated
by a single administrative entity.
ASN Autonomous System Number, a two-byte number that uniquely identifies
an AS.
BGP Border Gateway Protocol, the primary inter-domain routing protocol used
in the Internet.
ICMP Internet Control Message Protocol, the diagnostic part of the network
layer used in the Internet for reporting status information, checking connec-
tivity, and so on.
IP Internet Protocol, the network layer protocol used by the Internet.
ISPs Internet Service Providers.
LANs Local Area Networks.
MANs Metropolitan Area Networks.
Protocol A standard procedure for regulating data transmission between com-
puters.
Router A computer that typically has two or more interfaces on different net-
works and provides forwarding of packets between those networks.
Routing The process by which a router calculates a forwarding table by using
its knowledge of the network taken from local configurations.
Routing Table A conceptual data structure used to hold routing information.
Server A hardware and software device designed to perform a specific function
for many users.
TCP Transmission Control Protocol, the principal reliable transport protocol
used in the Internet.
UDP User Datagram Protocol.
WANs Wide Area Networks.
WWW World Wide Web.
Bibliography
[1] L. A. Adamic and B. A. Huberman, “Power-law distribution of the world
wide web,” Science, vol. 287, p. 2115, 2000.
[2] S. H. Strogatz, “Exploring complex networks,” Nature (London), vol. 410,
p. 268, 2001.
[3] P. L. Krapivsky and S. Redner, “Organization of growing random networks,”
Phys. Rev. E, vol. 63, p. 066123, 2001.
[4] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, “Search
in power-law networks,” Physical Review E, vol. 64, p. 046135, 2001.
[5] A. L. Barabasi, Linked: The New Science of Networks. Perseus Publishing,
2002.
[6] R. Albert and A. L. Barabasi, “Statistical mechanics of complex networks,”
Rev. Mod. Phys., vol. 74, pp. 47–97, 2002.
[7] S. Bornholdt and H. G. Schuster, Handbook of Graphs and Networks - From
the Genome to the Internet. Weinheim Germany: Wiley-VCH, 2002.
[8] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Networks - From
Biological Nets to the Internet and WWW. Oxford University Press, 2003.
[9] A. Vazquez, M. Boguna, Y. Moreno, R. Pastor-Satorras, and A. Vespignani,
“Topology and correlations in structured scale-free networks,” Physical Review E,
vol. 67, no. 046111, 2003.
[10] R. Cohen and S. Havlin, “Scale-free networks are ultrasmall,” Physical
Review Letters, vol. 90, no. 5, p. 058701, 2003.
[11] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Inter-
net - A Statistical Physics Approach. Cambridge University Press, 2004.
[12] S. T. Park, D. Pennock, and C. L. Giles, “Comparing static and dynamic
measurements and models of the Internet’s AS topology,” in Proc. of IEEE
INFOCOM 2004, 2004.
[13] A. Medina, I. Matta, and J. Byers, “On the origin of power laws in Internet
topologies,” ACM SIGCOMM Computer Communication Review, 2000.
[14] K. I. Goh, B. Kahng, and D. Kim, “Fluctuation-driven dynamics of the
Internet topology,” Physical Review Letters, 2002.
[15] A. C. Zorach and R. E. Ulanowicz, “Quantifying the complexity of flow
networks: how many roles are there?” Complexity, vol. 8, no. 3, 2003.
[16] A. L. Barabasi, Z. Deszo, E. Ravasz, S. H. Yook, and Z. Oltvai, “Scale-
free and hierarchical structures in complex networks,” to appear in Sitges
Proceedings on Complex Networks, 2004.
[17] S. Floyd, “Simulation is crucial,” IEEE Spectrum, January 2001.
[18] G. F. Riley and M. H. Ammar, “Simulating large networks - how big is big
enough?” in Proc. of 1st Intl. Conf. on Grand Challenges for Modeling and
Simulation, 2002.
[19] V. Paxson and S. Floyd, “Why we don’t know how to simulate the Internet,”
in Proc. of the 1997 Winter Simulation Conference, 1997.
[20] S. Floyd and V. Paxson, “Difficulties in simulating the Internet,”
IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 392–403, Au-
gust 2001.
[21] S. Floyd and E. Kohler, “Internet research needs better models,” ACM
SIGCOMM Computer Communications Reviews, vol. 33, no. 1, pp. 29–34,
January 2003.
[22] W. Willinger and V. Paxson, “Where mathematics meets the Internet,”
Notices of the American Mathematical Society, vol. 45, no. 8, 1998.
[23] B. Yao, R. Viswanathan, F. Chang, and D. Waddington, “Topology infer-
ence in the presence of anonymous routers,” in Proc. IEEE INFOCOM,
2003.
[24] T. Petermann and P. D. L. Rios, “Exploration of scale-free networks – do
we measure the real exponents?” Eur. Phys. J., vol. 38, pp. 201–204, 2004.
[25] NLANR (National Laboratory for Applied Network Research),
http://moat.nlanr.net/.
[26] Route Views Project, University of Oregon, Eugene.
http://www.routeviews.org/.
[27] Routing Information Service, RIPE Network Coordination Center.
http://www.ripe.net/.
[28] CAIDA (Cooperative Association For Internet Data Analysis),
http://www.caida.org/.
[29] Internet Mapping Project, Lumeta, http://research.lumeta.com/ches/map/.
[30] Topology Project, University of Michigan, Ann Arbor.
http://topology.eecs.umich.edu/.
[31] M. Murray and kc claffy, “Measuring the immeasurable: global Internet
measurement infrastructure,” in Proc. of PAM2001, 2001.
[32] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships
of the Internet topology,” Comput. Commun. Rev., vol. 29, pp. 251–262,
1999.
[33] P. Erdos and A. Renyi, “On random graphs,” Publ. Math. Debrecen, vol. 6,
p. 290, 1959.
[34] P. Erdos and A. Renyi, “On the evolution of random graphs,” Publ. Math.
Inst. Hung. Acad. Sci., vol. 5, p. 17, 1960.
[35] B. M. Waxman, “Routing of multipoint connections,” IEEE Journal of Se-
lected Areas in Communications, vol. 6, no. 9, pp. 1617–1622, 1988.
[36] A. Capocci, G. Caldarelli, R. Marchetti, and L. Pietronero, “Growing dy-
namics of Internet providers,” Physical Review E, vol. 64, no. 035105, 2001.
[37] J. Winick and S. Jamin, “Inet-3.0 Internet topology generator,” University
of Michigan, Tech. Rep. UM-CSE-TR-456-02, 2002.
[38] A. L. Barabasi and R. Albert, “Emergence of scaling in random networks,”
Science, vol. 286, pp. 509–512, 1999.
[39] G. Bianconi and A. L. Barabasi, “Competition and multiscaling in evolving
networks,” Europhysics Letters, vol. 54, no. 4, pp. 436–442, 2001.
[40] R. Albert and A. L. Barabasi, “Topology of evolving networks: local events
and universality,” Physical Review Letters, vol. 85, no. 24, pp. 5234–5237,
2000.
[41] A. Medina and I. Matta, “Brite: A flexible generator of Internet topologies,”
Boston University, Tech. Rep. BU-CS-TR-2000-005, 2000.
[42] S. N. Dorogovtsev and J. F. F. Mendes, “Scaling behaviour of developing
and decaying networks,” EuroPhys. Lett., vol. 52, no. 33, p. 33, 2000.
[43] D. Vukadinovic, P. Huang, and T. Erlebach, “A spectral analysis of the
Internet topology,” Technical report ETH TIK-NR. 118, 2001.
[44] T. Bu and D. Towsley, “On distinguishing between Internet power law topol-
ogy generators,” in Proc. of IEEE INFOCOM 2002, 2002, p. 638.
[45] G. Caldarelli, P. D. L. Rios, and L. Pietronero, “Generalized network
growth: from microscopic strategies to the real Internet properties,”
arXiv:cond-mat/0307610 v1, 2004.
[46] J. M. Carlson and J. C. Doyle, “Highly optimized tolerance: A mechanism
for power laws in designed systems,” Physical Review E, vol. 60, pp. 1412–
1428, 1999.
[47] I. Norros and H. Reittu, “Architectural features of the power-law random
graph model of Internet: nodes on soft hierarchy, vulnerability and multi-
casting,” in Proceedings of the 18th International Teletraffic Congress - ITC
18, Elsevier, 2003.
[48] C. P. B. Quoitin and L. Swinnen, “Interdomain traffic engineering with
BGP,” IEEE Communications Magazine, May 2003.
[49] D. K. Arrowsmith and M. Woolf, “Modelling of TCP packet traffic in a large
interactive growth network,” IEEE Proc. of Systems and Circuits, 2004.
[50] M. Barenco and D. K. Arrowsmith, “The autocorrelation of double inter-
mittency maps and the simulation of computer packet traffic,” to appear in
Jnl of Dyn. Sys, 2004.
[51] C. Labovitz, A. Ahuja, R. Wattenhofer, and S. Venkatachary, “The impact
of Internet policy and topology on delayed routing convergence,” in Proc.
of INFOCOM 2001, 2001.
[52] R. V. Sole and S. Valverde, “Information theory of complex networks: On
evolution and architectural constraints,” Santa Fe Institute, Tech. Rep. DOI:
SFI-WP 03-11-061, 2003.
[53] A. Kershenbaum, Telecommunications network design algorithms.
McGraw-Hill, Inc., 1993.
[54] M. Steenstrup, Routing in communications networks. Prentice Hall, 1995.
[55] H. Tangmunarunkit, R. Govindan, S. Shenker, and D. Estrin, “The impact
of routing policy on Internet paths,” in Proc. of IEEE INFOCOM 2001,
2001.
[56] R. Guerin and A. Orda, “Computing shortest paths for any number of hops,”
IEEE/ACM Transactions on Networking, vol. 10, no. 5, October 2002.
[57] K. I. Goh, B. Kahng, and D. Kim, “Universal behavior of load distribution
in scale-free networks,” Phys. Rev. Lett., vol. 87, no. 278701, 2001.
[58] K. I. Goh, E. Oh, B. Kahng, and D. Kim, “Betweenness centrality correla-
tion in social networks,” Phys. Rev. E, vol. 67, no. 017101, 2003.
[59] P. Holme and B. J. Kim, “Vertex overload breakdown in evolving networks,”
Phys. Rev. E, vol. 65, no. 066109, 2002.
[60] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han, “Attack vulnerability of
complex networks,” Phys. Rev. E, vol. 65, no. 056109, 2002.
[61] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ net-
works,” Nature, vol. 393, 1998.
[62] M. E. J. Newman, “Assortative mixing in networks,” Phys. Rev. Lett.,
vol. 89, no. 208701, 2002.
[63] M. E. J. Newman, “Mixing patterns in networks,” Phys. Rev. E, vol. 67,
no. 026126, 2003.
[64] S. Maslov, K. Sneppen, and A. Zaliznyak, “Detection of topological patterns
in complex networks: correlation profile of the Internet,” Physica A, vol. 333,
p. 529, 2004.
[65] R. Xulvi-Brunet, W. Pietsch, and I. M. Sokolov, “Correlations in scale-free
networks: Tomography and percolation,” Phys Rev E, vol. 68, no. 036119,
2003.
[66] R. Pastor-Satorras, A. Vazquez, and A. Vespignani, “Dynamical and cor-
relation properties of the Internet,” Phys. Rev. Lett., vol. 87, no. 258701,
2001.
[67] A. Vazquez, R. Pastor-Satorras, and A. Vespignani, “Large-scale topological
and dynamical properties of Internet,” Phys. Rev. E, vol. 65, no. 066130,
2002.
[68] S. Janson, T. Luczak, and A. Rucinski, Random Graphs. Wiley-
Interscience, 2000.
[69] D. J. Watts, Small Worlds: The Dynamics of Networks between Order and
Randomness. New Jersey, USA: Princeton University Press, 1999.
[70] L. Adamic, “The small world web,” in Proceedings of ECDL’99, 1999, pp.
443–452.
[71] M. E. J. Newman and D. J. Watts, “Scaling and percolation in the small-
world network model,” Phys. Rev. E, vol. 60, p. 7332, 1999.
[72] M. E. J. Newman and D. J. Watts, “Renormalization group analysis of the
small-world network model,” Physics Letters A, vol. 263, pp. 341–346, 1999.
[73] Y. Hyun, A. Broido, and k. claffy, “Traceroute and BGP AS path incon-
gruities,” http://www.caida.org/outreach/papers/2003/ASP/.
[74] Q. Chen, H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger,
“The origin of power laws in Internet topologies (revisited),” in Proc. of
IEEE INFOCOM 2002, 2002, pp. 608–617.
[75] H. Chang, R. Govindan, S. Jamin, S. J. Shenker, and W. Willinger,
“Towards capturing representative AS-level Internet topology,” Computer
Networks Journal, vol. 44, no. 6, pp. 737–755, 2004.
[76] R. Govindan and H. Tangmunarunkit, “Heuristics for Internet map discov-
ery,” in Proc IEEE Infocom 2000, 2000.
[77] R. Albert, H. Jeong, and A. L. Barabasi, “Error and attack tolerance of
complex networks,” Nature, vol. 406, pp. 378–381, 2000.
[78] L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz, “Characteriz-
ing the Internet hierarchy from multiple vantage points,” in Proc. of IEEE
INFOCOM 2002, 2002, pp. 618–627.
[79] S. T. Park, A. Khrabrov, D. M. Pennock, S. Lawrence, C. L. Giles, and
L. H. Ungar, “Static and dynamic analysis of the Internet’s susceptibility to
faults and attacks,” in Proc. of IEEE INFOCOM 2003, vol. 3, April 2003,
pp. 2144–2154.
[80] A. Broido and kc Claffy, “Internet topology: connectivity of IP graphs,” in
SPIE International symposium on Convergence of IT and Communication
2001, 2001.
[81] B. Huffaker, D. Plummer, D. Moore, and kc Claffy, “Topology discovery by
active probing,” in Proc. of the 2002 Symposium on Applications and the
Internet, 2002.
[82] E. N. A. Broido and kc Claffy, “Internet expansion, refinement and churn,”
European Transactions on Telecommunications 2002, 2002.
[83] K. L. Calvert, M. B. Doar, and E. W. Zegura, “Modeling Internet topology,”
IEEE Communications Magazine, June 1997.
[84] M. Doar, “A better model for generating test networks,” Proc. of IEEE
GLOBECOM 1996, Nov. 1996.
[85] E. W. Zegura, K. L. Calvert, and M. J. Donahoo, “A quantitative
comparison of graph-based models for Internet topology,” ACM/IEEE Transactions
on Networking, vol. 5, no. 6, pp. 770–783, 1997.
[86] C. Jin, Q. Chen, and S. Jamin, “Inet: Internet topology generator,” Uni-
versity of Michigan, Tech. Rep. UM-CSE-TR-433-00, 2000.
[87] A. L. Barabasi, “The architecture of complexity: From the diameter of the
www to the structure of the cell,” http://www. nd. edu/ networks/.
[88] Z. N. H. Jeong and A. L. Barabasi, “Measuring preferential attachment in
evolving networks,” Europhysics Letters, vol. 61, no. 4, pp. 567–572, 2003.
[89] A. L. Barabasi, “The physics of the web,” Physics World, July 2001.
[90] D. Cohen, “All the world is a net,” New Scientist, April 2002.
[91] R. Cohen and S. Havlin, “Scale-free networks are ultrasmall,” Phys. Rev.
Lett., vol. 90, no. 5, p. 058701, 2003.
[92] Y. Moreno, R. Pastor-Satorras, A. Vazquez, and A. Vespignani, “Critical
load and congestion instabilities in scale-free networks,” Europhys. Lett.,
vol. 62, p. 292, 2002.
[93] S. H. Yook, H. Jeong, and A. L. Barabasi, “Modelling the Internet’s
large-scale topology,” Proc. of the Nat’l Academy of Sciences, vol. 99,
pp. 13 382–13 386, 2002.
[94] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free
networks,” Physical Review Letters, vol. 86, no. 14, pp. 3200–3203, 2001.
[95] A. L. Barabasi, R. Albert, and H. Jeong, “Mean-field theory for scale-free
random networks,” Physica A, vol. 272, pp. 173–187, 1999.
[96] A. Medina, A. Lakhina, I. Matta, and J. Byers, “Brite: Universal topology
generation from a user’s perspective,” Boston University, Tech. Rep. BUCS-
TR-2001-003, 2001.
[97] G. Bianconi, G. Caldarelli, and A. Capocci, “Number of h-cycles in the
Internet at the autonomous system level,” ArXiv:cond-mat/0310339, 2003.
[98] A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou, “Heuristically op-
timized trade-offs: A new paradigm for power laws in the Internet,” in Proc.
of ICALP 2002, 2002.
[99] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Will-
inger, “Network topology generators: Degree-based vs. structural,” Proc.
of ACM/SIGCOMM 2002, pp. 147–159, 2002.
[100] H. Tangmunarunkit, J. Doyle, R. Govindan, and S. Jamin, “Does AS size
determine degree in AS topology?” ACM SIGCOMM Computer Commu-
nication Review, 2001.
[101] D. Krioukov, http://www.krioukov.net/ dima/rs.html.
[102] J. Spencer and L. Sacks, “Modelling ip network topologies by emulating
network development processes,” in IEEE Softcom 2002, 2002.
[103] H. Fuks and A. T. Lawniczak, “Performance of data networks with random
links,” Mathematics and Computers in Simulation, vol. 51, pp. 103–119,
1999.
[104] L. Gao, “On inferring autonomous system relationships in the Internet,” in
Proc. of IEEE Global Internet, 2000.
[105] S. Zhou and R. J. Mondragon, “The rich-club phenomenon in the Internet
topology,” IEEE Comm. Lett., vol. 8, no. 3, pp. 180–182, March 2004.
[106] S. Zhou and R. J. Mondragon, “Connectivity in the Internet topology,” in
Proc. of PGNet2002. Liverpool, UK: EPSRC, May 2002, pp. 157–162.
[107] S. Zhou and R. J. Mondragon, “Topological properties of the AS-level
Internet,” in Proc. of Int. Conf. on Telecommunications (ICT) 2002, vol. 3.
Beijing, China: IEEE and IEE, June 2002, pp. 497–501.
[108] S. Zhou and R. J. Mondragon, “Redundancy and robustness of the AS-level
Internet topology and its models,” IEE Elec. Lett., vol. 40, no. 2,
pp. 151–152, January 2004.
[109] S. Zhou and R. J. Mondragon, “Analyzing and modelling the AS-level
Internet topology,” in Proc. of 1st Int. Working Conf. on Performance Modelling
and Evaluation of Heterogeneous Networks (HET-NETs’03), Ilkley, West
Yorkshire, UK, July 2003, arXiv:cs.NI/0303030.
[110] S. Zhou and R. J. Mondragon, “Towards modelling the Internet topology
- the interactive growth model,” in Proc. of 18th Int. Teletraffic Congress
(ITC18), ser. Teletraffic Science and Engineering, J. Charzinski, Ed., vol. 5a.
Berlin, Germany: Elsevier, Sept. 2003, pp. 121–130.
[111] M. Woolf and D. K. Arrowsmith, “Modelling of TCP packet traffic in a large
interactive growth network,” in IEEE Int. Symposium on Circuits and
Systems (ISCAS), Vancouver, Canada, May 2004.
[112] M. Woolf, D. K. Arrowsmith, S. Zhou, R. J. Mondragon, and J. M. Pitts,
“Dynamical modelling of TCP packet traffic on scale-free networks,”
(submitted), 2004.
[113] The Data Kit #0204 was collected as part of CAIDA’s Skitter initiative,
http://www.caida.org. Support for Skitter is provided by DARPA, NSF,
and CAIDA membership.
[114] P. M. Gleiss, P. F. Stadler, A. Wagner, and D. A. Fell, “Small cycles in
small worlds,” SFI Working Paper 00-10-058, 2000.
[115] G. Bianconi and A. Capocci, “Number of loops of size h in growing scale-free
networks,” Phys. Rev. Lett., vol. 90, no. 078701, 2003.
[116] R. P.-S. G. Caldarelli and A. Vespignani, “Structure of cycles and local
ordering in complex networks,” The European Physical Journal B, vol. 28,
no. 2, pp. 183–186, 2004.
[117] M. M. C. Gkantsidis and E. Zegura, “Spectral analysis of Internet topolo-
gies,” in Proc. of IEEE INFOCOM 2003, 2003.
[118] G. Iannaccone, C. N. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot,
“Analysis of link failures in an IP backbone,” Proc. of the second ACM
SIGCOMM Workshop on Internet measurement, 2002.
[119] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, “Network
robustness and fragility: Percolation on random graphs,” Physical Review
Letters, vol. 85, no. 25, p. 5468, December 2000.
[120] S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos, “A simple conceptual
model for the Internet topology,” in Proc. of Global Internet, San Antonio,
Texas, 2001.
[121] S. Zhou and R. J. Mondragon, “The missing links in the BGP-based AS
connectivity maps,” in Proc. of Passive and Active Measurement (PAM)
Workshop 2003. San Diego, USA: NLANR, April 2003, pp. 219–222,
arXiv:cs.NI/0303028.
[122] S. Zhou and R. J. Mondragon, “On measuring and modeling the Internet
topology at the autonomous systems level,” Submitted to ACM/IMC2004,
2004.
[123] P. L. Krapivsky, S. Redner, and F. Leyvraz, “Connectivity of growing ran-
dom networks,” Phys. Rev. Lett., vol. 85, no. 4629, 2000.
[124] A. V. A. Vazquez, R. Pastor-Satorras, “Internet topology at the router and
autonomous system level,” cond-mat/0206084, 2002.
[125] S. Zhou and R. J. Mondragon, “The positive-feedback preference model of
the AS-level Internet topology,” Submitted to IEEE Communications Letters,
2004.
[126] S. Zhou and R. J. Mondragon, “Accurately modelling the Internet topology,”
2004, preprint: arXiv:cs.NI/0402011.
[127] K. Psounis, R. Pan, B. Prabhakar, and D. Wischik, “The scaling hypoth-
esis: simplifying the prediction of network performance using scaled-down
simulations,” ACM SIGCOMM Computer Communications Review, vol. 33,
no. 1, 2003.
[128] A. V. Y. Moreno, R. Pastor-Satorras and A. Vespignani, “Critical load
and congestion instabilities in scale-free networks,” Europhys. Lett., vol. 62,
no. 2, pp. 292–298, 2003.
[129] S. Agarwal, C. N. Chuah, and R. H. Katz, “OPCA: Robust interdomain
policy routing and traffic control,” in Proc. of the 6th International
Conference on Open Architectures and Network Programming (OPENARCH
2003), 2003.
[130] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/.
[131] gnuplot, http://t16web.lanl.gov/Kawano/gnuplot/.