Investigating the impact of the blogosphere: using PageRank to determine the distribution of...
DESCRIPTION
Much has been written in recent years about the blogosphere and its impact on political, educational and scientific debates. Lately the issue has also received significant attention from industry. As the blogosphere continues to grow, even doubling in size every six months, this paper investigates its apparent impact on the overall Web itself. We use the popular Google PageRank algorithm, which employs a model of Web use, to measure the distribution of user attention across sites in the blogosphere. The paper is based on an analysis of the PageRank distribution for 8.8 million blogs in 2005 and 2006. This paper by Lars Kirchhoff, Axel Bruns, and Thomas Nicolai for the Association of Internet Researchers conference in Vancouver, 17-20 Oct. 2007, addresses the following key questions: How is PageRank distributed across the blogosphere? Does it indicate the existence of measurable, visible effects of blogs on the overall mediasphere? Can we compare the distribution of attention to blogs, as characterised by PageRank, with the situation for other forms of Web content? Has there been a growth in the impact of the blogosphere on the Web over the two years analysed here? Finally, it will also be necessary to examine the limitations of a PageRank-centred approach.
TRANSCRIPT
Using PageRank to determine the distribution of attention
Lars Kirchhoff | Axel Bruns | Thomas Nicolai
Investigating the impact of the blogosphere
18.10.2007
What are the questions?
► Is the impact of the blogosphere different from that of other forms of online media?
► How is PageRank distributed across the blogosphere?
► Does it indicate the existence of measurable, visible effects of blogs on the overall mediasphere?
► Has there been a growth in the impact of the blogosphere on the Web over the two years analysed here?
What have we done?
2005
► ~15m profiles from blogger.com
► ~8.871m unique blog URLs extracted
► Google PageRank retrieved
2006
► Same profiles, but slightly more unique blog URLs extracted (~8.888m)
► Google PageRank retrieved
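The slides do not say how blog URLs were extracted from the Blogger profiles or how duplicates were removed. A minimal sketch, assuming each profile yields a list of blog URL strings and that two URLs name the same blog when they differ only in scheme, host letter case, a leading `www.` or a trailing slash. These normalization rules and the function names are hypothetical, not the authors' procedure:

```python
from urllib.parse import urlsplit


def normalize_blog_url(url: str) -> str:
    """Reduce a blog URL to a canonical key (hypothetical normalization rules)."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname is already lower-cased; be explicit
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")  # treat "/blog/" and "/blog" as the same page
    return host + path


def unique_blog_urls(profile_urls):
    """Collect distinct normalized blog URLs from an iterable of per-profile URL lists."""
    seen = set()
    for urls in profile_urls:
        for url in urls:
            key = normalize_blog_url(url)
            if key:  # skip entries with no recognizable host
                seen.add(key)
    return seen


# Toy input: the first profile lists the same blog twice in different forms.
profiles = [
    ["http://example.blogspot.com/", "http://www.Example.blogspot.com"],
    ["example2.blogspot.com"],
    [],
]
print(len(unique_blog_urls(profiles)))
```

Stricter or looser rules (for example, ignoring query strings) would change the unique count, which is one reason counts such as ~8.871m and ~8.888m depend on the extraction method.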
Why PageRank?
► Available for almost any web page
► Easy to gather
► Global property that takes the whole Web into account
► Search is the most common way to look for information
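PageRank's "global property" comes from its random-surfer model: a page's score depends on the scores of all pages linking to it, PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over pages q that link to p, where N is the number of pages, L(q) is the number of outgoing links of q and d is a damping factor (0.85 is the commonly cited value). The sketch below uses power iteration to implement that textbook formulation on a toy link graph. It is not Google's production algorithm, which is not fully known. The 0 to 10 toolbar values gathered in this study are a coarse display whose mapping from the raw score Google has not published. Spreading dangling pages' rank evenly across all pages is one common convention among several.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)  # include pages that only appear as link targets
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages (no outgoing links) is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            for q in out:
                # Each page passes its rank in equal shares along its outgoing links.
                new[q] += damping * rank[p] / len(out)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the scores have converged
            break
    return rank


# Toy graph: two pages link to "c", which links nowhere.
links = {"a": ["c"], "b": ["c"], "c": []}
ranks = pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

The scores always sum to 1, so a page can only gain PageRank at the expense of others. That zero-sum property is relevant when interpreting shifts in the blogosphere's share between 2005 and 2006.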
What do we have?
[Figure: Blogosphere PageRank Distribution 2005. x-axis: PageRank (0 to 10); y-axis: # blogs (log scale, 1 to 10,000,000)]
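The distribution charts plot the number of blogs at each PageRank value from 0 to 10 on a logarithmic y-axis. A minimal sketch of that tabulation, assuming the retrieved values are held in a dict mapping each blog URL to an integer PageRank. The function names and the sample data are invented for illustration:

```python
import math
from collections import Counter


def pagerank_histogram(pr_by_url):
    """Count blogs at each PageRank value 0..10 (values outside that range are ignored)."""
    counts = Counter(pr_by_url.values())
    return [counts.get(level, 0) for level in range(11)]


def log10_counts(hist):
    """log10 of each count, as drawn on a log-scale y-axis; None where the count is 0."""
    return [math.log10(c) if c > 0 else None for c in hist]


# Invented sample: four blogs with toolbar-style PageRank values.
sample = {"a.blogspot.com": 0, "b.blogspot.com": 0, "c.blogspot.com": 3, "d.blogspot.com": 5}
hist = pagerank_histogram(sample)
print(hist)
print(log10_counts(hist))
```

Levels with zero blogs have no finite logarithm, which is why sparse high-PageRank levels simply drop off a log-scale chart instead of showing as zero.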
What do we have?
[Figure: Blogosphere PageRank Distribution 2006, plotted with 2005 for comparison. Series: Distribution 2005, Distribution 2006. x-axis: PageRank (0 to 10); y-axis: # blogs (log scale, 1 to 10,000,000)]
What has happened?
[Figure: Increase and decrease (%) of PageRank from 2005 to 2006. x-axis: PageRank (0 to 10); y-axis: percent (0 to 120)]
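The slide does not state how its percentages were computed. One plausible reading of the 0 to 120 scale, assumed here, is that it shows the 2006 blog count at each PageRank level as a percentage of the 2005 count: 100 means no change and values below 100 mean a decline. A minimal sketch under that assumption. The function names and counts are hypothetical:

```python
def percent_of_previous(hist_2005, hist_2006):
    """2006 count as a percentage of the 2005 count at each PageRank level.

    Assumed reading of the chart: 100 = unchanged, below 100 = decline.
    Levels with no blogs in 2005 give None, since the percentage is undefined.
    """
    result = []
    for before, after in zip(hist_2005, hist_2006):
        result.append(100.0 * after / before if before else None)
    return result


def percent_change(hist_2005, hist_2006):
    """Signed relative change in percent: positive = increase, negative = decrease."""
    return [None if p is None else p - 100.0
            for p in percent_of_previous(hist_2005, hist_2006)]


# Hypothetical blog counts for PageRank levels 0..10 (not the study's data).
h05 = [1000, 400, 200, 50, 10, 4, 2, 1, 0, 0, 0]
h06 = [1100, 300, 150, 50, 11, 4, 1, 1, 0, 0, 0]
print(percent_change(h05, h06))
```

With very few blogs at PR7+, a change of one or two blogs produces large percentage swings, so declines at the top end should be read alongside the absolute counts.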
What does this mean?
► Strong decline at PageRank 1-2 and 7-10
► Lower end: an effect of focusing on Blogger?
  ► Blogger as a sandbox with high attrition?
► Higher end: a shrinking A-list?
  ► Churn away from Blogger?
  ► Harder to achieve a high PageRank in a larger, more diverse Web?
► Need to track trajectories
  ► e.g. how many low-PR blogs rose from 2005 to 2006?
  ► e.g. how many PR7+ blogs survived from 2005 to 2006?
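Tracking trajectories means matching individual blogs across the two snapshots instead of comparing aggregate counts. A minimal sketch, assuming each snapshot is a dict from normalized blog URL to PageRank. The cut-off for "low PR" (here PR 2 or below), the definition of "survived" (still at PR7+ in 2006) and the function name are hypothetical choices, not taken from the slides:

```python
def trajectories(pr_2005, pr_2006, low_max=2, high_min=7):
    """Follow blogs across two snapshots (dicts: url -> PageRank 0..10).

    Returns (low-PR blogs that rose, PR7+ blogs that stayed PR7+, number of PR7+ blogs in 2005).
    """
    common = pr_2005.keys() & pr_2006.keys()  # blogs present in both years
    rose_from_low = sorted(u for u in common
                           if pr_2005[u] <= low_max and pr_2006[u] > pr_2005[u])
    high_2005 = [u for u in pr_2005 if pr_2005[u] >= high_min]
    # A blog missing from 2006 counts as not surviving (default -1).
    survived_high = sorted(u for u in high_2005 if pr_2006.get(u, -1) >= high_min)
    return rose_from_low, survived_high, len(high_2005)


# Toy snapshots: "e" disappears in 2006, "c" drops from PR7 to PR5.
pr05 = {"a": 1, "b": 2, "c": 7, "d": 8, "e": 0}
pr06 = {"a": 3, "b": 2, "c": 5, "d": 8}
rose, survived, total_high = trajectories(pr05, pr06)
print(f"{len(rose)} low-PR blogs rose; {len(survived)} of {total_high} PR7+ blogs kept PR7+")
```

Defining "survived" as "still present in 2006 at any PageRank" would answer a different question (attrition rather than loss of status). Both readings fit the slide's wording.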
What are the limitations?
► Coarse values (integer scale from 0 to 10)
► The algorithm is not entirely known
► Updates to Google PageRank occur at irregular intervals
What next?
► Use more blogs
► Measure PageRank more frequently
► Use other indicators/measures (Alexa, Technorati, Bloglines)
► Discuss different metrics