Investigating the impact of the blogosphere: Using PageRank to determine the distribution of attention

Lars Kirchhoff | Axel Bruns | Thomas Nicolai

18.10.2007

Upload: axel-bruns

Post on 14-May-2015


DESCRIPTION

Much has been written in recent years about the blogosphere and its impact on political, educational and scientific debates. Lately the issue has also received significant attention from industry. As the blogosphere continues to grow, doubling in size every six months, this paper investigates its apparent impact on the Web as a whole. We use the popular Google PageRank algorithm, which employs a model of Web use, to measure the distribution of user attention across sites in the blogosphere. The paper is based on an analysis of the PageRank distribution for 8.8 million blogs in 2005 and 2006. This paper by Lars Kirchhoff, Axel Bruns, and Thomas Nicolai for the Association of Internet Researchers conference in Vancouver, 17-20 Oct. 2007, addresses the following key questions: How is PageRank distributed across the blogosphere? Does it indicate the existence of measurable, visible effects of blogs on the overall mediasphere? Can we compare the distribution of attention to blogs, as characterised by PageRank, with the situation for other forms of Web content? Has there been a growth in the impact of the blogosphere on the Web over the two years analysed here? Finally, the paper also examines the limitations of a PageRank-centred approach.

TRANSCRIPT

Page 1: Investigating the Impact of the Blogosphere: Using PageRank to Determine the Distribution of Attention

Using PageRank to determine the distribution of attention

Lars Kirchhoff | Axel Bruns | Thomas Nicolai

Investigating the impact of the blogosphere

18.10.2007

Page 2:

What are the questions?

► Is the impact of the blogosphere different to other forms of online media?

► How is PageRank distributed across the blogosphere?

► Does it indicate the existence of measurable, visible effects of blogs on the overall mediasphere?

► Has there been a growth in the impact of the blogosphere on the Web over the two years analysed here?

Page 3:

What have we done?

2005

► ~15m profiles from blogger.com

► ~8.871m unique blog URLs extracted

► Retrieved Google PageRank

2006

► Same profiles

► But slightly more unique blog URLs extracted (~8.888m)

► Retrieved Google PageRank
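The slides do not say how unique blog URLs were extracted from the profiles. As a minimal sketch of what such a step could look like, the following Python collapses URLs to a host-level key and deduplicates them. The function names `normalize_blog_url` and `unique_blog_urls`, the normalization rule (lowercase, drop `www.`, ignore the path) and the sample data are all assumptions made for illustration, not the authors' procedure.

```python
from urllib.parse import urlsplit


def normalize_blog_url(url: str) -> str:
    """Reduce a blog URL to a lowercase host name (illustrative rule only)."""
    parts = urlsplit(url.strip().lower())
    # URLs written without a scheme end up in `path`, so fall back to its first segment.
    host = parts.netloc or parts.path.split("/")[0]
    if host.startswith("www."):
        host = host[4:]
    return host


def unique_blog_urls(profiles):
    """Collect the distinct blog hosts listed across all profiles.

    `profiles` is an iterable of lists of URL strings, one list per profile.
    """
    return {
        normalize_blog_url(url)
        for urls in profiles
        for url in urls
        if url.strip()
    }


# Hypothetical sample data: two profiles listing three URLs, two of them the same blog.
sample_profiles = [
    ["http://Example.blogspot.com/", "http://www.example.blogspot.com/index.html"],
    ["other.blogspot.com"],
]
print(len(unique_blog_urls(sample_profiles)))
```

Deduplicating on the host rather than the full URL is a design choice that suits hosted blogs, where each blog has its own subdomain. It would merge distinct blogs that share a host under different paths.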

Page 4:

Why PageRank?

► Available for almost any web page

► Easy to gather

► Global property that takes the whole web into account

► Search is most common way to look for information
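To illustrate why PageRank is called a global property, here is a minimal sketch of the textbook power-iteration formulation on a toy link graph. It is not Google's production algorithm, which is not fully public, and it yields fractional scores rather than the coarse 0 to 10 values that were retrieved for the study. The function name `pagerank`, the damping factor of 0.85 and the three-page graph are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the same baseline share from the random jump.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: a and b both link to c, and c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
print(sorted(scores, key=scores.get, reverse=True))
```

Each page's score depends on the scores of the pages linking to it, which in turn depend on their own inbound links, so a single value reflects the structure of the whole graph.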

Page 5:

What do we have?

[Figure: Blogosphere PageRank Distribution 2005. x-axis: PageRank (0 to 10); y-axis: # blogs (logarithmic scale, 1 to 10,000,000); series: Distribution 2005.]
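A distribution chart of this kind can be tabulated by counting how many blogs fall on each integer PageRank value and then taking the logarithm of the counts. The following is a minimal sketch under that assumption. The function names `pagerank_histogram` and `log10_counts` and the sample scores are hypothetical, and no figures from the study are reproduced.

```python
import math
from collections import Counter


def pagerank_histogram(scores):
    """Count blogs per integer PageRank value, returned as a list indexed 0..10."""
    counts = Counter(scores)
    return [counts.get(pr, 0) for pr in range(11)]


def log10_counts(histogram):
    """Map counts onto a log10 axis; empty buckets have no logarithm, so use None."""
    return [math.log10(count) if count > 0 else None for count in histogram]


# Hypothetical scores for seven blogs.
sample_scores = [0, 0, 0, 1, 3, 3, 10]
histogram = pagerank_histogram(sample_scores)
print(histogram)
print(log10_counts(histogram))
```

A logarithmic axis is the natural choice when counts range from a handful of blogs at the top values to millions at the bottom, as the axis range of the chart suggests.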

Page 6:

What do we have?

[Figure: Blogosphere PageRank Distribution 2006. x-axis: PageRank (0 to 10); y-axis: # blogs (logarithmic scale, 1 to 10,000,000); series: Distribution 2005, Distribution 2006.]

Page 7:

What has happened?

[Figure: Increase and Decrease (%) of PageRank from 2005 to 2006. x-axis: PageRank (0 to 10); y-axis: percent (0 to 120).]
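The slide does not state how the percentages were computed. One plausible reading, given an axis running from 0 to 120, is the 2006 count in each PageRank bucket expressed as a percentage of the 2005 count, so that values below 100 mark a decline. The sketch below implements that reading only. The function name `percent_of_baseline` and the sample counts are assumptions, not data from the study.

```python
def percent_of_baseline(hist_2005, hist_2006):
    """Express each 2006 bucket count as a percentage of the 2005 count.

    Returns None for buckets that were empty in 2005, where no ratio is defined.
    """
    result = []
    for before, after in zip(hist_2005, hist_2006):
        result.append(None if before == 0 else 100.0 * after / before)
    return result


# Hypothetical counts for three buckets: one shrinks, one grows, one was empty in 2005.
print(percent_of_baseline([200, 50, 0], [100, 60, 5]))
```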

Page 8:

What does this mean?

► Strong decline at PageRank 1-2 and 7-10

► Lower end: effect of focus on Blogger?
    ► Blogger as sandbox: high attrition?

► Higher end: shrinking A-list?
    ► Churn away from Blogger?
    ► Harder to achieve high PageRank in a larger, more diverse Web?

► Need to track trajectories
    ► e.g. how many low-PR blogs rose from 2005 to 2006?
    ► e.g. how many PR7+ blogs survived from 2005 to 2006?
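Tracking trajectories means comparing each blog's own PageRank across the two years rather than comparing the aggregate distributions. The following is a minimal sketch of how the two example questions could be answered once per-blog scores for both years are available. The function names, the threshold of PageRank 2 for "low", the reading of "survived" as "still PR7+ in 2006" and the sample data are all assumptions.

```python
from collections import Counter


def transitions(pr_2005, pr_2006):
    """Count (2005 value, 2006 value) pairs for blogs measured in both years."""
    shared = pr_2005.keys() & pr_2006.keys()
    return Counter((pr_2005[url], pr_2006[url]) for url in shared)


def count_risers(pr_2005, pr_2006, low_max=2):
    """Blogs at or below `low_max` in 2005 whose PageRank was higher in 2006."""
    return sum(
        count
        for (before, after), count in transitions(pr_2005, pr_2006).items()
        if before <= low_max and after > before
    )


def count_survivors(pr_2005, pr_2006, high_min=7):
    """Blogs at or above `high_min` in 2005 that were still there in 2006."""
    return sum(
        count
        for (before, after), count in transitions(pr_2005, pr_2006).items()
        if before >= high_min and after >= high_min
    )


# Hypothetical per-blog scores; blog "e" disappears and blog "f" is new in 2006.
scores_2005 = {"a": 1, "b": 2, "c": 8, "d": 7, "e": 0}
scores_2006 = {"a": 3, "b": 2, "c": 8, "d": 5, "f": 4}
print(count_risers(scores_2005, scores_2006))
print(count_survivors(scores_2005, scores_2006))
```

Blogs present in only one year are left out of the transition counts here. Treating them as attrition or as new entrants would be a separate decision.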

Page 9:

What are the limitations?

► Coarse values

► Algorithm is not entirely known

► Updates for Google PageRank are random

Page 10:

What next?

► Use more blogs

► Measure PageRank more frequently

► Use other indicators/measures (Alexa, Technorati, Bloglines)

► Discuss different metrics