lastfm crawler

last.fm crawler: RW vs RWRW. Mário Almeida [email protected], Zafar Gilani [email protected], Arinto Murdopo [email protected]

Uploaded by arinto-murdopo on 24-May-2015

Category: Education


DESCRIPTION

Mini-project result presentation in class

TRANSCRIPT

Page 1: lastfm crawler

last.fm crawler: RW vs RWRW

Mário Almeida [email protected]

Zafar Gilani [email protected]

Arinto Murdopo [email protected]

Page 2

Outline
● Parameters
● Methodology
● Results
● Challenges
● Conclusion

Page 3

Parameters
1. Playcounts
2. Playlists
3. Ages
4. IDs
5. Number of friends (degrees)

Goal: compare the averages of these parameters as estimated by RW and by RWRW.

Page 4

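Methodology
We used the Last.fm APIs to obtain:
● user info
● number of friends (degree)

RW with UIS-WR (the walk samples are treated as a uniform independent sample with replacement): on the fly, we apply the RW formula, i.e. the plain average of the sampled values:
$\hat{\mu}_{RW} = \frac{1}{n}\sum_{i=1}^{n} x_i$
where $x_i$ is the parameter value of the $i$-th sampled user.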
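A minimal sketch of the crawl loop and the RW estimate, assuming the friend lists are already available in memory. The hypothetical `GRAPH` dictionary stands in for the Last.fm API calls the real crawler made, and `random_walk` and `rw_mean` are illustrative names, not the project's code.

```python
import random

# Hypothetical friendship graph standing in for the friend lists the
# crawler fetched from Last.fm (undirected: each edge listed both ways).
GRAPH = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["a", "c"],
}


def random_walk(graph, start, n_samples, rng):
    """Simple random walk: record the current user, then move to a
    uniformly chosen friend. Returns the list of visited users."""
    samples = []
    node = start
    for _ in range(n_samples):
        samples.append(node)
        node = rng.choice(graph[node])
    return samples


def rw_mean(values):
    """RW estimate: treat the walk samples as uniform and independent
    (with replacement) and take the plain average."""
    return sum(values) / len(values)


if __name__ == "__main__":
    ages = {"a": 25, "b": 31, "c": 19, "d": 22}  # hypothetical ages
    walk = random_walk(GRAPH, "a", 1000, random.Random(1))
    print(rw_mean([ages[v] for v in walk]))
```

Because user "a" has the most friends, the walk visits it most often, so its (hypothetical) age weighs more in the plain average than it would in a uniform sample of users.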

Page 5

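Methodology
For RWRW, we apply the re-weighted estimator:
$\hat{\mu}_{RWRW} = \frac{\sum_{i=1}^{n} x_i / W_{v_i}}{\sum_{i=1}^{n} 1 / W_{v_i}}$
where $x_i$ is the parameter value of the $i$-th sampled user $v_i$.

The weight $W_v$ is set to the number of friends (degree) of user $v$.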
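The RWRW estimator, where each sample is weighted by 1/W_v with W_v equal to the user's degree, can be sketched as below. `values[i]` is the parameter of the i-th sampled user and `degrees[i]` its number of friends. Skipping zero-degree samples is our assumption, since 1/0 is undefined; the slides do not say how such users were handled.

```python
def rwrw_mean(values, degrees):
    """Re-weighted RW estimate: sum(x / W) / sum(1 / W), with W = degree.

    Samples whose degree is not positive are skipped (an assumption),
    because their weight 1/W would be undefined.
    """
    num = 0.0
    den = 0.0
    for x, k in zip(values, degrees):
        if k <= 0:
            continue
        num += x / k
        den += 1.0 / k
    if den == 0:
        raise ValueError("no samples with positive degree")
    return num / den
```

A user with 3 friends is about three times as likely to be hit by the walk as a user with 1 friend, so its sample counts one third as much: `rwrw_mean([10, 40], [1, 3])` gives 17.5, whereas the plain RW average of the same samples would be 25.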

Page 6

Results
● Crawled for ~10 hours
● Number of samples: 48000
● Number of age samples: 36363 (not all users show their age)
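Since not every sampled user shows their age, the age estimates rest on fewer samples than the other parameters. A minimal sketch, assuming hidden values are recorded as `None` and simply dropped before averaging. This implicitly assumes that hiding one's age is unrelated to the age itself; the function name is hypothetical.

```python
def estimate_with_missing(values, degrees):
    """Return (RW, RWRW) estimates using only samples whose value is known.

    values[i] is None when the i-th sampled user hides the attribute
    (e.g. age); such samples are dropped, as are samples with a
    non-positive degree.
    """
    pairs = [(x, k) for x, k in zip(values, degrees) if x is not None and k > 0]
    if not pairs:
        raise ValueError("no usable samples")
    rw = sum(x for x, _ in pairs) / len(pairs)
    rwrw = sum(x / k for x, k in pairs) / sum(1.0 / k for _, k in pairs)
    return rw, rwrw
```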

Page 7

Results - Ages

After about 25k samples, the age estimate stabilizes. RW estimates lower average age values than RWRW, which points to a strong correlation between age and degree: users with more friends tend to be younger.
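One way to see when an estimate "stabilizes" is to track both estimates after every sample and look for the point where the curves flatten. A minimal sketch, assuming parallel lists of sampled values and positive degrees; `running_estimates` is a hypothetical helper, not part of the original crawler.

```python
def running_estimates(values, degrees):
    """Return a list of (n, RW estimate, RWRW estimate) after each sample.

    Degrees are assumed to be positive (weight = 1 / degree).
    """
    total = 0.0  # running sum for the plain RW average
    num = 0.0    # running sum of x / degree
    den = 0.0    # running sum of 1 / degree
    out = []
    for n, (x, k) in enumerate(zip(values, degrees), start=1):
        total += x
        num += x / k
        den += 1.0 / k
        out.append((n, total / n, num / den))
    return out
```

Plotting the second and third columns against `n` for the age samples would show the gap between RW and RWRW as well as where each curve settles.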

Page 8

Results - Playlists

Most users do not have playlists.

RW estimates a higher number of playlists than RWRW, because users with higher degrees tend to have more playlists.
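Why RW overestimates here: for a simple random walk on a connected, undirected friendship graph, the long-run fraction of steps spent at a user is proportional to that user's degree, so the plain RW average converges to a degree-weighted mean. The sketch below compares that limit with the true per-user mean on a hypothetical star graph in which only the well-connected hub has playlists; the graph and counts are made up for illustration.

```python
# Hypothetical star-shaped friendship graph: one hub with four friends.
GRAPH = {
    "hub": ["u1", "u2", "u3", "u4"],
    "u1": ["hub"], "u2": ["hub"], "u3": ["hub"], "u4": ["hub"],
}
# Hypothetical playlist counts: only the hub has playlists.
PLAYLISTS = {"hub": 5, "u1": 0, "u2": 0, "u3": 0, "u4": 0}


def rw_limit(graph, attr):
    """Value the plain RW average converges to: a degree-weighted mean."""
    total_degree = sum(len(friends) for friends in graph.values())
    return sum(len(graph[v]) * attr[v] for v in graph) / total_degree


def true_mean(graph, attr):
    """Unweighted mean over all users (what RWRW aims to recover)."""
    return sum(attr[v] for v in graph) / len(graph)


print(rw_limit(GRAPH, PLAYLISTS))   # 2.5
print(true_mean(GRAPH, PLAYLISTS))  # 1.0
```

The hub accounts for half of all friendship endpoints, so the walk spends half its time there and RW reports 2.5 playlists per user instead of 1.0.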

Page 9

Results - Playcounts

We found some users with playcounts on the order of millions.

RW estimates higher playcounts than RWRW: users with higher degrees tend to have higher playcounts.

Page 10

Results - IDs

RW estimates a lower average ID than RWRW: a user with a lower ID generally has a higher degree.

The ID estimate is not yet stable.

Page 11

Results - Degrees

RWRW reduces the bias toward nodes that have a higher probability of being visited because of their high degree. The RWRW estimate is indeed close to the expected degree value.
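A short derivation of what each method reports for the degree itself. Let $k_v$ be the number of friends of user $v$ and $v_1, \dots, v_n$ the sampled users, and take the parameter to be the degree with weight $W_v = k_v$. The RWRW estimate then reduces to the harmonic mean of the sampled degrees, while the plain RW average tends to $\mathbb{E}[k^2]/\mathbb{E}[k]$ (expectations taken uniformly over all users), which is never smaller than the true mean degree $\mathbb{E}[k]$:

```latex
\hat{\mu}_{RWRW}(k) = \frac{\sum_{i=1}^{n} k_{v_i} / k_{v_i}}{\sum_{i=1}^{n} 1 / k_{v_i}}
                    = \frac{n}{\sum_{i=1}^{n} 1 / k_{v_i}}

\hat{\mu}_{RW}(k) \xrightarrow{\,n \to \infty\,} \frac{\sum_{v} k_v \cdot k_v}{\sum_{v} k_v}
                  = \frac{\mathbb{E}[k^2]}{\mathbb{E}[k]}
                  = \mathbb{E}[k] + \frac{\operatorname{Var}(k)}{\mathbb{E}[k]}
                  \;\ge\; \mathbb{E}[k]
```

The RW bias for degree is exactly $\operatorname{Var}(k)/\mathbb{E}[k]$, which vanishes only when every user has the same number of friends; this is consistent with the observation that low variance means a smaller difference between RW and RWRW.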

Page 12

Conclusion
● A simple random walk in a social network generally results in biased averages.
  ○ A node with a higher degree has a higher probability of being discovered.
● RWRW normalizes the averages.
  ○ High variations do not abruptly impact the estimation.
  ○ RWRW reduces the biases of RW.
● Low variance means a smaller difference between RW and RWRW.
● Crawling Last.fm produces many challenges
  ○ e.g.: users with 0 degree, banned users, huge playcounts
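One possible way to keep the walk moving past problem users such as banned or zero-degree accounts, sketched under assumptions: the hypothetical `friends_of` lookup returns a user's friend list or raises the hypothetical `UserUnavailable` error, and unusable friends are simply skipped. This is not necessarily what the authors' crawler did, and skipping users means the walk effectively samples the subgraph of reachable users.

```python
import random


class UserUnavailable(Exception):
    """Hypothetical error for banned or otherwise unreachable users."""


def next_user(current, friends_of, rng):
    """Pick a random friend of `current` that the walk can continue from.

    Friends whose lookup fails (e.g. banned users) or who have no friends
    themselves (degree 0: the walk would get stuck and 1/degree would be
    undefined) are skipped. Returns None if no usable friend exists.
    """
    candidates = list(friends_of(current))
    rng.shuffle(candidates)  # try friends in uniformly random order
    for friend in candidates:
        try:
            if friends_of(friend):
                return friend
        except UserUnavailable:
            continue
    return None
```

When `next_user` returns None, the crawler has to decide how to proceed, for instance by stepping back or restarting from a seed user; either choice is a further assumption.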

Page 13

Questions

Check the code at:
● http://code.google.com/p/lastfm-rwrw/