
School of Information, University of Michigan

Expertise Networks in Online Communities: Structure and Algorithms

Lada Adamic

joint work with Jun Zhang and Mark Ackerman

School of Information, University of Michigan

NetSci

May 24th, 2007

Have you sought knowledge here?


Knows

Knowledge iN

Oozing out knowledge

Knowledge In

“Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge.”

NHN CEO Chae Hwi-young

Largest search engine in Korea - 70% of search (Google: 2%)

Comprehensive portal – integrated news, blogs, ‘knowledge search’

Knowledge-In had 60 million questions and answers as of Feb 2007

popular:
• why fingernails grow faster than toenails
• how fast a fly can fly
• why seagulls sit in the same direction

Ranking the contributors

Level Range of points

Lowlife 0-99

Commoner 100-500

Citizen 501-3000

Middle class 3001-7000

Expert 7001-15000

Hero 15001-35000

Professional 35001-65000

Superhuman 65001-100000

(Gods) > 100000

Knowledge In

Culture of generosity

“(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.”

-- Eckart Walther (Yahoo Research)

90 million users worldwide

Limitations of Current Systems

• The Response Time Gap

[Figure: box plot of wait time (min), axis from 0 to 10000, by expertise rating (low vs. high)]

• The Expertise Gap
• Difficult to infer reliability of answers

Automatically ranking expertise may be helpful.

Related work

• Analysis of online communities
  • NetScan (Smith, Fisher, et al. at Microsoft)
  • Social network analysis (LiveJournal, blog communities)
  • Motivations of online participation (Lakhani & von Hippel)

• Graph-based ranking algorithms
  • PageRank, HITS, etc.

• Expertise sharing studies
  • Expertise recommenders
    • ContactFinder (Krulwich et al.), Answer Garden (Ackerman)
    • Small Blue (Lin)

• Automatically evaluating expertise levels
  • Using different text resources (Kautz et al. and many others)
  • Using email networks (Campbell et al.)

Overview

• Social network analysis
  • Constructing Expertise Networks
  • Finding meaningful metrics

• Empirical evaluation of ranking algorithms
  • Human Rating vs. Algorithmic Ranking

• Simulation
  • Understanding underlying dynamics
  • Predicting performance of ranking algorithms in yet-unobserved community dynamics

Java Forum

• 87 sub-forums
• 1,438,053 messages
• community expertise network constructed:
  • 196,191 users
  • 796,270 edges

Constructing a community expertise network

Thread 1: Large Data, binary search or hashtable? (user A)
  Re: Large... (user B)
  Re: Large... (user C)

Thread 2: Binary file with ASCII data (user A)
  Re: File with... (user C)

[Figure: the resulting network on users A, B, and C under four edge-weighting schemes]

• unweighted: A-B 1, A-C 1
• weighted by # threads: A-B 1, A-C 2
• weighted by shared credit: A-B 1/2, A-C 1 + 1/2
• weighted with backflow: edge weights 0.9 and 0.1
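A minimal sketch of how the first three weighting schemes could be computed from the two example threads above. The function name `build_network`, the thread representation, and the asker-to-replier edge direction are assumptions made for illustration; the backflow scheme is left out because the slide does not define how its weights are derived.

```python
from collections import defaultdict

# Each thread is (asker, [repliers]); these mirror the two example threads.
threads = [
    ("A", ["B", "C"]),  # Thread 1: A asks, B and C reply
    ("A", ["C"]),       # Thread 2: A asks, C replies
]

def build_network(threads, weighting="unweighted"):
    """Return {(asker, replier): weight}; edges point from asker to replier."""
    edges = defaultdict(float)
    for asker, repliers in threads:
        # Count each replier once per thread and ignore self-replies.
        unique = [r for r in dict.fromkeys(repliers) if r != asker]
        for replier in unique:
            if weighting == "unweighted":
                edges[(asker, replier)] = 1.0
            elif weighting == "threads":
                edges[(asker, replier)] += 1.0
            elif weighting == "shared_credit":
                # Repliers in one thread split a single unit of credit.
                edges[(asker, replier)] += 1.0 / len(unique)
            else:
                raise ValueError(f"unknown weighting: {weighting}")
    return dict(edges)

unweighted = build_network(threads, "unweighted")
by_threads = build_network(threads, "threads")
shared = build_network(threads, "shared_credit")
```

With these inputs the A-to-C edge gets weight 2 under thread counting and 1 + 1/2 under shared credit, matching the figure labels.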

Not Everyone Asks/Replies

• Core: a strongly connected component, in which everyone asks and answers
• IN: mostly askers
• OUT: mostly helpers

The Web is a bow tie

The JavaForum network is an uneven bow tie
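The bow-tie split can be sketched with two reachability searches. This is a simplified illustration, not the analysis used in the talk: it assumes edges run from asker to replier, and it takes the strongly connected component containing a chosen `seed` node rather than searching for the largest one. The names `reachable` and `bow_tie` and the toy edge list are hypothetical.

```python
def reachable(adj, start):
    """Return all nodes reachable from start by following adj (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def bow_tie(edges, seed):
    """Split nodes into core / in / out / other relative to seed's component."""
    fwd, rev, nodes = {}, {}, set()
    for asker, replier in edges:
        fwd.setdefault(asker, set()).add(replier)
        rev.setdefault(replier, set()).add(asker)
        nodes.update((asker, replier))
    down = reachable(fwd, seed)   # seed plus everyone reachable from it
    up = reachable(rev, seed)     # seed plus everyone who can reach it
    core = down & up              # mutually reachable: ask and answer
    return {
        "core": core,
        "in": up - core,          # only ask: their questions reach the core
        "out": down - core,       # only help: they answer the core's questions
        "other": nodes - down - up,
    }

# Toy data: n1 asks c1; c1 and c2 answer each other; h1 answers c2.
toy_edges = [("n1", "c1"), ("c1", "c2"), ("c2", "c1"), ("c2", "h1"), ("x", "y")]
parts = bow_tie(toy_edges, "c1")
```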

Uneven participation

[Figure: cumulative probability vs. degree (k) on log-log axes, for two series: number of people one received replies from, and number of people one replied to. Power-law fit exponent = 1.87, R² = 0.9730]

• ‘answer people’ may reply to thousands of others

• ‘question people’ are also uneven in the number of repliers to their posts, but to a lesser extent
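A cumulative degree distribution like the one plotted can be computed as follows. This is a small sketch on made-up degrees, with `cumulative_distribution` as a hypothetical helper name; it reports the fraction of users whose degree is at least k.

```python
from collections import Counter

def cumulative_distribution(degrees):
    """Return [(k, P(degree >= k))] for each distinct degree k >= 1."""
    positive = [d for d in degrees if d >= 1]
    counts = Counter(positive)
    total = len(positive)
    result = []
    remaining = total  # number of users with degree >= current k
    for k in sorted(counts):
        result.append((k, remaining / total))
        remaining -= counts[k]
    return result

# Made-up degrees: most users have few contacts, one has many.
ccdf = cumulative_distribution([1, 1, 1, 2, 2, 5])
```

Plotting these pairs on log-log axes gives a roughly straight line when the distribution is heavy-tailed.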

Who Answers Whom

Degree-degree correlations between asker and helper

[Figure: degree-degree correlation profile. x-axis: helper indegree; y-axis: asker indegree; both logarithmically binned with bin labels 3, 7, 20, 55, 148, 403, 1096, 2981]

Summary of JavaForum Network

• Different types of participation
  • Askers, asker-helpers, helpers

• Different levels of participation
  • Top helpers, others

• Who replied to whom
  • Top repliers answer questions for everyone
  • Other helpers help those with somewhat lower expertise

Relating network structure to Java expertise

• Human-rated expertise levels
  • 2 raters
  • 135 JavaForum users with >= 10 posts
  • inter-rater agreement (τ = 0.74, ρ = 0.83)
  • for evaluation of algorithms, omit users where raters disagreed by more than 1 level (τ = 0.80, ρ = 0.83)

Level | Category | Description

5 | Top Java expert | Knows the core Java theory and related advanced topics deeply.

4 | Java professional | Can answer all or most Java concept questions. Also knows one or more subtopics very well.

3 | Java user | Knows advanced Java concepts. Can program relatively well.

2 | Java learner | Knows basic concepts and can program, but is not good at advanced topics of Java.

1 | Newbie | Just starting to learn Java.

Structural Info Based Expertise Ranking Metrics

• # replies posted (# answers)
  • experts can answer many questions

• # people replied to (indegree)
  • experts can answer questions from many different people

• z-score for the 2 above: (observed − μ)/σ
  • experts are above the mean in the above two metrics

• PageRank: replying to people who reply to people
  • higher-level experts can answer mid-level experts

• HITS: experts answer questions by people whose questions other experts have answered
  • hubs point to good authorities
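Two of these metrics can be sketched in a few lines. The z-score below is one way to instantiate (observed − μ)/σ, assuming a null model in which every post is equally likely to be a question or an answer (binomial with p = 0.5); that null model is an assumption, not something stated on the slide. The PageRank function is a plain power iteration over asker-to-replier edges with a conventional damping factor of 0.85. Function names and the toy edges are hypothetical.

```python
import math

def z_score(answers, questions):
    """(observed - mean) / std under a binomial null with p = 0.5 (assumed)."""
    n = answers + questions
    if n == 0:
        return 0.0
    return (answers - n / 2) / math.sqrt(n / 4)

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank flows from asker to replier."""
    nodes = sorted({node for edge in edges for node in edge})
    out = {node: [] for node in nodes}
    for asker, replier in edges:
        out[asker].append(replier)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1 - damping + damping * dangling) / len(nodes)
               for node in nodes}
        for node in nodes:
            for target in out[node]:
                new[target] += damping * rank[node] / len(out[node])
        rank = new
    return rank

# Toy chain: a novice is answered by a mid-level user, who is answered by an expert.
ranks = pagerank([("novice", "mid"), ("mid", "expert")])
```

In the toy chain the expert ends up ranked above the mid-level user, who is ranked above the novice, which is the "replying to people who reply to people" effect.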

automated vs. human ratings

[Figure: six box-plot panels of automated ranking (y-axis) against human rating (x-axis), one each for # answers, z # answers, indegree, z indegree, PageRank, and HITS authority]

Algorithm Rankings vs. Human Ratings

simple local measures do as well as, or better than, measures incorporating the wider network topology

[Figure: bar chart comparing # answers, z-score # answers, indegree, z-score indegree, PageRank, and HITS authority on three measures: Top K, Kendall’s τ, and Spearman’s ρ; y-axis from 0 to 0.9]
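For reference, the two rank correlations used to compare algorithm output with human ratings can be computed as below. This is a self-contained sketch: Kendall's τ is implemented in its tie-aware τ-b form and Spearman's ρ as the Pearson correlation of average ranks. Which tie-handling variants the study used is not stated on the slides, so those choices are assumptions, and the sample ratings are made up.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b; raises ZeroDivisionError if either input is constant."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both factors
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + ties_y_only  # pairs not tied in x
    untied_y = concordant + discordant + ties_x_only  # pairs not tied in y
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Made-up example: human ratings vs. an algorithm's scores for five users.
human = [1, 2, 3, 4, 5]
scores = [2, 1, 4, 3, 5]
tau = kendall_tau_b(human, scores)
rho = spearman_rho(human, scores)
```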

Modeling community structure to explain algorithm performance

Control Parameters:
• Distribution of expertise
• Who asks questions most often?
• Who answers questions most often?
  • best expert most likely
  • someone a bit more expert

ExpertiseNet Simulator

Simulating probability of expertise pairing

[Figure: two panels of pairing probability, one per model; x-axis: replier expertise (0 to 5), y-axis: asker expertise (0 to 5)]

suppose:

expertise is uniformly distributed

probability of posing a question is inversely proportional to expertise

pij = probability a user with expertise j replies to a user with expertise i

2 models:

• ‘best’ preferred: $p_{ij} \sim e^{(j-i)} / i$
• ‘just better’ preferred: $p_{ij} \sim e^{(i-j)} / i$, for $j > i$
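The two pairing models can be tabulated directly. This sketch reads them as p_ij ∼ e^{(j−i)}/i for ‘best’ preferred and p_ij ∼ e^{(i−j)}/i with j > i for ‘just better’, where i is the asker's expertise and j the replier's. The scale parameter `beta` (default 1.0), the expertise levels 1 to 5, and the normalization over all pairs are assumptions added for illustration.

```python
import math

def pairing_matrix(levels, model, beta=1.0):
    """Return {(i, j): probability} normalized over all asker/replier pairs.

    i is the asker's expertise level, j the replier's. `beta` is a
    hypothetical scale on the expertise gap.
    """
    weights = {}
    for i in levels:
        for j in levels:
            if model == "best":
                # Higher-expertise repliers are always more likely.
                w = math.exp(beta * (j - i)) / i
            elif model == "just_better":
                # Only better repliers answer; closer levels are more likely.
                w = math.exp(beta * (i - j)) / i if j > i else 0.0
            else:
                raise ValueError(f"unknown model: {model}")
            weights[(i, j)] = w
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

levels = range(1, 6)
best = pairing_matrix(levels, "best")
just_better = pairing_matrix(levels, "just_better")
```

The division by i encodes the assumption that the probability of posing a question is inversely proportional to expertise, so low-expertise askers dominate both matrices.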

Visualization

[Figure: network visualizations of the ‘best’ preferred and ‘just better’ models]

Degree correlation profiles

[Figure: three degree correlation profiles with asker indegree on the y-axis: best preferred (simulation), just better (simulation), and Java Forum Network]

The Simulation of JavaForum

• Settings:
  • Distribution of expertise (skewed)
  • Who asks questions most often? (novices)
  • Who answers questions? (best expert most likely)

• Results
  • Similar bow tie structure
  • Similar degree distribution
  • Slightly different correlation profiles
  • Similar algorithm performance
    • PageRank does not outperform simpler degree-based metrics

Different ranking algorithms perform differently

In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS

It can tell us when to use which algorithms

Preferred Helper: ‘just better’

Preferred Helper: ‘best available’

Summary

• Expertise Networks have interesting characteristics
  • A set of useful metrics
  • Ranking algorithms are affected by network structures
  • Simulation as an analysis tool

• There are rich design opportunities
  • Find experts with the help of structural information (and content analysis)
  • Predict good answers
  • Re-order questions/answers to match expertise

working paper: “Expertise-Level based Interface Personalization for Online Help-seeking Communities”

Future Work

• Looking at diverse sets of question-answer forums (Yahoo Answers)

• Expertise across different topics

• Using explicit ratings for evaluation of automated expertise identification & incorporation into algorithms (battling spam)

• Changes in users’ expertise over time
• Continually developing and evaluating our systems built upon these findings

[Figure: example topic categories: cars & transportation, maintenance & repairs, beauty & style, hair]

for more info

• ExpertiseRank algorithms and evaluations: Zhang, J., Ackerman, M.S., Adamic, L., Expertise Networks in Online Communities: Structure and Algorithms, WWW’07

• Simulations of expertise networks: Zhang, J., Ackerman, M.S., Adamic, L., CommunityNetSimulator: Using Simulations to Study Online Community Network Formation and Implications, C&T2007

Jun Zhang: http://www-personal.si.umich.edu/~junzh

Mark Ackerman: http://www.eecs.umich.edu/~ackerm/

Lada Adamic: http://www-personal.umich.edu/~ladamic

NSF (IRI-9702904)

ads

• Jun Zhang is graduating and on the job market

• Lada is looking for a postdoc

Simplest models do not capture all ‘local’ interactions

[Figure: values from -0.6 to 1 for three series (real, sim_best, sim_better) across 13 categories labeled 6, 12, 14, 36, 38, 46, 74, 78, 98, 102, 108, 110, 238]