On the Dynamics of Topic-Based Communities

in Online Knowledge-Sharing Networks

Anna Guimaraes, Ana Paula Couto da Silva, Jussara Almeida

Department of Computer Science - UFMG (Brazil)

September 21, 2015

Introduction

• Online Knowledge-Sharing Networks

– Wikis, Q&A sites, discussion forums

– User-created and maintained discussions

– Wealth of knowledge

• Prior research focuses on knowledge extraction by:

– Detecting quality content [Agichtein et al., 2008]

– Ranking questions and answers [Dalip et al., 2013]

– Identifying expert users [Ravi et al., 2014, Wang et al., 2013]

Introduction

• More than repositories for knowledge!

– Community structure surrounding discussions

– Topics and communities subject to temporal changes

– Multiple topics, multiple communities

• This study:

– Community approach to knowledge-sharing networks

– Characterization and modeling of community evolution

Case Study: Stack Overflow

Tags

Topic-Based Communities in Stack Overflow

• Communities centered around topics

– Topics are explicitly defined

– Independent from social interaction graph

• Non-exclusive membership to multiple communities

Stack Overflow Dataset

• User activity

– User ID, Tag ID, Timestamp

• Data covering a six-year period

– 2008–2014

Tags    Posts           Users
400     19.8 million    1.7 million
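As a minimal sketch of how activity records of this shape could be grouped into topic-based communities, the snippet below treats each tag as a community and collects the users who posted under it. The `Activity` type, the `build_communities` helper, the field names, and the toy records are all hypothetical; the slides only state that each record holds a user ID, a tag ID, and a timestamp.

```python
from collections import defaultdict
from typing import NamedTuple


class Activity(NamedTuple):
    """One post: who posted, under which tag, and when (hypothetical layout)."""
    user_id: int
    tag_id: str
    timestamp: int  # assumed here to be seconds since the Unix epoch


def build_communities(records):
    """Map each tag to the set of users who posted under it."""
    communities = defaultdict(set)
    for rec in records:
        communities[rec.tag_id].add(rec.user_id)
    return dict(communities)


# Toy records, invented for illustration only.
records = [
    Activity(1, "python", 1_400_000_000),
    Activity(1, "mysql", 1_400_000_500),
    Activity(2, "python", 1_400_001_000),
    Activity(3, "mysql", 1_400_002_000),
]

communities = build_communities(records)
# User 1 appears in both sets: membership is non-exclusive.
```

Because membership is derived from tags alone, no social interaction graph is needed to define a community.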

Topic-Based Communities in Stack Overflow

• Temporal analyses of community activity in terms of:

– How user behavior affects community sustainability

– How users relate to communities in the long run

– How users divide their attention across different communities

– How communities affect one another

Communities in Stack Overflow: Findings

• Significant revisiting behavior

– Users continue to contribute to the same community

– Revisitors account for a growing share of a community's activity over time

Mean Fraction of Revisits

              1st month   6th month   12th month
Revisitors    0.20        0.44        0.50
Revisits      0.27        0.46        0.50
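The slides do not spell out how the two fractions are computed. One plausible reading, sketched below, is that within a single community a revisitor in a given month is an active user who already posted there in an earlier month, and a revisit is a post made by such a user. The function name `revisit_fractions` and the toy data are hypothetical.

```python
from collections import defaultdict


def revisit_fractions(posts):
    """posts: iterable of (user_id, month_index) pairs for ONE community.

    Returns {month: (fraction_of_revisitors, fraction_of_revisits)} under the
    assumed definitions: a revisitor posted in an earlier month, and a
    revisit is any post made by a revisitor.
    """
    by_month = defaultdict(list)
    for user, month in posts:
        by_month[month].append(user)

    seen = set()  # users active in any earlier month
    result = {}
    for month in sorted(by_month):
        users = by_month[month]          # one entry per post
        active = set(users)              # distinct users this month
        returning = active & seen
        revisit_posts = sum(1 for u in users if u in seen)
        result[month] = (len(returning) / len(active),
                         revisit_posts / len(users))
        seen |= active
    return result


# Toy data, invented for illustration only.
posts = [("a", 1), ("b", 1), ("a", 2), ("a", 2), ("c", 2), ("a", 3), ("b", 3)]
fractions = revisit_fractions(posts)
```

Averaging these per-month values over all communities would give a table shaped like the one above.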

Communities in Stack Overflow: Findings

• Participation in multiple communities

– 32% of users participate in up to 3 communities

– Average user participates in 17 communities

– Decaying pattern of activity over time

[Figure: two panels over Months (2 to 12): number of Communities (y-axis 5 to 30) and number of Posts (y-axis 0 to 80)]

Communities in Stack Overflow: Findings

• Migrating behavior

– Users traverse different communities over time

– Shared member base across communities

Ruby on Rails 3 → Ruby on Rails 4

[Figure: number of members per month, Feb 2013 to Aug 2014; y-axis 0 to 900; series: Rails 3 Members, New Members]

MySQL → PHP

[Figure: number of members per month, Feb 2013 to Aug 2014; y-axis 0 to 6000; series: MySQL, New Members]
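A split like the one in the migration figures can be sketched by dividing the monthly members of a target community into those who previously posted in a source community and those who did not. This is an illustration under assumptions, not the authors' procedure: the helper `split_members`, the "YYYY-MM" month keys, and the rule that a user counts as migrated only if first seen in the source strictly before the month in question are all hypothetical.

```python
def split_members(source_first_seen, target_activity):
    """Count, per month, target-community members by origin.

    source_first_seen: {user_id: first month the user posted in the source}
    target_activity:   {month: set of user_ids active in the target}
    Months are "YYYY-MM" strings, so plain string comparison orders them.
    Returns {month: (from_source, new_members)}.
    """
    counts = {}
    for month, users in sorted(target_activity.items()):
        from_source = sum(
            1 for u in users
            if u in source_first_seen and source_first_seen[u] < month
        )
        counts[month] = (from_source, len(users) - from_source)
    return counts


# Toy data, invented for illustration only.
source_first_seen = {"u1": "2012-05", "u2": "2013-09"}
target_activity = {
    "2013-08": {"u1", "u2", "u3"},
    "2014-02": {"u1", "u2"},
}
migration = split_members(source_first_seen, target_activity)
```

In the toy data, `u2` counts as a new member in 2013-08 because the user had not yet posted in the source community at that point.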

Communities in Stack Overflow: Findings

• Key aspects dictating community evolution

– Intra-community aspects

– User revisits

– Continued activity

– Inter-community aspects

– Shared member base

– User migration

How can we then describe community evolution?

CERIS Model

• CERIS

– Community Evolution model with Revisits and Inter-community effectS

• Goal: describe community activity (number of posts) over time

• Incorporates revisits and community relationships

CERIS Model

• CERIS extends state-of-the-art models

– Phoenix-R evolution model with revisits [Figueiredo et al., 2014]

– Competition model [Beutel et al., 2012]

• Epidemiology approach to network dynamics

– Objects in the network are modeled as infections

CERIS Model

• Users are initially exposed to different communities

• Users become infected by participating in a community

• Users can recover by ceasing activity in a community

• Or they can be infected by additional communities

• Revisits to the same community are captured by hidden states

[Diagram: state-transition model with states S, I1, I2, I1,2 and hidden states V1, V2, V1,2; transitions labeled β1, β2, εβ1, εβ2, γ1, γ2, ω1, ω2, ω1,2]
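The slides give the states and rate labels but not the update equations, so the following is only one plausible reading of the two-community case, written as a discrete-time simulation. The assumptions are: constant rates; exposure proportional to the share of the population active in a community; members of one community join the other at the reduced rate εβ; and the hidden V states are read as post counts made of first-time participation plus revisits at rates ω. The function `step`, the parameter names, and every numeric value are hypothetical.

```python
def step(state, p):
    """Advance the assumed two-community model by one time step.

    state: dict with population counts S, I1, I2, I12.
    p: dict with rates beta1, beta2, gamma1, gamma2, eps,
       omega1, omega2, omega12.
    Returns (new_state, posts1, posts2).
    """
    S, I1, I2, I12 = state["S"], state["I1"], state["I2"], state["I12"]
    n = S + I1 + I2 + I12
    # Assumed exposure: share of the population active in each community.
    force1 = (I1 + I12) / n
    force2 = (I2 + I12) / n

    s_to_1 = p["beta1"] * S * force1                  # S -> I1
    s_to_2 = p["beta2"] * S * force2                  # S -> I2
    i1_to_12 = p["eps"] * p["beta2"] * I1 * force2    # I1 -> I1,2
    i2_to_12 = p["eps"] * p["beta1"] * I2 * force1    # I2 -> I1,2
    i1_to_s = p["gamma1"] * I1                        # stop posting in 1
    i2_to_s = p["gamma2"] * I2                        # stop posting in 2
    i12_to_2 = p["gamma1"] * I12                      # drop 1, keep 2
    i12_to_1 = p["gamma2"] * I12                      # drop 2, keep 1

    new_state = {
        "S": S - s_to_1 - s_to_2 + i1_to_s + i2_to_s,
        "I1": I1 + s_to_1 + i12_to_1 - i1_to_s - i1_to_12,
        "I2": I2 + s_to_2 + i12_to_2 - i2_to_s - i2_to_12,
        "I12": I12 + i1_to_12 + i2_to_12 - i12_to_1 - i12_to_2,
    }
    # Assumed reading of the hidden V states: new participation plus
    # revisits by current members (rates omega).
    posts1 = s_to_1 + i2_to_12 + p["omega1"] * I1 + p["omega12"] * I12
    posts2 = s_to_2 + i1_to_12 + p["omega2"] * I2 + p["omega12"] * I12
    return new_state, posts1, posts2


# Illustrative values only; none of them come from the slides.
params = {"beta1": 0.4, "beta2": 0.3, "gamma1": 0.1, "gamma2": 0.1,
          "eps": 0.5, "omega1": 0.2, "omega2": 0.2, "omega12": 0.1}
state = {"S": 998.0, "I1": 1.0, "I2": 1.0, "I12": 0.0}
history = []
for _ in range(50):
    state, posts1, posts2 = step(state, params)
    history.append((posts1, posts2))
```

Every transition moves users between states without creating or removing them, so the total population stays constant across steps.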

CERIS Model

[Diagram: the full CERIS model, with states S, I1, I2, I1,2, hidden states V1, V2, V1,2, and rates β1, β2, εβ1, εβ2, γ1, γ2, ω1, ω2, ω1,2, repeated for shocks s1 ... sn and combined (+)]

CERIS Model

• Analyzes the time series of the number of posts in several communities simultaneously

• The contagion process starts following “shocks”

– A wavelet-based method identifies activity peaks as shock candidates

– e.g., when a new related community becomes active

• Model fitting with the Levenberg-Marquardt algorithm [More, 1978] and Minimum Description Length [Hansen and Yu, 2001]
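To illustrate what picking shock candidates from activity peaks might look like, the sketch below uses a much simpler stand-in than the wavelet method named in the slides: it flags local maxima that rise above the series mean by `k` standard deviations. The function `shock_candidates`, the threshold rule, and the toy series are assumptions made for illustration.

```python
from statistics import mean, pstdev


def shock_candidates(series, k=1.0):
    """Return indices of local maxima that exceed mean + k * std.

    A simple threshold-based peak finder, NOT the wavelet method
    referenced in the slides.
    """
    threshold = mean(series) + k * pstdev(series)
    peaks = []
    for i in range(1, len(series) - 1):
        is_local_max = series[i - 1] < series[i] >= series[i + 1]
        if is_local_max and series[i] > threshold:
            peaks.append(i)
    return peaks


# Toy daily post counts, invented for illustration only.
daily_posts = [5, 6, 5, 30, 12, 8, 7, 6, 25, 9, 6, 5]
candidates = shock_candidates(daily_posts)
```

Each candidate index would then be a possible start of a new contagion process, with model selection deciding which candidates to keep.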

CERIS Model Results

[Figure, panel "HTML and CSS": series css, html, and model; x-axis 2009 to 2014; y-axis 0 to 70000]

[Figure, panel "iOS versions": series ios7, ios6, ios5, and model; x-axis in months, with Jan 2012, Jan 2013, and Jan 2014 labeled; y-axis 0 to 400]

CERIS Model Results

• Model results:

– Reasonably accurate fittings

– Captures different patterns of activity

– Captures concurrent evolution of related communities

RMSE

HTML and CSS    iOS versions    All (mean, daily)
3046.895        13.612          21.131
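For reference, the root-mean-square error between an observed post series and a fitted one can be computed as below. The `rmse` helper and the toy series are illustrative and are not taken from the study's data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long series."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy series, invented for illustration only.
observed = [10, 12, 15, 11]
fitted = [11, 12, 13, 11]
error = rmse(observed, fitted)
```

RMSE is expressed in the same unit as the series (posts), so it grows with a community's volume, which should be kept in mind when comparing values across communities of very different sizes.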

CERIS Model Results

• Model outputs used to quantify the relationship between communities

• Flow of users between communities:

flow_{C1,C2}(t) = ε β2(t)

flow_{C2,C1}(t) = ε β1(t)

CERIS Model Results

[Figure, panel "Top 100": heatmap over pairs of communities, 1 to 100 on both axes; color scale 0.0 to 0.9]

[Figure, panel "Top 15": heatmap over pairs of communities, with java, javascript, c#, php, android, jquery, python, html, c++, ios, mysql, css, asp.net, objective-c, and .net on both axes; color scale 0.0 to 0.8]

Conclusions

• Knowledge-sharing networks as a community environment

– Topic-based communities defined by users interacting with topics of their interest

• Investigation of topic-based communities in Stack Overflow

– User activity in terms of communities they belong to

– Impact of related communities

• New model to describe community evolution

– Incorporates key factors behind community activity

– Good portrayal of the co-evolution of multiple communities

Thank you!

Anna Guimaraes

anna@dcc.ufmg.br

References

Agichtein, E., Castillo, C., Donato, D., Gionis, A., and Mishne, G. (2008). Finding High-Quality Content in Social Media. In Proc. WSDM.

Beutel, A., Prakash, B. A., Rosenfeld, R., and Faloutsos, C. (2012). Interacting Viruses in Networks: Can Both Survive? In Proc. ACM SIGKDD.

Dalip, D. H., Goncalves, M. A., Cristo, M., and Calado, P. (2013). Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: A Case Study with Stack Overflow. In Proc. ACM SIGIR.

Figueiredo, F., Almeida, J. M., Matsubara, Y., Ribeiro, B., and Faloutsos, C. (2014). Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries. In Proc. PKDD.

Hansen, M. H. and Yu, B. (2001). Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association, 96(454).

More, J. J. (1978). The Levenberg-Marquardt Algorithm: Implementation and Theory. In Numerical Analysis, pages 105–116. Springer.

Ravi, S., Pang, B., Rastogi, V., and Kumar, R. (2014). Great Question! Question Quality in Community Q&A. In Proc. ICWSM.

Wang, X., Butler, B. S., and Ren, Y. (2013). The Impact of Membership Overlap on Growth: An Ecological Competition View of Online Groups. Organization Science, 24(2):414–431.