Studying Online Interactions using Social Network Analysis

Anatoliy Gruzd (@gruzd)

Canada Research Chair in Social Media Data Stewardship

Associate Professor, Ted Rogers School of Management

Director, Social Media Lab, Ryerson University

BCC Workshop, Arlington, TX, Oct 4, 2015



Social Network Analysis (SNA)

• Nodes = People

• Edges/Ties (lines) = Relations: “Who retweeted/replied/mentioned whom”

How to Make Sense of Educational “Big” Data?

Makes it much easier to understand what is going on in a group

Advantages of Social Network Analysis

Once the network is discovered, we can find out:

• How people interact with each other

• Who the most and least active members of a group are

• Who is influential in a group

• Who is susceptible to being influenced, etc.

How Do We Collect Information About Online Social Networks?

Common approach for collecting social network data:

• Surveys or interviews

• Self-reported social network data may not be available or accurate

Problems with surveys or interviews

• Time-consuming

• Questions can be too sensitive

• Answers are subjective or incomplete

• Participants can forget people and interactions

• Different people perceive events and relationships differently

How Do We Collect Information About Social Networks?

• Common approach: surveys or interviews

• A sample question about students’ perceived social structures:

Please indicate on a scale from [1] to [5] YOUR FRIENDSHIP RELATIONSHIP WITH EACH STUDENT IN THE CLASS

[1] - don’t know this person

[2] - just another member of class

[3] - a slight friendship

[4] - a friend

[5] - a close friend

Alice D. [1] [2] [3] [4] [5]

Richard S. [1] [2] [3] [4] [5]

Source: C. Haythornthwaite, 1999


Studying Online Social Networks

http://www.visualcomplexity.com/vc

• Forum networks

• Blog networks

• Friends’ networks (Facebook, Twitter, Google+, etc.)

• Networks of like-minded people (YouTube, Flickr, etc.)

How Do We Collect Information About Online Social Networks?

Goal: Automated Networks Discovery

Challenge: Figuring out what content-based features of online interactions can help to uncover nodes and ties between group members

Automated Discovery of Social Networks

Emails

[Figure: email network among Nick, Rick, and Dick]

• Nodes = People

• Ties = “Who talks to whom”

• Tie strength = The number of messages exchanged between individuals
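The email definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message log, its `(sender, recipients)` layout, and the function name `build_email_network` are hypothetical and not taken from the slides.

```python
from collections import Counter


def build_email_network(messages):
    """Count how many messages each sender addressed to each recipient."""
    tie_strength = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # skip self-addressed copies
                tie_strength[(sender, recipient)] += 1
    return tie_strength


# Hypothetical message log (illustration only): (sender, list of recipients).
messages = [
    ("Nick", ["Rick"]),
    ("Rick", ["Nick"]),
    ("Nick", ["Rick", "Dick"]),
    ("Dick", ["Nick"]),
]

network = build_email_network(messages)
# Nodes are everyone who appears at either end of a tie.
nodes = sorted({person for tie in network for person in tie})
```

Here ties are kept directed ("who wrote to whom"). If tie strength should be the total number of messages exchanged in either direction, the two directed counts for a pair can simply be added.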

Automated Discovery of Social Networks

“Many to Many” Communication

• Chat

• Mailing listserv

• Forum

• Comments


Automated Discovery of Social Networks

Approach 1: Chain Network (Reply-to)

Posting header:

FROM: Sam

PREVIOUS POSTER: Gabriel

Content:

“Nick, Gina and Gabriel: I apologize for not backing this up with a good source, but I know from reading about this topic that …”

Possible Missing Connections:

• Sam -> Nick

• Sam -> Gina

• Nick <-> Gina
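A minimal sketch of the chain (reply-to) approach, assuming each post is available as a dictionary with `from` and `previous_poster` fields. These field names and the function `chain_ties` are hypothetical, chosen only to mirror the posting header on the slide.

```python
def chain_ties(posts):
    """Return directed ties (poster -> previous poster) using header data only."""
    ties = []
    for post in posts:
        previous = post.get("previous_poster")
        if previous and previous != post["from"]:
            ties.append((post["from"], previous))
    return ties


# The post from the slide, in an assumed dictionary layout.
thread = [
    {
        "from": "Sam",
        "previous_poster": "Gabriel",
        "text": "Nick, Gina and Gabriel: I apologize for not backing this up "
                "with a good source, but I know from reading about this topic that …",
    },
]

ties = chain_ties(thread)
```

Because only the header is read, the result contains Sam -> Gabriel and nothing else. The names Nick and Gina appear only in the message text, which is why the connections listed above go undetected by this approach.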

Automated Discovery of Social Networks

Approach 2: Name Network

FROM: Ann

“Steve and Natasha, I couldn't wait to see your site. I knew it was going to [be] awesome!”

This approach looks for personal names in the content of the messages to identify social connections between group members.
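The idea can be illustrated with a deliberately simplified sketch. It assumes a known roster of group members and matches names as whole words; the slides do not describe the actual name-discovery procedure, so `name_ties` and `roster` are hypothetical, and real data would also need handling of nicknames, misspellings, and shared first names.

```python
import re


def name_ties(sender, text, roster):
    """Return ties from the sender to every roster member named in the text."""
    ties = []
    for name in roster:
        if name == sender:
            continue
        # Whole-word, case-sensitive match, so "Ann" does not match "Annual".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            ties.append((sender, name))
    return ties


# Assumed roster of group members (illustration only).
roster = ["Ann", "Steve", "Natasha", "Gabriel"]
message = ("Steve and Natasha, I couldn't wait to see your site. "
           "I knew it was going to [be] awesome!")

ties = name_ties("Ann", message, roster)
```

For the message from Ann, this yields the ties Ann -> Steve and Ann -> Natasha, which a reply-to header alone would not reveal.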

Automated Discovery of Social Networks

Approach 2: Name Network

• Main Communicative Functions of Personal Names (Leech, 1999)

– getting attention and identifying addressee

– maintaining and reinforcing social relationships

• Names are “one of the few textual carriers of identity” in discussions on the web (Doherty, 2004)

• Their use is crucial for the creation and maintenance of a sense of community (Ubon, 2005)

How to Make Sense of Social Media Data?

Example: Twitter Networks

[Figure: Twitter network among @John, @Peter, and @Paul]

• Nodes = People

• Ties = “Who retweeted/replied/mentioned whom”

• Tie strength = The number of retweets, replies or mentions
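As a rough sketch of the Twitter definitions above, the snippet below extracts `@handle` mentions from tweet text and counts them per (author, mentioned user) pair. The sample tweets, the regular expression, and the function name `mention_ties` are assumptions for illustration; a production pipeline would normally rely on the platform's structured reply and retweet metadata rather than on text matching alone.

```python
import re
from collections import Counter

# Simplified handle pattern (assumption): "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")


def mention_ties(tweets):
    """Count retweets, replies, and mentions per (author, mentioned user) pair."""
    ties = Counter()
    for author, text in tweets:
        for handle in MENTION.findall(text):
            if handle.lower() != author.lower():  # ignore self-mentions
                ties[(author, handle)] += 1
    return ties


# Made-up tweets (illustration only): (author, text).
tweets = [
    ("John", "@Peter great talk today"),
    ("John", "RT @Peter: slides are online"),
    ("Paul", "@John @Peter see you at the workshop"),
]

ties = mention_ties(tweets)
```

The counter value for each pair plays the role of tie strength: John -> Peter has strength 2 here because John both replied to and retweeted Peter.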

Automated Discovery of Social Networks

Twitter Data Example

Example 1

Name Network ties: @Cheeflo -> @JoeProf, @Cheeflo -> @VMosco

Chain Network ties: none

Example 2

Name Network ties: @gruzd -> @sidneyeve

Chain Network ties: @gruzd -> @sidneyeve

Netlytic.org: cloud-based research infrastructure for automated text analysis & discovery of social networks from social media data

[Figure: Netlytic screenshots, with panels labeled Networks, Stats, and Content]

cMOOC Dataset

Athabasca University Course

◦ CCK11: Connectivism and Connective Knowledge

Not restricted to any one platform

“Throughout this ‘course’ participants will use a variety of technologies, for example, blogs, Second Life, RSS Readers, UStream, etc.”

CCK11 MOOC Dataset

Source               #Posts   Avg character count   Avg word count
Blogs                   812                 332.3             54.2
Discussion Threads       68                 541.8             90.8
Comments                306                 837.6            138.9
Tweets                 1722                 113.9             14.4

Question 1 - #LAK14 #pLASMA

What does this visualization tell us from an instructor’s perspective?

What does this visualization tell us from a student’s perspective?

SNA Measures

Micro-level

• In-degree centrality

• Out-degree centrality

• Betweenness centrality

• Other centrality measures (e.g., closeness, eigenvector)

Macro-level

• Density

• Diameter

• Reciprocity

• Centralization

• Modularity

In-degree centrality

In-degree suggests “prestige”: it highlights the Twitter users who are mentioned or replied to most often.

Out-degree centrality

Out-degree reveals active Twitter users who have a good awareness of others in the network.
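Both degree measures can be computed directly from a list of directed ties. The sketch below counts distinct ties rather than tie strength, and the example ties and the function name `degree_centrality` are made up for illustration.

```python
from collections import Counter


def degree_centrality(ties):
    """Return (in_degree, out_degree) counters for a list of directed ties."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in ties:
        out_degree[source] += 1  # source mentioned or replied to someone
        in_degree[target] += 1   # target was mentioned or replied to
    return in_degree, out_degree


# Hypothetical "who mentioned whom" ties (illustration only).
ties = [("John", "Peter"), ("Paul", "Peter"), ("Paul", "John")]

in_degree, out_degree = degree_centrality(ties)
most_mentioned = in_degree.most_common(1)[0][0]
most_active = out_degree.most_common(1)[0][0]
```

In this toy network Peter has the highest in-degree (two users point at him), while Paul has the highest out-degree (he addresses two others).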

Betweenness centrality

Betweenness shows actors who are located on the largest number of information paths (shortest paths between other users) and who often connect different groups of users in the network.
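For readers who want to see how betweenness can be computed, here is a compact sketch of Brandes' algorithm for an unweighted, directed graph. It returns unnormalized scores; whether the tool used in the slides normalizes or weights the measure is not stated, so treat this as one common formulation rather than the exact procedure behind the reported numbers. The adjacency dictionary is a made-up example.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness (Brandes' algorithm, unweighted directed graph).

    Every node must appear as a key in `adjacency`, even if it has no out-ties.
    """
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:  # first visit
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    paths[neighbor] += paths[node]
                    predecessors[neighbor].append(node)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    return score


# Hypothetical ties (illustration only): A -> B, D -> B, B -> C.
adjacency = {"A": ["B"], "B": ["C"], "C": [], "D": ["B"]}
scores = betweenness(adjacency)
```

In this example B sits on both shortest paths that pass through an intermediary (A to C and D to C), so it is the only node with a non-zero score. For an undirected network stored as a symmetric adjacency dictionary, each pair is counted in both directions, so the scores come out doubled.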

Top 10 Twitter users in the Name network based on centrality measures

In-degree        Out-degree       Betweenness
profesortbaker   cck11feeds       profesortbaker
shellterrell     suifaijohnmak    suifaijohnmak
gsiemens         web20education   cck11feeds
downes           daisygrisolia    thbeth
zaidlearn        ianchia          francesbell
web20education   profesortbaker   hamtra
hamtra           pipcleaves       daisygrisolia
profesorbaker    hamtra           web20education
ianchia          jaapsoft         jaapsoft
jaapsoft         thbeth           gsiemens
suifaijohnmak    ryantracey

Density

Density indicates the overall connectivity in the network (the total number of connections divided by the total number of possible connections). It is equal to 1 when everyone is connected to everyone.

[Figure: an instructor and two students, all connected to each other; Density = 1]
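The density definition translates into a one-line ratio. The sketch below assumes self-ties are excluded and lets the caller choose between a directed network, where n nodes allow n(n-1) ties, and an undirected one, where they allow n(n-1)/2. The function name and the `directed` flag are illustrative choices, not something the slides specify.

```python
def density(num_nodes, num_ties, directed=True):
    """Number of existing ties divided by the number of possible ties."""
    if num_nodes < 2:
        return 0.0
    possible = num_nodes * (num_nodes - 1)  # ordered pairs, no self-ties
    if not directed:
        possible //= 2                      # unordered pairs
    return num_ties / possible


# An instructor and two students who are all connected to each other.
fully_connected_directed = density(3, 6, directed=True)
fully_connected_undirected = density(3, 3, directed=False)

# The same three people with only one directed tie between them.
sparse = density(3, 1, directed=True)
```

Both fully connected cases give a density of 1, matching the slide's example, while the sparse case gives 1/6.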

Diameter

Diameter gives a general idea of how “wide” the network is: it is the longest of the shortest paths between any two nodes in the network.

[Figure: example network of Student A, Instructor, Student B, and Student C, with three hops labeled #1 to #3; Diameter = 3]
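A small sketch of how the diameter can be found with breadth-first search. It assumes an undirected, unweighted network stored as a symmetric adjacency dictionary and ignores pairs of nodes that cannot reach each other. The chain used as input is an assumed reading of the slide's figure, not a confirmed layout.

```python
from collections import deque


def farthest_distance(adjacency, source):
    """Length of the longest shortest path starting at `source` (BFS)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return max(distance.values())


def diameter(adjacency):
    """Longest of the shortest paths over all reachable pairs of nodes."""
    return max(farthest_distance(adjacency, node) for node in adjacency)


# Assumed chain: Student A - Instructor - Student B - Student C.
chain = {
    "Student A": ["Instructor"],
    "Instructor": ["Student A", "Student B"],
    "Student B": ["Instructor", "Student C"],
    "Student C": ["Student B"],
}

chain_diameter = diameter(chain)
```

For this chain the longest shortest path runs from Student A to Student C and takes three hops, which agrees with the value shown on the slide.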

Reciprocity

Reciprocity shows how many online participants are having two-way conversations. In a scenario when everyone replies to everyone, the reciprocity value will be 1.

[Figure: an instructor and three students; Reciprocity = 1]
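One common way to quantify this is the share of directed ties that are returned. The slides do not give the exact formula behind the reported values, so the tie-level definition below is an assumption, and the function name and sample ties are hypothetical.

```python
def reciprocity(ties):
    """Fraction of distinct directed ties whose reverse tie also exists."""
    tie_set = {(source, target) for source, target in ties if source != target}
    if not tie_set:
        return 0.0
    mutual = sum(1 for source, target in tie_set if (target, source) in tie_set)
    return mutual / len(tie_set)


# Everyone replies to everyone among three participants.
people = ["Instructor", "Student 1", "Student 2"]
all_reply = [(a, b) for a in people for b in people if a != b]
full = reciprocity(all_reply)

# One two-way conversation plus one unanswered message.
partial = reciprocity([("A", "B"), ("B", "A"), ("A", "C")])
```

The fully reciprocal case evaluates to 1, as described above; in the second case two of the three ties are returned, giving roughly 0.67.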

Centralization

Centralization indicates whether a network is dominated by a few central participants (values closer to 1) or whether more people are contributing to discussion and information dissemination (values closer to 0).

[Figure: an instructor and three students; Centralization = 1]

[Figure: example network with high centralization]
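Several centralization formulas exist, and the slides do not say which one was used. As an illustration only, the sketch below implements Freeman's degree centralization for an undirected network, which equals 1 for a star (one participant connected to everyone else, nobody else connected) and 0 when all participants have the same number of ties. The star layout is an assumption about the figure.

```python
def degree_centralization(adjacency):
    """Freeman degree centralization for an undirected, unweighted network."""
    n = len(adjacency)
    if n < 3:
        return 0.0  # the measure is not meaningful for fewer than three nodes
    degrees = [len(set(neighbors)) for neighbors in adjacency.values()]
    max_degree = max(degrees)
    # Sum of gaps to the most central node, divided by the largest possible
    # sum, which is reached by a star network: (n - 1) * (n - 2).
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Assumed star: every student is connected only to the instructor.
star = {
    "Instructor": ["Student 1", "Student 2", "Student 3"],
    "Student 1": ["Instructor"],
    "Student 2": ["Instructor"],
    "Student 3": ["Instructor"],
}

# Everyone connected to everyone: no participant dominates.
names = ["Instructor", "Student 1", "Student 2", "Student 3"]
complete = {a: [b for b in names if b != a] for a in names}

star_value = degree_centralization(star)
complete_value = degree_centralization(complete)
```

The star yields 1 and the fully connected group yields 0, matching the two ends of the scale described above.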

Modularity

Modularity provides an estimate of whether a network consists of one coherent group of participants who are engaged in the same conversation and who are paying attention to each other (values closer to 0), or whether a network consists of different conversations and communities with a weak overlap (values closer to 1).

[Figure: 2012 Olympics in London Twitter network; High Modularity ~0.9]

[Figure: #tarsand Twitter Community; Moderate Modularity ~0.5]
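To make the measure more concrete, the sketch below computes Newman's modularity Q for an undirected, unweighted network and a partition that is already known. In practice the communities have to be found first by a community-detection algorithm, which the slides do not name, so the partition, the edge list, and the function name here are all assumptions for illustration.

```python
from collections import Counter


def modularity(edges, community):
    """Newman's Q: for each community, (share of edges inside it) minus
    (share expected by chance, based on the community's total degree)."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = Counter()    # edges with both ends in the same community
    degree_sum = Counter()  # total degree of the nodes in each community
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            internal[community[a]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical network: two triangles joined by a single bridging tie.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in "abcdef"}

q_two_groups = modularity(edges, two_groups)
q_one_group = modularity(edges, one_group)
```

Treating the two triangles as separate communities gives a clearly positive Q, while treating the whole network as a single group gives exactly 0, in line with the interpretation above.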

#CCK11 Class Tweets

[Figure: Name network (on the left) and Chain network (on the right)]

                 Name Network    Chain Network
Nodes            498             122
Edges            761             125
Density          0.003 (0.3%)    0.009 (0.9%)
Diameter         38              7
Reciprocity      0.09 (9%)       0.18 (18%)
Centralization   0.07            0.08
Modularity       0.67            0.77