Oxford Digital Humanities Summer School

(Social) Network Analysis Scott A. Hale Oxford Internet Institute http://www.scotthale.net/ 17 July 2014

Upload: scott-a-hale

Post on 06-May-2015

Page 1: Oxford Digital Humanities Summer School

(Social) Network Analysis

Scott A. Hale, Oxford Internet Institute

http://www.scotthale.net/

17 July 2014

Page 2: Oxford Digital Humanities Summer School

What are networks?

Networks (graphs) are sets of nodes (vertices) connected by edges (links, ties, arcs).

Additional details

Whole vs. ego: whole networks have all nodes within a natural boundary (platform, organization, etc.). An ego network has one node and all of its immediate neighbors.

Edges can be directed or undirected and weighted or unweighted.

Additionally, networks may be multilayer and/or multimodal.
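
The distinctions above (directed vs. undirected, weighted vs. unweighted, whole vs. ego) can be sketched with NetworkX, the package introduced later in these slides. This is only an illustration: the names, the "mentions" interpretation, and the weights are made up.

```python
import networkx as nx

# Directed, weighted graph: an edge u -> v is read here as "u mentions v",
# and the weight counts how often (all names and weights are made up).
d = nx.DiGraph()
d.add_weighted_edges_from([
    ("Alan", "Bob", 3),
    ("Bob", "Alan", 1),
    ("Carol", "Bob", 2),
    ("Denise", "Carol", 1),
])

# Direction matters: Carol -> Bob exists, Bob -> Carol does not.
print(d.has_edge("Carol", "Bob"), d.has_edge("Bob", "Carol"))
print(d["Alan"]["Bob"]["weight"])

# Undirected view of the same data: reciprocal edges collapse into one.
u = d.to_undirected()
print(u.number_of_edges())

# Ego network: one node plus all of its immediate neighbors.
ego = nx.ego_graph(u, "Carol", radius=1)
print(sorted(ego.nodes()))
```

Here `d` plays the role of a (tiny) whole network and `ego` is the ego network of Carol.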

Page 4: Oxford Digital Humanities Summer School

Why?

Characterize network structure

How far apart / well-connected are nodes?

Are some nodes at more important positions?

Is the network composed of communities?

How does network structure affect processes?

Information diffusion

Coordination/cooperation

Resilience to failure/attack

Page 5: Oxford Digital Humanities Summer School

A network

First questions when approaching a network

What are edges? What are nodes?

What kind of network?

Inclusion/exclusion criteria

Page 6: Oxford Digital Humanities Summer School

Network data repositories

http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx

http://datamob.org

http://snap.stanford.edu/data

http://www-personal.umich.edu/~mejn/netdata
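
Many datasets in repositories like these are distributed as plain-text edge lists: one edge per line, with comment lines starting with "#". A minimal sketch of loading such data with NetworkX follows. The lines below are a made-up stand-in for a downloaded file; for a real file on disk, nx.read_edgelist takes a path and the same options.

```python
import networkx as nx

# Made-up stand-in for a downloaded edge-list file: one edge per line,
# node ids separated by whitespace, comment lines starting with "#".
lines = [
    "# FromNodeId ToNodeId",
    "0 1",
    "0 2",
    "1 2",
    "3 4",
]

# create_using=nx.DiGraph() would treat each line as a directed edge instead.
g = nx.parse_edgelist(lines, comments="#", nodetype=int,
                      create_using=nx.Graph())

print(g.number_of_nodes(), g.number_of_edges())
print(nx.number_connected_components(g))
```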

Page 7: Oxford Digital Humanities Summer School

Python resources

tweepy: Package for the Twitter stream and search APIs (Python 2.7 only as of July 2014; later releases support Python 3)

Search and stream API example code, along with code to create mentions/retweet networks, at https://github.com/computermacgyver/twitter-python

Two versions of Python:

2.7.x – many packages, issues with non-English scripts

3.x – fewer packages, but excellent handling of international scripts (Unicode)
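
As a rough idea of what building a mentions network involves, here is a minimal sketch. It is not the code from the repository linked above: the tweets are hand-written dictionaries with hypothetical "user" and "text" fields, and mentions are found with a simple regular expression rather than the API's own metadata.

```python
import re
import networkx as nx

# Hypothetical, hand-written tweets; real API responses have many more fields.
tweets = [
    {"user": "alan", "text": "Great talk by @bob and @carol!"},
    {"user": "bob", "text": "@alan thanks"},
    {"user": "alan", "text": "@bob see you tomorrow"},
]

mention_pattern = re.compile(r"@(\w+)")

# Directed graph: an edge u -> v means "u mentioned v";
# the weight counts how many times.
g = nx.DiGraph()
for tweet in tweets:
    source = tweet["user"]
    for target in mention_pattern.findall(tweet["text"]):
        target = target.lower()
        if g.has_edge(source, target):
            g[source][target]["weight"] += 1
        else:
            g.add_edge(source, target, weight=1)

print(sorted(g.edges(data="weight")))
```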

Page 8: Oxford Digital Humanities Summer School

NetworkX

http://networkx.github.io/

Package to represent networks as python objects

Convenient functions to add, delete, iterate nodes/edges

Functions to calculate network statistics (degree, clustering, etc.)

Easily generate comparison graphs based on statistical models

Visualization

Alternatives include igraph (available for Python and R)
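
To illustrate the "comparison graphs" point, the sketch below compares an observed network with a random graph that has the same number of nodes and edges. The choice of Zachary's karate club network (bundled with NetworkX) and of the G(n, m) random model is an assumption made for this example, not something prescribed by the slides.

```python
import networkx as nx

# Zachary's karate club: a small social network that ships with NetworkX.
observed = nx.karate_club_graph()
n = observed.number_of_nodes()
m = observed.number_of_edges()

# Random graph with the same number of nodes and edges (G(n, m) model).
# The seed is arbitrary and only makes the run repeatable.
random_graph = nx.gnm_random_graph(n, m, seed=42)

print("nodes, edges:", n, m)
print("observed clustering:", nx.average_clustering(observed))
print("random clustering:  ", nx.average_clustering(random_graph))
```

If the observed clustering is much higher than in the random graph, the structure is unlikely to be explained by the number of nodes and edges alone.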

Page 9: Oxford Digital Humanities Summer School

Gephi

Open-source, cross-platform GUI

Primary strength is to visualize networks

Basic statistical properties are also available

Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more

Page 10: Oxford Digital Humanities Summer School

Network measures

With many nodes, visualizations are often difficult/impossible to interpret. Statistical measures can be very revealing, however.

Node-level

Degree (in, out): How many incoming/outgoing edges does a node have?

Centrality (next slide)

Constraint

Network-level

Components: Number of disconnected subsets of nodes

Density: observed edges / maximum number of edges possible

Clustering coefficient: closed triplets / connected triplets

Path length distribution

Distributions of node-level measures
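
The network-level measures above can be checked by hand on a small graph. The sketch below uses a made-up six-node graph and NetworkX (introduced on a later slide); nx.transitivity is used for the triplet-based clustering coefficient.

```python
import networkx as nx

# Small made-up graph: a triangle (a, b, c), a pendant node d attached to c,
# and a separate pair (e, f), so there are two components.
g = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")])

n = g.number_of_nodes()
m = g.number_of_edges()

# Density of an undirected graph: observed edges / (n * (n - 1) / 2).
density_by_hand = m / (n * (n - 1) / 2)
print(density_by_hand, nx.density(g))

# Components: number of disconnected subsets of nodes.
print(nx.number_connected_components(g))

# Clustering coefficient (transitivity):
# closed triplets / all connected triplets.
print(nx.transitivity(g))

# Path lengths are only defined within a component, so use the largest one.
largest = g.subgraph(max(nx.connected_components(g), key=len))
print(nx.average_shortest_path_length(largest))
```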

Page 11: Oxford Digital Humanities Summer School

Centrality measures

Degree

Closeness: Measures the average geodesic distance to ALL other nodes. Informally, an indication of the ability of a node to diffuse a property efficiently.

Betweenness: Number of shortest paths the node lies on. Informally, the betweenness is high if a node bridges clusters.

Eigenvector: A weighted degree centrality (inbound links from highly central nodes count more).

PageRank: Not strictly a centrality measure, but similar to eigenvector centrality; it is modeled as a random walk with a teleportation parameter.
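
A small example may make these measures concrete. The graph below (two triangles joined by one bridging edge) is made up, and pagerank_sketch is a simplified power iteration written for this illustration, not NetworkX's own PageRank implementation; the damping value 0.85 is a conventional choice assumed here.

```python
import networkx as nx

# Two triangles joined by a single bridge edge C-D (made-up example).
g = nx.Graph([
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
])

degree = nx.degree_centrality(g)
closeness = nx.closeness_centrality(g)
betweenness = nx.betweenness_centrality(g)
eigenvector = nx.eigenvector_centrality(g, max_iter=1000)

for node in sorted(g):
    print(node, round(degree[node], 3), round(closeness[node], 3),
          round(betweenness[node], 3), round(eigenvector[node], 3))


def pagerank_sketch(graph, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration (illustration only).

    With probability `damping` the walker follows a random edge; otherwise
    it teleports to a node chosen uniformly at random. Assumes every node
    has at least one neighbor.
    """
    n = graph.number_of_nodes()
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node in graph:
            share = damping * rank[node] / graph.degree(node)
            for neighbor in graph[node]:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


pr = pagerank_sketch(g)
print({node: round(value, 3) for node, value in sorted(pr.items())})
```

The bridging nodes C and D come out highest on betweenness, which matches the informal description above.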

Page 12: Oxford Digital Humanities Summer School

NetworkX: Nodes

import networkx as nx

g=nx.Graph() #A new (empty) undirected graph

g.add_node("Alan") #Add one new node

g.add_nodes_from(["Bob","Carol","Denise"]) #Add three new nodes from a list

#Nodes can have attributes

#(g.nodes[...] works in NetworkX 2.0 and later; NetworkX 1.x used g.node[...])

g.nodes["Alan"]["gender"]="M"

g.nodes["Bob"]["gender"]="M"

g.nodes["Carol"]["gender"]="F"

g.nodes["Denise"]["gender"]="F"

#Iterating over a graph yields its nodes

for n in g:
    print("{0} has gender {1}".format(n,g.nodes[n]["gender"]))

Page 13: Oxford Digital Humanities Summer School

NetworkX: Edges

#Interesting graphs have edges

g.add_edge("Alan","Bob") #Add one new edge

#Add two new edges

g.add_edges_from([["Carol","Denise"],["Carol","Bob"]])

#Edge attributes

#(g.edges[u,v] works in NetworkX 2.0 and later; NetworkX 1.x used g.edge[u][v])

g.edges["Alan","Bob"]["relationship"]="Friends"

g.edges["Carol","Denise"]["relationship"]="Friends"

g.edges["Carol","Bob"]["relationship"]="Married"

#New edge with an attribute

g.add_edges_from([("Carol","Alan",{"relationship":"Friends"})])

Page 14: Oxford Digital Humanities Summer School

NetworkX: Edges

#Iterate over all edges (g.edges_iter() was removed in NetworkX 2.0)

for n1,n2 in g.edges():
    print("{0} and {1} are {2}".format(n1,n2,g.edges[n1,n2]["relationship"]))

Page 15: Oxford Digital Humanities Summer School

NetworkX: Measures

g.number_of_nodes()

g.nodes(data=True)

g.number_of_edges()

g.edges(data=True)

print(g) #Short summary of g; replaces nx.info(g), which was removed in NetworkX 3.0

nx.density(g)

nx.number_connected_components(g)

nx.degree_histogram(g)

nx.betweenness_centrality(g)

nx.clustering(g)

nx.clustering(g, nodes=["Bob"])
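
For reference, here is what several of these calls return on the four-person graph built on the preceding slides. The graph is rebuilt here (without attributes) so that the sketch runs on its own.

```python
import networkx as nx

# Rebuild the four-person example graph from the Nodes and Edges slides.
g = nx.Graph()
g.add_edges_from([
    ("Alan", "Bob"), ("Carol", "Denise"),
    ("Carol", "Bob"), ("Carol", "Alan"),
])

# 4 of the 6 possible edges are present.
print(nx.density(g))

# Everyone is reachable from everyone else: one component.
print(nx.number_connected_components(g))

# Entry i counts the nodes with degree i: [0, 1, 2, 1]
print(nx.degree_histogram(g))

# Carol lies on the shortest paths from Denise to Alan and to Bob.
print(nx.betweenness_centrality(g))

# Bob's two neighbors (Alan and Carol) are connected: {'Bob': 1.0}
print(nx.clustering(g, nodes=["Bob"]))
```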

Page 16: Oxford Digital Humanities Summer School

NetworkX: Visualize or save

#Save g to the file my_graph.graphml in graphml format

#prettyprint will make it nice for a human to read

nx.write_graphml(g,"my_graph.graphml",prettyprint=True)

#Layout g with the Fruchterman-Reingold force-directed

#algorithm and save the result to my_graph.png

#with_labels will label each node with its id

import matplotlib.pyplot as plt

nx.draw_spring(g,with_labels=True)

plt.savefig("my_graph.png")

plt.clf() #Clear plot

Page 17: Oxford Digital Humanities Summer School

NetworkX: Odds and ends

#Read a graph from the file my_graph.graphml in graphml format

g=nx.read_graphml("my_graph.graphml")

#Create an (empty) directed graph

g=nx.DiGraph()

See http://networkx.github.io/documentation/latest/reference/index.html for many more commands. Note that some commands are available only for directed graphs or only for undirected graphs.
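
A short sketch of the last two points: a GraphML round trip and a command that exists only for undirected graphs. The graph is made up, and a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile
import networkx as nx

# A small made-up directed graph.
d = nx.DiGraph()
d.add_edges_from([("Alan", "Bob"), ("Carol", "Bob"), ("Carol", "Denise")])

# In- and out-degree are only meaningful on directed graphs.
print(d.in_degree("Bob"), d.out_degree("Bob"))

# Round trip through GraphML; the file records that the graph is directed.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_graph.graphml")
    nx.write_graphml(d, path)
    loaded = nx.read_graphml(path)

print(loaded.is_directed(), sorted(loaded.edges()))

# Some functions are defined only for undirected graphs.
try:
    nx.number_connected_components(d)
except nx.NetworkXNotImplemented as err:
    print("not available for directed graphs:", err)

# The directed counterpart ignores edge direction explicitly.
print(nx.number_weakly_connected_components(d))
```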

Page 18: Oxford Digital Humanities Summer School

Resources

Newman, M.E.J., Networks: An Introduction

Kadushin, C., Understanding Social Networks: Theories, Concepts, and Findings

De Nooy, W., et al., Exploratory Social Network Analysis with Pajek

Shneiderman, B., and Smith, M., Analyzing Social Media Networks with NodeXL
