ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Professor Daniel Martin Katz

Professor Daniel Martin Katz Introduction to Computing for Complex Systems (Lab Session 9)


Post on 11-May-2015


Page 1: ICPSR - Complex Systems Models in the Social Sciences - Lab Session 9 - Professor Daniel Martin Katz

Professor Daniel Martin Katz

Introduction to Computing for Complex Systems

(Lab Session 9)

Social Networks & the Tools of Analysis

Pajek: It’s Not Everything, But It’s a Good Start

• Today we will begin to use Pajek (pronounced “pah-yek”).
• Pajek means “spider” in Slovenian.
• It is designed to read fairly large networks.
• Pajek allows you to:
  – Read and visualize network data
  – Edit and create networks
  – Run various node-level statistics
  – Run various graph-level statistics

More Info About Pajek

Vladimir Batagelj and Andrej Mrvar, Pajek: Program for Analysis and Visualization of Large Networks. Reference Manual, Version 1.27. Ljubljana, 2010.
http://vlado.fmf.uni-lj.si/pub/networks/pajek/doc/pajekman.pdf

Wouter de Nooy, Andrej Mrvar, and Vladimir Batagelj, Exploratory Social Network Analysis with Pajek. Cambridge University Press, 2005.

For a detailed description of Pajek’s menu bar options:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/sunbelt.97/pajekman.htm

Creating Networks

• Pajek can read your network data files.
• Pajek can also edit networks and create random graphs (which can serve as a null case).
• Please open Pajek on your machine.

Random Network Generation

• Create an Erdos-Renyi random graph:
• Net>Random Network>Erdos-Renyi>Undirected>General>…
• How many vertices: 100
• Average degree of vertices: 5
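For readers who want to see what these two settings imply outside of Pajek, here is a minimal Python sketch. It assumes the standard G(n, p) construction, with p chosen so that the expected degree equals the requested average; Pajek's own generator may work differently. The function name `erdos_renyi` and the fixed seed are illustrative choices, not part of Pajek.

```python
import random

def erdos_renyi(n, avg_degree, seed=None):
    """Return an undirected G(n, p) graph as {vertex: set(neighbors)}.

    Assumption: p = avg_degree / (n - 1), so each vertex has the requested
    degree in expectation. Pajek's internal method may differ.
    """
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    adj = {v: set() for v in range(1, n + 1)}  # Pajek numbers vertices from 1
    for u in range(1, n + 1):
        for v in range(u + 1, n + 1):
            if rng.random() < p:  # link each pair independently
                adj[u].add(v)
                adj[v].add(u)
    return adj

graph = erdos_renyi(100, 5, seed=42)
mean_degree = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
print(f"vertices: {len(graph)}, mean degree: {mean_degree:.2f}")
```

The realized mean degree will be close to 5 but not exactly 5, since every run is one random draw.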

Random Network Generation

• Pajek should now display a Report window showing what it has done so far.

Exploring Pajek’s Menu Options

• Pajek keeps the networks you have used during your session in the Networks drop-down menu.
• Partitions hold discrete categorical attributes of nodes (such as degree, party ID, etc.). For example:
  – Partition for Republicans = 1
  – Partition for Democrats = 2
  – Partition for Independents = 0
• Vectors hold continuous node attributes (such as centrality).
• The Permutations, Cluster, and Hierarchy drop-down menus hold different types of clustering attributes.

Graph Visualization

• Let’s visualize our random graph.
• In the top menu, select Draw, then choose Draw in the drop-down menu: Draw>Draw.

Energizing the Network

• Go to Layout in the top menu of the visualization window and select an energizing algorithm.
• Layout>Energy>Kamada-Kawai>Free

Energized Random Network

• Go to Layout again:
• Layout>Energy>Fruchterman>2D
• Layout>Energy>Fruchterman>3D

Rotation of the Network

(1) Go to Layout again: Layout>Energy>Fruchterman>3D
(2) Spin>Spin Around
(3) Select the number of degrees to rotate (try 1080°)

The Options Menu

• Let’s explore the Options submenu.
• We can turn on the node labels, numbers, vector values, etc.

Same visualization, now with node labels.

Node labels could be names, firms, etc.

Daniel Katz & Derek K. Stafford, Hustle and Flow: A Social Network Analysis of the American Federal Judiciary, 71 Ohio State L. J. 457 (2010)

The Options Menu

• Let’s explore the Options submenu.
• We can change the size of the vertices.

Old and New Node Sizes

The Options Menu

• Change node and edge colors:
  – Set Nodes = Blue
  – Set Edges = Yellow

Blue and Yellow

The Options Menu

• Change the background color: set Background = Black
• Change the font color for vertex labels: set Font Color = Grey

Blue and Yellow: Take 2

How Do I Make This Image Crisper?

(1) Export to .SVG
(2) Download Inkscape for post-production: http://www.inkscape.org/download/

Inkscape has lots of functions

Before Inkscape

Note: Some nodes will move, because separate realizations of the visualization algorithm lead to slightly different results.

After Inkscape

Daniel Katz, Joshua Gubler, Jon Zelner, Michael Bommarito, Eric Provins & Eitan Ingall, Reproduction of Hierarchy? A Social Network Analysis of the American Law Professoriate, 61 Journal of Legal Education 1 (2011)

Okay, Let’s Generate Some Graph-Level Stats

• Using our random graph, we can measure the clustering coefficient of the resulting network.
• Net>Vector>Clustering Coefficients>CC1

Double-click inside the Vectors drop-down menu to get clustering coefficients for individual nodes. Here is what it will look like (values may differ).
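To make the per-node numbers less of a black box, the sketch below computes a local clustering coefficient by hand. It assumes the common definition: for a vertex with k neighbors, twice the number of links among those neighbors divided by k(k-1). Treating vertices with fewer than two neighbors as 0.0 is also an assumption here; Pajek may report such vertices differently. The toy graph is hypothetical.

```python
def local_clustering(adj):
    """Map each vertex to 2*E / (k*(k-1)).

    k is the vertex's degree and E the number of links among its neighbors.
    Assumption: vertices with k < 2 are assigned 0.0.
    """
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        # count each neighbor-neighbor link once (u < w)
        links = sum(1 for u in nbrs for w in adj[u] if w in nbrs and u < w)
        result[v] = 2 * links / (k * (k - 1))
    return result

# Hypothetical toy graph: a triangle (1, 2, 3) with a pendant vertex 4 on 3
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cc = local_clustering(toy)
for vertex in sorted(cc):
    print(vertex, round(cc[vertex], 3))
```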

Empirical Network Data

• We have now learned how to create random graphs and visualize networks.
• It is now time to work with a real empirical data set.

Corporate Interlocks in Scotland Dataset

• The Scotland.net file contains a “two-mode network with 244 vertices (136 multiple directors and 108 companies), 356 edges (directorate), no arcs, no loops.”
• http://vlado.fmf.uni-lj.si/pub/networks/data/esna/scotland.htm

Here is the Two-Mode Network (Companies & Directors)

Based on my colors: red = companies; grey = individuals.

Pajek Project Files (.paj files)

• A .paj file saves the different network component files in one full Pajek project file.
• You can open a .paj file by going to File>Pajek Project File>Read.
• You can also save a .paj file by going to File>Pajek Project File>Save.
• Use the .paj file if you have one, as it often contains more information.

.net Files

• Often you will only have a .net file at your disposal.
• So before we do anything with the file, let’s look at what a .net file looks like.
• Open a text editor such as WordPad or Notepad.
• Then open the Scotland.net file from within that text editor.

• Analyzing the first two lines:
  – Number of vertices: 244
  – Vertex x, y, z coordinates (optional): 0.0000 0.0000 0.5000
• Note: The number of vertices listed at the top must match the number of nodes in the vertices section!

• Later in the file is the edge list.
• Analyzing the first three lines:
  – *Arcs represents directed edges
  – *Edges represents undirected edges
• Meaning: The North British Railway (node #1) is connected to the Earl of Mansfield (node #109), etc.
• Additional information after the first two numbers in *Arcs/*Edges can signify attributes of the arc/edge, such as its weight or color.

Additional Notes about .net Files

• Either *Arcs or *Edges (or both) come immediately after the *Vertices section, with no blank line in between (do not hit ‘Enter’ an extra time).
• If you have arcs, the *Arcs section always comes before the *Edges section, although you do not need to include an *Arcs section if you do not have any arcs.
• .net files can be tricky … Do not use tabs, only spaces!
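The layout described on the last few slides can be illustrated with a small reader. This is a sketch for a minimal subset of the format only (a *Vertices section followed by *Arcs and/or *Edges, with an optional third number read as a weight). The sample data and the function name `parse_net` are hypothetical and are not taken from Scotland.net.

```python
import shlex

# Hypothetical three-vertex file in the layout described above
SAMPLE_NET = '''*Vertices 3
1 "Company A" 0.1000 0.2000 0.5000
2 "Director X" 0.3000 0.4000 0.5000
3 "Director Y"
*Edges
1 2
1 3 2.5
'''

def parse_net(text):
    """Parse a minimal Pajek .net string into (labels, arcs, edges)."""
    labels, arcs, edges = {}, [], []
    section, declared = None, 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            parts = line.split()
            section = parts[0].lower()
            if section == "*vertices":
                declared = int(parts[1])  # vertex count from the header
            continue
        fields = shlex.split(line)  # keeps quoted labels together
        if section == "*vertices":
            labels[int(fields[0])] = fields[1]
        elif section in ("*arcs", "*edges"):
            # assumption: a third number, if present, is the line's weight
            weight = float(fields[2]) if len(fields) > 2 else 1.0
            target = arcs if section == "*arcs" else edges
            target.append((int(fields[0]), int(fields[1]), weight))
    if len(labels) != declared:
        raise ValueError("vertex count does not match the *Vertices header")
    return labels, arcs, edges

labels, arcs, edges = parse_net(SAMPLE_NET)
print(labels)
print(edges)
```

The final check mirrors the note above: the count in the *Vertices header has to agree with the number of vertex lines that follow it.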

Other Drop-Down Menus (.clu, .vec, .per, .cls, .hie files)

• You may have noticed that the Scotland.zip website also mentions .paj, .vec, and .clu files. These files are created from the other menu options, which all work in a similar manner to the Networks menu option.
• The Partitions menu saves to .clu files, the Vectors menu saves to .vec files, the Permutations menu saves to .per files, the Clusters menu saves to .cls files, and the Hierarchies menu saves to .hie files.

Now, Close Pajek and Reopen It

Now please load the .paj file for Scottish Board Interlocks.

Your Screen Should Look Roughly Like This

Okay, Let’s Generate Some Graph-Level Stats

• Now that we have seen what a .net file looks like, we can use Pajek to extract graph-level data from the network.
• Degree distribution: Net>Partitions>Degree>All

Graph-Level Stats

• We can take a look at the data by double-clicking the drop-down menu where it is listed.
• Let’s double-click on “All Degree partition of N1 (244)”.
• The Pajek window shows the node numbers, the degree of each vertex, and the name of each vertex.

Analyzing Data in Outside Statistical Software

• We can also save the partition data as a .clu file and open it in statistical software.
• Either click the floppy disk icon under the Partitions button, or go to File>Partition>Save.
• Give the file a name, then save the .clu file.
• We can now open the .clu file in statistical software. I will use Excel.
• Here the degree distribution is listed:
  – The first entry shows how many vertices are in the data.
  – The second entry is the number of connections the first node has.
  – The third entry is the number of connections the second node has, etc.
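The same file can be read in a few lines of Python instead of Excel. The sketch below assumes the layout just described: a header line of the form `*Vertices N`, then one value per vertex. The sample contents are made up for illustration and the helper name `read_clu` is hypothetical.

```python
from collections import Counter

# Hypothetical .clu contents: a header line, then one degree per vertex
SAMPLE_CLU = """*Vertices 5
2
1
3
1
1
"""

def read_clu(text):
    """Return the list of per-vertex values stored in a .clu string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared = int(lines[0].split()[1])  # "*Vertices 5" -> 5
    values = [int(ln) for ln in lines[1:]]
    if len(values) != declared:
        raise ValueError("number of entries does not match the header")
    return values

degrees = read_clu(SAMPLE_CLU)
distribution = Counter(degrees)  # degree -> number of vertices with that degree
for degree in sorted(distribution):
    print(degree, distribution[degree])
```

Tabulating the values with `Counter` turns the per-vertex list into the degree distribution proper.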

Average Shortest Path

• Net>Paths between 2 vertices>Distribution of Distances>From All Vertices
• Then look at the Report window:
  Average distance among reachable pairs: 5.60675
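As a rough illustration of what "average distance among reachable pairs" measures, the sketch below runs a breadth-first search from every vertex and averages the distances over pairs that can actually reach each other. The toy graph is hypothetical, and this is not a claim about how Pajek computes the figure internally.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_distance(adj):
    """Mean shortest-path length over ordered pairs that are reachable."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: a path 1-2-3 plus an isolated vertex 4
path = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
avg = average_distance(path)
print(round(avg, 4))
```

The isolated vertex contributes nothing to the average, which is the point of restricting the measure to reachable pairs.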

Node-Level Data: Closeness Centrality

• We can also use Pajek to calculate node-level statistics, such as various centrality measures.
• Let’s calculate closeness centrality.
• Net>Vector>Centrality>Closeness>All
• Again, we can either view the individual node data by double-clicking the drop-down menu next to Vectors or save the data to a .vec file.
• We can also get the average and standard deviation by going to Info>Vector. (Leave the two windows that pop up blank before Pajek reports the data.)
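For intuition about the closeness numbers, here is a small sketch using the textbook definition: (n - 1) divided by the sum of a vertex's distances to all other vertices. It assumes a connected network and assigns 0.0 to any vertex that cannot reach every other vertex; Pajek's treatment of disconnected networks may differ. The star graph is a hypothetical example.

```python
from collections import deque

def closeness(adj):
    """Closeness = (n - 1) / (sum of distances to all other vertices).

    Assumption: vertices that cannot reach every other vertex get 0.0.
    """
    n = len(adj)
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[s] = (n - 1) / total if total > 0 and len(dist) == n else 0.0
    return scores

# Hypothetical star graph: vertex 1 in the center, three leaves
star = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
scores = closeness(star)
for vertex in sorted(scores):
    print(vertex, round(scores[vertex], 3))
```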

Betweenness Centrality

• Let’s calculate betweenness centrality next.
• Net>Vector>Centrality>Betweenness
• Again, we can either view the individual node data by double-clicking the drop-down menu next to Vectors or save the data to a .vec file.
• We can also get the average and standard deviation by going to Info>Vector.
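Betweenness is harder to compute by eye, so a sketch may help. The code below uses Brandes' algorithm for unweighted, undirected graphs and normalizes by the number of vertex pairs, (n - 1)(n - 2)/2, so scores fall between 0 and 1. The normalization choice is an assumption and may not match Pajek's output exactly; the star graph is hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness for an unweighted, undirected graph (Brandes)."""
    n = len(adj)
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every undirected pair was visited from both ends, so halve, then normalize
    pairs = (n - 1) * (n - 2) / 2
    return {v: (b / 2) / pairs if pairs else 0.0 for v, b in score.items()}

# Hypothetical star graph: every leaf-to-leaf path runs through vertex 1
star = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
bc = betweenness(star)
print(bc)
```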

Hubs & Authorities

• Net>Vector>Important Vertices>1-Mode: Hubs-Authorities
• Let’s assume 10% hubs or authorities.
• Put 24 in the two windows that pop up after selecting Hubs & Authorities (24 is roughly 10% of the 244 vertices).
• Under the Vectors drop-down menu there will be information for both hubs and authorities.
• You can double-click them or save them as .vec files.
• We can also get the average and standard deviation for both the hub and authority measures by going to Info>Vector.
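The idea behind hub and authority scores can be sketched with the usual iterative scheme: a vertex's authority score sums the hub scores of the vertices pointing to it, and its hub score sums the authority scores of the vertices it points to. The code below is an illustration on a hypothetical directed toy network, with a fixed number of iterations chosen for simplicity; it is not Pajek's implementation.

```python
import math

def hits(arcs, iterations=50):
    """Return (hub, authority) score dicts for a list of directed arcs (u, w)."""
    nodes = sorted({v for arc in arcs for v in arc})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # authority: sum of hub scores of vertices pointing in
        auth = {v: 0.0 for v in nodes}
        for u, w in arcs:
            auth[w] += hub[u]
        # hub: sum of authority scores of vertices pointed to
        new_hub = {v: 0.0 for v in nodes}
        for u, w in arcs:
            new_hub[u] += auth[w]
        hub = new_hub
        # rescale both vectors to unit length so values stay bounded
        for scores in (hub, auth):
            norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth

# Hypothetical arcs: 1 -> 3, 2 -> 3, 2 -> 4
toy_arcs = [(1, 3), (2, 3), (2, 4)]
hub, auth = hits(toy_arcs)
print("top hub:", max(hub, key=hub.get))
print("top authority:", max(auth, key=auth.get))
```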

Pajek’s Built-in Export Data Tool

• Remember: Pajek allows users to export data directly to R.

Creating a Partition

• Many times a network will contain natural groups that fall into partitions (such as Dems vs. GOP).
• We will once again create a random graph and then produce a random partition in order to see how Pajek visualizes partition data.

Random Network Generation

• Create an Erdos-Renyi random graph:
• Net>Random Network>Erdos-Renyi>Undirected>General>…
• How many vertices: 100
• Average degree of vertices: 5

Creating a Random Partition

• Now we will create a random partition by going to Partition>Create Random Partition>1-Mode.
• Pajek will then prompt us to set the dimension of the partition. Enter 100 (the number of vertices).
• After that we can set how many groups (clusters) the partition will contain. Select two.
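What this step produces can be mimicked in a few lines: one randomly chosen cluster number per vertex, written out in the .clu layout (a `*Vertices N` header followed by one value per line). Numbering the clusters from 1 and the helper names `random_partition` and `to_clu` are assumptions made for illustration; Pajek's own numbering may differ.

```python
import random

def random_partition(n_vertices, n_clusters, seed=None):
    """Assign each vertex a cluster number drawn uniformly from 1..n_clusters."""
    rng = random.Random(seed)
    return [rng.randint(1, n_clusters) for _ in range(n_vertices)]

def to_clu(partition):
    """Format a partition as .clu-style text: header line, then one value per vertex."""
    lines = [f"*Vertices {len(partition)}"] + [str(c) for c in partition]
    return "\n".join(lines) + "\n"

partition = random_partition(100, 2, seed=1)
clu_text = to_clu(partition)
print(clu_text.splitlines()[:4])
```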

Editing Partition Data

• Click on the edit button next to the Partitions drop-down menu or go to File>Partition>Edit.
• Here you can edit your partition data. You can then save the edits to a .clu file.

Drawing a Random Partition

• Go to Draw>Draw-Partition to visualize the partitions in the network.

Network (Now with a Random Partition)

Network (Energized with Larger Node Sizes)

Partitions are Useful

• It is possible to use outside data to segment nodes into partitions.
• For example, party ID (see below) or another variable such as race, income, gender, etc.
  – Partition for Republicans = 1
  – Partition for Democrats = 2
  – Partition for Independents = 0
• These partitions can be saved in the .paj file.

An Example of a Partition: The American Federal Judiciary

• Supreme Court
• Federal Circuit Court (13 Regional and Specialty Courts)
• Federal District Court (Roughly 90 Regional Courts)

Lots of Uses for Partitions, Including Distinguishing Between Nodes

Daniel Katz & Derek K. Stafford, Hustle and Flow: A Social Network Analysis of the American Federal Judiciary, 71 Ohio State L. J. 457 (2010)

Two Mode -> One Mode

• It is possible to use Pajek to convert a two-mode network into a one-mode network.
• Remember our example of two-mode versus one-mode networks:
  – Two-mode = movies & actors
  – One-mode = actor-to-actor projection of the network
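The movies-and-actors example can be worked through in code. The sketch below links two actors whenever they appear in the same movie and uses the number of shared movies as the tie weight, which is one common way to build the projection. The movie and actor names are placeholders and the function name `project` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def project(memberships):
    """Project {movie: [actors]} onto actors.

    Returns a Counter mapping (actor_a, actor_b) to the number of shared movies.
    """
    weights = Counter()
    for members in memberships.values():
        # every pair of actors in the same movie gets one more shared tie
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return weights

# Placeholder two-mode data
movies = {
    "Movie 1": ["Actor A", "Actor B", "Actor C"],
    "Movie 2": ["Actor A", "Actor B"],
}
one_mode = project(movies)
for pair, shared in sorted(one_mode.items()):
    print(pair, shared)
```

The same idea applies to the Scotland data: projecting onto companies would link two firms whenever they share a director.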

Wrap Up

• Pajek has a number of additional features that may be relevant to your specific empirical inquiry.
• Consult these sources (as well as others) to learn more:

Wrap Up

• Pajek is not the best tool for very large graphs or more sophisticated forms of analysis.
• For those, use igraph in Python or R.