
Post on 13-May-2018


Network Visualization: Gephi and Cytoscape

Caf'E.phe, February 2016

Pablo Ruiz Fabo — LATTICE

pablo.ruiz.fabo@ens.fr

Network visualization

• Requires relational data

[ http://cvcedhlab.hypotheses.org/125 ]


Network analysis

• Some terminology:

[ http://cvcedhlab.hypotheses.org/106 ]


Network analysis

• Network: composed of nodes, linked by edges

• Nodes represent actors in our domain

– People, characters, concepts, places, …

• Edges encode the relation between the nodes

– Interacting with someone, citing someone’s work, occurring in the same paragraph, …

• Edges can be weighted: the weight encodes the importance of the link

– E.g. How many times did this link occur?

• Edges can bear direction or not:

– [Being a correspondent] vs. [being the sender vs. being the addressee of a letter]
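The terminology above can be made concrete with a small sketch. The Python snippet below (standard library only) builds a weighted, undirected co-occurrence edge list: nodes are actors, an edge links two actors mentioned in the same paragraph, and the weight counts how many paragraphs they share. The function name `cooccurrence_edges` and the toy data are illustrative assumptions, not part of the workshop materials.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(paragraphs):
    """Count how often two actors occur in the same paragraph.

    `paragraphs` is a list of lists of actor names (hypothetical input).
    Returns a Counter mapping an undirected pair (a, b) to its weight.
    """
    weights = Counter()
    for actors in paragraphs:
        # sorted(set(...)) removes repeats and fixes the order, so that
        # (a, b) and (b, a) are counted as the same undirected edge.
        for a, b in combinations(sorted(set(actors)), 2):
            weights[(a, b)] += 1
    return weights


# Toy data: each inner list holds the actors found in one paragraph.
paragraphs = [
    ["Alice", "Bob", "Carol"],
    ["Bob", "Alice"],
    ["Carol", "Alice"],
]
edges = cooccurrence_edges(paragraphs)
for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

For a directed network (e.g. sender and addressee of a letter), the pair would be kept in its original order instead of being sorted.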


Objectives

• Create a co-occurrence network visualization with Gephi and Cytoscape, for two corpora:

– History corpus on the American crisis of 2008

• A CSV file representing the network’s edges was used

– Philosophy corpus: Jeremy Bentham’s manuscripts

• For Gephi, a GEXF file representing the network was used

• For Cytoscape, a GraphML file representing the network was used (it can also be used for Gephi)

• Export a navigable network so that it can be visualized outside these tools

[ Some example files to import or create networks with, and example exported networks are available at apps.lattice.cnrs.fr/nav/cafephe11 ]


2008 Crisis Corpus: PoliInformatics


Smith et al. (2014) [12]

Bentham Corpus

Transcribe Bentham (Causer & Terras, 2014) [13]

• UCL (London)

• Unpublished manuscripts transcribed by volunteers (crowdsourcing)

• 30,000 pages

Jeremy Bentham: Philosopher, social reformer (1748-1832, London)

Image: blogs.ucl.ac.uk/transcribe-bentham/


Gephi version

• This presentation covers Gephi 0.9, released in December 2015, which works with Java 7 or 8

• Most training materials on Gephi cover version 0.8.2 (which worked with Java 7, NOT 8)

• There are small UI changes between 0.8.2 and 0.9

Cytoscape version

• Cytoscape 3.3.0 works with Java 8, NOT 7

Import Edges table (1)

• Start Gephi and go to Data Laboratory. You may need to close the Projects popup first. Then choose File / New Project

• Click on Import Spreadsheet and search in the materials for a file whose name ends with “edges.csv”. Import it as an Edges table

Import Edges table (2)

1. Import the edge table: Weight and Create missing nodes must be checked in the dialogue

2. Once the table is imported, create labels by copying ID to the Label column with the “Copy data to another column” button in the bottom row
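If you need to build such an edges file yourself, the sketch below writes one with Python's standard `csv` module. It assumes the column names Source, Target and Weight (the ones mentioned later in this deck for edge tables); the function name `write_edges_csv`, the file name and the toy edges are hypothetical.

```python
import csv


def write_edges_csv(path, edges):
    """Write (source, target, weight) triples as an edge table."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])  # header row
        for source, target, weight in edges:
            writer.writerow([source, target, weight])


# Toy edge list; the file name ends with "edges.csv" like the workshop files.
toy_edges = [("Alice", "Bob", 2), ("Alice", "Carol", 2), ("Bob", "Carol", 1)]
write_edges_csv("toy_edges.csv", toy_edges)
```

Because the file lists edges only, the nodes have to be derived from the Source and Target columns at import time, which is what the Create missing nodes option is for.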

Initial Network

• Click on the Overview tab to see the initial network, which is not yet spatialized

Saving and exporting a project

• It is advisable to both save and export a project

To save a project, click on Save. It will be saved as a project file with the .gephi extension (a sort of zip file)

Additionally, export the network as a graph file for safety

Network Layout (1)

• Run the Force Atlas layout, with these settings:

1. Choose the Layout

2. Specify Settings

In force-based layouts (like Force Atlas or Force Atlas 2), linked nodes attract each other and unlinked nodes are placed further apart. See [4] and [8].

Two of the settings are highlighted on the screenshot:

- One determines how far apart nodes will be, thus affecting the readability of the network (how wide it will spread)

- Another helps avoid label overlap (but there are other means for this too)

Network Layout (2)

• Once the network stabilizes, you can stop Force Atlas.

• The initial layout will look similar to below

• The zoom slider can be used to see more or less of the network


Zoom

Toggle bottom pane here

Node and Edge Appearance

• In Gephi 0.9, unlike in 0.8.2, there are two modes for node and edge appearance, Unique and Attribute-based


• Colour • Size • Label colour • Label size

Attributes correspond to properties of nodes and edges, reflecting their role in the network as per different metrics

Node Size

• Different types of metrics can be encoded in the node size. Here, we use a node’s Degree (how many nodes it is connected to)

In the Appearance tab, choose the Nodes and Attribute buttons, and then:

- Degree in the dropdown menu

- The CIRCLES icon for node size in the button bar

- Hit Apply

After applying the ranking, node size will reflect the ranking criterion. In this case, more strongly connected nodes will be bigger

For information on other ranking criteria, see [5]
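To make the degree-based ranking concrete, here is a minimal Python sketch (standard library only). It is not Gephi's implementation: it follows the definition given above (the number of distinct nodes a node is connected to) and rescales the result linearly to a size range. The names `degrees` and `scale_sizes` and the 10 to 50 size range are assumptions chosen for illustration.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: number of distinct neighbours} for an undirected edge list."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:  # self-loops are ignored in this sketch
            neighbours[source].add(target)
            neighbours[target].add(source)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly rescale values to the range [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    return {node: min_size + (value - low) / span * (max_size - min_size)
            for node, value in values.items()}


toy_pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
node_degree = degrees(toy_pairs)
sizes = scale_sizes(node_degree)
print(node_degree)
print(sizes)
```

Other metrics can be swapped in by passing a different `{node: value}` dictionary to `scale_sizes`.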

Node Labels (1)

• Other Node Label settings can be accessed from the bottom panel, which can be toggled

• If at any point node labels overlap, this can be fixed by running the Label Adjust layout

Node Labels (2)

• Label Sizes are defined with the leftmost button

- In scaled mode, all labels bear the same size, scaled for readability

- In fixed mode, all labels bear the size specified in the font dropdown (Dialog bold 32 in the example)

- In node size mode, label size matches node size

- Run Label Adjust Layout in case of label overlap

• Label Colour is defined with the rightmost button

Community Detection: Modularity

• The modularity tool can be run to detect communities, i.e. groups of nodes that are more strongly connected to each other than to other groups of nodes [9].

1. In the Statistics pane on the right, look for Modularity and hit Run

2. Go to the Partition tab on the left, select Modularity Class from the dropdown, and hit Apply. The colors can be changed by clicking and right-clicking inside the colored square, or with the Palette button
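As a rough illustration of what "more strongly connected" means here, the sketch below computes the modularity score Q of a given partition for an undirected, unweighted network. It only scores a partition; it does not search for the best one, which is what the Louvain method described in [9] does. The function name and the toy graph are assumptions, and the code expects at least one edge.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in c
    for source, target in edges:
        degree_sum[community_of[source]] += 1
        degree_sum[community_of[target]] += 1
        if community_of[source] == community_of[target]:
            inside[community_of[source]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single edge (c-d).
toy_graph = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
print(modularity(toy_graph, two_groups))
print(modularity(toy_graph, one_group))
```

Splitting the toy graph into its two triangles scores higher than putting every node in a single community, which is the intuition behind using modularity to find communities.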


Preview Pane

Preview after applying a node size criterion and community detection. Settings are default, unless specified on the screenshot.

Show Labels was activated

Edge Thickness was reduced to 0.2 to avoid overly thick edges on highly connected nodes

Hit Refresh after any changes to the Settings or to reset an unreadable preview pane

Filters

• The network can be filtered according to many criteria (see [7]). Here, we filter out nodes that have fewer than six connections, to get rid of generally less relevant nodes and edges

- Expand the Topology dropdown

- Double click on Degree Range

- Move the slider at the bottom up to the desired minimum degree
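The effect of a minimum-degree filter can be sketched on a plain edge list as follows. This is an illustration under stated assumptions, not Gephi's code: degrees are counted once, on the unfiltered network, and the function name `filter_by_min_degree` is hypothetical.

```python
def filter_by_min_degree(edges, min_degree):
    """Keep only edges whose two endpoints both have degree >= min_degree.

    Degrees are computed once, on the unfiltered network; nodes that fall
    below the threshold after filtering are not removed again.
    """
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    keep = {node for node, value in degree.items() if value >= min_degree}
    return [(s, t) for s, t in edges if s in keep and t in keep]


# A triangle (a, b, c) with one pendant node d attached to a.
toy_network = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(filter_by_min_degree(toy_network, 2))
```

With the setting used on this slide, the call would be `filter_by_min_degree(edges, 6)`.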

Exporting visualization as PDF or image

• In the Preview pane, there’s a button to export the visualization (bottom left)


Export visualization as an interactive website: sigma.js exporter (1)

• Gephi has several plugins that allow exporting the network in an interactive website format.

– The website allows zooming in and out

– In some cases, the user can selectively focus a part of the network and run searches for nodes

• We’ll be using the sigma.js exporter plugin [10], which has all of the functions above. Depending on your browser, it may need to be run inside a web server (Apache, XAMPP, Wamp, EasyPHP etc.)

• Other plugins allowing some of the above functions:

– Seadragon plugin

– Google Maps Exporter

Network as a website (2): sigma.js

• We need to do three things:

– Install the sigma.js exporter plugin

– Export the network as a sigma.js site

– Make the site available from a web server

• To install the plugin:

– Go to Tools/Plugins, and select Sigma Exporter in the Available Plugins tab (once installed, it will move to the Installed tab)

Network as website (3): Exporting


1. Export the network with File / Export / Sigma.js template

2. Fill in the dialogue: give the path to the folder to export the site to, and the legend to be displayed for the site’s data

Network as website (4): Web Server

• We need to take the exported site from the previous step and put it in a web server. Note: some browsers (e.g. Firefox) can display the network just by opening the index.html file, with no need for a local web server

• If you don’t have a web server installed, a possibility is to install XAMPP https://www.apachefriends.org

– Windows:

• https://blog.udemy.com/xampp-tutorial/

• https://www.apachefriends.org/faq_windows.html

– Linux: https://www.apachefriends.org/faq_linux.html

– Mac: https://www.apachefriends.org/faq_osx.html

• Once you have the web server, to see the network, point a browser to http://localhost/XXX, where XXX corresponds to the name of your sigma.js network (by default, Gephi names the exported folder network).
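If installing XAMPP is not an option, a lightweight alternative (not one of the servers named on this slide) is the HTTP server that ships with Python 3.7 or later. The sketch below is a swapped-in technique under that assumption: the function names `make_server` and `serve` and the port 8000 are illustrative choices.

```python
import functools
import http.server


def make_server(folder, port=8000):
    """Create (but do not start) a local HTTP server whose root is `folder`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=folder)
    # Bind to 127.0.0.1 only, so the site is not exposed on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(folder, port=8000):
    """Serve `folder` until interrupted with Ctrl+C."""
    server = make_server(folder, port)
    print(f"Serving {folder} at http://localhost:{server.server_address[1]}/")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `serve("network")` from the directory that contains the exported folder makes the site available at http://localhost:8000/ (the folder itself is the server root, so no folder name is needed in the URL).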

Network as website (5): Config

If edges on the exported network are too thin or node labels are not visible, look for config.json inside the folder where the sigma.js site was exported (network by default):

- Increase minEdgeSize and maxEdgeSize for thicker edges

- Decrease labelThreshold to see more labels
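The same change can be scripted. The sketch below makes no assumption about where the three settings sit inside config.json: it walks the whole JSON structure and overwrites any key with a matching name. The function names and the nested layout of the `example` dictionary are hypothetical; only the key names minEdgeSize, maxEdgeSize and labelThreshold come from the slide.

```python
import json


def update_settings(node, updates):
    """Recursively overwrite every key of `node` that is named in `updates`.

    Returns the number of values that were replaced.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates and not isinstance(value, (dict, list)):
                node[key] = updates[key]
                changed += 1
            else:
                changed += update_settings(value, updates)
    elif isinstance(node, list):
        for item in node:
            changed += update_settings(item, updates)
    return changed


def patch_config(path, updates):
    """Load a JSON file, apply `updates` wherever the keys occur, save it back."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    changed = update_settings(config, updates)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    return changed


# In-memory demonstration on a made-up structure.
example = {"outer": {"minEdgeSize": 0.2, "inner": {"labelThreshold": 10}}}
print(update_settings(example, {"minEdgeSize": 1.0, "labelThreshold": 5}))
print(example)
```

A call such as `patch_config("network/config.json", {"minEdgeSize": 1.0, "maxEdgeSize": 3.0, "labelThreshold": 5})` would then apply the changes to an exported site; the values shown are placeholders to adjust by eye. Keeping a backup copy of config.json first is advisable.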


Import the network or edges file

The example involves the GraphML network for the Bentham corpus. Other GraphML networks are available in the materials and can be manipulated similarly. An edges file (CSV) can also be imported the same way (but click ‘advanced’).
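For readers who want to see what a GraphML file contains, the sketch below writes a minimal one with Python's standard XML library. It assumes an undirected network with a node label and an edge weight; the attribute ids `label` and `weight`, the function name and the file name are illustrative choices, not those of the workshop files.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def tag(name):
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def write_graphml(path, edges):
    """Write (source, target, weight) triples as a minimal GraphML file."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(tag("graphml"))
    # Declare the two attributes used below: a node label and an edge weight.
    ET.SubElement(root, tag("key"), {
        "id": "label", "for": "node",
        "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, tag("graph"),
                          {"id": "G", "edgedefault": "undirected"})
    names = sorted({n for source, target, _ in edges for n in (source, target)})
    for name in names:
        node = ET.SubElement(graph, tag("node"), {"id": name})
        ET.SubElement(node, tag("data"), {"key": "label"}).text = name
    for source, target, weight in edges:
        edge = ET.SubElement(graph, tag("edge"),
                             {"source": source, "target": target})
        ET.SubElement(edge, tag("data"), {"key": "weight"}).text = str(weight)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


toy_edges = [("Alice", "Bob", 2), ("Alice", "Carol", 2), ("Bob", "Carol", 1)]
write_graphml("toy_network.graphml", toy_edges)
```

Further node attributes (size, community id, etc.) would each need their own `key` declaration and a matching `data` element per node.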

Layout

The AllegroLayout plugin (force-based) was used; install it with Apps / App Manager. Default options were chosen, with the “Spring-electric” option. If you need to modify the layout options, read about their intended effect in the tooltips.

Layout (another example)

If you need a clearer layout, the Scale option will spread the network. If the edges have a weight attribute, it can be used from the Edge Weighting tab.

The following example follows a GraphML import of the American crisis corpus, and the scale was modified. (The screenshot also reflects later modifications to the network appearance; see the following slides.)

Node attributes from node table

• We imported a ready-made network in GraphML; we can read the attributes off the import

Attributes

Similar to Gephi’s Unique / Attribute buttons:

First column (Def.) defines a unique value

Second column (Map.) defines values based on an attribute

Final column (Byp.) allows defining exceptions

In the example:

- Fill color (default is a blue hue) reflects communities (based on column cluster_universal_index in the node table)

- Size (default 35) is based on the size column of the imported nodes


Attribute value “mapping”

• Discrete: a discrete set of categories

• Continuous: continuous values; the minimum and maximum can be set.

• Passthrough: values read off the import file directly
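The three mapping types can be pictured as three small functions. The sketch below is a conceptual illustration in plain Python, not Cytoscape's API: the function names, the palette and the numeric ranges are all assumptions.

```python
def passthrough(value):
    """Passthrough: use the attribute value as-is (e.g. as a label)."""
    return value


def discrete(value, table, default):
    """Discrete: look the category up in a table, with a fallback value."""
    return table.get(value, default)


def continuous(value, low, high, out_low, out_high):
    """Continuous: interpolate linearly between a minimum and a maximum."""
    value = min(max(value, low), high)  # clamp to [low, high]
    fraction = (value - low) / (high - low)
    return out_low + fraction * (out_high - out_low)


# A hypothetical node row and palette.
node = {"label": "Alice", "cluster": 2, "size": 5}
palette = {1: "#1f77b4", 2: "#ff7f0e"}
style = {
    "label": passthrough(node["label"]),
    "fill_color": discrete(node["cluster"], palette, default="#cccccc"),
    "size": continuous(node["size"], 0, 10, 20.0, 60.0),
}
print(style)
```

In this picture, a community id is a natural fit for a discrete mapping, a numeric size column for a continuous one, and a label column for passthrough.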


Node color for communities

Node color according to the community id of the node in the imported node table (in this case the id was called cluster_universal_index, but other names may appear).

Note: the original network was created with Cortext Manager (manager.cortext.net), and communities were created with the Louvain method [9]

Other appearance options (1)


After adding node color for communities, the node label was read off the label field of the nodes in the imported graphml network (otherwise the label would be the node’s numeric id). Label Font Size was set to 90

Other appearance options (2)

This example uses the character co-occurrence network for Les Misérables provided in [2b]. Node size was made dependent on the “size” attribute of the nodes in the GraphML file. The background color was changed from the Network tab (at the bottom of the control panel). Edge color was changed with the Edge tab.

Importing edge table and analysis

• If we are importing just the edges (Source,Target,Weight), we won’t have all the attributes like node size, communities etc. So the first thing to do after import is to run an analysis

Analyzing the network: partitioning

• Several possibilities:


Filtering the network

From the Select tab in the control panel (on top), enter a selection criterion and create a new network with the result. Selected nodes are highlighted in yellow.

Visualizing the analyzed network

After running the analysis, a partitioning with the Community cluster (GLay) app was performed. Node color is based on that (__glayCluster attribute). Node size was made dependent on node degree (i.e. how many connections it has)

Exporting the network

• Like in Gephi, the network can be exported as an image, as a graph file, or as a website.

Other

The grid view helps look at different regions of the network (or selected vs. other nodes) at once

Interpretation problems

• Hubs vs. Authorities:

– nlp.stanford.edu/IR-book/html/htmledition/hubs-and-authorities-1.html

• Force Atlas Layout and Force Atlas with Attraction Distribution:

– The latter pushes hubs to the periphery, giving a different view of the same network, see [11]

• Hubs vs. “Sinks” (e.g. air traffic)
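The hubs and authorities distinction comes from the HITS algorithm covered in the linked chapter: a good hub points to good authorities, and a good authority is pointed to by good hubs. The sketch below is a minimal power-iteration version in plain Python for a directed edge list; the function name, the fixed iteration count and the toy links are assumptions.

```python
import math


def hits(edges, iterations=50):
    """Hub and authority scores for a directed edge list of (source, target)."""
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    authority = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        authority = dict.fromkeys(nodes, 0.0)
        for source, target in edges:
            authority[target] += hub[source]
        # A good hub points to good authorities.
        new_hub = dict.fromkeys(nodes, 0.0)
        for source, target in edges:
            new_hub[source] += authority[target]
        hub = new_hub
        # Normalise both score vectors so the values stay bounded.
        for scores in (hub, authority):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, authority


# Toy directed links: a and b both point to c; a also points to d.
toy_links = [("a", "c"), ("b", "c"), ("a", "d")]
hub_scores, authority_scores = hits(toy_links)
print(hub_scores)
print(authority_scores)
```

In the toy example, c receives the most links from hubs and ranks as the top authority, while a points to the most authorities and ranks as the top hub. Scores like these could be used as a node size criterion instead of plain degree.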


Hubs vs. Authorities (1)

B. Rieder (2010) [11]

Hubs vs. Authorities (2)

B. Rieder (2010) [11]

Hubs vs. Authorities (3)

• Force Atlas and Force Atlas with Attraction Distribution:

– The latter pushes hubs to the periphery, giving a different view of the same network, see [11]

– Look at Barry Wellman in the preceding graphs


Hubs vs. “Sinks” (1)


Hubs vs. “Sinks” (2)

• Las Vegas is not a central element in the network. People fly to Las Vegas and back to their departure city, not through Las Vegas.


References: Gephi Tutorials

The format of the reference list is: Description: URL [description of the dataset if applicable]

[1] General Tutorial: https://gephi.github.io/users/quick-start/ [Character cooccurrences in Hugo’s Les Misérables]

[2a] Deeper: By Martin Grandjean http://www.martingrandjean.ch/gephi-introduction/ [many datasets]

[2b] Deeper: By Clément Levallois http://www.clementlevallois.net/gephi.html [several datasets]

[3] Importing edge tables from CSV: http://www.literaturegeek.com/2013/09/09/dataintogephi/ [Character interactions in Joyce’s Ulysses]

[4] Network Layouts: https://gephi.github.io/users/tutorial-layouts/ [Les Misérables, Airlines dataset, Internet Core Routers datasets]

[5] Metrics: http://www.clementlevallois.net/gephi/tuto/en/gephi_advanced%20functions_en.pdf

[6] Formatting the Networks: https://gephi.github.io/users/tutorial-visualization/ [Airlines dataset]

[7] Filters: http://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-%E2%80%93-my-facebook-network-part-ii-basic-filters/ [Facebook]


References: Cytoscape


References: Other

[7] Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. http://gephi.org/publications/gephi-bastian-feb09.pdf

[8] Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679

[9] Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. http://arxiv.org/pdf/0803.0476.pdf

[10] Sigma JS exporter, created by the Oxford Internet Institute: http://blogs.oii.ox.ac.uk/vis/

[11] Rieder, B. (2010). One network and four algorithms http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/

[12] Smith, N.A., Cardie, C., Washington, A. L., Wilkerson, J.D. (2014). Overview of the 2014 NLP Unshared Task in PoliInformatics. Proceedings of the ACL Workshop on Language Technologies and Computational Social Science, 5-7.

[13] Tim Causer and Melissa Terras (2014). Crowdsourcing Bentham: Beyond the traditional boundaries of academic history. International Journal of Humanities and Arts Computing, vol. 8(1), pp. 46-64.


Thank you!
