Gephi ICWSM Tutorial


  • 8/22/2019 Gephi Icwsm Tutorial 110717064641 Phpapp02

    1/43

    ICWSM11 Tutorial: Exploratory Network Analysis with Gephi

    Instructors: Sébastien Heymann, Julian Bilcke (seb@gephi.org, julian.bilcke@gephi.org)

    July 17, 2011 | 1 PM - 4 PM

    2/43

    Exploratory Network Analysis with Gephi

    This tutorial is an introduction to Gephi, the open source software for graph and network visualization and manipulation.

    Gephi aims to fulfill the complete chain, from data import to aesthetic refinements and interaction.

    Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties.

    The goal is to help data analysts make hypotheses and intuitively discover patterns or errors in large data collections.

    By the end, participants will walk away with the practical knowledge needed to use Gephi for their own projects.



    3/43

    Exploratory Network Analysis with Gephi

    It starts with a brief introduction to the network exploration process and a hands-on demonstration of Gephi's essential features.

    Participants are guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetic refinements.

    Next, teams work on real datasets.

    Finally, they present their preliminary results. The tutorial concludes with a general question and answer session.



    4/43

    Requirements

    Bring your own laptop with Java and Gephi installed. Gephi should be updated (menu Help > Check for Updates).

    Bring a mouse with a wheel.

    Bring your own dataset if you want, and check that it loads correctly in Gephi.[1]

    [1] http://gephi.org/users/supported-graph-formats/


    5/43

    Workshop Schedule - Part I

    Exploratory Network Analysis
    - Exploratory Data Analysis
    - Exploratory Network Analysis
    - Looking for Orderness in Data
    - Examples
    - Guideline

    Introduction to Gephi
    - Approach and Community
    - Networked Data
    - Quick Start Demo

    * 30 min break *


    6/43

    Workshop Schedule - Part II

    Hands-On!
    - Team Work on a Dataset
    - Presentation of Preliminary Results

    Q&A


    7/43

    Exploratory Data Analysis

    "The greatest value of a picture is when it forces us to notice what we never expected to see."

    Exploratory data analysis started with John Tukey (1962).

    [Diagram: confirmatory, exploratory and serendipitous analysis, with the labels results, intuition and surprise]


    8/43

    Exploratory Data Analysis

    Non-linear processing chain of Ben Fry in Computational Information Design (2004)


    9/43

    Dummy Example

    P2P file size distribution (Latapy et al., 2008)

    Observation: visual saliences on specific file sizes

    External knowledge: these sizes correspond to films

    New hypothesis on data: films are highly exchanged, so the study might dig in this direction
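Such saliences can also be flagged programmatically before looking at a chart. The sketch below is a minimal illustration, not the method of Latapy et al.: it assumes the file sizes are available as a list of integers, and the `sizes` values, the `salient_sizes` helper and the `factor` threshold are all hypothetical.

```python
from collections import Counter
from statistics import median

def salient_sizes(sizes, factor=3.0):
    """Return file sizes whose frequency is at least `factor` times the median frequency."""
    counts = Counter(sizes)                # size -> number of files with exactly that size
    typical = median(counts.values())      # frequency of a "typical" size
    return sorted(s for s, c in counts.items() if c >= factor * typical)

# Hypothetical sizes in MB: mostly scattered values, plus two over-represented sizes.
sizes = [3, 4, 5, 7, 12, 15, 700, 700, 700, 700, 1400, 1400, 1400, 22, 40]
print(salient_sizes(sizes))  # [700, 1400]
```

The flagged sizes are only candidates: as on the slide, external knowledge is still needed to interpret them.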


    10/43

    Exploratory Network Analysis

    1. See the network: 1st graph viz tool, Pajek (1996), by Vladimir Batagelj and Andrej Mrvar

    2. Interact in real time (Gephi prototype, 2008): group, filter, compute metrics...

    3. Build a visual language: size by rank, color by partition, labels, curved edges, thickness...
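The "size by rank, color by partition" idea can be sketched in a few lines. This is a minimal illustration assuming node attributes held in plain dictionaries; the linear size interpolation, the size bounds, the hex palette and the helper names are hypothetical choices, not Gephi's internal implementation.

```python
def size_by_rank(values, min_size=10.0, max_size=50.0):
    """Linearly map a quantitative attribute (e.g. degree) onto node sizes."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return {n: min_size + (v - lo) / span * (max_size - min_size) for n, v in values.items()}

def color_by_partition(labels, palette=("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")):
    """Give each distinct qualitative value its own color, cycling through the palette."""
    categories = sorted(set(labels.values()))
    color_of = {c: palette[i % len(palette)] for i, c in enumerate(categories)}
    return {n: color_of[c] for n, c in labels.items()}

# Hypothetical node attributes.
degree = {"a": 1, "b": 3, "c": 5}
language = {"a": "Perl", "b": "Ruby", "c": "Perl"}
print(size_by_rank(degree))  # {'a': 10.0, 'b': 30.0, 'c': 50.0}
print(color_by_partition(language))
```

A quantitative variable drives a continuous visual channel (size), while a qualitative one drives a categorical channel (color), which is the core of the visual language.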


    11/43

    Looking for a Simple Small Truth?

    Drew Conway, What Data Visualization Should Do:
    1. Make complex things simple
    2. Extract small information from large data
    3. Present truth, do not deceive

    http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/


    12/43

    Looking for Orderness in Data

    Vary 3 cursors simultaneously to extract meaningful patterns:
    - at different levels: from MICRO level to MACRO level
    - on multiple dimensions: from 1 dimension to N dimensions
    - on the time scale: from T+0 to T+N


    13/43

    Zoom cursor on Quantitative Data

    Global
    - connectivity
    - density
    - centralization

    Local
    - communities
    - bridges between communities
    - local centers vs periphery

    Individual
    - centrality
    - distances
    - neighborhood
    - location
    - local authority vs hub

    Axis: MICRO level to MACRO level
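Two of the global metrics listed above have compact standard definitions. As a minimal sketch, assuming an undirected simple graph with at least three nodes given as an edge list: density is the share of possible edges that exist, and Freeman's degree centralization measures how close the graph is to a star (1.0) rather than to a graph where every node has the same degree (0.0). The function names and the star example are hypothetical.

```python
def density(n_nodes, edges):
    """Density of an undirected simple graph: existing edges / possible edges."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))

def degree_centralization(nodes, edges):
    """Freeman degree centralization of an undirected simple graph (needs >= 3 nodes)."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    d_max = max(deg.values())
    # Sum of gaps to the most central node, normalized by the star graph's value.
    return sum(d_max - d for d in deg.values()) / ((n - 1) * (n - 2))

star_nodes = ["hub", "a", "b", "c", "d"]
star_edges = [("hub", x) for x in "abcd"]
print(density(len(star_nodes), star_edges))           # 0.4
print(degree_centralization(star_nodes, star_edges))  # 1.0
```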


    14/43

    Crossing cursor on Qualitative Data

    Social
    - who with whom
    - communities
    - brokerage
    - influence and power
    - homophily

    Semantic
    - topics
    - thematic clusters

    Geographic
    - spatial phenomena

    Axis: 1 dimension to N dimensions
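Homophily, the tendency of ties to link similar nodes, can get a first rough reading by crossing one attribute with the edges. The sketch below assumes a node attribute stored in a dictionary (the `country` values, the `follows` edges and the helper name are hypothetical) and computes the share of edges whose endpoints share that attribute. This is a simple proxy, not a formal assortativity coefficient, and it should be compared with what random mixing would give.

```python
def same_attribute_share(edges, attr):
    """Fraction of edges whose two endpoints have the same attribute value."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if attr[u] == attr[v])
    return same / len(edges)

# Hypothetical users and "follows" ties.
country = {"ann": "FR", "bob": "FR", "chi": "JP", "dai": "JP"}
follows = [("ann", "bob"), ("chi", "dai"), ("ann", "chi"), ("dai", "chi")]
print(same_attribute_share(follows, country))  # 0.75
```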


    15/43

    Timeline cursor on Temporal Data

    Evolution of social ties

    Evolution of communities

    Evolution of topics

    Axis: T+0 to T+N
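Moving the timeline cursor amounts to looking at the network as it exists at a given moment. A minimal sketch, assuming each tie carries an inclusive integer interval from `start` to `end`; the `TimedEdge` type, the `snapshot` helper and the sample ties are hypothetical, and Gephi has its own representation of dynamic data.

```python
from typing import NamedTuple

class TimedEdge(NamedTuple):
    source: str
    target: str
    start: int  # first time step at which the tie exists (inclusive)
    end: int    # last time step at which the tie exists (inclusive)

def snapshot(edges, t):
    """Return the (source, target) pairs that exist at time step t."""
    return [(e.source, e.target) for e in edges if e.start <= t <= e.end]

ties = [TimedEdge("a", "b", 0, 5), TimedEdge("b", "c", 3, 8), TimedEdge("a", "c", 6, 9)]
for t in (0, 4, 7):
    print(t, snapshot(ties, t))
```

Comparing successive snapshots is one way to follow the evolution of ties, communities or topics listed above.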


    16/43

    Mapping an Innovation Center: Collaborations on projects at Images et Réseaux

    Themes and content

    Actors

    Territory

    Franck Ghitalla & Ecole de Design de Nantes


    17/43

    Mapping Scientific Cooperations


    18/43

    Network Map: a Series of Choices

    - corpus
    - data
    - algorithms
    - thresholds
    - graphical operations
    - communication goals


    19/43

    Guideline

    By network size (# nodes):

    1 - 100: lists + edges as a bonus; focus on qualitative data.

    100 - 1,000: How do attributes explain the structure? Easy to read, obvious patterns.
    - focus on entities (in context)
    - metrics are tools to describe the graph (centrality, bridging...)
    - links help to build and interpret categories of entities
    - challenge: mix attribute crossing and connectivity

    1,000 - 50,000: How does the structure explain attributes? Hard to read, problem of hidden signals.
    - track patterns with various layouts and filtering
    - focus on structures
    - metrics are tools to build the graph (cosine similarity...)
    - categories help to understand the structure
    - challenge: pattern recognition

    > 50,000: requires high computational power.
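When metrics are used to build the graph, entities are linked by how similar their attributes are. A minimal sketch of building such a graph with cosine similarity, assuming each entity is described by a numeric vector of the same length; the `blogs` term counts, the helper names and the 0.5 threshold are hypothetical, and the threshold is itself one of the choices that shape the resulting map.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_graph(vectors, threshold=0.5):
    """Link every pair of entities whose cosine similarity reaches the threshold."""
    names = sorted(vectors)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            w = cosine(vectors[a], vectors[b])
            if w >= threshold:
                edges.append((a, b, round(w, 3)))  # similarity kept as edge weight
    return edges

# Hypothetical term counts per blog for the terms ("politics", "music", "tech").
blogs = {"b1": [5, 0, 1], "b2": [4, 1, 0], "b3": [0, 6, 2]}
print(similarity_graph(blogs))  # [('b1', 'b2', 0.951)]
```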


    20/43


    21/43

    Gephi in a Nutshell

    Like Photoshop for graphs.

    Helps data analysts reveal patterns and trends, highlight outliers and tell stories with their data.

    Network visualization platform

    Open source, supported by a community

    Built for performance and usability

    Extensible by plug-ins

    Windows, MacOS X, Linux


    22/43

    Gephi Community

    Contributors
    Communities

    Mathieu Bastian, Mathieu Jacomy, Eduardo Ramos Ibáñez, Sébastien Heymann, Guillaume Ceccarelli, André Panisson, Antonio Patriarca, Cezary Bartosiak, Martin Škurla, Patrick McSweeney, Yi Du, Hélder Suzuki, Daniel Bernardes, Ernesto Aneiro, Keheliya Gallaba, Luiz Ribeiro, Urban Škudnik, Vojtech Bardiovsky, Yudi Xue

    Nonprofit organization


    23/43

    Community Mission

    Provide a sustainable software

    Maintain the technical ecosystem

    Build a business ecosystem

    Face cutting-edge technological challenges with a long-term vision

    Distribute the software in Open Source


    24/43

    Community Values

    Open innovation: ideas and features come from the entire community.

    Decisions are taken with transparency.

    We consider this technology a public good, and will keep it open source.


    25/43

    Diversity of Usages

    Business, leisure :-), communication, academic, art


    26/43

    Diversity of Network Encoding

    Textual:
    V = { a, b, c, d, e }
    E = { (a,b), (a,d), (b,c), (e,a), (c,e) }

    Tabular (adjacency matrix, rows = source, columns = target):
        a  b  c  d  e
    a   -  1  -  1  -
    b   -  -  1  -  -
    c   -  -  -  -  1
    d   -  -  -  -  -
    e   1  -  -  -  -

    XML, Graphical

    and many others...
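The textual and tabular encodings of this example hold the same information. A minimal sketch converting the edge list E into the adjacency matrix, assuming directed edges with rows as sources and columns as targets; the `adjacency_matrix` helper is hypothetical.

```python
def adjacency_matrix(nodes, edges):
    """Build a directed 0/1 adjacency matrix: row = source, column = target."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix

V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]
for name, row in zip(V, adjacency_matrix(V, E)):
    print(name, " ".join("1" if x else "-" for x in row))
```

The printed rows reproduce the table above: for example, row a has a 1 in columns b and d.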


    27/43

    Software I/O

    Files in: CSV, Pajek NET, GUESS GDF, GEXF, GraphML, Graphviz DOT, UCINET DL, Netdraw VNA, Tulip TLP, Excel spreadsheet

    Databases: MySQL, PostgreSQL, SQL Server, Neo4j

    Graph streaming

    User input

    Files out: CSV, Pajek NET, GUESS GDF, GEXF, GraphML, Excel spreadsheet, SVG, PDF, PNG
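Among the file formats listed, GEXF is the XML format developed by the Gephi project. A minimal sketch that writes a small static, directed graph as GEXF 1.2 using only Python's standard library; it omits attributes and visualization data, and the `to_gexf` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def to_gexf(nodes, edges):
    """Serialize a static directed graph as a minimal GEXF 1.2 string (no attributes)."""
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", mode="static", defaultedgetype="directed")
    node_list = ET.SubElement(graph, "nodes")
    for n in nodes:
        ET.SubElement(node_list, "node", id=n, label=n)
    edge_list = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        # Each GEXF edge has its own id, plus the ids of its endpoints.
        ET.SubElement(edge_list, "edge", id=str(i), source=source, target=target)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")

doc = to_gexf(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(doc)
```

Saving this string to a file with the `.gexf` extension should be enough to open it in Gephi. Node attributes would additionally need the attribute declarations described in the GEXF specification.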


    28/43

    Choosing a File Format

    Table of features supported by Gephi, per file format

    Features: edge list/matrix structure, XML structure, edge weight, attributes, visualization attributes, attribute default value, hierarchical graphs, dynamics

    Formats: CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML, NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*

    * spreadsheets can be loaded in the Data Laboratory


    29/43

    Do you need...

    [Figure: file formats positioned by number of supported features (many to few) and by file type (XML, tabular, text): GEXF, Spreadsheet, GraphML, GUESS GDF, GML, UCINET DL, Netdraw VNA, Graphviz DOT, Pajek NET, CSV, Tulip TLP]


    30/43

    Using Gephi


    31/43

    Team work

    1. Create a team of 2-3 people.
    2. Choose a dataset.
    3. Explore it for 1 hour.
    4. Two teams present their preliminary findings.


    32/43

    Dataset #1: GitHub Software Repository

    GitHub is an application used by nearly a million people to store over two million code repositories, making GitHub the largest code host in the world.

    Started in 2008, it provides the features of an online social network and a software repository to lower the barriers to collaboration and make it easier to contribute code.

    https://github.com


    33/43

    Dataset #1: GitHub Software Repository

    Data extracted by Franck Cuny* at Linkfluence SAS

    1st release in March 2010 -> this poster
    2nd release in June 2011 -> your data

    _____________Network of user profiles__________
    Nodes: people with at least one repository who are followed by at least two other people
    Edges: A follows B

    _____________Network of repositories__________

    Nodes: repositories
    Edges: A shares a developer with B

    Very few research publications on this OSN!

    * franck.cuny@linkfluence.net
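The repository network is a projection of who-works-on-what data. A minimal sketch, assuming the raw data is a list of hypothetical `(developer, repository)` pairs; counting shared developers as an edge weight is an assumption added here, since the slide only states that A shares a developer with B. The helper name and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def repository_network(contributions):
    """Project (developer, repository) pairs onto a repository-repository network.

    Two repositories are linked when at least one developer works on both;
    the weight counts how many developers they share.
    """
    repos_of = defaultdict(set)
    for dev, repo in contributions:
        repos_of[dev].add(repo)
    weights = defaultdict(int)
    for repos in repos_of.values():
        for a, b in combinations(sorted(repos), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical contributions.
pairs = [("alice", "r2"), ("alice", "r1"), ("bob", "r1"), ("bob", "r2"), ("carol", "r3")]
print(repository_network(pairs))  # {('r1', 'r2'): 2}
```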

    34/43

    Dataset #1: GitHub Software Repository

    Data extracted by a crawl using the GitHub API
    Seed: 10 well-known contributors in the Perl community

    Networks by country: Japan, France, United States
    Networks by language: Perl, PHP, Python, Ruby

    Node attributes:
    - user country
    - number of followers
    - main programming language

    Edges:
    - directed
    - weight = number of projects A has forked from B
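The fork-based edge weight can be sketched as a simple aggregation. This assumes hypothetical `(forker, original_owner, project)` records and draws the edge from A (the forker) to B (the owner); duplicate records are discarded so that each project counts once. The `fork_edges` helper is hypothetical.

```python
from collections import Counter

def fork_edges(forks):
    """Aggregate (forker, owner, project) records into weighted directed edges.

    The weight of A -> B is the number of distinct projects A has forked from B.
    """
    return Counter((a, b) for a, b, _project in set(forks))

# Hypothetical fork records; the repeated ("ann", "bob", "p2") counts only once.
forks = [("ann", "bob", "p1"), ("ann", "bob", "p2"), ("ann", "bob", "p2"), ("bob", "ann", "p3")]
weights = fork_edges(forks)
print(weights[("ann", "bob")], weights[("bob", "ann")])  # 2 1
```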


    35/43

    Dataset #1: GitHub Software Repository

    Your mission (should you decide to accept it): find research hypotheses based on your exploration

    Example question: are the Perl communities based on geography?


    36/43

    Dataset #2: The Irish Blogosphere

    _______________Blogroll Network______________

    Nodes: blogs with more than two blogroll links
    Edges: blogroll link (in-link)

    _______________Post-link Network_____________

    Nodes: blogs with more than two blogroll links
    Edges: hyperlink inside a post from one blog to another (post-link)

    Identifying Representative Textual Sources in Blog Networks. K. Wade, D. Greene, C. Lee, D. Archambault, P. Cunningham (2011). http://mlg.ucd.ie/blogs


    37/43

    Dataset #2: The Irish Blogosphere

    Data extracted by a crawl at distance 2 from the seed for the in-links, and by Google Blog Search for the post-links.
    Seed: 21 popular blogs, winners of the 2010 Irish Blog Awards

    Node attributes:
    - post count = total number of posts by blog
    - category = from the Irish blog index at www.irishblogdirectory.com, where available
    - infomap_comm = community to which a node belongs (Infomap algorithm)
    - gce_comms = overlapping communities (GCE algorithm)
    - moses_comms = overlapping communities (MOSES algorithm)

    Edges:
    - directed
    - weight = number of hyperlinks in the Post-link network

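A crawl "at distance 2 from the seed" is a breadth-first search cut off after two hops. A minimal sketch, where `get_links` is a hypothetical stand-in for fetching a blog and extracting its links, and the `web` dictionary is toy data; links found on nodes at the boundary are not followed.

```python
from collections import deque

def crawl(seeds, get_links, max_distance=2):
    """Breadth-first crawl collecting every node within max_distance hops of the seeds."""
    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    edges = []
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand nodes at the crawl boundary
        for target in get_links(node):
            edges.append((node, target))
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return distance, edges

# Hypothetical link structure.
web = {"seed": ["x", "y"], "x": ["z"], "y": ["seed"], "z": ["far"], "far": []}
dist, links = crawl(["seed"], lambda n: web.get(n, []))
print(dist)  # {'seed': 0, 'x': 1, 'y': 1, 'z': 2}
```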


    38/43

    Dataset #2: The Irish Blogosphere

    Your mission: explore and try to confirm the official results


    39/43

    Hands-On!

    Start:
    - Load a graph
    - Apply a layout
    - Color the nodes by a qualitative variable in the Partition panel
    - Size the nodes by a quantitative variable in the Ranking panel
    - Start to explore... compute metrics, filter the network

    End:
    - Export maps to PDF in the Preview tab
    - Save
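Computing a metric and then filtering on it can also be sketched outside Gephi. A minimal illustration, assuming an edge list with degree counted on the undirected graph (in-links plus out-links); the `degree_filter` helper and the threshold of 2 are hypothetical, and this mirrors the idea of filtering by degree rather than Gephi's own filter implementation.

```python
from collections import Counter

def degree_filter(edges, min_degree):
    """Keep only edges whose two endpoints both have degree >= min_degree.

    Returns the filtered edge list and the degree of every node.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept = {n for n, d in degree.items() if d >= min_degree}
    return [(u, v) for u, v in edges if u in kept and v in kept], degree

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
filtered, degree = degree_filter(edges, 2)
print(dict(degree))  # {'a': 2, 'b': 2, 'c': 3, 'd': 1}
print(filtered)      # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```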


    40/43

    Presentations

    GitHub Repository
    Irish Blogosphere


    41/43

    Gephi Documentation

    Web Site: http://gephi.org
    Support: http://forum.gephi.org
    Wiki: http://wiki.gephi.org
    Source code: https://launchpad.net/gephi

    Online Tutorials:
    - http://gephi.org/users/quick-start/
    - http://gephi.org/users/tutorial-visualization/
    - http://gephi.org/users/tutorial-layouts/
    - http://wiki.gephi.org/index.php/Import_CSV_Data
    - http://wiki.gephi.org/index.php/Import_Dynamic_Data

    Tutorial in Spanish: https://code.google.com/p/camon/wiki/Taller_Gephi

    Supported Graph Formats: http://gephi.org/users/supported-graph-formats/


    42/43

    Thank You!

    [Image: Caspar David Friedrich, Wanderer Above the Sea of Fog]


    43/43

    Credits

    [slide 11] images from Drew Conway

    http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/

    [slide 22 top left] Benoît Vidal at MFG Labs

    [slide 22 bottom center] Franck Ghitalla at UTC

    [slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang, http://jeunhotsang.com/blog/2010/12/07/prototype/

    [slide 27] sketches from Ben Fry, Computational Information Design

    Special thanks to Franck Ghitalla and Mathieu Jacomy for their insightful discussions.
