Network Analysis in Python with NetworkX Johannes Wachs Central European University, Center for Network Science

Page 1: Network Analysis in Python with NetworkXbiconsulting.hu › letoltes › 2015budapestbi › budapestbiforum2015_J… · igraph: written in C/C++, packages in Python and R. Faster

Network Analysis in Python with NetworkX

Johannes Wachs

Central European University, Center for Network Science

About me

PhD student at CEU’s Center for Network Science

Researcher at the Corruption Research Center Budapest

Consultant at Bondweaver

Python/NetworkX user in all three roles

Why Networks?

Networks are an excellent framework to study complexity.

They provide holistic information that is increasingly valuable (and quantifiable!) with the growth of information technology.

*New Yorker

Famous Example: Google

The problem: How do we order webpages?

The previous solution (Alta Vista): Count the number of links to each page.

>> Easy to manipulate, poor results

The innovation: Measure link quality (PageRank)

PageRank

It’s not how many links you get, but who links to you

(and who links to them, etc.)

*http://www.bloggingfever.com/
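A minimal pure-Python sketch of the idea, not Google's implementation. It compares raw in-link counts with a simple power-iteration PageRank on a toy web where a spam page collects links from pages nobody links to. The page names, the `pagerank` helper, the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration; for real graphs NetworkX provides `networkx.pagerank`.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its current rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: "spam" has more in-links than "good",
# but they come from farm pages that nobody links to.
links = {
    "news": ["wiki", "blog", "good"],
    "wiki": ["news", "blog", "good"],
    "blog": ["news", "wiki"],
    "good": ["news"],
    "spam": ["news"],
    "farm1": ["spam"],
    "farm2": ["spam"],
    "farm3": ["spam"],
}

in_links = {page: 0 for page in links}
for targets in links.values():
    for target in targets:
        in_links[target] += 1

rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(f"{page:6s} in-links={in_links[page]}  pagerank={rank[page]:.3f}")
```

Counting links puts "spam" ahead of "good"; ranking by link quality reverses the order, because the links to "good" come from pages that are themselves well linked.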

NetworkX

Python (2/3) library for network analysis

Made at LANL (Los Alamos National Laboratory), released 2005. Currently: v1.11 (October 2015)

Goals:

● provide tools to study networks

● standard interface suitable for many projects

● rapid development for collaborative/multidisciplinary projects

● interface to C/C++/FORTRAN

● ability to take in large, nonstandard datasets

NetworkX Workflow

Import data

● dictionary, Pandas DF, np array

● edge list / adjacency matrix CSV

● graph file format GML, GEXF

● JSON

Create graph

● Graph, DiGraph, MultiGraph, Bipartite

● Labels/Attributes for nodes and edges

Calculate

● clustering, centrality, degrees

● community detection

● null-models

Analyze

● Draw graph (NetworkX or export to Gephi)

● Distribution of network statistics vs attributes

● Export

● Interactive via D3.js
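The workflow can be sketched end to end on a toy edge list. This is only an illustration under assumptions: it assumes NetworkX 2.0 or later (the argument order of `set_node_attributes` differs in 1.x), and the names, roles and weights are invented. The JSON at the end is a hand-built nodes/links dictionary in the shape D3.js force layouts commonly consume.

```python
import json

import networkx as nx

# Import data: CSV-style edge list lines "source,target,weight" (made-up data).
edge_lines = [
    "alice,bob,3",
    "bob,carol,1",
    "carol,alice,2",
    "carol,dave,1",
]

# Create graph: an undirected Graph with a weight on every edge ...
G = nx.parse_edgelist(edge_lines, delimiter=",", data=(("weight", int),))

# ... and a hypothetical "role" attribute on every node.
roles = {"alice": "manager", "bob": "employee", "carol": "employee", "dave": "employee"}
nx.set_node_attributes(G, roles, name="role")

# Calculate: degrees, clustering and a simple centrality.
degrees = dict(G.degree())
clustering = nx.clustering(G)
centrality = nx.degree_centrality(G)

# Analyze: compare a network statistic with a node attribute.
for node in G.nodes:
    print(node, G.nodes[node]["role"], degrees[node], round(clustering[node], 2))

# Export: a nodes/links structure serialized as JSON.
d3_data = {
    "nodes": [
        {"id": node, "role": G.nodes[node]["role"], "degree": degrees[node]}
        for node in G.nodes
    ],
    "links": [
        {"source": u, "target": v, "weight": w}
        for u, v, w in G.edges(data="weight")
    ],
}
d3_json = json.dumps(d3_data, indent=2)

# nx.write_gexf(G, "team.gexf") would write a file that Gephi can open.
```

`DiGraph` or `MultiGraph` can be swapped in through the `create_using` argument of `parse_edgelist` when direction or parallel edges matter.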

Case Study 1: Collaboration and Expertise in a Firm

A 300-person department of a multinational wanted to know how its employees were collaborating and how expertise was distributed across the unit.

We surveyed employees to see who they considered important collaborators. We also asked them to nominate knowledge experts (+sociometry).

Two layers: the network of collaboration, and the network (PageRanked!) of expertise.

Firm Results

On the periphery, we see large clusters of collaborating employees without direct access to expertise.

Red: Managers

Pink: Team Leads

Black: Employees

Larger nodes are experts

Edges connect collaborating coworkers
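A rough sketch of the two-layer idea on invented survey data, not the analysis used for the firm. The collaboration layer is an undirected graph and the expertise layer is a directed graph of nominations. To keep the example dependency-light, nomination counts (in-degree) stand in for the PageRank scores mentioned on the slide; `nx.pagerank(expertise)` would be the closer match. The employee names and the threshold of three nominations are assumptions.

```python
import networkx as nx

# Layer 1: undirected collaboration ties (hypothetical survey answers).
collaboration = nx.Graph()
collaboration.add_edges_from([
    ("ann", "ben"), ("ben", "cat"), ("cat", "ann"),  # team around the expert
    ("cat", "dan"), ("dan", "eve"),                  # bridge to another team
    ("eve", "fay"), ("fay", "gus"), ("gus", "eve"),  # team on the periphery
])

# Layer 2: directed expert nominations, nominator -> nominee.
expertise = nx.DiGraph()
expertise.add_nodes_from(collaboration)
expertise.add_edges_from([
    ("ben", "ann"), ("cat", "ann"), ("dan", "ann"),
    ("eve", "ann"), ("fay", "ann"), ("gus", "cat"),
])

# Stand-in for PageRank: count how often each person was nominated.
nominations = dict(expertise.in_degree())
experts = {person for person, count in nominations.items() if count >= 3}

# Employees who are not experts and do not collaborate directly with one.
no_direct_access = sorted(
    person
    for person in collaboration
    if person not in experts and not experts & set(collaboration[person])
)

print("experts:", sorted(experts))
print("no direct access:", no_direct_access)
print("steps from gus to ann:", nx.shortest_path_length(collaboration, "gus", "ann"))
```

In this toy firm the whole peripheral team, plus the employee bridging to it, has no direct collaborator who is an expert.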

Case Study 2: Twitter and Elections

For better or worse, Twitter has become a major platform for politics.

By scraping tweets, we can see how followers of different parties talk with each other around election time.

Networks show us the big picture: do left and right ever talk anymore?

*Reuters

Denmark 2015

10 major parties.

~100,000 accounts, 5.5 million tweets from six months before the election to one month after. Accounts grouped Left/Center/Right.

Sentiment Analysis via AFINN.

The most negative tweet during the campaign:

Engang var #dkpol en kamp mellem de onde og de dumme. Nu er det de onde og dumme mod nogle andre onde og dumme. #fv15 #fv2015 #stemblankt

Once, #dkpol was a battle between evil and stupid. Now it's the evil and stupid against some other evil and stupid. #fv15 #fv2015 #stemblankt
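AFINN-style scoring can be sketched in a few lines: each word in a lexicon carries a valence, and a tweet's score is the sum over its words. The tiny lexicon below and its values are placeholders chosen for illustration, not the actual AFINN list, and the `sentiment` helper is hypothetical. It is applied to the English translation only to show the mechanics.

```python
import re

# Placeholder lexicon: illustrative valences, NOT the real AFINN word list.
LEXICON = {"evil": -3, "stupid": -2, "battle": -1, "friendly": 2, "thanks": 2}


def sentiment(text, lexicon=LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    words = re.findall(r"[a-zæøå]+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


tweet = (
    "Once, #dkpol was a battle between evil and stupid. "
    "Now it's the evil and stupid against some other evil and stupid."
)
print(sentiment(tweet))
```

Because scores add up, a tweet that repeats negative words several times ends up far more negative than one that uses them once.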

Twitter Results

Some of the Left and Right tweet at each other, but on the Left’s turf.

Long path between the Left and Right primary clusters.

Sentiment:

● Left/Right fight (negative sentiment) before the election, get more friendly after.

Red: Left

Yellow: Center

Blue: Right

size ~ PageRank
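Two of these observations can be checked with simple graph measures: the share of edges that cross bloc lines, and the path length between the main Left and Right accounts. The sketch below uses a made-up, undirected mention graph with hypothetical account names; real mention data would be directed and much larger.

```python
import networkx as nx

# Hypothetical accounts and their bloc.
bloc = {
    "l1": "Left", "l2": "Left", "l3": "Left",
    "c1": "Center",
    "r1": "Right", "r2": "Right", "r3": "Right",
}

# An edge means two accounts mentioned each other (direction ignored here).
mentions = nx.Graph()
mentions.add_edges_from([
    ("l1", "l2"), ("l1", "l3"), ("l2", "l3"),  # Left cluster
    ("r1", "r2"), ("r1", "r3"), ("r2", "r3"),  # Right cluster
    ("l3", "c1"), ("c1", "r3"),                # only bridge runs via the Center
])

# Share of edges connecting accounts from different blocs.
cross_edges = [(u, v) for u, v in mentions.edges() if bloc[u] != bloc[v]]
cross_share = len(cross_edges) / mentions.number_of_edges()

# Direct Left-Right conversations.
left_right_edges = [
    (u, v) for u, v in cross_edges if {bloc[u], bloc[v]} == {"Left", "Right"}
]

# Path length between the two hypothetical cluster cores.
distance = nx.shortest_path_length(mentions, "l1", "r1")

print(cross_share, left_right_edges, distance)
```

A low cross-bloc share together with a long shortest path is one way to put a number on "left and right rarely talk".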

Corruption in Public Contracting

Public contracting is up to 25% of GDP in EU countries. This is a major avenue for corruption.

The Corruption Risk Index (CRI) grades contracts on the presence of red flags:

● Short bidding time (tell your friend ahead of time)

● Presence of dummy bids (create fake competitors for your friend)

● Overdetermination of requirement (make your friend uniquely eligible)

● etc.
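One simple way to turn red flags into a score is the share of flags present on a contract. This is only a sketch: the slide does not give the CRI formula or its full list of flags, so the equal weighting, the field names and the `corruption_risk` helper are assumptions.

```python
# Hypothetical field names for the three red flags named on the slide.
RED_FLAGS = ("short_bidding_time", "dummy_bids", "overdetermined_requirements")


def corruption_risk(contract, flags=RED_FLAGS):
    """Share of red flags present in a contract record, from 0.0 to 1.0."""
    return sum(bool(contract.get(flag)) for flag in flags) / len(flags)


# Made-up contract records.
contracts = [
    {"id": "c1", "short_bidding_time": True, "dummy_bids": False,
     "overdetermined_requirements": True},
    {"id": "c2", "short_bidding_time": False, "dummy_bids": False,
     "overdetermined_requirements": False},
]

for contract in contracts:
    print(contract["id"], round(corruption_risk(contract), 2))
```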

Q: How is CRI distributed in the market of issuers and firms?

Corruption Results

Corruption risk is clustered!

Significant corruption assortativity

Nodes: Firms and Issuers

Red: High CRI

Hungary 2009
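Assortativity here means that high-risk issuers tend to contract with high-risk firms. A minimal sketch of the measure, assuming each node carries a numeric `cri` attribute: the Pearson correlation of CRI across the two ends of every edge. The market below is invented, and `attribute_assortativity` is a hypothetical helper written out for clarity; NetworkX offers `nx.numeric_assortativity_coefficient` for the same purpose.

```python
import networkx as nx


def attribute_assortativity(graph, attr):
    """Pearson correlation of a numeric node attribute across edge endpoints."""
    xs, ys = [], []
    for u, v in graph.edges():
        a, b = graph.nodes[u][attr], graph.nodes[v][attr]
        # Count each undirected edge in both directions so the measure is symmetric.
        xs.extend([a, b])
        ys.extend([b, a])
    mean = sum(xs) / len(xs)  # xs and ys hold the same values, so one mean suffices
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    variance = sum((x - mean) ** 2 for x in xs)
    return covariance / variance


# Made-up market: issuers award contracts to firms; every node has a CRI score.
cri = {
    "issuer_A": 0.8, "firm_1": 0.7, "firm_2": 0.9,
    "issuer_B": 0.1, "firm_3": 0.2, "firm_4": 0.1,
}
market = nx.Graph()
for name, score in cri.items():
    market.add_node(name, cri=score)
market.add_edges_from([
    ("issuer_A", "firm_1"), ("issuer_A", "firm_2"),
    ("issuer_B", "firm_3"), ("issuer_B", "firm_4"),
])

r = attribute_assortativity(market, "cri")
print(round(r, 3))
```

Values near +1 mean risk is clustered, values near 0 mean no pattern, and negative values mean high-risk nodes mostly connect to low-risk ones. Comparing the result against a shuffled null model would indicate whether it is significant.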

Alternatives

● igraph: written in C/C++, packages in Python and R. Faster

● graph-tool: Python with data structures/algos in C++. Fastest

My rule of thumb: consider these options if

● the network has more than 50,000 nodes, or

● the analysis is integrated into production.

Thanks!