KONECT Cloud – Large Scale Network Mining in the Cloud

DESCRIPTION

In the Winter 2011/2012 run at the Future SOC Lab, we used the KONECT framework (Koblenz Network Collection) to compute ten different network statistics on a large collection of downsampled versions of a large network dataset. The goal was to determine whether sampling a large network can reduce the computational effort needed to compute a network statistic. Preliminary results indicate that it can.

TRANSCRIPT

KONECT Cloud

Large Scale Network Mining in the Cloud

Jérôme Kunegis, Future SOC Lab Day, 18.04.2012

Networks are Everywhere

Communication

Authorship

Friendship

Interaction

Trust

Co-occurrence

Social Networks (edge label: friend)

Trust Networks (edge label: trust)

Friend/Enemy Network (edge labels: friend, enemy)

Interaction Network (edge label: listen)

KONECT – Koblenz Network Collection

148 network datasets

26 are undirected
38 are directed
84 are bipartite
59 have unweighted edges
77 allow multiple edges
4 have signed edges
8 have ratings as edges
78 have edge arrival times

konect.uni-koblenz.de

Largest Network

Directed “who follows who” network

41 652 230 users

1 468 365 182 edges

konect.uni-koblenz.de/networks/twitter

148 Network Datasets

Categories: authorship, communication, co-occurrence, features, folksonomy, interaction, physical, ratings, reference, semantic, social, trust

What We Computed

Connected components
Network diameter
Clustering coefficients
Degree distributions
Spectral distribution
Eigenvector centrality
Graph drawing
Temporal analysis
Link prediction

← at Future SOC Lab

Network Diameter

[Figure: three panels titled "90 Percentile Effective Diameter"; values shown: 6, 5, 3, 3.75]

Computing the Effective Diameter

for each node i {                                  (|V| iterations)
    count hops needed to reach 90% of the nodes    (one BFS, |E| steps)
}

Total runtime: |E| × |V|
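The loop above can be sketched in Python. This is a minimal sketch, not KONECT's own code (the evaluation used MATLAB). It assumes the graph fits in memory as an adjacency dict (undirected edges listed in both directions) and uses one common interpolated definition of the 90-percentile effective diameter: hop distances are aggregated over all (start, target) pairs, and the percentile is interpolated linearly between integer hop counts. That is a variant of the slide's per-node phrasing, and whether KONECT applies exactly this rule is an assumption. The names `bfs_distances`, `effective_diameter` and the `num_sources` parameter are hypothetical.

```python
from collections import deque
import random


def bfs_distances(adj, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_diameter(adj, percentile=0.9, num_sources=None, seed=None):
    """Interpolated effective diameter of a graph given as {node: [neighbors]}.

    One BFS per start node costs O(|V| + |E|), so using every node as a start
    gives the O(|V| x |E|) total. `num_sources` (hypothetical) restricts the
    BFS runs to a random sample of start nodes. Unreachable pairs are ignored.
    """
    sources = list(adj)
    if num_sources is not None and num_sources < len(sources):
        sources = random.Random(seed).sample(sources, num_sources)

    # hist[h] = number of (source, target) pairs at hop distance exactly h >= 1
    hist = {}
    for s in sources:
        for d in bfs_distances(adj, s).values():
            if d > 0:
                hist[d] = hist.get(d, 0) + 1

    total = sum(hist.values())
    if total == 0:
        return 0.0
    target = percentile * total

    # Walk the cumulative distance distribution and interpolate linearly
    # between the last hop count below the target and the first reaching it.
    cumulative = 0
    for h in range(1, max(hist) + 1):
        previous = cumulative
        cumulative += hist.get(h, 0)
        if cumulative >= target:
            return (h - 1) + (target - previous) / (cumulative - previous)
    return float(max(hist))
```

For the undirected path with nodes 0, 1, 2, 3, `effective_diameter({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})` returns approximately 2.4, illustrating how interpolation produces non-integer values.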

Graph Sampling

Keep X% of edges
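Keeping X% of the edges can be sketched as uniform sampling without replacement. This is an assumption: the slides do not say how edges were drawn, whether isolated vertices are kept, or how runs were seeded, so this sketch returns only the edge subset. The helpers `sample_edges` and `sampling_plan` are hypothetical; `sampling_plan` enumerates the 20 sample sizes (5%, 10%, …, 100%) and 50 random samplings per size.

```python
import random


def sample_edges(edges, percent, seed=None):
    """Return a uniformly random subset holding `percent`% of `edges`.

    Sampling is without replacement; the edge count is rounded down.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    k = len(edges) * percent // 100
    return random.Random(seed).sample(edges, k)


def sampling_plan(edges, percents=range(5, 101, 5), runs=50):
    """Yield (percent, run, sample) for every sample size and repetition."""
    for percent in percents:
        for run in range(runs):
            # Seeding with the run index makes each sample reproducible.
            yield percent, run, sample_edges(edges, percent, seed=run)
```

Integer percentages avoid floating-point drift such as `0.05 * 3 != 0.15`, and the generator keeps only one sample in memory at a time.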

Computation

  1 000 vertices (sampled)
× 120 840 391 edges
× 20 sample sizes (5%, 10%, …, 100%)
× 50 random samplings
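As a rough check of the workload, the four factors can be multiplied out. This sketch assumes each BFS costs about one visit per edge of the graph it runs on (the |E| factor from the runtime slide), so the plain product is an upper bound; because a sample keeping X% of the edges has only about X% of them, summing the sample fractions gives a tighter estimate. Both figures are back-of-the-envelope numbers, not measured runtimes.

```python
SOURCES = 1_000                     # sampled BFS start vertices
EDGES = 120_840_391                 # edges in the full network
SAMPLE_PERCENTS = range(5, 101, 5)  # 20 sample sizes: 5%, 10%, ..., 100%
RUNS = 50                           # random samplings per sample size

# Upper bound: every sample treated as if it had all EDGES.
upper_bound = SOURCES * EDGES * len(SAMPLE_PERCENTS) * RUNS

# Tighter estimate: a p% sample has about p% of EDGES (5 + 10 + ... + 100 = 1050).
scaled = SOURCES * RUNS * EDGES * sum(SAMPLE_PERCENTS) // 100

print(f"{upper_bound:.3e} {scaled:.3e}")  # prints: 1.208e+14 6.344e+13
```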

Evaluation on a single machine:

1 TiB memory
64 cores
MATLAB (64-bit)

Results

Dr. Jérôme Kunegis

kunegis@uni-koblenz.de

west.uni-koblenz.de

Thank You!

konect.uni-koblenz.de
