the network-untangling problem · motivating example • analyze a discussion in twitter about a...

34
The network-untangling problem: From interactions to activity timelines Polina Rozenshtein (Nordea DS Lab, Finland) Nikolaj Tatti (University of Helsinki, Finland) Aristides Gionis (Aalto University, Finland) ECML/PKDD’17 + journal extension

Upload: others

Post on 29-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

The network-untangling problem:

From interactions to activity timelines

Polina Rozenshtein (Nordea DS Lab, Finland)

Nikolaj Tatti (University of Helsinki, Finland)

Aristides Gionis (Aalto University, Finland)

ECML/PKDD’17 + journal extension

Page 2: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Temporal networks

• Temporal graph ! = #, %

• # – set of entities (e.g. people, sensors, locations.. )

• Edges &, ', ( ∈ % – instantaneous interactions over entities

• &, ' ∈ V

• ( is the time of interaction

• tweets, emails, comments on social networks..

Page 3: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Problem setting

• consider a set of entities

• entities can become active or inactive

• entities interact over time, forming a temporal network

• each interaction is attributed to an active entity

Page 4: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Problem setting

• consider a set of entities

• entities can become active or inactive

• entities interact over time, forming a temporal network

• each interaction is attributed to an active entity

• can we reconstruct the activity timeline that explains best the

observed temporal network?

• assumption: being active is more costly, thus we want to

minimize total activity time

Page 5: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Motivating example

• analyze a discussion in twitter about a topic (e.g., brexit)

• entities are hashtags

• two hashtags interact if they appear in the same tweet

• summarize the discussion by reconstructing a timeline

• pick a set of important hashtags and the time intervals they

are active

Page 6: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Motivating example

time

#economy

#brexit

#negotiations

#hardbrexit

#tory

Page 7: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Motivating example

time

#economy

#brexit

#negotiations

#hardbrexit

#tory

Page 8: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Problem formulation

• given a temporal network ! = ($, &) with & = {(), *, +)}

• -. = /., 0. – activity interval of ) ∈ $ (starts at /. and ends

at 0.)

• find a set of activity intervals for all nodes

• at most 2 per each node ) ∈ $

Page 9: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Problem formulation: preliminaries

• given a temporal network ! = ($, &) with & = {(), *, +)}

• -. = /., 0. – activity interval of ) ∈ $ (starts at /. and ends

at 0.)

• find a set of activity intervals for all nodes

• at most 2 per each node ) ∈ $

• Activity timeline of ! is a set of activity intervals 3 =

-.4 .∈5,4∈[7,8]

• The timeline 3 covers temporal network !, if for each edge

), *, + ∈ & we have + ∈ -.4 or + ∈ -:4 for some ; ∈ [1, 2].

Page 10: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Problem formulation

Problem 1. (Sum-Span)

• Find a timeline ! = #$% $∈',%∈[*,+]that covers - and

minimizes total length of !.

Problem 2. (Max-Span)

• Find a timeline ! = #$% $∈',%∈[*,+]that covers - and

minimizes maximum length of intervals in !.

• For the ease of analysis consider . = 1 and . > 1 separately

Page 11: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

1-Sum-Span

Problem 1-Sum-Span is NP-hard

Consider subproblem Coalesce:

• Assume we are also given one active time point !" for each

vertex # ∈ %.

• Find an optimal activity timeline &, which contains the

corresponding active time points !" "∈'.

Page 12: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

1-Sum-Span

• Coalesce can be solved in linear time with factor 2

approximation, based on Binary LP-formulation.

• Define a variable !"# ∈ {0,1} for each vertex * ∈ + and time

stamp , ∈ -(*) (moments of interactions of *).

• !"# = 1 indicates that , is either the beginning or end of the

active interval of *.

• Binary LP:

– Cost function min∑",# |, − 7"| !"#

– Constraints to ensure feasibility

Page 13: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

1-Sum-Span

• Relax the integrality and write the dual

• Maximal solution to the dual program is a 2-approximation for

Coalesce

• Maximal solution can be found in one pass (! " , Alg.

Maximal)

Iterate to solve 1-Sum-Span (Alg. Inner):

• Start with "$ = ("'( ) * +",- ) * )/2

• Run Maximal and update "$

• Repeat until no improvement.

Page 14: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

k-Sum-Span

k-Sum-Span is are inapproximable

Consider subproblem k-Coalesce:

• Assume we are also given k active time points !"# for each

vertex $ ∈ &

• One for each of activity intervals of $

• Find an optimal activity timeline ', which contains the

corresponding active time points !"# "∈(,#∈[+,,].

• Similar BLP and Alg. k-Maximal, . !

Page 15: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

k-Sum-Span

Iterate to solve k-Sum-Span (Alg. k-Inner):

• Start with !"# as centroids of a k-clustering algorithm

• Run k-Maximal and update !"

• Repeat until no improvement

Page 16: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

1-Max-Span

1-Max-Span can be solved efficiently

Subproblem Budget:

• Assume we are also given a set of budgets !" "∈$ of interval

durations for each vertex.

• Find an optimal activity timeline % = '" "∈$, such that

length of each activity interval '" is at most !".

Page 17: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

1-Max-Span

Budget can be solved optimally in linear time

Map Budget into 2-SAT:

• Variable !"# for each vertex $ and timestamp % ∈ '($).

• Clause (!"# ∨ !+#) – cover each edge ,, $, % .

• Clause (!"/ ∨ !"#) – ensure budget:

for each 0, % ∈ '($), such that 0 − % > 3"

• Solution for Budget : time intervals where all boolean

variables are True.

Page 18: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

1-Max-Span

Linear time:

• 2-SAT is solved in linear-time of the number of clauses (Aspvall

et all [1]). We have ! "# clauses.

• Bottleneck: SCC decomposition !("# +")

• algorithm by Kosaraju [2] for SCC decomposition

• Use of temporal structure → perform DFS in !(").

Solve 1-Max-Span by binary search to find the optimal maximum

length for intervals (Algorithm Budget, !(" log("))).

Page 19: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

k-Max-Span

k-Max-Span inapproximable

• consider two nested subproblems

Subproblem k-Partition:

• Assume we are also given k-1 inactive time points !"# for

each vertex $ ∈ &

• One for each of gap between the activity intervals of $

• Find an optimal activity timeline ', which interleaves with

corresponding gap points !"( "∈),(+[-,./-]

Page 20: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

k-Max-Span

• Problem k-Partition can be solved in polynomial time through

iteration of Problem k-Budget, which sets a budget for each

interval.

Subproblem k-Budget:

• Assume we are given a set of budgets !" "∈$ of interval durations

for each vertex;

• k-1 inactive time points %"& for each vertex

• Find an optimal activity timeline ' = )" "∈$, such that length of

each activity interval )"& is at most !"& and the gap points are

interleaved

k-Budget can be solved *(,), similarly to Budget

Page 21: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

k-Max-Span

Iterate to solve k-Sum-Span (Alg. k-Budget):

• Start with !"# as mean points of the largest intervals with no

activity of node $

• Solve k-Partition:

– do binary search on budgets with solving k-Budget

– update !"#

• Repeat until no improvement

Page 22: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Summary

Problem 1: Sum-Span

• ! = 1 NP-hard

• ! > 1 inapproximable

• Subproblem (k-)Partition with inner points

• 2-approximation in linear time via BLP dual for (k-)Partition

Page 23: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Summary

Problem 2: Max-Span

• ! = 1 polynomially solvable

• ! > 1 inapproximable

• Subproblem (k-)Budget with budgets

• Exact solution in linear time via 2-SAT for (k-)Budget

Page 24: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Experiments: case study

• Tweets from Helsinki region, November 2013

• Inner algorithm (1-Sum-Span)

winwinxbone

yandexwebdesign

visionwalkbase

winneryounited

zenroboticsslush13

pureviewnuijankopautus

nokiaegmilluusio

herekirkkonummi

nokiatypaikka

elopnordis

mtvemabestvideo

bronemaazingema2013

exobestpopemazing

worldwideactexovoteaustinmahone

Nov 1 3 6 9 12 15 18 21 24 27 30

Page 25: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Experiments: case study

• Helsinki Twitter, years 2011-2013

• k-Inner algorithm with k = 3 (k-Max-S)

slush2013uutisraivaajaslushpitstop

slush13comingtoslushpelotoncamp

slush12tediili

sailfishjolla

digitalistaller

startupsgarage48slush11

startupsaunaaaltoesstartup

padlettecrowdfunding

slushlumiasxsw

digasellipad

supercelltivitforesight

slush2012slush2008

Jan, 11 Apr Jul Oct Jan, 12 Apr Jul Oct Jan, 13 Apr Jul Oct Jan, 14

Page 26: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Performance: Inner

1 2 3 4 5 6 7 8 9 10

iterat ions

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

Quality

P

R

F

1 2 3 4 5 6 7 8 9 10

iterat ions

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

Re

lati

ve

to

tal

len

gth

• Synthetic dataset, with planted ground truth

• overlap ! is set to 0.5

• values are averaged over 100 runs.

Page 27: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Performance : k-Inner

1 2 3 4 5 6 7 8 9 10

iterat ions

0.76

0.78

0.80

0.82

0.84

0.86

Quality

P

R

F

1 2 3 4 5 6 7 8 9 10

iterat ions

4.5

5.0

5.5

6.0

6.5

7.0

7.5

8.0

8.5

Re

lati

ve

to

tal

len

gth

• Synthetic dataset, k=10 intervals

Page 28: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Performance : k-Budget

1 2 3 4 5 6 7 8 9 10

iterat ions

0.65

0.70

0.75

0.80

0.85

0.90

0.95

Quality

P

R

F1 2 3 4 5 6 7 8 9 10

iterat ions

38

40

42

44

46

48

50

52

54

Re

lati

ve

ma

x l

en

gth

• Synthetic dataset, k=10 intervals

Page 29: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Baseline comparison

2 3 4 5 6 7 8 9 10

num ber of intervals

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

F-m

ea

su

re

k-Inner

k-Budget

k-Baseline

2 3 4 5 6 7 8 9 10

number of intervals

20

40

60

80

100

120

140

160

Re

lative

ma

x le

ng

th

k-Inner

k-Budget

k-Baseline

2 3 4 5 6 7 8 9 10

number of intervals

4

6

8

10

12

14

16

18

20

Rela

tive tota

l le

ngth

k-Inner

k-Budget

k-Baseline

• Baseline: greedily ’cover’ the

longest activity intervals of the

nodes.

Page 30: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Running time

103 10

410

510

6 107

108

109

num ber of tem poral edges

10-3

10-2

10-1

100

101

102

103

104

Ru

nn

ing

tim

e (

min

ute

s)

k-Inner

k-Budget

k-Baseline

• k=10, synthetic dataset

Page 31: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Conclusions

• Novel problem of network untangling:

Discover activity time intervals for the network entities to

explain the observed interactions.

• A possible Temporal extension of Vertex Cover Problem

• Two settings: (k-)Sum-Span (minimize sum of interval lengths)

and (k-)Max-Span (minimize maximum length).

• Some hardness and inapproximability results

• Efficient algorithms

Page 32: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Future work

• Approximation for 1-Sum-Span?

• Consider different activity levels for each entity.

• Consider hyperedges.

Page 33: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

References

1. B. Aspvall, M. F. Plass, and R. E. Tarjan. A linear-time

algorithm for testing the truth of certain quantified Boolean

formulas. 1982.

2. J. E. Hopcroft and J. D. Ullman. Data structures and

algorithms.1983.

Page 34: The network-untangling problem · Motivating example • analyze a discussion in twitter about a topic (e.g., brexit) • entities are hashtags • two hashtags interact if they appear

Thank you!