

odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks

Ilie Sarpe
Department of Information Engineering
University of Padova
Padova, Italy
[email protected]

Fabio Vandin
Department of Information Engineering
University of Padova
Padova, Italy
[email protected]

ABSTRACT

Counting the number of occurrences of small connected subgraphs, called temporal motifs, has become a fundamental primitive for the analysis of temporal networks, whose edges are annotated with the time of the event they represent. One of the main complications in studying temporal motifs is the large number of motifs that can be built even with a limited number of vertices or edges. As a consequence, since in many applications motifs are employed for exploratory analyses, the user needs to iteratively select and analyze several motifs that represent different aspects of the network, resulting in an inefficient, time-consuming process. This problem is exacerbated in large networks, where the analysis of even a single motif is computationally demanding. As a solution, in this work we propose and study the problem of simultaneously counting the number of occurrences of multiple temporal motifs, all corresponding to the same (static) topology (e.g., a triangle). Given that for large temporal networks computing the exact counts is unfeasible, we propose odeN, a sampling-based algorithm that provides an accurate approximation of all the counts of the motifs. We provide analytical bounds on the number of samples required by odeN to compute rigorous, probabilistic, relative approximations. Our extensive experimental evaluation shows that odeN enables the approximation of the counts of motifs in temporal networks in a fraction of the time needed by state-of-the-art methods, and that it also reports more accurate approximations than such methods.

CCS CONCEPTS

• Mathematics of computing → Probabilistic algorithms; • Theory of computation → Graph algorithms analysis.

KEYWORDS

temporal motifs, sampling algorithm, temporal networks, randomized algorithm

ACM Reference Format:

Ilie Sarpe and Fabio Vandin. 2021. odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), November 1–5, 2021, Virtual Event, QLD, Australia. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3459637.3482459

CIKM '21, November 1–5, 2021, Virtual Event, QLD, Australia
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), November 1–5, 2021, Virtual Event, QLD, Australia, https://doi.org/10.1145/3459637.3482459.
arXiv:2108.08734v1 [cs.SI] 19 Aug 2021

1 INTRODUCTION

Networks are ubiquitous representations that model a wide range of real-world systems, such as social networks [9], citation networks [10], biological systems [12], and many others [32]. One of the most fundamental primitives in network analysis is the mining of motifs [30, 31, 42] (or graphlets [7, 36]), which requires counting the occurrences of small connected subgraphs of $k$ nodes. Motifs represent key building blocks of networks, and they provide useful insights in a wide range of applications such as network classification [29, 43], network clustering [3], and community detection [2].

Modern networks contain rich information about their edges or nodes [8, 20, 39, 50] in addition to graph structure. One of the most important pieces of information is the time at which the interactions, represented by edges, occur. Networks for which such information is available are called temporal [15, 16]; novel insights about the underlying dynamics of the systems can be uncovered by the analysis of such networks [22–24]. In recent years, many primitives [17, 21, 34, 41] have been proposed as counterparts, in temporal networks, to the study of subgraph patterns for nontemporal networks, with each primitive capturing different temporal aspects of a network. One of the most important such primitives is the study of temporal motifs [34]. Temporal motifs are small connected subgraphs with $k$ nodes and $\ell$ edges occurring with a prescribed order within a time interval of duration $\delta$. Temporal motifs describe the patterns shaping interactions over the network, e.g., networks from similar domains tend to have similar temporal motif counts [34], and their analysis is useful in many applications, e.g., anomaly detection [4], network classification [45], and social networks [6].

The temporal dimension poses several challenges in the analysis of motifs. A major challenge is the large number of temporal motifs that can be built even with a limited number of vertices and edges. For example, even considering directed (and connected) temporal motifs with only 3 vertices and 3 edges, there are 32 such motifs. In several domains, when motifs are studied in the exploratory analysis of a temporal network, it is almost impossible for the data analyst to know a priori which motif is the most interesting and useful. In social networks, a set of 3 vertices represents the smallest nontrivial community, and different temporal motifs with 3 vertices describe different patterns of interactions in such a community. Hence, studying all such motifs can provide novel insights on the interactions within such communities. In network classification, considering the counts of all the 32 motifs with 3 vertices and 3 edges leads to models with improved accuracy [45].

However, since state-of-the-art approaches for general temporal motifs only allow the analysis of one motif at a time, the user needs to iteratively select and analyze the various motifs, resulting in an inefficient and time-consuming process, in particular for large networks.

In this paper, we define and study the problem of simultaneously counting the occurrences of various temporal motifs. In particular, we consider all motifs corresponding to the same static target template (e.g., all triangles; see Fig. 1a). This problem is extremely challenging, since computing the count of even a single temporal motif is NP-Hard in general [26], with existing state-of-the-art approaches having complexity exponential in the number of edges of the motif to obtain even a single motif's count [26, 40, 47].

The task of counting temporal motifs is hindered by the sheer size of modern datasets and, therefore, scalable techniques are needed to deal with such amounts of data. Since exact approaches [13, 27, 34] are impractical, rigorous and efficient approximation algorithms providing tight guarantees are needed. In this work we develop odeN, a sampling algorithm that provides a high quality approximation for the problem of counting multiple temporal motifs with the same static topology. Our main contributions are as follows:

• We propose the motif template counting problem, where, given a temporal network, a $k$-node target template graph $H$, the number $\ell$ of edges of each temporal motif, and a bound $\delta$ on the duration of the temporal motifs, the problem requires to output all the counts of the temporal motifs whose static topology corresponds to $H$ and having exactly $\ell$ temporal edges, occurring within $\delta$-time.

• We propose odeN, a randomized sampling algorithm providing a high quality approximation for the motif template counting problem. odeN's approach is to sample a set of motif occurrences, ensuring that they all share the same static topology $H$. Thus, odeN takes advantage of the constraint that all motifs must share a common target template $H$, aggregating the computation of all motif counts in a sample. odeN's approximation, as in other data mining applications, is controlled by two parameters $\varepsilon, \eta$, which control respectively the quality and the confidence of the approximations.

• We show a tight and efficiently computable bound on the number of samples required by odeN for the approximation to be within $\varepsilon$ error with confidence $> 1 - \eta$ for all temporal motif counts.

• We perform large scale experiments using datasets with up to billions of temporal edges, showing that odeN requires a fraction of the time required by state-of-the-art approximation algorithms for single motif counts, and that it reports sharper estimates. We then provide a parallel implementation of odeN displaying almost linear speedup in many configurations. We also show how odeN provides novel insights on the dynamics of a real-world temporal network.

2 PRELIMINARIES

In this section we introduce the basic notions that we will use throughout the work, and we define the computational problem of counting multiple temporal motifs sharing a common target template graph. We start by defining temporal networks.

Definition 2.1. A temporal network is a pair $T = (V, E)$ where $V = \{v_1, \dots, v_n\}$ and $E = \{(x, y, t) : x, y \in V, x \neq y, t \in \mathbb{R}^+\}$ with $|V| = n$ and $|E| = m$.

[Figure 1, panels (a)-(c). Panel (b) is labeled with the ordering $\sigma = \langle (v_1, v_2), (v_3, v_1), (v_2, v_3) \rangle$.]

Figure 1: (1a): Motif template counting problem overview: given a temporal network and a (static) target template, compute the counts of all temporal motifs that map on the template. (1b): Temporal motif, with $k = 3$, $\ell = 3$, and its ordering $\sigma$. (1c): Sequences of edges of the network in (1a) among nodes $\{2, 5, 6\}$ that map topologically on the motif in (1b). For $\delta = 15$ only the green sequence is a $\delta$-instance of the motif, since the timestamps respect $\sigma$ and $t'_\ell - t'_1 = 20 - 6 \le \delta$. The red sequences are not $\delta$-instances, since they do not respect such constraint or do not respect the ordering $\sigma$.

Given $(x, y, t) \in E$, we say that $t$ is the timestamp of the directed edge $(x, y)$. Given a temporal network $T$, by ignoring the timestamps of its edges we obtain the associated undirected projected static network, defined as follows.

Definition 2.2. The undirected projected static network of a temporal network $T = (V, E)$ is the pair $G_T = (V, E_T)$ that is an undirected network, such that $E_T = \{\{x, y\} : (x, y, t) \in E\}$.
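Definitions 2.1 and 2.2 can be made concrete with a small Python sketch. The names `TemporalEdge` and `projected_static_edges`, and the toy edge list, are illustrative assumptions and do not come from the paper or its implementation.

```python
from typing import NamedTuple


class TemporalEdge(NamedTuple):
    """A directed temporal edge (x, y, t) as in Definition 2.1."""
    x: int
    y: int
    t: float


def projected_static_edges(temporal_edges):
    """Return E_T of Definition 2.2: one undirected edge {x, y} for every
    pair of vertices connected by at least one temporal edge."""
    return {frozenset((e.x, e.y)) for e in temporal_edges}


# Hypothetical toy network, not taken from the paper.
E = [
    TemporalEdge(1, 2, 3.0),
    TemporalEdge(2, 1, 8.0),
    TemporalEdge(2, 3, 5.0),
    TemporalEdge(1, 2, 11.0),
]
E_T = projected_static_edges(E)
print(len(E), len(E_T))  # 4 2
```

Direction and multiplicity are both discarded by the projection, which is why four temporal edges collapse into two static edges here.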

We will often use the term static network to denote a network whose edges are without timestamps. Next we introduce the definition of temporal motifs as defined by Paranjape et al. [34], which are small, connected subgraphs representing patterns of interest.

Definition 2.3. A $k$-node $\ell$-edge temporal motif $M$ is a pair $M = (\mathcal{K}, \sigma)$ where $\mathcal{K} = (V_\mathcal{K}, E_\mathcal{K})$ is a directed and weakly connected multigraph where $V_\mathcal{K} = \{v_1, \dots, v_k\}$, $E_\mathcal{K} = \{(x, y) : x, y \in V_\mathcal{K}, x \neq y\}$ s.t. $|V_\mathcal{K}| = k$ and $|E_\mathcal{K}| = \ell$, and $\sigma$ is an ordering of $E_\mathcal{K}$.

Note that a $k$-node $\ell$-edge temporal motif $M = (\mathcal{K}, \sigma)$ is also identified by the sequence $\langle (x_1, y_1), \dots, (x_\ell, y_\ell) \rangle$ of edges ordered according to $\sigma$; we will often use such representation for a motif $M$ (see Fig. (1b) for an example). Given a $k$-node $\ell$-edge temporal motif $M$, the values of $k$ and $\ell$ are determined by $V_\mathcal{K}$ and $E_\mathcal{K}$. We will therefore use the term temporal motif, or simply motif, when $k$ and $\ell$ are clear from context. Given a temporal motif $M = ((V_\mathcal{K}, E_\mathcal{K}), \sigma)$, we denote with $G_u[M]$ the undirected graph corresponding to the underlying undirected graph structure of the multigraph $\mathcal{K}$ of $M$, that is $G_u[M] = (V_\mathcal{K}, E^u_M)$ where $E^u_M = \{\{x, y\} : (x, y) \in E_\mathcal{K} \lor (y, x) \in E_\mathcal{K}\}$ (i.e., $E^u_M$ is the set of undirected edges associated to the multiset $E_\mathcal{K}$). Notice that directed edges of the form $(x, y), (y, x)$ as well as multiple directed edges $(x, y), (x, y), \dots$ from $E_\mathcal{K}$ are represented by the same undirected edge $\{x, y\}$ in $E^u_M$.

For a fixed temporal motif $M$, we are interested in identifying its realizations in $T$ appearing within at most $\delta$-time duration, as captured by the following definition.

Definition 2.4. Given a temporal network $T = (V, E)$ and $\delta \in \mathbb{R}^+$, a time ordered sequence $S = \langle (x'_1, y'_1, t'_1), \dots, (x'_\ell, y'_\ell, t'_\ell) \rangle$ of $\ell$ unique temporal edges from $T$ is a $\delta$-instance of the temporal motif $M = \langle (x_1, y_1), \dots, (x_\ell, y_\ell) \rangle$ if:

(1) there exists a bijection $f$ on the vertices such that $f(x'_i) = x_i$ and $f(y'_i) = y_i$, $i = 1, \dots, \ell$; and

(2) the edges of $S$ occur within $\delta$ time, i.e., $t'_\ell - t'_1 \le \delta$.
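The two conditions of Definition 2.4 translate directly into a brute-force check, sketched below in Python. This is only an illustration of the definition, not the counting routine used by odeN. It assumes that "time ordered" means strictly increasing timestamps, and the function names and the toy edge list are hypothetical.

```python
from itertools import combinations


def is_delta_instance(seq, motif, delta):
    """Check Definition 2.4 for a candidate sequence of temporal edges (x, y, t)
    against a motif given as its ordered list of directed edges."""
    if len(seq) != len(motif):
        return False
    times = [t for _, _, t in seq]
    # Assumption: "time ordered" is read as strictly increasing timestamps.
    if any(a >= b for a, b in zip(times, times[1:])):
        return False
    # Condition (2): all edges occur within delta time.
    if times[-1] - times[0] > delta:
        return False
    # Condition (1): build the vertex map f edge by edge and require that it
    # is well defined and injective (hence a bijection onto the motif vertices).
    f, used = {}, set()
    for (x, y, _), (mx, my) in zip(seq, motif):
        for v, mv in ((x, mx), (y, my)):
            if v in f:
                if f[v] != mv:
                    return False
            else:
                if mv in used:
                    return False
                f[v] = mv
                used.add(mv)
    return True


def count_delta_instances(temporal_edges, motif, delta):
    """Exhaustively count delta-instances; exponential in len(motif)."""
    ordered = sorted(temporal_edges, key=lambda e: e[2])
    return sum(
        is_delta_instance(c, motif, delta)
        for c in combinations(ordered, len(motif))
    )


# The motif of Fig. (1b) and a hypothetical edge list (not the network of Fig. 1a).
motif = [("v1", "v2"), ("v3", "v1"), ("v2", "v3")]
edges = [(1, 2, 6), (3, 1, 9), (2, 3, 20), (2, 3, 35)]
print(count_delta_instances(edges, motif, 15))  # 1
print(count_delta_instances(edges, motif, 30))  # 2
```

With $\delta = 15$ only the sequence ending at timestamp 20 qualifies (20 - 6 = 14); raising $\delta$ to 30 also admits the one ending at 35.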

Exploring different values of $\delta$ in the above definition often leads to different insights on the temporal network that may be discovered through the analysis of the motifs [1, 15, 21, 33]. Note that in a $\delta$-instance of the temporal motif $M = (\mathcal{K}, \sigma)$ the edge timestamps must be sorted according to the ordering $\sigma$ (see Fig. (1c) for an example). In fact, $\sigma$ plays a key role in defining a temporal motif, with different orderings of the same multigraph $\mathcal{K}$ reflecting diverse dynamic properties captured by the motif.

For a given directed multigraph $\mathcal{K}$ with $|E_\mathcal{K}| = \ell$ edges, in general not all the $\ell!$ orderings of its edges define distinct temporal motifs. We therefore introduce the following equivalence relation.

Definition 2.5. Let $M_1$ and $M_2$ be two temporal motifs. Let $M_1 = \langle (x^1_1, y^1_1), \dots, (x^1_\ell, y^1_\ell) \rangle$ and $M_2 = \langle (x^2_1, y^2_1), \dots, (x^2_\ell, y^2_\ell) \rangle$ be the sequences of edges of $M_1$ and $M_2$, respectively. We say that $M_1$ and $M_2$ are not distinct (denoted with $M_1 \cong_\tau M_2$) if there exists a bijection $g$ on the vertices such that $g(x^1_i) = x^2_i$ and $g(y^1_i) = y^2_i$, $i = 1, \dots, \ell$.

We provide an example of the definition above in Figure 2.

Given two networks (undirected or temporal) $G, G'$ we say that $G' = (V', E')$ is a subgraph of $G = (V, E)$ (denoted with $G' \subseteq G$) if $V' \subseteq V$ and $E' \subseteq E$. Note that we require a subgraph to be edge induced. To conclude the preliminary notions, we recall the definition of static graph isomorphism.

Definition 2.6. Given two graphs $G = (V_G, E_G)$ and $H = (V_H, E_H)$ we say that the two graphs are isomorphic, denoted with $G \simeq H$, if and only if there exists a bijection $f : V_G \mapsto V_H$ on the vertices such that $e = (u, v) \in E_G \Leftrightarrow e' = (f(u), f(v)) \in E_H$.
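The equivalence of Definition 2.5 can be tested without searching over bijections: relabeling the vertices of each motif in order of first appearance along its edge sequence gives a canonical form, and two motifs are not distinct exactly when their canonical forms coincide. The Python sketch below illustrates this idea; `canonical_form` and `not_distinct` are hypothetical names, and the sketch is not the integer encoding used inside odeN. The motifs `m1` and `m2` are the orderings $\sigma_1$ and $\sigma_2$ given in the caption of Figure 2, while `m3` is a made-up motif for contrast.

```python
def canonical_form(motif):
    """Relabel vertices 0, 1, 2, ... in order of first appearance along the
    ordered edge sequence of the motif."""
    label = {}
    relabeled = []
    for x, y in motif:
        for v in (x, y):
            label.setdefault(v, len(label))
        relabeled.append((label[x], label[y]))
    return tuple(relabeled)


def not_distinct(m1, m2):
    """True if a bijection g as in Definition 2.5 exists between m1 and m2."""
    return canonical_form(m1) == canonical_form(m2)


m1 = [("y", "x"), ("y", "z"), ("x", "z")]         # sigma_1 from Figure 2
m2 = [("x'", "z'"), ("x'", "y'"), ("z'", "y'")]   # sigma_2 from Figure 2
m3 = [("y", "x"), ("x", "z"), ("y", "z")]         # hypothetical, not from the paper

print(not_distinct(m1, m2))  # True
print(not_distinct(m1, m3))  # False
```

For `m1` and `m2` both canonical forms are `((0, 1), (0, 2), (1, 2))`, which corresponds to the map $f(x) = z'$, $f(y) = x'$, $f(z) = y'$ of the Figure 2 caption.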

Let $\mathcal{U}(M, \delta) = \{I : I \text{ is a } \delta\text{-instance of } M\}$ be the set of (all) $\delta$-instances of the motif $M$ in $T$. The count of $M$ is $C_M(\delta) = |\mathcal{U}(M, \delta)|$, denoted with $C_M$ when $\delta$ is clear from the context.

Given a static undirected graph $H$, which we call the target template, we are interested in solving the problem of computing the number of $\delta$-instances of all temporal motifs with $\ell$ edges and all corresponding to the same static graph $H$. More formally, given the target template $H = (V_H, E_H)$, which is a simple and connected graph, and $\ell \in \mathbb{Z}^+$ with $\ell \ge |E_H|$, let $\mathcal{M}(H, \ell)$ be the set of distinct temporal motifs with $\ell$ edges whose underlying undirected graph structure corresponds to $H$, that is $\mathcal{M}(H, \ell)$ contains motifs $M_i = ((V^i_\mathcal{K}, E^i_\mathcal{K}), \sigma_i)$, $i = 1, 2, \dots$, such that i) $G_u[M_i] \simeq H$; ii) $|E^i_\mathcal{K}| = \ell$; and iii) $M_i \ncong_\tau M_j$, $\forall j \neq i$.

Figure 2: (Left): The two motifs are not distinct ($M_1 \cong_\tau M_2$): let $\sigma_1 = \langle (y, x), (y, z), (x, z) \rangle$ and $\sigma_2 = \langle (x', z'), (x', y'), (z', y') \rangle$ corresponding to $M_1$ and $M_2$, then the function $f : V^1_\mathcal{K} \mapsto V^2_\mathcal{K}$ defined by $f(x) = z'$, $f(y) = x'$, $f(z) = y'$ preserves both the topology and the ordering as from Definition 2.5. (Right): The two motifs are distinct ($M_1 \ncong_\tau M_3$) since there is no map $f : V^1_\mathcal{K} \mapsto V^3_\mathcal{K}$ preserving both the topology and ordering.

Let us explain intuitively the constraints above. First, $H$ imposes a constraint on the undirected static topology that the temporal motifs of interest (which are directed subgraphs) should have. That is, it requires all the motifs to have the same underlying graph structure ($G_u[M]$), which must be isomorphic to $H$. This is a useful way to represent multiple related temporal motifs. For example, in social network analysis, by fixing $H$ as an undirected triangle we consider in $\mathcal{M}(H, \ell)$ all temporal motifs that characterize the communication between groups of three friends (i.e., each motif will represent a different form of communication among all such groups [34]). The second constraint requires each motif $M_i \in \mathcal{M}(H, \ell)$ to have exactly $\ell \ge |E_H|$ edges, with $\ell$ provided in input by the user. Fixing the parameter $\ell$ is motivated by the fact that motifs with different values of $\ell$ (even with the same target template structure $H$) reflect different patterns of interaction (e.g., a group of friends that exchanges $\ell = 3$ or $\ell = 4$ messages). As we will show empirically in Section 5.4, such counts vary significantly with $\ell$ for fixed $H$ and $\delta$. Finally, the third constraint ensures that we only count distinct motifs, i.e., motifs representing different patterns.
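The three constraints can be illustrated by enumerating $\mathcal{M}(H, \ell)$ explicitly for a small template. The Python sketch below is a naive enumeration written for this illustration (it tries all $(2|E_H|)^\ell$ directed edge sequences), whereas odeN is designed precisely to avoid generating the motifs one by one. The function names and the vertex labels are assumptions.

```python
from itertools import product


def canonical_form(motif):
    """Relabel vertices in order of first appearance (see Definition 2.5)."""
    label = {}
    relabeled = []
    for x, y in motif:
        for v in (x, y):
            label.setdefault(v, len(label))
        relabeled.append((label[x], label[y]))
    return tuple(relabeled)


def motif_template_set(template_edges, ell):
    """Enumerate one representative per motif in M(H, ell), where H is given
    by its list of undirected edges (u, v)."""
    undirected = {frozenset(e) for e in template_edges}
    directed = [(u, v) for u, v in template_edges]
    directed += [(v, u) for u, v in template_edges]
    motifs = set()
    # Constraint ii): every candidate has exactly ell directed edges.
    for seq in product(directed, repeat=ell):
        # Constraint i): the undirected structure must be exactly H, so every
        # edge of H has to be used at least once.
        if {frozenset(e) for e in seq} == undirected:
            # Constraint iii): keep a single representative per class.
            motifs.add(canonical_form(seq))
    return sorted(motifs)


edge = [("v1", "v2")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(len(motif_template_set(edge, 2)))      # 2
print(len(motif_template_set(edge, 3)))      # 4
print(len(motif_template_set(triangle, 3)))  # 8
```

The first two values agree with the sizes $|\mathcal{M}(H, 2)| = 2$ and $|\mathcal{M}(H, 3)| = 4$ reported for the single-edge template in Example 2.7.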

We now define the motif template counting problem.

Problem 1. Motif template counting problem. Given a temporal network $T$, a static undirected target graph $H = (V_H, E_H)$, $\ell \in \mathbb{Z}^+$, $\ell \ge |E_H|$, and a parameter $\delta \in \mathbb{R}^+$, find the counts $C_{M_i}(\delta)$ of motifs $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$ in $T$.

We now provide an example of the different motifs to be counted for different values of $\ell$ with a fixed target template $H$.

Example 2.7. Let $H = (\{v_1, v_2\}, \{\{v_1, v_2\}\})$, that is, the target template is an edge. Let $e_1 = (v_1, v_2)$ and $e_2 = (v_2, v_1)$. By varying $\ell \in \{2, 3\}$ the motifs in $\mathcal{M}(H, \ell)$, for which we want to compute the counts, are: $M_1 = \langle e_1, e_1 \rangle$ and $M_2 = \langle e_1, e_2 \rangle$ for $\ell = 2$ (i.e., $|\mathcal{M}(H, 2)| = 2$), while $M_1 = \langle e_1, e_1, e_1 \rangle$, $M_2 = \langle e_1, e_2, e_1 \rangle$, $M_3 = \langle e_1, e_2, e_2 \rangle$, $M_4 = \langle e_1, e_1, e_2 \rangle$ for $\ell = 3$ (i.e., $|\mathcal{M}(H, 3)| = 4$).

Since solving the counting problem exactly is NP-Hard in general¹ even for one single temporal motif, we aim at providing high-quality approximations to the motif counts as follows.

Problem 2. Motif template approximation problem. Given the input parameters of Problem 1 and additional parameters $\varepsilon \in \mathbb{R}^+$, $\eta \in (0, 1)$, compute approximations $C'_{M_i}(\delta)$ of counts $C_{M_i}(\delta)$ of motifs $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$, such that $\mathbb{P}[\exists i \in \{1, \dots, |\mathcal{M}(H, \ell)|\} : |C'_{M_i}(\delta) - C_{M_i}(\delta)| \ge \varepsilon C_{M_i}(\delta)] \le \eta$, that is $C'_{M_i}(\delta)$ is a relative $\varepsilon$-approximation to the count $C_{M_i}(\delta)$ with probability $\ge 1 - \eta$ for all $i = 1, \dots, |\mathcal{M}(H, \ell)|$ simultaneously.

¹ The hardness depends on the topology of the motif. For example, for triangles and single edges there exist polynomial-time algorithms, even if they are impractical on very large networks. Interestingly, counting temporal star-shaped motifs is NP-Hard [26], while on static networks such motifs can be counted in polynomial time.

3 RELATED WORKS

Much work has been done on enumerating and approximating $k$-node motifs in (nontemporal) networks. We refer the interested reader to the surveys [38, 48]. However, such works cannot be easily adapted to temporal motifs since they do not properly account for the temporal information [15, 34]. Many different definitions of temporal networks and temporal patterns have been proposed: here we will focus only on those works that are relevant for our work; the interested reader may refer to [15, 16, 18, 28] for a more general overview.

Our work builds on the work of Paranjape et al. [34], which first introduced the definition of temporal motif used here, and the problem of counting single temporal motifs. The authors provided a general algorithm for counting a single temporal motif by enumerating all the subsequences of edges that map on a single static subgraph. Their approach is not feasible on large datasets since it requires exhaustive enumeration of all subgraphs of the undirected projected static network $G_T$ that are isomorphic to the target template $H$. The authors also proposed efficient algorithms and data structures for counting 3-node 3-edge motifs, which may be used for the exact counting subroutines within odeN's sampling framework. In addition to the algorithmic contributions, the authors also showed that networks from similar domains tend to exhibit similar temporal motif counts. They also showed how motif counts can provide significant insights on the communication patterns in many networks, highlighting the importance of studying temporal motifs in temporal networks.

Other exact algorithms have been proposed for the problem of counting a single motif, or for slightly different problems. Mackey et al. [27] presented a backtracking algorithm for counting a single temporal motif that can be used for any motif. Boekhout et al. [6] developed exact algorithms for counting temporal motifs in multi-layer temporal networks (i.e., each edge is a tuple $(x, y, t, a)$ with $a$ denoting the layer of each edge); they also discuss efficient data structures for counting 4-node 4-edge motifs, which may also be adapted for the exact counting subroutines in our sampling framework odeN. Being exact, both such algorithms do not scale on massive datasets due to large time and memory requirements.

Several approximation algorithms have been proposed in recent years for estimating the count of a single motif. Liu et al. [26] proposed a temporal-partition based sampling approach. Wang et al. [47] introduced a sampling-based algorithm that selects temporal edges with a fixed probability specified by the user. Lastly, Sarpe and Vandin [40] proposed PRESTO, an algorithm based on uniform sampling of small windows of the temporal network $T$. All such sampling algorithms can be used to analyze a single temporal motif but become inefficient as the number of motifs to be counted grows, such as in Problem 2. In fact, they cannot leverage the additional information that all motifs $M_1, \dots, M_{|\mathcal{M}(H, \ell)|}$ must share a common static topology isomorphic to $H$. As stated in Section 1, when analyzing a temporal network it is hard to know a priori which motif represents important functions for the network; therefore one often relies on testing all possible orderings $\sigma$ over one fixed target template $H$ for fixed $\ell, \delta$ [34, 45] (as in Prob. 1), resulting in a time-consuming and inefficient procedure. Our approach instead supports the direct analysis of multiple temporal motifs, enabling the study of hundreds of temporal motifs on massive networks in a very limited time.

4 ODEN

In this section we present odeN, our algorithm to address the motif template approximation problem (Prob. 2). We start in Section 4.1 with an overview of odeN. We then describe the algorithm in Section 4.2, analyze its time complexity in Section 4.3 and its theoretical guarantees, including an efficiently computable bound on the number of samples required to obtain the desired probabilistic guarantees, in Section 4.4.

4.1 Overview of odeN

Our algorithm odeN estimates the counts of motifs in $\mathcal{M}(H, \ell)$. The main idea is to avoid explicitly generating all the motifs $M_i \in \mathcal{M}(H, \ell)$, $i = 1, \dots, |\mathcal{M}(H, \ell)|$ and counting them one at a time, as required by existing algorithms that approximate a single motif count. odeN instead leverages the fact that the topology of all motifs must be isomorphic to the target template $H$, by reusing the computation while estimating the motif counts.

An overview of the main strategy adopted by our algorithm is presented in Figure 3. Given the input parameters of Problem 2, where $H$ is the target template, the idea behind our procedure is to consider the undirected static projected graph $G_T$ of the input temporal network $T$ and proceed as follows: i) find a set of subgraphs in the static graph $G_T$ that are isomorphic to $H$ by first sampling an edge $e_R$ of $G_T$ with some probability $p_{e_R}$, where $p_{e_R}$ depends, potentially, on $e_R$ and the temporal network $T$, and then enumerating all subgraphs of $G_T$ isomorphic to $H$ and containing $e_R$; ii) for each such subgraph, consider the corresponding temporal subgraph and compute all the counts of the subsequences of $\ell$ edges occurring within $\delta$-time in such temporal subgraph; iii) for each such subsequence identified, find the corresponding motif in $\mathcal{M}(H, \ell)$ of which the subsequence is a $\delta$-instance, and update a count for each motif identified; iv) weight each motif count appropriately in order to maintain an unbiased estimate of global motif counts; v) repeat steps i)-iv) for a sufficient number of iterations to guarantee the desired $(\varepsilon, \eta)$-approximation (see Problem 2).
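Steps i)-v) can be sketched in Python for the special case where $H$ is a triangle. This is a minimal illustration under stated assumptions, not the authors' implementation: edges are sampled uniformly ($p_{e_R} = 1/|E_T|$), the local counting is a brute-force enumeration rather than FastUpdate, no pruning is applied, timestamps are assumed strictly increasing within an instance, and the weight $|E_T| / (|E_H| \cdot s)$ is derived here from the fact that each triangle can be reached from any of its $|E_H| = 3$ edges. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def canonical_form(motif):
    """Relabel vertices in order of first appearance (see Definition 2.5)."""
    label = {}
    relabeled = []
    for x, y in motif:
        for v in (x, y):
            label.setdefault(v, len(label))
        relabeled.append((label[x], label[y]))
    return tuple(relabeled)


def counts_on_subgraph(edges, h_edges, ell, delta):
    """Steps ii)-iii): count, per motif, the delta-instances formed by the
    temporal edges whose static projection lies on the subgraph h."""
    local = sorted((e for e in edges if frozenset(e[:2]) in h_edges),
                   key=lambda e: e[2])
    counts = Counter()
    for seq in combinations(local, ell):
        times = [t for _, _, t in seq]
        increasing = all(b > a for a, b in zip(times, times[1:]))
        covers_h = {frozenset(e[:2]) for e in seq} == h_edges
        if increasing and covers_h and delta >= times[-1] - times[0]:
            counts[canonical_form([(x, y) for x, y, _ in seq])] += 1
    return counts


def triangles_containing(adj, u, v):
    """Step i): enumerate the triangles of G_T containing the edge {u, v}."""
    for w in adj[u] & adj[v]:
        yield {frozenset((u, v)), frozenset((u, w)), frozenset((v, w))}


def estimate_triangle_motifs(edges, ell, delta, samples, rng):
    static = sorted({tuple(sorted(e[:2])) for e in edges})  # E_T
    adj = defaultdict(set)
    for u, v in static:
        adj[u].add(v)
        adj[v].add(u)
    estimates = Counter()
    for _ in range(samples):                  # step v): repeat s times
        u, v = rng.choice(static)             # uniform sampling of e_R
        for h in triangles_containing(adj, u, v):
            local = counts_on_subgraph(edges, h, ell, delta)
            for motif, c in local.items():
                # Step iv): weight so that the expectation is the exact count.
                estimates[motif] += c * len(static) / (3 * samples)
    return estimates


# Hypothetical toy network made of a single triangle.
toy = [(1, 2, 1), (2, 3, 2), (3, 1, 3), (1, 2, 4)]
est = estimate_triangle_motifs(toy, ell=3, delta=10, samples=4,
                               rng=random.Random(0))
print(dict(est))  # {((0, 1), (1, 2), (2, 0)): 2.0}
```

Because the toy network is a single triangle, every sampled edge leads to the same subgraph and the estimate equals the exact count regardless of the seed; on larger networks the output is a random variable whose expectation is the exact count under the assumptions above.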

4.2 Algorithm Description

odeN is described in Algorithm 1. It first computes $G_T = (V, E_T)$, the undirected projected static graph of $T$ (line 1), and initializes $C_{\mathit{estimates}}$ (line 2), which stores the running motif counts used to compute the estimators $C'_{M_i}$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$. Then it repeats $s$ times (line 3) the following procedure: i) pick a random edge $e_R$ from $G_T$ (line 4) according to some probability distribution over the edges of $E_T$; ii) enumerate all the subgraphs $h$ of $G_T$ such that $h \simeq H$ and $e_R \in h$ (line 5); note that this enumeration step is local to $e_R$; iii) for each such $h$ (line 6), collect the corresponding temporal graph, i.e., all edges in $T$ whose static projected


odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks CIKM โ€™21, November 1โ€“5, 2021, Virtual Event, QLD, Australia

Figure 3: Overview of odeN's approximation strategy. Let $H$ be a triangle, and $\ell = 3$, $\delta = 40$. odeN first collects the static projected network $G_T$, then samples an edge $e_R \in G_T$ randomly ($e_R = \{1, 2\}$ in the figure) and enumerates all the subgraphs of $G_T$ isomorphic to $H$ containing $e_R$. For each subgraph it collects the corresponding temporal network, counts the $\delta$-instances of the motifs, and combines the different counts to obtain unbiased estimates of motif counts. This procedure is repeated to obtain concentrated estimates.

edge is an edge of $h$ (line 7), sort the sequence of edges of this graph by increasing timestamps, and apply some pruning criteria (lines 8-9); iv) if the sequence is not pruned, update the estimates of the number of $\delta$-instances of each temporal motif by calling the routine FastUpdate (line 10). FastUpdate is an efficient implementation of the general algorithm by Paranjape et al. [34], for which we devised efficient encodings of the motifs within integers through bitwise operations. This function updates $C_{\mathit{estimates}}$ in order to maintain, for each motif, the count that will be used to output its unbiased estimate (see Appendix B). Let $C_{M_i}(e)$ be the number of $\delta$-instances in $T$ of $M_i$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, whose undirected projected static network contains edge $e \in G_T$. FastUpdate updates the estimate of the count of each motif $M_i$ by adding its unbiased estimate obtained at the $j$-th iteration, i.e., $X^j_{M_i} = C_{M_i}(e_R)/(|E_H|\, p_{e_R})$. Once the procedure has been repeated $s$ times, for each motif $M_i \in \mathcal{M}(H,\ell)$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, odeN computes the final estimate

$$C'_{M_i} = \frac{1}{s}\sum_{j=1}^{s} X^j_{M_i}, \qquad X^j_{M_i} = \frac{1}{|E_H|}\sum_{e \in G_T} \frac{C_{M_i}(e)\, X_e}{p_e},$$

where $X^j_{M_i}$ is the estimate obtained at the $j$-th iteration and $X_e$ is a Bernoulli random variable denoting whether edge $e \in G_T$ is sampled at the $j$-th iteration, with $\mathbb{P}[X_e = 1] = p_e$. odeN outputs the estimate together with the motif (we output $\sigma_i$ over the node-set $V_H$) (lines 12-13). We show in Lemma 4.1 that odeN outputs unbiased estimates for all the motif counts.
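The single-iteration estimator can be checked by hand on a toy input. The Python sketch below is illustrative only: the per-edge counts, the edge labels, and the sampling probabilities are hypothetical values chosen for the example, not data from the paper. It assumes a triangle template ($|E_H| = 3$) and two triangles sharing edge `c`, holding 2 and 5 $\delta$-instances of some motif $M$ respectively, so the true count is 7.

```python
from fractions import Fraction

E_H = 3  # number of edges of the (assumed) triangle template

# Hypothetical C_M(e): instances of motif M whose static projection contains e.
# Edges a, b, c form one triangle (2 instances); c, d, f form another (5 instances).
per_edge_count = {"a": 2, "b": 2, "c": 7, "d": 5, "f": 5}

# Any distribution assigning positive probability to every edge can be used.
p = {"a": Fraction(1, 2), "b": Fraction(1, 8), "c": Fraction(1, 8),
     "d": Fraction(1, 8), "f": Fraction(1, 8)}

def single_iteration_estimate(edge):
    """X = C_M(e_R) / (|E_H| * p_{e_R}) for the sampled edge e_R."""
    return Fraction(per_edge_count[edge]) / (E_H * p[edge])

def expected_estimate():
    """Exact expectation of X over the random choice of the sampled edge."""
    return sum(p[e] * single_iteration_estimate(e) for e in p)

# Each instance is reachable from exactly |E_H| static edges.
true_count = sum(per_edge_count.values()) // E_H
```

Because every instance is counted once from each of the $|E_H|$ edges of its static projection, dividing by $|E_H|\, p_{e_R}$ cancels both the multiplicity and the sampling bias, and `expected_estimate()` equals `true_count` exactly.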

We briefly discuss the pruning criteria used in line 9. Given a candidate temporal graph $S$ for which $G_S \simeq H$ holds, we check in linear time whether $S$ can contain a $\delta$-instance of a motif: since $S$ is already sorted by increasing timestamps (see line 8), we efficiently check whether there are at least $\ell$ edges within $\delta$-time. If not, then we

Algorithm 1: odeN
Input: $T = (V, E)$, $H = (V_H, E_H)$, $\delta$, $s$, $\ell$
Output: $(M_i, C'_{M_i})$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, where $C'_{M_i}$ is an estimate of $C_{M_i}$ for the motifs in $\mathcal{M}(H,\ell)$.

 1  $G_T = (V, E_T) \leftarrow$ UndirectedStaticProjection($T$)
 2  $C_{\mathit{estimates}} \leftarrow \{\}$
 3  for $j \leftarrow 1$ to $s$ do
 4      $e_R = \{x_R, y_R\} \leftarrow$ RandomEdge($p(e) : e \in E_T$)
 5      $\mathcal{H} \leftarrow \{h \subseteq G_T : h \simeq H, \{x_R, y_R\} \in h\}$
 6      foreach $h \in \mathcal{H}$ do
 7          $S \leftarrow \{(x, y, t), (y, x, t) \in E : \{x, y\} \in h\}$
 8          SortInPlace($S$)    ▷ By increasing timestamps
 9          if *Pruning criteria are not met* then
10              FastUpdate($\delta$, $S$, $C_{\mathit{estimates}}$, $p(e_R)$, $H$)
11  foreach $(M, X_M) \in C_{\mathit{estimates}}$ do
12      $C'_M \leftarrow X_M / s$
13      output $(M, C'_M)$

prune the sequence (since by definition a $\delta$-instance of a motif with $k$ nodes and $\ell$ edges must have $\ell$ edges occurring within $\delta$-time). We thus avoid calling the subroutine FastUpdate, which has exponential complexity in general (see Section 4.3), on $S$.
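The pruning test amounts to a single sliding-window pass over the sorted timestamps. The following Python sketch illustrates the idea; the function name is hypothetical, and it assumes that "within $\delta$-time" means that the last and first timestamps of the window differ by at most $\delta$.

```python
def may_contain_instance(sorted_timestamps, ell, delta):
    """Return True if some ell consecutive timestamps span at most delta.

    sorted_timestamps must be in non-decreasing order (line 8 of Algorithm 1).
    Runs in time linear in the number of timestamps.
    """
    for i in range(len(sorted_timestamps) - ell + 1):
        # The tightest window of ell edges starting at i ends at i + ell - 1.
        if sorted_timestamps[i + ell - 1] - sorted_timestamps[i] <= delta:
            return True
    return False

# With ell = 3 and delta = 40, only the last three edges fit in one window.
keep = may_contain_instance([1, 5, 50, 60, 70], ell=3, delta=40)
prune = not may_contain_instance([0, 100, 200], ell=3, delta=40)
```

Checking only consecutive timestamps is enough: if any $\ell$ edges fit within $\delta$, the $\ell$ consecutive edges starting at the earliest of them fit as well.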

We now discuss the probability distribution used to sample a random edge $e_R$ from $G_T$ (line 4). Due to space constraints, the subroutine FastUpdate that updates the motif estimates at each iteration (line 10) and the algorithms employed for the static enumeration are described in Appendix B (Sections B.1 and B.2).

Since our final estimate is an average over $s$ samples of the variables $X^j_{M_i}$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, $j = 1, \dots, s$, and given that $X^j_{M_i}$ is an unbiased estimate (see Lemma 4.1), the final estimate is also a consistent estimator (i.e., it converges to $C_{M_i}$ as $s \to \infty$) if each edge has a positive probability of being sampled (see footnote 2). Thus any probability mass function assigning positive probability to every edge can be adopted. We considered different distributions over the edges of $E_T$:

(1) Uniform: $p_e = 1/|E_T|$, $e \in E_T$;

(2) Static degree based: $p_e = d(e)/\big(\sum_{e' \in E_T} d(e')\big)$, $e \in E_T$, where $d(e = \{x,y\}) = d(x) + d(y)$ is the degree of the edge, defined as the sum of the degrees of its nodes $x, y \in V$ in $G_T$;

(3) Temporal degree based: $p_e = \phi(e)/\big(\sum_{e' \in E_T} \phi(e')\big)$ with $\phi(e = \{x,y\}) = |\{t : \exists (x,z,t) \vee (z,x,t) \in E\}| + |\{t : \exists (z,y,t) \vee (y,z,t) \in E, z \neq x\}|$, $e \in E_T$;

(4) Temporal edge weight based: $p_{e=\{x,y\}} = |\{(x,y,t), (y,x,t) \in E\}|/m$, $e \in E_T$.

We empirically found distribution (4) to be the fastest to converge for a small number $s$ of iterations, thus we use it in our analysis. We observe that many other candidate distributions can be designed (e.g., combining two of those already listed with weights $\xi$, $1 - \xi$, $\xi \in (0, 1)$), making our framework extremely versatile.
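As an illustration of distribution (4), the following Python sketch computes the temporal edge weight probabilities and draws one edge from them. The function names and the `(x, y, t)` tuple layout are assumptions made for the example, and the input is assumed to contain no self-loops; the actual odeN implementation is in C++ and may organize this step differently.

```python
import random
from collections import Counter

def temporal_edge_weight_distribution(temporal_edges):
    """p_e = (number of temporal edges projecting onto {x, y}) / m."""
    weights = Counter(frozenset((x, y)) for x, y, _ in temporal_edges)
    m = sum(weights.values())  # total number of temporal edges
    return {edge: w / m for edge, w in weights.items()}

def sample_edge(distribution, rng):
    """Draw one static edge according to the given probabilities."""
    edges = list(distribution)
    probabilities = [distribution[e] for e in edges]
    return rng.choices(edges, weights=probabilities, k=1)[0]

T = [(1, 2, 5), (2, 1, 9), (2, 3, 7), (3, 1, 12), (1, 2, 40)]
p = temporal_edge_weight_distribution(T)
e_R = sample_edge(p, random.Random(0))  # seeded only to make the sketch repeatable
```

Every static edge has at least one temporal edge, so each probability is positive, which is the condition required for the estimator to be consistent.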

We conclude by summarizing some useful properties of our algorithm: 1) it computes the estimates only for the temporal motifs

[Footnote 2] More formally, it is only necessary to assign to each $\delta$-instance a known positive sampling probability.


CIKM โ€™21, November 1โ€“5, 2021, Virtual Event, QLD, Australia Ilie Sarpe and Fabio Vandin

occurring in the input temporal network $T$ (except for the very impractical case where the motifs in $\mathcal{M}(H,\ell)$ all have zero counts) without generating all the possible candidates, whereas existing sampling techniques must first generate all the candidates and then run the algorithms on them, even for motifs with zero counts; 2) it takes advantage of the constraint that all motifs share the same underlying topology ($H$), saving computation when estimating the different counts; 3) it is trivially parallelizable: all the $s$ iterations can be executed in parallel; 4) it can easily use most of the fast state-of-the-art subgraph enumeration algorithms developed for the exact subgraph isomorphism problem (see Appendix B.2).

4.3 Time Complexity

In this section we briefly describe the time complexity of odeN. odeN needs to compute the probabilities $p(e)$ of edges in advance, which requires an $O(|E_T|)$ preprocessing step. Interestingly, this step does not depend on the target template $H$, so it can be reused for different target templates. One of the most expensive steps in Algorithm 1 is the local enumeration that identifies the set $\mathcal{H}$ (line 5), which in general requires exponential time. For specific topologies this step can be implemented very efficiently with symmetry breaking conditions and min-degree expansion. For example, if $H$ is a triangle, the enumeration "local" to $e_R = \{x_R, y_R\}$ can be done in $O(\min(d_{x_R}, d_{y_R}))$ time. Let $|\mathcal{H}^*|$ be the maximum cardinality of a set of subgraphs isomorphic to $H$ and adjacent to an edge in $G_T$. Let $|S^*|$ denote the maximum cardinality of a set $S$ collected (in line 7) by odeN. Sorting $S^*$ requires $O(|S^*| \log |S^*|)$ time. The subroutine FastUpdate has a complexity dominated by $O((|S^*| + \ell)\,|E_H|^{\ell})$ (see [34] and App. B.1 for more details). So overall the complexity of our procedure is

$$O\Big(|E_T| + s\,\big(\zeta_{\mathit{enum}} + |\mathcal{H}^*|\,(|S^*| \log |S^*| + |E_H|^{\ell}\,(|S^*| + \ell))\big)\Big),$$

where $\zeta_{\mathit{enum}}$ is the time required by the static enumerator used as a subroutine to compute the set $\mathcal{H}^*$. This step is in general exponential in the number of edges $|E_T|$ and depends on the exact technique used as a subroutine. The final complexity accounts for the cycle (in line 3) that is repeated $s$ times. The parallel version of our algorithm, which executes the cycle of line 3 in parallel on $\omega$ available processing units, leads to a time complexity of

$$O\Big(|E_T| + \frac{s}{\omega}\,\big(\zeta_{\mathit{enum}} + |\mathcal{H}^*|\,(|S^*| \log |S^*| + |E_H|^{\ell}\,(|S^*| + \ell))\big)\Big).$$
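For the triangle case mentioned above, the local enumeration can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the enumerator used by odeN: it assumes $G_T$ is stored as a dictionary from each node to the set of its neighbours, and the function name is hypothetical. It scans the neighbourhood of the lower-degree endpoint and probes the other endpoint's set, so the expected running time is proportional to $\min(d_{x_R}, d_{y_R})$ with hash-based sets.

```python
def triangles_containing_edge(adjacency, x, y):
    """List the triangles of the static graph that contain edge {x, y}.

    adjacency maps each node to the set of its neighbours in G_T.
    """
    # Iterate over the smaller neighbourhood, probe the larger one.
    if len(adjacency[x]) <= len(adjacency[y]):
        small, large = x, y
    else:
        small, large = y, x
    return [frozenset((x, y, z))
            for z in adjacency[small]
            if z != large and z in adjacency[large]]

# Toy static graph: triangles {1, 2, 3} and {1, 3, 4} share edge {1, 3}.
adjacency = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
found = triangles_containing_edge(adjacency, 1, 3)
```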

4.4 Theoretical Guarantees

In this section we present the theoretical guarantees provided by odeN. All proofs are provided in Appendix D.

Recall that our algorithm outputs, for each motif $M_i \in \mathcal{M}(H,\ell)$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, the following estimate:

$$C'_{M_i} = \frac{1}{s}\sum_{j=1}^{s} X^j_{M_i} = \frac{1}{s\,|E_H|}\sum_{j=1}^{s}\sum_{e \in G_T} \frac{C_{M_i}(e)\, X_e}{p_e}.$$

The following shows that such estimates are unbiased estimates of $C_{M_i}$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$.

Lemma 4.1. For each motif-count pair $(M_i, C'_{M_i})$ reported in output by odeN, $C'_{M_i}$ is an unbiased estimate of $C_{M_i}$, that is, $\mathbb{E}[C'_{M_i}] = C_{M_i}$.

Let $\alpha = \min_{\{x,y\} \in E_T} |\{(x,y,t), (y,x,t) \in E\}|$, i.e., the minimum number of temporal edges of $T$ that map onto an edge of $G_T$. We now give an upper bound on the variance of the estimates provided by Algorithm 1 for each motif reported in output.

Lemma 4.2. For each motif-count pair $(M_i, C'_{M_i})$ reported in output by odeN, it holds that

$$\mathrm{Var}[C'_{M_i}] \le \frac{C^2_{M_i}}{s}\left(\frac{m}{\alpha\,|E_H|} - 1\right).$$

To give a bound on the number $s$ of samples required by odeN to output an $\varepsilon$-approximation that holds for all output motifs with probability $> 1 - \eta$, we combine Bennett's inequality [5], an advanced result on the concentration of sums of independent random variables, as reported in [40], with a union bound, obtaining the following main result.

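Theorem 4.3. Let $s$ be the number of iterations of odeN, let $\varepsilon \in \mathbb{R}^+$, and $\eta \in (0, 1)$. If

$$s \ge \left(\frac{m}{\alpha\,|E_H|} - 1\right) \frac{1}{(1+\varepsilon)\ln(1+\varepsilon) - \varepsilon}\, \ln\left(\frac{2\,|\mathcal{M}(H,\ell)|}{\eta}\right)$$

then

$$\mathbb{P}\left[\exists i \in \{1, \dots, |\mathcal{M}(H,\ell)|\} : |C'_{M_i} - C_{M_i}| \ge \varepsilon\, C_{M_i}\right] \le \eta.$$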
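The bound of Theorem 4.3 is cheap to evaluate once $m$, $\alpha$, $|E_H|$ and $|\mathcal{M}(H,\ell)|$ are known. The Python helper below is a direct transcription of the formula; the function and parameter names are hypothetical, and the numbers in the usage lines are made-up inputs rather than values from the experiments.

```python
import math

def sample_bound(m, alpha, num_template_edges, num_motifs, eps, eta):
    """Smallest integer s satisfying the condition of Theorem 4.3.

    m: number of temporal edges; alpha: minimum number of temporal edges
    mapping onto a static edge; num_template_edges: |E_H|;
    num_motifs: |M(H, l)|; eps > 0; eta in (0, 1).
    """
    variance_factor = m / (alpha * num_template_edges) - 1
    bennett_term = (1 + eps) * math.log(1 + eps) - eps
    union_term = math.log(2 * num_motifs / eta)
    return math.ceil(variance_factor / bennett_term * union_term)

# Hypothetical inputs: the bound shrinks as eps grows and grows as eta shrinks.
loose = sample_bound(10**6, 1, 3, 96, eps=1.0, eta=0.1)
tight = sample_bound(10**6, 1, 3, 96, eps=0.1, eta=0.1)
```

Note that the number of motifs enters only through a logarithm, so estimating hundreds of motifs simultaneously increases the required sample size only mildly compared with a single motif.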

5 EXPERIMENTAL EVALUATION

We implemented odeN and tested it on several large datasets (see Section 5.1 for details on setup and data). Our experimental evaluation has the following goals: compare odeN with state-of-the-art algorithms for approximating motif counts (Section 5.2); evaluate the scalability of a simple parallel implementation of odeN (Section 5.3); provide a case study highlighting the usefulness of odeN for analyzing real-world temporal networks (Section 5.4).

5.1 Setup, and Datasets

We briefly describe the setup and the large-scale datasets used in our experimental evaluation.

We implemented our algorithm odeN in C++20 and compiled it under gcc 9.3 with optimization flag enabled (implementation available at https://github.com/VandinLab/odeN); additional details on the implementation are in Appendix C. We compared odeN with four different baselines, denoted as PRESTO-A (PR-A), PRESTO-E (PR-E) [40], LS [26], and ES [47]. We used the original implementations available from the authors. We performed all experiments under Ubuntu 20.04 on a machine with 64 cores, Intel Xeon E5-2698 2.3GHz, running each algorithm single threaded and with 300GB of maximum RAM allowed.

The datasets used in our experimental evaluation are reported in Table 1, which shows the number of nodes and edges of $T$, the precision of the timestamps, the timespan of the network, the number $|E_T|$ of undirected edges in the corresponding undirected projected static network $G_T$, the maximum degree $d_{\max}$ of a node in $G_T$, and the maximum number $w_{\max}$ of temporal edges that are mapped onto the same static edge in $G_T$. The datasets are from different domains: SO is a network that models interactions from the Stack-Overflow platform [34], BI is a network of Bitcoin transactions [26], RE is a network built from comments on the platform Reddit [26], and EC is a bipartite temporal network built from IPv4 packets exchanged between Chicago and Seattle [40]. See the original papers for more details on the networks and the processes they model.

When measuring the running times of the various algorithms we exclude the time to read the dataset. Since ES's implementation supports only values of $\ell$ up to 4, we do not report results for ES and $\ell > 4$. Unless otherwise stated we used $\delta = 86400$ for SO and RE, $\delta = 43200$ on BI, and $\delta = 50000$ on EC, as done in previous works [26, 34, 47]. Since all algorithms used in our comparison have different


Table 1: Datasets used and their statistics. See Section 5.1 for details on the statistics reported.

Name  n       m      |E_T|   d_max  w_max  Precision  Timespan
SO    2.58M   47.9M  28.1M   44K    594    sec        2774 days
BI    48.1M   113M   84.3M   2.4M   24.2K  sec        2585 days
RE    8.40M   636M   435.3M  0.3M   165K   sec        3687 days
EC    11.16M  2.32B  66.8M   0.3M   3.8M   μ-sec      62.0 mins

parameters and only odeN counts multiple motifs simultaneously, we used the following procedure to choose the parameters. For a given target template $H$ and $\ell$, we ran PRESTO-A, PRESTO-E, LS, and ES for each motif in $\mathcal{M}(H,\ell)$ with fixed parameters, and computed their running time as the sum of the running times required by the single motifs in $\mathcal{M}(H,\ell)$. We then fixed the parameters of odeN so that its running time would be at most the same as that of the other methods, or close to it. All the parameters used in the experiments (including sample sizes) are reported with the source code. To extract the exact counts of motifs we used a modified version of the algorithm by Mackey et al. [27]. We do not report the running times of this algorithm since, even though it employs parallelism, it still runs several orders of magnitude slower than approximate approaches.

5.2 Approximation Quality and Running Time

In this section we compare the quality of the estimates and the running times of odeN and the baseline sampling approaches.

To evaluate the approximation quality we used the MAPE (Mean Absolute Percentage Error) metric over ten executions of each algorithm and parameter configuration. The MAPE is computed as follows: let $C'_{M_i}$ be the estimate of $C_{M_i}$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, returned by an algorithm; then the relative error of this estimate is $|C'_{M_i} - C_{M_i}|/C_{M_i}$. The MAPE is the average over the ten runs of the relative errors, expressed as a percentage. On each of the ten runs we also measured the running time of each algorithm, for which we report the arithmetic mean.
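The metric described above can be written as a short Python function. The function name and the sample numbers are illustrative assumptions; the exact count is assumed to be strictly positive.

```python
def mape(exact_count, estimates):
    """Average relative error, in percent, of several runs for one motif."""
    relative_errors = [abs(c - exact_count) / exact_count for c in estimates]
    return 100.0 * sum(relative_errors) / len(relative_errors)

# Four hypothetical runs for a motif whose exact count is 100.
error = mape(100, [90, 110, 100, 105])  # relative errors 0.10, 0.10, 0.00, 0.05
```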

We first discuss the quality of the estimates for different datasets when $H$ is a triangle and $\ell \in \{4, 5\}$. For $\ell = 4$ there are $|\mathcal{M}(H,\ell)| = 96$ triangle motifs, while for $\ell = 5$, $|\mathcal{M}(H,\ell)|$ is 800. Hence, as $\ell$ increases the approximation task becomes more challenging, due to the exponential growth of the number of motifs. We also observe that, to the best of our knowledge, such a large number of temporal motifs was never tested before on large datasets due to the limitations of existing algorithms, while, as we will show, odeN renders the approximation task practical even on hundreds of motifs.
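The motif counts quoted in this section can be reproduced with a small counting argument. The sketch below is not taken from the paper: it assumes that a motif in $\mathcal{M}(H,\ell)$ is an ordered sequence of $\ell$ directed edges that uses every edge of $H$ at least once, and that two sequences are the same motif when they differ only by a relabeling of the nodes. Under this reading the number of motifs is the number of surjections from the $\ell$ positions onto the $|E_H|$ edges, times $2^{\ell}$ direction choices, divided by the number of automorphisms of $H$ (no non-trivial automorphism can fix such a sequence).

```python
from math import comb

def surjections(length, targets):
    """Number of surjective maps from `length` positions onto `targets` items
    (inclusion-exclusion)."""
    return sum((-1) ** k * comb(targets, k) * (targets - k) ** length
               for k in range(targets + 1))

def num_motifs(ell, num_edges, num_automorphisms):
    """|M(H, ell)| under the assumptions stated in the lead-in."""
    total = surjections(ell, num_edges) * 2 ** ell
    assert total % num_automorphisms == 0
    return total // num_automorphisms

triangle_4 = num_motifs(4, num_edges=3, num_automorphisms=6)  # 96
triangle_5 = num_motifs(5, num_edges=3, num_automorphisms=6)  # 800
edge_4 = num_motifs(4, num_edges=1, num_automorphisms=2)      # 8
square_4 = num_motifs(4, num_edges=4, num_automorphisms=8)    # 48
```

The four values agree with the counts reported in the text for triangles, edges and squares, which supports the assumed definition for these templates.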

The results on the SO dataset are shown in Figure 4a. odeN provides much sharper estimates than state-of-the-art sampling techniques for single motif estimation on motifs $M_1, \dots, M_{|\mathcal{M}(H,\ell)|}$: the relative error on $\ell = 4$-edge triangles is bounded by 5%, and for $\ell = 5$-edge triangles (where $|\mathcal{M}(H,\ell)| = 800$) the relative error is bounded by 12%, while state-of-the-art algorithms report much less accurate estimates, with twice the relative error of odeN on each configuration. We report the running times to obtain such estimates in Table 2. Interestingly, odeN is more than 3× faster with $\ell = 4$ than any sampling algorithm and 1.7× faster with $\ell = 5$. For the other datasets, since extracting all the exact counts for $\ell > 4$ is extremely time consuming, requiring up to months of computation,

[Figure 4 plots omitted: six panels showing the % relative error (MAPE) of PRESTO-A, PRESTO-E, LS, ES and odeN; ES does not appear in the StackOverflow $\ell = 5$ and EquinixChicago panels.]

Figure 4: Approximation error on different datasets. (4a): SO dataset, $H$ is a triangle, for $\ell = 4$ (left) and $\ell = 5$ (right). (4b): $H$ is a triangle, $\ell = 4$, BI dataset (left) and RE dataset (right). (4c): EC dataset, $H$ is an edge, $\ell = 4$ (left); SO dataset, $H$ is a square, $\ell = 4$ (right).

we will not discuss the approximation quality for $\ell = 5$ (since we do not have the exact counts to evaluate it).

On dataset BI (Figure 4b left) odeN provides more concentrated estimates for the $|\mathcal{M}(H,\ell)| = 96$ triangles than all other algorithms except ES, which also has a smaller running time than odeN. This may be related to the static graph structure of BI, which has some very high-degree nodes (see Table 1). Therefore odeN may sample edges with very high-degree nodes, introducing overcounting in its estimates. Nonetheless, for higher values of $\ell$ this issue is amortized over the growing number of motifs $|\mathcal{M}(H,\ell)|$.

On dataset RE (Figure 4b right) the estimates by odeN are all within 13% of relative error and improve significantly over state-of-the-art sampling algorithms, by up to one order of magnitude of precision. Such estimates were notably obtained with significantly smaller running time than state-of-the-art sampling algorithms, improving up to 2× over the running time of ES and 1.4× over PRESTO (as reported in Table 2).

Finally, on the EC dataset, which is a bipartite temporal network with more than 2 billion edges, we evaluated the approximation quality with $H$ being an edge and $\ell = 4$ (for which $|\mathcal{M}(H,\ell)| = 8$). Such motifs have fundamental importance in the analysis of temporal networks since they can be seen as building blocks [16,


Table 2: Running times (in seconds) to obtain the results in Figure 4 (results are shown following the order in Figure 4). Under $H$ we report the topology of $H$ used: T for triangles, E for edges, and S for squares. "-" denotes not applicable, while "✗" denotes out of RAM.

Dataset  ℓ  H  PR-A     PR-E     LS       ES       odeN
SO       4  T  533.4    537.7    555.5    567.2    174.4
SO       5  T  4405     4408     4390     -        2515
BI       4  T  2048.6   2065.2   2754.6   1602.9   1948.9
RE       4  T  9787.1   10165.8  14289.7  13172.3  6814.9
EC       4  E  2581.5   3014.9   2981.9   ✗        1234.3
SO       4  S  15613.7  16718.7  14344.6  26118.3  4517.9

49]. We report the results on such motifs in Figure 4c (left) (ES is not shown since it did not terminate within the allowed memory budget). The estimates of odeN are well concentrated and within 20% of relative error, while other sampling approaches provide approximations with a relative error up to 90% or more. Moreover, odeN's results were obtained with a speedup of at least 2× over all the other sampling algorithms, rendering the approximation task feasible in a small amount of time on very large temporal networks.

To illustrate the advantage of odeN over existing state-of-the-art exact and approximation algorithms, we compared the various algorithms on dataset SO when $H$ is set to be a square and $\ell = 4$, for which $|\mathcal{M}(H,\ell)| = 48$. As [47] observed, among the 4-edge square motifs there are 16 motifs that do not grow as a single component (i.e., their orderings start with $\langle (1, 2)(3, 4) \cdots \rangle$). Estimating the counts of such motifs is particularly hard for most of the current state-of-the-art sampling algorithms since they generate a large number of partial matchings, while this aspect does not impact odeN. The results are shown in Figure 4c (right). odeN provides tight approximations under 9% of relative error for all four-edge square motifs, while other sampling algorithms fail to provide sharp estimates for some of the motifs. Surprisingly, as shown in Table 2, to obtain such estimates odeN required less than 1.3 hours of computation while the exact computation of the counts required more than two weeks; odeN is at least 3× faster than all the other algorithms, and it is 5.4× faster than ES.

Overall, these results show that our algorithm odeN achieves much more precise estimates within a significantly smaller running time than state-of-the-art sampling algorithms when estimating the counts $C_{M_1}, \dots, C_{M_{|\mathcal{M}(H,\ell)|}}$ for different values of $\ell$ and different topologies of the target template $H$ (see Problem 1 in Section 2).

5.3 Parallel Implementation

In this section we briefly describe the advantages of a simple parallel implementation of Algorithm 1. As discussed in Section 4.2, the for cycle (from line 3) can be trivially parallelized; we therefore implemented this strategy through a thread pooling design pattern.

We describe the results obtained with $H$ set to be a triangle, $\ell = 4$, on the dataset SO; similar results are observed for other datasets. We tested the speedup achieved with $\omega \in \{2, 4, 8, 16\}$ threads over the sequential implementation. Let $T_\omega$ be the average running time with $\omega$ threads over ten executions of odeN with fixed

[Figure 5 plots omitted: two panels showing the speedup over the sequential implementation against the number of threads (2, 4, 8, 16). Left panel series: $s = 1 \cdot 10^6$, $s = 2 \cdot 10^6$, $s = 3 \cdot 10^6$. Right panel series: $\delta = 43200$, $\delta = 86400$, $\delta = 129600$.]

Figure 5: Speed-up of odeN's parallel implementation. (Left): Varying $s$ and fixed $\delta$; (Right): Varying $\delta$ and fixed $s$.

parameters, with $T_1$ being the average time for running the algorithm sequentially. We report the value of $T_1/T_\omega$, $\omega \in \{2, 4, 8, 16\}$, i.e., the speedup over the sequential implementation. Fig. 5 (Left) shows the speedup across different values of the sample size $s$, with $\delta = 86400$. We observe an almost linear speedup up to 4 threads and then a slightly worse performance, especially for small sample sizes, which may be related to the time needed to process each sample. Fig. 5 (Right) shows how the speedup changes for $s = 2 \cdot 10^6$ and different values of $\delta$. We note that our algorithm odeN does not seem to be affected by the value of $\delta$, always attaining similar performance. Interestingly, as captured by our analysis in Section 4.3, the algorithm does not reach a fully linear speedup since we did not parallelize the computation of the sampling probabilities $p(e)$, $e \in E_T$. As a remark, our parallel implementation is not optimized, and more advanced parallel strategies may substantially increase its speedup.

5.4 A Case Study

In this section we illustrate how counting multiple motifs, corresponding to the same target template $H$, with odeN can be used to extract useful insights from a temporal network. We consider a real-world activity network from Facebook [46]. In this network, each node represents a user and a temporal edge $(u, v, t)$ indicates that user $u$ posted on $v$'s wall at time $t$ (see the original publication [46] for more details). The network contains information collected from September 2006 to January 2009. After removing self-loops, the network has $n$=45.7K nodes, $m$=826K temporal edges, and $|E_T|$=179K static (undirected) edges. We will first show how analyzing the motif counts obtained with odeN provides complementary insights to those in [46], which relied on mostly static analyses. We then conclude by discussing how the counts of the network evolve by varying only the parameter $\ell$ (i.e., fixing $H$, $\delta$), showing that such counts surprisingly differ with different values of this parameter.

In the original paper [46], the authors partitioned the Facebook network into nine different snapshots (obtaining nine projected static networks), with each snapshot spanning 90 days of interactions in the network. The authors observed that consecutive snapshots have small resemblance, i.e., on average only 45% of the edges are preserved through consecutive snapshots. The authors also observed that despite this difference all the snapshots have similar, almost invariant, structural properties in terms of their clustering coefficient, average degree distribution, and others. We used odeN (with


$\varepsilon = 1$, $\eta = 0.1$) to compare the temporal networks associated with the snapshots by computing the counts of the 8 temporal motifs in $\mathcal{M}(H, \ell = 3)$ with $H$ being a triangle and $\delta = 86400$ (1 day). On each snapshot, after extracting the motif counts, we computed for each motif $M$ its normalized count on the snapshot as $C_M / \sum_{i=1}^{8} C_{M_i}$. The results are reported in Figure 6a (see Appendix E for a visual representation of the motifs). Interestingly, even if in [46] the authors highlight small resemblance through different snapshots, the counts of the motifs are stable across the different snapshots, especially when looking at the first three and the last two snapshots. Surprisingly, on snapshots 6 and 7, which correspond to the observation period of mid-2008, we observe a significant variation in the motif counts w.r.t. the previous months. This is the period where the authors of [46] observed a change in Facebook's interface (that led to a drop in the growth of the network), which seems to be correlated with the variation in the motif counts. Even more surprisingly, this aspect is not captured by a static analysis of the snapshots as performed in [46]. Thus, our temporal motif analysis through odeN is able to capture a variation in the growth of the network that the static analysis cannot highlight. (We discuss how the motifs and their counts can be used to characterize the activity on the network in Appendix E.)
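The normalization used to compare snapshots is a simple rescaling of the counts of one snapshot so that they sum to one. A minimal Python sketch follows; the function name and the counts are made-up placeholders, not values from the Facebook network.

```python
def normalized_counts(counts):
    """Divide each motif count by the total count over all motifs."""
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}

# Hypothetical estimated counts of the 8 triangle motifs on one snapshot.
snapshot = {"M1": 120, "M2": 80, "M3": 100, "M4": 100,
            "M5": 150, "M6": 50, "M7": 110, "M8": 90}
shares = normalized_counts(snapshot)
```

Normalizing makes snapshots with very different activity levels comparable, since only the relative weight of each motif is retained.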

We then analyzed how the different motif counts of the whole network change by varying the parameter $\ell$. We fixed $H$ to be a triangle and ran odeN with $\varepsilon = 1$, $\eta = 0.1$, $\delta = 86400$. The results are shown in Figure 6b. We observe that the counts of $M_1, \dots, M_{|\mathcal{M}(H,\ell)|}$ vary significantly as $\ell$ increases. For $\ell = 3$ almost all the motifs have the same counts, while for larger $\ell$ there are some motifs with very high counts (i.e., overrepresented) and some other motifs that are underrepresented. Overall the highest counts range from $10^4$ to $10^6$ going from $\ell = 3$ up to $\ell = 6$. To understand if these counts increase only by chance, we performed a widely used statistical test (e.g., [11, 22]) by computing the $Z$-scores of the different motif counts under the following null model [31]. We generated 500 random networks by the timeline shuffling random model [11], which redistributes all the timestamps while fixing the directed projected static network. For each motif $M_i$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, we computed a $Z$-score that is defined as follows: let $C_{M_i}$ be the count of the motif in the original network and let $C^j_{M_i}$ be its count on the $j$-th random network, $j \in \{1, \dots, 500\}$. The $Z$-score is computed as

$$Z_{M_i} = \frac{C_{M_i} - \sum_{j=1}^{500} C^j_{M_i}/500}{\mathrm{std}(C^1_{M_i}, \dots, C^{500}_{M_i})},$$

where $\mathrm{std}(\cdot)$ denotes the standard deviation. The results are in Figure 6c, and they show that the counts in Figure 6b are very significant and not due to random fluctuations (higher $Z$-scores indicate that such motif counts are significantly more frequent in $T$ than in the randomly permuted networks). Interestingly, the $Z$-scores in Figure 6c follow a similar law to the counts in Figure 6b, with the highest $Z$-scores increasing significantly every time $\ell$ increases. Notably, the highest $Z$-scores of motifs with $\ell = 6$ are more than 3 orders of magnitude larger than the $Z$-scores of motifs with $\ell = 3$. (We discuss some of the significant motifs in Appendix E.)
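The $Z$-score computation can be sketched in Python with the standard library. The text does not say whether $\mathrm{std}(\cdot)$ is the sample or the population standard deviation; this sketch assumes the sample version (`statistics.stdev`), and the function name and numbers are illustrative only.

```python
from statistics import mean, stdev

def z_score(observed_count, null_counts):
    """(observed - mean of null counts) / sample standard deviation."""
    return (observed_count - mean(null_counts)) / stdev(null_counts)

# Hypothetical counts of one motif on three shuffled networks
# (the study above uses 500 of them).
z = z_score(20, [8, 10, 12])  # mean 10, sample std 2
```

With hundreds of null networks the difference between the sample and population standard deviation is negligible, so the assumption has little effect on the reported magnitudes.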

6 CONCLUSIONS

In this work we introduced odeN, our algorithm to obtain rigorous, high-quality, probabilistic approximations of the counts of multiple motifs with the same static topology in large temporal

[Figure 6 plots omitted: (a) normalized count of motifs M1 to M8 against the temporal network snapshot (1 to 9); (b) "Distribution of the motif counts with varying $\ell$", motif count against motifs sorted by count, for $\ell = 3, 4, 5, 6$; (c) "Distribution of the motif counts Z-scores with varying $\ell$", Z-score against motifs sorted by Z-score, for $\ell = 3, 4, 5, 6$.]

Figure 6: (6a): Counts of the motifs in $\mathcal{M}(H, 3)$ with $H$ a triangle on each temporal network corresponding to one snapshot in [46]. (6b): Counts on the full Facebook network with varying $\ell$. (6c): $Z$-scores of the motif counts with varying $\ell$.

networks. Our experimental evaluation shows that odeN makes it possible to analyze several motifs in large networks in a fraction of the time required by state-of-the-art approaches. We believe that our algorithm odeN will be of practical interest in the analysis of temporal networks, complementing many of the existing tools and helping in understanding complex networked systems and their patterns.

There are several interesting directions for future research, including devising better edge probability distributions for odeN and choosing the distribution based on the characteristics of the dataset, since different datasets can have very different temporal edge distributions (e.g., with skewed behaviours [40]) and, thus, there may not exist a unique distribution that is effective for all temporal networks. Another direction of future research is the derivation of improved bounds for the number of samples required by odeN, using for example statistical learning theory concepts, such as pseudodimension or Rademacher averages.

ACKNOWLEDGMENTS

This work was supported, in part, by MIUR of Italy, under PRIN Project n. 20174LF3T8 AHeAD, and grant L. 232 (Dipartimenti di Eccellenza), and by the U. of Padova project "SID 2020: RATED-X".

CIKM '21, November 1–5, 2021, Virtual Event, QLD, Australia. Ilie Sarpe and Fabio Vandin

REFERENCES

[1] Paolo Bajardi, Alain Barrat, Fabrizio Natale, Lara Savini, and Vittoria Colizza. 2011. Dynamical Patterns of Cattle Trade Movements. PLoS ONE 6, 5 (may 2011), e19869. https://doi.org/10.1371/journal.pone.0019869

[2] V. Batagelj and M. Zaversnik. 2003. An O(m) Algorithm for Cores Decomposition of Networks. Advances in Data Analysis and Classification, 2011. Volume 5, Number 2, 129-145 (Oct. 2003). arXiv:cs.DS/cs/0310049

[3] Jeffrey Baumes, Mark K. Goldberg, Mukkai S. Krishnamoorthy, Malik Magdon-Ismail, and Nathan Preston. 2005. Finding communities by clustering a graph into overlapping subgraphs. In AC 2005, Proceedings of the IADIS International Conference on Applied Computing, Algarve, Portugal, February 22-25, 2005, Volume 1, Nuno Guimarães and Pedro T. Isaías (Eds.). IADIS, 97–104.

[4] Caleb Belth, Xinyi Zheng, and Danai Koutra. 2020. Mining Persistent Activity in Continually Evolving Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM. https://doi.org/10.1145/3394486.3403136

[5] George Bennett. 1962. Probability Inequalities for the Sum of Independent Random Variables. J. Amer. Statist. Assoc. 57, 297 (mar 1962), 33–45. https://doi.org/10.1080/01621459.1962.10482149

[6] Hanjo D Boekhout, Walter A Kosters, and Frank W Takes. 2019. Efficiently counting complex multilayer temporal motifs in large-scale networks. Computational Social Networks 6, 1 (2019), 1–34.

[7] Marco Bressan, Stefano Leucci, and Alessandro Panconesi. 2019. Motivo. Proceedings of the VLDB Endowment 12, 11 (jul 2019), 1651–1663. https://doi.org/10.14778/3342263.3342640

[8] Matteo Ceccarello, Carlo Fantozzi, Andrea Pietracaprina, Geppino Pucci, and Fabio Vandin. 2017. Clustering uncertain graphs. Proceedings of the VLDB Endowment 11, 4 (dec 2017), 472–484. https://doi.org/10.1145/3186728.3164143

[9] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. 2011. Friendship and mobility. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11. ACM Press. https://doi.org/10.1145/2020408.2020579

[10] Ying Ding. 2011. Scientific collaboration and endorsement: Network analysis of coauthorship and citation networks. Journal of Informetrics 5, 1 (jan 2011), 187–203. https://doi.org/10.1016/j.joi.2010.10.008

[11] Laetitia Gauvin, Mathieu Génois, Márton Karsai, Mikko Kivelä, Taro Takaguchi, Eugenio Valdano, and Christian L. Vestergaard. 2018. Randomized reference models for temporal networks. (June 2018). arXiv:physics.soc-ph/1806.04032

[12] M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (jun 2002), 7821–7826. https://doi.org/10.1073/pnas.122653799

[13] Saket Gurukar, Sayan Ranu, and Balaraman Ravindran. 2015. COMMIT. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM. https://doi.org/10.1145/2723372.2737791

[14] Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, Kenneth A. Ross, Divesh Srivastava, and Dimitris Papadias (Eds.). ACM, 337–348. https://doi.org/10.1145/2463676.2465300

[15] Petter Holme and Jari Saramäki. 2012. Temporal networks. Physics Reports 519, 3 (oct 2012), 97–125. https://doi.org/10.1016/j.physrep.2012.03.001

[16] Petter Holme and Jari Saramäki (Eds.). 2019. Temporal Network Theory. Springer International Publishing. https://doi.org/10.1007/978-3-030-23495-9

[17] Y. Hulovatyy, H. Chen, and T. Milenković. 2015. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31, 12 (jun 2015), i171–i180. https://doi.org/10.1093/bioinformatics/btv227

[18] Ali Jazayeri and Christopher C Yang. 2020. Motif discovery algorithms in static and temporal networks: A survey. Journal of Complex Networks 8, 4 (aug 2020). https://doi.org/10.1093/comnet/cnaa031

[19] Alpár Jüttner and Péter Madarasi. 2018. VF2++ - An improved subgraph isomorphism algorithm. Discret. Appl. Math. 242 (2018), 69–81. https://doi.org/10.1016/j.dam.2018.02.018

[20] Chrysanthi Kosyfaki, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2018. Flow Motifs in Interaction Networks. (Oct. 2018). arXiv:cs.SI/1810.08408

[21] Lauri Kovanen, Márton Karsai, Kimmo Kaski, János Kertész, and Jari Saramäki. 2011. Temporal motifs in time-dependent networks. Journal of Statistical Mechanics: Theory and Experiment 2011, 11 (nov 2011), P11005. https://doi.org/10.1088/1742-5468/2011/11/p11005

[22] L. Kovanen, K. Kaski, J. Kertesz, and J. Saramaki. 2013. Temporal motifs reveal homophily, gender-specific patterns, and group talk in call sequences. Proceedings of the National Academy of Sciences 110, 45 (oct 2013), 18070–18075. https://doi.org/10.1073/pnas.1307941110

[23] Rohit Kumar and Toon Calders. 2018. 2SCENT. Proceedings of the VLDB Endowment 11, 11 (jul 2018), 1441–1453. https://doi.org/10.14778/3236187.3269460

[24] Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2006. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06. ACM Press. https://doi.org/10.1145/1150402.1150476

[25] Jinsoo Lee, Wook-Shin Han, Romans Kasperovics, and Jeong-Hoon Lee. 2012. An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases. Proc. VLDB Endow. 6, 2 (2012), 133–144. https://doi.org/10.14778/2535568.2448946

[26] Paul Liu, Austin R. Benson, and Moses Charikar. 2019. Sampling Methods for Counting Temporal Motifs. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM '19). Association for Computing Machinery, New York, NY, USA, 294–302. https://doi.org/10.1145/3289600.3290988

[27] Patrick Mackey, Katherine Porterfield, Erin Fitzhenry, Sutanay Choudhury, and George Chin Jr. 2018. A Chronological Edge-Driven Approach to Temporal Subgraph Isomorphism. (Jan. 2018). arXiv:cs.DS/1801.08098

[28] Naoki Masuda and Renaud Lambiotte. 2016. A Guide to Temporal Networks. WORLD SCIENTIFIC (EUROPE). https://doi.org/10.1142/q0033

[29] Tijana Milenković and Nataša Pržulj. 2008. Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 6 (jan 2008), CIN.S680. https://doi.org/10.4137/cin.s680

[30] R. Milo. 2002. Network Motifs: Simple Building Blocks of Complex Networks. Science 298, 5594 (oct 2002), 824–827. https://doi.org/10.1126/science.298.5594.824

[31] R. Milo. 2004. Superfamilies of Evolved and Designed Networks. Science 303, 5663 (mar 2004), 1538–1542. https://doi.org/10.1126/science.1089167

[32] Mark Newman. 2010. Networks. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199206650.001.0001

[33] Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. 2009. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60, 5 (may 2009), 911–932. https://doi.org/10.1002/asi.21015

[34] Ashwin Paranjape, Austin R Benson, and Jure Leskovec. 2017. Motifs in temporal networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 601–610.

[35] Noujan Pashanasangi and C. Seshadhri. 2019. Efficiently Counting Vertex Orbits of All 5-vertex Subgraphs, by EVOKE. CoRR abs/1911.10616 (2019). https://doi.org/10.1145/3336191.3371773 arXiv:1911.10616

[36] N. Przulj. 2007. Biological network comparison using graphlet degree distribution. Bioinformatics 23, 2 (jan 2007), e177–e183. https://doi.org/10.1093/bioinformatics/btl301

[37] Xuguang Ren and Junhu Wang. 2015. Exploiting Vertex Relationships in Speeding up Subgraph Isomorphism over Large Graphs. Proc. VLDB Endow. 8, 5 (2015), 617–628. https://doi.org/10.14778/2735479.2735493

[38] Pedro Ribeiro, Pedro Paredes, Miguel EP Silva, David Aparicio, and Fernando Silva. 2019. A survey on subgraph counting: concepts, algorithms and applications to network motifs and graphlets. arXiv preprint arXiv:1910.13011 (2019).

[39] Ryan A. Rossi, Nesreen K. Ahmed, Aldo Carranza, David Arbour, Anup Rao, Sungchul Kim, and Eunyee Koh. 2021. Heterogeneous Graphlets. ACM Transactions on Knowledge Discovery from Data 15, 1 (jan 2021), 1–43. https://doi.org/10.1145/3418773

[40] Ilie Sarpe and Fabio Vandin. 2021. PRESTO: Simple and Scalable Sampling Techniques for the Rigorous Approximation of Temporal Motif Counts. SIAM International Conference on Data Mining (2021). https://doi.org/10.1137/1.9781611976700.17

[41] Alice C. Schwarze and Mason A. Porter. 2020. Motifs for processes on networks. (July 2020). arXiv:physics.soc-ph/2007.07447

[42] Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 1 (apr 2002), 64–68. https://doi.org/10.1038/ng881

[43] Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. 2009. Efficient graphlet kernels for large graph comparison. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, AISTATS 2009, Clearwater Beach, Florida, USA, April 16-18, 2009 (JMLR Proceedings), David A. Van Dyk and Max Welling (Eds.), Vol. 5. JMLR.org, 488–495. http://proceedings.mlr.press/v5/shervashidze09a.html

[44] Shixuan Sun, Xibo Sun, Yulin Che, Qiong Luo, and Bingsheng He. 2020. RapidMatch: a holistic approach to subgraph query processing. Proceedings of the VLDB Endowment 14 (2020), 176–188. https://doi.org/10.14778/3425879.3425888

[45] Kun Tu, Jian Li, Don Towsley, Dave Braines, and Liam D. Turner. 2019. gl2vec. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM. https://doi.org/10.1145/3341161.3342908

[46] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. 2009. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM workshop on Online social networks - WOSN '09. ACM Press. https://doi.org/10.1145/1592665.1592675

[47] Jingjing Wang, Yanhao Wang, Wenjun Jiang, Yuchen Li, and Kian-Lee Tan. 2020. Efficient Sampling Algorithms for Approximate Temporal Motif Counting. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM. https://doi.org/10.1145/3340531.3411862

[48] Shuo Yu, Yufan Feng, Da Zhang, Hayat Dino Bedru, Bo Xu, and Feng Xia. 2020. Motif discovery in networks: A survey. Computer Science Review 37 (2020), 100267.

[49] Qiankun Zhao, Yuan Tian, Qi He, Nuria Oliver, Ruoming Jin, and Wang-Chien Lee. 2010. Communication motifs. In Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10. ACM Press. https://doi.org/10.1145/1871437.1871694

[50] Bo Zong, Xusheng Xiao, Zhichun Li, Zhenyu Wu, Zhiyun Qian, Xifeng Yan, Ambuj K. Singh, and Guofei Jiang. 2015. Behavior query discovery in system-generated temporal graphs. Proceedings of the VLDB Endowment 9, 4 (dec 2015), 240–251. https://doi.org/10.14778/2856318.2856320

Table 3: Notation table.

Symbol                                        Description
$T = (V, E)$                                  Temporal network
$n, m$                                        Number of nodes and temporal edges of $T$
$G_T$                                         Undirected projected static network of $T$
$M_i$, $i \in [1, |\mathcal{M}(H,\ell)|]$     Motifs in $\mathcal{M}(H,\ell)$
$k$                                           Nodes in the motifs
$\ell$                                        Edges of the motifs
$\delta$                                      Duration limit of $\delta$-instances
$M = (\mathcal{K}, \sigma)$                   Motif as pair (multigraph, ordering)
$\mathcal{U}(M, \delta)$                      Set of $\delta$-instances of $M$ from $T$
$C_M$                                         Number of $\delta$-instances of $M$ in $T$
$G_u[M]$                                      Undirected graph associated to $\mathcal{K}$
$\mathcal{M}(H,\ell)$                         Set of distinct motifs with $\ell$ edges s.t. it holds $G_u[M_i] \simeq H$ $\forall M_i \in \mathcal{M}(H,\ell)$
$H$                                           Static undirected target template
$V_H, E_H$                                    Set of nodes and edges of the target $H$
$C_{M_i}(e)$                                  Number of $\delta$-instances containing $e \in G_T$
$s$                                           Number of samples collected by odeN
$X_e$                                         Indicator variable denoting if $e \in G_T$ is sampled
$p_e, p(e)$                                   Probability of sampling edge $e \in G_T$
$X^j_{M_i}$                                   Estimate of motif $M_i$ obtained at odeN's $j$-th step
$C'_{M_i}$                                    Final odeN's estimate of $C_{M_i}$
$\varepsilon, \eta$                           Quality and confidence parameters
$\omega$                                      Number of threads in odeN parallel

A NOTATION

The notation used throughout this work is summarized in Table 3.

B ODEN'S SUBROUTINES

B.1 FastUpdate and its Subroutines

We now discuss the FastUpdate routine that is called in line 10 of Algorithm 1 to keep $C_{estimates}$ updated. The FastUpdate subroutine is shown in Algorithm 2. $C_{estimates}$ maintains the weighted counts of the motif sequences identified; therefore, to keep it updated we first count the $\delta$-instances of $M_i$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, within the sampled temporal network, i.e., $S$, and then rescale each count appropriately. The routine has two main features: i) an efficient adaptation of the algorithm by Paranjape et al. [34], and ii) an efficient encoding, within integers, of the sequences representing the motif occurrences, which allows for fast operations (comparisons to distinguish between different motifs and fast updates to the data structures).

Algorithm 2: FastUpdate
Input: δ, S, C_estimates, p(e_R), H
 1  E_unique ← {(x, y) : (x, y, t) ∈ S}
 2  Map_id ← {}, E_rev ← [], Map_counts ← {}, start ← 1
 3  id ← 0
 4  foreach e ∈ E_unique do
 5      E_rev[id] ← e, Map_id{e} ← id++
 6  foreach (x, y, t) ∈ S do
 7      while t − t_start > δ do
 8          Decrement(Map_id[(x_start, y_start)], Map_counts)
 9          start ← start + 1
10      Increment(Map_id[(x, y)], Map_counts)
11  foreach key k of length ℓ ∈ Map_counts.keys do
12      M′ ← ReconstructMotif(k, E_rev)
13      if G_u[M′] ≃ H then
14          M_i ← EncodeAndClassifyMotif(M′)
15          X_{M_i} ← Map_counts{k} / (|E_H| p(e_R))
16          X′_{M_i} ← C_estimates{M_i}
17          C_estimates{M_i} ← X′_{M_i} + X_{M_i}

We now discuss how FastUpdate counts all the $\delta$-instances in $S$. First observe that we already know that $G_S \simeq H$, and that $S$ can be rewritten as $S = (((x_1, y_1), t_1), \dots, ((x_\ell, y_\ell), t_\ell))$. We first compute the set $E_{unique} = \{(x, y) : ((x, y), t) \in S\}$ and we assign to each edge in $E_{unique}$ a unique identifier (lines 4-5). Then we run an efficient implementation of the algorithm by Paranjape et al. [34] that computes through dynamic programming the counts of all the subsequences of edges $(x, y)$ s.t. $(x, y, t) \in S$ having length $\ell$ and occurring within $\delta$-time (lines 6-10). In Algorithm 3 we show our implementation of the subroutines needed to execute lines 6-10 (see the original paper [34] for full details and correctness). Intuitively, lines 6-10 of Algorithm 2 scan the input sequence $S$ linearly, maintaining in memory information about the edges within $\delta$ time from the processed one. Through such scan the algorithm updates $Map_{counts}$ to keep the counts of the sequences having at most $\ell$ edges over the set $E_{unique}$. Starting the cycle in line 11, $Map_{counts}$ contains the counts of all the $\ell$-subsequences of edges from $S$ over the set $E_{unique}$. We highlight that we assign to each static edge of $S$ an ID of $b$ bits. This allows us to encode each sequence of $j = 1, \dots, \ell$ edges, occurring within $\delta$ time, in an integer using $j \cdot b$ bits through bitwise operations ("<<" denotes left shift and "|" denotes bitwise or) to allow for fast updates to $Map_{counts}$.
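The sliding-window counting with integer-packed keys described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `count_sequences` and `unpack` are hypothetical, keys are stored as `(length, packed integer)` pairs (an assumption made here because a packed integer alone does not reveal its length when the edge ID 0 is used), and timestamps are assumed to be strictly increasing.

```python
from collections import defaultdict


def count_sequences(temporal_edges, delta, ell):
    """Count ordered edge subsequences of length <= ell spanning at most delta.

    temporal_edges: list of (x, y, t), sorted by strictly increasing t.
    Returns (counts, edge_of_id, b); counts maps (length, packed_key) -> count.
    """
    # Assign consecutive integer IDs to the distinct static edges.
    id_of = {}
    for x, y, _ in temporal_edges:
        id_of.setdefault((x, y), len(id_of))
    edge_of_id = list(id_of)
    b = max(1, (len(id_of) - 1).bit_length())  # bits per edge ID

    counts = defaultdict(int)

    def increment(eid):
        # Longest prefixes first, so the new edge extends each prefix once.
        for length, key in sorted(counts, reverse=True):
            if length < ell:
                counts[(length + 1, (key << b) | eid)] += counts[(length, key)]
        counts[(1, eid)] += 1

    def decrement(eid):
        counts[(1, eid)] -= 1
        # Shortest suffixes first; sequences of length ell are never forgotten.
        for length, key in sorted(counts):
            if length < ell - 1:
                packed = (eid << (length * b)) | key
                counts[(length + 1, packed)] -= counts[(length, key)]

    start = 0
    for x, y, t in temporal_edges:
        # Drop edges that are more than delta older than the current one.
        while t - temporal_edges[start][2] > delta:
            xs, ys, _ = temporal_edges[start]
            decrement(id_of[(xs, ys)])
            start += 1
        increment(id_of[(x, y)])
    return counts, edge_of_id, b


def unpack(length, key, b):
    """Recover the list of edge IDs packed into an integer key."""
    mask = (1 << b) - 1
    return [(key >> (b * (length - 1 - i))) & mask for i in range(length)]
```

After the scan, the entries with `length == ell` hold the number of occurrences of each ordered sequence of static edges within the time window, which is the information used from line 11 of Algorithm 2 onwards.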

To obtain the estimates of motifs $M_1, \dots, M_{|\mathcal{M}(H,\ell)|}$, for each $\ell$-sequence of edges identified we reconstruct in line 12 the corresponding graph, and thus the motif $M'$ that the sequence is an instance of (the multigraph is given by the edge IDs, while the ordering of the edges is given by the sequence itself). We then check if $G_u[M']$ is isomorphic to $H$ (constraint (1) from Problem 1). If so, we encode the motif in a sequence of $2b\ell$ bits that allows us to classify such motif (line 14), in order to distinguish between distinct motifs (recall we want $M_i \not\cong_\tau M_j$, $i \neq j$). The encoding is computed as follows: given $M' = \langle (x_1, y_1), \dots, (x_\ell, y_\ell) \rangle$ we assign to each node an incremental ID according to its first appearance in $M'$ and we obtain the final encoding as $\langle \mathrm{ID}(x_1)\mathrm{ID}(y_1) \dots \mathrm{ID}(x_\ell)\mathrm{ID}(y_\ell) \rangle$. It is easily seen that two motifs $M_1, M_2$ share the same encoding iff it holds $M_1 \cong_\tau M_2$, as desired, given that the motifs are directed and the definition of distinct motifs accounts for the ordering in which edges appear. We provide an example below.

Algorithm 3: Subroutines of FastUpdate
Function Increment(id, Map_counts)
 1  foreach k ∈ SortByDecLength(Map_counts.keys) do
 2      if k.length < ℓ then
 3          κ ← (k << b) | id
 4          Map_counts[κ] ← Map_counts[κ] + Map_counts[k]
 5  Map_counts[id] ← Map_counts[id] + 1
Function Decrement(id, Map_counts)
 6  Map_counts[id] ← Map_counts[id] − 1
 7  foreach k ∈ SortByIncLength(Map_counts.keys) do
 8      if k.length < ℓ − 1 then
 9          κ ← (id << (k.length · b)) | k
10          Map_counts[κ] ← Map_counts[κ] − Map_counts[k]

Example B.1. Let us consider $M_1$, $M_2$, and $M_3$ from Figure 2. Consider $\sigma_1 = \langle (y, x), (y, z), (x, z) \rangle$; then, by assigning an incremental ID to each node according to its first appearance in $\sigma_1$, we get $\mathrm{ID}(y) = 1$, $\mathrm{ID}(x) = 2$, $\mathrm{ID}(z) = 3$, so the final encoding of $M_1$ is $\langle 121323 \rangle$. Following a similar procedure, the encoding of $M_2$ is $\langle 121323 \rangle$, while the encoding of $M_3$ is $\langle 121332 \rangle$. The encodings of $M_1$ and $M_2$ coincide while differing from the one of $M_3$, as desired.
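The first-appearance relabelling used in the example can be sketched in a few lines of Python. The function name `encode_motif` is hypothetical, and the encoding is returned as a tuple of node IDs rather than as packed bits, purely for readability.

```python
def encode_motif(edge_sequence):
    """Relabel nodes by order of first appearance and concatenate the labels.

    edge_sequence: ordered list of directed edges (x, y).
    Returns a tuple (ID(x1), ID(y1), ..., ID(xl), ID(yl)).
    """
    node_id = {}
    code = []
    for x, y in edge_sequence:
        for node in (x, y):
            if node not in node_id:
                node_id[node] = len(node_id) + 1  # IDs start from 1
            code.append(node_id[node])
    return tuple(code)


# The ordered sequence sigma_1 of Example B.1.
sigma_1 = [("y", "x"), ("y", "z"), ("x", "z")]
print(encode_motif(sigma_1))  # (1, 2, 1, 3, 2, 3)
```

Renaming the nodes of a sequence leaves the encoding unchanged, while reversing the direction of an edge changes it, so equal encodings can serve as a dictionary key for the per-motif estimates.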

After this step we update the global data structure $C_{estimates}$ by adding to each motif's estimate its count in $S$ divided by $|E_H| p(e_R)$, where $p(e_R)$ is the probability of edge $e_R$ being sampled (lines 15-17), which we prove in Section 4.4 to be the correct weighting scheme to output an unbiased estimate.
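As a small illustration of this weighting, the sketch below accumulates the rescaled counts and then averages them over the $s$ samples, following the form $C'_{M_i} = \frac{1}{s}\sum_{j} X^j_{M_i}$ used in Appendix D. The function names and the dictionary-based layout are assumptions made for this example only.

```python
from fractions import Fraction


def update_estimates(c_estimates, counts_in_sample, num_edges_h, p_sampled):
    """Add count / (|E_H| * p(e_R)) to the running estimate of each motif."""
    for motif, count in counts_in_sample.items():
        weighted = count / (num_edges_h * p_sampled)
        c_estimates[motif] = c_estimates.get(motif, 0) + weighted
    return c_estimates


def final_estimates(c_estimates, s):
    """Average the accumulated weighted counts over the s samples."""
    return {motif: total / s for motif, total in c_estimates.items()}


# Two sampled edges with (hypothetical) sampling probabilities 1/4 and 1/2.
estimates = {}
update_estimates(estimates, {"M1": 6, "M2": 3}, 3, Fraction(1, 4))
update_estimates(estimates, {"M1": 3}, 3, Fraction(1, 2))
print(final_estimates(estimates, 2))
```

Exact fractions are used here only to keep the arithmetic easy to check by hand; floating-point probabilities work the same way.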

B.2 Exact Subgraph Enumeration

In this section we briefly discuss the algorithms for subgraph enumeration that can be adapted to our Algorithm 1 (in line 5). Unfortunately, we cannot easily use the algorithms for extracting $k$-node motifs mentioned in Section 3 as they are, since they do not provide the local enumeration step required by odeN.

In fact, the problem most related to the exact enumeration we require is the labelled query graph matching problem. In such setting one is provided a labelled query graph $H = (V_H, E_H, L_H)$ and a labelled graph $G = (V, E, L)$ (where labels can be colors, for example, see [25]); $L$ may be defined on edges or on vertices. The problem requires finding all the subgraphs $h' \subseteq G$ isomorphic to $H$, which could be either induced or not, but must preserve the labelling properties (i.e., if $(x, y) \in E$ is mapped to $(x', y') \in H$ then $(L(x), L(y)) = (L_H(x'), L_H(y'))$). To explain how we take advantage of the algorithms developed for the problem above, we need to introduce the following definitions (adapted from [35]).

Definition B.2. Let $H = (V_H, E_H)$ be an undirected graph; an automorphism is a bijection $\pi : V_H \to V_H$ such that $(x, y) \in E_H$ iff $(\pi(x), \pi(y)) \in E_H$.

Definition B.3. Let $H = (V_H, E_H)$ be an undirected graph; we say that two edges $e = (x, y), e' = (x', y') \in E_H$ belong to the same edge-orbit iff there exists an automorphism that maps $e$ on $e'$.

In order to adapt the algorithms for the labelled query graph matching problem we proceed in the following way:

1) colour the nodes of $G_T$ with a fixed colour (say red);
2) once $e_R \in G_T$ is sampled, colour its endpoint nodes with a different colour (say blue); call the map from the last two points $L_{G_T}$;
3) compute the different edge-orbits of the pattern $H$ (by enumerating the automorphisms of $H$) and, for each edge-orbit, choose an edge, colour its endpoint nodes with the same colour assigned to $e_R$, and keep the colour on the other edges the same as $G_T$; call this map $L_H$;
4) run an algorithm for the labelled query graph matching problem with graph $G_T = (V_T, E_T, L_{G_T})$ and pattern $H = (V_H, E_H, L_H)$;
5) the desired subgraphs ($\mathcal{H}$) are the union over the different edge-orbits enumeration steps.
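Step 3 relies on the edge-orbits of Definition B.3. For the small patterns considered here they can be obtained by brute-force enumeration of the automorphisms, as in the following sketch. The function name `edge_orbits` is hypothetical, and trying all node permutations is an assumption that is only sensible for patterns with a handful of nodes.

```python
from itertools import permutations


def edge_orbits(nodes, edges):
    """Return the edge-orbits of an undirected graph H = (nodes, edges).

    Every permutation of the nodes is tested; it is an automorphism when it
    maps each edge onto an edge (a bijection on a finite edge set).
    """
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    automorphisms = []
    for image in permutations(nodes):
        pi = dict(zip(nodes, image))
        if all(frozenset((pi[x], pi[y])) in edge_set for x, y in edges):
            automorphisms.append(pi)
    # The orbit of an edge is the set of its images under all automorphisms.
    orbits = set()
    for x, y in edges:
        orbits.add(frozenset(frozenset((pi[x], pi[y])) for pi in automorphisms))
    return orbits


triangle = edge_orbits("abc", [("a", "b"), ("b", "c"), ("a", "c")])
path = edge_orbits("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
print(len(triangle), len(path))  # 1 2
```

A triangle has a single edge-orbit, so one labelled matching run per sampled edge suffices, whereas a path with three edges has two orbits (the two outer edges and the middle edge) and needs one run per orbit.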

C IMPLEMENTATION DETAILS

In this section we provide additional implementation details, complementing the description of Section 5.1.

In our implementation, we used two main structures. First, an adjacency list (see footnote 3) that allows querying for an edge between $u, v \in V$ in $O(\log(\min(d_u, d_v)))$. Second, a hashmap that stores, for each static directed edge, the timestamps of the temporal edges that map on that edge, leading to $O(1)$ complexity of querying for the timestamps of a static edge in $G_T$. The initialization of such structures is done in $O(1)$ per processed temporal edge while loading the dataset, by knowing the number of nodes $n$. Many state-of-the-art algorithms exist for the local enumeration of motifs (e.g., [14, 37, 44]); we provide in our code a general algorithm based on the algorithm VF2++ [19]. However, instead of using the general procedure described in Section B.2, in our tests we relied on a simple algorithm that locally enumerates the subgraphs isomorphic to $H$ that contain an edge $e = \{x, y\}$: for triangles the algorithm runs in $O(\min(d_x, d_y) \log(n))$, while when $H$ is a square the algorithm runs in $O(\min(d_x, d_y) d_{max} \log(n))$, with $d_{max}$ the maximum degree of a node in $G_T$.
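The two structures and the local triangle enumeration can be sketched as follows. This is a simplified illustration and not the SNAP-based implementation mentioned above: it uses Python hash sets for adjacency (so membership tests take expected constant time rather than logarithmic time), it assumes there are no self-loops, and the names `build_structures` and `triangles_containing` are hypothetical.

```python
from collections import defaultdict


def build_structures(temporal_edges):
    """Build the undirected projected adjacency and the timestamp hashmap."""
    adj = defaultdict(set)          # node -> neighbours in G_T
    timestamps = defaultdict(list)  # directed static edge -> timestamps
    for x, y, t in temporal_edges:
        adj[x].add(y)
        adj[y].add(x)
        timestamps[(x, y)].append(t)
    return adj, timestamps


def triangles_containing(adj, x, y):
    """List the triangles {x, y, z} of G_T that contain the edge {x, y}."""
    # Scan the neighbours of the endpoint with the smaller degree.
    small, large = (x, y) if len(adj[x]) <= len(adj[y]) else (y, x)
    return sorted(
        (x, y, z)
        for z in adj[small]
        if z != large and z != small and z in adj[large]
    )


temporal_edges = [(1, 2, 10), (2, 3, 11), (3, 1, 12), (1, 4, 13), (2, 1, 14)]
adj, timestamps = build_structures(temporal_edges)
print(triangles_containing(adj, 1, 2))  # [(1, 2, 3)]
```

Scanning only the lower-degree endpoint is what gives the $\min(d_x, d_y)$ factor in the running time stated above.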

D PROOFS

In this section we provide the proofs not included in the main text. First we recall that $C_{M_i}(e)$ is the number of $\delta$-instances of motif $M_i$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, from $T$ whose undirected projected static network contains edge $e \in G_T$, i.e.,

$$C_{M_i}(e) = \sum_{h \subseteq G_T,\, h \simeq H \,:\, e \in h} |\mathcal{U}(h, M_i)|, \quad e \in G_T,$$

where $\mathcal{U}(h, M_i)$ is the set of $\delta$-instances of motif $M_i$ whose static projected graph is $h \subseteq G_T$. Since the static projected graph of every $\delta$-instance has exactly $|E_H|$ edges, each $\delta$-instance is counted once for each of those edges, and the following formula holds for each motif $M_i$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$:

$$\sum_{e \in G_T} C_{M_i}(e) = |E_H| \, C_{M_i}.$$

This relation will be the key for proving the unbiasedness of the estimates provided by odeN, as we show next.

Footnote 3 (Appendix C): We used the one provided by SNAP: https://github.com/snap-stanford/snap; more efficient implementations can also be adopted, improving the global running times.

Proof of Lemma 4.1. First let us consider the expectation of $X^j_{M_i}$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, $j = 1, \dots, s$:

$$\mathbb{E}[X^j_{M_i}] = \mathbb{E}\left[\frac{1}{|E_H|} \sum_{e \in G_T} \frac{C_{M_i}(e) X_e}{p_e}\right] = \frac{1}{|E_H|} \sum_{e \in G_T} \frac{C_{M_i}(e)\, \mathbb{E}[X_e]}{p_e} = C_{M_i},$$

where we used the linearity of expectation and the facts that $\mathbb{E}[X_e] = p_e$, $e \in G_T$, and $\sum_{e \in G_T} C_{M_i}(e) = |E_H| C_{M_i}$; thus $X^j_{M_i}$, $i = 1, \dots, |\mathcal{M}(H,\ell)|$, $j = 1, \dots, s$, are unbiased estimates of $C_{M_i}$. Combining such result with the definition of $C'_{M_i}$ we obtain

$$\mathbb{E}[C'_{M_i}] = \mathbb{E}\left[\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}\right] = \frac{1}{s} \sum_{j=1}^{s} \mathbb{E}[X^j_{M_i}] = \frac{s\, C_{M_i}}{s} = C_{M_i}$$

by the linearity of expectation. □
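The identity can be checked exactly on a toy instance. The sketch below is an illustration under stated assumptions, not part of odeN: the static network is the complete graph on four nodes, $H$ is a triangle, each triangle is given an arbitrary made-up number of $\delta$-instances, and exactly one edge is sampled per step with the (also made-up) probabilities in `probs`.

```python
from fractions import Fraction
from itertools import combinations


def per_edge_counts(instances_per_subgraph, edges):
    """C(e): number of instances whose static projection contains edge e."""
    c_e = {e: 0 for e in edges}
    for subgraph_edges, n_instances in instances_per_subgraph.items():
        for e in subgraph_edges:
            c_e[e] += n_instances
    return c_e


def exact_expectation(instances_per_subgraph, probs, num_edges_h):
    """E[X] when edge e is sampled w.p. p_e and X = C(e) / (|E_H| p_e)."""
    c_e = per_edge_counts(instances_per_subgraph, probs)
    return sum(p * Fraction(c_e[e]) / (num_edges_h * p) for e, p in probs.items())


# Toy data (hypothetical): K4 as G_T, its four triangles, instance counts.
nodes = range(4)
edges = [frozenset(e) for e in combinations(nodes, 2)]
triangles = [
    tuple(frozenset(pair) for pair in combinations(tri, 2))
    for tri in combinations(nodes, 3)
]
weights = dict(zip(triangles, [5, 0, 2, 7]))
probs = dict(zip(edges, [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]))

total = sum(weights.values())
print(exact_expectation(weights, probs, 3) == total)  # True
```

The expectation equals the total count for any choice of strictly positive sampling probabilities, since $p_e$ cancels and each instance is counted once per edge of its triangle.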

Proof of Lemma 4.2. We need to bound the variance of the estimate $C'_{M_i}$. First we rewrite the estimator:

$$C'_{M_i} = \frac{1}{s} \sum_{j=1}^{s} \frac{1}{|E_H|} \sum_{e \in G_T} \frac{C_{M_i}(e) X_e}{p_e} = \frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}.$$

Since the $s$ variables $X^j_{M_i}$, $j \in [1, s]$, are independent (edges are drawn independently at each iteration of the outer for loop in Algorithm 1) and identically distributed, it holds $\mathrm{Var}(C'_{M_i}) = \mathrm{Var}\left(\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}\right) = \frac{1}{s} \mathrm{Var}(X_{M_i})$, where $X_{M_i}$ denotes any one of the variables $X^j_{M_i}$; we thus only need to compute the variance of the variable $X_{M_i}$. Let us recall that $\mathrm{Var}(X_{M_i}) = \mathbb{E}[X^2_{M_i}] - \mathbb{E}[X_{M_i}]^2 = \mathbb{E}[X^2_{M_i}] - C^2_{M_i}$ by the previous lemma. We will now bound $\mathbb{E}[X^2_{M_i}]$:

$$\mathbb{E}[X^2_{M_i}] = \mathbb{E}\left[\frac{1}{|E_H|^2} \sum_{e_1 \in G_T} \sum_{e_2 \in G_T} \frac{C_{M_i}(e_1) C_{M_i}(e_2) X_{e_1} X_{e_2}}{p_{e_1} p_{e_2}}\right] = \frac{1}{|E_H|^2} \sum_{e_2 \in G_T} C^2_{M_i}(e_2) \frac{1}{p_{e_2}} \le \frac{1}{|E_H|^2} \sum_{e_2 \in G_T} C^2_{M_i}(e_2) \frac{m}{\alpha}$$

$$= \frac{m}{\alpha |E_H|^2} \sum_{e_2 \in G_T} C^2_{M_i}(e_2) \overset{(1.)}{\le} \frac{m}{\alpha |E_H|^2} |E_H| C^2_{M_i} = \frac{m\, C^2_{M_i}}{\alpha |E_H|},$$

where we used the linearity of expectation, the fact that $\mathbb{E}[X_{e_1} X_{e_2}] = p_{e_1}$ only for $e_1 = e_2$ and is 0 otherwise, and a bound on the minimum probability $p_e$, namely $p_e \ge \alpha/m$, $\forall e \in G_T$, for $\alpha$ defined as in Section 4.4. In (1.) we used the fact that $C_{M_i}(e) = \lambda_e C_{M_i}$, $e \in G_T$, $\lambda_e \in [0, 1]$; then

$$\sum_{e_2 \in G_T} C^2_{M_i}(e_2) = \sum_{e_2 \in G_T} \lambda^2_{e_2} C^2_{M_i} \le C^2_{M_i} \sum_{e_2 \in G_T} \lambda_{e_2} = |E_H| C^2_{M_i},$$

since $\lambda_{e_2} \in [0, 1]$ and, further, $\sum_{e \in G_T} \lambda_e = |E_H|$ by $\sum_{e \in G_T} \lambda_e C_{M_i} = |E_H| C_{M_i}$. Thus the variance of $X_{M_i}$ is bounded by:

$$\mathrm{Var}(X_{M_i}) \le \frac{m\, C^2_{M_i}}{\alpha |E_H|} - C^2_{M_i} = C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right).$$

Combining everything together we obtain that $\mathrm{Var}(C'_{M_i}) \le \frac{C^2_{M_i}}{s} \left(\frac{m}{\alpha |E_H|} - 1\right)$, concluding the proof. □

Proof of Theorem 4.3. Let us fix $M_i$, $i \in [1, |\mathcal{M}(H,\ell)|]$; we first show a bound on the probability $\mathbb{P}[|C'_{M_i} - C_{M_i}| \ge \varepsilon C_{M_i}]$. We want to derive such bound through the application of Bennett's inequality to the summation $\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i}$. We already know that $\mathbb{E}[X^j_{M_i}] = C_{M_i}$ and $\mathbb{E}[(X^j_{M_i} - C_{M_i})^2] \le C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right) = v^2_j$ for $j = 1, \dots, s$. Moreover, since exactly one edge is sampled at each step (so exactly one $X_e$ equals 1), $p_e \ge \alpha/m$, and $C_{M_i}(e) \le C_{M_i}$, it holds:

$$X^j_{M_i} = \frac{1}{|E_H|} \sum_{e \in G_T} \frac{C_{M_i}(e) X_e}{p_e} \le \frac{m}{\alpha |E_H|} \sum_{e \in G_T} C_{M_i}(e) X_e \le \frac{m\, C_{M_i}}{\alpha |E_H|}.$$

As argued by [40], Bennett's inequality holds even if we only have an upper bound on the variance of the estimates. Therefore, let us compute the quantities needed to apply Bennett's bound (see [40] for the statement). Clearly $B = C_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)$, combining what we already showed with the unbiasedness of $X^j_{M_i}$; moreover $v \le v^2_j$, since the bound $v^2_j$ is equal for each $j \in [1, s]$. Then,

$$\frac{v^2_j}{B^2} = \frac{C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)}{C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)^2} = \frac{1}{\frac{m}{\alpha |E_H|} - 1},$$

and also

$$\frac{tB}{v^2_j} = \frac{\varepsilon C_{M_i} \cdot C_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)}{C^2_{M_i} \left(\frac{m}{\alpha |E_H|} - 1\right)} = \varepsilon.$$

Combining everything together, by Bennett's inequality we obtain

$$\mathbb{P}\left(\left|\frac{1}{s} \sum_{j=1}^{s} X^j_{M_i} - C_{M_i}\right| \ge \varepsilon C_{M_i}\right) \le 2 \exp\left(-\frac{s}{\frac{m}{\alpha |E_H|} - 1}\, h(\varepsilon)\right). \quad (1)$$

Now, let $A_i$ = "$|C'_{M_i} - C_{M_i}| \ge \varepsilon C_{M_i}$", $i = 1, \dots, |\mathcal{M}(H,\ell)|$; namely, $A_i$ is the event that the estimate of motif $M_i$ is at distance at least $\varepsilon C_{M_i}$ from $C_{M_i}$. We already showed that for an arbitrary $A_i$ inequality (1) holds for $\mathbb{P}[A_i]$, so

$$\mathbb{P}\left(\bigcup_{i=1}^{|\mathcal{M}(H,\ell)|} A_i\right) \le \sum_{i=1}^{|\mathcal{M}(H,\ell)|} \mathbb{P}[A_i] \le |\mathcal{M}(H,\ell)| \cdot 2 \exp\left(-\frac{s}{\frac{m}{\alpha |E_H|} - 1}\, h(\varepsilon)\right) \le \eta,$$

combining the union bound and the choice of $s$ as in the statement. □
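Solving the last inequality for $s$ gives $s \ge \left(\frac{m}{\alpha |E_H|} - 1\right) \ln\left(2 |\mathcal{M}(H,\ell)| / \eta\right) / h(\varepsilon)$. The sketch below evaluates this threshold numerically. It assumes the standard form $h(x) = (1 + x) \ln(1 + x) - x$ of the function in Bennett's inequality, which is not restated in this appendix; the function names and the parameter values in the usage lines are purely illustrative.

```python
import math


def bennett_h(x):
    """h(x) = (1 + x) ln(1 + x) - x (assumed form of h, see lead-in)."""
    return (1 + x) * math.log(1 + x) - x


def samples_needed(m, alpha, num_edges_h, num_motifs, eps, eta):
    """Smallest integer s for which the union bound above is at most eta."""
    scale = m / (alpha * num_edges_h) - 1
    if scale <= 0:
        raise ValueError("the bound requires m / (alpha * |E_H|) > 1")
    return math.ceil(scale * math.log(2 * num_motifs / eta) / bennett_h(eps))


# Illustrative values only: 10^6 temporal edges, triangle pattern, 8 motifs.
s = samples_needed(m=10**6, alpha=0.5, num_edges_h=3, num_motifs=8, eps=0.1, eta=0.1)
print(s)
```

Because $h(\varepsilon)$ behaves like $\varepsilon^2 / 2$ for small $\varepsilon$, halving $\varepsilon$ roughly quadruples the number of samples, while the dependence on the number of motifs and on $\eta$ is only logarithmic.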

E CASE STUDY - MOTIF ANALYSIS

[Figure 7 drawings: the eight motifs $M_1, \dots, M_8$ on nodes $v_1, v_2, v_3$, with the edge labels 1, 2, 3 giving the ordering of the edges.]

Figure 7: Graphical representation of the motifs in Figure (6a).

Motifs on the Snapshots of the Facebook Network. Thanks to our analysis in Section 5.4, we are able to characterize user behaviour on the Facebook network of wall posts by looking at different motifs (topologies and their orderings) and their counts. We first show in Figure 7 the motifs corresponding to the labels of Figure (6a) in Section 5.4. Then, letting $H = \{v_1, v_2, v_3\}$ be a triangle, the most frequent motifs (i.e., those with the highest normalized counts on each snapshot) seem to share a common pattern: a first node ($v_3$), after posting on $v_1$'s (or $v_2$'s) wall, triggers $v_1$ (or $v_2$) to post on the remaining node's wall, with $v_1$ posting also on such node's wall to close the triangle, as captured by motifs $M_3$, $M_7$ and $M_8$. Observe that by identifying the users that mostly act as $v_3$ in the occurrences of such frequent motifs, one is able to identify, for example, the nodes most engaged in spreading most of the information over the Facebook network in a short period of time (recall that we set $\delta$ to one day). Not surprisingly, motif $M_5$ is the least frequent one, since its occurrences require node $v_2$ to post on $v_3$'s wall before receiving the post from $v_2$, therefore without being "triggered" by such node, which received the post from $v_3$. Interestingly, without considering the orderings of occurrence among such patterns we would not be able to distinguish between the most frequent motifs and the least frequent ones, since, for example, $M_4$ and $M_5$ have the same static directed graph structure but very different counts on the different snapshots of the Facebook network.
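The role of edge orderings can be illustrated with a brute-force counter on a toy network. This is only a sketch for intuition, not odeN and not an efficient algorithm: it assumes the common definition of a $\delta$-instance (edges with strictly increasing timestamps, all within a window of length $\delta$, matched to the motif edges in order through an injective node mapping). The function `count_instances`, the toy edges, and the two orderings below are hypothetical and do not correspond to the motifs of Figure 7.

```python
from itertools import combinations


def count_instances(edges, motif, delta):
    """Count delta-instances of `motif` among temporal `edges` by brute force.

    edges: list of (source, target, timestamp) triples.
    motif: list of (x, y) motif edges, listed in their temporal order.
    delta: maximum allowed time span between first and last matched edge.
    """
    ordered = sorted(edges, key=lambda e: e[2])
    count = 0
    # Each combination of the time-sorted edges is one candidate edge sequence.
    for combo in combinations(ordered, len(motif)):
        times = [t for _, _, t in combo]
        # Timestamps must be strictly increasing ...
        if any(a >= b for a, b in zip(times, times[1:])):
            continue
        # ... and the whole sequence must fit in a window of length delta.
        if times[-1] - times[0] > delta:
            continue
        mapping = {}
        consistent = True
        for (x, y), (u, v, _) in zip(motif, combo):
            for motif_node, net_node in ((x, u), (y, v)):
                # A motif node must always map to the same network node.
                if mapping.setdefault(motif_node, net_node) != net_node:
                    consistent = False
        # Distinct motif nodes must map to distinct network nodes.
        if consistent and len(set(mapping.values())) == len(mapping):
            count += 1
    return count


# Toy network: c posts on a's wall, then a posts on b's, then c posts on b's.
toy_edges = [("c", "a", 1), ("a", "b", 2), ("c", "b", 3)]
# Two orderings of the same static directed triangle on motif nodes 1, 2, 3.
ordering_a = [(3, 1), (1, 2), (3, 2)]
ordering_b = [(1, 2), (3, 1), (3, 2)]

if __name__ == "__main__":
    print(count_instances(toy_edges, ordering_a, delta=5))
    print(count_instances(toy_edges, ordering_b, delta=5))
```

Both orderings have the same static directed triangle, yet only the first one occurs in the toy network, which is the kind of distinction that is lost when orderings are ignored.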

[Figure 8 image: eight temporal motifs on nodes 1, 2, 3, with edges labeled by the timestamps $t_1, \dots, t_6$. The values reported in the panels are listed below as exact count $C$ (relative error of odeN in brackets) and Z-score $Z$.]

Top row (highest Z-scores):
$M^Z_1$: $C$ = 1188894 (7.1%), $Z$ = 1496494
$M^Z_2$: $C$ = 1072282 (7.1%), $Z$ = 1215602
$M^Z_3$: $C$ = 1018769 (7.1%), $Z$ = 1215069
$M^Z_4$: $C$ = 1110062 (7.1%), $Z$ = 1165630

Bottom row (lowest Z-scores):
$M^Z_5$: $C$ = 8825 (2.5%), $Z$ = 666
$M^Z_6$: $C$ = 6170 (2.4%), $Z$ = 800
$M^Z_7$: $C$ = 5535 (2.5%), $Z$ = 907
$M^Z_8$: $C$ = 3890 (2.3%), $Z$ = 908

Figure 8: Graphical representation of the 4 motifs with the highest (top) and lowest (bottom) Z-scores in Figure (6c) for $\ell = 6$. For each motif we report the exact count (which we computed for such representation) and, in brackets, the relative error of the approximation obtained with odeN; we additionally report the Z-score of each motif as obtained in Section 5.4 (i.e., by using only odeN).

Motifs with varying $\ell$ - Frequent vs Infrequent. In this section we briefly discuss the properties of, and show visually, the motifs with the highest and lowest Z-scores obtained in Section 5.4 on the Facebook wall post network for $\ell = 6$. The motifs are reported in Figure 8, where we show the top 4 motifs ranked by Z-score on the top and the 4 motifs with the lowest Z-scores on the bottom. Note that the top 4 motifs share a similar structure, both temporal and topological. Interestingly, in the original paper [46] the authors noted that there were very few pairs of nodes that exchanged more than 5 messages (with median 2). The most frequent temporal motifs seem to involve a pair of highly active nodes (which exchanged many messages between them, i.e., more than 4) and a third node that is reached by such pair of nodes. Unfortunately, we do not have the original posts, so we cannot better understand the information captured by such frequent motifs, but it is striking that the top 4 motifs all share similar properties, especially in the orderings of their edges. Additionally, triangles involving nodes that are pairwise very active seem to be the rarest type of interaction, as captured by the 4 motifs with the lowest Z-scores, reported in Figure 8 (bottom).
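For readers unfamiliar with Z-scores, the following minimal sketch shows how such a score is commonly computed and used for ranking. It is an assumption-laden illustration: the null model of Section 5.4 is not reproduced here, so the counts under the null model are passed in as a plain list, the population standard deviation is used, and the names `z_score` and `rank_by_z` are hypothetical.

```python
import statistics


def z_score(observed, null_counts):
    """Standard Z-score of `observed` against counts from a null model.

    null_counts: counts of the same motif on randomized networks
    (hypothetical input; the paper's null model is not specified here).
    """
    mu = statistics.fmean(null_counts)
    sigma = statistics.pstdev(null_counts)  # population standard deviation
    if sigma == 0:
        raise ValueError("null counts have zero variance; Z-score undefined")
    return (observed - mu) / sigma


def rank_by_z(scores):
    """Return motif names sorted from highest to lowest Z-score."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    scores = {
        "motif_a": z_score(10, [4, 6]),
        "motif_b": z_score(5, [4, 6]),
        "motif_c": z_score(3, [4, 6]),
    }
    ranking = rank_by_z(scores)
    print(ranking[:1], ranking[-1:])  # most and least over-represented motif
```

Selecting the first and last entries of the ranking mirrors the top and bottom rows of Figure 8.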