learning opinion dynamics in social networks · learning opinion dynamics in social networks abir...

20
Learning Opinion Dynamics in Social Networks Abir De 1 , Isabel Valera 2 , Niloy Ganguly 1 , Sourangshu Bhattacharya 1 , and Manuel Gomez-Rodriguez 2 1 IIT Kharagpur, {abir.de, niloy, sourangshu}@cse.iitkgp.ernet.in 2 Max Plank Institute for Software Systems, {ivalera, manuelgr}@mpi-sws.org Abstract Social media and social networking sites have become a global pinboard for exposition and discussion of news, topics, and ideas, where social media users often form their opinion about a particular topic by learning information about it from her peers. In this context, whenever a user posts a message about a topic, we observe a noisy estimate of her current opinion about it, however, the influence the user may have on other users’ opinions is hidden. In this paper, we introduce SLANT, a probabilistic modeling framework of opinion dynamics, which allows the underlying opinion of a user to be modulated by those expressed by her neighbors over time. We then identify a set of conditions under which users’ opinions converge to a steady state, find a linear relation between the initial and steady state opinions, and develop an efficient estimation method to fit the model parameters from historical fine-grained opinion and information diffusion event data. Experiments on data gathered from Twitter, Reddit and Amazon show that our model provides a good fit to the data and more accurate nowcasting and forecasting than alternatives. 1 Introduction Social media and social networking sites are increasingly used by people to express their opinions, give their “hot takes”, on the latest breaking news, political issues, sports events, and new products. As a consequence, there has been an increasing interest on leveraging social media and social networking sites to sense and predict opinions, as well as understand opinion dynamics. For example, political parties routinely use social media to sense people’s opinion about their political discourse 1 ; quantitative investment firms measure investor sentiment and trade using social media Karppi & Crawford (2015); and, corporations leverage brand sentiment, estimated from users’ posts, likes and shares in social media and social networking sites, to design their marketing campaigns 2 . In this context, multiple methods for sensing opinions, typically based on sentiment analysis Pang & Lee (2008), have been proposed in recent years O´Connor et al. (2010); Tumasjan et al. (2010). However, methods for accurately predicting opinions are still scarce Das et al. (2014); De et al. (2014), despite the extensive literature on theoretical models of opinion dynamics Clifford & Sudbury (1973); DeGroot (1974). In this paper, we develop a novel modeling framework of opinion dynamics, SLANT 3 , that does not only fit fine grained opinion data from social media and social networking sites, but also provides accurate predictions of their individual users’ opinions. The proposed model is based on two simple intuitive ideas: i) users’ opinions are hidden until they decide to share it with their friends (or neighbors); and, ii) users may form and update their opinions about a particular topic by learning from the opinions shared by their friends. Here, we distinguish between two types of social influence: temporal influence and informational influence. The former accounts for the impact that different users have on the activity level in the network Farajtabar et al. (2014) – some users may either express their opinions more frequently than others, or be more influential and thus trigger a greater number of follow-ups every time they express their opinion. The latter, i.e., informational influence, accounts for the idea that a user may update her opinion about a particular topic by learning from the information and opinions expressed by their friends Raven (1993). While informational influence is one of the main underlying premises used by many well-known theoretical models 1 {http://www.nytimes.com/2012/10/08/technology/campaigns-use-social-media-to-lure-younger-voters.html} 2 http://www.nytimes.com/2012/07/31/technology/facebook-twitter-and-foursquare-as-corporate-focus-groups.html 3 Slant is a particular point of view from which something is seen or presented. 1 arXiv:1506.05474v2 [cs.SI] 1 Apr 2016

Upload: others

Post on 24-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

Learning Opinion Dynamics in Social Networks

Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and ManuelGomez-Rodriguez2

1IIT Kharagpur, abir.de, niloy, [email protected] Plank Institute for Software Systems, ivalera, [email protected]

Abstract

Social media and social networking sites have become a global pinboard for exposition and discussion of news,topics, and ideas, where social media users often form their opinion about a particular topic by learning informationabout it from her peers. In this context, whenever a user posts a message about a topic, we observe a noisy estimateof her current opinion about it, however, the influence the user may have on other users’ opinions is hidden. In thispaper, we introduce SLANT, a probabilistic modeling framework of opinion dynamics, which allows the underlyingopinion of a user to be modulated by those expressed by her neighbors over time. We then identify a set of conditionsunder which users’ opinions converge to a steady state, find a linear relation between the initial and steady stateopinions, and develop an efficient estimation method to fit the model parameters from historical fine-grained opinionand information diffusion event data. Experiments on data gathered from Twitter, Reddit and Amazon show that ourmodel provides a good fit to the data and more accurate nowcasting and forecasting than alternatives.

1 IntroductionSocial media and social networking sites are increasingly used by people to express their opinions, give their “hottakes”, on the latest breaking news, political issues, sports events, and new products. As a consequence, there has beenan increasing interest on leveraging social media and social networking sites to sense and predict opinions, as well asunderstand opinion dynamics. For example, political parties routinely use social media to sense people’s opinion abouttheir political discourse1; quantitative investment firms measure investor sentiment and trade using social media Karppi& Crawford (2015); and, corporations leverage brand sentiment, estimated from users’ posts, likes and shares in socialmedia and social networking sites, to design their marketing campaigns2. In this context, multiple methods for sensingopinions, typically based on sentiment analysis Pang & Lee (2008), have been proposed in recent years O´Connoret al. (2010); Tumasjan et al. (2010). However, methods for accurately predicting opinions are still scarce Das et al.(2014); De et al. (2014), despite the extensive literature on theoretical models of opinion dynamics Clifford & Sudbury(1973); DeGroot (1974).

In this paper, we develop a novel modeling framework of opinion dynamics, SLANT3, that does not only fit finegrained opinion data from social media and social networking sites, but also provides accurate predictions of theirindividual users’ opinions. The proposed model is based on two simple intuitive ideas: i) users’ opinions are hiddenuntil they decide to share it with their friends (or neighbors); and, ii) users may form and update their opinionsabout a particular topic by learning from the opinions shared by their friends. Here, we distinguish between twotypes of social influence: temporal influence and informational influence. The former accounts for the impact thatdifferent users have on the activity level in the network Farajtabar et al. (2014) – some users may either express theiropinions more frequently than others, or be more influential and thus trigger a greater number of follow-ups every timethey express their opinion. The latter, i.e., informational influence, accounts for the idea that a user may update heropinion about a particular topic by learning from the information and opinions expressed by their friends Raven (1993).While informational influence is one of the main underlying premises used by many well-known theoretical models

1http://www.nytimes.com/2012/10/08/technology/campaigns-use-social-media-to-lure-younger-voters.html2http://www.nytimes.com/2012/07/31/technology/facebook-twitter-and-foursquare-as-corporate-focus-groups.html3Slant is a particular point of view from which something is seen or presented.

1

arX

iv:1

506.

0547

4v2

[cs

.SI]

1 A

pr 2

016

Page 2: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

of opinion dynamics Clifford & Sudbury (1973); DeGroot (1974), temporal influence has been ignored by models ofopinion dynamics, despite its relevance on closely related processes such as information diffusion Gomez-Rodriguezet al. (2011).

More in detail, we model each user’s latent opinion as a continuous-time stochastic process, which is modulatedover time by the opinions asynchronously expressed by her neighbors as sentiment messages, by means of informa-tional influence. Then, every time a user expresses an opinion by posting a sentiment message, she reveals a noisyestimate of her current latent opinion and may trigger further discussions in her neighborhood, by means of temporalinfluence. The proposed formulation captures characteristic properties of the opinion dynamics, previously studied inthe literature Das et al. (2014), such as stubbornness, conformity and compromise. Moreover, for several instances ofour modeling framework, we identify the conditions under which opinions converge to a steady state of consensus orpolarization. In such cases, we derive a closed-form expression for the relationship between the users’ steady stateopinions and the initial opinions they start with. Then, we develop an efficient method to find the optimal modelparameters that jointly maximize the likelihood of an observed set of sentiment messages. Finally, we experiment onboth synthetic and real data gathered from Twitter, Reddit and Amazon, and show that our model provides a goodfit to the data and more accurate opinion nowcasting and forecasting than several state of the art models of opiniondynamics De et al. (2014); DeGroot (1974); Yildiz et al. (2010).

Related work. There is an extensive line of work on theoretical models of opinion dynamics and opinion forma-tion Axelrod (1997); Bindel et al. (2011); Clifford & Sudbury (1973); DeGroot (1974); Hegselmann & Krause (2002);Holme & Newman (2006); Yildiz et al. (2010). However, previous works typically share the following limitations: (i)they do not distinguish between latent opinion and sentiment (or expressed opinion), which is a noisy observation ofthe opinion (e.g., thumbs up/down, text sentiment); (ii) they do not distinguish between informational and temporal in-fluence; (iii) their model parameters are difficult to learn from real fine-grained data and instead are set arbitrarily, as aconsequence, they provide inaccurate fine-grained predictions; (iv) they focus on analyzing only the steady state of theusers’ opinion, neglecting the transient behavior of real opinion dynamics; and, (v) they consider users’ opinions to beupdated synchronously in discrete time, however, opinions may be updated asynchronously since they are expressedasynchronously Gomez-Rodriguez et al. (2011). More recently, there have been some efforts on designing modelsthat overcome some of the above limitations and provide more accurate predictions Das et al. (2014); De et al. (2014).However, they do not distinguish between opinion and sentiment nor between informational and temporal influence,and still consider opinions to be updated synchronously in discrete time. Our modeling framework addresses the abovelimitations and, by doing so, achieves more accurate fine-grained predictions.

2 Proposed ModelIn a social network, whenever a user posts a message about a topic, she is revealing a noisy estimate of her current(latent) opinion about the topic. Here, we think of users’ opinions as stochastic processes that may evolve over time,and think of the sentiment users express in each message as noisy samples from these stochastic processes localizedin time, as illustrated in Figure 1. Our model aims to uncover the evolution of these processes by modeling bothinformational influence and temporal influence. Next, we formulate our model, starting from the data it is designedfor.

Given a directed social network G = (V, E), we record each message as e := (u,m, t), where the tripletmeans that the user u ∈ V posted a message with sentiment m at time t. Given a collection of messages e1 =(u1,m1, t1), . . . , en = (un,mn, tn), the historyHu(t) gathers all messages posted by user u up to but not includingtime t, i.e.,

Hu(t) = ei = (ui,mi, ti)|ui = u and ti < t, (1)

andH(t) := ∪u∈VHu(t) denotes the entire history of messages up to but not including time t.We represent the users’ opinions as a multidimensional (latent) stochastic process x∗(t), in which the u-th entry,

xu(t) ∈ R, represents the opinion of user u at time t and the sign ∗ means that the opinion x∗u(t) may depend onthe history H(t). Then, every time a user u posts a message at time t, we draw its sentiment m from a sentimentdistribution p(m|x∗u(t)). Here, we can also think of the sentiment m of each message as samples from a noisystochastic process mu(t) ∼ p(mu(t)|x∗u(t)). Further, we represent the message times by a set of counting processes.In particular, we denote the set of counting processes as a vector N(t), in which the u-th entry, Nu(t) ∈ 0 ∪ Z+,

2

Page 3: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

t Alice’s

latent opinion, x*(t)

Bob Charly

t Bob’s and Charlie’s

expressed opinions, mi

Alice

Alice’s expressed opinions, mI

t

Figure 1: Our model of opinion dynamics in social networks. Alice has a latent opinion x∗(t) and every time sheposts a message i at time ti, we observe the sentiment mi expressed in the message, which depends on x∗(ti). WhenAlice receives messages posted by her in-neighbors, Bob and Charly, she may update her opinion x∗(t), by means ofinformational influence, and be stimulated to post more messages through temporal influence.

counts the number of sentiment messages user u posted up to but not including time t. Then, we can characterize themessage rate of the users using their corresponding intensities as

E[dN(t) |H(t)] = λ∗(t) dt, (2)

where dN(t) := ( dNu(t) )u∈V denotes the number of message events per user in the window [t, t+dt) and λ∗(t) :=( λ∗u(t) )u∈V denotes the vector of intensities associated to all the users, which may depend on the history H(t).Moreover, we denote the neighborhood of user u byN (u). Next, we specify the functional form of the users’ opinionsx∗(t), the sentiment distribution p(m|x∗u(t)), and the intensity functions λ∗(t).

Stochastic process for opinion. The opinion x∗u(t) of a user u at time t takes the following form:

x∗u(t) = αu +∑

v∈N (u)

avu∑

ei∈Hv(t)mig(t− ti) = αu +

v∈N (u)

avu (g(t) ? (mv(t)dNv(t))),

where the first term, αu ∈ R, models the original opinion a user u starts with, the second term, with avu ∈ R, modelsupdates in user u’s opinion due to informational influence, i.e., the influence that previous messages with opinions mi

posted by the users u follows (her followees) has on her opinion. Here, g(t) denotes a nonnegative triggering kernel,which models the decay of informational influence over time, and ? denotes the convolution operation. Note that theopinion x∗u(t) of user u depends on the number of message events per neighbor over time, Nv(t)v∈N (u), which aredriven by the message intensities λ∗v(t)v∈N (u). In other words, the evolution of the user u’s opinion over time, aswell as her steady-state opinion (refer to Section 3), depends on the pace at which her neighbors express their opinions.

This functional form allows for stubbornness, conformity and compromise, which have been identified as charac-teristic properties of opinion dynamics Das et al. (2014). In particular, stubborn users do not update their opinions bymeans of informational influence, and can be characterized by avu = 0 for all v; conforming users do not have anopinion at the beginning, and can be characterized by αu = 0; and compromised users have a starting opinion that getupdated over time, by means of informational influence, and can be characterized by αu 6= 0 and avu 6= 0 for some v.

Sentiment distribution. The particular choice of sentiment distribution p(m|x∗u(t)) depends on the recorded data.In this paper, we consider continuous sentiment, m ∈ R, which fits well scenarios in which sentiment is extractedfrom text using sentiment analysis Hannak et al. (2012), and, if not specified otherwise, adopt the following sentimentdistribution:

p(m|xu(t)) ∼ N (xu(t), σu) (3)

Remarkably, our estimation method, described in Section 4, can be adapted to any log-concave sentiment distribution.For example, for discrete binary sentiment data, such as upvotes and downvotes, one may opt for a logistic distribution.

3

Page 4: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

Intensity for messages. There is a wide variety of message intensity functions one can choose from to model eachuser’s intensity λ∗u(t) Aalen et al. (2008). In this work, we consider two of the most popular characteristic functionalforms used in the growing literature on social activity modeling using point processes Zhou et al. (2013); Farajtabaret al. (2014); Valera & Gomez-Rodriguez (2015):

I. Poisson process. The intensity is assumed to be independent of the historyH(t) and constant, i.e., λ∗u(t) = µu,where µu > 0 is a scale parameter.

II. Multivariate Hawkes processes. The intensity captures a mutual excitation phenomena between messageevents and depends on the whole history of message events ∪v∈u∪N (u)Hv(t) before t:

λ∗u(t) = µu +∑

v∈u∪N (u)

bvu∑

ei∈Hv(t)κ(t− ti) = µu +

v∈u∪N (u)

bvu (κ(t) ? dNv(t)),

where the first term, µu > 0, models the publication of messages by user u on her own initiative, and the secondterm, with bvu > 0, models the publication of additional messages by user u due to temporal influence, i.e., theinfluence that previous messages posted by users that u follows has on her intensity. Here, κ(t) is a nonnegativetriggering kernel modeling the decay of temporal influence over time.

3 Model PropertiesIn this section, we identify under which conditions users’ average opinion, EH(t)[x

∗(t)], converges to a steady stateand, if so, find the steady state opinion limt→∞ EH(t)[x

∗(t)]. Our theoretical results assume that the sentiment iscontinuous and the sentiment distribution satisfies that E[m|x∗u(t)] = x∗u(t). In this section, we write Ht = H(t) tolighten the notation and denote the eigenvalues of a matrixX by ξ(X).

First, we show that the behavior of the average opinion satisfies the following (proven in Appendix A):

Lemma 1. The average opinion EHt [x∗(t)] in the model of opinion dynamics defined by Eqs. ?? and ?? with expo-nential triggering kernels with parameters ω and ν satisfies the following differential equation:

dEHt [x∗(t)]dt

= AEHt [x∗(t) λ∗(t)]− ωEHt [x∗(t)] + ωα, (4)

whereA = (avu)v,u∈G and the sign denotes pointwise product.

Next, we carry on an analysis of convergence of our model for both above mentioned functional forms of messageintensities. However, since a general analysis of convergence for multivariate Hawkes seems difficult, we focus on thecase when bvu = 0 for all v, u ∈ G, v 6= u, and leave the general analysis as future work.

I. Poisson intensity. Consider each user’s messages follow a Poisson process with rate µu. Then, the average opinionand the steady state average opinion are given by (proven in Appendices B-C):

Theorem 2. Given the conditions of Lemma 1 and λ∗u(t) = µu for all u ∈ G, then,

EHt [x∗(t)] =[eCt + ωC−1

(eCt − I

) ]α, (5)

where Λ1 := diag[µ] and C = AΛ1 − ωI .

Theorem 3. Given the conditions of Theorem 2, if Re[ξ(AΛ1)] < ω, then,

x∗p = limt→∞

EHt [x∗(t)] =

(I − AΛ1

w

)−1α. (6)

The above results indicate that the steady state opinions are nonlinearly related to the parameter matrix A, whichdepends on the network structure, and the message rates µ, which in this case are assumed to be constant and inde-pendent on the network structure. Figure 3 provides empirical evidence of these results.

II. Multivariate Hawkes Process. Consider each user’s messages follow a multivariate Hawkes process, given byEq. ??, and bvu = 0 for all v, u ∈ G, v 6= u. Then, the steady state average opinion is given by (proven in Appendix D):

4

Page 5: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

0.2

0.4

0.6

0.8

1

025 50 100 200 400 800

MSE(α

,A)→

Avg. no. of events per node

HomophilyHeterophilyCore-periphery

]]]

(a) Informational influence

0.1

0.2

0.3

0.4

0.5

025 50 100 200 400 800

MSE(µ,B

)→

Avg. no. of events per node

HomophilyHeterophilyCore-periphery

]]]

(b) Temporal influence

Figure 2: Performance of model estimation for several 512-node kronecker networks in terms of mean squared errorbetween estimated and true parameters.

Theorem 4. Given the conditions of Lemma 1 and bvu = 0 for all v, u ∈ G, v 6= u, then Eq. 4 simplifies to:

dEHt [x∗(t)]dt

= [−ωI +AΛ(t)]EHt [x∗(t)] + ωα, (7)

where Λ(t) = diag[EHt [λ∗(t)]],

EHt [λ∗(t)] =[e(B−νI)t + ν(B − νI)−1

(e(B−νI)t − I

) ]µ,

B = diag[[b11, . . . , b|V||V|]>] and diag[x] is a diagonal matrix with the elements of vector x in the diagonal. More-over, if the transition matrix Φ(t) associated to the time-varying linear system described by Eq. 7 satisfies that||Φ(t)|| ≤ γe−ct ∀t > 0, where γ, c > 0, then,

x∗p = limt→∞

EHt [x∗(t)] =

(I − AΛ2

w

)−1α. (8)

where Λ2 := diag[(I − B

ν

)−1µ]

The above indicates that the steady state opinions are nonlinearly related to the parameter matricesA andB. Thissuggests that the effect of the temporal influence on the opinion evolution, by means of the parameter matrixB of themultivariate Hawkes process, is non trivial. We illustrate this result empirically in Figure 4.

Implications. The above results do not only identify the conditions that ensure the existence of a steady state averageopinion (I: Re[ξ(AΛ1)] < ω; II: ||Φ(t)|| ≤ γe−ct) but also provide a closed-form expression for its value (Eqs. 6–8).Therefore, they establish the foundations to investigate consensus and polarization in arbitrary networks as well ascarrying out opinion shaping Farajtabar et al. (2014), which are very interesting venues for future work. Additionally,if the message intensities are Poisson, Theorem 2 allows for a fine grained analysis of the temporal evolution of theaverage opinions, as shown in Figure 3.

4 Model Parameter EstimationGiven a collection of messages H(T ) = (ui,mi, ti) recorded during a time period [0, T ) in a social networkG = (V, E), our goal is to find the optimal parameters α, µ, A and B by solving a maximum likelihood estimation(MLE) problem4. To this end, it is easy to show that the log-likelihood of the messages is given by

L(α,µ,A,B) =∑

ei∈H(T )

log p(mi|x∗ui(ti))

︸ ︷︷ ︸message sentiments

+∑

ei∈H(T )

log λ∗ui(ti)−∑

u∈V

∫ T

0

λ∗u(τ) dτ

︸ ︷︷ ︸message times

. (9)

4Here, if one decides to model the message intensities with a Poisson process,B = 0.

5

Page 6: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

0.2

0.4

0.6

0.8

1.2

1.4

1.6

0.010.005 0.015

1

00

Time

ExperimentalTheoretical

Opinion-T

rajecto

ry→

#MW

∑u∈V E[xu(t)]

|V |

0.010.005 0.01500

10

20

30

40

50

Time

Node-ID

Temporal evolution Initial state, t = 0 Transient state, t = T Steady state, t→∞

-2.2

-1.7

-1.2

-0.7

-0.2

0.010.005 0.0150Time

ExperimentalTheoretical

Opinion-T

rajecto

ry→

#MW

∑u∈V E[xu(t)]

|V |

0.010.005 0.01500

10

20

30

40

50

Time

Node-ID

Temporal evolution Initial state, t = 0 Transient state, t = T Steady state, t→∞

-1.2

-0.8

-0.4

0.4

0.8

1.2

0.010.005 0.015

0

0Time

Experimental

Theoretical

Opinion-T

rajecto

ry→

#MW

∑u∈V± E[xu(t)]

|V±|

0.010.005 0.01500

10

20

30

40

50

Time

Node-ID

Temporal evolution Initial state, t = 0 Transient state, t = T Steady state, t→∞

Figure 3: Opinion dynamics on three 50-node networks G1 (top), G2 (middle) and G3 (bottom). The first columnshows the temporal evolution of the theoretical and empirical average opinions (in the third row, we compute theaverage separately for positive and negative opinions in the steady state). The second column shows the polarity oftheoretical average opinion per user over time. The three right columns show three snapshots of the opinions in thenetworks at different times: t = 0, 0 < t = T < ∞ and t → ∞. The number of nodes with positive opinion (in red)and negative opinion (in blue) at time t = 0 is the same for all networks, however, at t → ∞, G1 reaches negativeconsensus, G2 reaches positive consensus while G3 gets uniformly polarized, due to their model parameter values andnetwork structure.

Then, we can find the optimal parameters (α,µ,A,B) using MLE as

maximizeα,µ≥0,A,B≥0

L(α,µ,A,B). (10)

Note that, as long as the sentiment distributions are log-concave, the MLE problem above is concave and thus canbe solved efficiently. Moreover, the problem decomposes in 2|V| independent subproblems, two per user u, since thefirst term in Eq. 9 only depends on (α,A) whereas the last two terms only depend on (µ,B), and thus can be readilyparallelized. In each subproblem, for exponential triggering kernels, we can precompute their sums and integrals inlinear time, i.e., O(|Hu(T )|+ | ∪v∈N (u) Hv(T )|), following Valera & Gomez-Rodriguez (2015).

The global optimum can be found by many algorithms. Here, we find (µ∗,B∗) using spectral projected gradientdescent Birgin et al. (2000), which works well in practice and achieves ε accuracy in O(log(1/ε)) iterations, andfind (α∗,A∗) analytically, since, for Gaussian sentiment distributions, the problem reduces to a least-square problem.Appendix F summarizes the overall estimation algorithm.

5 ExperimentsWe validate our model using both synthetic and real data gathered from Twitter, Reddit and Amazon. We first usesynthetic data to show that our estimation method is scalable and can accurately recover the true model parameters fromrecorded messages, and our model is able to produce opinion dynamics that converge to a steady state of consensus orpolarization and depend on the functional form of message intensities. We then use real data to show that our model

6

Page 7: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

Time

Opinion-T

rajecto

ry→

Poisson (+)

Hawkes (-)Hawkes (+)

0.005 0.01 0.015-4

-2

2

4

0

0

(a) Network G1

Time

Opinion-T

rajecto

ry→ Poisson (+)

Hawkes (-)Hawkes (+)

0.005 0.01 0.015-20

-10

10

20

30

40

0

0

(b) Network G2

Time

Opinion-T

rajecto

ry→

Poisson (+)Poisson (-)

Hawkes (-)Hawkes (+)

0.005 0.01 0.015

-1.5

-0.5

0.5

-2

-1

0

0

1

(c) Network G3

Figure 4: Influence of the functional form of message intensities on the temporal evolution of the average opinions onthe same three networks used in Figure 3. If consensus is not reached, we compute the average separately for positive(+) and negative (−) opinions in the steady state.

can accurately nowcast and forecast users’ opinions, significantly outperforming three state of the art methods De et al.(2014); DeGroot (1974); Yildiz et al. (2010), and provides a good fit to the data. We provide additional experimentalresults in the Appendices G and H.

5.1 Experiments on synthetic dataParameter estimation accuracy. We evaluate the accuracy of our model estimation procedure on three types ofKronecker networks Leskovec et al. (2010): i) Assortative networks (parameter matrix [0.96, 0.3; 0.3, 0.96]), in whichnodes tend to link to nodes with similar degree; ii) dissortative networks ([0.3, 0.96; 0.96, 0.3]), in which nodes tendto link to nodes with different degree; and iii) core-periphery networks ([0.9, 0.5; 0.5, 0.3]). For each network, themessage intensities are multivariate Hawkes, µ and B are drawn from a uniform distribution U(0, 1), and α and Aare drawn from a Gaussian distribution N (µ = 0, σ = 1). We use exponential kernels with parameters ω = 100and ν = 1, respectively, for opinions x∗u(t) and intensities λ∗(t). To simulate from the model, we adapt the efficientalgorithm recently proposed by Farajtabar et al. (2015), which we summarize in Appendix E. We evaluate the accuracyof our estimation procedure in terms of mean squared error (MSE), between the estimated and true parameters, i.e.,E[(x − x)2]. Figure 2 shows the MSE of the informational influence parameters (α,A) and the temporal influenceparameters (µ,B). As we feed more messages into the estimation procedure, it becomes more accurate.

Consensus and polarization. In this section, we show that, in practice, the model can produce opinion dynamics thatconverge to consensus and polarization. To do so, we simulate our model on three different small networks, assumingPoisson intensities, i.e., λ∗u(t) = µu, µ ∼ U(0, 1). Figure 3 summarizes the results, which show that (i) our model isable to produce opinion dynamics that converge to negative consensus (G1), positive consensus (G2) and polarization(G3), and (ii) the theoretical average opinions, given by Eq. 5, closely match the empirical estimates.

Temporal influence. Lemma 1 suggests that the evolution of the average opinion depends on the functional formof message intensities and the temporal influence parameters. In this section, we verify empirically such theoreticalresult using the same three networks as in Figure 3. For each network, we compare a configuration in which all nodesfollow Poisson intensities against another one in which a randomly chosen 5% of the nodes follow Hawkes intensitieswith bvu ∼ U(0, 1) while the remaining nodes follow the same Poisson intensities, i.e., bvu = 0 for all v. Figure 4summarizes the results5, which shows that, due to temporal influence, in the latter configuration, all networks getpolarized while, in the former, the networks reach positive and negative consensus, and polarization.

Scalability. Figure 5 shows that our model estimation algorithm scales to hundred of millions of events generatedby millions of users. For example, our algorithm takes 20 minutes to estimate the model parameters from 10 millionevents generated by one million nodes using a single machine with 24 cores and 64 GB RAM. Figure 12 in Appendix Gshows that our model simulation algorithm also scales to hundred of millions of events.

5For the Hawkes configuration, the averaged opinion EHt [x∗(t)] is computed empirically.

7

Page 8: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

InformationalTemporal

Nodes

Tim

e(s)

101 102 103 104 105 10610−210−1100101102103104105

(a) Number of nodes

InformationalTemporal

Total Events

Tim

e(s)

103 108104 105 106 10710−210−1100101102103104105

(b) Number of events

Figure 5: Running time of our estimation method. In Panel (a), the average number of events per node is 10. In Panel(b), the network has 1,000 nodes. The average degree per node is 30. The experiments are carried out in a singlemachine with 24 cores and 64 GB of main memory.

5.2 Experiments on real dataData description. We experimented with four Twitter datasets about current real-world events (Tw: Politics, Tw:Movie, Tw: Fight and Tw: Bollywood), which we gathered in-house, a dataset gathered from Amazon6 and a datasetgathered from Reddit De et al. (2014). Appendix H.1 contains further details and statistics about these datasets.

Experimental setup. For each recorded message i, our model requires its timing ti, which is already explicit in theabove datasets, and sentiment value mi, which we compute as follows. For the Twitter datasets, we compute eachmessage sentimentmi using a popular sentiment analysis toolbox, specially designed for Twitter Hannak et al. (2012),which returns values in (−1, 1). For the Amazon and Reddit datasets, we compute each message sentiment mi using apopular sentiment analysis toolbox, LIWC7, which also returns values in (−1, 1). We consider the sentiment polarityto be simply sign(m). Moreover, the Amazon and Reddit datasets do not explicitly provide a social network, which webuild as follows. For the Amazon dataset, we assume two users follow each other if they have posted at least 15 reviewson the same type of food. For the Reddit dataset, we assume a user v follows a user u if v has posted a message withina week after u posted a message in the same Reddit group. We build such explicit networks because the competingmethods need them to operate, however, our estimation method does not necessarily need a network since one canassume N (u) = V\u for all u ∈ V and add a `1-regularizer to rule out spurious influences Daneshmand et al.(2014).8

Sentiment prediction. In this section, we evaluate the performance of our model at predicting sentiment at a (mi-croscopic) message level. To do so, for each dataset, we first estimate the parameters of our model, SLANT, usingmessages from a training set containing the (chronologically) first 90% of the messages. Here, κ(t) and g(t) are expo-nential triggering kernels, with decay parameters set by cross-validation. Then, we evaluate the predictive performanceof our trained model at nowcasting and forecasting opinions using the last 10% of the messages. In nowcasting, wepredict the sentiment value m for each message in the test set given the history up to, but not including, the time ofthe message, i.e., m = E[m|x∗u(t)] = x∗u(t). In forecasting, we predict the sentiment value m given the history up toT hours before the time of the message, i.e., we forecast m = EHt\Ht−T [x∗u(t)|Ht−T ], whereHt\Ht−T denotes thatthe average is computed empirically by simulating from the trained model from t− T to t, while conditioning on thehistory up toHt−T .

We compare the performance of our model with the asynchronous linear model (AsLM) De et al. (2014), De-Groot’s model DeGroot (1974), and the voter model Yildiz et al. (2010), by means of: (i) the mean squared errorbetween the true (m) and the estimated (m) sentiment value for all messages in the held-out set, i.e., E[(m − m)2],and (ii) the failure rate, defined as the probability that the true polarity sign (m) and the estimated polarity sign (m)do not coincide, i.e., P(sign(m) 6= sign(m)). For the baselines algorithms, which work in discrete time, we simulateNT rounds in (t − T, t), where NT is the number of posts in time T . Figure 6 summarizes the results, which show

6https://snap.stanford.edu/data/web-FineFoods.html7http://www.liwc.net/8The threshold values (i.e., 15 reviews and a week, respectively) are the minimum values that lead to a set of fitted model parameters

that match the fitted parameters considering N (u) = V\u with a (cross-validated) `1-regularizer. In other words, we choose low enoughthreshold values so that only spurious influences are removed.

8

Page 9: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

DeGroot AsLM Voter SLANT(Poisson) SLANT(Hawkes)

0 2 4 6 8 10T, hours

MSE

10−2

10−1

100

101

0 2 4 6 8 10T, hours

0 2 4 6 8 10T, hours

0 2 4 6 8 10T, hours

0 2 4 6 8 10T, days

0 2 4 6 8 10T, hours

00

1

2 4 6 8 10

0.2

0.4

0.6

0.8

T, hours

Failure-R

ate

(a) Politics

0 2 4 6 8 10T, hours

(b) Movie

0 2 4 6 8 10T, hours

(c) Fight

0 2 4 6 8 10T, hours

(d) Bollywood

0 2 4 6 8 10T, days

(e) Amazon: Food

0 2 4 6 8 10T, hours

(f) Reddit

Figure 6: Sentiment prediction performance using a 10% held-out set for each real-world dataset. Performance ismeasured in terms of mean squared error (MSE) on the sentiment value, E[(m−m)2], and failure rate on the sentimentpolarity, P(sign(m) 6= sign(m)). For each message in the held-out set, we predict the sentiment value m given thehistory up to T hours before the time of the message, for different values of T . Nowcasting corresponds to T = 0 andforecasting to T > 0. The sentiment value m ∈ (−1, 1) and the sentiment polarity sign (m) ∈ −1, 1.

0.4

0.5

0.6

0.7

0.8 m(t)x(t)T = 1hT = 3hT = 5h

Time

Avera

geOpinion→

28 April 2 May 5 May

(a) Tw: Movie (Hawkes)

0.4

0.5

0.6

0.7

0.8 m(t)x(t)T = 1hT = 3hT = 5h

Time

Avera

geOpinion→

28 April 2 May 5 May

(b) Tw: Movie (Poisson)

0.1

0.2

0.3

0.4

0.5 m(t)x(t)T = 1dayT = 3daysT = 5days

Time

Avera

geOpinion→

Dec ’08 Jun ’10 Oct ’120

(c) Amazon: Food (Hawkes)

0.1

0.2

0.3

0.4

0.5 m(t)x(t)T = 1dayT = 3daysT = 5days

Time

Avera

geOpinion→

Dec ’08 Jun ’10 Oct ’120

(d) Amazon: Food (Poisson)

Figure 7: Macroscopic sentiment prediction given by our model for two real-world datasets. The panels show theobserved sentiment m(t) (in blue, running average), inferred opinion x(t) on the training set (in red), and forecastedopinion EHt\Ht−T [xu(t)|Ht−T ] for T = 1, 3, and 5 hours (Tw: Movie) or days (Amazon: Food) on the test set (inblack, green and gray, respectively), where the symbol ¯ denotes average across users.

that: (i) our model consistently outperforms others at nowcasting and forecasting both in terms of MSE (often by anorder of magnitude) and failure rate; (ii) its forecasting performance degrades gracefully with respect to T , in contrast,competing methods often fail catastrophically; and, (iii) it achieves an additional mileage by using Hawkes processesinstead of Poisson processes. To some extent, we believe SLANT’s superior performance is due to its ability to lever-age historical data to learn its model parameters and then simulate realistic temporal patterns looking into the future,in contrast with competing methods.Visualizing Opinion Dynamics. In this section, we look at the nowcasting and forecasting results from the previoussection at a network level and show that our model can also predict the evolution of opinions macroscopically (in termsof the average opinion across users). Figure 7 summarizes the results for two real world datasets, which show that theforecasted opinions become less accurate as the time T becomes larger, since the prediction comes from averagingacross longer time periods. Moreover, our model is more accurate when the message intensities are modeled usingmultivariate Hawkes, as one may have expected. We found qualitatively similar results for the remaining datasets.

Temporal and informational influence. Our model distinguishes between two types of social influence, temporaland informational, which account for the impact that different friends have on the activity level and opinion changes ofa user, respectively. Such a distinction has helped us to provide more accurate predictions than alternatives, as shownin Figures 6 and 7. Looking closely at the results, we find users that exert only temporal, only informational or bothtypes of influence on others. In Figure 8, we pinpout at two concrete real user examples, among several we found, to

9

Page 10: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

0.5

1.5

2

Time

jectory

xA(t)xA(t)xA(t)λA(t)λA(t)λA(t)

mA(t)mA(t)mA(t)mC(t)mC(t)mC(t)

x(t),m(t),λ(t)

x(t),m(t),λ(t)

x(t),m(t),λ(t)

May 1 May 2 May 3

0

1

xA(t),λA(t),mA(t) vsmC(t)

0.5

1.5

2

Time

jectory

xB(t)xB(t)xB(t)λB(t)λB(t)λB(t)mC(t)mC(t)mC(t)mB(t)mB(t)mB(t)

x(t),m(t),λ(t)

x(t),m(t),λ(t)

x(t),m(t),λ(t)

May 1 May 2 May 3

0

1

xB(t),λB(t),mB(t) vsmC(t)

Figure 8: A real-world example of a user C that exerts high (low) information influence but low (high) informationinfluence on a userA (a userB). Every time userC posts a sentiment message, userA is likely to reply but her opinionbarely changes, in contrast, user B typically does not reply but her opinion changes.

illustrate that, in some cases, a user can exert high temporal influence but low information influence and viceversa. Amacroscopic study of such phenomena is out of the scope of this paper and left as future work.

Stubbornness, conformity and compromise. Previous work Das et al. (2014) has argued that a social network usercan be stubborn (i.e., avu = 0 for all v), conforming (αu = 0) or compromised (αu 6= 0 and avu 6= 0 for some v). Inthis section, we compute the percentage of stubborn, conforming and compromised users for each dataset, as given byour fitted model. Figure 9 summarizes the results, which provide several interesting insights. First, compromised users,who starts with an initial opinion that update over time are typically most frequent. Second, while conforming usersin the Reddit dataset are very few, they are overrepresented in the Amazon dataset. Third, stubborn users typicallyaccount for 20% of the users, except for the Reddit dataset, where they are overrepresented.

6 ConclusionsWe proposed a modeling framework of opinion dynamics, naturally designed to fit fine-grained opinion data. Thekey innovation of our framework is modeling each user’s latent opinion as a continuous-time stochastic process,which is modulated over time by the opinions asynchronously expressed by her neighbors. Then, every time a userexpresses an opinion, she reveals a noisy estimate of her current latent opinion. Moreover, the model allows for anefficient parameter estimation method from historical fine-grained event data and, as a result, it predicts expressedopinions over time more accurately than alternatives. Our model opens up many interesting venues for future work.For example, a general analysis of the time-varying average opinion in our model would be of great interest, since itwould allow us to investigate the effect of complex communication patterns over time. Instead of multivariate versionof Hawkes processes, a large and diverse range of point processes can also be used to model the message intensities,without changing the convexity of the parameter estimation, e.g., condition the intensity on the expressed opinions orexternal features such as node attributes. Our framework can be leveraged to design opinion shaping algorithms andreason about large-scale opinion dynamics in large datasets. Finally, one of the key modeling ideas is realizing thatusers’ expressed opinions (be it in the form of thumbs up/down or text sentiment) can be viewed as noisy discretesamples of the users’ latent opinion localized in time. It would be very interesting to generalize this idea to any typeof event data and derive sampling theorems and conditions under which an underlying general continuous signal ofinterest (be it user’s opinion, expertise, or wealth) can be recovered from event data with provable guarantees.

ReferencesAalen, O.O., Borgan, Ø., and Gjessing, H.K. Survival and event history analysis: a process point of view. Springer

Verlag, 2008.

Axelrod, R. The dissemination of culture a model with local convergence and global polarization. Journal of conflictresolution, 41(2):203–226, 1997.

10

Page 11: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

0

1

0.2

0.4

0.6

0.8

Stubborn Conform Compromised

Fra

ction

ofNodes

Type of Nodes

(a) Tw:Politics

0

1

0.2

0.4

0.6

0.8

Stubborn Conform Compromised

Fra

ction

ofNodes

Type of Nodes

(b) Tw:Movie

0

1

0.2

0.4

0.6

0.8

Stubborn Conform Compromised

Fra

ction

ofNodes

Type of Nodes

(c) Tw:Fight

0

1

0.2

0.4

0.6

0.8

Stubborn Conform Compromised

Fra

ction

ofNodes

Type of Nodes

(d) Tw:Bollywood

0

1

0.2

0.4

0.6

0.8

Stubborn Conform Compromised

Fra

ction

ofNodes

Type of Nodes

(e) Amazon: Food

0

1

0.2

0.4

0.6

0.8

Stubborn Conform Compromised

Fra

ction

ofNodes

Type of Nodes

(f) Reddit

Figure 9: Fraction of stubborn, conforming and compromised users in each real-world dataset

Bindel, D., Kleinberg, J., and Oren, S. How bad is forming your own opinion? In FOCS, 2011.

Birgin, Ernesto G, Martınez, Jose Mario, and Raydan, Marcos. Nonmonotone spectral projected gradient methods onconvex sets. SIAM Journal on Optimization, 10(4), 2000.

Clifford, P. and Sudbury, A. A model for spatial conflict. Biometrika, 60(3):581–588, 1973.

Daneshmand, H., Gomez-Rodriguez, M., Song, L., and Scholkopf, B. Estimating diffusion network structures: Re-covery conditions, sample complexity & soft-thresholding algorithm. In Proceedings of the 31th InternationalConference on Machine Learning, 2014.

Das, A., Gollapudi, S., and Munagala, K. Modeling opinion dynamics in social networks. In WSDM, 2014.

De, A., Bhattacharya, S., Bhattacharya, P., Ganguly, N., and Chakrabarti, S. Learning a linear influence model fromtransient opinion dynamics. In CIKM, 2014.

DeGroot, M. H. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

Farajtabar, M., Du, N., Gomez-Rodriguez, M., Valera, I., Song, L., and Zha, H. Shaping social activity by incentivizingusers. In NIPS, 2014.

Farajtabar, M., Wang, Y., Gomez-Rodriguez, M., Li, S., Zha, H., and Song, L. Coevolve: A joint point process modelfor information diffusion and network co-evolution. In NIPS ’15: Advances in Neural Information ProcessingSystems, 2015.

Gomez-Rodriguez, M., Balduzzi, D., and Scholkopf, B. Uncovering the Temporal Dynamics of Diffusion Networks.In ICML, 2011.

Hannak, Aniko, Anderson, Eric, Barrett, Lisa Feldman, Lehmann, Sune, Mislove, Alan, and Riedewald, Mirek.Tweetin’in the rain: Exploring societal-scale effects of weather on mood. In ICWSM, 2012.

Hegselmann, R. and Krause, U. Opinion dynamics and bounded confidence models, analysis, and simulation. Journalof Artificial Societies and Social Simulation, 5(3), 2002.

Hinrichsen, D., Ilchmann, A., and Pritchard, A.J. Robustness of stability of time-varying linear systems. Journal ofDifferential Equations, 82(2):219 – 250, 1989.

Holme, P. and Newman, M. E. Nonequilibrium phase transition in the coevolution of networks and opinions. PhysicalReview E, 74(5):056108, 2006.

Karppi, T. and Crawford, K. Social media, financial algorithms and the hack crash. Theory, Culture & Society, 2015.

Leskovec, J., Kleinberg, J., and Faloutsos, C. Graphs over time: densification laws, shrinking diameters and possibleexplanations. In KDD, 2005.

Leskovec, J., Chakrabarti, D., Kleinberg, J. M., Faloutsos, C., and Ghahramani, Z. Kronecker graphs: An approach tomodeling networks. JMLR, 2010.

11

Page 12: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

O´Connor, B., Balasubramanyan, R., R., Bryan R., and Smith, N. From tweets to polls: Linking text sentiment topublic opinion time series. ICWSM, 2010.

Pang, B. and Lee, L. Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1-2):1–135, 2008.

Raven, B. H. The bases of power: Origins and recent developments. Journal of social issues, 49(4):227–251, 1993.

Tumasjan, A., Sprenger, T. O., Sandner, P. G., and Welpe, I. M. Predicting elections with twitter: What 140 charactersreveal about political sentiment. ICWSM, 2010.

Valera, I. and Gomez-Rodriguez, M. Modeling adoption and usage of competing products. In Proceedings of the 2015IEEE International Conference on Data Mining, 2015.

Yildiz, M. E., Pagliari, R., Ozdaglar, A., and Scaglione, A. Voting models in random networks. In Information Theoryand Applications Workshop, pp. 1–7, 2010.

Zhou, K., Zha, H., and Song, L. Learning triggering kernels for multi-dimensional hawkes processes. In Proceedingsof the 30th International Conference on Machine Learning, pp. 1301–1309, 2013.

12

Page 13: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

A Proof of Lemma 1Using that E[mv(θ)|x∗v(θ)] = x∗v(θ), we can compute the average opinion of user u across all possible histories fromEq. ?? as

EHt [x∗u(t)] = αu +∑

v∈N (u)

auv

∫ t

0

g(t− θ)EHt [mv(θ)dNv(θ)]

= αu +∑

v∈N (u)

auv

∫ t

0

g(t− θ)EHθ[x∗v(θ)λ

∗v(θ)

]dθ,

and we can write the expectation of the opinion for all users in vectorial form as

EHt [x∗(t)] = α+A

∫ t

0

g(t− θ)EHθ [x∗(θ) λ∗(θ)]dθ, (11)

where the sign denotes pointwise product. Then, by differentiating Eq 11, we obtain

dEHt [x∗(t)]dt

= AEHt [x∗(t) λ∗(t)]− ωEHt [x∗(t)] + ωα. (12)

B Proof of Theorem 2Using Lemma 1 and λ∗u(t) = µu, we obtain

dEHt [x∗(t)]dt

= [−ωI +AΛ1]EHt [x∗(t)] + ωα, (13)

where Λ1 = diag[µ]. Then, we apply the Laplace transform to the expression above and obtain

x(s) =(

1 +ω

s

)[sI + ωI −AΛ1]−1α.

Finally, applying the inverse Laplace transform, we obtain the average opinion EHt [x∗(t)] in time domain as

EHt [x∗(t)] =[e(AΛ1−ωI)t + ω(AΛ1 − ωI)−1

(e(AΛ1−ωI)t − I

) ]α.

C Proof of Theorem 3Theorem 2 states that the average users’ opinion EHt [x∗(t)] in time domain is given by

EHt [x∗(t)] =[e(AΛ1−ωI)t + ω(AΛ1 − ωI)−1

(e(AΛ1−ωI)t − I

) ]α.

If Re[ξ(AΛ1)] < ω, where ξ(X) denote the eigenvalues of matrixX , it easily follows that

limt→∞

EHt [x∗(t)] =

(I − AΛ1

w

)−1α. (14)

13

Page 14: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

D Proof of Theorem 4Assume bvu = 0 for all v, u ∈ G, v 6= u. Then, λ∗v(t) only depends on user v’s history and, since x∗v(t) only dependson the history of the user v’s neighbors N (v), we can write

EHt [x∗(t) λ∗(t)] = EHt[x∗(t)

] EHt

[λ∗(t)

],

and rewrite Eq. 12 as

dEHt [x∗(t)]dt

= A(EHt [x∗(t)] EHt [λ∗(t)])− ωEHt [x∗(t)] + ωα. (15)

We can now compute EHt [λ∗(θ)] analytically as follows. From Eq. ??, we obtain

EHt[λ∗(t)

]= µ+

∫ t

0

Bκ(t− θ)EHθ[λ∗(θ)

]dθ, (16)

where µ = [µ1, µ2, ..., µ|V|]> and B = (bvu)v,u∈V , where bvu = 0 for all v 6= u, by assumption. Then, we apply theLaplace transform and obtain

λ(s) = [I −Bκ(s)]−1µ, (17)

where λ(s) and κ(s) denote the Laplace transforms of EHt [λ∗(t)] and κ(t) respectively. Now, using that κ(t) =exp(−νt), we can write

λ(s) =[I − B

s+ ν

]−1µ,

and obtain EHt [λ∗(t)] in time domain as

EHt [λ∗(t)] =[e(B−νI)t + ν(B − νI)−1

(e(B−νI)t − I

) ]µ. (18)

Using Eq. 15, Eq. 18, and EHt [x∗(t)] EHt [λ∗(t)] = Λ(t)EHt [x∗(t)], where Λ(t) := diag[EHt [λ∗(t)]], weobtain

dEHt [x∗(t)]dt

= [−ωI +AΛ(t)]EHt [x∗(t)] + ωα. (19)

In such systems, solutions can be written as Hinrichsen et al. (1989)

EHt [x∗(t)] = Φ(t)α+ ω

∫ t

0

Φ(s)αds, (20)

where the transition matrix Φ(t) defines as a solution of the matrix differential equation

Φ(t) = [−ωI +AΛ(t)]Φ(t) with Φ(0) = I.

If Φ(t) satisfies ||Φ(t)|| ≤ γe−ct ∀t > 0 for γ, c > 0 then the steady state solution to Eq. 20 is given by Hinrichsenet al. (1989)

limt→∞

EHt [x∗(t)] =

(I − AΛ2

ω

)−1α.

where Λ2 = limt→∞Λ(t) = diag[I − B

ν

]−1µ.

14

Page 15: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

E Model simulation algorithm

Algorithm 1 OpinionModelSimulation(T,µ,B,α,A)1: Initialize the priority queue Q2: LastOpinionUpdateTime[1 : |V |] = LastIntensityUpdateTime[1 : |V |] = ~03: LastOpinionUpdateValue[1 : |V |] = α4: LastIntensityUpdateValue[1 : |V |] = µ5: H(0)← ∅6: for u ∈ V do7: t =SampleEvent(µ[u], 0, u)8: Q.insert([t, u])9: while t < T do

10: %%Approve the minimum time of all posts11: [t′, u] = Q.ExtractMin()12: tu = LastOpinionUpdateTime[u]13: xtu = LastOpinionUpdateValue[u]14: xt′u ← α[u] + (xtu −α[u])e−ω(t−tu)

15: LastOpinionUpdateTime[u] = t′

16: LastOpinionUpdateValue[u] = xt′u17: mu ∼ p(m|xu(t))18: H(t′)← H(t) ∪ (t,mu, u)19: %% Update neighbors’20: for ∀v such that u v do21: tv = LastIntensityUpdateTime[v]22: λtv = LastIntensityUpdateValue[v]23: λt′v ← µ[v] + (λtv − µ[v])e−ω(t−tv) +Buv

24: LastIntensityUpdateTime[v] = t′

25: LastIntensityUpdateValue[v] = λt′v26: tv = LastOpinionUpdateTime[v]27: xtv = LastOpinionUpdateValue[v]28: xt′v ← α[v] + (xtv −α[v])e−ω(t−tv) +Auvmu

29: LastOpinionUpdateTime[v] = t′

30: LastOpinionUpdateValue[v] = xt′v31: %%Sample by only effected nodes32: t+ =SampleEvent(λv(t′), t′, v)33: Q.UpdateKey(v, t+)34: t← t′

35: returnH(T )

Algorithm 2 SampleEvent(λ, t, v)

1: λ = λ, s← t2: while s < T do3: U ∼ Uniform[0, 1]4: s = s− lnU

λ

5: λ(s)← µ[v] + (λv(t)− µ[v])e−ω(s−t)

6: d ∼ Uniform[0, 1]7: if dλ < λ then8: break9: else

10: λ = λ(s)11: return s

15

Page 16: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

F Parameter estimation algorithm

Algorithm 3 Estimation of Informational Influence α and A1: Input: HT , G(V,E) , regularizer λ, error bound ε2: Output: αu, A3: Initialize:4: IndexForV[1 : |V |] = ~05: for i = 1 to |H(T )| do6: T [ui](IndexForV[ui]) = ti7: M[ui](IndexForV[ui]) = mi

8: IndexForV[ui]++9: // triggering kernel computation

10: for u ∈ V do11: i = 012: for v ∈ N (u) do13: //compute t− ti values14: g1 = ~1|T [v]|T [u]′ − T [v]~1′|T [u]|15: g1 = (g1 < 0)?∞ : g1

16: M =M[u]~1′|T [u]|17: G = e−ωg1 M18: s = 019: for j = 1 to NumRow(G) do20: s = s+ 121: i = i+ 122: g[u](i, :) = s23: for u ∈ V do24: a =InferOpinionParams(u,G, λ, g[u])25: α = a[1]26: A[∗][u] = a[1 : end]

Algorithm 4 InferOpinionParams(u,G, λ, gu)1: s = length(gu)2: X = [~1′s; gu]3: Y =M[u]4: L = XX ′

5: x = (λI + L)−1XY6: return x

16

Page 17: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

Algorithm 5 Spectral projected gradient descent for muldimensional Hawkes parameters estimation for node u1: Input: HT , G(V,E) and µ0

u,B0u∗ , step-length bound αmax > 0 and initial-step length αbb ∈ (0, αmax], error bound ε and

size of memory h2: Output: µu, Bu∗

3: Notation:x = [µu, Bu∗ ]

f(x) =∑ei∈H(T ) log λ∗ui(ti)−

∑u∈V

∫ T0λ∗u(τ) dτ

4: Initialize:k = 0x0 = [µ0

u,B0u∗ ]

5: while ||dk|| < ε do6: αk ← minαmax, αbb7: dk ← Pc

[xk − αk∇fu(xk)]− xk

8: fub ← maxfu(xk), fu(xk−1), ..., fu(xk−h)9: α← 1

10: while qk(xk + αdk) > fub + να∇fu(xk)T dk do11: Select α by backtracking line-search;12: xk+1 ← xk + αdk13: sk ← αdk14: yk ← αBkdk15: αbb ← yTk yk/s

Tk yk

16: k = k + 1

17

Page 18: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

G Additional experiments on synthetic data

Parameter estimation. We evaluate the accuracy of our model estimation procedure in the same Kronecker networksas in the experimental evaluation in the main text in terms of log-likelihood over a test set of messages, disjoint fromthe set of messages used for model estimation. Figure 10 shows that as we feed more messages into the estimationprocedure, the test log-likelihood achieved by the estimated parameters becomes closer to the test log-likelihoodachieved by the true parameters. Additionally, we evaluate the accuracy of our model estimation procedure in Forest-Fire networks Leskovec et al. (2005), using the same range of parameters for informational and temporal influence.Figure 11 summarizes the results, which shows that, as we feed more messages into our estimation procedure, theaccuracy increases.

0.2

0.4

0.6

0.8

1

025 50 100 200 400 800Avg. no. of events per node

HomophilyHeterophilyCore-periphery

]]

∣ ∣ 1−

log[ P

(D|A

,α)]

log[ P

(D|A

,α)]∣ ∣

(a) p(m|x∗u(t))

0.2

0.4

0.6

0.8

1

025 50 100 200 400 800Avg. no. of events per node

Homophily

Heterophily

Core-periphery

]]

∣ ∣ 1−

log[ P

(H|B

,µ)]

log[ P

(H|B

,µ)]∣ ∣

(b) p(t|λ∗u(t))

Figure 10: Performance of model estimation for several 512-node kronecker networks in terms of test log-likelihood.As we feed more messages into the estimation procedure, the estimation becomes more accurate.

0.5

0.6

0.7

0.8

0.9

1

25 50 100 200

MSE(α

,A)→

Avg. no. of events per node

pf = 0.5pf = 0.45pf = 0.4

]]]

(a) Informational influence

0.1

0.2

0.3

0.4

0.5

025 50 100 200

MSE(µ,B

)→

Avg. no. of events per node

pf = 0.5pf = 0.45pf = 0.4

]]]

(b) Temporal influence

0.2

0.4

0.6

0.8

1

025 50 100 200Avg. no. of events per node

pf = 0.5

pf = 0.45

pf = 0.4

∣ ∣ 1−

log[ P

(D|A

,α)]

log[ P

(D|A

,α)]∣ ∣

]

(c) p(m|x∗u(t))

0.2

0.4

0.6

0.8

1

025 50 100 200Avg. no. of events per node

pf = 0.5

pf = 0.45

pf = 0.4

∣ ∣ 1−

log[ P

(H|B

,µ)]

log[ P

(H|B

,µ)]∣ ∣

]

(d) p(t|λ∗u(t))

Figure 11: Performance of model estimation for several 512-node Forest-Fire networks in terms of mean squared errorand test log-likelihood. In all three networks, we used pb = 0.3.

Scalability. Figure 12 shows that our model simulation algorithm, summarized in Algorithm 1, scales to hundred ofmillions of events generated by millions of users.

Nodes

Tim

e(s)

101 102 103 104 105 10610−210−1100101102103104105

(a) Time-Node plotTotal Events

Tim

e(s)

103 108104 105 106 10710−210−1100101102103104105

(b) Time-events plot

Figure 12: Runing time of our simulation procedure, described in Algorithm 1. In Panel (a), the average number ofevents per node is 10. In Panel (b), the network has 1,000 nodes. The average degree per node is 30. The experimentsare carried out in a single machine with 24 cores and 64 GB RAM.

18

Page 19: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

H Additional experiments on real data

H.1 Twitter dataset descriptionWe used the Twitter search API9 to collect all the tweets (corresponding to a 2-3 weeks period around the event date)that contain hashtags related to the following events/topics:• Politics: Similarly to De et al. (2014), we collect all the tweets, from 9th to 15th of December 2013, related to

the Delhi Assembly election 2013.• Movie: We collect all the tweets, from April 28th to May 5th, 2015, about the discussion on the release of

“Avengers: Age of Ultron” movie that took place on May 1, 2015.• Fight: We collect all the tweets, from April 29th to May 7th, 2015, about the discussion on the professional

boxing match between the eight-division world champion Manny Pacquiao and the undefeated five-divisionworld champion Floyd Mayweather Jr., which took place on May 2, 2015 in Las Vegas, Nevada.

• Bollywood: We collect all the tweets, from May 5th to 16th, 2015, about the discussion on the verdict thatdeclared guilty to Salman Khan (a popular Bollywood movie star) on May 6, 2015, for causing death of a personby rash and negligible driving

We then built the follower-followee network for the users that posted the collected tweets using the Twitter rest API10.Finally, we filtered out users that posted less than 200 tweets during the account lifetime, follow less than 100 users,or have less than 50 followers.

Dataset |V| |E| |H(T )| E[m] std[m]

Tw: Politics 548 5271 20026 0.0169 0.1780Tw: Movie 567 4886 14016 0.5969 0.1358Tw: Fight 848 10118 21526 -0.0123 0.2577Tw: Bollywood 1031 34952 46845 0.5101 0.2310Amzn: Food 447 13498 24815 0.2108 0.1648Reddit 556 4629 64,366 -0.0088 0.1333

Table 1: Real datasets statistics

H.2 Sentiment value prediction vs training set sizeHere, we compare the performance of our model with DeGroot’s model and AsLM for the Amazon and Reddit datasetsin terms of sentiment value prediction at opinion nowcasting (T = 0) against training set size. Figure 13 summarizesthe results, which show that our model improves its performance when increasing the training set and outperforms thecompeting methods for any training size.

9https://dev.twitter.com/rest/public/search10https://dev.twitter.com/rest/public

19

Page 20: Learning Opinion Dynamics in Social Networks · Learning Opinion Dynamics in Social Networks Abir De1, Isabel Valera2, Niloy Ganguly1, Sourangshu Bhattacharya1, and Manuel Gomez-Rodriguez2

0.1

0.02

0.04

0.06

0.08

0

% of Training Data

MSE(m

v)→

Politics

HGOMDeGrootAsLM

]]]

99 99.9999.960 80 90

(a) Politics

0.1

0.2

0.3

0.4

0

% of Training Data

MSE(m

v)→

Avengers HGOMDeGrootAsLM

]]]

99 99.9999.960 80 90

(b) Movie

0.1

0.02

0.04

0.06

0.08

0

% of Training Data

MSE(m

v)→

Fight

HGOMDeGrootAsLM

]]]

99 99.9999.960 80 90

(c) Fight

0.1

0.2

0.3

0.4

0

% of Training Data

MSE(m

v)→

Bollywood

HGOMDeGrootAsLM

]]]

99 99.9999.960 80 90

(d) Bollywood

0.1

0.2

0.05

0.15

0

% of Training Data

MSE(m

v)→

Food HGOMDeGrootAsLM

]]]

99 99.9999.960 80 90

(e) Amazon: Food

0.1

0.2

0.05

0.15

0.25

0

% of Training Data

MSE(m

v)→

Reddit

HGOMDeGrootAsLM

]]]

99 99.9999.960 80 90

(f) Reddit

Figure 13: Mean squared error on sentiment value prediction against training set size for all real-world datasets.

20