
A Bivariate Point Process Model with Application to Social Media User Content Generation

Emma Jingfei [email protected]

Yongtao [email protected]

Department of Management Science
The Miami Business School, University of Miami


Data Description: Sina Weibo Data

Source: Sina Weibo, the largest Twitter-type online social media platform in China.

The dataset contains posts from 5,913 followers of the official Beijing University Guanghua MBA Weibo account.

For each user, all of his/her posts during the period of Jan 1st to Jan 30th, 2014, including the time stamp of each post, have been collected.

Each post is either a post with original content or a repost.


Data Description: Trump’s Twitter Data

Source: Twitter data collected from Donald Trump (@realDonaldTrump) from Jan 2013 to Apr 2018.

The Twitter archive of Donald Trump can be downloaded from http://www.trumptwitterarchive.com/.

Twitter shows the device used for each tweet; devices may be Android, Web Client, iPhone, and others.

We consider the tweets posted using an Android device before the election and an iPhone after the election.

This results in a total of 17,518 tweets; the average number of monthly tweets is 278.

Each tweet is either an original tweet or a retweet.



Figure: The posting times of three users (Users 1, 2 and 3; horizontal axis: date, 01/01 to 01/30).


Figure: Average empirical pair correlation function (horizontal axis: hour, 0 to 70).

Observations from Data

A user’s posting activity may alternate between active and inactive states.

During an active state, the user may publish one or more posts (often with short inter-post time distances).

During an inactive state, no post is produced until the start of the next active state.

There may be daily patterns in posting times.

It’s a bivariate point process (i.e., posts and reposts).

Graphical Illustration: Univariate Process

Episodes: clusters of posting time locations.

Adjacent episodes are nonoverlapping and separated by the inactive period in between.

Graphical Illustration: Bivariate Process

Figure: Diagram of the bivariate process (labels: episode, post segment, repost segment, inactive period).

Each episode contains subepisodes of posts and reposts.

Posts (reposts) tend to be followed by posts (reposts).

Reposts may be more clustered than posts.

The number of reposts may be related to the number of followees.

Clustered Point Process

Goal: Model the clustered posting times in social media posting time data (do not distinguish between posts and reposts for now).

Existing Methods:

Hawkes process

Neyman-Scott process

Bartlett-Lewis process

Interrupted Poisson process

We propose a new class of clustered temporal point processes that is easy to interpret and can be easily generalized to the bivariate case.

Model Formulation

For each episode, the parent event generates a Poisson number of offspring events with mean µ.

Each offspring location, relative to the location of the previous event in the same cluster, follows an exponential distribution with parameter ρ.

Once all the events in an episode have been observed, the parent event in the following episode is generated following a hazard function λ(t; β).
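The three generative steps above can be sketched in code. This is a minimal simulation under stated assumptions, not the authors' implementation: the names `simulate`, `next_parent`, `poisson`, `hazard` and `HAZARD_MAX` are hypothetical, the hazard coefficients are arbitrary illustrative values, time is measured in days, the next parent is drawn by thinning (which needs an upper bound on the hazard), and events falling beyond the window end T are simply dropped.

```python
import math
import random


def hazard(t):
    """Illustrative daily-cyclic parent hazard; the coefficients are assumptions."""
    return math.exp(0.5 + 0.5 * math.cos(2 * math.pi * t) + 0.5 * math.sin(2 * math.pi * t))


# Upper bound of the hazard above: 0.5*cos + 0.5*sin is at most sqrt(0.5).
HAZARD_MAX = math.exp(0.5 + math.sqrt(0.5))


def next_parent(t_last, lam, lam_max, rng):
    """Draw the next parent time after t_last by thinning a rate-lam_max Poisson process."""
    t = t_last
    while True:
        t += rng.expovariate(lam_max)
        if rng.random() * lam_max <= lam(t):
            return t


def poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate(T, mu, rho, lam, lam_max, seed=0):
    """Return (times, labels) on [0, T]; label 1 = parent event, 0 = offspring event."""
    rng = random.Random(seed)
    times, labels = [], []
    t = 0.0
    while True:
        # The next parent is generated after the last event of the previous episode.
        t = next_parent(t, lam, lam_max, rng)
        if t > T:
            break
        times.append(t)
        labels.append(1)
        # Poisson(mu) offspring, each an Exp(rho) gap after the previous event.
        for _ in range(poisson(mu, rng)):
            t += rng.expovariate(rho)
            if t > T:
                break
            times.append(t)
            labels.append(0)
    return times, labels


times, labels = simulate(T=30.0, mu=1.0, rho=20.0, lam=hazard, lam_max=HAZARD_MAX, seed=1)
```

With a large ρ relative to the hazard, offspring sit close to their parent, which reproduces the short inter-post distances within an episode.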


Model Formulation

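By observing the daily cyclic pattern in the average pair correlation function, we may assume that

$$\lambda(t;\beta) = \exp\Big\{\beta_0 + \sum_{j=1}^{p}\big[\beta_{j1}\cos(\omega_j t) + \beta_{j2}\sin(\omega_j t)\big]\Big\},$$

where $\omega_j = 2j\pi$ and $\beta = \{\beta_0, \beta_{j1}, \beta_{j2} : j = 1, \dots, p\}$.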

Other nonparametric models can also be used.
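As a concrete reading of the parametric hazard above, the following sketch evaluates λ(t; β) for p harmonics and approximates its integral over an interval, which is the quantity the gap-time densities need. The function names and the trapezoid rule are choices made for this illustration (the integral of the exponential of a trigonometric sum has no elementary closed form); time is in days, so ω_j = 2jπ gives a daily period.

```python
import math


def hazard(t, beta0, harmonics):
    """Evaluate lambda(t; beta); harmonics is a list of (beta_j1, beta_j2) for j = 1..p."""
    s = beta0
    for j, (bj1, bj2) in enumerate(harmonics, start=1):
        w = 2 * j * math.pi  # omega_j = 2*j*pi
        s += bj1 * math.cos(w * t) + bj2 * math.sin(w * t)
    return math.exp(s)


def cumulative_hazard(a, b, beta0, harmonics, steps=2000):
    """Trapezoid-rule approximation of the integral of lambda(t; beta) over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (hazard(a, beta0, harmonics) + hazard(b, beta0, harmonics))
    for i in range(1, steps):
        total += hazard(a + i * h, beta0, harmonics)
    return total * h
```

For example, `cumulative_hazard(0.0, 1.0, -2.0, [(-2.0, 2.0)])` approximates the expected number of parent events per day if no offspring delayed the next parent.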


Model Formulation

Define event time locations $\{T_l : l = 1, \dots, N\}$ and indicator variables $\{Y_l : l = 1, \dots, N\}$, where $Y_l = 1$ denotes a parent event and $Y_l = 0$ an offspring event.

Let $T_0 = 0$. Define the gap time

$$D_l = T_l - T_{l-1}, \quad l = 1, \dots, N.$$

Let $f_{l0}(x)$ and $f_{l1}(x)$ be the probability density functions of $D_l$ given that $Y_l = 0$ and $Y_l = 1$, respectively. Assume

$$f_{l0}(x) = \rho\exp(-\rho x),$$

and

$$f_{l1}(x) = \lambda(t_{l-1} + x;\beta)\exp\Big[-\int_{t_{l-1}}^{t_{l-1}+x}\lambda(t;\beta)\,dt\Big].$$

Model Formulation

Assume the first event is a parent event and all events in the last episode are contained in $[0, T]$.

The complete-data likelihood can then be written as

$$L(\theta; \mathbf{t}, \mathbf{y}) = \prod_{l=1}^{n}\prod_{m=0}^{1}\big[f_{lm}(d_l;\theta)\big]^{I(y_l = m)}\Big[\prod_{i=1}^{k} P(N_i = n_i)\Big] P(D_{n+1} > T - t_n),$$

where $D_{n+1}$ is the gap time between $t_n$ and the next parent event,

$$P(N_i = n_i) = \frac{\exp(-\mu)\mu^{n_i}}{n_i!},$$

and

$$P(D_{n+1} > T - t_n) = \exp\Big[-\int_{t_n}^{T}\lambda(t;\beta)\,dt\Big].$$
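The complete-data likelihood above translates fairly directly into code once the labels are treated as known. The sketch below is an illustration under assumptions rather than the authors' implementation: `complete_loglik` and `cum_hazard` are hypothetical names, the hazard is passed in as a plain function `lam`, and its integral is approximated with a trapezoid rule.

```python
import math


def cum_hazard(lam, a, b, steps=400):
    """Trapezoid-rule approximation of the integral of lam over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (lam(a) + lam(b))
    for i in range(1, steps):
        total += lam(a + i * h)
    return total * h


def complete_loglik(times, labels, T, mu, rho, lam):
    """Log of L(theta; t, y) for sorted times and labels (1 = parent, 0 = offspring)."""
    if labels and labels[0] != 1:
        raise ValueError("the first event is assumed to be a parent event")
    ll = 0.0
    prev = 0.0            # T_0 = 0
    offspring_counts = [] # n_i for each episode i
    for t, y in zip(times, labels):
        d = t - prev
        if y == 1:
            # log f_l1(d): hazard at the parent time minus the integrated hazard
            ll += math.log(lam(t)) - cum_hazard(lam, prev, t)
            offspring_counts.append(0)
        else:
            # log f_l0(d): exponential gap with rate rho
            ll += math.log(rho) - rho * d
            offspring_counts[-1] += 1
        prev = t
    # Poisson probabilities of the offspring counts
    for n_i in offspring_counts:
        ll += -mu + n_i * math.log(mu) - math.lgamma(n_i + 1)
    # No new parent between the last event and T
    ll -= cum_hazard(lam, prev, T)
    return ll
```

With a constant hazard the integrals are exact, which makes the function easy to check by hand before plugging in a cyclic hazard.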


Composite Likelihood Estimation

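The observed-data likelihood is $\sum_{\mathbf{y}} L(\theta; \mathbf{t}, \mathbf{y})$, where the summation is over all $2^n$ possibilities of $\mathbf{y}$!

Divide $W = [0, T]$ into $J$ non-overlapping unit windows of length $s$, i.e., $W = \bigcup_{j=1}^{J} W_j$ where $W_j = [(j-1)s, js)$.

As before, we assume that (i) the first event in $W_j$ is a parent event, and (ii) all events in the last episode of $W_j$ are contained in $W_j$.

Define $\mathbf{t}_j = \{t_i : t_i \in W_j\}$ and $\mathbf{y}_j = \{y_i : t_i \in W_j\}$. Then the observed-data likelihood on $W_j$ is

$$\sum_{\mathbf{y}_j} L(\theta; \mathbf{t}_j, \mathbf{y}_j).$$

We estimate $\theta$ by maximizing the composite likelihood

$$\tilde{L}(\theta; \mathbf{t}) = \prod_{j=1}^{J}\Big[\sum_{\mathbf{y}_j} L(\theta; \mathbf{t}_j, \mathbf{y}_j)\Big].$$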
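A brute-force version of the composite likelihood can be sketched as follows, assuming a user-supplied function `complete_loglik(times, labels, start, end)` that returns the complete-data log-likelihood on one window (that callback and all function names here are hypothetical). Because the first event of each window is assumed to be a parent, labelings that violate this contribute nothing, so the sketch fixes that label and enumerates only the remaining ones.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def split_windows(times, T, s):
    """Assign each event time to its window W_j = [(j-1)s, js)."""
    J = math.ceil(T / s)
    windows = [[] for _ in range(J)]
    for t in times:
        windows[min(int(t // s), J - 1)].append(t)
    return windows


def window_loglik(times_j, start, end, complete_loglik):
    """Log of the sum over labelings y_j, with the first event fixed as a parent."""
    if not times_j:
        return complete_loglik([], (), start, end)
    values = [
        complete_loglik(times_j, (1,) + rest, start, end)
        for rest in itertools.product((0, 1), repeat=len(times_j) - 1)
    ]
    return logsumexp(values)


def composite_loglik(times, T, s, complete_loglik):
    """Log composite likelihood: sum of the window-level observed-data log-likelihoods."""
    total = 0.0
    for j, times_j in enumerate(split_windows(times, T, s)):
        total += window_loglik(times_j, j * s, min((j + 1) * s, T), complete_loglik)
    return total
```

Enumeration is only practical when each window holds a modest number of events, which is the point of splitting the observation window in the first place.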

Composite Likelihood Estimation

Each summation in the CLE is over $2^{n_j}$ terms, where $n_j$ is the number of events in $W_j$.

Note that $\sum_{j=1}^{J} 2^{n_j} \ll 2^n$, so significant computational gains can be achieved.

There is a potential bias problem since (i) the first event in $W_j$ may not be a parent event, and (ii) not all events in the last episode of $W_j$ are contained in $W_j$.

The bias problem can be mitigated if we choose the blocks “wisely”.

Convergence can be a problem since multiple parameters need to be estimated simultaneously and the likelihood surface is often quite flat.
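To put the computational gain noted above in numbers, the short sketch below counts the terms in the full summation and in the window-wise summations for a given list of per-window event counts. The function name and the example counts are made up for illustration.

```python
def enumeration_costs(window_counts):
    """Return (terms in the full sum, terms in the composite sum) for counts n_1..n_J."""
    full = 2 ** sum(window_counts)
    composite = sum(2 ** n_j for n_j in window_counts)
    return full, composite


# Hypothetical example: 30 daily windows with 2 events each (n = 60).
full_terms, composite_terms = enumeration_costs([2] * 30)
```

Here the full likelihood would need 2^60 terms while the composite likelihood needs 30 × 4 = 120.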


A Composite Likelihood EM Algorithm

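Let $\mathbf{T}_j$ and $\mathbf{Y}_j$ be the random versions of $\mathbf{t}_j$ and $\mathbf{y}_j$.

In the E-step, we take the expectation of the log-likelihood $\ell(\theta; \mathbf{t}_j, \mathbf{Y}_j)$ with respect to the conditional distribution of $\mathbf{Y}_j \mid \mathbf{T}_j = \mathbf{t}_j, \hat{\theta}_{prev}$, i.e.,

$$Q_j(\theta \mid \hat{\theta}_{prev}) = E_{\mathbf{Y}_j \mid \mathbf{T}_j = \mathbf{t}_j, \hat{\theta}_{prev}}\, \ell(\theta; \mathbf{t}_j, \mathbf{Y}_j).$$

Define

$$Q(\theta \mid \hat{\theta}_{prev}) = \sum_{j=1}^{J} Q_j(\theta \mid \hat{\theta}_{prev}).$$

In the M-step, $Q(\theta \mid \hat{\theta}_{prev})$ is maximized with respect to $\theta$.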


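For the expectation, we need to calculate, for $t_l \in W_j$, the probability $P_\theta(Y_l = m \mid \mathbf{T}_j = \mathbf{t}_j)$, which is

$$P_\theta(Y_l = m \mid \mathbf{T}_j = \mathbf{t}_j) = \frac{\sum_{\mathbf{y}_j : y_l = m} L(\theta; \mathbf{t}_j, \mathbf{y}_j)}{\sum_{\mathbf{y}_j} L(\theta; \mathbf{t}_j, \mathbf{y}_j)}.$$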
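For a window with few events, the ratio above can be evaluated exactly by enumeration. The sketch below assumes a hypothetical callback `complete_loglik(times_j, labels)` that returns the complete-data log-likelihood on the window, and it fixes the first label to 1 in line with the parent-first assumption; it returns the probability for m = 1, from which the m = 0 case follows by subtraction.

```python
import itertools
import math


def posterior_parent_probs(times_j, complete_loglik):
    """Return P(Y_l = 1 | T_j = t_j) for every event in the window, by enumeration."""
    n = len(times_j)
    if n == 0:
        return []
    labelings = [(1,) + rest for rest in itertools.product((0, 1), repeat=n - 1)]
    logs = [complete_loglik(times_j, y) for y in labelings]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]  # likelihoods up to a common factor
    denom = sum(weights)
    return [
        sum(w for w, y in zip(weights, labelings) if y[l] == 1) / denom
        for l in range(n)
    ]
```

Subtracting the maximum log-likelihood before exponentiating leaves the ratio unchanged and avoids underflow when the likelihood values are very small.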

If there are a large number of events in $W_j$, we employ a standard Metropolis-Hastings algorithm to sample from the conditional distribution $\mathbf{Y}_j \mid \mathbf{T}_j = \mathbf{t}_j, \theta$ for the E-step.
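The slides do not specify the proposal used in the Metropolis-Hastings step, so the following is only one plausible sketch: a single-label flip proposal, which is symmetric, so the acceptance probability reduces to the likelihood ratio. The function name, the callback `log_lik(y)` (complete-data log-likelihood of a labeling of the window), and the iteration counts are all assumptions.

```python
import math
import random


def mh_parent_probs(n, log_lik, n_iter=20000, burn_in=2000, seed=0):
    """Estimate P(Y_l = 1 | data) for l = 0..n-1 by single-flip Metropolis-Hastings."""
    rng = random.Random(seed)
    y = [1] + [0] * (n - 1)      # the first event stays a parent throughout
    current = log_lik(y)
    counts = [0] * n
    kept = 0
    for it in range(n_iter):
        if n > 1:
            l = rng.randrange(1, n)   # pick a label other than the first
            y[l] = 1 - y[l]           # propose flipping it
            proposed = log_lik(y)
            # Symmetric proposal: accept with probability min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, proposed - current)):
                current = proposed
            else:
                y[l] = 1 - y[l]       # reject: undo the flip
        if it >= burn_in:
            kept += 1
            for i in range(n):
                counts[i] += y[i]
    return [c / kept for c in counts]
```

The returned frequencies play the role of the conditional probabilities needed in the E-step; in practice one would check mixing before trusting them.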

Closed-form expressions can be obtained for $\hat{\theta}$ (except for $\hat{\beta}$) in the M-step.

Convergence is not an issue.
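As an illustration of what such closed-form updates can look like in the univariate model, the following is a sketch derived here from the complete-data likelihood stated earlier; it is not copied from the authors' derivation. The shorthand $w_l = P_{\hat{\theta}_{prev}}(Y_l = 1 \mid \mathbf{T}_j = \mathbf{t}_j)$ is introduced only for this sketch, and the sums run over all events in all windows.

```latex
\begin{aligned}
Q(\theta \mid \hat{\theta}_{prev})
  &= \sum_{l} (1 - w_l)\,(\log\rho - \rho\, d_l)
     \;-\; \mu \sum_{l} w_l
     \;+\; \log\mu \sum_{l} (1 - w_l)
     \;+\; (\text{terms free of } \mu \text{ and } \rho), \\[4pt]
\frac{\partial Q}{\partial \mu} = 0
  &\;\Longrightarrow\;
  \hat{\mu} = \frac{\sum_{l} (1 - w_l)}{\sum_{l} w_l}, \\[4pt]
\frac{\partial Q}{\partial \rho} = 0
  &\;\Longrightarrow\;
  \hat{\rho} = \frac{\sum_{l} (1 - w_l)}{\sum_{l} (1 - w_l)\, d_l}.
\end{aligned}
```

In words, $\hat{\mu}$ is the expected number of offspring divided by the expected number of parents, and $\hat{\rho}$ is the expected number of offspring divided by their expected total gap length. No comparable expression is available for $\hat{\beta}$ because $\lambda(t;\beta)$ enters $Q$ through an integral, so that part of the M-step needs numerical optimization.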



Theorem

The log-composite likelihood $\tilde{\ell}(\theta; \mathbf{t}) = \log \tilde{L}(\theta; \mathbf{t})$ satisfies $\tilde{\ell}(\theta_p; \mathbf{t}) \ge \tilde{\ell}(\theta_{p-1}; \mathbf{t})$, $p = 1, 2, \dots$, where $\theta_p$ is the $p$th update from the EM algorithm.

The theorem guarantees that the log-composite likelihood is nondecreasing at each EM iteration.

The convergence of $\hat{\theta}_p$ to a stationary point as $p \to \infty$ is guaranteed by Theorem 2 in Wu (1983).

Standard techniques such as running the EM algorithm from multiple starting points can help locate the global maximum.

Consistency and asymptotic normality can be established for the global maximum (assuming the model is right).

Extension to Bivariate Case

For each episode, there are a Poisson number of subepisodes with mean γ.

Post and repost subepisodes alternate.

The first subepisode is a post subepisode with probability α.

There are a Poisson number of offspring in each post (repost) subepisode with mean µ1 (µ0).

For each offspring in a post (repost) subepisode, its location relative to that of the previous event in the same episode follows an exponential distribution with parameter ρ1 (ρ0).

Once all the events in an episode have been observed, the parent event in the following episode is generated following a hazard function λ(t; β).

The composite likelihood EM algorithm can be modified to fit the model.

Application to Trump’s Twitter Data

Figure: Parameters estimated from Donald Trump’s monthly Twitter data, 2013 to 2018. Panels: α, γ, µ1, µ0, ρ1, ρ0, number of tweets per episode, and episode length (in hours). The two red dashed lines mark June 2015 (candidacy announcement) and Jan 2017 (assumes office), respectively.

Figure: Estimated parent event hazard functions from Donald Trump’s monthly Twitter data. The two red dashed lines mark June 2015 (candidacy announcement) and Jan 2017 (assumes office), respectively.

Figure: Goodness-of-fit plots of the model fitted for Jan 2017. From left to right are the envelope plot (first plot), with the upper and lower envelopes marked by red dashed lines, and goodness-of-fit plots for the original offspring post (second plot), offspring repost (third plot) and parent (last plot) inter-event distances. Red solid lines are calculated from the cdf of exponential distributions. The grey bands are the 95% confidence intervals.
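The comparison between observed inter-event distances and an exponential cdf can be illustrated numerically. The sketch below is not the envelope or confidence-band procedure shown in the figure; it swaps in a simpler summary, the largest gap between the empirical cdf of a set of gaps and the Exp(ρ) cdf (a Kolmogorov-Smirnov-type distance). The function name and inputs are hypothetical.

```python
import math


def exp_cdf_discrepancy(gaps, rho):
    """Largest absolute difference between the empirical cdf of gaps and 1 - exp(-rho*x)."""
    xs = sorted(gaps)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - math.exp(-rho * x)
        # The empirical cdf jumps from i/n to (i+1)/n at x; check both sides of the jump.
        worst = max(worst, abs((i + 1) / n - model), abs(i / n - model))
    return worst
```

A small value suggests that the fitted exponential describes the offspring gaps well; judging significance would still require a reference distribution such as the bands in the figure.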


Application to Sina Weibo Data

Figure: The posting times of three users (Users 1, 2 and 3; horizontal axis: date, 01/01 to 01/30).

          α        γ        µ1       µ0       ρ1         ρ0
User 1    0.343    0.024    0.099    0.241    14.444     43.442
          (0.008)  (0.004)  (0.010)  (0.014)  (7.166)    (6.124)
User 2    0.387    0.086    0.101    0.614    163.026    618.721
          (0.009)  (0.006)  (0.010)  (0.010)  (13.013)   (21.749)
User 3    0.644    0.227    0.445    0.309    90.983     152.253
          (0.006)  (0.008)  (0.013)  (0.012)  (5.882)    (7.477)

Table: Estimated α, γ, µ1, µ0, ρ1, ρ0 of Users 1, 2 and 3.


Figure: Parent hazard functions of Users 1, 2 and 3 (horizontal axis: time of day, 12 am to 12 am; vertical axis: intensity).


Figure: Plots of the mean and first three eigenfunctions of the estimated daily parent hazard functions. Panels: mean function, first eigenfunction, second eigenfunction, third eigenfunction (horizontal axis: time of day, 12 am to 12 am).

Characterize Sina Weibo User Behavior

Figure: Groups in the average daily parent hazard (left plot), average number of posts per episode (middle plot) and average length (in hours) of an episode (right plot). The percentages at the bottom of the boxplots show the percentage of users in each group: 3.2%, 15.6%, 81.2% (left plot); 4.2%, 20.4%, 75.4% (middle plot); 7.3%, 26.05%, 66.6% (right plot).

Social Effect on Users of Sina Weibo

For each Sina Weibo user, we were also able to collect the number of accounts the user was following (n→) and the number of accounts that were following this user (n←).

We find that there is a stronger correlation between n→ and µ0 (r = 0.205).

These observations indicate that users who follow more accounts are more likely to have more reposts.

One explanation could be that the more accounts a user follows, the more content they can repost from. Another plausible explanation is that the “followers” in social media tend to repost more.

Social Effect on Users of Sina Weibo

We find that the “popular” users, i.e., those whose accounts have many followers, tend to post more original content. They are also more likely to initiate their Weibo engagement by posting original content.

We find that users who have strong social ties, i.e., have many followers or follow many others, are more likely to use Weibo more often.

We find that users with many followers are more likely to spend more time on Weibo once they start an episode of engagement.

Simulation Study

We set the observation window length T = 100 and α = 0.6.

With each parameter configuration, we simulate 100 event trajectories.

We set the parent event hazard function as

$$\lambda(t;\beta) = \exp\big[\beta_{01} + \beta_{11}\cos(2\pi t) + \beta_{12}\sin(2\pi t)\big].$$

For estimation, we use unit window length s = 1 or 5.

To model λ(t; β), we consider both the true model and the nonparametric cyclic B-spline model. For the latter, we use the knot vector (0, 0.2, 0.4, 0.6, 0.8, 1).


Simulation Study

(γ, µ1, µ0, ρ1, ρ0)
(β01, β11, β12; s)      α        γ        µ1       µ0       ρ1       ρ0

(0.5,0.5,0.5,10,15)     0.595    0.498    0.489    0.494    10.172   15.604
(-2,-2,2; 5)            (0.010)  (0.013)  (0.014)  (0.014)  (0.261)  (0.365)

(0.5,0.5,0.5,10,15)     0.594    0.496    0.510    0.518    9.867    15.422
(-3,-3,3; 5)            (0.007)  (0.011)  (0.012)  (0.014)  (0.188)  (0.284)

(1.0,0.5,0.5,10,15)     0.603    0.993    0.489    0.499    10.012   15.026
(-2,-2,2; 5)            (0.009)  (0.017)  (0.011)  (0.012)  (0.176)  (0.257)

(0.5,1.0,1.0,10,15)     0.598    0.511    0.990    1.025    10.149   15.084
(-2,-2,2; 5)            (0.008)  (0.010)  (0.016)  (0.017)  (0.171)  (0.309)

(0.5,0.5,0.5,20,30)     0.600    0.508    0.499    0.488    19.855   30.354
(-2,-2,2; 5)            (0.008)  (0.012)  (0.012)  (0.013)  (0.460)  (0.717)

(0.5,0.5,0.5,10,15)     0.601    0.468    0.495    0.460    10.795   16.335
(-2,-2,2; 1)            (0.008)  (0.010)  (0.014)  (0.014)  (0.271)  (0.309)

Summary

We propose a new clustered temporal point process model for user-generated posts on social media.

The proposed model captures both inhomogeneity in the initial posting time and the clustering pattern in the subsequent posts following the initial post.

The proposed goodness-of-fit procedure shows that the proposed model fits the data reasonably well.

The fitted models provide valuable insights on a user’s content-generating behavior.
