A Bivariate Point Process Model with Application to Social Media User Content Generation
Emma Jingfei [email protected]
Yongtao [email protected]
Department of Management Science, The Miami Business School, University of Miami
Data Description: Sina Weibo Data
Source: Sina Weibo, the largest Twitter-type online social media platform in China.
The dataset contains posts from 5,913 followers of the official Beijing University Guanghua MBA Weibo account.
For each user, all of his/her posts from Jan 1st to Jan 30th, 2014, including the time stamp of each post, have been collected.
Each post is either a post with original content or a repost.
Data Description: Trump’s Twitter Data
Source: Twitter data collected from Donald Trump (@realDonaldTrump) from Jan 2013 to Apr 2018.
The Twitter archive of Donald Trump can be downloaded from http://www.trumptwitterarchive.com/.
Twitter shows the device used for each tweet; devices may be Android, Web Client, iPhone, and others.
We consider the tweets posted from an Android device before the election and from an iPhone after it.
This results in a total of 17,518 tweets; the average number of tweets per month is 278.
Each tweet is either an original tweet or a retweet.
Data Description: Sina Weibo Data
Figure: The posting times of three users (Users 1, 2 and 3); the horizontal axis is the date, from 01/01 to 01/30.
Data Description: Sina Weibo Data
Figure: Average empirical pair correlation function; the horizontal axis is in hours (0 to 70).
Observations from Data
A user’s posting activity may alternate between active and inactive states.
During an active state, the user may publish one or more posts (often with short inter-post time distances).
During an inactive state, no post is produced until the start of the next active state.
There may be daily patterns in posting times.
The data form a bivariate point process (i.e., posts and reposts).
Graphical Illustration: Univariate Process
Episodes: clusters of posting time locations.
Adjacent episodes are nonoverlapping and separated by the inactive period in between.
Graphical Illustration: Bivariate Process
[Diagram: a timeline of episodes separated by inactive periods; each episode is divided into post segments and repost segments.]
Each episode contains subepisodes of posts and reposts.
Posts (reposts) tend to be followed by posts (reposts).
Reposts may be more clustered than posts.
The number of reposts may be related to the number of followees.
Clustered Point Process
Goal: model the clustered posting times in social media data (without distinguishing between posts and reposts for now).
Existing methods: the Hawkes process, the Neyman-Scott process, the Bartlett-Lewis process, and the interrupted Poisson process.
We propose a new class of clustered temporal point processes that is easy to interpret and can be easily generalized to the bivariate case.
Model Formulation
For each episode, the parent event generates a Poisson number of offspring events with mean µ.
Each offspring location, relative to the location of the previous event in the same cluster, follows an exponential distribution with parameter ρ.
Once all the events in an episode have been observed, the parent event in the following episode is generated following a hazard function λ(t;β).
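The three generative steps above can be sketched as a small simulator. This is a minimal illustration under our own assumptions, not the authors' code: ρ is treated as a rate, the next parent is drawn by thinning against a user-supplied upper bound `hazard_max`, offspring that spill past T are dropped, and all function names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Poisson draw via Knuth's multiplication method (adequate for small means)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def next_parent(rng, start, hazard, hazard_max):
    """First point after `start` of a process with hazard `hazard`, by thinning.

    `hazard_max` must bound hazard(t) from above for all t.
    """
    t = start
    while True:
        t += rng.expovariate(hazard_max)            # candidate from a rate-hazard_max process
        if rng.random() * hazard_max <= hazard(t):  # keep with probability hazard(t) / hazard_max
            return t


def simulate(T, mu, rho, hazard, hazard_max, seed=0):
    """Return (times, labels) on [0, T]; label 1 = parent, 0 = offspring."""
    rng = random.Random(seed)
    times, labels = [], []
    t = next_parent(rng, 0.0, hazard, hazard_max)
    while t <= T:
        times.append(t)
        labels.append(1)
        for _ in range(poisson(rng, mu)):           # Poisson(mu) offspring in this episode
            t += rng.expovariate(rho)               # gap to the previous event ~ Exp(rate rho)
            times.append(t)
            labels.append(0)
        t = next_parent(rng, t, hazard, hazard_max)  # inactive period until the next parent
    # Drop offspring that spilled past T (a choice made here, not stated on the slides).
    kept = [(s, y) for s, y in zip(times, labels) if s <= T]
    return [s for s, _ in kept], [y for _, y in kept]


# Example: daily-periodic hazard with time measured in days (illustrative values only).
times, labels = simulate(
    T=30.0, mu=1.0, rho=50.0,
    hazard=lambda t: math.exp(0.5 + 0.5 * math.cos(2 * math.pi * t)),
    hazard_max=math.exp(1.0),
)
```

Thinning is used only because it works for any bounded hazard; with a constant hazard the parent gap could be drawn directly from an exponential distribution.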
Model Formulation
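By observing the daily cyclic pattern in the average pair correlation function, we may assume that
$$\lambda(t;\beta) = \exp\left\{\beta_0 + \sum_{j=1}^{p}\left[\beta_{j1}\cos(\omega_j t) + \beta_{j2}\sin(\omega_j t)\right]\right\},$$
where $\omega_j = 2j\pi$ and $\beta = \{\beta_0, \beta_{j1}, \beta_{j2} : j = 1, \ldots, p\}$.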
Other nonparametric models can also be used.
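To make the parametric hazard concrete, here is a small sketch that evaluates λ(t;β) and approximates its integral numerically. It assumes time is measured in days, so that ω_j = 2jπ gives a period of one day; the function names, the list-of-pairs layout for the coefficients, and the trapezoid rule are our choices for illustration.

```python
import math


def hazard(t, beta0, harmonics):
    """lambda(t; beta) = exp(beta0 + sum_j [b_j1 cos(2 j pi t) + b_j2 sin(2 j pi t)]).

    `harmonics` is a list of (b_j1, b_j2) pairs for j = 1..p.
    """
    s = beta0
    for j, (b1, b2) in enumerate(harmonics, start=1):
        w = 2.0 * j * math.pi
        s += b1 * math.cos(w * t) + b2 * math.sin(w * t)
    return math.exp(s)


def cumulative_hazard(a, b, beta0, harmonics, steps=2000):
    """Trapezoid-rule approximation of the integral of the hazard over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (hazard(a, beta0, harmonics) + hazard(b, beta0, harmonics))
    for i in range(1, steps):
        total += hazard(a + i * h, beta0, harmonics)
    return total * h
```

Because every ω_j is a multiple of 2π, the hazard repeats with period 1: `hazard(0.3, ...)` and `hazard(1.3, ...)` agree up to rounding.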
Model Formulation
Define event time locations $\{T_l : l = 1, \ldots, N\}$ and indicator variables $\{Y_l : l = 1, \ldots, N\}$, where $Y_l = 1$ denotes a parent event and $Y_l = 0$ an offspring event.
Let $T_0 = 0$. Define the gap time
$$D_l = T_l - T_{l-1}, \quad l = 1, \ldots, N.$$
Let $f_{l0}(x)$ and $f_{l1}(x)$ be the probability density functions of $D_l$ given that $Y_l = 0$ and $Y_l = 1$, respectively. Assume
$$f_{l0}(x) = \rho \exp(-\rho x),$$
and
$$f_{l1}(x) = \lambda(t_{l-1} + x; \beta) \exp\left[-\int_{t_{l-1}}^{t_{l-1}+x} \lambda(t;\beta)\,dt\right].$$
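As a quick sanity check on the parent gap-time density, consider the special case p = 0, so that the hazard is the constant λ0 = exp(β0). This simplification is ours and is not made on the slides. The integral then collapses and the parent gap time is itself exponential:

```latex
f_{l1}(x)
  = \lambda_0 \exp\left[-\int_{t_{l-1}}^{t_{l-1}+x} \lambda_0 \, dt\right]
  = \lambda_0 \exp(-\lambda_0 x),
\qquad
\int_0^\infty f_{l1}(x)\,dx = 1 .
```

In this special case the two gap-time densities differ only in their rates: ρ for offspring and λ0 for parents.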
Model Formulation
Assume the first event is a parent event and all events in the last episode are contained in $[0, T]$.
The complete-data likelihood can then be written as
$$L(\theta; t, y) = \prod_{l=1}^{n} \prod_{m=0}^{1} \left[f_{lm}(d_l; \theta)\right]^{I(y_l = m)} \left[\prod_{i=1}^{k} P(N_i = n_i)\right] P(D_{n+1} > T - t_n),$$
where $D_{n+1}$ is the gap time between $t_n$ and the next parent event,
$$P(N_i = n_i) = \frac{\exp(-\mu)\,\mu^{n_i}}{n_i!},$$
and
$$P(D_{n+1} > T - t_n) = \exp\left[-\int_{t_n}^{T} \lambda(t;\beta)\,dt\right].$$
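The complete-data likelihood can be evaluated in one pass over the labelled events. The sketch below works on the log scale and reads k as the number of episodes and n_i as the number of offspring in episode i; that reading, the function name, and the callable interface for the hazard and its integral are our assumptions.

```python
import math


def complete_loglik(times, labels, T, mu, rho, hazard, cum_hazard):
    """log L(theta; t, y) for one labelled trajectory on [0, T].

    times      : increasing event times t_1..t_n
    labels     : y_l = 1 for a parent event, 0 for an offspring event
    hazard(t)  : lambda(t; beta)
    cum_hazard : cum_hazard(a, b) = integral of lambda over [a, b]
    """
    if not times:
        return -cum_hazard(0.0, T)      # no events: only the survival term remains
    if labels[0] != 1:
        return -math.inf                # the first event is assumed to be a parent
    ll, prev, counts = 0.0, 0.0, []
    for t, y in zip(times, labels):
        if y == 1:                      # parent gap: log f_l1(d_l)
            ll += math.log(hazard(t)) - cum_hazard(prev, t)
            counts.append(0)            # a new episode starts
        else:                           # offspring gap: log f_l0(d_l)
            ll += math.log(rho) - rho * (t - prev)
            counts[-1] += 1
        prev = t
    for n_i in counts:                  # Poisson(mu) offspring count per episode
        ll += -mu + n_i * math.log(mu) - math.lgamma(n_i + 1)
    return ll - cum_hazard(prev, T)     # no new parent in (t_n, T]


# Example with a constant hazard, for which the integral is available in closed form.
lam = 2.0
ll = complete_loglik([0.5, 0.6, 1.5], [1, 0, 1], T=2.0, mu=1.0, rho=10.0,
                     hazard=lambda t: lam, cum_hazard=lambda a, b: lam * (b - a))
```

Passing the hazard and its integral as callables keeps the function agnostic about whether λ(t;β) is the Fourier form or a nonparametric alternative.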
Composite Likelihood Estimation
The observed-data likelihood is $\sum_{y} L(\theta; t, y)$, where the summation is over all $2^n$ possible values of $y$, which is computationally infeasible for large $n$.
Divide $W = [0, T]$ into $J$ non-overlapping unit windows of length $s$, i.e., $W = \bigcup_{j=1}^{J} W_j$ where $W_j = [(j-1)s, js)$.
As before, we assume that the first event in $W_j$ is a parent event and that all events in the last episode of $W_j$ are contained in $W_j$.
Define $t_j = \{t_i : t_i \in W_j\}$ and $y_j = \{y_i : t_i \in W_j\}$. Then the observed-data likelihood on $W_j$ is
$$\sum_{y_j} L(\theta; t_j, y_j).$$
We estimate $\theta$ by maximizing the composite likelihood
$$\tilde{L}(\theta; t) = \prod_{j=1}^{J} \left[\sum_{y_j} L(\theta; t_j, y_j)\right].$$
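For small windows the composite likelihood can be computed by brute-force enumeration of the labels within each window. The sketch below makes several simplifying assumptions of our own: a constant hazard `lam` (the p = 0 case), T equal to J times s, each window re-based to start at 0, the first event in a window fixed as a parent, and an empty window contributing only the survival term. Function names are hypothetical.

```python
import math
from itertools import product


def window_loglik(times, labels, s, mu, rho, lam):
    """Complete-data log-likelihood on one window [0, s] with constant hazard lam."""
    ll, prev, counts = 0.0, 0.0, []
    for t, y in zip(times, labels):
        d = t - prev
        if y == 1:
            ll += math.log(lam) - lam * d   # parent gap
            counts.append(0)
        else:
            ll += math.log(rho) - rho * d   # offspring gap
            counts[-1] += 1
        prev = t
    ll += sum(-mu + n * math.log(mu) - math.lgamma(n + 1) for n in counts)
    return ll - lam * (s - prev)            # no further parent before the window ends


def log_composite_likelihood(times, T, s, mu, rho, lam):
    """Sum over windows of log( sum over labelings of L(theta; t_j, y_j) )."""
    total = 0.0
    for j in range(int(round(T / s))):
        local = [t - j * s for t in times if j * s <= t < (j + 1) * s]
        if not local:
            total += -lam * s               # empty window: survival term only (our choice)
            continue
        # Enumerate labelings with the first event fixed as a parent.
        logs = [window_loglik(local, (1,) + rest, s, mu, rho, lam)
                for rest in product((0, 1), repeat=len(local) - 1)]
        m = max(logs)                       # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total
```

The enumeration is exponential in the number of events per window, so this is only practical when windows contain a handful of events.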
Composite Likelihood Estimation
Each summation in the composite likelihood is over $2^{n_j}$ terms, where $n_j$ is the number of events in $W_j$.
Note that $\sum_{j=1}^{J} 2^{n_j} \ll 2^{n}$, so significant computational gains can be achieved.
There is a potential bias problem, since the first event in $W_j$ may not be a parent event and not all events in the last episode of $W_j$ may be contained in $W_j$.
The bias problem can be mitigated if we choose the blocks “wisely”.
Convergence can be a problem since multiple parameters need to be estimated simultaneously and the likelihood surface is often quite flat.
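To get a feel for the size of the computational gain, the following snippet compares the number of terms summed under the composite and the full likelihood. The window counts are made-up illustrative numbers, not taken from the data.

```python
def enumeration_cost(window_counts):
    """Return (terms summed by the composite likelihood, terms summed by the full likelihood)."""
    n = sum(window_counts)
    return sum(2 ** nj for nj in window_counts), 2 ** n


# Ten windows with six events each (hypothetical counts).
composite_terms, full_terms = enumeration_cost([6] * 10)
```

Here the composite likelihood sums 10 × 2^6 = 640 terms, while the full observed-data likelihood would sum 2^60 terms.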
A Composite Likelihood EM Algorithm
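Let $T_j$ and $Y_j$ be the random versions of $t_j$ and $y_j$.
In the E-step, we take the expectation of the log-likelihood $\ell(\theta; t_j, Y_j)$ with respect to the conditional distribution of $Y_j \mid T_j = t_j, \hat{\theta}_{prev}$, i.e.,
$$Q_j(\theta \mid \hat{\theta}_{prev}) = E_{Y_j \mid T_j = t_j, \hat{\theta}_{prev}}\, \ell(\theta; t_j, Y_j).$$
Define
$$Q(\theta \mid \hat{\theta}_{prev}) = \sum_{j=1}^{J} Q_j(\theta \mid \hat{\theta}_{prev}).$$
In the M-step, $Q(\theta \mid \hat{\theta}_{prev})$ is maximized with respect to $\theta$.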
A Composite Likelihood EM Algorithm
For the expectation, we need to calculate, for $t_l \in W_j$,
$$P_\theta(Y_l = m \mid T_j = t_j) = \frac{\sum_{y_j : y_l = m} L(\theta; t_j, y_j)}{\sum_{y_j} L(\theta; t_j, y_j)}.$$
If there are a large number of events in $W_j$, we employ a standard Metropolis-Hastings algorithm to sample from the conditional distribution $Y_j \mid T_j = t_j, \theta$ for the E-step.
Closed-form expressions can be obtained for $\hat{\theta}$ (except for $\hat{\beta}$) in the M-step.
Convergence is not an issue.
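For a window with few events, the E-step probabilities can be computed exactly by enumeration, and the updates for µ and ρ then have simple ratio forms. The slides state that closed forms exist but do not show them; the updates below are our own derivation from setting the derivatives of Q to zero (expected offspring count over expected parent count for µ, and expected offspring count over expected total offspring gap for ρ). The constant hazard `lam` and all function names are simplifying assumptions.

```python
import math
from itertools import product


def window_loglik(times, labels, s, mu, rho, lam):
    """Complete-data log-likelihood on one window [0, s] with constant hazard lam."""
    ll, prev, counts = 0.0, 0.0, []
    for t, y in zip(times, labels):
        d = t - prev
        if y == 1:
            ll += math.log(lam) - lam * d
            counts.append(0)
        else:
            ll += math.log(rho) - rho * d
            counts[-1] += 1
        prev = t
    ll += sum(-mu + n * math.log(mu) - math.lgamma(n + 1) for n in counts)
    return ll - lam * (s - prev)


def posterior_parent_probs(times, s, mu, rho, lam):
    """E-step by enumeration: P(Y_l = 1 | window times), first event fixed as a parent."""
    n = len(times)
    labelings = [(1,) + rest for rest in product((0, 1), repeat=n - 1)]
    logs = [window_loglik(times, y, s, mu, rho, lam) for y in labelings]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    z = sum(weights)
    return [sum(w for w, y in zip(weights, labelings) if y[l] == 1) / z
            for l in range(n)]


def m_step_mu_rho(windows, probs):
    """Ratio updates for mu and rho given per-event parent probabilities (our derivation)."""
    exp_parents = exp_offspring = exp_offspring_gap = 0.0
    for times, p in zip(windows, probs):
        prev = 0.0
        for t, w in zip(times, p):
            exp_parents += w
            exp_offspring += 1.0 - w
            exp_offspring_gap += (1.0 - w) * (t - prev)
            prev = t
    return exp_offspring / exp_parents, exp_offspring / exp_offspring_gap


# Two events 0.01 apart: the second is almost surely an offspring of the first.
window = [0.2, 0.21]
probs = posterior_parent_probs(window, s=1.0, mu=1.0, rho=50.0, lam=1.0)
mu_hat, rho_hat = m_step_mu_rho([window], [probs])
```

The ρ update divides by the expected total offspring gap, so it is undefined when every event is labelled a parent with probability one; a real implementation would need to guard against that case.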
A Composite Likelihood EM Algorithm
Theorem
The log-composite likelihood $\tilde{\ell}(\theta; t) = \log \tilde{L}(\theta; t)$ satisfies $\tilde{\ell}(\theta_p; t) \ge \tilde{\ell}(\theta_{p-1}; t)$, $p = 1, 2, \ldots$, where $\theta_p$ is the $p$th update from the EM algorithm.
The theorem guarantees that the log-composite likelihood is nondecreasing at each EM iteration.
The convergence of $\hat{\theta}_p$ to a stationary point as $p \to \infty$ is guaranteed by Theorem 2 in Wu (1983).
Standard techniques such as running the EM algorithm from multiple starting points can help locate the global maximum.
Consistency and asymptotic normality can be established for the global maximum (assuming the model is correctly specified).
Extension to Bivariate Case
For each episode, there are a Poisson number of subepisodes with mean γ.
Post and repost subepisodes alternate.
The first subepisode is a post subepisode with probability α.
There are a Poisson number of offspring in each post (repost) subepisode with mean µ1 (µ0).
For each offspring in a post (repost) subepisode, its location relative to that of the previous event in the same episode follows an exponential distribution with parameter ρ1 (ρ0).
Once all the events in an episode have been observed, the parent event in the following episode is generated following a hazard function λ(t;β).
The composite likelihood EM algorithm can be modified to fit the model.
Application to Trump’s Twitter Data
Figure: Parameters estimated from Donald Trump’s monthly Twitter data, 2013 to 2018. Panels: α, γ, µ1, µ0, ρ1, ρ0, number of tweets per episode, and episode length (in hours). The two red dashed lines mark June 2015 (candidacy announcement) and Jan 2017 (assumes office), respectively.
Figure: Estimated parent event hazard functions from Donald Trump’s monthly Twitter data. The two red dashed lines mark June 2015 (candidacy announcement) and Jan 2017 (assumes office), respectively.
Figure: Goodness-of-fit plots of the model fitted for Jan 2017. From left to right: the envelope plot (first plot), with the upper and lower envelopes marked by red dashed lines, and goodness-of-fit plots for the original offspring post (second plot), offspring repost (third plot) and parent (last plot) inter-event distances. Red solid lines are calculated from the cdf of exponential distributions. The grey bands are the 95% confidence intervals.
Application to Sina Weibo Data
Figure: The posting times of three users (Users 1, 2 and 3); the horizontal axis is the date, from 01/01 to 01/30.
         α        γ        µ1       µ0       ρ1         ρ0
User 1   0.343    0.024    0.099    0.241    14.444     43.442
         (0.008)  (0.004)  (0.010)  (0.014)  (7.166)    (6.124)
User 2   0.387    0.086    0.101    0.614    163.026    618.721
         (0.009)  (0.006)  (0.010)  (0.010)  (13.013)   (21.749)
User 3   0.644    0.227    0.445    0.309    90.983     152.253
         (0.006)  (0.008)  (0.013)  (0.012)  (5.882)    (7.477)
Table: Estimated α, γ, µ1, µ0, ρ1, ρ0 of Users 1, 2 and 3.
Application to Sina Weibo Data
Figure: Parent hazard functions of Users 1, 2 and 3 (horizontal axis: time of day, from 12 am through 12 pm to 12 am; vertical axis: intensity).
Application to Sina Weibo Data
Figure: Plots of the mean and first three eigenfunctions of the estimated daily parent hazard functions. Panels: mean function, first eigenfunction, second eigenfunction, third eigenfunction; the horizontal axis of each panel runs from 12 am through 12 pm to 12 am.
Characterize Sina Weibo User Behavior
Figure: Groups in the average daily parent hazard (left plot), average number of posts per episode (middle plot) and average length (in hours) of an episode (right plot). The percentages at the bottom of the boxplots show the percentage of users in each group: 3.2%, 15.6%, 81.2% (left plot); 4.2%, 20.4%, 75.4% (middle plot); 7.3%, 26.05%, 66.6% (right plot).
Social Effect on Users of Sina Weibo
For each Sina Weibo user, we were also able to collect the number of accounts the user was following (n→) and the number of accounts that were following this user (n←).
We find that there is a stronger correlation between n→ and µ0 (r = 0.205).
These observations indicate that users who follow more accounts are more likely to have more reposts.
One explanation could be that the more accounts a user follows, the more content they can repost from. Another plausible explanation is that the “followers” in social media tend to repost more.
Social Effect on Users of Sina Weibo
We find that the “popular” users, i.e., those whose accounts have many followers, tend to post more original content. They are also more likely to initiate their Weibo engagement by posting original content.
We find that users who have strong social ties, i.e., have many followers or follow many others, are more likely to use Weibo more often.
We find that users with many followers are more likely to spend more time on Weibo once they start an episode of engagement.
Simulation Study
We set the observation window length T = 100 and α = 0.6.
With each parameter configuration, we simulate 100 event trajectories.
We set the parent event hazard function as
λ(t;β) = exp[β01 + β11 cos(2πt) + β12 sin(2πt)].
For estimation, we use unit window length s = 1 or 5.
To model λ(t;β), we consider both the true model and the nonparametric cyclic B-spline model. For the latter, we use the knot vector (0, 0.2, 0.4, 0.6, 0.8, 1).
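To see what the simulated hazard looks like, the single-harmonic exponent can be rewritten in amplitude-phase form. The worked numbers below use (β01, β11, β12) = (-2, -2, 2), one of the configurations in the simulation table; the rewriting itself is a standard trigonometric identity that we add for illustration.

```latex
\beta_{11}\cos(2\pi t) + \beta_{12}\sin(2\pi t) = R\cos(2\pi t - \varphi),
\qquad R = \sqrt{\beta_{11}^2 + \beta_{12}^2},
\quad R\cos\varphi = \beta_{11},
\quad R\sin\varphi = \beta_{12},
\\[4pt]
\exp(\beta_{01} - R) \le \lambda(t;\beta) \le \exp(\beta_{01} + R).
\\[4pt]
(\beta_{01}, \beta_{11}, \beta_{12}) = (-2, -2, 2):
\quad R = 2\sqrt{2} \approx 2.83,
\quad \varphi = 3\pi/4,
\quad \lambda_{\max} = e^{-2 + 2\sqrt{2}} \approx 2.29 \ \text{at } t = 3/8 \ (\mathrm{mod}\ 1),
\quad \lambda_{\min} = e^{-2 - 2\sqrt{2}} \approx 0.0080 .
```

The upper bound exp(β01 + R) is also a convenient envelope when simulating parent events by thinning.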
Simulation Study
(γ, µ1, µ0, ρ1, ρ0)
(β01, β11, β12; s)     α        γ        µ1       µ0       ρ1       ρ0

(0.5,0.5,0.5,10,15)    0.595    0.498    0.489    0.494    10.172   15.604
(-2,-2,2; 5)           (0.010)  (0.013)  (0.014)  (0.014)  (0.261)  (0.365)

(0.5,0.5,0.5,10,15)    0.594    0.496    0.510    0.518    9.867    15.422
(-3,-3,3; 5)           (0.007)  (0.011)  (0.012)  (0.014)  (0.188)  (0.284)

(1.0,0.5,0.5,10,15)    0.603    0.993    0.489    0.499    10.012   15.026
(-2,-2,2; 5)           (0.009)  (0.017)  (0.011)  (0.012)  (0.176)  (0.257)

(0.5,1.0,1.0,10,15)    0.598    0.511    0.990    1.025    10.149   15.084
(-2,-2,2; 5)           (0.008)  (0.010)  (0.016)  (0.017)  (0.171)  (0.309)

(0.5,0.5,0.5,20,30)    0.600    0.508    0.499    0.488    19.855   30.354
(-2,-2,2; 5)           (0.008)  (0.012)  (0.012)  (0.013)  (0.460)  (0.717)

(0.5,0.5,0.5,10,15)    0.601    0.468    0.495    0.460    10.795   16.335
(-2,-2,2; 1)           (0.008)  (0.010)  (0.014)  (0.014)  (0.271)  (0.309)
Summary
We propose a new clustered temporal point process model for user-generated posts on social media.
The proposed model captures both inhomogeneity in the initial posting time and the clustering pattern in the subsequent posts following the initial post.
The proposed goodness-of-fit procedure shows that the proposed model fits the data reasonably well.
The fitted models provide valuable insights into a user’s content-generating behavior.