
Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations

Aaron Schein ASCHEIN@CS.UMASS.EDU
University of Massachusetts Amherst

Mingyuan Zhou MINGYUAN.ZHOU@MCCOMBS.UTEXAS.EDU
University of Texas at Austin

David M. Blei DAVID.BLEI@COLUMBIA.EDU
Columbia University

Hanna Wallach WALLACH@MICROSOFT.COM
Microsoft Research New York City

Abstract

We introduce Bayesian Poisson Tucker decomposition (BPTD) for modeling country–country interaction event data. These data consist of interaction events of the form "country $i$ took action $a$ toward country $j$ at time $t$." BPTD discovers overlapping country–community memberships, including the number of latent communities. In addition, it discovers directed community–community interaction networks that are specific to "topics" of action types and temporal "regimes." We show that BPTD yields an efficient MCMC inference algorithm and achieves better predictive performance than related models. We also demonstrate that it discovers interpretable latent structure that agrees with our knowledge of international relations.

1. Introduction

Like their inhabitants, countries interact with one another: they consult, negotiate, trade, threaten, and fight. These interactions are seldom uncoordinated. Rather, they are connected by a fabric of overlapping communities, such as security coalitions, treaties, trade cartels, and military alliances. For example, OPEC coordinates the petroleum export policies of its thirteen member countries, LAIA fosters trade among Latin American countries, and NATO guarantees collective defense against attacks by external parties.

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).

A single country can belong to multiple communities, reflecting its different identities. For example, Venezuela—an oil-producing country and a Latin American country—is a member of both OPEC and LAIA. When Venezuela interacts with other countries, it sometimes does so as an OPEC member and sometimes does so as a LAIA member.

Countries engage in both within-community and between-community interactions. For example, when acting as an OPEC member, Venezuela consults with other OPEC countries, but trades with non-OPEC, oil-importing countries. Moreover, although Venezuela engages in between-community interactions when trading as an OPEC member, it engages in within-community interactions when trading as a LAIA member. To understand or predict how countries interact, we must account for their community memberships and how those memberships influence their actions.

In this paper, we take a new approach to learning overlapping communities from interaction events of the form "country $i$ took action $a$ toward country $j$ at time $t$." A data set of such interaction events can be represented as either 1) a set of event tokens, 2) a tensor of event type counts, or 3) a series of weighted multinetworks. Models that use the token representation naturally yield efficient inference algorithms, models that use the tensor representation exhibit good predictive performance, and models that use the network representation learn latent structure that aligns with well-known concepts such as communities. Previous models of interaction event data have each used a subset of these representations. Our approach—Bayesian Poisson Tucker decomposition (BPTD)—takes advantage of all three.


Figure 1. Latent structure learned by BPTD from country–country interaction events between 1995 and 2000. Top right: A community–community interaction network specific to a single topic of action types and temporal regime. The topic places most of its mass on the Intend to Cooperate and Consult actions, so this network represents cooperative community–community interactions. The two strongest between-community interactions (circled) are 2→5 and 2→7. Left: Each row depicts the overlapping community memberships for a single country. We show only those countries whose strongest community membership is to either community 2, 5, or 7. We ordered the countries accordingly. Countries strongly associated with community 7 are highlighted in red; countries associated with community 5 are highlighted in green; and countries associated with community 2 are highlighted in purple. Bottom right: Each country is colored according to its strongest community membership. The latent communities have a very strong geographic interpretation.

BPTD builds on the classic Tucker decomposition (Tucker, 1964) to factorize a tensor of event type counts into three factor matrices and a four-dimensional core tensor (section 2). The factor matrices embed countries into communities, action types into "topics," and time steps into "regimes." The core tensor interacts communities, topics, and regimes. The country–community factors enable BPTD to learn overlapping community memberships, while the core tensor enables it to learn directed community–community interaction networks specific to topics of action types and temporal regimes. Figure 1 illustrates this structure. BPTD leads to an efficient MCMC inference algorithm (section 4) and achieves better predictive performance than related models (section 6). Finally, BPTD discovers interpretable latent structure that agrees with our knowledge of international relations (section 7).

2. Bayesian Poisson Tucker Decomposition

We can represent a data set of interaction events as a set of $N$ event tokens, where a single token $e_n = (i \xrightarrow{a} j, t)$ indicates that sender country $i \in [V]$ took action $a \in [A]$ toward receiver country $j \in [V]$ during time step $t \in [T]$. Alternatively, we can aggregate these event tokens into a four-dimensional tensor $\boldsymbol{Y}$, where element $y^{(t)}_{i \xrightarrow{a} j}$ is a count of the number of events of type $(i \xrightarrow{a} j, t)$. This tensor will be sparse because most event types never actually occur in practice. Finally, we can equivalently view this count tensor as a series of $T$ weighted multinetwork snapshots, where the weight on edge $i \xrightarrow{a} j$ in the $t$th snapshot is $y^{(t)}_{i \xrightarrow{a} j}$.
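To make the token–tensor equivalence concrete, a minimal sketch (ours, not the paper's; the parallel index arrays are hypothetical) aggregates a set of event tokens into the count tensor $\boldsymbol{Y}$:

```python
import numpy as np

def tokens_to_tensor(senders, actions, receivers, steps, V, A, T):
    """Aggregate N event tokens e_n = (i -a-> j, t), given as parallel
    index arrays, into the V x A x V x T count tensor Y."""
    Y = np.zeros((V, A, V, T), dtype=np.int64)
    # np.add.at accumulates repeated (i, a, j, t) coordinates correctly.
    np.add.at(Y, (senders, actions, receivers, steps), 1)
    return Y
```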

BPTD models each element of count tensor $\boldsymbol{Y}$ as

$$y^{(t)}_{i\xrightarrow{a}j} \sim \text{Po}\left(\sum_{c=1}^{C} \theta_{ic} \sum_{d=1}^{C} \theta_{jd} \sum_{k=1}^{K} \phi_{ak} \sum_{r=1}^{R} \psi_{tr}\, \lambda^{(r)}_{c\xrightarrow{k}d}\right), \quad (1)$$

where $\theta_{ic}$, $\theta_{jd}$, $\phi_{ak}$, $\psi_{tr}$, and $\lambda^{(r)}_{c\xrightarrow{k}d}$ are positive real numbers. Factors $\theta_{ic}$ and $\theta_{jd}$ capture the rates at which countries $i$ and $j$ participate in communities $c$ and $d$, respectively; factor $\phi_{ak}$ captures the strength of association between action $a$ and topic $k$; and $\psi_{tr}$ captures how well regime $r$ explains the events in time step $t$. We can collectively view the $V \times C$ country–community factors as a latent factor matrix $\Theta$, where the $i$th row represents country $i$'s community memberships. Similarly, we can view the $A \times K$ action–topic factors and the $T \times R$ time-step–regime factors as latent factor matrices $\Phi$ and $\Psi$, respectively. Factor $\lambda^{(r)}_{c\xrightarrow{k}d}$ captures the rate at which community $c$ takes actions associated with topic $k$ toward community $d$ during regime $r$. The $C \times C \times K \times R$ such factors form a core tensor $\Lambda$ that interacts communities, topics, and regimes.
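The rate inside equation 1 is a standard Tucker product of the factor matrices with the core tensor. A minimal numpy sketch (ours) of drawing a count tensor from the likelihood, with made-up dimensions and flat gamma draws standing in for the priors described below:

```python
import numpy as np

rng = np.random.default_rng(0)
V, A, T = 30, 20, 12     # countries, action types, time steps (made up)
C, K, R = 5, 3, 2        # communities, topics, regimes (made up)

Theta = rng.gamma(1.0, 1.0, size=(V, C))      # country-community factors
Phi = rng.gamma(1.0, 1.0, size=(A, K))        # action-topic factors
Psi = rng.gamma(1.0, 1.0, size=(T, R))        # time-step-regime factors
Lam = rng.gamma(1.0, 1.0, size=(C, C, K, R))  # core tensor

# rate[i, a, j, t] = sum_{c,d,k,r} Theta[i,c] Theta[j,d] Phi[a,k]
#                    Psi[t,r] Lam[c,d,k,r], as in equation 1.
rate = np.einsum('ic,jd,ak,tr,cdkr->iajt', Theta, Theta, Phi, Psi, Lam)
Y = rng.poisson(rate)
```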

The country–community factors are gamma-distributed,

$$\theta_{ic} \sim \Gamma(\alpha_i, \beta_i), \quad (2)$$

where the shape and rate parameters $\alpha_i$ and $\beta_i$ are specific to country $i$. We place an uninformative gamma prior over these shape and rate parameters: $\alpha_i, \beta_i \sim \Gamma(\epsilon_0, \epsilon_0)$. This hierarchical prior enables BPTD to express heterogeneity in the countries' rates of activity. For example, we expect that the US will engage in more interactions than Burundi.

The action–topic and time-step–regime factors are also gamma-distributed; however, we assume that these factors are drawn directly from an uninformative gamma prior,

$$\phi_{ak}, \psi_{tr} \sim \Gamma(\epsilon_0, \epsilon_0). \quad (3)$$

Because BPTD learns a single embedding of countries into communities, it preserves the traditional network-based notion of community membership. Any sender–receiver asymmetry is captured by the core tensor $\Lambda$, which we can view as a compression of count tensor $\boldsymbol{Y}$. By allowing on-diagonal elements, which we denote by $\lambda^{(r)}_{c\xrightarrow{k}c}$, and off-diagonal elements to be non-zero, the core tensor can represent both within- and between-community interactions.

The elements of the core tensor are gamma-distributed,

$$\lambda^{(r)}_{c\xrightarrow{k}c} \sim \Gamma\left(\eta^{\circlearrowleft}_c \eta^{\leftrightarrow}_c \nu_k \rho_r,\; \delta\right) \quad (4)$$
$$\lambda^{(r)}_{c\xrightarrow{k}d} \sim \Gamma\left(\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d \nu_k \rho_r,\; \delta\right), \quad c \neq d. \quad (5)$$

Each community $c \in [C]$ has two positive weights $\eta^{\circlearrowleft}_c$ and $\eta^{\leftrightarrow}_c$ that capture its rates of within- and between-community interaction, respectively. Each topic $k \in [K]$ has a positive weight $\nu_k$, while each regime $r \in [R]$ has a positive weight $\rho_r$. We place an uninformative prior over the within-community interaction rates and gamma shrinkage priors over the other weights: $\eta^{\circlearrowleft}_c \sim \Gamma(\epsilon_0, \epsilon_0)$, $\eta^{\leftrightarrow}_c \sim \Gamma(\gamma_0/C, \zeta)$, $\nu_k \sim \Gamma(\gamma_0/K, \zeta)$, and $\rho_r \sim \Gamma(\gamma_0/R, \zeta)$. These priors bias BPTD toward learning latent structure that is sparse. Finally, we assume that $\delta$ and $\zeta$ are drawn from an uninformative gamma prior: $\delta, \zeta \sim \Gamma(\epsilon_0, \epsilon_0)$.

As $K \to \infty$, the topic weights and their corresponding action–topic factors constitute a draw $G_K = \sum_{k=1}^{\infty} \nu_k \mathbb{1}_{\phi_k}$ from a gamma process (Ferguson, 1973). Similarly, as $R \to \infty$, the regime weights and their corresponding time-step–regime factors constitute a draw $G_R = \sum_{r=1}^{\infty} \rho_r \mathbb{1}_{\psi_r}$ from another gamma process. As $C \to \infty$, the within- and between-community interaction weights and their corresponding country–community factors constitute a draw $G_C = \sum_{c=1}^{\infty} \eta^{\leftrightarrow}_c \mathbb{1}_{\theta_c}$ from a marked gamma process (Kingman, 1972). The mark associated with atom $\theta_c = (\theta_{1c}, \ldots, \theta_{Vc})$ is $\eta^{\circlearrowleft}_c$. We can view the elements of the core tensor and their corresponding factors as a draw $G = \sum_{c=1}^{\infty}\sum_{d=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty} \lambda^{(r)}_{c\xrightarrow{k}d}\, \mathbb{1}_{\theta_c, \theta_d, \phi_k, \psi_r}$ from a gamma process, provided that the expected sum of the core tensor elements is finite. This multirelational gamma process extends the relational gamma process of Zhou (2015).

Proposition 1: In the limit as $C, K, R \to \infty$, the expected sum of the core tensor elements is finite and equal to

$$\mathbb{E}\left[\sum_{c=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty}\left(\lambda^{(r)}_{c\xrightarrow{k}c} + \sum_{d \neq c}\lambda^{(r)}_{c\xrightarrow{k}d}\right)\right] = \frac{1}{\delta}\left(\frac{\gamma_0^3}{\zeta^3} + \frac{\gamma_0^4}{\zeta^4}\right).$$

We prove this proposition in the supplementary material.
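As a quick numerical sanity check (ours, not from the paper), the following sketch truncates the process at finite $C$, $K$, and $R$, draws the weights from their priors with hypothetical hyperparameter values, and compares a Monte Carlo estimate of the expected sum (via the law of total expectation, using the gamma means conditional on the weights) to the limit above:

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, R = 200, 200, 200                       # truncation levels
gamma0, zeta, delta, eps0 = 1.0, 1.0, 1.0, 0.1  # hypothetical values

est, n_reps = 0.0, 500
for _ in range(n_reps):
    eta_within = rng.gamma(eps0, 1.0 / eps0, size=C)        # mean 1
    eta_between = rng.gamma(gamma0 / C, 1.0 / zeta, size=C)
    nu = rng.gamma(gamma0 / K, 1.0 / zeta, size=K)
    rho = rng.gamma(gamma0 / R, 1.0 / zeta, size=R)
    # Conditional mean of each core element is its gamma shape / delta.
    pair = np.outer(eta_between, eta_between)
    np.fill_diagonal(pair, eta_within * eta_between)        # within-community shapes
    est += pair.sum() * nu.sum() * rho.sum() / delta
est /= n_reps

limit = (gamma0**3 / zeta**3 + gamma0**4 / zeta**4) / delta
print(f"Monte Carlo estimate: {est:.3f}  vs  limit: {limit:.3f}")
```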

3. Connections to Previous Work

Poisson CP decomposition: DuBois & Smyth (2010) developed a model that assigns each event token (ignoring time steps) to one of $Q$ latent classes, where each class $q \in [Q]$ is characterized by three categorical distributions—$\theta^{\rightarrow}_q$ over senders, $\theta^{\leftarrow}_q$ over receivers, and $\phi_q$ over actions—i.e.,

$$P\left(e_n = (i\xrightarrow{a}j, t) \mid z_n = q\right) = \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}. \quad (6)$$

This model is closely related to the Poisson-based model of Schein et al. (2015), which explicitly uses the canonical polyadic (CP) tensor decomposition (Harshman, 1970) to factorize count tensor $\boldsymbol{Y}$ into four latent factor matrices. These factor matrices jointly embed senders, receivers, action types, and time steps into a $Q$-dimensional space,

$$y^{(t)}_{i\xrightarrow{a}j} \sim \text{Po}\left(\sum_{q=1}^{Q} \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}\, \psi_{tq}\right), \quad (7)$$

where $\theta^{\rightarrow}_{iq}$, $\theta^{\leftarrow}_{jq}$, $\phi_{aq}$, and $\psi_{tq}$ are positive real numbers. Schein et al.'s model generalizes Bayesian Poisson matrix factorization (Cemgil, 2009; Gopalan et al., 2014; 2015; Zhou & Carin, 2015) and non-Bayesian Poisson CP decomposition (Chi & Kolda, 2012; Welling & Weber, 2001).

Although Schein et al.'s model is expressed in terms of a tensor of event type counts, the relationship between the multinomial and Poisson distributions (Kingman, 1972) means that we can also express it in terms of a set of event tokens. This yields an equation that is similar to equation 6,

$$P\left(e_n = (i\xrightarrow{a}j, t) \mid z_n = q\right) \propto \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}\, \psi_{tq}. \quad (8)$$

Conversely, DuBois & Smyth's model can be expressed as a CP tensor decomposition. This equivalence is analogous to the relationship between Poisson matrix factorization and latent Dirichlet allocation (Blei et al., 2003).

We can make Schein et al.'s model nonparametric by adding a per-class positive weight $\lambda_q \sim \Gamma\left(\frac{\gamma_0}{Q}, \zeta\right)$, i.e.,

$$y^{(t)}_{i\xrightarrow{a}j} \sim \text{Po}\left(\sum_{q=1}^{Q} \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}\, \psi_{tq}\, \lambda_q\right). \quad (9)$$

As $Q \to \infty$, the per-class weights and their corresponding latent factors constitute a draw from a gamma process.

Adding this per-class weight reveals that CP decomposition is a special case of Tucker decomposition where the cardinalities of the latent dimensions are equal and the off-diagonal elements of the core tensor are zero. DuBois & Smyth's and Schein et al.'s models are therefore highly constrained special cases of BPTD that cannot capture dimension-specific structure, such as communities of countries or topics of action types. These models require each latent class to jointly summarize information about senders, receivers, action types, and time steps. This requirement conflates communities of countries and topics of action types, thus forcing each class to capture potentially redundant information. Moreover, by definition, CP decomposition models cannot express between-community interactions and cannot express sender–receiver asymmetry without learning completely separate latent factor matrices for senders and receivers. These limitations make it hard to interpret these models as learning community memberships.

Infinite relational models: The infinite relational model (IRM) of Kemp et al. (2006) also learns latent structure specific to each dimension of an $M$-dimensional tensor; however, unlike BPTD, the elements of this tensor are binary, indicating the presence or absence of the corresponding event type. The IRM therefore uses a Bernoulli likelihood. Schmidt & Mørup (2013) extended the IRM to model a tensor of event counts by replacing the Bernoulli likelihood with a Poisson likelihood (and gamma priors):

$$y^{(t)}_{i\xrightarrow{a}j} \sim \text{Po}\left(\lambda^{(z_t)}_{z_i\xrightarrow{z_a}z_j}\right), \quad (10)$$

where $z_i, z_j \in [C]$ are the respective community assignments of countries $i$ and $j$, $z_a \in [K]$ is the topic assignment of action $a$, and $z_t \in [R]$ is the regime assignment of time step $t$. This model, which we refer to as the gamma–Poisson IRM (GPIRM), allocates $M$-dimensional event types to $M$-dimensional latent classes—e.g., it allocates all tokens of type $(i\xrightarrow{a}j, t)$ to class $(z_i\xrightarrow{z_a}z_j, z_t)$.

The GPIRM is a special case of BPTD where the rows of the latent factor matrices are constrained to be "one-hot" binary vectors—i.e., $\theta_{ic} = \mathbb{1}(z_i = c)$, $\theta_{jd} = \mathbb{1}(z_j = d)$, $\phi_{ak} = \mathbb{1}(z_a = k)$, and $\psi_{tr} = \mathbb{1}(z_t = r)$. With this constraint, the Poisson rates in equations 1 and 10 are equal. Unlike BPTD, the GPIRM is a single-membership model. In addition, it cannot express heterogeneity in rates of activity of the countries, action types, and time steps. The latter limitation can be remedied by letting $\theta_{iz_i}$, $\theta_{jz_j}$, $\phi_{az_a}$, and $\psi_{tz_t}$ be positive real numbers. We refer to this variant of the GPIRM as the degree-corrected GPIRM (DCGPIRM).

Stochastic block models: The IRM itself generalizes the stochastic block model (SBM) of Nowicki & Snijders (2001), which learns latent structure from binary networks. Although the SBM was originally specified using a Bernoulli likelihood, Karrer & Newman (2011) introduced an alternative specification that uses the Poisson likelihood:

$$y_{i\rightarrow j} \sim \text{Po}\left(\sum_{c=1}^{C}\theta_{ic}\sum_{d=1}^{C}\theta_{jd}\,\lambda_{c\rightarrow d}\right), \quad (11)$$

where $\theta_{ic} = \mathbb{1}(z_i = c)$, $\theta_{jd} = \mathbb{1}(z_j = d)$, and $\lambda_{c\rightarrow d}$ is a positive real number. Like the IRM and the GPIRM, the SBM is a single-membership model and cannot express heterogeneity in the countries' rates of activity. Airoldi et al. (2008) addressed the former limitation by letting $\theta_{ic} \in [0, 1]$ such that $\sum_{c=1}^{C}\theta_{ic} = 1$. Meanwhile, Karrer & Newman (2011) addressed the latter limitation by allowing both $\theta_{iz_i}$ and $\theta_{jz_j}$ to be positive real numbers, much like the DCGPIRM. Ball et al. (2011) simultaneously addressed both limitations by letting $\theta_{ic}, \theta_{jd} \geq 0$, but constrained $\lambda_{c\rightarrow d} = \lambda_{d\rightarrow c}$. Finally, Zhou (2015) extended Ball et al.'s model to be nonparametric and introduced the Poisson–Bernoulli distribution to link binary data to the Poisson likelihood in a principled fashion. In this model, the elements of the core matrix and their corresponding factors constitute a draw from a relational gamma process.

Non-Poisson Tucker decomposition: Researchers sometimes refer to the Poisson rate in equation 11 as being "bilinear" because it can equivalently be written as $\theta_j \Lambda \theta_i^{\top}$. Nickel et al. (2012) introduced RESCAL—a non-probabilistic bilinear model for binary data that achieves state-of-the-art performance at relation extraction. Nickel et al. (2015) then introduced several extensions for extracting relations of different types. Bilinear models, such as RESCAL and its extensions, are all special cases (albeit non-probabilistic ones) of Tucker decomposition.

Hoff (2015) recently developed a Gaussian-based Tucker decomposition model and multilinear tensor regression model (Hoff, 2014) for analyzing interaction event data.

Finally, there are many other Tucker decomposition methods (Kolda & Bader, 2009). Although these include nonparametric (Xu et al., 2012) and nonnegative variants (Kim & Choi, 2007; Mørup et al., 2008; Cichocki et al., 2009), BPTD is the first such model to use a Poisson likelihood.

4. Posterior Inference

Given an observed count tensor $\boldsymbol{Y}$, inference in BPTD involves "inverting" the generative process to obtain the posterior distribution over the parameters conditioned on $\boldsymbol{Y}$ and hyperparameters $\epsilon_0$ and $\gamma_0$. The posterior distribution is analytically intractable; however, we can approximate it using a set of posterior samples. We draw these samples using Gibbs sampling, repeatedly resampling the value of each parameter from its conditional posterior given $\boldsymbol{Y}$, $\epsilon_0$, $\gamma_0$, and the current values of the other parameters. We express each parameter's conditional posterior in a closed form using gamma–Poisson conjugacy and the auxiliary variable techniques of Zhou & Carin (2012). We provide the conditional posteriors in the supplementary material.

The conditional posteriors depend on $\boldsymbol{Y}$ via a set of "latent sources" (Cemgil, 2009) or subcounts. Because of the Poisson additivity theorem (Kingman, 1972), each latent source $y^{(tr)}_{ic\xrightarrow{ak}jd}$ is a Poisson-distributed random variable:

$$y^{(tr)}_{ic\xrightarrow{ak}jd} \sim \text{Po}\left(\theta_{ic}\,\theta_{jd}\,\phi_{ak}\,\psi_{tr}\,\lambda^{(r)}_{c\xrightarrow{k}d}\right) \quad (12)$$
$$y^{(t)}_{i\xrightarrow{a}j} = \sum_{c=1}^{C}\sum_{d=1}^{C}\sum_{k=1}^{K}\sum_{r=1}^{R} y^{(tr)}_{ic\xrightarrow{ak}jd}. \quad (13)$$

Together, equations 12 and 13 are equivalent to equation 1. In practice, we can equivalently view each latent source in terms of the token representation described in section 2,

$$y^{(tr)}_{ic\xrightarrow{ak}jd} = \sum_{n=1}^{N} \mathbb{1}\left(e_n = (i\xrightarrow{a}j, t)\right)\mathbb{1}\left(z_n = (c\xrightarrow{k}d, r)\right), \quad (14)$$

where each token's class assignment $z_n$ is an auxiliary latent variable. Using this representation, computing the latent sources (given the current values of the model parameters) simply involves allocating event tokens to classes, much like the inference algorithm for DuBois & Smyth's model, and aggregating them using equation 14. The conditional posterior for each token's class assignment is

$$P\left(z_n = (c\xrightarrow{k}d, r) \mid e_n = (i\xrightarrow{a}j, t), \boldsymbol{Y}, \epsilon_0, \gamma_0, \ldots\right) \propto \theta_{ic}\,\theta_{jd}\,\phi_{ak}\,\psi_{tr}\,\lambda^{(r)}_{c\xrightarrow{k}d}. \quad (15)$$
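For illustration, a minimal sketch (ours, not the authors' implementation) of sampling one token's class assignment from equation 15. It materializes all $C \times C \times K \times R$ class probabilities—i.e., the naive approach whose cost the compositional trick described next avoids:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class(i, a, j, t, Theta, Phi, Psi, Lam):
    """Draw z_n = (c -k-> d, r) for token e_n = (i -a-> j, t) from the
    conditional posterior in equation 15. Lam has shape (C, C, K, R)."""
    # Unnormalized posterior over all (c, d, k, r) latent classes.
    p = np.einsum('c,d,k,r,cdkr->cdkr',
                  Theta[i], Theta[j], Phi[a], Psi[t], Lam)
    flat = p.ravel()
    q = rng.choice(flat.size, p=flat / flat.sum())  # categorical draw
    return np.unravel_index(q, Lam.shape)           # back to (c, d, k, r)
```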

Computation is dominated by the normalizing constant

$$Z^{(t)}_{i\xrightarrow{a}j} = \sum_{c=1}^{C}\sum_{d=1}^{C}\sum_{k=1}^{K}\sum_{r=1}^{R} \theta_{ic}\,\theta_{jd}\,\phi_{ak}\,\psi_{tr}\,\lambda^{(r)}_{c\xrightarrow{k}d}. \quad (16)$$

Computing this normalizing constant naïvely involves $O(C \times C \times K \times R)$ operations; however, because each latent class $(c\xrightarrow{k}d, r)$ is composed of four separate dimensions, we can improve efficiency. We instead compute

$$Z^{(t)}_{i\xrightarrow{a}j} = \sum_{c=1}^{C}\theta_{ic}\sum_{d=1}^{C}\theta_{jd}\sum_{k=1}^{K}\phi_{ak}\sum_{r=1}^{R}\psi_{tr}\,\lambda^{(r)}_{c\xrightarrow{k}d}, \quad (17)$$

which involves $O(C + C + K + R)$ operations.

Compositional allocation using equations 15 and 17 improves computational efficiency significantly over naïve non-compositional allocation using equations 15 and 16. In practice, we set $C$, $K$, and $R$ to large values to approximate the nonparametric interpretation of BPTD. If, for example, $C = 50$, $K = 10$, and $R = 5$, computing the normalizing constant for equation 15 using equation 16 requires 2,753 times the number of operations implied by equation 17.

Proposition 2: For an $M$-dimensional core tensor with $D_1 \times \ldots \times D_M$ elements, computing the normalizing constant using non-compositional allocation requires $1 \leq \pi < \infty$ times the number of operations required to compute it using compositional allocation. When $D_1 = \ldots = D_M = 1$, $\pi = 1$. As $D_m, D_{m'} \to \infty$ for any $m$ and $m' \neq m$, $\pi \to \infty$.

We prove this proposition in the supplementary material.
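As a quick arithmetic check (ours), the following sketch reproduces the factor of 2,753 quoted above, using the operation-count ratio derived in the supplementary material (which counts multiplications plus additions):

```python
from math import prod

def op_ratio(dims):
    """Ratio of non-compositional to compositional operation counts
    for an M-dimensional core tensor, following Proposition 2."""
    M = len(dims)
    non_comp = M * prod(dims) + prod(dims)           # (M + 1) * prod(D_m)
    comp = sum(dims) + 1 + sum(d - 1 for d in dims)  # 2 * sum(D_m) - M + 1
    return non_comp / comp

# The example from the text: two community dimensions of size C = 50,
# K = 10 topics, and R = 5 regimes.
print(round(op_ratio([50, 50, 10, 5])))  # -> 2753
```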

BPTD and other Poisson-based models yield allocation inference algorithms that take advantage of the inherent sparsity of the data and scale with the number of event tokens. In contrast, non-Poisson tensor decomposition models (including Hoff's model) lead to algorithms that scale with the size of the count tensor. Allocation-based inference in BPTD is especially efficient because it compositionally allocates each $M$-dimensional event token to an $M$-dimensional latent class. Figure 2 illustrates this process. CP decomposition models, such as those of DuBois & Smyth (2010) and Schein et al. (2015), only permit non-compositional allocation. For example, while BPTD allocates each token $e_n = (i\xrightarrow{a}j, t)$ to a four-dimensional latent class $(c\xrightarrow{k}d, r)$, Schein et al.'s model allocates $e_n$ to a one-dimensional latent class $q$ that cannot be decomposed. Therefore, when $Q = C \times C \times K \times R$, BPTD yields a faster allocation inference algorithm than Schein et al.'s model.

Figure 2. Compositional allocation. For clarity, we show the allocation process for a three-dimensional count tensor (ignoring time steps). Observed three-dimensional event tokens (left) are compositionally allocated to three-dimensional latent classes (right).

5. Country–Country Interaction Event Data

Our data come from the Integrated Crisis Early Warning System (ICEWS) of Boschee et al. and the Global Database of Events, Language, and Tone (GDELT) of Leetaru & Schrodt (2013). ICEWS and GDELT both use the Conflict and Mediation Event Observations (CAMEO) hierarchy (Gerner et al.) for senders, receivers, and actions.

The top-level CAMEO coding for senders and receivers is their country affiliation, while lower levels in the hierarchy incorporate more specific attributes like their sectors (e.g., government or civilian) and their religious or ethnic affiliations. When studying international relations using CAMEO-coded event data, researchers usually consider only the senders' and receivers' countries. There are 249 countries represented in ICEWS, which include non-universally recognized states, such as Occupied Palestinian Territory, and former states, such as Former Yugoslav Republic of Macedonia; there are 233 countries in GDELT.

The top level for actions, which we use in our analyses, consists of twenty action classes, roughly ranked according to their overall sentiment. For example, the most negative is 20—Use Unconventional Mass Violence. CAMEO further divides these actions into the QuadClass scheme: Verbal Cooperation (actions 2–5), Material Cooperation (actions 6–7), Verbal Conflict (actions 8–16), and Material Conflict (16–20). The first action (1—Make Statement) is neutral.


6. Predictive Analysis

Baseline models: We compared BPTD's predictive performance to that of three baseline models, described in section 3: 1) the GPIRM, 2) the DCGPIRM, and 3) the Bayesian Poisson tensor factorization (BPTF) model of Schein et al. (2015). All three models use a Poisson likelihood and have the same two hyperparameters as BPTD—i.e., $\epsilon_0$ and $\gamma_0$. We set $\epsilon_0$ to 0.1, as recommended by Gelman (2006), and we set $\gamma_0$ so that $\left(\frac{\gamma_0}{C}\right)^2\left(\frac{\gamma_0}{K}\right)\left(\frac{\gamma_0}{R}\right) = 0.01$. This parameterization encourages the elements of the core tensor $\Lambda$ to be sparse. We implemented an MCMC inference algorithm for each model. We provide the full generative process for all three models in the supplementary material.

The GPIRM and the DCGPIRM are both Tucker decomposition models and thus allocate events to four-dimensional latent classes. The cardinalities of these latent dimensions are the same as BPTD's—i.e., $C$, $K$, and $R$. In contrast, BPTF is a CP decomposition model and thus allocates events to one-dimensional latent classes. We set the cardinality of this dimension so that the total number of latent factors in BPTF's likelihood was equal to the total number of latent factors in BPTD's likelihood—i.e., $Q = \left\lceil \frac{(V \times C) + (A \times K) + (T \times R) + (C^2 \times K \times R)}{V + V + A + T + 1} \right\rceil$. We chose not to let BPTF and BPTD use the same number of latent classes—i.e., to set $Q = C^2 \times K \times R$. BPTF does not permit compositional allocation, so MCMC inference becomes very slow for even moderate values of $C$, $K$, and $R$. CP decomposition models also tend to overfit when $Q$ is large (Zhao et al., 2015). Throughout our predictive experiments, we let $C = 20$, $K = 6$, and $R = 3$. These values were well-supported by the data, as we explain in section 7.
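For concreteness, a small sketch (ours) of the formula for $Q$; the dimensions below are illustrative stand-ins for a one-year ICEWS tensor with monthly time steps, not values reported in the paper:

```python
from math import ceil

def bptf_classes(V, A, T, C, K, R):
    """Number of latent classes Q that matches BPTF's total latent
    factor count to BPTD's, per the formula in section 6."""
    total_bptd = V * C + A * K + T * R + C * C * K * R
    return ceil(total_bptd / (V + V + A + T + 1))

print(bptf_classes(V=249, A=20, T=12, C=20, K=6, R=3))  # -> 24
```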

Experimental setup: We constructed twelve different observed tensors—six from ICEWS and six from GDELT. Five of the six tensors for each source (ICEWS or GDELT) correspond to one-year time spans with monthly time steps, starting with 2004 and ending with 2008; the sixth corresponds to a five-year time span with monthly time steps, spanning 1995–2000. We divided each tensor $\boldsymbol{Y}$ into a training tensor $\boldsymbol{Y}^{\text{train}} = \boldsymbol{Y}^{(1)}, \ldots, \boldsymbol{Y}^{(T-3)}$ and a test tensor $\boldsymbol{Y}^{\text{test}} = \boldsymbol{Y}^{(T-2)}, \ldots, \boldsymbol{Y}^{(T)}$. We further divided each test tensor into a held-out portion and an observed portion via a binary mask. We experimented with two different masks: one that treats the elements involving the fifteen most active countries as the held-out portion and the remaining elements as the observed portion, and one that does the opposite. The first mask enabled us to evaluate the models' reconstructions of the densest (and arguably most interesting) portion of each test tensor, while the second mask enabled us to evaluate their reconstructions of its complement. Across the entire GDELT database, for example, the elements involving the fifteen most active countries—i.e., 6% of all 233 countries—account for 30% of the event tokens. Moreover, 40% of these elements are non-zero. These non-zero elements are highly dispersed, with a variance-to-mean ratio of 220. In contrast, only 0.7% of the elements involving the other countries are non-zero. These elements have a variance-to-mean ratio of 26.

For each combination of the four models, twelve tensors, and two masks, we ran 5,000 iterations of MCMC inference on the training tensor. We clamped the country–community factors, the action–topic factors, and the core tensor and then inferred the time-step–regime factors for the test tensor using its observed portion by running 1,000 iterations of MCMC inference. We saved every tenth sample after the first 500. We used each sample, along with the country–community factors, the action–topic factors, and the core tensor, to compute the Poisson rate for each element in the held-out portion of the test tensor. Finally, we averaged these rates across samples and used each element's average rate to compute its probability. We combined the held-out elements' probabilities by taking their geometric mean or, equivalently, by computing their inverse perplexity. We chose this combination strategy to ensure that the models were penalized heavily for making poor predictions on the non-zero elements and were not rewarded excessively for making good predictions on the zero elements. By clamping the country–community factors, the action–topic factors, and the core tensor after training, our experimental setup is analogous to that used to assess collaborative filtering models' strong generalization ability (Marlin, 2004).
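Under our reading of this setup, the metric can be sketched as follows (a minimal sketch with hypothetical array shapes, not the authors' code):

```python
import numpy as np
from scipy.stats import poisson

def inverse_perplexity(y_heldout, rate_samples):
    """y_heldout: length-E array of held-out counts; rate_samples:
    S x E array of per-sample Poisson rates. Returns the geometric
    mean of the held-out elements' probabilities."""
    mean_rate = rate_samples.mean(axis=0)            # average rates over samples
    log_prob = poisson.logpmf(y_heldout, mean_rate)  # per-element log probability
    return np.exp(log_prob.mean())                   # geometric mean
```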

Results: Figure 3 illustrates the results for each combination of the four models, twelve tensors, and two masks. The top row contains the results from the twelve experiments involving the first mask, where the elements involving the fifteen most active countries were treated as the held-out portion. BPTD outperformed the baselines significantly. BPTF—itself a state-of-the-art model—performed better than BPTD in only one experiment. In general, the Tucker decomposition allows BPTD to learn richer latent structure that generalizes better to held-out data. The bottom row contains the results from the experiments involving the second mask. The models' performance was closer in these experiments, probably because of the large proportion of easy-to-predict zero elements. BPTD and BPTF performed indistinguishably in these experiments, and both models outperformed the GPIRM and the DCGPIRM. The single-membership nature of the GPIRM and the DCGPIRM prevents them from expressing high levels of heterogeneity in the countries' rates of activity. When the held-out elements were highly dispersed, these models sometimes made extremely inaccurate predictions. In contrast, the mixed-membership nature of BPTD and BPTF allows them to better express heterogeneous rates of activity.


Figure 3. Predictive performance. Each plot shows the inverse perplexity (higher is better) for the four models: the GPIRM (blue), the DCGPIRM (green), BPTF (red), and BPTD (yellow). In the experiments depicted in the top row, we treated the elements involving the most active countries as the held-out portion; in the experiments depicted in the bottom row, we treated the remaining elements as the held-out portion. For ease of comparison, we scaled the inverse perplexities to lie between zero and one; we give the scales in the top-left corners of the plots. BPTD outperformed the baselines significantly when predicting the denser portion of each test tensor (top row).

7. Exploratory Analysis

We used a tensor of ICEWS events spanning 1995–2000, with monthly time steps, to explore the latent structure discovered by BPTD. We initially let $C = 50$, $K = 8$, and $R = 3$—i.e., $C \times C \times K \times R = 60{,}000$ latent classes—and used the shrinkage priors to adaptively learn the most appropriate numbers of communities, topics, and regimes. We found $C = 20$ communities and $K = 6$ topics with weights that were significantly greater than zero. We provide a plot of the community weights in the supplementary material. Although all three regimes had non-zero weights, one had a much larger weight than the other two. For comparison, Schein et al. (2015) used fifty latent classes to model the same data, while Hoff (2015) used $C = 4$, $K = 4$, and $R = 4$ to model a similar tensor from GDELT.

Topics of action types: We show the inferred action–topic factors as a heatmap in the left subplot of figure 4. We ordered the topics by their weights $\nu_1, \ldots, \nu_K$, which are above the heatmap. The inferred topics correspond very closely to CAMEO's QuadClass scheme. Moving from left to right, the topics place their mass on increasingly negative actions. Topics 1 and 2 place most of their mass on Verbal Cooperation actions; topic 3 places most of its mass on Material Cooperation actions and the neutral 1—Make Statement action; topic 4 places most of its mass on Verbal Conflict actions and the 1—Make Statement action; and topics 5 and 6 place their mass on Material Conflict actions.

Topic-partitioned community–community networks: In the right subplot of figure 4, we visualize the inferred community structure for topic $k = 1$ and the most active regime $r$. The bottom-left heatmap is the community–community interaction network $\Lambda^{(r)}_k$. The top-left heatmap depicts the rate at which each country $i$ acts as a sender in each community $c$—i.e., $\theta_{ic}\sum_{j=1}^{V}\sum_{d=1}^{C}\theta_{jd}\,\lambda^{(r)}_{c\xrightarrow{k}d}$. Similarly, the bottom-right heatmap depicts the rate at which each country acts as a receiver in each community. The top-right heatmap depicts the number of times each country $i$ took an action associated with topic $k$ toward each country $j$ during regime $r$—i.e., $\sum_{c=1}^{C}\sum_{d=1}^{C}\sum_{a=1}^{A}\sum_{t=1}^{T} y^{(tr)}_{ic\xrightarrow{ak}jd}$. We grouped the countries by their strongest community memberships and ordered the communities by their within-community interaction weights $\eta^{\circlearrowleft}_1, \ldots, \eta^{\circlearrowleft}_C$, from smallest to largest; the thin green lines separate the countries that are strongly associated with one community from the countries that are strongly associated with its adjacent communities.

Some communities contain only one or two strongly associated countries. For example, community 1 contains only the US, community 6 contains only China, and community 7 contains only Russia and Belarus. These communities mostly engage in between-community interaction. Other larger communities, such as communities 9 and 15, mostly engage in within-community interaction. Most communities have a strong geographic interpretation. Moving upward from the bottom, there are communities that correspond to Eastern Europe, East Africa, South-Central Africa, Latin America, Australasia, Central Europe, Central Asia, etc. The community–community interaction network summarizes the patterns in the top-right heatmap. This topic is dominated by the 4—Consult action, so the network is symmetric; the more negative topics have asymmetric community–community interaction networks. We therefore hypothesize that cooperation is an inherently reciprocal type of interaction. We provide visualizations for the other five topics in the supplementary material.

Figure 4. Left: Action–topic factors. The topics are ordered by $\nu_1, \ldots, \nu_K$ (above the heatmap). Right: Latent structure discovered by BPTD for topic $k = 1$ and the most active regime, including the community–community interaction network (bottom left), the rate at which each country acts as a sender (top left) and a receiver (bottom right) in each community, and the number of times each country $i$ took an action associated with topic $k$ toward each country $j$ during regime $r$ (top right). We show only the most active 100 countries.

8. Summary

We presented Bayesian Poisson Tucker decomposition (BPTD) for learning the latent structure of international relations from country–country interaction events of the form "country $i$ took action $a$ toward country $j$ at time $t$." Unlike previous models, BPTD takes advantage of all three representations of an interaction event data set: 1) a set of event tokens, 2) a tensor of event type counts, and 3) a series of weighted multinetwork snapshots. BPTD uses a Poisson likelihood, respecting the discrete nature of the data and its inherent sparsity. Moreover, BPTD yields a compositional allocation inference algorithm that is more efficient than non-compositional allocation algorithms. Because BPTD is a Tucker decomposition model, it shares parameters across latent classes. In contrast, CP decomposition models force each latent class to capture potentially redundant information. BPTD therefore "does more with less." This efficiency is reflected in our predictive analysis: BPTD outperforms BPTF—a CP decomposition model—as well as two other baselines. BPTD learns interpretable latent structure that aligns with well-known concepts from the networks literature. Specifically, BPTD learns latent country–community memberships, including the number of communities, as well as directed community–community interaction networks that are specific to topics of action types and temporal regimes. This structure captures the complexity of country–country interactions, while revealing patterns that agree with our knowledge of international relations. Finally, although we presented BPTD in the context of interaction events, BPTD is well suited to learning latent structure from other types of multidimensional count data.

ity of country–country interactions, while revealing pat-terns that agree with our knowledge of international rela-tions. Finally, although we presented BPTD in the contextof interaction events, BPTD is well suited to learning latentstructure from other types of multidimensional count data.

Acknowledgements

We thank Abigail Jacobs and Brandon Stewart for helpful discussions. This work was supported by NSF #SBE-0965436, #IIS-1247664, #IIS-1320219; ONR #N00014-11-1-0651; DARPA #FA8750-14-2-0009, #N66001-15-C-4032; Adobe; the John Templeton Foundation; the Sloan Foundation; the UMass Amherst Center for Intelligent Information Retrieval. Any opinions, findings, conclusions, or recommendations expressed in this material are the authors' and do not necessarily reflect those of the sponsors.


References

Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.

Ball, B., Karrer, B., and Newman, M. E. J. Efficient and principled method for detecting communities in networks. Physical Review E, 84(3), 2011.

Blei, D., Ng, A., and Jordan, M. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Boschee, E., Lautenschlager, J., O'Brien, S., Shellman, S., Starz, J., and Ward, M. ICEWS coded event data. Harvard Dataverse. V10.

Cemgil, A. T. Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience, 2009.

Chi, E. C. and Kolda, T. G. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, 2012.

Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-i. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. John Wiley & Sons, 2009.

DuBois, C. and Smyth, P. Modeling relational events via latent classes. In Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 803–812, 2010.

Ferguson, T. S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.

Gelman, A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3):515–533, 2006.

Gerner, D. J., Schrodt, P. A., Abu-Jabr, R., and Yilmaz, O. Conflict and mediation event observations (CAMEO): A new event data framework for the analysis of foreign policy interactions. Working paper.

Gopalan, P., Ruiz, F. J. R., Ranganath, R., and Blei, D. M. Bayesian nonparametric Poisson factorization for recommendation systems. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33, pp. 275–283, 2014.

Gopalan, P., Hofman, J., and Blei, D. Scalable recommendation with Poisson factorization. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015.

Harshman, R. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. UCLA Working Papers in Phonetics, 16:1–84, 1970.

Hoff, P. Multilinear tensor regression for longitudinal relational data. arXiv:1412.0048, 2014.

Hoff, P. Equivariant and scale-free Tucker decomposition models. Bayesian Analysis, 2015.

Karrer, B. and Newman, M. E. J. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 2011.

Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., and Ueda, N. Learning systems of concepts with an infinite relational model. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, 2006.

Kim, Y.-D. and Choi, S. Nonnegative Tucker decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.

Kingman, J. F. C. Poisson Processes. Oxford University Press, 1972.

Kolda, T. G. and Bader, B. W. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

Leetaru, K. and Schrodt, P. GDELT: Global data on events, location, and tone, 1979–2012. Working paper, 2013.

Marlin, B. Collaborative filtering: A machine learning perspective. Master's thesis, University of Toronto, 2004.

Mørup, M., Hansen, L. K., and Arnfred, S. M. Algorithms for sparse nonnegative Tucker decompositions. Neural Computation, 20(8):2112–2131, 2008.

Nickel, M., Tresp, V., and Kriegel, H.-P. Factorizing YAGO: Scalable machine learning for linked data. In Proceedings of the Twenty-First International World Wide Web Conference, pp. 271–280, 2012.

Nickel, M., Murphy, K., Tresp, V., and Gabrilovich, E. A review of relational machine learning for knowledge graphs: From multi-relational link prediction to automated knowledge graph construction. arXiv:1503.00759, 2015.

Nowicki, K. and Snijders, T. A. B. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087, 2001.

Schein, A., Paisley, J., Blei, D. M., and Wallach, H. Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In Proceedings of the Twenty-First ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1045–1054, 2015.

Schmidt, M. N. and Mørup, M. Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Signal Processing Magazine, 30(3):110–128, 2013.

Tucker, L. R. The extension of factor analysis to three-dimensional matrices. In Frederiksen, N. and Gulliksen, H. (eds.), Contributions to Mathematical Psychology. Holt, Rinehart and Winston, 1964.

Welling, M. and Weber, M. Positive tensor factorization. Pattern Recognition Letters, 22(12):1255–1261, 2001.

Xu, Z., Yan, F., and Qi, Y. Infinite Tucker decomposition: Nonparametric Bayesian models for multiway data analysis. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, pp. 1023–1030, 2012.

Zhao, Q., Zhang, L., and Cichocki, A. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1751–1763, 2015.

Zhou, M. Infinite edge partition models for overlapping community detection and link prediction. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 1135–1143, 2015.

Zhou, M. and Carin, L. Augment-and-conquer negative binomial processes. In Advances in Neural Information Processing Systems Twenty-Five, pp. 2546–2554, 2012.

Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2015.


Supplementary Material for
"Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations"

Aaron Schein    Mingyuan Zhou    David M. Blei    Hanna Wallach


1 Proposition 1

In the limit as $C, K, R \to \infty$, the expected sum of the core tensor elements is finite and equal to

$$\mathbb{E}\left[\sum_{c=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty}\left(\lambda^{(r)}_{c\xrightarrow{k}c} + \sum_{d \neq c}\lambda^{(r)}_{c\xrightarrow{k}d}\right)\right] = \frac{1}{\delta}\left(\frac{\gamma_0^3}{\zeta^3} + \frac{\gamma_0^4}{\zeta^4}\right).$$

The proof is very similar to that of Zhou (2015, Lemma 1). By the law of total expectation,

\begin{align*}
\mathbb{E}\left[\sum_{c=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty}\left(\lambda^{(r)}_{c\xrightarrow{k}c} + \sum_{d \neq c}\lambda^{(r)}_{c\xrightarrow{k}d}\right)\right]
&= \sum_{c=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty}\left(\mathbb{E}\left[\lambda^{(r)}_{c\xrightarrow{k}c}\right] + \sum_{d \neq c}\mathbb{E}\left[\lambda^{(r)}_{c\xrightarrow{k}d}\right]\right)\\
&= \sum_{c=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty}\left(\mathbb{E}\left[\frac{\eta^{\circlearrowleft}_c \eta^{\leftrightarrow}_c \nu_k \rho_r}{\delta}\right] + \sum_{d \neq c}\mathbb{E}\left[\frac{\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d \nu_k \rho_r}{\delta}\right]\right)\\
&= \frac{1}{\delta}\sum_{c=1}^{\infty}\sum_{k=1}^{\infty}\sum_{r=1}^{\infty}\left(\mathbb{E}\left[\eta^{\circlearrowleft}_c \eta^{\leftrightarrow}_c \nu_k \rho_r\right] + \sum_{d \neq c}\mathbb{E}\left[\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d \nu_k \rho_r\right]\right)\\
&= \frac{1}{\delta}\,\mathbb{E}\left[\sum_{k=1}^{\infty}\nu_k\right]\mathbb{E}\left[\sum_{r=1}^{\infty}\rho_r\right]\sum_{c=1}^{\infty}\left(\mathbb{E}\left[\eta^{\circlearrowleft}_c \eta^{\leftrightarrow}_c\right] + \sum_{d \neq c}\mathbb{E}\left[\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d\right]\right)\\
&= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)\left(\frac{\gamma_0}{\zeta}\right)\sum_{c=1}^{\infty}\left(\mathbb{E}\left[\eta^{\circlearrowleft}_c \eta^{\leftrightarrow}_c\right] + \sum_{d \neq c}\mathbb{E}\left[\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d\right]\right)\\
&= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\sum_{c=1}^{\infty}\mathbb{E}\left[\eta^{\circlearrowleft}_c\right]\mathbb{E}\left[\eta^{\leftrightarrow}_c\right] + \mathbb{E}\left[\sum_{c=1}^{\infty}\sum_{d \neq c}\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d\right]\right).
\end{align*}

The marks $\eta^{\circlearrowleft}_c$ are gamma distributed with mean 1, so

\begin{align*}
&= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\mathbb{E}\left[\sum_{c=1}^{\infty}\eta^{\leftrightarrow}_c\right] + \mathbb{E}\left[\sum_{c=1}^{\infty}\sum_{d \neq c}\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d\right]\right)\\
&= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\frac{\gamma_0}{\zeta} + \mathbb{E}\left[\sum_{c=1}^{\infty}\sum_{d \neq c}\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d\right]\right)\\
&= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\frac{\gamma_0}{\zeta} + \mathbb{E}\left[\left(\sum_{c=1}^{\infty}\eta^{\leftrightarrow}_c\right)\left(\sum_{d=1}^{\infty}\eta^{\leftrightarrow}_d\right)\right] - \mathbb{E}\left[\sum_{c=1}^{\infty}\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_c\right]\right).
\end{align*}

Using $\mathbb{E}\left[\left(\sum_{c=1}^{\infty}\eta^{\leftrightarrow}_c\right)\left(\sum_{d=1}^{\infty}\eta^{\leftrightarrow}_d\right)\right] = \frac{\gamma_0^2}{\zeta^2} + \frac{\gamma_0}{\zeta^2}$, we can write

$$= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\frac{\gamma_0}{\zeta} + \frac{\gamma_0^2}{\zeta^2} + \frac{\gamma_0}{\zeta^2} - \mathbb{E}\left[\sum_{c=1}^{\infty}\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_c\right]\right).$$

Finally, using Campbell's theorem (Kingman, 1972), we know that $\mathbb{E}\left[\sum_{c=1}^{\infty}\eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_c\right] = \frac{\gamma_0}{\zeta^2}$, so

$$= \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\frac{\gamma_0}{\zeta} + \frac{\gamma_0^2}{\zeta^2} + \frac{\gamma_0}{\zeta^2} - \frac{\gamma_0}{\zeta^2}\right) = \frac{1}{\delta}\left(\frac{\gamma_0}{\zeta}\right)^2\left(\frac{\gamma_0}{\zeta} + \frac{\gamma_0^2}{\zeta^2}\right) = \frac{1}{\delta}\left(\frac{\gamma_0^3}{\zeta^3} + \frac{\gamma_0^4}{\zeta^4}\right).$$

2 Proposition 2

For an $M$-dimensional core tensor with $D_1 \times \ldots \times D_M$ elements, computing the normalizing constant using non-compositional allocation requires $1 \leq \pi < \infty$ times the number of operations required by compositional allocation. When $D_1 = \ldots = D_M = 1$, $\pi = 1$. As $D_m, D_{m'} \to \infty$ for any $m$ and $m' \neq m$, $\pi \to \infty$.

Each event token occurs in an $M$-dimensional discrete coordinate space—i.e., $e_n = \boldsymbol{p}$, where $\boldsymbol{p} = (p_1, \ldots, p_M)$ is a multi-index. Similarly, each event token's latent class assignment also occurs in an $M$-dimensional discrete coordinate space—i.e., $z_n = \boldsymbol{q}$, where $\boldsymbol{q} = (q_1, \ldots, q_M)$ is a multi-index.

Assuming $M$ factor matrices $\Theta^{(1)}, \ldots, \Theta^{(M)}$ and an $M$-dimensional core tensor $\Lambda$,

$$P(z_n = \boldsymbol{q} \mid e_n = \boldsymbol{p}) \propto \lambda_{\boldsymbol{q}} \prod_{m=1}^{M} \theta^{(m)}_{p_m q_m}.$$

The computational bottleneck in MCMC inference is computing the normalizing constant

$$Z_{\boldsymbol{p}} = \sum_{\boldsymbol{q}} \lambda_{\boldsymbol{q}} \prod_{m=1}^{M} \theta^{(m)}_{p_m q_m}.$$

If we use a naïve non-compositional approach, then (assuming each latent dimension $m$ has cardinality $D_m$) the sum over $\boldsymbol{q}$ involves $\prod_{m=1}^{M} D_m$ terms and each term requires $M$ multiplications. Thus, computing $Z_{\boldsymbol{p}}$ requires a total of $M\prod_{m=1}^{M} D_m$ multiplications and $\prod_{m=1}^{M} D_m$ additions.¹

However, we can also compute $Z_{\boldsymbol{p}}$ using a compositional approach—i.e.,

$$Z_{\boldsymbol{p}} = \sum_{q_1=1}^{D_1} \theta^{(1)}_{p_1 q_1} \sum_{q_2=1}^{D_2} \theta^{(2)}_{p_2 q_2} \cdots \sum_{q_M=1}^{D_M} \theta^{(M)}_{p_M q_M}\, \lambda_{\boldsymbol{q}}.$$

¹ Computing a sum of $N$ terms requires either $N$ or $N-1$ additions, depending on whether or not you add the first term to zero. We assume the former definition and say that computing a sum of $N$ terms requires $N$ additions.

This approach requires a total of $\sum_{m=1}^{M} D_m$ multiplications and $1 + \sum_{m=1}^{M}(D_m - 1)$ additions.

The ratio $\pi$ of the number of operations (i.e., multiplications and additions) required by the non-compositional approach to the number of operations required by the compositional approach is

$$\pi = \frac{\left(M\prod_{m=1}^{M} D_m\right) + \left(\prod_{m=1}^{M} D_m\right)}{\left(\sum_{m=1}^{M} D_m\right) + \left(1 + \sum_{m=1}^{M}(D_m - 1)\right)} = \frac{(M+1)\prod_{m=1}^{M} D_m}{\left(2\sum_{m=1}^{M} D_m\right) - M + 1}.$$

As the cardinalities $D_1, \ldots, D_M$ of the latent dimensions grow, the numerator grows at a faster rate than the denominator. Therefore $\pi$ achieves its lower bound when $D_1 = \ldots = D_M = 1$:

$$\Omega(\pi) = \frac{M+1}{(2M) - M + 1} = 1.$$

Because the numerator grows at a faster rate than the denominator, we can find the upper bound by taking the limit as one or more cardinalities tend to infinity. We work with the inverse ratio

\begin{align*}
\pi^{-1} &= \frac{\left(2\sum_{m=1}^{M} D_m\right) - M + 1}{(M+1)\prod_{m=1}^{M} D_m}\\
&= \frac{2}{M+1}\left(\sum_{m=1}^{M}\frac{D_m}{\prod_{m'=1}^{M} D_{m'}}\right) - \frac{M-1}{M+1}\left(\frac{1}{\prod_{m=1}^{M} D_m}\right)\\
&= \frac{2}{M+1}\left(\sum_{m=1}^{M}\frac{1}{\prod_{m' \neq m} D_{m'}}\right) - \frac{M-1}{M+1}\left(\frac{1}{\prod_{m=1}^{M} D_m}\right).
\end{align*}

First, we take the limit of $\pi^{-1}$ as a single cardinality $D_m \to \infty$:

\begin{align*}
\lim_{D_m\to\infty} \pi^{-1} &= \lim_{D_m\to\infty} \frac{2}{M+1}\left(\sum_{m=1}^{M}\frac{1}{\prod_{n \neq m} D_n}\right) - \lim_{D_m\to\infty} \frac{M-1}{M+1}\left(\frac{1}{\prod_{m=1}^{M} D_m}\right)\\
&= \lim_{D_m\to\infty} \frac{2}{M+1}\left(\sum_{m=1}^{M}\frac{1}{\prod_{n \neq m} D_n}\right)\\
&= \frac{2}{M+1}\left(\frac{1}{\prod_{n \neq m} D_n}\right).
\end{align*}

However, as any second cardinality $D_{m'} \to \infty$,

$$\lim_{D_m, D_{m'}\to\infty} \pi^{-1} = \lim_{D_{m'}\to\infty} \frac{2}{M+1}\left(\frac{1}{\prod_{n \neq m} D_n}\right) \to 0.$$

Therefore, $\pi \to \infty$ as any two (or more) cardinalities tend to infinity.

Page 15: Bayesian Poisson Tucker Decomposition for Learning the …dirichlet.net/pdf/schein16bayesian.pdf · 2016-05-27 · Bayesian Poisson Tucker Decomposition for Learning the Structure

3 Inference

Gibbs sampling repeatedly resamples the value of each latent variable from its conditional poste-rior. In this section, we provide the conditional posterior for each latent variable in BPTD.

We start by defining the Chinese restaurant table (CRT) distribution (Zhou & Carin, 2015): if $l \sim \text{CRT}(m, r)$ is a CRT-distributed random variable, then we can equivalently say that

$$l \sim \sum_{n=1}^{m} \text{Bern}\left(\frac{r}{r + n - 1}\right).$$

We also define $g(x) \equiv \ln(1 + x)$.
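As a minimal sketch (our illustration, not the authors' code), the Bernoulli representation above translates directly into a sampler:

```python
import numpy as np

def sample_crt(m, r, rng):
    """Draw l ~ CRT(m, r) as a sum of m independent Bernoulli draws with
    success probabilities r / (r + n - 1) for n = 1, ..., m."""
    probs = r / (r + np.arange(m))  # r/(r+0), r/(r+1), ..., r/(r+m-1)
    return int(rng.binomial(1, probs).sum())

rng = np.random.default_rng(0)
print(sample_crt(10, 2.0, rng))  # a draw in {1, ..., 10}
```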

Throughout this section, we use, e.g., $(\theta_{ic} \mid -)$ to denote $\theta_{ic}$ conditioned on $Y$, $\varepsilon_0$, $\gamma_0$, and the current values of the other latent variables. We assume that $Y$ is partially observed and include a binary mask $B$, where $b^{(t)}_{i \overset{a}{\to} j} = 0$ means that $y^{(t)}_{i \overset{a}{\to} j} = 0$ is unobserved, not an observed zero.

Action–Topic Factors:

$$y^{(\cdot)}_{\cdot \overset{ak}{\leftrightarrow} \cdot} \equiv \sum_{i=1}^{V} \sum_{c=1}^{C} \sum_{j \neq i} \sum_{d=1}^{C} \sum_{t=1}^{T} \sum_{r=1}^{R} y^{(tr)}_{ic \overset{ak}{\to} dj}$$

$$\xi_{ak} \equiv \sum_{i=1}^{V} \sum_{j \neq i} \sum_{t=1}^{T} b^{(t)}_{i \overset{a}{\to} j} \sum_{c=1}^{C} \theta_{ic} \sum_{d=1}^{C} \theta_{jd} \sum_{r=1}^{R} \psi_{tr} \, \lambda^{(r)}_{c \overset{k}{\to} d}$$

$$(\phi_{ak} \mid -) \sim \Gamma\left(\varepsilon_0 + y^{(\cdot)}_{\cdot \overset{ak}{\leftrightarrow} \cdot}, \; \varepsilon_0 + \xi_{ak}\right)$$
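All of the factor updates in this section share one pattern: a Gamma prior on a factor that scales a Poisson rate yields a Gamma conditional whose shape gains the allocated latent counts and whose rate gains the accumulated scale statistic. A minimal sketch (toy values assumed; note that NumPy parameterizes the Gamma by shape and scale rather than shape and rate):

```python
import numpy as np

def gamma_conditional(prior_shape, prior_rate, count, scale_stat, rng):
    """Sample from Gamma(prior_shape + count, prior_rate + scale_stat),
    converting the rate to NumPy's scale parameterization."""
    return rng.gamma(prior_shape + count, 1.0 / (prior_rate + scale_stat))

# e.g., the action-topic update (phi_ak | -) ~ Gamma(eps0 + y_ak, eps0 + xi_ak)
rng = np.random.default_rng(0)
eps0 = 0.1
y_ak, xi_ak = 7, 2.3  # toy values for the count and rate statistics
phi_ak = gamma_conditional(eps0, eps0, y_ak, xi_ak, rng)
```

The time-step–regime, country–community, and core-tensor updates below instantiate this same pattern with different count and rate statistics.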

Time-Step–Regime Factors:

$$y^{(tr)}_{\cdot \overset{\cdot}{\to} \cdot} \equiv \sum_{i=1}^{V} \sum_{c=1}^{C} \sum_{j \neq i} \sum_{d=1}^{C} \sum_{a=1}^{A} \sum_{k=1}^{K} y^{(tr)}_{ic \overset{ak}{\to} dj}$$

$$\xi_{tr} \equiv \sum_{i=1}^{V} \sum_{j \neq i} \sum_{a=1}^{A} b^{(t)}_{i \overset{a}{\to} j} \sum_{c=1}^{C} \theta_{ic} \sum_{d=1}^{C} \theta_{jd} \sum_{k=1}^{K} \phi_{ak} \, \lambda^{(r)}_{c \overset{k}{\to} d}$$

$$(\psi_{tr} \mid -) \sim \Gamma\left(\varepsilon_0 + y^{(tr)}_{\cdot \overset{\cdot}{\to} \cdot}, \; \varepsilon_0 + \xi_{tr}\right)$$


Country–Community Factors:

$$y^{(\cdot)}_{ic \overset{\cdot}{\leftrightarrow} \cdot} \equiv \sum_{j \neq i} \sum_{d=1}^{C} \sum_{a=1}^{A} \sum_{k=1}^{K} \sum_{t=1}^{T} \sum_{r=1}^{R} \left(y^{(tr)}_{ic \overset{ak}{\to} dj} + y^{(tr)}_{jd \overset{ak}{\to} ci}\right)$$

$$\xi_{ic} \equiv \sum_{j \neq i} \sum_{a=1}^{A} \sum_{t=1}^{T} \left(b^{(t)}_{i \overset{a}{\to} j} \sum_{d=1}^{C} \theta_{jd} \sum_{k=1}^{K} \phi_{ak} \sum_{r=1}^{R} \psi_{tr} \, \lambda^{(r)}_{c \overset{k}{\to} d} + b^{(t)}_{j \overset{a}{\to} i} \sum_{d=1}^{C} \theta_{jd} \sum_{k=1}^{K} \phi_{ak} \sum_{r=1}^{R} \psi_{tr} \, \lambda^{(r)}_{d \overset{k}{\to} c}\right)$$

$$(\theta_{ic} \mid -) \sim \Gamma\left(\alpha_i + y^{(\cdot)}_{ic \overset{\cdot}{\leftrightarrow} \cdot}, \; \beta_i + \xi_{ic}\right)$$

Auxiliary Latent Country–Community Counts:

$$(\ell_{ic} \mid -) \sim \text{CRT}\left(y^{(\cdot)}_{ic \overset{\cdot}{\leftrightarrow} \cdot}, \; \alpha_i\right)$$

Per-Country Shape Parameters:

$$(\alpha_i \mid -) \sim \Gamma\left(\varepsilon_0 + \sum_{c=1}^{C} \ell_{ic}, \; \varepsilon_0 + \sum_{c=1}^{C} g\left(\xi_{ic} \, \beta_i^{-1}\right)\right)$$
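A minimal sketch of the augment-and-resample pattern behind these two updates (toy values are assumptions for illustration): draw the auxiliary counts $\ell_{ic} \sim \text{CRT}(y^{(\cdot)}_{ic \overset{\cdot}{\leftrightarrow} \cdot}, \alpha_i)$, then resample $\alpha_i$ using $g(x) = \ln(1 + x)$:

```python
import numpy as np

rng = np.random.default_rng(0)
eps0, beta_i, alpha_i = 0.1, 1.5, 1.0   # toy hyperparameters and current state
y_ic = np.array([3, 0, 5, 2])           # allocated counts for C = 4 communities
xi_ic = np.array([0.7, 1.2, 0.4, 2.0])  # corresponding rate statistics

# l_ic ~ CRT(y_ic, alpha_i), via the Bernoulli representation from above.
l_ic = np.array([
    rng.binomial(1, alpha_i / (alpha_i + np.arange(m))).sum() for m in y_ic
])

# (alpha_i | -) ~ Gamma(eps0 + sum_c l_ic, eps0 + sum_c g(xi_ic / beta_i)).
g = np.log1p(xi_ic / beta_i)  # g(x) = ln(1 + x)
alpha_i = rng.gamma(eps0 + l_ic.sum(), 1.0 / (eps0 + g.sum()))
```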

Per-Country Rate Parameters:

$$(\beta_i \mid -) \sim \Gamma\left(\varepsilon_0 + C \alpha_i, \; \varepsilon_0 + \sum_{c=1}^{C} \theta_{ic}\right)$$

Diagonal Elements of the Core Tensor:

$$\omega^{(r)}_{c \overset{k}{\to} c} \equiv \eta^{\circlearrowleft}_c \eta^{\leftrightarrow}_c \nu_k \rho_r$$

$$y^{(r)}_{c \overset{k}{\to} c} \equiv \sum_{i=1}^{V} \sum_{j \neq i} \sum_{a=1}^{A} \sum_{t=1}^{T} y^{(tr)}_{ic \overset{ak}{\to} cj}$$

$$\xi^{(r)}_{c \overset{k}{\to} c} \equiv \sum_{i=1}^{V} \theta_{ic} \sum_{j \neq i} \theta_{jc} \sum_{a=1}^{A} \phi_{ak} \sum_{t=1}^{T} \psi_{tr} \, b^{(t)}_{i \overset{a}{\to} j}$$

$$(\lambda^{(r)}_{c \overset{k}{\to} c} \mid -) \sim \Gamma\left(\omega^{(r)}_{c \overset{k}{\to} c} + y^{(r)}_{c \overset{k}{\to} c}, \; \delta + \xi^{(r)}_{c \overset{k}{\to} c}\right)$$


Off-Diagonal Elements of the Core Tensor (for $c \neq d$):

$$\omega^{(r)}_{c \overset{k}{\to} d} \equiv \eta^{\leftrightarrow}_c \eta^{\leftrightarrow}_d \nu_k \rho_r$$

$$y^{(r)}_{c \overset{k}{\to} d} \equiv \sum_{i=1}^{V} \sum_{j \neq i} \sum_{a=1}^{A} \sum_{t=1}^{T} y^{(tr)}_{ic \overset{ak}{\to} dj}$$

$$\xi^{(r)}_{c \overset{k}{\to} d} \equiv \sum_{i=1}^{V} \theta_{ic} \sum_{j \neq i} \theta_{jd} \sum_{a=1}^{A} \phi_{ak} \sum_{t=1}^{T} \psi_{tr} \, b^{(t)}_{i \overset{a}{\to} j}$$

$$(\lambda^{(r)}_{c \overset{k}{\to} d} \mid -) \sim \Gamma\left(\omega^{(r)}_{c \overset{k}{\to} d} + y^{(r)}_{c \overset{k}{\to} d}, \; \delta + \xi^{(r)}_{c \overset{k}{\to} d}\right)$$
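The count statistics in these conditionals are marginal sums over the latent subcounts allocated during inference. As a minimal sketch (the dense array layout and toy sizes are assumptions; a real implementation would keep the subcounts sparse), the statistic $y^{(r)}_{c \overset{k}{\to} d}$ is a single reduction:

```python
import numpy as np

# Hypothetical dense array y_sub[t, r, i, c, a, k, j, d] standing in for the
# allocated subcounts y^{(tr)}_{ic ak-> dj}.
T, R, V, C, A, K = 3, 2, 5, 4, 6, 4
rng = np.random.default_rng(0)
y_sub = rng.poisson(0.05, size=(T, R, V, C, A, K, V, C))
idx = np.arange(V)
y_sub[:, :, idx, :, :, :, idx, :] = 0  # countries do not act toward themselves

# y^{(r)}_{c k -> d}: sum over time steps t, senders i, actions a, receivers j.
y_core = np.einsum('tricakjd->rckd', y_sub)
print(y_core.shape)  # (R, C, K, C)
```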

Core Rate Parameter:

$$\omega^{(\cdot)}_{\cdot \overset{\cdot}{\leftrightarrow} \cdot} \equiv \sum_{c=1}^{C} \sum_{k=1}^{K} \sum_{r=1}^{R} \left(\omega^{(r)}_{c \overset{k}{\to} c} + \sum_{d \neq c} \omega^{(r)}_{c \overset{k}{\to} d}\right)$$

$$\lambda^{(\cdot)}_{\cdot \overset{\cdot}{\leftrightarrow} \cdot} \equiv \sum_{c=1}^{C} \sum_{k=1}^{K} \sum_{r=1}^{R} \left(\lambda^{(r)}_{c \overset{k}{\to} c} + \sum_{d \neq c} \lambda^{(r)}_{c \overset{k}{\to} d}\right)$$

$$(\delta \mid -) \sim \Gamma\left(\varepsilon_0 + \omega^{(\cdot)}_{\cdot \overset{\cdot}{\leftrightarrow} \cdot}, \; \varepsilon_0 + \lambda^{(\cdot)}_{\cdot \overset{\cdot}{\leftrightarrow} \cdot}\right)$$

Diagonal Auxiliary Latent Core Counts:

$$\ell^{(r)}_{c \overset{k}{\to} c} \sim \text{CRT}\left(y^{(r)}_{c \overset{k}{\to} c}, \; \omega^{(r)}_{c \overset{k}{\to} c}\right)$$

Off-Diagonal Auxiliary Latent Core Counts (for $c \neq d$):

$$\ell^{(r)}_{c \overset{k}{\to} d} \sim \text{CRT}\left(y^{(r)}_{c \overset{k}{\to} d}, \; \omega^{(r)}_{c \overset{k}{\to} d}\right)$$

Within-Community Weights:

$$\ell^{(\cdot)}_{c \overset{\cdot}{\to} c} \equiv \sum_{k=1}^{K} \sum_{r=1}^{R} \ell^{(r)}_{c \overset{k}{\to} c}$$

$$\xi^{\circlearrowleft}_c \equiv \sum_{r=1}^{R} \rho_r \sum_{k=1}^{K} \nu_k \, \eta^{\leftrightarrow}_c \, g\left(\xi^{(r)}_{c \overset{k}{\to} c} \delta^{-1}\right)$$

$$(\eta^{\circlearrowleft}_c \mid -) \sim \Gamma\left(\frac{\gamma_0}{C} + \ell^{(\cdot)}_{c \overset{\cdot}{\to} c}, \; \zeta + \xi^{\circlearrowleft}_c\right)$$


Between-Community Weights:

$$\ell^{(\cdot)}_{c \overset{\cdot}{\leftrightarrow} \cdot} \equiv \ell^{(\cdot)}_{c \overset{\cdot}{\to} c} + \sum_{d \neq c} \sum_{k=1}^{K} \sum_{r=1}^{R} \left(\ell^{(r)}_{c \overset{k}{\to} d} + \ell^{(r)}_{d \overset{k}{\to} c}\right)$$

$$\xi^{\leftrightarrow}_c \equiv \sum_{r=1}^{R} \rho_r \sum_{k=1}^{K} \nu_k \left[\eta^{\circlearrowleft}_c \, g\left(\xi^{(r)}_{c \overset{k}{\to} c} \delta^{-1}\right) + \sum_{d \neq c} \eta^{\leftrightarrow}_d \left(g\left(\xi^{(r)}_{c \overset{k}{\to} d} \delta^{-1}\right) + g\left(\xi^{(r)}_{d \overset{k}{\to} c} \delta^{-1}\right)\right)\right]$$

$$(\eta^{\leftrightarrow}_c \mid -) \sim \Gamma\left(\frac{\gamma_0}{C} + \ell^{(\cdot)}_{c \overset{\cdot}{\leftrightarrow} \cdot}, \; \zeta + \xi^{\leftrightarrow}_c\right)$$

Topic Weights:

$$\ell^{(\cdot)}_{\cdot \overset{k}{\to} \cdot} \equiv \sum_{c=1}^{C} \sum_{d=1}^{C} \sum_{r=1}^{R} \ell^{(r)}_{c \overset{k}{\to} d}$$

$$\xi_k \equiv \sum_{r=1}^{R} \rho_r \sum_{c=1}^{C} \eta^{\leftrightarrow}_c \left[\eta^{\circlearrowleft}_c \, g\left(\xi^{(r)}_{c \overset{k}{\to} c} \delta^{-1}\right) + \sum_{d \neq c} \eta^{\leftrightarrow}_d \left(g\left(\xi^{(r)}_{c \overset{k}{\to} d} \delta^{-1}\right) + g\left(\xi^{(r)}_{d \overset{k}{\to} c} \delta^{-1}\right)\right)\right]$$

$$(\nu_k \mid -) \sim \Gamma\left(\frac{\gamma_0}{K} + \ell^{(\cdot)}_{\cdot \overset{k}{\to} \cdot}, \; \zeta + \xi_k\right)$$

Regime Weights:

$$\ell^{(r)}_{\cdot \overset{\cdot}{\to} \cdot} \equiv \sum_{c=1}^{C} \sum_{d=1}^{C} \sum_{k=1}^{K} \ell^{(r)}_{c \overset{k}{\to} d}$$

$$\xi_r \equiv \sum_{k=1}^{K} \nu_k \sum_{c=1}^{C} \eta^{\leftrightarrow}_c \left[\eta^{\circlearrowleft}_c \, g\left(\xi^{(r)}_{c \overset{k}{\to} c} \delta^{-1}\right) + \sum_{d \neq c} \eta^{\leftrightarrow}_d \left(g\left(\xi^{(r)}_{c \overset{k}{\to} d} \delta^{-1}\right) + g\left(\xi^{(r)}_{d \overset{k}{\to} c} \delta^{-1}\right)\right)\right]$$

$$(\rho_r \mid -) \sim \Gamma\left(\frac{\gamma_0}{R} + \ell^{(r)}_{\cdot \overset{\cdot}{\to} \cdot}, \; \zeta + \xi_r\right)$$

Weights Rate Parameter:

$$\omega \equiv \sum_{c=1}^{C} \eta^{\circlearrowleft}_c + \sum_{c=1}^{C} \eta^{\leftrightarrow}_c + \sum_{k=1}^{K} \nu_k + \sum_{r=1}^{R} \rho_r$$

$$(\zeta \mid -) \sim \Gamma\left(\varepsilon_0 + 4\gamma_0, \; \varepsilon_0 + \omega\right)$$


4 Baseline Models

BPTF (Schein et al., 2015):

$$y^{(t)}_{i \overset{a}{\to} j} \sim \text{Po}\left(\sum_{q=1}^{Q} \theta^{\to}_{iq} \, \theta^{\leftarrow}_{jq} \, \phi_{aq} \, \psi_{tq} \, \lambda_q\right)$$

$$\theta^{\to}_{iq} \sim \Gamma(\varepsilon_0, \beta_1) \qquad \theta^{\leftarrow}_{jq} \sim \Gamma(\varepsilon_0, \beta_2) \qquad \phi_{aq} \sim \Gamma(\varepsilon_0, \beta_3) \qquad \psi_{tq} \sim \Gamma(\varepsilon_0, \beta_4)$$

$$\lambda_q \sim \Gamma\left(\frac{\gamma_0}{Q}, \delta\right) \qquad \beta_1, \ldots, \beta_4, \delta \sim \Gamma(\varepsilon_0, \varepsilon_0)$$
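BPTF's rate is a CP decomposition, in which a single latent dimension $Q$ is shared by all modes, whereas BPTD couples separate per-mode dimensions through its core tensor. A minimal sketch of the two rate constructions (toy sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
V, A, T = 5, 4, 3  # countries, action types, time steps

# BPTF / CP: one shared latent dimension Q for every mode.
Q = 6
theta_snd = rng.gamma(1.0, 1.0, (V, Q))  # sender factors
theta_rcv = rng.gamma(1.0, 1.0, (V, Q))  # receiver factors
phi, psi = rng.gamma(1.0, 1.0, (A, Q)), rng.gamma(1.0, 1.0, (T, Q))
lam = rng.gamma(1.0, 1.0, Q)
rate_cp = np.einsum('iq,aq,jq,tq,q->iajt', theta_snd, phi, theta_rcv, psi, lam)

# BPTD / Tucker: separate community (C), topic (K), and regime (R)
# dimensions, coupled by the core tensor lambda^{(r)}_{c k -> d}; the same
# country factors theta serve as both sender and receiver memberships.
C, K, R = 3, 2, 2
theta = rng.gamma(1.0, 1.0, (V, C))
phi_k, psi_r = rng.gamma(1.0, 1.0, (A, K)), rng.gamma(1.0, 1.0, (T, R))
core = rng.gamma(1.0, 1.0, (C, K, C, R))
rate_tucker = np.einsum('ic,ak,jd,tr,ckdr->iajt', theta, phi_k, theta, psi_r, core)

print(rate_cp.shape, rate_tucker.shape)  # (5, 4, 5, 3) (5, 4, 5, 3)
```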

GPIRM (Schmidt & Mørup, 2013):

$$y^{(t)}_{i \overset{a}{\to} j} \sim \text{Po}\left(\lambda^{(z_t)}_{z_i \overset{z_a}{\to} z_j}\right)$$

$$z_i \sim \text{Cat}\left(\frac{\eta_1}{\sum_c \eta_c}, \ldots, \frac{\eta_C}{\sum_c \eta_c}\right) \qquad z_a \sim \text{Cat}\left(\frac{\nu_1}{\sum_k \nu_k}, \ldots, \frac{\nu_K}{\sum_k \nu_k}\right) \qquad z_t \sim \text{Cat}\left(\frac{\rho_1}{\sum_r \rho_r}, \ldots, \frac{\rho_R}{\sum_r \rho_r}\right)$$

$$\eta_c \sim \Gamma\left(\frac{\gamma_0}{C}, \zeta\right) \qquad \nu_k \sim \Gamma\left(\frac{\gamma_0}{K}, \zeta\right) \qquad \rho_r \sim \Gamma\left(\frac{\gamma_0}{R}, \zeta\right) \qquad \lambda^{(r)}_{c \overset{k}{\to} d}, \zeta \sim \Gamma(\varepsilon_0, \varepsilon_0)$$

DCGPIRM:

$$y^{(t)}_{i \overset{a}{\to} j} \sim \text{Po}\left(\theta_i \, \theta_j \, \phi_a \, \psi_t \, \lambda^{(z_t)}_{z_i \overset{z_a}{\to} z_j}\right)$$

$$\theta_i, \phi_a, \psi_t \sim \Gamma(\varepsilon_0, \varepsilon_0)$$

The rest of the generative process is the same as that of the GPIRM.


5 Supplementary Plots

[Figure 1: bar plot; x-axis indexes the communities 1–20, y-axis shows weights from 0 to 5.]

Figure 1: Inferred community weights $\eta^{\leftrightarrow}_1, \ldots, \eta^{\leftrightarrow}_C$. We use the between-community weights to interpret shrinkage because they are used for the on- and off-diagonal elements of the core tensor.

References

Kingman, J. F. C. Poisson Processes. Oxford University Press, 1972.

Schein, A., Paisley, J., Blei, D. M., and Wallach, H. Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In Proceedings of the Twenty-First ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1045–1054, 2015.

Schmidt, M. N. and Mørup, M. Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Signal Processing Magazine, 30(3):110–128, 2013.

Zhou, M. Infinite edge partition models for overlapping community detection and link prediction. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 1135–1143, 2015.

Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2015.


Figure 2: Latent structure discovered by BPTD for topic $k = 1$ (mostly Verbal Cooperation action types) and the most active regime, including the community–community interaction network (bottom left), the rate at which each country acts as a sender (top left) and a receiver (bottom right) in each community, and the number of times each country $i$ took an action associated with topic $k$ toward each country $j$ during regime $r$ (top right). We show only the most active 100 countries.


[Figure 3: same four-panel layout as Figure 2.]

Figure 3: Latent structure discovered by BPTD for topic $k = 2$ (Verbal Cooperation).


[Figure 4: same four-panel layout as Figure 2.]

Figure 4: Latent structure discovered by BPTD for topic $k = 3$ (Material Cooperation).


[Figure 5: same four-panel layout as Figure 2.]

Figure 5: Latent structure discovered by BPTD for topic $k = 4$ (Verbal Conflict).


[Figure 6: same four-panel layout as Figure 2.]

Figure 6: Latent structure discovered by BPTD for topic $k = 5$ (Material Conflict).


[Figure 7: same four-panel layout as Figure 2.]

Figure 7: Latent structure discovered by BPTD for topic $k = 6$ (Material Conflict).
