![Page 1: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/1.jpg)
Structured Topic Models: Jointly Modeling Words and Their
Accompanying Modalities
Xuerui Wang
Computer Science Department
University of Massachusetts Amherst
Joint work with Andrew McCallum, Andres Corrada-Emmanuel, Chris Pal, Xing Wei and Natasha Mohanty.
![Page 2: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/2.jpg)
2
Probabilistic topic models
• Main Assumption:
  – Documents are mixtures of topics
  – Topics are distributions over words that capture co-occurrence
• Objectives:
  – Understand text using learned topics
  – Represent documents in topic space
![Page 3: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/3.jpg)
3
Clustering words into topics with Latent Dirichlet Allocation
[Blei, Ng, Jordan 2003]
Generative Process:
For each document:
  Sample a distribution over topics, θ
  For each word in doc:
    Sample a topic, z
    Sample a word from the topic, w
Example:
  Distribution over topics: 70% finance, 30% environment
  Sampled topic: finance
  Sampled word: “bank”
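The generative process above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the two toy topics, their word probabilities, and the helper names `sample_dirichlet` and `generate_document` are assumptions made up for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, num_words, rng):
    """theta ~ Dirichlet(alpha); for each word: z ~ theta, then w ~ topics[z]."""
    theta = sample_dirichlet(alpha, rng)
    doc = []
    for _ in range(num_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]
        words, probs = zip(*topics[z].items())
        w = rng.choices(words, weights=probs)[0]
        doc.append((z, w))
    return theta, doc


# Hypothetical toy topics (word -> probability); not taken from the slides.
TOPICS = [
    {"bank": 0.5, "loan": 0.3, "money": 0.2},   # a "finance"-like topic
    {"bank": 0.2, "river": 0.5, "water": 0.3},  # an "environment"-like topic
]

rng = random.Random(0)
theta, doc = generate_document(TOPICS, alpha=[1.0, 1.0], num_words=8, rng=rng)
print("topic proportions:", theta)
print("(topic, word) pairs:", doc)
```

Note that "bank" has nonzero probability under both toy topics, so the same word can be generated by either topic depending on the sampled z.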
![Page 4: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/4.jpg)
4
Example topics induced from a large collection of text [Tenenbaum et al]
• STORY, STORIES, TELL, CHARACTER, CHARACTERS, AUTHOR, READ, TOLD, SETTING, TALES, PLOT, TELLING, SHORT, FICTION, ACTION, TRUE, EVENTS, TELLS, TALE, NOVEL
• MIND, WORLD, DREAM, DREAMS, THOUGHT, IMAGINATION, MOMENT, THOUGHTS, OWN, REAL, LIFE, IMAGINE, SENSE, CONSCIOUSNESS, STRANGE, FEELING, WHOLE, BEING, MIGHT, HOPE
• WATER, FISH, SEA, SWIM, SWIMMING, POOL, LIKE, SHELL, SHARK, TANK, SHELLS, SHARKS, DIVING, DOLPHINS, SWAM, LONG, SEAL, DIVE, DOLPHIN, UNDERWATER
• DISEASE, BACTERIA, DISEASES, GERMS, FEVER, CAUSE, CAUSED, SPREAD, VIRUSES, INFECTION, VIRUS, MICROORGANISMS, PERSON, INFECTIOUS, COMMON, CAUSING, SMALLPOX, BODY, INFECTIONS, CERTAIN
• FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
• SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
• BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
• JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE
![Page 5: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/5.jpg)
5
Example topics induced from a large collection of text [Tenenbaum et al]
(The same eight topic word lists as on the previous slide.)
![Page 6: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/6.jpg)
6
Documents are not just text!
• Multiple modalities:
  – Research papers (author, venue, words, etc.)
  – Email messages (sender, recipients, time, words, etc.)
  – Legislative resolutions (voting record, words, etc.)
  – And many more
• Most previous work: one modality at a time
  – Learn topics from words
  – Discover groups from relations
  – Etc.
![Page 7: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/7.jpg)
8
Outline
• Introduction
• Role and Topic Discovery in Social Networks
• Group and Topic Discovery from Voting Records
• Topics over Time
• Topical Phrase with Markov Assumption
• Conclusions
![Page 8: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/8.jpg)
9
All possible “topic models” with one latent topic, two observed modalities and two conditional dependencies
![Page 9: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/9.jpg)
10
Outline
• Introduction
• Role and Topic Discovery in Social Networks
• Group and Topic Discovery from Voting Records
• Topics over Time
• Topical Phrase with Markov Assumption
• Conclusions
![Page 10: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/10.jpg)
11
From LDA to Author-Recipient-Topic
![Page 11: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/11.jpg)
12
All possible “topic models” with two observed modalities
![Page 12: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/12.jpg)
13
Inference and Estimation
Gibbs Sampling:
- Easy to implement
- Reasonably fast
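To give a feel for why Gibbs sampling is easy to implement, here is a minimal sketch of collapsed Gibbs sampling for plain LDA. It is not the ART sampler from the talk, whose update equations are not given on the slide; the function name `gibbs_lda`, the hyperparameter defaults and the toy corpus are all assumptions for illustration.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assign = []

    # Random initialization of the topic assignment of every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assign, doc_topic, topic_word


# Hypothetical toy corpus over a 4-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
assign, doc_topic, topic_word = gibbs_lda(docs, num_topics=2, vocab_size=4)
print(doc_topic)
```

The whole sampler is a loop that decrements counts, computes one weight per topic, samples, and increments counts again, which is what makes this family of models straightforward to extend with extra modalities.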
![Page 13: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/13.jpg)
14
Enron email corpus
• 250k email messages
• 147 people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.

DP

Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas [email protected]
![Page 14: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/14.jpg)
15
Topics, and prominent senders / receivers discovered by ART
(Topic names assigned by hand)
![Page 15: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/15.jpg)
16
Topics, and prominent senders / receivers discovered by ART
Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”
![Page 16: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/16.jpg)
17
Comparing role discovery
connection strength (A,B) =
• Traditional SNA: distribution over recipients
• Author-Topic: distribution over authored topics
• ART: distribution over authored topics
![Page 17: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/17.jpg)
18
Comparing role discovery: Tracy Geaconne and Dan McCarty
• Traditional SNA: Similar roles
• Author-Topic: Different roles
• ART: Different roles
Geaconne = “Secretary”
McCarty = “Vice President”
![Page 18: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/18.jpg)
20
Comparing role discovery: Lynn Blair and Kimberly Watson
• Traditional SNA: Different roles
• Author-Topic: Very different
• ART: Very similar
Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”
![Page 19: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/19.jpg)
21
McCallum Email Corpus 2004
• January - October 2004
• 23k email messages
• 825 people

From: [email protected]
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: [email protected]

There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:

NIPS registration receipt.
CALO registration receipt.

Thanks,
Kate
![Page 20: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/20.jpg)
25
Two most prominent topics in discussions with ____?

| Words | Prob | Words | Prob |
|---|---|---|---|
| love | 0.030514 | today | 0.051152 |
| house | 0.015402 | tomorrow | 0.045393 |
| donna | 0.013659 | time | 0.041289 |
| time | 0.012351 | ll | 0.039145 |
| great | 0.011334 | meeting | 0.033877 |
| hope | 0.011043 | week | 0.025484 |
| dinner | 0.00959 | talk | 0.024626 |
| saturday | 0.009154 | meet | 0.023279 |
| left | 0.009154 | morning | 0.022789 |
| ll | 0.009009 | monday | 0.020767 |
| roweis | 0.008282 | back | 0.019358 |
| visit | 0.008137 | call | 0.016418 |
| evening | 0.008137 | free | 0.015621 |
| stay | 0.007847 | home | 0.013967 |
| bring | 0.007701 | won | 0.013783 |
| weekend | 0.007411 | day | 0.01311 |
| road | 0.00712 | hope | 0.012987 |
| sunday | 0.006829 | leave | 0.012987 |
| kids | 0.006539 | office | 0.012742 |
| flight | 0.006539 | tuesday | 0.012558 |
![Page 21: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/21.jpg)
27
Outline
• Introduction
• Role and Topic Discovery in Social Networks
• Group and Topic Discovery from Voting Records
• Topics over Time
• Topical Phrase with Markov Assumption
• Conclusions
![Page 22: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/22.jpg)
29
Discovering groups from observed set of relations
Admiration relations among six high school students.
Student Roster
Adams
Bennett
Carter
Davis
Edwards
Frederking

Academic Admiration

Acad(A, B) Acad(C, B)
Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E)
Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A)
Acad(E, C) Acad(F, C)
![Page 23: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/23.jpg)
30
Adjacency matrix representing relations
[Figure: three adjacency matrices for the Academic Admiration relation among the six students. First: rows and columns in roster order A B C D E F. Second: the same order annotated with group labels G1 G2 G1 G2 G3 G3. Third: rows and columns reordered as A C B D E F, with group labels G1 G1 G2 G2 G3 G3.]
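The effect of reordering can be checked directly from the twelve Acad relations. The sketch below builds the adjacency matrix in roster order and then in the grouped order A C B D E F; the helper name `adjacency` is hypothetical, and the relations and orderings are the ones shown on the slides.

```python
STUDENTS = ["A", "B", "C", "D", "E", "F"]

# Acad(x, y) relations, copied from the slide.
ACAD = [
    ("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
    ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
    ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C"),
]


def adjacency(order, relations):
    """Return a 0/1 matrix whose rows and columns follow the given order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for source, target in relations:
        matrix[index[source]][index[target]] = 1
    return matrix


original = adjacency(STUDENTS, ACAD)
reordered = adjacency(["A", "C", "B", "D", "E", "F"], ACAD)

for row in reordered:
    print(" ".join(str(cell) for cell in row))
```

In the grouped order the ones fall into three solid 2×2 blocks (G1 to G2, G2 to G3, G3 to G1), which is the block structure a group model tries to recover.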
![Page 24: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/24.jpg)
31
Group Model: partitioning entities into groups
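[Plate diagram: observed relations v (plate S²), group assignments g (plate S), parameters γ (plate G²), and hyperparameters α and β; the distributions are labeled Beta, Dirichlet, Binomial and Multinomial.]
Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]
S: number of entities
G: number of groups
Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]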
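A generative reading of the group model can be sketched as follows. The pairing of priors with parameters (Dirichlet prior on group proportions, Beta prior on each group-to-group link probability, one Bernoulli trial per relation) is an assumption consistent with the labels on the diagram, and the function name `sample_blockmodel` and its defaults are hypothetical.

```python
import random


def sample_blockmodel(num_entities, num_groups, alpha=1.0, beta=(1.0, 1.0), seed=0):
    """Sample group assignments and a 0/1 relation matrix from a simple blockmodel."""
    rng = random.Random(seed)

    # Group proportions ~ symmetric Dirichlet(alpha), via normalized Gamma draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_groups)]
    total = sum(draws)
    proportions = [d / total for d in draws]

    # One group per entity ~ Multinomial(proportions).
    groups = rng.choices(range(num_groups), weights=proportions, k=num_entities)

    # One link probability per ordered pair of groups ~ Beta(beta).
    gamma = [
        [rng.betavariate(beta[0], beta[1]) for _ in range(num_groups)]
        for _ in range(num_groups)
    ]

    # Each relation v[i][j] is a single Binomial (Bernoulli) trial
    # with the probability attached to the pair of groups (g_i, g_j).
    relations = [
        [1 if rng.random() < gamma[groups[i]][groups[j]] else 0
         for j in range(num_entities)]
        for i in range(num_entities)
    ]
    return groups, gamma, relations


groups, gamma, relations = sample_blockmodel(num_entities=6, num_groups=3)
print(groups)
for row in relations:
    print(row)
```

Inference runs this process in reverse: given only the relation matrix, it recovers group assignments that make the blocks as homogeneous as possible.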
![Page 25: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/25.jpg)
32
Two relations with different attributes
[Figure: two adjacency matrices over the same six students. Academic Admiration: rows and columns ordered A C B D E F, with group labels G1 G1 G2 G2 G3 G3. Social Admiration: rows and columns ordered A C E B D F, with group labels G1 G1 G1 G2 G2 G2.]

Social Admiration

Soci(A, B) Soci(A, D) Soci(A, F)
Soci(B, A) Soci(B, C) Soci(B, E)
Soci(C, B) Soci(C, D) Soci(C, F)
Soci(D, A) Soci(D, C) Soci(D, E)
Soci(E, B) Soci(E, D) Soci(E, F)
Soci(F, A) Soci(F, C) Soci(F, E)
![Page 26: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/26.jpg)
33
Goal: Model relations and their (textual) attributes simultaneously to obtain better groups and more meaningful topics.
budget, funding, annual, cash
document, corrections, review, annual
![Page 27: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/27.jpg)
34
The Group-Topic model: discovering groups and topics simultaneously
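[Plate diagram of the Group-Topic model: words w (plate N_b) and a topic t (plate B); topic-word parameters φ (plate T) with hyperparameter η; observed relations v (plate S²), parameters γ (plate G²), group assignments g (plate S, within plate T), and hyperparameters α and β. The distributions are labeled Uniform, Dirichlet, Multinomial, Beta and Binomial.]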
![Page 28: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/28.jpg)
35
All possible “topic models” with two observed modalities
![Page 29: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/29.jpg)
37
U.S. Senate data set
• 16 years of voting records in the US Senate (1989 – 2005)
• a Senator may respond Yea or Nay to a resolution
• 3423 resolutions with text attributes (index terms)
• 191 Senators in total across 16 years
S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991)
Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No: 102-242.
Index terms: Banks and banking; Accounting; Administrative fees; Cost control; Credit; Deposit insurance; Depressed areas; and other 110 terms

Adams (D-WA), Nay
Akaka (D-HI), Yea
Bentsen (D-TX), Yea
Biden (D-DE), Yea
Bond (R-MO), Yea
Bradley (D-NJ), Nay
Conrad (D-ND), Nay
……
![Page 30: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/30.jpg)
38
Topics discovered (U.S. Senate)

Mixture of Unigrams

| Education | Energy | Military Misc. | Economic |
|---|---|---|---|
| education | energy | government | federal |
| school | power | military | labor |
| aid | water | foreign | insurance |
| children | nuclear | tax | aid |
| drug | gas | congress | tax |
| students | petrol | aid | business |
| elementary | research | law | employee |
| prevention | pollution | policy | care |

Group-Topic Model

| Education + Domestic | Foreign | Economic | Social Security + Medicare |
|---|---|---|---|
| education | foreign | labor | social |
| school | trade | insurance | security |
| federal | chemicals | tax | insurance |
| aid | tariff | congress | medical |
| government | congress | income | care |
| tax | drugs | minimum | medicare |
| energy | communicable | wage | disability |
| research | diseases | business | assistance |
![Page 31: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/31.jpg)
39
Groups discovered (US Senate)
Groups from topic Education + Domestic
![Page 32: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/32.jpg)
40
Senators who change coalition the most, depending on topic
e.g. Senator Shelby (D-AL) votes
– with the Republicans on Economic
– with the Democrats on Education + Domestic
– with a small group of maverick Republicans on Social Security + Medicare
![Page 33: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/33.jpg)
44
Do we get better groups with the GT model?
Baseline Model:
1. Cluster bills into topics using mixture of unigrams;
2. Apply group model on topic-specific subsets of bills.

GT Model:
1. Jointly cluster topics and groups at the same time using the GT model.

Agreement Index (AI) measures group cohesion. Higher is better.

| Datasets | Avg. AI for Baseline | Avg. AI for GT | p-value |
|---|---|---|---|
| Senate | 0.8198 | 0.8294 | <.01 |
| UN | 0.8548 | 0.8664 | <.01 |
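The slide does not state how the Agreement Index is computed. As a hedged illustration only, the sketch below uses one definition found in the roll-call voting literature, where a group's cohesion on a single vote is the size of its largest voting bloc minus half of the remaining votes, divided by the total; whether the talk uses exactly this form is an assumption, and the function name `agreement_index` is hypothetical.

```python
def agreement_index(yea, nay, abstain=0):
    """Cohesion of one group on one vote, under the assumed definition.

    Returns 1.0 when the group votes as a single bloc and 0.0 when it
    splits evenly three ways.
    """
    total = yea + nay + abstain
    if total == 0:
        raise ValueError("the group cast no votes")
    largest = max(yea, nay, abstain)
    return (largest - 0.5 * (total - largest)) / total


print(agreement_index(10, 0))    # unanimous group
print(agreement_index(5, 5))     # evenly split between Yea and Nay
print(agreement_index(4, 4, 4))  # evenly split three ways
```

Averaging such per-vote scores over groups and votes gives a single cohesion number per model, which is the kind of quantity compared in the table.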
![Page 34: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/34.jpg)
46
Outline
• Introduction
• Role and Topic Discovery in Social Networks
• Group and Topic Discovery from Voting Records
• Topics over Time
• Topical Phrase with Markov Assumption
• Conclusions
![Page 35: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/35.jpg)
48
Want to model trends over time
• Is prevalence of topic growing or waning?
• Pattern appears only briefly
  – Capture its statistics in a focused way
  – Don’t confuse it with patterns elsewhere in time
• How do roles, groups, influence shift over time?
![Page 36: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/36.jpg)
49
Topics Over Time (TOT)
[Figure: two plate diagrams of the TOT model. Node labels: Dirichlet prior, multinomial over topics, topic index, word, multinomial over words, Dirichlet prior, timestamp, Beta over time.]
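Read generatively, the diagram says that each token gets a topic index, then a word from that topic's multinomial over words and a timestamp from that topic's Beta over time. A minimal sketch under those assumptions follows; time is assumed to be normalized to [0, 1], the document's topic proportions `theta` are passed in rather than drawn from the Dirichlet prior, and the toy topics, Beta parameters and the name `generate_tot_tokens` are made up for illustration.

```python
import random


def generate_tot_tokens(theta, topic_words, topic_beta, num_tokens, rng):
    """For each token: z ~ theta, w ~ topic_words[z], t ~ Beta(topic_beta[z])."""
    tokens = []
    for _ in range(num_tokens):
        z = rng.choices(range(len(theta)), weights=theta)[0]
        words, probs = zip(*topic_words[z].items())
        w = rng.choices(words, weights=probs)[0]
        a, b = topic_beta[z]
        t = rng.betavariate(a, b)  # timestamp in [0, 1]
        tokens.append((z, w, t))
    return tokens


# Hypothetical toy setup: topic 0 is concentrated early in time, topic 1 late.
topic_words = [
    {"treaty": 0.6, "tariff": 0.4},
    {"internet": 0.5, "energy": 0.5},
]
topic_beta = [(2.0, 8.0), (8.0, 2.0)]

rng = random.Random(0)
tokens = generate_tot_tokens([0.5, 0.5], topic_words, topic_beta, 10, rng)
for z, w, t in tokens:
    print(z, w, round(t, 3))
```

Because each topic carries its own distribution over time, a topic that is prominent only briefly can be given a narrow Beta and is not blended with similar word patterns from other periods.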
![Page 37: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/37.jpg)
50
All possible “topic models” with two observed modalities
![Page 38: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/38.jpg)
51
State of the Union addresses
208 addresses delivered between January 8, 1790 and January 29, 2002.
To increase the number of documents, we split the addresses into paragraphs and treated them as ‘documents’. One-line paragraphs were excluded. Stop words were removed.
• 17,156 ‘documents’
• 21,534 words
• 669,425 tokens
Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people.
1910
![Page 39: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/39.jpg)
52
Comparing TOT against LDA
![Page 40: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/40.jpg)
55
Topic Distributions Conditioned on Time
[Figure: topic mass (in vertical height) over time in NIPS conference papers]
![Page 41: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/41.jpg)
57
TOT improves ability to predict time
Predicting the year of a State-of-the-Union address.
L1 = distance between predicted year and actual year.
![Page 42: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/42.jpg)
58
Outline
• Introduction
• Role and Topic Discovery in Social Networks
• Group and Topic Discovery from Voting Records
• Topics over Time
• Topical Phrase with Markov Assumption
• Conclusions
![Page 43: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/43.jpg)
59
Topic Interpretability
LDA: algorithms, algorithm, genetic, problems, efficient
Topical N-grams: genetic algorithms, genetic algorithm, evolutionary computation, evolutionary algorithms, fitness function
![Page 44: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/44.jpg)
60
Topics modeling phrases
• Topics based only on unigrams often difficult to interpret
• Topic discovery itself is confused because important meaning / distinctions are carried by phrases.
• Significant opportunity to provide improved language models to ASR, MT, IR, etc.
![Page 45: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/45.jpg)
61
Topical N-Gram model
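[Graphical model: a chain of topic assignments z1 … z4, words w1 … w4 and bigram status indicators y1 … y4 within each document, with plates D, T, W and TW and hyperparameters including α, β and γ.]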
![Page 46: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/46.jpg)
62
All possible “topic models” with two observed modalities
![Page 47: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/47.jpg)
63
Features of Topical N-Grams model
• Easily trained by Gibbs sampling
  – Can run efficiently on millions of words
• Topic-specific phrase discovery
  – “white house” has special meaning as a phrase in the politics topic,
  – ... but not in the real estate topic.
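The “white house” example can be made concrete with a simplified generative sketch of a topical n-gram style model: before each word, a binary indicator y decides whether the word continues a phrase with the previous word; the word is then drawn from a topic-specific bigram distribution (y = 1) or from the topic's unigram distribution (y = 0). All tables below are hypothetical toy values, the name `generate_tng_tokens` is made up, and falling back to the unigram distribution when no bigram table exists is a simplification for this sketch.

```python
import random


def generate_tng_tokens(theta, unigram, bigram, bigram_prob, num_tokens, rng):
    """Generate (y, z, w) triples; y = 1 marks a word that continues a phrase."""
    tokens = []
    prev_z, prev_w = None, None
    for _ in range(num_tokens):
        # y depends on the previous topic and previous word.
        if prev_w is None:
            y = 0
        else:
            p = bigram_prob.get((prev_z, prev_w), 0.0)
            y = 1 if rng.random() < p else 0
        z = rng.choices(range(len(theta)), weights=theta)[0]
        if y == 1 and (z, prev_w) in bigram:
            dist = bigram[(z, prev_w)]  # topic-specific bigram distribution
        else:
            y = 0                        # simplification: fall back to unigrams
            dist = unigram[z]
        words, probs = zip(*dist.items())
        w = rng.choices(words, weights=probs)[0]
        tokens.append((y, z, w))
        prev_z, prev_w = z, w
    return tokens


# Topic 0 plays the role of "politics", topic 1 of "real estate" (toy values).
unigram = [
    {"white": 0.5, "house": 0.2, "senate": 0.3},
    {"house": 0.6, "price": 0.4},
]
bigram = {(0, "white"): {"house": 1.0}}   # "white house" exists only in topic 0
bigram_prob = {(0, "white"): 0.9}

rng = random.Random(0)
tokens = generate_tng_tokens([0.5, 0.5], unigram, bigram, bigram_prob, 30, rng)
print(tokens)
```

In this toy setup "house" can follow "white" as part of a phrase only under topic 0; under topic 1 it is always generated as an independent unigram.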
![Page 48: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/48.jpg)
64
NIPS research papers
• Full text of NIPS papers from 1987 to 1999.
• 1,740 research papers in total.
• 13,649 unique words and 2,301,375 word tokens.
• Stop words removed and no stemming.
![Page 49: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/49.jpg)
65
“Reinforcement Learning”
LDA: state, learning, policy, action, reinforcement, states, time, optimal, actions, function, algorithm, reward, step, dynamic, control, sutton, rl, decision, algorithms, agent

Topical N-grams (2+): reinforcement learning, optimal policy, dynamic programming, optimal control, function approximator, prioritized sweeping, finite-state controller, learning system, reinforcement learning RL, function approximators, markov decision problems, markov decision processes, local search, state-action pair, markov decision process, belief states, stochastic policy, action selection, upright position, reinforcement learning methods

Topical N-grams (1): policy, action, states, actions, function, reward, control, agent, q-learning, optimal, goal, learning, space, step, environment, system, problem, steps, sutton, policies
![Page 50: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/50.jpg)
66
“Support Vector Machines”
LDA: kernel, linear, vector, support, set, nonlinear, data, algorithm, space, pca, function, problem, margin, vectors, solution, training, svm, kernels, matrix, machines

Topical N-grams (2+): support vectors, test error, support vector machines, training error, feature space, training examples, decision function, cost functions, test inputs, kkt conditions, leave-one-out procedure, soft margin, bayesian transduction, training patterns, training points, maximum margin, strictly convex, regularization operators, base classifiers, convex optimization

Topical N-grams (1): kernel, training, support, margin, svm, solution, kernels, regularization, adaboost, test, data, generalization, examples, cost, convex, algorithm, working, feature, sv, functions
![Page 51: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/51.jpg)
67
Word dependencies in information retrieval
• Long-distance dependency: topical (semantic) dependency helps [Hofmann, 1999; Wei and Croft, 2006].
• Short-distance dependency: phrases (usually discovered by separate modules) can boost IR performance [Fagan, 1989; Evans et al., 1991; Strzalkowski, 1995; Mitra et al., 1997].
• TNG simultaneously captures both.
![Page 52: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/52.jpg)
68
San Jose Mercury News (TREC)
• Covers materials from San Jose Mercury News in 1991
• With TREC queries 51-150
• 90,257 documents in total, 255,686 unique words and 17,574,989 word tokens.
• Stop words removed and no stemming.
<DOC>
<DOCNO> SJMN91-06364022 </DOCNO>
<ACCESS> 06364022 </ACCESS>
<CAPTION> Photo; PHOTO: Associated Press; MONSTER MASH -- Kentucky's Jamal MashBurn shows his stuff in the Wildcats' 103-89 victory over state rival Louisville on Saturday. Mashburn had 25 points. </CAPTION>
<DESCRIPT> COLLEGE; BASKETBALL; GAME; RESULT; RANKING; SCHOOL </DESCRIPT>
<LEADPARA> Arizona had a 24-point night from Sean Rooks, a height advantage and strong defense, but still struggled to an 83-76 victory over Evansville in the Fiesta Bowl Classic in Tucson, Ariz., on Saturday.; The victory moved the No. 6 Wildcats into the championship of their tournament for the seventh straight time. </LEADPARA>
<SECTION> Sports </SECTION>
<HEADLINE> ARIZONA EDGES EVANSVILLE……
![Page 53: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/53.jpg)
69
Ad-hoc retrieval on SJMN
Clearly contain phrases
No phrases due to stop-word and punctuation removal
Mixed results on many other queries.
![Page 54: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/54.jpg)
70
Ad-hoc retrieval on SJMN
* indicates statistically significant differences in performance with 95% confidence according to the Wilcoxon test
![Page 55: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/55.jpg)
71
Outline
• Introduction
• Role and Topic Discovery in Social Networks
• Group and Topic Discovery from Voting Records
• Topics over Time
• Topical Phrase with Markov Assumption
• Conclusions
![Page 56: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/56.jpg)
72
All possible “topic models” with two observed modalities (revisit)
ART, GT, TOT, TNG
![Page 57: Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities](https://reader034.vdocument.in/reader034/viewer/2022051214/56813dfd550346895da7d7c3/html5/thumbnails/57.jpg)
73
Conclusions
• With carefully designed model structures, we can utilize multi-modality information.
• Choices of configuration are task dependent.
• Better results are obtained from joint inference on various tasks.