mining heterogeneous information...
TRANSCRIPT
![Page 1: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/1.jpg)
APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS
Yizhou SunCollege of Computer and Information Science
Northeastern University
July 25, 2015
![Page 2: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/2.jpg)
Heterogeneous Information Networks
• Multiple object types and/or multiple link types
1
Venue Paper Author
DBLP Bibliographic Network The IMDB Movie Network
Actor
Movie
Director
Movie
Studio
1. Homogeneous networks are Information loss projection of heterogeneous networks!2. New problems are emerging in heterogeneous networks!
The Facebook Network
Directly Mining information richer heterogeneous networks
![Page 3: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/3.jpg)
Outline
• Why Heterogeneous Information Networks?
• Entity Recommendation
• Information Diffusion
• Ideology Detection
• Summary
2
![Page 4: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/4.jpg)
Recommendation Paradigm
3
recommender system recommendation
user
feedback
external knowledge
product features
community user-item feedback
Collaborative FilteringE.g., K-Nearest Neighbor (Sarwar WWW’01), Matrix Factorization (Hu ICDM’08, Koren IEEE-CS’09), Probabilistic Model (Hofmann SIGIR’03)
Content-Based MethodsE.g., (Balabanovic Comm. ACM’ 97, Zhang SIGIR’02)
Hybrid MethodsE.g., Content-Based CF (Antonopoulus, IS’06), External Knowledge CF (Ma WSDM’11)
![Page 5: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/5.jpg)
Problem Definition
4
recommender system recommendation
user
feedback
information network
implicit user feedback
hybrid collaborative filteringwith information networks
![Page 6: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/6.jpg)
Hybrid Collaborative Filtering with Networks
• Utilizing network relationship information canenhance the recommendation quality
• However, most of the previous studies only usesingle type of relationship between users or items(e.g., social network Ma,WSDM’11, trust relationshipEster, KDD’10, service membership Yuan, RecSys’11)
5
![Page 7: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/7.jpg)
The Heterogeneous Information Network View of Recommender System
6
Avatar TitanicAliens Revolution-ary Road
James Cameron
Kate Winslet
Leonardo Dicaprio
Zoe Saldana Adventure
Romance
![Page 8: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/8.jpg)
Relationship Heterogeneity Alleviates Data Sparsity
7
# of users or items
A small numberof users and itemshave a largenumber of ratings
Most users and items havea small number of ratings
# o
f ra
tin
gs
Collaborative filtering methods suffer from data sparsity issue
• Heterogeneous relationships complement each other
• Users and items with limited feedback can be connected to the
network by different types of paths
• Connect new users or items (cold start) in the information
network
![Page 9: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/9.jpg)
Relationship Heterogeneity Based Personalized Recommendation Models
8
Different users may have different behaviors or preferences
Aliens
James Cameron fan
80s Sci-fi fan
Sigourney Weaver fan
Different users may beinterested in the samemovie for different reasons
Two levels of personalizationData level• Most recommendation methods use
one model for all users and rely on
personal feedback to achieve
personalization
Model level• With different entity relationships, we
can learn personalized models for
different users to further distinguish
their differences
![Page 10: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/10.jpg)
Preference Propagation-Based Latent Features
9
Alice
Bob
Kate Winslet
Naomi Watts
Titanic
revolutionary road
skyfall
King Konggenre: drama
Sam Mendes
tag: Oscar NominationCharlie
Generate L different meta-path (path types)
connecting users and items
Propagate user implicit feedback along each meta-
path
Calculate latent-features for users and items for each
meta-path with NMFrelated method
Ralph Fiennes
![Page 11: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/11.jpg)
Luser-cluster similarity
Recommendation Models
10
Observation 1: Different meta-paths may have different importance
Global Recommendation Model
Personalized Recommendation Model
Observation 2: Different users may require different models
ranking score
the q-th meta-path
features for user i and item j
c total soft user clusters
(1)
(2)
![Page 12: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/12.jpg)
Parameter Estimation
11
• Bayesian personalized ranking (Rendle UAI’09)
• Objective function
minΘ
sigmoid function
for each correctly ranked item pairi.e., 𝑢𝑖 gave feedback to 𝑒𝑎 but not 𝑒𝑏
Soft cluster users with NMF + k-means
For each user cluster, learn one
model with Eq. (3)
Generate personalized model for each user on the
fly with Eq. (2)
(3)
Learning Personalized Recommendation Model
![Page 13: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/13.jpg)
Experiment Setup
• Datasets
• Comparison methods:
• Popularity: recommend the most popular items to users
• Co-click: conditional probabilities between items
• NMF: non-negative matrix factorization on user feedback
• Hybrid-SVM: use Rank-SVM with plain features (utilize
both user feedback and information network)
12
![Page 14: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/14.jpg)
Performance Comparison
13
HeteRec personalized recommendation (HeteRec-p) provides the best recommendation results
![Page 15: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/15.jpg)
Performance under Different Scenarios
14
HeteRec–p consistently outperform other methods in different scenariosbetter recommendation results if users provide more feedbackbetter recommendation for users who like less popular items
p p
user
![Page 16: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/16.jpg)
Contributions
• Propose latent representations for users and itemsby propagating user preferences along differentmeta-paths
• Employ Bayesian ranking optimization technique tocorrectly evaluate recommendation models
• Further improve recommendation quality byconsidering user differences at model level anddefine personalized recommendation models
• Two levels of personalization
15
Entity Recommendation in InformationNetworks with Implicit User Feedback
(RecSys’13, WSDM’14a)
![Page 17: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/17.jpg)
Outline
• Why Heterogeneous Information Networks?
• Entity Recommendation
• Information Diffusion
• Ideology Detection
• Summary
16
![Page 18: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/18.jpg)
Information Diffusion in Networks
• Action of a node is triggered by the actions of their neighbors
17
![Page 19: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/19.jpg)
Linear Threshold Model
• [Granovetter, 1978]
• If the weighted activation number of its neighbors is bigger
than a pre-specified threshold 𝜃𝑢, the node u is going to be
activated
• In other words
• 𝑝𝑢(𝑡 + 1) = 𝐸[1 𝑣∈Γ 𝑢 𝑤𝑣,𝑢𝛿 𝑢, 𝑡 > 𝜃𝑢 ]
18
![Page 20: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/20.jpg)
Heterogeneous Bibliographic Network
• Multiple types of objects
• Multiple types of links
19
![Page 21: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/21.jpg)
Derived Multi-Relational Bibliographic Network
• Collaboration: Author-Paper-Author
• Citation: Author-Paper->Paper-Author
• Sharing Co-authors: Author-Paper-Author-Paper-Author
• Co-attending venues: Author-Paper-Venue-Paper-Author
20
How to generate these meta-paths ?PathSim: Sun et.al, VLDB’11
![Page 22: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/22.jpg)
How Topics Are Propagated among Authors?
• To Apply Existing approaches
• Select one relation between authors (say,
A-P-A)
• Use all the relations, but ignore the relation
types
• Do different relation types play different roles?
• Need new models!
21
![Page 23: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/23.jpg)
Two Assumptions for Topic Diffusion in Multi-Relational Networks
• Assumption 1: Relation independent diffusion
22
Model-level aggregation
![Page 24: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/24.jpg)
• Assumption 2: Relation interdependent diffusion
23
Relation-level aggregation
![Page 25: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/25.jpg)
Two Models under the Two Assumptions
• Two multi-relational linear threshold models
• Model 1: MLTM-M
• Model-level aggregation
• Model 2: MLTM-R
• Relation-level aggregation
24
![Page 26: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/26.jpg)
MLTM-M
• For each relation type k
• The activation probability for object i at time t+1:
• The collective model
• The final activation probability for object i is an aggregation
over all relation types
25
![Page 27: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/27.jpg)
Properties of MLTM-M
26
![Page 28: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/28.jpg)
MLTM-R
• Aggregate multi-relational network with different weights
• Treat the activation as in a single-relational network
•
27
To make sure the activation probability non-negative, weights 𝛽′𝑠 are required non-negative
![Page 29: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/29.jpg)
Properties of MLTM-R
28
![Page 30: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/30.jpg)
How to Evaluate the Two Models?
• Test on the real action log on multiple topics!
• 𝐴𝑐𝑡𝑖𝑜𝑛 𝑙𝑜𝑔: {< 𝑢𝑖 , 𝑡𝑖 >}
• Diffusion model learning from action log
• MLE estimation over 𝛽′𝑠
29
![Page 31: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/31.jpg)
Two Real Datasets
• DBLP
• Computer Science
• Relation types
• APA, AP->PA, APAPA, APVPA
• APS
• Physics
• Relation types
• APA, AP->PA, APAPA, APOPA
30
![Page 32: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/32.jpg)
Topics Selected
• Select topics with increasing trends
31
![Page 33: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/33.jpg)
Evaluation Methods
• Global Prediction
• How many authors are activated at t+1
• Error rate = ½(predicted#/true# + true#/predicted#)-1
• Local Prediction
• Which author is likely to be activated at t+1
• AUPR (Area under Precision-Recall Curve)
32
![Page 34: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/34.jpg)
Global Prediction
33
![Page 35: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/35.jpg)
Local Prediction - AUPR
• 1: Different Relation Play Different Roles in Diffusion Process
• 2: Relation-Level Aggregation is better than Model-Level Aggregation
34
![Page 36: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/36.jpg)
Case Study
35
![Page 37: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/37.jpg)
Prediction Results on “social network” Diffusion
36
![Page 38: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/38.jpg)
37
![Page 39: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/39.jpg)
38
WIN!
![Page 40: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/40.jpg)
Outline
• Why Heterogeneous Information Networks?
• Entity Recommendation
• Information Diffusion
• Ideology Detection
• Summary
39
![Page 41: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/41.jpg)
• Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network (KDD’14, Gu, Sun et al.)
40
![Page 42: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/42.jpg)
Background
Federal Legislation
(bill)Law
The HouseSenate
Ronald Paul
Bill 1 Bill 2 ……
Barack Obama
Ronald Paul
liberal conservative
Politician
Republican Democrat
Barack Obama
41
United Stated Congress
The House Senate
![Page 43: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/43.jpg)
Legislative Voting Network
42
![Page 44: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/44.jpg)
Problem Definition
Input: Legislative Network
Output:𝒙𝑢: Ideal Points for Politician 𝑢𝒂𝑑: Ideal Points for Bill 𝑑
43
𝒙𝑢’s on different topics
![Page 45: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/45.jpg)
Existing Work
• 1-dimensional ideal point model (Poole and Rosenthal, 1985; Gerrish and Blei, 2011)
• High-dimensional ideal point model (Poole and Rosenthal, 1997)
• Issue-adjusted ideal point model (Gerrish and Blei, 2012)
44
![Page 46: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/46.jpg)
Motivations
45
Topic 1Topic 2Topic 3Topic 4
• Voters have different positions on different topics.
• Traditional matrix factorization method cannot give the meanings for each dimension.
𝑀 𝑈≈ ⋅ 𝑉𝑇
𝑘𝑡ℎ latent factor
• Topics of bills can influence politician’s voting, and the voting behavior can better interpret the topics of bills as well.
Topic Model:• Health• Public Transport• …
Voting-guided Topic Model:• Health Service• Health Expenses• Public Transport• …
![Page 47: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/47.jpg)
Topic-Factorized IPM
46
𝑢 𝑑
𝑤
PoliticiansBills
Terms
Heterogeneous Voting Network
𝑛(𝑑, 𝑤)𝑣𝑢𝑑
Entities:• Politicians• Bills• Terms
Links:• (𝑃, 𝐵)• (𝐵, 𝑇)
Parameters to maximize the likelihood of generating two types of links:• Ideal points for politicians• Ideal points for bills• Topic models
![Page 48: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/48.jpg)
Text Part
47
PoliticiansBills
Terms
![Page 49: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/49.jpg)
Text Part
• We model the probability of each word in each document as a mixture of categorical distributions, as in PLSA (Hofmann, 1999) and LDA (Blei et al., 2003)
𝑑 𝑘 𝑤𝜃𝑑𝑘 = 𝑝(𝑘|𝑑) 𝛽𝑘𝑤 = 𝑝(𝑤|𝑘)
Bill Topic Word
𝒘𝑑 = 𝑛 𝑑, 1 , 𝑛 𝑑, 2 ,… , 𝑛 𝑑,𝑁𝑤
𝑝 𝒘𝑑 𝜽,𝜷 ∝
𝑤
(
𝑘
𝜃𝑑𝑘𝛽𝑘𝑤)
𝑛(𝑑,𝑤)
𝑝 𝑾 𝜽,𝜷 ∝
𝑑
𝑤
(
𝑘
𝜃𝑑𝑘𝛽𝑘𝑤)
𝑛(𝑑,𝑤)
48
![Page 50: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/50.jpg)
Voting Part
49
PoliticiansBills
Terms
Intuitions:• The more similar of
the ideal points of uand d, the higher probability of “YEA” link
• The higher portion a bill belongs to topic k, the higher weight of ideal points on topic k
![Page 51: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/51.jpg)
Voting Part
YEA𝑝 𝑣𝑢𝑑 = 1 = 𝜎(
𝑘
𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘 + 𝑏𝑑)
𝑥𝑢1 𝑥𝑢2 𝑥𝑢𝑘 𝑥𝑢𝐾
𝑎𝑑1 𝑎𝑑2 𝑎𝑑𝑘 𝑎𝑑𝐾
𝒙𝑢
𝒂𝑑
Topic1
Topic2
…… Topic
𝑘Topic
𝐾……
0 1 -1 1 1 1 1 1
0 0 -1 1 1 1 -1 1
1 1 1 1 -1 1 0 0
𝑢1
𝑢2
𝑢𝑁𝑈
𝑑1 𝑑2 𝑑𝑁𝐷……
……
User-Bill voting matrix 𝑽
𝑟𝑢𝑑 =
𝑘=1
𝐾
𝑥𝑢𝑘𝑎𝑑𝑘
𝑟𝑢𝑑 =
𝑘=1
𝐾
𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘
𝑝 𝑣𝑢𝑑 = −1 = 1 − 𝜎(
𝑘
𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘 + 𝑏𝑑)NAY
Voter 𝑢
Bill 𝑑
𝑝 𝑽 𝜽,𝑿, 𝑨, 𝒃 =
𝑢,𝑑 :𝑣𝑢𝑑≠0
(𝑝 𝑣𝑢𝑑 = 11+𝑣𝑢𝑑2 𝑝 𝑣𝑢𝑑 = −1
1−𝑣𝑢𝑑2 )
50
𝑥𝑢1
𝑥𝑢𝑘
𝑥𝑢𝐾
𝑎𝑑1
𝑎𝑑𝑘
𝑎𝑑𝐾
……
……
𝜃𝑑𝑘
𝜃𝑑1
𝜃𝑑𝐾
𝑥𝑢𝑘 ∈ 𝑹
𝑎𝑑𝑘 ∈ 𝑹
𝐼{𝑣𝑢𝑑=1} 𝐼{𝑣𝑢𝑑=−1}
![Page 52: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/52.jpg)
Combining Two Parts Together
• The final objective function is a linear combination of the two average log-likelihood functions over the word links and voting links.
• We also add an 𝑙2 regularization term to 𝐴 and 𝑋 to reduce over-fitting.
51
![Page 53: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/53.jpg)
Learning Algorithm
• An iterative algorithm where ideal points related parameters (𝑋, 𝐴, 𝑏) and topic model related parameters (𝜃, 𝛽) enhance each other.
• Step 1: Update 𝑋, 𝐴, 𝑏 given 𝜃, 𝛽• Gradient descent
• Step 2: Update 𝜃, 𝛽 given 𝑋, 𝐴, 𝑏• Follow the idea of expectation-maximization (EM) algorithm:
maximize a lower bound of the objective function in each iteration
52
![Page 54: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/54.jpg)
Learning Algorithm
• Update 𝜃: A nonlinear constrained optimization problem.
Remove the constraints by a logistic function based transformation:
and update 𝜇𝑑𝑘 using gradient descent.
• Update 𝛽:
Since 𝛽 only appears in the topic model part, we use the same updating rule as in PLSA:
where
𝑒𝜇𝑑𝑘
1 + 𝑘′=1𝐾−1 𝑒𝜇𝑑𝑘′
1
1 + 𝑘′=1𝐾−1 𝑒𝜇𝑑𝑘′
if 1 ≤ 𝑘 ≤ 𝐾 − 1
if 𝑘 = 𝐾
𝜃𝑑𝑘 =
53
![Page 55: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/55.jpg)
Data Description
• Dataset:
• U.S. House and Senate roll call data in the years between 1990
and 2013.∗
• 1,540 legislators
• 7,162 bills
• 2,780,453 votes (80% are “YEA”)
• Keep the latest version of a bill if there are multiple versions.
• Randomly select 90% of the votes as training and 10% as
testing.
∗Downloaded from http://thomas.loc.gov/home/rollcallvotes.html
54
![Page 56: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/56.jpg)
Evaluation Measures
• Root mean square error (RMSE) between the predicted vote score and the ground truth
RMSE = 𝑢,𝑑 :𝑣𝑢𝑑≠0
1+𝑣𝑢𝑑2−𝑝 𝑣𝑢𝑑=1
2
𝑁𝑉
• Accuracy of correctly predicted votes (using 0.5 as a threshold for the predicted accuracy)
Accuracy = 𝑢,𝑑(𝐼 𝑝 𝑣𝑢𝑑=1 >0.5 && 𝑣𝑢𝑑=1
+𝐼 𝑝 𝑣𝑢𝑑=1 <0.5 && 𝑣𝑢𝑑=−1)
𝑁𝑉
• Average log-likelihood of the voting link
AvelogL = 𝑢,𝑑 :𝑣𝑢𝑑≠0
1+𝑣𝑢𝑑2log 𝑝 𝑣𝑢𝑑=1 +
1−𝑣𝑢𝑑2log 𝑝(𝑣𝑢𝑑=−1)
𝑁𝑉
55
![Page 57: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/57.jpg)
Experimental Results
Training Data set Testing Data set
56
![Page 58: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/58.jpg)
Parameter Study
Parameter study on 𝜆 Parameter study on 𝜎 (regularization coefficient)
57
𝐽 𝜽, 𝜷, 𝑿, 𝑨, 𝒃 = 1 − 𝜆 ⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿 𝑡𝑒𝑥𝑡 + 𝜆 ⋅ 𝑎𝑣𝑒𝑙𝑜𝑔𝐿(𝑣𝑜𝑡𝑖𝑛𝑔) −1
2𝜎2(
𝑢
𝒙𝒖 22+
𝑑
𝒂𝑑 22)
![Page 59: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/59.jpg)
Foreign
Education
Individual Property
MilitaryFinancial Institution
Law
Health Service
Health Expenses
Funds
Public Transportation
Ronald Paul Barack Obama Joe Lieberman
Case Studies
• Ideal points for three famous politicians: (Republican, Democrat)
• Ronald Paul (R), Barack Obama (D), Joe Lieberman (D)
58
![Page 60: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/60.jpg)
Case Studies
• Scatter plots over selected dimensions:(Republican, Democrat)
59
![Page 61: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/61.jpg)
𝑥𝑢𝑘
Case Studies
𝑝 𝑣𝑢𝑑 = 1 = 𝜎(
𝑘
𝜃𝑑𝑘𝑥𝑢𝑘𝑎𝑑𝑘 + 𝑏𝑑)
Bill: H_RES_578 — 109th Congress (2005-2006)It is about supporting the government of Romania to improve the standard health care and well-being of children in Romania.
YEAR. Paul H_RES_578
60
Topic Model
TF-IPM
Experts/Algorithm
𝜽𝑑
𝒙𝑢
𝒂𝑑
𝑝 𝑣𝑢𝑑 = 1
For Unseen Bill 𝑑:
![Page 62: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/62.jpg)
Outline
• Why Heterogeneous Information Networks?
• Entity Recommendation
• Information Diffusion
• Ideology Detection
• Summary
61
![Page 63: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/63.jpg)
Summary
• Heterogeneous Information Networks are networks with multiple types of objects and links
• Principles in mining heterogeneous information networks
• Meta-path-based mining
• Systematically form new types of relations
• Relation strength-aware mining
• Different types of relations have different strengths
• Relation semantic-aware mining
• Different types of relations need different modeling
62
![Page 64: Mining Heterogeneous Information Networkskdd.cs.ksu.edu/Workshops/HINA-2015/Presentations/Slides/01-Sun... · 25/07/2015 · Actor Movie Director Movie Studio 1. ... Leonardo Dicaprio](https://reader035.vdocument.in/reader035/viewer/2022063013/5fcaf5d02eac2165486f0a50/html5/thumbnails/64.jpg)
Q & A
63