ensemble learning for online social networkslinc.ucy.ac.cy/isocial/images/stories/events... · big...
TRANSCRIPT
Amira Soliman
PhD student, KTH/EES/LCN
Supervisor: Ass. Prof. Sarunas Girdzijauskas
Ensemble Learning for Online Social Networks
1
Outline
1. Ensemble Learning
2. Ensemble Learning for OSN
3. Peer Sampling Service
4. Merging Classifiers
5. Applications
2
Big Data Mining
Turning this big data into “actionable insights” in a timely fashion
Challenges:
– Big Data 3V
– Information sharing and data privacy
– Mining complex and dynamic data
– Mining from sparse, uncertain, and
incomplete data
3
4
Ensemble Learning
Original
Training data
....D
1D
2 Dt-1
Dt
D
Step 1:
Create Multiple
Data Sets
C1
C2
Ct -1
Ct
Step 2:
Build Multiple
Classifiers
C*
Step 3:
Combine
Classifiers
Motivations of Ensemble Learning
• Ensemble model improves accuracy and robustness over
single model methods
• Applications:
– distributed computing
– privacy-preserving applications
– large-scale data with reusable models
– multiple sources of data
5
Simple Distributed Training
6
Server 1 Server
2
Server
3
Server
4
Server
5
Training
Data
w(1) w(2) w(3) w(4) w(5)
i
i
i
)(ww
μi is the fraction of data for piece i
Ensemble Learning for OSN
7
Research Questions:
• Diversity – random overlay
• Navigation – constructing relay paths through social ties
• Merge classifiers – reflect Ego/Socio – centric properties in OSN
Methodologies:
• Gossip-based Peer Sampling Service
• On-the-fly Path Reconstruction
• Merging classifiers based on content-richness.
Our approach
1. Construct a random overlay on top of the network.
2. Train classifier with local training dataset.
3. Receive classifiers from neighbors in overlay.
4. Assign the weights of different classifiers.
5. Update local classifier with the using the weighted average:
6. Propagate updated classifier to neighbors in overlay.
8
Overlay Construction
Cyclon: Inexpensive membership management for
unstructured P2P overlays
– that gives access to random peers
– has low diameter
– has low clustering coefficient
– Resilient to massive node failures
Basic idea: Shuffle operation, that’s performed
periodically using gossip
9
How Cyclon works
10
A
B
C
D
E
F
G
H
C : ipAddrC
A: ipAddrA E : ipAddrE
H: ipAddrH
D: ipAddrD
A: ipAddrA
C: ipAddrC
F: ipAddrF
D: ipAddrD
A: ipAddrA
B: ipAddrB
E: ipAddrE
G: ipAddrG
H: ipAddrH
Overlay Construction on top of OSN
11
A
B
C
D
E
F
G
H
D: A, D
A: A
C: C
F: A, F
D: D
A: A
B: A, B
E: E
G: G
H: H
C : B, C
A: B, A E : F, E
H: F, H
On-the-fly Path Reconstruction
12
A
B
C
D
E
F
G
H
D: A, D
A: A
C: C
F: A, F
D: D
A: A
B: A, B
E: E
G: G
H: H
H: F, H
H: A, F, H H : A, F, H
Our approach
1. Construct a random overlay on top of the network.
2. Train classifier with local training dataset.
3. Receive classifiers from neighbors in overlay.
4. Assign the weights of different classifiers.
5. Update local classifier with the using the weighted average:
6. Propagate updated classifier to neighbors in overlay.
13
Merging Models
14
L(n)
M1 M2 Mn
LCV
M
Evaluation
1. We have generated synthetic graphs and data.
2. Data from OSN (Facebook, and Twitter)
Applications:
For our iSocial project:
1. Identity Management with INSUBRIA
2. Spam Filtering with FORTH
15
Thanks
Questions and Comments are All welcome
16