Gossip-based Partitioning and Replication Middle-ware for Online Social Networks
Muhammad Anis Uddin Nasir (EMDC/ICT/LCN)
Supervisor: Šarūnas Girdzijauskas
Examiner: Johan Montelius
Online Social Networks
04/18/2023 Muhammad Anis Uddin Nasir- Gossip-based Partitioning and Replication Middle-ware
• Vertices
• Edges
• Metadata
[Figure: example social graph whose vertices are people: Ioanna, Antonio, Vaidas, Aras, Vasia, Anis, Mudit, Manos, Leandro, Johan]
Existing Solutions
• Relational databases: MySQL Cluster
• Key-value stores: Cassandra, Amazon Dynamo
• Document databases: MongoDB, CouchDB
• Graph databases: Neo4j, Titan
Why Are Existing Solutions Not Enough?
[Figure: example social graph with vertices 1 to 10]
Why Are Existing Solutions Not Enough?
• Random partitioning
• Social requests
- E.g., gathering news feeds from all friends
• Enforcing data locality
• Random partitioning can lead to full replication!
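The cost of a social request under random placement can be sketched as follows. This is a minimal illustration, not the author's code: `hash_partition`, `servers_touched`, `replicas_needed`, and the six-user friend list are hypothetical names and data, and placement by id modulo the server count stands in for random partitioning.

```python
def hash_partition(vertices, num_servers):
    """Place each vertex by id modulo the server count (stand-in for random placement)."""
    return {v: v % num_servers for v in vertices}


def servers_touched(user, friends, placement):
    """Servers a 'fetch my friends' feeds' request must contact when no replicas exist."""
    return {placement[f] for f in friends[user]} | {placement[user]}


def replicas_needed(user, friends, placement):
    """Friends whose data must be copied to the user's server to answer locally."""
    home = placement[user]
    return {f for f in friends[user] if placement[f] != home}


# Hypothetical data: user 1 has five friends, spread over three servers.
friends = {1: [2, 3, 4, 5, 6]}
placement = hash_partition(range(1, 7), num_servers=3)

print(sorted(servers_touched(1, friends, placement)))   # every server is contacted
print(sorted(replicas_needed(1, friends, placement)))   # friends that would need replicas
```

In this toy case one request reaches all three servers, and enforcing locality for user 1 alone already requires replicating four of the five friends.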
[Figure: the example graph (vertices 1 to 10) split across two servers, with masters 1, 4, 7, 8 and 2, 3, 5, 6, 10, 9; primed labels (1’ to 10’) mark replicas, and every vertex has one]
Social Graphs are not Random
Graphs with small-world properties
Graph Partitioning
JA-BE-JA: Edge-cut
[Figure: vertices 1 to 7 partitioned across Server A and Server B, with replicas 6’, 3’, 1’, 4’, 7’]
• Edge Cut = 3 links, 3+2=5 replicas to maintain
SPAR: Minimizing Replicas
[Figure: vertices 1 to 7 partitioned across Server A and Server B, with replicas 6’, 3’, 2’, 5’]
• Edge Cut = 4 links, 2+2=4 replicas to maintain
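The difference between minimizing the edge cut and minimizing replicas can be checked on a small example. The sketch below uses a hypothetical 7-vertex graph (`EDGES`), not the graph drawn on the slides, and assumes a vertex needs one replica on every other server that hosts one of its neighbours. The helpers `edge_cut`, `replica_count`, and `place` are illustrative names.

```python
def edge_cut(edges, placement):
    """Number of edges whose endpoints live on different servers."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


def replica_count(edges, placement):
    """Replicas needed so that every vertex finds all its neighbours locally."""
    replicas = set()  # (vertex, server hosting the replica) pairs
    for u, v in edges:
        if placement[u] != placement[v]:
            replicas.add((u, placement[v]))
            replicas.add((v, placement[u]))
    return len(replicas)


def place(server_a, server_b):
    """Build a vertex -> server map from two vertex sets."""
    return {**{v: "A" for v in server_a}, **{v: "B" for v in server_b}}


# Hypothetical graph, chosen so that the two objectives disagree.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (2, 5),
         (3, 5), (4, 6), (5, 6), (5, 7), (6, 7)]

low_cut = place({1, 2, 3, 4}, {5, 6, 7})        # fewer cut edges
few_replicas = place({1, 2, 3}, {4, 5, 6, 7})   # fewer replicas

print(edge_cut(EDGES, low_cut), replica_count(EDGES, low_cut))
print(edge_cut(EDGES, few_replicas), replica_count(EDGES, few_replicas))
```

On this graph the first placement cuts 3 edges but needs 3 + 2 = 5 replicas, while the second cuts 4 edges and needs only 2 + 2 = 4, matching the counts quoted on the two slides.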
Initialization
[Figure: the example graph (vertices 1 to 10) split across two servers, with masters 1, 4, 7, 8 and 2, 3, 5, 6, 10, 9; primed labels (1’ to 10’) mark replicas]
• Node addition: assign the new node to the server with the fewest masters
• Edge addition: check whether both nodes are local; otherwise, create replicas to maintain locality
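The two initialization rules can be sketched as a small class. This is an illustrative toy under stated assumptions, not the thesis implementation: the `Cluster` class and its fields are hypothetical, and ties between equally loaded servers are broken by the lowest server id.

```python
class Cluster:
    """Toy placement state: one master per vertex, plus replicas for locality."""

    def __init__(self, num_servers):
        self.masters = {s: set() for s in range(num_servers)}
        self.replicas = {s: set() for s in range(num_servers)}
        self.home = {}  # vertex -> server holding its master

    def add_node(self, v):
        # Pick the server with the fewest masters (ties go to the lowest id).
        server = min(self.masters, key=lambda s: (len(self.masters[s]), s))
        self.masters[server].add(v)
        self.home[v] = server
        return server

    def add_edge(self, u, v):
        # If both masters share a server, the edge is already local.
        if self.home[u] == self.home[v]:
            return
        # Otherwise each endpoint's server gets a replica of the other endpoint.
        self.replicas[self.home[u]].add(v)
        self.replicas[self.home[v]].add(u)


cluster = Cluster(num_servers=2)
for vertex in (1, 2, 3, 4):
    cluster.add_node(vertex)
cluster.add_edge(1, 3)  # both masters on server 0: nothing to replicate
cluster.add_edge(1, 2)  # crosses servers: replicate 2 on server 0 and 1 on server 1
print(cluster.home, cluster.replicas)
```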
Gossip Phase
• Cost function: count the number of replicas on the current and the new server
• Peer selection: local, random, hybrid
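A possible reading of the cost function and the peer-selection policies is sketched below. The slides give no definitions, so everything here is an assumption: "local" is taken to mean a server that already hosts one of the vertex's neighbours, "hybrid" tries local first and falls back to random, and load balancing is ignored. The function names and the four-vertex graph are hypothetical.

```python
import random


def replicas_on(server, adj, home):
    """Vertices a server must replicate: non-local neighbours of its masters."""
    return {n for v, s in home.items() if s == server
            for n in adj[v] if home[n] != server}


def move_gain(v, target, adj, home):
    """Replicas saved on the current and target servers if v's master moves to target."""
    current = home[v]
    before = len(replicas_on(current, adj, home)) + len(replicas_on(target, adj, home))
    moved = dict(home)
    moved[v] = target
    after = len(replicas_on(current, adj, moved)) + len(replicas_on(target, adj, moved))
    return before - after


def select_peer(v, adj, home, servers, policy, rng):
    """Choose a candidate server for v under the given policy (assumed semantics)."""
    local = sorted({home[n] for n in adj[v]} - {home[v]})
    others = [s for s in servers if s != home[v]]
    if policy == "local":
        pool = local
    elif policy == "random":
        pool = others
    else:  # "hybrid": prefer local candidates, fall back to any other server
        pool = local or others
    return rng.choice(pool) if pool else None


# Hypothetical graph: a triangle 1-2-3 with vertex 4 attached to 3.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
home = {1: 0, 2: 0, 3: 1, 4: 1}

print(move_gain(3, 0, adj, home))   # positive: the move removes a replica
print(move_gain(4, 0, adj, home))   # negative: the move adds a replica
print(select_peer(3, adj, home, [0, 1, 2], "local", random.Random(0)))
```

Only the two servers involved are inspected because, under the locality rule assumed here, replicas hosted on any third server do not change when a master moves.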
[Figure: the partitioned example graph during the gossip phase; masters 1, 4, 7, 8 and 2, 3, 5, 6, 10, 9, with replicas 1’, 4’, 7’, 8’, 9’, 5’, 10’, 2’, 3’, 6’]
Gossip Phase
• Cost function: count the number of replicas on the existing and the new server
• Peer selection: local, random, hybrid
• Simulated annealing
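How simulated annealing lets the gossip phase escape local optima can be illustrated with a generic acceptance rule. The slides do not state the exact rule or cooling schedule used, so this sketch swaps in the textbook Metropolis criterion with geometric cooling. `accept_move`, `cool`, and the parameter values are hypothetical.

```python
import math
import random


def accept_move(gain, temperature, rng):
    """Accept improving moves; accept a worsening move with probability exp(gain / T)."""
    if gain > 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(gain / temperature)


def cool(temperature, rate=0.95):
    """Geometric cooling: later rounds tolerate fewer worsening moves."""
    return temperature * rate


# Compare how often a move that adds one replica (gain = -1) is accepted.
rng = random.Random(1)
hot = sum(accept_move(-1, 5.0, rng) for _ in range(1000))
cold = sum(accept_move(-1, 0.1, rng) for _ in range(1000))
print(hot, cold)
```

Early on, at a high temperature, most replica-increasing moves are still accepted, which allows the placement to leave a poor local optimum. As the temperature drops, the rule approaches a purely greedy one.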
[Figure: the example graph after vertices 1 and 6 exchange servers; masters 6, 4, 7, 8 and 2, 3, 5, 1, 10, 9, with replicas 4’, 8’, 9’, 3’, 5’, 10’, 6’, 1’]
Simulated Annealing
Algorithms
Algorithm Random SPAR JA-BE-JA Gossip-based
Data locality
Decentralized
Load Balancing
Fault tolerance
Avoiding Local Optima
Availability
Datasets
Dataset          Vertices   Edges
Synth-C          2,000      20,000
Synth-HC         2,000      20,000
Synth-PL         2,000      20,000
SNAP-Facebook    4,039      88,234
WSON-Facebook    60,290     1,545,686
SNAP-Twitter     81,306     1,768,149
Evaluation- with datasets
[Figure: bar chart of replication overhead (y-axis, 0 to 12) for Random, SPAR, JA-BE-JA, and Gossip-based on Synth-C, Synth-HC, Synth-PL, SNAP-Facebook, WSON-Facebook, and SNAP-Twitter]
• Number of servers = 16, replication factor = 2
• >3x gain compared to Random partitioning
• ≈2x gain compared to SPAR
Evaluation- with replication factor
[Figure: bar chart of replication overhead (y-axis, 0 to 10) with series f=0 and f=2 on Synth-LC, Synth-LHC, Synth-PL, Synth-C, Synth-HC, SNAP-Facebook, WSON-Facebook, and SNAP-Twitter]
• Number of servers = 16
• Random graphs generate the maximum replication overhead; real graphs generate the minimum
• Data locality is achieved by the fault-tolerance replicas
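The interplay between fault-tolerance replicas and locality can be sketched under two assumptions the slides do not spell out: replication overhead is the average number of replicas per vertex, and a vertex keeps at least f replicas, with locality replicas counting toward that minimum. The function `replication_overhead` and the four-vertex graph are hypothetical.

```python
def replication_overhead(adj, home, f=0):
    """Average replicas per vertex, keeping at least f replicas of every vertex."""
    total = 0
    for v, neighbours in adj.items():
        # Servers, other than v's own, that host a neighbour of v and so need v locally.
        locality_servers = {home[n] for n in neighbours} - {home[v]}
        # Fault-tolerance replicas can be placed on those servers first.
        total += max(len(locality_servers), f)
    return total / len(adj)


# Hypothetical graph on a cluster assumed to have at least three servers.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
home = {1: 0, 2: 0, 3: 1, 4: 1}

print(replication_overhead(adj, home, f=0))
print(replication_overhead(adj, home, f=2))
```

With f=2 every vertex here already has room for its locality replica inside the fault-tolerance budget, so locality adds nothing on top of the replicas required for fault tolerance.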
Evaluation- with servers
[Figure: replication overhead (y-axis, 0 to 20) versus number of servers (8, 16, 32, 64) on WSON-Facebook for Random, SPAR, JA-BE-JA, and Gossip-based, with a second panel showing Gossip-based alone]
• Replication factor = 2
• Gossip-based generates the minimum replication overhead
• Replication overhead increases non-linearly
• >4x gain compared to Random partitioning
Evaluation- dynamicity
• Number of servers = 16, replication factor = 2
[Figure: replication overhead (y-axis, 0.2 to 0.45) versus number of cycles, in two panels: SNAP-Twitter and SNAP-Facebook]
• Spikes show bulk edge additions
• Algorithm stabilization
• Transition state, i.e., reducing the number of replicas after new edge additions
Conclusion
• Random partitioning does not provide an efficient solution for online social networks
• Minimizing replicas helps achieve better partitioning
• A gossip-based heuristic was proposed to solve the minimization problem while aiming for the global optimum
• The algorithm handles different datasets and adapts to the dynamic nature of OSNs
Future Work
• Executing the algorithm on large datasets using parallel graph processing frameworks such as GraphLab and Apache Giraph
• Load balancing using both masters and replicas, and providing different consistency levels
• Smart replication to provide data locality for highly interactive nodes
• Implementing different consistency strategies based on access patterns