qingpeng zhang 0711
![Page 1: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/1.jpg)
Introducing VenmoPlus.com - Explore your Venmo network!
Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
![Page 2: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/2.jpg)
![Page 3: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/3.jpg)
Features - VenmoPlus.com
● Fuzzy search on user names, with friend lists to help distinguish users who share a name
● Labeling the relationship between the payer and the receiver
● Friend recommendation
● Searching transactions in the friend circle
● Listing the friends of a user
![Page 4: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/4.jpg)
![Page 5: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/5.jpg)
Demo: VenmoPlus.com
![Page 6: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/6.jpg)
Challenge:
● Find the distance between nodes in a dynamic graph in real time
![Page 7: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/7.jpg)
Solutions
● Two databases
  ○ Redis and Elasticsearch
● Algorithm design
  ○ BFS -> Bidirectional Search
  ○ Query relationship of a past transaction
● Query/search optimizations
![Page 8: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/8.jpg)
![Page 9: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/9.jpg)
Historical transactions
Real time transactions
A Tale of Two Databases
API
![Page 10: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/10.jpg)
Redis for graph structure
420890 Graham Hadley
1630476 Leon Tang
810029 Harminder Toor
1371353 Ephraim Park
562884 Paul Min
420890 set(14935158, 562884)
1630476 set(1371353)
810029 set(190230,14935158)
1371353 set(810029,971156)
562884 set(196371,1371353)
35 million edges
6 million nodes
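The layout above (one key per user id, one set of friend ids per key) can be sketched in plain Python. This is a minimal illustration, not the project's code: a dict of sets stands in for Redis, the helper names `add_transaction` and `friends_of` are hypothetical, and treating every transaction as an undirected edge is an assumption. Against a real Redis server, the two operations would roughly correspond to the `SADD` and `SMEMBERS` commands.

```python
from collections import defaultdict

# Names taken from the slide; only a few users are included here.
names = {420890: "Graham Hadley", 562884: "Paul Min", 1371353: "Ephraim Park"}

# In-memory stand-in for Redis: user id -> set of friend ids.
friends = defaultdict(set)


def add_transaction(payer_id, receiver_id):
    """Record a transaction as an undirected edge (assumption)."""
    friends[payer_id].add(receiver_id)
    friends[receiver_id].add(payer_id)


def friends_of(user_id):
    """Return a copy of the user's friend set (empty if the user is unknown)."""
    return set(friends.get(user_id, ()))


add_transaction(420890, 562884)
add_transaction(562884, 1371353)
print(sorted(names[f] for f in friends_of(562884)))
```

Storing each friend list as a set keeps membership checks and intersections cheap, which is what the distance queries later in the deck rely on.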
![Page 11: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/11.jpg)
Elasticsearch for everything
![Page 12: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/12.jpg)
Redis
Elasticsearch
![Page 13: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/13.jpg)
Redis + Elasticsearch => search transactions in friend circle
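One way to combine the two stores, sketched under assumptions: the friend ids come from the graph store (a plain set stands in for a Redis set here), and they are passed as a filter into an Elasticsearch query body. The field names `message`, `payer_id`, and `receiver_id` are hypothetical, as is the helper `friend_circle_query`; the block only builds the query dictionary and does not contact a cluster.

```python
def friend_circle_query(friend_ids, text):
    """Build an Elasticsearch query body (as a dict) that matches `text`
    in the transaction message and keeps only transactions where the
    payer or the receiver is in `friend_ids`."""
    ids = sorted(friend_ids)
    return {
        "query": {
            "bool": {
                # Full-text match on the (hypothetical) message field.
                "must": [{"match": {"message": text}}],
                # Restrict to the friend circle without affecting scoring.
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"terms": {"payer_id": ids}},
                                {"terms": {"receiver_id": ids}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        }
    }


# Friend ids as they might come back from the graph store.
circle = {14935158, 562884}
body = friend_circle_query(circle, "pizza")
print(body["query"]["bool"]["filter"][0]["bool"]["should"][0])
```

Putting the id restriction in a `filter` clause rather than `must` means it narrows the result set but leaves relevance scoring to the text match.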
![Page 14: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/14.jpg)
VenmoPlus.com
m4.xlarge
m4.large
m4.xlarge
m4.large
t2.micro
$29.11/day
![Page 15: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/15.jpg)
Qingpeng “Q.P.” Zhang
● Postdoc
  ○ Lawrence Berkeley National Lab
● PhD in Computer Science
  ○ Michigan State University
What I learned from Insight:
● Thinking as a data engineer
● Open source tools
  ○ Redis, Elasticsearch, Kafka, Spark Streaming, Flask, AngularJS, etc.
![Page 16: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/16.jpg)
Elasticsearch for everything
![Page 17: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/17.jpg)
Breadth First Search -> Bidirectional Search
Shortest distance -> intersection of sets (friend lists)
● A’s 1st degree friends ∩ B’s 1st degree friends
● A’s 2nd degree friends ∩ B’s 1st degree friends
O(N^2) -> O(2*N)
O(N^3) -> O(N + N^2)
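The set-intersection idea can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the graph is a dict of sets, edges are undirected, and the hypothetical function `distance_up_to_3` returns `None` when the two users are more than three hops apart.

```python
def distance_up_to_3(graph, a, b):
    """Return 0, 1, 2, or 3 for the hop distance between a and b,
    or None if they are more than 3 hops apart.
    `graph` maps a user id to the set of that user's friend ids."""
    if a == b:
        return 0
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    # Distance 2: A's 1st degree friends intersect B's 1st degree friends.
    if friends_a & friends_b:
        return 2
    # Distance 3: A's 2nd degree friends intersect B's 1st degree friends.
    second_a = set()
    for friend in friends_a:
        second_a |= graph.get(friend, set())
    second_a -= friends_a | {a}
    if second_a & friends_b:
        return 3
    return None


# A simple chain 1 - 2 - 3 - 4 - 5 as toy data.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
for target in (2, 3, 4, 5):
    print(target, distance_up_to_3(chain, 1, target))
```

Expanding from both endpoints means the degree-3 check never has to build B's 2nd degree neighborhood, which is where the saving over a one-sided breadth-first search comes from.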
![Page 18: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/18.jpg)
Query relationship of a past transaction
![Page 19: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/19.jpg)
Query relationship of a past transaction
Query the distance between two vertices at a historical moment in a constantly changing graph (because the distances are not precalculated)
● If the two users had transactions before that one: distance = 1
● If the transaction is new: distance > 1
  ○ Temporarily remove the influence of that specific transaction
  ○ Check the distance in the graph (2, 3, or >3)
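The two cases can be sketched like this, with several assumptions stated up front: the graph is an in-memory dict of sets with undirected edges, the caller already knows whether the pair transacted earlier (the flag `has_earlier_transaction` is hypothetical), and only the single edge for that transaction is removed, so edges added later by other users remain in the graph. `None` stands for a distance greater than 3.

```python
def distance_up_to_3(graph, a, b):
    """Hop distance (1, 2, 3) between a and b, or None if greater than 3."""
    friends_a = graph.get(a, set())
    friends_b = graph.get(b, set())
    if b in friends_a:
        return 1
    if friends_a & friends_b:
        return 2
    second_a = set()
    for friend in friends_a:
        second_a |= graph.get(friend, set())
    second_a -= friends_a | {a}
    return 3 if second_a & friends_b else None


def relationship_at(graph, payer, receiver, has_earlier_transaction):
    """Distance between payer and receiver as it was just before a
    given transaction, using the rules from the slide."""
    if has_earlier_transaction:
        return 1  # they were already directly connected
    # New pair: hide the edge this transaction created, measure, restore.
    had_edge = receiver in graph.get(payer, set())
    if had_edge:
        graph[payer].discard(receiver)
        graph.get(receiver, set()).discard(payer)
    try:
        return distance_up_to_3(graph, payer, receiver)
    finally:
        if had_edge:
            graph[payer].add(receiver)
            graph.setdefault(receiver, set()).add(payer)


# Users 1 and 3 share friend 2; the edge 1 - 3 comes from the new transaction.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(relationship_at(graph, 1, 3, has_earlier_transaction=False))
```

The `try`/`finally` guarantees the edge is put back even if the distance check raises, so the shared graph is never left in a modified state.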
![Page 20: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/20.jpg)
![Page 21: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/21.jpg)
![Page 22: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/22.jpg)
Pipeline: processing raw data in a distributed way
![Page 23: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/23.jpg)
Query/Search Optimizations
1. Remove aggregation for better performance (trade-off)
2. Friend recommender:
   a. Use Counter to keep only the 5 users who share the most friends in common
3. Search messages in the friend circle
   a. Combine Elasticsearch and Redis queries
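The friend recommender in item 2 can be sketched with `collections.Counter`. This is a minimal version under assumptions: the graph is a dict of sets, candidates are friends of friends who are not already direct friends, and the function name `recommend_friends` is hypothetical.

```python
from collections import Counter


def recommend_friends(graph, user, k=5):
    """Return up to k user ids that share the most friends with `user`
    and are not already direct friends."""
    direct = graph.get(user, set())
    common = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                # One more friend shared between user and candidate.
                common[candidate] += 1
    return [candidate for candidate, _ in common.most_common(k)]


# User 1 is friends with 2 and 3; user 4 knows both, user 5 knows only 2.
graph = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
print(recommend_friends(graph, 1))
```

`Counter.most_common(k)` returns only the top k entries, so the full candidate list does not have to be sorted by the caller.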
![Page 24: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/24.jpg)
More optimization
● Store only the necessary information in Elasticsearch
● Label the distances of historical transactions in a batch job to reduce the number of real-time queries
● Adjust AWS instances to reduce cost
![Page 25: Qingpeng zhang 0711](https://reader030.vdocument.in/reader030/viewer/2022021500/58e6b1111a28abfd418b65f7/html5/thumbnails/25.jpg)
Historical transactions
Real time transactions
Pipeline
API