![Page 1: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/1.jpg)
Like a Pack of Wolves: Community Structure of Web Trackers
V. Kalavri, [email protected] (KTH Royal Institute of Technology)
J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)
Passive and Active Measurements Conference
31 March - 1 April 2016, Heraklion, Crete, Greece
![Page 2: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/2.jpg)
Ads
Recommendations
Browsing the Web
![Page 3: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/3.jpg)
Tracker
Tracker
Ad Server
display relevant ads
cookie exchange
profiling
Tracking
![Page 4: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/4.jpg)
The study's authors defined "creepiness" by the feeling consumers get when they sense an ad is too personal because it uses data the consumer did not agree to provide, such as online-search and browsing history. Consumers are even more creeped out by this because they don't know how and where that information will be used.
![Page 5: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/5.jpg)
![Page 6: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/6.jpg)
Can’t we block them?
proxy
Tracker
Tracker
Ad Server
Legitimate site
![Page 7: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/7.jpg)
Available Solutions
AdBlock, DoNotTrack, EasyPrivacy: crowd-sourced “black lists” of tracker URLs
● not frequently updated
● unclear who blacklists URLs, or based on what criteria
● miss “hidden” trackers or dual-role nodes
● blocking requires manual matching against the list
● can you buy your way into the whitelist?
![Page 8: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/8.jpg)
![Page 9: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/9.jpg)
Towards Automatic Tracker Detection
Exploit fundamental properties of web tracker operation to automate tracker detection
● Structural attributes: network positions, connections
● Operational aspects: data exchanged, communication patterns
![Page 10: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/10.jpg)
Dataset
6 months (Nov 2014 - April 2015) of augmented Apache logs from a web proxy
● 80m requests
● 2m distinct URLs
● 3k users
● User identification
● URL requested
● Headers
● Performance information, i.e. latency, bytes
● Tagged as Trackers or non-Trackers with EasyPrivacy
![Page 11: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/11.jpg)
Web Tracking as a Graph Problem
Referer-Hosts Graph
U: Referers (URLs visited by the user)
V: hosts (embedded URLs)
Example nodes: facebook.com, youtube.com, google-analytics.com, b.scorecardresearch.com
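A minimal sketch of how such a referer-hosts graph could be assembled from proxy log records. The input shape (pairs of referer URL and requested URL), the reduction of URLs to hostnames, and the function names are assumptions made for illustration; the slides do not describe the authors' implementation.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def hostname(url):
    """Return the lower-cased host part of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def build_referer_hosts_graph(requests):
    """Map each referer (U side) to the set of hosts (V side) it embeds.

    `requests` is an iterable of (referer_url, requested_url) pairs.
    Requests where referer and host coincide are skipped (an assumption).
    """
    graph = defaultdict(set)
    for referer_url, requested_url in requests:
        referer, host = hostname(referer_url), hostname(requested_url)
        if referer and host and referer != host:
            graph[referer].add(host)
    return dict(graph)


# Hypothetical log records; the paths are invented for the example.
sample_requests = [
    ("http://facebook.com/home", "http://google-analytics.com/ga.js"),
    ("http://facebook.com/home", "http://b.scorecardresearch.com/beacon.js"),
    ("http://youtube.com/watch", "http://google-analytics.com/ga.js"),
    ("http://youtube.com/watch", "http://youtube.com/player.js"),
]
referer_hosts = build_referer_hosts_graph(sample_requests)
```

Storing the graph as a dictionary from referer to host set keeps the two sides of the bipartite graph apart even when the same name appears as both a referer and a host.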
![Page 12: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/12.jpg)
Referer-Hosts Graph: Connected Components
94% of all trackers belong to the same connected component!
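The connected-component measurement can be sketched as follows, assuming the graph is given as a dictionary from referer to embedded hosts and that tracker hosts are known as a set (for example from an EasyPrivacy match). Function names and the toy data are hypothetical, not taken from the talk.

```python
from collections import defaultdict


def connected_components(referer_hosts):
    """Return the connected components of the bipartite graph as sets of nodes.

    Nodes are tagged ("r", name) or ("h", name) so that a name appearing on
    both sides stays two distinct vertices.
    """
    adjacency = defaultdict(set)
    for referer, hosts in referer_hosts.items():
        for host in hosts:
            adjacency[("r", referer)].add(("h", host))
            adjacency[("h", host)].add(("r", referer))
    adjacency = dict(adjacency)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def tracker_share_in_largest(referer_hosts, trackers):
    """Fraction of tracker hosts that fall in the largest connected component."""
    components = connected_components(referer_hosts)
    all_hosts = {h for hosts in referer_hosts.values() for h in hosts}
    tracker_hosts = all_hosts & set(trackers)
    if not components or not tracker_hosts:
        return 0.0
    largest = max(components, key=len)
    in_largest = {name for side, name in largest if side == "h"} & tracker_hosts
    return len(in_largest) / len(tracker_hosts)


# Toy data: two of the three trackers share the largest component.
toy_graph = {
    "a.com": {"t1.com", "x.com"},
    "b.com": {"t1.com", "t2.com"},
    "c.com": {"t3.com"},
}
share = tracker_share_in_largest(toy_graph, {"t1.com", "t2.com", "t3.com"})
```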
![Page 13: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/13.jpg)
Communities in Graphs
Communities: densely connected internally, sparsely connected with each other
Vertices in the same community are likely to be similar with respect to network position and connectivity
Do trackers form communities?
![Page 14: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/14.jpg)
The Hosts-Projection Graph
[Figure: a referer-hosts graph with referers r1-r7 and hosts h1-h8, shown next to its hosts-projection graph, in which edges between hosts are annotated with referers (r1, r2 r3, r3, r4, r5 r6, r7). Hosts are marked NT, T or ?. Legend: referer, non-tracker host (NT), tracker host (T), unlabeled host (?).]
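One way to read the projection step: two hosts become neighbors when at least one referer embeds both of them. The sketch below follows that reading, which is an assumption drawn from the figure; the function name and the toy input are hypothetical.

```python
from itertools import combinations


def project_onto_hosts(referer_hosts):
    """Build the hosts-projection graph as {host: set of neighboring hosts}.

    Two hosts are linked when they share at least one referer. Hosts that
    never co-occur with another host are kept as isolated vertices.
    """
    projection = {}
    for hosts in referer_hosts.values():
        for host in hosts:
            projection.setdefault(host, set())
        for first, second in combinations(sorted(hosts), 2):
            projection[first].add(second)
            projection[second].add(first)
    return projection


# Toy input: r1 embeds h1 and h2, r2 embeds h2 and h3, r3 embeds only h4.
toy_referer_hosts = {"r1": {"h1", "h2"}, "r2": {"h2", "h3"}, "r3": {"h4"}}
hosts_projection = project_onto_hosts(toy_referer_hosts)
```

The projection drops the referer vertices entirely, so later steps such as neighborhood analysis and community detection operate on hosts only.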
![Page 15: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/15.jpg)
Hosts-Projection Graph: Degrees
Number of unique referers that tracker hosts / other hosts are embedded within
![Page 16: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/16.jpg)
Hosts-Projection Graph: Tracker Neighbors
Trackers are mainly connected to other Trackers
![Page 17: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/17.jpg)
Web Tracker Communities
Popular trackers, e.g. google-analytics
Smaller trackers
Ad servers
Normal webpages
![Page 18: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/18.jpg)
Data Pipeline
1: logs pre-processing (raw logs to cleaned logs)
2: bipartite graph creation
3: largest connected component extraction
4: hosts-projection graph creation
5: community detection
6: results
google-analytics.com: T
bscored-research.com: T
facebook.com: NT
github.com: NT
cdn.cxense.com: NT
...
![Page 19: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/19.jpg)
Classification via Neighborhood Analysis
[Figure: hosts-projection graph with hosts h1-h8; host h5 is shown with its neighbors h7, h8, h3, h4 and h6. Legend: non-tracker host, tracker host, unlabeled host.]
⅖ non-tracker neighbors, ⅗ tracker neighbors
if % of tracker neighbors > threshold => classify as tracker
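The threshold rule can be sketched as below. The threshold value of 0.5, the choice to ignore unlabeled neighbors, and the labels assigned to individual neighbors in the toy data are assumptions for illustration; the slide states only the rule and the ⅖ / ⅗ split.

```python
def classify_by_neighbors(projection, labels, host, threshold=0.5):
    """Classify `host` from the labels of its neighbors in the projection graph.

    Returns "T" if the share of tracker neighbors exceeds `threshold`,
    "NT" otherwise, and None if the host has no labeled neighbors.
    Unlabeled neighbors are ignored (an assumption).
    """
    neighbor_labels = [labels[n] for n in projection.get(host, ()) if n in labels]
    if not neighbor_labels:
        return None
    tracker_share = neighbor_labels.count("T") / len(neighbor_labels)
    return "T" if tracker_share > threshold else "NT"


# Toy neighborhood: an unlabeled host with three tracker and two
# non-tracker neighbors. Which neighbor carries which label is invented.
toy_projection = {"h5": {"h3", "h4", "h6", "h7", "h8"}}
toy_labels = {"h3": "T", "h4": "T", "h6": "T", "h7": "NT", "h8": "NT"}
verdict = classify_by_neighbors(toy_projection, toy_labels, "h5")
```

With three of five labeled neighbors being trackers, the share is 0.6, so the toy host is classified as a tracker at the assumed threshold of 0.5 but not at a stricter one such as 0.7.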
![Page 20: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/20.jpg)
Results
![Page 21: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/21.jpg)
Classification via Label Propagation
Legend: non-tracker, tracker, unlabeled
Iterative Algorithm for Community Detection
● Vertices propagate their labels to their neighbors and adopt the most popular label in their neighborhood.
● Upon convergence, vertices with the same label belong to the same community.
● If an unlabeled node ends up in a tracker community, it is classified as a tracker.
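The three steps above can be sketched as a synchronous label propagation followed by a vote inside each community. Several details are assumptions not stated on the slide: ties go to the largest label, a community counts as a tracker community when most of its labeled members are trackers, and iteration stops after a fixed cap because synchronous updates can oscillate. The host names are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(graph, max_iterations=100):
    """Synchronous label propagation; returns {vertex: community label}.

    Every vertex starts in its own community and repeatedly adopts the most
    popular label among its neighbors. Ties go to the largest label.
    """
    labels = {vertex: vertex for vertex in graph}
    for _ in range(max_iterations):
        updated = {}
        for vertex, neighbors in graph.items():
            counts = Counter(labels[n] for n in neighbors)
            if not counts:
                updated[vertex] = labels[vertex]  # isolated vertex keeps its label
                continue
            best = max(counts.values())
            updated[vertex] = max(l for l, c in counts.items() if c == best)
        if updated == labels:  # converged: no label changed
            break
        labels = updated
    return labels


def classify_unlabeled(community_of, known):
    """Give each unlabeled vertex the majority label of its community."""
    votes = defaultdict(Counter)
    for vertex, label in known.items():
        if vertex in community_of:
            votes[community_of[vertex]][label] += 1
    result = {}
    for vertex, community in community_of.items():
        if vertex in known or not votes[community]:
            continue
        tally = votes[community]
        result[vertex] = "T" if tally["T"] > tally["NT"] else "NT"
    return result


# Two triangles joined by the edge a3-b1; a3 and b1 are unlabeled.
toy_graph = {
    "a1": {"a2", "a3"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2", "b1"},
    "b1": {"a3", "b2", "b3"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"},
}
known_labels = {"a1": "T", "a2": "T", "b2": "NT", "b3": "NT"}
communities = label_propagation(toy_graph)
predictions = classify_unlabeled(communities, known_labels)
```

On this toy graph each triangle ends up as one community, so the unlabeled vertex in the tracker triangle is classified as a tracker and the other as a non-tracker.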
![Page 22: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/22.jpg)
Classification via Label Propagation
i=0
[Figure: example graph with eight vertices, each labeled with its own ID, 1 to 8.]
![Page 23: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/23.jpg)
Classification via Label Propagation
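i=1
[Figure: the example graph before and after iteration 1, with the labels each vertex receives from its neighbors.]
vertex 1: neighbor labels {2}, new label 2
vertex 2: neighbor labels {1, 3}, new label 3
vertex 3: neighbor labels {2, 4, 5}, new label 5
vertex 4: neighbor labels {3, 5, 6}, new label 6
vertex 5: neighbor labels {3, 4, 6, 7}, new label 7
vertex 6: neighbor labels {4, 5}, new label 6
vertex 7: neighbor labels {5, 8}, new label 8
vertex 8: neighbor labels {7}, new label 8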
![Page 24: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/24.jpg)
Classification via Label Propagation
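i=2
[Figure: the example graph before and after iteration 2, with the labels each vertex receives from its neighbors.]
vertex 1: neighbor labels {3}, new label 3
vertex 2: neighbor labels {2, 5}, new label 5
vertex 3: neighbor labels {3, 6, 7}, new label 7
vertex 4: neighbor labels {5, 6, 7}, new label 7
vertex 5: neighbor labels {5, 6, 6, 8}, new label 6
vertex 6: neighbor labels {6, 7}, new label 7
vertex 7: neighbor labels {7, 8}, new label 8
vertex 8: neighbor labels {8}, new label 8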
![Page 25: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/25.jpg)
Classification via Label Propagation
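i=3
[Figure: the example graph before and after iteration 3, with the labels each vertex receives from its neighbors.]
vertex 1: neighbor labels {5}, new label 5
vertex 2: neighbor labels {3, 7}, new label 7
vertex 3: neighbor labels {5, 6, 7}, new label 7
vertex 4: neighbor labels {6, 7, 7}, new label 7
vertex 5: neighbor labels {7, 7, 7, 8}, new label 7
vertex 6: neighbor labels {6, 7}, new label 7
vertex 7: neighbor labels {6, 8}, new label 8
vertex 8: neighbor labels {8}, new label 8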
![Page 26: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/26.jpg)
Classification via Label Propagation
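i=4
[Figure: the example graph before and after iteration 4, with the labels each vertex receives from its neighbors.]
vertex 1: neighbor labels {7}, new label 7
vertex 2: neighbor labels {5, 7}, new label 7
vertex 3: neighbor labels {7, 7, 7}, new label 7
vertex 4: neighbor labels {7, 7, 7}, new label 7
vertex 5: neighbor labels {7, 7, 7, 8}, new label 7
vertex 6: neighbor labels {7, 7}, new label 7
vertex 7: neighbor labels {7, 8}, new label 8
vertex 8: neighbor labels {8}, new label 8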
![Page 27: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/27.jpg)
Classification via Label Propagation
[Figure: the example graph shown twice with identical labels: six vertices carry label 7 and two vertices carry label 8.]
![Page 28: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/28.jpg)
Results
![Page 29: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/29.jpg)
Conclusions
● Web trackers are well-connected with each other
  ○ 94% of web trackers are in the same connected component
● Web trackers are mainly connected to other trackers
  ○ High clustering, tight communities
● 97% classification accuracy and < 2% FPR with simple methods
  ○ Can be used to build robust and fully automated privacy preservation systems
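For reference, accuracy and false positive rate (FPR) under their standard definitions can be computed as in the sketch below. The slides do not say how the reported figures were evaluated, so the function name, the label encoding ("T" / "NT") and the toy data are assumptions.

```python
def accuracy_and_fpr(predicted, actual, positive="T"):
    """Return (accuracy, false positive rate) for predicted vs. actual labels.

    accuracy = (TP + TN) / all hosts
    FPR      = FP / (FP + TN), the share of real non-trackers flagged as trackers
    """
    tp = fp = tn = fn = 0
    for host, truth in actual.items():
        guess = predicted[host]
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy evaluation with one missed tracker and one false alarm.
toy_actual = {"a": "T", "b": "T", "c": "NT", "d": "NT"}
toy_predicted = {"a": "T", "b": "NT", "c": "T", "d": "NT"}
toy_accuracy, toy_fpr = accuracy_and_fpr(toy_predicted, toy_actual)
```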
![Page 30: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/30.jpg)
Like a Pack of Wolves: Community Structure of Web Trackers
V. Kalavri, [email protected] (KTH Royal Institute of Technology)
J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)
Passive and Active Measurements Conference
31 March - 1 April 2016, Heraklion, Crete, Greece
![Page 31: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/31.jpg)
Extra Slides
![Page 32: Like a Pack of Wolves: Community Structure of Web Trackers](https://reader031.vdocument.in/reader031/viewer/2022030317/587138e91a28abf0568b64b9/html5/thumbnails/32.jpg)
Referer-Hosts Graph: Degrees
Number of unique referers that tracker hosts / other hosts are embedded within