Predicting Download Directories for Web Resources
George Valkanas, Dimitrios Gunopulos
Dept. of Informatics & Telecommunications, University of Athens, Greece
4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014

Online User Activities

| Activity | ABS Survey | StatCan Survey | Infoplease Survey |
|---|---|---|---|
| Emailing | 91% | 93% | 92% |
| General Web Browsing | 87% | > 70% | 83% |
| Online Purchases | 45% | > 50% | 62% |
| Download Content | 37% | ~30% | 42% |
Facilitating Downloads
Save Link In Folder
Problems:
• Predefined directories
• Blunt approach / no learning
• UI clutter
• Tedious user management
A principled solution
Associate navigation through the directory hierarchy with a cost function
One possible cost function: Hierarchical Navigation Cost (HNC), i.e., the number of clicks needed to navigate from one directory to another
HNC(imgs/, docs/) = 2
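A minimal sketch of how such a click count could be computed. The slides define HNC only as "#clicks", so this assumes one click per step up to the deepest common ancestor and one click per step down; the function name `hnc` and the path-string representation are illustrative, not taken from the plugin.

```python
from pathlib import PurePosixPath


def hnc(src: str, dst: str) -> int:
    """Clicks needed to navigate from src to dst in a directory tree.

    Assumption: each move to a parent or to a child costs one click,
    and both paths are relative to the same root.
    """
    a = PurePosixPath(src).parts
    b = PurePosixPath(dst).parts
    # Length of the shared prefix, i.e. depth of the deepest common ancestor.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Steps up from src to the ancestor, plus steps down to dst.
    return (len(a) - common) + (len(b) - common)


print(hnc("imgs/", "docs/"))  # sibling directories: one click up, one down
```

Under these assumptions, two sibling directories are 2 clicks apart, which agrees with the example above if imgs/ and docs/ share a parent.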

Problem Definition
Given:
• The hierarchical structure
• A target directory T, where the resource will be saved
Goal:
• Suggest a directory S that minimizes the cost function cf(S, T)
• But if I know T, why not suggest T directly? (0 cost)
In this setting, we don’t know T until it’s too late!

Casting to a classification framework
• Directories are potential class values
• T is the true target class
• S is the output of a classification process
• Web resource properties → classification features
Recommend S that best matches T
• Use directories from past saves as candidate classes

Features & Distances

| Feature | Distance |
|---|---|
| Timestamp | Exponential decay |
| Domain (current / referrer) | Equality |
| Path, filename (current / referrer page) | Tokenize & Jaccard |
| Title | Tokenize & Jaccard |
| Filename | Tokenize & Jaccard |
| Extension | Covariance Matrix |
| Keywords | Jaccard |
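A rough sketch of three of the distances listed above. The slides give only their names, so the concrete forms here are assumptions: the tokenizer splits on non-alphanumeric characters, the timestamp distance is one minus an exponential decay with a hypothetical one-week half-life, and the extension distance (covariance matrix) is left out because the slides do not specify it.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def equality_distance(a: str, b: str) -> float:
    """0 if the values match (e.g. same domain), 1 otherwise."""
    return 0.0 if a == b else 1.0


def time_distance(t1: float, t2: float, half_life: float = 7 * 24 * 3600.0) -> float:
    """Distance that grows with the time gap (in seconds) and saturates at 1.

    Assumption: similarity halves every `half_life` seconds.
    """
    decay = math.exp(-math.log(2) * abs(t1 - t2) / half_life)
    return 1.0 - decay


d = jaccard_distance(tokenize("Annual_Report-2014.pdf"), tokenize("annual report 2013.pdf"))
print(round(d, 2))
```

All three functions return values in [0, 1], which keeps them comparable if they are later combined into a single per-resource distance.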

Experimental Setup
Implement classifier as a Firefox (FF) plugin
• DiDoCtor approach
• JavaScript
• 1-NN classifier
6 participants
• 4-month minimum use period
Baseline
• Last-by-domain (LBD), the current browser approach
• Simulated, based on submitted results
Metrics
• Click distance: HNC, Breadcrumbs
• Classification accuracy
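To make the comparison concrete, here is a small sketch of a 1-NN predictor, a last-by-domain baseline, and a running accuracy measured over saves in chronological order. It is written in Python for brevity rather than as plugin code, and everything in it is illustrative: the `Save` record, the two-feature distance, and the toy data are assumptions, not the plugin's feature set or the study's data.

```python
import re
from dataclasses import dataclass


@dataclass
class Save:
    domain: str
    filename: str
    directory: str  # the true target T, known only after the save


def tokens(name: str) -> set[str]:
    return {t for t in re.split(r"[^a-z0-9]+", name.lower()) if t}


def distance(a: Save, b: Save) -> float:
    """Assumed distance: mean of domain equality and filename Jaccard."""
    dom = 0.0 if a.domain == b.domain else 1.0
    ta, tb = tokens(a.filename), tokens(b.filename)
    union = ta | tb
    jac = 1.0 - len(ta & tb) / len(union) if union else 0.0
    return (dom + jac) / 2


def predict_1nn(history: list[Save], query: Save):
    """Suggest the directory of the most similar past save (None if no history)."""
    if not history:
        return None
    return min(history, key=lambda past: distance(past, query)).directory


def predict_lbd(history: list[Save], query: Save):
    """Last-by-domain: directory of the most recent save from the same domain."""
    for past in reversed(history):
        if past.domain == query.domain:
            return past.directory
    return None


def running_accuracy(saves: list[Save], predict) -> float:
    """Predict each save from the saves before it, then add it to the history."""
    history, hits = [], 0
    for s in saves:
        if predict(history, s) == s.directory:
            hits += 1
        history.append(s)
    return hits / len(saves)


# Toy data for illustration only; not taken from the study.
toy_saves = [
    Save("example.org", "paper_draft.pdf", "docs/papers"),
    Save("example.org", "holiday_photo.jpg", "imgs"),
    Save("example.org", "paper_final.pdf", "docs/papers"),
    Save("example.org", "beach_photo.jpg", "imgs"),
]
print(running_accuracy(toy_saves, predict_1nn), running_accuracy(toy_saves, predict_lbd))
```

The toy sequence alternates directories within one domain, a pattern where a last-by-domain rule has nothing to go on but the filename tokens still separate the two directories.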
Preliminary Result Analysis
Take Home Messages
1. Users exhibit different saving patterns
2. Users show high variability in their accesses to each directory

Click Distance - HNC
Take Home Message: Significant reduction in the number of clicks to reach the target directory!
The click distance gain is even higher when considering a breadcrumbs UI!

Running Accuracy
Take Home Message: DiDoCtor is much more accurate in predicting the download directory

Basic Model Extensions
• Feature reweighting: RELIEF-F
• Suggesting k directories
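One way the k-directory extension could work, sketched under assumptions: rank past saves by distance to the new resource and return the first k distinct directories. The slides do not say how the k suggestions are chosen, so the ranking rule, the function name `suggest_top_k`, and the `(resource, directory)` history format are all hypothetical.

```python
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")  # whatever represents a web resource's features


def suggest_top_k(
    history: Sequence[tuple[R, str]],
    query: R,
    distance: Callable[[R, R], float],
    k: int = 3,
) -> list[str]:
    """Return up to k distinct directories, nearest past saves first."""
    if k <= 0:
        return []
    # Stable sort: ties keep their original (chronological) order.
    ranked = sorted(history, key=lambda item: distance(item[0], query))
    suggestions: list[str] = []
    for _, directory in ranked:
        if directory not in suggestions:
            suggestions.append(directory)
        if len(suggestions) == k:
            break
    return suggestions


# Toy usage with numbers standing in for resources (illustration only).
history = [(1.0, "a"), (2.0, "b"), (1.5, "a"), (9.0, "c")]
print(suggest_top_k(history, 1.2, lambda x, y: abs(x - y), k=2))
```

With k = 1 this reduces to the 1-NN suggestion; larger k trades a longer suggestion list for a better chance that the target directory appears in it.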

Alternative classifiers
Take Home Messages
• Classifiers can help!
• DiDoCtor generally performs the best
• Accuracy is affected by user behavior!

Conclusions & Future work
Conclusions
• Approach for facilitating downloads
• Optimization problem & classification framework
• Experimentation with real users
• Basic model extensions
Future work
• Further exploit the temporal dimension
• More informative features (e.g., entities)
• Automatic generation of directories
Thank you!
Questions?
Acknowledgements
• To the evaluators of our plugin
• Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT