1

Predicting Download Directories for Web Resources

George Valkanas, Dimitrios Gunopulos

4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014

Dept. of Informatics & Telecommunications, University of Athens, Greece

2

Online User Activities

Activity               ABS Survey   StatCan Survey   Infoplease Survey
Emailing               91%          93%              92%
General Web Browsing   87%          > 70%            83%
Online Purchases       45%          > 50%            62%
Download Content       37%          ~30%             42%

4

Facilitating Downloads

Save Link In Folder

Problems:
• Predefined directories
• Blunt approach / no learning
• UI clutter
• Tedious user management

6

A principled solution

Associate the navigation through the hierarchy with a cost function

One possible cost function: Hierarchical Navigation Cost (HNC), i.e., the number of clicks needed to navigate from one directory to another

HNC(imgs/, docs/) = 2
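
The click count above can be illustrated with a minimal sketch. It assumes that HNC is the path length between two directories in the tree: one click per level up to the deepest common ancestor, then one click per level down. That reading is consistent with HNC(imgs/, docs/) = 2 for sibling directories, but the slides do not spell out the definition, and the function name `hnc` is hypothetical.

```python
from pathlib import PurePosixPath


def hnc(source: str, target: str) -> int:
    """Clicks to navigate from source to target in a directory tree.

    Assumption: one click per step up to the deepest common ancestor,
    plus one click per step down to the target.
    """
    s = PurePosixPath(source).parts
    t = PurePosixPath(target).parts
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    return (len(s) - common) + (len(t) - common)


# Sibling directories: one click up, one click down.
print(hnc("imgs/", "docs/"))  # 2
```

Under the same assumption, suggesting the target itself costs 0 clicks, and the cost grows with how deep the two directories sit below their common ancestor.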

9

Problem Definition

Given:
• The hierarchical structure
• A target directory T, where the resource will be saved

Goal: suggest a directory S that minimizes the cost function cf(S, T)

• But if I know T, why not suggest T directly? (0 cost)

In this setting, we don't know T until it's too late!

10

Casting to a classification framework
• Directories are potential class values
• T is the true target class
• S is the output of a classification process
• Web resource properties → classification features

Recommend the S that best matches T
• Use directories from past saves as candidate classes
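
As a rough illustration of this framing, the sketch below treats each past save as a labeled example and returns the directory of the closest one (nearest neighbor). The function name `suggest_directory`, the feature dictionaries, and `toy_distance` are hypothetical and are not taken from the plugin.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

F = TypeVar("F")  # whatever representation is used for a resource's features


def suggest_directory(
    query: F,
    history: Sequence[Tuple[F, str]],
    distance: Callable[[F, F], float],
) -> Optional[str]:
    """Return the directory of the past save closest to the query.

    history holds (features, directory) pairs from earlier downloads;
    with no history there is nothing to suggest, so None is returned.
    """
    if not history:
        return None
    _, directory = min(history, key=lambda past: distance(query, past[0]))
    return directory


def toy_distance(a: dict, b: dict) -> int:
    """Hypothetical distance: number of features on which a and b differ."""
    return sum(a[key] != b[key] for key in a)


past_saves = [
    ({"domain": "arxiv.org", "ext": "pdf"}, "papers/"),
    ({"domain": "flickr.com", "ext": "jpg"}, "imgs/"),
]
print(suggest_directory({"domain": "arxiv.org", "ext": "pdf"}, past_saves, toy_distance))
```

Because the candidate classes are exactly the directories seen in past saves, a directory that has never been used cannot be suggested by this kind of model.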

11

Features & Distances

Feature                                    Distance
Timestamp                                  Exponential decay
Domain (current / referrer)                Equality
Path, filename (current / referrer page)   Tokenize & Jaccard
Title                                      Tokenize & Jaccard
Filename                                   Tokenize & Jaccard
Extension                                  Covariance matrix
Keywords                                   Jaccard
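
Three of the distances in the table can be sketched as follows. The tokenization rule (split on non-alphanumeric characters), the `half_life` parameter, and the exact form of the decay are assumptions made for illustration; the slides only name the distance families. The covariance-based distance for extensions is not covered here.

```python
import math
import re


def tokenize(text: str) -> set:
    """Split on non-alphanumeric characters (assumed tokenization rule)."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the token sets of the two strings."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def equality_distance(a: str, b: str) -> float:
    """0 if the two values match exactly (e.g. same domain), else 1."""
    return 0.0 if a == b else 1.0


def time_decay_distance(t_query: float, t_past: float, half_life: float = 86400.0) -> float:
    """Distance that grows as similarity decays exponentially with age.

    half_life (in seconds) is a hypothetical parameter: a save that is
    one half-life old gets distance 0.5.
    """
    age = abs(t_query - t_past)
    return 1.0 - math.exp(-math.log(2) * age / half_life)


print(jaccard_distance("report_2014.pdf", "report_2013.pdf"))  # 0.5
```

All three functions return values in [0, 1], which makes it straightforward to combine them into a single weighted distance per past save.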

12

Experimental Setup

Implement the classifier as a Firefox (FF) plugin
• DiDoCtor approach
• JavaScript 1-NN classifier

6 participants
• 4-month minimum use period

Baseline
• Last-by-domain (LBD), the current browser approach
• Simulated, based on submitted results

Metrics
• Click distance: HNC, Breadcrumbs
• Classification accuracy
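
The baseline and the accuracy metric can be simulated over a download log along the following lines. This is a sketch under assumptions: LBD is read as "suggest the directory last used for the same domain", an unseen domain yields no suggestion (`None`), and running accuracy is the fraction of correct suggestions so far. The function names and the sample log are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def last_by_domain(log: Sequence[Tuple[str, str]]) -> List[Optional[str]]:
    """Replay a chronological log of (domain, directory) saves.

    For each save, the suggestion is the directory used the last time a
    resource from the same domain was saved (None for an unseen domain).
    """
    last_used = {}
    suggestions = []
    for domain, directory in log:
        suggestions.append(last_used.get(domain))
        last_used[domain] = directory
    return suggestions


def running_accuracy(suggestions: Sequence[Optional[str]], targets: Sequence[str]) -> List[float]:
    """Fraction of correct suggestions after each save in the log."""
    correct = 0
    accuracy = []
    for i, (suggested, target) in enumerate(zip(suggestions, targets), start=1):
        correct += suggested == target
        accuracy.append(correct / i)
    return accuracy


log = [("a.com", "docs"), ("a.com", "docs"), ("b.org", "imgs"), ("a.com", "papers")]
suggested = last_by_domain(log)
print(suggested)  # [None, 'docs', None, 'docs']
print(running_accuracy(suggested, [directory for _, directory in log]))
```

The same replay loop can score any other suggester by swapping the function that produces the suggestions, which keeps the comparison against the baseline on identical data.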

Preliminary Result Analysis

Take Home Messages
1. Users have different saving pattern behavior(s)
2. Users have high variability in their accesses to each directory

17

Click Distance - HNC

Take Home Message: significant reduction in the number of clicks to reach the target directory!

The click distance gain is even higher when considering a breadcrumbs UI!

18

Running Accuracy

Take Home Message: DiDoCtor is much more accurate in predicting the download directory

20

Basic Model Extensions

Feature reweighting: RELIEF_F

Suggesting k directories
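
One plausible way to suggest k directories is to rank past saves by distance and keep the first k distinct directories. The sketch below assumes that approach; the slides do not say how the k suggestions are selected, and `suggest_top_k` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")


def suggest_top_k(
    query: F,
    history: Sequence[Tuple[F, str]],
    distance: Callable[[F, F], float],
    k: int = 3,
) -> List[str]:
    """Return up to k distinct directories, nearest past saves first."""
    ranked = sorted(history, key=lambda past: distance(query, past[0]))
    suggestions: List[str] = []
    for _, directory in ranked:
        if directory not in suggestions:
            suggestions.append(directory)
            if len(suggestions) == k:
                break
    return suggestions


# Toy example: features are plain numbers, distance is the absolute difference.
history = [(1, "docs"), (2, "docs"), (5, "imgs"), (20, "music")]
print(suggest_top_k(0, history, lambda a, b: abs(a - b), k=2))  # ['docs', 'imgs']
```

With k = 1 this reduces to a single nearest-neighbor suggestion; larger k trades UI space for a higher chance that the target directory is among the options.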

21

Alternative classifiers

Take Home Messages
• Classifiers can help!
• DiDoCtor generally performs the best
• Accuracy is affected by user behavior!

22

Conclusions & Future work

• Approach for facilitating downloads
• Optimization problem & classification framework
• Experimentation with real users
• Basic model extensions

• Further exploit the temporal dimension
• More informative features (e.g., entities)
• Automatic generation of directories

23

Thank you!

Questions?

Acknowledgements
• To the evaluators of our plugin
• Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT