dpsmart: a flexible group based recommendation framework for … · dpsmart: a flexible group based...

32
dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu, Pinchao Liu, Hailu Xu, Zhaohui Fu, Qingyang Wang† Florida International University Louisiana State University†

Upload: others

Post on 13-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems

Boyuan Guan, Liting Hu, Pinchao Liu, Hailu Xu,

Zhaohui Fu, Qingyang Wang†

Florida International University

Louisiana State University†

Page 2: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Introduction

Related Work

System Design

Evaluation

Conclusion & Questions?

1.

2.

3.

4.

5.

OUTLINE

1

Page 3: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

2

INTRODUCTION

i

Page 4: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

3INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 5: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

• Limited Discoverability

• 12,376 (24.9%) out of the total 49,737

• Lack of assistance to explore the system

• 700 clients out of 1,127 with a specific record when conducting search

• Drop off visits

• 978,814 visits (58.8%) out of total 1,664,813visits

MAIN PROBLEMS

4

General Problems for Digital Library Domain

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 6: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

What do the users WANT to know?

What do the users NEED to know?

Get to know the users &

their region information

Awareness about the

content

Awareness about the

system condition (usages,

statistic, user interactions)

Recommendation SystemData Mining !!

5INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 7: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Intelligent

Recommendation System

(dpsmart)

Limited user

information

Side Effect

(Noise)

Impact on the

system

performance

Challenges

6INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Multiple

Recommenders

available

Building User

vectors from web

log data only

Grouping

Users from

stereotyping

model

Optimizing

the algorithm

with multi-

process

programming

Allowing multiple

recommenders

work

collaboratively

Page 8: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

7INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

• Implementing a stereotype based recommendation systemin real world system

• Avoiding the intensive labor works and automating the process from the log data extraction to model training.

• Avoiding the need of personal identifiable information and reducing the noise recommendation.

• Improving the system performing by using multi-process programming.

THE TECHNICAL CONTRIBUTIONS

Page 9: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

RELATED WORKS

8

Page 10: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

55%

18%

16%

11%

RECOMMENDERS

Content Based Filtering (CBF)

Collaborative Filtering (CF)

Graph-based Recommendations

Other RecommendationConcepts (Stereotyping, Item-centric Recommendations, andHybrid Recommendations)

• 200 research articles

were published

• 62 methods proposed

9INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 11: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

GLOBAL RELEVANCE LOCATION BASED QUERY/TERM SUGGESTION

(QS / TS)

10

ADDITIONAL METHODS

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 12: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

SYSTEM DESIGN

11

Page 13: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

12INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 14: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Users ➔

Collections Items Search

Terms

ID Involved? Geo-

Location

User Type

{number} {number} {number} {True / False} {Number} {Nominal}

Web Server Log Database Log Metadata Engine

By linking the records to

the metadata content will

extend 20,000 dimensions

AUTOMATIC WEB SERVER LOG MINING

MODULE

13INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 15: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Web Server Log

Metadata Engine

Pre-processing

• Bots

• Suffixes (.js, .css, etc)

• Localhost

• Internal IPs

IP + Date Logs

(Unique Users)

Unique Users Matrix

Contain FI#

Dimension Expansion

• Subject key words

• Suffixes (.js, .css, etc)

• Search Condition

• # of visit (rating)

• # of items

• Location (Lat, Lng)

Database Log IP Decoding

Cosine Similarity for CF

Location based Global

Relevance

QS / TS

Database

User Behaviors

Matrix

14

SYSTEM IMPLEMENTATION – LOG DATA

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 16: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

User-group clustering module -

Section 3.2

User Vector

Sub-space

Clustering

Cluster 0 Cluster 1 Cluster 2 Cluster 3 Cluster 4

Internal Academic Library Active Passive

Use Dominate

Features

2

15INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 17: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Subspace Clustering

Bottom Up Search

Grid MethodsTop Down Search

Interactive Methods

Adaptive Grid Static GridPer Instance

Weighting

Per Cluster

Weighting

PROCLUS

ORCLUS

FINDIT

&-Clusters

COSA

PROCLUS

ORCLUS

FINDIT

&-Clusters

PROCLUS

ORCLUS

FINDIT

&-Clusters

16

Subspace clustering

Opensubspace

v3.31

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 18: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

USE DOMINATE FEATUERES TO MARK USER

GROUPS

17

Page 19: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

More available here:

http://dpanther.fiu.edu/dpanther/dpsm

art/usercluster/

18

SAMPLE USER RECOGNITION BY USING DOMINATE FEATURES

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 20: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

CBF CF Global

Relevance

QS / TS Location

Based

Internal Users

Academic

Users

Digital Library

Users

Active Users

Passive Users

19

RECOMMENDATION STRATEGY

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 21: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Customized recommenders module –

Section 3.44

Ste

reoty

pin

g

CBF

CF

Global Relevance

QS / TS

Location Based

Final

Recommendations

20INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 22: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

Very Slow!!!

1.Content based filter: calculate the item-similarity among one specific user to find a threshold to provide

CB recommendation

2.Collaborative filtering: Calculate user similarity for current user to find if there is a similar user group.

Calculation based on Cosine Similarity

21

CUSTOMIZED RECOMMENDERS

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 23: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

3. Global Relevance (GR): Frequency of subject key works appearance in high occurred words plus the

location information

4. Term Suggestion (TS) / Query Suggestion (QS): The similarity of the term to the metadata

is generated by using the Jaccard index as suggestion by:

22

CUSTOMIZED RECOMMENDERS

Very Slow!!!

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 24: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

EVALUATION

23

Page 25: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

24

• Whether the multiple-process programming algorithm improves the hosting Digital Repository Systems performance?

• When dpsmart integrated into the dPanther, what are the benefits ofdpSmart regarding the page views, bounce rate & drop-off rate?

• How are the dPanther usability like Avg. Time on Page, Avg. Pageloading, Avg. Direction Time, Avg. Server Reponses Time, and Avg. Page Overload Time improved by integrating dpSmart?

PURPOSE OF EVALUATION

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 26: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

25

SYSTEM PERFORMANCE EVALUATION

Windows Machine

4 Cores of CPU

32 GB Memory

Randomly Selected 4,000 records to run the

CBF with 1-process, 3-processes, 5-processes,

and 7-processes respectively

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 27: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

26

SYSTEM PERFORMANCE EVALUATION

Windows Machine

4 Cores of CPU

32 GB Memory

Randomly Selected 4,000 records to run the

CBF with 1-process, 3-processes, 5-processes,

and 7-processes respectively

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 28: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

27

SYSTEM IMPACT EVALUATION

Experiment Time Line:January, 2016

First module, customized

recommender, implement

2018:Model rebuilding

multiple times

January 2019: Entire framework

implement

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 29: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

28

SYSTEM IMPACT EVALUATION

Experiment Time Line:January, 2016

First module, customized

recommender, implement

2018:Model rebuilding

multiple times

January 2019: Entire framework

implement

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 30: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

29

SYSTEM IMPACT EVALUATION

Experiment Time Line:January, 2016

First module, customized

recommender, implement

2018:Model rebuilding

multiple times

January 2019: Entire framework

implement

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 31: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

1. Design a flexible framework that allow digital library can build their recommendation

system purely from log data and metadata.

2. Facilitate multiple popular recommenders and implement it into a real-world Digital

Repository System:dPanther (http://dpanther.fiu.edu).

3. Minimize the side effect (noisy recommendation) by applying customizable group-

based recommendation strategy

4. The experimental evaluation shows that by applying the multi-process programming,

the model building time can be significantly reduced.

5. The system usage statistics also indicate that during the evaluation time from January

2019 to February 2019, the Page Views have increased compared to 2018,

demonstrating the effectiveness of our proposed framework.

30

CONCLUSION

INTRODUCTION | RELATED WORK | SYSTEM DESIGN | EVALUATION | CONCLUSION

Page 32: dpSmart: a Flexible Group based Recommendation Framework for … · dpSmart: a Flexible Group based Recommendation Framework for Digital Repository Systems Boyuan Guan, Liting Hu,

?QUESTIONS