
Page 1: Proposing a Scientific Paper Retrieval and Recommender Framework

Proposing a Scientific Paper Retrieval and Recommender Framework

Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang

Wee Kim Wee School of Communication and Information
Nanyang Technological University, Singapore

Presentation for ICADL’16
December 7th 2016

Page 2: Proposing a Scientific Paper Retrieval and Recommender Framework

BACKGROUND

• Information Retrieval (IR) and Recommender Systems (RS) techniques have been used to find information objects for:
  Scholarly Communication Lifecycle tasks
  Literature Review (LR) search tasks

• Examples of such tasks include:
  Building a reading list of research papers
  Recommending similar papers based on seed papers
  Recommending papers based on query logs
  Serendipitous discovery of interesting papers
  Recommending publication venues for manuscripts
  Recommending papers based on citation context
  Recommending co-authors for papers
  And a few more

Page 3: Proposing a Scientific Paper Retrieval and Recommender Framework

BACKGROUND

Issues
• Proposed techniques and applications are piecemeal approaches
• A wide variety of algorithms and data fields has been used in prior studies

What was done?
• A prototype system, Rec4LRW, was built for recommending papers for three tasks:
  1. Building a reading list of research papers
  2. Finding similar papers based on a set of papers
  3. Shortlisting papers from the final reading list for inclusion in the manuscript
• Task recommendation techniques were conceptualized on top of an identified set of base features

Page 4: Proposing a Scientific Paper Retrieval and Recommender Framework

REC4LRW SYSTEM – TASK 1


Page 5: Proposing a Scientific Paper Retrieval and Recommender Framework

REC4LRW SYSTEM – TASK 2


Page 6: Proposing a Scientific Paper Retrieval and Recommender Framework

REC4LRW SYSTEM – TASK 3


Page 7: Proposing a Scientific Paper Retrieval and Recommender Framework

REC4LRW SYSTEM EVALUATION

• An offline evaluation experiment and a user evaluation study were conducted to evaluate the Rec4LRW system

• An ACM DL extract of 103,739 articles published between 1951 and 2011 was used as the corpus for the system

• Postgraduate research students, research staff and academic staff were recruited for the user evaluation study
  Main entry criterion: the participant should have authored at least one research paper

• Working from a list of 43 topics, participants evaluated the task recommendations and the overall Rec4LRW system
  Online questionnaires were provided at the end of each task

Page 8: Proposing a Scientific Paper Retrieval and Recommender Framework

SAMPLE QUESTIONNAIRE


Page 9: Proposing a Scientific Paper Retrieval and Recommender Framework

USER STUDY PARTICIPANTS

Demographic Variable                                      Number of Participants

Position
  Student                                                 62 (47%)
  Staff                                                   70 (53%)

Experience Level [Self-Reported]
  Beginner                                                15 (11.4%)
  Intermediate                                            61 (46.2%)
  Advanced                                                34 (25.8%)
  Expert                                                  22 (16.7%)

Discipline Category
  Engineering & Technology                                87 (65.9%)
  Social Sciences                                         42 (31.8%)
  Life Sciences & Medicine                                3 (2.3%)

Discipline
  Computer Science & Information Systems                  51 (38.6%)
  Library and Information Studies                         30 (22.7%)
  Electrical & Electronic Engineering                     30 (22.7%)
  Communication & Media Studies                           8 (6.1%)
  Mechanical, Aeronautical & Manufacturing Engineering    5 (3.8%)
  Biological Sciences                                     2 (1.5%)
  Statistics & Operational Research                       1 (0.8%)
  Education                                               1 (0.8%)
  Politics & International Studies                        1 (0.8%)
  Economics & Econometrics                                1 (0.8%)
  Civil & Structural Engineering                          1 (0.8%)
  Psychology                                              1 (0.8%)
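The percentages in the participant table follow directly from the counts. As a quick worked check, the short Python sketch below recomputes them for two of the demographic variables. The function and variable names are illustrative only and are not part of the original study materials; the total is simply the sum of the Position counts.

```python
# Counts copied from the participant table; names here are illustrative.
position_counts = {"Student": 62, "Staff": 70}
experience_counts = {
    "Beginner": 15,
    "Intermediate": 61,
    "Advanced": 34,
    "Expert": 22,
}


def percentages(counts):
    """Return each group's share of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Total number of participants implied by the Position rows.
total_participants = sum(position_counts.values())

print(total_participants)
print(percentages(position_counts))
print(percentages(experience_counts))
```

Because each share is rounded separately, the rounded values for a variable need not sum to exactly 100.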

Page 10: Proposing a Scientific Paper Retrieval and Recommender Framework

DATA ANALYSIS PROCEDURES

Quantitative Data
• Ascertain the agreement percentages of the evaluation measures
• Logistic regression, t-test and correlation tests

Qualitative Data
• Identify the top preferred and critical aspects of the tasks and the overall system
• Feedback responses were coded by a single coder using an inductive approach
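The slides do not define how an agreement percentage was computed. A minimal sketch, assuming responses on a 5-point Likert scale where 4 ("Agree") and 5 ("Strongly agree") count as agreement, could look like the following. The scale, the threshold, and the sample responses are all assumptions made for illustration, not data from the study.

```python
def agreement_percentage(responses, agree_from=4):
    """Percentage of Likert responses at or above `agree_from`.

    Assumes a numeric scale (for example 1-5) in which higher values
    mean stronger agreement; the threshold of 4 is an assumption.
    """
    if not responses:
        raise ValueError("no responses to summarise")
    agreeing = sum(1 for r in responses if r >= agree_from)
    return 100 * agreeing / len(responses)


# Made-up responses for one hypothetical questionnaire item.
sample_responses = [5, 4, 4, 3, 2, 5, 4, 1, 3, 4]
print(agreement_percentage(sample_responses))
```

Computing one such percentage per evaluation measure gives a simple summary that can be compared across tasks before running the regression, t-test and correlation analyses.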

Page 11: Proposing a Scientific Paper Retrieval and Recommender Framework

EMERGENT THEMES AND A FRAMEWORK

• Certain dominant themes were apparent from the qualitative feedback

• These themes were consolidated into a single framework - Scientific Paper Retrieval and Recommender Framework (SPRRF)

Why do we need a framework?

• Most RS and IR studies are single dimensional i.e. algorithmic

• Need to consider the overall context towards providing a meaningful experience

• Framework generation based on empirical data

• Guide the next round of evaluation of Rec4LRW system

Page 12: Proposing a Scientific Paper Retrieval and Recommender Framework

THEMES (1-2)

Theme 1: Distinct User Groups

• Users who want more control
  Participants required control features in the UI and gave preferences on the algorithms logic

  “..Maybe a side window with categories like high reach, survey etc could be put up and upon clicking it, more papers in that category could be loaded.”

• Users who tend to trust the system and its output
  Participants were largely satisfied with the overall system

  “The idea of providing this system is quite* good. Such a system if developed and prepared well, can help and speed up the process of literature survey by helping to find better papers…”

Theme 2: Information Cues

• Four cue labels used in the system: Recent, Popular, High Reach, Survey/Review

• Cues positively impacted participants’ perceptions of the system

  “I like the highlighted recommendations - for e.g. Popular, Recent etc. which greatly helps in distinguishing various references and catches the eye !”
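The slides name the four cue labels but do not say how a paper earns each one. The Python sketch below shows one way such labels could be attached to a recommended paper. Every rule, threshold, and field name in it (including the use of a citing-venue count as a stand-in for "reach") is a hypothetical placeholder and should not be read as the Rec4LRW implementation.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int
    citing_venues: int  # hypothetical proxy for "reach"
    is_survey: bool     # hypothetical flag for survey/review articles


def cue_labels(paper, current_year=2011, recent_within=5,
               popular_min=100, reach_min=20):
    """Return the cue labels that apply to a paper.

    All thresholds are illustrative assumptions, not values from the study.
    """
    labels = []
    if current_year - paper.year <= recent_within:
        labels.append("Recent")
    if paper.citations >= popular_min:
        labels.append("Popular")
    if paper.citing_venues >= reach_min:
        labels.append("High Reach")
    if paper.is_survey:
        labels.append("Survey/Review")
    return labels


example = Paper("A survey of paper recommenders", 2009, 150, 5, True)
print(cue_labels(example))
```

A paper can carry several labels at once, which matches the participant's description of the labels as highlights that help distinguish references in a list.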

Page 13: Proposing a Scientific Paper Retrieval and Recommender Framework

THEMES (3-4)

Theme 3: Forced Serendipity vs Natural Serendipity

• Prior studies have focused mainly on modelling serendipity

• ‘View Papers in the Parent Cluster’ feature helped participants in noticing papers which they had not read earlier

  “The view papers in the parent cluster function is very helpful to get a full picture of research field.”

  “The user can view many papers in the parent cluster in addition to the shortlisted papers. Thus the user need not spend much time on finding related papers.”

Theme 4: Learning Algorithms vs Fixed Algorithms

• Some participants in the study suggested heuristics to identify papers for tasks 1 and 2

• These users expect a list of appropriate algorithms to be presented in the system

  “..Take a high impact paper (based on citation and may be exact keyword matching), then go through its own references to understand more about the research conducted. This is because, a good work generally cites other prominent works in the field…”
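The heuristic in the participant's quote (pick a high-impact paper by citations and keyword match, then walk its references) can be sketched in a few lines of Python. This is an illustration of the participant's suggestion only, not an algorithm used in Rec4LRW; the corpus layout, the field names, and the toy data are assumptions.

```python
def seed_and_references(corpus, keyword):
    """Pick the most-cited paper whose title contains `keyword`,
    then return its references ordered by citation count.

    `corpus` maps a paper id to a dict with "title", "citations"
    and "references" (a list of paper ids). This layout is hypothetical.
    """
    matches = [pid for pid, paper in corpus.items()
               if keyword.lower() in paper["title"].lower()]
    if not matches:
        return None, []
    seed = max(matches, key=lambda pid: corpus[pid]["citations"])
    # Keep only references that exist in the corpus, most cited first.
    refs = [r for r in corpus[seed]["references"] if r in corpus]
    refs.sort(key=lambda pid: corpus[pid]["citations"], reverse=True)
    return seed, refs


# Toy corpus with made-up titles and citation counts.
toy_corpus = {
    "p1": {"title": "Collaborative filtering for paper recommendation",
           "citations": 250, "references": ["p2", "p3"]},
    "p2": {"title": "Content-based retrieval basics",
           "citations": 90, "references": []},
    "p3": {"title": "Citation analysis methods",
           "citations": 400, "references": []},
    "p4": {"title": "Recommendation in digital libraries",
           "citations": 40, "references": ["p2"]},
}

print(seed_and_references(toy_corpus, "recommendation"))
```

Sorting the references by citation count reflects the participant's reasoning that good work tends to cite other prominent works in the field.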


Page 16: Proposing a Scientific Paper Retrieval and Recommender Framework

THEMES (5-6)

Theme 5: Inclusion of Control Features in User-Interface

• Many participants felt handicapped by the absence of control features in the Rec4LRW system

• Expected control features were sort options, topical facets and advanced search features

  “Really good for the initial review. It would be nice to see additional filters to focus on a specific topic”

  “More recent papers shall be included, and it is better if the user can sort the recommended paper by sequence such as sort times, date, relevance...”

Theme 6: Inclusion of Bibliometric Data

• Participants explicitly stated the need for metrics such as impact factor and h-index in the UI

• The main challenge is the computing overhead for calculating the new metrics

  “Categorizing the papers based on popularity, journal impact factor, and etc”

  “…In case that an item in the recommendation list is a journal paper, can we also know its impact factor and which databases indexes it?”
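The control features participants asked for in Theme 5 (sort options and topical facets) amount to a filter followed by a sort over the recommendation list. A minimal Python sketch is given below; the field names, the available sort keys, and the sample records are hypothetical and do not describe the Rec4LRW data model.

```python
def apply_controls(recommendations, sort_by="relevance", topic=None):
    """Filter recommendations by an optional topical facet, then sort them.

    Each recommendation is a dict with "title", "score", "year",
    "citations" and "topics". These field names are assumptions.
    """
    sort_keys = {
        "relevance": lambda r: r["score"],
        "date": lambda r: r["year"],
        "citations": lambda r: r["citations"],
    }
    if sort_by not in sort_keys:
        raise ValueError(f"unknown sort option: {sort_by}")
    selected = [r for r in recommendations
                if topic is None or topic in r["topics"]]
    # Highest score, newest year, or most citations first.
    return sorted(selected, key=sort_keys[sort_by], reverse=True)


# Made-up recommendation list for illustration.
sample_recs = [
    {"title": "A", "score": 0.9, "year": 2004, "citations": 30, "topics": ["IR"]},
    {"title": "B", "score": 0.7, "year": 2010, "citations": 80, "topics": ["RS"]},
    {"title": "C", "score": 0.8, "year": 2008, "citations": 10, "topics": ["IR", "RS"]},
]

print([r["title"] for r in apply_controls(sample_recs, sort_by="date")])
print([r["title"] for r in apply_controls(sample_recs, topic="IR")])
```

Keeping the sort keys in one dictionary makes it straightforward to add bibliometric options, such as the impact factor requested under Theme 6, once those metrics are available.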

Page 17: Proposing a Scientific Paper Retrieval and Recommender Framework

THEMES (7-8)

Theme 7: Diversification of Corpus

• The evaluation of algorithms has been restricted to datasets from certain disciplines such as computer science in prior studies

• Future studies should include papers from “far-apart” disciplines for the evaluation

  “…Due to limitation of data sets (as only ACM papers) search result is not of decent quality.”

  “But in general the main drawback is that "the papers in the corpus/dataset are from an extract of papers from ACM DL". As I work at the intersection of information systems and business many relevant papers are not included in the list.”

Theme 8: Task Interconnectivity

• Participants appreciated the utility of the ‘seed basket’ and ‘reading list’ for managing papers across the three tasks

  “I like the idea of giving recommendations based on a seed group of articles, but there needs to be more facets to select from, there needs to be greater selection of seeding articles as well in terms of those facets.”

  “The whole idea seems good for me, especially making seed of 5+ for expanding the bunch.”

Page 18: Proposing a Scientific Paper Retrieval and Recommender Framework

THE FRAMEWORK

SPRRF Feature                          Skill-Reliant User    System-Reliant User

UI Customization
  Sort options
  Topical facets
  Advanced search options

Algorithmic Customization
  Setting the recommendations count
  Selecting the retrieval algorithm
  Submitting external papers

User Personalization
  Paper collections
  Favourites specification
  Paper anchors
  Relevance feedback
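For readers who want to work with the framework programmatically, its three feature groups can be captured as a simple mapping. The Python sketch below encodes only the group and feature names from the slide; the variable and function names are illustrative, and the per-user-group assignments are left out because they are not recoverable from this transcript.

```python
# Feature groups and features as listed on the framework slide.
SPRRF_FEATURES = {
    "UI Customization": [
        "Sort options",
        "Topical facets",
        "Advanced search options",
    ],
    "Algorithmic Customization": [
        "Setting the recommendations count",
        "Selecting the retrieval algorithm",
        "Submitting external papers",
    ],
    "User Personalization": [
        "Paper collections",
        "Favourites specification",
        "Paper anchors",
        "Relevance feedback",
    ],
}


def category_of(feature):
    """Return the SPRRF group a feature belongs to, or None if unknown."""
    for category, features in SPRRF_FEATURES.items():
        if feature in features:
            return category
    return None


print(category_of("Relevance feedback"))
print(sum(len(features) for features in SPRRF_FEATURES.values()))
```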

Page 19: Proposing a Scientific Paper Retrieval and Recommender Framework

FUTURE WORK

• SPRRF to be used in second round of Rec4LRW evaluation studies

• SPRRF components to be statistically validated through hypotheses

• Expand the scope of SPRRF to other information objects in the Scholarly Communication Lifecycle

Page 20: Proposing a Scientific Paper Retrieval and Recommender Framework

GET ACCESS TO REC4LRW…

Use the link http://goo.gl/XgynzY or scan the QR code below

Page 21: Proposing a Scientific Paper Retrieval and Recommender Framework

THANK YOU
