
Humanities Research Recommendations via Collaborative Topic Modeling

Nitya Mani and Andy Chen

Overview of Recommendation Algorithms

Recommendation Algorithms:

Research Recommendations:
• Datasets: CS/STEM research publications
• Content-Focused: keywords imply readership
• Filtering: extensive user feedback required
• Hybrid: imbalanced + large datasets needed

Content-Based:
• Item Keywords
• Probabilistic Topic Modeling
• Cluster Analysis
• Scientific Articles
• Music + Movies

Filtering:
• User Ratings
• Nearest neighbor
• Implicit + Explicit
• Current News
• Shopping + Social Networks

Hybrid:
• Collaborative
• Knowledge-Based
• Content-Based
• Weighted
• Mixed
• Netflix

Dataset

Humanities research publications
• Topic modeling less effective
• International Journal of Comparative Psychology

Small amounts of user feedback
• Author-driven interest
• CiteULike user libraries

Goal
• Adapt hybrid modeling algorithm
• Effective with little or no user feedback

Collaborative Filtering: Matrix Factorization

Setup
• $I$ users $U = \{u_1, \dots, u_I\}$ and $J$ items $V = \{v_1, \dots, v_J\}$
• User $i$ recommends item $j$: $r_{ij} = 1$ (else $0$)
• Fix hyperparameters: $\lambda_u, \lambda_v$

Collaborative Topic Regression

Model Overview:
1. Users have interests (implicit article recs)
2. Documents have topics (LDA), some of which explain readership

Initialize:
1. For each user $i = 1, \dots, I$: $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
2. For each item $j = 1, \dots, J$: $v_j \sim \mathcal{N}(\theta_j, \lambda_v^{-1} I_K)$
3. Assume $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$

Learning:
1. Model latent document vector with content
2. Find MAP estimate of U, V, R (coordinate ascent)
3. Minimize regularized least squares

Coordinate Ascent:
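A minimal sketch of coordinate ascent for this model, assuming the topic proportions $\theta_j$ are held fixed (for example, taken from a previously fitted LDA model) and that the confidence $c_{ij}$ takes one value `a` for recommended pairs and a smaller value `b` otherwise. The function names, the toy data, and the values of `a` and `b` are hypothetical and are not taken from the poster. Each update is the closed-form ridge-regression solution for one user or item vector with the others held fixed.

```python
import numpy as np

def ctr_objective(R, C, U, V, theta, lam_u, lam_v):
    """Confidence-weighted squared error plus the two Gaussian-prior penalties."""
    err = R - U @ V.T
    return (0.5 * np.sum(C * err ** 2)
            + 0.5 * lam_u * np.sum(U ** 2)
            + 0.5 * lam_v * np.sum((V - theta) ** 2))

def ctr_coordinate_ascent(R, theta, lam_u=0.01, lam_v=0.1, a=1.0, b=0.01,
                          n_iters=10, seed=0):
    """Alternate exact updates for user vectors U and item vectors V.

    R: (I, J) binary matrix, R[i, j] = 1 if user i recommends item j.
    theta: (J, K) topic proportions per document, held fixed (assumption).
    a, b: confidence c_ij for r_ij = 1 and r_ij = 0 (assumption).
    """
    I, J = R.shape
    K = theta.shape[1]
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((I, K))   # small random start for user vectors
    V = theta.copy()                        # start item vectors at their topic mix
    C = np.where(R > 0, a, b)               # confidence matrix c_ij
    for _ in range(n_iters):
        for i in range(I):
            # u_i = (V^T C_i V + lam_u I)^-1 V^T C_i r_i
            VC = V * C[i][:, None]
            U[i] = np.linalg.solve(VC.T @ V + lam_u * np.eye(K), VC.T @ R[i])
        for j in range(J):
            # v_j = (U^T C_j U + lam_v I)^-1 (U^T C_j r_j + lam_v theta_j)
            UC = U * C[:, j][:, None]
            V[j] = np.linalg.solve(UC.T @ U + lam_v * np.eye(K),
                                   UC.T @ R[:, j] + lam_v * theta[j])
    return U, V, C

# Toy data (hypothetical): 6 users, 8 articles, 3 topics.
rng = np.random.default_rng(1)
R = (rng.random((6, 8)) < 0.3).astype(float)
theta = rng.dirichlet(np.ones(3), size=8)
U0, V0, C = ctr_coordinate_ascent(R, theta, n_iters=0)   # initial values only
U, V, _ = ctr_coordinate_ascent(R, theta, n_iters=10)
obj_start = ctr_objective(R, C, U0, V0, theta, 0.01, 0.1)
obj_end = ctr_objective(R, C, U, V, theta, 0.01, 0.1)
print(obj_start, obj_end)
```

Because each block update exactly minimizes the objective in that block, the objective should not increase from one sweep to the next. Predicted ratings are then the entries of `U @ V.T`.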

Empirical Study: Simulating User Feedback

• Often no access to user feedback
• Simulate user-item interactions to improve recs
• Users: lists of original recommendations
• Updated using CTR and cross-validation
• International Journal of Comparative Psychology
• 4827 articles, 580 “users”, 20 recs/user

Empirical Study: Humanities Research

• Sparser datasets (fewer users, recommendations)
• Topic models less accurate/relevant
• Non-content-focused abstracts
• CiteULike: Users studying Eastern and European languages, History, Linguistics, Classics, Politics
• 1269 articles, 220 users, 715 user-item interactions

Making and Validating Recommendations

• Article Recommendation
  • Recommendation rating is expected value of $u_i^T v_j$
  • Provide at least 10 recommendations if user has provided at least 20 recommended articles
• Ranking Articles
  • Rank articles by the predicted recommendation $u_i^T v_j$
  • Chose prediction bar 0.75/0.9 (confidence to recommend)
• Recommendation Validation
  • Precision
    • Predicts the hidden original article (simulating user feedback)
    • Predicts relevant withheld recommendations
  • Recall
    • Recalls the original provided user-item interactions with high confidence (rating over 0.9)
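The ranking and validation steps above can be sketched as follows. The poster does not spell out its exact metric definitions, so this assumes precision is the share of recommended articles that are relevant, and recall is the share of a user's provided articles rated at or above the prediction bar. The function names and toy scores are hypothetical.

```python
def recommend(scores, bar=0.9, top_k=10):
    """Return up to top_k article ids whose predicted rating is at least `bar`, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [a for a in ranked if scores[a] >= bar][:top_k]

def precision(recommended, relevant):
    """Fraction of recommended articles that are in the relevant set."""
    if not recommended:
        return 0.0
    return sum(a in relevant for a in recommended) / len(recommended)

def recall(scores, provided, bar=0.9):
    """Fraction of the user's provided articles rated at or above `bar`."""
    if not provided:
        return 0.0
    return sum(scores.get(a, 0.0) >= bar for a in provided) / len(provided)

# Toy predicted ratings u_i^T v_j for one user (hypothetical values).
scores = {"a": 0.97, "b": 0.93, "c": 0.91, "d": 0.60, "e": 0.20}
recs = recommend(scores, bar=0.9, top_k=3)
prec = precision(recs, relevant={"a", "b"})
rec = recall(scores, provided={"a", "b", "d"}, bar=0.9)
print(recs, prec, rec)
```

Swapping `bar=0.9` for `bar=0.75` gives the second prediction bar mentioned above.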

Data Overview + Analysis

Simulating Implicit User Feedback
• Hyperparameter search: optimal precision + recall at $K = 100$, $\lambda_u = 0.01$, $\lambda_v = 0.1$, $c_{ij} = 1, 0.001$
• 97% accuracy in recommending the hidden original article with > 0.9 confidence and within top 10 recommendations
• 99.9% recall in rating provided recommendations within top 20 and confidence > 90%
• 95% precision in relevance of random sample of recommendations

CiteULike Humanities Researchers
• Hyperparameter search: $K = 40$, $\lambda_u = 0.01$, $\lambda_v = 100$, $c_{ij} = 1, 0.01$
• Evaluate accuracy on users with > 20 recommendations
• 92% accuracy for training user-article recommendations
• Predicted 64% of entire user recommendations (half hidden); extremely unlikely by chance under ~Bin(20, 1/715)
• Precision based on random sample: 89% (for both prediction bars)
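To give a sense of scale for the chance baseline, the sketch below evaluates a Binomial(20, 1/715) distribution. It assumes this means 20 independent guesses that each hit with probability 1/715, and it uses 13 of 20 as a rough stand-in for the 64% figure. Both readings are assumptions, and `binom_tail` is a hypothetical helper name.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

n, p = 20, 1 / 715
expected_hits = n * p                    # mean number of chance hits
p_at_least_one = binom_tail(n, p, 1)     # chance of even a single hit
p_at_least_13 = binom_tail(n, p, 13)     # 13 of 20 is roughly 64%
print(expected_hits, p_at_least_one, p_at_least_13)
```

Under these assumptions a random guesser expects about 0.03 hits and has under a 3% chance of getting even one.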

Applications + Current Work

• Application of LDA + CF can improve content-based recommendations for platforms without access to user feedback
• Hybrid models can effectively recommend on small datasets
• Articles with large proportion of out-of-vocab, non-English words
• Current project work:
  • Diversifying topics in article dataset
  • Running LDA on introduction rather than abstract
  • Applying HMM with LDA rather than using bag-of-words
  • Updating parameters based on article citations and authors

Sample Data (CiteULike)

Eastern Languages Users:

Probabilistic Topic Modeling: Latent Dirichlet Allocation

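Setup
• $M$ documents $W_1, \dots, W_M$ in corpus
• $K$ topics $\beta_1, \beta_2, \dots, \beta_K$; each a distribution over vocabulary $V$
• Fix hyperparameters $\alpha, \beta, \xi$

Initialize for each $W_i$:
1. Document length (number of words): $N_i \sim \mathrm{Poisson}(\xi)$
2. Topic distribution: $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ over $K$ topics

For each word $w_{ij} \in W_i$:
1. Choose a topic: $z_{ij} \sim \mathrm{Multinomial}(\theta_i)$
2. Choose the word: $w_{ij} \sim p(w_{ij} \mid \beta_{z_{ij}})$, conditioned on the topic

Maximize likelihood:
1. Given value of parameter $\alpha$
2. EM algorithm to learn $\beta_1, \dots, \beta_K$ and topic distributions $\theta_1, \dots, \theta_M$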
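The generative process above can be simulated directly. This is a minimal sketch of sampling a synthetic corpus, not of the EM fitting step. The function name `generate_corpus`, the symmetric Dirichlet prior `eta` used to draw the topic-word distributions, and all default values are assumptions for illustration.

```python
import numpy as np

def generate_corpus(M, K, V, alpha=0.5, eta=0.5, xi=50, seed=0):
    """Sample M documents over a vocabulary of size V from the LDA generative process."""
    rng = np.random.default_rng(seed)
    # Topic-word distributions beta_1..beta_K; drawing them from Dirichlet(eta)
    # is an assumption, since the poster treats them as parameters to learn.
    beta = rng.dirichlet(np.full(V, eta), size=K)       # shape (K, V)
    docs, thetas = [], []
    for _ in range(M):
        n_words = rng.poisson(xi)                       # N_i ~ Poisson(xi)
        theta = rng.dirichlet(np.full(K, alpha))        # theta_i ~ Dirichlet(alpha)
        z = rng.choice(K, size=n_words, p=theta)        # z_ij ~ Multinomial(theta_i)
        words = np.array([rng.choice(V, p=beta[k]) for k in z], dtype=int)
        docs.append(words)                              # w_ij ~ p(w | beta_{z_ij})
        thetas.append(theta)
    return docs, np.array(thetas), beta

docs, thetas, beta = generate_corpus(M=5, K=3, V=20, seed=1)
print([len(d) for d in docs])
```

Each document is returned as an array of word ids, which is the bag-of-words input a topic-model fitting routine would consume.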

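Initialize (matrix factorization):
1. For each user $i = 1, \dots, I$: $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
2. For each item $j = 1, \dots, J$: $v_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$

For all user-item pairs $(i, j)$:
1. Assign a rating $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$
2. Fix precision parameters $c_{ij}$ to reflect confidence

Optimize $U, V$:
1. Minimize regularized least squared error over all user-article pairs
2. Predict rating $u_i^T v_j$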
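Written out, the regularized least-squares problem in the optimization step takes the following form. This is the objective implied by the Gaussian assumptions above (the negative log posterior up to constants); the factor of 1/2 is a scaling convention assumed here rather than stated on the poster.

```latex
\min_{U, V} \;
\sum_{i=1}^{I} \sum_{j=1}^{J} \frac{c_{ij}}{2} \left( r_{ij} - u_i^T v_j \right)^2
+ \frac{\lambda_u}{2} \sum_{i=1}^{I} \lVert u_i \rVert^2
+ \frac{\lambda_v}{2} \sum_{j=1}^{J} \lVert v_j \rVert^2
```

A larger $c_{ij}$ makes the fit to $r_{ij}$ count for more, which is how confidence in an observed recommendation enters the optimization.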

Sample Data (User Simulation)

User 6wq9p6zn

Rating confidence percent of provided and withheld article recommendations

($K = 25$, $\lambda_u = \lambda_v = 0.01$, $c_{ij} = 1, 0.01$)

Titles of Provided Article Recommendations | Class | Rank
Play and Developmental Outcomes in Infant Siblings of Children with Autism | + | 1
Teaching to Play or Playing to Teach: An examination of play targets and generalization in two interventions for children with autism | + | 3
The Development of Strain Typical Defensive Patterns in the Play Fighting of Laboratory Rats | + | 6
A Novel Teacher Implemented Protocol to Assess Early Social Communication and Play Skills in Preschool Children with Autism | + | 7
Role of Peers in Cultural Innovation and Cultural Transmission: Evidence from the Play of Dolphin Calves | + | 9
A normative model of peer review: qualitative assessment of manuscript reviewers’ attitudes towards peer review | – | 10
Impacts of Ferry Terminals on Juvenile Salmon Movement along Puget Sound Shorelines | – | 14
Securing Resources in Collaborative Environments: A Peer-to-peer Approach | – | 15
Peer-mediated inference making intervention for students with autism spectrum disorders | – | 16
Towards Distributed Data Collection and Peer-to-Peer Data Sharing | – | 17

Titles of Withheld Article Recommendations | Class | Rank
Pretend Play of Young Children in North Tehran: A Descriptive Cultural Study of Children's Play and Maternal Values | + | 2
More than a Child’s Work: Framing Teacher Discourse about Play | + | 4
Integrated Drama Groups: Promoting Symbolic Play, Empathy, and Social Engagement With Peers in Children with Autism | + | 5
Comparing Object Play in Captive and Wild Dolphins | + | 19
Development of “Anchoring” in the Play Fighting of Rats: Evidence for an Adaptive Age-Reversal in the Juvenile Phase | + | 20
Normative model of peer review - Qualitative assessment | – | NP
Strategic defense and the global public good | – | NP
Gender-Typed Play Behavior in Early Childhood: Adopted Children with Lesbian, Gay, and Heterosexual Parents | + | 28/NP
Japan’s Defense White Paper as a Tool for Promoting Defense Transparency | – | NP
Normative model of peer review - Qualitative assessment | – | NP

Titles of New Article Recommendations | Class | Rank
The Development of Juvenile-Typical Patterns of Play Fighting in Juvenile Rats does not Depend on Peer-Peer Play Experience in the Peri-Weaning Period | + | 8
Sacred Playground: Adult Play and Transformation at Burning Man | + | 11
Altruism in Animal Play and Human Ritual | + | 12
How Studies of Wild and Captive Dolphins Contribute to our Understanding of Individual Differences and Personality | + | 13
The Behavioral Development of Two Beluga Calves During the First Year of Life | + | 18

LDA Topic Model

LDA Topic Model Visualization for $K = 25$ (CiteULike Humanities Research)

Sample Topics (IJCP)
• 'health risk methods factors'
• 'cultural american historical'
• 'expression genetic function'
• 'species patterns california populations habitat'
• 'brain activity neural cell'
• 'public policy economic state'

Accuracy on training and testing data with varied numbers of topics $K$