Machine Learning for Recommender Systems in the Job Market
hamburg.ai, May 2017
Fabian Abel
Challenge
Given a user, the goal is to recommend job postings…
1. that the user may be interested in and
2. for which the user is an appropriate candidate.
[Figure: a job recommender matches users with job postings from companies and recruiters. Example postings: "Scala Dev (m/w)", "Scala Engineer", "Scala Dev, Hamburg". Counts shown: 19M, 750k-1M.]
Goals / Triangle of contradiction
Example posting: Scala Dev, Hamburg
user:
• Relevant recos
• No spam
companies:
• Relevant candidates
• High reach
third corner (unlabeled in the transcript):
• Happy customers
• High revenue (e.g. many clicks on paid content)
Job recommendations
[Screenshots of job recommendations, including mobile and email]
Job Recommender REST Service
GET /rest/recommendations/jobs/user/42
//response:
{
  "total": 20,
  "collection": [
    {"item_id": 7263, "score": 0.87, "reason": [..],..},
    {"item_id": 6526, "score": 0.81, "reason": [..],..},
    ...
  ]
}
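A minimal sketch of how a client might consume such a response. The field names `total`, `collection`, `item_id` and `score` come from the slide; the helper name `top_items` and the `min_score` cut-off are assumptions for illustration. The `reason` field is elided on the slide, so it is left out of the sample payload.

```python
import json

# Sample payload mirroring the slide; only the two items shown there are included.
SAMPLE_RESPONSE = """
{
  "total": 20,
  "collection": [
    {"item_id": 7263, "score": 0.87},
    {"item_id": 6526, "score": 0.81}
  ]
}
"""

def top_items(raw_json, min_score=0.0):
    """Return (item_id, score) pairs with score >= min_score, best first."""
    payload = json.loads(raw_json)
    pairs = [(entry["item_id"], entry["score"]) for entry in payload["collection"]]
    kept = [p for p in pairs if p[1] >= min_score]
    return sorted(kept, key=lambda p: p[1], reverse=True)

# Hypothetical usage: keep only recommendations scored 0.85 or higher.
recommended = top_items(SAMPLE_RESPONSE, min_score=0.85)
```

Sorting on the client is defensive only; the slide suggests the service already returns items in descending score order.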
Infrastructure for recommenders
[Architecture diagram. Components shown: XING sources / XING services; MySQL and NoSQL stores; search indices; live updates; batch processing with batch updates; Recommender REST service; XING Products; Deployment Infrastructure.]
Key properties of a job posting
• Title
• Company
• Employment type and career level
• Full-text description
Key sources for understanding user demands
• Profile (skills & co.): Fabian Abel, Data Scientist; haves and interests such as web science, big data, hadoop
• Interactions (clicks, bookmarks, ratings, shown): e.g. data, web, social media, big data, kununu
• Social Network: explicit and implicit connections
• Interactions of similar users: e.g. hadoop, scala
Relevance Estimation
Feature groups:
• Content-based features
• Collaborative features
• Social features
• Usage behavior features
These feed the core RecSys engines (regression model).
Logistic Regression:
$$P(\text{relevant} \mid x) = \frac{1}{1 + e^{-(b_0 + \sum_{i=1}^{n} b_i x_i)}}$$
x: feature vector; b_i: impact of feature x_i
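The logistic regression formula can be sketched in a few lines. This is an illustration only: the function name `relevance_probability`, the three example features and their weights are made up, not taken from the talk.

```python
import math

def relevance_probability(features, weights, bias):
    """P(relevant | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(b * x for b, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example: three features x_i with made-up weights b_i and bias b0.
x = [1.0, 0.5, 0.0]
b = [2.0, -1.0, 0.3]
p = relevance_probability(x, b, bias=-1.0)  # z = -1 + 2 - 0.5 + 0 = 0.5
```

A positive weight b_i raises the predicted relevance as x_i grows, and a negative weight lowers it, which is what the slide calls the impact of feature x_i.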
Relevance Estimation + Additional Filters
Feature groups (content-based, collaborative, social, usage behavior) feed the core RecSys engines (regression model). Their output passes through Filtering & Diversification and yields ranked scores (0.92, 0.8, 0.76, …).
Filtering & Diversification:
• Location-based filtering
• Frequently Shown Filtering
• Monetary-based diversification
• Career Level filtering
• ...
4 core sub-recommender engines and 19 filters that together analyze and exploit around 200 features (relevance criteria)
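The two-stage design (score first, then filter) can be sketched as follows. The slide names the filter types but not their logic, so every filter body, field name and threshold below is a hypothetical stand-in; diversification is not covered.

```python
def recommend(candidates, filters, limit=3):
    """Drop candidates rejected by any filter, then rank the rest by score."""
    kept = [c for c in candidates if all(f(c) for f in filters)]
    return sorted(kept, key=lambda c: c["score"], reverse=True)[:limit]

# Hypothetical filters: each returns a predicate that is True for items to keep.
def location_filter(max_km):
    return lambda c: c["distance_km"] <= max_km

def frequently_shown_filter(max_impressions):
    return lambda c: c["times_shown"] < max_impressions

def career_level_filter(user_level, tolerance=1):
    return lambda c: abs(c["career_level"] - user_level) <= tolerance

# Made-up candidates as they might leave the core engines.
candidates = [
    {"item_id": 1, "score": 0.92, "distance_km": 12, "times_shown": 0, "career_level": 3},
    {"item_id": 2, "score": 0.95, "distance_km": 480, "times_shown": 1, "career_level": 3},
    {"item_id": 3, "score": 0.80, "distance_km": 5, "times_shown": 9, "career_level": 3},
    {"item_id": 4, "score": 0.76, "distance_km": 30, "times_shown": 2, "career_level": 4},
]
filters = [location_filter(100), frequently_shown_filter(5), career_level_filter(3)]
result = [c["item_id"] for c in recommend(candidates, filters)]
```

In this sketch the highest-scored item (id 2) is dropped because it is too far away, which shows why filters can override the relevance score.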
Collaborative filtering
Theory: User-based and Item-based CF
User-Item-Rating Matrix
         Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna     3        -       4        -        2
Julia    2        -       5        4        1
Tim      4        3       -        5        1
John     -        4       5        4        -
User-based CF:
• Compare users based on their ratings (e.g. cosine sim.)
• Use the n most similar users to predict a rating on an item
Item-based CF:
• Compare items based on their ratings (e.g. cosine sim.)
• Use the n most similar items to predict a rating from a user (simple weighted average)
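A minimal sketch of user-based CF on the matrix from the slide. Two choices here are assumptions, not something the talk specifies: cosine similarity is computed only over items both users rated, and the prediction is a similarity-weighted average over the n = 2 most similar users.

```python
import math

# Ratings from the slide; missing ratings ("-") are simply absent.
RATINGS = {
    "Anna":  {"Java D.": 3, "Data En": 4, "BI Dev": 2},
    "Julia": {"Java D.": 2, "Data En": 5, "Data Sc": 4, "BI Dev": 1},
    "Tim":   {"Java D.": 4, "SAP Co": 3, "Data Sc": 5, "BI Dev": 1},
    "John":  {"SAP Co": 4, "Data En": 5, "Data Sc": 4},
}

def cosine_sim(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_user_based(user, item, n=2):
    """Similarity-weighted average of the n most similar users' ratings."""
    neighbours = [
        (cosine_sim(RATINGS[user], r), r[item])
        for other, r in RATINGS.items()
        if other != user and item in r
    ]
    top = sorted(neighbours, key=lambda t: t[0], reverse=True)[:n]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbours
    return sum(sim * rating for sim, rating in top) / total

# Predict Anna's missing rating for the "Data Sc" posting.
prediction = predict_user_based("Anna", "Data Sc")
```

Note that Anna and John share only one rated item, so their similarity is exactly 1.0. Real systems usually damp similarities that rest on so little overlap.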
Collaborative filtering
Reality: Ultra sparse User-Item Matrix and primarily implicit feedback
         Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna     -        -       1        -        -
Julia    -        -       -        -        -
Tim      -        -       -        -        -
John     1        -       -        -        -
High level of sparsity: classical collaborative filtering (or matrix factorization) does not work
Pseudo CF:
Cluster users based on...
• jobrole
• skills
• field of study
Recommend items that similar users (= clusters) interacted with
User-Item Matrix with cluster rows:
          Java D.  SAP Co  Data En  Data Sc  BI Dev
Anna      -        -       1        -        -
Data Sci  -        -       32       18       -
Tim       524      3       1        -        -
John      -        -       2        4        -
Cluster labels shown on the slide: Data Scientists, Skilled in Java, BI Dev
New item problem remains...
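The pseudo CF idea can be sketched as counting interactions per cluster instead of per user. The cluster assignment by job role, the user names and the interaction data below are hypothetical; the talk does not say how clusters are built or scored.

```python
from collections import Counter, defaultdict

def cluster_recommendations(user_clusters, interactions, user, limit=2):
    """Recommend the items most interacted with by the user's cluster."""
    cluster_counts = defaultdict(Counter)
    for other, items in interactions.items():
        for item in items:
            cluster_counts[user_clusters[other]][item] += 1
    seen = set(interactions.get(user, []))
    counts = cluster_counts[user_clusters[user]]
    # most_common() orders items by how often the cluster interacted with them.
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:limit]

# Hypothetical data: users clustered by job role.
user_clusters = {"anna": "data scientist", "julia": "data scientist",
                 "tim": "java developer", "john": "data scientist"}
interactions = {
    "anna": ["Data En"],
    "julia": ["Data En", "Data Sc"],
    "john": ["Data Sc", "Data En"],
    "tim": ["Java D."],
}
recos = cluster_recommendations(user_clusters, interactions, "anna")
```

A posting that nobody has interacted with yet never appears in any cluster's counts, which is the new item problem the slide mentions.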
Content-based filtering
Example: semantic search
Raw profile: Fabian Abel, Data Mining Expert; Haves / Interests: ML, j2ee, Hadoop; Education: Computer Sci.
Ontology-based query:
• [jobrole] Data Scientist (Synonyms: Data Mining Expert, Data Mining Specialist, …)
• [skills] JEE (Synonyms: J2EE, Java Enterprise, …)
• [skills] Hadoop (Synonyms: Apache Hadoop, …)
• [skills] Machine Learning (Synonyms: Maschinelles Lernen, …)
• [field of studies] Computer Science (Synonyms: Informatik, Comp. Sci., CS, …)
Entity IDs shown on the slide: 6940, 263, 370, 162, 473
The query is ranked with TFxIDF.
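A minimal sketch of the normalization step, mapping raw profile terms to ontology entities through synonyms. The dictionary layout and function names are assumptions. The synonyms come from the slide, except "ml" and "computer sci.", which are added here as assumed entries so that the raw profile from the slide resolves.

```python
# Canonical entity -> (field, synonyms). "ml" and "computer sci." are assumed entries.
ONTOLOGY = {
    "Data Scientist": ("jobrole", ["data mining expert", "data mining specialist"]),
    "JEE": ("skills", ["j2ee", "java enterprise"]),
    "Hadoop": ("skills", ["apache hadoop"]),
    "Machine Learning": ("skills", ["ml", "maschinelles lernen"]),
    "Computer Science": ("field of studies",
                         ["informatik", "comp. sci.", "cs", "computer sci."]),
}

def build_lookup(ontology):
    """Map every lower-cased label or synonym to (canonical label, field)."""
    lookup = {}
    for canonical, (field, synonyms) in ontology.items():
        for term in [canonical] + synonyms:
            lookup[term.lower()] = (canonical, field)
    return lookup

def to_query(raw_terms, ontology=ONTOLOGY):
    """Group recognised profile terms by field; unknown terms are skipped."""
    lookup = build_lookup(ontology)
    query = {}
    for term in raw_terms:
        match = lookup.get(term.strip().lower())
        if match:
            canonical, field = match
            values = query.setdefault(field, [])
            if canonical not in values:
                values.append(canonical)
    return query

query = to_query(["Data Mining Expert", "ML", "j2ee", "Hadoop", "Computer Sci."])
```

Searching with canonical entities rather than raw strings lets a profile that says "Data Mining Expert" match a posting that says "Data Scientist".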
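Content-based filtering
Example: more-like-this component
Anna: bookmarked, rated and applied-to job postings B = (1, 2, 3)
q = trans(1, 2, 3)
Recommending similar items: the query q is ranked with TFxIDF and returns R = (7, 8, 9)
Topic model vector representations (LSI, Word2Vec): B’ for the items in B, R’ for the items in R
Re-rank by similarity of topic model vectors:
// rPrime, bPrime: topic model vectors of the candidates R and of Anna's items B
val reranked = rPrime.map { r =>
  val sims = bPrime.map(b => cosineSim(r, b)) // similarity to each of Anna's items
  r -> sims.sum / sims.size                   // mean similarity
}.sortBy(-_._2)
Re-ranked result: 8, 7, 9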
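The re-ranking step can be sketched in Python as follows. The item ids follow the slide (B = 1, 2, 3 and R = 7, 8, 9), but the 2-dimensional vectors are made up; real LSI or Word2Vec vectors would have far more dimensions.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(candidates, liked):
    """Sort candidate ids by mean cosine similarity to the liked items' vectors.

    Assumes `liked` is non-empty.
    """
    scored = []
    for item_id, vec in candidates.items():
        sims = [cosine_sim(vec, b) for b in liked.values()]
        scored.append((item_id, sum(sims) / len(sims)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical topic vectors for Anna's items (B') and the candidates (R').
liked = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.8, 0.2]}
candidates = {7: [0.6, 0.4], 8: [1.0, 0.1], 9: [0.1, 0.9]}
order = [item_id for item_id, _ in rerank(candidates, liked)]
```

With these made-up vectors, item 8 points in nearly the same direction as Anna's items and moves to the top, while item 9 is almost orthogonal to them and ends up last.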
Content-based filtering
Example: more-like-this component
CTR compared with plain TFxIDF: +3.2% and +3.1% for LSI-based and Word2Vec-based re-ranking, respectively
Challenges
Issues that we have to fight with…
Profiles vs. people’s wishes for their future
Profile describes a user’s past/current position(s), not future wishes
What John writes… and what he means…
Recruiter-John
• International Sales Manager → Call Center Agent (10 EUR per hour)
• Sales Manager → Sales Manager for B2B customers (80K EUR per year)
• Data Scientist skilled in Hadoop, Scala, Elasticsearch, … with PhD in … → Data Analyst (skilled in SAS or Excel)
What Paul says he is… and what he means…
Paul, the Candidate
• CEO → Network Engineer (currently unemployed)
• Data Scientist with 100+ skills → BI Engineer (skilled in old-school ETL)
• Sales Manager → Shopman (in a kiosk)
Understanding the meaning of things that recruiters write in job postings and users write in their profiles is not trivial…
People freak out if we recommend something wrong!
Try to eliminate freakommendations (outliers)
Outlier Filtering
Core RecSys engines feed Filtering & Diversification (Location Filter, Career level Filter, Outlier filter, ...), which yields ranked scores (0.92, 0.8, 0.76, …)
1. Estimate how a user would rate the item (training: 750k explicit ratings):
   predictRating(user, item)
     = predict(toFeatureVec(user, item))
     = r //rating between 1 and 5
2. Filter:
   if (r > threshold) keep
   else drop
[Plot: percentage of filtered user-job posting pairs by rating threshold, for good recos and bad recos]
Outlier Filtering
The “filter onion”: trade-off between killing bad recos and keeping good ones
Example: with a threshold of 2.5 we kill 86% of the bad and 18% of the good recos
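The trade-off can be explored by sweeping the threshold over predicted ratings. The sketch below uses made-up ratings and hypothetical helper names; it does not reproduce the 86% / 18% figures from the talk, which come from the real model and data.

```python
def filter_by_rating(predictions, threshold):
    """Keep (pair, rating) entries whose predicted rating exceeds the threshold."""
    return [(pair, r) for pair, r in predictions if r > threshold]

def share_filtered(ratings, threshold):
    """Fraction of ratings at or below the threshold, i.e. dropped by the filter."""
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r <= threshold) / len(ratings)

# Hypothetical predicted ratings (1 to 5) for recos labelled good or bad.
good = [4.5, 3.8, 2.2, 4.1, 3.0]
bad = [1.4, 2.1, 3.2, 1.9, 2.4]

# For each threshold: (share of bad recos killed, share of good recos killed).
tradeoff = {t: (share_filtered(bad, t), share_filtered(good, t))
            for t in (2.0, 2.5, 3.0)}
```

Raising the threshold removes more bad recommendations but also starts to remove good ones, so the threshold is chosen where the loss of good recos is still acceptable.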
Outlier Filtering
Example features (137 features in total)
• xgboost-based model
• Matching & weighting: jobrole, skills, discipline, industry, ...
• Distance: home location / job seeker location
• Transitions: job role → job role, field of study → job role
• ...
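A minimal sketch of what a feature vector for a user-job pair might look like, covering one matching feature, one overlap feature and one distance feature. Only the name `toFeatureVec` appears in the talk; the three features chosen, the dictionary fields and the approximate Hamburg and Berlin coordinates are assumptions. The xgboost model that consumes such vectors is not shown.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def jaccard(a, b):
    """Share of common elements between two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def to_feature_vec(user, job):
    """Three illustrative features: jobrole match, skill overlap, distance in km."""
    return [
        1.0 if user["jobrole"] == job["jobrole"] else 0.0,
        jaccard(user["skills"], job["skills"]),
        haversine_km(*user["location"], *job["location"]),
    ]

# Hypothetical user in Hamburg and job posting in Berlin (approximate coordinates).
user = {"jobrole": "Data Scientist", "skills": ["hadoop", "scala"],
        "location": (53.55, 9.99)}
job = {"jobrole": "Data Scientist", "skills": ["scala", "spark", "hadoop"],
       "location": (52.52, 13.40)}
vec = to_feature_vec(user, job)
```

Transition features such as job role → job role would need historical career data and are left out of this sketch.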
Outlier Filtering
Some A/B test results: user success
• users with recos: -10.9% (filtering vs. no filtering)
• users who clicked on recos: +7.4% (filtering vs. no filtering)
Fewer people get recommendations, but more users click!
Stricter filtering pays off!
ACM RecSys Challenge http://recsyschallenge.com
Task: push recommendations (new items, paid vs. non-paid, premium vs. basic users)
Started beginning of March (ca. 240 teams so far), ends in June
Offline & online evaluation
Still possible to sign up for the offline evaluation…
Thank you http://2017.recsyschallenge.com
@fabianabel