

Siamese CNN for job–candidate matching

Thomas Belhalfaoui

JOBTEASER: THE STORY

2009: 1st version of the JobTeaser.com website: “Video company presentations”

2013: 1st Career Center by JobTeaser

2014: Mobile app launch

2015: 3M€ raised

2016: Development of 4 modules:
▪ Resources
▪ Users module
▪ Appointments
▪ Statistics

Expansion into 10 European countries

2017: New offer for startups; launch of the new mobile app

15M€ raised

2019: 50M€ raised

JOBTEASER ECOSYSTEM

JobTeaser is a unique ecosystem that builds a powerful bridge between companies and educational institutions.

Mission: Prepare the new generation to reach their full potential, embrace the future with optimism and make their mark in the world.

KEY FIGURES

70,000+ clients use JobTeaser as their recruitment solution.

600+ Career Centers in Europe, integrated into our partner schools and universities.

2.5 million students and graduates subscribed.

OUR TEAM

A fast-growing team of 300 people

150 people in international teams dedicated to university, operations and sales

A tech team of 90 people

DATA SCIENCE CHALLENGES

DATA ENRICHMENT: Extract relevant keywords from job ads / resumes. Enrich them with tags and categories.

PEOPLE RECOMMENDATION: Suggest relevant peers or alumni that the student may want to reach out to.

GUIDANCE TOOLS: Develop algorithms to enhance guidance tools.

GUIDANCE KNOWLEDGE GRAPH: Provide data science methods to help build a skills and occupations ontology.

CANDIDATE SOURCING: Find the most relevant students for a given job ad.

JOB RECOMMENDATION: Find the best job ad for each student.

JOB-CANDIDATE MATCHING

A common embedding space: embed job ads and resumes in a common space, so that (dis)similarity can simply be measured by the Euclidean distance.

[Diagram: a resume CNN and a job ad CNN map their documents into the same embedding space.]
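A minimal sketch of the matching rule, assuming the two CNNs have already produced the embeddings (the short vectors below are hypothetical toy values; the model described here outputs 80-dimensional vectors). The smaller the Euclidean distance between a resume and a job ad, the better the match.

```python
import math

# Hypothetical toy embeddings standing in for the outputs of the resume CNN
# and of the job ad CNN (the real embeddings are 80-dimensional).
resume_embedding = [0.2, 0.9, -0.1]
job_ad_embeddings = {
    "data_scientist": [0.25, 0.85, -0.05],
    "sales_intern": [-0.7, 0.1, 0.6],
}

def dissimilarity(u, v):
    """Euclidean distance in the shared space: lower means a closer match."""
    return math.dist(u, v)

# Rank the job ads for this resume, closest first.
ranking = sorted(
    job_ad_embeddings,
    key=lambda job: dissimilarity(resume_embedding, job_ad_embeddings[job]),
)
print(ranking)  # ['data_scientist', 'sales_intern']
```

Because both document types live in the same space, the same distance function serves every use case below; only the set of vectors being searched changes.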

JOB RECOMMENDATION

Precomputation: embed all resumes with the resume CNN and all job ads with the job ad CNN.

Prediction: student ID → approximate k-NN search among the job ad embeddings → k job ad IDs.
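The two steps can be sketched as follows. This is an illustration under stated assumptions, not the production system: `toy_embed` is a hypothetical keyword counter standing in for both trained CNNs, and the search is an exact brute-force k-NN used in place of the approximate k-NN on the slide.

```python
import heapq
import math

def precompute(embed, documents):
    """Precomputation: embed every document once, before any query arrives."""
    return {doc_id: embed(text) for doc_id, text in documents.items()}

def k_nearest(query_vector, index, k):
    """Exact brute-force k-NN (stand-in for an approximate k-NN index)."""
    return heapq.nsmallest(k, index, key=lambda doc_id: math.dist(query_vector, index[doc_id]))

def recommend_jobs(student_id, resume_index, job_ad_index, k):
    """Prediction: student ID -> k job ad IDs, closest first."""
    return k_nearest(resume_index[student_id], job_ad_index, k)

# Hypothetical stand-in for the resume and job ad CNNs: counts two keywords.
def toy_embed(text):
    words = text.lower().split()
    return [words.count("python"), words.count("sales")]

resumes = {"s1": "python python statistics", "s2": "sales negotiation"}
job_ads = {"j1": "python developer", "j2": "sales representative", "j3": "python data scientist python"}

resume_index = precompute(toy_embed, resumes)
job_ad_index = precompute(toy_embed, job_ads)
print(recommend_jobs("s1", resume_index, job_ad_index, k=2))  # ['j3', 'j1']
```

Since the job ad embeddings are computed ahead of time, a request costs one lookup plus one nearest neighbor search.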

CANDIDATE SOURCING

Precomputation: embed all resumes with the resume CNN and all job ads with the job ad CNN.

Prediction: job ad ID → approximate k-NN search among the resume embeddings → k student IDs.
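The slides do not say which approximate k-NN method is used. As one possible sketch, here is a toy inverted-file index (the idea behind FAISS's `IndexIVFFlat`): student embeddings are bucketed by their nearest k-means centroid, and a query only scans the `n_probe` closest buckets. `IVFIndex`, `kmeans` and all parameter values are hypothetical.

```python
import math
import random

def kmeans(points, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd's k-means, used here as a coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(n_iter):
        buckets = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if its bucket is empty
                centroids[c] = [sum(coords) / len(bucket) for coords in zip(*bucket)]
    return centroids

class IVFIndex:
    """Toy inverted-file index over student (resume) embeddings."""

    def __init__(self, vectors, n_lists, seed=0):
        self.vectors = vectors  # dict: student ID -> embedding
        self.centroids = kmeans(list(vectors.values()), n_lists, seed=seed)
        self.lists = [[] for _ in self.centroids]
        for student_id, v in vectors.items():
            self.lists[self._closest_lists(v, 1)[0]].append(student_id)

    def _closest_lists(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, n_probe=1):
        """Approximate k-NN: only scan the n_probe buckets closest to the query."""
        candidates = [sid for c in self._closest_lists(query, n_probe) for sid in self.lists[c]]
        candidates.sort(key=lambda sid: math.dist(query, self.vectors[sid]))
        return candidates[:k]

rng = random.Random(42)
students = {f"student_{i}": [rng.gauss(0, 1) for _ in range(8)] for i in range(200)}
index = IVFIndex(students, n_lists=8)
job_ad_embedding = [rng.gauss(0, 1) for _ in range(8)]  # hypothetical job ad CNN output
print(index.search(job_ad_embedding, k=5, n_probe=2))
```

With `n_probe` equal to the number of buckets the search degenerates into exact k-NN; smaller values trade recall for speed.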

SIMILAR JOB AD RECOMMENDATION

Precomputation: embed all job ads with the job ad CNN.

Prediction: job ad ID → approximate k-NN search among the job ad embeddings → k job ad IDs.
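For this use case only job ad embeddings are involved. A minimal sketch with hypothetical 2-D embeddings and an exact search: since the query job ad is itself in the index, it has to be excluded, otherwise it would always come back as its own nearest neighbor at distance 0.

```python
import heapq
import math

def similar_job_ads(job_ad_id, job_ad_index, k):
    """Return the k job ads closest to a given job ad, excluding the job ad itself."""
    query = job_ad_index[job_ad_id]
    others = (other for other in job_ad_index if other != job_ad_id)
    return heapq.nsmallest(k, others, key=lambda other: math.dist(query, job_ad_index[other]))

# Hypothetical precomputed job ad embeddings (2-D for readability).
job_ad_index = {
    "data_scientist": [0.9, 0.1],
    "ml_engineer": [0.8, 0.2],
    "data_analyst": [0.6, 0.3],
    "sales_intern": [-0.7, 0.9],
}
print(similar_job_ads("data_scientist", job_ad_index, k=2))  # ['ml_engineer', 'data_analyst']
```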

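EMBEDDING A DOCUMENT

▪ Input: domain-specific fastText word embeddings, 100 x L (one 100-dimensional vector per word, L words)
▪ 1D convolution, 100 x 4 x 246 (246 filters of width 4) → 246 x L
▪ ReLU + max over the L positions → 246
▪ Fully connected layer, 246 x 80 → document embedding of size 80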
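To make the shapes concrete, here is a from-scratch forward pass in plain Python. It is a sketch, not the trained model: the weights are random, the document length `L = 7` is arbitrary, and the right zero padding (which keeps the convolution output at length L, as on the slide) is an assumption, since the slide does not state the padding scheme.

```python
import random

def embed_document(word_vectors, conv_filters, conv_bias, fc_weights, fc_bias):
    """1D convolution -> ReLU -> max over positions -> fully connected layer.

    word_vectors: L word embeddings (100 values each on the slide).
    conv_filters: one filter per output channel (246), each a list of
                  `width` (4) rows of input-dimension weights.
    fc_weights:   246 x 80 matrix; fc_bias: 80 values.
    """
    width = len(conv_filters[0])
    dim = len(word_vectors[0])
    # Assumed padding: zeros on the right so the output also has L positions.
    padded = word_vectors + [[0.0] * dim] * (width - 1)
    pooled = []
    for filt, bias in zip(conv_filters, conv_bias):
        activations = []
        for start in range(len(word_vectors)):
            s = bias
            for offset in range(width):
                s += sum(w * x for w, x in zip(filt[offset], padded[start + offset]))
            activations.append(max(0.0, s))  # ReLU
        pooled.append(max(activations))      # max over the L positions
    # Fully connected layer: 246 pooled features -> 80-dimensional embedding.
    return [
        b + sum(pooled[i] * fc_weights[i][j] for i in range(len(pooled)))
        for j, b in enumerate(fc_bias)
    ]

rng = random.Random(0)
EMB, WIDTH, FILTERS, OUT = 100, 4, 246, 80  # sizes from the slide
L = 7                                        # hypothetical document length
words = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(L)]
filters = [[[rng.uniform(-0.1, 0.1) for _ in range(EMB)] for _ in range(WIDTH)] for _ in range(FILTERS)]
fc_w = [[rng.uniform(-0.1, 0.1) for _ in range(OUT)] for _ in range(FILTERS)]
doc_embedding = embed_document(words, filters, [0.0] * FILTERS, fc_w, [0.0] * OUT)
print(len(doc_embedding))  # 80
```

In practice this is a few lines in a deep learning framework (a `Conv1d`/`Conv1D` layer, a global max pooling and a dense layer); the loops only spell out what each shape on the slide means. Max pooling over positions is also what makes the output size independent of the document length L.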

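SIAMESE NETWORK

An anchor resume, a positive job ad and a negative job ad are embedded by the CNNs. $d_{ap}$ is the distance between the anchor and the positive, $d_{an}$ the distance between the anchor and the negative.

Triplet loss objective: $d_{ap} + m < d_{an}$, where $m$ is the margin.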

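TRIPLET LOSS

$\mathcal{L} = \max(0,\ d_{ap} - d_{an} + m)$

The loss is zero exactly when $d_{ap} + m \le d_{an}$, i.e. when the negative is farther from the anchor than the positive by at least the margin $m$.

Schroff et al., FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015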
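The loss translates directly into code. A minimal sketch with plain Euclidean distances, matching the distance used for matching in this deck (FaceNet itself uses squared Euclidean distances); the toy vectors are hypothetical.

```python
import math

def triplet_loss(anchor, positive, negative, margin):
    """L = max(0, d_ap - d_an + m), with Euclidean distances."""
    d_ap = math.dist(anchor, positive)  # anchor resume <-> positive job ad
    d_an = math.dist(anchor, negative)  # anchor resume <-> negative job ad
    return max(0.0, d_ap - d_an + margin)

anchor, positive = [0.0, 0.0], [1.0, 0.0]  # d_ap = 1
print(triplet_loss(anchor, positive, [3.0, 0.0], margin=0.5))   # 0.0: margin already respected
print(triplet_loss(anchor, positive, [1.25, 0.0], margin=0.5))  # 0.25: negative too close
```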

TRIPLET MINING

How to choose negatives? Keep only hard and semi-hard ones.

Source: https://omoindrot.github.io/triplet-loss
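Using the easy / semi-hard / hard terminology of the linked post (omoindrot.github.io), the selection rule can be sketched as a function of the two distances. Function names are hypothetical, and how ties exactly at the boundaries are treated is a choice made here.

```python
def triplet_type(d_ap, d_an, margin):
    """Classify a triplet from its anchor-positive and anchor-negative distances."""
    if d_an < d_ap:
        return "hard"       # negative closer to the anchor than the positive
    if d_an < d_ap + margin:
        return "semi-hard"  # negative farther than the positive, but inside the margin
    return "easy"           # d_an >= d_ap + margin: loss is 0, nothing to learn

def keep_for_training(d_ap, d_an, margin):
    """Keep only triplets with a non-zero loss, i.e. hard or semi-hard ones."""
    return triplet_type(d_ap, d_an, margin) in ("hard", "semi-hard")

for d_an in (0.8, 1.2, 2.0):
    print(d_an, triplet_type(1.0, d_an, margin=0.5))
# 0.8 hard
# 1.2 semi-hard
# 2.0 easy
```

Easy triplets are dropped because they contribute zero loss and therefore zero gradient; keeping them only dilutes the batch.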

ONLINE TRIPLET MINING

[Diagram: distance matrix between a batch of anchor resumes and their positive job ads, built from the anchor-positive pairs.]

Within a batch, the positive job ads of the other (random) pairs become negative job ads. Choose only those that form hard or semi-hard triplets.
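A sketch of the batch-level procedure, under the assumption that each batch holds n matching (resume, job ad) pairs and that every other pair's job ad serves as a negative for a given resume. The function name and the averaging over kept triplets are choices made for this illustration, not details given on the slide.

```python
import math

def batch_triplet_loss(anchor_resumes, positive_job_ads, margin):
    """Online triplet mining inside one batch of matching (resume, job ad) pairs.

    For resume i, job ad i is the positive and every job ad j != i is a
    candidate negative; only hard and semi-hard triplets are kept.
    Returns (mean loss over kept triplets, number of kept triplets).
    """
    n = len(anchor_resumes)
    # dist[i][j]: distance between resume i and job ad j
    dist = [[math.dist(resume, job) for job in positive_job_ads] for resume in anchor_resumes]
    losses = []
    for i in range(n):
        d_ap = dist[i][i]
        for j in range(n):
            if j != i and dist[i][j] < d_ap + margin:  # hard or semi-hard
                losses.append(d_ap - dist[i][j] + margin)
    return (sum(losses) / len(losses) if losses else 0.0), len(losses)

resumes = [[0.0, 0.0], [0.5, 0.0]]
job_ads = [[1.0, 0.0], [0.4, 0.0]]
loss, kept = batch_triplet_loss(resumes, job_ads, margin=0.5)
print(kept)  # 2
```

This treats every other job ad in the batch as a true negative, which may not hold if two students in the same batch match very similar job ads.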

RESULTS

Area under ROC curve on unseen job ads and resumes:

▪ Baseline 1, average fastText (cosine similarity between average domain-specific fastText word embeddings): 0.57
▪ Baseline 2, title–resume intersection (proportion of words from the job ad title found in the resume): 0.61
▪ Our model, Siamese CNN (trained with triplet loss and online hard negative mining): 0.89
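For reference, a minimal sketch of the two baselines and of the metric. The whitespace tokenization, the toy word vectors and the labelled pairs are hypothetical, and the toy vector lookup simply ignores out-of-vocabulary words (real fastText would build them from subwords). For the Siamese CNN, the score would be the negative Euclidean distance, so that a higher score means a better match.

```python
import math

def average_embedding(text, word_vectors):
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return None
    return [sum(coords) / len(vectors) for coords in zip(*vectors)]

def average_fasttext_similarity(job_text, resume_text, word_vectors):
    """Baseline 1: cosine similarity between average word embeddings."""
    u = average_embedding(job_text, word_vectors)
    v = average_embedding(resume_text, word_vectors)
    if u is None or v is None:
        return 0.0
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def title_resume_intersection(job_title, resume_text):
    """Baseline 2: proportion of words from the job ad title found in the resume."""
    title_words = job_title.lower().split()
    resume_words = set(resume_text.lower().split())
    if not title_words:
        return 0.0
    return sum(word in resume_words for word in title_words) / len(title_words)

def roc_auc(scores, labels):
    """AUC = probability that a random matching pair outscores a random
    non-matching pair (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labelled (job ad title, resume, is_match) pairs.
pairs = [
    ("data scientist", "python data scientist intern", 1),
    ("data scientist", "sales manager", 0),
    ("sales manager", "sales assistant", 1),
    ("sales manager", "data engineer", 0),
]
scores = [title_resume_intersection(title, resume) for title, resume, _ in pairs]
print(scores)                                     # [1.0, 0.0, 0.5, 0.0]
print(roc_auc(scores, [y for _, _, y in pairs]))  # 1.0
```

AUC only depends on how scores rank matching pairs above non-matching ones, so similarities, intersection ratios and negative distances can be compared on the same scale.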

JOB AD NEAREST NEIGHBORS

Query: JobTeaser – Data Scientist (M/F)

Nearest neighbors:
▪ Sia Partners – Data Science Consultant
▪ Alten – Machine Learning Engineer (M/F)
▪ Dassault Systèmes – Internship - Machine Learning Engineer (M/F)
▪ Orange Money – Internship - Data science: data analysis (F/M)
▪ Worldline – When AI plays trader (M/F)
▪ Aperam – We are looking for an apprentice: Data Analyst - BI (M/F)
▪ Worldline – R&D: Machine Learning for healthcare (F/M), internship
▪ Segula – Intelligent processing of defects reported by test drivers (M/F)
▪ LyfPay – Data Analyst Intern
▪ Alten – Data Science Engineer (M/F)
▪ Alfic – IT Quantitative Analyst (M/F)

DISCUSSION

No cold start problem: Works right away for unseen students and job ads.

Asynchronous architecture: Embeddings are computed when a job ad or resume gets registered. Matching is a simple k-nearest neighbors search.

Captures subtle patterns: The similarity measure is fine-grained and captures the variety of wordings in job ads and resumes.

Similarity is not all: One needs a hybrid model, with both similarity and some hand-crafted rules (contract type, location, …).

Bias towards applicants' preferences: Your algorithms can only be as good as your data. The model reproduces the biases of the dataset.

Resume – job ad heterogeneity: Job ad embeddings tend to cluster together (same for resume embeddings).
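The slide does not say how similarity and rules are combined. One simple hybrid, sketched here with hypothetical `JobAd`/`Student` fields for contract type and location: hand-crafted rules filter out ineligible job ads, then the embedding distance ranks the remaining ones.

```python
import math
from dataclasses import dataclass

@dataclass
class JobAd:
    job_id: str
    embedding: list
    contract_type: str  # hypothetical rule field, e.g. "internship"
    city: str           # hypothetical rule field

@dataclass
class Student:
    embedding: list
    wanted_contracts: set
    cities: set

def hybrid_recommend(student, job_ads, k):
    """Hand-crafted rules filter first; embedding distance ranks what is left."""
    eligible = [
        job for job in job_ads
        if job.contract_type in student.wanted_contracts and job.city in student.cities
    ]
    eligible.sort(key=lambda job: math.dist(student.embedding, job.embedding))
    return [job.job_id for job in eligible[:k]]

jobs = [
    JobAd("j1", [0.1, 0.9], "internship", "Paris"),
    JobAd("j2", [0.1, 0.8], "permanent", "Paris"),    # filtered out: contract type
    JobAd("j3", [0.5, 0.5], "internship", "Paris"),
    JobAd("j4", [0.1, 0.9], "internship", "Berlin"),  # filtered out: location
]
student = Student([0.1, 0.9], {"internship"}, {"Paris"})
print(hybrid_recommend(student, jobs, k=2))  # ['j1', 'j3']
```

Filtering keeps the rules strict; an alternative is to add rule-based penalties to the distance, which lets a very close match survive a soft mismatch such as a neighboring city.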

REFERENCES

Schroff et al. 2015. FaceNet: A Unified Embedding for Face Recognition and Clustering.

Hermans et al. 2017. In Defense of the Triplet Loss for Person Re-Identification.

Liang et al. 2019. Personalized Music Recommendation with Triplet Network.

Maheshwary et al. 2018. Matching Resumes to Jobs via Deep Siamese Network.

Jacovi et al. 2019. Understanding Convolutional Neural Networks for Text Classification.

Smith. 2019. Guild AI: Simple Reproducibility In Machine Learning [http://guild.ai]


FUTURE WORK

EXPLAINABILITY: Gain insight into how filter outputs are related to actual predictions.

MULTIPLE LANGUAGES: Handle the variety of (imbalanced) languages, either explicitly or implicitly.

FAIRNESS: Don’t reproduce the biases contained in the dataset.

ATTENTION: Try other models, such as deep pretrained language models (BERT, …) and/or models with attention.

FULL-STACK MACHINE LEARNING GENERALISTS

▪ Souhaïl, Data Scientist
▪ Lancelot, Data Scientist
▪ Mously, Data Scientist
▪ Thomas, Lead Data Scientist
▪ Armand, Data Scientist
▪ ?, Machine Learning Engineer / Ops
▪ ?, CIFRE PhD student

WE ARE HIRING!

Machine Learning Engineer, Data Engineers, Data Analysts, …

http://engineering.jobteaser.com
