Sharing and growing the world's knowledge with machine learning
Lei Yang
April 2016
Our mission
“To share and grow the world’s knowledge”
● Millions of questions & answers
● Millions of users
● Thousands of topics
● ...
What we care about
● Demand
● Quality
● Relevance
Data@Quora
● Topics
● Questions
● Users
● Answers
● Actions
Lots of data relations
Complex network propagation effects
Importance of topics & semantics
Machine Learning@Quora
Ranking - Answer ranking
What is a good Quora answer?
● Truthful
● Reusable
● Provides explanation
● Well formatted
● ...
Ranking - Answer ranking
How are those criteria translated into features?
● Features that relate to the text quality itself
● Interaction features (upvotes/downvotes, clicks, comments…)
● User features (e.g. expertise in topic)
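A minimal sketch of how the three feature groups above might be assembled into one vector and scored. The `Answer` fields, the specific features, and the weights are illustrative assumptions, not Quora's actual feature set.

```python
import math
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    upvotes: int
    downvotes: int
    author_topic_expertise: float  # hypothetical score in [0, 1]

def answer_features(a):
    """Return one feature per group: text, interaction, user."""
    n_words = len(a.text.split())
    votes = a.upvotes + a.downvotes
    return [
        math.log1p(n_words),            # text feature: log of answer length
        (a.upvotes + 1) / (votes + 2),  # interaction feature: smoothed upvote ratio
        a.author_topic_expertise,       # user feature: expertise in the topic
    ]

def score(features, weights):
    """Linear score; in practice the weights would come from a trained ranker."""
    return sum(f * w for f, w in zip(features, weights))
```

Any of the models listed later in the deck could replace the linear `score`; the point is only that each quality criterion has to surface as a numeric feature.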
Ranking - Feed
Present the most interesting stories for a user at a given time
● Interesting = topical relevance + social relevance + timeliness
● Stories = questions + answers
● Personalized learning-to-rank approach
● Relevance ordering instead of time ordering = big gains in engagement
● Challenges
○ Potentially many candidate stories
○ Real-time ranking
○ Objective function
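The "interesting = topical relevance + social relevance + timeliness" idea can be sketched as a simple additive score. The exponential half-life decay for timeliness, the dictionary keys, and the equal weighting are assumptions made for illustration; `heapq.nlargest` is used so that only the top `k` of many candidates are kept.

```python
import heapq

def interest(story, now, half_life_hours=24.0):
    """Additive interest score; timeliness halves every `half_life_hours`."""
    age_hours = (now - story["created"]) / 3600.0
    timeliness = 0.5 ** (age_hours / half_life_hours)
    return story["topical"] + story["social"] + timeliness

def top_stories(stories, now, k=10):
    """Keep the k highest-scoring stories without sorting every candidate."""
    return heapq.nlargest(k, stories, key=lambda s: interest(s, now))
```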
Ranking - Feed
● Personalized LTR model
● Features
○ Quality of question/answer
○ Topics the user is interested in or knows about
○ Users the user is following
○ What is trending/popular
○ ...
● Different temporal windows
● Multi-stage solution with different “streams”
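One way to read the multi-stage design: a cheap first stage pulls a few candidates from each stream, and the expensive personalized model runs only on the survivors. The stream names, the `per_stream` cutoff, and the `model_score` callable are hypothetical.

```python
def candidates_from_streams(streams, per_stream=50):
    """Stage 1: take a cheap slice from each stream and de-duplicate, keeping order."""
    seen, merged = set(), []
    for stream in streams.values():
        for story_id in stream[:per_stream]:
            if story_id not in seen:
                seen.add(story_id)
                merged.append(story_id)
    return merged

def rank(candidates, model_score, k=10):
    """Stage 2: apply the (more expensive) personalized model to the survivors only."""
    return sorted(candidates, key=model_score, reverse=True)[:k]
```

Keeping stage 1 cheap is what makes real-time ranking feasible when there are potentially many candidate stories.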
Recommendations - Topics
Recommend new topics for the user to follow, based on:
● Topics you already follow
● Users you already follow
● Interactions with questions/answers
● Topic-related features
● ...
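A minimal sketch of one of these signals, "users you already follow": count the topics followed by the people a user follows and suggest the most common ones the user does not follow yet. The data layout is an assumption, and a production system would combine this with the other signals in a learned model.

```python
from collections import Counter

def recommend_topics(user, follows_topics, follows_users, k=3):
    """follows_topics: user -> set of topics; follows_users: user -> set of users."""
    mine = follows_topics.get(user, set())
    counts = Counter()
    for other in follows_users.get(user, set()):
        for topic in follows_topics.get(other, set()) - mine:
            counts[topic] += 1
    return [topic for topic, _ in counts.most_common(k)]
```

The same counting pattern, with the roles of topics and users swapped, gives a starting point for user recommendations.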
Recommendations - Users
Recommend new users for the user to follow, based on:
● Users you already follow
● Topics you already follow
● Interactions with users
● User-related features
● ...
Related questions
Given interest in a question, what other questions are interesting?
● Not only about similarity, but also “interestingness”
● Features such as:
○ Textual
○ Co-visit
○ Topics
○ …
● Important for logged-out use case
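To illustrate why similarity alone is not enough, the sketch below mixes a textual feature (Jaccard overlap of words) with a co-visit feature (log of how often two questions were viewed in the same session). The feature choices, weights, and the `covisits` mapping are assumptions for illustration only.

```python
import math

def jaccard(a, b):
    """Word-level Jaccard similarity between two question strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def related_score(q, other, covisits, w_text=1.0, w_covisit=1.0):
    """covisits: frozenset({q1, q2}) -> number of sessions that viewed both."""
    pair = frozenset((q, other))
    return w_text * jaccard(q, other) + w_covisit * math.log1p(covisits.get(pair, 0))
```

With this scoring, a question that shares few words with the original but is frequently co-visited can outrank a near-paraphrase.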
Duplicate questions
● Important issue for Quora
○ We want to make sure knowledge about the same question is not dispersed across duplicates
● Binary classifier trained with labelled data
● Features
○ Textual vector space models
○ Usage-based features
○ ...
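A minimal sketch of the textual side of such a classifier: a bag-of-words cosine similarity fed through a logistic function. The `weight` and `bias` values are placeholders standing in for parameters that would be learned from labelled duplicate pairs, and a real system would add usage-based features.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two questions."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def duplicate_probability(q1, q2, weight=8.0, bias=-4.0):
    """Logistic model on one feature; weight and bias are hypothetical, not learned here."""
    z = weight * cosine(q1, q2) + bias
    return 1.0 / (1.0 + math.exp(-z))
```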
User expertise inference
Infer user’s trustworthiness in relation to a given topic
● We take into account:
○ Answers written on topic
○ Upvotes/downvotes received
○ Endorsements
○ ...
● Trust/expertise propagates through the network
● Useful as input/features in other models
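The propagation idea can be sketched as a PageRank-style iteration: each user keeps part of a base score earned from their own activity and receives a share of the trust of everyone who endorses them. The damping factor, the even split among endorsed users, and the input layout are assumptions; every user mentioned in `endorsements` is assumed to appear in `base`.

```python
def propagate_trust(base, endorsements, damping=0.5, iterations=20):
    """base: user -> score from own activity; endorsements: endorser -> list of endorsed users."""
    trust = dict(base)
    for _ in range(iterations):
        incoming = {u: 0.0 for u in base}
        for endorser, endorsed in endorsements.items():
            if not endorsed:
                continue
            share = trust[endorser] / len(endorsed)  # split trust evenly among endorsed users
            for u in endorsed:
                incoming[u] += share
        trust = {u: (1 - damping) * base[u] + damping * incoming[u] for u in base}
    return trust
```

The resulting per-topic scores are the kind of value that could then be used as a feature in answer ranking or feed ranking.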
Spam detection and moderation
● Very important for Quora to maintain content quality
● Pure manual approaches do not scale
● Hard to get algorithms 100% right
● ML algorithms detect content/user issues
○ Output of the algorithms feeds manually curated moderation queues
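A minimal sketch of the hand-off between model and moderators: items whose spam score passes a threshold go into a priority queue so reviewers see the most suspicious content first. The threshold and the queue layout are assumptions; the classifier producing `spam_score` is out of scope here.

```python
import heapq

def enqueue_for_review(queue, item_id, spam_score, threshold=0.5):
    """Push flagged items; the negated score makes heapq pop the most suspicious first."""
    if spam_score >= threshold:
        heapq.heappush(queue, (-spam_score, item_id))

def next_for_review(queue):
    """Return (item_id, spam_score) for the highest-scoring flagged item."""
    neg_score, item_id = heapq.heappop(queue)
    return item_id, -neg_score
```

Because algorithms are hard to get 100% right, the final decision in this sketch stays with the human reviewer.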
Content creation prediction
● Quora’s algorithms optimize for more than the probability of reading
● Important to predict the probability of a user answering a question
● Some product features rely completely on that prediction
○ E.g. A2A (ask to answer) suggestions
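As an illustration of how an answer-probability estimate could drive A2A suggestions, the sketch below uses a smoothed response rate per user and ranks candidates by it. The smoothing prior, its strength, and the `history` layout are assumptions; a real model would use many more features than past response rate.

```python
def answer_probability(answered, asked, prior=0.1, strength=10):
    """Smoothed estimate of P(user answers | asked), pulled toward `prior` when data is sparse."""
    return (answered + prior * strength) / (asked + strength)

def a2a_suggestions(history, k=2):
    """history: user -> (requests answered, requests received) for the question's topic."""
    ranked = sorted(history, key=lambda u: answer_probability(*history[u]), reverse=True)
    return ranked[:k]
```

The smoothing keeps a user with one answered request out of one from outranking a user with a long, reliable record.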
Trending topics
Highlight current events that are interesting to the user
● We take into account:
○ Global “Trendiness”
○ Social “Trendiness”
○ User’s interest
○ ...
● Trending topics are a great discovery mechanism
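One simple way to combine these signals: measure "trendiness" as recent activity relative to a baseline, once globally and once within the user's network, then add the user's interest in the topic. The ratio definition, the dictionary keys, and the weights are illustrative assumptions.

```python
def trendiness(recent_count, baseline_count, smoothing=1.0):
    """Ratio of recent to baseline activity; smoothing avoids division by zero."""
    return (recent_count + smoothing) / (baseline_count + smoothing)

def trending_score(topic, w_global=1.0, w_social=1.0, w_interest=1.0):
    """Weighted sum of global trendiness, social trendiness, and user interest."""
    g = trendiness(topic["global_recent"], topic["global_baseline"])
    s = trendiness(topic["social_recent"], topic["social_baseline"])
    return w_global * g + w_social * s + w_interest * topic["user_interest"]
```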
Models & Experimentation
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...
Open source project -- QMF
Quora Matrix Factorization
https://github.com/quora/qmf
● Currently BPR and WALS
● Multithreaded implementation in C++14
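For readers unfamiliar with BPR (Bayesian Personalized Ranking), here is a textbook single SGD step in plain Python. It is not taken from QMF; the learning rate, regularization value, and function names are assumptions. Given a user factor vector `p_u`, an item the user interacted with (`q_i`), and one they did not (`q_j`), the step increases ln sigmoid(p_u · (q_i − q_j)).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_step(p_u, q_i, q_j, lr=0.05, reg=0.01):
    """One gradient-ascent step on ln sigmoid(p_u . (q_i - q_j)) with L2 regularization.

    Updates the three factor lists in place.
    """
    x = dot(p_u, q_i) - dot(p_u, q_j)
    g = 1.0 / (1.0 + math.exp(x))  # equals 1 - sigmoid(x), the gradient scale
    for f in range(len(p_u)):
        pu, qi, qj = p_u[f], q_i[f], q_j[f]  # use pre-update values for all three
        p_u[f] += lr * (g * (qi - qj) - reg * pu)
        q_i[f] += lr * (g * pu - reg * qi)
        q_j[f] += lr * (-g * pu - reg * qj)
```

Repeating this step over sampled (user, positive item, negative item) triples is the core training loop of BPR.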
ML platform
● Allow ML Engineers and Data Scientists to collaborate within the same ML framework
● Easy integration with well known tools and open source libraries
● Offline evaluation and debugging
● User friendly Python frontend
● High performance and scalable C++/CUDA backend
[Diagram labels: Redshift, MySQL, S3; Python User Interface; Trainer Box; Session; CPU, GPU; Disk; ..., WALS, BPR]
Experimentation
● Extensive A/B testing, data-driven decision-making
● Separate, orthogonal “layers” for different parts of the system
● Experiment framework showing comparisons for various metrics
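A common way to get orthogonal experiment layers is to hash the user ID together with a per-layer name, so that a user's bucket in one layer is statistically independent of their bucket in another. The sketch below assumes this hashing scheme, 100 buckets, and hypothetical layer names; it is not a description of Quora's framework.

```python
import hashlib

def bucket(user_id, layer, n_buckets=100):
    """Deterministic bucket in [0, n_buckets) that depends on both user and layer."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def variant(user_id, layer, treatment_percent=50):
    """Assign the user to treatment or control within one layer."""
    return "treatment" if bucket(user_id, layer) < treatment_percent else "control"
```

Because the layer name is part of the hash input, a feed-ranking experiment and an answer-ranking experiment can run on the same users without their assignments being correlated.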
Conclusions
● At Quora we have not only big data, but also “rich” data
● Our algorithms need to understand and optimize complex aspects such
as quality, interestingness, relevance, or user expertise
● We believe ML will be one of the keys to our success
● We have many interesting problems, and many unsolved challenges
We are hiring! www.quora.com/careers