![Page 1: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/1.jpg)
This document and its contents are proprietary and confidential of Zocdoc, Inc. and may not be reproduced or shared, in whole or in part, without the express written authorization of Zocdoc, Inc.
Leveraging AWS and Machine Learning to Power Search at Zocdoc
Pedro Rubio, Head of Search Engineering
Brian d’Alessandro, Head of Data Science
![Page 2: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/2.jpg)
Agenda
- How we’re built - People and Architecture
- How we’re built - the Data
- Questions
![Page 3: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/3.jpg)
Problem Statements:
1. Patients need to find and book with a doctor, and
2. Patients don’t often know what kind of doctor they need.
![Page 4: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/4.jpg)
![Page 5: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/5.jpg)
How we’re built
And solving “what the patient means”
![Page 6: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/6.jpg)
Core Optimization Problems for ZD Search
• Cross-team collaboration enabling maximum iteration speed
• Deliver recommendations in < 200 ms
• Patient satisfaction
(And our architecture plays a big role here!)
![Page 7: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/7.jpg)
The Search Team
• Engineering
• Product
• Data Science
• Design
![Page 8: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/8.jpg)
Zocdoc Tech Stack
- NodeJS, ES6, Babel, React
- AWS - CloudFormation, Docker, ECR, EC2, ELB
- Kinesis / Firehose - S3
  - Reporting to the data lake
- Monitoring with Datadog
- Routes with Express
![Page 9: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/9.jpg)
Our Legacy Search
![Page 10: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/10.jpg)
Free Text (Patient Powered) Search
![Page 11: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/11.jpg)
Types of Intent
Name of doctor
Medical procedure
Specialty
Symptom
![Page 12: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/12.jpg)
Intent Parsing
Architecture
Machine Learning
Design
![Page 13: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/13.jpg)
SearchService Pipeline
Phase I - Auto-Suggest
Phase II - Backend Search
Diagram labels: Browser; Service Handler; Doctor Name Retrieval; Specialty Retrieval; Visit Reason Retrieval; Semantic Retrieval; Semantic Service; NLP; Corpus Building; Auto-Suggest Ranking; Ranking Models; Logging; Results
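
The Phase I flow on this slide (a service handler fans a partial query out to several retrievers, then a ranking step orders the merged candidates) could be sketched roughly as follows. This is a minimal illustration and not Zocdoc's implementation: the in-memory catalogs, the word-prefix matching, and the per-type weights are all hypothetical stand-ins for the retrieval services and ranking models named in the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    text: str
    kind: str      # "doctor", "specialty", or "visit_reason"
    score: float


# Hypothetical catalogs; a real system would query a search index instead.
CATALOGS = {
    "doctor": ["Dr. Dana Cardoza", "Dr. Sam Lee"],
    "specialty": ["Cardiologist", "Dermatologist"],
    "visit_reason": ["Cardiac stress test", "Skin rash"],
}

# Assumed static weights standing in for a learned ranking model.
KIND_WEIGHT = {"specialty": 1.0, "visit_reason": 0.8, "doctor": 0.6}


def retrieve(kind: str, query: str) -> list[Suggestion]:
    """Return catalog entries in which any word starts with the query."""
    q = query.strip().lower()
    hits = []
    for entry in CATALOGS[kind]:
        words = entry.lower().split()
        if q and any(word.startswith(q) for word in words):
            # Shorter entries score slightly higher: closer to the typed text.
            hits.append(Suggestion(entry, kind, KIND_WEIGHT[kind] / len(entry)))
    return hits


def auto_suggest(query: str, limit: int = 5) -> list[Suggestion]:
    """Fan out to every retriever, merge the candidates, and rank them."""
    candidates = [s for kind in CATALOGS for s in retrieve(kind, query)]
    return sorted(candidates, key=lambda s: s.score, reverse=True)[:limit]
```

Calling `auto_suggest("card")` returns suggestions from all three retrievers in one ranked list, which is the shape the browser would render as the patient types.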
![Page 14: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/14.jpg)
Solving for the Long Tail
Structured queries make up a reasonable percentage of traffic, but they are a minority of the total search terms we service.
Specialties = O(10^2), Procedures = O(10^3), Names = O(10^6), Other = O(10^7)
We use Natural Language Processing (NLP) algorithms to map unstructured terms into our structured search set.
![Page 15: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/15.jpg)
Different Representations of Same Concept
Variations
• Heart beats too fast
• Heart flutters
• Pulse rate too high
• Irregular pulse
• Heart out of rhythm
• Irregular heartbeat
• Heart palpitations
Concept Interpretation
• Irregular heartbeat
• Heart palpitations
Medical Term
• Atrial Fibrillation
![Page 16: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/16.jpg)
Zocdoc Semantic Service
f(“presentation anxiety”) = {[{Specialty = “Psychologist”, Relevance = 0.8}, …, {Specialty = “Psychiatrist”, Relevance = 0.7}]}
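
To make the shape of f concrete, here is a minimal sketch of a function that maps free text to a relevance-ordered list of specialties. It assumes a tiny hand-written corpus and plain bag-of-words cosine similarity; the slides do not say which NLP technique the real service uses, so both the corpus and the scoring are illustrative assumptions, and the relevance values will not match the 0.8 and 0.7 shown on the slide.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: phrases patients might use, keyed by specialty.
SPECIALTY_CORPUS = {
    "Psychologist": "anxiety stress therapy counseling presentation fear panic",
    "Psychiatrist": "anxiety depression medication panic bipolar",
    "Cardiologist": "heart palpitations irregular heartbeat pulse chest pain",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_service(query: str, min_relevance: float = 0.05) -> list[dict]:
    """Map free text to specialties, highest relevance first."""
    q = tokenize(query)
    scored = [
        {"Specialty": name, "Relevance": round(cosine(q, tokenize(doc)), 3)}
        for name, doc in SPECIALTY_CORPUS.items()
    ]
    matches = [s for s in scored if s["Relevance"] >= min_relevance]
    return sorted(matches, key=lambda s: s["Relevance"], reverse=True)
```

With this toy corpus, `semantic_service("presentation anxiety")` ranks Psychologist ahead of Psychiatrist and drops Cardiologist, mirroring the ordering in the slide's example.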
![Page 17: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/17.jpg)
Early Results (And Why You Need to Always Experiment)
![Page 18: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/18.jpg)
Searches that Lead to “Nephrology”
Many patients don’t know what a Nephrologist is. They don’t need to know to find one now.
![Page 19: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/19.jpg)
How We’re Built - The Data
![Page 20: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/20.jpg)
Data - Indexing so we can Search
![Page 21: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/21.jpg)
Lesson Learned with Indexing Data
Diagram labels: Legacy Layer; Monolith; Live Cache; Feed Process; AWS; S3; Lambda (ƛ); Elastic.co
Lambdas act as a mini ETL layer, getting the documents ready for our retrieval stage.
- Lambda memory max is 1500 MB
- Our data is much larger
- Manage state in S3 and Elasticsearch
- Complex “stateless” ETL process that transforms this data into the data that we need in Elasticsearch
- Load piecemeal into Elasticsearch
- At the very end, swap alias to use newly uploaded indexes
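
The last two bullets (load piecemeal, then swap the alias) can be sketched as below. The batching helper and the index and alias names are hypothetical; the request body follows the `actions` format of Elasticsearch's `POST /_aliases` endpoint, which applies the remove and add together. The sketch only builds the payloads and does not talk to a cluster.

```python
from typing import Iterable, Iterator


def chunked(docs: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield documents in batches so no single request holds the full data set."""
    batch: list[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for Elasticsearch's POST /_aliases endpoint.

    Both actions are applied atomically, so searches against the alias
    never see a moment with no index (or both indexes) behind it.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the search service always queries the "providers" alias.
body = alias_swap_actions("providers", "providers_v1", "providers_v2")
```

Because readers only ever query the alias, a partially loaded index is never visible: the new index is filled batch by batch, and the swap happens once at the end.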
![Page 22: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/22.jpg)
New ETL - Spark
More complex processing
Diagram labels: data sets 1, 2, and 3; Mapping from 1 -> 2; Mapping from 1 -> 3; Joined Data Set; Business Logic Application
- Spark - ETL
- Get over 1500 MB limit
- Get over 5 minute runtime limit
- Easily add more data sets
- Currently in Databricks
- Plan to migrate to EMR (Elastic MapReduce)
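
The shape of the processing in the diagram (map data set 1 onto data sets 2 and 3, then apply business logic to the joined result) can be illustrated in plain Python. In Spark the same steps would be DataFrame joins; plain Python is used here only so the sketch runs without a cluster. The field names, sample records, and the business rule are all hypothetical.

```python
def left_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Left-join two record lists on `key` (right side assumed unique per key)."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


# Hypothetical inputs standing in for data sets 1, 2, and 3.
providers = [
    {"provider_id": 1, "name": "Dr. A"},
    {"provider_id": 2, "name": "Dr. B"},
]
specialties = [{"provider_id": 1, "specialty": "Nephrologist"}]
availability = [
    {"provider_id": 1, "open_slots": 4},
    {"provider_id": 2, "open_slots": 0},
]


def build_documents() -> list[dict]:
    """Join 1 -> 2 and 1 -> 3, then apply an example business rule."""
    joined = left_join(
        left_join(providers, specialties, "provider_id"),
        availability,
        "provider_id",
    )
    # Hypothetical rule: only index providers that can actually be booked.
    return [doc for doc in joined if doc.get("open_slots", 0) > 0]
```

Adding another data set is one more `left_join` call, which is the "easily add more data sets" property the slide attributes to the Spark version.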
![Page 23: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/23.jpg)
Data - Event Data So we can Learn
![Page 24: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/24.jpg)
The Marketplace
Goal: Make it as easy as possible to match the user to the right doctor.
Considerations:
• How to weight distance vs. availability vs. experience vs. reviews?
• Does the doctor take this type of patient?
• Are we meeting regulatory requirements?
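
One simple way to frame the first two considerations is a hard eligibility filter followed by a weighted score. The sketch below assumes a linear combination of normalized features; the weights, the feature normalizations, and the field names are hypothetical, and the slides do not state what model Zocdoc actually uses.

```python
from dataclasses import dataclass


@dataclass
class Doctor:
    name: str
    distance_km: float
    days_until_available: int
    years_experience: int
    avg_review: float        # 0 to 5 stars
    accepts_patient: bool    # e.g. insurance or age-group eligibility


# Hypothetical weights; in practice these would be tuned or learned from data.
WEIGHTS = {"distance": 0.4, "availability": 0.3, "experience": 0.1, "reviews": 0.2}


def score(d: Doctor) -> float:
    """Combine normalized features (each in [0, 1], higher is better)."""
    features = {
        "distance": 1.0 / (1.0 + d.distance_km),
        "availability": 1.0 / (1.0 + d.days_until_available),
        "experience": min(d.years_experience, 30) / 30,
        "reviews": d.avg_review / 5.0,
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank(doctors: list[Doctor]) -> list[Doctor]:
    """Drop doctors who cannot take the patient, then sort by score."""
    eligible = [d for d in doctors if d.accepts_patient]
    return sorted(eligible, key=score, reverse=True)
```

Keeping eligibility as a filter rather than a feature means a doctor who cannot see the patient is never shown, however well they score on the other dimensions.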
![Page 25: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/25.jpg)
Organizational Optimization
Optimize: algo iteration speed
Subject to:
• Org too small to justify full-time data scientists within search
• Throwing models over the wall to be implemented doesn’t work
![Page 26: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/26.jpg)
Agile Machine Learning
Diagram labels: Query; Ranked Results; Zocdoc Prod Service (Search); Filtered Results; Scored Results; Model API; Transformations + Model Scoring; Model DB; Logs (S3/Redshift); Research, Analysis, Model Development (Spark/Redshift); Production; Offline; Engineering Owned; DS Owned
![Page 27: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/27.jpg)
Aqueduct: Filling the Data Lake
Some Data Lake principles:
• Allow producers to easily push data
• Allow data format changes
• Smart ETL to make consumption very easy
![Page 28: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/28.jpg)
Cistern: Making Datalake Drinkable
• “Raw” data lake good for exploratory research (we use Spark)
• “Clean” data lake better for analytics and quick exploration
![Page 29: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/29.jpg)
Data - Insights
![Page 30: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/30.jpg)
![Page 31: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/31.jpg)
Searches for Therapy/Therapist on Zocdoc
We’ve got our fingers on the pulse of public health trends.
![Page 32: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/32.jpg)
How Much is that Smile Worth?
Click Conversion by Search Rank and DrIsSmiling
We’re exploring AWS Rekognition to research what drives user interest in Dr. profiles.
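
A rough sketch of how a DrIsSmiling flag and the conversion breakdown might be derived is shown below. The first function reads a dict shaped like the response of Rekognition's DetectFaces operation when all facial attributes are requested, where each entry in `FaceDetails` carries a `Smile` attribute with a boolean `Value` and a `Confidence` percentage. The confidence threshold and the event field names (`rank`, `dr_is_smiling`, `clicked`) are assumptions for illustration; no real results are implied.

```python
from collections import defaultdict


def dr_is_smiling(detect_faces_response: dict, min_confidence: float = 90.0) -> bool:
    """Read the Smile attribute of the first face in a DetectFaces-style response."""
    faces = detect_faces_response.get("FaceDetails", [])
    if not faces:
        return False
    smile = faces[0].get("Smile", {})
    return bool(smile.get("Value")) and smile.get("Confidence", 0.0) >= min_confidence


def click_conversion(events: list[dict]) -> dict:
    """Click-through rate keyed by (search rank, DrIsSmiling)."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for event in events:
        key = (event["rank"], event["dr_is_smiling"])
        shown[key] += 1
        clicked[key] += int(event["clicked"])
    return {key: clicked[key] / shown[key] for key in shown}
```

Grouping by rank as well as by the smile flag matters because position alone strongly affects clicks, so comparing smiling and non-smiling profiles only makes sense within the same rank.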
![Page 33: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/33.jpg)
![Page 34: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/34.jpg)
![Page 35: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s](https://reader031.vdocument.in/reader031/viewer/2022021818/5aacf3c97f8b9a2e088d9ed0/html5/thumbnails/35.jpg)
Thank you and Questions!