
Using indication embeddings to represent patient health for drug safety studies

Rachel Melamed

Biomedical Data Science @ University of Chicago & Biology @ UMass [email protected] | @RDMelamed

Goal: high-throughput drug safety studies

Drug safety: Does taking this drug change your risk of some health outcome? (Example: exposure and cancer.)

Randomized trials: low-throughput but unbiased.

1) Enroll cohort 2) Randomize treatment 3) Compare experimental groups

Cohort studies: reuse health data to emulate a randomized trial.

1) From data, select people taking the treatment drug or a comparator drug 2) Match to emulate randomization

Health data: 5,000 prescriptions, 10,000 diagnosis codes, 20,000 procedure codes (events pictured: X-ray, asthma, statin, arthritis).

High-throughput cohort studies

Currently, cohort study design relies on domain experts:

Expert Task 1: Find a suitable comparator drug

Expert Task 2: Design matching (identify confounders)

Can we match without expert design?

Creating indication embeddings

Evaluating embeddings

Embeddings identify comparator drugs

Match with embeddings

The challenge: confounding

[Figure: aspirin use (never to often) against age (25 to 75). Diagram labels: HEALTH, age, aspirin, diabetes.]

The solution: matching

Match on confounders! P(treated | age, aspirin, …)

Propensity score match: P(treated | age, ..)

Evaluate with propensity score: P(treated | …) in the treated cohort and P(treated | …) in the comparator cohort.

[Figure labels: insulin resistance, insulin, Type 2 diabetes, amoxicillin, xanax, …other Rx, Dx, Px.]

How to match on 30,000+ dimensional, sparse, uninformative vectors? Instead, map them to small, meaningful embeddings.

Indication embedding and drug embedding

Training task: predict the drug

Simple neural network

150 million patient histories

Create training examples: each training example is a (history event, new Rx) pair.

Example history: blood lab, gout, statin, diabetes. New Rx: metformin.
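The pairing described above can be sketched in a few lines. This is only an illustration: the function name `make_training_pairs` is hypothetical, and the assumption that every prior event in a history is paired with the new prescription is read off the poster's (history event, new Rx) diagram rather than confirmed by it.

```python
from typing import List, Tuple


def make_training_pairs(history: List[str], new_rx: str) -> List[Tuple[str, str]]:
    """Pair each prior history event with the newly prescribed drug."""
    return [(event, new_rx) for event in history]


# Toy history taken from the example on the poster.
history = ["blood lab", "gout", "statin", "diabetes"]
pairs = make_training_pairs(history, "metformin")
for pair in pairs:
    print(pair)
```

Each pair would then serve as one (input, target) example for the drug-prediction task.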

Embeddings relate codes to health needs

Drug embedding = drugs given in most similar health contexts

Indication embedding = health context for prescription of a new drug

Map each event to a 50-dimensional vector

For each drug, performance of embedding distance to predict indications. Overall ROC AUC = 0.82.
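As a reminder of what the ROC AUC figure measures, here is a minimal pure-Python sketch. The scores and labels are made up for illustration, and the convention that a higher score means "closer in embedding space" is an assumption; the poster does not specify how distances were turned into scores.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical similarity scores between one drug and four diagnosis codes.
toy_scores = [0.9, 0.4, 0.6, 0.1]
# Hypothetical labels: 1 if the diagnosis is a known indication of the drug.
toy_labels = [1, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(toy_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every true indication scoring above every non-indication.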

Dot products between antidepressants and selected closest diagnoses.

[Figure labels: tricyclic, SSRI, SNRI, anticonvulsants, antipsychotics.]

Drugs with closest embedding dot product are more comparable, as measured by AUC
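A sketch of how candidate comparators might be ranked by embedding dot product. The three-dimensional vectors below are invented for illustration (the poster uses 50-dimensional embeddings), and `rank_comparators` is a hypothetical helper, not the author's code.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_comparators(
    target: str, drug_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank every other drug by dot product with the target drug's embedding."""
    t = drug_vecs[target]
    scored = [(name, dot(t, vec)) for name, vec in drug_vecs.items() if name != target]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up vectors, for illustration only.
toy_drug_vecs = {
    "bupropion": [0.9, 0.1, 0.0],
    "trazodone": [0.8, 0.2, 0.1],
    "varenicline": [0.3, 0.9, 0.0],
    "amoxicillin": [0.0, 0.1, 0.9],
}
ranking = rank_comparators("bupropion", toy_drug_vecs)
for name, score in ranking:
    print(name, round(score, 2))
```

The top-ranked drugs are the candidates for Expert Task 1; in practice the choice would still need a check that the candidate is clinically sensible.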

Drugs with same therapeutic use as carbamazepine (primarily anticonvulsant, off-label for bipolar): aripiprazole, asenapine, carbamazepine, chlorpromazine, clozapine, divalproex_sodium, fluphenazine, gabapentin, haloperidol, haloperidol_lactate, lacosamide, lamotrigine, levetiracetam, lithium_carbonate, lurasidone, olanzapine, oxcarbazepine, paliperidone, perphenazine, primidone, prochlorperazine, prochlorperazine_maleate, quetiapine_fumarate, risperidone, thioridazine, thiothixene, tiagabine, topiramate, trifluoperazine, valproic_acid, ziprasidone, zonisamide.

Expert Task 1

[Figure labels: ROC AUC; therapeutic class.]

[Figure axes: age (25 to 75); year (2003 to 2013).]

Step 1: Coarsened exact matching by age, gender, year, number of Rx

Step 2: Encode histories → small dense vectors

Step 3: Mahalanobis match on health summaries within bins
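Step 1 can be sketched as grouping people into strata keyed by coarsened covariates and keeping only strata that contain both treated and comparator patients. The bin widths (10-year age bins, prescriptions counted in groups of five and capped) and the field names are assumptions chosen for illustration; the poster does not give them.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Person = Dict[str, Any]


def cem_key(p: Person) -> Tuple[int, str, int, int]:
    """Coarsen covariates into a stratum key (bin widths are assumptions)."""
    age_bin = int(p["age"]) // 10         # 10-year age bins
    rx_bin = min(int(p["n_rx"]) // 5, 4)  # 0-4, 5-9, 10-14, 15-19, 20+
    return (age_bin, str(p["gender"]), int(p["year"]), rx_bin)


def cem_strata(treated: List[Person], comparators: List[Person]) -> Dict[tuple, dict]:
    """Group people by stratum; keep strata containing both groups."""
    strata: Dict[tuple, dict] = defaultdict(lambda: {"treated": [], "comparator": []})
    for p in treated:
        strata[cem_key(p)]["treated"].append(p)
    for p in comparators:
        strata[cem_key(p)]["comparator"].append(p)
    return {k: v for k, v in strata.items() if v["treated"] and v["comparator"]}


# Hypothetical patients.
toy_treated = [
    {"id": "t1", "age": 42, "gender": "F", "year": 2010, "n_rx": 3},
    {"id": "t2", "age": 67, "gender": "M", "year": 2005, "n_rx": 12},
]
toy_comparators = [
    {"id": "c1", "age": 47, "gender": "F", "year": 2010, "n_rx": 1},
    {"id": "c2", "age": 30, "gender": "M", "year": 2005, "n_rx": 12},
]
kept = cem_strata(toy_treated, toy_comparators)
print(list(kept.keys()))
```

Here `t1` and `c1` share a stratum, while `t2` has no comparator in its age bin and is dropped. Step 3 would then run within each kept stratum.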

Indication embedding (R^{V × E}) → weighted average (upweight recent history)
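One way to build such a health summary is a recency-weighted average of the embeddings of a person's history events. The exponential half-life weighting below is an assumption made for this sketch; the poster only says that recent history is upweighted. The function name, the `half_life_days` parameter, and the toy embedding are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def health_summary(
    events: List[Tuple[str, float]],
    embedding: Dict[str, List[float]],
    half_life_days: float = 180.0,
) -> List[float]:
    """Recency-weighted average of the embeddings of a person's history events.

    events: (code, days before the index date) pairs.
    The half-life weighting is an assumption, not the poster's formula.
    """
    dim = len(next(iter(embedding.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for code, days_ago in events:
        if code not in embedding:
            continue  # skip codes that have no embedding
        w = 0.5 ** (days_ago / half_life_days)  # weight halves every half-life
        weight_sum += w
        for i, x in enumerate(embedding[code]):
            total[i] += w * x
    if weight_sum == 0.0:
        return total  # no usable events: return the zero vector
    return [x / weight_sum for x in total]


# Toy 2-dimensional embedding, for illustration only.
toy_embedding = {"diabetes": [1.0, 0.0], "gout": [0.0, 1.0]}
toy_events = [("diabetes", 0.0), ("gout", 180.0)]
summary = health_summary(toy_events, toy_embedding)
print(summary)
```

With these toy inputs the event on the index date gets weight 1 and the event 180 days earlier gets weight 0.5, so the summary leans toward the more recent diagnosis.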


Embedding matching (Expert Task 2)

Embedding can match for key confounders

[Figure panels: propensity score matching; embedding matching.]

Matching people on bupropion to trazodone is complicated by the alternate indication of bupropion for smoking cessation. Each point is one person on bupropion, trazodone, or varenicline.

Health summary vectors

Embedding better matches nonsmokers to nonsmokers

Sparse code vectors such as (0 0 1 0 0 0 0 0 1 0 1 0 0 0 0) and (0 0 0 0 0 0 1 0 0 0 0 0 0 0 0) become small dense vectors such as (0, .2, 1) and (0, .1, .8).

Then do simple nearest-neighbor matching
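Nearest-neighbor matching on dense health summary vectors can be sketched with NumPy, here using the Mahalanobis distance named in Step 3. Several choices are assumptions for illustration: the covariance is pooled over all rows passed in, matching is done with replacement, and the toy vectors are invented. In the pipeline described above this would run separately inside each coarsened bin.

```python
import numpy as np


def mahalanobis_match(treated: np.ndarray, comparators: np.ndarray) -> list:
    """For each treated row, return the index of the nearest comparator row.

    Distances use the pooled covariance of all rows. Matching is with
    replacement, which is a simplifying assumption.
    """
    pooled = np.vstack([treated, comparators])
    cov_inv = np.linalg.pinv(np.cov(pooled, rowvar=False))
    matches = []
    for t in treated:
        diff = comparators - t  # shape (n_comparators, dim)
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances
        matches.append(int(np.argmin(d2)))
    return matches


# Invented 2-dimensional health summaries.
toy_treated = np.array([[0.0, 0.0], [4.0, 4.0]])
toy_comparators = np.array([[0.5, 0.2], [4.2, 3.9], [2.0, -1.0]])
matches = mahalanobis_match(toy_treated, toy_comparators)
print(matches)
```

The pseudo-inverse is used so the sketch does not fail if the pooled covariance happens to be singular; a production version would more likely match without replacement and apply a caliper.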
