![Page 1: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/1.jpg)
Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application
DIMACS Tutorial on Statistical and Other Analytic Health Surveillance Methods
18 June 2003
Richard Ferris
![Page 2: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/2.jpg)
Pharmaceutical post-marketing surveillance
Companies and regulatory agencies collect databases of spontaneous adverse reaction reports
Relevant exposure data not readily available (the “denominator problem”)
Can drug-event combinations of potential interest be identified from internal evidence alone?
Approach:
– Use an internally defined “denominator”
– Construct a set of “expected” counts using a stratified independence model
![Page 3: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/3.jpg)
Computation of Expected Counts
The expected count for a given drug-event combination is determined by the overall count for the particular drug (across all events) and the overall count of the particular event (across all drugs)
For example, if 2% of all reports have PROZAC as a drug, and 3% of all reports have RASH as an event, then one would expect that 0.06% (0.02*0.03) of the reports will include this combination (PROZAC in combination with RASH)
(MGPS carries out this computation separately for each distinct “stratum” and sums the strata-specific expected counts to obtain an overall expected count)
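The stratified calculation can be sketched as follows. This is a minimal illustration, not the MGPS implementation: the function name `expected_count` and the per-stratum tuples are assumptions made for the example, and the two-stratum split is made-up data.

```python
def expected_count(strata):
    """Sum stratum-specific expected counts under the independence model.

    Each stratum is a tuple (n_reports, n_with_drug, n_with_event).
    Within a stratum, E = n_reports * (n_with_drug / n_reports) * (n_with_event / n_reports).
    """
    total = 0.0
    for n_reports, n_with_drug, n_with_event in strata:
        if n_reports > 0:
            total += n_with_drug * n_with_event / n_reports
    return total


# Slide example treated as a single stratum of 10,000 reports:
# 2% mention PROZAC (200 reports) and 3% mention RASH (300 reports).
single = expected_count([(10_000, 200, 300)])
print(single)  # 6.0 reports, i.e. 0.06% of 10,000

# Hypothetical split of the same 10,000 reports into two strata.
stratified = expected_count([(4_000, 150, 60), (6_000, 50, 240)])
print(stratified)
```

In the hypothetical split the drug is concentrated in one stratum and the event in the other, so the stratified expected count comes out lower than the pooled one.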
![Page 4: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/4.jpg)
1. Comparing Observed and Expected Counts: Relative Reporting Rate
Relative Reporting Rate (RR): $RR_{ij} = N_{ij} / E_{ij}$
Easy to interpret, easy to compute
Statistically unstable if N is small or E is very small
The following all have RR = 100:
– N = 1000, E = 10
– N = 100, E = 1
– N = 10, E = 0.1
– N = 1, E = 0.01
![Page 5: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/5.jpg)
2. Comparing Observed and Expected Counts: Statistical Significance
What is the probability that $N_{ij}$ would be observed by chance (“sampling error”) when the expected value is $E_{ij}$? (p-value for testing a null hypothesis)
Harder to interpret (not expressed in same units as RR)
Results in computation of absurdly small probabilities that have no meaning
– N = 100, E = 1 produces a p-value of about $10^{-158}$!
Small RR can be very significant (small p-value) when sample size is very large:
– N = 2000, E = 1000, RR = 2 is more “significant” than
– N = 10, E = 0.1, RR = 100
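The contrast between RR and the p-value can be checked numerically. The sketch below assumes the p-value is the upper Poisson tail P(N ≥ n) with mean E (the slides do not state which test is used) and works in log space because the probabilities underflow ordinary floating point. The function name `log10_poisson_tail` and the truncation at 5000 terms are choices made for this example.

```python
import math


def log10_poisson_tail(n, e, extra_terms=5000):
    """Return log10 of P(N >= n) for N ~ Poisson(e), computed in log space."""
    # log pmf(k) = -e + k*ln(e) - ln(k!)
    logs = [-e + k * math.log(e) - math.lgamma(k + 1)
            for k in range(n, n + extra_terms)]
    peak = max(logs)
    total = sum(math.exp(v - peak) for v in logs)  # log-sum-exp over the tail
    return (peak + math.log(total)) / math.log(10)


cases = [(1000, 10), (100, 1), (10, 0.1), (1, 0.01), (2000, 1000)]
results = {}
for n, e in cases:
    results[(n, e)] = (n / e, log10_poisson_tail(n, e))
    print(f"N={n}, E={e}: RR={n / e:g}, log10(p)={results[(n, e)][1]:.1f}")
```

Under this assumption the four RR = 100 cases give very different p-values, and the RR = 2 case with N = 2000 has a far smaller p-value than the RR = 100 case with N = 10.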
![Page 6: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/6.jpg)
3. Comparing Observed and Expected Counts: Empirical Bayes Multi-Item Gamma Poisson Shrinker
Try for best of both previous approaches
– interpretability of relative rate
– adjust properly for sampling variation
Focus on the distribution, across the set of drug-event combinations, of the ratios:
– Estimate $\lambda_{ij} = \mu_{ij} / E_{ij}$, where $N_{ij} \sim \mathrm{Poisson}(\mu_{ij})$
Fit a parameterized “prior distribution” function (mixture of two gamma distributions) to the empirical distribution of the $\lambda$’s
Find the posterior distribution of $\lambda$ after observing N = some value n
Use this to obtain a posterior estimate of the expectation value of $\lambda$ given the observation of $N_{ij}$
This posterior estimate is what we call EBGM (Empirical Bayes Geometric Mean); we also get lower and upper one-sided 95% bounds (EB05, EB95), the 5th and 95th percentiles of the posterior distribution.
EBGM is termed the “shrinkage estimate” for RR
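A minimal sketch of the shrinkage calculation, following the gamma-Poisson model in DuMouchel (1999): with a prior that mixes Gamma(α1, β1) and Gamma(α2, β2) with weight p, the posterior is again a two-gamma mixture and EBGM = exp(E[log λ | N = n]). The hyperparameter values in `PRIOR` are illustrative assumptions, not values fitted to any database, and the function names are chosen for this example.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))


def log_negbin(n, e, alpha, beta):
    """Log marginal probability of N = n when lambda ~ Gamma(alpha, rate=beta)
    and N ~ Poisson(lambda * e)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e)) + n * math.log(e / (beta + e)))


def ebgm(n, e, a1, b1, a2, b2, p):
    """exp(E[log lambda | N = n]) under a two-component gamma mixture prior."""
    l1 = math.log(p) + log_negbin(n, e, a1, b1)
    l2 = math.log(1 - p) + log_negbin(n, e, a2, b2)
    diff = l2 - l1
    q = 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))  # posterior weight of component 1
    mean_log = (q * (digamma(a1 + n) - math.log(b1 + e))
                + (1 - q) * (digamma(a2 + n) - math.log(b2 + e)))
    return math.exp(mean_log)


PRIOR = (0.2, 0.1, 2.0, 4.0, 1 / 3)  # hypothetical (a1, b1, a2, b2, p)
estimates = []
for n, e in [(1, 0.01), (10, 0.1), (100, 1), (1000, 10)]:
    estimates.append(ebgm(n, e, *PRIOR))
    print(f"N={n}, E={e}: RR={n / e:g}, EBGM={estimates[-1]:.1f}")
```

With these assumed hyperparameters all four combinations have RR = 100, but the estimate is pulled strongly toward the prior when N = 1 and approaches RR as N grows.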
![Page 7: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/7.jpg)
Multi-Item Associations vs. Pairwise Associations
Consider the case of an item triplet; e.g. 2 drugs and an event
$RR_{ijk} = N_{ijk} / E_{ijk}$, where $E_{ijk}$ is based on the independence model
$EBGM_{ijk}$ = shrinkage estimate of $RR_{ijk}$
Suppose a particular itemset (drug A, drug B, event C = kidney failure) is unusually frequent (EBGM for the triplet is >> 2)
Important to ask:
– Is this merely the result of one or more of the pairs (AB, AC, BC) being unusually frequent? OR
– Is this a drug-drug interaction?
Compare the Empirical Bayes estimate of the frequency count of the triplet to the prediction from the all-2-factor log-linear model
– $\mathrm{EXCESS2} = (\mathrm{EBGM} \times E) - E_{\mathrm{All2F}}$
– $E$ is the expected count from independence
– Computation of $E_{\mathrm{All2F}}$ uses shrinkage estimates of pairwise counts
– EXCESS2 is an estimate of how many “extra” cases were observed over what was expected using the all-2-factor model
Alternate approach: Define $E_{ijk}$ from predictions of the all-2-factor model, in which case the resulting EBGM directly measures divergence of the observed count from the all-2-factor prediction
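The idea behind the all-2-factor comparison can be illustrated with iterative proportional fitting on a 2x2x2 table. This sketch simplifies the method described above: it uses raw counts in place of the shrinkage estimates (EBGM × E and the shrunken pairwise counts), and the table itself is synthetic. The function names are assumptions made for the example.

```python
from itertools import product

CELLS = list(product((0, 1), repeat=3))  # (drug A, drug B, event C); 1 = present


def fit_all_two_factor(counts, sweeps=500):
    """Fit the all-2-factor log-linear model to a 2x2x2 table by
    iterative proportional fitting on the three two-way margins."""
    fitted = {cell: 1.0 for cell in CELLS}
    for _ in range(sweeps):
        for first, second in ((0, 1), (0, 2), (1, 2)):
            for a, b in product((0, 1), repeat=2):
                group = [c for c in CELLS if c[first] == a and c[second] == b]
                target = sum(counts[c] for c in group)
                current = sum(fitted[c] for c in group)
                if current > 0:
                    for c in group:
                        fitted[c] *= target / current
    return fitted


def independent_table(n, p_a, p_b, p_c):
    """Counts for n reports in which A, B and C occur independently."""
    table = {}
    for a, b, c in CELLS:
        table[(a, b, c)] = (n * (p_a if a else 1 - p_a)
                            * (p_b if b else 1 - p_b)
                            * (p_c if c else 1 - p_c))
    return table


baseline = independent_table(10_000, 0.10, 0.20, 0.05)  # triplet count is 10
no_excess = baseline[(1, 1, 1)] - fit_all_two_factor(baseline)[(1, 1, 1)]

observed = dict(baseline)
observed[(1, 1, 1)] += 30  # inject 30 extra triplet reports
excess2 = observed[(1, 1, 1)] - fit_all_two_factor(observed)[(1, 1, 1)]
print(round(no_excess, 6), round(excess2, 2))
```

The excess is zero for the independent table and positive after the injection, but smaller than the 30 injected reports, because the extra reports also raise the pairwise margins that the all-2-factor model is allowed to match.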
![Page 8: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/8.jpg)
Health Authority Adoption of Signal Detection Technologies
FDA
– CDER:
  – Experimented in Office of Biostatistics with GPS for several years
  – Validated GPS
  – Moving to production
  – Have published data mining results on internal web for almost all products
– CBER: initial GPS implementation (VAERS)
– CRADA between Lincoln and FDA to further develop methodology and tools
CDC
– Collaborative GPS methodology development with FDA
– Includes simulation capability
WHO Uppsala Monitoring Centre
– Production safety signal generation mechanism using BCPNN
![Page 9: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/9.jpg)
FDA/GPS Validation Activities
Positive controls
– Examine data mining results for drug-event combinations corresponding to known “labeled” adverse reactions
Negative controls
– Examine data mining results for several drugs (with differing safety profiles) given for the same indication
“Roll back” database in time to determine when method would have provided first signal
![Page 10: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/10.jpg)
Databases of Spontaneous AE Reports
FDA Spontaneous Report System (SRS)
– Post-Marketing Surveillance of all Drugs since 1969
– Dates from mid-60’s thru 1997
– 1.5 Million Reports
– Encoded in COSTART
FDA Adverse Event Reporting System (AERS)
– US cases, serious unlabeled events from all manufacturers
– All products sold in the US, ~5000 Rx’s
– Replaced SRS in 1997
– Reactions coded as MedDRA PTs
– Quarterly Updates, 4-6 month delay
– Drugs are Verbatim
– Includes initial and some follow-up reports
– Includes Demographics, Reactions, Drugs, Outcomes, etc.
FDA/CDC Vaccine Adverse Events (VAERS)
– Stricter Laws for Vaccine Adverse Event Reporting
![Page 11: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/11.jpg)
Signal Detection Demonstration Using VAERS Data
![Page 12: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/12.jpg)
“Significant” EBGM and even extremely conservative EB05 with small N
![Page 13: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/13.jpg)
Simple Rankings by Signal Strength
![Page 14: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/14.jpg)
Evolution of Signals Over Time
![Page 15: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/15.jpg)
Multi-Symptom Syndromes (Higher Order Associations)
![Page 16: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/16.jpg)
The “Serotonin Syndrome”
Could MGPS be used to identify unknown syndromes?
Try mining the AERS data for “significant” event triples using a known syndrome.
"The symptoms of the serotonin syndrome are: euphoria, drowsiness, sustained rapid eye movement, overreaction of the reflexes, rapid muscle contraction and relaxation in the ankle causing abnormal movements of the foot, clumsiness, restlessness, feeling drunk and dizzy, muscle contraction and relaxation in the jaw, sweating, intoxication, muscle twitching, rigidity, high body temperature, mental status changes were frequent (including confusion and hypomania - a "happy drunk" state), shivering, diarrhea, loss of consciousness and death." (The Serotonin Syndrome, AM J PSYCHIATRY, June 1991)
![Page 17: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/17.jpg)
![Page 18: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/18.jpg)
Using Simulation to Test the Signal Detection Process
![Page 19: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/19.jpg)
Interpreting Simulation Parameters
1. As R ≈ P and (Q-R) ≈ (1-P) => “No Signal”
2. As R ≈ P and (Q-R) << (1-P) => “Strong Signal”
3. When R << P and (Q-R) ≈ (1-P) => “No Signal”
4. When R << P and (Q-R) << (1-P) => “Rare event”

| | Outcome: Yes | Outcome: No | Total |
|---|---|---|---|
| Exposure: Yes | R | P-R | P |
| Exposure: No | Q-R | 1-P-Q+R | 1-P |
| Total | Q | 1-Q | 1 |
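The relationship between the three simulation parameters and the four cells of the exposure-by-outcome table can be written out directly. This is a small sketch with hypothetical helper names; the parameter values are the ones used in the rare-event simulation slides of this tutorial.

```python
def cell_probabilities(p, q, r):
    """Return the four cells of the exposure-by-outcome table for
    exposure probability p, outcome probability q and joint probability r."""
    cells = {
        ("exposed", "outcome"): r,
        ("exposed", "no outcome"): p - r,
        ("unexposed", "outcome"): q - r,
        ("unexposed", "no outcome"): 1 - p - q + r,
    }
    if min(cells.values()) < 0:
        raise ValueError("need r <= min(p, q) and p + q - r <= 1")
    return cells


def relative_rate(p, q, r):
    """Joint probability divided by its value under independence (p * q)."""
    return r / (p * q)


cells = cell_probabilities(0.003, 0.001, 0.00003)
total = sum(cells.values())
rr = relative_rate(0.003, 0.001, 0.00003)
print(total, rr)
```

The cells always sum to 1, and R = P × Q corresponds to a relative rate of 1 (no association between exposure and outcome).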
![Page 20: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/20.jpg)
Using Simulation to Create a Receiver Operating Characteristic (ROC) Curve for EBGM
An ROC curve displays the true-positive rate (sensitivity) versus the false-positive rate (1 – specificity) for a statistic
Ran a 20-iteration simulation using P = 0.003, Q = 0.001 and R = 0.00003 (RR = 10) to check the true-positive rate
Ran a 20-iteration simulation using P = 0.003, Q = 0.001 and R = 0.000003 (so that R = P x Q and RR = 1) to check the false-positive rate
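Turning the two sets of simulation runs into ROC points is a matter of counting how often the statistic crosses each threshold. In the sketch below the score lists are hypothetical placeholders, not results from the tutorial, and `roc_points` is a name chosen for the example.

```python
def roc_points(signal_scores, null_scores, thresholds):
    """Return (false_positive_rate, true_positive_rate) for each threshold.

    A run is flagged when its score is at or above the threshold.
    """
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in signal_scores) / len(signal_scores)
        fpr = sum(s >= t for s in null_scores) / len(null_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical scores, one per simulation iteration.
signal_scores = [4.1, 2.7, 6.3, 1.8, 3.9, 5.2, 2.2, 7.0, 3.1, 4.6]  # runs with a true signal
null_scores = [0.9, 1.1, 0.7, 1.4, 2.1, 0.8, 1.0, 1.2, 0.6, 1.6]    # runs with RR = 1

curve = roc_points(signal_scores, null_scores, thresholds=[1.0, 2.0, 4.0, 8.0])
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Raising the threshold moves along the curve from the upper right (everything flagged) toward the origin (nothing flagged).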
![Page 21: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/21.jpg)
ROC Curve Based on Simulated Injection of Signals
![Page 22: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/22.jpg)
Simulating a Rare Event
Sample 100,000 records from VAERS data
Set P = 0.003, Q = 0.001, R = 0.00003
Iterate 20 Monte Carlo simulations
Expect (on average):
– 0.003 x 100,000 = 300 “Rare Exposures”
– 0.001 x 100,000 = 100 “Rare Outcomes”
– 0.00003 x 100,000 = 3 “Rare Exposure + Rare Outcome” combinations
– E = (300 x 100) / 100,000 = 0.3
– RR = 3 / 0.3 = 10
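The expected values above can be checked with a small Monte Carlo run. This sketch only draws synthetic exposure and outcome flags for each of the 100,000 records; it does not sample VAERS data or run MGPS, and the function name and seed are assumptions made for the example.

```python
import random


def simulate(n_records, p, q, r, iterations, seed=0):
    """Draw exposure/outcome flags for n_records per iteration and
    return (exposed, outcome, both) counts for each iteration."""
    rng = random.Random(seed)
    runs = []
    for _ in range(iterations):
        exposed = outcome = both = 0
        for _ in range(n_records):
            u = rng.random()
            if u < r:              # exposure and outcome (probability r)
                exposed += 1
                outcome += 1
                both += 1
            elif u < p:            # exposure only (probability p - r)
                exposed += 1
            elif u < p + q - r:    # outcome only (probability q - r)
                outcome += 1
        runs.append((exposed, outcome, both))
    return runs


runs = simulate(100_000, p=0.003, q=0.001, r=0.00003, iterations=20)
mean_exposed = sum(run[0] for run in runs) / len(runs)
mean_outcome = sum(run[1] for run in runs) / len(runs)
mean_both = sum(run[2] for run in runs) / len(runs)
expected = mean_exposed * mean_outcome / 100_000
print(mean_exposed, mean_outcome, mean_both, mean_both / expected)
```

Individual iterations vary considerably around the average of 3 joint cases, which is why the signal detection statistic is evaluated over repeated iterations rather than a single draw.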
![Page 23: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/23.jpg)
Base Simulation on VAERS Data
![Page 24: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/24.jpg)
Sample Cases From VAERS
![Page 25: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/25.jpg)
Sample 100,000 Cases
![Page 26: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/26.jpg)
P = 0.003
Q = 0.001
R = 0.00003
![Page 27: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/27.jpg)
20 Monte Carlo Iterations
![Page 28: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/28.jpg)
RareExposure Expected N = 300
![Page 29: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/29.jpg)
RareOutcome Expected N = 100
![Page 30: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/30.jpg)
RareExposure + RareOutcome Expected N = 3
Expected RR = 10
![Page 31: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/31.jpg)
Technical Details
William DuMouchel. Bayesian Data Mining in Large Frequency Tables (with Discussion). The American Statistician (1999) pp 177-190.
William DuMouchel and Daryl Pregibon. Empirical Bayes Screening for Multi-Item Associations. Proceedings of KDD 2001.
![Page 32: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/32.jpg)
Methodology History and Key Contributors
Stephen Evans
– MCA, UK
– Proportional reporting ratio (PRR) with chi-squared analyses
– Simple, highly intuitive, can be calculated by hand
Bate, Lindquist, Edwards et al.
– WHO Uppsala Monitoring Centre
– Bayesian neural network method for adverse drug reaction signal generation
Ana Szarfman, FDA (CDER) and Bill DuMouchel (AT&T)
– Empirical Bayes, more robust than PRR for small n
– MGPS method: statistical parameter is EBGM
  William DuMouchel. Bayesian Data Mining in Large Frequency Tables (with Discussion). The American Statistician (1999) pp 177-190.
  William DuMouchel and Daryl Pregibon. Empirical Bayes Screening for Multi-Item Associations. Proceedings of KDD 2001.
– Multidimensional analyses possible: interactions, gender and other demographic associates, syndrome identification
– Can directly compare EBGM values of different drugs, as well as for a specific drug
![Page 33: Detecting Multi-Item Associations and Temporal Trends Using the WebVDME/MGPS Application DIMACS Tutorial on Statistical and Other Analytic Health Surveillance](https://reader035.vdocument.in/reader035/viewer/2022062714/56649d405503460f94a1aa1e/html5/thumbnails/33.jpg)
Key Contributors (continued)
WHO Collaborating Center for International Drug Monitoring: M. Lindquist, M. Stahl, A. Bate, R. Edwards, R.H. Meyboom
– Bayesian confidence propagation neural network (BCPNN). Information Component (IC) statistic is the measure of the strength of the D:E (drug:event) relationship
– Iterative approach
L. Gould. Comparison and refinement of Bayesian approaches for evaluating spontaneous reports of ADRs. DIA Annual Meeting, July 2001 (Denver)
– EB vs. BCPNN: similar results
Thakrar, BT, Blesch, KS, Sacks, ST, Wilcock, K (2001) (ISPE, Pharmacoepid. & Drug Safety 10)
– PRR vs. EB: similar sensitivity; EB better at ranking events based on small N