validation of medical registries - · pdf filevalidation of national quality registries aldana...
TRANSCRIPT
Validation of National Quality
Registries
Aldana Rosso, PhD
Epidemiology and Register Centre South, ERC Syd
Skåne University Hospital, Lund. Sweden
2016-04-26
HEL-8020 Analyse av registerdata i forskning
http://www.aftonbladet.se/sportbladet/vintersport/skidor/article20057117.ab
Swedish National Diabetes Registry
https://www.ndr.nu/#/knappen
http://www.aftonbladet.se/sportbladet/vintersport/skidor/article20057117.ab
Swedish National Diabetes Registry
https://www.ndr.nu/#/knappen
Vote: http://goo.gl/7DUCR0
Agenda
• Introduction
• Data quality
• Consequences of errors in the quality
registries
• An introduction to sampling theory
• Examples
National Quality Registries in
Sweden• There are around 100 national quality registries in
Sweden that receives central funding.
• They contain individualized information about patient problems, treatments, and outcomes of the treatments.
• http://www.kvalitetsregister.se/sekundarnavigering/inenglish.4.f647a59141ef67b4342a3c.html
Objectives for the Swedish National
Quality Registries• Improve quality in health-care.
• Follow-up treatments.
• Research.
• Law about ”Patientdata”: Chapter 7.
• http://www.riksdagen.se/sv/Dokument-Lagar/Lagar/Svenskforfattningssamling/Patientdatalag-2008355_sfs-2008-355/?bet=2008:355#K7
• In order to be able to use registerdata to
draw conclusions about healthcare, the
data needs to be somewhat correct.
Data Quality
• ISO 8402-1986: ”The totality of features
and characteristics of an entity that bears
on in its ability to satisfy stated and implied
needs”.
• Valideringshandboken: ”Data av hög
kvalitet är relevanta, fullständiga, korrekta
och konsistenta. ”
Data Quality
Data should be:
• Correct
• Complete
• Reliable (eg reasonable questions)
• The population of interest should be
represented
• Which information is needed to define a
registry?
Example: Swedish Intensive Care
Registry (SIR)
http://www.icuregswe.org/Documents/Education/Adminkurs/Utbildningsdagen_del1_
2014-11-27.pdf
Registry population
• Target population
All patients that need intensive care
”Intensivvård är en vårdnivå och inte en vårdplats. Intensivvård definieras som avancerad övervakning, diagnostik
eller behandling vid hotande eller manifest svikt i vitala funktioner.” http://www.icuregswe.org/sv/Om-SIR/
Registry population
• Registry population
All patients that need intensive care
Patients that fit in SIR’s
definition.
”Intensivvård är en vårdnivå och inte en vårdplats. Intensivvård definieras som avancerad övervakning, diagnostik
eller behandling vid hotande eller manifest svikt i vitala funktioner.” http://www.icuregswe.org/sv/Om-SIR/
Registry population
• Completness
Patienter som passar in i
SIR’s definition.
Patients
registered in
the database
All patients that need intensive care
Patienter som passar in i
SIR’s definition.
Registry population
Percentage of clinics participating in the
registry (anslutningsgrad)
Rapporterande kliniker
Andra kliniker
Example: Spanish National Acute Coronary
Syndrome Registry
Example: Spanish National Acute Coronary
Syndrome Registry
• Audit of the Spanish national acute coronary syndrome
registry [Ferreira-Gonzalez et al, Circulation:
Cardiovascular Quality and Outcomes,2009; 2: 540-547].
• They compared enrolled patients with those that were not
enrolled for some participating hospitals (17 of 50).
• Missed patients were of higher risk and received less
recommended therapies than the included patients. In-
hospital mortality was almost 3 times higher in the missed
population.
Consequences of Statistical
Analysis with Missing Data
• It depends on why the data is missing:
– missing at random: problematic from a power
perspective but it doesn’t bias the results.
– missing not at random: bias + power
problems.
Consequences of Statistical
Analysis with Missing Data
• Some calculations are more sensitive than
others.
• Ranking is specially bad!
Example: Calculations with
Missing Data
• Monte Carlo simulation: a registry with
20 000 primary operations and 3 hospitals.
• 5% of those patients have a reoperation.
• How does the percentage of missing
values affect the percentage of
reoperation?
Example: Calculations with
Missing data
• 20000 patients, 5 % have a reoperation.
• 5 % reoperations missing at random.
Hospital Proportion
Reoperation
Proportion
with missing
data
95 % CI
1 0.048 0.046 (0.045,0.047)
2 0.050 0.048 (0.047,0.049)
3 0.052 0.049 (0.048,0.050)
Example: Calculations with Missing
data: Which clinic is reoperating
more patients?
• 20000 patients, 5 % have a reoperation.
Percentage
missing data
Proportion
correct
ranking
5 % 95 %
7 % 87 %
Example: Swedish Knee
Arthroplasty Registry
IVA-diagnoser
http://portal.icuregswe.org/Rapport.aspx
Exempel: IVA-diagnoser
Diagnoser Antal
Z04.9Undersökning och observation av icke
specificerat skäl 3200
J96.9 Respiratorisk insufficiens, ospecificerad 2000
I46.9 Hjärtstillestånd, ospecificerad 1500
R57.2 Septisk chock 1500
R65.1Systemiskt inflammatoriskt svarssyndrom
[SIRS] av infektiöst ursprung med organsvikt 1500
T07.9 Icke specificerade multipla skador 1400
R56.8 Andra och icke specificerade kramper 1400
K92.2 Gastrointestinal blödning, ospecificerad 1200
J15.9 Bakteriell pneumoni, ospecificerad 1200
http://portal.icuregswe.org/Rapport.aspx
Exempel: IVA-diagnoser
Diagnoser Antal
Z04.9 3200
J96.9 2000
I46.9 1500
R57.2 1500
R65.1 1500
T07.9 1400
R56.8 1400
K92.2 1200
J15.9 1200
http://portal.icuregswe.org/Rapport.aspx
Andel felaktiga
diagnoser som
blev ”Z04.9”
Andel korrekta
ranking
2 % 4 % ( 2 % - 10 %)
5 % 5 % ( 2 % - 12 %)
Example: UK National Cardiac
Surgery Database
Example: Swedish National Diabetes
Register
Results
• Why do we have wrong data in the
registries?
Error Sources in National Quality
Registries
1. Before registration
2. During registration
3. After registration
Error Sources in National Quality
Registries
1. Before registration:
– wrong data is registered in the journal.
– the patient is not enrolled in the registry.
– It is very difficult to estimate the error
proportion of this type of error.
Error Sources in National Quality
Registries
2. During registration:
– misinterpretation and inaccurate typing: wrong value, wrong calculation, wrong alternative.
– incomplete data.
• Misinterpretation and inaccurate typing: 5 %.*
• Missing data: 3 %.*
*Defining and improving data quality in medical registries: A literature review, case study and generic framwork. Arts
D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.
Error Sources in National Quality
Registries
3. After registration:
– programming errors.
– Communication problems between different databases.
• They are very difficult to find and have severe consequences.
*Defining and improving data quality in medical registries: A literature review, case study and generic framwork. Arts
D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.
• Is automatic registration the solution?
Dutch National Intensive Care
Evaluation Registry (NICE)
*Defining and improving data quality in medical registries: A literature review, case study and generic framwork. Arts
D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.
NICE contains data from
patients who have been
admitted to Dutch
intensive care units and
provides insight into the
effectiveness and
efficiency of Dutch
intensive care.
Problem (exam question last
year)
• A small registry with 3000 patients and 3
participating clinics wants to validate its
data. Describe how you would do it.
Validation Using Sampling Theory
Principle: we only take a sample of some
patients and compare their journals to the
registered data. We extend this information to
all the patients in the registry.
References:
• “Sampling of populations”, P. Levy, 2008, Willey.
• “Sampling Essentials: Practical Guidelines for Making
Sampling Choices”, J. Daniel, 2012, SAGE Publications.
Simple Random Sampling
1. Select randomly some patients.
2. Estimate the proportion of incorrect data.
3. Extrapolate to the registry.
Registry
Simple Random Sampling
• Usually without replacement.
• It is easy to program.
• It may give a sample that doesn’t
represent the registry very well.
• “It may require more patients than other
sampling techniques”.
• Expensive (transportation cost).
Stratified Random Sampling
1. Divide the registry in strata (e.g. hospitals, regions, etc.).
2. Within each strata, select some patients.
3. Estimate the proportion of incorrect data.
4. Extrapolate to the registry.
Registry
Stratified Random Sampling
• It is possible to estimate the proportion of error for each stratum (hospitals).
• It requires the participation in the validation of all the strata.
• The patients within the stratum can be selected in several ways:
– a fixed amount of patients are selected randomly within each stratum.
– selection of the same percentage as the stratum in the frame.
Stratified Random Sampling
• At least as effective as SRS for the same
sample size.
• If information about the error distribution is
known, the design can be improved.
• It gives a more representative sample.
• Expensive (transportation cost).
Stratified Random Sampling
Registry
1
2
3
4Registry
1
2
3
4
Efficient
Stratification
Non-efficient
Stratification
Cluster Sampling1. Divide the registry in
clusters (e.g. hospitals, regions, ect.).
2. Select some clusters.
3. Within each cluster, select some patients.
4. Estimate the proportion of incorrect data.
5. Extrapolate to the registry.
Registry
Cluster Sampling
• Multistage with different sampling weights.
For example:
– Level 1: cluster region.
– Level 2: cluster hospital.
– Level 3: sampling units patients.
• It can be used in combination with
stratification.
Cluster Sampling
• Lower transportation cost.
• Only some hospitals are represented in
the validation.
• It requires more patients than SRS to
achieve the same precision.
Cluster Sampling
Registry
1
2
3
4Registry
1
2
3
4
Non-efficient
cluster
Efficient cluster
How to compare the different
designs?
• Design effect (Deff): A measure of precision of the sampling design compared to SRS.
• Deff= variancedesign/varianceSRS
– D= 1 The sampling design is equivalent to SRS.
– D<1 The sampling design is increases precision relative to SRS.
– D>1 The sampling design has lower precision relative to SRS.
Self-Weighting Designs
• Each patient has the same probability of being selected.
• Easier math.
• Examples:
– SRS
– Stratification + probability proportional to size (PPS)
– PPS at cluster level + same amount of patients
– SRS at cluster level + PPS
Non-Self-Weighting Designs
• Each patient has different probability of being selected.
• More complicated math.
• More flexible designs.
• Examples:– Stratification + SRS
– PPS at cluster level + SRS
– SRS at cluster level +Same amount of patients
Question
• A large registry (30000 patients with 40
clinics) wants to validate the data. How
would you do it?
Example: The Swedish Corneal
Transplant Registry
• The registry started in 1996.
• Coverage: around 95 %.
• 7 hospitals.
• Data collected at the operation and 2-years follow-up.
• http://www.eyenetsweden.se/page/27/the-swedish-corneal-transplant-register.aspx
Example: The Swedish Corneal
Transplant Registry
• Objective: Find a simple design to
estimate the errors in 9 variables for the
period 2008-2012.
• Assumptions
– Expected percentage error 2-3 %.
– All hospitals will participate.
Example: The Swedish Corneal
Transplant Registry
Example: The Swedish Corneal
Transplant Registry
• SRS
• Monte Carlo simulations to estimate the
sample size.
Expected percentage
incorrect data
Accepted percentage
incorrect data
Sample size
3 % 4% 1000
3% 5% 500
2% 4% 430
2% 5% 300
Example: The Swedish Corneal
Transplant Registry
Variables Incorrect data % LCI 95 % UCI 95 %
Additional surgery 0,3 0,1 2,1
Operation type 2,7 1,4 5,1
Visual acuity operated eye 3,7 2,1 6,4
Lenses status 3,4 1,9 6,0
Diagnosis 4,7 2,9 7,6
Previously transplanted on
the other eye 4,1 2,4 6,8
Operated eye 2,0 1,0 4,3
Gender 2,7 1,4 5,1
Birthday year 5,4 3,4 8,4
Example: UK National Cardiac
Surgery Database
Which information do we get from
the validation?
• Estimation about the errors in the
database.
• Is the data missing at random (MAR)?
• Is the misclassification non-differential?
• How does the registration process work?
What happens after validation?
• Ethical and legal viewpoints: We should
correct the data (Sweden).
• Bias concerns?
• Information about the data quality and the
missing data in the publications?
Software
• Stata: svy set. Programs available from
ERC Syd.
• SAS: proc surveyselect.
• R: Package “Sampling”.
How to Minimize Errors in the
Registries
http://www.who.int/bulletin/volumes/90/5/11-099580/en/
Summary• There is always going to be errors and missing values in the
database but we need to characterize them to be able to trust the results from the analysis.
• The results from the validation give information about the data stored in the registry.
• Not all missing data is MAR and not all misclassification is non-differential.
• Sampling theory helps us to design a sample that represents the registry.
• It doesn’t have to be very expensive to validate a registry.