validation of medical registries - · pdf filevalidation of national quality registries aldana...

Validation of National Quality

Registries

Aldana Rosso, PhD

Epidemiology and Register Centre South, ERC Syd

Skåne University Hospital, Lund. Sweden

[email protected]

2016-04-26

HEL-8020 Analyse av registerdata i forskning

mailto:[email protected]

http://www.aftonbladet.se/sportbladet/vintersport/skidor/article20057117.ab

Swedish National Diabetes Registry

https://www.ndr.nu/#/knappen

http://www.aftonbladet.se/sportbladet/vintersport/skidor/article20057117.ab




Vote: http://goo.gl/7DUCR0

Agenda

• Introduction

• Data quality

• Consequences of errors in the quality

registries

• An introduction to sampling theory

• Examples

National Quality Registries in

Sweden• There are around 100 national quality registries in

Sweden that receives central funding.

• They contain individualized information about patient problems, treatments, and outcomes of the treatments.

• http://www.kvalitetsregister.se/sekundarnavigering/inenglish.4.f647a59141ef67b4342a3c.html

Objectives for the Swedish National

Quality Registries• Improve quality in health-care.

• Follow-up treatments.

• Research.

• Law about ”Patientdata”: Chapter 7.

• http://www.riksdagen.se/sv/Dokument-Lagar/Lagar/Svenskforfattningssamling/Patientdatalag-2008355_sfs-2008-355/?bet=2008:355#K7

• In order to be able to use registerdata to

draw conclusions about healthcare, the

data needs to be somewhat correct.

Data Quality

• ISO 8402-1986: ”The totality of features

and characteristics of an entity that bears

on in its ability to satisfy stated and implied

needs”.

• Valideringshandboken: ”Data av hög

kvalitet är relevanta, fullständiga, korrekta

och konsistenta. ”

Data Quality

Data should be:

• Correct

• Complete

• Reliable (eg reasonable questions)

• The population of interest should be

represented

• Which information is needed to define a

registry?

Example: Swedish Intensive Care

Registry (SIR)

http://www.icuregswe.org/Documents/Education/Adminkurs/Utbildningsdagen_del1_

2014-11-27.pdf

Registry population

• Target population

All patients that need intensive care

”Intensivvård är en vårdnivå och inte en vårdplats. Intensivvård definieras som avancerad övervakning, diagnostik

eller behandling vid hotande eller manifest svikt i vitala funktioner.” http://www.icuregswe.org/sv/Om-SIR/

Registry population

• Registry population


Patients that fit in SIR’s

definition.

”Intensivvård är en vårdnivå och inte en vårdplats. Intensivvård definieras som avancerad övervakning, diagnostik

eller behandling vid hotande eller manifest svikt i vitala funktioner.” http://www.icuregswe.org/sv/Om-SIR/

Registry population

• Completness

Patienter som passar in i

SIR’s definition.

Patients

registered in

the database


Patienter som passar in i

SIR’s definition.

Registry population

Percentage of clinics participating in the

registry (anslutningsgrad)

Rapporterande kliniker

Andra kliniker

Example: Spanish National Acute Coronary

Syndrome Registry

Example: Spanish National Acute Coronary

Syndrome Registry

• Audit of the Spanish national acute coronary syndrome

registry [Ferreira-Gonzalez et al, Circulation:

Cardiovascular Quality and Outcomes,2009; 2: 540-547].

• They compared enrolled patients with those that were not

enrolled for some participating hospitals (17 of 50).

• Missed patients were of higher risk and received less

recommended therapies than the included patients. In-

hospital mortality was almost 3 times higher in the missed

population.

Consequences of Statistical

Analysis with Missing Data

• It depends on why the data is missing:

– missing at random: problematic from a power

perspective but it doesn’t bias the results.

– missing not at random: bias + power

problems.

Consequences of Statistical

Analysis with Missing Data

• Some calculations are more sensitive than

others.

• Ranking is specially bad!

Example: Calculations with

Missing Data

• Monte Carlo simulation: a registry with

20 000 primary operations and 3 hospitals.

• 5% of those patients have a reoperation.

• How does the percentage of missing

values affect the percentage of

reoperation?

Example: Calculations with

Missing data

• 20000 patients, 5 % have a reoperation.

• 5 % reoperations missing at random.

Hospital Proportion

Reoperation

Proportion

with missing

data

95 % CI

1 0.048 0.046 (0.045,0.047)

2 0.050 0.048 (0.047,0.049)

3 0.052 0.049 (0.048,0.050)

Example: Calculations with Missing

data: Which clinic is reoperating

more patients?

• 20000 patients, 5 % have a reoperation.

Percentage

missing data

Proportion

correct

ranking

5 % 95 %

7 % 87 %

Example: Swedish Knee

Arthroplasty Registry

IVA-diagnoser

http://portal.icuregswe.org/Rapport.aspx

Exempel: IVA-diagnoser

Diagnoser Antal

Z04.9Undersökning och observation av icke

specificerat skäl 3200

J96.9 Respiratorisk insufficiens, ospecificerad 2000

I46.9 Hjärtstillestånd, ospecificerad 1500

R57.2 Septisk chock 1500

R65.1Systemiskt inflammatoriskt svarssyndrom

[SIRS] av infektiöst ursprung med organsvikt 1500

T07.9 Icke specificerade multipla skador 1400

R56.8 Andra och icke specificerade kramper 1400

K92.2 Gastrointestinal blödning, ospecificerad 1200

J15.9 Bakteriell pneumoni, ospecificerad 1200


Exempel: IVA-diagnoser

Diagnoser Antal

Z04.9 3200

J96.9 2000

I46.9 1500

R57.2 1500

R65.1 1500

T07.9 1400

R56.8 1400

K92.2 1200

J15.9 1200


Andel felaktiga

diagnoser som

blev ”Z04.9”

Andel korrekta

ranking

2 % 4 % ( 2 % - 10 %)

5 % 5 % ( 2 % - 12 %)

Example: UK National Cardiac

Surgery Database

Example: Swedish National Diabetes

Register

Results

• Why do we have wrong data in the

registries?

Error Sources in National Quality

Registries

1. Before registration

2. During registration

3. After registration


Registries

1. Before registration:

– wrong data is registered in the journal.

– the patient is not enrolled in the registry.

– It is very difficult to estimate the error

proportion of this type of error.


Registries

2. During registration:

– misinterpretation and inaccurate typing: wrong value, wrong calculation, wrong alternative.

– incomplete data.

• Misinterpretation and inaccurate typing: 5 %.*

• Missing data: 3 %.*

*Defining and improving data quality in medical registries: A literature review, case study and generic framwork. Arts

D.G.T. et al, J Am Med Inform Assoc, 2002 (9) 600.


Registries

3. After registration:

– programming errors.

– Communication problems between different databases.

• They are very difficult to find and have severe consequences.



• Is automatic registration the solution?

Dutch National Intensive Care

Evaluation Registry (NICE)



NICE contains data from

patients who have been

admitted to Dutch

intensive care units and

provides insight into the

effectiveness and

efficiency of Dutch

intensive care.

Problem (exam question last

year)

• A small registry with 3000 patients and 3

participating clinics wants to validate its

data. Describe how you would do it.

Validation Using Sampling Theory

Principle: we only take a sample of some

patients and compare their journals to the

registered data. We extend this information to

all the patients in the registry.

References:

• “Sampling of populations”, P. Levy, 2008, Willey.

• “Sampling Essentials: Practical Guidelines for Making

Sampling Choices”, J. Daniel, 2012, SAGE Publications.

Simple Random Sampling

1. Select randomly some patients.

2. Estimate the proportion of incorrect data.

3. Extrapolate to the registry.

Registry

Simple Random Sampling

• Usually without replacement.

• It is easy to program.

• It may give a sample that doesn’t

represent the registry very well.

• “It may require more patients than other

sampling techniques”.

• Expensive (transportation cost).

Stratified Random Sampling

1. Divide the registry in strata (e.g. hospitals, regions, etc.).

2. Within each strata, select some patients.



Registry


• It is possible to estimate the proportion of error for each stratum (hospitals).

• It requires the participation in the validation of all the strata.

• The patients within the stratum can be selected in several ways:

– a fixed amount of patients are selected randomly within each stratum.

– selection of the same percentage as the stratum in the frame.


• At least as effective as SRS for the same

sample size.

• If information about the error distribution is

known, the design can be improved.

• It gives a more representative sample.

• Expensive (transportation cost).


Registry

1

2

3

4Registry

1

2

3

4

Efficient

Stratification

Non-efficient

Stratification

Cluster Sampling1. Divide the registry in

clusters (e.g. hospitals, regions, ect.).

2. Select some clusters.

3. Within each cluster, select some patients.



Registry

Cluster Sampling

• Multistage with different sampling weights.

For example:

– Level 1: cluster region.

– Level 2: cluster hospital.

– Level 3: sampling units patients.

• It can be used in combination with

stratification.

Cluster Sampling

• Lower transportation cost.

• Only some hospitals are represented in

the validation.

• It requires more patients than SRS to

achieve the same precision.

Cluster Sampling

Registry

1

2

3

4Registry

1

2

3

4

Non-efficient

cluster

Efficient cluster

How to compare the different

designs?

• Design effect (Deff): A measure of precision of the sampling design compared to SRS.

• Deff= variancedesign/varianceSRS

– D= 1 The sampling design is equivalent to SRS.

– D<1 The sampling design is increases precision relative to SRS.

– D>1 The sampling design has lower precision relative to SRS.

Self-Weighting Designs

• Each patient has the same probability of being selected.

• Easier math.

• Examples:

– SRS

– Stratification + probability proportional to size (PPS)

– PPS at cluster level + same amount of patients

– SRS at cluster level + PPS

Non-Self-Weighting Designs

• Each patient has different probability of being selected.

• More complicated math.

• More flexible designs.

• Examples:– Stratification + SRS

– PPS at cluster level + SRS

– SRS at cluster level +Same amount of patients

Question

• A large registry (30000 patients with 40

clinics) wants to validate the data. How

would you do it?

Example: The Swedish Corneal

Transplant Registry

• The registry started in 1996.

• Coverage: around 95 %.

• 7 hospitals.

• Data collected at the operation and 2-years follow-up.

• http://www.eyenetsweden.se/page/27/the-swedish-corneal-transplant-register.aspx


Transplant Registry

• Objective: Find a simple design to

estimate the errors in 9 variables for the

period 2008-2012.

• Assumptions

– Expected percentage error 2-3 %.

– All hospitals will participate.


Transplant Registry


Transplant Registry

• SRS

• Monte Carlo simulations to estimate the

sample size.

Expected percentage

incorrect data

Accepted percentage

incorrect data

Sample size

3 % 4% 1000

3% 5% 500

2% 4% 430

2% 5% 300


Transplant Registry

Variables Incorrect data % LCI 95 % UCI 95 %

Additional surgery 0,3 0,1 2,1

Operation type 2,7 1,4 5,1

Visual acuity operated eye 3,7 2,1 6,4

Lenses status 3,4 1,9 6,0

Diagnosis 4,7 2,9 7,6

Previously transplanted on

the other eye 4,1 2,4 6,8

Operated eye 2,0 1,0 4,3

Gender 2,7 1,4 5,1

Birthday year 5,4 3,4 8,4

Example: UK National Cardiac

Surgery Database

Which information do we get from

the validation?

• Estimation about the errors in the

database.

• Is the data missing at random (MAR)?

• Is the misclassification non-differential?

• How does the registration process work?

What happens after validation?

• Ethical and legal viewpoints: We should

correct the data (Sweden).

• Bias concerns?

• Information about the data quality and the

missing data in the publications?

Software

• Stata: svy set. Programs available from

ERC Syd.

• SAS: proc surveyselect.

• R: Package “Sampling”.

How to Minimize Errors in the

Registries

http://www.who.int/bulletin/volumes/90/5/11-099580/en/

Summary• There is always going to be errors and missing values in the

database but we need to characterize them to be able to trust the results from the analysis.

• The results from the validation give information about the data stored in the registry.

• Not all missing data is MAR and not all misclassification is non-differential.

• Sampling theory helps us to design a sample that represents the registry.

• It doesn’t have to be very expensive to validate a registry.