Master Thesis
Determining optimal cut-offs in the meta-analysis of diagnostic test
accuracy studies
Author:
Susanne Steinhauser
Supervisors:
Prof. Dr. Martin Schumacher
Dr. Gerta Rücker
September 28, 2015
FAKULTÄT FÜR MATHEMATIK
UND PHYSIK
This page contains personal data and is therefore not approved for online publication.
Abstract
In some systematic reviews of diagnostic test accuracy studies, several studies report more than one cut-off value and the corresponding values of sensitivity and specificity. Until now, however, there has been no widely used meta-analysis approach that uses this information. The traditional bivariate model, for example, assumes only one pair of sensitivity and specificity per study. But as more information is available, it is only reasonable to make use of it.
We therefore describe a new approach utilizing such data. It is based on the idea of estimating the distribution functions of the underlying biomarker in the non-diseased and diseased study populations. We assume a normally or logistically distributed biomarker and estimate different distribution parameters in both groups. This is achieved through a linear regression of the (probit or logit) transformed proportions of negative test results of the non-diseased and diseased individuals, respectively, using a mixed effects model with study as grouping factor. We present a number of possible mixed models. Once both distribution functions are estimated, they give the pooled sensitivity and specificity at a specific cut-off, and the summary receiver operating characteristic (SROC) curve follows directly. Furthermore, the difference of the distribution functions is the Youden index, so that an optimal cut-off across studies can be determined by maximizing the Youden index.
The approach is applied to several examples, almost all leading to convincing results. An extensive simulation study was conducted, showing the strengths and limitations of the approach.
Zusammenfassung in deutscher Sprache
For systematic reviews of diagnostic studies in which some studies report more than one cut-off value and the corresponding sensitivity and specificity, there is currently no widely used meta-analysis approach that uses this information. The traditional bivariate model, for example, assumes only one pair of sensitivity and specificity per study. If more information is available, however, it makes sense to use it.
We describe a new approach that uses exactly this kind of data. The basic idea is to estimate the distribution function of the underlying biomarker in the group of non-diseased and in the group of diseased patients, respectively. We take a parametric approach (normally or logistically distributed biomarker) and estimate different distribution parameters for the two groups. This is achieved by linear regression of the (probit or logit) transformed proportions of negative test results in the two groups using a mixed model with study as grouping factor. We present a variety of possible mixed models. Once both distribution functions have been estimated, they yield the pooled sensitivity and specificity for a specific cut-off, and the estimated summary receiver operating characteristic (SROC) curve follows directly. Moreover, the difference of the distribution functions is the Youden index, so that the optimal cut-off across all studies can be determined by maximizing the Youden index.
The new approach is demonstrated on several examples, almost all of which lead to very convincing results. An extensive simulation study was conducted, showing the strengths and limitations of the approach.
Contents
Acknowledgements i
Abstract iii
Zusammenfassung in deutscher Sprache iv
Abbreviations vii
Chapter 1. Introduction 1
Chapter 2. Background 3
2.1. Diagnostic Test Accuracy Studies 3
2.2. Meta-Analyses of Diagnostic Test Accuracy Studies 9
2.3. Traditional Approaches 11
Chapter 3. Theory 15
3.1. Motivation 15
3.2. Existing Approaches 15
3.3. Novel Approach 17
3.4. Model Selection 36
3.5. Implementation in R 38
3.6. Weighting Parameters 40
Chapter 4. Examples 45
4.1. Troponin as a marker for myocardial infarction 45
4.2. Procalcitonin as a marker for sepsis 49
4.3. Procalcitonin as a marker for neonatal sepsis 52
4.4. CAGE Questionnaire 53
Chapter 5. Simulation Study 55
5.1. Design 55
5.2. Results 57
Chapter 6. Discussion 65
Chapter 7. Conclusion 69
Appendix A. Data Sets 71
Appendix B. R Code 75
B.1. Code Novel Approach 75
B.2. Code Simulation Study 94
Appendix C. Simulation Study Plots 105
Bibliography 109
Abbreviations
DTA    Diagnostic Test Accuracy
TN     True Negatives
TP     True Positives
FN     False Negatives
FP     False Positives
TNR    True Negative Rate
TPR    True Positive Rate
ROC    Receiver Operating Characteristic
SROC   Summary Receiver Operating Characteristic
REML   Restricted Maximum Likelihood
AIC    Akaike Information Criterion
cAIC   conditional Akaike Information Criterion
MSE    Mean Squared Error
CHAPTER 1
Introduction
The number of clinical studies published every year is growing rapidly (Ressing et al., 2009). For example, there are more than 70 studies
investigating the predictive ability of procalcitonin (a prohormone that
can be measured in the blood) regarding sepsis.¹ Studies like these are
called diagnostic test accuracy (DTA) studies, as they investigate the
performance of a diagnostic test, in this case based on the biomarker
procalcitonin. DTA studies often report two measures: sensitivity and
specificity. These measures stand for the success rate of the diagnostic
test in diseased and non-diseased individuals and depend on the cho-
sen cut-off value of the biomarker. This flood of clinical studies needs
to be structured so that researchers and clinicians do not lose track.
Therefore, systematic reviews with meta-analyses are inevitable. They
collect and summarize study results regarding one subject matter ques-
tion.
For example, Wacker et al. (2013) conducted a meta-analysis about
procalcitonin as a diagnostic marker for sepsis to give an overview of
the current state of research. Despite mentioning that some studies
reported sensitivity and specificity at different cut-offs, they used only
one pair of sensitivity and specificity per study for their meta-analysis.
This was due to the use of a traditional approach for meta-analyses of
DTA studies, which only allows one pair of sensitivity and specificity
per study. Subsequently, further meta-analyses appeared in which several studies reported more than one cut-off and the corresponding values of sensitivity and specificity. This raised the questions: Which is the right cut-off to choose as a meta-analyst, and how can we use the full information provided by the studies?
There are already existing approaches which face this problem and
make use of more than one pair of sensitivity and specificity per study
¹This number results from a brief PubMed search about meta-analyses on this subject, counting the included studies.
(see for example Hamza et al. (2009), Putter et al. (2010) and Martínez-Camblor (2014)). As more data is used, the results are expected to be more reliable and the biomarker is evaluated at its best. Ultimately, patients will benefit if the best-performing biomarker can be identified out of a group of potential biomarkers and used in practice.
Furthermore, it is of interest to know at which cut-off value of the
biomarker this performance can be expected. That is why meta-analysts
have asked how to determine an optimal cut-off across all studies.
In this thesis we present a new approach for a meta-analysis of DTA
studies responding to these issues. We elaborated, refined and applied
an idea suggested by G. Rücker one year ago. This new approach uses data in which several studies report more than one cut-off and the corresponding values of sensitivity and specificity, and it leads to pooled sensitivity and specificity as well as to an optimal cut-off value across
studies. The fundamental idea is to estimate the distribution functions
of the biomarker within the diseased and non-diseased individuals us-
ing a linear mixed effects model.
This thesis is organized as follows: In the second chapter, we give background information about diagnostic test accuracy
studies and meta-analyses of these and briefly introduce two traditional
approaches. Then, in chapter 3, we present our new approach. First, we
give a motivation and briefly describe some existing approaches. Step
by step, we explain the procedure, touch on the subject of model selec-
tion, introduce the implementation in R and give two weighting options.
In chapter 4 we show some examples taken from current meta-analyses.
A simulation study evaluating the performance of our new approach is
presented in chapter 5. In chapter 6 we discuss the approach and the
results of the evaluations and we finish with a conclusion in the last
chapter.
CHAPTER 2
Background
2.1. Diagnostic Test Accuracy Studies
2.1.1. Diagnostic test accuracy study. As the subject of this
thesis is meta-analysis of diagnostic test accuracy studies, we first want
to take a closer look at this special type of study. This chapter follows
chapter 9 of Schwarzer et al. (2015).
A diagnostic test accuracy study investigates if and how well a diagnos-
tic test can recognize or rule out a disease. A test can be, for example,
based on a questionnaire testing for alcoholism and we want to know if
the questions can distinguish correctly between harmful and harmless
alcohol consumption. A test could also be based on a biomarker. As
defined by the World Health Organisation (2001), "a biomarker is any substance, structure or process that can be measured in the body or its products and influence or predict the incidence of outcome or disease".
An example for a biomarker is the concentration of the prohormone
procalcitonin in the blood, where high concentration can be an indica-
tor for sepsis.
Moreover, a DTA study can provide indications for treatment decisions
of physicians. A study could, for example, report a threshold of the
concentration of procalcitonin that - if exceeded - should be seen as an
indicator of sepsis.
For DTA studies one assumes a fully accurate gold standard, that is, one knows exactly which of the patients are ill and which are healthy. Of course this assumption cannot be upheld entirely in most cases. However, it should be approximated as closely as possible.
Conducting a DTA study, one needs two groups of patients: one group
composed of diseased (D+) and one group composed of non-diseased
individuals (D−). Without loss of generality, we assume a positive test
indicates illness. T+ will stand for patients with a positive test result,
Test result \ Disease    D+    D−    Total
T+                       TP    FP    TP+FP
T−                       FN    TN    FN+TN
Total                    n1    n0    n

Table 2.1. T+ denotes a positive test result, T− a negative test result; D+ denotes diseased, D− non-diseased individuals; n1 is the number of individuals in the diseased group, n0 the number of individuals in the non-diseased group and n the overall study population. 'TP' denotes true positives, 'FP' false positives, 'FN' false negatives and 'TN' true negatives.
whereas T− will indicate a negative test result. As the test to be examined is presumably not perfect, we get the fourfold table shown in table 2.1. A number of diseased individuals will have a positive test result, the true positives, denoted 'TP', but a number of diseased individuals will wrongly have a negative test result, the false negatives, denoted 'FN'. On the other hand, a number of non-diseased individuals will correctly test negative, denoted 'TN' (true negatives), whereas some will incorrectly test positive, denoted 'FP' (false positives).
The number of diseased individuals is denoted by n1 and the number of
the non-diseased by n0. In the whole thesis we will use the subscripts
1 and 0 to distinguish between diseased and non-diseased, respectively.
2.1.2. Definition of sensitivity and specificity. To rate or compare tests, several measures have been developed. We want to present the two most common ones, according to Honest and Khan (2002): sensitivity and specificity.
Sensitivity (Se) is the probability of a positive test result, given the person has the disease:

    Se = P(T+ | D+).
This probability can be estimated as

    Ŝe = TP / n1,

and is then also called the true positive rate (TPR). In contrast, specificity (Sp) is the probability of a negative test result, given the person is non-diseased:

    Sp = P(T− | D−).

To estimate this probability we again use the values of the fourfold table:

    Ŝp = TN / n0,

and call it the true negative rate (TNR). As one deduces from the definitions, it is desirable that both sensitivity and specificity are close to one, as they state probabilities of correct decisions. But as we will easily see in the next paragraph, there is a trade-off between these two measures, so in most cases they cannot both be maximised at the same time.
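These estimators are straightforward to compute. As a minimal sketch (the thesis's own implementation, listed in Appendix B, is in R; this illustration uses Python, and the function name is ours), consider the first row of Table 2.2:

```python
# Sketch in Python (the thesis's own code, listed in Appendix B, is in R):
# estimating sensitivity and specificity from a fourfold table. The function
# name is ours; the example values are the first row of Table 2.2.
def estimate_se_sp(tp, fp, fn, tn):
    n1 = tp + fn  # number of diseased individuals
    n0 = tn + fp  # number of non-diseased individuals
    se = tp / n1  # true positive rate (TPR)
    sp = tn / n0  # true negative rate (TNR)
    return se, sp

# Example: study 1 of Table 2.2 at cut-off 5
se, sp = estimate_se_sp(tp=106, fp=131, fn=4, tn=91)
print(se, sp)
```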
2.1.3. Tests based on a continuous marker. In the following
we consider a continuous biomarker X and want to design a test based
on its value. We imagine for instance a substance in the blood that
we can measure. Without loss of generality, we assume that a higher
marker value indicates a higher probability of illness, as is true for most biomarkers. Therefore, plotting the probability of having
a specific marker value for diseased and non-diseased individuals (plot
2.1), we see the distribution of the diseased further to the right. To
create a test based on this marker, we choose a specific marker value as
cut-off value and all individuals with marker values higher than the cut-
off value get a positive test result whereas individuals with a smaller
marker value get a negative test result.
As can be seen in graphic 2.1, choosing a cut-off value results in a
fourfold table as in table 2.1. That way, every choice of a cut-off leads
to a pair of estimated sensitivity and specificity. In this thesis we will
use the terms 'cut-off value' and 'threshold' as synonyms. Increasing the threshold leads to more negative test results and fewer positive ones; decreasing it leads to more positive and fewer negative test results. Thus
Figure 2.1. Distributions of a continuous biomarker; the left curve is that of the non-diseased and the right one that of the diseased. The cut-off value 1.5 leads to a fourfold table with true negatives (TN), false negatives (FN), true positives (TP) and false positives (FP).
increasing the threshold results in an increasing specificity and a de-
creasing sensitivity and vice versa for a decreasing threshold.
As in general the density functions will overlap, there is no cut-off value
which leads to a sensitivity and specificity both equal to one. Instead,
we have to find a trade-off between sensitivity and specificity.
2.1.4. Receiver operating characteristic curve. To plot the
triples of cut-off value, sensitivity and specificity in a two-dimensional plot, there are two common ways to proceed. First, we want to introduce the receiver operating characteristic (ROC) curve. This curve,
originally developed in the signal detection theory, was later used in
medical diagnostics (Lusted, 1971). This subsection is based on Schu-
macher and Schulgen (2008, p. 330 ff.).
In this approach we keep sensitivity and specificity as pairs of a com-
mon cut-off, but neglect the value of the cut-off. By plotting these pairs
as sensitivity against one minus specificity, both on the range [0, 1], we get the ROC curve (see plot 2.2).

Figure 2.2. ROC curve of a normally distributed biomarker. The dot represents the values of sensitivity and specificity at cut-off 1.5.
Each point on the curve represents the pair of sensitivity and specificity for one cut-off. If a specific cut-off value is marked as a dot in the graphic, increasing the cut-off value moves the dot downwards, towards the origin.
Throughout this thesis we will consider two classes of distributions for the biomarker X: the normal distribution and the logistic distribution. Depending on the disease status we will choose different parameters.
First we assume that the biomarker X is normally distributed. For the diseased, the distribution is described by N(µ1, σ1²) and for the non-diseased by N(µ0, σ0²), where µ1 is greater than µ0 and N(µ, σ²) is the normal distribution with mean µ and variance σ². Then
the ROC curve is given by

    Se(x) = 1 − Φ_{µ1,σ1}( Φ⁻¹_{µ0,σ0}(1 − x) ),   0 ≤ x ≤ 1,   (1)
where Φ_{µ,σ} is the distribution function of the normal distribution with parameters µ and σ. Since a test is better the higher sensitivity and specificity are, the ROC curve should ideally run close to the upper left corner.
In the following we want to consider the biomarker X to be logistically
distributed.
Definition 2.1 (Logistic distribution). Let the continuous random variable X be logistically distributed with location parameter µ and dispersion parameter σ, σ > 0. We write X ∼ Logistic(µ, σ). Then the density function is given by

    f_{µ,σ}(x) = exp(−(x − µ)/σ) / ( σ [1 + exp(−(x − µ)/σ)]² ).

Therewith, the distribution function, named the expit function, is given by

    expit_{µ,σ}(x) = 1 / (1 + exp(−(x − µ)/σ)).

The inverse of the expit function is called the logit function:

    logit_{µ,σ}(x) = µ + σ log( x / (1 − x) ).

The terms 'logit' and 'expit' without indices refer to the standard parameter choice µ = 0 and σ = 1.
The mean of the random variable X is E(X) = µ and the variance Var(X) = σ²π²/3.
Let the distribution of the diseased be described by Logistic(µ1,σ1)
and of the non-diseased by Logistic(µ0,σ0), where again µ1 is greater
than µ0. Then the ROC curve is given by
    Se(x) = 1 − expit_{µ1,σ1}( logit_{µ0,σ0}(1 − x) ),   0 ≤ x ≤ 1.
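Both parametric ROC curves can be evaluated numerically. The following sketch (in Python, whereas the thesis's own code is in R; all parameter values are illustrative assumptions, not taken from any example in the thesis) implements equation (1) and its logistic analogue:

```python
from math import exp, log
from statistics import NormalDist

# Sketch in Python (the thesis's own code is in R): evaluating the parametric
# ROC curves Se(x) = 1 - F1(F0^{-1}(1 - x)), where F0 and F1 are the biomarker
# distribution functions in the non-diseased and diseased group, respectively.
# All parameter values below are illustrative assumptions.

def roc_normal(x, mu0, sd0, mu1, sd1):
    # equation (1): Se(x) = 1 - Phi_{mu1,sd1}(Phi^{-1}_{mu0,sd0}(1 - x))
    cutoff = NormalDist(mu0, sd0).inv_cdf(1 - x)  # cut-off with specificity 1 - x
    return 1 - NormalDist(mu1, sd1).cdf(cutoff)

def expit(c, mu, s):
    # distribution function of Logistic(mu, s)
    return 1 / (1 + exp(-(c - mu) / s))

def logit(x, mu, s):
    # inverse of the expit function
    return mu + s * log(x / (1 - x))

def roc_logistic(x, mu0, s0, mu1, s1):
    # logistic analogue of equation (1)
    return 1 - expit(logit(1 - x, mu0, s0), mu1, s1)

# Example with mu1 > mu0, as assumed throughout the chapter:
print(roc_normal(0.5, mu0=0.0, sd0=1.0, mu1=2.0, sd1=1.0))
print(roc_logistic(0.5, mu0=0.0, s0=1.0, mu1=2.0, s1=1.0))
```

Plotting these functions over x ∈ (0, 1) reproduces curves of the shape shown in plot 2.2.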
2.1.5. Youden index. Another way to depict the triples of cut-off value, sensitivity and specificity is a Youden index plot (see plot 2.3 at the left). To reduce one dimension, the sum of sensitivity and specificity minus one is plotted against the cut-off values.
Figure 2.3. Left: Youden index curve; the optimal threshold is derived as the threshold where the maximum is obtained. Right: Summary ROC curve with the optimal threshold at 1.1 (λ = 0.5) marked as a dot.
Definition 2.2 (Youden index). The Youden index for a cut-off value x is defined by

    Y(x) = Se(x) + Sp(x) − 1.   (2)
The point where the Youden index is maximized can be seen as
an optimal cut-off value, as at this point the sum of sensitivity and
specificity is maximal (see figure 2.3). The optimal cut-off will from
now on refer to this interpretation.
Definition 2.3 (Weighted Youden index). The weighted Youden index for a cut-off value x is defined by

    Y(x) = 2 (λw · Se(x) + (1 − λw) · Sp(x)) − 1,   (3)

where λw ∈ [0, 1] is a weighting parameter.
The parameter λw can be used to weight either Se or Sp more heavily. Choosing λw = 0.5 results in the Youden index defined in equation (2), with equally weighted sensitivity and specificity. To place more emphasis on sensitivity, one could for example choose λw = 2/3. This can be reasonable, for instance, for a test at the beginning of an examination, where it is important to recognize all diseased individuals.
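The optimal threshold can be found numerically by maximizing the (weighted) Youden index over a grid of cut-off values. The following sketch (Python, whereas the thesis's code is in R) assumes a normally distributed biomarker with illustrative parameters; with equal variances and λw = 0.5 the optimum lies midway between the two group means:

```python
from statistics import NormalDist

# Sketch in Python (the thesis's own code is in R): the weighted Youden index
# of equation (3), maximized over a grid of cut-off values. The biomarker is
# assumed normally distributed; all parameter values are illustrative.

def weighted_youden(c, mu0, sd0, mu1, sd1, lam=0.5):
    sp = NormalDist(mu0, sd0).cdf(c)      # Sp(c) = F0(c)
    se = 1 - NormalDist(mu1, sd1).cdf(c)  # Se(c) = 1 - F1(c)
    return 2 * (lam * se + (1 - lam) * sp) - 1

grid = [i / 100 for i in range(-400, 801)]  # candidate cut-offs from -4 to 8
best = max(grid, key=lambda c: weighted_youden(c, 0.0, 1.0, 3.0, 1.0))
print(best)  # with equal variances the optimum lies midway between the means
```

Raising λw above 0.5 shifts the optimal threshold downwards, trading specificity for sensitivity, as described above.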
2.2. Meta-Analyses of Diagnostic Test Accuracy Studies
2.2.1. Meta-analysis. Often there is a large number of studies
considering similar questions. Then it is useful to conduct a system-
atic review, which seeks to bring together all available studies on one
subject and summarizes the results.
"Meta-analysis is a statistical technique for combining the findings of independent studies. It is most often used to assess the clinical effectiveness of healthcare interventions." (Crombie and Davies, 2009) A simple meta-analysis for studies all reporting the same effect measure could be to calculate a weighted average of that effect measure.
2.2.2. Meta-analyses of DTA studies. The methodology of
systematic reviews and meta-analyses of DTA studies is relatively new; the first papers appeared in the 1990s (Willis and Quigley, 2011).
There is a variety of different outcomes that a DTA study can report:
Different effect sizes (sensitivity and specificity or others), even sev-
eral pairs of sensitivity and specificity, cut-off values, and much more.
Most DTA studies report just a pair of sensitivity and specificity of
their choice, possibly announcing the cut-off. But we found a number
of meta-analyses with studies reporting several up to all possible triples
(see section 4). Reporting all cut-offs is certainly only reasonable for
discrete markers.
Depending on the subject, it can happen that not all measures are available. Consider, for example, an imaging method where a physician decides on the basis of an image whether the person has an illness or not: it has a sensitivity and specificity which may be calculated from the number of correct and false decisions of the physician. But there is no numerical value above which the physician opts for disease; thus there is no threshold one can report.
The kind of meta-analysis we mainly want to consider in this thesis uses data where each study reports one or more pairs of sensitivity and specificity and the corresponding cut-offs (see Table 2.2).
To conduct a meta-analysis of diagnostic test accuracy studies, one needs to pay attention to some specialties.

study   cut-off   TP    FP    FN   TN
1       5         106   131   4    91
1       13        92    38    18   184
1       14        92    36    18   186
1       15        93    29    17   193
2       14        19    21    6    121
3       14        159   190   6    87
3       27        119   36    46   141
4       14        37    163   1    105

Table 2.2. Excerpt of the data of the meta-analysis of Zhelev et al. (2015), consisting of 4 studies reporting 4, 1, 2 and 1 threshold(s) and the corresponding fourfold tables.

The most important peculiarity of meta-analyses of DTA studies is that sensitivity and specificity
are dependent measures. Hence, it is inappropriate to conduct two sep-
arate meta-analyses for them.
A further challenge is that heterogeneity between studies is typically large in a meta-analysis of DTA studies. Firstly, there is variation between studies in how a continuous marker is dichotomised into a test classification, i.e. how thresholds are chosen. Secondly, there is variation in the accuracy of tests across different settings, e.g. different sample sizes or more or less similar study populations.
If it is appropriate, a study should report the cut-off value for every pair
of sensitivity and specificity. In a meta-analysis optimal cut-offs per
study can be averaged or an overall optimal cut-off can be computed.
However, some care has to be taken concerning the concept of an optimal threshold across studies, as this is only reasonable if a biomarker
value has the same meaning in all studies and does not differ because
of, for example, laboratory conditions.
If studies report multiple triples, the triples of each study are based on the same individuals and are therefore dependent as well. Hence, it is not appropriate to conduct separate meta-analyses for them.
As Honest and Khan (2002) showed, the primary goals of meta-analyses of DTA studies are typically to pool sensitivity and specificity over all studies and to obtain a summary ROC (SROC) curve. It is also of
interest to obtain an optimal threshold of the biomarker over all studies.
2.3. Traditional Approaches
The customary approaches for meta-analysis of diagnostic accuracy
studies only use one pair of sensitivity and specificity per study. They
aim to pool sensitivity and specificity and/or to estimate a summary
ROC curve. Widely used approaches are the hierarchical model (Rutter
and Gatsonis, 2001) and the bivariate model (Reitsma et al., 2005).
2.3.1. Hierarchical model. Rutter and Gatsonis proposed a hi-
erarchical model focussing on an estimate of a summary ROC curve.
They used a mixed model to allow for variation of test stringency and
test accuracy across studies.
At the study level, the numbers of individuals testing positive in study s, s = 1, ..., m, are assumed to be independent and to follow binomial distributions
TPs ∼ B(n1s, Ses),
FPs ∼ B(n0s, 1− Sps),
where n1s and n0s are the numbers of diseased and non-diseased individuals in study s. Furthermore, Rutter and Gatsonis assumed that the logit transformed sensitivity depends linearly on the logit transformed 1 − specificity for every study s, parametrizing them as follows:

    logit(Ses) = (θs + αs/2) e^(−β/2),
    logit(1 − Sps) = (θs − αs/2) e^(β/2),
where θs is the random threshold in study s and αs the random accuracy
in study s, which are allowed to vary across studies. The parameter β
is an asymmetry parameter and constant over all studies.
At the between-study level, we will describe the hierarchical model without covariates. The study-level parameters θs and αs are assumed to be normally distributed to account for variation across studies:

    θs ∼ N(Θ, τθ²),
    αs ∼ N(Λ, τα²).
Note that the threshold parameter θs just describes the heterogeneity along the SROC curve and does not stand for explicit cut-off values, as the cut-off data is not used in the model.
Then, in logit space, the SROC curve is linear, as logit(Se) can be expressed as

    logit(Se) = e^(−β) logit(1 − Sp) + Λ e^(−β/2).

Back-transforming this equation leads to the SROC curve

    Se = expit( e^(−β) logit(1 − Sp) + Λ e^(−β/2) ).
2.3.2. Bivariate model. The bivariate model was proposed by
Reitsma et al. in 2005. They aimed to pool sensitivity and specificity by modelling a bivariate distribution. At the study level, the number of
positive test results is binomially distributed, just as in the hierarchical
model.
At the between-study level, they assume a bivariate normal distribution of the logit transformed sensitivity and specificity:

    ( logit(Ses), logit(1 − Sps) )ᵀ ∼ N( (µ1, µ0)ᵀ, Σ ),   Σ = | τ1²  τ10 |
                                                              | τ10  τ0² |.
That way, they preserve the two-dimensional nature of the data, ac-
knowledge possible correlation and account for variability between the
studies with random effects.
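To make the between-study model concrete, the following sketch (Python, whereas the thesis's code is in R) draws study-specific pairs from this bivariate normal distribution via a Cholesky factorization of the covariance matrix and back-transforms them to the (Se, Sp) scale; all parameter values are illustrative assumptions:

```python
import random
from math import exp, sqrt

# Sketch in Python: drawing study-specific pairs (logit(Se_s), logit(1 - Sp_s))
# from the bivariate normal distribution of the bivariate model via a Cholesky
# factorization of the covariance matrix, then back-transforming to (Se, Sp).
# All parameter values below are illustrative assumptions.

def expit(z):
    return 1 / (1 + exp(-z))

def draw_study(mu1, mu0, tau1, tau0, tau10, rng):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # Cholesky factor of the covariance matrix [[tau1^2, tau10], [tau10, tau0^2]]
    l21 = tau10 / tau1
    l22 = sqrt(tau0 ** 2 - l21 ** 2)
    logit_se = mu1 + tau1 * z1
    logit_fpr = mu0 + l21 * z1 + l22 * z2  # logit(1 - Sp_s)
    return expit(logit_se), 1 - expit(logit_fpr)

rng = random.Random(42)  # fixed seed for reproducibility
studies = [draw_study(1.5, -1.0, 0.5, 0.4, -0.1, rng) for _ in range(5)]
print(studies)
```

A negative τ10, as chosen here, encodes the typical trade-off: studies with higher sensitivity tend to have lower specificity.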
In 2007 and 2008 it was proven that the hierarchical and the bivariate
approach are closely related and even equivalent under the condition
of no covariates (Harbord et al., 2007; Arends et al., 2008). Hence,
they illustrate the same idea from two different points of view and the
parameters of one model can be converted into the ones of the other
model.
2.3.3. Benefits and drawbacks. Many studies present only one pair of sensitivity and specificity, with or without information on the underlying cut-off; hence it is a great benefit to have meta-analysis methods dealing with such data.
In addition, as stated in subsection 2.2.2, not all diagnostic test accuracy studies have a calculable numerical cut-off value. For these kinds of studies it is important to have meta-analysis approaches which do not need cut-off information.
A problem arising when using only one pair of sensitivity and specificity per study is that the SROC curve is not uniquely defined. As Arends et al. (2008) showed, there are many different ways to define the straight line in logit space, i.e. the transformed SROC curve. As every approach proceeds and is justified differently, there is no completely natural way to define the SROC curve, but a large number of ways to do so.
Furthermore, using only one pair of sensitivity and specificity per study might lead to an overly optimistic SROC curve. Most studies will not report an arbitrary pair of sensitivity and specificity but a somehow optimal one. Thus, the data available for a summary ROC curve are all the 'optimal' points. So it is likely that the summary ROC curve will be too optimistic and cannot be seen as a mean of the single ROC curves.
This problem was already addressed by Rücker and Schumacher (2010). They proposed a new approach which uses an 'optimal point' assumption to identify a straight line in logit space and thus an SROC curve.
Another point to be mentioned is the cut-off selection. If studies present more than one cut-off value, the meta-analyst needs to reduce the data and select a cut-off. This procedure also leads to bias and, again, does not use the full information.
CHAPTER 3
Theory
3.1. Motivation
In this chapter we want to describe a new approach for meta-
analysis of DTA studies, which is feasible if all studies report cut-off
values and, moreover, a number of these studies provide information
about several triples of cut-off, sensitivity and specificity. There are a
number of systematic reviews providing data of this form, as can be
seen in chapter 4.
To apply the traditional approaches, the meta-analyst has to decide on one pair of sensitivity and specificity and thus discard a lot of information; the cut-off information also goes unused. Furthermore, the selection of a somehow optimal pair of sensitivity and specificity may cause additional bias, and the test accuracy might be overestimated.
To avoid these issues we want to use all information provided. As we
will see, this will also lead to a naturally defined SROC curve.
3.2. Existing Approaches
We want to introduce three already existing approaches, which also
use data where several studies provide information about more than
one pair of sensitivity and specificity.
3.2.1. Multivariate random effects approach. In the publi-
cation of Hamza et al. (2009) the bivariate random effects approach
is generalized to the situation where each study reports k (k ≥ 3)
thresholds and the corresponding values of sensitivity and specificity.
With a multivariate random effects approach they estimate a summary ROC curve, interpreting the individual ROC curves of the studies as random samples from the population of all ROC curves of such studies. That way,
they do not need further assumptions for the summary ROC curve, as
was the case in the traditional approaches (see section 2.3), and the
SROC curve can be seen as an average of all study-specific ROC curves.
They are also able to calculate pooled sensitivity and specificity at any
threshold.
A clear limitation of this approach is the equal number of thresholds
demanded per study. Although there are possibilities to handle this
constraint and to accept a different number of thresholds in some cases,
the total number of different thresholds across all studies should not
be too large, as this increases the number of parameters and then the
likelihood method may not work correctly any more.
3.2.2. Survival approach. Putter et al. (2010) proposed a meta-
analysis of DTA studies with multiple thresholds using survival meth-
ods. As in the multivariate approach of Hamza et al., they act on
the assumption that all studies present the same number of thresholds, i.e. categories. Putter et al. assume the numbers of diseased and
non-diseased with a positive or negative test result, respectively, to be
Poisson distributed. Then a multivariate gamma distribution is used to
describe between-study variation. Hence, correlation of sensitivity and
specificity within a given study is included through common random
effects. Furthermore, extra correlation of sensitivity and specificity of
different thresholds is included.
But in this approach we encounter the same problem as in the multi-
variate random effects approach: the necessity of an equal number of
thresholds per study.
3.2.3. Non-parametric approach. The last approach we want
to mention is the fully non-parametric approach of Martínez-Camblor
(2014) to estimate a summary ROC curve. The data used are the
number of TNs, TPs, FNs and FPs for one or multiple thresholds per
study. In a first step, the points are depicted in ROC space and linearly
interpolated within each study, taking into account that all ROC curves
begin at (0,0) and end at (1,1). Then a global ROC curve is computed
as a weighted mean of the individual interpolated ROC curves. There
are two weighting schemes proposed, one based on a fixed-effects model
and one on a random-effects model.
In contrast to the above-mentioned approaches, the number of thresh-
olds presented per study is not fixed and all information can be em-
ployed. On the other hand, the approach only leads to an SROC curve
and no pooled sensitivity and specificity for specific cut-offs are esti-
mated.
3.3. Novel Approach
3.3.1. Overview. The novel approach we want to present is char-
acterized by the estimation of the distribution functions of the biomarker
within the non-diseased and diseased individuals, respectively.
The method assumes a continuous biomarker which is normally or logistically
distributed. Different location and dispersion parameters for non-diseased
and diseased individuals are estimated from the available data.
The distribution functions of the biomarker in the non-diseased and
diseased population specify the estimated specificity and one minus
sensitivity, respectively, per threshold. To account for the heterogeneity
of the studies, a mixed effects model is used, and the bivariate structure
is taken into account by allowing for correlation of the random effects of
the non-diseased and diseased individuals.
This results in a large number of possible models. Having estimated the
underlying distribution functions, one can read off the pooled sensi-
tivity and specificity values at every threshold and confidence regions
can be specified. The summary ROC curve follows naturally and the
summary Youden index is simply the difference of the two estimated
distributions. Furthermore, the optimal cut-off among all studies can
be calculated.
In the following subsections, we explain the procedure step by step,
considering two different distribution assumptions throughout: the normal
and the logistic distribution. In the first subsection, the data are
transformed so that they depend linearly on the threshold. Then a straight
line for each group is estimated using a linear mixed model. After
back-transforming, we determine the optimal cut-off value. In the last
subsection we compute confidence regions for the distribution parameters,
sensitivity and specificity, and the optimal cut-off.
3.3.2. Probit/Logit Transformation. First of all, we transform
sensitivity and specificity so that they are linear in the threshold. This
gives us the possibility of using a linear mixed effects model to esti-
mate the distribution functions. Starting with a normal distribution
assumption, let N(\mu_0, \sigma_0^2) be the distribution for the non-diseased
individuals and N(\mu_1, \sigma_1^2) the one for the diseased.
Let x be a cut-off value. The specificity, i.e. the probability of a
negative test result given a non-diseased individual, at cut-off x is the
area under the density function for values smaller than x (see figure 2.1).
This is by definition the value of the distribution function of the
biomarker of the non-diseased, i.e. \Phi_{\mu_0,\sigma_0}, at point x.
This can be restated with the standard normal distribution and we get
\[ \mathrm{Sp}(x) = \Phi_{\mu_0,\sigma_0}(x) = \Phi\left(\frac{x - \mu_0}{\sigma_0}\right). \]
Applying the probit function \Phi^{-1}, the inverse of the standard normal
distribution function, results in
\[ \Phi^{-1}\bigl(\mathrm{Sp}(x)\bigr) = \frac{x - \mu_0}{\sigma_0}, \]
and the expression on the left-hand side is linear in x.
Considering the distribution function of the biomarker of the diseased,
\Phi_{\mu_1,\sigma_1}, which gives the probability of a negative test result
for diseased individuals, i.e. one minus sensitivity, we get the following
equivalence:
\[ 1 - \mathrm{Se}(x) = \Phi_{\mu_1,\sigma_1}(x) = \Phi\left(\frac{x - \mu_1}{\sigma_1}\right). \]
Probit transforming the equation leads to
\[ \Phi^{-1}\bigl(1 - \mathrm{Se}(x)\bigr) = \frac{x - \mu_1}{\sigma_1}. \]
We conclude that the probit transformed specificity and one minus
sensitivity depend linearly on the cut-off value x.
In the following, we assume logistic distributions for the non-diseased
and diseased individuals. Let the location parameters be \mu_0 and \mu_1,
and the dispersion parameters \sigma_0 and \sigma_1, for non-diseased and
diseased, respectively. For a threshold x we get, analogously to the
normal case,
\[ \mathrm{Sp}(x) = \mathrm{expit}_{\mu_0,\sigma_0}(x) = \frac{1}{1 + \exp\left(-\frac{x-\mu_0}{\sigma_0}\right)}. \]
Applying the standard logit function results in
\[ \mathrm{logit}(\mathrm{Sp}(x)) = \mathrm{logit}\left(\frac{1}{1 + \exp\left(-\frac{x-\mu_0}{\sigma_0}\right)}\right) = \log\left(\frac{1}{\exp\left(-\frac{x-\mu_0}{\sigma_0}\right)}\right) = \frac{x - \mu_0}{\sigma_0}. \]
Concerning the diseased individuals, we have
\[ 1 - \mathrm{Se}(x) = \mathrm{expit}_{\mu_1,\sigma_1}(x) = \frac{1}{1 + \exp\left(-\frac{x-\mu_1}{\sigma_1}\right)}. \]
Logit transforming this equation leads to
\[ \mathrm{logit}\bigl(1 - \mathrm{Se}(x)\bigr) = \frac{x - \mu_1}{\sigma_1}, \]
and we get linearity in x for the logit transformed specificity and one minus
sensitivity as well.
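The linearity just derived can be checked numerically. The following sketch is not part of the thesis; the parameter values are assumed purely for illustration. It verifies that, for a normally distributed biomarker, the probit transformed specificity is linear in the cut-off x with intercept −µ0/σ0 and slope 1/σ0.

```python
# Illustrative check (assumed parameters, not thesis data): the probit
# transformed specificity of a N(mu0, sigma0^2) biomarker is linear in x.
from statistics import NormalDist

mu0, sigma0 = 2.0, 1.5           # assumed parameters of the non-diseased group
std = NormalDist()               # standard normal distribution

def probit_sp(x):
    """Phi^{-1}(Sp(x)) with Sp(x) = Phi((x - mu0) / sigma0)."""
    sp = NormalDist(mu0, sigma0).cdf(x)
    return std.inv_cdf(sp)

for x in (0.0, 1.0, 3.0, 5.0):
    # agrees with the linear form (x - mu0) / sigma0 derived above
    assert abs(probit_sp(x) - (x - mu0) / sigma0) < 1e-9
```

The analogous check for the logistic case uses the logit in place of the probit function.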
3.3.3. Linear regression. In the next step, we want to fit the
transformed data with a linear model to extract the parameters of
the earlier mentioned linear dependence. Let us first consider a linear
fixed effects model. The definition closely follows the formulation of
Gałecki and Burzykowski (2013).
Definition 3.1 (Linear fixed effects model). A linear regression
model with fixed effects is given by
\[ y = X\beta + e, \qquad e \sim N(0, R), \]
where y = (y_1, \ldots, y_n)' consists of the response variables for the n
observations,
\[ X = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1p} \\ x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{np} \end{pmatrix} \]
is the design matrix, consisting of the n observed values of p (known)
covariates (p < n), and \beta = (\beta_1, \ldots, \beta_p)' are the
corresponding (unknown) regression parameters. We assume the residual
errors e = (e_1, \ldots, e_n)' to be multivariate normally distributed with
variance matrix R and e_1, \ldots, e_n to be independent.
In our case we want to only take one covariate into consideration,
the threshold, and estimate two separate regression lines for the dis-
eased and non-diseased, respectively.
Let k_s be the number of cut-offs in study s (s = 1, \ldots, m) and let
x_{si} be the value of the i-th cut-off of study s. The number of cut-offs
may vary over the studies. We want to explain the transformed proportions
of negative test results, with TN_{si}/n_{0s} being the proportion of
negative test results of the non-diseased of study s at cut-off i, and
FN_{si}/n_{1s} the corresponding proportion for the diseased.
From now on, we will present all linear models with probit transformed
proportions of negative test results, but of course the same models can
be used for logit transformed ones.
We consider the following regression model:
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + \beta_0 x_{si} + e_{si}, \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + \beta_1 x_{si} + f_{si}, \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s, \]
where \alpha_0 and \alpha_1 are the intercepts and \beta_0 and \beta_1 the
slopes for the non-diseased and diseased, respectively. The independent
error terms of the non-diseased are denoted by e_{si}, those of the diseased
by f_{si}, for every study s and cut-off i. Both are normally distributed
with mean zero and variances \gamma^2/w_{si} and \gamma^2/v_{si},
respectively, where \gamma is an unknown scale parameter and w_{si} and
v_{si} are given prior weights. We use this formulation for the variances
of the residual errors because it is the way it is implemented in the R
function lmer(), which we will use for the regressions (see section 3.5).
This regression model is a fixed effects model. A linear regression
with only fixed effects assumes one constant overall intercept and slope.
The data deviate from this regression line only by chance, and the way they
deviate is described by
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
For the meta-analysis data this means that, for example for the diseased,
we estimate one straight line, i.e. one transformed distribution, which
underlies all data. In this way we assume that all studies share the same
biomarker distribution.
However, one can observe considerable heterogeneity between the studies,
which is the consequence of different sample sizes, different laboratory
conditions, different study populations (differing in countries, specialized
clinics, etc.) and much more. To explain part of this heterogeneity in the
model, we want to include random effects.
A further disadvantage of this fixed effects model is the estimation of two
separate straight lines, leading to two separate distribution functions,
disregarding the bivariate character of the data. The data of the
non-diseased and diseased individuals are not independent, but come from
the same studies. That is why we want to include correlated parameters to
link the two regression lines.
A model describing such a set-up is the linear mixed effects model. The
formulation closely follows Laird and Ware (1982) and Gałecki and
Burzykowski (2013).
Definition 3.2 (Linear mixed effects model). A linear mixed model
is given through an extension of the linear fixed effects model (see
definition 3.1) by random effects. For hierarchical data with a single
level of grouping, we formulate the linear mixed model at a given level of
the grouping factor s (s = 1, \ldots, m) as follows:
\[ y_s = X_s\beta + Z_s d_s + e_s, \tag{4} \]
\[ d_s \sim N(0, D), \quad e_s \sim N(0, R_s), \quad d_s \perp e_s, \]
where y_s, X_s and e_s are the vector of responses, the design matrix and
the vector of residual errors for grouping factor s, and \beta the
regression parameters of the fixed effects as in definition 3.1. The
covariates matrix of the random effects,
\[ Z_s = \begin{pmatrix} z_{11}^{(s)} & z_{12}^{(s)} & \ldots & z_{1q}^{(s)} \\ z_{21}^{(s)} & z_{22}^{(s)} & \ldots & z_{2q}^{(s)} \\ \vdots & \vdots & & \vdots \\ z_{n1}^{(s)} & z_{n2}^{(s)} & \ldots & z_{nq}^{(s)} \end{pmatrix}, \]
consists of n observations of q (known) covariates, and
d_s = (d_1^{(s)}, \ldots, d_q^{(s)})' is the corresponding (unknown) vector
of random effects. The residual errors are independent and independent of
the random effects and have variance matrix R_s. The q \times q
variance-covariance matrix D of the random effects is positive-definite.
To write one model for all data, let Y = (y_1', \ldots, y_m')',
d = (d_1', \ldots, d_m')' and e = (e_1', \ldots, e_m')' be the vectors
containing all observed values of the dependent variable, all random
effects and all residual errors of all grouping factors s. Define the
matrices
\[ X = \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} Z_1 & 0 & \ldots & 0 \\ 0 & Z_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & Z_m \end{pmatrix}, \]
where 0 denotes a matrix with all elements equal to zero. Then the linear
mixed regression (equation (4)) can be written for all data as follows:
\[ Y = X\beta + Zd + e, \quad d \sim N(0, \mathcal{D}), \quad e \sim N(0, \mathcal{R}), \]
where \mathcal{D} = \mathrm{diag}(D, \ldots, D), a block-diagonal matrix with
the matrix D as main diagonal entries, and
\mathcal{R} = \mathrm{diag}(R_1, \ldots, R_m).
In the case of meta-analysis data, there is a clear hierarchical struc-
ture by the different studies. Furthermore, the correlation between
values of the same study must not be neglected. Considering the stud-
ies as randomly chosen out of the overall study population, a linear
mixed model with study as grouping factor is an adequate way of re-
gressing the data.
For the fixed effects covariates matrix we set p = 2; thus we consider
only a fixed intercept and a single covariate, the threshold (cf. the
fixed effects model). For the random effects covariates matrix Z_s of
grouping level s we will choose either
\[ Z_s = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \quad Z_s = \begin{pmatrix} z_1^{(s)} \\ z_2^{(s)} \\ \vdots \\ z_n^{(s)} \end{pmatrix} \quad \text{or} \quad Z_s = \begin{pmatrix} 1 & z_1^{(s)} \\ 1 & z_2^{(s)} \\ \vdots & \vdots \\ 1 & z_n^{(s)} \end{pmatrix}, \]
starting with the one on the left-hand side. As we aim to estimate
two distributions, for the non-diseased and the diseased, respectively,
we want to estimate two regression lines as in the fixed effects model.
Extending the fixed effects model with the same random intercept in both
regressions leads to the linear mixed model
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + a_s + \beta_0 x_{si} + e_{si}, \tag{model CI} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + a_s + \beta_1 x_{si} + f_{si}, \]
\[ a_s \sim N(0, \tau_a^2), \quad e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s, \]
where a_s is the random intercept, shared by both groups. The variable a_s
is normally distributed with mean zero and variance \tau_a^2. The residual
errors e_{si} and f_{si} are independent and independent of a_s.
The model parameters which we do not mention here are defined in the same
way as in the fixed effects model. For all following models it holds that
model parameters which are not mentioned are defined as in the previous
model. The model name 'CI' stands for 'common (random) intercept'. The
denomination of the models will always refer to the random effects
structure.
The following paragraph is based on Barr et al. (2013). With model CI
we estimate an overall straight line for each group, just as in the fixed
effects model, with the fixed effects parameters \alpha_{0/1} and
\beta_{0/1}. These fixed effects parameters do not depend on the selection
of studies for the meta-analysis, but represent the overall study
population. In contrast, the random effects a_s take different values for
every study. The specific set of intercepts \alpha_{0/1} + a_s,
s = 1, \ldots, m, for a given meta-analysis is assumed to be a random
subset of the intercepts in the underlying study population. Another
instantiation of the same meta-analysis, including different studies,
would therefore have different realizations of the a_s effects.
The primary goal is to produce a model which represents the whole study
population from which the studies are randomly drawn, rather than to
describe the specific values a_s, s = 1, \ldots, m, of this sample.
Therefore, instead of estimating the individual a_s effects, the
model-fitting algorithm estimates the population distribution from which
the a_s are drawn. We assume the study-specific intercepts a_s to be mean
zero normally distributed and estimate the variance \tau_a^2.
By including the random term a_s we allow for study-specific intercepts
that lead to study-specific means (see subsection 3.3.4). Furthermore,
the two regression lines are now connected via a common random intercept.
In this way we acknowledge the bivariate structure of sensitivity and
specificity, which derive from the same studies.
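The effect of the common random intercept can be illustrated with a small simulation; all numbers below are made up and only stand in for the quantities of model CI. Both regression lines of a study are shifted by the same realization a_s, so their vertical distance is identical across studies.

```python
# Sketch of model CI's random-effects structure (assumed values only).
import random

random.seed(1)
alpha0, alpha1 = -2.0, -4.0      # assumed fixed intercepts (non-diseased, diseased)
beta0 = beta1 = 1.0              # assumed fixed slopes
tau_a = 0.5                      # assumed standard deviation of a_s

a = [random.gauss(0.0, tau_a) for _ in range(5)]   # realized random intercepts

def line0(s, x):                 # study-specific line, non-diseased
    return alpha0 + a[s] + beta0 * x

def line1(s, x):                 # study-specific line, diseased
    return alpha1 + a[s] + beta1 * x

# The shared a_s cancels in the difference of the two lines of a study:
diffs = [line1(s, 1.0) - line0(s, 1.0) for s in range(5)]
```

Under model DI, in contrast, each group would receive its own (correlated) intercept deviation and the differences would vary from study to study.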
Another way to include a random intercept with respect to the grouping
factor 'study' is to include different random intercepts for non-diseased
and diseased individuals. This leads to model DI ('different intercepts'):
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + a_{0s} + \beta_0 x_{si} + e_{si}, \tag{model DI} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + a_{1s} + \beta_1 x_{si} + f_{si}, \]
\[ (a_{0s}, a_{1s})' \sim N\left(0, \begin{pmatrix} \tau_{0a}^2 & \rho\tau_{0a}\tau_{1a} \\ \rho\tau_{0a}\tau_{1a} & \tau_{1a}^2 \end{pmatrix}\right), \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
Whereas in model CI we assumed the study-specific deviations a_s,
s = 1, \ldots, m, added to \alpha_{0/1} to be the same for both groups, in
model DI we allow non-diseased and diseased individuals to have different
study-specific intercepts. The random intercepts of the non-diseased and
diseased individuals, i.e. a_{0s} and a_{1s}, are assumed to be bivariate
normally distributed. The correlation of the random effects reflects that
the diseased and non-diseased data of a study belong together. As before,
the random effects of different studies, a_{0s} and a_{0s'}, a_{1s} and
a_{1s'}, and also a_{0s} and a_{1s'} with s \neq s', are assumed to be
independent. This also holds for all following models.
In the following we include random slopes instead of random intercepts.
The counterpart of model CI is model CS ('common slope'), which includes
the same random slope for both the non-diseased and the diseased
individuals. This corresponds to the random effects covariates matrix
\[ Z_s = \begin{pmatrix} z_1^{(s)} \\ z_2^{(s)} \\ \vdots \\ z_n^{(s)} \end{pmatrix}. \]
Model CS is given by
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + (\beta_0 + b_s)x_{si} + e_{si}, \tag{model CS} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + (\beta_1 + b_s)x_{si} + f_{si}, \]
\[ b_s \sim N(0, \tau_b^2), \quad e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
As before, we can also allow different random effects in the two groups,
which leads to model DS ('different slopes'):
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + (\beta_0 + b_{0s})x_{si} + e_{si}, \tag{model DS} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + (\beta_1 + b_{1s})x_{si} + f_{si}, \]
\[ (b_{0s}, b_{1s})' \sim N\left(0, \begin{pmatrix} \tau_{0b}^2 & \rho\tau_{0b}\tau_{1b} \\ \rho\tau_{0b}\tau_{1b} & \tau_{1b}^2 \end{pmatrix}\right), \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
Finally, we can include both a random intercept and a random slope, using
the random effects covariates matrix
\[ Z_s = \begin{pmatrix} 1 & z_1^{(s)} \\ 1 & z_2^{(s)} \\ \vdots & \vdots \\ 1 & z_n^{(s)} \end{pmatrix}. \]
Starting with a common random intercept and a common random slope for both
groups, this leads to the following model:
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + a_s + (\beta_0 + b_s)x_{si} + e_{si}, \tag{model CICS} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + a_s + (\beta_1 + b_s)x_{si} + f_{si}, \]
\[ (a_s, b_s)' \sim N\left(0, \begin{pmatrix} \tau_a^2 & \rho\tau_a\tau_b \\ \rho\tau_a\tau_b & \tau_b^2 \end{pmatrix}\right), \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s, \]
where the variance-covariance matrix of (a_s, b_s)' equals the matrix D in
definition 3.2.
Proceeding as before, we now include distinct random intercepts for
diseased and non-diseased individuals:
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + a_{0s} + (\beta_0 + b_s)x_{si} + e_{si}, \tag{model DICS} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + a_{1s} + (\beta_1 + b_s)x_{si} + f_{si}, \]
\[ (a_{0s}, a_{1s}, b_s)' \sim N\left(0, \begin{pmatrix} \tau_{0a}^2 & \rho_1\tau_{0a}\tau_{1a} & \rho_2\tau_{0a}\tau_b \\ \rho_1\tau_{0a}\tau_{1a} & \tau_{1a}^2 & \rho_3\tau_{1a}\tau_b \\ \rho_2\tau_{0a}\tau_b & \rho_3\tau_{1a}\tau_b & \tau_b^2 \end{pmatrix}\right), \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
The variance-covariance matrix of the random effects is a composition of
those we have already seen. The random intercept and random slope of the
same study, as well as the different random intercepts for non-diseased
and diseased of the same study, may be correlated. For ease of notation we
reuse the same symbols for the correlation coefficients across the
different models, although they denote different parameters.
Instead of including separate random intercepts for the groups, we could
add different random slopes for the non-diseased and diseased individuals.
This leads to model CIDS:
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + a_s + (\beta_0 + b_{0s})x_{si} + e_{si}, \tag{model CIDS} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + a_s + (\beta_1 + b_{1s})x_{si} + f_{si}, \]
\[ (a_s, b_{0s}, b_{1s})' \sim N\left(0, \begin{pmatrix} \tau_a^2 & \rho_1\tau_a\tau_{0b} & \rho_2\tau_a\tau_{1b} \\ \rho_1\tau_a\tau_{0b} & \tau_{0b}^2 & \rho_3\tau_{0b}\tau_{1b} \\ \rho_2\tau_a\tau_{1b} & \rho_3\tau_{0b}\tau_{1b} & \tau_{1b}^2 \end{pmatrix}\right), \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
The last model we want to present is the most complex one, including
different random intercepts and different random slopes for the two groups:
\[ \Phi^{-1}\left(\frac{TN_{si}}{n_{0s}}\right) = \alpha_0 + a_{0s} + (\beta_0 + b_{0s})x_{si} + e_{si}, \tag{model DIDS} \]
\[ \Phi^{-1}\left(\frac{FN_{si}}{n_{1s}}\right) = \alpha_1 + a_{1s} + (\beta_1 + b_{1s})x_{si} + f_{si}, \]
\[ (a_{0s}, a_{1s}, b_{0s}, b_{1s})' \sim N\left(0, \begin{pmatrix} \tau_{0a}^2 & \rho_1\tau_{0a}\tau_{1a} & \rho_2\tau_{0a}\tau_{0b} & \rho_3\tau_{0a}\tau_{1b} \\ \rho_1\tau_{0a}\tau_{1a} & \tau_{1a}^2 & \rho_4\tau_{1a}\tau_{0b} & \rho_5\tau_{1a}\tau_{1b} \\ \rho_2\tau_{0a}\tau_{0b} & \rho_4\tau_{1a}\tau_{0b} & \tau_{0b}^2 & \rho_6\tau_{0b}\tau_{1b} \\ \rho_3\tau_{0a}\tau_{1b} & \rho_5\tau_{1a}\tau_{1b} & \rho_6\tau_{0b}\tau_{1b} & \tau_{1b}^2 \end{pmatrix}\right), \]
\[ e_{si} \sim N\left(0, \frac{\gamma^2}{w_{si}}\right), \quad f_{si} \sim N\left(0, \frac{\gamma^2}{v_{si}}\right), \quad s = 1, \ldots, m, \; i = 1, \ldots, k_s. \]
As one can see, the total number of parameters to estimate is quite high
for model DIDS. Hence, using this model requires sufficiently many data
points.
So far we have presented eight linear mixed effects models, which are
shown in an overview in table 3.1. A special case of these regression
models is obtained by fixing the slopes for the diseased and non-diseased
individuals to be equal, \beta_0 = \beta_1. Apart from that, no further
simplification of the fixed effects structure is feasible, as the main
idea is to estimate two distributions, for the non-diseased and diseased
individuals, respectively, that are separable by their location.
Under the given circumstances (estimation of two straight lines for the
diseased and non-diseased individuals, respectively, same fixed effects,
study as the only grouping factor, correlated random effects and the
threshold as the only covariate), table 3.1 contains a complete list of
all possible models.
3.3.4. Back-transformation. The linear regression provides intercepts
\alpha_j and slopes \beta_j, j = 0, 1. We want to back-transform the
regression lines to obtain the distributions for the non-diseased and
diseased individuals, i.e. to compute the distribution parameters \mu_j,
\sigma_j, j = 0, 1. In subsection 3.3.2 we obtained the equivalence of the
probit transformed specificity and one minus sensitivity to
Model   Specification
DIDS    Different random intercepts and different random slopes
CIDS    Common random intercept and different random slopes,
        a_{0s} = a_{1s} = a_s
DICS    Different random intercepts and common random slope,
        b_{0s} = b_{1s} = b_s
CICS    Common random intercept and common random slope,
        a_{0s} = a_{1s} = a_s, b_{0s} = b_{1s} = b_s
DS      Different random slopes,
        a_{0s} = a_{1s} = 0
CS      Common random slope,
        a_{0s} = a_{1s} = 0, b_{0s} = b_{1s} = b_s
DI      Different random intercepts,
        b_{0s} = b_{1s} = 0
CI      Common random intercept,
        a_{0s} = a_{1s} = a_s, b_{0s} = b_{1s} = 0

Table 3.1. Linear mixed effects models listed according to their random effects structure.
\[ \frac{z - \mu_j}{\sigma_j}, \quad j = 0, 1, \]
respectively, and the same for the logit transformed proportions. After
regressing the data we obtain
\[ \frac{z - \mu_j}{\sigma_j} = \alpha_j + \beta_j z \quad (j = 0, 1). \]
By equating coefficients we get
\[ \alpha_j = -\frac{\mu_j}{\sigma_j}, \quad \beta_j = \frac{1}{\sigma_j} \quad (j = 0, 1), \]
from which we obtain
\[ \mu_j = -\frac{\alpha_j}{\beta_j}, \quad \sigma_j = \frac{1}{\beta_j} \quad (j = 0, 1). \]
Thus, the \beta_j (j = 0, 1) must be positive to yield positive
dispersions. This means that Sp(z) and 1 - Se(z), i.e. the probabilities
of a negative test result, should increase with increasing cut-off values
within both groups.
Within studies this is always true (see figure 2.1). But if we combine
data of several studies, as is done in a meta-analysis, and regress the
combined data, it need not be true anymore.
As we can see, fixing \beta_0 = \beta_1 in the linear regression models
amounts to assuming that the biomarker distributions of the diseased and
non-diseased individuals have the same dispersion parameter, whereas
varying the intercept of the linear regression shifts the mean of the
distribution function.
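The back-transformation can be sketched in a few lines; the numbers are assumed for illustration and do not come from any example in the thesis.

```python
# Round trip through the back-transformation mu_j = -alpha_j / beta_j,
# sigma_j = 1 / beta_j (assumed, illustrative parameter values).

def back_transform(alpha, beta):
    if beta <= 0:
        raise ValueError("slope must be positive to yield a positive dispersion")
    return -alpha / beta, 1.0 / beta          # (mu, sigma)

mu0, sigma0 = 2.0, 1.5                        # assumed true parameters
alpha0, beta0 = -mu0 / sigma0, 1.0 / sigma0   # coefficients of the fitted line
mu_hat, sigma_hat = back_transform(alpha0, beta0)
```

The guard on the slope mirrors the positivity requirement discussed above: a non-positive fitted slope would give a non-positive dispersion.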
3.3.5. Optimal cut-off value. The optimal cut-off value of a
biomarker is the value where the Youden index is maximized (see sub-
section 2.1.5).
Let us first assume a normally distributed biomarker. Then we can write
the weighted Youden index at a cut-off value x as
\[ Y(x) = \lambda_w\left(1 - 2\Phi\left(\frac{x - \mu_1}{\sigma_1}\right)\right) + (1 - \lambda_w)\left(2\Phi\left(\frac{x - \mu_0}{\sigma_0}\right) - 1\right), \]
see equation (3). The weighted Youden index is maximized at one of the two
points of intersection of the weighted densities of the two normal
distributions \varphi_{\mu_1,\sigma_1} and \varphi_{\mu_0,\sigma_0}, thus at
one of the two solutions of
\[ \lambda_w \varphi_{\mu_1,\sigma_1}(x) = (1 - \lambda_w)\varphi_{\mu_0,\sigma_0}(x). \]
The argument x_0 of the maximal Youden index is given by
\[ x_0 = \frac{\mu_0\sigma_1^2 - \mu_1\sigma_0^2 + \sqrt{\sigma_0^2\sigma_1^2\left[2(\sigma_1^2 - \sigma_0^2)\left(\log\frac{\sigma_1}{\sigma_0} - \mathrm{logit}(\lambda_w)\right) + (\mu_1 - \mu_0)^2\right]}}{\sigma_1^2 - \sigma_0^2}. \]
If \sigma_0 = \sigma_1 =: \sigma holds, the argument x_0 of the maximal
Youden index is given by
\[ x_0 = \frac{\sigma^2\,\mathrm{logit}(\lambda_w) + \frac{1}{2}(\mu_0^2 - \mu_1^2)}{\mu_0 - \mu_1}. \]
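As a numerical sanity check (assumed parameter values, not taken from the thesis), the closed-form cut-off for σ0 ≠ σ1 can be compared with a grid search over the weighted Youden index:

```python
import math
from statistics import NormalDist

mu0, sigma0 = 0.0, 1.0           # assumed non-diseased distribution
mu1, sigma1 = 2.0, 1.5           # assumed diseased distribution
lam = 0.5                        # weight lambda_w

def youden(x):
    se = 1.0 - NormalDist(mu1, sigma1).cdf(x)
    sp = NormalDist(mu0, sigma0).cdf(x)
    return lam * (2 * se - 1) + (1 - lam) * (2 * sp - 1)

def logit(p):
    return math.log(p / (1 - p))

def optimal_cutoff():
    """Closed-form solution for sigma0 != sigma1 (formula above)."""
    d = sigma1**2 - sigma0**2
    root = math.sqrt(sigma0**2 * sigma1**2
                     * (2 * d * (math.log(sigma1 / sigma0) - logit(lam))
                        + (mu1 - mu0)**2))
    return (mu0 * sigma1**2 - mu1 * sigma0**2 + root) / d

x0 = optimal_cutoff()
grid_best = max((i / 1000 for i in range(-2000, 4000)), key=youden)
```

The closed-form value and the grid maximizer agree to grid resolution.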
Under the logistic assumption the weighted Youden index is defined as
\[ Y(x) = \lambda_w\bigl(1 - 2\,\mathrm{expit}_{\mu_1,\sigma_1}(x)\bigr) + (1 - \lambda_w)\bigl(2\,\mathrm{expit}_{\mu_0,\sigma_0}(x) - 1\bigr). \]
To obtain the point where the Youden index is maximized we need to solve
the equation
\[ \lambda_w\,\mathrm{expit}'_{\mu_1,\sigma_1}(x) = (1 - \lambda_w)\,\mathrm{expit}'_{\mu_0,\sigma_0}(x). \tag{5} \]
The prime symbol ' denotes the derivative, here and throughout the thesis.
As no analytical solution has been found yet, we propose a fixed point
iteration to compute the optimal cut-off. The solution x \in [\mu_0, \mu_1]
of (5) can be written as
\[ x = \mu_0 + \sigma_0\,\mathrm{arccosh}\left(\frac{1 - \lambda_w}{\lambda_w}\,\frac{\sigma_1}{\sigma_0}\left(1 + \cosh\left(\frac{x - \mu_1}{\sigma_1}\right)\right) - 1\right). \tag{6} \]
Interpreting the right-hand side of (6) as a function g of x leads to
\[ g(x) = \mu_0 + \sigma_0\,\mathrm{arccosh}\left(\frac{1 - \lambda_w}{\lambda_w}\,\frac{\sigma_1}{\sigma_0}\left(1 + \cosh\left(\frac{x - \mu_1}{\sigma_1}\right)\right) - 1\right). \tag{7} \]
Its inverse is given by
\[ g^{-1}(x) = \mu_1 - \sigma_1\,\mathrm{arccosh}\left(\frac{\lambda_w}{1 - \lambda_w}\,\frac{\sigma_0}{\sigma_1}\left(1 + \cosh\left(\frac{x - \mu_0}{\sigma_0}\right)\right) - 1\right). \tag{8} \]
We can apply fixed point iteration to these functions with, for example,
starting value (\mu_0 + \mu_1)/2. By the Banach fixed point theorem, a
function has precisely one fixed point, towards which the fixed point
iteration converges, if the function is Lipschitz continuous with Lipschitz
constant L \in [0, 1). Depending on the parameters \mu_0, \mu_1, \sigma_0
and \sigma_1, this needs to be checked for the functions in (7) and (8).
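A minimal sketch of the proposed fixed point iteration follows; the parameter values are assumed, and convergence is only guaranteed when the contraction condition above holds, which it does for these numbers.

```python
import math

mu0, sigma0 = 0.0, 1.0           # assumed non-diseased parameters
mu1, sigma1 = 2.0, 1.3           # assumed diseased parameters
lam = 0.5

def g(x):                        # right-hand side of equation (6)
    inner = ((1 - lam) / lam) * (sigma1 / sigma0) \
        * (1 + math.cosh((x - mu1) / sigma1)) - 1
    return mu0 + sigma0 * math.acosh(inner)

x = (mu0 + mu1) / 2              # proposed starting value
for _ in range(200):
    x = g(x)                     # fixed point iteration

def expit_deriv(x, mu, sigma):   # derivative of the logistic cdf
    z = (x - mu) / sigma
    return math.exp(-z) / (sigma * (1 + math.exp(-z)) ** 2)

# optimality condition (5) at the fixed point:
lhs = lam * expit_deriv(x, mu1, sigma1)
rhs = (1 - lam) * expit_deriv(x, mu0, sigma0)
```

If the iteration with g diverges, iterating with the inverse (8) instead can converge, since the two contraction constants are reciprocal.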
3.3.6. Confidence regions. To obtain confidence intervals we use the
delta method. The definition is taken from Agresti (1990), p. 422.
Definition 3.3 (Delta method). Let X_{n1}, X_{n2}, \ldots, X_{nl} be
asymptotically multivariate normally distributed random variables with
means \Theta_1, \Theta_2, \ldots, \Theta_l and covariance matrix \Sigma/n.
The subscript n expresses the dependence on the sample size n. We define
X_n = (X_{n1}, X_{n2}, \ldots, X_{nl})' and
\Theta = (\Theta_1, \Theta_2, \ldots, \Theta_l)'. More precisely, the X_n
converge in distribution as follows:
\[ \sqrt{n}(X_n - \Theta) \xrightarrow{d} N(0, \Sigma). \]
Suppose the function f(X_n) has a non-zero differential
d = (d_{X_1}, d_{X_2}, \ldots, d_{X_l})' at \Theta, where
\[ d_{X_i}(\Theta) = \left.\frac{\partial f(X_n)}{\partial X_{ni}}\right|_{(\Theta_1, \Theta_2, \ldots, \Theta_l)}. \]
It follows that
\[ \sqrt{n}\bigl(f(X_n) - f(\Theta)\bigr) \xrightarrow{d} N(0, d'\Sigma d). \]
Thus, for large samples, f(X_n) has a distribution similar to the normal
with mean f(\Theta) and variance (d'\Sigma d)/n.
Therefore, in the univariate case (l = 1) we can approximate
Var(f(X)), dropping the subscript n for ease of reading, by
\[ \bigl(f'(\Theta)\bigr)^2\,\mathrm{Var}(X), \]
and in the bivariate case (l = 2) we can approximate Var(f(X_1, X_2)) by
\[ d_{X_1}(\Theta)^2\,\mathrm{Var}(X_1) + 2\,d_{X_1}(\Theta)\,d_{X_2}(\Theta)\,\mathrm{Cov}(X_1, X_2) + d_{X_2}(\Theta)^2\,\mathrm{Var}(X_2). \]
3.3.6.1. Distribution parameters. In the following we calculate confidence
intervals for the distribution parameters, assuming \mu_j and \sigma_j,
j = 0, 1, to be asymptotically normally distributed. First, we compute the
variance of \sigma_j, j = 0, 1. With \sigma_j = 1/\beta_j in mind, we use
the univariate delta method with f(X) = 1/X and derivative
f'(X) = -1/X^2 and conclude
\[ \mathrm{Var}(\sigma_j) = \mathrm{Var}\left(\frac{1}{\beta_j}\right) = \frac{1}{\mathrm{E}(\beta_j)^4}\,\mathrm{Var}(\beta_j), \quad j = 0, 1. \]
For the computation of the variance of \mu_j = -\alpha_j/\beta_j,
j = 0, 1, we apply the bivariate delta method with
f(X_1, X_2) = X_2/X_1 and partial derivatives
\frac{\partial f}{\partial X_1} = -X_2/X_1^2 and
\frac{\partial f}{\partial X_2} = 1/X_1. For j = 0, 1 this results in
\[ \mathrm{Var}(\mu_j) = \mathrm{Var}\left(-\frac{\alpha_j}{\beta_j}\right) = \frac{\mathrm{E}(\alpha_j)^2}{\mathrm{E}(\beta_j)^4}\,\mathrm{Var}(\beta_j) - 2\,\frac{\mathrm{E}(\alpha_j)}{\mathrm{E}(\beta_j)^3}\,\mathrm{Cov}(\alpha_j, \beta_j) + \frac{1}{\mathrm{E}(\beta_j)^2}\,\mathrm{Var}(\alpha_j). \]
The variances of \alpha_j and \beta_j, j = 0, 1, are obtained from the
linear regression.
We get the (1 - \alpha) confidence interval for \sigma_j, j = 0, 1, as
\[ \left[\sigma_j - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(\sigma_j)},\; \sigma_j + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(\sigma_j)}\right] \]
and the (1 - \alpha) confidence interval for \mu_j, j = 0, 1, as
\[ \left[\mu_j - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(\mu_j)},\; \mu_j + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(\mu_j)}\right], \]
where z_{1-\frac{\alpha}{2}} is the (1 - \frac{\alpha}{2}) quantile of the
standard normal distribution.
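The computation can be sketched as follows; the coefficient estimates and their (co)variances are made-up numbers standing in for lmer() output.

```python
import math
from statistics import NormalDist

alpha, beta = -1.2, 0.8                    # assumed fitted intercept and slope
var_alpha, var_beta, cov_ab = 0.04, 0.01, -0.005   # assumed from the regression
z = NormalDist().inv_cdf(0.975)            # for a 95% confidence interval

mu = -alpha / beta
sigma = 1.0 / beta

var_sigma = var_beta / beta**4             # univariate delta method
var_mu = (alpha**2 / beta**4) * var_beta \
    - 2 * (alpha / beta**3) * cov_ab \
    + var_alpha / beta**2                  # bivariate delta method

ci_sigma = (sigma - z * math.sqrt(var_sigma), sigma + z * math.sqrt(var_sigma))
ci_mu = (mu - z * math.sqrt(var_mu), mu + z * math.sqrt(var_mu))
```

In practice the expectations in the formulas are replaced by the regression estimates, as done here.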
3.3.6.2. Sensitivity and specificity. To construct (1 - \alpha) confidence
intervals for sensitivity and specificity, we first consider confidence
intervals for the (probit or logit) transformed specificity and one minus
sensitivity. We demonstrate the procedure for the probit transformation
only, but it holds just as well for the logit transformation, simply
replacing \Phi by expit and \Phi^{-1} by logit.
The variances of the probit transformed specificity (now called y_0) and
the probit transformed one minus sensitivity (now called y_1) at a fixed
cut-off value x are given by
\[ \mathrm{Var}\bigl(\Phi^{-1}(\mathrm{Sp})\bigr) = \mathrm{Var}(y_0) = \mathrm{Var}(\alpha_0 + \beta_0 x) = \mathrm{Var}(\alpha_0) + 2x\,\mathrm{Cov}(\alpha_0, \beta_0) + x^2\,\mathrm{Var}(\beta_0), \]
\[ \mathrm{Var}\bigl(\Phi^{-1}(1 - \mathrm{Se})\bigr) = \mathrm{Var}(y_1) = \mathrm{Var}(\alpha_1 + \beta_1 x) = \mathrm{Var}(\alpha_1) + 2x\,\mathrm{Cov}(\alpha_1, \beta_1) + x^2\,\mathrm{Var}(\beta_1). \]
Thus, assuming the transformed specificity and one minus sensitivity to be
approximately normally distributed, the confidence interval for the
transformed specificity is given by
\[ \left[y_0 - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_0)};\; y_0 + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_0)}\right] \]
and for the transformed one minus sensitivity by
\[ \left[y_1 - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_1)};\; y_1 + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_1)}\right]. \]
We back-transform these confidence intervals with \Phi to obtain the
confidence interval of the specificity,
\[ \left[\Phi\left(y_0 - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_0)}\right);\; \Phi\left(y_0 + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_0)}\right)\right], \]
and of the sensitivity,
\[ \left[1 - \Phi\left(y_1 + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_1)}\right);\; 1 - \Phi\left(y_1 - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(y_1)}\right)\right]. \]
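A pointwise interval for the specificity at a fixed cut-off can be sketched as below; the coefficient values are assumed, and Φ is evaluated with the standard library's normal distribution.

```python
import math
from statistics import NormalDist

std = NormalDist()
alpha0, beta0 = -1.2, 0.8                  # assumed regression coefficients
var_a0, var_b0, cov_a0b0 = 0.04, 0.01, -0.005
x = 2.0                                    # cut-off of interest
z = std.inv_cdf(0.975)

y0 = alpha0 + beta0 * x                    # probit transformed specificity
var_y0 = var_a0 + 2 * x * cov_a0b0 + x**2 * var_b0

lo = y0 - z * math.sqrt(var_y0)
hi = y0 + z * math.sqrt(var_y0)
sp = std.cdf(y0)                           # pooled specificity at x
ci_sp = (std.cdf(lo), std.cdf(hi))         # back-transformed with Phi
```

Because Φ is monotone, the back-transformed interval automatically lies within (0, 1).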
3.3.6.3. Optimal cut-off. In the following we determine a confidence
interval for the optimal cut-off under the normal distribution assumption.
Recall the formula for the optimal cut-off x_0 if \sigma_0 \neq \sigma_1:
\[ x_0 = \frac{\mu_0\sigma_1^2 - \mu_1\sigma_0^2 + \sqrt{\sigma_0^2\sigma_1^2\left[2(\sigma_1^2 - \sigma_0^2)\left(\log\frac{\sigma_1}{\sigma_0} - \mathrm{logit}(\lambda_w)\right) + (\mu_1 - \mu_0)^2\right]}}{\sigma_1^2 - \sigma_0^2}. \]
First, we conduct a reparametrization with
\[ \mu_j = -\frac{\alpha_j}{\beta_j}, \quad \sigma_j = \frac{1}{\beta_j} \quad (j = 0, 1), \]
to achieve a representation of the optimal cut-off which depends on
\alpha_0, \alpha_1, \beta_0 and \beta_1:
\[ x_0 = h(\alpha_0, \alpha_1, \beta_0, \beta_1) = \frac{\alpha_1\beta_1 - \alpha_0\beta_0 + \sqrt{2(\beta_0^2 - \beta_1^2)\left(\log\frac{\beta_0}{\beta_1} - \mathrm{logit}(\lambda_w)\right) + (\alpha_0\beta_1 - \alpha_1\beta_0)^2}}{\beta_0^2 - \beta_1^2}. \]
Assuming the optimal cut-off to be asymptotically normally distributed, we
can use the delta method to obtain a variance. Let \Theta be the mean
vector (\mathrm{E}(\alpha_0), \mathrm{E}(\alpha_1), \mathrm{E}(\beta_0), \mathrm{E}(\beta_1))'.
The delta method implies that the approximate variance of the optimal
cut-off is given by
\[ \left.\frac{\partial h}{\partial \alpha_0}\right|_\Theta^2 \mathrm{Var}(\alpha_0) + \left.\frac{\partial h}{\partial \alpha_1}\right|_\Theta^2 \mathrm{Var}(\alpha_1) + \left.\frac{\partial h}{\partial \beta_0}\right|_\Theta^2 \mathrm{Var}(\beta_0) + \left.\frac{\partial h}{\partial \beta_1}\right|_\Theta^2 \mathrm{Var}(\beta_1) \]
\[ + 2\left.\frac{\partial h}{\partial \alpha_0}\right|_\Theta \left.\frac{\partial h}{\partial \alpha_1}\right|_\Theta \mathrm{Cov}(\alpha_0, \alpha_1) + 2\left.\frac{\partial h}{\partial \alpha_0}\right|_\Theta \left.\frac{\partial h}{\partial \beta_0}\right|_\Theta \mathrm{Cov}(\alpha_0, \beta_0) \]
\[ + 2\left.\frac{\partial h}{\partial \alpha_0}\right|_\Theta \left.\frac{\partial h}{\partial \beta_1}\right|_\Theta \mathrm{Cov}(\alpha_0, \beta_1) + 2\left.\frac{\partial h}{\partial \alpha_1}\right|_\Theta \left.\frac{\partial h}{\partial \beta_0}\right|_\Theta \mathrm{Cov}(\alpha_1, \beta_0) \]
\[ + 2\left.\frac{\partial h}{\partial \alpha_1}\right|_\Theta \left.\frac{\partial h}{\partial \beta_1}\right|_\Theta \mathrm{Cov}(\alpha_1, \beta_1) + 2\left.\frac{\partial h}{\partial \beta_0}\right|_\Theta \left.\frac{\partial h}{\partial \beta_1}\right|_\Theta \mathrm{Cov}(\beta_0, \beta_1). \]
The variances and covariances in the formula result from the linear
regression. Denoting
\[ S := \sqrt{2(\beta_0^2 - \beta_1^2)\left(\log\frac{\beta_0}{\beta_1} - \mathrm{logit}(\lambda_w)\right) + (\alpha_0\beta_1 - \alpha_1\beta_0)^2}, \]
the partial derivatives are given by
\[ \frac{\partial h}{\partial \alpha_0} = \frac{-\beta_0 + \frac{\beta_1}{S}(\alpha_0\beta_1 - \alpha_1\beta_0)}{\beta_0^2 - \beta_1^2}, \]
\[ \frac{\partial h}{\partial \alpha_1} = \frac{\beta_1 - \frac{\beta_0}{S}(\alpha_0\beta_1 - \alpha_1\beta_0)}{\beta_0^2 - \beta_1^2}, \]
\[ \frac{\partial h}{\partial \beta_0} = \frac{-\alpha_0 + \frac{1}{\beta_0 S}(\beta_0^2 - \beta_1^2)}{\beta_0^2 - \beta_1^2} + \frac{4\beta_0\left(\log\frac{\beta_0}{\beta_1} - \mathrm{logit}(\lambda_w)\right) - 2\alpha_1(\alpha_0\beta_1 - \alpha_1\beta_0)}{2S(\beta_0^2 - \beta_1^2)} - \frac{2\beta_0(\alpha_1\beta_1 - \alpha_0\beta_0 + S)}{(\beta_0^2 - \beta_1^2)^2}, \]
\[ \frac{\partial h}{\partial \beta_1} = \frac{\alpha_1 - \frac{1}{\beta_1 S}(\beta_0^2 - \beta_1^2)}{\beta_0^2 - \beta_1^2} + \frac{-4\beta_1\left(\log\frac{\beta_0}{\beta_1} - \mathrm{logit}(\lambda_w)\right) + 2\alpha_0(\alpha_0\beta_1 - \alpha_1\beta_0)}{2S(\beta_0^2 - \beta_1^2)} + \frac{2\beta_1(\alpha_1\beta_1 - \alpha_0\beta_0 + S)}{(\beta_0^2 - \beta_1^2)^2}. \]
If \sigma_0 = \sigma_1 =: \sigma holds, the optimal cut-off value is given
by
\[ x_0 = \frac{\sigma^2\,\mathrm{logit}(\lambda_w) + \frac{1}{2}(\mu_0^2 - \mu_1^2)}{\mu_0 - \mu_1}. \]
As \sigma_0 = \sigma_1, it follows that \beta_0 = \beta_1 =: \beta. We
first reparametrize so as to obtain the optimal cut-off value as a function
of \alpha_0, \alpha_1 and \beta:
\[ x_0 = h_\sigma(\alpha_0, \alpha_1, \beta) = \frac{\mathrm{logit}(\lambda_w) + \frac{1}{2}(\alpha_0^2 - \alpha_1^2)}{\beta(\alpha_1 - \alpha_0)}. \]
Then, with the delta method, we obtain the following estimate of the
variance of x_0, with \Theta being the mean vector
(\mathrm{E}(\alpha_0), \mathrm{E}(\alpha_1), \mathrm{E}(\beta))':
\[ \left.\frac{\partial h_\sigma}{\partial \alpha_0}\right|_\Theta^2 \mathrm{Var}(\alpha_0) + \left.\frac{\partial h_\sigma}{\partial \alpha_1}\right|_\Theta^2 \mathrm{Var}(\alpha_1) + \left.\frac{\partial h_\sigma}{\partial \beta}\right|_\Theta^2 \mathrm{Var}(\beta) \]
\[ + 2\left.\frac{\partial h_\sigma}{\partial \alpha_0}\right|_\Theta \left.\frac{\partial h_\sigma}{\partial \alpha_1}\right|_\Theta \mathrm{Cov}(\alpha_0, \alpha_1) + 2\left.\frac{\partial h_\sigma}{\partial \alpha_0}\right|_\Theta \left.\frac{\partial h_\sigma}{\partial \beta}\right|_\Theta \mathrm{Cov}(\alpha_0, \beta) + 2\left.\frac{\partial h_\sigma}{\partial \alpha_1}\right|_\Theta \left.\frac{\partial h_\sigma}{\partial \beta}\right|_\Theta \mathrm{Cov}(\alpha_1, \beta), \]
where the derivatives are given by
\[ \frac{\partial h_\sigma}{\partial \alpha_0} = \frac{\alpha_0(\alpha_1 - \alpha_0) + \mathrm{logit}(\lambda_w) + \frac{1}{2}(\alpha_0^2 - \alpha_1^2)}{\beta(\alpha_1 - \alpha_0)^2}, \]
\[ \frac{\partial h_\sigma}{\partial \alpha_1} = -\frac{\alpha_1(\alpha_1 - \alpha_0) + \mathrm{logit}(\lambda_w) + \frac{1}{2}(\alpha_0^2 - \alpha_1^2)}{\beta(\alpha_1 - \alpha_0)^2}, \]
\[ \frac{\partial h_\sigma}{\partial \beta} = -\frac{\mathrm{logit}(\lambda_w) + \frac{1}{2}(\alpha_0^2 - \alpha_1^2)}{\beta^2(\alpha_1 - \alpha_0)}. \]
Thus, the confidence interval of the optimal cut-off in the normal
distribution case is given by
\[ \left[x_0 - z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(x_0)},\; x_0 + z_{1-\frac{\alpha}{2}}\sqrt{\mathrm{Var}(x_0)}\right]. \]
Under the logistic assumption, the optimal cut-off can only be determined
by a fixed point iteration, so we cannot obtain a variance estimate in the
same way.
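The whole construction for σ0 ≠ σ1 can be sketched compactly. To keep the sketch short, the gradient of h is obtained by central finite differences instead of the analytic partial derivatives given above; all inputs are assumed values standing in for regression output.

```python
import math
from statistics import NormalDist

lam = 0.5

def logit(p):
    return math.log(p / (1 - p))

def h(a0, a1, b0, b1):                     # reparametrized optimal cut-off
    s = math.sqrt(2 * (b0**2 - b1**2) * (math.log(b0 / b1) - logit(lam))
                  + (a0 * b1 - a1 * b0) ** 2)
    return (a1 * b1 - a0 * b0 + s) / (b0**2 - b1**2)

theta = [-0.5, -2.0, 1.0, 0.8]             # assumed (alpha0, alpha1, beta0, beta1)
cov = [[0.020, 0.005, 0.001, 0.001],       # assumed covariance matrix
       [0.005, 0.030, 0.001, 0.002],
       [0.001, 0.001, 0.004, 0.001],
       [0.001, 0.002, 0.001, 0.005]]

def grad(f, p, eps=1e-6):                  # central finite differences
    out = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += eps
        dn[i] -= eps
        out.append((f(*up) - f(*dn)) / (2 * eps))
    return out

d = grad(h, theta)
var_x0 = sum(d[i] * cov[i][j] * d[j] for i in range(4) for j in range(4))
x0 = h(*theta)
z = NormalDist().inv_cdf(0.975)
ci = (x0 - z * math.sqrt(var_x0), x0 + z * math.sqrt(var_x0))
```

Using the analytic derivatives instead of numerical differentiation gives the same interval up to discretization error.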
3.4. Model Selection
In the novel approach a broad range of linear mixed models are
introduced. This leads directly to the question of model selection. The
model selection of linear mixed effects models is a current research
question, as common measures known from fixed effects models cannot be
used for mixed models. Several approaches to model selection have been
presented in recent years. We briefly present two that seem promising to
us. Unfortunately, neither approach is suitable for selecting one of our
models.
3.4.1. REML criterion. It is common to estimate mixed models
with a restricted maximum likelihood (REML) approach. The idea is to
obtain less biased variance estimates for the random effects, considering
linear combinations of the data that remove the fixed effects (Faraway
(2006), p. 172). The REML criterion is defined as
\[ \mathrm{REML} = -2\log(\mathrm{Lik}_{\mathrm{REML}}), \]
where \mathrm{Lik}_{\mathrm{REML}} is the likelihood function of the
transformed data. The preferred model is the one with the smallest REML
criterion.
As the fixed effects are eliminated, the REML criterion can only compare
mixed models with the same fixed effects. Thus, for our range of models
(where some models assume equal fixed slopes \beta_0 = \beta_1 and others
do not), this model selection method is not appropriate.
3.4.2. Conditional AIC. The Akaike information criterion (AIC) is well known from linear fixed effects models. It is defined as follows:

Definition 3.4 (Akaike information criterion). Let L̂ be the maximized value of the likelihood function for the model and p the number of estimated parameters in the model. Then the AIC of the model is defined as

AIC = −2 ln(L̂) + 2p.
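For a fitted model with known maximized log-likelihood, the AIC is a one-line computation; a minimal Python sketch (function name our own):

```python
def aic(max_log_likelihood, p):
    """Akaike information criterion: -2*ln(L_hat) + 2*p, where the
    first argument is the maximized log-likelihood ln(L_hat)."""
    return -2 * max_log_likelihood + 2 * p
```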
The problem of using this measure with mixed effects models is the lack of clarity on how to determine the number of parameters p for the
model. Imagine we had a mixed effects model with only one random
effect. So we could argue that we only have one parameter more than
the corresponding fixed effects model, the variance parameter for that
random effect. But on the other hand, we are incorporating as many
random effects as we have studies. So what is the correct number of
parameters?
To face this problem, Vaida and Blanchard (2005) proposed a condi-
tional Akaike information criterion (cAIC). Instead of using twice the
number of parameters as penalty term, they propose a penalty term
which is related to the effective degrees of freedom for a linear mixed
model stated by Hodges and Sargent (2001). The effective degrees of
freedom reflect an intermediate level of complexity between the fixed-
effects model without the random effects and a corresponding model
with all random effects counted as fixed ones. Greven and Kneib (2010)
identified deficits of the cAIC (the estimation of the random effects co-
variance matrix induces bias) and derived an analytic representation of
a corrected version of the cAIC. They also provide an implementation
in an R package.
This approach seems very promising, but unfortunately the R implementation did not yet work in all of our examples, and its results were not entirely plausible.
3.5. Implementation in R
To apply the linear mixed models presented in subsection 3.3.3 we
use the lmer() function of the R-package ’lme4’. The function provides
maximum likelihood or restricted maximum likelihood estimates of the
parameters in linear mixed effects models. The model is described
by a formula, including fixed and random effects terms. To estimate
the parameters of, for example, model DI with normal distribution
assumption, we use the following code:
lmer(qnorm(NN) ~ Group * Cut + (Group | Study)),

where the left-hand side of the ~ symbol represents the response variable and the right-hand side the explanatory variables. Here qnorm() is the probit function and 'NN' a combined vector of the proportions of the negative test results, first of the non-diseased and then of the diseased individuals.
On the right-hand side of the formula Group * Cut stands for the fixed effects and (Group | Study) for the random effects with study as clustering factor. 'Group' is a vector containing zeros in the first half and ones in the second, standing for the non-diseased and diseased individuals, respectively. 'Cut' is the vector of thresholds of studies 1 to m. The * symbol means that three regression parameters will be estimated, one for the covariate Group (i1), one for the interaction Group·Cut (s1) and one for Cut (s0). With the intercept i0 (which is estimated by default), we obtain the fixed effects parameters of model DI as follows:
α0 = i0,
α1 = i0 + i1,
β0 = s0,
β1 = s0 + s1,
as for the non-diseased the group variable is zero and for the diseased
it is one.
In the following, let us examine the random effects. There are two
random effects, the default random intercept i0 and a random effect i1
of the covariate Group. To obtain the random effects of model DI we
proceed analogously to the fixed effects:
a0s = i0,
a1s = i0 + i1.
For the variances and covariance it thus holds that

τ²0a = Var(i0),
τ²1a = Var(i0 + i1) = Var(i0) + 2 Cov(i0, i1) + Var(i1),
Cov(a0s, a1s) = Cov(i0, i0 + i1) = τ²0a + Cov(i0, i1).
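The mapping from the lmer() output to the parameters of model DI can be collected in a small helper. The following Python sketch (function and argument names our own) mirrors the equations above:

```python
def di_parameters(i0, i1, s0, s1, var_i0, var_i1, cov_i0_i1):
    """Recover the parameters of model DI from the lmer() estimates:
    fixed effects i0, i1, s0, s1 and the variance-covariance entries
    of the two random effects."""
    fixed = {
        "alpha0": i0,        # intercept, non-diseased
        "alpha1": i0 + i1,   # intercept, diseased
        "beta0": s0,         # slope, non-diseased
        "beta1": s0 + s1,    # slope, diseased
    }
    random_effects = {
        "tau0a_sq": var_i0,
        "tau1a_sq": var_i0 + 2 * cov_i0_i1 + var_i1,
        "cov_a0s_a1s": var_i0 + cov_i0_i1,
    }
    return fixed, random_effects
```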
In R the linear mixed model is written as a single equation to allow for correlation between the random effects of the non-diseased and the diseased individuals.
One needs to pay attention to the number of parameters that need to
be estimated for a model. For the most complex model, model DIDS,
the number of parameters is the highest and none of our examples of
chapter 4 provided enough data to estimate these parameters.
To include heteroscedasticity with a small number of additional parameters (Gałecki and Burzykowski, 2013, p.124), the lmer() function assumes the variance of the residual error of study s and cut-off i to be an unknown scale parameter γ² divided by a given prior weight wsi or vsi. By default wsi = vsi = 1 is assumed for all observations (Bates et al., 2015, p.42). Thus, the higher the weight for a
given observation, the lower the variance. The prior weights are not
normalized or standardized. Therefore, if the weights have relatively
large magnitudes, then in order to compensate, the parameter γ will
need to have a relatively large magnitude (Bates et al., 2015, p.42).
This impairs the convergence of the lmer function, as the eigenvalues
rise. So it is advisable to scale the weights. We propose scaling the weights so that they have a mean of one. In section 3.6 we will take a look at possible prior weights.
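The proposed scaling to mean one is straightforward; a minimal Python sketch (function name our own):

```python
def scale_to_mean_one(weights):
    """Scale prior weights so that their mean is one, easing
    convergence of the mixed-model fit."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]
```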
3.6. Weighting Parameters
In meta-analysis it is reasonable to assign different weights to different studies. First of all, studies with larger sample sizes should be emphasized, as the information these studies provide represents a larger part of the population and they are thus expected to be more reliable. Secondly, one could emphasize studies with smaller variance. A small variance can be due to a large sample size and to homogeneous test results, both of which are desirable. In the following we want to
present these two possibilities of weighting.
3.6.1. Sample Size. To weight the studies with their sample size,
we set weights w0si and w1si for the non-diseased and diseased individ-
uals of study s and cut-off i as
w0si = n0s ,
w1si = n1s ,
where n0s is the number of non-diseased individuals in study s and n1s
the number of diseased individuals in study s. It is important to weight
the groups separately, because a study may have different sample sizes
in each group.
3.6.2. Inverse Variance. To determine inverse variance weights,
we need to compute the variances of the probit and logit transformed
specificity and one minus sensitivity out of the data. Therefore, we use
the delta method (see definition 3.3).
First, we determine the mean and variance of the estimated specificity
and one minus sensitivity. Considering the TN as a (n0, p0)-binomial
distributed random variable within the non-diseased, with p0 being the
probability of a negative test result for a non-diseased individual and
n0 the number of non-diseased individuals, we have
E(Ŝp) = E(TN/n0) = (1/n0) E(TN) = p0,   (9)
Var(Ŝp) = Var(TN/n0) = (1/n0²) Var(TN) = p0(1 − p0)/n0.   (10)
Analogously we assume the TP as being a (n1,p1)-binomial distributed
random variable with p1 the probability of a positive test result for a
diseased individual and n1 the number of diseased individuals. Thereby
we have
E(1 − Ŝe) = 1 − p1,   (11)
Var(1 − Ŝe) = p1(1 − p1)/n1.   (12)
Let us first consider the probit transformation Φ⁻¹. For a random variable X the derivative of the probit function is given by

(Φ⁻¹)′(X) = 1 / φ(Φ⁻¹(X)),   (13)

where φ is the density function of the standard normal distribution.
Let us assume that the specificity and one minus the sensitivity are asymptotically normally distributed. With the delta method we obtain

Var(Φ⁻¹(Ŝp)) = (1 / φ(Φ⁻¹(E(Ŝp))))² · Var(Ŝp)          [by (13)]
             = (1 / φ(Φ⁻¹(p0)))² · p0(1 − p0)/n0.       [by (9), (10)]
Estimating p0 as p̂0 = TN/n0, we get for the estimated variance

V̂ar(Φ⁻¹(Ŝp)) = (1 / φ(Φ⁻¹(p̂0)))² · p̂0(1 − p̂0)/n0
             = (1 / φ(Φ⁻¹(TN/n0)))² · (TN · FP)/n0³.
For 1-sensitivity we obtain analogously

Var(Φ⁻¹(1 − Ŝe)) = (1 / φ(Φ⁻¹(1 − p1)))² · p1(1 − p1)/n1   [by (13), (11), (12)]
and with the probability p1 estimated as p̂1 = TP/n1, we conclude for the estimated variance

V̂ar(Φ⁻¹(1 − Ŝe)) = (1 / φ(Φ⁻¹(FN/n1)))² · (TP · FN)/n1³.
Thus, the weights of study s and cut-off i for the non-diseased and diseased, defined as (unscaled) inverse variances, result in

w0si = 1 / V̂ar(Φ⁻¹(TNsi/n0s)) = n0s³ · φ(Φ⁻¹(TNsi/n0s))² / (TNsi · FPsi),
w1si = 1 / V̂ar(Φ⁻¹(FNsi/n1s)) = n1s³ · φ(Φ⁻¹(FNsi/n1s))² / (TPsi · FNsi).
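The probit inverse-variance weights above depend only on the 2×2 table of a study at a given cut-off. The following Python sketch (function name our own; the standard-library NormalDist supplies φ and Φ⁻¹) computes them:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # phi = .pdf, probit = .inv_cdf

def probit_weights(TN, FP, FN, TP):
    """Unscaled inverse-variance weights of one study at one cut-off
    under the probit transformation:
    w0 = n0^3 * phi(probit(TN/n0))^2 / (TN * FP),
    w1 = n1^3 * phi(probit(FN/n1))^2 / (TP * FN)."""
    n0 = TN + FP  # number of non-diseased
    n1 = TP + FN  # number of diseased
    w0 = n0 ** 3 * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(TN / n0)) ** 2 / (TN * FP)
    w1 = n1 ** 3 * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(FN / n1)) ** 2 / (TP * FN)
    return w0, w1
```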
Now let us consider the logit transformation. For a random variable X the derivative of the logit function is given by

logit′(X) = 1/(X(1 − X)).   (14)

Then we obtain with the delta method

Var(logit(Ŝp)) = 1/(E(Ŝp)²(1 − E(Ŝp))²) · Var(Ŝp)   [by (14)]
              = 1/(p0²(1 − p0)²) · p0(1 − p0)/n0
              = 1/(n0 p0(1 − p0)).   [by (9), (10)]
The estimated variance results in

V̂ar(logit(Ŝp)) = 1/(n0 p̂0(1 − p̂0)) = n0/(TN · FP).
To calculate the variance of one minus sensitivity we proceed as above and get

Var(logit(1 − Ŝe)) = 1/((1 − p1)² p1²) · p1(1 − p1)/n1 = 1/(n1 p1(1 − p1)).   [by (14), (11), (12)]
Then the estimate of the variance is

V̂ar(logit(1 − Ŝe)) = 1/(n1 p̂1(1 − p̂1)) = n1/(TP · FN).
So for the weights w0si of the non-diseased in study s and cut-off i and w1si of the diseased in study s and cut-off i we obtain the (unscaled) inverse-variance weights

w0si = (TNsi · FPsi)/n0s,
w1si = (TPsi · FNsi)/n1s.
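Under the logit transformation the weights reduce to simple functions of the cell counts; a minimal Python sketch (function name our own):

```python
def logit_weights(TN, FP, FN, TP):
    """Unscaled inverse-variance weights of one study at one cut-off
    under the logit transformation: w0 = TN*FP/n0, w1 = TP*FN/n1."""
    n0 = TN + FP  # number of non-diseased
    n1 = TP + FN  # number of diseased
    return TN * FP / n0, TP * FN / n1
```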
CHAPTER 4
Examples
In the following chapter we want to apply the novel approach to
different examples. In all plots the (transformed) proportions of neg-
ative test results of the non-diseased individuals will be depicted as
open circles and the ones of the diseased individuals as filled circles.
The different studies are marked in different colors. The regression
line (and the distribution function) of the non-diseased is depicted as a
dashed line, whereas the regression line (and the distribution function)
of the diseased is depicted as a continuous line. Grey lines mark the
confidence regions of the estimated specificity and one minus sensitiv-
ity, respectively.
To apply our approach to the examples, we linearly regress the transformed proportions of negative test results for the diseased and non-diseased, respectively. To this end, we use the lmer() function in R with normal distribution assumption, REML estimation and inverse variance weights scaled to mean one. To avoid problems with zero values, we add a
continuity correction of 0.5 to TN, TP, FN and FP. We limit our-
selves to using models of the general form, i.e. where the fixed slopes
of non-diseased and diseased individuals may differ, and mark
these models with ’*’. To choose one model of this range, we select the
one with the smallest REML criterion. This procedure leads to model
*DICS in almost all examples shown. We use a weighting parameter
λw of 0.5, meaning that sensitivity and specificity are equally weighted,
except when noted otherwise.
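The preprocessing just described (continuity correction of 0.5 and probit transformation of the proportions of negative test results) can be sketched as follows; the function name is our own:

```python
from statistics import NormalDist

probit = NormalDist().inv_cdf

def transform_proportions(TN, FP, FN, TP, cc=0.5):
    """Apply the continuity correction to the 2x2 counts and
    probit-transform the proportions of negative test results in the
    non-diseased and diseased group."""
    TN, FP, FN, TP = TN + cc, FP + cc, FN + cc, TP + cc
    specificity = TN / (TN + FP)        # P(negative | non-diseased)
    one_minus_sens = FN / (FN + TP)     # P(negative | diseased)
    return probit(specificity), probit(one_minus_sens)
```

The correction keeps the transformation finite even when a cell count is zero.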
4.1. Troponin as a marker for myocardial infarction
The first example that we want to consider is the data of the sys-
tematic review of Zhelev et al. (2015) where they investigated the ”di-
agnostic accuracy of single baseline measurement of Elecsys Troponin
T high-sensitive assay for diagnosis of acute myocardial infarction in
[Figure: plot of P(negative test result) against the troponin threshold in ng/L (log scale).]
Figure 4.1. Troponin data. The proportions of negative test results of the non-diseased are depicted as open circles (estimated specificities), the ones of the diseased as filled circles (estimated 1-sensitivities). The data points belonging to different studies are marked in different colors and data points of the same studies are connected with lines.
emergency department”. There are 23 studies included, where 8 stud-
ies report 2 to 4 thresholds and 13 only one. Most of them (20 studies)
reported a threshold of 14 ng/L, as this is the manufacturer’s recom-
mended threshold (the 99% quantile of a healthy reference population).
Four studies reported a threshold of 3 ng/L and four a threshold of 5
ng/L. Furthermore, some other thresholds were reported. The data
can be found in table A.1 in the appendix.
Plotting the proportions of negative test results against the logarith-
mized threshold of troponin leads to figure 4.1.
Zhelev et al. conducted two meta-analyses using a bivariate model,
[Figure: plot of Φ⁻¹(P(negative test result)) against the troponin threshold in ng/L (log scale).]
Figure 4.2. Troponin data. Regression lines of the non-diseased (open circles, dashed line) and the diseased individuals (filled circles, solid line). Different studies are marked in different colors.
one pooling the data for the 14 ng/L threshold and one combined for
the thresholds 3 and 5 ng/L, as there was not enough data to perform
separate meta-analyses for these thresholds as well.
To apply our novel approach, we will proceed step-by-step. First, we
transform the proportions of negative test results applying the probit
function. After the transformation, we use a linear mixed effects model
to regress the data. From all *-models we select model *DICS, as it
has the smallest REML criterion of this range of models. The result
can be seen in figure 4.2.
[Figure: plot of P(negative test result) against the troponin threshold in ng/L (log scale), with the optimal threshold at 20.8 ng/L.]
Figure 4.3. Troponin data. Biomarker distributions within the non-diseased (open circles, dashed line) and within the diseased individuals (filled circles, solid line). The grey lines mark the confidence regions. The optimal threshold, derived from a maximization of the Youden index, is depicted as a solid vertical line. Different studies are marked in different colors.
We back-transform the data and the parameters and obtain the dis-
tribution functions of the biomarker within the non-diseased and dis-
eased individuals, respectively (see figure 4.3). The resulting sensitiv-
ity at threshold 14 ng/L is 0.88 [0.85, 0.91] and the specificity is 0.75
[0.68, 0.82]. Zhelev et al. obtained a higher diagnostic accuracy of the
biomarker at the threshold 14 ng/L, as the sensitivity was 0.90 [0.86,
0.92] and the specificity was 0.77 [0.69, 0.84].
Our optimal threshold, however, determined by a maximization of the Youden index, is 20.78 ng/L. This is a lot higher than the recommended
[Figure: plot of the Youden index against the troponin threshold in ng/L (log scale), with the optimal threshold at 20.8 ng/L.]
Figure 4.4. The Youden index of the troponin data with weighting parameter λw = 0.5. The optimal threshold, depicted as a solid vertical line, is derived as that threshold where the maximum is obtained.
threshold. The Youden index (with weighting parameter λw = 0.5, see
equation (3)) can be seen in figure 4.4.
At threshold 20.78 ng/L the sensitivity is 0.81 [0.75, 0.86] and the specificity is 0.86 [0.82, 0.90]. This means that with the threshold of 20.78 ng/L, sensitivity decreased by 0.07, but specificity increased by 0.11 with respect to the threshold of 14 ng/L. The big difference
between the recommended threshold and the point of maximization of
the Youden index might come from a different weighting of sensitivity
and specificity. As the diagnostic accuracy of Elecsys Troponin T high
sensitive assay is analyzed in the emergency department, it is likely
that sensitivity is emphasized. Setting the weighting parameter λw of
the weighted Youden index to 2/3, we obtain an optimal threshold of
13.9 ng/L, which is very close to the recommended threshold.
Finally, we present the estimated summary ROC curve (figure 4.5), which can be obtained easily since estimates of all distribution parameters are available (see equation (1)).
[Figure: summary ROC curve, sensitivity against 1 − specificity.]
Figure 4.5. Summary ROC curve of the troponin data with two different optimal thresholds obtained by choosing weighting parameters λw = 0.5 (optimal threshold at 20.8 ng/L, marked with black cross) and λw = 2/3 (optimal threshold at 13.9 ng/L, marked with red cross). Different studies are marked in different colors.
4.2. Procalcitonin as a marker for sepsis
The next example we want to consider is the data of the systematic review of Wacker et al. (2013). In this publication a meta-analysis has been conducted "to investigate the ability of procalcitonin to differentiate between sepsis and systemic inflammatory response syndrome of non-infectious origin in critically ill patients" (Wacker et al., 2013).
There were 31 different studies included. They obtained a pooled sen-
sitivity of 0.77 [0.72, 0.81] and a pooled specificity of 0.79 [0.74, 0.84]
by using a bivariate mixed effects regression model.
Wacker et al. mentioned that several studies reported more than one
threshold and provided us with a list of these studies. There were 11
of the 31 studies reporting between 2 and 5 thresholds (see table A.2).
We applied our approach to that data, logarithmizing the procalcitonin
(PCT) thresholds and using model *DICS. The results are shown in
plot 4.6. We obtain an optimal threshold of 1.1 ng/mL and pooled
sensitivity of 0.71 [0.63, 0.78] and pooled specificity of 0.80 [0.73, 0.85]
at this threshold. Compared to the results of Wacker et al., our esti-
mated sensitivity is lower and has a bigger confidence interval, whereas
the estimates for the specificity are quite similar. The summary ROC
curve is shown in plot 4.7.
It is important to mention that the procalcitonin data is very het-
erogeneous. Wacker et al. investigated sources of heterogeneity with
meta-regression, but could not explain the heterogeneity. As different
procalcitonin assays were used in the studies, it may be reasonable to
stratify with respect to these assays. The different assays are all tests
to determine the concentration of procalcitonin in human serum and
plasma. To be precise, these assays are PCT-Q (a rapid test without
instruments), PCT-LIA (a manual standard test which is very reliable)
and PCT-Kryptor (a fully automated test).
The data and the estimated distribution functions stratified by assays
can be seen in figures 4.8 and 4.9. We used model *DICS for all as-
says. The resulting optimal thresholds and the pooled sensitivities and
specificities at these thresholds are given in table 4.1. For the PCT-Q
assay the point where the Youden index is maximised, i.e. the optimal
threshold, is 0.34 ng/mL. However, this value is outside of the data
range (see top panel in figure 4.8). Thus, to avoid extrapolation, we
[Figure: plot of P(negative test result) against the procalcitonin threshold in ng/mL (log scale), with the optimal threshold at 1.1 ng/mL.]
Figure 4.6. Distribution functions of procalcitonin within the non-diseased (open circles, dashed line) and within the diseased individuals (filled circles, solid line). The grey lines mark the confidence regions and different studies are marked in different colors. The optimal threshold, derived from a maximization of the Youden index, is depicted as a solid vertical line.
Assay        Optimal threshold [ng/mL]  Sensitivity        Specificity
PCT-Q        0.50                       0.80 [0.68, 0.89]  0.86 [0.77, 0.92]
PCT-LIA      2.10                       0.65 [0.54, 0.75]  0.84 [0.73, 0.91]
PCT-Kryptor  0.87                       0.71 [0.55, 0.83]  0.81 [0.72, 0.87]
Table 4.1. Sensitivities and specificities at the optimal threshold for the different assays of procalcitonin.
[Figure: model-based summary ROC curve with the optimal threshold at 1.1 ng/mL, (Se, Sp) = (0.71, 0.80).]
Figure 4.7. Summary ROC curve of the procalcitonin data with the optimal threshold. Different studies are marked in different colors.
recommend using a threshold of 0.5 ng/mL.
The optimal thresholds of the assays differ a lot. Therefore, it is rea-
sonable to stratify. Nevertheless, the data still contains a lot of hetero-
geneity, which explains the big confidence intervals. The three sum-
mary ROC curves of the different procalcitonin assays are shown in
plot 4.10. They differ a lot, and the PCT-Q assay seems to have the best diagnostic accuracy. However, the data set of the PCT-Q assay
was the smallest with only 4 studies included.
[Figure: two plots of P(negative test result) against the procalcitonin threshold in ng/mL (log scale); optimal thresholds at 0.3 ng/mL (top) and 2.1 ng/mL (bottom).]
Figure 4.8. Top: procalcitonin data with PCT-Q assay, bottom: procalcitonin data with PCT-LIA assay. Procalcitonin distribution functions within the non-diseased (open circles, dashed line) and within the diseased individuals (filled circles, solid line). The grey lines mark the confidence regions and different studies are marked in different colors. The optimal threshold, derived from a maximization of the Youden index, is depicted as a solid vertical line.
[Figure: plot of P(negative test result) against the procalcitonin threshold in ng/mL (log scale), with the optimal threshold at 0.9 ng/mL.]
Figure 4.9. Procalcitonin data with PCT-Kryptor assay. Biomarker distributions within the non-diseased (open circles, dashed line) and within the diseased individuals (filled circles, solid line). The grey lines mark the confidence regions and different studies are marked in different colors. The optimal threshold, derived from a maximization of the Youden index, is depicted as a solid vertical line.
[Figure: three summary ROC curves with optimal thresholds of PCT-Q at 0.5 ng/mL, PCT-LIA at 2.1 ng/mL and PCT-Kryptor at 0.87 ng/mL.]
Figure 4.10. Summary ROC curves of the different procalcitonin assays with the optimal threshold marked as a cross (PCT-Q assay in black, PCT-LIA assay in red and PCT-Kryptor assay in green). Different studies are marked in different colors.
4.3. Procalcitonin as a marker for neonatal sepsis
In the following we want to consider the data of the systematic
review of Vouloumanou et al. (2011), where the value of serum pro-
calcitonin for the distinction of individuals with and without neonatal
sepsis was investigated. They reported 16 studies, whereof 3 reported
2 to 3 thresholds and the others only one. The data set with multiple
thresholds per study can be found in table A.3. Martínez-Camblor used
[Figure: plot of Φ⁻¹(P(negative test result)) against the procalcitonin threshold in ng/mL (log scale).]
Figure 4.11. Procalcitonin data concerning neonatal sepsis with regression lines of the non-diseased (open circles, dashed line) and the diseased individuals (filled circles, solid line). Different studies are marked in different colors.
this example to demonstrate his non-parametric approach described in
subsection 3.2.3.
Applying our new approach to this data may, depending on the model,
result in regression lines with negative slope. This leads to negative
standard deviations and thus the model fails (see paragraph 3.3.4).
The result for model *CS and logarithmized procalcitonin threshold
can be seen in plot 4.11.
For some of the other models the regression lines are not decreasing,
but almost horizontal, resulting in optimal thresholds ranging from 0 to 1.4 · 10^43. Thus, we do not obtain reasonable results for this data.
4.4. CAGE Questionnaire
Generally, our approach is based on the assumption of a continuous
biomarker. However, in the following we want to apply our approach
to a discrete biomarker. Aertgeerts et al. (2004) conducted a meta-
analysis to assess diagnostic characteristics of the CAGE, a self-report
questionnaire to identify alcoholism. The data can be found in table
A.4. Putter et al. (2010) used this data example to demonstrate their
approach which is described in subsection 3.2.2. The data consists of
10 studies all reporting 5 thresholds (0, 1, 2, 3, 4) and corresponding
values of sensitivity and specificity.
The results of our approach can be seen in plot 4.12. The regression was conducted with model *DICS and led to an optimal threshold of 1.56, but this threshold does not exist for the discrete CAGE score.
Instead, we need to maximize the Youden index on the discrete set of
thresholds {0, 1, 2, 3, 4}. This results in the optimal threshold 2 with
a sensitivity of 0.70 [0.60, 0.78] and a specificity of 0.88 [0.82, 0.92].
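Maximizing the Youden index over a discrete threshold set can be sketched as follows, here with the pooled distributions assumed normal; the function name and any parameter values are our own illustration, not the fitted CAGE estimates:

```python
from statistics import NormalDist

def discrete_optimal_threshold(mu0, sd0, mu1, sd1, thresholds):
    """Evaluate the Youden index Se + Sp - 1 of the fitted normal
    biomarker distributions only at the existing thresholds and
    return the best threshold together with its Youden index."""
    best_c, best_j = None, -1.0
    for c in thresholds:
        sp = NormalDist(mu0, sd0).cdf(c)      # P(negative | non-diseased)
        se = 1 - NormalDist(mu1, sd1).cdf(c)  # P(positive | diseased)
        if se + sp - 1 > best_j:
            best_c, best_j = c, se + sp - 1
    return best_c, best_j
```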
The Poisson correlated gamma frailty model of Putter et al. resulted
in a sensitivity of 0.69 [0.63, 0.76] and a specificity of 0.88 [0.83, 0.94]
at threshold 2. Thus, the results are very similar, with our approach
leading to slightly bigger confidence intervals.
The summary ROC curve is shown in plot 4.13.
[Figure: plot of P(negative test result) against the CAGE score threshold, with the optimal threshold at 1.6.]
Figure 4.12. Distribution functions of the CAGE score within the non-diseased (open circles, dashed line) and within the diseased individuals (filled circles, solid line). The grey lines mark the confidence regions and different studies are marked in different colors. The optimal threshold, derived from a maximization of the Youden index, is depicted as a solid vertical line.
[Figure: model-based summary ROC curve with the optimal threshold at 2, (Se, Sp) = (0.70, 0.88).]
Figure 4.13. Summary ROC curve of the CAGE data with the optimal threshold marked as a cross. Different studies are marked in different colors.
CHAPTER 5
Simulation Study
5.1. Design
To evaluate the performance of our method we conducted a simulation study. We aimed to investigate how precisely the new approach can estimate the true distributions of ill and healthy individuals. Furthermore, we examined whether the model is a suitable approach to estimate the pooled sensitivity and specificity and the optimal threshold in a meta-analysis.
We considered 384 scenarios with 1000 runs each. Data was simulated roughly mimicking the example data presented in chapter 4. For an overview of the data acquisition see the flow chart in figure 5.1. All
scenarios can be seen in table 5.1.
To obtain a data set of a DTA study, the number of studies was
randomly set to 10, 20 or 30. The ’real’ overall distributions of the
biomarker within the non-diseased and diseased individuals were nor-
mal distributions with fixed mean 0 for the non-diseased and varying
mean 2.5 (’nearby distributions’) or 4 (’distant distributions’) for the
diseased. The standard deviation of the non-diseased was 1.5, 1, 2.5,
2.5, respectively, varying together with the standard deviation of the
diseased of 1.5 (same standard deviations), 2 (different standard devia-
tions), 2.5 (same standard deviations) and 4 (different standard devia-
tions). The standard deviations were varying together with the mean,
the first two smaller standard deviations combined with the nearby
distributions and the two bigger ones with the distant distributions.
To obtain study-specific distributions random noise was added to the
’real’ overall distributions. The extent of the random noise was de-
termined by a visual comparison with the examples. Namely a mean
zero normal error with standard deviation 0 (’no heterogeneity’), 0.5
(’moderate heterogeneity’), 1 (’large heterogeneity’) or 1.5 (’huge het-
erogeneity’) was added to the mean parameters. Likewise, a normal
61
error with standard deviation 0, 0.3, 0.4 or 0.5 was added to the stan-
dard deviation parameters. These noise distributions were symmetri-
cally truncated. Those of the mean parameters were truncated so that
the mean of the study-specific distribution of the diseased individuals
was greater than that of the non-diseased; those of the standard de-
viation parameters were truncated in order to guarantee non-negative
study-specific standard deviations.
The total number of individuals per study was drawn from a log-normal
distribution1 with parameters µ = 5 and σ = 1. The proportion of ill
individuals per study was drawn from a normal distribution with mean
0.5 and standard deviation 0.2, truncated to the interval (0.2, 0.8) to
obtain realistic proportions. Drawing from the respective study-specific
distribution as many times as the number of non-diseased and diseased
led to biomarker values for all individuals.
The number of thresholds per study was drawn from a Poisson dis-
tribution with parameter λ = 1.3 or 2 (rejecting zeros) or fixed to 5
thresholds per study. The values of the thresholds were spaced equidis-
tantly between the 40% quantile of the study-specific distribution of
the non-diseased individuals and the 60% quantile of the study-specific
distribution of the diseased individuals. Therewith we obtained whole
data sets of DTA studies.
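The data-generating steps above can be sketched as follows. This is a simplified illustration with names of our own (e.g. the truncation of the mean noise is implemented by re-drawing only the diseased mean), not the original simulation code:

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)

def poisson(lam):
    """Knuth's Poisson sampler."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def trunc_normal(mu, sd, ok):
    """Draw from N(mu, sd) until the predicate ok holds (truncation)."""
    while True:
        x = rng.gauss(mu, sd)
        if ok(x):
            return x

def simulate_study(mu0=0.0, sd0=1.5, mu1=2.5, sd1=1.5,
                   tau_mu=0.5, tau_sd=0.3, lam=1.3):
    """One simulated DTA study: noisy study-specific parameters,
    log-normal total sample size, truncated-normal disease proportion,
    Poisson number of cut-offs (zeros rejected) and cut-offs spaced
    equidistantly between the 40% quantile of the non-diseased and
    the 60% quantile of the diseased distribution."""
    m0 = rng.gauss(mu0, tau_mu)
    m1 = trunc_normal(mu1, tau_mu, lambda x: x > m0)   # keep m1 > m0
    s0 = trunc_normal(sd0, tau_sd, lambda x: x > 0)    # keep sd positive
    s1 = trunc_normal(sd1, tau_sd, lambda x: x > 0)
    n = max(2, int(rng.lognormvariate(5, 1)))          # total sample size
    p = trunc_normal(0.5, 0.2, lambda x: 0.2 < x < 0.8)
    n1 = max(1, round(n * p))                          # number of ill
    n0 = max(1, n - n1)                                # number of healthy
    x0 = [rng.gauss(m0, s0) for _ in range(n0)]        # biomarker, healthy
    x1 = [rng.gauss(m1, s1) for _ in range(n1)]        # biomarker, ill
    k = 0
    while k == 0:                                      # reject zero cut-offs
        k = poisson(lam)
    lo = NormalDist(m0, s0).inv_cdf(0.4)
    hi = NormalDist(m1, s1).inv_cdf(0.6)
    cuts = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    tables = [(sum(v <= c for v in x0), sum(v > c for v in x0),
               sum(v <= c for v in x1), sum(v > c for v in x1))
              for c in cuts]                           # (TN, FP, FN, TP)
    return n0, n1, cuts, tables
```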
Conducting the meta-analysis we applied 4 selected linear random ef-
fects models: CI, DS, CICS and CIDS, with increasing complexity. All
models were used in the special case of same fixed slopes (β0 = β1) and
in the general case of different fixed slopes (the latter models are marked with '*'), leading to a total number of eight different models. For the
computational implementation of the linear random effect models we
used the lmer() function of the R package lme4 1.1-7 with REML es-
timation. We did not include the most complex model DIDS because
there was mostly insufficient data, as the simulation data was mim-
icking the example data. For weighting of the studies we used inverse
variance weights scaled to mean one. Sensitivity and specificity were
equally weighted.
¹A random variable X is log-normally distributed with parameters µ and σ if ln(X) is normally distributed with mean µ and standard deviation σ.
[Figure: flow chart. Inputs: number of studies per meta-analysis (10, 20, 30); parameters of the overall distributions µ0 = 0, µ1 = (2.5, 4), σ0 = (1.5, 1, 2.5, 2.5), σ1 = (1.5, 2, 2.5, 4); heterogeneity parameters τµ = (0, 0.5, 1, 1.5), τσ = (0, 0.3, 0.4, 0.5); number of patients per study ~ Lognormal(5, 1); proportion ill:healthy ~ N(0.5, 0.2); number of cut-offs ~ Pois(λ = 1.3, 2) or fixed to 5. From these follow the parameters of the distribution per study, the numbers of ill/healthy, the values of the cut-offs, the biomarker values and finally TP, TN, FP, FN.]
Figure 5.1. Flow chart of the data acquisition in the simulation study.
µ0   µ1   σ0/σ1     λ                      τµ/τσ
0    2.5  1.5/1.5   1.3, 2, 5 thresholds   0/0, 0.5/0.3, 1/0.4, 1.5/0.5
0    2.5  1/2       1.3, 2, 5 thresholds   0/0, 0.5/0.3, 1/0.4, 1.5/0.5
0    4    2.5/2.5   1.3, 2, 5 thresholds   0/0, 0.5/0.3, 1/0.4, 1.5/0.5
0    4    2.5/4     1.3, 2, 5 thresholds   0/0, 0.5/0.3, 1/0.4, 1.5/0.5
In every combination the models CI, DS, CICS, CIDS, *CI, *DS, *CICS and *CIDS were fitted.
Table 5.1. All scenarios of the simulation study. Every combination of µ0, µ1, σ0/σ1, λ and τµ/τσ defines one line of the full table; every line represents 8 scenarios, differing in the linear mixed model used.
5.2. Results
We investigated bias, coverage and mean squared error (MSE) of
the distribution parameters µ0, µ1, σ0 and σ1 and of sensitivity and
specificity at three points: at the mean of the non-diseased population
(sens 1, spec 1), at the ’real’ optimal threshold (sens 2, spec 2) and at
the mean of the diseased population (sens 3, spec 3). Furthermore, we
investigated bias and MSE for the optimal threshold.
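These three performance measures can be summarized per scenario in a few lines. The sketch below is ours and purely illustrative (Python rather than the R used for the analyses; the function name and toy numbers are hypothetical): bias is the mean deviation of the estimates from the true value, MSE the mean squared deviation, and coverage the proportion of 95% confidence intervals containing the true value.

```python
def bias_coverage_mse(estimates, ci_lower, ci_upper, truth):
    """Summarize the replicates of one simulation scenario:
    bias = mean(estimate - truth), MSE = mean((estimate - truth)^2),
    coverage = share of 95% CIs that contain the true value."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    coverage = sum(lo <= truth <= hi
                   for lo, hi in zip(ci_lower, ci_upper)) / n
    return bias, mse, coverage

# toy replicates around a true value of 0 (illustrative numbers only)
est = [0.1, -0.2, 0.05, 0.3, -0.1]
lo = [e - 0.4 for e in est]
hi = [e + 0.4 for e in est]
b, m, c = bias_coverage_mse(est, lo, hi, truth=0.0)
# b = 0.03, m = 0.0305, c = 1.0
```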
Each figure contains eight plots. In every plot bias, coverage or MSE is
plotted against the linear models. From left to right the heterogeneity
of the studies increases. The four plots at the bottom show the case
of the same standard deviation in the 'real' overall distributions of the
biomarker, the top four plots the case of different standard deviations.
5.2.1. Bias.
5.2.1.1. Distribution parameters. First, we consider the bias of
the distribution parameters µ0, µ1, σ0 and σ1 with λ = 1.3 and nearby
distributions of the biomarker (see figure 5.2). In the case of no hetero-
geneity and same standard deviations, the bias was essentially zero for
all parameters, but it increased with increasing heterogeneity, reaching
values up to 100 for single parameters. The parameter µ0, the mean of
the non-diseased, was consistently underestimated, whereas the other
parameters were overestimated. In the case of different standard
deviations (the upper row of plots) we find a similar structure, with
one striking difference in the case of no heterogeneity: for the models
with the same fixed slope for the non-diseased and diseased individuals,
all parameters have nonzero bias. An explanation could be that the
data are almost perfect, as there is no heterogeneity, but the slopes of
the two straight lines to be estimated differ. In this case all parameters
suffer from the constraint of estimating these lines with a common
slope. This phenomenon vanishes with more heterogeneity.
In the case of λ = 2 (see figure C.1 in the appendix), and even more so
in the case of a fixed number of 5 thresholds per study, the bias of the
distribution parameters decreases markedly (see the graphic at the
top of figure 5.3). One can observe a zigzag pattern of the bias, where
models DS and *DS have the greatest bias and models CIDS and *CIDS
[Figure 5.2: two pictures of eight panels each, bias plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.2. Bias of µ0 (open light blue circle), µ1 (filled light blue circle), σ0 (open dark blue circle) and σ1 (filled dark blue circle) for λ = 1.3 and nearby distributions. The picture at the top shows the whole scenario; the one at the bottom is zoomed so that −10 ≤ bias ≤ 10. In both pictures the heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
the lowest, equally for same and different standard deviations.
Let us consider the case of distant distributions, where the overall dis-
tribution of the biomarker within the non-diseased individuals is a
mean-zero normal distribution with standard deviation 2.5 and the one
within the diseased individuals is a normal distribution with mean 4
and standard deviation 2.5 or 4. The bias of the distribution parameters
decreased in comparison with the bias in the case of nearby distributions
(see figure 5.3 at the bottom, others not shown). This could be due to
the larger values of mean and standard deviation and therefore larger
differences between these parameters in the non-diseased and diseased
individuals. Thus the added heterogeneity, which stayed the same,
affected the parameters less.
5.2.1.2. Sensitivity and Specificity. In the following we consider the
bias of sensitivity and specificity with λ = 1.3 and nearby distributions
(see figures 5.4 and 5.5). In the case of same standard deviations and no
heterogeneity there was almost no bias. With increasing heterogeneity,
the sensitivity was underestimated at threshold 0 and overestimated at
threshold 2.5; for specificity it was the other way around. Thus, small
values of sensitivity and specificity were overestimated and large ones
underestimated. At the 'real' optimal threshold 1.25 both were slightly
underestimated. For different standard deviations we observe the same
pattern, but additionally we obtain bias for the models assuming
same standard deviations in the no-heterogeneity case. At threshold
0 and at the 'real' optimal threshold the bias in the case of different
standard deviations was generally slightly larger than in the case of
same standard deviations, at threshold 2.5 slightly smaller.
Consistent with the results for the distribution parameters, the bias of
sensitivity and specificity generally decreased with an increasing
number of thresholds. At the outer two points (points 1 and 3) the bias
decreased markedly, while at the 'real' optimal threshold it stayed
the same (see figures 5.6 (top) and C.3 for the case of 5 thresholds per
study; the case λ = 2 is not shown). Again, we can observe a zigzag
pattern, with the highest bias resulting from models DS and *DS and
the lowest from CIDS and *CIDS, equally for same and different stan-
dard deviations.
[Figure 5.3: two pictures of eight panels each, bias plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.3. Bias of µ0 (open light blue circle), µ1 (filled light blue circle), σ0 (open dark blue circle) and σ1 (filled dark blue circle). Top: in the case of 5 thresholds per study and nearby distributions. Bottom: in the case of λ = 1.3 and distant distributions; zoomed version so that −10 ≤ bias ≤ 10. For an overview see figure C.2. In both pictures the heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure 5.4: two pictures of eight panels each, bias plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.4. Bias of sensitivity and specificity at threshold 0 (sens 1, spec 1) in the top panel and at the 'real' optimal threshold (sens 2, spec 2) in the bottom panel, for λ = 1.3 and nearby distributions. In both pictures the heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure 5.5: eight panels, bias plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.5. Bias of sensitivity (sens 3) and specificity (spec 3) at threshold 2.5, for λ = 1.3 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
In the case of distant distributions the bias of sensitivity and speci-
ficity decreased in comparison to the nearby distributions (see figures
5.6 (bottom) and C.4, others not shown). This matches the results for
the distribution parameters.
5.2.1.3. Optimal threshold. In the meta-analysis an overall optimal
threshold was estimated. The bias of this optimal threshold in the case
of λ = 1.3 and same standard deviations was small (see figure 5.7). In
the case of different standard deviations the bias of the models with
the same slope was markedly higher than that of the models with
different slopes. With an increasing number of thresholds per study
the bias decreased (see figure C.5).
5.2.2. MSE.
5.2.2.1. Distribution parameters. The mean squared error of the
distribution parameters in the case of λ = 1.3 and nearby distributions
[Figure 5.6: two pictures of eight panels each, bias plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.6. Bias of sensitivity (sens 1) and specificity (spec 1) at threshold 0. Top: in the case of 5 thresholds per study and nearby distributions. Bottom: in the case of λ = 1.3 and distant distributions. Both: the heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure 5.7: two pictures of eight panels each, bias plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.7. Bias of the optimal threshold in the case of λ = 1.3. Top: nearby distributions. Bottom: distant distributions. Both: the heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
ranged from almost 0 up to 5.04·10⁶, increasing strongly with increas-
ing heterogeneity. With no heterogeneity the MSE was close to
zero, and for moderate heterogeneity it did not exceed 15 (see figure
C.6).
For distant distributions there was an outlier at almost 8·10⁸ (in the
case of large heterogeneity), but generally the MSE was smaller than
for nearby distributions. For example, in the case of no and moderate
heterogeneity the MSE did not exceed 5 (figures not shown). This may
again be due to the fact that the heterogeneity parameters affected the
larger values of the distant distributions less.
With an increasing number of thresholds the MSE decreased markedly,
with a maximum value of 2·10³ and almost all values below 15 in the
case of 5 thresholds per study.
5.2.2.2. Sensitivity and specificity. In the case of λ = 1.3 and nearby
distributions the MSE did not exceed 0.08 at the three measuring
points. It was markedly lower at points 2 and 3 (not exceeding 0.02
and 0.05, respectively). The MSE of specificity was consistently higher
than that of sensitivity at threshold 0, and vice versa at threshold
2.5 (see figures 5.8 and C.7). The same effect could be observed for the
bias.
With an increasing number of thresholds the MSE decreased; for
example, with 5 thresholds per study the MSE reached at most
0.06 (not shown). For distant distributions and λ = 1.3 the MSE was
smaller, never exceeding 0.04 (see figures C.8 and C.9).
5.2.2.3. Optimal threshold. The MSE of the optimal threshold in-
creased sharply with increasing heterogeneity (see figure 5.9). It de-
creased strongly with an increasing number of thresholds, such that
for 5 thresholds per study the MSE was always below 3 (not shown).
It was smaller in the case of distant distributions than in the case of
nearby distributions (not shown).
5.2.3. Coverage. The coverage is the probability that the real value
is contained in the confidence interval of the estimated value. As
we chose 95% confidence intervals, the coverage should ideally be 0.95.
5.2.3.1. Distribution parameters. The coverage of the distribution
parameters varied between almost 0 and almost 1 (see figure
[Figure 5.8: eight panels, MSE plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.8. MSE of sensitivity (sens 1) and specificity (spec 1) at threshold 0 in the case of λ = 1.3 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
5.10). The models with the same fixed slope consistently had smaller
coverage than the models with different fixed slopes, but we will see
that this phenomenon did not exist to this extent for sensitivity and
specificity. In general, the coverage was almost never 0.95. In the case
of no heterogeneity we see two different phenomena: firstly, when the
distribution functions had the same standard deviations, the coverage
hovered around 0.95 for all models; secondly, in the case of different
standard deviations, the *-models resulted in more or less the same
coverage, whereas the models with the same fixed slope had a coverage
close to zero. This may be explained by narrow confidence intervals,
due to the absence of heterogeneity, in combination with the existing
bias.
For an increasing number of thresholds the coverage decreased (see
figure C.11). In the case of distant distribution functions the coverage
was slightly higher (not shown).
[Figure 5.9: eight panels, MSE plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.9. MSE of the optimal threshold in the case of λ = 1.3 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations. This is a zoomed-in version; for the overview see figure C.10.
5.2.3.2. Sensitivity and specificity. The coverage of sensitivity and
specificity decreased strongly with increasing heterogeneity in the
case of λ = 1.3 and nearby distributions. For the outer thresholds (0
and 2.5) and in the case of same standard deviations the coverage was
between 0.7 and 0.97 with no heterogeneity and between 0 and
0.7 with huge heterogeneity. At the 'real' optimal threshold the coverage
also decreased with increasing heterogeneity, but always stayed
above 0.5 (see figures 5.11 and C.12 (top)).
With an increasing number of thresholds per study the coverage at
thresholds 0 and 2.5 spread over almost the whole interval [0,1],
and at the 'real' optimal threshold the coverage decreased slightly
(see figures C.12 (bottom) and C.13).
In the case of distant distributions the coverage at thresholds 0 and 4
was higher than with nearby distributions at thresholds 0 and 2.5.
[Figure 5.10: eight panels, coverage plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.10. Coverage of the distribution parameters µ0 (open light blue circle), µ1 (filled light blue circle), σ0 (open dark blue circle) and σ1 (filled dark blue circle) in the case of λ = 1.3 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
At the ’real’ optimal threshold it was slightly higher as well (figures
not shown).
[Figure 5.11: two pictures of eight panels each, coverage plotted against the models CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS]
Figure 5.11. Coverage of sensitivity and specificity at threshold 0 (sens 1, spec 1) in the top panel and at the 'real' optimal threshold (sens 2, spec 2) in the bottom panel in the case of λ = 1.3 and nearby distributions. In both pictures the heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
CHAPTER 6
Discussion
We have described and evaluated a new approach for meta-analysis
of diagnostic test accuracy studies, where several studies report more
than one threshold and the corresponding values of sensitivity and
specificity. The approach uses a parametric assumption (normal or lo-
gistic) for the distribution of a continuous biomarker. The idea is to
estimate the distribution functions of the biomarker, one distribution
function within the non-diseased and one within the diseased study
population. This is achieved by the use of a mixed effects model with
study as random factor.
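The core of this estimation step can be sketched in a few lines. For a normally distributed biomarker, probit(F(c)) = (c − µ)/σ, so a straight-line fit of the probit-transformed proportions of negative test results against the thresholds yields µ and σ from its intercept and slope. The sketch below is ours and deliberately simplified: it uses a plain least-squares fit for a single group in place of the mixed effects model, so study-level random effects are ignored, and all names are hypothetical.

```python
from statistics import NormalDist

def fit_group(cutoffs, neg_props):
    """Least-squares line through the probit-transformed proportions
    of negative test results.  Since probit(F(c)) = (c - mu)/sigma,
    the fitted slope is 1/sigma and the intercept is -mu/sigma."""
    std = NormalDist()
    x = list(cutoffs)
    y = [std.inv_cdf(p) for p in neg_props]     # probit transform
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    intercept = ybar - slope * xbar
    return -intercept / slope, 1.0 / slope      # (mu, sigma)

# proportions taken from an exactly N(0, 1.5) biomarker, so the
# fit recovers the true parameters
truth = NormalDist(mu=0.0, sigma=1.5)
cutoffs = [-1.0, 0.0, 1.0, 2.0]
mu_hat, sigma_hat = fit_group(cutoffs, [truth.cdf(c) for c in cutoffs])
# mu_hat ≈ 0.0, sigma_hat ≈ 1.5
```

For the logistic case the probit transform is replaced by the logit; in the thesis the line is fitted jointly for both groups via lmer() with study as grouping factor.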
The traditional approaches, as for example the hierarchical model of
Rutter and Gatsonis (2001) and the bivariate model of Reitsma et al.
(2005), only use one pair of sensitivity and specificity per study. Only
recently, alternative approaches using more than one pair of sensitivity
and specificity per study have been described, as for example the mul-
tivariate random effects approach of Hamza et al. (2009), the survival
approach of Putter et al. (2010) and the non-parametric approach of
Martínez-Camblor (2014). Riley et al. (2014) also proposed a multi-
variate regression model, which is closely related to our models.
Our new approach for meta-analysis of DTA studies has its strengths
and limitations.
Strengths. Our approach uses multiple pairs of sensitivity and speci-
ficity and their corresponding thresholds per study. In comparison with
traditional approaches, this has several advantages: we do not need
to select one pair of sensitivity and specificity per study (which may
lead to bias) and we do not need further assumptions to determine a
summary ROC curve. Instead, we use all the given information and
therefore the results are expected to be more reliable. In contrast to
the alternative approaches of Hamza et al. (2009) and Putter et al.
(2010), our approach can deal with a varying number of thresholds per
study. This was the case in most of the systematic reviews we found
in which several studies provided multiple thresholds.
The assumption of a normally or logistically distributed biomarker with
different parameters for the non-diseased and diseased individuals is very
common. Thus, our approach follows a very natural idea by estimating
these biomarker distributions. Everything is based upon these distri-
butions: sensitivity and specificity, the SROC curve and the Youden
index, and with it the optimal threshold. Thus, directly and without
further assumptions, we obtain all desired quantities. By using a
mixed effects model we acknowledge the diversity of the studies, while
the data of each study have in principle the same structure. By allow-
ing correlated random effects, we respect the bivariate character of the
study data.
Furthermore, with our approach we can determine an optimal thresh-
old across all studies. This is important information for clinicians. In
clinical routine it is of interest not only which biomarker is best for a
specific illness, but also at which threshold an optimal discrimination
between non-diseased and diseased individuals can be achieved.
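How these quantities follow from the fitted distributions can be sketched as below (our illustration, not the thesis code; a simple grid search between the two means, which suffices when the Youden maximum lies between them, as in the scenarios considered here):

```python
from statistics import NormalDist

def sens_spec(mu0, sigma0, mu1, sigma1, c):
    """Pooled specificity F0(c) and sensitivity 1 - F1(c) at cut-off c
    under the fitted normal biomarker distributions."""
    spec = NormalDist(mu0, sigma0).cdf(c)
    sens = 1.0 - NormalDist(mu1, sigma1).cdf(c)
    return sens, spec

def optimal_threshold(mu0, sigma0, mu1, sigma1, steps=10_000):
    """Grid search maximizing the Youden index
    J(c) = sens(c) + spec(c) - 1 = F0(c) - F1(c) between the means."""
    best_c, best_j = mu0, -1.0
    for i in range(steps + 1):
        c = mu0 + (mu1 - mu0) * i / steps
        sens, spec = sens_spec(mu0, sigma0, mu1, sigma1, c)
        if sens + spec - 1.0 > best_j:
            best_c, best_j = c, sens + spec - 1.0
    return best_c

# equal SDs: the Youden-optimal cut-off is the midpoint of the means
c_opt = optimal_threshold(0.0, 1.5, 2.5, 1.5)   # ≈ 1.25
sens, spec = sens_spec(0.0, 1.5, 2.5, 1.5, c_opt)
```

For equal standard deviations the maximizer is the midpoint of the two means, which is why the 'real' optimal threshold in the nearby simulation scenarios (µ0 = 0, µ1 = 2.5) is 1.25; for unequal standard deviations the optimum solves a quadratic instead.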
Limitations. Some care has to be taken concerning the concept
of an optimal threshold across studies. This concept is only reasonable
if a biomarker value has the same meaning in all studies and does not
differ because of laboratory conditions. If the thresholds are very het-
erogeneous, this has to be doubted. Of course the question also arises
to what extent it is reasonable to pool sensitivity and specificity if the
studies are very inhomogeneous.
A weak point of the method is the possibility of decreasing proportions
of negative test results with an increasing threshold across studies, as
can be seen in example 4.3. Within a study, a positive correlation is
assured by definition, but across studies this cannot be guaranteed. In
this case, the method will fail.
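A simple diagnostic for this failure mode is to check, before fitting, whether the pooled trend of the proportions of negative test results against the threshold is increasing at all; a sketch (ours, with hypothetical names and toy data):

```python
def pooled_slope(points):
    """Least-squares slope of the proportion of negative test results
    against the threshold, pooled over all studies.  A negative slope
    signals the failure mode described above."""
    x = [c for c, _ in points]
    y = [p for _, p in points]
    n = len(points)
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))

# two studies, each increasing within itself, but the between-study
# shift makes the pooled trend decrease
study_a = [(1.0, 0.80), (2.0, 0.85)]
study_b = [(3.0, 0.30), (4.0, 0.35)]
s = pooled_slope(study_a + study_b)   # s = -0.19 < 0
```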
If the studies report too few data points, some linear mixed effects
models may not be applicable, as the number of parameters to be
estimated may be too large. Moreover, the lmer() function had con-
vergence and/or calculation problems for some models and data sets.
Examples. In all examples a large heterogeneity between the stud-
ies can be seen. Nevertheless, the application of our approach to these
examples led to very convincing results (except for example 4.3). The
distribution functions and the pooled sensitivity and specificity with
their confidence intervals seemed reasonable. The selection method of
considering only *-models and then selecting the one with the smallest
REML criterion resulted, for all examples except example 4.3, in choos-
ing model *DICS. The repeated choice of this model might be explained
by the great difference between the intercepts of the transformed nega-
tive test results of the non-diseased and diseased individuals, whereas
the slopes of the data points were almost the same. Possibly this model
could be included in a future simulation study. With different choices
of models we always obtained visually good results for the estimated
distributions, but the optimal thresholds varied.
Example 4.3, regarding procalcitonin as a marker for neonatal sepsis,
shows that the approach does not work for all data. If the heterogene-
ity between studies is very large and the number of thresholds per study
is small, a reasonable regression cannot be assured, as it is then possible
to obtain regression lines with negative slope. Therefore, it is not advis-
able to use our model for such data.
Example 4.4 shows that the approach may work well for discrete
biomarkers, too. With our approach we obtain almost the same results
as Putter et al. and have the advantage of not being restricted to data
with the same number of thresholds per study.
Simulation study. The simulation study showed that with grow-
ing heterogeneity the quality of the estimates deteriorates. Generally,
reasonable results of the new approach can only be expected for the
heterogeneity levels 'no' and 'moderate'. However, since the distribu-
tion estimates for almost all data examples have been convincing, we
assume that in practice most data sets have a heterogeneity level within
this range. This assumption is supported by the heterogeneity levels
that Martínez-Camblor (2014) examined in his simulation study. He
used mean-zero normal errors with variances 0, 0.1 and 0.2 for the means
of the distributions and with variances 0 and 0.1 for the logarithmized
variances of the distributions. Therefore, all his levels of heterogeneity
are smaller than our 'moderate' heterogeneity level.
For data with at most moderate heterogeneity the linear mixed mod-
els allowing for different fixed slopes (denoted with *) are to be
preferred. They led to smaller bias and MSE in scenarios where the
standard deviations were different and to equivalent bias and MSE
in scenarios where the standard deviations were the same.
Furthermore, the bias and MSE of the estimates decreased with an
increasing number of thresholds per study. Therefore, study investiga-
tors are encouraged to report as many thresholds as possible.
If the biomarker distributions of the non-diseased and diseased individ-
uals were further apart (a mean-zero distribution within the non-
diseased and a mean of 4 within the diseased), the bias and the MSE
were smaller than in the case of nearby distributions (mean 0 and mean
2.5). However, as the differences between the means and also between
the standard deviations of the two biomarker distributions were larger
in the first case, the smaller bias and MSE could be due to the fact
that the heterogeneity parameters (which stayed the same) had less
influence.
In most circumstances the bias of sensitivity and specificity was
smallest for the most complex models examined, the CIDS and *CIDS
models (common random intercept and different random slopes). On
the other hand, we observed that the more complex the mixed effects
model was, the more convergence problems occurred in the lmer()
function.
Unfortunately, the coverage of the estimates of the distribution param-
eters as well as of sensitivity and specificity was not satisfactory. This
could be due to the existing bias and to incorrect confidence intervals.
For the confidence intervals we assumed the parameters to be approx-
imately normally distributed, but possibly the normal quantiles led to
confidence intervals that are too narrow.
CHAPTER 7
Conclusion
In this thesis a new approach for meta-analyses of DTA studies
in which several studies provide more than one threshold was described.
We accounted for the heterogeneity between the studies as well as
for the bivariate character of the data. We applied the new approach to
several examples, almost all leading to convincing results. Our simula-
tion study showed that the method works reasonably only in scenarios
of no or moderate heterogeneity between the studies and that the cov-
erage of the estimated parameters is not satisfactory. We proposed a
total of 16 linear mixed models which differ in their fixed and random
effects structure for the estimation of the distribution functions. It
would be desirable to have a criterion for selecting the model of choice
for concrete data, but model selection remains an unsolved problem.
Furthermore, it would be worthwhile to determine a confidence interval
for the optimal threshold in the case of logistic distributions, possibly
using an empirical approach with a resampling method. Moreover, one
could include the uncertainty of the optimal threshold in the confidence
intervals of sensitivity and specificity at the optimal threshold.
Although our new approach can still be improved in some aspects, it
has all the important properties needed: it acknowledges the heterogene-
ity of the studies and the bivariate character of the data and includes
multiple thresholds per study, possibly differing in number.
APPENDIX A
Data Sets
study cutoff TP FP FN TN
1 5 106 131 4 91
1 13 92 38 18 184
1 14 92 36 18 186
1 15 93 29 17 193
2 3 202 424 3 310
2 5 196 340 9 394
2 14 189 149 16 585
2 17 186 114 19 620
3 3 130 378 0 195
3 14 111 101 19 472
4 3 35 77 0 25
4 14 33 31 2 71
5 5 39 140 1 260
5 14 36 80 4 320
5 20 35 48 5 352
5 30 32 28 8 372
6 12 54 5 7 114
6 14 53 2 8 117
6 17 51 2 10 117
7 14 159 90 6 87
7 27 119 36 46 141
8 5 434 1087 9 542
9 14 19 21 6 121
9 18 19 15 6 127
10 14 37 163 1 105
11 14 53 33 14 733
12 14 101 59 27 173
13 14 42 49 3 223
14 14 125 117 11 250
15 14 398 363 46 1265
16 14 62 52 13 82
17 14 23 159 9 195
18 14 12 60 1 69
19 14 128 18 3 84
20 14 71 80 8 199
21 14 61 126 9 282
Table A.1. Troponin data of Zhelev et al. (2015) with multiple thresholds.
study cutoff TP FP TN FN assay
1 0.5 63 11 38 8 PCT-Q
1 2 49 2 47 22 PCT-Q
1 10 18 0 49 53 PCT-Q
2 0.1 111 161 54 11 PCT-LIA
2 0.3 84 71 144 38 PCT-LIA
2 0.4 77 56 159 45 PCT-LIA
2 0.5 73 45 170 49 PCT-LIA
3 0.5 10.22 4.62 9.38 3.78 PCT-LIA
3 1 9.94 1.12 12.88 4.06 PCT-LIA
3 1.2 9.52 0.28 13.72 4.48 PCT-LIA
3 2 12.32 0 14 1.68 PCT-LIA
3 5 5.74 0 14 8.26 PCT-LIA
4 3.03 52 11 10 10 PCT-LIA
4 15.75 47 2 19 15 PCT-LIA
5 1 42 6 26 9 PCT-LIA
5 10 9 0 32 42 PCT-LIA
6 0.087 59 11 8 15 PCT-Kryptor
6 0.1 56 9 10 18 PCT-Kryptor
6 0.25 41 2 17 33 PCT-Kryptor
6 0.5 34 2 17 40 PCT-Kryptor
7 2 31 9 35 1 PCT-Kryptor
7 10 21 3 41 11 PCT-Kryptor
8 0.5 76 11 45 36 PCT-Q
8 2 52 4 52 60 PCT-Q
8 10 30 2 54 82 PCT-Q
9 0.5 20.5 25 14 4.5 PCT-LIA
9 2.5 17 10 29 8 PCT-LIA
9 5 12.5 7 32 12.5 PCT-LIA
10 0.1 138 49 84 65 PCT-Kryptor
10 0.5 83 17 116 120 PCT-Kryptor
10 3 37 4 129 166 PCT-Kryptor
11 0.5 44 73 15 1 PCT-LIA
11 1.5 34 20 68 11 PCT-LIA
11 3 24 12 76 21 PCT-LIA
12 1.2 21 2 13 13 PCT-LIA
13 1 29 2 38 7 PCT-Kryptor
14 9.7 28 9 27 3 PCT-Kryptor
15 1.6 16 8 23 4 PCT-LIA
16 0.6 39 9 20 8 PCT-LIA
17 0.28 20 3 9 4 PCT-LIA
18 1.1 58 4 14 2 PCT-LIA
19 2.2 31 0 11 24 PCT-Kryptor
20 1.1 34 5 17 7 PCT-LIA
21 0.5 17 5 58 24 PCT-LIA
22 0.25 77 23 32 19 PCT-Kryptor
23 0.5 53 5 37 19 PCT-Q
24 0.5 22 1 24 3 PCT-Q
25 5.79 17 2 17 13 PCT-Kryptor
26 0.32 65 9 16 13 PCT-Kryptor
27 2 82 92 116 37 PCT-LIA
28 3.3 19 5 6 3 PCT-LIA
29 2 49 6 14 26 PCT-LIA
30 1 19 2 21 8 PCT-Kryptor
31 1.31 55 2 8 20 PCT-LIA
Table A.2. Procalcitonin data of Wacker et al. (2013) with multiple thresholds.
study cutoff TP FP FN TN
1 0.5 92 0 31 40
2 0.6 30 15 0 28
3 5.75 20 31 9 63
4 0.5 16 41 2 28
4 2 16 24 2 45
4 10 13 17 5 52
5 1 11 22 8 108
6 0.8 11 1 3 25
7 1 38 5 12 17
8 0.5 38 8 7 123
9 0.55 43 41 14 107
10 2 7 21 0 95
11 0.5 35 3 1 12
11 1 26 0 10 15
12 0.5 51 60 14 58
13 1 15 6 4 109
14 2 34 11 7 16
14 6 32 2 9 25
15 0.5 26 39 20 77
16 5 16 65 3 66
Table A.3. Procalcitonin data (neonatal sepsis) of Vouloumanou et al. (2011) with multiple thresholds.
study cutoff TP FP FN TN
1 0 76 134 0 0
2 0 53 247 0 0
3 0 63 61 0 0
4 0 48 56 0 0
5 0 175 1795 0 0
6 0 294 527 0 0
7 0 57 60 0 0
8 0 110 117 0 0
9 0 25 129 0 0
10 0 52 483 0 0
1 1 70 35 6 99
2 1 46 50 7 197
3 1 50 14 13 47
4 1 46 18 2 38
5 1 107 235 68 1560
6 1 261 99 33 428
7 1 56 15 1 45
8 1 78 48 32 69
9 1 22 16 3 113
10 1 52 304 0 179
1 2 61 9 15 125
2 2 35 19 18 228
3 2 44 9 19 52
4 2 42 9 6 47
5 2 80 95 95 1700
6 2 216 45 78 482
7 2 47 6 10 54
8 2 58 15 52 102
9 2 12 1 13 128
10 2 48 184 4 299
1 3 42 3 34 131
2 3 23 2 30 245
3 3 33 3 30 58
4 3 27 2 21 54
5 3 42 30 133 1765
6 3 130 11 164 516
7 3 30 2 27 58
8 3 30 2 80 115
9 3 6 0 19 129
10 3 24 58 28 425
1 4 21 1 55 133
2 4 10 0 43 247
3 4 17 1 46 60
4 4 17 0 31 56
5 4 20 7 155 1788
6 4 56 1 238 526
7 4 23 0 34 60
8 4 10 1 100 116
9 4 2 0 23 129
10 4 5 5 47 478
Table A.4. CAGE data of Aertgeerts et al. (2004).
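The R function evaluate() listed in Appendix B expects these tables as a data frame with one row per reported cut-off and columns study, cutoff, TP, FP, FN, TN. A minimal sketch using the first two rows of Table A.1:

```r
# Two rows of the troponin data (study 1 of Zhelev et al., Table A.1).
data <- data.frame(
  study  = c(1, 1),
  cutoff = c(5, 13),
  TP = c(106, 92), FP = c(131, 38),
  FN = c(4, 18),   TN = c(91, 184)
)
# Study-specific sensitivity and specificity follow directly from the counts.
sens <- data$TP / (data$TP + data$FN)
spec <- data$TN / (data$TN + data$FP)
```

As expected, raising the cut-off from 5 to 13 trades sensitivity (0.96 to 0.84) for specificity (0.41 to 0.83).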
APPENDIX B
R Code
B.1. Code Novel Approach
Code of the R function evaluate.R of our new approach.
1 # Auxiliary functions
2
3 # Define logit function
4 logit <- function(x){
5 log(x) - log(1-x)
6 }
7 # Define expit function
8 expit <- function(x){
9 (1 + exp(-x))^(-1)
10 }
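A quick sanity check (not part of the thesis code): expit is the inverse of logit, so transformed proportions can be mapped back to the probability scale without loss.

```r
# logit and expit as defined in evaluate.R; redefined here so the
# sketch is self-contained.
logit <- function(x){ log(x) - log(1 - x) }
expit <- function(x){ (1 + exp(-x))^(-1) }
p <- c(0.1, 0.5, 0.9)
roundtrip <- expit(logit(p))  # recovers p up to floating-point error
```
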
11
12 # Function for fixed point iteration
13 iterate <- function(f, x0 , nmax , eps , print=FALSE) {
14 x <- x0
15 n <- 0
16 while (abs(f(x) - x) > eps & n < nmax) {
17 n <- n+1
18 x <- f(x)
19 if (print) print(x)
20 }
21 if(n < nmax)
22 return(x)
23 else
24 stop("Iteration of maximal Youden index reached the maximal number of iterations without converging.")
25 }
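A usage sketch for the fixed-point iteration above, applied to the textbook fixed point of cos(x) at x ≈ 0.739085 rather than a thesis example; iterate() is repeated here so the sketch is self-contained.

```r
# Fixed-point iteration as defined in evaluate.R.
iterate <- function(f, x0, nmax, eps, print = FALSE) {
  x <- x0
  n <- 0
  while (abs(f(x) - x) > eps & n < nmax) {
    n <- n + 1
    x <- f(x)
    if (print) print(x)
  }
  if (n < nmax) return(x)
  stop("Fixed-point iteration did not converge.")
}
# Converges because |cos'(x)| < 1 near the fixed point.
x <- iterate(cos, x0 = 1, nmax = 1000, eps = 1e-9)
```
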
26
27 #----------------------------------------------------
28 # Function to evaluate and plot meta-analysis data
29
30 evaluate <- function(data, normal=TRUE, model="6c",
31                      log=FALSE, reml=TRUE,
32                      weights="Default",
33                      lambda=0.5, nmax=1000,
34                      eps=0.000000001,
35                      print.iterations=FALSE,
36                      xlab="Threshold", print=FALSE,
37                      plots=c(1,2,3,4,5),
38                      evaluateCutoff="optCutoff",
39                      printCI=FALSE,
40                      wl=FALSE,
41                      output=TRUE){
42
43 # Input parameters:
44 # data: (dataframe) dataframe with columns named:
45 #   study, cutoff, TP, FP, TN, FN
46 # normal: (logical)
47 #   TRUE: biomarker distribution assumption is normal,
48 #   FALSE: biomarker distribution assumption is logistic
49 # model: (string) choose one of the following models:
50 #   1b (=CI), 2b (=DI), 3b (=CS), 4b (=DS), 5b (=CICS),
51 #   6b (=DICS), 7b (=CIDS), 8b (=DIDS),
52 #   1c (=*CI), 2c (=*DI), 3c (=*CS), 4c (=*DS),
53 #   5c (=*CICS), 6c (=*DICS), 7c (=*CIDS), 8c (=*DIDS)
54 # log: (logical) TRUE: scale x-axis with log(x),
55 #   FALSE: nothing happens
56 # reml: (logical) argument of function lmer.
57 #   TRUE: REML=TRUE, FALSE: REML=FALSE (estimation with ML)
58 # weights: (string) prior weighting for studies.
59 #   "Default": no weighting (all weights = 1)
60 #   "SampleSize": sample size
61 #   "SampleSizeScaled1": sample size,
62 #     scaled so that the weights sum up to 1
63 #   "SampleSizeScaled2": sample size,
64 #     scaled so that the weights have mean 1
65 #   "InverseVariance": inverse variance
66 #   "InverseVarianceScaled1": inverse variance,
67 #     scaled so that the weights sum up to 1
68 #   "InverseVarianceScaled2": inverse variance,
69 #     scaled so that the weights have mean 1
70 # lambda: (numeric) weighting parameter for higher weighting
71 #   of sens (lambda > 0.5) or spec (lambda < 0.5).
72 #   lambda = 0.5 is equally weighted
73 # nmax: (numeric) maximal number of iterations for
74 #   finding the optimal cutoff of the logistic distribution
75 # eps: (numeric) smallest difference for end of iterations
76 #   for finding the optimal cutoff of the logistic distribution
77 # print.iterations: (logical)
78 #   TRUE: output of iterations, FALSE: no output
79 # xlab: (string) x axis label
80 # print: (logical) TRUE: creates plots as pdfs in folder,
81 #   FALSE: direct output of plots
82 # plots: (vector of numerics)
83 #   1: transformed data with regression lines
84 #   2: data with distribution functions
85 #   3: Youden index
86 #   4: ROC curve
87 # evaluateCutoff: (string)
88 #   the cutoff where Sens and Spec are evaluated
89 #   for output. Default is the optimal cutoff.
90 # printCI: (logical) TRUE: print confidence intervals,
91 #   FALSE: do not print CIs
92 # wl: (logical) TRUE: data points are connected with lines,
93 #   FALSE: they are not
94 # output: (logical) TRUE if output on the console is desired,
95 #   else FALSE
96
97 ######################### FUNCTIONS #################################
98 # Function for calculating the weighted cut-off point of
99 # two logistic distributions by an iterative fixed-point procedure
100
101 # function
102 g <- function(x) {
103   m1 - s1*acosh(lambda/(1-lambda)*s0/s1*(1 + cosh((x-m0)/s0)) - 1)
104 }
105
106 # inverse function
107 f <- function(x) {
108   m0 + s0*acosh((1-lambda)/lambda*s1/s0*(1 + cosh((x-m1)/s1)) - 1)
109 }
110
111 iter <- NULL
112
113 # error handling for function iterate, chooses f or g for iterations
114 saveIterate <- function(x0, nmax, eps, print) {
115   tryCatch( { # try iterating with function f
116     x <- iterate(f, x0, nmax, eps, print)
117     print("Optimal cut-off iteration with f")
118     return(list(x=x, iter="f"))
119   },
120   warning=function(w) {
121     suppressWarnings(x <- iterate(f, x0, nmax, eps, print))
122     warning(w$message)
123     return(list(x=x, iter="f"))
124   },
125   # if an error occurs, iterate with function g
126   error=function(e) {
127     tryCatch({ # try iteration with function g
128       x <- iterate(g, x0, nmax, eps, print)
129       print("Optimal cut-off iteration with g")
130       return(list(x=x, iter="g"))
131     },
132     warning=function(wa) {
133       suppressWarnings(x <- iterate(g, x0, nmax, eps, print))
134       warning(wa$message)
135       return(list(x=x, iter="g"))
136     },
137     error=function(er){
138       stop("Optimal cutoff iteration didn't converge. Use normal=TRUE.")
139       return(list(x=NULL, iter=NULL))
140     })
141   })
142 }
143
144 ############# DATA ORGANISING ######################################
145 attach(data)
146
147 # Reorganizing the data
148 CutoffSingle <- cutoff
149 Cutoff <- c(cutoff, cutoff)
150 Group <- c(rep(0, length(study)), rep(1, length(study)))
151 Negative0 <- TN
152 Negative1 <- FN
153 Negative <- c(TN, FN)
154 N0 <- FP + TN # number of non-diseased patients
155 N1 <- TP + FN # number of diseased patients
156 N <- c(N0, N1) # number of patients per study IN ONE GROUP (0 or 1)
157 NN0 <- (Negative0 + 0.5) / (N0 + 1) # with continuity correction
158 NN1 <- (Negative1 + 0.5) / (N1 + 1)
159 NN <- (Negative + 0.5) / (N + 1)
160 StudySingle <- study
161 Study <- c(study, study)
162 numberOfStudies <- nlevels(factor(Study))
163 # Vector consisting of as many colors as the maximal study number
164 colorChart <- rainbow(max(study))
165 colorVector <- colorChart[StudySingle]
166 # Dataframe with one row per cutoff of each study, first
167 # for all non-diseased individuals and then everything again
168 # for the diseased individuals (each cutoff of each study appears twice)
169 Data <- data.frame(Study, Group, Cutoff, N, Negative)
170
171 #---------------------------------------------
172
173 ## log transformation of cutoff values
174 if(log) {Cut <- log(Cutoff)
175   CutSingle <- log(CutoffSingle)
176 } else {Cut <- Cutoff
177   CutSingle <- CutoffSingle}
178
179
180 ################### WEIGHTS #######################################
181
182 # Default weights
183 if(weights == "Default"){
184 w <- NULL
185 }
186
187 # weights according to sample size
188 if(weights == "SampleSize"){
189 # weights are sample sizes
190 w <- c(N0,N1)
191 }
192
193 # Sample size --------- scale: sum up to 1
194 if(weights == "SampleSizeScaled1"){
195 w <- c(N0,N1)
196 w <- (w / sum(w))
197 }
198
199 # Sample size --------- scale: mean 1
200 if(weights == "SampleSizeScaled2"){
201 w <- (length(N) * N) / sum(N)
202 }
203
204
205 # Inverse variance weights with continuity correction
206 if(weights == "InverseVariance"){
207   if(!normal){
208     w0 <- (TN + 0.5) * (FP + 0.5) / (N0 + 1)
209     w1 <- (TP + 0.5) * (FN + 0.5) / (N1 + 1)
210     w <- c(w0, w1)
211   } else {
212     w1 <- (N1+1)^3*dnorm(qnorm((FN+0.5)/(N1+1)))^2/((TP+0.5)*(FN+0.5))
213     w0 <- (N0+1)^3*dnorm(qnorm((TN+0.5)/(N0+1)))^2/((TN+0.5)*(FP+0.5))
214     w <- c(w0, w1)
215   }
216 }
217
218 # Inverse variance weights with continuity correction --- scale: sum up to 1
219 if(weights == "InverseVarianceScaled1"){
220   if(!normal){
221     w0 <- (TN + 0.5) * (FP + 0.5) / (N0 + 1)
222     w1 <- (TP + 0.5) * (FN + 0.5) / (N1 + 1)
223     w <- c(w0, w1)
224   } else {
225     w1 <- (N1+1)^3*dnorm(qnorm((FN+0.5)/(N1+1)))^2/((TP+0.5)*(FN+0.5))
226     w0 <- (N0+1)^3*dnorm(qnorm((TN+0.5)/(N0+1)))^2/((TN+0.5)*(FP+0.5))
227     w <- c(w0, w1)
228   }
229   # scaling to sum 1
230   w <- (w / sum(w))
231 }
232
233 # Inverse variance weights with continuity correction --- scale: mean 1
234 if(weights == "InverseVarianceScaled2"){
235   if(!normal){
236     w0 <- (TN + 0.5) * (FP + 0.5) / (N0 + 1)
237     w1 <- (TP + 0.5) * (FN + 0.5) / (N1 + 1)
238     w <- c(w0, w1)
239   } else {
240     w1 <- (N1+1)^3*dnorm(qnorm((FN+0.5)/(N1+1)))^2/((TP+0.5)*(FN+0.5))
241     w0 <- (N0+1)^3*dnorm(qnorm((TN+0.5)/(N0+1)))^2/((TN+0.5)*(FP+0.5))
242     w <- c(w0, w1)
243   }
244   # scaling
245   w <- length(w) * w / sum(w)
246 }
247
248 #----------------------------------------
249
250 detach(data)
251
252 results <- list()
253
254 # Function to transform/rescale the x-values, either with the probit or with the logit function
255 resc <- function(x){
256   if(normal) {qnorm(x)}
257   else logit(x)
258 }
259
260 ############### MODELS #############################################
261 # fixed effects: b (Group + Cut), c (Group * Cut)
262 # random effects: 1,2 (random intercept), 3,4 (random slope), 5,6 (random intercept + slope)
263
264 # Linear regression according to selected model
265 # models as in thesis subsection 3.3.3.
266
267 #-------------------------------------------------------
268 ## Models with Fixed Effects: 1 + Group + Cut
269 # random intercept, common distribution [thesis: model CI]
270 if(model == "1b"){
271   lmeModel <- lmer(resc(NN) ~ Group + Cut + (1|Study), REML = reml, weights=w)
272 }
273 # random intercept, different distributions [thesis: model DI]
274 if(model == "2b"){
275   lmeModel <- lmer(resc(NN) ~ Group + Cut + (1 + Group|Study), REML = reml, weights=w)
276 }
277
278 # random slope, common distribution [thesis: model CS]
279 if(model == "3b"){
280   lmeModel <- lmer(resc(NN) ~ Group + Cut + (0 + Cut|Study), REML = reml, weights=w)
281 }
282
283 # random slope, different distributions [thesis: model DS]
284 if(model == "4b"){
285   lmeModel <- lmer(resc(NN) ~ Group + Cut + (0 + Cut + Group:Cut|Study), REML = reml, weights=w)
286 }
287
288 # random slope and intercept, common distributions [thesis: model CICS]
289 if(model == "5b"){
290   lmeModel <- lmer(resc(NN) ~ Group + Cut + (Cut|Study), REML = reml, weights=w)
291 }
292
293 # random slope (common distribution) and intercept (different distributions) [thesis: model DICS]
294 if(model == "6b"){
295   lmeModel <- lmer(resc(NN) ~ Group + Cut + (Cut + Group|Study), REML = reml, weights=w)
296 }
297
298 # random slope (different distributions) and intercept (common distribution) [thesis: model CIDS]
299 if(model == "7b"){
300   lmeModel <- lmer(resc(NN) ~ Group + Cut + (Cut + Group:Cut|Study), REML = reml, weights=w)
301 }
302
303 # random slope (different distributions) and intercept (different distributions) [thesis: model DIDS]
304 if(model == "8b"){
305   lmeModel <- lmer(resc(NN) ~ Group + Cut + (Group*Cut|Study), REML = reml, weights=w)
306 }
307
308 # --------------------------------------------------------
309 ## Models with Fixed Effects: Group * Cut
310 # random intercept, common distribution [thesis: model *CI]
311 if(model == "1c"){
312   lmeModel <- lmer(resc(NN) ~ Group * Cut + (1|Study), REML = reml, weights=w)
313 }
314
315 # random intercept, different distributions [thesis: model *DI]
316 if(model == "2c"){
317   lmeModel <- lmer(resc(NN) ~ Group * Cut + (1 + Group|Study), REML = reml, weights=w)
318 }
319
320 # random slope, common distribution [thesis: model *CS]
321 if(model == "3c"){
322   lmeModel <- lmer(resc(NN) ~ Group * Cut + (0 + Cut|Study), REML = reml, weights=w)
323 }
324
325 # random slope, different distributions [thesis: model *DS]
326 if(model == "4c"){
327   lmeModel <- lmer(resc(NN) ~ Group * Cut + (0 + Cut + Group:Cut|Study), REML = reml, weights=w)
328 }
329
330 # random slope + intercept, common distributions [thesis: model *CICS]
331 if(model == "5c"){
332   lmeModel <- lmer(resc(NN) ~ Group * Cut + (Cut|Study), REML = reml, weights=w)
333 }
334
335 # random slope (common distribution) and intercept (different distributions) [thesis: model *DICS]
336 if(model == "6c"){
337   lmeModel <- lmer(resc(NN) ~ Group * Cut + (Cut + Group|Study), REML = reml, weights=w)
338 }
339
340 # random slope (different distributions) and intercept (common distribution) [thesis: model *CIDS]
341 if(model == "7c"){
342   lmeModel <- lmer(resc(NN) ~ Group * Cut + (Cut + Group:Cut|Study), REML = reml, weights=w)
343 }
344
345 # random intercept (different distributions) and slope (different distributions) [thesis: model *DIDS]
346 if(model == "8c"){
347   lmeModel <- lmer(resc(NN) ~ Group * Cut + (Group*Cut|Study), REML = reml, weights=w)
348
349 }
350
351 #--------------------------------------------------------
352
353 s <- summary(lmeModel)
354
355 cf <- coef(s)
356 vc <- vcov(s)
357
358
359 ################ PARAMETER EXTRACTION #############################
360 # Extract regression coefficients alpha0, beta0 of the non-diseased and alpha1, beta1 of the diseased, and their variances.
361
362
363 if(grepl("b", model)) {
364 alpha0 <- cf[1,1]
365 alpha1 <- cf[1,1] + cf[2,1]
366 beta0 <- cf[3,1]
367 beta1 <- cf[3,1]
368 varalpha0 <- vc[1,1]
369 varalpha1 <- vc[1,1] + vc[2,2] + 2*vc[1,2]
370 varbeta0 <- vc[3,3]
371 varbeta1 <- vc[3,3]
372 cov0 <- vc[1,3]
373 cov1 <- vc[1,3] + vc[2,3]
374 }
375
376 if(grepl("c", model)) {
377 alpha0 <- cf[1,1]
378 alpha1 <- cf[1,1] + cf[2,1]
379 beta0 <- cf[3,1]
380 beta1 <- cf[3,1] + cf[4,1]
381 varalpha0 <- vc[1,1]
382 varalpha1 <- vc[1,1] + vc[2,2] + 2*vc[1,2]
383 varbeta0 <- vc[3,3]
384 varbeta1 <- vc[3,3] + vc[4,4] + 2*vc[3,4]
385 cov0 <- vc[1,3]
386 cov1 <- vc[1,3] + vc[1,4] + vc[2,3] + vc[2,4]
387 }
388
389 #.......................................................
390 # Compute the parameters of the biomarker distributions and their variances.
391
392 m0 <- -alpha0/beta0 # Mean disease-free
393 s0 <- 1/beta0       # Standard deviation disease-free
394 m1 <- -alpha1/beta1 # Mean diseased
395 s1 <- 1/beta1       # Standard deviation diseased
396
397
398 vars0 <- varbeta0/(beta0^4)
399 vars1 <- varbeta1/(beta1^4)
400 varm0 <- (alpha0^2)/(beta0^4)*varbeta0 + varalpha0/(beta0^2) - 2*alpha0/(beta0^3)*cov0
401 varm1 <- (alpha1^2)/(beta1^4)*varbeta1 + varalpha1/(beta1^2) - 2*alpha1/(beta1^3)*cov1
402
403
404 ###### if negative correlation of data, stop
405 if(beta0 <= 0 | beta1 <= 0) stop("Regression yields negative correlation. Try another model or get better data :)")
406 if(m1 < m0) stop("Estimated distribution of diseased patients is left of the non-diseased ones. Check whether higher values of your biomarker really indicate illness.")
407
408 ############ DISTRIBUTIONS ###################################
409
410 ######### NORMAL DISTRIBUTION ASSUMPTION #####################
411 if(normal){
412   # Compute cutpoint(s) 'cut' of the two normals weighted with lambda and 1-lambda
413   turn <- (m0*s1^2 - m1*s0^2)/(s1^2 - s0^2)
414   rad <- sqrt(s0^2*s1^2*(2*(s1^2 - s0^2)*(log(s1) - log(s0) - logit(lambda)) + (m1 - m0)^2)/(s1^2 - s0^2)^2)
415   x0 <- turn - rad
416   x1 <- turn + rad
417   if (s0 < s1) cut <- x1
418   if (s0 > s1) cut <- x0
419   if (s1 == s0) {
420     cut <- (s0^2*(-logit(lambda)) - 0.5*(m0^2 - m1^2))/(m1 - m0)
421   }
422   # Function to compute sensitivity and specificity and their confidence intervals
423   sesp <- function(x) {
424     y0 <- beta0*x + alpha0
425     sp <- pnorm(y0)
426     SEy0 <- sqrt(varalpha0 + x^2*varbeta0 + 2*x*cov0)
427     lsp <- pnorm(y0 - 1.96*SEy0)
428     usp <- pnorm(y0 + 1.96*SEy0)
429     y1 <- beta1*x + alpha1
430     se <- 1 - pnorm(y1)
431     SEy1 <- sqrt(varalpha1 + x^2*varbeta1 + 2*x*cov1)
432     lse <- 1 - pnorm(y1 + 1.96*SEy1)
433     use <- 1 - pnorm(y1 - 1.96*SEy1)
434     list(Sens = c("Sens", round(se,3), "[", round(lse,3), ";", round(use,3), "]"), Spec = c("Spec", round(sp,3), "[", round(lsp,3), ";", round(usp,3), "]"))
435   }
436 #.........................................
437   # Back-transform optimal cut-off
438   if(log) cutlog <- exp(cut)
439
440   # Plot y axis label
441   yLAB1 <- expression(Phi^{-1}~"(P(negative test result))")
442
443 }
444
445 ############## LOGISTIC DISTRIBUTION ASSUMPTION ###############################
446 if(!normal){
447   # Cutoff of the two logistics, weighted with lambda and 1-lambda
448   wmean <- (1-lambda)*m0 + lambda*m1
449   if ((1-lambda)*s1 != lambda*s0) {
450     x0 <- wmean
451     iterateResult <- saveIterate(x0, nmax, eps, print=print.iterations)
452     cut <- iterateResult$x
453     iter <- iterateResult$iter
454   }
455   if ((1-lambda)*s1 == lambda*s0) {
456     cut <- wmean
457   }
458
459   # Function to compute sensitivity and specificity and their confidence intervals
460   sesp <- function(x) {
461     y0 <- beta0*x + alpha0
462     sp <- expit(y0)
463     SEy0 <- sqrt(varalpha0 + x^2*varbeta0 + 2*x*cov0)
464     lsp <- expit(y0 - 1.96*SEy0)
465     usp <- expit(y0 + 1.96*SEy0)
466     y1 <- beta1*x + alpha1
467     se <- 1 - expit(y1)
468     SEy1 <- sqrt(varalpha1 + x^2*varbeta1 + 2*x*cov1)
469     lse <- 1 - expit(y1 + 1.96*SEy1)
470     use <- 1 - expit(y1 - 1.96*SEy1)
471     list(Sens = c("Sens", round(se,3), "[", round(lse,3), ";", round(use,3), "]"), Spec = c("Spec", round(sp,3), "[", round(lsp,3), ";", round(usp,3), "]"))
472   }
473
474   #...........................................
475   # Back-transform optimal cut-off
476   if(log) cutlog <- exp(cut)
477
478   # Plot y axis label
479   yLAB1 <- "Logit(P(negative test result))"
480 }
481
482 ############# PLOTS ##############################################
483
484 # Function to print confidence regions
485 printConfI <- function(){
486   gray <- rgb(190, 190, 190, maxColorValue=255, alpha=170)
487   upperSens <- function(vectorX) lapply(vectorX, function(x) 1 - as.numeric(sesp(x)$Sens[4]))
488   if(log) curve(upperSens(log(x)), col=gray, lwd=1, add=TRUE) else
489     curve(upperSens(x), col=gray, lwd=1, add=TRUE)
490   lowerSpec <- function(vectorX) lapply(vectorX, function(x) as.numeric(sesp(x)$Spec[4]))
491   if(log) curve(lowerSpec(log(x)), col=gray, lwd=1, add=TRUE) else
492     curve(lowerSpec(x), col=gray, lwd=1, add=TRUE)
493   lowerSens <- function(vectorX) lapply(vectorX, function(x) 1 - as.numeric(sesp(x)$Sens[6]))
494   if(log) curve(lowerSens(log(x)), col=gray, lwd=1, add=TRUE) else
495     curve(lowerSens(x), col=gray, lwd=1, add=TRUE)
496   upperSpec <- function(vectorX) lapply(vectorX, function(x) as.numeric(sesp(x)$Spec[6]))
497   if(log) curve(upperSpec(log(x)), col=gray, lwd=1, add=TRUE) else
498     curve(upperSpec(x), col=gray, lwd=1, add=TRUE)
499 }
500 #------------------------------------------------------
501 ## Plot 1: linear regression lines in logit/probit space
502 if(1 %in% plots){
503   if(print){pdf("PlotsMA/Plot1.pdf", width=5, height=4.5)}
504   if(log){
505     plot(Cutoff, resc(NN), pch=16, col=0, ylab=expression(atop("", Phi^{-1}~"(P(negative test result))")), xlab=xlab, cex.lab=0.7, cex.axis=0.7, log="x")
506     # add transformed data (possibly with lines)
507     points(CutoffSingle, resc(NN1), pch=16, cex=1, col=colorVector)
508     if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], resc(NN1[which(StudySingle==i)]), col=colorChart[i], lwd=1))
509     points(CutoffSingle, resc(NN0), pch=1, cex=1, col=colorVector)
510     if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], resc(NN0[which(StudySingle==i)]), col=colorChart[i], lwd=1))
511     # add linear regression lines
512     curve(alpha0 + beta0*log(x), lty=2, col=1, lwd=1, add=TRUE)
513     curve(alpha1 + beta1*log(x), lty=1, col=1, lwd=1, add=TRUE)
514   } else {
515     plot(Cutoff, resc(NN), pch=16, col=0, ylab=yLAB1, xlab=xlab)
516     # add transformed data (possibly with lines)
517     points(CutoffSingle, resc(NN1), pch=16, cex=1, col=colorVector)
518     if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], resc(NN1[which(StudySingle==i)]), col=colorChart[i], lwd=1))
519     points(CutoffSingle, resc(NN0), pch=1, cex=1, col=colorVector)
520     if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], resc(NN0[which(StudySingle==i)]), col=colorChart[i], lwd=1))
521     # add linear regression lines
522     abline(alpha0, beta0, lty=2, col=1, lwd=1)
523     abline(alpha1, beta1, lty=1, col=1, lwd=1)
524   }
525   if(print) dev.off()
526 }
527 #----------------------------------------------------------
528 ## Plot 2: data and biomarker distribution functions
529 if(2 %in% plots){
530   if(print){pdf("PlotsMA/Plot2.pdf", width=5, height=4.5)}
531   if(log) {
532     plot(Cutoff, NN, pch=16, col=0, ylab=expression(atop("", "P(negative test result)")), xlab=xlab, cex.lab=0.7, cex.axis=0.7, log="x", ylim=c(0,1))
533   }
534   if(!log) {
535     plot(Cutoff, NN, pch=16, col=0, ylab="P(negative test result)", xlab=xlab, ylim=c(0,1))
536   }
537   # add data (possibly with lines)
538   points(CutoffSingle, NN1, pch=16, cex=1, col=colorVector)
539   if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], NN1[which(StudySingle==i)], col=colorChart[i], lwd=1))
540   points(CutoffSingle, NN0, pch=1, cex=1, col=colorVector)
541   if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], NN0[which(StudySingle==i)], col=colorChart[i], lwd=1))
542   # add regression curves (possibly with confidence intervals)
543   if(normal) {
544     if(log) {
545       if(printCI) printConfI()
546       curve(pnorm(log(x), m0, s0), lty=2, col=1, lwd=1, add=TRUE)
547       curve(pnorm(log(x), m1, s1), lty=1, col=1, lwd=1, add=TRUE)
548     }
549     if(!log) {
550       if(printCI) printConfI()
551       curve(pnorm(x, m0, s0), lty=2, col=1, lwd=1, add=TRUE)
552       curve(pnorm(x, m1, s1), lty=1, col=1, lwd=1, add=TRUE)
553     }
554   }
555   else{
556     if(log) {
557       if(printCI) printConfI()
558       curve(expit(beta0*log(x) + alpha0), lty=2, col=1, lwd=1, add=TRUE)
559       curve(expit(beta1*log(x) + alpha1), lty=1, col=1, lwd=1, add=TRUE)
560     }
561     if(!log) {
562       if(printCI) printConfI()
563       curve(expit(beta0*x + alpha0), lty=2, col=1, lwd=1, add=TRUE)
564       curve(expit(beta1*x + alpha1), lty=1, col=1, lwd=1, add=TRUE)
565     }
566   }
567   # draw optimal cut-off
568   if(log) {abline(v=exp(cut))} else {abline(v=cut)}
569   # draw legend
570   if(log) legend(1, 0.12, paste("Optimal threshold =", round(cutlog,1)), cex=0.7, lwd=1, col=1, bty="n") else
571     legend(0, 1, paste("Optimal threshold =", round(cut,1)), cex=0.7, col=1, bty="n")
572   # legend(0.3*max(Cutoff), 0.95, paste("Optimal threshold =", round(cut,1)), cex=0.7, lwd=1, col=1, bty="n")
573   if(print) dev.off()
574 }
575
576 # ----------------------------------------------------------
577 ## Plot 3: Youden index = Sens + Spec - 1 ~ TNR - FNR
578 ##### not for the weighted Youden index! (-> would need to weight the data, too)
579 if(3 %in% plots){
580   if(print){pdf("PlotsMA/Plot3.pdf", width=5, height=4.5)}
581   # plot data
582   if(log) {
583     plot(CutoffSingle, NN0 - NN1, pch=16, ylab=expression(atop("", "Youden index")), xlab=xlab, cex.lab=0.7, cex.axis=0.7, ylim=c(0,1), col=colorVector,
584          cex=1, log="x")
585     if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], (NN0-NN1)[which(StudySingle==i)], col=colorChart[i], lwd=1))
586   }
587   if(!log) {
588     plot(CutoffSingle, NN0 - NN1, pch=16, ylab="Youden index", xlab=xlab, ylim=c(0,1), col=colorVector,
589          cex=1)
590     if(wl) connect <- lapply(1:numberOfStudies, function(i) lines(CutoffSingle[which(StudySingle==i)], (NN0-NN1)[which(StudySingle==i)], col=colorChart[i], lwd=1))
591   }
592   # plot Youden index
593   if(normal) {
594     if(log) {
595       curve(2*(1-lambda)*pnorm(log(x), m0, s0) + 2*lambda*(1 - pnorm(log(x), m1, s1)) - 1, col=1, add=TRUE)
596     }
597     if(!log) {
598       curve(2*(1-lambda)*pnorm(x, m0, s0) + 2*lambda*(1 - pnorm(x, m1, s1)) - 1, col=1, add=TRUE)
599     }
600   } else {
601     if(log) {
602       curve(2*(1-lambda)*expit(alpha0 + beta0*log(x)) + 2*lambda*(1 - expit(alpha1 + beta1*log(x))) - 1, col=1, add=TRUE)
603     }
604     if(!log) {
605       curve(2*(1-lambda)*expit(alpha0 + beta0*x) + 2*lambda*(1 - expit(alpha1 + beta1*x)) - 1, col=1, add=TRUE)
606     }
607   }
608   # plot line at the threshold where the maximum is obtained
609   if(log) {abline(v=exp(cut))} else {abline(v=cut)}
610   # draw legend
611   if(log) legend(1.1, 0.95, paste("Optimal threshold =", round(cutlog,1), "ng/mL"), cex=0.7, lwd=1, col=1, bty="n") else
612     legend(0.3*max(Cutoff), 0.95, paste("Optimal threshold =", round(cut,1)), cex=0.7, lwd=1, col=1, bty="n")
613   if(print) dev.off()
614 }
615
616 # --------------------------------------------------------
617 ## Plot 4: ROC curve
618 if(4 %in% plots){
619   if(print){pdf("PlotsMA/Plot4.pdf", width=5, height=5)}
620   # plot data
621   par(mfrow=c(1,1), pty="s")
622   plot(1-NN0, 1-NN1, pch=16, col=colorVector, cex=1, cex.lab=0.7, cex.axis=0.7,
623        xlim=c(0,1), ylim=c(0,1),
624        xlab="1 - Specificity", ylab="Sensitivity")
625   # add ROC curve
626   if(normal){
627     curve(1 - pnorm(qnorm(1-x, m0, s0), m1, s1), lwd=1, col=1, add=TRUE)
628     points(1 - pnorm(cut, m0, s0), 1 - pnorm(cut, m1, s1), lwd=1, cex=1, pch=3, col=1)
629   } else {
630     curve(1 - expit(alpha1 + beta1*(logit(1-x) - alpha0)/beta0), lwd=1, col=1, add=TRUE)
631     points(1 - expit(alpha0 + beta0*cut), 1 - expit(alpha1 + beta1*cut), lwd=1, cex=4, pch=3, col=1)
632   }
633   # add legend
634   if(log) {legend(0.2, 0.3, bty="n", lwd=c(1,NA,NA), pch=c(-1,3,NA), col=c(1,1,NA), cex=0.7, c("Model-based summary ROC curve", paste("Optimal threshold at", round(cutlog,1), "ng/mL,"), "(Se,Sp)=(0.71,0.80)"))
635   } else {
636     legend(0.2, 0.3, bty="n", lwd=2, pch=c(-1,3), col=c(1,1), cex=0.7, c("Model-based summary ROC curve", paste("Optimal threshold at", round(cut,1), "ng/mL")))
637   }
638   if(print) dev.off()
639 }
640
641
642 ############ OUTPUT #############################################
643 # print sens and spec and their confidence intervals at the chosen cut-off "evaluateCutoff"
644 if(evaluateCutoff == "optCutoff") {
645   evaluateCutoffHere <- cut
646   SESP <- sesp(cut)
647 } else {
648   if(log) evaluateCutoffHere <- as.numeric(log(evaluateCutoff)) else
649     evaluateCutoffHere <- as.numeric(evaluateCutoff)
650   SESP <- sesp(evaluateCutoffHere)
651 }
652
653 # create list with all results
654 results$model <- model
655 results$REML.criterion <- REMLcrit(lmeModel)
656 results$AIC <- AIC(logLik(lmeModel))
657 results$BIC <- BIC(lmeModel)
658 results$normal <- normal
659 results$log <- log
660 results$REML <- reml
661 results$weights.name <- weights
662 results$lambda <- lambda
663 results$nmax <- nmax
664 results$eps <- eps
665 results$iter <- iter
666 results$print.iterations <- print.iterations
667 results$print <- print
668 results$workingData <- Data # Cutoff not yet logarithmized
669 results$weights <- w
670 results$regression.coefficients <- data.frame(alpha_0=alpha0, beta_0=beta0, alpha_1=alpha1, beta_1=beta1)
671 results$distribution.parameters <- data.frame(m_0=m0, varm0=varm0, s_0=s0, vars0=vars0, m_1=m1, varm1=varm1, s_1=s1, vars1=vars1)
672 if(log) results$optimal.cutoff <- cutlog else
673   results$optimal.cutoff <- cut
674 results$pooled.sensitivity <- SESP$Sens[2] # element 1 is the label "Sens"
675 results$pooled.specificity <- SESP$Spec[2] # element 1 is the label "Spec"
676 results$inputData <- data
677 results$lmerOutput <- lmeModel
678
679 # Output on console
680 if(output){
681
682   cat("\nModel: ", model, "\n\n")
683
684   if(log) {cat("The optimal cut-off value is:", round(cutlog,3), "\n\n")} else
685     {cat("The optimal cut-off value is:", round(cut,3), "\n\n")}
686
687   cat("REML criterion: ", REMLcrit(lmeModel), "\n\n")
688
689   if(log)
690     cat("Pooled Sensitivity and Specificity at cutoff =", round(exp(evaluateCutoffHere),3), ":", "\n", SESP$Sens, "\n", SESP$Spec, "\n\n")
691   else
692     cat("Pooled Sensitivity and Specificity at cutoff =", round(evaluateCutoffHere,3), ":", "\n", SESP$Sens, "\n", SESP$Spec, "\n\n")
693
694   cat("--------------------------------", "\n\n")
695 }
696 return(results)
697
698 }
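The closed-form normal-case cut-off used in evaluate() (variables turn and rad) can be cross-checked against direct numerical maximization of the weighted Youden index. The parameter values below are illustrative assumptions, not thesis estimates.

```r
# Sketch with assumed parameters: non-diseased ~ N(m0, s0), diseased ~ N(m1, s1).
m0 <- 0;  s0 <- 1
m1 <- 2;  s1 <- 1.5
lambda <- 0.5  # equal weighting of sensitivity and specificity
logit <- function(x){ log(x) - log(1 - x) }
# Closed form used in evaluate() for the case s0 != s1
turn <- (m0*s1^2 - m1*s0^2)/(s1^2 - s0^2)
rad  <- sqrt(s0^2*s1^2*(2*(s1^2 - s0^2)*(log(s1) - log(s0) - logit(lambda)) +
             (m1 - m0)^2)/(s1^2 - s0^2)^2)
cut  <- if (s0 < s1) turn + rad else turn - rad
# Independent check: maximize the weighted Youden index numerically
youden <- function(x) (1 - lambda)*pnorm(x, m0, s0) + lambda*(1 - pnorm(x, m1, s1))
num <- optimize(youden, interval = c(-3, 6), maximum = TRUE)$maximum
```

Both routes agree (cut ≈ 1.087 for these parameters); at this point the weighted densities of the two groups intersect.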
B.2. Code Simulation Study
1 # Simulation study
2 # Susanne Steinhauser
3 # August/September 2015
4
5 library("data.table")
6 library(lme4)
7
8 # The version of the evaluate.R function used here differs from the one shown in section B.1 in:
9 # - 3 additional input parameters: cutoff1, cutoff2, cutoff3
10 # - at these cut-offs, sensitivity and specificity with their confidence intervals are evaluated
11 # - the output is a dataframe containing the distribution parameters with their variances (m0, varm0, m1, varm1, s0, vars0, s1, vars1),
12 #   the optimal cutoff (cut) and sensitivity and specificity at the 3 cut-offs given in the input with their upper and lower confidence interval boundaries.
13 source("Evaluate15Simu.R")
14
15 # Define logit function
16 logit <- function(x){
17 log(x) - log(1-x)
18 }
19
20
21 set.seed (14)

######################################### OUTPUT ##################################################

# open file
sink(paste(getwd(), "/", "SimuOutput", Sys.Date(), ".csv", sep="", collapse=NULL))
# write header to file
cat("numberOfRuns; mu0Perfect; lambda; sigma0Perfect; mu1Perfect; sigma1Perfect; tauMu; tauSigma; model; realMaxYouden; realSens1; realSpec1; cutoff2; realSens2; realSpec2; cutoff3; realSens3; realSpec3; ;")
cat("optCutMean; optCutSE; optCutMSE; m0mean; m0SE; m0MSE; m0Coverage; s0mean; s0SE; s0MSE; s0Coverage; m1mean; m1SE; m1MSE; m1Coverage; s1mean; s1SE; s1MSE; s1Coverage; ")
cat("sens1Mean; sens1SE; sens1MSE; sens1Coverage; spec1Mean; spec1SE; spec1MSE; spec1Coverage; sens2Mean; sens2SE; sens2MSE; sens2Coverage;")
cat("spec2Mean; spec2SE; spec2MSE; spec2Coverage; sens3Mean; sens3SE; sens3MSE; sens3Coverage; spec3Mean; spec3SE; spec3MSE; spec3Coverage; warnings; errors; negCorrError \n")

############################### PARAMETERS #########################################################

numberOfRuns <- 1000
# 1. number of studies per meta-analysis
numberOfStudies <- c(10, 20, 30)
# 2. normal distribution parameters of the healthy and ill populations
mu0Perfect <- 0
sigma0Perfect <- c(1.5, 1, 2.5, 2.5)
mu1Perfect <- c(2.5, 2.5, 4, 4)
sigma1Perfect <- c(1.5, 2, 2.5, 4)
# lognormal distribution parameters to draw the number of patients per study
meanlogNumberOfPatients <- 5
sdlogNumberOfPatients <- 1
# Poisson distribution parameter to draw the number of cutoffs
lambdaNumberOfCutoffs <- c(1.3, 2)
# 3. heterogeneity parameters
# for the code they should have the same length (for loop)
heterogenizeM <- c(0, 0.5, 1, 1.5)
heterogenizeS <- c(0, 0.3, 0.4, 0.5)
# vector of linear mixed effects models
modelVector <- as.list(c("1b", "4b", "5b", "7b", "1c", "4c", "5c", "7c"))
# weighting parameter for an unequal proportion of ill and healthy distributions
lambda <- 0.5

# initialize a warning and an error vector for the messages
warningMessageVector <- errorMessageVector <- vector(mode="character")
######################################### ITERATION OF PARAMETERS #################################
for(iterLambda in 1:length(lambdaNumberOfCutoffs)){
for(iterParameter in 1:length(mu1Perfect)){
for(h in 1:length(heterogenizeS)) {
for (nm in 1:length(modelVector)) {
  # create empty vectors to store the resulting parameters of all runs
  mu0RunsVector <- sigma0RunsVector <- mu1RunsVector <- sigma1RunsVector <-
    optCutRunsVector <- optCutMSE <- vector(length=numberOfRuns)
  coverageVectorM0 <- coverageVectorS0 <- coverageVectorM1 <- coverageVectorS1 <-
    vector(length=numberOfRuns)
  sens1 <- spec1 <- sens2 <- spec2 <- sens3 <- spec3 <-
    vector(length=numberOfRuns, mode="numeric")
  coverageVectorSens1 <- coverageVectorSens2 <- coverageVectorSens3 <-
    coverageVectorSpec1 <- coverageVectorSpec2 <- coverageVectorSpec3 <-
    vector(length=numberOfRuns)
  warningVector <- errorVector <- vector(length=numberOfRuns)
  negCorrVector <- rep(FALSE, numberOfRuns)

  # determine the optimal cutoff
  turn <- (mu0Perfect*sigma1Perfect[iterParameter]^2 -
           mu1Perfect[iterParameter]*sigma0Perfect[iterParameter]^2) /
          (sigma1Perfect[iterParameter]^2 - sigma0Perfect[iterParameter]^2)
  rad <- sqrt(sigma0Perfect[iterParameter]^2*sigma1Perfect[iterParameter]^2*
              (2*(sigma1Perfect[iterParameter]^2 - sigma0Perfect[iterParameter]^2)*
               (log(sigma1Perfect[iterParameter]) - log(sigma0Perfect[iterParameter]) -
                logit(lambda)) +
               (mu1Perfect[iterParameter] - mu0Perfect)^2)) /
         (sigma1Perfect[iterParameter]^2 - sigma0Perfect[iterParameter]^2)
  x0 <- turn - rad
  x1 <- turn + rad
  if (sigma0Perfect[iterParameter] < sigma1Perfect[iterParameter]) cut <- x1
  if (sigma0Perfect[iterParameter] > sigma1Perfect[iterParameter]) cut <- x0
  if (sigma1Perfect[iterParameter] == sigma0Perfect[iterParameter]) {
    cut <- (sigma0Perfect[iterParameter]^2*logit(lambda) +
            0.5*(mu0Perfect^2 - mu1Perfect[iterParameter]^2)) /
           (mu0Perfect - mu1Perfect[iterParameter])
  }

  YoudenindexMax <- cut

  # fix 3 cut-off values at which to evaluate the results for Sens, Spec
  cutoff1 <- YoudenindexMax
  cutoff2 <- mu0Perfect
  cutoff3 <- mu1Perfect[iterParameter]

  # function to calculate sens and spec at a cutoff value x
  sespSimu <- function(x) {
    sp <- pnorm(x, mu0Perfect, sigma0Perfect[iterParameter])
    se <- 1 - pnorm(x, mu1Perfect[iterParameter], sigma1Perfect[iterParameter])
    list(Sens=se, Spec=sp)
  }

  # get sens and spec at the fixed cutoffs 1, 2, 3
  sensPerfect1 <- sespSimu(cutoff1)$Sens
  specPerfect1 <- sespSimu(cutoff1)$Spec
  sensPerfect2 <- sespSimu(cutoff2)$Sens
  specPerfect2 <- sespSimu(cutoff2)$Spec
  sensPerfect3 <- sespSimu(cutoff3)$Sens
  specPerfect3 <- sespSimu(cutoff3)$Spec

  ########################################## RUNS #################################################
  for(m in 1:numberOfRuns) {
    # 1. generate a random sequence of entry choices for the numberOfStudies vector
    randomNumberOfStudiesVectorEntry <- sample(1:length(numberOfStudies),
                                               size=numberOfRuns, replace=TRUE)
    # vector with the number of studies for each run of the meta-analysis
    numberOfStudiesPerMetaanalysis <- numberOfStudies[randomNumberOfStudiesVectorEntry]
    # 2. determine the number of cutoffs per study (zero-truncated Poisson)
    numberOfCutoffsVector <- vector()
    while (length(numberOfCutoffsVector) < numberOfStudiesPerMetaanalysis[m]) {
      a <- rpois(lambda = lambdaNumberOfCutoffs[iterLambda], n = 1)
      if (a != 0) numberOfCutoffsVector <- c(numberOfCutoffsVector, a)
    }

    # 4. vector with the total patient number for each study
    totalPatientsPerStudy <- ceiling(rlnorm(n = numberOfStudiesPerMetaanalysis[m],
                                            meanlog = meanlogNumberOfPatients,
                                            sdlog = sdlogNumberOfPatients))
    # 5. vector with the proportion of ill patients for each study
    propIllPerStudy <- vector(length = numberOfStudiesPerMetaanalysis[m])
    i <- 0
    while(i < numberOfStudiesPerMetaanalysis[m]){
      p <- rnorm(1, mean=0.5, sd=0.2)
      if(p > 0.2 & p < 0.8) {propIllPerStudy[i+1] <- p; i <- i + 1}
    }
    # 4.) + 5.) vector with the number of ill patients per study
    numberIllPatients <- mapply(function(i,j) rbinom(n = 1, size = i, prob = j),
                                totalPatientsPerStudy, propIllPerStudy)

    # 2.) + 3.) heterogenize the distribution parameters
    # parameter per study: vector with an entry mu0 for every study, same for mu1.
    # Truncate the distributions so that mu1 is always greater than mu0.
    mu0 <- mu1 <- vector()
    while(length(mu0) < numberOfStudiesPerMetaanalysis[m]){
      m0 <- mu0Perfect + rnorm(1, mean=0, sd=heterogenizeM[h])
      m1 <- mu1Perfect[iterParameter] + rnorm(1, mean=0, sd=heterogenizeM[h])
      if((m1 - m0) > 0 & (m1 - m0) < 2*(mu1Perfect[iterParameter])) {
        mu0 <- c(mu0, m0)
        mu1 <- c(mu1, m1)
      }
    }
    # parameter per study: vector with an entry sigma0 for every study, same for sigma1.
    # Check that they are not negative and truncate the distributions symmetrically.
    sigma0 <- sigma1 <- vector()
    while(length(sigma0) < numberOfStudiesPerMetaanalysis[m]){
      s <- sigma0Perfect[iterParameter] + rnorm(1, mean=0, sd=heterogenizeS[h])
      if(s > 0 & s < 2*sigma0Perfect[iterParameter]) sigma0 <- c(sigma0, s)
    }
    while(length(sigma1) < numberOfStudiesPerMetaanalysis[m]){
      s <- sigma1Perfect[iterParameter] + rnorm(1, mean=0, sd=heterogenizeS[h])
      if(s > 0 & s < 2*sigma1Perfect[iterParameter]) sigma1 <- c(sigma1, s)
    }

    ############################################ DATA GENERATION ##################################
    SimuData <- list()

    for(s in 1:numberOfStudiesPerMetaanalysis[m]) {
      # generate biomarker values for healthy and ill patients (drawn from normal distributions)
      BiomarkerValuesHealthy <- rnorm(n=(totalPatientsPerStudy[s] - numberIllPatients[s]),
                                      mean=mu0[s], sd=sigma0[s])
      BiomarkerValuesIll <- rnorm(n=numberIllPatients[s], mean=mu1[s], sd=sigma1[s])

      ## 6. cut-off allocation: equidistantly between the 40% quantile of the distribution of
      ## the healthy and the 60% quantile of the distribution of the ill for each study
      cutoffMinVector <- qnorm(0.4, mean=mu0, sd=sigma0)
      cutoffMaxVector <- qnorm(0.6, mean=mu1, sd=sigma1)
      cutoffValues <- cutoffMinVector[s] + (1:numberOfCutoffsVector[s]) *
                      (cutoffMaxVector[s] - cutoffMinVector[s])/(numberOfCutoffsVector[s] + 1)

      for(c in 1:numberOfCutoffsVector[s]) {
        # 1.) + 2.) + 3.) + 4.) + 5.) + 6.) => test results of healthy and ill patients
        TNeg <- length(BiomarkerValuesHealthy[BiomarkerValuesHealthy < cutoffValues[c]])
        FPos <- length(BiomarkerValuesHealthy[BiomarkerValuesHealthy >= cutoffValues[c]])
        FNeg <- length(BiomarkerValuesIll[BiomarkerValuesIll < cutoffValues[c]])
        TPos <- length(BiomarkerValuesIll[BiomarkerValuesIll >= cutoffValues[c]])

        dataNew <- list(study=s, cutoff=cutoffValues[c], TN=TNeg, FN=FNeg, TP=TPos, FP=FPos)
        SimuData <- rbindlist(list(SimuData, dataNew), use.names=TRUE)
      }
    }
    ############################################### EVALUATE CALL #################################
    # apply the evaluate function to the generated data
    evalResult <- tryCatch({
      results <- evaluate(SimuData, normal=TRUE, model=modelVector[nm], cutoff1=cutoff1,
                          cutoff2=cutoff2, cutoff3=cutoff3, log=FALSE,
                          weights="InverseVarianceScaled2", plots=c(), output=FALSE)
      list(result=results, warn=FALSE, error=FALSE, warnMessage="-")
    },
    warning=function(w) {
      # w is the first warning of the first call. The warnings of the repeated call are
      # suppressed so they are not all shown again; the printout is thus always just the
      # first warning.
      suppressWarnings(results <- evaluate(SimuData, normal=TRUE, model=modelVector[nm],
                                           cutoff1=cutoff1, cutoff2=cutoff2, cutoff3=cutoff3,
                                           log=FALSE, weights="InverseVarianceScaled2",
                                           plots=c(), output=FALSE))
      list(result=results, warn=TRUE, error=FALSE, warnMessage=w$message)
    },
    error=function(e){
      list(result=data.frame(m0=NA, varm0=NA, s0=NA, vars0=NA, m1=NA, varm1=NA, s1=NA,
                             vars1=NA, cut=NA, SensC1=NA, SensC1l=NA, SensC1u=NA,
                             SensC2=NA, SensC2l=NA, SensC2u=NA,
                             SensC3=NA, SensC3l=NA, SensC3u=NA,
                             SpecC1=NA, SpecC1l=NA, SpecC1u=NA,
                             SpecC2=NA, SpecC2l=NA, SpecC2u=NA,
                             SpecC3=NA, SpecC3l=NA, SpecC3u=NA),
           warn=FALSE, error=TRUE, warnMessage=e$message)
    }, finally = {})

    ################################### STORAGE OF RESULTS ########################################
    attach(evalResult)
    # store results of every run
    optCutRunsVector[m] <- result$cut
    mu0RunsVector[m] <- result$m0
    sigma0RunsVector[m] <- result$s0
    mu1RunsVector[m] <- result$m1
    sigma1RunsVector[m] <- result$s1

    # coverage of m0, m1, s0, s1
    coverageVectorM0[m] <- abs(mu0Perfect - result$m0) <= 1.96*sqrt(result$varm0)
    coverageVectorM1[m] <- abs(mu1Perfect[iterParameter] - result$m1) <= 1.96*sqrt(result$varm1)
    coverageVectorS0[m] <- abs(sigma0Perfect[iterParameter] - result$s0) <= 1.96*sqrt(result$vars0)
    coverageVectorS1[m] <- abs(sigma1Perfect[iterParameter] - result$s1) <= 1.96*sqrt(result$vars1)

    # store sens and spec at the 3 fixed cutoffs
    sens1[m] <- result$SensC1
    sens2[m] <- result$SensC2
    sens3[m] <- result$SensC3
    spec1[m] <- result$SpecC1
    spec2[m] <- result$SpecC2
    spec3[m] <- result$SpecC3

    # coverage of sens, spec at the 3 fixed cutoffs
    # e.g. is sensPerfect1 in the confidence interval of sens1?
    coverageVectorSens1[m] <- (sensPerfect1 >= result$SensC1l & sensPerfect1 <= result$SensC1u)
    coverageVectorSens2[m] <- (sensPerfect2 >= result$SensC2l & sensPerfect2 <= result$SensC2u)
    coverageVectorSens3[m] <- (sensPerfect3 >= result$SensC3l & sensPerfect3 <= result$SensC3u)
    coverageVectorSpec1[m] <- (specPerfect1 >= result$SpecC1l & specPerfect1 <= result$SpecC1u)
    coverageVectorSpec2[m] <- (specPerfect2 >= result$SpecC2l & specPerfect2 <= result$SpecC2u)
    coverageVectorSpec3[m] <- (specPerfect3 >= result$SpecC3l & specPerfect3 <= result$SpecC3u)

    detach(evalResult)
    # count warnings and errors
    warningVector[m] <- evalResult$warn
    warningMessageVector <- c(warningMessageVector, evalResult$warnMessage)
    errorVector[m] <- evalResult$error
    errorMessageVector <- c(errorMessageVector, evalResult$warnMessage)
    if(evalResult$warnMessage == "neg correlation") negCorrVector[m] <- TRUE
  }
  ############################################ AVERAGING OF RESULTS #############################

  # calculate means and standard errors of the estimated parameters of all runs
  optCutMean <- mean(optCutRunsVector, na.rm=TRUE)
  optCutSE <- sd(optCutRunsVector, na.rm=TRUE)
  m0Mean <- mean(mu0RunsVector, na.rm=TRUE)
  m0SE <- sd(mu0RunsVector, na.rm=TRUE)
  s0Mean <- mean(sigma0RunsVector, na.rm=TRUE)
  s0SE <- sd(sigma0RunsVector, na.rm=TRUE)
  m1Mean <- mean(mu1RunsVector, na.rm=TRUE)
  m1SE <- sd(mu1RunsVector, na.rm=TRUE)
  s1Mean <- mean(sigma1RunsVector, na.rm=TRUE)
  s1SE <- sd(sigma1RunsVector, na.rm=TRUE)

  # calculate MSE
  m0MSE <- mean((rep(mu0Perfect, numberOfRuns) - mu0RunsVector)^2, na.rm=TRUE)
  s0MSE <- mean((rep(sigma0Perfect[iterParameter], numberOfRuns) - sigma0RunsVector)^2, na.rm=TRUE)
  m1MSE <- mean((rep(mu1Perfect[iterParameter], numberOfRuns) - mu1RunsVector)^2, na.rm=TRUE)
  s1MSE <- mean((rep(sigma1Perfect[iterParameter], numberOfRuns) - sigma1RunsVector)^2, na.rm=TRUE)

  sens1MSE <- mean((rep(sensPerfect1, numberOfRuns) - sens1)^2, na.rm=TRUE)
  spec1MSE <- mean((rep(specPerfect1, numberOfRuns) - spec1)^2, na.rm=TRUE)
  sens2MSE <- mean((rep(sensPerfect2, numberOfRuns) - sens2)^2, na.rm=TRUE)
  spec2MSE <- mean((rep(specPerfect2, numberOfRuns) - spec2)^2, na.rm=TRUE)
  sens3MSE <- mean((rep(sensPerfect3, numberOfRuns) - sens3)^2, na.rm=TRUE)
  spec3MSE <- mean((rep(specPerfect3, numberOfRuns) - spec3)^2, na.rm=TRUE)

  optCutMSE <- mean((rep(YoudenindexMax, numberOfRuns) - optCutRunsVector)^2, na.rm=TRUE)

  ########################################## OUTPUT RESULTS #####################################
  # format iteration and output parameters to decimal commas and 3 digits
  iterationParameters <- c(lambdaNumberOfCutoffs[iterLambda], sigma0Perfect[iterParameter],
                           mu1Perfect[iterParameter], sigma1Perfect[iterParameter],
                           heterogenizeM[h], heterogenizeS[h], modelVector[nm],
                           YoudenindexMax, sensPerfect1, specPerfect1, cutoff2, sensPerfect2,
                           specPerfect2, cutoff3, sensPerfect3, specPerfect3)
  outputParameters <-
    c(optCutMean, optCutSE, optCutMSE, m0Mean, m0SE, m0MSE,
      mean(coverageVectorM0, na.rm=TRUE),
      s0Mean, s0SE, s0MSE, mean(coverageVectorS0, na.rm=TRUE),
      m1Mean, m1SE, m1MSE, mean(coverageVectorM1, na.rm=TRUE),
      s1Mean, s1SE, s1MSE, mean(coverageVectorS1, na.rm=TRUE),
      mean(sens1, na.rm=TRUE), sd(sens1, na.rm=TRUE), sens1MSE,
      mean(coverageVectorSens1, na.rm=TRUE), mean(spec1, na.rm=TRUE), sd(spec1, na.rm=TRUE),
      spec1MSE, mean(coverageVectorSpec1, na.rm=TRUE),
      mean(sens2, na.rm=TRUE), sd(sens2, na.rm=TRUE), sens2MSE,
      mean(coverageVectorSens2, na.rm=TRUE), mean(spec2, na.rm=TRUE), sd(spec2, na.rm=TRUE),
      spec2MSE, mean(coverageVectorSpec2, na.rm=TRUE),
      mean(sens3, na.rm=TRUE), sd(sens3, na.rm=TRUE), sens3MSE,
      mean(coverageVectorSens3, na.rm=TRUE), mean(spec3, na.rm=TRUE), sd(spec3, na.rm=TRUE),
      spec3MSE, mean(coverageVectorSpec3, na.rm=TRUE),
      sum(warningVector, na.rm=TRUE), sum(errorVector, na.rm=TRUE),
      sum(negCorrVector, na.rm=TRUE))
  iterationParametersformatted <- format(iterationParameters, decimal.mark=",")
  outputParametersformatted <- format(outputParameters, decimal.mark=",")

  # write all parameters to file
  cat(";;")
  cat(iterationParametersformatted, sep=";")
  cat(";;")
  cat(outputParametersformatted, sep=";")
  cat("\n")
}
}
}
}

#)
sink()

cat("number Of Runs:", numberOfRuns)
cat("warning messages:")
print(warningMessageVector)
cat("error messages:")
print(errorMessageVector)
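The closed-form optimal cut-off computed above (the `turn` and `rad` expressions) can be cross-checked numerically. The sketch below, in Python, assumes — as I read the formula — that the cut-off maximizes the weighted Youden criterion λ·Sp(x) + (1 − λ)·Se(x) for two normal distributions; the parameter values are taken from the simulation grid, and the helper names are illustrative:

```python
from math import erf, log, sqrt

def norm_cdf(x, mu, sigma):
    """Normal CDF with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def logit(p):
    return log(p) - log(1.0 - p)

# Parameter values from the simulation grid (mu0Perfect = 0, second scenario).
mu0, s0, mu1, s1, lam = 0.0, 1.5, 2.5, 2.0, 0.5

# Closed form, mirroring the R code: root of a quadratic in x.
turn = (mu0 * s1**2 - mu1 * s0**2) / (s1**2 - s0**2)
rad = sqrt(s0**2 * s1**2 * (2 * (s1**2 - s0**2) *
                            (log(s1) - log(s0) - logit(lam)) +
                            (mu1 - mu0)**2)) / (s1**2 - s0**2)
cut_closed = turn + rad if s0 < s1 else turn - rad

# Numeric check: maximize lam*Sp(x) + (1-lam)*Se(x) on a fine grid.
youden = lambda x: lam * norm_cdf(x, mu0, s0) + (1 - lam) * (1 - norm_cdf(x, mu1, s1))
grid = [x / 1000.0 for x in range(-5000, 5001)]
cut_grid = max(grid, key=youden)

print(round(cut_closed, 3), round(cut_grid, 3))  # the two agree to grid resolution
```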
APPENDIX C
Simulation Study Plots
[Figure: x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Bias (−10 to 30); series mu0, mu1, sigma0, sigma1; eight panels by heterogeneity and SD]
Figure C.1. Bias of µ0 (open light blue circle), µ1 (filled light blue circle), σ0 (open dark blue circle) and σ1 (filled dark blue circle) in the case of λ = 2 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Bias (−500 to 500); series mu0, mu1, sigma0, sigma1; eight panels by heterogeneity and SD]
Figure C.2. Bias of µ0 (open light blue circle), µ1 (filled light blue circle), σ0 (open dark blue circle) and σ1 (filled dark blue circle) in the case of λ = 1.3 and distant distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Bias (−0.1 to 0.2); series sens_2/spec_2 (top) and sens_3/spec_3 (bottom)]
Figure C.3. Bias of sensitivity and specificity at the 'real' optimal threshold (top) and at threshold 2.5 (bottom) in the case of 5 thresholds per study and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Bias (−0.1 to 0.2); series sens_2/spec_2 (top) and sens_3/spec_3 (bottom)]
Figure C.4. Bias of sensitivity and specificity at the 'real' optimal threshold (top) and at threshold 4 (bottom) in the case of λ = 1.3 and distant distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Bias (−3 to 0); eight panels by heterogeneity and SD]
Figure C.5. Bias of the optimal threshold in the case of 5 thresholds per study and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis MSE (0 to 5e+06 top, 20 to 80 bottom); series m0, s0, m1, s1]
Figure C.6. MSE of the distribution parameters in the case of nearby distributions and λ = 1.3. The top picture is the overall view and the one on the bottom a zoomed-in version with MSE ≤ 100. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis MSE (0.02 to 0.08); series sens_2/spec_2 (top) and sens_3/spec_3 (bottom)]
Figure C.7. MSE of sensitivity and specificity at the 'real' optimal threshold (top) and at threshold 2.5 (bottom) in the case of λ = 1.3 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis MSE (0.02 to 0.08); series sens_1/spec_1 (top) and sens_2/spec_2 (bottom)]
Figure C.8. MSE of sensitivity and specificity at threshold 0 (top) and at the 'real' optimal threshold (bottom) in the case of λ = 1.3 and distant distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis MSE (0.02 to 0.08); series sens_3, spec_3]
Figure C.9. MSE of sensitivity and specificity at threshold 4 in the case of λ = 1.3 and distant distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis MSE (0 to 300); eight panels by heterogeneity and SD]
Figure C.10. MSE of the optimal threshold in the case of λ = 1.3 and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Coverage (0.2 to 0.8); series m0, s0, m1, s1]
Figure C.11. Coverage of the distribution parameters in the case of 5 thresholds per study and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Coverage (0.2 to 0.8); series sens_3/spec_3 (top) and sens_1/spec_1 (bottom)]
Figure C.12. Coverage of sensitivity and specificity in the case of nearby distributions. Top: at threshold 2.5 in the case of λ = 1.3. Bottom: at threshold 0 in the case of 5 thresholds per study. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
[Figure: two stacked plots; x-axis Models (CI, DS, CICS, CIDS, *CI, *DS, *CICS, *CIDS); y-axis Coverage (0.2 to 0.8); series sens_2/spec_2 (top) and sens_3/spec_3 (bottom)]
Figure C.13. Coverage of sensitivity and specificity at the 'real' optimal threshold (top) and at threshold 2.5 (bottom) in the case of 5 thresholds per study and nearby distributions. The heterogeneity of the studies increases from left to right. The four plots at the bottom show the case of same standard deviations (SD), the top four plots the case of different standard deviations.
Bibliography
Aertgeerts, B., Buntinx, F., and Kester, A. (2004). The value of the
cage in screening for alcohol abuse and alcohol dependence in general
clinical populations: a diagnostic meta-analysis. Journal of Clinical
Epidemiology, 57(1):30–39.
Agresti, A. (1990). Categorical Data Analysis. John Wiley & Sons,
New York, first edition.
Arends, L. R., Hamza, T. H., van Houwelingen, J., Heijenbrok-Kal, M.,
Hunink, M., and Stijnen, T. (2008). Bivariate random effects meta-
analysis of ROC curves. Medical Decision Making, 28(5):621–638.
Barr, D. J., Levy, R., Scheepers, C., and Tily, H. J. (2013). Random ef-
fects structure for confirmatory hypothesis testing: Keep it maximal.
Journal of Memory and Language, 68:255–278.
Bates, D., Machler, M., Bolker, B. M., and Walker, S. C.
(2015). Package ’lme4’. Available from: https://cran.r-
project.org/web/packages/lme4/lme4.pdf.
Crombie, I. K. and Davies, H. T. (2009). What is meta-analysis?
Available from: http://www.medicine.ox.ac.uk/bandolier/painres/
download/whatis/meta-an.pdf.
Faraway, J. J. (2006). Extending the Linear Model with R. Taylor &
Francis Group.
Ga�lecki, A. and Burzykowski, T. (2013). Linear Mixed-Effects Mod-
els Using R: A Step-by-Step Approach. Springer Science+Business
Media.
Greven, S. and Kneib, T. (2010). On the behaviour of marginal and
conditional AIC in linear mixed models. Biometrika, 97(4):773–789.
Hamza, T. H., Arends, L. R., van Houwelingen, H. C., and Stijnen,
T. (2009). Multivariate random effects meta-analysis of diagnostic
tests with multiple thresholds. BMC Medical Research Methodology,
10(9):73.
133
134 Bibliography
Harbord, R. M., Deeks, J. J., Egger, M., Whiting, P., and Sterne,
J. A. (2007). A unification of models for meta-analysis of diagnostic
accuracy studies. Biostatistics, 8:239–251.
Hodges, J. S. and Sargent, D. J. (2001). Counting degrees of freedom
in hierarchical and other richly-parameterised models. Biometrika,
88:367–379.
Honest, H. and Khan, K. S. (2002). Reporting of measures of accuracy
in systematic reviews of diagnostic literature. BMC Health Services
Research, 2.
Lusted, L. B. (1971). Signal detectability and medical decision-making.
Science, 171:1217–1219.
Martínez-Camblor, P. (2014). Fully non-parametric receiver operat-
ing characteristic curve estimation for random-effects meta-analysis.
Statistical Methods in Medical Research.
Laird, N. M. and Ware, J. H. (1982). Random-effects models for longi-
tudinal data. Biometrics, 38:963–974.
Putter, H., Fiocco, M., and Stijnen, T. (2010). Meta-analysis of diag-
nostic test accuracy studies with multiple thresholds using survival
methods. Biometrical Journal, 52(1):95–110.
Reitsma, J., Glas, A., Rutjes, A., Scholten, R., Bossuyt, P., and Zwin-
derman, A. (2005). Bivariate analysis of sensitivity and specificity
produces informative summary measures in diagnostic reviews. Jour-
nal of Clinical Epidemiology, 58(10):982–990.
Ressing, M., Blettner, M., and Klug, S. J. (2009). Systematische
Übersichtsarbeiten und Metaanalysen. Deutsches Ärzteblatt International,
106(27):456–463.
Riley, R., Takwoingi, Y., Trikalinos, T., Guha, A., Biswas, A., Ensor,
J., Morris, R. K., and Deeks, J. (2014). Meta-analysis of test accu-
racy studies with multiple and missing thresholds: a multivariate-
normal model. Journal of Biometrics and Biostatistics, 5:196.
doi:10.4172/2155-6180.1000196.
Rücker, G. and Schumacher, M. (2010). Summary ROC curve based on
the weighted Youden index for selecting an optimal cutpoint in meta-
analysis of diagnostic accuracy. Statistics in Medicine, 29:3069–3078.
Rutter, C. M. and Gatsonis, C. A. (2001). A hierarchical regression
approach to meta-analysis of diagnostic test accuracy evaluations.
Statistics in Medicine, 20:2865–2884.
Schumacher, M. and Schulgen, G. (2008). Methodik klinischer Studien.
Methodische Grundlagen der Planung, Durchführung und Auswer-
tung. Springer-Verlag Inc., Heidelberg, third edition.
Schwarzer, G., Carpenter, J. R., and Rücker, G. (2015). Meta-Analysis
with R. Use R! series, Springer, Berlin, Heidelberg.
Vaida, F. and Blanchard, S. (2005). Conditional Akaike information
for mixed-effects models. Biometrika, 92(2):351–370.
Vouloumanou, E., Plessa, E., Karageorgopoulos, D., Mantadakis, E.,
and Falagas, M. (2011). Serum procalcitonin as a diagnostic marker
for neonatal sepsis: a systematic review and meta-analysis. Intensive
Care Medicine, 37(5):747–762.
Wacker, C., Prkno, A., Brunkhorst, F. M., and Schlattmann, P. (2013).
Procalcitonin as a diagnostic marker for sepsis: a systematic review
and meta-analysis. Lancet Infectious Diseases, 13(5):426–435.
Willis, B. H. and Quigley, M. (2011). Uptake of newer methodological
developments and the deployment of meta-analysis in diagnostic test
research: a systematic review. BMC Medical Research Methodology,
11:27.
World Health Organisation (2001). Biomarkers in risk assessment: Va-
lidity and validation. Available from: http://www.inchem.org/documents/ehc/ehc/ehc222.htm.
Zhelev, Z., Hyde, C., Youngman, E., Rogers, M., Fleming, S., Slade,
T., Coelho, H., Jones-Hughes, T., and Nikolaou, V. (2015). Diagnos-
tic accuracy of single baseline measurement of Elecsys troponin T
high-sensitive assay for diagnosis of acute myocardial infarction in
emergency department: systematic review and meta-analysis. BMJ,
350:h15. doi:10.1136/bmj.h15.
I hereby declare that I have written this thesis independently, that I
have cited all sources and marked all quotations, and that this thesis
has not been submitted in the same or a similar form to any other
examination authority.
Freiburg, 28.09.2015 ——————————————————–
Susanne Steinhauser