mediation analysis of nurse practitioner impact on renal … · 2020. 7. 7. · bachelor thesis...

117
Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation Analysis of Nurse Practitioner Impact on Renal Outcome in CKD Patients Author: Luna Elise Schernthaner s4703928 First supervisor/assessor: dr. ir. T. Claassen [email protected] Second supervisor: dr. A.J.G. van den Brand [email protected] Second assessor: prof. dr. T.M. Heskes [email protected] June 27, 2020

Upload: others

Post on 28-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Bachelor thesisComputing Science

Radboud University &Radboud University Medical Center

Mediation Analysis of NursePractitioner Impact on Renal

Outcome in CKD Patients

Author:Luna Elise Schernthaners4703928

First supervisor/assessor:dr. ir. T. Claassen

[email protected]

Second supervisor:dr. A.J.G. van den Brand

[email protected]

Second assessor:prof. dr. T.M. Heskes

[email protected]

June 27, 2020

Page 2: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Abstract

Previous efforts have shown a positive association between chronic kidneydisease (CKD) progression and blood pressure, proteinuria and salt intakemanagement. However, the exact extent of these associations is still un-known. With this knowledge, guidelines for CKD treatment could be refinedto advance CKD care, which in turn could improve outcomes for CKD pa-tients. This research aims to find the extent of these effects using mediationanalysis on the MASTERPLAN dataset. The MASTERPLAN trial studiedoutcomes and guideline adherence in 788 CKD patients, allocated to receivestandard care or additional support of a nurse practitioner to achieve 11treatment goals over a median follow-up time of 5.7 years. We present fourhypothetical mediation models to find if the effect of the nurse practitionerintervention on renal outcome is mediated by the individual blood pressure,proteinuria, ACE inhibitor or ARB use, and salt intake treatment goals indi-vidually. From this analysis follows that none of these four treatment goalssignificantly mediate the impact of the intervention on the renal endpoint.However, the dataset suffered from significant contamination bias, whichmeans that the found effects may be smaller than their true values. Basedon these findings, we conclude that the nurse practitioner intervention maynot improve renal outcome in CKD patients in general, nor through theindividual blood pressure, proteinuria, ACE inhibitor or ARB use, or saltintake treatment goals.

Page 3: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Acknowledgements

I would like to thank my supervisor from the Radboud University MedicalCenter, Jan van den Brand for the idea for this thesis, giving me access tothe MASTERPLAN dataset, and his guidance. I would also like to thankmy supervisor from the Institute for Computing and Information Sciences,Tom Claassen for his input and advice. I am very thankful for their time,even with all of the events going on.

Finally, I would like to thank everyone who helped me throughout thissemester, making me believe in myself and supporting me through difficulttimes.

Page 4: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Contents

1 Introduction 31.1 Chapter overview . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Preliminaries 72.1 Chronic Kidney Disease . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 CKD Definition . . . . . . . . . . . . . . . . . . . . . . 72.1.2 Glomerular Filtration Rate . . . . . . . . . . . . . . . 72.1.3 Albuminuria . . . . . . . . . . . . . . . . . . . . . . . 82.1.4 CKD Classification . . . . . . . . . . . . . . . . . . . . 92.1.5 Blood Pressure, Proteinuria, ACE inhibitors, ARBs,

and Sodium Intake . . . . . . . . . . . . . . . . . . . . 92.1.6 CKD Complications . . . . . . . . . . . . . . . . . . . 10

2.2 The MASTERPLAN Study . . . . . . . . . . . . . . . . . . . 112.3 Directed Acyclic Graphs . . . . . . . . . . . . . . . . . . . . . 132.4 Edge Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4.1 Multiple Imputation . . . . . . . . . . . . . . . . . . . 152.4.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . 162.4.3 Logistic Regression . . . . . . . . . . . . . . . . . . . . 172.4.4 Cox Models . . . . . . . . . . . . . . . . . . . . . . . . 182.4.5 Edge Significance . . . . . . . . . . . . . . . . . . . . . 19

2.5 Mediation Analysis . . . . . . . . . . . . . . . . . . . . . . . . 202.5.1 Potential Outcomes Framework . . . . . . . . . . . . . 212.5.2 Effect Estimation . . . . . . . . . . . . . . . . . . . . . 23

3 Method 253.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.2 Data Visualisation . . . . . . . . . . . . . . . . . . . . . . . . 263.3 Final DAGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.4 Statistical Analyses . . . . . . . . . . . . . . . . . . . . . . . . 30

4 Mediation Analysis 34

1

Page 5: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

5 Discussion 385.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Conclusions 426.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

A Exploratory Data Analysis 56

B Missing Data Analysis and Imputation 70

C Individual DAGs 94

D Implication Testing Script 97

E Edge Estimation Script 102

F Mediation Analysis Script 109

2

Page 6: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Chapter 1

Introduction

Chronic kidney disease (CKD) is a significant public health problem, esti-mated to affect 843.6 million people, or 11.1% of the world population, in2017 [20]. The two main causes are diabetes and high blood pressure [39]. Ithas many symptoms that decrease quality of life, which are shown in figure1.1. CKD is associated with substantial morbidity, mortality and significanthealthcare costs. It also increases risk of cardiovascular disease (CVD) andprogression to end-stage renal disease (ESRD) [35, 63, 14]. Research hasverified that prevention, early detection and treatment of complications canimprove quality of life and survival, even if there is no effect on progressionof CKD [30].

As a result, multiple national and international guidelines for the di-agnosis and treatment of CKD have been created, however, adherence tothese guidelines is unsatisfactory. Part of this may be explained by the factthat these guidelines focus on more than 10 goals simultaneously, whichcan be overwhelming to patients. Besides this, an average consultation bya nephrologist takes 10 minutes, which means it is infeasible to properlyaddress each of the numerous goals. Recent trials suggest that additionalcoaching by nurse practitioners to improve treatment goal adherence couldbe beneficial for the life expectancy and quality of CKD patients [71].

The goal of this thesis is to answer the question:

In patients with CKD, does increased effort to eliminate highblood pressure, decrease proteinuria, increase ACE inhibitor orARB use, and lessen salt intake by means of coaching by a nursepractitioner have a significant positive impact on the long termprogression of kidney function?

To answer this question, we will use the MASTERPLAN dataset and a hypo-thetical mediation model to look at the effect of the nurse practitioner inter-vention on the occurrence of a composite renal endpoint a median follow-upof 5.7 years. The analysis will be limited to adult patients with category

3

Page 7: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

3-4 estimated glomerular filtration rate (eGFR) who reached the treatmentgoals of:

� blood pressure 130/85 mm Hg or lower,

� proteinuria 0.5 g per day or lower,

� ACE inhibitor or ARB e.g. enalapril 5 mg twice daily (or comparabledose of other ACE inhibitor) or irbesartan 75-150 mg (or comparabledose of other ARB) daily, and

� sodium intake 2g per day (or sodium excretion 100 mmol per day) orlower

Figure 1.1: Symptoms and signs of CKD [63].

4

Page 8: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

within one year, versus patients who did not reach this goal. This compositerenal endpoint consists of dialysis, transplant, 50% decrease of eGFR relativeto baseline, and death.

We will construct four causal models in which the total effect of thenurse practitioner intervention is partitioned into the effect through theaforementioned treatment goals. Such models describe causation as opposedto mere association because of the assumptions made about the directionsof the effects. To extract the size and significance of the mediators of theeffect of the intervention on the composite endpoint and the effect itself, wewill use mediation analysis.

It is important to note that we have chosen to construct these mod-els with a combination of domain knowledge and the ‘guessing-and-testing’method described by Shalizi [56], as opposed to the possibly more accu-rate method of using a discovery algorithm in the interest of time. Thus,the mediation analysis relies on the models we constructed which may beincorrect.

The partitioning of the effect into these treatment goals is done becauseprevious scientific research has shown that control of these variables has abeneficial effect on CKD patients [22]. This is especially relevant becauseother research has merely shown that nurse practitioner support may benefitCKD patients, but not through which mechanisms this benefit is achieved.That is, the effect on endpoints and treatment goals is often measured butnot the effect on endpoints through these separate treatment goals. TheMASTERPLAN trial used several treatment goals and guidelines, but theeffect of the intervention on outcomes was not broken down to analyse theinfluence of these individual goals and guidelines [72, 75, 47].

The mediation analysis performed in this thesis serves to shed more lighton the mechanism through which this nurse practitioner intervention affectsrenal outcome in CKD patients. With this information, future applicationsof this intervention and even usual care can be improved by prioritisingtreatment goals that are found to contribute significantly to the effect of thenurse practitioner intervention on renal outcome.

Because this analysis relies on the causal models constructed in thisthesis, which is partially based on hypotheses, these results may be incorrectdespite our best efforts to build valid models. The method of model creationwas employed because of time constraints. For the same reason we havechosen to limit the variables in the models to the blood pressure, proteinuria,ACE inhibitor or ARB use, and salt intake treatment goals, and to dividethe DAG into four models, each containing only one mediator. Despite theselimitations, this analysis still contributes to the body of knowledge, at theleast as a stepping stone for a more comprehensive model.

5

Page 9: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

1.1 Chapter overview

In chapter 2, we provide related background information on CKD, the MAS-TERPLAN study, mediation analysis and causal path diagrams. Chapter 3comprises the methods we used for data preprocessing, data visualisation,generating causal path diagrams and their validation and statistical analy-ses using those diagrams. Then, in chapter 4 we describe the results of themain mediation analyses of the DAGs described in chapter 3. Chapter 5includes the limitations and related work to this research. Finally, chapter 6contains the conclusions of our work and speculation on future efforts thatwill improve and expand on our research.

6

Page 10: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Chapter 2

Preliminaries

In this chapter we will explain a number of concepts relevant to this research.We will give the definition of chronic kidney disease (CKD) in section 2.1.In section 2.2 we will describe the treatment goals and results of the MAS-TERPLAN study. After this, in section 2.3 we will give a brief descriptionof directed acyclic graphs (DAGs) and their implications. Then, in section2.4 we will give an overview of the statistical analysis used in this thesis.Finally, in section 2.5 we will explain why and how we use causal mediationanalysis.

2.1 Chronic Kidney Disease

In this paper, we will mainly consult the guideline published in 2012 by theKidney Disease: Improving Global Outcomes (KDIGO) organization, whichcontains the definition, treatment recommendations and further informationon CKD.

2.1.1 CKD Definition

CKD is defined as the presence of decreased GFR or a marker of kidney dam-age (or both) for longer than three months, according to the 2012 KDIGOguideline [22]. Table 2.1 shows some more details on this definition. Insection 2.1.2 and 2.1.3 we will further explain glomerular filtration rate andalbuminuria, which are the main criteria used in the classification of CKD.To give a short description, glomerular filtration rate gives an indication ofthe speed at which the kidneys filter the blood, while albuminuria gives anindication of how damaged the kidneys are.

2.1.2 Glomerular Filtration Rate

The glomerular filtration rate (GFR) is used as a measure for kidney func-tion. It is the total filtration rate of the kidneys. The estimated GFR

7

Page 11: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Markers of kidney damage (one or more)

Albuminuria (ACR ≥30 mg/24 hours; ACR ≥30 mg/g [≥3mg/mmol])Urine sediment abnormalitiesElectrolyte and other abnormalities due to tubular disordersAbnormalities detected by histologyStructural abnormalities detected by imagingHistory of kidney transplantation

Decreased GFR

GFR <60 ml/min/1.73 m2 (GFR categories G3a–G5)

Table 2.1: Criteria for CKD (either decreased GFR or marker(s) present for>3 months) [22]

(eGFR) is, as the name says, an estimation of this filtration rate. It is cal-culated using the concentration of creatinine in the blood and is given inml/min/1.73 m2, where the 1.73 m2 is a correction for body surface area,which is important for e.g. obese patients. Creatinine is created by themuscles and is a waste product that is normally filtered out of the blood bythe kidneys. When there is a large amount of malfunction in the kidneys, arise in blood creatinine levels can be observed. This happens with age andcan vary among gender and other factors [40].

Calculating the eGFR with creatinine levels in the blood is preferredbecause it is not very invasive [40]. For this computation, the Modificationof Diet in Renal Disease (MDRD) formula was used in the MASTERPLANtrial. However, research has shown that the eGFR becomes more inaccuratewhen the kidneys are more damaged [33]. In an attempt to minimize theeffect of this inaccuracy, we use a relative decrease of 50% of eGFR as oneof the endpoints instead of a set value. Because rapid progression is definedas a sustained decline in eGFR of more than 5 ml/min/1.73 m2 per year,the relative decrease we use should capture this category for patients withGFR stage 3-5 with some room for the inaccuracy of the measurements.

2.1.3 Albuminuria

Albuminuria is also an indicator of kidney disease. When the kidneys mal-function, large proteins that should stay in the blood leak into the urine,which is called proteinuria. One of these proteins is albumin, which preventsfluid leakage from blood vessels among other things [61]. Albuminuria, ab-normal leakage of albumin into the urine is used in the diagnosis of CKDbecause it is the most common protein lost in the urine in most cases ofCKD. The albumin excretion rate (AER) is a measurement of mg albu-min/24 hours in the urine. The 2012 KDIGO guideline recommends mea-

8

Page 12: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

suring the albumin-to-creatinine ratio (ACR) in mg/g, which is the chosenunit for proteinuria in this thesis. [22].

2.1.4 CKD Classification

CKD classification is based on the GFR and ACR of a patient. A visualrepresentation of the CKD classification is shown in the table “Prognosisof CKD by GFR and albuminuria category” in figure 2.1. The prognosis,or risk of CKD outcome such as progression or cardiovascular disease isindicated by the colours.

Figure 2.1: Prognosis of CKD by GFR and albuminuria category. Green:low risk (if no other markers of kidney disease, no CKD); Yellow: moderatelyincreased risk; Orange: high risk; Red, very high risk of CKD outcome [22].

2.1.5 Blood Pressure, Proteinuria, ACE inhibitors, ARBs,and Sodium Intake

The 2012 KDIGO guideline specifies a number of measures addressing car-diovascular health and CKD to prevent the progression of CKD. Control ofblood pressure, reduction of proteinuria and lifestyle interventions includingreduced daily sodium intake are critical in preventing CKD progression. Be-sides their impact on CKD progression, these three variables are also closely

9

Page 13: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

related to each other [22]. Despite years of effort, the exact extent of theindividual effects on CKD progression is still unknown and mostly consistsof hypotheses. For this research we limit our scope to these three variablesto stay within the timeframe given. Besides this, as mentioned before, theyare known to be closely related and to have an impact on CKD progression.Next follows a brief overview of these individual variables and their relationto CKD.

CKD and elevated blood pressure are strongly associated and each cancause or aggrevate the other [23]. Lowering blood pressure in CKD patientshelps retard CKD progression. It is notable that this effect is only observableafter long-term follow-up [47]. Specifically, meta-analysis showed that bloodpressure control decreased the risk of end stage renal disease (ESRD) by21% in CKD patients with proteinuria [63].

Proteinuria is also found in many CKD patients. Actively lowering pro-teinuria may decrease cardiovasuclar risk and acute kidney injury (AKI) riskfor these patients [22]. The presence of proteinuria is also associated withincreased risk of CKD progression, mortality and possibly increased risk ofcancer [22]. The MDRD study in 1995 showed that reduction of proteinuriaand decrease in rate of GFR decline were correlated [1]. Other research hasidentified proteinuria not just as a marker of CKD, but a contributor to theprogression of the disease as well [9, 42].

Angiotensin-converting enzyme (ACE) inhibitors and Angiotensin II re-ceptor blockers (ARBs) are important agents in lowering blood pressure andmay even decrease proteinuria. Because of their effectiveness, using one orboth of these medications is recommended in nearly all cases of CKD [23].However, research has indicated that not many patients adhere to their pre-scriptions, which may be improved by means of additional coaching [72]. Werefrain from explaining the effect of these medications in more detail in theinterest of keeping this paragraph short, as this is less relevant to our thesis.

Impaired excretion of sodium is common in CKD patients as well. Lower-ing sodium intake for these patients decreases blood pressure and proteinuria[22]. The KDOQI 2002 and 2012 CKD guidelines therefore suggest loweringsodium intake to <100 mmol (<2 g) per day [31, 22]. Dutch guidelines onCKD have also recommended this or a lower maximum since 2004 [64, 15,43, 44]. Despite these recommendations, many CKD patients exceed thisdaily limit. Italian, Dutch and Turkish studies showed that only 19%, 17%and 14,7%, respectively, of CKD patients met the target daily sodium intake[25].

2.1.6 CKD Complications

Chronically decreased kidney function can lead to a lot of problems, someof which are shown in figure 1.1. It also increases risk of cancer and cardio-vascular disease (CVD), including stroke and heart attack [63]. The risk of

10

Page 14: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

CVD is so severe that CKD patients are more likely to experience a heartattack than to progress to end-stage renal disease (ESRD). The kidneys ofa patient with CKD that develops into ESRD function at 10-15% of theirnormal capacity. At this point, a patient would need dialysis or a kidneytransplant to stay alive [41].

Even when CKD does not progress to such a severe state, patients areoften in a lot of discomfort. The kidneys perform many important functionsin the human body so when their performance decreases, multiple systemsstart to fail. Because of the multitude of different systems failing, CKDpatients often get a multiple prescriptions, each with their own negativeside effects. A restrictive diet is also required to limit the consumption ofsalt, protein, potassium, phosphorous, fat and fluids. Furthermore, manypatients develop uremia, of which symptoms include nausea, vomiting, lossof appetite, tiredness and nerve problems [4]. On top of all of this, patientsare advised to visit their doctor one to four times each year to get blood andurine measurements done to evaluate the progression of the disease [22].

2.2 The MASTERPLAN Study

In 2004, the Multifactorial approach and superior treatment efficacy in renalpatients with the aid of nurse practitioners (MASTERPLAN) study wasstarted in the Netherlands with 793 CKD patients. It was a multicenterrandomized controlled clinical trial with patients of nine Dutch hospitalswith a median follow-up of 5.7 years. The study was designed to evaluatewhether a multifactorial approach with the aid of nurse practicioners reducescardiovascular risk in patients with CKD [74].

The goal of the study was to compare the results of treatment exe-cuted by physicians, who were mostly nephrologists, with execution of thetreatment regimen by a specialized nurse under the supervision of, and incollaboration with a nephrologist. For both of these categories, the same setof guidelines and treatment goals applied. These guidelines and treatmentgoals were based on several existing national and international guidelineson the treatment of CKD and were focused on reduction of vascular riskfactors. It is notable that most of the national and international guidelineswere updated or replaced during the study.

In total, there were 11 treatment goals in the MASTERPLAN study:blood pressure, proteinuria, LDL cholesterol, glycemic control, hemoglobin,serum phosphate, serum PTH, sodium excretion, BMI, and adherence toguidelines of healthy physical activity and not smoking. Besides these goals,there were also 4 medication types that were recommended by default:statins, acetylsalicyclic acid, ACE inhibitors or ARBs, and active vitamin D[73]. Because this thesis is confined to the effect of blood pressure, protein-uria, ACE inhibitor or ARB use, and sodium intake, we will solely consider

11

Page 15: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Risk factors Goal

Blood pressure ≤ 130/85 mm Hg∗

Proteinuria <0.5 g/day

Sodium excretion 100 mmol/day

ACE-I or ARB use e.g. enalapril 5 mg twice daily (or comparabledose of other ACE-I) or irbesartan 75-150 mg(or comparable dose of other ARB) daily

∗In case of proteinuria >1 g/day: 125/75 mm Hg

ACE-I = Angiotensin-converting enzyme inhibitor

ARB = Angiotensin II receptor blocker

Table 2.2: Treatment goals of the MASTERPLAN study relevant to thisthesis [74].

Control Intervention

Renal endpoint N % N %

Dialysis 9 2.5 17 4.5

Kidney transplant 6 1.6 6 1.6

50% reduction in eGFR∗ 110 30.1 91 24.1

Death 36 9.9 29 7.7

Censored 204 55.9 234 62.1

Total 365 377

Table 2.3: Renal endpoint occurrences per treatment group in the contextof this thesis.

the treatment goals related to these subjects. The relevant treatment goalsare shown in table 2.2.

After two years, treatment goal adherence improved in both the controland intervention group, with some improvements more pronounced in theintervention group. The study showed that the aid of nurse practitioners inadhering to treatment goals benefits everyday care of CKD patients. Fur-thermore, during the first two years, the number of visits of the interventiongroup was higher than that of the control group, but the number of physi-cian visits was lower. Another found benefit of the nurse practitioners isthat the differences in the achievement of treatment goals between hospitalswere reduced because adherence to guidelines was improved [72].

The original intention of the trial was to assess the effect of nurse prac-titioner intervention on cardiovascular outcomes, consisting of non-fatalstroke, non-fatal heart attack and fatal heart attack. The intervention did

12

Page 16: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

not reduce the rate of the composite cardiovascular endpoint in the interven-tion group compared to the control group. A non-significant hazard ratio of0.90 was found, which means that the intervention group had 0.9 times asmuch risk of a cardiovascular outcome as the control group. However, thishazard ratio was not significant, which means that it is likely invalid [75].

Another analysis was performed with the trial data that looked at acomposite renal endpoint instead, which consisted of 50% increase in serumcreatinine, end stage renal disease, and death. This study found a significanthazard ratio of 0.80, which means the intervention reduced incidence of thecomposite renal endpoint by 20%. The hazard ratio was significant, whichmeans that this found decrease is likely valid [47].

The composition of the composite renal endpoint relevant to our analysisand the occurrence of each endpoint per treatment group are presented intable 2.3 as amounts and proportions. A composite endpoint was chosen tosimplify the analysis and to emphasize the complexity of CKD progression.

2.3 Directed Acyclic Graphs

To perform mediation analysis, a number of assumptions are required aboutthe relationships between variables in the context of the analysis. Theseassumptions can be visualised with a directed acyclic graph (DAG). Thesegraphs are a type of causal path diagram and are used in mediation analysisto give a clear overview of the context, specifically they make the assump-tions explicit [46]. To construct an accurate DAG, it is important to knowabout confounders, which cause a difference in the risk of the outcome inthe exposed and unexposed populations that cannot be explained by theexposure [59]. Figure 2.2 is a simple example of a DAG with variables X,M , Z and Y , where M is a mediator of the relationship of X on Y , and Zis a confounder of the effect of X on Y , X on M , and M on Y .

The layout of a DAG is as follows:

� Nodes represent variables existing of the exposure, outcome and othervariables, or covariates.

� Directed edges represent relationships between variables; a direct pathindicates the possibility of causal relevance, however, the absence ofsuch a path indicates the absence of any causal relationship.

For a more detailed explanation of the composition of DAGs, we recommendchapter 2 and 3 of Tennant et al. [59].

Because DAGs are mere conceptual models built on hypotheses, a re-searcher can accidentally omit important confounders or interactions, whichwould mean that the analysis based on their DAG is invalid. Thus, thevalidity of our mediation analysis also depends strongly on the accuracy ofour DAG.

13

Page 17: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Figure 2.2: A simple DAG where M mediates the effect of X on Y and Zis a confounder of the effect of X on Y , X on M , and M on Y .

As mentioned before, two variables in a DAG that are not directly con-nected imply an independence. In figure 2.3, A and B are independent,when controlling for M. This means that if M is held constant, there is norelationship between A and B, which can be expressed as A ⊥⊥ B |M (orB ⊥⊥ A |M). This independence can be falsified if the multiple regressionformula A ∼ B + M produces a non-null parameter estimate. This in-dicates that A and B are related, even when controlling for M, thus theabsence of the arrow is not justified. Regression will be explained in moredetail in the next section.

Figure 2.3: A simple example DAG to illustrate conditional independency.

Thus a DAG produces a list of conditional independencies which can befalsified using regression and data. More specifically, this list is obtained us-ing d-separation. For a short introduction to this theory, we suggest readingd-Separation Without Tears. On a final note, the inability to falsify any im-plied conditional independency of a DAG does not mean it is valid, however,it does strengthen its plausibility.

In general, there are three ways to formulate a DAG: prior knowledge,guessing-and-testing, and discovery algorithms. The first is simply visual-ising known causal processes, however, it is still important to check thisknowledge, which is done by the guess-and-test method. First build an ini-tial DAG based on guesses, then find all implied conditional independenciesusing d-separation, and finally reject the DAG if not all conditional indepen-dencies hold. The final method requires an algorithm, such as the very basic

14

Page 18: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Spirtes, Glymour, and Scheines algorithm, which infers the DAG from data[56]. For our research, we will adhere to the guessing-and-testing methoddue to time constraints.

2.4 Edge Estimation

The next step after designing a DAG is calculating the edge weights andsignificance. This can be done with different types of regression, dependingon the variables that are being tested.

2.4.1 Multiple Imputation

Before moving on to edge estimation, it is necessary to find a solution formissing data. In trial data, it is nearly impossible to perform all measure-ments like blood pressure and albumin-creatinine ratio at every visit of everypatient, which leads to incomplete data. One method to deal with this is tosimply omit incomplete data, though this could introduce significant bias.For example, one could lose most data on patients who are more ill becausethey have a higher rate of missing variables than patients who are less ill.This introduction of bias is more severe depending on the type of miss-ingness, often classified as missing completely at random (MCAR), missingat random (MAR), and missing not at random (MNAR). In MCAR, thereare no systematic differences between the missing and the observed values,whereas in MAR there are systematic differences, however, they can be ex-plained with other observed measurements. In MNAR, there are systematicdifferences as well, but these cannot be explained with other observed mea-surements. MAR is most frequently observed, though it should be notedthat it is not possible to prove the pressence of the MNAR pattern usingdata [57].

One solution to missing data is multiple imputation (MI), with whichmultiple plausible complete datasets are produced and the results of thesedatasets are combined using specific methods, taking the uncertainty of theimputed values into account. First, to produce the plausible datasets, miss-ing variables are replaced by sampling from the predictive distribution ofthe observed data [57]. Common methods for imputation of numeric valuesand binomial values are predictive mean matching and logistic regression,respectively. Furthermore, the imputed value is also based on other parame-ters, or predictors. For further explanation of these methods, we recommendstudying chapter 3 of Buuren [6]. When the imputed datasets are produced,it is possible to fit a chosen model to each dataset and find the overall pa-rameter and variance estimations. This is where Rubin’s rules come intoplay to properly handle the variance introduced by the imputations [53].

To combine the results of several imputed datasets for measures like theregression coefficient and standard deviation, the average of the parameter

15

Page 19: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

estimates of all imputed datasets is taken, or in the form of a formula:ˆθMI = 1

M

∑Mm=1 θm, where M is the number of imputed datasets and θm is

the parameter estimate of the regression for imputed dataset m. Obtainingthe combined variance estimate of several imputed datsets is somewhat morecomplex, with the formula

V ar(ˆθMI) = within imputation variance +1

Mbetween imputation variance

In which the within imputation variance is expressed as

1

M

M∑m=1

V ar(θm) =1

M

M∑m=1

(θm − θm)2

and the between imputation variance is expressed as

1

M − 1

M∑m=1

(θm − ˆθMI)2

where ˆθMI is the average of the parameter estimate over all M imputeddatasets [34, 55].

2.4.2 Linear Regression

A commonly used method to estimate edges pointing towards continuousvariables is multiple linear regression, which attempts to express the effectof predictor variables on a response variable. This effect can be expressed inform of the mean function with p predictor variables (xp); E(y|x1, . . . , xp) =β0 + β1x1 + · · ·+ βpkp [65].

The linear regression intercept (β0) indicates the value of y when allpredictor variables (xi) are equal to 0. Furthermore, when all other x areheld constant, a change of one unit in xi causes a change of βi (the logisticregression slope for xi) in y [65].

To compute the coefficients βi in this model, the ordinary least squaresmethod can be used. This method minimises the difference between the truevalue of y (in the dataset) and the estimated value of y (E(y|x1, . . . , xp)) ascalculated by the model), where this difference for datapoint i is expressed asεi = yi−E(yi|xi1, . . . , xip). This minimising is often done with the ordinary

least squares method, which finds the values β0, . . . , βp of (β0, . . . , βp) thatminimise the residual sum of squares function, or RSS(β0, . . . , βp), given byRSS(β0, . . . , β1) =

∑ni−1(yi − E(yi|xi1, . . . , xip))2 =

∑ni−1 e

2i .

These calculations can be translated into R commands. Essentially, thecomputation of the edges of a DAG can be done by regressing a variable onall variables which have a direct outgoing arrow to it. For the simple graphin figure 2.2, the edges X → Y , M → Y , and Z → Y can be calculated by

16

Page 20: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

regressing Y on X, M and Z. For a continuous Y this can be done in Rwith the command lm(Y ∼ X + M + Z, data).

Because of the different units of most continuous variables, the obtainedcoefficients are not comparable. To enable comparing these coefficients, theindependent continuous variables in a regression formula should be stan-dardised. Standardising results in a variable that has a mean of 0 and astandard deviation of 1. This is achieved with a simple transformation ofsubtracting the mean of the variable and then dividing this by the stan-dard deviation of the variable, and should be done for all three regressiontypes mentioned in this chapter [19]. In R, the function scale() is used toperform this transformation.

2.4.3 Logistic Regression

Whereas linear regression can be used if the dependent variable is contin-uous, binary logistic regression can be used if the dependent variable isbinomial. Binary logistic regression seeks to describe the effect of predictorvariables (xi) on the probability that the outcome variable (Y ) takes on thevalue 1, also indicated as π = P (Y = 1). Usually, a nonlinear transforma-tion is applied to π which maps its range from (0, 1) to (−∞,∞). A popularchoice for this transformation is the logit function: g(π) = log( π

1−π ). Afterapplying this transformation, the probabilities are related linearly to the ppredictor variables, such that: log( π

1−π ) = β0 + β1x1 + β2x2 + · · · + βpxp.This is also known as the logistic regression model [12].

Because π is known as the probability of success, π1−π is known as the

odds of success. The logistic regression model can also be expressed as π =exp(β0+β1x1+β2x2+···+βpxp))1+exp(β0+β1x1+β2x2+···+βpxp) , which emphasises the S-shaped relationship of

π with increasing values of βixi [12].The logistic regression intercept (β0) indicates the log odds when all

predictor variables (xi) are equal to 0. Additionally, the logistic regressionslope for variable xi (βi), indicates the amount the log odds changes whenxi changes by one unit, given that all other variables remain constant. Notethat these interpretations are analogue to the linear regression intercept andslope [12]. Furthermore, if xi increases by one unit while all other variablesare held constant, the odds of success multiplicatively changes by a factorof exp(βi).

To compute the coefficients βi in the model, the iterative reweightedleast squares method is utilised. To dissect this method, first, the weightedleast squares method minimises the sum of squared errors like the ordinaryleast squares method, however, each squared error is appointed a weight suchthat not all errors are of equal importance in this minimising problem. Then,the iterative reweighted least squares method finds the weights for which thesum of squared errors is minimised. A more mathematical description hasbeen left out in the interest of keeping this section short, but can be found

17

Page 21: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

in Burrus [5].Performing this calculation in R requires a slightly different command

from linear regression. Again, looking at figure 2.2, calculating the edgesX → Y , M → Y , and Z → Y can be done by regressing Y on X, M andZ. Regressing a binomial Y requires the command glm(Y ∼ X + M + Z,

family=binomial, data) in R.In the context of a medical trial, the effects of an intervention or a

mediator on the survival time can also be calculated with Cox regressioninstead of linear or logistic regression. We explain this concept in the nextsection.

2.4.4 Cox Models

A Cox proportional hazards regression model is a type of survival modelwhich relates several exposures, like age or an intervention, considered si-multaneously to survival time. This survival time is the time a patient livesuntil an event, such as death or disease recurrence, occurs. Such a modelresults in a hazard rate, which is the risk of a patient suffering from an event,given that they survived until a given time. Thus, the hazard represents theexpected number of events per unit of time. To illustrate, a survival modelcould be used to inspect the effect of a new medicine on the lifespan of pa-tients of a certain disease. We will utilise Cox regression to find the effect ofthe nurse practitioner intervention and the four aforementioned individualtreatment goals on the time it takes for the composite renal endpoint to oc-cur. Often, the hazard ratio (HR) is the measure of interest. It is the ratioof the total number of observed to expected events in two different groups.This could be the treatment and the control groups in a trial, for example[26].

The Cox regression model thus calculates the expected hazard at time t,by taking the product of the baseline hazard and the exponential function ofthe linear combination of the exposures. The baseline hazard is the hazardwhen all exposures are equal to 0. The model can be written as:

h(t) = h0(t)exp(b1X1 + b2X2 + · · ·+ bpXp

which means that the exposures have a multiplicative effect on the expectedhazard. These coefficients for the exposures are in units of log-HR, whichmakes them difficult to interpret. This is why they are exponentiated to thenobtain the HR of the individual exposures. These, in turn, can be interpretedas the expected hazard increasing by a factor of this coefficient for everysingle unit increase of the corresponding exposure, while keeping all otherexposures constant [26]. The method used to calculate these coefficients wasdescribed by Cox and Oakes [8] and has been excluded from this thesis toconserve the readability of this paragraph. For an applied example of thistheory, we suggest looking at LaMorte [26].

18

Page 22: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

As with the other regression types, in the context of figure 2.2, the edgesX → Y , M → Y , and Z → Y can be computed by performing Cox propor-tional hazards regression with the R command coxph(Surv(time,status)

∼ X + M + Z, data) from the survival library. This Surv(time,status)

command creates a survival object. The status is often either 1 or 2, where1 means the subject was lost to follow-up or did not experience an eventbefore there was no more data recorded, also known as censored, and 2means a predefined event happened. The time is the last time that data wasrecorded for the subject, where recording stops after an event has occurred.

For the final model in this thesis, it is important to note that logisticregression estimates the odds ratio (association estimate), while Cox regres-sion estimates the hazard ratio (time to event estimate), where probabilityis not the same thing as hazard. In more informal terms, logistic and lin-ear regression on survival data assesses a proportion, while Cox regressionestimates a rate [29].

2.4.5 Edge Significance

Because these coefficients are mere estimations, a measure of significanceis desired. This is where the confidence interval (CI) comes into play. A95% CI indicates the upper and lower limits between which the estimatedcoefficient lies with 95% certainty. If this CI includes 0, there is not a strongindication that the coefficient is significant [49]. To illustrate, imagine thatthe estimated edge coefficient, or parameter estimate, of the edge from Xto Y in figure 2.2 is 0.38 with a 95% CI lower bound of 0.30 and an upperbound of 0.48, also indicated as (0.30− 0.48). This would indicate that theeffect of X on Y is significant (0 is not included in the CI) and is estimatedto be of size 0.38.

To obtain this 95% CI, the bootstrap estimate of the standard error (SE)is required, with which the 95% CI can be expressed as

θ = (T − B?)± 1.96× SE?(T ?)

where the coefficient 1.96 specifically belongs to a 95% CI. To dissect thisformula, first the average of the bootstrapped parameter estimate is

T ? = E?(T ?) =1

B

B∑b=1

T ?b

where B is the number of bootstrap samples. Then B? = T ? − T is anestimate of the bias of T. The next step is the estimated bootstrap varianceof T ?:

V ar?(T ?) =

1

B − 1

B∑b=1

(T ?b − T ?)2

19

Page 23: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Finally, the bootstrap estimate of the SE of T is expressed as

SE?(T ?) =

√V ar

?(T ?)

In the notation above, Y ? is used to indicate bootstrap variables and Yis used to denote original sample variables [13]. For this thesis, we usea simplified version of the formula for the confidence interval mentionedabove, where the bias is left out of the calculation. resulting in the formula

θ = T ± 1.96× SE?(T ?).

The different samples for the bootstrap are usually chosen “with replace-ment”, which means that in a dataset with 30 datapoints, the 10th datapointmay occur 7 times within a bootstrap sample, while the 29th datapoint maynot be used at all in that same sample [68].

For this thesis, we use bootstrapping within the imputed datasets, to ob-tain a variance estimate for each estimated coefficient. For this, we use the“MI Boot” method, as described by Schomaker and Heumann [55]. First,multiple imputation is used to generate M imputed datasets, where for eachdataset the parameter estimates are made, or in our case, the regressioncoefficients are computed which can then be used to retrieve the overall pa-rameter estimate using Rubin’s rules. Then, from each dataset B bootstrapsamples are drawn which are then used to calculate the variance estimateof the regression for each imputed dataset, which in turn can be used toobtain the overall variance estimate using Rubin’s rules. The variances ofthe regression coefficients for imputed dataset m can be obtained with

V ar(θm) =1

B − 1

∑b=1

B(θm,b − ˆθm)2

where ˆθm = 1B

∑Bb=1 θm,b [55].

2.5 Mediation Analysis

Mediation analysis is a method to make causal inferences about the effectof an exposure on an outcome through mediators. We will use this methodto study the mediation by the four treatment goals mentioned earlier of theeffect of the nurse practitioner intervention on the incidence of the renalendpoint.

Often, a visual representation in the form of a directed acyclic graph(DAG) is used to show the assumptions a researcher made in the analysis[48]. To give a simple example of how this method would work in practice,imagine a researcher wants to know the effect of an antidepressant on de-pression and how this effect is mediated by insomnia, a common side effectof antidepressants. Thus, the antidepressant would be the exposure, insom-nia the mediator and depression the outcome. The DAG would look like the

20

Page 24: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

one in figure 2.4. For an example DAG that is more similar to ours withrepeated measures, we refer to section 14.7 of Hayes [17] in the interest ofkeeping this explanation short.

Figure 2.4: A simple mediation model for the relation between an antide-pressant, insomnia and depression.

Because of the assumptions made in the construction of the mediationmodel, the researcher can then uncover the effect of the antidepressant ondepression through insomnia (indirect effect) and not through insomnia (di-rect effect). With these indirect and direct effects, they could calculate thetotal effect of the antidepressant on depression [52].

As mentioned before, we use mediation analysis to partition the effectof the nurse practitioner intervention on the time it takes for the compositerenal endpoint to occur in CKD patients through the four treatment goalsmentioned earlier. A concept version of this mediation model is shown infigure 2.5.

2.5.1 Potential Outcomes Framework

Mediation analysis is mainly about estimating the direct and indirect effects.One of the methods to obtain these estimates makes use of the potential out-comes (PO) framework, which builds upon the concept of “opposite worlds”in which the observations of the experiment are different from the true ob-servations. In this framework, Yi(a,m) is the counterfactual outcome forperson i if exposure was set to a and mediator to m, and Mi(a) is the valueassumed by the mediator for person i if exposure had been set to a. If a isthe exposure, a? is the opposite exposure. Furthermore, E[Y (a,m)] is the

21

Page 25: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Figure 2.5: A conceptual mediation model for this thesis.

average counterfactual outcome if each individual had been introduced toexposure a and mediator m [28].

For an example application, imagine a trial where depressed individualsare randomly appointed to recieve (A = 1) or not receive (A = 0) an an-tidepressant over a time period and at the end, a score of the severity oftheir depression (Y ) and presence of insomnia (M = {0, 1}) is measured.The suspected relationships between the variables is shown in figure 2.4. Ifwe apply the theory above within the context of this example, E[Y (1, 1)]denotes the average depression score if all patients would have received theantidepressant and had experienced insomnia.

With this theory, it is possible to express the different indirect and directeffects shown in figure 2.4. In these formulas, X is the exposure, M is themediator and Y is the outcome [28].

� The natural direct effect is defined as the difference between the coun-terfactual outcome if i were exposed to A = a and the counterfac-tual outcome if i were exposed to A = a?, while keeping M equal toM = M(a?).natural direct effect = E[Y (1,M(0))]− E[Y (0,M(0))]

� The natural indirect effect is defined as the difference between thecounterfactual outcome if i were exposed to M = M(a) and the coun-

22

Page 26: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

terfactual outcome if i were exposed to M = M(a?), while keeping Aequal to A = a.natural indirect effect = E[Y (1,M(1))]− E[Y (1,M(0))]

� The total effect is defined as the difference between the counterfactualoutcome if i were exposed to A = a and M = M(a) and the counter-factual outcome if i were exposed to A = a? and M = M(a?).total effect = E[Y (1,M(1))]− E[Y (0,M(0))] =(E[Y (1,M(1))]−E[Y (1,M(0)))+(E[Y (1,M(0))]−E[Y (0,M(0))]) =natural indirect effect + natural direct effect

It is of importance that some outcomes cannot be observed, likeE[Y (1,M(0))], which is the average outcome when exposure X is applied,but when mediator M takes on the value that would have manifested if theexposure had not been applied. Moreover, Y (1,M(1)) cannot be observedin subjects that do not receive exposure X. This means that mediationanalysis in the PO framework requires imputation of unobserved outcomesas described by Lange et al. [28], of which we give a brief overview in thenext section.

In practice, DAGs for mediation analysis are much more complex thanthe simple three-variable model from our illustration. The calculations getincreasingly large very quickly when more variables and interactions areadded, which is discussed in Taguri, Featherstone, and Cheng [58] andMiocevic et al. [38], to name just a few examples.

In general, the direct effect of the exposure on the outcome is the com-bination of all its indirect effects through mediators that are not explicitlymodeled. In the context of our model, this means that the direct effect ofthe nurse practitioner intervention on the survival time is the combinationof all indirect effects mediated by the other treatment goals and guidelinesthat are not present in our model.

2.5.2 Effect Estimation

In essence, mediation analysis based on the PO framework is a problemof regression with missing data. Because there is only observed data, thecounterfactual data are missing. This means that finding the indirect anddirect effects with this method requires imputation, which was used as wellin section 2.4.1, albeit with a few extra steps. For a mediation model witha single mediator (see figure 2.2, a copy of figure 2.2 in section 2.3), themethod of Lange et al. [28] can be applied. It consists of 6 steps:

� Step 1: Fit a survival model to the survival times using the treatmentgroup, mediators and confounders.

� Step 2: Duplicate the original dataset twice and add a new counter-factual variable for the intervention. In the first duplication, this new

23

Page 27: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

variable assumes the observed value, while in the second duplication, itassumes the opposite variable (in the case of a dichotomous exposure).

� Step 3: Impute the unknown survival times for the new counterfac-tuals created in the previous step.

� Step 4: Fit a Cox model to the extended dataset by regressing theimputed survival times on the confounders and newly produced coun-terfactual exposures.

� Step 5: Repeat step 3 and 4 10 times and pool the obtained coeffi-cients.

� Step 6: Repeat step 1-5 1,000 times while taking a new bootstrapsample each time to obtain the significance of the obtained coefficients.

After performing these steps, the coefficient for the actual exposure willestimate the natural indirect effect, while the coefficient for the createdexposure variable will estimate the natural direct effect. Furthermore, asthese are results of a Cox proportional hazards regression, the effects willbe expressed in log-HR. To make these more readable, the effect estimatesare exponentiated, so that the expected hazard for an effect increases by afactor of the found amount for every one unit increase in the effect, whileholding all other effects constant [26]. After this exponentiation, a HR of>1 means that the effect increases risk of an event happening, while a HRof <1 indicates the effect reduces the chance of an event happening.

Of note is that this method does not impute the counterfactual media-tor. This solution is achieved based on four assumptions about the relation-ships between the outcome, intervention, mediator(s), and confounders anda mathematical proof based on these assumptions, which are described byLange et al. [28].

As with the edge estimations, in the context of our thesis, there are 34imputed datasets to begin with to handle the missing data. This means thatan extra step of pooling is required after finishing the procedure describedabove, which can be done with the help of Rubin’s rules again.

24

Page 28: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Chapter 3

Method

In this chapter we will describe the method of our research. First, we willreport the data preprocessing performed on the MASTERPLAN datasetand visualise this dataset for some insight into the data it contains. Then,we will show our efforts to validate the final DAG. Finally, we will describethe statistical analyses with which the edge weights and significance werecalculated in our final DAG.

The code used for this thesis can be found at https://github.com/

LESchernthaner/ThesisCode.

3.1 Preprocessing

The preprocessing was performed in R [51], using the Foreign (import .dta)[50], Dplyr (data manipulation) [67], Ggplot2 (plots) [66], GridExtra (plotarrangements) [2], Rmarkdown (reporting) [70], Knitr (reporting) [69], Mice(imputation) [7], Dagitty (DAG handling) [60], Writexl (xlsx reading andwriting) [45], Lubridate (time casting) [16], Boot (bootstrapping) [11], Sur-vival (Cox regression) [62], Survminer (Cox plots) [21] and Amelia (impu-tation pooling) [18] packages.

We performed a short analysis of the variables in the MASTERPLANdataset relevant to this thesis to gain insight in their distributions and ex-treme values. We noticed the values of albumin-creatinine ratio on the firstvisit were very high compared to the other visits in both treatment groups.This may be explained by the fact that the start of the trial gave incentiveto the doctors to properly manage it. There were also two extreme BMImeasurements; one below 15 and one above 50. They were not excludedfrom the dataset, because comparison with the waist-hip ratio implied thatthese values were plausible. Finally, there was one patient with a low di-astolic blood pressure (<50), which was also not seen as an outlier, as thiswas consistently present in this patient. The full analysis can be found inappendix A.

25

Page 29: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

After this, we imputed the missing values of the baseline and one-yearcheck-up visits. This imputation was done to obtain more complete datafor the final mediation analysis. The one-year check-ups were set to be thefifth visit of each patient. However, this was not always realised, as somepatients had this check-up on their fourth visit, for example. We tried toreduce this effect with a special selection procedure. If the fifth visit of apatient contained more than 8 missing variables, we selected the fourth visitif it contained less than 8 missing variables. If this also did not hold, weselected the sixth visit if it contained less than 8 missing variables, and if thisalso did not hold, the fifth visit was selected after all to prevent includingmuch earlier or later visits.

The data was transformed to a long format with the baseline measure-ments, first visit measurements and fifth visit measurements, obtaining adataset of 742 variables. The most common pattern was no missing vari-ables (343 times), and after that, missing proteinuria on visit 5 (118 times).

The variables that were missing most often were proteinuria (34%), urinesodium (10%) and albumin-creatinine ratio (8%), all at visit 5. We generated34 imputed datasets, as the most frequently missing variable (proteinuria onvisit 5) was missing in 34% of the data. This variable could not be dropped,as it is part of our mediation model, which is why we took the precautionof producing an equal amount of imputed datasets. Usually, a cutoff valueof 40% missingness is given, and a recent publication even suggests that theamount of missingness can be higher [32]. In a short analysis of the imputa-tion, we found good convergence and reasonable distributions of the imputedvariables. Throughout our analysis, we used the same 34 imputed datasetsby using the same seed, imputation methods and predictor matrix. The fullmissing data analysis, the imputation and the analysis of the imputed dataare shown in appendix B.

3.2 Data Visualisation

To provide some insight into the MASTERPLAN dataset, two tables wereconstructed. Table 3.1 contains an overview of the baseline characteristics ofthe 788 randomised patients and table 3.2 gives a summary of the variablesof the altered dataset mentioned in the previous section, which contains dataof the first and fifth visits of the 742 patients who were still participatingduring the fifth visit. In the final analyses, we further filter the dataset toonly contain patients with stage 3-4 GFR, obtaining data with 340 patientsin the control group and 334 in the intervention group.

26

Page 30: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Control Intervention

N 393 395

Age (years) 59.3 (12.8) 58.9 (13.1)

Gender (male) (%) 68 67

Race (Caucasian) (%) 93 91

Kidney transplantation

(%)14 14

Prior CVD (%) 25 33

History of DM (%) 23 26

Smoking (%) 24 19

BMI (kg/m2) 27.2 (4.9) 27.0 (4.6)

eGFR (ml/min/1.73 m2) 35.6 (13.3) 36.3 (14.5)

Systolic BP (mm Hg) 136 (21) 135 (20)

Diastolic BP (mm Hg) 79 (11) 78 (11)

Proteinuria (g/24 hr) 0.3 [0.1-0.8] 0.2 [0.1-0.8]

Albumin-creatinine ratio

(mg/g)138.8 [28.8-496.6] 111.4 [30.8-416.3]

Sodium excretion

(mmol/24 hr)150 [113-189] 148 [116-194]

ACE-I or ARB use (%) 78 81

Data are given as proportions, means with corresponding standard deviation,or median with 25th and 75th quartiles.

CVD = cardiovascular disease

DM = diabetes mellitus

BMI = body mass index

eGFR = estimated glomerular filtration rate

BP = blood pressure

ACE-I = angiotensin-converting enzyme inhibitors

ARB = Angiotensin II receptor blockers

Table 3.1: Baseline characteristics of the MASTERPLAN dataset.

27

Page 31: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Baseline

Year1

Control

Interv

ention

Control

Interv

ention

N36

537

7365

377

eGF

R(m

l/m

in/1.

73m

2)

36.1

(13.

1)36

.7(1

4.5)

34.2

(14.1

)35.

0(1

4.8)

Syst

oli

cB

P(m

mH

g)13

5(2

0)13

5(2

0)133

(20)

130

(19)

Dia

stol

icB

P(m

mH

g)79

(11)

78(1

1)78

(11)

76(1

0)

Pro

tein

uri

a(g

/24

hr)

0.3

[0.1

-0.8

]0.

2[0

.1-0

.8]

0.3

[0.1

-1.0

]0.

3[0

.1-0

.9]

Alb

um

in-c

reati

nin

era

tio

(mg/

g)13

3.5

[28.

0-43

9.7]

110.

1[3

0.0-

406.

5]24.

4[8

.6-7

7.6]

20.

2[6

.0-5

7.3]

Sod

ium

excr

etio

n(m

mol

/24

hr)

150

[112

-190

]15

0[1

77-1

95]

152

[120-

195

]150

[117

-198]

AC

E-I

or

AR

Bu

se(%

)78

8184

92

Dat

aar

egi

ven

asp

rop

orti

ons,

mea

ns

wit

hco

rres

pon

din

gst

an

dard

dev

iati

on

,or

med

ian

wit

h25th

an

d75th

qu

arti

les.

eGF

R=

esti

mat

edgl

omer

ula

rfi

ltra

tion

rate

BP

=b

lood

pre

ssu

re

AC

E-I

=an

giot

ensi

n-c

onve

rtin

gen

zym

ein

hib

itors

AR

B=

An

giot

ensi

nII

rece

pto

rb

lock

ers

Tab

le3.2

:C

har

acte

rist

ics

ofth

ed

atas

etco

nst

ruct

edin

the

pre

vio

us

sect

ion

28

Page 32: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

ACEI ARB T0

BP T0

Intervention

PU T0

SE T0

eGFR T0

0.97 (-0.59,2.53)

1.36 (-0.21,2.93)

0.13 (0.03,0.22)

-0.23 (-0.41,-0.05)

-0.30 (-0.39,-0.20)

8.47 (3.52,13.4)

Figure 3.1: The DAG upon which all other DAGs were built. The numberswith two digits are the estimated edge weights and the numbers betweenbrackets are the lower and upper bounds of the 95% confidence interval.

3.3 Final DAGs

We constructed the final DAGs for the main analysis of this thesis withthe help of the 2012 KDIGO guideline [22] and the MASTERPLAN thesis[72]. These works contain information about the relations of the variablesused in our model. Some are hypothesised, while others are supported withempirical proof. This means that we utilised the method of ‘guessing-and-testing’ to construct a DAG based on these hypotheses [56]. Furthermore,we split the DAG with the treatment goals on T1 up into four smaller DAGs,which does not allow us to capture the interactions between the variables.However, this choice was necessary to perform the final mediation analysis,as it would have been beyond the scope of this thesis to carry out a mediationanalysis with four interacting variables. To build the individual DAGs, thegraph in figure 3.1 was used as a base. To clarify, common variable namesand their corresponding methods of measurement and units are listed below.Besides this, all continuous independent variables were standardised beforeperforming this analysis, which means that the edge weights show their sizerelative to other edge weights, as opposed to the increase in the dependentvariable based on an increase of one unit of the independent variable.

� Blood pressure: systolic blood pressure measured over 30 minutes (mmHg)

� Proteinuria: urinary protein concentration (mg/24 hr)

� ACE inhibitor or ARB use: use of angiotensin-converting-enzyme in-hibitors or angiotensin II receptor blockers (true/false)

� Salt intake: urinary sodium excretion (mmol/24 hr)

29

Page 33: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

� eGFR: estimated glomerular filtration rate with the MDRD175 formula(ml/min/1.73m2)

� Intervention: addition of a nurse practitioner to the care team or stan-dard treatment (true/false)

� (Renal) composite endpoint / outcomes: composite renal endpoint ofdeath, 50% decrease in eGFR relative to baseline, dialysis, and kidneytransplant (true/false)

To validate the DAGs that we made, we first tested the implicationsof each DAG. This test was performed by extracting the conditional in-dependencies from a DAG with the impliedConditionalIndependencies

function of the Dagitty package, and using the MASTERPLAN dataset tovalidate these independencies. It is important to perform these tests prop-erly when using imputed data, as is the case for this thesis. We used thepool function of the Mice package to do this. This function first tests aconditional independency, that is, it estimates a regression coefficient, witheach of the 34 imputed datasets and then finds the average estimate and cal-culates the total variance over the repeated analyses, according to Rubin’srules. The script for these tests is presented in appendix D and the modelcodes of each DAG can be found in appendix C.

After this, we used our combined knowledge about mediation analysisand CKD to perform a face validation of the DAG. Because a lot of themechanisms are hypothesized but not known, it was only possible to identifyrelationships in our DAG that have been disproven, or missing relationshipsthat have been proven. Thus, this step was another attempt at falsification,rather than verification. Because of these uncertainties, it is possible thatour model is incorrect. However, our falsification attempts failed, whichincreases the plausibility of our model. Figure 3.2 contains the hypotheticalDAGs we constructed.

3.4 Statistical Analyses

To compute the edge weights, we used linear, logistic or Cox proportionalhazards regression depending on the dependent variable. For the outcomevariable (ESRD), Cox regression was utilised, while binary dependent vari-ables required logistic regression. All other edges were computed by meansof linear regression. Like with the independency tests, it is important to usethe correct method for these estimations and their significance with imputeddatasets. We employed the ‘MI Boot’ method described by Schomaker andHeumann [55] in which the missing data is first imputed and after this, eachimputed dataset is bootstrapped, where all regressions are performed oneach bootstrap sample. For correctly pooling the results, a manual imple-mentation of Rubin’s rules was built. We generated 34 imputed datasets

30

Page 34: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

and used 5,000 bootstrap samples for each imputed dataset. The script forthese estimations can be found in appendix E and the results are shown infigures 3.1 and 3.2.

From these figures, it is clear that not all hypothesised edges are sig-nificant, yet leaving them out resulted in invalidation of the implications.This indicates that there are other mechanisms through which the effects ofthe variables are established. Because of the limited time available for thisresearch, we were not able to find the optimal DAGs in which all edges aresignificant. Despite this limitation, mediation analysis could still be per-formed as the DAGs do show at least a limited direct or indirect effect ofeach of the baseline variables on the occurrence of ESRD. Since the methodwe use for the final mediation analyses includes the baseline values of bloodpressure, proteinuria, ACE inhibitor or ARB use, and sodium excretion aspossible confounders for each of the four analyses, the insignificance of someof the edges in the aforementioned figures is not detrimental to the validityof the results of the analyses. This insignificance does, however, show thatour hypotheses are not completely correct.

31

Page 35: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

ACEI ARB T0

BP T0

BP T1

ESRD

Intervention

PU T0

SE T0

eGFR T0

0.00 (-0.33,0.34)

0.34(0.21,0.46)

-0.17 (-0.41,0.08)

0.34 (0.22,0.46)

-0.04 (-0.17,0.09)

-0.52 (-0.67,-0.37)

2.12 (-0.93,5.16)

11.71 (10.28,13.14)

-2.32

(-4.69

,0.05

)

1.24 (-0.07,2.55)

-1.06 (-2.19,0.07)

-0.89

(-2.14

,0.36

)

(a) Blood pressure

ACEI ARB T0

BP T0

PU T1

ESRD

Intervention

PU T0

SE T0

eGFR T0

-0.09 (-0.40,0.22)

0.32(0.23,0.41)

-0.26 (-0.51,-0.01)

-0.06 (-0.19,0.07)

-0.58 (-0.74,-0.42)

0.08 (-0.03,0.18)

-0.07

(-0.22

,0.08

)

0.91 (0.68,1.14)

-0.05

(-0.12

,0.01

)

0.28 (0.16,0.40)

0.19 (0.03,0.36)

(b) Proteinuria

Figure 3.2: The DAGs we constructed in the context of this thesis. In ordertop to bottom, the DAGs contain the blood pressure, proteinuria, ACEinhibitor or ARB use, and sodium excretion treatment goals as mediators.The red arrows indicate the indirect effect, while the blue arrows indicate thedirect effect. The numbers with two digits are the estimated edge weightsand the numbers between brackets are the lower and upper bounds of the95% confidence interval. The estimates of the edges in the base DAG canbe found in figure 3.1

32

Page 36: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

ACEI ARB T0

BP T0

ACEI ARB T1

ESRD

Intervention

PU T0

SE T0

eGFR T0

-0.08(-0.49,0.33)

-0.24 (-0.49,0.02)-0.54 (-0.69,-0.39)

1.16

(0.45

,1.87

)

0.34 (0.21,0.46)

0.29 (0.17,0.41)

-0.07 (-0.20,0.06)

4.02 (3.13,4.91)

0.23 (-0.16,0.63)

0.41 (-0.07,0.88)

(c) ACE inhibitor or ARB use

ACEI ARB T0

BP T0

SE T1

ESRD

Intervention

PU T0

SE T0

eGFR T0

1.20

(-7.42

,9.82

)

0.03 (-0.31,0.36)

-0.21(-0.36,-0.06)

-0.25 (-0.50,-0.004)

0.29 (0.17,0.41)

0.36 (0.24,0.48)

-0.53 (-0.68,-0.38)

27.6 (21.4,33.8)

(d) Sodium excretion

Figure 3.2: (cont.) The DAGs we constructed in the context of this thesis.In order from top to bottom, the DAGs contain the blood pressure, protein-uria, ACE inhibitor or ARB use, and sodium excretion treatment goals asmediators. The red arrows indicate the indirect effect, while the blue arrowsindicate the direct effect. The numbers with two digits are the estimatededge weights and the numbers between brackets are the lower and upperbounds of the 95% confidence interval. The estimates of the edges in thebase DAG can be found in figure 3.1

33

Page 37: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Chapter 4

Mediation Analysis

The final part of our analysis exists of the computation of the direct andindividual indirect effects of the nurse practitioner intervention on CKDpatient survival time. As explained before, using the potential outcomes(PO) framework, mediation analysis can be reduced to a multiple regressionproblem utilising imputation.

To obtain the size and significance of the indirect, direct and total effectsof each DAG, we followed the method of Lange et al. [28] applied to the 34imputed datasets. This means that each of the 34 imputed datasets wasbootstrapped 1,000 times and for each bootstrap sample, the counterfactualoutcomes and survival times were imputed. Like the other analyses on theseimputed datasets, the results should be pooled in accordance with Rubin’srules. As mentioned before, the script for this process is based on the codewritten by Lange et al. [28] and can be found in appendix F. The result ofthe mediation analyses of the individual DAGs can be found in 4.1. We nowshortly discuss our observations of this end result.

First, blood pressure does not appear to have a significant mediating role(HR 0.95, 95% CI 0.78, 1.17) on the impact of the intervention, which mayseem unlikely. However, this could be explained by the fact that all doctorswere informed of the treatment goals, thus adherence increased in boththe treatment and control group (see table 3.2). Despite this insignificantmediating effect, it may still be beneficial to pursue adherence to the bloodpressure treatment goal, as high blood pressure has been closely related tocardiovascular events and CKD progression [23].

Besides this, proteinuria, does not act as a significant mediator either(HR 0.99, 95% CI 0.81, 1.22). This might be explained by the fact that manypatients already met the relevant treatment goal at the start of the trial,and by the small difference between the treatment and intervention groupafter one year of follow-up (see table 3.2) [72]. However, similar to bloodpressure, increased effort to decrease proteinuria could still be valuable toCKD patients, as it is strongly associated with CKD progression [22] and

34

Page 38: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

HR95% CI

Lower limit Upper limit

Effect

Natural indirect 0.95 0.78 1.17

Natural direct 0.82 0.67 1.01

Total 0.79 0.57 1.09

Mediated proportion 0.35 -1.92 2.62

(a) Blood pressure

HR95% CI

Lower limit Upper limit

Effect

Natural indirect 0.99 0.81 1.22

Natural direct 0.79 0.65 0.97

Total 0.79 0.57 1.09

Mediated proportion 0.11 -1.96 2.18

(b) Proteinuria

Table 4.1: Results of the mediation analysis for each of the four individualDAGs. In order from left to right, the mediators are the blood pressure,proteinuria, ACE inhibitor or ARB use, and sodium excretion treatmentgoals.HR = hazard ratioCI = confidence interval

even plays an active role in this process [10].Closely related to proteinuria and blood pressure management is ACE

inhibitor or ARB use. However, the mediation analysis does not show asignificant effect (HR 0.98, 95% CI 0.81, 1.20) for this treatment. Thisabsence of an effect may again be due to the design of the MASTERPLANstudy, as all doctors were made aware of the recommendation to prescribethese medications, which resulted in a significant increase in usage in bothtreatment groups, such that almost every patient received a prescription(see table 3.2). Thus, it may still be beneficial to recommend ACE-inhibitorand ARB usage to CKD patients, as the difference between the groups wassmall but nearly all patients used them.

Like the other treatment goals discussed above, salt intake, or sodiumexcretion, is an insignificant mediator (HR 1.01, 95% CI 0.83, 1.23) of theeffect of the nurse practitioner intervention on the survival time, and unlikethe others has an average HR of above 1. This could have been anticipated,

35

Page 39: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

HR95% CI

Lower limit Upper limit

Effect

Natural indirect 0.98 0.81 1.20

Natural direct 0.80 0.66 0.98

Total 0.79 0.57 1.09

Mediated proportion 0.14 -1.92 2.21

(c) ACE inhibitor or ARB use

HR95% CI

Lower limit Upper limit

Effect

Natural indirect 1.01 0.83 1.23

Natural direct 0.78 0.64 0.96

Total 0.79 0.57 1.09

Mediated proportion 0.02 -2.00 2.05

(d) Sodium excretion

Table 4.1: (cont.) Results of the mediation analysis for each of the fourindividual DAGs. In order from left to right, the mediators are the bloodpressure, proteinuria, ACE inhibitor or ARB use, and sodium excretiontreatment goals.HR = hazard ratioCI = confidence interval

as there was barely any reduction, or rather even an increase in the measuredsodium excretion in both treatment groups (see table 3.2). This absence ofchange may be caused by the great difficulty found in restricting dietarysodium intake and lack of appropriate support strategies [37]. However,research has indicated the severe effect of excessive salt consumption, whichmeans that it may be worthwhile to find alternative, more effective coachingmethods to reduce salt intake [24].

Of course, there are more treatment goals and guidelines than thesefour, which are aggregated into the direct effect in each graph. Becausethese direct effects may include mediators that interact with the explicitlymodelled mediator, it is not possible to conclude anything on these treatmentgoals.

Each subtable of table 4.1 not only shows the effects, but also the me-diated proportion of the outcome by the individual mediators. Normally,these would be summable, resulting in a mediated proportion of 0.6 by the

36

Page 40: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

blood pressure, proteinuria, and ACE inhibitor or ARB use treatment goals.However, as described in chapter 2, there is interaction between these treat-ment goals, which means that the mediated proportions found are higherthan they are in reality.

Finally, the total effect of the nurse practitioner intervention does notappear to be significant either (HR 0.79, 95% CI 0.58, 1.08), which doesnot agree with the results of Peeters et al. [47], who found a HR for theintervention on renal outcomes of 0.80 (95% CI 0.66, 0.98), while Zuilenet al. [75] found a HR of 0.83 (95% CI 0.57, 1.20) for the intervention on theincidence of ESRD, though this was found with data of a shorter follow-upperiod of 4.6 years, as opposed to a median follow-up of 5.7 years. Besidesthis, we limited our dataset to only include patients with stage 3-4 GFR andadjusted for baseline systolic blood pressure, proteinuria, ACE inhibitor orARB use, sodium excretion, and eGFR, while Peeters et al. [47] did notfilter the dataset and only adjusted for baseline serum creatinine. Anotherdissimilarity that could explain the difference in results is the fact that weused multiple imputation to handle missing data and utilised mediationanalysis, while the two previously mentioned papers do not employ eitherof these methods. Further research based on improved models should beperformed to analyse these differences.

37

Page 41: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Chapter 5

Discussion

In this chapter, we will first discuss the limitations of the data and methodsused in this thesis. After this we will describe previous research related toours and demonstrate that this thesis has resulted in new knowledge in thefield of CKD.

5.1 Limitations

Mediation analysis relies on the assumption of a complete model. Any un-measured or unaccounted for confounders will decrease the accuracy of amodel. Despite our efforts to increase the plausibility of our model, it isstill possible that our model is incomplete. The method we used for theconstruction of the model was chosen to allow the completion of this thesiswithin the given time frame. Another obstacle in the use of DAGs is thatthe functional relationship of the nodes is unclear from a DAG alone. Forexample, moderation is not modelled separately, it is shown as two parentvariables having a separate relation (arrow) with the common child node.This makes it harder to interpret results in the applied context. Further-more, we limited our DAGs to only include one mediator, which means wewere not able to capture the relationships between these mediators. Finally,because of the method used for construction of the models, the results in-cluded some edges that were insignificant. Despite this, it was still possibleto perform the mediation analyses, as it was clear that all baseline valuesof the four treatment goals were relevant to the survival time, directly orindirectly.

Another limitation of our analyses is our assumption of linear dose-response relationships, which was chosen to simplify the calculations, whilein reality, far from all relationships are linear. Finally, we did not account formeasurement errors. Besides confounders that are not accounted for, mea-surement errors in confounding variables can result in residual confounding,disturbing the true outcomes of the mediation analysis [27].

38

Page 42: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

There are also some limitations to the MASTERPLAN dataset thatwas used for our analysis. Because patients were selected from nine Dutchhospitals, the population was mostly Dutch and Caucasian. Since multiplevariables in our analysis are affected by race, this decreases its generalization[22]. Besides this, the population was also relatively young and healthy[47]. In addition, the trial took place in nine different hospitals, which mayhave given rise to selection bias. More importantly, many patients alreadymet some goals, like ACE inhibitor or ARB use, at the start of the trial,which may have reduced the potential of the nurse practitioner intervention.Finally, there was a significant contamination bias, which means that allpractitioners were informed about the treatment goals, thus the differencein treatment between the two groups was decreased, which can be seen, forexample in the significant increase in ACE inhibitor and ARB prescriptionafter the first visit in both groups, as opposed to just the intervention group.[72].

The field of mediation analysis is developing rapidly and increasinglymore scientists use this method in their research. In this paper, we havetried to use proper methodology and avoid common issues as described byMemon et al. [36]. As a first step, we made an effort to lay a good foun-dation of knowledge about mediation analysis and build our model basedon previously established relationships or at least hypotheses. The researchquestion was constructed using the PICOT format, which allowed us to for-mulate the mediation hypotheses in a clinical context. Baron and Kenny’s(1986) approach, which has been heavily critiqued, was avoided and instead,Hayes [17] was consulted. This also means that presence of mediation andtherefore the continuation of the analysis was not based on the presenceof a direct effect. Besides this, we also refrained from labelling the medi-ation as ”partial” or ”complete” and inspected the specific indirect effectsinstead of the total indirect effect. Finally, we inspected both the two-tailedp-value and confidence intervals and utilized bootstrapping as recommendedin Hayes [17].

5.2 Related Work

In 2002, NKF-KDOQI developed a conceptual model of CKD, which wasrevised and adapted by KDIGO in 2005, shown in figure 5.1. Some arrowsare bidirectional because progression of CKD does not transpire in all pa-tients and early interventions may prevent progression. The left side of thearrows is dotted to point out that remission is less frequent than progression[30].

This conceptual model is illustrative on a conceptual level, but does notlend itself to effect computations like our models do.

In [75], the effect of the nurse practitioner intervention on a composite

39

Page 43: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Figure 5.1: Conceptual model of chronic kidney disease (CKD). Overall,this diagram presents the continuum of the development, progression, andcomplications of CKD. Green circles, stages of CKD; aqua circles, potentialantecedents of CKD; lavender circles, consequences of CKD; thick arrowsbetween ellipses, risk factors associated with the development and progres-sion and remission of CKD. “Complications” refers to all complications ofCKD and its treatment, including complications of decreased glomerularfiltration rate (GFR) and cardiovascular disease. Stages of prevention areshown along the continuum. Modified and reprinted with permission fromthe National Kidney Foundation. [30].

cardiovascular endpoint was analysed using the MASTERPLAN data. Theeffect of the intervention on the treatment goals and on the renal outcomeswas evaluated. However, the effect of the intervention through the individualtreatment was not calculated. Furthermore, the effect of the intervention onthe secondary outcome of end-stage renal disease (ESRD) after a follow-upof 2 years was found to be insignificant.

This somewhat contradicts the conclusion of Peeters et al. [47] where theeffect of the intervention on a composite renal endpoint including ESRD,death and 50% decrease in serum creatinine using the MASTERPLAN dataas well. This paper reports that the effect of the intervention on this com-posite endpoint and the individual parts except for death after 2 years offollow-up is significant. Like the previous paper, this one only examined thetotal effect of the nurse practitioner intervention, not the effects mediatedby the different treatment goals.

Two other trials were conducted with the same type of intervention forCKD patients, one in Canada and another in The Netherlands. Both trialswere of shorter duration (24 and 12 months, respectively) than the MAS-

40

Page 44: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

TERPLAN trial and also had different outcomes from the previously men-tioned trials. The Canadian trial found little to no improvement in treatmentgoals or eGFR decrease between the two treatment groups after two years,while the Dutch trial found a decrease in systolic blood pressure after justone year. Furthermore, both papers included analysis of the effect of theintervention on the individual treatment goals, the analysis of the Canadiantrial included estimation of the effect of the intervention on eGFR declineand neither included the effect on an endpoint [3, 54].

In summary, the MASTERPLAN dataset has been analysed before toinvestigate the effect of the addition of a nurse practitioner to the care teamfor CKD patients. Specifically the effect of this intervention on 11 differenttreatment goals and on a composite renal or cardiovascular endpoint. How-ever, this total effect on the endpoints has not yet been split into the effectthrough the individual treatment goals. This thesis presents a first effort todo so on a subset of these goals, together with four models incorporatingthis subset, which has also not been done before, to our knowledge.

41

Page 45: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Chapter 6

Conclusions

Chronic kidney disease (CKD) is a heavy burden on patients and is verycomplex. Sadly, many questions still remain about its underlying mecha-nisms and the best treatment to prevent disease progression. Previous trialshave shown that the addition of a nurse practitioner to the care team ofCKD patients to enhance guideline adherence may improve quality of lifeand even renal outcomes. The MASTERPLAN trial is such a trial where788 Dutch CKD patients were examined over a median follow-up time of 5.7years while they were being treated with several treatment goals and guide-lines under standard supervision or with the addition of a nurse practitionerto their care team. However, no analysis has been done yet to analysethrough which of the treatment goals this intervention may exert its effect.

In this thesis we answer the research question

In patients with CKD, does increased effort to eliminate highblood pressure, decrease proteinuria, increase ACE inhibitor orARB usage, and lessen salt intake by means of coaching by anurse practitioner have a significant positive impact on the longterm progression of kidney function?

To answer this question, we performed a mediation analysis on the MAS-TERLAN data, limited to patients with stage 3-4 estimated glomerular fil-tration rate (eGFR). Patients who reached the treatment goals of

� blood pressure 130/85 mm Hg or lower,

� proteinuria 0.5 g per day or lower,

� ACE inhibitor or ARB e.g. enalapril 5 mg twice daily (or comparabledose of other ACE inhibitor) or irbesartan 75-150 mg (or comparabledose of other ARB) daily, and

� sodium intake 2g per day (or sodium excretion 100 mmol per day) orlower

42

Page 46: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

within one year were compared to patients who did not reach this goal withinone year by means of survival time until a composite renal endpoint. Thisrenal endpoint consists of dialysis, transplant, 50% decrease of eGFR relativeto baseline, and death. It is important to note that our mediation analysisis limited by the mediation graphs we constructed within the context of thisthesis. Because they are partially based on hypotheses that have not yetbeen verified, they may be incorrect, which could reduce the validity of ourresults.

From our mediation analysis we conclude that the effect of the nursepractitioner intervention is not mediated by the aforementioned four treat-ment goals. However, this does not mean that these goals do not improvesurvival time of CKD patients. Their mediating effect may have been re-duced by pre-existing good adherence for the ACE inhibitor or ARB usegoal, and general difficulty in achieving it for the salt intake goal. Besidesthis, contamination bias was present, which means that all doctors were in-formed of the treatment goals, which led to increase in adherence for someof the goals in both treatment groups. Furthermore, the effect of the inter-vention on the survival time of CKD patients itself is insignificant as well,which is not in line with previous analyses of the dataset. These analyses,however, were performed without imputation of incomplete data, used adifferent renal endpoint, and did not employ mediation analysis. Despitethese differences, our results should be interpreted with caution and furtheranalysis, preferably based on an improved model, should be performed toexplain these differences. We conclude that the nurse practitioner interven-tion is not effective in improving renal outcome in CKD patients with stage3-4 eGFR, although the treatment goals, both discussed and not discussedin this thesis, should not be neglected, despite the outcome of our analysis.

In practice, these conclusions indicate that nurse practitioners shouldnot focus on blood pressure, proteinuria, ACE inhibitor or ARB use, or saltintake of CKD patients, as their effect is not mediated significantly by thetreatment goals relevant to these variables. Furthermore, the addition of anurse practitioner to the care teams of CKD patients will not improve renaloutcome.

6.1 Future Work

For this thesis, we limited our models to only include variables relevant toblood pressure, proteinuria, ACE inhibitor or ARB use, and sodium ex-cretion in the context of CKD progression. In reality, there are far morevariables of influence in this context. This means that more expansive mod-els could provide more insight into the mediation of CKD progression.

Furthermore, for the construction of our models we utilised the ‘guessing-and-testing’ method, which led to the modelling of some insignificant rela-

43

Page 47: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

tionships. Another method to construct such a graph utilises discoveryalgorithms, where a DAG is constructed with actual data, while the exactrelationships are still unknown, which could have resulted in more accuratemodels [56]. Thus, with future efforts, improved models could be madeby adding more variables and constructing it from data with an algorithminstead of merely checking it with data.

Besides this, we divided the DAG into separate DAGs, each with onemediator, to keep the mediation analysis within the scope of this thesis. Bybuilding a model that includes all mediators and their interactions, moreinsight could be gained into their relationships. Finally, we assumed lineardose-response relationships to simplify the analyses. By employing non-linear models to more closely mimic the true relationships of the variables,the accuracy of the results of further analyses could be improved further.

44

Page 48: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Bibliography

[1] Mauro Abbate, Carla Zoja, and Giuseppe Remuzzi.“How Does Proteinuria Cause Progressive Renal Damage?”In: Journal of the American Society of Nephrology 17.11 (Nov. 1,2006). Publisher: American Society of Nephrology Section: Frontiersin Nephrology, pp. 2974–2984. issn: 1046-6673, 1533-3450.doi: 10.1681/ASN.2006040377.url: http://jasn.asnjournals.org/content/17/11/2974 (visitedon 03/15/2020).

[2] Baptiste Auguie.gridExtra: Miscellaneous Functions for ”Grid” Graphics. 2017.url: https://CRAN.R-project.org/package=gridExtra.

[3] Brendan J. Barrett et al. “A Nurse-coordinated Model of Care versusUsual Care for Stage 3/4 Chronic Kidney Disease in the Community:A Randomized Controlled Trial”.In: Clinical Journal of the American Society of Nephrology : CJASN6.6 (June 2011), pp. 1241–1247. issn: 1555-9041.doi: 10.2215/CJN.07160810.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3109918/(visited on 05/17/2020).

[4] Jeffrey S Berns. Patient education: Chronic kidney disease (Beyondthe Basics). UpToDate. Aug. 23, 2019.url: https://www.uptodate.com/contents/chronic-kidney-disease-beyond-the-basics (visited on 05/11/2020).

[5] C. S. Burrus. Iterative Reweighted Least Squares.url: https://cnx.org/contents/krkDdys0@12/Iterative-Reweighted-Least-Squares (visited on 05/21/2020).

[6] Stef van Buuren. Flexible Imputation of Missing Data.Google-Books-ID: elDNBQAAQBAJ. CRC Press, Mar. 29, 2012.326 pp. isbn: 978-1-4398-6825-6.

[7] Stef van Buuren and Karin Groothuis-Oudshoorn.“mice: Multivariate Imputation by Chained Equations in R”.

45

Page 49: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

In: Journal of Statistical Software 45.3 (2011), pp. 1–67.url: https://www.jstatsoft.org/v45/i03/.

[8] D. R. Cox and D. Oakes. Analysis of Survival Data.Chapman and Hall/CRC, 1984. isbn: 978-1-315-13743-8.doi: 10.1201/9781315137438.url: http://www.taylorfrancis.com/books/9781315137438(visited on 05/28/2020).

[9] Paolo Cravedi and Giuseppe Remuzzi.“Pathophysiology of proteinuria and its value as an outcome measurein chronic kidney disease”. In: British Journal of ClinicalPharmacology 76.4 (Oct. 2013), pp. 516–523. issn: 0306-5251.doi: 10.1111/bcp.12104.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3791975/(visited on 03/15/2020).

[10] Paolo Cravedi, Piero Ruggenenti, and Giuseppe Remuzzi.“Proteinuria should be used as a surrogate in CKD”.In: Nature Reviews Nephrology 8.5 (May 2012). Number: 5 Publisher:Nature Publishing Group, pp. 301–306. issn: 1759-507X.doi: 10.1038/nrneph.2012.42.url: http://www.nature.com/articles/nrneph.2012.42 (visitedon 05/26/2020).

[11] A. C. Davison and D. V. Hinkley.Bootstrap Methods and Their Applications.Cambridge: Cambridge University Press, 1997. isbn: 0-521-57391-2.url: http://statwww.epfl.ch/davison/BMA/.

[12] G. M. Fitzmaurice and N. M. Laird.“Multivariate Analysis: Discrete Variables (Logistic Regression)”.In: International Encyclopedia of the Social & Behavioral Sciences.Ed. by Neil J. Smelser and Paul B. Baltes.Oxford: Pergamon, Jan. 1, 2001, pp. 10221–10228.isbn: 978-0-08-043076-8. doi: 10.1016/B0-08-043076-7/00476-9.url: http://www.sciencedirect.com/science/article/pii/B0080430767004769 (visited on 05/21/2020).

[13] John Fox and Sanford Weisberg.“Bootstrapping Regression Models in R”. In: (Sept. 21, 2018), p. 18.

[14] David Y. Gaitonde, David L. Cook, and Ian M. Rivera.“Chronic Kidney Disease: Detection and Evaluation”.In: American Family Physician 96.12 (Dec. 15, 2017), pp. 776–783.issn: 0002-838X, 1532-0650.url: http://www.aafp.org/afp/2017/1215/p776.html (visited on03/07/2020).

46

Page 50: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[15] W. J. C. de Grauw et al.“Landelijke Transmurale Afspraak Chronische nierschade”.In: Huisarts en Wetenschap 52.12 (Dec. 1, 2009), pp. 586–595.issn: 1876-5912. doi: 10.1007/BF03085802. url:https://doi.org/10.1007/BF03085802 (visited on 03/08/2020).

[16] Garrett Grolemund and Hadley Wickham.“Dates and Times Made Easy with lubridate”.In: Journal of Statistical Software 40.3 (2011), pp. 1–25.url: http://www.jstatsoft.org/v40/i03/.

[17] A.F. Hayes. Introduction to Mediation, Moderation, and ConditionalProcess Analysis: A Regression-Based Approach. 2nd ed.Methodology in the Social Sciences.New York: Guilford Publications, 2018. 691 pp.isbn: 978-1-4625-1127-3.url: https://books.google.com/books?id=8YX2QwGgD8AC.

[18] James Honaker, Gary King, and Matthew Blackwell.“Amelia II: A Program for Missing Data”.In: Journal of Statistical Software 45.7 (2011), pp. 1–47.url: http://www.jstatsoft.org/v45/i07/.

[19] Zakaria Jaadi. When and Why to Standardize Your Data? Built In.Library Catalog: builtin.com. Apr. 9, 2019.url: https://builtin.com/data-science/when-and-why-standardize-your-data (visited on 06/12/2020).

[20] Kitty J. Jager et al.“A single number for advocacy and communication—worldwide morethan 850 million individuals have kidney diseases”.In: Kidney International 96.5 (Nov. 1, 2019), pp. 1048–1050.issn: 0085-2538. doi: 10.1016/j.kint.2019.07.012.url: http://www.sciencedirect.com/science/article/pii/S0085253819307860 (visited on 06/08/2020).

[21] Alboukadel Kassambara, Marcin Kosinski, and Przemyslaw Biecek.survminer: Drawing Survival Curves using ’ggplot2’. 2019.url: https://CRAN.R-project.org/package=survminer.

[22] Kidney Disease: Improving Global Outcomes (KDIGO) CKD WorkGroup. “KDIGO 2012 Clinical Practice Guideline for the Evaluationand Management of Chronic Kidney Disease”.In: Kidney International Supplements 3 (Jan. 1, 2013), pp. 1–150.url: https://kdigo.org/wp-content/uploads/2017/02/KDIGO_2012_CKD_GL.pdf.

47

Page 51: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[23] Kidney Disease: Improving Global Outcomes (KDIGO) CKD WorkGroup. “KDIGO Clinical Practice Guideline for the Management ofBlood Pressure in Chronic Kidney Disease”. In: 2.5 (Dec. 2012).url: https://kdigo.org/wp-content/uploads/2016/10/KDIGO-2012-Blood-Pressure-Guideline-English.pdf.

[24] Hoseok Koo et al.“The ratio of urinary sodium and potassium and chronic kidneydisease progression: Results from the KoreaN Cohort Study forOutcomes in Patients with Chronic Kidney Disease (KNOW-CKD)”.In: Medicine 97.44 (Nov. 2018), e12820. issn: 0025-7974.doi: 10.1097/MD.0000000000012820. url:http://journals.lww.com/md-journal/FullText/2018/11020/

The_ratio_of_urinary_sodium_and_potassium_and.13.aspx

(visited on 05/17/2020).

[25] Aysun Aybal Kutlugun et al. “Daily Sodium Intake in ChronicKidney Disease Patients during Nephrology Clinic Follow-Up: AnObservational Study with 24-Hour Urine Sodium Measurement”.In: Nephron Clinical Practice 118.4 (2011). Publisher: KargerPublishers, pp. c361–c366. issn: , 1660-2110.doi: 10.1159/000323392.url: http://www.karger.com/Article/FullText/323392 (visitedon 03/06/2020).

[26] Wayne W. LaMorte. Cox Proportional Hazards Regression Analysis.June 3, 2016.url: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/BS704_Survival6.html#headingtaglink_1 (visited on05/21/2020).

[27] Wayne W. LaMorte. Residual Confounding, Confounding byIndication, & Reverse Causality. June 3, 2016.url: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_Confounding-EM/BS704-EP713_Confounding-EM4.html

(visited on 06/10/2020).

[28] Theis Lange et al.“Applied mediation analyses: a review and tutorial”.In: Epidemiology and Health 39 (Aug. 6, 2017). issn: 2092-7193.doi: 10.4178/epih.e2017035.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5723912/(visited on 03/29/2020).

[29] Eugene Lengerich.Lesson 13: Proportional Hazards Regression — STAT 507. 2018.url: https://online.stat.psu.edu/stat507/node/81/ (visitedon 04/04/2020).

48

Page 52: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[30] Andrew S. Levey, Lesley A. Stevens, and Josef Coresh.“Conceptual Model of CKD: Applications and Implications”.In: American Journal of Kidney Diseases. Comprehensive PublicHealth Strategies for Preventing the Development, Progression, andComplications of Chronic Kidney Disease 53.3 (Mar. 1, 2009),S4–S16. issn: 0272-6386. doi: 10.1053/j.ajkd.2008.07.048.url: http://www.sciencedirect.com/science/article/pii/S0272638608017186 (visited on 03/07/2020).

[31] Andrew S. Levey et al.“National Kidney Foundation Practice Guidelines for ChronicKidney Disease: Evaluation, Classification, and Stratification”.In: Annals of Internal Medicine 139.2 (July 15, 2003), p. 137.issn: 0003-4819.doi: 10.7326/0003-4819-139-2-200307150-00013.url: http://annals.org/article.aspx?doi=10.7326/0003-4819-139-2-200307150-00013 (visited on 03/06/2020).

[32] Paul Madley-Dowd et al. “The proportion of missing data should notbe used to guide decisions on multiple imputation”.In: Journal of Clinical Epidemiology 110 (June 1, 2019), pp. 63–73.issn: 0895-4356. doi: 10.1016/j.jclinepi.2019.02.016.url: http://www.sciencedirect.com/science/article/pii/S0895435618308710 (visited on 05/18/2020).

[33] Salima R. S. Al-Maqbali and Waad-Allah S. Mula-Abed.“Comparison between Three Different Equations for the Estimationof Glomerular Filtration Rate in Omani Patients with Type 2Diabetes Mellitus”. In: Sultan Qaboos University Medical Journal14.2 (May 2014), e197–e203. issn: 2075-051X.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997536/(visited on 05/11/2020).

[34] Andrea Marshall et al.“Combining estimates of interest in prognostic modelling studiesafter multiple imputation: current practice and guidelines”.In: BMC Medical Research Methodology 9 (July 28, 2009), p. 57.issn: 1471-2288. doi: 10.1186/1471-2288-9-57.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2727536/(visited on 05/22/2020).

[35] Geraldine McCrory et al.“The impact of advanced nurse practitioners on patient outcomes inchronic kidney disease: A systematic review”.In: Journal of Renal Care 44.4 (2018). eprint:https://onlinelibrary.wiley.com/doi/pdf/10.1111/jorc.12245,pp. 197–209. issn: 1755-6686. doi: 10.1111/jorc.12245. url:

49

Page 53: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

http://onlinelibrary.wiley.com/doi/abs/10.1111/jorc.12245

(visited on 03/06/2020).

[36] Mumtaz Memon et al.“Mediation Analysis: Issues and Recommendations”. In: Journal ofApplied Structural Equation Modeling 2 (Jan. 1, 2018), pp. i–ix.

[37] Yvette Meuleman et al. “Perceived Barriers and Support Strategiesfor Reducing Sodium Intake in Patients with Chronic KidneyDisease: a Qualitative Study”. In: International Journal ofBehavioral Medicine 22.4 (Aug. 1, 2015), pp. 530–539.issn: 1532-7558. doi: 10.1007/s12529-014-9447-x.url: https://doi.org/10.1007/s12529-014-9447-x (visited on03/06/2020).

[38] Milica Miocevic et al.“A Tutorial in Bayesian Potential Outcomes Mediation Analysis”.In: Structural Equation Modeling: A Multidisciplinary Journal 25.1(Jan. 2, 2018). Publisher: Routledge eprint:https://doi.org/10.1080/10705511.2017.1342541, pp. 121–136.issn: 1070-5511. doi: 10.1080/10705511.2017.1342541.url: https://doi.org/10.1080/10705511.2017.1342541 (visitedon 05/16/2020).

[39] National Kidney Foundation.Facts About Chronic Kidney Disease. National Kidney Foundation.Library Catalog: www.kidney.org. Feb. 15, 2017.url: https://www.kidney.org/atoz/content/about-chronic-kidney-disease (visited on 03/12/2020).

[40] National Kidney Foundation. GFR. National Kidney Foundation.Library Catalog: www.kidney.org. Oct. 21, 2014.url: https://www.kidney.org/kidneydisease/siemens_hcp_gfr(visited on 03/06/2020).

[41] National Kidney Foundation.What is Kidney Failure? National Kidney Foundation.Library Catalog: www.kidney.org. Jan. 7, 2016.url: https://www.kidney.org/atoz/content/KidneyFailure(visited on 03/07/2020).

[42] Juan F. Navarro-Gonzalez et al. “Inflammatory molecules andpathways in the pathogenesis of diabetic nephropathy”.In: Nature Reviews Nephrology 7.6 (June 2011). Number: 6Publisher: Nature Publishing Group, pp. 327–340. issn: 1759-507X.doi: 10.1038/nrneph.2011.51.url: http://www.nature.com/articles/nrneph.2011.51 (visitedon 03/15/2020).

50

Page 54: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[43] Nederlands Huisartsen Genootschap and Nederlandse InternistenVereniging. “Richtlijn diagnostiek en beleid bij volwassenen metchronische nierschade (CNS)”.In: Federatie Medisch Specialisten (Oct. 2017), p. 139.

[44] NHG-werkgroep Chronische nierschade.“NHG-Standaard Chronische Nierschade”.In: Huisarts en Wetenschap 61.4 (Apr. 2018), pp. 1–58.issn: 0018-7070. (Visited on 03/06/2020).

[45] Jeroen Ooms. Export Data Frames to Excel ’xlsx’ Format. 2019.url: https://CRAN.R-project.org/package=writexl.

[46] Neil Pearce and Debbie A. Lawlor.“Causal inference—so much more than statistics”.In: International Journal of Epidemiology 45.6 (Dec. 1, 2016).Publisher: Oxford Academic, pp. 1895–1903. issn: 0300-5771.doi: 10.1093/ije/dyw328.url: http://academic.oup.com/ije/article/45/6/1895/2999350(visited on 03/13/2020).

[47] Mieke J. Peeters et al. “Nurse Practitioner Care Improves RenalOutcome in Patients with CKD”. In: Journal of the AmericanSociety of Nephrology 25.2 (Feb. 1, 2014). Publisher: AmericanSociety of Nephrology Section: Clinical Research, pp. 390–398.issn: 1046-6673, 1533-3450. doi: 10.1681/ASN.2012121222.url: http://jasn.asnjournals.org/content/25/2/390 (visitedon 03/06/2020).

[48] Rik Pieters. “Meaningful Mediation Analysis: Plausible CausalInference and Informative Communication”.In: Journal of Consumer Research 44.3 (Oct. 1, 2017). Publisher:Oxford Academic, pp. 692–716. issn: 0093-5301.doi: 10.1093/jcr/ucx081.url: http://academic.oup.com/jcr/article/44/3/692/3978091(visited on 03/13/2020).

[49] Jean-Baptist du Prel et al. “Confidence Interval or P-Value?” In:Deutsches Arzteblatt International 106.19 (May 2009), pp. 335–339.issn: 1866-0452. doi: 10.3238/arztebl.2009.0335.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689604/(visited on 05/16/2020).

[50] R Core Team. foreign: Read Data Stored by Minitab, S, SAS, SPSS,Stata, Systat, Weka, dBase, ... 2016.url: https://CRAN.R-project.org/package=foreign.

51

Page 55: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[51] R Core Team.R: A Language and Environment for Statistical Computing.Vienna, Austria, 2014. url: http://www.R-project.org/.

[52] Lorenzo Richiardi, Rino Bellocco, and Daniela Zugna. “Mediationanalysis in epidemiology: methods, interpretation and bias”.In: International Journal of Epidemiology 42.5 (Oct. 1, 2013).Publisher: Oxford Academic, pp. 1511–1519. issn: 0300-5771.doi: 10.1093/ije/dyt127.url: https://academic.oup.com/ije/article/42/5/1511/619987(visited on 03/06/2020).

[53] Donald B. Rubin. “Inference and Missing Data”.In: Biometrika 63.3 (1976). Publisher: [Oxford University Press,Biometrika Trust], pp. 581–592. issn: 0006-3444.doi: 10.2307/2335739. url:https://www.jstor.org/stable/2335739 (visited on 05/22/2020).

[54] Nynke D Scherpbier-de Haan et al.“Effect of shared care on blood pressure in patients with chronickidney disease: a cluster randomised controlled trial”. In: The BritishJournal of General Practice 63.617 (Dec. 2013), e798–e806.issn: 0960-1643. doi: 10.3399/bjgp13X675386.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839388/(visited on 05/17/2020).

[55] Michael Schomaker and Christian Heumann.“Bootstrap inference when using multiple imputation”.In: Statistics in Medicine 37.14 (2018). eprint:https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.7654,pp. 2252–2266. issn: 1097-0258. doi: 10.1002/sim.7654. url:http://onlinelibrary.wiley.com/doi/abs/10.1002/sim.7654

(visited on 05/22/2020).

[56] Cosma Rohilla Shalizi.Advanced Data Analysis from an Elementary Point of View.Sept. 2019.url: https://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ (visitedon 05/18/2020).

[57] Jonathan A C Sterne et al. “Multiple imputation for missing data inepidemiological and clinical research: potential and pitfalls”.In: The BMJ 338 (June 29, 2009). issn: 0959-8138.doi: 10.1136/bmj.b2393.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714692/(visited on 05/01/2020).

52

Page 56: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[58] Masataka Taguri, John Featherstone, and Jing Cheng. “Causalmediation analysis with multiple causally non-ordered mediators”.In: Statistical Methods in Medical Research 27.1 (Jan. 1, 2018).Publisher: SAGE Publications Ltd STM, pp. 3–19. issn: 0962-2802.doi: 10.1177/0962280215615899.url: https://doi.org/10.1177/0962280215615899 (visited on05/16/2020).

[59] P. W. G. Tennant et al. Advanced Modelling Strategies: Challengesand pitfalls in robust causal inference with observational data.Advanced Modelling Strategies: Challenges and pitfalls in robustcausal inference with observational data.Conference Name: Advanced Modelling Strategies: Challenges andpitfalls in robust causal inference with observational data ISBN:9781527212084 Library Catalog: eprints.whiterose.ac.uk MeetingName: Advanced Modelling Strategies: Challenges and pitfalls inrobust causal inference with observational data Place: LeedsPublisher: Leeds Institute for Data Analytics. July 16, 2017.url: http://eprints.whiterose.ac.uk/119802/ (visited on03/08/2020).

[60] Johannes Textor et al. “Robust causal inference using directedacyclic graphs: the R package ‘dagitty’”.In: International journal of epidemiology 45.6 (2016). Publisher:Oxford University Press, pp. 1887–1894.

[61] The Global Diabetes Community. Proteinuria, Albuminuria,Symptoms, diagnosing, causes, treatments. Diabetes. LibraryCatalog: www.diabetes.co.uk Section: Diabetes Complications.Jan. 15, 2019. url: https://www.diabetes.co.uk/diabetes-complications/proteinuria.html (visited on 03/07/2020).

[62] Terry M. Therneau. A Package for Survival Analysis in S. 2015.url: https://CRAN.R-project.org/package=survival.

[63] Angela C Webster et al. “Chronic Kidney Disease”.In: The Lancet 389.10075 (Mar. 25, 2017), pp. 1238–1252.issn: 0140-6736. doi: 10.1016/S0140-6736(16)32064-5.url: https://doi-org.ru.idm.oclc.org/10.1016/S0140-6736(16)32064-5 (visited on 03/07/2020).

[64] P. M. ter Wee and A. T. M. Jorna. “Behandeling van patienten metchronische nierinsufficientie; richtlijn voor internisten”.In: Nederlands Tijdschrift voor Geneeskunde 148.15 (Apr. 10, 2004),pp. 719–724.

53

Page 57: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[65] S. Weisberg. “Linear Hypothesis: Regression (Basics)”.In: International Encyclopedia of the Social & Behavioral Sciences.Ed. by Neil J. Smelser and Paul B. Baltes.Oxford: Pergamon, Jan. 1, 2001, pp. 8884–8888.isbn: 978-0-08-043076-8. doi: 10.1016/B0-08-043076-7/00454-X.url: http://www.sciencedirect.com/science/article/pii/B008043076700454X (visited on 05/21/2020).

[66] Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis.Springer-Verlag New York, 2016. isbn: 978-3-319-24277-4.url: https://ggplot2.tidyverse.org.

[67] Hadley Wickham et al. dplyr: A Grammar of Data Manipulation.2017. url: https://CRAN.R-project.org/package=dplyr.

[68] Michael Wood. “Bootstrapped Confidence Intervals as an Approachto Statistical Inference”. In: Organizational Research Methods 8.4(Oct. 1, 2005). Publisher: SAGE Publications Inc, pp. 454–470.issn: 1094-4281. doi: 10.1177/1094428105280059.url: https://doi.org/10.1177/1094428105280059 (visited on05/22/2020).

[69] Yihui Xie. Dynamic Documents with R and knitr. 2nd.Boca Raton, Florida: Chapman and Hall/CRC, 2015.url: https://yihui.org/knitr/.

[70] Yihui Xie, J. J. Allaire, and Garrett Grolemund.R Markdown: The Definitive Guide.Boca Raton, Florida: Chapman and Hall/CRC, 2018.url: https://bookdown.org/yihui/rmarkdown.

[71] Haidan Xu, Lisha Mou, and Zhiming Cai.“A nurse-coordinated model of care versus usual care for chronickidney disease: meta-analysis”.In: Journal of Clinical Nursing 26.11 (2017). eprint:https://onlinelibrary.wiley.com/doi/pdf/10.1111/jocn.13533,pp. 1639–1649. issn: 1365-2702. doi: 10.1111/jocn.13533. url:http://onlinelibrary.wiley.com/doi/abs/10.1111/jocn.13533

(visited on 03/06/2020).

[72] A. D. van Zuilen.“Masterplan: multifactorial approach and superior treatment efficacyin renal patients with the aid of nurse practitioners”.Accepted: 2011-10-14T07:54:23Z ISBN: 9789039356463 Publisher:Utrecht University. Dissertation. Universiteit Utrecht, Oct. 27, 2011.216 pp. url: http://localhost/handle/1874/212135 (visited on03/06/2020).

54

Page 58: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

[73] Arjan D van Zuilen et al. “MASTERPLAN: study of the role ofnurse practitioners in a multifactorial intervention to reducecardiovascular risk in chronic kidney disease patients”.In: Journal of Nephrology 21.3 (2008), p. 8.

[74] Arjan D van Zuilen et al.“Multifactorial approach and superior treatment efficacy in renalpatients with the aid of nurse practitioners. Design of TheMASTERPLAN Study [ISRCTN73187232]”.In: Trials 7 (Mar. 30, 2006), p. 8. issn: 1745-6215.doi: 10.1186/1745-6215-7-8.url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1459200/(visited on 03/06/2020).

[75] Arjan D. van Zuilen et al.“Multifactorial intervention with nurse practitioners does not changecardiovascular outcomes in patients with chronic kidney disease”.In: Kidney International 82.6 (Sept. 2, 2012), pp. 710–717.issn: 0085-2538. doi: 10.1038/ki.2012.137.url: http://www.sciencedirect.com/science/article/pii/S0085253815556068 (visited on 03/06/2020).

55

Page 59: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Appendix A

Exploratory Data Analysis

In this appendix, we present the results of the exploratory data analysisperformed on the MASTERPLAN dataset.

56

Page 60: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

AppendixAMASTERPLANExploratoryDataAnalysisTheMASTERPLANdatasetcontainsCKDpatientdatafrom788subjectsofthetrial.393patientsreceivedstandardcare,while395patientsreceivedadditionalsupportfromanursepractitionerinachievingthe11treatmentgoalssetforthetrial.

Tosummarise,thetrialstartedwith788patientsofwhichonlyonepatientwasdiagnosedwithGFRstage1.Afterthe20thvisit,lessthan25%ofpatientswerestillparticipatinginthetrial.About2/3ofthepatientswasmaleandthemajoritywascaucasian.Sex,race,diagnosisofdiabetesandpreviouscardiovasculardiseasewerefairlyequallydistributedoverthecontrolandinterventiongroups.RAS-inhibitorandARBuseincreasedsignificantlyafterthefirstvisitinbothtreatmentgroups.Moreover,thereweretwoextremeBMImeasurements,andsomeuncommonlyloworhigheGFRmeasurements,neitherofwhichcouldbeidentifiedasoutliers.Afewwaistmeasurementswereenteredas0insteadofNA,whichwascorrected.Furthermore,thealbumin-creatinineratiovariancedecreasedextremelyafterthefirstvisit,andonepatientwasfoundtohavepersistentlylowbloodpressure,bothofwhichwerealsonotidentifiedasoutliers.

TreatmentgroupsTheplotDistributionofpatientsoverCKDstagesshowsthatthepatientsofeachGFRstageareaboutequallydistributedoverthecontrolandinterventiongroups.Therearerelativelyfewpatientsinthestage1andstage5categorieswhichmakessense,asthesearetheextremecases.

Instage1,thereareusuallyfewsymptomsastherenalfunctionisonlyslightlyimpaired,whichmeansthisdiagnosisisoftenmissed.Thisdatasetonlycontainsonepatientwithstage1GFR,whichmeansthatgeneralisationsforthisstagewiththisdatasetcouldyieldhighlyinaccurateconclusions.Instage5,thereisalargerchanceofapatientsatisfyingoneoftheexclusioncriteriafortheMASTERPLANlikereceivingarenaltransplantlessthanayearbeforeinclusion.Analysesonthesetwogroupsshouldbedonewithcaution,asthereisonly1patientwithGFRstage1andonly27patientswithGFRstage5.

VisitsdistributionThetrialstartedwith788patients.TheplotNumerofpatientsparticipatingovervisitsshowstheprogressionofpatientdeclinepervisit.Thehorizontallinesindicate75%,50%and25%ofthenumberofpatientsatbaseline.

Page 61: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

AgeLookingattheagedistribution,itisclearthatmostpatientsfallbetween50-70yearsandthedifferencesbetweentheGFRstagesarenotverylarge(seeDistributionofageoverCKDstages).GFRstage1onlycontainsonedatapoint,sothereisnoproperdistribution.

TheriskofCKDincreaseswithdamageofthekidneys.Kidneydamagegrowswithageandsodoconditionslikediabetesandcoronaryheartdisease,whichinturncausedamagetothekidneys.Thisexplainstheagedistribution.Therearesomeoutliersbetween18and32whicharenotunreasonableasCKDcanbepresentatayoungerage,thoughitislesslikely.

GenderTheproportionofmaletofemalepatientsisnearlyequalinbothtreatmentgroups(0.68to0.32forcontrolversus0.67to0.33fortreatment).InspectingthedistributionofsexovertheCKDGFRstages(seeDistributionofpatientsoverCKDstages),therearelesswomenineachcategoryexceptforstage5intheinterventiongroup.

Page 62: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

RaceThemajorityofthetrialpopulationiscaucasian,asthestudywasperformedinDutchhospitals(seeDistributionofraceoverGFRstages).

DiabetesIngeneral,theproportionofdiabeticpatientsisaboutthesameintheinterventionandcontrolgroups(0.25and0.22,respectively),howeverthedistributionovertheGFRstagesdiffersmore(seeDistributionofdiabeticpatientsoverGFRstages).

Page 63: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

CardiovasculardiseaseTheproportionofpatientswithcardiovasculardiseasedifferssomewhatbetweenthetreatmentgroups(0.25inthecontrolgroupand0.33intheinterventiongroup),anddiffersmorebetweenthedifferentGFRstages(seeDistributionofCVDoverGFRstages).

Ingeneral,theproportionofdiabeticpatientsisaboutthesameintheinterventionandcontrolgroups(0.25and0.22,respectively),howeverthedistributionovertheGFRstagesdiffersmore(seeDistributionofdiabeticpatientsoverGFRstages).

CholesterolBecauselow-densitylipoprotein(LDL)isassociatedwithincreasedriskofcardiovasculardisease,whichinturnisassociatedwithCKDprogression,itisimportanttotreathighlevelsofLDLinCKDpatients[3].Thismeansthatalevelof100mg/dLLDLorbelowisrecommendedforCKDpatients.

TheplotLDLoverGFRstagesforvisit1and5byCVDdiagnosisshowsthatrealizationofthisgoalisimprovedafteroneyearforallGFRstagesandCVDdiagnoses.ItisalsonoteworthythatLDLlevelsarelowerforCKDpatientswithCKDforallGFRstagesandbothvisits,exceptonvisit5forGFRstage2.

Page 64: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

RAS-inhibitorsandARBsThenumberofpatientsdecreaseseveryvisit,sobothuseandnon-useofRAS-inhibitorsandARBsdecreasesovertime.Forsomevisits,RAS-inhibitorusewasnotrecorded,whichalsocontributestothisdecrease(SeeRAS-I&ARBusethroughoutvisits).

Itisclear,howeverthatafterthefirstvisit,theincreaseinRAS-inhibitorandARBusagerelativetonon-usagecanbeexplainedbyasuddenincreaseinprescriptionduetothestartofthetrial.Doctorsofboththeinterventionandcontrolgroupwereinformedofthebloodpressuretreatmentgoal.Itisalsoclearthatthissuddenincreaseispresentinbothgroups,albeitlesssointhecontrolgroup.

BMIBMIiscalculatedwiththeformulaweight(kg)/(height(m))^2,whereobesityisdefinedasBMI>30andunderweightisdefinedasBMI<18.5.thereare2extremeBMIvalues,onebelow15andoneabove50(seeBMIdistribution).

Page 65: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

TheplotWaistcircandBMIshowsthatthesetwovalues(redandgreenforlowandhighBMI,respectively)areprobablyaccurate,astheextremelyhighBMIisassociatedwithanextremelyhighwaistcircumferenceandtheextremelylowBMIisassociatedwithanextremelylowwaistcircumference.ThisisfurthersupportedbyplotWaistcircandhipcirc.

TheplotWaistcircandBMIalsoshowstwootherpatients(purpleandorange)withextremewaistcircumferencevalues.Thepatientwithawaistcircumferenceof0cm(purple)hassimplynotbeenrecordedandshouldbeaNAdatapointinsteadof0,whichiswhyitisnotpresentintheplotWaistcircandhipcirc.TheotherpatientwithaBMIofaround25(orange)hasarelativelysmallwaistcircumference.Thispatientalsohasarelativelysmallhipcircumference(seeWaistcircandhipcirc),soitispossiblethatthehighBMIcanbeexplainedbyasmalllength.

EstimatedglomerularfiltrationrateAsasanitycheck,inthenextplotthedistributionofeGFRoverthedifferentGFRstagesasmeasuredatbaselineisshown.Thisdistributionfitswiththedefinitionofthestages.Oneoutlierisshownforstage2,butthiseGFRiswellwithintheboundsof60and90.

TherearetwopatientsinthedatasetwithonlyoneeGFRdatapointavailable,howeverthesearenotextremevaluesanddonotneedtobeconsideredoutliers(notshown).Furthermore,thereare14patientswithonlytwoeGFRdatapointsavailable,however,thesearealsonotextremethusdonotneedtobeconsideredoutlierseither(notshown).

TheGFRstagesareshowninthetableGFRstages.

Table1:GFRstages

Page 66: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

ThenextplotillustratesanumberofabnormalmeasurementsofeGFRthroughouttheMASTERPLANtrial,excludingthebaseline.ProgressionisnotrequisiteinCKDbutremissionislessfrequent.ProgressionexplainsthedecreaseofeGFRinthedifferentstages,however,theeGFRincreasedisplayedbysomedatapointsislessrealistic.Thereare,forexample,twomeasurementsofpatientsinthestage3bgroupwitheGFRofaround130,whichishigherthanthenormalvalueofpeoplewithoutCKD[1].

Intheplot,valuesthataremorethantwoGFRcategorieslargerthanapatient’sdiagnosisaremarkedas“High”andvaluesthataremorethanthreeGFRcategoriessmallerthanapatient’sdiagnosishavebeenmarked“small”.Remissionthroughrenalreplacementtherapy(RRT)couldexplaintheremissionforpatientswithahigherGFRstage.Thenextplotsprovideamoredetailedviewoftheseextremevalues.

Page 67: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Becausethereareonly14patientswithextreme“High”values,itispossibletolookattheirdataindividuallytodistinguisherrorsfromplausiblevalues.IfthelargesteGFRvalueofapatientistwiceaslargeasthelargestvaluebeforethatone,andthepatientdidnotreceiveRRTthatcouldexplainthisdifference,itcanbetreatedasanoutlier(basedontheupperboundinthetableeGFRoutliersperstage).

Thereisalargedifferencebetweenpre-andpost-transplanteGFRvalues,whichisvisibleastwoseparateclustersontherightinthegraphDistributionofeGFRwithHighextremes(confirmationofdatesbeforeandaftertransplantswasdonebutisnotshown).

OntheleftinthesamegraphitisclearthattherearesomepatientswhohaveoneeGFRvaluethatismuchgreaterthanotherswhichcannotbeexplainedbyRRT.Theseextremevaluesaresomewhatoddbutcouldstillbeaccuratevalues.

Thereare10patientswithextreme“Low”eGFRmeasurements,soitispossibletolookattheseindividually.OntherightintheplotDistributionofeGFRwithLowextremesitisclearthattheextreme“Low”valuescanbeexplainedbyprogressionofCKD.

Intheleftgraphofthesameplottheextreme“Low”valuesarenotpartofsuchatrend,butmightbeexplainedbyepisodesofacutekidneyinjury.

Page 68: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Albumin-creatinineratioTheplotbelowshowsthedistributionofthealbumin-creatinineratiooverckdpatientsatbaseline.AlbuminuriaisnotnecessarilypresentinconjunctionwithCKD,butasCKDgetsworse,thereismorekidneydamage,thusmorealbuminuria.Thisexplainstheupwardstrendintheboxesintheplot.However,therearemultipleextremevaluesinthisgraph,ofupto20timesthelowerboundofseverealbuminuria.Theseextremevaluescallforamoredetailedanalysis.

Thereare31patientswithonlyoneACRmeasurement,twoofwhichwereperformedatthesecondvisit,whiletherestwasperformedatthefirstvisit(notshown).

ThealbuminuriastagesareshowninthetableAlbuminuriastages.

Table3:Albuminuriastages

Page 69: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Byplottingtheacrdistributionforeveryvisit,itbecomesclearthatthebaselinevisithassignificantlylargerextremevalues,uptomorethantwiceasbig(6201)asthelargestfollowupvalue(2750).Thevarianceissignificantlydecreasedafterthefirstvisitanddoesnotchangenearlyasmuchafter(seeDistributionofalbumin-creatinineratioforeachvisit),exceptforvisitslaterthanvisit22astherearefewdataentriesforthosevisits.

Onetheorytoexplainthisvarianceisthatthestartofthetrialchangedtreatmentofthepatients,whichinturndrasticallychangedthealbumin-creatinineratio.Thiscouldbethecase,asnotallpatientsweretreatmentprecedingthetrial.

BloodpressureandheartrateSystolicanddiastolicbloodpressureweremeasuredtwodifferentways;onceyearlywithamoreaccuratemethod(30minutelongprocedure)andtwiceyearlywithalessaccuratemethod(oscillometric).Theheartratewasmeasuredwiththeonceyearlymoreaccuratemethod.Thereisaclearpatterninthetwiceyearlyoscillometricbloodpressuremeasurementsduetorounding(seeDistributionofsbpanddbpforbothmethods).

Page 70: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

ForpatientswithCKD,bloodpressurehigherthan130/80mmHgisconsideredhigh.Aclearcorrelationexistsbetweensystolicanddiastolicbloodpressure,whichisvisibleintheplotSystolicbpoverdiastolicbpperGFRstage,specificallyforGFRstages3aand3b.Intheotherstages,thereareafewdatapointsthatdistortthiscorrelation.Thesedatapoints(markedinblue)arefrom5patientsintotal.

SpecificallyinthegraphofGFRstage2,thebluedatapointsallbelongtoonepatientwitharatherlowdiastolicbloodpressure.Lookingatthemediansofthesystolic-anddiastolicbloodpressuremeasurementswithbothmethods,itappearsthatthesedatapointsarerealmeasuredvaluesandnotmeasurementerrors.

Page 71: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Theheartratewasalsomeasuredwiththe30minutelongprocedureonceayear.Thenormalrestingheartrateforchildrenolderthan10throughseniorsis60-100bpm[2].Alargeamountofthemeasurementsarelowerorhigherthanthisrange,buttheydonotappeartobeobservationalerrors.

SodiumexcretionThemeanforsodiumexcretionliesaround150mmol/24hrforallGFRstages(exceptstage1,asthereareinsufficientdatapointstoproperlyindicateamean),andthereisnotmuchvariationinthisbetweenthefirstandfifthvisit(baselineand1yearcheckup,respectively).

Thetreatmentgoalforsodiumexcretionisamaximumof100mmol/24hr,sotherearemanyvaluesthataremuchlargerthanthislimit,someevenmorethan4timesaslarge.

Page 72: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

References[1]2018.EstimatedGlomerularFiltrationRate(eGFR),[online]Availableat:https://www.kidney.org/atoz/content/gfr(https://www.kidney.org/atoz/content/gfr)[Accessed11April2020].

[2]2019.Pulse,[online]Availableat:https://www.medlineplus.gov/ency/article/003399.htm(https://www.medlineplus.gov/ency/article/003399.htm)[Accessed18April2020].

[3]2019.Cholesterol,Fats,andHeartDisease:WhatYouNeedtoKnow,[online]Availableat:https://www.kidney.org/atoz/content/cholesterol(https://www.kidney.org/atoz/content/cholesterol)[Accessed23April2020]

Page 73: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Appendix B

Missing Data Analysis andImputation

This appendix contains the results of the missing data analysis on the MAS-TERPLAN dataset and the settings used for imputation and an analysis ofits results.

70

Page 74: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

AppendixBMASTERPLANImputationAnalysisThisanalysiswasconstructedwiththehelpoftheMICEvignettesbyGerkoVinkandStefvanBuuren(seehere(https://www.gerkovink.com/miceVignettes/))Tosummarise,thedatawastransformedintolongformatcontainingbaselinevariables,measuresonthefirstvisitandmeasuresonthefifthvisit,creatingadatasetof742entries.Themostcommonpatternwasnomissingvariables(343),andafterthatmissingproteinuriameasurementatvisit5(118).Furthermore,weconcludedthatthedatawasprobablysubjecttoaMARpattern.Becausevisit5proteinuriawasthemostfrequentlymissingvariable(34%),weproduced34imputeddatasetsusingtheMicelibrary.Wefoundgoodconvergencefortheimputationsandperformedananalysisontheimputeddatasets.Weconcludedthattheimputationsweresuccessful.

StructureofthedatasetTheMASTERPLANdatasetcontainsmeasurementsof788patientsoverthecourseofthetrial.Thesemeasurementsincludesystolicbloodpressure,heartrate,albumin-creatinineratio,estimatedglomerularfiltrationrate,RAS-inhibitorusage,etc.Visits1,5,9,13,17and21werethedesignatedyearlycheckupvisitsforboththeinterventionandcontrolgroups.

Thebasedatasetcontains11708entriesofyearlyandnon-yearlyvisitsofthe788patientsuntilcensoringoracompositerenalendpointof50%decreaseineGFR,dialysis,kidneytransplant,ordeath.Thedatasetusedforimputationcontainsonlythebaselinemeasurements(visit1)andthefirstyearlycheckupvisit(visit5).Oftheinitial788patients,742werestillparticipatingatvisit5.

Someofthemoreimportantvariablesforourresearchareoutcome,dateofoutcome,systolicbloodpressure,eGFR,albumin-creatinineratio,RAS-inhibitorandARBuse,proteinuriaandsodiumexcretion.

Furthermore,theoriginaldatasetcontainedseparaterowsforeachvisitwithanidentifyingpatientidandvisitnumber.Fortheimputationanalysisthisdatawastransformedfromlongdataintowidedata,resultinginadatasetof742rowsinsteadoftheoriginal1530.

Themissingnessinthisdatasetismostlydependentonthetypeofvisit;onlytheyearlycheckupsrequiredmeasurementofmostvariables.Thisisillustratedbythelowmedianofamountofmissingvaluesoftheyearlyvisits.However,thiseffectdecreasesafteranumberofvisitsinthedataset,asthealignmentofyearlyvisitswiththevisitnumbergetsworse.

Thepatternofyearlyvisitsisrecognisableintheplotandtablebelow,butitisalsoclearthatitgetslesspronouncedovertime.Toillustratethiswithnumbers,only448patientswhohadacheckupvisitbetween12and18monthsafterenrollinghadlessthan8missingvalues.Atthatpoint,726patientswerestillparticipatinginthetrial.

TheplotalsoshowsthatthetrialwasclosedinJuly2010,75monthsafterthestartingdateinApril2004.Afterthat,onlydataonmortalityandrenaloutcomewerecollected,whichexplainsthenumberofmissingvariables.

Medianofamountofmissingvaluespervisit

Page 75: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Toincreasenumberofmorecompleteentries,somerecordsofthefirstandfifthvisitwerereplaced.Forvisitswithmorethan8missingvalues,onevisitearlierorlaterisselected,ontheconditionthatsuchavisitcontainslessthan8missingvalues.Theresultingsetofdatapointsstilldoesnotcapturealltrue“checkup”visits,however,whichisillustratedintheplotbelow.

Missingnessofvariablesisalsoinfluencedbythemusuallybeingmeasuredtogether,likesystolicanddiastolicbloodpressure,waistandhipcircumference,ordruguse.

MissingdatapatternsThemissingdatapatternanalysisshowsthatthemostcommonpatternisnomissingvaluesinbothvisitsandafterthat,missingproteinuriaonvisit5.Thesetwopatternsareprintedbelowtogetherwiththetotalnumberofmissingentriespervariable.

Page 76: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

##[1]"Twomostcommonmissingdatapatternsandnumberofmissingentriespervariable"##pidtreatmentsexagecaucasiandate_basdiagnosisdiabetescvdbmidob##34311111111111##11811111111111##00000000000##ntx_basckd_stageobesityoutcome_dateoutcomesbp_dnmp.1dbp_dnmp.1##3431111111##1181111111##0000000##hr_dnmp.1creat.1mdrd_175.1dat_visit.1sbp.1dbp.1visit_type.1time.1##34311111111##11811111111##00000000##age_fu.1med_rasi.1med_diur.1dat_visit.5visit_type.5time.5age_fu.5##3431111111##1181111111##0000000##chol.1waisthipwhrcalcium.1phosphate.1hb.1album.1hdl.1creat.5##3431111111111##1181111111111##1222222344##mdrd_175.5smokeldl.1med_rasi.5med_diur.5hb.5sbp.5dbp.5phosphate.5##343111111111##118111111111##499111117171826##calcium.5album.5uprot.1acr.1chol.5sbp_dnmp.5dbp_dnmp.5chdstroke##343111111111##118111111111##333739415055555656##hr_dnmp.5hdl.5urinenatrium.1ldl.5acr.5urinenatrium.5uprot.5##34311111110##11811111101##57626473991322521307

Themissingdatapatternofspecificallythelastfollow-upbeforecensoringorthecompositerenalendpointisalsoofinterest.Themostcommonmissingdatapatternhasallvaluesthatwererecordedatbaselineanddidnotchange(sex,race,etc),whileallothervaluesaremissing(bloodpressure,sodiumexcretion,etc)exceptforeGFR( mdrd_175 ).Partofthisisduetothefactthatnotallvariablesaremeasuredeveryvisitandtheincidenceofaneventorcensoringisnotknownbeforehand.Anotherimportantfactoristhatatleast200patientswerestillincludedduringthetimeofextendedfollow-up,whichmeansthatalotoftheirvaluesaremissingbydefault.

Page 77: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

##[1]"Mostcommonmissingpatternandnumberofmissingentriespervariableonthelastvisit"##pidtreatmentsexagecaucasiandate_basdiagnosisdiabetescvdbmivisit##54111111111111##00000000000##dobntx_basdat_visitvisit_typetimeage_fuckd_stageobesityoutcome_date##541111111111##000000000##outcometypedatawaisthipwhrcreatmdrd_175smokechdstrokemed_rasi##54111111111110##0022222126565586##med_diurhbsbpdbpphosphatealbumcalciumacrcholhdlldlsbp_dnmp##541000000000000##586619629630634649651690709719723741##dbp_dnmphr_dnmpuproturinenatrium##541000017##74174375375611711

Inthefluxplotbelowofthedatausedfortheimputation,ahigheroutfluxindicatespossiblymoreinfluenceaspredictor,whileahigherinfluxindicatesastrongerdependenceontheimputationmodel.Likethemissingdatapatternindicated;intheupperleftarevariableswithmoreknownvalues,whileinthebottomrightarevariableswithmoremissingvalues.

Alotofthevariablesareinthetopleft,meaningtheyareprobablysignificantlyinfluentialaspredictorsintheimputationmodel,becausealotoftheirvaluesareknown.Theinverseistrueforthevariablesinthebottomright.

TypeofmissingnessIncompletevariablescancomeaboutbydifferentmechanisms;missingcompletelyatrandom(MCAR),missingatrandom(MAR),ormissingnotatrandom(MNAR).MCARmeansthatthereisnosystematicpatternformissingvariables,MARmeansthatsystematicpatternsformissingvariablescanbeexplainedcompletelywithotherobservedvariables,andMNARmeansthatthesepatternscannotbeexplainedbyobservedvariables.

Byplottinganobservedvariableagainstthemissingnessofothervariables,MARmechanismscanbeuncovered.Thiscanbeseenasright-tailedorleft-tailedMARmissingness.Whentheplotoftheobserveddataisfurthertotherightwhentheothervariableisobservedaswellandthusisfurthertotheleftwhentheothervariableisnotobserved,thisindicatesaright-tailedMARmissingness.Whenthereisnotaclearhorizontalshiftintheplots,thereisaMCARorpossibleMNARmissingness.

Fromthe5plotsbelow,itisclearthatthemissingnessofmostsuspectedvariablesisnotdependentoneGFR.Thisissomewhatnotable,asalowereGFRindicatesgreaterillnessinapatient,whichmightcauseadoctortorefrainfrommeasurementstominimizetheburdenonthepatient.ThisimpliesthatthemissingnessoftheplottedvariablesisMCARorpossiblyMNAR.

Furthermore,SBPandRAS-Iuseonvisit5haveaverylimitedamountofmissingvalues(17and11,respectivelyof742total),wherethelargestamountoccurswithloweGFRvaluesofthesamevisit.Thus,thismissingnesscouldbeatleastsomewhatright-tailed.

Page 78: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 79: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

TogetsomeassuranceofthedatabeingMAR,onecanperformasensitivityanalysis.Byaddingorsubtractingsomeamount(delta)toanimputedvalue,theimputationswillchange.Iftheadjustmenthaslittleeffectontheanalysisoftheimputations,thedataisstabletothattypeofMNAR-mechanism,thusitismoreprobablethatthedataisMAR.

ImputationFirst,a‘dryrun’isperformedwiththemicefunctiontoinitializetheimputationmethods( meth )andpredictormatrix( pred )forallvariables.

set.seed(1235)ini<-mice(m.re.imp,maxit=0)meth<-ini$methpred<-ini$pred

Bydefault,miceusespredictivemeanmatchingfornumericals,logisticregressionforfactorswith2levels,polytomousregressionforunorderedfactorsof2+levels,andaproportionaloddsmodelfororderedfactorsof2+levels.Forthisanalysisonlypredictivemeanmatchingandlogisticregressionareusedbecauseofthetypesofvariablesthataremissing.

Tofurtherimprovetheimputationresults,passiveimputationwasusedforthewaist-hipratio(waist/hip).Becausetheinitialpredictormatrixusesallvariablesexceptitselfforimputation,thiswouldcauseafeedbackloop.Thusthewaist-hipratioshouldbeexcludedfromthesetofpredictorsforwaistandhipcircumference.

Page 80: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Therearealsosomevariablesthatshouldbechosenaspredictorswithcautionbecausetheyarehighlycorrelated,forexamplebmiandobesity(definedasbmi>30).Thepredictormatrixwascomputedwiththe quickpred functionwithaminimalcorrelationof0.1andaminimumproportionofusablecasesof0.25.Ageandsexwereincludedaspredictorforeveryvariable,whilepidwasexcluded.Finally,somevariablesweremanuallyexcluded.

Theimputationmethodforeachvariableisshownbelow.Thepredictormatrixistoolargetoproperlydisplay,soonlythecommandsfortheconstructionandthecountofvariables(row2)usingxamountofpredictorvariables(row1)isshown.

meth["whr"]<-"~I(waist/hip)"

incl=c("age","sex")excl=c("pid","waist","hip","obesity","visit_type.5","age_fu.1","age_fu.5","date_bas","time.1","time.5")pred<-quickpred(data=m.re.imp,mincor=0.1,minpuc=0.25,include=incl,exclude=excl)

pred[c("waist","hip"),"whr"]<-0pred[c("waist","hip"),"bmi"]<-1pred["whr",c("waist","hip")]<-1

pred[c("ldl.1","hdl.1"),"bmi"]<-1

pred[grep("\\.5",rownames(pred),invert=T),grep("\\.5",rownames(pred))]<-0

pred[grep("\\.(1|5)",rownames(pred)),"mdrd_175.1"]<-1pred[grep("\\.5",rownames(pred)),"mdrd_175.5"]<-1pred[grep("\\.(1|5)",rownames(pred)),"outcome_date"]<-1

pred["sbp_dnmp.5","dbp_dnmp.5"]<-1pred["dbp_dnmp.5","sbp_dnmp.5"]<-1pred["sbp.5","dbp.5"]<-1pred["dbp.5","sbp.5"]<-1

##pidtreatmentsexagecaucasian##""""""""""##date_basdiagnosisdiabetescvdchd##"""""""""logreg"##strokesmokebmiwaisthip##"logreg""logreg""""pmm""pmm"##whrdobntx_basckd_stageobesity##"~I(waist/hip)"""""""""##outcome_dateoutcomesbp_dnmp.1dbp_dnmp.1hr_dnmp.1##""""""""""##chol.1hdl.1ldl.1calcium.1phosphate.1##"pmm""pmm""pmm""pmm""pmm"##hb.1album.1creat.1uprot.1mdrd_175.1##"pmm""pmm""""pmm"""##dat_visit.1sbp.1dbp.1urinenatrium.1acr.1##"""""""pmm""pmm"##visit_type.1time.1age_fu.1med_rasi.1med_diur.1##""""""""""##sbp_dnmp.5dbp_dnmp.5hr_dnmp.5chol.5hdl.5##"pmm""pmm""pmm""pmm""pmm"##ldl.5calcium.5phosphate.5hb.5album.5##"pmm""pmm""pmm""pmm""pmm"##creat.5uprot.5mdrd_175.5dat_visit.5sbp.5##"pmm""pmm""pmm""""pmm"##dbp.5urinenatrium.5acr.5visit_type.5time.5##"pmm""pmm""pmm"""""##age_fu.5med_rasi.5med_diur.5##"""logreg""logreg"

Thenumberofimputationsissettothepercentageofmissingnessofthevariablethatismissingmostoften;proteinuriaonvisit5ismissing252ofthe742times,sothenumberofimputationsissetto34.Foreachimputation,5iterationsareperformed.

imp<-mice(m.re.imp,meth=meth,pred=pred,print=FALSE,m=34)

ImputationanalysisToanalyzetheimputations,itisimportanttolookattheconvergenceanddistributionoftheimputations.Thegraphsbelowshowthemeanandstandarddeviationoftheimputedvariablesperimputationovertheiterations.

Page 81: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 82: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 83: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 84: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 85: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 86: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Thenextplotswillshowtherangeoftheobservedandimputedvaluesfor6randomlychosenimputationsofthe34ofwaistversuship,whrversuswaist,sbpversusdbpatvisit5,sbp_dnmpanddbp_dnmpatvisit5,albumversussbp_dnmpatvisit1,ldlversushdlatvisit1,andldlversushdlatvisit5.Theimputedvaluesarewithintherangeoftheobservedvaluesandlookreasonable.

Page 87: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 88: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 89: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Next,thestripplotsofalbum.1,mdrd_175.5,creat.5andhb.5areshownfor6randomlychosenimputations.Thesevariablesareplottedseparatelybecausetheycontainveryfewmissingvalues.Thedistributionoftheimputedvariablesseemsreasonable.

Page 90: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 91: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Belowarethedensityplotsofsomeotherimputedvariables.Thedistributionsoftheimputedvaluesaremostlyclosetotheoriginalvalues.

Page 92: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 93: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 94: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 95: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation
Page 96: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Furthermore,smokeandstrokecontain9and56missingvalues,sotheratiooftheoriginaldataremainsaboutthesame,regardlessoftheimputedvalues.

Thus,afterconsiderationoftheimputedvalues,itseemsthatthesearereasonableandcanbeusedforfurtheranalyses.

Page 97: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Appendix C

Individual DAGs

This appendix contains the dagitty code for the base DAG and the fourindividual DAGs.

−−−Base−−−

dag {ACEI ARB T0 [ pos =”0 .000 ,2 .000” ]BP T0 [ pos =”0 .000 ,1 .000” ]In t e rv en t i on [ exposure , pos =”0 .000 ,5 .000” ]PU T0 [ pos =”0 .000 ,3 .000” ]SE T0 [ pos =”0 .000 ,0 .000” ]eGFR T0 [ pos =”0 .000 ,4 .000” ]PU T0 −> BP T0 [ pos =”0 .315 ,2 .016” ]SE T0 −> BP T0SE T0 −> PU T0 [ pos =”0 .315 ,1 .365” ]eGFR T0 −> ACEI ARB T0 [ pos =”0 .284 ,2 .971” ]eGFR T0 −> PU T0eGFR T0 −> SE T0 [ pos =”0 .729 ,2 .018” ]

}

−−−Blood Pressure−−−

dag {ACEI ARB T0 [ pos =”0 .000 ,2 .000” ]BP T0 [ pos =”0 .000 ,1 .000” ]BP T1 [ pos =”1 .000 ,0 .000” ]ESRD [ outcome , pos =”2 .000 ,3 .000” ]In t e rv en t i on [ exposure , pos =”0 .000 ,5 .000” ]PU T0 [ pos =”0 .000 ,3 .000” ]SE T0 [ pos =”0 .000 ,0 .000” ]eGFR T0 [ pos =”0 .000 ,4 .000” ]ACEI ARB T0 −> BP T1ACEI ARB T0 −> ESRDBP T0 −> BP T1BP T1 −> ESRDInt e rv en t i on −> BP T1In t e rv en t i on −> ESRD

94

Page 98: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

PU T0 −> BP T0 [ pos =”0 .315 ,2 .016” ]PU T0 −> BP T1PU T0 −> ESRDSE T0 −> BP T0SE T0 −> BP T1SE T0 −> ESRDSE T0 −> PU T0 [ pos =”0 .315 ,1 .365” ]eGFR T0 −> ACEI ARB T0 [ pos =”0 .284 ,2 .971” ]eGFR T0 −> BP T1eGFR T0 −> ESRDeGFR T0 −> PU T0eGFR T0 −> SE T0 [ pos =”0 .729 ,2 .018” ]

}

−−−Prote inur ia−−−

dag {ACEI ARB T0 [ pos =”0 .000 ,2 .000” ]BP T0 [ pos =”0 .000 ,1 .000” ]ESRD [ outcome , pos =”2 .000 ,3 .000” ]In t e rv en t i on [ exposure , pos =”0 .000 ,5 .000” ]PU T0 [ pos =”0 .000 ,3 .000” ]PU T1 [ pos =”1 .000 ,0 .000” ]SE T0 [ pos =”0 .000 ,0 .000” ]eGFR T0 [ pos =”0 .000 ,4 .000” ]ACEI ARB T0 −> ESRDACEI ARB T0 −> PU T1BP T0 −> ESRDBP T0 −> PU T1In t e rv en t i on −> ESRDInt e rv en t i on −> PU T1PU T0 −> BP T0 [ pos =”0 .315 ,2 .016” ]PU T0 −> PU T1PU T1 −> ESRDSE T0 −> BP T0SE T0 −> ESRDSE T0 −> PU T0 [ pos =”0 .315 ,1 .365” ]eGFR T0 −> ACEI ARB T0 [ pos =”0 .284 ,2 .971” ]eGFR T0 −> ESRDeGFR T0 −> PU T0eGFR T0 −> PU T1eGFR T0 −> SE T0 [ pos =”0 .729 ,2 .018” ]

}

−−−ACE Inh i b i t o r or ARB use−−−

dag {ACEI ARB T0 [ pos =”0 .000 ,2 .000” ]ACEI ARB T1 [ pos =”1 .000 ,0 .000” ]BP T0 [ pos =”0 .000 ,1 .000” ]ESRD [ outcome , pos =”2 .000 ,3 .000” ]In t e rv en t i on [ exposure , pos =”0 .000 ,5 .000” ]PU T0 [ pos =”0 .000 ,3 .000” ]SE T0 [ pos =”0 .000 ,0 .000” ]

95

Page 99: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

eGFR T0 [ pos =”0 .000 ,4 .000” ]ACEI ARB T0 −> ACEI ARB T1ACEI ARB T1 −> ESRDBP T0 −> ACEI ARB T1BP T0 −> ESRDInt e rv en t i on −> ACEI ARB T1In t e rv en t i on −> ESRDPU T0 −> BP T0 [ pos =”0 .315 ,2 .016” ]PU T0 −> ACEI ARB T0PU T0 −> ESRDSE T0 −> BP T0SE T0 −> ESRDSE T0 −> PU T0 [ pos =”0 .315 ,1 .365” ]eGFR T0 −> ACEI ARB T0 [ pos =”0 .284 ,2 .971” ]eGFR T0 −> ESRDeGFR T0 −> PU T0eGFR T0 −> SE T0 [ pos =”0 .729 ,2 .018” ]

}

−−−Sodium Excret ion−−−

dag {ACEI ARB T0 [ pos =”0 .000 ,2 .000” ]BP T0 [ pos =”0 .000 ,1 .000” ]ESRD [ outcome , pos =”2 .000 ,3 .000” ]In t e rv en t i on [ exposure , pos =”0 .000 ,5 .000” ]PU T0 [ pos =”0 .000 ,3 .000” ]SE T0 [ pos =”0 .000 ,0 .000” ]SE T1 [ pos =”1 .000 ,0 .000” ]eGFR T0 [ pos =”0 .000 ,4 .000” ]ACEI ARB T0 −> ESRDBP T0 −> ESRDInt e rv en t i on −> ESRDInt e rv en t i on −> SE T1PU T0 −> BP T0 [ pos =”0 .315 ,2 .016” ]PU T0 −> ESRDSE T0 −> BP T0SE T0 −> PU T0 [ pos =”0 .315 ,1 .365” ]SE T0 −> SE T1SE T1 −> ESRDeGFR T0 −> ACEI ARB T0 [ pos =”0 .284 ,2 .971” ]eGFR T0 −> ESRDeGFR T0 −> PU T0eGFR T0 −> SE T0 [ pos =”0 .729 ,2 .018” ]

}

96

Page 100: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Appendix D

Implication Testing Script

This appendix contains the script for testing the conditional independen-cies of our DAGs, specifically for the DAG containing the blood pressuretreatment goal as mediator.

############################### Load requ i r ed packages##############################

l i b r a r y ( dplyr )l i b r a r y (mice )l i b r a r y ( dag i t ty )l i b r a r y ( w r i t e x l )f i l t e r <− dplyr : : f i l t e r

############################### Impute miss ing data##############################

# Read imputation data# This CSV f i l e conta in s the ba s e l i n e and f i f t h v i s i t va lue s o f

742 pa t i en t s in long formatimputation . data <− read . csv (”Z : / data // imputat iondata . csv ”)

# Transform date and f a c t o r columnsf a c t o r . vars <− c (” treatment ” , ” sex ” , ” caucas ian ” , ” d i abe t e s ” , ”

cvd ” , ”chd ” , ” s t r oke ” , ”smoke ” ,” ntx bas ” , ” obe s i t y ” , ”med ras i . 1 ” , ”med diur

. 1 ” , ”med ras i . 5 ” , ”med diur . 5 ” )date . vars <− c (” date bas ” , ”dob” , ” outcome date ” , ” d a t v i s i t . 1 ” ,

” d a t v i s i t . 5 ” )s tage . f i l t e r <− c (”GFR stage 3a ” , ”GFR stage 3b” , ”GFR stage 4”)

imputation . data [ f a c t o r . vars ] <− l app ly ( imputation . data [ f a c t o r .vars ] , as . f a c t o r )

imputation . data [ date . vars ] <− l app ly ( imputation . data [ date . vars ] ,as . Date )

97

Page 101: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

# Rename columns to match DAG va r i a b l e s and exc lude GFR stage 1 ,2 and 5

imputation . data <− imputation . data %>%mutate ( outcome = i f e l s e ( outcome==”Censored ” , 0 , 1) ,

outcome = as . f a c t o r ( outcome ) ) %>%rename (ACEI ARB T0=med ras i . 1 , ACEI ARB T1=med ras i . 5 , BP T0=

sbp dnmp . 1 , BP T1=sbp dnmp . 5 ,ESRD=outcome , In t e rv en t i on=treatment , PU T0=uprot . 1 ,

PU T1=uprot . 5 ,SE T0=urinenatr ium . 1 , SE T1=urinenatr ium . 5 , eGFR T0=

mdrd 175 . 1 ) %>%f i l t e r ( ckd s tage %in% stage . f i l t e r )

# I n i t i a l i z e imputations e t . seed (123567)i n i <− mice ( imputation . data , maxit=0)meth <− in i$methpred <− i n i$pr ed

# Setup imputationmeth [ ”whr ” ] <− ”˜ I ( wais t / hip ) ”

i n c l = c (” age ” , ” sex ”)ex c l = c (” pid ” , ”wais t ” , ” hip ” , ” obe s i t y ” , ” v i s i t t y p e . 5 ” , ”

age fu . 1” , ” age fu . 5” ,” date bas ” , ” time . 1” , ” time . 5 ” )

pred <− quickpred ( data=imputation . data , mincor=0.1 , minpuc=0.25 ,i n c lude=inc l , exc lude=exc l )

# Rows i nd i c a t e the va lue s to be imputed and columns i nd i c a t ep r ed i c t o r v a r i a b l e s

pred [ c (” wais t ” ,” hip ”) , ”whr ” ] <− 0pred [ c (” wais t ” ,” hip ”) , ”bmi ” ] <− 1pred [ ”whr” , c (” wais t ” ,” hip ”) ] <− 1

pred [ c (” l d l . 1” , ” hdl . 1 ” ) , ”bmi ” ] <− 1

pred [ grep ( ” ( \ \ . 5 ) | ( T1 ) ” , rownames ( pred ) , i n v e r t=T) , grep ( ” ( \ \ . 5 )| ( T1 ) ” , rownames ( pred ) ) ] <− 0

pred [ grep ( ” ( \ \ . ( 1 | 5 ) ) | ( T ( 0 | 1 ) ) ” , rownames ( pred ) ) , ”eGFR T0” ] <−1

pred [ grep ( ” ( \ \ . 5 ) | ( T1 ) ” , rownames ( pred ) ) , ”mdrd 175 . 5 ” ] <− 1pred [ grep ( ” ( \ \ . ( 1 | 5 ) ) | ( T ( 0 | 1 ) ) ” , rownames ( pred ) ) , ” outcome date

” ] <− 1

pred [ ”BP T1” , ”dbp dnmp . 5 ” ] <− 1pred [ ” dbp dnmp .5” , ”BP T1” ] <− 1pred [ ” sbp . 5” , ”dbp . 5 ” ] <− 1pred [ ” dbp . 5” , ”sbp . 5 ” ] <− 1

# Imputation o f 34 da ta s e t s

98

Page 102: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

imp <− par lmice ( imputation . data , meth=meth , pred=pred , p r i n t=FALSE, n . core=2, n . imp . core=17)

############################### Import DAG##############################

# Setup DAG# This i s the DAG with the blood pr e s su r e treatment goa l as

mediatordag . masterplan <− dag i t ty (” dag {

ACEI ARB T0 [ pos =\”0 .000 ,2 .000\” ]BP T0 [ pos =\”0 .000 ,1 .000\” ]BP T1 [ pos =\”1 .000 ,1 .000\” ]ESRD [ outcome , pos =\”2 .000 ,3 .000\” ]In t e rv en t i on [ exposure , pos =\”0 .000 ,5 .000\” ]PU T0 [ pos =\”0 .000 ,3 .000\” ]SE T0 [ pos =\”0 .000 ,0 .000\” ]eGFR T0 [ pos =\”0 .000 ,4 .000\” ]ACEI ARB T0 −> BP T1ACEI ARB T0 −> ESRDBP T0 −> BP T1BP T1 −> ESRDInt e rv en t i on −> ESRDInt e rv en t i on −> BP T1PU T0 −> BP T0PU T0 −> BP T1PU T0 −> ESRDSE T0 −> BP T0SE T0 −> BP T1SE T0 −> PU T0SE T0 −> ESRDeGFR T0 −> ACEI ARB T0eGFR T0 −> BP T1eGFR T0 −> ESRDeGFR T0 −> PU T0eGFR T0 −> SE T0}”)

p l o t ( dag . masterplan )

############################### Check DAG imp l i c a t i on s##############################

# Get c ond i t i o na l independenc i e s from DAGcondindep <− imp l i edCond i t i ona l Independenc i e s ( dag . masterplan )

# There i s no automated l o c a l t e s t i n g f o r imputed data and i tdoes not work i f NAs are pre sent

# A workflow f o r t h i s i s implemented manually below

# Create s e t s o f l i s t s to s t o r e r e s u l t s

99

Page 103: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

r e s u l t . imp l i c a t i o n s <− vec to r (mode = ” l i s t ” , l ength = length (condindep ) )

imp l i c a t i on s <− vec to r (mode = ” l i s t ” , l ength = length ( condindep ))

# Obtain a l l l o c a l t e s t s ( us ing l i n e a r r e g r e s s i o n ) f o r a l limpl i ed c ond i t i o na l independenc i e s

f o r ( j in s eq a l ong ( condindep ) ) {

# Create a r e g r e s s i o n formula from the impl i ed c ond i t i o na lindependenc i e s

x <− condindep [ [ j ] ] $Xy <− condindep [ [ j ] ] $Yz <− condindep [ [ j ] ] $Z

# Normalise the independent va r i a b l e i f i t i s not binomiali f ( g r ep l (” (ACEI ARB T(0 | 1 ) ) | (ESRD) | ( In t e rv en t i on ) ” , y ) ) {

eq <− paste0 (x , ” ˜ ” , y )} e l s e {

eq <− paste0 (x , ” ˜ s c a l e (” , y , ”) ”)}f o r ( i in s eq a l ong ( z ) ) {

i f ( g r ep l (” (ACEI ARB T( 0 | 1 ) ) | (ESRD) | ( In t e rv en t i on ) ” , z [ i ] ) ) {eq <− paste0 ( eq , ” + ” , z [ i ] )

} e l s e {eq <− paste0 ( eq , ” + s c a l e (” , z [ i ] , ”) ”)

}}

# Perform the l o c a l t e s t on the imputed data

# Use l o g i s t i c r e g r e s s i o n f o r binomial dependent v a r i a b l e s and# l i n e a r r e g r e s s i o n f o r cont inous dependent v a r i a b l e si f ( g r ep l (” (ACEI ARB T(0 | 1 ) ) | (ESRD) | ( In t e rv en t i on ) ” , x ) ) {

imp l i c a t i on s [ [ j ] ] <− with ( imp , glm ( as . formula ( eq ) , f ami ly=binomial ) )

} e l s e {imp l i c a t i on s [ [ j ] ] <− with ( imp , lm( as . formula ( eq ) ) )

}

# Desc r ip t i on o f the l o c a l t e s tl c l t e s t . de sc r <− paste0 (x , ” | | ” , y )i f ( l ength ( z ) > 0) {

l c l t e s t . de sc r <− paste0 ( l c l t e s t . descr , ” | ”)f o r ( i in s eq a l ong ( z ) ) {

l c l t e s t . de sc r <− paste0 ( l c l t e s t . descr , z [ i ] )i f ( i < l ength ( z ) ) {

l c l t e s t . de sc r <− paste0 ( l c l t e s t . descr , ” + ”)}

}}

# Obtain the pooled r e s u l t

100

Page 104: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

r e s u l t . imp l i c a t i o n s [ [ j ] ] <− cbind ( l c l t e s t . descr , summary( pool (imp l i c a t i on s [ [ j ] ] ) ) [ 2 , ] )

}

# Combine the r e s u l t s and export to . x l sx f i l er e s u l t . imp l i c a t i o n s <− do . c a l l ( rbind , r e s u l t . imp l i c a t i o n s )rownames ( r e s u l t . imp l i c a t i o n s ) <− NULLr e s u l t . imp l i c a t i o n s [ , c ( 3 : nco l ( r e s u l t . imp l i c a t i on s ) ) ] <−

round ( r e s u l t . imp l i c a t i o n s [ , c ( 3 : nco l ( r e s u l t . imp l i c a t i on s ) ) ] , 3 )w r i t e x l s x (x=r e s u l t . imp l i c a t i on s , path=”impl i cat ionsBP uprot . x l sx

”)

101

Page 105: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Appendix E

Edge Estimation Script

This appendix contains the script for the edge estimation of our DAGs,specifically the DAG with the blood pressure treatment goal as mediator.

############################### Load requ i r ed packages##############################

l i b r a r y ( dplyr )l i b r a r y ( l ub r i d a t e )l i b r a r y (mice )l i b r a r y ( dag i t ty )l i b r a r y ( s u r v i v a l )l i b r a r y ( w r i t e x l )l i b r a r y ( boot )l i b r a r y ( Amelia )f i l t e r <− dplyr : : f i l t e r

############################### Impute miss ing data##############################

# Read imputation data# This CSV f i l e conta in s the ba s e l i n e and f i f t h v i s i t va lue s o f

742 pa t i en t s in long formatimputation . data <− read . csv (”Z : / data // imputat iondata . csv ”)

# Transform date and f a c t o r columnsf a c t o r . vars <− c (” treatment ” , ” sex ” , ” caucas ian ” , ” d i abe t e s ” , ”

cvd ” , ”chd ” , ” s t r oke ” , ”smoke ” ,” ntx bas ” , ” obe s i t y ” , ”med ras i . 1 ” , ”med diur

. 1 ” , ”med ras i . 5 ” , ”med diur . 5 ” )date . vars <− c (” date bas ” , ”dob” , ” outcome date ” , ” d a t v i s i t . 1 ” ,

” d a t v i s i t . 5 ” )s tage . f i l t e r <− c (”GFR stage 3a ” , ”GFR stage 3b” , ”GFR stage 4”)

imputation . data [ f a c t o r . vars ] <− l app ly ( imputation . data [ f a c t o r .vars ] , as . f a c t o r )

102

Page 106: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

imputation . data [ date . vars ] <− l app ly ( imputation . data [ date . vars ] ,as . Date )

# Rename columns to match DAG var i ab l e s , prepare f o r Coxpropo r t i ona l hazards r e g r e s s i on ,

# and exc lude GFR stage 1 , 2 and 5imputation . data <− imputation . data %>%

mutate ( outcome = i f e l s e ( outcome==”Censored ” , 1 , 2) ,outcome = as . f a c t o r ( outcome ) ,outcome time = i n t e r v a l ( date bas , outcome date ) %/%

months (1 ) ) %>%rename (ACEI ARB T0=med ras i . 1 , ACEI ARB T1=med ras i . 5 , BP T0=

sbp dnmp . 1 , BP T1=sbp dnmp . 5 ,ESRD=outcome , In t e rv en t i on=treatment , PU T0=uprot . 1 ,

PU T1=uprot . 5 ,SE T0=urinenatr ium . 1 , SE T1=urinenatr ium . 5 , eGFR T0=

mdrd 175 . 1 ) %>%f i l t e r ( ckd s tage %in% stage . f i l t e r )

#Test Cox propo r t i ona l hazards assumptionf i t <− coxzph ( Surv ( outcome time , as . numeric (ESRD) ) ˜

In t e rv en t i on + BP T1 +ACEI ARB T0 + PU T0 + SE T0 + eGFR T0 ,

data = imputation . data )ggcoxzph ( cox . zph ( f i t ) )

# I n i t i a l i z e imputations e t . seed (123567)i n i <− mice ( imputation . data , maxit=0)meth <− in i$methpred <− i n i$pr ed

# Setup imputationmeth [ ”whr ” ] <− ”˜ I ( wais t / hip ) ”

i n c l = c (” age ” , ” sex ”)ex c l = c (” pid ” , ”wais t ” , ” hip ” , ” obe s i t y ” , ” v i s i t t y p e . 5 ” , ”

age fu . 1” , ” age fu . 5” ,” date bas ” , ” time . 1” , ” time . 5” , ” outcome time ”)

pred <− quickpred ( data=imputation . data , mincor=0.1 , minpuc=0.25 ,i n c lude=inc l , exc lude=exc l )

# Rows i nd i c a t e the va lue s to be imputed and columns i nd i c a t ep r ed i c t o r v a r i a b l e s

pred [ c (” wais t ” ,” hip ”) , ”whr ” ] <− 0pred [ c (” wais t ” ,” hip ”) , ”bmi ” ] <− 1pred [ ”whr” , c (” wais t ” ,” hip ”) ] <− 1

pred [ c (” l d l . 1” , ” hdl . 1 ” ) , ”bmi ” ] <− 1

pred [ grep ( ” ( \ \ . 5 ) | ( T1 ) ” , rownames ( pred ) , i n v e r t=T) , grep ( ” ( \ \ . 5 )| ( T1 ) ” , rownames ( pred ) ) ] <− 0

103

Page 107: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

pred [ grep ( ” ( \ \ . ( 1 | 5 ) ) | ( T ( 0 | 1 ) ) ” , rownames ( pred ) ) , ”eGFR T0” ] <−1

pred [ grep ( ” ( \ \ . 5 ) | ( T1 ) ” , rownames ( pred ) ) , ”mdrd 175 . 5 ” ] <− 1pred [ grep ( ” ( \ \ . ( 1 | 5 ) ) | ( T ( 0 | 1 ) ) ” , rownames ( pred ) ) , ” outcome date

” ] <− 1

pred [ ”BP T1” , ”dbp dnmp . 5 ” ] <− 1pred [ ” dbp dnmp .5” , ”BP T1” ] <− 1pred [ ” sbp . 5” , ”dbp . 5 ” ] <− 1pred [ ” dbp . 5” , ”sbp . 5 ” ] <− 1

# Imputation o f 34 da ta s e t simp <− par lmice ( imputation . data , meth=meth , pred=pred , p r i n t=

FALSE, n . core=2, n . imp . core=17)

############################### Import DAG##############################

# Setup DAG# This i s the DAG with the blood pr e s su r e treatment goa l as

mediatordag . masterplan <− dag i t ty (” dag {

ACEI ARB T0 [ pos =\”0 .000 ,2 .000\” ]BP T0 [ pos =\”0 .000 ,1 .000\” ]BP T1 [ pos =\”1 .000 ,1 .000\” ]ESRD [ outcome , pos =\”2 .000 ,3 .000\” ]In t e rv en t i on [ exposure , pos =\”0 .000 ,5 .000\” ]PU T0 [ pos =\”0 .000 ,3 .000\” ]SE T0 [ pos =\”0 .000 ,0 .000\” ]eGFR T0 [ pos =\”0 .000 ,4 .000\” ]ACEI ARB T0 −> BP T1ACEI ARB T0 −> ESRDBP T0 −> BP T1BP T1 −> ESRDInt e rv en t i on −> ESRDInt e rv en t i on −> BP T1PU T0 −> BP T0PU T0 −> BP T1PU T0 −> ESRDSE T0 −> BP T0SE T0 −> BP T1SE T0 −> PU T0SE T0 −> ESRDeGFR T0 −> ACEI ARB T0eGFR T0 −> BP T1eGFR T0 −> ESRDeGFR T0 −> PU T0eGFR T0 −> SE T0}”)

p l o t ( dag . masterplan )

##############################

104

Page 108: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

# Generate r e g r e s s i o n formulas##############################

# To ca l c u l a t e the c o e f f i c i e n t s f o r edges from parents o f abinomial va r i ab l e , use l o g i s t i c r e g r e s s i on ,

# f o r edges from parents o f a cont inuous var i ab l e , use l i n e a rr e g r e s s i on , and

# f o r edges from parents o f the outcome var i ab l e , use Coxr e g r e s s i o n

# Create a l i s t with each va r i ab l e and i t s parentsdepindvars <− vec to r (mode = ” l i s t ” , l ength = length ( names ( dag .

masterplan ) ) )f o r ( j in s eq a l ong ( names ( dag . masterplan ) ) ) {

depindvars [ [ j ] ] $X <− names ( dag . masterplan ) [ j ]depindvars [ [ j ] ] $Y <− parents ( dag . masterplan , names ( dag .

masterplan ) [ j ] )}

# Create the l i s t o f r e g r e s s i o n formulasformulas <− l i s t ( )formula index <− 1f o r ( i in s eq a l ong ( depindvars ) ) {

x <− depindvars [ [ i ] ] $Xy <− depindvars [ [ i ] ] $Y

i f ( l ength (y ) > 0) {# For binary va r i ab l e s , use l o g i s t i c r e g r e s s i o ni f ( g r ep l (”ACEI ARB T( 0 | 1 ) ” , x ) ) {

formula <− paste0 (” glm (” , x , ” ˜ ”)f o r ( k in s eq a l ong (y ) ) {# Sca l e cont inuous independent v a r i a b l e si f ( g r ep l (” (ACEI ARB T( 0 | 1 ) ) | ( In t e rv en t i on ) ” , y [ k ] ) ) {

formula <− paste0 ( formula , y [ k ] )} e l s e {

formula <− paste0 ( formula , ” s c a l e (” , y [ k ] , ”) ”)}i f ( k < l ength (y ) ) {

formula <− paste0 ( formula , ” + ”)} e l s e {

formula <− paste0 ( formula , ” , f ami ly=’binomial ’ ) ”)}

}# For the s u r v i v a l outcome , use Cox propo r t i ona l hazards

r e g r e s s i o n} e l s e i f ( g r ep l (”ESRD” , x ) ) {

formula <− paste0 (” coxph ( Surv ( outcome time , as . numeric (” , x, ”) ) ˜ ”)

f o r ( k in s eq a l ong (y ) ) {# Sca l e cont inuous independent v a r i a b l e si f ( g r ep l (” (ACEI ARB T( 0 | 1 ) ) | ( In t e rv en t i on ) ” , y [ k ] ) ) {

formula <− paste0 ( formula , y [ k ] )} e l s e {

formula <− paste0 ( formula , ” s c a l e (” , y [ k ] , ”) ”)

105

Page 109: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

}i f ( k < l ength (y ) ) {

formula <− paste0 ( formula , ” + ”)} e l s e {

formula <− paste0 ( formula , ”) ”)}

}# For cont inuous va r i ab l e s , use l i n e a r r e g r e s s i o n} e l s e {

formula <− paste0 (” lm (” , x , ” ˜ ”)f o r ( k in s eq a l ong (y ) ) {# Sca l e cont inuous independent v a r i a b l e si f ( g r ep l (” (ACEI ARB T( 0 | 1 ) ) | ( In t e rv en t i on ) ” , y [ k ] ) ) {

formula <− paste0 ( formula , y [ k ] )} e l s e {

formula <− paste0 ( formula , ” s c a l e (” , y [ k ] , ”) ”)}i f ( k < l ength (y ) ) {

formula <− paste0 ( formula , ” + ”)} e l s e {

formula <− paste0 ( formula , ”) ”)}

}}formulas [ formula index ] <− formulaformula index <− formula index + 1

# Skip v a r i a b l e s without parents} e l s e {

next}

}

############################### Estimate edge weights without ext ra boots t rap##############################

# The edge weights are computed f o r each o f the imputed da ta s e t s

# Create s e t s o f l i s t s to s t o r e r e s u l t sr e s u l t . imptest <− vec to r (mode = ” l i s t ” , l ength = length ( names (

dag . masterplan ) ) )imptest <− vec to r (mode = ” l i s t ” , l ength = length ( names ( dag .

masterplan ) ) )

# Obtain c o e f f i c i e n t s ( us ing r e g r e s s i o n ) f o r a l l parents o f eachva r i ab l e and combine us ing Rubin ’ s r u l e s

f o r ( j in s eq a l ong ( formulas ) ) {# Perform the r e g r e s s i o n on the imputed dataimptest [ [ j ] ] <− with ( imp , eva l ( parse ( t ex t=formulas [ [ j ] ] ) ) )

# Obtain the pooled r e s u l tr e s u l t . imptest [ [ j ] ] <− cbind ( formulas [ [ j ] ] , summary( pool (

imptest [ [ j ] ] ) ) )}

106

Page 110: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

# Combine the e s t imate s f o r a l l edgesr e s u l t . e s t imat i on s <− do . c a l l ( rbind , r e s u l t . imptest )

############################### Estimate edge weights and s i g n i f i c a n c e with boots t rap##############################

# The standard e r r o r o f the r e g r e s s i o n i s computed f o r each o fthe imputed da ta s e t s by boots t rapp ing

# Def ine number o f boots t rapp ing samplesbootNum <− 5000

# Create l i s t to s t o r e boots t rap r e s u l t s o f each imputationbote s t <− array (NA, dim=c ( c ( nrow ( r e s u l t . e s t imat i on s ) , 2 ) , imp$m) )

# Def ine func t i on to perform a l l r e g r e s s i o n s on a boots t rapsample

bo r eg r e s s <− f unc t i on ( data , i nd i c e s , f o rmu l a l i s t ) {

# Retr i eve boots t rap sampled <− data [ i nd i c e s , ]

# Create l i s t to s t o r e r e s u l t o f r e g r e s s i o n f o r each formulai nd i v r e g s <− vec to r (mode = ” l i s t ” , l ength = length ( f o rmu l a l i s t

) )

# Perform each r e g r e s s i o n on the boots t rap sample and s t o r er e s u l t

f o r ( f in s eq a l ong ( f o rmu l a l i s t ) ) {f i t <− with (d , eva l ( parse ( t ex t=f o rmu l a l i s t [ [ f ] ] ) ) )i nd i v r e g s [ [ f ] ] <− co e f ( f i t )

}

# Compress i nd i v i dua l l i s t s o f r e g r e s s i o n r e s u l t s i n to onel i s t

# conta in ing a l l r e g r e s s i o n r e s u l t s f o r t h i s boots t rap sampler e g r e s u l t s <− un l i s t ( i nd i v r e g s )re turn ( r e g r e s u l t s )

}

# Perform boots t rapp ing on each o f the 34 imputed da ta s e t sf o r ( i in 1 : imp$m) {

# Complete o r i g i n a l datase t with imputed v a r i a b l e s o fimputation i

compldata <− complete ( imp , i )

# Generate boots t rap samples and apply ’ r e g r e s s ’ f unc t i on toobta in

# r e g r e s s i o n c o e f f i c i e n t s f o r each r e g r e s s i o n o f eachboots t rap sample

107

Page 111: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

bo r e su l t <− boot ( data=compldata , s t a t i s t i c=boregre s s , R=bootNum , f o rmu l a l i s t=formulas , p a r a l l e l=”snow”)

boverview <− cbind ( bore su l t$ t0 , #coe fapply ( bo r e su l t$ t , 2 , sd ) ) #se ( c o e f )

bo t e s t [ , , i ] <− boverview}

############################### Pool boots t rap r e s u l t s##############################

# Create l i s t to s t o r e r e s u l t s o f poo l ingr e s u l t . bo t e s t <− matrix (NA, nrow=nrow ( r e s u l t . e s t imat i on s ) , nco l

=4)colnames ( r e s u l t . bo t e s t ) <− c (” Boot es t ” , ”Boot SE ” , ”Lower 95 ” ,

”Upper 95 ”)

# Pool r e s u l t s o f 34 imputed da ta s e t sboTestMeld <− mi . meld (q=bote s t [ , 1 , ] , s e=bote s t [ , 2 , ] , byrow=F

)r e s u l t . bo t e s t [ , 1 ] <− boTestMeld$q . mir e s u l t . bo t e s t [ , 2 ] <− boTestMeld$se . mir e s u l t . bo t e s t [ , 3 ] <− r e s u l t . bo t e s t [ , 1 ] − ( 1 . 96 * r e s u l t . bo t e s t

[ , 2 ] )r e s u l t . bo t e s t [ , 4 ] <− r e s u l t . bo t e s t [ , 1 ] + (1 . 96 * r e s u l t . bo t e s t

[ , 2 ] )

# Add boots t rap r e s u l t s to e s t imat i on s and export to . x l sxr e s u l t . e s t imat i on s <− cbind ( r e s u l t . e s t imat ions , r e s u l t . bo t e s t )rownames ( r e s u l t . e s t imat i on s ) <− NULLr e s u l t . e s t imat i on s [ , c ( 3 : nco l ( r e s u l t . e s t imat i on s ) ) ] <−

round ( r e s u l t . e s t imat i on s [ , c ( 3 : nco l ( r e s u l t . e s t imat i on s ) ) ] ,d i g i t s =6)

w r i t e x l s x (x=r e s u l t . e s t imat ions , path=”est imat ionsBP uprot . x l sx”)

108

Page 112: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

Appendix F

Mediation Analysis Script

This appendix contains the script for the final mediation analysis of ourDAGs, specifically for the DAG that contains the blood pressure treatmentgoal as mediator.

############################### Load requ i r ed packages##############################

l i b r a r y ( dplyr )l i b r a r y ( l ub r i d a t e )l i b r a r y (mice )l i b r a r y ( s u r v i v a l )l i b r a r y ( w r i t e x l )l i b r a r y ( boot )l i b r a r y ( Amelia )f i l t e r <− dplyr : : f i l t e r

############################### Impute miss ing data##############################

# Read imputation data# This CSV f i l e conta in s the ba s e l i n e and f i f t h v i s i t va lue s o f

742 pa t i en t s in long formatimputation . data <− read . csv (”Z : / data // imputat iondata . csv ”)

# Transform date and f a c t o r columnsf a c t o r . vars <− c (” treatment ” , ” sex ” , ” caucas ian ” , ” d i abe t e s ” , ”

cvd ” , ”chd ” , ” s t r oke ” , ”smoke ” ,” ntx bas ” , ” obe s i t y ” , ”med ras i . 1 ” , ”med diur

. 1 ” , ”med ras i . 5 ” , ”med diur . 5 ” )date . vars <− c (” date bas ” , ”dob” , ” outcome date ” , ” d a t v i s i t . 1 ” ,

” d a t v i s i t . 5 ” )s tage . f i l t e r <− c (”GFR stage 3a ” , ”GFR stage 3b” , ”GFR stage 4”)

imputation . data [ f a c t o r . vars ] <− l app ly ( imputation . data [ f a c t o r .vars ] , as . f a c t o r )

109

Page 113: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

imputation . data [ date . vars ] <− l app ly ( imputation . data [ date . vars ] ,as . Date )

# Rename columns to match DAG var i ab l e s , prepare f o r Coxpropo r t i ona l hazards r e g r e s s i on ,

# and exc lude GFR stage 1 , 2 and 5imputation . data <− imputation . data %>%

mutate ( outcome = i f e l s e ( outcome==”Censored ” , 1 , 2) ,outcome = as . f a c t o r ( outcome ) ,outcome time = i n t e r v a l ( date bas , outcome date ) %/%

months (1 ) ) %>%rename (ACEI ARB T0=med ras i . 1 , ACEI ARB T1=med ras i . 5 , BP T0=

sbp dnmp . 1 , BP T1=sbp dnmp . 5 ,ESRD=outcome , In t e rv en t i on=treatment , PU T0=uprot . 1 ,

PU T1=uprot . 5 ,SE T0=urinenatr ium . 1 , SE T1=urinenatr ium . 5 , eGFR T0=

mdrd 175 . 1 ) %>%f i l t e r ( ckd s tage %in% stage . f i l t e r )

# I n i t i a l i z e imputations e t . seed (123567)i n i <− mice ( imputation . data , maxit=0)meth <− in i$methpred <− i n i$pr ed

# Setup imputationmeth [ ”whr ” ] <− ”˜ I ( wais t / hip ) ”

i n c l = c (” age ” , ” sex ”)ex c l = c (” pid ” , ”wais t ” , ” hip ” , ” obe s i t y ” , ” v i s i t t y p e . 5 ” , ”

age fu . 1” , ” age fu . 5” ,” date bas ” , ” time . 1” , ” time . 5” , ” outcome time ”)

pred <− quickpred ( data=imputation . data , mincor=0.1 , minpuc=0.25 ,i n c lude=inc l , exc lude=exc l )

# Rows i nd i c a t e the va lue s to be imputed and columns i nd i c a t ep r ed i c t o r v a r i a b l e s

pred [ c (” wais t ” ,” hip ”) , ”whr ” ] <− 0pred [ c (” wais t ” ,” hip ”) , ”bmi ” ] <− 1pred [ ”whr” , c (” wais t ” ,” hip ”) ] <− 1

pred [ c (” l d l . 1 ” , ” hdl . 1 ” ) , ”bmi ” ] <− 1

pred [ grep ( ” ( \ \ . 5 ) | ( T1 ) ” , rownames ( pred ) , i n v e r t=T) , grep ( ” ( \ \ . 5 )| ( T1 ) ” , rownames ( pred ) ) ] <− 0

pred [ grep ( ” ( \ \ . ( 1 | 5 ) ) | ( T ( 0 | 1 ) ) ” , rownames ( pred ) ) , ”eGFR T0” ] <−1

pred [ grep ( ” ( \ \ . 5 ) | ( T1 ) ” , rownames ( pred ) ) , ”mdrd 175 . 5 ” ] <− 1pred [ grep ( ” ( \ \ . ( 1 | 5 ) ) | ( T ( 0 | 1 ) ) ” , rownames ( pred ) ) , ” outcome date

” ] <− 1

pred [ ”BP T1” , ”dbp dnmp . 5 ” ] <− 1pred [ ” dbp dnmp .5” , ”BP T1” ] <− 1

110

Page 114: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

pred [ ” sbp . 5” , ”dbp . 5 ” ] <− 1pred [ ” dbp . 5” , ”sbp . 5 ” ] <− 1

# Imputation o f 34 da ta s e t simp <− par lmice ( imputation . data , meth=meth , pred=pred , p r i n t=

FALSE, n . core=2, n . imp . core=17)

############################### Mediation ana l y s i s setup##############################

# This code i s based on the code wr i t t en by# Theis Lange , Kim Wadt Hansen , Rikke Sorensen , Soren Galat ius

(2017) .# Applied Mediation Analyses : A Review and Tutor i a l .

Epidemiology and Health , 39 . do i : 10 . 4178/ epih . e2017035

# Step 1 : F i t a Cox model with the o r i g i n a l s u r v i v a l time ands ta tu s (A) , the mediator , and a l l confounders .

# Step 2 : Copy the o r i g i n a l data twice and c r ea t e a new exposureva r i a b l e (A*) that i s equal to

# the observed exposure in the f i r s t copy , and equal to# the oppos i t e o f the observed exposure in second copy .# Step 3 : Impute the unknown coun t e r f a c tua l s u r v i v a l time in the

second copied datase t# us ing the new va r i ab l e c r ea ted in step 2 (A*)# making sure to l im i t the maximum value to the l a s t

p o s s i b l e obse rvat i on time .# Append the datase t with the imputed va lue s to the

f i r s t copy o f the datase t c r ea ted in s tep 2# which w i l l r e s u l t in a datase t twice the l ength o f the

o r i g i n a l datase t .# Step 4 : F i t a Cox model to the datase t c r ea ted in s tep 3# us ing the o r i g i n a l s t a tu s (A) , the c rea ted s t a tu s (A*)

, the s u r v i v a l time and a l l confounders .# Step 5 : Repeat s tep 3 and 4 10 t imes and pool the parameter

e s t imate s accord ing to Rubin ’ s r u l e s .# Step 6 : Repeat s tep 1−5 1 ,000 t imes with a new boots t rap

sample each time# to obta in the var iance es t imate .

# This method i s s l i g h t l y a l t e r e d to# Perform step 1−3 10 t imes to obta in 10 da ta s e t s with the

observed and imputed su r v i v a l t imes .# Perform step 4 1 ,000 t imes on each o f the se 10 da ta s e t s with a

d i f f e r e n t boots t rap sample each time .# Pool the r e s u l t i n g parameter and var iance es t imate ac co rd ing ly

.# Fina l ly , the se s t ep s are executed on each o f our 34 imputed

da ta s e t s and the r e s u l t i s pooled again .

# Setup maximum su rv i v a l time , number o f imputat ions and numbero f boots t rap samples

maxSurvivalTime <− max( imputation . data$outcome time )

111

Page 115: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

nImp <− 10nBoot <− 10ˆ3

# Create a matrix with c o r r e c t dimensions as base to s t o r e Coxr e g r e s s i o n r e s u l t s

dimParVarEst <− matrix (NA, nrow=4, nco l=2)dimEst <− matrix (NA, nrow=7, nco l=2)

# Function that c r e a t e s a datase t with the observed and imputedsu r v i v a l time and outcome

impCounterf <− f unc t i on (workData , maxSurvTime ) {# Step 1workData$InterventionTEMP <− workData$ Intervent ionf i tCox <−# This formula i s s p e c i f i c f o r the blood pr e s su r e treatment

goa l mediator because o f ’BP T1 ’survreg ( Surv ( outcome time , as . numeric (ESRD) ) ˜

InterventionTEMP + BP T1 +SE T0 + BP T0 + ACEI ARB T0 + PU T0 + eGFR T0 ,

data=workData )

# Step 2 and 3tempData1 <− workDatatempData1$InterventionSTAR <− tempData1$InterventiontempData2 <− workDatatempData2$InterventionSTAR <− as . f a c t o r (2−as . numeric (

tempData2$Intervention ) )tempData2$InterventionTEMP <− tempData2$InterventionSTARlinPredTemp <− p r ed i c t ( f i tCox , newdata=tempData2 , type=” l i n e a r

”)simSurvTimes <− rwe i bu l l ( nrow ( tempData2 ) ,

shape=1/ f i tCox$ s ca l e ,exp ( linPredTemp ) )

tempData2$ESRD <− as . f a c t o r (1 + 1*( simSurvTimes < maxSurvTime ))

tempData2$outcome time <−round ( simSurvTimes *( simSurvTimes < maxSurvTime ) +

maxSurvTime*( simSurvTimes >= maxSurvTime ) , d i g i t s =0)

expData <− rbind ( tempData1 , tempData2 )re turn ( expData )

}

# Function that f i t s a Cox r e g r e s s i o n with the observed andcoun t e r f a c tua l data and

# the observed and coun t e r f a c tua l i n t e r v en t i on and a l lconfounders

# with boots t rap samples , g iven as i n d i c e sparEst <− f unc t i on ( mediationData , i n d i c e s ) {

# Retr i eve boots t rap sampled <− mediationData [ i nd i c e s , ]

# Step 4fitMed <−

112

Page 116: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

with (d , coxph ( Surv ( outcome time , as . numeric (ESRD) ) ˜In t e rv en t i on + InterventionSTAR +

SE T0 + BP T0 + ACEI ARB T0 + PU T0 + eGFR T0) )

re turn ( co e f ( f itMed ) )}

############################### Mediation ana l y s i s execut ion##############################

# Create a l i s t to s t o r e r e s u l t s o f parameter and var iancee s t imate s f o r each o f the 34 imputed da ta s e t s

impEstimates <− array (NA, dim=c (dim( dimParVarEst ) , imp$m) )

f o r ( i in 1 : imp$m) {complData <− complete ( imp , i )# Create a l i s t to s t o r e r e s u l t s o f parameter and var iance

e s t imate s f o r each o f the 10 imputed da ta s e t sbootEstimatesTemp <− array (NA, dim=c (dim( dimEst ) , nImp) )

# Make 10 da ta s e t s with the c rea ted coun t e r f a c tua li n t e r v en t i on and imputed coun t e r f a c tua l outcome ,

# as we l l as the o r i g i n a l observed i n t e r v en t i on and outcome# and perform Cox r e g r e s s i o n on 1 ,000 boots t rap samples o f

each o f the se da ta s e t sf o r ( j in 1 : nImp) {

counter fData <− impCounterf ( complData , maxSurvivalTime )

# Perform boots t rapp ing to obta in parameter and var iancees t imate f o r t h i s imputed datase t

bootstrapTemp <− boot ( data=counterfData , s t a t i s t i c=parEst , R=nBoot , p a r a l l e l=”snow”)

boot s t rapResu l t <− cbind ( bootstrapTemp$t0 , #coe fapply ( bootstrapTemp$t , 2 , sd ) ) #se (

c o e f )bootEstimatesTemp [ , , j ] <− boots t rapResu l t

}

# Compute i nd i r e c t , d i r e c t , and t o t a l e f f e c t and mediatedproport ion

bootEst imates <− array (NA, dim=c (dim( dimParVarEst ) , nImp) )f o r ( j in 1 : nImp) {# Compute i n d i r e c t e f f e c tbootEst imates [ 1 , , j ] <− bootEstimatesTemp [ 1 , , j ]# Compute d i r e c t e f f e c tbootEst imates [ 2 , , j ] <− bootEstimatesTemp [ 2 , , j ]# Compute t o t a l e f f e c tbootEst imates [ 3 , , j ] <− bootEstimatesTemp [ 1 , , j ] +

bootEstimatesTemp [ 2 , , j ]# Compute mediated proport ionbootEst imates [ 4 , , j ] <− bootEstimatesTemp [ 1 , , j ] /

bootEstimatesTemp [ 2 , , j ]}

113

Page 117: Mediation Analysis of Nurse Practitioner Impact on Renal … · 2020. 7. 7. · Bachelor thesis Computing Science Radboud University & Radboud University Medical Center Mediation

# Pool r e s u l t s o f the 10 imputed da ta s e t sbootEstMeld <− mi . meld (q=bootEst imates [ , 1 , ] , s e=

bootEst imates [ , 2 , ] , byrow=F)parVarEst <− matrix (NA, nrow=4, nco l=2)parVarEst [ , 1 ] <− bootEstMeld$q . miparVarEst [ , 2 ] <− bootEstMeld$se . mi

impEstimates [ , , i ] <− parVarEst}

############################### Pool bootstrapped imputat ions r e s u l t s##############################

# Create a l i s t to s t o r e r e s u l t s o f poo l ing o f the 34 imputedda ta s e t s

r e s u l t . mediat ion <− matrix (NA, nrow=4, nco l=5)rownames ( r e s u l t . mediat ion ) <− c (” IE ” , ”DE” , ”TE” , ”Q”)colnames ( r e s u l t . mediat ion ) <− c (” co e f ” , ”exp ( co e f ) ” , ” se ( c o e f ) ” ,

” lower95 ” , ”upper95 ”)

# Pool r e s u l t simpEstMeld <− mi . meld (q=impEstimates [ , 1 , ] , s e=impEstimates [

, 2 , ] , byrow=F)r e s u l t . mediat ion [ , 1 ] <− impEstMeld$q . mir e s u l t . mediat ion [ , 2 ] <− exp ( r e s u l t . mediat ion [ , 1 ] )r e s u l t . mediat ion [ , 3 ] <− impEstMeld$se . mir e s u l t . mediat ion [ , 4 ] <− r e s u l t . mediat ion [ , 1 ] − ( 1 . 96 * r e s u l t .

mediat ion [ , 3 ] )r e s u l t . mediat ion [ , 5 ] <− r e s u l t . mediat ion [ , 1 ] + (1 . 96 * r e s u l t .

mediat ion [ , 3 ] )

r e s u l t . mediat ion <− round ( r e s u l t . mediation , 6)r e s u l t . mediat ion <− cbind ( e f f e c t = c (” IE ” ,”DE” ,”TE” ,”Q”) , as . data

. frame ( r e s u l t . mediat ion ) )w r i t e x l s x (x=as . data . frame ( r e s u l t . mediat ion ) , path=”

mediationBP uprot . x l sx ”)

114