the fold-in, fold-out design for dce choice tasks: application to … · 2019. 8. 8. · partial...

12
University of Groningen The Fold-in, Fold-out Design for DCE Choice Tasks Res Team Dev ABC Tool Published in: Medical Decision Making DOI: 10.1177/0272989X19849461 IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2019 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): Res Team Dev ABC Tool (2019). The Fold-in, Fold-out Design for DCE Choice Tasks: Application to Burden of Disease. Medical Decision Making, 39(4), 450-460. https://doi.org/10.1177/0272989X19849461 Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 03-01-2021

Upload: others

Post on 13-Sep-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

University of Groningen

The Fold-in, Fold-out Design for DCE Choice TasksRes Team Dev ABC Tool

Published in:Medical Decision Making

DOI:10.1177/0272989X19849461

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite fromit. Please check the document version below.

Document VersionPublisher's PDF, also known as Version of record

Publication date:2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):Res Team Dev ABC Tool (2019). The Fold-in, Fold-out Design for DCE Choice Tasks: Application toBurden of Disease. Medical Decision Making, 39(4), 450-460. https://doi.org/10.1177/0272989X19849461

CopyrightOther than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of theauthor(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policyIf you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediatelyand investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons thenumber of authors shown on this cover page is limited to 10 maximum.

Download date: 03-01-2021

Page 2: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

Original Article

Medical Decision Making2019, Vol. 39(4) 450–460� The Author(s) 2019Article reuse guidelines:sagepub.com/journals-permissionsDOI: 10.1177/0272989X19849461journals.sagepub.com/home/mdm

The Fold-in, Fold-out Design for DCE

Choice Tasks: Application to Burden ofDisease

Lucas M. A. Goossens , Marcel F. Jonker, Maureen P. M. H. Rutten-van Molken,

Melinde R. S. Boland, Annerika H. M. Slok, Philippe L. Salome,

Onno C. P. van Schayck, Johannes C. C. M. in ‘t Veen, Elly A. Stolk , and

Bas Donkers, on behalf of the research team that developed the ABC tool

Background. In discrete-choice experiments (DCEs), choice alternatives are described by attributes. The importanceof each attribute can be quantified by analyzing respondents’ choices. Estimates are valid only if alternatives aredefined comprehensively, but choice tasks can become too difficult for respondents if too many attributes areincluded. Several solutions for this dilemma have been proposed, but these have practical or theoretical drawbacksand cannot be applied in all settings. The objective of the current article is to demonstrate an alternative solution, thefold-in, fold-out approach (FiFo). We use a motivating example, the ABC Index for burden of disease in chronicobstructive pulmonary disease (COPD). Methods. Under FiFo, all attributes are part of all choice sets, but they aregrouped into domains. These are either folded in (all attributes have the same level) or folded out (levels may differ).FiFo was applied to the valuation of the ABC Index, which included 15 attributes. The data were analyzed inBayesian mixed logit regression, with additional parameters to account for increased complexity in folded-out ques-tionnaires and potential differences in weight due to the folding status of domains. As a comparison, a model withoutthe additional parameters was estimated. Results. Folding out domains led to increased choice complexity for respon-dents. It also gave domains more weight than when it was folded in. The more complex regression model had a betterfit to the data than the simpler model. Not accounting for choice complexity in the models resulted in a substantiallydifferent ABC Index. Conclusion. Using a combination of folded-in and folded-out attributes is a feasible approachfor conducting DCEs with many attributes.

Keywords

Discrete choice experiments, Task complexity, Preference measurement, COPD, Burden of disease

Date received: June 1, 2018; accepted: March 19, 2019

Discrete-choice experiments (DCEs) are globally recog-nized as a valuable instrument to measure preferences ofrespondents in marketing, environmental economics,transportation, and increasingly health sciences andhealth economics.1 They have been used to address abroad range of questions, including assessing experiencesof patients,2 tradeoffs between health outcomes andexperience factors,3 priority setting,4 identifying groupswith different preferences,5 and estimating utility weightsfor quality-of-life questionnaires.6

A DCE offers respondents a series of choices between2 or more alternatives. These alternatives are describedon the basis of characteristics (i.e., attributes), which cantake different levels. For instance, attributes of a medical

Corresponding Author

Lucas M. A. Goossens, Erasmus School for Health Policy and

Management & Institute for Medical Technology Assessment/Erasmus

Choice Modelling Centre, Erasmus University Rotterdam, P.O. Box

1738, Rotterdam, 3000 DR, the Netherlands ([email protected]).

Page 3: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

treatment might be the risk of a serious adverse event(with possible levels: 1%, 10%, and 20%) or copayment(levels: nothing, e100, and e200). In each DCE question,2 or more alternatives are presented, from which respon-dents have to choose the most appealing one. Attributelevels vary across the alternatives, so one alternative maybe more attractive in terms of risk, whereas the otheralternative may be more attractive in terms of copay-ment. Using statistical choice models, researchers caninfer from these choices how important the various attri-butes are for the respondents. The coefficients of thesemodels quantify the relative importance of the variousattribute levels, and it is possible to sum up the coeffi-cients to compute utilities of alternatives.

A challenge arises in the application of DCEs whenthere are many attributes involved in a decision, becausechoices become more difficult or even infeasible forrespondents if the number of attributes increases.7,8

Many researchers seem to put a large weight on thisargument and aim to comprehensively describe the deci-sion problem using no more than 6 or 7 attributes.1

Unfortunately, leaving out attributes is not alwaysdesirable.

Several strategies have been proposed for conductinga DCE with a large number of attributes.9–11 However,

they require additional assumptions that might not

always hold. The first strategy is called ‘‘partial profil-

ing’’12 or ‘‘blocked attribute design.’’11 In this approach,

respondents get to see only a selection of attributes (a

partial profile or a block) at a time. The omitted attri-

butes are assumed to be equal across choice alternatives.

With different attributes being presented in different

choices, the same respondent gets to see different selec-

tions of attributes, so the full range of attributes is cov-

ered. The analysis is usually based on the pooled data

from all blocks. This assumes that the preferences of

respondents for the presented attribute levels are inde-

pendent of omitted attributes. When Witt et al.11 ana-

lyzed the data from each block separately, they noted

that the coefficients for the common attributes differed

across blocks. This suggested that marginal utility and

the marginal rate of substitution can be sensitive to the

inclusion of other attributes.In the second strategy, the hierarchical information

integration (HII) method,9,10 several individual attributes

are grouped into overarching constructs. These con-

structs can then be used in choice tasks, replacing the

original attributes. The relative weight of attributes

belonging to a construct is investigated in a separate rat-

ing task (‘‘subexperiment’’). This reduces the number of

attributes in the final choice tasks (‘‘bridging experi-

ment’’) and decreases the burden on respondents.HII assumes that respondents would already mentally

group similar attributes into constructs when they make

decisions and are able to meaningfully and sensibly

attach one summarizing value to a group of attributes,

which also validly reflects the value of the underlying

attributes. This is not always evident, and the obtained

value may in fact become dependent on the used labels

for the overarching constructs. For this reason, the anal-

ysis usually contains a test of construct validity.10

However, such a test shows whether attributes contribute

to the level of the construct. It does not show whether

the value of the construct in the DCE is accurately per-

ceived by respondents. Furthermore, the test is often per-

formed only after the choice data have been collected.

Although HII was developed decades ago, there have

been few applications of it in health care so far.13,14

The objective of the current article is to present analternative solution to the problem of large numbers ofattributes, the fold-in, fold-out approach (FiFo). Wedeveloped this approach in the context of a study inwhich we aimed to elicit preference values for the 15 itemsof the Assessment of the Burden of Chronic ObstructivePulmonary Disease (ABC) tool to create the ABC Index.Under FiFo, attributes are categorized into domains, butin contrast to HII, they are not replaced by them.Creating such overarching constructs also would nothave made much sense in our context. Under FiFo,respondents can be presented domains that are eitherfolded in or folded out. When a domain is folded in, allits attributes are still shown but have the same level.

Erasmus School of Health Policy and Management & Institute for

Medical Technology Assessment, Erasmus University Rotterdam,

Rotterdam, the Netherlands (LMAG, MFJ, MPMHRvM, MRSB,

EAS); Erasmus Choice Modelling Centre, Erasmus University

Rotterdam, Rotterdam, the Netherlands (LMAG, MFJ, MPMHRvM,

EAS, BD); Erasmus School of Economics, Erasmus University

Rotterdam, Rotterdam, the Netherlands (BD); CAPHRI Care and

Public Health Research Institute, Maastricht University, Maastricht,

the Netherlands (AHMS, OCPvS); UNICUM Huisartsenzorg,

Bilthoven, the Netherlands (PLS); Department of Pulmonology,

Franciscus Gasthuis en Vlietland, Rotterdam, the Netherlands

(JCCMiV); and EuroQol Foundation, Rotterdam, the Netherlands

(EAS). The members of the ABC tool research team are listed in the

Acknowledgement section. The author(s) declared no potential con-

flicts of interest with respect to the research, authorship, and/or publi-

cation of this article. The author(s) disclosed receipt of the following

financial support for the research, authorship, and/or publication of

this article: The authors received funding for this study from the

Innovation Fund Dutch Health Insurers, Zeist, the Netherlands;

Picasso for COPD, Alkmaar, the Netherlands; GlaxoSmithKline BV,

Zeist, the Netherlands; AstraZeneca BV, Zoetermeer, the Netherlands;

Novartis BV, Arnhem, the Netherlands; Chiesi Pharmaceuticals BV,

Rijswijk, the Netherlands; and Almirall BV, Utrecht, the Netherlands.

Goossens et al. 451

Page 4: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

When a domain is folded out, the levels may differ acrossattributes. The scores of the ABC Index were publishedelsewhere.15 In the current article, we focus on demon-strating the innovative methodology (i.e., the FiFo designof the questionnaire and the analysis of the choice data).

Methods

Context: The ABC Index

The ABC tool was recently developed to support shareddecision making by patients and health care providers.16

It includes the ABC questionnaire about the experiencedburden of chronic obstructive pulmonary disease (COPD)(see Figure A1 in the online appendix), and several objec-tive indicators, such as smoking behavior and a coloredballoon diagram, to visualize patients’ scores. Thesescores are translated into a treatment advice to be dis-cussed with the patients. The resulting individual careplan includes personal treatment goals framed in thepatient’s own words. In an 18-month cluster randomizedcontrolled trial (RCT), this tool was found to be effectivein improving disease-specific quality of life.17

As an addition to the tool, the ABC Index was devel-oped. The index aggregates the scores of the items of theABC questionnaire into a total score for the experiencedburden of disease, based on the importance of each itemfor patients. The ABC Index is the first preference-basedmeasure of burden of disease in COPD. It can be used tomonitor a patient’s overall improvement or deterioration,describe populations of patients in terms of experiencedburden of disease, and assist in contracting betweenhealth care insurers and providers. It was shown to bepredictive of health care consumption and costs.15

Discrete-Choice Experiment

In the development of the ABC Index, the importance ofeach item of the ABC tool was assessed in a DCE, whichconsisted of a series of pairwise choice tasks. In eachchoice task, the health states of 2 COPD patients weredescribed. Respondents were asked to indicate which ofthem they considered to be in worse health. They werepresented with 14 choice tasks, each describing 2 COPDpatients in different health states. The first choice taskserved as a control question to test the respondent’s com-prehension of the task. It was not included in the analysis.

Choice of Attributes and Levels

Each health state was described by 15 attributes fromthe ABC tool, which can be divided into 3 domains plus

2 separate attributes: a respiratory symptoms domain(with 4 attributes: shortness of breath at rest, shortnessof breath during physical activity, coughing, and sputumproduction), a limitations domain (4 attributes: limita-tions in strenuous physical activities, limitations in mod-erate physical activities, limitations in daily activities, andlimitations in social activities), a mental problems domain(5 attributes: feeling depressed, fearing that breathing getsworse, worrying, listlessness, and tense feeling), a fatigueattribute, and an exacerbations attribute.

Each attribute had 3 possible levels. The original ABCQuestionnaire has 7 answer categories for most ques-tions. However, 7 levels per attribute in a DCE are gener-ally considered too many. The multitude of possibilitiesand the subtle differences between them would make itmore difficult for respondents to distinguish the severityof health states. In addition, it would require the estima-tion of many coefficients in the statistical analysis. Forthese reasons, the number of levels of each attribute inthe DCE was limited to 3: 1) never or hardly ever, 2) reg-ularly, and 3) most times for the attributes in the respira-tory symptoms domain, the mental problems domain,and the fatigue attribute; 1) hardly or not at all, 2) mod-erately, and 3) severely for the attributes in the limita-tions domain; and 0, 1, and 2 exacerbations per year forthe exacerbation attribute.

Fold-in, fold-out design. Without further adjustments,the cognitive burden of the DCE to respondents wouldbe very high. To indicate which of the 2 COPD patientshad a worse health status, they would have to make tra-deoffs between 15 attributes.

The new FiFo design was developed to ease this bur-den. Unlike the HII design, FiFo does not create a newoverarching construct that fully replaces the attributes inthat construct. Instead, it shows all attributes in everychoice task, but some domains are ‘‘folded in’’ while oth-ers are ‘‘folded out.’’

When a domain is folded in, all attributes in thisdomain are forced to be at the same level. When adomain is folded out, the levels of each attribute in adomain are presented separately. In the example of achoice task in Figure 1, the limitations domain and themental problems domain are folded in. Person A is‘‘moderately’’ limited in all attributes of the limitationsdomain, while person B is ‘‘hardly or not at all’’ limited.All mental problems occurred ‘‘never or hardly ever’’ inperson A and ‘‘regularly’’ in person B. In contrast withthese 2 domains, the respiratory symptoms domain isfolded out (i.e., the levels differ across the attributeswithin the domain).

452 Medical Decision Making 39(4)

Page 5: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

In every choice task, 2 or 3 domains were folded inand 1 was folded out. To facilitate comparison, domainswould be either folded in or out in both alternatives.

The attribute levels were color-coded, with darkershades for the more severe levels. This color-coding sys-tem was specifically optimized to signal differences in theattribute levels for individuals with red-green colorblind-ness (the most prevalent form of colorblindness) whilekeeping the text readable for respondents who have otherforms of colorblindness18 and was reported to reducetask complexity for respondents.19

Figures A2 and A3 in the online appendix presentchoice tasks in the formats of partial profiling and HII,respectively, in the context of the ABC Index. These for-mats were not used in this study. They serve as a clarifi-cation of these alternative approaches and to show thecontrast with the FiFo format.

Efficient factorial design. Each respondent got 14 choicetasks: 6 choice tasks with all domains folded in and 8choice tasks with one of the domains folded out.

With 15 attributes and 3 possible levels per attribute,many possible choice tasks could be formed. Theexperimental design was optimized using Ngene

(ChoiceMetrics Ptd Ltd, Sydney, Australia) using a D-efficiency design optimization criterion with fixed priors,given the conditions of the FiFo design. This was doneseparately for each DCE question type: that is, questionswith all domains folded in and with the symptoms, limita-tions, or mental problems domain folded out, respectively.For each question type, 4 (folded-out) or 6 (folded-in)blocks of questions were developed. Finally, question-naires were assembled by combining the control questionand 1 block of 5 questions with all domains folded in witha block of 8 questions from a type with a domain foldedout. Combinations were made based on attribute level bal-ance. All blocks with folded-out domains were used once,while the blocks with only folded-in domains were used in2 combinations. Finally, all choice sets were copied andoptions A and B were switched in the second version. Thisled to a total of 24 different DCE questionnaires, whichwere randomly assigned to respondents.

To further optimize the efficiency of the design aimedat measuring preferences of the general public, severaladditional design optimizations were implemented usingprior preferences obtained from an analysis of theresponses of preceding respondents. This was done afterhalf of the patients had answered the questionnaire; itwas repeated 3 times for the members from the general

Figure 1 Example of a choice set with a fold-in, fold-out design. Attribute levels are color-coded, with darker shades for themore severe levels. The color-coding system was optimized for individuals with colorblindness.

Goossens et al. 453

Page 6: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

public. This made new designs increasingly more efficientbased on the analysis of earlier respondents. This is bestpractice in DCE research.

The questionnaire was pilot-tested in a sample of patientsin Franciscus Gasthuis hospital in Rotterdam (n = 10).After the think-aloud interviews, the final layout and word-ing of the survey was implemented (see Figure 1).

Respondents

Two groups of respondents took part in the study. Thefirst group consisted of patients who participated in thecluster RCT in which the ABC tool was tested.17 Theywere recruited in 56 health care centers across theNetherlands (39 primary care, 17 hospital care) betweenMarch 2013 and October 2013 and interviewed betweenJanuary and June 2015. All interviews were held afterthe patients had completed the trial. The second groupconsisted of a nationally representative sample (in termsof age, sex, and education level) of the general public,who were approached through a survey sample provider.

The questionnaires were administered in telephoneinterviews by trained researchers from Erasmus UniversityRotterdam using a (semi)structured interview protocol.Questionnaires were sent to respondents by mail oremail before the interview took place, so respondentshad the choice questions in front of them during thephone call.

Interviewers started by explaining to patients that theexperiment was only about their opinions and that itwould not have any consequences for their individualtreatments. The interviews with members of the generalpublic started with an explanation of the disease COPDand their possible experience with it in their family or cir-cle of friends. Both groups were then asked to complete 2simplified tasks and 1 realistic practice choice task beforethe actual DCE questionnaire started.

At the end of the interview, 3 debriefing questionswere asked about the difficulty of the questionnaire.Participating respondents received a e20 reward aftercompleting the interview.

Analysis

A Bayesian mixed logit regression model was used toanalyze the choices. DCE regression models assume thatchoices are determined by the utility that respondentsattach to the alternatives. This utility has an observablecomponent, which is described by attribute levels andregression parameters, and an unobservable part, whichis captured by the error term.

Hence, utility U of alternative i for respondent n canbe expressed as

Uni =b0Xni + eni;

where Xi is a vector of attribute levels that describe alter-native i, and bn is a vector of parameters for the effectsof these attribute levels on the utility of this alternativefor the nth respondent. The error term eni has a standardtype I extreme value distribution.

Mixed logit allows the regression coefficients b to varyacross respondents, thereby taking into account the sys-tematic differences in preferences across respondents.The general expression of the mixed logit probability foralternative i in choice task j being chosen by respondentn is

Pni =

ðeb0xniPj eb

0xnj

!f bjb,Wð Þdb;

where P denotes the probability of an alternative beingchosen, and f bjb,Wð Þ describes a distribution of para-meter values across respondents. In our case, it was anormal density function with mean b and variance W.

Given the fold-in, fold-out-presentation of thediscrete-choice tasks, it seems plausible that it becamemore difficult for respondents to make choices between 2alternatives when domains were folded out. This is notadequately incorporated in the standard MIXL model,and hence a modified regression model was developedthat explicitly accounted for the fact that the structure ofquestions was not constant (i.e., in some choice tasks, alldomains were folded in, whereas in others, one of thedomains was folded out). This was done by varying thescale of the error term for tasks with a folded-outdomain.

Accordingly, the expression for utility U of alternativei for respondent n was generalized as

Uni =b0Xni + eni=s:

Scale factor s makes explicit that the scale for utilityis arbitrary: the consistency of choice behavior dependson the ratio of parameters to the error term, not on theabsolute value of the parameters. Since it is not possibleto identify both b and s at the same time, s is usuallynormalized to 1.

To reflect differences in complexity, the scale para-meter of the error term was modeled with an additionalparameter for choice tasks with a folded-out domain20,21:

454 Medical Decision Making 39(4)

Page 7: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

s= 1+j:

Identification of this extra j parameter was ensured byjointly analyzing the data from folded-in and folded-outquestionnaires in a single statistical model.

A negative j parameter would mean that the unob-served, unpredictable component of utility had a rela-tively larger impact on choices in folded-out questionscompared to folded-in questions. This would indicatethat respondents’ preferences were less clear and choiceswere less consistent when one of the domains was foldedout.

In a further extension, the weights for the attributelevels were allowed to be different for folded-out domainscompared to folded-in domains. This was achieved byextending the utility function with domain-specific l-parameters, which make it possible to vary the weight ofthe attributes in each domain by their folding-out status:

Uni = 1+ ls � osð Þ � b0sSni + 1+ ll � olð Þ � b0tLni

+ 1+ lm � omð Þ � b0mMni +b0X Xni + eni=s;

where Si, Li, and Mi are vectors of attribute levels in thesymptoms, limitations, and mental problems domains,respectively; Xi is a vector of levels of the attributes thatare not included in these domains; and os, ol, and om areindicators of the folded-out status of the domains.

The b-parameters represented the weights, as theywere perceived by respondents for domains in the folded-in state. Positive l-parameters indicate that respondentsattached more weight to domains that were folded out.

As a comparison, the regression analysis was repeatedas a Bayesian mixed logit model without the l- and j-parameters. Model fit was assessed using the Watanabeinformation criterion (WAIC) and the deviance informa-tion criterion (DIC).22,23 OpenBugs software (opensource, www.openbugs.net) was used to fit the regressionmodels and Excel 2013 (Microsoft, Redmond, WA) wasused to calculate ABC Index scores. The OpenBugs codeis included in the online appendix.

Calculating the ABC Index Score

The estimation results were used to calculate the ABCIndex scores. The worst possible health state was definedto be equivalent to 100 points on the ABC Index scale. Ascore of 0 points would indicate that a patient was on thebest possible level on all attributes (i.e., ‘‘not or hardly’’limited; ‘‘seldom or never’’ symptoms, mental problems,or fatigue; no exacerbation).

Index scores were calculated by linear interpolationand extrapolation from the 3 coefficients per DCE

attribute to the 7-point scale of the original ABC tool.This was complicated by the fact that the b-coefficientsin the symptoms, limitations, and mental problemsdomains represented the weights for attributes as pre-sented in their folded-in state. This made it less than obvi-ous that they should be directly used in the intra- andextrapolation.

On one hand, it could be argued that attributes didnot get enough attention, relatively, when they werefolded in. This would justify adjusting the coefficients tothe folded-out state by multiplying them with an adjust-ment factor of (1 +l). On the other hand, it is also plau-sible that the attention of respondents was exaggeratedlydrawn to the folded-out domain after the first 5 choicetasks in the questionnaire, which were all folded in com-pletely. This would be an argument for using the originalcoefficients. A third option would be to apply a partialadjustment factor (1 + 0.5 * l). We used all 3 optionsand calculated 3 versions of the ABC Index, with thepartial correction as the base case.

Results

Respondents

All 328 COPD patients from the RCT who stayed in thetrial for at least 6 months were approached. Most (86%,283 patients) were willing to participate in the DCE. Theduration of the interviews was approximately 30 minutes,and all but 1 patient who started the interview fully com-pleted it. All respondents from the general public com-pleted the interview. Table 1 shows that the characteristicsof the respondents from the trial were different from thoseof the general public. COPD patients were older on

Table 1 Respondents

Characteristic Patients (n = 282) General Public (n = 250)

Age, mean (SD), y 66.7 (8.3) 46.0 (16.7)Male, % 52 55COPD, % 100 8.3GOLD 1, % 12GOLD 2, % 54GOLD 3, % 31GOLD 4, % 3Education, %Low 48 13Middle 37 35High 15 52

COPD, chronic obstructive pulmonary disease; GOLD, GOLD

(Global Initiative for Chronic Obstructive Lung Disease) classification

1/2/3/4: mild/moderate/severe/ very severe COPD.

Goossens et al. 455

Page 8: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

average, with less variation in age and less educated. A siz-able proportion of the respondents from the general publicstated that they were COPD patients themselves.

After the choice tasks, most of both respondentgroups reported that the task was almost completely ortotally clear to them (see Table 2). They agreed that thedifferences between patients A and B were at least rea-sonably easy to discern. Members from the general pub-lic found this easier than respondents in the patientsample. Making choices was reported to be (very) diffi-cult by 15% of the patients and 13% of the membersfrom the general public.

Regression Results

Table 3 shows the results of the regression analyses forpatients and the general public, respectively. Two coeffi-cients per item or domain are presented: one for level 1(moderate problems) and one for level 2 (severe prob-lems). A positive coefficient indicates that respondentsconsidered this level worse than level 0 (no or hardly anyproblems). A larger coefficient means that the attributecontributes more to the burden of disease.

All coefficients for the items and domains had theexpected positive sign, with larger coefficients at thehigher levels. Patients and members of the general publichad similar but not equal preferences. Both attachedmuch weight to the number of exacerbations, fatigue,

limitations at moderate physical activities, and concern/fear of breathing problems getting worse.

The positive l-parameters indicate that both groupsof respondents attached less weight to domains whenthey were folded in and more when they were folded out.The negative j-parameter indicated that respondents’preferences were less clear and choices were less consis-tent when one of the domains was folded out comparedto questions with only folded-in domains.

The goodness of fit was better for the models withadditional parameters than for the standard models. TheWAIC and DIC for the former were lower (see Table 4)in both groups of respondents.

ABC Index Scores

The choice of the analysis method and the choice of theadjustment factor have a substantial impact on the result-ing index scores. The scores for the worst possible levelsare shown in Table 5.

In the standard Bayesian mixed logit model withoutadditional parameters, the highest scores were for limita-tions at moderate physical activities, fatigue, exacerba-tions, dyspnea at rest, and fear of breathing problemsgetting worse. The lowest possible score was assigned tolimitations in strenuous physical activities, which wouldsuggest that patients care less about these limitations.

Besides showing a better fit to the data, the results ofthe extended models with additional parameters showedclear differences with those of the conventional model.First, the scores in the limitations domain were markedlydifferent. Especially, limitations at strenuous physicalactivities had a greater impact on utility than in the con-ventional model, whereas limitations at moderate physi-cal activities had a smaller impact on utility. Second,scores for fatigue and exacerbations were lower if theadjustment was applied at least partially. With regard tothe symptoms and mental problems domains, all modelsled to more or less similar scores.

Comparing the 3 approaches with additional para-meters to correct for differences between the fold-in andfold-out domains, it becomes clear that more adjustmentled to lower scores for exacerbations and fatigue andhigher scores for limitations in all activities.

Discussion

This article presented the first application of the FiFoapproach in DCEs with many attributes. FiFo reducesthe cognitive burden on respondents, while presentingthem with all attributes in all choice tasks. This approachis an alternative to hierarchical information integration

Table 2 Cognitive Debriefing Questions

Question Patients, %GeneralPublic, %

Was it clear what the task was?Not at all 4 0Not really 1 0Reasonably 25 6Almost completely 6 8Totally 65 86

Was the difference betweenpersons A and B easy to see?Not at all 1 0Not really 3 2Reasonably 44 20Almost completely 12 16Totally 39 63

How easy was it to decidebetween patients A and B?Very difficult 6 4Difficult 9 9Doable 50 50Easy 16 22Very easy 20 15

456 Medical Decision Making 39(4)

Page 9: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

and partial profiling, which were developed earlier tohelp DCE respondents cope with a multitude of attri-butes. FiFo is a feasible alternative because we haveshown that both patients and the general public wereable to complete the choice tasks, and the results in bothgroups of respondents were very plausible, withoutexcluding any patient from the analyses.

FiFo requires an extended analysis method. Applyingthe extended regression model instead of a conventionalanalysis led to markedly different estimates of the prefer-ence weights for the ABC Index. Furthermore, it con-firmed that respondents experience a heavier cognitive

Table 3 Regression Results: Bayesian Mixed Logit Model with Fold-in, Fold-out Parameters

Patients General Public

Attributes and Domains Level Coefficient 95% CI Coefficient 95% CI

Fatigue Regularly 1.189 0.876 to 1.545 1.023 0.701 to 1.382Most times 2.103 1.662 to 2.652 2.377 1.856 to 2.980

Exacerbations Once a year 0.815 0.516 to 1.162 1.513 1.125 to 1.961Twice a year 2.243 1.744 to 2.881 2.642 2.074 to 3.312

SymptomsDyspnea at rest Regularly 0.404 –0.021 to 0.893 0.294 –0.167 to 0.816

Most times 1.496 0.872 to 2.302 0.824 0.198 to 1.509Dyspnea during exercise Regularly 0.330 –0.211 to 0.873 0.503 –0.064 to 1.087

Most times 0.470 –0.065 to 1.021 1.195 0.591 to 1.868Coughing Regularly 0.172 –0.297 to 0.623 0.810 0.373 to 1.310

Most times 0.465 –0.025 to 0.959 1.034 0.542 to 1.577Sputum Regularly 0.766 0.395 to 1.221 0.444 0.063 to 0.865

Most times 0.528 0.057 to 0.993 1.252 0.766 to 1.836LimitationsStrenuous physical activities Moderately 0.544 –0.049 to 1.107 0.578 –0.345 to 1.403

Severely 0.692 0.033 to 1.322 0.921 –0.058 to 1.850Moderate physical activities Moderately 0.763 0.239 to 1.333 0.858 0.100 to 1.832

Severely 1.287 0.629 to 2.039 1.692 0.722 to 2.997Daily activities Moderately 0.320 0.005 to 0.656 0.344 –0.085 to 0.816

Severely 1.028 0.540 to 1.589 1.099 0.364 to 1.866Social activities Moderately 0.245 –0.073 to 0.588 0.563 0.125 to 1.079

Severely 0.943 0.440 to 1.483 1.392 0.666 to 2.168Mental problemsFearing breathing problems Regularly 0.655 0.255 to 1.089 0.706 0.247 to 1.208

Most times 1.461 0.886 to 2.172 1.587 0.898 to 2.322Feeling depressed Regularly 0.057 –0.378 to 0.480 1.027 0.551 to 1.561

Most times 0.729 0.287 to 1.199 1.610 1.100 to 2.207Listlessness Regularly 0.440 0.008 to 0.872 0.320 –0.149 to 0.818

Most times 0.512 0.060 to 0.973 0.967 0.454 to 1.542Tense feeling Regularly 0.751 0.338 to 1.233 0.177 –0.296 to 0.642

Most times 0.885 0.312 to 1.526 1.024 0.381 to 1.728Worrying Regularly 0.381 –0.077 to 0.828 0.722 0.273 to 1.215

Most times 0.797 0.217 to 1.388 0.542 –0.105 to 1.227Adjustment parametersLambda symptoms 0.490 0.036 to 0.960 0.481 0.128 to 0.864Lambda limitations 1.254 0.754 to 1.817 0.542 0.178 to 0.940Lambda mental problems 0.387 0.047 to 0.764 0.392 0.194 to 0.708Phi (complexity parameter) –0.531 –0.654 to –0.387 –0.528 0.105 to –0.376

CI, credible interval around the estimate of the coefficient.

Table 4 Goodness of Fit: Comparison of Results perStatistical Method

VariableStandard MixedLogit Models

Bayesian Mixed Logitwith Additional Parameters

PatientsWAIC 4165 4239DIC 3760 3817

General publicWAIC 3790 3846DIC 3427 3445

DIC, deviance information criterion; WAIC, Watanabe information

criterion.

Goossens et al. 457

Page 10: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

burden when they are confronted with more variety inattributes and attribute levels. This was indicated by thecomplexity parameter in the regression results in bothsamples. The negative value shows that the utility scalefor folded-out choice tasks is smaller than the scale forfolded-in choice tasks. In other words, the coefficients fora DCE with only folded-out questions would have beenconsiderably smaller because respondents would be lesscertain of their choices. This does not mean that foldingin or folding out per se has an effect on the validity of theestimated weights for the ABC Index, only on the relia-bility. The ABC Index is based on a linear extrapolationof the regression results, so only the relative differencesbetween coefficients matter, not the absolute differences.

Our study also found that respondents attached moreweight to attributes when their domains were folded outor, equivalently, less weight to attributes in domains thatwere folded in. That result might have been predicted bysupport theory.24,25 According to support theory, respon-dents judge the probability of an event to be higher whenthe explicitness of the event description increases andmore details are given about possible versions of theevent. Support theory was developed in a very differentcontext. Our study is not about probabilities. However,the analogy is in the different reactions of respondents tooptions that are described more or less extensively.

We put forward 2 reasons why respondents attachedmore weight to attributes in folded-out domains. First,these domains could have more salience because of theway in which they are presented. Alternatively, respon-dents might react to a change in the format of the choicetask questions where all domains folded in were fol-lowed by questions with 1 folded-out domain. It is alsoplausible that both mechanisms work at the same time.Support theory supports the first explanation, withoutruling out the second one. This suggests that respon-dents focus less on folded-in domains than on folded-out domains, regardless of the contrast with earlierchoice sets. That would mean that the l-parameters,which model the folding status of domains, should beincorporated in the transformation of coefficients toindex scores, at least in part. Further research couldshed more light on this element of the choice behaviorof respondents.

Several potential solutions are now available for theproblem of attribute overload. FiFo has some potentialadvantages compared to HII and partial profiling. Itis conceptually simpler for respondents than HII.Furthermore, it does not require the creation of over-arching constructs, which consist of underlying attri-butes. This limits the applicability of HII. In our case,using overarching constructs would not have been

Table 5 ABC Index Scores for the Worst Possible Levels: Comparison of Results, per Statistical Model and Choice ofAdjustment Factors

Bayesian Mixed Logit Regression

Without Additional Parameters

With Additional Parameters

Variable No Adjustmenta

Full Adjustmenta

Partial Adjustmenta

Fatigue 14 14 9 11SymptomsDyspnea at rest 11 11 10 10Dyspnea during exercise 3 3 3 3Coughing 2 3 3 3Sputum 2 2 2 3

Mental problemsFearing breathing problems 10 10 9 9Feeling depressed 5 6 5 5Listlessness 2 3 2 2Tense feeling 5 5 5 5Worrying 6 5 5 5

LimitationsStrenuous physical activities 0 4 6 5Moderate physical activities 14 8 12 11Daily activities 8 7 11 9Social activities 6 7 10 9Exacerbations 12 12 8 10

a. Coefficients were not, fully, or partially adjusted to folded-out status by multiplying them with 1, 1 +l, or (1 +0.5 * l), respectively.

458 Medical Decision Making 39(4)

Page 11: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

sensible. However, Oppewal et al.10 have shown exam-ples where HII was useful.

In contrast with partial profiling, FiFo presentsrespondents with all attributes in all choice tasks, whichincreases the consistency and validity of regression esti-mates. It also takes into account differences in utilityscales across presentations.

Another approach that can be used in combinationwith or instead of any approach (FiFo, partial profiling,and HII) with moderately large numbers of attributes isan attribute-level overlap design. This means that, ineach choice set, several attributes have the same levelacross all alternatives. This makes the choice task easierfor respondents. With different attributes being ‘‘over-lapped’’ in different choice sets, this ensures that atten-tion is paid to all attributes instead of a limited set thatrespondents decided to focus on.26

FiFo has some limitations as well. First, it requiresthat all attributes within a domain have the same set oflevels. If this is not possible, they cannot be combined inFiFo domains.

A second limitation is that the extended analysis iscurrently not facilitated by widely available software,such as Nlogit, SAS, or Stata. Specific programming wasrequired to build the regression models that includedadditional parameters. Third, the same is true for creat-ing the fractional factorial design. We developed 4 sepa-rate designs in Ngene and combined these into blocksof choice tasks. The overall design would have beenmore efficient if it had been produced in 1 step. Thiswould have required additional programming in differ-ent software.

The main limitation of this study is that we did notdirectly compare the performance of FiFo to that of HIIand partial profiling. It would not have been possible inthe context of the current study, because no overarchingconstructs could be formed for the ABC Index. The com-parison could be the focus of further research.

In conclusion, using a combination of folded-in andfolded-out attributes seems to be a feasible approach forconducting a DCE with a high number of attributes. Theanalysis requires statistical models with additional para-meters to address the increased complexity and to pre-vent the potential underestimation of the weights forfolded-in attributes.

Acknowledgments

The authors thank 2 anonymous reviewers for their construc-tive and useful comments. Members of the of the research teamthat developed the ABC tool are as follows: G.M. Asijee

(Maastricht University, CAPHRI School for Public Health andPrimary care, Department of Family Medicine, Maastricht,The Netherlands & Foundation PICASSO for COPD,Alkmaar, The Netherlands), M.R.S. Boland (ErasmusUniversity, Erasmus School for Health Policy andManagement & institute for Medical Technology Assessment,Rotterdam, The Netherlands), G. van Breukelen (MaastrichtUniversity, CAPHRI School for Public Health and PrimaryCare, Department of Methodology & Statistics, Maastricht,The Netherlands), N.H. Chavannes (Leiden UniversityMedical Centre, Department of Public Health and PrimaryCare, Leiden, The Netherlands), P.N. Dekhuijzen (RadboudUniversity Medical Center, Department of PulmonaryDiseases, Nijmegen, The Netherlands), B. Donkers (ErasmusUniversity, Erasmus School of Economics, Department ofBusiness Economics, Rotterdam, The Netherlands), L.M.A.Goossens (Erasmus University, Erasmus School for Health

Policy and Management & institute for Medical TechnologyAssessment, Rotterdam, The Netherlands), M.F. Jonker(Erasmus University, Erasmus School for Health Policy andManagement & institute for Medical Technology Assessment,Rotterdam, The Netherlands), S. Holverda (Lung FoundationNetherlands, Amersfoort, The Netherlands), H.A.M. Kerstjens(University Medical Centre Groningen, Department ofPulmonary Diseases & Groningen Research Institute for Asthmaand COPD (GRIAC), Groningen, The Netherlands), D. Kotz(Maastricht University, CAPHRI School for Public Health andPrimary care, Department of Family Medicine, Maastricht, TheNetherlands & Institute of General Practice, Medical Faculty ofthe Heinrich-Heine-University Dusseldorf, Germany), T. van derMolen (University Medical Centre Groningen, Department ofGeneral Practice & Groningen Research Institute for Asthmaand COPD (GRIAC), Groningen, The Netherlands), MPMHRutten-van Molken (Erasmus University, Erasmus School forHealth Policy and Management & institute for MedicalTechnology Assessment, Rotterdam, The Netherlands), P.L.Salome (Huisartsencooperatie PreventZorg, Bilthoven, TheNetherlands), C.P. van Schayck (Maastricht University,CAPHRI School for Public Health and Primary care,Department of Family Medicine, The Netherlands), A.H.M.Slok (Maastricht University, CAPHRI School for Public Healthand Primary care, Department of Family Medicine, Maastricht,The Netherlands), E.A. Stolk (Euroqol Foundation, Rotterdam,The Netherlands, and Erasmus University, Erasmus School forHealth Policy and Management & institute for MedicalTechnology Assessment, Rotterdam, The Netherlands),M. Twellaar (Maastricht University, CAPHRI School for PublicHealth and Primary care, Department of Family Medicine, TheNetherlands), J.C.C.M. in’t Veen (Sint Franciscus Gasthuis,Department of Pulmonology, Rotterdam, The Netherlands).

Supplementary Material

Supplementary material for this article is available on theMedical Decision Making Web site at http://journals.sagepub.com/home/mdm.

Goossens et al. 459

Page 12: The Fold-in, Fold-out Design for DCE Choice Tasks: Application to … · 2019. 8. 8. · partial profile or a block) at a time. The omitted attri-butes are assumed to be equal across

ORCID iDs

Lucas M. A. Goossens https://orcid.org/0000-0003-4450-4887Elly A. Stolk https://orcid.org/0000-0001-5968-0416

References

1. Clark MD, Determann D, Petrou S, Moro D, de Bekker-Grob EW. Discrete choice experiments in health econom-ics: a review of the literature. Pharmacoeconomics. 2014;32:883–902.

2. Grutters JP, Joore MA, Kessels AG, Davis AC, AnteunisLJ. Patient preferences for direct hearing aid provision bya private dispenser: a discrete choice experiment. Ear Hear.

2008;29:557–64.3. Herbild L, Bech M, Gyrd-Hansen D. Estimating the Dan-

ish populations’ preferences for pharmacogenetic testingusing a discrete choice experiment. the case of treatingdepression. Value Health. 2009;12:560–7.

4. Koopmanschap MA, Stolk EA, Koolman X. Dear policymaker: have you made up your mind? A discrete choiceexperiment among policy makers and other health profes-sionals. Int J Technol Assess Health Care. 2010;26:198–204.

5. Goossens LM, Utens CM, Smeenk FW, Donkers B, vanSchayck OC, Rutten-van Molken MP. Should I stay orshould I go home? A latent class analysis of a discretechoice experiment on hospital-at-home. Value Health.2014;17:588–96.

6. Krabbe PF, Devlin NJ, Stolk EA, et al. Multinational evi-dence of the applicability and robustness of discrete choicemodeling for deriving EQ-5D-5L health-state values. Med

Care. 2014;52:935–43.7. Caussade S, de Dios Ortuzar J, Rizzi L, Hensher DA.

Assessing the influence of design dimensions on statedchoice experiment estimates. Transportation Research Part

B. 2005;39:621–40.8. Hensher DA. How do respondents process stated choice

experiments? Attribute consideration under varying infor-mation load. J Appl Econ. 2006;21:861–78.

9. Louviere JL. Hierarchical information integration: a newmethod for the design and analysis of complex multiattri-bute judgment problems. Adv Consumer Res. 1984;11:148–55.

10. Oppewal H, Louviere JL, Timmermans HJP. Modelinghierarchical conjoint processes with integrated choiceexperiments. J Market Res. 1994;31:92–105.

11. Witt J, Scott A, Osborne RH. Designing choice experi-ments with many attributes: an application to setting prio-rities for orthopaedic waiting lists. Health Econ. 2009;18:681–96.

12. Chrzan K. Using partial profile choice experiments to han-dle large numbers of attributes. Int J Market Res. 2010;52:827–40.

13. van Helvoort-Postulart D, Dellaert BG, van der WeijdenT, von Meyenfeldt MF, Dirksen CD. Discrete choice

experiments for complex health-care decisions: does hier-

archical information integration offer a solution? Health

Econ. 2009;18:903–20.14. Gray E, Eden M, Vass C, McAllister M, Louviere J, Payne

K. Valuing preferences for the process and outcomes of

clinical genetics services: a pilot study. Patient. 2016;9:

135–47.15. Goossens LM, Rutten-van Molken MP, Boland MR, et al.

ABC index: quantifying experienced burden of COPD in a

discrete choice experiment and predicting costs. BMJ Open.

2017;7:e017831.16. Slok AH, in ‘t, Veen JC, Chavannes NH, et al. Develop-

ment of the assessment of burden of COPD tool: an inte-

grated tool to measure the burden of COPD. NPJ Prim

Care Respir Med. 2014;24:14021.17. Slok AH, Kotz D, van Breukelen G, et al. Effectiveness of

the assessment of burden of COPD (ABC) tool on health-

related quality of life in patients with COPD: a cluster ran-

domised controlled trial in primary and hospital care. BMJ

Open. 2016;6:e011519.18. Jonker MF, Attema AE, Donkers B, Stolk EA, Versteegh

MM. Are health state valuations from the general public

biased? A test of health state reference dependency using

self-assessed health and an efficient discrete choice experi-

ment. Health Econ. 2017;26:1534–47.19. Jonker MF, Donkers B, de Bekker-Grob E, Stolk EA.

Attribute level overlap (and color coding) can reduce task

complexity, improve choice consistency, and decrease the

dropout rate in discrete choice experiments. Health Econ.

2019;28:350–63.

20. Dellaert BGC, Donkers B, van Soest A. Complexity effects

in choice experiment–based models. J Market Res. 2012;49:

424–34.21. Swait J, Adamowicz W. The influence of task complexity

on consumer choice: a latent class model of decision strat-

egy switching. J Consumer Res. 2001;28:135–48.22. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A.

Bayesian measures of model complexity and fit (with dis-

cussion). J R Stat Soc Series B Stat Methodol. 2002;64:

583–639.23. Watanabe S. Asymptotic equivalence of Bayes cross valida-

tion and widely applicable information criterion in singular

learning theory. J Mach Learn Res. 2010;11:3571–94.24. Tversky A, Koehler DJ. Support theory: a nonextensional

representation of subjective probability. Psychol Rev.

1994;101:547–67.25. Rottenstreich Y, Tversky A. Unpacking, repacking, and

anchoring: advances in support theory. Psychol Rev.

1997;104:406–15.26. Jonker MF, Donkers B, de Bekker-Grob EW, Stolk EA.

Effect of level overlap and color coding on attribute non-

attendance in discrete choice experiments. Value Health.

2018;21:767–71.

460 Medical Decision Making 39(4)