
Short and Precise Patient Self-Assessment of Heart Failure Symptoms Using a Computerized Adaptive Test (HF-CAT)

Rose et al: Heart Failure CAT

Matthias Rose MD PhD 1,2,3, Milena Anatchkova PhD 1, Jason Fletcher PhD 4,

Arthur E. Blank PhD 4, Jakob Bjørner MD PhD 5, Bernd Löwe MD PhD 3,

Thomas S. Rector PhD 6, John E. Ware PhD 1,7

1 Department of Quantitative Health Sciences, University of Massachusetts, Worcester, MA, USA; 2 Department of Psychosomatic Medicine, Charité – University Medicine Berlin, Germany; 3 University Medical Center Hamburg-Eppendorf and Schön Klinik Hamburg-Eilbek, Germany; 4 Department of Family and Social Medicine, Albert Einstein College of Medicine, Bronx, NY, USA; 5 3i QualityMetric, Lincoln, RI, USA; 6 VA Medical Center and Department of Medicine, University of Minnesota, Minneapolis, MN, USA; 7 John Ware Research Group, Incorporated, Worcester, MA, USA

Correspondence to: Matthias Rose, Department of Psychosomatic Medicine, Charité – University Medicine Berlin, Charitéplatz 1, 10117 Berlin, Germany; office +49 30 450 553002; fax +49 30 450 553989; [email protected]

Journal Subject Codes: 110

DOI: 10.1161/CIRCHEARTFAILURE.111.964916


Downloaded from http://circheartfailure.ahajournals.org/ by guest on June 23, 2018

Abstract

Background—Assessment of dyspnea, fatigue and physical disability is fundamental to the

monitoring of patients with heart failure (HF). A plethora of patient-reported measures exist,

but most are too burdensome or imprecise to be useful in clinical practice. New techniques

used for computer adaptive tests (CAT) may be able to address these problems. The purpose

of this study was to build a CAT for patients with HF.

Methods and Results—Item banks of 74 queries (‘items’) were developed to assess self-

reported physical disability, fatigue and dyspnea. All queries were administered to 658 adults

with HF to build three item banks. The resulting HF-CAT was administered to 100 additional HF patients (NYHA I 11%, II 53%, III&IV 36%). In addition, the physical function and

vitality domains of the SF-36 questionnaire, an established shortness-of-breath-scale (SOB),

and the Minnesota Living with Heart Failure Questionnaire (MLHFQ) were administered. The HF-CAT assessment took 3:09 ± 1:52 minutes to complete and score. All HF-CAT scales

demonstrated good construct validity through high correlations with the corresponding SF-36 physical function (r=-.87) and vitality (r=-.85) scales and with the SOB scale (r=.84). Simulation

studies showed a more precise measurement of all HF-CAT scales over a larger range than

comparable static tools. HF-CAT scales identified significant differences between patients

classified by the NYHA symptom criteria, similar to the MLHFQ.

Conclusions—A new CAT for HF patients was built using modern psychometric methods.

Initial results demonstrate its potential to increase the feasibility and precision of patient self-assessments of HF symptoms while minimizing respondent burden.

Clinical Trial Registration—URL: http://www.projectreporter.nih.gov. Unique identifier:

1R43HL083622-01.

Key Words: heart failure, patient-reported outcomes, computer adaptive tests


The cardinal manifestations of heart failure (HF) are dyspnea and fatigue, limited tolerance of

physical activity, fluid retention, pulmonary congestion and peripheral edema. Therefore, HF

is a clinical diagnosis that is largely based on physical examination and a careful history about

typical subjective symptoms in the presence of cardiac dysfunction (1). A patient-centered

measurement approach is particularly important in HF, to provide clinicians with tools to help

them to monitor the syndrome, to compare improvements under different forms of therapy,

and to identify risk of deterioration. The NYHA classification has been used for this purpose,

but has been criticized for its questionable reliability (2,3) and is rarely used outside clinical

studies or specialized units.

Generally, patient self-assessments have been shown to be the more reliable assessments of

subjective symptoms, which is one reason for a growing interest in subjective health status

measures among the scientific community, clinical practitioners, and industry (4,5). Self-assessed symptoms are used to predict declines in health status of patients

with HF (6), total expenses for HF care (7), hospitalization or even mortality (8,9). Their

widespread use has been recommended to increase quality of care (10), and 30% of all new

drug developments use Patient-Reported Outcomes (PROs) as their primary or co-primary

endpoint (11).

However, with traditional methods, a comprehensive and reliable ‘static’ measure is likely to

be long and time-consuming to administer and score. If questionnaire data need to be analyzed manually, assessments become cost-prohibitive for use in routine clinical practice, and individual patient reports cannot be provided in a timely manner. Short forms limit the respondent burden, but often show more ceiling or floor effects and lack the precision required at the


individual patient level (12,13). Measurement precision to guide individual decision-making

must be substantially higher than for group comparisons, because true change must be

separated from measurement error for every single assessment (13). For example, if a

confidence interval of 95% is required, a traditional tool with good psychometric properties

for group comparisons (e.g., with Cronbach's α = .80) would only allow for interpretation of

score differences of almost one standard deviation when used for an individual (14).

Moreover, classic psychometric methods cannot be used to determine the measurement

precision for an individual measurement. As a result, none of the existing tools has become a

standard measure in clinical practice (15,16). Enhancing the precision, accessibility and

interpretability of patient-reported outcome (PRO) measures could make heart failure

management more efficient and effective in meeting patient care needs.
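The reliability example above (Cronbach's α = .80, 95% confidence) can be checked with the classical standard error of measurement, SEM = SD × √(1 − reliability). The sketch below is a minimal illustration that assumes a score metric with SD = 10 and an interval around a single observed score (z = 1.96). Whether reference (14) framed the argument for one score or for the difference between two scores is not stated here; for a difference, the interval would be wider by a factor of √2.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def ci_half_width(sd: float, reliability: float, z: float = 1.96) -> float:
    """Half-width of a confidence interval around one individual's observed score."""
    return z * sem(sd, reliability)


SD = 10.0     # assumed score metric (e.g., T-scores with SD 10)
ALPHA = 0.80  # reliability typical of a tool adequate for group comparisons

half = ci_half_width(SD, ALPHA)
print(f"SEM = {sem(SD, ALPHA):.2f}; 95% CI = +/-{half:.2f} points (+/-{half / SD:.2f} SD)")
# SEM = 4.47; 95% CI = +/-8.77 points (+/-0.88 SD)
```

Under these assumptions the interval spans roughly ±0.9 SD around an individual's score, which is consistent with the "almost one standard deviation" stated in the text.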

In the present study, we apply computerized adaptive testing (CAT) methods, a measurement technology (17) that is widely used in educational testing (18). We aimed to build a system that allows routine, comprehensive assessment of pathognomonic symptoms. The use of CAT techniques also promises to provide more precise measures with

fewer items, and an effective resolution to the classic conflict between practicality and

precision faced by traditional measurement methodology (12). CATs tailor each assessment to

the individual’s status on what is being measured, applying only items which are most

appropriate for her/his current health status. Responses to each CAT-item direct the choice of

the following CAT-item towards the most informative for this particular assessment. A

patient indicating higher levels of disability in the first questions would subsequently be asked only about this level of ability. Omitting uninformative items that are not relevant for a given


functional limitation focuses the assessment, decreases the respondent burden, and increases

the measurement precision achievable with a given number of items.
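The item-selection logic described above is often implemented by administering, at each step, the unused item with the greatest Fisher information at the patient's current score estimate. The sketch below assumes a dichotomous two-parameter logistic (2PL) IRT model for brevity; item banks of this kind often use polytomous models, and the model and selection rule used for the HF-CAT are not specified in this section. All item names and parameters are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    a: float  # discrimination ("slope")
    b: float  # location on the latent trait (theta) scale


def p_endorse(theta: float, item: Item) -> float:
    """2PL probability of endorsing the item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(theta: float, item: Item) -> float:
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_endorse(theta, item)
    return item.a ** 2 * p * (1.0 - p)


def next_item(theta_hat: float, bank: list[Item], used: set[str]) -> Item:
    """Choose the unused item that is most informative at the current estimate."""
    candidates = [it for it in bank if it.name not in used]
    return max(candidates, key=lambda it: information(theta_hat, it))


bank = [
    Item("mild_1", a=1.5, b=-1.5),
    Item("moderate_1", a=2.0, b=0.0),
    Item("severe_1", a=1.8, b=1.5),
    Item("severe_2", a=1.2, b=2.5),
]
print(next_item(0.0, bank, set()).name)           # moderate_1
print(next_item(1.4, bank, {"moderate_1"}).name)  # severe_1
```

Once the estimate moves toward more impairment (here from θ = 0 to θ = 1.4), the next item is drawn from the more severe end of the bank. This is the focusing behaviour described in the text.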

CATs select the items out of a larger item bank representing the entire range of the construct

being measured. Most item banks are built on the principles of item response theory (IRT). The National Institutes of Health (NIH) is actively promoting the use of these

methods to develop a comprehensive Patient-Reported Outcomes Measurement Information

System (PROMIS) as part of their roadmap initiatives (http://nihroadmap.nih.gov/). Authors

of this paper are part of the PROMIS initiative, which aims to provide a standard assessment

for generic health status measures in the near future (19).

The goal of this study was to develop CATs for dyspnea, fatigue and physical function for the

assessment of patients with HF, and to evaluate their acceptability, precision and validity.

Methods

Development of the items

After a review of the relevant literature, we developed a set of 74 patient questions (items)

covering the three primary physical impairments commonly reported by patients with HF:

physical function/disability (24 items), dyspnea (30 items) and vitality/fatigue (20 items). The

queries were designed to be short enough to fit on a portable phone screen for home

assessments (Figure 1). Items were selected to represent the entire continuum of each aspect

of HF, from no impairment to severe impairment. All three item banks were scored so that higher scores indicate more impairment (i.e., more physical disability, fatigue, and dyspnea).


The item bank development was performed separately for each of the three domains of

physical function, dyspnea, and fatigue following the same procedures as described in

previous studies (20,21). After the item banks had been developed we used them as a basis for

a CAT. New software was developed to run on a personal digital assistant. The CAT logic can be set to stop once the measurement reaches a particular precision or once a maximum number of items has been administered. For this study phase, the CAT was set to assess each of the three domains until a standard error of SE < 3.3 was reached (corresponding to a reliability of Cronbach's α > .90 for samples with a standard deviation of 10) or a maximum of 7 items per scale had been administered.
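The stopping rule can be sketched as follows, assuming scores are reported on a T-score metric (mean 50, SD 10) and that reliability is approximated as 1 − (SE/SD)². Under this approximation, SE = 3.3 corresponds to a reliability of about .89, i.e., approximately the .90 cited above. The function and constant names are hypothetical.

```python
SE_TARGET = 3.3  # stop once SE on the T-score metric falls below this value
MAX_ITEMS = 7    # ...or once this many items have been administered for the scale
T_SD = 10.0      # assumed SD of the T-score metric


def reliability_from_se(se: float, sd: float = T_SD) -> float:
    """Approximate reliability implied by a standard error: 1 - (SE / SD)^2."""
    return 1.0 - (se / sd) ** 2


def should_stop(se_t: float, n_administered: int) -> bool:
    """Stop the scale when the precision target or the item cap is reached."""
    return se_t < SE_TARGET or n_administered >= MAX_ITEMS


print(round(reliability_from_se(SE_TARGET), 3))                       # 0.891
print(should_stop(3.1, 4), should_stop(4.0, 7), should_stop(4.0, 5))  # True True False
```

Whichever criterion is met first ends the scale, so well-targeted patients finish early and poorly targeted ones are capped at seven items.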

Participants

The data for the CAT item bank development (IB sample) were collected via the Internet from

English-speaking adults with HF. All respondents were recruited by YouGov. YouGov uses a

methodology called sample matching for the selection of study samples from pools of opt-in

respondents (22). Sample matching starts with an enumeration of the target population. For

patient recruitment, the target population comprises all adults whose sociodemographic characteristics resemble those of patients with a particular condition, as enumerated in consumer databases

(e.g. maintained by Acxiom, Experian, and InfoUSA). Then a random sample is drawn from

the target population. Finally, for each member of the target sample, a matching member of the internet pool of opt-in respondents is selected, resulting in a “matched sample”. Matching was based on age, gender and race. The resulting matched sample has characteristics similar to those of the target population and will have properties similar to those of a true random sample. For this study, 14,028 adults were approached until the target number of patients with heart failure had been enrolled. All newly developed items were administered in random order.
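The matching step can be illustrated with a simple greedy nearest-neighbour rule: exact matching on gender and race, closest age, and no reuse of pool members. This is only a sketch under those assumptions; the vendor's actual distance measure is not described here, and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    gender: str
    race: str


def match_sample(target: list[Person], pool: list[Person]) -> dict[str, str]:
    """Greedy nearest-neighbour matching without replacement.

    Each target member is matched to the unused pool member with the same
    gender and race whose age is closest; unmatched targets are skipped.
    """
    available = list(pool)
    matches: dict[str, str] = {}
    for t in target:
        candidates = [p for p in available if p.gender == t.gender and p.race == t.race]
        if not candidates:
            continue  # no eligible opt-in respondent left for this target member
        best = min(candidates, key=lambda p: abs(p.age - t.age))
        matches[t.pid] = best.pid
        available.remove(best)
    return matches


target = [Person("t1", 62, "F", "White"), Person("t2", 55, "M", "Black")]
pool = [Person("p1", 70, "F", "White"), Person("p2", 61, "F", "White"),
        Person("p3", 54, "M", "Black"), Person("p4", 56, "M", "White")]
print(match_sample(target, pool))  # {'t1': 'p2', 't2': 'p3'}
```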


The same data collection method and vendor have been used for many similar projects, including an NIH Roadmap initiative for the development of generic PRO tools (www.nihpromis.org). To ensure a sufficient distribution of responses for item parameter estimation, we used quotas of one third each for patients with mild, moderate, and severe impairment, based on a single screening question describing the level of impairment analogous to NYHA classes I, II, and III.

To help ensure the quality of the data, we applied the following exclusion criteria: (a) the average answering time per item was less than 5 seconds, (b) the respondent did not report both HF and at least one underlying cause of HF, (c) the respondent did not indicate that the HF diagnosis had been made by a physician, (d) the last visit to a physician was more than 6 months ago, or (e) current medication did not include at least one drug used for the treatment of HF (diuretics, ACEI or ARB, β-blockers, digoxin).
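Criteria (a) to (e) can be expressed as a single screening predicate. The record fields and drug-class codes below are hypothetical stand-ins for the survey variables, which are not described in detail here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Drug classes listed in criterion (e); the labels are hypothetical codes.
HF_DRUG_CLASSES = {"diuretic", "acei", "arb", "beta_blocker", "digoxin"}


@dataclass
class Respondent:
    mean_seconds_per_item: float
    reports_hf: bool
    reports_hf_cause: bool
    physician_diagnosed: bool
    months_since_last_visit: float
    drug_classes: set[str] = field(default_factory=set)


def is_excluded(r: Respondent) -> bool:
    """Return True if any of the exclusion criteria (a)-(e) applies."""
    return (
        r.mean_seconds_per_item < 5                   # (a) answered too quickly
        or not (r.reports_hf and r.reports_hf_cause)  # (b) HF or underlying cause not reported
        or not r.physician_diagnosed                  # (c) diagnosis not made by a physician
        or r.months_since_last_visit > 6              # (d) last physician visit > 6 months ago
        or not (r.drug_classes & HF_DRUG_CLASSES)     # (e) no HF medication reported
    )


eligible = Respondent(12.0, True, True, True, 2, {"diuretic", "acei"})
too_fast = Respondent(3.5, True, True, True, 2, {"diuretic"})
print(is_excluded(eligible), is_excluded(too_fast))  # False True
```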

To examine the characteristics of the HF-CAT, several simulation studies were conducted as described previously (20,23). These analyses were based on the actual responses to all items in the bank provided by the patients in the online survey. Only small subsets of those item responses were used to estimate the patient score in the CAT simulation (in IRT terms, the ‘theta score’). The quality of the items in the bank determines the precision of the score at different levels of the construct. The ‘test information curve’ identifies floor and ceiling effects and shows whether the measurement range of the tool matches the symptom levels of the sample. To illustrate this for the HF-CAT, the precision of the score estimate was plotted as a function of the patient scores (20).
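The test information curve mentioned above is the sum of the item information functions, and the standard error of the score estimate is its inverse square root. The sketch below assumes 2PL items with hypothetical parameters (not the HF-CAT calibration) and reports the SE on a T-score metric (T = 50 + 10θ). Floor and ceiling effects show up as a rising SE at either end of the range.

```python
import math

# Hypothetical 2PL parameters (a = slope, b = location); not the HF-CAT calibration.
ITEMS = [(1.8, -1.0), (2.2, -0.3), (2.0, 0.4), (1.6, 1.1), (1.4, 1.8)]


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of one 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def total_information(theta: float) -> float:
    """Test information: sum of item information at theta."""
    return sum(item_information(theta, a, b) for a, b in ITEMS)


def se_t(theta: float) -> float:
    """Standard error of the score estimate on the T metric: 10 / sqrt(I(theta))."""
    return 10.0 / math.sqrt(total_information(theta))


for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(f"theta={theta:+.1f}  T={50 + 10 * theta:.0f}  SE_T={se_t(theta):.1f}")
```

Plotting `se_t` over a fine grid of θ values gives the kind of precision curve described in the text; the region where it falls below a chosen threshold is the range of precise measurement.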

To evaluate the construct validity of the HF-CAT, items from the following established tools

were also included in the data collection: the SF-36® Health Survey scales for Physical

Functioning (PF) and Vitality (VT) (24), four items from the Medical Health Outcomes


Survey (HOS) to assess Shortness of Breath (SOB) (25) and the Minnesota Living with Heart

Failure Questionnaire (26) (MLHFQ, 21 items) as a legacy tool for measuring HF as indicated

by patients’ perceptions of its overall effects on their lives.

A separate sample of 100 consecutive participants was recruited for the validity test

conducted at the heart failure clinic of the Montefiore Medical Center, Bronx, NY (MMC

sample). The clinic was selected because it does not usually use PRO assessments and predominantly serves a low-income, diverse population. We considered this environment particularly challenging for testing a new technology, assuming relatively low health literacy levels. In addition, we felt that an evaluation of psychometric properties would be more relevant in a less educated sample, as the validity of the IRT assumptions had already been evaluated in the development sample, which was affluent and well-educated (Table 1). Patients

with previously diagnosed heart failure were invited to participate in the study. Consenting

participants were asked to complete the actual HF-CAT on a hand-held computer (Personal Digital Assistant, PDA) and a series of paper-and-pencil assessments, including sociodemographic questions, the MLHFQ, and a survey evaluating their experience with the HF-CAT. All participants completed both instruments. Participants were randomly assigned to

one of two groups within a cross-over design where the order of presentation of the HF-CAT

assessment and the MLHFQ was counterbalanced. Patients were placed in the waiting area

and asked to follow the standard instructions provided for each measure.
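The counterbalanced cross-over allocation can be sketched as a random split into two presentation orders. Forcing the two groups to be of equal size and using a fixed seed are assumptions made for illustration; the text does not state how the randomization was implemented, and the helper name is hypothetical.

```python
import random

ORDERS = (("HF-CAT", "MLHFQ"), ("MLHFQ", "HF-CAT"))


def assign_orders(participant_ids, seed=None):
    """Randomly split participants into two (near-)equal groups, one per presentation order."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {pid: ORDERS[0] if i < half else ORDERS[1] for i, pid in enumerate(ids)}


plan = assign_orders(range(1, 101), seed=2024)
n_cat_first = sum(order[0] == "HF-CAT" for order in plan.values())
print(n_cat_first)  # 50
```

Because each order is used by half of the sample, any systematic effect of completing one instrument first can be tested as an order effect rather than confounding the comparison.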

Medical information, including the NYHA class, was extracted from the medical files. The

NYHA class is determined routinely for all patients at every visit at the MMC Heart Failure

Clinic based on the clinical assessment of the treating physician. The NYHA class was


determined without knowledge of the results of patient self-assessments. Patients gave written

informed consent and received a $25 incentive for their participation in the study.

Results

Samples

After applying the inclusion and exclusion criteria, the final item development sample (IB sample) consisted of 658 participants aged 60 ± 13 years (49% female) who had experienced HF for 8.8 ± 7.9 years (Table 1). Patients reported the following conditions besides their HF:

43% coronary heart disease, 42% previous heart attacks, 18% cardiomyopathy, 14% valvular

heart disease, 5.2% rheumatic fever, 60% hypertension, 31% arrhythmias, 40% diabetes.

Alcohol abuse was reported by 5.9%.

The Montefiore Medical Center clinical sample (MMC sample, n=100) was predominantly

male (62%), with a mean age of 58 years. The sample was diverse, with a majority of African-American patients and a large proportion of Hispanic patients. One third of the sample had a comparatively low household income. According to the New York Heart Association (NYHA) classification, 11% of patients were in class I, 53% in class II, and 36% in class III or IV.

HF-CAT Development

Item Bank Development: In the final calibrated item banks, there were 21 items assessing Physical Disability, 20 items assessing Fatigue, and 29 items in the Dyspnea bank with satisfactory item fit (Table 2). The most informative items (i.e., those with high discrimination parameters, or ‘slopes’) were the item asking about the ability to run errands, an item referring to a feeling of


being “worn out”, and the item asking whether the patient becomes short of breath walking from one room to another.

Simulation Studies: The precision of every score estimate can be displayed as a function of

the level of function, or the severity of the symptoms. The results of the simulation studies

showed that a highly precise score (comparable to an internal consistency of >.90) can be estimated with 5 items for each domain over a range of nearly three SDs (Figure 2, left side).

The concordance between the results of the CATs and the entire item bank was very good for

all of the constructs as illustrated by the extremely high correlations (r=0.95-0.97), showing

that the 5-item CAT can essentially capture the information provided by the entire bank. As expected, there were high correlations between the simulated CAT scale scores and the corresponding SF-36 Health Survey Physical Function (r=-.87) and Vitality (r=-.84) scales, as well as the static Shortness of Breath measure (r=.83). Compared with all legacy tools, the HF-CAT provides more precise measurement over a larger measurement range (Figure 2, right side). For Physical Disability, measurement precision similar to that of the SF-36 Physical Function scale can be achieved with half the number of items (Figure 2, upper left corner).

HF-CAT Evaluation

Respondent burden: On average, 4-5 items were administered for the assessment of physical disability, fatigue and dyspnea to achieve the predefined level of precision (Table 3). The average time for administration of the entire HF-CAT with all three domains was 3 minutes (3 ± 2 min).


Validity: We used the MLHFQ to help evaluate the constructs of the HF-CAT and the NYHA

class to evaluate its discriminative validity (Table 3). The mean MLHFQ score of the sample was 38 ± 25, and the mean HF-CAT scores were 59.6 ± 8.4 for Physical Disability, 52.6 ± 8.5 for Fatigue, and 54.8 ± 13.3 for Dyspnea. There were no order effects for any measure. The HF-CAT scales for physical disability, fatigue, and dyspnea correlated significantly with the MLHFQ total score (r = 0.71, r = 0.63, and r = 0.68, respectively).

A general linear model was used to evaluate the ability of the HF-CAT scales to statistically

differentiate patients with different levels of symptom severity as measured by the clinician’s

NYHA classification (Table 3). The main effects for all the measures were significant, with

very similar discriminative ability (Eta², F-values) for the HF-CAT Physical Disability and

Dyspnea scales, and the MLHFQ scale.
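With a single grouping factor such as NYHA class, the general linear model reduces to a one-way analysis of variance: F compares between-group with within-group variance, and η² is the share of total variance explained by group membership. The sketch below uses made-up scores purely to show these computations; the actual model specification (e.g., any covariates) is not described here.

```python
from __future__ import annotations

from statistics import mean


def one_way_anova(groups: list[list[float]]) -> tuple[float, float]:
    """Return (F, eta_squared) for a one-way between-groups design."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    k, n = len(groups), len(scores)
    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, eta_squared


# Hypothetical HF-CAT T-scores for NYHA I, II and III/IV (illustration only).
nyha_groups = [[50, 52, 54], [58, 60, 62], [66, 68, 70]]
f_value, eta_sq = one_way_anova(nyha_groups)
print(f"F = {f_value:.1f}, eta^2 = {eta_sq:.3f}")  # F = 48.0, eta^2 = 0.941
```

Comparing η² and F across the HF-CAT scales and the MLHFQ, as in Table 3, is a way of comparing how strongly each measure separates the NYHA groups.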

User Experience: Because this study took place in a low-income, less educated, minority population, we were particularly interested in the subjective user experience with a computer assessment. Overall, 98% of the patients found the HF-CAT assessment very easy or easy, 100% thought it was very easy or easy to follow the instructions, and 95% said it was very easy or easy to read the questions on the screen. Ninety-eight percent judged the time for the assessment as ‘just right’, 90% considered the questions relevant, and 98% were willing to use the device again at their next visit.


Discussion

For the first time, we applied computerized adaptive testing methods to develop and evaluate

an ultra-short assessment system for patients with HF (HF-CAT) in clinical practice. The tool

allows routine, comprehensive assessment of three primary problems that are commonly

experienced by patients with heart failure. If the emotional or social impact of the disease is of

additional interest, further tools (e.g., from PROMIS) need to be added for a

comprehensive coverage of the health-related quality of life construct.

Feasibility

The feasibility of the HF-CAT in its PDA version was evaluated in a low-income, less educated minority population in the Bronx, NY, where the HF-CAT proved to be a practical and well-accepted tool. Nevertheless, it was tested under study conditions, and participants' responses might have been biased by the incentive they received for participating. To our knowledge, only one report about the acceptance of CATs within clinical practice settings is available. A similar CAT, also displayed on a PDA, has been in routine clinical use since 2004. Patients answering this CAT also report high acceptability. Almost all of the 423 consecutive patients considered the handling easy and felt that the use of the PDA made

sense (27).

Several other studies report on the reception of CATs under study conditions. The majority of patients in a feasibility test of a pain CAT found the CAT application to be useful, relevant, of appropriate length, and easy to complete (28). Similarly, the majority of respondents in a feasibility study of an asthma impact CAT found it easy to complete and of appropriate length (29). A feasibility test of a diabetes CAT gave somewhat mixed results. While both English-speaking and Spanish-speaking participants agreed that a paper-and-pencil


assessment was more burdensome than a CAT, the Spanish-speaking participants preferred

the paper tool and were more willing to complete a paper tool in the future (30).

Respondent Burden

One important contribution of computer adaptive testing technology will be to reduce the

respondent burden without compromising the precision and validity of the assessment, by

tailoring each assessment to the patient’s condition. This advantage was demonstrated earlier,

for example, in a simulation study of the Activities of Daily Living CAT, which found that

the CAT provided similar results to a static version while reducing the number of items

administered by 50% (31). Results from other studies indicate that scores similar to those

obtained with full-length item banks (ranging in length from 18 to 585 items) can be achieved

through much shorter CATs when measuring functional status (32-34), mental health status

(21,27,35,36) or the impact of conditions like headache (23,37), diabetes (30), chronic

pain (28), and asthma (29). Most current CAT applications use 5 to 7 items to measure one construct. The present HF-CAT applied 4 to 5 items per scale, and the average total time for the entire assessment and scoring was 3 min, i.e., 1 min per scale (each of which could be applied individually). The assessment time of the MLHFQ, measured electronically in a previous study, was 4 ± 2 min (38), and the time to administer the Kansas City Cardiomyopathy Questionnaire (KCCQ), another common tool for the assessment of HF patients, is reported to be 4-6 minutes without scoring (39).


In summary, the HF-CAT provides a precise measure over a large measurement range with

minimal respondent burden. As far as is known today, CATs appear to offer an effective

resolution to the classic conflict between practicality and precision faced by traditional

measurement technology (12).

Validity

Studies of CAT applications in conditions such as depression (27,35) or headache (40) have

shown that their measurement advantages can transfer to increased validity in identifying

differences between groups known to differ in clinical characteristics, compared to static

tools. The three scales of the HF-CAT also discriminated between patients in different NYHA classes as well as a legacy tool measuring the impact of heart failure that uses four times as many items. These initial results show that the HF-CAT has the

potential to provide a valid, highly relevant assessment of patients with heart failure.

Serial Measurements

For the assessment of HF patients, we believe it is important to assess the health status of the

patient at the point of care as well as at the patient’s home. As many elderly patients do not have access to the internet or are not familiar with its use, one way to do so is to use smartphones and/or interactive voice recognition. Most established tools include items that are not suitable for use over the phone. IRT methods allow the use of much simpler items over the phone and more comprehensive items at the doctor’s office, with both assessments scored on the same measurement metric. This makes it possible to administer the HF-CAT by smartphone at the patient’s home and to have the same patient answer the more comprehensive PROMIS-CAT on a tablet PC at the doctor’s office. IRT-based measurements of health outcomes are independent of the particular items administered and of the test


administrator. The same value for the same domain yields the same interpretation, whereas

results from different traditional tools cannot be compared directly, making serial health status

monitoring less practicable.
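The common-metric property can be illustrated with expected a posteriori (EAP) scoring: any subset of calibrated items, whether a short phone set or a longer office set, yields a score on the same θ scale. The sketch assumes dichotomous 2PL items, a standard normal prior, and a T-score report (T = 50 + 10θ). The item names and parameters are hypothetical, and EAP is one common estimator, not necessarily the one used for the HF-CAT.

```python
import math

# Hypothetical calibrated bank (slope a, location b), all on one theta metric.
BANK = {
    "phone_1": (1.6, -0.5), "phone_2": (1.9, 0.5), "phone_3": (1.4, 1.2),
    "clinic_1": (2.1, -0.2), "clinic_2": (1.7, 0.8), "clinic_3": (2.3, 1.5),
}
GRID = [x / 20 for x in range(-80, 81)]  # quadrature points for theta from -4 to 4


def eap_t_score(responses: dict) -> float:
    """EAP estimate of theta from any subset of bank items, reported as T = 50 + 10*theta."""
    num = den = 0.0
    for th in GRID:
        weight = math.exp(-0.5 * th * th)  # standard normal prior (unnormalized)
        for name, x in responses.items():
            a, b = BANK[name]
            p = 1.0 / (1.0 + math.exp(-a * (th - b)))
            weight *= p if x == 1 else 1.0 - p  # likelihood of the observed response
        num += th * weight
        den += weight
    return 50.0 + 10.0 * num / den


home = eap_t_score({"phone_1": 1, "phone_2": 1, "phone_3": 0})
office = eap_t_score({"clinic_1": 1, "clinic_2": 1, "clinic_3": 0})
print(round(home, 1), round(office, 1))
```

The two estimates come from entirely different items but are expressed on the same T metric, so they can be compared directly in serial monitoring.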

Limitations

Despite many encouraging findings with recent CAT developments, a number of issues still

need to be addressed. In this study, we evaluated the HF-CAT only in outpatients, which limits generalizability to more severely disabled patients. However, one of the

most relevant advantages of CATs is that they can essentially eliminate floor and ceiling

effects by applying items tailored to the test-taker. Our simulation studies have shown that the

current item bank covers more than three standard deviations above the population mean,

which is where a hospitalized population of HF patients usually scores.

We did not evaluate the test-retest reliability for the HF-CAT. Similarly, we have not used the

HF-CAT in an intervention study to test its responsiveness to treatments. However, several

studies have reported on the ability of other CATs to detect change. For example, in a

telephone study of 540 headache patients, a CAT for headache impact was demonstrated to be

more responsive to self-evaluated changes of headache impact than a corresponding 54-item

bank (23). In a longitudinal, prospective cohort study of 94 patients discharged from inpatient

rehabilitation, the CAT version of the Activity Measure for Post-Acute Care was found to be

comparable in responsiveness to the 66-item static version (41). Similarly, in a series of

articles, Hart et al. report on the results of validation studies of condition-specific CATs,

using large data sets from patients receiving rehabilitation services across multiple U.S.

clinics (33,34).


Summary

In summary, we have developed a promising method for measuring patient-reported dyspnea, fatigue and physical function in the care of patients with heart failure. This new measure is part of a rapidly growing number of assessment tools that use item response theory and computerized adaptive testing (16,19,42), some of which are already used in clinical practice (27,43). However, whether these encouraging improvements in measurement translate into improved care and, ultimately, better health for patients with heart failure requires further study.

Sources of Funding

This work was supported in part by an NIH/NHLBI grant (1 R43 HL083622-01, PI Rose).

Disclosures

None.

References

1. Hunt SA, Baker DW, Chin MH, Cinquegrani MP, Feldman AM, Francis GS, Ganiats TG, Goldstein S, Gregoratos G, Jessup ML, Noble RJ, Packer M, Silver MA, Stevenson LW, Gibbons RJ, Antman EM, Alpert JS, Faxon DP, Fuster V, Jacobs AK, Hiratzka LF, Russell RO, Smith SC, Jr.: ACC/AHA guidelines for the evaluation and management of chronic heart failure in the adult: executive summary. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2001;38:2101-2113.

2. Bennett JA, Riegel B, Bittner V, Nichols J: Validity and reliability of the NYHA classes for measuring research outcomes in patients with cardiac disease. Heart Lung. 2002;31:262-270.



3. Goldman L, Hashimoto B, Cook EF, Loscalzo A: Comparative reproducibility and validity of systems for assessing cardiovascular functional class: advantages of a new specific activity scale. Circulation. 1981;64:1227-1234.

4. Lett HS, Blumenthal JA, Babyak MA, Sherwood A, Strauman T, Robins C, Newman MF: Depression as a risk factor for coronary artery disease: evidence, mechanisms, and treatment. Psychosom Med. 2004;66:305-315.

5. Konstam V, Moser DK, De Jong MJ: Depression and anxiety in heart failure. J Card Fail. 2005;11:455-463.

6. Rumsfeld JS, Havranek E, Masoudi FA, Peterson ED, Jones P, Tooley JF, Krumholz HM, Spertus JA: Depressive symptoms are the strongest predictors of short-term declines in health status in patients with heart failure. J Am Coll Cardiol. 2003;42:1811-1817.

7. Sullivan M, Simon G, Spertus J, Russo J: Depression-related costs in heart failure care. Arch Intern Med. 2002;162:1860-1866.

8. Rumsfeld JS, Jones PG, Whooley MA, Sullivan MD, Pitt B, Weintraub WS, Spertus JA: Depression predicts mortality and hospitalization in patients with myocardial infarction complicated by heart failure. Am Heart J. 2005;150:961-967.

9. Junger J, Schellberg D, Muller-Tasch T, Raupp G, Zugck C, Haunstetter A, Zipfel S, Herzog W, Haass M: Depression increasingly predicts mortality in the course of congestive heart failure. Eur J Heart Fail. 2005;7:261-267.

10. Cleary PD, Edgman-Levitan S: Health care quality. Incorporating consumer perspectives. JAMA. 1997;278:1608-1612.

11. Burke L: FDA perspectives on IRT/CAT. DIA Workshop on Advances in Health Outcomes Measurement: Exploring the Current State and the Future Applications of Item Response Theory, Item Banks, and Computer-adaptive Testing. Bethesda, June 25, 2004.

12. McHorney CA, Cohen AS: Equating health status measures with item response theory: illustrations with functional status items. Med Care. 2000;38:II43-II59.

13. Rose M, Bezjak A: Logistics of collecting patient-reported outcomes (PROs) in clinical practice: an overview and practical examples. Qual Life Res. 2009;18:125-136.

14. McHorney CA, Tarlov AR: Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4:293-307.

15. Rector TS: A conceptual model of quality of life in relation to heart failure. J Card Fail. 2005;11:173-176.

16. Garin O, Ferrer M, Pont A, Rue M, Kotzeva A, Wiklund I, Van GE, Alonso J: Disease-specific health-related quality of life questionnaires for heart failure: a systematic review with meta-analyses. Qual Life Res. 2009;18:71-85.



17. Bjorner JB, Chang CH, Thissen D, Reeve BB: Developing tailored instruments: item banking and computerized adaptive assessment. Qual Life Res. 2007;16 Suppl 1:95-108.

18. Wainer H, Dorans NJ, Eignor D, Flaugher R, Green BF, Mislevy RJ, Steinberg L, Thissen D: Computerized Adaptive Testing: A primer. Mahwah, NJ, Lawrence Erlbaum Associates, 2000.

19. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries JF, Bruce B, Rose M: The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007;45:S3-S11.

20. Rose M, Bjorner JB, Becker J, Fries JF, Ware JE: Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol. 2008;61:17-33.

21. Fliege H, Becker J, Walter OB, Bjorner JB, Klapp BF, Rose M: Development of a computer-adaptive test for depression (D-CAT). Qual Life Res. 2005;14:2277-2291.

22. Rubin DB: Matched Sampling for Causal Effects. New York, Cambridge University Press, 2006.

23. Ware JE, Jr., Kosinski M, Bjorner JB, Bayliss MS, Batenhorst A, Dahlof CG, Tepper S, Dowson A: Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Qual Life Res. 2003;12:935-952.

24. Ware JE, Jr., Dewey J: How to Score Version Two of the SF-36 Health Survey. Lincoln, RI, QualityMetric Incorporated, 2000.

25. National Committee for Quality Assurance: Specifications for the Medicare Health Outcomes Survey. HEDIS® 6. Washington, DC, National Committee for Quality Assurance, 2004.

26. Rector T, Cohn J: Patients' self-assessment of their congestive heart failure. Part 2: Content, reliability and validity of a new measure, the Minnesota Living with Heart Failure questionnaire. Heart Failure. 1987;3:198-209.

27. Fliege H, Becker J, Walter OB, Rose M, Bjorner JB, Klapp BF: Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application. Int J Methods Psychiatr Res. 2009;18:23-36.

28. Anatchkova MD, Saris-Baglama RN, Kosinski M, Bjorner JB: Development and preliminary testing of a computerized adaptive assessment of chronic pain. J Pain. 2009;10:932-943.

29. Turner-Bowker DM, Saris-Baglama RN, Anatchkova M, Mosen DM: A Computerized Asthma Outcomes Measure Is Feasible for Disease Management. Am J Pharm Benefits. 2010;2:119-124.



30. Schwartz C, Welch G, Santiago-Kelley P, Bode R, Sun X: Computerized adaptive testing of diabetes impact: a feasibility study of Hispanics and non-Hispanics in an active clinic population. Qual Life Res. 2006;15:1503-1518.

31. Chien TW, Wu HM, Wang WC, Castillo RV, Chou W: Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation. Health Qual Life Outcomes. 2009;7:39.

32. Haley SM, Gandek B, Siebens H, Black-Schaffer RM, Sinclair SJ, Tao W, Coster WJ, Ni P, Jette AM: Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes. Arch Phys Med Rehabil. 2008;89:275-283.

33. Hart DL, Wang YC, Stratford PW, Mioduski JE: Computerized adaptive test for patients with knee impairments produced valid and responsive measures of function. J Clin Epidemiol. 2008;61:1113-1124.

34. Hart DL, Werneke MW, Wang YC, Stratford PW, Mioduski JE: Computerized adaptive test for patients with lumbar spine impairments produced valid and responsive measures of function. Spine (Phila Pa 1976). 2010;35:2157-2164.

35. Gibbons RD, Weiss DJ, Kupfer DJ, Frank E, Fagiolini A, Grochocinski VJ, Bhaumik DK, Stover A, Bock RD, Immekus JC: Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatr Serv. 2008;59:361-368.

36. Walter OB, Becker J, Bjorner JB, Fliege H, Klapp BF, Rose M: Development and evaluation of a computer adaptive test for 'Anxiety' (A-CAT). Qual Life Res. 2007;16 Suppl 1:143-155.

37. Bayliss MS, Dewey JE, Dunlap I, Batenhorst AS, Cady R, Diamond ML, Sheftell F: A study of the feasibility of Internet administration of a computerized health survey: the headache impact test (HIT). Qual Life Res. 2003;12:953-961.

38. Bennett SJ, Oldridge NB, Eckert GJ, Embree JL, Browning S, Hou N, Chui M, Deer M, Murray MD: Comparison of quality of life measures in heart failure. Nurs Res. 2003;52:207-216.

39. Green CP, Porter CB, Bresnahan DR, Spertus JA: Development and evaluation of the Kansas City Cardiomyopathy Questionnaire: a new health status measure for heart failure. J Am Coll Cardiol. 2000;35:1245-1255.

40. Martin M, Kosinski M, Bjorner JB, Ware JE, Jr., Maclean R, Li T: Item response theory methods can improve the measurement of physical function by combining the modified health assessment questionnaire and the SF-36 physical function scale. Qual Life Res. 2007;16:647-660.

41. Haley SM, Fragala-Pinkham M, Ni P: Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme. Clin Rehabil. 2006;20:616-622.



42. Ruo B, Choi SW, Baker DW, Grady KL, Cella D: Development and validation of a computer adaptive test for measuring dyspnea in heart failure. J Card Fail. 2010;16:659-668.

43. Becker J, Fliege H, Kocalevent RD, Bjorner JB, Rose M, Walter OB, Klapp BF: Functioning and validity of a Computerized Adaptive Test to measure anxiety (A-CAT). Depress Anxiety. 2008;25:E182-E194.



Table 1. Characteristics of the Samples

| Characteristic | HF-CAT development (IB sample) | HF-CAT evaluation (MMC sample) |
|---|---|---|
| Total sample | n = 658 | n = 100 |
| Age, mean (SD) | 60 (13) | 58 (12) |
| Years with HF, mean (SD) | 8.8 (7.9) | 4.6 (4.5) |
| Family status | | |
| Living in partnership | 78% | 54% |
| Living alone | 21% | 33% |
| Gender: female | 49% | 38% |
| Ethnicity: Hispanic or Latino | 4% | 35% |
| Race | | |
| White | 93% | 19% |
| African American | 3% | 46% |
| Other | 4% | 35% |
| Education | | |
| 8th grade or less | 0.1% | 13% |
| Some high school | 3% | 21% |
| High school graduate | 15% | 25% |
| Some college | 39% | 24% |
| College graduate | 22% | 11% |
| Postgraduate | 20% | 5% |
| Household income | | |
| Less than $5,000 | 1% | 11% |
| $5,001 to $20,000 | 18% | 22% |
| $20,001 to $45,000 | 32% | 15% |
| $45,001 to $75,000 | 23% | 10% |
| More than $75,000 | 17% | 5% |
| Prefer not to answer | 9% | 37% |
| Employment status | | |
| Student | 0.3% | 4% |
| Working at a paying job | 22% | 23% |
| Retired | 56% | 47% |
| Laid off or unemployed | 3% | 2% |
| Full-time homemaker | 7% | 9% |
| Other | 11% | 11% |




Table 2. IRT Item Parameters of the HF-CAT Item Banks

Physical Disability

| Item | Response format | Slope | Mean threshold | Threshold 1 | Threshold 2 | Threshold 3 | Threshold 4 |
|---|---|---|---|---|---|---|---|
| Exercising hard for half an hour | 1 | 2.549 | 0.556 | -0.123 | 1.236 | | |
| Doing an hour of physical labor | 1 | 2.810 | 0.625 | 0.072 | 1.177 | | |
| Walking up a steep hill | 1 | 3.558 | 0.748 | -0.154 | 1.650 | | |
| Rearranging furniture at home | 1 | 3.952 | 1.136 | 0.610 | 1.663 | | |
| Doing chores | 1 | 4.252 | 1.391 | 0.765 | 2.017 | | |
| Doing daily physical activities | 2 | 3.432 | 1.492 | 0.537 | 1.191 | 1.715 | 2.523 |
| Climbing up a flight of stairs | 1 | 3.728 | 1.535 | 0.824 | 2.246 | | |
| Doing daily physical activities | 1 | 4.092 | 1.583 | 0.881 | 2.285 | | |
| Carrying two bags of groceries | 1 | 3.800 | 1.595 | 1.132 | 2.058 | | |
| Walking on flat ground | 1 | 5.247 | 1.621 | 1.621 | * | | |
| Preparing a meal | 1 | 5.554 | 1.643 | 1.643 | * | | |
| Walking one hundred yards | 1 | 3.977 | 1.648 | 1.158 | 2.137 | | |
| Standing up from a chair | 1 | 3.837 | 1.814 | 1.814 | * | | |
| Running errands and shopping | 1 | 5.713 | 1.826 | 1.236 | 2.417 | | |
| Dressing myself | 1 | 4.825 | 1.832 | 1.832 | * | | |
| Taking a tub bath | 1 | 2.683 | 1.869 | 1.601 | 2.137 | | |
| Getting from one room to another | 1 | 5.494 | 1.907 | 1.907 | * | | |
| Standing up from a bed | 1 | 3.922 | 1.910 | 1.910 | * | | |
| Getting on and off the toilet | 1 | 3.890 | 1.953 | 1.953 | * | | |
| Making the bed | 1 | 4.330 | 1.955 | 1.444 | 2.465 | | |
| Putting a trash bag outside | 1 | 4.768 | 1.995 | 1.536 | 2.453 | | |

Fatigue

| Item | Response format | Slope | Mean threshold | Threshold 1 | Threshold 2 | Threshold 3 | Threshold 4 |
|---|---|---|---|---|---|---|---|
| Full of energy | 3 | 2.419 | -0.475 | -1.407 | 0.456 | | |
| Strong and vital | 3 | 2.243 | -0.421 | -1.271 | 0.429 | | |
| Fresh and rested | 3 | 1.979 | -0.175 | -1.195 | 0.845 | | |
| Lively | 3 | 1.925 | -0.131 | -1.100 | 0.839 | | |
| Active | 3 | 1.856 | -0.123 | -1.203 | 0.957 | | |
| Full of life | 3 | 1.600 | 0.063 | -0.756 | 0.881 | | |
| Tired | 3 | 2.591 | 0.406 | -0.638 | 1.450 | | |
| Fatigued | 3 | 3.617 | 0.546 | -0.345 | 1.436 | | |
| Sluggish | 3 | 2.899 | 0.578 | -0.383 | 1.539 | | |
| Worn out | 3 | 4.090 | 0.647 | -0.214 | 1.508 | | |
| Run down | 3 | 3.445 | 0.679 | -0.192 | 1.551 | | |
| Wide awake | 3 | 1.217 | 0.741 | -0.407 | 1.889 | | |
| As if I have no energy left | 3 | 3.189 | 0.767 | -0.072 | 1.606 | | |
| Spent | 3 | 3.325 | 0.807 | -0.104 | 1.719 | | |
| Exhausted | 3 | 3.392 | 0.811 | -0.016 | 1.637 | | |
| Weary | 3 | 2.614 | 0.852 | -0.042 | 1.747 | | |
| Weak | 3 | 2.421 | 0.866 | -0.094 | 1.825 | | |
| Save my energy | 3 | 1.161 | 1.065 | -0.064 | 2.195 | | |
| Sleepy all day | 3 | 1.765 | 1.125 | 0.139 | 2.111 | | |
| Jaded | 3 | 1.241 | 1.809 | 0.817 | 2.801 | | |

Dyspnea

| Item | Response format | Slope | Mean threshold | Threshold 1 | Threshold 2 | Threshold 3 | Threshold 4 |
|---|---|---|---|---|---|---|---|
| Running a short distance makes me short of breath | 3 | 1.190 | -0.525 | -2.072 | -0.090 | 0.587 | |
| Exercising hard for half an hour makes me short of breath | | 1.134 | 0.131 | -0.206 | 0.468 | | |
| Talking while walking up a hill will make me short of breath | 3 | 2.040 | 0.185 | -1.455 | 0.385 | 1.625 | |
| An hour of physical labor makes me short of breath | 4 | 1.418 | 0.394 | 0.033 | 0.755 | | |
| My breathing problems limit my ability to exercise as much as I would like | 3 | 1.500 | 0.449 | -0.529 | 0.685 | 1.193 | |
| Talking while walking up a flight of stairs makes me short of breath | 3 | 2.068 | 0.564 | -0.919 | 0.807 | 1.803 | |
| During a typical day I feel short of breath | 4 | 2.407 | 0.649 | -0.220 | 1.518 | | |
| Doing chores, like vacuuming or yard work, makes me short of breath | 3 | 2.033 | 0.693 | -0.608 | 0.763 | 1.924 | |
| Climbing up one flight of stairs makes me short of breath | 3 | 2.440 | 0.819 | -0.646 | 0.983 | 2.120 | |
| Going outside for a walk makes me short of breath | 3 | 2.646 | 1.037 | -0.213 | 1.197 | 2.128 | |
| Walking one hundred yards makes me short of breath | 3 | 2.351 | 1.052 | -0.064 | 1.194 | 2.027 | |
| Walking up a hill makes me short of breath | 4 | 2.122 | 1.085 | 0.381 | 1.789 | | |
| Carrying groceries makes me short of breath | 3 | 2.796 | 1.102 | -0.106 | 1.222 | 2.190 | |
| Talking while walking makes me short of breath | 3 | 2.422 | 1.241 | -0.023 | 1.313 | 2.434 | |
| Running errands makes me short of breath | 3 | 2.677 | 1.351 | 0.191 | 1.415 | 2.448 | |
| Taking a bath makes me short of breath | 4 | 2.849 | 1.404 | 1.009 | 1.800 | | |
| Dressing myself makes me short of breath | 4 | 3.118 | 1.431 | 0.887 | 1.975 | | |
| Preparing a meal makes me short of breath | 4 | 3.104 | 1.451 | 0.988 | 1.914 | | |
| Singing or humming makes me short of breath | 4 | 2.086 | 1.456 | 0.943 | 1.970 | | |
| Speaking in a group makes me short of breath | 4 | 1.900 | 1.481 | 0.994 | 1.969 | | |
| I feel short of breath when I sit and rest | 4 | 2.775 | 1.543 | 1.543 | * | | |
| Talking at noisy places makes me short of breath | 4 | 2.187 | 1.606 | 1.170 | 2.043 | | |
| Walking from one room to another makes me short of breath | 4 | 3.909 | 1.647 | 1.154 | 2.139 | | |
| Talking to someone makes me short of breath | 4 | 2.875 | 1.779 | 1.247 | 2.311 | | |
| Talking on the phone makes me short of breath | 4 | 2.768 | 1.840 | 1.398 | 2.281 | | |
| Getting off the bed makes me short of breath | 4 | 2.958 | 1.849 | 1.305 | 2.393 | | |
| Going to the toilet makes me short of breath | 4 | 2.868 | 1.900 | 1.501 | 2.298 | | |
| Lying down flat makes me short of breath | 3 | 1.532 | 1.924 | 1.234 | 2.000 | 2.537 | |
| Standing up from a chair makes me short of breath | 4 | 2.511 | 1.924 | 1.302 | 2.547 | | |

The table is ordered by mean threshold. Response formats: 1 = easy / hard / impossible; 2 = no difficulty / a little bit of difficulty / some difficulty / a lot of difficulty / can't do because of my health; 3 = not at all / somewhat / very much; 4 = not at all / a little bit / quite a lot / can't do; 5 = not at all / a little bit / quite a lot. * The two highest response options were collapsed for item parameter estimation; the response options presented to the patient remain the same.

As is usual, IRT item bank parameters are estimated on a metric with mean 0 and standard deviation 1, where 0 represents the mean of the scaling sample. For easier interpretation, estimated patient scores are then linearly transformed to a metric with mean 50 and standard deviation 10.
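The linear transformation described above can be written out as a minimal sketch. The function names are hypothetical; the only inputs taken from the text are the two metrics (mean 0 / SD 1 and mean 50 / SD 10). Note that standard errors are rescaled by the SD factor alone, because the shift of 50 does not change spread.

```python
# Metric constants from the text: IRT metric (mean 0, SD 1) -> T metric (mean 50, SD 10).
T_MEAN = 50.0
T_SD = 10.0


def theta_to_t(theta: float) -> float:
    """Map an IRT score (mean 0, SD 1 in the scaling sample) to the T metric."""
    return T_MEAN + T_SD * theta


def t_to_theta(t_score: float) -> float:
    """Inverse mapping from the T metric back to the IRT metric."""
    return (t_score - T_MEAN) / T_SD


def se_theta_to_t(se_theta: float) -> float:
    """Standard errors scale by the SD factor only; the shift does not affect them."""
    return T_SD * se_theta


if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.5, 3.0):
        print(f"theta {theta:+.1f} -> T {theta_to_t(theta):.1f}")
```

For example, a patient 1.5 SD above the scaling-sample mean receives a score of 65, and one 3 SD above receives 80.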



The slope parameter is also called the discrimination parameter. Higher slopes indicate better discrimination, which makes an item more valuable (i.e., more 'informative') for score estimation. For example, a patient's ability to 'run errands' (slope 5.713) is more informative about his or her physical disability than the ability to 'put a trash bag outside' (slope 4.768).

The thresholds of an item show the score levels at which a particular response option becomes the most likely one to be endorsed. For the item 'running errands', the threshold 1.236 separates the response 'easy' from 'hard', and the threshold 2.417 separates 'hard' from 'impossible'. A patient whose score is 3 standard deviations above the population mean is most likely to complete 'running errands is …' with 'impossible', because the score lies above the threshold of 2.417. A patient whose level of disability is only 1.5 SD above the U.S. population mean is most likely to endorse 'hard', because the score lies between the thresholds 1.236 and 2.417. The mean threshold indicates the position of the item on the metric and corresponds to 'item difficulty' in traditional terms.
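The two notes above (slopes and thresholds) can be checked numerically with a small sketch. It assumes a generalized partial credit model (GPCM) with the Table 2 slopes used as printed and no scaling constant; the model is not restated in this section, so treat that as an assumption. Under a GPCM with ordered thresholds, the most likely category switches exactly at each threshold, which matches the reading given in the note. Under a graded response model, thresholds would instead mark where a cumulative probability crosses 0.5. Function names are hypothetical.

```python
import math

CATEGORIES = ("easy", "hard", "impossible")
# (slope, thresholds) from Table 2, Physical Disability bank.
RUNNING_ERRANDS = (5.713, (1.236, 2.417))
TRASH_BAG = (4.768, (1.536, 2.453))


def gpcm_probs(theta, slope, thresholds):
    """Category probabilities under an (assumed) generalized partial credit model."""
    logits, cumulative = [0.0], 0.0
    for b in thresholds:
        cumulative += slope * (theta - b)
        logits.append(cumulative)
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_category(theta, slope, thresholds):
    probs = gpcm_probs(theta, slope, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


def information(theta, slope, thresholds):
    """GPCM item information: slope**2 times the variance of the item score at theta."""
    probs = gpcm_probs(theta, slope, thresholds)
    mean = sum(k * p for k, p in enumerate(probs))
    return slope ** 2 * sum((k - mean) ** 2 * p for k, p in enumerate(probs))


def peak_information(slope, thresholds, lo=-4.0, hi=4.0, steps=800):
    """Maximum information over an evenly spaced theta grid."""
    return max(information(lo + (hi - lo) * i / steps, slope, thresholds)
               for i in range(steps + 1))


if __name__ == "__main__":
    slope, thresholds = RUNNING_ERRANDS
    print("mean threshold:", round(sum(thresholds) / len(thresholds), 4))
    for theta in (0.0, 1.5, 3.0):
        probs = gpcm_probs(theta, slope, thresholds)
        label = CATEGORIES[most_likely_category(theta, slope, thresholds)]
        print(f"theta={theta:+.1f}: {label:<10} {[round(p, 3) for p in probs]}")
    print("peak information, running errands:", round(peak_information(*RUNNING_ERRANDS), 2))
    print("peak information, trash bag:", round(peak_information(*TRASH_BAG), 2))
```

The mean of the two 'running errands' thresholds reproduces the tabled mean threshold (about 1.826). The peak-information comparison illustrates the slope note. Information also depends on where the thresholds lie, so at a given score level a lower-slope item can still be the more informative one.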



Table 3. Score Differences between NYHA Classes

| Scale | No. of items, mean ± SD | NYHA I (n = 11), mean (SD) | NYHA II (n = 53), mean (SD) | NYHA III/IV (n = 36), mean (SD) | Eta² | F | p | RV (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Physical Disability | 4.9 ± 1.5 | 53.0 (6.2) | 58.9 (8.6) | 62.6 (7.4) | .12 | 6.2 | .003 | 1.01 (.38-2.20) |
| Fatigue | 3.7 ± 0.7 | 46.8 (6.9) | 52.0 (7.6) | 55.4 (9.4) | .09 | 4.9 | .009 | .80 (.21-1.91) |
| Dyspnea | 4.6 ± 1.5 | 43.9 (14.4) | 53.8 (12.7) | 59.8 (11.7) | .13 | 6.9 | .002 | 1.13 (.34-2.67) |
| MLHFQ | 21 | 15.5 (14.8) | 38.3 (25.3) | 44.9 (22.9) | .11 | 6.1 | .003 | 1.00 |

Theta values of the CAT scales are expressed on the T-score metric (mean 50, SD 10). MLHFQ scores are summary scores ranging from 0 to 105. All analyses were controlled for order of administration as a potential confounder.

RV: relative validity, the F-value of an HF-CAT scale divided by the F-value of the MLHFQ summary score. Confidence intervals were determined by bootstrap analysis.
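The relative validity ratio and a percentile bootstrap interval can be sketched as follows. Two simplifications are assumptions and do not reproduce the published analysis. First, the sketch uses a plain one-way ANOVA, whereas the table's F-values were adjusted for order of administration. Second, it resamples patients within each NYHA class, because the exact bootstrap scheme is not described here. All function names are hypothetical, and any patient-level data passed in must be supplied by the user.

```python
import random
from statistics import mean


def relative_validity(f_scale: float, f_reference: float) -> float:
    """RV = F-value of a scale divided by the F-value of the reference (MLHFQ)."""
    return f_scale / f_reference


def one_way_f(groups):
    """Plain one-way ANOVA F statistic (no covariate adjustment)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_rv_ci(patients, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for RV.

    patients: (nyha_class, scale_score, reference_score) tuples.
    Patients are resampled with replacement within each NYHA class, so every
    class keeps its original size (an assumed, stratified scheme).
    """
    rng = random.Random(seed)
    by_class = {}
    for nyha, scale, reference in patients:
        by_class.setdefault(nyha, []).append((scale, reference))
    ratios = []
    for _ in range(n_boot):
        scale_groups, ref_groups = [], []
        for members in by_class.values():
            draw = [rng.choice(members) for _ in members]
            scale_groups.append([s for s, _ in draw])
            ref_groups.append([r for _, r in draw])
        ratios.append(relative_validity(one_way_f(scale_groups), one_way_f(ref_groups)))
    ratios.sort()
    lower = ratios[int(n_boot * alpha / 2)]
    upper = ratios[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Reported F-values from Table 3; the MLHFQ (F = 6.1) is the reference.
    for name, f in (("Physical Disability", 6.2), ("Fatigue", 4.9), ("Dyspnea", 6.9)):
        print(name, round(relative_validity(f, 6.1), 2))
```

Computed from the rounded F-values, the ratios reproduce the reported RVs for Fatigue (.80) and Dyspnea (1.13). The Physical Disability ratio (1.02) differs slightly from the reported 1.01, presumably because the published F-values are rounded.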



Figure Legends

Figure 1. HF-CAT patient interface, with an example item from each bank

Figure 2. Measurement precision in relation to measurement range

The x-axis shows the patient score, which in IRT terminology is referred to as the 'theta score'. To make the HF-CAT and the legacy tools comparable, both are scored on the same metric, as determined by the developed item banks. The y-axis shows the 95% confidence interval of the patient score; the smaller the y-value, the higher the precision of the score. For illustration, the dotted lines show the confidence intervals that would correspond to an internal consistency (Cronbach's alpha) of 0.80, 0.90, and 0.95.
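The link between the dotted reference lines and Cronbach's alpha can be made explicit with a small sketch. It relies on the classical relation SEM = SD·sqrt(1 − reliability), with SD = 10 on the T metric, and on the IRT relation SE(θ) = 1/sqrt(I(θ)). It assumes the latent variance is 1 on the theta metric, which gives reliability ≈ 1 − 1/I(θ). The legend does not state whether the plotted values are full interval widths or half-widths, so the sketch computes the half-width. Function names are hypothetical.

```python
import math

T_SD = 10.0  # SD of the T metric (mean 50, SD 10); assumed to be the x-axis metric
Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def sem_from_reliability(reliability: float, sd: float = T_SD) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def ci95_half_width(reliability: float, sd: float = T_SD) -> float:
    """Half-width of the 95% confidence interval implied by a given reliability."""
    return Z_95 * sem_from_reliability(reliability, sd)


def se_from_information(information: float, sd: float = T_SD) -> float:
    """IRT standard error 1/sqrt(I(theta)), rescaled from the theta metric to the T metric."""
    return sd / math.sqrt(information)


def reliability_from_information(information: float) -> float:
    """Approximate reliability 1 - SE^2, assuming a latent variance of 1 on the theta metric."""
    return 1.0 - 1.0 / information


if __name__ == "__main__":
    for alpha in (0.80, 0.90, 0.95):
        print(f"alpha={alpha:.2f}: SEM={sem_from_reliability(alpha):.2f}, "
              f"95% CI half-width={ci95_half_width(alpha):.2f}")
```

Under these assumptions, a score estimated with test information of 10 is about as precise as a fixed scale with alpha = 0.90.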




[Figure 1 image: HF-CAT patient interface. Prompt: "With the following questions we would like to assess your current health status …" Example items: "I feel tired …" (not at all / somewhat / very much); "I feel short of breath when I sit and rest …" (not at all / a little bit / quite a lot); "For me, running errands is …" (easy / hard / impossible).]




[Figure 2 plot: 95% CI of the patient score (y-axis) against patient score (x-axis, theta -3 to 4, T-score 30 to 80), with panels for Physical Disability (SF-36 PF, 10 items; item bank, 20 items), Fatigue (SF-36 VT, 4 items; item bank, 20 items) and Dyspnea (HOS, 4 items; item bank, 29 items), each compared with HF-CAT versions labeled 4, 5 or 10 items. Dotted reference lines mark alpha = .80, .90 and .95.]



Circulation: Heart Failure is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231. Circ Heart Fail. Published online April 23, 2012.

Copyright © 2012 American Heart Association, Inc. All rights reserved. Print ISSN: 1941-3289. Online ISSN: 1941-3297.

The online version of this article, along with updated information and services, is located on the World Wide Web at: http://circheartfailure.ahajournals.org/content/early/2012/04/23/CIRCHEARTFAILURE.111.964916
