

REVIEW Open Access

Evaluation of the psychometric properties of self-reported measures of alcohol consumption: a COSMIN systematic review

Hannah McKenna1*, Charlene Treanor1, Dermot O’Reilly1,2,3 and Michael Donnelly1,2,3

Abstract

Purpose: To review studies about the reliability and validity of self-reported alcohol consumption measures among adults, an area in which the evidence needs updating to reflect current research.

Methods: Databases (PUBMED (1966-present), MEDLINE (1946-present), EMBASE (1947-present), Cumulative Index of Nursing and Allied Health Literature (CINAHL) (1937-present), PsycINFO (1887-present) and Social Science Citation Index (1976-present)) were searched systematically for studies from inception to 11th August 2017. Pairs of independent reviewers screened study titles, abstracts and full texts with high agreement and a third author resolved disagreements. A comprehensive quality assessment was conducted of the reported psychometric properties of measures of alcohol consumption using the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) to derive ratings of poor, fair, good or excellent for each checklist item relating to each psychometric property.

Results: Twenty-eight studies met inclusion criteria and, collectively, they investigated twenty-one short-term recall measures, fourteen quantity-frequency measures and eleven graduated-frequency measures. All measures demonstrated adequate/good test-retest reliability and convergent validity. Quantity-frequency measures demonstrated adequate/good criterion validity; graduated-frequency and short-term recall measures demonstrated adequate/good divergent validity. Quantity-frequency measures and short-term recall measures demonstrated adequate/good hypothesis validity; short-term recall measures demonstrated adequate construct validity. Methodological quality varied within and between studies.

Conclusions: It was difficult to discern conclusively which measure was the most reliable and valid, given that no study assessed all psychometric properties and the included studies varied in the psychometric properties that they selected to assess. However, when the results from the range of studies were considered and summed, they tended to indicate that the quantity-frequency measure performed best of the three measures in psychometric terms and, therefore, is likely to produce the most reliable and valid assessment of alcohol consumption in population surveys.

Keywords: Self-reporting alcohol intake, Psychometric properties, COSMIN systematic review

* Correspondence: [email protected]
1 Centre for Public Health, School of Medicine, Dentistry and Biomedical Sciences, Institute of Clinical Sciences – Block B, Royal Victoria Hospital site, Queen’s University Belfast, BT12 6BJ Belfast, Northern Ireland
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

McKenna et al. Substance Abuse Treatment, Prevention, and Policy (2018) 13:6 DOI 10.1186/s13011-018-0143-8


Background

Alcohol use and its associated consequences are a major public health problem, described as the third leading risk factor for poor health globally [1]. Recently revised guidelines from the UK (United Kingdom) Chief Medical Officers advised adults about the likely harmful health effects of drinking more than 14 units/week [2]. In the UK this is approximately six 175 ml glasses of (13%) wine, six 568 ml pints of (4%) lager or ale or (4.5%) cider, or fourteen 25 ml measures of (40%) spirits, where 1 unit is 10 ml or 8 g of pure alcohol [3]. The Global Burden of Disease Survey identified alcohol as a top-five risk factor for non-communicable disease in the UK [4]. It is important that reliable and valid measures are used to monitor and assess alcohol misuse and related problems and, in turn, to inform public health strategies.

Our initial scoping exercise indicated that data about alcohol intake tend to be collected in surveys using one or more of the following three types of self-report questionnaire. Quantity-frequency measures ask questions about ‘usual’ alcohol drinking to estimate the frequency (e.g. number of days per week) and volume of alcohol consumed (e.g. ‘how many (cans/bottles/glasses) were consumed on a typical drinking day’ [5–7]). Graduated-frequency questionnaires measure the volume of alcohol consumed by grouping the number of drinks per occasion into graduated categories, typically beginning with the highest amount consumed by a respondent and decreasing in pre-set categories (e.g. ‘During the last 12 months, how often did you have 12 or more drinks of any kind of alcoholic beverage in a single day?’ ‘During the last 12 months, how often did you have at least 8 but less than 12 drinks of any kind of alcoholic beverage in a single day?’ [8, 9]). Short-term recall measures ask respondents to recall the alcohol that they consumed within a predetermined timeframe, such as the previous week or the last 24 h (e.g. the ‘Yesterday’ method), or to use a diary to record all alcohol consumption over a period of time [10, 11].

There is a need to ensure that survey instruments discern alcohol consumption accurately in order to identify the population of drinkers who consume over 14 units of alcohol per week [2] or who misuse alcohol. In this review, alcohol misuse is defined as ‘drinking excessively – more than the lower-risk limits of alcohol consumption’ [12]. Gmel [13] conducted a literature review of self-report measures (the quantity-frequency, graduated-frequency and short-term recall measures) compared with biological tests (i.e. blood alcohol concentration) using studies published in this field since 2004; and Feunekes [14] conducted a systematic review of studies published 1984–1999 on the capacity of the quantity-frequency, extended quantity-frequency, retrospective diary, prospective diary and 24-h recall measures, respectively, to classify individuals according to their alcohol intake. These previous reviews are outdated and not in keeping with advances in survey methodology and design concerning alcohol research, or with public health guideline changes (such as the reduction in alcohol guidelines in the UK [2]). This paper presents the results of a systematic review of all relevant research evidence regarding the reliability and validity of different types of survey measures of self-reported alcohol consumption in the adult population. Reliability and validity in this review are defined by the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) methodology [15]. COSMIN provided an iterative way of assessing the psychometric properties of included measures. The review adds to previous research by providing the first COSMIN-type review of alcohol intake measures as well as an updated review of alcohol consumption measures. This review addressed the following questions:

Are self-report measures (the quantity-frequency, graduated-frequency and short-term recall measures) reliable and valid in their assessment of alcohol consumption in the general population? If so, which of the self-report measures are most reliable and valid? Which measure most accurately identifies levels of alcohol consumption? The use of a reliable and valid measure in alcohol survey research will enhance the rigour and comparability of studies.
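As a worked illustration of the unit arithmetic and of how the three questionnaire types turn answers into a weekly total, the sketch below converts drink volumes to UK units (volume × ABV ÷ 10 ml) and compares a weekly estimate with the 14-unit guideline. It is a minimal sketch under stated assumptions, not a scoring method used by the review or by any included study: it assumes answers are already expressed in UK units (real instruments usually ask about drinks, which would need a drink-size conversion), that graduated-frequency bands are mutually exclusive with a hypothetical representative number of units per occasion, and that a year has 52 weeks. All function and constant names are hypothetical.

```python
# Constants from the Background text, except WEEKS_PER_YEAR (an assumption).
ML_PER_UNIT = 10.0             # 1 UK unit = 10 ml (8 g) of pure alcohol
WEEKLY_GUIDELINE_UNITS = 14.0  # UK Chief Medical Officers' weekly guideline
WEEKS_PER_YEAR = 52.0          # assumed conversion for annual (graduated-frequency) data


def units(volume_ml: float, abv_percent: float) -> float:
    """UK units in one drink: ml of pure alcohol (volume x ABV) divided by 10 ml."""
    return volume_ml * (abv_percent / 100.0) / ML_PER_UNIT


def qf_weekly_units(drinking_days_per_week: float, units_per_drinking_day: float) -> float:
    """Quantity-frequency: usual frequency multiplied by usual quantity."""
    return drinking_days_per_week * units_per_drinking_day


def gf_weekly_units(bands: list[tuple[float, float]]) -> float:
    """Graduated-frequency: each band is (assumed units per occasion, occasions per year).

    Bands are assumed to be mutually exclusive (e.g. 'at least 8 but less than 12'),
    so annual volume is the sum of occasions x units across bands.
    """
    annual_units = sum(per_occasion * occasions for per_occasion, occasions in bands)
    return annual_units / WEEKS_PER_YEAR


def recall_weekly_units(daily_units: list[float]) -> float:
    """Short-term recall or diary: recalled units scaled to a seven-day week."""
    if not daily_units:
        raise ValueError("need at least one recalled day")
    return sum(daily_units) * 7 / len(daily_units)


def exceeds_guideline(weekly_units: float) -> bool:
    """True if weekly consumption is more than the 14-unit guideline."""
    return weekly_units > WEEKLY_GUIDELINE_UNITS


if __name__ == "__main__":
    print(round(6 * units(175, 13), 2))  # six 175 ml glasses of 13% wine
    print(round(14 * units(25, 40), 2))  # fourteen 25 ml measures of 40% spirits
    print(exceeds_guideline(qf_weekly_units(3, 5)))  # 3 days/week x 5 units/day
```

For example, `6 * units(175, 13)` gives about 13.65 units for six 175 ml glasses of 13% wine, and `14 * units(25, 40)` gives 14 units, consistent with the approximate equivalences quoted above.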

Methods

The review was reported in accordance with PRISMA guidelines (see checklist attached as Additional file 1) [16]. No protocol exists for this review. The study authors searched PUBMED (1966-present), MEDLINE (1946-present), EMBASE (1947-present), CINAHL (1937-present), PsycINFO (1887-present) and SSCI (1976-present) from their inception to 11th August 2017 for peer-reviewed articles. Search terms were based on a COSMIN search filter to identify studies of psychometric properties, combined with terms relevant to alcohol intake measures (Fig. 1).
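The review's actual search terms are listed in Fig. 1 and are not reproduced here. As a hedged sketch of the general structure described (a psychometric-property filter combined with alcohol-intake terms), the snippet below joins synonyms with OR within each concept block and joins the blocks with AND. The term lists are hypothetical placeholders, not the COSMIN filter or the review's terms, and each database (PubMed, EMBASE, PsycINFO and so on) has its own field tags and truncation syntax that this sketch ignores.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms with OR, quoting multi-word phrases, and wrap in parentheses."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concept_blocks: list[str]) -> str:
    """Combine concept blocks with AND (e.g. measurement-property terms AND alcohol terms)."""
    return " AND ".join(or_block(block) for block in concept_blocks)


# Hypothetical placeholder terms for illustration only (NOT the terms in Fig. 1).
psychometric_terms = ["reliability", "validity", "test-retest"]
alcohol_terms = ["alcohol consumption", "drinking", "alcohol intake"]

if __name__ == "__main__":
    print(build_query(psychometric_terms, alcohol_terms))
```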

Eligibility criteria

Papers were included if they were English-language peer-reviewed studies that evaluated the reliability or validity of survey measures of alcohol consumption that were ‘self-completed’ by adults aged ≥18 years via telephone, paper, computer or interview. Studies were included if they assessed the reliability or validity of self-report alcohol consumption measures (the quantity-frequency, graduated-frequency or short-term recall measures, or any variation of these measures). Studies were excluded if they did not focus on reliability or validity, were reviews of the literature, or if study participants had a mental or alcohol disorder diagnosis, were receiving treatment for alcohol misuse or were being cared for in a care institution. The review focused on evaluating the psychometric properties of alcohol consumption measurement for the general drinking population; previous research indicates that people with an alcohol use disorder diagnosis tend to self-report differently from other drinkers (see discussion [17]). Studies were also excluded if they measured self-reported alcohol consumption using other methods only (biological testing or self-reporting alcohol tests).

Titles were exported to Refworks and duplicates were removed; titles and then suitable abstracts were screened and examined independently by HMcK, CT and MD. Cases of disagreement over study inclusion were resolved via review and discussion. Data collection from eligible studies involved extracting information about population characteristics, measures, results and COSMIN quality ratings onto an Excel spreadsheet (see Table 2). This was completed by HMcK and checked by other reviewers. Reference lists of literature reviews and citation lists of included studies were searched for relevant papers. The search strategy identified 806 studies after duplicate removal; 478 remained following examination of abstracts, and 28 papers were included following full-text review (Fig. 2).
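The screening counts above can be checked by simple subtraction. The sketch below assumes that each reported figure is the number of records carried forward after that stage, and that no records were added from reference-list searching between stages; under those assumptions it derives the number excluded at each step (328 at title/abstract screening and 450 at full-text review). The function name is hypothetical.

```python
def exclusions_per_stage(stages: list[tuple[str, int]]) -> dict[str, int]:
    """Given (stage name, records remaining after that stage) in order,
    return the number of records excluded at each later stage."""
    excluded: dict[str, int] = {}
    for (_, before), (name, after) in zip(stages, stages[1:]):
        if after > before:
            raise ValueError(f"record count rose at stage {name!r}")
        excluded[name] = before - after
    return excluded


# Counts reported in the text.
flow = [
    ("after duplicate removal", 806),
    ("title/abstract screening", 478),
    ("full-text review", 28),
]

if __name__ == "__main__":
    print(exclusions_per_stage(flow))
```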

Quality assessment

Pairs of independent reviewers applied the well-validated COSMIN checklist to assess the methodological quality of included studies. Definitions of the psychometric properties are provided by COSMIN (see Table 1). Information (e.g. coefficients) on the psychometric properties reported for each measure by included studies was assessed using the COSMIN quality criteria checklist created by Terwee [18], which generated ratings of good, moderate or poor. An additional methodological quality score was calculated for each psychometric property checklist using the ‘worst score counts’ method, in which the lowest rating of any of the items in an individual psychometric property checklist is taken as the overall score for that property [19]. Risk of bias (where evidence reported by studies may not be trustworthy [20]) was accounted for by assessing the methodological quality of studies. It is important to note that the review reported the properties that were recorded in the original articles and that most articles did not assess or report the full range of properties recommended by COSMIN.
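The ‘worst score counts’ rule can be stated precisely: the overall rating for a psychometric property is the lowest rating given to any item in that property's checklist. A minimal sketch, assuming the four-point COSMIN scale (poor, fair, good, excellent) named in the Abstract; the function and constant names are hypothetical.

```python
# COSMIN four-point scale, ordered from lowest to highest.
RATINGS = ("poor", "fair", "good", "excellent")


def worst_score_counts(item_ratings: list[str]) -> str:
    """Overall rating for one property checklist = lowest rating among its items."""
    if not item_ratings:
        raise ValueError("a property checklist needs at least one rated item")
    for rating in item_ratings:
        if rating not in RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
    # RATINGS.index maps each rating to its rank, so min() picks the worst one.
    return min(item_ratings, key=RATINGS.index)


if __name__ == "__main__":
    print(worst_score_counts(["excellent", "good", "fair", "good"]))
```

A single ‘fair’ item is therefore enough to cap the property's overall rating at ‘fair’, however well the remaining items are rated.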

Results

Table 2 presents the characteristics and results of the 28 papers that met inclusion criteria. It summarises the content of Tables S1 and S2, which are provided as Additional files 2 and 3. Included studies reported drinks/alcohol measures in standard sizes for the country of publication (see Additional file 2: Table S1). Some studies included beverage-specific measures. Studies were conducted in the USA (n = 18), Australia (n = 4), Canada (n = 2), Finland (n = 2), the UK (n = 1) and the Netherlands (n = 1). Across the included studies, short-term recall measures (n = 21), quantity-frequency measures (n = 14) and graduated-frequency measures (n = 11) were investigated. Convergent validity (n = 15), criterion validity (n = 14), test-retest reliability (n = 10), predictive validity (n = 9), inter-rater reliability (n = 5), hypothesis validity (n = 4), construct validity (n = 2), divergent validity (n = 2) and structural validity (n = 1) were assessed across the studies. Some studies assessed the psychometric properties of more than one measure and measure type, but no single study assessed all COSMIN psychometric properties.
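Criterion validity in several included studies (for example, Cutler et al. in Table 2) is reported as sensitivity, specificity and positive and negative predictive values, which compare a self-report classification with a reference classification. The sketch below shows the standard 2 × 2 definitions; the class and function names, and the counts in the usage line, are hypothetical and are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-tabulation of a self-report classification against a reference one."""
    tp: int  # positive on the measure, positive on the reference
    fp: int  # positive on the measure, negative on the reference
    fn: int  # negative on the measure, positive on the reference
    tn: int  # negative on the measure, negative on the reference


def criterion_metrics(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),  # reference positives detected
        "specificity": t.tn / (t.tn + t.fp),  # reference negatives correctly cleared
        "ppv": t.tp / (t.tp + t.fp),          # measure positives that are true
        "npv": t.tn / (t.tn + t.fn),          # measure negatives that are true
    }


if __name__ == "__main__":
    print(criterion_metrics(TwoByTwo(tp=30, fp=5, fn=10, tn=55)))
```

With these illustrative counts, sensitivity is 0.75 and specificity is about 0.92; predictive values, unlike sensitivity and specificity, also depend on how common the condition is in the sample.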

Methodological quality assessment

There was wide variation in methodological quality ratings for each psychometric property (as presented and discussed below).

Fig. 1 Search strategy; list of free-text terms and medical subject headings searched for using the conjunctions ‘AND’ or ‘OR’ to find articles that met the inclusion criteria in the online bibliographic databases


Quantity-frequency measures achieved criterion validity ratings of excellent (n = 1), fair (n = 1) and poor (n = 2). Test-retest reliability quality ratings were good (n = 1), fair (n = 1) and poor (n = 2), with inter-rater reliability rated fair (n = 1) and poor (n = 1). Convergent validity ratings were good (n = 1) and fair (n = 2). Hypothesis validity was rated good (n = 1) and fair (n = 1). Predictive validity was rated excellent (n = 1) and structural validity fair (n = 1).

The graduated-frequency measures achieved convergent validity ratings of good (n = 2) and fair (n = 3). Test-retest reliability was rated fair (n = 2) and good (n = 1), and inter-rater reliability was also rated fair (n = 1). Criterion validity was rated good (n = 1), fair (n = 1) and poor (n = 1). Predictive validity was rated excellent (n = 1), good (n = 1) and fair (n = 1). Divergent validity was rated fair (n = 1). Construct validity was rated fair (n = 1).

The criterion validity ratings for the short-term recall measures were excellent (n = 1), good (n = 1), fair (n = 1) and poor (n = 4). Convergent validity was rated good (n = 2) and fair (n = 5). Predictive validity was rated excellent (n = 1), good (n = 1), fair (n = 2) and poor (n = 1). Test-retest reliability was rated fair (n = 3), with inter-rater reliability also rated fair (n = 1). Hypothesis validity was rated good (n = 1) and fair (n = 1). Divergent validity was rated fair (n = 1) and construct validity was rated poor (n = 1).
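Many of the reliability results in this review and in Table 2 are reported as kappa coefficients (e.g. a test-retest kappa range of 0.90–0.96 for a weekly quantity-frequency measure, or an inter-rater kappa of 0.76). The sketch below computes Cohen's kappa for two sets of categorical ratings of the same respondents, such as a test and a retest or two raters: observed agreement corrected for the agreement expected by chance from each set's category frequencies. It is a minimal illustration that assumes categorical responses; continuous test-retest data are usually summarised with correlation or intraclass coefficients instead, as in the r values reported by some studies. The function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a: list[str], ratings_b: list[str]) -> float:
    """Cohen's kappa for two categorical ratings of the same subjects."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of subjects given the same category both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each category's marginal proportions, summed.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    test = ["yes", "yes", "no", "no"]
    retest = ["yes", "no", "no", "no"]
    print(cohens_kappa(test, retest))
```

For example, with the two lists above, observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5.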

Test-retest reliability

Quantity-frequency and graduated-frequency measures completed by a Finnish population sample [11] and a computer- and paper-administered quantity-frequency measure demonstrated good test-retest reliability [6]. Moderate test-retest reliability was reported for a quantity-frequency measure administered to a general population sample [21] and for quantity-frequency and short-term recall measures in an Australian general sample of twins [22]. Good test-retest reliability was reported for a graduated-frequency measure in an undergraduate student population sample [10] and in a general population [23]. Test-retest reliability of a daily intake short-term recall measure was good in an older adult sample [24]. Moderate test-retest reliability was reported for a short-term recall measure of ≥5 drinks consumed per drinking occasion [25]. In an older population sample, inter-rater reliability was good for quantity-frequency and short-term recall measures [26], though poor inter-rater reliability was reported in a study administering a weekly quantity-frequency measure to people aged over 65 years [7] and for the graduated-frequency and short-term recall measures in a general population [27] (for detailed results see Table 2).

Fig. 2 PRISMA flow diagram [16]; flowchart depicting the process of searching, selecting and sifting studies according to eligibility criteria. The search stages were identification, screening, eligibility and inclusion

Table 1 COSMIN definitions of domains, measurement properties, and aspects of measurement properties [18]

Reliability (domain): The degree to which the measurement is free from measurement error.

Reliability, extended definition (domain): The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions: e.g. using different sets of items from the same health-related patient-reported outcome (HR-PRO) instrument (internal consistency); over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intra-rater).

Internal consistency (measurement property): The degree of interrelatedness among the items.

Reliability (measurement property): The proportion of the total variance in the measurements which is due to ‘true’ᵃ differences between patients.

Measurement error (measurement property): The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured.

Validity (domain): The degree to which an HR-PRO instrument measures the construct(s) it purports to measure.

Content validity (measurement property): The degree to which the content of an HR-PRO instrument is an adequate reflection of the construct to be measured.

Face validity (aspect of content validity): The degree to which (the items of) an HR-PRO instrument indeed look as though they are an adequate reflection of the construct to be measured.

Construct validity (measurement property): The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured.

Structural validity (aspect of construct validity): The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured.

Hypotheses testing (aspect of construct validity): Idem construct validity.

Cross-cultural validity (aspect of construct validity): The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument are an adequate reflection of the performance of the items of the original version of the HR-PRO instrument.

Criterion validity (measurement property): The degree to which the scores of an HR-PRO instrument are an adequate reflection of a ‘gold standard’.

Responsiveness (domain): The ability of an HR-PRO instrument to detect change over time in the construct to be measured.

Responsiveness (measurement property): Idem responsiveness.

Interpretabilityᵇ: Interpretability is the degree to which one can assign qualitative meaning – that is, clinical or commonly understood connotations – to an instrument’s quantitative scores or change in scores.

Table legend: Definitions of the psychometric properties measured by the COSMIN checklist, grouped by property (e.g. reliability, validity, responsiveness and interpretability).
ᵃ The word ‘true’ must be seen in the context of classical test theory (CTT), which states that any observation is composed of two components: a true score and error associated with the observation. ‘True’ is the average score that would be obtained if the scale were given an infinite number of times. It refers only to the consistency of the score, and not to its accuracy [54].
ᵇ Interpretability is not considered a measurement property, but an important characteristic of a measurement instrument.

Criterion validity

Studies of quantity-frequency measures administered to general population samples [28–30] and of a quantity-frequency and a short-term recall measure [31] demonstrated good criterion validity. An annual graduated-frequency measure and a previous-24-h short-term recall measure administered in a general population sample indicated good criterion validity for ‘heavy drinkers’. Poor validity was reported for moderate drinkers in this study, perhaps because consumers of lower levels of alcohol may drink irregularly and not within the 24 h before administration of the short-term recall measure [27]. An undergraduate


Table 2 Summary of characteristics and psychometric properties for included studies

Author(country)

StudyPopulation

Methodsused

Studies andmeasures

Psychometricpropertiesreportedby studies

COSMINqualityratings

Bonevskiet al.(2010)Australia

Group 1 was 30% maleand 70% female, Group2 37% male and 63%female, Group 3 44%male and 56% femaleand Group 4 41% maleand 59% female. Group 1mean age 25 years.Group 2 mean age 27years. Group 3 meanage 25 years. Group4 mean age 25 years.

Participants were askedto recall alcohol intakeusing either a computeror paper administeredmeasure. 4–7 days laterboth modes of measureswere administered again.

Weeklyquantity-frequencymeasure.

Test-retestreliability-kappacoefficient range(0.90–0.96).Test-retest reliabilitywas good.

Test-retestreliability(poor)

Chaikelsonet al.(1994)Canada

Random sampling wasused. The sample was100% male with meanage 69 years. Wives werealso asked same questionsvia written questionnaireto assess concordance.

Results comparedto alcohol test theMAST (MichiganAlcoholismScreening Test[55]) for reliabilityand validity.

Short-termrecall measure(drinkingoccasions inthe previousmonth recall).

Test-retest reliability-kappa coefficients(0.76) total lifetimedrinking, (0.84) lastreported month and(0.77) monthly alcoholconsumption indicatinggood test-retest reliability.Concurrent validity-correlations betweenself-reports (0.87)husband alcohol intakeand (0.85) wife alcoholintake indicating goodcriterion validity.Construct validity-correlations with theMAST self-report test in1987(0.60) with totallifetime drinking (0.05)with current drinking.Correlations with 1990data (0.53) with totallifetime drinking (− 0.14)with current drinking.Construct validity showsmoderate reportedcorrelation.

Test-retestreliability(fair)Criterionvalidity(poor)Constructvalidity(poor)

Crumet al.(2002)USA

Random sampling wasused. The samplewas 58% femaleand 42% malewith mean age76.2 years. Datawas obtained fromthe 1993–1994follow-up of theWashington Countycohort of men andwomen 65 yearsand older.

Participants completeda measure of theirusual alcoholconsumption intwo ways: (1) aquantity-frequencymeasure; (2) samequestions askedin an interviewabout drinkinghabits.

Weekly quantity-frequencymeasure.Short-termrecall measure(past week recall).

Hypothesis validity-pastweek recall of alcoholintake 15–20% lowerthan the quantity-frequency measure.Hypothesis validitywas good.Inter-rater reliability-kappa statistic value0.76 indicatinggood inter-raterreliability.

Hypothesisvalidity(good)Inter-raterreliability(poor)

Cutleret al.(1988)UK

Random samplingwas used. 63.4%of the samplewere male and36.6% female.No median ormean age wasreported butparticipantswere aged18 and older.

CAGE responsesand the quantity-frequency questionstaken from HealthSurvey Questionnairewere compared.

Weekly quantity-frequencymeasure.

Criterion validity-sensitivity (42.9)specificity (97.1)positive predictivevalue (65.8) negativepredictive value (92.8)for males and sensitivity(46.6) specificity (98.6)positive predictive value(50.3) negative predictivevalue (98.4) for females

Criterionvalidity(excellent)

McKenna et al. Substance Abuse Treatment, Prevention, and Policy (2018) 13:6 Page 7 of 19

Page 8: Evaluation of the psychometric properties of self-reported ... · Evaluation of the psychometric properties of self-reported measures of alcohol consumption: a COSMIN systematic review

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)

Author(country)

StudyPopulation

Methodsused

Studies andmeasures

Psychometricpropertiesreportedby studies

COSMINqualityratings

indicating goodcriterion validity.

Dollingeret al.(2009)USA

The sample wascomposed ofvolunteers andwas 61% femaleand 39% malewith a meanage 22 years.

Responses toquantity-frequencymeasures atboth timepoints compared.Nightly log ofalcohol consumptioncompared tohours spentstudying, socialisingand religiousbehaviours.

Daily graduated-frequency measure.Short-termrecall measure(daily alcoholintake recall).

Test-retest reliability-alcohol quantitycoefficient of 0.85and an alcoholfrequency coefficientof 0.84 indicatinggood test-retestreliability.Divergent validity-religion-by-alcoholcorrelations werenegative with valuesfrom −0.14 to −0.37.Convergent validity-positive correlationswith alcohol withvalues of 0.40 and0.41 respectively.Good divergentand convergentvalidity werereported.

Test-retestreliability(fair)Divergentvalidity(fair)Convergentvalidity (fair)

Greenfieldet al.(2014)USA

Randomsamplingwas used.Respondentswere 48.1%male and 53.2%female andaged over18 years.

Participantscompletedquestionnairesand a follow-upsurvey byphone or mail.

Short-termrecall measure(occasions of≥5 drinks duringspecific lifedecades).

Test-retestreliability-kappavalues for gender(0.64–0.80),age groups(0.59–0.83),ethnicity(0.70–0.73), interviewmode (0.72–0.73) andchildhood victimisation(0.75) (0.73) indicatingmoderate to goodtest-retest reliability.Predictive validity-disclosure of priorheavy drinkingincreased risk foralcohol dependenceby 18%,increased riskof consequences by21% (by 15% whenage of onset wascontrolled), increasedrisk for alcohol-usedisorder by 18%indicating goodpredictive validity.

Test-retestreliability(fair)Predictivevalidity(fair)

Gruenewaldet al. (1995)USA

Random samplingwas used. Respondentswere 43.5%male and56.5% female andaged 18 yearsor older.

Responses tograduated-frequencymeasures attwo timepoints compared.

Gruenewaldet al.(1995)Monthlygraduated-frequencymeasure

Test-retest reliability-coefficients foraverage drinkingquantity r = 0.76and for variancein drinking quantitiesr = 0.78, indicatinggood test-retest reliability.

Test-retestreliability(fair)

Hansellet al.(2008)

Random samplingwas used. Respondentswere 40% male

The measuresexaminedwere a dependence

Annualquantity-

Test-retest reliability-continuous dataquantity x frequency

Test-retestreliability(poor)

McKenna et al. Substance Abuse Treatment, Prevention, and Policy (2018) 13:6 Page 8 of 19

Page 9: Evaluation of the psychometric properties of self-reported ... · Evaluation of the psychometric properties of self-reported measures of alcohol consumption: a COSMIN systematic review

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)

Author (country)

Study population

Methods used

Studies and measures

Psychometric properties reported by studies

COSMIN quality ratings

Australia

and 60% female and aged between 19 and 90 years old.

score, based on DSM-III-R (Diagnostic and Statistical Manual of Mental Disorders [56]) and DSM-IV criteria for substance dependence, and a quantity × frequency of alcohol consumed taken from the quantity-frequency measure.

frequency measure

of alcohol (0.61) between phase 1 and phase 3, and (0.55) between phase 2 and phase 3. Categorical data quantity × frequency of alcohol (0.64) between phase 1 and phase 3, and (0.59) between phase 2 and phase 3, indicating moderate test-retest reliability.

Hilton (1989), USA

Volunteer sample. Respondents were 50% male and 50% female and had a mean age of 30 years. The volunteer participants were recruited through a San Francisco Bay Area newspaper.

Participants completed 2 retrospective recall measures (graduated-frequency and beverage-specific quantity-frequency measures) after diary completion. Responses were compared.

Short-term recall measure (10-week recall). Graduated-frequency measure (30-day recall). Beverage-specific quantity-frequency measure (2-week recall).

Convergent validity: correlations of 0.88 for volume of drinks consumed, 0.85 for days of beer consumed, 0.89 for days of beer usually consumed, 0.80 for days of wine consumed, 0.66 for days of wine usually consumed, 0.81 for days of liquor consumed and 0.65 for days of liquor usually consumed, indicating moderate to good convergent validity.

Convergent validity (fair)

Koppes et al. (2002), Netherlands

Random sampling was used. Respondents were 46% male and 54% female with a mean age of 36 years. Data were collected at 1 time point, the 2000 follow-up measurement of 171 male and 197 female participants from the Amsterdam Growth and Health Longitudinal Study.

Subjects visited the study premises for 1 day. The quantity-frequency measure and dietary history interview were based on alcohol consumption over the previous month and were completed in no particular order.

Quantity-frequency measure (ranging from never drinking to daily alcohol intake). Short-term recall measure (dietary history interview).

Concurrent validity: correlation of 0.77 for men and 0.87 for women, indicating good concurrent validity.

Criterion validity (poor)

LaBrie et al. (2004), USA

The sample was composed of volunteers and was 100% male with a mean age of 20.6 years.

Drinking variables assessed were drinking days, average drinks and total drinks during

Short-term recall measure (monthly Timeline Followback method).

Convergent validity: correlation coefficients between 0.52 and 0.69, showing moderate convergent validity.

Convergent validity (fair)

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)

211 male college students participated.

a 30-day period.

Lennox et al. (1996), USA

Analysis was conducted on a household survey sample aged 18–64 years. Gender proportions were not reported. Responses were analysed from 1 time point (the 1991 follow-up) from 8755 participants in the 1988 National Household Survey of Drug Abuse.

A latent variable approach was used. In this model, covariation among multiple indicators was used as an estimate of the latent construct.

Quantity-frequency measure of alcohol consumption over the past 30 days.

Structural validity: correlations at 0.36; the alcohol abuse and consequences constructs correlated at 0.28, showing poor structural validity.

Structural validity (fair)

McGinley et al. (2014), USA

A sample of 18–20-year-olds was selected from respondents to the National Survey on Drug Use and Health. Gender proportions were not reported.

Quantity and frequency of alcohol consumption estimates were derived from the graduated-frequency measure. Estimates were compared to the quantity-frequency measure.

Graduated-frequency measure of alcohol consumption over the past 30 days.

Construct validity: midvalues were 3.5 for quantity of alcohol consumed and 14.5 for frequency, indicating poor construct validity.

Construct validity (fair)

Northcote and Livingston (2011), Australia

Respondents were 47.3% male and 53.3% female and aged 18–25 years.

Participants reported the number of alcoholic drinks consumed 1–2 days after a drinking occasion, which was compared to the alcohol intake observed by peer-based researchers on the occasion.

Short-term recall measure (last occasion self-report of drinks consumed).

Criterion validity: associations with p values of 0.6, 0.31, 0.04 and < 0.01 for up to 4 drinks, 5–8 drinks, 9–12 drinks and more than 12 drinks, respectively, indicating good criterion validity for respondents consuming ≥9 drinks. Convergent validity: significant at 0.74, with gender-specific correlations of 0.79 for men and 0.60 for women. Moderate to good convergent validity was reported.

Criterion validity (poor)

O’Hare et al. (1991), USA

Respondents were 41.6% female and 58.4% male, with a mean age of 20.6 years.

Participants were asked to complete a mailed questionnaire that included both measures of alcohol consumption.

Weekly graduated-frequency measure. Short-term recall measure (retrospective recall of past 7-day alcohol intake).

Convergent validity: correlations were significant at 0.74, with gender-specific correlations of 0.79 for men and 0.60 for women, indicating moderate to

Convergent validity (good)

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)

good convergent validity.

O’Hare et al. (1997), USA

Random sample of an undergraduate university population. Gender proportions were reported as ‘representative of sex’. Respondents had a mean age of 18.7 years.

All students completed quantity-frequency questions, the MmMAST and a 7-day recall. The MmMAST was used as a criterion variable.

Weekly graduated-frequency measure. Short-term recall measure (retrospective recall of past 7-day alcohol intake).

Criterion validity: association was significant at p < 0.01, indicating good criterion validity. Predictive validity: sensitivity and specificity values were 76 and 59.8 for the recall measure. Using a MAST cut-off score ≥ 2, sensitivity and specificity values were 59.7 and 70.9, indicating moderate to good predictive validity.

Criterion validity (fair); Predictive validity (fair)

Parker et al. (1996), USA

Random sampling was used. Respondents were 39% male and 61% female and aged 18–64. Data were taken from the 1987–1989, 1989–1990 and 1992–1993 surveys of the Pawtucket Health Program, conducted among home-dwelling adults.

Alcohol intake assessed with a food frequency question as a component of the general health survey was compared against alcohol intake assessed with a graduated-frequency measure as part of a survey.

Short-term recall measure (beverage-specific past 24 h recall). Annual graduated-frequency measure.

Concurrent validity: kappa statistics reported between measures were 0.08 (p < 0.001), 0.38 (p < 0.001) and 0.81 (p < 0.001), indicating good concurrent validity for high consumers of alcohol only. Inter-rater reliability: kappa values for both measures were 0.28–0.47. Inter-rater reliability was poor (below 0.70).

Criterion validity (poor); Inter-rater reliability (fair)

Poikolainen et al. (2002), Finland

Volunteer sample recruited from their workplace. Respondents were 83% female and 17% male with a mean age of 42 years.

Quantity-frequency and graduated-frequency measures were obtained before and after 1 month of daily recall of alcohol intake. A blood sample was obtained at the outset.

Annual quantity-frequency questionnaire. Daily graduated-frequency measure. Short-term recall measure (past month recall of intake).

Convergent validity: coefficients were 0.95 between the short-term recall measure and quantity-frequency 1, 0.95 between the short-term recall measure and quantity-frequency 2, 0.90 between the short-term recall measure and graduated-frequency 1 and 0.93 between the short-term recall measure and graduated-frequency

Convergent validity (good)

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)

2. Convergent validity was reported as good.

Read et al. (2006), USA

College students who reported drinking different amounts of alcohol were selected for the sample to be representative of variation in drinking levels. Respondents were 52% female and 48% male with a mean age of 19 years.

College students completed a self-report questionnaire on demographic characteristics, drinking behaviours and drinking consequences. Drinking consequences were assessed with a composite measure, developed by the researchers, based on the Drinker Inventory of Consequences and the Young Adult Alcohol Problem Screening Test.

Short-term recall measure (past 90-day intake).

Concurrent validity: correlation value of 0.36 (p < 0.001), and with quantities of alcohol consumed an r value of 0.31 (p < 0.001), indicating poor concurrent validity.

Criterion validity (excellent)

Rehm et al. (1999), Canada

The sample was chosen to be representative of the wider drinking population. Respondents were 48% male and 52% female, and chosen to be representative of ages ≥ 18 years.

Population samples from 4 surveys conducted for the Alcohol Research Group. Surveys used computer-assisted telephone interviews with random digit dialling sampling techniques.

Quantity-frequency measure for drinking occasion. Annual graduated-frequency measure. Short-term recall measure (past week recall).

Convergent validity: correlations both moderate at approximately 0.40. Predictive validity: estimates by the graduated-frequency measure were 22% higher than the short-term recall estimate. Quantity-frequency estimate of alcohol-related mortality 13% than short-term recall estimate, indicating poor predictive validity.

Convergent validity (fair); Predictive validity (excellent)

Reid et al. (2003), USA

Random sampling was used. The veteran primary care sample was 3% female and 97% male, and the community-dwelling sample was 60% female and 40% male. Mean ages were 73.1 years for the veteran primary care sample and 75.9 years for the community-dwelling sample.

A telephone call allowed self-report of the quantity-frequency measure, binge and heavy drinking questions, and the AUDIT (Alcohol Use Disorders Identification Test [44]) and CAGE (Cut down, Annoyed, Guilty, Eye-opener [45]) tests.

Weekly quantity-frequency measure.

Inter-rater reliability: kappa values were 0.44 and 0.33. For population sample 2, kappa values were 0.21 and 0.46, indicating moderate to poor inter-rater reliability.

Inter-rater reliability (fair)

Russell et al. (1991), USA

Random sampling was used. Respondents were 50.5% male and 49.5% female and aged over 18 years. Data were

Quantity-frequency questions were asked about the amount and frequency of particular alcoholic beverages consumed, via telephone interview using a random-digit-dial

Typical annual beverage-specific quantity-frequency measure

Criterion validity: correlations between 0.73 and 0.77 for subtypes of alcohol reported, showing good criterion validity.

Criterion validity (poor)

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)

taken from 1 time point of the survey.

technique and supplemented by samples of homeless people, college students and those without telephones.

Sander et al. (1997), USA

175 patients with traumatic brain injury were recruited from a medical rehabilitation centre along with their relatives. Respondents were 65% male and 35% female. Mean age was 39.2 years for patients and 45.9 years for relatives.

Alcohol use was examined 1 year after injury through a quantity-frequency measure and the brief MAST test. Patients and their relatives both completed the measures, and concordance between reports was examined.

Annual quantity-frequency measure

Concurrent validity: concordance showed 95.4% agreement, indicating good criterion validity.

Criterion validity (fair)

Searles et al. (1995), USA

The sample was chosen to be representative of the male drinking population in Vermont enrolled in the Alcohol Research Centre. Respondents had a median age of 28 years (range 21 to 56 years) and were 100% male.

Subjects self-reported daily alcohol intake via telephone. At 90 days, subjects completed an interview using DSM criteria to assess alcohol abuse or dependence.

Short-term recall measure (daily self-report of alcohol intake). Short-term recall measure (annual retrospective recall).

Predictive validity: correlations of 0.86 and, with alcohol-related problems level, 0.69. Predictive validity is moderate between daily self-report and retrospective recall and alcohol-related problems, and good between daily self-report and retrospective recall and alcohol intoxication level.

Predictive validity (poor)

Searles et al. (2000), USA

Volunteer sample of those enrolled in the Vermont Alcohol Research Centre. Respondents were 100% male and had a mean age of 36.2 years for those without alcohol problems tested at outset and 30.4 years for those with alcohol problems.

Participants recorded alcohol intake on an interactive voice response system using telephones. In-person interviews were conducted every 13 weeks, during which participants completed the Timeline Followback. Results were compared.

Short-term recall measure (Timeline Followback over 366 days). Short-term recall measure (daily self-report of alcohol intake).

Convergent validity: correlations of 0.60 at 180 days of administration, 0.57 at 270 days of administration and 0.57 at 366 days of administration, indicating moderate convergent validity.

Convergent validity (fair)

Tuunanen et al. (2013), Finland

The sample included 45-year-olds resident in the Finnish city of Tampere. The sample was 100% male.

Participants completed a mailed health questionnaire which invited previous week recall of alcohol intake, a quantity-frequency measure and structured quantity-

Quantity-frequency measure (typical drinks consumed per occasion). Short-term recall measure (past week recall).

Hypothesis validity: the past week recall measure reported lower mean alcohol consumption than the quantity-frequency measure, indicating good hypothesis validity.

Hypothesis validity (fair)


student sample completed two graduated-frequency measures and a short-term recall measure with moderate criterion validity [32]. Short-term recall spousal reports that were used as a criterion or standard to validate alcohol intake in an older sample reported good criterion validity [24]. A short-term recall measure administered to an undergraduate student sample had poor criterion validity [33], though other studies of the short-term recall measure [34] and of the short-term recall and graduated-frequency measures [9] reported good criterion validity (see Table 2).

Construct validity

Poor construct validity was found for a 30-day graduated-frequency measure completed in an undergraduate sample (age range 18–20 years) [35]. A short-term recall measure compared with the MAST measure on two separate occasions in a sample of older adults reported poor to moderate construct validity [24] (see Table 2).

Hypothesis validity

Good hypothesis validity was reported for a quantity-frequency measure compared to a short-term recall measure in an older adult population sample [26] and for a quantity-frequency measure compared to a short-term recall measure in a general population sample [36] (see Table 2).
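A hypothesis-validity check of this kind can be sketched as a directional test on paired data. The example below assumes that each respondent completed both a quantity-frequency measure and a past-week recall measure, and tests the a priori hypothesis (as in the Tuunanen et al. (2013) row of Table 2) that recall yields lower totals. The data are invented, and the critical value 1.895 (the one-sided 5% point of t with 7 degrees of freedom) suits this sample size only.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for the mean of (a - b); positive when a tends to exceed b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples of at least 2 values")
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def hypothesis_supported(qf, recall, critical):
    """A priori hypothesis: quantity-frequency totals exceed past-week recall totals.

    Supported only if the paired difference points in the predicted direction
    and the t statistic exceeds the supplied one-sided critical value.
    """
    return paired_t(qf, recall) > critical


# Hypothetical weekly drinks for 8 respondents (illustration only).
qf_weekly = [10, 14, 6, 21, 8, 12, 3, 16]
recall_weekly = [8, 11, 6, 17, 5, 10, 2, 13]
print(hypothesis_supported(qf_weekly, recall_weekly, critical=1.895))  # True
```

A magnitude hypothesis (for example, a minimum expected difference) could be stated instead; what matters for COSMIN is that the hypotheses are formulated before the data are analysed.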

Predictive validity

One study of a graduated-frequency and a short-term recall measure completed by an undergraduate student sample demonstrated adequate to good predictive validity [9], whilst another study (albeit with a small sample size) of the same measures in an undergraduate student sample (age range 18–20 years) recorded poor predictive validity [32]. A general population study found poor predictive validity for the three measures [37], though they were measured against unstandardised indicators of alcohol-related mortality, morbidity and harm. A short-term recall measure achieved good or adequate prediction

Table 2 Summary of characteristics and psychometric properties for included studies (Continued)


frequency questions based on the AUDIT.

Weingardt et al. (1998), USA

Random sampling was used. Respondents were 58% female and 42% male and aged 18–20 years. Data were taken from the 1990 and 1994 cohorts of college undergraduate students.

Peak consumption, typical weekend quantity and typical daily quantity measures were used to derive binge drinking data to analyse validity. Binge drinking was defined as 5–6 drinks per occasion for men and 3–4 drinks per occasion for women.

Graduated-frequency measure (peak monthly alcohol consumption). Graduated-frequency measure (typical weekend quantity). Short-term recall measure (typical daily quantity).

Concurrent validity: r value of 0.57, and with the Alcohol Dependence Scale an r value of 0.54. Predictive validity: the daily quantity measure classified 6.2% of drinkers as chronic and 7.4%, indicating poor predictive validity.

Criterion validity (good); Predictive validity (good)

Whitfield et al. (2004), Australia

Voluntary sample. Respondents were 36% male and 64% female with a mean age of 33.7 years. Data were taken from 3 waves (1980, 1989 and 1993) of adult male and female participants of the Australian Twin Registry.

Test-retest reliability was calculated as correlations between occasions and between measures. Relationships between alcohol use and lifetime DSM-III-R alcohol dependence were examined.

Annual quantity-frequency measure. Short-term recall measure (past week recall of alcohol intake).

Test-retest reliability: correlations between 0.54 and 0.70, indicating moderate to good test-retest reliability.

Test-retest reliability (fair)

Table Legend: Table summarising the characteristics, findings and COSMIN quality ratings of included studies, grouped by study author, study population, methods used, studies and measures, psychometric properties reported by study authors and COSMIN quality ratings


properties regarding heavy drinking (≥5 drinks per occasion) for samples aged 18–39 [25] and for a general population [38] (see Table 2).
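Prediction of heavy drinking, as in the studies cited above and the sensitivity and specificity values in the O’Hare et al. (1997) row of Table 2, is usually summarised by comparing a self-report classification with a reference classification. The sketch below flags heavy drinking as ≥5 drinks on an occasion (the threshold quoted above) and scores those flags against a hypothetical reference; all data are invented.

```python
def heavy_drinking(max_drinks_per_occasion, threshold=5):
    """Flag heavy drinking as at least `threshold` drinks on a single occasion."""
    return max_drinks_per_occasion >= threshold


def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) of boolean predictions against a reference."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # heavy, correctly flagged
    fn = sum(1 for p, r in pairs if not p and r)      # heavy, missed
    tn = sum(1 for p, r in pairs if not p and not r)  # not heavy, correctly cleared
    fp = sum(1 for p, r in pairs if p and not r)      # not heavy, wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical self-reported maximum drinks per occasion and reference status.
self_report = [6, 2, 8, 4, 5, 1, 7, 3]
reference_heavy = [True, False, True, True, False, False, True, False]
flags = [heavy_drinking(d) for d in self_report]
print(sensitivity_specificity(flags, reference_heavy))  # (0.75, 0.75)
```

Moving the threshold trades one value against the other, which is why studies such as O’Hare et al. (1997) report both figures at a stated cut-off.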

Convergent validity

Moderate to good convergent validity was found in a general population sample for a two-week beverage-specific quantity-frequency measure, a graduated-frequency measure and a short-term recall measure [39]. Similarly, adequate or good convergent validity was recorded for the three types of measures of alcohol intake in a cohort of 20- to 63-year-olds [11] and in a general population [37]. A graduated-frequency and a short-term recall measure demonstrated good convergent validity in undergraduate student samples [8, 10]. A short-term recall measure completed by undergraduate student samples reported adequate to good convergent validity [40]. Also, adequate convergent validity was found for short-term recall measures in a male population sample [41] (see Table 2). Only one study referred to the divergent validity of the graduated-frequency and short-term recall measures, and only in terms of a negative correlation between religiosity and alcohol consumption in an undergraduate student sample [10] (see Table 2). Similarly, only one study referred explicitly to structural validity: a 30-day quantity-frequency measure that was used to collect data on alcohol consumption in a general population reported poor structural validity [42] (see Table 2).

Overall, the review found that only a relatively small number of studies investigated the COSMIN psychometric domains of each type of measure. Furthermore, the hypothesis validity and structural validity of the graduated-frequency measure were not investigated at all, nor was the structural validity of the short-term recall measure. Neither divergent validity nor construct validity was assessed for the quantity-frequency measure.

Discussion

Psychometric property ratings for measure types

Each type of measure appeared to have good criterion validity according to COSMIN methodology. Several different reference standards or criteria were used in the included studies to measure alcohol consumption (e.g. [9, 29]). The appropriateness of using peers [34], spousal reports [24] and short-term recall measures [31] as criterion standards is questionable, and perhaps it is unsurprising that these studies received a low quality rating (despite reporting good content validity). Currently, there is no gold standard for the measurement of alcohol consumption. Most countries use some standard unit of measurement (e.g. one drink, one unit), but there is a lack of consensus and no internationally accepted definition, which poses difficulties for the conduct of comparative analyses. Biological markers of alcohol consumption should be used more frequently to support and validate findings from self-report measures, as these methods are not subject to sampling errors or researcher or participant bias [14]. However, these measures are also not without risk of error. Alcohol abstinence in the 24 h prior to breath-, blood- or urine-ethanol measurement has been shown to produce low results even for heavy drinkers [43]. More research is needed to find a gold standard for alcohol consumption measurement.

Construct validity was poor for graduated-frequency and short-term recall measures, and not assessed for quantity-frequency measures. Only the structural validity of the quantity-frequency measure was assessed, and this construct validity-related property was deemed to be poor. Only one study investigated the predictive validity of the quantity-frequency measure, and it found that the validity was poor. Poor predictive validity suggests that the measure may not validly predict future alcohol intake in the general population, or drinking trajectories and alcohol-related consequences. The study was conducted with good methodological quality and received a good COSMIN score.

In contrast, the graduated-frequency and short-term recall measures achieved mixed results, including predicting with variable accuracy the outcomes of alcohol-related morbidity and mortality and alcohol dependence. There were several studies of the convergent validity of each measure, and generally this property was deemed to be moderate to good.

Test-retest results tended to indicate that similar outcome assessments of alcohol consumption were found when the quantity-frequency measure, graduated-frequency measure and short-term recall measure were re-administered. Mixed results were reported for inter-rater reliability of quantity-frequency and short-term recall measures, with poor inter-rater reliability found when the graduated-frequency measure was applied. In particular, there appeared to be difficulty obtaining good agreement between raters regarding the measurement of consumed beer, wine and liquor, respectively [27], and between self-report tests (AUDIT (Alcohol Use Disorders Identification Test [44]) and CAGE (Cut down, Annoyed, Guilty, Eye-opener) [45]) and a quantity-frequency measure when research assistants interviewed participants using a face-to-face predetermined appointment schedule [7]. It is important to note that these studies achieved only fair or poor COSMIN ratings. Indeed, many of the reported poor psychometric properties may be due to poorly conducted studies, as indicated by poor COSMIN ratings [6, 21, 31]. Variation between types of psychometric properties for the same measure (e.g. high validity for one property and low for another property) may be due to differences in study design and methodological quality.
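Most of the inter-rater and categorical test-retest results in Table 2 are kappa values. Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal proportions, κ = (p_o − p_e) / (1 − p_e). A minimal sketch, assuming two raters (for example, a respondent and a collateral informant) who assign the same people to drinking categories; the categories and data are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


self_report = ["none", "light", "heavy", "light", "none",
               "heavy", "light", "none", "light", "heavy"]
informant = ["none", "light", "light", "light", "none",
             "heavy", "heavy", "none", "light", "heavy"]
print(round(cohens_kappa(self_report, informant), 2))  # 0.7
```

Here p_o = 0.8 and p_e = 0.34, so κ ≈ 0.70. The same observed agreement would give a lower kappa if both raters placed most people in one category, which is one reason kappa and percentage agreement can diverge.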

Discrepancies between COSMIN ratings and psychometric properties

There were some studies in which there were discrepancies between COSMIN ratings of the quality of a psychometric property and the performance of a measure. For example, one study [6] reported good test-retest reliability for a typical weekly quantity-frequency measure, but the methodological quality of a particular aspect of the study was rated poor because the method of administering the measure of consumption (computer or paper) was not consistent across time points. Reasons for poor methodological quality ratings using the COSMIN checklist included inappropriate time intervals between measure administrations, ambiguity over management of missing responses, lack of assurance that patients remained stable between measure administrations, inadequate sample size and choice of inappropriate statistical methods (e.g. reporting Spearman's correlation coefficients [46] instead of kappa values for test-retest reliability).

Issues with self-reporting alcohol consumption

Self-reported alcohol consumption is difficult to measure accurately due to the influence of social desirability and memory issues, and these factors were alluded to in many included studies (e.g. [25, 27, 32, 35]). Possible solutions to these challenges include using more anonymised interview types, randomised response techniques, checking responses using more than one alcohol measure and using memory aids (interviewer prompts, calendars or diaries) [47]. Also, population-based survey research about alcohol consumption and drinking habits is particularly problematic when the sample includes alcoholics, because of uncertainty about whether or not participants are sober when interviewed, difficulty recalling consumption due to the effect of alcohol on memory, and increased alcohol tolerance in frequent heavy drinkers [48]. These issues pose challenges for the reliable and valid assessment of alcohol consumption in surveys. Potential solutions include factoring in more complex survey questions requiring greater reflection on alcohol intake (if respondents are asked to consider the timing, type of beverage drunk and episodic heavy drinking, their responses should be more considered) [17], use of a breathalyser before measure administration to ensure participants are alcohol-free [49] and creating an environment that is conducive to confidentiality and honest disclosure of alcohol consumption [48, 50]. These potential solutions may be incorporated into population-based survey collection of alcohol consumption data in order to afford greater confidence in the drinking status of participants and greater assurance that responses reflect consumption accurately.
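Randomised response techniques protect individual answers by letting chance decide which question a respondent answers. As one illustration, assuming Warner's original design (the cited source does not specify a variant), a private randomising device shows the statement "I drank heavily in the past week" with known probability p and its negation otherwise, and the interviewer records only yes or no. Because the yes-rate is λ = pπ + (1 − p)(1 − π), the prevalence is recovered as π = (λ − (1 − p)) / (2p − 1). The simulation below uses invented parameters.

```python
import random


def warner_estimate(yes_rate, p):
    """Solve yes_rate = p*pi + (1 - p)*(1 - pi) for the prevalence pi."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the answers carry no information")
    return (yes_rate - (1 - p)) / (2 * p - 1)


def simulate_yes_rate(true_prevalence, p, n, seed=0):
    """Simulate n truthful respondents under Warner's design; return the yes-rate."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        heavy = rng.random() < true_prevalence   # respondent's true status
        direct = rng.random() < p                # device shows the direct statement
        # "Yes" to the direct statement if heavy; to its negation if not heavy.
        yes += heavy if direct else not heavy
    return yes / n


lam = simulate_yes_rate(true_prevalence=0.2, p=0.75, n=100_000)
print(round(lam, 3), round(warner_estimate(lam, p=0.75), 3))
```

The price of privacy is precision: the sampling variance of the estimate, λ(1 − λ) / (n(2p − 1)²), grows quickly as p approaches 0.5, so larger samples are needed than for a direct question.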

Comparison with previous reviews

Generally, the measures did not appear to vary significantly across population age and sex groupings. The level of alcohol consumption being assessed appeared to exert some influence on the psychometric performance of self-report measures. Parker [27] reported good concurrent validity using a short-term recall measure, though for heavy drinkers only. Gmel [13] found that the graduated-frequency measure over-reported alcohol intake, whereas the beverage-specific quantity-frequency measure provided a more accurate measure of consumption. The Feunekes review recommended that the quantity and frequency of alcohol consumption should be prioritised and assessed separately for specific types of alcoholic beverages [14], and beverage-specific quantity-frequency measures performed accurately and reliably, though only in relation to the consumption of lower levels of alcohol [26, 28]. The use of a ‘diary’ format with a predetermined timeframe (which afforded individuals an opportunity to record all alcohol consumption in a format of their choice, usually that of a short-term recall measure) had good psychometric properties [24, 29]. This finding may suggest that the use of an ‘actual’ time period instead of the ‘usual’ timeframes in quantity-frequency and graduated-frequency measures [51] may add to the reliability and validity of assessments of alcohol consumption. However, both reviews found that the quantity-frequency measure performed with the most reliability and validity and was the measure with the highest concordance with the short-term recall ‘diary’ measure [22, 29, 33, 38].
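A beverage-specific quantity-frequency measure asks separately about the usual frequency and usual quantity for each beverage and sums the products. A minimal sketch, assuming frequencies in drinking days per year, quantities in standard drinks, and a country-dependent grams-per-drink conversion (10 g here, an assumption; as noted in the Discussion, there is no internationally accepted standard unit). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BeverageQF:
    beverage: str
    days_per_year: float   # usual drinking frequency for this beverage
    drinks_per_day: float  # usual quantity, in standard drinks, on those days


def annual_volume_grams(responses, grams_per_drink=10.0):
    """Sum frequency x quantity over beverages and convert drinks to grams of ethanol.

    grams_per_drink is an assumption: standard drinks differ by country
    (for example 8 g in a UK unit, 10 g in Australia and 14 g in the US).
    """
    drinks = sum(r.days_per_year * r.drinks_per_day for r in responses)
    return drinks * grams_per_drink


answers = [
    BeverageQF("beer", days_per_year=104, drinks_per_day=3),    # about twice a week
    BeverageQF("wine", days_per_year=52, drinks_per_day=2),     # about once a week
    BeverageQF("spirits", days_per_year=12, drinks_per_day=1),  # about monthly
]
print(annual_volume_grams(answers))  # 4280.0
```

Because each answer describes ‘usual’ behaviour, the total estimates typical volume; irregular heavy occasions are not captured unless further questions ask about them.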

Recommendations for improved reliability and validity

The review findings suggest that the reliability and validity of self-report alcohol consumption measures may be improved in various ways. For example, computerised or automated modes of administration, rather than an interviewer-based mode, might facilitate greater privacy and more candid reporting [52]. Longer timeframes (e.g. weekly, monthly or annual recall) may be more desirable as they tend to capture less frequent drinkers, and questions which involve specified timeframes (e.g. last week, last year) rather than ‘usual’ reference frames require respondents to focus their recall. Beverage-specific questions and questions that ask respondents to group responses into graduated categories may encourage a more thorough consideration of their alcohol consumption and, in turn, produce more accurate reporting. It is worth considering that the self-report measures themselves are outdated, as they focus only upon the frequency and volume of alcohol. It may be worthwhile instead to use self-report tests of alcohol consumption that also take into account symptoms of alcohol addiction/dependence. Using the review findings, the advantages and disadvantages of each measure type are summarised (Table 3).
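Questions that group responses into graduated categories are the basis of the graduated-frequency measure. One common scoring scheme, assumed here together with the hypothetical band labels and midpoints (instruments differ), asks on how many days in the past year the respondent drank at least each quantity level, differences the cumulative counts to obtain the days within each band, and weights each band by a representative value, similar in spirit to the midvalues reported for McGinley et al. (2014) in Table 2.

```python
# Hypothetical quantity bands (drinks per occasion), highest first, each with an
# assumed representative value; the value for the open-ended top band is arbitrary.
BANDS = [("12+", 13.0), ("8-11", 9.5), ("5-7", 6.0), ("3-4", 3.5), ("1-2", 1.5)]


def gf_annual_drinks(days_at_least):
    """Annual drinks from graduated-frequency answers.

    days_at_least[label] is the number of days with at least the band's lower
    bound of drinks, so counts cannot decrease as the bands get lower.
    """
    total = 0.0
    days_in_higher_bands = 0
    for label, midpoint in BANDS:
        cumulative = days_at_least[label]
        if cumulative < days_in_higher_bands:
            raise ValueError(f"count for {label} is below a higher band's count")
        total += (cumulative - days_in_higher_bands) * midpoint  # days in this band
        days_in_higher_bands = cumulative
    return total


answers = {"12+": 2, "8-11": 6, "5-7": 20, "3-4": 50, "1-2": 120}
print(gf_annual_drinks(answers))  # 358.0
```

Because each band is scored separately, occasional heavy days contribute to the total rather than being averaged into a single ‘usual’ quantity.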


Limitations and strengths

The review found wide variation in the structure, content and format of quantity-frequency, graduated-frequency and short-term recall measures. For example, time-period referents ranged from 24-h recall to alcohol intake over the previous year, and alcohol consumption was assessed in terms of units (standardised to the country of each sample of respondents), grams of alcohol, typical sizes of sold drinks and beverage-specific drinks. The included studies, drawn from various multidisciplinary databases, covered a range of locations, cultures and populations, and these factors were taken into account in the analytical comparisons of measures of alcohol consumption. A proportion of the review studies focused on undergraduate student populations (e.g. [8, 10, 34, 40]). Arguably, students may be atypical of the general population [53], and their alcohol consumption patterns may generalise poorly to the wider population, particularly to older people. Some psychometric properties were not assessed, including measurement error, cross-cultural validity, internal consistency and responsiveness. All included studies were in the English language (in keeping with COSMIN manual guidelines), so important studies in other languages may have been missed. The review adhered to the COSMIN manual [15], and whilst the COSMIN method adds rigour to psychometric assessment, a limitation is arguably its 'worst score counts' rule: even when a study attains higher quality scores on some items, the lowest score within an item list is taken as the overall quality rating (e.g. [28, 31]). Furthermore, studies of poor design quality were included in the review owing to the overall lack of studies that met the initial eligibility criteria.

Nevertheless, the review was completed in a methodologically robust fashion as per the COSMIN approach, which has transparent, tested and validated resources such as a manual, search filters and a quality appraisal tool [15]. Particular strengths include the use of extensive search terms and having two reviewers search the literature.
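The 'worst score counts' rule can be stated in a few lines of Python. This is a sketch, not the COSMIN scoring materials: it assumes each psychometric property (a COSMIN 'box') is rated from a list of checklist items, each scored poor, fair, good or excellent as in this review, and the names Rating and worst_score_counts are hypothetical.

```python
from enum import IntEnum
from typing import Iterable

class Rating(IntEnum):
    """COSMIN item ratings, ordered so that a lower value means lower quality."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

def worst_score_counts(item_ratings: Iterable[Rating]) -> Rating:
    """Overall quality for one checklist box: the lowest rating among its items."""
    ratings = list(item_ratings)
    if not ratings:
        raise ValueError("at least one item rating is required")
    return min(ratings)

# Hypothetical box: three strong items cannot offset one poor item.
box = [Rating.EXCELLENT, Rating.GOOD, Rating.EXCELLENT, Rating.POOR]
print(worst_score_counts(box).name)  # POOR
```

By contrast, averaging the four item scores (4, 3, 4, 1) would give 3, i.e. good. COSMIN's rationale for taking the minimum is that weak methodological aspects of a study should not be compensated for by strong ones; the cost, as noted above, is that a single weak item determines the overall rating.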

Conclusion

The studies of quantity-frequency measures indicated good/adequate psychometric properties for test-retest reliability, criterion validity, convergent validity and hypothesis validity; predictive and structural validity were rated as poor, and inter-rater reliability showed mixed results. For graduated-frequency measures, good/adequate psychometric properties were reported for test-retest reliability, convergent validity and divergent validity; criterion validity and predictive validity showed mixed results, and construct validity and inter-rater reliability were rated as poor. Short-term recall measures achieved good/adequate psychometric properties for test-retest reliability, convergent validity, hypothesis validity, construct validity and divergent validity, whereas criterion validity, predictive validity and inter-rater reliability showed mixed results. The review findings add to the previously published alcohol self-report literature by providing an updated appraisal of measures of alcohol consumption, and indicate that combining aspects of the various measures may enhance the reliable and valid assessment of drinking patterns.

It is difficult to discern which of the existing measures is the most reliable and valid, given that certain psychometric properties were not assessed and that the included studies reported mixed results. Arguably, when the results from the range of studies are considered together, the quantity-frequency measure appeared to perform best in psychometric terms compared with the other two measure types and, therefore, is likely to produce the most reliable and valid assessment of alcohol consumption in population surveys. The results indicated that the features of alcohol consumption measures which performed with good reliability and validity were those that assessed beverage-specific alcohol consumption, used actual timeframes and asked about episodes of binge drinking, and that quantity-frequency measures appeared to be the 'best' questionnaire type currently available to measure self-reported alcohol consumption. Clearly, there is a need for more focused psychometric studies of measures of alcohol consumption, including head-to-head comparative population-based and community surveys. Comparing these results with previous reviews [13, 14] is difficult because those reviews did not employ a COSMIN methodology to appraise studies. Overall, the findings appear to be in keeping with the Gmel review [13], which found that a beverage-specific quantity-frequency measure recorded alcohol consumption more reliably, and with the Feunekes review [14], which reported that the most accurate measurement of alcohol intake was provided by quantity-frequency and short-term recall measures.

Table 3 Summary of the advantages and disadvantages of the quantity-frequency, graduated-frequency and short-term recall measures

Quantity-frequency measures
Advantages:
• Easily administered.
• Simple structure; respondents are more likely to understand the measure.
• Well established (respondents are more likely to be familiar with the measure).
• Captures 'usual' drinking behaviour, unaffected by occasions or seasons when more alcohol may be consumed.
• Reliability can be increased by including beverage-specific questions.
Disadvantages:
• May not record heavy episodic drinking occasions.

Graduated-frequency measures
Advantages:
• Categories act as prompts for respondents.
• Answers are easily standardised to identify those drinking above the guidelines.
• Reliability can be increased by including beverage-specific questions.
Disadvantages:
• May not record heavy episodic drinking occasions.

Short-term recall measures
Advantages:
• Questions can focus on specific drinking events.
• Requires respondents to consider their responses to a greater extent (as answers are not structured).
• Respondents can report their alcohol consumption (in standard drink sizes, units, etc.) in a way they are familiar with.
• Reliability can be increased by including beverage-specific questions.
Disadvantages:
• Hard to standardise answers to the same measure recorded in different formats.
• Respondents may be confused by the lack of response options.

Table Legend: Summary of the advantages and disadvantages of the three self-reported alcohol consumption measure types: the quantity-frequency, graduated-frequency and short-term recall measures
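Table 3 notes that graduated-frequency answers are easily standardised to identify those drinking above the guidelines. A minimal Python sketch of that step follows, under several assumptions: the respondent has reported, for non-overlapping quantity categories expressed in UK units, the number of drinking days in the past year; each category is scored at its midpoint; a year is approximated as 52 weeks; and the weekly threshold defaults to the UK low-risk guideline of 14 units. The category bounds, function names and answers are hypothetical. Instruments that ask cumulatively (for example, "on how many days did you drink X or more?") would need their frequencies differenced into non-overlapping categories first.

```python
from typing import Mapping, Tuple

# Hypothetical graduated-frequency answers: maps a quantity category
# (lowest, highest UK units per drinking day) to the number of days in the
# past year on which the respondent drank within that category.
GFAnswers = Mapping[Tuple[int, int], int]

def gf_annual_volume(answers: GFAnswers) -> float:
    """Estimated annual units: days in each category times the category midpoint."""
    return sum(days * (low + high) / 2 for (low, high), days in answers.items())

def exceeds_weekly_guideline(answers: GFAnswers, weekly_limit: float = 14.0) -> bool:
    """True if average weekly volume (annual / 52) exceeds the limit (14 units assumed)."""
    return gf_annual_volume(answers) / 52 > weekly_limit

respondent_a = {(1, 2): 100, (3, 4): 50, (5, 7): 20}
respondent_b = {(5, 7): 100, (8, 11): 50}
print(gf_annual_volume(respondent_a), exceeds_weekly_guideline(respondent_a))  # 445.0 False
print(gf_annual_volume(respondent_b), exceeds_weekly_guideline(respondent_b))  # 1075.0 True
```

The first hypothetical respondent averages about 8.6 units per week and the second about 20.7, so only the second is flagged. Midpoint scoring is a simplifying choice; the real volume within a category is unknown, which is one reason graduated-frequency estimates remain approximate.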

Additional files

Additional file 1: Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the PRISMA statement checklist [16]. Checklist of the minimum required items to be reported as part of a systematic review. (DOC 62 kb)

Additional file 2: Table S1. Characteristics of included studies. A full description of the characteristics of each study that met the review inclusion criteria (n = 28). (DOCX 25 kb)

Additional file 3: Table S2. Psychometric properties of included studies, grouped into results reported by study authors and COSMIN quality ratings assigned by review authors (n = 28). (DOCX 41 kb)

Abbreviations

AUDIT: Alcohol use disorders identification test [44]; CAGE: Cut down, Annoyed, Guilty, Eye-opener (test for problem alcohol use) [45]; COSMIN: Consensus-based Standards for the selection of health measurement instruments [15]; DSM: Diagnostic and statistical manual of mental disorders [56]; MAST: Michigan alcoholism screening test [55]; DSMIIIR: Diagnostic and statistical manual of mental disorders, 3rd edition, revised; DSMIV: Diagnostic and statistical manual of mental disorders, 4th edition; GF: Graduated-frequency; UK: United Kingdom

Acknowledgements

Not applicable

Funding

This review was completed as part of a PhD funded by the Department of Employment and Learning Northern Ireland (DEL NI).

Availability of data and materials

All data generated or analysed during this study are included in this published article [and Additional files 2 and 3].

Authors’ contributions

MD and DOR conceived of the study. HMcK and CT created the search strategy and HMcK conducted the search. HMcK, CT and MD reviewed studies for suitability against the inclusion criteria. HMcK extracted study information. MD and CT assisted in drafting the manuscript. All authors read and approved the final manuscript.

Authors’ information

The study was conducted at the Centre for Public Health, Queen’s University Belfast.

Ethics approval and consent to participate

All included studies involving the use of human participants were conducted with ethical approval and consent.

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

1 Centre for Public Health, School of Medicine, Dentistry and Biomedical Sciences, Institute of Clinical Sciences – Block B, Royal Victoria Hospital site, Queen’s University Belfast, BT12 6BJ Belfast, Northern Ireland.
2 UKCRC Centre of Excellence for Public Health (Northern Ireland), Queen’s University Belfast, Belfast, Northern Ireland.
3 Administrative Data Research Centre (Northern Ireland), Queen’s University Belfast, Belfast, Northern Ireland.

Received: 8 November 2017 Accepted: 18 January 2018

References

1. World Health Organisation, “Global strategy to reduce the harmful use of alcohol,” World Health Organisation, 1st May 2010. Available: http://www.who.int/substance_abuse/activities/gsrhua/en/. [Accessed 18 July 2017].

2. Department of Health, “Health risks from alcohol: new guidelines,” gov.uk, 8th January 2016. Available: https://www.gov.uk/government/consultations/health-risks-from-alcohol-new-guidelines. [Accessed 1 Aug 2017].

3. DrinkAware, “What is an alcohol unit?,” DrinkAware, 16 January 2016. Available: https://www.drinkaware.co.uk/alcohol-facts/alcoholic-drinks-units/what-is-an-alcohol-unit/. [Accessed 21 Dec 2017].

4. Murray C, Richards M, Newton JN, Fenton KA, Anderson HR, Atkinson C, Bennett D, Bernabe E, Blencowe H, Bourne R, Braithwaite T, Brayne C, Bruge T, Brugha TS, Burney P, Dherani M, Dolk H, Edmond K, Ezzati M, Fleming ND, Fleming ND, Freedman G, Gunnell D, Hay RJ, Hutchings SJ, LOhno S, Lozano R, Lyons RA, Marcenes W, Magnavi M, Newton CR, Pearce N, Pope D, Rushton L, Salomon JA, Shibuya K, Wang T, Wang T, Williams HC, Woolf AD, Lopez AD, Davis A. UK health performance: findings of the global burden of disease study 2010. Lancet. 2013;381(9871):997–1020.

5. Dawson D. Methodological issues in measuring alcohol use. Alcohol Res Health. 2003;27(1):18–28.

6. Bonevski B, Campbell E, Sanson-Fisher R. The validity and reliability of an interactive computer tobacco and alcohol use survey in general practice. Addict Behav. 2010;35(1):492–8.


7. Reid M, Tinetti M, O'Connor P, Kosten T, Concato J. Measuring alcohol consumption among older adults: a comparison of available methods. Am J Addictions. 2003;12(3):211–9.

8. O'Hare T. Measuring alcohol consumption: a comparison of the retrospective diary and the quantity-frequency methods in a college drinking survey. J Stud Alcohol. 1991;52(5):500–2.

9. O'Hare T. Comparing the QFI, the retrospective diary and binge drinking in college first offenders. J Alcohol Drug Educ. 1997;42(3):40–53.

10. Dollinger S, Malmquist D. Reliability and validity of single-item self-reports: with special relevance to college students’ alcohol use, religiosity, study and social life. J Gen Psychol. 2009;136(3):231–41.

11. Poikolainen K, Podkletnova I, Alho H. Accuracy of quantity-frequency and graduated frequency questionnaires in measuring alcohol intake: comparison with daily diary and commonly used laboratory markers. Alcohol Alcoholism. 2002;37(6):573–6.

12. National Health Service, “Alcohol Misuse,” National Health Service, 28 November 2015. Available: https://www.nhs.uk/conditions/alcohol-misuse/. [Accessed 21 Dec 2017].

13. Gmel G, Rehm J. Measuring alcohol consumption. Contemp Drug Probl. 2004;31(3):467–540.

14. Feunekes G, van ‘t Veer P, van Staveren WA, Kok FJ. Alcohol intake assessment: the sober facts. Am J Epidemiol. 1999;150(1):105–12.

15. Mokkink L, Terwee C, Patrick D, Alonso J, Stratford P, Knol D, Bouter L, de Vet HC. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539–49.

16. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.

17. Toneatto T, Sobell M, Sobell L. Predictors of alcohol abusers’ inconsistent self-reports of their drinking and life events. Alcoholism Clin Exp Res. 1992;16:542–6.

18. Terwee C, Bot S, de Boer M, van der Windt D, Knol D, Dekker J, Bouter L, de Vet H. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.

19. Mokkink L, Terwee C, Knol D, Stratford P, Alonso J, Patrick D, Bouter L, de Vet HC. The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content. BMC Med Res Methodol. 2006;10(22):1471–2288.

20. L. Mokkink, H. de Vet, C. Prinsen, D. Patrick, J. Alonso, L. Bouter and C. Terwee, “COSMIN risk of bias checklist for systematic reviews of patient reported outcome measures,” 12th December 2017. Available: https://doi.org/10.1007/s11136-017-1765-4. [Accessed 21 Dec 2017].

21. Hansell N, Agrawal A, Whitfield J, Morley K, Zhu G. Long-term stability and heritability of telephone interview measures of alcohol consumption and dependence. Twin Res Hum Genet. 2008;11(3):287–305.

22. Whitfield J, Madden P, Neale M, Heath A, Martin N. The genetics of alcohol intake and of alcohol dependence. Alcoholism Clin Exp Res. 2004;28(8):1153–60.

23. Gruenewald P, Johnson F. The stability and reliability of self-reported drinking measures. J Stud Alcohol. 2006;67(1):738–45.

24. Chaikelson J, Arbuckle T, Lapidus S, Pushkar Gold D. Measurement of lifetime alcohol consumption. J Stud Alcohol. 1994;55(1):133–40.

25. Greenfield T, Nayak M, Bond J, Kerr W, Ye Y. Test-retest reliability and validity of life-course alcohol consumption measures: the 2005 National Alcohol Survey follow-up. Alcoholism Clin Exp Res. 2014;38(9):2479–87.

26. Crum R, Puddley I, Gee G, Fried L. Reproducibility of two approaches for assessing alcohol consumption among older adults. Addict Res Theory. 2002;10(4):373–85.

27. Parker D, Derby C, Usner D, Gonzalez S, Lapane K, Carleton R. Self-reported alcohol intake using two different question formats in southeastern New England. Int J Epidemiol. 1996;25(4):770–4.

28. Russell M, Welte J, Barnes G. Quantity-frequency measures of alcohol consumption: beverage-specific vs global questions. Br J Addict. 1991;86(1):409–17.

29. Sander A, Witol A, Kreutzer J. Alcohol use after traumatic brain injury: concordance of patients’ and relatives’ reports. Alcohol Trauma Brain Inj. 1997;78(1):138–41.

30. Cutler S, Wallace P, Haines A. Assessing alcohol consumption in general practice patients: a comparison between questionnaire and interview. Alcohol Alcoholism. 1988;23(6):441–50.

31. Koppes L, Twisk J, Snel J, Kemper H. Concurrent validity of alcohol consumption measurement in a ‘healthy’ population; quantity-frequency questionnaire v. dietary history interview. Br J Nutr. 2002;88(1):427–34.

32. Weingardt K, Baer J, Kivlahan D. Episodic heavy drinking among college students: methodological issues and longitudinal perspectives. Psychol Addict Behav. 1998;12(3):155–67.

33. Read J, Kahler C, Strong D, Colder C. Development and preliminary validation of the young adult alcohol consequences questionnaire. J Stud Alcohol. 2006;67(1):169–77.

34. Northcote J, Livingston M. Accuracy of self-reported drinking: observational verification of ‘last occasion’ drink estimates of young adults. Alcohol Alcoholism. 2011;46(6):709–13.

35. McGinley J, Curran P. Validity counts with multiplying ordinal items defined by binned counts: an application to a quantity-frequency measure of alcohol use. Methodol (Gott). 2014;10(3):108–16.

36. Tuunanen M, Aalto M, Seppa K. Mean-weekly alcohol questions are not recommended for clinical work. Alcohol Alcoholism. 2013;48(3):308–11.

37. Rehm J, Greenfield T, Walsh G, Xic X, Robson L, Single E. Assessment methods for alcohol consumption, prevalence of high risk drinking and harm: a sensitivity analysis. Int J Epidemiol. 1999;28(1):219–24.

38. Searles J, Perrine M, Mundt J, Helzer J. Self-report of drinking using touch-tone telephone: extending the limits of reliable daily contact. J Stud Alcohol. 1995;56(4):375–82.

39. Hilton M. A comparison of a prospective diary and two summary recall techniques for recording alcohol consumption. Br J Addict. 1989;84(1):1085–92.

40. LaBrie J, Penderson E, Earleywine M. A group-administered timeline followback assessment of alcohol use. J Stud Alcohol. 2004;66(5):693–7.

41. Searles J, Helzer J, Walter D. Comparison of drinking patterns measured by daily reports and timeline followback. Psychol Addict Behav. 2000;14(3):277–86.

42. Lennox R, Zarkin G, Bray J. Latent variable models of alcohol-related constructs. J Subst Abus. 1996;8(2):241–50.

43. Sharpe P. Biochemical detection and monitoring of alcohol abuse and abstinence. Ann Clin Biochem. 2001;38:652–64.

44. World Health Organisation. The alcohol use disorders identification test. Geneva: Department of Mental Health and Substance Dependence; 2001.

45. Ewing J. Detecting alcoholism. The CAGE questionnaire. J Am Med Assoc. 1984;252(14):1905–7.

46. Daniel WW. Applied nonparametric statistics. London: Houghton Mifflin; 1978.

47. Bowling A. Mode of questionnaire administration can have serious effects on data quality. J Public Health. 2005;27(3):281–91.

48. L. Sobell and M. Sobell, “Alcohol consumption measures,” 01 August 2004. Available: https://pubs.niaaa.nih.gov/publications/assessingalcohol/measures.htm. [Accessed 07 June 2017].

49. Sobell L, Toneatto T, Sobell M. Behavioral assessment and treatment planning for alcohol, tobacco, and other drug problems: current status with an emphasis on clinical applications. Behav Ther. 1994;25:533–80.

50. Midanik L. The validity of self-reported alcohol consumption and alcohol problems: a literature review. Addiction. 1982;77(4):357–82.

51. Werch C. Quantity-frequency and diary measures of alcohol consumption for elderly drinkers. Int J Addict. 1989;24(9):859–65.

52. Lucas R, Mullin P, Luna C, McInroy D. Psychiatrists and a computer as interrogators of patients with alcohol-related illnesses: a comparison. Br J Psychiatry. 1977;131:160–7.

53. Slutske WS, Hunt-Carter EE, Nabors-Oberg RE, Sher KJ, Bucholz KK, Madden PAF, Anokhin A, Heath AC. Do college students drink more than their non-college-attending peers? Evidence from a population-based longitudinal female twin study. J Abnorm Psychol. 2004;113(4):530–40.

54. Streiner DL, Norman GR, Cairney J. Health measurement scales: a practical guide to their development and use. Oxford: Oxford University Press; 2015.

55. Selzer M. The Michigan alcoholism screening test: the quest for a new diagnostic instrument. Am J Psychiat. 1971;127(12):1653–8.

56. Diagnostic & Statistical Manual of Mental Disorder. Diagnostic and statistical manual of mental disorders, fifth edition. 5th ed. Arlington: American Psychiatric Association; 2013.
