    Person. individ. Diff. Vol. 7, No. 3, pp. 385-400, 1986. Printed in Great Britain. All rights reserved. 0191-8869/86 $3.00 + 0.00. Copyright © 1986 Pergamon Press Ltd

    RESPONSE BIAS, SOCIAL DESIRABILITY AND DISSIMULATION

    ADRIAN FURNHAM
    Department of Psychology, University College London, 26 Bedford Way, London WC1, England

    (Received 14 August 1985)

    Summary: This review set out to survey the extensive literature on response bias, and particularly on dissimulating a socially desirable response to self-report data. Various terminological differences are discussed as well as the way test constructors attempt to measure or overcome social desirability response sets. As an example of the research in this field, four types of studies measuring social desirability in the Eysenckian personality measures (MPI, EPI, EPQ) are reviewed. Studies of faking in psychiatric symptom inventories and a wide range of other tests are also briefly reviewed. Attempts to determine what makes some measures more prone to social desirability than others have produced various equivocal results. However, there appears to be growing evidence that social desirability is a relatively stable, multidimensional trait, rather than a situationally-specific response set. Faking studies may also be used to examine people's stereotypes and images of normality and abnormality, and various studies of abnormal groups' perception of normality are examined. Recommendations for further work in this area are proposed.

    1. INTRODUCTION

    To what extent does the fact that a test is open to various response biases (i.e. faking, social desirability) mean that it is invalid? Why are some tests more susceptible to response bias than others? What does dissimulation tell us about a S's conception of what is normal, desirable etc.? Are social desirability scales measures of substantive traits or response styles? This review will attempt to answer some of the above questions which, although they have been around for nearly 40 years (Goldstein, 1945; Meehl and Hathaway, 1946; Gough, 1947; Hunt, 1948; Green, 1951; Nall, 1951), are still currently debated and researched (Linehan and Nielsen, 1983; McCrae and Costa, 1983).

    Firstly, it is probably worth distinguishing between a number of generic and synonymous terms used in this area. The first is response bias, which is a generic term for a whole range of responses to interviews, surveys or questionnaires which bias the response (away from the correct, honest, accurate response). They include the socially desirable or faking-good response (see below) as well as its opposite, faking bad (or mad); acquiescence or yea-saying (the tendency to agree irrespective of the question) or its opposite, nay-saying; and the extremity response set (always choosing extreme opposites) or its opposite, the mid-point response set, etc. These response sets may be due to the nature of the question as much as the motives of the respondents [see Kalton and Schuman (1982) for an excellent review]. A second set of synonymous terms are faking, lying and dissimulating, each of which refers to the fact that the respondent is concealing the truth under a feigned semblance of something different. Faking/dissimulating refers specifically to those occasions when a respondent is deliberately giving false responses in order to create a specific impression (that he or she is ill, merits a job, is mad etc.). A much more specific term is social desirability, which has come to be used as a general phrase to represent tendencies to distort self-reports in a favourable direction. It has been defined by Nederhof (1985) as a S's tendency to deny socially undesirable traits and to claim socially desirable ones, and the tendency to say things which place the speaker in a favourable light (p. 269). Whereas faking and dissimulation refer to any sort of dishonest response, social desirability refers specifically to one sort of faking: the presentation of self in a positive light. That is, it should not be seen as self-deception but as deliberate other-deception.

    A great deal of research has been done on dissimulation and faking though, not unnaturally, most of it has concentrated on social desirability as this appears to be the most important and common issue in testing. However, there is also some interest in faking bad or malingering, as certain bodies (prisons, military establishments) often need to catch those individuals deliberately trying to portray themselves as sick (in mind or body).

    Test constructors have realized that response biases and dissimulation, especially social desirability sets, threaten the validity of test results. Nederhof (1985) mentioned studies where social desirability bias accounts for between 10 and 75% of the variance. He notes: "Although the amounts differ considerably as could be expected because of the importance of the environmental influence or social desirability, the accumulated evidence concerning the pervasiveness of social desirability strongly suggests the desirability of the application of appropriate methods to control the social desirability bias in many studies" (p. 265). Hence test constructors have set about trying to measure or overcome these biases. The methods are essentially four-fold. The first is to provide a measure of lying/faking in the questionnaire itself (the EPQ and MMPI both have lie scales), so that dissimulating respondents may be identified by excessive scores. The most popularly used and researched instrument is the Marlowe-Crowne Social Desirability Scale (Crowne and Marlowe, 1960), though there are others. Nederhof (1985) has argued that these may be used to reject the data of high-scoring Ss; to correct the data of high scorers; or, thirdly, merely to register the impact of social desirability bias. In fact these lie scales go back to the work of Hartshorne and May (1928). Secondly, one may correlate the questionnaire with one of the numerous measures of social desirability in an attempt to discover whether some or all of the measures are associated with patterns of social desirability. Thirdly, one may do an item-by-item analysis of the questionnaire in an attempt to ascertain which items are sensitive or susceptible to faking, and in which direction. This is often a cumbersome and unreliable method. Finally, a popular method is to measure the susceptibility of a measure to bias by asking Ss to deliberately fake good or bad (or mad or whatever) and then to compare these results with those of a control group who were asked to be honest in their answers.
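    The fourth method above (comparing a group instructed to fake with an honest control group) can be sketched as a standardized mean difference. This is only a minimal illustration, not a procedure taken from any of the studies reviewed: the choice of Cohen's d as the index, the function name and the scores are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(fake_scores, control_scores):
    """Standardized mean difference between a faking group and a control group.

    Uses the pooled sample standard deviation, so each group needs at
    least two scores and the pooled variance must be non-zero.
    """
    n1, n2 = len(fake_scores), len(control_scores)
    pooled_var = ((n1 - 1) * variance(fake_scores)
                  + (n2 - 1) * variance(control_scores)) / (n1 + n2 - 2)
    return (mean(fake_scores) - mean(control_scores)) / sqrt(pooled_var)


# Hypothetical lie-scale scores, invented purely for illustration.
fake_good = [8, 10, 9, 11, 12]
honest_control = [4, 6, 5, 7, 3]

d = cohens_d(fake_good, honest_control)
print(f"Standardized difference on the lie scale: {d:.2f}")
```

    A large positive value on a lie scale, or a large shift on a substantive scale, would indicate that the measure is susceptible to the instructed faking set. A significance test would normally accompany such an index.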

    Of course these different methods, though related, are not synonymous and may lead to slightly different results. However, what is perhaps most interesting about this research area is the relative consistency in the findings.

    Nearly all test constructors have been concerned about response bias, particularly social desirability. High correlations between measures of social desirability and the test are usually taken as a sign of the invalidity of a test. However, at least three explanations for these correlations may be possible. Firstly, the results may not be artifactual if indeed a person is conscientious, coping, adjusted etc., which would inevitably lead to a high social desirability score. It would indeed be an irony if honest, healthy respondents were all seen as liars. Secondly, social desirability may measure a disposition which overlaps (positively or negatively) with the other test. Whether this is a need for approval (Edwards, 1957), social naivete (Eysenck and Eysenck, 1975) or evidence for dichotomous reasoning (Nevid, 198.1) is not clear, but a significant correlation may simply indicate a certain convergence between two individual difference measures. Thirdly, the test may simply be measuring a response set and the correlation may be seen to invalidate the test. This argument tends to ignore the inevitable individual differences in faking, preferring to dismiss the usefulness of the test outright.

    In his recent critical review of methods to prevent or reduce social desirability Nederhof (1985) lists seven: three deal with other-deceptive, situational determinants; one with selection of interviewers; one with choice of Ss; and two with the data-collection situation. They include:

    (i) Forced-choice items: Ss choose between two items equal in desirability and hence their choice cannot be seen to be influenced by social desirability.

    (ii) Neutral questions: only questions which are neutral as regards social desirability are included.

    (iii) Randomized response technique: this technique allows Ss to answer one of two randomly-selected items with the interviewer not knowing which item was answered.

    (iv) Self-administered questionnaires: this reduces the salience of social cues by isolating the S.

    (v) Bogus pipeline: Ss are led to believe that the machine to which they are attached can detect whether or not they speak the truth.

    (vi) Selecting interviewers: social desirability is reduced when Ss are similar to their interviewers, who must also be warm and person-oriented.

    (vii) Proxy Ss: instead of interviewing the S, someone who knows him or her well is questioned about the behaviour of the target person.

    In conclusion, not one method excels completely and under all conditions in coping with social desirability bias. Most methods were shown to be at best reasonable palliatives. A combination of one or several prevention methods and one of the detection methods seems the best choice available. It should be noted that the exact effectiveness with regard to both other-deception and self-deception of most of the prevention methods has yet to be determined empirically (p. 276).
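    The randomized response technique in item (iii) of the list above can be illustrated with a short sketch. It assumes one common variant, the unrelated-question design, in which a private randomizing device sends each S either to the sensitive item or to an innocuous item whose rate of "yes" answers is known in advance. The review does not specify which variant Nederhof had in mind, so the design, the function names and all the numbers below are assumptions made for the example.

```python
import random


def estimate_sensitive_rate(observed_yes_rate, p_sensitive, innocuous_yes_rate):
    """Recover the proportion answering "yes" to the sensitive item.

    The observed rate is a mixture of the two items:
        observed = p * sensitive + (1 - p) * innocuous
    so the sensitive rate follows by solving for it.
    """
    if not 0 < p_sensitive <= 1:
        raise ValueError("p_sensitive must be in (0, 1]")
    return (observed_yes_rate
            - (1 - p_sensitive) * innocuous_yes_rate) / p_sensitive


def simulate_yes_rate(true_rate, p_sensitive, innocuous_yes_rate, n, seed=0):
    """Simulate n respondents who answer whichever item they were sent to."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        if rng.random() < p_sensitive:
            yes += rng.random() < true_rate            # sensitive item
        else:
            yes += rng.random() < innocuous_yes_rate   # innocuous item
    return yes / n


# Hypothetical survey: 70% of Ss are directed to the sensitive item, and
# half of all people are known to answer "yes" to the innocuous item.
observed = simulate_yes_rate(true_rate=0.2, p_sensitive=0.7,
                             innocuous_yes_rate=0.5, n=20000)
print(round(estimate_sensitive_rate(observed, 0.7, 0.5), 3))
```

    No individual answer reveals which item was answered, yet the group-level proportion can still be estimated. The price is a larger sampling error than a direct question would give.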

    The issue of response bias, social desirability and dissimulation has been actively researched since the Second World War. Some self-report measures have been more extensively investigated than others but the research is voluminous. This paper will review some of these studies in depth before going on to address some of the more important issues resulting from this work. Specifically the Eysenck personality measures and psychiatric symptom inventories will be considered because of their extensive use in research and applied settings.

    2. STUDIES OF SPECIFIC QUESTIONNAIRES

    The Eysenck personality measures

    The Eysenck Personality Questionnaire (EPQ; Eysenck and Eysenck, 1975), and its predecessors the EPI and the MPI, have attracted a good deal of attention in terms of their fakability (Keehn, 1961; Star, 1962; Martin and Stanley, 1963; Vingoe, 1966; Braun and Gomez, 1966; Farley, 1966; Gorman, 1968; Salas, 1968; Harrison and McLaughlin, 1969; Farley, 1970; Rump and Court, 1971; Gray, 1972; Velicer and Weiner, 1975; Farley and Goh, 1976; Dunnett, Koun and Barber, 1981; Furnham and Henderson, 1983a). There have been essentially four methods employed in the measurement of social desirability (SD) in the Eysenck scales: correlational studies on actual vs estimated scores; correlations between the Eysenckian dimensions and the SD scale; studies looking at differences in the responses of faking and control groups; and finally, studies looking at the social desirability of each question in the Eysenck scales.

    The earliest studies (other than those by Eysenck himself) were done on the Maudsley Personality Inventory (MPI; Keehn, 1961). Martin and Stanley (1963) correlated the MPI and a social desirability scale and found a significant negative correlation with N, a significant positive correlation with the L scale, and no correlation with E (though there was a small significant positive correlation between E and the positive form of the SD scale). Most work has, however, been done on the EPI, either by getting Ss to estimate their own scores or by asking them to fake good or bad.

    Vingoe (1966) asked adult students to estimate their E score on a 7-point scale and compared it to the score derived from the standard questionnaire. He found that introverts are more aware of their position on the E-I scale than extraverts, though he did not systematically examine the results for the N scale. Harrison and McLaughlin (1969) used a larger population and found a very close correspondence between Ss' estimates of their own E and N scores and their actual scores on both measures (0.72 and 0.56, respectively). Gray (1972), who used Vingoe's method, also found significant, but lower, correlations between self-ratings and actual scores on E (0.48) and N (0.21). Extraversion and introversion are concepts that are widely used and well understood by the lay person or naive S. More recently Furnham and Henderson (1983a), using the EPQ, found positive correlations between estimated and actual scores for E (0.31), N (0.47) and P (0.30) in a study requiring 63 Ss to estimate their actual scores.

    Studies on the fakability of the EPI have generally shown consistent results. Braun and Gomez (1966) asked approximately half their student Ss to answer "not necessarily honestly or truthfully but rather so as to put your best foot forward and make a good impression", while the other half were a control group. Although E scores did not show a difference, N scores decreased and L scores increased as a function of the faking sets used in this study. Gorman (1968) asked student Ss to fake good and fake bad. Predictably the fake good sample had significantly higher E and L scores, but significantly lower N scores. He notes that the L scale is useful for detecting those who are faking good, but not those who are faking bad.

    Salas (1968) was interested in using the EPI for detecting malingerers, one of the most practical uses of faking studies (Cofer, Chance and Judson, 1949; Stanley and Salas, 1966). He administered the questionnaire twice: once under normal conditions and then with instructions to respond "in a manner you would expect of a neurotic badly adjusted soldier" (p. 56). All three dimensions yielded significant differences: the fake bad scores were significantly lower on E and L, but significantly higher on N. He concludes that because the L scale assists rather than exposes concealment, the EPI scale offers little promise of reliably discriminating between malingerers and genuine neurotics. However, in a replication of this study using 40 college students, Farley (1970) found no significant effects of the fake bad set on E or L, but that N was greatly increased. He argued that because of selective and special effects concerning Salas's soldier population, Salas's results were not representative of many groups.

    In a rather different faking study Velicer and Weiner (1975) got various groups of Ss either to fake a salesman, a librarian, ideal self or actual self (control) in their responses to the EPI. Significant differences were found for the E, N and L scales. The ideal self group had the lowest E and N scores and the highest L score. They argue that Ss with even a minimum degree of sophistication can successfully fake the EPI.

    Farley and Goh (1976) asked three groups of students either to give their best socially desirable impression, to give the worst mentally ill impression or to respond normally to the EPI. As before the N scale, and to a lesser extent the L scale, differed between the fake good and control groups, so throwing doubt on the usefulness of the N scale. Also the psychiatrically ill, fake mad set influenced all scales, significantly increasing psychoticism, neuroticism and introversion. However they did note: "It should be noted that at least where the best impression set is concerned, there is a built-in guard against such dissimulation on the test. That is, effects of this response set can be very easily detected by the changing scores on the L scale ... Scores on this scale more than double under the best impression condition (the worst impression condition having no significant effect on L)." (p. 146)

    Michaelis and Eysenck (1971) looked at real-life motivational effects on EPI faking in two work-application groups: one (highly motivated) who believed the test scores could be used in selection and one (lowly motivated) who believed that they would not be used. They found that when Ss were motivated to present themselves in a positive light, L scores went up, as did the correlation between the L score and N, from approximately zero for the non-motivated group to between 0.5 and 0.6. They note: "We can in fact postulate a general rule, to the effect that a large proportion of high lie scores, together with a strong negative correlation between scores for lying and neuroticism, will usually be found in groups completing tests under certain conditions of high motivation. Their presence, in fact, might even be used as a measure of motivation." (p. 131) It is this fact that allows the investigator to distinguish between dissimulation and conformity. However Montag [unpublished study cited in Eysenck and Eysenck (1976)] only found a negative correlation of -0.17 between N and L in a highly motivated group, probably due to both threshold and ceiling effects. They suggest that the L scale may be used for correcting the N score when the correlation between the two scales is high enough to indicate Ss faking.
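    The rule quoted above, that a strong negative N-L correlation in a group may signal high motivation to dissimulate, can be sketched as follows. The sketch assumes a plain Pearson correlation; the cutoff of -0.5, the function names and the scores are hypothetical choices made for the illustration and are not values prescribed by the authors.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def suggests_high_motivation(n_scores, l_scores, cutoff=-0.5):
    """Flag a group whose N-L correlation is at or below a hypothetical cutoff."""
    return pearson_r(n_scores, l_scores) <= cutoff


# Hypothetical N and L scores for one applicant group, invented for illustration.
n_scores = [4, 7, 10, 12, 15, 18]
l_scores = [9, 8, 6, 5, 3, 2]

print(round(pearson_r(n_scores, l_scores), 2))
print(suggests_high_motivation(n_scores, l_scores))
```

    The flag describes a group, not an individual: it says nothing about which particular Ss are dissimulating.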

    Eysenck, Eysenck and Shaw (1974), in a set of four experiments, confirmed the finding of Michaelis and Eysenck (1971) and found that special honesty instructions significantly increased N scores and decreased L scores, and that L scores have a high internal reliability, suggesting that they measure a stable trait. (It should be pointed out that this technique of special honesty instructions was pioneered by the Israeli psychologist Dr L. Montag.) They concluded that in conditions which are likely to provide high motivation for dissimulation (faking) it may be useful to employ honesty instructions in order to obtain scores nearer to the levels which would have been achieved under low motivation conditions. Dunnett et al. (1981), on the other hand, found that Ss asked to fake good responded in the direction of stable extraversion, while Ss asked to fake bad tended to appear as neurotic introverts. Most importantly the L scale did not effectively discriminate honest Ss from those asked to fake good or bad.

    Other approaches have been taken to investigating social desirability effects using the EPI. Power and MacRae (1977) found that Ss could successfully simulate extraversion, introversion, neuroticism and stability on the EPI. They found that the E scale appeared most susceptible to simulation and the L scale the least. Earlier Power and MacRae (1971) showed that Ss could identify all the items that measured extraversion, neuroticism and lie on the EPI.

    Most of the early studies were concerned specifically with the correlations between the N and L scales in abnormal populations. Gibson (1962) tested apprentices at work and found a significant correlation of -0.36 between N and L scales; male psychopaths gave a significant correlation of -0.31 between the EPI L and N scales (McKerracher and Watson, 1968); depressed patients gave a significant correlation also (Bailey and Metcalfe, 1969); male cancer patients showed a significant association between the L and N scales (Huggan, 1968); and a marked decrease in N with increasing L scores was found with anxious and neurotic patients (Knowles and Kreitman, 1965). Williams (1969) also showed that the rank correlation between mean N and L scores across eight varied groups was as high as -0.79.

    There have also been a number of studies that have correlated the EPI with measures of social desirability in normal populations. Farley (1966) correlated the EPI with the Edwards (1957) SD scale and the Marlowe-Crowne (Crowne and Marlowe, 1960) SD scale and found that E and N scores were significantly negatively associated with socially desirable responding but that the magnitude of these relationships was highly dependent on the scale used. Feather (1967) found a significant correlation of -0.68 between the Marlowe-Crowne SD scale and the N scale for a group of female students. Rump and Court (1971) correlated the EPI and various social desirability measures (Edwards SD scale, Marlowe-Crowne SD scale) in groups of students and clinical Ss (renal patients and donors). They found negative linear correlations between the N scale and the SD and L scales for all groups, but no reliable correlations between the E and SD scales. Hence the authors have some doubts about the N scale if Ss wish to protect their self-esteem or confound their assessors.

    Finally, some studies have attempted an item-by-item analysis of the EPQ. Furnham (1984) gave 30 Ss the 90-item EPQ, read out the definition of neuroticism from the manual, and gave them up to 23 choices to identify those items measuring neuroticism.

    The mean number of items selected by the Ss was 21.3 (SD = 3.41). Overall their correct identification for the 23 neurotic items ranged from under 10% to over 90%, the mean correct identification being 53.9%. Six items were identified by over 70% of the Ss and they related primarily to worrying, while six items were identified by less than 30% of the Ss and they related primarily to feeling bored and listless. On the whole, Ss appear to be reasonably accurate in their ability to detect items in a questionnaire that are measuring neuroticism.

    In the EPQ manual Eysenck and Eysenck (1975) point out that the L scale was incorporated to measure a tendency to fake good, but that there is evidence to suggest that it also measures a stable personality factor denoting social naivete, though in more recent years they have looked upon it as a measure of social conformity, which is not altogether different but a conception more in line with the data. Their argument goes thus: "If dissimulation were the only factor affecting the variance of this score, then the reliability of the score should be a function of the size of score; when scores are low, thus indicating that subjects are not dissimulating, then the scale should have low reliability. Empirically, this has not been found to be so; there is no lowering of reliability of the L scale under conditions of little dissimulation, and no increase in reliability under conditions of high dissimulation. Hence the scale must measure some stable personality function; unfortunately little is known about the precise nature of this function." (pp. 15-16)
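    The reliability argument above can be made concrete by computing an internal-consistency coefficient separately for groups tested under little and under high dissimulation and comparing the two. The sketch below uses Cronbach's alpha, which is an assumption: the manual, as quoted here, does not say which reliability coefficient was used. The item responses are invented for the illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 answers to four lie-scale items under two conditions.
low_dissimulation = [
    [1, 1, 0, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1],
]
high_dissimulation = [
    [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1],
]

print(round(cronbach_alpha(low_dissimulation), 2))
print(round(cronbach_alpha(high_dissimulation), 2))
```

    If the scale measured dissimulation alone, the first coefficient would be expected to fall well below the second. Similar values in both conditions would be in line with the stable-trait interpretation.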

    Overall then, extensive research on the Eysenck personality measures may be summarized thus: whereas the N (and P) scale seems particularly sensitive to faking, the E score is less sensitive. This finding however may be due to the fact that most of the studies have been done in England, where introversion is more acceptable than in the U.S.A. It may well be that extraversion shows higher relationships to social desirability in the U.S.A. There is some debate as to the sensitivity and usefulness of the L scale, which may itself be a trait rather than a response bias measure. That is, the L scale may be relatively sensitive at detecting faking, but may itself be susceptible to various effects. Eysenck, Nias and Eysenck (1971) in fact provided a useful summary of the literature:

    A number of facts have been demonstrated regarding L scales and their relation to personality scales. (1) Scores on personality inventories of all kinds can be increased or decreased when subjects are instructed to fake good or fake bad. (2) L scale scores are increased when the test is taken under fake good instructions. (3) Life situation motivation to present oneself in the best light, as in employment selection test, decreased scores on neuroticism and increased L scale scores. (4) Diagnosed neurotics who have low N scores tend to have high L scores. (5) Correlations between N and L are nearly always negative, but the sizes of these correlations differ widely. (6) The items of the L scale show evidence of reasonable homogeneity. (7) Many of these findings on adults have been replicated with children. It thus seems likely that the L scale does in fact perform in some degree the function allocated it in personality testing with inventories, i.e. to measure test-taking attitudes. (pp. 23-24)

    Psychiatric symptom inventories

    There are also a number of self-report psychiatric symptom questionnaires which aim to measure minor psychiatric morbidity in both normal and clinical populations. These include the General Health Questionnaire (GHQ; Goldberg, 1972), the Middlesex Hospital Questionnaire (MHQ; Crown and Crisp, 1966) and the Langner 22-item Scale of Distress (L-22; Langner, 1962). Each questionnaire, which is extensively used in psychological, psychiatric and epidemiological research, has acceptable levels of validity and reliability. It would be quite incorrect to believe that the constructors of these self-report measures were blind to the problems of questionnaires as a means of collecting valid and reliable data. In fact, Goldberg (1972) noted the problems of the unreliable informant, the defensive S and the overemphatic or histrionic S as well as various response sets (agreement, social desirability, positional bias). As a result the GHQ was devised not to be used as a simple self-report inventory but to be followed by the Clinical Interview Schedule, which provides the necessary measures of specificity and sensitivity. Yet despite the fact that the GHQ was not devised as a continuous measure but to provide a simple cut-off point to discriminate between case and non-case, it has been widely used on its own (or in conjunction with the L-22) as a single measure of psychological disturbance (Cochrane, Hashmi and Stopes-Roe, 1977; Cochrane and Stopes-Roe, 1981). Hence, in these circumstances faking Ss are less likely to be detected.

    There have been a few recent studies attempting to measure social desirability and faking in these psychiatric inventories. Parkes (1980) administered the GHQ, the MHQ, the EPQ and the Df (Defensiveness) scales of the Adjective Check List to 101 nurses. She demonstrated that certain personality types, i.e. those high on SD and Df, would be motivated to report less distress than those not scoring so high on these dimensions. She found, for instance, that defensive individuals not only eliminate the negative from their responses but actually accentuate the positive, in that they tend to endorse characteristics such as conscientiousness and perfectionism, which they perceive as favourable.

    Later Furnham and Henderson (1983b) asked Ss either to fake well (physically and psychologically), fake psychologically ill, fake physically ill or respond honestly when completing the GHQ and the L-22. Predictably, Ss who faked well had significantly lower scores than either of the other two experimental groups; however, there was no difference between the Fake Well and Control groups on the GHQ Total or the Somatic and Depression scales. Ss who faked psychologically ill differed significantly from those who faked physically ill on 4 out of 11 comparisons. All of the comparisons between the two Fake Ill groups, and the Fake Well and Control groups, were statistically significant, the former reporting higher incidences of distress than the latter.

    The most interesting findings were in the two Fake Ill groups. In both the GHQ and L-22 subscales there was a separation between psychological and physical symptoms. Yet in less than half the comparisons the two S groups showed significant differences, and then occasionally in the opposite direction to that which was expected (compare the GHQ Somatic and Depression scales). Though the differences were not significant, the Ss in the Fake Psychologically Ill group had consistently higher Distress scores than those in the Fake Physically Ill group. This may reflect either the wording of the questions or the stigma attached to psychological as opposed to physical illness. Neither GHQ nor L-22 Total scores showed up these differences, which suggests that if Ss wished to present themselves as physically as opposed to psychologically ill (or vice versa) they would have difficulty in doing so.

    Comparing the two measures, it seemed that the L-22 performed better than the GHQ. Both subscales on the L-22 worked in the predicted direction (i.e. on the Psychological subscale the Fake Psychologically Ill group scored significantly higher than the Fake Physically Ill group) and the Duncan comparisons were on the whole higher between the four groups. The scale has the added advantage of being only a third as long as the GHQ (60-item version), though not as short as the GHQ (12-item version), and easier to score. Ss also reported that they understood the questions and response categories better on the L-22 than on the GHQ.

    Ross and Mirowsky (1984) have argued that giving socially desirable responses is a learned, adaptive strategy, especially in social groups that are relatively powerless or stress the importance of a proper image. They provided evidence for this supposition by finding cultural (Mexican vs American), age (older vs younger) and social class (lower vs higher) correlates of social desirability. As predicted there was a significant negative correlation between social desirability and psychological distress as measured by the L-22 index. Regressional analyses showed that the effect of socio-economic status on distress increases, while the effect of age decreases, when social desirability is controlled.
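    What it means for an effect to change "when social desirability is controlled" can be shown with a first-order partial correlation. This is a simplified stand-in for the regression analyses Ross and Mirowsky report, not a reconstruction of them, and the three correlations below are invented for the illustration.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical zero-order correlations, invented for illustration:
# x = socio-economic status, y = distress, z = social desirability.
r_ses_distress = -0.20
r_ses_sd = -0.30
r_distress_sd = -0.40

adjusted = partial_correlation(r_ses_distress, r_ses_sd, r_distress_sd)
print(f"zero-order: {r_ses_distress:.2f}, controlling for SD: {adjusted:.2f}")
```

    With these made-up values the association between status and distress becomes stronger once social desirability is partialled out. That is the direction of change the regression analyses describe, but the size of the change here carries no empirical meaning.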

    Because psychiatric symptom inventories are used in clinical settings for diagnosis, their fakability is clearly an important issue. They are evidently open to faking, but precisely who may give untruthful answers, and why, is not clear. One obvious explanation, which will be pursued later, is that social desirability and L scales are measuring a trait that is closely linked to measures of psychological distress or ill health.

    It should be pointed out in this section that there is a large literature on the MMPI L and K scales. The L scale was designed to identify deliberate attempts to lie and consists of 15 items that refer to denial of aggression, bad thoughts, weakness of character or resolve, poor self-control, prejudices and minor dishonesties. Because early research on the L scale showed that it was insensitive to various kinds of distortion, the K scale was introduced as a more subtle and sensitive instrument to detect Ss attempting to deny or exaggerate psychopathology. The 30 items cover several different areas, such as hostility, worry, poor self-confidence and family dissension, on which a person can deny problems (Graham, 1978). The L and K scales together with various other scales (? = cannot say; F = frequency) make up the validity scales. According to Dahlstrom, Welsh and Dahlstrom (1972), "Generally it has been found that the two kinds of self-presentation reflected in L and in K are mutually exclusive when test subjects are completing the MMPI under the usual kinds of motivation that operate in a clinical setting in which the client is seeking some kind of help and the assessment is being carried out to further those client-centered ends. Test subjects who are naive and relatively less complicated psychologically will, when trying to present themselves favourably to someone in authority, usually endorse the more blatant item content of the L-scale statements but will somehow fail to grasp the more subtle self-denigrations embodied in the K-scale items. With greater knowledge of themselves and with greater worldliness about human foibles, the sophisticated test subject avoids the unbelievable and homely virtues of the L scale and readily acknowledges commonplace character defects: but when presented with the opportunity for more subtle and perhaps more cogent self-enhancements that are provided by the K-scale items, he is able to rationalize and equivocate in such a way as to give himself the benefit of the doubt, thus earning an elevated K score" (p. 169). Numerous review papers have been written about these scales and contemporary psychometric research attests to their validity, reliability and discriminability (Colligan, Osborne, Swenson and Offord, 1983).

    Other measures

    Apart from clinical and personality measures, a wealth of other self-report questionnaires have been the subject of dissimulation studies. For instance, Hogan (1972) tested the fakability of the Adorno F-scale and found that whereas there was no difference between the control groups and the fake good group, the fake bad group had a significantly higher score. Becker (1976) looked at possible response bias in a social desirability questionnaire itself. He administered the Crowne-Marlowe SD scale to Ss who either remained anonymous, had a code number or put their name and number on the questionnaire. As predicted, putting one's name on the questionnaire increased the likelihood of a higher socially desirable response.

    Rock (1981) found that two assertiveness measures, the popular Rathus Assertiveness Schedule and the widely used Gambrill and Richey Assertion Inventory, correlated highly positively (0.72 and 0.69, respectively) with the Edwards SD Questionnaire. Similarly, McNamara and Delamater (1984) found that Ss high in assertiveness evidenced greater needs for social approval (higher SD scores) than those low in assertiveness.

    Braun and Asta (1969), Braun and Costantini (1970) and Holden and Jackson (1981) have all used the Personality Research Form to look at the effects of faking good and bad. They all found very much the same result, notably changes on the Achievement, Affiliation, Autonomy and Dominance scales. However, Holden and Jackson (1981) did find that multiple criterion validity coefficients were higher in the honest/control condition than in either the fake good or fake bad condition.

    Stanwyck and Garrison (1982) looked at the detection of faking on the Tennessee Self-concept Scale by examining six groups: MAXBAD, MAXGOOD (present the worst/best possible picture of themselves), MODBAD, MODGOOD (present oneself negatively/positively but attempt to avoid detection of deliberate falsification) and CONTROL. Various multivariate and univariate analyses showed that the scale was sensitive to faking when respondents wished to create a favourable impression. Faking in the negative direction produced dramatically different results from the control group profile. The authors suggest that a faking key or template be produced to catch fakers.

    Worsley, Baghurst and Leitch (1984) looked at social desirability bias in a Dietary Inventory. In the first study, of air force recruits, they found that a diet frequency inventory correlated with two social desirability tests, particularly with regard to the reported consumption of fresh fruit and vegetables (positive) and snack foods (negative). A second item-analysis study confirmed this finding. They draw various conclusions from their study: "(1) Ratings of food intake can be influenced by social desirability biases. Such social desirability can be estimated. (2) Reported intakes of certain foods, such as fresh fruits and vegetables, and sweet foods are more susceptible to social approval needs than others. (3) The SDF scale, or similar scales, could play an important role in dietary surveys in identifying and perhaps controlling for social desirability variables. However, further development of such a scale is required. (4) Other subject-variables such as acquiescence, extreme scoring, halo effects, should be investigated in order to determine their effects on responses to dietary inventories" (p. 34).

    The range of tests subjected to social desirability and dissimulation studies is extensive. They include the Strong Vocational Interest Blank (Gehman, 1957) and the TAT (Weisskopf and Dieppa, 1951) as well as studies done in simulated selection or testing situations (Wesman, 1952). By and large they show that, given specific instructions, Ss are able to fake certain responses accurately. However, this is not always the case. Furnham and Henderson (1982) found that Ss could not very easily fake the Self-monitoring Scale (Snyder, 1974), which measures the extent to which people are sensitive to situational cues of social appropriateness and adjust or alter their behaviour appropriately. In their study Furnham and Henderson (1982) asked Ss to fake good, bad and mad (as well as having a control group) on eight personality measures. The scales which showed most differences were the P scale from the EPQ and the Edwards SD scale. However, the Self-monitoring Scale (Snyder, 1974) and the Locus of Control Scale (Rotter, 1966) yielded no significant differences between fake good and fake bad, nor between control and fake good or control and fake bad. It was only on the fake good vs fake mad comparison that any significant difference arose. This leads on to the important question considered in the next section.

    3. WHAT MAKES SOME QUESTIONNAIRES MORE FAKABLE THAN OTHERS?

    This is an important question which has been considered theoretically and empirically. There may be three reasons why a questionnaire is difficult to fake. Firstly, the measure may have poor face validity and hence Ss do not know what the test is measuring. Indeed, as Anastasi (1961) has pointed out, face validity does not refer to validity in the technical sense and concerns rapport and public relations. It may therefore be beneficial for a test not to have high face validity. Secondly, and related to the first idea, if the questionnaire is measuring a trait or behaviour pattern not well understood by the general public (e.g. locus of control, self-monitoring) it is unlikely that the test would be easy to fake, as Ss would not know which dimension is being measured. Thirdly, there appear to be both advantages and disadvantages associated with nearly every trait or dimension, be it extraversion, Type A or self-monitoring. That is, a very high or low score may suggest certain distinct advantages (e.g. sociability in extraversion) but also disadvantages (impulsivity, slow to condition). Hence it may be that when a researcher is measuring a complex trait or dimension which has almost exclusively negative associations (e.g. schizophrenia, neuroticism) there is a high tendency to faking, but where the trait or dimension measured has some positive and some negative sides (e.g. extraversion) faking is much less likely to occur.

    One way to investigate which tests are more fakable is to examine the face validity and item subtlety of the tests. For instance, there is a sizeable literature on the difference between subtle and obvious approaches (Burkhart, Christian and Gynther, 1978a, b; Holden and Jackson, 1979, 1981). Perhaps counter-intuitively, some studies (e.g. Gynther, Burkhart and Hovanitz, 1979) have found that obvious scales tend to have more predictive validity than subtle scales, while others found no differences (Holden and Jackson, 1981). Clearly a test's item subtlety and its face validity relate to the extent to which it may be susceptible to response bias. That is, if a test has high face validity and unsubtle items one would imagine that the test is more fakable. Holden and Jackson (1979) have, however, distinguished between a number of related concepts: content validity, the characteristic of a collection of items (representative sample) dependent on expert judgement; substantive validity, the degree to which individual items are related to the test's underlying dimension; face validity, the degree to which a test respondent views the content of a test as relevant for the situation being considered; and item subtlety, the degree to which respondents are unaware of what specific traits various items measure. "Thus, whereas face validity involves the ability of a test respondent to relate item content to a hypothesized behavioural dimension, item subtlety is concerned with the respondent's ability to relate an item to its actual, keyed scale. Therefore, with respect to the test respondent, face validity can be defined as the contextual relevance of an item, whereas item subtlety can be viewed as the lack of an obvious substantive link between an item and its underlying construct" (p. 461).

    In a study Holden and Jackson (1979) found a modest negative correlation between face validity and item subtlety, suggesting that they are distinct concepts that are not mutually exclusive. Clearly much depends on the Ss' differential access to, or understanding of, the criterion or underlying dimension upon which the test is based. The authors found no evidence that desirability mediates the relationship between item subtlety and criterion validity, but admit that this may be due to the generally desirable nature of the traits being measured.

    In their careful study of faking Holden and Jackson (1981) conclude: "No evidence was found to suggest that subtle approaches to structured personality assessment are superior in instances in which distortion of self-presentation occurs . . . Although yet to be extended to other areas, such as psychopathology, and to other motivational conditions, such as employment selection, data suggest that the onus of proof must shift to those who would advocate disguised approaches to personality measurement" (p. 385). Yet more recently Posey and Hess (1984), who asked prisoners to fake aggressive and non-aggressive on various scales, found that they were able to fake the obvious scales (Self-report), but not the non-obvious scales (Draw-a-man).

    Obviously questionnaires differ systematically in the extent to which their responses are desirable. It is generally agreed that there is a universal tendency to err in self-description by being positive and flattering, though what constitutes flattering may differ from culture to culture (Furnham, 1979) and from situation to situation (Argyle, Furnham and Graham, 1981). Nevertheless the results are not clear. It is not simply a matter of introducing subtle test items, as subtlety lies in the eye of the beholder. Some items may be obvious and the test may have high face validity, but if the S does not understand the construct being measured (e.g. self-monitoring) then the questionnaire is not fakable.

    Finally, Furnham (1986) has shown how a questionnaire may be highly fakable item-by-item yet overall yield scores indistinguishable from an honest response. Nearly 100 Ss completed two Type A behaviour questionnaires twice. Firstly they were asked to complete them honestly, reporting accurately on their behaviour patterns. Half of the Ss were then asked to fake good, presenting themselves in a positive light, and half to fake bad, presenting themselves in a negative light. There was only a marginal difference on one questionnaire's Total score, with fake good Ss having lower Type A (i.e. higher Type B) scores, yet nearly every individual question revealed large significant differences. The reason why the Total score did not show major variations as a function of faking was that the Type A measure is clearly multidimensional (Eysenck and Fulker, 1982; Price, 1982; Ray and Bozek, 1980) and that while some factors of both Type A and B behaviours are clearly socially desirable, others are not. Hence what faking does is to dramatically increase one score while simultaneously decreasing another, so leaving the Total score much the same. However, as different Type A measures stress different factors (competitiveness vs freneticism) of Type A behaviour, each with its distinctive pattern of social desirability, so there are differences in the overall susceptibility to faking of different measures, in this instance the Bortner and Eysenck scales. In other words, unless a scale is measuring a construct which is both unidimensional and wholly desirable, faked responses may not be detectable. Differences in the fakability of a scale may relate directly to the absolute number of socially desirable and undesirable items used to measure the phenomenon being considered.
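
    The cancellation described above can be illustrated with a small sketch. Everything in it is hypothetical: the item scores, the assignment of items to two facets and the assumption about which facet is desirable are invented for illustration and are not data from Furnham (1986).

```python
# Hypothetical illustration only: item scores, item-to-facet assignments and
# the desirability of each facet are invented, not data from Furnham (1986).

# Item indices belonging to two assumed Type A facets.
FACETS = {
    "competitiveness": [0, 1, 2],  # assumed desirable to score high
    "freneticism": [3, 4, 5],      # assumed desirable to score low
}


def facet_scores(responses):
    """Sum the item responses within each facet."""
    return {name: sum(responses[i] for i in items) for name, items in FACETS.items()}


def total_score(responses):
    """Unweighted Total score across all items."""
    return sum(responses)


def items_changed(first, second):
    """Count items whose response differs between two administrations."""
    return sum(1 for a, b in zip(first, second) if a != b)


honest = [3, 3, 3, 3, 3, 3]      # one S answering honestly (1-5 scale)
fake_good = [5, 5, 4, 1, 1, 2]   # the same S instructed to fake good

print(items_changed(honest, fake_good))              # every item moves
print(facet_scores(honest), facet_scores(fake_good))  # facets move in opposite directions
print(total_score(honest), total_score(fake_good))    # yet the totals coincide
```

    In this toy case the facet scores diverge sharply while the Total score is unchanged, which is why a faking check based only on the Total score could miss the distortion.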

    4. SOCIAL DESIRABILITY AS A TRAIT OR RESPONSE STYLE?

    There has recently been a renewed interest in the old debate as to the meaning of social desirability. Specifically, there has been a debate over whether social desirability is a response set or a personality/cognitive trait. This debate has been going for at least 25 years and revolves around whether social desirability variance should or should not be removed from personality scales (Gough, 1947; Dicken, 1963). Some have argued that the importance of social desirability in questionnaires has been overemphasized and does not constitute a major problem (Dicken, 1963), while others have suggested that any scale that is demonstrably fakable is useless. Some have concentrated on the profile of typical fakers (Cohen and Lefkowitz, 1974), believing this to represent a stable trait, while others have eliminated Ss with high scores from further analyses. In fact Hartshorne and May (1928) started the debate about the possibility that an L scale may be a true dimension of personality, by saying that females have higher L scores than males, but that this may be due to the fact that they were actually more conforming socially.

    This issue has recently surfaced again. Linehan and Nielsen (1981) looked at the relationship between questionnaires measuring social desirability, hopelessness and suicidal behaviours in a group of American shoppers. They found hopelessness and social desirability highly negatively correlated (r = -0.64), and that both are modestly but significantly correlated with reports of past suicidal behaviour, of course in opposite directions. Further, because attempts to control for the covariation of social desirability with hopelessness left little useful variation in predictions based on hopelessness, they argued for caution in using self-report measures of hopelessness in examining suicidality. Nevid (1983), however, argued in response that social desirability should not be invoked as a potential confound unless the obtained covariation is theoretically inconsistent or is so overlapping as to make the respective scales redundant with respect to factorial content (p. 139). Nevid's argument is that social desirability is a dimension of personality, not a style of responding or a test-taking attitude, and that it is conceptually related to hopelessness. In other words, as Linehan and Nielsen (1983) concede: "Socially desirable responding may include thoughtless, overly generous responses; individuals may continuously give themselves the benefit of the doubt because of the negative implications of negative self-perception. Thus, social desirability scores may represent a symptom of the dichotomous (good/bad) reasoning processes important to most cognitive-behavioural theories" (p. 141). However, Linehan and Nielsen (1983) still contend that social desirability is a confounding variable. McCrae and Costa (1983) in a review article come to Nevid's (1983) defense, arguing that "social desirability scales are better interpreted as measures of substantive traits than as indicators of response bias and that they are of little use as suppressor variables in correcting scores from other scales" (p. 882). They argue that social desirability is not very much of a problem if all individuals' scores are inflated or deflated, but it is a serious source of distortion if individuals are differentially susceptible. Whether individuals differ as a function of high approval needs, defensive attributions or lying strategies is unimportant. But as they quite correctly point out, self-report studies cannot adequately disentangle the issue of whether social desirability is a response style or trait. Hence they reported on a study of over 200 adults in which spouse ratings were compared to self-reports using two social desirability and two personality measures. Both social desirability measures were related to neuroticism, but when the correlations with social desirability were partialled out the validity coefficients decreased rather than increased. Thus partialling out social desirability failed to improve correspondence with the external objective criterion; in fact it tended to lower agreement. The authors therefore conclude that social desirability scales are measures of psychological adjustment and closedness to experience: "Conceptual habits die hard, and it is difficult to view the correlation of a SD measure with a trait scale as anything but an indictment of that scale. Perhaps L and SD scales should be relabeled, as need for approval, social naivete, or social adjustment. These substantive traits should be studied in their own right and may be important in predicting real-life outcomes. Indeed, they may prove yet another instance of the utility of self-report measures in providing accurate assessments of personality" (McCrae and Costa, 1983, p. 887).
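
    The logic of partialling out social desirability can be sketched with the standard first-order partial correlation formula. The correlations below are hypothetical and are not McCrae and Costa's (1983) values, and their analysis may have differed in detail; the sketch only shows why removing SD variance lowers a self-spouse validity coefficient when SD is also related to the external criterion, and raises it when SD is unrelated to the criterion.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x = self-report trait score, y = spouse rating, z = SD score.
r_self_spouse = 0.50  # uncorrected validity coefficient
r_self_sd = 0.40      # self-report correlates with the SD scale

# Case A: spouse ratings also correlate with SD, so SD carries substantive variance.
substantive = partial_corr(r_self_spouse, r_self_sd, 0.30)

# Case B: SD is unrelated to spouse ratings, so it acts only as bias in the self-report.
bias_only = partial_corr(r_self_spouse, r_self_sd, 0.0)

print(round(substantive, 3))  # below 0.50: partialling lowers agreement
print(round(bias_only, 3))    # above 0.50: partialling improves agreement
```

    A drop in the coefficient after partialling, as in Case A, is the pattern the authors report, and is what would be expected if SD scales measure a substantive trait.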

    Researchers using both self-report (Eysenck and Eysenck, 1975) and non-self-report measures alone (McCrae and Costa, 1983) have pointed out that social desirability appears to be a trait rather than a response set. That is, because there are consistent, stable, individual differences in social desirability that correlate meaningfully with other measures (usually of adjustment), there is reason to argue that social desirability is not a situation-specific response set that invalidates other measures. Of course, maintaining that social desirability is a trait raises the question of the aetiology of this trait, its relationship to other traits etc. Furthermore, this approach would insist that social desirability is not partialled out in experimental work (Borkenau and Amelang, 1985) but measured and treated as an independent variable.

    Arguing that social desirability is a trait rather than a response set does not, however, mean that all questionnaires are equally responsive to this trait or that people cannot simulate it. If social desirability is a clinical trait that measures things akin to adjustment, it is not surprising that social desirability correlates with neuroticism etc. and that clinical measures are so susceptible to deliberate faking. However, it is not clear that those researchers who have devised lie/social desirability scales had a clear idea of the trait that they were measuring.

    There has been consistent criticism of the concept of, and the scales devised to measure, social desirability. For instance, Wiggins (1962) has accused the scales of hypercommunality, in the sense that many of the items are answered in the same way by a high proportion of the Ss, implying that they measure well-defined and accepted social norms. Thus a failure to match the populace's response pattern is a correlate of all pathology scales and shows excessive asocial responses.

    Many have pointed out the two-dimensional nature of socially desirable responding and there have been numerous factor-analytic studies since Messick (1960) found nine rather unclear factors. For instance, Millham (1974) identified two dimensions: attribution (tendency to attribute socially desirable characteristics) and denial (tendency to deny socially undesirable characteristics). Ramanaiah, Schill and Leung (1977) found empirical support for these dimensions, but subsequent more thoughtful work by Ramanaiah and Martin (1980) suggests that these two scales are really measuring the same construct and that previous results were attributable to specific method variance in the unbalanced scales caused by keying directions. Paulhus (1984) has, however, made another two-dimensional distinction, partitioning socially desirable responses into those involving self-deception, where the respondent actually believes his or her positive self-reports, and impression management, where the respondent consciously dissimulates. Many others have made similar distinctions. In three studies he demonstrated the independence of these factors and argues that these results have strong implications for the control of socially desirable self-reports. Clearly it is the impression-management factor that requires most control, as there is little reason to believe that differences on this dimension often bear any intrinsic relation to central content dimensions. However, the self-deception component is different and may reflect underlying self-images or cognitive styles that are both invariant and unconscious.

    Thus it seems that if social desirability is a trait it is multi- rather than unidimensional. However, these dimensions need to be verified empirically and to be demonstrably trait-like in their manifestations. As Eysenck and Eysenck (1975) have noted about their own instrument: "Too little is known at the moment about the L scale to make dogmatic statements possible" (p. 15).

    5. SOCIAL DESIRABILITY AND LAY THEORIES OF NORMAL AND ABNORMAL

    What do social desirability studies tell us about an S's conception of what is desirable, normal etc.? Apart from showing that questionnaires are fakable, social desirability studies give the researcher an interesting insight into what lay people think is good. Thus on all the Eysenck personality measures faking good is associated with high E and L scores and low N and P scores (Furnham and Henderson, 1982). Similarly, when faking good on psychiatric symptom scales Ss have very low scores. Velicer and Weiner (1975) found that when asked to fake a salesman or a librarian, Ss showed higher E scores for the former than the latter.

    There are a large number of studies on how normal, healthy, adjusted people can fake an abnormal, unhealthy and maladjusted profile on a questionnaire. Nearly all have shown that normal people can fake abnormality, but can the abnormal fake normal? Some studies have looked at response bias in abnormal groups' (psychiatric patients, prisoners) completion of self-report measures. Lawton and Kleban (1965) found that psychiatric prisoners were unable to simulate normality on the MMPI, producing a profile considered stereotypic of sociopaths. Earlier research by Hunt (1948) had shown that whereas Navy court martial prisoners were unable to fake good adjustment (on the MMPI) they were able to feign maladjustment. More recently Gendreau, Irvine and Knight (1973) compared 24 inmates' MMPI profiles in which they were told to fake good adjustment and maladjustment with their honest profiles. The inmates were easily able to fake both maladjustment and good adjustment. Similarly, Rice, Arnold and Tate (1973) asked 25 male maximum-security psychiatric patients to complete the MMPI under each of three sets of instructions (honest, fake good adjustment, fake bad adjustment) and found that the patients were able to fake both good and bad adjustment. However, they did find that various faking indices from the MMPI were reasonably accurate in detecting both types of faking. More recently, Posey and Hess (1984) asked 58 adult male prisoners (60% of whom had committed crimes against property) to either fake aggressive, fake non-aggressive or reply honestly on three quite different tests. They found that the inmates could fake aggressive on the two self-report scales (and all the subscales) but not on the Draw-a-person test, yet also that they could not fake non-aggressive (i.e. there was no significant difference between control and fake non-aggressive). As they note: "There may be instances in which an inmate would like to be identified as a trouble maker, e.g., in order to be classified by prison authorities in a certain fashion. However, it seems likely that an inmate would most often want to be seen as compliant and nonaggressive by authorities in order to receive the lightest possible security designation. The nonsignificant differences between the standard group and the fake-good group on the research scales indicate that they cannot be detected on the basis of scores on these scales" (p. 193).

    The above studies have a number of limitations. Firstly, they are nearly all confined to using the MMPI, which, although it may be considered a popular and well-psychometrized instrument, is used most frequently in American psychiatric settings. Many other instruments are used by clinicians and researchers and these may be more or less amenable to faking. Secondly, nearly all the studies looking at a clinical population (usually criminal) have used small heterogeneous samples, thus obscuring differences between abnormal groups. For instance, in the Rice et al. (1983) study 17 of the 25 Ss were diagnosed as having personality disorders and the remaining 8 as schizophrenic (!), while Gendreau et al. (1973) had 8 drug addicts, 6 thieves, 5 sexual offenders, 2 escapees and an arsonist, an extortionist and a parole offender. Furthermore, we are not told how accurately or by whom the diagnoses were made. Thirdly, few of the studies using psychiatric patients had a control group of normal people with whom to contrast the scores of the psychiatric patients. More recently McCarthy and Furnham (1986) asked two groups of psychiatric patients (anxiety state, depressed) and a normal control group to fill in two questionnaires twice: first responding honestly and then as they believed a normal person might. The results showed that whereas normal people tend to see other normals as much the same as, if not slightly less well adjusted than, themselves, patients see themselves as less well adjusted than the ordinary person. The controls were not significantly more able to predict the normal response to these measures than the patient groups were. However, the depressed and anxious groups differed in the accuracy of their estimates and in their conceptions of normal functioning. The depressed patients were generally fairly accurate in their estimates, although holding a somewhat negative view of ordinary adjustment. The anxious patients' estimates were always further from the scale norms than those of the depressed patients, and they substantially overestimated the adjustment of the ordinary personality and underestimated the adjustment of ordinary social behaviour. Overall the anxious patients' estimates deviated from scale norms in the same direction as the controls' but this deviation was usually more extreme. Thus it seems that it is not possible to generalize about abnormal groups' perceptions of normality: psychiatrically disturbed patients are not necessarily less able than undisturbed Ss to perceive normality accurately. It is anxiety but not depression that appears to impair this ability.
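
    One simple way to express the accuracy of such estimates is as the deviation of each group's estimate of the normal response from the published scale norm. The sketch below uses invented norms and group means, not values from McCarthy and Furnham (1986); the scale names and the assumption that higher scores mean better adjustment are also hypothetical.

```python
from statistics import mean

# Hypothetical scale norms and group estimates of how a normal person would score.
# Higher scores are assumed to mean better adjustment.
SCALE_NORMS = {"personality": 10.0, "social_behaviour": 20.0}

ESTIMATES = {
    "control": {"personality": 11.0, "social_behaviour": 19.0},
    "depressed": {"personality": 9.0, "social_behaviour": 19.5},
    "anxious": {"personality": 14.0, "social_behaviour": 16.0},
}


def signed_deviations(estimate):
    """Estimate minus norm per scale; positive means adjustment is overestimated."""
    return {scale: estimate[scale] - SCALE_NORMS[scale] for scale in SCALE_NORMS}


def mean_absolute_deviation(estimate):
    """Overall inaccuracy: average distance from the norms, ignoring direction."""
    return mean(abs(d) for d in signed_deviations(estimate).values())


for group, estimate in ESTIMATES.items():
    print(group, signed_deviations(estimate), mean_absolute_deviation(estimate))
```

    The signed deviations show the direction of each group's misperception, while the mean absolute deviation gives a single index of accuracy on which groups can be compared.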

    The depressed group, in keeping with other findings (e.g. Beck, Rush, Shaw and Emery, 1979; Lewinsohn, Mischel, Chaplin and Barton, 1980), did appear to have a negative view of themselves and a mildly negative but substantially accurate view of the ordinary other's personality and behaviour. The anxious group were more extreme in their conception of normal functioning: they appeared to have in some respects an idealized, perhaps naive, view of the healthy personality. The evidence from this study does not support the idea that anxious Ss would be less able to form a consistent conception of normal functioning.

    Overall both patient groups believed there was a greater disparity between themselves and the ordinary person than did the control group. Rather than basing their implicit personality theory on the assumption that other people are more or less like themselves, as the control group did, psychiatric patients assume that the ordinary person is different from, even at times the opposite of, themselves. Clinically these results have implications for the management of change in such patients. Setting up an ideal of the ordinary person's adjustment which is very distant from their self-image may make it more difficult for psychiatric patients to envisage that they might succeed in acquiring this ordinary level of adjustment.

    Certainly it appears that faking studies with abnormal groups allow researchers not only to investigate the extent to which these groups may be able to disguise their real response on diagnostic inventories but also to see what their images of normality and abnormality are.

    6. CONCLUSION AND RECOMMENDATIONS FOR FURTHER RESEARCH

    For a very long time response bias, and social desirability in particular, has been viewed as a major disadvantage of all self-report measures. This view is held by the skeptical layman just as much as by the traditional psychometrician, who regard fakability, item detection and/or correlations with social desirability scales as evidence of the invalidity of (all) self-response measures (interview, questionnaire). This objection, along with others such as attributional or motivational errors, means that numerous social scientists find the use of questionnaires scientifically dubious and the results possibly spurious.

    It should be pointed out that there is reasonable evidence to suggest that (some) people can fake questionnaires and spot what the items are measuring. That is, people can often fake good when the situation may require it (Michaelis and Eysenck, 1971) or fake bad and malinger (Salas, 1968) when they want to. However, this fact alone is quite insufficient to damn self-report measures, as the many findings reviewed above show. Clearly if faking accounted for most of the variance all self-report measures would have no validity, that is, they would not be empirically testable according to a theoretical framework. Yet many scales provide much empirical proof of their validity, including extensive laboratory studies (Eysenck and Eysenck, 1985). This must constitute the strongest proof against the argument that scales are valueless because they can be faked. Consider for instance the anxiety-learning literature as evidence for the validity of questionnaire scores. Eysenck (1973) has shown clearly how questionnaire-derived E and N scores relate in a clearly predictable and theoretically coherent way to a range of experimental variables such as learning and memory.

    Firstly, it is worth noting that questionnaires are not all equally fakable. Whereas some measures (notably measures of adjustment) appear to be highly sensitive to, and correlated with, social desirability, other measures appear relatively immune from, and uncorrelated with, social desirability measures. There are probably a number of reasons for this, including item subtlety, face validity and multidimensionality. Thus some well-used questionnaires measuring such things as self-monitoring (Snyder, 1974) and locus-of-control beliefs (Rotter, 1966) appear not to be significantly affected by social desirability.

    Secondly, and perhaps more importantly, there are important, relatively consistent and stable individual differences in social desirability responses. If social desirability were merely a response set one might well predict relatively little S variance in specific experimental or natural conditions (e.g. when instructed to fake good or bad). However, if social desirability were more than, or different from, merely a test-taking response style, and were a recognizable stable trait, one would expect numerous predictable individual differences. More recent research appears to confirm the latter position, namely that social desirability has trait-like qualities. Precisely what the trait is called, its aetiology, structure and relationship to other traits, how it is best measured etc. are not clear and warrant further research.

    However, if researchers like Eysenck and Eysenck (1975) and McCrae and Costa (1983) are correct in assuming that lie and social desirability scales (all of which correlate moderately highly) are tapping a stable trait measuring social conformity and adjustment, it is clear why questionnaire measures of mental health and related concepts (assertiveness, depression, self-concept, neuroticism, minor psychiatric morbidity) are so sensitive to faking. That is, they are measures of much the same thing and hence one would predict a strong negative correlation with these measures. Therefore it is not that measures of mental health are particularly susceptible to social desirability effects, but rather that social desirability and mental health are closely conceptually related. This also explains why this relationship holds across normal and abnormal groups. Thus to point out that people's scores on a mental health test are directly related to their assessment of the desirability reflected in the items does not necessarily debase the validity of the test, as the perceived desirability itself is related to mental health!

    The study of response bias, and social desirability in particular, has always been a Cinderella science, probably because it was thought to be a transient style of invalidating questionnaires rather than a stable substantive trait. Hence there are major gaps in the research literature. For instance little is known about cultural, national or subgroup differences in social desirability. If, as is well-established, there are cultural differences in the perceptions of normal social behaviour (Furnham, 1979) it may be expected that there are cultural differences in socially desirable responses which are predictable in terms of various features of that culture. Secondly, it is important to know under what circumstances socially desirable responses are more or less likely to occur (e.g. when being observed vs when responding anonymously) as this may prove very useful in designing testing materials and tasks. Thirdly, and perhaps more importantly, the aetiology of the social desirability response pattern needs to be established as well as its trait-like characteristics (stability over time, consistency over situations). The relationship of this behaviour pattern to other traits (e.g. self-monitoring, extraversion) and needs (need for approval, need for achievement) also warrants investigation. Similarly, the overlap between measures of social desirability (e.g. Edwards, 1957; Crowne and Marlowe, 1960), lying (Eysenck and Eysenck, 1975) and need for approval (Millhan, 1974) needs to be established. Rather than considering social desirability a mere response artifact that threatens the validity of self-report it should be seen as a substantive trait useful in predicting behaviour.

    REFERENCES

    Anastasi A. (1961) Psychological Testing. Macmillan, New York.
    Argyle M., Furnham A. and Graham J. (1981) Social Situations. Cambridge Univ. Press, Cambs., U.K.
    Bailey J. and Metcalfe M. (1969) The MPI and the EPI: a comparative study on depressive patients. Br. J. soc. clin. Psychol. 8, 50-64.
    Beck A., Rush A., Shaw B. and Emery G. (1979) Cognitive Therapy of Depression. Guilford Press, New York.
    Becker W. (1976) Biasing effect of respondents' identification on responses to a social desirability scale: a warning to researchers. Psychol. Rep. 39, 756-758.
    Borkenau P. and Amelang M. (1975) The control of social desirability in personality inventories. J. Res. Person. 19, 44-53.
    Braun J. and Asta P. (1969) Changes in the Personality Research Form produced by faking instructions. J. clin. Psychol. 25, 429-430.
    Braun J. and Cosantini A. (1970) Faking and faking detection on the Personality Research Form. J. clin. Psychol. 26, 516-518.
    Braun J. and Gomez B. (1966) Effects of faking on the Eysenck Personality Inventory. Psychol. Rep. 19, 388-390.
    Burkhart B., Christian W. and Gynther M. (1978a) Item subtlety and faking on the MMPI: a paradoxical relationship. J. Person. Assess. 42, 76-80.
    Burkhart B., Christian W. and Gynther M. (1978b) Psychological mindedness, intelligence and item subtlety endorsement patterns on the MMPI. J. clin. Psychol. 34, 76-79.
    Cochrane R. and Stopes-Roe M. (1981) Psychological symptom levels in Indian immigrants to England - a comparison with the native English. Psychol. Med. 11, 319-327.
    Cochrane R., Hashmi F. and Stopes-Roe M. (1977) Measuring psychological disturbance in Asian immigrants to Britain. Soc. Sci. Med. 11, 157-164.
    Cofer C., Chance J. and Judson A. (1949) A study of malingering on the MMPI. J. Psychol. 27, 491-499.
    Cohen J. and Lefkowitz J. (1974) Development of a biographical inventory blank to predict faking on personality tests. J. appl. Psychol. 59, 404-405.
    Colligan R., Osborne I., Swenson W. and Offord K. (1983) The MMPI: A Contemporary Normative Study. Praeger, New York.

    Crown S. and Crisp A. (1966) A short diagnostic self-rating scale for psychoneurotic patients: the Middlesex Hospital Questionnaire. Br. J. Psychiat. 112, 917-923.
    Crowne D. and Marlowe D. (1960) A new scale of social desirability independent of psychopathology. J. consult. Psychol. 24, 349-354.
    Dahlstrom W., Welsh G. and Dahlstrom L. (1972) An MMPI Handbook, Vol. I. Univ. of Minnesota Press, Minneapolis, Minn.
    Dicken C. (1963) Good impression, social desirability and acquiescence as suppressor variables. Educ. psychol. Measur. 23, 699-720.
    Dunnett S., Koun S. and Barber P. (1981) Social desirability in the Eysenck Personality Inventory. Br. J. Psychol. 72, 19-26.
    Edwards A. (1957) The Social Desirability Variable in Personality Assessment and Research. Dryden, New York.
    Eysenck H. (Ed.) (1973) Personality, learning and anxiety. In Handbook of Abnormal Psychology, 2nd edn. Pitman Medical, London.
    Eysenck H. and Eysenck M. (1985) Personality and Individual Differences: a Natural Science Approach. Plenum Press, London.
    Eysenck H. and Eysenck S. (1975) The Eysenck Personality Questionnaire Manual. Hodder & Stoughton, London.
    Eysenck H. and Eysenck S. (1976) Psychoticism as a Dimension of Personality. Hodder & Stoughton, London.
    Eysenck H. and Fulker D. (1982) The components of Type A behaviour and its genetic determinants. Activitas nervosa Suppl. 1, 111-125.
    Eysenck S., Nias D. and Eysenck H. (1971) The interpretation of children's lie scale scores. Br. J. educ. Psychol. 41, 23-31.
    Eysenck S., Eysenck H. and Shaw L. (1974) The modification of personality and lie scale scores by special honest instructions. Br. J. soc. clin. Psychol. 13, 41-50.
    Farley F. (1966) Social desirability extraversion and neuroticism: a learning analysis. J. Psychol. 64, 113-118.
    Farley F. (1970) Generality of faking effects in the dimensional measurement of personality. Aust. J. Psychol. 22, 265-268.
    Farley F. and Goh D. (1976) PENmanship: faking the P-E-N. Br. J. soc. clin. Psychol. 15, 139-148.
    Feather N. (1967) Some personality correlates of external control. Aust. J. Psychol. 19, 253-260.
    Furnham A. (1979) Assertiveness in three cultures: multidimensionality and cultural differences. J. clin. Psychol. 35, 522-527.
    Furnham A. (1984) Lay conceptions of neuroticism. Person. individ. Diff. 5, 95-103.
    Furnham A. (1986) The social desirability of the Type A behaviour pattern. Submitted for publication.
    Furnham A. and Henderson M. (1982) The good, the bad and the mad: response bias in self-report measures. Person. individ. Diff. 3, 311-320.
    Furnham A. and Henderson M. (1983a) The mote in thy brother's eye and the beam in thine own: predicting one's own and others' personality test scores. Br. J. Psychol. 74, 381-389.
    Furnham A. and Henderson M. (1983b) Response bias in self-report measures of general health. Person. individ. Diff. 4, 519-525.
    Gehman W. (1957) A study of ability to fake scores on the Strong Vocational Interest Blank for men. Educ. psychol. Measur. 17, 65-70.
    Gendreau P., Irvine M. and Knight S. (1973) Evaluating response set styles on the MMPI with prisoners: faking good adjustment and maladjustment. Can. J. behav. Sci. 5, 183-194.
    Gibson H. (1962) The lie scale of the Maudsley Personality Inventory. Acta psychol. 20, 18-23.
    Goldberg D. (1972) The Detection of Psychiatric Illness by Questionnaire. OUP, London.
    Goldstein H. (1945) A malingering key for mental tests. Psychol. Bull. 42, 104-118.
    Gorman B. (1968) Social desirability factors in the Eysenck Personality Inventory. J. Psychol. 69, 75-83.
    Gough H. (1947) Simulated patterns on the MMPI. J. abnorm. soc. Psychol. 42, 215-225.
    Graham J. (1978) The MMPI: A Practical Guide. OUP, New York.
    Gray J. (1972) Self-rating and Eysenck Personality Inventory estimates of neuroticism and extraversion. Psychol. Rep. 30, 213-214.
    Green R. (1951) Does a selection situation induce testees to bias their answers on interest and temperament tests? Educ. psychol. Measur. 11, 503-515.
    Gynther M., Burkhart B. and Hovanitz C. (1979) Do face-valid items have more predictive validity than subtle items? The case of the MMPI scale. J. consult. clin. Psychol. 47, 295-300.
    Harrison N. and McLaughlin R. (1969) Self-rating validation of the Eysenck Personality Inventory. Br. J. soc. clin. Psychol. 8, 55-58.
    Hartshorne H. and May M. (1928) Studies in Deceit. Macmillan, New York.
    Hogan H. (1972) Fakability of the Adorno F-scale. Psychol. Rep. 30, 15-21.
    Holden R. and Jackson D. (1979) Item subtlety and face validity in personality assessment. J. consult. clin. Psychol. 47, 459-468.
    Holden R. and Jackson D. (1981) Subtlety, information and faking effects in personality assessment. J. clin. Psychol. 37, 379-386.
    Huggan R. (1968) Neuroticism, distortion and objective manifestations of anxiety in males with malignant disease. Br. J. soc. clin. Psychol. 7, 280-285.
    Hunt H. (1948) The effect of deliberate deception on MMPI performance. J. consult. Psychol. 12, 396-402.
    Kalton G. and Schuman H. (1982) The effect of the question on survey responses: a review. J. R. statist. Soc. 145, 42-73.
    Keehn H. (1961) Response sets and the Maudsley Personality Inventory Scores. Psychol. Rep. 64, 229-233.
    Knowles J. and Kreitman N. (1965) The Eysenck Personality Inventory: some considerations. Br. J. Psychiat. 111, 755-759.
    Langner T. (1962) A 22-item screening scale of psychiatric symptoms indicating impairment. J. Hlth soc. Behav. 3, 269-276.
    Lawton M. and Kleban M. (1965) Prisoners' faking on the MMPI. J. clin. Psychol. 21, 269-271.
    Lewinsohn P., Mischel W., Chaplin W. and Barton R. (1980) Social competence and depression: the role of illusory self-perceptions. J. abnorm. Psychol. 89, 203-212.
    Linehan M. and Nielsen S. (1981) Assessment of suicide ideation and parasuicide: hopelessness and social desirability. J. consult. clin. Psychol. 49, 773-775.
    Linehan M. and Nielsen S. (1983) Social desirability: its relevance to the measurement of hopelessness and suicidal behaviour. J. consult. clin. Psychol. 51, 141-143.

    Martin J. and Stanley G. (1963) Social desirability and the Maudsley Personality Inventory. Acta psychol. 21, 260-264.
    McCarthy B. and Furnham A. (1986) Patients' conceptions of psychological adjustment in the normal population. Br. J. clin. Psychol. In press.
    McCrae R. and Costa P. (1983) Social desirability scales: more substance than style. J. consult. clin. Psychol. 51, 882-888.
    McKerracher D. and Watson R. (1968) The Eysenck Personality Inventory male and female subnormal psychopaths in a special security hospital. Br. J. soc. clin. Psychol. 7, 295-302.
    McNamara J. and Delamater R. (1984) The assertion inventory: its relationship to social desirability and sensitivity to rejection. Psychol. Rep. 55, 719-724.
    Meehl P. and Hathaway S. (1946) The K factor as a suppressor variable in the MMPI. J. appl. Psychol. 30, 525-564.
    Messick S. (1960) Dimensions of social desirability. J. consult. Psychol. 24, 279-287.
    Michaelis W. and Eysenck H. (1971) The determination of personality inventory factor patterns and intercorrelations by changes in real life motivation. J. genet. Psychol. 118, 223-234.
    Millhan J. (1974) Two components of need for approval score and their relationship to cheating following success and failure. J. Res. Person. 8, 378-392.
    Montag L. (1976) Personal communication cited in Eysenck and Eysenck (1976).
    Nall V. (1951) Simulation by college students of a prescribed pattern on a personality scale. Educ. psychol. Measur. 11, 478-488.
    Nederhof A. (1985) Methods of coping with social desirability bias: a review. Eur. J. soc. Psychol.
    Nevid J. (1983) Hopelessness, social desirability and construct validity. J. consult. clin. Psychol. 51, 139-140.
    Parkes K. (1980) Social desirability, defensiveness and self-report psychiatric inventory scores. Psychol. Med. 10, 735-742.
    Paulus D. (1984) Two-component models of socially desirable responding. J. Person. soc. Psychol. 46, 598-609.
    Posey C. and Hess A. (1984) The fakability of subtle and obvious measures of aggressions by male prisoners. J. Person. Assess. 48, 137-144.
    Power R. and MacRae K. (1971) Detectability of items in the Eysenck Personality Inventory. Br. J. Psychol. 62, 395-401.
    Power R. and MacRae K. (1977) Characteristics of items in the Eysenck Personality Inventory which affect responses when students simulate. Br. J. Psychol. 68, 491-498.
    Price V. (1982) Type A Behaviour Pattern: a Model for Research and Practice. Academic Press, London.
    Ray J. and Bozek R. (1980) Dissecting the A-B personality type. Br. J. med. Psychol. 53, 181-186.
    Rice M., Arnold L. and Tate O. (1973) Faking good and bad adjustment on the MMPI and overcontrolled hostility in maximum psychiatric patients. Can. J. behav. Sci. 15, 43-51.
    Rock D. (1981) The confounding of two self-report assertion measures with a tendency to give socially desirable responses in self description. J. consult. clin. Psychol. 49, 743-744.
    Romanaiah N. and Martin H. (1980) On the two-dimensional nature of the Marlowe-Crowne Social Desirability Scale. J. Person. Assess. 44, 507-514.
    Romanaiah N., Schill J. and Leung L. (1977) A test of the hypothesis about the two-dimensional nature of the Marlowe-Crowne Social Desirability Scale. J. Res. Person. 11, 251-259.
    Ross C. and Mirowsky J. (1984) Socially-desirable response and acquiescence in a cross-cultural survey of mental health. J. Hlth so