Methodological Issues in Cross-Cultural
Counseling Research:
Equivalence, Bias, and Translations
Stefanía Ægisdóttir
Lawrence H. Gerstein
Deniz Canel Çinarbaş
Ball State University
Concerns about the cross-cultural validity of constructs are discussed, including equiv-
alence, bias, and translation procedures. Methods to enhance equivalence are described,
as are strategies to evaluate and minimize types of bias. Recommendations for translat-
ing instruments are also presented. To illustrate some challenges of cross-cultural coun-
seling research, translation procedures employed in studies published in five counseling
journals are evaluated. In 15 of 615 empirical articles, a translation of instruments was
performed. In 9 studies, there was some effort to enhance and evaluate equivalence
between language versions of the measures employed. In contrast, 2 studies did not report
using thorough translation and verification procedures, and 4 studies employed a mod-
erate degree of rigorousness. Suggestions for strengthening translation methodologies and
enhancing the rigor of cross-cultural counseling research are provided. To conduct
cross-culturally valid research and deliver culturally appropriate services, counseling
psychologists must generate and rely on methodologically sound cross-cultural studies.
This article provides a schema for performing such studies.
There is growing interest in international issues in the counseling profes-
sion. There are more publications about cross-cultural issues in counsel-
ing and the role of counseling outside of the United States (Gerstein, 2005;
Gerstein & Ægisdóttir, 2005a, 2005b, 2005c; Leong & Blustein, 2000; Leong
& Ponterotto, 2003; Leung, 2003; Ægisdóttir & Gerstein, 2005). Greater
attention also has been paid to counseling international individuals living in
the United States (Fouad, 1991; Pedersen, 1991). Confirming this trend is the
focus of Division 17's past president (2003 to 2004), Louise Douce, on the
globalization of counseling psychology. Douce encouraged developing a
strategic plan to enhance the profession's global effort and facilitate a
movement that "transcends nationalism" (Douce, 2004, p. 145). She also stressed
questioning the validity and applicability of our Eurocentric paradigms and
the hegemony of such paradigms. Instead, she claimed our paradigms must
integrate and evolve from indigenous models.
P. Puncky Heppner continued Douce's effort as part of his Division 17
presidential initiative. Heppner (2006) claimed, "Cross-national relationships
have tremendous potential to enhance the basic core of the science and
practice of counseling psychology, both domestically and internationally"
(p. 147). He also predicted, "In the future, counseling psychology will no
longer be defined as counseling psychology within the United States, but
rather, the parameters of counseling psychology will cross many countries
and many cultures" (Heppner, 2006, p. 170).
Although an international focus in counseling is important, there are
many challenges (cf. Douce, 2004; Heppner, 2006; Pedersen, 2003). This
article discusses methodological challenges, especially as related to the
translation and adaptation of instruments for use in international and cross-
cultural studies and their link to equivalence and bias. While there has been
discussion in the counseling psychology literature about the benefits and challenges of cross-cultural counseling and the risks of simply applying
Western theories and strategies cross-culturally, we were unable to locate
publications in our literature detailing how to perform cross-culturally valid
research. There is literature, however, in other areas of psychology (e.g.,
cross-cultural, social, international) that addresses these topics. This article
draws from this literature to introduce counseling psychologists to some
concepts, methods, and issues when conducting cross-cultural research. We
also extend this literature by discussing the potential use of cross-cultural methodologies in counseling research.
As a way to illustrate some challenges of cross-cultural research, we also
examine, analyze, and evaluate translation practices employed in five
prominent counseling journals to determine the translation procedures
counseling researchers have used and the methods employed to minimize
bias and evaluate equivalence. Finally, we offer recommendations about
translation methodology and ways to increase validity in cross-cultural
counseling research.
METHODOLOGICAL CONCEPTS AND ISSUES
IN CROSS-CULTURAL RESEARCH
Approaches to Studying Culture
There are numerous definitions of culture in anthropology and counseling
psychology. Ponterotto, Casas, Suzuki, and Alexander (1995) concluded
that for most scholars, culture is a learned system of meaning and
behavior passed from one generation to the next. When studying cultural
influences on behavior, counseling psychologists may approach cultural
variables and the design of research from three different angles using the
indigenous, the cultural, and the cross-cultural approach (Triandis, 2000).
According to Triandis, when using the indigenous approach, researchers
are mainly interested in the meaning of concepts in a culture and how such
meaning may change across demographics within a cultural context (e.g.,
what does counseling mean in this culture?). With this approach, psychol-
ogists often study their own culture with the goal of benefiting people in
that culture. The focus of such studies is the development of a psychology
tailored to a specific culture without a focus on generalization outside of
that cultural context (cf. Adamopoulos & Lonner, 2001). The main challenge
with the indigenous approach is the difficulty of avoiding existing
psychological concepts, theories, and methodologies and therefore determining
what is indigenous (Adamopoulos & Lonner, 2001).
Triandis (2000) contended that with the cultural approach, in contrast, psychologists often study cultures other than their own by using ethnographic
methods. True experimental methods can also be used within this approach
(van de Vijver, 2001). Again, the meanings of constructs in a culture are the
main focus without direct comparison of constructs across cultures. The
aim is to advance the understanding of persons in a sociocultural context
and to emphasize the importance of culture in understanding behavior
(Adamopolous & Lonner, 2001). The challenge with this approach is a lack
of widely accepted research methodology (Adamopolous & Lonner, 2001).Last, Triandis (2000) stated that when using cross-cultural approaches,
psychologists obtain data in two or more cultures assuming the constructs
under investigation exist in all of the cultures studied. Here, researchers
are interested in how a construct affects behavior differently or similarly
across cultures. Thus, one implication of this approach is an increased
understanding of the cross-cultural validity and generalizability of the
theories and/or constructs. The main challenge with this approach is
demonstrating equivalence of constructs and measures used in the target cultures and also minimizing biases that may threaten valid cross-cultural
comparisons.
In sum, indigenous and cultural approaches focus on the emics, or things
unique to a culture. These approaches are relativistic in that the aim is
studying the local context and meaning of constructs without imposing a
priori definitions of the constructs (Tanaka-Matsumi, 2001). Scholars rep-
resenting these approaches usually reject claims that psychological theories
are universal (Kim, 2001). In the cross-cultural approach, in contrast, the
focus is on the etics, or factors common across cultures (Brislin, Lonner, &
Thorndike, 1973). Here the goal is to understand similarities and differ-
ences across cultures, and the comparability of cross-cultural categories or
dimensions is emphasized (Tanaka-Matsumi, 2001).
Methodological Challenges in Cross-Cultural Research
Scholars from diverse psychology disciplines have pursued cross-cultural
research for decades, and as a result, a literature on cross-cultural research methodologies and challenges emerged (e.g., Berry, 1969; Brislin, 1976;
Brislin et al., 1973; Lonner & Berry, 1986; Triandis, 1976; van de Vijver,
2001; van de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997).
Based on this work, our article identifies some methodological challenges
faced by cross-cultural researchers. Before proceeding, note that the challenges
summarized below refer to any cross-cultural comparison of psychological
constructs (within [e.g., ethnic groups] and between countries). These chal-
lenges are greater, though, in cross-cultural comparisons requiring translation of instruments.
Equivalence
Equivalence is a key concept in cross-cultural psychology. It addresses the
question of comparability of observations (test scores) across cultures (van
de Vijver, 2001). Several definitions or forms of equivalence have been
reported. Lonner (1985), for instance, discussed four types: functional, concep-
tual, metric, and linguistic. Functional equivalence refers to the function the
behavior under study (e.g., counselor empathy) has in different cultures.
If similar behaviors or activities (e.g., smiling) have different functions in var-
ious cultures, their parameters cannot be used for cross-cultural comparison
(Jahoda, 1966; Lonner, 1985). In comparison, conceptual equivalence refers
to the similarity in meaning attached to a behavior or concept (Lonner, 1985;
Malpass & Poortinga, 1986). Certain behaviors and concepts (e.g., help seeking)
may vary in meaning across cultures. Metric equivalence refers to psycho-
metric properties of the tool (e.g., Self-Directed Search) used to measure the
same construct across cultures. Metric equivalence is assumed if psychometric data from two
or more cultural groups have the same structure (Malpass & Poortinga,
1986). Finally, linguistic equivalence has to do with wording of items (form,
meaning, and structure) in different language versions of an instrument, the
reading difficulty of the items, and the naturalness of the items in the trans-
lated form (Lonner, 1985; van de Vijver & Leung, 1997).
Van de Vijver and his colleagues (van de Vijver, 2001; van de Vijver &
Leung, 1997) also discussed four types of equivalence representing a hierarchical order from absence to a higher degree of equivalence. The first type,
construct nonequivalence, refers to constructs (e.g., cultural syndromes)
being so dissimilar across cultures they cannot be compared. Under these
circumstances, no link exists between the constructs. The next three types
of equivalence demonstrate some equivalence with the higher level in the
hierarchy presupposing a lower level. These are construct (or structural),
measurement unit, and scalar equivalence.
At the lowest level is construct equivalence. A scale has construct
equivalence if it measures the same underlying construct across cultural
groups. Construct equivalence has been demonstrated for many constructs
in psychology (e.g., NEO Personality Inventory-Revised five-factor model
of personality; McCrae & Costa, 1997). With construct equivalence, the
constructs (e.g., extraversion) are considered to have the same meaning and
nomological network across cultures (relationships between constructs,
hypotheses, and measures; e.g., Betz, 2005) but need not be operationally
defined the same way for each cultural group (e.g., van de Vijver, 2001).
For instance, two emic measures of attitudes toward counseling may tap different indicators of attitudes in each culture, and therefore, the measures
may include different items but at the same time be structurally equivalent, as
they both measure the same dimensions of counseling attitudes and predict
help seeking. Yet as their measurement differs, a direct comparison of
average test scores across cultures using a t test or ANOVA, for example,
cannot be performed. The measures lack scalar equivalence (see below).
Construct equivalence is often demonstrated using exploratory and confirma-
tory factor analyses and structural equation modeling (SEM) to discern the similarities and differences of constructs' structure and their nomological
networks across cultures.
The next level of equivalence is measurement-unit equivalence (van de
Vijver, 2001; van de Vijver & Leung, 1997). With this type of equivalence,
the measurement scales of the tools are equivalent (e.g., interval level), but
their origins are different across groups. While mean scores from scales
with this level of equivalence can be compared to examine individual dif-
ferences within groups (e.g., using a t test), because of the different origins, comparing mean scores (e.g., via t test) between groups from scales at this level will
not provide a valid comparison. For example, Kelvin and Celsius scales
have equivalent measurement units (interval scales) but measure tempera-
ture differently; they have a different origin and, thus, direct comparison
of temperature using these two scales cannot be done. But because of a con-
stant difference between these two scales, comparability may be possible
(i.e., K = °C + 273). The known constant or value offsetting the scales makes
them comparable (van de Vijver & Leung, 1997). Such known constants are
difficult to discern in studies of human behavior, rendering scores at this level
often incomparable. A clear analogy in counseling psychology is using
different cut scores for various groups (e.g., gender) on instruments as an
indicator of some criteria or an underlying trait. Different cut scores (or
standard scores) are used because instruments do not show equivalence
beyond the measurement unit. That is, some bias affects the origin of the
scale for one group relative to the other, limiting raw score comparability
between the groups. For example, a raw score of 28 on the Minnesota
Multiphasic Personality Inventory-2 MacAndrew Alcohol Scale-Revised
(Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 2001) does not mean
the same thing for women as it does for men. For women, this score indi-
cates more impulsiveness and greater risk for substance abuse than it does
for men (Greene, 2000). A less clear example but extremely important to
cross-cultural research involves two language versions of the same psycho-
logical instrument. Here the origins of the two language versions of the
scale may appear the same (both versions include the same interval rating
scale for the items). This assumption, however, may be threatened if the two
cultural groups responding to this measure vary in their familiarity with Likert-type answer formats (method bias; see later). Because of the differ-
ential familiarity with this type of stimuli, the origin of the measurement
unit is not the same for both groups. Similarly, if the two cultural groups
vary in response style (e.g., acquiescence), a score of 2 on a 5-point scale
may not mean the same for both groups. In these examples, the source or
the origin of the scale is different in the two language versions, compro-
mising valid cross-cultural comparison.
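To make the measurement-unit analogy above concrete, here is a tiny Python sketch (our illustration; the temperature values are arbitrary): equal units but different origins mean raw scores are not comparable while differences are.

```python
# Minimal sketch of the Kelvin/Celsius analogy: equal measurement units
# but different origins, so raw scores differ while differences match.
celsius = [20.0, 25.0]
kelvin = [c + 273.0 for c in celsius]  # the known constant offsets the origins

print(kelvin[0] - celsius[0])                           # 273.0: raw scores are not comparable
print(kelvin[1] - kelvin[0], celsius[1] - celsius[0])   # 5.0 5.0: differences are comparable
```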
Finally, and at the highest level of equivalence, is scalar equivalence or full score comparability. Equivalent instruments at the scalar level measure a con-
cept with the same interval or ratio scale across cultures, and the origins of
the scales are the same. Therefore, at this level, bias has been ruled out, and
direct cross-cultural comparisons of average scores on an instrument can be
made (e.g., van de Vijver & Leung, 1997).
According to van de Vijver (2001), it can be difficult to discern if measures
are equivalent at the measurement-unit or scalar level. This challenge is
observed in comparison of scale scores between cultural groups responding to the same language version of an instrument as well as between different
language versions of a measure. As an example of this difficulty, when
using the same language version of an instrument, racial differences in
intelligence test scores can be interpreted as representing true differences in
intelligence (scalar equivalence has been reached) or as an artifact of the
measures (measurement-unit equivalence has been reached). In the latter,
the measurement units are the same, but they have different origins because
of various biases, hindering valid comparisons across different racial
groups. In this instance, valid comparisons at the ratio level (comparing
mean scores) cannot be done. Higher levels of equivalence are more diffi-
cult to establish. It is, for instance, easier to show that an instrument mea-
sures the same construct across cultures (construct equivalence) by showing
a similar factor structure and nomological networks than it is to demon-
strate the instrument's numerical comparability (scalar equivalence). The
higher the level of equivalence, though, the more detailed the analyses that can be
performed on cross-cultural similarities and differences (van de Vijver,
2001; van de Vijver & Leung, 1997).
Levels of equivalence for measures used in cross-cultural counseling
research should be established and reported in counseling psychology pub-
lications. It is not until the equivalence of the concepts under study has
been determined that a meaningful cross-cultural comparison can be made.
Without demonstrated equivalence, numerous rival hypotheses (e.g., poor
translation) may account for observed cross-cultural differences.
Bias
Another important concept in cross-cultural research is bias. Bias negatively
influences equivalence and refers to nuisance factors, limiting the compara-
bility or scalar equivalence of observations (test scores) across cultural groups
(van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver & Poortinga,
1997). Typical sources of bias are construct, method, and item bias. A con-
struct bias occurs when the construct measured as a whole (e.g., intelli-
gence) is not identical across cultural groups. Potential sources for this type
of bias are when there is different coverage of the construct across cultures (i.e., not all relevant behavioral domains are sampled), an incomplete overlap
of how the construct is defined across cultures, and when the appropriate-
ness of item content differs between two language versions of an instrument
(cf. van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). A serious
construct bias equates to construct nonequivalence.
Even though a construct is well represented in multilingual versions of
a scale (construct equivalence, e.g., similar factor structure, and there is no
construct bias, e.g., complete coverage of construct), bias may still exist in the scores, resulting in measurement-unit or scalar nonequivalence (van de
Vijver & Leung, 1997). This may be a result of method bias. Method bias
can stem from characteristics of the instrument or from its administration
(van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver &
Poortinga, 1997). Possible sources of this bias are differential response
styles (e.g., social desirability) across cultures (e.g., Johnson, Kulesa, Cho,
& Shavitt, 2005), variations in familiarity with the type of stimuli or scale
across cultures, communication problems between investigators and partici-
pants, and differences in physical conditions under which the instrument is
administered across cultures. Method bias can also limit cross-cultural com-
parisons when samples drawn from different cultures are not comparable
(e.g., prior experiences).
Item bias may also exist, posing a threat to cross-cultural comparison
(scalar equivalence). This type of bias refers to measurement at the item level.
This bias has several potential sources. It can result from poor translation or
poor item formulation (e.g., complex wording) and because item content may
not be equally relevant or appropriate for the cultural groups being compared
(e.g., Malpass & Poortinga, 1986; van de Vijver & Poortinga, 1997). An item
on an instrument is considered biased if persons from different cultures
having the same standing on the underlying characteristic (trait or state)
measured yield different average item scores on the instrument.
Finally, bias can be considered uniform and nonuniform. A uniform bias
refers to any type of bias affecting all score levels on an instrument equally
(van de Vijver & Leung, 1997). For instance, when measuring persons'
intelligence, the scale may be accurate for one group but may consistently
reflect 10 points too much for another group. The 10-point difference would appear at different intelligence levels (a true score of 90 would be
100, and a true score of 120 would be 130). A nonuniform bias is any type
of bias differentially affecting different score levels. In measuring persons'
intelligence, the scale may again be accurate for one group, but for the other
group, 10 points are recorded as 12 points. The difference in measured
intelligence for persons whose true score is 90 would be a score of 108 (18-
point difference), whereas for persons whose true score is 110, the differ-
ence is 22 points (a score of 132). The distortion is greater at higher levels on the scale. Nonuniform bias is considered a greater threat in cross-cultural
comparisons than uniform bias, as it influences the origin and measurement
unit (scale) of a scale. Uniform bias affects only the origin of a scale
(cf. van de Vijver, 1998, 2001).
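The arithmetic of this example can be sketched in a few lines of Python (our illustration, using the hypothetical true scores above): uniform bias shifts the origin by a constant, whereas nonuniform bias rescales, so the distortion grows with the true score.

```python
# Numeric sketch of uniform vs. nonuniform bias (hypothetical scores).
true_scores = [90, 110, 120]

uniform = [t + 10 for t in true_scores]      # constant 10-point offset
nonuniform = [t * 1.2 for t in true_scores]  # 10 points recorded as 12

for t, u, nu in zip(true_scores, uniform, nonuniform):
    print(f"true={t}: uniform-biased={u} (+{u - t}), "
          f"nonuniform-biased={nu:.0f} (+{nu - t:.0f})")
# true=90  -> uniform 100 (+10), nonuniform 108 (+18)
# true=110 -> uniform 120 (+10), nonuniform 132 (+22)
```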
Relationship Between Bias and Equivalence
Bias and equivalence are closely related. When two or more language
versions of an instrument are unbiased (construct, method, item), they are
considered equivalent at the scalar level. Bias will lower a measure's level of
equivalence (construct, measurement unit, scalar). Also, construct bias has
more serious consequences and is more difficult to remedy than method
and item bias. For instance, in selecting a preexisting instrument for translation and use with a different language group, the researcher runs the risk of
incomplete coverage of the construct in the target culture (i.e., construct
bias limiting construct equivalence). Method bias can be minimized, for
example, by using standardized administration (administering under simi-
lar conditions using same instructions) and by using covariates, whereas
thorough translation procedures may limit item bias. Furthermore, higher
levels of equivalence are less robust against bias. Scalar equivalence (a
needed condition for comparison of average scores between groups) is, for
instance, affected by all types of bias and is more susceptible to bias than
measurement-unit equivalence or construct equivalence, where comparative
statements are not a focus (cf. van de Vijver, 1998). Thus, if one wants to
infer whether Culture A shows more or less of a characteristic (e.g.,
willingness to seek counseling services) than Culture B, one has to empir-
ically demonstrate the measure's lack of bias and scalar equivalence.
Not all instruments are equally vulnerable to bias. In fact, more struc-
tured tests administered under standardized conditions are less susceptible
to bias than open-ended questions. Similarly, the less the cultural distance
(Triandis, 1994, 2000) between groups being compared, the less room there
is for bias. Cultural distance can, for instance, be discerned based on the
Human Development Index (HDI; United Nations, 2005) published yearly
by the United Nations Development Programme to assess well-being and child welfare (human development). Using the HDI as a measure of cultural
distance, it can be seen that the United States (ranked 10) and Ireland
(ranked 8) are more similar in terms of human development than the United
States and Niger (ranked 177). Therefore, it can be expected that greater
bias affects cross-cultural comparisons between the United States and
Niger than between the United States and Ireland.
MEASUREMENT APPROACHES
Selection of Measurement Devices
A prerequisite to conducting a cross-cultural study is to make sure what
is being studied exists and is functionally equivalent across cultures (Berry,
1969; Lonner, 1985). Once this has been determined, the next step is decid-
ing how the construct should be assessed. This decision should be based on
the type of bias expected. If there is a concern with construct bias, the con-
struct is not functionally equivalent, and serious method bias is expected, the
researcher may need to rely on emic approaches (indigenous or cultural),
develop measures meaningful to the culture, and use culture-sensitive
methodologies. Van de Vijver and Leung (1997) called this strategy the
assembly approach. Emic techniques (i.e., assembly) are often needed if
the cultures of interest are very different (Triandis, 1994, 2000). In this
approach, though, direct comparisons between cultures can be challenging,
as the two or more measures of the construct may not be equivalent at the measurement level.
If, in contrast, the cultures are relatively similar and the concept is func-
tionally equivalent across cultures, the researcher may opt to translate and/or
adapt preexisting instruments and methodologies to discern cultural similar-
ities and differences across cultural groups. Van de Vijver and Leung (1997)
listed two common strategies employed when using preexisting measures
for multilingual groups. First is the applied approach, where an instrument
goes through a literal translation of items. Item content is not changed to a
new cultural context, and the linguistic and psychological appropriateness of
the items are assumed. It is also assumed there is no need to change the
instrument to avoid bias. According to van de Vijver (2001), this is the most
common technique in cross-cultural research on multilingual groups. The
second strategy is adaptation, where some items may be literally translated,
while others require modification of wording and content to enhance the
appropriateness to a new cultural context (van de Vijver & Leung, 1997).
This technique is chosen if there is concern with construct bias.
Of the three approaches just mentioned (assembly, application, and adaptation), the application strategy is the easiest and least cumbersome in
terms of money, time, and effort. This technique may also offer high levels
of equivalence (measurement-unit and scalar equivalence), and it can make
the comparison to results of other studies using the same instrument possi-
ble. This approach may not be useful, however, when the characteristic
behaviors or attitudes (e.g., obedience and being a good daughter or son)
associated with the construct (e.g., filial piety) differ across cultures (lack
of construct equivalence and high construct bias) (e.g., Ho, 1996). In suchinstances, the assembly or adaptation strategy may be needed. With the
assembly approach (emic), researchers may focus on the construct validity
of the instrument (e.g., factor analysis, divergent and convergent validity),
not on direct cross-cultural comparisons. When adaptation of an instrument
is needed in which some items are literally translated, whereas others are
changed or added, cross-cultural comparisons may be challenging, as direct
comparisons of total scores may not be feasible because all items are not
identical. Only scores on identical items can be compared using mean score comparisons (Hambleton, 2001). The application technique (etic) to trans-
lation most easily allows for a direct comparison of test scores using t tests
or ANOVA because of potential scalar equivalence. For such comparisons
to be valid, however, an absence of bias needs to be demonstrated.
The applied approach and to some degree the adaptation strategy focus
on capturing the etics, or the qualities of concepts common across cultures.
Yet cultural researchers have criticized it. Berry (1989), for instance, labeled
this practice "imposed etics," claiming that by using the etic approach,
researchers fail to capture the culturally specific aspects of a construct and
may erroneously assume the construct exists and functions similarly across
cultures (cf. Adamopoulos & Lonner, 2001). The advantage of the etic over the
emic strategy, however, is that the etic technique provides the ability to make
cross-cultural comparisons, whereas in the emic approach, cross-cultural
comparison is more difficult and not as direct.
Nevertheless, the etic strategy may be limited when trying to understand
a specific culture. There is, for instance, no guarantee a translated measure
developed to assess a concept in one culture will assess the same construct
equally well in another culture. It is highly likely that some aspects of the
concept may be lost or not captured by the scale. There might be construct
bias and lack of construct equivalence. To counteract this shortcoming, sev-
eral methods have been proposed. Brislin and colleagues (Brislin, 1976,
1983; Brislin et al., 1973) suggested a combined etic-emic strategy. In this
approach, researchers begin with an existing tool developed in one culture
that is translated for use in a target culture (potentially etic items). Next,
additional items are included in the translated scale, which are unique to the
target culture (emic). The additional items may be developed by persons knowledgeable about the culture and/or drawn from relevant literature.
These culture-specific items must be highly correlated with the original
items in the target instrument but unrelated to culture-specific items gener-
ated from another culture (Brislin, 1976, 1983; Brislin et al., 1973). Adding
emic items will provide the researcher with a greater in-depth understand-
ing of a construct in a given culture. Assessing equivalence between the lan-
guage versions of the instrument would be based only on the shared (etic)
items (Hambleton, 2001).
Similarly, Triandis (1972, 1975, 1976) suggested that researchers start
with an etic concept (thought to exist in all cultures under study) and then
develop emic items based on each culture for the etic concept. Thus, all
instrument development is carried out within each culture included in the
study (i.e., assembly). Triandis argued that cross-cultural comparison could
still be made using these versions of the measure (one in each culture)
because the emic items would be written to measure an etic concept. SEM
could, for instance, be used for this purpose (see Betz, 2005; Weston & Gore, 2006).
Finally, a convergence approach can be applied (e.g., van de Vijver,
1998). Relying on this technique, researchers may assemble a scale mea-
suring an etic concept in each culture or use preexisting culture-specific
tools translated into each language. Then all measures are given to each cul-
tural group. Comparisons can be made between shared items (given enough
items are shared), whereas nonshared items provide culture-specific under-
standing of the construct. When this method is used, the appropriateness of
items in all scales needs to be determined before administration.
Determining Equivalence of Translated Instruments
Several statistical methods are available to determine equivalence
between translated and original versions of scales. Reporting Cronbach's
alpha reliability, item-total scale correlations, and item means and variations
provides initial information about instruments' psychometric properties. A
statistical comparison between two independent reliability coefficients can
be performed (cf. van de Vijver & Leung, 1997). If the coefficients are sig-
nificantly different from each other, the source of the difference should be
examined. This may indicate item or construct bias. Additionally, item-total
scale correlations may indicate construct bias and nonequivalence, and
method bias (e.g., administration differences, differential social desirability,
differential familiarity with instrumentation). Finally, item score distribution
may suggest biased items and, therefore, provide information about equivalence. For
instance, an indicator (e.g., item or scale) showing variation in one cultural
group but not the other may represent an emic concept (Johnson, 1998). Therefore, comparing these statistics across different language versions of an instrument will offer preliminary data about the instrument's equivalence
(e.g., construct, measurement unit, and scalar; van de Vijver & Leung, 1997;
conceptual and measurement; Lonner, 1985).
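As one concrete illustration of these preliminary checks, the sketch below computes Cronbach's alpha for each language version and compares the two independent coefficients with a Feldt-type (1969) F test. This is our hedged example, not a procedure drawn from the studies discussed here; the data, sample sizes, and names are hypothetical.

```python
# Hedged sketch: Cronbach's alpha per language version, then a Feldt-type
# F test for two independent alpha coefficients. Data are simulated.
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

def feldt_test(a1, n1, a2, n2):
    """Feldt-type test: W = (1 - a1) / (1 - a2) ~ F(n1 - 1, n2 - 1)."""
    w = (1 - a1) / (1 - a2)
    p = 2 * min(stats.f.cdf(w, n1 - 1, n2 - 1), stats.f.sf(w, n1 - 1, n2 - 1))
    return w, p

rng = np.random.default_rng(0)
def simulate(n, k=10, noise=0.8):  # one-factor data for one language version
    trait = rng.normal(size=(n, 1))
    return trait + rng.normal(scale=noise, size=(n, k))

original, translated = simulate(225), simulate(261)  # hypothetical Ns
a1, a2 = cronbach_alpha(original), cronbach_alpha(translated)
w, p = feldt_test(a1, len(original), a2, len(translated))
print(f"alpha(original) = {a1:.2f}, alpha(translated) = {a2:.2f}, "
      f"W = {w:.2f}, p = {p:.3f}")  # a significant W flags differing reliability
```

A significantly different pair of coefficients would then prompt the item-level inspection described above.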
Construct (van de Vijver & Leung, 1997), conceptual, and measurement
equivalence (Lonner, 1985) can also be assessed at the scale level. Here,
exploratory and confirmatory factor analysis, multidimensional scaling
techniques, and cluster analysis can be used (e.g., van de Vijver & Leung, 1997). These techniques provide information about whether the construct is
structurally similar across cultures and if the same meaning is attached to
the construct. For instance, in confirmatory factor analysis, hypotheses
about the factor structure of a measure, such as the number of factors, load-
ings of variables on factors, and correlations among factors, can be tested.
Numerous fit indices can be used to evaluate the fit of the model to the data.
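For these scale-level analyses, a hedged sketch follows: it fits an exploratory factor analysis to each (simulated) cultural group with the factor_analyzer package and compares the loading matrices with Tucker's congruence coefficient, one common index of factorial similarity. The package choice, simulated data, and threshold are our assumptions, not taken from the article.

```python
# Hedged sketch: EFA per cultural group, then Tucker's congruence coefficient
# between loading matrices as a rough check of construct equivalence.
# In practice, factors may first need reordering or reflection (e.g., target
# rotation) so that columns correspond across groups.
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed available via pip

def tucker_phi(a, b):
    """Congruence of corresponding factors in two loading matrices."""
    return (a * b).sum(axis=0) / np.sqrt(
        (a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

rng = np.random.default_rng(1)
n_items, n_factors = 12, 2
pop_loadings = np.zeros((n_items, n_factors))
pop_loadings[:6, 0] = 0.7   # simple structure: items 1-6 on factor 1,
pop_loadings[6:, 1] = 0.7   # items 7-12 on factor 2

def simulate(n):
    factors = rng.normal(size=(n, n_factors))
    return factors @ pop_loadings.T + rng.normal(scale=0.6, size=(n, n_items))

loadings = []
for group in (simulate(300), simulate(300)):
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
    fa.fit(group)
    loadings.append(fa.loadings_)

print("Tucker's phi per factor:", tucker_phi(*loadings))
# Values near 1.00 (often >= .90-.95) are read as factorial similarity.
```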
Scalar or full score equivalence is more difficult to establish than con-
struct and measurement-unit equivalence, and various biases may threaten this level of equivalence. Item bias, for instance, influences scalar equivalence.
Item bias can be ascertained by studying the distribution of item scores for
all cultural groups (cf. van de Vijver & Leung, 1997). Item response theory
(IRT), in which differential item functioning (DIF) is examined, may be
used for this purpose. In IRT, it is assumed item responses are related to an
underlying or latent trait using a logistic curve known as the item characteristic
curve (ICC). The ICCs for each selected parameter (e.g., item difficulty or
popularity) are compared for every item in each cultural group using chi-
square statistics. Items differing between cultural groups are eliminated
before cross-cultural comparisons are made (e.g., Hambleton &
Swaminathan, 1985; van de Vijver & Leung, 1997). Item bias can also be
examined by using ANOVA. The item score is treated as the dependent vari-
able, and the cultural group (e.g., two levels) and score levels (levels depen-
dent on number of scale items and number of participants scoring at each
level) are the independent variables. Main effects for culture and the inter-
action between culture and score level are then examined. Significant effects
indicate biased items (cf. van de Vijver & Leung, 1997). Logistic regression
can also be used for this purpose using the same type of independent and
dependent variables. Additionally, multiple-group SEM invariance analy-
ses (MCFA) and multiple group mean and covariance structures analysis
(MACS) also provide information about biased items or indicators (e.g.,
Byrne, 2004; Cheung & Rensvold, 2000; Little, 1997, 2000), with the MACS
method also providing information about mean differences between groups
on latent constructs (e.g., Ployhart & Oswald, 2004).
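The ANOVA-based item bias procedure lends itself to a brief sketch. The following is our hypothetical illustration (simulated data; statsmodels for the ANOVA), not code from any study reviewed here: item score is the dependent variable, culture and score level are the independent variables, and significant culture effects flag the item as biased.

```python
# Hedged sketch of the ANOVA item-bias check described above (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "culture": rng.choice(["A", "B"], size=n),
    "total": rng.integers(10, 50, size=n),  # total test score
})
df["level"] = pd.qcut(df["total"], q=4, labels=False)  # score-level bins
# Simulate a uniformly biased item: culture B scores higher at every level.
df["item"] = (0.05 * df["total"] + 0.5 * (df["culture"] == "B")
              + rng.normal(scale=0.4, size=n))

model = smf.ols("item ~ C(culture) * C(level)", data=df).fit()
print(anova_lm(model, typ=2))
# A significant culture main effect suggests uniform item bias; a significant
# culture x level interaction suggests nonuniform bias.
```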
Finally, factors contributing to method bias can be assessed and statisti-
cally held constant when measuring constructs across cultures, given that valid measures are available. A measure of social desirability may, for
instance, be used to partially control for method bias. Also, gross national
product per capita may be used to control for method bias, as it has been
found to correlate with social desirability (e.g., Van Hemert, van de Vijver,
Poortinga, & Georgas, 2002) and acquiescence (Johnson et al., 2005).
Furthermore, personal experience variables potentially influencing the con-
struct under study differentially across cultures may serve as covariates.
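A minimal sketch of statistically holding a method-bias factor constant follows, assuming a measured social desirability covariate; the variables, data, and effect sizes are hypothetical.

```python
# Hedged sketch: holding social desirability constant (ANCOVA-style) when
# comparing cultural groups on a construct. Data and names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "culture": rng.choice(["A", "B"], size=n),
    "social_desirability": rng.normal(size=n),
})
# Suppose scores partly reflect the response style rather than the construct.
df["score"] = (0.6 * df["social_desirability"]
               + 0.3 * (df["culture"] == "B") + rng.normal(size=n))

model = smf.ols("score ~ C(culture) + social_desirability", data=df).fit()
print(model.params)  # the culture coefficient is the covariate-adjusted difference
```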
Translation Methodology
Employing a proper translation methodology is extremely important to
increase equivalence between multilingual versions of an instrument and the
measures cross-cultural validity. About a decade ago, van de Vijver and
Hambleton (1996) published practical guidelines for translating psycholog-
ical tests that were based on standards set forth in 1993 by the International
Test Commission (ITC). The guidelines covered best practices in regard to context, development, administration, and the interpretation of psychologi-
cal instruments (cf. Hambleton & de Jong, 2003; van de Vijver, 2001; van
de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997). The context
guidelines emphasized the importance of minimizing construct, method, and
item bias and the need to assess, instead of assume, construct similarity
across cultural groups before embarking on instrument translation. The
development guidelines referred to the translation process itself, while the
administration guidelines suggested ways to minimize method bias. Finally,
the interpretation guidelines recommended caution when explaining score
differences unless alternative hypotheses had been ruled out and equivalence
between original and translated measures had been ensured (van de Vijver &
Hambleton, 1996). Counseling psychologists should review these guidelines
when designing cross-cultural research projects and prior to translating and
adapting psychological instruments for such research.
Prior to the development of the ITC standards, Brislin et al. (1973) and
Brislin (1986) had written extensively about translation procedures. The
following paragraphs outline the common translation methods that Brislin
et al. summarized, with connections to the ITC guidelines (e.g., Hambleton
& de Jong, 2003; van de Vijver & Hambleton, 1996). Additional methods
to enhance equivalence of translated scales are also mentioned.
Translation. When translating an instrument, bilingual persons who
speak both the original and the target language should be employed. Either
a single person or a committee of translators can be used (Brislin et al.,
1973). In contrast to employing only a single person for the translation, the
committee approach emphasizes two or more persons performing the translation independently. Then, the translations are compared, sometimes with
another person, until an agreement is reached on an optimal translation. The
advantage of the committee approach recommended in the ITC guidelines
(van de Vijver & Hambleton, 1996) over a single person is the possible
reduction in bias and misconceptions of a single person. In addition to
being knowledgeable about the target language of the translation, test trans-
lators need to be familiar with the target culture, the construct being
assessed, and the principles of assessment (Hambleton & de Jong, 2003; van de Vijver & Hambleton, 1996). Being knowledgeable about such
topics minimizes item biases (e.g., in an achievement test, an item in one
culture may give away more information than the same item in another cul-
ture) that may result from literal translations.
Back translation. In this procedure, the translated or target version of
the measure is independently translated back to the original language by
different person(s) than the one(s) performing the translation to the target language. If more than one person is involved in the back translation,
together they decide on the best back-translated version of the scale that is
compared to the original same-language version for linguistic equivalence.
Back translation not only provides the researcher with some control
over the end result of the translated instrument in cases where he or she
does not know the target language (e.g., Brislin et al., 1973; Werner &
Campbell, 1970), it also allows for further refinement of the translated
version to ensure equivalence of the measures. If the two same-language
versions of the scale do not seem identical (i.e., the original and the back-
translated versions), the researcher in cooperation with the translation com-
mittee works on the translations until equivalence is reached. Here, the
items requiring a changed translation may be subject to back translation
again. Oftentimes in this procedure, only the translated version is changed
to be equivalent to the original-language version that remains unchanged.
At other times, the original language version of the scale is also changed to
ensure equivalence, a process known as decentering (Brislin et al., 1973).
Adequate back translation does not guarantee a good translation of a scale,
as this procedure often leads to literal translation at the cost of readability
and naturalness of the translated version. To minimize this, a team of back
translators with a combined expertise in psychology and linguistics may be
used (van de Vijver & Hambleton, 1996). It is also important to note that in
addition to the test items, test instructions need to go through a thorough
translation/back-translation process.
Decentering. This method was first introduced by Werner and Campbell
(1970) and refers to a translation/back-translation process in which both the source (the original instrument's language) and the target language versions
are considered equally important and both are open to modification.
Decentering may need to take place if words in the original language have
no equivalence in the target language. If the aim is collecting data in both the
original and the target culture, items in the original instrument are changed
to ensure maximum equivalence (cf. Brislin, 1970, on the translation of
Marlowe-Crowne's [Crowne & Marlowe, 1960] Social Desirability Scale).
Thus, the back-translated version of the original instrument is used for data collection instead of the original version, as it is considered most likely to
be equivalent to the translated version (Brislin, 1986). When this outcome
is selected and when researchers worry that changes in the original lan-
guage may lead to a lack of comparability with previous studies using the
original instrument, Brislin (1986) suggested collecting data using both
the decentered and the original version of the instrument on a sample
speaking the original language. The participants may see half of the original
items and half of the revised items in a counterbalanced order. Statistical analysis can indicate whether different conclusions should be made based
on responses to the original versus the revised items (see Brislin, 1970).
Pretests. Following translation and back translation of an instrument
and, therefore, judgmental evidence about the equivalence of the original
and translated versions of the instrument, several pretest measures can be
used to evaluate the equivalence of the instruments in regard to the mean-
ing conveyed by the items. One approach is to administer the original and
the translated versions of the instrument to bilingual persons (Brislin et al.,
1973; van de Vijver & Hambleton, 1996). Following the administration of
the instruments, item responses can be compared using statistical methods
(e.g., t test). If item differences are discovered between versions of the
instrument, the translations are reviewed and changed accordingly.
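To illustrate this item-level comparison with bilingual respondents, here is a hedged sketch using scipy's paired t test; the responses are simulated, and the flagged item is contrived.

```python
# Hedged sketch: bilinguals answer both language versions; paired t tests per
# item flag translations that shift responses. Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_bilinguals, n_items = 30, 10
original = rng.normal(loc=3.0, scale=0.8, size=(n_bilinguals, n_items))
translated = original + rng.normal(scale=0.3, size=(n_bilinguals, n_items))
translated[:, 4] += 0.6  # suppose item 5's wording shifted in translation

for i in range(n_items):
    t, p = stats.ttest_rel(original[:, i], translated[:, i])
    if p < .05:
        print(f"item {i + 1}: t = {t:.2f}, p = {p:.3f} -> revise translation")
```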
Sometimes bilingual individuals are used in lieu of performing back
translations (Brislin et al., 1973). In this case, the translated version and
original versions of the instrument are administered to bilingual persons.
The bilingual persons may be randomly assigned to two groups that receive
half of the questions in the original language and the other half in the target
language. The translated items resulting in responses different from responses
elicited by the same original items are then refined until the responses between
the original and the translated items are comparable. Items not yielding com-
parable responses despite revisions are discarded. If items yield comparable
results, the two versions of the instrument are considered equivalent. Additionally,
a small group of bilingual individuals can be employed to rate each item from
the original and translated versions of the instrument on a predetermined scale in regard to the similarity of meaning conveyed by the item. Problematic items
are then refined until deemed satisfactory (e.g., Hambleton, 2001).
A small sample of participants (e.g., N = 10) can also be employed to pretest
a translated measure that has gone through the translation/back-translation
iteration. Here, participants are instructed to provide verbal or written feed-
back about each item of the scale. For example, Brislin et al. (1973) noted
two methods: random probe and rating of items. In the random probe
method, the researcher randomly selects items from a scale and asks probingquestions about an item, such as What do you mean? Persons responses
to the probes are then examined. Responses considered bizarre or unfitting
an item are scrutinized, and the translation of the item is changed. This
method provides insight into how well the meaning of the original items has
fared in the translation. In the rating method, respondents are asked to rate
their perceptions about item clarity and appropriateness on a predetermined
scale. Items that are unclear or not fitting based on these ratings are
reworded. Finally, a focus group approach can be used (e.g., Ægisdóttir, Gerstein, & Gridley, 2000) where a small group of participants responds to
the translated version and then discusses with the researcher(s) the meaning
the participants associated with the items. Participants also share their
perceptions about the clarity and cultural appropriateness of the items. Item
wording is then changed based on responses from the focus group members.
Statistical Assessment of the Translated Measure
In addition to pretesting a translated scale and judgmental evidence
about a scale's equivalence, researchers need to provide further evidence of
the measure's equivalence to the original instrument. As stated earlier, item
analyses and Cronbach's alpha suggest equivalence and lack of bias.
Furthermore, exploratory and confirmatory factor analyses of the measure's
factor structure can contribute information about construct equivalence.
Multidimensional scaling and cluster analysis can be used to explore construct
equivalence as well. These techniques indicate equivalence on an instru-
ment level, more specifically, about the similarities and differences of the
hypothesized construct underlying the instrument for the different language
versions. Similar to Brislin et al.'s (1973) suggestions mentioned earlier,
Mallinckrodt and Wang (2004) proposed a method they termed the dual-
language split-half (DLSH) to evaluate equivalence. In this procedure,
alternate forms of a translated measure, each composed of one half of items
in the original language and one half of items in the target language, are
administered to bilingual persons in a counterbalanced order of languages.
Equivalence between the two language versions of the instruments is deter-
mined by lack of significant differences between mean scores on the origi-
nal and translated version of the measures, by split-half correlations between clusters of items on the original and the target language, and by the
internal consistency reliability and test-retest reliability of the dual lan-
guage form of the measures. These coefficients are compared to results
from the original-language version of the instrument. Also inherent in this
approach is collection of evidence for convergent validity for each language
version. Finally, and as mentioned earlier, to provide further evidence of the
measure's equivalence to the original measure, analyses at the item level
(item bias analysis; van de Vijver & Hambleton, 1996), such as ANOVA and
IRT procedures examining DIF, can be applied to determine scalar equiv-
alence (cf. van de Vijver & Leung, 1997). MCFA and MACS invariance
analyses can be employed for this purpose as well.
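The core DLSH computations can be sketched as follows. This is our hypothetical rendering of the logic with simulated data, not Mallinckrodt and Wang's (2004) code, and the Spearman-Brown correction is our added (standard) step for a split-half coefficient.

```python
# Hedged sketch of dual-language split-half (DLSH) logic: each bilingual
# answers half the items in English and half in the target language.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, half = 30, 9                      # e.g., 30 bilinguals, an 18-item scale
trait = rng.normal(size=(n, 1))
english_half = (trait + rng.normal(scale=0.5, size=(n, half))).mean(axis=1)
target_half = (trait + rng.normal(scale=0.5, size=(n, half))).mean(axis=1)

t, p = stats.ttest_rel(english_half, target_half)  # within-subjects t test
r, _ = stats.pearsonr(english_half, target_half)   # dual-language split-half r
print(f"mean difference: t = {t:.2f}, p = {p:.3f}; split-half r = {r:.2f}")
print(f"Spearman-Brown corrected reliability = {2 * r / (1 + r):.2f}")
# Equivalence is supported by a nonsignificant mean difference and a high
# corrected coefficient comparable to the original-language version's.
```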
CONTENT ANALYSIS OF TRANSLATION METHODS
IN SELECT COUNSELING JOURNALS
Another purpose of this article is to examine, analyze, and evaluate
translation practices employed in five prominent counseling journals
thought to publish a greater number of articles on international topics
than other counseling periodicals. This purpose was pursued to determine
whether counseling researchers have, in fact, followed the translation pro-
cedures suggested by Brislin (1986) and Brislin et al. (1973) and in the
ITC guidelines (e.g., van de Vijver & Hambleton, 1996). We also examined
the methods used to control for bias and increase equivalence. While this
was not the primary purpose of this article, results of our investigation
might help illustrate counseling researchers' use of preferred translation
principles mentioned in the cross-cultural literature. It was also assumed
results obtained from this type of investigation could help identify further
recommendations to assist counseling researchers when conducting cross-
cultural studies and when reporting results of such projects in the schol-
arly literature.
METHOD
Sample
The sample consisted of published studies employing translated instru-
ments in their data collection. To be included in this project, an integral part
of the study's methodology had to be a translation of one or more entire
instruments or some subset of items from an instrument. Furthermore, the
target instrument could not have been translated or evaluated the same way
in earlier studies. Additionally, the included studies had to either compare
responses from persons from more than one culture (nationality) or inves-
tigate a psychological concept using a non-U.S. or non-English-speaking sample of participants. Studies for this investigation were sampled from
five counseling journals (Journal of Counseling Psychology [JCP], Journal of
Counseling and Development [JCD], Journal of Multicultural Counseling
and Development [JMCD], Measurement and Evaluation in Counseling
and Development [MECD], and The Counseling Psychologist [TCP])
thought to publish articles relevant to non-English-speaking cultures, eth-
nic groups, and/or countries. To assess for more recent trends in the litera-
ture, only articles published between the years 2000 and 2005 were
included in our sample. We assumed recent studies (i.e., studies published
since 2000) would provide a good representation of current translation and
verification practices employed by counseling researchers. From 2000 to
2005, a total of 615 empirical articles were published in the targeted jour-
nals. Of these articles, 15 included translation as a part of their methodol-
ogy. Therefore, 2.4% of the empirical articles published in these five
counseling journals incorporated a translation process.
Procedure
The 15 identified studies were coded by (a) publication source (e.g.,
TCP), (b) year of publication (e.g., 2001), (c) construct investigated and
name of scale translated, (d) translation methodology used (single person,
committee, bilinguals), (e) whether the translated version of the scale was
pilot tested (yes or no) before main data collection, (f) number of partici-
pants used for pilot testing, (g) psychometric properties reported and statis-
tics used to evaluate the translated measure's equivalence to the original scale, and (h) number of participants from which the psychometric data
were gathered. Two of the current authors coded information from the arti-
cles independently. If disagreements arose in the coding (e.g., relevant
psychometrics for equivalence evaluation), these were resolved through
consensus agreement between the coders.
18 THE COUNSELING PSYCHOLOGIST / Month XXXX
(text continues on p. 22)
-
7/30/2019 10 Methodological Issues
19/32
19
1.Shin,Berkson,&
Crittenden(2000);
JMCD
2.Engels,Finkenauer,
Meeus,&Dekovic
(2001);JCP
3.Chung&Bemak
(2002);JCD
4.Kasturirangan&
Nutt-Williams
(2003);JMCD
5.Asner-Self&
Schreiber(2004);
MECD
6.Torres&Rollock
(2004);MECD
Psychological
help-seeking
attitudes;
traditionalvalues
Parentalattachment;
Relation
al
comp
etence;
Self-e
steem;
Depression
Anxiety
;
depre
ssion;
psych
osocial
dysfu
nction
symptoms
Culture
Domesticviolence
Attributionalstyle
Acculturation-related
challe
nges
Immigrantsfrom
Korea
Dutchadolescents
SoutheasternAsian
refugees
Latinowomen
Immigrantsfrom
CentralAmerica
Immigrantsfrom
Central&South
America
Sixitemsfromthe
A
ttitudesToward
SeekingProfessional
PsychologicalHelp
(A
TSPPH);
A
cculturationAttitude
Scale,
(AAS)prior
translation;Vignettes
developedinEnglish
Pare
ntandPeer
A
ttachment(IPPA);
PerceivedCompetence
ScaleforChildren;
Self-EsteemScale;
D
epressiveMoodList
HealthOpinionSurvey
(interview)
Ase
mistructured
in
terviewprotocol
developedbythe
re
searchers:Two
in
terviewsinEnglish,
se
veninSpanish
The
AttributionalStyle
Q
uestionnaire(ASQ)
CulturalAdjustment
D
ifficultiesChecklist
(C
ADC)
EnglishtoKorean
EnglishtoDutch
Englishto
Vietnamese,
Khmer,
Laotian
EnglishtoSpanish.
Nodiscussionof
translationmethod
Englishto
Spanish
EnglishtoSpanish
Committee
Committee
(researchers);
unclearwhat
instrumentswere
translatedinstudy
Committee
Not r
eported
Committee
Committee
Yes
Yes(
researchers)
Yes
No
Yes
Yes
No
No
Pilot in
terviews
Pilotinterview;no
comparison
between
Englishand
Spanishversion
ofprotocol
priortodata
collection
No
No
N/A
N/A
N/A
N/A
Englishversion
ofprotocol
administeredto
(n=3)Latina
women
N/A
Notreportedfor
the10%ofthe
samplethat
respondedto
thisversion
A
TSPPH:Factoranalysis
AAS:Cronbach'salpha
(N=110Koreanimmigrants
inU.S.)
C
ronbach'salpha(N=412
Dutch
adolescents)
E
xploratoryfactoryanalysis
forVietnamese(N=867),
Cambodian(N=590),and
Laotian(n=723)persons
L
atinaprofessorofforeign
languageservedasanauditor
toensurepropertranslation
oftranscriptsfromSpanish
toEnglish(n=7)Latina
women
C
ronbach'salpha,principle
componentsanalysis(N=89
CentralAmerican
immigrantsinU.S.)
C
ronbach'salpha(N=86
Hispanicimmigrants).90%
ofthesamplerespondedto
thetranslatedversionof
instruments.
Nocomparison
reportedbetweenthetwo
languageversions
Assigned
Psychom
etricsReported
Number,
Approach
Citation,
Typeof
Instrument
to
Back
andJournal
Construct
Sample
Name
Translation
Translation
Translation
Pretest
Original
Target
TABLE1:
StudiesInvolvingTranslationofInstruments
(continued)
-
7/30/2019 10 Methodological Issues
20/32
20
7. Oh & Neville (2004); TCP
Construct: Korean rape myth acceptance
Sample: Korean college students
Instruments: Illinois Rape Myth Acceptance Scale (IRMAS); 26 items from the IRMAS were translated and included in the preliminary version of the Korean Rape Myth Acceptance Scale (KRMAS)
Translation: English to Korean
Approach: Single person
Back translation: Yes
Pretest: Yes; a focus group (n = 4 South Korean nationals) evaluated each item from the IRMAS and 26 items generated from the Korean literature. All items were in Korean
Psychometrics (original): N/A
Psychometrics (translated): Study 1: principal components analysis followed by exploratory factor analysis (N = 348 South Korean college students). Study 2: confirmatory factor analysis, factorial invariance procedure, Cronbach's alpha, and MANOVA to establish criterion validity (N = 547 South Korean nationals). Study 3: test-retest reliability (N = 40 South Korean teachers or school administrators)

8. Asner-Self & Marotta (2005); JCD
Construct: Depression, anxiety, phobic anxiety; Erikson's eight psychosocial stages
Sample: Immigrants from Central America
Instruments: Brief Symptom Inventory (BSI); Measures of Psychosocial Development (MPD)
Translation: English to Spanish
Approach: Not reported
Back translation: Yes
Pretest: Not reported
Psychometrics (original): Not reported
Psychometrics (translated): Not reported. No information about the number of participants responding to the English or Spanish versions of the instruments. Volunteers probed about the research experience

9. Wei & Heppner (2005); TCP
Construct: Clients' perceptions of counselor credibility; working alliance
Sample: Counselor-client dyads in Taiwan
Instruments: Counselor Rating Form-Short Version (CRF-S); the Working Alliance Inventory-Short Version (WAI-S)
Translation: English to Mandarin
Approach: Single person
Back translation: Yes
Pretest: No
Psychometrics (original): N/A
Psychometrics (translated): Cronbach's alpha, intercorrelations among CRF subscales (construct validity) (N = 31 counselor-client dyads in Taiwan)

Cross-cultural studies

10. Marino, Stuart, & Minas (2000); MECD
Construct: Acculturation
Sample: Anglo-Celtic Australians & Vietnamese immigrants to Australia
Instruments: A questionnaire developed (in English) measuring behavioral and psychological acculturation, and socioeconomic and demographic influences on acculturation
Translation: English to Vietnamese
Approach: Committee
Back translation: Yes
Pretest: Yes (n = 10), Vietnamese version
Psychometrics (original): Cronbach's alpha (N = 196 Anglo-Celtic Australians)
Psychometrics (translated): Cronbach's alpha (N = 187 Vietnamese Australians). Vietnamese participants responded to either an English or a Vietnamese version of the instrument; statistical evidence of equivalence between these two language versions was not reported
11. Ægisdóttir & Gerstein (2000); JCD
Construct: Counseling expectations; Holland's typology
Sample: Icelandic & U.S. college students
Instruments: Expectations About Counseling Questionnaire (EAC-B); Self-Directed Search (SDS)
Translation: English to Icelandic
Approach: Committee
Back translation: Yes
Pretest: Focus group (n = 8), Icelandic version
Psychometrics (original): Cronbach's alpha (N = 225 U.S. college students)
Psychometrics (translated): Cronbach's alpha (N = 261 Icelandic college students). Covariate analysis (prior counseling experience) used to control for method bias

12. Poasa, Mallinckrodt, & Suzuki (2000); TCP
Construct: Causal attributions
Sample: U.S., American Samoan, & Western Samoan college students
Instruments: Questionnaire of Attribution and Culture (QAC; vignettes with open-ended response probes developed in English)
Translation: English to Samoan
Approach: Single person
Back translation: Yes
Pretest: English version of the QAC pilot tested; respondents provided feedback to evaluate equivalence (n = 16)
Psychometrics (original): A team of English-speaking persons (n = 4) independently coded the English-language responses from the QAC and interviews (N = 23)
Psychometrics (translated): A team of Samoan-speaking persons (n = 3) independently coded the Samoan-language responses from the QAC and interviews (N = 50). No information about whether themes/codes were translated from Samoan to English

13. Tang (2002); JMCD
Construct: Career choice
Sample: Chinese, Chinese American, & Caucasian American college students
Instruments: A questionnaire developed in English in the study to measure influences on career choice
Translation: English to Chinese
Approach: Single person (researcher)
Back translation: Yes
Pretest: No
Psychometrics (original): None reported for Caucasian American (N = 124) and Asian American (N = 131) college students
Psychometrics (translated): None reported for Chinese (N = 120) college students

Equivalence studies

14. Chang & Myers (2003); MECD
Construct: Wellness
Sample: Immigrants from Korea
Instruments: The Wellness Evaluation of Lifestyle (WEL)
Translation: English to Korean
Approach: Single translator whose translations were edited by the first author; discrepancies resolved between translator and editor upon mutual agreement
Back translation: No
Pretest: Yes (n = 3): bilingual examinees took both the English and the Korean versions. Effect size (Cohen's d) of the difference in mean scores between the English and Korean versions
Psychometrics (original): None reported for a larger sample (N not reported)
Psychometrics (translated): None reported for a larger sample (N not reported)

15. Mallinckrodt & Wang (2004); JCP
Construct: Adult attachment
Sample: International students from Taiwan
Instruments: The Experiences in Close Relationships Scale (ECRS)
Translation: English to Chinese
Approach: Committee
Back translation: Yes
Pretest: No
Psychometrics (original): Split-half reliability, Cronbach's alpha (N = 399 U.S. college students)
Psychometrics (translated): Used bilinguals (n = 30 Taiwanese international college students) to evaluate equivalence using the DLSH method: within-subjects t test between the two language versions, split-half reliability, Cronbach's alpha, test-retest reliability, and construct validity correlations with a related construct
RESULTS
Table 1 lists results found for each of the 15 studies. Three of the included studies used a structured or semistructured interview/test protocol. In 3 studies, of which one included a semistructured test protocol, an English-language instrument was developed and then translated to another language. Furthermore, in 9 studies, one or more preexisting measures (the entire instrument or a subset of items) were translated into a language other than English. In the 15 studies, a range of constructs was examined, including persons' counseling orientations (e.g., help-seeking attitudes, counseling expectations), adjustment (e.g., acculturation), and maladjustment (e.g., psychological stress). A diversity of cultural groups was represented in the 15 studies as well (see Table 1).
Evaluation of Included Studies
Two main criteria were used to evaluate these 15 studies: (a) the trans-
lation methodology employed (single person, committee, back translation,
pretest), which provides judgmental evidence about the equivalence of the
translated measure to the original measure; and (b) whether statistical methods were used to verify equivalence of the translated measure to its
original-language version. Because the studies ranged in terms of their pur-
pose and the approaches taken when investigating multicultural groups, and
also because these strategies were linked with different measurement
opportunities of equivalence and bias, we divided these 15 studies into
three categories: target-language, cross-cultural, and equivalence studies.
The target-language studies included projects in which only translated ver-
sions of measures were investigated. These studies employed either cross-cultural (etic) methodologies or a combination of cultural and
cross-cultural methodologies (emic-etic). For these studies, there was no
direct comparison made between an original and a translated version of the
protocol. The second category of studies used a cross-cultural approach, as
they compared two or more groups on a certain construct. Each of these
groups received the original and translated versions of a measure. Finally,
the third category of studies was specifically designed to examine equiva-
lence between two language versions of an instrument. These studies we
termed equivalence studies.
We identified studies that employed sound versus weak translation method-
ologies. This task turned out to be difficult, however, because of the scarcity
of information reported about the translation processes used. Sometimes,
the translation procedure was described in only a couple of sentences. In
other instances, the translation methodology was discussed in more detail
(e.g., number and qualifications of translators and back translators), while in
fewer instances, examples were provided about acceptable and unacceptable
item translations.
Despite these difficulties, and based on available information, we con-
trasted relatively sound and weak translation procedures. Translation methods
we considered weak did not incorporate any mechanism to evaluate the translation, whether judgmental (e.g., back translation, use of bilinguals, pretest) or quantitative (statistical evidence of equivalence). Instead, such a protocol was translated into one or more languages without any apparent evaluation of its equivalence to the original-language version.
Methodologically sound studies incorporated both judgmental and quantita-
tive methods to assess the validity of the translation. Given these criteria to evaluate the methodological rigor of the translation process employed, we now
present the analyses of the 15 identified studies in the literature.
Target-language studies. Nine of the 15 studies administered and examined responses from a translated measure without direct comparison to a group responding to an original-language version of the measure (see Table 1). In most of these studies, persons from one cultural group participated. Both quantitative and qualitative methods were employed. These studies relied on preexisting instruments, select items from preexisting instruments, or interview protocols translated into a new language. We also
included in this category studies in which a protocol was developed in
English and translated into another language.
In two studies (4 and 8), few procedures were reported to evaluate the
translation and verify the different language form of the measures used (see
Table 1). In these studies, two language versions of a scale were collapsed
into one set of responses without evaluating their equivalence. A stronger design for these studies would ensure judgmental equivalence between the
two language versions of the scales. This could have been accomplished by
using a committee of translators and independent back translators. A
stronger design would have also resulted from incorporating a decentering
process when developing the adapted measures and, if appropriate, by sta-
tistically assessing equivalence. Thus, we considered these studies weak in
terms of their methodological rigor.
Sound translation methods incorporate several mechanisms to evaluate
a translated version of a protocol. They involve, for instance, a committee
approach to translation/back translation, a pretest of the scale, and an eval-
uation of the instrument's psychometric properties relative to the original
version. Four studies reported information somewhat consistent with our
criteria for sound methodological procedures (3, 5, 7, and 9). The authors,
with varying degrees of detail, reported using either a single person or a
committee approach to translation, they relied on back translation, and they
employed one or more independent experts to evaluate the equivalence of
the language forms. They also reported making subsequent changes to the
translated version of the instruments they were using. Additionally, in some
of these studies, a pretest of the translated protocol was performed, and in
all of these projects, the investigators discussed the statistical tests of the
measures' psychometric properties (see Table 1).
The remaining three studies in this category (1, 2, and 6) contained
translation methods of moderate quality, in that their rigor fell between that of the studies we considered relatively weak and those we considered strong. In fact, the translation process was not fully described.
Furthermore, in one instance, the same person performed the translation and the back translation (2), and in another (6), no assessment of equiva-
lence was reported on the two language versions of the scale used before
responses were collapsed into one data set. Also, in one study (1), translated
items from an existing scale were selected a priori without any quantitative
or qualitative (e.g., pretest) assurance these items fit the cultural group to
which they were administered. In none of these three studies were the mea-
sures pretested before collecting data for the main study. Finally, insufficient
information was reported about the translated instruments' psychometric properties to evaluate the validity of the measures for the targeted cultural
groups. The internal validity of these studies could have been greatly
improved had the researchers included some of these procedures in the
translation and verification process.
Cross-cultural studies. Four of the 15 studies directly compared two
or more cultural groups. In 3 of these studies, an instrument was developed
in English and then translated into another language, whereas in 1 study, a preexisting instrument was translated to another language (see Table 1). In
all 4 studies, comparisons were made between language groups relying on
two language versions of the same instrument.
None of these four studies employed a particularly weak translation
methodology. Yet three of the four studies (10, 11, and 12) used relatively
rigorous methods. In these three studies, the scales were pretested follow-
ing the translation/back-translation process, providing judgmental evidence
of equivalence. Additionally, in the two quantitative studies (10 and 11),
the researchers compared Cronbach's alphas between language versions.
Finally, in one study (11), equivalence was further determined by employ-
ing covariate analysis to control for method bias (different experiences of
participants across cultures) in support of scalar equivalence. None of these
approaches to examine and ensure equivalence was reported in the Tang
(2002) study. As a result, we concluded that this study used the least valid
approach. It is noteworthy that all four studies in this category failed to
assess the factor structure of the different language versions of the mea-
sures, and as such, they did not provide additional support for construct
equivalence. Similarly, none of these studies assessed item bias or per-
formed any detailed analyses to verify scalar equivalence. Employing these
additional analyses would have greatly enhanced the validity of the reported
cross-cultural comparisons in these four studies.
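To make the reliability comparison concrete, the sketch below computes Cronbach's alpha separately for two language versions of a scale. It is a minimal illustration with simulated responses (the data, sample sizes, and variable names are hypothetical); as noted above, similar alphas across versions are necessary but not sufficient evidence of construct or scalar equivalence.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) response matrix."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    rng = np.random.default_rng(0)
    # Simulate a 10-item scale: each item reflects a shared trait plus noise.
    trait_a = rng.normal(size=(200, 1))   # hypothetical original-language sample
    trait_b = rng.normal(size=(150, 1))   # hypothetical translated-language sample
    original_version = trait_a + rng.normal(scale=1.0, size=(200, 10))
    translated_version = trait_b + rng.normal(scale=1.0, size=(150, 10))

    print(f"alpha, original version:   {cronbach_alpha(original_version):.2f}")
    print(f"alpha, translated version: {cronbach_alpha(translated_version):.2f}")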
Equivalence studies. Two of the 15 studies were treated as separate
cases, as they were specifically designed to demonstrate and evaluate
equivalence between two language versions of a scale (see Table 1).
Therefore, we did not evaluate these the same way as the other 13 studies. Instead, they are examples of how to enhance the cross-cultural validity of translated and adapted scales. We concluded that Mallinckrodt and Wang's
(2004) approach to determine construct equivalence between language ver-
sions of a measure was significantly more rigorous than the one presented
by Chang and Myers (2003).
As can be seen from Table 1, Chang and Myers (2003) employed three
bilingual persons in lieu of back translation. In their approach, bilingual
persons' average scale scores on both versions of a scale were compared. Mallinckrodt and Wang (2004), in contrast, used both back translation and
bilingual individuals to demonstrate and ensure equivalence. Their method
subsumed the method employed by Chang and Myers. Following a back
translation of an instrument, Mallinckrodt and Wang used a quantitative
methodology, the DLSH, to assess equivalence between two language
versions of a scale (see discussion earlier). In brief, with this approach,
responses from bilingual individuals receiving half of the items in each
language were compared to a criterion sample of persons responding to the original version of the scale. By comparing average scale scores, reliability
coefficients, and construct validity correlations, the researchers were able to
examine the equivalence (construct and to some degree scalar equivalence)
between the two language versions of the instrument.
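In code, the core of this bilingual comparison might look like the following sketch. The data are simulated and the names hypothetical; the full DLSH procedure additionally benchmarks these statistics, along with test-retest reliability and construct validity correlations, against a criterion sample that completed the original-language version.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_bilinguals = 30

    # Each bilingual respondent contributes two half-scale scores: one half
    # answered in the original language, the other in the translation. Under
    # equivalence, the two halves should differ only by measurement noise.
    trait = rng.normal(loc=3.5, scale=0.6, size=n_bilinguals)
    original_half = trait + rng.normal(scale=0.3, size=n_bilinguals)
    translated_half = trait + rng.normal(scale=0.3, size=n_bilinguals)

    # Scalar check: a within-subjects t test between the language halves.
    t_stat, p_value = stats.ttest_rel(original_half, translated_half)

    # Cross-language split-half consistency: correlation of the half scores.
    r, _ = stats.pearsonr(original_half, translated_half)

    print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}, cross-language r = {r:.2f}")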
Interpretation of Results
The current results are consistent with Mallinckrodt and Wang (2004),
who discovered in their review of articles published in two counseling jour-
nals (JCP and TCP) that few studies in counseling psychology have inves-
tigated multilingual or international groups or employed translation methods.
Additionally, consistent with these investigators, we found that, in many instances, counseling researchers used inadequate procedures to verify equivalence
between language versions of an instrument. For example, our analyses
indicated just more than half of the 15 studies employed a committee of
translators. A committee is highly recommended in the ITC guidelines (van
de Vijver & Hambleton, 1996).
We also discovered that in fewer than half of the 15 studies the measurement devices were pretested and that in slightly more than half of the studies the researchers used quantitative methods to further demonstrate equiva-
lence. Furthermore, only 1 study systematically controlled for method bias,
while none of the 15 studies assessed for item bias. All these procedures
are recommended in the ITC guidelines. On a positive note, however, all
but 2 studies used a back-translation procedure to enhance equivalence.
Taken together, all of these results are disquieting and lead us to call for
employing more rigorous research designs when studying culture, when using and evaluating translated instruments, and when performing cross-
cultural comparisons.
Additionally, we found, in many cases, limited attention was placed on
discussing translation methods. Hambleton (2001) also observed this trend.
Not knowing the reason for this lack of effort, we speculate about why
methods of translation were not described in more detail. One reason could
be the lack of importance placed on this methodological feature of a
research design. Another may relate to an author's desire to comply with page limitations in journals. A third reason could be a researcher's failure
to recognize the importance of reporting the details about methods of trans-
lation. Finally, it is conceivable that researchers assume others are aware of
common methods of translation and thus do not discuss the methods they
use in much detail. Whatever the reasons, consistent with the ITC guide-
lines, we strongly suggest investigators provide detailed information about
the methods they employ when translating and validating instruments used
in research. This is especially important, as an inappropriate translation of a measure can lead to a serious threat to a study's internal validity, may con-
tribute to bias, and in international comparisons may limit the level of
equivalence between multilingual versions of a measure. As a threat to
internal validity, a poorly translated instrument may act as a strong rival
hypothesis for obtained results.
RECOMMENDATIONS
Translation Practices
Several steps are essential for a valid translation. Based on our review, on Brislin and colleagues' reviews of common translation methods (Brislin, 1986; Brislin et al., 1973), and on the ITC guidelines (e.g., Hambleton, 2001; van de Vijver & Hambleton, 1996), the best translation procedure involves several steps as
outlined in Table 2. All but the last step outlined in this table help to minimize
item and construct bias and therefore may increase scalar equivalence between
language versions of a measure (ITC development guidelines). The last step
or recommendation refers to verifying cross-cultural validity of measures
(i.e., absence of bias and equivalence; ITC interpretation guidelines).
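As one concrete illustration of steps 5 and 6 in Table 2 (below), raters' judgments of meaning equivalence between original and back-translated items can be tabulated to flag items that must pass through another translation/back-translation iteration. The items, ratings, and cutoff in this sketch are hypothetical.

    # Three raters judge each original/back-translated item pair for meaning
    # equivalence on a 1-5 scale; items averaging below the cutoff are sent
    # back through the translation/back-translation cycle (steps 5 and 6).
    ratings = {
        "item_01": [5, 4, 5],
        "item_02": [3, 2, 3],  # probable loss of meaning in translation
        "item_03": [4, 5, 4],
    }
    CUTOFF = 4.0  # illustrative threshold, not a published standard

    needs_retranslation = [item for item, scores in ratings.items()
                           if sum(scores) / len(scores) < CUTOFF]
    print("Items to retranslate:", needs_retranslation)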
TABLE 2: Summary of Recommended Translation Practices

1. Independent translation from two or more persons familiar with the target language and culture and the intent of the scale
2. Documentation of comparisons of translations and agreement on the best translation
3. Rewriting of translated items to fit the grammatical structure of the target language
4. Independent back translation of the translated measure into the original language (one or more persons)
5. Comparison of original and back-translated versions, focusing on appropriateness, clarity, and meaning (e.g., use rating scales)
6. Changes to the translated measure based on the prior comparison; changed items go through the translation/back-translation iteration until satisfactory
7. If concepts or ideas do not translate well, deciding which version of the original scale should be used for cross-cultural comparison (original, back translated, or decentered)
8. Pretest of the translated instrument on an independent sample (bilinguals or target-language group); check for clarity, appropriateness, and meaning
9. Assessment of the scale's reliability and validity, absence of bias, and equivalence to the original-language version of the scale

Combining Emic and Etic Approaches

As stated previously, the cross-cultural approach to studying cultural influences on behavior has limitations. One risk involves assuming universal laws of behavior and neglecting an in-depth understanding of cultures and their influences on behavior (e.g., imposed etics). To address this problem, and in line with suggestions reviewed earlier, we offer several recommendations for counseling psychologists involved in international research. First, collaboration between scholars worldwide and across disciplines is suggested to enhance the quality of cross-cultural studies and the validity of methods and findings. Such collaboration increases the possibility that unique cultural variables will be incorporated into the research and that potential threats to internal and external validity will be reduced. Second, to avoid potential method bias, an integration of quantitative and qualitative methods should be considered, especially when one type of method may be more appropriate and relevant to a particular culture. A convergence of results from both methods
enhances the validity of the findings. Third, when method bias is not expected
but there is a potential for construct bias while the use of a preexisting mea-
sure is considered feasible, researchers should consider collecting emic items
to be included in the instrument when studying an etic construct (e.g., Brislin,
1976; Oh & Neville, 2004). This approach will enhance construct equiva-
lence by limiting construct bias and will provide culture-specific information
to aid theory development. Fourth, when emic scales are available in the cul-
tures of interest to assess an etic construct and cross-cultural comparisons are
sought, the convergence approach should be considered. With this approach,
all instruments are translated and administered to each cultural group. Then,
items and scales shared across cultures are used for cross-cultural compar-
isons, whereas nonshared items provide information about the unique aspect of the construct in each culture (e.g., van de Vijver, 1998). This approach will
enhance construct equivalence, it may deepen the current understanding of
cultural and cross-cultural dimensions of a construct, and it may aid theory
development. Finally, Triandis's (1972, 1976) suggestion can be considered.
With this procedure, instruments are simultaneously assembled in each cul-
ture to measure the etic construct (e.g., subjective well-being). With this
approach, most or all types of biases can be minimized and equivalence
enhanced, as no predetermined stimuli are used. Factor analyses can be performed to identify etic constructs for cross-cultural comparisons.
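Schematically, the convergence approach described above reduces to an intersection: items that survive psychometric screening in every culture support the etic comparison, and the remainder are interpreted emically. The item sets below are hypothetical.

    # Hypothetical sets of items retained in each culture (e.g., on the basis
    # of factor loadings) after all translated instruments were administered
    # to all cultural groups.
    retained = {
        "culture_A": {"i01", "i02", "i03", "i05"},
        "culture_B": {"i01", "i02", "i04", "i05"},
    }

    shared_etic = set.intersection(*retained.values())
    unique_emic = {c: items - shared_etic for c, items in retained.items()}

    print("Shared (etic) items for comparison:", sorted(shared_etic))
    print("Culture-specific (emic) items:", unique_emic)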
CONCLUSION
Given our profession's increased interest in international topics, there is a critical need to address methodological challenges unique to this area. We discussed important challenges such as translation, equivalence, and bias. Proper translation methods may strengthen the equivalence of constructs across cultures, as a focus on instrumentation can minimize item bias and some method bias. Consequently, construct equivalence may be enhanced. Merely targeting an instrument's translation, however, is not sufficient.
Other factors to consider when mak