Methodological Issues in Cross-Cultural
Counseling Research:
Equivalence, Bias, and Translations
Stefanía Ægisdóttir
Lawrence H. Gerstein
Deniz Canel Çinarbaş
Ball State University
Concerns about the cross-cultural validity of constructs are discussed, including equiv-
alence, bias, and translation procedures. Methods to enhance equivalence are described,
as are strategies to evaluate and minimize types of bias. Recommendations for translat-
ing instruments are also presented. To illustrate some challenges of cross-cultural coun-
seling research, translation procedures employed in studies published in five counseling
journals are evaluated. In 15 of 615 empirical articles, a translation of instruments was
performed. In 9 studies, there was some effort to enhance and evaluate equivalence
between language versions of the measures employed. In contrast, 2 studies did not report
using thorough translation and verification procedures, and 4 studies employed a mod-
erate degree of rigorousness. Suggestions for strengthening translation methodologies and
enhancing the rigor of cross-cultural counseling research are provided. To conduct
cross-culturally valid research and deliver culturally appropriate services, counseling
psychologists must generate and rely on methodologically sound cross-cultural studies.
This article provides a schema for performing such studies.
There is growing interest in international issues in the counseling profes-
sion. There are more publications about cross-cultural issues in counsel-
ing and the role of counseling outside of the United States (Gerstein, 2005;
Gerstein & Ægisdóttir, 2005a, 2005b, 2005c; Leong & Blustein, 2000; Leong
& Ponterotto, 2003; Leung, 2003; Ægisdóttir & Gerstein, 2005). Greater
attention also has been paid to counseling international individuals living in
the United States (Fouad, 1991; Pedersen, 1991). Confirming this trend is the
focus of Division 17's past president (2003 to 2004), Louise Douce, on the
globalization of counseling psychology. Douce encouraged developing a
strategic plan to enhance the profession's global effort and facilitate a
movement that "transcends nationalism" (Douce, 2004, p. 145). She also stressed
questioning the validity and applicability of our Eurocentric paradigms and
the hegemony of such paradigms. Instead, she claimed our paradigms must
integrate and evolve from indigenous models.
P. Puncky Heppner continued Douce's effort as part of his Division 17
presidential initiative. Heppner (2006) claimed, "Cross-national relationships
have tremendous potential to enhance the basic core of the science and
practice of counseling psychology, both domestically and internationally"
(p. 147). He also predicted, "In the future, counseling psychology will no
longer be defined as counseling psychology within the United States, but
rather, the parameters of counseling psychology will cross many countries
and many cultures" (Heppner, 2006, p. 170).
Although an international focus in counseling is important, there are
many challenges (cf. Douce, 2004; Heppner, 2006; Pedersen, 2003). This
article discusses methodological challenges, especially as related to the
translation and adaptation of instruments for use in international and cross-
cultural studies and their link to equivalence and bias. While there has been
discussion in the counseling psychology literature about the benefits and challenges of cross-cultural counseling and the risks of simply applying
Western theories and strategies cross-culturally, we were unable to locate
publications in our literature detailing how to perform cross-culturally valid
research. There is literature, however, in other areas of psychology (e.g.,
cross-cultural, social, international) that addresses these topics. This article
draws from this literature to introduce counseling psychologists to some
concepts, methods, and issues when conducting cross-cultural research. We
also extend this literature by discussing the potential use of cross-cultural methodologies in counseling research.
As a way to illustrate some challenges of cross-cultural research, we also
examine, analyze, and evaluate translation practices employed in five
prominent counseling journals to determine the translation procedures
counseling researchers have used and the methods employed to minimize
bias and evaluate equivalence. Finally, we offer recommendations about
translation methodology and ways to increase validity in cross-cultural
counseling research.
METHODOLOGICAL CONCEPTS AND ISSUES
IN CROSS-CULTURAL RESEARCH
Approaches to Studying Culture
There are numerous definitions of culture in anthropology and counseling
psychology. Ponterotto, Casas, Suzuki, and Alexander (1995) concluded
that for most scholars, culture is a learned system of meaning and
behavior passed from one generation to the next. When studying cultural
influences on behavior, counseling psychologists may approach cultural
variables and the design of research from three different angles using the
indigenous, the cultural, and the cross-cultural approach (Triandis, 2000).
According to Triandis, when using the indigenous approach, researchers
are mainly interested in the meaning of concepts in a culture and how such
meaning may change across demographics within a cultural context (e.g.,
what does counseling mean in this culture?). With this approach, psychol-
ogists often study their own culture with the goal of benefiting people in
that culture. The focus of such studies is the development of a psychology
tailored to a specific culture without a focus on generalization outside of
that cultural context (cf. Adamopoulos & Lonner, 2001). The main challenge
with the indigenous approach is the difficulty of avoiding existing
psychological concepts, theories, and methodologies and therefore determining
what is indigenous (Adamopoulos & Lonner, 2001).
Triandis (2000) contended that with the cultural approach, in contrast, psychologists often study cultures other than their own by using ethnographic
methods. True experimental methods can also be used within this approach
(van de Vijver, 2001). Again, the meanings of constructs in a culture are the
main focus without direct comparison of constructs across cultures. The
aim is to advance the understanding of persons in a sociocultural context
and to emphasize the importance of culture in understanding behavior
(Adamopolous & Lonner, 2001). The challenge with this approach is a lack
of widely accepted research methodology (Adamopolous & Lonner, 2001).Last, Triandis (2000) stated that when using cross-cultural approaches,
psychologists obtain data in two or more cultures assuming the constructs
under investigation exist in all of the cultures studied. Here, researchers
are interested in how a construct affects behavior differently or similarly
across cultures. Thus, one implication of this approach is an increased
understanding of the cross-cultural validity and generalizability of the
theories and/or constructs. The main challenge with this approach is
demonstrating equivalence of constructs and measures used in the target cultures and also minimizing biases that may threaten valid cross-cultural
comparisons.
In sum, indigenous and cultural approaches focus on the emics, or things
unique to a culture. These approaches are relativistic in that the aim is
studying the local context and meaning of constructs without imposing a
priori definitions of the constructs (Tanaka-Matsumi, 2001). Scholars rep-
resenting these approaches usually reject claims that psychological theories
are universal (Kim, 2001). In the cross-cultural approach, in contrast, the
focus is on the etics, or factors common across cultures (Brislin, Lonner, &
Thorndike, 1973). Here the goal is to understand similarities and differ-
ences across cultures, and the comparability of cross-cultural categories or
dimensions is emphasized (Tanaka-Matsumi, 2001).
Methodological Challenges in Cross-Cultural Research
Scholars from diverse psychology disciplines have pursued cross-cultural
research for decades, and as a result, a literature on cross-cultural research methodologies and challenges emerged (e.g., Berry, 1969; Brislin, 1976;
Brislin et al., 1973; Lonner & Berry, 1986; Triandis, 1976; van de Vijver,
2001; van de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997).
Based on this work, our article identifies some methodological challenges
faced by cross-cultural researchers. Before proceeding, note that the challenges
summarized below refer to any cross-cultural comparison of psychological
constructs (within [e.g., ethnic groups] and between countries). These chal-
lenges are greater, though, in cross-cultural comparisons requiring translation of instruments.
Equivalence
Equivalence is a key concept in cross-cultural psychology. It addresses the
question of comparability of observations (test scores) across cultures (van
de Vijver, 2001). Several definitions or forms of equivalence have been
reported. Lonner (1985), for instance, discussed four types: functional, concep-
tual, metric, and linguistic. Functional equivalence refers to the function the
behavior under study (e.g., counselor empathy) has in different cultures.
If similar behaviors or activities (e.g., smiling) have different functions in var-
ious cultures, their parameters cannot be used for cross-cultural comparison
(Jahoda, 1966; Lonner, 1985). In comparison, conceptual equivalence refers
to the similarity in meaning attached to a behavior or concept (Lonner, 1985;
Malpass & Poortinga, 1986). Certain behaviors and concepts (e.g., help seeking)
may vary in meaning across cultures. Metric equivalence refers to psycho-
metric properties of the tool (e.g., Self-Directed Search) used to measure the
same construct across cultures. Metric equivalence is assumed if psychometric data from two
or more cultural groups have the same structure (Malpass & Poortinga,
1986). Finally, linguistic equivalence has to do with wording of items (form,
meaning, and structure) in different language versions of an instrument, the
reading difficulty of the items, and the naturalness of the items in the trans-
lated form (Lonner, 1985; van de Vijver & Leung, 1997).
Van de Vijver and his colleagues (van de Vijver, 2001; van de Vijver &
Leung, 1997) also discussed four types of equivalence representing a hierarchical order from absence to a higher degree of equivalence. The first type,
construct nonequivalence, refers to constructs (e.g., cultural syndromes)
being so dissimilar across cultures they cannot be compared. Under these
circumstances, no link exists between the constructs. The next three types
of equivalence demonstrate some equivalence with the higher level in the
hierarchy presupposing a lower level. These are construct (or structural),
measurement unit, and scalar equivalence.
At the lowest level is construct equivalence. A scale has construct
equivalence if it measures the same underlying construct across cultural
groups. Construct equivalence has been demonstrated for many constructs
in psychology (e.g., NEO Personality Inventory-Revised five-factor model
of personality; McCrae & Costa, 1997). With construct equivalence, the
constructs (e.g., extraversion) are considered to have the same meaning and
nomological network across cultures (relationships between constructs,
hypotheses, and measures; e.g., Betz, 2005) but need not be operationally
defined the same way for each cultural group (e.g., van de Vijver, 2001).
For instance, two emic measures of attitudes toward counseling may tap different indicators of attitudes in each culture, and therefore, the measures
may include different items but at the same time be structurally equivalent, as
they both measure the same dimensions of counseling attitudes and predict
help seeking. Yet as their measurement differs, a direct comparison of
average test scores across cultures using a t test or ANOVA, for example,
cannot be performed. The measures lack scalar equivalence (see below).
Construct equivalence is often demonstrated using exploratory and confirma-
tory factor analyses and structural equation modeling (SEM) to discern the similarities and differences of constructs' structure and their nomological
networks across cultures.
The next level of equivalence is measurement-unit equivalence (van de
Vijver, 2001; van de Vijver & Leung, 1997). With this type of equivalence,
the measurement scales of the tools are equivalent (e.g., interval level), but
their origins are different across groups. While mean scores from scales
with this level of equivalence can be compared to examine individual dif-
ferences within groups (e.g., using a t test), because of the different origins, comparing mean scores (e.g., via t test) between groups from scales at this level will
not provide a valid comparison. For example, Kelvin and Celsius scales
have equivalent measurement units (interval scales) but measure tempera-
ture differently; they have a different origin and, thus, direct comparison
of temperature using these two scales cannot be done. But because of a con-
stant difference between these two scales, comparability may be possible
(i.e., K = °C + 273). The known constant or value offsetting the scales makes
them comparable (van de Vijver & Leung, 1997). Such known constants are
difficult to discern in studies of human behavior, rendering scores at this level
often incomparable. A clear analogy in counseling psychology is using
different cut scores for various groups (e.g., gender) on instruments as an
indicator of some criteria or an underlying trait. Different cut scores (or
standard scores) are used because instruments do not show equivalence
beyond the measurement unit. That is, some bias affects the origin of the
scale for one group relative to the other, limiting raw score comparability
between the groups. For example, a raw score of 28 on the Minnesota
Multiphasic Personality Inventory-2 MacAndrew Alcohol Scale-Revised
(Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 2001) does not mean
the same thing for women as it does for men. For women, this score indi-
cates more impulsiveness and greater risk for substance abuse than it does
for men (Greene, 2000). A less clear example but extremely important to
cross-cultural research involves two language versions of the same psycho-
logical instrument. Here the origins of the two language versions of the
scale may appear the same (both versions include the same interval rating
scale for the items). This assumption, however, may be threatened if the two
cultural groups responding to this measure vary in their familiarity with Likert-type answer formats (method bias; see later). Because of the differ-
ential familiarity with this type of stimuli, the origin of the measurement
unit is not the same for both groups. Similarly, if the two cultural groups
vary in response style (e.g., acquiescence), a score of 2 on a 5-point scale
may not mean the same for both groups. In these examples, the source or
the origin of the scale is different in the two language versions, compro-
mising valid cross-cultural comparison.
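To make the measurement-unit analogy above concrete, here is a tiny Python sketch (our illustration; the temperature values are arbitrary): equal units but different origins mean raw scores are not comparable while differences are.

```python
# Minimal sketch of the Kelvin/Celsius analogy: equal measurement units
# but different origins, so raw scores differ while differences match.
celsius = [20.0, 25.0]
kelvin = [c + 273.0 for c in celsius]  # the known constant offsets the origins

print(kelvin[0] - celsius[0])                           # 273.0: raw scores are not comparable
print(kelvin[1] - kelvin[0], celsius[1] - celsius[0])   # 5.0 5.0: differences are comparable
```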
Finally, and at the highest level of equivalence, is scalar equivalence or full score comparability. Equivalent instruments at the scalar level measure a con-
cept with the same interval or ratio scale across cultures, and the origins of
the scales are the same. Therefore, at this level, bias has been ruled out, and
direct cross-cultural comparisons of average scores on an instrument can be
made (e.g., van de Vijver & Leung, 1997).
According to van de Vijver (2001), it can be difficult to discern if measures
are equivalent at the measurement-unit or scalar level. This challenge is
observed in comparison of scale scores between cultural groups responding to the same language version of an instrument as well as between different
language versions of a measure. As an example of this difficulty, when
using the same language version of an instrument, racial differences in
intelligence test scores can be interpreted as representing true differences in
intelligence (scalar equivalence has been reached) or as an artifact of the
measures (measurement-unit equivalence has been reached). In the latter,
the measurement units are the same, but they have different origins because
of various biases, hindering valid comparisons across different racial
groups. In this instance, valid comparisons at the ratio level (comparing
mean scores) cannot be done. Higher levels of equivalence are more diffi-
cult to establish. It is, for instance, easier to show that an instrument mea-
sures the same construct across cultures (construct equivalence) by showing
a similar factor structure and nomological networks than it is to demon-
strate the instrument's numerical comparability (scalar equivalence). The
higher the level of equivalence, though, the more detailed the analyses that can be
performed on cross-cultural similarities and differences (van de Vijver,
2001; van de Vijver & Leung, 1997).
Levels of equivalence for measures used in cross-cultural counseling
research should be established and reported in counseling psychology pub-
lications. It is not until the equivalence of the concepts under study has
been determined that a meaningful cross-cultural comparison can be made.
Without demonstrated equivalence, numerous rival hypotheses (e.g., poor
translation) may account for observed cross-cultural differences.
Bias
Another important concept in cross-cultural research is bias. Bias negatively
influences equivalence and refers to nuisance factors, limiting the compara-
bility or scalar equivalence of observations (test scores) across cultural groups
(van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver & Poortinga,
1997). Typical sources of bias are construct, method, and item bias. A con-
struct bias occurs when the construct measured as a whole (e.g., intelli-
gence) is not identical across cultural groups. Potential sources for this type
of bias are when there is different coverage of the construct across cultures (i.e., not all relevant behavioral domains are sampled), an incomplete overlap
of how the construct is defined across cultures, and when the appropriate-
ness of item content differs between two language versions of an instrument
(cf. van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). A serious
construct bias equates to construct nonequivalence.
Even though a construct is well represented in multilingual versions of
a scale (construct equivalence, e.g., similar factor structure, and there is no
construct bias, e.g., complete coverage of construct), bias may still exist in the scores, resulting in measurement-unit or scalar nonequivalence (van de
Vijver & Leung, 1997). This may be a result of method bias. Method bias
can stem from characteristics of the instrument or from its administration
(van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver &
Poortinga, 1997). Possible sources of this bias are differential response
styles (e.g., social desirability) across cultures (e.g., Johnson, Kulesa, Cho,
& Shavitt, 2005), variations in familiarity with the type of stimuli or scale
across cultures, communication problems between investigators and partici-
pants, and differences in physical conditions under which the instrument is
administered across cultures. Method bias can also limit cross-cultural com-
parisons when samples drawn from different cultures are not comparable
(e.g., prior experiences).
Item bias may also exist, posing a threat to cross-cultural comparison
(scalar equivalence). This type of bias refers to measurement at the item level.
This bias has several potential sources. It can result from poor translation or
poor item formulation (e.g., complex wording) and because item content may
not be equally relevant or appropriate for the cultural groups being compared
(e.g., Malpass & Poortinga, 1986; van de Vijver & Poortinga, 1997). An item
on an instrument is considered biased if persons from different cultures
having the same standing on the underlying characteristic (trait or state)
measured yield different average item scores on the instrument.
Finally, bias can be considered uniform and nonuniform. A uniform bias
refers to any type of bias affecting all score levels on an instrument equally
(van de Vijver & Leung, 1997). For instance, when measuring persons'
intelligence, the scale may be accurate for one group but may consistently
reflect 10 points too much for another group. The 10-point difference would appear at different intelligence levels (a true score of 90 would be
100, and a true score of 120 would be 130). A nonuniform bias is any type
of bias differentially affecting different score levels. In measuring persons'
intelligence, the scale may again be accurate for one group, but for the other
group, 10 points are recorded as 12 points. The difference in measured
intelligence for persons whose true score is 90 would be a score of 108 (18-
point difference), whereas for persons whose true score is 110, the differ-
ence is 22 points (a score of 132). The distortion is greater at higher levels on the scale. Nonuniform bias is considered a greater threat in cross-cultural
comparisons than uniform bias, as it influences the origin and measurement
unit (scale) of a scale. Uniform bias affects only the origin of a scale
(cf. van de Vijver, 1998, 2001).
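The arithmetic of this example can be sketched in a few lines of Python (our illustration, using the hypothetical true scores above): uniform bias shifts the origin by a constant, whereas nonuniform bias rescales, so the distortion grows with the true score.

```python
# Numeric sketch of uniform vs. nonuniform bias (hypothetical scores).
true_scores = [90, 110, 120]

uniform = [t + 10 for t in true_scores]      # constant 10-point offset
nonuniform = [t * 1.2 for t in true_scores]  # 10 points recorded as 12

for t, u, nu in zip(true_scores, uniform, nonuniform):
    print(f"true={t}: uniform-biased={u} (+{u - t}), "
          f"nonuniform-biased={nu:.0f} (+{nu - t:.0f})")
# true=90  -> uniform 100 (+10), nonuniform 108 (+18)
# true=110 -> uniform 120 (+10), nonuniform 132 (+22)
```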
Relationship Between Bias and Equivalence
Bias and equivalence are closely related. When two or more language
versions of an instrument are unbiased (construct, method, item), they are
considered equivalent at the scalar level. Bias will lower a measure's level of
equivalence (construct, measurement unit, scalar). Also, construct bias has
more serious consequences and is more difficult to remedy than method
and item bias. For instance, in selecting a preexisting instrument for translation and use with a different language group, the researcher runs the risk of
incomplete coverage of the construct in the target culture (i.e., construct
bias limiting construct equivalence). Method bias can be minimized, for
example, by using standardized administration (administering under simi-
lar conditions using same instructions) and by using covariates, whereas
thorough translation procedures may limit item bias. Furthermore, higher
levels of equivalence are less robust against bias. Scalar equivalence (a
needed condition for comparison of average scores between groups) is, for
instance, affected by all types of bias and is more susceptible to bias than
measurement-unit equivalence or construct equivalence, where comparative
statements are not a focus (cf. van de Vijver, 1998). Thus, if one wants to
infer whether Culture A shows more or less of a characteristic (e.g.,
willingness to seek counseling services) than Culture B, one has to empir-
ically demonstrate the measure's lack of bias and scalar equivalence.
Not all instruments are equally vulnerable to bias. In fact, more struc-
tured tests administered under standardized conditions are less susceptible
to bias than open-ended questions. Similarly, the less the cultural distance
(Triandis, 1994, 2000) between groups being compared, the less room there
is for bias. Cultural distance can, for instance, be discerned based on the
Human Development Index (HDI; United Nations, 2005) published yearly
by the United Nations Development Programme to assess well-being and child welfare (human development). Using the HDI as a measure of cultural
distance, it can be seen that the United States (ranked 10) and Ireland
(ranked 8) are more similar in terms of human development than the United
States and Niger (ranked 177). Therefore, it can be expected that greater
bias affects cross-cultural comparisons between the United States and
Niger than between the United States and Ireland.
MEASUREMENT APPROACHES
Selection of Measurement Devices
A prerequisite to conducting a cross-cultural study is to make sure what
is being studied exists and is functionally equivalent across cultures (Berry,
1969; Lonner, 1985). Once this has been determined, the next step is decid-
ing how the construct should be assessed. This decision should be based on
the type of bias expected. If there is a concern with construct bias, the con-
struct is not functionally equivalent, and serious method bias is expected, the
researcher may need to rely on emic approaches (indigenous or cultural),
develop measures meaningful to the culture, and use culture-sensitive
methodologies. Van de Vijver and Leung (1997) called this strategy the
assembly approach. Emic techniques (i.e., assembly) are often needed if
the cultures of interest are very different (Triandis, 1994, 2000). In this
approach, though, direct comparisons between cultures can be challenging,
as the two or more measures of the construct may not be equivalent at the measurement level.
If, in contrast, the cultures are relatively similar and the concept is func-
tionally equivalent across cultures, the researcher may opt to translate and/or
adapt preexisting instruments and methodologies to discern cultural similar-
ities and differences across cultural groups. Van de Vijver and Leung (1997)
listed two common strategies employed when using preexisting measures
for multilingual groups. First is the applied approach, where an instrument
goes through a literal translation of items. Item content is not changed to a
new cultural context, and the linguistic and psychological appropriateness of
the items are assumed. It is also assumed there is no need to change the
instrument to avoid bias. According to van de Vijver (2001), this is the most
common technique in cross-cultural research on multilingual groups. The
second strategy is adaptation, where some items may be literally translated,
while others require modification of wording and content to enhance the
appropriateness to a new cultural context (van de Vijver & Leung, 1997).
This technique is chosen if there is concern with construct bias.
Of the three approaches just mentioned (assembly, application, and adaptation), the application strategy is the easiest and least cumbersome in
terms of money, time, and effort. This technique may also offer high levels
of equivalence (measurement-unit and scalar equivalence), and it can make
the comparison to results of other studies using the same instrument possi-
ble. This approach may not be useful, however, when the characteristic
behaviors or attitudes (e.g., obedience and being a good daughter or son)
associated with the construct (e.g., filial piety) differ across cultures (lack
of construct equivalence and high construct bias) (e.g., Ho, 1996). In suchinstances, the assembly or adaptation strategy may be needed. With the
assembly approach (emic), researchers may focus on the construct validity
of the instrument (e.g., factor analysis, divergent and convergent validity),
not on direct cross-cultural comparisons. When adaptation of an instrument
is needed in which some items are literally translated, whereas others are
changed or added, cross-cultural comparisons may be challenging, as direct
comparisons of total scores may not be feasible because all items are not
identical. Only scores on identical items can be compared using mean score comparisons (Hambleton, 2001). The application technique (etic) to trans-
lation most easily allows for a direct comparison of test scores using t tests
or ANOVA because of potential scalar equivalence. For such comparisons
to be valid, however, an absence of bias needs to be demonstrated.
The applied approach and to some degree the adaptation strategy focus
on capturing the etics, or the qualities of concepts common across cultures.
Yet cultural researchers have criticized it. Berry (1989), for instance, labeled
this practice "imposed etics," claiming that by using the etic approach,
researchers fail to capture the culturally specific aspects of a construct and
may erroneously assume the construct exists and functions similarly across
cultures (cf. Adamopoulos & Lonner, 2001). The advantage of the etic over the
emic strategy, however, is that the etic technique provides the ability to make
cross-cultural comparisons, whereas in the emic approach, cross-cultural
comparison is more difficult and not as direct.
Nevertheless, the etic strategy may be limited when trying to understand
a specific culture. There is, for instance, no guarantee a translated measure
developed to assess a concept in one culture will assess the same construct
equally well in another culture. It is highly likely that some aspects of the
concept may be lost or not captured by the scale. There might be construct
bias and lack of construct equivalence. To counteract this shortcoming, sev-
eral methods have been proposed. Brislin and colleagues (Brislin, 1976,
1983; Brislin et al., 1973) suggested a combined etic-emic strategy. In this
approach, researchers begin with an existing tool developed in one culture
that is translated for use in a target culture (potentially etic items). Next,
additional items are included in the translated scale, which are unique to the
target culture (emic). The additional items may be developed by persons knowledgeable about the culture and/or drawn from relevant literature.
These culture-specific items must be highly correlated with the original
items in the target instrument but unrelated to culture-specific items gener-
ated from another culture (Brislin, 1976, 1983; Brislin et al., 1973). Adding
emic items will provide the researcher with a greater in-depth understand-
ing of a construct in a given culture. Assessing equivalence between the lan-
guage versions of the instrument would be based only on the shared (etic)
items (Hambleton, 2001).
Similarly, Triandis (1972, 1975, 1976) suggested that researchers start
with an etic concept (thought to exist in all cultures under study) and then
develop emic items based on each culture for the etic concept. Thus, all
instrument development is carried out within each culture included in the
study (i.e., assembly). Triandis argued that cross-cultural comparison could
still be made using these versions of the measure (one in each culture)
because the emic items would be written to measure an etic concept. SEM
could, for instance, be used for this purpose (see Betz, 2005; Weston & Gore, 2006).
Finally, a convergence approach can be applied (e.g., van de Vijver,
1998). Relying on this technique, researchers may assemble a scale mea-
suring an etic concept in each culture or use preexisting culture-specific
tools translated into each language. Then all measures are given to each cul-
tural group. Comparisons can be made between shared items (given enough
items are shared), whereas nonshared items provide culture-specific under-
standing of the construct. When this method is used, the appropriateness of
items in all scales needs to be determined before administration.
Determining Equivalence of Translated Instruments
Several statistical methods are available to determine equivalence
between translated and original versions of scales. Reporting Cronbach's
alpha reliability, item-total scale correlations, and item means and variations
provides initial information about instruments' psychometric properties. A
statistical comparison between two independent reliability coefficients can
be performed (cf. van de Vijver & Leung, 1997). If the coefficients are sig-
nificantly different from each other, the source of the difference should be
examined. This may indicate item or construct bias. Additionally, item-total
scale correlations may indicate construct bias and nonequivalence, and
method bias (e.g., administration differences, differential social desirability,
differential familiarity with instrumentation). Finally, item score distribution
may suggest biased items and, therefore, provide information about equivalence. For
instance, an indicator (e.g., item or scale) showing variation in one cultural
group but not the other may represent an emic concept (Johnson, 1998). Therefore, comparing these statistics across different language versions of an instrument will offer preliminary data about the instrument's equivalence
(e.g., construct, measurement unit, and scalar; van de Vijver & Leung, 1997;
conceptual and measurement; Lonner, 1985).
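As one concrete illustration of these preliminary checks, the sketch below computes Cronbach's alpha for each language version and compares the two independent coefficients with a Feldt-type (1969) F test. This is our hedged example, not a procedure drawn from the studies discussed here; the data, sample sizes, and names are hypothetical.

```python
# Hedged sketch: Cronbach's alpha per language version, then a Feldt-type
# F test for two independent alpha coefficients. Data are simulated.
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

def feldt_test(a1, n1, a2, n2):
    """Feldt-type test: W = (1 - a1) / (1 - a2) ~ F(n1 - 1, n2 - 1)."""
    w = (1 - a1) / (1 - a2)
    p = 2 * min(stats.f.cdf(w, n1 - 1, n2 - 1), stats.f.sf(w, n1 - 1, n2 - 1))
    return w, p

rng = np.random.default_rng(0)
def simulate(n, k=10, noise=0.8):  # one-factor data for one language version
    trait = rng.normal(size=(n, 1))
    return trait + rng.normal(scale=noise, size=(n, k))

original, translated = simulate(225), simulate(261)  # hypothetical Ns
a1, a2 = cronbach_alpha(original), cronbach_alpha(translated)
w, p = feldt_test(a1, len(original), a2, len(translated))
print(f"alpha(original) = {a1:.2f}, alpha(translated) = {a2:.2f}, "
      f"W = {w:.2f}, p = {p:.3f}")  # a significant W flags differing reliability
```

A significantly different pair of coefficients would then prompt the item-level inspection described above.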
Construct (van de Vijver & Leung, 1997), conceptual, and measurement
equivalence (Lonner, 1985) can also be assessed at the scale level. Here,
exploratory and confirmatory factor analysis, multidimensional scaling
techniques, and cluster analysis can be used (e.g., van de Vijver & Leung, 1997). These techniques provide information about whether the construct is
structurally similar across cultures and if the same meaning is attached to
the construct. For instance, in confirmatory factor analysis, hypotheses
about the factor structure of a measure, such as the number of factors, load-
ings of variables on factors, and correlations among factors, can be tested.
Numerous fit indices can be used to evaluate the fit of the model to the data.
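For these scale-level analyses, a hedged sketch follows: it fits an exploratory factor analysis to each (simulated) cultural group with the factor_analyzer package and compares the loading matrices with Tucker's congruence coefficient, one common index of factorial similarity. The package choice, simulated data, and threshold are our assumptions, not taken from the article.

```python
# Hedged sketch: EFA per cultural group, then Tucker's congruence coefficient
# between loading matrices as a rough check of construct equivalence.
# In practice, factors may first need reordering or reflection (e.g., target
# rotation) so that columns correspond across groups.
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed available via pip

def tucker_phi(a, b):
    """Congruence of corresponding factors in two loading matrices."""
    return (a * b).sum(axis=0) / np.sqrt(
        (a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

rng = np.random.default_rng(1)
n_items, n_factors = 12, 2
pop_loadings = np.zeros((n_items, n_factors))
pop_loadings[:6, 0] = 0.7   # simple structure: items 1-6 on factor 1,
pop_loadings[6:, 1] = 0.7   # items 7-12 on factor 2

def simulate(n):
    factors = rng.normal(size=(n, n_factors))
    return factors @ pop_loadings.T + rng.normal(scale=0.6, size=(n, n_items))

loadings = []
for group in (simulate(300), simulate(300)):
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
    fa.fit(group)
    loadings.append(fa.loadings_)

print("Tucker's phi per factor:", tucker_phi(*loadings))
# Values near 1.00 (often >= .90-.95) are read as factorial similarity.
```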
Scalar or full score equivalence is more difficult to establish than con-
struct and measurement-unit equivalence, and various biases may threaten this level of equivalence. Item bias, for instance, influences scalar equivalence.
Item bias can be ascertained by studying the distribution of item scores for
all cultural groups (cf. van de Vijver & Leung, 1997). Item response theory
(IRT), in which differential item functioning (DIF) is examined, may be
used for this purpose. In IRT, it is assumed item responses are related to an
underlying or latent trait using a logistic curve known as the item characteristic
curve (ICC). The ICCs for each selected parameter (e.g., item difficulty or
popularity) are compared for every item in each cultural group using chi-
square statistics. Items differing between cultural groups are eliminated
before cross-cultural comparisons are made (e.g., Hambleton &
Swaminathan, 1985; van de Vijver & Leung, 1997). Item bias can also be
examined by using ANOVA. The item score is treated as the dependent vari-
able, and the cultural group (e.g., two levels) and score levels (levels depen-
dent on number of scale items and number of participants scoring at each
level) are the independent variables. Main effects for culture and the inter-
action between culture and score level are then examined. Significant effects
indicate biased items (cf. van de Vijver & Leung, 1997). Logistic regression
can also be used for this purpose using the same type of independent and
dependent variables. Additionally, multiple-group SEM invariance analy-
ses (MCFA) and multiple group mean and covariance structures analysis
(MACS) also provide information about biased items or indicators (e.g.,
Byrne, 2004; Cheung & Rensvold, 2000; Little, 1997, 2000), with the MACS
method also providing information about mean differences between groups
on latent constructs (e.g., Ployhart & Oswald, 2004).
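The ANOVA-based item bias procedure lends itself to a brief sketch. The following is our hypothetical illustration (simulated data; statsmodels for the ANOVA), not code from any study reviewed here: item score is the dependent variable, culture and score level are the independent variables, and significant culture effects flag the item as biased.

```python
# Hedged sketch of the ANOVA item-bias check described above (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "culture": rng.choice(["A", "B"], size=n),
    "total": rng.integers(10, 50, size=n),  # total test score
})
df["level"] = pd.qcut(df["total"], q=4, labels=False)  # score-level bins
# Simulate a uniformly biased item: culture B scores higher at every level.
df["item"] = (0.05 * df["total"] + 0.5 * (df["culture"] == "B")
              + rng.normal(scale=0.4, size=n))

model = smf.ols("item ~ C(culture) * C(level)", data=df).fit()
print(anova_lm(model, typ=2))
# A significant culture main effect suggests uniform item bias; a significant
# culture x level interaction suggests nonuniform bias.
```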
Finally, factors contributing to method bias can be assessed and statisti-
cally held constant when measuring constructs across cultures, given that valid measures are available. A measure of social desirability may, for
instance, be used to partially control for method bias. Also, gross national
product per capita may be used to control for method bias, as it has been
found to correlate with social desirability (e.g., Van Hemert, van de Vijver,
Poortinga, & Georgas, 2002) and acquiescence (Johnson et al., 2005).
Furthermore, personal experience variables potentially influencing the con-
struct under study differentially across cultures may serve as covariates.
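A minimal sketch of statistically holding a method-bias factor constant follows, assuming a measured social desirability covariate; the variables, data, and effect sizes are hypothetical.

```python
# Hedged sketch: holding social desirability constant (ANCOVA-style) when
# comparing cultural groups on a construct. Data and names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "culture": rng.choice(["A", "B"], size=n),
    "social_desirability": rng.normal(size=n),
})
# Suppose scores partly reflect the response style rather than the construct.
df["score"] = (0.6 * df["social_desirability"]
               + 0.3 * (df["culture"] == "B") + rng.normal(size=n))

model = smf.ols("score ~ C(culture) + social_desirability", data=df).fit()
print(model.params)  # the culture coefficient is the covariate-adjusted difference
```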
Translation Methodology
Employing a proper translation methodology is extremely important to
increase equivalence between multilingual versions of an instrument and the
measures cross-cultural validity. About a decade ago, van de Vijver and
Hambleton (1996) published practical guidelines for translating psycholog-
ical tests that were based on standards set forth in 1993 by the International
Test Commission (ITC). The guidelines covered best practices in regard to context, development, administration, and the interpretation of psychologi-
cal instruments (cf. Hambleton & de Jong, 2003; van de Vijver, 2001; van
de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997). The context
guidelines emphasized the importance of minimizing construct, method, and
item bias and the need to assess, instead of assume, construct similarity
across cultural groups before embarking on instrument translation. The
development guidelines referred to the translation process itself, while the
administration guidelines suggested ways to minimize method bias. Finally,
the interpretation guidelines recommended caution when explaining score
differences unless alternative hypotheses had been ruled out and equivalence
between original and translated measures had been ensured (van de Vijver &
Hambleton, 1996). Counseling psychologists should review these guidelines
when designing cross-cultural research projects and prior to translating and
adapting psychological instruments for such research.
Prior to the development of the ITC standards, Brislin et al. (1973) and
Brislin (1986) had written extensively about translation procedures. The
following paragraphs outline the common translation methods that Brislin
et al. summarized, with connections to the ITC guidelines (e.g., Hambleton
& de Jong, 2003; van de Vijver & Hambleton, 1996). Additional methods
to enhance equivalence of translated scales are also mentioned.
Translation. When translating an instrument, bilingual persons who
speak both the original and the target language should be employed. Either
a single person or a committee of translators can be used (Brislin et al.,
1973). In contrast to employing only a single person for the translation, the
committee approach emphasizes two or more persons performing the translation independently. Then, the translations are compared, sometimes with
another person, until an agreement is reached on an optimal translation. The
advantage of the committee approach recommended in the ITC guidelines
(van de Vijver & Hambleton, 1996) over a single person is the possible
reduction in bias and misconceptions of a single person. In addition to
being knowledgeable about the target language of the translation, test trans-
lators need to be familiar with the target culture, the construct being
assessed, and the principles of assessment (Hambleton & de Jong, 2003; van de Vijver & Hambleton, 1996). Being knowledgeable about such
topics minimizes item biases (e.g., in an achievement test, an item in one
culture may give away more information than the same item in another cul-
ture) that may result from literal translations.
Back translation. In this procedure, the translated or target version of
the measure is independently translated back to the original language by
different person(s) than the one(s) performing the translation to the target language. If more than one person is involved in the back translation,
together they decide on the best back-translated version of the scale that is
compared to the original same-language version for linguistic equivalence.
Back translation not only provides the researcher with some control
over the end result of the translated instrument in cases where he or she
does not know the target language (e.g., Brislin et al., 1973; Werner &
Campbell, 1970), it also allows for further refinement of the translated
version to ensure equivalence of the measures. If the two same-language
versions of the scale do not seem identical (i.e., the original and the back-
translated versions), the researcher in cooperation with the translation com-
mittee works on the translations until equivalence is reached. Here, the
items requiring a changed translation may be subject to back translation
again. Oftentimes in this procedure, only the translated version is changed
to be equivalent to the original-language version that remains unchanged.
At other times, the original language version of the scale is also changed to
ensure equivalence, a process known as decentering (Brislin et al., 1973).
Adequate back translation does not guarantee a good translation of a scale,
as this procedure often leads to literal translation at the cost of readability
and naturalness of the translated version. To minimize this, a team of back
translators with a combined expertise in psychology and linguistics may be
used (van de Vijver & Hambleton, 1996). It is also important to note that in
addition to the test items, test instructions need to go through a thorough
translation/back-translation process.
Decentering. This method was first introduced by Werner and Campbell
(1970) and refers to a translation/back-translation process in which both the source (the original instrument's language) and the target language versions
are considered equally important and both are open to modification.
Decentering may need to take place if words in the original language have
no equivalence in the target language. If the aim is collecting data in both the
original and the target culture, items in the original instrument are changed
to ensure maximum equivalence (cf. Brislin, 1970, on the translation of
Marlowe-Crowne's [Crowne & Marlowe, 1960] Social Desirability Scale).
Thus, the back-translated version of the original instrument is used for data collection instead of the original version, as it is considered most likely to
be equivalent to the translated version (Brislin, 1986). When this outcome
is selected and when researchers worry that changes in the original lan-
guage may lead to a lack of comparability with previous studies using the
original instrument, Brislin (1986) suggested collecting data using both
the decentered and the original version of the instrument on a sample
speaking the original language. The participants may see half of the original
items and half of the revised items in a counterbalanced order. Statistical analysis can indicate whether different conclusions should be made based
on responses to the original versus the revised items (see Brislin, 1970).
Pretests. Following translation and back translation of an instrument
and, therefore, judgmental evidence about the equivalence of the original
and translated versions of the instrument, several pretest measures can be
used to evaluate the equivalence of the instruments in regard to the mean-
ing conveyed by the items. One approach is to administer the original and
the translated versions of the instrument to bilingual persons (Brislin et al.,
1973; van de Vijver & Hambleton, 1996). Following the administration of
the instruments, item responses can be compared using statistical methods
(e.g., t test). If item differences are discovered between versions of the
instrument, the translations are reviewed and changed accordingly.
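To illustrate this item-level comparison with bilingual respondents, here is a hedged sketch using scipy's paired t test; the responses are simulated, and the flagged item is contrived.

```python
# Hedged sketch: bilinguals answer both language versions; paired t tests per
# item flag translations that shift responses. Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_bilinguals, n_items = 30, 10
original = rng.normal(loc=3.0, scale=0.8, size=(n_bilinguals, n_items))
translated = original + rng.normal(scale=0.3, size=(n_bilinguals, n_items))
translated[:, 4] += 0.6  # suppose item 5's wording shifted in translation

for i in range(n_items):
    t, p = stats.ttest_rel(original[:, i], translated[:, i])
    if p < .05:
        print(f"item {i + 1}: t = {t:.2f}, p = {p:.3f} -> revise translation")
```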
Sometimes bilingual individuals are used in lieu of performing back
translations (Brislin et al., 1973). In this case, the translated version and
original versions of the instrument are administered to bilingual persons.
The bilingual persons may be randomly assigned to two groups that receive
half of the questions in the original language and the other half in the target
language. The translated items resulting in responses different from responses
elicited by the same original items are then refined until the responses between
the original and the translated items are comparable. Items not yielding com-
parable responses despite revisions are discarded. If items yield comparable
results, the two versions of the instrument are considered equivalent. Additionally,
a small group of bilingual individuals can be employed to rate each item from
the original and translated versions of the instrument on a predetermined scale in regard to the similarity of meaning conveyed by the item. Problematic items
are then refined until deemed satisfactory (e.g., Hambleton, 2001).
A small sample of participants (e.g., N = 10) can also be employed to pretest
a translated measure that has gone through the translation/back-translation
iteration. Here, participants are instructed to provide verbal or written feed-
back about each item of the scale. For example, Brislin et al. (1973) noted
two methods: random probe and rating of items. In the random probe
method, the researcher randomly selects items from a scale and asks probingquestions about an item, such as What do you mean? Persons responses
to the probes are then examined. Responses considered bizarre or unfitting
an item are scrutinized, and the translation of the item is changed. This
method provides insight into how well the meaning of the original items has
fared in the translation. In the rating method, respondents are asked to rate
their perceptions about item clarity and appropriateness on a predetermined
scale. Items that are unclear or not fitting based on these ratings are
reworded. Finally, a focus group approach can be used (e.g., Ægisdóttir, Gerstein, & Gridley, 2000) where a small group of participants responds to
the translated version and then discusses with the researcher(s) the meaning
the participants associated with the items. Participants also share their
perceptions about the clarity and cultural appropriateness of the items. Item
wording is then changed based on responses from the focus group members.
Statistical Assessment of the Translated Measure
In addition to pretesting a translated scale and judgmental evidence
about a scale's equivalence, researchers need to provide further evidence of
the measure's equivalence to the original instrument. As stated earlier, item
analyses and Cronbach's alpha suggest equivalence and lack of bias.
Furthermore, exploratory and confirmatory factor analyses of the measure's
factor structure can contribute information about construct equivalence.
Multidimensional scaling and cluster analysis can be used to explore construct
equivalence as well. These techniques indicate equivalence on an instru-
ment level, more specifically, about the similarities and differences of the
hypothesized construct underlying the instrument for the different language
versions. Similar to Brislin et al.'s (1973) suggestions mentioned earlier,
Mallinckrodt and Wang (2004) proposed a method they termed the dual-
language split-half (DLSH) to evaluate equivalence. In this procedure,
alternate forms of a translated measure, each composed of one half of items
in the original language and one half of items in the target language, are
administered to bilingual persons in a counterbalanced order of languages.
Equivalence between the two language versions of the instruments is deter-
mined by lack of significant differences between mean scores on the origi-
nal and translated version of the measures, by split-half correlations between clusters of items on the original and the target language, and by the
internal consistency reliability and test-retest reliability of the dual lan-
guage form of the measures. These coefficients are compared to results
from the original-language version of the instrument. Also inherent in this
approach is collection of evidence for convergent validity for each language
version. Finally, and as mentioned earlier, to provide further evidence of the
measure's equivalence to the original measure, analyses at the item level
(item bias analysis; van de Vijver & Hambleton, 1996), such as ANOVA and
IRT procedures examining DIF, can be applied to determine scalar equiv-
alence (cf. van de Vijver & Leung, 1997). MCFA and MACS invariance
analyses can be employed for this purpose as well.
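The core DLSH computations can be sketched as follows. This is our hypothetical rendering of the logic with simulated data, not Mallinckrodt and Wang's (2004) code, and the Spearman-Brown correction is our added (standard) step for a split-half coefficient.

```python
# Hedged sketch of dual-language split-half (DLSH) logic: each bilingual
# answers half the items in English and half in the target language.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, half = 30, 9                      # e.g., 30 bilinguals, an 18-item scale
trait = rng.normal(size=(n, 1))
english_half = (trait + rng.normal(scale=0.5, size=(n, half))).mean(axis=1)
target_half = (trait + rng.normal(scale=0.5, size=(n, half))).mean(axis=1)

t, p = stats.ttest_rel(english_half, target_half)  # within-subjects t test
r, _ = stats.pearsonr(english_half, target_half)   # dual-language split-half r
print(f"mean difference: t = {t:.2f}, p = {p:.3f}; split-half r = {r:.2f}")
print(f"Spearman-Brown corrected reliability = {2 * r / (1 + r):.2f}")
# Equivalence is supported by a nonsignificant mean difference and a high
# corrected coefficient comparable to the original-language version's.
```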
CONTENT ANALYSIS OF TRANSLATION METHODS
IN SELECT COUNSELING JOURNALS
Another purpose of this article is to examine, analyze, and evaluate
translation practices employed in five prominent counseling journals
thought to publish a greater number of articles on international topics
than other counseling periodicals. This purpose was pursued to determine
whether counseling researchers have, in fact, followed the translation pro-
cedures suggested by Brislin (1986) and Brislin et al. (1973) and in the
ITC guidelines (e.g., van de Vijver & Hambleton, 1996). We also examined
the methods used to control for bias and increase equivalence. While this
was not the primary purpose of this article, results of our investigation
might help illustrate counseling researchers' use of preferred translation
principles mentioned in the cross-cultural literature. It was also assumed
results obtained from this type of investigation could help identify further
recommendations to assist counseling researchers when conducting cross-
cultural studies and when reporting results of such projects in the schol-
arly literature.
METHOD
Sample
The sample consisted of published studies employing translated instru-
ments in their data collection. To be included in this project, an integral part
of the study's methodology had to be a translation of one or more entire
instruments or some subset of items from an instrument. Furthermore, the
target instrument could not have been translated or evaluated the same way
in earlier studies. Additionally, the included studies had to either compare
responses from persons from more than one culture (nationality) or inves-
tigate a psychological concept using a non-U.S. or non-English-speaking sample of participants. Studies for this investigation were sampled from
five counseling journals (Journal of Counseling Psychology [JCP], Journal of
Counseling and Development [JCD], Journal of Multicultural Counseling
and Development [JMCD], Measurement and Evaluation in Counseling
and Development [MECD], and The Counseling Psychologist [TCP])
thought to publish articles relevant to non-English-speaking cultures, eth-
nic groups, and/or countries. To assess for more recent trends in the litera-
ture, only articles published between the years 2000 and 2005 were
included in our sample. We assumed recent studies (i.e., studies published
since 2000) would provide a good representation of current translation and
verification practices employed by counseling researchers. From 2000 to
2005, a total of 615 empirical articles were published in the targeted jour-
nals. Of these articles, 15 included translation as a part of their methodol-
ogy. Therefore, 2.4% of the empirical articles published in these five
counseling journals incorporated a translation process.
Procedure
The 15 identified studies were coded by (a) publication source (e.g.,
TCP), (b) year of publication (e.g., 2001), (c) construct investigated and
name of scale translated, (d) translation methodology used (single person,
committee, bilinguals), (e) whether the translated version of the scale was
pilot tested (yes or no) before main data collection, (f) number of partici-
pants used for pilot testing, (g) psychometric properties reported and statis-
tics used to evaluate the translated measure's equivalence to the original scale, and (h) number of participants from which the psychometric data
were gathered. Two of the current authors coded information from the arti-
cles independently. If disagreements arose in the coding (e.g., relevant
psychometrics for equivalence evaluation), these were resolved through
consensus agreement between the coders.
18 THE COUNSELING PSYCHOLOGIST / Month XXXX
(text continues on p. 22)
-
7/30/2019 10 Methodological Issues
19/32
19
1.Shin,Berkson,&
Crittenden(2000);
JMCD
2.Engels,Finkenauer,
Meeus,&Dekovic
(2001);JCP
3.Chung&Bemak
(2002);JCD
4.Kasturirangan&
Nutt-Williams
(2003);JMCD
5.Asner-Self&
Schreiber(2004);
MECD
6.Torres&Rollock
(2004);MECD
Psychological
help-seeking
attitudes;
traditionalvalues
Parentalattachment;
Relation
al
comp
etence;
Self-e
steem;
Depression
Anxiety
;
depre
ssion;
psych
osocial
dysfu
nction
symptoms
Culture
Domesticviolence
Attributionalstyle
Acculturation-related
challe
nges
Immigrantsfrom
Korea
Dutchadolescents
SoutheasternAsian
refugees
Latinowomen
Immigrantsfrom
CentralAmerica
Immigrantsfrom
Central&South
America
Sixitemsfromthe
A
ttitudesToward
SeekingProfessional
PsychologicalHelp
(A
TSPPH);
A
cculturationAttitude
Scale,
(AAS)prior
translation;Vignettes
developedinEnglish
Pare
ntandPeer
A
ttachment(IPPA);
PerceivedCompetence
ScaleforChildren;
Self-EsteemScale;
D
epressiveMoodList
HealthOpinionSurvey
(interview)
Ase
mistructured
in
terviewprotocol
developedbythe
re
searchers:Two
in
terviewsinEnglish,
se
veninSpanish
The
AttributionalStyle
Q
uestionnaire(ASQ)
CulturalAdjustment
D
ifficultiesChecklist
(C
ADC)
EnglishtoKorean
EnglishtoDutch
Englishto
Vietnamese,
Khmer,
Laotian
EnglishtoSpanish.
Nodiscussionof
translationmethod
Englishto
Spanish
EnglishtoSpanish
Committee
Committee
(researchers);
unclearwhat
instrumentswere
translatedinstudy
Committee
Not r
eported
Committee
Committee
Yes
Yes(
researchers)
Yes
No
Yes
Yes
No
No
Pilot in
terviews
Pilotinterview;no
comparison
between
Englishand
Spanishversion
ofprotocol
priortodata
collection
No
No
N/A
N/A
N/A
N/A
Englishversion
ofprotocol
administeredto
(n=3)Latina
women
N/A
Notreportedfor
the10%ofthe
samplethat
respondedto
thisversion
A
TSPPH:Factoranalysis
AAS:Cronbach'salpha
(N=110Koreanimmigrants
inU.S.)
C
ronbach'salpha(N=412
Dutch
adolescents)
E
xploratoryfactoryanalysis
forVietnamese(N=867),
Cambodian(N=590),and
Laotian(n=723)persons
L
atinaprofessorofforeign
languageservedasanauditor
toensurepropertranslation
oftranscriptsfromSpanish
toEnglish(n=7)Latina
women
C
ronbach'salpha,principle
componentsanalysis(N=89
CentralAmerican
immigrantsinU.S.)
C
ronbach'salpha(N=86
Hispanicimmigrants).90%
ofthesamplerespondedto
thetranslatedversionof
instruments.
Nocomparison
reportedbetweenthetwo
languageversions
Assigned
Psychom
etricsReported
Number,
Approach
Citation,
Typeof
Instrument
to
Back
andJournal
Construct
Sample
Name
Translation
Translation
Translation
Pretest
Original
Target
TABLE1:
StudiesInvolvingTranslationofInstruments
(continued)
-
7/30/2019 10 Methodological Issues
20/32
20
7. Oh & Neville (2004); TCP
Construct: Korean rape myth acceptance
Sample: Korean college students
Instruments: Illinois Rape Myth Acceptance Scale (IRMAS); 26 items from the IRMAS were translated and included in the preliminary version of the Korean Rape Myth Acceptance Scale (KRMAS)
Translation: English to Korean
Approach: Single person
Back translation: Yes
Pretest: Yes; a focus group (n = 4 South Korean nationals) evaluated each item from the IRMAS and 26 items generated from the Korean literature. All items were in Korean
Psychometrics (original): N/A
Psychometrics (translated): Study 1: principal components analysis followed by exploratory factor analysis (N = 348 South Korean college students). Study 2: confirmatory factor analysis, factorial invariance procedure, Cronbach's alpha, and MANOVA to establish criterion validity (N = 547 South Korean nationals). Study 3: test-retest reliability (N = 40 South Korean teachers or school administrators)

8. Asner-Self & Marotta (2005); JCD
Construct: Depression, anxiety, phobic anxiety; Erikson's eight psychosocial stages
Sample: Immigrants from Central America
Instruments: Brief Symptom Inventory (BSI); Measures of Psychosocial Development (MPD)
Translation: English to Spanish
Approach: Not reported
Back translation: Yes
Pretest: Not reported
Psychometrics (original): Not reported
Psychometrics (translated): Not reported. No information about the number of participants responding to the English or Spanish versions of the instruments. Volunteers probed about the research experience

9. Wei & Heppner (2005); TCP
Construct: Clients' perceptions of counselor credibility; working alliance
Sample: Counselor-client dyads in Taiwan
Instruments: Counselor Rating Form-Short Version (CRF-S); the Working Alliance Inventory-Short Version (WAI-S)
Translation: English to Mandarin
Approach: Single person
Back translation: Yes
Pretest: No
Psychometrics (original): N/A
Psychometrics (translated): Cronbach's alpha, intercorrelations among CRF subscales (construct validity) (N = 31 counselor-client dyads in Taiwan)

Cross-cultural studies

10. Marino, Stuart, & Minas (2000); MECD
Construct: Acculturation
Sample: Anglo-Celtic Australians & Vietnamese immigrants to Australia
Instruments: A questionnaire developed (in English) measuring behavioral and psychological acculturation, and socioeconomic and demographic influences on acculturation
Translation: English to Vietnamese
Approach: Committee
Back translation: Yes
Pretest: Yes (n = 10), Vietnamese version
Psychometrics (original): Cronbach's alpha (N = 196 Anglo-Celtic Australians)
Psychometrics (translated): Cronbach's alpha (N = 187 Vietnamese Australians). Vietnamese participants responded to either an English or a Vietnamese version of the instrument; statistical evidence of equivalence between these two language versions was not reported
11. Ægisdóttir & Gerstein (2000); JCD
Construct: Counseling expectations; Holland's typology
Sample: Icelandic & U.S. college students
Instruments: Expectations About Counseling Questionnaire (EAC-B); Self-Directed Search (SDS)
Translation: English to Icelandic
Approach: Committee
Back translation: Yes
Pretest: Focus group (n = 8), Icelandic version
Psychometrics (original): Cronbach's alpha (N = 225 U.S. college students)
Psychometrics (translated): Cronbach's alpha (N = 261 Icelandic college students). Covariate analysis (prior counseling experience) used to control for method bias

12. Poasa, Mallinckrodt, & Suzuki (2000); TCP
Construct: Causal attributions
Sample: U.S., American Samoan, & Western Samoan college students
Instruments: Questionnaire of Attribution and Culture (QAC; vignettes with open-ended response probes developed in English)
Translation: English to Samoan
Approach: Single person
Back translation: Yes
Pretest: English version of the QAC pilot tested; respondents provided feedback to evaluate equivalence (n = 16)
Psychometrics (original): A team of English-speaking persons (n = 4) independently coded the English-language responses from the QAC and interviews (N = 23)
Psychometrics (translated): A team of Samoan-speaking persons (n = 3) independently coded the Samoan-language responses from the QAC and interviews (N = 50). No information about whether themes/codes were translated from Samoan to English

13. Tang (2002); JMCD
Construct: Career choice
Sample: Chinese, Chinese American, & Caucasian American college students
Instruments: A questionnaire developed in English in the study to measure influences on career choice
Translation: English to Chinese
Approach: Single person (researcher)
Back translation: Yes
Pretest: No
Psychometrics (original): None reported for Caucasian American (N = 124) and Asian American (N = 131) college students
Psychometrics (translated): None reported for Chinese (N = 120) college students

Equivalence studies

14. Chang & Myers (2003); MECD
Construct: Wellness
Sample: Immigrants from Korea
Instruments: The Wellness Evaluation of Lifestyle (WEL)
Translation: English to Korean
Approach: Single translator whose translations were edited by the first author; discrepancies resolved between translator and editor upon mutual agreement
Back translation: No
Pretest: Yes (n = 3): bilingual examinees took both the English and the Korean versions. Effect size (Cohen's d) of the difference in mean scores between the English and Korean versions
Psychometrics (original): None reported for a larger sample (N not reported)
Psychometrics (translated): None reported for a larger sample (N not reported)

15. Mallinckrodt & Wang (2004); JCP
Construct: Adult attachment
Sample: International students from Taiwan
Instruments: The Experiences in Close Relationships Scale (ECRS)
Translation: English to Chinese
Approach: Committee
Back translation: Yes
Pretest: No
Psychometrics (original): Split-half reliability, Cronbach's alpha (N = 399 U.S. college students)
Psychometrics (translated): Used bilinguals (n = 30 Taiwanese international college students) to evaluate equivalence using the DLSH method: within-subjects t test between the two language versions, split-half reliability, Cronbach's alpha, test-retest reliability, and construct validity correlations with a related construct
RESULTS
Table 1 lists results found for each of the 15 studies. Three of the included studies used a structured or semistructured interview/test protocol. In 3 studies, of which one included a semistructured test protocol, an English-language instrument was developed and then translated to another language. Furthermore, in 9 studies, one or more preexisting measures (the entire instrument or a subset of items) were translated into a language other than English. In the 15 studies, a range of constructs was examined, including persons' counseling orientations (e.g., help-seeking attitudes, counseling expectations), adjustment (e.g., acculturation), and maladjustment (e.g., psychological stress). A diversity of cultural groups was represented in the 15 studies as well (see Table 1).
Evaluation of Included Studies
Two main criteria were used to evaluate these 15 studies: (a) the trans-
lation methodology employed (single person, committee, back translation,
pretest), which provides judgmental evidence about the equivalence of the
translated measure to the original measure; and (b) whether statistical methods were used to verify equivalence of the translated measure to its
original-language version. Because the studies ranged in terms of their pur-
pose and the approaches taken when investigating multicultural groups, and
also because these strategies were linked with different measurement
opportunities of equivalence and bias, we divided these 15 studies into
three categories: target-language, cross-cultural, and equivalence studies.
The target-language studies included projects in which only translated ver-
sions of measures were investigated. These studies employed either cross-cultural (etic) methodologies or a combination of cultural and
cross-cultural methodologies (emic-etic). For these studies, there was no
direct comparison made between an original and a translated version of the
protocol. The second category of studies used a cross-cultural approach, as
they compared two or more groups on a certain construct. Each of these
groups received the original and translated versions of a measure. Finally,
the third category of studies was specifically designed to examine equiva-
lence between two language versions of an instrument. These studies we
termed equivalence studies.
We identified studies that employed sound versus weak translation method-
ologies. This task turned out to be difficult, however, because of the scarcity
of information reported about the translation processes used. Sometimes,
the translation procedure was described in only a couple of sentences. In
other instances, the translation methodology was discussed in more detail
(e.g., number and qualifications of translators and back translators), while in
fewer instances, examples were provided about acceptable and unacceptable
item translations.
Despite these difficulties, and based on available information, we con-
trasted relatively sound and weak translation procedures. Translation methods
we considered weak did not incorporate any mechanism to evaluate the translation, whether judgmental (e.g., back translation, use of bilinguals, pretest) or quantitative (statistical evidence of equivalence). Instead, such a protocol was translated into one or more languages without any apparent evaluation of its equivalence to the original-language version.
Methodologically sound studies incorporated both judgmental and quantita-
tive methods to assess the validity of the translation. Given these criteria to evaluate the methodological rigor of the translation process employed, we now
present the analyses of the 15 identified studies in the literature.
Target-language studies. Nine of the 15 studies administered and examined responses from a translated measure without direct comparison to a group responding to an original-language version of the measure (see Table 1). In most of these studies, persons from one cultural group participated. Both quantitative and qualitative methods were employed. These studies relied on preexisting instruments, select items from preexisting instruments, or interview protocols translated into a new language. We also
included in this category studies in which a protocol was developed in
English and translated into another language.
In two studies (4 and 8), few procedures were reported to evaluate the
translation and verify the different language form of the measures used (see
Table 1). In these studies, two language versions of a scale were collapsed
into one set of responses without evaluating their equivalence. A stronger design for these studies would ensure judgmental equivalence between the
two language versions of the scales. This could have been accomplished by
using a committee of translators and independent back translators. A
stronger design would have also resulted from incorporating a decentering
process when developing the adapted measures and, if appropriate, by sta-
tistically assessing equivalence. Thus, we considered these studies weak in
terms of their methodological rigor.
Sound translation methods incorporate several mechanisms to evaluate
a translated version of a protocol. They involve, for instance, a committee
approach to translation/back translation, a pretest of the scale, and an eval-
uation of the instrument's psychometric properties relative to the original
version. Four studies reported information somewhat consistent with our
criteria for sound methodological procedures (3, 5, 7, and 9). The authors,
with varying degrees of detail, reported using either a single person or a
committee approach to translation, they relied on back translation, and they
employed one or more independent experts to evaluate the equivalence of
the language forms. They also reported making subsequent changes to the
translated version of the instruments they were using. Additionally, in some
of these studies, a pretest of the translated protocol was performed, and in
all of these projects, the investigators discussed the statistical tests of the
measures' psychometric properties (see Table 1).
The remaining three studies in this category (1, 2, and 6) contained
translation methods of moderate quality, in that their rigor fell between that of the studies we considered relatively weak and those we considered strong. In fact, the translation process was not fully described.
Furthermore, in one instance, the same person performed the translation and the back translation (2), and in another (6), no assessment of equiva-
lence was reported on the two language versions of the scale used before
responses were collapsed into one data set. Also, in one study (1), translated
items from an existing scale were selected a priori without any quantitative
or qualitative (e.g., pretest) assurance these items fit the cultural group to
which they were administered. In none of these three studies were the mea-
sures pretested before collecting data for the main study. Finally, insufficient
information was reported about the translated instruments' psychometric properties to evaluate the validity of the measures for the targeted cultural
groups. The internal validity of these studies could have been greatly
improved had the researchers included some of these procedures in the
translation and verification process.
Cross-cultural studies. Four of the 15 studies directly compared two
or more cultural groups. In 3 of these studies, an instrument was developed
in English and then translated into another language, whereas in 1 study, a preexisting instrument was translated to another language (see Table 1). In
all 4 studies, comparisons were made between language groups relying on
two language versions of the same instrument.
None of these four studies employed a particularly weak translation
methodology. Yet three of the four studies (10, 11, and 12) used relatively
rigorous methods. In these three studies, the scales were pretested follow-
ing the translation/back-translation process, providing judgmental evidence
of equivalence. Additionally, in the two quantitative studies (10 and 11),
the researchers compared Cronbach's alphas between language versions.
Finally, in one study (11), equivalence was further determined by employ-
ing covariate analysis to control for method bias (different experiences of
participants across cultures) in support of scalar equivalence. None of these
approaches to examine and ensure equivalence was reported in the Tang
(2002) study. As a result, we concluded that this study used the least valid
approach. It is noteworthy that all four studies in this category failed to
assess the factor structure of the different language versions of the mea-
sures, and as such, they did not provide additional support for construct
equivalence. Similarly, none of these studies assessed item bias or per-
formed any detailed analyses to verify scalar equivalence. Employing these
additional analyses would have greatly enhanced the validity of the reported
cross-cultural comparisons in these four studies.
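To make the reliability comparison concrete, the sketch below computes Cronbach's alpha separately for two language versions of a scale. It is a minimal illustration with simulated responses (the data, sample sizes, and variable names are hypothetical); as noted above, similar alphas across versions are necessary but not sufficient evidence of construct or scalar equivalence.

    import numpy as np

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_respondents x n_items) response matrix."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    rng = np.random.default_rng(0)
    # Simulate a 10-item scale: each item reflects a shared trait plus noise.
    trait_a = rng.normal(size=(200, 1))   # hypothetical original-language sample
    trait_b = rng.normal(size=(150, 1))   # hypothetical translated-language sample
    original_version = trait_a + rng.normal(scale=1.0, size=(200, 10))
    translated_version = trait_b + rng.normal(scale=1.0, size=(150, 10))

    print(f"alpha, original version:   {cronbach_alpha(original_version):.2f}")
    print(f"alpha, translated version: {cronbach_alpha(translated_version):.2f}")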
Equivalence studies. Two of the 15 studies were treated as separate
cases, as they were specifically designed to demonstrate and evaluate
equivalence between two language versions of a scale (see Table 1).
Therefore, we did not evaluate these the same way as the other 13 studies. Instead, they are examples of how to enhance the cross-cultural validity of translated and adapted scales. We concluded that Mallinckrodt and Wang's
(2004) approach to determine construct equivalence between language ver-
sions of a measure was significantly more rigorous than the one presented
by Chang and Myers (2003).
As can be seen from Table 1, Chang and Myers (2003) employed three
bilingual persons in lieu of back translation. In their approach, bilingual
persons' average scale scores on both versions of a scale were compared. Mallinckrodt and Wang (2004), in contrast, used both back translation and
bilingual individuals to demonstrate and ensure equivalence. Their method
subsumed the method employed by Chang and Myers. Following a back
translation of an instrument, Mallinckrodt and Wang used a quantitative
methodology, the DLSH, to assess equivalence between two language
versions of a scale (see discussion earlier). In brief, with this approach,
responses from bilingual individuals receiving half of the items in each
language were compared to a criterion sample of persons responding to the original version of the scale. By comparing average scale scores, reliability
coefficients, and construct validity correlations, the researchers were able to
examine the equivalence (construct and to some degree scalar equivalence)
between the two language versions of the instrument.
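In code, the core of this bilingual comparison might look like the following sketch. The data are simulated and the names hypothetical; the full DLSH procedure additionally benchmarks these statistics, along with test-retest reliability and construct validity correlations, against a criterion sample that completed the original-language version.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_bilinguals = 30

    # Each bilingual respondent contributes two half-scale scores: one half
    # answered in the original language, the other in the translation. Under
    # equivalence, the two halves should differ only by measurement noise.
    trait = rng.normal(loc=3.5, scale=0.6, size=n_bilinguals)
    original_half = trait + rng.normal(scale=0.3, size=n_bilinguals)
    translated_half = trait + rng.normal(scale=0.3, size=n_bilinguals)

    # Scalar check: a within-subjects t test between the language halves.
    t_stat, p_value = stats.ttest_rel(original_half, translated_half)

    # Cross-language split-half consistency: correlation of the half scores.
    r, _ = stats.pearsonr(original_half, translated_half)

    print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}, cross-language r = {r:.2f}")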
Interpretation of Results
The current results are consistent with Mallinckrodt and Wang (2004),
who discovered in their review of articles published in two counseling jour-
nals (JCP and TCP) that few studies in counseling psychology have inves-
tigated multilingual or international groups or employed translation methods.
Additionally, consistent with these investigators, we found that, in many instances, counseling researchers used inadequate procedures to verify equivalence
between language versions of an instrument. For example, our analyses
indicated just more than half of the 15 studies employed a committee of
translators. A committee is highly recommended in the ITC guidelines (van
de Vijver & Hambleton, 1996).
We also discovered that in fewer than half of the 15 studies the measurement devices were pretested and that in slightly more than half of the studies the researchers used quantitative methods to further demonstrate equiva-
lence. Furthermore, only 1 study systematically controlled for method bias,
while none of the 15 studies assessed for item bias. All these procedures
are recommended in the ITC guidelines. On a positive note, however, all
but 2 studies used a back-translation procedure to enhance equivalence.
Taken together, all of these results are disquieting and lead us to call for
employing more rigorous research designs when studying culture, when using and evaluating translated instruments, and when performing cross-
cultural comparisons.
Additionally, we found, in many cases, limited attention was placed on
discussing translation methods. Hambleton (2001) also observed this trend.
Not knowing the reason for this lack of effort, we speculate about why
methods of translation were not described in more detail. One reason could
be the lack of importance placed on this methodological feature of a
research design. Another may relate to an author's desire to comply with page limitations in journals. A third reason could be a researcher's failure
to recognize the importance of reporting the details about methods of trans-
lation. Finally, it is conceivable that researchers assume others are aware of
common methods of translation and thus do not discuss the methods they
use in much detail. Whatever the reasons, consistent with the ITC guide-
lines, we strongly suggest investigators provide detailed information about
the methods they employ when translating and validating instruments used
in research. This is especially important, as an inappropriate translation of a measure can lead to a serious threat to a study's internal validity, may con-
tribute to bias, and in international comparisons may limit the level of
equivalence between multilingual versions of a measure. As a threat to
internal validity, a poorly translated instrument may act as a strong rival
hypothesis for obtained results.
RECOMMENDATIONS
Translation Practices
Several steps are essential for a valid translation. Based on our review, on Brislin and colleagues' reviews of common translation methods (Brislin, 1986; Brislin et al., 1973), and on the ITC guidelines (e.g., Hambleton, 2001; van de Vijver & Hambleton, 1996), the best translation procedure involves several steps as
outlined in Table 2. All but the last step outlined in this table help to minimize
item and construct bias and therefore may increase scalar equivalence between
language versions of a measure (ITC development guidelines). The last step
or recommendation refers to verifying cross-cultural validity of measures
(i.e., absence of bias and equivalence; ITC interpretation guidelines).
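As one concrete illustration of steps 5 and 6 in Table 2 (below), raters' judgments of meaning equivalence between original and back-translated items can be tabulated to flag items that must pass through another translation/back-translation iteration. The items, ratings, and cutoff in this sketch are hypothetical.

    # Three raters judge each original/back-translated item pair for meaning
    # equivalence on a 1-5 scale; items averaging below the cutoff are sent
    # back through the translation/back-translation cycle (steps 5 and 6).
    ratings = {
        "item_01": [5, 4, 5],
        "item_02": [3, 2, 3],  # probable loss of meaning in translation
        "item_03": [4, 5, 4],
    }
    CUTOFF = 4.0  # illustrative threshold, not a published standard

    needs_retranslation = [item for item, scores in ratings.items()
                           if sum(scores) / len(scores) < CUTOFF]
    print("Items to retranslate:", needs_retranslation)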
TABLE 2: Summary of Recommended Translation Practices

1. Independent translation from two or more persons familiar with the target language and culture and the intent of the scale
2. Documentation of comparisons of translations and agreement on the best translation
3. Rewriting of translated items to fit the grammatical structure of the target language
4. Independent back translation of the translated measure into the original language (one or more persons)
5. Comparison of original and back-translated versions, focusing on appropriateness, clarity, and meaning (e.g., use rating scales)
6. Changes to the translated measure based on the prior comparison; changed items go through the translation/back-translation iteration until satisfactory
7. If concepts or ideas do not translate well, deciding which version of the original scale should be used for cross-cultural comparison (original, back translated, or decentered)
8. Pretest of the translated instrument on an independent sample (bilinguals or target-language group); check for clarity, appropriateness, and meaning
9. Assessment of the scale's reliability and validity, absence of bias, and equivalence to the original-language version of the scale

Combining Emic and Etic Approaches

As stated previously, the cross-cultural approach to studying cultural influences on behavior has limitations. One risk involves assuming universal laws of behavior and neglecting an in-depth understanding of cultures and their influences on behavior (e.g., imposed etics). To address this problem, and in line with suggestions reviewed earlier, we offer several recommendations for counseling psychologists involved in international research. First, collaboration between scholars worldwide and across disciplines is suggested to enhance the quality of cross-cultural studies and the validity of methods and findings. Such collaboration increases the possibility that unique cultural variables will be incorporated into the research and that potential threats to internal and external validity will be reduced. Second, to avoid potential method bias, an integration of quantitative and qualitative methods should be considered, especially when one type of method may be more appropriate and relevant to a particular culture. A convergence of results from both methods
enhances the validity of the findings. Third, when method bias is not expected
but there is a potential for construct bias while the use of a preexisting mea-
sure is considered feasible, researchers should consider collecting emic items
to be included in the instrument when studying an etic construct (e.g., Brislin,
1976; Oh & Neville, 2004). This approach will enhance construct equiva-
lence by limiting construct bias and will provide culture-specific information
to aid theory development. Fourth, when emic scales are available in the cul-
tures of interest to assess an etic construct and cross-cultural comparisons are
sought, the convergence approach should be considered. With this approach,
all instruments are translated and administered to each cultural group. Then,
items and scales shared across cultures are used for cross-cultural compar-
isons, whereas nonshared items provide information about the unique aspect of the construct in each culture (e.g., van de Vijver, 1998). This approach will
enhance construct equivalence, it may deepen the current understanding of
cultural and cross-cultural dimensions of a construct, and it may aid theory
development. Finally, Triandis's (1972, 1976) suggestion can be considered.
With this procedure, instruments are simultaneously assembled in each cul-
ture to measure the etic construct (e.g., subjective well-being). With this
approach, most or all types of biases can be minimized and equivalence
enhanced, as no predetermined stimuli are used. Factor analyses can be performed to identify etic constructs for cross-cultural comparisons.
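Schematically, the convergence approach described above reduces to an intersection: items that survive psychometric screening in every culture support the etic comparison, and the remainder are interpreted emically. The item sets below are hypothetical.

    # Hypothetical sets of items retained in each culture (e.g., on the basis
    # of factor loadings) after all translated instruments were administered
    # to all cultural groups.
    retained = {
        "culture_A": {"i01", "i02", "i03", "i05"},
        "culture_B": {"i01", "i02", "i04", "i05"},
    }

    shared_etic = set.intersection(*retained.values())
    unique_emic = {c: items - shared_etic for c, items in retained.items()}

    print("Shared (etic) items for comparison:", sorted(shared_etic))
    print("Culture-specific (emic) items:", unique_emic)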
CONCLUSION
Given our profession's increased interest in international topics, there is a critical need to address methodological challenges unique to this area. We discussed important challenges such as translation, equivalence, and bias. Proper translation methods may strengthen the equivalence of constructs across cultures, as a focus on instrumentation can minimize item bias and some method bias. Consequently, construct equivalence may be enhanced. Merely targeting an instrument's translation, however, is not sufficient.
Other factors to consider when mak