grammatical gender and linguistic relativity: a systematic ...theoretical review grammatical gender...

20
THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel 1 & Geoff Cole 1 & Madeline J. Eacott 1 # The Psychonomic Society, Inc. 2019 Abstract Many languages assign nouns to a grammatical gender class, such that bedmight be assigned masculine gender in one language (e.g., Italian) but feminine gender in another (e.g., Spanish). In the context of research assessing the potential for language to influence thought (the linguistic relativity hypothesis), a number of scholars have investigated whether grammatical gender assignment rubs offon concepts themselves, such that Italian speakers might conceptualize beds as more masculine than Spanish speakers do. We systematically reviewed 43 pieces of empirical research examining grammatical gender and thought, which together tested 5,895 participants. We classified the findings in terms of their support for this hypothesis and assessed the results against parameters previously identified as potentially influencing outcomes. Overall, we found that support was strongly task- and context-dependent, and rested heavily on outcomes that have clear and equally viable alternative expla- nations. We also argue that it remains unclear whether grammatical gender is in fact a useful tool for investigating relativity. Keywords Grammatical gender . Whorf . Linguistic relativity . Language and thought Introduction The SapirWhorf or linguistic relativity hypothesishence- forth, simply relativity(Whorf, 1956)takes various forms, but at its heart it contends that the idiosyncrasies of the languages we speak influence the way we think about the world. In its strongest incarnationlinguistic determin- ismthought is constrained by language, but few if any con- temporary scholars take this view (Athanasopoulos, 2009). At the opposite extreme is the universalistposition, in which thought is said to be independent of language (e.g., Pinker, 1994; see Lucy, 2016, for a historical overview). Although there is no agreement as to where between these two opposing views the truth is situated, the broad consensus is that neither extreme is correct (Gleitman & Papafragou, 2013). In the middle are a variety of standpoints, some of which are more limiting of the role of language than others. For example, thinking for language (i.e., thinking about speaking) might allow language to bias our attention toward the more linguis- tically describable aspects of what we perceive (Slobin, 1996), but this does not mean that language alters the underlying conceptual structures. Another standpoint is that language can influence our judgments about what we perceive, but not perception itself, which is cognitively impenetrable (see Firestone & Scholl, 2016, for debate; see also Pylyshyn, 1984). For many, our perceptions are perhaps best described as modulated or biased by language (Athanasopoulos, 2006; Dolscheid, Shayan, Majid, & Casasanto, 2013; Gilbert, Regier, Kay, & Ivry, 2006, 2008; Lupyan, 2012). Overall, investigating the ways in which language does and does not relate to thought now appears to be the prevailing approach (Lucy, 2016; Lupyan, 2012; Thierry, 2016). Evidence for language influencing thought and judgments about perceptions comes from various domains. For example, performance on color discrimination/matching tasks has been shown to be more efficient when the stimuli have different labels in a participants language relative to when they are subsumed under one term (e.g., Roberson, Pak, & Hanley, 2008; Winawer et al., 2007). Differences between languages have also been demonstrated to affect how people think about object relations (Park & Ziegler, 2014) and objects themselves (Imai & Gentner, 1997) as well as broader, more abstract concepts such as quantity (Athanasopoulos, 2006), time (Boroditsky, 2001; Casasanto et al., 2004), and motion Electronic supplementary material The online version of this article (https://doi.org/10.3758/s13423-019-01652-3) contains supplementary material, which is available to authorized users. * Steven Samuel [email protected] 1 Department of Psychology, University of Essex, Wivenhoe Park, Essex, UK https://doi.org/10.3758/s13423-019-01652-3 Psychonomic Bulletin & Review (2019) 26:17671786 Published online: 19 August 2019

Upload: others

Post on 12-Mar-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

THEORETICAL REVIEW

Grammatical gender and linguistic relativity: A systematic review

Steven Samuel1 & Geoff Cole1 & Madeline J. Eacott1

# The Psychonomic Society, Inc. 2019

AbstractMany languages assign nouns to a grammatical gender class, such that “bed” might be assigned masculine gender in onelanguage (e.g., Italian) but feminine gender in another (e.g., Spanish). In the context of research assessing the potential forlanguage to influence thought (the linguistic relativity hypothesis), a number of scholars have investigated whether grammaticalgender assignment “rubs off” on concepts themselves, such that Italian speakers might conceptualize beds as more masculinethan Spanish speakers do. We systematically reviewed 43 pieces of empirical research examining grammatical gender andthought, which together tested 5,895 participants. We classified the findings in terms of their support for this hypothesis andassessed the results against parameters previously identified as potentially influencing outcomes. Overall, we found that supportwas strongly task- and context-dependent, and rested heavily on outcomes that have clear and equally viable alternative expla-nations. We also argue that it remains unclear whether grammatical gender is in fact a useful tool for investigating relativity.

Keywords Grammatical gender .Whorf . Linguistic relativity . Language and thought

Introduction

The Sapir–Whorf or linguistic relativity hypothesis—hence-forth, simply “relativity” (Whorf, 1956)—takes variousforms, but at its heart it contends that the idiosyncrasies ofthe languages we speak influence the way we think aboutthe world. In its strongest incarnation—linguistic determin-ism—thought is constrained by language, but few if any con-temporary scholars take this view (Athanasopoulos, 2009). Atthe opposite extreme is the “universalist” position, in whichthought is said to be independent of language (e.g., Pinker,1994; see Lucy, 2016, for a historical overview). Althoughthere is no agreement as towhere between these two opposingviews the truth is situated, the broad consensus is that neitherextreme is correct (Gleitman & Papafragou, 2013). In themiddle are a variety of standpoints, some of which are morelimiting of the role of language than others. For example,

thinking for language (i.e., thinking about speaking) mightallow language to bias our attention toward the more linguis-tically describable aspects of what we perceive (Slobin, 1996),but this does not mean that language alters the underlyingconceptual structures. Another standpoint is that languagecan influence our judgments about what we perceive, but notperception itself, which is cognitively impenetrable (seeFirestone & Scholl, 2016, for debate; see also Pylyshyn,1984). For many, our perceptions are perhaps best describedas modulated or biased by language (Athanasopoulos, 2006;Dolscheid, Shayan, Majid, & Casasanto, 2013; Gilbert,Regier, Kay, & Ivry, 2006, 2008; Lupyan, 2012). Overall,investigating the ways in which language does and does notrelate to thought now appears to be the prevailing approach(Lucy, 2016; Lupyan, 2012; Thierry, 2016).

Evidence for language influencing thought and judgmentsabout perceptions comes from various domains. For example,performance on color discrimination/matching tasks has beenshown to be more efficient when the stimuli have differentlabels in a participant’s language relative to when they aresubsumed under one term (e.g., Roberson, Pak, & Hanley,2008; Winawer et al., 2007). Differences between languageshave also been demonstrated to affect how people think aboutobject relations (Park & Ziegler, 2014) and objects themselves(Imai & Gentner, 1997) as well as broader, more abstractconcepts such as quantity (Athanasopoulos, 2006), time(Boroditsky, 2001; Casasanto et al., 2004), and motion

Electronic supplementary material The online version of this article(https://doi.org/10.3758/s13423-019-01652-3) contains supplementarymaterial, which is available to authorized users.

* Steven [email protected]

1 Department of Psychology, University of Essex, Wivenhoe Park,Essex, UK

https://doi.org/10.3758/s13423-019-01652-3Psychonomic Bulletin & Review (2019) 26:1767–1786

Published online: 19 August 2019

Page 2: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

(Athanasopoulos & Bylund, 2013). Some of these studieshave generated vigorous debate, perhaps most notably in thearea of color perception (e.g., A. Brown, Lindsey, & Guckes,2011; Franklin, Clifford, Williamson, & Davies, 2005; Witzel& Gegenfurtner, 2013). For some, aspects of grammar mightprovide a better tool by which relativity can be explored thanvision-focused research; this is because grammar is unaffectedby sensory input, obviating the need for language to “breach”psychophysical barriers. Additionally, for the relativity hy-pothesis to hold, it should not be limited to category labelsbut extend to features of syntax too (Sato & Athanasopoulos,2018).

One aspect of grammar that has received attention in recentyears is grammatical gender (Bassetti & Nicoladis, 2016;Bender, Beller, & Klauer, 2018). Unlike English, which hasa semantic (or conceptual) gender system whereby the genderof a noun is dictated by its biological sex, many languageshave a formal system by which all nouns are assigned to agrammatical gender category whether they have biologicalsex or not. For example, the English word “bed” has no gen-der, and is referred to with the pronoun “it,” but in Italian (ilmletto) takes the masculine gender. The consequence of a for-mal grammatical gender system is obligatory conformity oragreement with the syntactic rules of that class. This mayinvolve marking for gender any definite or indefinite articles(“the,” “a/an”), plural markings, and case markings, as well asother forms of agreement. Different languages also assigndifferent genders to the same objects; for example, in contrastto Italian, “bed” takes feminine grammatical gender inSpanish (laf cama). Such differences are not at all exceptional;grammatical gender assignment in a given language is largelyarbitrary (Corbett, 1991; Foundalis, 2002), and “escapes log-ic” (Boutonnet, Athanasopoulos, & Thierry, 2012). Indeed, asEnglish and other languages with purely semantic gendershow, a formal grammatical gender system is also unnecessarywhen it comes to the communicative function of a language.

The crucial exception to arbitrariness in grammatical gen-der assignment concerns words referring to entities, usuallyhuman, with actual biological sex. “Woman” is feminine and“man” masculine across German, French, Italian, andSpanish, and it is from these associations, presumably,1 thatgrammatical “gender” receives its name. Occasionally, thecorrelation between grammatical gender and biological sexis imperfect; witness German, which has a third, neuter gendercategory in addition to masculine and feminine, and whichassigns this gender to the word for “girl” (dasn Mädchen).Nevertheless, even in German the vast majority of words re-ferring to humans that have biological sex show a strong pat-tern of masculine with males and feminine with females. Inthis sense, languages with a formal gender system can be

viewed as having a largely semantic gender system for humanagents at least.

It is the combination of the arbitrariness of grammaticalgender assignment and the link with biological sex that hasattracted researchers to grammatical gender when examiningrelativity (e.g., Bassetti, 2007; Sato &Athanasopoulos, 2018).The question asked in such studies is whether grammaticalgender assignment “rubs off” on the conceptual representa-tions of inanimate objects that have no biological sex suchthat, for example, Italian speakers conceptualize beds assomehow more masculine than speakers of a language withno formal grammatical gender system, or than speakers with agender system but for whom a different gender is assigned.Although historically seen as a quirk of grammar, grammaticalgender is therefore regarded to be in a particularly strong po-sition to inform long-standing philosophical, linguistic, andpsychological debates around the universality or otherwiseof human thought (Phillips & Boroditsky, 2003).

A variety of tasks have been employed to understandwhether grammatical gender does influence concepts. Themost common has been the voice choice task (16 differentpublications using this task were discovered in this review),in which participants are asked to assign a male or femalevoice to objects, with many finding that the sex of the voiceand the grammatical gender of the target are indeed broadlyconsistent (e.g., Kurinski, Jambor, & Sera, 2016; Lambelet,2016; Ramos & Roberson, 2011; Sera, Berge, & del CastilloPintado, 1994; Sera et al., 2002). Similar results have beenfound when asking participants to assign a human name or asex to an object (“sex assignment” tasks; e.g., Belacchi &Cubelli, 2012; Flaherty, 2001), and when asking participantsto rate on a scale the similarity (“similarity task”) betweenpictures of male and female humans and objects (Phillips &Boroditsky, 2003). In object–name memory association tasks,participants are instructed to remember male and femalenames that substitute object names, such that “chair” mightnow be “Patricia,” and the results have sometimes shown thatthe ability to recall the human name is enhanced if it is con-gruent with the grammatical gender of the object in question(Boroditsky & Schmidt, 2000).

The tasks described above all involve judgments madewithout time pressure or measurement, but there is also some,albeit limited, evidence in favor of grammatical genderinfluencing concepts from speeded-response tasks.Converging evidence from different measurement types is im-portant to generate the clearest picture possible of any effect.For example, in the extrinsic affective Simon task (EAST),participants respond to words using two keys; in one condi-tion, each key is mapped to a color, and in another each key ismapped to a sex (male or female). Bender, Beller, and Klauer(2016b) found that German speakers were faster to respond tothe color of a word if that word was a gendered, personifiablenoun (e.g., death, beauty) and the response key was also

1 We return to this issue and discuss it in greater detail in the Discussionsection.

Psychon Bull Rev (2019) 26:1767–17861768

Page 3: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

mapped to the sex implied by the target’s grammatical gender.However, they also found that the effect was weak or nonex-istent for inanimate objects, and only held for personifiablenouns that had connotations of sex that were themselves con-gruent with the noun’s grammatical gender (e.g., “war” ismasculine in connotation and gender, but “Spring” is femininein connotation but masculine in gender).

The role of gender or sex in the tasks so far described hasbeen clear, because participants are making judgments andresponses with biological sex playing an obvious role in thisprocess. Some researchers have preferred tasks in which thesalience of sex and gender can be reduced or made moreorthogonal to the participant’s conscious experience. For ex-ample, Konishi (1993) employed a property judgment para-digm, finding that German speakers rated masculine-genderedobjects such as “key,” “table,” and “beach,” as more “potent”(a trait associated with masculinity) than concepts with femi-nine gender; Spanish speakers, for whom the same targetstook the feminine gender, rated the same items as less potent.This type of task makes participants think about conceptswithout having to relate them to gender or sex directly; indeed,neither term need come up at all in the instructions for suchexperiments.

Other studies, often using the same or similar designs asthose described above, have instead reported potentially im-portant absences of evidence. For example, in a propertiesjudgment task, Landor (2014) asked approximately 600speakers of gendered languages to generate adjectives to de-scribe inanimate objects, and then asked a different group ofparticipants to assign a male or female voice to them. Cross-referencing these results back to the original stimuli, no evi-dence of property judgments consistent with grammatical gen-der was found. Absences of evidence have also been reportedin sex assignment tasks (Nicoladis & Foursha-Stevenson,2012), similarity tasks (Degani, 2007), the EAST (Benderet al., 2018), and priming-type tasks (e.g., Degani, 2007;Samuel, Roehr-Brackin, & Roberson, 2016). The importanceof these findings is that they might delineate important con-straints on the relativity hypothesis that, as we discussed ear-lier, would be a finding consistent with the majority of theo-retical standpoints in the contemporary literature.

Further clarifications of such constraints might come fromresults that do not conform to a binary “support or no-support”distinction. For example, a property judgment task bySemenuks, Phillips, Dalca, Kim, and Boroditsky (2017) foundthat grammatical gender influenced property judgments, butonly in an analysis that left out participants’ first-choice ad-jectives, focusing on second- and third-choice adjectivesalone. The authors suggest that the absence of an effect fromthe first adjective choice might explain the tendency for morenegative findings from speeded-response tasks, but an alter-native explanation is that participants fell back on grammaticalgender as a strategy after they had made their initial and

potentially most faithful descriptions (cf. Bender, Beller, &Klauer, 2016a).

Other studies have shown support for relativity only forlimited subsets of results rather than for the full range of pre-dictions. For example, in the study by Konishi (1993) alreadydescribed, German and Spanish speakers both showed evi-dence of perceiving objects with masculine gender as more“potent” than objects with feminine gender, with “potent”rated as a masculine trait; however, participants also rated“negative” as a masculine trait, but this property did not trans-fer to objects. Indeed, even the link with potency involvedvery small differences between masculine and feminine con-cepts. Similarly, in a sex assignment task, Nicoladis, Da Costa,and Foursha-Stevenson (2016) found that Russian–Englishbilingual preschoolers assigned male sex to masculine genderobjects more frequently than they did to neuter gender objects,but not more frequently than they did to feminine genderobjects; and in a voice choice task, decisions have been foundto be consistent with masculine gender but not with femininegender (Bassetti, 2007).

As we mentioned earlier, the prevailing standpoint is thatneither a strong view of relativity nor a universalist view pro-vides an accurate account of the relationship between lan-guage and thought, and as a result the absences of evidenceand the mixed results hint at precisely the systematic con-straints most researchers seek to reveal. It is our view that asystematic review of the empirical literature would be well-placed to begin the process of drawing conclusionsconcerning the strength, extent, and limitations of any suchrelationship.

Before describing the results of the review, we outline thesix potential constraints or “parameters” that we have extract-ed from discussions in the literature. Assessment of the influ-ence of these parameters, together with the effects of the largevariety of tasks used in this field, runs through our review.Finally, we consider whether the literature supports the viewthat the effects of grammatical gender are primarily aboutlanguage, or whether the effects attributed to relativity mightinstead be explained by statistical co-occurrences and associ-ations that are carried by language, but are not linguisticthemselves. Such questions have the potential to go to theheart of contemporary thought about the role of language incognition (Lupyan, 2012; Thierry, 2016).

The salience of gender/sex in the task

In terms of methodology, many studies that have shown ef-fects of grammatical gender have tended to come from explicitjudgments when gender and/or sex is a salient context in theexperiment (e.g., Phillips & Boroditsky, 2003; Saalbach, Imai,& Schalk, 2012; Sera et al., 1994; Sera et al., 2002). Indeed,the most common task in the field, voice choice, asks partic-ipants to assign a biological sex to an object. Given the nature

Psychon Bull Rev (2019) 26:1767–1786 1769

Page 4: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

of the question (i.e., there is no objectively correct answer), theparticipant might seek some rationale for the choices ratherthan assign at random. Under such conditions, the chances ofgrammatical gender being consciously recruited are increased,potentially undermining a conceptual change account. A num-ber of scholars have raised this issue (e.g., Bender, Beller, &Klauer, 2011; Bender et al., 2018; Kousta, Vinson, &Vigliocco, 2008; Pavlidou & Alvanoudi, 2013; Ramos &Roberson, 2011), and Phillips and Boroditsky (2003) havesuggested that it is difficult to resolve empirically. For in-stance, in Study 2 of Sera et al. (1994), reference to “mascu-line” and “feminine” was removed from the instructions fromthe previous experiment, but participants were neverthelessstill required to assign male or female voices. Even when therole of gender/sex is less prominent in a task, it is often stillhigh. For instance, the EAST (Bender et al., 2016a, 2016b)maps all responses onto the same keys, meaning that evenwhen participants respond according to color, they do so usinga key that has also been assigned to a biological sex. This is anecessary facet of a design that relies on mapping two con-cepts to the same response in order to test for any effect of theoverlap. Any strategic use of grammatical gender, or simplyits recruitment via strong associations with biological sex in atask, might undermine the case for conceptual-change accountof results, and it is clearly important to understand the extentto which the evidence in favor of relativity might rely on suchpossibilities, by means of comparing high- versus low-gender/sex salience in tasks.

The salience of language in the task

Another issue concerns the salience of language in the taskand whether the design instead indexes language processes, asin “thinking for speaking” (Slobin, 1996). Indeed, it was anearly requirement of language-and-thought research that ef-fects of language should be evident on nonlinguistic tasks(R. Brown & Lenneberg, 1954). As with the salience of gen-der/sex, it is important to understand the extent to which theevidence for relativity might rest upon the recruitment ofgrammatical gender through language processing rather thanany underlying conceptual change. We therefore classifiedresearch as being either high or low in terms of the salienceof language in the task.

Testing participants in their gendered language

For some, testing that occurs in participants’ gendered lan-guage limits inferences to the effects of grammatical genderon concepts within that language, rather than on the conceptsthemselves (Boroditsky & Schmidt, 2000; Slobin, 1996); ifone’s language creates a worldview in which “bed” is mascu-line, “bed” should be conceptualized as masculine not onlywhen acting in the context of that particular language, but also

when acting in a nongendered language, such as English. Bydefinition, this argument suggests that research is best con-ducted on bilinguals. In favor of such a possibility, testingparticipants in a second, ungendered language context has alsorevealed influences of a first-language gender system(Boroditsky & Schmidt, 2000; Phillips & Boroditsky, 2003;Semenuks et al., 2017). To test this particular hypothesis, weclassified experiments in terms of whether participants per-formed in their gendered or nongendered language.

Two-gender versus three-gender languages

Another issue concerns the precise nature of the grammaticalgender system under investigation. For example, in Romancelanguages like Spanish, Italian, and French, most nouns thatrefer to humans carry grammatical gender that is consistentwith the target’s biological sex. In German, however, the cor-relation is weaker, in part owing to its third, neuter gender, butalso because German articles do not always differentiate be-tween genders as a result of the German case system. This canresult, for example, in even animates being labeled with agrammatical gender incongruent with their biological sex.The issue of two- versus three-gendered languages thereforeconcerns whether an influence might be strongest in speakersof languages with two gender classes that form a particularly“tight fit” with semantic gender (see Saalbach et al., 2012;Sera et al., 1994; Vigliocco, Vinson, Paganelli, &Dworzynski, 2005). Results with German speakers on voicechoice tasks have indicated that grammatical gender does notinfluence decisions in the way that, for example, French orSpanish grammatical gender does (Sera et al., 2002). If thispattern were borne out in a broader review, it might suggest astatistical, correlative relationship between biological sex andgrammatical gender in relativity.2 For Lucy (2016), the struc-tural consequences of grammatical gender (case markings,adjective agreement, etc.) are too often overlooked and mightexplain some inconsistencies in results. We classified experi-ments in terms of whether participants spoke a two- or three-gendered language.3

2 The potential for grammatical-gender effects on object conceptualization tobe the result of statistical co-occurrences goes to the heart of a debateconcerning whether grammatical gender is in fact a suitable tool for investi-gating relativity at all, and it is one we turn to in detail later in this section andin the Discussion.3 Not all two-gendered languages divide nouns into masculine and feminine;some (like Dutch, Swedish, and some Norwegian dialects) instead dividenouns into “gendered” and “neuter” categories. To our knowledge, only onestudy in our review included participants who spoke a language of this type:Bergensk, a language spoken in Norway (Beller, Brattebø, Lavik, Reigstad, &Bender, 2015). For the purposes of the review, we excluded the results of thisstudy from our two- versus three-gender comparison. The sample size (N =107) suggests that neither leaving this study in nor taking it out would have anymeaningful effect on the patterns of results described.

Psychon Bull Rev (2019) 26:1767–17861770

Page 5: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

Effects with animate and inanimate targets

Another parameter concerns whether grammatical gendermight influence the conceptualizations of animate but not in-animate targets (e.g., Vigliocco et al., 2005). For example,participants who speak a gendered language (German)showed a greater willingness to erroneously endorse sex-specific statements about animals if those statements wereconsistent with the animals’ grammatical gender than didspeakers of an ungendered language (Japanese), but the samewas not true of inanimate targets (Imai, Schalk, Saalbach, &Okada, 2014). In a series of priming experiments, Benderet al. (2011) asked German speakers to decide the gender ofa target word after they had seen either (1) a definite articledenoting gender (der for masculine and die for feminine), (2)the words Mann (“man”) and Frau (“woman”), (3) the sym-bols for male and female, or (4) pictograms of a man or wom-an. They found that the congruent linguistic article primessped up judgments relative to incongruent trials, for both an-imate and inanimate targets. However, of the other primes,only the Mann/Frau pictograms had an effect, and only onanimate targets. The researchers concluded that the grammat-ical gender of objects does not appear to “seep” into the se-mantic content of inanimate nouns. A broadly similar patternwas found in the properties judgment task by Semenuks et al.(2017). Results like these have led some scholars to suggestthat grammatical gender is only relevant to conceptualizationwhen sex is a relevant property of the target (Ramos &Roberson, 2011; Vigliocco et al., 2005). If this is true, thenrelativity is subject to an important constraint, namely thatgrammatical gender only interacts with targets that have bio-logical sex in the first place. We classified experiments interms of whether participants responded to animate or inani-mate targets.

Stronger effects in adults than children

Finally, studies with adults have been thought to provide moresupporting evidence for relativity than studies with children,and particularly very young children. For example, only sixout of 18 Spanish-speaking 3- to 5-year-olds freely sortedpictures of inanimate objects and male and female people intogroups defined by grammatical gender (Martinez & Shatz,1996), and only very limited effects of grammatical genderon object conceptualization, or indeed no effect at all, havebeen found in other studies with young children (Bassetti,2007; Nicoladis & Foursha-Stevenson, 2012; Sera et al.,2002). Weaker effects at younger ages are consistent withthe possibility that it is experience with grammatical genderthat leads to biological sex connotations with objects, but alsowith the possibility that it is metalinguistic knowledge ofgrammatical gender acquired through formal instruction, rath-er than grammatical gender assignment itself, that might

explain some positive results. We classified studies in termsof whether the participants were children (< 18 years old) oradults.

An existential question for relativity: What is“gendered” about grammatical gender?

A crucial question that underpins everything in this reviewconcerns the nature of grammatical gender itself. It has beenpointed out that the “gender” in grammatical gender is notintrinsic to language but is itself an arbitrary, human-madelabel (Bender et al., 2018). At some point in their lives,speakers of gendered languages usually learn that the namesfor the categories are “masculine” and “feminine,” but wouldthey ever choose those names without formal instruction? Aswas highlighted by Foundalis (2002), “masculine” and “fem-inine” are in fact poor predictors of the majority of nouns intheir class, and even the relationship between the meanings ofthe words gender and sex are stronger for speakers ofnongendered languages, such as English; speakers of gen-dered languages would not usually use the translation equiv-alent for “gender” in grammatical gender to refer to biologicalsex at all.

Grammatical gender has usually been perceived as a par-ticularly useful tool to study relativity because its assignmentpatterns are so arbitrary and have no psychological realityoutside of language itself, but the same point could be leveled,with more detrimental consequences for the research para-digm, at the titles of the categories themselves, which aremetalinguistic labels (they describe something about lan-guage) detached from the use of the grammar itself.Grammatical gender therefore appears to suffer from an iden-tity problem that other tools used to investigate relativity donot. To illustrate, we might as a thought experiment substitutethe terms “masculine” and “feminine” for either “plant” and“nonplant,” “sky” and “earth,” “Group 1” and “Group 2,” “x”and “y,” or any number of arbitrary labels, without interferingwith the performative use of a gendered language itself. Wemight predict that if we told a cohort of Spanish speakers to goabout their day “believing” that the titles of the classes hadchanged to “x” and “y,” they could simply continue to refer tola mesa (thef table) and el libro (them book) with no real-worldconsequences. If, however, we asked the same cohort to ran-domly shuffle their color-word mappings for a day, such thatthey might need to refer to the Spanish word for “green” withthe Spanish word for “blue,” or to swap their spatial preposi-tions, such that “over” might become “under,” we would beasking them to violate the rules of language use, and we wouldexpect there to be errors.

If the labels “masculine” and “feminine” are in a sensehistorical accidents, this has consequences for the use of thesecategories in the relativity paradigm, because effects that havepreviously been attributed to conceptual change might in fact

Psychon Bull Rev (2019) 26:1767–1786 1771

Page 6: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

be the result of simple statistical co-occurrences and associa-tions between two groups of labels: the entirely arbitrary,human-made labels of metalinguistic grammatical classes,on the one hand, and the similarly arbitrary “gender” assign-ment to noun labels, on the other. This would be the equiva-lent, for example, of finding that English speakers conceive ofthe digits 3, 5, 7, and 9 as somehow “weirder” that 2, 4, 6, and8, because the former are arbitrarily labeled as “odd.”Although language would be the vehicle of any statisticalassociations, the outcome becomes trivial in the context ofclassic views of relativity as engaging conceptual change.

What patterns would a statistical co-occurrence accountpredict? We would expect five out of the six parameters de-scribed to influence results. First, speakers of a two-genderedlanguage should show stronger effects than speakers of three-gendered languages, because the relationship between humanbiological sex and grammatical gender is reinforced throughgreater repetition and a stronger gender/sex correlation in hu-man animates, at least when compared to the three-genderedlanguage most commonly used in the field (German). Wewould additionally predict that performing in a gendered lan-guage would give rise to such associations in a way thatperforming in a nongendered language like English mightnot. We would predict that the more salient the roles of bothgender/sex and language in the experiment, the greater theopportunity for associations between biological sex and lan-guage to be engaged. Finally, we would expect the presence ofanimate targets to elicit biological sex information more thaninanimate targets. A sixth possibility, albeit a theoreticallymore tenuous one, is that adults would have more stronglyreinforced associations than children, owing to their greaterquantitative experience of language. Interestingly, all of theseparameter settings have already been cited in the literature asconducive to finding effects of grammatical gender, althoughthese suggestions have not yet been supported by a systematicreview. Assuming that the strong version of relativity is wrong(as we do), it then becomes important to decide whether theeffects of grammatical gender are statistical and associative oran effect of language on the fabric of conceptual representa-tion itself.

Review

The remit of the review

This review includes (1) empirical research with (2) human par-ticipants and (3) real languages and words (rather than languagesandwords invented for the purposes of experimentation), (4) thatis either published or unpublished and (5) is reported in English.4

Studies were also required to test the influence of grammaticalgender on at least some nonhuman targets (excluding, therefore,studies on grammatical gender and gender stereotyping of menand women).

Formal searches were conducted encompassing the years1990 to 2018 using the search terms “grammatical” and “gen-der” together, once with “Whorf” and once with “relativity,”in Web of Science (all) and Google Scholar (first five pages).Additionally, we searched the EThOS PhD thesis bank usingthe terms “grammatical” and “gender” for the broadest possi-ble range of results and the NDLTD thesis bank using “gram-matical gender,” once with “Whorf” and once with “relativi-ty.” All studies from a recent special issue of the InternationalJournal of Bilingualism (volume 20, issue 1) were includedwhere relevant. To ensure maximal representation of relevantdata and to minimize the potential for skewed results owing topotential publication biases (e.g., De Bruin, Treccani, & DellaSala, 2015), we also emailed the corresponding author of ev-ery study revealed in the first stage of the review to requestany unpublished or in-press results. Finally, a call for data wasalso issued on the ResearchGate website as part of a projectlinked to the present review. A further 13 pieces of empiricalresearch not turned up by these searches were added, eitherbecause they were known to the authors or were offered/recommended to the authors during this contact phase. Thefull list of items included and excluded from the review can befound in the supplemental online materials (SOM1).

Classifications by task

Owing to the heterogeneity of methodologies in the field, weopted to sort the individual experiments into eight task types:voice choice, properties judgments, EAST, sex assignment,priming, similarity judgment, association, and object–namememory association. These eight task types were chosen be-cause they were different enough in methodology to be con-sidered in their own right. To illustrate, we felt that the closestpair was voice choice/sex assignment, since both involve ex-plicit biological sex judgments to be made about targets.However, voice choice tasks require the participant to imaginean object speaking, which might recruit thinking about lan-guage in a way that assigning biological sex alone might not.5

We classified every experiment according to the six param-eters previously described as being potentially important tothe outcomes of research. We adopted the following approachto these classifications. Regarding language content, we askedwhether language goes in to the task (e.g., the stimuli arewords), or comes out of the task (e.g., choosing betweenwords such as “male” or “female,” thinking of adjectives inproperty judgment tasks). Where neither occurs, or where any

4 Only one study discovered by the search procedure was eventually excludedfor the absence of an English translation (see SOM1).

5 Additionally, there is at present no empirical evidence that participants assignthe same biological sex to the same objects across both tasks.

Psychon Bull Rev (2019) 26:1767–17861772

Page 7: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

role of language is judged to be highly orthogonal, that studyis classified as low language content. The same approach wastaken to gender/sex content (gender or sex information shouldnot go in to the task or be part of the response or processleading to the response). The other parameters (age, language,number of gender categories, and target type) were readilyclassifiable at face value. Full details of item classificationscan be found in SOM2.

Classifications of results

Given that the results of tasks are sometimes not unambiguousin their support or otherwise of relativity, each individualstudy was classified in terms of one of three outcomes:support, mixed support, or no support. Studies classified asoffering support showed an effect of grammatical gender onconceptualizations consistent with the hypothesis employed.A study was classified as offering no support if there was anabsence of evidence for relativity that could be classified min-imally as mixed support. Studies were classified as offeringmixed support if they showed partial confirmation of an influ-ence, such as an influence of one gender but not another(Bassetti, 2007), marginally significant effects (e.g.,Semenuks et al., 2017), an influence in accuracy but not inresponse times (e.g., Bender et al., 2016a, 2016b), or an influ-ence shown on one criterion but not another (e.g., Konishi,1993). SOM2 lists these classifications for each experiment.

Adjustments for sample size

Given that the sample sizes varied widely from study to studyand condition to condition (from 7 to 924), we report resultstaking sample size into account. This has the obvious benefitof weighting the pattern of results in favor of those studies thatare most likely to be highest-powered and less susceptible tospurious effects. Given that the same participants sometimesperformed multiple conditions or analyses, we also allowedmultiple data points in the review from the same participants.For example, this review allowed for separate data points forinanimate and animate targets from the same group. In suchcases, separate classifications are necessary to ascertain theeffect of different parameters, such that the evidence fromanimate targets might be classified as offering support, butthe results from inanimate targets be classified as no support.Full details can be found in SOM2.

This review adopts a methodological approach somewherebetween a vote-count system (owing to the classifications ofsupport, mixed support, or no support) and a statistical meta-analysis (owing to sample-size adjustments) (cf. Samuel,Roehr-Brackin, Pak, & Kim, 2018). Given the heterogeneityof their research designs, languages tested, and so on, occa-sionally only very small clusters of studies could be consid-ered to be using the same methods, and often these were

studies that came from the same labs and sampled the sametype of linguistic population. We therefore considered thisapproach the better way to provide an overview of the multi-ple ways in which the field has investigated the research ques-tion, and how different designs may culminate in differentoutcomes.

What is not in the review

We excluded studies that looked not for relationships betweentargets and biological sex, but rather for relationships betweenobjects of the same grammatical gender versus objects of dif-ferent grammatical gender (e.g., Almutrafi, 2015, Exp. 2;Bobb & Mani, 2013; Boutonnet et al., 2012; Cubelli,Paolieri, Lotto, & Job, 2011; Kousta et al., 2008; Yorkston& De Mello, 2005). For example, it has been demonstratedthat semantic category judgments about nouns belonging tothe same grammatical gender are processed more quickly thannouns from different grammatical genders (Cubelli et al.,2011), and that grammatical gender information is processedin semantic similarity tasks even when it is irrelevant andundetectable by behavioral measures (Boutonnet et al.,2012). Although such studies offer support for the processingof grammatical gender information when it is apparently task-irrelevant (a useful prerequisite for tasks in this review), andeven show that objects of the same grammatical gender areperceived to be more similar than objects that are not, this isnot the same as demonstrating that objects are conceptualizedas more masculine or feminine as a function of their genderassignment. Such results might be explained in terms of aneffect of membership in the same grammatical category, inde-pendently of biological sex information (cf. Cook, 2016).Because this review is entirely concerned with this specificrelationship such studies, though clearly interesting in theirown right, were omitted.

There have also been studies assessing how and whengrammatical gender is processed in language production andcomprehension, including in bilinguals for whom the gram-matical gender for the same object might be opposite in theirtwo languages (Costa, Kovacic, Fedorenko, & Caramazza,2003; Costa, Kovacic, Franck, & Caramazza, 2003). Again,such studies are not designed to look at whether object con-ceptualization has the potential to be influenced by thebiological sex connotation of their grammatical gender assign-ment, and they were therefore excluded.

The review also does not include one study that does per-tain to grammatical gender and biological sex, but in which ameaningful understanding of the number of participants is notpossible. This was the study by Segel and Boroditsky (2011),in which depictions of personifications and allegories in thou-sands of works of art were classified retrospectively in termsof their gender congruency. This was notionally classified assupport.

Psychon Bull Rev (2019) 26:1767–1786 1773

Page 8: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

Finally, the remit of the review excluded studies that didnot involve human participants but investigated the questionof relativity through connectionist models (e.g., Dilkina,McClelland, & Boroditsky, 2007; Sera et al., 2002) or throughtraining in artificial “languages” (Eberhard, Heilman, &Scheutz, 2005; Phillips & Boroditsky, 2003, Exps. 4 and 5;Sera et al., 2002, Exp. 4) or nonsense words (Konishi, 1994;Vuksanovic, Bjekic, & Radivojevic, 2015). This is becausewe felt that we should limit our scope to behavior more clearlygrounded in the experience of real people with real grammat-ical gender categories.

Results

Overall, the initial search revealed 99 individual pieces ofresearch, with a further 13 added that were known to theauthors but were not revealed by the formal search. Afterremoving those items that did not provide empirical data, thereview included 43 individual pieces of research, one of whichwas unpublished (Nicoladis, 2019), and three of which weredoctoral or master’s theses (Almutrafi, 2015; Degani, 2007;Landor, 2014). The remaining pieces of research were pub-lished journal articles, conference proceedings (always of theCognitive Science Society), or book chapters. Aftersubdividing this research by task type and condition, thesepieces of research resulted in 158 lines of data (split by differ-ences in conditions within experiments), which together sur-veyed 5,895 participants in total.

As we described earlier, we then calculated the number of“samples,” which was 7,334. This number differs from thenumber of participants because it allows for the possibilitythat the same participant might have performed in multipleconditions. It is for this reason that the number of samplescan be higher (but not lower) than the number of participants.We present all our results in the context of samples, rather than

participants, in order to capture these important within-experiment differences.

Overall results

Across the review as a whole, the results from 32% of allsamples were classified as offering support for relativity,24% were classified as offering mixed support, and 43% asoffering no support. With the exception of one particularlylarge study (Montefinese, Ambrosini, & Roivainen, 2019, N= 924 and N = 105, total N = 1,029), there was no evidencethat the results were driven by only a small cluster of highlypowered studies (see Fig. 1). If this outlying study were re-moved, support would be at 38%, mixed support at 28%, andno support at 34%.

In what follows, we first describe the results by task type.We then describe results as a function of task parameters. Afull at-a-glance view of all the results by task type and param-eter can be found in Figs. 2 and 3.

Results by task type

Properties judgment Full results for the properties judgmenttask are displayed in Table 1. Properties judgments made up37% of all samples in the review, making it the most common-ly performed task in the literature (i.e. it comes with thehighest number of samples, rather than the highest numberof uses in the literature). Only 3% of samples were classifiedas offering support (Flaherty, 2001; Imai et al., 2014; Saalbachet al., 2012). A further 23% were classified as providingmixed support; reasons were the finding that results weremore consistent with grammatical gender in one group thananother (that spoke a different language), but apparently notmore so than chance itself (Haertlé, 2017); results limited toone property but not another, despite evidence that both werelinked to biological sex (Konishi, 1993); evidence to suggest

Fig. 1 Number of samples (vertical axis) for each research item or line of the review (horizontal axis). The mean and median samples are displayed in thetext box. The outlier is Montefinese et al. (2019) with a property judgment task (N = 924 Italian speakers).

Psychon Bull Rev (2019) 26:1767–17861774

Page 9: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

an effect of the grammatical gender of a language the partic-ipants did not speak, with no direct comparison of this effectwith the language they did speak (Sedlmeier, Tipandjan, &Jänchen, 2016); and effects limited to second- and third-choice, but not first-choice, adjectives (Semenuks et al.,2017). The remaining 75% of samples offered cases of nosupport at all (Flaherty, 2001; Imai et al., 2014; Landor,2014; Mickan, Schiefke, & Stefanowitsch, 2014;Montefinese et al., 2019; Semenuks et al., 2017). It shouldbe noted that the study by Montefinese et al. represents anextreme outlier. The results of this study were classified asno support. However, even if the results of this study wereremoved, the rate of support for properties judgment taskswould only rise to 4%. All properties judgment tasks wereintrinsically high in language content, and all have so far beenconducted with adult participants. Given the almost floor-level overall rate of support, an examination of the effect ofdifferent factor parameters was not conducted.

Voice choice The full results for the voice choice task aredisplayed in Table 2. The voice choice paradigm, althoughthe most common experimental task for researchers, is thesecondmost commonly performed by participants in the field,accounting for 28% of all samples. Of these, 64% were clas-sified as offering support (Almutrafi, 2015; Athanasopoulos& Boutonnet, 2016; Beller et al., 2015; Bender et al., 2016a;Haertlé, 2017; Kurinski et al., 2016; Lambelet, 2016; Ramos&Roberson, 2011; Sera et al., 1994; Sera et al., 2002; Vernich,2017; Vernich, Argus, & Kamandulytė-Merfeldienė, 2017).An additional 22% were classified as providing mixed sup-port; these included results consistent with the hypothesis but

limited to one of two genders (Bassetti, 2007), effects forlimited subsets of targets (Beller et al., 2015; Bender et al.,2016a), and statistically marginal results (Bender et al., 2018).Mixed results also included cases in which the data suggestedthat voice choices were not more consistent with grammaticalgender than chance levels (Forbes, Poulin-Dubois, Rivero, &Sera, 2008; Sera et al., 2002), and effects limited to nativespeakers but not learners (including advanced learners6) ofthe same language (Kurinski & Sera, 2011). A minority ofsamples offered no support at all (12%) (Bassetti, 2007;Bender et al., 2018; Forbes et al., 2008; Sera et al., 1994;Sera et al., 2002).

All voice choice tasks were classified as having a highgender content and high language content. A greater rate ofsupport for relativity was found when participants spoke alanguage with two genders (69%) instead of three (24%).There was also more support from adult samples (69%) thanfrom children (39%). To illustrate these comparisons, the ma-jority of no-support cases came from participants who were 5-6 year-olds (Sera et al., 1994; Sera et al., 2002), 9-year-olds,(Bassetti, 2007), or adult speakers of German, a three-gendered language (Bender et al., 2018). In contrast, therewas no evidence that voice choices were more consistent withgrammatical gender when applied to animate (38%) than toinanimate objects (68%), and a lower rate of support was

6 This study included 102 participants of varying levels of Spanish. Any effectof grammatical gender in the advanced learners in this experiment was con-founded with the usually more reliable natural/artificial distinction (Mullen,1990). Note that if this item were included as Support owing to the 26 native-speaking participants’ performance alone, it would only move these 26 sam-ples (the number of native Spanish speakers) from mixed support to support.

Fig. 2 Classifications of support, mixed support, and no support by task type. The total number of samples is shown on the y-axis.

Psychon Bull Rev (2019) 26:1767–1786 1775

Page 10: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

found when assigning voices in one’s gendered language(56%) relative to one’s ungendered language (73%). Someof these results might be skewed by the relative dearth of childsamples, samples who performed the task in a low gendered-language context, with a three-gendered language back-ground, or with animate targets.

Sex assignment Full results for the sex assignment task aredisplayed in Table 3. Sex assignment tasks are almost identicalto voice choice tasks, in that they are explicit assignments of

male or female sex to animate and inanimate objects. Largelythanks to the study by Belacchi and Cubelli (2012),representing 412 samples (46% of all samples with this para-digm), sex assignment makes up 12% of the samples in thereview.

Overall, 66% of samples were classified as offering support(Belacchi & Cubelli, 2012; Flaherty, 2001; Pavlidou &Alvanoudi, 2018; Sera et al., 1994), 26% as mixed support(Bender et al., 2016b; Nicoladis, 2019; Nicoladis et al., 2016;Nicoladis & Foursha-Stevenson, 2012; Pavlidou &Alvanoudi, 2013), and 8% as no support (Flaherty, 2001;Nicoladis & Foursha-Stevenson, 2012).

Fig. 3 Classifications of support, mixed support, and no support according to the task parameters. The total number of samples is shown on the y-axis.

Table 1 Numbers of samples according to parameter setting and resultsclassification, for the properties task

Parameter Setting Support Mixed No Support TOTAL

Gender/Sex High 33 0 65 98

Low 40 609 1,942 2,591

Gend. Lang. High 73 336 1,734 2,143

Low 0 273 273 546

No. of Gend. Two 40 60 1,337 1,437

Three 33 276 397 706

Target Animate 33 273 24 330

Inanimate 0 120 1,943 2,063

Age Adults 57 609 1,967 2,633

Children 16 0 40 56

TOTAL 73 609 2,007 2,689

Samples are repeated multiple times across parameters, hence the differ-ences from the bottom-line totals

Table 2 Numbers of samples according to parameter setting and resultsclassification, for the voice choice task

Parameter Setting Support Mixed No Support TOTAL

Gend. Lang. High 837 415 245 1,497

Low 177 32 32 241

No. of Gend. Two 828 187 180 1,195

Three 79 153 97 329

Target Animate 107 99 76 282

Inanimate 1,189 348 201 1,738

Age Adults 1,160 372 140 1,672

Children 136 75 137 348

TOTAL 1,296 447 277 2,020

Samples are repeated multiple times across parameters, hence differencefrom bottom-line totals

Psychon Bull Rev (2019) 26:1767–17861776

Page 11: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

Sex assignment tasks are intrinsically classified as high ingender/sex content. Although they need not also have a highlanguage content, they were judged high for every study inthis review. For animate targets, the rate of support was 100%,higher than for inanimate targets (33%). The rate of supportwas slightly higher in children (69%) than in adults (62%).The rate of support was 75% when participants performed intheir gendered language, but zero in an ungendered language.Finally, the rate of support from two-gendered languages washigh (83%), but from three-gendered languages it was low(23%). Again, some of these comparisons (with the probableexception of age) might be skewed by imbalances in the num-ber of samples that were classified as performing under eachparameter setting.

EAST The full results for the EAST are displayed in Table 4.The EAST is a unique case in this review, because it has onlybeen used by one core group of researchers (Bender et al.,2016a, 2016b, 2018), and only ever with adult speakers ofGerman, which is a three-gender language. It comprises 9%of the samples. There is a good case that it might constitute apriming task, but given its singularity we felt it was best con-sidered as a task category in its own right. Overall, only 11%of the samples were classified as offering support (Benderet al., 2016a, 2018), 56% were classified as offering mixedsupport (Bender et al., 2016a, 2016b, 2018), and 33% as of-fering no support (Bender et al., 2016a, 2018).

Variation in the results by parameter should be interpretedin the context of the low overall rate of support. The EASTuses a two-key response method, with one clearly mapped to“male” and one to “female”—hence, gender context is alwayshigh—and since the stimuli are always words, language con-text is also high. In fact, the only possible parameter compar-ison that can be made with the EAST concerns inanimate andanimate targets, which showed similar levels of support (10%vs. 11%, respectively).

Priming The full results for priming tasks are displayed inTable 5. There are only five different pieces of research withpriming experiments in the literature, comprising 8% of thesamples in the review. Care must be taken when attempting tointerpret parameter patterns from only a handful of studies,where settings can be entirely confounded with individualarticles. Overall, priming offered a 34% support rate (Benderet al., 2011; Sato & Athanasopoulos, 2018) and a 66% no-support rate (Bender et al., 2011; Degani, 2007; Mickan et al.,2014; Samuel et al., 2016). There were no cases of mixedclassifications. Given the small overall numbers of support(198 samples in total), coming from only two articles, com-parisons were unlikely to reveal any reliable patterns.

Similarity judgment The full results for similarity tasks aredisplayed in Table 6. Similarity judgment tasks comprisedonly 3% of all samples. The pattern of results, indicating44% support (Phillips & Boroditsky, 2003), 45% mixed sup-port (Sedlmeier et al., 2016), and 11% no support (Degani,2007), is entirely confounded with the three individual piecesof research to use the task. The small number of samples withthis taskmake it difficult to drawmeaningful conclusions as towhat might lead to the differences in results.

Table 5 Numbers of samples according to parameter setting and resultsclassification, for the priming tasks

Parameter Setting Support Mixed No Support TOTAL

Language High 170 0 389 559

Low 28 0 0 28

Gender/Sex High 170 0 317 487

Low 28 0 72 100

Gend. Lang. High 142 0 317 459

Low 56 0 72 128

No. of Gend. Two 56 0 67 123

Three 142 0 306 448

Target Animate 170 0 48 218

Inanimate 28 0 341 369

TOTAL 198 0 389 587

Samples are repeated multiple times across parameters, hence differencefrom bottom-line totals

Table 3 Numbers of samples according to parameter setting and resultsclassification, for the sex assignment task

Parameter Setting Support Mixed No Support TOTAL

Gend. Lang. High 594 151 48 793

Low 0 78 27 105

No. of Gend. Two 538 38 75 651

Three 56 191 0 247

Target Animate 412 0 0 412

Inanimate 30 61 0 91

Age Adults 214 131 0 345

Children 380 98 75 553

TOTAL 594 229 75 898

Samples are repeated multiple times across parameters, hence differencefrom bottom-line totals

Table 4 Numbers of samples according to parameter setting and resultsclassification, for the EAST

Parameter Setting Support Mixed No Support TOTAL

Target Animate 30 89 146 265

Inanimate 40 274 70 384

TOTAL 70 363 216 649

Psychon Bull Rev (2019) 26:1767–1786 1777

Page 12: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

AssociationThe full results for similarity tasks are displayed inTable 7. Only two pieces of research have employed an asso-ciation paradigm (Bender et al., 2018; Martinez & Shatz,1996), which together comprise only 2% of the samples inthe review. All the studies with this paradigm were classifiedas offering no support. Gender content and language contentwere always high, and participants were always tested on in-animate targets and in a gendered-language context.

Object–name memory association The full results for object–name association tasks are displayed in Table 8. Comprisingjust under 2% of samples in the review, object–name memoryassociation paradigms form the smallest task-type classifica-tion in this review. The task has been used in three separatepieces of research. Of the samples, 36% came under support(Boroditsky & Schmidt, 2000), 31% mixed support(Kaushanskaya & Smith, 2016), and 33% no support(Pavlidou & Alvanoudi, 2013). Note that these differencesare split entirely by publication. All these studies were classi-fied as having a high gender content and high language con-tent. All were conducted with adult participants, and all in-cluded inanimate targets.

Results by task parameters

The distribution of samples displayed in Fig. 3 points to anumber of imbalances in the literature to date. The samplesin this review were typically involved in experiments with ahigh language content (98%). The samples were usuallyadults (87%), performing in their gendered language (83%),with inanimate targets (76%), and with high sex/gender sa-lience in the task (62%). Samples also usually spoke a lan-guage with two gender categories (57%). In other words, theaverage experiment incorporated five out of the six parametersettings that are usually considered most conducive to resultsin support of relativity (inanimate targets being the exception).

Changes in the rate of support as a function of task param-eters are displayed in Table 9. In the following sections wedescribe comparisons where it was possible to isolate onecategory from another. A study that puts speakers of two-gendered and three-gendered languages into the same group,for example, was excluded entirely rather than added to bothcategories.

Table 9 Summary of shifts in the rate of support as a function of taskparameters

Parameter Setting Support Statistical Association Account

Gender/Sex High 51% (+ 49% Support) = Consistent

Low 2%

Language High N/A N/A

Low N/A

Gend. Lang. High 29% (– 3% Support) = Not consistent

Low 32%

No. of Gend. Two 43% (+ 27% Support) = Consistent

Three 16%

Target Animate 50% (+ 23% Support) = Consistent

Inanimate 27%

Age Adults 29% (– 26% Support) N/A

Children 55%

The rightmost column displays the outcomes in terms of the indicators ofa statistical association account rather than a relativity account.“Language” was almost entirely (98%) set at high

Table 6 Numbers of samples according to parameter setting and resultsclassification, for the similarity tasks

Parameter Setting Support Mixed No Support TOTAL

Language High 0 108 27 135

Low 105 0 0 105

Gender/Sex High 105 0 27 132

Low 0 108 0 108

Gend. Lang. High 0 108 27 135

Low 105 0 0 105

No. of Gend. Two 29 0 27 56

Three 40 108 0 148

TOTAL 105 108 27 240

Samples are repeated multiple times across parameters, hence differencefrom bottom-line totals.

Table 7 Numbers of samples according to parameter setting and resultsclassification, for the association tasks

Parameter Setting Support Mixed No Support TOTAL

No. of Gend. Two 0 0 18 18

Three 0 0 119 119

Age Adults 0 0 119 119

Children 0 0 18 18

TOTAL 0 0 137 137

Samples are repeated multiple times across parameters, hence differencefrom bottom-line totals

Table 8 Numbers of samples according to parameter setting and resultsclassification, for the object–name association tasks

Parameter Setting Support Mixed No Support TOTAL

No. of Gend. Two 25 35 0 60

Three 16 0 38 54

Gend. Lang. High 0 0 38 38

Low 41 35 0 76

TOTAL 41 35 38 114

Samples are repeated multiple times across parameters, hence differencefrom bottom-line totals

Psychon Bull Rev (2019) 26:1767–17861778

Page 13: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

Gender/sex content Consistent with previous views of theliterature, as well as with a statistical association account ofgrammatical gender effects, studies with a high gender/sexcontent showed a higher rate of support (51%) than did studieswith low gender/sex content (2%). The voice choice, sex as-signment, EAST, association, and object–name associationtask types were always classified as high in gender/sex. Thesamples classified as low came almost entirely from the 2,689(96%) who performed property judgment tasks, which camewith only a 3% support rate. However, only 98 samples fromthis paradigm were classified as high, rendering any moredetailed comparisons unreliable.

Language content Almost all of the research was classified ashigh in language content (98%). The almost complete absenceof research classified as low in language content makes anyattempt to draw conclusions about this parameter liable tomislead. Only the priming and similarity studies by Sato andAthanasopoulos (2018) and Phillips and Boroditsky (2003),respectively—both of which were classified as support—wereclassified as low in language content.We return to this issue inour Discussion.

Gendered versus ungendered language A slightly lower rateof support was found for studies performed in a gendered lan-guage (29%) than in an ungendered language (32%). This is notconsistent with a statistical association account. When we com-pare performance in gendered and ungendered languages at thewithin-task level, we see that support is higher in a genderedlanguage context than in an ungendered language context in thesex assignment task (75% vs. 0%) and the properties task, albeitin the latter case with very low rates (3% vs. 0%). Support isslightly higher in the ungendered language for voice choice(73% vs. 56%) and priming tasks (44% vs. 31%). It is alsohigher for similarity tasks (100% vs. 0%) and object–nametasks (54% vs. 0%). Although 83% of all samples performedin a gendered language context, meaning that comparisonswere based on imbalanced sample sizes, the pattern of resultswithin tasks suggests that there is no clear support for the hy-pothesis that grammatical gender is more likely to influencethought when participants perform in a gendered language.

Two-gender versus three-gender languages The reviewshowed higher rates of support from studies with two-genderlanguages (43%) than with three-gender languages (16%).This outcomes is consistent with a statistical association ac-count. Broken down by task type, support from two-genderedlanguages over three-gendered languages came from the sexassignment tasks (83% vs. 23%), voice choice tasks (69% vs.24%), similarity tasks (52% vs. 27%), priming tasks (46% vs.32%), and object–name association tasks (42% vs. 30%).Only properties tasks (3% vs. 5%) reversed this pattern, albeitwith negligible support rates in each category.

Animate versus inanimate targets Consistent with a statisticalassociation account, studies with animate targets showed ahigher rate of support (50%) than did studies with inanimatetargets (27%). Broken down by task, this pattern was true ofsex assignment tasks (100% vs. 33%), priming tasks (78% vs.8%), and properties tasks (10% vs. 0%). The results from theEASTwere almost matched (11% vs. 10%). The reverse pat-tern was found for voice choice tasks (38% vs. 68%). Notethat the great bulk of the positive results from inanimate tar-gets comes from the voice choice task (90% of samples).

Adults versus children In apparent contrast with some viewsexpressed in the literature, research with children (55%) re-vealed a higher rate of support than research with adults(29%). However, this result is strongly weighted by task type.Overall, 87% of the samples came from adult participants, andthe great bulk of the data from children came from voicechoice and sex assignment tasks (92%). Given that the tasksthat children performed provided the highest rates of support,and those performed almost exclusively by adults providedthe lowest (e.g., properties judgments), it is difficult to knowwhether this outcome is the result of an age-related differenceor a task-related difference. Looking at age-related perfor-mance at the level of the individual task, there is some evi-dence that adults show a greater influence of grammaticalgender than children do; we see that the rate of support is30% higher in adults than children in voice choice tasks(69% vs. 39%), although it is 7% lower in adults sex assign-ment tasks (62% vs. 69%).

Other patterns

Almost half of all the data in the review (40%) came fromthe voice choice and sex assignment tasks, the two para-digms that make the clearest demands on participants toconsider targets in terms of biological sex. Since the po-tential for the strategic use of grammatical gender undersuch circumstances has been one of the most frequentissues brought up in the literature, we compared the rateof support from the review as a whole with the resultswith these two paradigms included or excluded. Overallsupport across the review drops from 32% to only 11% intheir absence, and no support rises from 43% to 64%. Inother words, when all the data are included, approximate-ly one in three samples in the review provides support;when the data from voice choice and sex assignment areexcluded, this rate drops to one in ten.

Discussion

At its broadest, the review shows that the evidence for aninfluence of grammatical gender on conceptualizations is

Psychon Bull Rev (2019) 26:1767–1786 1779

Page 14: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

highly task- and context-dependent. We found that the voicechoice and sex assignment tasks formed the backbone of sup-port for relativity; when they were removed, the support ratedropped to only 11%.With them included, about a third (32%)of the data were classified as support, relative to a no-supportrate of 43% and a mixed-support rate of 24%. If we considerthe possibility that publication biases mean that fewer nullresults make it to publication, it may be that even this supportrate is an overestimate.

The review provides support for a number of important con-straints on the relativity hypothesis. For example, the rate ofsupport is higher when the gender content of a task is high thanwhen it is low, suggesting any influence might be at least partlycontingent on the opportunity to strategically call upon gram-matical gender. Results are also more likely to be classified assupport when participants are processing animate rather thaninanimate targets, which also suggests that language might bepartly contingent on the immediacy of the overlap betweengrammatical gender and biological sex. This finding arguesagainst a singular, uniform effect of gender category on all itsmembers. Finally, results were more likely to offer supportwhen the gendered language has two gender categories ratherthan three; a finding that is inconsistent with a straightforwardaccount of grammatical gender classification per se influencingthe conceptualizations of objects. The review initially appearedto reveal one misconception concerning age; there was actuallya higher rate of support from samples of children than adults.Upon closer inspection, this outcome was closely bound to thefact that children performed those tasks that most consistentlyproduced positive results for relativity, namely voice choice andsex assignment. However, not all the predicted biases weresupported. We found no evidence that support was more com-mon when participants performed a task in a gendered-language context, which is what is predicted by thinking-for-speaking accounts. The only parameter that could not be mean-ingfully assessed at all concerned the salience of language in thetask, an issue that we return to later in this Discussion.

What these parameter-related comparisons reveal is thatmuch of the positive evidence for relativity comes from tasksand conditions that are particularly susceptible to alternative,strategy-based explanations. Overall, we therefore take theresults of this review as imposing quite powerful constraintson the relativity hypothesis as seen through the lens of gram-matical gender. Nevertheless, this conclusion itself comeswith a caveat; we feel the review also points to a significantweakness in much of the relevant research’s ability to speak tothe issue of relativity with clarity. This means that future re-search could either cement this rather negative conclusion, oroverturn it through stronger designs that are less susceptible toconfounds. As a result, we suggest that our review provides aninterim rather than final pattern of results.

We focus our discussion on those areas that we feel needaddressing in future work, and make suggestions as to some

ways for experiments to deal with them. First, we describewhy we feel much of the data speak only weakly to the ques-tion of relativity.

How do people solve the tasks?

Some have held that for relativity to be supported there shouldbe a reasonable expectation that participants did not engagegrammatical gender information strategically. This is mostclearly an option in tasks in which judgments are about sexand are explicit, such as voice choice and sex assignment, butpossibly also for similarity, association, and object–namememory associations. It is therefore important to distinguishbetween two means of arriving at decisions in many of thetasks in this review: one as a result of conceptual change, asusually hypothesized in relativity accounts, and anotherthrough metalinguistic knowledge. Metalinguistic knowledgerefers to the influence of the knowledge of a formal propertyof language, such as grammatical gender, on judgments aboutobjects. Although both processes are interesting, it is relativitythat the research in this review was designed to investigate.The problem is that the voice choice and sex assignment tasksupon which the bulk of the support for relativity apparentlyrests cannot tell the two apart.

It might be argued that researchers can simply ask partici-pants how they performed the task, and therefore be in a po-sition to rule out a metalinguistic strategy as a result.Interestingly, the cases in which participants have been askedhow they came to their decisions have thrown up mixed re-sults (e.g., Almutrafi, 2015; Kurinski & Sera, 2011; Sato &Athanasopoulos, 2018). Particularly convincing evidence fora conscious metalinguistic strategy account of voice choiceperformance comes from a study in which 25 out of 30 par-ticipants later admitted to using grammatical gender to guidetheir responses (Almutrafi, 2015). However, although it hasoften been assumed to be so, there is also no a priori reasonthat the use of metalinguistic strategy need be a consciousprocess at all. Regardless of how it might occur, the use ofmetalinguistic knowledge undermines a reading of results asthe outcome of conceptual change.

Some researchers have pointed out that participants rarelyif ever respond in a manner that is 100% consistent with bio-logical sex. This seems to argue against a metalinguistic strat-egy (conscious or otherwise). However, we cannot knowwhether participants had more than one strategy, some moreuniversal (such as masculine for artifacts, feminine for naturalkinds: Mullen, 1990), and some more idiosyncratic or person-al (see for example participants’ justifications for their choicesin Kurinski & Sera, 2011), with different strategies beingbrought to bear at different times throughout the task. Theabsence of a consistent, 100% effect of grammatical genderon task performance therefore does not preclude the

Psychon Bull Rev (2019) 26:1767–17861780

Page 15: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

possibility that metalinguistic strategies might account forsome of the effects that were found.

Is there evidence to support one or the other accountfrom the results of the review? Here again we run intothe logical problem that we cannot tell processes apart.For example, the finding that more support came fromspeakers of two-gendered languages than from speakersof three-gendered languages might be seen as weaken-ing metalinguistic strategy accounts, because there is noreason to believe Spanish speakers should prefer suchstrategies over German speakers, for example. However,the availability or attractiveness of metalinguistic strate-gies might, on the other hand, be enhanced where thereis a neater one-to-one mapping of grammatical gendercategory and biological sex.

Another potential criticism of a metalinguistic ratherthan a relativity account is that in taking the formerview, one subscribes to an intrinsically negative, preju-dicial default that effectively renders relativity empiri-cally impossible to support. Essentially, giving a meta-linguistic alternative explanation equal weight might setthe bar for actual conceptual change accounts impossi-bly high. However, the scope exists to tighten experi-mental design to guard against alternative, “killjoy” ex-planations; though this review suggests that, for themost part, when these measures are in place, the likeli-hood of finding a positive result declines.

In our view, the most convincing evidence of the poten-tial for metalinguistic strategy accounts comes from theresults of property judgment tasks. Since this task type doesnot incorporate a sex/gender prompt it keeps such informa-tion at arm’s length. The very low rate of only 3% supportfrom this task type might therefore reflect the absence ofsuch strategies in performance.

Overall, the voice choice and sex assignment para-digms are the most susceptible to alternative explana-tions, and further research using these tasks withoutserious modification is unlikely to reveal more aboutrelativity. Since these tasks are the most common inthe field, and similar objections can be raised againstother task types such as similarity, object–name memoryassociations, and associations, the pool of informationfrom which we can make the most meaningful infer-ences about the research question is likely to be small.It is for this reason that we feel the case for grammat-ical gender influencing concepts is currently difficult toweigh up and awaits future research (see also thePractical suggestions for future research section below).

Language on language or language on concepts?

The process of judging whether a task incorporates languageis a difficult one. For example, in a voice choice or sex

assignment task participants are sometimes only required toproduce a single word: “male,” “female,” “boy,” and so forth.They might only need to circle a letter M or F on a sheet ofpaper. Does this constitute a language process that might in-advertently recruit grammatical gender itself? What of the roleof the instructions of the task, which are linguistic, in formu-lating a linguistic process to arrive at a response? For almostall the tasks in the field, even language processing of the moreconspicuous kind is unavoidable, such as when making judg-ments based on linguistic stimuli in the EAST, or thinking ofadjectives to describe pictures in properties judgments. Forsome, linguistic relativity research using solely behavioralmeasures (response times and related patterns) is always sus-ceptible to linguistic processes (e.g., Gleitman & Papafragou,2013), and it is for this reason that some now advocate pri-marily neurophysiological approaches (Thierry, 2016).

The argument that language needs to be controlled in rela-tivity research is usually attributed to the thinking-for-speaking argument (Slobin, 1996), which was originallybased on the idea that languages require speakers to attendto certain aspects of a scene, such as temporal and spatialdetails, depending on what information their language re-quired (see also Slobin, 2003). Slobin later also conceived of“thinking for comprehending,” in which the languages wespeak also influences the way that we think about what wecomprehend (Slobin, 2003). Such a view could mean thattasks that present participants with words will also be subjectto the restrictions of thinking for speaking, as might tasks thatuse words about gender or sex in their instructions. Thiswould likely encompass almost all the tasks in this review.

In the tasks in this review, participants did not need toproduce the actual nouns for the target items themselves intheir response. There are some data from the review that wecan bring to bear on this question, though they are not conclu-sive. The thinking-for-speaking theory predicts that effects oflanguage on thought might not extend to performance outsideof the language in which such effects are sourced. Translatedinto grammatical gender research, effects should therefore bestrongest when performing in a gendered than in anongendered language context. This was not the case, thoughonly by a very subtle margin (29% to 32%). However, giventhe fact that 83% of the samples performed in a genderedlanguage context, it is also possible that further data fromresearch employing an ungendered context might lead to achange in this outcome, either in favor of or against thinkingfor speaking.

It is difficult to know where to draw the line between highand low language content. We classified all but two articles(Phillips & Boroditsky, 2003; Sato & Athanasopoulos, 2018),which involved only 133 samples in all, as being high inlanguage content. It could therefore be argued that almostthe entirety of the research in the review could have beentesting for an influence of language on language. We

Psychon Bull Rev (2019) 26:1767–1786 1781

Page 16: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

ourselves do not make this claim; this is almost certainly toostrong a conclusion to draw, given the heterogeneity of taskdesigns. In its broadest sense, the philosophical debate aroundthe involvement of language in behavior is beyond the remitof this review. More practically, however, we believe it diffi-cult to argue that the tasks described in this review varyenough in their language content to allow for meaningfulcomparisons along this dimension.

Practical suggestions for future research

We divide our suggestions for future research into two sec-tions, in order of importance. First, we point out that the re-sults of the review support the possibility that there might be afundamental flaw in the use of grammatical gender as a tool tospeak to the question of relativity at all. Second, we make thecase that if grammatical gender can provide an insight, thenfuture tasks would benefit from an overhaul in order to bettercontrol for alternative explanations.

Returning to the question of what is “gendered” about gram-matical gender? The results of the review make the case thateffects of grammatical gender are for the most part predictedby parameter settings that would be consistent with a statisti-cal association account, at least as well as by a conceptual-change view of relativity. This is because most settings thatpromote the association between biological sex and grammat-ical gender enhance the probability that effects will be found.

That relativity is scaffolded by associations between lan-guage and thought, rather than language as thought, is a viewthat is partially consistent with contemporary thought in thefield, such as the label-feedback hypothesis (Lupyan, 2012)and its offshoot the structural feedback hypothesis (Sato &Athanasopoulos, 2018). These theories contend that labels orgrammatical information hone attention to associated features,which in turn feed back down to lower-level processes in afeedback loop. These effects can be upregulated or downreg-ulated by the salience of the relevant linguistic information inthe task. The results of the review, as well as results fromstudies in which participants are briefly trained in inventedor real languages and come to behave in line with those lan-guages (e.g., Boroditsky, 2001; Casasanto, 2010; Phillips &Boroditsky, 2003), suggest that this is indeed the case.However, where a statistical association account departs fromthese accounts is that the latter accounts allow for some degreeof change at the conceptual level, but a statistical associationaccount does not. A statistical association account would pre-dict that if an Italian speaker is processing the target “bed” inthe context of gender/sex, for example, then the concept ofmasculinity might receive activation by an association ratherthan by any lasting conceptual rub-off. This would be similar,for example, to the statistical association between the conceptsof sunshine and ice cream; we would be unlikely to conceive

of sunshine as being similar to ice cream. Put simply, a statis-tical association account need not require conceptual change,especially long-lived change, to occur at all, and would there-fore be incompatible with the spirit of relativity in any of itstheoretical incarnations.

As we stated in our introduction, this review is not in aposition to make such a distinction between relativity and itsalternatives, in part because more data are required, but alsobecause our review did not find enough unambiguous supportfor relativity to discriminate between the possibilities. Forexample, it is difficult to assess positive results from voicechoice and sex assignment in the light of the hierarchical tax-onomy of relativity accounts by Wolff and Holmes (2011),which ranges from the strongest form of relativity (“thoughtis language”) through to subtler effects, such as “language asspotlight,” when a nonrelativistic account is at least as likelyan explanation for the results from such tasks. Instead, ourreview is in a position to weigh up the size of the problem inrelating much grammatical gender research to relativity at all,in any of its forms. Of the five principal parameter settings thata statistical association account would predict, one (languagecontent) was impossible to draw meaningful inferences from;three (target type, number of gender categories, and salienceof sex/gender) resulted in higher overall rates of support, andonly one (gendered language context) was equivocal. It isperhaps important to note that the latter parameter did notrun powerfully in the opposite direction to what a statisticalassociation account would predict; there was only a – 3%support rate difference, far smaller than the next smallest dif-ference of + 23%, which was instead in favor of the account.The sixth parameter—age—was also equivocal, but was lessimportant for the account in any case. We therefore interpretthese results as framing and underlining the case for a statis-tical association account, by which arbitrary labels for gram-matical classes interact with arbitrary assignments of nouns tothose classes, under conditions that facilitate their association.This is not to imply that we prefer such an account, or to ruleout the possibility that multiple factors might have simulta-neous and additive effects. It does imply, however, that gram-matical gender is presently a foggier lens through which toinspect the case for relativity than the domains of categoricalperception, space or time, to name a few.

As a first step, it would be useful to establish whether“masculine” and “feminine” are psychologically privileged“attractor” concepts that impose their status on other objectsin their grammatical class, or whether these metalinguisticlabels are themselves arbitrary. This is important because itwould help researchers to understand whether the idea of arelationship between grammatical “gender” and biological sexhas any psychological reality. If it does, then it becomes morelikely that the members of a class are in some sense imbuedwith this conceptual relationship, and the case for conceptual-change accounts would be enhanced.

Psychon Bull Rev (2019) 26:1767–17861782

Page 17: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

There is already a study that suggests a method by which totest this. In one experiment, not included in this review be-cause it involved invented languages, native English speakerswere taught “Gumbuzi,” an artificial language with two arti-ficial grammatical “gender” groups, labeled “soupative” and“oosative” (Phillips & Boroditsky, 2003). Participants weretaught ten items in each group, six of which were inanimateobjects, and the remaining four were humans who were eitherall female or all male. After learning which items wereassigned to which category, participants rated the similarityof human–object pairs both within and across the two groups.The results showed that pairs from the same group were ratedas being more similar than items from different groups, lead-ing the authors to conclude that there can be a causative (i.e.,learned) relationship between grammatical gender and peo-ple’s conceptualizations of objects.

Since 40% of all the items in a group were humans of thesame biological sex, the groups were strongly biased towardsex/gender, regardless of the labels “soupative” and“oosative.” It is also likely that the participants were awareof such things as grammatical gender categories through for-mal second language instruction in schools, knowledge ofwhich they could have applied to the task. Additionally, thelabels “oosative” and “soupative” fail to actually describe anyof the items within each group; real grammatical gender cate-gories are at least partially correlated with the biological sex ofits members. Nevertheless, this study provides a template for afuture study that might teach one group of participants thatgroups are called masculine and feminine, and another groupthat the groups are named after another and equallyrepresented natural kind. If the arbitrary labeling of the clas-sifiers themselves drives performance, then the results of sim-ilarity ratings should pattern in line with other labels at least asmuch as they would with masculine and feminine. This wouldsuggest that any influence of grammatical gender is a human-made one that is independent of linguistic structure and lack-ing in psychological reality, undermining the notion of a con-ceptual relationship between grammatical gender and biolog-ical sex, and in turn favoring a statistical associationrelationship.

Dulling Occam’s razor If grammatical gender is not merely acultural label, then we follow Ramos and Roberson (2011)and others who suggest that studies be conducted that aim torestrict both gender and language to as oblique a role aspossible. Property judgment tasks do the former very well,but language remains fundamental, and in any case theevidence from these tasks is to date overwhelminglynegative. Instead, the priming tasks by Sato andAthanasopoulos (2018) would seem strong candidates for fu-ture investigations. In their first experiment, French–Englishparticipants were found to be slower to indicate whether twoobjects were associated with a male or female face when the

grammatical gender of those objects was incongruent with thebiological sex of the person. In a second experiment, partici-pants matched one of two trait words (e.g., “charming,” “re-alistic”) to a now genderless face after being primed with apair of objects, such as a tie and a spade. The results againpointed to an influence of the grammatical gender of the ob-jects in French–English bilinguals’ choices. These studies,while not eliminating biological sex and language altogether,keep some distance between these and participants’ actualresponses, because associations come from the task-irrelevant grammatical genders of objects that participantswere presented with earlier. Future work might find a way ofmaking this distance greater still, and include direct statisticalcomparisons between speakers of a gendered language andspeakers of a nongendered language in order to establish moreclearly that any effects are attributable to grammatical genderspecifically.7

Limitations

This review represents, to the best of our knowledge,the first systematic attempt to assess the literature in aquantitative manner. The heterogeneity of methods andtheir uneven representation in the literature presented uswith a difficult decision; to group together research ofdifferent types, or to provide a finer-grained picture. Wetook the view that it was better in a first review toprovide a nuanced picture that takes into account differ-ences between, for example, instructing participants toassign a voice to an object or a sex to an object. Thisdoes have the drawback of making it harder to makereliable inferences based on less well-used tasks, in par-ticular object–name memory associations, associationtasks, and similarity tasks. On the other hand, it makesit easier for later work to be incorporated into futurereviews.

A more difficult and subjective issue concerns the interpre-tation of results classified as offering mixed support. The ar-gument for the inclusion of this category is to our minds quitecompelling. If we take, for example, the finding that trainingin Spanish improves the rate at which voice choices are con-sistent with grammatical gender, but nevertheless fails to raisethis rate above chance (Kurinski & Sera, 2011), neither anentirely cautious approach (i.e., this result finds no supportfor relativity) nor an endorsement (voice choices are consis-tent with grammatical gender) naturally follows, and the needfor a third category becomes clear. To present as objective aview as possible, we have for the most part focused on the rateof support in the first instance, the rate of no support second,and mixed support only where it is necessary, such as in

7 These studies also tested monolingual English speakers, but the pertinentanalyses were performed within groups.

Psychon Bull Rev (2019) 26:1767–1786 1783

Page 18: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

conditions in which there are few data on either side of thismiddle category.

Conclusion

In conclusion, our review showed that support for an influenceof grammatical gender on concepts is strongly task- and con-text-dependent. Support also comes for the most part fromtasks that are susceptible to clear alternative explanations.Perhaps most importantly, it needs to be empiricallyestablished that grammatical gender itself is not a culturallabel but a concept with psychological reality before any in-fluence can be reasonably attributed to truly linguisticprocesses.

Compliance with ethical standards

Conflict of interest The authors report no conflicts of interest.

Open practices statement Details about the review and the searchmechanisms can be found in the supplemental online materials (SOM1and SOM2).

References

Almutrafi, F. (2015). Language and cognition: Effects of grammaticalgender on the categorisation of objects (Unpublished doctoral the-sis). Newcastle University, Newcastle upon Tyne, UK

Athanasopoulos, P. (2006). Effects of the grammatical representation ofnumber on cognition in bilinguals. Bilingualism: Language andCognition, 9, 89–96.

Athanasopoulos, P. (2009). Cognitive representation of colour in bilin-guals: The case of Greek blues. Bilingualism: Language andCognition, 12, 83–95.

Athanasopoulos, P., & Boutonnet, B. (2016). Learning grammatical gen-der in a second language changes categorization of inanimate ob-jects: Replications and new evidence from English learners of L2French. In R. Alonso (Ed.), Cross-linguistic influence in secondlanguage acquisition (pp. 173–192). Bristol, UK: MultilingualMatters.

Athanasopoulos, P., & Bylund, E. (2013). Does grammatical aspect affectmotion event cognition? A cross-linguistic comparison of Englishand Swedish speakers. Cognitive Science, 37, 286–309.

Bassetti, B. (2007). Bilingualism and thought: Grammatical gender andconcepts of objects in Italian–German bilingual children.International Journal of Bilingualism, 11, 251–273.

Bassetti, B., & Nicoladis, E. (2016). Research on grammatical gender andthought in early and emergent bilinguals. International Journal ofBilingualism, 20, 3–16.

Belacchi, C., & Cubelli, R. (2012). Implicit knowledge of grammaticalgender in preschool children. Journal of Psycholinguistic Research,41, 295–310.

Beller, S., Brattebø, K. F., Lavik, K. O., Reigstad, R. D., & Bender, A.(2015). Culture or language: What drives effects of grammaticalgender? Cognitive Linguistics, 26, 331–359.

Bender, A., Beller, S., & Klauer, K. C. (2011). Grammatical gender inGerman: A case for linguistic relativity? Quarterly Journal ofExperimental Psychology, 64, 1821–1835.

Bender, A., Beller, S., & Klauer, K. C. (2016a). Crossing grammar andbiology for gender categorisations: Investigating the gender congru-ency effect in generic nouns for animates. Journal of CognitivePsychology, 28, 530–558.

Bender, A., Beller, S., & Klauer, K. C. (2016b). Lady Liberty andGodfather Death as candidates for linguistic relativity?Scrutinizing the gender congruency effect on personified allegorieswith explicit and implicit measures. Quarterly Journal ofExperimental Psychology, 69, 48–64.

Bender, A., Beller, S., & Klauer, K. C. (2018). Gender congruency from aneutral point of view: The roles of gender classes and conceptualconnotations. Journal of Experimental Psychology: Learning,Memory, and Cognition, 44, 1580–1608.

Bobb, S. C., & Mani, N. (2013). Categorizing with gender: Does implicitgrammatical gender affect semantic processing in 24-month-old tod-dlers? Journal of Experimental Child Psychology, 115, 297–308.

Boroditsky, L. (2001). Does language shape thought? Mandarin andEnglish speakers’ conceptions of time. Cognitive Psychology, 43,1–22.

Boroditsky, L., & Schmidt, L. A. (2000). Sex, syntax, and semantics. InL. R. Gleitman & A. K. Joshi (Eds.), Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society (pp.42–46). Mahwah, NJ: Erlbaum.

Boutonnet, B., Athanasopoulos, P., & Thierry, G. (2012). Unconsciouseffects of grammatical gender during object categorisation. BrainResearch, 1479, 72–79.

Brown, A., Lindsey, D. T., & Guckes, K. M. (2011). Color names, colorcategories, and color-cued visual search: Sometimes, color percep-tion is not categorical. Journal of Vision, 11(12), 2:1–21. https://doi.org/10.1167/11.12.2

Brown, R., & Lenneberg, E. (1954). A study in language and cognition.Journal of Abnormal and Social Psychology, 49, 454–462.

Casasanto, D. (2010). Space for thinking. In V. Evans & P. Chilton (Eds.),Language, cognition and space: The state of the art and newdirections (pp. 453–478). London, UK: Equinox.

Casasanto, D., Boroditsky, L., Phillips, W., Greene, J., Goswami, S.,Bocanegra-Thiel, S., . . . Gil, D. (2004). How deep are effects oflanguage on thought? Time estimation in speakers of English,Indonesian, Greek, and Spanish. In K. Forbus, D. Gentner, & T.Regier (Eds.), Proceedings of the 26th Annual Conference of theCognitive Science Society (pp. 186–191). Mahwah, NJ: Erlbaum.

Cook, S. V. (2016). Gender matters: From L1 grammar to L2 semantics.Bilingualism: Language and Cognition, 1–19.

Corbett. (1991). Gender. Cambridge, UK: Cambridge University Press.Costa, A., Kovacic, D., Fedorenko, E., & Caramazza, A. (2003). The

gender congruency effect and the selection of freestanding andbound morphemes: Evidence from Croatian. Journal ofExperimental Psychology: Learning, Memory, and Cognition, 29,1270–1282.

Costa, A., Kovacic, D., Franck, J., & Caramazza, A. (2003). On theautonomy of the grammatical gender systems of the two languagesof a bilingual. Bilingualism: Language and Cognition, 6, 181–200.

Cubelli, R., Paolieri, D., Lotto, L., & Job, R. (2011). The effect of gram-matical gender on object categorization. Journal of ExperimentalPsychology: Learning, Memory, and Cognition, 37, 449–460.

De Bruin, A., Treccani, B., & Della Sala, S. (2015). Cognitive advantagein bilingualism: An example of publication bias? PsychologicalScience, 26, 99–107.

Degani, T. (2007). The semantic role of gender: Grammatical and bio-logical gender match effects in English and Spanish (Unpublishedmaster’s thesis). University of Pittsburgh, Pittsburgh, PA.

Dilkina, K., McClelland, J. L., & Boroditsky, L. (2007, August). Howlanguage affects thought in a connectionist model. Paper presentedat the 29th Annual Meeting of the Cognitive Science Society,Nashville, TN.

Psychon Bull Rev (2019) 26:1767–17861784

Page 19: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

Dolscheid, S., Shayan, S., Majid, A., & Casasanto, D. (2013). The thick-ness of musical pitch: Psychophysical evidence for linguistic rela-tivity. Psychological Science, 24, 613–621.

Eberhard, K. M., Heilman, M., & Scheutz, M. (2005, July). An empiricaland computational test of linguistic relativity. Paper presented at the27th Annual Meeting of the Cognitive Science Society, Stresa, Italy.

Firestone, C., & Scholl, B. J. (2016). Cognition does not affect percep-tion: Evaluating the evidence for “top-down” effects. Behavioraland Brain Sciences , 39 , e229. https://doi.org/10.1017/S0140525X15000965

Flaherty, M. (2001). How a language gender system creeps into percep-tion. Journal of Cross-Cultural Psychology, 32, 18–31.

Forbes, J. N., Poulin-Dubois, D., Rivero, M. R., & Sera, M. D. (2008).Grammatical gender affects bilinguals’ conceptual gender:Implications for linguistic relativity and decision making. OpenApplied Linguistics Journal, 1, 68–76.

Foundalis, H. E. (2002). Evolution of gender in Indo-European lan-guages. In W. D. Gray & C. Schunn (Eds.), Proceedings of theAnnual Meeting of the Cognitive Science Society (pp. 304–308).Mahwah, NJ: Erlbaum.

Franklin, A., Clifford, A.,Williamson, E., &Davies, I. (2005). Color termknowledge does not affect categorical perception of color in tod-dlers. Journal of Experimental Child Psychology, 90, 114–141.

Gilbert, A. L., Regier, T., Kay, P., & Ivry, R. B. (2006). Whorf hypothesisis supported in the right visual field but not the left. Proceedings ofthe National Academy of Sciences, 103, 489–494.

Gilbert, A. L., Regier, T., Kay, P., & Ivry, R. B. (2008). Support forlateralization of the Whorf effect beyond the realm of color discrim-ination. Brain and Language, 105, 91–98.

Gleitman, L., & Papafragou, A. (2013). Relations between language andthought. In D. Reisberg (Ed.), The Oxford handbook of cognitivepsychology (pp. 504–523). NewYork, NY: Oxford University Press.

Haertlé, I. (2017). Does grammatical gender influence perception? Astudy of Polish and French speakers. Psychology of Language andCommunication, 21, 386–407.

Imai, M., & Gentner, D. (1997). A cross-linguistic study of early wordmeaning: Universal ontology and linguistic influence. Cognition,62, 169–200.

Imai, M., Schalk, L., Saalbach, H., & Okada, H. (2014). All giraffes havefemale-specific properties: Influence of grammatical gender on de-ductive reasoning about sex-specific properties in German speakers.Cognitive Science, 38, 514–536.

Kaushanskaya, M., & Smith, S. (2016). Do grammatical-gender distinc-tions learned in the second language influence native-language lex-ical processing? International Journal of Bilingualism, 20, 30–39.

Konishi, T. (1993). The semantics of grammatical gender: A cross-cultural study. Journal of Psycholinguistic Research, 22, 519–534.

Konishi, T. (1994). The connotations of gender: A semantic differentialstudy of German and Spanish. Word, 45, 317–327.

Kousta, S.-T., Vinson, D. P., & Vigliocco, G. (2008). Investigating lin-guistic relativity through bilingualism: The case of grammatical gen-der. Journal of Experimental Psychology: Learning, Memory, andCognition, 34, 843–858.

Kurinski, E., Jambor, E., & Sera, M. D. (2016). Spanish grammaticalgender: Its effects on categorization in native Hungarian speakers.International Journal of Bilingualism, 20, 76–93.

Kurinski, E., & Sera, M. D. (2011). Does learning Spanish grammaticalgender change English-speaking adults’ categorization of inanimateobjects? Bilingualism: Language and Cognition, 14, 203–220.

Lambelet, A. (2016). Second grammatical gender system and grammat-ical gender-linked connotations in adult emergent bilinguals withFrench as a second language. International Journal ofBilingualism, 20, 62–75.

Landor, R. (2014). Grammatical categories and cognition across fivelanguages: The case of grammatical gender and its potential effects

on the conceptualisation of objects (Doctoral thesis). GriffithUniversity, Brisbane, Australia.

Lucy, J. A. (2016). Recent advances in the study of linguistic relativity inhistorical context: A critical assessment. Language Learning, 66,487–515.

Lupyan, G. (2012). Linguistically modulated perception and cognition:The label-feedback hypothesis. Frontiers in Psychology, 3, 54.https://doi.org/10.3389/fpsyg.2012.00054

Martinez, I. M., & Shatz, M. (1996). Linguistic influences on categoriza-tion in preschool children: A crosslinguistic study. Journal of ChildLanguage, 23, 529–545.

Mickan, A., Schiefke, M., & Stefanowitsch, A. (2014). Key is a llave is aSchlussel: A failure to replicate an experiment fromBoroditsky et al.2003. Yearbook of the German Cognitive Linguistics Association, 2,39–50.

Montefinese, M., Ambrosini, E., & Roivainen, E. (2019). No grammati-cal gender effect on affective ratings: Evidence from Italian andGerman languages. Cognition and Emotion, 33, 848–854. https://doi.org/10.1080/02699931.2018.1483322

Mullen, M. K. (1990). Children’s classifications of nature and artifactpictures into female and male categories. Sex Roles, 23, 577–587.

Nicoladis, E. (2019). [Unpublished data].Nicoladis, E., Da Costa, N., & Foursha-Stevenson, C. (2016). Discourse

relativity in Russian-English bilingual preschoolers’ classification ofobjects by gender. International Journal of Bilingualism, 20, 17–29.

Nicoladis, E., & Foursha-Stevenson, C. (2012). Language and cultureeffects on gender classification of objects. Journal of Cross-Cultural Psychology, 43, 1095–1109.

Park, H. I., & Ziegler, N. (2014). Cognitive shift in the bilingual mind:Spatial concepts in Korean–English bilinguals. Bilingualism:Language and Cognition, 17, 410–430.

Pavlidou, T.-S., & Alvanoudi, A. (2013). Grammatical gender and cog-nition. In N. Lavidas, T. Alexio, & A. Sougari (Eds.),Major trendsin theoretical and applied linguistics (Vol. 2, pp. 109–124). London,UK: Versita.

Pavlidou, T.-S., & Alvanoudi, A. (2018). Conceptualizing the world as“female” or “male”: Further remarks on grammatical gender andspeakers’ cognition. In N. Topintzi, N. Lavidas, & M. Moumtzi(Eds.), Selected papers on theoretical and applied linguistics fromISTAL23. Thessaloniki, Greece: School of English, AristotleUniversity of Thessaloniki.

Phillips, W., & Boroditsky, L. (2003, August). Can quirks of grammaraffect the way you think? Grammatical gender and object concepts.Paper presented at the 25th Annual Meeting of the CognitiveScience Society, Boston, MA.

Pinker, S. (1994). The language instinct. New York, NY, US: WilliamMorrow & Co.

Pylyshyn, Z. W. (1984). Computation and cognition. Cambridge, MA:MIT Press.

Ramos, S., & Roberson, D. (2011). What constrains grammatical gendereffects on semantic judgements? Evidence from Portuguese.Journal of Cognitive Psychology, 23, 102–111.

Roberson, D., Pak, H., & Hanley, J. R. (2008). Categorical perception ofcolour in the left and right visual field is verbally mediated:Evidence from Korean. Cognition, 107, 752–762.

Saalbach, H., Imai, M., & Schalk, L. (2012). Grammatical gender andinferences about biological properties in German-speaking children.Cognitive Science, 36, 1251–1267.

Samuel, S., Roehr-Brackin, K., Pak, H., & Kim, H. (2018). Culturaleffects rather than a bilingual advantage in cognition: A reviewand an empirical study. Cognitive Science, 42, 2313–2341.

Samuel, S., Roehr-Brackin, K., & Roberson, D. (2016). “She says, hesays”: Does the sex of an instructor interact with the grammaticalgender of targets in a perspective-taking task? International Journalof Bilingualism, 20, 40–61.

Psychon Bull Rev (2019) 26:1767–1786 1785

Page 20: Grammatical gender and linguistic relativity: A systematic ...THEORETICAL REVIEW Grammatical gender and linguistic relativity: A systematic review Steven Samuel1 & Geoff Cole1 & Madeline

Sato, S., & Athanasopoulos, P. (2018). Grammatical gender affects gen-der perception: Evidence for the structural-feedback hypothesis.Cognition, 176, 220–231.

Sedlmeier, P., Tipandjan, A., & Jänchen, A. (2016). How persistent aregrammatical gender effects? The case of German and Tamil. Journalof Psycholinguistic Research, 45, 317–336.

Segel, E., & Boroditsky, L. (2011). Grammar in art. Frontiers inPsychology, 1, 244. https://doi.org/10.3389/fpsyg.2010.00244

Semenuks, A., Phillips, W., Dalca, I., Kim, C., & Boroditsky, L. (2017).Effects of grammatical gender on object description. In G.Gunzelmann, A. Howes, T. Tenbrink, & E. Davelaar (Eds.),Proceedings of the 39th Annual Meeting of the Cognitive ScienceSociety (pp. 1060–1065). Austin, TX: Cognitive Science Society.

Sera, M. D., Berge, C. A. H., & del Castillo Pintado, J. (1994).Grammatical and conceptual forces in the attribution of gender byEnglish and Spanish speakers. Cognitive Development, 9, 261–292.https://doi.org/10.1016/0885-2014(94)90007-8

Sera, M. D., Elieff, C., Forbes, J., Burch, M. C., Rodríguez, W., &Dubois, D. P. (2002). When language affects cognition and whenit does not: An analysis of grammatical gender and classification.Journal of Experimental Psychology: General, 131, 377–397.

Slobin, D. I. (1996). From “thought and language” to “thinking for speak-ing.” In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguisticrelativity (pp. 70–96). Cambridge, UK: CambridgeUniversity Press.

Slobin, D. I. (2003). Language and thought online: Cognitive conse-quences of linguistic relativity. In D. Gentner & S. Goldin-Meadow (Eds.), Language in mind: Advances in the study of lan-guage and thought (pp. 157–192). Cambridge, MA: MIT Press.

Thierry, G. (2016). Neurolinguistic relativity: How language flexes hu-man perception and cognition. Language Learning, 66, 690–713.

Vernich, L. (2017). Does learning a foreign language affect object cate-gorization in native speakers of a language with grammatical

gender? The case of Lithuanian speakers learning three languageswith different types of gender systems (Italian, Russian andGerman). International Journal of Bilingualism, 23, 417–436.

Vernich, L., Argus, R., & Kamandulytė-Merfeldienė, L. (2017).Extending research on the influence of grammatical gender on ob-ject classification: A cross-linguistic study comparing Estonian,Italian and Lithuanian native speakers. Eesti RakenduslingvistikaÜhingu Aastaraamat, 13, 223–240.

Vigliocco, G., Vinson, D. P., Paganelli, F., & Dworzynski, K. (2005).Grammatical gender effects on cognition: Implications for languagelearning and language use. Journal of Experimental Psychology:General, 134, 501–520.

Vuksanovic, J., Bjekic, J., & Radivojevic, N. (2015). Grammatical genderand mental representation of object: The case of musical instru-ments. Journal of Psycholinguistic Research, 44, 383–397.

Whorf, B. L. (1956). Language, thought and reality: Selected writing ofBenjamín Lee Whorf. Cambridge, MA: MIT Press.

Winawer, J., Witthoft, N., Frank, M. C., Wu, L., Wade, A. R., &Boroditsky, L. (2007). Russian blues reveal effects of language oncolor discrimination. Proceedings of the National Academy ofSciences, 104, 7780–7785.

Witzel, C., & Gegenfurtner, K. R. (2013). Categorical sensitivity to colordifferences. Journal of Vision, 13(7), 1:1–33. https://doi.org/10.1167/13.7.1

Wolff, P., & Holmes, K. J. (2011). Linguistic relativity. WileyInterdisciplinary Reviews: Cognitive Science, 2, 253–265.

Yorkston, E., & De Mello, G. E. (2005). Linguistic gender marking andcategorization. Journal of Consumer Research, 32, 224–234.

Publisher’s note Springer Nature remains neutral with regard tojurisdictional claims in published maps and institutional affiliations.

Psychon Bull Rev (2019) 26:1767–17861786