

Journal of Experimental Psychology: Human Perception and Performance, 1989, Vol. 15, No. 3, 586-602

Copyright 1989 by the American Psychological Association, Inc. 0096-1523/89/$00.75

On Cross-Modal Similarity: The Perceptual Structure of Pitch, Loudness, and Brightness

Lawrence E. Marks

John B. Pierce Foundation Laboratory and Yale University

Examined how pitch and loudness correspond to brightness. In Experiment 1, 16 Ss identified which of 2 lights more resembled each of 16 tones; in Experiment 2, 8 of the same 16 Ss rated the similarity of lights to lights, tones to tones, and lights to tones. (1) Pitch and loudness both contributed to cross-modal similarity, but for most Ss pitch contributed more. (2) Individuals differed as to whether pitch or loudness contributed more; these differences were consistent across matching and similarity scaling. (3) Cross-modal similarity depended largely on relative stimulus values. (4) Multidimensional scaling revealed 2 perceptual dimensions, loudness and pitch, with brightness common to both. A simple quantitative model can describe the cross-modal comparisons, compatible with the view that perceptual similarity may be characterized through a malleable spatial representation that is multimodal as well as multidimensional.

Over the past several years, I have explored the perceptual similarities that characterize relatively abstract dimensions of auditory and visual experience. These similarities reveal themselves in a variety of domains, including (a) synesthetic perception (Marks, 1975), (b) cross-modal matching (Marks, 1974, 1978; Marks, Hammeal, & Bornstein, 1987), (c) interpretation of cross-modal metaphors (Marks, 1982a, 1982b; Marks et al., 1987), and (d) speed of sensory information processing (Marks, 1987a). Evidence from all four domains establishes the existence of correspondences between the visual dimensions of brightness, lightness, size, and shape on the one hand and the auditory dimensions of pitch and loudness on the other. In particular, much of this evidence converges to support the view that the single dimension of brightness maps onto the two dimensions of pitch and loudness.

Domains of Cross-Modal Similarity

An observation commonly made by synesthetic perceivers, in whom sounds involuntarily and unavoidably arouse secondary visual qualities or images, is that loud (vs. soft) and high-pitched (vs. low-pitched) tones produce bright (vs. dim or dark) colors (Karwoski, Odbert, & Osgood, 1942; Marks, 1975). Despite the considerable diversity and idiosyncrasy that synesthetic perceivers report in their experiences, certain aspects of synesthetic experiences appear uniform, even universal. Notable in this respect is the correspondence between pitch and brightness, which represents perhaps the most salient and pervasive relation in synesthetic perception.

This research was supported by National Institutes of Health Grant NS21326. I thank Lynne Papiernik and Elizabeth Warner for their assistance and Robert Melara for his critical reading and insights. James Cutting, Michael Kubovy, Bruce Schneider, and an anonymous reviewer provided thoughtful comments. Correspondence concerning this article should be addressed to Lawrence E. Marks, John B. Pierce Foundation Laboratory, 290 Congress Avenue, New Haven, Connecticut 06519.

The second finding to note is that under conditions that permit "free," unconstrained cross-modal matching (i.e., matching in which the direction or ordering of sensations is not specified in advance), nonsynesthetic perceivers (adults and children) match both loud and high-pitched sounds with bright lights, and soft and low-pitched sounds with dim lights (Marks, 1974, 1978; Marks et al., 1987). Thus the equivalences that appear in involuntary, synesthetic perception also appear in voluntary, cross-modal matching.

Third, these very same structural equivalences, evident in synesthesia and cross-modal matching, emerge in language. Children and adults alike interpret words that describe high loudness or high pitch to imply bright colors, words that describe low loudness or low pitch to imply dim colors, and vice versa: For example, sunlight is rated louder and higher in pitch than moonlight; roar is rated brighter than whisper (Marks, 1982a; Marks et al., 1987). To a great extent the understanding of cross-sensory metaphors follows the rules governing synesthetic and cross-modal perception.

Synesthetic perception, cross-modal matching, and comprehension of cross-sensory metaphor all speak to the issue of structural relations among dimensions of experience. Cross-modal correspondences appear also in functional processes, in sensory information processing. The fourth finding is that reaction times to bright lights are faster (and errors are fewer) when the lights are accompanied by loud (vs. soft) or by high-pitched (vs. low-pitched) sounds, whereas reaction times to dim lights are faster when accompanied by soft or by low-pitched sounds (Marks, 1987a). Moreover, low- and high-pitched sounds similarly affect speed of response in discriminating words describing visual experiences (Walker & Smith, 1984). In nonsynesthetic perceivers, therefore, correspondences between dimensions of heteromodal experiences need not rely on voluntary behavior; interactions between dimensions can take place quite automatically. Thus the matches of loudness with brightness and of pitch with brightness pervade


both structural and functional processes in perception and in language.

Properties of Cross-Modal Similarities

What is the source of these pervasive connections across modalities? For example, are they "hard wired"? The qualitative rules of cross-modal correspondence may themselves be intrinsic to perception; associations across modalities could emerge from basic sensory/perceptual processes. Consistent with this hypothesis is the evidence that cross-modal similarities appear early in life. Lewkowicz and Turkewitz (1980), for example, reported behavioral equivalence between loudness and brightness in 3-week-old infants. Correspondence between loudness and brightness might rest on their use of a common neurophysiological code—for instance, on number of neural impulses per unit time.

The hypothesis that cross-modal similarities rest on common neural codes can lead naturally to a quantitative metric: Perceptual experiences in different modalities are theorized to be equivalent when the underlying neural processes are equal. Loudness "equals" brightness when rates of neural activity in the auditory and visual systems (or specified sites therein) are equal. A model of this sort implies absolute equivalences between modalities: Particular levels of loudness equal particular levels of brightness.

But sensory coding is only one possible basis of cross-modal equivalence. Perhaps modalities are not so strongly interwoven through biology; perhaps cross-modal connections depend only in part on intrinsic sensory functions, whereas in significant measure they depend on higher level cognitive processes. Even if biology predisposes the format of the qualitative rules, the rules that specify which dimensions correspond, cognitive and especially linguistic development may override biological disposition so as to determine the quantitative rules, the rules that specify which values on given dimensions correspond. By this token, even if loudness and brightness find their basic resemblance in common physiological processes, they may subsequently come to owe their communality to a common semantic code, with loud and bright alike represented as "intense," and soft and dim as "nonintense." (Of course, the common semantic codes may owe their ultimate origin to the common perceptual physiology.) Moreover, to the extent that soft contrasts with loud and dim with bright by means of classes of differing relative values, semantic coding implies relational correspondences: The softer sound is to the dimmer light as the louder sound is to the brighter light.

Cross-Modal Similarity Is Multidimensional

The present study comprised two experiments that aimed to explore, within a conceptual framework that is explicitly multidimensional, the relations of pitch and loudness to brightness, by asking three interrelated questions.

First, How do loudness and pitch interact in determining the similarities of sounds to lights? Does similarity rest on just one attribute or the other, or do loudness and pitch combine in an integrative process? And if pitch and loudness do integrate, does everyone weight the attributes equally, or do some individuals place primary emphasis on pitch, others on loudness? There is even an alternate conceptualization, namely, that pitch and loudness are themselves components of a third, unitary attribute—density—and that density is the singular and primary auditory correlate of visual brightness (see Marks, 1975). It is intriguing that the attribute now known as auditory density (see Guirao & Stevens, 1964; Stevens, Guirao, & Slawson, 1965) once was called "brightness" (see Boring & Stevens, 1936). To the best of my knowledge, no data exist that address the issue of how individual subjects integrate loudness and pitch in making cross-modal comparisons between sounds and lights.

The second question is implicit in the first: What is the range of individual differences in the ways that people use pitch or loudness in cross-modal judgment? In particular, are individuals reliable across tasks in their reliance on the one attribute or the other? If cross-modal correspondences are "hard wired," we might expect considerable uniformity both across subjects and across tasks.

Third, How absolute are the cross-modal associations? Are intersensory matches based wholly or largely on absolute corresponding levels of sensory experience, in that a particular brightness equals a particular pitch or loudness, or are matches based on relations, in that the greater of two brightnesses matches the higher of two pitches or the greater of two loudnesses?

Synesthesia provides evidence that at least a few cross-modal connections can be absolute, notably the correspondences between particular colors and the vocal or musical sounds that induce them; it is common to find synesthetic perceivers reporting connections between notes of the musical scale and colors of particular hue and brightness (e.g., Langfeld, 1914; Ortmann, 1933). Absolute connections, however, may be limited to instances in which the percepts have clear categorical representations, as is the case with musical notes and with hues. Not all correspondences, though, need be absolute; even in synesthesia, it is possible that relative values rather than absolute connections govern correspondences among psychologically more continuous dimensions like brightness, pitch, and loudness.

Hornbostel (1925) was perhaps the first to argue that in nonsynesthetic perception as well as in synesthesia, correspondences among continuous dimensions of different sense modalities are absolute. He justified his argument by reporting transitivity of cross-modal matches between pitch and odor, surface brightness and odor, and pitch and surface brightness. On the other hand, Cohen (1934) soon after countered that Hornbostel's cross-modal matches were only fortuitously absolute, a result of the particular stimulus values that Hornbostel used as standards; Cohen's own results implied a strong relative component to matching.

In recent years, the same dispute has resurfaced, with Zwislocki and Goodman (1980) and J. C. Stevens and Marks (1980) arguing for the existence of absolute correspondences, especially in the perception of intensity, while Mellers and Birnbaum (1982) have claimed that cross-modal judgments rely on relative correspondences. The issue remains far from settled. Lewkowicz and Turkewitz's (1980) findings imply absolute correspondence of loudness and brightness in infants. Most recently, Marks, Szczesiul, and Ohlott (1986) reported cross-modality matches in adults between loudness and brightness (and between loudness and vibration) that appeared to compose roughly equal parts of the absolute and the relative. In that study, the cross-modal matches were derived indirectly, through numerical estimations of either intensity (magnitude estimates of stimuli from different modalities) or dissimilarity (magnitude estimates of differences between heteromodal pairs of stimuli). One goal of the first experiment here was to use nonnumerical judgments to examine individual differences and consistencies in absolute/relative matching of pitch and loudness to brightness.

Toward a Model of Cross-Modal Matching

To help formalize some of the empirical findings and theoretical concepts, let us consider the following, which constitute initial steps toward developing a model that might apply to various types of cross-modal judgment. In particular, I seek to establish a model that can apply to the two kinds of judgment obtained in the present pair of experiments: (a) matching (discriminative) judgments, in which subjects decide which of two lights a particular tone more closely resembles, and (b) similarity ratings, in which subjects estimate the degree of resemblance between a tone and a light.

In general terms, the model states that cross-modal judgments rest on the person's comparison of two perceptual quantities—an auditory quantity, Ba, which in the present experiments comprised pitch P and loudness L, with a visual quantity, Bv, which corresponded to brightness. Consequently, the model contains four main components: First, there are the psychophysical transformations of sound frequency and intensity to perceptual values of P and L. (There is also a psychophysical transformation of luminance to values of brightness Bv, but this transform was not a matter of concern in these experiments.) Second, there is a rule of auditory integration, by which P and L combine into Ba. Third is a rule of cross-modal comparison, specifying the mathematical operation by which Ba is compared with Bv. Finally, there is a rule of response, by which the results of the cross-modal comparison translate into sensory matches or ratings of similarity. In the remainder of this section I shall consider the possible rules of integration and comparison, deferring to later in the article a consideration of the psychophysical transformations and response rules.

How might Ba depend jointly on pitch and loudness? And how might subjects compare the auditory quantity Ba to the visual quantity Bv? As already noted, Ba represents the quantity, defined here in terms of pitch and loudness, that I assume subjects compare on each trial to visual brightness. Although an infinitude of definitions of Ba are possible, two definitions offer themselves as main candidates. One is a weighted vector combination:

Ba = (aP² + bL²)^½. (1)

The second is a weighted linear combination:

Ba = aP + bL. (2)

Which combinatorial rule is the more appropriate? Possibly relevant to this issue is the multidimensional analysis of perceived tonal similarity. Schneider and Bissett (1981) identified the putative attribute of auditory density as lying on an axis halfway between pitch and loudness in a two-dimensional, Euclidean auditory space. In some sense, then, auditory density may be said to "contain" pitch and loudness. These findings are consistent with an earlier hypothesis of my own (Marks, 1975), namely, that in the synesthetic matching of lights with sounds people may be equating visual brightness with auditory density. Moreover, Schneider and Bissett noted that density, and perhaps therefore the quantity Ba, may be represented by the linear rule expressed in Equation 2 (and in which, on average, a = b, because in their study density lay on an axis about 45° between pitch and loudness).

And what rule governs the comparison between Ba and Bv? Consider the possibility that cross-modal differences are perceived nonlinearly; one rule of comparison might state that the cross-modal difference Da,v is given by

Da,v = |Ba² − Bv²|^½. (3)

Another rule is linear:

Da,v = |Ba − Bv|. (4)
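The candidate rules can be stated compactly in code. The following is an illustrative sketch only: the weighting coefficients a and b are free parameters to be estimated from data, not values given in the article, and the default values below are arbitrary.

```python
import math

def B_a_vector(P, L, a=1.0, b=1.0):
    """Equation 1: weighted vector (root-sum-of-squares) combination."""
    return math.sqrt(a * P**2 + b * L**2)

def B_a_linear(P, L, a=0.5, b=0.5):
    """Equation 2: weighted linear combination."""
    return a * P + b * L

def D_nonlinear(B_a, B_v):
    """Equation 3: nonlinear rule of cross-modal comparison."""
    return math.sqrt(abs(B_a**2 - B_v**2))

def D_linear(B_a, B_v):
    """Equation 4: linear rule of cross-modal comparison."""
    return abs(B_a - B_v)
```

Under Equation 1 pitch and loudness combine like orthogonal components of a vector; under Equation 2 they trade off linearly, as Schneider and Bissett's density axis would suggest when a = b.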

Which rule is appropriate? Consider the evidence concerning perceived similarity. For auditory similarity, most of the evidence favors a straightforward Euclidean metric. First, in tasks requiring speeded stimulus classification and discrimination, pitch and loudness interact as if they form "integral" perceptual dimensions (Grau & Kemler Nelson, 1988; Melara & Marks, 1989; Wood, 1975), and integral interactions correlate with Euclidean distance metrics (see Garner, 1974; Hyman & Well, 1967; Shepard, 1964). Second, although studies that have carried out multidimensional scaling of tones have not always been able to differentiate city-block from Euclidean metrics with respect to goodness of fit (e.g., Schneider & Bissett, 1981), recent empirical evidence of Grau and Kemler Nelson (1988) clearly favors the Euclidean over the city block. Third, despite the fact that they could not choose empirically between the two metrics, Schneider and Bissett (1981) argued nevertheless on theoretical grounds in favor of the Euclidean.

None of this evidence, of course, applies directly to cross-modal similarity. But a fourth finding does: Melara (1989) found the Euclidean metric superior to the city-block metric in scaling comparisons of tones (pitch) and colors (lightness); empirical evidence shown later in this article (Experiment 2) also favors the Euclidean.
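The city-block and Euclidean metrics at issue are the r = 1 and r = 2 cases of the Minkowski distance; a brief sketch (the coordinates below are arbitrary illustrative values, not stimulus coordinates from any of the cited studies):

```python
def minkowski(x, y, r):
    """Minkowski distance between points x and y;
    r = 1 gives the city-block metric, r = 2 the Euclidean."""
    return sum(abs(xi - yi) ** r for xi, yi in zip(x, y)) ** (1.0 / r)

# Two tones located in a hypothetical (pitch, loudness) space:
t1, t2 = (1.0, 2.0), (4.0, 6.0)
city_block = minkowski(t1, t2, 1)   # |1-4| + |2-6| = 7
euclidean  = minkowski(t1, t2, 2)   # sqrt(3**2 + 4**2) = 5
```

The city-block distance sums the component differences, so it is always at least as large as the Euclidean distance between the same two points.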

The framework given by the present model allows restatement of the three questions posed in the last section. Question 1 asks whether people use a rule of integration like weighted linear summation in making cross-modal judgments, and, if so, whether individuals differ substantially in their weighting coefficients a and b. Question 2 asks whether an individual's weighting coefficients are consistent over different empirical tasks, such as matching and similarity rating. And Question 3 asks whether a particular value of Ba corresponds absolutely to a particular value of Bv, independent of the particular stimulus context. I shall elaborate the model in greater detail in the course of describing two experiments in which I sought to answer these three main questions.

Experiment 1

Although psychophysicists tend to treat perceived intensity as the primary psychological variable of cross-modality matching, and hence to treat the relation of loudness to brightness as central to similarity between vision and hearing, a couple of lines of evidence suggest that pitch may be in some sense "better" than loudness as a cross-modal analogue to brightness. First, very young children (around 4 years) are more likely to match low pitch with dim and high pitch with bright than to match soft with dim and loud with bright (Marks et al., 1987). Second, among adults the metaphorical translation of meanings takes place at least as strongly, if not more strongly, between pitch and brightness as between loudness and brightness (Marks, 1982a).

When given the opportunity, subjects appear to use both auditory dimensions of pitch and loudness in matching to brightness: Marks (1978) obtained cross-modality functions that showed a trade-off between pitch and loudness in the matching of sounds to lights of constant brightness. To match a given brightness, as pitch increased loudness decreased concomitantly. To obtain those results, however, data were pooled over subjects; the study was not designed to determine to what extent certain subjects may have relied on pitch, others on loudness. The triple goals of Experiment 1 were (a) to examine more carefully the way pitch and loudness combine in cross-modal association to brightness, (b) to assess individual differences in reliance on pitch and loudness, and (c) to determine whether the associations depend largely on relative (contextual) values or on absolute correspondence.

In planning Experiment 1, I sought to satisfy three criteria: First, the task should be simple (e.g., to permit future comparisons of children to adults). To accomplish this, I decided on a constant-stimulus procedure. Present throughout the experiment were two lights differing in luminance and hence in brightness. On each trial the subject heard a tone; the task was simply to indicate whether the tone was more similar to the dim or to the bright light. Second, the procedure should be capable of parceling out the contributions of pitch and loudness. To this end, the tones varied from trial to trial in both frequency and intensity. And third, the procedure should permit determining whether the contributions of pitch and loudness rely on absolute correspondences or on relations. To this end, I tested each subject sufficiently to establish individual parameters.

Method

Stimuli

The visual stimuli were two circular patches of white light, each produced by a 100-W incandescent lamp that transilluminated a diffusing paper. Each patch was 3.8 cm in diameter (the two were separated horizontally by 5.7 cm edge to edge) and thus subtended a visual angle of 3° at the viewing distance of 72 cm. Wratten neutral filters inserted in the paths of the beams served to set the luminances to 180 and 900 cd/m² ("dim" and "bright," respectively).

The sounds were pure tones produced by a Coulbourn Instruments modular system, under the control of an Apple IIe microcomputer. The tones could take any one of six frequencies (200, 300, 540, 750, 1250, or 2000 Hz) and any one of six intensity levels (45, 50, 55, 60, 65, or 70 dB loudness level). I use a designation in terms of loudness level (LL) because, over the frequency range of 200-2000 Hz, equal sound pressure levels (SPLs) are far from appearing equally loud. The values of LL, derived here through Fletcher-Munson equal-loudness contours, represent the SPLs of a 1000-Hz tone equal in loudness to the tone in question. Thus all tones at any given LL are roughly equal in loudness—though generally not equal in SPL. Tones were presented through matched TDH-49 headphones mounted in MX41/AR cushions. Tones lasted 1.5 s, with rise and decay times of 10 ms.

The set of six frequencies was divided into two subsets of four, a "low" subset (200-750 Hz) and a "high" subset (540-2000 Hz), with the frequencies 540 and 750 Hz in common. Similarly, the set of six LLs was divided into two subsets of four, a "low" subset (45-60 dB) and a "high" subset (55-70 dB), with the LLs of 55 and 60 dB in common. Taking all possible combinations of the loudness and pitch subsets, I generated the stimuli for the four conditions of the experiment: (a) low pitch/low loudness, (b) high pitch/low loudness, (c) low pitch/high loudness, and (d) high pitch/high loudness. Thus each condition comprised 16 different tonal stimuli (4 frequencies × 4 LLs).
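The factorial construction of the four stimulus conditions can be enumerated directly from the values given in the text; a sketch:

```python
from itertools import product

# Frequency (Hz) and loudness-level (dB LL) subsets described in the text
freq_subsets = {"low": [200, 300, 540, 750], "high": [540, 750, 1250, 2000]}
ll_subsets   = {"low": [45, 50, 55, 60],     "high": [55, 60, 65, 70]}

# Four conditions: each crosses one pitch subset with one loudness subset
conditions = {
    (p, l): list(product(freq_subsets[p], ll_subsets[l]))
    for p, l in product(["low", "high"], repeat=2)
}

# Each condition contains 4 frequencies x 4 LLs = 16 tonal stimuli
assert all(len(tones) == 16 for tones in conditions.values())
```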

Procedure

Subjects sat in a darkened, sound-isolating booth in which the two lights were continuously visible. Subjects held a numeric keypad that connected directly to the computer. By depressing one of the keys, the subject initiated each stimulus presentation. On each trial, the subject's task was to indicate, by appropriately pressing one of two additional keys, whether the tone more closely resembled the dim or the bright light. Tones were presented in random order, different for each subject and in each condition; each of the 16 tones was judged a total of 20 times within the course of the session.

Every subject served in four sessions, one for each of the experimental conditions (stimulus subsets). Order of the four sessions (conditions) was counterbalanced over the 16 subjects who participated. All were young men and women with normal hearing and with normal or corrected-to-normal vision.
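The presentation schedule described above can be sketched as follows. This is a hypothetical reconstruction for illustration; the original control software is not described beyond the text above.

```python
import random

def make_session(tones, reps=20, seed=None):
    """Build one session's trial order: each tone judged `reps` times,
    in a random order that differs per subject and per condition."""
    rng = random.Random(seed)
    trials = [tone for tone in tones for _ in range(reps)]
    rng.shuffle(trials)
    return trials

# e.g., the low-pitch/low-loudness condition: 4 frequencies x 4 loudness levels
tones = [(f, ll) for f in (200, 300, 540, 750) for ll in (45, 50, 55, 60)]
session = make_session(tones, reps=20, seed=1)
assert len(session) == 320  # 16 tones x 20 judgments per session
```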

Results and Discussion

Psychometric Functions

The basic data consisted of tallies of the percentages of times that the subject responded that the tone more resembled the brighter light. These percentages then could be plotted against either sound frequency or against LL to yield psychometric functions. In order to interpret such functions, however, it is useful to consider the range of possible outcomes.
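Computing those tallies could be sketched as follows; the responses below are invented for illustration, not data from the experiment.

```python
from collections import defaultdict

def percent_bright(trials):
    """trials: list of ((frequency, loudness_level), judged_bright) pairs,
    where judged_bright is 1 if the tone was matched to the brighter light.
    Returns the percentage of 'bright' responses for each tone."""
    counts = defaultdict(lambda: [0, 0])   # tone -> [bright count, total]
    for tone, bright in trials:
        counts[tone][0] += bright
        counts[tone][1] += 1
    return {tone: 100.0 * b / n for tone, (b, n) in counts.items()}

# Toy example: one tone judged "brighter" on 15 of its 20 trials
trials = [((750, 60), 1)] * 15 + [((750, 60), 0)] * 5
assert percent_bright(trials)[(750, 60)] == 75.0
```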

Psychometric functions for pitch and loudness. Given the results of previous research, I expected to find that both increasing pitch and increasing loudness would yield increasing percentages of "bright" responses. If this were so, then


the dimmer or brighter of two lights. The matching model states that the probability p(B/P, L) that a sound of pitch P and loudness L is judged more like the brighter light is given by a monotonic function F of the form

p(B/P, L) = F(Ba − λ), 0 ≤ p(B/P, L) ≤ 1,  (5)

where λ represents the "response criterion" determined by the values of brightness (Bv) of the two visual stimuli. Note that Equation 5 assumes a nonlinear rule of comparison (Equation 3), although it does not yet specify a rule of integration for Ba. A variety of possible functions might characterize the response or psychometric function F. F may, for example, be a Gaussian integral:

p(B/P, L) = 1/(σ√(2π)) ∫−∞^(Ba−λ) exp(−t²/2σ²) dt.  (6)

Ideally, we would like to specify the function F as well as to estimate the relation of Ba to pitch and loudness and the relation of λ to the brightnesses of the two lights. For the most part, however, the present discussion will consider the form of the general model expressed by Equation 5.

Note that when Ba = λ, p = .5. That is to say, when Ba equals λ, the auditory stimulus is "equally similar" to the two visual stimuli, where equal similarity is defined as matching equally often to the dimmer and to the brighter light. Note also that this relatively simple form of the matching model assumes that any bias in responding is negligible; in practice, when applying the model to data, response bias will be absorbed into the value of the criterion λ.
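As a concrete illustration, the general model of Equation 5 can be combined with the linear integration rule (Equation 2) and a Gaussian form for F. This is only a sketch: the weights, criterion, and spread below are invented for illustration, not estimated from the data.

```python
import math

def p_brighter(P, L, a=0.7, b=0.3, lam=2.7, sigma=0.15):
    """Probability that a tone of pitch P and loudness L (underlying scale
    values) is judged more similar to the brighter light.
    Combines the linear integration rule Ba = aP + bL (Equation 2) with a
    Gaussian psychometric function F of (Ba - lambda) (Equations 5 and 6)."""
    Ba = a * P + b * L                               # linear integration rule
    z = (Ba - lam) / sigma                           # distance from criterion
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # Gaussian CDF

# When Ba equals the criterion lambda, the tone is "equally similar" to the
# two lights and the predicted probability is exactly .5.
print(p_brighter(2.7, 2.7))  # 0.5
```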

Besides the form of the psychometric function F, two parameters of Equation 5 are of concern here: λ and Ba. The value of λ, the criterion, reflects the degree to which cross-modal matching is absolute or relative. If auditory-visual matching is absolute, then λ will be constant across contextual conditions, its value corresponding to some constant value of Ba. If auditory-visual matching is relative, however, λ will vary with stimulus context. Quantification of λ provides a measure along the dimension of absolute versus relative.

The other parameter of interest in Equation 5 is Ba. Earlier in this article I argued, on the basis of Schneider and Bissett's (1981) results, that Ba may obey Equation 2, following a rule of linear summation, equaling aP + bL. Ideally, we would like to compare directly how well the rule of linear summation and, say, a rule of vector summation account for the present data. Comparison of the two rules of integration is impeded, however, by three factors. First, either rule would operate on underlying scale values of pitch and loudness, P and L, variables that are not measured directly, rather than on directly measurable values of sound frequency and intensity. That is, we must specify the psychophysical functions relating P and L to frequency and intensity before comparing (or simultaneously with comparing) the two integration rules. Second, the difficulty in differentiating between the two integration rules is exacerbated by the tendency of many of the individual subjects to rely almost exclusively on one auditory dimension or the other, implying a ≫ b or, less often, b ≫ a in Equations 1 and 2. When either a or b is negligibly small, the vector and linear rules become virtually indistinguishable. Finally, even if we restrict the analysis to data derived from the very small subset of subjects for whom a and b are more or less equal, it is necessary to define or assume the form of the function F (or, what is equivalent, to find an appropriate transformation of p(B/P, L)). Although the Gaussian model of Equation 6 implies that a z transformation is appropriate, other functions are possible. Given the freedom to choose both psychophysical functions for P and L and psychometric functions for F, the present data are not sufficiently strong to permit us to fit a model such as the Gaussian directly to the values of p(B/P, L).

Although these considerations foster a conservative approach to analyzing the data, it is nevertheless possible to interpret the data quantitatively through the general model of Equation 5. Most important, it is possible to analyze the data in a way that obviates the need to assess the nature of the psychometric function F. Given the general form of the model expressed in Equation 5, such an analysis is provided by the traditional "threshold" approach of psychophysics.

Let us define a tonal "threshold," or in the present example a tonal stimulus of "equal similarity," as a pitch/loudness combination that produces p(B/P, L) = .5. Although in principle we could use any constant probability, a value of .5 not only represents a kind of neutral point in matching (assuming negligible response bias), but moreover is situated near the overall mean of p(B/P, L) in the data sets and thus tends to be estimated relatively accurately.

From Equation 5 it follows that when p(B/P, L) = .5, or some other constant, then either

(aP² + bL²)^½ − λ = k  (7)

or

aP + bL − λ = k,  (8)

depending on whether Ba takes on the vector form (Equation 1) or the linear form (Equation 2). Note that this "threshold" analysis actually eliminates two assumptions. Not only does it eliminate the need to assume or define the psychometric function F, but it also eliminates the need to choose between nonlinear and linear rules of comparison: Replacing the nonlinear rule with a linear one in Equation 5 would produce exactly the same functional Equations 7 and 8. Either Equation 7 or 8 will allow us to answer the two main remaining substantive questions: (a) Is matching absolute or relative? (Is λ constant over contextual conditions?) (b) How do subjects weight the contributions of pitch and loudness in cross-modal matching? (What are the relative values of a and b?)

Derivation of equal similarity: "Matching points." The subsequent analysis permits derivation of estimates for each subject as to (a) the contribution of pitch versus loudness (the values of a and b) and (b) the role of absolute correspondences versus relations (the value of λ). In order to assess these variables, I converted each psychometric function (from each subject in each contextual condition) into a single value, the "matching point," by calculating the value of the auditory stimulus that produced, or was interpolated to produce, 50% response on the psychometric function. The procedure involved linear regression on either sound frequency or LL, then calculation of the value of the stimulus corresponding to the 50% point. Each matching point was taken, by definition, to represent a sound that was "halfway between" the two lights, that is, to represent a combination of sound frequency and intensity judged "equally similar" to the two lights. The regressions were carried out on z-transformed values of the probabilities; however, because of the sizable number of values at 0% and 100%, to avoid losing a considerable amount of data, the data were first converted to probabilities according to Tukey's (1977) formula for split simple (ss) fractions: p = (n + 1/6)/(N + 1/3). Regressions were then carried out using, for any given psychometric function, only one value of z corresponding to 0% or 100%, even when more appeared.
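The matching-point procedure just described reduces, computationally, to a few steps: convert response counts with the split-fraction formula, z-transform, regress z on stimulus level, and solve for z = 0. A minimal sketch with hypothetical counts (the counts below are invented for illustration):

```python
from statistics import NormalDist

def matching_point(levels, counts, n_trials=20):
    """Interpolate the stimulus level giving 50% "brighter" responses.
    Counts are converted with Tukey's (1977) split-fraction formula
    p = (n + 1/6) / (N + 1/3), z-transformed, and fit by linear regression;
    the matching point is the level at which z = 0."""
    zs = [NormalDist().inv_cdf((n + 1/6) / (n_trials + 1/3)) for n in counts]
    # Ordinary least-squares slope and intercept of z on stimulus level.
    m = len(levels)
    mean_x = sum(levels) / m
    mean_z = sum(zs) / m
    slope = (sum((x - mean_x) * (z - mean_z) for x, z in zip(levels, zs))
             / sum((x - mean_x) ** 2 for x in levels))
    intercept = mean_z - slope * mean_x
    return -intercept / slope  # level at which z = 0 (50% response)

# Hypothetical counts of "brighter" responses out of 20 at four loudness
# levels; the symmetric counts place the matching point at 52.5 dB.
print(matching_point([45, 50, 55, 60], [2, 7, 13, 18]))  # 52.5
```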

An example appears in Figure 3, which shows the calculation of a matching point for Subject 16 in Condition 1, given a sound frequency of 540 Hz; Figure 3 indicates that at 50% response (z = 0), the LL was 53.4 dB. I obtained as many points as possible for each subject in each condition by performing these calculations either as a function of LL, with log frequency constant, or as a function of log frequency, with LL constant.

Contours of constant cross-modal similarity. The foregoing procedure provided the (derived) data points plotted in Figure 4. Each panel of Figure 4 gives the results from a single subject; different symbols represent results for the four different contextual conditions. By plotting frequency against LL, each point indicates a particular auditory stimulus that was judged equally similar to the dim and to the bright lights. The locus of points derived for a given subject in a given contextual condition represents, therefore, a "constant-similarity contour," that is, a set of auditory stimuli all of which are judged cross-modally to be "halfway between" the dim and bright lights.

Note the spacing of stimuli on the two axes. This spacing represents an attempt to show how the equal-similarity data conform to the linear integration rule of Equation 8. Given that (a) the linear model is appropriate to Ba and (b) when p(B/P, L) = .5, k = 0, we can use Equation 8 to write

P = λ/a − bL/a.  (9)

Figure 3. Psychometric function for Subject 16 (Condition 1), at a sound frequency of 540 Hz, showing the calculation of the matching point: the loudness level at this frequency is calculated to give 50% response.

Consequently, a plot of sound frequency, with stimuli spaced

in proportion to pitch, against sound intensity, with stimuli

spaced in proportion to loudness, should yield a straight line

whose slope equals −b/a and whose intercept equals λ/a. In Figure 4, the stimuli are spaced on the ordinate according

to log frequency and on the abscissa according to LL - 10;

that is to say, I am assuming that P, the scale of pitch operating

in similarity judgments, is approximately a log transform of

the sound frequency in Hz and that L, the scale of loudness

operating in similarity judgments, is approximately propor-

tional to number of decibels above a threshold of 10 dB (a

good approximation under the present experimental condi-

tions). The data appear to be at least reasonably consistent

with this model, though it must be recognized that the degree

of uncertainty is great: For example, as Figure 4 indicates,

results from several of the subjects suggest slopes near zero;

as indicated earlier, when a or b = 0, the linear and vector

rules become identical. No claim can be made at this point

either that the present data strongly support the linear rule of

summation over other rules such as the vector, or that logarithmic psychophysical transformations rather than other

(e.g., power law) transformations intervene between sound

frequency and intensity and perceptual values of P and L.

Nevertheless, the model provides a useful framework for analyzing such parameters as a and b (representing the weightings of pitch and loudness) and λ (representing the criterion).

It is important to point out that the patterns of variation in

a, b, and λ are robust over a wide range of possible integration

rules and psychophysical transformations.

Two aspects of these equal-similarity contours are significant: (a) their slopes and (b) their convergence over contextual conditions.

Weighting of Pitch and Loudness

First, the slopes of the constant-similarity curves provide a

measure of the relative contributions of pitch and loudness;

more precisely, the slopes give the values of -b/a in Equation

9. A set of horizontal lines in Figure 4 (frequency = constant)

would indicate that similarity depends only on pitch, not at

all on loudness; a set of vertical lines (LL = constant) would

indicate that similarity depends only on loudness, not at all

on pitch; and a line or lines of negative slope would indicate

that cross-modal similarity depends on both pitch and loud-

ness, that the one can trade off with the other to maintain an

equal similarity to brightness.

Figure 4 makes it clear that for about half of the subjects, pitch alone or in large measure determined cross-modal similarity. By way of contrast, no subject appeared to rely solely on loudness to match brightness, although a couple of subjects did rely on loudness more than on pitch. A qualification needs to be made: The conclusions about trade-offs between loudness and pitch rest on two assumptions. The first is that it is appropriate to translate, in a one-to-one fashion, from frequency to pitch and from intensity to loudness. This is probably a safe assumption with regard to pitch, for even though pitch does change with stimulus intensity, that change is slight. The assumption also seems reasonably safe with regard to loudness, in that the use of LL aimed specifically to


Figure 4. For each of the 16 subjects, matching points for the four contextual conditions (open squares = Condition 1, filled triangles = Condition 2, filled squares = Condition 3, open circles = Condition 4). (Each point gives the combination of sound frequency and loudness level calculated to give 50% response on the psychometric function.)

equate stimuli of different frequency in their loudness; nevertheless, a caveat is in order because it is not possible to

discount either small individual differences among subjects

or small differences between the average equal-loudness

curves of the present subjects and the curves that Fletcher and

Munson (1933) used to determine LL. The second assumption is that P and L are scaled appropriately here (in scaling them, I assumed mean P and mean L are equal for the present stimulus set). Clearly, in an additive model one can trade off multiplication of scale values with weighting coefficients.

Despite these reservations, it is nevertheless clear that loudness

counted much less than did pitch in determining auditory-

visual similarity.

To quantify these results, I calculated individual values of a and b in Equation 9, under the restriction that these parameters can be considered weighting proportions: that is, a + b = 1. If c represents the slope of any given equal-similarity function in Figure 4, then a, the weighting coefficient for pitch, equals 1/(1 − c). The first four columns in Table 1 list the values of a for all subjects in every condition. The very large values for Subjects 1-7 (for each subject, all values of a > .85) reflect the near total reliance on pitch in matching to

brightness. By way of contrast, only one subject, Subject 16,

had a notably small weighting coefficient for pitch. Moreover,

inspection of values of a in Table 1 as well as the correspond-

ing slopes in Figure 4 shows that subjects were consistent over

conditions in their relative weightings of pitch and loudness. (One apparent exception is Subject 15, who gave an aberrantly large negative pitch coefficient in Condition 4; as Figure 4 shows, the equal-similarity points from which the aberrant value derives [open circles in Panel 15] are exceptionally variable, so this particular value of a should not be taken too seriously.)

Analysis of variance confirmed the existence of wide interindividual variation in relative weighting, F(15, 45) = 9.042, p < .001; with Subject 15's data eliminated, F(14, 42) = 17.71,


Table 1
Relative Weighting Coefficient a for Pitch (re: Loudness) and Criterion λ in the Four Contextual Conditions (Experiment 1: Matching Tone to Light)

                    Coefficient a                      Criterion λ
Subject      1       2       3       4        1       2       3       4
1         0.888   1.136   1.136   0.895    2.641   3.113   2.554   3.003
2         1.033   0.961   1.079   0.970    2.577   2.964   2.509   3.016
3         0.981   0.970   1.009   0.994    2.604   2.998   2.596   3.015
4         0.987   0.972   1.035   0.849    2.669   3.028   2.648   3.028
5         0.994   0.960   0.932   0.953    2.692   3.056   2.753   3.077
6         0.943   0.856   0.944   0.871    2.600   2.913   2.665   2.857
7         0.947   0.855   0.871   0.897    2.577   2.830   2.561   2.860
8         0.844   0.842   0.787   0.473    2.581   2.911   2.748   2.989
9         0.704   0.727   0.683   0.700    2.640   2.904   2.754   3.022
10        0.758   0.719   0.632   0.650    2.521   2.766   2.721   2.724
11        0.639   0.579   0.626  -0.123    2.745   2.797   2.773   3.056
12        0.640   0.557   0.523  -0.017    2.710   2.817   2.809   3.030
13        0.596   0.484   0.529   0.396    2.517   2.625   2.675   2.971
14        0.504   0.531   0.465   0.021    2.534   2.697   2.746   3.101
15        0.492   0.539   0.423  -0.974    2.501   2.653   2.713   2.422
16        0.237   0.240   0.372  -0.036    2.570   2.636   2.927   3.033

Note. Condition 1 = low pitch/low loudness, Condition 2 = high pitch/low loudness, Condition 3 = low pitch/high loudness, Condition 4 = high pitch/high loudness.

p < .001. Viewed in another way, we see that the values of a correlate highly over conditions: Pearson coefficients of correlation r between pairs of conditions range from .76 to .96, averaging .87. Interestingly, these correlation coefficients, calculated on the derived measure a, are on the whole greater than the corresponding correlations calculated on the original slopes c, the latter ranging from −.31 to .98 and averaging only .33. Thus the derived, model-based measures seem to be the more stable, a point whose theoretical significance did not escape notice.

To repeat, one prominent feature of the results is the consistency with which individuals relied on pitch or loudness in matching sounds to lights. Subjects tended to use pitch, loudness, or both in a characteristic fashion.
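The conversion from contour slope to pitch weight is itself a one-liner: since the slope of Equation 9 is c = −b/a and the weights satisfy a + b = 1, it follows that a = 1/(1 − c). A minimal sketch:

```python
def pitch_weight(c):
    """Pitch weighting coefficient a from an equal-similarity contour slope c,
    given the constraint a + b = 1 and slope c = -b/a (Equation 9)."""
    return 1.0 / (1.0 - c)

# A horizontal contour (c = 0) means pure reliance on pitch: a = 1, b = 0.
print(pitch_weight(0.0))   # 1.0
# A slope of -1 means equal weighting of pitch and loudness: a = b = 0.5.
print(pitch_weight(-1.0))  # 0.5
```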

But there is a second prominent feature to the weighting coefficients a and b, namely, their variation with change in stimulus context. Although the average size of the pitch coefficient a hardly varied across Conditions 1-3 (M = .762, .746, and .753, respectively), a was considerably smaller in Condition 4 (M = .470). If we eliminate the data of Subject 15, the subject who produced the aberrant value in Condition 4, the means become .780, .759, .775, and .566. The variation in a across conditions is reliable, F(3, 45) = 8.018, p < .001; without Subject 15, F(3, 42) = 9.612, p < .001. Post hoc, pairwise comparisons (Tukey, honest significant difference, or hsd) showed that Condition 4 differed reliably from each of the other conditions (p < .01), but no other pair differed reliably (all ps > .05).

The variation in a with change in stimulus context has an especially important implication: The rule of integration governing the quantity Ba appears to have an "optional" component. That is, the relative weightings of pitch and loudness vary with the conditions of the experimental task, despite the fact that any given individual characteristically gives more weight to one dimension or the other. Even for a given person, Ba is not fixed. This finding speaks against the notion that the weightings are rigid or physiologically determined, but suggests instead that they are malleable and perhaps even subject to voluntary control.

Criterion and Stimulus Context

The second feature of these equal-similarity curves is the way their locations changed across different conditions of stimulus context. Recall that I used different subsets of sound frequency and LL in order to learn whether cross-modal matches remain invariant when the contextual set of stimulus levels changes. The existence of such invariance would indicate that cross-modal similarities rely on absolute correspondences between stimuli (or sensations) of different modalities; invariance would reveal itself as a convergence of the four equal-similarity contours onto a single function. Lack of invariance, indicating that cross-modal similarities rely at least in part on relations, would reveal itself as divergence of at least some of the four contours.

Figure 4 shows divergence clearly to be the rule. It is important to note, however, the connection between the nature of convergence-divergence and the extent to which the matches rely on pitch versus loudness. Absolute responding should produce convergence in the functions; but some convergence will also take place even with relative responding if a subject's matches rely on just one attribute and not on both. Consider the data for Subject 2. The contours are horizontal, implying that Subject 2 used pitch but not loudness in matching to brightness. The four contours collapse onto two curves: One curve represents Conditions 1 and 3, low pitch/low loudness (open squares) and low pitch/high loudness (filled squares); the other curve represents Conditions 2 and 4, high pitch/low loudness (filled triangles) and high pitch/high loudness (open circles). Each pairwise convergence takes place, therefore, when the contextual levels of pitch remain constant but the contextual levels of loudness


vary, a result that is wholly consistent with the conclusion, based on slopes, that Subject 2 ignored loudness completely in matching to brightness. That the functions do not converge when the levels of pitch change, however, agrees with the conclusion that Subject 2's matches depend on relative, not absolute, values along the dimension of significance (in this case, on relative values of frequency or pitch).

Again, we can characterize the results in terms of a parameter of Equation 9, in this case the criterion λ. As already noted, the intercept of each straight line that can be passed through a subset of data in Figure 4 corresponds to the value λ/a. The calculated values of λ are listed in columns 5-8 of Table 1. Interpretation of these numbers can be assisted by recalling the psychophysical assumption that values of P correspond to log frequency (values of L are proportional to LL − 10, but were scaled such that the mean value of L for the entire stimulus set corresponded to the mean value of P).

Given this scaling of P and L, the units of the criterion λ correspond roughly to values of log frequency. A value of 2.3 in the criterion, for example, is analogous to a sound frequency of 200 Hz. Hence, each increment of 0.1 in the value of the criterion is equivalent to about a 25% increase in frequency; an increment of 0.3 unit in criterion corresponds to a doubling of frequency.
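Because the criterion is expressed in log10-frequency units, each increment corresponds to a fixed frequency ratio; the conversions just cited can be checked directly:

```python
# Criterion units are log10 frequency, so an increment d corresponds to a
# frequency ratio of 10**d.
print(10 ** 2.3)   # ~199.5: a criterion of 2.3 is analogous to ~200 Hz
print(10 ** 0.1)   # ~1.26: +0.1 log unit ~ a 25% increase in frequency
print(10 ** 0.3)   # ~2.0: +0.3 log unit ~ a doubling of frequency
```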

Viewed in this light, it is clear that the criterion changes dramatically with changes in context. Only 2 of the 16 subjects reveal even a hint of absolute responding across the experiment's four conditions. Subject 11 gave results consistent with a single criterion characterizing all conditions except high pitch/high loudness, and Subject 10 used a constant criterion over all conditions except low pitch/low loudness. In general, the nature of the convergence or divergence is compatible with the slope of the contours and with the corresponding values of the pitch weighting coefficient a. Thus, as already indicated, convergence takes place across subsets of loudness when subjects ignore loudness (Subjects 1-7) and mainly across subsets of pitch when subjects largely ignore pitch (Subject 16). Little or no convergence takes place when subjects use both auditory dimensions (Subjects 8, 9, 12-15).

Table 1 shows how the contextual condition affected the criterion. The average value of the criterion is smallest in Condition 1 (low pitch/low loudness: λ = 2.605), followed by Condition 3 (low pitch/high loudness: λ = 2.697), Condition 2 (high pitch/low loudness: λ = 2.857), and Condition 4 (high pitch/high loudness: λ = 2.950). Analysis of variance shows these differences to be highly significant, F(3, 45) = 26.41, p < .001. Post hoc comparisons (Tukey hsd) reveal all pairwise differences to be significant (p < .01), except for the differences between Conditions 1 and 3 and between Conditions 2 and 4 (p > .05).

Did subjects tend to differ in their criteria? The values listed in Table 1 imply that subjects' criteria were reasonably uniform. Analysis of variance suggests little or no reliable interindividual variation in criterion, F(15, 45) = 1.760, p = .075; without Subject 15, F(14, 42) = 1.200, p = .25.

The results of Experiment 1 became the point of departure for the next question I asked, namely, whether the use of one auditory dimension or the other is truly characteristic of individuals. In particular, Experiment 2 inquired whether the individual subjects who relied on pitch or loudness in Experiment 1 would continue to rely on the same attribute when presented with another cross-modal task, similarity scaling. Experiment 2 also sought to confirm that pitch, loudness, and brightness are the appropriate psychological dimensions of correspondence by asking whether they may be represented as primary axes in a single multimodal space.

Experiment 2

Although intriguing and revealing, the results of Experiment 1 are also intrinsically limited. A main limitation stems from the use of a single pair of brightnesses. Ideally, Experiment 1 should be repeated using many "dim" and "bright" standards along the brightness scale, varying both the mean level of each pair and the size of the difference in brightness between them. One wonders, for example, whether the mean pitch or mean loudness in "matching points" increases with increasing mean level of brightness, or how the slope of the psychometric function varies with the size of the brightness difference around a given mean level. Unfortunately, however, the paradigm of Experiment 1 is rather tedious and inefficient for parametric exploration.

Perhaps even more imperative is the need to develop a wider conceptual framework. One way to characterize cross-modal correspondences is by representing perceptual qualities in a theoretical space that is both multidimensional and multimodal (Marks, 1985; Marks et al., 1987; see Wicker, 1968). The representation of perceptual qualities by means of multidimensional spaces has a long history in experimental sensory psychology (see, e.g., Boring, 1942). A goal of spatial representation is to express similarity relations among percepts, in that some measure of distance between the spatial locations of two perceptual experiences expresses the degree of their dissimilarity. In auditory space, simple sounds, such as pure tones, may be represented as points in a plane defined by the two roughly orthogonal dimensions of pitch and loudness (e.g., Schneider & Bissett, 1981); in visual space, colored lights may be represented as points defined by the three dimensions of brightness, redness/greenness, and blueness/yellowness (so a two-dimensional plane of red/green and blue/yellow gives variations in hue along concentric circles and variations in saturation along radii emanating from the center; e.g., Indow & Kanazawa, 1960).

According to the traditional view, each sensory modality is governed by its own space. Consequently, there should be as many individual spaces as there are modalities. But the correspondences that characterize synesthetic perception, cross-modal matching, and cross-modal interactions in response speed imply a possible linkage across modalities, and hence a linkage among the spaces representing them. One way that linkage could come about is through communality among dimensions of different modalities. The equivalence between loudness and brightness, for instance, suggests an overlap or identity of these two heteromodal dimensions.

To pursue this possibility, in Experiment 2 I applied a second technique to study correspondences between lights and tones, namely, cross-modal similarity scaling. On each trial, the subject rated the degree of similarity/dissimilarity


between two stimuli; on some trials both stimuli were audi-

tory, on others both were visual, and on still others one was

auditory and the other visual. If it is appropriate to treat the

underlying representation of similarity as spatial, then the

judgments of dissimilarity should be monotonically related to

the distances between percepts in the multidimensional, mul-

timodal space. By applying methods of multidimensional

scaling to the judgments, it is possible to uncover the dimen-

sions common to experiences of different modalities, as these

dimensions presumably underlie the similarity relations.
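The logic of recovering common dimensions from dissimilarity judgments can be illustrated with classical (Torgerson) metric MDS, a simpler relative of the scaling procedures used here; the four-point configuration below is fabricated purely for illustration, not taken from the experiment.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed points in k dimensions so that
    Euclidean distances approximate the dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:k]      # k largest eigenvalues
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Four illustrative "stimuli" on a 2-D pitch-by-loudness grid; their pairwise
# distances are recovered (up to rotation/reflection) by the embedding.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D, k=2)
D_hat = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```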

Wicker (1968) used a version of this procedure to study similarities between tones and Munsell colors; a main feature of his results was the strong association between the pitch of the tones and the lightness of the colors. Subsequently, I explored the method in comparing hearing and vibratory touch with respect to the perceptual dimensions of pitch and loudness (Marks, 1985) and later in comparing hearing, touch, and vision with respect to the perception of temporal patterns (Marks, 1987b). This last study became the specific impetus for Experiment 2.

The study of temporal patterns showed individual subjects to be consistent in the way they weighted rhythm and duration, which in that study were the two main components of similarity (Marks, 1987b). Regardless of whether subjects were

making intramodal or cross-modal comparisons, and regard-

less of the particular modality or pair of modalities tested,

multidimensional scaling of similarity judgments revealed

most individuals to emphasize one dimension or the other

consistently. This finding suggested that multidimensional

scaling might also be useful in assessing individual relations

among pitch, loudness, and brightness. Consequently, I se-

lected a subset of the subjects from Experiment 1 to undergo

further testing; I based the selection on whether, in the first experiment, the subject relied primarily on pitch or primarily

on loudness in matching tones to visual brightness. If reliance

on one dimension or the other in auditory-visual matching

is a stable personal characteristic and doesn't depend on the

particular experimental task, then comparable results should

be obtained from cross-modal similarity judgments.

Method

Stimuli

The apparatus for producing tones and lights was identical to that of Experiment 1. The only change in the visual stimuli was the increase in the number of luminance levels to five: 1.8, 18, 180, 900, and 2800 cd/m2. Note that the third and fourth levels were identical to the "dim" and "bright" stimuli, respectively, used in Experiment 1. The tones numbered nine, with each of three different frequencies (200, 650, and 2000 Hz) taking on each of three different loudness levels (45, 57.5, and 70 dB). Thus the overall ranges of sound frequency and intensity were identical to those in Experiment 1.
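For concreteness, the stimulus set and pair count can be enumerated in a short Python sketch (an illustration only; the variable names are mine, not the article's):

```python
from itertools import product

# Five luminance levels (cd/m2) and nine tones (3 frequencies x 3 loudness levels)
luminances = [1.8, 18, 180, 900, 2800]
frequencies_hz = [200, 650, 2000]
loudness_db = [45, 57.5, 70]

lights = [("light", lum) for lum in luminances]
tones = [("tone", f, db) for f, db in product(frequencies_hz, loudness_db)]
stimuli = lights + tones

# Experiment 2 tested every ordered pair of the 14 stimuli: a full square matrix
pairs = [(a, b) for a in stimuli for b in stimuli]
print(len(stimuli), len(pairs))  # prints: 14 196
```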

Procedure

The subject sat in the darkened, sound-isolated booth in front of a computer monitor, on which appeared a rating scale. The scale was a horizontal line 30 cm long, anchored at the left-hand end by the label dissimilar and at the right-hand end by similar. By means of two keys, the subject could control the horizontal position of a cursor, which at the start of each trial always appeared just above the center of the rating line. On each trial, the subject set the position of the cursor to indicate the degree of similarity between the two stimuli. The computer recorded the position as an integer between 0 (extremely dissimilar) and 39 (extremely similar).

During the course of the session, the subject judged all 196 possible pairs of the 14 stimuli (five lights and nine tones). Note that by testing all pairs, intramodal as well as cross-modal, in a square matrix, each intramodal pair appeared twice in the set. This permitted counterbalancing of the temporal order of presentation, in the case of tone-tone comparisons, and the spatial location of the lights, in the case of light-light comparisons. Order of stimulus pairs was random and different for every subject and in every session. In order to increase stability and reliability of the individual data, each subject served in two sessions, conducted on different days, rating all 196 pairs in both sessions.

Subjects

Eight subjects from Experiment 1 participated (Subjects 1, 3, 7, 8, 10, 12, 14, and 16). They were selected to cover a wide range of performance in Experiment 1.

Results and Discussion

The ratings given to each stimulus pair were averaged over sessions for each subject and then analyzed by multidimensional scaling. Implicit in this approach is a model of multimodal as well as multidimensional similarity, in which the dissimilarity Δij between any pair of stimuli, i and j, corresponds to a distance between them. In the present form of the model, the metric of the space is Euclidean, given by the equation

Δij = [Σk d²ij,k]^1/2,    (10)

where dij,k is the difference between i and j on dimension k, and this Euclidean rule applies equally to intramodal and intermodal pairs of stimuli: pairs of tones, pairs of lights, and pairs containing a tone and a light.
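Equation 10 can be sketched in code; the optional weights anticipate the subject-specific dimension weights of the INDSCAL analysis described below (the function name and sample coordinates are illustrative assumptions, not values from the article):

```python
import math

def euclidean_dissimilarity(x, y, weights=None):
    """Dissimilarity as a (optionally weighted) Euclidean distance,
    Delta_ij = [sum_k w_k * d_ijk**2] ** 0.5, where d_ijk = x_k - y_k.
    Unit weights give Equation 10; subject-specific weights give the
    INDSCAL form used in the main analysis."""
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# Illustrative coordinates on the (loudness, pitch) dimensions
tone_i, tone_j = (0.2, -0.3), (-0.1, 0.4)
d_unit = euclidean_dissimilarity(tone_i, tone_j)
# A pitch-weighted subject (cf. Subject 3 in Table 2) stretches Dimension 2
d_pitch_heavy = euclidean_dissimilarity(tone_i, tone_j, weights=[0.087, 0.867])
```

The same function applies unchanged to tone-tone, light-light, and tone-light pairs, which is the sense in which the model is multimodal as well as multidimensional.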

For the main analysis, I converted each subject's "square" matrix of responses to a "triangular" matrix by averaging the judgments given to pairs that were identical except for order of presentation. To execute the multidimensional scaling, I used the individual-difference procedure INDSCAL of Carroll and Chang (1970) as instantiated in the symmetrical version SINDSCAL (Pruzansky, 1975). The SINDSCAL procedure makes two important assumptions. First, it assumes the appropriateness of a Euclidean metric in which distance between points corresponds to dissimilarity; the orientation of the axes through the multidimensional space is determined from the variation over subjects in the weighting coefficients on the various dimensions. Second, SINDSCAL uses a metric, rather than a nonmetric, scaling procedure; that is, it assumes that the rating responses have interval-scale properties, not just ordinal properties. To ensure that these two assumptions, the Euclidean rule and interval-scale responses, are justified, I also applied to the data a multidimensional scaling procedure that assumes only ordinal properties in the ratings and that


makes it possible to compare the Euclidean and city-block rules (Kruskal, 1964). The results confirmed the propriety of SINDSCAL's assumptions. First, the nonmetric procedure yielded a better fit with the Euclidean than with the city-block model: Stress, a measure of relative badness of fit (the greater the stress, the worse the fit), was 13.1% with the Euclidean metric and 16.8% with the city block (two-dimensional solutions; see also Melara, 1989). Second, the original data did have interval-scale properties: The interpoint distances derived from the scaling solution were linearly related to the original dissimilarity ratings.1 Moreover, the multidimensional space was much the same as that derived from SINDSCAL. But recall that SINDSCAL, the procedure used in the main analysis, has the advantage of providing information about individual subjects. Of particular interest were (a) whether and how brightness, loudness, and pitch might appear as dimensions, and (b) how individual subjects might differ in the weightings on the dimensions of loudness and pitch.
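The stress measure used to compare the two metrics is easy to state concretely. The following is a minimal sketch of Kruskal's Stress-1; the toy data are mine, whereas the article's 13.1% and 16.8% values came from fitting the actual ratings:

```python
import math

def stress_1(dissimilarities, distances):
    """Kruskal's Stress-1: sqrt(sum (d_hat - d)^2 / sum d^2).
    Lower stress means the configuration's distances better
    reproduce the (monotonically transformed) dissimilarities."""
    num = sum((dh - d) ** 2 for dh, d in zip(dissimilarities, distances))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

# Toy example: model distances that slightly misfit the data
observed = [1.0, 2.0, 3.0, 4.0]
model = [1.1, 1.9, 3.2, 3.8]
print(round(stress_1(observed, model), 3))  # prints 0.058
```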

Multidimensional and Multimodal Space

Using SINDSCAL, I obtained scaling solutions in two, three, four, and five dimensions; the results of the main analysis showed that two dimensions sufficed to account for sizable proportions of the variance, with each additional dimension accounting for less than 10% more and having little effect on Dimensions 1 and 2. In the two-dimensional analysis, Dimensions 1 and 2 accounted for 36.5% and 36.4% of the variance, respectively. Figure 5 shows the two-dimensional solution. It is clear that Dimension 1 relates closely to loudness, Dimension 2 closely to pitch. Thin solid lines connect constant frequencies as loudness level (LL) increases from left to right; thin dashed lines connect constant LLs as frequency increases from bottom to top. This straightforward two-dimensional structure agrees well with multidimensional scaling solutions obtained with tones alone (Carvellas & Schneider, 1972; Marks, 1985; Schneider & Bissett, 1981).

Figure 5. Multidimensional scaling solution to similarity judgments of lights and tones. (Dimension 1 corresponds to loudness [from left to right, loudness levels of 45, 57.5, and 70 dB], Dimension 2 to pitch [from bottom to top, frequencies of 200, 650, and 2000 Hz]. Brightness correlates with both [from lower left to upper right, luminances of 1.8 to 2800 cd/m2].)

The five levels of luminance form an arc (thick solid line) going from lower left (dim) to upper right (bright). Thus the

visual dimension of brightness appears as a kind of "resultant" of the two auditory dimensions of loudness and pitch. As I pointed out earlier (Marks, 1975), the orientation of the visual dimension of brightness corresponds at least qualitatively to the orientation of the auditory dimension of "density." At that time, I suggested that the dual cross-modal equivalences of brightness with pitch and brightness with loudness might represent a singular equivalence between brightness and density; interestingly, reference can be found in the literature of the early part of this century to a dimension of "auditory brightness" (e.g., Rich, 1919), until Boring and Stevens (1936) identified auditory brightness with density. And Schneider and Bissett (1981) showed that density lies halfway between the axes of pitch and loudness in auditory space. Figure 5, however, shows no dimension of density/brightness.

As already indicated, the higher order dimensions accounted for little of the variance. Dimension 3 yielded three successive groupings: midfrequency tones, low-frequency tones plus all lights, and high-frequency tones; it is hard to provide Dimension 3 with an interpretation. Dimension 4 corresponded largely to pitch and correlated highly with Dimension 2. Dimension 5, however, has some special interest. Although it accounted for only about 3.9% of the variance, and its appearance could be attributed to the data of a single subject (16), Dimension 5 served roughly to distinguish lights from sounds. When Subject 16's data were eliminated, a five-dimensional solution gave no division by modality.

To evaluate further the three higher order dimensions, I also applied the SINDSCAL analysis separately to the half-matrices containing different stimulus orders. Recall that in each experimental session, every pair of stimuli was presented twice. In the case of tone-tone pairs, the temporal order of presentation differed in the two halves; in the case of light-light pairs, the spatial position differed in the two halves. One half-matrix contained all tone-tone pairs in which the first tone was softer and/or lower in pitch and all light-light pairs in which the light to the left was dimmer; the other half-matrix contained the complementary subsets of stimulus pairs. Although temporal/spatial order exerted a sizable effect on the overt responses themselves, the results of the multidimensional scaling of each subset were virtually identical to those obtained from the averaged data: The scale values of Dimensions 1 and 2 were much the same, as were the weighting coefficients. And in both cases, with Subject 16's data included, but not with them excluded, Dimension 5 separated visual stimuli from auditory stimuli. In sum, the results suggest the salience of two multimodal dimensions of similarity, one a dimension of loudness-brightness, the other a dimension of pitch-brightness.

Individual Differences

Inspection of the weighting coefficients shows marked individual differences; see Table 2. For example, where Subject

1 Other experiments, on tones alone, conducted in my laboratory by Robert Melara and by me reveal the same pattern of results. In our experience, the Euclidean metric always produces the better fit (lower stress). See also Footnote 2.


Table 2
Weighting Coefficients From Multidimensional Scaling of Tones and Lights (Experiment 2)

Subject   Dimension 1   Dimension 2
1         0.297         0.707
3         0.087         0.867
7         0.412         0.782
8         0.504         0.627
10        0.819         0.267
12        0.831         0.190
14        0.564         0.592
16        0.746         0.165

Note. Dimension 1 = loudness, Dimension 2 = pitch.

3 emphasized pitch at the expense of loudness (weighting of .087 on Dimension 1, .867 on Dimension 2), Subject 12 did just the opposite (weighting of .831 on Dimension 1, .190 on Dimension 2). There is a reasonably close correspondence between these weightings and the matching behavior from Experiment 1: Compare the present weighting coefficients with the pitch-weighting coefficient a derived from equal-similarity contours in the first experiment. Comparison of Table 2 with Table 1 shows a strong degree of consistency across the two experiments in the use of pitch or loudness to match visual brightness. The rank-order correlation coefficient, ρ, taken over subjects, between the median coefficient a from Experiment 1 and the difference between weightings on Dimensions 2 and 1 from Experiment 2 is 0.86. These substantial individual differences seem unlikely to be based on sensory variations; rather, they suggest that people vary in the cognitive strategies they use when comparing heteromodal stimuli. Nevertheless, whatever their source, such strategies, if strategies they be, appear consistent over the two psychophysical tasks.
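The consistency analysis can be made concrete with the published weights. The sketch below ranks subjects by the Dimension 2 minus Dimension 1 difference used in the correlation with Experiment 1; Experiment 1's coefficients a are not reproduced in this section, so ρ = .86 itself is not recomputed here:

```python
# Dimension weights from Table 2 (Dimension 1 = loudness, Dimension 2 = pitch)
weights = {
    1:  (0.297, 0.707),
    3:  (0.087, 0.867),
    7:  (0.412, 0.782),
    8:  (0.504, 0.627),
    10: (0.819, 0.267),
    12: (0.831, 0.190),
    14: (0.564, 0.592),
    16: (0.746, 0.165),
}

# Pitch-minus-loudness difference: positive means a pitch-weighted subject
diff = {s: p - l for s, (l, p) in weights.items()}
ranked = sorted(diff, key=diff.get, reverse=True)
print(ranked[0], ranked[-1])  # prints: 3 12
```

Consistent with the text, Subject 3 falls at the pitch-weighted extreme and Subject 12 at the loudness-weighted extreme.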

Multidimensional Structure of Tones

Finally, I repeated the multidimensional scaling, but applied it this time to the subset of pairs of auditory stimuli alone. Not surprisingly, the resulting two-dimensional solution generally resembled the corresponding two-dimensional solution obtained for the entire bimodal array. There was, however, one modest qualification: Whereas loudness and pitch accounted for about equal amounts of total variance (36.5% and 36.4%) of the bimodal array (tones and lights together), with tones alone pitch accounted for substantially more variance than loudness (45.5% versus 33.0%), and the two dimensions together accounted for a somewhat larger total proportion of the variance.

Thus when multidimensional scaling was applied to tones alone, it was pitch that predominated, at least for the present selection of stimuli. Although the relative scale values of Dimensions 1 and 2 are largely the same in both analyses, with and without the inclusion of lights, the salience of pitch is less in the former, greater in the latter. This outcome implies a slight difference in behavior (or strategy, or whatever) on the part of the subjects between the ways they judged tone-tone pairs and the ways they judged tone-light pairs. In comparing lights with tones, the salience of pitch decreases relative to that of loudness. Or, in other words, loudness took on greater importance when the subjects compared tones with lights than when they compared tones with tones. This outcome is curious in light of the evidence from Experiment 1 that pitch proves stronger than loudness as a cross-modal analogue to brightness. Again, though, it suggests that the additive weightings of pitch and loudness have an "optional" component. In retrospect, it would have been useful had the subjects also served in a separate session in which they judged the similarities of tones alone, in a unimodal paradigm.2

General Discussion

Previous studies have shown that both pitch and loudness serve as cross-modal analogues to visual brightness. Synesthetic perceivers often report that loud (vs. soft) and high-pitched (vs. low-pitched) sounds induce bright (vs. dim) visual images (see Marks, 1975). Nonsynesthetic individuals, young children and adults alike, will match the louder or the higher pitched of two sounds to the brighter of two lights (Marks et al., 1987); indeed, when matching tones to lights of constant brightness, adults trade off loudness for pitch (Marks, 1978, Figure 18). In verbal tasks, children and adults rate sunlight to be not only brighter than moonlight but also louder and higher pitched (Marks et al., 1987). The adjective bright is rated high pitched and loud, whereas dim is rated low pitched and soft (Marks, 1982a). And the speed and accuracy in discriminating bright from dim lights are greater in the presence of tones matching in either pitch or loudness: Response to a bright light is relatively fast and accurate in the presence of a high-pitched or loud sound and slower and less accurate in the presence of a low-pitched or soft sound; but response to a dim light is faster and more accurate in the presence of a low-pitched or soft sound (Marks, 1987a). The impression left by all of these findings is that visual brightness has at least two structural and functional correlates in the auditory realm: pitch and loudness.

Salience of Pitch as an Analogue to Brightness

The present results also agree with earlier evidence that, as a correlate to visual brightness, pitch is overall at least as strong as loudness, maybe stronger or more dominant, at least given the stimuli used here. In Experiment 1, most subjects relied on pitch rather than on loudness. In Experiment 2 the two attributes turned out nearly equal in importance, but of course that outcome reflected the selection of subjects: Note that I chose for Experiment 2 a subset of subjects whose results in Experiment 1 indicated them to be more equal on average than others in their reliance on pitch and loudness. In keeping with the greater salience of pitch-brightness similarity, Marks et al. (1987) found a higher

2 Recently, I did conduct a unimodal study in which a different group of subjects rated the similarities of tone-tone pairs; multidimensional scaling revealed a Euclidean metric (superior in fit to a city-block metric) in which similarity relied much more on pitch than on loudness.


proportion of 4-year-olds matching pitch and brightness than matching loudness and brightness. Thus even young children find pitch more salient, or at least more reliable, than loudness as an analogue to brightness.

The high salience of pitch-brightness similarity remains intriguing. Nearly a half-century ago, the philosopher Hartshorne (1934) postulated on the basis of phenomenological evidence that pitch rather than loudness is the "proper" analogue to visual brightness; Hartshorne's argument rests on the contention that pitch rather than loudness should be viewed as the auditory embodiment of intensity: high-pitched sounds have the same sharpness and compactness that characterize bright lights. The intensity common to the experiences in both sensory modalities is, so to speak, a high "intensity per unit volume." Though Hartshorne's account falls short of adequate explanation (indeed, his detailed account seemingly rests on essentially the kinds of introspections used by subjects who partake in cross-modal matching experiments like the present ones), it does point to the need to specify more precisely what it is that can make experiences of different modalities alike despite their being qualitatively distinct.

Salience and Primacy of Pitch and Loudness as Auditory Dimensions

The present findings also suggest that in their cross-modal connections to brightness the auditory attributes of pitch and loudness possess a rather high degree of dimensional salience. It is hard to reconcile these findings with my earlier suggestion (Marks, 1975) that the dual associations in synesthesia of pitch with brightness and loudness with brightness might represent a simpler, unitary association of auditory density with brightness, for which density in some sense "contains" within itself both pitch and loudness. Given the present results, it seems better to maintain the self-evident multiplicity of relations, in keeping with Karwoski et al.'s (1942) "principle of alternate polarities and gradients," stating that synesthesia can involve one to many associations of attributes in different modalities. To be sure, it is possible to characterize the cross-modal relation of lights to tones in terms of an equivalence between brightness and some weighted sum of the two attributes, loudness and pitch. Nevertheless, nonsynesthetic individuals can differ markedly in the degree to which their judgments rely on the one auditory attribute or the other; this is true both when the task involves matching and when it involves similarity scaling. Moreover, individuals appear to be at least modestly consistent across tasks in their reliance on pitch or loudness.

Yet there remains a puzzling aspect to this duplex, if not duplicitous, rule of correspondence. Even for a given person, the relative weightings of pitch and loudness (e.g., the coefficients a and b in Equation 2) are not wholly invariant. Experiment 1 revealed that the weightings change systematically over contextual conditions, depending somewhat on the absolute values of pitch and loudness available to match to brightness. And the multidimensional scaling solutions of Experiment 2 revealed that the weightings differed depending on whether comparisons to visual brightness were included: When tones were compared to other tones only, pitch counted more than when tones were compared both to other tones and to lights. The two experiments agree in suggesting that there is an "optional" component to pitch-loudness weighting. Relative weighting coefficients are not built in or invariant, not even within a given individual, but can vary with the task.
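Assuming Equation 2 has the weighted-sum form described in the text, the context dependence of the weights can be sketched as follows (the function and values are illustrative assumptions, not the article's):

```python
def auditory_brightness(pitch, loudness, a, b):
    """Weighted-sum combination in the spirit of Equation 2:
    B_a = a * pitch + b * loudness. Per the text, a and b are not
    fixed constants; they shift with stimulus context and task,
    which is the 'optional' component of the weighting."""
    return a * pitch + b * loudness

# The same tone under two weighting regimes: a pitch-reliant subject
# or context versus a loudness-reliant one yields different B_a values
b_pitch_reliant = auditory_brightness(0.8, 0.3, a=0.9, b=0.1)
b_loudness_reliant = auditory_brightness(0.8, 0.3, a=0.3, b=0.7)
```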

What is particularly puzzling about this high degree of salience of pitch and loudness is its coexistence within an interactive framework. Two classes of findings point to interactions.

First, the dissimilarity between pairs of tones appears to follow more closely a Euclidean than a city-block rule (Experiment 2; see also Grau & Kemler Nelson, 1988; Melara, 1989). Thus the contribution to dissimilarity of a constant difference on one dimension, say, pitch, depends on the size of the difference on the other dimension, loudness.

Second, in tasks requiring speeded classification of tones on the basis of either pitch or loudness, uncorrelated variation in the levels of the irrelevant dimension depresses performance relative to a baseline condition containing no variation in the irrelevant dimension, but correlated variation (redundancy) enhances performance relative to baseline. That is, classification of stimuli by, say, their pitch is slower (and less accurate) when the stimuli also vary unpredictably in loudness (low pitch and high pitch alike occur sometimes at low and sometimes at high loudnesses) as compared with baseline, in which loudness is always constant; however, classification is faster (and more accurate) with redundancy (low pitch always associated with, say, low loudness; high pitch associated with high loudness) (Grau & Kemler Nelson, 1988; Melara & Marks, 1989; Wood, 1975).
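The three speeded-classification conditions just described can be made concrete in a brief sketch (the stimulus values are illustrative, not taken from these studies):

```python
from itertools import product

low_pitch, high_pitch = 200, 2000   # Hz (illustrative)
soft, loud = 45, 70                 # dB (illustrative)

# Baseline: the irrelevant dimension (loudness) is held constant
baseline = [(p, soft) for p in (low_pitch, high_pitch)]

# Uncorrelated (filtering): loudness varies orthogonally with pitch
filtering = list(product((low_pitch, high_pitch), (soft, loud)))

# Redundancy: loudness varies in perfect correlation with pitch
correlated = [(low_pitch, soft), (high_pitch, loud)]

# With integral dimensions, classifying by pitch is slower in the
# filtering condition and faster in the correlated condition,
# relative to baseline.
```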

The interactions in similarity scaling and in speeded classification characterize performance in what Garner (e.g., 1974) has called pairs of "integral" perceptual dimensions. Besides pitch and loudness, other dimensions that interact integrally are lightness and saturation. Integral pairs of perceptual dimensions contrast with what Garner terms "separable" pairs of dimensions, such as lightness and size; separable dimensions obey city-block metrics in similarity scaling and reveal no changes in performance at speeded classification when the irrelevant dimension takes on variable levels. The salience of pitch and loudness described at the beginning of this section, based on evidence that weightings of pitch and loudness vary with the stimulus context, probably corresponds to some aspect of the primacy of pitch and loudness as dimensions of perceptual experience. Within the domain of tones that vary in sound frequency and intensity, pitch and loudness may have a kind of ontic priority, may represent perceptual Urbegriffe. Subjects seem to rely on underlying values of pitch and loudness even when categorizing stimuli with regard to volume and density or with regard to other less meaningful dimensions (Grau & Kemler Nelson, 1988; Melara & Marks, 1989).

Figure 6 provides my attempt to capture some of the main features of these interrelations. Consider two acoustic stimuli, A and B, each stimulus having particular values of sound frequency and intensity. The intent of Figure 6 is to sketch


[Figure 6 diagram: for each of Stimulus A and Stimulus B, the stimulus dimensions frequency and intensity feed through psychophysical, perceptual, and postperceptual stages.]

Figure 6. Schematic model of stages in processing the auditory dimensions of pitch and loudness.

some of the main sensory, perceptual, and postperceptual stages in processing frequency and intensity. Thick lines and large arrows in the figure represent primary effects or influences; lighter lines represent secondary influences. I am not claiming, however, that either primary effects or secondary effects are all of one type or have a common kind of mechanism; on the contrary, the underlying mechanisms at different stages are undoubtedly different.

At one of the earliest levels in processing, we find the psychophysical interactions between the two stimulus dimensions, frequency and intensity, interactions that result in the production of the sensory variables P1 and L1. Often, psychophysicists refer to these quantities as pitch and loudness, but I do wish to distinguish P1 and L1 from other quantities that emerge subsequently in processing. For the most part, frequency determines P1 and intensity determines L1, but intensity can also affect P1 and frequency has a substantial effect on L1. These psychophysical relations presumably rest on principles of sensory coding, principles that, unfortunately, still remain only partly understood. It is worth pointing out, as an example, the diversity in the underlying processes. The influence of intensity on P1 presumably arises through some neural interaction in underlying codes (related perhaps to the use of neural spike frequency to code both P1 and L1); the influence of frequency on L1, however, is largely determined by preneural factors, by the filtering characteristics of the external ear canal and middle ear. In many experiments, it is useful to minimize interactions of the psychophysical sort by equating stimuli of different frequencies for loudness and by equating stimuli of different intensities for pitch; in Experiment 1, for example, the specification of sound intensity in terms of loudness level served to reduce, if not eliminate, effects of sound frequency on loudness (L1). The psychophysical interactions make themselves known, of course, at all subsequent levels of processing, because L1 and P1 become inputs to later stages.

The second stage represents what I shall call perceptual interactions, interactions of the sort found in tasks requiring speeded classification. When trying to identify whether different sounds have, for example, the same pitch (P2), performance is affected by variations in loudness (L2).

The third stage represents the postperceptual conflation of P2 and L2 onto the variable represented earlier as Ba, the auditory quantity that subjects presumably compare to brightness in cross-modal matching tasks. Figure 6 shows Ba comprising two overlapping elements; this "loose coupling" is meant to indicate that the status of Ba differs from that of other variables in the diagram. Ba does not represent a new, emergent, unitary perceptual quality, but rather a result of postperceptual, and sometimes optional, processing. Ba represents a sum, probably a linear sum (Equation 2), of P2 and L2; however, as Experiment 1 showed, the weightings of P2 and L2 are strongly tied to stimulus context. Moreover, although P2 and L2 provide the raw materials for Ba, the dimensional identities of pitch and loudness are not lost (as evidenced by the results of Experiment 2).

Finally, a single solid line in Figure 6 connects the processing of Stimulus A to the processing of Stimulus B. This line indicates the interrelation between different stimuli that emerges in multidimensional similarity scaling. Although this line is drawn at the level of Ba, remember that Ba itself comprises P2 and L2. There is no clear evidence whether the process of interstimulus comparison, which produces an interactive (Euclidean) metric, itself would precede or follow the process of summation, or whether the two processes merely stand for the experimental results that emerge out of the different demands inherent in two different tasks of judgment.

Is Cross-Modal Similarity Primarily Perceptual?

For more than a decade I have contended that cross-modal similarities, in particular, pitch-brightness and loudness-brightness similarities, are quintessentially perceptual in nature, that they are inherent in the nature of auditory and visual experiences themselves, and that they probably derive from some common characteristics of neural coding in the auditory and visual modalities (Marks, 1975, 1978; Marks et al., 1987). Consistent with this contention is the early developmental appearance of these similarities. Similarities of pitch-brightness and loudness-brightness are evident in voluntary cross-modal matches at age 4 years (Marks et al., 1987), and equivalence of loudness-brightness was reported in autonomic responses at age 1 month (Lewkowicz & Turkewitz, 1980).

Nevertheless, even if these similarities have an intrinsic structural basis, even if they originate in phenomenal experience and sensory coding, it is perfectly possible that cross-modal associations come, developmentally, to be mediated by more cognitive structures. The very fact that adults, and even young children, readily interpret verbal expressions metaphorically, by the same rules that govern perceptual matching, implies that cross-modal correspondences are represented at a semantic level. Or at least, verbal processes have ready access to perceptual relations, including cross-modal relations. Moreover, Walker and Smith (1984) found tones (varying in pitch) to influence reaction times to words (describing visual attributes), an outcome that implies interactions at a semantic level of processing. It is at least conceivable, therefore, that perceptual judgments themselves, cross-modal matches and ratings of cross-modal similarities, come to be made on the basis of some type of abstract knowledge about perceptual relations.

The present results appear to be especially compatible with the view that higher order cognitive processes help mediate cross-modal similarities, particularly when the task demands integration of multiple attributes from one of the modalities. Experiment 1 revealed a large relativistic component in the matching of both pitch and loudness to brightness, a finding that seems compatible with the use of semantic markers to indicate the way that dimensions are aligned; that is, "dim" = "low pitched" and "soft," whereas "bright" = "high pitched" and "loud." Even more relevant to this point are the large and consistent interindividual differences, revealed by both Experiments 1 and 2, in the extent to which either pitch or loudness can contribute to matches with brightness. Although it is conceivable that such individual differences are perceptually based and even "hard wired," it seems more reasonable to infer that the reliance on either pitch or loudness is in some sense optional, and contingent on the existence of processes that operate on relatively abstract qualities derived from percepts. Maybe perceptual similarities provide raw materials, upon which cognitive processes come to operate and elaborate.

Perhaps the strongest evidence in favor of the notion that cross-modal matches can be mediated at least in part by higher level systems comes from studies in which the direction per se of matching turns out to be malleable. It is well known from psychophysical research that people, when appropriately instructed, can judge "inverse" attributes: they can make quantitative judgments of smoothness as well as roughness of surfaces (Stevens & Harris, 1962), of dimness as well as brightness of lights (Stevens & Guirao, 1963), of hardness as well as softness of materials (Harper & Stevens, 1964). More pertinent, subjects can make cross-modality matches to either brightness or dimness, to either softness or loudness, by means of another sensory modality (Stevens & Guirao, 1963), inverting, on instruction, the dimension of intensity; presumably, people manage this task by means of some mental manipulation of the dimension's polarity.

Still more to the point, the emergence of "inverse" attributesis not merely the result of experimental demand. Even wh ensubjects are not instructed how to order their cross-modalmatches, under certain conditions (such as the matching oftones that vary in loudness to surfaces that vary in lightness,from black through gray to white), the very same subject mayinvert the direction of matches on different occasions, some-times matching a series of increasing loudnesses with a seriesof increasing lightnesses, other times matching increasingloudness with increasing darkness (Marks, 1974).

Inversion of dimensions constitutes an extreme form ofrelativity in judgment. Neither the end points no r intermedi-ate levels of heteromodal dimensions need be rigidly con-nected. The findings of Experiment 1 are consistent withinversion of dimensions and with other results (Marks et al.,

1986), suggesting a sizable relativistic component to cross-modal m atching. On the other hand, m atches are not whollyrelative; rather, they appear to blend the relative with theabsolute (see Marks et al., 1986). (Intramodal matches canalso reveal a large relative com ponen t; M arks, 1988.) If knowl-edge about cross-modal relations does d erive from a primaryperceptual similarity, the residual absolute comp onent to thissimilarity may represent intrinsic equivalences between sub-jective levels of different modalities.

What does the capacity to manipulate the direction of across-modality match imply? Fo r one, it hints at the existenceof an underlying transperceptual mediator—probably linguis-tic—that operates in the production of matches; at the sametime, it implicates a system that can abrogate the rules of

perceptual cross-modal similarity, making possible a degreeof creativity in the production of cross-modal relations andthus perhaps providing a vehicle if not an imperative for stillhigher level cognitive skills, such as the construction of met-aphor.

References

Boring, E. G. (1942). Sensation and perception in the history of experimental psychology. New York: Appleton-Century-Crofts.
Boring, E. G., & Stevens, S. S. (1936). The nature of tonal brightness. Proceedings of the National Academy of Sciences, 22, 514-521.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283-319.
Carvellas, T., & Schneider, B. (1972). Direct estimation of multidimensional tonal dissimilarity. Journal of the Acoustical Society of America, 51, 1839-1848.
Cohen, N. E. (1934). Equivalence of brightness across modalities. American Journal of Psychology, 46, 117-119.
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal of the Acoustical Society of America, 5, 82-108.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Grau, J. W., & Kemler Nelson, D. G. (1988). The distinction between integral and separable dimensions: Evidence for the integrality of pitch and loudness. Journal of Experimental Psychology: General, 117, 347-370.
Guirao, M., & Stevens, S. S. (1964). Measurement of auditory density. Journal of the Acoustical Society of America, 36, 1175-1182.
Harper, R., & Stevens, S. S. (1964). Subjective hardness of compliant materials. Quarterly Journal of Experimental Psychology, 16, 204-215.
Hartshorne, C. (1934). The philosophy and psychology of sensation. Chicago: University of Chicago Press.
Hornbostel, E. M. von (1925). Die Einheit der Sinne. Melos, Zeitschrift für Musik, 4, 290-297.
Hyman, R., & Well, A. (1967). Judgments of similarity and spatial models. Perception & Psychophysics, 2, 233-248.
Indow, T., & Kanazawa, K. (1960). Multidimensional mapping of Munsell colors varying in hue, chroma, and value. Journal of Experimental Psychology, 59, 330-336.


Karowski, T. F., Odbert, H. S., & Osgood, C. E. (1942). Studies in synesthetic thinking: II. The role of form in visual responses to music. Journal of General Psychology, 26, 199-222.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Langfeld, H. S. (1914). Note on a case of chromaesthesia. Psychological Bulletin, 11, 113-114.
Lewkowicz, D. J., & Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory-visual intensity matching. Developmental Psychology, 16, 597-607.
Marks, L. E. (1974). On associations of light and sound: The mediation of brightness, pitch, and loudness. American Journal of Psychology, 87, 173-188.
Marks, L. E. (1975). On colored-hearing synesthesia: Cross-modal translations of sensory dimensions. Psychological Bulletin, 82, 303-331.
Marks, L. E. (1978). The unity of the senses: Interrelations among the modalities. New York: Academic Press.
Marks, L. E. (1982a). Bright sneezes and dark coughs, loud sunlight and soft moonlight. Journal of Experimental Psychology: Human Perception and Performance, 8, 177-193.
Marks, L. E. (1982b). Synesthetic perception and poetic metaphor. Journal of Experimental Psychology: Human Perception and Performance, 8, 15-23.
Marks, L. E. (1985). Multidimensional scaling in the assessment of cross-modal similarity. In E. E. Roskam (Ed.), Measurement and personality assessment (pp. 165-177). Amsterdam: Elsevier.
Marks, L. E. (1987a). On cross-modal similarity: Auditory-visual interactions in reaction time. Journal of Experimental Psychology: Human Perception and Performance, 13, 384-394.
Marks, L. E. (1987b). On cross-modal similarity: Perceiving temporal patterns by hearing, touch, and vision. Perception & Psychophysics, 42, 250-256.
Marks, L. E. (1988). Magnitude estimation and sensory matching. Perception & Psychophysics, 43, 511-525.
Marks, L. E., Hammeal, R. J., & Bornstein, M. H. (1987). Perceiving similarity and comprehending metaphor. Monographs of the Society for Research in Child Development, 52 (Whole No. 215).
Marks, L. E., Szczesiul, R., & Ohlott, P. (1986). On the cross-modal perception of intensity. Journal of Experimental Psychology: Human Perception and Performance, 12, 517-534.
Melara, R. D. (1989). Similarity relations among synesthetic stimuli and their attributes. Journal of Experimental Psychology: Human Perception and Performance, 15, 212-231.
Melara, R. D., & Marks, L. E. (1989). Perceptual primacy of dimensions: Support for a model of dimensional interaction. Unpublished manuscript.
Mellers, B. A., & Birnbaum, M. H. (1982). Loci of contextual effects in judgment. Journal of Experimental Psychology: Human Perception and Performance, 8, 582-601.
Ortmann, O. (1933). Theories of synesthesia in the light of a case of color-hearing. Human Biology, 5, 155-211.
Parducci, A. (1965). Category judgment: A range-frequency model. Psychological Review, 72, 407-418.
Pruzansky, S. (1975). How to use SINDSCAL: A computer program for individual differences in multidimensional scaling. Murray Hill, NJ: Bell Telephone Laboratories.
Rich, G. J. (1919). A study of tonal attributes. American Journal of Psychology, 30, 121-164.
Schneider, B. A., & Bissett, R. J. (1981). The dimensions of tonal experience: A nonmetric multidimensional scaling approach. Perception & Psychophysics, 30, 39-48.
Shepard, R. (1964). Attention and the metric structure of the stimulus space. Journal of Mathematical Psychology, 1, 54-87.
Stevens, J. C., & Marks, L. E. (1980). Cross-modality matching functions generated by magnitude estimation. Perception & Psychophysics, 27, 379-389.
Stevens, S. S., & Guirao, M. (1963). Subjective scaling of length and area and the matching of length to loudness and brightness. Journal of Experimental Psychology, 66, 177-186.
Stevens, S. S., Guirao, M., & Slawson, A. W. (1965). Loudness, a product of volume times density. Journal of Experimental Psychology, 69, 503-510.
Stevens, S. S., & Harris, J. R. (1962). The scaling of subjective roughness and smoothness. Journal of Experimental Psychology, 64, 489-494.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Walker, P., & Smith, S. (1984). Stroop interference based on the synesthetic qualities of auditory pitch. Perception, 13, 75-81.
Wicker, F. W. (1968). Mapping the intersensory regions of perceptual space. American Journal of Psychology, 81, 178-188.
Wood, C. C. (1975). Auditory and phonetic levels of processing in speech perception: Neurophysiological and information-processing analysis. Journal of Experimental Psychology: Human Perception and Performance, 1, 3-20.
Zwislocki, J. J., & Goodman, D. A. (1980). Absolute scaling of sensory magnitudes: A validation. Perception & Psychophysics, 28, 28-38.

Received November 3, 1987

Revision received December 5, 1988

Accepted December 7, 1988