Journal of Experimental Psychology: Human Perception and Performance, 1989, Vol. 15, No. 3, 586-602
Copyright 1989 by the American Psychological Association, Inc. 0096-1523/89/$00.75
On Cross-Modal Similarity: The Perceptual Structure of Pitch,
Loudness, and Brightness
Lawrence E. Marks
John B. Pierce Foundation Laboratory and Yale University
Examined how pitch and loudness correspond to brightness. In Experiment 1, 16 Ss identified which of 2 lights more resembled each of 16 tones; in Experiment 2, 8 of the same 16 Ss rated the similarity of lights to lights, tones to tones, and lights to tones. (1) Pitch and loudness both contributed to cross-modal similarity, but for most Ss pitch contributed more. (2) Individuals differed as to whether pitch or loudness contributed more; these differences were consistent across matching and similarity scaling. (3) Cross-modal similarity depended largely on relative stimulus values. (4) Multidimensional scaling revealed 2 perceptual dimensions, loudness and pitch, with brightness common to both. A simple quantitative model can describe the cross-modal comparisons, compatible with the view that perceptual similarity may be characterized through a malleable spatial representation that is multimodal as well as multidimensional.
Over the past several years, I have explored the perceptual similarities that characterize relatively abstract dimensions of auditory and visual experience. These similarities reveal themselves in a variety of domains, including (a) synesthetic perception (Marks, 1975), (b) cross-modal matching (Marks, 1974, 1978; Marks, Hammeal, & Bornstein, 1987), (c) interpretation of cross-modal metaphors (Marks, 1982a, 1982b; Marks et al., 1987), and (d) speed of sensory information processing (Marks, 1987a). Evidence from all four domains establishes the existence of correspondences between the visual dimensions of brightness, lightness, size, and shape on the one hand and the auditory dimensions of pitch and loudness on the other. In particular, much of this evidence converges to support the view that the single dimension of brightness maps onto the two dimensions of pitch and loudness.
Domains of Cross-Modal Similarity
An observation commonly made by synesthetic perceivers, in whom sounds involuntarily and unavoidably arouse secondary visual qualities or images, is that loud (vs. soft) and high-pitched (vs. low-pitched) tones produce bright (vs. dim or dark) colors (Karwoski, Odbert, & Osgood, 1942; Marks, 1975). Despite the considerable diversity and idiosyncrasy that synesthetic perceivers report in their experiences, certain aspects of synesthetic experiences appear uniform, even universal. Notable in this respect is the correspondence between pitch and brightness, which represents perhaps the most salient and pervasive relation in synesthetic perception.

This research was supported by National Institutes of Health Grant NS21326. I thank Lynne Papiernik and Elizabeth Warner for their assistance and Robert Melara for his critical reading and insights. James Cutting, Michael Kubovy, Bruce Schneider, and an anonymous reviewer provided thoughtful comments. Correspondence concerning this article should be addressed to Lawrence E. Marks, John B. Pierce Foundation Laboratory, 290 Congress Avenue, New Haven, Connecticut 06519.
The second finding to note is that under conditions that permit "free," unconstrained cross-modal matching (i.e., matching in which the direction or ordering of sensations is not specified in advance), nonsynesthetic perceivers (adults and children) match both loud and high-pitched sounds with bright lights, and soft and low-pitched sounds with dim lights (Marks, 1974, 1978; Marks et al., 1987). Thus the equivalences that appear in involuntary, synesthetic perception also appear in voluntary, cross-modal matching.
Third, these very same structural equivalences, evident in synesthesia and cross-modal matching, emerge in language. Children and adults alike interpret words that describe high loudness or high pitch to imply bright colors, words that describe low loudness or low pitch to imply dim colors, and vice versa: For example, sunlight is rated louder and higher in pitch than moonlight; roar is rated brighter than whisper (Marks, 1982a; Marks et al., 1987). To a great extent the understanding of cross-sensory metaphors follows the rules governing synesthetic and cross-modal perception.
Synesthetic perception, cross-modal matching, and comprehension of cross-sensory metaphor all speak to the issue of structural relations among dimensions of experience. Cross-modal correspondences appear also in functional processes, in sensory information processing. The fourth finding is that reaction times to bright lights are faster (and errors are fewer) when the lights are accompanied by loud (vs. soft) or by high-pitched (vs. low-pitched) sounds, whereas reaction times to dim lights are faster when accompanied by soft or by low-pitched sounds (Marks, 1987a). Moreover, low- and high-pitched sounds similarly affect speed of response in discriminating words describing visual experiences (Walker & Smith, 1984). In nonsynesthetic perceivers, therefore, correspondences between dimensions of heteromodal experiences need not rely on voluntary behavior; interactions between dimensions can take place quite automatically. Thus the matches of loudness with brightness and of pitch with brightness pervade
both structural and functional processes in perception and in
language.
Properties of Cross-Modal Similarities
What is the source of these pervasive connections across modalities? For example, are they "hard wired"? The qualitative rules of cross-modal correspondence may themselves be intrinsic to perception; associations across modalities could emerge from basic sensory/perceptual processes. Consistent with this hypothesis is the evidence that cross-modal similarities appear early in life. Lewkowicz and Turkewitz (1980), for example, reported behavioral equivalence between loudness and brightness in 3-week-old infants. Correspondence between loudness and brightness might rest on their use of a common neurophysiological code, for instance, on the number of neural impulses per unit time.
The hypothesis that cross-modal similarities rest on common neural codes can lead naturally to a quantitative metric: Perceptual experiences in different modalities are theorized to be equivalent when the underlying neural processes are equal. Loudness "equals" brightness when rates of neural activity in the auditory and visual systems (or specified sites therein) are equal. A model of this sort implies absolute equivalences between modalities: Particular levels of loudness equal particular levels of brightness.
But sensory coding is only one possible basis for cross-modal equivalence. Perhaps modalities are not so strongly interwoven through biology; perhaps cross-modal connections depend only in part on intrinsic sensory functions, whereas in significant measure they depend on higher level cognitive processes. Even if biology predisposes the format of the qualitative rules, the rules that specify which dimensions correspond, cognitive and especially linguistic development may override biological disposition so as to determine the quantitative rules, the rules that specify which values on given dimensions correspond. By this token, even if loudness and brightness find their basic resemblance in common physiological processes, they may subsequently come to owe their communality to a common semantic code, with loud and bright alike represented as "intense," and soft and dim as "nonintense." (Of course, the common semantic codes may owe their ultimate origin to the common perceptual physiology.) Moreover, to the extent that soft contrasts with loud and dim with bright by means of classes of differing relative values, semantic coding implies relational correspondences: The softer sound is to the dimmer light as the louder sound is to the brighter light.
Cross-Modal Similarity Is Multidimensional
The present study comprised two experiments that aimed to explore, within a conceptual framework that is explicitly multidimensional, the relations of pitch and loudness to brightness, by asking three interrelated questions.

First, how do loudness and pitch interact in determining the similarities of sounds to lights? Does similarity rest on just one attribute or the other, or do loudness and pitch combine in an integrative process? And if pitch and loudness do integrate, does everyone weight the attributes equally, or do some individuals place primary emphasis on pitch, others on loudness? There is even an alternate conceptualization, namely, that pitch and loudness are themselves components of a third, unitary attribute, density, and that density is the singular and primary auditory correlate of visual brightness (see Marks, 1975). It is intriguing that the attribute now known as auditory density (see Guirao & Stevens, 1964; Stevens, Guirao, & Slawson, 1965) once was called "brightness" (see Boring & Stevens, 1936). To the best of my knowledge, no data exist that address the issue of how individual subjects integrate loudness and pitch in making cross-modal comparisons between sounds and lights.
The second question is implicit in the first: What is the range of individual differences in the ways that people use pitch or loudness in cross-modal judgment? In particular, are individuals reliable across tasks in their reliance on the one attribute or the other? If cross-modal correspondences are "hard wired," we might expect considerable uniformity both across subjects and across tasks.
Third, how absolute are the cross-modal associations? Are intersensory matches based wholly or largely on absolute corresponding levels of sensory experience, in that a particular brightness equals a particular pitch or loudness, or are matches based on relations, in that the greater of two brightnesses matches the higher of two pitches or the greater of two loudnesses?
Synesthesia provides evidence that at least a few cross-modal connections can be absolute, notably the correspondences between particular colors and the vocal or musical sounds that induce them; it is common to find synesthetic perceivers reporting connections between notes of the musical scale and colors of particular hue and brightness (e.g., Langfeld, 1914; Ortmann, 1933). Absolute connections, however, may be limited to instances in which the percepts have clear categorical representations, as is the case with musical notes and with hues. Not all correspondences, though, need be absolute; even in synesthesia, it is possible that relative values rather than absolute connections govern correspondences among psychologically more continuous dimensions like brightness, pitch, and loudness.
Hornbostel (1925) was perhaps the first to argue that in nonsynesthetic perception as well as in synesthesia, correspondences among continuous dimensions of different sense modalities are absolute. He justified his argument by reporting transitivity of cross-modal matches between pitch and odor, surface brightness and odor, and pitch and surface brightness. On the other hand, Cohen (1934) soon after countered that Hornbostel's cross-modal matches were only fortuitously absolute, a result of the particular stimulus values that Hornbostel used as standards; Cohen's own results implied a strong relative component to matching.
In recent years, the same dispute has resurfaced, with Zwislocki and Goodman (1980) and J. C. Stevens and Marks (1980) arguing for the existence of absolute correspondences, especially in the perception of intensity, while Mellers and Birnbaum (1982) have claimed that cross-modal judgments rely on relative correspondences. The issue remains far from settled. Lewkowicz and Turkewitz's (1980) findings imply absolute correspondence of loudness and brightness in infants. Most recently, Marks, Szczesiul, and Ohlott (1986) reported cross-modality matches in adults between loudness and brightness (and between loudness and vibration) that appeared to compose roughly equal parts of the absolute and the relative. In that study, the cross-modal matches were derived indirectly, through numerical estimations of either intensity (magnitude estimates of stimuli from different modalities) or dissimilarity (magnitude estimates of differences between heteromodal pairs of stimuli). One goal of the first experiment here was to use nonnumerical judgments to examine individual differences and consistencies in absolute/relative matching of pitch and loudness to brightness.
Toward a Model of Cross-Modal Matching
To help formalize some of the empirical findings and theoretical concepts, let us consider the following, which constitute initial steps toward developing a model that might apply to various types of cross-modal judgment. In particular, I seek to establish a model that can apply to the two kinds of judgment obtained in the present pair of experiments: (a) matching (discriminative) judgments, in which subjects decide which of two lights a particular tone more closely resembles, and (b) similarity ratings, in which subjects estimate the degree of resemblance between a tone and a light.
In general terms, the model states that cross-modal judgments rest on the person's comparison of two perceptual quantities: an auditory quantity, Ba, which in the present experiments comprised pitch P and loudness L, and a visual quantity, Bv, which corresponded to brightness. Consequently, the model contains four main components: First, there are the psychophysical transformations of sound frequency and intensity to perceptual values of P and L. (There is also a psychophysical transformation of luminance to values of brightness Bv, but this transform was not a matter of concern in these experiments.) Second, there is a rule of auditory integration, by which P and L combine into Ba. Third is a rule of cross-modal comparison, specifying the mathematical operation by which Ba is compared with Bv. Finally, there is a rule of response, by which the results of the cross-modal comparison translate into sensory matches or ratings of similarity. In the remainder of this section I shall consider the possible rules of integration and comparison, deferring to later in the article a consideration of the psychophysical transformations and response rules.
How might Ba depend jointly on pitch and loudness? And how might subjects compare the auditory quantity Ba to the visual quantity Bv? As already noted, Ba represents the quantity, defined here in terms of pitch and loudness, that I assume subjects compare on each trial to visual brightness. Although an infinitude of definitions of Ba are possible, two definitions offer themselves as main candidates. One is a weighted vector combination:

Ba = (aP² + bL²)^½. (1)

The second is a weighted linear combination:

Ba = aP + bL. (2)

Which combinatorial rule is the more appropriate? Possibly relevant to this issue is the multidimensional analysis of perceived tonal similarity. Schneider and Bissett (1981) identified the putative attribute of auditory density as lying on an axis halfway between pitch and loudness in a two-dimensional, Euclidean auditory space. In some sense, then, auditory density may be said to "contain" pitch and loudness. These findings are consistent with an earlier hypothesis of my own (Marks, 1975), namely, that in the synesthetic matching of lights with sounds people may be equating visual brightness with auditory density. Moreover, Schneider and Bissett noted that density, and perhaps therefore the quantity Ba, may be represented by the linear rule expressed in Equation 2 (and in which, on average, a = b, because in their study density lay on an axis about 45° between pitch and loudness).

And what rule governs the comparison between Ba and Bv? Consider the possibility that cross-modal differences are perceived nonlinearly; one rule of comparison might state that the cross-modal difference Da,v is given by

Da,v = |Ba² − Bv²|^½. (3)

Another rule is linear:

Da,v = |Ba − Bv|. (4)
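As a concrete illustration of the two candidate integration rules, the following Python sketch computes Ba both ways; the weights a and b and the scale values of P and L are arbitrary placeholders, not estimates from the experiments:

```python
import math

def b_a_vector(P, L, a=1.0, b=1.0):
    """Weighted vector combination (Equation 1): Ba = (a*P^2 + b*L^2)^(1/2)."""
    return math.sqrt(a * P**2 + b * L**2)

def b_a_linear(P, L, a=1.0, b=1.0):
    """Weighted linear combination (Equation 2): Ba = a*P + b*L."""
    return a * P + b * L

# With unit weights and illustrative scale values P = 3, L = 4:
print(b_a_vector(3.0, 4.0))  # 5.0
print(b_a_linear(3.0, 4.0))  # 7.0

# As noted later in the text, when one weight dominates (e.g., b near 0),
# both rules reduce to a function of a single auditory dimension and
# become virtually indistinguishable.
```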
Which rule is appropriate? Consider the evidence concerning perceived similarity. For auditory similarity, most of the evidence favors a straightforward Euclidean metric. First, in tasks requiring speeded stimulus classification and discrimination, pitch and loudness interact as if they form "integral" perceptual dimensions (Grau & Kemler Nelson, 1988; Melara & Marks, 1989; Wood, 1975), and integral interactions correlate with Euclidean distance metrics (see Garner, 1974; Hyman & Well, 1967; Shepard, 1964). Second, although studies that have carried out multidimensional scaling of tones have not always been able to differentiate city-block from Euclidean metrics with respect to goodness of fit (e.g., Schneider & Bissett, 1981), recent empirical evidence of Grau and Kemler Nelson (1988) clearly favors the Euclidean over the city block. Third, despite the fact that they could not choose empirically between the two metrics, Schneider and Bissett (1981) argued nevertheless on theoretical grounds in favor of the Euclidean.
None of this evidence, of course, applies directly to cross-modal similarity. But a fourth finding does: Melara (1989) found the Euclidean metric superior to the city-block metric in scaling comparisons of tones (pitch) and colors (lightness); empirical evidence shown later in this article (Experiment 2) also favors the Euclidean.
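The distinction between the two metrics can be made concrete with a small sketch; the coordinates below are hypothetical pitch and loudness scale values, not data from any of the studies cited:

```python
def euclidean(p1, p2):
    """Euclidean distance in a two-dimensional pitch x loudness space."""
    return ((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2) ** 0.5

def city_block(p1, p2):
    """City-block (Manhattan) distance in the same space."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])

# Two tones differing by one unit on each dimension (hypothetical values)
tone_a, tone_b = (1.0, 0.0), (0.0, 1.0)
print(euclidean(tone_a, tone_b))   # about 1.414
print(city_block(tone_a, tone_b))  # 2.0
```

Under the city-block metric the two dimensions contribute separably to distance, which is why it is usually associated with "separable" perceptual dimensions, whereas the Euclidean metric is associated with "integral" ones.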
The framework given by the present model allows restatement of the three questions posed in the last section. Question 1 asks whether people use a rule of integration like weighted linear summation in making cross-modal judgments, and, if so, whether individuals differ substantially in their weighting coefficients a and b. Question 2 asks whether an individual's weighting coefficients are consistent over different empirical tasks, such as matching and similarity rating. And Question 3 asks whether a particular value of Ba corresponds absolutely to a particular value of Bv, independent of the particular stimulus context. I shall elaborate the model in greater detail in the course of describing two experiments in which I sought to answer these three main questions.
Experiment 1
Although psychophysicists tend to treat perceived intensity
as the primary psychological variable of cross-modality
matching, and hence to treat the relation of loudness to
brightness as central to similarity between vision and hearing,
a couple of lines of evidence suggest that pitch may be in
some sense "better" than loudness as a cross-modal analogue
to brightness. First, very young children (around 4 years) are
more likely to match low pitch with dim and high pitch with
bright than to match soft with dim and loud with bright
(Marks et al., 1987). Second, among adults the metaphorical
translation of meanings takes place at least as strongly, if not
more strongly, between pitch and brightness as between loudness and brightness (Marks, 1982a).
When given the opportunity, subjects appear to use both
auditory dimensions of pitch and loudness in matching to
brightness: Marks (1978) obtained cross-modality functions
that showed a trade-off between pitch and loudness in the
matching of sounds to lights of constant brightness. To match
a given brightness, as pitch increased loudness decreased
concomitantly. To obtain those results, however, data were
pooled over subjects; the study was not designed to determine
to what extent certain subjects may have relied on pitch,
others on loudness. The triple goals of Experiment 1 were (a)
to examine more carefully the way pitch and loudness combine in cross-modal association to brightness, (b) to assess individual differences in reliance on pitch and loudness, and
(c) to determine whether the associations depend largely on
relative (contextual) values or on absolute correspondence.
In planning Experiment 1, I sought to satisfy three criteria:
First, the task should be simple (e.g., to permit future com-
parisons of children to adults). To accomplish this, I decided
on a constant-stimulus procedure. Present throughout the
experiment were two lights differing in luminance and hence
in brightness. On each trial the subject heard a tone; the task
was simply to indicate whether the tone was more similar to
the dim or to the bright light. Second, the procedure should
be capable of parceling out the contributions of pitch and
loudness. To this end, the tones varied from trial to trial in both frequency and intensity. And third, the procedure should
permit determining whether the contributions of pitch and
loudness rely on absolute correspondences or on relations. To
this end, I tested each subject sufficiently to establish individual parameters.
Method
Stimuli
The visual stimuli were two circular patches of white light, each
produced by a 100-W incandescent lamp that transilluminated a
diffusing paper. Each patch was 3.8 cm in diameter (the two were
separated horizontally by 5.7 cm edge to edge) and thus subtended a
visual angle of 3° at the viewing distance of 72 cm. Wratten neutral
filters inserted in the paths of the beams served to set the luminances to 180 and 900 cd/m² ("dim" and "bright," respectively).
The sounds were pure tones produced by a Coulbourn Instruments modular system, under the control of an Apple IIe microcomputer. The tones could take any one of six frequencies (200, 300, 540, 750, 1250, or 2000 Hz) and any one of six intensity levels (45, 50, 55, 60, 65, or 70 dB loudness level). I use a designation in terms of loudness level (LL) because, over the frequency range of 200-2000 Hz, equal sound pressure levels (SPLs) are far from appearing equally loud. The values of LL, derived here through Fletcher-Munson equal-loudness contours, represent the SPLs of a 1000-Hz tone equal in loudness to the tone in question. Thus all tones at any given LL are roughly equal in loudness, though generally not equal in SPL. Tones were presented through matched TDH-49 headphones mounted in MX41/AR cushions. Tones lasted 1.5 s, with rise and decay times of 10 ms.
The set of six frequencies was divided into two subsets of four, a "low" subset (200-750 Hz) and a "high" subset (540-2000 Hz), with the frequencies 540 and 750 Hz in common. Similarly, the set of six LLs was divided into two subsets of four, a "low" subset (45-60 dB) and a "high" subset (55-70 dB), with the LLs of 55 and 60 dB in common. Taking all possible combinations of the loudness and pitch subsets, I generated the stimuli for the four conditions of the experiment: (a) low pitch/low loudness, (b) high pitch/low loudness, (c) low pitch/high loudness, and (d) high pitch/high loudness. Thus each condition comprised 16 different tonal stimuli (4 frequencies × 4 LLs).
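The factorial construction of the four stimulus conditions can be reproduced directly from this description; the frequencies and levels are those given in the text, while the subset and condition names are merely illustrative labels:

```python
from itertools import product

frequencies = [200, 300, 540, 750, 1250, 2000]  # Hz
levels = [45, 50, 55, 60, 65, 70]               # dB loudness level (LL)

subsets = {
    "low_pitch":  frequencies[:4],   # 200-750 Hz
    "high_pitch": frequencies[2:],   # 540-2000 Hz
    "low_loud":   levels[:4],        # 45-60 dB
    "high_loud":  levels[2:],        # 55-70 dB
}

# The four contextual conditions, each crossing 4 frequencies with 4 LLs
conditions = {
    (p, l): list(product(subsets[p], subsets[l]))
    for p in ("low_pitch", "high_pitch")
    for l in ("low_loud", "high_loud")
}

# Each condition comprises 16 tonal stimuli, and the overlapping subset
# values guarantee some tones shared across conditions, e.g. (540 Hz, 55 dB).
assert all(len(tones) == 16 for tones in conditions.values())
```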
Procedure
Subjects sat in a darkened, sound-isolating booth in which the two lights were continuously visible. Subjects held a numeric keypad that connected directly to the computer. By depressing one of the keys, the subject initiated each stimulus presentation. On each trial, the subject's task was to indicate, by appropriately pressing one of two additional keys, whether the tone more closely resembled the dim or the bright light. Tones were presented in random order, different for each subject and in each condition; each of the 16 tones was judged a total of 20 times within the course of the session.

Every subject served in four sessions, one for each of the experimental conditions (stimulus subsets). Order of the four sessions (conditions) was counterbalanced over the 16 subjects who participated. All were young men and women with normal hearing and with normal or corrected-to-normal vision.
Results and Discussion
Psychometric Functions
The basic data consisted of tallies of the percentages of
times that the subject responded that the tone more resembled
the brighter light. These percentages then could be plotted
against either sound frequency or against LL to yield psycho-
metric functions. In order to interpret such functions, how-
ever, it is useful to consider the range of possible outcomes.
Psychometric functions for pitch and loudness. Given the
results of previous research, I expected to find that both
increasing pitch and increasing loudness would yield increas-
ing percentages of "bright" responses. If this were so, then
the dimmer or brighter of two lights. The matching model states that the probability p(B/P, L) that a sound of pitch P and loudness L is judged more like the brighter light is given by a monotonic function F of the form

p(B/P, L) = F[(Ba² − λ²)^½], 0 < p(B/P, L) < 1, (5)

where λ represents the "response criterion" determined by the values of brightness (Bv) of the two visual stimuli. Note that Equation 5 assumes a nonlinear rule of comparison (Equation 3), although it does not yet specify a rule of integration for Ba. A variety of possible functions might characterize the response or psychometric function F. F may, for example, be a Gaussian integral:

p(B/P, L) = (2π)^−½ ∫ exp(−t²/2) dt, (6)

with the integral taken from −∞ to (Ba² − λ²)^½.

Ideally, we would like to specify the function F as well as to estimate the relation of Ba to pitch and loudness and the relation of λ to the brightnesses of the two lights. For the most part, however, the present discussion will consider the form of the general model expressed by Equation 5.
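A minimal sketch of the matching model under the Gaussian version of F (Equations 5 and 6) follows. Treating the difference Ba² − λ² as signed, so that Ba < λ yields p < .5, is an interpretive assumption here, since the text leaves F only loosely specified; the numerical scale values are likewise arbitrary:

```python
import math
from statistics import NormalDist

def p_bright(Ba, lam):
    """Equations 5-6: probability that the tone is judged more like the
    brighter light, with F taken as a Gaussian integral (normal CDF).
    Using the signed square root of (Ba^2 - lam^2) is an assumption,
    not something specified in the paper."""
    diff = Ba**2 - lam**2
    arg = math.copysign(abs(diff) ** 0.5, diff)
    return NormalDist().cdf(arg)

# When Ba equals the criterion lam, the tone is "equally similar"
# to the two lights and p = .5, as the model requires.
print(p_bright(Ba=2.0, lam=2.0))  # 0.5
```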
Note that when Ba = λ, p = .5. That is to say, when Ba equals λ, the auditory stimulus is "equally similar" to the two visual stimuli, for which equal similarity is defined as matching equally often to the dimmer and to the brighter light. Note also that this relatively simple form of the matching model assumes that any bias in responding is negligible; in practice, when applying the model to data, response bias will be absorbed into the value of the criterion λ.
Besides the form of the psychometric function F, two parameters of Equation 5 are of concern here: λ and Ba. The value of λ, the criterion, reflects the degree to which cross-modal matching is absolute or relative. If auditory-visual matching is absolute, then λ will be constant across contextual conditions, its value corresponding to some constant value of Ba. If auditory-visual matching is relative, however, λ will vary with stimulus context. Quantification of λ provides a measure along the dimension of absolute versus relative.
The other parameter of interest in Equation 5 is Ba. Earlier in this article I argued on the basis of Schneider and Bissett's (1981) results that Ba may obey Equation 2, following a rule of linear summation, equaling aP + bL. Ideally, we would like to compare directly how well the rule of linear summation and, say, a rule of vector summation account for the present data. Comparison of the two rules of integration is impeded, however, by three factors. First, either rule would operate on underlying scale values of pitch and loudness, P and L, variables that are not measured directly, rather than on directly measurable values of sound frequency and intensity. That is, we must specify the psychophysical functions relating P and L to frequency and intensity before comparing (or simultaneously with comparing) the two integration rules. Second, the difficulty in differentiating between the two integration rules is exacerbated by the tendency of many of the individual subjects to rely almost exclusively on use of one auditory dimension or the other, implying a ≫ b or, less often, b ≫ a in Equations 1 and 2. When either a or b is negligibly small, the vector and linear rules become virtually indistinguishable. Finally, even if we restrict the analysis to data derived from that very small subset of subjects for whom a and b are more or less equal, it is necessary to define or assume the form of function F or, what is equivalent, to find an appropriate transformation of p(B/P, L). Although the Gaussian model of Equation 6 implies that a z transformation is appropriate, other functions are possible. Given the freedom to choose both psychophysical functions for P and L and psychometric functions for F, the present data are not sufficiently strong to permit us to fit a model such as the Gaussian directly to the values of p(B/P, L).
Although these considerations foster a conservative approach to analyzing the data, nevertheless it is possible to interpret the data quantitatively through the general model of Equation 5. Most important, it is possible to analyze the data in a way that obviates the need to assess the nature of the psychometric function F. Given the general form of the model expressed in Equation 5, such an analysis is provided by the traditional "threshold" approach of psychophysics.

Let us define a tonal "threshold," or in the present example a tonal stimulus of "equal similarity," as a pitch/loudness combination that produces p(B/P, L) = .5. Although in principle we could use any constant probability, a value of .5 not only represents a kind of neutral point in matching (assuming negligible response bias), but moreover is situated near the overall mean of p(B/P, L) in the data sets and thus tends to be estimated relatively accurately.
From Equation 5 it follows that when p(B/P, L) = .5, or some other constant, then either

aP² + bL² − λ² = k, (7)

or

aP + bL − λ = k, (8)

depending on whether Ba takes on the vector form (Equation 1) or the linear form (Equation 2). Note that this "threshold" analysis actually eliminates two assumptions. Not only does it eliminate the need to assume or define the psychometric function F, but it also eliminates the need to choose between nonlinear and linear rules of comparison: Replacing the nonlinear rule with a linear one in Equation 5 would produce exactly the same functional Equations as 7 and 8. Either Equation 7 or 8 will allow us to answer the two main remaining substantive questions: (a) Is matching absolute or relative? (Is λ constant over contextual conditions?) (b) How do subjects weight the contributions of pitch and loudness in cross-modal matching? (What are the relative values of a and b?)
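Under the linear rule, Equation 8 traces a straight constant-similarity contour in the pitch-loudness plane: holding the left side at a constant k, an increase in P must be offset by a decrease in L. A short sketch, with all parameter values purely illustrative rather than fitted:

```python
def equal_similarity_L(P, a, b, lam, k=0.0):
    """Linear threshold rule (Equation 8): solve a*P + b*L - lam = k for L.
    The weights a, b, criterion lam, and constant k are illustrative."""
    return (k + lam - a * P) / b

# With equal weights (a = b), a unit increase in pitch scale value trades
# off one-for-one against loudness along the constant-similarity contour.
L0 = equal_similarity_L(P=1.0, a=1.0, b=1.0, lam=2.0)
L1 = equal_similarity_L(P=2.0, a=1.0, b=1.0, lam=2.0)
print(L0 - L1)  # 1.0
```

The slope of this contour, −a/b, is what the "matching point" analysis below can estimate for each subject, and hence how the relative weighting of pitch versus loudness is recovered.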
Derivation of equal similarity: "Matching points." The subsequent analysis permits derivation of estimates for each subject as to (a) the contribution of pitch versus loudness (the values of a and b) and (b) the role of absolute correspondences versus relations (the value of λ). In order to assess these variables, I converted each psychometric function (from each subject in each contextual condition) into a single value, the "matching point," by calculating the value of the auditory stimulus that produced, or was interpolated to produce, 50% response on the psychometric function. The procedure involved linear regression on either sound frequency or LL,
then calculation of the value of the stimulus corresponding to the 50% point. Each matching point was taken, by definition, to represent a sound that was "half-way between" the two lights, that is, to represent a combination of sound frequency and intensity judged "equally similar" to the two lights. The regressions were carried out on z-transformed values of the probabilities; however, because of the sizable number of values at 0% and 100%, to avoid losing a considerable amount of data, the data were first converted to probabilities according to Tukey's (1977) formula for split simple (ss) fractions: p = (n + 1/6)/(N + 1/3). Regressions were then carried out using, for any given psychometric function, only one value of z corresponding to 0% or 100%, even when more appeared.
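The interpolation just described can be sketched in a few lines. This is a minimal illustration of the procedure (Tukey split fractions, z transform via the inverse cumulative normal, least-squares line, solve for z = 0), not the original analysis code, and the toy counts are invented:

```python
from statistics import NormalDist

def matching_point(levels, n_resp, n_trials):
    """Interpolate the stimulus level giving 50% response.

    Counts are first converted to probabilities with Tukey's (1977)
    split-fraction formula p = (n + 1/6)/(N + 1/3), which keeps 0% and
    100% points usable, then z-transformed and fit by a straight line;
    the matching point is the level at which z = 0 (i.e., p = .5).
    """
    nd = NormalDist()
    z = [nd.inv_cdf((n + 1/6) / (N + 1/3))
         for n, N in zip(n_resp, n_trials)]
    # Ordinary least-squares fit: z = slope * level + intercept.
    mx = sum(levels) / len(levels)
    mz = sum(z) / len(z)
    slope = (sum((x - mx) * (y - mz) for x, y in zip(levels, z))
             / sum((x - mx) ** 2 for x in levels))
    intercept = mz - slope * mx
    return -intercept / slope

# Symmetric toy data: 1/10, 5/10, 9/10 responses at 40, 50, 60 dB.
print(round(matching_point([40, 50, 60], [1, 5, 9], [10, 10, 10]), 3))  # → 50.0
```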
An example appears in Figure 3, which shows the calculation of a matching point for Subject 16 in Condition 1, given a sound frequency of 540 Hz; Figure 3 indicates that at 50% response (z = 0), the LL was 53.4 dB. I obtained as many points as possible for each subject in each condition by performing these calculations either as a function of LL, with log frequency constant, or as a function of log frequency, with LL constant.
Contours of constant cross-modal similarity. The foregoing procedure provided the (derived) data points plotted in Figure 4. Each panel of Figure 4 gives the results from a single subject; different symbols represent results for the four different contextual conditions. By plotting frequency against LL, each point indicates a particular auditory stimulus that was judged equally similar to the dim and to the bright lights. The locus of points derived for a given subject in a given contextual condition represents, therefore, a "constant-similarity contour," that is, a set of auditory stimuli all of which are judged cross-modally to be "halfway between" the dim and bright lights.
Note the spacing of stimuli on the two axes. This spacing represents an attempt to show how the equal-similarity data conform to the linear integration rule of Equation 8. Given that (a) the linear model is appropriate to Ba and (b) when p(B/P, L) = .5, k = 0, we can use Equation 8 to write

    P = λ/a - bL/a.    (9)
Figure 3. Psychometric function for Subject 16 (Condition 1), at a sound frequency of 540 Hz, showing the calculation of the matching point: the loudness level at this frequency is calculated to give 50% response. (Ordinate: z score; abscissa: loudness level in dB.)
Consequently, a plot of sound frequency, with stimuli spaced in proportion to pitch, against sound intensity, with stimuli spaced in proportion to loudness, should yield a straight line whose slope equals -b/a and whose intercept equals λ/a.

In Figure 4, the stimuli are spaced on the ordinate according to log frequency and on the abscissa according to LL - 10; that is to say, I am assuming that P, the scale of pitch operating in similarity judgments, is approximately a log transform of the sound frequency in Hz and that L, the scale of loudness operating in similarity judgments, is approximately proportional to the number of decibels above a threshold of 10 dB (a good approximation under the present experimental conditions). The data appear to be at least reasonably consistent with this model, though it must be recognized that the degree of uncertainty is great: For example, as Figure 4 indicates, results from several of the subjects suggest slopes near zero; as indicated earlier, when a or b = 0, the linear and vector rules become identical. No claim can be made at this point either that the present data strongly support the linear rule of summation over other rules such as the vector, or that logarithmic psychophysical transformations rather than other (e.g., power law) transformations intervene between sound frequency and intensity and perceptual values of P and L. Nevertheless, the model provides a useful framework for analyzing such parameters as a and b (representing the weightings of pitch and loudness) and λ (representing the criterion). It is important to point out that the patterns of variation in a, b, and λ are robust over a wide range of possible integration rules and psychophysical transformations.
Two aspects of these equal-similarity contours are significant: (a) their slopes and (b) their convergence over contextual conditions.
Weighting of Pitch and Loudness
First, the slopes of the constant-similarity curves provide a measure of the relative contributions of pitch and loudness; more precisely, the slopes give the values of -b/a in Equation 9. A set of horizontal lines in Figure 4 (frequency = constant) would indicate that similarity depends only on pitch, not at all on loudness; a set of vertical lines (LL = constant) would indicate that similarity depends only on loudness, not at all on pitch; and a line or lines of negative slope would indicate that cross-modal similarity depends on both pitch and loudness, so that the one can trade off with the other to maintain an equal similarity to brightness.
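The mapping from contour slope to weights can be made explicit. Under the normalization introduced below (a + b = 1), a slope c = -b/a implies a = 1/(1 - c); the snippet below is just this algebra, with hypothetical slope values:

```python
def weights_from_slope(c):
    """Convert an equal-similarity contour slope c = -b/a into the
    weighting coefficients (a, b), under the constraint a + b = 1."""
    a = 1.0 / (1.0 - c)
    return a, 1.0 - a

# Horizontal contour: similarity depends on pitch alone.
assert weights_from_slope(0.0) == (1.0, 0.0)
# Slope of -1: pitch and loudness weighted equally.
assert weights_from_slope(-1.0) == (0.5, 0.5)
# Near-vertical contour (very steep negative slope): loudness dominates.
a, b = weights_from_slope(-1e6)
assert a < 1e-5 and b > 0.99999
```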
Figure 4 makes it clear that for about half of the subjects, pitch alone or in large measure determined cross-modal similarity. By way of contrast, no subject appeared to rely solely on loudness to match brightness, although a couple of subjects did rely on loudness more than on pitch. A qualification needs to be made: The conclusions about trade-offs between loudness and pitch rest on two assumptions. The first is that it is appropriate to translate, in a one-to-one fashion, from frequency to pitch and from intensity to loudness. This is probably a safe assumption with regard to pitch, for even though pitch does change with stimulus intensity, that change is slight. The assumption also seems reasonably safe with regard to loudness, in that the use of LL aimed specifically to
Figure 4. For each of the 16 subjects, matching points for the four contextual conditions (open squares = Condition 1, filled triangles = Condition 2, filled squares = Condition 3, open circles = Condition 4). (Each point gives the combination of sound frequency and loudness level calculated to give 50% response on the psychometric function. Ordinate: sound frequency; abscissa: loudness level in dB.)
equate stimuli of different frequency in their loudness; nevertheless, a caveat is in order because it is not possible to discount either small individual differences among subjects or small differences between the average equal-loudness curves of the present subjects and the curves that Fletcher and Munson (1933) used to determine LL. The second assumption is that P and L are scaled appropriately here (in scaling them, I assumed mean P and mean L are equal for the present stimulus set). Clearly, in an additive model one can trade off multiplication of scale values with weighting coefficients. Despite these reservations, it is nevertheless clear that loudness counted much less than did pitch in determining auditory-visual similarity.
To quantify these results, I calculated individual values of a and b in Equation 9, under the restriction that these parameters can be considered weighting proportions: that is, a + b = 1. If c represents the slope of any given equal-similarity function in Figure 4, then a, the weighting coefficient for pitch, equals 1/(1 - c). The first four columns in Table 1 list the values of a for all subjects in every condition. The very large values for Subjects 1-7 (for each subject, all values of a > .85) reflect the near total reliance on pitch in matching to brightness. By way of contrast, only one subject, Subject 16, had a notably small weighting coefficient for pitch. Moreover, inspection of values of a in Table 1, as well as the corresponding slopes in Figure 4, shows that subjects were consistent over conditions in their relative weightings of pitch and loudness.
(One apparent exception is Subject 15, who gave an aberrantly large negative pitch coefficient in Condition 4; as Figure 4 shows, the equal-similarity points from which the aberrant value derives [open circles in Panel 15] are exceptionally variable, so this particular value of a should not be taken too seriously.)

Analysis of variance confirmed the existence of wide interindividual variation in relative weighting, F(15, 45) = 9.042, p < .001; with Subject 15's data eliminated, F(14, 42) = 17.71,
Table 1
Relative Weighting Coefficient a for Pitch (re: Loudness) and Criterion λ in the Four Contextual Conditions (Experiment 1: Matching Tone to Light)

                Weighting coefficient a              Criterion λ
                       Condition                      Condition
Subject      1       2       3       4         1       2       3       4
   1       0.888   1.136   1.136   0.895     2.641   3.113   2.554   3.003
   2       1.033   0.961   1.079   0.970     2.577   2.964   2.509   3.016
   3       0.981   0.970   1.009   0.994     2.604   2.998   2.596   3.015
   4       0.987   0.972   1.035   0.849     2.669   3.028   2.648   3.028
   5       0.994   0.960   0.932   0.953     2.692   3.056   2.753   3.077
   6       0.943   0.856   0.944   0.871     2.600   2.913   2.665   2.857
   7       0.947   0.855   0.871   0.897     2.577   2.830   2.561   2.860
   8       0.844   0.842   0.787   0.473     2.581   2.911   2.748   2.989
   9       0.704   0.727   0.683   0.700     2.640   2.904   2.754   3.022
  10       0.758   0.719   0.632   0.650     2.521   2.766   2.721   2.724
  11       0.639   0.579   0.626  -0.123     2.745   2.797   2.773   3.056
  12       0.640   0.557   0.523  -0.017     2.710   2.817   2.809   3.030
  13       0.596   0.484   0.529   0.396     2.517   2.625   2.675   2.971
  14       0.504   0.531   0.465   0.021     2.534   2.697   2.746   3.101
  15       0.492   0.539   0.423  -0.974     2.501   2.653   2.713   2.422
  16       0.237   0.240   0.372  -0.036     2.570   2.636   2.927   3.033

Note. Condition 1 = low pitch/low loudness, Condition 2 = high pitch/low loudness, Condition 3 = low pitch/high loudness, Condition 4 = high pitch/high loudness.
p < .001. Viewed in another way, we see that the values of a correlate highly over conditions: Pearson coefficients of correlation r between pairs of conditions range from .76 to .96, averaging .87. Interestingly, these correlation coefficients, calculated on the derived measure a, are on the whole greater than the corresponding correlations calculated on the original slopes c, the latter ranging from -.31 to .98 and averaging only .33. Thus the derived, model-based measures seem to be the more stable, a point whose theoretical significance did not escape notice.

To repeat, one prominent feature of the results is the consistency with which individuals relied on pitch or loudness in matching sounds to lights. Subjects tended to use pitch, loudness, or both in a characteristic fashion.
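The cross-condition consistency can be checked directly from the values of a in Table 1. The following computes the six pairwise Pearson correlations; it is a minimal re-analysis from the transcribed table, not the original computation:

```python
from itertools import combinations

# Pitch weights a from Table 1: rows = Subjects 1-16, columns = Conditions 1-4.
A = [
    [0.888, 1.136, 1.136,  0.895], [1.033, 0.961, 1.079,  0.970],
    [0.981, 0.970, 1.009,  0.994], [0.987, 0.972, 1.035,  0.849],
    [0.994, 0.960, 0.932,  0.953], [0.943, 0.856, 0.944,  0.871],
    [0.947, 0.855, 0.871,  0.897], [0.844, 0.842, 0.787,  0.473],
    [0.704, 0.727, 0.683,  0.700], [0.758, 0.719, 0.632,  0.650],
    [0.639, 0.579, 0.626, -0.123], [0.640, 0.557, 0.523, -0.017],
    [0.596, 0.484, 0.529,  0.396], [0.504, 0.531, 0.465,  0.021],
    [0.492, 0.539, 0.423, -0.974], [0.237, 0.240, 0.372, -0.036],
]

def pearson(x, y):
    """Plain Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

cols = list(zip(*A))                      # one sequence of a values per condition
rs = {(i + 1, j + 1): pearson(cols[i], cols[j])
      for i, j in combinations(range(4), 2)}
for pair, r in sorted(rs.items()):
    print(pair, round(r, 2))
```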
But there is a second prominent feature to the weighting coefficients a and b, namely, their variation with change in stimulus context. Although the average size of the pitch coefficient a hardly varied across Conditions 1-3 (M = .762, .746, and .753, respectively), a was considerably smaller in Condition 4 (M = .470). If we eliminate the data of Subject 15, the subject who produced the aberrant value in Condition 4, the means become .780, .759, .775, and .566. The variation in a across conditions is reliable, F(3, 45) = 8.018, p < .001; without Subject 15, F(3, 42) = 9.612, p < .001. Post hoc, pairwise comparisons (Tukey, honest significant difference, or hsd) showed that Condition 4 differed reliably from each of the other conditions (p < .01), but no other pair differed reliably (all ps > .05).

The variation in a with change in stimulus context has an especially important implication: The rule of integration governing the quantity Ba appears to have an "optional" component. That is, the relative weightings of pitch and loudness vary with the conditions of the experimental task, despite the fact that any given individual characteristically gives more weight to one dimension or the other. Even for a given person, Ba is not fixed. This finding speaks against the notion that the weightings are rigid or physiologically determined, but suggests instead that they are malleable and perhaps even subject to voluntary control.
Criterion and Stimulus Context
The second feature of these equal-similarity curves is the way their locations changed across different conditions of stimulus context. Recall that I used different subsets of sound frequency and LL in order to learn whether cross-modal matches remain invariant when the contextual set of stimulus levels changes. The existence of such invariance would indicate that cross-modal similarities rely on absolute correspondences between stimuli (or sensations) of different modalities; invariance would reveal itself as a convergence of the four equal-similarity contours onto a single function. Lack of invariance, indicating that cross-modal similarities rely at least in part on relations, would reveal itself as divergence of at least some of the four contours.

Figure 4 shows divergence clearly to be the rule. It is important to note, however, the connection between the nature of convergence-divergence and the extent to which the matches rely on pitch versus loudness. Absolute responding should produce convergence in the functions; but some convergence will also take place even with relative responding if a subject's matches rely on just one attribute and not on both. Consider the data for Subject 2. The contours are horizontal, implying that Subject 2 used pitch but not loudness in matching to brightness. The four contours collapse onto two curves: One curve represents Conditions 1 and 3, low pitch/low loudness (open squares) and low pitch/high loudness (filled squares); the other curve represents Conditions 2 and 4, high pitch/low loudness (filled triangles) and high pitch/high loudness (open circles). Each pairwise convergence takes place, therefore, when the contextual levels of pitch remain constant but the contextual levels of loudness
vary, a result that is wholly consistent with the conclusion, based on slopes, that Subject 2 ignored loudness completely in matching to brightness. That the functions do not converge when the levels of pitch change, however, agrees with the conclusion that Subject 2's matches depend on relative, not absolute, values along the dimension of significance (in this case, on relative values of frequency or pitch).
Again, we can characterize the results in terms of a parameter of Equation 9, in this case the criterion λ. As already noted, the intercept of each straight line that can be passed through a subset of data in Figure 4 corresponds to the value λ/a. The calculated values of λ are listed in columns 5-8 of Table 1. Interpretation of these numbers can be assisted by recalling the psychophysical assumption that values of P correspond to log frequency (values of L are proportional to LL - 10, but were scaled such that the mean value of L for the entire stimulus set corresponded to the mean value of P). Given this scaling of P and L, the units of the criterion λ correspond roughly to values of log frequency. A value of 2.3 in the criterion, for example, is analogous to a sound frequency of 200 Hz. Hence, each increment of 0.1 in the value of the criterion is equivalent to about a 25% increase in frequency; an increment of 0.3 unit in criterion corresponds to a doubling of frequency.
Viewed in this light, it is clear that the criterion changes dramatically with changes in context. Only 2 of the 16 subjects reveal even a hint of absolute responding across the experiment's four conditions. Subject 11 gave results consistent with a single criterion characterizing all conditions except high pitch/high loudness, and Subject 10 used a constant criterion over all conditions except low pitch/low loudness. In general, the nature of the convergence or divergence is compatible with the slope of the contours and with the corresponding values of the pitch weighting coefficient a. Thus, as already indicated, convergence takes place across subsets of loudness when subjects ignore loudness (Subjects 1-7) and mainly across subsets of pitch when subjects largely ignore pitch (Subject 16). Little or no convergence takes place when subjects use both auditory dimensions (Subjects 8, 9, 12-15).
Table 1 shows how the contextual condition affected the criterion. The average value of the criterion is smallest in Condition 1 (low pitch/low loudness: λ = 2.605), followed by Condition 3 (low pitch/high loudness: λ = 2.697), Condition 2 (high pitch/low loudness: λ = 2.857), and Condition 4 (high pitch/high loudness: λ = 2.950). Analysis of variance shows these differences to be highly significant, F(3, 45) = 26.41, p < .001. Post hoc comparisons (Tukey hsd) reveal all pairwise differences to be significant (p < .01), except for the differences between Conditions 1 and 3 and between Conditions 2 and 4 (p > .05).
Did subjects tend to differ in their criteria? The values listed in Table 1 imply that subjects' criteria were reasonably uniform. Analysis of variance suggests little or no reliable interindividual variation in criterion, F(15, 45) = 1.760, p = .075; without Subject 15, F(14, 42) = 1.200, p = .25.
The results of Experiment 1 became the point of departure for the next question I asked, namely, whether the use of one auditory dimension or the other is truly characteristic of individuals. In particular, Experiment 2 inquired whether the individual subjects who relied on pitch or loudness in Experiment 1 would continue to rely on the same attribute when presented with another cross-modal task, similarity scaling. Experiment 2 also sought to confirm that pitch, loudness, and brightness are the appropriate psychological dimensions of correspondence by asking whether they may be represented as primary axes in a single multimodal space.
Experiment 2
Although intriguing and revealing, the results of Experiment 1 are also intrinsically limited. A main limitation stems from the use of a single pair of brightnesses. Ideally, Experiment 1 should be repeated using many "dim" and "bright" standards along the brightness scale, varying both the mean level of each pair and the size of the difference in brightness between them. One wonders, for example, whether the mean pitch or mean loudness in "matching points" increases with increasing mean level of brightness, or how the slope of the psychometric function varies with the size of the brightness difference around a given mean level. Unfortunately, however, the paradigm of Experiment 1 is rather tedious and inefficient for parametric exploration.
Perhaps even more imperative is the need to develop a wider conceptual framework. One way to characterize cross-modal correspondences is by representing perceptual qualities in a theoretical space that is both multidimensional and multimodal (Marks, 1985; Marks et al., 1987; see Wicker, 1968). The representation of perceptual qualities by means of multidimensional spaces has a long history in experimental sensory psychology (see, e.g., Boring, 1942). A goal of spatial representation is to express similarity relations among percepts, in that some measure of distance between the spatial locations of two perceptual experiences expresses the degree of their dissimilarity. In auditory space, simple sounds, such as pure tones, may be represented as points in a plane defined by the two roughly orthogonal dimensions of pitch and loudness (e.g., Schneider & Bissett, 1981); in visual space, colored lights may be represented as points defined by the three dimensions of brightness, redness/greenness, and blueness/yellowness (so a two-dimensional plane of red/green and blue/yellow gives variations in hue along concentric circles and variations in saturation along radii emanating from the center; e.g., Indow & Kanazawa, 1960).
According to the traditional view, each sensory modality is governed by its own space. Consequently, there should be as many individual spaces as there are modalities. But the correspondences that characterize synesthetic perception, cross-modal matching, and cross-modal interactions in response speed imply a possible linkage across modalities, and hence a linkage among the spaces representing them. One way that linkage could come about is through communality among dimensions of different modalities. The equivalence between loudness and brightness, for instance, suggests an overlap or identity of these two heteromodal dimensions.
To pursue this possibility, in Experiment 2 I applied asecond technique to study correspondences between lightsand tones, namely, cross-modal similarity scaling. On eachtrial, the subject rated the degree of similarity/dissimilarity
between two stimuli; on some trials both stimuli were audi-
tory, on others both were visual, and on still others one was
auditory and the other visual. If it is appropriate to treat the
underlying representation of similarity as spatial, then the
judgments of dissimilarity should be monotonically related to
the distances between percepts in the multidimensional, mul-
timodal space. By applying methods of multidimensional
scaling to the judgments, it is possible to uncover the dimen-
sions common to experiences of different modalities, as these
dimensions presumably underlie the similarity relations.
Wicker (1968) used a version of this procedure to study similarities between tones and Munsell colors; a main feature of his results was the strong association between the pitch of the tones and the lightness of the colors. Subsequently, I explored the method in comparing hearing and vibratory touch with respect to the perceptual dimensions of pitch and loudness (Marks, 1985) and later in comparing hearing, touch, and vision with respect to the perception of temporal patterns (Marks, 1987b). This last study became the specific impetus for Experiment 2.

The study of temporal patterns showed individual subjects to be consistent in the way they weighted rhythm and duration, which in that study were the two main components of similarity (Marks, 1987b). Regardless of whether subjects were
making intramodal or cross-modal comparisons, and regard-
less of the particular modality or pair of modalities tested,
multidimensional scaling of similarity judgments revealed
most individuals to emphasize one dimension or the other
consistently. This finding suggested that multidimensional
scaling might also be useful in assessing individual relations
among pitch, loudness, and brightness. Consequently, I se-
lected a subset of the subjects from Experiment 1 to undergo
further testing; I based the selection on whether, in the first experiment, the subject relied primarily on pitch or primarily
on loudness in matching tones to visual brightness. If reliance
on one dimension or the other in auditory-visual matching
is a stable personal characteristic and doesn't depend on the
particular experimental task, then comparable results should
be obtained from cross-modal similarity judgments.
Method
Stimuli
The apparatus for producing tones and lights was identical to that of Experiment 1. The only change in the visual stimuli was the increase in the number of luminance levels to five: 1.8, 18, 180, 900, and 2800 cd/m². Note that the third and fourth levels were identical to the "dim" and "bright" stimuli, respectively, used in Experiment 1. The tones numbered nine, with each of three different frequencies (200, 650, and 2000 Hz) taking on each of three different loudness levels (45, 57.5, and 70 dB). Thus the overall ranges of sound frequency and intensity were identical to those in Experiment 1.
Procedure
The subject sat in the darkened, sound-isolated booth in front of a computer monitor, on which appeared a rating scale. The scale was a horizontal line 30 cm long, anchored at the left-hand end by the label dissimilar and at the right-hand end by similar. By means of two keys, the subject could control the horizontal position of a cursor, which at the start of each trial always appeared just above the center of the rating line. On each trial, the subject set the position of the cursor to indicate the degree of similarity between the two stimuli. The computer recorded the position as an integer between 0 (extremely dissimilar) and 39 (extremely similar).

During the course of the session, the subject judged all 196 possible pairs of the 14 stimuli (five lights and nine tones). Note that by testing all pairs, intramodal as well as cross-modal, in a square matrix, each intramodal pair appeared twice in the set. This permitted counterbalancing of the temporal order of presentation, in the case of tone-tone comparisons, and the spatial location of the lights, in the case of light-light comparisons. Order of stimulus pairs was random and different for every subject and in every session. In order to increase stability and reliability of the individual data, each subject served in two sessions, conducted on different days, rating all 196 pairs in both sessions.
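The trial set can be reconstructed from the description: 5 lights + 9 tones = 14 stimuli, and the full 14 × 14 square of ordered pairs gives the 196 trials. A sketch under those assumptions (variable names are mine):

```python
import itertools
import random

lights = [1.8, 18, 180, 900, 2800]                 # luminances, cd/m^2
tones = [(f, ll) for f in (200, 650, 2000)         # frequency, Hz
                 for ll in (45, 57.5, 70)]         # loudness level, dB
stimuli = [("light", v) for v in lights] + [("tone", t) for t in tones]

# The full square matrix of ordered pairs: each intramodal pair occurs in
# both orders/positions, which is what permits counterbalancing temporal
# order (tone-tone pairs) and spatial location (light-light pairs).
pairs = list(itertools.product(stimuli, repeat=2))
random.shuffle(pairs)      # a fresh random order per subject and session

cross_modal = [p for p in pairs if p[0][0] != p[1][0]]
print(len(stimuli), len(pairs), len(cross_modal))  # → 14 196 90
```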
Subjects
Eight subjects from Experiment 1 participated (Subjects 1, 3, 7, 8,10, 12, 14, and 16). They were selected to cover a wide range ofperformance in Experiment 1.
Results and Discussion
The ratings given to each stimulus pair were averaged over sessions for each subject and then analyzed by multidimensional scaling. Implicit in this approach is a model of multimodal as well as multidimensional similarity, in which the dissimilarity Δij between any pair of stimuli, i and j, corresponds to a distance between them. In the present form of the model, the metric of the space is Euclidean, given by the equation

    Δij = [Σk (dij,k)²]^(1/2),    (10)

where dij,k is the difference between i and j on dimension k, and this Euclidean rule applies equally to intramodal and intermodal pairs of stimuli: pairs of tones, pairs of lights, and pairs containing a tone and a light.
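Equation 10 is the ordinary Euclidean distance, applied without distinction to tone-tone, light-light, and tone-light pairs. A direct transcription, with hypothetical coordinates:

```python
import math

def dissimilarity(x_i, x_j):
    """Equation 10: D = [sum_k (d_k)^2]^(1/2), where d_k is the difference
    between stimuli i and j on dimension k of the multimodal space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

# Hypothetical 2-D coordinates (Dimension 1 ~ loudness, Dimension 2 ~ pitch);
# the rule is symmetric and applies to any mix of modalities.
tone, light = (0.3, -0.2), (-0.1, 0.3)
assert dissimilarity(tone, tone) == 0.0
assert abs(dissimilarity(tone, light) - dissimilarity(light, tone)) < 1e-12
assert dissimilarity((0, 0), (3, 4)) == 5.0
```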
For the main analysis, I converted each subject's "square" matrix of responses to a "triangular" matrix by averaging the judgments given to pairs that were identical except for order of presentation. To execute the multidimensional scaling, I used the individual-difference procedure INDSCAL of Carroll and Chang (1970) as instantiated in the symmetrical version SINDSCAL (Pruzansky, 1975). The SINDSCAL procedure makes two important assumptions. First, it assumes the appropriateness of a Euclidean metric in which distance between points corresponds to dissimilarity; the orientation of the axes through the multidimensional space is determined from the variation over subjects in the weighting coefficients on the various dimensions. Second, SINDSCAL uses a metric, rather than a nonmetric, scaling procedure; that is, it assumes that the rating responses have interval-scale properties, not just ordinal properties. To ensure that these two assumptions (Euclidean rule and interval-scale response) are justified, I also applied to the data a multidimensional scaling procedure that assumes only ordinal properties in the ratings and that
makes it possible to compare the Euclidean and city-block rules (Kruskal, 1964). The results confirmed the propriety of SINDSCAL's assumptions. First, the nonmetric procedure yielded a better fit with the Euclidean than with the city-block model: Stress, a measure of relative badness of fit (the greater the stress, the worse the fit), was 13.1% with the Euclidean metric and 16.8% with the city block (two-dimensional solutions; see also Melara, 1989). Second, the original data did have interval-scale properties: The interpoint distances derived from the scaling solution were linearly related to the original dissimilarity ratings.1
Moreover, the multidimensional space was much the same as that derived from SINDSCAL. But recall that SINDSCAL, the procedure used in the main analysis, has the advantage of providing information about individual subjects. Of particular interest were (a) whether and how brightness, loudness, and pitch might appear as dimensions, and (b) how individual subjects might differ in the weightings on the dimensions of loudness and pitch.
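The individual-subject information that SINDSCAL provides comes from its weighted-distance form: each subject stretches the common dimensions by his or her own nonnegative weights. A minimal sketch of that model form (following Carroll & Chang, 1970; coordinates and weights here are hypothetical):

```python
import math

def indscal_distance(x_i, x_j, w):
    """INDSCAL model distance for one subject: the common Euclidean space
    of Equation 10, with each dimension k rescaled by a subject-specific
    nonnegative weight w_k."""
    return math.sqrt(sum(wk * (a - b) ** 2
                         for wk, a, b in zip(w, x_i, x_j)))

# Two tones equal on Dimension 1 (loudness) but differing on Dimension 2 (pitch):
tone_a, tone_b = (0.1, -0.4), (0.1, 0.4)
# A "pitch subject" (heavy weight on Dimension 2) finds them far apart;
# a "loudness subject" finds the same pair relatively close.
d_pitch_subj = indscal_distance(tone_a, tone_b, (0.1, 0.9))
d_loud_subj = indscal_distance(tone_a, tone_b, (0.9, 0.1))
assert d_pitch_subj > d_loud_subj
```

Fitting the weights over all subjects simultaneously is what fixes the orientation of the axes, which is why the recovered dimensions are interpretable as loudness and pitch rather than arbitrary rotations.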
Multidimensional and Multimodal Space
Using SINDSCAL, I obtained scaling solutions in two, three, four, and five dimensions; the results of the main analysis showed that two dimensions sufficed to account for sizable proportions of the variance, with each additional dimension accounting for less than 10% more and having little effect on Dimensions 1 and 2. In the two-dimensional analysis, Dimensions 1 and 2 accounted for 36.5% and 36.4% of the variance, respectively. Figure 5 shows the two-dimensional solution. It is clear that Dimension 1 relates closely to loudness, Dimension 2 closely to pitch. Thin solid lines connect constant frequencies as LL increases from left to right; thin dashed lines connect constant LLs as frequency increases from bottom to top. This straightforward two-dimensional structure agrees well with multidimensional scaling solutions obtained with tones alone (Carvellas & Schneider, 1972; Marks, 1985; Schneider & Bissett, 1981).

The five levels of luminance form an arc (thick solid line) going from lower left (dim) to upper right (bright). Thus the
Figure 5. Multidimensional scaling solution to similarity judgments of lights and tones. (Dimension 1 corresponds to loudness [from left to right, loudness levels of 45, 57.5, and 70 dB], Dimension 2 to pitch [from bottom to top, frequencies of 200, 650, and 2000 Hz]. Brightness correlates with both [from lower left to upper right, luminances of 1.8 to 2800 cd/m²].)
visual dimension of brightness appears as a kind of "resultant" of the two auditory dimensions of loudness and pitch. As I pointed out (Marks, 1975), the orientation of the visual dimension of brightness corresponds at least qualitatively to the orientation of the auditory dimension of "density." At that time, I suggested that the dual cross-modal equivalences of brightness with pitch and brightness with loudness might represent a singular equivalence between brightness and density; interestingly, reference can be found in the literature of the early part of this century to a dimension of "auditory brightness" (e.g., Rich, 1919), until Boring and Stevens (1936) identified auditory brightness with density. And Schneider and Bissett (1981) showed that density lies halfway between the axes of pitch and loudness in auditory space. Figure 5, however, shows no dimension of density/brightness.
As already indicated, the higher order dimensions accounted for little of the variance. Dimension 3 yielded three successive groupings: midfrequency tones, low-frequency tones plus all lights, and high-frequency tones; it is hard to provide Dimension 3 with an interpretation. Dimension 4 corresponded largely to pitch and correlated highly with Dimension 2. Dimension 5, however, has some special interest. Although it accounted for only about 3.9% of the variance, and its appearance could be attributed to the data of a single subject (16), Dimension 5 served roughly to distinguish lights from sounds. When Subject 16's data were eliminated, a five-dimensional solution gave no division by modality.
To evaluate further the three higher order dimensions, I
also applied the SINDSCAL analysis separately to the half-matrices containing different stimulus orders. Recall that in
each experimental session, every pair of stimuli was presented
twice. In the case of tone-tone pairs, the temporal order of
presentation differed in the two halves; in the case of light-light pairs the spatial position differed in the two halves. One
half-matrix contained all tone-tone pairs in which the first
tone was softer and/or lower in pitch and all light-light pairs
in which the light to the left was dimmer; the other half-
matrix contained the complementary subsets of stimulus
pairs. Although temporal/spatial order exerted a sizable effect
on the overt responses themselves, the results of the multidi-
mensional scaling of each subset were virtually identical to
those obtained from the averaged data: The scale values of
Dimensions 1 and 2 were much the same, as were the weight-
ing coefficients. And in both cases, with Subject 16's data
included, but not with them excluded, Dimension 5 separated
visual stimuli from auditory stimuli. In sum, the results
suggest the salience of two multimodal dimensions of similarity, one a dimension of loudness-brightness, the other a
dimension of pitch-brightness.
Individual Differences
Inspection of the weighting coefficients shows marked in-
dividual differences; see Table 2. For example, where Subject
¹ Other experiments, on tones alone, conducted in my laboratory by Robert Melara and by me reveal the same pattern of results. In our experience, the Euclidean metric always produces the better fit (lower stress). See also Footnote 2.
Table 2
Weighting Coefficients From Multidimensional Scaling of
Tones and Lights (Experiment 2)

Subject    Dimension 1    Dimension 2
1          0.297          0.707
3          0.087          0.867
7          0.412          0.782
8          0.504          0.627
10         0.819          0.267
12         0.831          0.190
14         0.564          0.592
16         0.746          0.165

Note. Dimension 1 = loudness, Dimension 2 = pitch.
3 emphasized pitch at the expense of loudness (weighting of .087 on Dimension 1, .867 on Dimension 2), Subject 12 did just the opposite (weighting of .831 on Dimension 1, .190 on
Dimension 2). There is a reasonably close correspondence
between these weightings and the matching behavior from
Experiment 1: Compare the present weighting coefficients with the pitch weighting coefficient a derived from equal-similarity contours in the first experiment. Comparison of
Table 2 with Table 1 shows a strong degree of consistency across the two experiments in the use of pitch or loudness to
match visual brightness. The rank-order correlation coefficient, ρ, taken over subjects, between the median coefficient a from Experiment 1 and the difference between weightings on Dimensions 2 and 1 from Experiment 2 is 0.86. These substantial individual differences seem unlikely to be based on sensory variations; rather, they suggest that people vary in the cognitive strategies they use when comparing heteromodal
stimuli. Nevertheless, whatever their source, such strategies, if strategies they be, appear consistent over the two psychophysical tasks.
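The rank-order statistic used here is straightforward to compute. In the sketch below, the Dimension 1 and Dimension 2 weightings come from Table 2, but the vector `a_exp1` merely stands in for the Experiment 1 coefficients, whose actual values (Table 1) are not reproduced in this excerpt; the resulting ρ is therefore illustrative only, not the reported 0.86.

```python
def ranks(xs):
    """Rank values from 1..n, assigning average ranks to ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank for a tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Dimension 1 (loudness) and Dimension 2 (pitch) weightings from Table 2:
d1 = [0.297, 0.087, 0.412, 0.504, 0.819, 0.831, 0.564, 0.746]
d2 = [0.707, 0.867, 0.782, 0.627, 0.267, 0.190, 0.592, 0.165]
pitch_minus_loudness = [p - l for p, l in zip(d2, d1)]

# Hypothetical stand-ins for the Experiment 1 coefficients a (Table 1):
a_exp1 = [0.6, 0.9, 0.7, 0.5, 0.2, 0.1, 0.4, 0.15]

rho = spearman(a_exp1, pitch_minus_loudness)
print(round(rho, 2))
```

A subject who weights pitch heavily in Experiment 2 (large Dimension 2 minus Dimension 1 difference) should also show a large pitch coefficient a in Experiment 1, so ρ near +1 indicates cross-task consistency.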
Multidimensional Structure of Tones
Finally, I repeated the multidimensional scaling, but ap-
plied it this time to the subset of pairs of auditory stimuli alone. Not surprisingly, the resulting two-dimensional solution generally resembled the corresponding two-dimensional solution obtained for the entire bimodal array. There was, however, one modest qualification: Whereas loudness and
pitch accounted for about equal amounts of total variance (36.5% and 36.4%) of the bimodal array (tones and lights together), with tones alone pitch accounted for substantially
more variance than loudness (45.5% versus 33.0%), and the two dimensions together accounted for a somewhat larger total proportion of the variance.
Thus when multidimensional scaling was applied to tones alone, it was pitch that predominated—at least, for the present
selection of stimuli. Although the relative scale values of Dimensions 1 and 2 are largely the same in both analyses—with and without the inclusion of lights—the salience of pitch is less in the former, greater in the latter. This outcome implies a slight difference in behavior (or strategy or whatever) on the
part of the subjects between the ways they judged tone-tone
pairs and the ways they judged tone-light pairs. In comparing lights with tones, the salience of pitch decreases relative
to that of loudness. Or, in other words, loudness took on
greater importance when the subjects compared tones with lights than when they compared tones with tones. This outcome is curious in light of the evidence from Experiment 1
that pitch proves stronger than loudness as a cross-modal analogue to brightness. Again, though, it suggests that the
additive weightings of pitch and loudness have an "optional"
component. In retrospect, it would have been useful had the subjects also served in a separate session in which they judged the similarities of tones alone, in a unimodal paradigm.²
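The SINDSCAL analyses reported above fit subject-specific dimension weights; the core step of recovering a spatial configuration from pairwise dissimilarities can be illustrated more simply with classical (Torgerson) metric MDS. The sketch below uses NumPy and a hypothetical 3 × 3 grid of tones (three pitch levels crossed with three loudness levels); it is a simplification for exposition, not the program used in the reported analyses.

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: recover a k-dimensional configuration
    from a symmetric matrix d of pairwise dissimilarities."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Nine hypothetical tones: 3 pitch levels crossed with 3 loudness levels.
pitch = np.repeat([0.0, 1.0, 2.0], 3)
loud = np.tile([0.0, 1.0, 2.0], 3)
pts = np.column_stack([loud, pitch])

# Euclidean dissimilarities between all pairs (cf. the Euclidean fit above).
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))

config = classical_mds(d, k=2)
# The recovered 2-D configuration reproduces the generating distances
# (up to rotation and reflection):
d_hat = np.sqrt(((config[:, None, :] - config[None, :, :]) ** 2).sum(-1))
print(np.allclose(d, d_hat))
```

When the input dissimilarities are exactly Euclidean in two dimensions, as constructed here, the two-dimensional solution reproduces them perfectly; with behavioral similarity ratings, the recovered axes (here, loudness and pitch) account only for a proportion of the variance, as in the analyses above.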
General Discussion
Previous studies have shown that both pitch and loudness serve as cross-modal analogues to visual brightness. Synesthetic perceivers often report that loud (vs. soft) and high-pitched (vs. low-pitched) sounds induce bright (vs. dim) visual images (see Marks, 1975). Nonsynesthetic individuals, young children and adults alike, will match the louder or the higher
pitched of two sounds to the brighter of two lights (Marks et al., 1987); indeed, when matching tones to lights of constant brightness, adults trade off loudness for pitch (Marks, 1978, Figure 18). In verbal tasks, children and adults rate sunlight to be not only brighter than moonlight but also louder and
higher pitched (Marks et al., 1987). The adjective bright is rated high pitched and loud, whereas dim is rated low and soft (Marks, 1982a). And the speed and accuracy in discriminating bright from dim lights are greater in the presence of tones matching in either pitch or loudness: Response to a
bright light is relatively fast and accurate in the presence of a high-pitched or loud sound and slower and less accurate in
the presence of a low-pitched or soft sound; but response to a
dim light is faster and more accurate in the presence of a low-pitched or soft sound (Marks, 1987a). The impression left by all of these findings is that visual brightness has at least two
structural and functional correlates in the auditory realm—pitch and loudness.
Salience of Pitch as an Analogue to Brightness
The present results also agree with earlier evidence that, as
a correlate to visual brightness, pitch is overall at least as
strong as loudness, maybe stronger or more dominant—at least given the stimuli used here. In Experiment 1, most subjects relied on pitch rather than on loudness. In Experiment 2 the two attributes turned out nearly equal in importance, but of course that outcome reflected the selection of subjects: Note that I chose for Experiment 2 a subset of subjects whose results in Experiment 1 indicated them to be more equal on average than others in their reliance on pitch and loudness. In keeping with the greater salience of pitch-
brightness similarity, Marks et al. (1987) found a higher
² Recently, I did conduct a unimodal study in which a different group of subjects rated the similarities of tone-tone pairs; multidimensional scaling revealed a Euclidean metric (superior in fit to a city-block metric) in which similarity relied much more on pitch than on loudness.
proportion of 4-year-olds matching pitch and brightness than
matching loudness and brightness. Thus even young children
find pitch more salient, or at least more reliable, than loudness
as an analogue to brightness.
The high salience of pitch-brightness similarity remains
intriguing. Nearly a half-century ago, the philosopher Hartshorne (1934) postulated on the basis of phenomenological
evidence that pitch rather than loudness is the "proper"
analogue to visual brightness; Hartshorne's argument rests on
the contention that pitch rather than loudness should be
viewed as the auditory embodiment of intensity—that high-
pitched sounds have the same sharpness and compactness
that characterize bright lights. The intensity common to the
experiences on both sensory modalities is, so to speak, a high
"intensity per unit volume." Though Hartshorne's account
falls short of adequate explanation—indeed, his detailed account seemingly rests on essentially the kinds of introspections used by subjects who partake in cross-modal matching experiments like the present ones—it does point to the need to specify more precisely what it is that can make experiences of different modalities alike despite their being qualitatively distinct.
Salience and Primacy of Pitch and Loudness as
Auditory Dimensions
The present findings also suggest that in their cross-modal
connections to brightness the auditory attributes of pitch and
loudness possess a rather high degree of dimensional salience.
It is hard to reconcile these findings with my earlier suggestion
(Marks, 1975) that the dual associations in synesthesia of
pitch with brightness and loudness with brightness might
represent a simpler, unitary association of auditory density
with brightness, for which density in some sense "contains"
within itself both pitch and loudness. Given the present
results, it seems better to maintain the self-evident multiplicity
of relations, in keeping with Karwoski et al.'s (1942) "principle
of alternate polarities and gradients," stating that synesthesia
can involve one-to-many associations of attributes in different
modalities. To be sure, it is possible to characterize the cross-
modal relation of lights to tones in terms of an equivalence
between brightness and some weighted sum of the two attri-
butes, loudness and pitch. Nevertheless, nonsynesthetic indi-
viduals can differ markedly in the degree to which their
judgments rely on the one auditory attribute or the other; this
is true both when the task involves matching and when it
involves similarity scaling. Moreover, individuals appear to
be at least modestly consistent across tasks in their reliance
on pitch or loudness.
Yet there remains a puzzling aspect to this duplex, if not
duplicitous, rule of correspondence. Even for a given person,
the relative weightings of pitch and loudness (e.g., the coeffi-
cients a and b in Equation 2) are not wholly invariant.
Experiment 1 revealed that the weightings change systemati-
cally over contextual conditions, depending somewhat on the
absolute values of pitch and loudness available to match to
brightness. And the multidimensional scaling solutions of
Experiment 2 revealed that the weightings differed depending
on whether comparisons to visual brightness were included:
When tones were compared to other tones only, pitch counted
more than when tones were compared both to other tones
and to lights. The two experiments agree in suggesting that
there is an "optional" component to pitch-loudness weight-
ing. Relative weighting coefficients are not built in or invariant, not even within a given individual, but can vary with
the task.
What is particularly puzzling about this high degree of
salience of pitch and loudness is its coexistence within an
interactive framework. Two classes of findings point to inter-
actions.
First, the dissimilarity between pairs of tones appears to
follow more closely a Euclidean than a city-block rule (Ex-
periment 2; see also Grau & Kemler Nelson, 1988; Melara,
1989). Thus the contribution to dissimilarity of a constant
difference on one dimension, say, pitch, depends on the size
of the difference on the other dimension, loudness.
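This dependence is easy to verify numerically. The sketch below uses arbitrary illustrative difference values to contrast the marginal contribution of a fixed pitch difference under the two metrics:

```python
# With a city-block metric, a fixed pitch difference adds the same amount
# to dissimilarity regardless of the loudness difference; with a Euclidean
# metric, its marginal contribution shrinks as the loudness difference grows.

def city_block(dp, dl):
    return abs(dp) + abs(dl)

def euclidean(dp, dl):
    return (dp ** 2 + dl ** 2) ** 0.5

dp = 1.0                                   # a constant pitch difference
for dl in (0.0, 1.0, 3.0):                 # increasing loudness differences
    added_cb = city_block(dp, dl) - city_block(0.0, dl)
    added_eu = euclidean(dp, dl) - euclidean(0.0, dl)
    print(dl, round(added_cb, 3), round(added_eu, 3))
```

Under the city-block rule the pitch difference always adds 1.0; under the Euclidean rule its added contribution falls from 1.0 to about 0.414 to about 0.162 as the loudness difference grows, which is the interaction at issue.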
Second, in tasks requiring speeded classification of tones on the basis of either pitch or loudness, uncorrelated variation
in the levels of the irrelevant dimension depresses perform-
ance relative to a baseline condition containing no variation
in the irrelevant dimension, but correlated variation (redun-
dancy) enhances performance relative to baseline. That is,
classification of stimuli by, say, their pitch is slow (and less
accurate) when the stimuli also vary unpredictably in loudness
(low pitch and high pitch alike occur sometimes at low and
sometimes at high loudnesses) as compared with baseline, in
which loudness is always constant; however, classification is
faster (and more accurate) with redundancy (low pitch always
associated with, say, low loudness; high pitch associated with
high loudness) (Grau & Kemler Nelson, 1988; Melara &
Marks, 1989; Wood, 1975).
The interactions in similarity scaling and in speeded clas-
sification characterize performance in what Garner (e.g.,
1974) has called pairs of "integral" perceptual dimensions.
Besides pitch and loudness, other dimensions that interact
integrally are lightness and saturation. Integral pairs of per-
ceptual dimensions contrast with what Garner terms "sepa-
rable" pairs of dimensions, such as lightness and size; sepa-
rable dimensions obey city-block metrics in similarity scaling
and reveal no changes in performance at speeded classification
when the irrelevant dimension takes on variable levels. The
salience of pitch and loudness described in the beginning of
this section—based on evidence that weightings of pitch and
loudness vary with the stimulus context—probably corresponds to some aspect of the primacy of pitch and loudness
as dimensions of perceptual experience. Within the domain
of tones that vary in sound frequency and intensity, pitch and
loudness may have a kind of ontic priority, may represent
perceptual Urbegriffe. Subjects seem to rely on underlying
values of pitch and loudness even when categorizing stimuli
with regard to volume and density or with regard to other less
meaningful dimensions (Grau & Kemler Nelson, 1988; Me-
lara & Marks, 1989).
Figure 6 provides my attempt to capture some of the main
features of these interrelations. Consider two acoustic stimuli,
A and B, each stimulus having particular values of sound
frequency and intensity. The intent of Figure 6 is to sketch
Figure 6. Schematic model of stages in processing the auditory dimensions of pitch and loudness. (Stimulus A and Stimulus B, each specified by frequency and intensity, are processed through psychophysical, perceptual, and postperceptual stages.)
some of the main sensory, perceptual, and postperceptual
stages in processing frequency and intensity. Thick lines and large arrows in the figure represent primary effects or influences; lighter lines represent secondary influences. I am not claiming, however, that either primary effects or secondary effects are all of one type or have a common kind of mechanism; on the contrary, the underlying mechanisms at different stages are undoubtedly different.
At one of the earliest levels in processing, we find the psychophysical interactions between the two stimulus dimensions, frequency and intensity, interactions that result in the production of the sensory variables P1 and L1. Often, psychophysicists refer to these quantities as pitch and loudness, but I do wish to distinguish P1 and L1 from other quantities that emerge subsequently in processing. For the most part, frequency determines P1 and intensity determines L1, but intensity can also affect P1 and frequency has a substantial effect on L1. These psychophysical relations presumably rest on principles of sensory coding—principles that, unfortunately, still remain only partly understood. It is worth pointing out, as an example, the diversity in the underlying processes. The influence of intensity on P1 presumably arises through some neural interaction in underlying codes (related perhaps to the use of neural spike frequency to code both P1 and L1); the influence of frequency on L1, however, is largely determined by preneural factors, by the filtering characteristics of the external ear canal and middle ear. In many experiments, it is useful to minimize interactions of the psychophysical sort by equating stimuli of different frequencies for loudness and by equating stimuli of different intensities for pitch; in Experiment 1, for example, the specification of sound intensity in terms of loudness level served to reduce, if not eliminate, effects of sound frequency on loudness (L1). The psychophysical interactions make themselves known, of course, at all subsequent levels of processing, because L1 and P1 become inputs to later stages.
The second stage represents what I shall call perceptual
interactions, interactions of the sort found in tasks requiring speeded classification. When trying to identify whether different sounds have, for example, the same pitch (P2), performance is affected by variations in loudness (L2).
The third stage represents the postperceptual conflation of P2 and L2 onto the variable represented earlier as Ba—the
auditory quantity that subjects presumably compare to brightness in cross-modal matching tasks. Figure 6 shows Ba comprising two overlapping elements; this "loose coupling" is meant to indicate that the status of Ba differs from that of other variables in the diagram. Ba does not represent a new, emergent, unitary perceptual quality, but rather a result of postperceptual—and sometimes optional—processing. Ba
represents a sum, probably a linear sum (Equation 2), of P2
and L2; however, as Experiment 1 showed, the weightings of P2 and L2 are strongly tied to stimulus context. Moreover, although P2 and L2 provide the raw materials for Ba, the dimensional identities of pitch and loudness are not lost (as evidenced by the results of Experiment 2).
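The linear-sum account can be sketched in a few lines. Because Equation 2 itself is not reproduced in this excerpt, the form Ba = a·P2 + b·L2, the normalized pitch and loudness values, and the reuse of the Table 2 weightings as the coefficients a and b are all illustrative assumptions:

```python
# Illustrative sketch of the postperceptual stage: Ba as a linear sum of
# pitch (P2) and loudness (L2), with weights that vary across subjects
# and contexts (the "optional" component described in the text).

def b_auditory(p2, l2, a, b):
    """Ba = a*P2 + b*L2, the linear-sum form suggested for Equation 2."""
    return a * p2 + b * l2

# A high-pitched but soft tone, on assumed normalized 0-1 scales:
tone = {"p2": 0.8, "l2": 0.3}

# A pitch-weighted subject (like Subject 3) vs. a loudness-weighted
# subject (like Subject 12), reusing the Table 2 weightings as a and b:
ba_pitch_weighted = b_auditory(tone["p2"], tone["l2"], a=0.867, b=0.087)
ba_loud_weighted = b_auditory(tone["p2"], tone["l2"], a=0.190, b=0.831)
print(round(ba_pitch_weighted, 3), round(ba_loud_weighted, 3))
```

For this high-pitched but soft tone, the pitch-weighted subject's Ba (≈0.72) would call for a brighter visual match than the loudness-weighted subject's (≈0.40), mirroring the individual differences described above.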
Finally, a single solid line in Figure 6 connects the processing of Stimulus A to the processing of Stimulus B. This line indicates the interrelation between different stimuli that emerges in multidimensional similarity scaling. Although this line is drawn at the level of Ba, remember that Ba itself comprises P2 and L2. There is no clear evidence whether the process of interstimulus comparison, which produces an interactive (Euclidean) metric, itself would precede or follow
the process of summation, or whether the two processes merely stand for the experimental results that emerge out of the different demands inherent in two different tasks of judgment.
Is Cross-Modal Similarity Primarily Perceptual?
For more than a decade I have contended that cross-modal similarities—in particular, pitch-brightness and loudness-
brightness similarities—are quintessentially perceptual in nature, that they are inherent in the nature of auditory and
visual experiences themselves, and that they probably derive from some common characteristics of neural coding in the auditory and visual modalities (Marks, 1975, 1978; Marks et
al., 1987). Consistent with this contention is the early developmental appearance of these similarities. Similarities of pitch-brightness and loudness-brightness are evident in voluntary cross-modal matches at age 4 years (Marks et al.,
1987), and equivalence of loudness-brightness was reported in autonomic responses at age 1 month (Lewkowicz & Turkewitz, 1980).

Nevertheless, even if these similarities have an intrinsic structural basis, even if they originate in phenomenal experience and sensory coding, it is perfectly possible that cross-modal associations come, developmentally, to be mediated by more cognitive structures. The very fact that adults, and even young children, readily interpret verbal expressions meta-
phorically, by the same rules that govern perceptual matching, implies that cross-modal correspondences are represented at
a semantic level. Or at least, verbal processes have ready access to perceptual relations, including cross-modal relations.
Moreover, Walker and Smith (1984) found tones (varying in
pitch) to influence reaction times to words (describing visual attributes), an outcome that implies interactions at a semantic level of processing. It is at least conceivable, therefore, that perceptual judgments themselves—cross-modal matches and ratings of cross-modal similarities—come to be made on the basis of some type of abstract knowledge about perceptual relations.
The present results appear to be especially compatible with the view that higher order cognitive processes help mediate cross-modal similarities, particularly when the task demands integration of multiple attributes from one of the modalities. Experiment 1 revealed a large relativistic component in the matching of both pitch and loudness to brightness, a finding
that seems compatible with the use of semantic markers to
indicate the way that dimensions are aligned; that is, "dim" = "low pitched" and "soft," whereas "bright" = "high pitched"
and "loud." Even more relevant to this point are the large and consistent interindividual differences, revealed by both Experiments 1 and 2, in the extent to which either pitch or loudness can contribute to matches with brightness. Although it is conceivable that such individual differences are perceptually based and even "hard wired," it seems more reasonable to infer that the reliance on either pitch or loudness is in some sense optional, and contingent on the existence of processes that operate on relatively abstract qualities derived from percepts. Maybe perceptual similarities provide raw materials, upon which cognitive processes come to operate and elaborate.

Perhaps the strongest evidence in favor of the notion that
cross-modal matches can be mediated at least in part by higher level systems comes from studies in which the direction per se of matching turns out to be malleable. It is well known from psychophysical research that people, when appropriately instructed, can judge "inverse" attributes—that they can make quantitative judgments of smoothness as well as roughness of surfaces (Stevens & Harris, 1962), of dimness as well as brightness of lights (Stevens & Guirao, 1963), of hardness as well as softness of materials (Harper & Stevens, 1964). More pertinent, subjects can make cross-modality matches to either brightness or dimness, to either softness or loudness, by means of another sensory modality (Stevens & Guirao, 1963), inverting, on instruction, the dimension of intensity; presumably, people manage this task by means of some mental manipulation of the dimension's polarity.
Still more to the point, the emergence of "inverse" attributes is not merely the result of experimental demand. Even when subjects are not instructed how to order their cross-modal matches, under certain conditions (such as the matching of tones that vary in loudness to surfaces that vary in lightness, from black through gray to white), the very same subject may invert the direction of matches on different occasions, sometimes matching a series of increasing loudnesses with a series of increasing lightnesses, other times matching increasing loudness with increasing darkness (Marks, 1974).
Inversion of dimensions constitutes an extreme form of relativity in judgment. Neither the end points nor intermediate levels of heteromodal dimensions need be rigidly connected. The findings of Experiment 1 are consistent with inversion of dimensions and with other results (Marks et al.,
1986), suggesting a sizable relativistic component to cross-modal matching. On the other hand, matches are not wholly relative; rather, they appear to blend the relative with the absolute (see Marks et al., 1986). (Intramodal matches can also reveal a large relative component; Marks, 1988.) If knowledge about cross-modal relations does derive from a primary perceptual similarity, the residual absolute component to this similarity may represent intrinsic equivalences between subjective levels of different modalities.
What does the capacity to manipulate the direction of a cross-modality match imply? For one, it hints at the existence of an underlying transperceptual mediator—probably linguistic—that operates in the production of matches; at the same time, it implicates a system that can abrogate the rules of
perceptual cross-modal similarity, making possible a degree of creativity in the production of cross-modal relations and thus perhaps providing a vehicle if not an imperative for still higher level cognitive skills, such as the construction of metaphor.
References
Boring, E. G. (1942). Sensation an d perception in the history of
experimental psychology. New York: Appleton-Century-Crofts.
Boring, E. G., & Stevens, S. S. (1936). The nature of tonal brightness. Proceedings of the National Academy of Sciences, 22, 514-521.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283-319.
Carvellas, T., & Schneider, B. (1972). Direct estimation of multidimensional tonal dissimilarity. Journal of the Acoustical Society of America, 51, 1839-1848.
Cohen, N. E. (1934). Equivalence of brightness across modalities. American Journal of Psychology, 46, 117-119.
Fletcher, H., & Munson, W. A. (1933). Loudness, its definition, measurement and calculation. Journal of the Acoustical Society of America, 5, 82-108.
Garner, W. R. (1974). The processing of information an d structure.
Potomac, MD: Erlbaum.
Grau, J. W., & Kemler Nelson, D. G. (1988). The distinction between integral and separable dimensions: Evidence for the integrality of pitch and loudness. Journal of Experimental Psychology: General, 117, 347-370.
Guirao, M., & Stevens, S. S. (1964). Measurement of auditory density. Journal of the Acoustical Society of America, 36, 1175-1182.
Harper, R., & Stevens, S. S. (1964). Subjective hardness of compliant materials. Quarterly Journal of Experimental Psychology, 16, 204-215.
Hartshorne, C. (1934). The philosophy and psychology of sensation. Chicago: University of Chicago Press.
Hornbostel, E. M. von (1925). Die Einheit der Sinne. Melos, Zeitschrift für Musik, 4, 290-297.
Hyman, R., & Well, A. (1967). Judgments of similarity and spatialmodels. Perception & Psychophysics, 2, 233-248.
Indow, T., & Kanazawa, K. (1960). Multidimensional mapping of Munsell colors varying in hue, chroma, and value. Journal of Experimental Psychology, 59, 330-336.
Karwoski, T. F., Odbert, H. S., & Osgood, C. E. (1942). Studies in
synesthetic thinking. II. The role of form in visual responses to
music. Journal of Genera l Psychology, 26 , 199-222.
Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness-of-fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Langfeld, H. S. (1914). Note on a case of chromaesthesia. Psycholog-
ical Bulletin, 11, 113-114.
Lewkowicz, D. J., & Turkewitz, G. (1980). Cross-modal equivalence
in early infancy: Auditory-visual intensity matching. Develop-
mental Psychology, 16, 597-607.
Marks, L. E. (1974). On associations of light and sound: The media-
tion of brightness, pitch, and loudness. American Journa l of Psy-
chology, 87, 173-188.
Marks, L. E. (1975). On colored-hearing synesthesia: Cross-modal
translations of sensory dimensions. Psychological Bulletin, 82, 303-331.
Marks, L. E. (1978). The unity of the senses: Interrelations among
the modalities. New York: Academic Press.
Marks, L. E. (1982a). Bright sneezes and dark coughs, loud sunlight
an d soft moonlight. Journal of Experimental Psychology: Human
Perception and Performance, 8, 177-193.
Marks, L. E. (1982b). Synesthetic perception and poetic metaphor. Journal of Experimental Psychology: Human Perception and Performance, 8, 15-23.
Marks, L. E. (1985). Multidimensional scaling in the assessment of
cross-modal similarity. In E. E. Roskam (Ed.), Measurement and
personality assessment (pp. 165-177). Amsterdam: Elsevier.
Marks, L. E. (1987a). On cross-modal similarity: Auditory-visual
interactions in reaction time. Journal of Experimental Psychology: Human Perception and Performance, 13, 384-394.
Marks, L. E. (1987b). On cross-modal similarity: Perceiving temporal
patterns by hearing, touch, and vision. Perception & Psychophysics,
42, 250-256.
Marks, L. E. (1988). Magnitude estimation and sensory matching. Perception & Psychophysics, 43, 511-525.
Marks, L. E., Hammeal, R. J., & Bornstein, M. H. (1987). Perceiving similarity and comprehending metaphor. Monographs of the Society for Research in Child Development, 52 (Whole No. 215).
Marks, L. E., Szczesiul, R., & Ohlott, P. (1986). On the cross-modal perception of intensity. Journal of Experimental Psychology: Human Perception and Performance, 12, 517-534.
Melara, R. D. (1989). Similarity relations among synesthetic stimuli and their attributes. Journal of Experimental Psychology: Human Perception and Performance, 15, 212-231.
Melara, R. D., & Marks, L. E. (1989). Perceptual primacy of dimensions: Support for a model of dimensional interaction. Unpublished manuscript.
Mellers, B. A., & Birnbaum, M. H. (1982). Loci of contextual effects
in judgment. Journal of Experimental Psychology: Human Percep-
tion and Performance, 8, 582-601.
Ortmann, O. (1933). Theories of synesthesia in the light of a case of
color-hearing. Human Biology, 5, 155-211.
Parducci, A. (1965). Category judgment: A range-frequency model.
Psychological Review, 72, 407-418.
Pruzansky, S. (1975). How to use SINDSCAL: A computer program
fo r individual differences in multidimensional scaling. Murray Hill,
NJ: Bell Telephone Laboratories.
Rich, G. J. (1919). A study of tonal attributes. American Journal of
Psychology, 30, 121-164.
Schneider, B. A., & Bissett, R. J. (1981). The dimensions of tonal
experience: A nonmetric multidimensional scaling approach. Per-
ception & Psychophysics, 30, 39-48.
Shepard, R. (1964). Attention and the metric structure of the stimulus
space. Journa l of Mathematical Psychology, 1, 54-87.
Stevens, J. C., & Marks, L. E. (1980). Cross-modality matching functions generated by magnitude estimation. Perception & Psychophysics, 27, 379-389.
Stevens, S. S., & Guirao, M. (1963). Subjective scaling of length and
area and the matching of length to loudness and brightness. Journal of Experimental Psychology, 66, 177-186.
Stevens, S. S., Guirao, M., & Slawson, A. W. (1965). Loudness, a
product of volume times density. Journal of Experimental Psy-
chology, 69, 503-510.
Stevens, S. S., & Harris, J. R. (1962). The scaling of subjective
roughness and smoothness. Journal of Experimental Psychology,
64, 489-494.
Tukey, J. W. (1977). Exploratory data analysis . Reading, MA: Ad-
dison-Wesley.
Walker, P., & Smith, S. (1984). Stroop interference based on the
synesthetic qualities of auditory pitch. Perception, 13, 75-81.
Wicker, F. W. (1968). Mapping the intersensory regions of perceptual
space. American Journal of Psychology, 81, 178-188.
Wood, C. C. (1975). Auditory and phonetic levels of processing in
speech perception: Neurophysiological and information-processing
analysis. Journal of Experimental Psychology: Human Perception and Performance, 1, 3-20.
Zwislocki, J. J., & Goodman, D. A. (1980). Absolute scaling of sensory magnitudes: A validation. Perception & Psychophysics, 28, 28-38.
Received November 3, 1987
Revision received December 5, 1988
Accepted December 7, 1988