Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

A preliminary study of the role of prosodic parameters in speech perception
Svensson, S-G.

journal: STL-QPSR
volume: 12
number: 2-3
year: 1971
pages: 024-042

http://www.speech.kth.se/qpsr


STL-QPSR 2-3/1971

II. SPEECH PERCEPTION

A. A PRELIMINARY STUDY OF THE ROLE OF PROSODIC PARAMETERS IN SPEECH PERCEPTION*

S-G. Svensson

According to Stevens and House, the general objective of a theory of speech perception is to describe the process whereby an acoustic speech signal is decoded into linguistic units. These units can be specified on many levels, from phonetic features to sentences. The level that has so far been studied most extensively is that of the phonetic segments. A large number of experiments have been conducted in order to establish invariant acoustic properties of phonemes and phonetic features; a general survey of the research at this level is found in Stevens and House (1970). Research has also been done at the syntactic level: in a series of experiments on speech perception, Bever et al. have attempted to verify the psychological reality of constructs postulated in linguistic analysis (Bever, 1968). The development of various schemes for automatic speech recognition can be seen as an application of our knowledge of speech perception, and so far this kind of work seems to have met with a considerable lack of success (Pierce, 1969): "In general, it appears that recognition around 95% correct can be achieved for clearly pronounced, isolated words from a chosen small vocabulary (digits, for instance) spoken by a few chosen talkers" (ibid.).

It was against this background that the following experiments were undertaken. It was assumed that several factors play a role in speech perception, and that only a small part of the complex of interacting variables may have been duly recognized in past research. The present investigation aims at drawing attention to the role of prosodic parameters. The experiments to be reported are of a pilot character. Their basic design is sketched in Fig. II-A-1. A number of sentences and phrases are subjected to distortion, which is effected by removing some of the information from the speech signal. The subjects then try to recover the sentences and phrases and write down their responses. A linguistic analysis of the original, non-distorted stimuli and of the written responses makes a comparison and further analysis possible. In this way, it is hoped, the type of information that the preserved, non-distorted part of the utterances is able to transfer can be estimated.

* Thesis work at the Institute of Linguistics, Stockholm University.


Experiment 1: Prosodic cues

This experiment was conducted in order to find out whether the prosody of an utterance is capable of transmitting information about the linguistic form of the utterance even when there is no segmental information. In what follows, "prosody" is to be understood as the combined features of amplitude, fundamental frequency, and temporal aspects such as syllable duration.

There were 9 subjects, all employees at the Royal Institute of Technology (KTH), Stockholm, and all speaking similar varieties of the Stockholm dialect.

In Swedish, there is a distinction between two kinds of word accent: words have either the acute or the grave accent (also called accent I and accent II). The difference is found primarily in the fundamental frequency pattern of the syllable bearing main stress, but the correlates have been reported to include duration and intensity too and are, in fact, spread out over the entire word. Accent II can occur only in words of more than one syllable, and compounds almost exclusively take accent II. There are a number of minimal pairs, e.g. buren (accent I, "the cage") vs. buren (accent II, "carried"). The system adopted in this study to denote the possible syllable types is that used in "Svenska Akademiens Ordbok" (1898- ) (The dictionary of the Swedish Academy). A syllable with main stress is assigned the number 4 if it has the acute tone, and 3 if it has the grave tone. Compounds with the grave tone have a secondarily stressed syllable, which is denoted by 2. An unstressed syllable is represented by 0.

However, a few simplifications have been made. The lexical notation 1 pertains to a weakly stressed syllable, distinct from an unstressed syllable; it is used only in words with more than two syllables. In this paper the syllables of this category are grouped together with the unstressed syllables. The lexical notation also distinguishes between the prosodic contours in pairs like talat vs. talakt ("spoken" vs. "speech act"). Both words are given the contour 3-2, but since the second word is a compound, it would be represented as tal³-akt², where the hyphen indicates that the last syllable is somewhat more heavily stressed than that of tal³at². However, these distinctions are often hard to detect auditorily and have been disregarded in this paper. For a more detailed description of tones in Swedish, see e.g. Elert (1970) and Teleman (1969).
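For illustration only, the four-way notation can be encoded as tuples of integers. The helper names `normalize` and `fmt` are hypothetical. The sketch simply restates the simplification described above (lexical category 1 merged into 0). It does not check whether a contour is phonotactically possible in Swedish.

```python
# Syllable categories as used in this paper (after the SAOB notation):
#   4 = main stress, acute tone (accent I)
#   3 = main stress, grave tone (accent II)
#   2 = secondary stress in a grave-accent compound
#   0 = unstressed (the weakly stressed lexical category 1 is merged into 0)
CATEGORIES = {4, 3, 2, 0}


def normalize(lexical_contour):
    """Map a lexical (SAOB-style) contour onto the four categories of this paper."""
    result = []
    for syllable in lexical_contour:
        if syllable == 1:
            result.append(0)  # weak stress is grouped with unstressed syllables
        elif syllable in CATEGORIES:
            result.append(syllable)
        else:
            raise ValueError(f"unknown syllable category: {syllable}")
    return tuple(result)


def fmt(contour):
    """Render a contour in the hyphenated form used in the text, e.g. '3-2'."""
    return "-".join(str(s) for s in contour)


# The minimal pair buren: accent I ('the cage') vs. accent II ('carried').
print(fmt(normalize([4, 0])), fmt(normalize([3, 0])))  # 4-0 3-0
```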

The possible contours of utterances of two or three syllables are listed and exemplified as follows:


Per går (Peter walks)

gaffel (fork)

melon (melon)

båtlast (ship load)

Per hyr bil (Peter rents a car)

Per läser (Peter reads)

Per ska gå (Peter will go)

i vår tid (in our time)

kritiker (critic)

en åker (a field)

en maskin (a machine)

… (the dwelling)

motorväg (motorway)

i baklås (a deadlock)

Anders går (Andrew walks)

fin våning (nice apartment)

The 16 words and phrases listed above, each corresponding to one of the 16 existing contours for 2 and 3 syllables, were written on cards. From these cards a recording was made by a trained phonetician belonging to the same dialect area as the subjects (J. Jonasson, Dept. of Linguistics, Stockholm University). The stimuli were generated by "humming" the prosodic patterns (i.e. with the oral cavity closed and the vocal cords vibrating). Ideally, synthetic stimuli containing only the fundamental frequency, intensity, and duration information of the original speech should be used; however, inspection of these parameters in hummed and natural utterances revealed sufficient similarities. Each prosodic pattern was repeated once and the 16 patterns were arranged in random order. The same 16 patterns occurred once more in a different random order, so that altogether there were 32 stimuli, arranged in two sets of 16 different stimuli. The two sets were presented in succession in the same session.
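The presentation order described above (the same 16 patterns in two sets, each in its own random order) can be sketched as follows. The use of Python's `random.shuffle`, the seed, and the integer pattern labels are assumptions made for illustration. The paper does not say how its randomization was carried out.

```python
import random


def two_randomized_sets(patterns, seed=None):
    """Return two sets containing every pattern once, in different random orders."""
    rng = random.Random(seed)
    first = list(patterns)
    rng.shuffle(first)
    second = list(patterns)
    rng.shuffle(second)
    # Reshuffle in the (unlikely) case that both sets came out identical.
    while len(second) > 1 and second == first:
        rng.shuffle(second)
    return first, second


set1, set2 = two_randomized_sets(range(1, 17), seed=1971)  # 16 contour labels
session = set1 + set2  # 32 stimuli, presented in succession
print(len(session))  # 32
```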

The experiment was conducted by having all the subjects listen simultaneously to the recorded stimuli. Beforehand, they were told that they would hear some speech (a word, a phrase, or a sentence) but that only the prosodic information of the utterances had been preserved. For each recorded pattern, the subjects were asked to write down an appropriate response, i.e. an utterance assumed to have a prosodic contour identical with that of the stimulus. Every stimulus was replayed as many times as the subjects requested. The results thus bear not primarily on the question "Does a listener normally rely on prosodic information when decoding speech?" but rather on "Are prosodic cues at all relevant to this process?"

Results of experiment 1

Each response was classified into one of the 16 classes comprising the possible prosodic patterns for 2 and 3 syllables. Suppose, for example, that a subject gave a response analyzed as having the prosodic contour 4-0 to a stimulus with the contour 4-4. The sequence as a whole would then be judged incorrect, but a count of correct syllables would give a score of one correct syllable.
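The two scoring levels can be sketched as a small function. The name `score` is hypothetical. The paper does not state how responses with a different number of syllables were scored, so this sketch assumes they are compared position by position up to the shorter length.

```python
def score(stimulus, response):
    """Score one response contour against one stimulus contour.

    Returns (pattern_correct, n_correct_syllables). Contours of unequal
    length are compared position by position up to the shorter one
    (an assumption; the paper does not specify this case).
    """
    pattern_correct = tuple(stimulus) == tuple(response)
    n_correct = sum(1 for s, r in zip(stimulus, response) if s == r)
    return pattern_correct, n_correct


# The example from the text: response 4-0 to a stimulus with contour 4-4.
print(score((4, 4), (4, 0)))  # (False, 1)
```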

The percentages of correct response patterns were:

87.5 % for bisyllables

81.0 % for trisyllables

83.1 % for all stimuli together

Throughout this paper, a "correct response" is to be understood as a response which, in its relevant aspect, has the same structure as the undistorted form of the stimulus.

The figure for bisyllables is higher than that for trisyllables, as is to be expected, since there are fewer alternative patterns for two syllables. The percentages are clearly far above chance level (25 % for bisyllables, 8.3 % for trisyllables).
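The chance levels quoted above are what uniform guessing among k alternative patterns would give, namely 100/k percent. The figures 25 % and 8.3 % correspond to k = 4 and k = 12, i.e. the 4 bisyllabic and 12 trisyllabic contours that make up the 16 stimuli. The uniform-guessing model is an assumption made here for illustration.

```python
def chance_percent(n_alternatives):
    """Expected percent correct when guessing uniformly among n_alternatives patterns."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return 100.0 / n_alternatives


# Observed percentages from experiment 1 and the number of alternative contours.
observed = {"bisyllables": (87.5, 4), "trisyllables": (81.0, 12)}
for label, (pct, k) in observed.items():
    print(f"{label}: observed {pct:.1f} %, chance {chance_percent(k):.1f} %")
```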

The differences between the subjects were rather large: the percentages for the 9 subjects ranged from 62.5 % to 100 % correct. The mean value was 83.1 % and the standard deviation 12.4 %.

A comparison between the responses in the two sets of stimuli (which were made up of the same stimuli, differently ordered) showed that any training effect was unlikely to have influenced the results. Of the 238 correct responses, 120 were given in the first set and 118 in the second.

Matrix 1. Relationship between stimulus syllables and response syllables, bisyllables (experiment 1). Number of observations is indicated.

Matrix 2. Relationship between stimulus syllables and response syllables, trisyllables (experiment 1).

Matrix 3. Combination of Matrices 1 and 2. Percentages are indicated. Columns: response syllable prosodic category 4, 3, 2, 0, and no response. Rows: stimulus syllable prosodic category 4, 3, 2, 0.

Surviving cell values, in scan order (their row and column positions are not recoverable): 0.3, 0, 95.4, 1.6, 8.6, 0, 2.8, 91.7, 88.3, 2.8, 0, 3.6, 1.5, 1.9, 1.9, 2.8, 1.2, 95.4, 0, 0.4.

Matrices 1, 2, and 3 show the proportion of response syllables of different types for each type of stimulus syllable. These matrices show that syllables of prosodic category 2 and those of category 3 were represented incorrectly in the responses much less often than the other two syllable types.
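A matrix of this kind is a confusion matrix whose rows are normalized to 100 %. The sketch below assumes that each scored syllable is available as a (stimulus category, response category) pair, with `None` standing for no response. The function name is hypothetical, and the toy data are invented rather than taken from the paper's observations.

```python
from collections import Counter

CATEGORIES = (4, 3, 2, 0)
COLUMNS = CATEGORIES + ("no resp.",)


def confusion_percent(pairs):
    """Build a row-normalized confusion matrix.

    pairs: iterable of (stimulus_category, response_category_or_None).
    Returns {stimulus_category: {column: percent}}; each non-empty row sums to 100.
    """
    counts = {s: Counter() for s in CATEGORIES}
    for stimulus, response in pairs:
        counts[stimulus]["no resp." if response is None else response] += 1
    matrix = {}
    for stimulus, row in counts.items():
        total = sum(row.values())
        matrix[stimulus] = {
            col: (100.0 * row[col] / total if total else 0.0) for col in COLUMNS
        }
    return matrix


# Invented toy data, for illustration only.
toy = [(4, 4), (4, 4), (4, 0), (4, None), (0, 0), (0, 4), (3, 3), (2, 2)]
for stimulus, row in confusion_percent(toy).items():
    print(stimulus, [round(row[c], 1) for c in COLUMNS])
```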


… three unstressed syllables (Sigurd, 1965). If no further constraints on the prosodic structure are presupposed, it can be shown (by inductive reasoning) that the following expression gives the number of theoretically possible prosodic patterns if the number of syllables in the utterance is known:

$$K_n = \ldots \qquad (1)$$

where $n$ is the number of syllables in the utterance, $K_n$ is the number of different possible prosodic patterns given the number of syllables, and $A_p$ is defined by the recursive procedure

$$A_p = A_{p-1} + A_{p-2} + 1, \qquad A_0 = 0, \quad A_1 = 0,$$

with the convention

$$\binom{p}{q} = 0 \quad \text{if } p < q.$$

Eq. (1) has been used to prepare Table II-A-1.

TABLE II-A-1. The number of different prosodic patterns in Swedish, according to the number of syllables.

Number of syllables    Number of prosodic patterns
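The expression for $K_n$ is not legible in this transcript, so the sketch below implements only the two auxiliary definitions stated with Eq. (1): the recursion for $A_p$, and the convention that a binomial coefficient vanishes when p < q. It does not reproduce Table II-A-1, and the function names are hypothetical. Unrolling the recursion gives 0, 0, 1, 2, 4, 7, 12, 20, ..., i.e. $A_p = F_{p+1} - 1$, where $F$ denotes the Fibonacci numbers with $F_1 = F_2 = 1$.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def A(p):
    """A_p = A_{p-1} + A_{p-2} + 1, with A_0 = A_1 = 0."""
    if p < 0:
        raise ValueError("p must be non-negative")
    if p < 2:
        return 0
    return A(p - 1) + A(p - 2) + 1


def binom(p, q):
    """Binomial coefficient (p over q), taken to be 0 when p < q."""
    if p < 0 or q < 0:
        raise ValueError("p and q must be non-negative")
    return comb(p, q)  # math.comb already returns 0 when q > p


print([A(p) for p in range(8)])  # [0, 0, 1, 2, 4, 7, 12, 20]
```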

A pattern such as 0-4 is characteristic of a word like banan (banana), but also of a phrase like kom hit! (come here!). In general, any given prosodic pattern can be given several linguistic interpretations. Clearly, then, the numbers in the right-hand column of Table II-A-1 underestimate the number of utterances that are distinct in terms of syntactic structure and word-boundary locations. Consequently, if information about the prosodic patterns can be extracted from the acoustic signal, the identification of prosodic features would seem to be a powerful mechanism for the listener. It would help restrict the choice among the large number of alternative syntactic and semantic structures that an arbitrary acoustic speech signal might be intended to convey.

Experiment 1 indicates that listeners are indeed able to make inferences about the linguistic properties of utterances from prosodic speech parameters. It also suggests the hypothesis that they might do so in more normal listening situations as well.

Experiment 2: Grammatical morphemes and prosodic cues

A second experiment was conducted in order to investigate the interaction of prosodic acoustic cues and grammatical features.

There were 10 subjects, 8 of whom had participated in experiment 1; the other two belonged to the same dialect area. This time there were three sets of 10 sentences or phrases, each stimulus having five syllables. The prosodic patterns of the stimuli in two of these sets were recorded in the manner described for experiment 1. Response sheets were prepared that gave information about grammatical morphemes. For this experiment, grammatical morphemes were defined as articles, inflectional endings, conjunctions, personal pronouns, tense-forming auxiliaries, and prepositions. Each syllable was indicated by a line for the subject to fill in. The sheets are enclosed in an appendix.

The experiment was divided into three parts. The first was the grammatical part (G): the subjects had only the response sheets with grammatical information, but no acoustic cues. In the prosodic part (P), the subjects were given blank response sheets but were provided with acoustic stimuli, i.e. hummed versions of the utterances. The third part gave both prosodic and grammatical information (G + P). Four 5-syllable sentences and phrases were selected, and the acoustic and grammatical cues were extracted from them in order to obtain appropriate stimuli for the three parts of the experiment. Since all three parts were administered to the same subjects in one session, six dummy stimuli were added to each part. The dummy stimuli were intended to conceal the similarity between the relevant stimuli in the three parts, and it was hoped that this would make the transfer of a response from one part to another less probable. Informal interviews with some of the subjects afterwards showed that they had not been aware that stimuli of the same origin were reappearing.
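The mixing of relevant and dummy stimuli within each of the three parts can be sketched as below. The labels, the seed, and the assumption that each part received its own six dummies in a fresh random order are illustrative only. The paper does not describe how the order within a part was chosen.

```python
import random


def build_part(relevant, dummies, rng):
    """Return one part's stimulus list: relevant and dummy items in random order."""
    items = [("relevant", s) for s in relevant] + [("dummy", d) for d in dummies]
    rng.shuffle(items)
    return items


rng = random.Random(2)
relevant = ["S1", "S2", "S3", "S4"]  # hypothetical labels for the four 5-syllable stimuli
parts = {}
for part in ("G", "P", "G+P"):
    dummies = [f"{part}-dummy{i}" for i in range(1, 7)]  # six dummies per part
    parts[part] = build_part(relevant, dummies, rng)

for part, items in parts.items():
    print(part, [label for _, label in items])
```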


The relevant stimuli in their undistorted form (with approximate English translations) were as follows. The underlined sections show the information presented on the response sheets in parts G and G + P.

1. Han som kan joddla. (He who can yodel.)

2. Kaninen hoppar. (The rabbit jumps.)

3. Han äter en ost. (He is eating a cheese.)

4. Ett rutigt segel. (A checked sail.)

Before the G + P part of the experiment, the subjects were asked to look at each line on the response sheet only when the corresponding stimulus was played back. This was to minimize the risk that the written stimuli would be read in advance and hence influence the interpretation of the acoustic cues.

Results of experiment 2

In this experiment, the responses were classified according to their prosodic pattern in the manner described for experiment 1. The results are tabulated in Table II-A-2.

TABLE II-A-2. Number of responses with a correct prosodic pattern.

                 Type of information transferred
Stimulus no.      G       P     G+P
1                 …       …       …
2                 1       8       8
3                 0       8       5
4                 …       …       …
Total             2      25      29
Percent         5.0    62.5    72.5

There is no clear improvement in the results from part P to part G + P. Although the total for G + P is higher, the P part gave the best result for stimulus no. 3. What seems to be a safe conclusion is (not surprisingly) that the prosodic pattern is easier to identify correctly when it is heard than when it is not. The types of errors in parts P and G + P are shown in Matrices 4 and 5.


Matrix 4. Relationship between stimulus syllables and response syllables, part P (experiment 2).

Matrix 5. Relationship between stimulus syllables and response syllables, part G + P (experiment 2).

Matrices 4 and 5 show the same tendency as Table II-A-2: the addition of grammatical cues does not increase the number of correct response syllables. On the whole, the findings of experiment 1 (Matrices 1 and 2) about the types of errors are also confirmed. The exception is that originally unstressed syllables tended to appear as category 4 in the responses in experiment 2, whereas the opposite was true for experiment 1. This may be partly because experiment 2 had proportionally more syllables in category 0 and fewer in category 4 than experiment 1.

Another type of information transferred is syntactic structure. This, however, is much more difficult to quantify and measure. For this paper, a working definition of syntactic class is used: two utterances belong to the same syntactic class if all of the following conditions are fulfilled:

(a) they function in the same way, e.g. as a sentence, a noun phrase, or a verb phrase;

(b) their immediate constituents in the surface structure are of the same type;

(c) each of the above-mentioned constituents in one of the utterances has the same number of syllables as the corresponding constituent in the other utterance.

An example: the utterances "John drank a glass of beer" and "Ann wrote seven letters" belong to the same syntactic class. Both are sentences (condition a), and both are made up of NP + VP in the surface structure (b). In both, the constituents on the next level are N + V + NP, and finally, in both the N, V, and NP have 1, 1, and 4 syllables, respectively (c). In this way the responses could be compared and classified. Table II-A-3 shows the number of correct responses, i.e. responses that belonged to the same syntactic class as the corresponding undistorted stimuli.
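The working definition amounts to comparing three features of two analyzed utterances. The sketch below assumes that the surface-structure analysis has already been done by hand and is supplied as a function label plus a list of (immediate constituent, syllable count) pairs. The `Utterance` type and the `same_syntactic_class` function are hypothetical. Only the immediate-constituent level is modeled; the example in the text also compares the next level down, which could be handled by applying the same check recursively.

```python
from typing import NamedTuple, Tuple


class Utterance(NamedTuple):
    function: str                              # e.g. "S", "NP", "VP"
    constituents: Tuple[Tuple[str, int], ...]  # (constituent type, syllables)


def same_syntactic_class(u: Utterance, v: Utterance) -> bool:
    """Apply conditions (a), (b), and (c) of the working definition."""
    if u.function != v.function:                 # (a) same function
        return False
    types_u = [t for t, _ in u.constituents]
    types_v = [t for t, _ in v.constituents]
    if types_u != types_v:                       # (b) same immediate constituents
        return False
    return [n for _, n in u.constituents] == [n for _, n in v.constituents]  # (c)


john = Utterance("S", (("NP", 1), ("VP", 5)))  # John | drank a glass of beer
ann = Utterance("S", (("NP", 1), ("VP", 5)))   # Ann | wrote seven letters
ann2 = Utterance("S", (("NP", 1), ("VP", 3)))  # Ann | wrote letters
print(same_syntactic_class(john, ann), same_syntactic_class(john, ann2))  # True False
```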

The results indicate that prosody alone cannot be used to recover syntactic structure. However, the transfer of syntactic structure with the aid of grammatical morphemes is considerably improved by the extra information of prosodic patterns.

TABLE II-A-3. Number of correct responses (syntactic structure).

                 Type of information transferred
Stimulus no.      G       P     G+P
1                 …       …       …
2                 2       0       6
3                 1       0       5
4                 …       …       …
Total            15       2      27
Percent        37.5     5.0    67.5

Table II-A-4 gives information on the variation of responses. It demonstrates even more clearly than Table II-A-3 how the interaction between grammatical and prosodic information narrows down the possible responses.

TABLE II-A-4. Number of different response classes for each stimulus, syntactic structure. Response classes are taken to be different when they do not meet all three conditions (a, b, c) mentioned above.

                 Type of information transferred
Stimulus no.      G       P     G+P
1                 …       …       …
2                 7       8       2
3                 4       8       3
4                 …       …       …
Total            21      31       9
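The three conditions compare features for equality, so membership in the same class is an equivalence relation. The number of different response classes for a stimulus can therefore be counted by reducing each analyzed response to a key and counting the distinct keys. The sketch assumes that each response has been analyzed by hand into a function label and a tuple of (constituent type, syllable count) pairs. The function names are hypothetical and the responses are invented.

```python
def class_key(function, constituents):
    """Reduce an analyzed utterance to the features named in conditions (a)-(c)."""
    types = tuple(t for t, _ in constituents)      # (b) constituent types
    syllables = tuple(n for _, n in constituents)  # (c) syllables per constituent
    return (function, types, syllables)            # (a) is the function label


def count_response_classes(responses):
    """Number of distinct syntactic classes among analyzed responses."""
    return len({class_key(f, c) for f, c in responses})


# Invented analyses of responses to one 5-syllable stimulus.
responses = [
    ("S", (("NP", 1), ("VP", 4))),
    ("S", (("NP", 1), ("VP", 4))),
    ("S", (("NP", 2), ("VP", 3))),
    ("NP", (("Det", 1), ("N", 4))),
]
print(count_response_classes(responses))  # 3
```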

Experiment 3: Content morphemes and prosodic cues

A third, supplementary experiment was made in order to investigate the role of content morphemes (those morphemes not counted as grammatical morphemes under the definition used in experiment 2).

There were 12 subjects, 8 of whom had participated in one or both of the preceding experiments; the other 4 belonged to the same dialect area. There were two parts, prepared in exactly the same way as parts G and G + P in experiment 2. The two parts will be referred to as C (content morphemes) and C + P (content morphemes + prosody). The 4 relevant stimuli in each of the two parts were identical to those in experiment 2 as to syntactic and prosodic structure. Unfortunately, it was not possible to use exactly the same phrases and sentences in the two parts. The subjects would presumably have recognized the stimuli during the second part, which would, of course, have biased the results for the C + P part.

The relevant stimuli in their undistorted form are listed below (the underlined sections are those presented to the subjects). The response sheets will be found in the appendix.


Part C

1. Han som vill sjunga (He who wants to sing)

2. Kalkonen tittar (The turkey gloats)

3. Hon knäcker en gren (She is breaking a twig)

4. Ett vanligt medel (A common means)

Part C + P

1. Hon som ska läsa (She who will read)

2. Majoren stammar (The major stutters)

3. De steker en fisk (They are frying a fish)

4. Ett trasigt foder (A torn lining)

Table II-A-5 shows the number of correctly represented prosodic patterns. Column P from experiment 2 is included for comparison.

TABLE II-A-5. Number of responses with correct prosodic pattern (experiment 3). Also shown in Fig. II-A-2.

                 Type of information transferred
                  C         P         C+P
Stimulus no.   12 subj.  10 subj.  12 subj.
1                 …         …         …
2                 1         7         7
3                 4         8        12
4                 …         …         …
Total            26        25        42
Percent        54.2      62.5      87.5
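The percentages in Tables II-A-2, II-A-3, II-A-5, and II-A-6 are consistent with each column total being divided by the number of subjects times the four relevant stimuli, i.e. 40 responses with 10 subjects and 48 with 12. The helper below is a hypothetical restatement of that arithmetic.

```python
def percent_correct(total, n_subjects, n_stimuli=4):
    """Column total as a percentage of n_subjects x n_stimuli responses."""
    return 100.0 * total / (n_subjects * n_stimuli)


# Table II-A-5: C and C + P with 12 subjects, P (from experiment 2) with 10.
for label, total, subjects in [("C", 26, 12), ("P", 25, 10), ("C+P", 42, 12)]:
    print(label, round(percent_correct(total, subjects), 1))
```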

It should be noted that the content morphemes convey nearly as much information about the prosodic pattern as the prosodic pattern itself. It is also interesting that the inclusion of content morphemes increases the accuracy with which prosodic patterns are identified, whereas the inclusion of grammatical morphemes does not do so to any significant degree (Table II-A-2). This is shown graphically in Fig. II-A-2.

[Fig. II-A-2. Percent responses with correct prosodic pattern.]


Tables II-A-6 and II-A-7 give information about the grouping of the responses into syntactic classes, as discussed for experiment 2.

TABLE II-A-6. Number of correct responses (syntactic structure, experiment 3). Also shown graphically in Fig. II-A-3.

                 Type of information transferred
                  C         P         C+P
Stimulus no.   12 subj.  10 subj.  12 subj.
1                 …         …         …
2                11         0        11
3                12         0        12
4                 …         …         …
Total            35         2        45
Percent        72.9       5.0      93.8

TABLE II-A-7. Number of response classes (syntactic structure, experiment 3). Cf. also Fig. II-A-4.

                 Type of information transferred
                  C         P         C+P
Stimulus no.   12 subj.  10 subj.  12 subj.
1                 …         …         …
2                 2         8         1
3                 1         8         2
4                 …         …         …
Total             9        31         7

These tables show only small differences between columns C and C + P (in Table II-A-6, only stimulus no. 1 shows any substantial difference between the two columns). Compared with Tables II-A-3 and II-A-4, this suggests that the prosodic patterns facilitate the recovery of syntactic structure as long as only the grammatical morphemes are given. If the content morphemes are given instead, the improvement is negligible or absent. On the other hand, the prosodic patterns appear to be inferred more easily when content morphemes are added than when grammatical morphemes are added. See Figs. II-A-3 and II-A-4 for graphical illustrations.

[Fig. II-A-3. A comparison between the results of experiments 2 and 3 regarding recovery of syntactic structure. Conditions: grammatical morphemes (G), prosodic cues (P), and G + P (experiment 2); content words (C), prosodic cues (P), and C + P (experiment 3). Horizontal axis: type of stimulus presentation.]

[Fig. II-A-4. Number of response classes, syntactic structures.]

Discussion

Since the experiments described above are pilot studies, they naturally leave a great deal to be desired. One difficulty has to do with the "linguistic imagination" of the subjects. A wrong response may reflect an inability to perceive the stimulus as it was intended. It may also reflect the subject's inability to produce any linguistic sequence that fulfils the given conditions, which leads the subjects to give responses that are "nearly correct". The reactions of the subjects during the experiments make it plausible that quite a few errors are due to this circumstance.

Another difficulty lies in checking that the recorded material has a "normal" and "neutral" stress and pitch, i.e. that it is not unduly colored by dialectal, idiolectal, emotional, or "presuppositional" factors. If these factors had any influence, they would have made it more difficult for the subjects to give correct responses.

An essential feature of experiments 2 and 3 is that part of the stimulus material was written, not recorded. How does this fact influence the results? Presumably, it has overemphasized the written part of the material. This applies especially to experiment 2, where the grammatical morphemes were brought out in a manner that is probably incompatible with their acoustic role in the speech signal. Moreover, the analysis of prosodic patterns is probably insufficient for stimuli and responses beyond the word level. To get a sounder basis for the analysis, an approach similar to that taken in Chomsky and Halle (1968) should probably be adopted.

For all of the experiments, the results are uncertain because of the small number of subjects and (at least in experiments 2 and 3) the small number of stimuli. It should also be noted that all of the findings, insofar as they are valid at all, are valid only for Swedish.

In spite of their shortcomings, these experiments may indicate lines along which further research can be done. The ultimate aim of this research would be to propose a more substantive model of speech perception than those advocated so far. Current models are not necessarily incorrect, but they reveal very little about the successive stages in the processing of the speech signal; see, for example, Stevens and Halle (1967) and Morton and Broadbent (1967). A more substantive model should tell us which factors of the speech signal are important. It should also describe how these factors are processed along different paths and how they interact to effect speech perception. Finally, it should describe differences between languages and individuals in these processes, and the ways in which the relative importance of the different factors changes in noise.

These experiments make it plausible that perception relies in part on a mechanism that uses information conveyed by the acoustic cues of the prosodic patterns. It is also quite possible that the grammatical variables play an important role in this mechanism.

Table II-A-3 indicates that it is possible to recover the syntactic structure with fair accuracy, given grammatical morphemes and prosody. But more than the syntactic structure is transferred. The procedure for grouping the responses included the condition "same number of syllables" (condition c), so word divisions are also placed correctly to a not insignificant extent.

In order to arrive at the desired model, it will be necessary to use stricter methods. The stimuli will have to be controlled by generating them with a computer. Computer synthesis is also needed to divide prosody into its factors: amplitude, fundamental frequency, and syllable duration. To make the response tasks somewhat easier for the subjects, it may be necessary to use multiple-choice questions. It is hoped that improved experimental methods will yield information relevant to the modelling of speech perception.
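As a rough sketch of the kind of computer-generated stimulus meant here, the code below builds a pure tone whose fundamental frequency, amplitude, and duration are set independently for each syllable, so that each factor can be varied on its own. The sample rate, the sinusoidal source, the stepwise per-syllable values, and the function name are assumptions made for illustration. A realistic stimulus would need smooth F0 and amplitude contours and a richer source than a single sinusoid.

```python
import math


def synthesize(syllables, sample_rate=8000):
    """Generate samples for a sequence of (f0_hz, amplitude, duration_s) syllables.

    F0 and amplitude are held constant within each syllable. The phase is
    carried across syllable boundaries so that the waveform has no phase jumps.
    """
    samples = []
    phase = 0.0
    two_pi = 2.0 * math.pi
    for f0_hz, amplitude, duration_s in syllables:
        n = int(round(duration_s * sample_rate))
        step = two_pi * f0_hz / sample_rate
        for _ in range(n):
            samples.append(amplitude * math.sin(phase))
            phase = (phase + step) % two_pi
    return samples


# Hypothetical three-syllable contour: (F0 in Hz, relative amplitude, seconds).
contour = [(120.0, 0.8, 0.25), (180.0, 1.0, 0.20), (110.0, 0.5, 0.30)]
signal = synthesize(contour)
print(len(signal))  # 6000 samples at 8 kHz
```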

References:

BEVER, T.G. (1968): "A survey of some recent work in psycholinguistics", New York, Rockefeller University (mimeographed).

CHOMSKY, N. and HALLE, M. (1968): The Sound Pattern of English, Harper & Row, New York.

ELERT, C.C. (1970): Ljud och ord i svenskan, Almqvist & Wiksell, Stockholm.

MORTON, J. and BROADBENT, D.E. (1967): "Passive versus active recognition models", in Models for the Perception of Speech and Visual Form (ed. W. Wathen-Dunn), The MIT Press, Cambridge, Mass.

PIERCE, J.R. (1969): "Whither speech recognition?", J.Acoust.Soc.Am. 46, pp. 1049-1051.

SIGURD, B. (1965): Phonotactic Structures in Swedish, UNISKOL, Lund.

STEVENS, K.N. and HALLE, M. (1967): "Remarks on analysis by synthesis and distinctive features", in Models for the Perception of Speech and Visual Form (ed. W. Wathen-Dunn), The MIT Press, Cambridge, Mass.

STEVENS, K.N. and HOUSE, A.S. (1970): "Speech perception", in Foundations of Modern Auditory Theory (ed. J. Tobias), Vol. 1, Academic Press, New York.

Svenska Akademiens Ordbok (SAOB) (1898- ).

TELEMAN, U. (1969): "Böjningssuffixens form i nusvenskan", Arkiv för nordisk filologi 84, pp. 163-208.

[Appendix 4. Response sheet, experiment 3, part C.]

[Appendix 5. Response sheet, experiment 3, part C + P.]