
Institute for Perception Research

IPO Annual Progress Report 17, 1982

Institute for Perception Research
Den Dolech 2, Eindhoven, Netherlands

Postal address:
Instituut voor Perceptie Onderzoek, P.O. Box 513, 5600 MB Eindhoven, Netherlands


Contents

Page

5 Introduction

6 Research Programme 1982/1983

8 Organisation IPO

10 Invited Contributions

11 I. Pollack

Applied perceptual research at IPO

14 A. Cohen

From intonation to pitch and v.v.

19 J.L. Goldstein

Optimum statistical communication models of the human senses

23 H. Duifhuis, P. Kraft and H.W. Zelle

Frequency and level effects in three-tone suppression

32 W.J.M. Levelt

Science policy: three recent idols, and a goddess

36 K.N. Stevens

Toward a feature-based model of speech perception

38 Auditory Perception and Speech

39 S.G. Nooteboom

Developments

41 M.T.M. Scheffers

The role of pitch in the perceptual separation of simultaneous vowels II

46 S.G. Nooteboom and G.J.N. Doodeman

Speech quality and word recognition from fragments of spoken words

51 C.J. Darwin

Analysis and synthesis of mixed excitation LPC coded speech

57 J.M.B. Terken

The role of accentuation in comprehension. A first test

63 B.A.G. Elsendoorn and J. 't Hart

Exploring the possibilities of speech synthesis with Dutch diphones

66 Visual Perception and Reading

67 J.A.J. Roufs

Developments

70 J.A.J. Roufs, A.A.G. Soons and R. Eising

Some experiments on sharpness in relation to contrast bearing on electronic optical imaging


76 E. van der Zee and A.W. van der Meulen

The influence of field repetition frequency on the visibility of flicker on displays

84 H. Bouma, Ch.P. Legein, H.E.M. Melotte and L. Zabel

Is large print easy to read?

91 J.A.J. Roufs and J. Polstra

Line and edge-spread functions of the visual system elicited by a TV display in situ

102 Cognition and Communication

103 D.G. Bouwhuis and H.C. Bunt

Developments

108 D.G. Bouwhuis

Strategy effects in letter and word recognition

116 S. Larochelle

The initiation and duration of movements in skilled typewriting

123 H.C. Bunt and G.O. thoe Schwartzenberg

Syntactic, semantic and pragmatic parsing for a natural language dialogue system

129 F.L. van Nes and J. van der Heijden

On information retrieval by inexperienced users of data bases

138 Ergonomics

139 F.F. Leopold

Developments

141 P.A. Barbonis and F.L. van Nes

Learning to type on a chord keyboard

148 Aids for the Handicapped

149 H.E.M. Melotte

Developments

150 J.N. Kroon

The Typophone

156 J.E.M. Gabriels and H.E.M. Melotte

Development of reading aids: the Reading Desk Project

161 P.L.H. Schuurmann and H.E.M. Melotte

An artificial larynx with semi-automatic pitch control

162 Instrumentation and Software

163 L.F. Willems

Developments

164 J. Polstra

The speech synthesis chip in the Typophone

167 Publications 1982

178 Papers accepted for publication

178 Colophon

For information on the use of material from this report, see the last page.


Introduction

The Institute

The 'Stichting Instituut voor Perceptie Onderzoek' (IPO Foundation, Institute for Perception Research) constitutes a formal cooperation between Philips Research Laboratories and the Eindhoven University of Technology. The Supervisory Board has two members from Philips, two from the University and one member from the Netherlands Organisation for the Advancement of Pure Research ZWO. Scientists from many disciplines serve as members of the Scientific Board.

The Institute, located within the University, welcomes guest researchers.

Events 1982

In September, Prof. dr D.A. de Vries retired as chairman of the Supervisory Board. We are grateful for his efforts on behalf of the Institute, among which the delicate matters of housing and research financing deserve special mention. Dr ir J. Nijman, member of the Executive Board of the Eindhoven University of Technology, has succeeded Professor de Vries as chairman.

Dr ir P.L. Walraven left and Dr ir A. van Meeteren joined the Scientific Board. This change reflects our continuing ties with the Institute for Perception TNO.

September 12 marked the 25th anniversary of the 'Stichting Instituut voor Perceptie Onderzoek'. We celebrated our silver jubilee with a three-day exhibition, which attracted 500 guests. At a festive meeting on September 14th, the exhibition was opened by Dr R.J. In 't Veld, director-general of the Department of Education and Sciences. Other speakers were Prof. dr H.B.G. Casimir, chairman of the Scientific Board, and Prof. dr H. Bouma, present director of IPO. A booklet in Dutch explained present research subjects to a general readership. On the occasion of our silver jubilee, this Annual Progress Report includes a number of invited contributions. We express our gratitude to the authors for their insightful discussion of various aspects of our common field of research.

In November, Drs B.L. Cardozo retired as deputy director of IPO. We are very grateful for his many scientific and organisational contributions to the Institute from 1957 onwards.

In November, Dr N.J. Willems received his doctorate from Utrecht University with a thesis entitled 'English intonation from a Dutch point of view'.

Prof. dr H. Bouma and Dr D.G. Bouwhuis organised the 10th International Symposium on Attention and Performance, held in Venlo from July 5-9. The main theme was Control of Language Processes. Sixty-five researchers participated, 36 of whom contributed a paper. The proceedings will be published in 1983.

In November, a final positive decision was reached on a new IPO building, to be erected close to the present one. Construction will last from 1983 to 1985.


Research Programme 1982/1983

IPO research is in general concerned with the understanding of sensory and cognitive human information processing. Both theory-motivated and application-motivated research are pursued in relation to the use of flexible information equipment, both hardware and software. Natural language is specifically investigated because of its relevance to man-machine communication. The main research topics are specified below.

Auditory Perception

hearing theory: Psychoacoustic study of normal and impaired hearing, especially where relevant to speech perception. Multiple sounds*.

sound control: Sound quality (digital storage, compression*). Perceptual evaluation of unwanted sounds.

Speech

pitch and voicing: Psychoacoustic theory of pitch and voicing (sound source) applied to normal speech. Perceptual separation of simultaneous voices and improvement of speech-to-noise ratio. Computer implementation.

intonation: Description and rules of intonation in Dutch and British English. Communicative value of presence and absence of pitch accents in descriptive speech 1) 2).

word recognition: Quantitative modelling of the human recognition of spoken words, both isolated and embedded in sentences. Relation between recognition of word fragments and of full words, taking the lexicon into account*. Computer implementation.

concatenation 3) 4): Rules for concatenation of diphones to form natural words and longer utterances, adding intonation and duration rules. Application in voice response units and text-to-speech conversion for Dutch*.

speech processing: Analysis (pitch, voiced-unvoiced, formants), economic coding, resynthesis and editing of speech. Improving the quality of voice response units (speech chips). Intelligibility of good-quality speech in particular.

Visual Perception

luminance contrasts: Extending visual transfer functions to supra-threshold contrasts*. Time function of brightness impression. Quantitative models of retinal signal processing.

image quality: Measuring and understanding the subjective quality of electronically displayed images, as regards contrast, sharpness, luminance, size, and time factors. HiFi television, medical imaging*.

Reading

text displays: Improvement of legibility and lay-out as related to reading processes and visual search (eye saccades), including visual comfort and fatigue*. Application in visual display units.

reading processes: Reading of words and paragraphs by elderly people with normal and subnormal vision in relation to size and illumination of the text*. Application in reading aids.

word recognition theory: Extension from short to long words. Application in word processors.

Cognition and Communication

interactive self-instruction 5): Simultaneous vision and audition of words and sentences in direct access, with interactive programming facilities for written and spoken language learning* (Dutch, English). Optimal strategies. Instruction aids, using electronic speech and microprocessors.

information dialogues: Written and spoken man-machine dialogues. Theory of communicative acts in relation to human performance and memory limitations and to the human dialogue partner's knowledge states. Computer implementation of the theory*. Application in information automatons.

Ergonomics

product ergonomics: Optimum use and design of input and output media of information equipment, in relation to industrial design. Voice mail system, interactive remote control*, optimal VDU text presentation and various products. Directions of use.

Aids for the Handicapped

communication aids: Development, production support and evaluation. New aids for hearing, reading, speaking, and typing for young and elderly people with auditory, visual or motor handicaps. In particular, typewriter output of spelled speech; ergonomic magnifiers (both optical and CCTV), and electrolarynx with intonation control 3). Application of concatenated speech*.

* Newly defined

Formal cooperations outside Philips Electronic Industries and the Eindhoven University of Technology:
1) Max Planck Institut für Psycholinguistik (Nijmegen).
2) Interfacultaire werkgroep Taal- en Spraakgedrag (Nijmegen University).
3) Institute of Phonetics (Utrecht University).
4) Department of General Linguistics, Phonetics Laboratory (Leiden University).
5) Department of Instructional Psychology (Tilburg University).


Organisation IPO

Supervisory Board (31.12.1982)

Dr ir J. Nijman (chairman)
Dr P. Kramer
Prof. dr W.A.T. Meuwese
Drs J. Smits
Dr ir K. Teer

Scientific Board (31.12.1982)

Prof. dr H.B.G. Casimir (chairman) - Heeze
Prof. ir R.G. Boiten - Delft
Prof. dr ir P. Eykhoff - Eindhoven
Prof. dr J.P. van de Geer - Leiden
Prof. dr H.E. Henkes - Rotterdam
Prof. dr L.F.W. de Klerk - Tilburg
Prof. dr S.L. Kwee - Eindhoven
Prof. dr W.J.M. Levelt - Nijmegen
Dr ir A. van Meeteren - Soesterberg
Prof. ir O. Rademaker - Eindhoven
Prof. dr R.J. Ritsma - Groningen
Prof. dr H. Schultink - Utrecht
Prof. dr ir H. Spekreijse - Amsterdam
Prof. dr P.C. Veenstra - Eindhoven
Prof. dr ir C.J.D.M. Verhagen - Delft
Dr P.A. van Wely - Eindhoven
Prof. dr P.J. Willems - Tilburg
Prof. dr ir A. van Wijngaarden - Amsterdam

Director

Prof. dr H. Bouma

Deputy Director

Drs B.L. Cardozo+

Adviser

Prof. dr A. Cohen (Utrecht University)

Group Leaders

Prof. dr S.G. Nooteboom - Hearing and Speech
Dr ir J.A.J. Roufs - Vision and Reading
Dr D.G. Bouwhuis - Cognition and Communication
Ing. F.F. Leopold - Ergonomics
H.E.M. Melotte - Communication Aids for the Handicapped
Ir L.F. Willems - Instrumentation

+ Left during 1982


Research Associates

Ir A.W. Bezemer+ (ZWO°)
Ing. H.J. Bleileven
Ir F.J.J. Blommaert+ (ZWO°)
Ir J.M.H. du Buf (ZWO°)
Dr H.C. Bunt
Dr R. Collier* (University Centre, Antwerp, Belgium)
Dr C.J. Darwin+ (University of Sussex, England)
Drs B.A.G. Elsendoorn (ZWO°)
Ms Drs J.E.M. Gabriels-van den Berg+
J. 't Hart
Dr A.J.M. Houtsma
Ing. Th.A. de Jong
Drs H.W.L.M. Kreutzer
Ir J.N. Kroon
Dr S. Larochelle+ (Univ. of California, San Diego, USA)
Ir M.A.M. Leermakers (ZWO°)
Dr Ch.P. Legein* (Catharina Hospital, Eindhoven)
Dr S.M. Marcus
Ing. G.J.J. Moonen
H.F. Muller+
Dr ir F.L. van Nes
Drs H.A.A. Otten
Drs H. de Ridder (ZWO°)
Ir M.T.M. Scheffers (ZWO°)
Drs J.M.B. Terken (ZWO°)
Ms Drs P.G.M. Truin (ZWO°)
Drs M.J. van der Vlugt (ZWO°)
Ir L.L.M. Vogten
Dr N.J. Willems (ZWO°)
Ms Drs L. Zabel+ (ZWO°)
Ir E. van der Zee

Research Staff

T. Bierlaagh+
Ms Ing. M.H.W.A. Boesten
Ing. M.C. Boschman (ZWO°)
Ing. G.J.N. Doodeman
Ing. R.C. Engelen
C. Fellinger
Ing. R.A.M. van Gorp+
Ing. J.C. Jacobs+
C.A. Lammers
Ing. R.A.J.M. van Lieshout
Ing. J.A. van der Linden
A.C. van Nes+
Ing. G.W.A. Niesen
Ing. J.A. Pellegrino van Stuyvenberg*
Ing. J. Polstra+
Ing. P.L.H. Schuurmann+
Ing. L.J.C. Theelen*
Ing. J. Tiesinga
Ing. P. IJtsma+
Ing. H.W. Zelle+
Ing. A.J.S.M. Zijlmans

Secretaries

Ms I. Esveldt
Ms P.J. Evers
H.M.C. van de Nieuwenhof*
Ms C.E.A.L. Nuys-van de Water*
Ms I. Schutte*

Librarian

Ms R.M. Smith

Workshop

C.G. Basten+
J.H. Bolkestein
A.J.J. Bruurs
A. van de Heuvel+
J.J. Leijssen+
J. van Pelt
T.A.C. Wollerich

* Part time
+ Left during 1982
° Netherlands Organisation for the Advancement of Pure Research ZWO


Invited Contributions


Applied perceptual research at IPO

I. Pollack
University of Michigan, Ann Arbor

From its inception, under the inspired leadership of Prof. Schouten, the basic and applied research programs at IPO have sustained each other. As a visitor to IPO in the summer of 1960, I recall with great fondness the spirited discussions around the conference table of practical and basic research problems posed by Prof. Schouten. His point of view was excellently summarised in the introduction to the 1971 Progress Report.

'Throughout almost 40 years spent in research and development, I always marvelled at the remarkable ways in which a sound scientific approach may lead to the most unexpected practical applications and, conversely, how seemingly simple practical problems may inspire the scientist to novel avenues of research.'

As I look through the yearly IPO Progress Reports, starting in 1966, I am extremely impressed by the strong threads of scientific continuity which run through the entire series. One sees little evidence of attraction to scientific 'fads' which, unfortunately, disrupt scientific continuity. To me, this state of affairs must be credited to the initial course set by the founder, to the wise leadership of the Institute by Schouten and his successors, to the unfailing support of the Philips Laboratories and the Eindhoven University of Technology, and to the devotion of a loyal and dedicated staff. Few research institutes in the entire world have maintained the vision of their founder in such an exemplary fashion.

My brief remarks on individual areas will be sketchy. I feel like a kid in a candy store, surrounded by so many delicious examples that I must pick and choose at random.

The research in audition and speech, my own areas of primary interest, has been characterised by attention to important fundamental problems. Specifically, the work of Schouten and his associates on the perception of pitch forced the reorientation of the entire field of auditory perception from an emphasis on spectral Fourier frequency to an understanding and appreciation of the role of timing mechanisms. Similarly, IPO research has reoriented the field of speech perception towards a fuller appreciation of the important role of timing and time-varying (intonation) properties of speech.

In Schouten's own words: 'It struck us, when reviewing our activities of the past ten years, that the time element played such a preponderant role in many of our fields for research.' He then enumerated specific examples in audition, perceptual skills, medical physics, vision and speech.

Many examples of the interaction between basic and applied research at IPO in the field of audition can be cited. My favorite concerns the role of 'dropouts' of auditory information on magnetic tape, which led Cardozo and collaborators to examine 'auditory jitter'. These investigations opened up new areas of research on factors underlying the naturalness of speech synthesis, on the detection of laryngeal speech pathologies, and on basic timing mechanisms in auditory perception.

Likewise, the research in visual perception has been characterised by attention to fundamental mechanisms and by concern for their application. The Schouten approach to the analysis of perceptual systems in terms of the fine temporal microstructure of events is well exemplified by the work of Roufs and collaborators. The recent work on point spread functions has important implications for understanding the interaction of spatial and temporal properties in visual perception. Bouma's work on reading, on the behaviour of dyslexics and on the problems of the elderly reader has already resulted in equipment that is enriching the lives of many of the visually impaired.

One line of research which was active in the early years at IPO has suffered from benign neglect: that of motor skills and motor performance. (This remark, of course, excludes the strong emphasis on speech production which characterises the present IPO program of speech research.) Motor skills disappeared from the IPO Annual Reports in the early 1970s. The early work, like that on sensory perception, showed special attention to the temporal microstructure of events, as exemplified by Koster's treatment of the psychological refractory period. But one cannot fault the prescience of judgment of IPO's directors: in the eyes of leading observers, the field of motor performance became sterile. The original judgment, however, was sound. The study of sensory performance alone paints a picture of an information receiver lost in thought; the more complete picture requires study of output mechanisms. Recent work by Sternberg and Kelso on the programming of motor movements may provide a natural link to complement the analysis of sensory systems. It is my prediction that motor performance will find its way back into the IPO structure as the work in motor programming infiltrates our thinking about human performance.

One clear thread running continuously through the IPO series is the extraordinary attention devoted to specialised research instrumentation. Scientists are often unwilling captives of their available instrumentation and often fail to acknowledge their indebtedness to the equipment designer. The Progress Reports reflect the longstanding IPO acknowledgement of this debt. While general-purpose programmable computers have removed certain requirements for instrumentation, the needs for interfaces and specialised instrumentation have multiplied, growing in number, reach and sophistication.

The role of Ergonomics at IPO is less clear to the outside observer. In terms of the number of personnel, this area has not been strongly supported. On the other hand, the entire Institute has been faithful to the vision of its founder in responding to applied problems, whether posed by the commercial marketplace (e.g. Cardozo's work on the sound character of vacuum cleaners) or by humanitarian considerations (e.g. Bouma's work on reading aids). To the extent that Ergonomics can be dissociated as a specific IPO research area, its emphasis has been on the general systems area. Leopold clearly points out the ergonomic problems associated with the large number of possible programmable functions available to the user of a modern telecommunications-information system. The full potential of these systems will not be realised unless they are regarded as 'user friendly'. Such systems will force IPO to heed the advice of A. Oldendorff, one of the founders of IPO. On the occasion of the 10th anniversary he argued:

'It seems desirable to me that IPO research should not be restricted to the psychological approach of perceptual phenomena. It might make sense, for instance, to think about perception in different environments and in different situations.... I should like to propose to open the discussion on the problems of selective perception. In this context, I am not only thinking of personality variables that belong to the field of psychology, but I am also thinking of cultural variables exerting influence on individual selective perception.'

These remarks, made in 1966, are indeed insightful. Ergonomic research on rich informational systems must begin to consider the nature of the workplace itself. Consider a society where a large proportion of its members work at home at terminals connected to large informational systems. While such a society may be more supportive of its handicapped members, it may rob the other members of important interactions now enjoyed in the central workplace. The implications of such possibilities for our entire style of living, working and interacting are mind-boggling.

Oldendorff's remarks did not fall on deaf ears. Under Director Bouma's leadership, a new research group was established on 'Cognition and Communication'. A portion of this work, represented by Bouwhuis and associates, was a natural outgrowth of studies of the reading process initiated by Bouma. Another portion, represented by the work of Bunt, is the application of semantic theory to the cognitive process and to the rules for the interpretation of dialogue acts. I would hope that this interesting work will make contact with the work of the IPO speech group on intonation and the temporal organisation of speech. I predict a bright future from this interaction.

What do these remarks bode for the future? Given the wise direction of the founder and his successors, given the generous, far-sighted support of Philips and the Eindhoven University of Technology, and given the rich historical continuity of the Institute, I am forced to predict that the next 25 years will be as productive as the first 25 years. I believe the words of the founder on the 10th anniversary have been fulfilled:

'We are unravelling bit by bit the technology of the living being. And since a great many industrial products are substitutes, simulations or improvements of man's sensory, cerebral and motoric activities, it is not far-fetched to believe that industrial technology may benefit from discoveries in the field of the technology of the living being.'


From intonation to pitch and v.v.

A. Cohen
Utrecht University

Introduction

The title of this contribution, written at the invitation of the Editor of this Annual Progress Report, is a deliberate allusion to the first doctoral thesis based on the work on intonation carried out at this Institute, written by R. Collier in 1972.

My main aim will be to sketch how a research programme of long standing has developed over the years, to show how a research proposal in the field of experimental investigation can drift from its original scope, and to show how the programme itself only seemed to prosper when allowed to follow its own uncharted ways. This seems all the more relevant at a time like the present, when research in most university establishments, notably Dutch ones, can only be conducted on the basis of thorough planning over a number of years. Such a plan needs the approval of a body of outsiders, who are even less aware than its designers of the vicissitudes of a line of research which must, of necessity, lie beyond their own immediate expertise. What follows can therefore be seen as a case history of research which started out from premises of a fundamental nature that were not fulfilled, and which has led to results in the field of application that were never foreseen in the original plan.

Measuring intonation

Despite the qualms expressed in the Introduction about having to carry out research according to a circumscribed plan, when a study of intonation was first contemplated there seemed to be no harm in setting down the research objectives on paper. They appeared in the form of a memorandum, dated 10th May 1963, entitled 'Measurements of intonation'. From the start it was made clear that intonation involved the subjective experience of listeners hearing speech, with respect to pitch sensation.

Intonation, in other words, was seen as pitch sensation in speech, and the aim of the research to be set up was to find a way of measuring it. Because pitch was regarded as a subjective phenomenon, in many respects comparable to the sensation of weight, it was necessary to find a roundabout way of establishing a technique for charting these sensations in a series of perceptual experiments. The ultimate aim was to contribute, in close cooperation with fellow workers in the laboratory who had been engaged for a number of years in the psycho-acoustic domain, towards building hypotheses about the way the human hearing mechanism operates in processing pitch information.

Those were the days in which the residue theory was still upheld and a fierce controversy was still raging over the question whether pitch was processed by a frequency-measuring device, derived from the place of maximal excitation on the basilar membrane, or by a time-measuring device beyond it, in terms of periodicity pitch.

Apart from these psycho-acoustic considerations, the memorandum contains a number of phonetic observations. One was that children seem to assimilate intonation before they get to grips with acquiring vocabulary. Another was that pitch intervals in intonation hardly ever resemble musical intervals, a finding corroborated by the notorious difficulty even musically trained people encounter in interpreting pitch movements in ordinary speech. One is at best vaguely aware of rises and falls, which, moreover, seem to occur continuously throughout an utterance, in spite of the well-acknowledged fact that in reality there are always unvoiced passages from which no pitch sensation could be derived.

At the time we were, of course, well aware that there were many ways of determining changes in fundamental voice frequency, but since in telephone speech no information about the fundamental frequency need be present to obtain pitch impressions, we were not interested in tracing F0 directly. Moreover, visual recordings of the acoustic signal would not in themselves answer the question uppermost in our minds about their contribution to the sensation of pitch.

The technique we finally devised to make pitch sensation measurable was inspired by the work we had carried out in the phonetics group on segmenting diphthongs. In an analysis-by-synthesis approach we had been able to convince ourselves that it was possible to induce the sensation of a gradual formant frequency shift by synthesizing diphthongs made up of two steady-state portions. This success emboldened us to apply a similar technique in charting the way the ear might assimilate pitch information in speech, by cutting up isolated words into a series of small steady-state pitch segments, gating out about 20-30 ms at a time. This segment duration was based on findings in the psycho-acoustic group about the time needed for making adequate pitch measurements. The technique required a listener, the experimenter, to match by ear for equality of pitch a gated-out portion of a voiced stretch of speech, contained in a tape-recorded word spoken in isolation, with a synthetic speech signal.

This matching signal was of equal duration and resembled the speech segment as closely as possible in intensity and, by means of a set of variable formant filters, in spectral quality; moreover, its fundamental frequency could be adjusted by turning a knob, and its periodicity could be read off a frequency counter. The match was to be carried out after high-pass filtering both signals, so as to make sure that the actual matching was based on residue pitch. This research proposal was never carried out in the way it was laid down, not even in its simplest form. What was more, we never managed to involve our fellow psycho-acousticians in our plans. Although we took their findings very seriously in our subsequent research, and certainly benefited from them, we were very much left to our own devices. The pitch-matching technique nevertheless enabled us to answer at least one question in our original research proposal: we had stumbled on a means, however laborious, of measuring pitch sensation perceptually. We had by no means solved the problem of how the ear deals with pitch sensations in speech, but we had obtained a technique for measuring intonation.
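For readers who want the arithmetic of the gating step in modern terms, here is a minimal sketch. It is not the procedure used at IPO, which worked with tape recordings and an experimenter matching pitch by ear; it merely assumes a digitally sampled signal and cuts it into consecutive, non-overlapping segments of 25 ms (a value picked from the 20-30 ms range mentioned above). The function name `gate_segments` and the 2 ms linear on/off ramps, added to avoid clicks at the segment edges, are hypothetical choices for illustration.

```python
def gate_segments(signal, fs, seg_ms=25.0, ramp_ms=2.0):
    """Cut a sampled signal into consecutive, non-overlapping gated segments.

    signal  -- sequence of samples (floats)
    fs      -- sampling rate in Hz
    seg_ms  -- segment duration in ms (25 ms, inside the 20-30 ms range)
    ramp_ms -- length of the linear fade-in/fade-out at each end (assumed)
    Trailing samples shorter than one full segment are discarded.
    """
    seg_len = int(round(fs * seg_ms / 1000.0))
    if seg_len <= 0:
        raise ValueError("segment duration too short for this sampling rate")
    # Keep the fade-in and fade-out from overlapping inside one segment.
    ramp_len = min(int(round(fs * ramp_ms / 1000.0)), seg_len // 2)

    segments = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = [float(x) for x in signal[start:start + seg_len]]
        for k in range(ramp_len):
            gain = (k + 0.5) / ramp_len  # rises linearly from near 0 towards 1
            seg[k] *= gain               # fade in
            seg[-1 - k] *= gain          # fade out
        segments.append(seg)
    return segments


if __name__ == "__main__":
    import math
    fs = 8000
    # One second of a 120 Hz tone standing in for a voiced stretch of speech.
    tone = [math.sin(2 * math.pi * 120 * n / fs) for n in range(fs)]
    segs = gate_segments(tone, fs)
    print(len(segs), len(segs[0]))  # 40 200
```

At 8 kHz a 25 ms gate is 200 samples, so one second of signal yields 40 segments; in the original set-up each such segment would then have been matched by ear against an adjustable synthetic signal.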


The grammar of intonation

It was through a lucky coincidence, at a meeting in Boston in 1964 with Phil Lieberman, that we abandoned, at his suggestion, the idea of confining our attention to measuring pitch in isolated words. At the time we had hoped to be able to extrapolate from such measurements to obtain a clear picture of what was at stake in real speech. The course of this broadening of scope has been reported quite fully by e.g. Collier (1972) and in subsequent publications by 't Hart and Cohen (1973) and 't Hart and Collier (1975), and need not be repeated here.

The basic elements of the overall approach, in which intonation is primarily seen as a perceptual phenomenon that should therefore be measured by perceptual means through listening experiments, are the following. The intricate course of vocal cord vibrations giving rise to F0 fluctuations can be accounted for perceptually by a highly reduced set of straight-line approximations, charting the rises and falls as perceptually relevant pitch movements. These are seen as so many discrete commands by the speaker to the vocal cord mechanism, corresponding to a strategy by which words in the utterance can be marked for pitch accents. So far the perceptual approach has circumvented linguistic considerations, and the notion of intonational grammar is nothing more than a descriptive device accounting for the possible sequences of intonation patterns, seen as building blocks that together make up the pitch contour of an utterance in the language at issue.
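The idea of replacing the detailed F0 course by a few straight lines can be illustrated with a purely geometric procedure. In the work described here the approximations were established and checked perceptually, by listening; the sketch below does not reproduce that criterion. It is a stand-in that applies the Douglas-Peucker line-simplification algorithm on a semitone (logarithmic) frequency scale, keeping a break point wherever the contour departs from a straight line by more than a tolerance. The semitone scale, the 1-semitone tolerance and the function names are assumptions of this illustration.

```python
import math

def semitones(f_hz, ref_hz=100.0):
    """Convert Hz to semitones re ref_hz (a logarithmic frequency scale)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def stylise(times, f0_hz, max_dev_st=1.0):
    """Approximate an F0 contour by straight lines in (time, semitone) space.

    Recursive split (Douglas-Peucker): keep the end points and, whenever a
    sample deviates from the straight line between them by more than
    max_dev_st semitones, insert the worst sample as a break point.
    Returns the indices of the retained break points.
    """
    st = [semitones(f) for f in f0_hz]

    def split(i, j):
        t0, t1 = times[i], times[j]
        worst, worst_dev = None, max_dev_st
        for k in range(i + 1, j):
            line = st[i] + (st[j] - st[i]) * (times[k] - t0) / (t1 - t0)
            dev = abs(st[k] - line)
            if dev > worst_dev:
                worst, worst_dev = k, dev
        if worst is None:
            return [i, j]
        return split(i, worst)[:-1] + split(worst, j)

    return split(0, len(times) - 1)
```

For a contour sampled every 10 ms that stays flat, rises by 6 semitones between 0.3 and 0.4 s, stays high until 0.7 s and falls back by 0.8 s, the retained break points are exactly the five corners plus the end point: indices 0, 30, 40, 70, 80 and 100.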

At first, work was confined to the study of Dutch intonation, but subsequent work has shown that this approach can also chart the characteristic patterns of other languages, such as American English (Maeda, 1976), French (Delgutte, 1976) and, recently, British English (de Pijper, 1980; Willems, 1982).

Declination

One rather important component of this approach has so far been left out of account: the concept of declination. It constitutes another example of the difficulty of predicting the course of research in terms of long-range planning. It started life in a stepmotherly way. In an attempt to establish an average pitch height reflecting the differences in overall pitch register among speakers, it turned out to be impossible to represent this by one fixed value for each individual speaker. Only after tilting the horizontal line in the visual recordings were we able to solve this problem, by assuming a virtual, slowly declining line from which rises could be constructed to depart and towards which falls seemed to revert. What had started out as a descriptive device to accommodate visual recordings of our pitch-matching measurements soon turned out to capture a strong perceptual reality: in subsequent listening experiments, resynthesized pitch contours from which this factor was absent yielded perceptually unacceptable intonation.

For the moment it will suffice to state that declination forms an integral part of any intonation grammar, particularly when synthetic speech has to meet the demands of naturalness. Rises and falls are seen as departing from, and tending towards, an overall gradual lowering of F0 in the acoustic signal, which has something to do with the amount of subglottal air pressure.
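To make the role of declination concrete, the following sketch generates a synthetic F0 contour in which a rise departs from a slowly declining baseline and a later fall returns to it, giving a "hat"-shaped contour. All numbers (130 Hz starting value, a declination of 2 semitones per second, a 6-semitone excursion, 100 ms pitch movements) and the function name are hypothetical illustration values, not IPO measurements.

```python
import math

def hat_contour(duration_s, rise_at_s, fall_at_s, start_hz=130.0,
                decl_st_per_s=-2.0, excursion_st=6.0, move_s=0.1, step_s=0.01):
    """Return (time_s, f0_hz) pairs for a rise-fall contour on a declination line.

    Baseline: a straight line in semitones falling at decl_st_per_s.
    A rise of excursion_st semitones starts at rise_at_s; a fall of the same
    size starts at fall_at_s and brings the contour back to the (by then
    lower) baseline. Each pitch movement lasts move_s seconds.
    """
    out = []
    n = int(round(duration_s / step_s)) + 1
    for i in range(n):
        t = i * step_s
        base = decl_st_per_s * t                               # declination line
        up = min(max((t - rise_at_s) / move_s, 0.0), 1.0)      # 0 -> 1 during the rise
        down = min(max((t - fall_at_s) / move_s, 0.0), 1.0)    # 0 -> 1 during the fall
        st = base + excursion_st * (up - down)
        out.append((t, start_hz * 2 ** (st / 12)))
    return out
```

With `hat_contour(2.0, 0.3, 1.2)` the high plateau stays 6 semitones above the declining baseline, and after the fall the contour ends on the baseline, 4 semitones below its starting value; removing the `base` term would give the flat-baseline variant that the listening experiments described above found unacceptable.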

In recent years this notion of declination has taken on a life of its own, hardly recognisable from its modest beginnings. It would not be an exaggeration to suggest that the treatment of this subject may give rise to much further research, never contemplated or foreseen by its godfathers.

In another publication we have given our views on the history and present status of declination (Cohen, Collier and 't Hart, 1982).

Applications

Above we briefly noted that the perceptual approach to intonation, originally devised to deal with Dutch material, has subsequently been applied to the intonation of other languages as well. The interest other researchers have shown was mainly inspired by their desire to find rules for specifying F0 values in systems that generate synthetic speech of acceptable quality.

This particular application was never foreseen in our original plans, nor were a number of other applications that turned out to be feasible. One was in the field of foreign language learning, realised in the course in Dutch intonation for foreign learners devised by Collier and 't Hart (1981). Another offshoot of the perceptual approach to intonation, in which the melodic characteristics of a particular language are described, combined with the theoretical postulate that these characteristic rises and falls are somehow controlled by the speaker's strategy of singling out particular words by means of pitch accents, is a comparative study of Dutch and English intonation. In this work (Willems, 1982) the aim is to arrive at a set of recipes for improving the intonation of Dutch learners of English, by charting the characteristic correspondences and differences between the intonation patterns of the two languages. The underlying idea is that speakers should be made aware of the implicit knowledge they have of the intonation patterns of their native language, Dutch. This approach assumes that a cognitive strategy can be introduced in a training programme. Once this has been achieved, it becomes feasible to familiarise learners with the intonation patterns of the foreign language, in this case British English, which clearly deviates from Dutch. The data for English were largely based on the work of de Pijper (forthcoming), who devised a first approximation towards a melodic model of British English intonation, leading to the standardisation of a limited set of intonation patterns, all of which were checked for their acceptability in a series of perceptual experiments with English subjects. Willems was able to show that reconstructions of originally Dutch intonation patterns, using a small set of recipes for turning them into standardised English patterns, resulted in vast improvements according to judgments obtained from English listeners.

Another unexpected offshoot of the original, unachieved aim of establishing a model of how the ear processes pitch information in speech is the application in the field of speech pathology, notably in the case of electrolarynx speech. On the basis of the knowledge acquired about characteristic intonation patterns in Dutch, notably the so-called hat pattern, and about the perceptual requirement of declination, a prototype electrolarynx was constructed with built-in semi-automatic F0 control. Van Geel (1982) has shown that patients who had lost the use of the larynx, and who had subsequently depended on an artificial larynx, could be trained, very much along the lines of a cognitive strategy, to become aware of the intonation patterns of their own language, Dutch. These results suggest that alaryngeal speakers stand to benefit from this particular application, since the otherwise completely monotonous, robot-like quality of electrolarynx speech can be vastly improved by an instrument capable of producing perceptually acceptable pitch fluctuations.

Conclusion

Looking back over the years, it seems only natural that a study of intonation at IPO should have started as a problem in the field of pitch perception, a field in which pioneering work had been done. Intonation, at the time, was seen only as a special feature of pitch, i.e. the pitch in human speech. It soon turned out that the psycho-acoustic world was not impressed by the efforts of a couple of phoneticians who ostensibly set out to solve the psycho-acousticians' own problem, how the ear processes pitch, on the basis of results obtained with pitch as it occurs in speech. Thus the study of pitch as such was abandoned, and rules for the intonation of specific languages were ultimately set up. The fruits of this research programme, which was supposed to be of a fundamental nature, can be seen in its unforeseen practical applications in the fields of speech synthesis, foreign language learning and pathological speech. The main reason why it produced any results at all is that the vantage point of perceptual relevance was maintained throughout and linguistic problems were bypassed. As such it was an unorthodox approach, and it would probably have scored very poorly with a panel of outside judges with their own ideas of what intonation is all about, even if the initiators could have foreseen in a flash what its ultimate results would be, which they clearly did not and could not.

References

Cohen, A., Collier, R. and Hart, J. 't (1982) Declination: construct or intrinsic feature of speech pitch? Phonetica 39, 254-273.

Collier, R. (1972) From pitch to intonation. Doct. thesis, Leuven.

Collier, R. and Hart, J. 't (1981) Cursus Nederlandse Intonatie. Acco, Leuven.

Delgutte, B. (1976) Fundamental frequency contours of French: a perceptual study. M.Sc. thesis, M.I.T., Cambridge, Mass.

Geel, R.C. van (1982) Semi-automatic pitch control for an electrolarynx. In: A. Sekey (ed.), Electro-acoustic analysis and enhancement of alaryngeal speech. C.C. Thomas, Springfield, Ill., 190-197.

Hart, J. 't and Cohen, A. (1973) Intonation by rule: a perceptual quest. J. Phonetics 1, 309-327.

Hart, J. 't and Collier, R. (1975) Integrating different levels of intonation analysis. J. Phonetics 3, 235-255.

Maeda, S. (1976) A characterization of American English intonation. Ph.D. thesis, M.I.T., Cambridge, Mass.

Pijper, J.R. de (1980) A melodical model of British English intonation. Annual Progress Report of the Institute for Perception Research (IPO) 15, 54-58.

Pijper, J.R. de (forthcoming) Towards a melodic model of British intonation. Doct. thesis.

Willems, N.J. (1982) English intonation from a Dutch point of view. Foris Publications, Dordrecht.


Optimum statistical communication models of the human senses

J.L. Goldstein, Tel-Aviv University

Comparisons between the human brain and modern information-processing machines have been based upon tempting analogies: between human goal seeking and automatic control systems (1), between human language and efficient communication in the presence of noise (2, 3), and between the human brain and stored-program computers (4). Early hopes, founded upon these analogies, that the cybernetic revolution would open a new era of symbiotic growth uniting information technology and brain science have been dashed by the independent and prodigious development of information machines that outdo many human sensory and motor skills.

Underlying recent, purely technological progress are indeed the profound mathematical principles of logic and the rules for optimum processing, identification and decision based upon stochastic signals that were promulgated by the founding fathers (1-3) of cybernetics. More decisive, however, are revolutionary inventions in electronic hardware that permit inexpensive implementation of complex algorithms. Meanwhile it has become painfully clear that it is vastly more difficult to measure, analyse and comprehend the signals, hardware and specific tasks of the human brain than it is to synthesize humanoid machines. Nevertheless, modest but significant progress continues to be made in gaining scientific knowledge of some aspects of the human brain through the systematic and patient application of the basic principles underlying modern information engineering.

In particular, optimum statistical communication, as a model for sensory psychophysics and for identifying the neural basis of the psychophysics, has proved productive in over two decades of extensive auditory research. In this approach the physical stimulus is postulated to be transformed lawfully into a set of stochastic sensory signals, which are subsequently decoded by an ideal probabilistic receiver that performs its appointed tasks of stimulus detection, discrimination and recognition optimally on the basis of the available information. This approach has provided an extensive quantification of the information transfer in various psychophysical tasks, including detection and discrimination of monaural (5) and binaural (6) stimuli and recognition of the pitch of complex tones (7). Quantification is given in terms of the stimulus-response characteristics of the sensory signals. Because the decoding receiver operating on these signals is optimal, the stochastic sensory signals compactly describe the limits of psychophysical behaviour.
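The simplest textbook instance of such an ideal receiver, in the sense of signal detection theory (5), is detection of a known signal in white Gaussian noise. The sketch below computes the ideal observer's detectability index d' and its predicted two-interval forced-choice score, and checks the prediction by simulating the receiver itself. The signal, noise level and function names are hypothetical; the example illustrates the principle only and is not a model of any auditory data cited here.

```python
import math
import random
from statistics import NormalDist

def ideal_dprime(signal, sigma):
    """d' of the ideal observer for a known signal in white Gaussian noise
    with per-sample standard deviation sigma: sqrt(signal energy) / sigma."""
    return math.sqrt(sum(s * s for s in signal)) / sigma

def predicted_2afc(dprime):
    """Proportion correct of an unbiased observer in two-interval forced
    choice: P(c) = Phi(d' / sqrt(2))."""
    return NormalDist().cdf(dprime / math.sqrt(2))

def simulate_2afc(signal, sigma, trials, seed=0):
    """Monte Carlo run of the ideal receiver: correlate each interval with
    the known signal (a sufficient statistic here) and pick the larger."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        with_signal = [s + rng.gauss(0.0, sigma) for s in signal]
        noise_only = [rng.gauss(0.0, sigma) for _ in signal]
        score_a = sum(x * s for x, s in zip(with_signal, signal))
        score_b = sum(x * s for x, s in zip(noise_only, signal))
        correct += score_a > score_b
    return correct / trials
```

For 100 samples of a sinusoid of amplitude 0.2 (energy 2.0) in unit-variance noise, d' is the square root of 2 and the predicted proportion correct is Phi(1), about 0.84; the simulated receiver agrees to within sampling error. Any real observer can only do worse, which is what makes the ideal receiver a yardstick for the information available in the sensory signals.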

Extensive systematic research on the spike discharge patterns in the auditory nerve of anesthetised laboratory mammals revealed the stochastic nature of neural responses (8) and permitted quantitative comparison between psychophysically and physiologically defined stochastic limits (9, 10). The whole auditory nerve generally supplies more information than is utilised perceptually. Good agreement between psychophysics and physiology was, however, found when the ideal probabilistic receiver is organised tonotopically (11-14). Significantly, the characteristics of auditory-nerve phase locking are strongly reflected in psychophysics. Current research on auditory-nerve responses to complex speech-like sounds and on optimum probabilistic models of monaural signal-processing mechanisms has converged on similar conclusions concerning auditory processing of tonotopically and temporally organised spike responses from the auditory nerve.
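A rate-place version of such a stochastic limit can be sketched in a few lines. The sketch assumes independent fibres with Poisson spike counts, hypothetical Gaussian tuning on a logarithmic frequency axis and made-up rate parameters; it uses the Cramér-Rao bound to give the smallest frequency-estimation error that any unbiased receiver could achieve from the spike counts alone. It deliberately ignores spike timing, so it cannot capture the phase-locking information that, according to the text, is strongly reflected in psychophysics.

```python
import math

def tuning_rate(cf_hz, f_hz, r_spont=5.0, r_max=150.0, bw_oct=0.3):
    """Hypothetical mean discharge rate (spikes/s) of a fibre with
    characteristic frequency cf_hz for a tone at f_hz: Gaussian tuning on
    a log2 frequency axis plus spontaneous activity."""
    d = math.log2(f_hz / cf_hz) / bw_oct
    return r_spont + r_max * math.exp(-0.5 * d * d)

def rate_place_limit(f_hz, cfs_hz, duration_s, df_hz=1e-3):
    """Cramer-Rao lower bound (Hz) on the standard deviation of any unbiased
    estimate of f_hz from independent Poisson spike counts.

    For a Poisson count with mean T*r(f), the Fisher information about f is
    T * r'(f)**2 / r(f); independent fibres add their information.
    The slope r'(f) is taken by a central difference.
    """
    info = 0.0
    for cf in cfs_hz:
        r = tuning_rate(cf, f_hz)
        slope = (tuning_rate(cf, f_hz + df_hz) - tuning_rate(cf, f_hz - df_hz)) / (2 * df_hz)
        info += duration_s * slope * slope / r
    return 1.0 / math.sqrt(info)
```

Two properties follow directly from the Poisson assumption: quadrupling the stimulus duration halves the bound, and doubling the number of identically tuned fibres reduces it by the square root of 2. Comparing such bounds with measured difference limens is the kind of psychophysics-physiology comparison referred to above.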

The pitch of the residue, or periodicity pitch, whose scientific study was promoted by Jan F. Schouten, founder and first director of the IPO, has proved to be a key perceptual phenomenon for developing knowledge of the auditory system. Periodicity pitch denotes the perceptual fact that the temporal period of musical sounds and voiced speech communicates musical notes and speech intonation. Periodicity pitch is manifest at the various conceptual levels of perception, from simple perceptual elements in classical frequency analysis, through synthetic perception of the notes of melodies, up to linguistic perception of phoneme and stress distinctions. It has yielded to the classical scientific strategy of reductionism, wherein the whole is understood in terms of its elements (7, 16, 17), permitting the correlation between complex sound perception and simple physiological signals from the auditory filters (10, 13). Yet it has also contributed knowledge on how holistic pattern information stored in human memory is used to interpret the elementary signals (7, 18), thereby motivating extensions of physiological models of peripheral signal processing to more complex and less understood speech sounds (14, 15). Most importantly, optimum statistical communication models have provided a general quantitative conceptual basis for developing precise models of sensory perception, thus allowing a clearer definition of our areas of ignorance as well as of knowledge, and contributing to more orderly progress in the general understanding and applications of the human senses (19, 20).
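The optimum-processor idea for the pitch of complex tones (7) can be illustrated by a much simplified maximum-likelihood fit. The sketch assumes that the central processor receives noisy estimates of the component frequencies, that the components are consecutive harmonics with unknown harmonic numbers, and that each estimate has Gaussian error with a standard deviation proportional to its frequency. These simplifications, the harmonic-number limit and the function name are choices made for this illustration; they do not reproduce the full published formulation.

```python
def estimate_f0(freqs_hz, max_harmonic=20):
    """Fit a fundamental to component frequencies assumed to be consecutive
    harmonics n, n+1, ... of an unknown f0 with unknown lowest number n.

    With Gaussian errors whose standard deviation is proportional to
    frequency, the maximum-likelihood f0 for a given n is a weighted least-
    squares fit (weights 1/f**2). The n with the smallest weighted squared
    error wins. Returns (f0_hz, n).
    """
    freqs = sorted(freqs_hz)
    w = [1.0 / (f * f) for f in freqs]          # 1/sigma**2 with sigma ∝ f
    best = None
    for n in range(1, max_harmonic + 1):
        m = [n + k for k in range(len(freqs))]  # candidate harmonic numbers
        f0 = (sum(wi * mi * fi for wi, mi, fi in zip(w, m, freqs))
              / sum(wi * mi * mi for wi, mi in zip(w, m)))
        err = sum(wi * (fi - mi * f0) ** 2 for wi, mi, fi in zip(w, m, freqs))
        if best is None or err < best[0]:
            best = (err, f0, n)
    return best[1], best[2]
```

For components at 600, 800 and 1000 Hz the fit returns 200 Hz with lowest harmonic number 3, although no energy is present at 200 Hz. Shifting all three components up by 20 Hz (620, 820 and 1020 Hz) moves the estimate to about 205 Hz, rather than to 220 Hz or back to 200 Hz, qualitatively resembling the pitch shift of inharmonic residues. The consecutive-harmonic assumption is what rules out subharmonic candidates such as 100 Hz, which would otherwise fit the harmonic 600-800-1000 Hz complex equally well.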

Thus our opinion is that, while the early exaggerated hopes of the cybernetic revolution were frustrated, at least some of the basic insights have proved valid. True, patient and detailed scientific study of the psychophysics and physiology of sensory-motor systems cannot be replaced by logical deductions from sweeping generalisations. However, the cybernetic notion that sensory stimuli are monitored by an ideal observer, constrained by various hard-won psychophysical and physiological facts, can partially fill the void in knowledge of central physiological mechanisms as well as contribute to the development of this knowledge. Moreover, although sophisticated technology won the game of building humanoid machines, the importance of detailed knowledge of human sensory-motor behaviour is heightened by the increased potential of technology to apply it. Indeed, work at the IPO in applying recent knowledge of periodicity pitch psychophysics follows this line of development (21).

Current technological developments include increasingly complex speech and image production and recognition systems. Certainly knowledge of human psychophysics is relevant when the design goal is signals recognisable by humans. Very likely too, as claimed early in the cybernetic revolution, fundamental understanding of the human ability to cope with external and internal disturbances will prove useful when the design goal is recognition by machines. In our opinion the paradigm of optimum statistical communication provides a reliable and efficient basis for identifying and quantifying relevant information-bearing signals in human communication, and it can quicken the pace of uncovering knowledge of complex stimulus perception in the speech and visual sciences. Thus, while the early promises of the cybernetic revolution were exaggerated, they were not empty. Hopefully the next decade will be blessed with the institutional support needed to sustain, and reap, the anticipated next developments in the cybernetic evolution.

References

1. Wiener, N. (1948) Cybernetics, or control and communication in the animal and the machine. Wiley, New York.

2. Shannon, C.E. and Weaver, W. (1949) The mathematical theory of communication. Univ. of Illinois Press, Urbana.

3. Shannon, C.E. (1951) Prediction and entropy of printed English. Bell Syst. Techn. J. 30, 50-64.

4. Neumann, J. von (1958) The computer and the brain. Yale Univ. Press, New Haven, Connecticut.

5. Green, D.M. and Swets, J.A. (1966) Signal detection theory and psychophysics. Wiley, New York.

6. Durlach, N.I. (1972) Binaural signal detection: equalization and cancellation theory. In J.V. Tobias (ed.), Foundations of modern auditory theory. Academic Press, New York, 369-462.

7. Goldstein, J.L. (1973) An optimum processor theory for the central formation of the pitch of complex tones. J. Acoust. Soc. Am. 54, 1496-1516.

8. Kiang, N.Y.S. (1965) Discharge patterns of single fibers in the cat's auditory nerve. M.I.T. Press, Cambridge, Massachusetts.

9. Siebert, W.M. (1968) Stimulus transformations in the peripheral auditory system. In P.A. Kolers and M. Eden (eds), Recognizing patterns. M.I.T. Press, Cambridge, Massachusetts, 104-133.

10. Siebert, W.M. (1970) Frequency discrimination in the auditory system: place or periodicity mechanisms? Proc. IEEE 58, 723-730.

11. Colburn, H.S. (1973) Theory of binaural interaction based on auditory-nerve data. I. General strategy and preliminary results on interaural discrimination. J. Acoust. Soc. Am. 54, 1458-1470.

12. Colburn, H.S. (1977) Theory of binaural interaction based on auditory-nerve data. II. Detection of tones in noise. J. Acoust. Soc. Am. 61, 525-533.

13. Goldstein, J.L. and Srulovicz, P. (1977) Auditory-nerve spike intervals as an adequate basis for aural frequency measurement. In E.F. Evans and J.P. Wilson (eds), Psychophysics and physiology of hearing. Academic Press, London, 337-345.

14. Srulovicz, P. and Goldstein, J.L. (1982) A central spectrum model: a synthesis of auditory-nerve timing and place cues in monaural communication of frequency spectrum. Submitted to J. Acoust. Soc. Am.

15. Sachs, M.B. and Young, E.D. (1980) Effects of nonlinearities on speech encoding in the auditory nerve. J. Acoust. Soc. Am. 68, 858-875.

16. Houtsma, A.J.M. and Goldstein, J.L. (1972) The central origin of the pitch of complex tones: evidence from musical interval recognition. J. Acoust. Soc. Am. 51, 520-529.

17. Goldstein, J.L., Gerson, A., Srulovicz, P. and Furst, M. (1978) Verification of the optimum probabilistic basis of aural processing in pitch of complex tones. J. Acoust. Soc. Am. 63, 486-497.

18. Gerson, A. and Goldstein, J.L. (1978) Evidence for a general template in central optimal processing of pitch of complex tones. J. Acoust. Soc. Am. 63, 498-510.

19. Siebert, W.M. (1978) Contributions of the communications sciences to physiology. Am. J. Physiol. 234 (5), R161-R166.

20. Buchsbaum, G. and Goldstein, J.L. (1979) Optimum probabilistic processing in colour vision. I and II. Proc. R. Soc. Lond. B 205, 229-266.

21. Duifhuis, H., Willems, L.F. and Sluyter, R.J. (1982) Measurement of pitch in speech: an implementation of Goldstein's theory of pitch perception. J. Acoust. Soc. Am. 71, 1568-1580.


Frequency and level effects in three-tone suppression

H. Duifhuis*, in collaboration with P. Kraft* and H.W. Zelle*

*Groningen University

Introduction

Since the pioneering work of Houtgast (e.g., 1974), psychophysical lateral suppression has become a popular topic in hearing research. In this paper we first indicate why this may continue to be the case for some time. We then present some new data on three-tone suppression, and we conclude by reflecting on the current position of the study of psychophysical suppression within the field of hearing research.

Psychophysical suppression was originally believed to be the counterpart of physiological lateral inhibition, which was fairly well established by 1970. The phenomenon is that the response to a tone (the suppressee) is reduced by the addition of a second tone (the suppressor). Later it was realised that the effect is not a true inhibition effect, but the result of a nonlinear peripheral process.

For a linear process, knowledge of the response to a single click allows the response to any arbitrary stimulus to be predicted. A nonlinear process, however, is characterised neither by its impulse response nor by its frequency characteristic. As long as there is no adequate nonlinear theory, predicted responses to new stimuli have to be checked experimentally every time. This paper presents additional data concerning the extension from a two-tone to a three-tone stimulus.
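The contrast between linear and nonlinear systems can be demonstrated directly. In the sketch below, a hypothetical four-tap filter stands in for a linear system: its response to a single click, used as an impulse response, predicts the response to an arbitrary input exactly. Placing a static compressive nonlinearity after the same filter breaks that prediction. The filter taps, the exponent and the function names are assumptions of this illustration, not a model of the cochlea.

```python
import math

# Hypothetical impulse response of a linear filter (illustration only).
H = [0.5, 0.3, -0.2, 0.1]

def convolve(x, h):
    """Discrete convolution: the output of a linear, time-invariant system
    with impulse response h to the input x."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def linear_system(x):
    return convolve(x, H)

def nonlinear_system(x, exponent=0.3):
    """The same filter followed by a static compressive nonlinearity."""
    return [math.copysign(abs(v) ** exponent, v) for v in convolve(x, H)]

def click_prediction_error(system, x, n_taps=len(H)):
    """Measure the system's response to a single click, use it as if it were
    an impulse response to predict the response to x, and return the largest
    deviation between prediction and actual output."""
    click = [1.0] + [0.0] * (n_taps - 1)
    h_est = system(click)[:n_taps]
    predicted = convolve(x, h_est)
    actual = system(x)
    return max(abs(p - a) for p, a in zip(predicted, actual))
```

For an input such as `[1.0, 0.0, -2.0, 0.5, 3.0]` the prediction error is zero for `linear_system` but exceeds 1 for `nonlinear_system`. This is why, for a nonlinear process, each new stimulus configuration, such as the step from two to three tones, has to be measured rather than predicted.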

The data were obtained at IPO from mid-1979 until the end of 1980 (see footnote 1) and at the Groningen Biophysics department from the end of 1981 until mid-1982. The central experimental issue in these studies was the question: does a second suppressor cause an increase or a decrease of suppression? The latter might occur if the second suppressor suppresses the first suppressor, thereby rendering it less effective. Note that this question displays aspects of linear thinking about nonlinear phenomena, which is dangerous.

Some tentative results of these experiments have been presented elsewhere (Duifhuis, 1980b). The issue was, however, not resolved unequivocally at that stage. At present we appear to be able to generalise the results as follows. Given a two-tone suppression case, the addition of a second suppressor may cause a summation of suppression when the effect of the second suppressor is small. At the point where both suppressors are, individually, about equally effective, the summed suppression effect becomes less than either single effect.

Method

Although the actual hardware components of the two experimental setups differed markedly, the stimuli and experimental procedures were virtually identical. In all cases we used the pulsation threshold technique in a method-of-adjustment situation. Details of the stimulus and of the procedure have been described before (Duifhuis, 1980a, 1980b). In the IPO setup, used in Experiment 1, stimuli were presented monotically (Pioneer SE-700 headphones); in the Groningen setup, used in Experiment 2, presentation was diotic (TDH-49P headphones). All levels were calibrated with the B&K type 4153 artificial ear and are expressed as sound pressure level (re 20 µPa).

1) Detailed reports (in Dutch) are available as IPO report 409 by H.W. Zelle, and as P. Kraft's master's thesis (supervised by Duifhuis) at the Biophysics department of Groningen University.

The notation used in the rest of the paper is: index 1 for the suppressee, 2 for the first suppressor, 3 for the second suppressor and P for the probe; thus f1 and L1 denote the frequency and level of the suppressee, and so on.

Experiment 1: Frequency effects

Two normal-hearing, experienced subjects (the authors HD and HZ) participated in Experiment 1. In this experiment the stimulus frequencies were fixed. We used two conditions, viz. both suppressors below f1 (HD and HZ) and both suppressors above f1 (HZ only). In the first condition we chose f1 = 2 kHz, f2 = 0.8 kHz, f3 = 0.5 kHz and L1 = 50 dB. L2 was selected to yield significant suppression in the two-tone case (80 dB for HZ and 85 dB for HD), and L3 was varied over a restricted range up to 90 dB. In the second condition we used f1 = 1 kHz, f2 = 1.3 kHz, f3 = 1.5 kHz and L1 = 40 dB; L2 was 75 dB and L3 ranged from 75 to 90 dB. The probe frequency was the independent variable. In all cases we determined pulsation threshold patterns for the individual components f1, f2 and f3, for the two-tone combinations f1+f2 and f1+f3, and for the three-tone combination f1+f2+f3.

Results for the three-tone combinations are presented in Figs 1a and 1b (condition 1) and in Fig. 2 (condition 2).

[Figure 1a (HD) and 1b (HZ): pulsation threshold (dB SPL) versus probe frequency (kHz, logarithmic axis); the levels L1, L2 and L3 are marked along the axes.]

Fig. 1. Three-tone pulsation threshold frequency patterns, for HD in 1a and HZ in 1b. Suppressor frequencies are below the suppressee frequency. Parameter values are indicated along the axes. Error bars at the top indicate the variability in the data points. The dashed line without data points is a reference line constructed by energy summation of the patterns for the individual tones. The error bars at the bottom indicate the variability in this sum pattern.

Besides the data for the three-tone condition, a dashed line without plotted data points indicates the energy sum of the patterns for the individual components. The distances from the data points to this line at f2 and f1 indicate the amounts of suppression. In Fig. 1 we observe significant suppression at f1 and moderate but systematic suppression at f2, but no clear-cut effect of L3 on the thresholds at f1. In Fig. 2 the conspicuous difference is the vast amount of suppression at f2 (approx. 30 dB). We also note significant traces of combination tones in Fig. 1a (0.2 and 0.3 kHz) and in Fig. 2 (1.1 kHz).
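The reference line amounts to adding the individual-tone threshold patterns in the power domain at each probe frequency, and suppression is read off as the distance by which a measured threshold falls below that line. A minimal sketch of that arithmetic follows; the function names and the example numbers are hypothetical and are not data from Figs 1 and 2.

```python
import math

def energy_sum_db(levels_db):
    """Power (energy) sum of levels in dB: 10*log10(sum of 10**(L/10))."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def suppression_db(component_thresholds_db, measured_db):
    """Suppression at one probe frequency: how far the measured multi-tone
    pulsation threshold lies below the energy-sum reference."""
    return energy_sum_db(component_thresholds_db) - measured_db
```

Two equal individual-tone thresholds of 60 dB sum to about 63 dB, so a measured multi-tone threshold of 53 dB at that probe frequency would correspond to roughly 10 dB of suppression. Because the sum is taken in the power domain, a component 20 dB below another adds less than 0.05 dB to the reference.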

A problem we face when trying to measure suppression is that there is only a limited level range over which a tone can act as a suppressor (cf. Duifhuis, 1980a). Beyond that level it becomes excitatory, thereby reducing suppression effects. In addition, in a multi-tone complex there are potentially confounding suppression and excitation effects caused by combination tones, in particular in condition 2. In condition 1 the combination tones fall outside the range of direct interest.
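Which combination tones can occur follows from simple arithmetic on the primary frequencies. The sketch below enumerates candidate components |c1·f1 + c2·f2 + c3·f3| with small integer coefficients, limited to orders 2 and 3 (an arbitrary cut-off chosen for this illustration). Whether a candidate is actually audible, and at what level, is an empirical matter that the enumeration cannot decide.

```python
from itertools import product

def combination_tones(freqs_khz, max_order=3, fmin=0.05):
    """Map each candidate combination-tone frequency (kHz, rounded to 1 Hz)
    to the coefficient tuples producing it. Order = sum of |coefficients|;
    only orders 2..max_order are kept, so the primaries (order 1) are excluded."""
    out = {}
    coeffs = range(-max_order, max_order + 1)
    for c in product(coeffs, repeat=len(freqs_khz)):
        order = sum(abs(ci) for ci in c)
        if not 2 <= order <= max_order:
            continue
        freq = abs(sum(ci * fi for ci, fi in zip(c, freqs_khz)))
        if freq >= fmin:
            out.setdefault(round(freq, 3), []).append(c)
    return out
```

For condition 1 (f1 = 2, f2 = 0.8, f3 = 0.5 kHz) the candidates include 0.3 kHz = f2 - f3 and 0.2 kHz = 2f3 - f2; for condition 2 (f1 = 1, f2 = 1.3, f3 = 1.5 kHz) they include 1.1 kHz = 2f2 - f3. This is consistent with the frequencies at which traces of combination tones were noted, and with condition 2, where 1.1 kHz lies close to f1, being the more vulnerable one.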

[Figure 2 (HZ): pulsation threshold (dB SPL) versus probe frequency (kHz, about 0.7 to 2.0 kHz); the levels L1, L2 and L3 are marked along the vertical axis.]

Fig. 2. As Fig. 1, with suppressor frequencies above the suppressee frequency, for HZ.

The error bars at the top of the figures denote the estimated standard deviations of the data points at those frequencies; the error bars at the bottom give the estimated standard deviation of the computed energy sum.

Experiment 2: Level effects

Since Experiment 1 did not provide an unequivocal answer to the question of additivity of suppression, and since sampling of the stimulus spaces in both frequency and time had to be deemed impractical, we concentrated on measuring level effects at fixed frequencies.

The subjects in this experiment were the normal-hearing students AA, PK (the second author) and PKo. Three signal frequencies were used, viz. 1.0, 1.1 and 1.24 kHz, and each functioned in turn as suppressee, first suppressor and second suppressor. L1 was always 50 dB SPL. The results for the six three-tone conditions are presented for PK in Figs 3a to 3f, together with the two-tone reference data. The data for PKo differed only in details. L2 is the parameter in the presented data.

Starting at low L3, we see that the three-tone curves run parallel to the two-tone curve. As long as this is true, the amounts of suppression caused by the two suppressors add up. At some point the two-tone and three-tone curves begin to converge, and ultimately they cross. Thereafter, the suppression in the three-tone case is less than in the two-tone case. The points where convergence starts and where intersection occurs depend on the frequency and level conditions. Addition of suppression is most pronounced in panel d for L2 = 50 dB. In general, however, it disappears soon after the amount of suppression caused by L3 becomes of the same order of magnitude as the suppression caused by the first suppressor (the latter magnitude equals the decrease at the lowest L3).

[Figure 3, panels a-f: pulsation threshold (dB SPL) versus second-suppressor level L3 (dB SPL, 20 to 90 dB), with L2 (dB SPL) as parameter; the assignment of 1, 1.1 and 1.24 kHz to tones 1, 2 and 3 differs per panel.]

Fig. 3. Two- and three-tone suppression level data for PK. Three tones, at 1, 1.1 and 1.24 kHz, took turns as suppressee and suppressors, as indicated at the top. The 'no'-L2 curves represent two-tone suppression data. The level L2 is given to the left of each three-tone curve (1 is the suppressee, 2 the first suppressor, 3 the second suppressor). The stimulus conditions are shown separately at the top.

Discussion

The two-tone suppression data contained in the current set are consistent with published data (e.g., Houtgast, 1974; Shannon, 1976; Duifhuis, 1980a). It is interesting to note that the effect of suppression is not only present at f1, but acts at all frequencies that show threshold elevation by the suppressee, and that this entire pattern is virtually obliterated (Figs 1, 2). There are strong experimental limitations on exploring three-tone interactions. When the suppressors are higher in frequency than the suppressee, the results are confounded by combination tones. This problem is avoided by having the suppressors below f1, but then very high suppressor levels are needed to demonstrate significant suppression. In Experiment 2 some further optimisation was achieved by increasing L1 and reducing the frequency distance (cf. Duifhuis, 1980a, Fig. 14). This made it possible to determine a fairly complete three-tone suppression level pattern. The clear conclusion is that, in general, there is only a limited range over which suppression is additive (on logarithmic coordinates, i.e. in dB). Within this range, suppression might be considered the result of two independent attenuating factors (cf. Sachs and Abbas, 1976; Javel et al., 1978). Additivity occurs when the two suppressors produce a limited amount of suppression. At higher suppressor levels, which for individual suppressors produce more suppression, addition breaks down. At sufficiently high levels the two suppressors combined produce less suppression than either separately. Phenomenologically this might be called 'release of suppression' due to suppression of the first suppressor. However, such a quasi-linear interpretation is almost certainly false. If, on the basis of relative levels, none of the components can be neglected with respect to the other two, we are dealing with a nonlinear three-tone interaction, which cannot be predicted on the basis of two-tone results. The current data provide new boundary conditions for the appropriate nonlinear analysis.
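What "additive on logarithmic coordinates" means can be made concrete with a minimal sketch. It assumes, as the independent-attenuator interpretation does, that each suppressor multiplies the suppressee's effective amplitude by its own gain factor below 1; the gain values and helper names are hypothetical illustrations, not measured data or the authors' analysis.

```python
import math


def suppression_db(gain: float) -> float:
    """Suppression in dB caused by a linear amplitude gain 0 < gain <= 1."""
    return -20.0 * math.log10(gain)


def additive_prediction_db(s2_db: float, s3_db: float) -> float:
    """Independent-attenuator prediction: suppressions in dB simply add."""
    return s2_db + s3_db


def additivity_deviation_db(s23_db: float, s2_db: float, s3_db: float) -> float:
    """Three-tone suppression minus the additive prediction.

    Zero means additive suppression; negative values mean less suppression
    than predicted, as where the three-tone curves converge on the two-tone curve.
    """
    return s23_db - additive_prediction_db(s2_db, s3_db)


# Hypothetical amplitude gains of the first and second suppressor.
g2, g3 = 0.5, 0.25
s2 = suppression_db(g2)
s3 = suppression_db(g3)
# Two independent attenuators in cascade multiply their gains.
s23 = suppression_db(g2 * g3)
print(round(s2, 2), round(s3, 2), round(s23, 2))  # 6.02 12.04 18.06
```

For the cascaded attenuators used here the deviation is zero by construction; with a measured three-tone value in place of `s23`, a negative deviation would quantify how far a given condition departs from additivity.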

It is of interest to note that, despite the quantitative asymmetry between high-frequency and low-frequency two-tone suppression that is generally found, the results for all conditions displayed in Fig. 3 are qualitatively similar, and the above description of the data is largely independent of the (relative) locations of f2 and f3.

The range of suppressor levels over which two-tone suppression occurs is bounded on the high-level side by the fact that, at a sufficiently high level, the suppressor itself becomes excitatory. The 'release of suppression' in the three-tone case occurs at a lower level. As an alternative, simplistic interpretation, this might imply that the combined excitatory effect of the two suppressors occurs at a lower level than the individual effects. In retrospect it is clear that, without sufficient exploration of the level domain, no unequivocal answer could be obtained to the question of additivity of suppression. There is marked variability in suppression data between listeners, so that equal physical parameter values produce different amounts of suppression. The results of Experiment 2 now show a uniform trend: additivity breaks down as suppressor levels increase.

Finally, it is quite clear that our results on psychophysical three-tone suppression are distinctly at variance with the conclusion of a recent report on physiological three-tone suppression in the cat's anteroventral cochlear nucleus (Javel et al., 1982), which reports additivity to be the general result. It is very intriguing that this property describes only a small part of our data.

Perspective

The issue of auditory nonlinearity has been one of the main foci of interest in the study of hearing in the last decade. Although psychophysical knowledge about nonlinearity dates back several centuries, the currently emerging interpretation is largely the result of combined experimental and theoretical efforts over the last 15 years. Evidence, although still mostly indirect, has gradually been accumulating in support of the notion that the nonlinearity originates in the mechano-electric transduction process, i.e. in the hair cell.

The historical observations of nonlinearity concern combination tones, the low-frequency intermodulation products generated by any nonlinear system that is stimulated with a multitone input. Reasoning from psychophysical data, theorists deduced that the nonlinearity was of a saturating type (Schroeder, 1969; Smoorenburg, 1972), that it is decomposable into an odd-order and an even-order part which might behave independently (e.g. Duifhuis, 1976), that certain aspects are described by a lumped system of filter, nonlinearity, and second filter (both filters band-pass) (Goldstein, 1967), and that the combination tones propagate within the cochlea very much like stimulus components (Goldstein and Kiang, 1968; Hall, 1974). After these findings had been firmly established, they were confirmed by physiological data from the auditory periphery, albeit sometimes after several years of dispute. A key problem of the physiological experiments is the extreme vulnerability of the peripheral nonlinearity (e.g. Khanna and Leonard, 1982).

It is interesting to note that the lateral suppression story developed quite differently. Two-tone suppression (inhibition) was first firmly established in auditory nerve data, and it was not until the early seventies that Houtgast (e.g. 1974) came up with the experimental techniques for measuring psychophysical counterparts of suppression. Although several psychophysicists dispute the validity of forward masking as well as of the pulsation threshold technique for identifying lateral suppression, there can be little doubt that the issues raised are marginal, however ingenious the arguments. The simple logic is that the periphery is nonlinear; that the functional scheme that described the combination tone behaviour also predicts lateral suppression (Pfeiffer, 1970; Duifhuis, 1976); and that, more generally, suppression and distortion products are two faces of the same coin. There is a non-negligible complication in that the faces are observed from quite different angles, but that does not preclude their identification. The central auditory system is provided with information affected by suppression. The interpretation of psychophysical suppression as a central 'unmasking' effect is not parsimonious and is thus unattractive. This logic applies to most trends in the psychophysical data reported so far. The problem is, of course, to relate the psychophysical and physiological effects quantitatively in all details. As pointed out in the experimental study presented above, much work remains to be done here. The ultimate tests are impossible without an adequate theory of the nonlinear cochlea and an appropriate description of subjective decision criteria and rules. But let us re-emphasise the significance of the results from combination tone experiments, which deal with the same problems, and put the above formal problems in a reasonable perspective.
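The claim that suppression and distortion products are two faces of the same coin can be illustrated with a single memoryless saturating nonlinearity. The sketch below is a deliberate simplification under stated assumptions, not the BPNL model itself: both band-pass filters are omitted, `tanh` stands in for the saturating characteristic, and the amplitudes, sampling rate and helper names are hypothetical. It measures the output component at f1 (the suppressee) and at the cubic difference tone 2f1-f2 while the amplitude of a second tone at f2 is raised.

```python
import math

FS = 40000           # sampling rate (Hz); high enough that aliasing is negligible
N = 4000             # 0.1 s window, so every frequency used below is an exact DFT bin
F1, F2 = 1000, 1240  # suppressee and suppressor frequencies (Hz)


def component_amplitude(y, freq, fs=FS):
    """Amplitude of the sinusoidal component of y at freq (single-bin DFT)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / fs) for n, v in enumerate(y))
    im = sum(v * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(y))
    return 2.0 * math.hypot(re, im) / len(y)


def two_tone_response(a1, a2):
    """Pass a1*sin(2*pi*F1*t) + a2*sin(2*pi*F2*t) through tanh.

    Returns the output amplitudes at F1 and at the combination tone 2*F1 - F2.
    """
    y = [math.tanh(a1 * math.sin(2 * math.pi * F1 * n / FS)
                   + a2 * math.sin(2 * math.pi * F2 * n / FS))
         for n in range(N)]
    return component_amplitude(y, F1), component_amplitude(y, 2 * F1 - F2)


if __name__ == "__main__":
    a1 = 0.5  # fixed suppressee amplitude (arbitrary units)
    for a2 in (0.0, 0.5, 1.0, 2.0, 4.0):
        at_f1, at_cdt = two_tone_response(a1, a2)
        print(f"a2={a2:3.1f}  f1: {at_f1:.3f}  2f1-f2: {at_cdt:.3f}")
```

Raising `a2` lowers the f1 output (suppression) while creating a 2f1-f2 component (a combination tone), both produced by the same nonlinearity. A fuller BPNL sketch would filter the input before the nonlinearity and the output after it; those stages are left out here on purpose.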

There should be no misunderstanding on the question of whether the lumped filter-nonlinearity-second filter model (the BPNL model) provides the required adequate theory: it does not. The real problem is mathematically very complicated, and the tools to solve it are still being developed. The theory should take into account the significant mechanical properties of the microstructure of the organ of Corti, including the mechanical load, probably nonlinear and sometimes active, that the hair cells provide. Although the first steps in this direction have been taken, the lack of precise data prevents their proper evaluation. In this situation a black-box model like the BPNL model has its (limited) use.

The relevance of auditory nonlinearity needs further exploration. In harmonic signals in particular, and in broad-band signals in general, the generation of intermodulation products is fairly irrelevant, since they will mostly be masked by primary signal components. Lateral suppression might be thought to produce sharpened auditory excitation patterns, but the evidence for this is extremely meagre. The effect has to compete with the saturating effect of the nonlinearity; thus, in response patterns the effect is minimal. However, in iso-response measurements, which exploit the saturating nonlinearity in the opposite direction as it were, the effects of suppression are obvious. It might nevertheless be that the main role of the saturating nonlinearity is to help the ear cover its enormous dynamic range, which, even though the idea of overlapping ranges for different auditory nerve fibres is again becoming accepted, is still spectacular and hardly interpretable in terms of linear mechanics.

References

Duifhuis, H. (1976) Cochlear nonlinearity and second filter: possible mechanism and implications. J. Acoust. Soc. Am. 59, 408-423.

Duifhuis, H. (1980a) Level effects in psychophysical two-tone suppression. J. Acoust. Soc. Am. 67, 914-927.

Duifhuis, H. (1980b) Psychophysical three-tone suppression. In: Psychophysical, Physiological and Behavioural Studies in Hearing, Proc. 5th Int. Symp. on Hearing, G. v.d. Brink and F.A. Bilsen eds (Delft University Press), 253-256.

Goldstein, J.L. (1967) Auditory nonlinearity. J. Acoust. Soc. Am. 41, 676-689.

Goldstein, J.L. and Kiang, N.Y.S. (1968) Neural correlates of the aural combination tone 2f1-f2. Proc. IEEE 56, 981-992.

Hall, J.L. (1974) Two-tone distortion products in a nonlinear model of the basilar membrane. J. Acoust. Soc. Am. 56, 1818-1828.

Houtgast, T. (1974) Lateral suppression in hearing. Ph.D. thesis, VU Amsterdam.

Javel, E., Geisler, C.D. and Ravindran, A. (1978) Two-tone suppression in auditory nerve of the cat: Rate-intensity and temporal analyses. J. Acoust. Soc. Am. 63, 1093-1104.

Javel, E., McGee, J., Farley, G.R., Gorga, M.P. and Walsh, E.J. (1982) Three-tone suppression. J. Acoust. Soc. Am. 71, S18 (A).

Khanna, S.M. and Leonard, D.G.B. (1982) Basilar membrane tuning in the cat cochlea. Science 215, 305-306.

Pfeiffer, R.R. (1970) A model for two-tone inhibition in single cochlear nerve fibers. J. Acoust. Soc. Am. 48, 1373-1378.

Sachs, M.B. and Abbas, P.J. (1976) Phenomenological model for two-tone suppression. J. Acoust. Soc. Am. 60, 1157-1163.

Schroeder, M.R. (1969) Relation between critical bands in hearing and the phase characteristics of cubic difference tones. J. Acoust. Soc. Am. 46, 1488-1492.

Shannon, R.V. (1976) Two-tone unmasking and suppression in a forward masking situation. J. Acoust. Soc. Am. 59, 1460-1470.

Smoorenburg, G.F. (1972) Combination tones and their origin. J. Acoust. Soc. Am. 52, 615-632.


Science policy: three recent idols, and a goddess

W.J.M. Levelt

Max-Planck-Institute for Psycholinguistics, Nijmegen

Science and science policy

Human perceptual systems accomplish feats of veridicality. To be sure, there are occasional illusions and misperceptions, carefully registered and analysed in research institutes such as IPO, but the astonishing fact is that, under a wide range of conditions, our senses yield truthful representations of objects and events in an ever-changing environment. There is immediate categorisation of what is fixed, constant, permanent, essential, and this knowledge provides veridical guidance for often equally immediate decisions and actions. This, together with language, is one of evolution's most precious gifts to mankind.

But evolution has been less generous to the mind's eye, in particular as regards our ability to discern scientific truths immediately in the ever-changing flood of potentially relevant data. In fact, that ability is nonexistent. Approaching truth in science is a slow and highly unpredictable process, one which is beset by illusions and misconceptions. Only time is a filter comparable to the senses: After the fads, the rhetoric, and the personal and public interests have died away, it becomes increasingly clear what progress has been made in terms of lasting contributions to science. Twenty-five years is a relatively narrow bandwidth for this filter.

Still, when it is applied to IPO, one detects the contours of some major theoretical developments which are generally viewed as 'classics' (some of these highlights are discussed in '25 jaar IPO' ['25 years of IPO'], Eindhoven: IPO, 1982). The disquieting (for some) but instructive aspect of this is that none of these developments could have been predicted 25 years ago. The mind's eye is completely blind to future knowledge. If one did, moreover, try to trace back how a particular scientific insight came into existence, one would find a bewildering gamut of idiosyncratic accidents, none of which in itself could have been known in advance to play a crucial role in the process.

This makes research management a difficult or, depending on one's perspective, an easy job: One cannot do much about steering the process. There are no tested and proven strategies. No planning, however intelligent, can guarantee success.

This state of affairs, though nothing new in science, is becoming increasingly disquieting to science policy makers. Public pressure is building up to 'exert control' over the advancement of knowledge, to make the process less erratic and more predictable. The worst and most ridiculous version of such 'control' is to assign scientists the task of proving a pre-established scientific 'truth', such as a racist, Marxist, or feminist ideology. Leaving such excesses out of consideration, however, one can also detect more subtle ways in which governments and funding agencies try to influence the course and the degree of success of scientific discovery. There can be no doubt that some of these efforts are made with the best of intentions: The promotion of science is still widely recognised as a maxim of our culture. Still, the form such promotion takes is at times impractical, at other times based on false assumptions. Some of these assumptions acquire the status of idols, worshipped equally by science policy makers, administrators and scientists. But at least the latter should speak the truth, even when this leads to disharmony with public opinion or jeopardises the flow of funds. Moreover, it may well be that the worship of idols will in the long run boomerang and harm science itself. In the following I want to mention three idols whose worship I would not recommend to scientists.

Three idols

The research proposal idol

Only research which is cast in a neat research proposal, outlining theory, methods, expected findings, and timing, can be expected to yield significant results. There is no doubt that the writing of research proposals can perform useful functions in the promotion of science. It forces the scientist to relate his or her ideas to whatever is around in the literature, and it gives the indispensable scientific forum a chance to interfere even during the conception of a research project. Also, it makes the scientist 'funds-conscious': In the best case, he or she will consider whether the expected scientific gain is reasonably related to the financial requirements of the project. All this I grant; still, the general claim is patently false. Its falsity follows from the above-mentioned in-principle unpredictability of future knowledge. If one always required a scientist to predict his scientific results, or to predict the direction of these results, or even only to outline the problems, and if one at the same time required him to stay within the limits of these predictions or outlines, this would be a death-blow to scientific progress. A good research project will, as a rule, yield unexpected results, and progress in science is best served by allowing the scientist to follow these leads, i.e. to define a new problem and to steer in a different direction. Happily enough, funding agencies are often aware of this and do not bother too much about mismatches between the proposal and the actual work carried out. But then one wonders whether the present research proposal cult, which is growing out of all bounds, is not really a liturgy rendering homage to an idol, serving only the public illusion that scientific progress can be 'controlled'.

The interdisciplinarity idol

Interdisciplinary research is better than monodisciplinary research. Dissatisfaction with scientific progress within certain disciplines, and a general dislike of an 'ivory tower mentality', may be at the root of strong public pressure towards interdisciplinary research. If a scientist is only put into a situation where he is forced to consider problems, theories, and methods other than the traditional ones of his own field, new vistas of scientific progress will automatically emerge. To be sure, the recent history of science has witnessed the growth of highly successful interdisciplinary fields, such as biochemistry, biophysics, and psycholinguistics. But in my view, this has nothing to do with interdisciplinarity per se. The viability of an interdisciplinary field depends on whether or not it cuts nature at its joints, i.e. whether the systems and processes studied are sufficiently autonomous and specific to warrant research in their own right. Interdisciplinary research may equally well lead away from such 'islands of nature' as towards them.

There is an additional confusion here, which should be carefully distinguished. It is popular these days to say that research should be 'problem-oriented' (another idol), and that 'problems' usually defy traditional boundaries between disciplines. The comprehensive study of traffic problems, minority problems, or rehabilitation problems, for instance, obviously requires expertise from different disciplines simultaneously. Therefore, 'problem-oriented' research requires interdisciplinarity. Though this is obviously true, the starting-point is less convincing, i.e. that real science should be 'problem-oriented' in the suggested sense. I will return to this issue when discussing the next and last idol. Here it suffices to say that striving for interdisciplinarity as such amounts to trying to exert control over science by sheer magic.

The relevance idol

The promotion of science is best served by giving priority to the study of urgent problems in our society (there then follows an unbiased listing of these problems). Though not dead, this idol has lost much of its revolutionary appeal over the last ten or fifteen years; its falseness is too apparent. I will not repeat the arguments here, but rather consider some offspring of the idol which are still alive and kicking, and which find considerable support among scientists themselves. The keywords are 'problem-oriented' and 'applied' science. As for 'problem-oriented' research, its meaning depends on what is taken to be a 'problem'. Usually, it takes the form of 'an urgent problem in our society', which brings us back to the relevance idol. At the other extreme, a 'problem' can be anything arising from science itself, such as the chemical structure of DNA, universal properties of syntax, or the recognition of words. In that case, 'problem-oriented science' is just a faddish way of saying 'science'. Between these two extremes lies a third use of the term: A problem is any issue external to science which draws attention. This may or may not be a 'socially relevant' issue; it may be a practical issue, an aesthetic one, etc. 'Problem-oriented research' is, then, the scientific analysis of such an issue. This sense of the term is, as far as I can see, indistinguishable from what is usually called 'applied research'. So let us limit the discussion to the question of whether applied research should be a privileged way to promote science. That there is a general move these days away from basic and towards applied research is a given, and I have always taken this as the unhappy result of funds drying up and scientists wanting to stay alive. This is regrettable, but not insincere. What disturbs me is to hear scientists proclaim that applied research is so exceptionally good for science. The Dutch Psychonomic Society, for example, is organising a conference on metatheoretical aspects of psychonomic research, a laudable initiative. However, a major part of this conference is dedicated to applied research. Why? Is one really presupposing that applied research has some intrinsic theoretical role to play in the scientific analysis of mental processes? This would be utterly off the mark. It is sheer luck when an applied problem reveals the existence of a hitherto unknown principle of mental (or, for that matter, of biological or physical) organisation. Almost any readily observable phenomenon or problem is the resultant of complex interactions. This fact does not preclude its scientific analysis, but such analysis is not the most straightforward way to discover the laws of nature. The latter requires abstraction from interactions, irrelevant variables, and the like. Louis Pasteur must have had this in mind when he remarked that there are no applied sciences, only applications of science. It is true that applied research can sensitise a scientist's mind to potentially important variables. But here applied research is on a par with occasional observations, talking to colleagues, having dreams, reading books, etc. They are among the ingredients from which the highly associative creative process in science draws; but there is no special status here for applied research. It would be untruthful to proclaim this, and it will in the end hinder the advancement of science if politicians capitalise on such proclamations.

Just a last point here: I am a staunch supporter of applied research. There is a

host of problems in our complex society which cannot be solved without applying the

best of our scientific tools and methods. This should be done, and it should be

done well. But one should not confuse it with science.

A goddess: the freedom of science

Science is free in the sense that it is disinterested. It approaches truth whatever the consequences. An unpopular, dangerous, or socially irrelevant truth is just as valuable in science as a popular, useful, or relevant truth. The only thing that counts is the internal dynamics of the inquiry. Ultimately, this freedom has to be realised in the individual scientist's mind. This is not a luxury, but rather a responsibility, one which is becoming increasingly hard to live up to under growing public pressure and with a falling economic tide. Freedom of science is, like democracy, not a self-evident permanent characteristic of our society. It is vulnerable, and it needs continuous defence both within the scientific community itself and before the general public.

At the same time, the disinterestedness of the inquiry can in no way serve as an excuse for the scientist to refrain from signalling potential abuses of his results. In fact, public arguments against the freedom of science have often addressed scientists' neglect of this duty. But one should not throw out the baby with the bathwater.

How can science policy promote the advancement of knowledge? There is, first, the domain of applied research. Governments can promote the study of urgent societal problems; they can define desired results, technical developments and the like. Second, science policy can consist of setting priorities for fundamental research, stimulating one discipline or subdiscipline rather more than others. The heart of the matter here, however, is to create maximum freedom for whatever science is to be promoted. Every move to exert control over the course of the inquiry itself is doomed to be counterproductive.

An especially effective way of realising these boundary conditions for fundamental research is the establishment of research institutes with long-term funding and independent internal definition of the research programme. This is, to a good approximation, the structure of IPO, and it has been put to good use.


Toward a feature-based model of speech perception

K.N. Stevens

Massachusetts Institute of Technology, Cambridge, MA

One of the principal reasons why it is difficult to formulate a satisfactory model of speech perception is that the acoustic representation of particular phonetic segments exhibits a large amount of variability. The acoustic manifestation of a phonetic segment is often strongly dependent on the context in which the segment occurs. Thus a speech perception model that attempts to identify strings of phonetic segments directly from acoustic parameters must take this context dependence into account if it is to simulate the process by which a listener identifies the words in an utterance. Various theories have attempted to model the speech perception process in spite of the variability in phonetic segments, by resorting to a perception strategy that involves either inferring certain speech production parameters from the speech signal or directly identifying larger units such as syllables or words.

For the most part, however, these theories do not incorporate the special characteristics of the auditory system. It is known from psychophysical and electrophysiological studies that the auditory system gives a distinctive response when certain parameters that describe the sound lie within specific ranges. That is, manipulation of these acoustic parameters in equal steps along some physical scale produces a discontinuous or categorical response on the part of the listener. Examples of acoustic parameters that give rise to categorical responses include the spacing between spectral peaks for vowels and certain consonants, the relative onset times of two components of a sound, the abruptness of an onset, and the degree of prominence of a spectral peak. Languages seem to exploit these special characteristics of the auditory system in selecting an inventory of sounds or of sound properties that are used to distinguish between words. The concept of natural classes of sounds and of distinctive features seems to be based in part on these attributes of the auditory system. There seems to be a rather limited inventory of about 10 to 20 of these properties that can operate in various combinations to distinguish between the words of a language.

Although much work remains to be done to define this inventory of distinctive features and their acoustic correlates, data from acoustic and perceptual studies suggest that when a distinction between words is based on a particular feature, an invariant acoustic property is present in the sound. That is, the acoustic property corresponding to a given feature is independent of the context in which the feature occurs. To be sure, there are redundancies in the feature specification of an utterance, so that usually more than one feature is operating to distinguish between words that are minimally different. A speaker is free, then, to modify or delete some features as long as the distinction continues to be carried by other features. This kind of redundancy is a principal source of the variability of phonetic segments. The study of redundant features and the principles governing their occurrence is a major topic for future research. We need to know which features require help from redundant features in order to enhance a distinction between words, and we need to know the rules governing the modification or deletion of features that are redundant in running speech.

A speech perception model based on features postulates that words are represented in the lexicon of the speaker-listener in terms of patterns of features. If the features can be identified by detecting the appropriate properties in the sound, the word can be identified directly from this pattern of features. Such a model does not require that phonetic segments or phonemes be identified as a step towards identification of words.
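The idea that a word can be identified directly from a pattern of detected features, without first identifying phonemes, and that redundancy lets a distinction survive when a speaker modifies or deletes one feature, can be sketched with a toy lexicon. Everything here is a hypothetical illustration rather than the model described above: the two-word lexicon, the feature labels (`voiced` and `spread_glottis`, standing in for the English /b/-/p/ contrast and its redundant aspiration cue) and the match-count scoring are all assumptions.

```python
from typing import Dict, Optional

# Hypothetical lexicon: each word is stored as a pattern of feature values.
# Only features of the initial consonant are listed; "labial" is shared.
LEXICON: Dict[str, Dict[str, str]] = {
    "bat": {"labial": "+", "voiced": "+", "spread_glottis": "-"},
    "pat": {"labial": "+", "voiced": "-", "spread_glottis": "+"},
}


def identify_word(detected: Dict[str, str]) -> Optional[str]:
    """Return the word whose feature pattern best fits the detected features.

    Features missing from `detected` (not found in the signal, or deleted by
    the speaker) are ignored; each agreement scores +1 and each conflict -1.
    Returns None when the best score is tied, i.e. the detected features do
    not distinguish the words.
    """
    scores = {}
    for word, pattern in LEXICON.items():
        score = 0
        for feature, value in detected.items():
            if feature in pattern:
                score += 1 if pattern[feature] == value else -1
        scores[word] = score
    best = max(scores.values())
    winners = [w for w, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


print(identify_word({"labial": "+", "voiced": "-", "spread_glottis": "+"}))  # pat
print(identify_word({"labial": "+", "spread_glottis": "+"}))  # pat (voicing cue missing)
print(identify_word({"labial": "+"}))  # None (shared feature alone is not enough)
```

The second call shows the redundancy argument in miniature: with the voicing cue absent, the redundant aspiration feature still carries the distinction, whereas the shared feature alone leaves the two words tied.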

Any theory of speech perception must incorporate knowledge of auditory processing of sound, acoustic properties of speech, and the linguistic principles on which utterances are organised. The Institute for Perception Research is unique in that it has attracted innovative and productive researchers whose interests span all of these fields. The result has been a series of papers and books that are widely read and have significantly advanced our understanding of the speech process. We congratulate the Institute for Perception Research on the completion of its twenty-fifth year, and we wish it continued success in the next quarter century.


Auditory Perception and Speech


Developments

S.G. Nooteboom

Audition in normal and hearing impaired listeners

Dr A.J.M. Houtsma, formerly at M.I.T., joined our group this September. His research will mainly be in the field of theoretical and experimental studies of tonal perception. On December 1st Drs B.L. Cardozo retired from IPO, and our efforts in the domain of 'annoyance due to sound' will, for the time being, not be continued.

An investigation of some problems relating to dynamic compression of musical sound,

started by Cardozo and Van Lieshout, will be continued by Houtsma and Van Lieshout.

The four-year project on cochlear adaptation in auditory masking, carried out by A. Bezemer, was concluded early this year. A doctoral thesis is in preparation.

Analysis and resynthesis of speech

A doctoral thesis by Vogten will shortly be completed; it documents our efforts of the last few years in the analysis and resynthesis of speech and in the perceptual evaluation of resynthesized speech. In the meantime, our attempts to improve on LPC-to-formants analysis-resynthesis are being continued, with special emphasis on low bit-rate speech (Vogten and Willems, in collaboration with Zelle, who has been at Philips Elcoma since August 1st).

Dr C.J. Darwin of the Psychology Department of the University of Sussex joined us as a guest researcher from April 1st to November 1st. He has attempted to improve on the analysis and resynthesis of speech by applying a non-binary distinction between the periodic and noisy sound sources (Darwin, this issue).

Diphone concatenation and intonation

This year a four-year project was started to explore the possibilities and limitations of synthesizing Dutch speech from diphones, based on our LPC-to-formants analysis-resynthesis system. The initial attempts in this direction are very encouraging (Elsendoorn and 't Hart, this issue).

A computer implementation has been made of the rules for generating Dutch pitch contours ('t Hart, Zelle). This algorithm can be used both as a research tool and as part of the diphone synthesis system. Development of a set of rules for generating British-English pitch contours is being continued (N. Willems).

Perception, recognition and comprehension of speech

Our efforts to illuminate the role of pitch in the perceptual separation of simultaneous sounds have been continued with an experiment on the human capacity to identify two completely simultaneous periodic vowel sounds, for a variety of differences in f0 between the two vowels (Scheffers, this issue). In the same study, an attempt is made to model this human capacity in the form of a computer algorithm, building on the Simulation of Auditory Analysis of Pitch developed earlier (Scheffers).

The simulation of aspects of human recognition of spoken words has been continued, with special emphasis on the acoustic front end of the system and on the possibility of including diphone-like subunits beneath the word level, thus exploiting the advantages of context-sensitive coding at two separate levels of recognition (Marcus).

A new four-year project was started this year as an experimental investigation of the ways in which auditory and lexical information are combined in the human recognition of spoken words. In this project, auditory information will be controlled by using acoustically invariant diphones to synthesize stimulus words. Lexical information will be available in the form of a lexicon of Dutch words stored on disk (Van der Vlugt).

An attempt was made this year to apply the so-called 'gating paradigm', introduced by Grosjean for the study of word recognition, to the measurement of relative speech quality (Nooteboom and Doodeman, this issue).

Our exploration of the communicative functions of pitch accents has this year taken the form of a series of experiments measuring the effects of appropriate and inappropriate accentuation and de-accentuation of words in descriptive utterances on speed of comprehension in a verification task. Although certain clear effects were obtained, interpretation of the data still leaves us with some puzzles (Terken, this issue).


The role of pitch in the perceptual separation of simultaneous vowels II

M.T.M. Scheffers

Introduction

In a continuation of our research on the perceptual separation of simultaneous sounds (Scheffers, 1979), an experiment was carried out to investigate the identifiability of simultaneous vowel sounds as a function of the difference between the fundamental frequencies of the two vowels. Our research is inspired by the intriguing question, first posed by Colin Cherry as the 'Cocktail Party Problem' (Cherry, 1953), of how listeners are able to perceive the speech of a single speaker separately from a background of interfering voices. Cherry mentioned voice pitch as one of the factors possibly facilitating the separation. Much earlier, Stumpf (1890) had reported that the sounds of two musical instruments tended to fuse into a single percept when both instruments simultaneously played exactly the same note, but were separately audible when different notes were played. More recently, Brokx and Nooteboom (1982) found that speech sounds presented against a background of speech from another speaker, or even from the same speaker, could be identified considerably better when there was a difference of more than 1 semitone between the pitches of the two sounds. These observations prompted us to investigate the role of pitch differences between simultaneous vowels in the perceptual separation process.

Identification of pairs of unvoiced vowels was investigated in a second experiment, conducted to determine to what extent listeners could use information derived from the spectral envelope of the sound for identifying the vowels.

Experiment 1

The stimuli of the first experiment each consisted of two different voiced vowels. The waveforms of the vowel sounds were computed using a software five-formant speech synthesizer (Vogten and Willems, 1977). Eight vowels were used, viz. the Dutch /i/, /y/, /ɪ/, /ɛ/, /e/, /a/, /ɔ/ and /u/. Formant structures were taken from Govaerts' study of Dutch vowels (Govaerts, 1974). The duration of each vowel was 220 ms, including cosine-shaped onset and offset ramps of 20 ms. The two vowels were added with no temporal onset difference, both starting in zero phase, and were of about equal subjective loudness. Six f0 differences were used: 0, 1/4, 1/2, 1, 2 and 4 semitones. The average f0 was 150 Hz. For each pair of vowels with unequal f0s, two stimuli were made: one in which the one vowel had the lower f0 and one in which the other vowel had the lower f0. The waveforms of the 308 different stimuli were stored digitally on disk.
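The stimulus count can be checked with a short enumeration: 28 unordered vowel pairs, each presented once at equal f0 and twice (both orderings) at each of the five nonzero differences. The sketch below assumes, since the report only gives an "average" of 150 Hz, that the two f0s lie symmetrically around 150 Hz on a logarithmic scale; the function and variable names are hypothetical.

```python
from itertools import combinations

# Reconstructed vowel labels; only their number (8) matters for the count.
VOWELS = ["i", "y", "ɪ", "ɛ", "e", "a", "ɔ", "u"]
F0_DIFFS_SEMITONES = [0, 0.25, 0.5, 1, 2, 4]
MEAN_F0_HZ = 150.0


def f0_pair(diff_semitones, mean_f0=MEAN_F0_HZ):
    """Return (lower, higher) f0 in Hz for a difference given in semitones.

    Assumption: the two f0s are placed symmetrically around mean_f0 on a
    logarithmic frequency scale, so their geometric mean equals mean_f0.
    """
    half_step = 2.0 ** (diff_semitones / 24.0)  # half the interval, as a ratio
    return mean_f0 / half_step, mean_f0 * half_step


def build_stimulus_list():
    """Enumerate (vowel_a, vowel_b, f0_a, f0_b) for every stimulus."""
    stimuli = []
    for vowel_a, vowel_b in combinations(VOWELS, 2):  # 28 unordered pairs
        for diff in F0_DIFFS_SEMITONES:
            if diff == 0:
                stimuli.append((vowel_a, vowel_b, MEAN_F0_HZ, MEAN_F0_HZ))
            else:
                low, high = f0_pair(diff)
                stimuli.append((vowel_a, vowel_b, low, high))  # vowel_a lower
                stimuli.append((vowel_a, vowel_b, high, low))  # vowel_b lower
    return stimuli


stimuli = build_stimulus_list()
print(len(stimuli))  # 308
```

Under this assumption a 1-semitone difference corresponds to f0s of roughly 145.7 Hz and 154.4 Hz.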

Twenty subjects took part in the experiment. They had normal hearing and were familiar with synthesized speech sounds and psychoacoustic experiments. They were tested individually. The subjects were seated in a sound-treated booth and received the signals diotically through TDH 49-P headphones. The signals were band-pass filtered from 50 Hz to 5 kHz and presented at a level of about 60 dB SL.

A minicomputer controlled the presentation of the stimuli and recorded the responses. The subjects were instructed to respond to each stimulus by pushing two buttons on a panel of eight, each button representing one of the eight vowels used. All vowels were played to them before the experiment started. No feedback was given on their responses. Each subject took part in four sessions held on consecutive days. In every session, each of the 308 stimuli was presented once, in a random order that differed for each subject and for each session. A session lasted about half an hour.

Results

A synopsis of the results is presented in Fig. 1. The percentages of correctly identified combinations (both vowels correct) are given in Fig. 1a; Fig. 1b depicts the percentages of individual vowels correctly identified. Each line in these two panels connects the results for one vowel pair, averaged over all subjects.

[Figure 1: panels 1a-1d; horizontal axis Δf0 (semitones, 0 to 4), vertical axis percentage correct (0 to 100).]

Fig. 1. Percentages correct identification of two simultaneous voiced vowels as a function of the difference between the fundamental frequencies of the vowels. Fig. 1a shows the identification scores for the 28 combinations used (both vowels correct) and Fig. 1b the scores on individual vowels. Figs 1c and 1d give these results averaged over the combinations.


No significant difference was found between performance on the stimuli in which a given vowel had the higher f0 and on those in which it had the lower f0. The results are therefore averaged over 'positive' and 'negative' f0 differences. The two lower panels, 1c and 1d, give the scores for combinations correct and individual vowels correct respectively, averaged over the 28 combinations. Note that the scales of the vertical axes differ between these two panels.

Figs 1a and 1b show that the scores differed greatly between pairs of vowels. They were in general lowest for combinations of similar vowels, such as two front vowels or two back vowels, and highest for dissimilar combinations such as a front and a back vowel. The scores dropped to around chance level (4%) for only a few combinations of vowels with equal f0. It can be seen from Fig. 1 that the scores increased with increasing f0 difference up to 1 or 2 semitones.
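The chance level quoted above follows from the response format: two different buttons out of eight. A minimal sketch, assuming purely random guessing without repetition (the single-vowel figure is derived here and is not stated in the report):

```python
from math import comb

N_VOWELS = 8

# Chance of naming the correct pair: one of C(8, 2) = 28 unordered pairs.
p_pair = 1 / comb(N_VOWELS, 2)

# Chance that a given presented vowel is among two distinct random choices:
# C(7, 1) = 7 of the 28 possible responses contain it, i.e. 2/8.
p_single = comb(N_VOWELS - 1, 1) / comb(N_VOWELS, 2)

print(f"pair: {p_pair:.1%}, single vowel: {p_single:.0%}")  # pair: 3.6%, single vowel: 25%
```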

Experiment 2

When it was found that identification scores even on pairs of vowels with identical f0s were generally well above chance level, a second experiment was devised. The stimuli in this experiment each consisted of two different unvoiced vowels. They were constructed in the same way as the stimuli for Exp. 1, and the unvoiced vowels had the same spectral envelopes as the voiced ones. The stimuli were D-A converted, band-pass filtered from 50 Hz to 5 kHz and recorded on magnetic tape with an inter-stimulus interval (ISI) of 3 s. The tape contained every stimulus eight times in random order.

Eighteen subjects with normal hearing took part in this experiment. They were asked to identify the two vowels in each stimulus and to write down a phonemic transcription of both vowels on an answer form. The test method was the same as in Exp. 1 except for the use of a tape and written responses.

Results

Performance on the unvoiced vowels was significantly lower than on voiced vowels with equal fundamentals (p < .01). The identification score on combinations was 26% for the unvoiced stimuli and 45% for the voiced stimuli; the average score on individual vowels was 56% and 69% respectively. Here, too, pairs of vowels with dissimilar formant structures tended to be better identified than pairs with similar structures.

Discussion and conclusions

The most surprising result of the experiments is that the identifiability of two simultaneous vowels was far above chance level even when both vowels had the same fundamental frequency or when they were unvoiced. The finding that simultaneous unvoiced vowels were less well identifiable than simultaneous voiced vowels with equal f0s cannot yet be explained. It runs contrary to what one would expect from the fact that formants are more 'sharply' defined in unvoiced than in voiced vowels, although this is only true for the long-term spectrum. Identification scores on voiced pairs increased by about 18% on average when the f0 difference between the two vowels was increased from 0 to 2 semitones. It is noteworthy that at least one vowel was correctly identified in 95% of the voiced stimuli and in 86% of the unvoiced stimuli.

Identification scores on combinations of vowels with strongly differing spectral envelopes, like /i/ and /u/, were much higher than the scores on vowels with relatively similar spectral shapes, like /i/ and /y/. This supports a theory of 'profile' analysis (cf. Green, Kidd and Picardi, 1983) in the recognition process. A profile is considered to be a relatively simple image of the envelope of the spectral representation of the sound in the peripheral ear. Recognition is then a process of matching reference profiles to the profile of the present spectrum and identifying the sounds on the basis of the best-fitting profiles. The profile is probably best defined around the first two formants of the vowel, and the shape of the profile near these formant frequencies apparently weighs most in the matching (cf. Klatt, 1982; Scheffers, 1983). If the profiles of the constituent vowels differ greatly, the identifiability of the combination is relatively high and little influenced by f0 differences. If the profiles are rather similar, however, f0 differences can aid in separating the profile of the combination into parts belonging to one vowel and parts belonging to the other, or possibly to both. Separation is supposed to be guided by the harmonic fine structure of the spectrum. This is only possible at relatively low frequencies, because high harmonics are not separately detectable in the auditory system (e.g. Plomp, 1964). The theory is supported by the observation, during informal listening, that two different pitches could usually be heard in stimuli with f0 differences greater than 1 semitone, whereas for smaller differences only one (beating) pitch was heard. This would imply that two different series of harmonics could be discriminated for f0 differences greater than 1 semitone. We may therefore expect little further improvement in performance when the f0 difference is increased beyond 1 semitone. A decrease in performance can even be expected for harmonic intervals between the two f0s, such as a major third (4 semitones) and especially an octave, because many harmonics of both vowels will then coincide. A clear decrease in performance for the 4-semitone difference was indeed found in the results for 8 combinations.
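The coincidence argument can be made concrete by counting which harmonics of the higher vowel fall close to a harmonic of the lower one. This is a sketch under hypothetical choices: "close" means within 1% in frequency, only harmonics up to 1500 Hz are considered, and the two f0s are placed symmetrically around 150 Hz on a log scale. None of these values come from the report.

```python
def shared_harmonics(f0_a, f0_b, f_max=1500.0, rel_tol=0.01):
    """Return the harmonic numbers of f0_b (up to f_max) that lie within
    rel_tol (relative) of some harmonic of f0_a."""
    shared = []
    n = 1
    while n * f0_b <= f_max:
        f = n * f0_b
        m = round(f / f0_a)  # nearest harmonic number of the other vowel
        if m >= 1 and abs(m * f0_a - f) <= rel_tol * f:
            shared.append(n)
        n += 1
    return shared


def f0_pair(diff_semitones, mean_f0=150.0):
    """Assumed symmetric placement of the two f0s around mean_f0 (log scale)."""
    half_step = 2.0 ** (diff_semitones / 24.0)
    return mean_f0 / half_step, mean_f0 * half_step


for diff in (1, 4, 12):
    low, high = f0_pair(diff)
    print(diff, shared_harmonics(low, high))
# 1 []
# 4 [4, 8]
# 12 [1, 2, 3, 4, 5, 6, 7]
```

At 4 semitones the equal-tempered ratio (about 1.26) is close to 5:4, so every 4th harmonic of the higher vowel lies near every 5th of the lower; at the octave every harmonic of the higher vowel coincides with one of the lower, whereas at 1 semitone none do within this tolerance.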

Summary

The identification of simultaneous vowels was investigated for combinations of two voiced vowels, as a function of the difference between the fundamental frequencies of the two vowels, and for combinations of unvoiced vowels. The results show that, usually, at least one of the vowels in a combination can be identified correctly. The identification scores on combinations (both vowels correct) are in general surprisingly high, even for unvoiced vowels and for combinations of vowels with equal fundamental frequency. Increasing the difference between the fundamentals improved the identifiability of the combinations by 18% on average. The results support a theory of profile analysis of the spectrum of the combination, aided by the separation of low formants as belonging to different vowels on the basis of the pitch(es) perceived in the complex sound.
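The profile-matching idea can be illustrated as a brute-force search over pairs of reference profiles. This is a hypothetical sketch, not the computer model used in the study: it assumes profiles are positive band powers in linear units, that the powers of two vowels simply add, and that the mismatch is a (optionally weighted) squared difference on a dB scale. Weights larger in the F1/F2 region would express the observation that the low-formant region weighs most.

```python
import math
from itertools import combinations


def best_matching_pair(observed, references, weights=None):
    """Pick the pair of reference profiles whose sum best matches `observed`.

    observed   -- positive band powers (linear units) of the presented mixture
    references -- dict: label -> positive band powers of that vowel alone
    weights    -- optional per-band weights, e.g. larger near F1 and F2
    """
    if weights is None:
        weights = [1.0] * len(observed)
    best_pair, best_error = None, math.inf
    for a, b in combinations(sorted(references), 2):
        # Assumption: powers of two uncorrelated sources add per band.
        mixture = [pa + pb for pa, pb in zip(references[a], references[b])]
        error = sum(
            w * (10 * math.log10(o) - 10 * math.log10(m)) ** 2
            for w, o, m in zip(weights, observed, mixture)
        )
        if error < best_error:
            best_pair, best_error = (a, b), error
    return best_pair, best_error
```

With made-up four-band profiles in which each label peaks in a different band, feeding in the sum of two of them returns that pair with zero error; similar profiles give near-ties, which is where the report suggests f0 cues would help.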


References

Brokx, J.P.L. and Nooteboom, S.G. (1982) Intonation and the perceptual separation of simultaneous voices. J. Phonetics 10, 23-26.

Cherry, E.C. (1953) Some experiments on the recognition of speech with one and with two ears. J. Acoust. Soc. Am. 25, 975-979.

Govaerts, G. (1974) Psychologische en fysische structuren van perceptueel geselecteerde klinkers, een onderzoek aan de hand van Zuidnederlandse klinkers [Psychological and physical structures of perceptually selected vowels: a study of Southern Dutch vowels]. Doctoral thesis, Louvain University.

Green, D.M., Kidd, G. and Picardi, M.C. (1983) Successive versus simultaneous comparison in auditory intensity discrimination. J. Acoust. Soc. Am. 73, 639-643.

Klatt, D.H. (1982) Predictions of perceived phonetic distance from critical-band spectra: a first step. Proc. ICASSP 82 (2), 1278-1281.

Plomp, R. (1964) The ear as a frequency analyser. J. Acoust. Soc. Am. 36, 1628-1636.

Scheffers, M.T.M. (1979) The role of pitch in perceptual separation of simultaneous vowels. IPO Annual Progress Report 14, 51-54.

Scheffers, M.T.M. (1983) Identification of synthesized vowels in a noise background. IPO Manuscript 450, to be submitted for publication.

Stumpf, C. (1890) Tonpsychologie. Lizenzausgabe des S. Hirzel Verlages [licensed edition of S. Hirzel Verlag], Leipzig. Republished in 1965 by Knef-Bonset, Hilversum-Amsterdam.

Vogten, L.L.M. and Willems, L.F. (1977) The Formator: a speech analysis-resynthesis system based on formant extraction from linear prediction coefficients. IPO Annual Progress Report 12, 47-54.


Speech quality and word recognition from fragments of spoken words

S.G. Nooteboom and G.J.N. Doodeman

Introduction

Correct recognition of a word from speech can often take place when only part of the spoken word form has been heard (e.g. Marslen-Wilson, 1980). In such a case the remainder of the word is, in some sense, redundant information. Of course, redundant does not mean superfluous. Redundancy makes speech communication less vulnerable to all kinds of degradation of the 'ideal' speech signal, caused for example by sloppy articulation, external distortion, or a hearing deficit in the listener.

When the speech signal is noticeably degraded, we may expect that the part of the word form that would have been redundant in ideal conditions is, at least to some extent, needed to restore correct recognition. Thus the relative size of the fragment of the word form needed for correct recognition (or correct guessing) could provide a useful measure of the degree of degradation. This idea was earlier applied successfully to measuring the skill of native and non-native listeners in using auditory information in word recognition (Nooteboom and Truin, 1980). In the present experiment we set out to measure the effect of differences in speech quality, caused by different degrees of data reduction in LPC vocoder speech, on the relative number of speech sounds needed for correct recognition of polysyllabic words. For this purpose we used an adaptation of the 'gating paradigm' introduced by Grosjean (1980).

Our aim was twofold. We wanted to find out whether we could obtain a reliable and relatively easily applied measure of speech quality. We also wanted to see whether the course of the probability of correct recognition, as controlled by the successively added speech sounds, has any diagnostic value with respect to the type of degradation of the speech signal.

Method

A set of 40 Dutch polysyllabic words was selected, with frequencies of usage of 10 or more per 720,000 words in the Uit den Boogaart word frequency count (Uit den Boogaart, 1975). Optimally spoken realisations of these words by a speaker of standard Dutch were recorded and stored on disk in digital form (PCM, 12 bits per sample, 10 kHz sampling frequency). From each word token an initial fragment was isolated, corresponding to the beginning of the word and containing several speech sounds. This fragment was long enough for successful application of LPC analysis and resynthesis, and short enough to ensure a low probability of correct recognition. Further versions of the same word token were produced by successively appending to the initial fragment the speech segments corresponding to the following speech sounds. This was done under visual and auditory control. An example of a phonetic transcription of consecutive fragmentary word tokens of one word, in this case the word AUTORITEIT (Engl. AUTHORITY), is:

1. [oto], 2. [otor], 3. [otori], 4. [otorit], 5. [otoritei], 6. [otoriteit].

All 40 sets of word fragments were prepared in four speech qualities:

1. the original digital recording, using 120,000 bits per second;

2. vocoder speech, obtained with an LPC-to-formant analysis-resynthesis system, using 16,000 bits per second (cf. 't Hart, Nooteboom, Vogten and Willems, 1982);

3. idem, with further data reduction by parameter quantisation to 4,000 bits per second;

4. idem, with still further reduction to 1,000 bits per second.
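The bit rate of the original recording follows from the PCM format given above (12 bits per sample at 10 kHz), and the data reduction of each vocoder condition can be expressed relative to it. The condition labels are those used in the figures; the variable names are only illustrative.

```python
BITS_PER_SAMPLE = 12
SAMPLE_RATE_HZ = 10_000

pcm_rate = BITS_PER_SAMPLE * SAMPLE_RATE_HZ  # 120,000 bit/s

bit_rates = {"PCM120": pcm_rate, "LPC16": 16_000, "LPC4": 4_000, "LPC1": 1_000}
for name, rate in bit_rates.items():
    # Reduction factor relative to the original 12-bit, 10 kHz PCM recording.
    print(f"{name}: {rate:,} bit/s, {pcm_rate / rate:g}x reduction")
# PCM120: 120,000 bit/s, 1x reduction
# LPC16: 16,000 bit/s, 7.5x reduction
# LPC4: 4,000 bit/s, 30x reduction
# LPC1: 1,000 bit/s, 120x reduction
```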

From these 40 sets of word fragments in the four speech qualities, four stimulus tapes were prepared. Each tape contained four groups of ten words, each group in a different speech quality. Each group of ten words appeared in a different speech quality on each of the tapes. The order of speech qualities on each tape varied randomly from one word to the next.
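The counterbalancing described above amounts to a 4 × 4 Latin square of word groups by tapes. The sketch below uses a cyclic rotation, which is only one assignment satisfying the stated constraints; the report does not say which assignment was used. It also assumes that groups are formed from consecutive words, and the word names are placeholders.

```python
import random
from collections import Counter

QUALITIES = ["PCM120", "LPC16", "LPC4", "LPC1"]
WORDS_PER_GROUP = 10


def tape_design():
    """design[tape][group] -> speech quality, as a cyclic 4 x 4 Latin square:
    each word group appears once in every quality across the four tapes."""
    k = len(QUALITIES)
    return [[QUALITIES[(group + tape) % k] for group in range(k)] for tape in range(k)]


def tape_playlist(words, tape, rng):
    """Assign each word its quality on `tape`, then shuffle the word order."""
    design = tape_design()
    items = [(word, design[tape][i // WORDS_PER_GROUP]) for i, word in enumerate(words)]
    rng.shuffle(items)  # qualities then vary randomly from one word to the next
    return items


words = [f"word{i:02d}" for i in range(40)]  # placeholders for the 40 Dutch words
playlist = tape_playlist(words, tape=0, rng=random.Random(1))
print(Counter(quality for _, quality in playlist))  # ten words per quality
```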

Each tape was played over headphones to a different group of five listeners, who were tested individually. After the presentation of each fragment, listeners were encouraged to guess the word from which the current fragment was taken and to say it aloud. If they were not able to guess, they were asked to repeat aloud the fragment heard. After each correct guess the experimenter switched to the next set of word fragments. Stimuli and responses were recorded on two separate tracks of a magnetic tape for later analysis.

Results

The results presented here will be limited to probabilities of correct recognition

as a function of the number and kind of added segments. The probability of correct

recognition as a function of the number of speech segments added to the initial

word fragment, for the four speech qualities separately and averaged over all words

and all subjects, is given in Fig. 1.

The difference between each pair of curves is significant (p < 0.05) on a sign test applied to estimated means for individual words in different conditions. As expected, the number of audible segments necessary for correct recognition increases systematically with decreasing speech quality.
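A sign test of this kind needs only the direction of each per-word difference. The sketch below is an exact two-sided sign test; it assumes that each word contributes one paired difference between two speech qualities (for example, mean number of added segments needed in the poorer quality minus that in the better one) and that zero differences are discarded. The report does not give these details.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided sign test on paired differences (zeros are discarded)."""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test([1.0] * 10))  # 0.001953125, i.e. 2 / 1024
```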

In search of diagnostic indications in our data, we calculated the relative contributions of consonant and vowel segments to correct recognitions. The proportion of the total number of correct recognitions occurring immediately after adding a vowel segment, and the proportion occurring immediately after adding a consonant segment, are plotted in Fig. 2 for the four speech qualities. With decreasing speech quality, the relative contribution of vowel segments increases at the expense of consonant segments. Apparently, on average, consonant segments suffer more than vowel segments from the particular type of data reduction used.
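The proportions in Fig. 2 can be computed by classifying each correct recognition by the segment added just before it. A minimal sketch with a hypothetical data layout: each trial records the vowel/consonant class of every added segment and the index of the segment after which the word was first recognised (or None).

```python
from collections import Counter


def recognition_shares(trials):
    """Proportions of correct recognitions that follow a vowel vs a consonant.

    trials -- iterable of (segment_classes, k): segment_classes[i] is "V" or "C"
              for the i-th added segment, and k is the index of the added
              segment after which the word was first recognised (None if never).
    """
    counts = Counter(classes[k] for classes, k in trials if k is not None)
    total = counts["V"] + counts["C"]
    if total == 0:
        return {"V": 0.0, "C": 0.0}
    return {"V": counts["V"] / total, "C": counts["C"] / total}
```

Computing these shares separately for each speech quality gives the four pairs of proportions plotted in Fig. 2.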


[Figure 1: probability of correct recognition (0 to 1) versus number of added segments (0 to 6), one curve each for PCM120, LPC16, LPC4 and LPC1.]

Fig. 1. Probability of correct word recognition as a function of the number of sound segments added to the initial word fragment, for four speech qualities.

[Figure 2: proportion of correct recognitions occurring after adding a consonant and after adding a vowel (0 to 1), for PCM120, LPC16, LPC4 and LPC1.]

Fig. 2. Proportions of the total number of correctly recognised words after adding a vowel or a consonant segment, for four speech qualities.


We also investigated the relative contributions to recognition of stressed and unstressed syllables. For this purpose we focused on the 27 of the 40 words in which the initial word fragment did not contain the lexically stressed syllable. For each of these words we numbered the added segments, starting with 0 for the vowel of the stressed syllable and counting negatively towards earlier and positively towards later segments. We then made frequency distributions of correct recognitions over the numbers of added segments; these are presented in Fig. 3. Clearly, as speech quality decreases, correct recognition becomes more and more dependent on the availability of the vowel of the lexically stressed syllable.
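The renumbering and the frequency distributions of Fig. 3 can be sketched as follows. The data layout is hypothetical: for each word and listener, the index of the stressed vowel among the added segments and the index of the added segment after which the word was first recognised (None if never).

```python
from collections import Counter


def relative_positions(n_added, stressed_vowel_index):
    """Renumber added segments so the stressed vowel is 0, earlier segments
    negative and later segments positive."""
    return [i - stressed_vowel_index for i in range(n_added)]


def recognition_histogram(trials):
    """trials -- iterable of (stressed_vowel_index, k), with k the index of the
    added segment after which the word was first recognised (None if never).
    Returns {relative position: number of correct recognitions}."""
    hist = Counter(k - s for s, k in trials if k is not None)
    return dict(sorted(hist.items()))


print(relative_positions(5, 2))  # [-2, -1, 0, 1, 2]
```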

Fig. 3. Frequencies of correct word recognition as a function of the position of the added segment. This position is taken relative to the position of the lexically stressed vowel.

[Figure 3: four panels (PCM 120 kbit/s, LPC 16 kbit/s, LPC 4 kbit/s, LPC 1 kbit/s); horizontal axis segment position relative to the stressed vowel (-3 to 3), vertical axis frequency of correct recognition.]

It should be noted that all these words were presented with a natural pitch contour, as produced in isolation. Therefore these data do not allow us to decide whether the determining factor is the relatively well-preserved identity of the stressed vowel or the availability of the relative position of the stressed syllable. But at least the data strongly suggest that the particular type of data reduction applied in our analysis-resynthesis system does considerably more damage to unstressed than to stressed syllables.

Discussion and conclusion

The results of this experiment show that the 'gating paradigm' can be applied fruitfully to the problem of measuring differences in speech quality. It proved possible to find significant differences between the four speech qualities used, with 40 words and only a few listeners per word, suggesting that measurement of relative speech quality can be fairly easy and fast, given the availability of prepared sets of word fragments. The discriminative power of the test compares favourably with that of an adaptation of the Nakatani and Dukes (1973) test, as applied to approximately the same speech qualities by Vogten (1980). This discriminative power probably has two sources. One is that the method allows comparison between speech qualities per word, thus allowing statistical tests with individual words as entries. The other is that the test is very sensitive to the effects of speech quality on the weak parts of the speech signal, corresponding to unstressed syllables. In this respect the test differs from practically all current speech quality tests, which normally focus on stressed monosyllabic words. As exemplified in the results section, a simple analysis of the data distribution may give useful indications of those parts of the speech signal that are most seriously damaged in each speech quality.

References

Grosjean, F. (1980) Spoken word recognition and the gating paradigm. Perception & Psychophysics 28, 267-283.

Hart, J. 't, Nooteboom, S.G., Vogten, L.L.M. and Willems, L.F. (1982) SPARX: manipulation of speech sound. Philips Technical Review 40, 134-145.

Marslen-Wilson, W.D. (1980) Speech understanding as a psychological process. In: J.D. Simon (Ed.) Spoken Language Generation and Recognition. Reidel, Dordrecht, 39-67.

Nakatani, L.H. and Dukes, K.D. (1973) A sensitive test of speech communication quality. J. Acoust. Soc. Am. 53, 1083-1092.

Nooteboom, S.G. and Truin, P.G.M. (1980) Word recognition from fragments of spoken words by native and non-native listeners. IPO Annual Progress Report 15, 42-47.

Uit den Boogaart, P.C. (1975) Woordfrequenties in geschreven en gesproken Nederlands [Word frequencies in written and spoken Dutch]. Oosthoek, Scheltema & Holkema, Utrecht.

Vogten, L.L.M. (1980) Evaluation of LPC formant-coded speech with a speech interference test. IPO Annual Progress Report 15, 33-41.


Analysis and synthesis of mixed excitation LPC coded speech

C.J. Darwin

Introduction

The weakest part of systems for analysing and resynthesizing speech is the estimation of pitch and voicing. The quality of LPC-coded speech can be improved substantially by doing away with explicit decisions on pitch and voicing and coding the LPC error signal directly (Atal and Remde, 1982). But in many applications the pitch and duration of speech segments need to be changed, which necessitates explicit decisions on pitch and voicing.

The IPO LPC analysis and resynthesis system includes a successful pitch algorithm, DWS (Duifhuis, Willems and Sluyter, 1982), but only a rather simple voiced/voiceless decision criterion. This report describes work aimed at providing a voicing parameter that exploits the methods and success of the DWS algorithm.

Essentially, the method described below replaces the normal binary voiced/voiceless decision by a multi-valued parameter Fc, the cut-off frequency between voiced (buzz) and voiceless (noise) excitation. Below Fc the excitation is voiced; above it, it is voiceless. On resynthesis the buzz and noise excitation sources are low-pass and high-pass filtered at Fc, respectively. The parameter Fc is estimated in the analysis system by two complementary methods. The first method looks for the highest harmonic found by DWS. The second method inspects a representation of the speech signal similar to a wide-band spectrogram, looks for the sharp increases in energy across different frequency bands that are characteristic of buzz excitation, and estimates how high in frequency such striations are visible. Fc is the maximum of the highest-frequency harmonic found by the first method and the highest frequency at which striations are visible to the second method.
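The combination rule for Fc can be sketched in a few lines. For the striation detector this sketch substitutes a simpler technique than the spectrogram inspection described above: the normalised autocorrelation of each band's energy envelope at a lag of one pitch period, with a hypothetical threshold of 0.5. The function names, the band layout and the threshold are assumptions, not part of the IPO system.

```python
import numpy as np


def envelope_periodicity(envelope, period_samples):
    """Normalised autocorrelation of a band envelope at one pitch period.
    Values near 1 suggest buzz-like striations; values near 0 suggest noise."""
    e = np.asarray(envelope, dtype=float)
    e = e - e.mean()
    a, b = e[:-period_samples], e[period_samples:]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def estimate_fc(harmonics_hz, band_envelopes, period_samples, threshold=0.5):
    """Fc = max(highest harmonic found by the pitch analysis,
                highest band upper edge whose envelope shows striations).

    harmonics_hz   -- harmonic frequencies found by the pitch analysis (may be empty)
    band_envelopes -- list of (upper_edge_hz, envelope samples) per filter-bank band
    period_samples -- pitch period in envelope samples
    """
    fc_harmonics = max(harmonics_hz, default=0.0)
    fc_striations = max(
        (edge for edge, env in band_envelopes
         if envelope_periodicity(env, period_samples) >= threshold),
        default=0.0,
    )
    return max(fc_harmonics, fc_striations)
```

Taking the maximum of the two estimates means that a band with clear periodic striations above the last resolved harmonic still raises Fc, in line with the assumption that buzz is detected there through amplitude modulation.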

We adopted two separate methods in order to approximate the capabilities of the auditory system. At low frequencies the individual harmonics of the voice are resolved, and the DWS system gives a good approximation to psychoacoustic performance. Above about the 10th harmonic, the auditory system can no longer resolve individual harmonics. However, it is still capable of discriminating whether a formant is excited by a periodic or an aperiodic source: there is a clear difference in timbre, even though the periodically excited formant may not have a clear pitch. We have assumed that the ability to discriminate buzz from noise excitation depends on the perception of amplitude modulation.

Speech synthesized by the system described below is generally less buzzy than that produced by the original system and has better fricatives, although the system is computationally expensive (because of the filter bank used to produce the digital spectrogram) and suffers from occasional roughness caused by underestimation of Fc.


Synthesis of mixed-excitation LPC speech

Normal LPC synthesis uses either buzz or hiss excitation, but not both, at any one

time. Two examples illustrate that such a dichotomy is too simple. First, at onset

and offset of voicing (see Fischer-Jorgensen and Hutters, 1981) or during 'breathy'

voice, the lower frequencies may be predominantly buzz excited (showing harmonic

structure on a narrow-band spectrogram), while higher frequencies (above the first

formant) are noisy. Second, phones such as voiced fricatives are produced naturally

with both buzz and noise excitation; here the noise is amplitude modulated by the

changing intra-oral air pressure during each glottal cycle.

The simplest way to produce mixed excitation is to add buzz and noise together in

variable proportion. But simple addition does not capture the general observation

that during mixed excitation speech, noise is usually confined to the higher fre­

quencies. A more natural synthesis can be achieved, at least with parallel-formant

synthesizers (see Holmes, 1973), using mixed excitation with a variable cut-off

frequency Fc. Below Fc the excitation is buzz; above it, noise. Such a cut-off

between buzz and noise has also been applied to LPC synthesis of speech by Makhoul

et al. (1978), and is the method adopted here. A more elaborate method, allowing

multi-layer sandwiches of buzz and noise excitation, has been used by Fujimura

(1968) for vocoded speech, but it is not clear that it confers any advantages.

In the present program two primary sources of excitation are used, buzz and noise.

The excitation that passes into the LPC filters is a mixture of the buzz low-passed

at Fc, plus the noise high-passed at Fc. The gain of the mixture is then adjusted

according to the needs of the frame being synthesized.
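As an illustration of the mixing scheme just described, here is a minimal Python sketch (NumPy/SciPy) for a single frame. It is not the program used in this work: the impulse-train buzz, the 4th-order Butterworth cut-off filters, the unit-power normalisation and the `noise_level` parameter are assumptions, and a frame-by-frame synthesizer would also have to carry filter states across frames.

```python
import numpy as np
from scipy.signal import butter, lfilter, sosfilt


def mixed_excitation(n, fs, f0, fc, gain, noise_level=1.0, seed=0):
    """Buzz low-passed at fc plus noise high-passed at fc, scaled by the frame gain.

    Assumptions: buzz is a unit-power impulse train at f0; the cut-off
    filters are 4th-order Butterworth designs.
    """
    rng = np.random.default_rng(seed)
    period = int(round(fs / f0))
    buzz = np.zeros(n)
    buzz[::period] = np.sqrt(period)          # one pulse per period, mean power 1
    noise = noise_level * rng.standard_normal(n)

    if fc <= 0:                               # fully voiceless frame
        mix = noise
    elif fc >= fs / 2:                        # fully voiced frame
        mix = buzz
    else:
        lp = butter(4, fc, btype="lowpass", fs=fs, output="sos")
        hp = butter(4, fc, btype="highpass", fs=fs, output="sos")
        mix = sosfilt(lp, buzz) + sosfilt(hp, noise)
    return gain * mix


def lpc_synthesize(excitation, a):
    """All-pole LPC synthesis filter 1/A(z), with a = [1, a1, ..., ap]."""
    return lfilter([1.0], a, excitation)
```

For example, `lpc_synthesize(mixed_excitation(1000, 10_000, 100.0, 2000.0, gain=1.0), [1.0, -0.9])` gives 100 ms of output at an assumed 10 kHz sampling rate. Setting `fc` at or above the Nyquist frequency reduces the mixture to pure buzz, and `fc <= 0` to pure noise.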

Care must be taken that the relative levels of buzz and noise excitation are appropriate.

There is no algorithmic way to equate them before synthesis, because the

intensity of the final waveform depends on the relation between pitch and formant

positions. However, two separate criteria can be used. The first applies to speech

that is synthesized with dichotomous voicing: the ratio of the levels of voiced

frames in the original and the resynthesized versions should be the same as the ra­

tio of the levels of the voiceless frames. The second applies to mixed excitation:

the overall level of a long section of voiced speech should be the same when syn­

thesized with buzz on a normal pitch contour, as when synthesized with noise. In

the system described here, a fixed relative level of noise excitation is used that

is based on the second criterion using male speech.
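The second criterion amounts to a one-off calibration of the noise gain. A hedged sketch, assuming a single LPC filter `a`, a constant F0 standing in for a normal pitch contour, and unit-power pulse-train and white-noise sources; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import lfilter


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def noise_level_for_equal_rms(a, fs, f0, n=20_000, seed=0):
    """Scale factor for the noise source such that a long stretch filtered by
    1/A(z) has the same RMS level with noise excitation as with buzz at f0.

    Assumption: a constant f0 stands in for a 'normal pitch contour'.
    """
    period = int(round(fs / f0))
    buzz = np.zeros(n)
    buzz[::period] = np.sqrt(period)          # unit-power impulse train
    noise = np.random.default_rng(seed).standard_normal(n)
    return rms(lfilter([1.0], a, buzz)) / rms(lfilter([1.0], a, noise))
```

Because the synthesis filter is linear, multiplying the noise source by the returned factor makes the two synthesized levels equal for that filter; averaging the factor over many frames of real (male) speech would be closer in spirit to the procedure described above.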

Estimation of mixed-excitation cut-off frequency

Two previously published approaches to the problem of characterising different fre­

quency regions of a signal as being buzz or noise excited have both been psycho­

acoustically optimistic. Makhoul et al. use the output of a harmonic pitch extrac­

tor to estimate the highest frequency at which the spectrum is harmonic, whereas

Fujimura uses cepstral analysis to detect periodicity in band-limited sections of

the signal. Neither approach is psycho-acoustically plausible above the 10th or so

harmonic.

Page 52: IPO Annual Progress Reportalexandria.tue.nl/tijdschrift/IPO 17.pdf · 2010-06-25 · Technology, has succeeded Professor de Vries as chairman. Dr ir P.L. Walraven left and Dr ir A

The approach used here combines frequency and temporal information. The program takes as the cut-off frequency whichever is the higher of two frequencies: (i) the

highest harmonic found by a modified version of the DWS pitch program (Duifhuis,

Willems and Sluyter, 1982) and (ii) the estimated 'height' (in frequency) of the

striations in a wide-band spectrogram-like representation. A striation is a sharp

increase in energy across many frequency bands at the same time, visible in wide­

band spectrograms of voiced speech as a vertical edge.

Estimating the cut-off frequency Fc

The cut-off frequency between buzz and noise is estimated as the maximum of the

highest harmonic and the 'height' (in frequency) of the most prominent striation in

a wide-band spectrogram covering the current 10 ms analysis window. The routines used to find this striation can also be used to locate the position of glottal

pulses during speech that is known to be voiced, and may be of some help in reducing octave errors in pitch estimates.

The speech is first passed through a filter bank that has 19 channels with equally

(linearly) spaced center frequencies and a constant 500 Hz bandwidth. In order for

our method to be successful, clear amplitude modulation must be present in the

band-passed signal, so the filter bandwidth must be greater than the harmonic spacing (several harmonics must fall within each channel and beat against one another).

A 500 Hz bandwidth was chosen as greater than the harmonic spacing of all but

the highest voices. It would, of course, be psychologically more correct to use

critical-bandwidth channels. Subsequent systems could combine the DWS component

with the present filter bank into a single critical-band filter bank.
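A filter bank of this kind, with envelopes sampled in the 1 ms steps used by the striation search described next, could be sketched as follows. The 10 kHz sampling rate, the placement of the 19 centres between 500 and 4500 Hz, the Butterworth design and the Hilbert envelope are all assumptions; the text fixes only the channel count, the linear spacing and the 500 Hz bandwidth.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt


def filterbank_envelopes(x, fs=10_000, centres=None, bandwidth=500.0):
    """Amplitude envelopes of a linearly spaced, constant-bandwidth filter bank,
    averaged over 1 ms steps. Returns an array of shape (channels, frames).

    Assumptions: 10 kHz sampling, 19 centres from 500 to 4500 Hz, Butterworth
    band-pass filters (order-2 design) and Hilbert envelopes.
    """
    if centres is None:
        centres = np.linspace(500.0, 4500.0, 19)
    hop = fs // 1000                       # samples per 1 ms step
    n_frames = len(x) // hop
    env = np.empty((len(centres), n_frames))
    for i, fc in enumerate(centres):
        sos = butter(2, [fc - bandwidth / 2, fc + bandwidth / 2],
                     btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        mag = np.abs(hilbert(band))        # instantaneous amplitude
        env[i] = mag[: n_frames * hop].reshape(n_frames, hop).mean(axis=1)
    return env
```

Simulating 19 such channels sample by sample is where most of the computation goes, which fits the cost remark in the concluding section.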

In order to find local rises in the energy of each channel, the output from the

filter bank is convolved with differentiated Gaussian masks of three different widths

(2, 4, 6 ms). Such a mask has an inhibitory region followed by an excitatory one, and so

responds maximally to increases in energy. It gives no response to unchanging energy

levels and a negative response to decreasing energy. Different widths of mask respond

best to different rates of increase of energy. At each time position (in 1 ms steps),

the highest of the three outputs is noted. For each channel, the

local maxima of this output across a 5 ms window are marked. The number of marked

points across all channels at each time position (with a tolerance of 1 ms) is then

added up and the maximum of the sum found across a 10 ms window. The temporal posi­

tion of this maximum is thus the time at which the greatest number of frequency

channels simultaneously have a local rise in energy. The height of the striation at

this point is then defined as the highest frequency channel that has a local energy

rise and below which there is a local energy rise in all frequency channels. The

estimated cut-off frequency is the maximum of this value and the highest harmonic found.
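The striation search just described can be sketched as follows. This is one reading of the text, not the original routine: the 2, 4 and 6 ms mask widths are taken as Gaussian standard deviations, a 'local rise' is taken to be a positive local maximum of the mask output, and the 1 ms tolerance is implemented by widening each mark by one frame. The input is a (channels × frames) array of band envelopes at 1 ms steps, lowest channel first. In keeping with the text, no absolute level threshold is used; `tol` only absorbs floating-point round-off.

```python
import numpy as np
from scipy.ndimage import convolve1d, maximum_filter1d


def dgauss(sigma):
    """Differentiated unit-area Gaussian sampled at 1 ms steps. Convolving an
    envelope with it gives positive output for rising energy, zero for steady
    energy and negative output for falling energy."""
    t = np.arange(-3 * sigma, 3 * sigma + 1, dtype=float)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    g /= g.sum()
    return -t / sigma**2 * g


def striation_height(env, centres, start, widths=(2, 4, 6)):
    """Height (Hz) of the most prominent striation in the 10 ms window
    starting at frame `start`.

    env: (channels, frames) envelopes at 1 ms steps, lowest channel first.
    widths: Gaussian sigmas in ms (an interpretation of the 2, 4, 6 ms masks).
    """
    # highest mask output at each time position
    resp = np.max([convolve1d(env, dgauss(w), axis=1, mode="nearest")
                   for w in widths], axis=0)
    tol = 1e-9 * np.abs(resp).max()            # numerical tolerance only
    # local maxima of rising energy within a 5 ms window, per channel
    marks = (resp > tol) & (resp == maximum_filter1d(resp, size=5, axis=1))
    # 1 ms tolerance between channels: widen each mark by one frame each side
    near = maximum_filter1d(marks.astype(np.uint8), size=3, axis=1) > 0
    counts = near.sum(axis=0)                  # channels rising at each time
    t_best = start + int(np.argmax(counts[start:start + 10]))
    column = near[:, t_best]
    if not column[0]:
        return 0.0
    # highest channel below which every channel also shows a rise
    top = len(column) if column.all() else int(np.argmin(column))
    return float(centres[top - 1])


def estimate_fc(highest_harmonic_hz, striation_hz):
    """Cut-off frequency: the higher of the two estimates."""
    return max(highest_harmonic_hz, striation_hz)
```

For an envelope array in which only the lowest ten of 19 channels step up in level at the same instant, `striation_height` returns the centre frequency of the tenth channel; if every channel rises it returns the top centre frequency, so the estimate can never exceed the highest channel (4500 Hz with the centres assumed here).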

Requiring that a rise in energy be found in all frequency channels below the cut-off

is a very stringent criterion. Using good quality speech we find that it works

well (see next section). But it is likely that the criterion would prove too strin­

gent for poorer quality speech. Further work will be necessary to provide a crite­

rion that is robust and yet does not overestimate the cut-off frequency in good

quality speech.

Performance of the analysis-synthesis system

The performance of the system has been assessed in two ways: first, the analysis

system was given steady-state vowels that had been synthesized at different cut-off

frequencies by the synthesis program; second, informal listening tests were made on resynthesized sentences.

[Figure 1: cut-off frequency estimated by the program (kHz) against the cut-off frequency Fc used in synthesis (0 to 5 kHz); mean, ±1 standard deviation and range over 26 frames.]

Fig. 1. Performance of an algorithm for finding the cut-off frequency between buzz and hiss excitation. The speech used was a steady synthetic vowel, synthesized with different cut-off frequencies (Fc). The figure shows the mean, standard deviation and range of the estimates of Fc returned by the program for 26 consecutive frames of the vowel.

Fig. 1 plots the cut-off frequency estimated by the striation component of

the analysis program against the actual cut-off frequency specified to the synthesis

program. For fully voiced speech the system performs perfectly (the highest cut-off frequency that the analysis program can return is 4500 Hz). Performance is

good down to around 1 kHz. At cut-off frequencies lower than 1 kHz the program

makes numerous overestimation errors. The reason is presumably fortuitous energy

rises in the noise excitation, together with some buzz excitation leaking through

the cut-off filters used in the synthesis. It would perhaps be possible to improve

the performance of this part of the program by imposing some threshold on the size

of rise in a channel that qualifies as a rise, but I have preferred to avoid all

dependency on absolute levels in order to keep the algorithm general.


Informal listening tests have been carried out on ten good quality sentences spoken by a male speaker. There is a clear difference between the current resynthesis system and the

mixed system described here. The mixed system does not produce the buzziness of the

conventional LPC speech and also shows substantial improvement in the rendering of

some fricative consonants. On the other hand there are occasions where the mixed­

excitation speech sounds somewhat rough, and others where there are some extraneous

noises added to the speech. The roughness is due to some frames receiving too low a cut-off frequency, and to overly abrupt transitions between adjacent high and low cut-off

frequencies. It may be possible to alleviate this fault by introducing some linear smoothing of the cut-off frequency in addition to the 3-point median smoothing currently

employed. Systematic evaluation of the system is needed for a range of different

speakers in different acoustic environments.
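The post-processing of the Fc track might look like this: a 3-point median (as currently employed) followed by the linear smoothing suggested above. The (0.25, 0.5, 0.25) weights, the edge-repeating padding and the function name are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def smooth_fc(track, linear_weights=(0.25, 0.5, 0.25)):
    """3-point median smoothing of a per-frame Fc track (Hz), optionally
    followed by linear smoothing with a short symmetric kernel."""
    x = np.asarray(track, dtype=float)
    # median of each frame and its two neighbours (edge frames repeated)
    med = np.median(sliding_window_view(np.pad(x, 1, mode="edge"), 3), axis=1)
    if linear_weights is None:
        return med
    w = np.asarray(linear_weights, dtype=float)
    w /= w.sum()                              # unit gain; kernel assumed symmetric
    half = len(w) // 2
    return np.convolve(np.pad(med, half, mode="edge"), w, mode="valid")
```

`smooth_fc([4000, 4000, 500, 4000, 4000])` removes the isolated low frame entirely, while `smooth_fc([1000, 1000, 4000, 4000])` returns [1000, 1750, 3250, 4000], spreading an abrupt jump over two frames. Passing `linear_weights=None` gives the median stage alone.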

Concluding remarks

The system developed here has proved to be capable of producing speech that shows

some improvements over conventional resynthesis. It is computationally expensive on the VAX (circa 120 times real time), but most of this time is spent in simulating a

19-channel filter bank.

The routines used to estimate Fc do not use any absolute thresholds and so should

be more resistant to changes in signal level than routines (such as the existing

voiced/voiceless decision) that do. Similarly, the routines do not use any spectral balance measures and so should be more resistant to changes in spectral balance

produced by different recording conditions and speakers. Only by testing the algorithm in a wide range of environments can it be made more robust and capable of

graceful failure in the face of a degrading input signal. In particular it would be

interesting to know whether the striation-finding algorithm can survive severe phase distortion.

There is a very obvious lack of relevant psycho-acoustic research on topics related

to the questions addressed by this program. We do not know how readily, or by what

mechanism, we can discriminate buzz-excited from noise-excited high-frequency formants in which harmonic structure is not apparent. Such work should be able to

guide a more psycho-acoustically plausible choice of parameters and mechanism than has been made in the present program.

The algorithm for estimating striation height proved to be successful at finding

the location of glottal pulses in speech known to be voiced. It could therefore be

used as a first step in applications where glottal pulses have to be marked in the

speech waveform. Such applications include pitch-synchronous analysis and synthesis, and pitch and temporal changes to natural speech.

References

Atal, B.S. and Remde, J.L. (1982) A new model of LPC excitation for producing natural-sounding speech at low bit rates. Proceedings ICASSP 1, 614-617.

Duifhuis, H., Willems, L.F. and Sluyter, R.J. (1982) Pitch measurement of speech: an implementation of Goldstein's theory of pitch perception. J. Acoust. Soc. Am. 71, 1568-1580.

Fischer-Jorgensen, E. and Hutters, B. (1981) Aspirated stop consonants before low vowels, a problem of delimitation, its causes and consequences. Annual Report of the Institute of Phonetics, University of Copenhagen 12, 77-102.

Fujimura, O. (1968) An approximation to voice aperiodicity. IEEE Trans. Audio and Electroacoustics AU-16, 68-72.

Holmes, J.N. (1973) The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer. IEEE Trans. Audio and Electroacoustics AU-21, 298-305.

Makhoul, J., Viswanathan, R., Schwartz, R. and Huggins, A.W.F. (1978) A mixed-source model for speech compression and synthesis. J. Acoust. Soc. Am. 64 (6), 1577-1581.

The role of accentuation in comprehension. A first test

J.M.B. Terken

Introduction

As part of a project concerned with the how and why of accentuation we have been

studying the factors affecting accentuation in spoken scene descriptions and in­

structions. Results of these investigations have been reported in preceding issues

of the IPO Annual Progress Report (Terken 1980, 1981). The present paper reports on

an investigation into the influence of accentuation on the listener's comprehension; the main interest is in finding out how helpful appropriate accentuation is

and how disadvantageous inappropriate accentuation is to the listener's comprehension.

From the production experiments we derived the following working hypothesis about

the function of accentuation. When a speaker does not accent a verbal expression which, on syntactic grounds, would be expected to be accented, he signals to the

listener that he is assumed to have the interpretation-in-context available. As a result, the class of potential interpretations of the expression is narrowed down

to those immediately available, facilitating the interpretative process on the part of the listener.

For instance, if a speaker utters (1),

(1) we went to the theatre yesterday

the addressee will have to compute the interpretation of the referring expression

'the theatre'. But if the utterance is embedded in a mini-conversation, as in example (2),

(2) (we do not often visit the theatre)

but we went to the theatre yesterday

when the expression is mentioned for the second time, the addressee does not have

to search for a new interpretation, but he may immediately relate it to the interpretation

computed earlier. Our hypothesis is that de-accentuation is one means of signalling the listener to do so, and that accentuation of the second instance of

'theatre' would lead the listener to analyse the expression again and compute an interpretation anew.

An important decision concerns how to test this hypothesis. One might, for instance,

consider whether inappropriate accentuation affects speech intelligibility, but as we can make an utterance perfectly understandable while maintaining the inappropriate

distribution of accents¹, we prefer another measure. Since the passage of time is a necessary aspect of information processing, we may consider whether

processing inappropriate accentuation takes longer than processing appropriate accentuation, by obtaining

some measure of the time it takes to process the information in either case.

¹ This remark should not be taken as an a priori proposition about the role of accentuation in noisy communication.

If the view outlined above is correct, we would predict that the following rela­

tions apply ( < stands for 'faster than'):

1. appropriate -accent < appropriate +accent (in the latter condition the number of

potential interpretations to be considered is greater);

2. appropriate -accent < inappropriate +accent (since inappropriate accentuation

leads the listener to consider more interpretations than strictly necessary);

3. appropriate +accent < inappropriate -accent (since inappropriate de-accentuation

leads the listener (initially) to consider fewer interpretations than necessary);

4. inappropriate +accent < inappropriate -accent (in the case of inappropriate

+accent the listener considers more possibilities than are necessary, but the

intended referent is included in the set of candidates; in the case of inappropriate

-accent the intended referent is not included in the set of candidates considered

in the first instance).

A major problem in this approach is to establish what counts as appropriate accen­

tuation and de-accentuation. Most authors suggest (and these suggestions have been

confirmed in our earlier investigations) that speakers, in determining which verbal

expressions should be marked by accents, lean heavily on the preceding linguistic

context, in particular the immediately preceding utterance. Since the precise rules

are not yet known, we shall use the following simplification. If a verbal expres­

sion has been used in the same syntactic role in the immediately preceding utter­

ance, de-accentuation is appropriate; otherwise, accentuation is appropriate. De­

parture from this rule results in inappropriate (de-)accentuation.
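The simplified working rule can be stated as a small function. The representation of an utterance as a mapping from syntactic role to expression, and both function names, are hypothetical; the sketch only encodes the rule in the paragraph above.

```python
def should_accent(expression, role, previous):
    """Working rule: de-accent an expression only if the immediately preceding
    utterance used the same expression in the same syntactic role.

    previous: hypothetical mapping from role ('subject', 'predicate') to expression.
    """
    return previous.get(role) != expression


def is_appropriate(accented, expression, role, previous):
    """True if the accent actually given (accented=True/False) follows the rule."""
    return accented == should_accent(expression, role, previous)
```

With the preceding utterance 'de p komt links boven de k', i.e. `{'subject': 'p', 'predicate': 'k'}`, an accent on 'p' as the subject of the next utterance is inappropriate, whereas an accent on 'p' used as the predicate noun is appropriate.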

Method

Task

The task for the listener was a picture-utterance verification task: the listener

watched a screen displaying a simple letter configuration. After a warning signal

one of the letters was allocated a new position relative to one of the other let­

ters. The listener heard a spoken description of the change, in the form

(1) de p komt links boven de k

(the p comes left above the k)

The listener had to decide whether the utterance contained a true description of

the change in the letter configuration or not. He was to indicate his decision by

pressing one of two buttons. After the listener had responded, again one of the

letters changed position; again the listener heard a description of the change

which had to be responded to, and so on.

Material

We prepared eighty blocks of utterances plus letter configurations. Each block con­

tained one test utterance preceded by three or four context utterances. In forty

test utterances the distribution of accents was appropriate with respect to the

preceding utterance (in context utterances the distribution of accents was always appropriate). From these forty blocks the other forty blocks, having test utterances

with inappropriate accentuation, were derived in the following way: in half of the cases the subject noun was de-accented if it was accented in the appropriate version

and accented if it was de-accented in the appropriate version; in half of the

cases the head noun of the predicate was de-accented if it was accented in the ap­

propriate version and accented if it was de-accented in the appropriate version.

Compare the following stylised pitch contours for sentence (1) (abrupt movements

represented by solid lines are prominence-lending pitch movements):

(1) de p komt links boven de k

[Figure: stylised pitch contours A, B and C for sentence (1)]

For the subject, appropriate A was replaced by inappropriate B, appropriate B was

replaced by inappropriate A; for the head noun of the predicate, appropriate A was

replaced by inappropriate C, appropriate C was replaced by inappropriate A. Thus,

each inappropriate version had its own appropriate control.

In addition to these manipulations half of the test utterances contained true de­

scriptions, the rest were false. Also, some of the context utterances contained

false descriptions; however, the context utterance immediately preceding the test

utterance was always true. Descriptions could be false in two ways. For the utterances

containing inappropriately (de-)accented subjects and their controls ('subject targets'), a different letter changed position on the screen from the one mentioned

in the spoken description; for the utterances containing inappropriately (de-)accented predicate nouns and their controls ('predicate targets'), a different letter

served as anchor point on the screen from the one mentioned in the spoken description. Consequently, false subject targets could be rejected as soon as the subject had been mentioned.

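In all we had sixteen combinations of conditions (presence or absence of accent × appropriate or inappropriate accentuation × subject or predicate target × true or false description). Within each combination we had five blocks, giving the eighty blocks in total.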

Subjects

Subjects were ten students of the Eindhoven University of Technology. They were

paid for participation.

Procedure

Subjects were tested individually. The subject was told that he would hear blocks

of utterances describing successive changes in a letter configuration displayed on

a screen. He was asked to decide as quickly as possible whether a description was

true or false, and to indicate his decision by pressing one of two buttons. Only

the reaction time and the decision for the test utterance were recorded.


The subject was seated in a sound-proof cabin. The screen was placed in front of a

window in the cabin. Viewing distance was approximately fifty centimetres. The spo­

ken descriptions were played over headphones at a comfortable level. The whole ex­

periment was computer-controlled. A series of ten blocks served as an exercise be­

fore the actual experiment.

Latency times were measured from the earliest point in the utterance where the listener

could determine the truth value of the utterance, that is, from the vowel onset of the predicate noun for true and false 'predicate targets' and for true 'subject

targets', and from the vowel onset of the subject noun for false 'subject targets'.

Results

We will not consider the data for true 'subject targets', since in this condition

decisions were taken at the end of the utterance, whereas we are interested in the

immediate effects of the presence or absence of accents on decision latencies, that

is, in the case of 'subject targets' we are interested in the processing demands in

the vicinity of the grammatical subject rather than at the end of the utterance.

For the same reason we have excluded the data for false 'subject targets' of three

subjects who always delayed their decisions to the end of the utterance. Wrong de­

cisions (11 out of 540) have also been excluded from the analyses. The results have

been analysed separately for false 'subject targets', for true 'predicate targets'

and for false 'predicate targets'. The primary statistical test used is the sign

test on matched pairs of individual measurements. Average latency times in milliseconds are given in Table I.

                                   +ACCENT   -ACCENT
a. FALSE SUBJECT TARGETS
   APPROPRIATE                        872       828
   INAPPROPRIATE                      925       749
b. TRUE PREDICATE TARGETS
   APPROPRIATE                        541       546
   INAPPROPRIATE                      597       675
c. FALSE PREDICATE TARGETS
   APPROPRIATE                        672       722
   INAPPROPRIATE                      662       774

Table I. Average latency times (ms) for (a) false 'subject targets', (b) true 'predicate targets' and (c) false 'predicate targets', as related to the presence or absence of accent, and to the appropriateness of the presence or absence of accent.

The results are as follows (the number N of pairs, the number x of pairs where the difference is in the predicted direction, and the probability under H0 are given

in parentheses, in the order false 'subject target', true and false 'predicate tar­

get', respectively):

1. no differences between appropriate -accent and appropriate +accent (N=35, x=18, p=.50; N=31, x=16, p=.50; N=27, x=18, p=.12);

2. no differences between appropriate -accent and inappropriate +accent (N=34, x=21, p=.22; N=40, x=22, p=.62; N=36, x=18, p=.50);

3. appropriate +accent is slower than inappropriate -accent for false 'subject targets', but faster for true and false 'predicate targets'

(N=31, x=9, p=.02; N=42, x=33, p=.0004; N=38, x=28, p=.006);

4. inappropriate +accent is slower than inappropriate -accent for false 'subject targets', but faster for true and false 'predicate targets'

(N=43, x=12, p=.006; N=34, x=25, p=.01; N=32, x=28, p=.0001).
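For reference, the exact sign test on matched pairs can be computed directly from the binomial distribution with p = 1/2. A minimal sketch; the report does not state whether its p values are one- or two-tailed, so both are offered, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n, x, two_sided=True):
    """Exact sign test for n untied matched pairs, x of which differ in the
    predicted direction. One-sided: P(X >= x) under H0 (p = 1/2)."""
    upper = sum(comb(n, i) for i in range(x, n + 1)) / 2**n
    if not two_sided:
        return upper
    k = max(x, n - x)                         # the more extreme tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)
```

`sign_test_p(27, 18)` gives about .12 (two-tailed), and `sign_test_p(27, 18, two_sided=False)` about .061.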

These findings are supported by the results of two-way analyses of variance. In addition, the main effect of accentuation was found to be significant in the

analyses of variance. However, -accent is faster than +accent for 'subject targets'

(F(1,83) = 7.86), but slower for 'predicate targets' (F(1,174) = 4.49 and F(1,175) = 10.4, respectively).

The distribution of errors has not been analysed, since the proportion of errors is very small (.02).

Discussion

The predictions derived in the introduction are not corroborated by the results.

Moreover, the results for subject targets are the opposite of those for predicate targets.

How, then, can these results be understood?

For predicate targets there is a clear effect: the decision latency is lengthened

when the predicate noun is inappropriately de-accented. This may be accounted for by the form of the intonation contour: we do not simply de-accent the predicate

noun; rather, the final fall is shifted from the final word to a preceding word.

The final fall generally indicates that the remainder of the utterance only

contains information the interpretation of which is already available to the

listener. Therefore a final fall on an earlier word than the last one leads the

listener to build up strong expectations about what is still to come. If these expectations are not borne out, as is the case with inappropriately de-accented

predicate targets, the listener's interpretation will be slowed down considerably.

We do not have an adequate account of the data for subject targets that is compatible

with the data for predicate targets. We will therefore postpone the interpretation

of these findings until further experimental data are available.

Summary

A verification experiment was set up to assess the effect of (A) presence or absence of pitch accent and (B) appropriateness of (de-)accentuation. Listeners watched

a change in a letter configuration and heard a description of the change immediately afterwards. The utterances were of the form 'the p moves to the right of the

k'. Appropriateness of accentuation was defined with respect to the immediately

preceding utterance, and was manipulated independently for subject and predicate

noun.

Results were the following.

(A) For false 'subject targets', decision latencies for inappropriate -accent were

shorter than for appropriate and inappropriate +accent.

(B) Inappropriately de-accenting the predicate noun led to a considerable lengthening

of decision latency, compared to appropriate and inappropriate +accent.

It is suggested that inappropriately advancing the final fall (which is what hap­

pens in the case of inappropriately de-accenting the predicate noun) may cause dis­

turbances in the course of processing, because the speaker is not continuing the

utterance in the way the listener expects him to on the basis of the preceding

pitch contour. The data for 'subject targets' have not yet been accounted for.

References

Terken, J.M.B. (1980) The distribution of pitch accents as a function of informational variables I. IPO Annual Progress Report 15, 48-53.

Terken, J.M.B. (1981) The distribution of pitch accents as a function of informational variables II. IPO Annual Progress Report 16, 39-43.

Exploring the possibilities of speech synthesis with Dutch diphones

B.A.G. Elsendoorn and J. 't Hart

Introduction

In the last decade several units have been proposed for use in computer-controlled

concatenation of speech, ranging from words (Olive and Nakatani, 1974) via

syllables (Fujimura and Lovins, 1978) and demisyllables (Browman, 1980) to diphones

(Emerard and Lasseur, 1974). An advantage of the latter two approaches is that only

a fairly limited number of units will suffice to synthesize any possible speech ut­

terance.

Up to now, research on speech concatenation by means of diphones for the Dutch language

has only been rudimentary. The present research on speech synthesis using

Dutch diphones therefore started with an exploratory stage to investigate whether

these units were likely to give satisfactory results.

Material, segmentation, storage

To make sure that the diphones were as clearly and neutrally enunciated as possi­

ble, the words out of which they were to be segmented were nonsense words of the

general /CaCVCa/ type, where the second syllable was stressed and the three consonants

were identical. The V(owel) stands for any of the fifteen Dutch vowels or

diphthongs. The words were embedded in a carrier phrase and bore sentence accent.

The material was read out by a male native speaker of Dutch in a sound attenuating

booth and recorded onto tape, using high quality equipment. Subsequently the mate­

rial was digitised with a sampling frequency of 10 kHz and the nonsense words were

segmented and LPC-analysed. Out of the stressed syllable two diphones were segment­

ed, one CV-diphone (half consonant, half vowel) and one VC-diphone (half vowel,

half consonant). The border between the two diphones lay halfway through the vowel, except

for diphthongs, which were segmented before the transition. Consonants other than

plosives were segmented halfway, where the amplitude reached its lowest value. The

boundary in stop consonants was the point at which the noise burst started. This

implies that, e.g., a /pV/-diphone starts with the noise burst, whereas a /Vp/-diphone finishes with the silent interval.
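The segmentation rules for a stressed C V C can be summarised as a hypothetical helper working on per-frame amplitudes and hand-labelled segment spans (start and end frame indices). The span format and function names are assumptions, and the diphthong rule (split before the transition) is not modelled.

```python
import numpy as np


def consonant_split(amplitude, start, end, burst=None):
    """Split point inside a consonant spanning frames [start, end).
    Plosives split at the burst onset (frame index `burst`); other
    consonants at the frame of lowest amplitude."""
    if burst is not None:
        return burst
    return start + int(np.argmin(amplitude[start:end]))


def cv_vc_diphones(amplitude, c1, vowel, c2, bursts=(None, None)):
    """c1, vowel, c2: (start, end) frame spans of the stressed C V C.
    Returns the (start, end) spans of the CV- and VC-diphones."""
    left = consonant_split(amplitude, *c1, burst=bursts[0])
    right = consonant_split(amplitude, *c2, burst=bursts[1])
    mid = (vowel[0] + vowel[1]) // 2          # halfway through a monophthong
    return (left, mid), (mid, right)
```

For amplitudes [5, 3, 1, 2, 6, 8, 9, 8, 6, 2, 1, 3] with spans (0, 4), (4, 9) and (9, 12), `cv_vc_diphones` returns the CV-diphone (2, 6) and the VC-diphone (6, 10).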

The diphones were stored on disk in 10 ms frames, containing information about

loudness, source sound, five formants with quality factors, and F0 (fixed at 100

Hz). There was no reduction in steady-state parts of vowels and consonants. This

resulted in a set of about 450 diphones.

First attempts at synthesis

With this set only a limited number of utterances could be generated, since conso­

nant clusters could not be handled yet. The main goal of these first attempts was

to find out whether this approach was likely to yield any satisfactory results.

The discontinuities in the middle of vowels and consonants were not as great as

they were expected to be and passed largely unnoticed, even though no smoothing

algorithm was applied across diphone boundaries.

Fundamental frequency was semi-automatically superimposed with the aid of a compu­

ter program in which the most frequently occurring Dutch intonation patterns are

stored.
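The storage format and the concatenation step can be sketched as follows. The field names, the boolean voicing source and the per-frame F0 list are assumptions; the stored intonation patterns of the actual program are not modelled, so the F0 contour is simply passed in, one value per 10 ms frame.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """One 10 ms frame of a stored diphone (field names are hypothetical)."""
    loudness: float
    voiced: bool                                  # source sound, assumed buzz/noise
    formants: Tuple[Tuple[float, float], ...]     # five (frequency Hz, Q) pairs
    f0: float = 100.0                             # stored at a fixed 100 Hz


def concatenate(diphones: Sequence[Sequence[Frame]],
                f0_contour: Sequence[float]) -> List[Frame]:
    """Join diphones frame by frame (no smoothing at the boundaries) and
    impose an F0 contour given as one value per 10 ms frame."""
    frames = [f for d in diphones for f in d]
    if len(f0_contour) != len(frames):
        raise ValueError("need one F0 value per frame")
    return [replace(f, f0=float(p)) for f, p in zip(frames, f0_contour)]
```

Because the frames are immutable, the stored diphones keep their fixed 100 Hz F0 and can be reused for any number of utterances.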

Extension of the inventory

Our experiences with these attempts gave reason to be optimistic and to continue

working towards a complete diphone inventory for Dutch. For this purpose new words were recorded, produced in isolation and containing

consonant clusters. This time use was made of existing words. The analysis and seg­

mentation procedures were identical to the ones followed in the case of cv- and

VC-diphones. In those cases where the consonant cluster could occur within a sylla­

ble as well as across a syllable boundary, both varieties were included in the in­

ventory (e.g. the /mp/-diphone as in Eng.: 'lamp' vs. 'lam/poon', DU.: 'lamp' vs.

'lam/pekap' ) .

Segmentation of /h/-diphones caused more difficulties, as the Dutch /h/ is a very special phoneme. Unlike its English counterpart, it is a voiced fricative, and because of this voiced character the influence of surrounding phonemes is more easily perceived. The /h/-diphones therefore could not be segmented by the same criteria: the frication noise was influenced by the surrounding phonemes to such an extent that concatenating two /h/-diphones yielded unsatisfactory results. We therefore opted for a so-called triphone, in which the complete /h/-sound is preserved, preceded and followed by half a phoneme.

The inventory also contains vowel diphones consisting of the second half of a vowel, followed by a glottal stop, followed by the first half of a vowel (e.g. Eng. 'the author', Du. 'na/apen'). The total number of diphones now amounts to about 1200 and, with a few exceptions (viz. diphones containing loan phonemes and a small number of infrequently occurring phonemes), any Dutch utterance can be concatenated.
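The choice of units described above can be made concrete with a small sketch that turns a phoneme string into the list of stored units to be concatenated. It is an illustration under stated assumptions, not the IPO program: the phoneme symbols, the `#` silence marker and the key format `a-n` (diphone) or `a-h-a` (/h/-triphone) are hypothetical, and syllable-boundary variants and glottal-stop vowel diphones are left out.

```python
def concatenation_units(phonemes):
    """Map a phoneme sequence to hypothetical unit keys.

    Ordinary units are diphones such as "a-n" (second half of /a/ plus
    first half of /n/). /h/ is never cut: it is covered by a triphone
    such as "a-h-a" (half /a/, complete /h/, half /a/).
    """
    seq = ["#"] + list(phonemes) + ["#"]  # '#' marks silence at both ends
    units = []
    i = 0
    while i < len(seq) - 1:
        left, right = seq[i], seq[i + 1]
        if right == "h":
            # one unit runs from mid-left phoneme through /h/ to mid-next phoneme
            units.append(f"{left}-h-{seq[i + 2]}")
            i += 2
        else:
            units.append(f"{left}-{right}")
            i += 1
    return units


if __name__ == "__main__":
    # Dutch 'hand' as /h a n t/ (illustrative symbols)
    print(concatenation_units(["h", "a", "n", "t"]))
```

For 'hand' this yields `['#-h-a', 'a-n', 'n-t', 't-#']`: the /h/ stays whole, while every other phoneme is split at its midpoint between two units.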

Diphone synthesis: some disturbing phenomena

The synthesized speech is now being scrutinized in detail to locate disturbances in the speech signal. Discontinuities in the amplitude curve appear to be much more disturbing to the ear than sudden jumps in the formant trajectories. In some places the temporal organisation of the synthesized speech leaves much to be desired: the lack of a durational structure reduces naturalness. However, naturalness seems to increase by leaps and bounds when the speech rate is increased by 20% (this is possible without affecting the fundamental and formant frequencies). It should be noted that these are only impressionistic data, as no perception and acceptability tests have yet been performed.
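Because the diphones are stored as 10 ms frames of synthesizer parameters rather than as waveforms, the speech rate can be changed by resampling the frame sequence while leaving the parameter values, and hence the formant and fundamental frequencies, untouched. The sketch below assumes the frames are held in a NumPy array with one row per frame; `speed_up` is a hypothetical helper, and linear interpolation between frames is our choice, not necessarily the method used here.

```python
import numpy as np

FRAME_MS = 10.0  # frame period of the stored diphones


def speed_up(frames, rate):
    """Resample a (n_frames, n_params) parameter track so that it plays
    `rate` times faster at the same frame period.

    Each parameter column (e.g. a formant frequency or F0) is linearly
    interpolated in time, so its values are not scaled, only re-timed.
    """
    frames = np.asarray(frames, dtype=float)
    n_in = frames.shape[0]
    n_out = max(1, int(round(n_in / rate)))
    t_in = np.arange(n_in)
    t_out = np.linspace(0.0, n_in - 1, n_out)  # output positions in input-frame units
    return np.column_stack(
        [np.interp(t_out, t_in, frames[:, j]) for j in range(frames.shape[1])]
    )
```

With `rate=1.25`, for example, a 50-frame (500 ms) track becomes 40 frames (400 ms) long, while a constant 100 Hz F0 track stays at 100 Hz.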

Summary and conclusions

A setup has been built for computer-controlled concatenation of speech using Dutch diphones. The resulting speech is easy to understand and of remarkably good quality, considering that no smoothing algorithm or rules for producing correct durations were applied. We therefore conclude that the outlook for a diphone-based synthesis system for Dutch is promising. We should, however, be careful not to be over-optimistic, as the results might be flattered by the use of this particular speaker. We do not know what the results will be like when a second speaker is used to build up a diphone inventory.

In the coming year we shall concentrate on this problem of speaker dependency and on temporal aspects. It is hoped that a correct durational structure will lead to more natural synthesized speech.

References

Browman, C.P. (1980) Rules for demisyllable synthesis using Lingua, a language interpreter. IEEE-ASSP, 561-564.

Emerard, F. and Larreur, D. (1976) Synthèse par diphones. Recherches/Acoustique CNET, Lannion III, 293-314.

Fujimura, O. and Lovins, J.B. (1978) Syllables as concatenative units. In: A. Bell and J.B. Hooper (eds), Syllables and Segments. North-Holland Pub. Co.

Olive, J.P. and Nakatani, L.H. (1974) Rule synthesis of speech by word concatenation: a first step. J. Acoust. Soc. Am. 55, 660-666.


Visual Perception and Reading


Developments

J.A.J. Roufs

This year several new projects dealing with simple or complex perceptual attributes of vision were started. Our research on simple perceptual attributes is characterised by the general theme of luminance contrast transfer, while complex attributes are involved in our work on image quality and reading.

Luminance contrast transfer

- In the Inter-faculty Working Group on Retina Models, a cooperation with the Electrotechnical and Physics Departments of the Eindhoven University of Technology, we suffered a severe loss through the untimely death of Dr Henk van Ouwerkerk. He was a theorist in the Physics Department and dealt in particular with the problems connected with the partial differential equations found in continuous-layer models of the retina. We will remember him with respect and miss his cordial friendship.

- Continuous media governed by partial differential equations of a mixed diffusion and wave type, as well as discrete spatio-temporal models, were studied. Phase behaviour, studied psychophysically, was also analysed (Van Ouwerkerk, Van Aalst, Du Buf, Piceni and Roufs).

- A new project this year is 'Suprathreshold brightness contrast of foveal patterns' (Du Buf). This project is supported by ZWO (Netherlands Organisation for the Advancement of Pure Research). Its aim is to study how the (subjective) brightness contrast of small foveal stimuli, such as letters or symbols, develops as a function of luminance. Sharpness, area and background level are important parameters.

- A project on the prediction of detection thresholds of spatial patterns, based on the point spread functions of the eye, has been concluded. The results of this ZWO project have been published in part; other publications are in preparation (Blommaert).

- The work on spatio-temporal visual responses at threshold level elicited by TV line increments has been continued with several undergraduate students. Some recent results are reported in this issue (Roufs and Polstra). In the context of this work some systematic research on the Van Meeteren grating paradox has been done. Van Meeteren found that a thin bright line crossing a luminance grating perpendicularly is seen in the luminance troughs at low spatial frequencies and at the peaks at high frequencies. Our results confirm his findings and suggest that the interaction between the responses to line and grating causes the phenomenon at high spatial frequencies (Bakermans, final graduation study).

- Brightness dynamics is a new (ZWO) project. Judgement of the brightness of time-dependent stimuli is studied with scaling and matching techniques, aiming at quantitative dynamic models (De Ridder, Theelen). This builds on earlier work published by this laboratory.

Image quality

- A new project has been started called 'Design criteria for Visual Display Units to be used for intensive information transfer and investigated in connection with visual comfort' (Boschman, Leermakers). It is made possible by a grant from the Foundation of Technical Sciences. The effects of contrast, sharpness, font and other parameters of text and alphanumeric symbols on the ease of human information intake from VDUs are investigated by psychometric methods and by analysis of eye movements, measured with an SRI eyetracker based on Purkinje images (see pictures).

The SRI Dual Purkinje Image Eyetracker in use. The lower picture is an example of processed display text, the square dot indicating the eye position of the subject.


ERRATUM

Due to an error in the printer's office, the lower picture on page 68 has been interchanged with the lower picture in Fig. 2 on page 72.


- Subjective sharpness has been studied in relation to spatial contrast transfer and the luminance (tone) distribution function, using electronic image processing (Saelmans, final graduation study; Roufs). The fast 2D spatial filtering algorithms were obtained thanks to the cooperation of Dr R. Eising of the Mathematics Department. A second series of improved experiments has been finished and will be reported on later. The results of metric and non-metric scaling are compared with sharpness matchings (see also Roufs, Soons and Eising, this issue).

- Experiments on scaling and matching of the subjective quality of projected images differing in area and luminance have been extended by introducing sharpness as a parameter. Sharpness was found to interact with both other parameters (Boesten).

- The flicker fusion frequency of display monitors has been studied in relation to luminance, distance and age of observers. The connection with the De Lange flicker fusion characteristics of a homogeneous flickering field has been established (Van der Zee, Roufs). Results of earlier work can be found in this issue (Van der Zee and Van der Meulen).

- A new project has been [...] evaluation of different kinds of [...] conversions in cooperation with [...] Research Laboratory (Van der Zee, Van de Polder).

Reading processes

- The processing of [...] recognition is a new project, following up earlier [...] three-to-five letter words. The current aim is to extend [...], developed earlier, to longer composite words (Bouwhuis [...]).

- Continuing earlier [...] reported by Van Nes and Jacobs (1981), we started studying the [...] under reduced-contrast conditions. For further details, see [...] this issue.

- The work on selecti[...] recognition has been terminated, as is the case with the '[...] the Elderly' (Bouma, Melotte, Zabel).

References

Bouwhuis, D.G. and Bu[...] [...]ition and Communication, Developments. IPO Annual Progress Report [...].

Nes, F.L. van and Jacobs, [...] (1981) [...] effect of contrast on letter and word recognition. IPO Annual Progress Report 16, 72-80.

Roufs, J.A.J. and Polstra, [...] (1982) [...]and-edge-spread functions of the visual system elicited by [...] IPO Annual Progress Report 17 (this issue).

Roufs, J.A.J., Soons, A.A.G. and Eising, R. (1982) Some experiments on sharpness in relation to contrast bearing on electronic optical imaging. IPO Annual Progress Report 17 (this issue).

Zee, E. van der and Meulen, A.W. van der (1982) The influence of field repetition frequency on the visibility of flicker on displays. IPO Annual Progress Report 17 (this issue).


Some experiments on sharpness in relation to contrast bearing on electronic optical imaging

J.A.J. Roufs, A.A.G. Soons*) and R. Eising**)

Introduction

Subjective image quality is an important criterion for the design of electronic optical imaging devices. It is used here in the sense of the ability to reproduce reality in a satisfactory or pleasing way (Roufs and Bouma, 1980); performance-oriented aspects will not be discussed in this report. Image quality is known to have several subjective dimensions (Nakayama, 1975, 1980). Subjective sharpness, hereafter simply called sharpness, is a relatively important one. The main purpose of the present investigation is to obtain quantitative information on this dimension. Since spatial detail is its dominant parameter, (psychological) scaling of sharpness as a function of spatial frequency content is one of the obvious approaches.

Other parameters are the mean luminance and the luminance reproduction function. The latter may be characterised by plotting the log luminance of the screen against the log luminance of the scene, the electronic counterpart of the tone reproduction function in photography (Hurter-Driffield curve). Usually a considerable part of the middle range of these plots can be approximated by a straight line, enabling one to use its slope γ as a characteristic number. This slope, however, is also an important parameter with respect to (subjective) contrast. It has long been known that, to obtain a good picture, the luminance reproduction should generally not be linear. In many cases γ should actually be greater than one, and different for imaging devices viewed under different conditions. The ideal picture is claimed to be rendered if, under the given conditions, the (subjective) brightness distribution is a perfect copy of the brightness distribution of the scene as observed under daylight conditions (Nelson, 1977; Bartleson and Breneman, 1967). To ensure this, γ should for instance be about 1.2 for TV and 1.6 for slides under the usual conditions. The exponent γ, however, also affects the impression of sharpness. In this article we report on some pilot experiments showing the effect of γ on the scaling of sharpness mentioned above.
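To illustrate how γ is read off such a log-log plot, the sketch below fits a straight line by least squares to the middle part of the luminance reproduction curve. The function name and the choice of the 25 to 75% quantile window as "middle range" are our assumptions; the article does not say how the straight part was delimited.

```python
import numpy as np


def estimate_gamma(l_scene, l_screen, lo=0.25, hi=0.75):
    """Slope of log10(screen luminance) against log10(scene luminance),
    fitted over the middle part (quantiles lo..hi) of the log scene range."""
    x = np.log10(np.asarray(l_scene, dtype=float))
    y = np.log10(np.asarray(l_screen, dtype=float))
    x_lo, x_hi = np.quantile(x, [lo, hi])
    mid = (x >= x_lo) & (x <= x_hi)  # assumed 'straight' middle range
    slope, _intercept = np.polyfit(x[mid], y[mid], 1)
    return float(slope)
```

For an exact power law such as L_screen = 0.5 · L_scene^1.2 the fitted slope is 1.2; for real curves with toe and shoulder regions the result depends on the window chosen.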

Fig. 1 shows the experimental setup schematically. The scenes are reproduced on slides and converted into signals by a flying-spot scanner. After digitisation the signal is stored in a 512×512-pixel frame memory. The digitised image, i.e. the pixel information, can be modified by algorithms run on a minicomputer, and the processed image can be stored again in the memory, on a disk or on professional video tape. Finally the 512×512 image is displayed on a high-quality monitor. A second monitor is available to provide a reference image if required.

The picture on the display is 27 cm high and 30 cm wide. The viewing distance is 240 cm, which implies that the maximum spatial frequency in the horizontal direction is 36 c.dg⁻¹.
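The quoted 36 c.dg⁻¹ can be checked from the display geometry, assuming (our assumption) that all 512 pixels span the 30 cm picture width and that the limit is the Nyquist frequency, i.e. half the number of pixels per degree of visual angle:

```python
import math


def max_spatial_frequency(width_cm, distance_cm, pixels):
    """Nyquist limit in cycles per degree: half the pixels per degree of
    visual angle subtended by the picture width at the viewing distance."""
    angle_deg = math.degrees(2.0 * math.atan(width_cm / (2.0 * distance_cm)))
    return (pixels / angle_deg) / 2.0


if __name__ == "__main__":
    print(round(max_spatial_frequency(30.0, 240.0, 512), 1))  # 35.8
```

The picture subtends about 7.15° horizontally, giving about 71.6 pixels per degree and a limit of about 35.8 c.dg⁻¹, consistent with the rounded value in the text.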

*) Now at PTT Headquarters, The Hague

**) Department of Mathematics and Computer Science, Eindhoven University of Technology

IPO annual progress report 17 1982

[Figure: block diagram linking slide, flying-spot scanner, picture storage device, computer with terminal (experimenter), video recorder, and the two monitors viewed by the subject.]

Fig. 1. Schematic representation of the apparatus.

The luminance of the empty screen is 100 cd.m⁻². The mean luminance of the scenes is about 20 cd.m⁻². The results shown were obtained with two scenes that differ considerably in spatial spectrum and grey-level distribution, as demonstrated in Fig. 2.

At the picture storage device, the pixel signal is a reasonable approximation of a power function of the scene luminance. The stored image is filtered digitally in two spatial dimensions by a linear, shift-invariant, second-order recursive digital filter (Eising, 1980). The algorithms have been modified to perform noncausal filtering. In the case described here, low-pass filters are used whose modulation transfer functions closely approximate a Gaussian. The point-spread function is a rotation-symmetric Gaussian function, that is

$$\mathrm{PSF}(x,y) = \frac{\pi f_c^2}{\ln 2}\,\exp\!\left(-\frac{\pi^2 f_c^2\,(x^2+y^2)}{\ln 2}\right) \qquad (1)$$

Here f_c is the 6 dB spatial cutoff frequency, i.e. the frequency at which the corresponding modulation transfer function, exp(-ln2·(f/f_c)²), has fallen to one half.
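The article's filter is a recursive 2D design (Eising, 1980), which we do not attempt to reproduce. As a stand-in, the sketch below applies a Gaussian low-pass of the same kind in the frequency domain with NumPy's FFT, assuming the MTF form exp(-ln2·(f/f_c)²), which falls to one half (-6 dB) at f_c. The function name and the `pixels_per_degree` parameter (image scale in pixels per degree of visual angle) are ours, and the FFT implies periodic image boundaries.

```python
import numpy as np


def gaussian_lowpass(image, fc, pixels_per_degree):
    """Low-pass filter with MTF(f) = exp(-ln2 * (f / fc)**2), so the
    response is 0.5 (-6 dB) at f = fc (fc in cycles per degree).

    FFT-based (periodic boundaries), not the recursive filter of the text.
    """
    image = np.asarray(image, dtype=float)
    ny, nx = image.shape
    fy = np.fft.fftfreq(ny) * pixels_per_degree  # cycles/sample -> c/deg
    fx = np.fft.fftfreq(nx) * pixels_per_degree
    f_squared = fy[:, None] ** 2 + fx[None, :] ** 2
    mtf = np.exp(-np.log(2.0) * f_squared / fc ** 2)  # MTF(0) = 1 keeps the mean
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mtf))
```

A grating exactly at f_c comes out with half its original amplitude, and a uniform field passes unchanged.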

The filtered signal is then transformed again, pointwise, by a power function whose exponent can be varied (no homomorphic filtering is applied in the present case). A γ-dependent correction factor keeps the mean signal value over the image constant to within 20%.
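The exact γ-dependent correction factor is not given. As a simplified sketch under that caveat, the factor can be computed per image so that the mean is preserved exactly after the pointwise power transform; the function name, the 8-bit full scale and the absence of clipping are our assumptions.

```python
import numpy as np


def power_transform(signal, exponent, full_scale=255.0):
    """Raise the normalised signal to `exponent`, then rescale so that the
    mean over the image is unchanged.

    Values are not clipped, so results may exceed full_scale; clipping
    them (as a real display chain must) would shift the mean again.
    """
    s = np.asarray(signal, dtype=float) / full_scale
    out = s ** exponent
    mean_out = out.mean()
    if mean_out > 0:
        out *= s.mean() / mean_out  # exact, image-specific correction factor
    return out * full_scale
```

The need to stay within the display range is one plausible reason why a practical system holds the mean only approximately constant.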

[Figure: the two test scenes, each with a histogram of grey values (0 to 255).]

Fig. 2. Two scenes used in the experiments. The histograms show the grey-level distribution.

The monitor is the last stage; its luminance output is a power function of the input signal, with exponent 2.4. As a result, the overall luminance reproduction function is, to a good approximation, given by

$$L_{\text{display}} = a\,L_{\text{scene}}^{\gamma} \qquad (2)$$

where γ can be varied within certain limits.
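Why the chain of stages again yields a single power law can be seen by composing them. Let the stored pixel signal be $s = b\,L_{\text{scene}}^{g_s}$, the digital transform $t = c\,s^{k}$ with variable exponent $k$, and the monitor output $L_{\text{display}} = d\,t^{2.4}$. The symbols $b, c, d, g_s, k$ are introduced here for illustration only and do not appear in the article.

$$
L_{\text{display}} = d\,\Bigl(c\,\bigl(b\,L_{\text{scene}}^{g_s}\bigr)^{k}\Bigr)^{2.4}
= \underbrace{d\,c^{2.4}\,b^{2.4k}}_{a}\;L_{\text{scene}}^{\,2.4\,k\,g_s},
\qquad \gamma = 2.4\,k\,g_s .
$$

For a fixed storage exponent $g_s$, the adjustable digital exponent $k$ therefore sets the overall $\gamma$.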

Methods

Sharpness is determined by category scaling, using the numbers 0 to 10 to express its magnitude. In every session the subjects are shown a stimulus tape lasting 25 min. Five values of cutoff frequency f_c and five values of γ are used, and each of the 25 combinations is repeated 3 times. This gives 75 stimuli, shown pseudorandomly in 3 Latin squares. Every picture is observed for 15 seconds. The pauses between pictures last 5 seconds and are filled with a homogeneous field of mean picture luminance in order to maintain the adaptive state. The tape was repeated 3 times, giving 9 judgements for each combination.
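One way to generate such a pseudorandom sequence is sketched below. How rows, columns and entries of the squares were assigned in the actual experiment is not stated, so the mapping used here (rows = f_c levels, cell entries = γ levels, columns = successive blocks of five presentations) and all names are assumptions. Because every row of a Latin square contains each entry once, each square covers all 25 {f_c, γ} combinations exactly once.

```python
import random


def latin_square(n, rng):
    """Random n x n Latin square: permute rows and columns of a cyclic square."""
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    return [[base[r][c] for c in cols] for r in rows]


def presentation_order(fc_values, gamma_values, n_squares=3, seed=None):
    """Stimulus list of (fc, gamma) pairs built from n_squares Latin squares.

    Assumed mapping: row = fc index, cell entry = gamma index, and each
    column gives one block of presentations (rows shuffled within a block).
    """
    rng = random.Random(seed)
    n = len(fc_values)
    if len(gamma_values) != n:
        raise ValueError("a Latin square needs equal numbers of fc and gamma levels")
    order = []
    for _ in range(n_squares):
        square = latin_square(n, rng)
        for col in range(n):
            for row in rng.sample(range(n), n):
                order.append((fc_values[row], gamma_values[square[row][col]]))
    return order
```

With five levels of each parameter this gives 75 presentations in which every combination occurs exactly three times; at 15 s per picture plus 5 s pause that is 25 min, matching the tape length.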

Matching is done by displaying video-recorded references on one monitor and computer-controlled images of the same scene on the other. The reference has a certain {f_c, γ} combination. The test generally has a different f_c, its γ being changed according to a double staircase method. The subject is instructed to say yes when the test scene appears sharper. In the initial phase γ is changed in steps of d = 0.05; in the measuring phase the steps are d = 0.01. The direction of change in γ is reversed after 2 identical responses. 36 trials are used, including the initial 6. The values of γ found at the points where the direction is reversed are averaged.
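The staircase rule can be read in more than one way; the sketch below implements one reading, stated as an assumption: a "yes" (test looks sharper) calls for lowering γ, a "no" for raising it, and the direction is only reversed after two successive responses calling for the opposite direction. The two tracks of the double staircase are run one after the other rather than interleaved, and the observer is a hypothetical function.

```python
def staircase(test_looks_sharper, start, n_trials=36, n_initial=6,
              big_step=0.05, small_step=0.01):
    """One staircase track on gamma; returns gamma at measuring-phase reversals.

    test_looks_sharper(gamma) -> bool stands in for the subject's answer.
    """
    gamma, direction, pending = start, 0, 0
    reversals = []
    for trial in range(n_trials):
        step = big_step if trial < n_initial else small_step
        wanted = -1 if test_looks_sharper(gamma) else +1
        if direction == 0:
            direction = wanted                 # first response sets the direction
        elif wanted != direction:
            pending += 1
            if pending == 2:                   # two identical responses: reverse
                direction, pending = wanted, 0
                if trial >= n_initial:
                    reversals.append(gamma)
        else:
            pending = 0
        gamma += direction * step
    return reversals


def matched_gamma(test_looks_sharper, low_start, high_start):
    """Double staircase: one track from below, one from above; average all
    reversal points."""
    points = (staircase(test_looks_sharper, low_start)
              + staircase(test_looks_sharper, high_start))
    return sum(points) / len(points)
```

With a noiseless simulated observer whose match lies at γ = 1.0, both tracks settle into oscillations of a few small steps around 1.0, and the averaged reversal points land close to it.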

[...] acuity and are between 25 and 30 years old.

[...] of sharpness S are shown in Figs 3a, 3b and 4a [...] function of spatial cut-off frequency. There are [...] subjects. The scaled values turn out to be sensi[...]

[Figure: category-scale sharpness S (0 to 10) against log f_c (log c.dg⁻¹, 0.8 to 1.6), one curve per value of γ; panels for subjects FB and RB, scene 1.]

Fig. 3. [...] sharpness as a function of the spatial cut-off [...] scene 1 by 2 subjects. The gamma of the luminance [...] here.

[Figure: as Fig. 3, for subject TS and two scenes.]

Fig. 4. [...] two quite different scenes by one subject. Examples of iso-sharpness lines are drawn horizontally.


There is surprisingly little difference between the scenes, as may be appreciated by comparing Figs 4a and 4b. Horizontal lines through the scale values are iso-sharpness lines. Their intersections with the family of curves for different values of γ yield a set of (γ, f_c) combinations, from which iso-sharpness curves can be plotted in the (γ, f_c) domain. These curves are shown in Figs 5a and 5b (drawn lines).

However, if different (γ, f_c) combinations are perceived as equally sharp, it must also be possible to match them. The data from the matching experiments are also shown in Figs 5a and 5b (dashed curves).
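The construction of iso-sharpness points from the scaling curves can be sketched as a simple interpolation: for each γ, find the log f_c at which the scale value S reaches a chosen level. The function name, the dictionary layout and the use of linear interpolation are our assumptions, and the example numbers in the usage note are invented, not the measured data.

```python
import numpy as np


def iso_sharpness_points(log_fc, scale_values, level):
    """Return {gamma: log fc} where the scale value S reaches `level`.

    scale_values maps gamma -> S values on the log_fc grid. S is assumed to
    increase with fc (np.interp requires increasing x). Curves that never
    reach the level are omitted.
    """
    points = {}
    for gamma, s in scale_values.items():
        s = np.asarray(s, dtype=float)
        if s.min() <= level <= s.max():
            points[gamma] = float(np.interp(level, s, log_fc))
    return points
```

With log f_c = [0.8, 1.0, 1.2, 1.4, 1.6], a curve S = [1, 3, 5, 7, 9] for γ = 0.8 and S = [3, 5, 7, 9, 10] for γ = 1.2, the level S = 5 gives log f_c = 1.2 and 1.0 respectively: the higher γ reaches the same sharpness at a lower cut-off frequency, which is the kind of trade-off the iso-sharpness curves display.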

[Figure: iso-sharpness curves in the (log f_c, γ) plane, log f_c (log c.dg⁻¹) from 0.8 to 1.6 and γ from 0 to 1.6, labelled with scale values S; subject TS, scene 1 (a) and scene 2 (b).]

Fig. 5. Iso-sharpness matchings (dashed curves) in relation to iso-sharpness curves derived from category scaling (drawn curves). One subject, two scenes.

Discussion and conclusion


Although category scaling is a fast measuring technique, its validity is not easy to estimate. First, it is hard to instruct subjects adequately as to the relevant perceptual attribute, and even then one is not sure which criterion they actually use. For instance, one of the subjects (RB) stated that she had the feeling she sometimes used quality as a criterion. Second, there is always the question: does the judgement properly reflect perceptual strength? On the other hand, there is no substantial difference between subjects, and the small difference between the scenes is encouraging.

In the matching experiments one meets analogous criterion problems, since sharpness has to be matched when contrast is clearly different. Nevertheless, subjects seem to be able to do it. Moreover, in view of these difficulties, the iso-sharpness curves stemming from the two experiments agree remarkably well. This suggests at least a consistent monotonic mapping from perception onto judgement in the case of category scaling. To obtain more information on the validity of the category scale, one has to compare different scaling techniques (experiments we performed on this subject will be published elsewhere).

A detailed analysis of both types of curves shows that, within experimental uncertainty, they are consistent with Nakayama's results.

The results of Fig. 5 suggest a certain trade-off between bandwidth and γ. To judge this fully, however, more factors than have been treated here have to be taken into account. The same applies if we want to explain the shapes of the curves of Fig. 5 in terms of fundamental properties of contrast transfer.

Summary

Sharpness, an important psychological dimension of image quality, has been scaled as a function of spatial cut-off frequency. The luminance reproduction characteristic γ turned out to be a sensitive parameter. Iso-sharpness curves obtained from these scale values are found to be consistent with matching curves obtained from equal-sharpness settings between different combinations of cut-off frequency and gamma. A certain trade-off between sharpness and contrast has been observed.

References

Bartleson, C.J. and Breneman, E.J. (1967) Brightness perception in complex fields. J. Opt. Soc. Am. 57, 953-957.

Eising, R. (1980) 2D systems: an algebraic approach. Math. Centre Tracts 125, Amsterdam.

Nakayama, T. (1975) Picture quality and subjective evaluation. In: Fundamentals of image electronics, Corona Book Co., Japan, 294-352.

Nakayama, T. (1980) Evaluation of displayed image quality. Proc. Soc. Int. Displ. Engn. [...] (3), 180-181.

Nelson, C.N. (1977) The reproduction of tone. In: Neblette's Handbook of photography and reprography, 234-246.

Roufs, J.A.J. and Bouma, H. (1980) Towards linking perception research and image quality. Proc. SID [...] (3), 247-269.


The influence of field repetition frequency on the visibility of flicker on displays

E. van der Zee and A.W. van der Meulen*)

Introduction

Paper and parchment have always been easier to bleach than to darken. Perhaps this is why we have become so familiar with texts with dark characters on a light background, and why we seem to read texts in the reversed mode less easily (Bauer and Cavonius, 1980). On visual displays, however, it has become common practice to use light characters on a dark background, since this is technically the easiest way to avoid flicker.

Most visual displays have the same field repetition frequency as broadcast television receivers (50 Hz in Europe and 60 Hz in America and Japan). During normal television programmes flicker is not very prominent. It can be seen most clearly by looking slightly past the screen or by viewing it from close by, especially when large areas of high luminance are broadcast. This is because the periphery of the eye is more sensitive to flicker than the central part, and because flicker sensitivity increases with brightness. Unfortunately, displays are typically used under viewing conditions that are optimal for the perception of flicker, and for this reason a dark background is mostly used.

In this paper we investigate how high the repetition frequency must be raised for no flicker to be perceived on displays that, like text on paper, use dark characters on a light background. This has been done for the interlaced mode, a display mode which saves bandwidth but is the most critical.

In the interlaced display mode a picture is constructed from two fields, one consisting of all odd lines (1st field) and the other of all even lines (2nd field). The fields are scanned sequentially. This has the advantage that the picture repetition frequency is apparently doubled (the actual picture repetition frequency is half the apparent picture repetition frequency, which equals the field repetition frequency). A major drawback is the occurrence of interline flicker in addition to large-area flicker.

Large-area flicker is seen in particular in large, bright areas. It can be reduced by increasing the frequency at which the picture is repeated. In the interlaced display mode the field repetition frequency is the relevant factor; in the non-interlaced mode, the picture repetition frequency.

Interline flicker occurs only in interlaced pictures. Its visibility depends on the spot size of the display and the distance between neighbouring scanning lines. Two types of interline flicker are distinguished: dancing-line flicker and single-line flicker.

Dancing-line flicker occurs at luminance transitions with a horizontal boundary; this boundary seems to dance up and down. One period of this phenomenon consists of an up and a down movement, so its frequency equals the picture repetition frequency, i.e. half the field repetition frequency.

Single-line flicker occurs when the luminance of one line, or a piece of a line, differs strongly from that of neighbouring lines. A brightness modulation of a line at a fixed position is then visible. As the line belongs to only one field, the perceptible frequency in this mode is also half the field repetition frequency.

*) Philips Research Laboratories, Projects Centre Geldrop

The experiments

The experiments on large-area flicker were performed with the most critical scene, a homogeneous white screen (pattern a in Fig. 1). Three luminance values were used, viz. 50, 100 and 200 cd/m², and three viewing distances, viz. 33 cm (normal reading distance), 50 cm (recommended distance for using a keyboard) and 70 cm (the largest practical distance for looking at a display on one's desk).

[Figure: the six test patterns a to f.]

Fig. 1. Patterns used in the experiments. Large-area flicker: pattern a; dancing-line flicker: patterns b, c and e; single-line flicker: patterns d and f.

The thresholds were determined for the 9 possible combinations of viewing conditions for 24 male subjects aged between 20 and 60 years, with 6 subjects in each 10-year age band. Non-emmetropic subjects wore their spectacles during the experiment.

The experiments on interline flicker were performed with the following scenes:

a. A pattern with a repetition of two neighbouring black lines on a light background. In this pattern dancing-line flicker occurs in its pure form; the two lines belong to different fields. In the actual stimulus, every 16th line of a field was black (pattern b in Fig. 1).

b. A pattern with a repetition of horizontal black and white bars of equal size. In this pattern dancing-line flicker occurs at the edges of the bars. In the actual stimulus each bar consisted of 128 lines (64 from each field) (pattern c in Fig. 1).

c. A pattern with a repetition of one black line on a light background. This pattern evokes single-line flicker. In the actual stimulus one field was completely white, whereas every 16th line of the other field was black (pattern d in Fig. 1).

d. The patterns described under a. and c., now with black lines consisting of small black and white segments of equal size (patterns e and f in Fig. 1).

The thresholds were determined for the authors (aged 29 and 34) under the most critical viewing condition, viz. a viewing distance of 33 cm and a luminance of 200 cd/m² for the white parts.

Experimental setup

The display used had been designed to give the quality of average text on paper (see Thoone, 1982). Its size equals A4 format (21 × 29.7 cm); the aspect ratio is 1:√2 (instead of the usual 4:3). The number of lines is 2501, 2287 of which are active. One line contains 1728 picture elements that can be either black or white. These figures agree with a CCITT recommendation for digital facsimile transmission. With appropriate electronic circuitry, the field repetition frequency can be set to any integer value between 60 and 120 Hz.

An important factor that determines the amount of flicker of a display at a given repetition frequency and luminance is the phosphor used. With a persistent phosphor it is possible to obtain flicker-free screens at rather low frequencies. The drawback, however, is that when the contents of the screen are changed, the old information remains visible for some time. In our experiments P4 phosphor was used. Its decay time is so short that the light output of a single spot can be approximated in time by a Dirac pulse. The modulation depth of the first harmonic component is then equal to the theoretical maximum, namely 200%. This means that, as far as the phosphor is concerned, the most critical situation has been chosen.
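The 200% figure follows from a short Fourier-series argument. If each refresh at frequency f produces a flash that decays exponentially with time constant τ (the exponential shape is our assumption; the article gives no decay constant for P4), the periodic light output has a mean proportional to τf and a first-harmonic amplitude of 2τf/√(1+(2πfτ)²). The modulation depth is therefore 2/√(1+(2πfτ)²), which tends to 2, i.e. 200%, in the Dirac-pulse limit τ → 0.

```python
import math


def first_harmonic_modulation(decay_s, freq_hz):
    """Modulation depth (first-harmonic amplitude / mean) of a spot refreshed
    at freq_hz, assuming each refresh gives an exponentially decaying flash
    with time constant decay_s. decay_s -> 0 is the Dirac-pulse limit."""
    omega_tau = 2.0 * math.pi * freq_hz * decay_s
    return 2.0 / math.hypot(1.0, omega_tau)
```

A slower (more persistent) phosphor lowers the modulation depth, which is why persistence reduces flicker at the cost of smearing screen changes.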

The subjects were instructed to fixate a point in the centre of the display. During the interstimulus periods an adaptation field of the same luminance as the display was generated by light reflected from a plate of ordinary glass functioning as a half-mirror. This plate was placed perpendicular to the ground at an angle of 45° to both the display and the lighting, and the subjects looked at the screen through the glass. The lighting consisted of direct-current-driven fluorescent lamps, to prevent flicker. Appropriate lighting of a cardboard surround of the display provided the recommended contrast ratio of 1:3 with the screen luminance (Cakir et al., 1979). The subject's head was fixed with a head-and-chin rest.

Procedure

A 2-alternative forced-choice method was used to determine the flicker thresholds. The subject was presented successively with two stimuli, each lasting 1 second, separated by an interstimulus period of 1 second. One of the stimuli, the reference, had a frequency of 120 Hz, at which it is impossible to see any flicker. After a few preliminary trials, 6 equidistant test frequencies were chosen that could be expected to lie around the threshold frequency. The order of test and reference stimulus was varied randomly. The subject had to report (if necessary, by guessing) in which of the two stimuli flicker was present. The flicker threshold was defined as the frequency corresponding to a 75% correct score. Note that, after correction for guessing, this corresponds to a 50% visibility score.
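The link between 75% correct and 50% visibility rests on the usual correction for guessing in a two-alternative task, which the text does not spell out. If flicker is seen with probability p_seen, the answer is correct whenever it is seen and correct half the time otherwise, so p_correct = 0.5 + 0.5 p_seen. A minimal Python sketch, assuming this high-threshold model:

```python
def visibility_from_2afc(p_correct):
    """Proportion of trials on which flicker was actually seen, from a 2AFC score.

    High-threshold assumption: seen -> always correct; not seen -> 50 % correct by guessing,
    so p_correct = p_seen + 0.5 * (1 - p_seen), i.e. p_seen = 2 * p_correct - 1.
    """
    if not 0.5 <= p_correct <= 1.0:
        raise ValueError("a 2AFC proportion correct should lie between 0.5 and 1.0")
    return 2.0 * p_correct - 1.0


for score in (0.525, 0.75, 0.975):
    print(f"{score:.1%} correct -> {visibility_from_2afc(score):.0%} visibility")
```

The same conversion maps 97.5% and 52.5% correct onto 95% and 5% visibility.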

The subjects in the experiments on large-area flicker took part in two experimental sessions per day, run in succession. In each session, each of the 6 test frequencies was presented 10 times in random order, giving a total of 60 presentations of a stimulus pair. The interval between two successive stimulus pairs was at least 1 second. Each session lasted about 15 minutes. In the fifth and last session the threshold for the subject's first viewing condition was determined once more, to establish whether any systematic threshold shift had occurred during the experiment. No such shift was found. For the two subjects in the interline flicker experiment, each of the different thresholds was measured three times.
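A session of this kind can be sketched as a randomised trial list. The six test frequencies below are hypothetical placeholders, since the report gives only their number and their equal spacing. `make_session` is an illustrative helper, not the software used in the experiment.

```python
import random


def make_session(test_freqs_hz, repeats=10, reference_hz=120, seed=None):
    """Hypothetical 2AFC session: every test frequency `repeats` times in random order,
    with the test stimulus randomly placed in the first or the second interval."""
    rng = random.Random(seed)
    trials = [f for f in test_freqs_hz for _ in range(repeats)]
    rng.shuffle(trials)  # random order of test frequencies
    session = []
    for f in trials:
        test_first = rng.random() < 0.5  # random order of test and 120 Hz reference
        first, second = (f, reference_hz) if test_first else (reference_hz, f)
        session.append({"test_hz": f, "first_hz": first, "second_hz": second})
    return session


session = make_session([70, 74, 78, 82, 86, 90], seed=1)  # illustrative frequencies
print(len(session), "stimulus pairs")
```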

Results and discussion

The results of the large-area flicker experiment are given in Fig. 2. The field repetition frequency is plotted along the vertical axis and the luminance, logarithmically, along the horizontal axis. The dots represent the 75% thresholds, averaged over subjects. The length of each bar around a dot is twice the standard deviation of the mean.

Fig. 2. 75% thresholds for large-area flicker. Results averaged over 24 subjects. Pattern is a homogeneous white field with viewing distance as indicated. [Figure: three panels for viewing distances of 33, 50 and 70 cm; field repetition frequency FRF (Hz) against screen luminance (cd/m², 50 to 200, logarithmic).]

The results can be described by the formula:

$$\mathrm{LAFT} = 48.0 + 12.1\,\log L + 4.2\,\log A$$

in which

LAFT = large-area flicker threshold (Hz)
L = luminance (cd/m²)
A = area on the retina (mm²).

This equation was derived by means of a regression analysis. It combines the law of Ferry and Porter for the luminance part with that of Granit and Harper for the area part (Landis, 1954). It is worth noting that the luminance part of the equation agrees with previously published data (Haantjes and De Vrijer, 1951). The low coefficient of the area part is only valid for the large fields used (20-40°). For smaller fields (1'-1°) it is much larger (Landis, 1954).
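As a worked example, the hypothetical helper `laft_hz` below evaluates the regression equation in Python. The report writes only "log", so base-10 logarithms are assumed here. The retinal area of 100 mm² is an arbitrary illustrative value, not one of the experimental conditions. Under these assumptions each doubling of luminance raises the threshold by 12.1 log 2 ≈ 3.6 Hz, and each doubling of retinal area by 4.2 log 2 ≈ 1.3 Hz.

```python
import math


def laft_hz(luminance_cd_m2, retinal_area_mm2):
    """Large-area flicker threshold (Hz) from the regression equation in the text.

    Base-10 logarithms are an assumption; the report writes only 'log'.
    """
    return 48.0 + 12.1 * math.log10(luminance_cd_m2) + 4.2 * math.log10(retinal_area_mm2)


area = 100.0  # illustrative retinal area in mm^2, not one of the experimental conditions
for lum in (50.0, 100.0, 200.0):
    print(f"L = {lum:5.0f} cd/m2: LAFT = {laft_hz(lum, area):.1f} Hz")
print(f"doubling L adds {12.1 * math.log10(2):.2f} Hz; doubling A adds {4.2 * math.log10(2):.2f} Hz")
```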

It seems reasonable to assume that flicker annoyance already disappears at frequencies somewhat below the threshold, although we have no quantitative results on this. Certainly, however, no annoyance from flicker will occur if the flicker cannot be seen. Therefore, 97.5% and 52.5% correct thresholds, corresponding to 95% and 5% visibility, were also estimated from the scores obtained. These thresholds are shown in Fig. 3. In the areas above the upper curves large-area flicker is observed less than 5% of the time, whereas in the areas below the lower curves it is seen more than 95% of the time. It should be noted that for the most realistic viewing condition, a viewing distance of 33 cm and a luminance level of 200 cd/m², flicker is almost always seen up to 80 Hz and only disappears completely at frequencies above 92 Hz.
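The report does not describe in detail how these thresholds were read off the scores. One simple possibility, sketched below under that assumption, is linear interpolation of the proportion correct against test frequency. The frequencies and scores are invented for illustration, and for real data a fitted psychometric function would usually be preferred.

```python
import numpy as np


def threshold_frequency(freqs_hz, p_correct, target):
    """Frequency at which the 2AFC score falls to `target`, by linear interpolation.

    Scores fall as frequency rises, so both arrays are reversed for np.interp,
    which needs increasing x values; the scores must therefore fall strictly.
    """
    freqs = np.asarray(freqs_hz, dtype=float)[::-1]
    scores = np.asarray(p_correct, dtype=float)[::-1]
    if np.any(np.diff(scores) <= 0):
        raise ValueError("scores must decrease strictly with frequency")
    return float(np.interp(target, scores, freqs))


freqs = [70, 74, 78, 82, 86, 90]                # hypothetical test frequencies (Hz)
scores = [0.98, 0.92, 0.81, 0.70, 0.58, 0.52]   # hypothetical proportions correct
for target in (0.975, 0.75, 0.525):
    print(f"{target:.1%} correct: {threshold_frequency(freqs, scores, target):.1f} Hz")
```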

There is no agreement in the literature about the effects of age on thresholds for the perception of large-area flicker (Landis, 1954; McFarland et al., 1958; Misiak, 1947). We did not find a significant age effect (see Fig. 4).


Fig. 3. 52.5% and 97.5% thresholds for large-area flicker. Flicker is always seen for frequencies below the lower curve and is never seen for frequencies above the upper curve. [Figure: three panels for viewing distances of 33, 50 and 70 cm; FRF (Hz) against screen luminance (cd/m², 50 to 200).]

Fig. 4. Influence of age on the thresholds for large-area flicker. [Figure: three panels for viewing distances of 33, 50 and 70 cm; FRF (Hz) against age in years (20 to 50), with separate symbols for different screen luminances.]

The results of the experiments on interlace flicker are given in Table 1. The figures are the 75% thresholds. As regards dancing-line flicker, a striking difference between subjects can be observed. For subject 1 the thresholds decrease by more than 10 Hz if two black lines (pattern b) are replaced by a thick bar (pattern c). The thresholds for subject 2, however, remain the same. A possible explanation can be found by comparing these with the large-area thresholds (pattern a) of the two subjects. Again, the thresholds for subject 1 differ by more than 10 Hz, whereas those for subject 2 remain the same. So, most probably, subject 1 used dancing-line flicker percepts as cues for his detection criterion, whereas subject 2 used the same criterion for all three patterns, namely large-area flicker in the white parts.

Pattern                                                             Subject 1   Subject 2

Dancing-line flicker
  Two neighbouring black lines on a light background (pattern b)          82          81
  Two neighbouring lines of black and white segments on a light
    background (pattern e)                                                80          80
  Thick horizontal black and white bars (pattern c)                       70          79
Single-line flicker
  Single black line on a light background (pattern d)                    103          97
  Single line of black and white segments on a light
    background (pattern f)                                                99          97
Large-area flicker
  Homogeneous white field (pattern a)                                     93          81

Table 1. 75% thresholds (Hz) for interline flicker percepts.

This would mean that dancing-line flicker has lower thresholds than large-area flicker. Moreover, the effects of dancing-line flicker are more clearly visible when it occurs in its purest form (2 lines) than at the edges of extended bars.

The thresholds for single-line flicker, by contrast, are higher than for large-area flicker, as the results for patterns d and f show. This can easily be explained by imagining the extreme case in which one field is white and the other black: one then does not see a fine pattern of black and white lines, but a rather homogeneous plane that flickers at half the field repetition frequency. This means that the threshold would rise from 85 to 170 Hz, since the field repetition frequency would have to be doubled to bring the flicker back to the large-area threshold. Fortunately, a single line does not do so much damage, although the threshold does become higher.
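The halving of the flicker frequency can be illustrated with a small simulation. The sketch below assumes Dirac-like phosphor flashes, as for the P4 phosphor above, models the light output over time, and reports the lowest frequency present. A large white area receives a flash in every field (odd lines in one field, even lines in the next), whereas a single line under interlace is written only in every other field. The helper name `lowest_flicker_hz` and the sampling parameters are illustrative.

```python
import numpy as np


def lowest_flicker_hz(lit_in_field, field_rate_hz, samples_per_field=32, n_fields=64):
    """Lowest non-zero frequency with a significant component in the emitted light.

    lit_in_field(k) says whether light is emitted in field k; each emission is modelled
    as a brief (Dirac-like) flash at the start of the field.
    """
    signal = np.zeros(n_fields * samples_per_field)
    for k in range(n_fields):
        if lit_in_field(k):
            signal[k * samples_per_field] = 1.0
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))  # remove the mean (DC) level
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / (field_rate_hz * samples_per_field))
    significant = np.nonzero(spectrum > 1e-6 * spectrum.max())[0]
    return float(freqs[significant[0]])


field_rate = 85.0
print(lowest_flicker_hz(lambda k: True, field_rate))        # large area: light in every field
print(lowest_flicker_hz(lambda k: k % 2 == 0, field_rate))  # single interlaced line
```

With an 85 Hz field rate the single line flickers at 42.5 Hz. Raising that component to about 85 Hz would require a field rate of about 170 Hz, the figure quoted in the text.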

As can be seen from Table 1 (patterns b vs e and d vs f), thresholds for interline flicker do not change if the black lines are replaced by lines with alternating black and white pieces. An important conclusion can therefore be drawn for the design of characters on a visual display: in the case of interlace, no part of a character should be only 1 line thick in the vertical direction, for such parts give rise to the highest flicker thresholds.

It should be borne in mind that, for broadcast television, such large luminance differences between neighbouring lines cannot be recorded with existing cameras, so that single-line flicker does not occur there.

Conclusions

- A flicker-free display can be guaranteed for all practical cases of screen luminance and viewing angle only if the field repetition frequency is increased to at least 92 Hz.
- There is no significant effect of age on the visibility of large-area flicker.
- The thresholds for the perception of dancing-line flicker are lower than those for large-area flicker.
- Two neighbouring lines with a luminance different from their surroundings give rise to higher thresholds for dancing-line flicker than the edges of thick bars do.
- The thresholds for single-line flicker are higher than those for large-area flicker.
- A designer of characters for displays should therefore avoid having parts of his characters as thin as 1 line in the vertical direction.

References

Bauer, D. and Cavonius, C.R. (1980) Improving the legibility of visual display units through contrast reversal. In Grandjean, E. and Vigliani, E. (Eds). Ergonomic aspects of visual display terminals. London, Taylor & Francis, 137-142.

Cakir, A., Hart, D.J. and Stewart, T.F.M. (1979) The V.D.T. Manual. I.F.R.A.,

Darmstadt, 177.

Haantjes, J. and Vrijer, F.M. de (1951) Het flikkeren van televisiebeelden [The flickering of television pictures]. Philips Technisch Tijdschrift 11 (1), 7-12.

Landis, C. (1954) Determinants of critical flicker fusion thresholds. Physiol.

Rev. 34, 259-286.

McFarland, R.A., Warren, A.B. and Karis, C. (1958) Alterations in critical flicker

frequency as a function of age and light-dark ratio. Journal of Exp. Psych. ~

(6), 529-538.

Misiak, H. (1947) Age and sex differences in critical flicker frequency. Journal

of Exp. Psych. 37, 318-322.

Thoone, M.L.G. (1982) A very high resolution document display. Proc. of the Int. Conference on Image Processing, York, IEE Conf. Publ. 214, 6-10.


Is large print easy to read? Oral reading rate and word recognition of elderly subjects

H. Bouma, Ch.P. Legein, H.E.M. Malotte and L. Zabel

Introduction

Proficient readers can read print of a wide range of sizes. Small print, such as that in telephone directories and in abstracts of scientific journals, can apparently be read just as well as the supersized headlines of newspapers or advertisements. If the size of print is largely irrelevant for the reading process, the selection of print size can be guided by other factors such as print economy, aesthetics and conspicuousness.

From the point of view of understanding visual reading processes, it is not immediately clear why reading processes should be insensitive to print size. How is this independence accomplished, which reading processes are so adaptive, and where are the limits of this adaptivity? How good is the evidence that reading processes are independent of print size, and what size limits apply? In fact, Tinker (1963), in his classic work, mentions an optimum print size for lower-case print of 11 point (equivalent to some 2.2 mm x-height with 1.3 mm extensions), which should make for faster reading than, for example, 8 point and 14 point.

At the lower edge of readable print sizes it is clear that relevant letter details should exceed visual acuity, which calls for a lower limit for letter size of approx. 5 minutes of arc (letter height x = 0.5 mm at 33 cm reading distance). For people with low visual acuity (V.A.) this limit should be higher, for example x = 5 mm if V.A. = 0.10. At the upper edge, no theoretical limit is apparent. People with high and with low acuity would be expected to be equally proficient in reading large print.
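The acuity limits quoted above follow from simple geometry. The Python sketch below assumes, as the text implies, that the critical angle of 5 minutes of arc applies at V.A. = 1.0 and scales inversely with visual acuity. `min_letter_height_mm` is a hypothetical helper.

```python
import math


def min_letter_height_mm(visual_acuity, distance_mm=330.0, arcmin_at_full_acuity=5.0):
    """Smallest legible letter height, assuming the critical angle scales as 1 / V.A.

    arcmin_at_full_acuity = 5 is the approximate limit quoted in the text for V.A. = 1.0.
    """
    angle_rad = math.radians(arcmin_at_full_acuity / visual_acuity / 60.0)
    return 2.0 * distance_mm * math.tan(angle_rad / 2.0)  # height subtending that angle


for va in (1.0, 0.3, 0.1):
    print(f"V.A. = {va:.1f}: letter height >= {min_letter_height_mm(va):.2f} mm at 33 cm")
```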

Our interest arose in the context of a project investigating the respects in which the reading of elderly people might differ from that of younger adults. Clearly, the fraction of people encountering difficulties in reading increases with age, but it is not clear whether this is caused by declining visual faculties or by psychological faculties of a general nature. We searched the literature for experimental evidence, but found almost none directly applicable to reading, although a useful recent review is available (Sekuler, Kline and Dismukes, 1982). In line with our long-standing interest in visual reading processes, we started a research project to gain some insight into the matter.

What visual faculties should be considered? Usually, we distinguish four types of visual reading process: (a) optical imaging, (b) control of eye saccades, (c) foveal and parafoveal word recognition, and (d) integration of information across saccades.

(a) Optical imaging of small and large print presents no obvious problems to normal eyes, although in individual cases the optical quality of the retinal image is not always easy to assess. As to (b), control of eye saccades, it has been reported that reading saccades are proportional to print size (Rayner, 1983). Consequently their extent should be expressed in number of letters rather than in degrees of visual angle. Thus, reading saccades would perhaps fully adapt to print size. (c) It then also seems logical to express the width of the reading field in number of letters, which is indeed commonly done, although to our knowledge this is not based on experimental evidence. We shall present some evidence bearing on this question. The fourth question (d), concerning the integration of visual information across saccades, is directly relevant, but will be left aside here.

We report here on two types of experiment: (1) oral reading rate and (2) visual word recognition, both in the centre of vision and left and right of the point of fixation. In both experiments we varied print size and illumination level. Subjects were motivated readers in five age groups, with two visual-acuity ranges per age group.

Experiments

We aimed at 10 subjects in each age group, 5 of whom had high visual acuity (V.A. ≥ 0.8 for each eye) and 5 moderately low visual acuity (0.1 ≤ V.A. ≤ 0.3 for each eye). Full ophthalmological data were available. Age groups were 33-45 years, 45-55, 55-65, 65-75 and 75+, and will be referred to as 40, 50, 60, 70 and 80 yrs. In the 40-yr age group we could find only two low-vision subjects. In selecting subjects, we aimed at high motivation and high reading skill, as reflected in present or earlier profession and in present interests and reading habits. We also administered brief tests of IQ and reading skill (Cloze test), which are not considered here.

We report on two tasks:

(1) Oral reading of text passages printed in lower-case letters without serifs, using five h-height print sizes (i.e. the height of the lower-case letter h) of 1.8, 3.2, 5.3, 9.0 and 16.0 mm; x-height was 75% of h-height. Reading distance was maintained at 33 cm by a forehead support, with proper optical correction. Illumination level E was 400 lux, environment 100 lux. Oral reading rate R (words/minute) was taken from the first minute of reading, except for the two largest print sizes, where the whole passage was read in less than one minute and the value of R was extrapolated. This was also done in a few cases where subjects did not succeed in reading the full passage because of fatigue. Passages of interesting text were used; average word length was 1.6 syllables and average sentence length 13.5 words.

(2) Recognition of Dutch 6-letter words of two syllables, with a frequency of occurrence between 2·10⁻⁵ and 10⁻⁴, from a single tachistoscopic (100 ms) presentation centred at the point of fixation (foveal) or away from fixation in the left and right visual field (eccentric presentation). In eccentric presentation the word occupied positions -9 to -4 letters in the left visual field or positions +4 to +9 letters in the right visual field, for all letter sizes. This implies that eccentricity in degrees of visual angle grows linearly with letter size. Recognition distance was 33 cm. The results presented are averaged over three levels of illumination, E = 160, 400 and 1600 lux. Order of presentation was from low to high illumination and, for each illumination, from small to large print size. We repeated one small letter size to get an impression of possible learning effects, which were not apparent. The subjects started each presentation by pressing a button when fixating a fixation mark.

The tachistoscope was of a projection type with electronic shutters and a translucent screen. The screen caused some loss of sharpness which, for the smallest letter size (h = 1.8 mm), was visible as a lower contrast. Order of presentation was the same for all subjects. A voice switch provided vocal latency data, which will not be reported here.
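Because the eccentric words were placed at fixed letter positions, their angular eccentricity scales with print size. The sketch below makes this concrete. It needs a letter pitch, which the report does not give, so a centre-to-centre spacing of 0.6 h is assumed purely for illustration. The growth is linear in the small-angle regime; at the largest size the arctangent makes it slightly less than linear.

```python
import math


def eccentricity_deg(letter_positions, h_mm, pitch_per_h=0.6, distance_mm=330.0):
    """Visual angle (degrees) between fixation and a given letter position.

    pitch_per_h is a HYPOTHETICAL letter pitch as a fraction of h-height;
    the report does not state the letter spacing.
    """
    offset_mm = abs(letter_positions) * pitch_per_h * h_mm
    return math.degrees(math.atan(offset_mm / distance_mm))


for h in (1.8, 3.2, 5.3, 9.0, 16.0):
    inner, outer = eccentricity_deg(4, h), eccentricity_deg(9, h)
    print(f"h = {h:4.1f} mm: positions 4-9 span {inner:5.2f} to {outer:5.2f} deg")
```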

Results

Oral reading rates

Fig. 1 gives oral reading rates R as a function of age for a letter size h = 5.3 mm, separately for the high- and low-acuity groups.

Fig. 1. Group averages of oral reading rate as a function of age for both high and low visual acuity groups. Letter size h = 5.3 mm, illumination E = 400 lux. [Figure: oral reading rate R (words/min) against age (years), separate curves for high-acuity and low-acuity groups.]

Despite adequate letter size, the poor-vision groups read more slowly than the good-vision groups. As to age effects, the high-acuity groups show a decline with increasing age above 60 yrs, whereas in the low-acuity groups individual differences mask possible age effects. In Figs 2 and 3 we shall combine the 40-60 yr groups and the 70-80 yr groups. Fig. 2 gives oral reading rates as a function of print size. Overall, there appears to be a slight positive influence of letter size for all four groups. Contrary to expectation, the low-acuity groups do not have dramatically low average reading rates for small print size.

Fig. 2. Oral reading rate as a function of print size for two age groups of high and low acuity. E = 400 lux. All groups benefit slightly from larger print size. The low-acuity groups can nevertheless read small print size reasonably well for a short while. [Figure: oral reading rate R (words/min) against letter height h (mm) for the 40-60 and 70-80 yr groups of high and low acuity.]

Word recognition scores

Fig. 3 gives correct word recognition scores as a function of letter size, separately for left eccentric, foveal and right eccentric presentation. Letter size is now ordered along an almost logarithmic scale. Central presentation offers little difficulty for the high-acuity groups, but the low-acuity groups now show the expected beneficial influence of larger letter size, reaching close to perfect scores for h = 16 mm.

Fig. 3. Correct word recognition scores as a function of print size for three positions in the visual field: central, left of centre and right of centre. For different sizes, the number of letter positions from fixation has been kept constant. Note the substantial decrease with print size for large letters in eccentric vision. [Figure: proportion correct (0 to 1.0) against letter height h (1.8, 3.2, 5.3, 9.0 and 16.0 mm); panels for the left visual field, central presentation and the right visual field.]

In eccentric vision, the results are surprising. For both the good- and the low-vision groups there are substantial influences of letter size, with clearly expressed maxima. For the good-vision groups, maximum recognition occurs at h = 3-5 mm, whereas for the low-vision groups the maxima are at 5-9 mm. For large letters (h = 16 mm), high- and low-vision groups both score dramatically low. As expected, recognition in the right visual field exceeds recognition in the left visual field (Bouma, 1973), the differences being clear-cut for all letter sizes where floor effects are absent.

Discussion

Oral reading rates are somewhat higher for the high- than for the low-acuity groups, even for large letter size. For us, this was an unexpected finding: even if low acuity is compensated by large letter size, reading is slower for the low-acuity group. Have low-vision subjects acquired a habit of slower reading, or has reading proficiency truly decreased? Although the remaining reading rates of low-acuity subjects are tolerable, reading rate is important for getting the meaning of paragraphs because of assumed time limits in working memory. An extreme example is the difficulty anyone experiences in getting the meaning of a long sentence when the reading rate is below 20 w/min, a value often considered as a lower limit for the reading of text passages.


For the high-acuity groups, reading rates are close to maximum speech rates. We have no independent evidence as to the speech rates of our subjects, but it seems possible that it is speech rate rather than reading rate that has decreased in the elderly subjects. It would be interesting to establish rates for silent reading, which, for experienced readers, surpass speech rates.

Age itself appears to have a restricted influence on oral reading rates, whereas low acuity is more clearly a limiting factor. One should be aware that in the general population (in contrast to our experimental subjects) age and acuity are negatively correlated; in fact, we had difficulty in finding old subjects with high visual acuity. Therefore the general conclusion should probably read: average oral reading rate diminishes with increasing age, primarily concomitant with lowered visual acuity.

The substantial effects of letter size left and right of fixation are surprising indeed. Apparently, the range of letter sizes to which recognition is fully adapted is very limited. For both small and large sizes it is conceivable that the visual system as a whole adapts, as oral reading rates indicate, but then it would seem that for large letters a narrower reading field is compensated by shorter reading saccades. This is indeed what Tinker (1963) has reported, and a direct investigation of the matter seems in order.

A different question is why eccentric recognition decreases with increased print size. For usual print sizes, we know that it is lateral interference between letters (and not parafoveal acuity) that limits recognition, and we have some indications that for large print size the interference is greater, particularly for letters farther from the fovea.

It has been argued (Rayner, 1983) that tachistoscopic word recognition is not representative of the reading field during actual reading of text. Although estimates from word recognition in the right visual field and from actual reading agree, in the left visual field word recognition seems to extend further into eccentric vision than is necessary for reading. The present results also show a dissociation, since tachistoscopic recognition is poor for large print size, whereas the oral reading rate is, if anything, slightly higher. Also, word recognition of small print is poor in our low-acuity groups, whereas their oral reading rate is reasonable. However, we cannot assess the possible influence of speech-rate limits.

From a theoretical point of view, a few differences between the reading field and the word recognition field can be considered:

(a) In text, lateral interference is greater because it extends over substantial retinal distances and text has more interfering words (and lines) than is the case in single-word recognition. This would lead to a narrower field in reading.
(b) Text is redundant in that syntactic and semantic relationships enable the use of knowledge (top-down information), so that less visual (bottom-up) information is needed than in single-word recognition. This would give a wider field in reading.

The two effects might perhaps balance to a first approximation. However this may be, it is not sufficient to state that tachistoscopic recognition is not representative of actual reading. Rather, we should try to find out why, in actual reading, the size of the reading field during eye pauses is as large or as small as it appears to be. Tachistoscopic recognition seems to be a rather straightforward way of finding out the contributory factors, particularly those of a visual type.

In the present experiments, the lack of correlation between reading rate and tachistoscopic recognition may have a prosaic, even trivial, background, namely ceiling effects in oral reading rates. Silent reading might be a better tool, and the measurement of eye saccades seems equally necessary.

Conclusions

Oral reading rate appears to be rather insensitive to print size, both for subjects with high and for subjects with low visual acuity. A low acuity goes hand in hand with a lower reading rate, and lower acuity seems to be the primary reason why elderly subjects encounter more reading difficulties.

Word recognition, both left and right of fixation, depends strongly on print size if the number of letter positions from fixation is kept constant across print sizes. In particular, large print is very difficult to recognise, and the visual reading field shrinks with print size above some 5 mm. For large print, word recognition of the high and low visual acuity groups is equal.

Is large print easy to read? For oral reading the answer is yes, but we do not know why. Tachistoscopic word recognition indicates a narrow visual reading field, which would be expected to be compensated by short eye saccades. If so, the silent reading rate for large print could well suffer.

References

Bouma, H. (1973) Visual interference in the parafoveal recognition of initial and

final letters of words. Vision Research 11, 767-782.

Rayner, K. (1983) Visual selection in reading, picture perception, and visual search, a tutorial review. In Bouma, H. and Bouwhuis, D.G. (Eds) Attention and Performance X: Control of language processes. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1983.

Sekuler, R., Kline, D. and Dismukes, K. (Eds) (1982) Aging and human visual function. Alan R. Liss Inc., New York.

Tinker, M.A. (1963) Legibility of print. Iowa State University Press.


Line and edge-spread functions of the visual system elicited by a TV display in situ

J.A.J. Roufs and J. Polstra *)

Introduction

Vision of spatial detail and contrast, which are intimately related, are fundamental considerations in the design of displays. An imaging device should match the performance of the eye but does not need to be better. There are several ways to characterise detail vision. One such measure, which has become popular in the last two decades, is the spatial modulation transfer function (SMTF). It is based on linear optical theory (Duffieux, 1946), later applied to displays (Schade, 1948). The SMTF relates the input and output of the sinusoidal eigenfunctions. It has the advantage of generality over earlier methods. Although the eye is not linear around thresholds of luminance modulation, the assumption of small-signal linearity has been applied successfully.

The spatial modulation transfer of the eye has been determined by Campbell and Green (1965) and by Campbell and Robson (1968), using sinusoidally modulated gratings. It is usually assumed that, at the threshold of contrast modulation, the responding signal at some relevant but unspecified part of the visual pathway has to have a constant critical value, independent of spatial frequency. If the processing is performed in one channel, we are then in a position to measure the transfer of a filter by varying the input while keeping the output amplitude constant. In the eye, the measured transfer is found to be a function of the mean (adapting) luminance (see, for instance, Van Nes, 1967; Van Meeteren, 1973). The full use of Fourier techniques in predicting the thresholds of arbitrary stimuli, however, is hindered by some complex properties of the visual system. First, the eye is not homogeneous. Second, Campbell and Robson (ibid.) showed that it processes contrast in differently tuned parallel channels. Finally, it is noisy (Graham, 1977; Sachs, Nachmias and Robson, 1971). In practical problems, therefore, it may be more convenient to use another system function that is closer to the kind of image pattern one is interested in and that suffers least from the complicating factors mentioned above. In linear, space-invariant, mono-channel systems it could be derived from the SMTF. In reality it has to be measured separately.

Line-spread functions are examples. They apply if one is interested in the effect of the line structure of a display, in the sharpness of lines, bars and edges, in the visibility of bar-type interference, etc. They already received the attention of scientists at the end of the sixties. Since Campbell, Carpenter and Levison (1969), a number of authors have contributed to this subject. Yet more data obtained under the conditions of present displays are needed if we are to construct adequate quantitative models.
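For a linear, space-invariant, single-channel system, the line-spread function and the modulation transfer function form a Fourier-transform pair, which is why the LSF could in principle be derived from the SMTF. The sketch below demonstrates both directions with NumPy's FFT for a hypothetical Gaussian line-spread function (σ = 0.5 arcmin). It illustrates the linear-systems relation only and says nothing about the actual shape of the visual line-spread function.

```python
import numpy as np


def transfer_from_lsf(lsf, dx):
    """Normalised transfer function (1 at zero frequency) of a centred, sampled LSF."""
    spectrum = np.fft.fft(np.fft.ifftshift(lsf)) * dx  # move the centre sample to index 0
    freqs = np.fft.fftfreq(lsf.size, d=dx)              # cycles per unit of dx (here arcmin)
    return freqs, spectrum.real / spectrum.real[0]      # symmetric LSF -> real spectrum


def lsf_from_transfer(transfer, dx):
    """Inverse route: unit-area LSF from a zero-phase transfer function."""
    return np.fft.fftshift(np.fft.ifft(transfer).real) / dx


# Hypothetical Gaussian LSF, sigma = 0.5 arcmin, sampled every 0.05 arcmin
dx, sigma, n = 0.05, 0.5, 1024
x = (np.arange(n) - n // 2) * dx
lsf = np.exp(-x**2 / (2 * sigma**2))

freqs, transfer = transfer_from_lsf(lsf, dx)
analytic = np.exp(-2 * np.pi**2 * sigma**2 * freqs**2)  # known transform of a Gaussian
recovered = lsf_from_transfer(transfer, dx)
print("max deviation from analytic MTF:", np.max(np.abs(transfer - analytic)))
```

Multiplying `freqs` by 60 converts cycles per arcmin into the more familiar cycles per degree.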

This paper concerns the measurement of line-spread functions of the eye elicited by

luminance increments of TV lines under realistic TV viewing conditions.

*) Now at Philips Research Laboratories, Projects Centre Geldrop

Methods

Subthreshold responses to lines, bars and edges have been determined by means of a drift-correcting perturbation technique applied earlier in the measurement of point-spread and temporal-impulse responses (Roufs and Blommaert, 1981; Blommaert and Roufs, 1981). The basic postulates of this approach are:

a. small-signal linearity, which is locally shift-invariant,
b. a modest internal noise source, and
c. peak detection.

This is visualised in Fig. 1.

Fig. 1. Basic assumptions about the visual system (see text).

Here we will mainly consider the narrowest channel, covering the finest details. The pattern of a line having a width $\Delta y$ and a luminance increment $\Delta L$ is symbolised by $\Delta L\,p(y,\Delta y)$, its pattern of retinal illumination being $E\,p(y,\Delta y)$. It is assumed to be detected if its visual response $E\,\Delta y\,U_\delta(y)$ plus noise leads to a 50% detection probability. The centre of this line is fixated, as will always be the case below. Owing to the inhomogeneity of the retina, the response will not be the same along a meridian through the fovea. The relevant response is understood to be the effective line response in the fovea centralis, the most sensitive part of the fovea under daylight conditions. A line is detected if

( 1 )

where U6 (y) is the unit impulse response and D is the amplitude causing 50% detec­

tion probability. Without loss of generality, D may be taken as the response unit.

Since there is local symmetry, the extremum lies at the fixated centre, $y = 0$, so that at threshold

$$\Delta E_l\,\Delta y\,U_\delta(0) = D \qquad (2)$$

Thus, the line threshold is given by

$$\Delta E_l = \left(\Delta y\,\frac{U_\delta(0)}{D}\right)^{-1} \qquad (3)$$

where $U_\delta(0)/D$ is still to be determined.

Now let the line be flanked symmetrically by two weak lines at distances $\pm y'$. These weak lines are the important ones whose responses we want to determine. Detection of this complex is given by

$$\operatorname{extr}_y\left\{\Delta E_c\,\Delta y\,U_\delta(y) + q\,\Delta E_c\,\Delta y\,U_\delta(y-y') + q\,\Delta E_c\,\Delta y\,U_\delta(y+y')\right\} = D$$

If $q \ll 1$, the extremum of the complex response lies, to a good approximation, at the position of the maximum of the response of the strong line, called the 'probe line' from here on. Thus,

$$\frac{1}{\Delta E_c\,\Delta y} = \frac{U_\delta(0)}{D} + 2q\,\frac{U_\delta(y')}{D} \qquad (4)$$

Since the first term is constant, $U_\delta(y')/D$ can in principle be obtained experimentally from $\Delta E_c(y')$. However, changes in threshold are very small quantities, and drift in $D$ tends to spoil the results. Fortunately, the effect of drift can be reduced effectively by using a reference. If, at a given $y'$, a limited number of trials is used to obtain a fast estimate of the 50% threshold $\Delta E_c$, and immediately thereafter the threshold $\Delta E_l$ of the probe line alone is measured in the same way, $D$ is almost unchanged. Therefore, if

$$\frac{U_\delta(0)}{D} = \frac{1}{\Delta E_l\,\Delta y} \qquad (5)$$

is taken as a norm factor, the normalised line-spread function is obtained, that is

$$\frac{U_\delta(y')}{U_\delta(0)} = \frac{1}{2q}\left(\frac{\Delta E_l}{\Delta E_c} - 1\right) \qquad (6)$$

The line-spread function may be reconstructed by multiplying (6) by the right-hand side of (5) averaged over many sessions (for details see Roufs and Blommaert, 1981). (It can be argued that the noise amplitude is sufficiently small not to affect the derivation of (6) seriously.)
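Equation (6) turns each pair of quickly measured thresholds into one sample of the normalised line-spread function. The Python sketch below is a minimal illustration, assuming both thresholds are already expressed as linear retinal-illuminance increments (not dB) and that q is the amplitude ratio of each flanking line to the probe; the function names are hypothetical. The second function simply inverts (6), which is handy for checking the arithmetic.

```python
def normalised_line_spread(delta_e_l: float, delta_e_c: float, q: float) -> float:
    """Return U_delta(y') / U_delta(0) according to eq. (6).

    delta_e_l -- 50% threshold increment of the probe line alone
    delta_e_c -- 50% threshold increment of the probe line flanked by
                 two weak lines at +y' and -y'
    q         -- amplitude ratio of each flanking line to the probe (q << 1)
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return (delta_e_l / delta_e_c - 1.0) / (2.0 * q)


def complex_threshold(delta_e_l: float, ratio: float, q: float) -> float:
    """Inverse of eq. (6): predicted Delta E_c for a given U_delta(y')/U_delta(0)."""
    return delta_e_l / (1.0 + 2.0 * q * ratio)


# A lower threshold with flanks (facilitation) yields a positive sample,
# a higher threshold (inhibition) a negative sample of the line-spread function.
print(normalised_line_spread(1.0, 0.9, 0.25))   # about 0.222
print(normalised_line_spread(1.0, 1.1, 0.25))   # about -0.182
```

Because $D$ cancels in the ratio $\Delta E_l/\Delta E_c$, slow drift in $D$ between the two closely spaced measurements has little effect, which is the point of the reference measurement.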

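An arbitrary luminance profile $\Delta L_o f(y)$, giving a retinal illumination $\Delta E_o f(y)$, evokes, in accordance with the model, the response

$$\frac{\Delta E_o}{D}\int_{-\infty}^{\infty} U_\delta(\eta)\,f(y-\eta)\,d\eta \qquad (7)$$

If no noise were present, its threshold would be given by

$$\frac{1}{\Delta E_o} = \operatorname{extr}_y\left\{\int_{-\infty}^{\infty}\frac{U_\delta(\eta)}{D}\,f(y-\eta)\,d\eta\right\} \qquad (8)$$

For calculations of the effect of noise under certain assumptions, see Rashbass (1976) and Watson and Nachmias (1977).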
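Equations (7) and (8) can be evaluated numerically by sampling the line-spread function and convolving it with the stimulus profile. The sketch below assumes NumPy, takes $D = 1$, and uses a hypothetical difference-of-Gaussians line-spread function with made-up widths (in min of arc) purely as a stand-in for measured data; like (8), it ignores noise and probability summation.

```python
import numpy as np


def dog_line_spread(y, sigma_c=0.5, sigma_s=1.5, w_s=0.3):
    """Hypothetical line-spread function U_delta(y)/D: a narrow positive Gaussian
    minus a weaker, wider one. Parameters are illustrative, not fitted values."""
    def gauss(s):
        return np.exp(-0.5 * (y / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return gauss(sigma_c) - w_s * gauss(sigma_s)


def bar_threshold(width, y, u):
    """Noise-free threshold Delta E_o of a bar of the given width, eq. (8), D = 1.

    y -- uniform, symmetric sample grid with an odd number of points
    u -- line-spread function U_delta(y)/D sampled on y
    """
    dy = y[1] - y[0]
    f = (np.abs(y) <= width / 2.0).astype(float)      # bar profile f(y)
    response = np.convolve(u, f, mode="same") * dy    # eq. (7) per unit Delta E_o
    peak = np.max(np.abs(response))                   # peak detection: the extremum
    return 1.0 / peak


y = np.linspace(-30.0, 30.0, 1201)                    # min of arc, spacing 0.05'
u = dog_line_spread(y)
for w in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"bar width {w:4.1f}'  ->  threshold {bar_threshold(w, y, u):.3f}")
```

With a single channel, the predicted threshold falls as the bar widens until the bar exceeds the integrative centre, after which it levels off; this is the one-channel prediction that is later compared with measured bar thresholds.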


In this paper we are mainly concerned with comparing the results of (7) with direct measurement of such a response. This can be done by replacing the second term of (4) by the response to the pattern of interest, adjusting the value of q accordingly. The response can then be measured according to the same principles.

Apparatus

A professional video monitor (Philips LOR 2151) was used. The screen measured 28 × 22 cm and had a white phosphor with a persistence time (to 10%) of 170 µs; the mean luminance was 200 cd.m-2 (a level of 5 cd.m-2 was obtained by placing neutral filters in front of the eye). The subjects had natural pupils, which were monitored entoptically several times during a session, and looked into a dark environment. Although there are considerable intersubject differences, pupil diameters of 4.5 mm at 200 cd.m-2 and 6 mm at 5 cd.m-2 are perhaps typical.

Two viewing distances, ensured by forehead and chin rests, were used, viz. 1.54 m (7 × image height) and 2.80 m. The line width was about 0.3 mm and the centre-to-centre distance of adjacent lines 0.4 mm, the latter corresponding to retinal distances of 0.9 and 0.5 min of arc at the two viewing distances, respectively. Line luminance increments were generated by a specially built device. There was always one relatively strong, single line increment, the probe line, displayed at the centre of the screen and marked by fixation marks (usually 4 points at the corners of a diamond). Perturbing lines at various distances from the probe (usually in symmetrically situated pairs), as well as perturbing edges and bars, were used, whose position relative to the probe could be changed. In this paper only the results for stationary patterns will be considered; an example is visualised in Fig. 2.
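The angular values quoted above follow from simple viewing geometry: an object of size s at distance d subtends 2·atan(s / 2d). A quick check in Python (the helper name is hypothetical):

```python
import math


def visual_angle_arcmin(size_mm: float, distance_m: float) -> float:
    """Visual angle (min of arc) subtended by an object of size_mm seen at distance_m."""
    half_angle = math.atan((size_mm / 1000.0) / (2.0 * distance_m))
    return math.degrees(2.0 * half_angle) * 60.0


screen_height_m = 0.22
print(f"7 x image height = {7 * screen_height_m:.2f} m")   # the nearer viewing distance
for d in (1.54, 2.80):
    spacing = visual_angle_arcmin(0.4, d)   # line centre-to-centre distance
    width = visual_angle_arcmin(0.3, d)     # approximate line width
    print(f"{d:.2f} m: spacing {spacing:.2f}', width {width:.2f}'")
```

This gives spacings of about 0.89' and 0.49', i.e. the 0.9 and 0.5 min of arc stated in the text.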

Fig. 2. The upper figure shows the frontal view of the lines of the black-and-white video monitor. In the 3D drawing of these lines (second row, left figure) the stationary luminance increments of the probe line in field 2 and the perturbing line in field 1 are demonstrated. The right-hand side of this row shows the spatio-temporal structure. The third row shows the possibility of sampling in the space-time domain by one-shot increments (not used in this paper).


One part of the specially built stimulus generator provides the analog signals from which the final video signal is built up. The second part provides the digital signals which control the grey levels of the (analog) video signal. The middle grey level was chosen to correspond to 200 cd.m-2. The luminance increments or decrements are controlled by dB attenuators operating on the analog part of the system. The probe and perturbing functions are generated separately. The ratio q between the amplitudes of their luminance deviations from the constant mean level is controlled in discrete steps. The digital part controls the probe position, which is fixed in the centre of the screen, and generates the perturbing lines, bars or edges, whose position with respect to the probe line can be shifted over the screen. The bar width can be set to any number of lines.

Experimental procedure

Thresholds were determined by a modification of the method of constant stimuli, the frequency of seeing being estimated from fractions of 10 trials. The intensity was changed in random order, in steps of 0.5 dB. Only the part of the psychometric function between detection probabilities of 0.2 and 0.8 was used, since in this range it varies approximately linearly with log luminance. Thresholds and slopes were averaged over pairs measured in counterbalanced order. The average slope of a large number of psychometric functions was used to estimate the 50% threshold pairs from 2 × 2 fractions of stimulus and reference measured in counterbalanced order. In all cases q was set at a value ensuring that the response peak of the perturbing function was 0.2–0.3 of the peak of the probe response.
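The threshold estimate described above can be sketched as follows, assuming the psychometric function is linear in log intensity (dB) between detection probabilities 0.2 and 0.8 and that its slope has already been averaged over many runs. Each fraction inside that range is shifted along the fixed slope to the 50% point, and the shifted values are averaged; the function name and numbers are illustrative only. In the procedure above this would be done separately for stimulus and reference.

```python
from statistics import mean


def threshold_50(levels_db, fractions, slope_per_db):
    """Estimate the 50% threshold (dB) from frequency-of-seeing fractions.

    Assumes p = 0.5 + slope_per_db * (level - threshold) between p = 0.2 and
    p = 0.8, with slope_per_db taken from the averaged psychometric functions.
    Fractions outside that linear range are ignored.
    """
    estimates = [level + (0.5 - p) / slope_per_db
                 for level, p in zip(levels_db, fractions)
                 if 0.2 <= p <= 0.8]
    if not estimates:
        raise ValueError("no fraction lies in the linear range 0.2-0.8")
    return mean(estimates)


# Fractions of 10 trials at three intensities 0.5 dB apart (illustrative numbers).
print(threshold_50([0.0, 0.5, 1.0], [0.3, 0.6, 0.9], slope_per_db=0.4))  # about 0.375 dB
```

Fixing the slope lets a single fraction of 10 trials yield a threshold estimate, which is what makes the fast, drift-resistant stimulus/reference pairing possible.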

Results

Two typical line-spread functions at different luminance levels (200 and 4.75 cd.m-2) are shown in Fig. 3a; only half of each symmetrical function has been drawn. In this case the viewing distance was 154 cm and the interline distance 0.9 min of arc. The results demonstrate that the method is reasonably precise. Each point is the average of 6 pairs of thresholds, each pair being the mean of two quickly measured counterbalanced pairs. Every point is based on 960 trials, the vertical bars indicating the standard deviation of the mean. The general shape, an integrative centre flanked by two inhibitory phases, is in accordance with earlier physiological (Rodieck, 1965) and psychophysical data (e.g. Wilson and Bergen, 1979; Kulikowski and King-Smith, 1973).

The low-level line-spread function is clearly broader than the high-level one, which is consistent with, for example, visual acuity data. Their shape, however, is the same, since they can be matched by changing the horizontal scale by a factor of 1.6, as demonstrated in Fig. 3b. This result suggests that neural factors are dominant, since optical blurring would not change with level. Also, the inhibitory phases cannot be optical. As already suggested (Rodieck, 1965; Wilson and Bergen, 1979), the shape may be described by the difference of two Gaussian functions. However, one positive and two negative Gaussians turned out to give a slightly better fit.
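The two descriptive steps above, a sum of Gaussians for the shape and a single horizontal scale factor relating the two luminance levels, can be sketched as below. This is an illustration under stated assumptions, not the fitting procedure used here: the amplitudes and widths are invented, the "low-level" curve is synthesised by stretching the "high-level" one by 1.6, and the scale factor is recovered by a least-squares grid search after normalising both curves to unit peak (NumPy assumed).

```python
import numpy as np


def gauss_sum(y, amps, sigmas):
    """Sum of zero-centred Gaussians a*exp(-y^2 / (2 s^2)).
    A positive term models the integrative centre, negative terms the surround."""
    y = np.asarray(y, dtype=float)
    return sum(a * np.exp(-0.5 * (y / s) ** 2) for a, s in zip(amps, sigmas))


def best_scale(y, u_ref, u_other, factors):
    """Factor k for which u_other(y) best matches u_ref(y / k), in the
    least-squares sense, after normalising both curves to unit peak."""
    ref = u_ref / u_ref.max()
    other = u_other / u_other.max()
    errors = [np.sum((np.interp(y / k, y, ref) - other) ** 2) for k in factors]
    return factors[int(np.argmin(errors))]


y = np.linspace(-15.0, 15.0, 601)                      # min of arc
amps, sigmas = [1.0, -0.25, -0.05], [0.8, 2.0, 5.0]    # illustrative values only
u_high = gauss_sum(y, amps, sigmas)                    # stand-in for the high level
u_low = gauss_sum(y / 1.6, amps, sigmas)               # synthetic, 1.6 x broader
factors = np.round(np.arange(1.0, 2.5, 0.05), 2)
print(best_scale(y, u_high, u_low, factors))           # 1.6
```

A pure change of horizontal scale leaves the normalised shape unchanged, which is why a single factor suffices when the two functions are isomorphic.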


Fig. 3a. Example of line-spread functions $U_\delta(y)$ elicited by a luminance increment of a TV line, demonstrating the effect of background luminance (subject H. v.d.W.; 4.75 and 200 cd.m-2; horizontal axis in min of arc).

Fig. 3b. As Fig. 3a, but with the horizontal axis for the lower level multiplied by a scaling factor (horizontal axis in min of arc at 200 cd.m-2).

The linearity assumption was tested by comparing line and edge responses at 200 cd.m-2. It was felt that more sampling points were needed to obtain reasonably accurate convolutions of the line response. Therefore the viewing distance was increased to 280 cm, bringing the interline (sampling) distance to about 0.5 min of arc. Fig. 4 shows typical line and edge responses measured for the same subject. The dashed curves represent simultaneous fits to both responses, the line response being the spatial derivative of the edge response. The results are clearly consistent


with the linearity assumption in combination with peak detection.
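Under linearity, the edge response is the running integral of the line response, so differentiating it must return the line response. The NumPy sketch below checks this numerically with a cumulative trapezoidal integral and `np.gradient`; the Gaussian-sum line-spread function is a made-up stand-in, not measured data. Peak detection then gives an edge threshold of 1/max|edge response| (with D = 1), analogous to eq. (8).

```python
import numpy as np

y = np.linspace(-10.0, 10.0, 2001)            # distance from edge, min of arc
dy = y[1] - y[0]
# Hypothetical line-spread function: positive centre, two negative surround terms.
line = (np.exp(-0.5 * (y / 0.8) ** 2)
        - 0.25 * np.exp(-0.5 * (y / 2.0) ** 2)
        - 0.05 * np.exp(-0.5 * (y / 5.0) ** 2))

# Edge response = cumulative (trapezoidal) integral of the line response.
edge = np.concatenate(([0.0], np.cumsum(0.5 * (line[1:] + line[:-1]) * dy)))

# Differentiating the edge response recovers the line response.
recovered = np.gradient(edge, dy)
print(np.max(np.abs(recovered - line)))       # small discretisation error only

edge_threshold = 1.0 / np.max(np.abs(edge))   # peak detection, D = 1
```

In the experiment the same relation is exploited in reverse: the line and edge responses are measured independently, and agreement between the fitted line response and the derivative of the edge response supports linearity.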

Fig. 4. Comparison of line- and edge-spread functions, shown in panels a and b respectively (subject XT, 200 cd.m-2; horizontal axis: distance y from the edge in min of arc). The dashed curves are fitted simultaneously, the lower being the convolution of the upper one.

However, one must be careful when predicting thresholds from the convolution of line-spread functions according to eq. (7), since other 'channels' may come into play. This is demonstrated for the thresholds of bars of variable width. Fig. 5 shows 50% bar thresholds as a function of bar width. The finely dashed curves are the predictions based on (7). Prediction and measurement deviate considerably for bars wider than 5'. Such a discrepancy is commonly found (e.g. Wilson and Bergen, 1979).

Fig. 5. Threshold of a bar as a function of bar width (subject XT; bar width in min of arc). The bold curve shows the predicted values based on the convolution of the measured line-spread function. The thin curves show how the prediction is changed by involving other channels.

The discrepancy cannot be removed by correcting for the effect of probability summation. Nor is there any anomaly in the bar responses beyond this critical width. This is demonstrated in Fig. 6, where measurement and prediction of the response to a 10' bar are compared.

Our results support the common opinion that this type of deviation between measurement and prediction is due to multichannel processing (e.g. Wilson and Bergen, 1979). Indeed, if we use Wilson and Bergen's 4-channel formalism, taking 4 isomorphic channels, each a factor of 2 wider than the previous one, we obtain good predictions of bar thresholds. The effect of these channels is indicated in Fig. 5.


Fig. 6. Measured responses to a 10' bar, compared with prediction (subject XT; horizontal axis in min of arc).

Conclusions

a. Visual responses elicited by lines, bars, edges, etc. under TV conditions can be measured with reasonable precision with a sampling perturbation technique.

b. The line responses are isomorphic and wider for lower mean luminance levels. Their shapes are consistent with literature data for static lines on homogeneous static backgrounds.

c. Although the retina is inhomogeneous, local linearity is found in the direction perpendicular to the long dimension. For example, the line-spread function is the spatial derivative of the edge-spread function.

d. Prediction of bar thresholds based on a one-channel approach is correct up to about 5' of arc. For wider bars, more sophisticated multichannel models have to be used.

Summary

Line-, bar- and edge-spread functions of the human visual system elicited by luminance increments of TV lines have been obtained by a sampling, drift-correcting perturbation technique. Line-spread functions at the 200 and 5 cd.m-2 levels differ only by a scale factor. Line, bar and edge responses are linearly related. Threshold prediction for bars based on a single-channel model becomes inaccurate for bars wider than 5 min of arc.

Acknowledgements

Thanks are due to the students H. Hegt, X. Timmermans, H. van der Woord and C. Timmers for carrying out the measurements.


References

Blommaert, F.J.J. and Roufs, J.A.J. (1981) The foveal point spread function as a determinant for detail vision. Vision Research 21, 1223-1233.

Campbell, F.W., Carpenter, R.H.S. and Levinson, J.Z. (1969) Visibility of aperiodic patterns compared with that of sinusoidal gratings. J. Physiol. 204, 283-298.

Campbell, F.W. and Green, D.G. (1965) Optical and retinal factors affecting visual resolution. J. Physiol. 181, 576-593.

Campbell, F.W. and Robson, J.G. (1968) Application of Fourier analysis to the visibility of gratings. J. Physiol. 197, 551-566.

Duffieux, P.M. (1946) L'intégrale de Fourier et ses applications à l'optique. Faculté des Sciences, Besançon.

Graham, N. (1977) Visual detection of aperiodic spatial stimuli by probability summation among narrowband channels. Vision Research 17, 637-652.

Kulikowski, J.J. and King-Smith, P.E. (1973) Spatial arrangement of line, edge and grating detectors revealed by subthreshold summation. Vision Research 13, 1455-1478.

Meeteren, A. van (1973) Visual aspects of image intensification. Thesis, Utrecht.

Nes, F.L. van and Bouman, M.A. (1967) Spatial modulation transfer in the human eye. J. Opt. Soc. Am. 57, 401-406.

Rashbass, C. (1976) Unification of two contrasting models of the visual increment threshold. Vision Research 16, 1281-1283.

Rodieck, R.W. (1965) Quantitative analysis of cat retinal ganglion cell response to visual stimuli. Vision Research 5, 583-601.

Roufs, J.A.J. and Blommaert, F.J.J. (1981) Temporal impulse and step responses of the human eye obtained psychophysically by means of a drift-correcting perturbation technique. Vision Research 21, 1203-1221.

Sachs, M.B., Nachmias, J. and Robson, J.G. (1971) Spatial-frequency channels in human vision. J. Opt. Soc. Am. 61, 1176-1188.

Schade, O.H. (1948) Electro-optical characteristics of television systems. I. Characteristics of vision and visual systems, p. 5; II, III. Electro-optical specification for television systems, p. 245, p. 490; IV. Correlation and evaluation of electro-optical characteristics of imaging systems. RCA Rev.

Watson, A.B. and Nachmias, J. (1977) Patterns of temporal interaction in the detection of gratings. Vision Research 17, 893-902.

Wilson, H.R. and Bergen, J.R. (1979) A four mechanism model for threshold spatial vision. Vision Research 19, 19-32.


Cognition and Communication


Developments

D.G. Bouwhuis and H.C. Bunt

Language learning

The efforts of the group Cognition and Communication in the field of foreign lan­

guage learning were considerably expanded in 1982. First, the investigation on the

use of randomly accessible speech in an English proficiency course, the preliminary

results of which were reported upon in last year's issue of this report (Bouwhuis

and Kreutzer, 1981), was continued this year. Altogether 58 students have partici­

pated in the experiments. The results confirm earlier findings with respect to im­

proved English word knowledge and listening proficiency for students who had stu­

died with randomly available speech, rather than a language laboratory recorder. A

small group of students was asked to come back after finishing the experiment proper, to study another lesson with the learning system other than the one they had been using throughout the experiment. This meant that students who had listened to speech by pointing at words or sentences with a pen now learned with a language laboratory recorder, and vice versa. It turned out that not a single student preferred the language laboratory recorder, which confirmed the impression of high motivation that the students displayed while studying with 'speech-at-will'. This high motivation was also borne out by a questionnaire given to all participating students. A general point of concern is the lack of valid tests of text comprehension. Although we have been using comprehension tests which were part of the course that was employed, item analyses showed their validity to be essentially negligible. It is, therefore, not surprising that the various groups participating in the experiment did not differ significantly in two tests of this sort.

Interactive language learning

Another project, funded jointly by the Eindhoven University of Technology and Tilburg University, is concerned with an interactive learning system for students (lower grades of high school) who have to learn the meanings of a basic set of words in context. In doing this they can, at any moment, choose one of a number of presentation modes, such as reviewing the whole list, studying all the meanings of one word, administering themselves a lenient test or playing an adapted version of Mastermind® involving the same words. A preliminary version of the system has been the subject of a pilot study with 11 students at a high school, where the set of words was an integral part of the course taught in class. User acceptance was quite favourable.

It turned out that none of the eleven participating students needed any preliminary

instruction for carrying out the learning task.

Studying styles adopted by the students were very different, but all of them

learned a considerable proportion of the 108 words.


Learning to read by dyslectic children

This project, funded by the Dutch Organisation for the Advancement of Pure Research (ZWO), started this year with the implementation of a minicomputer system for speech production. As is well known, dyslectic children have particularly great difficulties in reading longer words (exceeding 6 letters). It is hypothesised that these children may associate the word sound with the printed word if it is repeated often enough. Therefore, if dyslectic children can make the word sound available simply by pointing at the word, reading and decoding may become automatised through overlearning.

A 'talking reading board' has been constructed, which can produce the sounds of over sixty words. The word sounds were constructed by diphone concatenation (see Elsendoorn and 't Hart, this issue) and are stored in the working memory of a minicomputer to allow very fast access.

So far, the system has been tried out in a pilot study where normal and dyslectic

children either could see and hear all words or only see them.

Children who could hear and see the words were far more motivated than those who

could only see them. Only the latter group needed increasing encouragement to try

to study the (difficult) words.

Preliminary analyses indicate stable and favourable results for the word-with-sound

system; in no single instance was the learning process affected in a negative way.

Word recognition in reading

It is well known that the perception of single letters in a word is impeded by the

other letters. This effect is called visual interference. An experiment was carried

out to investigate to what extent this effect is selective, i.e. depends on the

particular shape and features of letters. The interfering effect of the letter 'x' might be supposed to impede particularly the perception of letters with a square envelope having an inner structure, like 'w' or 'z', unlike the effect of an interfering letter 'o'. The letters to be recognised, in isolation or embedded in strings, were lower-case letters without ascenders (such as b, d) or descenders (such as p, j), and were presented at a number of foveal and parafoveal positions. Under these conditions no clear-cut selective interference was established. Both 'x' and 'o' caused an almost equal interference effect on all x-height letters. In a subsequent study, letters to be recognised were followed by punctuation marks such as colons, dots and commas. In these cases, too, a clear interference effect was obtained, even for the simple dot.

Recognition of bisyllabic words

Two other studies were carried out on the recognition of longer words (more than 6 letters) consisting of two syllables. Studies of this kind had not been carried out at IPO before, and we were particularly interested in the contribution of each of the two component syllables to the recognition of the whole word. This was investigated in a lexical decision task by presenting the words in such a way that fixation was on the second letter, the middle of the word or the penultimate letter. It turned out that for these words the relatively limited shifts of fixation


points were effective in changing decision and recognition times considerably. Contrary to what is sometimes claimed in the literature, the second syllable could be shown to affect the recognition time of the whole word significantly, even if its effect is more limited than that of the first syllable.

Motor coordination in typing

The study on coordination of finger movements in typing has been continued. It was concerned not so much with typing errors as with the moment of initiation of the finger movements and the trajectories for the execution of keystrokes. These factors essentially determine the speed of typing and have not been studied in great detail before. Results of the study can be found in the paper by Larochelle in this issue.

Empirical studies in information dialogues

Information dialogues via the telephone with the flight information service for

travellers and visitors at Schiphol, Amsterdam Airport, were collected for a whole

day. The dialogues, recorded on tape, provide valuable material both for testing

aspects of our theoretical framework of information dialogues and for determining

practical syntactic, lexical, semantic, and pragmatic requirements that an accept­

able automatic dialogue system would have to fulfill.

For a newly initiated study of man-machine dialogues in a practical situation see

also Leopold, this issue.

Development of an intelligent dialogue system

Development has been started of a computer model that integrates aspects of participation in an information dialogue, especially in relation to interpreting natural language utterances in a dialogue context, to updating the model of the discourse domain and of the partner, and to generating appropriate continuations of the dialogue. The model is provisionally called INDIS (for IPO Natural Language Dialogue System).

The overall design of INDIS is based on investigations into dialogue control acts

(see Bunt et al., 1980), which provide indications as to the main functional compo­

nents to be distinguished in an intelligent dialogue system.

In designing the language interpretation component of INDIS we build on earlier work in the PHLIQA project at Philips Research Laboratories (see Bronnenberg et al., 1980). In particular we have taken over the general multilevel semantic representation approach and, with some modifications, the set-theoretical nucleus of the formal languages for semantic representation. This nucleus has been modified and extended to include constructions based on ensemble theory, the extension of classical set theory defined in Bunt (1981a), resulting in the representation language called EL (Ensemble Language). The development of multilevel parsers for fragments of Dutch and English, including the necessary procedures and data structures for calculating and representing the semantic types of EL expressions, has constituted the major part of our implementation work and is described in this issue by Bunt and thoe Schwartzenberg. Other implementation activities have been concerned with


the development of basic structures and procedures for knowledge representation and

manipulation.

The theoretical investigations in relation to the INDIS model have been concerned

mainly with the definition of a representational formalism for modelling the infor­

mation state of the human dialogue partner (see below), the definition of a system

of speech acts, necessary and sufficient for information dialogues, and the design

of a grammar in which the syntactic-semantic rules are enriched with pragmatic com­

ponents. By enriching the grammar rules and lexical representations with pragmatic

components, we make it possible to determine, along with the syntactic structures

and the semantic contents of utterances, the type of speech act that is performed ­

in so far as this can be determined on syntactic-semantic grounds, of course.

Speaker modelling

A central role in the INDIS model will be played by a representation system model­

ling the dialogue situation and its dynamics. This includes the representation of

the system's knowledge of the domain of discourse and its knowledge of the state of

the human dialogue partner, which includes the system's knowledge of the partner's

knowledge of the system's knowledge, etc. The formally correct representation of this kind of knowledge presents intricate logical problems which can only be solved by using complex systems of modal logic, for which no automatic deduction techniques are known.

Recently, a significant advance in this area has been made by Moore (1981), who developed an axiomatic representational formalism in a first-order language, in which propositions of the modal logic occur as basic terms. Unfortunately, the automatic deduction techniques that have been developed for axiom sets, even in first-order languages, are extremely time- and space-consuming, as Appelt's (1982) implementation of Moore's formalism confirms. Moreover, it is not clear how the axiomatic

method could be combined with the model-theoretic evaluation of nonmodal expres­

sions given a data base describing the state of the discourse world, which has

proved so successful in the PHLIQA system. Therefore a purely model-theoretic ap­

proach with recursive evaluation rather than deduction from axiom sets would be

preferable. Our initial attempts at designing a formally correct and computational­

ly effective representational system in this direction have been described in last

year's issue of our Annual Report (Bunt, 1981b). Our recent efforts consist of ex­

tending this work to include the design of an epistemic representation language

with recursive evaluation relative to partial models, the partial character of the

models reflecting the incompleteness of the system's knowledge of the human dia­

logue partner, that partner's knowledge of the discourse domain, etc.

Acknowledgements

A number of students have been active in preparing and running the experiments mentioned above or in data analysis. A. Zagers (Tilburg University) assisted in the final phase of the study on learning a foreign language with randomly accessible speech. G. Linders (Tilburg University) helped to run the pilot study on the interactive learning system. H. Michels (Tilburg University) carried out the experiments


on selective letter interference, while K. van Gucht (Free University, Brussels)

investigated the recognition of longer words. This study was followed up by R.

Janssen and R. van Lier (Moller Institute, Tilburg) for varying contrast levels.

G. thoe Schwartzenberg (Groningen University) has been productive in the develop­

ment of computer programs for parsing and interpreting Dutch or English sentences.

References

Appelt, D. E. (1981) Planning natural language utterances to satisfy multiple

goals. SRI International AI Center Technical Note 259.

Bouwhuis, D.G. and Kreutzer, H. (1981) The role of speech in foreign language learning. IPO Annual Progress Report 16, 86-95.

Bronnenberg, W.J., Bunt, H.C., Landsbergen, S.P.J., Scha, R.J.H., Schoenmakers, W.J. and Utteren, E.P.C. van (1980) The question answering system PHLIQA1. In Bolc (ed.) (1980) Natural language question answering systems. Macmillan, London; Carl Hanser Verlag, Munich & Vienna.

Bunt, H.C. (1981a) The formal semantics of mass terms. Doctoral dissertation, University of Amsterdam. Revised edition to be published by Cambridge University Press, Cambridge 1983.

Bunt, H.C. (1981b) Rules for the interpretation, evaluation, and generation of dialogue acts. IPO Annual Progress Report 16, 99-107.

Bunt, H.C. and Schwartzenberg, G.O. thoe (1982) Syntactic, semantic, and pragmatic parsing for a natural language dialogue system. IPO Annual Progress Report 17 (this issue).

Elsendoorn, B.A.G. and Hart, J. 't (1982) Exploring the possibilities of speech synthesis with Dutch diphones. IPO Annual Progress Report 17 (this issue).

Larochelle, S. (1982) The initiation and duration of movements in skilled typewriting. IPO Annual Progress Report 17 (this issue).

Leopold, F.F. (1982) Ergonomics, Developments. IPO Annual Progress Report 17 (this issue).

Moore, R.C. (1981) Reasoning about knowledge and belief. SRI International AI Cen­

ter Technical Note 191.


Strategy effects in letter and word recognition

D.G. Bouwhuis

A number of recent models for visual word recognition state that the constituent letters of words are basic units in that process (Bouwhuis, 1979; Rumelhart and McClelland, 1982; Paap et al., 1982). It is not particularly easy to study the exact contribution of letter recognition to word recognition. This is owing to several visual factors, and to others related to the hierarchical nature of word recognition as posited in those models. First, words may appear at various positions in the visual field during reading. Not only does parafoveal position increase visual interference between letters, but the pattern of interference also changes considerably as a function of position. Initial and final letters are best perceived, the letter farthest from fixation being consistently the best perceived of all, while the middle letters are much less identifiable. Consequently, the letters in a word will be perceived quite differently depending on the exact position of the fixation point in the word, especially when it is a long one.

Second, it is well known that we need not always identify all the letters of a word in order to recognise it. Lexical, syntactic and semantic knowledge enable the reader to decode incompletely seen words successfully. This phenomenon obscures the exact contribution of letter recognition. It can also be argued that visual properties of the word pattern may allow incompletely seen letters to be inferred. The word pattern may also supply letters that have not been seen but must be part of it. This is a consequence of the hierarchical nature of the process: higher levels determine which parts must have been present at lower levels.

To assess letter recognition proper in words, letters must be presented in meaningless but wordlike letter strings. Investigations of this sort have been carried out by Bouma (1973). In these experiments the subjects reported the letters in the various positions of the strings. The strings were derived from existing words by replacing the letters that were not to be reported with visually similar ones, in such a way that no other existing word resulted. Word length, word contour and visual interference therefore remained unchanged.

Word recognition and letter confusion

The letter confusion model proposed by Bouwhuis (1979) is a model for word recognition that explicitly takes into account how letters are perceived in words. In effect, the model assumes that the recognition of a letter occurs independently of that of the other letters in a word. This independence concerns only the perception of letter identity. The presence of adjacent letters lowers recognition accuracy relative to that for a letter presented in isolation, but it does not affect letter identity per se.

The assumption of independence of letter recognition allows a very simple combination rule for all the letters in a word, viz. multiplication of the letter recognition probabilities. Generally, any other combination rule involving interdependence requires parameters that have to be estimated. Under the independence condition this can be avoided.

Of course, as has been implied before, there are many ways in which dependence can arise. These dependence effects, however, are introduced only at the word level of the model, not at the level of letters. The letter confusion model gives a fairly accurate account of many findings in word recognition. There is, however, one consistent discrepancy: the model underestimates the probabilities of correct word recognition. For words presented at four different positions in the visual field, the average probability of correct recognition was 0.74, while the prediction of the model was 0.67. This is an underestimation of 0.07 (Bouwhuis, 1979).

Some possible reasons for this underestimation have been discussed before (Bouwhuis, 1979). They can be subsumed under three headings: vocabulary effects, subject effects and strategy effects.

Vocabulary effects arise when the vocabulary of the model is increased, which gives rise to more error responses.

Subject effects, on the other hand, are caused mainly by combining recognition probabilities from subjects who differ in recognition scores.

Both of these effects have been treated in earlier publications (Bouwhuis, 1978, 1979) and were shown to be of minor importance or none at all.

In this paper we discuss in detail what effects perceptual strategies can have on the underestimation. As will be shown below, the underestimation can be taken to reflect dependence effects in letter recognition, whereas the model assumes independence. It is therefore useful to explore theoretically the kinds of dependence that may arise from effects other than purely perceptual ones. The independence property, which is both productive and parsimonious, can then be retained in the model.

The effects of strategy

To elucidate the effects of perceptual strategy, it is useful first to state the fundamental equation of the letter confusion model (Bouwhuis and Bouma, 1979) for word recognition:

$$P(L_1L_2L_3 \mid l_1l_2l_3) = P(L_1{-}{-} \mid l_1{-}{-})\,P({-}L_2{-} \mid {-}l_2{-})\,P({-}{-}L_3 \mid {-}{-}l_3).$$

Here $L_1L_2L_3$ is a string of letters that may have been perceived for the three-letter stimulus word $l_1l_2l_3$, and it may also be a real word. $L_i$ and $l_i$ are the constituent letters in their indicated positions.
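The following is a minimal Python sketch of the fundamental equation. It assumes that the per-position confusion probabilities are already known. The function name and the small confusion table for the stimulus "cat" are hypothetical illustrative values, not IPO data. The sketch computes only the probability of a perceived letter string, not the subsequent word decision.

```python
from math import prod

def string_probability(percept, stimulus, confusion):
    """P(percept | stimulus) as a product of per-position letter probabilities.

    confusion[k][(s, r)] is the (hypothetical) probability of reporting letter r
    at position k when letter s was presented there; missing pairs count as 0.
    """
    if len(percept) != len(stimulus):
        raise ValueError("percept and stimulus must have the same length")
    return prod(
        confusion[k].get((s, r), 0.0)
        for k, (s, r) in enumerate(zip(stimulus, percept))
    )

# Hypothetical per-position confusion entries for the stimulus "cat".
confusion = [
    {("c", "c"): 0.8, ("c", "e"): 0.2},
    {("a", "a"): 0.5, ("a", "o"): 0.5},
    {("t", "t"): 0.9, ("t", "l"): 0.1},
]

print(string_probability("cat", "cat", confusion))  # 0.8 * 0.5 * 0.9
print(string_probability("col", "cat", confusion))  # 0.8 * 0.5 * 0.1
```

Because the positions enter only as separate factors, no interaction parameters are needed. This is the parsimony the independence assumption buys.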

Note that the independence assumption for letter recognition leads to the triple multiplication in the right-hand expression. It is conceivable that attention strays during the recognition trials, in the sense that the observer attends more to the left visual half-field than to the right one. It is even possible that slight eye movements are made inadvertently to the left or to the right. The observer will then be able to perceive information on the attended side somewhat better than information on the non-attended side. We therefore distinguish three perceptual states.

1. All-state. The observer perceives all information on the attended side correctly; if there are three letters, all three will be perceived.

2. Partial-state. The observer does not extract all information from the display, and some letters may be incorrectly recognised. Independence is assumed to hold here.

3. None-state. The subject has been unable to extract any useful sensory information and has to guess which letters were present.

Covariance effects

In statistics the covariance between two variables X and Y is defined as:

$$\mathrm{COV}(X,Y) = E[(X-EX)(Y-EY)] = E[XY] - E[X]\,E[Y],$$

where E is the expectation.

Generalising to the model equation (for two rather than three letters) leads to:

$$\mathrm{COV}(P_1,P_2) = E[P_1P_2] - E[P_1]\,E[P_2].$$

Therefore

$$E[P_1P_2] = E[P_1]\,E[P_2] + \mathrm{COV}(P_1,P_2).$$

Here $E[P_1]\,E[P_2]$ corresponds to the first two terms in the right-hand side of the fundamental model equation. By generalisation, it may correspond to any pair of the three terms.

It is seen that the expectation for recognition of two letters, $E[P_1P_2]$, equals the product of the separate letter recognition scores only when the covariance is zero. The covariance term affects its size as well. If the covariance is positive, an underestimation will result, which is exactly what has been observed.

In the sequel we will explore how the presence of some or all of the three hypothesised perceptual states affects the size of the covariance. Since all states may occur over trials, they may be configured in many combinations.
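The identity $E[P_1P_2] = E[P_1]E[P_2] + \mathrm{COV}(P_1,P_2)$ can be checked on any two-letter report table of the kind used below. This sketch assumes exact fractions and a table ordered as (both correct, only i correct, only j correct, neither correct). The function names and the mixture values are ours, chosen only for illustration. Mixing two independent states across trials is enough to produce a positive covariance, so that the product of the single-letter scores underestimates the two-letter score.

```python
from fractions import Fraction as F

def moments(table):
    """E[Pi], E[Pj], E[PiPj] and COV(Pi, Pj) for a two-letter report table.

    table = (p11, p10, p01, p00): probabilities that both letters, only
    letter i, only letter j, or neither letter is reported correctly.
    Pi and Pj are treated as 0/1 indicators of a correct report.
    """
    p11, p10, p01, p00 = table
    if abs(p11 + p10 + p01 + p00 - 1) > 1e-9:
        raise ValueError("report probabilities must sum to 1")
    e_i = p11 + p10      # letter i correct, whatever happens to j
    e_j = p11 + p01      # letter j correct, whatever happens to i
    e_ij = p11           # both letters correct
    return e_i, e_j, e_ij, e_ij - e_i * e_j

def independent(pi, pj):
    """Report table when the two letters are recognised independently."""
    return (pi * pj, pi * (1 - pj), (1 - pi) * pj, (1 - pi) * (1 - pj))

# Hypothetical mixture: on half of the trials both letters are easy (0.9),
# on the other half both are hard (0.3).
mix = tuple(F(1, 2) * x + F(1, 2) * y
            for x, y in zip(independent(F(9, 10), F(9, 10)),
                            independent(F(3, 10), F(3, 10))))
e_i, e_j, e_ij, cov = moments(mix)
print(e_ij, e_i * e_j, cov)   # 9/20 9/25 9/100
```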

Model variations

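In the examples below, the (perceived) letters $L_i$ and $L_j$ are given in the left-hand column, with a bar ($\bar{L}_i$) denoting an incorrectly reported letter. The theoretical probabilities corresponding to the configurations of information states are given in the right-hand column. In the covariance expressions, $P_i$ may be read as the outcome of the report of letter $i$ (1 if correct, 0 otherwise). Thus $E[P_i]$ is the probability of correct report and $E[P_iP_j]$ the probability that both letters are correct.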

The Partial model

This model corresponds essentially to the original letter confusion model (Bouwhuis, 1978) mentioned above.

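$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & p_ip_j\\
L_i\bar{L}_j & p_i(1-p_j)\\
\bar{L}_iL_j & (1-p_i)p_j\\
\bar{L}_i\bar{L}_j & (1-p_i)(1-p_j)
\end{array}$$

$$\mathrm{COV}(P_i,P_j) = E[P_iP_j] - E[P_i]\,E[P_j] = p_ip_j - p_ip_j = 0.$$

As stipulated earlier, the original version of the model predicts that there should be no covariance.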
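As a quick check of this result, the sketch below evaluates the Partial-model table on a grid of letter probabilities with exact fractions. The grid and the helper names are our own illustrative choices. The covariance is identically zero, as the derivation states.

```python
from fractions import Fraction as F

def partial_model(pi, pj):
    """(LiLj, Li~Lj, ~LiLj, ~Li~Lj) probabilities for the Partial model."""
    return (pi * pj, pi * (1 - pj), (1 - pi) * pj, (1 - pi) * (1 - pj))

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

grid = [F(k, 10) for k in range(11)]
print(all(covariance(partial_model(pi, pj)) == 0 for pi in grid for pj in grid))  # True
```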


All-or-partial model

There is a probability $a$ that both letters are seen. Otherwise independence holds, and the probability of correct report of letter $i$ is $p_i$.

$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & a+(1-a)p_ip_j\\
L_i\bar{L}_j & (1-a)p_i(1-p_j)\\
\bar{L}_iL_j & (1-a)(1-p_i)p_j\\
\bar{L}_i\bar{L}_j & (1-a)(1-p_i)(1-p_j)
\end{array}$$

$$\begin{aligned}
\mathrm{COV}(P_i,P_j) &= a+(1-a)p_ip_j - [a+(1-a)p_i]\,[a+(1-a)p_j]\\
&= a(1-a)(1-p_i)(1-p_j).
\end{aligned}$$

From this result it is found that, for $p_i, p_j < 1$, there will be no covariance only if $a$ is either zero or one. For $a=0.2$, $p_i=0.6$ and $p_j=0.7$ the covariance is 0.02. Maximum covariance will be obtained when $a=0.5$, which is too high a value. In that case the subject would always have to attend fully to one visual position (left and right presentations were randomised over trials, but the position of the string was constant over sessions). This can easily be demonstrated not to be the case.
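The sketch below, using illustrative helper names of our own, builds the All-or-partial table with exact fractions. It confirms that the covariance computed from the table equals the closed form $a(1-a)(1-p_i)(1-p_j)$ and gives 0.0192 for the parameter values quoted above.

```python
from fractions import Fraction as F

def all_or_partial(a, pi, pj):
    """(LiLj, Li~Lj, ~LiLj, ~Li~Lj) for the All-or-partial model."""
    return (a + (1 - a) * pi * pj,
            (1 - a) * pi * (1 - pj),
            (1 - a) * (1 - pi) * pj,
            (1 - a) * (1 - pi) * (1 - pj))

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

a, pi, pj = F(1, 5), F(3, 5), F(7, 10)
cov = covariance(all_or_partial(a, pi, pj))
closed_form = a * (1 - a) * (1 - pi) * (1 - pj)
print(cov, float(cov))   # 12/625 0.0192
```

Since $a(1-a)$ is the only factor depending on $a$, the covariance peaks at $a = 0.5$ for any fixed $p_i$ and $p_j$.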

Partial-or-none model

In this model, sensory information is extracted with probability $b$. When no sensory information is available, guessing takes place, which leads to a correct report with probability $g_i$.

$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & bp_ip_j+(1-b)g_ig_j\\
L_i\bar{L}_j & bp_i(1-p_j)+(1-b)g_i(1-g_j)\\
\bar{L}_iL_j & b(1-p_i)p_j+(1-b)(1-g_i)g_j\\
\bar{L}_i\bar{L}_j & b(1-p_i)(1-p_j)+(1-b)(1-g_i)(1-g_j)
\end{array}$$

$$\begin{aligned}
\mathrm{COV}(P_i,P_j) &= bp_ip_j+(1-b)g_ig_j - [bp_i+(1-b)g_i]\,[bp_j+(1-b)g_j]\\
&= b(1-b)(p_i-g_i)(p_j-g_j).
\end{aligned}$$

From this result it is found that the covariance vanishes if $b=0$ or $b=1$. It also vanishes if $p_i=g_i$. It is interesting to note that if $p_i<g_i$ (while $p_j>g_j$), the covariance becomes negative and an overestimation should result. Such a situation, where the sensory probability is lower than the guessing probability, is hard to imagine. Taking $b=0.8$, $p_i=0.6$ and $p_j=0.7$ as before and $g_i=g_j=1/26$, we get 0.06 for the covariance. This is higher than for the All-or-partial model.
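A minimal exact-arithmetic check of the Partial-or-none derivation, under the same illustrative parameter values, follows. The helper names are ours. It also shows the sign reversal when a sensory probability drops below the guessing rate $1/26$.

```python
from fractions import Fraction as F

def partial_or_none(b, pi, pj, gi, gj):
    """Report table: sensory information with probability b, else guessing."""
    return (b * pi * pj + (1 - b) * gi * gj,
            b * pi * (1 - pj) + (1 - b) * gi * (1 - gj),
            b * (1 - pi) * pj + (1 - b) * (1 - gi) * gj,
            b * (1 - pi) * (1 - pj) + (1 - b) * (1 - gi) * (1 - gj))

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

b, pi, pj, g = F(4, 5), F(3, 5), F(7, 10), F(1, 26)
cov = covariance(partial_or_none(b, pi, pj, g, g))
assert cov == b * (1 - b) * (pi - g) * (pj - g)   # closed form from the text
print(round(float(cov), 3))   # 0.059, i.e. about 0.06

# A sensory probability below the guessing rate makes the covariance negative.
print(covariance(partial_or_none(b, F(1, 52), pj, g, g)) < 0)   # True
```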

All-or-none model

The name of this model may cause confusion, because Townsend (1971) used the same name for another model. In that model, however, the subject either extracts some sensory information, leading to correct recognition with probability $p$, or fails to extract information and just guesses. Viewed this way, the Townsend all-or-none model is similar to our partial-or-none model. The difference is that our models describe the recognition of two letters, whereas Townsend's models were intended for single-letter recognition.

Here both letters are seen with probability $a$; otherwise both are guessed.

$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & a+(1-a)g_ig_j\\
L_i\bar{L}_j & (1-a)g_i(1-g_j)\\
\bar{L}_iL_j & (1-a)(1-g_i)g_j\\
\bar{L}_i\bar{L}_j & (1-a)(1-g_i)(1-g_j)
\end{array}$$

$$\begin{aligned}
\mathrm{COV}(P_i,P_j) &= a+(1-a)g_ig_j - [a+(1-a)g_i]\,[a+(1-a)g_j]\\
&= a(1-a)(1-g_i)(1-g_j).
\end{aligned}$$

Here again the covariance vanishes when $a$ equals 0 or 1, and it is maximal when $a=0.5$. For $a=0.2$ and $g_i=g_j=1/26$ the covariance amounts to 0.15. This is relatively high, but understandable because of the dominant role of the parameter $a$. Neither of the guessing parameters can be assumed to be much greater than 1/26 on average, so the covariance is only slightly lower than $a(1-a)$.
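The figure of 0.15 can be reproduced exactly. With $a = 0.2$ and $g_i = g_j = 1/26$, the sketch below (helper names ours) gives $25/169 \approx 0.148$, just under $a(1-a) = 0.16$.

```python
from fractions import Fraction as F

def all_or_none(a, gi, gj):
    """Both letters seen with probability a; otherwise both are guessed."""
    return (a + (1 - a) * gi * gj,
            (1 - a) * gi * (1 - gj),
            (1 - a) * (1 - gi) * gj,
            (1 - a) * (1 - gi) * (1 - gj))

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

a, g = F(1, 5), F(1, 26)
cov = covariance(all_or_none(a, g, g))
print(cov, round(float(cov), 3), float(a * (1 - a)))   # 25/169 0.148 0.16
```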

It can be argued, however, that the model cannot possibly fit the data. If the observed probability of correct report of one letter equals 0.60, the value of $a$ is easily estimated (with $g = 1/26$):

$$a + (1-a)g = 0.60 \quad\Rightarrow\quad a = \frac{0.60 - g}{1 - g} = 0.584.$$

This value exceeds 0.50, which is the maximum value of $a$ given that presentations are randomly distributed over the left and right half-fields. Correct letter reports are observed with probabilities exceeding 0.52, the value at which $a$ must be 0.50. This therefore casts doubt on the validity of this model.
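The estimate and the 0.52 threshold follow from solving $a + (1-a)g = p$ for $a$. A small sketch of that arithmetic, with $g = 1/26$ as assumed above (function names ours):

```python
from fractions import Fraction as F

def estimate_a(p_obs, g):
    """Solve a + (1 - a) * g = p_obs for a (All-or-none model)."""
    return (p_obs - g) / (1 - g)

def single_letter_score(a, g):
    """Probability of correct report of one letter implied by a and g."""
    return a + (1 - a) * g

g = F(1, 26)
a_hat = estimate_a(F(3, 5), g)
print(a_hat, float(a_hat))                                # 73/125 0.584
print(round(float(single_letter_score(F(1, 2), g)), 3))   # 0.519
```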

All-model

This model is even more extreme than the All-or-none model in that the observer

does not produce guessing responses when nothing has been seen, and therefore fails

to report on a number of occasions.

$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & a\\
L_i\bar{L}_j & 0\\
\bar{L}_iL_j & 0\\
\bar{L}_i\bar{L}_j \ \text{(no report)} & 1-a
\end{array}$$

$$\mathrm{COV}(P_i,P_j) = a - a^2 = a(1-a).$$

As before, the covariance is maximal when $a=0.50$ and vanishes when $a$ equals 1 or 0. Just as in the All-or-none model, values of $a$ exceeding 0.50 are required to describe the letter recognition data properly. This invalidates the model.
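In the All-model, the probability of reporting a single letter correctly is simply $a$. An observed score of 0.60 would therefore directly require $a = 0.60$. The sketch below (helper names ours) tabulates that score and the covariance $a(1-a)$ for a few illustrative values of $a$.

```python
from fractions import Fraction as F

def all_model(a):
    """All-model: both letters seen with probability a, otherwise no report."""
    return (a, F(0), F(0), 1 - a)

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

for a in (F(1, 5), F(1, 2), F(3, 5)):
    table = all_model(a)
    single = table[0] + table[1]   # correct report of letter i equals a
    print(a, single, covariance(table))
# 1/5 1/5 4/25
# 1/2 1/2 1/4
# 3/5 3/5 6/25
```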

None-model

In this model it is assumed that the subject never manages to extract sensory information from the presentations and always guesses.

$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & g_ig_j\\
L_i\bar{L}_j & g_i(1-g_j)\\
\bar{L}_iL_j & (1-g_i)g_j\\
\bar{L}_i\bar{L}_j & (1-g_i)(1-g_j)
\end{array}$$

$$\mathrm{COV}(P_i,P_j) = g_ig_j - g_ig_j = 0.$$

In this model no covariance will be observed. Besides, as the model has no perceptual component, it will be inadequate for the purposes of word recognition anyhow.

All-or-partial-or-none model

By combining all three information states, the most complete model can be constructed. Though it is also the most complex one, it reduces easily to the simpler models when appropriate parameters are set to zero.

$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & a+bp_ip_j+[1-(a+b)]g_ig_j\\
L_i\bar{L}_j & bp_i(1-p_j)+[1-(a+b)]g_i(1-g_j)\\
\bar{L}_iL_j & b(1-p_i)p_j+[1-(a+b)](1-g_i)g_j\\
\bar{L}_i\bar{L}_j & b(1-p_i)(1-p_j)+[1-(a+b)](1-g_i)(1-g_j)
\end{array}$$

$$\mathrm{COV}(P_i,P_j) = a+bp_ip_j+[1-(a+b)]g_ig_j - \{a+bp_i+[1-(a+b)]g_i\}\,\{a+bp_j+[1-(a+b)]g_j\}.$$

This expression does not simplify easily. However, we can eliminate a number of small terms involving the guessing parameters, for example $g_ig_j$ (about 0.0015 for $g_i=g_j=1/26$). This leads to the following expression:

$$\mathrm{COV}(P_i,P_j) \approx a(1-a) - ab(p_i+p_j) + b(1-b)p_ip_j.$$

If $a$ is taken to be 0.2, $b=0.8$, $p_i=0.6$ and $p_j=0.7$, the covariance turns out to be 0.02. With $a+b=1$ this is the All-or-partial case, and the value agrees with the one found there. Inspection of the expression reveals that especially the last term, involving the parameter $b$, is effective in changing the size of the covariance. For instance, if $b$ is decreased to 0.50, the covariance amounts to 0.135.
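The sketch below compares the exact covariance of the combined model with the approximation above. It uses exact fractions, $g_i = g_j = 1/26$ and the helper names `full_model` and `approx_cov`, which are our own. The approximation gives 0.0192 and 0.135 for the two settings discussed. For $a + b = 1$ there is no None-state, so the approximation is exact. For $b = 0.50$ the exact value is somewhat lower (about 0.12), because the dropped guessing terms are not all of order $g_ig_j$.

```python
from fractions import Fraction as F

def full_model(a, b, pi, pj, gi, gj):
    """Report table for the All-or-partial-or-none model."""
    c = 1 - (a + b)   # probability of the None-state
    return (a + b * pi * pj + c * gi * gj,
            b * pi * (1 - pj) + c * gi * (1 - gj),
            b * (1 - pi) * pj + c * (1 - gi) * gj,
            b * (1 - pi) * (1 - pj) + c * (1 - gi) * (1 - gj))

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

def approx_cov(a, b, pi, pj):
    """Approximation with the guessing terms dropped."""
    return a * (1 - a) - a * b * (pi + pj) + b * (1 - b) * pi * pj

pi, pj, g = F(3, 5), F(7, 10), F(1, 26)
for a, b in ((F(1, 5), F(4, 5)), (F(1, 5), F(1, 2))):
    exact = covariance(full_model(a, b, pi, pj, g, g))
    print(float(a), float(b), round(float(exact), 4), float(approx_cov(a, b, pi, pj)))
```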

Whereas the effects of changes in $a$ are limited and those of the guessing parameters negligible, the partial-state parameter $b$ is quite powerful. This holds even though the covariance of the Partial model itself is zero, while that of the All-or-none and All models is sizeable.

This model seems able to capture the covariance effects observed in the data and to give quite reasonable descriptions of the observed recognition probabilities. It is obvious that, in this model at least, covariance effects are unavoidable when slight attentional shifts occur. It is also clear that guessing has no noticeable effect on the covariances.

It is interesting to look at another alternative mentioned before: the All-or-none model proposed by Townsend (1971). It is not easy to identify the parameters of the Townsend All-or-none model with those of the present series. However, since there is no provision for seeing two letters perfectly well, we might call it the partial-and-none model here. Perception and failure to perceive may occur simultaneously, and independence is implied.

Partial-and-none model

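$$\begin{array}{ll}
\text{Report} & \text{Probability}\\
L_iL_j & p_ip_j+(1-p_i)p_jg_i+p_i(1-p_j)g_j+(1-p_i)(1-p_j)g_ig_j\\
L_i\bar{L}_j & p_i(1-p_j)(1-g_j)+(1-p_i)(1-p_j)g_i(1-g_j)\\
\bar{L}_iL_j & (1-p_i)p_j(1-g_i)+(1-p_i)(1-p_j)(1-g_i)g_j\\
\bar{L}_i\bar{L}_j & (1-p_i)(1-p_j)(1-g_i)(1-g_j)
\end{array}$$

$$\begin{aligned}
\mathrm{COV}(P_i,P_j) &= p_ip_j+(1-p_i)p_jg_i+p_i(1-p_j)g_j+(1-p_i)(1-p_j)g_ig_j\\
&\quad - [p_i+(1-p_i)g_i]\,[p_j+(1-p_j)g_j] = 0,
\end{aligned}$$

since the first line factorises into $[p_i+(1-p_i)g_i]\,[p_j+(1-p_j)g_j]$.

The not too surprising result is that, according to this model, no covariance should be observed at all.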
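The factorisation can be confirmed by brute force. The sketch below (helper names ours) evaluates the Partial-and-none table with exact fractions over a small grid of illustrative values for $p_i$, $p_j$, $g_i$, $g_j$, and finds zero covariance everywhere.

```python
from fractions import Fraction as F
from itertools import product

def partial_and_none(pi, pj, gi, gj):
    """Each letter independently: perceived with p, otherwise guessed with g."""
    both = pi*pj + (1 - pi)*pj*gi + pi*(1 - pj)*gj + (1 - pi)*(1 - pj)*gi*gj
    only_i = pi*(1 - pj)*(1 - gj) + (1 - pi)*(1 - pj)*gi*(1 - gj)
    only_j = (1 - pi)*pj*(1 - gi) + (1 - pi)*(1 - pj)*(1 - gi)*gj
    neither = (1 - pi)*(1 - pj)*(1 - gi)*(1 - gj)
    return both, only_i, only_j, neither

def covariance(table):
    """COV(Pi, Pj) = E[PiPj] - E[Pi]E[Pj] for 0/1 report indicators."""
    p11, p10, p01, _ = table
    return p11 - (p11 + p10) * (p11 + p01)

values = [F(0), F(1, 26), F(1, 3), F(3, 5), F(1)]
print(all(covariance(partial_and_none(pi, pj, gi, gj)) == 0
          for pi, pj, gi, gj in product(values, repeat=4)))  # True
```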

Conclusion

It was shown that even slight shifts of attention favouring the recognition of two letters in a string lead to covariance. This in turn leads to predictions that underestimate the probability of correct responses. It was also shown that such shifts must necessarily be limited. The model that most easily accounts for the observed underestimation makes clear that the underestimation can be described wholly by dependencies due to strategy effects. Basically, it is also almost inevitable. Therefore, the assumption of independence between letter recognition processes at the perceptual level, as hypothesised in the letter confusion model (Bouwhuis, 1978), can be upheld.

A final point concerns the applicability of the analysis to words of three letters. We have presented an analysis for two letters only, and it might be argued that the predicted covariance could be extended to more letters. In the first place, a covariance involving three variables is hard to interpret, even though the same analysis could be applied to the variables involving all three letters. Moreover, in the letter recognition experiments, the letters to be reported were obtained in two sessions. In one session, the first and last letters of a string had to be reported, and in a later one, only the middle letter. Any covariance effects would therefore be present in the results of the trials involving the two letters (initial and final) and in those of the word recognition experiments. From this it would seem that the analysis applied to two letters is the proper one.

Summary

A consistent finding in tests of a letter confusion model for word recognition was that it underestimated the probabilities of correct word responses. It is shown here that this underestimation can be interpreted as a dependence between letter recognition processes, whereas the letter confusion model assumes complete independence of the letters. In a theoretical account, we have explored the kinds of dependence in letter report that may arise as a consequence of 'strategy' effects, such as attention and bias. It is shown that the underestimation can be completely accounted for by a model incorporating both attentional effects and guessing.


References

Bouma, H. (1973) Visual interference in the parafoveal recognition of initial and final letters of words. Vision Research 13, 767-782.
Bouwhuis, D.G. (1978) A model for the visual recognition of words of three letters. In: J. Requin (Ed.) Attention and Performance VII. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1978.
Bouwhuis, D.G. (1979) Visual recognition of words. Eindhoven: doctoral dissertation, Nijmegen University.
Paap, K.R., Newsome, S.L., McDonald, J.E. and Schvaneveldt, R.W. (1982) An activation-verification model for letter and word recognition: the word-superiority effect. Psych. Review 89, 573-594.
Rumelhart, D.E. and McClelland, J.L. (1982) An interactive activation model of context effects in letter perception. Psych. Review 89, 60-94.
Townsend, J.T. (1971) Alphabetic confusion: a test of models for individuals. Perception and Psychophysics 9, 449-454.


The initiation and duration of movements in skilled typewriting

S. Larochelle

Introduction

This paper deals with a cognitive-motor skill that has become increasingly important in man-machine interaction since the advent of computers and computerised word-processing systems, namely typewriting. The paper presents the preliminary results of an experiment aimed at specifying the composition of the movements involved in skilled typing. More specifically, the results concern the time at which the movements leading to successive keystrokes are initiated, and the duration of the movements involved in transcribing lexical and non-lexical material.

Until recently, all studies of typewriting had dealt exclusively with the errors produced by typists and/or the temporal intervals between successive keystrokes. Implicit in most theories based on such evidence (e.g. Shaffer, 1976; Sternberg et al., 1978) was the notion that successive keystrokes are executed in a strictly sequential fashion. To my knowledge, only two studies prior to the one reported here (Gentner, Grudin and Conway, 1980; Olsen and Murray, 1976) have dealt with the properties of the movements. Part of the reason for this neglect is that the properties of the movements are accessible only through a painstaking, frame-by-frame analysis of a film or videotape of the subjects' performance. Computers, by contrast, have greatly eased the collection and analysis of errors and interstroke intervals. As a result, the available evidence on movements is limited and based on a relatively small number of keystrokes. For instance, Gentner et al.'s results are based on a sample of 147 keystrokes obtained when a very skilled typist (90 words per minute) repeatedly transcribed the same sentence.

Nonetheless, the studies devoted to movement have profoundly challenged the traditional view of typing. Olsen and Murray (1976) found that the movements leading to a given keystroke started on average 5 ms before the preceding key was pressed. In Gentner et al.'s (1980) study, 96% of the keystrokes were initiated before the previous key was pressed. A more unexpected finding is that the order in which the movements were initiated did not always correspond to the order in which the keys were finally pressed. For 21% of the keystrokes, the movements started before the preceding keystroke was initiated. In all these cases, the movements ended with the keystrokes in the correct order. Overall, the movements were initiated on average 137 ms before the previous key was pressed. This value contrasts with the 5 ms obtained by Olsen and Murray (1976), showing a much greater amount of coarticulation in the movements.

The present experiment differs from the previous two studies in that it is not based on the usual continuous transcription typing situation. There is considerable evidence (Shaffer and Hardwick, 1968) that the semantic and syntactic structure of the material contributes very little to the typing speed of skilled subjects. By contrast, it is known that the lexical-orthographic composition of the material does influence typing speed. In order to investigate lexical-orthographic effects on the composition of movements, the "discontinuous typing" paradigm was adopted. In this task, the subjects are presented with isolated words or word-size letter strings, which they are asked to type on cue, as fast and accurately as possible.

Method

The subjects were four right-handed females, native English speakers, working as professional typists. Their typing speed, which ranged from 61 wpm to 73 wpm, was comparable to that of the subjects who participated in Olsen and Murray's (1976) study.

The stimuli were all four letters long. Half of the letter strings were words, the other half nonwords. The stimuli also varied with respect to their motor composition. Half of the letter strings were typed with one hand (1H strings), the other half with both hands (2H strings). The 2H strings required that all the letters but one be typed by the same hand. The hand responsible for the single remaining keystroke was therefore free to start moving at any time during the execution of the other keystrokes. For this reason, the single keystroke made by the alternate hand in 2H strings is of critical interest. This keystroke will be referred to hereafter as the critical keystroke, or keystroke C. The 1H strings differed from the 2H strings only with respect to the critical keystroke, the movements being much more constrained in the 1H strings. The position of the critical keystroke varied across stimuli, occurring in position 2, 3 or 4. Examples of 2H words in which the critical letter occupies position 2, 3 and 4 are rust, dent and week, respectively. The corresponding 1H words were rest, debt and weed. In all these examples, the left hand does most (2H strings) or all (1H strings) of the typing. The reverse situation, in which most or all of the work is done by the right hand, was equally represented. Over both hands, there were 18 1H words and 18 2H words for each position of the critical keystroke.

The nonsense letter strings were constructed by replacing each letter of every word included in the experiment by the letter located on the same row and typed with the same finger of the opposite hand. Applying this transformation to the 2H words rust, dent and week, for instance, yields the 2H nonwords urly, kiby and oiid. Similarly, the 1H words rest, debt and weed yield the 1H nonwords uily, kiny and oiik. The rare words for which the transformation produces a lexical item were excluded from the experiment. Also excluded were all words in which the letters a, z, x or c appear, because there are no equivalent letters on the right half of the keyboard.
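The nonword construction is a simple mirror mapping across the keyboard. The sketch below assumes a standard QWERTY layout with conventional touch-typing finger assignments, pairing each left-hand key with the right-hand key in the mirrored column of the same row. The table names, `mirror_string`, `make_nonword` and the optional `lexicon` filter are our own illustrative choices. The mapping reproduces the six examples given in the text. It also shows why a, z, x and c must be excluded: they map onto punctuation keys.

```python
# Assumed QWERTY rows; right-hand rows are listed in mirror order so that
# position k in a left row and position k in a right row use the same finger.
LEFT_ROWS = ("qwert", "asdfg", "zxcvb")
RIGHT_ROWS = ("poiuy", ";lkjh", "/.,mn")

MIRROR = {}
for left, right in zip(LEFT_ROWS, RIGHT_ROWS):
    for l_key, r_key in zip(left, right):
        MIRROR[l_key] = r_key
        MIRROR[r_key] = l_key

def mirror_string(word):
    """Opposite-hand counterpart of word, or None if a letter maps to a non-letter key."""
    out = "".join(MIRROR[ch] for ch in word.lower())
    return out if out.isalpha() else None

def make_nonword(word, lexicon=frozenset()):
    """Mirror word; reject it if the result is not all letters or is a real word."""
    candidate = mirror_string(word)
    if candidate is None or candidate in lexicon:
        return None
    return candidate

for w in ("rust", "dent", "week", "rest", "debt", "weed"):
    print(w, "->", make_nonword(w))
```

The `lexicon` argument stands in for whatever word list was used to reject mirrored strings that happen to be real words. The text does not specify that list.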

Throughout the experiment, which required two sessions of about two hours per subject, each subject typed every stimulus string on three separate occasions. The words and the nonwords were presented in separate blocks, the other factors varying randomly within each block. Each successive trial was announced by a beep, two sec after the completion of the subject's prior response. A stimulus string was then displayed for one sec in the center of a monitor situated on top of an Apple II computer. Following the disappearance of the string, the subject heard three short tones at 500 ms intervals. The first two tones served as warning signals, and the third tone was the signal to respond. The subjects were asked to leave their fingers in resting position on or above the home row keys (a, s, d, f, j, k, l, ;) until the response signal was given. In order to motivate compliance with this request, 25% of the trials were catch trials. On these trials the subjects were instructed to ignore the string presented and to strike all the home row keys simultaneously and as fast as possible. Catch trials were signaled by a low tone, identical to the preceding two warning tones. On the remaining 75% of the trials the response signal was a high tone, and the subjects were instructed to type the string presented as fast and accurately as possible.

The computer recorded the keystrokes made by the subjects, but the corresponding characters were not sent to the monitor screen. The computer also recorded, to the nearest 2 ms, the interstroke intervals produced by the subjects in typing the stimulus strings. The finger movements of the subjects were videotaped using a Philips Video-80 camera mounted on the ceiling directly above the keyboard. In addition to a transverse view, the camera also captured a frontal view of the movements from a mirror placed at a 45-degree angle at the top of the keyboard. The video fields were recorded every 20 ms and serially numbered. The videotape analyses were done using a Sony RM-400CE control panel connected to a Sony VO-2850P video recorder. This apparatus makes it possible to step through the videotape field by field.

Results and discussion

Of the three trials on which the subjects typed every item used in the experiment, only one was submitted to videotape analysis. Being interested in optimal performance, I chose, for each stimulus string, the errorless trial that produced the smallest average interstroke interval. From the videotape, I extracted the times at which the movements involved in the execution of the critical keystroke started (Cs) and ended (Ce). The same measures were also taken for the keystroke immediately preceding the critical keystroke (C-1) and, in some cases, for earlier keystrokes (C-N) as well.
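The trial-selection rule can be sketched as follows. The record format (typed string plus keystroke times in ms) and the function name `best_trial` are hypothetical. We also assume that the average interstroke interval is taken over successive keystrokes only, which the text does not state explicitly.

```python
from statistics import mean

def best_trial(trials, target):
    """Index of the errorless trial with the smallest mean interstroke interval.

    trials: list of (typed_string, keystroke_times_ms) pairs for one stimulus
    (hypothetical record format). Returns None if every trial contains an error.
    """
    best, best_mean = None, None
    for idx, (typed, times) in enumerate(trials):
        if typed != target or len(times) < 2:
            continue  # only errorless trials with at least one interval qualify
        intervals = [t2 - t1 for t1, t2 in zip(times, times[1:])]
        m = mean(intervals)
        if best_mean is None or m < best_mean:
            best, best_mean = idx, m
    return best

trials = [("rust", [0, 180, 350, 540]),
          ("rsut", [0, 150, 300, 450]),   # transposition error: not eligible
          ("rust", [0, 170, 330, 500])]
print(best_trial(trials, "rust"))  # 2
```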

Measure                   Nonwords   Words   Difference
1H strings:
  Movement onset              58        39        19
  Movement duration          161       154         7
  Interstroke interval       218       193        25
2H strings:
  Movement onset             -18       -32        14
  Movement duration          196       187         9
  Interstroke interval       177       155        22

Table 1. Movement onsets, movement durations and interstroke intervals (ms) obtained with 1H and 2H strings.

Note: The onset and duration of the movements were measured from the videotape with an accuracy of 20 ms, whereas the interstroke intervals were measured by the computer with an accuracy of 2 ms. This difference in accuracy explains why the movement onsets and durations do not add up exactly to the interstroke intervals.


Table 1 summarises the results of these analyses, averaged over items and over subjects. Movement onset is defined as the time from the end of the preceding keystroke (C-1e) to the start of the critical movement (Cs). A positive value indicates that the critical keystroke was initiated after the completion of the preceding keystroke, whereas a negative value indicates overlap in movement. The means in Table 1 exclude the results obtained with 1H strings in which the same finger is responsible for keystrokes C and C-1. In such cases overlap in finger movement is physically impossible. Also excluded are the results obtained with the paired 2H strings. Movement duration is defined as the time between Cs and Ce, and the critical interstroke interval as the time between C-1e and Ce.
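The three definitions can be written down directly in terms of numbered video fields. The sketch below assumes a hypothetical record holding the start and end field of each keystroke movement, and converts fields to ms using the 20 ms field period given in the Method section. The class and function names are ours.

```python
from dataclasses import dataclass

FIELD_MS = 20  # one video field every 20 ms, as stated in the Method section

@dataclass
class Movement:
    """Start and end of a keystroke movement as video field numbers (hypothetical record)."""
    start_field: int
    end_field: int

def critical_measures(prev, crit):
    """Onset, duration and interstroke interval (ms) for keystroke C.

    prev is keystroke C-1 and crit is keystroke C. Onset = Cs - C-1e,
    duration = Ce - Cs, interval = Ce - C-1e; a negative onset means overlap.
    """
    onset = (crit.start_field - prev.end_field) * FIELD_MS
    duration = (crit.end_field - crit.start_field) * FIELD_MS
    interval = (crit.end_field - prev.end_field) * FIELD_MS
    return {"onset": onset, "duration": duration,
            "interval": interval, "overlap": onset < 0}

m = critical_measures(Movement(90, 100), Movement(99, 108))
print(m)  # {'onset': -20, 'duration': 180, 'interval': 160, 'overlap': True}
```

When all three quantities come from the same video fields, onset plus duration equals the interval exactly. The intervals in Table 1 come from the computer's 2 ms clock instead, which is why the columns there need not add up.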

The lexical-orthographic composition of the material affected typing speed: as the interstroke interval results show, nonwords were typed more slowly than words. Moreover, the lexical-orthographic effects were about the same for 1H and 2H strings. The difference between words and nonwords was 25 ms among the 1H strings and 22 ms among the corresponding 2H strings. These effects were expected on the basis of previous experiments (see Larochelle, in press). What this experiment revealed is that the decrease in typing speed is mostly due to longer delays in the initiation of movements. The average difference in movement onset between the words and the nonwords (16 ms) amounted to two-thirds of the effect found in the interstroke intervals (24 ms).
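The averages can be recomputed from Table 1. The sketch below takes a simple unweighted mean of the 1H and 2H rows (our assumption; the text does not say how it averaged). This gives 16.5 ms for onset and 23.5 ms for the interstroke interval, a ratio of about 0.70. That is close to the rounded 16 ms, 24 ms and two-thirds quoted above. The figures in the text may have been averaged differently, for example over items.

```python
# Values (ms) transcribed from Table 1: (nonwords, words).
TABLE1 = {
    "1H": {"onset": (58, 39), "duration": (161, 154), "interval": (218, 193)},
    "2H": {"onset": (-18, -32), "duration": (196, 187), "interval": (177, 155)},
}

def lexical_effect(measure):
    """Nonword-minus-word difference, averaged without weighting over 1H and 2H."""
    diffs = [TABLE1[hand][measure][0] - TABLE1[hand][measure][1] for hand in ("1H", "2H")]
    return sum(diffs) / len(diffs)

onset, interval = lexical_effect("onset"), lexical_effect("interval")
print(onset, interval, round(onset / interval, 2))  # 16.5 23.5 0.7
```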

These results contrast with those obtained by Olsen and Murray (1976), who found the movement durations to be much more affected by the composition of the material when they compared the transcription of alphanumeric codes and normal prose. The discrepancy between the two studies probably lies in the nature of the non-lexical material employed. Alphanumeric material requires longer finger movements than strictly alphabetic material when the transcription is done on the standard Sholes keyboard. By contrast, the nonwords used in the present study were generated from the words in a way that preserved to a large extent the distances that the fingers had to travel. As a result, the lexical-orthographic effects on movement duration were much reduced, but not eliminated.

The results also contrast with those obtained by Gentner, Grudin and Conway (1980), in that I found much less overlap in movement. With 1H words, the movements leading to the critical keystroke were found to start before the completion of the preceding keystroke on only 10% of the trials. In Gentner et al.'s study, 51% of the 1H digraphs showed overlap in movement. Results concerning the onset of within-hand finger movements must, however, be viewed with caution. In the case of within-hand movements, it is often difficult to determine whether a finger is actively reaching for a key, or whether it is passively following or simply reacting to the movements of the other fingers. Fortunately, the results obtained with 2H strings in this study do not raise the same problem. The movements observed in typing the critical keystroke had to be intentional, since the hand involved had no other letter to type. With 2H words, there was overlap in movement on slightly more than half of the trials (56%). For one subject, the proportion of trials showing overlap reached 90% with 2H words, a value approaching that obtained by Gentner et al.
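The overlap percentages quoted above (10%, 56%, 90%) are proportions of trials with a negative onset. The following is a minimal sketch, assuming per-trial onsets in milliseconds measured from the completion of the preceding keystroke. The `same_finger` flag is a hypothetical field used to drop, as in the text, trials in which one finger types both keystrokes. The example trials are made up.

```python
def proportion_overlapping(trials):
    """Fraction of usable trials whose movement toward the critical key began
    before the preceding keystroke was completed (negative onset).

    Each trial is (onset_ms, same_finger). Trials in which the same finger types
    both keystrokes are skipped, since overlap is physically impossible there.
    """
    usable = [onset for onset, same_finger in trials if not same_finger]
    if not usable:
        raise ValueError("no usable trials")
    return sum(1 for onset in usable if onset < 0) / len(usable)


trials = [(-40, False), (15, False), (-5, False), (30, False), (0, True)]
print(proportion_overlapping(trials))  # 0.5
```

An onset of exactly zero is counted as no overlap here. That boundary choice is an assumption, not something the text specifies.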


Perhaps more interesting than the number of trials showing overlap in movement, or the amount of temporal overlap obtained, is the coordination of the movements observed on trials with overlap. The movements leading to the critical keystroke tended to start at particular moments during the execution of earlier keystrokes. For three of the subjects, the movements rarely started before the hitting motion of the previous keystroke was initiated. With 2H strings, these two events (the upward lifting motion of one finger and the downward hitting motion of the other) often occurred in close synchrony. The performance of the fourth subject, mentioned previously, was distinguished by the fact that, in many cases, two fingers started to lift in close synchrony. Kelso, Southard and Goodman (1979) found evidence of a regulatory mechanism involved in controlling the coordination of simple two-handed movements. The existence of privileged moments for movement onset suggests that similar mechanisms are also involved in typing.

Subject  Position of         Repeated           Repeated  Homologous  Others
         critical keystroke  letter             finger    keys
3        4                   week
4        3                   hobo               kiby      lien, purl  hivl, yiby
4        4                   been, niib, uiis   holt
5        -                   -                  -         -           -
6        4                   hood

Table 2. Categorisation of the trials in which the movements were initiated out of sequence

There were trials in which the movement toward the critical key started before the preceding keystroke was initiated. All such trials involved 2H strings and, as shown in Table 2, most of them were found in the performance of the fourth subject. Table 2 suggests that out-of-sequence initiation of movements occurs in rather limited situations, which are also known to be conducive to errors. In most cases, the critical keystroke was preceded or surrounded by a repeated letter. This situation is known to lead to doubling errors, in which the wrong letter is doubled. The movements also started early when the critical keystroke was preceded by a repeated finger, not only by a repeated letter. This suggests that there may be some economy in the number of effectors that the motor system can control simultaneously. In two cases of out-of-sequence initiation, the critical keystroke and the preceding one involved homologous keys. These cases, like homologous errors in which a letter is replaced by the homologous letter, are also suggestive of low-level interactions in the control of bimanual movements.
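The categories used in Table 2 can be made explicit with a small classifier. This is a sketch under stated assumptions. It assumes a standard QWERTY touch-typing finger assignment, and it treats homologous keys as the mirror-image key typed by the same finger of the other hand. The rules are one reading of the text, not the author's procedure. Applied to the eleven strings in Table 2, with the critical keystroke positions given there, they reproduce the categorisation shown.

```python
# Assumed standard QWERTY touch-typing assignment: key -> (hand, finger).
FINGER = {}
for keys, hand_finger in [("qaz", ("L", "little")), ("wsx", ("L", "ring")),
                          ("edc", ("L", "middle")), ("rfvtgb", ("L", "index")),
                          ("yhnujm", ("R", "index")), ("ik,", ("R", "middle")),
                          ("ol.", ("R", "ring")), ("p;/", ("R", "little"))]:
    for key in keys:
        FINGER[key] = hand_finger

# Homologous keys: mirror-image positions, typed by the same finger of the other hand.
LEFT, RIGHT = "qwertasdfgzxcvb", "poiuy;lkjh/.,mn"
HOMOLOGUE = {**dict(zip(LEFT, RIGHT)), **dict(zip(RIGHT, LEFT))}


def classify(string, position):
    """Context of the critical keystroke at 1-based `position` (Table 2 categories)."""
    s = string.lower()
    c = position - 1
    prev1 = s[c - 1] if c >= 1 else None
    prev2 = s[c - 2] if c >= 2 else None
    nxt = s[c + 1] if c + 1 < len(s) else None
    if (prev2 is not None and prev2 == prev1) or (prev1 is not None and prev1 == nxt):
        return "repeated letter"   # preceded or surrounded by a repeated letter
    if prev2 in FINGER and prev1 in FINGER and FINGER[prev2] == FINGER[prev1]:
        return "repeated finger"   # the two preceding keystrokes use one finger
    if prev1 is not None and HOMOLOGUE.get(prev1) == s[c]:
        return "homologous keys"   # critical key mirrors the preceding key
    return "others"


for word, pos in [("week", 4), ("kiby", 3), ("lien", 3), ("hivl", 3)]:
    print(word, classify(word, pos))
```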

Although they are still incomplete, the results presented here have some implications for the processes underlying skilled typing. First, it appears that the effectors and the general direction of the movements can be specified very early. In typing the word holt, for instance, the movements toward the t started halfway through the execution of the o, 320 ms before the movements leading to the l were initiated. Second, the longer delays in movement onset obtained with nonwords suggest that access to the information concerning the effectors needed and the location of the keys may be influenced by the lexical-orthographic composition of the material. Third, there was also a difference in movement duration between the words and the nonwords, albeit a small one. This suggests that the control of the movements was not as efficient in typing nonwords, which in turn suggests that the movements were not fully specified at onset time. Accordingly, the results have provided some evidence for the existence of regulatory mechanisms involved in typing. I hope to obtain more information about the nature of these mechanisms from the digitisation of the videotapes, which is currently in progress.

Summary

An experiment is reported whose purpose was to investigate the composition of the movements involved in skilled typewriting. The method consisted of presenting the subjects with isolated words or word-size letter strings, which they were asked to type, on cue, as fast and accurately as possible. The performance of the subjects was recorded on videotape to determine when the movements leading to successive keystrokes were initiated and how long the movements took to complete. The lexical-orthographic composition of the material was found to influence primarily the time of movement onset, but there was also a small difference in movement duration between the words and the nonwords. The results confirmed earlier reports that successive keystrokes are not always executed in a strictly sequential fashion. Instead, the movements often overlap, sometimes even starting in an order different from the order in which the keys are finally pressed. The analysis of the movement onset times, and of the cases in which the movements were initiated out of sequence, suggested the existence of some regulatory mechanisms influencing the execution of the typing response.

References

Gentner, D.R., Grudin, J. and Conway, E. (1980) Finger movements in transcription typing. Techn. Rept. 8001. University of California at San Diego, Center for Human Information Processing, La Jolla, Calif.

Kelso, J.A., Southard, D.L. and Goodman, D. (1979) On the coordination of two-handed movements. Journal of Experimental Psychology: Human Perception and Performance 5, 229-238.

Larochelle, S. (in press) A comparison of skilled and novice performance in discontinuous typing. In W.E. Cooper (Ed.), Cognitive aspects of skilled typewriting. New York: Springer-Verlag.

Olsen, R.A. and Murray, R.A. (1976) Finger motion in typing of texts of varying complexity. Proceedings, 6th Congress of the International Ergonomics Association, 446-450.

Shaffer, L.H. (1976) Intention and performance. Psychological Review 83, 375-393.

Shaffer, L.H. and Hardwick, J. (1968) Typing performance as a function of text. Quarterly Journal of Experimental Psychology 20, 360-369.

Sternberg, S., Monsell, S., Knoll, R.L. and Wright, C.E. (1978) The latency and duration of rapid movement sequences: Comparisons of speech and typewriting. In G.E. Stelmach (Ed.), Information processing in motor control and motor learning. New York: Academic Press, 117-152.


Syntactic, semantic and pragmatic parsing for a natural language dialogue system

H.C. Bunt and G.O. thoe Schwartzenberg*)

Introduction

In the last few years we have started investigations into a variety of aspects of pure information-exchange dialogues ('information dialogues'). We have a particular interest in dialogues between a computer and a human user in restricted forms of natural language. These studies, carried out in the Institute's working group on dialogues, concern ergonomic, cognitive and linguistic aspects and their interplay. These aspects have been observed in experimental settings (Van Katwijk et al., 1979; Bunt et al., 1980; Van Katwijk, 1981) and analysed from theoretical points of view (Bunt and Van Katwijk, 1980; Bunt, 1981b). Parts of this work are being applied in the design of simple man-machine dialogue systems involving speech (see Leopold, this issue; Waterworth, 1982). We have now also started developing a computer model that integrates our theoretical conception of the interpretation and generation of dialogue elements with dynamic models of the dialogue situation. The model is provisionally called INDIS (IPO Natural Language Dialogue System) and is intended to develop eventually into an intelligent automatic dialogue partner.

In this paper we describe the multilevel parser for small fragments of Dutch and English that has been developed as part of the INDIS system. Given an input belonging to the natural language fragment, the parser aims to construct formal representations for all those interpretations, in terms of direct speech acts and literal meanings, that are semantically well-formed given the domain of discourse and pragmatically well-formed given the dialogue situation.

The general language interpretation framework

The general framework for natural language interpretation used here is that of two-level model-theoretic semantics (Bunt, 1981a, after Bronnenberg et al., 1980; Medema et al., 1975), combined with an adaptation of speech act theory to information dialogues.

Generally, in model-theoretic semantics a natural language sentence is translated into a formal language, the result being an unambiguous representation of the literal meaning of the sentence. An ambiguous sentence has more than one such representation. The representations in the formal language have a value that can be calculated, given a model of the discourse domain. In the two-level model-theoretic approach the translation is performed in two steps. The first step takes into account only those semantic aspects that relate to the structural properties of the sentence, as determined by word order and function words; only the logical form of the sentence is determined, so to speak. This logical form is represented in a formal language whose constants correspond one-to-one to the content words of the natural language, and are therefore just as ambiguous. The second step takes care of the semantic aspects of the natural language content words, conserved in the constants of the formal language, by relating these to the elements of the discourse domain. The constants of a second formal language correspond one-to-one to these elements, so the second step takes the form of translating the expressions of the first formal language into the second. If the two formal languages are chosen so that every syntactic construction in the first is also allowed in the second, this translation can take the form of replacement of lexical items.

*) Now at Tilburg University, Department of Sociocultural Sciences

The choice of these formal languages is obviously an important one. For the INDIS model we have taken two extensions of a family of languages designed for the PHLIQA question-answering system (Bronnenberg et al., 1980). Like virtually any formal language, the PHLIQA representation languages are based in their semantics on set theory, that is, on the idea of discrete individual objects and relations between them. They therefore have no adequate means of representing semantic structures that involve continuous concepts like water, paper, freight or time (cf. Bunt, 1983a; Hayes, 1974). To overcome this difficulty, Bunt (1981a) has defined an extension of classical set theory, called ensemble theory, which contains formal objects with a nondiscrete structure. He has also defined extensions of the PHLIQA languages to include ensemble-theoretical constructs. Of this family of 'Ensemble Languages', two instances were defined at the two main levels of representation in INDIS. The first language is called EL/F, F for 'formal', since expressions in this language describe the logical forms of sentences. The second is called EL/R, R for 'referential', since the constants in this language are used to describe how content words refer to the discourse domain.

The pragmatic part of the framework consists, first of all, in the definition of a class of communicative actions that is assumed to contain all and only those types of acts needed for participating in information dialogues. These acts, which we call dialogue acts, have a semantic content and a communicative function. We define the function of a dialogue act by the way in which the act, given its semantic content, affects the state of the addressee. On the assumption that the addressee interprets the act in the way intended by the speaker, this amounts to defining the function by the preconditions that the speaker's state has to satisfy if he is to perform the act according to normal conventions for cooperative linguistic behaviour (Allwood, 1976).

The class of dialogue act functions has a hierarchical structure, because some acts are more 'specific' than others: certain acts have stronger preconditions than others. For instance, a check is more specific than a question, and a confirmation is more specific than an answer. Given this well-defined class of dialogue acts, it becomes feasible to determine the communicative function from an utterance that realises it, which is very difficult in general. In information dialogues, where the participants are not trying to be funny, are not ironic, etc., there are relatively straightforward relations between syntactic properties of utterances and the communicative functions of dialogue acts.

Even so, the relation between utterances and dialogue acts is not simply one-to-one. It is therefore convenient to have an intermediate level at which representations are constructed of those syntactic aspects of an utterance that contribute to the determination of the underlying dialogue act. 'Surface speech acts' seems an appropriate name for these representations (cf. Appelt, 1982). Among the factors determining the surface speech act are the following: syntactic mood; interrogative adverbs, pronouns and determiners; adverbs such as 'indeed', 'really' and 'maybe', which indicate the speaker's degree of certainty about his knowledge or (dis)agreement between his knowledge and that of his partner; some special lexical items ('right', 'OK'); and, in written language, punctuation marks.
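The specificity ordering of dialogue act functions can be sketched by treating each function as a set of preconditions on the speaker's state. In this sketch, 'more specific' means 'strictly stronger preconditions'. The precondition labels below are hypothetical simplifications, chosen only to reproduce the two examples in the text (a check over a question, a confirmation over an answer). They are not the INDIS definitions. The `most_specific` function hints at one possible use: given several candidate functions compatible with an utterance, the hierarchy could single out the strongest ones.

```python
# Hypothetical, simplified preconditions on the speaker's (S) state with respect
# to a proposition p and the addressee (A). Illustrative labels only.
PRECONDITIONS = {
    "question": {"S wants to know whether p"},
    "check": {"S wants to know whether p", "S suspects that p"},
    "answer": {"S believes A wants to know whether p", "S believes that p"},
    "confirmation": {"S believes A wants to know whether p", "S believes that p",
                     "S believes A suspects that p"},
}


def more_specific(a, b):
    """True if function a has strictly stronger preconditions than function b."""
    return PRECONDITIONS[b] < PRECONDITIONS[a]  # proper-subset test


def most_specific(candidates):
    """Drop every candidate that some other candidate is more specific than."""
    return [a for a in candidates
            if not any(more_specific(b, a) for b in candidates if b != a)]


print(more_specific("check", "question"), more_specific("check", "answer"))  # True False
print(most_specific(["question", "check"]))  # ['check']
```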

To take these factors into account, the grammar rules have, besides a syntactic and a semantic part, a pragmatic part that builds up the surface speech act representation.

The grammar

The type of grammar we use can be considered, as far as the syntax and semantics are concerned, a variant of the kind of Montague grammar proposed by Partee (1973) and, in more detail, by Scha (1981). It consists of a lexicon plus a collection of compositional rules. Each rule has three parts: a syntactic part building syntactic structures from simpler ones, a semantic part building corresponding semantic structures, and a pragmatic part building surface speech act representations ('SSA representations'). The form of SSA representations is still under study; currently we use lists of attribute-value pairs for this purpose.

The grammar defines a class of ordered trees ('syntactic trees') whose terminal nodes are natural language words. For each syntactic tree the grammar defines an EL/F expression, which is the first-level semantic representation of the natural language expression at the terminal nodes, and an SSA representation.

The syntactic part of a rule consists of two rules: a phrase structure rule (PS rule) and a feature rule. The feature rule formulates conditions on the syntactic features of the constituents involved and specifies how these features propagate. The separation of the syntactic rules into a PS rule, which is in terms of syntactic categories, and a feature rule is purely a matter of convenience. Categories and features could in principle be combined, which would lead to a phrase structure grammar in terms of complex symbols (Gazdar, 1982).
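A grammar rule with its four components (PS rule, feature rule, semantic part, pragmatic part) might be represented as shown below. This is a hypothetical sketch, not the INDIS implementation, which was written in PASCAL. EL/F expressions are replaced by stand-in tuples, and SSA representations are lists of attribute-value pairs as described above. The agreement feature `num` and the example constituents are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Constituent:
    cat: str
    features: dict
    sem: object                              # stand-in for an EL/F expression
    ssa: list = field(default_factory=list)  # SSA: attribute-value pairs


@dataclass
class Rule:
    lhs: str
    rhs: tuple                        # PS rule: categories of the constituents
    features_ok: Callable[..., bool]  # feature rule: conditions
    propagate: Callable[..., dict]    # feature rule: features of the result
    semantics: Callable[..., object]  # semantic part: builds the EL/F expression
    pragmatics: Callable[..., list]   # pragmatic part: builds the SSA representation

    def apply(self, *parts: Constituent) -> Optional[Constituent]:
        if tuple(p.cat for p in parts) != self.rhs or not self.features_ok(*parts):
            return None
        return Constituent(self.lhs, self.propagate(*parts),
                           self.semantics(*(p.sem for p in parts)),
                           self.pragmatics(*(p.ssa for p in parts)))


# Hypothetical NP + VP -> S rule with number agreement.
np_vp_rule = Rule(
    lhs="S", rhs=("NP", "VP"),
    features_ok=lambda np, vp: np.features["num"] == vp.features["num"],
    propagate=lambda np, vp: {"num": np.features["num"]},
    semantics=lambda np_sem, vp_sem: (vp_sem, np_sem),  # apply VP meaning to NP meaning
    pragmatics=lambda np_ssa, vp_ssa: np_ssa + vp_ssa,  # collect SSA pairs
)

subject = Constituent("NP", {"num": "sg"}, "flight")
predicate = Constituent("VP", {"num": "sg"}, "leave", [("concord", "positive")])
sentence = np_vp_rule.apply(subject, predicate)
print(sentence.sem, sentence.ssa)  # ('leave', 'flight') [('concord', 'positive')]
```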

The PS rules are in fact not phrase structure rules in the usual sense, since not all of them are purely concatenating. This is because discontinuous constituents are handled by means of rules like

(1) V + (NP) + ADV → VP

which says that a discontinuous verb phrase may consist of a verb and an adverb separated by an NP. Suppose a discontinuous constituent like the VP formed by this rule is to be combined with another constituent C, which may be discontinuous as well. This requires a rule that inspects the relative positions of the constituents of VP and C, rather than those of VP and C themselves. As a result, the syntactic tree of an expression with discontinuous constituents may have crossing branches:

(2) [Figure: syntactic tree with crossing branches]

Although this may be temporary, the grammar currently has the restriction that a discontinuous constituent is 'dissolved' whenever it is combined with another constituent. The following example illustrates this:

(3)    VP          +    NP       →          S
      /  \              |               /   |   \
     V    ADV         PPRON            V    NP   ADV
     |     |            |              |    |     |
    Call  again         me           Call PPRON  again
                                            |
                                            me

The grammar considers an expression 'grammatical' if it can be assigned a syntactic tree with a single top node in which no words are 'hanging loose'. As a result, the syntactic tree of a grammatical expression never contains crossing branches.
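One way to make rule (1) and the 'dissolving' restriction concrete is to let every constituent carry the set of word positions it covers. A discontinuous constituent is then simply one whose positions have a gap. The sketch below assumes this representation and, for brevity, treats the pronoun 'me' directly as an NP, omitting the PPRON node of example (3). The names `Con`, `discontinuous_vp` and `combine_vp_np` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Con:
    cat: str
    positions: frozenset  # word positions covered; a gap means discontinuity

    def contiguous(self):
        return max(self.positions) - min(self.positions) + 1 == len(self.positions)


def discontinuous_vp(v, np, adv):
    """Rule (1), V + (NP) + ADV -> VP: the NP is inspected but not absorbed."""
    if (v.cat, np.cat, adv.cat) != ("V", "NP", "ADV"):
        return None
    if max(v.positions) + 1 != min(np.positions) or max(np.positions) + 1 != min(adv.positions):
        return None
    return Con("VP", v.positions | adv.positions)


def combine_vp_np(vp, np):
    """VP + NP -> S, allowed only if the result has no gap (the VP is 'dissolved')."""
    if (vp.cat, np.cat) != ("VP", "NP") or vp.positions & np.positions:
        return None
    s = Con("S", vp.positions | np.positions)
    return s if s.contiguous() else None


call, me, again = Con("V", frozenset({0})), Con("NP", frozenset({1})), Con("ADV", frozenset({2}))
vp = discontinuous_vp(call, me, again)
s = combine_vp_np(vp, me)
print(sorted(vp.positions), vp.contiguous(), sorted(s.positions))  # [0, 2] False [0, 1, 2]
```

The final S has a single top node covering every word position with no gap, which is the grammaticality criterion just stated.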

In its syntactic and semantic coverage, the present grammar is a modest extension of that described in Bunt (1981a), with the necessary adaptations in the Dutch version. Noteworthy are the treatment of mass nouns and mass adjectives, that of amount expressions, and in general the detailed treatment of quantification phenomena. (This treatment is discussed in Bunt, 1981a, chapter 17. A novel element in the current grammar is that questions of 'distribution', i.e. of distinctions like collective vs. distributive quantification, need not be resolved at the EL/F level. This is an advantage, since such questions can in general only be settled by taking domain-specific knowledge into account. See also Bunt, 1983b.)

Overall structure of the parser

The most recently implemented version of the parser uses a simple, brute-force strategy which is convenient for experimenting with the grammar. The analysis is bottom-up and from left to right. First, the words of the input sentence are identified by a dictionary lookup. For each entry, the dictionary provides:

(i) A syntactic tree consisting of a single node labelled with the name of the syntactic category and the syntactic features.

(ii) The semantic representation of the word in EL/F. For a content word this is simply an EL/F constant; for a function word it is a complex expression. For instance, the EL/F representation of the determiner 'each' is (λX: (λP: (∀x ∈ X: P(x)))).

(iii) An SSA structure, consisting of a sequence of pragmatic attributes with their values. For instance, for the adverb 'indeed' there is an attribute called 'concord', which has the value 'positive'.


There is more than one lexical entry for the same word if the word can have different syntactic functions, such as the word 'is' (copula, auxiliary, or main verb) or the Dutch word 'een' (article or numeral). Every syntactic node from the dictionary is supplied with a left and a right connection pointing to its predecessor and its successor. Moreover, the nodes with the associated EL/F expressions and SSA structures are stored according to their left-to-right order in a LIFO stack of objects, which will successively be the focus of the parsing process. Ambiguous words are represented as different objects with the same left/right connections.
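The three kinds of information a dictionary entry supplies can be sketched as a record, with one record per syntactic function of an ambiguous word. The field names, the category labels and the `lookup` helper are hypothetical. The EL/F value for 'each' is rendered as a Python closure that computes (λX: (λP: (∀x ∈ X: P(x)))) over finite sets.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    word: str
    cat: str                                  # (i) category of the single-node tree
    features: dict = field(default_factory=dict)
    elf: object = None                        # (ii) EL/F representation
    ssa: list = field(default_factory=list)   # (iii) pragmatic attribute-value pairs


def each_elf(X):
    """(λX: (λP: (∀x ∈ X: P(x)))) over finite sets."""
    return lambda P: all(P(x) for x in X)


DICTIONARY = [
    LexicalEntry("each", "DET", elf=each_elf),
    LexicalEntry("indeed", "ADV", ssa=[("concord", "positive")]),
    LexicalEntry("is", "COPULA"),  # one entry per syntactic function
    LexicalEntry("is", "AUX"),
    LexicalEntry("is", "V"),
]


def lookup(word):
    """All entries for a word; an ambiguous word yields several."""
    return [entry for entry in DICTIONARY if entry.word == word]


print([e.cat for e in lookup("is")])              # ['COPULA', 'AUX', 'V']
print(each_elf({2, 4, 6})(lambda x: x % 2 == 0))  # True
```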

The parser now performs a full parallel analysis. Beginning with the element on top of the stack, which corresponds to the leftmost word in the input, all the grammar rules are tried. Every hit (a rule whose conditions are satisfied) results in the generation of a new node that has the old node(s) as its antecedent(s) and inherits their left and right connections. The syntactic category and features, semantic content and SSA structure for this node are constructed according to the grammar rule that is applied, and the result is pushed onto the LIFO stack. When all the rules have been tried, the top node is popped from the stack and becomes the next object for which the rules are tried.

When the parser has produced a node that is the top of a syntactic tree in which all the words of the input expression have a place, this process stops. In this case the input is considered 'grammatical'. The EL/F expression built up in the meantime, and associated with the top node, is the first-level semantic representation of the input expression. The surface speech act information is represented in the SSA structure. From this structure, the possible communicative functions of the underlying dialogue acts are calculated using the function hierarchy mentioned above.
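The control loop described above can be sketched as a brute-force bottom-up parser over a LIFO stack. This is a simplified sketch under several assumptions. Constituents are contiguous (no discontinuous rules), and each node records only a category and the span between its left and right connections. Only unary and binary rules are used. Here a node is popped before its rules are tried, and it is then combined with every previously processed neighbour. The toy lexicon and rules for 'call me again' are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cat: str
    start: int            # left connection: index of the first word covered
    end: int              # right connection: index after the last word covered
    children: tuple = ()  # antecedent node(s)


def parse(words, lexicon, rules, goal="S"):
    """Return the first node of category `goal` spanning all words, or None."""
    # Push dictionary nodes right to left so that the leftmost word is on top;
    # an ambiguous word contributes one node per reading, with the same connections.
    stack = [Node(cat, i, i + 1)
             for i in reversed(range(len(words)))
             for cat in lexicon[words[i]]]
    processed = []
    while stack:
        node = stack.pop()
        if node.cat == goal and node.start == 0 and node.end == len(words):
            return node  # single top node, no word left hanging loose
        new = []
        for lhs, rhs in rules:
            if rhs == (node.cat,):  # unary rule
                new.append(Node(lhs, node.start, node.end, (node,)))
            elif len(rhs) == 2:     # binary rule: try every adjacent processed node
                for other in processed:
                    if other.end == node.start and rhs == (other.cat, node.cat):
                        new.append(Node(lhs, other.start, node.end, (other, node)))
                    if node.end == other.start and rhs == (node.cat, other.cat):
                        new.append(Node(lhs, node.start, other.end, (node, other)))
        processed.append(node)
        stack.extend(new)
    return None


LEXICON = {"call": ["V"], "me": ["PPRON"], "again": ["ADV"]}
RULES = [("NP", ("PPRON",)), ("VP", ("V", "NP")),
         ("VP", ("VP", "ADV")), ("S", ("VP",))]
tree = parse(["call", "me", "again"], LEXICON, RULES)
print(tree.cat, tree.start, tree.end)  # S 0 3
```

Stopping at the first complete tree mirrors the stopping rule in the text. Collecting every complete tree instead would be a small change for inputs with several readings.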

The next step in the process consists of replacing the constants in the EL/F expression by their EL/R translations, provided by an EL/F-EL/R lexicon. Since EL/F constants still have the lexical ambiguities of the natural language words they represent, the EL/F-EL/R lexicon often gives more than one possible translation. Not every combination of translations for the various constants in an EL/F expression makes sense. Those combinations that make no sense in the given domain of discourse are filtered out by means of a type-checking procedure. This procedure makes use of the fact that EL/R has a semantic type system with domain-dependent atomic types.
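The replacement-and-filter step can be sketched as follows. The code enumerates every combination of EL/R translations for the constants of an EL/F expression and keeps only the combinations that type-check. The lexicon, the constant names and the atomic types ('institution', 'riverside', 'truth') are hypothetical. An expression is either a constant or a (function, argument) application, and a function type is written as an (argument type, result type) pair.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Const:
    """A typed EL/R constant. Types are atomic strings or (arg, result) pairs."""
    name: str
    type: object


# Hypothetical EL/F -> EL/R lexicon: one EL/F constant, possibly several readings.
ELF_TO_ELR = {
    "open": [Const("open_for_business", ("institution", "truth"))],
    "bank": [Const("financial_bank", "institution"), Const("river_bank", "riverside")],
}


def constants(expr):
    """EL/F constants of an expression: a string, or a (function, argument) pair."""
    if isinstance(expr, str):
        return [expr]
    fn, arg = expr
    return constants(fn) + constants(arg)


def substitute(expr, mapping):
    """Replace every EL/F constant by its chosen EL/R constant; structure is kept."""
    if isinstance(expr, str):
        return mapping[expr]
    fn, arg = expr
    return (substitute(fn, mapping), substitute(arg, mapping))


def type_of(expr):
    """Type of an EL/R expression, or None if it is ill-typed."""
    if isinstance(expr, Const):
        return expr.type
    fn_type, arg_type = type_of(expr[0]), type_of(expr[1])
    if isinstance(fn_type, tuple) and arg_type is not None and fn_type[0] == arg_type:
        return fn_type[1]
    return None


def translate(elf_expr):
    """All well-typed EL/R translations of an EL/F expression."""
    names = sorted(set(constants(elf_expr)))
    readings = []
    for choice in product(*(ELF_TO_ELR[n] for n in names)):
        elr_expr = substitute(elf_expr, dict(zip(names, choice)))
        if type_of(elr_expr) is not None:
            readings.append(elr_expr)
    return readings


readings = translate(("open", "bank"))
print([arg.name for _, arg in readings])  # ['financial_bank']
```

Of the two readings of 'bank', only the one whose type matches the argument type of `open_for_business` survives the type check.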

The resulting EL/R expressions represent the semantic content of the dialogue acts in terms of the discourse domain. These dialogue act representations are the inputs for the next INDIS component, which considers incoming dialogue acts in the light of the current model of the dialogue situation. This component is still under development.

All implementation work is carried out in PASCAL on the Institute's VAX 11/780.


Summary

A grammar and a parser are described that combine ideas from generalised phrase structure grammar, two-level model-theoretic semantics (using ensemble theory), and speech act theory. They are used in the analysis of inputs to a natural language dialogue system intended to model aspects of participating in pure information-exchange dialogues.

References

Allwood, J. (1976) Linguistic communication as action and cooperation. Gothenburg Monographs in Linguistics Vol. I.

Appelt, D.E. (1982) Planning natural language utterances to satisfy multiple goals. SRI International Technical Note 259.

Bronnenberg, W.J., Bunt, H.C., Landsbergen, S.P.J., Scha, R.J.H., Schoenmakers, W.J. and Utteren, E.P.C. van (1980) The question answering system PHLIQA1. In L. Bolc (Ed.), Natural communication with computers. Macmillan, London.

Bunt, H.C., Katwijk, A.F.V. van, Muller, H.F. and Nes, F.L. van (1980) Dialogue control acts. IPO Annual Progress Report 15, 95-99.

Bunt, H.C. (1981a) The formal semantics of mass terms. Doctoral dissertation, University of Amsterdam. Revised edition to be published as Bunt (1983b).

Bunt, H.C. (1981b) Rules for the interpretation, evaluation, and generation of dialogue acts. IPO Annual Progress Report 16, 99-107.

Bunt, H.C. (1983a) The formal representation of (quasi-)continuous concepts. In Hobbs, J.R. and Moore, R.C. (Eds) Formal theories of the common sense world. Contributions to Artificial Intelligence Vol. I. Ablex Publ. Corp., Norwood NJ.

Bunt, H.C. (1983b) The formal semantics of mass terms. Revised edition of Bunt (1981a). Cambridge University Press, Cambridge.

Gazdar, G. (1982) Phrase structure grammar. In Jacobson, P. and Pullum, G.K. (Eds) The nature of syntactic representation. Reidel, Dordrecht.

Hayes, P.J. (1974) Some problems and non-problems in representation theory. AISB Summer Conference, University of Sussex.

Katwijk, A.F.V. van, Bunt, H.C., Leopold, F.F., Muller, H.F. and Nes, F.L. van (1979) Naive subjects interacting with a conversing information system. IPO Annual Progress Report 14, 105-112.

Katwijk, A.F.V. van (1981) Explorations in the experimental study of information dialogues. IPO Annual Progress Report 16, 108-113.

Medema, P. et al. (1975) PHLIQA1: Multilevel semantics in question answering. AJCL Microfiche 32.

Scha, R.J.H. (1981) Distributive, collective and cumulative quantification. In Groenendijk, J.A.G., Janssen, T.M.V. and Stokhof, M.B.J. (Eds) Formal methods in the study of language. Mathematisch Centrum, Amsterdam.

Waterworth, J.A. (1982) Man-machine speech 'dialogue acts'. Applied Ergonomics 13(3), 203-207.


On information retrieval by inexperienced users of data bases

F.L. van Nes and J. van der Heijden*)

Introduction

There is a great need for efficient information storage and retrieval systems in the professional and, to a certain extent, also in the private sphere. In principle, computers, with their enormous capacity for well-ordered storage of information, offer excellent possibilities for information handling. However, information retrieval from this seemingly ideal medium, the computer memory, is in effect rather difficult. This is shown by the results of experiments we reported earlier (Van Nes and Tromp, 1979; Van Nes, 1980) and by experience obtained with recently introduced public information systems such as Teletext or Videotex. In this paper we discuss some of the results of an experiment on information retrieval with two different search methods: menu selection and keyword entry (see Van Nes and Van der Heijden, 1980). The experimental data have been analysed in an attempt to find out what makes such search processes difficult, and how these difficulties could possibly be diminished, e.g. by choosing the better search method.

Method

Retrieval task

Our 12 subjects had to find out, with each of the search methods, when and where five particular TV programmes were broadcast. The information was taken from a simulated TV guide for all programmes of one week, provided by our data base. Since TV programmes cover a wide spectrum of themes, our data base had a very general character. The programmes were classified systematically in a three-level hierarchy: (level 1) classes of programmes, e.g. 'Children's and youth programmes'; (level 2) categories within each class, e.g. 'Pop music shows'; (level 3) actual programmes, e.g. 'Teenager sounds'.

The data on broadcasting time and network of these programmes could be found at the fourth level of the hierarchy. Here we will focus our attention on the selection process by the subjects as a function of level when they were searching by menu selection or keyword entry.

Structure of the data base

An outline of the data base is shown in Fig. 1. Each box in the figure represents a cell in the data base, containing a paragraph of text which, when activated, was displayed on the lower half of a VDU screen. The upper half of the screen was reserved for information fed back to the subject after his actions; the two screen halves together will henceforth be called a 'page'.

Table 1, displaying the contents of cell II in Fig. 1, exemplifies the way in which one class of TV programmes was subdivided into ten categories. One of these categories could be selected by typing its number; the screen would then display the appropriate group of TV programmes at level 3. All cells from levels 2 and 3 presented the option 'None of these programmes' (see Table 1). At level 2, choosing this option (by typing 11) would lead back to level 1, whereas at level 3 it would lead back to the previous block of programme categories in level 2. The option was not presented at level 1, because no alternative outside the ten classes at the top level existed.

Fig. 1. Structure of the data base in the experiment. Of the 43 index cells on levels 1-3, 17 could be reached directly with an appropriate keyword: they are denoted by a 'K'.

*) Philips Research Laboratories, Project Centre Geldrop

    Cartoons
    Puppet shows
    Documentaries about animals
    Children's short, daily programmes
    Youth films
    Children's and youth series - stories (Dutch)
    Pop music shows
    Children's and youth series - varied
    Children's and youth series - stories (Foreign)
    School television
11  None of these programmes

Table 1. The contents of cell II in Fig. 1, showing the ten categories together making up the programme class 'Children's and youth programmes'.


The data structure was incomplete at the 3rd and 4th levels because of software limitations. Owing to this incompleteness, the three selection levels of the tree contained altogether 43 'index' cells, corresponding to 43 index pages (see Fig. 1). Our data base could thus be called small. Of the 43 index cells, 17 could be accessed directly by typing an appropriate keyword; they are denoted by a 'K' in Fig. 1. From the index cell thus accessed via a keyword, the search process proceeded with menu selection.
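The navigation rules just described can be sketched as follows: menu numbers select a listed item, option 11 is available at levels 2 and 3, and a keyword jumps into one of the 'K' cells, after which menu selection continues. The data structure and function names are hypothetical. Two details are assumptions: an illegal command leaves the page unchanged, and an undeveloped category leads back to level 1, matching the behaviour reported for the 68 incomplete categories.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Cell:
    title: str
    level: int                                    # 1-3: index pages, 4: data page
    children: list = field(default_factory=list)  # menu options 1..n; None = undeveloped
    parent: Optional["Cell"] = field(default=None, repr=False)


def add(parent, title, developed=True):
    """Attach a child cell; an undeveloped category is stored as None."""
    child = Cell(title, parent.level + 1, parent=parent) if developed else None
    parent.children.append(child)
    return child


def choose(cell, option, root):
    """Page shown after typing a number on an index page."""
    if option == 11 and cell.level in (2, 3):  # 'None of these programmes'
        return root if cell.level == 2 else cell.parent
    if 1 <= option <= len(cell.children):
        target = cell.children[option - 1]
        return root if target is None else target  # undeveloped: back to level 1
    return cell  # illegal command: page unchanged


def keyword_entry(keyword, keyword_index):
    """Jump to a keyword-accessible ('K') index cell; menu selection continues there."""
    return keyword_index.get(keyword.lower())


root = Cell("Programme classes", 1)
children_class = add(root, "Children's and youth programmes")
pop_shows = add(children_class, "Pop music shows")
add(children_class, "Another category (undeveloped)", developed=False)  # illustrative
teenager_sounds = add(pop_shows, "Teenager sounds")  # level-4 data page
keywords = {"pop music": pop_shows}
```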

Results

Frequencies of right and wrong choices for both search methods

As a kind of state diagram, Fig. 2 presents, for the menu selection method, the frequencies of right, wrong and corrective choices of all the subjects taken together, with their successive choices, at the three levels of the tree. A completed correct search process is represented by the sequence of four bold descending arrows and the four circles at the left of Fig. 2. A subject begins to search for a TV programme and makes the right choice at level 1 (uppermost arrow, representing 3Q cases). The subject then makes the right choice at level 2 (descending arrow between levels 1 and 2, representing 55 cases), followed by the right choice at level 3 (48 cases), which leads to the desired data at level 4 (again 48 cases). Hence the first column represents the right choices as well as the data pages; the second column represents the 'None of these programmes' corrective choices; and the third column represents the wrong choices. As a further example of how the figure should be read, there were 56 wrong choices at level 1. These were followed by 1 right choice at the same level (the preceding wrong one had been an illegal command, i.e. not one of the numbers 1-11), 12 corrective choices at level 2 and 43 wrong choices at level 2.

The arrows pointing upwards from the first and third circles at level 2 reflect the incompleteness of the 3rd level that can be seen in Fig. 1. Although the ten cells at level 2 each contained a list of ten TV programme categories, like that of Table 1, the structure was developed further into the 3rd level for only 32 of these categories. When one of the remaining 68 categories was selected, the subject was returned to level 1. There he saw a page containing in its upper half the text 'Unfortunately this week no programmes are being broadcast from the category you have just selected. Perhaps you would like to choose another programme, from one of the classes displayed below', i.e. those from level 1.

From Fig. 2 it can be deduced that in 1+8+3+13+27 = 52 cases the subjects indeed selected one of those 68 categories. In 12 of these 52 cases, represented by the three dotted arrows, this was a right choice. One of the TV programmes that the 12 subjects had to find was 'not being broadcast', as may happen in reality, too. For 8 of those 12 subjects, this concerned the last of the five programmes they had to find. The fourth level of the data tree was also incomplete (see Fig. 1): only for 5 out of the 32 groups of programmes contained in the cells at level 3 was actual broadcast information available at level 4. Selecting a programme from the other 27 groups returned the subject to level 1, with an explanatory text in the upper half of the page. During the experiment, however, this occurred only four times, represented by the two arrows in Fig. 2 from the lower right circle at level 3 to level 1.


Fig. 2. State diagram of the menu selection method (levels 1-4), representing the right (R), wrong (W) and corrective (N) choices of all subjects together at each level of the tree, with the successive choices. The different frequencies of the successions of states are denoted, to a certain extent, by the three different widths of the corresponding arrows. The middle circle at level 3, e.g., represents the 27 corrective choices that were made at this level; 3 of those, represented by the semicircular arrow beginning and ending in the circle, were preceded by another corrective choice at level 3.

A number of observations may be derived from Fig. 2:

- out of a total of 135 wrong choices, 41% were made at level 1, 55% at level 2 and 4% at level 3;
- the a posteriori probabilities of making the right choice at levels 1 and 2 were approximately equal, i.e. 68/(68+56) = 0.55 and 78/(78+74) = 0.51, respectively; at level 3 this probability was much higher, viz. 75/(75+5) = 0.94;
- of the 68 right choices at level 1, 19% were followed by a wrong choice at level 2;
- a wrong choice at level 1 was followed by another wrong choice at level 2 in 43 out of 56 cases, i.e. 77%;
- the number of extra pages viewed because of wrong choices was 68+56-60 = 64 at level 1, 60+19+73-60 = 92 at level 2 and 48+27+5-48 = 32 at level 3. The incompleteness of level 3 reduced the number of extra pages there by at least 40: after 40 wrong selections at level 2 the subject was returned immediately to level 1, whereas in a complete tree structure he would have returned via level 3.
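The figures in the list above can be checked directly from the Fig. 2 counts quoted in the text. The short Python check below recomputes them. The variable names are ours, and only counts stated in the text are used.

```python
# Counts quoted in the text for Fig. 2 (menu selection, all subjects together).
right = {1: 68, 2: 78, 3: 75}   # right choices per level
wrong = {1: 56, 2: 74, 3: 5}    # wrong choices per level

total_wrong = sum(wrong.values())
share_wrong = {lvl: round(100 * n / total_wrong) for lvl, n in wrong.items()}

# A posteriori probability of a right choice at each level.
p_right = {lvl: round(right[lvl] / (right[lvl] + wrong[lvl]), 2) for lvl in right}

# Wrong at level 1 followed by wrong at level 2: 43 of 56 cases.
wrong_after_wrong = round(100 * 43 / 56)

# Extra pages, using the page totals exactly as given in the text.
extra_pages = {1: 68 + 56 - 60, 2: 60 + 19 + 73 - 60, 3: 48 + 27 + 5 - 48}

print(total_wrong, share_wrong, p_right, wrong_after_wrong, extra_pages)
```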

Fig. 3. State diagram of the keyword entry method (levels 1-4), representing, for all subjects together:
- the right (R) and wrong (W) choices of keywords;
- the right (R) and wrong (W) attempts to obtain a list of keywords;
- the right (R), wrong (W) and corrective (N) menu-selection choices in the data tree which followed accepted ('right') keywords.
As in Fig. 2, the different widths of the arrows connecting successive states indicate the different frequencies of these successions; the exact frequencies are given by the numbers next to each arrow. See text and the caption of the analogously built-up Fig. 2 for further explanation.

A scheme similar to that of Fig. 2 can be made for the keyword entry method. In this diagram, Fig. 3, the five closely spaced circles at the left plus the circle marked 'K' at level 1 represent those states of the search process which are characterised by the use of keywords. The right half of the scheme is analogous to Fig. 2. The shortest, most efficient search path is now represented by the three bold descending arrows with the circles which they connect: a subject begins to search by typing an accepted, i.e. right, keyword (34 cases). In 29 cases such a right keyword led to level 3, in all cases followed by a right choice from the displayed list of TV programmes. Further arrows fan out from the right-hand side of the circle representing all 64 cases in which a right keyword was used. They show how often these accepted keywords were followed by right, wrong and corrective choices at levels 2 and 3 of the data tree. Note that an accepted, 'right' keyword may not be the most appropriate one for a specific search aim!

In 17 cases the subjects started searching by typing an unaccepted, i.e. wrong, keyword. They were then informed of this on the screen and asked to try again; the screen also said that a list of accepted keywords could be consulted. This list was subsequently obtained 9 times with a correct command, whereas an incorrect, unsuccessful command was used twice. These attempts lead to the 'list of keywords' circles, with R/16 and W/2, respectively. In 4 cases a subject started to search by immediately invoking the keyword list with the proper command. Altogether, the list was correctly invoked 16 times to obtain a keyword.

The two other circles connected with the use of keywords in Fig. 3 are the upper one marked 'I', representing 'illegal' commands, and the one marked 'K' at level 1 in the data tree. The latter represents 4 cases in which subjects resumed searching with keywords. Two subjects did so after having already searched for a particular programme for quite a while by the menu selection method (having initially used keywords, of course).

From Fig. 3 we can see, for example,

1. that in those cases where the subjects typed a keyword right at the start of a search process, i.e. without previous consultation of a keyword list, in two thirds of all cases they picked a 'valid' keyword;
2. that 44% of the valid keywords which were used led to level 2, and 56% to level 3 of the data tree.
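The two proportions above can be checked in the same way. The 34 accepted and 17 unaccepted first keywords give exactly two thirds. The per-level counts behind the 44%/56% split are not stated. The sketch below therefore only searches for whole-number splits of the 64 accepted keywords that round to those percentages. The result is our inference, not a reported count.

```python
from fractions import Fraction

# First keywords typed without consulting the list (Fig. 3): 34 accepted, 17 not.
first_right, first_wrong = 34, 17
share_valid = Fraction(first_right, first_right + first_wrong)

# The per-level counts behind "44% to level 2, 56% to level 3" are not given;
# list every whole-number split of the 64 accepted keywords that rounds to them.
valid_total = 64
splits = [
    (n, valid_total - n)
    for n in range(valid_total + 1)
    if round(100 * n / valid_total) == 44
    and round(100 * (valid_total - n) / valid_total) == 56
]

print(share_valid, splits)
```

Only one split fits: 28 keywords leading to level 2 and 36 to level 3.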

Selection times

Schemes like Fig. 2 tell us nothing, of course, about the time that subjects need to make their choices. Yet the distributions of selection times are of theoretical as well as practical importance, e.g. for page charge systems. The histograms of Fig. 4 represent right, wrong and corrective choices combined, for the menu selection method. They show that:

- the average selection time subjects needed was longest at level 1, viz. 23 s, a little shorter at level 2 (19 s) and considerably shorter at level 3 (12 s);
- the distribution of selection times shifts from longer to shorter as subjects descend the hierarchical levels, i.e. the percentage of shorter selection times increases. This may be regarded as evidence that, not surprisingly, the subjects' uncertainty about their decisions decreases as they home in on more specific data assortments.

The tails of the histograms show that, although the subjects were only searching a very small data base, deciding about their next choice was often a slow process. Page-observation times of more than one minute occurred with nine of the subjects.

[Figure: three histogram panels (level 1, level 2, level 3); horizontal axis in seconds.]

Fig. 4. Histograms of selection times for all right, wrong and corrective choices of 12 subjects using the menu-selection method, for the three index levels. The first class at each level refers to times from 1-10 seconds, the second class to times from 11-20 seconds, and so on. The vertical dotted lines indicate the average selection times at the three levels.
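The class boundaries in the Fig. 4 caption (1-10 s, 11-20 s, and so on) can be made explicit with a small binning function. This Python sketch assumes selection times in seconds and rounds fractional times up to the next whole second; the function names are ours.

```python
import math
from collections import Counter


def time_class(seconds: float) -> int:
    """Class index used in Fig. 4: 1-10 s -> 0, 11-20 s -> 1, and so on.

    Assumes positive times; fractional times are rounded up to whole seconds.
    """
    if seconds <= 0:
        raise ValueError("selection times must be positive")
    return (math.ceil(seconds) - 1) // 10


def histogram(times: list[float]) -> tuple[Counter, float]:
    """Count selection times per 10-s class and return them with the mean time."""
    if not times:
        raise ValueError("no selection times given")
    counts = Counter(time_class(t) for t in times)
    return counts, sum(times) / len(times)
```

For example, `histogram([5, 12, 19, 28, 61])` uses invented times, not data from the experiment. It puts one time in the first class, two in the second, one in the third and one in the seventh, with a mean of 25 s.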

Performance of individual subjects

To compare the two access techniques, menu selection and keyword entry, fully, we need more data on individual subject performance. These data will be presented in detail elsewhere shortly; here we restrict ourselves to the following:

- with menu selection, none of the subjects managed to find the five TV programmes in the minimum number of pages (19); five subjects needed more than twice this number. When searching with keywords, none of the subjects found the five programmes in the minimum number of pages (15) either, whereas only two subjects needed more than twice this number;
- mean individual page display durations varied widely between subjects. For nine of them, these durations were longest when searching with keywords. Generally, this was due to the rather long time needed for typing keywords, instead of pressing one numeric key;
- averaged over all subjects, the time needed to find a particular TV programme did not differ much between the two search methods. However, the performance of individual subjects may be markedly superior with one of these methods. After the experiment, the subjects were interviewed and asked which search technique they preferred. Their preferences were evenly distributed over both techniques and did not always correspond to their objective performance.

Discussion and conclusions

Menu selection versus keyword entry

What are the pros and cons of the two search methods we investigated? A comparison of Fig. 2 and Fig. 3 shows that, in principle, keywords provide a safer way of reaching the 3rd level of the data tree than menus. In particular, if the list of keywords were consulted, it ought to be possible to select a keyword specific enough to give immediate access to this 3rd level, where wrong choices are rare. However, this list was only consulted 16 times. Subjects often used a keyword which was right, i.e. accepted by the system, but which only led them to level 2, where the chance of subsequently making a wrong menu selection was apparently quite high. It should be realised, however, that most of this 'traffic' came from very few searches. Wrong choices at levels 1 and 2 and corrective choices at levels 2 and 3 caused traffic between levels 1, 2 and 3 of the data tree; it is represented by the six medium-width arrows in the right half of Fig. 3. Of this traffic, 77% originated from two subjects searching for one particular programme. At any rate, the greater 'safety' provided by keywords may be reflected in the smaller number of subjects who needed more than twice the minimum number of pages when using keyword entry. On the other hand, about half of the subjects preferred menu selection, some of them because they were not very proficient in typing. Subject performance was often clearly superior with one of the two search methods. It would therefore appear that a system providing both access routes, like the one investigated, should be chosen in practical situations.

Retrieval aids

We found that retrieving data, even from a small data base, is not a simple process. With menu selection, for instance, the long page-observation times that often occurred (Fig. 4) did not prevent subjects from frequently making a wrong choice, especially at level 2 (Fig. 2). They would probably have been helped by advance information on the contents of an information category at the next hierarchical level. Such information has been shown to be helpful in viewdata systems (Frankhuizen and Vrins, 1980; Lee et al., 1982). To further reduce the difficulties which users of such systems may encounter, their search processes have to be analysed in detail. It turned out, for instance, that our subjects often had problems in:

(i) searching a page for a particular item of information;
(ii) remembering which items from a menu they had already selected unsuccessfully;
(iii) grasping the hierarchical structure of the data base, i.e. the relation of the different levels to one another.

These problems might be remedied by, respectively:

(a) carefully designing the layout of each page;
(b) providing some form of memory aid;
(c) giving good instructions before the data base is used and making it possible to obtain help while it is being used.

As to the latter, a simple measure such as clearly displaying the level to which each page belongs may already be useful.
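Aids (b) and (c) could take a very simple form on screen. The Python sketch below is an assumption of ours rather than anything tested in the experiment. It renders an index page with its level number and marks with an asterisk the items a subject has already selected without success. `render_menu` is a hypothetical name.

```python
NONE_OPTION = 11  # 'None of these programmes', shown at levels 2 and 3


def render_menu(level: int, items: list[str], tried: set[int]) -> str:
    """Render an index page with a level indicator and a '*' before each item
    that the user has already selected without success (hypothetical aids)."""
    lines = [f"Level {level}"]
    for number, item in enumerate(items, start=1):
        mark = "*" if number in tried else " "
        lines.append(f"{mark}{number:>2} {item}")
    if level in (2, 3):
        lines.append(f" {NONE_OPTION:>2} None of these programmes")
    return "\n".join(lines)
```

Keeping the set of unsuccessful choices per page, rather than per session, matches problem (ii): the user only needs to see what was already tried on the menu in front of him.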

Summary

In order to investigate the possibilities of using computers for convenient information retrieval, in the private as well as the professional sphere, an experiment was done on data retrieval from a hierarchically structured data base with four levels. Subjects had to use two different search methods, menu selection and keyword entry. The results show that retrieving data, even from a small data base, can be a lengthy and laborious process. Averaged over all 12 subjects, searching with keywords was slightly more efficient, but the performance of individual subjects may be markedly superior with one of these methods. The subjects' preferences were evenly distributed over both methods and did not always correspond to their objective performance. Suggestions are given for possible retrieval aids.

References

Frankhuizen, J.L. and Vrins, T.G.M. (1980) Human factor studies with viewdata. Proceedings of the 9th International Symposium on Human Factors in Telecommunication, 29 September - 3 October 1980, Red Bank, New Jersey.

Lee, E.S., Whalen, T.E., McEwen, S. and Latremouille, S. (1982) Human factors in Videotex information retrieval. Proceedings of the 1982 International Zurich Seminar on Digital Communications - Man-Machine Interaction, March 9-11, 1982, Zurich (IEEE Catalog No. 82CH1735-0).

Nes, F.L. van (1980) Searching TELETEKST, a sequentially broadcast data base. IPO

Annual Progress Report 12, 109-112.

Nes, F.L. van and Heijden, J. van der (1980) Data retrieval with hierarchical or

direct entry methods. Ergonomics ~, 515.

Nes, F.L. van and Tromp, J.H. (1979) Is viewdata easy to use? IPO Annual Progress

Report li, 120-123.


Developments

F.F. Leopold

In last year's issue of the Annual Progress Report we reported the initiation of a study in which characters were designed and perceptually evaluated for use in teletext messages broadcast via the normal TV channels. These characters are based on a 12x10 dot matrix, which allows for considerably more refined configurations than the present 6x10 videotext standard. Criteria established in earlier IPO research (Bouma and Leopold, 1969) were applied to the design of an alphanumeric character set. This set was experimentally compared with other proposals for 10x12 characters as well as with the present set.

The legibility of the 'IPO Normal' character set was shown to be superior. The complete set of the 196 'IPO Normal' 10x12 dot matrix characters (alphanumerics, punctuation marks and supplementary symbols) is now protected under the rules of the International Design Registration effected under the Geneva Protocol of 1975.

This year we started a study on spoken man-computer dialogues to be used by telephone subscribers. Such dialogues would give them direct access to special computerised services without complex pushbutton operations or the intervention of a human operator. The problem with all kinds of supplementary services offered by the telephone company is the rather strict and opaque procedure the user has to follow before such a service is available to him. Examples are the transfer of calls to another number, three-party service and the transfer of call charges: in general, features that are mostly used on an ad-hoc basis. They require a cryptic series of pushbutton commands whose structure the user cannot readily relate to anything familiar. Verbal assistance by means of spoken instructions from the telephone exchange is useful but is only a makeshift. The information exchange loop between the telephone subscriber and the supplementary service system would be closed more naturally and efficiently by enabling him to use simple spoken commands, identified by a voice recognition system.

Starting from an existing system, we replace the interchange of the system's prompts and the user's dial commands by a 'conversation'. This conversation is based on some presuppositions concerning such aspects as:
- the terminology and phrasing the listener would expect from a machine;
- the user's limited ability to memorise verbal information in menu form;
- the difference in attitude towards the system between a trained and an untrained user.
In the near future we plan to test the dialogue being designed in a simulation experiment. It is hoped that a tested version can be implemented in an interface with the telephone system for evaluation in an office environment.

The final aim is to develop guidelines for engineers working on equipment that gives the remote telephone customer direct access by spoken instructions to such facilities as supplementary telephone services, directory assistance or a warehouse ordering system.

References

Bouma, H. and Leopold, F.F. (1969) A set of matrix characters in a special 7x8

array. IPO Annual Progress Report !, 115-119.


Learning to type on a chord keyboard

P.A. Barbonis*) and F.L. van Nes

Introduction

Typewriters based on C.L. Sholes' 1873 design are found in almost all offices, and what is striking about them is how little has changed since Sholes laid out the keys in the so-called QWERTY format by which the keyboard is now generally known. An important but adverse characteristic of the QWERTY keyboard is the typically long training period required to gain reasonable mastery over it. Schuurmann (1981) has documented various drawbacks of the QWERTY keyboard as well as previous attempts to improve the lay-out. Berkelmans and Den Outer's Outertype chord keyboard (1980a,b) represents a more recent attempt. In music, a chord is produced by sounding a number of notes at the same time. A similar idea is adopted for chord keyboards, where several keys are depressed simultaneously to produce a letter (see e.g. Conrad and Longman, 1965) or a syllable, as in the Outertype case.

Berkelmans and Den Outer trained several young people on a prototype machine and claim results superior to conventional QWERTY keyboard training. These include an apparently higher rate of keying-skill acquisition as well as greater ease of operation.

The present experiment was an attempt to test the inventors' claim and to examine the course of skill acquisition and the factors involved in it.

Method

Subjects

Three students, two females aged 19 and 17 and one male aged 19, participated in the experiment. All three met the conditions for participation: they had to be available at the same times throughout the experimental period and to have very little previous typing experience. Since the main holiday period intervened, the experimental period had to be split into two parts. The subjects were looking for temporary summer jobs and viewed their participation in that light. The number of subjects was determined by the number of keyboards available and by constraints of time and funds. Subjects were paid for the total time spent at the laboratory.

Equipment

Each chord keyboard was supplemented with a microprocessor and a visual display on which the produced text appeared. Fig. 1 shows the lay-out of the keyboard. Not all 26 letters were represented by separate keys: several letters had to be formed by depressing two keys simultaneously. On the other hand, frequently occurring consonants and all vowels were represented by two keys, one for the left and one for the right hand. This minimised finger motions between keys and, consequently, increased keying speeds. Spaces were automatically produced after each key-chord, so no spaces had to be keyed between monosyllabic words. When typing a multisyllabic word, the typist had to prevent a space between each pair of syllables by depressing the appropriate key.

Fig. 1. The lay-out of the keys on the Outertype chord keyboard. Most key-tops bore an alphabetic symbol or a symbol peculiar to the keyboard. The two large lower keys bore no inscription. The left one was used, among other things, for generating upper case; the right one to suppress 'space'.

*) Philips Data Systems, Apeldoorn
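The spacing rule just described can be expressed as a small decoder. In the Python sketch below, the chord table, key names and syllables are hypothetical; the actual Outertype chord assignments are not given in this report. We also assume that the space-suppress and upper-case keys are pressed together with the chord they modify.

```python
# Hypothetical key names; the real Outertype chord assignments are not given here.
SUPPRESS = "suppress"  # right large lower key: no automatic space after this chord
UPPER = "upper"        # left large lower key: upper case

# Hypothetical chord table: set of simultaneously pressed keys -> syllable.
CHORDS = {
    frozenset("to"): "to",
    frozenset("be"): "be",
    frozenset("in"): "in",
    frozenset("side"): "side",
}


def decode(chords: list[set[str]]) -> str:
    """Turn key-chords into text, adding a space after every chord unless the
    space-suppress key was pressed with it (as inside multisyllabic words)."""
    out = []
    for chord in chords:
        syllable = CHORDS[frozenset(chord) - {SUPPRESS, UPPER}]
        if UPPER in chord:
            syllable = syllable.capitalize()
        out.append(syllable)
        if SUPPRESS not in chord:
            out.append(" ")
    return "".join(out)
```

With this table, the chords {t, o}, {b, e} give `to be ` with no space keyed. The chords {i, n, suppress}, {s, i, d, e} give `inside `, the suppress key preventing the space between the two syllables.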

Procedure

At their first session all subjects were given a passage to type on an electric QWERTY typewriter. The time taken was noted and their typing speed was determined. This was followed by a demonstration on the Outertype keyboard, after which the experimenter answered any questions the subjects had about the keyboard.

Training began immediately, with each subject in a separate booth. A training period of 1-1.5 hours was followed by a break of 10-25 minutes. A day's training session lasted 3-3.5 hours.

Subjects' progress was evaluated by requiring them to type a given passage in a speed test. Speeds were calculated from the time taken to complete the passage. The first of these tests was conducted after about 1.5 hours of training, and further tests followed at suitable intervals. Subjects were specifically asked to practise blind typing, as is customary in typing schools teaching the QWERTY system. Even after 20 hours of training, however, it was evident that the subjects were not typing blind. It must be admitted that the Outertype keyboard is very difficult to operate at the start without looking at the keys, to say nothing of having to read the instruction papers at the same time! It was therefore decided to use the presence or absence of the key-tops as an independent variable. After the fourth test, held in the 23rd hour, the key-tops were removed so as to induce the subjects to practise blind typing. A speed test was given immediately after the key-tops were removed to determine the effect. Training then resumed immediately, but with the earlier training material, which was necessary to guide the subjects after the key-tops were removed. Speed tests were then administered at suitable intervals. The key-tops were replaced some time during the 42nd hour of training, to see whether speed would improve significantly when the key-top symbols were available.

After a practice session of about 100 minutes, followed by a speed test, training continued with the key-tops in place, leading up to the 14th test after 45 hours of training.

The subjects then went off on a 5-week break, which had been one of their conditions for participating in the experiment. The break also provided an opportunity to determine retention of the acquired skill. On their return the subjects were given three tests, two of which were performed without key-tops and the third with key-tops in place. Afterwards the subjects continued with the training as before.

An important part of the instructions related to errors detected by the subjects. They were asked to re-enter the wrong letter or word immediately, without erasing the error; this was necessary to preserve the errors for subsequent analysis. In the tests, however, subjects were told to ignore any errors, as otherwise efforts at correction could influence the typing speed.

After the three subjects had completed their training, two accomplished typists began a training period on the chord keyboard. Their QWERTY typing speeds were 411 char/min and 307 char/min respectively, determined as described above. One of these secretaries, however, dropped out after about 10 hours because she was bored. The other continued her training whenever she had time to spare and her work load permitted; there were occasional gaps of a whole week in her training.

Results

The speeds (uncorrected for errors) attained by the subjects before the key-tops were removed were about twice as high as those attained by QWERTY trainees, as can be seen from Fig. 2. However, when the key-tops were removed the average typing speed dropped from 90 char/min to 41 char/min, a drop of about 55%. At this speed, the average QWERTY trainee with the same amount of training appears to be better than the three subjects typing blind.
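Speeds in this experiment were computed from the time taken to type a test passage. The drop on removing the key-tops is a simple relative difference. The Python sketch below shows both calculations. The passage length and time in the usage example are hypothetical, while 90 and 41 char/min are the averages quoted above. (90 - 41)/90 ≈ 0.544, close to the rough figure given in the text.

```python
def chars_per_minute(n_chars: int, seconds: float) -> float:
    """Typing speed (uncorrected for errors) for a test passage."""
    return 60 * n_chars / seconds


def relative_drop(before: float, after: float) -> float:
    """Fractional loss of speed, e.g. when the key-tops were removed."""
    return (before - after) / before


# Averages quoted in the text: 90 char/min with key-tops, 41 char/min without.
drop = relative_drop(90, 41)
print(f"{drop:.1%}")  # 54.4%
```

For example, a hypothetical 1200-character passage typed in 800 s gives `chars_per_minute(1200, 800)` = 90.0 char/min.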

It is difficult to say what the speed would have been had the subjects begun their training with blind typing. After the 23rd hour the subjects were practising blind typing. As can be seen in Fig. 2, however, their speeds never really exceeded the corresponding QWERTY speeds until after the 40th hour, when the average speed reached 88 char/min. This was largely due to a good performance by one of the female subjects, who had then reached a speed of 124 char/min. At the next test the average speed dropped to 83 char/min, which is below the QWERTY figure. Replacement of the key-tops in the 43rd hour produced a rather steep rise in speed. The course of this rise could not be studied, however, as the subjects then went off on their 5-week break. The performance of the subjects before and after the 5-week break is also shown in Fig. 2. After the break, the average typing speed with key-tops in place rises sharply up to about the 10th hour and then fluctuates around 120 char/min. Further skill acquisition could not be determined owing to the interruption of training.

[Figure: average speed versus training (hours), 0-45 h before the break and 0-15 h after it. Curves: with key-tops; with key-tops removed; with key-tops replaced; with key-tops removed (after break); with key-tops replaced (after break); learning curve for QWERTY (training manual).]

Fig. 2. Average performance of the subjects as a function of training duration.

The performance of the subjects in the latter part of their post-break training was nevertheless better than that for QWERTY, as shown by the 'IPO' curve in Fig. 3. This speed was, however, not very high compared with their own performance on the QWERTY typewriter at the start of the experiment.

Fig. 3 also shows the speeds attained by the skilled secretary on the chord keyboard and those claimed by the inventors for their own trainees. The secretary generally produced performances superior to those of the Berkelmans and Den Outer trainees 1). These trainees, in turn, chalked up performances which exceeded QWERTY performances by a large margin.

[Figure: speed versus training (hours, 0-160) for the curves B-DO, QWERTY, IPO and typist, with the pre-Outertype QWERTY speeds Qss1-3 and Qtypist marked.]

Fig. 3. Curve 'B-DO': results given by the inventors of the Outertype keyboard for five trainees (the dotted section was an extrapolation). Curve 'QWERTY': typical learning curve for a normal QWERTY keyboard. Curve 'IPO': average Outertype results of the subjects in the present experiment; their averaged pre-Outertype QWERTY speed is indicated by Qss1-3. Curve 'typist': Outertype results of a well-trained QWERTY typist; her pre-Outertype QWERTY speed is indicated by Qtypist.

Discussion

The main issue in this investigation was whether typing can be learned substantially faster on this chord keyboard than on the QWERTY keyboard, as claimed by the inventors. If the results of the three subjects alone are taken into account, it is difficult to agree with the inventors. The secretary's performance, however, suggests that the chord keyboard posed few if any problems for her. Could her performance have occurred by chance, or could her proficiency on the QWERTY keyboard have in some way contributed to her apparent mastery of the chord keyboard? Or did the three subjects produce performances which are typical for the chord keyboard?

1) Meanwhile, informal reports from the 'Associatie SMK' claim Outertype keying rates of more than 700 char/min produced by five trainees from a new group.

The answers to these questions would require further experimentation, but we can examine some of the factors that have affected the performances of the subjects. The absence or presence of visual feedback can affect performance. The inventors' trainees learned to type without visual feedback and in this respect possibly laid for themselves the foundation on which rapid acquisition of typing skill was possible. On the other hand, it is difficult to envisage training without feedback, and if the inventors' trainees managed it they must have been highly motivated, for without it mastery of all key combinations is likely to be extremely difficult.

The final performances of the three subjects were characterised by a low error rate; when sampled towards the end of the experiment, the error rate was about 1%. We have no data on the errors produced by the inventors' subjects, whose performance is shown in curve 'B-DO' of Fig. 3. It is well known from the psychological literature that in many tasks there is a speed-accuracy trade-off; this would suggest that our subjects, with their low error rate, may have had a correspondingly slow keying performance.

The inventors provided their subjects with texts recorded on cassette, to which they listened while keying. Their subjects could in no way monitor their own performance. Our subjects were reading their source texts as well as their own output, and switching from text to keyboard and back, plus searching for the point where they had left off, may have adversely affected their speeds. On the other hand, listening to a text may also reduce keying speed, through uncertainty about its correct spelling. The auditory input of training material to the inventors' trainees also had a pacing effect: their subjects were forced to type at dictation speed, whereas the present subjects typed at their own pace. This may be a factor affecting the speed attained.

Motivation is also a factor in skill acquisition. Our subjects were highly motivated in the first weeks of the experiment but later on, particularly after their return from the long holiday break, they began to complain about the texts to be typed, which they found boring. The content of the training material should not really concern the subjects, yet they seemed to be interested in the contents, and this interest may also have affected their speeds. The training regime employed in the experiment may also have influenced motivation. The subjects had to practise about 3 to 3.5 hours per day, which is quite long and may have reduced their motivation. In typing schools, practice sessions last about one hour, and it is possible that the inventors' trainees also had short training sessions.

In conclusion, it seems reasonable to state that it will not be easy to learn typing substantially faster on the Outertype than on the QWERTY keyboard; in fact, it may be necessary to adopt a rigid training regime. In this context it should be noted that a chord keyboard certainly appears to be unsuitable for casual users, e.g. because the principle of chord keying presents an initial barrier to learning to operate such a keyboard.

Summary

Conventional QWERTY keyboards typically require a long training period before reasonable mastery over them can be acquired. Among the many attempts which have been made to improve this keyboard, chord keyboards form a special group featuring simultaneous depression of several keys to form a letter or syllable. A learning experiment on such a chord keyboard was done with three young adults with very little typing experience. They trained for 45 hours during 3 weeks; then, after a 5-week break, they trained for another 15 hours during 1 week. Their keying speed was better, though not by much, than may be expected for a comparable training period on a QWERTY keyboard. It is possible that the ultimate performance of accomplished typists on a chord keyboard is superior to that on QWERTY. A chord keyboard such as the one investigated appears not to be well suited to casual users, because (i) the principle of chord keying presents an initial barrier in learning to operate it, and (ii) after a period of non-typing, it takes a while before the way of producing all letters and syllables is remembered again.
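The chord principle summarised above, in which several keys pressed together produce one letter or syllable, can be illustrated with a small decoder. This is a minimal sketch only: the key names and the chord table are hypothetical (the Outertype assignments are not reproduced in this report), and it assumes that a chord is emitted once all of its keys have been released.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical chord table; the real Outertype key assignments are not given here.
CHORDS: Dict[FrozenSet[str], str] = {
    frozenset({"L1"}): "e",
    frozenset({"L2"}): "t",
    frozenset({"L1", "L2"}): "a",
    frozenset({"L1", "R1"}): "de",  # a chord may also produce a syllable
}


def decode(events: List[Tuple[str, str]]) -> List[Optional[str]]:
    """Decode a stream of ('down' | 'up', key) events into chord outputs.

    Keys pressed while any key is still held accumulate into one chord.
    The chord is looked up once every key has been released; an unknown
    chord yields None.
    """
    held: Set[str] = set()
    chord: Set[str] = set()
    out: List[Optional[str]] = []
    for kind, key in events:
        if kind == "down":
            held.add(key)
            chord.add(key)
        elif kind == "up":
            held.discard(key)
            if not held and chord:
                out.append(CHORDS.get(frozenset(chord)))
                chord.clear()
    return out


# L1 and L2 pressed together, then released one by one: a single chord
print(decode([("down", "L1"), ("down", "L2"), ("up", "L1"), ("up", "L2")]))  # prints ['a']
```

Releasing the keys one by one still yields a single character, which is what distinguishes chord keying from pressing the same keys in sequence.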

References

Berkelmans, N.M. and Outer, M. den (13 June 1980) Netherlands Patent Application 80 03 451.

Berkelmans, N.M. and Outer, M. den (1980) Snelschrijfmachine Outertype 1980: leer- en oefenboek. Associatie voor stenografie, machineschrijven en kantoorpraktijk (SMK), Eindhoven (in Dutch).

Conrad, R. and Longman, D.J.A. (1965) Standard typewriter versus chord keyboard: an experimental comparison. Ergonomics 8, 77-88.

Schuurmann, P. (1981) Inventarisatie van toetsenborden ten behoeve van de ontwikkeling van communicatie-apparaten voor perceptief gehandicapten. IPO Memorandum no. 242 (in Dutch).


Aids for the Handicapped


Developments

H.E.M. Melotte

Our activities in the field of communication aids for the handicapped have focused on the final touches to three terminating projects, which involved a number of technical design and development problems that had to be solved. The final results, a talking typewriter, a number of reading aids and an electrolarynx with pitch control, may be considered successful. The newly developed aids, for three of which a patent has been applied for, have been made suitable for industrial production. Agreement on this has been reached with the manufacturers and distributors concerned. The new aids are described in this issue by Kroon, Gabriels et al., and Schuurmann et al.

Research on the ZWO project 'Reading by elderly people' was also finished this year. The results of this project, described by Bouma et al. in this issue, have led to several possible openings for further work in this direction.

With regard to the TV magnifier, the work in the final stage has been divided among the industrial designer, the manufacturer and the sales organisation. The IPO task is restricted to monitoring whether the requirements are fulfilled. In spite of this allocation of tasks, new developments have not yet led to a useful design this year.

The contracts of the projects described in last year's progress report come to an end this year, but the knowledge and experience obtained offer good possibilities for continuing the work. In view of the lack of facilities and permanent staff in our own institute, procedures have been started to arrange for appropriate projects as final study subjects for students of the Eindhoven University of Technology. On account of our interest in the reading problems of the visually impaired, a new project, 'Improving the usefulness of optical magnifiers for reading and handwork by visually impaired and elderly persons', has been initiated.


The Typophone

J.N. Kroon

Introduction

Within the general framework of a study on the application of present-day speech technology in aids for the handicapped, the development of a talking typewriter has been undertaken. The visually impaired experience many difficulties in the use of a standard typewriter, although it is a common means by which visually handicapped people communicate in writing with the sighted. No doubt some of these difficulties could be reduced or even eliminated by spoken feedback on the operations carried out with the typewriter. In the first place, the feedback on pressing a typewriter key consists of the name of the key, spoken in Dutch. In addition, text-editing facilities should enable the typist to inspect and modify previously typed text.

As regards the speech output, certain conditions have to be fulfilled. For instance, speech quality should meet the requirements of acceptability and unambiguity. It is of greater importance, however, that the spoken feedback should follow the corresponding action without noticeable delay ('random accessibility'). Synthetic speech can meet these demands.

In order to explore the potential of a talking typewriter, an experimental device (see Kroon and de Braal, 1980) has been extensively evaluated by a large number of visually handicapped typists, both novices and experienced. From the results we worked out a list of qualities, some achieved and appreciated and others still to be attained, and decided which of the functions implemented on the experimental device could be discarded. On the basis of this work we designed and built a prototype suitable for industrial production, the Typophone.

Evaluation procedure

The experimental talking typewriter was evaluated according to a scheme (Kroon, 1980) centred round questions concerning synthetic speech perception, the relations between speech feedback and typing, general ergonomic aspects and the use of additional functions. The testing took place at three national institutes for the blind: the only Dutch rehabilitation centre for late-blind adults and two institutional schools for visually impaired children. Each participated for about a month. Some 130 visually handicapped typists, mostly novices but also some experienced typists, gained experience with the talking typewriter. For practical reasons the local typing instructors led the test periods. The adult blind became conversant with the typewriter with the aid of extensive step-by-step directions for use (Kroon, 1981). The children, however, were allowed to explore the typewriter more freely, after which they typed out their judgment. We asked the instructors to note experiences and remarks during the trial sessions, so that they would not be lost. Use of the test key, which prints the number of times each of the function keys has been used, was recommended.

Results

The experiences consist of notes written down as the handicapped subjects did their typing exercises, and of notes from interviews with the pupils afterwards. The results are divided into three categories: perception of the speech output, the influence of the spoken response on typing, and miscellanea.

A. Synthetic speech perception

In general the speech output has been received very favourably. Although there was occasional mention of the need for some habituation, on the whole the subjects stated that the voice was not metallic or robotic at all (as they had expected), and they appreciated the pronunciation in Dutch (instead of the usual English). Occasional confusions between nearly homonymous letter sounds were reported, such as B and D, and M and N. Most letter sounds, as well as the names of punctuation marks, numbers and functions, were unambiguously understood.

B. Typing with speech feedback

Novice typists found the spoken response on each touch of a key very useful. In particular, when the instructor is absent, the pupils can go ahead by themselves, being warned of incorrectly touched keys. The more experienced typists, by contrast, are divided in their opinions. The adults feel that the speech hinders them in attaining a high typing speed. The children, on the other hand, easily learn to type so fast that the speech feedback cannot keep pace and merges into one jumble of sounds. The word 'capital', spoken after each capital letter, is lost even during moderately fast typing; this should therefore be reconsidered.

C. Text memory, editing and other functions

The editing functions of the setup are based on a text memory of two lines. Of the implemented functions (see Kroon and de Braal, 1980), the insert and delete functions are among the least used. On the other hand, the recall functions, which also use the text memory, have proved their utility, although the typists do not appreciate that the commands have to be given in two steps (see Kroon and de Braal, 1980).

Most users express a preference for a more extensive text memory of two to ten lines, or even a whole page. It appears, however, that even the present two lines of memory cannot be recalled or edited without significant orientation problems, in spite of the appreciated spoken information on the line number and the location within the current line.

The number of times each of the function keys has been used is given in Fig. 1. The reason why a number of functions were used so rarely can be found in the directions for use, in which these functions are treated at the end, and some users do not get that far. However, even in the opinion of those subjects who learned these additional functions, they are irrelevant after all.

[Fig. 1: bar chart of the number of times each function was used (scale 0 to 500) for the functions insert, delete, input, time, note, output, place-date, number, speech off, practise, repeat, line, character, word, next and back, with separate bars for adults and children.]

Fig. 1. The number of times that each of the function keys was used, as registered by means of the test function facility. The results of two age groups have been summed: hatched bars: adults; blank bars: children.

Basic prototype requirements

On the whole the talking typewriter has been received favourably in the world of the visually impaired. On the basis of positive as well as negative experiences we are now able to draft a number of basic requirements which the industrial prototype should meet:

The typewriter:
- standard, modern typewriter with correction features, tabulator, etc.;
- ergonomically designed, in particular with respect to the tactile feedback of the keyboard;
- for practical purposes: an electronic typewriter with built-in interface, which needs no additional adaptation;
- if possible with a visual and/or tactile position indicator.

The speech output:
- unambiguously pronounced, interrupted by every following utterance, with individually adjustable talking speed;
- in addition to the normal spelling, switchable for instance to a spelling code (in case of intelligibility problems) or phonetic spelling (for speeding up the recall of memory contents);
- capital information given simultaneously with the letter utterance, for instance by changing the pitch.

The text memory:
- clear and simple recall commands;
- a memory length of eight words or one whole line is sufficient, as long as the contents are recalled by spelled speech instead of running speech;
- the memory should retain all the typing actions, including the corrections.

The dictaphone:
- combination with cassette recorders (for dictation and instruction) should be possible, without additional switching and without the two voices interfering with each other.
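The text-memory requirements above (one line or about eight words, recalled by spelled speech, with every typing action including corrections retained) can be sketched as a log of keystrokes that is replayed to obtain the current text. This is an illustration under stated assumptions, not the Typophone's firmware: the class name, the use of a backspace code to represent a correction, the clearing of the memory at each carriage return and the `max_actions` bound are all hypothetical.

```python
from collections import deque
from typing import Deque, List

BACKSPACE = "\b"  # hypothetical code for a correction (delete previous character)


class TypingMemory:
    """Sketch of a one-line text memory that logs every typing action.

    Assumptions: the memory is cleared at each carriage return, so it holds
    at most one line; max_actions is a hypothetical bound on the log length.
    """

    def __init__(self, max_actions: int = 200) -> None:
        self.actions: Deque[str] = deque(maxlen=max_actions)

    def key(self, ch: str) -> None:
        if ch == "\n":
            self.actions.clear()      # new line: start a fresh memory
        else:
            self.actions.append(ch)   # corrections are logged, not erased

    def text(self) -> str:
        """Replay the log to obtain the line as it now stands on paper."""
        out: List[str] = []
        for a in self.actions:
            if a == BACKSPACE:
                if out:
                    out.pop()
            else:
                out.append(a)
        return "".join(out)

    def last_words(self, n: int = 8) -> List[str]:
        """The last n words of the corrected line, e.g. for spelled recall."""
        return self.text().split()[-n:]

    def corrections(self) -> int:
        """Number of corrections still visible in the action log."""
        return sum(1 for a in self.actions if a == BACKSPACE)


m = TypingMemory()
for ch in "the cat" + BACKSPACE + "r":
    m.key(ch)
print(m.text(), m.corrections())  # prints: the car 1
```

Keeping the raw action log, rather than only the corrected text, is one way to satisfy the requirement that corrections are retained; the corrected line is then derived on demand.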

Based on these requirements we designed and built a prototype suitable for industrial production: the Typophone.

Fig. 2. The Typophone, connected with an Olympia ESW 100 KSR electronic typewriter/terminal.

The Typophone

The Typophone is a stand-alone voice response unit, to be used universally with any text-processing apparatus (see Fig. 2). Thus not only a typewriter, but also computer terminals and word processors can be connected. The Typophone meets the above-mentioned requirements completely. In addition, a large-letter visual display (character height 9 mm) has been implemented for the benefit of the instructors and of visually impaired users with some residual sight. The speech can be listened to via the built-in speaker or the optional headphones.


The user can extend the set of functions by using a keypad with nine functions:

1. speech on/off, which suppresses the speech feedback during typing;
2. alphabet selection, with which the user chooses between the standard spelling alphabet, a phonetic alphabet and telephone letter codes;
3. speed selection, with which the user chooses from four talking speeds (especially important during the recall of memory contents);
4, 5 and 6. recall of the previous, current or next word by spelled speech; the last character of each word is pronounced with a pitch declination, which facilitates the perception of word boundaries;
7. recall of the whole line;
8. number information, which gives a spoken indication of the position of the typing head;
9. finally, the relocation function, which moves the typing head back to the point of departure that was left in order to make corrections.
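Functions 4 to 6 can be sketched as a lookup of word boundaries around the typing position, followed by spelling the selected word with a falling-pitch flag on its last character. This is a minimal sketch, assuming space-delimited words and a cursor index equal to the typing-head position; the function names and the 'previous'/'current'/'next' labels are hypothetical, and the conversion of the flag into an actual pitch contour on the speech chip is not shown.

```python
import re
from typing import List, Optional, Tuple


def word_spans(line: str) -> List[Tuple[int, int]]:
    """(start, end) index pairs of the space-separated words in a line."""
    return [m.span() for m in re.finditer(r"\S+", line)]


def recall(line: str, cursor: int, which: str) -> Optional[List[Tuple[str, bool]]]:
    """Spell the previous, current or next word relative to the cursor.

    Returns (character, falling_pitch) pairs in which only the last character
    of the word is flagged, to mark the word boundary. Returns None when there
    is no such word.
    """
    spans = word_spans(line)
    # Current word: the last word that starts at or before the cursor (-1 if none).
    cur = -1
    for i, (start, _end) in enumerate(spans):
        if start <= cursor:
            cur = i
    idx = cur + {"previous": -1, "current": 0, "next": 1}[which]
    if not 0 <= idx < len(spans):
        return None
    start, end = spans[idx]
    word = line[start:end]
    return [(ch, i == len(word) - 1) for i, ch in enumerate(word)]


print(recall("the quick fox", 5, "current"))
# prints [('q', False), ('u', False), ('i', False), ('c', False), ('k', True)]
```

Recalling word by word in spelled form, instead of reading the line as running speech, matches the requirement that recall should use spelled speech.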

The Typophone is supervised by a Motorola MC6809 microprocessor. The microprocessor controls the input, handles the administration of signals from the typewriter and the function keypad, and arranges the speech output. A Philips MEA8000 speech chip takes care of the speech synthesis (Electronic Design, 1981). The interface specifications and the necessary software to control the speech chip are described by Polstra (this issue).

Summary

An experimental talking typewriter has been evaluated by a large number of visually impaired typists, novice and experienced. The speech feedback proves to be very useful in typing courses, since the pupil can use it as a continuously present and indefatigable supervisor. On the basis of the evaluation results we have drafted a number of requirements which a talking typewriter should meet in practical use. A prototype device suitable for industrial production and meeting these requirements has been designed and built: the Typophone.

Acknowledgements

The cooperation of the three institutes for the blind, in particular their instructors and the local visually impaired, was of vital importance in the development of the Typophone. I am especially indebted to Mr K. Fredriks of 'Het Loo Erf', Apeldoorn, Mrs L.r. Eichenberger-Boot of the 'Koninklijke Blindeninstituut', Huizen, and Mr A.A. de Langen of the 'Instituut Bartimeus', Zeist.

References

Electronic Design (1981) Synthesizer adds voice to µP-controlled systems. Electronic Design ~ (25), 65.

Kroon, J.N. (1980) Schema voor onderzoek naar gebruikservaringen met de sprekende schrijfmachine. IPO Memorandum no. 229 (in Dutch).

Kroon, J.N. (1981) Gebruiks- en instructie-aanwijzing voor de sprekende schrijfmachine. IPO Handleiding no. 27 (in Dutch).

Kroon, J.N. and Braal, E. de (1980) A talking typewriter as an aid for the visually impaired. IPO Annual Progress Report 12, 118-123.

Polstra, J. (1982) The speech synthesis chip in the Typophone. IPO Annual Progress Report 17, this issue.


Development of reading aids: the Reading Desk Project

J.E.M. Gabriels and H.E.M. Melotte

General background

A decade has passed since our first attempts at improving the reading situation for the elderly and, more generally, for people with moderately low vision.

In the meantime our attention has been directed toward the development of a reading aid in which an adjustable reading desk, adjustable text illumination and a large-size optical magnifier could be made interchangeable. Several prototypes have been constructed and evaluated by elderly and visually impaired subjects (see the issues of the IPO Annual Progress Report from 1973, 1975, 1978 and 1981). Although many users' experiences and suggestions for improvement have been obtained, further development had to be interrupted again and again for lack of financial support. In 1981 the National Research Fund for Diabetes Mellitus enabled us to continue our work in this field in cooperation with the Low-vision Department of the Ophthalmological Clinic in Rotterdam. The resumed reading-desk project, split into a technical development part and a systematic evaluation part, has recently led to four new reading aids, suitable for industrial production.

Technical development

Users' experiences with earlier prototypes of the reading desk indicated shortcomings in ease of operation, durability of the mechanical construction, and exterior shape and design. In addition, it was learned that many people prefer reading in an armchair, while the reading desk was designed for reading at a table.

On the basis of these data, improved prototypes have been built so that movement of the text, the usefulness of optical magnifiers of different sizes, and text illumination could be tested separately as well as in combination under different reading conditions. The main purpose of the technical development was to satisfy the subjects' needs as they arose in successive evaluation sessions. In particular, the need to move the text easily in two directions, and to do needlework using the same aid, raised many problems in mechanical design and construction.

Evaluation

In order to get a representative group of subjects for the evaluation, we first carried out a preliminary investigation with 42 patients of the low-vision department in Rotterdam and of an oculist's practice in Eindhoven. On the basis of interviews, reading tests and ophthalmological diagnosis, 16 subjects were selected for the evaluation. Their ages ranged from 52 to 77 years, and their visual acuity from 0.1 to 0.8. Several eye diseases, such as macular degeneration, diabetic retinopathy and cataract, were represented. Good reading motivation and the physical ability to handle reading aids were important selection criteria. All subjects named reading and doing needlework among the main activities that presented them with difficulties.


During the successive evaluation sessions the subjects were given the opportunity to use the following aids at a table and in an armchair:

- a movable transparent reading desk provided with a system for keeping the pages of a book flat;
- a combination of the reading desk with a homogeneous text illumination (Philips PL-11) with different colour and grey filters; and
- a combination of the parts mentioned above with different magnifying glasses: COIL aspheric 1.8x and 2.3x magnification, Eschenbach aspheric 1.8x, biconvex 1.6x, 1.8x and 1.9x.

The users' tests consisted of reading aloud various texts, performing search tasks in a newspaper and in a telephone book, and doing needlework (female subjects only). Personal experiences of the subjects and observations by the evaluation leader provided information on body position and fatigue, ease of operation and acceptance of the aid in practical use. Between the evaluation sessions several suggestions for technical improvement were made.

Results

Objective considerations, constructional requirements and the aim of a low cost price have resulted in the development of four reading aids (see pictures).

Picture 1. The reading desk consists of a horizontally and vertically movable desk (a), a text illumination unit with a PL light source (b) and an interchangeable system for magnifying glasses (c). The desk can be folded and weighs 2500 grammes. Operation: the book or text has to be pressed onto the desk by means of the magnetic clamp (d). The movable arm of the lampholder should be adjusted to the working distance of the magnifying glass to be used. The text is moved under the lens with the fingertips, from left to right and vice versa and from the bottom upwards. On reaching the bottom line of the page, the desk can be brought into a new position (upper line of the next page) by means of the grips (f).


Picture 2. Table magnifier. The lens and illumination unit are the same as on the reading desk. The supply unit for the PL illumination has been fitted in the foot. It can be used on the table for reading and needlework, possibly in combination with the lap desk (picture 4).

Picture 3. Floor-stand magnifier. Again the same lens and illumination unit, but fixed on a stand for reading and doing needlework in an armchair or in bed.


Picture 4. Lap desk. Transparent reading board equipped to keep reading materials flat. Intended to be laid on the thighs with the upper side (a) round the knees. Can also be used in combination with the table magnifier, the floor-stand magnifier and, generally, with optical magnifiers.

In this way we meet the subjects' preferences for ways of reading and doing needlework under a variety of conditions. On the basis of the evaluation results, several noteworthy points have been taken into account in the final designs. The most important are the following:

- Untrammelled transition from magnified to unmagnified parts of text or needlework is found to be essential, for instance in finding a new line or surveying the needlework. This has been realised by means of Eschenbach rectangular biconvex magnifying glasses, which do not have irritating protective flanges round the lens. Especially those subjects with retinal diseases preferred clearance on all sides of the lens.
- A diffuse, homogeneous illumination, free of flicker and heat radiation, is found to be preferable to the usual incandescent or fluorescent light sources. All subjects were very enthusiastic about the Philips PL light source, which has recently become available (PL 11 W, 4500 lux at a distance of 17 cm). Moreover, it is energy-saving, which is especially important for the elderly, who tend to be 'thrifty with light'. Some patients with cataract were found to benefit from relatively low levels of illumination or the use of yellow colour filters in front of the light source.
- We found that not too many parts of the reading aids should be movable and adjustable. The aids should be light in weight and portable, and should not look like a medical or flashy device.


The final stage of development was to make the aids suitable for industrial production and distribution. We now have four new aids which meet the needs of a considerable group of visually handicapped persons.

References

Bier, N. (1970) Correction for subnormal vision. Butterworths, London.

Bouma, H., Legein, Ch.P., Melotte, H.E.M. and Zabel, L. (1982) Oral reading rate and word recognition of elderly subjects. IPO Annual Progress Report 17 (this issue).

Fonda, G. and Snydacker, D. (1959) Optical aids for low vision acuity. Transactions American Academy of Ophthalmology and Otolaryngology ~, 79-88.

Henkes, H.E. and Balen, A.Th.M. van (1976) Oogheelkunde voor de algemene praktijk. Agon Elsevier, Amsterdam/Brussel (in Dutch).

Melotte, H.E.M., Gabriels, J.E.M., Gorp, R.A.M. van, Kroon, J.N. and Schuurmann, P.L.H. (1981) Developments and progress of current projects. IPO Annual Progress Report ~, 131.

Silver, J.H. (1976) Low vision aids in the management of visual handicap. British Journal of Physiological Optics 11, no. 2, 47-87.

Sloan, L.L. (1971) Recommended aids for the partially sighted. National Society for the Prevention of Blindness, Inc.


An artificial larynx with semi-automatic pitch control

P.L.H. Schuurmann and H.E.M. Melotte

In last year's Annual Progress Report we mentioned our first activities in the development of an artificial larynx. The aim of this project was to provide an existing monotonic artificial larynx (Servox) with semi-automatic pitch control, with which laryngectomees can control intonation. In close cooperation with the Institute of Phonetics at Utrecht University, where the idea of the system originated, the developments at our institute have resulted this year in an industrial prototype of an improved talking aid for laryngectomees (see photograph).

Artificial larynx with the newly developed two-button control.

During technical development of this aid, attention had to be paid to miniaturising the electronic circuitry and to the evaluation of shape, weight and ease of operation by a selected number of laryngectomees. Users' experiences with several prototypes equipped with different switch arrangements led to the development of a special two-button switch: one button switches the device on and off, and the other switches from high to low pitch or vice versa whenever a pitch accent has to be made. Otherwise, the evaluation has not resulted in radical changes of the exterior compared to the original Servox device.
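The two-button operation described above amounts to a small state machine: one button toggles the device on and off, the other toggles between two pitch levels. The sketch below assumes illustrative frequencies of 100 Hz and 130 Hz, and assumes that the device starts at the low pitch each time it is switched on; neither assumption comes from the Servox or the IPO prototype, and the semi-automatic part of the pitch control is not modelled.

```python
from dataclasses import dataclass


@dataclass
class TwoButtonLarynx:
    """Sketch of the two-button control logic (hypothetical pitch values)."""

    low_hz: float = 100.0   # assumed low excitation frequency
    high_hz: float = 130.0  # assumed high excitation frequency
    on: bool = False
    high: bool = False

    def press_power(self) -> None:
        """First button: switch the device on or off."""
        self.on = not self.on
        self.high = False  # assumption: each utterance starts at the low pitch

    def press_pitch(self) -> None:
        """Second button: toggle high/low pitch, e.g. to make a pitch accent."""
        if self.on:
            self.high = not self.high

    def frequency(self) -> float:
        """Current excitation frequency in Hz, or 0.0 when switched off."""
        if not self.on:
            return 0.0
        return self.high_hz if self.high else self.low_hz


d = TwoButtonLarynx()
d.press_power()
d.press_pitch()
print(d.frequency())  # prints 130.0
```

Making the pitch button a toggle, rather than a hold-to-raise key, means the user only acts at the moments where the pitch changes.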

Recently, the industrial prototype has been accepted by the manufacturer and distributor of the Servox artificial larynx for the production of a first series. These devices will be used by the Institute of Phonetics in Utrecht to develop a training program and to explore the intonation possibilities available to laryngectomees.


Instrumentation and Software


Developments

L.F. Willems

Our VAX 11/780 computer has now been operating for more than a full year and is becoming the central computing facility of the institute. The main work done on this machine falls into three areas:

- speech analysis, manipulation and resynthesis;
- syntactic, semantic and pragmatic linguistic analysis;
- statistical processing of data acquired in psychophysical experiments.

The psychophysical experiments themselves are controlled by minicomputers (Philips P8000 series) and microcomputers (Terak, Apple), which keeps most of the time-critical tasks off the VAX. To transfer the data obtained in these experiments, we have installed connections between the various mini- and microcomputers and the VAX.

Our VAX installation has a 1M5-byte memory, two disk-storage units with capacities of 67 Mbyte and 256 Mbyte, and some 20 RS232 lines connected. We are now extending the memory and the speech output facilities of the installation.

This year the Philips speech synthesis chip MEA8000 became available, and several interfaces have been made: one with the VAX computer, one with an Apple microcomputer, and one with the Motorola M6809 in the Typophone (see Polstra, this issue, and Kroon, this issue).

We have started designing and building a prototype multiple-waveform generator for use in a complex visual experimentation setup. It is a microprocessor system that is loaded and controlled via the IEEE bus by a minicomputer acting as host. Four waveforms can be generated simultaneously. The amplitude of each stimulus waveform and the background level (DC offset) have to be controlled with a high degree of accuracy. For this purpose a separate digitally controlled attenuator is incorporated for each waveform.
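The split between a stored waveform and a per-channel attenuator can be shown in a small C sketch. It is a model under stated assumptions, not a description of the IPO hardware. Each channel is assumed to hold one period of its waveform as a table of samples. The attenuator is modelled as a hypothetical 8-bit multiplying stage with gain `atten_code / 256`, and the DC offset is added after attenuation. The names `Channel`, `channel_sample` and `generate_all` are our own.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANNELS 4  /* four waveforms can be generated simultaneously */

/* Hypothetical per-channel state; not the actual IPO design. */
typedef struct {
    const int16_t *table;  /* one period of the stimulus waveform */
    size_t length;         /* number of samples in the table */
    uint8_t atten_code;    /* assumed attenuator setting: gain = atten_code / 256 */
    int32_t offset;        /* background level (DC offset) */
} Channel;

/* Value of one channel at sample index n: attenuated waveform plus offset. */
int32_t channel_sample(const Channel *ch, size_t n)
{
    int32_t w = ch->table[n % ch->length];
    return ch->offset + (w * (int32_t)ch->atten_code) / 256;
}

/* Compute the current sample of all four channels. */
void generate_all(const Channel ch[NUM_CHANNELS], size_t n, int32_t out[NUM_CHANNELS])
{
    for (int k = 0; k < NUM_CHANNELS; k++)
        out[k] = channel_sample(&ch[k], n);
}
```

With one attenuator per channel, the amplitude of each waveform can be changed without touching the stored tables or the other channels.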

References

Kroon, J.N. (1982) The Typophone. IPO Annual Progress Report 17, this issue.

Polstra, J. (1982) The speech synthesis chip in the Typophone. IPO Annual Progress Report 17, this issue.


The speech synthesis chip in the typophone

J. Polstra

In the typophone, described elsewhere in this progress report (see Kroon), a speech-synthesis chip (MEA8000) is used to produce speech. The implementation uses an eight-bit MC6809 microprocessor.

The control signals of the MEA8000 have been designed so that the chip can be connected directly to the microprocessor bus. Reading from and writing to the speech chip is done by using the chip-enable input signal as a strobe, while the read/write line determines the direction of the data flow. The chip contains two write-only registers, a data register and a command register, and one read-only register, the status register. Of the status register only bit seven, the request bit, is relevant. During a write access, register select line A0 (see Fig. 1) discriminates between the data and command registers. During a read access this line is irrelevant. More detailed information can be found in the datasheets.
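The register-selection rules above can be modelled in a few lines of C. This is a simulation for illustration, not driver code. The structure `MeaRegs` and the functions `mea_bus_write`, `mea_bus_read` and `mea_request` are hypothetical. Only the behaviour comes from the text: A0 selects the data or command register on a write, A0 is ignored on a read, and bit 7 of the status register is the request bit. Which level of A0 selects which register is an assumption here and should be checked against the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEA_STATUS_REQ 0x80u  /* bit 7 of the status register: request bit */

/* Simulated register file: two write-only registers and one read-only register. */
typedef struct {
    uint8_t data;     /* write-only data register */
    uint8_t command;  /* write-only command register */
    uint8_t status;   /* read-only status register */
} MeaRegs;

/* Write cycle: register select line A0 chooses the target register.
   Assumption: A0 = 0 selects data, A0 = 1 selects command. */
void mea_bus_write(MeaRegs *r, bool a0, uint8_t value)
{
    if (a0)
        r->command = value;
    else
        r->data = value;
}

/* Read cycle: A0 is irrelevant; the status register is always returned. */
uint8_t mea_bus_read(const MeaRegs *r, bool a0)
{
    (void)a0;
    return r->status;
}

/* Only the request bit of the status register is relevant. */
bool mea_request(uint8_t status)
{
    return (status & MEA_STATUS_REQ) != 0;
}
```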

[Diagram labels: MC6809, address bus, decoder, CE, A0, R/W, E, MEA8000, 4 MHz, OUT.]

Fig. 1. Speech chip interface to the microprocessor

Parameters for both the digital speech waveform generator and the filters are supplied to the synthesiser via the data bus in coded groups of four bytes (a frame). The exception is the pitch start value, which must be given only once, before these four-byte frames. For a higher-pitched utterance (capitals in the typophone) only the pitch start value has to be changed. The typophone also lets the user speed up utterances, which is easily achieved by changing the frame-duration information in every frame of the utterance. When the microprocessor program uses polling to check whether the speech parameters must be updated, the status register has to be read and checked before the frame information is updated. A routine for this operation could be the following:


PROCEDURE OutMea (b: byte);
BEGIN
  REPEAT UNTIL req IN ChipStatus;  (*wait until the chip requests data*)
  ChipData := b
END;
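For readers more used to C, the same polling loop might look as follows. It is a sketch that assumes the registers are memory-mapped and reached through `volatile` pointers. The name `out_mea` and its pointer parameters are our own, and the register addresses are left open, as in the report.

```c
#include <stdint.h>

#define MEA_STATUS_REQ 0x80u  /* request bit: bit 7 of the status register */

/* Busy-wait until the chip requests data, then write one byte.
   status and data point at the memory-mapped status and data registers. */
void out_mea(volatile const uint8_t *status, volatile uint8_t *data, uint8_t b)
{
    while ((*status & MEA_STATUS_REQ) == 0)
        ;  /* poll until the request bit is set */
    *data = b;
}
```

The `volatile` qualifiers make the compiler re-read the status register on every pass of the loop instead of caching its value.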

The synthesiser is in one of three modes: the stop mode, the active mode or the continuous mode. The speech chip can always be put into the stop mode by setting the stop bit in the command register (Fig. 2).

Bit:   D7  D6  D5    D4     D3            D2     D1           D0
Name:  (not used)    STOP   CONT enable   CONT   ROE enable   ROE

D4 (STOP):                  0 = invalid, 1 = stop
D3 D2 (CONT enable, CONT):  00 = invalid, 01 = invalid, 10 = slow stop, 11 = continue
D1 D0 (ROE enable, ROE):    00 = invalid, 01 = invalid, 10 = disable REQ output, 11 = enable REQ output

Fig. 2. Command register
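The bit assignments of Fig. 2 can be written down as C constants. The sketch takes the bit order from the CommandBits type in the PASCAL program later in this article (roe, roeen, cont, conten, stop). The constant and function names are our own. The sketch shows that the command written by InitMea, [conten, stop], is the byte 18 hex, which selects slow stop.

```c
#include <stdint.h>

/* Command register bits D0..D4 in CommandBits order; D5..D7 are not used. */
enum {
    MEA_ROE     = 1 << 0,  /* D0: REQ output enable value */
    MEA_ROE_EN  = 1 << 1,  /* D1: ROE enable */
    MEA_CONT    = 1 << 2,  /* D2: continuous */
    MEA_CONT_EN = 1 << 3,  /* D3: CONT enable */
    MEA_STOP    = 1 << 4   /* D4: stop */
};

/* Command written by InitMea: CONT enable set, CONT clear (slow stop), STOP set. */
#define MEA_INIT_COMMAND ((uint8_t)(MEA_CONT_EN | MEA_STOP))

typedef enum { CONT_INVALID, CONT_SLOW_STOP, CONT_CONTINUE } ContSetting;

/* Decode the CONT enable/CONT pair of a command byte as listed in Fig. 2. */
ContSetting cont_setting(uint8_t cmd)
{
    if (!(cmd & MEA_CONT_EN))
        return CONT_INVALID;                                   /* codes 00 and 01 */
    return (cmd & MEA_CONT) ? CONT_CONTINUE : CONT_SLOW_STOP;  /* 11 or 10 */
}
```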

If the speech parameters are not updated in time, the behaviour of the chip depends on the state of the continuous bit. If this bit is set, the chip repeats the last frame. If it is cleared, the chip automatically reverts to the stop mode.
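The rule for late updates can be expressed as a small decision function plus a frame-by-frame simulation. This is a behavioural sketch in C, not chip documentation. Treating "repeat the last frame" as continued sound is our reading of the text, and the names `mea_frame_end` and `mea_sounding_periods` are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    MEA_PLAY_NEXT,    /* new parameters arrived in time: play them */
    MEA_REPEAT_LAST,  /* too late, continuous bit set: repeat the last frame */
    MEA_GO_TO_STOP    /* too late, continuous bit clear: revert to stop mode */
} MeaFrameOutcome;

/* Outcome at the end of one frame period. */
MeaFrameOutcome mea_frame_end(bool updated_in_time, bool cont_bit)
{
    if (updated_in_time)
        return MEA_PLAY_NEXT;
    return cont_bit ? MEA_REPEAT_LAST : MEA_GO_TO_STOP;
}

/* Simulate n frame periods; arrived[k] tells whether the parameters for
   period k were written in time. Returns the number of periods that
   produced sound before the chip stopped. */
int mea_sounding_periods(const bool *arrived, int n, bool cont_bit)
{
    int periods = 0;
    for (int k = 0; k < n; k++) {
        if (mea_frame_end(arrived[k], cont_bit) == MEA_GO_TO_STOP)
            break;   /* the rest of the utterance is lost */
        periods++;   /* either a new frame or a repeated one */
    }
    return periods;
}
```

In this model, a single late update cuts the utterance short when the continuous bit is clear. When the bit is set, the late update only stretches the previous frame.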

In the typophone, speech information is stored in non-volatile read-only memory (ROM). The utterances are placed sequentially in ROM, and every utterance is preceded by its pitch start value.
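Because the utterances sit in ROM, the changes for capitals and for faster speech have to be made on the fly, byte by byte, as an utterance is sent to the chip. The C sketch below assumes the layout described above: one pitch start value followed by four-byte frames. The position and width of the frame-duration field (`FD_BYTE`, `FD_SHIFT`, `FD_MASK`) are placeholders, not values taken from the MEA8000 datasheet. `Variant` and `utterance_byte` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 4
#define FD_BYTE     3                   /* placeholder: frame byte holding the duration field */
#define FD_SHIFT    5                   /* placeholder bit position */
#define FD_MASK     (0x3u << FD_SHIFT)  /* placeholder two-bit field */

typedef struct {
    int raise_pitch;   /* added to the pitch start value (e.g. for capitals) */
    bool override_fd;  /* true: replace the duration field in every frame */
    uint8_t fd_code;   /* duration code used when override_fd is set */
} Variant;

/* Return byte i of a stored utterance as it should be sent to the chip.
   Byte 0 is the pitch start value; frames of four bytes follow. */
uint8_t utterance_byte(const uint8_t *utt, size_t i, const Variant *v)
{
    uint8_t b = utt[i];
    if (i == 0)
        return (uint8_t)(b + v->raise_pitch);  /* only the start pitch changes */
    if (v->override_fd && (i - 1) % FRAME_BYTES == FD_BYTE)
        b = (uint8_t)((b & ~FD_MASK) | ((v->fd_code << FD_SHIFT) & FD_MASK));
    return b;
}
```

The ROM contents stay untouched and only the bytes in transit are patched. One stored copy of each utterance can therefore serve for normal, capitalised and faster speech.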

Let us assume, for instance, that the synthesiser has to produce the utterance 'WORD', located in ROM at addresses BeginAddress to EndAddress. A PASCAL program to achieve this might be as follows:

PROGRAM MeaDemo(output);

CONST MeaDataAddress = ...;
      MeaCommandAddress = ...;
      MeaStatusAddress = ...;
      WORDAddress = ...;
      Length = ...;  (*EndAddress - BeginAddress*)

TYPE byte = 0..255;
     CommandBits = (roe, roeen, cont, conten, stop, c5, c6, c7);
     StatusBits = (s0, s1, s2, s3, s4, s5, s6, req);
     (*a variable of one of these types occupies one byte in memory*)

VAR i: integer;
    WORD: ARRAY [0..Length] OF byte @ WORDAddress;
    ChipData: byte @ MeaDataAddress;
    ChipStatus: SET OF StatusBits @ MeaStatusAddress;
    ChipCommand: SET OF CommandBits @ MeaCommandAddress;
    (*this compiler has the facility of binding a physical address to a declared
      variable, i.e. 'b: byte @ $3000;' means that variable b is located at
      address hex 3000*)

PROCEDURE InitMea;
BEGIN
  ChipCommand := [conten, stop]  (*slow stop mode*)
END;

PROCEDURE OutMea (b: byte);
BEGIN
  REPEAT UNTIL req IN ChipStatus;  (*wait until the chip requests data*)
  ChipData := b
END;

BEGIN
  InitMea;
  FOR i := 0 TO Length DO OutMea(WORD[i])
END.
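A C counterpart of MeaDemo is sketched below for comparison. As in the PASCAL version, the register addresses are left open. Here they are passed in as `volatile` pointers through a `MeaPorts` structure of our own invention. The command byte 18 hex corresponds to [conten, stop]. The loop runs from 0 to `length` inclusive, as `FOR i := 0 TO Length` does, so `length + 1` bytes are sent: the pitch start value plus the frames.

```c
#include <stddef.h>
#include <stdint.h>

#define MEA_STATUS_REQ   0x80u  /* request bit in the status register */
#define MEA_INIT_COMMAND 0x18u  /* CONT enable + STOP: slow stop mode */

/* Pointers to the memory-mapped registers (addresses not given here). */
typedef struct {
    volatile uint8_t *data;
    volatile uint8_t *command;
    volatile const uint8_t *status;
} MeaPorts;

/* Wait for the request bit, then write one byte to the data register. */
static void out_mea(const MeaPorts *p, uint8_t b)
{
    while ((*p->status & MEA_STATUS_REQ) == 0)
        ;
    *p->data = b;
}

/* Equivalent of MeaDemo: initialise the chip, then send bytes 0..length inclusive. */
void mea_demo(const MeaPorts *p, const uint8_t *word, size_t length)
{
    *p->command = MEA_INIT_COMMAND;  /* InitMea */
    for (size_t i = 0; i <= length; i++)
        out_mea(p, word[i]);
}
```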


Publications 1982

P 425 L. Zabel, H. Bouma and H.E.M. Melotte

Use of the TV magnifier in the Netherlands: a survey.

In: Journal of visual impairment and blindness 76, 1982, no. 1, pp 25-29.

Persons with low visual acuity can read with a TV magnifier, according to a survey of 280 users of the aid in the Netherlands. Survey respondents found the TV magnifier indispensable for such tasks as reading, writing, and looking at photographs. The survey also identified structural and instructional problems, and the authors offer suggestions for improving the aid, which will lead to its wider use.

P 426 J.P.L. Brokx and S.G. Nooteboom

Intonation and the perceptual separation of simultaneous voices.

In: Journal of Phonetics, 10, 1982, pp 23-36.

The present paper examines the role of speech pitch in the perceptual separation of simultaneous speech messages, when both messages are spoken by the same speaker and there are no differences in directional hearing. A first experiment employed resynthesized speech with completely monotonous pitch. It shows that intelligibility of the target message can be manipulated by introducing an artificial constant difference in pitch between target speech and interfering speech. Within certain limits, intelligibility increases with increasing difference in pitch. In a second experiment, natural speech is employed for both target and interfering messages. The interfering speech is always spoken with normal intonation, whereas the target messages are either spoken with normal intonation or deliberately spoken in a monotone. For both intonation conditions the messages are either spoken within the same pitch range as the interfering speech (SAME PITCH), or within a considerably higher pitch range (DIFFERENT PITCH). For the messages spoken with normal intonation, the SAME PITCH condition is considerably less intelligible than the DIFFERENT PITCH condition. For the monotonously spoken messages the results are less clear. Here the effect of a difference in pitch range is probably confounded with the effects of other properties of speech which result from a monotonous pronunciation. The main results of these experiments can be related to the phenomenon of 'perceptual fusion', occurring when two simultaneous sounds are identical in pitch, and to 'perceptual tracking'. When the pitches of target and interfering speech cross each other, the listener runs the risk of inadvertently switching his attention from the target speech to the interfering speech.

P 427 F.L. van Nes

Perceptive, cognitive and communicative aspects of data processing equipment.

In: Proceedings of the 1982 International Zurich seminar on digital communications: man-machine interaction, March 9-11, 1982, Zurich: Swiss Federal Institute of Technology, 1982, pp 259-262.

It is to be expected that data processing equipment will be increasingly used by laymen. An easy, natural use of such equipment would benefit not only the layman, but the data processing expert as well. Such use may be furthered by several means:

- allowing dialogues in natural language;
- selecting appropriate input/output apparatus;
- employing display symbols and text layouts of high legibility.

Experiments are reported in which the retrieved information is presented in both text and speech form. A preliminary analysis is given of the efficiency and ease of using such visually or auditorily presented information.


P 428 H.C. Bunt

Conversational principles in question-answer dialogues.

In: Zur Theorie der Frage. Ed. by D. Krallman and G. Stickel. Tübingen: Narr, n.d., pp 119-141.

In this paper questions and answers are considered within the setting of dialogues of a relatively simple kind, called 'informative dialogues'. Characteristic of this type of dialogue is that it is conducted with the purpose of exchanging certain factual information and contains no elements irrelevant to that purpose. The empirical data we use are recordings of such dialogues between people and between a person and a computer. It turns out that, though the desired information transfer is basically accomplished by means of questions and answers, the naturalness and smoothness of the dialogues are largely due to the way questions and answers are interwoven with acknowledgements, checks, confirmations, etc.

A theoretical framework for the study of dialogues is developed, in which a dialogue is viewed as a sequence of speech acts performed according to certain rules for their appropriate use. We outline how these rules can be made explicit and precise in terms of an articulated description of the participants' goals, plans, knowledge of the discourse domain, and assumptions about each other.

Two applications of this framework are discussed: its use in the transcription and analysis of person-to-person dialogues and its role as a basis for constructing a pragmatically acceptable automatic dialogue partner.

P 429 J. 't Hart, S.G. Nooteboom, L.L.M. Vogten and L.F. Willems

Manipulaties met spraakgeluid (Manipulations of speech sounds).

In: Philips Technisch Tijdschrift 40, 1981/82, no. 4, pp 108-119 (in Dutch).

This paper deals with the computer-implemented system called SPARX, for SPeech Analysis and Resynthesis eXperiments. It is based on the commonly accepted 'synthesis model' for speech production. This model consists of a voiced or voiceless source with a flat spectral envelope and a variable filter that includes all the spectral characteristics of the speech organs. Digitised speech is analysed by means of linear predictive coding (LPC); the fundamental frequency (F0) is measured with the autosign-correlation method. This results in thirteen parameters: voiced/unvoiced, F0, amplitude, and frequencies and bandwidths for each of the five formants. Resynthesized speech can be manipulated by deliberate operations on the parameters. Examples discussed are coarsening of the parameters in order to reduce the bit rate, and manipulation of the pitch and the temporal structure in such a way as to alter the interpretation of a speech utterance. Finally, an intonation grammar is discussed that can be used in controlling F0 in synthetic speech. For Dutch, such a grammar had been developed earlier. For British English (as presented in the English edition), the rules have been formulated on the basis of recent research in which SPARX was used to determine the extent to which the pitch variation can be 'stylised' without degrading the perception.

P 430 J.T.S. Smits and H. Duifhuis

Masking and partial masking in listeners with a high-frequency hearing loss.

In: Audiology, 21, 1982, pp 310-324.

Three listeners with sensorineural hearing loss took part. Their loss ranged from moderate to moderate-severe, starting at frequencies higher than 1 kHz. They participated in two masking experiments and a partial masking experiment. In the first masking experiment, with f_M = 1 kHz and L_M = 50 dB SPL, the hearing-impaired listeners showed higher than normal masked thresholds, both in the frequency region of clear hearing loss and in the region of near-normal absolute thresholds.

The second masking experiment showed the following for the hearing-impaired listeners. In the frequency region of 'near-normal' absolute thresholds, the elevation of the masked thresholds, in dB, was equal to the elevation of the absolute thresholds, in dB. The third experiment was a partial masking experiment with f_M = 975-1025 Hz and L_M = 75 dB SPL. It showed similar partial masking functions for hearing-impaired and normal listeners, but the functions for the hearing-impaired listeners were at much higher levels of the partially masked probe tone. Thus the higher masked thresholds of the hearing-impaired can result in a dramatic reduction of the dynamic range of hearing under masking. This applies in the frequency region of hearing loss and also in the region with only a small hearing loss (< 30 dB). It is suggested that this may explain the speech perception difficulties which these listeners experience, especially in the presence of ambient noise.

P 431 H.C. Bunt

IPO Dialogue Project

In: SIGART Newsletter, 1982, no. 80, pp 60-61.

A brief description is given of the dialogue theory under development at IPO and the automatic dialogue system being designed on the basis of this theory. The central role of the felicity conditions for dialogue acts in the interpretation, evaluation, and generation of these acts is emphasised.

P 432 J.E.M.W. Thomassen

Melodic accent: experiments and a tentative model

In: Journal of the Acoustical Society of America, 71, 1982, no. 6, pp 1596-1605.

The perception of accent in tone sequences is a constructive process in which physical cues are matched against anticipated accents. The anticipation of the observer can be controlled experimentally by embedding the short tone sequence under investigation in a context with a meter: the method of controlled anticipation. An investigation was made of melodic accentuation resulting from the succession of frequency intervals. It revealed that, in principle, every change of frequency level between two successive tones can be interpreted as accentuation of the terminal tone of the change. The melodic contour seems to be most important. The first of two intervals in opposite directions operates as the strongest accentuation, whereas two intervals in the same direction are equally effective. The effect of relative magnitude is less pronounced. Only when the relative magnitudes clearly diverge is the largest interval the most powerful, particularly when the intervals are in the same direction. The advantage of rises over falls is almost negligible. The short-term influence of physical factors on momentary accent perception allows for a description in terms of a 'memory window' sliding along the tone sequence. At each moment the frequencies within the window provide the physical cue for accent that has to be matched against anticipation. With a window of minimal span, i.e., three tones, the model accounts fairly well for accent perception in sequences of four tones embedded according to the method of controlled anticipation. The correlation coefficient between prediction and outcome is 0.76.

P 433 P. Ottley, S.M. Marcus and J. Morton

Contextual effects in the stimulus suffix paradigm.

In: British Journal of Psychology, 73, 1982, pp 383-387.

Morton & Chambers (1976) studied the suffix effect, a selective impairment in serial recall on the final serial position of an acoustically presented list. They showed that this effect was crucially affected by whether the suffix was a speech sound or a non-speech sound. They also claimed that the classification of a sound as speech-like was determined simply by the acoustic properties of the sound and not at all by the context. The crucial sound in their experiments was a steady-state, naturally produced vowel sound which failed to give a suffix effect. We report here that when the sound was the only suffix used, it did produce a suffix effect. We conclude that, contrary to Morton & Chambers' conclusion, context effects are indeed operative in determining whether a sound produces a suffix effect.

P 434 H. Duifhuis, L.F. Willems and R.J. Sluyter

Measurement of pitch in speech: an implementation of Goldstein's theory of pitch perception.

In: Journal of the Acoustical Society of America, 71, 1982, no. 6, pp 1568-1580.

Current views in hearing theory on the perception of pitch in complex sounds converge towards the interpretation that pitch is the result of a psychological pattern recognition process. Pitch is determined by the fundamental of the harmonic sound whose spectrum optimally matches that of the complex sound. Goldstein (J. Acoust. Soc. Am. 54, 1496-1516 (1973)) proposed an objective procedure for finding the best fit. We have adapted and extended his procedure in a pitch meter which measures pitch in speech. The most important deviation from Goldstein's procedure is that in our implementation not all components of the complex sound have to be classified as harmonics. A simple criterion, based on the number of components that are classified as harmonics at a candidate pitch, determines which candidate pitch is selected. Performance of this psycho-acoustically based pitch meter compares favourably with that of known algorithms.

P 435 Ch.P. Legein and H. Bouma

Reading and the Ophthalmologist; an introduction into the complex phenomenon of ordinary reading as a guidance for analysis and treatment of disabled readers.

In: Documenta ophthalmologica, 53, 1982, pp 123-157.

Reading problems are a frequent source of complaints in ophthalmological practice. In many cases suitable optical correction is all that is needed. However, difficulties may remain despite adequate optical correction. This paper describes visual reading processes with the aim of making such difficulties understood and, if possible, providing remedies. Four different types of visual reading processes are distinguished:

(a) optical imaging;
(b) eye movement control;
(c) visual word recognition;
(d) integration of information across eye fixations.

Next, an attempt is made to use this insight to obtain a better understanding of actual reading problems, such as those of elderly readers, low-vision patients and dyslexics, as well as those of the blind. Therapeutic options, including visual aids, are given due attention.

P 436 D.G. Bouwhuis

Engels leren met spraak naar keuze (Learning English with 'speech-at-will': a pilot study).

In: Hees, E.J.W.M. and Dirkzwager, A., Eds, Onderwijs en de nieuwe media (Education and the new media); Onderwijsresearch Dagen (Educational Research Days), 1982, Tilburg. Lisse: Swets & Zeitlinger, 1982 (in Dutch).

It is standard educational practice to use learning material in printed form. This has obvious advantages in studying and searching for particular passages. Spoken text material is often employed in foreign language education, but never in such a way that students have direct access to particular passages. In an experiment, students were enabled to select arbitrary spoken text passages for listening and could repeat those at will. This led to improved learning results for word knowledge and listening ability; improvement in text comprehension could not be assessed with the tests employed.

P 437 C.J. Chiang and A.J.M. Houtsma

A comparison between expensive and inexpensive violin strings.

In: Catgut Acoustical Society Newsletter, 1982, no. 38, pp 8-10.

This project represents an attempt to determine whether or not there is a significant quality difference between expensive and inexpensive violin strings. Our experimental procedure included subjective evaluation of the strings through blind listening and blind playing, as well as physical measurements of overtone harmonicity. Results of blind listening tests were, as usual, inconclusive, whereas blind playing by an accomplished violinist resulted in a clear preference for the more expensive strings. Laboratory tests indicated a difference in harmonicity between the two kinds of strings. Overtones of plucked notes from the more expensive silver- or aluminum-wound nylon strings are considerably more harmonic than those from the less expensive steel strings. This explains why the former are much easier to bow than the latter. It is also consistent with the considerable difference in bending stiffness between the two kinds of strings.


P 438 A. Cohen, R. Collier and J. 't Hart

Declination: construct or intrinsic feature of speech pitch?

In: Phonetica, 39, 1982, pp 254-273.

Declination is taken as the focus of studying pitch phenomena from an acoustic, physiological and perceptual point of view. It is shown that declination was originally no more than a theoretical construct to account for the interpretation of acoustic F0 recordings. Recently, psycholinguistic considerations have extended the domain of application so as to account for this phenomenon. The literature is reviewed, and the authors take issue with various claims put forward by others. These include the dominance of the topline approach over the baseline approach and the amount of preprogramming involved in declination, as manifested in its slope and in linguistically determined resetting.

P 439 S.G. Nooteboom and J.M.B. Terken

What makes speakers omit pitch accents? An experiment.

In: Phonetica, 39, 1982, pp 317-336.

The present paper reports on an experiment which was set up to examine whether a speaker can be made either to accent or to de-accent particular words. The manipulation was to vary systematically the objective probability that a particular referent will be mentioned, and therewith the referent's predictability for speaker and listener. In the experiment each of 24 speakers was asked to watch a visual display showing a very simple configuration of letter symbols, and to describe orally each change in the current configuration to a listener. By manipulating the letter configurations shown on the display, the objective probability that the speaker will mention a particular letter could be controlled. Letters could either move around on the screen (moving letters) or remain fixed and serve as spatial reference points (fixed letters). Objective probabilities were 0.5 and 1 for both moving letters and fixed letters. The main findings were the following:

1. When a referent is fully predictable to speaker and listener, there is a high proportion of ellipsis, particularly for the moving letter, which was always referred to from subject position.

2. The probability that a word referring to a letter is accented appears not to be immediately controlled by the referent. The controlling factor is rather the preceding linguistic context. The probability of accenting is close to 1 the first time a specific referent is mentioned. It sharply decreases when the same referent is mentioned for the second time in a row, and decreases again when this same referent is mentioned three or more times in a row. However, as soon as the competing referent (moving or fixed letter) is mentioned once in the same row, the probability of accenting jumps up again.

3. The probability of accenting is systematically lower for the moving letters in subject position (average 0.32) than for the fixed letters in predicate position (average 0.52).

In view of these findings, de-accenting is interpreted as a device which a cooperative speaker can use to help the listener find the intended referent as easily and quickly as possible. De-accenting is defined here as conspicuously omitting an accent on a word that would otherwise have been accented for grammatical reasons. Speakers not using this device systematically are supposed to give their listeners a harder time.

P 440 A.J.M. Houtsma

Inharmonicity of wound guitar strings.

In: Journal of guitar acoustics, 1982, no. 6, pp 60-64.

Wound guitar strings are known to 'go dead' after several hours of playing. Increased inharmonicity of string partials is thought to be the primary contributing factor, making exact tuning of strings impossible. Increased inharmonicity with age is mostly due to changes in mass distribution and internal stresses, rather than changes in stiffness. String aging can be artificially induced by repeated stretching and relaxing of a new string. Measurements of the frequencies of the first ten partials in standard brass-wound steel guitar strings show that inharmonicity is significantly increased by repeated stretching. The inharmonic effect of stretching can be greatly reduced if strings are stress-relieved by heat after winding.


P 441 A.J.M. Houtsma

Pitch salience of various complex sounds.

In: Symposium on common aspects of processing of linguistic and musical data; Tallinn, 22-24 November 1982. Tallinn: Academy of the Estonian S.S.R., 1982, pp 30-39.

Pitch salience of a variety of different complex sounds was measured through open-set melodic dictation tests with five musically experienced observers. On each trial the observer had to play back a four-note melody on an eight-note keyboard. The four notes were randomly selected from an eight-note diatonic major scale. The data were reduced to two measures. A correlation measure mostly addresses the degree to which ordinal or contour information is preserved in the sequence of sensations. A percent-correct identification measure tests the preservation of ratio information. The two measures are in some cases very different. It is proposed that sounds that seem to convey mostly ordinal and little ratio information should not qualify as sounds that evoke true pitch sensations.

P 442 J.A.J. Roufs

Static and dynamic line spread functions of the visual system, elicited by TV-line increments in situ.

In: Perception, 11, 1982, no. 1, pp A12 (abstract).

Line spread functions are increasingly used to characterise the contrast transfer of the human visual system, especially in connection with the perception of extended stimuli. There are several reasons why they cannot be derived adequately from either the point spread function or the modulation transfer function. In practical problems involving TV displays, the line structure of the luminance field and the interlacing of frames are further complications. A perturbation method based on local linearity was used in conjunction with a specially constructed electronic device. With it, the line spread functions elicited by individual line increments of a normal TV set could be sampled in space and time. Quasistatic responses at 200 cd·m⁻², which had the usual shape, were found to be linearly related to edge and bar responses. The greater width of the line spread functions at 5 cd·m⁻² could be expressed simply by a spatial scale factor. The impulse line response appeared to be more oscillatory than an impulse point response at the same level. The lateral spread of the impulse line response is much larger than its static spread, which suggests that its space and time coordinates are not separable.

P 443 E. van der Zee and A.W. van der Meulen

The visibility of flicker on visual displays as a function of the frame repetition rate.

In: Perception 11, 1982, no. 1, pp A12 (abstract).

An experiment determined the frame repetition rate (FRR) at which flicker can no longer be detected on a screen with negative contrast, i.e. dark letters on a bright background. The display used allowed the FRR to be set to any integer value between 60 and 120 Hz. The experiment was carried out under the general viewing conditions recommended by Cakir and Stewart. Twenty-four naive subjects participated, six in each of the age classes 20-30, 30-40, 40-50 and 50-60 years. The empty screens the subjects saw had a variable luminance (50, 100 and 200 cd·m⁻²) and were observed at three viewing distances (33, 50 and 70 cm). The subjects judged which of two successive presentations on the same screen contained flicker. This two-alternative forced-choice method was chosen because some subjects could not produce consistent results on single presentations with the method of constant stimuli. The FRR was found to need to be higher than the value quoted in many publications: between 90 and 95 Hz instead of 70 or 80 Hz. A higher screen luminance calls for a higher FRR, by about 8 Hz when going from 50 to 200 cd·m⁻². Apparent flicker is reduced by greater viewing distances. Older subjects are only slightly less sensitive to flicker than younger ones.


P 444 S.G. Nooteboom

Boekbespreking (Book review): D. Robert Ladd Jr.: The structure of intonational meaning: evidence from English.

In: Phonetica, 38, 1981, pp 357-360.

Papers accepted for publication

MS 410 J. 't Hart

A phonetic approach to intonation: from pitch contours to intonation patterns.

To appear in: Dafydd Gibbon and Helmut Richter (Eds). Pattern, Process and Function in Discourse Phonology, Berlin: de Gruyter.

This contribution to a book on discourse phonology urges theoretical linguists to take into account the results of experimental phonetic research on intonation. It surveys the methods applied in the phonetic approach to intonation, particularly the stylisation method. It also surveys the results already obtained for Dutch and those obtained thus far for British English intonation. A section on the functions of intonation notes that pitch movements clearly play an important part in lending prominence and in marking major syntactic boundaries. However, two problems have not yet been solved: where pitch accents should be located in a sentence, and whether a given major syntactic boundary should be marked intonationally. Even less is known about the possible function of a speaker's choice of a particular intonation pattern. One theoretical implication is that the programming of the pitch contour does not necessarily have to wait until the surface structure is ready. An alternative principle is a hierarchical one. Under this principle the intonation pattern is chosen at a very early stage, and the choice of accent-lending pitch movements is subordinated to the intonation pattern to be realised. Finally, it is claimed that this is a feasible approach for making explicit the intonational competence shared by speakers and listeners, and that its applicability is not restricted to only a few languages.

MS 419 S. Larochelle

A comparison of skilled and novice performance in discontinuous typing.

To appear in: W.E. Cooper (Ed.). Cognitive aspects of skilled typewriting. New York: Springer-Verlag.

Skilled and novice typists were presented with isolated letter strings that they were asked to transcribe, on cue, as fast and as accurately as possible. Two aspects of performance were analysed: the time required to type the first letter of a string (the latency), and the time interval between each of the following keystrokes (the inter-stroke intervals). Both the latencies and the average inter-stroke intervals increased with the length of the strings. With novice typists, the increase in latency and inter-stroke interval was much more pronounced when the stimuli were nonword letter strings than when they were words. By contrast, the performance of skilled typists remained the same as long as the nonwords preserved the same digraph composition as the words. Sequences of keystrokes that involved the fingers of both hands (2H) produced longer latencies but shorter inter-stroke intervals than sequences typed with the fingers of only one hand (1H). The length and the orthographic composition of the strings did not affect the difference in latency between the 1H and 2H conditions, but did affect the difference in inter-stroke intervals, especially in the case of novice subjects. Similar results were obtained when comparing keystroke transitions involving different fingers of one hand (2F) with transitions involving the repeated use of one finger (1F). To account for these results, it was argued that the orthographic representation of the strings is still active during the execution of the typing response, and that the planning of future keystrokes overlaps with the execution of previous keystrokes to a degree that varies with the skill of the subjects and the orthographic composition of the material.
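The two performance measures defined in this abstract follow directly from keystroke timestamps. The sketch below is illustrative only. The function name `typing_measures` and the trial data are hypothetical, and timestamps are assumed to be in milliseconds, measured on the same clock as the typing cue.

```python
from statistics import mean


def typing_measures(cue_ms, keystroke_ms):
    """Hypothetical helper: return (latency, inter-stroke intervals, mean ISI).

    Latency is the time from the cue to the first keystroke; each
    inter-stroke interval (ISI) is the time between successive keystrokes.
    """
    if not keystroke_ms:
        raise ValueError("trial contains no keystrokes")
    latency = keystroke_ms[0] - cue_ms
    isis = [later - earlier for earlier, later in zip(keystroke_ms, keystroke_ms[1:])]
    mean_isi = mean(isis) if isis else None  # a one-letter string has no ISI
    return latency, isis, mean_isi


# Invented trial: cue at t = 0 ms, four keystrokes for a four-letter string.
latency, isis, mean_isi = typing_measures(0, [420, 560, 690, 840])
print(latency, isis, mean_isi)  # 420 [140, 130, 150] 140
```

Comparing 1H with 2H (or 1F with 2F) transitions would amount to grouping the individual ISIs by which hand or finger produced each pair of keystrokes before averaging.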




MS 422 S.M. Marcus

Speech production and perception - communication between speaker and listener.

To appear in: Sprache: Eingabe und Ausgabe, Chapter 1.

This chapter has given a brief outline of the nature of the speech signal. It has been seen that there is no simple one-to-one relationship between the acoustic signal and the sounds perceived. The various articulatory gestures which result in speech have been presented, together with the vowels and consonants found in Standard German. Comparison of the spectral characteristics of even simple utterances and their formant peaks reveals a complex relationship between acoustics and articulation. This is further complicated by coarticulation between adjacent sounds and by the vowel reduction that occurs in fast speech, as normally used in fluent conversation.

The concept of the phoneme has been introduced as a valuable one for identifying the sounds which serve to distinguish between words. Despite their value for this purpose, there appears to be little evidence that phonemes serve as an intermediate stage as such in the recognition of words. A few recent approaches have shown promise in exploring a direct relationship between the acoustic signal and the perceived word.

Differences between speakers result from differences in the size and shape of their vocal tracts, and in the way in which they use them. Some of this variability may be allowed for by estimating the overall length of the speaker's vocal tract, and there is experimental evidence to suggest that listeners can and do perform such a normalisation very rapidly.

In continuous speech various factors operate to give speech its continuity. Meaning and grammatical constraints play an important part even very early in the recognition of each word, while the overall intonation contour of a sentence gives it perceptual coherence and emphasises important words.
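One textbook way to make "estimating the overall length of the speaker's vocal tract" concrete is the uniform-tube idealisation. In this model, a tube closed at the glottis and open at the lips resonates at F_n = (2n - 1)c / (4L). The sketch below assumes that idealisation and a speed of sound of 350 m/s. It is not the procedure described in the chapter, and real vocal tracts deviate considerably from a uniform tube. All function names are hypothetical.

```python
SPEED_OF_SOUND = 350.0  # m/s; assumed value for warm, humid air in the vocal tract


def tube_formants(length_m, n_formants=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, n_formants + 1)]


def estimate_length(formants_hz):
    """Average the tube length implied by each formant under the uniform-tube model."""
    estimates = [(2 * n - 1) * SPEED_OF_SOUND / (4 * f)
                 for n, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)


def normalise(formants_hz, reference_length_m=0.175):
    """Rescale formants to a reference length (a shorter tract gives higher formants)."""
    scale = estimate_length(formants_hz) / reference_length_m
    return [f * scale for f in formants_hz]


print(tube_formants(0.175))           # neutral vowel, roughly [500, 1500, 2500] Hz
print(normalise([600, 1800, 3000]))   # shorter tract mapped back to roughly [500, 1500, 2500]
```

Uniform scaling by tract length is the simplest possible normaliser. It ignores, for example, the fact that differences in pharynx and mouth size between speakers are not proportional.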

MS 423 S.M. Marcus

The mechanisms of speech production and perception.

To appear in: Sprache: Eingabe und Ausgabe, Chapter 2.

This chapter has presented a brief outline of the mechanisms involved in human speech production and perception. Examination of the brain and its relatively slow mechanisms of neural transmission shows that its speed and complexity of operation must result from a high degree of coordinated parallel activation, the precise nature of which is as yet poorly understood. Turning to the mechanisms of speech production, the resonances of the vocal tract have been considered, and the source-filter model has been presented as a valuable simplification in representing their acoustic characteristics. A simple form of the vocal tract filter has been discussed, in terms of either a serial or a parallel connection of formant resonances. Its modification to include nasal resonances and subglottal aerodynamics has been briefly mentioned.

An examination of the structure of the ear, and in particular of the cochlea, gives some insight into the complex mechanisms through which the ear can simultaneously provide high sensitivity, selectivity, dynamic range and temporal resolution. Changes in the propagation of a travelling wave along the narrow basilar membrane, which divides the cochlea, are able to signal tones of frequencies well above the maximum firing rate of a single neurone.

The final sections have considered perceptual measurements of the characteristics of human audition, and have presented the concept of a set of auditory filters, each with a particular critical bandwidth. The practical applicability of this concept, both in loudness estimation and in speech perception in noise, has been mentioned. Finally, the perception of pitch has been shown to have a complex relation to the acoustic signal, not simply corresponding to the fundamental frequency of the signal.
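The source-filter model with a serial (cascade) connection of formant resonances can be sketched with second-order digital resonators of the kind used in Klatt-style formant synthesisers. This is a minimal illustration, not code from the chapter. The following are all assumptions chosen for the example:

- the 10 kHz sampling rate;
- the formant frequencies and bandwidths;
- the bare impulse-train "glottal" source.

The function names are hypothetical.

```python
import math


def resonator(signal, freq_hz, bw_hz, fs_hz):
    """Second-order digital resonator, y[n] = A*x[n] + B*y[n-1] + C*y[n-2],
    with coefficients chosen so that the gain at 0 Hz is exactly 1."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade_vowel(f0_hz, formants, fs_hz=10000, dur_s=0.05):
    """Filter an impulse-train source through formant resonators connected in series."""
    n = round(fs_hz * dur_s)
    period = round(fs_hz / f0_hz)
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # crude glottal source
    out = source
    for freq, bw in formants:  # serial connection: each resonator filters the previous output
        out = resonator(out, freq, bw, fs_hz)
    return out


# Assumed neutral-vowel formants and bandwidths (Hz), with F0 = 100 Hz.
speech = cascade_vowel(100, [(500, 60), (1500, 90), (2500, 120)])
```

A parallel connection would instead feed the same source to each resonator and sum the outputs with separate gains, which gives independent control over formant amplitudes.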

MS 432 R. Collier

Some physiological and perceptual constraints on tonal systems.

To appear in: Proceedings of the Workshop 'Explanations of Language Universals', Cascais (Portugal), Jan. 1982.


Phonological or phonetic analyses of tonal phenomena in speech give the impression that different languages evince striking similarities in the kind of pitch attributes they exploit in bringing about tonal contrasts. The same holds for the number of categories they discriminate along each attributive dimension. This paper tries to explain these similarities on the basis of limitations of the speech production and perception systems, restricting itself to single pitch movements. The following topics are discussed: (1) declination; (2) speed of pitch movements; (3) size of pitch movements; (4) position of pitch movements with respect to the syllable. In each case some observations are presented, physiological or perceptual explanations are given, and implications for a theory of pitch-movement universals are considered. The physiological and perceptual constraints lead to the prediction that intonation languages select their contrastive pitch movements from an inventory of at most thirty-six movements: two directions, two speeds, three sizes and three timings (2 × 2 × 3 × 3 = 36). The total number of possibilities is expected to be considerably lower due to interactions between the various attributes. For instance, the timing distinction is relevant for fast pitch movements only, and the contrast between small and medium-sized rises may disappear when both are slow.
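The upper bound of thirty-six, and the effect of one of the interactions mentioned, can be checked by enumerating attribute combinations. In the sketch below, the attribute labels (in particular the three timing labels) are hypothetical placeholders. Only the counts come from the abstract. The sketch applies just one interaction: timing is treated as non-contrastive for slow movements. The other interaction mentioned, the possible merger of small and medium slow rises, is left out because the abstract states it only as a possibility.

```python
from itertools import product

# Placeholder labels; only the number of values per attribute is taken from the abstract.
DIRECTIONS = ("rise", "fall")
SPEEDS = ("fast", "slow")
SIZES = ("small", "medium", "large")
TIMINGS = ("early", "mid", "late")  # hypothetical names for the three timings


def inventory(timing_only_when_fast=False):
    """Set of distinct pitch movements under the chosen constraint."""
    moves = set()
    for direction, speed, size, timing in product(DIRECTIONS, SPEEDS, SIZES, TIMINGS):
        if timing_only_when_fast and speed == "slow":
            timing = None  # timing is not contrastive, so all slow variants collapse
        moves.add((direction, speed, size, timing))
    return moves


print(len(inventory()))                            # 36 = 2 * 2 * 3 * 3
print(len(inventory(timing_only_when_fast=True)))  # 24 = 2 * (3 * 3 + 3)
```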

MS 433 J.M.B. Terken

(De-)accentuering in gesproken beschrijvingen [(De-)accentuation in spoken descriptions].

To appear in: Handelingen van het 37e Filologencongres.

Two experiments are summarised investigating factors which affect the presence and absence of pitch accents in spoken descriptions. In the first, highly restrictive experiment, speakers gave short descriptions of changes in a configuration of alphabetic characters, in the form 'the p moves to the right of the k'. The probability of accentuation of referring expressions ('the p', 'the k') appeared to be a function of two factors. The first was the number of successive references to the current referent immediately preceding the current utterance. The second was sentence position: expressions in initial position had a lower probability of being accented than expressions in final position.

In the second, less restrictive experiment, speakers gave listeners instructions in an assembly task. The monologues were analysed so that each instruction concerned one of the elements to be assembled, called the TOPIC of the instruction. The distribution of accents was found to be affected by two factors: prior mention in the instruction, and topicality. Expressions referring to referents which had been mentioned earlier in the current instruction were accented less often than expressions referring to referents which had not been mentioned before in the current instruction. This effect was more extreme for topical expressions than for expressions referring to other referents. It was also noted that there was a strong tendency to use pronouns to refer to the topic after it had been topicalised.

The results were taken to mean that speakers use de-accentuation to signal to the listener that the interpretation of the de-accented expression is immediately available and need not be computed anew. The availability of interpretations is judged with respect to the thematic structure of the discourse. In the absence of a clear thematic structure, simple frequency of mention may be used to assess the availability of interpretations.

MS 435 J. 't Hart

The stylisation method applied to British English intonation.

To appear in: Proceedings Working Group on Intonation of the 13th International Congress of Linguists, Tokyo, 1982.

This paper deals with the work done by de Pijper to show that the stylisation method, as developed for the study of Dutch intonation, can also be fruitfully applied to British English intonation (as he himself reported in APR 14 and 15). The stylisation method is a way of making objective measurements of F0 more easily interpretable, and the way this simplification is brought about is entirely determined by perceptual criteria. In a trial-and-error procedure, an artificial pitch contour consisting of straight-line segments is constructed. With the smallest possible number of such segments, it should be a fair or even an auditorily indistinguishable approximation to the original course of F0. Such an approximation is called a close-copy stylisation.

The perceptual equality between close copies and originals was tested in the first experiment described. This experiment, with 64 native English subjects, compared close copies with resynthesised originals. Although the listeners were sometimes able to hear the very slight differences, they were unable to do so in 87% of the cases.

The next step in the stylisation method is the standardisation of all parameters of the pitch movements. Standardised pitch contours may sound different from the originals from which they are derived, but they should sound acceptable. This was tested in the second experiment, with a variety of different patterns, each presented in five different versions, one of which was made according to the rules for Dutch intonation. The outcome was that standardised contours are no less acceptable than resynthesised originals, and that as soon as incorrect specifications were applied, the contours were significantly less acceptable.
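The idea of approximating F0 with as few straight-line segments as possible can be illustrated numerically. The IPO procedure is perceptual and trial-and-error, so the sketch below swaps in a different, purely numerical technique: a Douglas-Peucker-style recursive split. In this sketch, a fixed tolerance in semitones (an assumption) stands in for the listener's judgement. Douglas-Peucker does not guarantee the minimal number of segments, and it says nothing about perceptual equality. The function names, the 100 Hz semitone reference and the 1-semitone tolerance are all hypothetical.

```python
import math


def hz_to_semitones(f_hz, ref_hz=100.0):
    """Convert frequency to semitones relative to an arbitrary reference (assumed 100 Hz)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def stylise(times_s, f0_hz, tol_st=1.0):
    """Return indices of breakpoints of a straight-line approximation to an F0 track.

    Recursively splits at the sample that deviates most (vertically, in semitones)
    from the chord between the current endpoints, until every sample lies within
    tol_st of its segment. Assumes at least two samples and strictly increasing times.
    """
    st = [hz_to_semitones(f) for f in f0_hz]

    def split(i, j):
        worst, k = 0.0, None
        for m in range(i + 1, j):
            frac = (times_s[m] - times_s[i]) / (times_s[j] - times_s[i])
            line = st[i] + frac * (st[j] - st[i])  # chord value at time m
            dev = abs(st[m] - line)
            if dev > worst:
                worst, k = dev, m
        if k is None or worst <= tol_st:
            return [i, j]
        return split(i, k)[:-1] + split(k, j)  # drop the shared breakpoint once

    return split(0, len(st) - 1)


# Invented rise-fall contour, linear in semitones: breakpoints at start, peak and end.
times = [i * 0.05 for i in range(11)]
f0 = [100 * 2 ** (s / 12) for s in [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]]
print(stylise(times, f0))  # [0, 5, 10]
```

Working in semitones rather than hertz follows the common practice of treating pitch-movement size on a logarithmic scale. The standardisation step described above would then replace the fitted slopes and excursions with fixed values.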

MS 442 M. Florentine and A.J.M. Houtsma

Tuning curves and pitch matches in a listener with a unilateral, low-frequency hearing loss.

To appear in: Journal of the Acoustical Society of America.

Psychoacoustical tuning curves and interaural pitch matches were measured in a listener with a unilateral, moderately severe hearing loss of primarily cochlear origin below 2 kHz. The psychoacoustical tuning curves, measured in a simultaneous-masking paradigm, were obtained at 1 kHz for probe levels of 4.5, 7, and 13 dB SL in the impaired ear, and 7 dB SL in the normal ear. Results show that as the level of the probe increased from 4.5 to 13 dB SL in the impaired ear, (1) the frequency location of the tip of the tuning curve decreased from approximately 2.85 to 2.20 kHz and (2) the lowest level of the masker required to just mask the probe increased from 49 to 83 dB SPL. The tuning curve in the normal ear was comparable to data from other normal listeners. The interaural pitch matches were measured from 0.5 to 6 kHz at 10 dB SL in the impaired ear and approximately 15 to 20 dB SL in the normal ear. Results show reasonable identity matches (e.g., a 500-Hz tone in the impaired ear was matched close to a 500-Hz tone in the normal ear), although variability was significantly greater for pitch matches below 2 kHz. The results are discussed in terms of their implications for models of pitch perception.

MS 445 J.M. Thomassen

Erratum: 'Melodic accent: Experiments and a tentative model' (J. Acoust. Soc. Am. 71, 1596-1605 (1982)).

To appear in: Journal of the Acoustical Society of America.

The theoretical values in Fig. 6 of the subject paper did not correspond to the model as described in that paper; the correct figure is given in the erratum.

MS 446 F.L. van Nes

Review of the book 'Bildschirm am Arbeitsplatz' ('Screens at the workplace'; ed. by K. Nagel, R. Oldenbourg Verlag, München, 1981) for the European Journal of Operational Research.

The collection of articles that makes up this German book is critically reviewed. It contains the official German safety rules for VDU workplaces in the office area, with a very good introduction to this topic.


MS - A.J.M. Houtsma

Pitch salience of various complex sounds.

To appear in: Helmholtz issue of Music Perception.

Pitch salience of a variety of complex sounds was measured through open-set melodic dictation tests using five musically experienced observers. In each trial, the observer's task was to play back all notes of a four-note melody, randomly selected from an eight-note diatonic major scale, on an eight-note keyboard. Data were reduced to two measures. The first, a correlation measure, mostly addresses the degree to which ordinal or contour information is preserved in the sequence of sensations. The second, a percent-correct identification measure, tests the preservation of ratio information. The two measures are in some cases very different. It is proposed that sounds which seem to convey mostly ordinal and little ratio information should not qualify as sounds that evoke true pitch sensations.
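The difference between an ordinal (contour) measure and a percent-correct measure shows up clearly in a toy trial. The sketch below makes several assumptions:

- notes are coded as scale-step indices 0 to 7;
- a Spearman rank correlation stands in for the paper's correlation measure, whose exact definition is not given in the abstract;
- the melody and response are invented for illustration.

All function names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation; assumes equal lengths and that neither sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ranks(xs):
    """Ranks starting at 1; tied values (repeated notes) share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = avg_rank
        start = end + 1
    return result


def spearman(xs, ys):
    """Rank correlation: sensitive only to the ordering (contour) of the notes."""
    return pearson(ranks(xs), ranks(ys))


def percent_correct(presented, played_back):
    """Share of notes reproduced exactly: requires the intervals to be right too."""
    hits = sum(p == r for p, r in zip(presented, played_back))
    return 100.0 * hits / len(presented)


presented = [0, 4, 2, 7]    # invented four-note melody (scale steps)
played_back = [1, 5, 3, 6]  # contour preserved, every note wrong
print(spearman(presented, played_back), percent_correct(presented, played_back))  # 1.0 0.0
```

A sound that yields responses like this one would score perfectly on the ordinal measure and zero on the ratio measure. This is the kind of dissociation the abstract proposes as grounds for denying a true pitch sensation.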


Reprints and preprints of IPO Publications

Single copies of material from this issue of the IPO Annual Progress Report may be made for personal, noncommercial use. Permission to make multiple copies must be obtained from the Institute for Perception Research. Illustrations may be used only with explicit mention of the source.

Requests for reprints or preprints of publications listed above should be addressed to:

Library, Institute for Perception Research

P.O. Box 513

5600 MB Eindhoven

The Netherlands

Colophon

The following persons contributed to the production and distribution of this issue of the IPO Annual Progress Report:

Editor: H.C. Bunt
Design: F.F. Leopold
Illustrations: C.G. Basten
Language correction: A. Smith-Hardy
Typing and distribution: Ms P.J. Evers


Lay-out and printing by the Reproduction and Photography Section of the Eindhoven University of Technology.