a computational theory of writing systems

233
A Computational Theory of Writing Systems

Upload: others

Post on 09-Feb-2022

0 views

Category:

Documents


0 download

TRANSCRIPT

A ComputationalTheory ofWriting Systems

Richard Sproat

A ComputationalTheory ofWriting Systems

AT&T Labs − Research

c�

RichardSproat2000

For Lisa,who is learningto read

Contents

1 ReadingDevices 11.1 Text-to-SpeechConversion:a Brief Introduction. . . . . . . . . . . . 21.2 TheTaskof PronouncingAloud: a Model . . . . . . . . . . . . . . . 5

1.2.1 A simpleexamplefrom Russian . . . . . . . . . . . . . . . . 61.2.2 Formaldefinitions . . . . . . . . . . . . . . . . . . . . . . . 10

1.2.2.1 AVM’ sandAnnotationGraphs . . . . . . . . . . . 101.2.2.2 Definitions . . . . . . . . . . . . . . . . . . . . . . 121.2.2.3 Axioms . . . . . . . . . . . . . . . . . . . . . . . 13

1.2.3 Centralclaimsof thetheory . . . . . . . . . . . . . . . . . . 141.2.3.1 Regularity . . . . . . . . . . . . . . . . . . . . . . 141.2.3.2 Consistency . . . . . . . . . . . . . . . . . . . . . 17

1.2.4 Furtherissues. . . . . . . . . . . . . . . . . . . . . . . . . . 191.2.4.1 Why a constrainedtheoryof writing systems? . . . 191.2.4.2 Orthographyandthe“segmental”assumption . . . 21

1.3 TerminologyandConventions . . . . . . . . . . . . . . . . . . . . . 231.A Overview of FSA’sandFST’s . . . . . . . . . . . . . . . . . . . . . 27

1.A.1 Regularlanguagesandfinite-stateautomata. . . . . . . . . . 271.A.2 Regularrelationsandfinite-statetransducers . . . . . . . . . 28

2 Regularity 332.1 PlanarRegularLanguages . . . . . . . . . . . . . . . . . . . . . . . 342.2 TheLocality Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . 412.3 PlanarArrangements:Examples . . . . . . . . . . . . . . . . . . . . 42

2.3.1 KoreanHankul . . . . . . . . . . . . . . . . . . . . . . . . . 422.3.2 Devanagari . . . . . . . . . . . . . . . . . . . . . . . . . . . 442.3.3 Pahawh Hmong . . . . . . . . . . . . . . . . . . . . . . . . . 462.3.4 Chinese. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472.3.5 A counterexamplefrom AncientEgyptian . . . . . . . . . . . 52

2.4 Cross-Writing-SystemVariationin theSLU . . . . . . . . . . . . . . 532.5 MacroscopicCatenation:Text Direction . . . . . . . . . . . . . . . . 562.A ChineseCharacters. . . . . . . . . . . . . . . . . . . . . . . . . . . 61

v

vi CONTENTS

3 ORL Depth and Consistency 673.1 RussianandBelarusianOrthography. . . . . . . . . . . . . . . . . . 68

3.1.1 Vowel reduction . . . . . . . . . . . . . . . . . . . . . . . . 683.1.2 Regressivepalatalization. . . . . . . . . . . . . . . . . . . . 733.1.3 Lexical markingin Russian . . . . . . . . . . . . . . . . . . 753.1.4 Summaryof RussianandBelarusian. . . . . . . . . . . . . . 77

3.2 English . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783.3 Serbo-CroatianDevoicing . . . . . . . . . . . . . . . . . . . . . . . 86

3.3.1 Methodsandmaterials . . . . . . . . . . . . . . . . . . . . . 873.3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

3.4 Cyclicity in Orthography . . . . . . . . . . . . . . . . . . . . . . . . 913.5 SurfaceOrthographicConstraints. . . . . . . . . . . . . . . . . . . . 933.A EnglishDeepandShallow ORL’s . . . . . . . . . . . . . . . . . . . . 96

3.A.1 Lexical representations. . . . . . . . . . . . . . . . . . . . . 963.A.2 Rulesfor thedeepORL . . . . . . . . . . . . . . . . . . . . 1223.A.3 Rulesfor theshallow ORL . . . . . . . . . . . . . . . . . . . 125

4 Linguistic Elements 1274.1 Taxonomies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

4.1.1 Gelb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1284.1.2 Sampson . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1284.1.3 DeFrancis. . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

4.1.3.1 No full writing systemis semasiographic. . . . . . 1314.1.3.2 All full writing is phonographic. . . . . . . . . . . 1314.1.3.3 Hankulis not featural . . . . . . . . . . . . . . . . 132

4.1.4 A new proposal. . . . . . . . . . . . . . . . . . . . . . . . . 1354.1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

4.2 ChineseWriting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1394.3 JapaneseWriting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1464.4 SomeFurtherExamples. . . . . . . . . . . . . . . . . . . . . . . . . 151

4.4.1 Syriacsyame . . . . . . . . . . . . . . . . . . . . . . . . . . 1514.4.2 Reduplicationmarkers . . . . . . . . . . . . . . . . . . . . . 1524.4.3 Cancellationsigns . . . . . . . . . . . . . . . . . . . . . . . 153

5 PsycholinguisticEvidence 1575.1 OrthographicDepth . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

5.1.1 Evidencefor theOrthographicDepthHypothesis . . . . . . . 1625.1.2 EvidenceagainsttheOrthographicDepthHypothesis. . . . . 163

5.2 “Shallow” Processingin “Deep” Orthographies. . . . . . . . . . . . 1645.2.1 Phonologicalaccessin Chinese . . . . . . . . . . . . . . . . 1655.2.2 Phonologicalaccessin Japanese. . . . . . . . . . . . . . . . 1665.2.3 Evidencefor thefunctionof phoneticcomponentsin Chinese 1685.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

5.3 ConnectionistApproaches . . . . . . . . . . . . . . . . . . . . . . . 1705.3.1 Outlineof themodel . . . . . . . . . . . . . . . . . . . . . . 171

CONTENTS vii

5.3.2 Whatis wrongwith themodel . . . . . . . . . . . . . . . . . 1745.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

6 Further Issues 1796.1 Adaptationof Writing Systems. . . . . . . . . . . . . . . . . . . . . 1806.2 OrthographicReforms . . . . . . . . . . . . . . . . . . . . . . . . . 186

6.2.1 The1954spellingrules . . . . . . . . . . . . . . . . . . . . . 1866.2.2 The1995spellingrules . . . . . . . . . . . . . . . . . . . . . 188

6.3 NumericalNotation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1906.4 AbbreviatoryDevices . . . . . . . . . . . . . . . . . . . . . . . . . . 1966.5 Non-BloomfieldianViewson Writing . . . . . . . . . . . . . . . . . 2016.6 Postscript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

viii CONTENTS

List of Figures

1.1 A partial linguistic representationfor thesentencein (1.1). Shown area pho-netic transcription,a prosodicanalysisinto two intonationalphrases( � ) andoneutterance(U), accentassignment(*), a setof part of speechtags,andasimplephrase-structureanalysis.PhoneticsymbolsareIPA. Note that ‘MP’means‘measurephrase’.. . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 A simple FST implementingthe rewrite rules in (1.5). In this examplethemachinehasa singlestate(0), which is both an initial anda final state.Thelabelson theindividual arcsconsistof an input label (to theleft of thecolon)and an output label (to the right). Here, capital Romanlettersare usedtorepresenttheequivalentCyrillic letters.. . . . . . . . . . . . . . . . . . . 15

1.3 An acceptorfor ����� . Theheavy-circledstate(0) is (conventionally)theinitialstate,andthedouble-circledstateis thefinal state. . . . . . . . . . . . . . 28

1.4 An FSTthatacceptsa:c (b:d)� . . . . . . . . . . . . . . . . . . . . . . . 30

1.5 Threetransducers,where������ ������ . . . . . . . . . . . . . . . . . . 31

2.1 ��������������� ����! �"���$#%�����&����'(� . . . . . . . . . . . . . . . . . . . . . . . . 35

2.2 Anotherfiguredescribedby ���������)����� ��� ! �"���$#%�������*��'+�-, . . . . . . . . . . 35

2.3 Thefive planarconcatenationoperations:

(a) �����������.��� ��� ; (b) �������0/��1�*� ��� ; (c) ��������! ����� ��� ; (d) �������&2 ����� �*� ; (e)�������435��� �*� . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.4 A 2FSAthatrecognizesFigure2.2.Thelabels“R” and“D” onthearcsdenotereadingdirection; “left” on state0 (the initial state)denotesthe position atwhich scanningbegins. . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.5 A 2FSTthatmapstheexpression���*#6' to ������� � �87 ��� ����! �*7 ���$#+� � ������'(�:9)9 . . 41

2.6 Thesyllables ; mos< /mo/ ‘cannot’,and ; cal< /cal/ ‘well’ in Hankul. . . . 44

2.7 A two-wordMayanglyphrepresentingtheadjectivesequenceyaxch’ul ‘greensacred’, after (Macri, 1996, page 179), with the arrangement=?>�@ ���7 ACB4DFEG! � H�I$9 Perconvention, the capitalizedglossrepresentsa logographicele-ment,andthelower caseglossesphonographicelements. . . . . . . . . . . 55

2.8 Threemethodsof wrappingthevirtual tape:(a) standardnon-boustrophedon;(b) boustrophedon;(c) invertedboustrophedon.. . . . . . . . . . . . . . . 59

ix

x LIST OF FIGURES

3.1 Waveformandvoicing profile for oneutteranceof gradski ‘urban’. Theclo-surefor the/d/ is labeledas“dcl”, thevoicing offsetis labeledas“v”, andthestartof the /s/ is labeledas“s”. Thevoicing profile is the third plot from thetop in thesecondwindow, labeledasprob voice. . . . . . . . . . . . . . . 89

3.2 Barplot showing the proportionsof voicing for all samplesof underlying/d/(blackbars),versus/t/ (shadedbars). . . . . . . . . . . . . . . . . . . . . 90

4.1 The taxonomyof Gelb (1963),alongwith examplesof writing systemsthatbelongto eachcase.. . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

4.2 Thetaxonomyof Sampson(1985). . . . . . . . . . . . . . . . . . . . . . 1304.3 Featuralrepresentationof KoreanHankul, from (Sampson,1985,page124),

Figure19. (Presentedwith permissionof Routledge/StanfordUniversityPress.)1334.4 DeFrancis’classificationof writing systems. . . . . . . . . . . . . . . . . 1344.5 A non-arborealclassificationof writing systems.OnPerso-Aramaic,seeSec-

tion 6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

5.1 A modelof readinga word aloud,simplifiedfrom (BesnerandSmith,1992).. 1605.2 TheSeidenberg andMcClellandmodelof lexical processing,(Seidenberg and

McClelland,1989,page526),Figure1. Usedwith permissionof theAmericanPsychologicalAssociation,Inc.. . . . . . . . . . . . . . . . . . . . . . . 172

5.3 The implementedportion of Seidenberg and McClelland’s model of lexicalprocessing,(Seidenberg andMcClelland, 1989, page527), Figure 2. Usedwith permissionof AmericanPsychologicalAssociation,Inc.. . . . . . . . . 173

5.4 Replicationof the (Seidenberg, McRae,andJared,1988) experiment,from(Seidenberg andMcClelland,1989,page545),Figure19. Usedwith permis-sionof AmericanPsychologicalAssociation,Inc.. . . . . . . . . . . . . . 176

6.1 A numeralfactorizationtransducerfor numbersup to 999. . . . . . . . . . 1936.2 Two-dimensionallayoutin ane-mailsignature . . . . . . . . . . . . . . . 203

List of Tables

2.1 Chinesecharactersillustratingthefivemodesof combinationof semantic(un-derlined)andphoneticcomponents. . . . . . . . . . . . . . . . . . . . . 34

2.2 Full anddiacritic forms for Devanagarivowels, classifiedby the positionofexpressionof thediacritic forms. Thus“after” meansthatthediacritic occursaftertheconsonantcluster, “below”, below it, andsoforth. . . . . . . . . . 45

2.3 Distribution of placementof semanticcomponent(Kang-Xi Radical)among12,728charactersfrom theTaiwanBig5 characterset.. . . . . . . . . . . . 49

2.4 Distribution of placementof semanticcomponent(Kang-Xi Radical)among2,596charactersfrom theTaiwanBig5 characterset. . . . . . . . . . . . . 50

2.5 Linguisticunitscorrespondingto Mayancomplex glyphs,from (Macri, 1996);glossingconventionsfollow thoseof Macri. In somephrasalcasesit is unclearwhetherthe unit in questionis really a constituent:suchcasesareindicatedwith a questionmark. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.1 The ten most frequentlexical markingsfor the shallow ORL in the Englishfragment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

3.2 Thetenmostfrequentlexicalmarkingsfor thedeepORL in theEnglishfragment. 85

4.1 Disyllabic morphemescollectedfrom theROCLING corpus(10million char-acters)and10 million charactersof the United Informaticscorpus. This setconsistsof pairsof charactersoccurringat leasttwice, andwhereeachmem-berof thepair only cooccurswith theother. . . . . . . . . . . . . . . . . 154

4.2 Furtherdisyllabic morphemescollectedfrom the ROCLING corpus(10 mil-lion characters)and10 million charactersof the United Informaticscorpus.This setconsistsotherpairsof charactersthat do not exclusively occurwitheachother, but wherethereis nonethelessa high mutual informationfor thepair. NotethatLV ( J ) indicatesthatthephoneticcomponentin questionoccurs9 out of 38 timesin characterspronouncedwith initial /l/ followed by somevowel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

xi

xii LIST OF TABLES

4.3 A sampleof Japanesekokuji (secondcolumn),with their componentialanal-ysis (third column). Thefirst columnis theentrynumberin Alexander’s list(LehmanandFaust,1951). The fourth columnlists thephonetic,if any. Thefifth columnlists thekun pronunciation,andthesixth columntheon pronun-ciation, if any: in onecase— goza— thereis no kun pronunciation. Thelast threekokuji shown areformedassemantic-phoneticconstructs,with thelast two beingbasedon the kun pronunciation:note that the phoneticcom-ponentof masaalsomeans‘straight’, so it is possiblethat this oneis alsoasemantic-semanticconstruct.. . . . . . . . . . . . . . . . . . . . . . . . 156

6.1 Total counts in Cregeen’s dictionary of words spelledwith initial ; Ch < ,whereC denotesaconsonant,andthenumberandpercentagesof thosewordsthat areminimal pairswith homophonicor close-homophonicwordsspelledwithout the ; h < . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

Preface

Most generalbookson writing systemsarewritten or editedby scholarswho arespe-cialistsin a small subsetof thewriting systemsthat they cover, andwho have devel-opedtheirviewsonwriting in generalbasedontheirown experiencein theirparticularspecializedarea.

Thisbookis different:I cannotclaimto beanexpertonanyparticularwriting sys-tem. My interestin writing systemsstemsin part from my interestin text-to-speechsynthesissystems,and in particularthe problemof converting from written text in-to a linguistic representationthat representshow that text would be read. Given thatproblem,it is naturalto inquireabouttheformalnatureof therelationshipbetweenthewrittenform, andthelinguisticrepresentationthatthewrittenform encodes:Whatlin-guisticelementsdowrittensymbolsencode?Do writing systemsdiffer in theabstract-nessof thelinguistic representationencodedby orthography, andif sohow? Whataretheformal constraintson themappingbetweenlinguistic representationandwriting?Someof theseissueshave,of course,beenaddressedelsewhere,thoughusuallyin aninformal fashion.Thisbookis anattemptto answerthesequestionsin thecontext of aformal,computationaltheoryof writing systems.

One point that needsto be madeat the outsetis that this book is not intendedasan introductionto the topic of writing systems.Therearemany excellentbooksthatserve thatpurpose,including(Sampson,1985),(Coulmas,1989)and(DeFrancis,1989).Special mentionmustbegivento thesuperbcollectionin (DanielsandBright,1996),without which thepresentbookwould not have beenpossible.Thus,while Idodiscussaspectsof severalwriting systemsin someamountof detail,therearealsoanumberof writing systemsthatarediscussedin lessdetail.Thereaderunfamiliarwiththegeneralpropertiesof thewriting systemsdiscussedhereis urgedto consultoneofthemany generalintroductionsto thetopic,suchasthosecitedabove.

In preparingthis work, I have benefitedgreatly from discussionswith andcom-mentsfrom a numberof colleagues,listedherein alphabeticalorder:HaraldBaayen,Alan Black,WaylesBrowne,Roy Harris,LeonardKatz,GeorgeKiraz, KazuakiMae-da, Anneke Neijt, Elena Pavlova, Geoffrey Sampson,Chilin Shih, Brian Stowell,RobertThomson,J. MarshallUnger, andJenniferVenditti. I would especiallyliketo thankSteven Bird, who readthroughtwo whole draftsof this work, andgave meextensive commentson both. I alsoacknowledgean anonymousreviewer for Cam-bridge University Press. Portionsof this work were presentedat the University ofArizona,andat CharlesUniversity in Prague,andI thankaudiencestherefor useful

xiii

xiv PREFACE

commentsandquestions.I alsothankJuergenSchroeterfor helpin usinghisrecordingsetupfor theexperimentreportedin Section3.3.

Thetechnicalproductionof thisbookdependedheavily uponseveralfreeor publicdomainresourcesincludingdatabasesandsoftware. I amindebtedto Rick Harbaugh(developerof www.zhongwen.com) for kindly allowing me accessto his dataonChinesecharacterstructure. Several of the moredetailedanalysesin this book, in-cluding the treatmentof Englishin Section3.2 andof Chinesein Section2.3.4wereimplemented,andtheseimplementationsdependeduponthefsmlibrary developedbymy colleaguesat AT&T Labs,Michael Riley, FernandoPereiraandMehryarMohri.Chinesecharacterswereincorporatedinto LATEXusingStephenSimpson’sPMC pack-age;for DevanagariI usedFransVelthuis’s devtex package;Visible Speechfontsaredue to Mark Shoulson. Editing of figuresand graphicswere doneusing VectaportInc.’s idraw, JohnBradley’sxv, andDavor Matic’sbitmap.

Finally I would like to thankmy editorat CambridgeUniversityPress,ChristineBartels,for hersupportfor this project.

Richard SproatFlorhamPark, New Jersey

September1, 1999

Chapter 1

ReadingDevices

Ourstartingpoint for thisstudyof writing systemsis text-to-speechsynthesis— TTS,andmorespecificallythecomputationalproblemof convertingfrom written text intoa linguistic representation.While the connectionbetweenTTS systemson the onehand,andwriting systemson theothermaynot beimmediatelyapparent,a moment’sreflectionwill make it clearthattheproblemto besolvedby a TTSsystem— namelytheconversionof written text into speech— is exactly thesameproblemasa humanreadermustsolve whenpresentedwith a text to be readaloud. And just aswritingsystems,their properties,and the ways in which they encodelinguistic informationareof interestto psycholinguistswho studyhow peopleread,so(in principle)shouldsuchconsiderationsbe of interestto thosewho developTTS technology:at the veryleast,it oughtto beof asmuchinterestas,for example,understandingthephysiologyand acousticsunderlyingspeechproduction,somethingthat early speechsynthesisresearcherssuchasFant(Fant,1960)wereheavily involvedin.1

Sincemy startingpoint is TTS, andsinceI assumethatmostreaderswill not befamiliar with this field, I will start this chapterwith a review of someof the issuesrelevantto thedevelopmentof TTS systems,particularlyasthey relateto theproblemof analyzinginput text. This will be the topic of Section1.1. In Section1.2 I willinformally introduce,by wayof asimpleexample,themodelthatI shallbedevelopingthroughoutthe restof this book. Finally, Section1.3 will introducesomeaspectsoftheformalism,andtheconventionsthatwill beusedthroughoutthisbook.

1It will perhapscomeasno surprisethatTTS researchershave not, in fact,generallybeenoverly inter-estedin writing systems.This is undoubtedlyduein partto therelatively low interestin text-analysisissuesin generalin theTTS literature,at leastascomparedto thehigh level of interestin suchmattersasprosody,intonation,voice quality andsynthesistechniques.It alsois undoubtedlyrelatedto the fact that muchofthework on TTS is driven by ratherpracticalaims(e.g. building a working system),whereanoveractiveinterestin theoriesof writing systemsmightappearto beanunnecessaryluxury.

1

2 CHAPTER1. READING DEVICES

1.1 Text-to-SpeechConversion: a Brief Intr oduction

As notedabove, thetaskof a TTS systemis to convertwritten text into speech.Nor-mally thewritten representationis in theform of anelectronictext — codedin ASCII,ISO, JIS, UNICODE or someotherstandarddependinguponthe languageandsys-tembeingused;this circumventsoneproblemthathumansmustsolve,namelythatofvisually recognizingcharactersprintedon a page.2 Similarly the output is a digitalrepresentationof speech.Betweenthesetwo representationsarenumerousstagesofprocessing,whichit is profitableto classifyinto two broadstages.Thefirst stageis theconversionof thewritten text into aninternallinguistic representation;andthesecondis the conversionfrom that linguistic representationinto speech.The latter consistsof computingvariousphoneticandacousticparameters,includingsegmentalduration,FK (“pitch”) trajectory, propertiesof the outputspeechsuchasspectraltilt or glottalopenquotient,and(in concatenative speechsynthesissystems)selectionof appropri-ateacousticunits,or (in formant-basedsynthesissystems)thegenerationof vocal-tracttransferfunctionsappropriateto theintendedsounds.We will have nothingfurthertosayabouttheseissueshere;thereaderis referredto (Dutoit, 1997)for a goodgeneralintroductionto theseissues,andalso to (Allen, Hunnicutt,andKlatt, 1987;Sproat,1997b)for an overview of how two particularsystems(theMITalk system,andtheBell LabsTTS system)work.

In any TTS systemthe output speechwill be generatedfrom an annotatedlin-guistic representation,which is in turn derived from input text via the first stageofprocessingdefinedabove. How rich a linguistic representationis presumed(and intermsof which linguistic theoriesandassumptionsit is couched)differsfrom systemto system,of course,but we mayat leastassumethatthelinguistic representationwillincludeinformationon the sequenceof soundsto be enunciated(usuallyallophonesof phonemes,but in somesystemswhole syllable-sizedunits), lexical stressor toneinformation, word andphrase-level accentuationand emphasis;and the location ofvariousprosodicboundaries,includingsyllableandprosodicphraseboundaries.Thusfor an input suchasthat in (1.1),we might presumeasa plausible(partial) linguisticrepresentation,therepresentationin Figure1.1.

(1.1) I need2 oz. of Valrhonaand6 anchosfor themole.

In the particularrenditionof the sentencepresumedin Figure 1.1 thereare two in-tonationalphrases(denotedby L ) groupedinto a singleutterance(U). Lexical stressis indicatedby a metrical treedominatingindividual syllables( M ) anddominatedit-self by a prosodicword ( N ); we assumethatproclitics form a prosodicword with thefollowing contentword. Also indicatedare lexical accentsfor the wordsneed, two,ounces, Valrhona, six, anchosandmole.

In orderto producethis representation,or any equallyplausiblerepresentation,forthis sentence,a readermust“reconstruct”a greatdealof linguistic informationthat is

2Of course,it is possibleto hookup a TTS systemto anoptical character recognition (OCR)system;suchsystemshave in fact beenavailable for several yearsin the form of page-readersfor the blind (e.g.Kurzweil’s reader);andtherehasbeenmuchrecentinterestin conversionof FAX into speech,which addsyet a furthercomplication,namelymessyinput.

1.1. TEXT-TO-SPEECHCONVERSION:A BRIEF INTRODUCTION 3

nid tu moleð

I need two oz. of ValrhonaO

and 6 anchos forP

the mole

w s w s w s w

σ σ σ σ σ σ σ σ σ σ σ σ σ σ σ σ

ω ω ω ω ω ω ω ω

w w s w ws w

sss s

*Q

*Q

*Q

*Q

*Q

*Q

VR

Num N P N CnjS

Num N P Det N

ι ι

U

MP PP NP NP

PP

NP

VPR

Pro

ST

*Qa Is ksI

e v e n ef r

σ

eawns z

eeva elron a ∫U

ent oz

NP

Figure1.1: A partial linguistic representationfor thesentencein (1.1). Shown area phonetictranscription,aprosodicanalysisinto two intonationalphrases( � ) andoneutterance(U), accentassignment(*), a setof part of speechtags,anda simplephrase-structureanalysis.PhoneticsymbolsareIPA. Notethat‘MP’ means‘measurephrase’.

4 CHAPTER1. READING DEVICES

simplynotrepresentedin thewrittenform. Naturallyall syntacticinformation,includ-ing both the morphosyntacticpart of speechtagsaswell asphrasestructuremustbecomputed.Somusta greatdealof thephonologicalinformation.So,thesequenceofphoneticsegmentsareonly somewhat indirectly representedin Englishorthography:in somewritten formssuchas2, 6 andoz. they cannotbesaidto berepresentedatall.In the latter casethe linguistic form mustbe reconstructedentirely from thereader’sknowledgeof the language,andoftendependsuponinformationaboutcontext (doesonesayounceor ounces?). In somecasesreadersmayneedto makeeducatedguessesaboutthepronunciationsof somewords,thoughif thesefollow thenormalpronunci-ationconventionsof thelanguagethey will usuallyguesscorrectly:evenreaderswhohad not previously seenthe words anchos or Valrhona could nonethelessprobablyhave guessedthecorrectpronunciation.For mole— in thesenseof a Mexicansauce,andpronounced/ VmoleI/ — thesituationis morecomplex sincethepronunciationheredoesnotfollow standardEnglishconventions:in thiscaseonewouldsimplyhaveto befamiliarwith theword. But thereis of courseanadditionalproblemherein that,asinthecaseof oz., onemustalsodisambiguatethis word,sothatonedoesnot pronounceit asthehomographic/ Vmol/ (e.g.,in thesenseof a speciesof insectivore).

Prosodicphrasingis rarelyrepresented;notethatpunctuationis only partlyusedinthis function(Nunberg, 1995),andin any caseit is by no meansconsistentlyusedinevery casewhereonemight plausiblyfind a prosodicboundary. Lexical accentuationis almostnever indicated.3

Thus, if one is designinga TTS systemthat canhandlearbitrary text in a givenlanguage,it is generallynecessaryfor the systemto possessa large amountof lin-guistic knowledge,including knowledgeaboutthe lexical andphrasalphonologyofthelanguagein question,andat thevery minimuma setof heuristicsfor determiningplausiblelocationsfor accentsandprosodicphraseboundaries(Dutoit, 1997;Sproat,1997b).

If one,furthermore,is developinga TTS systemthat is intendedto be adaptableto morethanonelanguage,thenthereis anadditionalconsideration:not only do thewritten formsof utterancessystematicallyfail to indicatemany aspectsof thespokenforms, but differentwriting systemspresentdifferentsetsof problems.Thus,if onedesignsaTTSsystemwith Europeanlanguagesin mind,onemightreasonablyassume(asmany havedone)thatwordsin theinput text areseparatedby whitespace.But thisassumptionwill fail with writing systemslike thoseof Chinese,Japaneseor Thai,whereword boundariesareneverwritten. (Seethediscussionof variousAsianscriptsin (DanielsandBright, 1996),andsee(Sproatet al., 1996) for a discussionof the

3It is generallytrue that suprasegmentalandprosodicinformation is systematicallyomitted from theorthographiesof a large variety of languages.This is particularlytrue for high level prosodicinformationsuchasprosodicphraseboundaryplacement,andaccentuationandprominence.But it extendsto purelylexically determinedfeaturessuchaslexical tone.Thuswhile somelanguages,suchasThai,VietnameseorNavajo,do indicatelexically distinctive tonein theirorthographies,it seemsto befarmorecommonto omitthis feature:for examplemany orthographiesdevelopedfor tonal languagesof Africa omit marksof tone,thoughit shouldbenotedthatmany of thesescriptsweredevelopedby Europeanmissionarieswho hadnounderstandingof tone: see(Bird, 1999)for a discussionof morerecentlydevelopedAfrican orthographieswheretoneis marked.

A relatedpoint, asGeoffrey Sampsonhasnoted(personalcommunication),is that Latin did not marklengthin vowels(thoughgeminationin consonantswasmarked).

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 5

issuein acomputationalsetting.) Similarly, for many languagesonemayassumethatabbreviationsandnumberscanbe expandedin a “preprocessing”phaseprior to fulllinguisticanalysis.For English(or Chinese)this(almost)worksin thatanabbreviationsuchasoz. hasonly two plausibletranslations,namelyounceor ounces, andin mostcasessomesimpleheuristicsbasedon the context can tell you which one it shouldbe.But asI havediscussedat lengthelsewhere(Sproat,1997b;Sproat,1997a),suchasimpleapproachcannotwork for Russian,wherein orderto decidehow to pronouncea seeminglyinnocuoussequencesuchas5%, oneneedsto determinesuchthingsaswhetherthe percentageexpressionis modifying a following noun(‘a 5% discount’)or not (‘I need5%’). In the former casethe ‘5%’ phraseis an adjective agreeingincase,numberandgenderwith thefollowing noun;in the lattercase,it is a noun,andits casenumberandgenderis determinedby thesyntacticcontext in which it occurs.Thus the expressionWYX[Z(\�]�^�\�_a` 5% skidkab ‘5% discount’, is readas pjat+i-procent+n+ajaskidka(five+Gen-percent+Adj+NomFemdiscount),with anadjectivalform procentnajaagreeingin number, genderandcasewith thefollowing noun. Thesimpleexpression 5%b on its own would bereadas pjat’ procent+ov (five+Nompercent+GenPl),with a nominalprocentov in thegenitiveplural form. If theexamplewere ` 4%b , the word for ‘percent’would have to be in the genitive singular form:cetyreprocent+a(four+Nompercent+GenSg).If the“percentphrase”is governedbyan element,suchasa preposition,requiringan obliquecase,thenthe entirephrase,including the numberandword for ‘percent’,mustappearin thatobliquecase:thusZcWYX5` s5%b ‘with 5%’, is spjat’ju procent+ami(with five+Instrpercent+InstrPl).

Considerationssuchastheseinevitably leadoneto askwhatcommonalitiesthereareamongthediversewritten representationsof language,andwhethera singlecom-putationalmodelcanencompassall systemsthatonemightencounter. A modelof thiskind for TTStext analysis,onethathasbeenappliedto languagesandwriting system-s asdiverseasGerman,Spanish,Russian,Hindi, ChineseandJapaneseis discussedelsewhere(Sproat,1997b;Sproat,1997a). The purposeof this book is to presentacomputationaltheoryof writing systemsthatwasmotivatedby thework on TTS,andthatis at leastto someextentconsistentwith themodelpresentedin thispreviouswork.

1.2 The Task of PronouncingAloud: a Model

We turnnow to sketchingthemodelof therelationbetweenwrittenandlinguistic for-m thatwe will developin this book. As implied by our discussionin the lastsection,we will, at leastinitially beconcernedwith specifyinga computationalmodelwhosetaskis to pronouncetext aloud.Thustheproblemwe startout with is essentiallywhatpsychologistswho studyreadingtermnaming— thepronunciationaloudof a writtenform. This is in principlea differenttaskfrom thetaskof lexical accessvia a writtenform, andfrom thetaskof decidinghow to spella givenlinguistic form. Thecompu-tationalmodelof writing thatwe will proposewill nonethelesshave implicationsfortheseissuesalso: indeeda largeportionof thediscussionin Chapter3 will focusona modelof spellingfor English. We startherewith anexamplethatwill illustratethemodelto bedeveloped.

6 CHAPTER1. READING DEVICES

1.2.1 A simpleexamplefr om Russian

Most literatepeople,even thosewho aremonolingual,arebroadlyawarethat someorthographiesaremore“regular” thanothers;that,for example,Spanishorthographyis highly regular (“written asit sounds”),andthat Englishorthography, on the otherhand,is highly irregular. This naive notionof regularity correspondsroughlyto whatpsychologiststerm orthographic depth. That is, psychologistsoften refer to an or-thographyasdeepif it is not generallypossibleto reconstructthepronunciationof awordby simplylookingatthestringof symbolsandapplyinggeneral“letter-to-sound”rules; see(Frost,Katz, andBentin, 1987;BesnerandSmith, 1992;Katz andFrost,1992;Seidenberg, 1992), inter alia, aswell asthediscussionin Chapter5.4 Thus,in termsof themetaphorof depth,theorthographyof Spanish,is shallower thanthatof English(or Hebrew). With somelegitimacy we canconsiderSpanishandEnglishasbeingneartwo endsof a spectrumof possibleorthographicdepths.

Russianfalls somewherein betweenthesetwo extremes:it is not nearlyasirreg-ular asEnglish,but at the sametime it is not possibleto do asonecanin Spanish,andpredictthepronunciationof a word purelyby looking at theorthographicstring.Russianorthographyis oftendescribedasmorphological (Cubberley, 1996,page352),meaningthat thespellingsystemattemptsto representmorphologicallyrelatedformsconsistently, abstractingaway from at leastsomephonologicalchanges.As a corol-lary, a readerof Russianneedsaccessto this morphologicalinformation in order topronouncewordscorrectly.

To seewhat is meantby this, considerthe problemof pronouncinga particularletter string, say d4egf�e"^�_h` gorodab . As it happens,this can representoneof twolexical forms in standardRussian:‘of a city’ (city+gen.sg.),in which caseit is pro-nouncedwith initial stress/ Vgori dj /; or ‘cities’ (city+pl.nom./acc.),in which caseitis pronouncedwith final stress/gi r j4Vda/. Thefact that therearetwo possiblepronun-ciationsfor the string d4egf�e"^�_k` gorodab , shows immediatelythat it is not possibleto pronouncethis stringmerelyby looking at the sequenceof letters:onemusthaveaccessto lexical information,andin this caseonepresumablyneedsaccessto someinformationaboutthecontext in whichthewordoccurs,sincethereaderneedsto deter-minewhetherthegenitivesingularor plural nominative/accusativeis themoreappro-priateinterpretation.5 Not surprisingly, giventhehigh degreeof lexical competenceneededto be able to assignlexical stressin Russianwords, pedagogicalgrammarsof Russianroutinelymarkstressplacement.Thusthegenitivesingularform would bewritten d ´egf�e"^�_8` gorodab , whereasthegenitiveplural form wouldbewritten d"egf�e"^ ´_` gorodab . But suchmarksof stressarerarelyusedin non-pedagogicalcontexts. Innot markingstress,Russianorthographythusfails to mark informationthat is impor-tantfor gettingthereadingcorrect;to usea termsuggestedto meby AnnekeNeijt, itscoverageof thephonologicalinformationis incomplete.

But Russianorthography, in additionto its incompletecoverage,is alsorelatively

4An alternative termto orthographicdepth, namelyorthographictransparency, is gainingsomecurren-cy; LeonardKatz,personalcommunication.

5Notealsothat this particularambiguitybetweengenitive singularandplural nominative/accusative —with concomitantshift in lexical stress— is by no meansgeneralin Russian:only a subsetof nounsshowthis particularambiguity, thoughothercasesof stress-relatedminimalpairsarerife in thelanguage.

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 7

“deep” in thattherearestress-relatedvowel reductionsthatarenot markedin Russianorthography:note for example,that the first /o/ in d4egf�e"^�_l` gorodab shows up as/o/ whenstressed,asin the genitive singularform; but as/ i / whendestressed(morecorrectly, whenin the syllableantepenultimateto the stress(Wade,1992)),asin thenominative/accusativeplural. Thesealternationsarequiteregularandpredictable,butthey arenevermarkedin theorthography, whichmeansthatRussianorthographyrep-resentsa level that is somewhatmoreabstractthana surfacephonemiclevel. As weshallseein Section3.1,thestandardorthographyfor Belarusiandoesorthographicallyrepresentthesevowel reductions,and is thereforesomewhat moreshallow than theorthographyof standardRussian. (Belarusianis like Russianin termsof coverage,though,in thatit too fails to markstressin theorthography.)

Beforeweproceedfurther, weneedto definea little morepreciselywhatwemeanwhenwe speakof anorthographicobjectrepresentinga linguistic object.Let usstartwith what I take to be a fairly uncontroversial(partial) representationof the genitivesingularform goroda ‘of a city’, namelythe Attribute-ValueMatrix (AVM) in (1.2).(On theuseof AVM’ s in phonologicalrepresentationssee,inter alia, (Bird andKlein,1994;MastroianniandCarpenter, 1994;Bird, 1995).)

(1.2) mnnnnnnnnoPHON p�qgV r+s%r%t"uGvSYNSEM

mnnnnnoCAT w�r+x�wGEN yluGz({CASE qG|(wNUM z6}~w*qSEM {�}����

�)�������)���������

Firstof all, a few commentson(1.2).Theprimarystressonthefirst syllableis indicat-edherewith thestandarddiacritic ‘ V ’, ratherthanby anexplicit hierarchicalprosodicstructurewithin theAVM; this is purelya notationalconvenience.For similar reasonsof notationalconvenience,thephonologicalrepresentationis given,in thisexample,asa list of segments,with no indicationof higherlevel prosodicstructure,suchassylla-blesor feet. (Indeed,wearetakingsomeamountof liberty by evenallowing segmentsinto our ontology, giventhegrowing bodyof phonologicalwork thatviews segmentsasepiphenomenaof temporallyoverlappingcollectionsof features.We returnto thispoint lateron.) Also worthy of noteis thefact that thesegmentalrepresentationpre-sentedis what traditionallywould betermeda relatively “deep” representation,sinceit abstractsaway from variouslow-level phonologicalprocesses,suchas the vowelreductionswe havediscussed;this is intentional,sinceI shallarguethatit is this deepphonologicallevel thatis representedby theorthographyof Russian.Finally, therep-resentationin (1.2) fails to indicatethatgoroda is morphlogicallycomplex, arguablyconsistingof a stemgorod- andan inflectionalaffix -a. Perhapssurprisingly, I willhave relatively little to sayaboutmorphologyin this book,thoughI will returnbrieflyin Section3.4 to therelationbetweenorthographyandmorphologicalstructure.

Wheredoesorthographyfit into (1.2)? An obvious first cut at a representation

8 CHAPTER1. READING DEVICES

would besimply to assumeanotherattribute“ORTH” with anassociatedlist of ortho-graphicelements.

(1.3) mnnnnnnnnnnoPHON p�qgV r+s%r%t"uGvORTH p$d4egf�e"^�_gvSYNSEM

mnnnnnoCAT w�r+x�wGEN yluGz({CASE qG|(wNUM z6}~w*qSEM {�}����

�)�������)�����������

But this representationis inadequatefor several reasons.First of all, while it repre-sentsthefact that d4egf�e"^�_�` gorodab is theorthographicrepresentationof goroda, itfails to indicatethe obvious fact that the individual lettersof the orthographicrepre-sentationeachcorrespondto a particularlinguistic unit, in this casea segment: thusdh` g b clearly represents/g/, and e�` o b clearly represents/o/. Second,it fails torepresentthe kind of relationbetween(in this case)the phonologicalportionsof therepresentationandtheorthographicportion. It seemsreasonableto view this relationasoneof licensing, whereparticular(setsof) linguistic elementslicensethe occur-renceof (setsof) orthographicelements.Thus/g/ licensestheoccurrenceof d�` g b inthis example.Third, andfinally, by presentingthevalueof ORTH asanordered list,weareredundantlyspecifyinginformationthatis specifiedelsewherein theAVM: thephonologicalsegmentsareorderedwith respectto oneanother, andthelinearorderingof thelicensedorthographicelementsoughtto follow in somefashionfrom that.

Theseconsiderationsleadus to propose,instead,the representationin (1.4). Werepresentlicensingusingnumericalcoindexation, wherethe index of the licenserismarkedwith an asterisk.Thevaluefor ORTH is itself an unorderedlist of objects:we indicatethis usingthestandardcurly-bracenotationfor sets.

(1.4) mnnnnnnnnnnoPHON p�q���%V r%���%s%���"r+���"t4�~�Yug�~�"vORTH �6d �(� e �+� f �Y� e ��� ^ ��� _ �%�SYNSEM

mnnnnnoCAT w�r+x�wGEN yluGz({CASE qG|(wNUM z6}~w*qSEM {�}����

�)�������)�����������

As we have seen,we have assumeda relatively abstractphonologicalrepresenta-tion in theRussianexamplethatwe have beendiscussing.In generalwe will assumethat theorthographyof a languagerepresentsa particularlinguistic level of represen-tation. For phonologicalinformationthat is orthographicallyencodedwe canspeak

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 9

of this level asbeingrelatively “deep” comparedto a “surfacephonemic”represen-tation; or relatively shallow. We will term the linguistic level representedby the or-thographyof a languagetheOrthographicallyRelevantLevel — ORL.6 Notethatweare not claiming thatevery symbolin thespellingof a word necessarilyhasa (non-orthographic)linguistic counterpartat the ORL: so aswe shall arguein Section3.2,many aspectsof the spellingsof words in Englisharearbitraryandsimply mustbelistedaspartof theword’s spelling. Nonethelesseven in anorthographyasirregularasthatof Englishthereareregularcorrespondencesbetweenlinguistic elementsandtheirorthographicexpression:theORL is simply thatlinguistic level of representationat which thoseregular correspondencesaremostsuccinctlystated.Note that for ex-positoryreasonswewill typically presentastheORL just thatportionof thelinguisticrepresentationthat is relevant to the particularorthographicphenomenonunderdis-cussion.Thus,for mostpurelyphonographicscripts,informationassociatedwith theSYNSEMportionof theAVM is not typically relevant(thoughin somecasesit mightbe, asfor examplein Germanwherecapitalizationis sensitive to whetheror not theword is a noun). In suchcasestheSYNSEMinformationwould beomittedfrom therepresentation:it shouldbeunderstood,however, that the informationis still present,just not germaneto thediscussionat hand.

Returningto (1.4),wenotethatthereis still someredundancy thatcanberemoved.Russianorthographyis largely regular in thesensethata given(abstract)phonemeistypically only spelledin oneway. This in turn implies that we shouldnot needtoexplicitly list theorthographicelementsin theAVM; indeedin theexamplein (1.4)allof thelettersarecompletelypredictable,andcouldbederivedvia asetof rewrite rulesasfollows:

(1.5) g � d�` g bo � e�` o br � f�` r bd � ^l` d ba � _�` ab

Suchrulescanbe viewedasfilling positionsin the orthographyportion of the AVMandhencelicensingthe materialin thosepositions. Of courseeven in fairly regularspellingsystems— andcertainlyin complex systemssuchasEnglish— somelexicalspecificationof spellingis necessary. This canbehandledeitherby simply listing theirregular spelling,or elseby a lexically specificspellingrule. Thus for the Englishwordknit, for instance,wemightassumealexical representationasin (1.6a),or elsearule asin (1.6b),in eithercasespecifyingthespellingof /n/ as ` kn b ; weassumethattheremaining/it/ is regularlyspelled:

6This level is roughlyequivalentto whatI have referredto asthemorphologically motivatedannotation(MMA) in previouswork on text-analysisfor TTS (Sproat,1997b;Sproat,1997a).

10 CHAPTER1. READING DEVICES

(1.6) (a) mnnnno PHON p�w �:� I ��vORTH �%�gw � �SYNSEM � CAT �4|(s��

SEM �gw�}~���� �����

(b) n � ` kn b in knit

As we will discussfurther, we will follow Nunn(1998)in assumingthatrulesareusednot only in the initial graphemic licensingphasethatwe have beendiscussing,but alsoin asubsequentphaseof whatNunntermsautonomousspellingrules. Wewillexpandher notion of autonomousspellingrule to includewhat we will term surfaceorthographicconstraints; seeSection3.5.

1.2.2 Formal definitions

In this sectionwe expandthe formalismfurther, and introducesomeadditionalfor-mal notations,aswell assomeaxiomsthat control the mappingbetweenlinguisticinformationandorthography. We will alsointroducethecentralthesesof this study.

1.2.2.1 AVM’ sand Annotation Graphs

Let usreturnto theAVM representationfrom (1.4),repeatedhereas(1.7):

(1.7) mnnnnnnnnnnoPHON p�q� � V r%� � s%� � r+� � t4� � ug� � vORTH �6d � � e � � f � � e � � ^ � � _ � �SYNSEM

mnnnnnoCAT w�r+x�wGEN yluGz({CASE qG|(wNUM z6}~w*qSEM {�}����

� �������)�����������

In the Russianexample,orthographicelementsare licensedpurely by phonologicalelements.In partlylogographicwriting systemslike Chinese,weproposethatpartof acomplex glyphmaybelicensedbyaportionof theSYNSEMpartof therepresentation.Thusconsiderthecharacter��` INSECT+CHAN b chan ‘cicada’— seeSection1.3fora detaileddiscussionof our conventionsfor glossingChinesecharacters— wheretheINSECT component  (lefthandportionof thecharacter)is theso-called “semanticradical” andthe righthandcomponent¡ chan cuesthe pronunciation.For this casewe assumean AVM as in (1.8), wherethe INSECT portion is licensedby the SEMentry, andthephonologicalportionis licensedby thesyllable:

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 11

(1.8) mnnnnnnnnnoPHON

mno SYL

mo SEG ¢�£ ONS {¥¤§¦ £ RIME u"w*¦©¨TONE ª �� � �� � �

SYNSEM � CAT w�r+x�wSEM {�}~{¥u4t4ug�~�+�

ORTH ��  � � ¡ � �� ����������

An equivalentrepresentationthatwewill use— andwhichwill moredirectly formthebasisfor ouraxioms— is theannotationgraph; see(Bird andLiberman,1999)andalso(Bird, 1995).The annotationgraphsin (1.9)and(1.10),areequivalent(omittingsomedetail)to theAVM’ s in (1.7)and(1.8),respectively.

(1.9)

SEM: city

PHON: g : d o : e r : f o : e d : ^ a : _(1.10)

SEM: cicada:  TONE: 2SYL: M : ¡ONS-RIME: ch an

Therepresentationsof theannotationgraphsin (1.9)and(1.10)areto beinterpret-edasfollows. First of all theannotationssuchas“SEM”, “SYL”, andso forth in thelefthandcolumnmarkarc-sequencesthatencodevaluesof thethus-namedattribute(s)in the correspondingAVM. Thusin (1.10), for instance,the SEM arc representsthevaluecicadafor theattributeSEM.Second,theverticalmarksindicateverticesof thegraphoutof whichthehorizontalarcsemanate.Theverticesareassumedto betempo-rally anchored,with verticesontheleft precedingverticesontheright. Thusthesourcevertex of theONSarclabeledch in (1.10)— z(r+x�s%{�|g«�{¥¤�¬ — precedesits destinationvertex ( t4|%z6��«�{¥¤�¬ ); it alsoprecedesthedestinationvertex of theSEM arccicada:   .We will denoteprecedencein thestandardfashionwith “ ­ ” so that u�­®� is read“ uprecedes� ”; “ ¯ ” will beusedto mean“precedesor is cotemporaneouswith”; finally“ ° ” and“ ± ” will alsobeusedwith theobviousmeanings.

Setsof arcsthatarein a dominancerelation— i.e. form a graph-basedhierarchyin the senseof Bird andLiberman(1999)— are(vertically) adjacentto each otherand are joined at at leastone vertex. On the other hand,setsof arcs that are notin a dominancerelation are separatedby a blank line. Thesedominancerelations

12 CHAPTER1. READING DEVICES

correspondto relationsof dominancein thecorrespondingAVM. So,in (1.10)theSYLandONS-RIMEarcsequencesarein adominancerelation:thiscorrespondsto thefactthatin theAVM in (1.8),theSYL attributehasanAVM containingtheonsetandrimeAVM’ s, and thusdominatesthe AVM’ s. (Similarly, SYL dominatesTONE, thoughTONEis not in adominancerelationwith ONS-RIME,apointnotwell representedinthegraph.)On theotherhand,SEM is not in a dominancerelationwith SYL. Ratherthe SEM andSYL arcsmerelytemporallyoverlap(seebelow). Finally, we indicatelicensingby placingthe licensed elementon thesamearcasits licenser. Thus‘g: d ’meansthatthephoneme/g/ licensestheletter d�` g b .

1.2.2.2 Definitions

We now statesomedefinitionsandaxiomsover the annotationgraphrepresentationthatwe have just developed.

First of all somedefinitions,startingwith two versionsof temporaloverlap:

Definition 1.1(Overlap) Arc ² overlapsarc ³ ( ² � ³ ) if either:

1. z6r+x�s%{�|g«�²´¬0¯µz(r+x�s%{�|g«$³¶¬ and t4|+z6��«:²�¬0°·z(r+x�s%{�|g«�³¶¬ , or

2. t"|%z6��«�²´¬&±µt4|%z6��«$³¶¬ and z(r+x�s%{�|g«:²�¬0­µt"|%z6��«$³�¬Definition 1.2(CompleteOverlap) Arc ² completelyoverlapsarc ³( ² �¹¸ ³ ) if: z(r+x�s%{�|g«�²´¬0¯·z(r+x�s%{�|g«$³�¬ and t4|%z6��«�²´¬0±¹t4|%z¥��«�³¶¬Note thatwhile overlapis symmetric,completeoverlapis not. (Note thatwe usethesymbol“

�” for overlap,ratherthanthemorenormal º : this lattersymbolis usedhere

for composition.)Following Bird andLiberman’snotionof graph-basedhierarchy, we defineimme-

diatedominancebothin termsof thegraph,andin termsof thetypesof arcsinvolved.

Definition 1.3(Immediate Dominance) Arc ² immediately dominates arc ³( ²»b½¼¿¾�ÀÁ³ ) if ² �¹¸ ³ andthetypeof ³ is (a list elementof) a valueof anattributeinAVM’sof type ² .

ThusaSYL arcthatcompletelyoverlapsanONSarcwould immediatelydominatetheONSarcassumingin theassociatedAVM theSYL AVM hasanattribute(e.g. SEG),whosevalueis a list containingtheAVM for ONS;cf. (1.8). On theotherhandSEMwouldnot dominateONS.

We will also needa definition of path-precedenceon arcs,denotinga situationwheretwo arcsjoin at thesamevertex, suchthat the secondimmediatelyfollows onthesecondwithin thesamepaththroughthegraph.

Definition 1.4(Immediate Path-Precedence)Arc ² immediatelypath-precedesarc³ ( ²»­&Â8³ ) if t"|%z6��«�²´¬ is identicalto z(r+x�s+{¥|4«�³¶¬ .

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 13

1.2.2.3 Axioms

This sectionintroducesthe axiomsthat form the coreof the theory that we will bedefending.Beforewe do that, we will formalizea few ideassomewhat further. Wehavealreadyintroducedthenotionof OrthographicallyRelevantLevel (ORL),asbeingthe level of linguistic representationthat is encodedorthographicallyby a particularwriting system.We will denotethe outputof themappingfrom the ORL to spelling— i.e., thespellingitself — as à . As wehavealreadysaid,we follow Nunn(1998) inassumingthat this mappingcanbe decomposedinto a setof graphicencodingrules,anda setof autonomousspellingrules;again,seeSection3.5. Eachof thesesetsofmappingrulesimplementsarelation(wewill bemorespecificonwhatkindof relationmomentarily),theformerof which we’ll notateas ÄÆÅ�Ç ¸ ¾©¼¿È andthe latteras ÄÊÉ Â È~Ë)Ë .Theentiremapping,which we will denoteas Ä»Ì�Í�ÎÏ.Ð is simply thecompositionofthesetwo relations: Ļ̶Í�ÎÏ1ÐÒÑÓÄ»Å¶Ç ¸ ¾©¼¿È�º0ÄÆÉ Â È~ËÔË .

Wewill usetheexpressionÕÖ«�²´¬ to denotethe imageof linguisticelement² underĻ̶Í�ÎÏ1Ð .Theaxiomsmakeuseto two furtherconcepts.Thefirst is thenotionof catenation.

Informally, ² catenateswith ³ , denoted²�×¥³ if ² is adjacentto ³ . Themostfamiliarnotion of catenationis the string-basednotion of concatenationin formal languagetheory(Harrison,1978;HopcroftandUllman, 1979;Lewis andPapadimitriou,1981)where ²Ø×�³ constructsastringby concatenating² with ³ , in thatorder. In Chapter2,wewill generalizethisnotionto planar(two-dimensional)catenation.In thediscussionin this section,for simplicity’s sake, we will assumewhatwe shall later term left-to-right catenation, denoted

Ï × : ² Ï ×8³ simplydenotesastring ²�³ , where² immediatelyprecedes³ .

The secondconceptis the idea that the spell-outof a linguistic sequenceunderĻ̶Í�ÎÏ1Ð maybe lexically specified,asalreadyintroducedabove. We illustratethispoint immediatelyafterthestatementof Axiom 1.1:

Axiom 1.1 If ²»­  ³ thenif Õ´«:²�³�¬ is nototherwisedefined,ÕÖ«�²¶³¶¬�ѵÕÖ«�²´¬×:ÕÖ«$³�¬ . (If² immediatelypath-precedes³ , thenthe image of ²¶³ under Ļ̶Í�ÎÏ1Ð is simplythecatenationof ÕÖ«�²´¬ with ÕÖ«$³�¬ .)Thusin English,thespelloutof thephonemesequence/bo/ would, accordingto Ax-iom 1.1, be ÕÖ«:��¬ Ï ×·ÕÖ«�r�¬ , or ` bob (assumingthe default ways of spelling thosephonemes).On the other hand,lexical specificationmay override this: /ks/ is fre-quentlyspelled x b , preemptingspelloutas ÕÖ«:�§¬ Ï ×½ÕÖ«:z+¬ .

The secondaxiom describesthe mechanismof inheritanceof graphicalspelloutfor acomplex linguisticconstructionthatimmediatelydominatesother(possiblycom-plex) linguistic constructions:

Axiom 1.2 If ²»b½¼¿¾-ÀÁ³ ( ³ possiblya sequence)thenif Õ´«:²�¬ is nototherwisedefined,ÕÖ«�²´¬�Ñ·Õ´«�³¶¬ . (If ² immediatelydominates³ , thentheimage of ² under Ä»Ì�Í�ÎÏ.Ð issimplytheimageof ³ under Ä»Ì�Í�Î�Ï1Ð .)

Thus,for instance,thespelloutof the syllabledominating/kæt/would consistof thespelloutof the onsetdominating/k/ andthe spelloutof the rime dominating/æt/. In

14 CHAPTER1. READING DEVICES

turn,theformerconsiststheof spelloutof /k/, andthelatterthespelloutof thesequence/æt/.

Finally, we introduceAxiom 1.3 which definesthe spelloutof two overlappingelements.Thefunctionalityof thisaxiomwill beillustratedwith datafrom ChineseinSection4.2:

Axiom 1.3 If ² � ³ , then Õ´«:² � ³¶¬ÒÑÙÕ�×"³ . (If ² overlaps ³ , then the image of ²togetherwith ³ under Ä Ì�Í�ÎÏ.Ð is simplytheimageof ² , catenatedwith theimageof³ .)

An importantpoint to noteabouttheseaxiomsis that they do not precluderegu-lar (i.e. non-lexically-specified)context-dependentspellout.For instance,thedefaultspellingof /k/ before ` i b , ` eb or ` y b in Englishis as ` k b , whereasin othercon-texts it is ` c b . Axiom 1.1 merelyrequiresthatwhatever spellsout /k/ catenatewithwhateverspellsout thevowel.

1.2.3 Central claimsof the theory

Wenow cometo thecoreproposalsthatI wishto defendin theremainderof thiswork:Ú Regularity: ThemappingĻ̶Í�ÎÏ1Ð is a regular relation.Ú Consistency: The ORL for a given writing system(as usedfor a particularlanguage)representsa consistentlevelof linguistic representation.

We describetheseclaimsin thenext two sections.Here,andelsewherein this work,I will capitalizethe terms“Regular”, “Regularity”, “Consistent”and“Consistency”whenthey areusedin thesetechnicalsenses,andotherwiselowercasethem.

1.2.3.1 Regularity

The first of the coreproposalsstatesthat Ļ̶Í�ÎÏ1Ð is a regular relationor, equiva-lently, that Ļ̶Í�ÎÏ1Ð canbe implementedasa finite-statetransducer(FST); readersnot familiarwith FST’smaywish to consultAppendix1.A, thougha shortsynopsisisgivenimmediatelybelow.

Our route to the claim of Regularity comesaboutin two ways. First of all, wehaveassumedthatthemappingbetweenlinguistic representationandorthographycanbe handledby context-sensitive rewrite rules, an assumptionthat is held by othersincluding (Venezky, 1970)and(Nunn,1998),andit is onewhich naturally fits wellwith the standardnotion of “spelling rule”. Now, as hasbeenshown in (Johnson,1972;KaplanandKay, 1994), as long ascertain constraintson non-applicationtotheir own outputareobserved,suchrulesareformally equivalentto regularrelations,andcanthereforebeimplementedusingFST’s. Indeed,practicalcompilershavebeenbuilt that compile from rewrite rule representationsinto transducers(KarttunenandBeesley, 1992;KaplanandKay, 1994;Karttunen,1995;Mohri andSproat,1996).

An instanceof an FST — oneimplementingthe simplesetof rulesin (1.5) — isshown in Figure1.2.

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 15

0

g:Go:Or:Rd:Da:A

Figure1.2: A simpleFSTimplementingtherewrite rulesin (1.5). In thisexamplethemachinehasa singlestate(0), which is bothaninitial anda final state.Thelabelson theindividual arcsconsistof aninput label (to theleft of thecolon)andanoutputlabel (to theright). Here,capitalRomanlettersareusedto representtheequivalentCyrillic letters.

16 CHAPTER1. READING DEVICES

Second,Regularity follows from theaxiomsintroducedin Section1.2.2.3.To seethis,considerthateachof theaxiomsstatesthatin ÕÖ«�²¶³¶¬ , composedof ÕÖ«�²´¬ andÕÖ«$³¶¬ ,ÕÖ«�²´¬ is catenatedwith ÕÖ«$³¶¬ . Thedefinitionof regularrelations(seeAppendix1.A.2)statesfirst of all that a mappingbetweena pair of symbolsis a regular relation,andfurthermorethat theconcatenationof two regular relationsis itself a regular relation.It is thereforeeasyto seethatonecanprovideaconstructiveproofwherebyRegularityfollows from the statedaxioms. In onesense,the axiomsprovide a ratherrestrictivenotion of Regularity. Considera writing systemin which a linguistic object ²�³�Û�Üis spelledout as ÕÖ«�²´¬~ÕÖ«�Û�¬~ÕÖ«:Ü"¬~ÕÖ«$³¶¬ : for example,the writing systemmight have the(bizarre)propertythatthesecondphonemeis alwaysspelledoutattheendof theword.This would certainlybe a violation of the axiomsinsofar asthe spelledstring is notformedby concatenatingeither Õ´«:²�¬ or Õ´«:Û�¬ with Õ´«�³¶¬ . However thisexamplecanbehandledby a regularrelationthat in effect mapsa symbol— here ³ — to nothing( Ý )on theoutputside,“remembers”thatit hasseen³ , andthenspellsit outas ÕÖ«$³¶¬ at theendof the string. But such“memory” comesat somecost in finite-statemachinery,sincesucha machinemustrepresentinterveningmaterialmultiple times: in additionto mapping Û to Õ´«:Û�¬ and Ü to ÕÖ«:ÜY¬ , the machinemustalsorememberwhich secondphoneme( ³ ) it hadseen,andtheonly way to do this is to haveseparatepathsthroughtheremainingportionsof the transducer, onepathfor eachphonemethatmight havebeendeleted. Memory in finite-statedevicescanonly be encodedin states:if onewishesto delete³ with a view to inserting Õ´«�³¶¬ lateron, thenonemusthave thearcthatdeletes³ endin astatez%Þ distinctfrom thestatez(ß thatterminatesanarcthat,forinstance,deletesà (insertedlateron as Õ´«�à"¬ ). z%Þ and z(ß would in turn be thesourcestatesfor arcsthatmap Û to ÕÖ«�Û�¬ and Ü to Õ´«�ÜY¬ , andwouldeachhavetheirown privatecopiesof thesearcs. Writing systemsgenerallydo not seemto requirethis kind ofmemory. At first one might think suchcasesarecommon. Consider, for instance,thespellingof English/eI/ as ` aCeb , where‘C’ is aconsonant(make) or sequenceofconsonants(taste). If ` eb is somehow partof thespellingof /eI/, thenthiswouldseemto bea violationof theaxioms.However, it seemsperfectlyreasonableto assumethat/eI/ is in factspelledby ` ab , andthat ` eb is merelyintroducedby rule to “support”thespellingof /eI/ as ` ab in certainenvironments;see(Cummings,1988).

An importantfeatureof regularrelationsis thatthey areclosedundercomposition.Supposewehavetwo regularrelationsá Þ and á ß , andsupposethatthedomainof á Þis (thesetof strings)â , its range� , andsupposefurtherthatthedomainof á ß is � andits rangeis ã . Thenthe compositionof thesetwo relations,denotedá Þ º8á ß is alsoa regularrelationwhosedomainis â andrangeis ã . (Thenotionof compositionhereis exactly thesamenotionasthatof functioncompositionin algebra.)This propertyof closureundercompositionhasanimportantimplication. Sincesinglerewrite rulescanberepresentedcomputationallyasFST’s,onecanalsorepresentanorderedseriesof suchrewrite rulesasa singleFST, by merelycomposingtogethertheFST’s for theindividual rules.

A secondimportantpropertyof regularrelationsandFST’s is thatthey areinvert-ible. That is, by switchingthe input andoutputlabels,oneswitchesthe domainandrangeof arelation.In thecaseathand,if onehasatransducerÄ thatmapsfrom ORLto à , thentheinverseof Ä , denotedÄåä Þ will mapfrom à to theORL. This is clearly

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 17

a usefulpropertysinceit meansthata modelof spellingcanalsoserve (inverted)asa modelof reading— in the limited senseof decodinga linguistic structurefrom awritten text.

In additionto usingregularrelationsandFST’s to implementthemappingbetweentheORLandà , onecanalsoimplementconstraintsusingregularlanguages, andfinite-stateautomata(FSA’s). Finite-stateconstraint-basedsystemshave beenusedwidelyotherareasof linguistic description,suchasphonology(Bird andEllison, 1994),andsyntax(Voutilainen,1994;Mohri, 1994). In writing systems,surfacespellingcon-straintscanbe modeledin this fashion.For instance,if a certainwritten symbol æ isdisallowed in word-final positiononemight write a constraintsuchasthe following(where‘#’ denotesa wordboundary):

(1.11) çhæ�èSeeSection3.5 for somediscussionof real examplesof surfaceorthographiccon-straints.

As we have alreadydiscussed,we follow Nunn(1998)in our assumptionthat therelation à canbedecomposedinto a compositionof thesetof graphicencodingrulesÄ»Å¶Ç ¸ ¾-¼¿È andthe setof autonomousspellingrules ÄÆÉ Â È~ËÔË . At this point we canbemorespecificin our claim: Ä»Å�Ç ¸ ¾©¼¿È and ÄÊÉ Â È~Ë)Ë both implementregular relationsand à is the compositionof thosetwo regular relations: Ä»Å¶Ç ¸ ¾-¼¿È�º½ÄÆÉ Â È~ËÔË . Surfaceorthographicconstraintsareclearly a componentof ÄÆÉ Â È�ËÔË : onecan factor ÄÆÉ Â È�ËÔËinto two components,onethat implementsa mapping ÄÆÉ Â È~ËÔË Àêé  , andthe otherthatimplementsasetof constraintsÄ É Â È�ËÔË ¸ ¾-ÇYë~ì$í . Ä É Â È~Ë)Ë itself is thenjust thecompositionof thesetwo, or moreformally:

(1.12) ÄÊÉ Â È~Ë)Ë�ÑîÄÆÉ Â È~ËÔË À0é  ºaï4t�«:ÄÆÉ Â È~ËÔË ¸ ¾-ÇYë~ì$í ¬Here, ï"t is anoperationthatconvertsanFSA into anequivalentFST, wheretheinputandoutputlabelson eacharcareidentical(KaplanandKay, 1994,page341).

Finally, we have beenimplicitly assumingin this discussiona modelof regularrelationsthat containsa standardstring-based“left-to-right” concatenationoperator.As we have alreadynoted,we will needto extendthenotion of catenationto handlevariousformsof two-dimensionalcombination.We will discussthis in Chapter2.

1.2.3.2 Consistency

In Section1.2.1we introducedthenotionof theOrthographicallyRelevantLevel, andwesuggestedthatdependinguponthewriting system,theORL couldrepresentarela-tively deepor relatively shallow orthographiclevel. Thethesisof Consistency simplystatesthat this level is consistentacrossthe entire vocabulary of the language. Asshouldbeclear, andaswe will discussfurtherbelow, this notionpresumesa classicalderivationalmodelof phonology.

Considera sequenceof phonologicalrules á Þ á ßÖð¥ð6ð á Ç , which appliesin thederivation of every word of somelanguage:we will define ñ to be the input levelto thesequenceof rules.For sucha system,thereare wØòÓó consistentlevelsof repre-sentation,namely ñ itself, and ñ composedwith á1Þ ð6ð¥ð á?ô , }½õöw . TheConsistency

18 CHAPTER1. READING DEVICES

hypothesisrequirestheORL to bepickedfrom oneof theseconsistentlevels } . A vio-lationof Consistency would bea systemwhereoneportionof thevocabulary (e.g.allnouns,or all wordshaving a particularphonologicalstructure)picksa level } , andtheremainderof thevocabularypicksa level ÷ , }êøѹ÷ .

The model describedin the last paragraphcould be expandedto supportmoreintricatenotionsof consistency. For instance,in a Lexical Phonology-basedtheory(Mohanan,1986),insteadof sequencesof rules,wemight think in termsof sequencesof strata.TheORL couldthenbepickedto beeithertheinput level, or elsetheoutputof oneof thestrata.Thiswouldof coursebeamoreconstrainedtheoryof consistency,andprobablyonethatshouldbe favoredover the loosermodelpreviously described.We will not,however, attemptto choosebetweenthesevariantmodelshere.Notethata similar questionwasraisedby Klima (1972),who asked(page67) “which levelsoflinguistic structure. . .arethenmostreadily accessibleto the processof readingandwriting?” (italics original).

An additionalissueis cyclicity. If amorphologicallycomplex word is constructedin a cyclic fashion,might it be the casethatorthographicfeaturesof the morphemesarealsoaddedcyclically? In whatsensethencouldwespeakof orthographymappingto a singlelevel?SeeSection3.4 for furtherdiscussion.

Consistency will beexemplifiedin Chapter3 with acomparisonof RussianandBe-larusianorthographies,aswell asadiscussionof (American)Englishorthography. Wewill alsoexamineanapparentcounterexampleto Consistency from Serbo-Croatian:aswe shall see,Consistency forcesa reanalysisof theSerbo-Croatiandata,which leadsin turnto amoreinsightfuldescriptionof thephenomenonthanthetraditionaldescrip-tion. Evenin quiteregularsystemssuchasRussian,onedoesin factfind caseswheretheorthographywould appearto mapto a deeperor shallower level of representationthanwould beexpectedon thebasisof thepositedORL for theremainderof thevo-cabulary. We shall seesuchexamplesin thediscussionof RussianandBelarusianinChapter3. As longastheexceptionsconstituteasmallminority — asis thecasein theRussianandBelarusianexamplesthatwe shalldiscuss— they canalwaysbehandledby meansof lexical marking, thoughnaturallythis device comesat somecost. Theexamplesin Chapter3 will thusbeseengenerallyto supportConsistency, but we willnecessarilyleave it asa topic for futureresearchto determinewhetherConsistency issupportedmorebroadlyacrosstheworld’swriting systems.

Theassumptionthatorthographymayrepresenta particularlevel — deepor shal-low — of alanguageis implicit in many discussionsof readingin thepsycholinguisticsliterature;it is arguablyimplicit in Venezky’s (1970)classicanalysisof English or-thography;andit is a claimalsomadein TheSoundPatternof English(Chomsky andHalle, 1968),whereEnglishorthographyis describedasa “nearperfect”representa-tion of anunderlyingphonologicalrepresentation.

As wehavealreadynoted,wetakeasthebasisfor Consistency atraditionalderiva-tional modelof phonology. This is surelya controversialmove: naturally it wouldseemdesirablein light of modernnon-derivationaltheoriesof phonologyto castouranalysisin termsof a non-derivationalparadigm. For example,it would be naturalto seekan accountof the phenomenathat we will discussin termsof a monostrataltheorysuchasthoseof (Bird andEllison, 1994;Bird andKlein, 1994;Bird, 1995).

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 19

Similarly, onemight desirean accountin termsof Optimality Theory(PrinceandSmolensky, 1993),wheretheonly linguistic level in effect is theoutputlevel atwhichthe rank-orderedconstraintsareevaluated(at leaston oneversionof the theory). Itis not at presentclear to me how to do this: while I do not doubt,for example,thatananalysisof theRussianandBelarusianfactsthat I will discussin Chapter3 couldbe recastin termsof sucha framework, they seemto be describablemostnaturallywithin a modelwhereinonecanspeakof differentlevels of representation.Oncea-gain,I leave it asa topic for futureresearchto work out analyseswithin morecurrentphonologicalframeworks.

1.2.4 Further issues

In this sectionwe discusstwo issuesthat the currentwork andthe modelpresentedhereinraise. Thefirst issue(Section1.2.4.1)concernsthe following question:giventhatwriting, unlikenaturallanguage,is anartefact,onethat— againunlikenaturallan-guage— mustbeexplicitly taught,why shouldonebelievethataconstrainedmodelofthekind typically appliedto language,wouldapplyto writing? Thesecondissue(Sec-tion 1.2.4.2)relatesto our adoptionof a phonologicalmodelthat includessegments:giventhatmany phonographicwriting systemsareessentially“segmental”(thebasicsymbolsrepresentingsegment-sizedunits),this is certainlya convenientchoice,yet itseemsto fly in thefaceof morerecentmodelsof phonologythateschew segments.

1.2.4.1 Why a constrainedtheory of writing systems?

It mayhaveoccurredto thereaderto wonderwhy aconstrainedtheoryof writing sys-temsshouldhave any chanceof beingcorrect. To be sure,suchmodelshave beenappliedin linguisticswith greatsuccess.But writing is crucially differentfrom otheraspectsof linguisticknowledge.Languageoccursnaturallyin all humancommunities:writing, in contrast,is a technologicaldevelopmentthatwasapparentlyonly indepen-dently inventedfour timesin history (in Egypt,Sumer, ChinaandCentralAmerica)andhasonly beenusedby a minority of languagesandpeoplethroughoutmost ofhistory. With few exceptionsall humanslearnto speak(or sign)at leastonelanguagewithoutany specialinstruction;in contrast,readingandwriting mustbetaughtexplic-itly andin many casestakesyearsof specialinstructionto master. Writing is thereforenot “natural” in thesamesenseaslanguage.

Onemight evengo further thanthis: writing systemsaredevelopedfor particularlanguages,with moreor lesscarebeingtakento ensurethat they reflectthelinguisticpropertiesof the languagein question. Furthermore,at leasta few writing systemshave undergonereformsover theyears,in orderto attemptto bring the systemmorein line with thelanguage(seeSection6.2). Orthography, then,canbethoughtof asakind of practicallinguistic theory.

This latterview hasbeenexpressedperhapsbestby Aronoff (1985)in a paperde-scribingthepunctuationsystemof MasoreticHebrew. Masoreticannotationsevolvedasa way of markingvariousinformationaboutBiblical Hebrew text, in particularin-formation abouthow to pronounce,andaccentor intone the text. The systemwas

20 CHAPTER1. READING DEVICES

basedon diacritics,with annotationsbeingaddedto, but not alteringthecoreconso-nantaltext, which wasconsideredsacred.The systemfor markingvowels survivesasthe (optional)vowel pointsof ModernHebrew. The notationfor accent,which isthe topic of Aronoff ’s discussion,is only usedin the Bible. Aronoff arguesthat theaccentualmarkingsystemin factmarks“a completeunlabeledbinaryphrase-structureanalysisof every verse”of theBible (page28). It thusrepresentstheend-productofconsciouslinguisticanalysis,andthusin effectencodesa linguistic theoryof whatthephrasestructureof Hebrew shouldlook like. Furthermore,like any linguistic theory,theMasoreticannotationsystemcanbeincorrectin thestructuresit presumesfor par-ticularconstructions:indeedAronoff arguesthattheanalysisimplicit in theannotationis in somecasesincorrect.

As Aronoff notes,theMasoreticsystemis quiteunusualin the richnessof linguis-tic structurethat is marked: certainlyno orthographicsystemthat is in wide usehasconventionsfor markingconstituentstructure.(Onemight think of normalpunctua-tion symbolsasmarkingsomelevel of syntacticor phonologicalphrasing,but Nun-berg (1995)effectively arguesagainstthis.) And of coursethe Masoreticsystemisatypicalin anotherrespect:it wasnot anorthographicsystemusedby nativespeakersof a languagefor everydaycommunication,but rathera systemdesignedspecificallyto give preciseguidancein the pronunciationof sacredtexts to non-native speakers(sinceHebrew was,during the relevant period,nobody’s mothertongue). From thatpointof view, thesystemhasmorein commonwith systemsof annotationfor markingscansionin poetrythanit doeswith theorthographicsystemof, say, ModernEnglish.Nonetheless,to the extent that consciouseffort goesinto the designof moretypicalorthographies,Aronoff ’spointsremainvalid. Theseconsiderationswould thusappearto argueagainstapplyingthesamekindsof methodsin thestudyof writing asin thestudyof languagemoregenerally. Therearehoweverat leastacoupleof basicreasonswhy suchpessimismis ill-founded.

Firstly, while writing surelymustbe learned,andwhile writing systemsareoftenconsciouslydesigned,they mustalsobe used,which meansthat to be practicaltheymustbearsomesensiblerelationto thelanguagesthat they represent.Presumablyby“sensible”we imply non-arbitrary, andby “non-arbitrary”we meanthat it shouldbepossibleto stateformal constraints.WhetherConsistency andRegularity, introducedin Section1.2.3arereasonableconstraintsis an empiricalquestion. What is not indoubt,in my view, is thatsomesuchconstraintsmustexist.

Secondly, while orthographicsystemscertainlydependuponthelinguistic knowl-edgeof their creators,influencesin the other directionarealso found. First of all,asWells (1982)notes,theorthographicrepresentationof wordsis often thebasisforspeakers’ consciousbeliefs about their pronunciation:naive English speakers maybelieve thattow andtoearepronounceddifferentlybecausethey arespelleddifferent-ly. Secondly, thereareso-calledspellingpronunciations,suchas/v V Ikt ù ui lz/ (ratherthan/v V It i lz/) for victuals, wherethephonologicalrepresentationof individual wordshasbeenmodifiedover time on the basisof spelling. Thirdly, onealsofinds moresystematiceffectson the phonologyon the basisof written form. Thus, accordingto Serianni(1989),NorthernItalian dialectshistorically lack gemination— termedraddoppiamentoin theItalian literature— bothwithin wordsandacrosswords— the

1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 21

so-calledraddoppiamentosintattico. Cross-wordgeminationis notwrittenin standardItalianorthography, andNortherndialectscontinueto lack raddoppiamentosintattico.However, word internalraddoppiamentoasin the second/m/ of mamma‘mama’, isconsistentlyspelled. As a result, northerndialects,which historically lacked word-internalraddoppiamentonow possessit. Linguistic knowledgeis oftenassumedto bein somesenseprimary, or at leastmorebasicthanorthographicknowledge.Spellingpronunciationsandexampleslike thosein Italian show that in somecasesparticularbitsof linguisticknowledgecanbestbeexplainedon thebasisof orthography. This inturn suggeststhe needto understandthe relationbetweenorthographyandlinguisticstructure,andtheformal constraintson thatrelation.

1.2.4.2 Orthography and the “segmental” assumption

In thediscussionabove,we assumedthatthegraphemesin a segmentalphonographicsystemlike Russianare licensedby phonologicalsegmentsin the traditional sense.In makingthis assumption,we mayseemto be taking two stepsbackwards.Severalstrandsof work in phonologyover the pastdecadeanda half, includingFeatureGe-ometry (Clements,1985;Sagey, 1986),DeclarativePhonology(Coleman,1998), andArticulatory Phonology(BrowmanandGoldstein,1989),haveconverged on thecon-clusionthat segmentsareepiphenomena,the resultof overlappinggestures.Indeed,thereseemsto beawidely accepteddogmathattheverynotionof segmentin Westernphonologicaltraditionderivesfrom segmentalalphabeticwriting.

We shouldnoteat theoutsetthatin onesensethis issueis orthogonalto themodelbeingdevelopedhere. That is, I have chosento representthe licenserof Russiand` g b asthe segment/g/, but I could just aseasilynot have. If we have insteada setof overlappinggestures— e.g. VELAR, ò VOICE, ú CONTINUANT, ú NASAL —eachonits own arcin anannotationgraphrepresentation,thenwecanassumethatthiscollectionof featurestogetherlicensesd.` g b . Oneimplementationof this ideawouldbeto assumethat the timing slot or syllablepositionthat is linkedto theoverlappingsetof featuresis thelicenserof d?` g b , andcanonly licensethisgraphemeby virtueofthecollectionof featuresthatit is associatedwith. Thosewhoarebotheredby my useof segmentalphonologicalrepresentationsareinvited to think of themasa shorthandfor themorearticulatedview I have just sketched.

On theotherhand,theview that thenotion“segment” in phonologyderivesfromsegmentalwriting is overly facile,andshouldnot be uncritically accepted,I believe.Perhapsthe bestarticulatedpresentationof this conceptis a paperby Faber(1992),whereshesetsherselfthe task of explaining the following paradox: The notion ofsegmentis unnatural,andderivesin part from alphabeticwriting: “investigationsoflanguageusesuggestthatmany speakersdo not divide wordsinto phonologicalseg-mentsunlessthey have receivedexplicit instructionin suchsegmentationcomparableto that involved in teachingan alphabeticwriting system”(page111). On the otherhand,alphabeticwriting systemsdoexist. How couldthey havecomeaboutin thefirstplaceif theprinciplesuponwhich they arebasedaresounnatural?

Faber’s answermakesuseof thestandardview thatwhentheGreeksadoptedthePhoenicianscript, they misinterpretedsomeof theconsonantalsymbolsasrepresent-

22 CHAPTER1. READING DEVICES

ing vowels. Thustheuseof alpha to represent/a/ wasa misinterpretationof Phoeni-cian/ û alpa/,representing/ û /. Thismuchis widely accepted,andit thereforeis possiblethat the Greekinventorsof the segmentalalphabetdid not have an a priori notion ofsegment:on thecontrary, they thoughtthey wereborrowing a systemof writing thatrepresentedbothvowelsandconsonants.

A reasonablequestionat this point is why Faberis focussingon theGreekalpha-bet (and its derivatives): after all, thereare many apparentlysegmentalsystemsinthe world, including numerousSouth Asian scriptssuchasDevanagari(seeSec-tion 2.3.2),KoreanHankul(Section2.3.1),andEthiopic(Haile,1996).Someof these,suchastheSouthAsianscriptsmayhave hada Semiticorigin (Salomon,1996),likeGreek— thoughsurely independentlyof Greek. For others,like Hankul, which isa totally endemicKoreaninvention,the external inspiration(if any) for designingasegmentalsystemis unclear(King, 1996). Indeed,evenunvocalizedSemiticwritingsystemscouldbeconsideredsegmental,thoughthey traditionallyomit marksfor vow-els: aswith thenon-representationof lexical stressin Russian(Section1.2.1)we cansaythatthecoverageof traditionalSemiticscriptsis incomplete.

Faberconcentrateson Greekbecauseshetakesa rathernarrow view of the no-tion of “alphabeticwriting”, andit is only “alphabeticwriting”, accordingto Faber,that engendersthe paradoxicalsituationintroducedabove. For her an alphabetis a“segmentallylinear script” that represents“vowels andconsonantsboth asseparateandequal”. The latter requirement,of course,eliminatestraditionalSemiticscriptsfrom consideration:they do not representvowels. “Segmentallylinear” scriptsarescriptswheretheelementsarearrangedin a moreor lesslinear fashion,without anysignificantuseof two-dimensionallayout: only in suchscriptsareall elementson apar with eachother. ThusSouthAsian scriptsandHankul scriptsareeliminated,s-ince in bothcasesthe consonantandvowel symbolsarelaid out in two-dimensional(syllable-sized)chunks(Sections2.3.2and 2.3.1);furthermorein many SouthAsianscripts(thoughnot soclearlyin thecaseof Hankul),thevowelsarefrequentlydiacrit-ic symbolswritten aroundthe consonantalcore,andthusarenot on a par with eachother. If onenarrowsthefield in this fashion,then,it would seemthatsegmentalwrit-ing wasreally only inventedonce,by accident,andwe do not needto attribute any“naturalness”to thenotionof segment.

Still, onemightwonderaboutthejustificationfor thelimitationsthatFaberimpos-es.Why is Devanagariany lesssegmentalthanGreek, justbecauseit happensto rep-resent/e/ asa stroke above thetemporallypreceding/k/, whereasGreekarrangesthesymbolsby left-to-right concatenation?Faber’s point, not surprisingly, is thatscriptslikeDevanagari(or Ethiopic,or Hankul)arrangetheirsegmentalelementsin syllable-sizedchunks(in Chapter2 we will saythat in suchscriptsthe SmallLinguisticUnitis thesyllable),which arethemselveslinearly arranged(“segmentallycoded,syllabi-cally linear”). In otherwords,the syllablehasa specialstatusin suchscriptsthat isseeminglylackingin Greek-derived(“segmentallycoded,segmentallylinear”) scripts.

Now, onecannotdeny theimportanceof thesyllableasanorganizingprincipleinorthographies:we will seeseveral instancesof this in Chapter2, andsyllablesevenshow themselvesto be importantin “segmentallylinear” scripts;seeSection3.5,and(Nunn,1998).But Ethiopic,SouthAsianscripts,Hankulandotherscriptsalsoencode

1.3. TERMINOLOGY AND CONVENTIONS 23

segmentalinformation.This point, it seemsto me,is not nullified by thefact that thescriptsalsoencodesyllabic information. Segmentalsystemshave evolved, or beendeveloped,in avarietyof differentcultures,speakingawidevarietyof languages,andundera variety of differentconditions. The notion “segment” may be an unnaturalepiphenomenon,but if so,thenat leastit is onethatis fairly widespread.

1.3 Terminology and Conventions

Thissectionoutlinestheterminologyandconventionsthatwewill usethroughoutthisbook.

First of all, we will usethe terms“script”, “orthography”and“writing system”,in their conventionalsensesasfollows. A “script” is just a setof distinctmarkscon-ventionallyusedto representthe written form of oneor more languages:crucially,onecanspeakof a scriptwithout implying its usefor a givenlanguage.Thuswe willspeakof the“Romanscript”, or the“Chinesescript”. Ontheotherhand,awriting sys-temis a scriptusedto representa particularlanguage.Thus“writing system”implies“writing systemfor a given language”.7 We will usethe terms“orthography”and“writing system”interchangeably;8 in someof the literature,the term “orthography”implies “standardizedorthography”,suchasthe standardsystemof spellingusedinAmericanEnglish,andthis implicitly excludessystemsof writing thathave not beenstandardized(aswasthecasein, say, ElizabethanEnglish).Thoughwe will primarilybe discussingstandardizedorthographiesin this work, we do not intendthe term tocarrywith it any implicationof standardization.

Thefollowing notationalconventionswill beobserved:Ú Angle bracketswill be usedto encloseorthographicrepresentationsin Romanscript. Note that this will only be the casewhenin the discussionat handthefocus is on the orthographicrepresentation.For examplein a discussionof alinguistic examplecontainingtheword frog, thatword will be italicizedaspernormallinguistic convention,if we aremerelyreferringto the linguistic object(word, morpheme,. . . ) frog. However if we arespecificallyinterestedin thestringof characters‘f ’, ‘r’, ‘o’ and‘g’, thenanglebracketswill beused: ` frog b .Ú Examplesin non-romanscriptswill generallybetransliterated,with thetranslit-erationgivenin angle-brackets.Phonemictranscriptionsandtranslationswill begivenwhererelevant. Inevitably somesinglecharactersof a non-Romanscriptwill needto betransliteratedwith a sequenceof charactersin Romanscript: insuchcases,thesequenceof characterswill beunderlinedin orderto indicatethatit is aunit. For example:(Cyrillic) ü�` ja b .

7One could go further anddefinethe notion of writing systemat a more abstractlevel whereby, forexample,theBraille encodingof theRomanalphabet,asusedfor English,is aninstanceof thesamewritingsystemasis usedin printedEnglish— thoughobviously thescript is quitedifferent. (Actually in ordertomake this connection,onewould have to glossover thefactthatbraille hasvariouslexical andstring-basedabbreviatory conventionsthathave no directcounterpartin standardprint.) Wewill not beconcernedwiththis level of abstractionhere.

8Thoughproperlyanorthographyis reallymerelyonetypeof writing system;see(Mountford,1996).

24 CHAPTER1. READING DEVICES

For scriptsthat run from right-to-left, I will indicatethis by markingthestringof graphemeswith thesymbol‘ ý ’.

For Chinesewriting I will adopta slightly morecomplex strategy, at leastincaseswheretheinternalstructureof Chinesecharactersis underdiscussion.Asmany as97% of Chinesecharacterscanbe analyzedasbeingcomposedof asemanticradicalplusa phoneticcomponent(DeFrancis,1984). In caseswherethis decompositionis feasibleI will “gloss” thecharacterin smallcapitalsthus:` SEMANTIC+PHONETIC b . Here SEMANTIC will be a conventionalterm todescribethe semanticradical in question,and PHONETIC will be a phonetictranscriptionin pinyin of thepronunciationof thephoneticcomponent;moreonthetranscriptionof thephoneticcomponentmomentarily. Following thiswill begivena phonetictranscriptionin pinyin of thewholecharacter, andanEnglishglosswherepossibleandrelevant.

Choosingthe appropriatetranscriptionfor the phoneticcomponentis not ass-traightforwardasit might seem.First of all, many phoneticcomponentshavemorethanonepronunciationasindependentcharacters.For example,thepho-neticcomponentof � chan ‘cicada’, namely ¡ , hastwo independentpronun-ciations,namelydan andchan. Secondly, in a numberof cases,no indepen-dentpronunciationof thephoneticcomponentis particularlysimilar to thepro-nunciationof thesemantic-phoneticcompound,but a significantfractionof thecharactersthat containthat phoneticcomponenthave an identical pronuncia-tion, possiblyignoringtone,to thecharacterof interest.A particularlystrikinginstanceinvolvesthephoneticcomponentþ , which asan independentcharac-ter is pronouncedchou, but asa phoneticcomponent,is alwayspronouncedniu(with varioustones).In sucha case,oneis arguablyjustifiedin transcribingthephoneticcomponentasniu ratherthanchou.

In decidinghow to transcribethephoneticcomponentof acharacterwethereforeadoptthestrategy of finding theclosestmatchbetweenthepronunciationof thesemantic-phoneticcompoundamong:

– theattestedindependentpronunciationsof thephonetic,and

– the pronunciationsof well-populatedsubsetsof thosecharacterssharingthesamephoneticcomponent

In casewe makeuseof thesecondof theseoptions,we indicatetheratioof:

– thenumberof characterslisted in (Wieger, 1965)with thephoneticcom-ponentand thepronunciationof interest,and

– thetotal numberof charactersin Wieger’s lists with thatphoneticcompo-nent.

Wealsolist thepagenumber(s)in Wiegerwhereonecanfind thecharacterswiththatphoneticcomponent.If the tonesdiffer amongthemembersof thesubset,andonly in thatcase,weomit tonemarksfrom thetranscription.

1.3. TERMINOLOGY AND CONVENTIONS 25

For instance, for ÿ huang ‘locust’ the transcription would be` INSECT+HUANG b 9 where huang happensto be the independentpro-nunciationof the phoneticcomponent� ‘emperor’. For � chan ‘cicada’ wetranscribe INSECT+CHAN b , wherechan is oneof the independentpronunci-ationsof ¡ , thoughnot the mostfrequent.For � tı, the phoneticcomponentis � , whoseonly independentpronunciationis shı. However, a significantnumberof characterslistedin (Wieger, 1965)with � asa phoneticcomponenthave thepronunciationtı, andthuswe transcribe� as ` WINE+TI ��� Þ � Â�� � �"b ,meaningthatin nineoutof nineteencharacterswith � asaphoneticcomponent(page498), Wieger lists the pronunciationas tı. This methodof transcription,while surelynot uncontroversial,is at leastreplicable.

In caseswheretheinternalstructureof theChinesecharacterisnotatissue,I willin generaldispensewith thedetailedcharacter-structuregloss,andmerelygivea phonetictransliterationin pinyin and(wherepossibleor relevant)anEnglishgloss.10Ú I will usethetermgraphemeto denotea basicsymbolof a writing system;this,despitethevalid objectionsto theuseof that termoutlinedin (Daniels,1991a;Daniels,1991b). However, notethat Daniels’ objectionsareaimedat the useof the termgrapheme, asan implicit parallelof phoneme: Daniels’ contentionis that thereis no “systematicgraphemics”parallel to a systematicphonemiclevel. I do not wish to contendthis point,andmerelyusethetermgraphemeasaconvenientshortway of saying“basicsymbolof awriting system”.

Note that in discussingsomewriting systemswe may usethe term graphemein slightly differentwaysdependinguponhow fine-grainedananalysisis beingassumed.For instance,it is convenientto refer to a singleChinesecharacterasbeinga graphemein somecontexts: in particular, in theelectroniccodingoftexts it is invariably the casethat singleChinesecharactersconstituteseparatecodes,and thus from the point of view of a computationalsystem(suchasaTTS system),Chinesecharactersare unanalyzablebasicunits. On the otherhand,thereis clearly importantinternalstructurein Chinesecharacters— cf.the semantic+phoneticcompositionof Chinesecharactersalludedto above —andfrom thepoint of view of a finer-grainedanalysisof Chinesewriting, thesesmallerunitswouldcertainlybecalledgraphemes.

I will alsousethetermglyphconventionallyto referto a written symbolwith aparticularshape,independentlyof whetherit correspondsto a singlegraphemeor multiplegraphemes.Thusin my discussionof KoreanHankul(Section2.3.1)I will referto “syllable-sizedglyphs”aswell asconsonantandvowelglyphs;thelattercorrespondto singlegraphemes,whereastheformerarepolygraphemic.Ú Wherethereis unlikely to be confusionI will usethe nameof languageX to

9I.e.   +� .10Note that throughoutthis work, I will usetraditionalChinesecharactersasusedin TaiwanandHong

Kong,andeschew theuseof simplifiedcharactersasusedontheMainlandandSingapore,exceptwherethestructureof suchsimplifiedcharactersis at issue.

26 CHAPTER1. READING DEVICES

denote“the orthographyof languageX”. Thus“Chinese”will denoteChineseorthography, exceptwherethis usageis likely to beconfusing.

1.A. OVERVIEW OF FSA’S AND FST’S 27

1.A Appendix: An Overview of Finite-State Automataand Transducers

In this appendixI give an overview of regular languagesand relations,and theirassociatedcomputationaldevices,finite-stateacceptors(FSA’s) andfinite-statetrans-ducers(FST’s). Thecoveragehereis necessarilybrief, andfor furtherdiscussionothersourcesarerecommended.Finite-stateacceptorsandregularlanguagesarediscussedin any good introductionto the theoryof computation:see,for example(Harrison,1978;HopcroftandUllman, 1979;Lewis andPapadimitriou,1981). There arefew-er introductoryworkson transducers.Onereasonablyaccessiblediscussion(dealingwith transducers)canbefoundin (KaplanandKay, 1994).Onemightalsoconsultthethird chapterof (Sproat,1992)for an in-depthintroductionto the useof finite-statetransducersin computationalphonologyandmorphology. For transducers(aswell asweightedacceptors),thereis a recentpaperby Mohri (1997) that discussesvariousformalpropertiesandalgorithms,andvariousotherrelevantworksarecitedtherein.

1.A.1 Regular languagesand finite-stateautomata

Basic to the theoryof automatais the notion of an alphabetof symbols;the entirealphabetis conventionallydenoted� . Theemptystring is denotedby Ý , which is notanelementof � ; also,theemptystringis distinctfrom theemptyset � . �� denotesthesetof all strings— including Ý — over thealphabet� .

It is usual to definea regular language with a recursive definition suchas thefollowing (modeledon thatof (KaplanandKay, 1994,page338)):

1. � is a regularlanguage

2. For all symbolsu������kÝ , �+u � is a regularlanguage

3. If �&Þ , ��ß and � areregularlanguages,thensoare

(a) � Þ ×�� ß , the concatenationof � Þ and � ß : for every � Þ ��� Þ and � ß ��� ß ,� Þ � ß ��� Þ ×�� ß(b) � Þ ��� ß , the unionof � Þ and � ß(c) �� , the Kleeneclosureof � . Using � ô to denote� concatenatedwith itself} times, �� Ñ����ô ��K � ô .

While theabove definition is complete,regular languagesobserve additionalclo-sure properties:Ú Intersection: if � Þ and � ß areregularlanguagesthensois � Þ"! � ß .Ú Difference: if �&Þ and ��ß areregular languagesthenso is �&Þ8ú#��ß , the setof

stringsin �&Þ thatarenot in ��ß .Ú Complementation: if � is a regular language,thenso is �� 8ú#� , the setof allstringsover � thatare not in � . (Of course,complementationis merelyaspecialcaseof difference.)

28 CHAPTER1. READING DEVICES

0 1a

b

Figure1.3: An acceptorfor ��� � . Theheavy-circledstate(0) is (conventionally)theinitial state,andthedouble-circledstateis thefinal state.Ú Reversal: if � is aregularlanguage,thensois á1|(��«$��¬ , thesetof reversalsof all

stringsin � .

Regularlanguagesaresetsof strings,andthey areusuallynotatedusingregularexpres-sions. A fundamentalresultof automatatheoryarethe so-calledKleene’s theorems,which demonstratethatregularlanguagesareexactly thelanguagesthatcanberecog-nizedusingfinite-stateautomata, wherethis computationaldevice canbe definedasfollows(Harrison,1978;HopcroftandUllman,1979;LewisandPapadimitriou,1981):

A finite-stateautomatonis a quintuple Ä Ñ «&% � z �'0� � � Û�¬ where:

1. % is afinite setof states

2. z is a designatedinitial state

3. ' is a designatedsetof final states

4. � is analphabetof symbols,and

5. Û is a transitionrelationfrom %)(�� to %As asimpleexample,considerthe(infinite) setof strings: �(u � uG� � uG��� � uG���¥� ð¥ð¥ð � —

i.e. the setconsistingof u followed by zeroor more � s. The mostcompactregularexpressiondenotingthis setis uG�* . Furthermore,the languagecanbe recognizedbythefinite-statemachinegivenin Figure1.3.

1.A.2 Regular relationsand finite-state transducers

Regular n-relationscan be definedin a way entirely parallel to regular languages.Again, thedefinitiongivenhereis modeledon thatof KaplanandKay (1994):

1. � is a regularn-relation

2. For all symbolsu��,+�«-�.��Ý¿¬�( ð¥ð6ð (�«-�.��Ý¿¬�/ , �+u � is a regularn-relation

3. If á1Þ , á?ß and á areregularn-relations,thensoare

1.A. OVERVIEW OF FSA’S AND FST’S 29

(a) á.Þ&×6á?ß , the (n-way)concatenationof á1Þ and á?ß : for every s%Þ0� á1Þ ands ß �ká ß , s Þ s ß �lá Þ ×¥á ß(b) á.Þ��ká8ß(c) á1 , the n-wayKleeneclosure of á .

Onecanthink of regular n-relationsasacceptingstringsof a relationstatedover anm-tupleof symbols,andmappingthemto stringsof a relationstatedover a k-tupleof symbols,where y ò �aÑ w . We canthereforespeakmorespecificallyof y2(�� -relations.As in thecaseof regularlanguages,therearefurtherclosurepropertiesthatregularn-relationsobey:11Ú Composition: if á Þ is aregular �"(Öy -relationand á ß is aregular y2(43 -relation,

then á Þ º&á ß is a regular �"(53 -relation.Compositionwill beexplainedbelow.Ú Reversal: if á is a regularn-relation,thensois á?|(�*«�á.¬ .Ú Inversion: if á is a regular y#(Öw -relation, then á ä Þ , the inverse of á , is aregular w�(Öy -relation.

Onecomputesthe inverseof a transducerby simply switching the input andoutputlabels.Thefactthatregularrelationsareclosedunderinversionhasanimportantprac-tical consequencefor systemsbasedon finite-statetransducers,namelythat they arefully bidirectional.Thus,aswe notedin Section1.2.2,a modelof spelling(mappingfrom the ORL to à ) canbe turnedinto a modelof reading(mappingfrom à to theORL) by simply invertingtheFSTimplementingÄ»Ì�Í�Î�Ï1Ð .

For mostpracticalapplicationsof n-relationsw Ñ ª (sothat � and y areobviouslyboth ó ).12 In thiscasewecanspeakof arelationasmappingfrom stringsof oneregu-lar languageinto stringsof another. In thiswork wewill beconcernedexclusivelywithª -relations,andwe will usethetermregular relationswith thatmeaningthroughout.

Thecomputationaldevicecorrespondingto aregularrelationis afinite-statetrans-ducer. Thedefinitionof FSTcanbemodeledon thedefinitionof FSA’s givenabove,so we will merelyillustrateby example,ratherthanessentiallyrepeatthe definition.Saywe haveanalphabet�ÓÑ �+u � � � { � t � anda regularrelationover thatalphabetex-pressedby theset: �g«�u � {6¬ , «:ug� � {�tg¬ , «�uG��� � {�t"tg¬ , «�uG����� � {�t4t"tg¬ . . . � . This relationthusconsistsof u mappingto { followedby zeroor more � ’smappingto t . Thisrelationcanberepresentedcompactlyby the two-wayregular expression a:c (b:d) . Figure1.4,depictsanFSTthatcomputesthis relation.Wereferto theexpressionson thelefthandsideof the‘:’ astheinputside,andtheexpressionson therighthandsideastheoutputside. Thus,in Figure1.4, the input sideis characterizableby the regularexpressionuG�* , andtheoutputsideby theexpression{�t6 .

Compositionof regular relationshas the sameinterpretationas compositionoffunctions: if á1Þ and á?ß are regular relations,then applying á.Þ.º�á?ß to an input

11Theomissionof difference,complementationandintersectionareintentional.In general,regular rela-tionsarenot closedundertheseoperations,thoughsomeimportantsubclassesof regular relationsare.See(KaplanandKay, 1994)for furtherdiscussion.

12Oneexceptionis thework of Kiraz (1999).

30 CHAPTER1. READING DEVICES

0 1a:c

b:d

Figure1.4: An FSTthatacceptsa:c (b:d)� .

expressionï is the sameasapplying á Þ to ï first andthenapplying á ß to the out-put. Figure1.5 depictstwo transducers,labeled7 Þ and 7 ß . 7 Þ computestherelationexpressableas (a:c (b:d) ) 8 ((e:g) f:h) (where 8 denotesdisjunction),whereas7 ßcomputesg:i Ý :j h:k (with the Ý : ÷ term insertinga ÷ ). The resultof composingthetwo transducerstogether— 7 Þ º97 ß — is atransducerthatcomputesthetrivial relation,e:i Ý :j f:k. In thisparticularcase,thoughboth 7�Þ and 7�ß expressrelationswith infinitedomainsandranges,theresultof compositionmerelymapsthestringef to ijk.

One other notion that is worth mentioningis the notion of projection onto onedimensionof a relation. For example,for a 2-way relation á , :�Þ�«:á.¬ projectsá ontothefirst dimensionand :�ß4«�á.¬ projectsontotheseconddimension.Projectionappliedto anFSTproducesanFSA correspondingto onesideof thetransducer. Thusthefirstprojection( :*Þ ) of thetransducerin Figure1.4is theacceptorin Figure1.3.

1.A. OVERVIEW OF FSA’S AND FST’S 31

7 Þ0

b:d

1

e:g

2

f:h3

a:c

e:g

f:h

7�ß0 1

g:i2

ε :j h:k

75;0 1

e:i2

ε :j 3

f:k

Figure1.5: Threetransducers,where������ ��¶���

32 CHAPTER1. READING DEVICES

Chapter 2

Regularity

In this chapterwe defendthe first hypothesisthat was introducedin Section1.2.3,namelyRegularity.

It is obvious at the outsetthat the normal notion of a regular language,wherethecatenationoperator‘ × ’ denotessimple left-to-right concatenation,will not suffice.Thiscanbeseeneasilyenoughwith theChinesecharacter< ` WINE+JI ANG b ji ang‘sauce’wherethe semanticradical = ` WINE b 1 occursbelow the phoneticportion> ` JI ANG b . This contrastswith the caseof ? ` FISH+L I b l ı ‘carp’, wherethesemanticradical @ ` fishb occursto the left of the phoneticcomponentA ` L I b ;with B ` BIRD+JI A b ya ‘duck’, wherethe semanticradical C ` BIRD b occurstotheright of thephoneticcomponentD�` JI A b ; with E ` GRASS+ZAO b cao ‘grass’wherethe semanticradical ` GRASS bGF occursabove the phoneticcomponentH` ZAO b . andwith I ` SURROUND+HUO b guo ‘country’, wherethesemanticradicalJ ` SURROUND b surroundsthe phoneticcomponentK ` HUO b . Thesedataaresummarizedin Table2.1.

Clearlywe needa morepowerful notionthansimpleconcatenationto handlesuchcases.Wewill thereforeintroducethenotionof planarregular languages, whichdifferfrom ordinary(string-based)regularlanguagesonly in definingarichersetof concate-nationoperations.Thedefinitionof planarregularlanguageswill begivenimmediatelyin Section2.1; we will alsointroduce(in Section2.2) the notion of SmallLinguisticUnit (SLU), the linguistic unit within which variationfrom the macroscopic— line-anddocument-level — orderof a script is possible. In subsequentsectionswe willshow theapplicabilityof theexpandedformalismto variousphenomenathatarisein avarietyof scripts.It will beclearthat theextendedformalismis capableof providingstraightforwardanalysesof thesephenomena,which lendssupportto the Regularityhypothesis.Problematicexamplesfrom AncientEgyptian will be discussedin Sec-tion 2.3.5. In Section2.4 we briefly survey the possibleinstantiationsof the SLU indifferentwriting systems.Finally, weendthechapterwith theimplicationsof thethe-ory for themacroscopicarrangementof scripts,andin particularfor theinstantiations

1Usedalone,this character, pronouncedyou, is usedmostlyasa termin thecalendricalcycle, thoughinarcheologicalusageit retainsits originalmeaningof ‘amphora’.

33

34 CHAPTER2. REGULARITY

?�ÑL@ left of A B�ÑLC right of D

E�ÑLF above H <�ÑL= below>

I�Ñ JsurroundingK

Table2.1: Chinesecharactersillustrating the five modesof combinationof semantic(under-lined)andphoneticcomponents.

of boustrophedonwriting.

2.1 Planar Regular Languagesand Planar Regular Re-lations

Planargrammarsof variouskinds have beenusedboth in two-dimensionalpatternrecognitionand in building generative modelsof two-dimensionallayouts. For in-stancetwo-dimensionalcontext freegrammarshave beenusedin the recognitionofprintedmathematicalequations(Chou,1989),andin formal descriptionsof Chinesecharacterconstruction(FujimuraandKagaya,1969;Wang,1983). Planarfinite-statemodelshave alsobeenused,mainly in patternrecognition: for instanceLevin andPieraccini(1991)developeda planar hiddenMarkov modelapproachto opticalchar-acterrecognition. A comprehensive review of two-dimensionalfinite-statemodelsandtheir propertiesis given in (GiammarressiandRestivo, 1997). GiammarresiandRestivo’s discussionfocusseson two-dimensionallanguages— alsotermed,for ob-

viousreasons,pictures— thatcanberepresentedwith symbolson a rectangulargrid.For instance,thefollowing would beapictureover thealphabet�(u � � � :

(2.1) a a a a a a ba a a a a b aa a a a b a aa a a b a a aa a b a a a aa b a a a a ab a a a a a a

This view is not really adequatefor our purposes,however, sincewe would like toview theprimitiveelementsof thealphabetasbeing,in effect,geometricalfiguresthatmight occupy morethanone“square”in sucha two dimensionalgrid. For example,in Figure2.2 below, the basicelementÕ´«:²�¬ is left-adjoinedwith the entirecomplexconsistingof Õ´«�³¶¬ , ÕÖ«:Ü"¬ , and ÕÖ«�Û�¬ — andin particularis directly to the left of bothÕÖ«$³¶¬ and ÕÖ«:Ü"¬ — somethingthat is not easilyrepresentedin anarrangementsuchasthatin (2.1).

2.1. PLANAR REGULAR LANGUAGES 35

γ(α) γ(β)

γ(ζ) γ(δ)

Figure2.1: ���������)�&�*� ��� ! �4�*�$#%����������'(� .γ(α)

γ(β)

γ(ζ) γ(δ)

Figure2.2: Anotherfiguredescribedby �*�����¶�Ô�&��� �*��! �4���$#%����������'(�-, .Thenotionof planarregularlanguagesthatwehavein mindherecanbedescribed

informally asfollows. Supposeyou have a setof two dimensionalfiguresarrangedinsomefashionon a flat surface:considerfor examplethefour rectangleslabeledÕ´«:²�¬ ,ÕÖ«$³¶¬ , ÕÖ«:Ü"¬ and ÕÖ«�Û�¬ in Figure2.1. We assumefor simplicity’s sake thatwe aretoldwhatthesubfiguresareandwherethey arerelative to oneanother:that is, our taskisnot to computethat therearefour blocksin Figure2.1,andthat they arearrangedinsomepattern,but rather, givenapredeterminedlayout,to describethatlayoutin formalterms. The analogyin the one dimensionalcaseis between,say, optical characterrecognition,andstringmatching:in theformercaseonemustdiscoverwhatcharactersarein atext; in thelattercaseonealreadyknowsthecharactersandtheirrelativeorders,andonemerelyhasto, for example,find patternsin this alreadyknown sequenceofcharacters.

Thereareanumberof waysin whichonecoulddescribeFigure2.1,but supposingwestartin theupperlefthandcorner, we mightsaythat Õ´«:²�¬ left catenateswith Õ´«�³¶¬ ;thatthispairdownwardscatenateswith thepair ÕÖ«:Ü"¬~ÕÖ«�Û�¬ ; andthat ÕÖ«:ÜY¬ left catenates

with ÕÖ«�Û�¬ . If we use‘Ï × ’ for ‘left catenateswith’ and ‘ M × ’ for ‘downwardscatenates

with’, we could describethe layout succinctlyas ÕÖ«�²´¬ Ï ×lÕ´«�³¶¬NM ×&ÕÖ«:ÜY¬ Ï ×kÕÖ«�Û�¬ . Ofcourseotherpatternsareconsistentwith this formula: considerFigure2.2.Thisbringsup thepoint thatunlike thecaseof one-dimensionalconcatenation,planarcatenationoperatorsarenot in generalassociative. More specifically, a sequenceof within op-

36 CHAPTER2. REGULARITY

eratorcatenationsis associative: «�ÕÖ«�²´¬�M ×�ÕÖ«$³¶¬-¬�M ×l«�ÕÖ«:Ü"¬�M ×cÕÖ«�Û�¬-¬ is equivalent toÕÖ«�²´¬NM ×1«�ÕÖ«$³¶¬1M ×�Õ´«�ÜY¬�¬1M ×�Õ´«:Û�¬�¬ ; but cross-operatorcatenationsarenot in generalas-sociative. Thereare a coupleof possiblesolutionsthat allow us to more preciselydescribea particularlayout. Oneapproachis to make bracketsanexplicit partof the

formalism: thusFigure2.1 could be describedas + ÕÖ«�²´¬ Ï × ÕÖ«$³¶¬�/ M ×O+ Õ´«�ÜY¬ Ï × Õ´«:Û�¬-/ ,asdistinct from ÕÖ«�²´¬ Ï ×P+ ÕÖ«$³�¬ M ×N+ Õ´«�ÜY¬ Ï ×lÕÖ«�Û�¬-/ / , which would describeFigure2.2.An alternative that can be adoptedin somecases(seeSection2.3.1, for instance)is to definea precedenceon operators. So Figure 2.1 can be describedas simplyÕÖ«�²´¬ Ï ×ÒÕÖ«$³¶¬QM ×´ÕÖ«:Ü"¬ Ï × ÕÖ«�Û�¬ , if we have the understandingthat ‘

Ï × ’ hasprecedence

over‘ M × ’, sothatthegroupsÕ´«:²�¬ Ï ×½ÕÖ«$³¶¬ andÕ´«�ÜY¬ Ï ×8ÕÖ«�Û�¬ will form first, andonly then

will M × join thetwo groupstogether. Suchanapproachwould not allow usto describe

Figure 2.2, sinceno definition of precedencebetween‘Ï × ’ and ‘ M × ’ will allow us to

groupthecomponentsappropriately. In suchcasesonewould have to resortto brack-eting.For example,theChineseR l ın ‘fish scale’is composedof thecomponents,@, S , T and , arrangedasfollows: @ Ï × [ S M × [ T Ï × ]]

We turn now to a formal definition of planarregular languages.The definitionsof regularlanguagesintroducedin Appendix1.A carryover directly to planarregularlanguages,theonly novel featurebeingthesplittingof concatenation‘ × ’ into fiveoper-ations— eachof whichis neededto describetheChinesecharactercomponentlayoutsillustratedin theintroductionto this chapter:2Ú Left catenation:

Ï ×Ú Rightcatenation:U ×Ú Downwardscatenation:M ×Ú Upwardscatenation:V ×Ú Surroundingcatenation:3Notethat 3 doesnot haveadual: we discussthis point furtherin Section2.3.4.

Thus,we canemendthe relevant portionsof the definition of regular languagesgivenin Appendix1.A.1 to readasfollows:

3. If �&Þ and ��ß areregularlanguages,thensoare

(a) � Þ Ï ×Q� ß ; � Þ U ×0� ß ; � ÞWM ×9� ß ; � ÞWV ×X� ß ; � Þ 3)� ßEachof thesecatenationoperationsis illustratedin Figure2.3.Of course,for theimplementationof Ä»Ì�Í�Î�Ï1Ð in a givenwriting system,we will

notonly beinterestedin planarregularlanguages,but moregenerallyin planarregular2NotethatColeman(1998,pages27–28)usesdownwardsconcatenation(which hetermscocatenation)

aspartof hisdescriptionof theformal syntaxof IPA symbols.

2.1. PLANAR REGULAR LANGUAGES 37

(a) (b)

(c) (d)

(e)

γ(α) γ(β) γ(α)γ(β)

γ(α)

γ(β) γ(α)

γ(β)

γ(β) γ(α)

Figure2.3: Thefive planarconcatenationoperations:

(a) Y5Z\[^]^_��WY4Z �5] ; (b) Y5Z\[5]a`��WY5Z �4] ; (c) Y4Z\[^] ! �bY5Z �4] ; (d) Y5Z\[^]5c �bY5Z �4] ; (e) Y5Z\[^]bdeY5Z �4] .

38 CHAPTER2. REGULARITY

relations. On theorthographicsideof themapping,oneis clearlymappingto planarobjectsbuilt usingsomecombinationof planarcatenationoperations.Onthelinguisticsidethingsareperhapslessclear. Although linguistic objectssuchastheannotationgraphsintroducedin Section1.2.2aredisplayedin two dimensions,they arenot reallyplanarobjects:thereis no sensein which theSEM arcis, say, above theTONEarcin(1.10).For thesakeof thepresentdiscussionwewill assumefor thesakeof simplicitythatgraph-theoreticobjectssuchas(1.10)have been“linearized” into strings,sothatwe can think of their constructionas being in termsof simple string concatenation‘ f ’. So, in thepresentdiscussionwe will be interestedin planarregular relationsthatinvolvemappingsbetweenstringsconstructedusing‘ f ’ andplanarobjectsusingsomecombinationof planarcatenationoperations.Thuswemightwantto state,for instance,

that gPf�h,fji transducesto k�l&g"m0n fok�lphqm1r\fsk"l$itm . We canstraightforwardly redefinethenormalnotionof concatenationin regular relationsto implementthecasewe areinterestedin (Appendix1.A.2):

3. If u0v and uxw areregularrelations,thensoare:

(a) uNvLyzf|{ r f6}euxw ; uNvLyzf|{�~�f6}euxw ; uNvLyzf|{ n f-}euQw ; u0v�y�f {�� f�}�uxw ;u v yGf { d }�u wHere, the notation y����X�(s v {�����(s w } meansthat we combinethe input sideof therelationusing �����(s�v andtheoutputsideusing ���X�(s�w .

It shouldbestressedthattheplanarcatenationoperationsarenotgenerallyintend-ed to describethe exact placementof oneelementrelative to another. Thusstating

a formula suchas k�l&g"m0n fWk"l&h�m merelyentailsthat k�l&g"m is placedsomewhereabovek�lph�m , but it saysnothingaboutwhetherthecenterof gravity of thevisibleglyphrepre-sentingk�l&g"m is exactlycenteredon thevisibleglyph representingk"l&h�m , or is perhaps,say, a little to theright. Of coursesometimessuchdifferencescorrelatewith a differ-encein meaning.To take anobviousexample,in a numberof scriptstheapostrophey ’ } andthe comma y , } arealmost identicalor completelyidentical in form, theonly differencebeing the vertical placement. In both y Jones’} and y Jones,} wewould saythat the comma/apostropheis catenatedto the right of y s} , so it wouldseemasif thecurrentformalismcansaynothingabouthow thesetwo casesaredistin-guished.Thesolution,of course,is to assumethatglyphsarenot merelya collectionof blackbits, but in generalalsoincludea block of white bits within which theblackbits aresituated.Thusapostropheandcommaarereally representedasin (2.2) and(2.3),respectively:

(2.2) ’�

(2.3) ,

Thuswe canpreserve the simplestatementthatapostrophesandcommasalike cate-natedin a left-to-right fashionwith their neighbors,andat the sametime guaranteethat they will be positionedappropriatelyin the vertical dimensionrelative to those

2.1. PLANAR REGULAR LANGUAGES 39

neighbors.In many othercases,though,issuesof exactplacementof glyphsrelatetowritten stylistics,andin generalmay vary substantiallydependinguponthe style offont, andwhetheroneis dealingwith ordinaryprintedtext, ordinaryhandwrittentext,or calligraphy. Suchstylistic concernsareoutsidethescopeof thepresentstudy.

A furtherpoint needsto bemadeabouttheuseof bracketsto indicateassociation,discussedabove. In principle, the unboundeduseof pairedbracketsintroducesnon-regularity, sincenon-finitesetsof well-formedbracketingsarewell-known to requirecontext-freepower;see,e.g.,(Harrison,1978,pages312ff.). Wecankeepthelanguagewithin thesetof regularlanguages,however, if we limit thedepth � of bracketingthatwe allow, thusalsolimiting thenumberof switchesbetweencatenationoperatorsthatweallow. (Sincebracketingis not involvedwhenwecombineelementswithin agivencatenationoperator— e.g. whenwe combine g r f�h and k�l&itm r f�k�l$��m using r f —thereareno restrictionson “depth” of combinationin suchcases.)It is unclearwhatthesettingfor � shouldbe,but a reasonablesettingmight beseven.3 This would bemorethansufficient to allow for anexhaustivestructuralanalysisof themostcomplexChinesecharacters(Rick Harbaugh,personalcommunication);seeSection2.3.4.

Thecomputationaldevicescorrespondingto planarregularlanguagesandrelationsareplanar(or “two-dimensional”)finite-stateautomata(2FSA)andplanarfinite-statetransducers(2FST), respectively. Wecandefineaplanarfinite-stateacceptoralongthelinesof thedefinitionof (one-dimensional)finite-stateautomatafrom Appendix1.A.1,addingto thedefinitionasetof directions,astartposition,andasetof groupingbrack-ets; computationallyit is easierto definethe machinesusingbracketsratherthanintermsof operatorprecedence.

A planar finite-state acceptor is an octuple � �l&��{���{&�a{���{���{��{��Q{itm where:

1. � is afinite setof states

2. � is a designatedinitial state

3. � is thestartingposition(in theplanarfigure)for � , chosenfrom theset � left, top,right, bottom� .

4. � is thesetof groupingbrackets � [, ] �5. � is a designatedsetof final states

6. � is thesetof directions� R(ight) , L(eft), D(own), U(p), I(nwards)�(correspondingto thecatenationoperatorsr f , ~ f , n f , � f and d , respec-tively)

7. � is analphabetof symbols,and

8. i is a transitionrelationbetween�)�,l��.�s������m���� and �To recognizethefigurein Figure2.2we effectively needto have a 2FSAthatrec-

ognizesthedescriptionk�l&g"m r f¡  k�lph�m n f�  k"l��jm r fQk�l&itm-¢ ¢ . Soweneedto haveamachine3In asimilar vein,Church(1980)proposedahardlimit on thedepthof embeddingin syntacticstructure

in orderto beableto implementafinite-statesyntacticanalyzer.

40 CHAPTER2. REGULARITY

0 1

2 3

8

γ(α)

γ(β)

γ(ζ)

[ [

] ]γ(δ)

R

R

R

D

RR

R R

left

Figure2.4: A 2FSA that recognizesFigure2.2. The labels“R” and“D” on the arcsdenotereadingdirection; “left” on state0 (the initial state)denotesthe position at which scanningbegins.

wherescanningbegins at the lefthandside of the figure, proceedsrightwardsread-ing k�l&g"m , readsrightwardsacrossonegroupingbracket,readsrightwardsacrossk�lph�m ,readsdownwardsacrossone groupingbracket, readsrightwardsacrossk�l$��m , readsrightwardsacrossk"l$itm , andfinally readsrightwardsacrosstwo groupingbrackets.A2FSAthataccomplishesthis is givenin Figure2.4.

A 2FSTcanbe definedsimilarly to a 2FSA. For our purposeswe areinterestedin machinesthatmapfrom expressionsconstructedusingstringcatenation,to expres-sionsconstructedusingplanarcatenationoperators.Theonly partof thedefinitionthatchangesis 8:

A planar finite-state transducer is an octuple � �l&��{���{&�a{���{���{��{��Q{itm where:

1. � is afinite setof states

2. � is a designatedinitial state

3. � is thestartingposition(in theplanarfigure)for � , chosenfrom theset � left, top,right, bottom� .

4. � is thesetof groupingbrackets � [, ] �5. � is a designatedsetof final states

6. � is thesetof directions� R(ight) , L(eft), D(own), U(p), I(nwards)�(correspondingto thecatenationoperatorsr f , ~\f , n f , � f and d , respec-tively)

2.2. THE LOCALITY HYPOTHESIS 41

0 1

2 3

8R

R

R

D

RR

R R

left

ε:γ(β)

ζ:γ(ζ) δ:γ(δ)§ ε:]

ε:[ ε:[

ε:]

α:γ(α)

Figure2.5: A 2FSTthatmapstheexpression[X¨X©�ª to Y5Z\[5]a_|«Q¬ Y5Z�¨4]a­ «X¬ Y4Zp©�]^_|«WY5Z\ª�]$®¯® .

7. � is analphabetof symbols,and

8. i is a transitionrelationfrom �)���°�,l��.�����s��m���� to �In generalarcsthatarelabeledwith bracketson the“planarside” will belabeledwith� on the“string side” of the transduction.A 2FSTthatmapstheexpressiongahq�ti to

k�l&g"mor�f±  k�lph�m�n f�  k"l��jm�r\fQk�l&itm-¢ ¢ (Figure2.2),is givenin Figure2.5.

2.2 The Locality Hypothesis

How arethevariouscatenationoperatorsactuallydistributedin a writing systemthatusesmore than one? Invariably one finds a situationsuchas the following. At amacroscopiclevel, the script runs in a particulardirection, say left to right ( r f ) or

top to bottom( n f ) (seeSection2.5); the particularchoicemay be somewhat free, asit is in Chinese,but whatever is chosenis fixed for a given text. Alterationsof thismacroscopicorderoccuronly locally. Thusin Chinese,theconstructionof aparticularcharactermay involve variouscombinationsof thecatenationoperatorsthatwe havedescribed,but thereis (for a giventext) only onechoiceavailablefor catenatingthatcharacterwith thefollowing one.Thisobservationleadsto thefollowing claim,whichwe shalltermLocality:

42 CHAPTER2. REGULARITY

(2.4)Locality

Changesfrom themacroscopiccatenationtypecanonly occurwithin a graphicunit that correspondsto a small linguisticunit(SLU).

As we shallsee,in many writing systems,theSLU in questionis thesyllable,thoughin somecases(Section2.3.2) the “orthographicsyllable” is non-isomorphicto thephonologicalsyllable;in othercasestheunit seemsto betheword (aswe shallseeina casefrom Aramaicin Section4.4.1).This issueis furtherdiscussedin Section2.4.

In theremainingsectionsof thischapter, weturnto anapplicationof theformalismandtheorydevelopedhereto variousphenomenafoundin writing systems.

2.3 Planar Arrangements: Examples

In this sectionwe discussfour writing systems— KoreanHankul, Devanagari,Pa-hawh HmongandChinese— which make substantialuseof more thanone planarcatenationoperator. In eachof thesecasestheSLU is thesyllable,thoughin Devana-gari, the relevant notion of “syllable” is orthographicallyratherthanphonologicallydefined.In thefinal subsectionwediscussanapparentcounterexampleto theclaimofregularity from AncientEgyptian.

2.3.1 KoreanHankul

Thediscussionof KoreanHankulheredrawsuponthedescriptionpresentedby King(1996)(andseealso (Sampson,1985)).Thefollowing summarizesthefactsdiscussedin detail by King. The lettersof Hankul are arrangedinto “syllable-sized”glyph-s.4 Thesyllable-sizedglyphsarecatenatedwith eitherleft-catenationor downwards-catenation. Within the syllable-sizedglyphs, however, both left- and downwards-catenationareusedin waysthat arepredictablegiven the particularsegmentsbeingcombined.Vowel anddiphthongglyphsareclassifiedinto two classes,VERTICAL andHORIZONTAL; examplesof eachof thesewill be givenmomentarily. All orthographicsyllablesin Hankulmusthaveonsets:if thecorrespondingphonologicalsyllablelacksanonset,thena“placeholder”glyph ²³ is usedto representtheemptyonset.5 Thatis:k�l$´�µ�¶*·&¸-¹ºmo� ²³ .

4Note that by “syllable”, herewe meansyllableat a morphophonemic(ratherthansurfacephonemic)level of representation:in many cases,unitsthatarerepresentedorthographicallyassyllablesdo not repre-sentsinglesyllablesin thesurfacephonology. TheORL in ModernKoreanorthographywould appearto befairly deep.King observes(page223)that:

Hankul orthographydrifts from a moreor lessconsistentlyphonemicapproachin thefifteenthcentury, to anincreasinglymorphophonemiconeby thetwentiethcentury.

Sampson(Sampson,1985,pages135ff.) givesadetaileddiscussionof this issue.5In codaposition,thissymbolrepresents/ » /, whichdoesnotoccurin syllable-initialposition.

2.3. PLANAR ARRANGEMENTS:EXAMPLES 43

As examplesof theconstructionof of Hankulorthographicsyllables,considerthe

glyphs y mos} and y cal} . Thefirst is constructedout of threecomponentsarrangedin verticaldescendingorderasfollows:

¼ y m }¼ y o }¼ y s}

For y cal} , thecomponentglyphsareasfollows,with thefirst two arrangedhori-zontallywith respectto eachother, but abovethethird glyph:

¼ y c }¼ y a}¼ y l }

Theglyph y o } belongsto thehorizontalclasswhereas y a} belongsto the

verticalclass— notethelargely horizontalorientationof y o } asopposedtoy a} — andthis in turn correlateswith the fact that y o } is arrangedvertically withrespectto theprecedingonset,whereasy a} is arrangedhorizontally.

The macroscopicarrangementof Hankul syllable glyphs is traditionally top-to-bottom,thoughleft-to-rightarrangementis becomingmuchmorecommon.Deviationsfrom this macroscopicorderonly occurwithin the syllable,promptingthe followingstatementfor Hankul:

(2.5) TheSLU is thesyllable.

Thefull setof rulesfor thearrangementof glyphsin Hankulareasfollows;see(Samp-son,1985, page132) and(King, 1996,page222). In this versionof the rules,weassumethatsyllable-sizedunitsarearrangedin a left-to-right fashion:

¼ For syllables½Xv and ½9w , k�l&½4v�f�½�w�m¾��k"l$½Xv�m�r\f¿k�l&½�w�m .¼ For onset-nucleusclusterÀWÁ andcoda  , k�l\ÀWÁ�f�ÂXmo�°k"lpÀWÁÃm n fÃk�l$ÂXm .¼ If coda is complex, consistingof (maximally)two consonants v and  w , then

k"l�Â�mo��k"l� v f� w m¾�#k�l$ v m r fÄk�l$ w m¼ For onsetÀ andnucleusÁ ,

– if Á belongsto theVERTICAL classthen k"lpÀ.fÅÁÃmo�°k"lpÀWm r fÄk�l$Á9m– elsek�l\À,f�ÁÃmo�°k"lpÀWm n fÃk�l$ÁÃm

44 CHAPTER2. REGULARITY

  n f ¢ n f �y m } y o } y s} y mos}  r f ¢ n f �y c } y a} y l } y cal}

Figure2.6: ThesyllablesÆ mosÇ /mo/ ‘cannot’,and Æ calÇ /cal/ ‘well’ in Hankul.

Theprinciplesof Hankulgraphicsyllableconstructionareillustrated in Figure2.6,againusingthe syllables/mos/and/cal/. Recall that y o } belongsto the horizontalclass,whereasy a} belongsto the vertical class. Figure2.6 makesuseof bracketsto indicategrouping.Thus,in y cal} , y c } is catenatedto the left of y a} , andthenthis whole groupis catenatedon top of y l } : the alternative bracketingwould resultin a differentarrangementof symbols.However, in Hankul it is possibleto dispensewith brackets in favor of operatorprecedence,as discussedin Section2.1: giving

leftwardscatenationhigherprecedencethandownwardscatenation— r f�È n f — yieldsthedesiredresult.

2.3.2 Devanagari

The Devanagariscript is a modernIndian script derived originally from the Brahmiscript (Bright, 1996). It is usedto representHindi, NepaliandMarathi,aswell asavarietyof local languagesof NorthIndia; it is alsotheusualscriptusedin representingSanskrit.In thepresentdiscussion,wewill assumetheuseof theDevanagariscriptasawriting systemfor Hindi, thougheverythingthatwewill discusscarriesover, mutatismutandis,to thescript’susefor otherlanguages.

Bright describesDevanagariasanalphasyllabary, meaningthat thesystemis ba-sically alphabetic,but that the symbolsarearrangedin syllable-sizedunits. As weshallseemomentarily, therelevantsyllablesfor theorthographyarenot isomorphictophonologicalsyllables.

Thebasicfeaturesof thescriptthatwill concernusherearethefollowing:

¼ (Phonological)syllable-initialvowelsarerepresentedasfull symbols,but whencombinedwith a precedingconsonantthey appearin diacritic forms that ap-pearabove, below, beforeor after the consonantin question. The vowel / É /hasno diacritic form, sothata consonantwithout a vowel markhasaninherentschwa. Theformsof theindependentanddiacriticvowels(with onsetconsonantÊ y k } ) aregivenin Table2.2.

¼ Consonantclustersarerepresentedby ligaturedgroups,thatbehaveasunitsforthepurposesof vowel placement.Thus ËÌy s} +

Ê y k } yields Í Ê y ska} ;Ê

y k } + Î�y s} + ÏÐy m } yields Ñ�ÏÒy ksma} .6 A sequence/skI/ is represented

6Therulesfor ligatureformationaresomewhatcomplex andwill notconcernushere.

2.3. PLANAR ARRANGEMENTS:EXAMPLES 45

Expression Full form Diacritic form

Null Ó y�Éj} Ê y k Ét}After Ó¡Ô y a} Ê Ô y ka}

Ó6Õ y o } Ê Õ y ko }Ó6Ö y�×j} Ê Ö y k ×t}Ø y i } Ê�Ù y ki }

Above Ú y e} Ê Û y ke}Ú Û yÄÜ�} Ê Ý y k ÜÅ}

Below Þ y U } Ê ß y kU }à y u } Ê á y ku }â y ri } Ê ã y kri }Before ä y I } å Ê y kI }

Table2.2: Full anddiacritic forms for Devanagarivowels, classifiedby the position of ex-pressionof thediacritic forms. Thus“after” meansthatthediacritic occursafter theconsonantcluster, “below”, below it, andsoforth.

å$Í Ê y i+sk } , with the y i } occurringin the positionbeforethe cluster. Theligatured-unit-plusvowel combinationformsanorthographicsyllable.

¼ A preconsonantalinitial /r/ in anorthographicsyllableis representedby asuper-scriptsymboloccurringat theendof theorthographicsyllable.Thus/v É rma/isrepresentedasæ�ÏWÔç y v É ma+r} .

An algorithmanda setof mappingrulesfor Devanagarithat handlesthesefactsfollows:

¼ Divide thephonologicalstring into orthographicsyllablesby placinga syllableboundaryèºéëê :

– At thebeginningof theword;

– Betweeneachpair of adjacentvowels;

– Beforethefirst consonantof a cluster.

ThustheSLU in Devanagariis theorthographicsyllable.

46 CHAPTER2. REGULARITY

¼ Assumea function ìoí�î , which formstheligaturedform of asequenceof conso-nantglyphs.Then k�l$ï�vðï�w�ñ�ñ*ñ�ïWò�mo��ì¾í$î5l\k�l$ï�vëm�k"l�ï�w�m4ñ�ñ*ñ�k"l�ïWòÃmm .

¼ Let �0óXô$ô|õ bethefull vowel glyph for vowel ö÷õ , then k�l$ö6õ�moø)�0ó4ô&ô|õ�ù è�éëê .

¼ For consonantcluster  andvowel ú , thenk�l$Â�fðú5m �#k�l$ÂXm if ú��2É�#k�l$ÂXm r fQk�lpú5m if ú�� /a,o,× ,i/�#k�l$ÂXm � f9k"l&ú4m if ú�� /e,Ü /�#k�l$ÂXmqn f9k"l&ú4m if ú�� /u,U,ri/�#k�l$ÂXm ~ fQk�lpú5m if ú�� /i/

¼ For an orthographicsyllablestartingwith /r/ andremainderÂÃú with non-null

consonant , k"l-ùÅûðùÄf�Â9ú4mo��k"l-ùÅûðù�m n f6k�l$Â9ú4m .Thepropertiesof Devanagarithatwehave justanalyzedarecommonamongother

IndianandIndian-derivedscripts.Indeed,comparedwith thoseof someotherscripts,Devanagaridiacritic vowels are relatively simple: Thai for example(Diller, 1996)hasvowel symbolsthat not only occurabove, below, beforeandafter the consonantsymbol,but alsovowel diacriticsthatsurroundtheconsonantsymbol.

2.3.3 PahawhHmong

The Pahawh Hmong messianicscript inventedin 1959 by ShongLue Yang, a H-mongpeasant,is describedat lengthin a fascinatingstudyby Smalley, VangandYang(1990);a more concisedescriptioncanbe found in (Ratliff, 1996). Therewereac-tually four stagesof thescript,which evolvedasShongLue Yangrefinedhis originaldesign:we will beconcernedwith theThird Stage,which is theversionthatreceivedthewidestacceptanceanduse.Therearetwo setsof glyphsin Pahawh, thefirst rep-resentingonsetconsonantclusters,andthe secondthe rime, i.e. the vowel plus thelexical tone.7 Pahawh is thussometimesdescribedasa demisyllabicsystem,thoughthis is really a misnomer. Thewriting runsfrom left to right, with spacesseparatingsyllables,making the syllable-sizedchunksquite easyto identify. What is notableaboutPahawh is that the glyph representingthe rime is systematicallywritten to theleft of theglyph representingtheonset,in contraventionto theoverall left-to-right or-

derof thescript.For exampleShongLue’snameis written y ´×ð»^} + y s}y ˆü } + y l } /s´×ð» lˆ

ü/ in Pahawh Hmongwriting. Clearly, asothershave noted,onecan

view this asa generalizationof thepropertyof many IndianandIndian-derivedSouthEastAsianscripts,to allow somevowel glyphsto precedeconsonantclustersthat,onthe basisof phonologicalordering,they logically follow; we saw an exampleof this

7In thefinal (fourth)stage,thevowel andtonesymbolshadbecomecompletelyseparate,andevenin theThird stage,thereis apartialseparation,with sometonalinformationbeingrepresentedby diacriticsymbolswritten over thevowel symbol. We will not beconcernedwith therepresentationof tonehere,andfor thepurposesof thisdiscussion,wewill considerthevowel-plus-tonecombinationasa singleunit.

2.3. PLANAR ARRANGEMENTS:EXAMPLES 47

with Devanagariin theprevioussection.However, Pahawh is theonly known writingsystemthatconsistentlyhasthis reversal.8

While theorigin of thisuniquefeatureof Pahawh is mysterious,its implementationwithin thecurrentframework is simple.As for Hankul,theSLU is thesyllable:

(2.6) TheSLU is thesyllable.

TherulesdescribingHmongglypharrangementsareasfollows:

¼ For syllables½Xv and ½9w , k�l&½4v�f�½�w�m¾��k"l$½Xv�m�r\f¿k�l&½�w�m .¼ For onsetÀ andnucleusÁ , k�l\À,f�ÁÃmW�#k�l\ÀWm ~ fQk�l$ÁÃm

2.3.4 Chinese

As will be recalledfrom Section1.2.2(andseealsoSection4.2), Chineseis a partlylogographicwriting systemwheremostindividualcharactersaremadeupof acompo-nentthatgivessomeinformationaboutthepronunciation(the“phonetic” component)andanothercomponent(the “semantic”component)thatgivescluesto themeaning.Thesetwo componentscanbearrangedin a numberof waysrelative to eachother, aswediscussedbriefly in theintroductionto thischapter. Fromtheannotation-graphrep-resentationof a Chinesemorphemesuchasthatin (1.10),repeatedbelow as(2.7),wehave assumethat the semanticinformationassociatedwith that morphemeoverlaps,but doesnot dominate,thephonologicalinformation:

(2.7)

SEM: cicada: ýTONE: 2SYL: ½ : þONS-RIME: ch an

Given Axiom 1.3 from Section1.2.2, it follows that the imageof the semanticpor-tion under �Pÿ���� r � mustcatenatewith theimageof thephonologicalportionunder� ÿ���� r � . TheSLU in Chinese,likethethreewriting systemswehavejustdiscussed,is the syllable,and(with a singleexceptionthat neednot concernus here)syllablesarein turn implementedusingsinglecharacters.In principletherefore,thecatenationoperatorchosento implementthe within-charactercombinationof the semanticandphoneticelementscandiffer from themacroscopiccatenation,whetherthatbethetra-ditionaldownwardscatenation,or themoremodernleft-to-rightcatenation.In fact,theparticularcatenationoperatorchosendependsupona relatively complex setof rulesandlexical specifications.This sectionpresentsa preliminaryanalysisof theinternalstructureof Chinesecharactersin termsof thepresentplanargrammarformalism.

8Furthermore,sinceShongLue Yangwasilliterate whenhefirst beganto createthescript, it is hardtoseehow hecouldhaveknown aboutthistendency to reversethelogicalorderin theregion’swriting systems.

48 CHAPTER2. REGULARITY

Therehasbeena long history of structuralanalysisfor Chinesecharactersstart-ing with the AD 200 Shuo Wen Jie Zı; see(Wieger, 1965) for a brief history. Oneimportantpoint in thehistoryof Chinesecharacterstudiesis thecompendiousdictio-narycompiledduring the reignof theKang-Xi emperor(r. 1661–1722):themodernclassificationof charactersaccordingto semanticradicalslargely follows theusageinthat dictionary. In moremoderntimes, therehave beenvariousgenerative analyses.Onesuchstudywasthatof FujimuraandKagaya(1969),who constructeda programthatwascapableof generating(andoutputtingon anoscilliscope),not only realChi-nesecharacters,but alsopossiblecharacters— thatis charactersthatarenon-existent,but obey the structuralconstraintsof Chinesecharacters.Wang (1983)presentedagenerative-grammar-basedmodelof Chinesecharacterstructurethatpredictedtherel-ative placementof semantic“classifiers”and phonetic“specifiers”within characters,andalsoprovidedamodelof theactualwriting of thecharacters,with specialattentionbeingpaidto theanalysisof thestrokeorder;wewill discussWang’sanalysisin moredepthmomentarily.

An interestingstudyby Myers (1996)arguesfor the relevanceof prosodichead-ednessin Chinesecharacterconstruction.By assumingthat the structuralheadof acharacteris (dependingupontheoverall compositionof thecharacter)on thebottom,the right or thebottomright, Myers is ableto explain several robust featuresof Chi-nesecharacters:for instance,thelargestcomponentor stroketendsto beonthebottomor on the right — i.e., in the headposition; the leftmoststroke in a characterwith asignificantamountof structureto its right is curved— i.e., a non-headverticalstrokeis curved;thereis a strongpreferencefor semanticcomponentsto occuron theleft oronthetop— i.e., in anon-headposition.This latterpoint is completelyin accordwithour observations,reportedbelow. Finally, Myersnotesa tendency for reducedformsof radicals(seepage50), to occuron thetop or theleft — i.e., in non-headposition.

Finally, thereis a website— www.zhongwen.com (Harbaugh,1998)— whichproducesstructuralanalysesfor selectedcharacters.Oneof the pointsthat is nicelybroughtout in this websiteis the fact that Chinesecharactersaretree-structuredob-jects,andcomplex characterscanbeanalyzedinto many levels.Thusin thecharacter� y TREE+FENG } feng‘maple’, which is analyzedat thelevel weareinterestedin as� y TREE } r f�Òy FENG } , we canfurther breakdown the righthandcomponentinto

the components and ý . Thusan exhaustive analysiswould be� r f   n f$ý�¢ ; ex-

amplesmorecomplex thanthisarenothardto find (cf. theexample� l ın ‘fish scale’,introducedearlier). In thepresentdiscussionwe will only beconcernedwith the toplevel, namelythecombinationof semanticwith phonetic,or semanticwith semantic,in charactersthathavesuchanalyses.

We return now to the study by Wang (1983). The analysis of semantic-phoneticcomponentplacementin Wang’s study is feature based,using the fea-tures[high], [low], [left] and[right]. A semanticcomponentsuchas � y DOOR } ,which typically takes its phonetic componentinside (e.g. y DOOR+GUI }guı ‘door to women’s apartments’),is analyzedas [+high,+left,+right]. Similar-ly � y FORCE } , which typically occurs on the right ( � y FORCE+JING } j ın‘strength’, is analyzedas [+right]. Finally ����� �Py SURROUND } , which complete-

2.3. PLANAR ARRANGEMENTS:EXAMPLES 49

left 8,303top 1,964bottom 1,246right 882surround 159other 174

Table2.3: Distributionof placementof semanticcomponent(Kang-Xi Radical)among12,728charactersfrom theTaiwanBig5 characterset.

ly surroundsits phonetic( � y SURROUND+HUO } guo ‘country’), is analyzedas[+high,+low,+right,+left]. Phoneticcomponentsmay also have specifications: �zhuang as in � y CLOTHING+ZHUANG } zhuang ‘pack, contain’, is [+high,� low].Wang assumesa seriesof rules that usethe featurespecificationsto determinetheactualplacementof thecomponents:

¼ If neithercomponentis specified,a default semantic-left,phonetic-rightplace-mentis used;

¼ If onecomponentis specified,the oppositefeaturesarefilled in for the othercomponent;

¼ If bothcomponentsarespecified,andif thereis a conflict, thesemanticcompo-nentwins.

Therearevariousinterestingaspectsto Wang’sanalysis,andonthewholeit seemsto beontheright track.It is certainlytrue,for example,thatthesemantic-left/phonetic-right placementis in somesensethedefault. This is shown in Table2.3,which showsthe frequencies,for 12,728charactersfrom the Taiwan Big5 characterset,of vari-oussemanticradicalplacements.9 Oneproblemwith Wang’sanalysis,though,is thatit is too powerful. In particular, the featuralsystemhe developswould predict thatonemight have a componentthat is specified[ � left, � right, � high,� low], andwouldthusselectto bein themiddleof whatever it combineswith. However, theonly caseswheresuchplacementoccursis in fossilizedforms. Thus � dong ‘east’ is tradition-ally analyzedasbeingcomposedof � rı ‘sun, day’ placedin the middle of

�mu

‘tree, wood’, but � rı doesnot in factselectfor this position: thereis no productivecharacter-formationprocessthatcanaccountfor � dong. This is why thesurroundingcatenationoperatord doesnothaveadual: thenotionof “insidecatenation”doesnotappearto benecessary.

It alsoseemsthatthedifferencebetween� y DOOR } as[+high,+left,+right],ver-sus � ��� � y SURROUND } as[+high,+low,+left,+right], is ratherredundant:presum-ably the fact that the latter completelysurroundsits sistercomponent,whereastheformer only partially doesso, follows from the shapesof the two components.Tohandlebothcases,it oughtto besufficient to saythatthey surroundtheir sister.

9Here,I tookatfacevaluethetraditional214Kang-Xi radicals,assumingthatthesearein factthecorrectsemanticcomponentsin all cases.Occasionallythis assumptioncanbemisguided.

50 CHAPTER2. REGULARITY

left 1,745top 313bottom 313right 166surround 51

Table2.4: Distributionof placementof semanticcomponent(Kang-Xi Radical)among2,596charactersfrom theTaiwanBig5 characterset.

Wethereforedispensewith thefeaturalapproach,andpresentinsteadapreliminaryanalysisbasedon thecatenationoperatorsintroducedin this chapter. Our analysisisbasedupon2,596charactersfrom theTaiwanBig5 charactersetfor whichweknow thebreakdowninto semanticandphonetic/semanticcomponent.10 As weseein Table2.4,the relative magnitudesof the differentplacementsof the componentsis roughly thesameasfor the fuller Big5 characterset (Table2.3), so this smallersamplemay betakenasrepresentative.11

I now turn to a descriptionof the rulesdevelopedto handlethesecharacters.Inthefollowing discussion,I dispensewith thenormalglosses,exceptwherecritical, soasnot to overly clutter the text. Note that somesemanticradicalsaremarkedas fullor red (reduced): in thesecasestherearealternative forms — “full” and“reduced”for the radical,andthesetwo forms behave differently. Thus � xın ‘heart’ hastwoforms as the radical y HEART } , onebeing moreor lessthe sameshapeas the fullcharacter(placedunderneaththe secondcomponent),andthe othera reducedthree-stroke component,as in the lefthandportion of � mang ‘busy’ (placedto the leftof the secondcomponent). We assumefor the presentthat it is part of the lexicalorthographicspecificationof a morphemewhethera particularcharacterhasthe fullor reducedvariant,thoughto someextentonecanpredictthis by the positionof theradical in the character(Myers, 1996). Which of the two options— “full” or “red”— is markeddependsuponwhich oneis reasonablyregardedasthedefault; only thenon-defaultis markedin theglosses.Finally, asindicatedin thedescriptionsbelow, weassumethatoneof thecomponents— usuallythesemanticradical,but sometimesthephoneticradical— is the “determiningcomponent”:it is the one thatwill be listed

first in the formulae,so that an expressionsuchas � n fQ� , with � the determiningcomponent,will unambiguouslymeanthat � occursabove � :

(2.8) (a) The following, asphoneticcomponents,take precedence— i.e., deter-mine the placementof the componentsin the character:� , , ! , ", # , $ , % , & . In othercasesthe semanticcomponentdeterminestheplacement.As notedabove, the “determiningcomponent”occursto the

10Thesebreakdownsweretakenfrom theraw datausedin thewww.zhongwen.com (Harbaugh,1998)website.I amgratefulto Rick Harbaughfor makingthesedataavailableto me.

11For thepurposesof thepresentdiscussion,we eliminatedcharacters,suchas � dong ‘east’,which donotbelongto oneof thefivecategoriesin Table2.4. Invariablysuchcharactersareold constructionsthatarenotbuilt outof their supposedcomponentsby any productive process.

2.3. PLANAR ARRANGEMENTS:EXAMPLES 51

left of thecatenationoperatorin theformulae.

(b) f�' n f , if any of the following is thedeterminingcomponent:( , ) , * ,+, ! , ,.-0/1 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , " , # , $ , % , & .

(c) f:' ~ f , if any of thefollowing is thedeterminingcomponent:; , < , = ,>, ? , @ , A , B , C , � , D , E .

(d) fF' � f , if any of thefollowing is thedeterminingcomponent:G ,� -0/�1 1 , ý,ý ,12 H , � , I.Jp¸LK , M , N , O , P -0/1 1 , ; -Q/�1 1 , � .

(e) fR' d , if any of thefollowing is thedeterminingcomponent:� , S , T, U.��� � ( �Òy SURROUND } ), V

(f) OtherwisefR' r fThe currentrulesetis able to analyze88% of the 2,596characters,leaving the

remaining12%to belexically specifiedexceptions;examplesof 196of theanalyzed,and 38 of the unanalyzedcharactersare given in Appendix 2.A. Although furthertuningof the rulescouldundoubtedlyincreasethecoveragefurther, it mustbebornein mind thatsomeamountof lexical specificationis alwaysgoingto benecessary:forexample, W y EYE } and X y MANG vZY\[�wëv�]_^ `\a\` } cancombinein two ways,yieldingthecharactersb mang ‘blind’ and c mang, a variantspellingof b .

Oneinterestingobservationaboutthesedatais thatif oneconsidersonly thechar-actersfor which the phoneticcomponentis a perfectpredictorof the pronunciation(includingtone)of thewholecharacter— 624charactersin thisset— theaccuracy oftherulesetpresentedincreasesto 92%.Whatthismeansis that“regularlypronounced”charactersarealsomoreregular in structure.This result is perhapsnot surprising:itparallelscommonlyobservedpatternsin morphologywheresemanticallytransparentandmorphologicallyproductive constructionsalsotendto be the onesthat aremorephonologicallyregular. Presumablythis increasein regularityfor phonologicallyregu-lar compoundsis usefulfor theChinesereader. As wediscusselsewhere(Section5.2),thereis psycholinguisticevidencethatChinesereadersmakeuseof thephoneticcom-ponentof characterswhenthey areuseful; in order to be ableto reliably locateandidentify thephoneticcomponent,it helpsif theplacementof this componentis moreregular, andlesssubjectto idiosyncraticlexical specifictions.In Appendix2.A, regu-larly pronouncedcharactersareindicatedwith ‘ d ’.

Oneadditionaltopic thatWang(1983)discusses,andwhichweneedto provideanaccountfor is what he termsclassifierraising: in somecasesthe semanticclassifierof thephoneticcomponentis “raised” to becomethe semanticclassifierof theentirecharacter. This is a relatively rarephenomenon,restrictedin part to spellingvariantsof certaincharacters.Howeverthereareacoupleof phoneticcomponentsthatseemtoundergothisprocessasamatterof course.Onesuchcomponentis thetopportionof ey INSECT+TENG f [ëvºwg]h^ iZaji } teng ‘serpent’, k y SILK+TENG f [ëv�w�]h^ ila\i } teng ‘bind’or m y HORSE+TENG f [�v�w�]h^ iZaji } teng‘mount’. Thephoneticcomponentin eachcaseconsistsof the portion above the semanticradical, plus the lefthandcomponentn ,which is at leaststructurallythe phoneticcomponent’s own semanticradical. This

12I.e.,doubleýpo INSECT q .

52 CHAPTER2. REGULARITY

n hasbeenraisedto becomethe semanticradicalof the whole character. In orderto accountfor this in the currentformalism, we needto assumea rule suchas thefollowing,whichreassociatesthecomponentsof characterscontainingthisphonetic:13

(2.9) rRs�tvuxwFy\z|{�fj �n f ¢4ø}n r f�  n f:rRst~u�wFylz�{ë¢2.3.5 A counterexamplefr om Ancient Egyptian

Thehieroglyphicwriting systemof AncientEgyptian (Gardiner, 1982;Ritner, 1996)presentsonecasethatis problematicfor regularity. In Old KingdomEgyptian,pluralnumberwas indicatedin the orthographyby the doublecopying of the baseword,or a portion thereof. The copying could be implementedin variousways includingduplicatingtheentirephonographicallywrittenwordasin (2.10),duplicatingtheentirelogographas in (2.11) or duplicatingonly the semanticclassifierof the word as in(2.12):

(2.10) nr

rnw ‘names’

(2.11) ntr ntrw ‘gods’

(2.12)

n

h

w t

TREE nhwt trees

Theproblemlies in thefact that theEgyptianplural wasmorphologicallymarkedbysuffixation, -w with masculinenounsand -wt with femininenouns.14 This suffix -wt is actually spelledout in the orthographyin the representationin (2.12), thoughit wasnot in generalrequiredto spell out the plural suffix. Sincethe orthographicduplicationdoesnot representany linguistic duplication,it mustbe partof the map-ping betweenlinguistic representationandorthography, or in otherwordsit mustbehandledby �Pÿ���� r � . Sinceregular relationsarenot powerful enoughto handlear-bitrary copying, this orthographicpracticestandsasa counterexampleto theclaim ofRegularity.

It is interestingto notethat this copying wasonly usedduring the Old Kingdom(prior to 2240BC); by the Middle Kingdom it hadbeenreplacedby a logographic

13In thisandsimilarcases,then,weactuallyneedto delveslightly deeperthanthetop-level componentialanalysisthatwehave dealtwith in this section,andat leastrecognizeasecondtier of structure.

14Thesamepoint canbemadefor theorthographicrepresenationof dualnumber, which wasalsorepre-sentedby copying, in thiscaseasinglecopy. As in thecaseof theplural, thedualwasencodedmorpholog-ically via suffixation.

2.4. CROSS-WRITING-SYSTEMVARIATION IN THE SLU 53

symbolinvolving threestrokes(Gardiner, 1982,page59),apparentlyderivedfrom theoriginal repetition.An exampleis givenin (2.13):

(2.13) ntr Pl ntrw gods

Oneassumesthat this device wasintroducedby the scribesasanabbreviatory aid tosave themwriting (or carving) threecopiesof a setof glyphs. But it is interestingthatthis abbreviatorydevicealsorenderedEgyptianorthographymoreRegular, in theformal sensedefinedhere.

Themostsimilar exampleI amawareof in a modernorthographyis theencodingof plurality in Spanishinitials by doublingtheletterscorrespondingto thepluralnoun:thusEstadosUnidos‘United States’is abbreviatedasEE.UU., andFuerzasArmadas‘armedforces’ is FF.AA.. But althoughthis processis productive in Spanish,sinceit only involvesdoublingindividual lettersof thealphabetit is feasibleto simply listall doubledletterswith an indicationthat thesedoublesareto be usedto abbreviateplurals.(Alternatively onecouldwrite arulethatcopiesindividual lettersof thealpha-bet,andthis rule couldberepresentedasa transducer:notethoughthatthis would becomputationallyequivalentto simply listing thedoubles.)

Apart from theSpanishexample,which is easilyhandledwithin the theory, I amawareof no othercasesof suchduplicationin amodernwriting system.It is temptingthereforeto concludethat the Egyptianexamplewasmarked,asthe theorypredicts,andthattherarity of suchcasesis aconsequenceof this markedness.

2.4 Cross-Writing-SystemVariation in the SLU

All of the writing systemsthat we have examinedin the previous sectioninvolvesyllable-sizedSLU’s. This is no accident.Writing systemswherebasicglyphsareor-ganizedinto syllable-sizedunitsseemto bequiteprevalentamongtheworld’swritingsystems,ashasoftenbeennotedin the literature: for instance,Faber(1992),assignsa nodein her arborealtaxonomyof writing systemsto what shetermssegmentallycoded, syllabicallylinear scripts,which includeHankulandDevanagari.

However largerSLU’s certainlyseemto bepossible.For example,Mayanwritingappearsto have hadanSLU at thelevel of theword or smallphrase.Mayancomplexglyphsweretypically arrangedin pairedcolumnsthatwereto bereadleft to right, topto bottom(Macri, 1996).Thus,thereadingorderof thefollowing example,would beA B C D E F:

A BC DE F

The basicreadingorderis thereforeleft-to-right, with eachline of text consistingoftwo glyphs.Within eachglyph, thearrangementof basicglyphswassomewhatfreer,

54 CHAPTER2. REGULARITY

with signsrunning“loosely from upperleft cornerto lowerright corner, with generousallowancesfor artistic convention” (Macri, 1996,page178). The complex glyph isthereforeclearlythegraphicunit correspondingto theSLU, but whatkind of linguis-tic unit is this? Thesingleexamplepresentedin Macri’s discussion(1996,page178)suggeststhat in many casestheSLU is a single— potentiallymorphologicallycom-plex, usuallypolysyllabic— word,but in somecasesit seemsto beanasmallphrase.Table2.5 lists eachof the linguistic elementscorrespondingto a singlecomplex g-lyph in Macri’ssampletext. Figure2.7illustratesthetwo-wordglyphyakch’ul ‘greensacred’.

element gloss sizeof elementy-ak’aw he.presents wordu-pi(s) his-cycle wordpixol hat wordu-ha(l) his-necklace wordu-tup his-earrings wordyax green wordu-kawaw his-helmet wordch’ok young.one wordKawil (aname) wordy-ak’a-w it.is.given wordu-sakhunal his-whiteheadband (adjective-noun)phrasechantun sky stone (noun-noun)phrasehunwinik onetwenty phrase?yaxch’ul greensacred (adjective-adjective)phrase?

Table 2.5: Linguistic units correspondingto Mayan complex glyphs, from (Macri, 1996);glossingconventionsfollow thoseof Macri. In somephrasalcasesit is unclearwhethertheunit in questionis really a constituent:suchcasesareindicatedwith a questionmark.

Needlessto say, a morethoroughsurvey of thedecipheredcorpusof Mayantextswill benecessaryto determinethemaximalsizeof theSLU in thatwriting system,andwhatconstraints,if any existed,on valid SLU’s.15 This small text does,however, atleastshow thattheSLU in Mayanis largerthana singlesyllable.

Not only is thesizeof theSLU writing-systemdependent,but it seemsalsoto beconstructiondependent:onefindscaseswherethecatenationoperatoris changedonlyin certainkindsof (local) constructions.Oneexampleof suchconstruction-particularreorderingis foundin AncientEgyptian.This is theexampleof “honorific inversion”(Ritner, 1996,page80)wherebytermsfor godsor kings wouldbewrittenbeforeterms

15Thedatathatwouldallow oneto investigatethisquestionalreadyexist aspartof theMayaHieroglyphicDatabaseProjectattheUniversityof California,Davis,Departmentof NativeAmericanStudies.I hadhopedto beableto examinesomeof thesedata,andhadonmorethanoneoccasionrequestedaccessto aportionofthedatabase.Unfortunately, theserequestshave led nowhere.Theresolutionof this questionwill thereforehave to wait until thesepotentiallyvaluabledataaremadeavailableto awider rangeof scholars.

2.4. CROSS-WRITING-SYSTEMVARIATION IN THE SLU 55

YAX

ch’u

li

Figure2.7: A two-word Mayanglyph representingthe adjective sequenceyax ch’ul ‘green

sacred’,after(Macri, 1996,page179),with thearrangement����� _ «Q¬ �j�F�Q� ­ «��Q�p® Perconvention,thecapitalizedglossrepresentsalogographicelement,andthelowercaseglossesphonographicelements.

thatthey logically follow.16 Thusthephrasemdwntr (wordsgod)‘god’swords’wouldbewrittenasin (2.14),with thelogographfor ‘god’ beingwrittenbeforetheduplicated(henceplural) phonographfor /md/:

(2.14) ntr md

This canbeaccountedfor if we assumethat theSLU canbea word or small phrase.In thatcasewe canassumethatthereis simply reversalwithin thatSLU of theoverallcatenationoperatorin usefor the text. Thus if the text is beingwritten from left toright, andthereforegenerallyusesthecatenationoperatorr f , wecanassumeaspecialprinciplelike thatin (2.15)thatin honorificcontexts implementsf as ~ f ratherthan r f :

(2.15) k�l&g�fðh�m¾�°k�l&g"m ~ fQk"l&h�m , if h is to behonored.

As Ritnernotes,theEgyptianexampleis really no differentfrom the situationinmany modernwriting systemswith currency amounts.ThusconsiderEnglishexam-pleslike y $1000} for onethousanddollars or moresignificantly y $1 million } foronemillion dollars. The “symbolic abbreviation” (to coin a term) ‘$’ for dollar(s) iswritten beforethe numberphrasewhich it logically follows. (Note that the currencyterm that is “moved” may itself becomplex: onehundredUS dollars canbe writtenas y US$100} .) Thispromptsa construction-particularstatementsimilar to (2.15)inEgyptian:

(2.16) k�l&g�fðh�m¾�°k�l&g"m�~\fQk"l&h�m , if:

16Additional rearrangementsof symbolsfor artistic reasonswerealso found (Ritner, 1996). Plausiblysuchcasesaredueto stylistic considerationsandthusfall outsidetherangeof thepresenttheory.

56 CHAPTER2. REGULARITY

¼ g is anumberphrase,and k�l&g"m startswith a digit, and¼ h is acurrency term,and k�lphqm is asymbolicabbreviationfor thatcurrencyterm.

The first constraintcapturesthe fact that symbolicabbreviationsfor currenciesmustprecedea number expressedas a digit: one does not find such expressionsas* y $twentyfive} . Indeedthis in turn suggeststhat the true sourceof the reorderingmayactuallybeasurfaceorthographicconstraintthatrequiresthecurrency symboltoimmediatelyprecedeadigit:

(2.17) If k�lph�m is a symbolicabbreviation for a currency term h , then k"l&h�m mustpre-cedea digit.

This constraintthenforcesthecatenationoperatorto be ~\f , asin (2.16).Notethough,in any case,thattheSLU mustbedefinedfor thisconstructionto bethewholenumber-plus-currency phrase,in orderto accountfor thedeviation from thenormalordering.

Theexamplesdiscussedin this sectionandelsewherein this chaptersuggestthatthereis quitea rangeof variationin thedefinitionof theSLU acrosswriting systems,andevenwithin differentcomponentsof thesamewriting system.Wewill leaveit asatopic for futureresearchto provideacompletetaxonomyof thepossibleinstantiationsof theSLU.

2.5 MacroscopicCatenation: Text Dir ection

. . . their mannerof writing is verypeculiar, beingneitherfromtheleft to theright, like theEuropeans;nor fromtheright to theleft, like theArabians;nor fromup to down,like theChinese;but aslantfromonecornerof thepaperto theother. . .

Swift, Jonathan.1726.Gullivers Travels:A Voyage to Lilliput , chapter6.

Many of theexamplesdiscussedin this chapterrelateto themicro-arrangementofgraphicalsymbols.Naturally, in additionto specifyinghow, for example,the glyphsarrangethemselvesinto orthographicsyllables,any writing systemmustalsospecifytheoveralldirectionof thescript. As Harris(Harris,1995,chapter19) usefullypointsout,thenotionof directionis morecomplex thanit first appearsto be. In characterizingthe default directionalityof text in English,for instance,the following specificationsneedto bemade:

¼ Eachline of text is composedof glyphsarrangedfrom left to right.

¼ Eachpageof text is composedof linesarrangedfrom top to bottom.

¼ Eachmultipage documentis (whencorrectlybound)composedof pagesboundon the lefthandside.

Thesespecificationsare of coursescript-dependent.The traditional statementsforChineserun asfollows:

2.5. MACROSCOPICCATENATION: TEXT DIRECTION 57

¼ Eachline of text is composedof glyphsarrangedfrom top to bottom.

¼ Eachpageof text is composedof linesarrangedfrom right to left.

¼ Eachmultipage documentis (whencorrectlybound)composedof pagesboundon therighthandside.

In principle thethreetypesof specification— “line” level, pagelevel, anddocumentlevel — arealsologically independentof eachother. Still, asHarris notes,thereareplausiblebiomechanicalor otherreasonsfor theeliminationof somecombinations.Soonceonehasfixedone’s scriptasrunningin horizontallines,theoptionof arrangingthoselinesfrom bottomto topseemsnotto begenerallyavailable,presumablybecausethe productionof suchtext would requireone to cover up what onehadpreviouslywritten.17 TheLilliputians’ diagonalarrangementsof lines is presumablydisfavoredsimply becauseit would forceconstantlychangingline lengthson arectangularpage.

Similarly, multipagedocumentbindingpracticesarenot independentof thedirec-tion of thescript. If one’sscriptrunsfrom left to right acrossthepage,thereis anaturaltendency to want to reada multipagedocumentfrom “left-to-right”, whereasif one’sscript runsfrom right-to-left acrossthepage(whetherright-to-left in horizontallinesasin Hebrew, or right-to-left in verticalcolumnsasin Chinese),thereis a tendency towantto readfrom “right-to-left”. So,holdinganEnglishbookwith thespinepointingaway from you, you startreadingon the leftmostpageandcontinueto the rightmostpage.For a Chineseor Hebrew book,holdingthebookin thesameconfigurationyoustartfrom therightmostpage.

Wewill havenothingfurtherto sayaboutbindingpracticeshere,but it will beuse-ful to dwell for amomenton Harris’ secondcharacterizationof text direction,namelythe arrangementof lines on a page. We have of coursedealt with the issueof themacroscopicarrangementof symbolswithin a line (or column)of text by assuming

a singlemacroscopiccatenationoperatorsuchas r�f or n f for a givenscriptor styleofwriting within a script. Oneis thereforetemptedto dealwith thearrangementof linesof text in a similar fashion.Suchanaccountfor Englishwould runasfollows:

(2.18) (a) ôpí��^�Ä�Gg v r fxg w r f¡ñ*ñ�ñ r fxg ò , for g a letter, symbolor space.

(b) �:��î6�Q�Gôpí��^� v n fXôpí��^� w n f^ñ*ñ�ñ n f9ô&í��^� òSoa line would beanarrangementof basicsymbolsusing r f anda pagewould bean

arrangementof linesusing n f .Theproblemwith this accountis thatit wouldappearto violateLocality: a line of

text doesnot constitutea SmallLinguistic Unit, andthereforetheswitchfrom r f to n fwould constitutea violation. This considerationwould appearto forceanalternativeview, onewhich alsohasthe benefitof beingmoreintuitively appealing.Underthisalternativeview, text is written on a virtual tapein onedirectiononly: in Englishthiswould befrom left to right, in Chinesefrom top to bottom.This tapeis then“pasted”

17Similar considerationspresumablyalsoaccountfor the extremerarity (as the survey in (DanielsandBright, 1996)shows) of scriptswherecolumnsrun from bottomto top.

58 CHAPTER2. REGULARITY

to aphysicalsurface,with theinevitableconsequencewith asufficiently long tapethatonewill run out of spaceon a line (or column)andhave to wrapthe tapeto thenextcolumn.

Thereareonly threereasonablewaysto performthiswrapping.Thefirst, practicedby everymodernwriting system,involves“cutting” thevirtual tape,andcontinuingonthenext line or columnnearwhereonestartedthepreviousline; see(a) in Figure2.8.Thesecondandthird methodsbothinvolve startingat thesideof thepagewhereonefinishedtheprecedingline, andheadingbackacrossthepage.In orderto do this,onemustbendthetapearound,with theimmediateconsequencethatthefaceof theglyphsmustalso be turnedaround; notethat the term “f ace” is suggestedby Harris (1995,page132). The mostcommonway to performthis bendingis to fold the tapeoveritself so that the glyphs— now runningin theoppositedirectionacrossthe physicalsurface— areflipped aroundthe vertical axis; seediagram(b) in Figure2.8. Thisis thestandardboustrophedonwriting foundin severalancienteasternMediterraneanscripts,all of which involve this changeof facearoundthe vertical axis; seevariouschaptersin (DanielsandBright, 1996). Theotherway to bend the tapeis to twist itaroundso that the glyphsrunning in the backwardsdirectionareupsidedown: this“invertedboustrophedon”systemseemsto be foundin only a handfulof scripts,onebeingtheEasterIslandrongorongoscript(Fischer, 1997b),andanotherbeingthean-cientItalianscriptVenetic(Lejeune,1974;Bonfante,1996).See(c) in Figure2.8;notethat rongorongoactuallyrunsfrom bottomto top acrossthesurface,ratherthantop

to bottomasdiagrammedhere.(Veneticapparentlyhadbothkindsof boustrophedon,eitherflipping the faceof the characterswhenswitchingdirection,or elseinvertingthem:(Lejeune,1974,pages180–181).)

It is importantto understandthatthevirtual tapemodel,andtheconsequencesthatfollow from it whichwehavejustseen,areadirectconsequenceof thetheoryadoptedhere:accordingto Locality onecannotmodelboustrophedonwriting asa line-by-lineswitchof catenationtype,any morethanonecanmodelthe arrangementof linesona pageby modelingwithin-line catenationas(e.g.) r f andacross-linecatenationas

n f . The flip of facein boustrophedonsystems,which follows from the virtual tapemodel, is thus effectively forced by the theory. A boustrophedonsystemthat doesnot involve a changeof facewould thusbe a problemfor the theory. It is thereforeinterestingto notethatsuchsystemsdo not appearto becommon,if indeedthey existat all: variousauthors,including Harris (1995)allude to the existenceof suchnon-flipping boustrophedonsystems,yet I have beenunableto find specificexamplesofthis phenomenon.

Nonetheless,onemustpoint out thathigherlevel script-dependentgeneralizationsareoften muchfreer thanthe micro-constraintswe have discussedelsewherein thischapter. Thus,with the exceptionof the relatively few Chinesecharactersthat havealternateforms, thereis generallyonly oneway to arrangethe componentswithin aChinesecharacter. Ontheotherhand,modernChineseallowstwo schemesfor arrang-ing the symbolson the page,the traditionalonedescribedabove, and the Western-influencedleft-to-right/top-to-bottomarrangementfound in English; the samefacts

2.5. MACROSCOPICCATENATION: TEXT DIRECTION 59

a)

This is a test of the emergency broadcast

system. This is only a test. Had this been a

real emergency, you would have been

This is a test of the emergency broadcast

system. This is only a test. Had this been a

real emergency, you would have been

c)

This is a test of the emergency broadcast

system. This is only a test. Had this been a

real emergency, you would have been

b)

Figure2.8: Threemethodsof wrappingthevirtual tape:(a) standardnon-boustrophedon;(b)boustrophedon;(c) invertedboustrophedon.

60 CHAPTER2. REGULARITY

holdfor otherEastAsianscripts.18 Onceonemovesoutof thedomainof printedtexts,otherarrangementsoften becomepossible. Shopsignsin Englishmay be arrangedwith thecharactersrunningfrom top to bottom,for instance,andmorenovel arrange-mentsarepossiblein othercontexts, as long assomenotion of sequentialityis pre-served.Thislooseningof constraintsasonemovesfrom themicroto themacrolevel ishardlysurprisingandhasanexactanalogin linguisticstructure:thesyntacticpossibil-ities for combiningmorphemeswithin wordsaregenerallyhighly constrainedacrosslanguages,andin many “fix edword order” languagesthe possibilitiesfor rearrange-mentsof wordsor phraseswithin sentencesarealso limited. Beyond the sentence,however, the interrelationbetweenunits (sentences,paragraphs,turns in a dialogue,etc.) is muchmorelooselyconstrainedby purelyformal linguistic considerations,andmuchmoregovernedby considerationsof how languageis used.Similarly, themacrolevel of written text is constrainedby considerationsof usageasmuchasby formalorthographicconstraints,a point thatHarris’ (1995)discussionbringsoutnicely.

18A thirdscheme,aSemitic-styleright-to-left/top-to-bottom arrangementisalsofoundin Chinese,thoughperhapsnot ascommonin ordinarytext astheothertwo.

2.A. CHINESECHARACTERS 61

2.A SampleChineseCharactersand their Analyses

The following pagescontaina randomlyselectedsampleof 194 Chinesecharactersthatareaccountedfor by (2.8),andanadditional38 thatarenot. In theanalyses,thedeterminingcomponentis listedfirst, after theequalssign,andbeforethecatenationoperator. Thesemanticcomponentis boxed.Thesymbol‘ � ’ markscharacterswherethe independentpronunciationof the phoneticcomponentis a perfectpredictor(in-cluding tone)of the pronunciationof the complex character. This is perhapsa morestringentdefinition of phonologicalregularity thanis strictly speakingnecessary, butit hastheadvantageof beingeasyto compute.

Notethatwhatis listedfor theincorrectlyanalyzedcharactersis thecorrectanaly-sis: thepredicted(but incorrect)catenationoperatoris shown in parenthesesaftereachexample.In theseexampleswe list thesemanticcomponentuniformly first sinceit isunclearin many caseswhatthedeterminingcomponentis.

62 CHAPTER2. REGULARITY

CorrectlyAnalysedCharacters

����� � ���������� �

����

�����   � ��¡��¢�� � � ��£��¤�� ¥ � ��¦��§�� � ����¨��©�� ª � ��«��¬�� ­

®��¯

��°�� � ����±��²�� � � ��³��´�� ª � ��µ��¶�� ¥ ����·��¸�� ­

®��¹

��º�� » ¼���½��¾�� ¿ � ��À��Á��  � ��À��Ã�� Ä � ��¡��Å�� Æ ����Ç��È�� ¥ � ��É��Ê�� Ë � ��Ì��Í�� ­

®��Î

��Ï�� ­®��³

��Ð�� Ñ � ��Ò��Ó�� Ô � ��Õ��Ö�� ×ÙØQÚ�Û Û

��ÝÜ

��Þ�� ¿ ����ß��à�� ¿ � ��á��â�� ã � ��ä��å�� æ � ��ç��è�� é ����ê��ë�� é ����ì

��í�� ¥ ���ïî��ð�� Ñ ���ïñ��ò�� æ � �ïó��ô�� × � �ïõ��ö�� ÷ ¼ �ïø��ù�� ú

®�Fû

��ü�� Ñ � �ïý��þ�� Ñ � �ïÿ����� � ���������� é � ��������   � ������� ���������� � �������� Ä � �������� � ����������   � �������� ­

®���

����� � ���������� � ¼ �������� é � �� ��!�� ¥ � ��"��#�� ú

®��$

��%�� � ����&��'�� ( ����)* � � � �,+- � � � ��«. � � � �0/1 � Æ � �,23 � 4 � �,56 � ª � �,78 � ª � �,9: � ¥ � �,2; � � � �,<

2.A. CHINESECHARACTERS 63

= � >@?BA C DFEG � 4 � ��HI � J � ��KL � ª ����MN � ª ����OP � ¥ � ��QR � ¥ � �ï«S � ¥ � ��TU � ¥ ���V/W � X

®�ZY[ � » ¼���\] � × � ��^_ � ª ����`a � b ����Tc �  � ��de � f � ��gh � � � ��9

i � \��kjl � m � ��no � � � ��pq � » ¼��ïÿr � » ¼ ��st � 4 � �Vuv � ( ����w

x � × Ø0Ú�Û Û��zy

{ � ª Ø0Ú�Û Û��}|~ � ª � ���� � ¿ ������ � ¿ ������ � ¿ � ���� � ¥ � ���� � � � ���

� � f � �,�� � Ë � �,�� � ­

®�}�� � � �,�� � � � �,�� � Æ � �,�� � 4 � �,�� � ( � �,�� � × � �,�� � ª � �,�� �   � �,¡¢ �  � �,£

¤ � ú®�}¥¦ � Ë � �,§¨ � � �,©ª � � �,�« � Æ � �,¬­ � Æ � �,®¯ � ª � �,�° � ª � �,±² � ¥ � �,³´ � µ � �,p

¶ � ú®�}·

¸ � ú®�}¹º �   � �,p

» � \��}¼½ � � � �,¾¿ � � � �,gÀ � � � �,Á � à D ¿Ä � � � �,ÅÆ � Æ � �,ÇÈ � Æ � �,É

64 CHAPTER2. REGULARITY

Ê � é ����ËÌ � 4 ����ÍÎ � 4 � ��ÏÐ � ª � ��ÑÒ � ¿ � �VÓÔ � ¥ ����ÕÖ � ×@ØÚÙÜÛ

���ÝÞ � Ë � ��ßà � Ë � ��áâ � æ � ���ã � ä ���ïóå � � ����Oæ � ç � ��èé � 4 ����ê

ë � ì®�Zíî � ª ����íï � ¿ � ��ðñ � ¿ � �ïî

ò � ­®�Zóô � æ � ��õö � � ¼ ��÷ø � Ä � ��ùú � ª ����ûüFý þ ÿ����

�Fý � ÿ�����Fý � ÿ ��Fý � ÿ���� Fý � � ����Fý � � ���� ý � ÿ ����Fý � ÿ ����Fý � � ���

�Fý � !�#"$Fý � !�#%&Fý ' ÿ!�#()Fý+* � � ,-Fý . ÿ �#/0Fý 1 ÿ �#23Fý � ÿ!�#45Fý 6 � �879Fý 6 � �8:;Fý � ÿ!�#<=Fý > ÿ �#�?Fý @BADCFE E G �8HIFý 1 ÿ!�#JKFý � � �8LMFý 6 � �8NOFý > ÿ!�#PQFý R !�#STFý U ÿ �#VWFý � ÿ!�#XYFý Z �#[\Fý 6 � �8]^Fý _ ÿ �#`aFý � ÿ!�#bcFý 1 ÿ!�#`dFý 1 ÿ �#efFý g ÿ �#MhFý R !�#ijFý � ÿ �#klFý monFp q rtsuFý v ÿ �#wxFý y ÿ!�#z{Fý+| � � }

2.A. CHINESECHARACTERS 65

~Fý�| � � ��Fý�| � � ��Fý�| � � �

66 CHAPTER2. REGULARITY

IncorrectlyAnalysedCharacters��� ý � G ���

(� �)

��� ý � ÿ��#�(G �)

��� ý � G ���(ÿ �

)��� ý v � ���

(ÿ!�

)��� ý , G ���

(ÿ!�

)��� ý , G ���

(ÿ!�

)��� ý � G ���

(ÿ!�

)�Fý ' G ���

(ÿ �

)�Fý ' G ���

(ÿ �

)�Fý ' G ���

(ÿ �

) Fý . G ��¡

(ÿ �

)¢Fý £ ÿ ��¤

(G �)

¥Fý ¦ G ��§(ÿ �

)¨ ý v G ��©

(ÿ �

)ªFý � G ��«

(ÿ��

)¬Fý ­ G ��®

(ÿ��

)¯Fý g G ��°

(ÿ��

)±Fý v � ��²

(ÿ��

)³Fý � � ��´

( ��

)µFý ´ G ��¶

(ÿ �

)·Fý . � ��¸

(ÿ �

)¹Fý ¦ ��º(ÿ��

)»Fý ­ G ��R

(ÿ��

)¼Fý � G ��½

(ÿ��

)¾Fý ¿ ��À

(� �)

ÁFý � ÿ ��Â(G �)

ÃFý v � ��Ä(ÿ �

)ÅFý Æ ��Ç(ÿ��

)ÈFý É G ��Ê

(ÿ��

)

ËFý � ÿ!�#Ì(G �)

ÍFý v � �8Î(ÿ �

)ÏFý É G �8Ð

(ÿ��

)ÑFý . � �8Ò

(ÿ �

)ÓFý � G �8Ô

(ÿ �

)ÕFý g G �8Ö

(ÿ �

)×Fý � G �8/

(ÿ �

)ØFý > G �8Ù

(ÿ �

)ÚFý É G �8g

(ÿ �

)

Chapter 3

ORL Depth and Consistency

In this chapterwe addressthesecondof thetwo proposalspresentedin Section1.2.3,namelyConsistency. We will first examinetheorthographiesof RussianandBelaru-sian,which form a near minimal pair from thepoint of view of thelevel of theORL.We will show thatasimplecoherentanalysisof thetwo systemscanbeobtainedif weassumethat in eachcasethe ORL is a Consistentlevel, the only differencebetweenRussianandBelarusianbeingthedepthof thatlevel.

We thenturn our attentionto English. In light of theanalysisof RussianandBe-larusian,how deepis the ORL for English,andcanoneassumeConsistency for theORL?Theevidencewe will examinesuggeststhatConsistency is possible.Not sur-prisingly, theanalysisis simplerif we assumea relatively deepORL, thoughperhapsunexpectedlythe evidenceis not asclearcutasin the caseof RussianversusBelaru-sian.

An apparentcounterexampleto Consistency is foundin theorthographicrepresen-tation of obstruantsin Serbo-Croatian:thesedataarediscussedin Section3.3, anddatafrom a smallphoneticexperimentarepresentedthatsuggestthat in fact thedataarenot a counterexample,but ratheroffer supportfor Consistency.

Anotherpotentialproblemfor Consistency wouldbeevidencethatthespellingof awordneedsto beconstructedin acyclic fashion,sincethiswouldseemto suggestthattheremight in effect be several “ORL’s” for a givenmorphologicallycomplex word,onefor eachcycle. A potentialexampleof this is discussedin Section3.4.

Finally, aswe discussedin the introduction,we assume,similar to Nunn (1998),that ÛoÜÞÝàß ÿâá can be split into two components,which we have termed Ûoãåä8æèçêéìëand Û+íïî ëèðñð . Onecomponentof the latterarewhatwe cantermsurfaceorthographicconstraints, andfor lack of a betterplaceto discussthem,we turn to shortdiscussionof this topic in Section3.5.

67

68 CHAPTER3. ORL DEPTHAND CONSISTENCY

3.1 Russianand Belarusian Orthography: A CaseS-tudy

Oneway to illustratethe functionality of the proposedmodelis to comparetwo lan-guagesthat have similar phonologies,but select different levels for the ORL. Analmostidealpair of languagesfor this purposeis RussianandBelarusian.Theselan-guagessharemany phonologicalfeatures,including strongvowel reductionin un-stressedsyllables,andpalatalizationassimilationin consonantclusters. The ortho-graphicrepresentationof thesephonologicalphenomenais, however, quitedifferentinthetwo writing systems.

3.1.1 Vowel reduction

Oneway in which theRussianandBelarusianorthographiesdiffer is in thetreatmentof the vowel reductionprocessknown in the Slavic literatureby the namesakan’je,ikan’je andjakan’je. We have alreadyseenaninstanceof akan’je — reductionof /a/and/o/ — in theexampleò8óõôöó�÷öøúù gorodaû /gü r ýÿþda/‘cities’ in Section1.2.1.Ikan’jeinvolvesthereductionof /(j)e/ and/(j)a/ aftersoftconsonants1 to / � /: two examplesare������� ù jazyk û /j � þz � k/ ‘language’,and � ô�� õó8÷ ù perevodû /p,� r, � þvot/ ‘translation’.(We usea “,” to denotea palatalizedconsonantin the phonetictranscription.) Thedetailsof Russianakan’je/ikan’jearewell-known (Wade,1992,pages5–7):� In pretonicposition(i.e. in the syllableprecedingthe lexically stressedsylla-

ble), word-initial in anunstressedsyllable,or word-final in anopenunstressedsyllable,underlying/o/ and/a/ (afterhardconsonants)arereducedto / ý /.� In all otherunstressedsyllablesunderlying/o/ and/a/ (after hardconsonants)arereducedto / ü /.� Underlying/(j)e/ and/(j)a/ (aftersoftconsonants)arereducedto / � / in unstressedsyllables.

Russianrepresentsneitherreductionprocessin its orthography, andso it seemsrea-sonableto suppose,asis typically done(cf. again(Cubberley, 1996)),thatRussianor-thographyis morphological in thesensethatit representsanunderlyingphonologicallevel — UL, thoughthis doesnot necessarilyrepresentthe mostabstractphonologi-cal level onecouldposit. The tablein (3.1) givesthe levelsof representationfor thewordsfor ‘cities’ and‘translation’. Here,andelsewhere, � denotestheactualsurfacephonemicrepresentation(i.e., thepronunciation):

(3.1)‘cities’ ‘translation’

ORL ( � UL) goroþda pereþvod� òÿóõô ó8÷ ø � ô�� õó8÷� gü r ýÿþda p� r � þvot1Russianand Belarusianphonemicallydistinguishsoft or palatalizedconsonantsfrom hard or non-

palatalizedconsonants.

3.1. RUSSIAN AND BELARUSIAN ORTHOGRAPHY 69

A setof spellingrules— Û ãåä�æèç éìë — includingtherulesin (3.2)(copiedin partfrom(1.5)) is sufficient to accomplishthismapping:2

(3.2) g � ò ù g ûo � ó ù o û������������! #"�$&%('*)�%,+r � ô ù r ûd � ÷ ù d ûa � ø ù aû������������! #"�$&%('-),%,+p � ù p ûe � úù eû������������! �+v � ù v û

Notetherestrictionson therewrite of vowels: /o/ and/a/ appearas ó ù o û and ø ù aûonly after hardconsonants( � �������! #"�$&%,'*),%�+ ); andin the majority of Russianwords,/e/ appearsas ù eû after all consonants,whereasin syllable-initial position/e/ (asopposedto /je/) appearsasthe non-palatal. , which we will notatehereas ù eû (seealsoSection3.5).

Belarusianalsohasakan’jeandikan’je — thelattercalled jakan’je, thatbehavevery similarly to their Russiancounterparts.However, unlike the situationin Rus-sian, Belarusianorthographygenerallyreflectstheseprocesses;see(Carlton, 1990,pages299–301).Therulescanbestatedasfollows(following Carlton):� In pretonicposition,or word-initial in anunstressedsyllable,underlying/e/and

/o/ arereducedto /a/.3� In all otherunstressedsyllablesunderlying/o/ and/e/ (after hardconsonants)arereducedto /a/.

(We returnat the endof this sectionto the caseof non-pretonicunstressed/e/ aftersoftconsonants.)Examples(from (Krivickij andPodluzhnyj, 1994,pages15,22))are /�0� ôtù vecerû / þv,et 1�" er/ ‘wind (noun)’, versus �(2 ô � ù vjatry û /v,aþ tr � / ‘winds’;3 óõò i4 ù nogiû / þnogi/ ‘feet’, versus3 ø�ò8ø ù nagaû /naþga/‘foot’; and 0!. ò54öø ù ceglaû/ þ t 1 eagla/‘brick’, versus0àø�ò#4 �,3�� ù cagljany û /t 1 aþgl,an

�/ ‘madeof brick’.

Similar to thetablegivenfor theRussiancases,we canassumethe tablesin (3.3)for theBelarusianexampleswe have just discussed.In this casetheORL reflectstheapplicationof vowel reduction,and � is effectively thesameastheORL:5

2While we expressthe ruleshere,andelsewhere,asa setof orderedrewrite rules, thereis, often nocrucialorderingto suchrules.Whenorderingis notcrucialthey arebestviewedasasetof paralleltwo-levelrulesin thesenseof (Koskenniemi,1983).

3It is unclearwhetherthis is really /a/ or somethingmoreakin to / ý /, detailedphoneticdescriptionsofBelarusianakan’jehaving provedelusive.

4TheRussiansymbol 687 i 9 , is notusedin Belarusian.Notealsothat ò , representedphonemicallyhereas /g/, is actuallyavoicedfricative, oftentransliteratedas/h/ (WaylesBrowne,personalcommunication).

5On the changefrom /t/ to /t : / in the form /;0� ô<7 vecer9 ‘wind’, which is also reflectedin theorthography, seeSection3.1.2. In the underlyingrepresentationswe assumean underlying/t/ for surface/t : /.

70 CHAPTER3. ORL DEPTHAND CONSISTENCY

(3.3)‘wind’ ‘winds’

UL þv,et,er v,eþ tr �ORL þv,et1 er v,aþ tr �� /�0� ô �,2 ô �� þv,et1 er v,aþ tr �

‘feet’ ‘foot’UL þnogi noþga

ORL þnogi naþga� 3 óõò i 3 ø�ò8ø� þnogi naþga

‘brick’ ‘madeof brick’UL þ t,egla t,eþgl,an

�ORL þ t 1 egla t 1 aþgl,an

�� 0!. ò54öø 0àø�ò#4 �(3��� þ t 1 egla t 1 aþgl,an�

Thespellingrulesnecessaryto mapfrom ORL to�

for Belarusianincludethosein (3.4)

(3.4) v � ù v ûe � ù eû=�����������! #">��%('*)�%,+t � 2 ù t ût 1 � 0 ù c ûr � ô ù r ûa � � ù ja û��,� �������! 5"?��%,'*),%�+� � � ù y ûn � 3 ù n ûo � ó ù o ûg � ò ù g ûi � i ù i ûa � ø ù aû=�����������! #"�$&%('*)�%,+e � . ù eû@�����������! #"�$&%('*)�%,+l � 4 ù l û

The encodingof consonantsandmostvowels is identicalto that in Russian:the on-ly differencesevident in thesecasesareencodingof /e/ following hardconsonants,which is .úù eû in Belarusian(which is generallydisallowed exceptin syllable-initialpositionin Russian),andthedifferentsymbolusedfor /i/.

It is worthnotingthat,at leastin someversionsof Belarusian,akan’jeandjakan’jeoccursnot only within wordsbut alsowithin clitic groups— andis likewisereflectedin the orthography. Thus in the text of (Lyosik, 1926),onefinds examplessuchasthe following, involving the negative clitic /ne/, in many environmentswritten as 3 ù neû , but in the jakan’jeenvironmentwritten as 3�� ù njaû , evidently reflectingthe

3.1. RUSSIAN AND BELARUSIAN ORTHOGRAPHY 71

pronunciation/n,a/.Contrasttheexamplesin (3.5a)with thosein (3.5b):6

(3.5) (a) 3��A2 ´óB4�C � i ù nja tol’ki û ‘not only’ page75,833�� ˘D,E ´ ù njawseû ‘not all’ page853�� ı F ø#0�0àø ù njapısaccaû ‘is not written’ page109

(b) 3 HG#ø�ò54 ı ù nemaglı û ‘they werenot able’ page823 E 4 ø# ´�,3�E�� i ù neslavjanskiû ‘not Slavic’ page983 HI/�4öø�ô ´D,E�� i ù nebelaruskiû ‘not Belarusian’ page95

However, themorerecentdiscussionin (Krivickij andPodluzhnyj, 1994,page22)explicitly deniesthat Lyosik’s examplesarecorrect,andcontraststhebehavior of 3 ù neû ‘not’ and I/ � ù bezû ‘without’ asseparateclitics andasprefixes:

Regularly written with [ ù eû ] are the particle 3 [ ù neû ] and theprepositionI/ � [ ù bezû ]; if, however, they appearasprefixes,thentheyobey thegeneralrule of jakan’je: I/ � 4�J ÷ � ˘6 [ ù bezljudzejû ‘withoutpeople’],but I ��� 4�J ÷ 3�� [ ù bjazljudny û ‘unpopulated’],3 �FKG ø 2 [ ù nesmatû ] ‘not much’,but 3�� FKG#ø 2 [ ù njasmatû ‘a little’] . . .

Evidentlythis differenceis dueto a changein Belarusianorthographicnorms since Lyosik’s day (rather than being due to an actual changein the lan-guage).7 Krivickij andPodluzhnyj’s choiceof wording — “regularly written with . . . ” (“ ô� ò D 4 � ô 3 óL�6�F D(2BE;�NM ô� � ”) — certainlysuggeststhis possibility. Ifthis is the case,andthis doesreflecta spellingreform in Belarusian,it is interestingto notethatit is, consistentwith themostrecentDutchspellingreform,whichwe willdiscusslateron in Chapter6, a reformthatfavorsmorphological,ratherthanphoneticregularity: asa resultof this orthographicprinciple, 3 ù neû ‘not’ and I/ � ù bezû‘without’ arespelledthe same(at leastwhenthey areusedasseparatewords),eventhoughthey maychangein pronunciation.However this maybe,themostdirect im-plementationof the versionof Belarusiandescribedby Krivickij andPodluzhnyj intermsof our model is to assumethat bez‘without’ andne ‘not’ aresimply lexicallymarkedto alwaysbespelledwith úù eû .

Over andabove the spellingconventionsfor 3 ù neû and I/ � ù bezû , andde-spiteBelarusian’s generaltendency to have a shallow, phonemically-basedorthogra-phy, therearea few lexical exceptionsto theorthographicconventionsfor akan’jeandjakan’je. Krivckij and Podluzhnyj note that unstressedù eû is written in ÷ � � �POø 2/� ù dzevjaty û ‘ninth’ (from ÷ � � � 0�C ù dzevjac’ û ‘nine’), ÷ � E;�,2/� ù dzesjaty û‘tenth’ (from ÷ � E�� 0�C ù dzesjac’ û ‘ten’), andin someothernumerals.And etymo-logical ù o û can be found in unstressedsyllablesin loan words: 2/� G#ó 4 ¨ ò i M,3��

6Nonetheless,about25% of thirty oneexamplescollectedfrom Lyosik’s text seemto be inconsistentwith the correctapplicationof jakan’je for /ne/. For examplehe uses

3�� I � 4 óQ7 nja bylo 9 ‘wasnot’

(page63). Stressmustbeonthesecondsyllableof I � 4öó�7 bylo 9 ‘was’ (asit is in Russian),asevidencedby non-applicationof akan’jeto the/o/. Therefore,the/e/ of /ne/shouldin principlebewritten as R7 e9 ,sinceit is not in apretonicsyllable.

7It seemsplausiblethatLyosik’sorthographicschemewasin factexperimental,sinceBelarusianorthog-raphyhadprobablynotbeenstandardizedwhenLyosikwrote(ElenaPavlova,personalcommunication).

Also, it seemsthat thespellingsystemassumedby Krivckij and Podluzhnyj wasimposedby decreebyStalinin 1933,replacinganearlierpopularspellingsystem(Maksymiuk,1999).

72 CHAPTER3. ORL DEPTHAND CONSISTENCY

ù etymoljogicny û ‘etymological’. Suchcasesmustpresumablysimply be lexicallymarked. For example,for the word ‘nine’, we canassumea (partial) lexical repre-sentationas in (3.6a),whereonly the ù eû is lexically specified. (Recall that theorthographicspecificationsin the Russianexamplein (1.4) were redundant: we as-sume,in fact, that the orthographicrepresentationof òÿóõô ó8÷ ø ù gorodaû ‘cities’ isregularly derived.) The lexical specificationwill thencarryover to thederivedform‘ninth’, asshown in (3.6b):

(3.6) (a) SPHON T�þ UBV;WYX*Z\[�"^]#_a`Y"^bORTH c� X�d e

(b) SPHON T-U V W\X Z þ [�"f]B_g'hbORTH c; X d e

Theremainingcasesof potentialjakan’je,namelycaseswhere/e/ occursin non-pretonicunstressedsyllablesaftera palatalconsonant,areof uncertainstatus.In suchcases,Carltonnotes(1990,page300)thatBelarusian“specialistsdiffer . . . [s]omerec-ommend[ing]’a asthecorrectpronunciation. . . ”, othersrecommending/e/,or “evena vowel between’a and’e”. While thetwo unequivocalinstancesof akan’je/jakan’jearereflectedin Belarusianspelling,this latterinstanceis not. Thespellingthatis cho-sen— ù eû or � ù ja û — dependsuponthe spellingof the vowel in the root inquestionin a form wherethat vowel is stressed.Thus(to useCarlton’s examples),wehave 4� E ù lesû ‘forest’, 4 �,E;3 i � ù ljasnık û ‘woodsman’,but 4 E;3 i � ı ù lesnikiû‘woodsman’;but 0 � ij� ø5 8ø 2/� ù tsjazkavatyû ‘somewhat heavy’, derived from 25OE;�,ij� ø ù tsjazkaû ‘heavily’. In any case,the pronunciationof the boxed vowels isthesame,though,aswe have noted,Belarusianspecialistsdiffer asto what it shouldbe. On thefaceof it, then,we would appearto have a casewhereBelarusianspellingbehavesmorelike that of Russian,in representingan underlyingratherthansurfacevowel, somethingthatwouldappearto bein directviolationof Consistency.

However, a possiblesolution to this dilemmasuggestsitself: supposethat thejakan’je of non-pretonicunstressedpost-palatal/e/’s — call it “jakan’je-B” — is adifferent processthan the remainingcasesof akan’je/jakan’je(“(j)akan’je-A”), andsupposefurtherthatjakan’je-Boccurslaterthan(j)akan’je-A.Thenonecouldassumethat theORL for Belarusianrepresentsa stageat which (j)akan’je-A hasapplied,butbeforejakan’je-B hasapplied. What lendsplausibility to this suggestionis precise-ly the disagreementamongBelarusianspecialistsasto what thevowel in suchcasesshouldbe,which contraststo their (apparent)agreementaboutall otherinstancesofakan’jeandjakan’je. If this disagreementreflectsa truephoneticvariationin the im-plementationof jakan’je-B — one that is not in evidencefor (j)akan’je-A — thenit is quite possiblethat thesedo in fact representtwo stagesof vowel reduction,one((j)akan’je-A) which is firmly rootedin thephonologyof the languageandtheother,

3.1. RUSSIAN AND BELARUSIAN ORTHOGRAPHY 73

which is lessfirmly establishedandwhich is subjectto morevariationacrossspeaker-s.8 Furtherevidencefor this positioncomesfrom thehistoryandpresentdistributionof akan’jeand ikan’je in Russiandialects(Avanesov, 1974,andElenaPavlova, per-sonalcommunication). Ikan’je definitely postdatedakan’je in Moscow dialectsofRussian,duein part to the fact that the distinctionbetweenhardandsoft consonantshadnot stabilizeduntil quite late (14thcentury). In modernRussiandialectsthereisstill a greatdealof variationin ikan’je, comparedwith akan’je,suggestingthatevenin Russianthe two processesmay be at different levels of the phonology. The ulti-matecorrectnessof this suggestionnecessarilyawaits further study, but if it canbemaintained,thenthesefactsdo not constitutea counterexampleto Consistency.

3.1.2 Regressivepalatalization

AnotherdifferencebetweenRussianorthographyandat leastsome versionsof Be-larusianorthographyis in thetreatmentof regressivepalatalizationof consonants.

In Russian,a dentalor alveolar consonantbecomespalatalizedif the followingadjacentdentalor alveolar consonantis also palatalized. More specifically(Wade,1992,pages9–10):9� Dentalstops(/t/, /d/, /n/) becomepalatalizedbeforeapalatalizeddentalor alve-

olar: thus/dn,i/ ‘days’ is pronounced/d,n,i/� Alveolarfricatives(/s/, /z/) followedby a palatalizeddentalstop,alveolarfrica-tiveor lateral: thus/v ýõþzn,ik/ ‘arose’is /v ýõþz,n,ik/

We mayassumefor thesake of concretenessthat theassimilationinvolvesspreadingof thefeature[ � high] within thesequenceof consonants.10

While palatalizationis marked in the orthographyof Russian,thereis no specialmarkof thespreadingitself. That is, thefinal consonantof theclusteris orthographi-cally markedaspalatal,eitherby virtueof its occurringbeforeoneof the“soft” vowels— ù (j)e û , 6 ù i û , J ù ju û , ¨�ù jo û or � ù ja û , or elseexplicitly usingthesoftsignC ù ’ û . But consonantsinternalto theclusterarenot markedaspalatal.11 Thusin theword E;2 C ù jest’ û ‘is, are’, thefinal /t/ is orthographicallymarkedaspalatalwith thesoft sign C , but in fact theentire/st/ clusteris palatal:/jes,t,/. We assumethetablein(3.7) for this case:

8One might suppose,along theselines, that (j)akan’je-A is a lexical phonologicalprocess,whereasjakan’je-Bis apostlexical or phoneticprocess.

9Wayles Browne notes(personalcommunication)that this spreadingof palatalization acrossden-tal/alveolarconsonantclustersis a featureof olderdialects,andis becominglessprevalentin contemporaryRussian.

10Notethoughthatwordsadmitof alternatepronunciations,includingsomecasesthatdo not fall undertherubricof thetwo classeslistedabove: theseincludecaseslike /dv,er,/ ‘door’, whichmaybeeither/dv,er,/or /d,ver,/. Many suchexceptionscanbefoundin (Avanesov, 1983).

11Consonantsinternalto clusterscanbemarkedwith asoftsign,but in thatcasethey arelexically palatal,andthishasnothingto dowith theassimilationprocessthatwearediscussingnow: anexampleis

E�D ÷�C5I8ø7 sud’ba9 ‘f ate’

74 CHAPTER3. ORL DEPTHAND CONSISTENCY

(3.7)‘is, are’

ORL ( � UL) jest,� E;2 C ù jest’ û� jes,t,

Palatalizationandregressivepalatalizationin Belarusianis on thewholesimilar tothat of Russian(see(Krivickij andPodluzhnyj, 1994,pages55–57)),but thereareacouple of notabledifferences.Onedifferencein thepalatalizationprocessitself is thatpalatalized/t/ and/d/ becomepalatalizedaffricates,/t 1 ,/ and/dk ,/ respectively. ThusforRussian/d,at,ka/‘uncle’ we have in Belarusian/dk ,at1 ,ka/; for Russian/t,es,t,/‘f atherin law’, Belarusianhas/t 1 ,es,t1 ,/.

A seconddifferenceis thatdentalstopsin Belarusianregularlypalatalizebeforeapalatalized/v/. Thusalongsidethemasculine/neuterform /dva/ ‘two’, we have femi-nine/dk ,v,e/ ‘two’; alongside/catyry/‘four’, wehave thecollective form /cat1 ,v,ora/.

Onceagainunlike thesituationin Russian,Belarusianorthographicallymarksre-gressive palatalization,at leastin casesinvolving /t 1 ,/ and /dk ,/, which have a sepa-rateorthographicrepresentation,namely0mlnC/o ù c(’) û and÷ � lnC/oöù dz(’) û respectively.Thus/dk ,v,e/ ‘two (feminine)’ is written as ÷ � B ù dzveû , and/cat1 ,v,ora/‘four (col-lective)’ is writtenas M ø#0� ¨ ô ø ù cacvjoraû . ThusalthoughKrivickij andPodluzhnyjnotethattheeffectsof regressivepalatalizationarenot indicatedin writing (page56),this is not strictly correctsincein the caseof palatalized/t/ and/d/, theseareortho-graphicallymarkedasaffricates,thoughthey arenot followedby a soft sign C ù ’ û .However, for a form like ¨ E 0�C ù josc,û ‘is, are’, which is pronounced/jos,t1 ,/, thepalatal/s/ is certainlynotmarkedorthographicallyin any way. Theexplanationfor thedifferencein behavior is presumablythattheaffricates/t 1 / and/dk /, whetherpalatalizedor not have a standardorthographicrepresentationas 0où c û and ÷ � ù dzû ; whereas/s/, for instance,only hasonerepresentation,namely E ù sû , andthereis no separatesymbol for a palatalized/s/. But why is the soft sign not written cluster internal-ly?12 We canpresumethat in modernBelarusianorthography, a soft sign’s functionis merelyto marka clusterof consonantsasbeingpalatalized.Thuswe couldwrite arule,thatwould form partof the Û ãÞä�æèç éìë for Belarusian,andthatwouldsimply insertasoft signafterapalatalizedconsonant,whenever it is not followedby avowel (sincein thatcaseoneusesoneof thesoft vowel symbols,which implicitly markpalatality)or anotherpalatalizedconsonant:

(3.8) pq�rC��,� �������! #"?��%,'*),%�+ (# st���������! #"�$&%('-),%,+vuThe table in (3.9) gives the representationsfor ÷� 8ø ù dvaû ‘two’ (mascu-

line/neuter)and ÷ � Búù dzveû ‘two’ (feminine):

(3.9)‘two’ (masculine,neuter) ‘two’ (feminine)

UL dva dv,eORL dva dk ,v,e� ÷� 8ø ù dvaû ÷ � / ù dzveû� dva dk ,v,e

12Thatis, besidescaseswheretheconsonantis lexically markedaspalatal,asin Russian.

3.1. RUSSIAN AND BELARUSIAN ORTHOGRAPHY 75

This caseis interesting,becausethe representationof regressive palatalizationinBelarusianorthographywould appearto beprimafacieevidenceagainstConsistency:on theonehandregressive palatalizationis representedwhenit involves/t, d/, whichbecomeaffricates;on theotherhandit is not representedfor any otherconsonant.Butaswe cansee,the Inconsistency is only apparent:the reasonthat the soft sign is notusedto mark regressive assimilationin generalmerelyrelatesto thestatementof therule thatspellsout thesoft sign.

Interestingly, Lyosik’susageis againdifferentfrom thatof Krivickij andPodluzh-nyj. In Lyosik’s usage,in clustersof palatalconsonants,all assimilatedconsonantsaremarked with a soft sign. ThuswhereasKrivickij andPodluzhnyj have 0� ¨ ôõ÷ �ù cvjordy û /t 1 v,ordy/ ‘hard’, Lyosik writes 0�C# ¨ ôõ÷ � ù c’vjordy û ; for ÷ � /âù dzveû/dk ,v,e/ ‘two (feminine)’, Lyosik has ÷ � C5 / ù dz’veû ; for ¨ E 0�C ù josc’ û /jos,t1 ,/ ‘is,are’,hehas¨ E C#0�C ù jos’c’ û . This is readilyinterpretableasasimplificationof therulein (3.8),which canberewritten to describeLyosik’sspellingconventionsasfollows:

(3.10) pA�wCH�,� �������! #"?��%,'*),%�+ xzy(Thatis, thesoft signis insertedaftera palatalizedconsonant,exceptbeforeavowel.)Thetablefor thefeminineform of ‘two’ shown abovein (3.9)now becomes:

(3.11)‘two’ (feminine)

UL dv,eORL dk ,v,e� ÷ � C5 / ù dz’veû� dk ,v,e

3.1.3 Lexical marking in Russianand other issues

Despitethe regularity of Russianspelling, thereare caseswhereone must assumelexical markingof orthographicinformation,andwe will examinea coupleof thesehere.Wewill start,however, by consideringacasethatmightappearto involvemarkedorthography,but whichin factinvolvesmarkedphonology, theorthographyitself beingperfectlyregular.

In theSlavic-derivedvocabulary, sequencesof hardconsonantsfollowedby non-palatalized/e/ do not occur. Basicallyeitheronegetsa palatalizedconsonant(e.g.,ot’ec ‘f ather’, with a palatalized/t’/), or elseone finds a hard (in this casepartial-ly palatalized)consonant,followed by /je/: otjezd ‘departure’. Thereare howevera large numberof borrowed words that have suchsequences,particularly thosein-volving dentalconsonants.Considerthe following examples,in which thehardcon-sonanthasbeenunderlined:seks ‘sex’, test ‘test’, arterioscl’eroz ‘arteriosclerosis’,g’eteromorfizm‘heteromorphism’,dekagram ‘decagram’.In eachof thesecases,thevowel /e/ is spelledwith ù eû : thusdekagram is spelled÷� � ø�ò8ô ø5G ù dekagramû ,andseksis spelled E ��E ù seksû . From the point of view of the readerof Russian,thesecasesinvolveanirregularusageof thewritten vowel âù eû . But thereis in factnothingirregularin thespellinghere:whatis irregularis thephonology. Thefactthat

76 CHAPTER3. ORL DEPTHAND CONSISTENCY

thevowel is spelledwith ù eû follows from generalconstraintson Russianorthog-raphy. This is becausetheonly otherway thatthevowel couldbespelledwould beas. ù eû , but this vowel symbolis generallydisallowedin non-syllable-initialposition.As weshallsuggestin Section3.5,this constraintis bestexpressedasasurfaceortho-graphicconstraintof Russian.What this entailsthenis that theunusualphonologicalstructurein thecaseswe have beenconsideringarespelledin the only way they canbe,with ù eû . Notethatthespellingrulesthatwe presentedin (3.2)alreadycapturethesefacts,sincepostconsonantal/e/ is rewrittenby theserulesas úù eû .

Onegenuinecaseof orthographiclexical marking involvesthe genitive endings-evo and -ovo, which arealwayswritten òÿó ù egoû and óÿòÿó ù ogoû . Thusfor theword bol’sovo ‘big’ (masculine/neutergenitive singular)it will be necessaryto onlyspecifythatthe/v/ is writtenas ò ù g û .

Anotherandsomewhatmoretroublingcasethat requireslexical specificationun-der the presenttheoryareprefixessuchas raz-, or bez-, which assimilatein voicingto thefollowing consonant,following regularprinciplesof Russianphonology. Basi-cally, word final obstruentsin Russianarealwaysvoiceless,andobstruentsassimilatein voicing to a following obstruentin a cluster;this assimilationappliesacrosswordsaswell aswithin words.13 With oneclassof exceptions,assimilationanddevoicingof obstruentsis never reflectedin theorthographyof Russian.Thus òÿóõô ó8÷ ù gorodû‘city’ is thuswritten, even thoughthe final /d/ is actuallya /t/; similarly, the phraseI/ � àø54�C50àø ù bezpal’caû ‘without a finger’ is thuswritten eventhoughthefinal /z/of bezassimilatesin voicing to thefollowing /p/, andis thusreally /s/. Thesystematicclassof exceptionsto this generalizationaretheaforementionedprefixes(alongwithv(o)z-andiz-; see(Wade,1992,page16)). Thuswe have I/ � Iõó ij3�� ˘6 ù bezboznyj û‘godless’,but IB E àø54 � ˘6 ù bespalyjû ‘fingerless’; ô ø � ÷ D G{C# ù razdum’eû ‘thought’,but ô ø E �6 E;� ø ù raspiskaû ‘receipt’. Notethatthis exceptionalspellingis only foundwith prefixesendingin (underlying)voicedobstruents.Thosesuchass- thatareun-derlyinglyvoiceless,retaintheir expectedspellingswhenprecedingvoicedobstruents(in which casetheir final consonantsbecomephonologicallyvoiced): thus sdavat’/zdý vat’/ ‘to handin’ is written E ÷ ø5 8ø 2 C ù sdavat’ û , not * � ÷ ø# �ø 2 C ù zdavat’ û .

In additionto voicing assimilation,akan’jeis also representedin theorthographyfor theprefixraz-: indeed,theunderlyingform of raz-is really roz-. Thus,understressit is spelledô ó � ù rozû (or ôöó E ù rosû ): thus ôöó E�E�� �C ù rossyp’û ‘mineraldeposit’versusô ø E�E;� àø 2 Cúù rassypat’ û ‘spill, scatter’.Thealternationis notmarkedfor bez-.Thusoneneverfindsthespelling* I#6 � ù biz û : in I/ E àø#4 � ˘6 ù bespalyj û ‘fingerless’,thevowel in theprefix is / � /, not /e/,despitethespelling.

Thebehavior of theseprefixeswould appearto requirea relaxationof theConsis-tency assumption,sincethey wouldseemto involveanORL thatis muchlaterthantheORL thatwe have beenassumingfor therestof theRussianvocabulary. On the faceof it thefactsarereminiscentof Pesetsky’s (1979)unpublishedbut influentialanalysisof Russianlexical phonology, whereinhearguesthatprefixesarephonologicallyon alatercycle thansuffixes.Might it be,then,thattheorthographyof prefixesis similarlyhandledat a later level, andthusreflectsvowel andconsonantchangesthat have not

13Thefactsareidenticalin Belarusian,andthediscussionherewouldcarryover ratherdirectly to Belaru-sian.

3.1. RUSSIAN AND BELARUSIAN ORTHOGRAPHY 77

takenplaceat thepoint at which thespellingof stemsandsuffixesis handled?Underthisanalysiswe wouldhaveto abandonConsistency in favor of akind of ConstrainedInconsistency. The problemwith this move is that, aswe notedabove, the spellingirregularitiesareby no meanscommonto all prefixes: prefixesendingin voicelessconsonantssuchass- or ot- do not undergo thesechanges.Furthermoreot- is alwaysspelledó 2 ù ot û in Russian,evenwhenthe/o/ is reducedto / ü / by akan’je. Thustheinconsistency would beconstrainedindeed,applyingto just a few lexical entries.

Thecleanestsolutionto theproblemwithin thecurrentframework is to assumethatthis classof prefixeshasa specialsetof spellingrulesthat is sensitive to thevoicingof the following consonantand(for roz-) the stresson the prefix. Thus,we assumeaccordingto the Consistency hypothesisthat the ORL representationof I/ E àø#4 � ˘6ù bespalyjû is /bezpalyj/. The normal spelling rule for /z/ would of coursegive �ù z û , but if we assumea rule suchasthat in (3.12), the /z/ will insteadbe written Eù sû . Similarly, the ORL representationof ô ø E�E;� àø 2 C ù rassypat’û is /rossypat,/,andthespellingof /o/ as ø ù aû is accomplishedby therule in (3.13):

(3.12) z � E ù sû / [ $ voiced]

Condition:/z/ is in oneof theprefixesbez-, roz-, v(o)z-or iz-

(3.13) o � ø ù aû

Condition:/o/ is unstressedin theprefix roz-

Theunusualspellingof underlying/z/ and/o/ in thesecasesis thusdueto a form oflexical marking,this time a lexical conditionon a rule. We arethusableto preserveConsistency, thoughat thecostof a redundantspellingrule.

3.1.4 Summary of Russianand Belarusian

The previous discussionhasoffereda comparative analysisof a portion of RussianandBelarusianorthography, andtherelationshipof thoseorthographicsystemsto thephonologiesof thelanguages.Of course,a full evaluationof themodel’sapplicabilityto thesewriting systemsawaitsa completedescriptionof theorthography— aswellasa completedescriptionof the relevantphonologicalphenomena— somethingthatis certainlybeyond the scopeof this work. Nonetheless,the datapresentedhereareat leastconsistentwith theConsistency hypothesis,which is whatwe aimedto show.Thuswe concludethat theORL in RussianandBelarusianareConsistentlevels,andthatfurthermorethereis agreatsimilarity in thetwo systemsof spelling— Û ÜåÝàß ÿâá .ThemaindifferencebetweenRussianandBelarusianorthographylies in thedepthoftheORL.

78 CHAPTER3. ORL DEPTHAND CONSISTENCY

3.2 English

As we have seen,theorthographiesof RussianandBelarusianarebothquite regular(i.e.,in thesenseof “rule-governed”),theonly differencebeingin thelevel of abstract-nessof theORL: in Russian,onerepresentsin thewrittenform alevel of phonologicalrepresentationthat is closerto an “underlying” representation,thanwhat onerepre-sentsin Belarusian.Underthatassumption,relatively little orthographicinformationneedsto belexically marked.In contrast,if weassumedashallowerORL for Russian,thena largeportionof thevocabulary, particularlythosewordswith lexical /o/ or /a/,bothof which surfaceas/ ü / or / ý / in unstressedpositions,would requireorthograph-ic information to be marked. That is, if the spelling òÿóõô ó8÷ ø ù gorodaû is taken torepresenta phonologicalrepresentationsuchas/gü r ýõþda/,thenthe/ ü / andthe/ ý / willeachneedto be lexically marked asbeingwritten as ó ù o û , sinceeitheronecouldequallywell have beenwritten ø ù aû , yielding the samepronunciation.Of course,our assumptiondoesnot allow usto avoid lexical markingin Russian(or Belarusian)completely:for instance,we consideredthe irregularly pronouncedò ù g û /v/ in thegenitiveendings ò8ó / óÿòÿóâù egoû / ù ogoû . But suchitemswouldrequirelexical mark-ing of somekind in any case,sincethey patentlyfall outsidethe regular systemofRussianspelling.

Therelativelyclearstatusof Russianasa“deep”orthographybringsusto theques-tion of how to characterizetheorthographyof ModernEnglish,anotherphonographicwriting systemthathasbeendescribedas“deep”. Of course,evena cursoryknowl-edgeof Englishspellingleadsoneimmediatelyto the conclusionthat the systemofEnglishspellingis agreatdealmorechaoticthanthatof Russian,or indeedalmostanyotherlanguagethatusesa scriptwhoseoriginal designwaspurelyphonological,14 afactthathasnot goneunnoticedby scoresof spelling-reformersfrom theseventeenthcenturyto thepresentday(Venezky, 1970,pages19ff.). Nonetheless,this observationhasnotpreventedvariousauthorsfrom attemptingto find regularityin thesystem.Onesuchenterprisewastheclassicwork of Venezky (1970),who arguedthat therelationbetweenspellingandpronunciation— hewasprimarily interestedin themappinginthis direction,ratherthanthe otherway around— wasgovernedby clearsetsof or-deredrules.15 In Venezky’s system,a spellingsuchas ù socialû wasmappedto themorphophonemicrepresentationc sosIæld , by a setof grapheme-to-morphophonemerules(page46); andthenceto a surfacepronunciationby phonologicalrules.RulesinVenezky’s systemwerearrangedin orderedblocks. Thusoneblock statesthat initialù h û correspondsto morphophonemiccY| d in heir, (AmericanEnglish)herb, honest,honorandhour; medialpreconsonantal,andfinal ù h û is c�| d ; and ù h û is elsewherec h d (page74). Interestingly, Venezky saysrelatively little of asystematicnatureabout

14Oneexceptionis theorthographyof ManxGaelic,asystemthatis basedontheorthographyof English,andwhoseapparentarbitrarinessapproachesthatof Englishin somerespects;we will discussthewritingsystemof Manx lateron in Section6.1.

15Anotherimportantwork that presentsa rule-basedaccountof Englishspellingis Cumming’s (1988)treatmentof AmericanEnglishspelling. Cumming’s work is relatively unusualin the literatureEnglishorthographyin thathedeals,aswe do in this section,with theproblemof predictingspellingfrom pronun-ciationratherthanpronunciationfrom spelling;aswe have just noted,Venezky’s work, for instance,dealtwith theproblemof inferringpronunciationfrom spelling.

3.2. ENGLISH 79

the influenceof morphology: the vowel-shift-relatedalternationssuchas thoseex-emplifiedin extreme–extremity, andwhich wereto play so centrala role in the earlydevelopmentof generative phonology, were treatedonly briefly (pages108–109)inVenezky’sdiscussion.

It is temptingto classifyVenezky’s model in our termsasbeing onewheretheORL — hismorphophonemiclevel — correspondsmoreor lessto anunderlyingmor-phophonemicrepresentation.But this is not strictly accurate:Venezky wasoperatingwithin apre-generativeAmericanstructuralistsetof assumptions,16 within whichsuchnotionsasunderlyingrepresentationwerenot available.

Within the generative phonologicaltradition, however, suchnotionsas deeporsurfacestructureareexplicitly assumed(or at leastwere until abouta decadeago),and it was within this set of assumptionsthat arguably the most radical statementaboutthenatureof Englishspellingwasmade.Chomsky andHalle (1968) arguethatEnglishorthographyis, despiteappearances,a nearoptimal spellingsystem,the keyassumptionbeing that what is representedis not a surfacephonemicrepresentation,but rathera quite abstractlexical representation.The claim wasmadelargely on thebasisof suchalternationsasassign–assignation. For this pair, the surfacephonemicrepresentation/ ü þsaIn/–/asIgþneIsü n/ givesno clueasto why thereshouldbea ù g û inthespellingof assign, or why ù i û shouldrepresentsuchtwo quitedifferentvowels.The deeprepresentation,which would be somethinglike /æþ sıgn/ and/ }æsıgþnatyon/,respectively, makesthis clear, asbothformshave a /g/ (removedby subsequentrulesin assign), andbothformshavethesame/ı/, whichundergoesvowelshift in assign, andtrisyllabic laxing in assignation. Chomsky andHallegosofarasto claimthat Englishorthography, far from beingtheunfortunatesystemit is usuallytakento be,is in factcloseto anidealsystemof orthography. This is because“the fundamentalprincipleoforthographyis thatphoneticvariationis not indicatedwhereit is predictable”andthat“an optimalorthographywould haveonerepresentationfor eachlexical entry” (1968,page49) (alsocitedin (Sampson,1985,page200)).

Needlessto sayfew scholarsof writing systemswould agreewith Chomsky andHalle’s ratherprescriptive statementaboutorthographicprinciples,which is neitherobviously trueasa statisticalstatementaboutwriting systemscross-linguistically, norcanit betakenasanything otherthana statementof personaltasteabouthow writingsystemsshouldbedesigned.17 Furthermore,andmoredirectly relevantto ourcurren-

16Hedoesnot evenciteTheSoundPatternof English, eventhoughthatwork appearedtwo yearsprior tothepublicationof hismonograph.

17Having saidthat, it is certainlytrue that thereseemsto be a “tension” betweenwhat onemight termphonological faithfulnesson the onehand,and morphological faithfulnesson the other. That is, writingsystemsoften facea choicebetweenrepresentinga word in a form that is representative of its (surface)pronunciation,andrepresentingthemorphemesof aword in a fashionconsistentwith theirspellingin otherrelatedwords. This is hardly a new observation, andlinguistshave for decadesmadethis observation invariousforms. Russianorthographycanbe said to have addressedboth problemsratherelegantly in thesensethat morphologicallyrelatedforms — at least thosethat are relatedby fairly regular and generalphonologicalalternations— areconsistentlyspelled,yet going from the spelling to the pronunciationisalsoquitestraightforward— givenof coursethatonehascertainlexical informationin hand.

Onemight be temptedto statethe notionsof phonologicalfaithfulnessandmorphologicalfaithfulnessassoft constraints,andexplain variousspellingsby meansof differentrankingsof theseconstraintsin themannerof Optimality Theory: indeedwhatwe have termedmorphologicalfaithfulnessis quitesimilar in

80 CHAPTER3. ORL DEPTHAND CONSISTENCY

t discussion,several authorshave taken issuewith the specificclaim aboutEnglish.For example,Sampson(Sampson,1985)notesseveral ratherseriousproblemswithChomsky andHalle’sposition.Onesignificantproblemis thatevenif Englishspellingis assumedto representa deeplevel, it canhardlybesaidto beconsistentin its repre-sentation.For instance,while assign–assignationshow retentionof thesamespellingacrossdifferentderivedforms,someotherpresumablymorphologicallyrelatedpairsdo not, even in caseswherethereis no changein pronunciation:so considerthe al-ternationin spelling for the vowel /i/ in ù speechû – ù speak û ; or the alternationinspellingfor the pair ù collideû – ù collision û , wherethe pronunciationof underlying/d/ as/z/ shouldbepredictablein thismorphophonologicalenvironment,sooneoughtin principle to be able to spell the word collision as ù collidion û (Sampson,1985,page201); or considersuchminimal pairs as ù raceû – ù racial û versus ù spaceû –ù spatial û , wherethe phonologicalalternation(/s/ versus/s/) is identical,asarethemorphologicalenvironments,andyet in onecasethespelling ù c û is retainedin the-ial derivative,whereasin theotherthe/s/ is spelledas ù t û .

After discussingvariousotherapproachesto Englishspelling,Sampsonproposes(page203):

We may seeanotherkind of methodin the apparentmadnessof ourspelling,though,if we avoid letting ourselvesbeobsessedby thephono-graphicoriginsof theRomanalphabetandthink of Englishspellingasatleastpartly logographic.

The proposalthat Englishspellinghaslogographicpropertiesis certainlya widelyexpressedone: for example,as Bloomfield notes(Bloomfield and Barnhart,1961,page27) (usingthetermword writing for logographic):

Now someonemay askwhetherthe spellingof knit with k doesnotserve to distinguishthis word from nit ‘the egg of a louse’. Of courseit does,andthis is exactly whereour writing lapsesfrom the alphabeticprinciplebackinto theolderschemeof wordwriting. Alphabeticwriting,which indicatesall thesignificantspeechsoundsof eachword, is just asclearasactualspeech,whichmeansthatit is clearenough.Word writing,on theotherhand,providesa separatecharacterfor eachandevery word. . .Ourspellingtheverbknit with anextrak (andthenounnit without thisextrak) is astepin thedirectionof word writing.

While thereis certainlysomemerit in thisview, I feel thatit is importantto distinguishwriting systemswith a truelogographiccomponent,suchasChineseor AncientEgyp-tian, from theratherhaphazard“pseudologographic”propertiesof Englishwriting.

On the onehand,many of the logographiccomponents(the so-called“semanticradicals”) of Chinesecharactersseemto representsemanticaspectsof morphemesin a surprisinglyconsistentway. As we have notedelsewhere(Sproatet al., 1996),many semanticradicalsin Chinesearequite consistentin the semanticinformation

spirit to paradigmuniformity, a principle that hasbeenproposedasa phonologicalconstraintby Steriade(1999)andothers.I think, however, thatthese“principles” maybea little too vagueandfuzzy to allow forthis treatment,at leastatpresent.

3.2. ENGLISH 81

thatthey mark. Thusin thelists presentedin (Wieger, 1965,pages773–776),254outof 263(97%)characterswith thesemanticradical

U ù INSECT û denotecrawling orinvertebrateanimals;for ~ ù GHOST û (page808),21 out of 22 (95%)denoteghostsor spirits. The semanticinformationprovidedby theselogographicelementsis thusstrikingly consistent.As wealreadyproposedin Section1.2.2,for aChinesewordlike� ù INSECT+CHAN û chan ‘cicada’, the insectradical

U ù INSECT û representsaportionof thesemanticfeaturesetfor themorpheme,whereasthephonologicalportion(in thiscase�tù CHAN û ) representsthepronunciationassociatedwith themorpheme.

On the other hand, it is hard to find any suchconsistency in the “logographic”aspectsof Englishorthography:for instance,thesetof wordsorthographicallydistin-guishedfrom otherwordsby aninitial silent ù k û — knit, know, knight, knave. . .—formsno naturalclass,andthemostwe cansayaboutthe ù k û hereis thatit is anor-thographicelementwith no correspondingphonologicalelement.Thustheword knitmight be representedasin (3.14),repeatedin part from (1.6a): noteagainthatwhiletherepresentationof /n/ as ù kn û is idiosyncratic,thetheremainingspelling ù it û isin thiscasepredictablegiventhephonologicalform, anddoesnotneedto bespecified:

(3.14) SPHON T���X-Z I _abORTH cY�/� X�dPe

It would of coursebecompletelyunmotivatedto saythatthe ù k û , or ù kn û herecorrespondsto any non-phonologicalcontentof thefeaturestructureof thisword,andsowereally havenoparallelhereto theChinesecase.

All of which still leavesuswith thequestionof how bestto characterizeEnglishorthographyin termsof themodelwe aredeveloping. The interestingquestionfromourperspectiveis nothow to dealwith irregularitiessuchasthe ù k û in knit, but ratherhow to characterizethephonologicallevel of representationthat is representedby theregularspelling.In our terms:how deepis theORL for Englishspelling?A definitiveanswerto this questionwould requirea completeanalysisof the spellingof a largeportion of the Englishvocabulary — somethingakin to Nunn’s (1998) treatmentofDutchspelling,onewhich systematicallyanalyzeshow well thestandardorthographyof, say, AmericanEnglish,correspondsboth to the standard(surface)pronunciationandto a plausibleproposalfor theunderlyingrepresentationof eachword.18

In this sectionI will describea smallanalysisof a kind thatshouldeventuallybedoneon a more completescale. I selected1169 words from an on-line dictionarywith their (American)spellingsandphonemicrepresentationfor a standardAmerican

18Alternatively one might considera data-orientedapproachfor measuringthe relation betweenthespellingandvariousproposalsfor theORL. VandenBoschandhiscolleagues(vandenBoschet al., 1994)investigatedthreedata-orientedlearningmethodsfor measuringtherelative complexity of English,FrenchandDutch orthographies— morespecificallythe complexity of the relationbetweenthe spellingandthesurfacephonemicrepresentationthatonewould find in a dictionary. They proposedtheinverseof thevari-ousmodels’performanceasa measureof thecomplexity of eachsystem.Of course,oneweaknessof theirapproachfrom our point of view is that they took it for grantedthat the surfacephonemicpronunciationis the correctlevel to relatethe spellingto. Our thesisthat the depthof the ORL differs amongdifferentlanguagessuggeststhattheirassumptionis notnecessarilyvalid.

82 CHAPTER3. ORL DEPTHAND CONSISTENCY

pronunciation.Thesewordsconsistmostlyof forms thatareat leastin partLatin- orGreek-derivedandshow alternationsof thekind thatwerecentralto theargumentsforvowel shift, laxing,velarsoftening,andsomeotherconsonantalternationsin (Chom-sky andHalle,1968)(SPE).Thuswefind exampleslikeabound–abundanceor helio-centric–heliocentricity–heliocentricism. In additionto the dictionary-derivedsurfacephonemicrepresentation,I alsoreconstructedanSPE-styleunderlyingrepresentation.Thustheunderlyingformsof aboundandabundanceareassumedto be,respectively,/æþbund/ and /æþbundans/;similarly the, underlyingforms of electric andelectricityare,respectively, / � þ l � ktrIk/ and/ � l � k þ trIkIti/. Hereandelsewhere/u/ is usedto repre-sentunderlying/u/ that diphthongizesto /aU/; tense/u/ as in super, which doesnotdiphthongize,is representedwithout the overbar;/k/ representsa /k/ that undergoesvelar softening(similarly /g/). WhereI perhapsdepartfrom SPEis in only positingabstractformsin caseswherethereis plausibleevidencefor analternation.Thussincethereis no evidencefor the first /s/ in cervix alternatingwith anything else,I repre-sentthis word underlyinglyas/ þs� rvIks/ (but notethepenultimate/k/ asevidencedbythe form cervicitis, wherethe /k/ hasundergonevelar softening). As a generalrule,I assumethat thereareno schwasin underlyingrepresentation,only full vowels: s-incethepositedfull vowelsaregenerallypositedon thebasisof theorthography, thisnecessarilybiasesthe analysis,and this point shouldbe bornein mind. The com-pletelist of wordsalongwith their surfaceandpositedunderlyingforms is given inAppendix3.A.1at theendof this chapter.

Whatotherassumptionsdo we thenneedto make in orderto predictthespellingfrom eachof theselevels?More specifically:� Whatrulesdo we needto assumefor Û+ÜåÝàß ÿâá ?� And whatlexical markingof orthographicpropertiesmustweassume?

Needlessto say, therearevariouswaysin which onecould juggle thesetwo kindsofdevices,andwhatI presenthereshouldbeunderstoodasjust beingonepossibleway,andnotnecessarilythebestone.

Lexical specificationsneededfor eachform are given in Appendix 3.A.1. Inthatlist, orthographicspecificationsaregivenby subscriptedlettersin anglebrackets.Thus: /æþbu ���,� ndans/denotesthe fact that the /u/ is spelledas ù u û . The notation� � E � denotesa casewhereletter ù l û correspondsto no phonologicalmaterial. (Insomecasesonecouldalternatively havecoalescedtheaddedletterwith thespelloutofa precedingor following phoneme,aswasdonewith ù knit û above.) Thetwo othermaindevicesarethe (subscripted)feature[ � db], usedto marka consonantthat is toberepresentedorthographicallyasdouble;19 andthefeature[ � gk], which is usedtomarkwordsthathave the Greekspellings ù phû for /f/ and ù rh û for (initial) /r/. Insomecases,particularlywith plural +es, morphemeboundaryinformationis neededin orderto predicttheappropriatespelling;themorphemeboundaryis markedas‘+’.Thesecondandthird columnsof thetablein Appendix3.A.1 constituteproposalsfordeepandshallow ORL’s for theseEnglishwords.

19It might bebelieved thatdoubledconsonantsin Englishspellingarepredictable,but in fact this is notthecase.It is certainlytruethatin generaladoubleconsonantis anindicatorthattheprecedingvowel is lax(cf. (Venezky, 1970,pages106–107),inter alia). But theimplicationdoesnotgo theotherway.

3.2. ENGLISH 83

The(ordered)rulescorrespondingto thedeepandshallow ORL’saregivenin Ap-pendix3.A.2 andAppendix3.A.3 respectively. In many casesthe import of the par-ticular rule shouldbeclear: in caseswhereit is not,somecommentaryis added.Bothof theserulesetshave beentestedwith their correspondingORL’s, to verify that therulesetsappliedto theirORL’sdo indeedderive thecorrectspellingsfor all words.20

To illustratethedifferencebetweentheassumptionof adeepversusshallow ORL,considerthe word audacity, the AVM representationsof which aregiven in (3.15a)(deep)and(3.15b)(shallow). In this word, only a shallow ORL would requirea lex-ical marking,namelythespecificationof ù c û for thespellingof the/s/, indicatedinAppendix3.A.1asa subscriptedù c û .

(3.15) (a) SPHON T��FþdakIti bORTH c d e

(b) SPHON T�� þdæsX Z Iti bORTH c cX d e

Therulesneededto accountfor thespellinggiventhesetwo ORL’sarelistedin (3.16).The numbersin the last two columnsindicate the rule numberfor the DeepORL(Appendix3.A.2)andShallow ORL (Appendix3.A.3),respectively. Notethatin somecasesslightly differentrulesareneededfor thetwo ORL levels.No rule is neededforthe ù c û spellingfor theShallow ORL, sincethis spellingis lexically specified.

(3.16) Spelling Rule DeepRule# Shallow Rule#ù auû : � � ù auû 1 2ù d û : d � ù d û 22 24ù aû : a � ù aû 45 –

æ � ù aû – 48ù c û : k � ù c û 21 –ù i û : I � ù i û 52 57ù t û : t � ù t û 33 35ù y û : i � ù y û / # 58 50

We turn now to a discussionof the fragmentpresentedin Appendix 3.A. Firstof all, thereis a clear differencein the numberof rules neededin eachcase,with58 rulesfor the deepORL, and69 for the shallow ORL. (Someof theserulescouldhave been combined;this would changetheoverall counts,but not the relative sizesof the two sets.)More interestingarethe lexical markingsgiven in Appendix3.A.1.We discountthe[ � gk] and[ � db] markings,which aregenerallyneededundereitherassumptionof thedepthof theORL.For thedeepORL,389(33%)of thewordsrequire

20Thesystemwasdevelopedandtestedusingthe lextoolsfinite-statelinguisticanalysistoolkit developedat AT&T Bell Labs,anddescribedin (Sproat,1997b;Sproat,1997a).

84 CHAPTER3. ORL DEPTHAND CONSISTENCY

lexical marking,with 509totalmarksbeingneeded.For theshallow ORL, in contrast,892/1169(76%) of the words requiresomelexical marking, with 1452 total marksbeingused. So the shallow ORL is certainlya morecostly assumption,particularlywith respectto theamountof lexical marking,but alsoto someextentwith thenumberof requiredrules.This muchsupportsChomsky andHalle’s position.Howeverwhenoneconsidersthe distribution of the marks,the situationis lessconvincing. The tenmostcommonlexical specificationsin theshallow ORL,covering1311/1452(90%)ofthecases,aregivenin Table3.1.Similarly, thetenmostcommonlexical specificationsin thedeepORL, covering453/509(89%)of thecasesaregivenin Table3.2. Amongthe markingsfor the shallow ORL, four relateto the spellingof reducedvowels / ü /,and/I/ (as ù eû ); oneinvolvesthespellingof /s/ as ù c û asin electricity; two involvethe irregular representation(mostly in Greek-derivedwords)of /I/ and/aI/ as ù y û ;oneinvolveswriting /i/ as ù i û (ratherthanthemorenormal ù eû ). Finally, we havespecificationsof /k/ as ù chû (in Greekwords)and/z/ as ù sû . Of these,five do notoccurin someform amongthe top tenfor thedeepORL markings:the four reducedvowel marks,and the specificationof /s/ as ù c û . In contrast, ù y û spellingsforvarietiesof /i/, ù chû spellingsfor /k/, ù sû for /z/ and ù i û for /i/ are neededaslexically specifiedmarkingseven underthe assumptionof a deepORL. The needtomarkthespellingof reducedvowelsundertheassumptionof ashallow ORL is courseunsurprising:I believe it is necessaryto assumethat in English,as in Russian,theORL correspondsat least to a phonologicalrepresentationthat containsfull ratherthanreducedvowels. Similarly, the necessityof markingthe ù c û spellingof velar-softening-derived /s/ in a shallow ORL would appearto be someevidencein favorof an SPE-styledeeplevel for the ORL. On the other hand,someother aspectsofdeepstructure,which wereimportantin the analysisin SPE,turn out to have muchlessimportancethanonemight expect. It makes little difference,for example,thatthereis avowel alternationin thepairchaste–chastity, analternationthatis abstractedawayfrom in theunderlyingrepresentation:in thecaseof adeepORL, theunderlyingvowel of the stem/a/ is mappedto ù aû ; with a shallow ORL, we simply have rulesthatmapthetwo distinctvowels/æ/and/eI/ to ù aû . No lexical markingis required.Similarly, while an alternationlike assign–assignationdoesrequirelexical markingfor the shallow ORL (sincewe must simply mark the fact that in the word assign,thesequence/aIn/ is spelled ù ign û ), thereareonly six suchcasesin our list, not anoverwhelmingamountof lexical marking.

Interestingly, oneclearcasethatrequireslexical markingwith adeepORL but notwith ashallow ORL involvesalternationssuchasabound–abundance. Theunderlyingrepresentationof thesecondvowel in this pair of wordsis presumablyuniformly /u/,soonewouldexpectaconsistentspelling— e.g. ù ouû . Yet in thiscase,thespellings,whichalternatein parallelwith thephonologicalvowelalternationsaremoreconsistentwith a shallow ORL thanwith a deepORL, contraryto what onemight expectfromSPE.

What do we concludefrom all of this? Thereseemsto be someevidenceforthe EnglishORL beingrelatively deep,somethingthat is hardly surprising. On theother hand,with the exceptionof ù c û in velar softeningcases,the considerationsthatfiguredprominentlyin Chomsky andHalle’sdiscussionof Englishspellingdonot

3.2. ENGLISH 85

Phoneme Orthographicmark Numberof cases/ ü / ù o û 395/s/ ù c û 242/ ü / ù aû 170/aI/ ù y û 123/I/ ù y û 112/I/ ù eû 67/i/ ù i û 63/ ü / ù i û 61/k/ ù chû 47/z/ ù sû 31

Table3.1: Thetenmostfrequentlexical markingsfor theshallow ORL in theEnglishfragment.

Phoneme Orthographicmark Numberof cases/ı/ ù y û 180/s/ ù c û 84/I/ ù y û 70/k/ ù chû 47/z/ ù sû 27/u/ ù u û 12/i/ ù i û 9/k/ ù k û 8/i/ ù eû 8/u/ ù euû 8

Table3.2: Thetenmostfrequentlexical markingsfor thedeepORL in theEnglishfragment.

in fact seem to be of suchgreatimportance.Onefurther point shouldbe borneinmind: we have consideredabout1100words,carefully selectedto exhibit the kindsof alternationsunderdiscussionin SPE.This is hardlya representative sampleof theEnglishvocabulary, eitherin termsof theraw count,or in termsof thepropertiesthewordsexhibit. Indeed,an examinationof a larger fragmentof the vocabulary wouldprobablymake theargumentfor a deepORL lessconvincing: themajority of Englishwordssimply do not participatein the kinds of alternationsexhibited by the subsetconsideredhere.

To reiteratethecaveatsthatwehavealreadypresented,onenaturallymusttaketheanalysispresentedherewith at leasta small grain of salt: in particularif onemakesdifferentassumptionsaboutthe underlyingrepresentations,thenonewould arrive atdifferent results. Still, I shouldbe surprisedif they turnedout to be too different,and I would expect the basicconclusionto remainthe same:with the exceptionoftheorthographicrepresentationof reducedvowels,which is moreelegantlyhandledifoneassumesa relatively deepORL, theevidencefor a deep“morphological”ORL in

86 CHAPTER3. ORL DEPTHAND CONSISTENCY

Englishis equivocal.

3.3 The Orthographic Representation of Serbo-Croatian ConsonantDevoicing

An interestingprima faciecounterexampleto theConsistency hypothesisis found inSerbo-Croatian,andinvolvesthespellingof dentalobstruentsbefore/s/and/s/.21 Ac-cordingto thestandarddescriptionof Serbo-Croatian,obstruentclustersagreein voic-ing,with thevoicingof theclusterbeingdeterminedby thefinal memberof thecluster.Thusalongsidesvezati‘to bind’, onefindssveska ‘notebook’;besideredak‘line’, onefindsthegenitivesingularform retka. This muchis in commonwith otherSlavic lan-guagessuchasRussian.What is unusualaboutSerbo-Croatianis that thesevoicingassimilationsarereflectedin theorthography, so thata /b/ thathasbecomedevoicedto a /p/, for instance,is writtenas ù p û ratherthan ù b û . ThemodernSerbo-Croatianorthography, due to Vuk Karadzic(1787–1864),is often cited as an instanceof a“shallow” orthography(seeChapter5 for somefurther discussionof this point), andoneof thefeaturesof this “shallowness”is that it spellswordsaccordingto their sur-facephoneticrealization:in popularparlance,Serbo-Croatianis written“asit sounds”.

If this werethe entirestory, thenSerbo-Croatianwould be handleableunderthepresenttheorywithout further comment:the ORL would simply be a level at whichvoicing assimilationin obstruentclustershadalreadyapplied. Thereis, however, asystematicexceptionto the spellingprinciple we have just outlined: underlying/d/when followed by /s/ or /s/ retainsits spellingas ù d û , even thoughit is describedasbeingvoiceless:notethat in otherenvironments(e.g.,before/k/, or /p/) devoiced/d/ is spelledas ù t û ; andotherobstruentsbesides/d/ arespelledas their voicelesscounterpartsbefore/s/ or /s/. Thusprefix od- beforepad- “f all” yieldsotpad ‘trash’;srb- ‘Serb’ yields srpski ‘Serbian’. But grad- ‘city’ yields gradski ‘urban’, andod-plus steta‘damage’yieldsodsteta‘compensation’.

On thefaceof it, then,we wouldappearto havea problem:for mostobstruentsinmostenvironmentstheevidencewouldappearto favor placingtheORL afterobstruentvoicingassimilation.But just in casewehaveanunderlying/d/ precedingan/s/or /s/,weseemto needto placetheORL earlier. In orderto maintainConsistency, wewouldhave to resortto oneof two possiblestrategies,neitherof which is palatable:� Assumea late(postvoicing-assimilation)ORL, but markunderlying/d/ before

/s/or /s/with adiacritic,sothatthespellingrulescanseethefactthatit was/d/,andspell it accordingly.� Assumean early (pre voicing-assimiliation)ORL. In this caseonehasaccessto the underlyingsegments,so thereis no problemspellingunderlying/d/ asù d û in gradski. Unfortunately, however, in mostcasestheobstruentis spelledaccordingto its surfacephoneticrealization,meaningthat onewould in effect

21I amgratefulto WaylesBrownefor bringingthisexampleto my attention.

3.3. SERBO-CROATIAN DEVOICING 87

beduplicatingin theorthographicrules,theeffectsof voicing assimilationthatarealreadyhandledin thephonology.

The seeminglyexceptionalspellingof underlying/ds/ and /ds/ sequencesis notmerelya problemfor Consistency, however. It is, in fact, a puzzlemoregenerally.Why did Karadzicfail to spell thevoicedstopasits voicelesscounterpartin just thisone case? Could it be, in fact that suchunderlying/d/’s soundvoiced, despitethestandarddescription? If so, this would suggest,amongother things, that obstruentvoicingassimilationis notaunitaryphenomenon,but appliesto varyingdegreesunderdifferentconditions.

Supportfor thenon-uniformityof obstruentvoicing assimilationis alreadygivenby Browne(1993,page317),who notesthatassimilationto a voicedcluster-final ob-struent,andassimilationto a voicelesscluster-final obstruent,behave differentlywithrespectto thephonologicalrule of clusterbreakingin nominalgenitive plurals. Con-sonantsthat have becomevoicedby voicing assimilation,remainvoicedafter beingseparatedfrom theconsonantthattriggeredvoicing. Thusprimetiti ‘to remark’,yieldsprimedba ‘comment’(noun);in thegenitivepluralprimedaba, theacquiredvoicingonthe/d/ is retained.On theotherhandthedevoiced/z/ in sveska ‘notebook’,shows upagainasa /z/ in thegenitiveplural form svezaka. Evidently, [ � voiced]assimilationisdeeperin thephonologythan[ $ voiced]assimilationinsofar asa traditionalanalysiswouldordertheformerbeforeclusterbreaking,andthelatterafter.

To explain the orthographicfacts,however, we are interestedin an even finer-grainedquestion: is theresomereasonto believe that [ $ voice] assimilationis lesscompletein the sequences/ds/ and/ds/, thanit is in othersequences?As a prelimi-naryanswerto thisquestionweconductedapilot studyof [ $ voice]assimilationin thespeechof a singleCroatianspeaker. In this studywe addressedthe following ques-tions:

1. Are underlying/d/ and/t/ before/s/ phoneticallydistinct with respectto theirvoicingprofiles,contraryto standarddescriptions?

2. Are underlying/b/ and/p/ before/s/ (bothspelledù p û ) phoneticallydistinct?

3. How dounderlying/d/ and/t/ (bothspelledù t û ) beforeanon-sibilantobstruent– /k/ in ourdata– compareto thesestopsin thepre-/s/position?

Thestudyandits resultsaredescribedin thetwo sectionsthatfollow.

3.3.1 Methods and materials

A list of Croatianwordswaspreparedthatcoveredtheenvironmentsof interestfor thequestionsabove. Specificallythesewordscovered:

1. Underlying/d/ before/s/: akadski‘Acadian’,gradski‘urban’.

2. Underlying/t/ before/s/: anegdotski‘anecdotal’,hrvatski‘Croatian’.

3. Underlying/b/before/s/: arapski‘Arab’, mikropski‘microbial’, ropski‘slavish’,srpski‘Serbian’.

88 CHAPTER3. ORL DEPTHAND CONSISTENCY

4. Underlying/p/ before/s/: mikroskopski‘microscopic’

5. Underlying/d/ before/k/: glatka ‘smooth’, lutka ‘doll’, otpatke ‘refuse(noun)’,votka‘vodka’.

Thesewordswereprinted,alongwith filler material,in four iterations,eachwith adifferentrandomorder.

Thesubjectof the experiment— a researcherat AT&T Labs— is a malenativespeakerof a Dalmatiandialectof Croatian.Hehaslivedfor overa decadein English-speakingcountries,but hisCroatianspeechis self-describedasnormalfor thatdialect,andnot affectedby his exposureto English. He wasnot informedof the purposeoftheexperiment.22

The speaker wasasked to readthe wordson the printed list at a normal rateofspeed. His speechwas recordedto DAT using a Bruel and Kjær Microphonein aquiet room. The datawassubsequentlyuploadedto a Silicon Graphicsworkstation,andhigh-passfilteredat 40 Hz to remove low-frequency noise.Thespeechwasthensegmentedinto wordsusingtheEntropicResearchLaboratorieswaves� package.Pre-dictionof voicingwascomputedfor eachfile usingtheEntropicget f0 utility (Talkin,1995). Note that the voicing profile for a speechfile producedby get f0 is a timeserieswith two values,namely1 for voicedand0 for unvoiced. The individual fileswerethenhandlabeledusingwaves� andthexmarksutility for thefollowing features:� Onsetof thepre-/s/,or pre-/k/stop.� Offsetof thevoicingwithin thestop.� Onsetof thefollowing segment(/s/ or /k/).

The first andthird of thesewerelabeledbasedon visual inspectionof the waveformandthespectrogram.Thesecondwaslabeledbasedon thevoicing profile. A typicalwaveformandvoicingprofile for thewordgradskiis shown in Figure3.1.

3.3.2 Results

Thereareat leasttwo plausiblemeasuresof thedegreeof voicing of a stop,giventhevoicing profile: onemeasureis theabsolutedurationof theinterval betweentheonsetof thestop,andtheoffsetof thevoicing;anotheris theproportionor percentageof thestopthatis voiced.As it turnsout,bothmeasuresyield similar resultsin this study.

Let us dealfirst with the leastsurprisingresult: /d/, written as ù t û before/k/ isclearly voiceless,essentiallythroughout. The meanabsolutedurationof the voicedregionof thestopis 5 msec,andthemeanproportionof thevoicedis 0.06.Thustheseunderlying/d/’s really are/t/’s,hencetheir spelling.

Turningnow to thecaseof /p,b/before/s/ (bothwritten ù p û ), thefirst thing thatwe noteis thatvoicing is generallyfoundin, on average,thefirst 25 msecof thestop,which is greaterthantheamountwe observedin thecaseof /tk/. Betweenunderlying

22Thesubjectwasaskedto readthefirst pageof thetext asecondtimeat theendof therecordingsession,sothatwehave five ratherthanfour repetitionsof somewords.

3.3. SERBO-CROATIAN DEVOICING 89

Figure3.1: Waveform andvoicing profile for oneutteranceof gradski ‘urban’. The closurefor the /d/ is labeledas “dcl”, the voicing offset is labeledas “v”, and the start of the /s/ islabeledas“s”. Thevoicing profile is thethird plot from thetop in thesecondwindow, labeledasprob voice.

90 CHAPTER3. ORL DEPTHAND CONSISTENCY

d�

d�

d�

d�

d�

d�

d�

d�

d�

d�

t�

t�

t�

t�

t�

t�

t�

t�

t�

t�0.

00.

20.

40.

60.

8-dski (d) vs. -tski (t)

Pro

port

ion

of v

oice

d cl

osur

e

Figure3.2: Barplotshowing theproportionsof voicing for all samplesof underlying/d/ (blackbars),versus/t/ (shadedbars).

/p/ and/b/ therewasnosignificantdifference:for /p/ (4 samples)theaveragedurationof voicing 24 msec,for /b/ (18 samples)it was26 msec. A � -testshowed this smalldifferenceto be non-significant( �����(� �5�,�-�����,� �\� ). Looking at the proportionofvoicing,wedofind amildly significantdifference:themeanproportionfor underlying/p/ was0.55andfor underlying/b/ was0.36( ���w�5� �#�,�������(� �#� ), but notethat thedifferenceis not in theexpecteddirectionsincetheunderlying/p/ behavesmorevoicedthanunderlying/b/ by this measure.This resultmight at leastin partbeexplainedbythe fact that the underlying/p/’s hadshorterdurations(mean44 msec)comparedtounderlying/b/’s (mean72 msec).If thereis a tendency to keepa constantdurationofvoicing,thiswouldresultin a largerproportionof voicing for theunderlying/p/ cases.All in all though,thereseemsto beno convincingevidencethatunderlying/b/ and/p/before/s/behavedifferentlywith respectto their surfacerealization.

With /t/ and/d/ before/s/ the story is very different. First of all, considerabso-lute duration,which averaged14 msecfor /t/ (10 samples)and34 msecfor /d/ (10samples).This differenceis significant: �{���,� �\���-�¢¡@�,� �5�B£ . Theproportionof voic-ing alsoshows a significantdifference,with a meanof 0.25 for /t/, and0.46 for /d/:�¤�@�(� �,�5���¥¡=�,� �#£ . Theproportionsof voicing for all samplesof /d/ and/t/ areshownin thebarplotin Figure3.2.While thereis clearlyoverlapbetweenthetwo categories,theconclusionseemsunequivocal: contraryto thestandarddescription,/d/ before/s/in words like gradskior akadskihasa greaterpropensitytoward beingvoiced than/t/ in the sameposition. This is differentfrom the casewith /d/ before/k/, which is

3.4. CYCLICITY IN ORTHOGRAPHY 91

unequivocallyvoiceless,andit is differentfrom thecaseof underlying/p,b/before/s/,wherewe foundnoreliabledifferencein voicingbehavior.

Should/d/ in wordsgradskibe consideredvoicedasopposedto voiceless?Thisdependsuponwhatonemeansby “voiced”. In Serbo-Croatian,voicedobstruentclus-tersshow clearvoicing throughout,whereas/d/ in gradskiisnever voicedthroughout. Perhapsfor this reasonthis underlying/d/ shouldbe con-sideredvoiceless. But no matterwhat the correctanswerto that questionmay be,onepoint seemsunequivocal from thedatawe have presentedhere: [ ¦ voice] assim-ilation is not a simpleacross-theboardphenomenon.It happensto differentdegreesin differentenvironments. Evidently, it appliesin the leastcompletefashionwhere/d/ precedes/s/, andthis fact is reflectedin the orthography:such/d/’s soundmorevoiced,andhencearewrittenas ¡ d § ’s.So,far from beingaproblemfor Consistency,Serbo-Croatianlendsratherdetailedsupportto thenotionof a uniformORL.

Needlessto say, theresultsof this preliminaryexperimentneedto becorroboratedby a more thoroughstudyof a wider rangeof speakers. Nonetheless,I believe theburdenof proof now lies with thosewho would standby the traditionaldescriptionwhich presentsobstruentclusterdevoicing in Serbo-Croatianasa simpleacross-the-boardphenomenonthatappliesequallyin all cases.

Onefinal questionneedsto be addressed:is it possiblethat the speaker in thisexperimentwasinfluencedby theorthography, andwasthusproducingspellingpro-nunciations?More generally, might literateSerbo-Croatianspeakersbe influencedintheir applicationof obstruentdevoicing by thevery spellingthatwe areattemptingtoexplain?This would suggest,then,thatwhile in Karadzic’s time thevoicingassimila-tion wascomplete,dueto his (onceagain,peculiar)spellingof underlying/d/ as ¡ d §before/s/, subsequentgenerationsof speakershave beeninfluencedby the spelling,andnow differentiatethedegreeof assimilationaswehaveobserved.This is certainlypossible,but if it is so,thenonceagainit would appearto supportthenotionof Con-sistency: onemight inventawriting systemthatfails to observeConsistency, but therewill bea strongtendency on thepartof usersof thatsystemeitherto adjustthewrit-ing systemto make it moreConsistent,or elseto (unconsciously)adjusttheir speechto bring it more in line with the orthographicrepresentation.In any case,then, theConsistency hypothesisappearsto be supportedby the Serbo-Croatiandatawe havepresentedin this section.

3.4 Cyclicity in Orthography

Traditionalmodelsof Generative Phonology, including classicalSPE-stylephonolo-gy, andlater morearticulatedtheoriessuchasLexical Phonology(Mohanan,1986),includethe familiar mechanismof cyclicity. Phonologicalrules that apply cyclical-ly do so by applying in tandemwith the morphology, so that a setof phonologicalrulesis appliedaseachaffix is attached.Cyclicity is not in favor muchin present-dayphonologicaltheories.

We are interestedhere, though,not in cyclicity in phonologybut rather in or-thography. Perhapsnot surprisingly, given the dearthof formal analysesof ortho-

92 CHAPTER3. ORL DEPTHAND CONSISTENCY

graphicsystems,very little evidencehasbeenadducedin the literaturefor cyclicityin orthography. There is however one suchpotential instancein Dutch, discussedby Nunn (1998, pages102–103)23 involving the interactionbetweenOrthographicConsonantDegemination,andOrthographicSyllabification.OrthographicConsonantDegemination,roughlyspeaking,simplifiesdoubledconsonantsthatoccurwithin thesameorthographicsyllable: thusverbrand+d (burn+ed)‘burnt’ (adjective) is spelled¡ verbrand§ . OrthographicSyllabificationis arelatively complex rule in Nunn’sanal-ysis, but oneresultof the rule is to split up intervocalic geminateconsonantsif therighthandmemberof the pair can possiblybe syllabified to the right: thuswasster‘washerwoman’ is syllabifiedas [was] ¨ [ster] ¨ . Nunn givesa numberof argumentsthatdespitetheir similarity to phonologicaldegeminationandsyllabification,theset-wo processesarein factorthographicallybased.I will not repeatherargumentshere,but referthereaderto herdiscussion,in particularin Chapters3 and5.

As exampleslikewasstershow, OrthographicSyllabificationcanblockConsonantDegemination:sincethe two ¡ s§ ’s areseparatedinto two syllables,the rule of Or-thographicConsonantDegeminationis nolongerapplicable.Ontheotherhand,formslikewijste‘wisest’ which is morphologicallywijs+st+e (wise+Superlative+Inflection)show that in somecasesSyllabificationseemsnot to block Degemination:hereonewould have expectedthe syllabification[wijs] ¨ [ste] ¨ , andthe spelling* ¡ wijsste§ .Nunn suggeststhat suchexamplescanbe handledif we assumethat SyllabificationandDegeminationapplycyclically. Thusin wijste, ontheinnercyclewehavewijs+st.Syllabificationhasnothingto dohere(therebeingonly oneorthographicvowel, name-ly ¡ ij § ), andDegeminationappliesto yield wijst. On thenext cycle e is added,andSyllabificationappliesto yield [wij] ¨ [ste] ¨ .

Is cyclic applicationin orthographya problemfor Consistency? It would be ifonecould show that the orthographiccycleswerebuilt in tandemwith phonologicalcycles.In thatcase,onecouldno longerspeakof aconsistentlevel for theORL: rathertherewouldbemultiplelevels,onefor eachcycle. However, Nunn’sevidencedoesnotseemto requirethisassumption:preciselybecausewearedealingherewith thecyclicbehavior of two orthographic rules, we have no evidencefor a crucial dependenceuponphonological cyclicity.

Nunnassumesthatherphoneme-to-graphemerules— the first stagein themap-pingfrom theORL — mapfrom asomewhatabstractrepresentationof morphological-ly complex words:herpresumedunderlyingspelling ¡ wijsste§ , canonly bederivedfrom a phonologicalrepresentationwhereonerepresentsboth the /s/ of the root andthe /s/ of thesuffix ([[[we Is]st] © ] ) ratherthana moresurfacephonologicalrepresen-tation that representsthe effectsof phonologicaldegemination(/weIst© /). We couldthereforemapin onestepfrom a phonologicalrepresentationincluding morphologi-cal constituency informationsuchas[[[we Is]st] © ] into anorthographicrepresentation[[[wijs]st]e] , which alsoincludesmorphologicalconstituency information. A cyclicapplicationof orthographicrulescould thenproceedon this orthographicrepresenta-tion, independentlyof whatevergoeson in thephonology. ThusNunn’sexampleneednot bea problemfor Consistency sinceunderthis scenariotheORL is indeeda single

23I thankAnneke Neijt for pointingmeto this example.

3.5. SURFACE ORTHOGRAPHICCONSTRAINTS 93

level of representation(in thiscaseaphonologicallyabstractone),andthecyclicity oftheorthographicrulesis entirelyinternalto theorthography.

It shouldbestressedthatNunn’scaseis theonly phenomenonthatseemsto requireacyclic treatmentin heranalysisof theorthographicsystemof Dutch,ananalysisthatincludesthirteenautonomousspelling rules (of which theseare two), anda coupleof hundredphoneme-to-grapheme rules. Theremay of coursebe further evidencefor cyclicity in orthographyin Dutch,or in otherwriting systems,but at presenttheevidenceis at bestsparse,andit is hardthereforeto concludemuchfrom it.

3.5 SurfaceOrthographic Constraints

While many aspectsof spellingarebestthoughtof in termsof a mappingfrom somelevel of linguistic structureto written form, thereareotherswhich seemto bepurelyorthographicin nature.Venezky (1970,pages59–62)termsthesegraphemicalterna-tions. Nunn (1998),aswe have alreadynoted,distinguishesphoneme-to-graphemeconversionrules(our ª�«�¬5­g®^¯?° ), andasetof purelyorthographicautonomousspellingrules(our ª²±\³�°µ´ ´ ). Nunnidentifiesanumberof phenomenain Dutch spellingthatshearguesarebestdescribedin termsof rulesthatreferonly to orthographicinformation.Onesuchphenomenoninvolvestheorthographicrepresentationof phonologicallylongvowels: thesearegenerallyrepresentedasdoublevowelsin closedsyllablesasin maan‘moon’, but assinglevowelsin opensyllables:manen‘moons’. Nunnstatesthis gen-eralizationasa rule thatdeletesthesecondof anidenticalpair of vowelsprecedinganorthographicsyllableboundary.

AssumingNunn’s analysisof Dutch vowel degeminationis correct,we mustas-sume,asindeedwehave,that ª ±\³�°µ´ ´ in generalimplementsarelation,sinceit includesrewrite rulesthatchangepropertiesof theorthographicrepresentation.However, thereis goodreasonto assumethatat leastsomepurely orthographicphenomenaarebestdescribedasconstraints.As wesuggestedin Section1.2.3.1,onecanthenview ª ±\³�°g´ ´asbreakingdown into twocomponents,oneconsistingof aregularrelation ª²±\³�°µ´ ´�¶¸· ³ ,andoneconsistingof a regular languageª²±\³�°g´ ´ ­g®f¬5¹µº�» . We will have nothingfurtherto sayabout ª²±\³�°µ´ ´ ¶R· ³ here:thereaderis referredto Nunnfor extensiveargumenta-tion for suchautonomousspellingrulesin Dutch.Ratherwe will focushereon a fewexamplesof surfaceorthographicconstraints.

A simpleexampleis affordedby the alternationof ¡ i § and ¡ y § in Malagasy.Both lettersrepresentthevowel /i/, but they arein complementarydistribution, ¡ y §occurringonly at theendsof words, ¡ i § only in non-finalposition.24 For example,from ¡ omby§ ‘cattle’, onecanderivethereduplicativeform ¡ tsiombiomby§ (achil-dren’s gamein which the childrenplay the role of cattle)(Rajemisa-Raolison,1971,page19),wherethefirst copy of thestemombyspellsthe/i/ with ¡ i § , sinceit is wordinternal.

24It is temptingto think that this purelyorthographicrestrictionmayhave beenborrowedfrom English,which hasthe samerestriction,at leastwhenyou discountwordsborrowed from Greek,Latin andothersources;seeVenezky’s discussion(page59). This is notanimplausiblesuggestion,sinceit wasBritish mis-sionaries,invited in 1817by King RadamaI, who introducedtheRomanalphabetto Madagascar, replacingtheolderArabic-derivedorthography.

94 CHAPTER3. ORL DEPTHAND CONSISTENCY

Sucha restrictionis easilymodeledby a surfacefilter (partof ª²±\³�°µ´ ´ ­g®f¬5¹µº�» ) thatdisallowsword-final ¡ i § andnon-word-final ¡ y § (thewordboundarybeingdenotedwith ‘#’):

(3.17) ¼R½n¾*¿¸ÀH¡�Á§�ÃQÄÆÅǾ*¿¸À�¡�Èɧ�¼¤ÃQĵÊAssumenow that ª�«!¬#­g®f¯?° containsthefollowing rule, which maps/i/ to either ¡ i §or ¡ y § :

(3.18) i ËÌ¡ i §ÎÍ\¡ y §Then the mapping ª²Ï!Ð�Ñ(Ò�ÓÔ�̪ «!¬5­g®^¯?°ÖÕ ª ±Y³�°µ´×´ will have the desiredresult ofmapping/i/ to ¡ i § and ¡ y § , andrestrictingtheseto theappropriatepositions.

Another straightforward exampleof a surfacefilter involves positionalvariantsof graphemes,which arefound in many languages;many writers would term theseallographs, thoughDaniels(1991a;1991b)hasarguedagainstthis termon theoreticalgrounds.Oneexampleis the f-lik e “long ¡ s§ ” which occurredexclusively in non-word-finalpositionin variousRomanscriptsdatingfrom theHalf Uncialsof thefourthcentury(Knight, 1996),aswell aslaterprintedforms;the“round ¡ s§ ” occurredonlyin word-finalposition. Clearly this distribution canbemodeledin exactly thesameway as the distribution of ¡ i § and ¡ y § in Malagasy. The identicaldistribution isfound with Greek Ø�¡ s§ (non-finalonly) and Ù (final only). Comparableexamplesarefoundin Hebrew; andin Arabic,whichhasinitial, medialandfinal formsfor mostletters.

It is alsopossibleto considertheprohibitionon internal ÚÛ¡ e§ in Russianto beasurfaceorthographicconstraint.However, thestatementof theconstraintis certainlymorecomplex thanthecasewe have examinedin Malagasy. For onething thestate-mentwould clearly have to restrict Új¡ e§ not to word initial position,but rathertosyllable-initial position,sincetherearenumerouscaseswhereonefindsword-internal,thoughsyllableinitial ÚH¡ e§ :

(3.19) Ü#Ý�Þ/ß!Ú;à�á;â�Þ/ã�ä/Ý<¡ antielektron§ ‘antielectron’å Ú;ã�ä/æ!à�ä�Þ8¡ aeroflot § ‘Aeroflot’ç�è Ú�ÞÉ¡ duet § ‘duet’é ß�ã è Ú�ÞÉ¡ piruet § ‘pirouette’

In somecases,asin thecaseof ‘antielectron’,thesyllableboundaryalsocorrespondsto a morphemeboundary, but this is not alwaysthecase,astheotherexamplesshow.The restrictionon the distribution of Ú¥¡ e§ can thusbe statedin termsof syllablestructure,andmore specificallyasa prohibition on Ú¥¡ e§ occurringanywherebutright-adjacentto anorthographicsyllableboundary.25 Evenso,therearestill lexicalexceptionswhich would have to bemarkedassuch,andwhich would have to beableto overridethis surfaceconstraint:acronyms, not surprisingly, regularly do so (e.g.

25As thereaderwill not fail to have observed,all of theexamplesin (3.19)areborrowedwords.But thisis not surprising,since Ú�ê eë — with theexceptionof somehigh-frequency wordssuchas Ú�Þ�äQê eto ë‘this’, andits derivatives— is mostlyfoundin borrowedwords

3.5. SURFACE ORTHOGRAPHICCONSTRAINTS 95ìHíïî ¡ nep § for novajaekonomiceskajapolitika ‘New EconomicPolicy’); andtherearea handfulof borrowedwordsthatdo not obey theprinciple (e.g. ã�Ú;â�á;ÞA¡ reket§‘racket’). It might bebetterthereforeto considerthis to benot anabsoluteconstraint,but rathera soft constraint,onewhich couldbe implementedwith a weightedratherthan unweightedfinite-stateacceptor.26 The constraint would allow non-initial Ú¡ e§ , but only at somecost. If a lexical item is marked ashaving ÚÉ¡ e§ in a non-syllable-initial position,thenit will be allowed. In all othercases,both non-initial Ú¡ e§ and á&¡ e§ will beallowed,but ÚP¡ e§ will notbeselectedsinceit will beamorecostlyanalysis.

26An alternative would beto assumepriority union(Karttunen,1998).

96 CHAPTER3. ORL DEPTHAND CONSISTENCY

3.A English Deepand Shallow ORL’s

3.A.1 Lexical representations

deep shallow

abound æðbund ©�ðbaUndabundance æðbu ñ�ò�ó ndæns ©�ðbô nd©\ñ�õ\ó nsacademe ðækæödem ðæk©Yñ�õ5óÆödimacademicals öækæðdemIkæls öæk©Yñ�õ5óÆðd÷ mIk ©Yñ�õ5ó lzacademicism öækæðdemI ökIsm öæk© ñ�õ5ó ðd÷ mI ösñ�ø�ó Iz© macademic öækæðdemIk öæk© ñ�õ5ó ðd÷ mIkacetone ðæsñ�ø�ó ÷>ö ton ðæsñ�ø�ó I ñ�ù;ó ö toUnacetonic öæsñ�ø�ó ÷>ð tonIk öæsñ�ø�ó I ñ�ù;ó ð tonIkacetylene æðsñ�ø�ó etI ñ(ú,ó ö len ©�ðsñ�ø�ó ÷ t © ñ(ú(ó ö linacetylenic æösñ�ø�ó etI ñ(ú,ó ð lenIk ©�ösñ�ø�ó ÷ t © ñ(ú(ó ð l ÷ nIkachondrite aðk ñ�øaû,ó ondrıt eI ðk ñ�øaû�ó ondraItachondritic ö akñ�øaû,ó onðdrıtIk öeIk ñ�øaû�ó onðdrItIkacidophile ðæsñ�ø�ó Idoö fıl ü ý�þ;ÿ�� ðæsñ�ø�ó IdoU ö faI l ü ý�þ;ÿ��acidophilic öæsñ�ø�ó Idoð fıl Ik ü ý�þ;ÿ�� öæsñ�ø�ó IdoU ð f I l Ik ü ý�þ;ÿ��aconite ðækoönıt ðæk©Yñ��#óÆönaItaconitic öækoðnıtIk öæk©Yñ��#óÆðnItIkactinomycete öæktInomi ñ�ú,ó�ðket öæktInoUmaI ñ�ú,ó�ðsñ�ø�ó itactinomycin öæktInoðmi ñ�ú,ó kIn öæktInoU ðmaI ñ�ú,ó sñ�ø�ó Inactinomycosis öæktInomi ñ�ú,ó ðkosIs öæktInoUmaI ñ�ú,ó ðkoUsIsadvocation öædvoðkatyon öædv© ñ��5ó ðkeIs© naeruginous ÷ ñ�õ�ù;ó ð rugInos I ñ�õ�ù;ó ð ruˇIn© saerugo ÷ ñ�õ�ù;ó ð rugo I ñ�õ�ù;ó ð rugoU

agnosticism ægðnostI ökIsm ægðnostI ösñ�ø�ó Iz© magnostic ægðnostIk ægðnostIkalbite ðælbıt ðælbaItalbitic ælðbıtIk ælðbItIkalcoholicity öælkohoð l IkIti öælk©Yñ��5ó h�Yñ��#ó!ð l Isñ�ø�ó Itialcoholic öælkoðholIk öælk©Yñ��5óÆðh�Yñ��#ó l Ikalkaline ðælkñ ÿ ó æö l ın ðælkñ ÿ ó ©\ñ�õ\óïö laInalkalinity öælkñ ÿ ó æð l InIti öælkñ ÿ ó ©\ñ�õ\óïð l InItiallophone ðælü ý���� oö fonü ý�þ;ÿ�� ðælü ý������ש ñ��#ó ö foUn ü ý�þ;ÿ��allophonic ö ðælü ý���� oð f ð onIk ü ý�þ;ÿ�� öælü ý������ש ñ��#ó ð fonIk ü ý�þ;ÿ��allotrope ðælü ý���� oö trop ðælü ý������ש ñ��#ó ö troUpallotropic öælü ý���� oð tropIk öælü ý������ש ñ��#ó ð tropIkammonite ðæmü ý������ oönıt ðæmü ý����ש ñ��5ó önaItammonitic öæmü ý������ oðnItIk öæmü ý����ש ñ��5ó ðnItIkamortization öæmortı ðzatyon öæm© ñ��5ó rtI ðzeIs© namortize ðæmorö tız ðæm© ñ��5ó r ö taIzanabolite æðnæboö l ıt ©�ðnæb©Yñ��5ó�ö laItanabolitic æönæboð l ıtIk ©�önæb©Yñ��5ó�ð l ItIkanecdote ðæn÷ k ödot ðænI ñ�ù;ó k ödoUtanecdotic öæn÷ k ðdotIk öænI ñ�ù;ó k ðdotIkangelic anðg÷ l Ik ænðˇ ÷ l Ikangel ð ang÷ l ðeInˇ © ñ�ù�ó lannounce æðn ü ý���� uns ©�ðn ü ý������ aUns

3.A. ENGLISHDEEPAND SHALLOW ORL’S 97

deep shallow

annunciate æðn ü ý������ u ñ ò�ó nsñ�ø�ó I ö at ©�ðn ü ý������tô nsñ�ø�ó i ñ � ó!öeItannunciation æön ü ý������ u ñ ò�ó nsñ�ø�ó I ð atyon ©�ön ü ý������tô nsñ�ø�ó i ñ � ó!ðeIs© nanorthite ænðor� ıt ænð � r � aItanorthitic öænorð � ıtIk öæn� r ð � ItIkanthracene ðæn� ræöken ðæn� r © ñ�õ5ó ösñ�ø�ó inanthracite ðæn� ræökıt ðæn� r © ñ�õ5ó ösñ�ø�ó aItanthracitic öæn� ræðkıtIk öæn� r © ñ�õ5ó ðsñ�ø�ó ItIkanthracoid ðæn� ræökoId ðæn� r © ñ�õ5ó ökoIdanticyclone öæntI ðsñ�ø�ó i ñ(ú,ó klon öæntiñ�� ó ðsñ�ø�ó aI ñ(ú,ó kloUnanticyclonic öæntIsñ�ø�ó i ñ(ú(ó ðklonIk öæntiñ�� ó sñ�ø�ó aI ñ(ú,ó ðklonIkantique ænð teñ��vó k ñ��\ò,ó ænð ti ñ�� ó k ñ��\ò,óantiquity ænð teñ��vó kwIti ænð tIkwItiantitype ðæntI ö ti ñ(ú(ó p ðæntiñ�� ó�ö taI ñ(ú,ó pantitypic öæntI ð ti ñ(ú(ó pIk öæntiñ�� ó�ð tI ñ�ú,ó pIkapical ðæpIkæl ðæpIk ©Yñ�õ\ó lapices ðæpIk � ez ðæpIsñ�ø�ó�� izaplite ðæplıt ðæplaItaplitic æpð l ıtIk æpð l ItIkappeal æðp ü ý������ eñ�ù>õ\ó l ©�ðp ü ý������ i ñ�ù>õ\ó lappellation öæpü ý������n÷>ð l ü ý������ atyon öæpü ý�����n©�ð l ü ý����� eIs© nappendicectomy æöp ü ý������n÷ ndI ðk ÷ ktomI ©�öp ü ý������ ÷ ndI ðsñ�ø�ó ÷ kt © ñ��5ó miappendicitis æöp ü ý������n÷ ndI ðkıtIs ©�öp ü ý������ ÷ ndI ðsñ�ø�ó aItIsappendicle æðp ü ý������n÷ ndIkl ©�ðp ü ý������ ÷ ndIk ©�� l ñ�� ù�óappendix æðp ü ý������n÷ ndIks ©�ðp ü ý������ ÷ ndIksarchangelic öærkñ�øaû,ó anðg÷ l Ik ö � rk ñ�øfû�ó ænðˇ ÷ l Ikarchangel ðærkñ�øaû,óÆö ang÷ l ð � rk ñ�øfû�óÆöeInˇ ©Yñ�ù;ó larenite ðær÷>önıt ðær©�önaItarenitic öær÷>ðnıtIk öær©�ðnItIkargillite ðærgI ö l ü ý���� ıt ð � r ©\ñ�� óÆö l ü ý������ aItargillitic öærgI ð l ü ý���� ıtIk ö � r © ñ�� ó ð l ü ý������ ItIkasceticism æðs� ñ�ø�ó ÷ tI ökIsm ©�ðs� ñ�ø�ó ÷ tI ösñ�ø�ó Iz© mascetic æðs� ñ�ø�ó ÷ tIk ©�ðs� ñ�ø�ó ÷ tIkasinine ðæsI önın ðæs© ñ�� ó önaInasininity öæsI ðnınIti öæs© ñ�� ó ðnInItiasparagine æðspæræögeñ�� ó n ©�ðspær© ñ�õ5ó öˇi ñ�� ó nasparagus æðspærægUs ©�ðspær©Yñ�õ5ó g©Yñ�ò,ó sassignation öæsü ý������ ıgðnatyon öæsü ý����� IgðneIs© nassign æðsü ý������ ıgn ©�ðsü ý������ aI ñ�� þ ó nasymptote ðæsI ñ�ú,ó m��ñ��(óÆö tot ðæsI ñ(ú,ó m��ñ��,óÆö toUtasymptotic öæsI ñ�ú,ó m��ñ��(óÆð totIk öæsI ñ(ú,ó m��ñ��,óÆð totIkathlete ðæ� let ðæ� litathletic æ��ð letIk æ��ð l ÷ tIkatone æð ton ©�ð toUnatonic að tonIk eI ð tonIkatrocious æð trokyos ©�ð troUs© s

98 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

atrocity æð trokIti ©�ð trosñ�ø�ó Itiaudacious ��ðdakyos ��ðdeIs© saudacity ��ðdakIti ��ðdæsñ�ø�ó Itiaugite ð � gıt ð � gaItaugitic ��ðgıtIk ��ðgItIkaustenite ð � st÷?önıt ð � st©�önaItaustenitic ö � st÷?ðnıtIk ö � st©�ðnItIkaustralopithecine ��östraloðpI �;÷>ökeñ � ó n ��östreI loU ðpI � I ñ�ù;ó ösñ�ø�ó i ñ��vó naustralopithecus � sö traloðpI �;÷ kUs � sö treI loU ðpI � I ñ�ù;ó k © ñ�ò�ó sauthenticity ö ���;÷ nð tIkIti ö ����÷ nð tIsñ�ø�ó Itiauthentic ��ð �;÷ ntIk ��ð ��÷ ntIkauthorization ö ��� orı ðzatyon ö ����© ñ��5ó rI ðzeIs© nauthorize ð ��� oö rız ð ����©Yñ��5óïöraIzautomate ð � toömat ð � t ©Yñ��5óïömeItautomatic ö � toðmætIk ö � t ©Yñ��5óïðmætIkautophyte ð � toö fi ñ(ú,ó t ü ýÆþ�ÿ�� ð � t ©Yñ��5óïö faI ñ(ú(ó t ü ý�þ;ÿ��autophytic ö � toð fi ñ(ú,ó tIk ü ý�þ;ÿ�� ö � t ©Yñ��5óïð f I ñ(ú,ó tIk ü ý�þ;ÿ��autotype ð � toö ti ñ(ú,ó p ð � t ©Yñ��5óïö taI ñ(ú(ó pautotypic ö � toð ti ñ(ú,ó pIk ö � t © ñ��5ó ð tI ñ(ú,ó pIkavocation öævoðkatyon öæv© ñ��#ó ðkeIs© nazeotrope æðzI ñ�ù;ó oö trop ©�ðzi © ñ��#ó ö troUpazeotropic ö azI ñ�ù�ó oð tropIk öeIzi © ñ��5ó ð tropIkbacteriophage bækð t ÷ rIoö fagü ý�þ�ÿ�� bækð tI ñ�ù;ó r.i ñ � ó © ñ��5ó ö feIˇ ü ý�þ�ÿ��bacteriophagic bækö t ÷ rIoð fagIk ü ý�þ;ÿ�� bækö tI ñ�ù;ó r.i ñ � ó © ñ��5ó ð fæ Ik ü ý�þ�ÿ��balance ðbælæns ðbæl©\ñ�õ\ó nsbale ðbal ðbeI lbaroscope ðbæroöskop ðbær©\ñ��#óÆöskoUpbaroscopic öbæroðskopIk öbær©\ñ��#óÆðskopIkbasicity baðsIkIti beI ðsIsñ�ø�ó Itibasic ðbasIk ðbeIsIkbeneficence b÷>ðn÷ f Ik ÷ ns b©�ðn÷ f Isñ�ø�ó © nsbeneficent b÷>ðn÷ f Ik ÷ nt b©�ðn÷ f Isñ�ø�ó © ntbenefic b÷>ðn÷ f Ik b©�ðn÷ f Ikbiconcave bı ðkonkav baI ðkonkeIvbiconcavity öbıkonðkavIti öbaIkonðkævItibiophysicist öbıoð f I ñ�ú,ó z ñ��>ó IkIstü ý�þ;ÿ�� öbaIoU ð f I ñ(ú(ó z ñ��>ó Isñ�ø�ó Istü ý�þ;ÿ��biophysics öbıoð f I ñ�ú,ó z ñ��>ó Ik � sü ýÆþ�ÿ�� öbaIoU ð f I ñ(ú(ó z ñ��>ó Ik � sü ý�þ;ÿ��biotite ðbıoö tıt ðbaI ©\ñ��#óÆö taItbiotitic öbıoð tıtIk öbaI ©\ñ��#óÆð tItIkbiotype ðbıoö ti ñ(ú,ó p ðbaI ©\ñ��#óÆö taI ñ(ú,ó pbiotypic öbıoð ti ñ(ú,ó pIk öbaI ©\ñ��#óÆð tI ñ�ú,ó pIkbiquadrate bı ðkwædrat baI ðkwodreItbiquadratic öbıkwæðdratIk öbaIkwoðdrætIkbreve ðbrev ðbrivbrevity ðbrevIti ðbr÷ vIti

3.A. ENGLISHDEEPAND SHALLOW ORL’S 99

deep shallow

bromide ðbromıd ðbroUmaIdbromidic broðmıdIk broU ðmIdIkbronchoscope ðbronkñ�øaû,ó oöskop ðbro� k ñ�øaû,ó ©\ñ��#ó!öskoUpbronchoscopic öbronkñ�øaû,ó oðskopIk öbro� k ñ�øaû,ó ©\ñ��#ó!ðskopIkbryophyte ðbri ñ�ú,ó oö fi ñ�ú,ó t ü ý�þ;ÿ�� ðbraI ñ�ú,ó © ñ��#ó ö faI ñ�ú,ó t ü ýÆþ�ÿ��bryophytic öbri ñ�ú,ó oð fi ñ�ú,ó tIk ü ý�þ�ÿ�� öbraI ñ�ú,ó © ñ��#ó ð f I ñ(ú(ó tIk ü ý�þ�ÿ��calcination ökælkI ðnatyon ökælsñ�ø�ó I ðneIs© ncalcine ðkælkın ðkælsñ�ø�ó aIncalcite ðkælökıt ðkælösñ�ø�ó aItcalcitic kælðkItIk kælðsñ�ø�ó ItIkcalices ðkælI ök � ez ðkælI ösñ�ø�ó � izcalicle ðkælIkl ðkælIk © lcalyces ðkælI ñ(ú,óÆök � ez ðkælI ñ(ú,óÆösñ�ø�ó�� izcalycine ðkælI ñ(ú,ó kIn��ñ�ù;ó ðkælI ñ(ú,ó sñ�ø�ó In��ñ�ù;ócalycle ðkælI ñ(ú,ó kl ðkælI ñ(ú,ó k © lcapacious kæðpakyos k ©Yñ�õ\ó!ðpeIs© scapacity kæðpakIti k ©Yñ�õ\ó!ðpæsñ�ø�ó Iticapitalization ökæpItælı ðzatyon ökæpIt ©Yñ�õ\ó l I ðzeIs© ncapitalize ðkæpItæö l ız ðkæpIt © ñ�õ\ó ö laIzcapitation ökæpI ð tatyon ökæpI ð teIs© ncaput ðkapUt ðkeIp© ñ ò�ó tcarbonization ökærbonı ðzatyon ök � rb© ñ��#ó nI ðzeIs© ncarbonize ðkærboönız ðk � rb© ñ��#ó önaIzcathode ðkæ� od ðkæ� oUdcathodic kæð � odIk kæð � odIkcatholicity ökæ� oð l IkIti ökæ��© ñ��#ó ð l Isñ�ø�ó Iticatholic ðkæ� olIk ðkæ��©Yñ��#ó l Ikcausticity k ��ðstIkIti k ��ðstIsñ�ø�ó Iticaustic ðk � stIk ðk � stIkcave ðkav ðkeIvcavity ðkavIti ðkævIticease ðsñ�ø�ó eñ�ù>õ\ó s ðsñ�ø�ó i ñ�ù>õ\ó scenobite ðsñ�ø�ó enoöbıt ðsñ�ø�ó in © ñ��5ó öbaItcenobitic ösñ�ø�ó enoðbıtIk ösñ�ø�ó in © ñ��5ó ðbItIkcenocyte ðsñ�ø�ó enoösñ�ø�ó i ñ�ú,ó t ðsñ�ø�ó in © ñ��5ó ösñ�ø�ó aI ñ(ú,ó tcenocytic ösñ�ø�ó enoðsñ�ø�ó i ñ�ú,ó tIk ösñ�ø�ó in © ñ��5ó ðsñ�ø�ó I ñ�ú,ó tIkcentricity sñ�ø�ó ÷ nð trIkIti sñ�ø�ó ÷ nð trIsñ�ø�ó Iticentric ðsñ�ø�ó ÷ ntrIk ðsñ�ø�ó ÷ ntrIkcentrosome ðsñ�ø�ó ÷ ntroösom ðsñ�ø�ó ÷ ntr© ñ��5ó ösoUmcentrosomic ösñ�ø�óï÷ ntroðsomIk ösñ�ø�óï÷ ntr©Yñ��5óïðsomIkcercopithecid ösñ�ø�óï÷ rkopI ð � ekId ösñ�ø�ó r.koUpI ð � is ñ�ø�ó Idcercopithecoid ösñ�ø�óï÷ rkopI ð � ekoId ösñ�ø�ó r.koUpI ð � ikoIdcervical ðsñ�ø�óï÷ rvIkæl ðsñ�ø�ó r.vIk ©Yñ�õ5ó lcervicitis ösñ�ø�óï÷ rvI ðkıtIs ösñ�ø�ó r.vI ðsñ�ø�ó aItIscervix ðsñ�ø�ó ÷ rvIks ðsñ�ø�ó r.vIks

100 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

cessation sñ�ø�ó eðsü ý���� atyon sñ�ø�óï÷>ðsü ý����� eIs© ncharacterization ök ñ�øaû�ó ærækt÷ rı ðzatyon ök ñ�øaû,ó ærI ñ�õ\ó ktr.I ðzeIs© ncharacterize ðk ñ�øaû�ó ærækt÷>ö rız ðk ñ�øaû,ó ærI ñ�õ\ó kt ©�öraIzchaste ð cast ð ceIstchastity ð castIti ð cæstItichondrite ðk ñ�øaû�ó ondrıt ðk ñ�øaû,ó ondraItchondritic k ñ�øaû�ó onðdrıtIk k ñ�øaû,ó onðdrItIkchromate ðk ñ�øaû�ó romat ðk ñ�øaû,ó roUmeItchromaticism k ñ�øaû�ó roðmatI ökIsm k ñ�øaû,ó roU ðmætI ösñ�ø�ó Iz© mchromatic k ñ�øaû�ó roðmatIk k ñ�øaû,ó roU ðmætIkchronicity k ñ�øaû�ó roðnIkIti k ñ�øaû,ó roðnIsñ�ø�ó Itichronic ðk ñ�øaû�ó ronIk ðk ñ�øaû,ó ronIkchronoscope ðk ñ�øaû�ó ronoöskop ðk ñ�øaû,ó ron©Yñ��5óÆöskoUpchronoscopic ök ñ�øaû�ó ronoðskopIk ök ñ�øaû,ó ron©Yñ��5óÆðskopIkchrysolite ðk ñ�øaû�ó rI ñ(ú(ó soö l ıt ðk ñ�øaû,ó rI ñ(ú,ó s©Yñ��#ó�ö laItchrysolitic ök ñ�øaû�ó rI ñ(ú(ó soð l ıtIk ök ñ�øaû,ó rI ñ(ú,ó s©Yñ��#ó�ð l ItIkcivilization ösñ�ø�ó IvI l ı ðzatyon ösñ�ø�ó Iv ©Yñ�� ó l I ðzeIs© ncivilize ðsñ�ø�ó IvI ö l ız ðsñ�ø�ó Iv ©Yñ�� ó�ö laIzclassicism ðklæsü ý���� I ökIsm ðklæsü ý������ I ösñ�ø�ó Iz© mclassicist ðklæsü ý���� IkIst ðklæsü ý������ Isñ�ø�ó Istclassic ðklæsü ý���� Ik ðklæsü ý������ Ikclone ðklon ðkloUnclonic ðklonIk ðklonIkcognizance ðkognızæns ðkognIz© ñ�õ\ó nscognize ðkognız ðkognaIzcoincidence koð Insñ�ø�ó ıd÷ ns koU ð Insñ�ø�ó Id© nscoincide ökoInðsñ�ø�ó ıd ökoUInðsñ�ø�ó aIdcolic ðkolIk ðkolIkcollotype ðkol ü ý������ oö ti ñ(ú,ó p ðkol ü ý����� ©\ñ��#ó�ö taI ñ�ú,ó pcollotypic ökol ü ý������ oð ti ñ(ú,ó pIk ökol ü ý����� ©\ñ��#ó�ð tI ñ(ú,ó pIkcolonic koð lonIk koU ð lonIkcolonization ökolonı ðzatyon ökol © ñ��#ó nI ðzeIs© ncolonize ðkoloönız ðkol © ñ��#ó önaIzcolon ðkolon ðkoUl © ñ��5ó ncombination ökombI ðnatyon ökomb© ñ�� ó ðneIs© ncombine komðbın k © ñ��#ó mðbaIncommode koðm ü ý���� od k © ñ��#ó ðm ü ý���� oUdcommodity koðm ü ý���� odIti k © ñ��#ó ðm ü ý���� odIticompilation ökompı ð latyon ökomp© ñ�� ó ð leIs© ncompile komðpıl k ©Yñ��#ó mðpaI lconcave konðkav konðkeIvconcavity konðkavIti konðkævIticonceal konðsñ�ø�ó eñ�ù>õ\ó l k ©Yñ��#ó nðsñ�ø�ó i ñ�ù>õ\ó lcone ðkon ðkoUnconfidence ðkonfıd÷ ns ðkonfId© ns

3.A. ENGLISHDEEPAND SHALLOW ORL’S 101

deep shallow

confide konð fıd k ©Yñ��5ó nð faIdcongeal konðgeñ�ù>õ\ó l k ©Yñ��5ó nðˇi ñ�ù^õ5ó lcongelation ökongeð latyon ökon ©�ð leIs© nconic ðkonIk ðkonIkconsignation ökonsıgðnatyon ökonsIgðneIs© nconsign konð sıgn k © ñ��5ó nðsaI ñ�� þ ó nconsolation ökonsoð latyon ökons© ñ��5ó ð leIs© ncontravene ökontræðven ökontr© ñ�õ5ó ðvincontravention ökontræðv ÷ ntyon ökontr© ñ�õ5ó ðv ÷ nc© nconvene konðven k © ñ��5ó nðvinconvention konðv ÷ ntyon k © ñ��5ó nðv ÷ nc© nconvocation ökonvoðkatyon ökonv ©\ñ��#ó�ðkeIs© nconvoke konðvok k ©Yñ��5ó nðvoUkcormophyte ðkormoö fi ñ�ú,ó t ü ý�þ�ÿ�� ðk � rm©Yñ��#óÆö faI ñ�ú,ó t ü ý�þ;ÿ��cormophytic ökormoð fi ñ�ú,ó tIk ü ý�þ;ÿ�� ök � rm©Yñ��#óÆð f I ñ(ú,ó tIk ü ýÆþ�ÿ��creophagous krI ñ�ù;óÆðofagosü ý�þ;ÿ�� kri ðof ©Yñ�õ\ó g© sü ý�þ;ÿ��creophagy krI ñ�ù;ó ðofagI ü ýÆþ�ÿ�� kri ðof © ñ�õ\ó ˇi ü ý�þ;ÿ��creosote ðkrI ñ�ù;ó oösot ðkri © ñ��5ó ösoUtcreosotic ökrI ñ�ù;ó oðsotIk ökri © ñ��5ó ðsotIkcriticism ðkrItI ökIsm ðkrItI ösñ�ø�ó Iz© mcriticize ðkrItI ökız ðkrItI ösñ�ø�ó aIzcritic ðkrItIk ðkrItIkcrocein ðkrokeIn ðkroUsñ�ø�ó i Incrocus ðkrokUs ðkroUk ©Yñ�ò,ó scryoscope ðkri ñ�ú,ó oöskop ðkraI ñ�ú,ó�©Yñ��#ó!öskoUpcryoscopic ökri ñ�ú,ó oðskopIk ökraI ñ�ú,ó�©Yñ��#ó!ðskopIkcrystallite ðkrI ñ(ú,ó stæö l ü ý������ ıt ðkrI ñ(ú,ó st©Yñ�õ\óïö l ü ý���� aItcrystallitic ökrI ñ(ú,ó stæð l ü ý������ ıtIk ökrI ñ(ú,ó st©Yñ�õ\óïð l ü ý���� ItIkcyanite ðsñ�ø�ó i ñ(ú,ó æönıt ðsñ�ø�ó aI ñ(ú,ó © ñ�õ\ó önaItcyanitic ösñ�ø�ó i ñ(ú,ó æðnıtIk ösñ�ø�ó aI ñ(ú,ó © ñ�õ\ó ðnItIkcyclone ðsñ�ø�ó i ñ(ú,ó klon ðsñ�ø�ó aI ñ(ú,ó kloUncyclonic sñ�ø�ó i ñ(ú,ó ðklonIk sñ�ø�ó aI ñ(ú,ó ðklonIkcynicism ðsñ�ø�ó I ñ�ú,ó nI ökIsm ðsñ�ø�ó I ñ�ú,ó nI ösñ�ø�ó Iz© mcynic ðsñ�ø�ó I ñ�ú,ó nIk ðsñ�ø�ó I ñ�ú,ó nIkcystoscope ðsñ�ø�ó I ñ�ú,ó stoöskop ðsñ�ø�ó I ñ�ú,ó st© ñ��5ó öskoUpcystoscopic ösñ�ø�ó I ñ�ú,ó stoðskopIk ösñ�ø�ó I ñ�ú,ó st© ñ��5ó ðskopIkdeclination öd÷ klı ðnatyon öd÷ kl ©Yñ � ó!ðneIs© ndecline d÷>ðklın dI ðklaIndendrite ðd÷ ndrıt ðd÷ ndraItdendritic d÷ nðdrıtIk d÷ nðdrItIkdenounce d÷>ðnuns dI ðnaUnsdenunciate d÷>ðnu ñ�ò,ó nsñ�ø�ó I ö at dI ðnô nsñ�ø�ó i ñ�� ó öeItdenunciation d÷>önu ñ�ò,ó nsñ�ø�ó I ð atyon dI önô nsñ�ø�ó i ñ�� ó ðeIs© ndeprave d÷>ðprav dI ðpreIvdepravity d÷>ðpravIti dI ðprævIti

102 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

deprivation öd÷ prı ðvatyon öd÷ pr©Yñ�� óÆðveIs© ndeprive d÷>ðprıv dI ðpraIvderivation öd÷ rı ðvatyon öd÷ r ©Yñ � ó!ðveIs© nderive d÷>ðrıv dI ðraIvdermatome ðd÷ rmöætom ðdr.m© ñ�õ5ó ö toUmdermatomic öd÷ rmæð tomIk ödr.m© ñ�õ5ó ð tomIkdermatophyte ðd÷ rmætöofi ñ(ú,ó t ü ý�þ;ÿ�� ðdr.m© ñ�õ5ó t © ñ��#ó ö faI ñ�ú,ó t ü ý�þ;ÿ��dermatophytic öd÷ rmætoð fi ñ(ú,ó tIk ü ý�þ;ÿ�� ödr.m© ñ�õ5ó t © ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��desensitization deös÷ nsItI ðzatyon di ös÷ nsItI ðzeIs© ndesensitize deðs÷ nsI ö tız di ðs÷ nsI ö taIzdesignation öd÷ sıgðnatyon öd÷ z ñ��>ó IgðneIs© ndesign d÷>ð sıgn dI ðz ñ��>ó aI ñ�� þ ó ndeuteranope ðduñ�ù?ò,ó t ÷ ræönop ðduñ�ù?ò�ó tr. ©\ñ�õ\ó�önoUpdeuteranopic öduñ�ù?ò,ó t ÷ ræðnopIk öduñ�ù?ò�ó tr. ©\ñ�õ\ó�ðnopIkdiaphone ðdıæö fonü ýÆþ�ÿ�� ðdaI ©Yñ�õ\óÆö foUn ü ý�þ�ÿ��diaphonic ödıæð fonIk ü ý�þ;ÿ�� ödaI ©Yñ�õ\óÆð fonIk ü ý�þ;ÿ��dibasicity ödıbaðsIkIti ödaIbeI ðsIsñ�ø�ó Itidibasic dı ðbasIk daI ðbeIsIkdichroite ðdık ñ�øaû,ó roö ıt ðdaIk ñ�øaû�ó roU öaItdichroitic ödık ñ�øaû,ó roð ıtIk ödaIk ñ�øaû�ó roU ð ItIkdichromate dı ðk ñ�øaû,ó romat daI ðk ñ�øaû�ó roUmeItdichromaticism ödık ñ�øaû,ó roðmatI ökIsm ödaIk ñ�øaû�ó roU ðmætI ösñ�ø�ó Iz© mdichromatic ödık ñ�øaû,ó roðmatIk ödaIk ñ�øaû�ó roU ðmætIkdichroscope ðdık ñ�øaû,ó roöskop ðdaIk ñ�øaû�ó r © ñ��#ó öskoUpdichroscopic ödık ñ�øaû,ó roðskopIk ödaIk ñ�øaû�ó r © ñ��#ó ðskopIkdiorite ðdıoörıt ðdaI © ñ��5ó ö raItdioritic ödıoðrItIk ödaI ©Yñ��5óÆð rItIkdiscommode ödIskoðm ü ý������ od ödIsk©\ñ��#óÆðm ü ý����� oUddiscommodity ödIskoðm ü ý������ odIti ödIsk©\ñ��#óÆðm ü ý����� odItidisinclination ödIsInklı ðnatyon ödIsInklI ðneIs© ndisincline ödIsInðklın ödIsInðklaIndivination ödIvı ðnatyon ödIv © ñ � ó ðneIs© ndivine dI ðvın dI ñ�� ó ðvaIndivinity dI ðvınIti dI ñ�� ó ðvInItidolerite ðdol÷>örıt ðdol©�öraItdoleritic ödol÷>ðrıtIk ödol©�ðrItIkdramatization ödræmætı ðzatyon ödræm© ñ�õ\ó tI ðzeIs© ndramatize ðdræmæö tız ðdræm© ñ�õ\ó ö taIzdynamite ðdi ñ�ú,ó næömıt ðdaI ñ(ú(ó n© ñ�õ5ó ömaItdynamitic ödi ñ�ú,ó næðmıtIk ödaI ñ(ú(ó n©Yñ�õ5ó!ðmItIkecclesiastical ÷>ök ü ý������ lezñ��>ó I ðæstIkæl I ñ�ù�ó�ök ü ý������ liz ñ��>ó i ñ�� ó�ðæstIk ©Yñ�õ\ó lecclesiasticism ÷>ök ü ý������ lezñ��>ó I ðæstI ökIsm I ñ�ù�ó�ök ü ý������ liz ñ��>ó i ñ�� ó�ðæstI ösñ�ø�ó Iz© mecclesiastic ÷>ök ü ý������ lezñ��>ó I ðæstIk I ñ�ù�ó�ök ü ý������ liz ñ��>ó i ñ�� ó�ðæstIkeclecticism ÷>ðkl ÷ ktI ökIsm I ñ�ù�ó�ðkl ÷ ktI ösñ�ø�ó Iz© meclectic ÷>ðkl ÷ ktIk I ñ�ù�ó ðkl ÷ ktIk

3.A. ENGLISHDEEPAND SHALLOW ORL’S 103

deep shallow

ecotype ð ÷ koö ti ñ�ú,ó p ð ÷ k ©\ñ��#ó�ö taI ñ�ú,ó pecotypic ö ÷ koð ti ñ�ú,ó pIk ö ÷ k ©\ñ��#ó�ð tI ñ(ú,ó pIkectoparasite ö ÷ ktoðpæræö sıt ö ÷ ktoU ðpær©Yñ�õ5óÆösaItectoparasitic ö ÷ ktoöpæræð sıtIk ö ÷ ktoU öpær©Yñ�õ5óÆðsItIkedacious ÷>ðdakyos I ñ�ù;ó ðdeIs© sedacity ÷>ðdakIti I ñ�ù;ó ðdæsñ�ø�ó Itielasticity ÷ læðstIkIti I ñ�ù;ó læðstIsñ�ø�ó Itielasticize ÷>ð læstI ökız I ñ�ù;ó ð læstI ösñ�ø�ó aIzelastic ÷>ð læstIk I ñ�ù;ó ð læstIkelectrical ÷>ð l ÷ ktrIkæl I ñ�ù;ó ð l ÷ ktrIk © ñ�õ5ó lelectricity ÷ l ÷ k ð trIkIti I ñ�ù;ó l ÷ k ð trIsñ�ø�ó Itielectric ÷>ð l ÷ ktrIk I ñ�ù;ó ð l ÷ ktrIkelectrolyte ÷>ð l ÷ ktroö li ñ�ú,ó t I ñ�ù;óïð l ÷ ktr ©Yñ��5ó!ö laI ñ(ú,ó telectrolytic ÷>ö l ÷ ktroð li ñ�ú,ó tIk I ñ�ù;óïö l ÷ ktr ©Yñ��5ó!ð l I ñ�ú,ó tIkelectrophone ÷>ð l ÷ ktroö fonü ýÆþ�ÿ�� I ñ�ù;óïð l ÷ ktr ©Yñ��5ó!ö foUn ü ý�þ;ÿ��electrophonic ÷>ö l ÷ ktroð fonIk ü ýÆþ�ÿ�� I ñ�ù;óïö l ÷ ktr ©Yñ��5ó!ð fonIk ü ý�þ;ÿ��electroscope ÷>ð l ÷ ktroöskop I ñ�ù;óïð l ÷ ktr ©Yñ��5ó!öskoUpelectroscopic ÷>ö l ÷ ktroðskopIk I ñ�ù;óïö l ÷ ktr ©Yñ��5ó!ðskopIkelliptical ÷>ð l ü ý������ IptIkæl I ñ�ù;ó ð l ü ý���� IptIk © ñ�õ\ó lellipticity ÷ l ü ý������ Ipð tIkIti I ñ�ù;ó l ü ý���� Ipð tIsñ�ø�ó Itiempiricism ÷ mðpIrI ökIsm ÷ mðpIrI ösñ�ø�ó Iz© mempiric ÷ mðpIrIk ÷ mðpIrIkendoparasite ö ÷ ndoðpæræö sıt ö ÷ ndoU ðpær© ñ�õ5ó ösaItendoparasitic ö ÷ ndoöpæræð sıtIk ö ÷ ndoU öpær© ñ�õ5ó ðsItIkendophyte ð ÷ ndoö fi ñ(ú,ó t ü ýÆþ�ÿ�� ð ÷ nd© ñ��#ó ö faI ñ�ú,ó t ü ýÆþ�ÿ��endophytic ö ÷ ndoð fi ñ(ú,ó tIk ü ý�þ;ÿ�� ö ÷ nd© ñ��#ó ð f I ñ(ú(ó tIk ü ý�þ�ÿ��endoscope ð ÷ ndoöskop ð ÷ nd©Yñ��#ó!öskoUpendoscopic ö ÷ ndoðskopIk ö ÷ nd©Yñ��#ó!ðskopIkenounce ÷>ðnuns I ñ�ù;óïðnaUnsentophyte ð ÷ ntoö fi ñ(ú(ó t ü ý�þ;ÿ�� ð ÷ nt©Yñ��5óïö faI ñ(ú,ó t ü ý�þ�ÿ��entophytic ö ÷ ntoð fi ñ(ú(ó tIk ü ý�þ;ÿ�� ö ÷ nt©Yñ��5óïð f I ñ�ú,ó tIk ü ý�þ;ÿ��enunciable ÷>ðnu ñ ò�ó nsñ�ø�ó IæbI l I ñ�ù;ó ðnô nsñ�ø�ó i ñ � ó © ñ�õ\ó b© lenunciate ÷>ðnu ñ ò�ó nsñ�ø�ó I ö at I ñ�ù;ó ðnô nsñ�ø�ó i ñ � ó öeItenunciation ÷>önu ñ ò�ó nsñ�ø�ó I ð atyon I ñ�ù;ó önô nsñ�ø�ó i ñ � ó ðeIs© nepidote ð ÷ pI ödot ð ÷ pI ödoUtepidotic ö ÷ pI ðdotIk ö ÷ pI ðdotIkepiphyte ð ÷ pI ö fi ñ�ú,ó t ü ý�þ;ÿ�� ð ÷ p© ñ�� ó ö faI ñ(ú,ó t ü ý�þ;ÿ��epiphytic ö ÷ pI ð fi ñ�ú,ó tIk ü ý�þ�ÿ�� ö ÷ p© ñ�� ó ð f I ñ�ú,ó tIk ü ý�þ;ÿ��episode ð ÷ pI ösod ð ÷ p© ñ�� ó ösoUdepisodic ö ÷ pI ðsodIk ö ÷ p©\ñ�� óïðsodIkequivocation ÷>ökwIvoðkatyon I ñ�ù;óïökwIv ©Yñ��#óÆðkeIs© nequivoke ð ÷ kwI övok ð ÷ kw ©\ñ�� óïövoUkeremite ð ÷ r ÷>ömıt ð ÷ r ©�ömaIteremitic ö ÷ r ÷>ðmıtIk ö ÷ r ©�ðmItIkeroticism ÷>ð rotI ökIsm © ñ�ù;ó ðrotI ösñ�ø�ó Iz© m

104 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

erotic ÷?ðrotIk ©Yñ�ù;ó�ð rotIkerythrocyte ÷?ðrI ñ�ú,ó�� roösñ�ø�ó i ñ(ú,ó t I ñ�ù;ó�ð rI ñ�ú,ó�� r ©Yñ��5ó!ösñ�ø�ó aI ñ(ú,ó terythrocytic ÷?örI ñ�ú,ó�� roðsñ�ø�ó i ñ(ú,ó tIk I ñ�ù;ó�ö rI ñ�ú,ó�� r ©Yñ��5ó!ðsñ�ø�ó I ñ�ú,ó tIkesophageal ÷?ösofaðgI ñ�ù;ó ælü ý�þ�ÿ�� I ñ�ù;ó�ösof©Yñ�õ5óÆðˇi ©Yñ�õ\ó l ü ý�þ�ÿ��esophagus ÷?ðsofagUsü ý�þ;ÿ�� I ñ�ù;ó ðsof© ñ�õ5ó g© ñ ò�ó sü ý�þ�ÿ��esthete ð ÷ s� et ð ÷ s� itestheticism ÷ sð � etI ökIsm ÷ sð �;÷ tI ösñ�ø�ó Iz© mesthetic ÷ sð � etIk ÷ sð �;÷ tIkethicize ð ÷�� I ökız ð ÷�� I ösñ�ø�ó aIzethic ð ÷�� Ik ð ÷�� Ikevocation ö ÷ voðkatyon ö ÷ v © ñ��5ó ðkeIs© nevoke ÷?ðvok I ñ�ù;ó�ðvoUkexegete ð ÷ ks÷?öget ð ÷ ksI ñ�ù;ó�öˇitexegetic ö ÷ ks÷?ðgetIk ö ÷ ksI ñ�ù;ó�ðˇ ÷ tIkexile ð ÷ gzıl ð ÷ gzaI lexilic ÷?ðgzıl Ik ÷>ðgzI l Ikextreme ÷?ðkstrem I ñ�ù;ó ðkstrimextremity ÷?ðkstremIti I ñ�ù;ó ðkstr÷ mItifalciform ð fælkI ö form ð fælsñ�ø�ó © ñ�� ó ö f � rmfalcon ð fælkon ð fælk© ñ��5ó nfanaticism fæðnætI ökIsm f © ñ�õ\ó ðnætI ösñ�ø�ó Iz© mfanaticize fæðnætI ökız f © ñ�õ\ó ðnætI ösñ�ø�ó aIzfanatic fæðnætIk f ©Yñ�õ\ó!ðnætIkfasciation ö fas��ñ�ø�ó I ð atyon ö fæsñ��gø�ó i ñ � ó!ðeIs© nfascia ð fas��ñ�ø�ó Iæ ð feIsñ��gø�ó i ñ��vó!©Yñ�õ5ófascination ö fæs��ñ�ø�ó I ðnatyon ö fæs��ñ�ø�ó�©Yñ�� óïðneIs© nfascine fæðs��ñ�ø�ó eñ�� ó n fæðs��ñ�ø�ó i ñ � ó nfederalization ö f ÷ d÷ rælð Izatyon ö f ÷ dr ñ�ù! ?ó!©Yñ�õ\ó l ©Yñ��vó�ðzeIs© nfederalize ð f ÷ d÷ r öælız ð f ÷ dr ñ�ù! ?ó © ñ�õ\ó ö laIzfelsite ð f ÷ lsıt ð f ÷ lsaItfelsitic f ÷ l ð sıtIk f ÷ l ðsItIkferocious f ÷>ð rokyos f ©�ð roUs© sferocity f ÷>ð rokIti f ©�ð rosñ�ø�ó Itiferroelectricity ö f ÷ r ü ý������ o÷ l ÷ k ð trIkIti ö f ÷ r ü ý������ oUI ñ�ù�ó l ÷ k ð trIsñ�ø�ó Itiferroelectric ö f ÷ r ü ý������ o÷?ð l ÷ ktrIk ö f ÷ r ü ý������ oUI ñ�ù�ó ð l ÷ ktrIkfertilization ö f ÷ rtI l ı ðzatyon ö fr.t © ñ � ó l I ðzeIs© nfertilize ð f ÷ rt ö I l ız ð fr.t ©Yñ � ó�ö laIzfinance f I ðnæns f I ðnænsfinance ð fı önæns ð faI önænsfluoroscope ðflu ñ�ò��5ó roöskop ðflu ñ ò�#ó r. ©Yñ��5ó!öskoUpfluoroscopic öflu ñ�ò��5ó roðskopIk öflu ñ ò�#ó r. ©Yñ��5ó!ðskopIkfugacious fugð akyos fyu ðgeIs© sfugacity fugð akIti fyu ðgæsñ�ø�ó Itifumarole ð fumærö ol ð fyum© ñ�õ\ó öroUlfumarolic ö fumærð olIk ö fyum© ñ�õ\ó ðrolIk

3.A. ENGLISHDEEPAND SHALLOW ORL’S 105

deep shallow

fungicide ð f ô ngI ökıd ð f ô nˇI ösñ�ø�ó aIdfungic ð f ô ngIk ð f ô nˇIkgalvanoscope ðgælvænoöskop ðgælv©Yñ�õ\ó n©Yñ��#ó!öskoUpgalvanoscopic ögælvænoðskopIk ögælv© ñ�õ\ó n© ñ��#ó ðskopIkgastronome ðgæstroönom ðgæstr© ñ��#ó önoUmgastronomic ögæstroðnomIk ögæstr© ñ��#ó ðnomIkgastroscope ðgæstroöskop ðgæstr© ñ��#ó öskoUpgastroscopic ögæstroðskopIk ögæstr© ñ��#ó ðskopIkgeneralization öˇ ÷ n÷ rælð ızatyon öˇ ÷ nr ñ�ù! ?ó © ñ�õ\ó l © ñ � ó ðzeIs© ngeneralize ðˇ ÷ n÷ r öælız ðˇ ÷ nr ñ�ù! ?ó © ñ�õ\ó ö laIzgeneticist ˇeðn÷ tIkIst ©�ðn÷ tIsñ�ø�ó Istgenetic ˇeðn÷ tIk ©�ðn÷ tIkgene ðˇen ðˇingenic ðˇenIk ðˇ ÷ nIkgenotype ðˇenoö ti ñ�ú,ó p ðˇ ÷ n©Yñ��5ó!ö taI ñ(ú,ó pgenotypicity öˇenotiñ�ú,ó ðpIkIti öˇ ÷ n©Yñ��5ó tI ñ�ú,ó!ðpIsñ�ø�ó Itigenotypic öˇenoð ti ñ�ú,ó pIk öˇ ÷ n© ñ��5ó ð tI ñ�ú,ó pIkgeode ðˇeod ðˇioUdgeodic ˇeð odIk ˇi ðodIkgeophagism ˇeðofaögIsmü ý�þ;ÿ�� ˇi ðof © ñ�õ\ó öˇIz© m ü ý�þ;ÿ��geophagous ˇeðofagosü ý�þ;ÿ�� ˇi ðof © ñ�õ\ó g© sü ý�þ�ÿ��geophagy ˇeðofagI ü ý�þ�ÿ�� ˇi ðof © ñ�õ\ó ˇi ü ý�þ;ÿ��geophyte ðˇeoö fi ñ(ú(ó t ü ý�þ�ÿ�� ðˇi ©Yñ��5óÆö faI ñ�ú,ó t ü ýÆþ�ÿ��geophytic öˇeoð fi ñ(ú(ó tIk ü ý�þ;ÿ�� öˇi ©Yñ��5óÆð f I ñ(ú(ó tIk ü ý�þ�ÿ��gibbose ðgIb ü ý���� os ðgIb ü ý����� oUsgibbosity gI ðb ü ý���� osIti gI ðb ü ý����� osItiglauconite ðgl � koönıt ðgl � k ©Yñ��5ó�önaItglauconitic ögl � koðnıtIk ögl � k © ñ��5ó ðnItIkglobose ðgloöbos ðgloU öboUsglobosity gloðbosIti gloU ðbosItiglucose ðglukos ðglukoUsglucosic gluðkosIk gluðkosIkglucoside ðglukoö sıd ðgluk© ñ��#ó ösaIdglucosidic öglukoð sıdIk ögluk© ñ��#ó ðsIdIkglycine ðgli ñ(ú,ó keñ��vó n ðglaI ñ�ú,ó sñ�ø�ó i ñ � ó nglycoside ðgli ñ(ú,ó koö sıd ðglaI ñ�ú,ó k ©Yñ��5ó�ösaIdglycosidic ögli ñ(ú,ó koð sıdIk öglaI ñ�ú,ó k ©Yñ��5ó�ðsIdIkgrandiose ðgrændI ö os ðgrændiñ��vó öoUsgrandiosity ögrændI ð osIti ögrændiñ��vó ðosItigranulite ðgrænUl ö ıt ðgræny ©�ö laItgranulitic ögrænUl ð ıtIk ögræny ©�ð l ItIkgranulocyte ðgrænUlosñ�ø�ó ö i ñ(ú(ó t ðgræny © loU ösñ�ø�ó aI ñ(ú,ó tgranulocytic ögrænUloðsñ�ø�ó i ñ(ú,ó tIk ögræny © loU ðsñ�ø�ó I ñ�ú,ó tIkgrave ðgrav ðgreIvgravity ðgravIti ðgrævIti

106 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

gyroscope ðˇi ñ(ú,ó roöskop ðˇaI ñ(ú(ó r ©Yñ��5ó!öskoUpgyroscopic öˇi ñ(ú,ó roðskopIk öˇaI ñ(ú(ó r ©Yñ��5ó!ðskopIkhagioscope ðhægIoöskop ðhægiñ��vó ©\ñ��#ó�öskoUphagioscopic öhægIoðskopIk öhægiñ��vó ©\ñ��#ó�ðskopIkhalophyte ðhæloö fi ñ(ú,ó t ü ýÆþ�ÿ�� ðhæl© ñ��#ó ö faI ñ(ú(ó t ü ý�þ;ÿ��halophytic öhæloð fi ñ(ú,ó tIk ü ý�þ;ÿ�� öhæl© ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��haplite ðhæplıt ðhæplaIthaplitic hæpð l ıtIk hæpð l ItIkhelical ðh÷ l Ikæl ðh÷ l Ik © ñ�õ\ó lhelices ðh÷ l I ök � ez ðh÷ l I ösñ�ø�ó � izheliocentricism öhelIoðsñ�ø�ó ÷ ntrI ökIsm öhili ñ�� ó oU ðsñ�ø�ó ÷ ntrI ösñ�ø�ó Iz© mheliocentricity öhelIosñ�ø�ó ÷ nð trIkIti öhili ñ�� ó oUsñ�ø�ó ÷ nð trIsñ�ø�ó Itiheliocentric öhelIoðsñ�ø�óÆ÷ ntrIk öhili ñ�� ó oU ðsñ�ø�ó�÷ ntrIkheliotrope ðhelIoö trop ðhili ñ�� óÆ©\ñ��#ó!ö troUpheliotropic öhelIoð tropIk öhili ñ�� óÆ©\ñ��#ó!ð tropIkheliotype ðhelIoö ti ñ�ú,ó p ðhili ñ�� óÆ©\ñ��#ó!ö taI ñ(ú(ó pheliotypic öhelIoð ti ñ�ú,ó pIk öhili ñ�� óÆ©\ñ��#ó!ð tI ñ(ú,ó pIkhematite ðh÷ mæö tıt ðh÷ m©Yñ�õ\óÆö taIthematitic öh÷ mæð tıtIk öh÷ m© ñ�õ\ó ð tItIkhemitrope ðh÷ mI ö trop ðh÷ mI ö troUphemitropic öh÷ mI ð tropIk öh÷ mI ð tropIkhemophile ðhemoö fıl ü ý�þ;ÿ�� ðhim© ñ��#ó ö faI l ü ý�þ;ÿ��hemophilic öhemoð fıl Ik ü ý�þ;ÿ�� öhim© ñ��#ó ð f I l Ik ü ýÆþ�ÿ��heteroclite ðh÷ t ÷ roöklıt ðh÷ tr. © ñ��5ó öklaItheteroclitic öh÷ t ÷ roðklıtIk öh÷ tr. © ñ��5ó ðkl ItIkhistiocyte ðhIstIoösñ�ø�ó i ñ�ú,ó t ðhIsti ñ�� ó © ñ��#ó ösñ�ø�ó aI ñ(ú(ó thistiocytic öhIstIoðsñ�ø�ó i ñ�ú,ó tIk öhIsti ñ�� óÆ©\ñ��#ó�ðsñ�ø�ó I ñ(ú,ó tIkhistoricism hI ðstorI ökIsm hI ðst� rI ösñ�ø�ó Iz© mhistoric hI ðstorIk hI ðst� rIkholophyte ðholoö fi ñ�ú,ó t ü ý�þ�ÿ�� ðhol©Yñ��5ó�ö faI ñ�ú,ó t ü ý�þ;ÿ��holophytic öholoð fi ñ�ú,ó tIk ü ý�þ;ÿ�� öhol©Yñ��5ó�ð f I ñ(ú,ó tIk ü ýÆþ�ÿ��holotype ðholoö ti ñ�ú,ó p ðhol© ñ��5ó ö taI ñ�ú,ó pholotypic öholoð ti ñ�ú,ó pIk öhol© ñ��5ó ð tI ñ(ú,ó pIkhomologize hoðmoloö gız h© ñ��5ó ðmol© ñ��#ó öˇaIzhomologous hoðmologos h© ñ��5ó ðmol© ñ��#ó g© shomology hoðmologI h© ñ��5ó ðmol© ñ��#ó ˇihomophile ðhomoö fıl ü ý�þ;ÿ�� ðhoUm© ñ��#ó ö faI l ü ýÆþ�ÿ��homophone ðhomoö fonü ý�þ�ÿ�� ðhom©\ñ��#ó ö foUn ü ý�þ;ÿ��homophonic öhomoð fonIk ü ý�þ;ÿ�� öhom©\ñ��#ó ð fonIk ü ý�þ;ÿ��homophyllic öhomoð f I ñ�ú,ó l ü ý������ Ik ü ýÆþ�ÿ�� öhoUm©Yñ��#ó�ð f I ñ(ú,ó l ü ý������ Ik ü ýÆþ�ÿ��homozygote öhomoðzi ñ�ú,ó got öhoUm©Yñ��#ó�ðzaI ñ(ú,ó goUthomozygotic öhomoziñ�ú,ó ðgotIk öhoUm©Yñ��#ó zaI ñ(ú,ó�ðgotIkhoplite ðhoplıt ðhoplaIthoplitic hopð l ıtIk hopð l ItIkhoroscope ðhoroöskop ðh� r © ñ��#ó öskoUp

3.A. ENGLISHDEEPAND SHALLOW ORL’S 107

deep shallow

horoscopic öhoroðskopIk öh� r ©\ñ��#óÆðskopIkhospitalization öhospItælı ðzatyon öhospIt ©Yñ�õ5ó l I ðzeIs© nhospitalize ðhospItæö l ız ðhospIt ©Yñ�õ5ó ö laIzhumane humð an hyuðmeInhumanity humðænIti hyuðmænItihydroelectricity öhi ñ(ú,ó dro÷ l ÷ k ð trIkIti öhaI ñ�ú,ó droUI ñ�ù;ó l ÷ k ð trIsñ�ø�ó Itihydroelectric öhi ñ(ú,ó dro÷>ð l ÷ ktrIk öhaI ñ�ú,ó droUI ñ�ù;ó ð l ÷ ktrIkhydrolyte ðhi ñ(ú,ó droö li ñ(ú(ó t ðhaI ñ�ú,ó dr© ñ��#ó ö laI ñ�ú,ó thydrolytic hi ñ(ú,ó droð li ñ�ú,ó tIk haI ñ�ú,ó dr© ñ��#ó ð l I ñ(ú,ó tIkhydrophyte ðhi ñ(ú,ó droö fi ñ(ú(ó t ü ý�þ�ÿ�� ðhaI ñ�ú,ó dr© ñ��#ó ö faI ñ�ú,ó t ü ý�þ;ÿ��hydrophytic öhi ñ(ú,ó droð fi ñ(ú(ó tIk ü ý�þ;ÿ�� öhaI ñ�ú,ó dr© ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��hydroscope ðhi ñ(ú,ó droöskop ðhaI ñ�ú,ó dr© ñ��#ó öskoUphydroscopic öhi ñ(ú,ó droðskopIk öhaI ñ�ú,ó dr©Yñ��#ó!ðskopIkhygroscope ðhi ñ(ú,ó groöskop ðhaI ñ�ú,ó gr©Yñ��#ó!öskoUphygroscopic öhi ñ(ú,ó groðskopIk öhaI ñ�ú,ó gr©Yñ��#ó!ðskopIkhypersthene ðhi ñ(ú,ó p÷ r ös� en ðhaI ñ�ú,ó pr. ös� inhypersthenic öhi ñ(ú,ó p÷ r ðs� enIk öhaI ñ�ú,ó pr. ðs��÷ nIkhypogene ðhi ñ(ú,ó poöˇen ðhaI ñ�ú,ó p©Yñ��5ó!öˇinhypogenic öhi ñ(ú,ó poðˇenIk öhaI ñ�ú,ó p© ñ��5ó ðˇ ÷ nIkichthyolite ð Ik ñ�øaû�ó � I ñ�ú,ó oö l ıt ð Ik ñ�øaû,ó � i ñ(ú(ó © ñ��#ó ö laItichthyolitic ö Ik ñ�øaû�ó � I ñ�ú,ó oð l ıtIk ö Ik ñ�øaû,ó � i ñ(ú(ó © ñ��#ó ð l ItIkichthyophagous ö Ik ñ�øaû�ó � I ñ�ú,ó ðofagosü ý�þ;ÿ�� ö Ik ñ�øaû,ó � i ñ(ú(ó ðof © ñ�õ5ó g© sü ýÆþ�ÿ��ichthyophagy ö Ik ñ�øaû�ó � I ñ�ú,ó ðofagI ü ýÆþ�ÿ�� ö Ik ñ�øaû,ó � i ñ(ú(ó ðof © ñ�õ5ó ˇi ü ý�þ;ÿ��iconomaticism ı ökonoðmætI ökIsm aI ökon© ñ��#ó ðmætI ösñ�ø�ó Iz© miconomatic ı ökonoðmætIk aI ökon©Yñ��#ó�ðmætIkidiophone ð IdIoö fonü ý�þ;ÿ�� ð Idi ñ � óÆ©Yñ��5ó!ö foUn ü ý�þ;ÿ��idiophonic ö IdIoð fonIk ü ý�þ;ÿ�� ö Idi ñ � óÆ©Yñ��5ó!ð fonIk ü ý�þ;ÿ��imide ð Imıd ð ImaIdimidic I ðmıdIk I ðmIdIkimpastation ö Impasð tatyon ö Impæsð teIs© nimpaste Imðpast ImðpeIstimpolite ö Impoð l ıt ö Imp© ñ��#ó ð laItimpolitic ImðpolıtIk ImðpolItIkinane I ðnan I ðneIninanity I ðnanIti I ðnænItiinclination ö InklI ðnatyon ö Inkl © ñ��vó ðneIs© nincline Inðklın InðklaInincommode ö Inkoðm ü ý������ od ö Ink©Yñ��5ó�ðm ü ý���� oUdincommodity ö Inkoðm ü ý������ odIti ö Ink©Yñ��5ó�ðm ü ý���� odItiindigene ð IndI öˇen ð IndI öˇinindigenity ö IndI ðˇenIti ö IndI ðˇ ÷ nItiindignation ö Indıgðnatyon ö IndIgðneIs© nindignity InðdıgnIti InðdIgnItiindign Inðdıgn InðdaI ñ�� þ ó ninefficacious ö In÷ f ü ý���� I ðkakyos ö In÷ f ü ý�����t© ñ�� ó ðkeIs© s

108 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

inefficacity ö In÷ f ü ý���� I ðkakIti ö In÷ f ü ý����n©Yñ�� óïðkæsñ�ø�ó Itiinelasticity ö In÷ læsð tIkIti ö InI ñ�ù;ó læsð tIsñ�ø�ó Itiinelastic ö In÷>ð læstIk ö InI ñ�ù;ó!ð læstIkiniquity I ðnIkwIti I ðnIkwItiinsane Inðsan InðseIninsanity InðsanIti InðsænItiintervene ö Int÷ r ðven ö Intr. ðvinintervention ö Int÷ r ðv ÷ ntyon ö Intr. ðv ÷ nc© ninurbane ö InUr ðban ö Inr. ñ�ò� ?ó ðbeIninurbanity ö InUr ðbænIti ö Inr. ñ�ò� ?ó ðbænItiinvitation ö Invı ð tatyon ö InvI ð teIs© ninvite Inðvıt InðvaItinvocation ö Invoðkatyon ö Inv ©\ñ��#óÆðkeIs© ninvoke Inðvok InðvoUkionization ö ıonı ðzatyon öaI ©Yñ��#ó n©\ñ�� óïðzeIs© nionize ð ıoönız ðaI ©Yñ��#ó!önaIzisocline ð ısoöklın ðaIs©Yñ��5ó!öklaInisoclinic ö ısoðklınIk öaIs©Yñ��5ó!ðkl InIkisotone ð ısoö ton ðaIs© ñ��5ó ö toUnisotonic ö ısoð tonIk öaIs© ñ��5ó ð tonIkisotope ð ısoö top ðaIs© ñ��5ó ö toUpisotopic ö ısoð topIk öaIs© ñ��5ó ð topIkkaleidoscope k ñ ÿ ó æð li ñ�ù"� ó doöskop k ñ ÿ ó © ñ�õ5ó ð laI ñ�ù"� ó d© ñ��#ó öskoUpkaleidoscopic k ñ ÿ ó æö li ñ�ù"� ó doðskopIk k ñ ÿ ó © ñ�õ5ó ö laI ñ�ù"� ó d© ñ��#ó ðskopIkkaryotype ðk ñ ÿ ó ærI ñ�ú,ó oö ti ñ(ú,ó p ðk ñ ÿ ó æriñ(ú(ó © ñ��#ó ö taI ñ�ú,ó pkaryotypic ök ñ ÿ ó ærI ñ�ú,ó oð ti ñ(ú,ó pIk ök ñ ÿ ó æriñ(ú(ó © ñ��#ó ð tI ñ(ú(ó pIkkyanite ðk ñ ÿ ó i ñ�ú,ó æönıt ðk ñ ÿ ó aI ñ�ú,óÆ©Yñ�õ\ó�önaItkyanitic ök ñ ÿ ó i ñ�ú,ó æðnıtIk ök ñ ÿ ó aI ñ�ú,óÆ©Yñ�õ\ó�ðnItIklaccolite ð lækü ý������ oö l ıt ð lækü ý������t©Yñ��5ó�ö laItlaccolitic ö lækü ý������ oð l ItIk ö lækü ý������t©Yñ��5ó�ð l ItIklachrymose ð lækñ�øfû�ó rI ñ(ú,ó�ömos ð lækñ�øfû�ó r ©Yñ(ú(ó!ömoUslachrymosity ö lækñ�øfû�ó rI ñ(ú,ó ðmosIti ö lækñ�øfû�ó r © ñ(ú(ó ðmosItilactone ð lækton ð læktoUnlactonic lækð tonIk lækð tonIklanose ð lanos ð leInoUslanosity laðnosIti leI ðnosItilanuginous læðnugInos l © ñ�õ5ó ðnu © ñ�� ó n© slanugo læðnugo l © ñ�õ5ó ðnugoU

laryngoscope læðrI ñ�ú,ó ngoöskop l ©Yñ�õ5ó�ð rI ñ(ú,ó�� g©Yñ��5ó!öskoUplaryngoscopic læörI ñ�ú,ó ngoðskopIk l ©Yñ�õ5ó�ö rI ñ(ú,ó�� g©Yñ��5ó!ðskopIklaterite ð læt÷>ö rıt ð læt©�ö raItlateritic ö læt÷>ð rıtIk ö læt©�ð rItIklavation læðvatyon læðveIs© nlenticellate ö l ÷ ntI ðk ÷ l ü ý������ at ö l ÷ ntI ðsñ�ø�óm÷ l ü ý���� I ñ�õ\ó t ��ñ�ù;ólenticel ð l ÷ ntI ök ÷ l ð l ÷ ntI ösñ�ø�ó ÷ l

3.A. ENGLISHDEEPAND SHALLOW ORL’S 109

deep shallow

lentic ð l ÷ ntIk ð l ÷ ntIkleucite ð lu ñ�ù?ò�ó kıt ð lu ñ�ù?ò�ó sñ�ø�ó aItleucitic lu ñ�ù?ò�óÆðkıtIk lu ñ�ù?ò�óÆðsñ�ø�ó ItIkleukocyte ð lu ñ�ù?ò�ó k ñ ÿ ó oösñ�ø�ó i ñ�ú,ó t ð lu ñ�ù?ò�ó k ñ ÿ óÆ©Yñ��5ó!ösñ�ø�ó aI ñ(ú,ó tleukocytic ö lu ñ�ù?ò�ó k ñ ÿ ó oðsñ�ø�ó i ñ�ú,ó tIk ö lu ñ�ù?ò�ó k ñ ÿ ó © ñ��5ó ðsñ�ø�ó I ñ�ú,ó tIklignite ð l Ignıt ð l IgnaItlignitic l IgðnıtIk l IgðnItIklimonite ð l ımoönıt ð laIm© ñ��#ó önaItlimonitic ö l ımoðnıtIk ö laIm© ñ��#ó ðnItIklithophyte ð l I � oö fi ñ�ú,ó t ü ý�þ;ÿ�� ð l I ��© ñ��5ó ö faI ñ(ú,ó t ü ýÆþ�ÿ��lithophytic ö l I � oð fi ñ�ú,ó tIk ü ý�þ�ÿ�� ö l I ��© ñ��5ó ð f I ñ�ú,ó tIk ü ý�þ�ÿ��logicism ð logI ökIsm ð loˇI ösñ�ø�ó Iz© mlogic ð logIk ð loˇIkloquacious loðkwakyos loU ðkweIs© sloquacity loðkwækIti loU ðkwæsñ�ø�ó Itilycanthrope ð li ñ(ú,ó kænö � rop ð laI ñ(ú,ó k ©Yñ�õ5ó nö � roUplycanthropic ö li ñ(ú,ó kænð � ropIk ö laI ñ(ú,ó k © ñ�õ5ó nð � ropIklymphocyte ð l I ñ�ú,ó mfoösñ�ø�ó i ñ(ú,ó t ü ý�þ;ÿ�� ð l I ñ�ú,ó mf © ñ��#ó ösñ�ø�ó aI ñ(ú(ó t ü ý�þ;ÿ��lymphocytic ö l I ñ�ú,ó mfoðsñ�ø�ó i ñ(ú,ó tIk ü ýÆþ�ÿ�� ö l I ñ�ú,ó mf © ñ��#ó ðsñ�ø�ó I ñ(ú,ó tIk ü ýÆþ�ÿ��lyricism ð l I ñ�ú,ó rI ökIsm ð l I ñ�ú,ó rI ösñ�ø�ó Iz© mlyricist ð l I ñ�ú,ó rIkIst ð l I ñ�ú,ó rIsñ�ø�ó Istlyric ð l I ñ�ú,ó rIk ð l I ñ�ú,ó rIkmacrocyte ðmækroösñ�ø�ó i ñ�ú,ó t ðmækr© ñ��#ó ösñ�ø�ó aI ñ(ú,ó tmacrocytic ömækroðsñ�ø�ó i ñ�ú,ó tIk ömækr© ñ��#ó ðsñ�ø�ó I ñ�ú,ó tIkmacrophage ðmækroö fagü ý�þ;ÿ�� ðmækr©Yñ��#ó�ö feIˇ ü ýÆþ�ÿ��macrophagic ömækroð fagIk ü ý�þ�ÿ�� ömækr©Yñ��#ó�ð fæ Ik ü ýÆþ�ÿ��magnetite ðmægn÷>ö tıt ðmægnI ñ�ù;ó�ö taItmagnetitic ömægn÷>ð tıtIk ömægnI ñ�ù;ó�ð tItIkmagnificence mægðnIf Ik ÷ ns mægðnIf Isñ�ø�ó © nsmagnificent mægðnIf Ik ÷ nt mægðnIf Isñ�ø�ó © ntmagnific mægðnIf Ik mægðnIf Ikmalignity mæð l ıgnIti m© ñ�õ5ó ð l IgnItimalign mæð l ıgn m© ñ�õ5ó ð laI ñ�� þ ó nmartensite ðmært÷ nöz ñ��>ó ıt ðm� rt ÷ nöz ñ��>ó aItmartensitic ömært÷ nðz ñ��>ó ıtIk öm� rt ÷ nðz ñ��>ó ItIkmatrices ðmatrI ök � ez ðmeItrI ösñ�ø�ó � izmatrix ðmatrIks ðmeItrIksmedicine ðm÷ dIkIn��ñ�ù;ó ðm÷ dIsñ�ø�ó In��ñ�ù;ómedic ðm÷ dIk ðm÷ dIkmegaphone ðm÷ gæö fonü ý�þ;ÿ�� ðm÷ g©Yñ�õ\óÆö foUn ü ý�þ;ÿ��megaphonic öm÷ gæð fonIk ü ý�þ;ÿ�� öm÷ g©Yñ�õ\óÆð fonIk ü ý�þ;ÿ��mendacious m÷ nðdakyos m÷ nðdeIs© smendacity m÷ nðdakIti m÷ nðdæsñ�ø�ó Itimesophyte ðm÷ z ñ��>ó oö fi ñ(ú(ó t ü ý�þ;ÿ�� ðm÷ z ñ��>ó © ñ��#ó ö faI ñ�ú,ó t ü ý�þ;ÿ��mesophytic öm÷ z ñ��>ó oð fi ñ(ú(ó tIk ü ýÆþ�ÿ�� öm÷ z ñ��>ó © ñ��#ó ð f I ñ(ú,ó tIk ü ýÆþ�ÿ��

110 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

mesothoracic öm÷ z ñ��>ó o� oð rækIk öm÷ z ñ��>óÆ©Yñ��5ó#����ð ræsñ�ø�ó Ikmesothorax öm÷ z ñ��>ó oð � oræks öm÷ z ñ��>óÆ©Yñ��5óïð ��� ræksmetaphysicist öm÷ tæð f I ñ(ú,ó z ñ��>ó IkIstü ý�þ�ÿ�� öm÷ t ©Yñ�õ5ó!ð f I ñ(ú,ó z ñ��>ó Isñ�ø�ó Istü ý�þ;ÿ��metaphysic öm÷ tæð f I ñ(ú,ó z ñ��>ó Ik ü ýÆþ�ÿ�� öm÷ t ©Yñ�õ5ó!ð f I ñ(ú,ó z ñ��>ó Ik ü ý�þ�ÿ��metathoracic öm÷ tæ� oðrækIk öm÷ t © ñ�õ5ó ����ðræsñ�ø�ó Ikmetathorax öm÷ tæð � oræks öm÷ t © ñ�õ5ó ð ��� ræksmeteorite ðmeteoörıt ðmiti © ñ��#ó ö raItmeteoritic ömeteoðrıtIk ömiti © ñ��#ó ð rItIkmetronome ðm÷ troönom ðm÷ tr © ñ��#ó önoUmmetronomic öm÷ troðnomIk öm÷ tr © ñ��#ó ðnomIkmicrocyte ðmıkroösñ�ø�ó i ñ(ú,ó t ðmaIkr © ñ��#ó ösñ�ø�ó aI ñ(ú,ó tmicrocytic ömıkroðsñ�ø�ó i ñ(ú,ó tIk ömaIkr © ñ��#ó ðsñ�ø�ó I ñ�ú,ó tIkmicroparasite ömıkroðpæræö sıt ömaIkroU ðpær©\ñ�õ\ó ösaItmicroparasitic ömıkroöpæræð sıtIk ömaIkroU öpær©\ñ�õ\ó ðsItIkmicrophone ðmıkroö fonü ý�þ;ÿ�� ðmaIkr ©Yñ��#ó�ö foUn ü ý�þ;ÿ��microphonic ömıkroð fonIk ü ý�þ;ÿ�� ömaIkr ©Yñ��#ó�ð fonIk ü ýÆþ�ÿ��microphyte ðmıkroö fi ñ�ú,ó t ü ý�þ�ÿ�� ðmaIkr ©Yñ��#ó�ö faI ñ(ú,ó t ü ý�þ;ÿ��microphytic ömıkroð fi ñ�ú,ó tIk ü ý�þ;ÿ�� ömaIkr ©Yñ��#ó�ð f I ñ�ú,ó tIk ü ý�þ;ÿ��microscope ðmıkroöskop ðmaIkr © ñ��#ó öskoUpmicroscopic ömıkroðskopIk ömaIkr © ñ��#ó ðskopIkmicrotome ðmıkroö tom ðmaIkr © ñ��#ó ö toUmmicrotomic ömıkroð tomIk ömaIkr © ñ��#ó ð tomIkmime ðmım ðmaImmimic ðmımIk ðmImIkmisanthrope ðmIsænö � rop ðmIs© ñ�õ5ó nö � roUpmisanthropic ömIsænð � ropIk ömIs© ñ�õ5ó nð � ropIkmispronounce ömIsproðnuns ömIspr©Yñ��#óÆðnaUnsmispronunciation ömIsproönu ñ�ò,ó nsñ�ø�ó I ð atyon ömIspr©Yñ��#óÆönô nsñ�ø�ó i ñ � ó!ðeIs© nmithridate ðmI � rI ödat ðmI � r ©\ñ�� ó!ödeItmithridatic ömI � rI ðdatIk ömI � rI ðdætIkmonasticism moðnæstI ökIsm m©Yñ��5ó!ðnæstI ösñ�ø�ó Iz© mmonastic moðnæstIk m© ñ��5ó ðnæstIkmonochromate ömonoðk ñ�øaû,ó romat ömon© ñ��#ó ðk ñ�øaû�ó roUmeItmonochromatic ömonokñ�øaû,ó roðmatIk ömon© ñ��#ó k ñ�øaû�ó roU ðmætIkmonocline ðmonoöklın ðmon© ñ��#ó öklaInmonoclinic ömonoðklınIk ömon© ñ��#ó ðkl InIkmonocyte ðmonoösñ�ø�ó i ñ(ú,ó t ðmon© ñ��#ó ösñ�ø�ó aI ñ(ú(ó tmonocytic ömonoðsñ�ø�ó i ñ(ú,ó tIk ömon© ñ��#ó ðsñ�ø�ó I ñ(ú,ó tIkmonotype ðmonoö ti ñ�ú,ó p ðmon© ñ��#ó ö taI ñ(ú(ó pmonotypic ömonoð ti ñ�ú,ó pIk ömon©Yñ��#ó�ð tI ñ(ú,ó pIkmonzonite ðmonzoönıt ðmonz©Yñ��5ó�önaItmonzonitic ömonzoðnıtIk ömonz©Yñ��5ó�ðnItIkmordacious morðdakyos m� r ðdeIs© smordacity morðdakIti m� r ðdæsñ�ø�ó Itimucose ðmukos ðmyukoUs

3.A. ENGLISHDEEPAND SHALLOW ORL’S 111

deep shallow

mucosity mukð osIti myuðkosItimyope ðmi ñ(ú,ó op ðmaI ñ(ú,ó oUpmyopic mi ñ(ú,ó�ð opIk maI ñ(ú,óÆðopIkmysticism ðmI ñ�ú,ó stI ökIsm ðmI ñ�ú,ó stI ösñ�ø�ó Iz© mmystic ðmI ñ�ú,ó stIk ðmI ñ�ú,ó stIkneoclassicism öneoðklæsü ý������ I ökIsm önioU ðklæsü ý���� I ösñ�ø�ó Iz© mneoclassic öneoðklæsü ý������ Ik önioU ðklæsü ý���� Ikneophyte ðneoö fi ñ(ú,ó t ü ý�þ;ÿ�� ðni © ñ��5ó ö faI ñ(ú,ó t ü ý�þ�ÿ��neophytic öneoð fi ñ(ú,ó tIk ü ý�þ;ÿ�� öni © ñ��5ó ð f I ñ�ú,ó tIk ü ý�þ;ÿ��neuroticism nuñ�ù?ò,ó ðrotI ökIsm nuñ�ù?ò,ó ðrotI ösñ�ø�ó Iz© mneurotic nuñ�ù?ò,ó ðrotIk nuñ�ù?ò,ó ðrotIknoctiluca önoktI ð lukæ önokt© ñ � ó ð luk ©noctilucent önoktI ð luk ÷ nt önokt©Yñ � ó�ð lusñ�ø�ó!© ntnodose ðnoödos ðnoU ödoUsnodosity noðdosIti noU ðdosItinummulite ðnô m ü ý���� Ul ö ıt ðnô m ü ý���� y ©�ö laItnummulitic önô m ü ý���� U ð l ıtIk önô m ü ý���� y ©�ð l ItIkobligee öoblI ðge öobl©Yñ�� óÆðˇiobligor öoblI ðgor öobl© ñ�� ó ðg� roblique oðbleñ��vó k ñ��\ò,ó © ñ��5ó ðbli ñ � ó k ñ��\ò�óobliquity oðbleñ��vó kwIti © ñ��5ó ðblIkwItiobscene obðs� ñ�ø�ó en © ñ��5ó bðs� ñ�ø�ó inobscenity obðs� ñ�ø�ó enIti © ñ��5ó bðs� ñ�ø�ó ÷ nItiomnificent omðnIf Ik ÷ nt omðnIf Isñ�ø�ó © ntomnific omðnIf Ik omðnIf Ikomophagous oðmofagosü ýÆþ�ÿ�� oU ðmof©Yñ�õ\ó g© sü ý�þ�ÿ��omophagy oðmofægI ü ý�þ�ÿ�� oU ðmof©Yñ�õ\ó ˇi ü ý�þ;ÿ��oolite ð ooö l ıt ðoU ©Yñ��5óÆö laItoolitic ö ooð l ıtIk öoU © ñ��5ó ð l ItIkoophyte ð ooö fi ñ�ú,ó t ü ý�þ;ÿ�� ðoU © ñ��5ó ö faI ñ�ú,ó t ü ýÆþ�ÿ��oophytic ö ooð fi ñ�ú,ó tIk ü ýÆþ�ÿ�� öoU © ñ��5ó ð f I ñ(ú(ó tIk ü ý�þ�ÿ��opacity oðpakIti oU ðpæsñ�ø�ó Itiopaque oðpakñ��\ò�ó oU ðpeIk ñ��\ò�óoperate ðop÷>ö rat ðop©�ö reItoperatic öop÷>ð ratIk öop©�ð rætIkophthalmoscope of ð � ælmoöskopü ý�þ;ÿ�� of ð � ælm© ñ��#ó öskoUp ü ýÆþ�ÿ��ophthalmoscopic of ö � ælmoðskopIk ü ý�þ;ÿ�� of ö � ælm©\ñ��#óÆðskopIk ü ýÆþ�ÿ��organicism orðgænI ökIsm � r ðgænI ösñ�ø�ó Iz© morganic orðgænIk � r ðgænIkorganization öorgænI ðzatyon ö � rg©Yñ�õ\ó nI ðzeIs© norganize ðorgæönız ð � rg©Yñ�õ\óÆönaIzorthoscope ðor� oöskop ð � r ��© ñ��#ó öskoUporthoscopic öor� oðskopIk ö � r ��© ñ��#ó ðskopIkosteophyte ðosteoö fi ñ(ú,ó t ü ý�þ;ÿ�� ðosti© ñ��5ó ö faI ñ(ú,ó t ü ý�þ�ÿ��osteophytic öosteoð fi ñ(ú,ó tIk ü ý�þ;ÿ�� öosti© ñ��5ó ð f I ñ�ú,ó tIk ü ý�þ;ÿ��

112 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

otiose ð otI ö os ðoUsñ�$^ó i ñ��vó�öoUsotiosity ö otI ð osIti öoUsñ�$^ó i ñ��vó�ðosItiotoscope ð otoöskop ðoUt ©\ñ��#óÆöskoUpotoscopic ö otoðskopIk öoUt ©\ñ��#óÆðskopIkoxidase ðoksı ödas ðoksI ödeIsoxidasic öoksI ðdasIk öoksI ðdæsIkoxidation öoksı ðdatyon öoksI ðdeIs© noxide ðoksıd ðoksaIdozone ð ozon ðoUzoUnozonic oðzonIk oU ðzonIkpalindrome ðpælInödrom ðpælInödroUmpalindromic öpælInðdromIk öpælInðdromIkpantomime ðpæntoömım ðpænt©\ñ��#óÆömaImpantomimic öpæntoðmımIk öpænt©\ñ��#óÆðmImIkparasite ðpæræö sıt ðpær©Yñ�õ\óÆösaItparasiticide öpæræð sıtI ösñ�ø�ó ıd öpær©Yñ�õ\óÆðsItI ösñ�ø�ó aIdparasitic öpæræð sıtIk öpær©Yñ�õ\óÆðsItIkparoxytone pæð roksI ñ�ú,ó�ö ton p©\ñ�õ\ó�ðroksI ñ(ú,ó�ö toUnparoxytonic öpæroksI ñ�ú,ó ð tonIk öpæroksI ñ�ú,ó ð tonIkpasteurization öpæstyñ�ù;ó Ur ð Izatyon öpæsc ñ�$^ó © ñ�ù?ò�ó rI ðzeIs© npasteurize ðpæstyñ�ù;ó öUrız ðpæsc ñ�$^ó © ñ�ù?ò�ó öraIzpathogene ðpæ� oöˇen ðpæ��© ñ��5ó öˇinpathogenic öpæ� oðˇenIk öpæ��© ñ��5ó ðˇ ÷ nIkpearlite ðp÷ ñ�ù>õ\ó rlıt ðpr. ñ�ù>õ� ?ó laItpearlitic p÷ ñ�ù>õ\ó r ð l ıtIk pr. ñ�ù>õ� �ó ð l ItIkpedicel ðp÷ dIk ÷ l ðp÷ dIsñ�ø�ó © ñ�ù;ó lpedicle ðp÷ dIkl ðp÷ dIk © lpegmatite ðp÷ gmæö tıt ðp÷ gm©Yñ�õ\óÆö taItpegmatitic öp÷ gmæð tıtIk öp÷ gm©Yñ�õ\óÆð tItIkpeptone ðp÷ pton ðp÷ ptoUnpeptonic p÷ pð tonIk p÷ pð tonIkperidotite öp÷ rI ðdotıt öp÷ rI ðdoUtaItperidotitic öp÷ rIdoð tıtIk öp÷ rIdoU ð tItIkperiscope ðp÷ rI öskop ðp÷ rI öskoUpperiscopic öp÷ rI ðskopIk öp÷ rI ðskopIkperlite ðp÷ rlıt ðpr.laItperlitic p÷ r ð l ıtIk pr. ð l ItIkperspicacious öp÷ rspI ðkakyos öpr.sp© ñ�� ó ðkeIs© sperspicacity öp÷ rspI ðkakIti öpr.sp© ñ�� ó ðkæsñ�ø�ó Itipertinacious öp÷ rtI ðnakyos öpr.t ©Yñ � ó�ðneIs© spertinacity öp÷ rtI ðnakIti öpr.t ©Yñ � ó�ðnæsñ�ø�ó Itiphagocyte ð fægoösñ�ø�ó i ñ�ú,ó t ü ýÆþ�ÿ�� ð fæg©Yñ��5óÆösñ�ø�ó aI ñ�ú,ó t ü ý�þ�ÿ��phagocytic ö fægoðsñ�ø�ó i ñ�ú,ó tIk ü ý�þ;ÿ�� ö fæg©Yñ��5óÆðsñ�ø�ó I ñ(ú(ó tIk ü ý�þ;ÿ��phallicism ð fælü ý���� I ökIsmü ý�þ;ÿ�� ð fælü ý���� I ösñ�ø�ó Iz© m ü ý�þ;ÿ��phallic ð fælü ý���� Ik ü ý�þ�ÿ�� ð fælü ý���� Ik ü ý�þ;ÿ��

3.A. ENGLISHDEEPAND SHALLOW ORL’S 113

deep shallow

pharmacal ð færmækælü ý�þ�ÿ�� ð f � rm©\ñ�õ\ó k ©Yñ�õ5ó l ü ýÆþ�ÿ��pharmacist ð færmækIstü ý�þ;ÿ�� ð f � rm©\ñ�õ\ó sñ�ø�ó Istü ý�þ;ÿ��pharmacy ð færmækI ü ý�þ;ÿ�� ð f � rm©\ñ�õ\ó sñ�ø�ó i ü ýÆþ�ÿ��pharyngoscope fæðrI ñ�ú,ó ngoöskopü ýÆþ�ÿ�� f ©Yñ�õ\ó!ð rI ñ(ú(ó�� g©Yñ��5ó�öskoUp ü ýÆþ�ÿ��pharyngoscopic fæörI ñ�ú,ó ngoðskopIk ü ýÆþ�ÿ�� f © ñ�õ\ó ö rI ñ(ú(ó � g© ñ��5ó ðskopIk ü ýÆþ�ÿ��phenotype ð fenoö ti ñ(ú(ó p ü ý�þ;ÿ�� ðfin © ñ��5ó ö taI ñ(ú,ó p ü ýÆþ�ÿ��phenotypic ö fenoð ti ñ(ú(ó pIk ü ý�þ;ÿ�� öfin © ñ��5ó ð tI ñ�ú,ó pIk ü ý�þ�ÿ��philhellene f I l ðh÷ l ü ý����� enü ýÆþ�ÿ�� f I l ðh÷ l ü ý������ in ü ý�þ;ÿ��philhellenic ö f I lh ÷>ð l ü ý����� enIk ü ý�þ�ÿ�� ö f I lh ÷>ð l ü ý������n÷ nIk ü ý�þ;ÿ��phonolite ð fonoö l ıt ü ýÆþ�ÿ�� ð foUn© ñ��5ó ö laIt ü ý�þ;ÿ��phonolitic ö fonoð l ıtIk ü ý�þ;ÿ�� ö foUn© ñ��5ó ð l ItIk ü ý�þ;ÿ��phonotype ð fonoö ti ñ(ú(ó p ü ý�þ;ÿ�� ð foUn© ñ��5ó ö taI ñ(ú,ó p ü ýÆþ�ÿ��phonotypic ö fonoð ti ñ(ú(ó pIk ü ý�þ�ÿ�� ö foUn©Yñ��5óÆð tI ñ�ú,ó pIk ü ý�þ�ÿ��phosphate ð fosfatü ý�þ;ÿ�� ð fosfeIt ü ýÆþ�ÿ��phosphatic fosð fatIk ü ý�þ;ÿ�� fosð fætIk ü ýÆþ�ÿ��phosphorite ð fosfoö rıt ü ý�þ;ÿ�� ð fosf©\ñ��#óÆöraIt ü ý�þ;ÿ��phosphoritic ö fosfoð rıtIk ü ý�þ;ÿ�� ö fosf©\ñ��#óÆðrItIk ü ýÆþ�ÿ��photoelectricity ö foto÷ l ÷ k ð trIkItI ü ý�þ�ÿ�� ö foUtoUI ñ�ù;ó l ÷ k ð trIsñ�ø�ó Iti ü ý�þ;ÿ��photoelectric ö foto÷>ð l ÷ ktrIk ü ý�þ�ÿ�� ö foUtoUI ñ�ù;ó ð l ÷ ktrIk ü ýÆþ�ÿ��photogene ð fotoöˇenü ýÆþ�ÿ�� ð foUt © ñ��#ó öˇin ü ý�þ;ÿ��photogenic ö fotoðˇenIk ü ý�þ;ÿ�� ö foUt © ñ��#ó ðˇ ÷ nIk ü ýÆþ�ÿ��phototype ð fotoö ti ñ�ú,ó p ü ý�þ�ÿ�� ð foUt © ñ��#ó ö taI ñ(ú,ó p ü ý�þ�ÿ��phototypic ö fotoð ti ñ�ú,ó pIk ü ý�þ;ÿ�� ö foUt © ñ��#ó ð tI ñ�ú,ó pIk ü ý�þ;ÿ��phyllite ð f I ñ(ú,ó l ü ý������ ıt ü ý�þ;ÿ�� ð f I ñ�ú,ó l ü ý���� aIt ü ý�þ�ÿ��phyllitic f I ñ(ú,ó ð l ü ý������ ıtIk ü ý�þ;ÿ�� f I ñ�ú,ó ð l ü ý����� ItIk ü ý�þ;ÿ��phyllome ð f I ñ(ú,ó l ü ý������ omü ýÆþ�ÿ�� ð f I ñ�ú,ó l ü ý���� oUm ü ý�þ;ÿ��phyllomic f I ñ(ú,óÆð l ü ý������ omIk ü ý�þ;ÿ�� f I ñ�ú,ó!ð l ü ý����� omIk ü ýÆþ�ÿ��phylogenesis ö fi ñ�ú,ó lo ðˇen÷ sIsü ý�þ;ÿ�� ö faI ñ(ú,ó l ©Yñ��#ó�ðˇ ÷ nI ñ�ù;ó sIsü ý�þ;ÿ��phylogenic ö fi ñ�ú,ó lo ðˇenIk ü ý�þ;ÿ�� ö faI ñ(ú,ó l ©Yñ��#ó�ðˇ ÷ nIk ü ý�þ;ÿ��physical ð f I ñ(ú,ó z ñ��>ó Ikælü ý�þ;ÿ�� ð f I ñ�ú,ó z ñ��>ó Ik ©\ñ�õ\ó l ü ý�þ;ÿ��physicist ð f I ñ(ú,ó z ñ��>ó IkIstü ý�þ;ÿ�� ð f I ñ�ú,ó z ñ��>ó Isñ�ø�ó Istü ý�þ;ÿ��physics ð f I ñ(ú,ó z ñ��>ó Ik � sü ý�þ;ÿ�� ð f I ñ�ú,ó z ñ��>ó Ik � sü ý�þ�ÿ��physic ð f I ñ(ú,ó z ñ��>ó Ik ü ý�þ�ÿ�� ð f I ñ�ú,ó z ñ��>ó Ik ü ý�þ;ÿ��phytophagous fi ñ�ú,ó ð tofagosü ýÆþ�ÿ�� faI ñ(ú,ó ð tof © ñ�õ\ó g© sü ý�þ;ÿ��phytophagy fi ñ�ú,ó ð tofa I ü ý�þ�ÿ�� faI ñ(ú,ó ð tof © ñ�õ\ó ˇi ü ý�þ;ÿ��pilose ðpılos ðpaI loUspilosity pı ð losIti paI ð losItipisolite ðpısoö l ıt ðpaIs© ñ��#ó ö laItpisolitic öpısoð l ıtIk öpaIs©Yñ��#ó!ð l ItIkplasmagene ðplæzñ��>ó mæöˇen ðplæzñ��>ó m©Yñ�õ\ó!öˇinplasmagenic öplæzñ��>ó mæðˇenIk öplæzñ��>ó m©Yñ�õ\ó!ðˇ ÷ nIkplasticity plæðstIkIti plæðstIsñ�ø�ó Itiplasticize ðplæstI ökız ðplæstI ösñ�ø�ó aIzplastic ðplæstIk ðplæstIkpleasance ðpleñ�ù>õ\ó z ñ��>ó æns ðpl ÷ ñ�ù>õ\ó z ñ��>ó © ñ�õ\ó ns

114 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

please ðpleñ�ù>õ\ó z ñ��>ó ðpli ñ�ù^õ5ó z ñ��>óplumose ðplumos ðplumoUsplumosity pluðmosIti pluðmosItipodsolization öpodsolı ðzatyon öpods©Yñ��5ó l I ðzeIs© npodsolize ðpodsoö l ız ðpods© ñ��5ó ö laIzpodzolization öpodzolı ðzatyon öpodz© ñ��5ó l I ðzeIs© npodzolize ðpodzoö l ız ðpodz© ñ��5ó ö laIzpoeticize poð ÷ tI ökız poU ð ÷ tI ösñ�ø�ó aIzpoetic poð ÷ tIk poU ð ÷ tIkpolarization öpolærI ðzatyon öpoUl © ñ�õ\ó rI ðzeIs© npolarize ðpolæö rız ðpoUl © ñ�õ\ó ö raIzpolemicist poð l ÷ mIkIst p© ñ��5ó ð l ÷ mIsñ�ø�ó Istpolemic poð l ÷ mIk p©Yñ��5óÆð l ÷ mIkpolite poð l ıt p©Yñ��5óÆð laItpolitical poð l ıtIkæl p©Yñ��5óÆð l ItIk ©Yñ�õ\ó lpoliticize poð l ıtI ökız p©Yñ��5óÆð l ItI ösñ�ø�ó aIzpolitic ðpolıtIk ðpolItIkpolyphone ðpolI ñ(ú,ó�ö fonü ý�þ;ÿ�� ðpoli ñ(ú,ó�ö foUn ü ý�þ�ÿ��polyphonic öpolI ñ(ú,ó ð fonIk ü ýÆþ�ÿ�� öpoli ñ(ú,ó ð fonIk ü ý�þ;ÿ��porcine ðporkın ðp� rsñ�ø�ó aInpork ðporkñ ÿ ó ðp� rk ñ ÿ óposterity poðst÷ rIti poðst÷ rItiposter ðpostr ðpoUstr.precocious pr÷>ðkokyos prI ñ�ù;ó ðkoUs© sprecocity pr÷>ðkokIti prI ðkosñ�ø�ó Itipredaceous pr÷>ðdaky ñ�ù;ó os prI ðdeIsñ�øfù;ó © spredacious pr÷>ðdakyos prI ðdeIs© spredacity pr÷>ðdækIti prI ðdæsñ�ø�ó Itiprevocational öprevoðkatyonæl öprivoU ðkeIs© n©Yñ�õ\ó lproctoscope ðproktoöskop ðprokt©Yñ��5ó öskoUpproctoscopic öproktoðskopIk öprokt©Yñ��5ó ðskopIkprodigal ðprodIgæl ðprod© ñ�� ó g© ñ�õ\ó lprodigy ðprodIgI ðprod© ñ�� ó ˇiprodrome ðprodrom ðproUdroUmprodromic proðdromIk proU ðdromIkprofane proð fan pr© ñ��#ó ð feInprofanity proð fanIti pr© ñ��#ó ð fænItiprofound proð fund pr©Yñ��#óÆð faUndprofundity proð fu ñ ò�ó ndIti pr©Yñ��#óÆð f ô ndItipronounce proðnuns pr©Yñ��#óÆðnaUnspronunciation proönu ñ ò�ó nsñ�ø�ó I ð atyon pr©Yñ��#óÆönô nsñ�ø�ó i ñ � ó!ðeIs© nprosaicism proðz ñ��>ó aI ökIsm proU ðz ñ��>ó eI I ösñ�ø�ó Iz© mprosaic proðz ñ��>ó aIk proU ðz ñ��>ó eI Ikprototype ðprotoö ti ñ(ú,ó p ðproUt © ñ��5ó ö taI ñ�ú,ó pprototypic öprotoð ti ñ(ú,ó pIk öproUt © ñ��5ó ð tI ñ(ú,ó pIk

3.A. ENGLISHDEEPAND SHALLOW ORL’S 115

deep shallow

providence ðprovId÷ ns ðprovId© nsprovide proðvıd pr©Yñ��#ó!ðvaIdprovocation öprovoðkatyon öprov ©\ñ��#ó ðkeIs© nprovoke proðvok pr©Yñ��#ó!ðvoUkpsammite ð � ñ��(ó sæmü ý������ ıt ð � ñ��,ó sæmü ý������ aItpsammitic � ñ��(ó sæðm ü ý���� ıtIk � ñ��,ó sæðm ü ý������ ItIkpsephite ð � ñ��(ó sefıt ü ý�þ;ÿ�� ð � ñ��,ó sifaIt ü ý�þ;ÿ��psephitic � ñ��(ó seð fıtIk ü ý�þ�ÿ�� � ñ��,ó sið f ItIk ü ý�þ;ÿ��pteridophyte � ñ��(ó t ÷?ðrIdoö fi ñ�ú,ó t ü ý�þ;ÿ�� � ñ��,ó t ©�ð rId© ñ��#ó ö faI ñ(ú(ó t ü ý�þ;ÿ��pteridophytic � ñ��(ó t ÷?örIdoð fi ñ�ú,ó tIk ü ý�þ�ÿ��%� ñ��,ó t ©�ö rId© ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��publicist ðpô blIkIst ðpô blIsñ�ø�ó Istpublicity pô/ðblIkIti pô/ðblIsñ�ø�ó Itipublicize ðpô blI ökız ðpô blI ösñ�ø�ó aIzpublic ðpô blIk ðpô blIkpugnacious pô gðnakyos pô gðneIs© spugnacity pô gðnakIti pô gðnæsñ�ø�ó Itipyrite ðpi ñ(ú(ó rıt ðpaI ñ(ú,ó raItpyritic pi ñ(ú(ó!ðrıtIk paI ñ(ú,óÆð rItIkpyroelectricity öpi ñ(ú(ó ro÷ l ÷ k ð trIkIti öpaI ñ(ú,ó roUI ñ�ù;ó l ÷ k ð trIsñ�ø�ó Itipyroelectric öpi ñ(ú(ó ro÷>ð l ÷ ktrIk öpaI ñ(ú,ó roUI ñ�ù;ó ð l ÷ ktrIkpyrrole ðpI ñ(ú,ó r ü ý������ ol ðpI ñ�ú,ó r ü ý����� oUlpyrrolic pI ñ(ú,ó ð r ü ý������ olIk pI ñ�ú,ó ð r ü ý����� olIkradioisotope ö radIoð ısotop öreIdi ñ��vó oU ðaIs© ñ��#ó toUpradioisotopic ö radIoö ısoð topIk öreIdi ñ��vó oU öaIs© ñ��#ó ð topIkradiopacity ö radIoðpakIti öreIdi ñ��vó oU ðpæsñ�ø�ó Itiradiopaque ö radIoðpakñ��\ò�ó öreIdi ñ��vó oU ðpeIk ñ��\ò,óradiophone ð radIo ü &�þ;ÿ�� ö fonü ý�þ;ÿ�� ðreIdi ñ��vó oU ü &�þ�ÿ��×ö foUn ü ý�þ;ÿ��radiophonic ö radIo ü &�þ;ÿ�� ð fonIk ü ý�þ;ÿ�� öreIdi ñ��vó oU ü &�þ�ÿ��×ð fonIk ü ý�þ;ÿ��radioscope ð radIoöskop ðreIdi ñ��vó oU öskoUpradioscopic ö radIoðskopIk öreIdi ñ��vó oU ðskopIkradiotelephone ö radIo ü &�þ;ÿ�� ð t ÷ l ÷>ö fonü ý�þ�ÿ�� öreIdi ñ��vó oU ü &�þ�ÿ��×ð t ÷ l ©�ö foUn ü ý�þ�ÿ��radiotelephonic ö radIo ü &�þ;ÿ�� ö t ÷ l ÷>ð fonIk ü ý�þ;ÿ�� öreIdi ñ��vó oU ü &�þ�ÿ��×ö t ÷ l ©�ð fonIk ü ý�þ;ÿ��rapacious ræðpakyos r © ñ�õ\ó ðpeIs© srapacity ræðpækIti r © ñ�õ\ó ðpæsñ�ø�ó Itirealization ö reælI ðzatyon öri © ñ�õ\ó l I ðzeIs© nrealize ð reæö l ız ðri © ñ�õ\ó ö laIzrecitation ö r ÷ sñ�ø�ó ı ð tatyon ör ÷ sñ�ø�ó I ð teIs© nrecite r ÷>ðsñ�ø�ó ıt rI ðsñ�ø�ó aItreclination ö r ÷ klı ðnatyon ör ÷ kl © ñ�� ó ðneIs© nrecline r ÷>ðklın rI ðklaInregale r ÷>ðgal rI ðgeI lregality r ÷>ðgalIti rI ðgælItirenounce r ÷>ðnuns rI ðnaUnsrenunciation r ÷>önu ñ�ò�ó nsñ�ø�ó I ð atyon rI önô nsñ�ø�ó i ñ � ó!ðeIs© nreorganization ö reorgænI ðzatyon öri � rg© ñ�õ\ó nI ðzeIs© n

116 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

reorganize reðorgæönız ri ð � rg©\ñ�õ\óÆönaIzresidence ðr ÷ z ñ��>ó ıd÷ ns ð r ÷ z ñ��>ó Id© nsreside r ÷?ðz ñ��>ó ıd rI ðz ñ��>ó aIdresignation ör ÷ sıgðnatyon ö r ÷ z ñ��>ó IgðneIs© nresign r ÷?ð sıgn rI ðz ñ��>ó aI ñ � þ ó nreveal r ÷?ðveñ�ù>õ\ó l rI ðvi ñ�ù>õ5ó lrevelation ör ÷ veð latyon ö r ÷ v ©�ð leIs© nrevile r ÷?ðvıl rI ðvaI lrevocation ör ÷ voðkatyon ö r ÷ v © ñ��#ó ðkeIs© nrevoke r ÷?ðvok rI ðvoUkrhetoric ðretorIk ü ýÆþ�ÿ�� ð r ÷ t © ñ��#ó rIk ü ý�þ;ÿ��rhetor ðretorü ý�þ;ÿ�� ð rit © ñ��5ó r ü ý�þ�ÿ��rhyolite ðri ñ(ú(ó oö l ıt ü ý�þ�ÿ�� ð raI ñ�ú,ó�©Yñ��#ó�ö laIt ü ý�þ;ÿ��rhyolitic öri ñ(ú(ó oð l ıtIk ü ý�þ;ÿ�� ö raI ñ�ú,ó�©Yñ��#ó�ð l ItIk ü ý�þ;ÿ��rhythmicity rI ñ(ú,ó('#ðmIkItI ü ý�þ;ÿ�� rI ñ(ú(ó�'5ðmIsñ�ø�ó Iti ü ý�þ;ÿ��rhythmics ðrI ñ(ú,ó�' mIk � sü ý�þ;ÿ�� ð rI ñ(ú(ó ' mIk � sü ý�þ�ÿ��rimose ðrımos ð raImoUsrimosity rı ðmosIti raI ðmosItiromanticism roðmæntI ökIsm roU ðmæntI ösñ�ø�ó Iz© mromanticist roðmæntIkIst roU ðmæntIsñ�ø�ó Istromanticize roðmæntI ökız roU ðmæntI ösñ�ø�ó aIzromantic roðmæntIk roU ðmæntIkrugose ðrugos ð rugoUsrugosity ruðgosIti ruðgosItirusticity r ô/ðstIkIti r ô/ðstIsñ�ø�ó Itirustic ðr ô stIk ð r ô stIksabulose ðsæbUl ö os ðsæby©�ö loUssabulosity ösæbU ð losIti ösæby©�ð losItisagacious sæðgakyos s©Yñ�õ5ó!ðgeIs© ssagacity sæðgakIti s©Yñ�õ5ó!ðgæsñ�ø�ó Itisalacious sæð lakyos s©Yñ�õ5ó!ð leIs© ssalacity sæð lakIti s© ñ�õ5ó ð læsñ�ø�ó Itisalicine ðsælIkIn� ñ�ù;ó ðsælIsñ�ø�ó In� ñ�ù;ósalicin ðsælIkIn ðsælIsñ�ø�ó Insalic ðsælIk ðsælIksane ðsan ðseInsanity ðsanIti ðsænItisaprolite ðsæproö l ıt ðsæpr© ñ��#ó ö laItsaprolitic ösæproð l ıtIk ösæpr© ñ��#ó ð l ItIksaprophyte ðsæproö fi ñ(ú,ó t ü ýÆþ�ÿ�� ðsæpr©Yñ��#óÆö faI ñ(ú(ó t ü ý�þ;ÿ��saprophytic ösæproð fi ñ(ú,ó tIk ü ý�þ;ÿ�� ösæpr©Yñ��#óÆð f I ñ(ú,ó tIk ü ý�þ;ÿ��satellite ðsæt÷>ö l ü ý����� ıt ðsæt©�ö l ü ý������ aItsatellitic ösæt÷>ð l ü ý����� ıtIk ösæt©�ð l ü ý������ ItIksaturnine ðsætUr önın ðsæt©Yñ ò�ó r önaInsaturninity ðsætUr önınIti ðsæt© ñ ò�ó r önInIti

3.A. ENGLISHDEEPAND SHALLOW ORL’S 117

deep shallow

saxophone ðsæksoö fonü ý�þ;ÿ�� ðsæks©Yñ��5ó�ö foUn ü ý�þ;ÿ��saxophonic ösæksoð fonIk ü ý�þ;ÿ�� ösæks©Yñ��5ó�ð fonIk ü ýÆþ�ÿ��schizomycete öskñ�øaû�ó Izomi ñ(ú,óÆðket öskñ�øaû,ó IzoUmaI ñ(ú,óÆðsñ�ø�ó itschizomycetic öskñ�øaû�ó Izomi ñ(ú,óÆðketIk öskñ�øaû,ó IzoUmaI ñ(ú,óÆðsñ�ø�óï÷ tIkschizophyte ðskñ�øaû�ó Izoö fi ñ�ú,ó t ü ý�þ;ÿ�� ðskñ�øaû,ó Iz© ñ��#ó ö faI ñ(ú(ó t ü ý�þ;ÿ��schizophytic öskñ�øaû�ó Izoð fi ñ�ú,ó tIk ü ý�þ;ÿ�� öskñ�øaû,ó Iz© ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��scholasticism skñ�øaû�ó oð læstI ökIsm skñ�øaû,ó © ñ��5ó ð læstI ösñ�ø�ó Iz© mscholastic skñ�øaû�ó oð læstIk skñ�øaû,ó © ñ��5ó ð læstIkseismoscope ðsi ñ�ù"� ó z ñ��>ó moöskop ðsaI ñ�ù"� ó z ñ��>ó m© ñ��#ó öskoUpseismoscopic ösi ñ�ù"� ó z ñ��>ó moðskopIk ösaI ñ�ù"� ó z ñ��>ó m© ñ��#ó ðskopIksemen ðsem÷ n ðsim© nsemination ösem÷ ñ � ó ðnatyon ös÷ m© ñ�� ó ðneIs© nsemiparasite ös÷ mI ðpæræö sıt ös÷ mi ñ � ó!ðpær©Yñ�õ\óÆösaItsemiparasitic ös÷ mI öpæræð sıtIk ös÷ mi ñ � ó!öpær©Yñ�õ\óÆðsItIksepticemia ös÷ ptI ðkemIæ ös÷ ptI ðsñ�ø�ó imi ñ�� ó!©septicidal ös÷ ptI ðkıdæl ös÷ ptI ðsñ�ø�ó aId©\ñ�õ\ó lsepticity s÷ pð tIkIti s÷ pð tIsñ�ø�ó Itiseptic ðs÷ ptIk ðs÷ ptIksequacious s÷>ðkwakyos sI ñ�ù;ó ðkweIs© ssequacity s÷>ðkwakIti sI ñ�ù;ó ðkwæsñ�ø�ó Itiserene s÷>ð ren s©�ðrinserenity s÷>ð renIti s©�ðr ÷ nItisiderite ðsId÷?örıt ðsId©�ö raItsideritic ösId÷?ðrıtIk ösId©�ð rItIksigmoidoscope sIgðmoIdoöskop sIgðmoId© ñ��5ó öskoUpsigmoidoscopic sIgömoIdoðskopIk sIgömoId© ñ��5ó ðskopIksilicic sI ð l IkIk sI ð l Isñ�ø�ó Iksilicide ðsI l I ökıd ðsI l I ösñ�ø�ó aIdsiliciferous ösI l I ðkIf ÷ ros ösI l I ðsñ�ø�ó Ifr. © ssilicify sI ð l IkI ö fı sI ð l Isñ�ø�ó!©Yñ � ó�ö faI

silicon ðsI l Ikon ðsI l Ik ©Yñ��#ó nsomite ðsomıt ðsoUmaItsomitic soðmıtIk soU ðmItIksone ðson ðsoUnsonic ðsonIk ðsonIkspecifiable ðsp÷ sñ�ø�ó I ö fıæbI l ðsp÷ sñ�ø�ó © ñ�� ó ö faI © ñ�õ\ó b© lspecification ösp÷ sñ�ø�ó If I ðkatyon ösp÷ sñ�ø�ó © ñ�� ó f © ñ�� ó ðkeIs© nspecificative ðsp÷ sñ�ø�ó If I ökatIv � ñ�ù;ó ðsp÷ sñ�ø�ó © ñ�� ó f © ñ�� ó ökeItIv � ñ�ù;óspecificity ösp÷ sñ�ø�ó I ð fıkIti ösp÷ sñ�ø�ó © ñ�� ó ð f Isñ�ø�ó Itispecify ðsp÷ sñ�ø�ó I ö fı ðsp÷ sñ�ø�ó�©Yñ�� óïö faI

specimen ðsp÷ sñ�ø�ó Im÷ n ðsp÷ sñ�ø�ó�©Yñ�� ó m© nspectrohelioscope ösp÷ ktroðhelIoöskop ösp÷ ktr ©Yñ��#óÆðhili ñ��vóÆ©Yñ��#ó�öskoUpspectrohelioscopic ösp÷ ktroöhelIoðskopIk ösp÷ ktr ©Yñ��#óÆöhili ñ��vóÆ©Yñ��#ó�ðskopIkspectroscope ðsp÷ ktroöskop ðsp÷ ktr ©Yñ��#óÆöskoUpspectroscopic ösp÷ ktroðskopIk ösp÷ ktr © ñ��#ó ðskopIk

118 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

sphericity sfeðrIkItI ü ýÆþ�ÿ�� sf÷>ð rIsñ�ø�ó Iti ü ý�þ;ÿ��spherics ðsferIk � sü ý�þ;ÿ�� ðsf÷ rIk � sü ýÆþ�ÿ��spinose ðspı önos ðspaI önoUsspinosity spı ðnosIti spaI ðnosItisporophyte ðsporoö fi ñ(ú,ó t ü ý�þ;ÿ�� ðsp� r © ñ��#ó ö faI ñ(ú(ó t ü ý�þ;ÿ��sporophytic ösporoð fi ñ(ú,ó tIk ü ý�þ;ÿ�� ösp� r © ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��state ðstat ðsteItstatic ðstatIk ðstætIkstaurolite ðst� roö l ıt ðst� ñ�õ;ò�ó r © ñ��5ó ö laItstaurolitic öst� roð l ıtIk öst� ñ�õ;ò�ó r © ñ��5ó ð l ItIkstauroscope ðst� roöskop ðst� ñ�õ;ò�ó r © ñ��5ó öskoUpstauroscopic öst� roðskopIk öst� ñ�õ;ò�ó r © ñ��5ó ðskopIksteatite ðsteæö tıt ðsti©\ñ�õ\óïö taItsteatitic östeæð tıtIk östi©\ñ�õ\óïð tItIksteatopyga östeætoðpi ñ(ú,ó gæ östi©\ñ�õ\ó toU ðpaI ñ�ú,ó g©steatopygia östeætoðpi ñ(ú,ó gIæ östi©\ñ�õ\ó toU ðpaI ñ�ú,ó ˇi ñ��vóÆ©steatopygous östeætoðpi ñ(ú,ó gos östi© ñ�õ\ó toU ðpaI ñ�ú,ó g© sstenotype ðst÷ noö ti ñ(ú,ó p ðst÷ n© ñ��#ó ö taI ñ�ú,ó pstenotypic öst÷ noð ti ñ(ú,ó pIk öst÷ n© ñ��#ó ð tI ñ(ú,ó pIkstereoscope ðst÷ reoöskop ðst÷ ri © ñ��5ó öskoUpstereoscopic öst÷ reoðskopIk öst÷ ri © ñ��5ó ðskopIkstereotype ðst÷ reoö ti ñ�ú,ó p ðst÷ ri © ñ��5ó ö taI ñ(ú(ó pstereotypic öst÷ reoð ti ñ�ú,ó pIk öst÷ ri © ñ��5ó ð tI ñ(ú,ó pIksterilization öst÷ rI l I ðzatyon öst÷ r ©Yñ�� ó l I ðzeIs© nsterilize ðst÷ rI ö l ız ðst÷ r ©Yñ�� óïö laIzstethoscope ðst÷�� oöskop ðst÷���©Yñ��#ó�öskoUpstethoscopic öst÷�� oðskopIk öst÷���©Yñ��#ó�ðskopIkstoicism ðstoI ökIsm ðstoUI ösñ�ø�ó Iz© mstoic ðstoIk ðstoUIkstrobilation östrobI ð latyon östroUb© ñ�� ó ð leIs© nstrobila stroðbılæ stroU ðbaI l ©stroboscope ðstroboöskop ðstroUb© ñ��#ó öskoUpstroboscopic östroboðskopIk östroUb© ñ��#ó ðskopIkstromatolite stroðmætoö l ıt stroU ðmæt© ñ��#ó ö laItstromatolitic stroömætoð l ıtIk stroU ömæt© ñ��#ó ð l ItIkstylite ðsti ñ(ú(ó l ıt ðstaI ñ�ú,ó laItstylitic sti ñ(ú(ó ð l ıtIk staI ñ�ú,ó ð l ItIkstylolite ðsti ñ(ú(ó lo ö l ıt ðstaI ñ�ú,ó l ©Yñ��#ó!ö laItstylolitic östi ñ(ú(ó lo ð l ıtIk östaI ñ�ú,ó l ©Yñ��#ó!ð l ItIkstypticity stI ñ(ú,ó pð tIkIti stI ñ(ú(ó pð tIsñ�ø�ó Itistyptic ðstI ñ(ú,ó ptIk ðstI ñ(ú(ó ptIksublime sU ðblım s©Yñ ò�óïðblaImsublimity sU ðblImIti s© ñ ò�ó ðblImItisubvene sUbðven s© ñ ò�ó bðvinsubvention sUbðventyon s© ñ ò�ó bðv ÷ nc© n

3.A. ENGLISHDEEPAND SHALLOW ORL’S 119

deep shallow

sulfite ðsô lf ıt ðsô lfaItsulfitic sô l ð fıtIk sô l ð f ItIksupervene ösup÷ r ðven ösupr. ðvinsupervention ösup÷ r ðventyon ösupr. ðv ÷ nc© nsybarite ðsI ñ(ú,ó bæö rıt ðsI ñ�ú,ó b© ñ�õ\ó öraItsybaritic ösI ñ(ú,ó bæð rıtIk ösI ñ�ú,ó b© ñ�õ\ó ðrItIksyenite ðsi ñ�ú,ó ÷>önıt ðsaI ñ(ú,ó ©�önaItsyenitic ösi ñ�ú,ó ÷>ðnıtIk ösaI ñ(ú,ó ©�ðnItIksynagogical ösI ñ(ú,ó næðgogIkæl ösI ñ�ú,ó n© ñ�õ\ó ðgo Ik © ñ�õ5ó lsynagogue ðsI ñ(ú,ó næögog� ñ ò�ó � ñ�ù�ó ðsI ñ�ú,ó n© ñ�õ\ó ögog� ñ�ò�ó � ñ�ù;ósyndrome ðsI ñ(ú,ó ndrom ðsI ñ�ú,ó ndroUmsyndromic sI ñ(ú,ó nðdromIk sI ñ�ú,ó nðdromIktachistoscope tæðk ñ�øaû�ó Istoöskop t ©\ñ�õ\ó�ðk ñ�øfû�ó Ist©Yñ��5óïöskoUptachistoscopic tæök ñ�øaû�ó IstoðskopIk t ©\ñ�õ\ó�ök ñ�øfû�ó Ist©Yñ��5óïðskopIktachylite ð tækñ�øaû�ó I ñ�ú,ó!ö l ıt ð tækñ�øaû,óÆ©Yñ(ú(ó!ö laIttachylyte ð tækñ�øaû�ó I ñ�ú,ó!ö li ñ�ú,ó t ð tækñ�øaû,óÆ©Yñ(ú(ó!ö laI ñ(ú(ó ttachylytic ö tækñ�øaû�ó I ñ�ú,ó ð li ñ�ú,ó tIk ö tækñ�øaû,ó © ñ(ú(ó ð l I ñ(ú,ó tIktelescope ð t ÷ l ÷>öskop ð t ÷ l ©�öskoUptelescopic ö t ÷ l ÷>ðskopIk ö t ÷ l I ñ�ù;ó ðskopIktenacious t ÷>ðnakyos t ©�ðneIs© stenacity t ÷>ðnækIti t ©�ðnæsñ�ø�ó Ititephrite ð t ÷ frıt ü ý�þ;ÿ�� ð t ÷ fraIt ü ý�þ;ÿ��tephritic t ÷ f ðrıtIk ü ý�þ;ÿ�� t ÷ f ð rItIk ü ý�þ;ÿ��tetrabasicity ö t ÷ træbaðsIkIti ö t ÷ tr © ñ�õ5ó beI ðsIsñ�ø�ó Ititetrabasic ö t ÷ træðbasIk ö t ÷ tr ©Yñ�õ5ó!ðbeIsIkthallophyte ð � ælü ý���� oö fi ñ�ú,ó t ü ý�þ;ÿ�� ð � ælü ý������t©Yñ��5ó�ö faI ñ(ú,ó t ü ý�þ;ÿ��thallophytic ö � ælü ý���� oð fi ñ�ú,ó tIk ü ý�þ�ÿ�� ö � ælü ý������t©Yñ��5ó�ð f I ñ�ú,ó tIk ü ý�þ;ÿ��theodolite � eðodoö l ıt � i ðod©Yñ��#ó�ö laIttheodolitic � eöodoð l ıtIk � i öod©Yñ��#ó�ð l ItIkthermoelectricity ö �;÷ rmo÷ l ÷ k ð trIkIti ö � r.moUI ñ�ù;ó l ÷ k ð trIsñ�ø�ó Itithermoelectric ö �;÷ rmo÷>ð l ÷ ktrIk ö � r.moUI ñ�ù;ó ð l ÷ ktrIkthermoplasticity ö �;÷ rmoplæðstIkIti ö � r.m© ñ��#ó plæðstIsñ�ø�ó Itithermoplastic ö �;÷ rmoðplæstIk ö � r.m© ñ��#ó ðplæstIkthermoscope ð �;÷ rmöoskop ð � r.m© ñ��#ó öskoUpthermoscopic ö �;÷ rmoðskopIk ö � r.m© ñ��#ó ðskopIkthoracic � oð rækIk ����ðræsñ�ø�ó Ikthorax ð � oræks ð ��� ræksthrombocyte ð � romboösñ�ø�ó i ñ�ú,ó t ð � romb©Yñ��#ó�ösñ�ø�ó aI ñ(ú(ó tthrombocytic ö � romboðsñ�ø�ó i ñ�ú,ó tIk ö � romb©Yñ��#ó�ðsñ�ø�ó I ñ(ú,ó tIktone ð ton ð toUntonic ð tonIk ð tonIktope ð top ð toUptopic ð topIk ð topIktorose ð toros ð t � roUstorosity toð rosIti t ��ðrosIti

120 CHAPTER3. ORL DEPTHAND CONSISTENCY

deep shallow

toxicity toðksIkIti toðksIsñ�ø�ó Ititoxic ð toksIk ð toksIktoxophilite toðksofI ö l ıt ü ý�þ;ÿ�� toðksof©Yñ � ó�ö laIt ü ý�þ;ÿ��toxophilitic toöksofI ð l ıtIk ü ý�þ;ÿ�� toöksof©Yñ � ó�ð l ItIk ü ýÆþ�ÿ��trephination ö tr ÷ fı ðnatyonü ýÆþ�ÿ�� ö tr ÷ f © ñ � ó ðneIs© n ü ý�þ;ÿ��trephine tr ÷>ð fın ü ý�þ;ÿ�� trI ñ�ù;ó ð faIn ü ýÆþ�ÿ��triazole ð trıæözol ð traI © ñ�õ\ó özoUltriazolic ö trıæðzolIk ö traI © ñ�õ\ó ðzolIktrichite ð trIk ñ�øaû�ó ıt ð trIk ñ�øfû�ó aIttrichitic trI ðk ñ�øaû�ó ıtIk trI ðk ñ�øfû�ó ItIktrilobite ð trılo öbıt ð traI l © ñ��#ó öbaIttrilobitic ö trılo ðbıtIk ö traI l © ñ��#ó ðbItIktroglodyte ð trogloödi ñ�ú,ó t ð trogl©Yñ��#ó�ödaI ñ(ú,ó ttroglodytic ö trogloðdi ñ�ú,ó tIk ö trogl©Yñ��#ó�ðdI ñ�ú,ó tIktrope ð trop ð troUptropic ð tropIk ð tropIktropophyte ð tropoö fi ñ(ú(ó t ü ýÆþ�ÿ�� ð trop©Yñ��5ó�ö faI ñ�ú,ó t ü ýÆþ�ÿ��tropophytic ö tropoð fi ñ(ú(ó tIk ü ý�þ;ÿ�� ö trop©Yñ��5ó�ð f I ñ(ú(ó tIk ü ý�þ�ÿ��trypanosome ð trI ñ(ú,ó pænoösom ð trI ñ(ú,ó p© ñ�õ5ó n© ñ��5ó ösoUmtrypanosomic ö trI ñ(ú,ó pænoðsomIk ö trI ñ(ú,ó p© ñ�õ5ó n© ñ��5ó ðsomIktuberose ð tub÷>ö ros ð tub©�öroUstuberosity ö tub÷>ð rosIti ö tub©�ðrosItiultramicroscope ö ô ltræðmıkroöskop ö ô ltr © ñ�õ5ó ðmaIkr © ñ��#ó öskoUpultramicroscopic ö ô ltræömıkroðskopIk ö ô ltr © ñ�õ5ó ömaIkr © ñ��#ó ðskopIkunchaste ö ô nð cast ö ô nð ceIstunchastity ö ô nð castIti ö ô nð cæstItiuralite ðyUræö l ıt ðyUr. ©Yñ�õ\óïö laIturalitic öyUræð l ıtIk öyUr. ©Yñ�õ\óïð l ItIkuranite ðyUraönıt ðyUr. ©Yñ�õ\óïönaIturanitic öyUraðnıtIk öyUr. ©Yñ�õ\óïðnItIkurbane Ur ðban r. ñ�ò �ó!ðbeInurbanity Ur ðbanIti r. ñ�ò �ó ðbænItiurbanization öUrbænı ðzatyon ö r. ñ�ò �ó b© ñ�õ\ó nI ðzeIs© nurbanize ðUrböænız ð r. ñ�ò �ó b© ñ�õ\ó önaIzvaccination övækñ�ø�ó sñ�ø�ó I ðnatyon övækñ�ø�ó sñ�ø�ó © ñ��vó ðneIs© nvaccine væðk ñ�ø�ó sñ�ø�ó eñ � ó n væðk ñ�ø�ó sñ�ø�ó i ñ � ó nvaporization övaporı ðzatyon öveIp© ñ��#ó rI ðzeIs© nvaporize ðvapoö rız ðveIp© ñ��#ó öraIzvaricose ðværI ökos ðvær© ñ � ó ökoUsvaricosity öværI ðkosIti övær©Yñ � óÆðkosItivariolite ðværIoö l ıt ðv ÷;ñ�õ5ó r.i ñ�� ó�©Yñ��#ó!ö laItvariolitic öværIoð l ıtIk öv ÷;ñ�õ5ó r.i ñ�� ó�©Yñ��#ó!ð l ItIkvaticination övætIkI ðnatyon övætIsñ�ø�ó I ðneIs© nvatic ðvætIk ðvætIkventricose ðv ÷ ntrI ökos ðv ÷ ntr© ñ�� ó ökoUs

3.A. ENGLISHDEEPAND SHALLOW ORL’S 121

deep shallow

ventricosity öv ÷ ntrI ðkosIti öv ÷ ntr©Yñ�� óÆðkosItiveracious v ÷>ð rakyos v ©�ð reIs© sveracity v ÷>ð rakIti v ©�ð ræsñ�ø�ó Itiverbose v ÷ r ðbos vr. ðboUsverbosity v ÷ r ðbosIti vr. ðbosItiverrucose ðv ÷ r ü ý������ U ökos ðv ÷ r ü ý�����t© ñ�ò,ó ökoUsverrucosity öv ÷ r ü ý������ U ðkosIti öv ÷ r ü ý�����t© ñ�ò,ó ðkosItivertical ðv ÷ rtIkæl ðvr.tIk © ñ�õ\ó lvertices ðv ÷ rt ö Ik � ez ðvr.tI ösñ�ø�ó � izvideophone ðvIdeoö fonü ý�þ;ÿ�� ðvIdi © ñ��5ó ö foUn ü ý�þ;ÿ��videophonic övIdeoð fonIk ü ý�þ�ÿ�� övIdi © ñ��5ó ð fonIk ü ý�þ;ÿ��vinosity vI ðnosIti vI ðnosItivinous ðvınos ðvaIn© sviscose ðvIskos ðvIskoUsviscosity vIsðkosIti vIsðkosItivivacious vI ðvakyos vI ðveIs© svivacity vI ðvækIti vI ðvæsñ�ø�ó Itivocational voðkatyonæl voU ðkeIs© n©Yñ�õ5ó lvocation voðkatyon voU ðkeIs© nvoracious voð rakyos v ��ð reIs© svoracity voð rakIti v ��ð ræsñ�ø�ó Itivortical ðvortIkæl ðv � rtIk © ñ�õ5ó lvortices ðvortI ök � ez ðv � rtI ösñ�ø�ó � izvorticism ðvortI ökIsm ðv � rtI ösñ�ø�ó Iz© mxerophyte ðz ñ�)/ó ÷ roö fi ñ(ú(ó t ü ý�þ;ÿ�� ðz ñ�)/ó I ñ�ù�ó r. © ñ��#ó ö faI ñ(ú(ó t ü ý�þ;ÿ��xerophytic öz ñ�)/ó ÷ roð fi ñ(ú(ó tIk ü ý�þ�ÿ�� öz ñ�)/ó I ñ�ù�ó r. © ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��xylophone ðz ñ�)/ó i ñ�ú,ó lo ö fonü ý�þ;ÿ�� ðz ñ�)/ó aI ñ(ú,ó l ©Yñ��5óïö foUn ü ý�þ;ÿ��xylophonic öz ñ�)/ó i ñ�ú,ó lo ð fonIk ü ý�þ;ÿ�� öz ñ�)/ó aI ñ(ú,ó l ©Yñ��5óïð fonIk ü ýÆþ�ÿ��zeolite ðzeoö l ıt ðzi ©\ñ��#óÆö laItzeolitic özeoð l ıtIk özi ©\ñ��#óÆð l ItIkzoophile ðzooö fıl ü ý�þ�ÿ�� ðzoU ©Yñ��#óÆö faI l ü ýÆþ�ÿ��zoophilic özooð fıl Ik ü ý�þ�ÿ�� özoU © ñ��#ó ð f I l Ik ü ý�þ�ÿ��zoophyte ðzooö fi ñ�ú,ó t ü ý�þ;ÿ�� ðzoU © ñ��#ó ö faI ñ(ú(ó t ü ý�þ;ÿ��zoophytic özooð fi ñ�ú,ó tIk ü ýÆþ�ÿ�� özoU © ñ��#ó ð f I ñ(ú,ó tIk ü ý�þ;ÿ��zygote ðzi ñ(ú,ó got ðzaI ñ�ú,ó goUtzygotic zi ñ(ú,ó ðgotIk zaI ñ�ú,ó ðgotIk

122 CHAPTER3. ORL DEPTHAND CONSISTENCY

3.A.2 Rulesfor the deepORL

In the next two subappendicesI give the rulesneededfor the two differentORL’s. Ialso indicaterules that areneededfor the deepORL but not the shallow ORL, andvice versa,with thesymbol“ * ”. Decidingwhich rulesaresharedis not ascompletelytrivial asit might seemsince,for instance,a rule thatconvertsanunderlying/u/ into¡ ou§ is really equivalent to a rule that convertssurface/aU/ into ¡ ou§ , sincethelatter phonologicalrepresentationis supposedto be derived from the former. Suchcasesarecountedasmatching.On theotherhand,in somecasesonemayfind thatasingleunderlyingphonemeis representedin severalpossibleways: thus/yu/ surfacesas/yu/, /yU/ and/y © /. In suchcasesonly oneof thecorrespondingshallow ORL rulesis countedasmatchingthedeepORL rule.

3.A. ENGLISHDEEPAND SHALLOW ORL’S 123

1 � + , au-2 tyon + , tion -3 kyos + , cious- / #4 os + ous / #5 (k . k)s + x6 kw + , qu-7 gz + , x -/8 ız + , es- / 0 #

(plural /iz/ spelled , es- )/9 il + , le - / (a. i)b #

10 r + , er- / C #11 1 + , e- / [ 0 tense]C2 #

(Adda “silent” , e- after tensevowels)12 e + 1 / ıgn #13 e + 1 / ı 3547698;: [ 0 cor, 0 cont] #

( , ea- requiresno “silent” , e- exceptwith intervening, s- )14 l + , le - / C #15 ı + , ee- / ðC < #16 c + , ch-17 ' + , th -/18 g + , g -19 = + , th -20 b + , b -/21 k + , c -22 d + , d -23 f + , ph- / . . . [ 0 gk]24 r + , rh - / # . . . [ 0 gk]25 f + , f -26 g + , g -27 h + , h -28 l + , l -29 m + , m -30 n + , n -31 p + , p -32 r + , r -33 t + , t -34 v + , v -35 w + , w -36 yu + , u - / #37 y + , y -38 z + , z -39 s + , ce- / n #40 s + , s-41 u + , ou-42 oI + , oi -43 ô + , u -44 a + , a-45 a + , a-46 e + , e-47 ı + , i -48 o + , o -49 u + , u -

124 CHAPTER3. ORL DEPTHAND CONSISTENCY

50 u + , u -51 o + , o -52 I + , i -53 e + , e-54 ˇ + , g - / >?,A@�-B.�,ACD-B.�,FE;-HG55 ˇ + , j -56 k + , k - / >?,A@�-B.�,ACD-B.�,FE;-HG57 k + , c -58 i + , y - / #(this of coursecouldbemodeledasa surfaceconstraint —Section3.5)

3.A. ENGLISHDEEPAND SHALLOW ORL’S 125

3.A.3 Rulesfor the shallow ORL* 1 � Ë ¡ o § / r2 � Ë ¡ au§3 (c Í s)© n Ë ¡ tion § / ( © l)? #4 s© s Ë ¡ cious§ / #* 5 o Ë ¡ a§ / w6 ks Ë ¡ x §7 gz Ë ¡ x §8 kw Ë ¡ qu§* 9 I Ë ¡ e§ / # pr Í r Í d* 10 aIz Ë ¡ ize§ / #* 11 z Ë ¡ s§ / #

12 r Ë ¡ er§ / C #* 13 z© m Ë ¡ sm§ / #14 © s Ë ¡ ous§ / #15 © l Ë ¡ le § / #16 � Ë ¡ e§ / [ � tense]Cý #17 e Ë � / ı ñ�IKJ�ó ¼ [ � cor, � cont] #18 l Ë ¡ le § / C #19 i Ë ¡ ee§ / ðCÀ #20 c Ë ¡ ch§21 ' Ë ¡ th §22 L Ë ¡ th §23 b Ë ¡ b §24 d Ë ¡ d §25 f Ë ¡ ph§ / . . . [ � gk]26 r Ë ¡ rh § / # . . . [ � gk]27 f Ë ¡ f §28 g Ë ¡ g §29 h Ë ¡ h §30 l Ë ¡ l §31 m Ë ¡ m §32 n Ë ¡ n §33 p Ë ¡ p §34 r Ë ¡ r §35 t Ë ¡ t §36 v Ë ¡ v §37 w Ë ¡ w §38 yu Ë ¡ u § / #* 39 y © Ë ¡ u §* 40 yU Ë ¡ u §41 y Ë ¡ y §42 z Ë ¡ z §43 s Ë ¡ ce§ / n #44 s Ë ¡ s§* 45 aU Ë ¡ ou§46 oI Ë ¡ oi §47 ô Ë ¡ u §

126 CHAPTER3. ORL DEPTHAND CONSISTENCY

48 æ Ë ¡ a§49 eI Ë ¡ a§50 i Ë ¡ y § / #51 i Ë ¡ e§52 aI Ë ¡ i §53 oU Ë ¡ o §54 u Ë ¡ u §55 U Ë ¡ u §56 o Ë ¡ o §57 I Ë ¡ i §58 e Ë ¡ e§* 59 r. Ë ¡ r § / V60 r. Ë ¡ er§* 61 � Ë ¡ a§* 62 � Ë ¡ n § / [ � velar]* 63 © Ë ¡ a§ / #* 64 © Ë ¡ a§ / #* 65 © Ë ¡ e§66 ˇ Ë ¡ g § / ¾a¡BMƧ Í/¡ONH§ Í/¡BPq§&Ä67 ˇ Ë ¡ j §68 k Ë ¡ k § / ¾a¡BMƧ Í/¡ONH§ Í/¡BPq§&Ä69 k Ë ¡ c §

Chapter 4

Linguistic Elements

Un petit d’un petitS’etonneauxHallesUn petit d’un petitAh! degreste fallentIndolentqui nesort cesseIndolentqui nesemeneQu’importeun petit d’un petitToutGai deReguennes.

vanRooten,Luisd’Antin. 1967.Motsd’Heures:Gousses,Rames.Thed’AntinManuscrip-t, pageI. PenguinBooks,New York, NY.

In Section1.2 we madea numberof specificassumptionsaboutwhat kinds oflinguistic elementswritten symbolsrepresent.More specificallyandmoreformallywe assumedthat both phonologicaland semanticportionsof an AVM representinga morphemeor word can in principle licensegraphicalelements.This assumptionnaturallybegsthequestionof therangeof linguistic elementsthatcanberepresentedby written symbolsin the world’s writing systems.This questionis the topic of thischapter. Thequestionof whatkindsof linguistic elementswritten symbolsrepresentis the single most investigatedissuein the study of writing systems. Gelb (1963)is normally creditedwith beingthe first to systematicallyinvestigatethe matter, andevery extensive discussionof the topic sincehaspresenteda classificationof writingsystemsbasedonwhich linguisticelementsthewriting systemsupposedlyrepresents.

I startthe discussion(Section4.1) with a review of someof the moreinfluentialtaxonomiesof writing systems.As weshallsee,thesetaxonomiesaremostlyarboreal:I will endthe sectionwith a proposalfor a non-arborealtwo-dimensionaltaxonomythat takesasonedimensionthe typeof phonographyencodedby thewriting system,andasanotherdimensionthedegreeof logographyof thesystem.

Chinesewriting was introducedin Chapter1 as a mixed systemthat generallyinvolvesbothphonographicandsemantic— or logographic— elements,a position

127

128 CHAPTER4. LINGUISTIC ELEMENTS

arguedmostforcefully by DeFrancis(1984;1989).Wefurtherjustify thisassumptionherein Section4.2.Alsoaswehavepreviouslyargued,thephonographicandsemanticelementsrepresentedgraphicallyin a Chinesecharacterare in a relationof overlap.This formal property, along with Axiom 1.3, hasmakes an interestingand correctpredictionaboutthewritten representationof disyllabicmorphemesin Chinese.

We turn in Section4.3 to a discussionof Japanesewriting, in particularwith re-spectto its useof Chinesecharacters.Japaneseis surelythe mostcomplex modernwriting system,andthehardestto force into any taxonomicmold. The propertiesofthe Chinesescriptasit is usedin the Chinesewriting systemcontrastratherdramat-ically with the useof the samebasicwriting system— kanji — in Japanese,a pointthatSampson(1985;1994)showsconvincingly in his discussionof this topic. As weshallargue,Japaneseuseof kanji is logographicto a greaterdegreethanis useof thesuperficiallysimilar setof symbolsin Chinese.Thecharacterizationof the Japanesewriting systemasa wholeis a ratherdifferentmatter, however. Thebestcharacteriza-tion wouldappearto bethatit is basicallyaphonographicsystem,but with significantamountsof logography.

We endthechapter(Section4.4)with a discussionof a few esotericgraphicalde-vicesusedin somewriting systems,andpresentananalysisof eachwithin thepresentframework.

4.1 Taxonomiesof Writing Systems:A Brief Overview

Our purposehereis not to beexhaustivebut ratherto presenta smallsampleof someof themoreinfluentialtaxonomiesof writing systems;a morebalancedreview canbefoundin (Coulmas,1994)(andseealso(DeFrancis,1989,pages56–64)).

4.1.1 Gelb

Gelb’s taxonomyof writing systemsis generallyviewed asthe startingpoint for allsubsequenttaxonomies.Gelb’s purposein his classificationwaslargely teleological:heviewedthesegmentalphonographicalphabetastheevolutionaryhighpointof writ-ing systems,andall otherwriting systemscouldbeviewedasfalling on a continuumfrom pictographicnon-writingto alphabeticwriting. Thusa linearpresentationseem-s mostappropriatefor Gelb’s taxonomy, andthis is what is presentedin Figure4.1.Note thatGelbclassifiedMayanwriting amonghis “limited-systems”subcategory oftheforerunnersof writing; of course,this is now known to befalse,sinceMayanwrit-ing is a full writing systemcontainingboth logographicandphonographicelements;see(Macri, 1996),inter alia. Thereis alsogeneraldisagreementwith Gelb’s classifi-cationof theconsonantalSemiticwriting systemsassyllabic.

4.1.2 Sampson

A lessteleologicalview of writing is presentedin (Sampson,1985); seeFigure4.2.Oneimportantinnovationof Sampson’ssystemis theprimarydivisionbetween“glot-

4.1. TAXONOMIES 129

Fo

rerun

ners o

f writin

g: p

ictog

raph

s and

oth

er "limited

systems"

Wo

rd syllab

ic systems

Syllab

ic writin

g

Th

e Alp

hab

et

Mayan

Sum

erian, Chinese

Aegean syllabaries, K

ana, West S

emitic

Greek, etc.

Figure4.1: Thetaxonomyof Gelb(1963),alongwith examplesof writing systemsthatbelongto eachcase.

130 CHAPTER4. LINGUISTIC ELEMENTS

SemasiographicQ Systems

GlottographicR Systems

Segmental(Alphabetic)

Logographic Systems

SyllabicQ

ConsonantalS

Featural

Yukaghir [?]T

ChineseU

Linear B West SemiticV

GreekW

Hankul

Phonographic Systems

Figure4.2: Thetaxonomyof Sampson(1985).

tographic”writing, wherethe symbolsrepresentlinguistic elements;and“semasio-graphic”writing, wherethesymbolsrepresentconcepts,but provide no specificationof a linguistic form to expressthoseconcepts. Sampson(pages28–29)tentativelypresentsas an exampleof semasiographicwriting a Yukaghir “love letter”, and hegivesa few otherinstancesaswell — for examplea setof pictographicinstructionsfor startingaFordcar(page30). But on thewhole,theevidencefor semasiographyasa viable category of writing systemsis tenuous.Indeed,Sampson’s primary interestappearsto beto suggestmerelythata fully communicativesystemof semasiographymight in principlebepossible,ratherthanto arguethatsuchasystemhaseverexisted.

Among glottographicsystemsSampsontakes the relatively traditionalview thatChinesewriting is logographic.This is because,in hisview, Chinesecharactersdonotencodephonologicalinformation: rather, he claims,Chinesecharactersdirectly rep-resentmorphemes,sothattheChinesereadermustsimply learnwhich charactergoeswith which morpheme,andultimatelywith which pronunciation.This is of coursea

4.1. TAXONOMIES 131

fairly standarddefinitionof logography.1 It differsfrom theview thatweareassuminghere— seeSection1.2.2andthediscussionbelow in Section4.2— in thatweviewanycomponentof a writing systemashavinga logographicfunctionif it formallyencodesa portion of non-phonological linguistic structure, whetherit bea wholemorpheme,or merelysomesemanticportionof thatmorpheme.Thelatterkind of encodingmightperhapsbecalled“semasiographic”,but for variousreasonsI preferto avoid thatterm.

In additionto adoptingthecommonview of Chinesewriting, Sampsonalsotakesthe more innovative (andcontroversial)position that KoreanHankul is featural,anissuethatwe will returnto below.

4.1.3 DeFrancis

DeFrancis(1989)takesissuewith anumberof Sampson’sclaims.

4.1.3.1 No full writing systemis semasiographic

First, andmostimportantly, hearguesagainsttheexistenceor eventhepossibilityofsemasiographicwriting. His majorattackconsistsof demonstratingthattheYukaghir“love letter”, thatSampsonciteswasnot aninstanceof a systemof written communi-cationatall, but ratherapropin akind of “party game”whichwasneverintendedto beunderstoodby a reader, but ratherwasinterpretedfor othersby theauthor(pages24–35). Still, asSampson(1994)correctlyobserves,theargumentis not really fair: eventhoughtheYukaghir“letter” turnsoutnot to beaninstanceof semasiographicwriting,DeFrancisignorestheotherinstancesof semasiographicwriting (e.g. theiconic Fordinstructionmanual)that Sampson(1985)hadpreviously discussed:indeedDeFran-cis cannotdeny the existenceof iconic symbologythat communicatesideaswithoutrecourseto representingany specificallylinguistic information. On the otherhand,DeFrancisis correctin observingthat suchsystemsarealwayslimited in what theyarecapableof expressing:nobodyhasshown theexistenceof a writing systemthat isentirelysemasiographic,relying on no linguistic basisin orderto communicateideas,andwhich allows peopleto write to oneanotheron any topic they choose.It seemsfair to saythat theburdenof proof is on thosewho would claim that semasiographicwriting is possibleto demonstratetheexistenceof suchasystem.For DeFrancis,then,all full writing is glottographic.

4.1.3.2 All full writing is phonographic

But DeFrancismakesan evenstrongerclaim: all full writing is largely phonograph-ic. A purely logographicsystemis, accordingto him, impossible. ThusSampson’sclassificationof Chinesewriting as

logographicis incorrect.DeFrancis’basicargumentis simple:thevastmajorityofChinesecharactersthathave beencreatedthroughouthistoryareso-calledsemantic-

1Note that the basicdivision amongglottographicsystemsbetweenlogographicsystemsand phono-graphicsystemscorrespondsexactly to what Haas (1983)termspleremicandcenemic, respectively. Seealso(Coulmas,1989).

132 CHAPTER4. LINGUISTIC ELEMENTS

phoneticcompounds,suchasthe characterX ¡ INSECT+CHAN § chan ‘cicada’ dis-cussedin Section1.2.2whereoneelementin thecharactergivesahint of themeaning,and the other elementgives a hint at the pronunciation. The exact percentagede-pendsupon the sizeof the charactersetbeing considered:for the 9,353charactersthathadbeendevelopedup to the2ndcenturyAD, about82%of thecharactersweresemantic-phoneticcompounds;for theentiresetof 48,641charactersthatwererecord-ed by the 18th century, 97% were semantic-phoneticcompounds(DeFrancis,1989,page99), meaningthat essentiallyall of the characterscreatedbetweenthe 2nd and18th centurieswereof the semantic-phonetictype. No explicit estimateis given forthe percentageof suchcompoundsin the written vocabulary of the averageChinesereader(who canbeexpectedto know between5,000and7,000characters),but thereis no questionthat it will be thevastmajority. Thus,for DeFrancis,Chinesewritingis not primarily logographicat all, but whathetermsmorphosyllabic: it is basicallyaphonographicwriting system,with additionallogographicinformationencoded.

4.1.3.3 Hankul is not featural

Finally, hearguesthatKoreanHankul,while thereareclearlysome featuralaspectsthatwentinto its design,is basicallysegmental(pages186–200);notethatthis is alsothepositionof (King, 1996). Themajor reasonthatDeFrancisgives(andthis is alsoechoedin (Coulmas,1994))is thatKoreanchildrenlearningto readtypically mem-orizethesyllable-sizedgroupingsof elementsaswholes,andthatKoreanreadersarecertainlyunawareof the featuralrelationshipsbetweenthe symbols. This argument,it seemsto me,is rathershaky: what readersaretaughtor explicitly awareof in theirwriting systemis oftenat oddswith whata carefulanalysistells usis trueof thatsys-tem.For exampleFlesch(1981)citesseveralinstancesof Americanreadersof Englishtaughtreadingby theso-called“whole-word” method,whowereunawarethatEnglishwriting is basicallysegmentalwith particularlettersor combinationsrepresentingpar-ticular sounds.

Nevertheless,DeFrancisis probablycorrectin assertingthatHankul is basicallysegmental,ratherthan featural. To seethis, it is worth consideringa truly featuralscript,namelyBell’s “Visible Speech”(Bell, 1867;MacMahon,1996),whichwasde-velopedfor useasa universal phoneticalphabet.The constructionof the individualglyphsin thesystemencodesarticulatoryfeaturesin a consistenticonic fashion.Forinstanceconsonantplaceof articulationis indicatedby orientationof thebasicconso-nantglyph(aC-shapedsymbolfor consonant): thus Y is /x/ with thebowl of theglyphpointingleftwards,indicatingaconstrictionatthebackof themouth;and Z is / [ /, withthe rightward-pointingbowl representinga bilabial constriction.Stops(closures)areindicatedby closingoff theopenpartof theglyph: thus \ is /k/ and ] is /p/. Voicingis indicatedby a bar that is iconic for the nearclosureof the glottis during voicing:thus ^ is /g/ and _ is /b/. Nasalityis indicatedby turninghalf of theclosurebarinto awavy line, which indicatestheloweredsoftpalate(MacMahon,1996,page838): thus`

is / � / and a is /m/. Otherfeaturesaresimilarly indicatedin a consistentfashion.In contrast,Hankul is not by any meansasconsistentin its representationof fea-

tures. While somefeaturesare representedconsistentlyby propertiesof the script,

4.1. TAXONOMIES 133

Figure4.3: Featuralrepresentationof KoreanHankul, from (Sampson,1985,page124),Fig-ure19. (Presentedwith permissionof Routledge/StanfordUniversityPress.)

othersareonly inconsistentlyrepresented,andstill othersnot at all. Considertheba-sic Hankul segmentalelementspresentedin (Sampson,1985,page124), andshownherein Figure4.3. As we have notedcertainphonologicalfeaturesdo have a consis-tentrepresentationin theelementsof theHankulscript.Thus,for instance,thefeature(bundle)[ � tense,¦ aspirated](fifth row of Hankulsymbols)is representedby a dou-bling of thebasicsymbolusedfor thecorresponding(in termsof placeof articulation)

lax stop.Similarly, thesibilants(third column)have in commonthebasicsymbol¡ s§ , which representsa tooth (Sampson,1985,page125), or in otherwords is anindicationof theplaceof articulationof theconsonantsin question.Ontheotherhand,labiality is not consistentlyrepresented:/m/, /b/ and/p/ sharea commonshape(thesquarerepresentinga mouth),but /pb / is different. And somefeaturesarenot repre-sentedatall: thusthereis norepresentationof thefeaturenasal:/m/, /n/ and/s/appearto be on a par, involving the “basic” glyphsfor eachplaceof articulation. Thereissimilarly no consistentrepresentationof thefeature[voiced] (lax, in Sampson’s clas-sification) for stops: the apicalandsibilant voicedelementshave an overbar, but inthecaseof /b/ we find not anoverbar, but anextensionof thetwo verticalsidesof thesquareof /m/; the overbarfor /g/, accordingto Sampson,representsthe roof of themouthtouchingthepalate,andthusis presumablynot thesameastheoverbarfor the

134 CHAPTER4. LINGUISTIC ELEMENTS

Syllabicc

PureSyllabic

Morphosyllabic

Segmentalc

Consonantald

Alphabetice

PureConsonantal

MorphoConsonantal

PurePhonemic

MorphoPhonemic

W. Semiticf

Egyptian Greek English,Korean

Chinese, Sumerian

kana, Cherokee

Figure4.4: DeFrancis’classificationof writing systems.

apicalandsibilantglyphs. Finally, thereis no consistentrepresentationof aspiration:in somesymbols(/cb /, /h/) it seemsto be representedasa dot over the basicglyph(third row), andin othercases(/t b /, /k b /) it appearsasahorizontalbarinsidethebasicglyph.

Noneof this shouldbeinterpretedasdenigratingtheHankulscript: it is probablythe mostscientificscript in commonusetoday. There is no questionthat the basicdesignof Hankul is phoneticallymotivatedto a highly sophisticateddegree:afterall,many of the shapesare depictionsof modesof articulation,andare iconic in a waysimilar to Bell’sVisible Speech.But it falls shortof being a featuralscriptin thewaythatVisibleSpeechis: ratherit is betterviewedasanintelligentlyconstructedsegmen-tal alphabet.Thisconclusionshouldnotbesurprising.After all, evenin unequivocallysegmentalsystems,onestill findssymbolsthatseemto encodeindividual features.Soin Russian,for instance,thesoft sign gK¡ ’ § markspalatalizedconsonants,andthusmight beviewedasencodingthefeature[ � high] for consonants.ClearlyHankulhasmorefeaturalaspectsthanRussianorthography, yet it is probablymorea matterofdegreethanof kind.

Thethreeviewswe have just discussed,takentogether, leadDeFrancisto proposea classificationof truewriting systemsasdepictedin Figure4.4.

Somecommentsarein order. DeFrancis’basicdivision is accordingto the typeof phonologicalunit represented:syllabicversussegmental,andwithin thelattercon-sonantal(representingmostlyor only consonants)versusalphabetic(representingallsegments).Within eachcategoryheassumestwo variants,namelyapurevariantwhereonly phonologicalinformation is representedin the system;and a morpho-variantwhere,additionally, morphologicalinformationis represented.Thisis obviousenoughfor Chinese,SumerianandEgyptian,whereeachof thesewriting systemshassemanticelementsthatrepresentmeaning-relatedpropertiesof morphemes.EnglishandKorean

4.1. TAXONOMIES 135

Hankularesimilarly classifiedsincein bothcasesthewriting systemsfail to befullyphonemic.We shall arguein the next subsectionthat this classificationrepresentsacategoryerror: EnglishandHankularenot on aparwith Chineseor Egyptian.

4.1.4 A newproposal

While I acceptseveral basicassumptionsof DeFrancis’classificationscheme,therearenonethelessseveralareaswhereI feel his schemeis deficient.TheseI enumeratebelow:

1. Despitetheprominentpositiongivento syllabariesin DeFrancis’taxonomy(aswell asmostotherschemes),it is importantto realizethat full syllabaries—that is, systemswhereall syllablesof the languagearerepresentedby (oneormore)singlesymbols— areactuallyvery rare.Thoughscholarsof writing sys-temshave undoubtedlybeenawareof this point for a long while, it wasto myknowledgefirst madeexplicitly by Bill Poserin a presentationat theLinguisticSocietyof America in 1992. Poser’s basicpoint wassimple. In the majorityof systemsthatarecalled“syllabic” — amongthem,JapaneseKana,LinearB,thephonologicalcomponentof Sumerianwriting, thephonologicalcomponentof Mayanwriting — onedoesnot find a symbol for every full syllableof thelanguage.Instead,whatonefindsaresymbolsfor simple“core” (C)V syllables,possiblyaugmentedto includeonglides(CGV); morecomplex syllablesarerep-resentedin writing by combiningthecoresyllablesymbolswith symbolseitherrepresentingsinglephonemes,or elseothersymbolsrepresentingcoresyllables.A few exampleswill serve to illustratethepoint.h KanasymbolsrepresenteitherV, CV or CGV. To representa CVV syl-

lable, one must combinea basicCV symbol with a V symbol. Thus asyllable/nai/ would be representedas ¡ na§H¡ i § . Japaneseorthographymight thereforebedescribedasmoraic (cf. (Horodeck,1987,page33)).h In LinearB (Miller, 1994;Bennett,1996),symbolsrepresent(C)V, CGV,or in a few casesCCV. More complex syllablesare(partially) representedusingcombinationsof thesebasicunits. Thusthe(disyllabic)word /ksen-wion/ was representedas ¡ ke§H¡ se§H¡ ni §H¡ wi §H¡ jo § (with no repre-sentationof thefinal /n/), andthe(tetrasyllabic)word /mnasiwergos/wasrepresented¡ ma§H¡ na§H¡ si §H¡ we§H¡ ko § (with no orthographicrepre-sentationof the/r/ andfinal /s/) (Miller, 1994,pages18–22).h In Sumerian,complex syllableswereoften representedby “syllable tele-scoping”(DeFrancis,1989,pages81–82),wherebya CV graphemicunitandaVC graphemicunit werecombinedto representaCVC phonologicalsyllable.Thus ¡ ki § + ¡ ir § would represent/kir/.

In onesense,of course,suchsystemsare syllabaries:the phonologicalunit-s representedby the simpleglyphsaregenerallywell-formedsyllablesof thelanguage.But this is misleading.Whenonespeaksof alphabeticsymbolsit is

136 CHAPTER4. LINGUISTIC ELEMENTS

taken for grantedthat therearesymbolsavailableto representevery phonemicsegmentof the language.So, the term syllabaryoughtto similarly imply thateverysyllableof thelanguagehasa graphemicsymbolassociatedwith it. Mostso-calledsyllabariesdo not meetthatrequirement.2 For lack of a betterterm,Iwill henceforthtermsuchsystemscoresyllabaries.3

Still full syllabariescertainlydo occur: Chineseis onesuchexample,thoughof courseit is not a purephonologicalsystem. Anotherexampleseemsto betheYi syllabary(DeFrancis,1989;Shi, 1996),which in its classicform wasamorphosyllabicsystemlikeChinese(andmayhavebeeninfluencedby Chinesewriting), but in its modernform — at leastthepopularstandardizedform thatisrecommendedby thegovernment(Shi, 1996,241), it is a purelyphonographicsyllabary. Yi syllable-structureis exceedinglysimple,with basicallyonly CVsyllables(including somediphthongs)allowed. However, thereare44 conso-nants(including the emptyonset),10 vowels and3 lexical tones,resultingina syllabaryof 819 charactersonceall legal C+V+Tonecombinationsarecon-sidered. Of coursethe complexity of the syllablestructurerepresentedby Yisyllabogramsis no more complex than thoserepresentedby typical coresyl-labaries:but in Yi eachdistinct full syllableis representedby a separateglyph,unlike,for example,thecasein Japanese.

Note that to saythat true syllabariesarerarerthanusuallysupposedis not, ofcourse,to deny the importanceof the syllable as an organizationalunit in agreatmany writing systems;we have noted(as have others)theimportanceofsyllablesin Hankul,DevanagariandPahawh Hmong,andmany othersystemscouldbe cited. We even arguedfor the importanceof syllablesin the Russianwriting system(Section3.5).

2. “Morphophonemic”systemssuchasEnglishor Korean,areparallelin DeFran-cis’ taxonomyto morphosyllabicsystemslike Chineseor Sumerian,andmor-phoconsonantalsystemslike Egyptian.This is a categoryerror.

WhatmakesKoreanandEnglishlessthanfully “phonemic”relatesnot to whatis representedby thebasicsymbolsof thescript,but to thephonologicaldepthof what is represented,and the amountof lexical marking one must assume:in other words it relatesto the depthof the ORL, and other issuesdiscussedin Chapter3. This issuewasexplicitly discussedfor English in that chapter;relevantdiscussionon Koreancanbe found in (Sampson,1985,pages135ff.),who describesa set of rules to predict the actualsurfacepronunciationof astringof KoreanHankul,givenregular (morpho)phonological processesof thelanguage.As I alsodiscussedin Section3.2,it is particularlyamistaketo equate

2Of course,thisargumentis notentirelyfair, sincein many alphabeticsystemscertainphonemesareonlyrepresentedby combinationsof basicsymbols,suchasdigraphs:so/c/ in Spanishis only representablebyê chë and/ � / and/ ' / in Englishareonly representableby ê th ë . But polygraphsaretypically theminorityin alphabeticsystems,and thereare many segmentalsystemsthat do not have polygraphs. In contrast,polygraphicrepresentationof complex syllablesin so-calledsyllabariesappearsto bethenorm.

3Fischer(1997a;1997b) termsthem“opensyllabaries”,but this termis suboptimal:CVV syllablesareafterall “open”, thoughthey tendnot to berepresentedwith singlesymbolsin coresyllabaries.

4.1. TAXONOMIES 137

the lexical orthographicmarkingof English(e.g. the markedspellingof /n/ inknit) with the logographiccomponentsof Chinesewriting, an equationthat isimplicit in DeFrancis’classification.

3. Calling Egyptian“consonantal”,andthusequatingit with Semiticwriting sys-temsobscuresoneuniquepropertyof Egyptian,namelytheexistenceof bi- andtriliterals, standingfor two andthreeconsonantsrespectively (Ritner, 1996). Infact thesemake up the majority of the system:“uniliteral” symbolsconsistofonly about25 symbols;biliteralsabout80; andtriliterals 70. Egyptianwritingmight thereforebetterbedescribedas“polyconsonantal”.

4. While I acceptDeFrancis’basichypothesisthatonecannotconstructa full writ-ing systemon completelylogographicprincipleswithout recourseto phonogra-phy, logographyis nonethelessan importantaspectof many writing systems,apointwhich noscholarwouldpresumablydeny.4

Ontheotherhand,asDeFrancishasargued,for awriting systemto beextensibleit musthave a robustphonographiccomponent:onecannotefficiently developwritten representationsfor neologismsif oneis restrictedto purelylogographicmeans.5 Thusnomatterhow largetheamountof logographyaparticularwritingsystemhas,logographyis clearly not on a par with phonography, andshouldthereforenot be representedaspartof thesamearborealtaxonomyasit is, forexample,in Sampson’ssystem.

Thelastpointmotivatesustoabandonthetraditionalarborealclassificationof writ-ingsystemsin favorof atwo-dimensionalarrangementwherethetypeof phonographyusedrepresentstheprimarydimensionandamountof logographyusedrepresentsthesecond.This schemeis representedin Figure4.5. Naturallythedegreeof logographyis tricky to estimate— thoughI believe it canbeestimated— andthearrangementofparticularwriting systemsin thisseconddimensionis largely impressionistic.But it isimportantto realizethatall writing systemsprobablyhavesomedegreeof logography.Sowritten Englishcontainsnumeroussymbolsandletter sequencesthat canonly beconstruedlogographically: ¡ & § , ¡ lb § , ¡ $ § , arejust threeexamples.In thetaxon-omy, alphasyllabariessuchasDevanagari (Section2.3.2)areclassifiedasalphabets.Thestatusof “onset-rime”scriptslikePahawh Hmong(Section2.3.3)is unclear:theyarealmostsegmental,but symbolsfor rimeslike / ��� / show themnot to be completelyso. Onemight considersettingup a specialcategory for suchscripts;in the currentschemeI classifyPahawh Hmongasfalling somewherebetweenalphabetsandcoresyllabaries.

4Indeed,asSampson(1994,page122) cogentlypointsout, logographyis in no way anomalousonceoneobservesthat“any naturallanguagehasunitsat many levels,andin particularthatall humanlanguagesexhibit a ‘doublearticulation’ into units carryingmeaning,on theonehand,andphonologicalunits . . . onthe other. . . . It is at leastlogically possible,therefore,that a glottographicscript might assigndistinctivesymbolsto elementsof thefirst ratherthanof thesecondarticulation.”

5As weshallseein ourdiscussionof Japanesebelow, kokuji — Japanese-inventedChinesecharacters—areinstancesof purelylogographicconstructionsinventedto representwordsthatdid notpreviouslyhaveawritten representation.But therearenomorethanacoupleof hundredof these,whereasthenumberof newwordsthathave representationsin thekanacoresyllabarynumberin thethousands.

138 CHAPTER4. LINGUISTIC ELEMENTS

Alphabetici

Core Syllabic Syllabicj

Consonantal Polyconsonantal

Type of Phonographyk

Am

ou

nt o

f Lo

go

grap

hy

W. Semiticl

Perso−Aramaic

Egyptian

Modern Yi

Chinese

Sumerian,Mayan,Japanese

Linear BEnglish,Greek,Korean,Devanagari

PahawhmHmong

Figure 4.5: A non-arborealclassificationof writing systems. On Perso-Aramaic,seeSec-tion 6.1.

Thereis of coursenoreasonto stopattwo dimensions,thoughit is moreconvenientto do so for simplicity of presentation.An additionaldimensionwould relateto thedepthof theORL, andothertopicsdiscussedin Chapter3; in this dimension,Englishand Koreanwould patterndifferently from, say, Greek.This is of coursethesensein which DeFrancismeantthat EnglishandKoreanare “morphophonemic”,unlikeGreek:but thedimensiononwhichthey differ is orthogonalto thedimensiononwhich,say, ChineseandModernYi differ.

Yetanotherdimensionwouldbethedegreeto whichcomplex planararrangementshave a significantfunction in a writing system,the topic of Chapter2: KoreanandDevanagariwouldpatterndifferentlyfrom Englishon this dimension(Faber, 1992).

4.1.5 Summary

We have suggesteda view of writing systemswherelogography— definedas thegraphicalencodingof non-phonologicallinguistic information— is anorthogonaldi-mensionfrom phonography:writing systemscanthusbeclassifiedminimally in a twodimensionalspaceaccordingto whattypesof phonologicalelementsareencoded,andto how muchlogographythey have. Encodedphonologicalelementsrepresenta rangeof possibilities,asis well known,but normallythemaximumsizeof suchelementsis acore-syllabicCV or VC unit: in particularrarelydoesonefind a purelyphonographicsystemthat representseachpossiblesyllableof the languagewith a distinctelement.Egyptianrepresentsanapparentlyuniquepolyconsonantalsystem.

Theorthogonalityof logographyandphonographyis entirely in keepingwith theformalmodelwepresentedin Section1.2.2: assumingthatlogographicelementsrep-resentinformationrelatedto theSYNSEMattribute,thenaswe previously observed,thelinguistic informationencodedby logographicelementsis not in ahierarchicalre-

4.2. CHINESEWRITING 139

lationshipwith informationencodedby phonographicelements.Thereforeit shouldinprinciplebepossiblefor writing systemsto selectdifferentmixesof logographicandphonographicencodings,andto exhibit both in thesamesystem.For differenttypesof phonographythis ability to mix is lessnatural: syllablesdominatesegment-sizedunits, andso if a systemis core-syllabicit is naturalfor it to choosemostor all ofits elementsat that level of thehierarchy;mixescouldhappen,but onein fact rarelyif ever finds systemsthat have botha largecollectionof graphemesthat denotecoresyllablesandanothercollectionof graphemesthatdenotesegments.Systemstendtochooseonephonologicallevel to encode.

DeFrancis’mainthesisis thatChineseis a writing systemthatis basicallyphono-graphic(asheclaimsareall writing systems),but with alargelogographiccomponent.The conclusionof the precedingparagraphthat suchsystemsareexpectedgiven theformal apparatusof Section1.2.2is of courseconsistentwith DeFrancis’thesis.Buttheformalmodelof Chinesesemantic-phoneticcharacterspresentedtherestill begsthequestionof whetherthereis compellingevidencethatChinesewriting really behavesthatway: doesit actuallybuy you anything to assumethat the ¡ INSECT § portionofchan ‘cicada’,encodesaportionof theSYNSEMfield, whereastheputatively phono-graphicportion ¡ CHAN § encodesphonologicalinformation? We will arguethat itdoes,bothin thenext section,andin Chapter5 whereweaddresspsycholinguisticevi-dencefor readers’onlineprocessingof characters.In thenext section,in particular, wewill arguenot only that thephonographicportionof Chinesecharactersplaysan im-portantrolein encodingphonologicalinformation,but thattheformalmodelpresentedearliermakesaninterestingpredictionabouttheencodingof disyllabicmorphemesinChinese.

4.2 ChineseWriting

The traditionalChineseclassificationof charactersdividestheminto six groups,theso-calledli u shu, or Six Categoriesof Characters.Of these,four relateto thestructuralpropertiesof thecharacters,andtwo to their usage(Wieger, 1965,page10). It is thestructuralpropertiesthatwill concernushere,thefour categoriesof interestbeing:h Pictographs(xiangxıng): e.g. n ¡ PERSON § ren ‘person’, o ¡ TURTLE § guı

‘turtle’.h Indicative symbols(zhıshı): e.g. p ¡ DOWN § xia ‘downwards’, q ¡ UP §shang ‘upwards’.h Semantic-semanticcompounds(huıyı): e.g. r ¡ FEMALE+CHILD § hao‘good’, s ¡ GRASS+FIELD § miao ‘sprout’.h Semantic-phoneticcompounds(xıng sheng): e.g. X ¡ INSECT+CHAN § chan‘cicada’, t ¡ TREE+XIANG § xiang ‘oak’.

Thefirst threecasesarereasonablyuncontroversial:thereis noquestionthatthesethreegroupsof signs,which in total numberno morethanabout1,500in the largest

140 CHAPTER4. LINGUISTIC ELEMENTS

dictionary (DeFrancis,1989, page99), are logographs,without any representationof phoneticinformation. Of course,asCoulmas(1989,page50) notes,logographicsymbols,which theoreticallyshouldmapdirectly to anon-phonologicalportionof themorphologicallevel of representationdo, in the minds of skilled readers,also mapdirectly to phonologicalrepresentation:a skilledEnglishreader, for example,will un-consciouslymap ¡ lb § to /paUnd/, andequivalentfactshold for Chinese.6 Thecon-troversialcharactersareof coursethefourthcategory, thesemantic-phoneticgroup,thecategory thatDeFrancis insistsarebasicallyphonetic,whereasSampsonhasargued(andmany othershavemerelyassumed)arelogographic.

The problemwith the categorizationof the semantic-phoneticcategory revolvesaroundthefactthatthephonologicalinformationprovidedby thephoneticcomponentis sometimesperfect,frequentlyonly partial, and in somecasescompletelyuseless.An exampleof eachcaseis givenbelow:

Char. Analysis Phon.Component ActualPron. Glosst ¡ TREE+XIANG § u xiang(‘elephant’) xiang ‘oak’v ¡ BIRD+JI A § w ji a (‘cuirass’) ya ‘duck’x ¡ DOG+QING § y qıng (‘green’) cai ‘guess’

Thedistribution of thesethreetypesis quiteskewed: therearea smallnumberofthefirst category (perfectmatch),a few of thelastcategory (completelyuseless),withmostfalling into thesecondcategory (somewhathelpful). Somephoneticcomponentsare in generalmoreuseful thanothers: for exampleall charactershaving z huang(‘emperor’)asaphoneticcomponenthavethepronunciationhuang, matchingthebasecharacterdown to thelevel of thetone;see(DeFrancis,1989,e.g.,pages102–103)fora rangeof otherexamples.7

Now, if thephoneticcomponentwerealwaysa perfectindicatorof thepronuncia-tion of thecharacter, thentherewould presumablybeno contention:everyonewouldagreethatmostChinesecharactersarebasicallyphoneticsymbols,with additionall-ogographicinformation(thesemanticcomponent).But becauseof the imperfectionsin therepresentationof thephonologicalinformation,mostauthorshaveassumedthat

6This is presumablythebasisof theuseof characterspurely for their pronunciation,a practicethathasbeenfollowed for centuriesto transliterateforeign words,whetherthey be Sanskrittermsfrom Buddhisttracts,or present-dayforeignnameslike {}|�~ kelındun ‘Clinton’. In ModernChinese,theparticularcharactersthatareusedfor this kind of “phonetictranscription”areamore-or-lessclosedclass;see(Sproatet al.,1996)for somediscussion.SeealsoSection4.3for adiscussionof theequivalentJapaneseateji.

7What is the reasonfor inexact matches?In somecasesthe reasonis historical soundchange. Forexample,many characterswith thephoneticcomponent� xıng/hangarepronouncedxingor hang(Wieger,1965,page443)(asis thebasecharacter).Thesetwo syllablesaretheresultof ahistoricalsplit in Mandarin.Thereis no questionthat the phoneticcomponentsweremoreuseful in the pastthanthey arein ModernMandarin(andmay be moreusefuleven today in otherChineselanguages,suchasCantonese,thoughIhavenotseenaninvestigationof this topic). EvenSampson(1994)admitsthattheChinesesystemmayhavebeenamuchmorephonographicsystemat onetime. See(Baxter, 1992)for a comprehensive discussionofthephonologyof earlyChineseandits relationshipwith thephoneticcomponents.

In othercases,thehistoricalargumentis lessconvincing: � ê WATER+X ING ë ‘overflow’ is pronouncedyan, which presumablywasnever historically derivable from xıng/hang: presumablythis way of writingthecharacterwaschosensincethephoneticcomponentwasdeemedsimilarenoughto theintendedreading,andsince� in its readingxıng means‘go’, it mayhave alsocontributedsomesemanticinformationto thecompositecharacter.

4.2. CHINESEWRITING 141

the systemis no longerphonographic.Although the comparisonwith Englishis notentirely fair (Englishorthographyis never asirregularasChinese),it is interestingtonote that preciselythe sameassumptionshave beenpopularly madeaboutEnglish.Indeed,themisguidedassumptionthatEnglishis not basicallya phoneticbut ratheralogographicwriting systemhashada significantimpacton theteachingof readingintheUnitedStates,aslamentedandattackedin (BloomfieldandBarnhart,1961;Flesch,1981); this assumptionstemsin largemeasurefrom the fact thatEnglishspellingisnot optimal for cuing the readerto the pronunciationof the word, meaningthat thespellingandpronunciationof somewordsmustsimply belearned.8

DeFrancis’argument,however, is not thatChinesewriting is agoodphonographicsystem: indeed,he stressesthat it is a lousy one. However, it is muchmoreusefulto view it asanimperfectphonographicsystemwith additionallogographicattributes,than it is to view it as a wholly logographicsystem. Apart from the distributionalreasonsthatDeFrancisdiscusses,thereareotherreasonsfor assumingthatChineseislargely phonographic,andthat in particularthe phonographicinformationresidesinthephoneticcomponent,whenthatis present.Amongthesereasons:h The evidencefor the psychologicalreality of the phoneticcomponent,asdis-

cussedin thenext chapter.h Thecommon-senseobservationthatChinesereaders,whenencounteringanun-familiar character, will attemptto guessits pronunciationfrom the phoneticcomponent.Indeed,with acompletelyunfamiliarcharacter, they havenochoicebut to adoptthis strategy. An instanceof this is thecharacter� ¡ FISH+XUE §xue ‘cod’. Apparentlythis characterwasa Japaneseinvention,a kokuji (Sec-tion 4.3), wherethesecondelement� wasusednot for its pronunciationxue,but for its meaning‘snow’ (thefleshof cookedcodbeingsnowy white). Thusthecorrectanalysisfor Japanesewould be ¡ FISH+SNOW § , a typical semantic-semanticconstructioncommonin kokuji. Whenthis characterwasborrowedbackinto Chinese,Chinesereadersinterpretedthe � componentasa phoneticcomponent,thusassigningthecharacterthepronunciationxue.h The developmentof somesimplified charactersin the Mainlandinvolving thesubstitutionof a differentphoneticcomponentfor theoneusedin thetradition-al script. Themainmotivation in charactersimplificationwasthereductionofthe numberof strokesneededto write the character, with the goal of makingChinesewriting easierto learn; see(DeFrancis,1984, inter alia). The major-ity of simplificationsinvolved stroke reductionsin componentsof characters,without actuallychangingthe componentsused. However a small percentageinvolved actually substitutingan easier-to-write component— usuallya pho-neticcomponent— for a morecomplex traditionalcomponent.In suchcases,asubstitutedphoneticcomponentwasmoreoftenthatnotacloserphoneticmatch

8Sampson(1985), aswe have noted elsewhere,makes a similar assumptionaboutEnglishspelling,thoughit shouldbestressedthathecannotbeaccusedof the samekind of naivete asAmericaneducatorswhohave subscribedto thisview.

142 CHAPTER4. LINGUISTIC ELEMENTS

to thepronunciationof thewholecharacterthanthetraditionalcomponentit re-placed.For instance,a countof phonetic-componentsubstitutionsfrom thelistof simplified/traditionalcharacterpairsin onedictionary(NanyangSiangPau,1984) revealed74 characters(64%) wherethe pronunciationof the substitut-ed phoneticcomponentis a closer, or at leastasclosea matchto the pronun-ciation of the whole characterthan that of the traditional component,and42(36%) wherethe substitutedcomponentis actuallya worsematch. Thus, �¡ GOING+ ZHUI § j ın ‘enter’ hasbeenreplacedin thesimplified scriptby¡ GOING+JING § .9 Similarly, traditional � ¡ EARTH+GUI § kuai ‘lump’ has

beenreplacedby ¡ EARTH+GUAI § . An instancewherethepronunciationof the substitutedcomponentis worseis � ¡ FORCE+ZHONG § dong ‘move’,

wherethe (lefthand)phoneticcomponent� zhong, is replacedto form¡ FORCE+YUN § .

Given theseobservations,it makessenseto assumeaswe have that the phoneticcomponentis licensedby thephonologicalinformationof thesyllablethatit encodes.Whatthenof thesemanticcomponent,which we have assumedis licensedseparatelyby a portionof theSEM attribute’s value?For X ¡ INSECT+CHAN § chan ‘cicada’,we hadassumedanAVM asin (1.8), repeatedhereas(4.1); andanannotationgraphasin (1.10),repeatedhereas(4.2).

(4.1) �����������PHON

��� SYL

�� SEG ��� ONS ����� � RIME �� �9�TONE � �� ��������

SYNSEM � CAT �������SEM ���?���¡ ¡�£¢ ��¤

ORTH ¥�¦ ¢�§ ¨ �£©� ����������

(4.2)

SEM: cicada: ¦TONE: 2SYL: ª : ¨ONS-RIME: ch an

Thetwo licensingcomponentsin (4.2)overlap,andAxiom 1.3 tells usthat they mustcatenatewith eachother: thespecificcatenationoperatorchosenis (in this case)pre-dictableby rule, following thediscussionin Section2.3.4.

9Notethatthesyllablesjin andjing arehomophonousfor many Mandarinspeakers.

4.2. CHINESEWRITING 143

Note,however, thatsincemostChinesemorphemesaremonosyllabic(DeFrancis,1984),it would seemhardto distinguishbetweenthesomewhatelaboratetheorypre-sentedhere,andtheseeminglysimplertheorythatstatesthatthephoneticcomponent¨¬« CHAN ­ , is indeedlicensedby the syllable,but that the semanticcomponent¦ « INSECT ­ is simply someexcessbaggagethathappensto beassociatedwith thesyllablefor this particularword. In otherwordswe would like to distinguishour pro-posalfrom the alternative theorythatstatesthat theso-calledsemanticcomponentisnot licensedby thesemanticportionof theAVM at all.

Crucial evidencecomesfrom the orthographicrepresentationof disyllabic mor-phemes.Somewell-known examplesof disyllabicmorphemesincludehudie ‘butter-fly’, putao ‘grape’andbınlang‘betel’: asfarashistoricalrecordsallow usto determinethesewordsdo not derive from morphologicallycomplex formsandthereis certainlyno synchronicevidenceof morphologicalcomplexity. While varioussourcesdiscussdisyllabicmorphemes,onerarelygetsa clearsenseof how many of thesemorphemesthereare:DeFrancis(1984;1989),for instance,only discussesa few suchcases,andsuchcursorytreatmentis thenorm.10 In factdisyllabicmorphemesprobablynumberaroundahundred:74arelistedin Tables4.1and4.2,andthisby nomeansacompletelist.

Beforewe considerthoselists, however, let usseewhat thetheorypredictsabouttheorthographicrepresention.Considerbınlang ‘betelnut’ which is written usingthe« TREE ­ radical ® andphoneticcomponents and ° representing,respectively,thetwo syllablesbın andlang, andthuscouldbetranscribedas « TREE+BINLANG ­ .The two phoneticsymbolsareobviously licensedby the individual syllables. Giventhat the SYNSEM attribute is associatedwith the entiremorphemeratherthanwiththe individual syllables,the theorystatesthat the « TREE ­ radicalis associatedwiththewholemorphemeratherthanwith theindividualsyllables.TheAVM for thismor-phemewould beasin (4.3)andtheannotationgraphasin (4.4):

(4.3) ±²²²²²²²²²³PHON

±²³SYLS ´

±³ SEG µ�¶ ONS ·9¸ ¶ RIME �?¹�¸9�TONE º �� �»� ±³ SEG µ¼¶ ONS ½¾¸ ¶ RIME ¿¡¹ À¸Á�

TONE � �� ¢ �� ����SYNSEM � CAT ¹�����¹

SEM ·�Ã�ÄÅÃ�½ÇÆ �ȤORTH ¥�® Æ�§ ¥�¯ � ° ¢ ©�©

������������10Plausibly, thereasonfor therelativeneglectof disyllabicmorphemescomesfromthefactthattraditional

Chinesedictionariesareorganizedaroundmonosyllabiccharacters,notaroundwordsor morphemes.Sincemeaningsaretraditionally listed in thedictionaryasentriesfor thesecharacters,this obscuresthe fact thatmany charactersarein factmeaninglessunlesscombinedwith aspecificsecondcharacter.

144 CHAPTER4. LINGUISTIC ELEMENTS

(4.4)

SEM: betel: ®TONE: 1 2SYL: ª : ¯ ª : °ONS-RIME: b in l ang

Given the representationin (4.4), how are the semantic and phoneticcomponentsto be realizedrelative to oneanother?First observe that both the following overlapstatementsaretrue,where ª�É and ª¼Ê arethefirst andsecondsyllables:

1. betelË̪�É2. betelË̪ Ê

From Axiom 1.3 we would thenexpect that Í�Î betelÏ mustcatenatewith Í�λª É Ï , andthat Í#Î betelÏ mustalsocatenatewith Í#Î7ª Ê Ï ; in otherwords,thesemanticradicalmustshow up on bothcomponentsof thethewritten representationof themorpheme.Thispredictionis correct: the written form is ÐÒÑ , with the « TREE ­ component® onbothcomponents.

Of coursein additionto overlappingwith theindividualsyllables,it is alsotruethatthe semanticinformationbetel, overlapswith the whole phonologicalword bınlang.In that casewe would say that betelË bınlang and we might then expect to find awritten form suchas* ÐÌ° , with the « TREE ­ componentcatenatedto the left ofthe pair of syllables— thusin effect showing up on the first component.Note thatthis obeys the catenationformula Í#Î betelÏÔÓ�Í#Î bınlangÏ , which canbe derived frombetelË bınlang from Axiom 1.3. However, recall that the combinationof semanticandphoneticradicalsin Chinesefrequentlyinvolvesa differentcatenationoperatorfrom the macroscopicoperator. In generala formula suchas Í#Î betelÏÕÓÍ�Î bınlangÏwouldrequirechoosingadifferentoperatorfrom themacroscopicoperatorat thewordlevel. This in turn violates the constraintthat the SLU is the syllable in Chinese;seeSection2.3.4. Thereforetheonly optiongiventhe theoryandthewriting-systemparticularconstraintsis to duplicatethesemanticelementacrossbothsyllables.Thisduplicationhasbeennotedelsewherein theliterature— e.g.in (DeFrancis,1989;LawandCaramazza,1995,amongothers)— but never to my knowledgeexplained.

Theduplicationof semanticradicalsacrossbothcomponentsof a disyllabicmor-phemepredictedby the theoryholdsquite generally. Oneproblem,of course,is todefinewhatonemeansby a disyllabicmorpheme:onecannotin generalrely on dic-tionaries,sincestandardChinesedictionariesarearrangedby character, andunderthemulticharactersubentriesrarelydistinguish“compounds”of two or morecompoundsthataremorphologicallysimplex from compoundsthataremorphologicallycomplex.Onecanhowevercollectdisyllabicmorphemesfrom corpora,if we make thereason-ableassumptionthat a disyllabic morphemeshouldconsistof two elementsthat arestatisticallyhighly associatedwith eachother, but not with anything else. Two such

4.2. CHINESEWRITING 145

lists arepresentedhere,basedon 20 million charactersof Chinesetext.11 The firstlist, presentedin Table4.1,consistsof pairsof charactersthatoccurat leasttwice andonly cooccurwith eachother: i.e. thetokencountof thepair is identicalto thetokencountsof theindividualcharacters.Thesecondlist, Table4.2,consistsof otherpairsofcharactersnot in thefirst list, which havehigh mutualinformation(Fano,1961).12 Inthis secondtable,at leastoneof the componentcharacterscanoccurelsewhere. Insomecasesthis is purely an orthographicfact: for instance,the Ö ka of Ö}× kafei‘coffee’ is alsousedto write ÖÙØ galı ‘curry’. In othercases,aswith the Ú dieof ÛÚ hudie, whatwasoriginally partof adisyllabicmorphemehastakenona “life of itsown”, andcanbeusednowadaysin new derivatives(aswell asin personalnames);see(SproatandShih,1995) for somediscussion.But what is clearaboutall thesecasesis thatbothon thebasisof distributionalevidence,andon thebasisof native-speakerjudgments,thesepairsof charactersall seemto form singleindivisibleunits.Themoststriking featureof theselists, of course,is the predictedduplicationof the semanticradicalsin eachof thecases.

Beyonddistributionalevidence,suchexamplesreflectwhatis in facta verystrongintuition of Chinesereadersthat charactersthatbelongto suchrelatively inseparableconstructionsmust be written with the sameradical. Indeed,it is possibleto tracethehistoryof someof thesecases,andshow that thesemanticradicalsharedby bothcharacterswasaddedin accordancewith this “same-radicalconstraint”.For instance,ÜÞÝ « STEP+SHANGYANG ­ changyang‘roamleisurely’hasin thepastbeenwrittenin variousways, including ßBà (changyang, the literal interpretationof ‘everydaysheep’beingirrelevant)(Ci Hai, 1979).Over time, thesame-radicalconstraintforcedtheword into its presentshape.

It is worth notingat this juncturethatonealsofindsa few pairsof disyllabicmor-phemeswith identicalpronunciationsandphoneticcomponents,differing only in thesemanticcomponents.The meaningdifferencesmay be subtle,or stark,but in anycasethereis a strongsensethatoneis dealingwith differentsensesof thesameword,with different spellingsfor the different senses.One pair where the semanticdif-ferenceis ratherstarkis á}â « TREE+PI ÉKã"ä9ÊÁÊ(å!åÈæ ã"Éèç�é�ã"ÉÅã BA ­ pıpa ‘loquat’ and êë « MUSIC+PI ÉKã"ä9ÊÁÊ(å!åÈæ ã"Éèç�é�ã�ÉKã BA ­ pıpa ‘Chineselute’. But despitethe differencein meaning,the wordsare clearly related: the Chineselute (or “pipa”) is a loquat-shapedinstrument.A moresubtlepair is ìÞí « MOUNTAIN+QIQU ­ qıqu and« FOOT+QIQU ­ qıqu, bothwith thesenseof ‘rugged’, thoughwith theformerseem-ing to emphasizethe terrain,andthe latter the journey or pathover a ruggedterrain.A crucial point aboutsuchcasesis that while thereis somefreedomto choosethesemanticradical,mixing thesemanticradicalsacrossthe two charactersresultsin anill-formed result: î�ïÞí « FOOT+QI MOUNTAIN+JU ­ is not a possibleway to write

11Thisin turnconsistsof the10-million characterROCLING (R.O.C.ComputationalLinguisticsSociety)corpus,plustenmillion charactersof anothercorpusfrom UnitedInformatics,Inc.

12Mutual information is definedfor two events ð!ñ and ðèò , andtheir cooccurrenceð!ñÅðèò , asfollows:óõô÷öùøÁú ò�ûýü"þKÿ�� � � � � ¢��ü!þKÿ�� ��� � � ü"þKÿ�� ��� ¢���Herewe estimatetheprobabilitiesby themaximumlikelihoodestimate,5û ð ���� û¾ð ����� , where û¾ð � is

thefrequency of ð and � is thesizeof thecorpus.Mutual informationhasbeenusedin many computationallinguistic applicationsfor computingthe strengthof associationbetweenlexical items: see,e.g. (ChurchandHanks,1989), andalso(SproatandShih,1990)for anapplicationto Chineseword segmentation.

146 CHAPTER4. LINGUISTIC ELEMENTS

thisword. Clearlytheseexamplesareconsistentwith amodelwherethesemanticrad-ical is a propertyof themorpheme:it is bothindependentof thephoneticcomponentin thatthereis somefreedomto choosealternative radicalswhile keepingthephonet-ics constant,while at thesametime thesameradicalmustbeusedfor bothcharactersusedto write thesyllablesof themorpheme.

While theappropriaterepresentationof xıngshengcharactersseemsclear, weneedto saysomethingabouttherepresentationof theotherclasses— pictographs,indica-tive symbolsandsemantic-semanticcompounds— which formally lack a semanticor phoneticcomponent.Taking asour examplethe pictograph� ren ‘person’, wewill assumea representationsuchasthat in (4.5),wherethegraphemeis licensedbytheSEM entry. Froma purely formal point of view � ren is a logograph:it doesnotstrictly speakingencodephonologicalinformation.Nonetheless,aswe have observedabove, even in suchcasesof formal logography, skilled readersphonologically“re-code” the symbol(Coulmas,1989),which in turn suggestsa second licensingfromthephonologicalportionof theAVM. Both theselicensingsareindicatedhere:

(4.5)±²²²²²²²²²³PHON

±²³SYL

±³ SEG µ¼¶ ONS ��¸ ¶ RIME Ã�¹�¸ �TONE �

�� ���� � �SYNSEM � CAT ¹�����¹

SEM ������¿¡ ¡¿ ¢ � ¤ORTH ¥�� ��� ¢ ©

������������4.3 JapaneseWriting

Treatmentsof writing systemsneedto saysomethingaboutJapanesewriting, widelyconsideredto be the mostcomplex writing systemin usetoday. I thereforebrieflytreatJapanesewriting here,andin Section5.2.2. In this sectionI provide aninformaltreatmentof the degreeto which Japanesewriting is logographic,a sizeableboneofcontentionamongscholarsof writing systems.In Section5.2.2I discusspsycholin-guisticevidencethat,no matterhow formally logographicJapanesemaybe,Japanesereadersand writers seemto treat it as a phonographicsystem: that is, the kind ofphonologicalrecodingsuggestedby Coulmas(1989) doesindeedseemto occur inJapanese(andalso in Chineseaswewill arguein Section5.2.1).

Japaneseapparentlyexhibits the most most extensive useof logographyof anymodernwriting system.Thereasonsfor this arewell-documentedelsewhereandwillonly bebriefly sketchedhere;see(Sampson,1985;DeFrancis,1989;Sampson,1994,amongothers). When the JapaneseadaptedChinesewriting for representingtheirown language,they experimentedwith variouswaysof usingChinesecharacters(kan-ji ) to representJapanesewords. Oneway wasto useChinesecharactersto representJapanesewordswith roughly the samemeaningbut (of course)with quite differentpronunciations:thus � ‘person’would beusedto representhito ‘person’ (Mandarin

4.3. JAPANESEWRITING 147

ren) and � ‘fish’ would be usedto representsakana‘fish’ (Mandarinyu). Charac-ters usedin this way to representnative Japanesewords are said to have their kun(‘instruction’) reading. In somecasesa sequenceof characterswith the appropriateinterpretationwould be usedto representa monomorphemicJapaneseword: thus ��

for itsu ‘when’ (Mandarinheshı), or �! anata ‘you’ (Mandaringuıfang). Inthesecases— termedjukujikun— thecharactersequencebehavesin effectasasinglecomplex character.

Right from the start,Chinesecharacterswerealsousedphoneticallyto representJapanesesyllables,andin thisusagethecharacterstookonpronunciationscorrespond-ing (roughly) to their pronunciationin Chinese.Thus, " (Mandarinnai) would beusedto representthe Japanesesyllableno (Sampson,1985,page175). Over time, aconventionalsetof theseso-calledmanyo’gana, reducedin form, evolvedinto thet-wo modernkanacoresyllabaries.But in earlyJapanesetextsonefoundadmixturesofChinesecharactersfunctioningasphoneticelementsalongwith charactersto bereadwith a kunpronunciation.

To addto thecomplexity, Japaneseborrowednot only Chinesewriting, but alsoalargenumberof Chinesevocabularyitems.Naturallythesewerewrittenasthey wouldbe in Chinese,andthey werealsopronouncedapproximatelyasin Chinese.Signifi-cantly, suchSino-Japanesereadingsarecalledon meaning‘sound’ (Chinese# yın);theinherent‘sound’ of a characteris its Chinesepronunciation,andthis is consisten-t with the useof the Chinesepronunciationin manyo’gana. As is well-documentedin Sampson’s (1985)discussion,furthercomplicationsarosedueto thefact thatChi-nesevocabularywasborrowedinto Japaneseatdifferenttimesandfrom differentpart-s of China,resultingin various“Chinese”pronunciationsfor many characters.It isnot unusualfor a characterto have, in additionto a kun pronunciation,threeor fourdistinct on pronunciations.Typically thesedifferentpronunciationsarerestrictedto

differentwords:thus $ ‘definite’ is pronouncedtei in teika ‘fix edprice’ (Man-darin dıngjia), but as jo in $&% joren ‘regular customer’(which would be dınglianin Mandarinif this werea Chineseword). Suchfactsguaranteethat for mostChinesecharacters,the Japanesereaderhaslittle choiceother thanto memorizethe associa-tion betweenlexical entriesandtheir written form, with little or no usefulrecoursetophonologicalinformation:in otherwordsthesecharactersarelogographic.

Thelogographicnatureof Chinesecharacters,asthey areusedin Japanese,is un-derscoredin a differentway by kokuji (literally ‘domesticcharacters’),Chinesechar-actersthat were inventedin Japanto representJapanesewords (LehmanandFaust,1951; Coulmas,1989; Daniels and Bright, 1996). A sampleof theseis givenin Table 4.3. The most striking featureof kokuji is the overwhelmingprevalenceof semantic-semanticconstructions,and the relatively small numberof semantic-phoneticconstructs;particularly rarearesemantic-phoneticconstructsinvolving na-tive kunpronunciations.Alexander’scompilationof kokuji (reportedin (LehmanandFaust,1951))includes249examplesfor whichapproximately184haveaclearetymol-

ogy, andarenot simply contractionsof multicharacterexpressions.(For instancejinrikisha ‘rickshaw’ is clearlyderivedfrom thecharactersequence�('*) jin riki sha(humanpower car).) Of these,72%aresemantic-semanticconstructs.Theremainder

148 CHAPTER4. LINGUISTIC ELEMENTS

aresemantic-phoneticcompounds,with 20% basedon an on pronunciation,andtheremaining8% on a kun pronunciation. Someexamplesof eachof thesecategoriescanbeseenin Table4.3.13 The prevalenceof semantic-semanticformationsamongJapanesecharacterinnovationsis striking in that it sostronglycontrastswith thesitu-ationin Chinese:there,aswe havealreadynoted, semantic-phoneticconstructswereoverwhelminglythepreferredmeansof forming new characters.TheJapanesesitua-tion alsocontrastedwith anotherChinese-basedscript,namelytheChu’ Nomwritingsystemof Vietnam.Exclusively Vietnamesecharacterinnovationswerefoundin Chu’Nom, but thesewereapparentlyall semantic-phoneticconstructions(Nguyen,1959).

A coupleof factorsmightseemto explainthelow percentageof semantic-phoneticconstructsin kokuji. Both of theseexplanationsdependupon the observation thatmany of thekokuji wereinventedto write wordsthathave only a kun pronunciation.Howeverasweshallsee,neitherof theseexplanationsreally work.

Thefirst ideais that themostobvioussourceof thephoneticcomponentfor kun-only characterswouldbeacharacterwith thesameor similarkunpronunciationastheintendedtarget.But sincethe“sound” (on) of aChinesecharacteris its Sino-Japanesepronunciation(or pronunciations),usinga kun pronunciationin this way might havebeendisfavored. However, aswe have noted,8% of the kokuji were formedin thisway, so therecannothave beenan absoluteprohibition on usingkun pronunciationsin phoneticcomponents.Furthermore,sucha prohibition would not have ruled outthemorewidespreaduseof on pronunciationsto representbothon (20%of thecases)aswell askun pronunciations.Indeed,someinstancesof the latter typedo occur, as

in « FISH+KI ­ kisu (kun) ‘sillago’ (Alexander’s entry 230), wherethe phoneticcomponent+ ki is anon reading.

Thesecondpotentialexplanationrelatesto the lengthof kun pronunciations.InChinese,andalsoin Vietnamese,charactersalmostexclusively representsinglesylla-bles.Giventherelatively simplesyllablestructuresof theselanguages,thereis a highdegreeof homophony. Thusin inventinganew characterto representamorphemewithagivenpronunciation,thereareusuallymany identicallyor similarly pronouncedchar-actersto choosefrom to actasaphoneticcomponent.Kunpronunciationsin Japanese,in contrast,areoften polysyllabic(three-syllablenativemorphemesarenot unusual),andthereforethe degreeof potentialhomophony is reduced.Again theremay be agrainof truth to this explanation,but it cannotrepresentmorethana tendency. Poly-syllabic homophonesdo exist in Japanese,andthis fact is apparentlymadeuseof in

forming someof thekokuji: see,for instance, « CLOTHING+YUKI ­ yuki ‘sleevelength’ in Table4.3 is homophonouswith (amongotherthings)yuki ‘go’ (written ,), andthis is takenadvantageof in forming this character. Secondly, thereseemsto beno requirementin generalthat the homophony be particularlyclose. Thusthe 214th

entryin Alexander’s(1951)list is « FISH+HASHI ­ subashiri ‘youngof grey mul-let’, which is apparentlyderivedusing - hashi‘run’ asa phonetic;herethetarget is

13Note, that the semanticelementsusedin kokuji do not always correspond to traditional semanticelementsin Chinese:thus,for instance,$ ‘definite’ is nota traditionalsemanticelement,thoughit is usedassuchin thecharacterfor shika‘clearly’, in Table4.3.

4.3. JAPANESEWRITING 149

quitedifferentin pronunciation(evenallowing for thewell-known /h/. /b/ alternationin Japanese)from thatof thephoneticcomponent.

Theonly reasonableguessasto thehighincidenceof semantic-semantickokuji, inmy view, is thattherewassimplyapreferenceamongtheusersof theJapanesewritingsystemfor creatingthesekindsof “visual puns”.Thismayin partbedueto thefactthatthroughoutmuchof historywriting wasaneliteskill in Japan(asin muchof therestoftheworld) (Sampson,1985)andthepeoplewhopossessedthatskill hadtimefor whatmaybeviewedaspractically-orientedlanguagegames.But thespreadof literacy hasby no meanskilled this kind of creativity: Alexanderexplicitly excludesfrom discus-

sionmorerecentwidely-known formationslike « FEMALE+UP+DOWN ­ erebetagaru ‘elevatorgirl’ preciselybecausethey arepunsandarenot seriouslyconsideredpart of the writing system.But the differencein kind betweenthis exampleandthe

genuinekokuji « MOUNTAIN+UP+DOWN ­ touge ‘mountainpass’is in factmin-imal.

As a logographicsystem— or moreproperly, a logographicsubsetof a writingsystem— semantic-semantickokuji exemplify thecreative limits of logography. Butwhatdo they tell usaboutthenatureof Japanesewriting?

And secondlywhat do they tell us aboutthe possibility of developingan entirewriting systembasedon logography— somethingthat Sampson(1985), it will berecalled,claimsexistsalreadyin thecaseof Chinese?

In answerto thefirst question,aswe notedin the introductionto this section,theamountof logographythatJapanesereadersmustfaceis large,morethanin any othermodernwriting system.But it is alsoclearthatthispercentagehasbeenonthedeclinewithin the last century, as the useof the systemmoved out of the circle of literatiinto the generalpopulation;as Smith (1996, page210) notes,the useof kanji in awide varietyof functionshasdeclinedsteadilythroughoutthe20thcentury. With thedecreasein theuseof kanji, therehasbeen necessarilya concomitantincreasein theuseof thephonologically-basedkanascripts.Japanesewriting hasalwaysinvolvedamixture of logographicandphoneticelements;it is, andalwayshasbeen,a “mixedscript”, asSampson(1985)termsit, onewherethereis a large logographiccore,butwherephonologically-baseddevicesareavailable,andwidely andproductively used.Themix hassimply shiftedmoreandmoreto thephonologically-basedmethods.

Over andabove this onemustmake a cleardistinctionbetweenthepurely formalcharacterizationof the script andhow the script is actuallyusedby fluent readersofJapanese.Largenumbersof logographicelementsclearlyexist in Japanese,but recallthat even logographicelementscanbe recodedso as to representphonologicalele-mentsdirectly, aswe discussedin Section4.2. Thuswe would assumethat thekokuji/

tara ‘cod’ hasa representationlike thatof Chinese� ren ‘person’ in (4.5),givenin (4.6)below:

150 CHAPTER4. LINGUISTIC ELEMENTS

(4.6)±²²²²³ PHON 07ÄÅ¿1�È¿32 � �SYNSEM � CAT ¹�����¹

SEM ���È  ¢ ��¤ORTH ¥ / �4� ¢ ©

� �����Therearetwo kindsof evidencethat this hashappenedin Japanese.First of all, thereis psycholinguisticevidencefrom Horodeck(1987) and Matsunaga(1994) demon-stratingthat readersof Japaneseaccessphonological representationswhenthey readkanji; this evidencewill bediscussedin Section5.2. Secondly, kanji (like charactersin Chinese)maybeusedpurely for their phonologicalvalue,ignoring their semanticvalue: in this usagethey arecalledateji. (Of coursethis is preciselytheway in whichthey were usedin early Japanesemanyo’gana.) An exampleis 5!6 kohı ‘coffee’(Smith,1996,page210),wherethecomponentcharacters5 ko (‘ornamentalhatpin’)and 6 hı (‘string of many pearls’)areusedpurely for phonologicalreasons,the in-dependentmeaningsof thecharactersbeingirrelevant.Thuswe mayassumea purelyphonographicanalysisfor 5(6 , asin (4.7):

(4.7)±²²²²²³ PHON ¶ SYLS 0 ko

���hı ¢ � 2K¸

SYNSEM � CAT ¹�����¹SEM ���8797 Ã�à ¤

ORTH ¥�5 � 6 ¢ ©��������

Ateji really involveexactly thesameprocessby which onecanwrite theEnglishsen-tenceI seeyouforgot that as « i c u 4gotthat­ : thedifferenceis thatspecificateji areanacceptedstandardpartof Japanesewriting.

Turningto thesecondquestion,semantic-semanticcharactersin Japanese(andinChinesealso), certainly give someindication of what a purely logographicsystemmight look like. Now, in their reply to Sampson(1994),DeFrancisandUnger(1994)argueagainstthe possibility of a learnablepurely logographicsystemby citing thecaseof military codes.In suchcodes(asdistinct from ciphers), wordsarerandomlysubstitutedfor eachother, sothatbattleshipmight betransmittedasgrapefruitandat-tack mightbetransmittedasfallacious. Suchsystemsareindeedunlearnable(nobody,presumably, hassufficientmemory),but theexampleis notentirelyfair either:thesys-temhasno structure,which is of coursewhy it is soeffective for its intendedpurpose.Semantic-semanticcharactersprovidewhatseemslikeamorereasonablemodel:therewould be a limited setof primitives— in the caseof ChineseandJapanesewriting,the componentsof the characters— andtherewould be a calculusthat defineshowthey areto becombined.Of coursetherewouldbenophonologicalcuesto thelearner:ratherthelearnerwouldneedto learnto associatecollectionsof purelysemanticinfor-mationwith intendedwordsor morphemes.It seemsfair to guessthatsucha system

4.4. SOMEFURTHER EXAMPLES 151

would beextremelydifficult to design,which is presumablypartof thereasonsuchasystemnever hasbeendesigned.And it seemsfair to guessthatsucha systemwouldalsobe difficult to learn, thoughpresumablynot asdifficult asa military code. Forthe latter reasonalonesucha systemwould not serve theneedsof a societyin whichreadingis takenasabasicskill to bemasteredasrapidlyaspossibleby a largenumberof people:mostpeoplein a societyhave little time for complex linguistic games.Toserve thoseneedsa writing systemmusthavea significantphonographiccomponent.

4.4 SomeFurther Examples

In thischapterwehavegivenanoverview of someof thekindsof linguisticinformationthatmaybeencodedby orthographicelements.Wehaveproposedataxonomyof writ-ing systemsbasedon thekind of phonographicelementsusedin the system,andtheamountof logographypresentin the system.However, unequivocally phonographicelements,andthekindsof semanticallymotivatedlogographicelementsthatwe haveconsideredby no meansexhaustthe possiblefunctionsof orthographicdevices. Weclosethischapterwith threemildly esoterickindsof functions:anorthographicpluralmarker in Syriac;reduplicationmarkers;andcancellationsigns.In eachcasewe givea formaldescriptionin termsof ourmodel.

4.4.1 Syriac syame

The Syriacsyameis a pair of dots that marksplurality in nounsand adjectives inSyriacand someother Aramaicdialects(Daniels,1996a,page507). For example,the plural of : « mlk ;¡­ /malka/ ‘king’, is written : « mlk” ;¡­/malke/ (with syametransliteratedas ‘” ’). In unvocalizedtext, the syameareoftentheonly markof plurality, soonecanplausiblyanalyzethis device aslogographic,inthis casebeinglicensedby theSYNSEMfeature < = PL> . A representationfor /malke/‘kings’, is given in (4.8). (Ultimately, the syamewill catenatewith the orthographicexpressionof theword asa whole: thereforethe lettersin theword aregroupedhere(using‘ ¥ © ’) separatelyfrom thesyame):

(4.8)±²²²²³ PHON 0@? � � ¿£½ ¢ �BA Æ � ÃC2ORTH ¥�¥ « ? ­ � §�« ½�­ ¢ §�« A ­ Æ §�« ;¡­ © § ¥ «&D ­FE ©�©SYNSEM � CAT ¹�����¹

PL = E � ¤ �������Sincethe PL attribute overlapswith the phonologicalinformation in the word, thesyame, as the orthographicrepresentationof <�= PL> , catenate— more specificallydownwardscatenate— with the orthographicrepresentationof the phonologicalin-formationin theword, in this casetheletters « m ­ , « l ­ , « k ­ and « ;¡­ .

The exact placementof the syameabove the word dependsuponthe particularlettersin theword. If « r ­ is present,thesyameareattractedto it andarewritten

152 CHAPTER4. LINGUISTIC ELEMENTS

asa seconddot over the « r ­ : . Otherwisethesyamearewritten preferablyneartheendof theword,avoiding theletters « l ­ and « ;¡­ , which have longascenders.

Thecategoricalrequirementsthat thesyameappearabove theword, andthat theymustappearabove « r ­ if thereis onearecapturedby the rulesin (4.9) and(4.10).Thefirst placesthesyameasthegraphicalexpressionof <�=HGJIK> abovethegraphicalex-pressionof thenounor adjective L . Thesecondreassociatesthesyameto thepositionof the « r ­ in theword, if thereis one;notethat if morethanone « r ­ is present,thesyamecanoccuron eitherone. Otherdetailsof syame-placementwe assumeto bestylistic.

(4.9) Í�Î�L ÓB< =HGMIK> ÏONQPSR5¿T?÷ÃVU Ó5Í#Î@L Ï(4.10) Let Í#Î@L Ï É"æ æ æ W é�É denotethefirst �YX º lettersof Í�Î�L Ï , Í#Î@L Ï W the � th letter, andÍ�Î�L Ï W[Z�É"æ æ æ \ theremainingletters.If Í�Î�L Ï W is « r ­ , then:P]R5¿T? Ã^U Ó�Í#Î@L Ï`_ Í�Î�L Ï É"æ æ æ W7é�Éba Óc< P]R5¿T? Ã^U Ó�Í#Î@L Ï W > a Ó Í�Î�L Ï W[Z�É"æ æ æ \

Notethatwe wouldsaythatin SyriactheSLU is theword.

4.4.2 Reduplication markers

A numberof writing systems,includingearlierformsof Malayand BahasaIndonesiaaswell asKhmer, havemarkersthatindicaterepetitionof precedingmaterial.Khmer,for instance,hasa signthatmarkstherepetitionof theprecedingword or word group(Schiller, 1996,page472). (In Malay andBahasaIndonesiaa raised‘2’ wasused.)Suchsignsmightappearto constituteacounterexampleto regularity: in orderto iden-tify wheresuchsignscanbe written, the mapping dfehg9i akj would have to identifycopiedstretchesof linguisticmaterial,somethingthatcannotbehandledby finite-statedevicesfor unboundedcopy lengths.However, asfar asI have beenableto ascertain,thesedevicesarenot usedto markarbitrarycopiesof surfacestrings,but ratheronlycopiesthatarisefrom someform of morphologicalreduplication.We canreasonablyassumethat the lexical representationof a reduplicatedform indicatesthat a givenstretchof linguistic materialhasbeencopied: for example,standardautosegmentalanalysesof reduplication(Marantz,1982,andmuchsubsequentwork) assumethatthebasethatis copiedis affixedor compoundedwith amorphemethatis lexically phono-logically empty, but which derivesits surfacephonologicalmaterialby copying fromthebase.Clearlysuchreduplicatingmorphemesmustbemarkedassuchin the lexi-cal representationof constructsthatcontainthem,andwe only needassumethat thisinformationis indicatedaspartof therepresentationat theORL.

Forexample,let’ssaythatwehaveaform likeoral oral , wherewewill assumeforthesake of argumentthat thefirst portionis thebase,andthesecondis thecopy. Theinformationaboutwhich is thebaseandwhich thecopy is known to themorphology,andpresumablycouldbelexically markedassuch.For instance,onemight imaginearepresentationsuchasthatin (4.11),wherethecopy is markedwith labeledbrackets:

(4.11) oralh< m�nKå]o oralJ>pm�nKå]o

4.4. SOMEFURTHER EXAMPLES 153

Assumethat thewriting systemin questionmarksreduplicatedconstituentsusingthesymbol‘ Ê ’. Thenonecanwrite arulethatsimplystatesthattheimageunderd ehg9i akjof any spanL bracketedby < m�n?å]o and > m�nKå�o is simply ‘ Ê ’:(4.12) Í#Î�< m�nKå�oqLV>pm�nKå]oÏON ÊThusassumingaspellingof « orang­ for thebase,thespellingof oralr< m�n?å]o oral3>@m�n?å]owouldbecome« orangÊ ­ .

4.4.3 Cancellationsigns

A numberof scriptshave cancellationsigns,usedto mark symbolsthat arenot pro-nounced.For example,Syriacmarkslettersthatarenot pronouncedin somedialectswith a diagonalbar(calledmbat.lana) undertheletter(GeorgeKiraz, personalcom-munication).Similarly, Thai (Diller, 1996)usesa cancellationsignto indicatelettersthatarenotto bepronounced,mostlyin wordsderivedfrom Sanskrit,whicharespelledetymologicallyin ModernThaiorthography.

Cancellationsignsthus mark graphemesthat are not licensedby any linguisticmaterial: more formally, they mark graphemess wherethe imageof s under theinverseof dferg9i akj — whichwewill denotehereas Í é�É — is empty: Í é�É Îts�ÏuNwv : .Thus,givenacancellationsign x , wewantto rewrite s as x#Ó[s just in caseÍ é�É Îts�Ï`Nyv :(4.13) For an orthographicsymbol s , andcancellationsign x if Í é�É Î�s�ÏzN{v thens|_}x Ó~s .

154 CHAPTER4. LINGUISTIC ELEMENTS

Orthography Analysis Pronunciation Gloss�(� « SURROUND+L INGWU ­ l ıngyu ‘imprisoned’�(� « SURROUND+WULUN ­ hulun ‘swallow whole’�(� « CART+LI AOGE ­ ji uge ‘entwined’�(� « CAVE+YOUTIAO É4�!ä9çèã�åÈæ ã!ç�� ­ yaotiao ‘graceful’�(� « DEMON+WANGLIANG ­ wangliang ‘roamingghost’�(� « FEMALE+ZHOU �!ä9ÊÁÊ�åÈæ ã9ç�� L I ­ zhoulı ‘sisterin laws’�(� « FOOD+KUNTUN ­ huntun ‘wonton’�(� « FOOT+CUOTUO ­ cuotuo ‘procrastinate’�(� « FOOT+LANG Éèç!äÁÊ��(åÈæ ã���� QIANG � ä!É���åÈæ �!É�� ­ langqiang ‘hobble’�(� « FOOT+ROUL IN ­ roulın ‘trample’�(� « FOOT+CHOU �!ä!É���åÈæ ����� ZHU ­ chouchu ‘hesitate’�(� « FOOT+ZHI Ê9ä9ç(åÈæ ���9Ê SHU ­ zhızhu ‘hesitate’�(� « GAS+YINYUN �!ä!É � åÈæ �!É ��­ yınyun ‘misty atmosphere’�(  « GOING+XIEHOU ­ xiehou ‘encounter’¡(¢ « GOING+YIL I ­ yılı ‘trailing’£(¤ « GRASS+BOQI ­ bıqı ‘waterchestnut’¥(¦ « GRASS+GUAJU ­ woju ‘lettuce’§(¨ « GRASS+HANXIAN ­ handan ‘lotus’©(ª « GRASS+JI ANJI A ­ ji anjia ‘type of reed’«(¬ « GRASS+MUSU ­ musu ‘clover’­(® « HAND+YEYU ­ yeyu ‘tease’¯(° « HEAD+MANHAN ÉKã"ä9çÁÊ(åÈæ ã���É ­ manhan ‘muddleheaded’±(² « HEART+CONGYONG ­ songyong ‘eggon’³(´ « HEART+NIU �!ä��(å�æ ã��� NI ­ niunı ‘coy’µ(¶ « HEART+YINQIN ­ yınqın ‘attentively’·(¸ « INSECT+BIANFU ­ bianfu ‘bat’¹(º « INSECT+FUYOU ­ fuyou ‘mayfly’»(¼ « INSECT+QIUY IN ­ qiuyın ‘earthworm’½(¾ « JADE+CUICAN ­ cuıcan ‘brilliant’¿(À « JADE+DAIMAO ­ daimao ‘tortoiseshell’Á( « LEATHER+QIUQIAN ­ qiuqian ‘swing’Ã(Ä « OLD+MAOZHI ­ maodie ‘old people’Å(Æ « OVERHANGING+YI �9ä9Ê���åÈæ ã���� NI ­ yını ‘fluttering’Ç(È « PERSON+KONGZONG �!ä��(å9åÈæ �9ç!É"éÉ�9ç9Ê ­ kongzong ‘busy’Ê(Ë « SICKNESS+GE �!ä9ÊÁÊ(åÈæ ã���� DA ­ geda ‘cyst,boil’Ì(Í « STEP+PANGHUANG ­ panghuang ‘roamaimlessly’ÜÙÝ « STEP+SHANGYANG ­ changyang ‘roam leisurely’Î(Ï « TEETH+JUWU ­ juyu ‘bickering’áÙâ « TREE+PI ÉKã"ä9ÊÁÊ(å!åÈæ ã"Éèç�é�ã�ÉKã BA ­ pıpa ‘loquat’Ð(Ñ « TREE+NINGMENG ­ nıngmeng ‘lemon’Ò(Ó « WINE+MINGDING ­ mıngdıng ‘drunk’Ô(Õ « WINE+TI �!ä!É��(åÈæ ã���� HU ­ tıhu ‘clear wine,butterfat’Ö(× « WRAP+PU É��9ä9Ê9Ê�å!åÈæ ã����é�ã� � FU ­ pufu ‘crawl’

Table4.1: Disyllabic morphemescollectedfrom the ROCLING corpus(10 million charac-ters)and10 million charactersof theUnited Informaticscorpus.This setconsistsof pairsofcharactersoccurringat leasttwice, andwhereeachmemberof thepair only cooccurswith theother.

4.4. SOMEFURTHER EXAMPLES 155

Orthography Analysis Pronunciation GlossØ(Ù « BIRD+YUANYANG ­ yuanyang ‘mandarinduck’Ú(Û « DOG+JI AOHUA ã!ä���åÈæ �èã!Ê ­ ji aohua ‘cunning’Ü(Ý « GRASS+FANSHU ­ fanshu ‘yam’Þ(ß « GRASS+HULU ­ hulu ‘gourd’à(á « GRASS+LUOFU ­ luobo ‘daikon’â(ã « GRASS+PUTAO ­ putao ‘grape’ä(å « HEART+GUANGHU ­ huanghu ‘illusionarily’æ(ç « HEART+KANGJI ­ kangkai ‘generous’ÛÙÚ « INSECT+HUDIE É4�!ä9Ê��(åÈæ ���!Ê ­ hudie ‘butterfly’è(é « INSECT+MAY I ­ mayı ‘ant’ê(ë « INSECT+PANGXIE ­ pangxie ‘crab’ì(í « INSECT+ZHANGLANG ­ zhanglang ‘cockroach’î(ï « JADE+HUBO ­ hupo ‘amber’ð(ñ « JADE+L INLANG ­ l ınlang ‘kind of jade’ò(ó « JADE+PIL I ­ bolı ‘glass’ô(õ « LAME+JI ANJI E ­ ganga ‘awkward’ö(÷ « MOUTH+PAO É4�!ä9ç���å!åÈæ ã9ç���é�ã9ç"É XIAO ­ paoxiao ‘roar’ø(ù « MOUTH+HOULONG ­ houlong ‘throat’ú(û « MOUTH+HAISU ­ kesou ‘cough’ü(ý « MOUTH+JUJUE ­ jujue ‘chew’ÖÙ× « MOUTH+JI AFEI ­ kafei ‘coffee’þ(ÿ « MOUTH+LABA ­ laba ‘trumpet,speaker’��� « MOUTH+XIXU ­ xıxu ‘sniffling’��� « PERSON+GUILEI ­ kuılei ‘puppet’��� « PERSON+KANGL I ­ kanglı ‘couple’��� « ROOF+YUZHOU �9ä9Ê9Ê�åÈæ ã9ç�� ­ yuzhou ‘universe’� « SHELL+YOULV �!äÁç�� åÈæ ã9ã�� ­�� huıluo ‘bribe’ �� « STEP+FEIHUI ­ paihuı ‘going to andfro’ÐÙÑ « TREE+BINLANG ­ bınlang ‘betelnut’��� « TREE+GANLAN ­ ganlan ‘oli ve’��� « WINE+YUN �!ä!É � åÈæ �!É � XIANG ­ yunniang ‘brewing (i.e. trouble. . . )’

Table 4.2: Furtherdisyllabic morphemescollectedfrom the ROCLING corpus(10 millioncharacters)and10 million charactersof theUnited Informaticscorpus.This setconsistsotherpairsof charactersthatdo not exclusively occurwith eachother, but wherethereis nonethelessa high mutualinformationfor thepair. Note that LV ( � ) indicatesthat thephoneticcomponentin questionoccurs9 out of 38 timesin characterspronouncedwith initial /l/ followedby somevowel.

156 CHAPTER4. LINGUISTIC ELEMENTS

Alex. # Kokuji Analysis (Phonetic) Kun (on) Gloss

10 « PERSON+MOVE ­ hataraki do ‘effort’

12 « WIND+STOP ­ nagi ‘lull, calm’

33 « MOUNTAIN+UP+DOWN ­ touge ‘mountainpass’

37 « HEART+FOREVER ­ kore ‘endure’

74 « FEW+HAIR ­ mushi ‘pluck’

124 « EAR+CERTAIN ­ shika ‘clearly’

160 « BODY+BEAUTIFUL ­ shitsuke ‘upbringing’

198 « DOWN+WIND ­ oroshi ‘mountainwind’

240 « FIELD+BIRD ­ shigi ‘snipe’

249 « FEMALE+NOSE ­ kaka ‘wife’

138 « GRASS+ZA ­ � za(on) goza ‘matting’

51 « TREE+MASA ­ masa(kun) masa ‘straightgrain’

147 « CLOTHING+YUKI ­ yuki (kun) yuki ‘sleeve length’

Table4.3: A sampleof Japanesekokuji (secondcolumn),with their componentialanalysis(third column). The first columnis the entry numberin Alexander’s list (LehmanandFaust,1951).Thefourth columnliststhephonetic,if any. Thefifth columnliststhekunpronunciation,and the sixth column the on pronunciation,if any: in one case— goza— thereis no kunpronunciation.The last threekokuji shown areformedassemantic-phoneticconstructs,withthe last two beingbasedon thekun pronunciation:notethat thephoneticcomponentof masaalsomeans‘straight’, soit is possiblethatthis oneis alsoa semantic-semanticconstruct.

Chapter 5

PsycholinguisticEvidence

Thereis to datealargeliteratureonthepsycholinguisticsof readingandwriting, whichdealsin thequestionof how humansextract linguistic informationfrom written text,andhow they composewritten text given a mentallinguistic representation.Someusefulgeneralcollectionsinclude(FrostandKatz,1992;deGelderandMorais,1995;Perfetti,Rieben,andFayol, 1997)and (Balota,Floresd’Arcais,andRayner, 1990);therehasalsobeena large amountof work on readingandwriting Chinesescript,including the paperscollectedin (Chenand Tzeng,1992) and (Wang, Inhoff, andChen,1999).

Sinceweareproposingacomputationalmodelof writing systemsandtheirrelationto linguistic structure,it makessenseto askwhat “psychologicalreality” thereis inthe model that we have proposed.It is not my intentionhereto review the variouspsycholinguisticmodels— many of them mutually inconsistent— that have beenproposed.Rathertheapproachthatwill betakenwill beto examinethemodelandseeif thereis any supportin the psycholinguisticliteraturefor someof the propertiesofthemodel.

Clearlysuchanapproachrequirescaution.Whatdowemeanby “psychologicallyrealistic”? This is a term which unfortunatelyhasbeenmuchabusedin the historyof linguisticsandcomputationallinguistics. In the context of the presentdiscussionwe needto beratherpreciseaboutthelevel of granularityat which we would wanttoinvestigatethe psychologicalreality of our modelof writing systems.For example,thereare specificcomputationaldevices — finite-statetransducersin particular—thathavebeenproposedasplausiblecomputationalmechanismsfor mappingbetweenwrittenform andlinguistic representation:it seemsunlikely thatthesespecificdevicesareplausiblemodelsof whatgoeson insidea reader’s (or writer’s)head.

Ontheotherhandtherearemoremacroscopicpropertiesof themodelthatdomakesenseto comparewith theresultsof psycholinguisticresearch.I will focusontwo suchpropertieshere:� Ar chitectural Uniformity : thesamemodelof therelationbetweenorthography

andlinguistic form is proposedfor all writing systems.

157

158 CHAPTER5. PSYCHOLINGUISTICEVIDENCE� Dual Routes: the model makesa distinction betweenspelling rules, and thelexical specifications, possibly includingmarkedorthographicinformation,thattheserulesoperateon. It is assumedthat in normalreadingorthographicrepre-sentationsaremappedto lexical elements(i.e., theORL’sof morphemes,wordsandphrases),andthenceto pronunciations.However, in mostwriting systems,for mostwords,only partial lexical orthographicspecificationsarerequired,thebulk of thespellingbeingpredictableby spellingrules. Inverted,thesespellingrulescanserveasrulesfor inferringanORL representationfrom spelling;if onethencomposestheseinvertedspellingruleswith whatever rulesor principlesofthelanguagepredicttheactualpronunciationfrom theORL, onecanderivepro-nunciationsfor spelledwordswithout actually“consulting” the lexicon. Thuswehaveanadditionalrule-basedpathto pronunciationthatbypassesthelexicon.

Oneof the main topics in the literatureon readinghasbeenthe questionof thenumberof routesby which a readercanget from a written word into a phonologicalrepresentation.As we describein moredetail below, the mostcommonassumptionin the literatureis that thereareat leasttwo suchrouteswhich maybecharacterizedbroadlyasvia thelexiconor via “grapheme-to-phonemecorrespondences”.

Within this framework, onequestionthathasreceiveda greatdealof attentioniswhetherwriting systemsdiffer in whichof thesetwo routesis taken.Theso-calledOr-thographicDepthHypothesis— henceforthODH — claims,in its strongestform, thatsomelanguages— theso-calledorthographically“deep” languages,of whichEnglishis anoft-citedexample— requirereadersto go via thelexicon;whereasorthographi-cally “shallow” languages— Serbo-Croatianis supposedlysucha case— only makeuseof thegrapheme-to-phonemeroute.Onemightbetemptedto equatethisnotionoforthographicdepthwith the notionof the depthof the ORL, discussedin Chapter3.But thereis a crucial difference:we claim that languagesdiffer in the depthof theirorthography, andin theregularityof their “grapheme-phoneme”correspondences,butnot in themannerin whichonemapsfrom orthographyto linguistic representation,orultimatelyto pronunciation.

It seemsthatonecandraw two conclusionsfrom theliterature(thoughit wouldbedisingenuousto suggestthat thereis anything like a consensuson thesepoints).Bothof theseconclusionsareconsistentwith the propertiesof the modelthat we outlinedabove:� Multiple routesfrom written form to pronunciationareavailable.� TheODH, at leastin its strongestform, is incorrect:all writing systemscanbe

shown to make useof both a “lexical”, anda “phonological” (i.e, rule-based)route.

Theremainderof this chapteris organizedasfollows. In Section5.1,we will out-line the evidencefor multiple routes,andwe will discussthe ODH, andgive someof the evidencethathasbeenpresentedbothsupportof, andagainstthis hypothesis.Section5.2 continuesthis discussionwith someevidencefrom ChineseandJapanese— two writing systems thatwouldappearonthefaceof it to beunequivocally“deep”— showing that even thereonefinds evidenceof “shallow” processing.Finally, not

5.1. ORTHOGRAPHICDEPTH 159

all psycholinguistssupportthehypothesisof multiple routes,andthemostvocaladvo-catesof analternativesingle-routeapproachhavebeentheconnectionists.A somewhatdated,thoughstill influentialwork in thismold is (Seidenberg andMcClelland,1989).Section5.3givesabrief critiqueof this work.

5.1 Multiple Routesand the Orthographic Depth Hy-pothesis

Two kindsof experimentsfigureprominentlyin thepsycholinguisticwork onreading.Oneinvolveslexical decision, andtheother naming.

In a lexical decisionparadigm,subjectsarepresentedwith awrittenstimulus(usu-ally on a CRT screen),andareasked to answer(e.g. by pressinga button on a key-board)whetheror not the stimulusin questionis a word of their language. Theirreactiontime is measured,asis thecorrectnessof their responses.

In thenamingparadigm,subjectsareagainpresentedwith a written stimulus,butthis time they areasked to pronouncethe stimulusaloud— to “name” theword thatis on the screen. In this casewhat is normally measuredis the time betweenthepresentationof thestimulusandtheonsetof vocalization.

TheODH hasimplicationsbothfor namingandfor lexical decision,but it is per-hapseasiestto illustratethe ideabehindthe hypothesisin the context of a modelofnaming.Onesuchmodelis presentedschematicallyin 5.1;thismodelis adapted,withsimplifications,from (BesnerandSmith,1992,Figure1), andthe presentationof theODH hypothesisdrawsheavily on their discussionof this topic.

Themodelin Figure5.1allows for threeroutesto naming.Thesimplestis labeled‘A’ in thefigureandinvolvestheapplicationof “grapheme-to-phoneme”rules.In thatscheme,input text is fed into a block of rules,anda phonologicalrepresentationisderivedsolelyvia thatblockof rules.Crucially, thereis no lexical accessinvolved.Totakeanexample,thestring � peat� in Englishcanbepronouncedby applyingtherules� p ����� p� , � ea����� i � , and � t ����� t � , deriving thepronunciation/pit/. Sincethisrouteinvolvesassemblinga phonologicalrepresentationon the fly, it is often termedtheassembledroute.

Thesecondandthird routesdoinvolvelexical access,tovaryingdegrees.Theroutelabeled‘B–D’ involvesthe so-calledorthographic input lexicon, which storeswordsin their orthographicforms,presumablywith associatedphonologicalinformation; itcorrespondspretty muchexactly to the orthographiclexical entry in the ORL in ourmodel. Namingvia route‘B–D’ thusinvolveslexical access,but of a fairly shallowkind, in thatonly theformal propertiesof theword areaddressed.Underthis scheme� peat� would bepronouncedby matchingthestring � p � , � e� , � a� , � t � againstthe lexical entry for peat in the orthographicinput lexicon, andretrieving the storedpronunciation/pit/.

The third route,‘C–D’ is the deepest.It too involvesthe orthographicinput lex-icon, but it alsoinvolvesaccessingthe meaningof the word. In this case,semanticattributesof the lexical entry of peatwould be accessed,andfrom thereonewould

160 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

Phonological Output Lexicon

TEXT

SPEECH

Orthographic Input Lexicon

Grapheme to PhonemeCorrespondences

Phonemic Buffer

Semantic System

A

C C

B

D

Figure5.1: A modelof readingaword aloud,simplifiedfrom (BesnerandSmith,1992).

5.1. ORTHOGRAPHICDEPTH 161

derive a pronunciationfor theword associatedwith thatsetof attributes.1 In normalreadersundernormalconditionsaccessingthesemanticsshouldnot in principleyielda differentresult from the ‘B–D’ route. Differentresultsmay be obtained,however,in neurologicallyimpairedpatients,aswe shall seemomentarily. Routesinvolvinglexical accessderive pronunciationsfor written wordsby addressinga lexical repre-sentation,andhenceareoftentermedaddressedroutes.

As BesnerandSmith note (page47) thereis evidencefor the existenceof eachof theseroutesin readersof deeporthographies,like that of English. Someof themostcompellingevidencecomesfrom patientswith variouskindsof brainlesionsthatimpair theirability to readaloudin variousways.Specifically:� Oneclassof patientsfinds it easierto namewordswhosespellingsare“more

regular” giventheir pronunciations.For examplecavefollows therulesof En-glish spellingbetterthanhavedoes,andsuchpatientsfind it easierto correctlynamecavethanhave. Plausibly, suchpatientshavebeendamagedin suchawaythatthegrapheme-to-phonemerule pathA is theonly oneleft opento them.� At theotherextreme,somepatientsmake semanticerrorswhenaskedto name:for � tulip � they mayanswercrocus, for example.A reasonableexplanationisthat for thesepatientsthesemanticaccessrouteC–D hasbecomefavored(andthisonly imperfectly).� In the middle are patientswho have no particularproblemsnamingordinarywords(eitherhaveor cave), anddon’t tendto make semanticerrors. Yet theyareimpairedin that they areunableto readnon-words.This suggeststhat theyareusingneithera grapheme-to-phonemestrategy (routeA) nor do they seemto be using a semanticstrategy (route C–D). Ratherthey are forced by theirimpairmentinto routeB–D. Thiscorrectlypredictsthatthey will beableto readwordsthatarein thelexiconalready, but not novel words.

We turn now to the ODH. Two flavors of this hypothesishave beenproposedinthe literature,the strong form andthe weakform. The strongODH canbe statedasfollows:

(5.1) Orthographicdepthhypothesis(strongform):Readersof languagesthathave completelyregulargrapheme-phonemecorre-

spondenceslackanorthographicinput lexicon.

In otherwords,routeA is theonly routeavailableto suchreaders.In theliteratureonthe ODH, the mostoften cited instanceof a shallow orthographyis probablySerbo-Croatian.2

1This third, semantic,routeis theonethathasno directcorrespondentin our model: it would of coursebeeasyenoughto addanadditionallayerof semanticprocessingwherebylexical entriesat theORL maptoasemanticrepresentation,andthencebackto phonologicalentries.

2Notehowever that Serbo-Croatianorthographydoesnot mark lexical accent,which is determinedby(unwritten)lexical propertiesof thewordmuchasin thecaseof Russianstress(Section1.2.1);(Seidenberg,1990,pages50–51;WaylesBrowne,personalcommunication). Note that this is thecasewhetherwe aretalking aboutCroatian(written in theRomanalphabet)or Serbian(written in theCyrillic alphabet).Thus

162 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

A (significantly) weaker versionof the hypothesis— onesupported,for exam-ple, by Katz andFrost (1992)— states that all written languagesallow for both agrapheme-to-phonemecorrespondenceroute(routeA), andfor a lexical accessroute(routesB–D,or perhapsC–D):but thatcostof eachroutedirectlyrelatesto thetypeoforthography(deepor shallow) involved. In shallow orthographies,the grapheme-to-phonemerouteis usuallycheaperin naming,thoughtheremaybeinstancesin whichlexical accessis involved.Contrariwise,in adeeporthography, lexical accesswill typ-ically becheaperto usein naming,thoughtherewill beinstanceswherethegrapheme-to-phonemeroutemightbeused.

Insofar asit makesa far strongerclaim aboutthe mentalprocessof reading,thestrongODH is moreinterestingthantheweakODH. We shall thereforestartby out-lining theevidencethathasbeenmarshalled,both for andagainstthis versionof thehypothesis.Ourconclusionwill bethattheevidenceagainstthestrongODH seemsonthewholemorecompellingthantheevidencefor it, andthat thereforethereseemstobeno reasonto acceptthatreadersof differentorthographieshave fundamentallydif-ferentmentalarchitectures.Rathertheevidenceseemsmoreconsistentwith a model(possiblylike the weakODH), wherereadersof all orthographiesusefundamentallythesamemodel,thoughof coursedifferencesamongorthographieswill inevitably leadto differencesin how thementalresourcesareallocated.

5.1.1 Evidencefor the Orthographic Depth Hypothesis

Accordingto thestrongODH, theprocessingof shallow orthographiesin namingin-volvespathway A in Figure5.1. Thus,it bypassesbothof the lexical pathwaysB–DandC–D.Thiswouldappearto maketheratherclearpredictionthatreadersof shalloworthographiesshouldfail to show effectsof lexical accessin namingtasks.In contrast,readersof deeporthographiesshouldshow sucheffectssincein generalpathway A isnot sufficient to correctlynamewritten forms,andoneof the lexical routesmustbeused.

Two widely reportedlexical effectsarethe effect of word frequency andlexicalpriming. The lexical frequencyeffect relatesthefrequency of particularlexical itemswith thespeedwith which they canberetrievedfrom the lexicon: otherthingsbeingequalmorefrequentwordsareretrievedmorequickly. The lexical priming effect re-latesthespeedwith which a word will beretrieved,to thepresenceof a semanticallyrelatedword: if theword couch hasbeenusedin a previouscontext, semanticallyre-latedsofawill beretrievedfasterthanif asemanticallyrelatedwordhadnotbeenused.Thespeedof lexical retrieval is oftenmeasuredusinga lexical decisionparadigm,andin this paradigm,bothpriming andlexical frequency effectshave beendemonstratedboth in languagesthathave deeporthographiesandin shallow orthographies(BesnerandSmith,1992,page50).

Given theseobservations,it would appearto be strongconfirmationof the ODHthatprimingandword frequency effectswerenot observedin namingtasksfor Serbo-

it is by no meanspossibleto predictevery aspectof thepronunciationof a word in Serbo-Croatian.Thisdiffers from the caseof Spanishwherealmostwithout exceptiononecanpredict the pronunciationof awritten form without considerationof whatlexical form it mayrepresent.

5.1. ORTHOGRAPHICDEPTH 163

Croatian,alanguagewith asupposedlyshallow orthography(KatzandFeldman,1983;Frost,Katz, andBentin,1987). In theseexperiments, subjectswereasked to namebothrealwordsandplausiblenon-words;theexpectedpriming andfrequency effectsdid notobtainfor therealwordstimuli. In contrast,readersof deeporthographies,likethatof English,do show theselexical accesseffectsin similarly constructednamingtasks(BesnerandSmith,1992).

Still, BesnerandSmithobserve(page50):

. . . in contrastto thelargenumberof papersshowing priming andfre-quency effectsin deeporthographies,theattemptto provethenull hypoth-esisof no priming andno frequency effectsin theoral readingof shalloworthographiesrestsupona very narrow database.Therehave beenonlytwo reportsthata relatedcontext doesnot facilitatenamingrelative to anunrelatedcontexts (Frost,Katz & Bentin,1987;Katz & Feldman,1983),andonly onereportthatword frequency doesnot affect naming(Frostetal., 1987)

As BesnerandSmith note, one critical designfeatureof both the Frost et al.andKatz andFeldmanexperimentsis that, aswe have alreadydescribed,they usedbothwordsandnon-wordsasstimuli. Presumablynon-wordscanonlybepronouncedvia the assembledroute: they have, after all, no lexical representations.Could thisthennot simply biassubjectsto alwaysusetheassembledroute?After all, in a shal-low orthographythis will nearlyalwayswork. Sowhattheseexperimentsreportmaybe indicative not of what readersof shallow orthographiesdo in readingnormaltex-t (wherethe majority of wordswill be known), but rathersimply be the resultof astrategy thatsubjectshaveadoptedundertheconditionsof thisexperiment.

5.1.2 Evidenceagainstthe Orthographic Depth Hypothesis

BesnerandSmithdiscussseveralpiecesof evidencethatwould appearto underminethe conclusionsreachedin the Katz andFeldmanandFrost et al. papers,includingdatafrom Serbo-Croatian,Persian(Farsi), andJapanesewritten in kana. For Serbo-Croatian,experimentswereperformedwhereonly realwordswerepresentedto sub-jects.In this case,bothlexical frequency andprimingeffectswerefound.

ThePersianresultswereoriginally reportedin (BaluchandBesner, 1991).PersianorthographyisanArabic-derived abjad(Kaye,1996)(andseeSection6.1for anexpla-nationof thetermabjad): for many wordsthephonological informationprovidedbythewritten form is incomplete,in particularinformationaboutthevowels. However,asin Arabic, theconsonantletters � w � , � y � and � ‘ � (alif ) canfunctionasvowels(/u/, /i/ and/a/,respectively),andsomewordswrittenwith thesesymbolshappento becompletein theirphonologicalspecifications.ThusPersianprovidesbothcaseswherelexical accessis necessaryto namea written form, andcaseswherelexical accessisin principlenot necessary. TheODH would predictlexical accesseffects— word fre-quency andpriming effects— for thosewordsthatarerelatively “deep”,andno sucheffectsfor “shallow” words. BaluchandBesner’s datasupportthis expectation,butonly whena significantportion of non-wordswere includedamongthestimuli. When

164 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

suchnon-word stimuli were not presented,lexical accesseffects were obtainedforboth“shallow” and“deep”words.This, then,supportsthecontentionthatthereportedlackof suchlexical accesseffectsin previouswork onSerbo-Croatianmaybedueto astrategy adoptedby subjectswhengivena taskwheretheassembledrouteis oftenre-quired.Whennon-wordsareremoved,the assembledrouteis no longerautomaticallyadopted,andsubjectsbehaveasif they areuniformly usinganaddressedroute.

Theexperimenton readingof Japanesekanareportedin (BesnerandHildebrandt,1987) leadsto a similar conclusion. Japanese hasan even more extremecaseofa mixed orthographythan Persian,using both Chinesecharacters(kanji), many ofwhich functionlogographicallyaswehaveseen(Section4.3,thoughseeSection5.2.2below), aswell two kanacore syllabaries— hiraganaandkatakana,which arefairlyphonemicin their representation.3 BesnerandHildebrandtpresentedsubjectswithstimuli written in katakana,which is normallyusedto write foreignloanwords.4 Thestimuli wereof two types,namelywordsthat arenormally written in katakana,andwordsthatwould normallybewritten in kanji. The lattergroupwerethuswritten inanunfamiliarway, whereastheformergroupwasorthographicallyfamiliar. However,if theODH werecorrect,this familiarity shouldhave no effecton namingspeedsincekatakanais in any event a shallow orthography. Registeringa form as“f amiliar” or“unfamiliar” presumesthatoneis matchinga written form againsta lexical entry, yetif onepresumes,following theODH, thatkanais readusingonly pathwayA from Fig-ure5.1, thenno matchingagainstlexical entriescanbe involved. In fact,BesnerandHildebrandt’s resultsshow definiteeffectsof familiarity, with wordsthatarenot nor-mally written in katakana(unfamiliar orthographicforms) takingsignificantlylongerto namethanwordsthatarenormallywritten in katakana(familiarorthographicform-s). This suggeststhatlexical accessmustbeinvolvedin readingkatakana,contrarytotheexpectationsof theODH.

5.2 “Shallow” Processingin “Deep” Orthographies

Theprevioussectionhasexaminedsomeof thepsycholinguisticevidencesurroundingtheODH,which in its strongform claimsthatreadersof shallow orthographieslargelybypasslexical accesswhenreadingaloud.Thebulk of theevidencedoesnot seemtosupportthatradicalconclusion.Ratherthereseemsto beevidencethatreadersof bothshallow anddeeporthographiesdoperformlexical accesswhennaming,exceptunderexperimentalconditionsthatfavor adoptinga uniformassembledroute.

Yet surelythereis a sensethat “deep” orthographies,suchasEnglishor Chinese,typically requirelexical accessthat is “deeper”thanonewould expectfor a shalloworthography?For example,while naminga Spanishform likecocer‘to cook’ mayaf-ter all usuallyinvolve lexical access,presumablythewholelexical entrydoesn’t needto beretrieved,but ratherjust thephonologicalinformation,which correspondsfairlystraightforwardly to the orthographicform. In contrast,to reada Chineseword like

3Note however, that pitch accent,which is lexically distinctive in Japanese,is not marked in the kanascripts.

4Hiraganais reservedmostlyfor grammaticalmorphemes.

5.2. “SHALLOW” PROCESSINGIN “DEEP” ORTHOGRAPHIES 165�ma ‘horse’, wherethereseemsto beno indicationof the pronunciationin the or-

thographicform, presumablyonehasto retrieve the whole lexical entry: indeed,aswe havenotedelsewhere,it hasoftenbeensupposedthatChinesewriting is primarilylogographicin thateachcharacterrepresentsnotaphonologicalunit atall, but ratheraword or morpheme.In this sectionwe discussevidencethat in ChineseandJapanese— two canonicalexamplesof deeporthographies— rapid accessto the phonologywithout (complete)lexical access,is possible.This thenprovidesevidenceof a com-plementarynatureto whatwaspresentedin theprevioussection:a“deep”orthographycannonethelessshow shallow processingeffects.

5.2.1 Phonologicalaccessin Chinese

In an experimentreportedby AngelaTzeng(Tzeng,1994),Chinesereaderswerep-resentedwith a seriesof Chinesecharacterspresentedin rapid succession,possiblycontainingsomeinterveningcharacter-likenonsensematerial.5 Thetaskfor thesub-jectswassimply to write down thecharactersthatthey werepresentedwith. Thestim-uli werepresentedwith an interval of between90 and110milliseconds,fastenoughto resultin aneffect of repetitionblindnessunderappropriateconditions.Repetitionblindness,first reportedin (Kanwisher, 1987),denotesasituationwheretwo tokensofa particulartypearepresentedin rapidsuccession,andwheresubjectsfail to notethatmorethanonetokenwaspresented.In the context of Tzeng’s experiment,presenta-tion of two identicalcharacters— e.g. two instancesof � sheng ‘win’ — resultedin a meanaccuracy ratein subjects’performanceof about51%. In contrast,presenta-tion of a controlsequenceof two distinctandnon-homophonouscharacters— e.g. �shengand � dı — resultedin a higheraccuracy (around61%). Crucially, presenta-tion of two graphicallydissimilarbut homographiccharacters— e.g. � shengand sheng‘holy’ resultedin a meanerrorrateof 52%,or thesameastheratefor identicalcharacters.6

The critical factor in this experimentis that the homographicpairschosenweregraphicallydistinct,so it is not plausiblethat thesubjectsweresimply confusingthecharactersat a visual level. Neither is it possiblethat the subjectswere doing fulllexical accessand confusingthe two instancesat a lexical level. Putting asidetheimplausibility of doing lexical accessin as little time as90–110milliseconds(mostexperimentsaremoreconsistentwith lexical accessrequiringon the orderof a fewhundredmilliseconds,especiallyfor lower frequency items),full lexical accesscouldnot be involved,sincethe charactersin questioncorrespondto differentmorphemes:� sheng‘win’ and sheng‘holy’, certainlymusthave differentlexical entries,andif thesubjectsweredoinglexical accessthenthey surelywouldhaveregisteredthefactthat they weredealingwith a successionof distinct characters.The only solution,it

5The“nonsense”materialusedwasKoreanHankulsyllableglyphs,whichareof coursemeaninglesstoChinesereaderswhodonotknow Korean,but havetheusefulpropertythatthey look somewhatlikeChinesecharacters.

6The behavior for high and low frequency characterswas different, with high frequency homophon-ic pairsshowing a higheraccuracy than repeatedcharacters,thougha lower accuracy thandifferentandnon-homophonicpairs; for low frequency charactersthe performancefor homophonicpairswasactuallysignificantlyworsethantheperformancefor repeatedidenticalcharacterpairs.

166 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

seems,is to concludethatChinesecharactersmap,in the initial stagesof processing,to a level of representationthat is basicallyphonological.Put in anotherway, whileChinesecharacterscertainlycontainnon-phonologicalinformation, it is nonethelessthe casethat skilled Chinesereadershave learnedan associationbetweencharactersandtheircorrespondingsyllables,thatallowsfor veryrapidaccessto thephonologicalform, in effectbypassingtherestof lexical access.

Tzeng’sresultsareconsistentwith othermorerecentfindings.For examplePerfettiandTan(1998)reportresultsof a priming experimentwheresubjectswerepresentedwith a characterprime followed immediatelyby a target, which the subjectswerethenasked to readaloudasquickly andaccuratelyaspossible.The time differencebetweenthe startof the primeandstartof the target— the so-calledStimulusOnsetAsynchrony or SOA — wasvaried,aswasthe natureof the prime: the prime couldeitherbe graphicallysimilar, homophonous,semanticallyrelated(either vaguelyor“precisely”), or an unrelatedcontrol. A strongerpriming effect resultedin a shorterandgenerallymoreaccuratenamingof thetarget. With theshortestSOA’s (43 msec)thestrongestprimingwasobtainedfrom graphicallysimilarcharacters,but astheSOAincreasedto 57 msec,thegraphicsimilarity effect attenuated.AcrossthelongerSOAconditions,homophonousprimesconsistentlyhada strongereffect thansemanticallysimilarprimes.In otherwords,thenamingof targetcharactersis facilitatedmoreby aprimethatsoundsthesame,thanwith aprimethathasa relatedmeaning.

In thecontext of thecomputationalmodel,asensibleinterpretationof thisclassofresultswouldappearto bethatskilledreadersof Chinese,in additiontoknowingwhichcharactersrepresentwhich lexical entries,have also learneda setof “grapheme-to-phoneme”correspondencesby which they know, for example,that � mapsto sheng.In termsof the discussionin Section4.2, this amountsto sayingthat the relationbe-tweenthesyllableshengandtheentirecharacter� implicit in therepresentationhasbeenextractedasa rule by theskilled Chinesereader;we returnto this point in Sec-tion 5.2.4.

5.2.2 Phonologicalaccessin Japanese

Tzeng’s resultsfor Chinesearemirroredby theresultsobtainedfor Japanesekanji bytwo studies,(Horodeck,1987)and (Matsunaga,1994).

Horodeck’s goal was to refute the widespreadview that Chinesecharactersareideographic in the sensethat they directly representideasin the mind of the reader;thisview hasof coursebeenheavily attackedby others,mostnotablyDeFrancis(1984;1989). To this end,Horodeckconductedtwo studies,oneinvolving writing andtheother reading. In the writing study, spontaneouslywritten short essaysfrom 2410Japanesespeakers with a variety of occupationsandeducationalbackgroundswerestudiedfor spellingerrorsinvolving kanji. Horodeckclassifiedtheerrorsalongthreedimensions:� whethertheerrorful characterhadtheright sound— i.e., wasa homophoneof

thecorrectcharacter;� whethertheerrorfulcharacterhadtheright form — i.e.,sharedamajorstructural

5.2. “SHALLOW” PROCESSINGIN “DEEP” ORTHOGRAPHIES 167

componentwith thecorrectcharacter;and� whethertheerrorful characterhadtheright meaning— i.e.,wassimilarenoughin its senseto thecorrectcharacter.

For thepurposesof Horodeck’s intentions,themostusefulkindsof errorswereerrorsinvolving either:characterswith theright sound,but wrongform andwrongmeaning;or characterswith the wrong sound,wrong form but right meaning. All othercate-goriesof errorareeitherambiguous,or elsecouldbeexplainedpurelyon thebasisofformal similaritiesbetweentheerrorandthetarget. In Horodeck’scorpustherewere136right-sound/wrong-form/wrong-meaningerrors;amongtheseerrors127involvedon (Sino-Japanese)readingsand9 involvedkun (native) readings.In contrast,therewerea total of 14 wrong-sound/wrong-form/right-meaning errors. Thus, in sponta-neouswriting oneis muchmorelikely to make anerroron thebasisof soundthanonthebasisof meaning.

Horodeck’s secondexperimentinvolveda readingtestwherekanji with inappro-priate meaningswere insertedin a text, and wherethe object was to measurehowoften theseerrorswere detected. All of the errors in this portion of the study in-volvedmulticharactercompoundswith on readings:kanji occurmuchmorefrequent-ly in these constructionsthanthey do eitherwith kun readingsor assinglecharac-terswith on readings(“on-isolates”),andit wasthereforeeasierto constructstimuliusing multicharacteron constructions.For the stimulustexts, newspaperheadlineswere chosensincethesehave a higher densityof kanji than normal running prose.Theerrorstimuli usedwereof two types:right-sound/right-form/wrong-meaningandwrong-sound/right-form/wrong-meaning. Readerson averagedetectedonly 40.5%oftheformerkind of stimulus,asopposedto 54.3%of thelatterkind of stimulus.Thisd-if ferencewasstatisticallysignificant,anddemonstratedthaterrorshomophonouswiththeir targetsareharderto detectthanerrorsthatarenon-homophonous.

Matsunaga’s(1994)experiment,likeHorodeck’ssecondexperiment,involvedho-mophonousandnon-homophonouskanji errors.However, ratherthanaskingreadersto markerrorsin newspaperheadlines,sheinsteadmeasuredreaders’eye movementsasthey readfull sentencescontainingsucherrors: theassumptionhereis thaterrors,when detected,will disrupt the reader’s readingandwill translateinto fixations onthe locationof the error, which in turn will show up in the eye-trackingdata. Mat-sunagafoundthattherateof fixationspererrorwassignificantlyhigherin thecaseofnonhomophonicerrorsthan in the caseof homophonicerrors. In otherwords,non-homophonicerrorswereeasierto detect,a result that replicatesHorodeck’s secondstudy.

The studiesof HorodeckandMatsunagathus lead to the sameconclusion forJapanesereadingof kanji asdoesTzeng’sstudyof Chinese.Usersof bothwriting sys-temsmorereadilymisserrorsthatarehomophonouswith their targets,andthey morereadilymissrepeatedcharactersif they arehomophonouswith a previouslypresentedcharacter. All of thesestudiesthereforesupportthe ideathatChinesecharactersmapdirectly to phonologicalrepresentationsin themindsof fluentreaders.

168 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

5.2.3 Evidencefor the function of phoneticcomponentsin Chinese

Psycholinguisticevidencefor low-level phonologicalprocessingof Chinese,of thekind discussedin Section5.2.1seemson thefaceof it to bedirectconfirmationof theclaim of DeFrancis(1984;1989),alsodiscussedin Chapter4, that Chinesewritingis essentiallyphonographicin design,thoughobviously imperfectlyso. As it will berecalled,the mostpowerful evidencefor this claim is the large numberof semantic-phoneticcharacters,wherethepronunciationis indicated,to agreateror lesserdegree,by a phoneticradical. Theimportanceof phoneticinformationin thedevelopmentoftheChinesewriting systemis unequivocal,andit is evenplausibleto supposethatthephonologicalinformationprovidedby thephoneticcomponentis anaidto learningthewriting system.WhatTzeng’s experimentsshow is thatskilled Chinesereadershaveinternalizedthewrittensymbolsasakind of alphabet,sothatthey canretrievepronun-ciationsfor eachof thesymbolsin theabsenceof any furtherlexical information.Butonemustbecarefulaboutdrawingaconnectionbetweentheresultsof thisexperiment,andtheevidenceadducedbyDeFrancis.To seethis,considerthefollowing thoughtex-periment.SupposethattheChinesewriting systemwere,counterfactually, completelyarbitraryin its mappingbetweenorthographicsymbolsandtheir pronunciation:thatis, therewould beno equivalentto a “phonetic” component,andthereforeno way tolook atanovel characterandguessits pronunciation.Supposefurthermorethatsome-onehadmasteredthiswriting systemaswell asliterateChinesereadersmastertherealChinesewriting system.Onemight expectthat theresultsof Tzeng’s experimentsonthispseudo-Chinesewouldbeidenticalto whatshedemonstratedwith realChinese.Ifthatturnedout to bethecase,thenonewould haveto concludethata skilled readerofany writing systemis likely to “phonologicallyrecode”thesystemin that they wouldbe able to mapbetweenwritten symbolsandpronunciationwithout performingfulllexical access.7

So Tzeng’s experimentmight not relatedirectly to DeFrancis’resultsat all. Wemustthenaskwhat the evidenceis that readersof Chineseactuallymake useof thephoneticinformationprovidedin themajorityof Chinesecharacters.

In fact,thereis suchevidence,onerelevantexperimentbeingthatof (Hung,Tzeng,andTzeng,1992). In thatexperimenta Strooppicture-word interferenceparadigmwas usedto test subjects’abilities to namea picture when a single-characterwordof varying degreesof congruenceto the picturewassimultaneouslypresented.Forexample,supposea pictureof a basket is presented.A completelycongruentwordwould betheword ! lan ‘basket’; following Hung,Tzengand Tzeng’s terminology,we will call this word/characterCC for “completely congruent”. An exampleof acompletelyincongruent(CI) word would be " dıng ‘nail’. Partially congruentwordswere:� A homophonous(but semanticallydistinct)characterhaving thesamephonetic

componentasCC,or having CC asa phoneticcomponent.For example# lan‘blue’. (SGSS:“similar graph,samesound”)

7Coulmas(1989,page50) makesexactly this point whenhenotesthata skilled readerof Chinesecanequallywell mapbetweena charactersuchas $ andits phonological— bı — andlexical — ‘pen’ —values.

5.2. “SHALLOW” PROCESSINGIN “DEEP” ORTHOGRAPHIES 169� A homophonousandstructurallydistinct character: % lan ‘orchid’. (DGSS:“dif ferentgraph,samesound”)� A nonhomophonouscharactersharinga componentwith CC: & ji an ‘jail’ (S-GDS:“similar graph,differentsound”)� A pseudocharacter(PC),wherethe CC servedasthe phoneticcomponentof anon-existentcharacter:

Subjectswereaskedto namethepictures,andtheirreactiontimeswererecorded,alongwith their error rates.Not surprisingly, the CC andCI conditionsshowed the fastestandslowestreactiontimes,respectively, aswell asthebestandworsterrorrates.Theother conditionslisted above were arrangedas follows, orderedfrom fastest/lowesterror to slowest/highesterror: PC � SGSS� SGDS � DGSS.As Hung,TzengandTzengargue,therearetwo independenteffects,oneof graphicsimilarity to the tar-get (CC) character, andoneof phonologicalsimilarity. Taken together, theseresultsfirst of all supportthe laterwork of Tzeng(1994)in underscoringthe importanceofphonologicalinformationin Chinese,andthey alsoshow thatthephoneticcomponentis bothaccessibleandusedby Chinesereaders,sincethetwo non-CCconditions(PC,SGSS)wherethephoneticcomponentis thesameasthatof CC weretheoneswheresubjectsperformedthebest.8

5.2.4 Summary

Thereappearsto be evidencethatphonologicalinformationis both availableto andusedby readersof ChineseandJapanese.Furthermore,at leastfor readersof Chi-nese,informationin the phoneticcomponentof the character, whenpresent,is used.Whena usefulphoneticcomponentdoesnot exist, we assume,aswe did in the pre-viouschapter, that theorthographicentry for themorphemeis linkedsimultaneouslyto boththesemanticandphonologicalentries,andthatthecharacterthusservesasitsown “phoneticcomponent.” In the discussionin the previouschapter, we assumedastaticrepresentationwherebythe orthographicsymbol is simply listed aspart of thelexical entryof the morpheme,with indicesindicatingwhich portionsof the symbolcorrespondto thesemanticandphonologicalfields. What theexperimentalevidencepresentedin this sectionsuggestsis that skilled Chinesereadershave advancedonestepfurther thanthis staticknowledge.Soratherthanmerelyrepresenting(Chinese)' � BIRD+JI A � ya ‘duck’ asin (5.2),they haveformulatedaspellingruleasin (5.3),which wouldbelexically markedto applyonly to this morpheme.

8The importanceof the phoneticcomponentis further underscoredby several studiescited in (Hung,Tzeng,andTzeng,1992,page127),whereit wasshown that the phonologicalconsistency of a phoneticcomponentwasnegatively correlatedwith the naminglatency for charactershaving thatphoneticcompo-nent.

OnemightalsosupposethatJapanesereadersmakeuseof thephoneticcomponentwhenit is bothpresentanduseful.Clearlythephoneticcomponentwill notgenerallybeusefulfor kunreadings,but for onreadings,thephoneticcomponentwould in many caseshave approximatelythesameutility asit would for thesamecharacterin Chinese.Onestudythatseemsto supporttheutility of thephoneticcomponentin Japaneseonreadingsis (Floresd’Arcais,Saito,andKawakami,1995).

170 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

(5.2) ()))))))))*PHON

()* SYL

(* SEG +-, ONS .0/ , RIME 12/43TONE 5 67 6987;:=<

SYNSEM > CAT ?A@CBD?SEM E0BGFIHKJ <ML

ORTH NPO J-QSR :UT698888888887

(5.3) ya �VOW� BIRD �YX[Z R � JI A � ( \ ' )

(5.4) (' \ ) O]X^Z R � ya

Inverted,as in (5.4) this rule will map directly between'

and ya. The presenceof the phoneticcomponentR � JI A � on the lefthandside of several suchinvertedruleswill, giventhephonologicalsimilarity (thoughcertainlynot identity in thiscase),tend to reinforcethe salienceof the phoneticcontribution of that componentto thepronunciationsof thecharacterscontainingit asaphoneticelement.And theexistenceof suchinvertedrulesyields the result that it is possibleto mapdirectly betweenacharacterandits pronunciation,without lexical access.

The situationin Chineseis really no different in kind from the lexically markedspellingsin English: for the word light, for example,one must mark the spelling� igh � in thelexicon,sincethereis nowayto predictthatspellingfrom generalprinci-ples.However, onecancertainlyextractfrom thesetof wordsspelledwith � igh � theusefulgeneralizationthat this graphemesequenceis generallypronounced/aI/.9 Im-putinga rule that maps

'to ya from a lexical representationthat statesthat ‘duck’

is written with the � BIRD � radicalanda phoneticcomponent� JI A � , is merelyaninstanceof thesamephenomenon.

5.3 Connectionist Models: The Seidenberg-McClelland Model

As wediscussedabove,moststudiesof readinghaveassumedadualroute,or multipleroutemodel. By definition suchmodelspresumea strict distinctionbetweenstoredlexical informationon the onehand,andruleson theother. In separatingrulesfromstaticlexical information,suchpsychologicalmodelsareof coursetakinga fairly tra-ditional stance,onethatis in accordwith traditionallinguistic models.

As is well-known, sincethe mid 1980’s, an alternative view of languagehase-mergedwhich eschews a formal distinction betweenrulesand lexicon. This is theconnectionistview, so-calledbecausethe belief is that complex systemsof behavior

9Or, if oneprefersto assumea deepORL for English(seeSection3.2), that it mapsto /ı/ which sub-sequentlychangesto /aI/ by phonologicalrule. Note thatsetsof suchinvertedrulesserve asthe basisforlinguistically-informedapproachesto theteachingof readingsuchas(BloomfieldandBarnhart,1961).

5.3. CONNECTIONISTAPPROACHES 171

can be modeledusing large numbersof simple, but massively interconnectedunits(sometimescalled“neurons”). Phenomenathathave beentermed“rules” or “lexicalentries”aremerelyemergentpropertiesof suchinterconnectednetworks. Probablythemostfamousapplicationof this ideato aproblemin naturallanguageis RumelhartandMcClelland’s(1986)oft-citedsimulationof the learningof theEnglishpasttense.Their basicclaim wasthattherewasno differencebetweenEnglishregularpasttenseverbs,whichaddsomevariantof /-d/, andirregular(mostlyhistorically“strong” verb-s)whichinvolveachangeof thestemvowel (alongwith otherchangesin somecases):both typesof verbsare learnedby their network in the sameway, and the networkis — to someextent — ableto generalizewhat it has“learned” to new cases.Thusthereis noneedto posita formaldistinctionbetweenrulesandstoredlexical items,asboth“kinds” of knowledgeare“learned”by thenetwork in thesameway. An effectiverebuttal to this papercanbefoundin (PinkerandPrince,1988).

Theclassicconnectionistapproachto readingis thesystemof Seidenberg andM-cClelland(1989). Naturally, oneof themain claimsof their theoryis thatdual-routemodelsarenot necessary:in particular, regularly spelledwords,suchasate, whichcouldin principlebepronouncedwithout referenceto lexical information,arelearnedin the sameway asirregularly spelledwords (plaid), wherelexical accessseemstobe required. Indeed,in subsequentwork (Seidenberg, 1990)the model is presentedasproviding analternative to traditional“dual route” model(thoughsee(Seidenberg,1992)for a slightly modifiedposition). Thework is alsocited (Seidenberg, 1997)asa viable alternative to symbolicrule/principle-basedapproachesof the kind familiarfrom generative linguistics. Therearemorerecentandarguablymoresophisticatedconnectionistapproachesto reading— see,for example,thework of VanOrdenandcolleagues(Van Orden,Pennington,andStone,1990;StoneandVan Orden,1994),but Seidenberg andMcClelland’s paperseemsto presentthe mostdetaileddiscus-sionof a computationalsimulationof a connectionistmodelof reading,aswell asthemostdetaileddiscussioncomparingthat model’s behavior to experimentson humansubjects.

It is not our purposehereto give anextensive review of Seidenberg andMcClel-land’smodel.Rather, averybrief summarywill begiven,andafew of theweaknessesof theapproachwill bepointedout. Themainconclusionwill bethatSeidenberg andMcClellandhave failed to provide convincing evidencethat their modelhaslearnedthe task that it is claimedto have learned;thus thereis little reasonto accepttheirconclusionthatmoretraditionalkindsof modelshavebeensuperseded.

5.3.1 Outline of the model

Seidenberg andMcClellandhave in mind a completemodelof lexical processingre-lating orthographic,semantic,phonologicalandcontextual information. Their modelis diagrammedin Figure5.2.10 Theportionof themodelthat is actuallyimplement-

10Oddly, morphologicalinformation,socrucialfor thecorrectpronunciationof wordsin many languages,is missingfrom their conceptionof the lexical processingsystem. It is unclear, for instance,wherethemorphologicallydeterminedstressinformationin Russian(Section1.2.1)that is crucial for correctvowelpronunciationwould fit into themodel:would thatbepartof “phonology”or “meaning”?

172 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

context

meaning

phonologyorthography

MAKE /meIk/

Figure 5.2: The Seidenberg and McClelland model of lexical processing,(Seidenberg andMcClelland,1989,page526), Figure1. Usedwith permissionof theAmericanPsychologicalAssociation,Inc..

5.3. CONNECTIONISTAPPROACHES 173

100/200hidden units

400_orthographicunits

460_phonologicalunits

Figure5.3: The implementedportion of Seidenberg andMcClelland’s modelof lexical pro-cessing,(Seidenberg andMcClelland, 1989,page527), Figure2. Usedwith permissionofAmericanPsychologicalAssociation,Inc..

ed in the 1989paperis depictedin Figure5.3. This systemwastrainedon a setof2,884word-pronunciationpairsconsistingof all minimally three-lettermonosyllablesfrom the KuceraandFranciswordlist of English(1967), from which they removed“proper nouns,words we judgedto be foreign, abbreviations,andmorphologicallycomplex words that were formed from the addition of a final -s or -ed inflection”(page530). The training wasdivided into epochs,andwordswerepresentedto thesystemin eachepochwith a probabilityproportionalto their occurrencein theKucer-a/Francisdatabase.Input (orthography)andoutput(pronunciation)wascodedusinga“Wickelgrentriple” letter/phonemetrigramschemesimilar to thatusedin (RumelhartandMcClelland,1986): thus � cat� would becoded as N ca,cat,at

T(where‘ ’ is

theboundarysymbol). Eachinput andoutputunit is sensitive to exactly oneof thesetriples.

The testingmethodologyis describedby Seidenberg andMcClellandasfollows(page532):

Thephonologicaloutputcomputedfor eachwordwascomparedto allof thetargetpatternsthatcouldbecreatedby replacinga singlephonemewith someotherphoneme.For theword HOT, for example,thecomputedoutputwascomparedto thecorrectcode,/hot/,andto all of thestringsinthesetformedby /Xot/, /hXt/, and/hoX/, whereX wasany phoneme.Wethendeterminedthenumberof casesfor which thebestfit (smallesterrorscore)wasprovidedby thecorrectcodeor oneof thealternatives.

Thesystemwastestedon thetrainingset,andtheerror rateof the trainedsystemonthissetwas2.7%.Amongtheerrorsreported,someareplausibleregularizationerrors

174 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

(e.g./bruc/ for � brooch� ); othersarelessplausible(e.g./h rs/ for � hearth� ).The remainderof the paperis devotedmostly to demonstratingequivalencesbe-

tweenexperimentaldataonhumansubjectsandthebehavior of themodelon thekindof datareportedin theexperiments.Thetypicalcomparisonmadeis betweenhumans’naminglatencies— the amountof time it takes for a humanto pronouncea givenword aloud— andthephonologicalerrorscoreof themodel(computedasdescribedabove) for the correctpronunciationof the given word. For example,in a studyre-portedin (TarabanandMcClelland,1987),subjectsshowedslower naminglatenciesin low-frequency wordsthanhigh-frequency words;they alsoshowedslower naminglatenciesfor exceptionallyspelledwordsover regularly spelledwords,but this dif-ferencewasonly significantamonglow-frequency words. This patternof behavioris apparentlyreplicatedin the phonologicalerror ratesof the model: low-frequencywordsshow higherphonologicalerror ratesthanhigh-frequency words; andamongthelow-frequency words,exceptionallyspelledwordsshowedsignificantlyhigherer-ror ratesthanregularlyspelledwords.

5.3.2 What is wrongwith the model

The Seidenberg-McClellandmodel is not the first connectionistmodel that wasap-plied to theproblemof readingaloud:it waspredatedby severalyearsby theNETtalksystemof Sejnowski andRosenberg (eventuallyreportedin a journal article in (Se-jnowski and Rosenberg, 1987)). Sejnowski and Rosenberg were only marginallyinterestedin psychology, being concernedinsteadin an engineeringproblem: howcould onedesigna computationaldevice that “learns” to correctlypronouncewordsgiven a training corpusconsistingof text with alignedorthographyand pronuncia-tion? Ratherthanrestrictthemselvesto a few thousandmonosyllables,Sejnowski andRosenberg’ssystemwasexposedto Englishwordsof variousstructures,alignedwiththeir pronunciations,taken from runningtext. The resultsof Sejnowski andRosen-berg’s experimentwereclearlynot acceptablefor a realapplication— reportederrorrateswereabout8%by phoneme— but werepromisingenoughto spawn a greatdealof subsequentresearchin self-organizingmethodsfor learningword pronunciations:seeSection6.6.

Comparedwith Sejnowski andRosenberg’s system,the Seidenberg-McClellandsystemseemsratherweak,evenif theresultsdo on thefaceof it appearto bebackedupby experimentalevidence.Considerthatthemodelhasbeentrainedandtestedonlyon a few thousandmonosyllables(andfurther testedon possiblya few hundredmoreexamplesin thevariousreplicationsof experiments).Restrictingoneselfto monosylla-blesonenaturallyavoidsoneof themostdifficult problemsin learningto readEnglish,namelypredictingwherethestressis placed.Linguistically motivatedmodelsof pro-nunciation,suchasthosetypically usedin goodtext-to-speechsystems,modelstressplacementby somecombinationof lexical marking,andphonologicalrulesthat aresensitive to morphologicalstructure.Seidenberg andMcClelland’smodelprovidesnoanswerto how thelearnerlearnsto appropriatelystresswordswhenreadingaloud.11

11Similarly, by avoiding names,the modelavoids anothercomplex areathat maturereadersof Englishlearnto dealwith. Becausenames— personalnamesin particular— oftencomefrom languagesotherthan

5.3. CONNECTIONISTAPPROACHES 175

As we notedpreviously, someof the errorsproducedby the systemarebizarre,at leastif oneis consideringthe systemto be a modelof a normalmaturereaderofEnglish.Someerrorsthatfall into this categoryaregivenin (5.5):

(5.5) chew cwfrappe frlplewd lidmow mlouch eIcplume plomswarm swlrmangst ondstbreadth brebaczar varfeud fludgarb gargnerse mersnymph mimfsphinx spinkstaps tatstsar tarzip vip

As Pinker andPrince(1988)observe abouta similar setof errorsin the Rumelhart-McClelland(1986)model,thisdoesnotappearto be thebehavior of amaturesystem.

What, then,of the replicationsof the variouspsycholinguistic experimentsthatSeidenberg andMcClellanddiscuss?Several of thesedependuponanalogizingbe-tweensubjects’naminglatenciesand the error rateof the model: whetherthis is ameaningfulcomparisonis unclear, thoughone might acceptit if enoughexamplesshow parallelbehaviors betweenthesetwo measures.The problemis, however, thatsomeof the supposedparallelsare highly misleading. The bestexampleof this isshown in Seidenberg andMcClelland’sFigure19,reproducedherein Figure5.4.Thisis a replicationof a studyreportedin (Seidenberg, McRae,andJared,1988),whichcomparednaming latenciesfor regularly pronouncedEnglish words,and regularlypronouncedEnglishwordsthatbelongto aninconsistentlypronouncedclass(Reg Incin the figure). For examplehoneis regularly pronounced(/hon/), but thereare(fre-quent)words sharingthe letter sequence� one� , that have pronunciationsfor thatsequencethatareinconsistentwith thepronunciationin hone: gone, done. Theexper-imentalresultsdemonstrateda 13 millisecondnaminglatency differencebetweentheregularandregularinconsistentclasses,asshown in Figure5.4.Also shown in thatfig-urearethemeansquaredphonologicalerrorscoresfor themodelon thesamestimuli,which accordingto Seidenberg andMcClelland“also providea goodfit to thelatencydata.” But this canhardly be describedasan honestcomparison.Note that thereisno prescribedformula for mappingbetweenlatency differencesand(mean-squared)

English,thepronunciationof namesdoesnot alwaysfollow thegeneralconventionsof Englishwords.

176 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

540b535b530b525b520b

6.0

5.5

5.0

4.5

4.0

ExperimentSimulation

Regular Reg Inc

Type of Word

Mea

n N

amin

g La

tenc

y (m

sec)

Mea

n S

quar

ed E

rror

Figure5.4: Replicationof the(Seidenberg, McRae,andJared,1988)experiment,from (Sei-denberg and McClelland, 1989, page545), Figure 19. Used with permissionof AmericanPsychologicalAssociation,Inc..

5.4. SUMMARY 177

phonologicalerror scoresof the model. In otherwords,givena line representingla-tency differencesfor humansubjects,thereis no prescribedformulawhich,giventhatline, allows oneto derive theslopeandintersectof theexpectedphonologicalerrors-coresof themodel.Thus,with only two datapoints,SeidenbergandMcClellandcouldhave chosento give the line correspondingto themodel’s performanceany slopeandintersectthatthey chose.In particular, they couldhavematchedtheline exactly to theexperimentaldata;but this would have lookedtoo good,andwould have emphasizedthepoint that thecomparisonis meaningless.Insteadthey presentedanalmostexactmatch,a tacticthatis incrediblyeffective,unlessoneis payingcloseattention.

To sumup: Seidenberg andMcClellandpresenttheir modelasan alternative tostandardpsychologicaltheoriesof reading. But it is hardto acceptthat conclusion.What we arepresentedwith is a toy system,onethat is shown to performwell (andeventhenwith bizarreexceptions)only on a very small portionof theproblem. Thecomparisonsofferedwith realexperimentaldatarangefrom theplausibleto thehighlymisleading.

In thefieldsof computationallinguisticsor speechtechnology, nobodywould ac-ceptthatamodelprovidedausefulalternativesolutionto a problemif thatmodelhadonly beentestedon a small and carefully selectedsubsetof the relevant examples.Neithershouldonebelievesimilar claimsin psychology.

5.4 Summary

Thecomputationalmodelof orthographythatwe have presentedimplicitly assumesa“dual route” for mappingbetweenwritten formsandtheir pronunciation:on theonehandwe have themappingto theORL, thelevel of lexical representationrelevantfororthographicencoding;on theother, themappingbetweentheORL andtheorthogra-phy — cedgfihkjml — is a setof spellingruleswhich, invertedandcomposedwith themapbetweentheORL andthepronunciation,serve alsoasa setof “letter-to-sound”rules.

ThemodelalsohasArchitectural Uniformity in that it assumesthesamearchitec-ture for all writing systems:onereadsChineseor Japaneseby thesamemechanismsasonereadsEnglishor Serbo-Croatian.

Thesetwo propertiesarewell supportedby thepsycholinguisticliteratureonread-ing. Connectionistproposalsnotwithstanding,thereis evidencefor dual(or multiple)routesduringreading.And thesamemechanismsappearto beavailableto readersof avarietyof scripts.In particularthereis evidencebothfor “deep”(lexical) processinginsupposedlyshallow orthographies,andalso“shallow” processingin supposedlydeeporthographies.Of coursedifferentexperimentalconditionsmay favor certainstrate-gieswith certainscripts:presentingnonsensewordsto readersof a relatively shalloworthographywill certainlyfavor an“assembled”route.But on thewhole,whetheronefindsevidencefor deepor shallow processingdepends,it seems,moreuponthe taskthatis beingexaminedthantheorthographicsystemoneis studying.

So, while I would not go so far asto claim “psychologicalreality” for the com-putationalmodel,we canstatewith someconfidencethattheoverall architectureis at

178 CHAPTER5. PSYCHOLINGUISTICEVIDENCE

leastnot atoddswith whatweknow abouthumanorthographicprocessing.

Chapter 6

Further Issues

Needlessto say, therearemany problemsin the analysisof writing systemsthat areleft unresolvedby this discussion.This chapterwill addressfour of theseissues.

First of all, in Section6.1, I examinethe adaptationof writing systems,andaskwhatit meansto adaptanorthographyto a new language.

In Section6.2 I considerspelling reforms,and in particularthe 1995reform ofDutchspelling:asweshallseein thiscase,contraryto popularnotionsof whatspellingreformsshouldbelike,extantexamplesof spellingreforms(asin theDutchcase)oftenaremoreconcernedwith “morphologicalfaithfulness”thanwith makingthe writingsystemmore“phonetic”, thoughthey maynotbeparticularlysuccessfulin eithergoal.

Throughoutmostof this bookI have consideredwhatonemight term“core writ-ing”, that is theordinaryspellingof wordsin thenormalorthographyfor a language.Largely left out of this discussionhave beena largenumberof typesof symbolsthatarerampantin written language— abbreviations,specialsymbols(%, & , etc.),andnumericalsymbolsamongthem. How do suchthingsfit into the generalmodelthatwe havedevelopedhere?This questionis addressed,at leastin a preliminaryfashion,in Section6.3 with an examinationof written numeralsandtheir relationto numbernames,andin Section6.4with a shortdiscussionof abbreviatorydevices.

Finally, implicit in theapproachadoptedherehasbeentheBloomfieldian(Bloom-field, 1933,page21) maximthatwritten languageis “merely a way of recordinglan-guageby meansof visible marks”. This view, while assumedin a lot of work onwriting systemsis by nomeansa universallypopularone,andthereis a long traditionthat takesthe view that written languageis on a par with spoken language,andthattherearea lot of featuresof theformerthatarenotbestunderstoodby appealingto thelatter. In fact, I believe that thereis no fundamentalincompatibilitybetweenthe twoviews,andI will arguethispoint in Section6.5.

179

180 CHAPTER6. FURTHER ISSUES

6.1 Adaptation of Writing Systems:the Caseof ManxGaelic

Almost all of the literateculturesin the world todayoriginally borrowedtheir scriptfrom anotherculture. The only clear exceptionsto this generalizationare the veryfew cultureswhosewriting systemwasdevelopedtotally indigenously:Chineseis themostobviousexampleof this,but othersmight includeSemitic(dependinguponone’sviewsof theoriginsof Semiticscripts— see(O’Connor, 1996)),andof coursewritingsystems,suchasPahawh Hmong, that weredevelopedin recenttimes by inspiredindividualsin previously illiterate cultures.

It is worth makinga distinction(onenot often made)betweenthe adaptationofscriptsandtheadaptationof writing systems. Thedistinctioncanbe illustratedper-fectly by thevariousadaptationsof Semiticscripts,notablyHebrew andArabic (Hary,1996;Aronson,1996;Kaye, 1996). As is well known, the mostnotable featureofSemiticwriting systemsis their systematiclackof representationof shortvowels,andtheir imperfectrepresentationfor long vowels.1 Also systematicallylackingarerep-resentationsof certainconsonantalpropertiessuchasgeminationin StandardArabic.Daniels(1996b)termsthis kind of writing systeman abjad. As is alsowell known,Semitic languageshave a characteristic“root-and-pattern”morphology, wheremor-phologicallyrelatedwordssharea commonconsonantalroot,anddiffer mainly in thepatternof vowelsandgemination(in Arabic)or spirantization(in Hebrew) of rootcon-sonants.TheSemiticwriting systemsareoftenclaimedto bewell adaptedto Semiticlanguagessince,by omitting symbolsfor vowelsandalterationsof root consonants,theroot is renderedmoretransparentacrosswholesetsof derivationallyrelatedforms.

Whatever themeritsof this argument,it is noteworthy thatwhenSemiticwritingsystemshave beenadaptedto otherlanguagestwo ratherdivergentcourseshave beentaken.Onecourseis simply to borrowtheentirewriting system: thatis thesymbolsetis borrowed(with possibleaugmentations),thesymbolsareusedto denote(roughly)the samephonemes,and the way in which the script is usedto representlinguisticform is moreor lessidenticalto theoriginal. Thiswasthecoursetakenby theArabic-basedabjadsfor PersianandUrdu, andtheHebrew-basedabjadof Judeo-Arabic.Inthesecases,the resultingsystemis the sameasthe original writing systemin that,

for instance,shortvowels tendnot to beorthographicallyexpressed.Thereareothersensesin whichawriting systemcanbeborrowed,andwereturnto thesemomentarily.

Thesecondway in which Semiticscriptshave beenadaptedis for theborrowinglanguageto borrow thescript,but not thewayin which it is used. That is, thesymbolset is borrowed(againwith possibleaugmentation),the symbolsare(again)usedtorepresent(roughly) thesamephonemes,but theresultingsystemis not anabjad.Ex-amplesaretheArabic-basedKurdishandUyghuralphabets,and theHebrew-basedalphabetsfor YiddishandJudezmo(Ladino). Indeed,in thesecasesmorethanjust thesymbolsetandthe grapheme-to-phonemecorrespondenceswereborrowed: onealso

1Vowel symbolsareof courseavailable in the form of “points” written above or below the consonantsymbols,but suchsymbolsaregenerallyreservedfor pedagogicalusesto ensurethatthelearnerpronouncesthevowelscorrectly, andfor religioustexts for thesamereason.

6.1. ADAPTATION OF WRITING SYSTEMS 181

findspositionalpreferencesfor certainsymbolsreflectingcertainsoundsborrowedin-to thenew system.For instance,in Judezmo(Bunis,1975)theHebrew consonant� ’ � is usedto represent/a/ in initial andmedialposition;however in final position� h � is usedsinceHebrew wordsendingin /a/ aremost frequentlyspelledwithfinal . Thuskada‘each’ is spelled no� q’dh� . Similarly, � ’ � doublesasa“support” for syllable-initialvowelsbesides/a/,sothat i ‘and’ is written n�� ’y � ,andes ‘is’ is written nW� ’ys � ; notealsothat � y � is usedfor both /i/ and/e/. Again, this follows Hebrew practicesince � ’ � , etymologically/ p /, is onewayof representinganemptysyllableonsetin Hebrew, and � y � mayrepresenteither(long) /e/ or /i/. But despitethis inheritance,thesystemusedin Judezmois clearlyanalphabet,not anabjad,sinceall vowelsarerepresentedin theorthography.

Let usnow returnto thenotionof borrowing a writing system,ratherthanmerelyascript.Whatdoesthisnotionentail?Clearlytheinterpretationof thisconcepthingescruciallyuponone’sunderstandingof whatspecifickindsof linguistic informationarerepresentedby agivenwriting system,andhow they arerepresented.For example,wehave notedearlier(Section3.2,Footnote17) that thereis a tensionbetween“phono-logical faithfulness”and“morphologicalfaithfulness”: writing systemsoften faceachoicebetween representinga word in a form that is representative of its (surface)pronunciation,andrepresentingthemorphemesof a word in a fashionconsistentwiththeir spellingin otherrelatedwords.Semiticwriting systemshave addressedthis ten-sion in a ratherinterestingfashion:dueto their peculiarproperties,they areable,toa largeextent,to consistentlyrepresentmorphologicalroots,abstractingaway from avariety of surfacepronunciationsof thoseroots; but at the sametime, preciselybe-causevowelsandcertainotherfeaturesarenot generallyrepresented,Semiticwritingsystemsrepresentwordsin a mannerconsistentwith, if not particularly informativeof, theactualpronunciation.(In otherwords,Semiticwriting systemshaveincompletecoverage;Section1.2.1.)For aSemiticwritten form suchas � mlk � ‘king’, onecouldinterpretthis aseitherrepresentinga particularsurfacepronunciation(e.g. /malik/ inArabic)or elsea particularrootmorpheme.

Carriedonestepfurther, onemight imagine“forgetting” the phoneticbasisof astringof graphemeslike � mlk � andtakingthis to bea logographicrepresentationofthemorpheme‘king’. For a particularword derivedfrom ‘king’, onemight considerany additionalgraphemesto bephoneticcuesasto theparticularderivative of ‘king’in question.This abstractionwould appearto bethesourceof theheterogramsfoundin adaptationsof Aramaicwriting systemsto Persianlanguages(Skjærvø,1996).Het-erogramsarewordsor morphemesthatarespelledexactly asthey would be spelledin their Aramaicsourcelanguage,but areintendedto bereadasa Persianword: oftentheword, in additionto its Aramaiccore,hasadditionalgraphemicmaterialto reflect,for example,Persianinflectionalendings,andthesearespelledaccordingto Persian,notAramaic,pronunciation.To giveanexamplefrom Middle Persian(Skjærvø,1996,page523),awordspelled� YHWWNd � consistedof anAramaiccore � YHWWN �‘be, become’(the Aramaicstembeing /yhwwn/), and the final � d � representingaPersianinflectionalending. Theword wasto be pronounced/bawand/,with the pro-nunciation/bawan/ ‘become’ correspondingto the heterogram� YHWWN � . One

182 CHAPTER6. FURTHER ISSUES

could castthis asa reinterpretationof the mapping crqtsIuwvxv for a given written sign:ratherthanmappingfrom thevalueof PHONattributeto spelling,it is reinterpretedasbeinga mappingfrom a valueof the SYNSEMattribute. Aramaicheterogramsthushave a strongfamily resemblanceto theadaptationof Chinesescript to thewriting ofnative wordsin Japanese(Section4.3), wherethephonographicbasisof theChinesecharacterwaslost.

Turning to anothercase,considerhow a writing systemmight look if it wereanadaptationof theEnglishwriting system.Considerfirst whatparticularpropertiesareessentialfeaturesof Englishwriting. Two importantpropertiescometo mind:� A particular associationbetweenphonological structureand graphemese-

quences( c qtsyuzvxv )� A largeamountof lexical markingof orthographicfeatures;seeSection3.2

Onewriting systemthat is adaptedfrom that of Englishand that seemsto haveadoptedthe above-mentionedtwo featuresis thatof Manx Gaelic. Unlike Irish andScotsGaelic,which preserveda written tradition datingbackto the 7th century(see(McManus,1996)for aconciseoutlineof thehistoryof Gaelicorthography),theGael-ic speakerson the Isle of Man completelylost touchwith that literary tradition. Sowhen Bishop Phillips, the Welsh Bishop of SodorandMan, undertookto translatetheAnglicanBook of CommonPrayerinto Manx sometimebetween1605and1610(Thomson,1969),he wasforcedto inventan orthographyfor the language.This hedid, with a systemthat representedtheconsonantsasin Englishandthevowels (ap-parently)at leastin partbasedonWelsh.Thisfirst attemptto introduceanorthographyfor Manx wasnot very successful,however. In the early eighteenthcenturythe firstprintedbookin Manx appeared,a bilingual versionof BishopThomasWilson’sPrin-ciplesandDutiesof Christianity, andit wastheorthographicschemeusedherethat,with someminorchanges,becamethestandardorthographyfor ManxGaelic.

This later orthography, unlike Phillips’, wasbasedalmostwholly on that of En-glish. Thismuchis generallyacceptedasbeingclearlyborrowed:� The valuesof the vowels: thus,for instance,� ee� represented/i/ and � oo�

represented/u/.� Theuseof ‘silent’ � e� to markvowel length.Thuslane /l { :n/ ‘full’.� Doubledconsonantsmarking short vowels. Thus moddey */m ` d| /,2 balley/bæl}[| / ‘town, farm’.� � gh� is usedto represent/x/ (word internally),asit wasin variousdialectsofEnglishat thetime.

2The pronunciationheredoesnot reflect the pronunciationof Late Spoken Manx (Broderick,1984b),which would be */m ` : ~2| / for this word: betweenthe time that the orthographywas invented,and theearly20thcentury, several innovationshadtakenplacein theManx soundsystemincludingthelenition ofintervocalicstops,andaprocessof lengtheningof /a/and/ ` / in stressedopensyllablesof disyllabicwords.

6.1. ADAPTATION OF WRITING SYSTEMS 183

To besure,somechoicesfor spellingcertainsoundsdo notmakeagreatdealof sensegivenanEnglishmodel. So � y � generallyrepresents/ | /, somethingthatmight per-hapsbea holdover from Phillips’ earlierorthography, since � y � is usedto represent/ | / in Welsh. Equallypuzzlingfrom anEnglish(or a Welsh)point of view is theuseof � ey � to represent/ | /, particularly in final position: ushtey /usc| / ‘water’, carrey/kær| / ‘friend’.3 Also not apparentlyfrom English(thoughvery reminiscentof tra-ditional Gaelicspelling)wasthesporadicuseof � i � beforea consonantto representpalatalizationof that consonant.But on the whole, the Englishprovenanceof mostfeaturesof Manx spellingis quiteclear.

Not only did Manx borrow a largenumberof its spelling-soundcorrespondencesfrom English, but it also apparentlyborrowed the propensityof English for irregu-lar spelling; in more technicalterms,it adoptedthe tendency of English for lexicalmarkingof orthographicfeatures. One interestinginstanceis the useof � h � afterinitial consonantswhich, with only four exceptions,would appearto correlatewithno phonologicaldistinction. Thefour exceptionsare � gh� (representing/ � /), � ch�(representingboth/x/ and/c/), � ph� (representing/f/) and � sh� (representing/s/).4

But � h � canalsooccurafter initial � b � , � d � , � f � , � k � , � l � , � m � , � n � ,� r � , and � t � ,5 andin noneof thesecasesdoesthe spellingwith � h � apparentlycorrespondto a differentphonologicalform from thespellingwithout it.6 Consider,for example,whatCregeenin his classicdictionary(1835,pagevi) statesconcerning� lh � — theonly � h � spellingthatheexplicitly commentson:

L. Somesaythatthis letteradmitsof noaspiration,andis pronouncedasl (in English)in law, live, love; asLAUE, LIOAR, LANE; but I think

3Onepossibleexplanationof this particularpuzzleis that the � ey � spellingwasmotivatedby a finalreducedvowel otherthan/ | /, namely/I/. At leastsomeof thewordsspelledwith final � ey � andpronouncedwith / | / in LateSpoken Manx may have hadan /I/-like vowel in 18th centuryManx, asevidencedby thequasi-phonetictranscriptionscollectedin Edward Lhuyd’s Geiriau Manaweg (‘Manx Words’) (IfansandThomson,1979).Thuswe find: wysteefor ushtey ‘water’; yleefor eoylley ‘mud’; maji for maidjey ‘wood’;lomyr yn kyrri for loamrey’n cheyrrey ‘fleece’; fani for fahney ‘wart’. For at leastsomeof thesethereisetymologicalevidencein thatcognatesin Irish or ScotsGaelichaveapalatalizedconsonantbeforethefinalreducedvowel, whichcouldplausiblyresultin ahigher/I/-like reducedvowel. Thus(palatalizedconsonantsunderlined):(Irish) uisce ‘water’ ( � ushtey), maide ‘stick’ ( � maidjey), and(ScotsGaelic) foinne ‘wart’( � fahney); notethat in Gaelicspelling,palatalizationof consonantsis indicatedby � e� or � i � adjacentto the consonantcluster(andif possibleon both sides). Given that � y � wasat leastsometimesusedtorepresent/ | / in otherpositions,theuseof � ey � to representthis /I/ wouldhavebeenreasonable,especiallysincethe pronunciationof final /i/ (as in chimney) wasmost likely /I/ in nearby(Lancashire)dialectsofEnglish. (Indeed,Geoffrey Sampson,personalcommunication,notesthatsuchfinal high vowelsarelax inpresent-dayReceived Pronunciation;seealso(Wells, 1982,page119).) It is conceivable that � ey � wasthengeneralizedto representall final reducedvowels.

4Notethat in Phillips’ earlierorthography� ch� wasnot ambiguousasit representedonly /c/. In thatsystem/x/ wasrepresentedas � gh� in all positions.

5Onealsofinds � vh � to representinitial lenitionof wordsspelledwith � bh� or � mh� .6Of course, � bh� , � mh� , � fh ��� th � and � dh� have an overt similarity to Gaelicspellingsfor

lenited/b/ (/v/ or /w/), /m/ (/v/ or /w/), /f/ (/ �P� ), /t/ (/h/) and/d/ (/ � /). But it seemsunlikely that traditionalGaelicorthographyis the sourceof thesespellings,sincethereis no evidencethat the developersof theorthographywereawareof Gaelicorthographictraditions,so therewould have beenlittle opportunityforthemto attemptto give Manx a superficialsimilarity to Gaelic.Besides,a Gaelicsourcecouldnot directlyexplain themostcommon� h � spelling, � lh � , norcouldit explain � kh � , � nh� or � rh � , sincenoneofthesesequencesoccurinitially in Gaelicspelling.

184 CHAPTER6. FURTHER ISSUES

thereis a distinctionbetweenlie or ly in English,andLHIE in Manks;andhadthewordsLOO, LOOR, etc. beenspelledor written LHOO andLHOOR, they would have answeredtheMankspronunciationbetter;forwithout theh thesoundis toonarrow, exceptto thosewhoknow thattheyrequirethatsound.

Thoughit is hardto saywhatCregeenis describinghere,it is evident that,at leastintheManx of the early19thcenturywhenhis dictionarywascompiled,thepronunci-ation of /l/ in Manx wasdistinct from thatof English/l/, andthe spellingwith � h �wasintendedto answerthis difference.7 However, asis alsoevident from Cregeen’scomments,otherwordsthatwerespelledwith plain � l � , alsohadthisnon-English/l/,sothat the � h � , if indeedit servedthefunctionof markingtheconsonantasdistinctfrom theEnglishpronunciation,at leastdid not do soconsistently.8

Did � h � serve to distinguishhomophonesor closehomophones?It clearlydid atleastpartly serve this function,asthe following closeminimal pairs(from Cregeen’sdictionary)show:

beill ‘mouths’ bheill ‘grind’leih ‘forgiveness’ lheih ‘place’lott ‘lot’ lhott ‘wound’meeley ‘soft’ mheeley ‘mile’taal ‘flow’ thaal ‘adze’tie ‘the ill’ thie ‘house’

Indeed,in onecaseCregeenhimselfexplicitly notesthis function: commentingon thewordmhill ‘spoil’ andonthealternativespellingmill, henotes(page126)that“for thebettersound’ssake anda differencefrom Mill (honey), theh is inserted.”9 However,providing ameansof orthographicallydistinguishinghomophonesseemsonly to havebeena minor function of postconsonantal� h � . Table6.1 shows the total countsinCregeen’sdictionaryof wordsspelledwith initial � Ch � (for C a consonant),andthenumberof thosewordsthatareminimal pairswith homophonicor close-homophonicwords spelledwithout the � h � . (In thesecalculations,I discountedderived com-pounds:thusthie ‘house’is counted,but not thie lhionney ‘alehouse’.)

About the only consistentfunction that � h � in thesespellingsseemsto have isthatit servesto make Manx spellingirregular: thatis, onemustsimply list for a word

7RobertThomsonsuggests(personalcommunication)thatCregeenmayhave hearda dark /l/ in Manxin contrastto thelight /l/ onewould expectto getin Englishin initial position.

8Onemight be temptedto supposethat � lh � representspalatal(“slender”) /l/, sincefor a numberofwordsspelledwith � lh � , the correspondingIrish or ScotsGaelic forms have palatalized/l/: thus lhaih‘read’ correspondingto ScotsGaelic leugh (wherethe � e� serves to mark a palatalized/l/). Howeverpalatalized/l/ is alsomarkedin otherways,especiallyby � i � : lioar ‘book’ (Irish leabhar). And therearemany instancesof wordsspelledwith � lh � thatarenot palatalizedin Gaelic: thuslhag ‘weak’ (Irish lag).This is alsoconfirmedby late spoken Manx pronunciationsascataloguedby Broderick(1984a): thusforexamplethe word lhon (Irish lon) ‘blackbird’ hasattestedpronunciations/l � �k� n/ or /l ` n/, neitherof themwith palatalized/l/.

9It is unclearwhathemeansby “the bettersound’s sake.”

6.1. ADAPTATION OF WRITING SYSTEMS 185

Spelling Total Number Homophones Percentage� bh� 14 1 7%� dh� 32 1 3%� fh � 2 0 0%� kh � 6 2 33%� lh � 186 8 4%� mh� 24 3 13%� nh� 3 0 0%� rh � 23 0 0%� th � 90 5 6%

Table6.1: Total countsin Cregeen’s dictionaryof wordsspelledwith initial � Ch � , whereCdenotesaconsonant,andthenumberandpercentagesof thosewordsthatareminimalpairswithhomophonicor close-homophonicwordsspelledwithout the � h � .

likebheill ‘grind’, thefactthatthereis an � h � in theorthography. Thuswemightas-sumearepresentationalongthelinesof (6.1),wherethe � bh� spellingis (irregularly)licensedby /b/:

(6.1) > PHON �=� : <2� J <����z<0�ORTH N��P� : Q �M� J Q ��� � T L

This is, needlessto say, highly reminiscentof English,wherelargeamountsof suchlexical markingarenecessary. Theparticularuseof � h � , is of coursenot apparentlyborrowed from English: that is distinctively Manx. However the propertyof irreg-ularity itself plausibly is borrowed. Onecan imaginethe original developersof theManxwriting system,beingintimatelyfamiliarwith Englishorthography, consciouslyor unconsciouslyimportingthepropertythatwordsmayhaveorthographicallymarkedlexicalentries.Occasionallythisirregularitywouldbeusedto distinguishhomophones(asin Englishroad/rode), but moreoften it would be usedasa lexical markingwithno apparentother function. Put in anotherway, the developers’of Manx orthogra-phy, giventheir experiencewith English,werenot particularlymotivatedto provideaconsistentspellingsystemfor Manx.10

What doesit meanto borrow a writing system? Apparentlyit can meanmuchmorethanmerelyadaptingthe mapping c qtsyuzvxv to a new language.In somecases,asin Perso-Aramaicheterograms,it can involve a reinterpretationof what c qtsyuzvxv ismappingbetween.In thecaseof Manx,whatwasborrowed(apartfrom theparticular“letter-to-sound”correspondences)is thepropertyof having rampantlexical markingof orthographicproperties.

10As RobertThomsonnotes(personalcommunication),besidestheidiosyncraticuseof post-consonantal� h � , therearemany otherinstancesof idiosyncraticspellingsin Manx. For instance,thewordsleigh ‘law’,leih ‘forgive’, lheiy ‘calf ’ and lhiy ‘colt’ arehomophonesor nearhomophones,which arekept distinct insomewhatarbitrarywaysin spelling.

186 CHAPTER6. FURTHER ISSUES

6.2 Orthographic Reforms: the Caseof Dutch

Englishis oneof thefew major languagesthathasbeenblessednot to have hadanylarge-scaleformally sanctionedspellingreformsduringits history, thisdespitethenu-merousattemptson the part of variousindividualsfor the pastthreehundredyears.Not surprisingly, themajorintentionof all spellingreformsproposedfor Englishis torenderEnglishspelling“morephonetic”,or in otherwordsto make it morephonolog-ically faithful. An Anglocentricviewpoint would thusassumethat spellingreformsin generalshouldaim for phonologicalfaithfulness. In fact, this is not usually thecase,andmorphologicalfaithfulness— a propertythat Englishorthographyalreadyhasto someextent(seeSection3.2)— canoftenplayarole in theredesignof spellingsystems. We examineherethe caseof the 1995 spelling reform for Dutch, whichillustratesthispoint.

In 1995a new revision of Dutch spellingwas formulated(Instituut voor Neder-landseLexicologie,1995);this new spellingbecameofficial in thefall of 1996in theNetherlandsandBelgium.Variouschangesproposedin the1995spellingsystemhavebeenthesourceof muchlinguistic debate;see(Neijt andNunn,1997)for a compre-hensivereview of this andpreviousspellingchangesfor Dutch.

In this discussionwe will concernourselves with only one issue,namely thespellingof two of the so-called“linking morphemes”in nominalcompounds,thosethat arespelled � e� or � en� , both of which arepronounced/ | /. Someexamples,usingtheconventionsof thepre-1995— 1954— spelling,areshown below, with thelinking morphemein questionunderlined. (We will glossthe linking morphemeas“LM”): 11

(6.2) (a) slangebeet(snake+LM+bite)‘snakebite’paardebloem(horse+LM+flower) ‘dandelion’kattevel (cat+LM+skin)‘catskin’forellevangst(trout+LM+catch)‘trout catch’

(b) bessenjam (berry+LM+jam)‘berry jam’boekenkast(book+LM+case)‘bookcase’paardenvolk (horse+LM+people)‘cavalry’kreeftenvangst(crab+LM+catch)‘crabcatch’

6.2.1 The 1954spelling rules

Under the 1954spellingconventions,the decisionon which form to usewasbasedlargely on whetherthe lefthandmemberis interpretedasplural. A commonplural

11The-e anden formsareonly two of thefive possiblewaysof linking elementsof nominalcompoundsin Dutch. Of the otherthreeways, the mostcommonis simply to have no linking morpheme:rundvlees(ox+meat)‘beef’. Lesscommonis -s (lamsvlees(lamb+S+meat)‘lamb (meat)’),andevenrareris -er (run-dergehakt(ox+ER+chopped)‘groundbeef’). As Schreuderet al. (1998)note(from whomtheseparticularexampleswere taken), all of the non-zerolinking morphemesare relics of an obsoletemedieval Dutchnominalinflectionsystem.

6.2. ORTHOGRAPHICREFORMS 187

suffix for Dutchnounsis written � en� — andpronounced/ | /. If thelefthandmemberof thecompoundhasa plural in � en� (not all nounsdo), andif the interpretationofthelefthandmemberin thecompoundin questionis plausiblyplural,spell thelinkingmorphemeas � en� ; otherwisespell it as � e� . Thus,in principle oneshouldwritebessenjam for ‘berry jam’ becausetheword for berry(bes) hasa plural in � en� , andbecauseonenormallymakesjamoutof multipleberries.Ontheotherhandonewritesslangebeetfor ‘snakebite’, becauseeven thoughthe plural of slange is slangen, a s-nakebitetypically only involvesonesnake. As Neijt andNunnnote(1997,pages11–12), andasonemight expect, thingswere by no meansuniformly so simple. So,sometimestheprincipleswereappliedratherarbitrarily: why is it kreeftenvangst‘crabcatch’, implying thecatchof morethanonecrab,but forellevangst‘trout catch’, im-plying thecatchof justonetrout?12 Furthermore,someportionsof thevocabularyap-parentlylicensedcategoricaloverridesof thegeneralprinciple.For instance,if theleft-handtermdenotedapersonor persons,enwasalwaysused,evenif asingularinterpre-tationmight beplausible:weduwenpensioen‘widow’spension’.13 However, if a par-ticular individual is intended,then � e� is written: koninginnedag (queen+LM+day)‘Queen’sday’.

But ignoring these(somewhat large) nits in the system,and working undertheassumptionthat the 1954spellingconventionsweremore-or-lessconsistent,what isthebestanalysisof themappingbetweenlinguistic representationandorthographyintermsof thetheoryunderdevelopmenthere?Thespellingconventionsstatedthatoneshouldwrite � en� if thelinking morphemewaspronounced/ | /, if theintendedinter-pretationof thelefthandmemberwasplural,andif thenounin questionhadaplural in-en. They did not actuallymake thelinguistic claim that in suchinstancesthelinkingmorphemeis theplural morpheme.However, severallinguists(e.g.Booij (1996),andSchreuderandcolleagues(1998)who provide experimentaldatasupportingBooij’sclaim) have madepreciselythis argument,andindeedit seemsto result in the mostsuccinctdescriptionif we make this assumption.If this is the case,thenwe canas-sumethatnounsthathaveaplural in -en, selectthelinking morpheme(spelled� en� ,andmarked[+PL]) in (6.3a),which is in fact just the-enplural morpheme;andin allothercasestheform in (6.3b)(which is unspecifiedfor plurality) is selected:

(6.3) (a) ()* SYNSEM ��� ���m��� �PHON ��| : �ORTH N � ? : < T

6 8712As a reviewer haspointedout to me,theanswermight be that“crabsaretypically caughtin numbers,

while trout arecaughtindividually on a line.” On the otherhand,a trout fishermanmay capturemultipletrout on a single fishing trip, and yet underthe old spelling system,this would still have to be writtenforellevangst.

13Notethat theEnglishtranslationhasa singularform widow’s, correspondingto theDutchplural formweduwen.

188 CHAPTER6. FURTHER ISSUES

(b) ()* SYNSEM � �PHON ��| : �ORTH N � :�< T

69876.2.2 The 1995spelling rules

For all of their relativeconsistency, the1954conventionssuffer from onemajorprob-lem, that make themideal fodderfor spellingreformers.Decidingwhetherto write� en� or � e� requiresone to judgewhethera lefthandmemberof a compoundisplausiblyplural in interpretation.Sincethis may differ from compoundinstancetocompoundinstance,the 1954 conventionshad the disadvantagethat one could notguaranteea consistentspellingof a givencompound,or a givenclassof similar com-poundssincein someinstancesa plural interpretationmight seemappropriate,in oth-ersa singularinterpretation.Thusfrom thepoint of view of thosewho prefera super-ficially consistentspellingsystem,the1954designis ratherpoor.

Underthe 1995conventions,oneis no longerrequiredto decideuponwhetherapluralmeaningis moreappropriateto agivencontext. Rathertherule for using � en�and � e� depends,at leastin its simplestform, onwhattheplural form of thelefthandnounis (InstituutvoorNederlandseLexicologie,1995,page25,my translation):

Write an -n- whenthe first part of the compoundis an independentnounwhich hasa pluralexclusively in -(e)n.14

Theconnectionbetweenthe � en� form of thelinking morphemeandtheplural is thusalsodrawn in the1995conventionsasin the1954conventions,but herethesemanticsof the compounddo not enter into the decision. On the faceof it, then, the 1995conventionswould appearto bea simplificationover theearlierconventions.

However, therearesomeexceptionsto themainrule, which significantlycompli-catethenew spellingprinciple.Amongthese:

1. Thefirst partdenotesapersonor thing thatin thegivencontext is auniquetype:zonneschijn ‘sunshine’

2. The first part is an animal name, and the secondpart is a botanical term:paardebloem(horse+LM+flower) ‘dandelion’

3. The first part denotesa body part, and the whole compound is a fos-silized construction: kakebeen (jaw+LM+bone[?]) ‘jawbone’, ruggespraak(back+LM+speech)‘consultation’

Thefirst of theseexceptionsis of coursealmostidenticalto thestipulationof the1954conventionsrelatingto compoundswith lefthandmembersdenotingpersons,and

14Besidestheexceptionsto therule to bediscussedbelow, therearea coupleof additionalamendments.For example,if thesingularof the lefthandnoundoesnot endin / | / andcanform a plural both in /s/ and/en/,thenthelinking morphemeshouldalsobespelledwith � en�

6.2. ORTHOGRAPHICREFORMS 189

for thisclassof casesthewriter is still forcedto judgewhetherthelefthandmemberisappropriatelyinterpretedasuniquegiventhecontext. The“flora-faunarule” is appar-ently a concessionto thefact thatmostsuchcompoundshadpreviously beenspelledwith � e� , andtherewasadesireto minimizethenumberof spellingchangesrequiredby the new conventions(Neijt andNunn, 1997,page22). The third seemsperhapsthemostdifficult to applysinceit requiresoneto determinewhethertheconstructionin questionis “fossilized” (theDutchtermusedhereis versteendesamenstelling‘pet-rified compound’),presumablymeaningthat it is semanticallyopaque.Ruggespraak‘consultation’clearlycountsasopaque,but kakebeen‘jawbone’is muchlessobvious-ly so, andonly after somerathercircuitousreasoningdoesit becomeclear that oneshouldprobablyconsiderit to beopaqueafterall: beenmeanseither‘bone’ or ‘leg’;theplural for ‘bone’ is beenderen, for ‘leg’ benen; thepluralof kakebeenis kakebenen,suggestingthatthebeenherehas,etymologicalconsiderationsaside,the‘leg’ reading.On thatreasoning,thencertainlykakebeenshouldbeconsideredopaque.

What is an appropriateformal analysisfor the 1995conventionon � e� versus� en� ? Sincethe basicrule is no longer basedon the semanticsof the situation,but ratheron a morphologicalpropertyof the lefthandnoun— whetheror not it hasa plural in -en — it no longermakesany senseto assumea linking-morphemeentrywith a[+PL] semanticspecification(asin (6.3a)).Simplerwouldbeto assumeasinglelinking morphemethatis unspecifiedfor orthographicandsemanticinformation:

(6.4) > PHON ��| �ORTH N T L

In orderto predicttheappropriatespelling,weneedto assumeamorphologicalfeature[+en], which marksnounsthat have a plural exclusively in /en/. Thenwe canwritespellingrulesasin (6.5), to capturethebasicrule of the1995conventions,andfill invaluefor theORTH attributein (6.4):

(6.5) / | / ��� en� / [+en]/ | / ��� e�

For theexceptionalclassestherearetwo possibleroutes.Firstly, onecouldprespecifythe exceptional � e� spelling in the orthographicfield of the compoundsfitting thetermsof theexception:this seemsperhapsthemostreasonableroutein thethird classof exceptionsgivenabove. Secondly, onecouldassumeanadditionalrule introducing� e� in certainsemanticallyspecifiedcontexts. The flora-faunaexceptioncould behandledby therule in (6.6),which would applysoasto bleedthefirst of therulesin(6.5):

(6.6) / | / ��� e� / [+flora] [+fauna]

But, howevertheexceptionsareto betreated,whatis clearis thatwehave in the1995proposalasystemthatis morecomplex thantheconventionsit supplants.

Whatwould a morereasonableapproachfor the1995proposalto have takenwithregardto the linking morpheme?Oneapproachwould have simply beento leave it

190 CHAPTER6. FURTHER ISSUES

unchangedfrom the 1954conventions. This is moreor lesswhat Booij (1996)sug-gests.Of course,thisdoesleavesomeambiguityin somecases:dependinguponwhatonemeansonecouldhave eitherschapevleesor schapenvlees(sheep+LM+meat)for‘mutton’. But, asBooij rightly asks(page133): “what is wrongwith that?”

A secondapproachwould have achievedcompleteconsistency, andwould at thesametime have beenmuchsimplerto state:sincethelinking morphemeis invariablypronounced/ | / nomatterhow it is spelled,onecouldsimplyalwayswrite it � e� , andeliminatethe � en� spellingentirely. Alternatively, onecould have chosento spellthelinking morphemeexclusively with � en� (eliminating � e� entirely);in thelattercase,thepronunciationof � en� as/ | / would beconsistentwith thepronunciationoftheplural suffix � en� , whosespellinghasnot beenchangedunderthe1995reform.Either of thesereforms,hadthey beenadopted,would have madeDutch spellings-lightly more“phonologicallyfaithful”. Insteadwhathasbeenadoptedis asystemthatattemptsto beas“morphologicallyfaithful” asthe 1954conventionsbut at thesametime drainsthemorphologicalfaithfulnessof any semanticsense:ratherthandepend-ing uponthesemanticsof thesituation,the1995conventionsrequirethatoneconsidera purely formal propertyof the lefthandnoun(whetherit exclusively formsits pluralin -en). In principlethis couldof courseguaranteemoreconsistentspellingsthanthe1954conventionssincethewriter would merelyhave to reflecton themorphologicalpropertiesof thebasenoun,andwould not have to determinepossiblysubtleseman-tic nuances.Needlessto say, whatever benefitmight have beengainedby this newconventionhasbeeneffectively eliminatedby theadditionalstipulatedexceptions.

6.3 Other forms of Notation: Numerical Notation andits Relation to Number Names

In our discussionof writing systemswe have thus far focusedexclusively on whatmight betermedthecoreof writing systems,wherewritten symbolsclearlyrepresentsomesort of linguistic object, be it phonologicalor lexical. But written languagecontainsmany formsthatcannotbesodescribed,themostprominentandwidespreadof thesebeingnumericalnotation. Herewe will concernourselvesmostly with theHindu-Arabicnumeralsystem,whichhasbecomepracticallyuniversal.

Therehaveof coursebeennumerouswrittenrepresentationsof numbersdevelopedthroughouthistoryby variousculturesspeakingvariouslanguages;for anoverview see(Pettersson,1996). In somecases,thesystem,in additionto servingasa representa-tion for numerals,alsoservedasa reasonablewritten representationof theassociatednumbernames.Suchis the casewith traditionalChinesenumerals.Thusa numeralrepresentationsuchas�¡ £¢¥¤£¦£§¥¨ sanqianli u bai ba shısı ‘ ©�5Mª2«­¬�5Mª2®­¯�5yª±°A² ’,servessimultaneouslyasarepresentationfor thenumber‘3,684’ andasaspecificationof how the numberis actually read; indeedthere is no other way to representthenumbernamefor ‘3,684’ in Chinesethanby the string of charactersgiven.15 And

15Businessandaccountingvariantsof thestandardnumbernamecharactersdo exist: thus ³ er insteadof ´ er for ‘2’. But thesearemerelycontextually determinedgraphicalvariantsof thestandardforms.

For seriousmathematicalcalculations,standardChinesenumeralsare not very convenient,and other

6.3. NUMERICAL NOTATION 191

numericalrepresentationschemesdevelopedby a particularculturetend,not surpris-ingly, to have propertiesthat are influencedby the linguistic factsof the languagespokenby thatculture: thusAncientMayannumericalrepresentationis basicallyvi-gesimal,reflectingthevigesimalsystemusedin numbernameconstructionin Mayanlanguages.

Thussomenumericalrepresentationschemesareat leastpartly glottographicindesign,in thatthey reflectat leastsomeaspectsof thestructureof thelinguisticsystemof numbernamesof the languagespoken by the designersof the system. In con-trast,the Hindu-Arabicsystemis decidedlynon-glottographicin designeven for thespeakersof the SouthAsian languages(whichever they may have been),who devel-opedit, around600AD from anearliermoreglottographicsystem(Pettersson,1996,page804). Ratherit is a purelymathematicallymotivated“positional” representation(Harris,1995;Pettersson,1996)wherepowersof thebase(10)are representedby thepositionof digits in a grid startingfrom the rightmostposition,andthe digits them-selvesrepresentmultipliersof thepowerof thebase.Thusanumbersuchas � 3,684�representsstraightforwardly (omitting 5yª2µ ):(6.7) ©A¶·5yª0«y��¬A¶¸5yª0®y��¯A¶·5yªU°I��²

TheHindu-Arabicsystemis now usedto representnumbersin thewritten repre-sentationof nearlyall languages,andthesystemsof numbernamesin the languagescoverawidespectrumof possibilities.A sampleof therangeof possibilitiesfor theex-ample‘3,684’ is givenbelow in (6.8).English(6.8a)is afairly straightforwarddecimalsystemwherethereis a closeone-to-onemappingbetweenthe wordsin the numbername,andthe multipliers andmultiplicandsin the factorizedrepresentationin (6.7).German(6.8b) is similarly straightforward with the exceptionthat, as in mostotherGermaniclanguages(ModernEnglishbeingthenotableexception),thedigitsandtensarepresentedin the reverseof their “logical” order. In Malagasy(6.8c) (Rajemisa-Raolison,1971), the entirenumbernameis presentedin the reverseof its “logical”order. Finally, in thecaseof Basque(6.8d)we find a partially decimal-vigesimalsys-temwherenumbersbelow 100areregularly representedin termsof sumsof productsof powersof 20 followedby unitsor ‘ten’ plusa unit.16

(6.8) (a) threethousandsix hundredeightyfour

(b) dreitausendsechshundertvierundachtzig(three+thousand+six+hundred+four+and+eighty)©A¶·5Mª2«y��¬g¶¹5Mª2®y��²K��¯A¶·5yªU°

(c) efatraambyvalopolosy eninjatosy teloarivo(four andeight+tenandsix+hundredandthreethousand)²U��¯g¶¹5Mª±°y��¬g¶·5yª2®M��©A¶·5yª0«

(d) hiru mila seirehundalaurogeitalau(threethousandsix+hundredfour+scorefour)©A¶·5Mª2«y��¬g¶¹5Mª2®y��²º¶�»tª±°I��²

numericalrepresentationswereinventedfor suchpurposes;see(Needham,1959;Pettersson,1996)16See(Hurford,1975)and(Stampe,1976)for surveysandlinguisticmodelsof numbernamesystemsand

(BrandtCorstius,1968)for someearlygrammaticalmodelsof numbernames.

192 CHAPTER6. FURTHER ISSUES

The relationshipbetweenthe blatantly non-glottographicHindu-Arabicnumeralsystemand the numbernamesystemsof the variouslanguagesin which it is usedwould beof largely academicinterestwereit not for thefact thatconvertingbetweenthe two representationsis somethingthat literatespeakersdo routinely— andsome-thing thatautomatictext-to-speechconversionsystemsmustalsobecapableof. Thisimmediatelyraisesthe questionof what kind of mappingsuchspeakersperformandhow the modelof this mappingrelatesto the theoryof writing systemsthatwe havebeendeveloping.

On first consideration,our modelof writing systemswould appearto have littleto sayaboutthis mapping,sincethe two mostprominentassumptionsthat onemustmakeseemto directlycontradictwhatweknow to betrueof theHindu-Arabicnumeralsystem,andwhatis claimedto betrueof numbernamesystems:� A numericalrepresentationsuchas‘3,684’ mapsdirectly to a linguistic level of

representation,in thiscasethelexical representationof thenumbernameitself.17� Themappingbetweenthetwo levelsis regular.

Thefirst assumptionpatentlycontradictsthemathematicaldesignof thesystem,whichwasclearlynon-glottographic.Thesecondassumptionis alsoclearly falsein generalsincethe“alphabet”of powersof tenis infinite: by definitiononecannothavearegularrelationthatinvolvesaninfinite alphabet.18

But thesetwo pointsaremisleading.First of all, the original designof a writtenrepresentationsystemshouldnot confusetheissueof how thesystemis actuallyusedby readersof alanguagethatusesthatsystem:thereis noreasonwhy theHindu-Arabicnumeralsystemasused,say, in Englishcouldnot have a dual function,namelyasamathematicallymotivatedrepresentationof thenumber, but alsoasacrudelogograph-ic representationof thenumbernamesof thelanguage.Secondly, theobservationthatthe alphabetof powersof ten is non-finitemissesthe importantpoint that thereis alimit to the lengthof a digit string that will be readby a humanreaderas a numbername(asopposedto merelya stringof digits). Thelimit of coursevariesfrom readerto reader, andpresumablydependsuponthelevel of literacy andmathematicalacumenof thepersoninvolved. But thereclearly is sucha limit: while mostreadersof Amer-icanEnglishwould have no problemreading‘1,000,000’asonemillion, fewerwouldbe so confidentaboutonequadrillion for ‘1,000,000,000,000,000’; andpresumablynonewould be ableto translate(without the aid of pencil andpaper, andpossiblyadictionary)a numbersuchas‘1,000,000,000,000,000,000,000,000,000’. In practicalsituations,suchnumbersaretypically eitherrepresentedin scientificnotation( 5yª0®�¼ ),whichhasatotally differentmodeof readingaloud,or else(at leastwith smallernum-bers)partlyin words(e.g.AmericanEnglish1 trillion for ‘1,000,000,000,000’). Giventheseobservations,wecanproceedto developafinite-statemodelfor theconversionofHindu-Arabicnumeralsinto numbernamesfor agivenlanguage;thismodelis theone

17Wetake it for grantedthatnumericalrepresentationsdo not generallyrepresentphonologicalinforma-tion.

18Number-namesystemsthemselveshave beenarguedto bemildly context sensitive, hencenot regular;see(Radzinski,1991).

6.3. NUMERICAL NOTATION 193

0 1

1:1

2:2

3:3

4:4

5:56:6

7:7

8:8

9:9

4

ε :10^1

2

ε :10^2 5

0:0

1:1

2:2

3:3

4:45:5

6:6

7:7

8:8

9:93

0:0

1:1

2:2

3:3

4:45:5

6:6

7:7

8:8

9:9

ε :10^1

Figure6.1: A numeralfactorizationtransducerfor numbersup to 999.

usedin theBell LabsmultilingualTTSsystem,asdescribedin (Sproat,1997b;Sproat,1997a).

The problemis bestunderstoodby factoringit into two components.The firstcomponent,which we shall term factorization, is a mechanismfor expanding aHindu-Arabicnumeralsequenceinto a representationin termsof sumsof productsofpowersof ten. The secondcomponentmapsfrom this factorizedrepresentationintothesequenceof wordsthatmake up thenumbernamecorrespondingto theparticularnumeralsequence;let us term this latter componentthe numbernamegenerator.Both of theseoperationscanbehandledusingfinite-statetransducers.For example,asimpletransducerthatfactorsnumeralsup to multiplesof 5yª0® is shown in Figure6.1.Generatinga numbernamefrom a numeralstring then consistsof composingthestring with the factorizationtransducer, composingthe resultwith the numbernamegenerator, andthencomputingtheprojectionof theoutput,or formally:

numbername\Y½ ® � numeral¾ factorization ¾ numbernamegenerator �Thenumbernamegeneratoris obviously language-specific,sincenot only arethe

194 CHAPTER6. FURTHER ISSUES

lexical itemsinvolved specificto a given language,but alsovariousaspectsof theircombination:it is a language-specificfact of English, for examplethat onemay (insomedialectsmust) use the word and betweenhundred and following material inthe numbername,but do not useit after thousand, million, etc; similarly in Russiancomplex caseandgenderagreementis requiredbetweenelementsof thenumbername(Wade,1992).19

The factorizationtransduceris alsolanguagespecific— or onemight bettersay,language-areaspecific.This is in partbecauselanguagesdiffer in thewaythey logical-ly factorizea long numbername.Most (decimal)numbernamesystemshave distinctwordsfor 5yª±° , 5Mª2® and 5Mª2« , but differ significantlyon higherpowers. The first fivepowersof tenfor which thereareseparatelexical itemsin AmericanEnglish,Chinese(alongwith several other EastAsian languages)andHindi (alongwith mostSouthAsianlanguages)aregivenin (6.9):20

(6.9) English 5yªU° , 5Mª2® , 5yª2« , 5yª0¿ , 5Mª2ÀChinese 5yªU° , 5Mª2® , 5yª2« , 5yª2Á , 5Mª2ÂHindi 5yªU° , 5Mª2® , 5yª2« , 5yª0Ã , 5Mª2¼

For the “missing” powers,languagesrevert to an analytic strategy, so that 5k¶·5yª Á inEnglishis expressedasten thousandand 5k¶·5yª0ù�Ä5-¶¹5MªtÁ is expressedasa hundredand ten thousand. Thusa numberlike ‘12345678’would be factoreddifferently inthesethreelanguages.Here,theanalyticblocksareunderlined:

(6.10) English ÅxÆ0Ç�ÆÉÈ2ÊGËÍÌPÎÏÇÐÆÉÈyÑ ËeÅ Ò;ÇÐÆÉÈMÓ­ËÕÔ-Ç�ÆÉÈtÊiËÍÖPÎÏÇÐÆ×ÈMØ ËÚÙ;ÇÛÆ×ÈCÓgËÝÜKÇÐÆÉÈtÊiËßÞChinese ÅxÆ0Ç�ÆÉÈ Ø ËÍÌ;ÇÐÆÉÈ Ó ËÚÒkÇÐÆ×È Ê ËßÔIÎàÇÐÆ×ÈMá ËÍÖKÇ�ÆÉÈ Ø ËßÙkÇÐÆÉÈ Ó ËÍÜ;ÇÐÆ×È Ê ËÍÞHindi ƱÇÐÆÉÈMâ ËeÅ Ì;ÇÐÆÉÈ Ê ËÚÒIÎÏÇ�ÆÉÈMã ËeÅ Ô-Ç�ÆÉÈ Ê ËÍÖPÎÏÇÐÆ×È Ø ËÚÙ;ÇÛÆ×È Ó ËÝÜKÇÐÆÉÈ Ê ËßÞ

Onepoint that is not often notedin discussionsof numericalrepresentationsandtheir relation to numbernamesrelatesto the positioningof commaor otheraids tointerpretationthat are typically insertedinto long numerals.21 Whereone finds acommawritten dependsexactly uponwhich powersof ten the languagehasdistinctwordsfor. For English,thecommais written in positionscorrespondingto theendofthe first andsecondunderlinedblocksin (6.10): thusonewrites � 12,345,678� . InChinese,thecommais alsowritten after thefirst underlinedblock, this time resultingin � 1234,5678� . Finally in Hindi, onewrites the commaafter the first secondandthird underlinedblocks, resultingin � 1,23,45,678� .22 Thus the placementof thecommacorrespondsexactly to the(right) edgeof analyticnumbernameconstructions.It is hardto interpretthecommain any otherway thanasanaid for thereaderin themappingbetweenthenumericalrepresentationandthenumbername.In otherwords,

19In a working system,suchastheBell LabsTTS system,suchlinguistic factscanbehandledin partbyrewrite rulescompiledinto finite-statetransducers.

20Thesystemfor AmericanEnglishdiffers,of course,from thatusedtraditionallyin British English,andcurrentlyin otherWesternEuropeanlanguages.

21Thesymbolis ‘,’ in English,ChineseandHindi, asit happens.In many EuropeanlanguagesbesidesEnglishoneuseseither ‘.’ or simply a space,the ‘,’ being usedto representwhat is representedwith adecimalpoint in English.

22The surfaceform of the numeralsin Hindi andotherIndian languages(andalsoArabic), is differentfrom thatusedin Englishor Chinese:but thesystemotherwiseworksexactly thesame.

6.3. NUMERICAL NOTATION 195

thenumericalrepresentationis beingtreatedin thewriting systemasa representationnotonly of mathematicalobjects,but simultaneouslyasanorthographicrepresentationof linguistic objects.

For somelanguages,additionalmechanismsarerequiredin thefactorizationstep.In German,for example,digits anddecadesoccurin thenumbernamein the reverseof their ‘logical’ order, as noted in (6.8b). This “decadeflop” can be handledbya finite-statetransducer, but only at somecost, sincetransducerscan only performstringreversals(for a finite setof strings)by enumeratingall stringspairedwith theirreversedform. (This is exactly what is donein the Germanversionof the Bell LabsTTS system;see(Sproat,1997b).) So the mappingbetweennumeralsand theirassociatedfactorizationsis still regular, but not elegantlyso. It is thusnot surprisingthat speakersof GermanandDutch (which hasan equivalentnumbernamesystem)havesomedifficulty in readingnumbernamesfrom their numericalrepresentation.23

For Malagasy, which hasa completereversalof the logical order(6.8c),it makesmoresenseto assumethat readers,whenfacedwith a number, shift their attentiontotheendof thenumeralstringandread(within thatstring)from right to left, temporarilyoverriding the normalleft-to-right orderof reading.Thus,a simpleregularmappingbetweennumeralandnumbernamecanbemaintained,with theonly addedassumptionbeingthe additionallow-level processingstrategy just described.A similar strategymust in any casebe assumedfor Hebrew and Arabic, wherethe script runs fromright to left, but Hindu-Arabic numeralsrun from left to right as they do in otherlanguages.24

Turningto vigesimalsystemswenotethattherearetwo possiblestrategiesfor deal-ing themappingbetweendecimal-basedHindu-Arabicnumerals,andnumbernames.The first would be to insert into the factorizationstepa stepthat performsthe baseconversionbetween10 and20. For numericalrepresentationsof a finite size,this canbehandledby a regular relation. Oncethis factorizationinto powersof 20 is accom-plished,the numbernamegeneratorwould work in a vigesimalsystempreciselyasin a decimalsystem. As a practicalmatterhowever thereseemto be relatively fewextant vigesimalsystemsthat have distinct wordsfor anything above a small powerof 20. In Basquefor example,the systembecomesdecimalabove 5yª ® . Merrifield(1968) reportson the Macro-MayanlanguageCh’ol, which haswords for »2ª±° , »tª0®and »2ª2« , thusreflectinganold Mayanpurelyvigesimalsystem;but healsonotesthatthe systemfor the larger numbershasessentiallygiven way to the decimalsystemof Spanish.Furthermorethereis no dataon Ch’ol speakersreadingnumbersfrom aHindu-Arabicdecimalrepresentation(assumingit would be possiblefor themto dothis). So for the point at hand,namelythe conversionof decimalHindu-Arabicnu-meralsinto vigesimalnumbernames,we cannotdeduceanything from theexistenceof sucha systemof numbernames.For simpler(onemight evensaysemi-fossilized)

23Harald Baayen,personalcommunication. The evidenceis currently only anectodal,and this claimcertainlyneedsto besupportedby experimentalevidence.

24It is interestingto note that whenMalagasywaswritten in the Arabic script (beforethe early 19thcentury),this low-level processingstepneednot have beenassumed.The script ran from right to left, ofcourse,but numeralswouldhavebeenrepresented,asin Arabicor in presentdayMalagasyfrom left to right.Thusa readerof Arabic-script-basedMalagasycouldmaintaina right-to-left readingorderthroughoutboththeordinarytext and(asin modernMalagasy)thenumber.

196 CHAPTER6. FURTHER ISSUES

decimal-vigesimalsystemssuchasBasque,it is a relatively straightforwardmattertolist thevigesimal-basedwordsin the numberlexicon, associatingthemwith decimalratherthanvigesimalfactorizations.Thesolutionis notentirelyelegant,but it is notto-tally unreasonableeither, especiallysince,aswasnotedabove,Basquenumbernamesaredecimalabove 5yª0® ; numbersunder 5Mª2® arethuslexical exceptionsto the generalpatternof thenumbernamesystemof ModernBasque.

It shouldbe stressed,if it is not alreadyclear, thatwhat I have presentedis not alinguistic theoryof numbernames,but rathera modelof themappingbetweena nu-mericalrepresentation— theHindu-Arabicsystem— andthenumbernamesystemsof variouslanguages.Numbernamesystemsthemselvescanbe quite complex: S-tampe,for example,givesanexampleof anexotic systememployedin Sora,aMundalanguageof India (Stampe,1976,page601):

Most Mundalanguageshave decimal-vigesimalcounting:they count10, 20, 20 + 10, 2 x 20, (2 x 20) + 10. Sorachangedfrom a decimaltoa duodecimal(12) basewithin this vigesimalstructure. Sorasthereforeaddunitsto 12 to reach19migg| l-gulji (12+ 7); thencount20 b -kor. i (1x 20) andaddunits to reach32 b -kor. i-migg| l ((1 x 20) + 12), to whichareaddedunitsto reach39 b -kor. i-migg| l-gulji ((1 x 20) + 12 + 7); 40 isba-kor.i (2 x 20), andso on, in a Stravinskianalternationof twelvesandeightsunparalleledin any known language.

Thecurrenttheorywould beableto provide a modelfor the readingof Soranumbernamesfrom a Hindu-Arabicdecimalrepresentation:but of courseit would not reallyaccountfor the form of the numbernamesthemselves,a topic that falls ratherunderthedomainof a linguistic theoryof numbernames.

In summary, it is indisputablethat the primary representationalfunction of theHindu-Arabicnumeralsystemis mathematical,not linguistic. IndeedHindu-Arabicnumeralsareoften held up asan archetypalexampleof a patentlynon-linguisticallymotivatedwritten representation.However, themappingbetweennumeralsandtheirassociatednumbernamesin a large variety of languagescan,somewhat surprising-ly, be handledby a model that is consistentwith the moregeneralmodelof writingsystemsthathasbeendevelopedin theremainderof this book.

6.4 Abbreviatory Devices

Theliteratureon writing andwriting systemscontainsvery little discussionof abbre-viations,acronyms,andothershorteningdevices falling underthe generalrubric of“initialisms”. This is perhapsunsurprisinggiventheheavy focusin that literatureonwhat linguistic objectswritten symbolsrepresent,andhow they representthemin thespellingof “ordinarywords”. Yet it is at thesametimesomewhatof anoversightsinceabbreviatory devicesof variouskinds have a history that is asold asthat of writingitself (Cannon,1989;Romer, 1994).

Abbreviations,asdefinedbelow, areof particularpracticalimportancein the de-velopmentof TTS systemssincethesystemmustdecideon how to readthem,given

6.4. ABBREVIATORY DEVICES 197

that they typically do not obey thenormal“pronunciationrules” of the language.Forstandardizedcasessuchas � Blvd � for Boulevard, this is lessof aproblemsincesuchcasescanbecatalogued.(Thereis, however, theproblemthatmany abbreviationscanconventionallystandfor more than one thing, as in � St� (Street, Saint) or � Dr �(Drive, Doctor, drachma); or elseareconfusablewith ordinarywordsasin � Ave�(Avenue, or aveasin avemaria). Theseissuesareamenableto sensedisambiguationtechniques,suchasthoseof Yarowsky (1996).) But creatively coinedabbreviationsarenotatall uncommon,andin certaingenres,suchasrealestateadvertisements,theyarerife. Considertheexamplein (6.11)takenfrom theNew York Timesrealestateadsfor January12,1999:

(6.11) 2400’ REALLY! HI CEILS,18’ KIT,MBR/Riv vu, mds,clstsgalore!$915K.

Herewe find � CEILS� (ceilings), � KIT � kitchen, � MBR � masterbedroom,� Riv vu � river view, � mds� maids(room)(?) and � clsts� closets, noneof whicharestandardabbreviations,at leastnot in generalwritten English.While humanread-ers(usually)havenoproblemreconstructingtheintendedwords,thesearebeaseriousproblemfor TTS systems,which generallywill fail to correctlyexpandtheabbrevia-tion in suchcases.Clearly, though,abbreviation is a productive processthatmustbemodeledin any theoryof therelationbetweenlanguageandwriting.

Thepurposeof thissectionis toproposehow abbreviatorydevicesmightfit into thetheorythatwe have developed.Beforewe proceed,however, it is necessaryto definesomeof our terminology, sincetermssuchasabbreviation, acronymandsoforth, areusedin differentwaysby differentpeople;see(Cannon,1989)for adiscussionof someof this terminologicalquagmire.

For the purposesof the presentdiscussion,we will distinguishthreecategories.The first category, abbreviations, constituteall caseswherethe normalspellingof aconstruction— typically, thoughby no meansalways,a singleword — is shortenedeitherby deletingletters(e.g. � St.� , � Dr. � , � kg � , � CEILS� or � clsts� ), or bysubstitutinga shorterstringof symbolswhich is synchronicallyunrelatedto thetargetword: the lattercasesinclude � lb � for pound, � % � for percent, � & � for and and� $ � for dollar. Note that my useof the term abbreviation differs from Cannon’s,on which seebelow. What all of thesecaseshave in common,andwhat setsthemapartfrom theothertwo categoriesto bediscussedmomentarilyis thatthey all involveshortenedformswhereit is nonethelessintendedthatonereadthefull formof theword.Thus,whenencounteringtheabbreviation � lb � , onewould normally readpoundorpounds, but not l. b..

Thesecondcategory, which we shall term letter sequences, behave differently: inthiscasetheintentionis thatonereadthemassequencesof letters,irrespectiveof whatthey standfor. Thus � CIA � , � USA � and � ACL � areto be readassequencesofletters,despitethefactthatthey standfor (amongotherthings)Central IntelligenceA-gency, UnitedStatesof AmericaandAssociationfor ComputationalLinguistics. Notethat Cannonincludescasessuchas theseunderthe rubric of abbreviation, thoughthesereally differ in kind from what I have termedabbreviationsabove, sincelettersequencesarenot generallyto beexpandedinto a word or setof words.On theother

198 CHAPTER6. FURTHER ISSUES

handwhatI termlettersequencesareoftenpopularlycalledacronyms, which is moreproperlyusedto namethethird category. To avoid thesepotentialconfusions,then,Isuggestthe term letter sequence. Typically a letter sequenceis formedfrom the ini-tial lettersof the wordsof the phrasebeingabbreviated, thoughfunction wordsareoften omittedin this computation(as in � USA � ). Periodsmay be usedwithin thelettersequencethoughtheseseemneverto berequired;see(Cannon,1989)for furtherdetails.

Thethird category areacronyms, which canbethoughtof aslettersequencesthatare to be readas words. Well-known examplesare � NATO � , � UNESCO� and� AIDS � . The formationprinciplesof acronymsaresimilar to thoseof initial lettersequences,but therearedifferences.Acronymsaremorelikely thanlettersequencestohave additionallettersaddedbeyondthe initial lettersof theconstituentnon-functionwords;Cannon(page114)citesexamplessuchas � APEX � from advancepurchaseexcursion. And acronyms canbe longer than letter sequences:the initial letter se-quencesin Cannon’scorpushadmaximallyfive letters,whereasacronymscouldhaveasmany aseightletters(pages110–113).

Unlike abbreviations,both letter sequencesandacronymsarederivedmostoftenfrom multi-wordphrases.

How arethesevariousclassesof initialismsaccountedfor within thecurrentmod-el? Let usstartwith abbreviations,which aretheeasiestto describe.For standardizedabbreviations it makessenseto simply assumethat they are listed asan alternativeorthographicentryfor a theword,or words,thatthey areassociatedwith. This wouldyield a representationsuchastheonein (6.12)for � Dr � representingDoctor:

(6.12) > PHON ��E :�< 1KHUäw|På J < �ORTH NMæ : Q å J T L

Note that thereare somewords for which thereis no standardnon-abbreviatedform: in English theseincludeMrs (missusis a possiblefull spelling,but this is infact hardly ever used),andMs. In thesecasesoneassumesthereis simply no fullorthographicentry.

For novelabbreviations— caseslike � clsts� in (6.11)above— wemustassumeadevicewherebytheabbreviationmaybederivedproductively from thenormalspellingof theword. It is notentirelyclearwhattheconstraintsonabbreviation formationare:clearlyin Englishvowelsareparticularlyproneto beingdeleted,andthereseemsto beatendency to deletenon-initialconsonantstoo,but beyondthis it is hardto sayclearlywhatmakesagoodversusanunacceptableabbreviation. However, it seemslikely thatwhatever theconstraintsare,they canbedescribedin termsof regularrelations.Thisbeingso,wecanmodelproductively-formedabbreviationsbycomposinganadditionalabbreviation transducerç ontotheoutputof c dºfihkjèl : c dgfih-jèl ¾¹ç . This predictsthat abbreviations,as we have definedthem, can be formed purely on the basisofthe orthographicform. It mustthereforebe possibleto recastapparentphonologicalinfluenceson abbreviation (if any) in purelyorthographicterms.Whetherthis is trueor not remainsto beseen.

6.4. ABBREVIATORY DEVICES 199

For acronyms and letter sequencesthe modelmustbe different. Acronyms like� NATO � andletter sequenceslike � CIA � certainlyrepresent,respectively, NorthAtlantic TreatyOrganizationandCentral IntelligenceAgency(or Culinary Instituteof America). But they arenot generallyto be read assuch. Rather � NATO � , forinstance,is the orthographicrendition of a lexical item that happensto denotethesameasNorth AtlanticTreatyOrganization, but is pronounced/neIto/:

(6.13) > PHON ��? :�< � I J < ä � < @Cé < �ORTH NCê : Q ç J Qàë � Q4ì é T L

A similar analysiswould begivenfor � CIA � , whereherethesyllables/si aI eI/ arerepresentedorthographicallyby theletterswith thecorresondingnames:

(6.14) > PHON �=í � :=< 1 I J < � I

� < �ORTH N�î : Q�ï J Q ç �CT L

Thus,whereasabbreviationsmerelyconstituteshorterwaysof writing existing lexicalitems, acronyms and initial letter sequencescorrespondto new lexical items. Thecreationof acronyms and initial letter sequencesis thus a type of word formation(Aronoff, 1976;Cannon,1989),applying to the orthographic representationof theword, ratherthan(aswould normallybethecase)on thephonologicalrepresentation.Onceonehasthenew orthographicform, thepronunciationcanbederivedin oneoftwo ways. In the caseof acronyms the relation that mapsbetweenphonologyandorthographicform (“spelling rules”) canbe inverted(“grapheme-to-phonemerules”)to producea phonologicalrepresentation.In the caseof initial letter sequencesthephonologicalrepresentationis formed out of the normal namesof the letters; notehere,though,that otherdevicessuchasdescriptionsof the letter sequenceinvolved(Triple A for � AAA � ) arepossible.

While the rangeof abbreviatory devicespossiblefor Englishseemsto be widelyavailablefor many written languages,the distribution of the varying typesseemstodiffer from languageto language.(I am not awareof any cross-linguisticsurveys ofthedistributionsof typesof abbreviatorydevices.)In somelanguages,indeed,certaintypesseemto be essentiallylacking. For example,Chineseseemsto have very fewabbreviationsin thesensethatI haveusedthis term,andit will beinstructiveto digressfor a momentandconsiderthecaseof Chinese,sinceit offersaninterestingexampleof how differentpropertiesof thewriting systemcanresultin differentpossibilitiesforabbreviatorydevices.

In Chineseacronyms— termedsuoxie‘shrunkenwriting’ — abound:thusonefindsmany standardexampleslike ðeñ bei da for ð òeñ ó beijıng daxue ‘BeijingUniversity’; ô�õ dengxuan for ô ö¡÷£ø¥õ dengxiaopıngwenxuan(DengXiaopingselected-works)‘the selectedworksof DengXiaoping’; and øeù wenge for ø ú¡ñù û wenhua dagemıng ‘Cultural Revolution’ (Wang,1996). An examinationof the

200 CHAPTER6. FURTHER ISSUES

examplesgivenwill revealthatthepiecesselectedfor thesuoxieconstructionneednotcomefrom theinitial of thecorrespondingconstituent,unlike whatonewould almostinvariably find in English. Nonetheless,Chinesesuoxieare like English acronymsin that they are shortenedforms of longer constructionswhich, crucially, are readin their shortenedform, not expandedinto the constructionfrom which they werederived. Now, astheastutereaderwill have alreadynoted,thenatureof theChinesewriting systemmakes it impossibleto determinewhethersuoxie is more correctlyequatedwith acronyms,aswe haveheretoforeassumed,or with lettersequences.Thekey distinction is in how thesetwo kinds of constructionsare read: acronyms arepronouncedby applyingpronunciationrulesto thesequenceof symbols;initial lettersequencesarepronouncedby namingthe lettersin sequence.In Chinese,thesetworoutesyield thesameresultsincethepronunciationof a characteris alsothenameofthatcharacter.

As we noted,Chinesebasicallylacksabbreviationsin thesensethatwe have de-fined. With the exceptionof specialsymbolslike ‘%’ and ‘$’, which areexpandedinto thecorrespondingexpressionsfor ‘percent’,‘dollar’, etc.,thereareessentiallynoothercaseswherea shortenedform is expandedduring reading:this evenappliestoborrowedforms,like ü kg ý or ü cmý , which areobligatorily treatedasabbreviationsin English,but which Chinesereadersspell out assequencesof letters:thus ü kg ý isreadliterally ask. g. (Chilin Shih,personalcommunication).Theapparentavoidanceof treatingborrowedgraphicalelementsasabbreviationsto beexpandedwhenreadingcouldbeexplainedby thefact thatChinesehashistorically lackedabbreviations.Butwhy did it lackabbreviations?

I believe the explanationmay be dueto a conspiracy betweenpropertiesof Chi-nesemorphology, thenatureof theChinesescript,andthe functionof abbreviations.First, Chinesewordsaretypically short,one-andtwo-syllablebeingtheoverwhelm-ing majority. (This is with thenotableexceptionof nominalcompounds,whichcanbequite long.) In the earliestforms of Chinese,it is commonlyconjecturedthat wordswere largely monosyllabic(seevariouschaptersin (Packard,1998) for discussion),andevenin laterClassicalChinese,which wasthewritten standardup until theearlypartof thiscentury, monosyllabicwordsmadeupalargerproportionof wordsin atyp-ical text thanthey would in present-dayspokenMandarin.Second,Chinesecharactersphonologicallyalmostalwaysrepresentsinglesyllables.Third, asweobservedabove,abbreviationsaremostcommonlyusedto abbreviatesinglewords(thoughabbrevia-tions of phrasescertainlydo occur). In Chinese,then,all onecould hopeto gain inmostcaseswould betheshorteningof a word thatwould bewritten with two charac-ters(a two-syllableword) into a one-characterabbreviation,somethingthatwouldnothaveaffordedmuchof asavings.Therewasthereforelittle to begainedby introducinggraphicalshorteningdevicesin theform of abbreviations.25

Abbreviations,aswe havedefinedthem,area purelygraphicaldevice intendedtoshortentheformof writtenwords.Acronymsandinitial lettersequencesaresomewhatmorecomplex thanthis,but they toodependuponthewritten form of words.As such,all of theseformsof “initialisms” haveaplacein acompletemodelof writing systems.

25Notethatothergraphicalshorteningdeviceswereemployed,suchasaspecialsymbolto indicateredu-plicatedcharacterssimilar in functionto thereduplicationmarkersdiscussedin Section4.4.2.

6.5. NON-BLOOMFIELDIAN VIEWS ON WRITING 201

In thissectionwehavemadesometentativestepstowardsfitting thesedevicesinto theproposedmodel.

6.5 Non-BloomfieldianViewson Writing

It hasoftenbeennotedthat scholarsof languagehave beendivided in their attitudesaboutwriting alongtwo partly independentdimensions.The primary dimensionre-latesto whetherthestudyof writing is eveninteresting:Bloomfieldis generallycitedasthesourceof theview thatwriting itself is not interesting(sincewritten languageismerelyatbestacrudeapproximationof spokenlanguage,thetrueobjectof study),andthisview hasto a largeextentsurvivedin moderngenerativelinguistics.26 Theseconddimension,assumingoneat leastacceptsthat writing systemsmight be interesting,is how written languagerelatesto language.Specifically, is it in Bloomfield’s termsa “way of recording[spoken] languageby meansof visible marks”,or is it indeedaseparateform of communicationthatneednot relateto spokenlanguageat all?27

ThePraguians,mostnotablyVachek(e.g. (Vachek,1973))have beenamongthemoststaunchdefendersof theview thatwritten languageshouldbetreatedseparatelyfrom spoken language,but a numberof British linguists, including Sampson(1985)andHarris(1995),havechallengedtheessentially“glottocentric”Bloomfieldianview.For example,aswe discussedearlier, Sampsondistinguishesbetweenglottographicwriting systems,wherethewritten symbolsrepresentsomeaspectof specificallylin-guisticinformation;and“semasiographic”writing systems,wherethewrittensymbolsdirectlyrepresentthe“meaning”of theintendedmessage,giving noinformationabouthow onewould actuallyexpressthis meaninglinguistically. Crucially, for Sampsonboth of thesekinds of systemscount aswriting, and this view is echoedby Harris(1995).

Perhapsthe clearestinstanceof the opposingview is offered by DeFrancis(1989),whodefines“writing system”assynonymouswith “glottographicwriting sys-tem”.28 Onecentralreasonis thatif oneinsiststhatwriting systemsareexactly thosegraphicalsystemsof communicationwhereinonecanexpressany messageexpressiblein spokenlanguage,thentheonly extantsystemsthatmeetthis requirementareglot-tographicones.To besure,therearemany highly complex notationalsystemsthatarearguablynot glottographic,andwhich allow for a rich setof expressions:mathemati-calnotation,danceand othermovementnotationsystems,music(seevariouschaptersin (DanielsandBright, 1996)),andeventheicon-basedmessagesone frequentlyseesin (especiallyEuropean)instructionmanuals(Sampson,1985,pages31–32). All of

26It would bea mistake, however, to view this asany kind of religiousdogmain generative linguisticsthatis inculcatedin succeedinggenerationsof disciples.WhenI wasagraduatestudentat MIT in theearly80’s, I donot recallwriting systemsbeingdiscussedin any form, eitherpositively or negatively. Ratherthelack of attentionto writing systemsamonggenerative linguistsarises,I believe, simply becausefew in thattraditionhave thoughtmuchaboutthetopic,andnonehave beenencouragedto doso.

27Of course,asHarris rightly observes(1995,page45), the existenceof Braille shows that the writingdoesnot have to bevisibly arranged,merelyspatiallyarranged.

28Indeed,aswe have seen,he goesmuchfurther thanthis, insistingthat full writing systemsmustnotonly beglottographic,but at leastto somedegreephonographic.

202 CHAPTER6. FURTHER ISSUES

theseexamplesinvolvesymbologythatis conventionalto agreateror lesserdegree;allof themareclearlysystemsof communicationthatarecomplex to a greateror lesserdegree;andall of themareableto communicatemessagesthatcanbequitecomplex,especiallyif onehad to put them into words. But all of themare highly restrictedin the domainto which they apply, andthis is, to take DeFrancis’position,the cruxof the matter. To make the samepoint in a differentway, while it maybe painful topreciselyexpressa complex mathematicalexpressionusingordinarywritten English,this is somethingthatcouldbedone,just asonecouldreadtheexpressionaloud,andhave it be understoodby someonewith sufficient mathematicalknowledge. In con-trast, it would be hardto seehow onecould representthe AmericanDeclaration ofIndependenceusingonly thesymbologyof mathematics.Thusglottographicsystemsaregeneral,whereasarguablysemasiographicsystemsarerestricted.

Shouldnon-glottographicsystemsbe consideredwriting? On the faceof it thiswould appearto be purely a matterof definition, and hardly worth arguing about.However, thereis oneimportantpropertythatnon-glottographicsystemssuchasmath-ematicalnotationsharewith (glottographic)written language,which neitherof themsharewith speech,andthatis theuseof atwo-dimensionalsurface:speechis producedandprocessedover time, andthereforecouldbe consideredto be a one-dimensionalsignal; written forms, whetherglottographicor otherwise,usuallyhave two dimen-sionsat their disposal,andfrequentlymake useof themin waysthathave no parallelin speech.

Forexample,asHarrisnotes(pages141–144),crucialusehasbeenmadein diversemathematicaltraditionsof tabular arrangementsof symbols:multiplicationtables,andlogarithmtablesare just two moderninstancesof these.The tabular arrangementiscrucial for representingthe relevant mathematicalconcepts:for example,in a mul-tiplication table,oneunderstandsthe entriesin eachcell of the tableasrepresentingthe productof the numberheadingthe relevantcolumn,andthe numberheadingtherelevantrow. Speechcannotadequatelyrepresentthis two dimensionalstructure.AsHarrisobserves(page144): “If mathematicshadhadto rely onspeechasits cognitivemode,we shouldstill beliving in a primitiveagriculturalsociety.”29

Similar usesof two-dimensionallayout can also be found, of course,in glotto-graphicwriting. OnecasethatHarris pointsto is patternpoetry, wherewordsin thepoemarearrangedsoasevokeapicture;therearealsoinstancesof “patternprose”,themostfamousof thesein Englishbeing,perhaps,themouse’stale/tailin Lewis Carroll’sAlice’sAdventuresin Wonderland. A moderninstanceis thee-mailsignatureblock,amade-up(but perfectlyrealistic)exampleof which is givenin Figure6.2;herewe seethe useof two-dimensionalarrangementin the separationof the postaladdressfromthenamein thelefthandcolumn,thee-mailaddressin thetop righthandcolumn,and

29Having saidthis, it shouldalsobepointedout thattherearesituationswhereoneis forcedto representtabular informationin speech:suchis the caseof readingsystemsfor the blind. Oneingenioustechnicalsolutionthatcircumventssomeof thelimitationsof normalspokenlanguagewasdevelopedby T.V. Raman(1994)in his text-preprocessingsystemnamedAster. Raman’s systemrendersLATEXdocumentsinto speech(usingtheDECTalk TTSsystem),andincludesmethodsfor renderingvariouslevelsof documentstructure,aswell asmathematicalexpressions— includingmatrices,which areof coursea kind of tabular represen-tation. Tablesarereadleft-to-right andtop to bottom,with thepositionin thetablebeingmimickedby theperceivedpositionof thevoicein theauditoryfield.

6.6. POSTSCRIPT 203

---------------------------------------------------------’:’...’’. ---

Michael Farber [email protected] ’::.’’’’’:;:’::; ;;:

Lucent Technologies :::,, :;;1432 Pine St., 3D-403 Lucent Technolo-

gies ;:;’ ;:;Liberty Corner Bell Labs Innovation-

s ;;: .;:New Jersey, 07934 :;:, ,:;:phone: 908-712-9993 fax: 908-712-

9980 ’:;;;:;:’

Figure6.2: Two-dimensionallayoutin ane-mailsignature

the integrationof the verbiagefrom the company logo (“Lucent Technologies,BellLabs Innovations”) into the restof the design. Therearemany otherexamplesthatcouldbegiven.

Sonon-glottographicformsof “writing” sharewith glottographicformstheprop-erty of usingtwo-dimensionalspacein waysthathave no direct counterpartin (one-dimensional)speech.Furthermore,this “layout analysis”(to borrow andsomewhatadapta term from documentimageprocessing)is clearly a field worthy of study inits own right. But whatarewe to make of this observation?Doesit forceus to view(e.g.) mathematicalnotationandordinarywritten Englishasbeingtwo instancesofthe sameclassof object? And must the term “writing” apply to both? I fail to seewhy: presumablyonecouldrestricttheterm“writing” to glottographicrepresentation-al systems,andusea separateterm to denoteforms of symbolicrepresentationthatmakecrucialuseof two dimensions.Glottographicwriting systems,in their full glory,would beinstancesof both;mathematicalandothernon-glottographicsystemswouldbeinstancesonly of thelatter. It comesdown, afterall, to a matterof definition.

Of coursethis view doesentail thatthereareinterestingaspectsof (glottographic)writing that go beyond the way in which writing representsspeech.This conclusionseemsincontrovertible,but it is importantto realizethatthis in no waycontradictsthetheoryof writing systemspresentedin thisbook,whichdealssolelywith themappingbetweenwrittenandspokenform.

6.6 Postscript

I havepresentedaformal theoryof orthography, makingspecificproposalsaboutwhatlinguisticobjectsarerepresented,whatlevelof linguisticrepresentation(whatwehavetermedthe ORL) may be represented,andwhat the constraintson the mappingbe-

204 CHAPTER6. FURTHER ISSUES

tweenlinguistic andgraphicalrepresentationare.Thequestionof whatspecifickindsof linguistic objectsarerepresentedis, of course,a topic that hasoccupiedmuchofthe literatureon writing systems;the level of linguistic representationhasonly beendiscussedextensively in the psycholinguisticliterature,andthereonly in superficialterms; constraintson the mappingbetweenlinguistic andwritten form have hardlybeendiscussedat all. Thereis thereforesomereasonto believe that thecurrentworkis the mostsystematicformal proposalfor a theoryof writing systemspresentedtodate.It is, nonetheless,only a beginning,andit is hopedthatthis work will serveasastimulusfor developinga muchmorecompletetheoryof writing systemsby a muchwider groupof researchers.

Sucha theoryis clearlynecessaryfor a varietyof reasons.Consider, for examplethatorthographicevidencehasbeenoccasionallyusedby generative linguiststo sup-port oneor another(usually)phonologicaltheory. We have alreadydiscussedChom-sky andHalle’s views on English spellingandits relationto their modelof Englishphonology;onecouldaddto this Steriade’s (1982)andsubsequentlyMiller’ s (1994)useof evidencefrom Linear B to supporta modelof syllablestructurefor Greek.Miller’ sstudyis broadin scopeandsystematic,but mostuseof orthographicevidencethatonefindsin theliterature,includingSteriade’s,andChomsky andHalle’s is limit-ed,andmostlyadhoc.In somecases(plausiblySteriade’s)theanalysismayturnouttobecorrect;in others(Chomsky andHalle’s) it is suspect.But therealpointhereis thata fortuitouslyselectedorthographicfactheldup asevidencefor a particularlinguisticclaim cannotbe readily evaluatedin the absenceof a serioustheoryof orthography.Thereis nothingspecialaboutorthographyin this regard:in anentirelysimilar veinaphonologicalfactoidbroughtin asevidencefor a particularsyntacticanalysisshouldnot be taken seriouslywithout a goodunderstandingof the relationbetweensyntaxandphonology. Orthographydeservesthesamelevel of respect.

A coherenttheoryof therelationbetweenwriting andlinguistic form is alsoneed-edin speechtechnology, whichwasthestartingpoint of our discussion.Many speechtechnologyresearchers,both in text-to-speechandautomaticspeechrecognition,im-plicitly view the standardorthographyfor a languagelike Englishasa poor kind ofphonetictranscription. Thusone hearstermslike “letter-to-soundrules” usedas ifsomehow the sequenceof letters ü enoughý wassimply a lousy phonetictranscrip-tion of thesequenceof soundsthatwould (in standardIPA) berepresentedby / þ nÿ f/.Recognizingthatfor a complex orthographylike English,thedevelopmentof a letter-to-soundcomponentis amajorundertaking,therehasbeen,overthepastdecadeor so,a largeamountof interestin automaticmethodsfor acquiringletter-to-soundsystem-s: (Sejnowski andRosenberg, 1987;Luk andDamper, 1993;AdamsonandDamper,1996;Luk andDamper, 1996;Daelemansandvan denBosch,1997) aresomeof the betterknown instancesof these. Suchsystemscanautomatically“learn” thecontext-dependenttransductionsneededfor a systemlike thatof English,so thatonemight expectrough, troughandthroughto becorrectlypronounced.(To date,though,nobodyhasyet demonstratedperformance,for English,at the level of a moretradi-tionally designedsystemincluding a dictionaryplus morphologicalor phonologicalrules.) In sodoingsuchsystemstake advantageof a crucialpropertyof Englishor-thography:while themappingbetweena particularletteranda givensoundis highly

6.6. POSTSCRIPT 205

complex, onecanalmostalwaysfind agoodanswerby lookingat thecontext within afairly smallwindow (sayplusor minusfour letters)aroundthetargetletter.

Evenso,thereis still a largeamountof indeterminacy. Alternationslike produce(noun)versusproduce(verb),or axes/ � æksþ z/ (plural of ax) versusaxes/ � æksiz/(plu-ral of axis), demonstratethat onehasto be preparedto make useof informationnotfound in the letter string alone: in generalonemustuselexical, grammaticalor se-mantic information that can only be inferred from examining a wider context thanjust the individual word. Letterstringsin Englishdo encodepronunciation,but onlyin combinationwith otherinformationthatcannotbecomputedfrom the letterstringalone.

So,to theextentthatautomaticmethodswordpronunciationpresumethatall of theinformationneededto pronounceaword is foundin its letterstring,they aremissingabasicpoint abouthow writing systemsrepresentlinguistic information. Orthographyis not phonetictranscription;ratherit is a guideto the native readerof the languagethat frequentlygivesa large amountof informationabouthow to pronouncewords,but alsoinvariablyassumesthat the readerhasotherlinguistic knowledgeto bring tobearontheproblemof decodingthemessage.This is akey reasonwhy automaticTTSconversionis sohardto do right: mostof thelinguistic knowledgethathumansbringto bearon thetaskof readingis simply missingfrom TTS systems.

To furtherdrive homethis point, consideragainthecaseof Russianorthography.As wehaveseen,Russianspellingis highly regular(much moresothanEnglish),butthereis onecrucial pieceof informationmissingfrom the standardspelling,namelythe lexical stressplacement,without which onecannotpredictthe quality of variousvowels in the word: in Russian,lexically-determinedstressplacementof the kindillustratedby Englishproduce/produceis rampant.Stressplacementinformationcanbe predictedfrom morphologicalinformation, and if such information were addedto the strings(either by somedictionary-plus-rule-basedprocedure,or by someas-yet-to-be-developedhigh accuracy automaticinferenceprocedure),thenof coursethevariousautomaticschemesthathavebeenproposedshouldhavenotroublelearningtherelationbetweentheseannotatedorthographicstrings,andthepronunciation.But thisexercisewould largely defeatthe statedpurposeof mostwork on automaticlearningof “letter-to-sound”rules in that therewould not be muchsavings of labor. On theonehand,developingthemorphologicalanalysistools for Russiansuchthatonecanpredicttheappropriatemorphologicalfeaturesto addto agivenstringis itself amajorundertaking;30 andonceonehasthis portionof thesystem,developingthe“letter-to-sound”rulesis relatively straightforward. It shouldbeaddedthat to datetherearenoknown methodsby which themorphologicalsystemof a languageasmorphologicallycomplex asRussiancould be automaticallylearned;automaticmethodsthus fail tosave laborpreciselywherethesavingsis neededmost.

It shouldbeclearthatmy goalis not to argueagainsttheinvestigationof automaticmethodsfor learningword pronunciation:on thecontrary, investigationof theseandsimilarmachine-learningproblemsareaninterestingandimportantline of inquiry. Butsuchinvestigationsshouldbegroundedin a properunderstandingof thephenomenon

30In our experience,severalmonthsat leastis required:see(Sproat,1997b).

206 CHAPTER6. FURTHER ISSUES

thatoneis attemptingto investigate,andthisunderstandingis frequentlylackingin thespeechtechnologycommunity. As a resultany regularattendeeof speechtechnologyconferenceswill be subjectedto a seriesof quite surprisingclaimsto the effect thatsincesuch-and-suchan automaticmethodperformswith (say) a 10% error rate onEnglishword pronunciation(which is well-known to be the hardestlanguage,or sothesuppositiongoes),thesametechniquecanbeappliedto any otherlanguage,thusobviating theneedfor manuallinguistic labor. Worse,sincemostregularattendeesatspeechconferencesdo not know any better, suchclaimsdo not raisethe numbersofeyebrowsthatthey oughtto.

It is certainlyoptimisticto assumethata well-articulatedformal theoryof writingsystemswill ipso facto raisethe generallevel of awarenessof orthographyand itsrelation to linguistic form. But it is alsocertainthat without sucha theory, writingsystemswill not generallybedeemedworthyof seriousstudyby theoreticallinguists,norwill muchattentionbepaidto their propertiesby speechtechnologists.

References 207

REFERENCES

Adamson,Martin andRobertDamper. 1996. A recurrentnetwork that learnsto pro-nounceEnglish text. In Proceedingsof the Fourth InternationalConferenceonSpokenLanguageProcessing, volume3, pages1704–1707,Philadelphia,PA. IC-SLP. l2s.

Allen, Jonathan,M. SharonHunnicutt,andDennisKlatt. 1987.FromText to Speech:theMITalk System. CambridgeUniversityPress,Cambridge.

Aronoff, Mark. 1976. Word Formation in Generative Grammar. MIT Press,Cam-bridge,MA.

Aronoff, Mark. 1985.Orthographyandlinguistic theory. Language, 61(1):28–72.

Aronson,Howard. 1996. Yiddish. In PeterDanielsandWilliam Bright, editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages735–742.

Avanesov, R. I. 1974. Russkayadialektnayafonetika(RussianDialect Phonetics).Prosvescenie,Moscow.

Avanesov, R. I., editor. 1983. Orfoepicheskiyslovar’ russkogo yazyka(OrthoepicDictionaryof theRussianLanguage). RusskiyYazyk,Moscow.

Balota,David, GiovanniFloresd’Arcais, andKeith Rayner, editors. 1990. Compre-hensionProcessesin Reading. LawrenceErlbaumAssociates,Hillsdale,NJ.

Baluch,B. andDerk Besner. 1991. Visual word recognition:Evidencefor strategiccontrolof lexical andnonlexical routinesin oral reading.Journalof ExperimentalPsychology: Learning, Memory, andCognition, 17:644–652.

Baxter, William. 1992.A Handbookof Old ChinesePhonology. Number64 in Trendsin Linguistics:StudiesandMonographs.MoutondeGruyter, Berlin.

Bell, AlexanderMelville. 1867. Visible Speech: TheScienceof Universal Alphabet-ics; or Self-InterpretingPhysiological Letters, for theWriting of All LanguagesinOneAlphabet. Simpkin,Marshall,London.

Bennett,Emmett.1996.Aegeanscripts.In PeterDanielsandWilliam Bright, editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages125–133.

Besner, Derk andD. Hildebrandt.1987. Orthographicandphonologicalcodesin theoral readingof Japanesekana. Journal of ExperimentalPsychology: Learning,Memory, andCognition, 13:335–343.

Besner, Derk andMarilyn ChapnikSmith. 1992. Basicprocessesin reading:Is theorthographicdepthhypothesissinking? In RamFrostandLeonardKatz, editors,Orthography, Phonology, Morphology and Meaning, number94 in AdvancesinPsychology. North-Holland,Amsterdam,pages45–66.

208 References

Bird, Steven. 1995. ComputationalPhonology. CambridgeUniversity Press,Cam-bridge.

Bird, Steven. 1999. Strategiesfor representingtonein African languages:A criticalreview. WrittenLanguageandLiteracy, 2(1). To appear.

Bird, StevenandT. Mark Ellison. 1994. One-level phonology:Autosegmentalrepre-sentationsandrulesasfinite automata.ComputationalLinguistics, 20(1):55–90.

Bird, StevenandEwanKlein. 1994.Phonologicalanalysisin typedfeaturestructures.ComputationalLinguistics, 20:455–491.

Bird, Steven andMark Liberman. 1999. A formal framework for linguistic annota-tion. TechnicalReportMS-CIS-99-01,Departmentof ComputerandInformationScience,Universityof Pennsylvania,Philadelphia.

Bloomfield,Leonard.1933.Language. Holt, RinehartandWinston,New York.

Bloomfield, Leonardand ClarenceBarnhart. 1961. Let’s Read: A Linguistic Ap-proach. WayneStateUniversityPress,Detroit,MI.

Bonfante,Larissa. 1996. The scriptsof Italy. In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages297–311.

Booij, Geert. 1996. Verbindingsklanken in samenstellingenen de nieuwespellingregeling.NederlandseTaalkunde, 2:126–134.

BrandtCorstius,Hugo, editor. 1968. Grammars for NumberNames. Number7 inFoundationsof Language,SupplementarySeries.D. Reidel,Dordrecht.

Bright, William. 1996. TheDevanagariscript. In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages384–390.

Broderick,George. 1984a.A Handbookof LateSpokenManx: Dictionary, volume2.Max NiemeyerVerlag,Tubingen.

Broderick,George. 1984b. A Handbookof LateSpokenManx: GrammarandTexts,volume1. Max NiemeyerVerlag,Tubingen.

Browman,CatherineandLouis Goldstein.1989. Articulatory gesturesasphonologi-calunits. Phonology, 6:201–251.

Browne,Wayles. 1993. Serbo-Croat.In BernardComrieandGreville Corbett,edi-tors,TheSlavonicLanguages. Routledge,London,pages306–387.

Bunis,David. 1975.A Guideto ReadingandWriting Judezmo. TheJudezmoSociety,Brooklyn,NY.

References 209

Cannon,Garland. 1989. Abbreviationsand acronyms in English word-formation.AmericanSpeech, 64:99–127.

Carlton,Terence.1990. Introductionto the Phonological History of theSlavicLan-guages. Slavica,Columbus,OH.

Chen,Hsuan-ChihandOvid Tzeng,editors.1992. LanguageProcessingin Chinese.Number90 in Advancesin Psychology. North-Holland,Amsterdam.

Chomsky, NoamandMorris Halle. 1968. TheSoundPatternof English. HarperandRow, New York.

Chou, Phil. 1989. Recognitionof equationsusing a two-dimensionalstochasticcontext-free grammar. Technicalreport,AT&T Bell Laboratories,Murray Hill,NJ,August.

Church,Kenneth.1980.Onmemorylimitationsin naturallanguageprocessing.Mas-ter’s thesis,MassachusettsInstituteof Technology, Cambridge,MA.

Church,KennethandPatrick Hanks.1989.Word associationnorms,mutualinforma-tion andlexicography. In 27th AnnualMeetingof the Associationfor Computa-tional Linguistics, pages76–83,Morristown, NJ. Associationfor ComputationalLinguistics.

Ci Hai. 1979.Ci Hai. ShanghaiLexiconPublishingSociety, Shanghai.

Clements,G. Nick. 1985. The geometryof phonologicalfeatures. In C. EwenandJ. Anderson,editors,Phonology Yearbook2. CambridgeUniversity Press,pages225–252.

Coleman,John.1998. Phonological Representations:Their Names,FormsandPow-ers. Number85 in CambridgeStudiesin Linguistics.Cambridge.

Coulmas,Florian. 1989.TheWriting Systemsof theWorld. Blackwell,Oxford.

Coulmas,Florian. 1994. Typology of writing systems. In Hugo Steger and Her-bert ErnstWiegand,editors,Schrift und Schriftlichkeit/Writing and its Use, vol-ume2. WalterdeGruyter, Berlin, chapter118,pages1380–1387.

Cregeen,Archibald. 1835. Fockleyr ny Gaelgey (A Dictionary of Manx). Yn Che-shaghtGhailckagh,Douglas,Isleof Man. Reprintedin 1971.

Cubberley, Paul. 1996. The Slavic alphabets.In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages346–363.

Cummings,D. W. 1988.AmericanEnglishSpelling:An InformalDescription. JohnsHopkinsUniversityPress,Baltimore,MD.

210 References

Daelemans,Walter and Antal van den Bosch. 1997. Language-independentdata-orientedgrapheme-to-phonemeconversion. In Janvan Santen,RichardSproat,JosephOlive,andJuliaHirschberg,editors,Progressin SpeechSynthesis. Springer,New York, NY, pages77–89.

Daniels,Peter. 1991a. Is a structuralgraphemicspossible? In LACUSForum, vol-ume18,pages528–37.

Daniels,Peter. 1991b. Replyto herrick. In LACUSForum, volume21,pages425–31.

Daniels,Peter. 1996a.Aramaicscriptsfor Aramaiclanguages.In PeterDanielsandWilliam Bright, editors,The World’s Writing Systems. Oxford University Press,New York, NY, pages499–514.

Daniels,Peter. 1996b. The studyof writing systems.In PeterDanielsandWilliamBright, editors,TheWorld’sWriting Systems. Oxford UniversityPress,New York,NY, pages3–17.

Daniels,Peterand William Bright. 1996. The World’s Writing Systems. OxfordUniversityPress,New York, NY.

de Gelder, BeatriceandJose Morais, editors. 1995. Speech and Reading. Erlbaum(UK) Taylor andFrancis,Hove.

DeFrancis,John. 1984. The ChineseLanguage: Fact and Fantasy. University ofHawaii Press,Honolulu,HI.

DeFrancis,John. 1989. Visible Speech: TheDiverseOnenessof Writing Systems.Universityof Hawaii Press,Honolulu,HI.

DeFrancis,JohnandJ.MarshallUnger. 1994.Rejoinderto Geoffrey Sampson:“Chi-nesescriptandthediversityof writing systems”.Linguistics, 32:549–554.

Diller, Anthony. 1996. Thai andLao writing. In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages457–466.

Dutoit, Thierry. 1997. An Introductionto Text-to-Speech Synthesis. Kluwer, Dor-drecht.

Faber, Alice. 1992. Phonemicsegmentationasepiphenomenon.evidencefrom thehistoryof alphabeticwriting. In PamelaDowning,SusanLima, andMichaelNoo-nan,editors,TheLinguisticsof Literacy. JohnBenjamins,Amsterdam,pages111–34.

Fano,R. 1961.Transmissionof Information. MIT Press,Cambridge,MA.

Fant,Gunnar. 1960.AcousticTheoryof Speech Production. Mouton,TheHague.

Fischer, Steven. 1997a.Glyphbreaker. Copernicus,New York.

References 211

Fischer, Steven. 1997b. Rongorongo: TheEasterIsland Script: History, Tradition-s, Texts. Number14 in Oxford Studiesin AnthropologicalLinguistics.OxfordUniversityPress,Oxford.

Flesch,Rudolf. 1981. WhyJohnnyStill Can’t Read: A New Look at the Scandalofour Schools. HarperCollins,New York.

Floresd’Arcais, Giovanni,Hirofumi Saito,andMasahiroKawakami. 1995. Phono-logicalandsemanticactivationin readingkanji characters.Journalof Experimen-tal Psychology, 21(1):34–42.

Frost,RamandLeonardKatz, editors. 1992. Orthography, Phonology, Morphologyand Meaning. Number94 in Advancesin Psychology. North-Holland,Amster-dam.

Frost, Ram, LeonardKatz, and ShlomoBentin. 1987. Strategies for visual wordrecognitionandorthographicaldepth:A multilingual comparison.Journal of Ex-perimentalPsychology: HumanPerceptionandPerformance, 13:104–115.

Fujimura, Osamuand R. Kagaya. 1969. Structuralpatternsof Chinesecharacter-s. In Proceedingsof the InternationalConferenceon ComputationalLinguistics,pages131–148,Sanga-Saby, Sweden.InternationalConferenceonComputationalLinguistics.

Gardiner, Alan. 1982. EgyptianGrammar. Griffith Institute,AshmoleanMuseum,Oxford, third edition.

Gelb,Ignace.1963.A Studyof Writing. ChicagoUniversityPress,2ndedition.

Giammarressi,Dora and Antonio Restivo. 1997. Two-dimensionallanguages. InGrzegorzRozenberg andArto Salomaa,editors,Handbookof Formal Languages.Springer-Verlag,Berlin, pages215–267.

Haas,W. 1983. Determiningthelevel of a script. In FlorianCoulmasandK. Ehlich,editors,Writing in Focus. Mouton,Berlin, pages15–29.

Haile, Getatchew. 1996. Ethiopic writing. In PeterDanielsand William Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages569–576.

Harbaugh,Rick. 1998. ChineseCharacters: A Genealogy and Dictionary. Zhong-wen.com.

Harris,Roy. 1995.Signsof Writing. Routledge,London.

Harrison,Michael. 1978. Introductionto FormalLanguageTheory. AddisonWesley,Reading,MA.

Hary, Benjamin. 1996. Adaptationsof Hebrew script. In PeterDanielsandWilliamBright, editors,TheWorld’sWriting Systems. Oxford UniversityPress,New York,NY, pages727–734.

212 References

Hopcroft, JohnandJeffrey Ullman. 1979. Introductionto AutomataTheory, Lan-guagesandComputation. Addison-Wesley, Reading,MA.

Horodeck,Richard. 1987. TheRoleof Soundin Readingand Writing Kanji. Ph.D.thesis,CornellUniversity, Ithaca,NY.

Hung, Daisy, Ovid Tzeng,and AngelaTzeng. 1992. Automatic activation of lin-guistic informationin Chinesecharacterrecognition. In RamFrostandLeonardKatz, editors,Orthography, Phonology, Morphology andMeaning, number94 inAdvancesin Psychology. North-Holland,Amsterdam,pages119–130.

Hurford, James.1975. TheLinguistic Theoryof Numerals. CambridgeUniversityPress,Cambridge.

Ifans,DafyddandRobertThomson.1979.EdwardLhuyd’sGeirieuManaweg. StudiaCeltica, 14:127–167.

Instituut voor NederlandseLexicologie. 1995. Woordenlijst NederlandseTaal. SduUitgevers,TheHague.

Johnson,C. Douglas. 1972. Formal Aspectsof Phonological Description. Mouton,Mouton,TheHague.

Kanwisher, N. 1987. Repetitionblindness:Typerecognitionwithout tokenindividu-ation. Cognition, 27:117–143.

Kaplan,RonaldandMartin Kay. 1994.Regularmodelsof phonologicalrule systems.ComputationalLinguistics, 20:331–378.

Karttunen,Lauri. 1995. Thereplaceoperator. In 33rd AnnualMeetingof theAssoci-ation for ComputationalLinguistics, pages16–23,Cambridge,MA. ACL.

Karttunen,Lauri. 1998. Thepropertreatmentof optimality in computationalphonol-ogy. In Lauri KarttunenandKemalOflazer, editors,FSMNLP’98: Proceedingsof theInternationalWorkshopon Finite StateMethodsin Natural Language Pro-cessing, pages1–12,BilkentUniversity, Ankara.

Karttunen,Lauri andKennethBeesley. 1992. Two-level rule compiler. TechnicalReportP92–00149,XeroxPaloAlto ResearchCenter.

Katz,LeonardandLaurieFeldman.1983.Relationbetweenpronunciationandrecog-nition of printedwordsin deepandshallow orthographies.Journalof Experimen-tal Psychology: Learning, MemoryandCognition, 9:157–166.

Katz, LeonardandRamFrost. 1992. The readingprocessis different for differen-t orthographies:the orthographicdepthhypothesis. In Ram FrostandLeonardKatz, editors,Orthography, Phonology, Morphology andMeaning, number94 inAdvancesin Psychology. North-Holland,Amsterdam,pages67–84.

References 213

Kaye,Alan. 1996.Adaptationsof Arabicscript. In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages743–762.

King, Ross.1996. Koreanhankul. In PeterDanielsandWilliam Bright, editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages218–227.

Kiraz, George. 1999. ComputationalApproach to Non-LinearMorphology. Cam-bridgeUniversityPress,Cambridge.

Klima, Edward. 1972. How alphabetsmight reflect language. In JamesKavanaghandIgnatiusMattingly, editors,Language by Ear andby Eye: TheRelationshipsbetweenSpeech andReading. MIT Press,Cambridge,MA, pages57–80.

Knight,Stan.1996.Theromanalphabet.In PeterDanielsandWilliam Bright,editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages312–332.

Koskenniemi,Kimmo. 1983. Two-Level Morphology: a General ComputationalModel for Word-Form Recognition and Production. Ph.D. thesis,University ofHelsinki,Helsinki.

Krivickij, A. andA. Podluzhnyj. 1994.UchebnikBelorusskogo Yazyka. VyshejshayaShkola,Minsk.

Kucera,H. andW. Francis.1967. ComputationalAnalysisof Present-DayAmericanEnglish. Brown UniversityPress,Providence.

Law, SamPoandAlfonso Caramazza.1995. Cognitive processesin writing Chinesecharacters:Basicissuesandsomepreliminarydata.In BeatricedeGelderandJoseMorais, editors,Speech and Reading. Erlbaum(UK) Taylor andFrancis,Hove,pages143–190.

Lehman,Winifred andLloyd Faust,1951. A Grammarof Formal WrittenJapanese,chapterSupplement:Kokuji, by R. P. Alexander. HarvardUniversityPress,Cam-bridge,MA.

Lejeune,Michel. 1974.Manueldela LangueVenete. CarlWinter, Heidelberg.

Levin, Estherand RobertoPieraccini. 1991. Dynamic planarwarping and planarhiddenmarkov modeling:from speechto opticalcharacterrecognition.Technicalreport,AT&T Bell Laboratories,Murray Hill, NJ,November.

Lewis, Harry andChristosPapadimitriou.1981.Elementsof theTheoryof Computa-tion. Prentice-Hall,EnglewoodCliffs, NJ.

Luk, RobertandRobertDamper. 1993. Experimentswith silent-eandaffix corre-spondencesin stochasticphonographictransductions.In Proceedingsof theThirdEuropeanConferenceonSpeechCommunicationandTechnology, volume2,pages917–920,Berlin. ESCA.

214 References

Luk, Robertand RobertDamper. 1996. StochasticphonographictransductionforEnglish.ComputerSpeech andLanguage, 10:133–153.

Lyosik, Yazep. 1926. GramatykaBelaruskaeMovy: Fonetyka. Publishedby theAuthor, Minsk.

MacMahon,Michael. 1996. Phoneticnotation. In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages821–846.

Macri, Martha. 1996. Maya andotherMesoamericanscripts. In PeterDanielsandWilliam Bright, editors,The World’s Writing Systems. Oxford University Press,New York, NY, pages172–182.

Maksymiuk,Jan. 1999. An orthographyon trial in Belarus. Written Language andLiteracy, 2(1):141–144.

Marantz,Alec. 1982.Rereduplication.LinguisticInquiry, 13:435–482.

Mastroianni,M. andBob Carpenter. 1994. Constraint-basedmorpho-phonology. InProceedingsof theFirst ACL SIGPHONWorkshop, LasCruces,NM. ACL.

Matsunaga,Sachiko. 1994. TheLinguisticandPsycholinguisticNature of Kanji: DoKanji RepresentandTrigger only Meanings?Ph.D.thesis,Universityof Hawaii,Honolulu,HI.

McManus,Damian. 1996. Celtic languages.In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages655–660.

Merrifield, William. 1968. Numbernamesin four languagesof Mexico. In HugoBrandtCorstius,editor, Grammars for NumberNames, number7 in Foundationsof Language,SupplementarySeries.D. Reidel,Dordrecht,pages91–102.

Miller, D. Gary. 1994.AncientScriptsandPhonological Knowledge. Number116inCurrentIssuesin LinguisticTheory. JohnBenjamins,Amsterdam.

Mohanan,K. P. 1986.TheTheoryof Lexical Phonology. D. Reidel,Dordrecht.

Mohri, Mehryar. 1994. Syntacticanalysisby local grammarsautomata:anefficien-t algorithm. In Papers in ComputationalLexicography: COMPLEX’94, pages179–191,Budapest.ResearchInstitutefor Linguistics,HungarianAcademyof Sci-ences.

Mohri, Mehryar. 1997. Finite-statetransducersin languageandspeechprocessing.ComputationalLinguistics, 23(2).

Mohri, MehryarandRichardSproat.1996.An efficientcompilerfor weightedrewriterules. In 34thAnnualMeetingof the Associationfor ComputationalLinguistics,pages231–238,SantaCruz,CA. ACL.

References 215

Mountford, John. 1996. A functionalclassification. In PeterDanielsandWilliamBright, editors,TheWorld’sWriting Systems. Oxford UniversityPress,New York,NY, pages627–632.

Myers,James.1996. Prosodicstructurein Chinesecharacters.Presentedasa Posterat the5th InternationalConferenceon ChineseLinguistics,TsingHuaUniversity,Taiwan,June.

NanyangSiangPau. 1984. Learner’s ChineseEnglishDictionary. NanyangSiangPau,UmumPublisher, Singapore.

Needham,J. 1959. ScienceandCivilisation in China: Vol. 3 — MathematicsandtheSciencesof theHeavensandtheEarth. CambridgeUniversityPress,Cambridge.with thecollaborationof WangLing.

Neijt, Anneke andAnneke Nunn. 1997. The recenthistory of Dutch orthography:Problemssolvedandcreated.LeuvenseBijdragen, 86:1–26.

Nguyen,Dinh Hoa. 1959. Chu’ nom: The demoticsystemof writing in Vietnam.Journalof theAmericanOrientalSociety, 79(4):270–274.

Nunberg,Geoffrey. 1995.TheLinguisticsof Punctuation. CSLI (Universityof Chica-goPress),Chicago,IL.

Nunn,Anneke. 1998. Dutch Orthography: A SystematicInvestigationof theSpellingof Dutch Words. Number6 in LOT InternationalSeries.HollandAcademicGraph-ics,TheHague.

O’Connor, M. 1996.EpigraphicSemiticscripts.In PeterDanielsandWilliam Bright,editors,The World’s Writing Systems. Oxford University Press,New York, NY,pages88–107.

Packard,Jerome,editor. 1998.New Approachesto ChineseWord Formation. Number105in Trendsin Linguistics,StudiesandMonographs.MoutondeGruyter, Berlin.

Perfetti,CharlesandLi Hai Tan. 1998.Thetime courseof graphic,phonologicalandsemanticactivation in Chinesecharacteridentification. Journal of ExperimentalPsychology: Learning, MemoryandCognition, 24(1):101–118.

Perfetti,CharlesA., LaurenceRieben,andMichel Fayol, editors. 1997. Learningto Spell: Research, Theoryand PracticeacrossLanguages. LawrenceErlbaumAssociates,Mahwah,NJ.

Pesetsky, David. 1979. Russianmorphologyandlexical theory. MS. MassachusettsInstituteof Technology.

Pettersson,JohnSoren. 1996. Numericalnotation. In PeterDanielsand WilliamBright, editors,TheWorld’sWriting Systems. Oxford UniversityPress,New York,NY, pages795–806.

216 References

Pinker, StevenandAlan Prince. 1988. On languageandconnectionism:Analysisofa paralleldistributedprocessingmodelof languageacquisition. In StevenPinkerandJacquesMehler, editors,ConnectionsandSymbols. MIT Press,pages73–193.Cognitionspecialissue.

Prince,Alan and Paul Smolensky. 1993. Optimality theory. TechnicalReport2,RutgersUniversity, Piscataway, NJ.

Radzinski,Daniel. 1991. Chinesenumber-names,treeadjoininglanguages,andmildcontext-sensitivity. ComputationalLinguistics, 17(3):277–300.

Rajemisa-Raolison,Regis. 1971. Grammaire Malgache. Centrede FormationPedagogique,Fianarantsoa,Madagascar. 7th edition.

Raman,T.V. 1994. Audio Systemfor Technical Readings. Ph.D.thesis,CornellUni-versity.

Ratliff, Martha. 1996. The Pahawh Hmong script. In PeterDanielsand WilliamBright, editors,TheWorld’sWriting Systems. Oxford UniversityPress,New York,NY, pages619–624.

Ritner, Robert.1996.Egyptianwriting. In PeterDanielsandWilliam Bright, editors,TheWorld’s Writing Systems. Oxford UniversityPress,New York, NY, pages73–84.

Romer, Jurgen.1994.Abkurzungen.In HugoStegerandHerbertErnstWiegand,ed-itors,Schrift undSchriftlichkeit/Writingandits Use, volume2. WalterdeGruyter,Berlin, chapter135,pages1506–1515.

Rumelhart,David andJamesMcClelland.1986.On learningthepasttenseof Englishverbs. In JamesMcClellandandDavid Rumelhart,editors,Parallel DistributedProcessing. MIT Press,Cambridge,MA, pages216–271.Volume2.

Sagey, Elizabeth.1986. TheRepresentationof FeaturesandRelationsin Non-LinearPhonology. Ph.D.thesis,MassachusettsInstituteof Technology, Cambridge,MA.

Salomon,Richard. 1996. South Asian writing systems(introduction). In PeterDanielsandWilliam Bright, editors,TheWorld’s Writing Systems. Oxford Uni-versityPress,New York, NY, pages371–372.

Sampson,Geoffrey. 1985.Writing Systems. StanfordUniversityPress,Stanford,CA.

Sampson,Geoffrey. 1994. Chinesescriptandthe diversityof writing systems.Lin-guistics, 32:117–132.

Schiller, Eric. 1996.Khmerwriting. In PeterDanielsandWilliam Bright, editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages467–473.

Schreuder, Robert,AnnekeNeijt, FemkevanderWeide,andR. HaraldBaayen.1998.Regular plurals in Dutch compounds:Linking graphemesor morphemes.Lan-guageandCognitiveProcesses, 13:551–573.

References 217

Seidenberg, M., K. McRae, and D. Jared. 1988. Frequency and consistency ofspelling-soundcorrespondencesin naming. Presentedat the29thannualmeetingof thePsychonomicSociety, Chicago,November.

Seidenberg, Mark. 1990. Lexical access:Anothertheoreticalsoupstone?In DavidBalota,GiovanniFloresd’Arcais,andKeithRayner, editors,ComprehensionPro-cessesin Reading. LawrenceErlbaumAssociates,Hillsdale,NJ,pages33–71.

Seidenberg,Mark. 1992.Beyondorthographicdepthin reading:Equitabledivisionoflabor. In RamFrostandLeonardKatz,editors,Orthography, Phonology, Morphol-ogyandMeaning, number94 in Advancesin Psychology. North-Holland,Amster-dam,pages85–118.

Seidenberg, Mark. 1997.Languageacquisitionanduse:Learningandapplyingprob-abilisticconstraints.Science, 275:1599–1603,March14.

Seidenberg, Mark andJamesMcClelland. 1989. A distributed,developmentalmodelof visualword recognitionandnaming.Psychological Review, 96:523–568.

Sejnowski,TerenceandC.Rosenberg. 1987.Parallelnetworksthatlearnto pronounceEnglishtext. Complex Systems, 1:145–168.

Serianni,Luca. 1989. GrammaticaItaliana: Italiano comunee lingua letteraria.UTET Libreria,Turin.

Shi, Dingxu. 1996. TheYi script. In PeterDanielsandWilliam Bright, editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages239–243.

Skjærvø,Oktor. 1996. Aramaicscriptsfor Iranianlanguages.In PeterDanielsandWilliam Bright, editors,The World’s Writing Systems. Oxford University Press,New York, NY, pages515–535.

Smalley, William, Chia KouaVang,andGnia YeeYang. 1990. Mother of Writing:TheOrigin andDevelopmentof a HmongMessianicScript. Universityof ChicagoPress,Chicago,IL.

Smith,Janet.1996. Japanesewriting. In PeterDanielsandWilliam Bright, editors,TheWorld’sWriting Systems. OxfordUniversityPress,New York, NY, pages209–217.

Sproat,Richard.1992.MorphologyandComputation. MIT Press,Cambridge,MA.

Sproat,Richard.1997a.Multilingual text analysisfor text-to-speechsynthesis.Jour-nal of Natural LanguageEngineering, 2(4):369–380.

Sproat,Richard,editor. 1997b. Multilingual Text to Speech Synthesis:TheBell LabsApproach. Kluwer AcademicPublishers,Boston,MA.

Sproat,RichardandChilin Shih. 1990. A statisticalmethodfor finding word bound-ariesin Chinesetext. ComputerProcessingof ChineseandOriental Languages,4:336–351.

218 References

Sproat,RichardandChilin Shih. 1995.A corpus-basedanalysisof Mandarinnominalrootcompounds.Journalof EastAsianLinguistics, 4(1):1–23.

Sproat,Richard,Chilin Shih, William Gale,andNancy Chang. 1996. A stochasticfinite-stateword-segmentationalgorithmfor Chinese.ComputationalLinguistics,22:377–404.

Stampe,David. 1976. Cardinalnumbersystems.In Papers from the 12th RegionalMeeting, Chicago Linguistic Society, pages594–609,Chicago,IL. ChicagoLin-guisticSociety.

Steriade,Donca. 1982. GreekProsodiesand the Nature of Syllabification. Ph.D.thesis,MassachusettsInstituteof Technology, Cambridge,MA.

Steriade,Donca.1999. Paradigmuniformity andthephonetics-phonologyboundary.In MichaelBroeandJanetPierrehumbert,editors,Papersin LaboratoryPhonologyV. CambridgeUniversityPress.

Stone,Gregory and Guy Van Orden. 1994. Building a resonanceframework forword recognitionusing designand systemprinciples. Journal of ExperimentalPsychology: HumanPerceptionandPerformance, 20(6):1248–1268.

Talkin, David. 1995.A robustalgorithmfor pitch tracking(RAPT). In W. Kleijn andK. K. Paliwal, editors,Speech CodingandSynthesis. Elsevier, New York, NY.

Taraban,R. andJamesMcClelland. 1987. Conspiracy effects in word recognition.Journalof MemoryandLanguage, 26:608–631.

Thomson,Robert. 1969. The study of Manx Gaelic. Proceedingsof the BritishAcademy, 60:179–210.Sir JohnRhysMemorialLecture.

Tzeng,AngelaKu-Yuan. 1994. ComparativeStudieson Word Perceptionof ChineseandEnglish:EvidenceAgainstanOrthographic-SpecificHypothesis. Ph.D.thesis,Universityof California,Riverside.

Vachek,Josef.1973.WrittenLanguage: General ProblemsandProblemsof English.Mouton,TheHague.

van den Bosch, Antal, Alain Content,Walter Daelemans,and Beatricede Gelder.1994. Measuringthecomplexity of writing systems.Journal of QuantitativeLin-guistics, 1:178–188.

VanOrden,Guy, BrucePennington,andGregoryStone.1990.Word identificationinreadingandthepromiseof subsymbolicpsycholinguistics.Psychological Review,97(4):488–522.

Venezky, Richard.1970.TheStructure of EnglishOrthography. Number82 in JanuaLinguarum.Mouton,TheHague.

References 219

Voutilainen,Atro. 1994. ThreeStudiesof Grammar-BasedSurfaceParsingof Unre-strictedEnglishText. Ph.D.thesis,Universityof Helsinki,Helsinki. PublishedasPublicationsof theDepartmentof General Linguistics,Universityof Helsinki,no.24.

Wade,Terence.1992.A ComprehensiveRussianGrammar. Blackwell,Oxford.

Wang,Jason.1983. Toward a GenerativeGrammarof ChineseCharacterStructureandStrokeOrder. Ph.D.thesis,Universityof Wisconsin,Madison,WI.

Wang,Jian,Albrecht Inhoff, andHsuan-ChihChen,editors. 1999. ReadingChineseScript. LawrenceErlbaumAssociates,Mahwah,NJ.

Wang,Kuijing. 1996. XiandaiHanyuSuolueyuCidian (A Dictionary of Present-DayChineseAbbreviations). ShangwuPrintingHouse,Beijing.

Wells, J. C. 1982. Accentsof English 1: An Introduction. CambridgeUniversityPress,Cambridge.

Wieger, L. 1965. ChineseCharacters. Dover, New York. Republicationof secondedition,published1927by CatholicMissionPress.

Yarowsky, David. 1996. Homographdisambiguationin text-to-speechsynthesis.In Janvan Santen,RichardSproat,JosephOlive, andJulia Hirschberg, editors,Progressin Speech Synthesis. Springer, New York, pages157–172.