Lexicon Organisation and
Contextual Methods for Online
Handwritten Pitman’s
Shorthand Recognition
by Swe Myo Htwe, BSc
Thesis submitted to the University of Nottingham for the degree of
Doctor of Philosophy
School of Computer Science and Information Technology
December 2006
To my parents and fiancé
Abstract
This research investigates innovations for the computer transcription of handwritten Pitman’s
Shorthand as a rapid means of text entry (up to 100 words per minute) into today’s pen-based
handheld devices.
Two mathematical models are developed in this work. The first model deals with high-level
phonetic-based translation, while the second model is specifically concerned with low-level
primitive-based translation. Both models are closely related to the lexicon organisation and
contextual processing for online handwritten Pitman's Shorthand recognition.
A number of research issues that arise from interpreting handwritten Pitman's Shorthand strokes
of digital ink as text are addressed, including: (a) a feasibility study into improving a conventional
phonetic-based transliteration approach to advance word recognition; (b) an investigation into
new Bayesian Network modelling of strokes and their relationships in order to solve the problem
of geometric variations and vowel ambiguities of handwritten Pitman's Shorthand; (c) the generation
of a new machine-readable Pitman's Shorthand lexicon to facilitate the direct transcription of
geometric features of Pitman's Shorthand into English text; (d) an analysis of the impact of
statistical language modelling on handwritten phrase recognition; and (e) a discussion of the
graphical user interface issues in relation to the development of a commercial prototype from the
frame of reference of this research.
The research has been carried out in close cooperation with Nanyang Technological University
(NTU) in Singapore. The system is currently undergoing a final evaluation in terms of its
recognition accuracy, as well as its potential to be introduced as a commercially viable fast text
input system.
Acknowledgements
I would like to take this opportunity to express my sincere gratitude to my supervisor, Dr.
Colin Higgins, for his valuable guidance and constant support from the day I stepped
into the School of Computer Science of the University of Nottingham until the
successful completion of this research.
My sincere gratitude also goes to Professor Graham Leedham for his dedicated guidance and
genuine assistance in maintaining the close collaboration between the two participating teams
of this research. My deepest thanks also go to Ma Yang for her heartfelt contribution and
her immediate responses during the critical time of this collaborative research.
My sincere thanks also go to Ms. Joyce Cox for her kind and professional help in
proofreading the English of this thesis. Also, from the bottom of my heart, I am very
grateful to all the participants who helped with the experiments of this research. Many
thanks also go to my colleagues in the LTR Research group for their warm friendship that
made me feel at home in our LTR lab.
Also, my endless thanks go to my uncle, Dr. Kyin Win, for supporting me financially and
emotionally to make my dream of undertaking doctoral research come true. My
heartfelt thanks also go to the International Office and the School of Computer Science of
the University of Nottingham for their enormous financial support for the development of
this research.
Also, my sincere love and thanks to my parents, fiancé, and all my friends in Nottingham for
supporting me financially, emotionally and spiritually during the difficult days of my long
residence in Nottingham.
Last but not least, my sincere thanks to all the members of the School of Computer Science
of the University of Nottingham for all their help and advice, given to me when I needed it
most.
Thank you all, Swe Myo Htwe.
Table of contents
ABSTRACT …………………………………………………………………………………………..II
ACKNOWLEDGEMENTS…………………………………………………………………………III
TABLE OF CONTENTS …………………………………………………………………………...IV
LIST OF FIGURES ………………………………………………………………………………
LIST OF TABLES ………………………………………………………………………………….
1 LINGUISTIC POST PROCESSING OF HANDWRITTEN PITMAN’S SHORTHAND .. 1
CHAPTER 1 INTRODUCTION ................................................................................................... 2
1.1 BACKGROUND ............................................. 3
1.1.1 Collaboration ............................................. 3
1.1.2 Motivation ............................................. 4
1.1.3 Scope ............................................. 5
1.2 BRIEF OVERVIEW ............................................. 7
1.2.1 General Objectives and Contributions ............................................. 7
1.3 SYNOPSIS OF THE DISSERTATION ....................................................................................... 12
2 BACKGROUND TO THE AUTOMATIC RECOGNITION OF HANDWRITTEN PITMAN’S SHORTHAND ............................................................................................................... 15
CHAPTER 2 INTRODUCTION ................................................................................................. 16
2.1 EVALUATION OF EXISTING TEXT INPUT SYSTEMS FOR HANDHELD DEVICES ............................................. 16
2.1.1 On-screen keyboards vs. a handwritten Pitman’s Shorthand recognizer ............................................. 17
2.1.2 A cursive handwriting recognizer vs. a handwritten Pitman’s Shorthand recognizer ............................................. 17
2.1.3 Gesture based text entry systems vs. a handwritten Pitman’s Shorthand recognizer ............................................. 18
2.1.4 Speech recognition systems vs. a handwritten Pitman’s Shorthand recognizer ............................................. 19
2.2 PITMAN’S SHORTHAND: A BRIEF OVERVIEW ............................................. 20
2.3 AUTOMATIC RECOGNITION OF HANDWRITTEN PITMAN’S SHORTHAND: AN OVERVIEW ............................................. 23
2.4 HANDWRITING RECOGNITION ALGORITHMS TO IMPROVE A WORD LEVEL TRANSLITERATION ............................................. 26
2.4.1 Hidden Markov Models (HMMs) ............................................. 27
2.4.2 Neural Networks ............................................. 28
2.4.3 Bayesian Networks ............................................. 29
2.4.3.1 Conditional independence ............................................. 30
2.4.3.2 Inference ............................................. 32
2.4.3.3 Learning ............................................. 33
2.5 NATURAL LANGUAGE PROCESSING ALGORITHMS FOR HANDWRITTEN PHRASE RECOGNITION ............................................. 35
2.5.1 Statistical language modeling ............................................. 35
2.5.2 Viterbi algorithm ............................................. 36
2.6 PEN APPLICATION PROGRAM INTERFACES (APIS) ............................................. 37
2.7 SUMMARY ............................................. 38
3 EVALUATION OF PHONETIC BASED TRANSCRIPTION OF VOCALISED HANDWRITTEN PITMAN’S OUTLINES..................................................................................... 39
CHAPTER 3 INTRODUCTION ................................................................................................. 40
3.1 SYSTEM OVERVIEW ............................................. 41
3.2 TRANSCRIPTION OF VOCALIZED OUTLINES BASED ON A PHONETIC APPROACH ............................................. 43
3.3 LEXICON PREPARATION ............................................. 44
3.4 NEAREST NEIGHBOURHOOD QUERY (NNQ) ............................................. 47
3.5 FEATURE TO PHONEME CONVERSION ............................................. 49
3.6 PHONEME ORDERING ............................................. 51
3.7 EXPERIMENTAL RESULTS ............................................. 54
3.7.1 Data Set ............................................. 54
3.7.2 Analysis of a phonetic lexicon ............................................. 55
3.7.3 Performance evaluation of the word level transcription ............................................. 57
3.8 DISCUSSION ............................................. 58
4 BAYESIAN NETWORK BASED WORD TRANSCRIPTION ........................................... 61
CHAPTER 4 INTRODUCTION ................................................................................................. 62
4.1 SYSTEM OVERVIEW ............................................. 63
4.2 SUMMARY OF BAYESIAN NETWORK BASED WORD TRANSCRIPTION ............................................. 64
4.3 LIFE CYCLE OF OUTLINE MODELS ............................................. 65
4.4 OUTLINE MODEL ARCHITECTURE ............................................. 67
4.4.1 Nodes of an outline model ............................................. 68
4.4.2 Relationships between nodes ............................................. 73
4.5 INFERENCE ............................................. 74
4.5.1 Message Initialization ............................................. 75
4.5.2 Belief Updating ............................................. 76
4.6 LEARNING OF OUTLINE MODELS ............................................. 78
4.6.1 Learning of consonant primitives ............................................. 79
4.6.2 Learning of vowel primitives ............................................. 81
4.7 MODEL SELECTION ............................................. 82
4.8 EXPERIMENTAL RESULTS ............................................. 86
4.8.1 Data set ............................................. 87
4.8.2 Evaluation of the recognition engine ............................................. 89
4.8.3 Evaluation of the word transcription accuracy ............................................. 93
4.8.4 Analysis of word transcription accuracy using the single consonant data set ............................................. 94
4.8.4.1 Analysis of the recognition accuracy vs. the transcription accuracy ............................................. 94
4.8.4.2 Analysis of the accuracy of a result list ............................................. 95
4.8.4.3 Analysis of the correction accuracy vs. the classification/vowel errors ............................................. 97
4.8.4.4 Analysis of factors influencing the accuracy of a result list ............................................. 98
4.8.5 Analysis of word transcription accuracy using the stroke-combination data set ............................................. 99
4.8.5.1 Analysis of the recognition accuracy vs. the transcription accuracy ............................................. 99
4.8.5.2 Analysis of the accuracy of a result list ............................................. 100
4.8.5.3 Analysis of the correction accuracy vs. the classification/vowel errors ............................................. 101
4.8.5.4 Analysis of factors influencing the accuracy of a result list ............................................. 102
4.8.6 Analysis of word transcription accuracy for the special-rule data set ............................................. 103
4.8.6.1 Analysis of the recognition accuracy vs. the transcription accuracy ............................................. 103
4.8.6.2 Analysis of the accuracy of the result list ............................................. 104
4.8.6.3 Analysis of the correction accuracy vs. the classification/vowel errors ............................................. 105
4.8.6.4 Analysis of factors influencing the accuracy of a result list ............................................. 106
4.9 DISCUSSION ..................................................................................................................... 107
5 GENERATION OF A MACHINE-READABLE PITMAN’S SHORTHAND LEXICON ............................................. 110
CHAPTER 5 INTRODUCTION ............................................................................................... 111
5.1 OVERVIEW ............................................. 112
5.1.1 Rule-based creation of the electronic Pitman’s Shorthand lexicon ............................................. 113
5.2 STRUCTURE OF THE ELECTRONIC PITMAN’S SHORTHAND LEXICON ............................................. 114
5.2.1 Feature set ............................................. 114
5.2.2 Key ............................................. 115
5.2.3 Lexicon layout ............................................. 116
5.3 CONVERSION PROCEDURE ............................................. 118
5.3.1 The importance of algorithms of the presented rules ............................................. 119
5.3.2 Description of Rules ............................................. 120
5.4 EXPERIMENTAL RESULTS ............................................. 127
5.4.1 Data set ............................................. 127
5.4.2 Analysis of the accuracy of a machine readable Pitman’s Shorthand lexicon ............................................. 128
5.4.3 Analysis of the distribution of homophones in machine-readable Pitman’s Shorthand lexicons ............................................. 134
5.5 DISCUSSION ..................................................................................................................... 136
6 PHRASE LEVEL TRANSCRIPTION OF ONLINE HANDWRITTEN PITMAN’S SHORTHAND OUTLINES ............................................................................................................ 137
CHAPTER 6 INTRODUCTION ............................................................................................... 138
6.1 CONTEXTUAL REJECTION STRATEGY ............................................. 139
6.2 HANDWRITTEN PITMAN’S SHORTHAND PHRASE RECOGNITION ............................................. 141
6.3 THE INTEGRATION OF A PITMAN’S SHORTHAND PHRASE RECOGNISER WITH APIS ............................................. 143
6.4 EXPERIMENTAL RESULTS ............................................. 146
6.5 DISCUSSION ............................................. 146
7 GRAPHICAL USER INTERFACES OF THE HANDWRITTEN PITMAN’S SHORTHAND RECOGNITION SYSTEM................................................................................... 148
CHAPTER 7 INTRODUCTION ............................................................................................... 149
7.1 OVERVIEW ............................................. 150
7.2 INK DATA COLLECTION IN THIS RESEARCH ............................................. 151
7.3 GENERAL TRAINING DATA COLLECTION TOOL ............................................. 155
7.4 DEVELOPER GRAPHICAL USER INTERFACE ............................................. 158
7.5 SHORTHAND DATA ENTRY GRAPHICAL USER INTERFACES ............................................. 159
7.6 EXPERIMENTAL RESULTS ............................................. 164
7.6.1 Analysis of the general distribution of user fondness for the presented prototypes ............................................. 166
7.6.2 Analysis of the distribution of user fondness for the presented prototypes in the case of speed writing ............................................. 167
7.6.3 Analysis of the distribution of user fondness for the presented prototypes in the case of a small amount of text entry into handheld devices ............................................. 167
7.6.4 The comparison of the most favourite GUI of experienced shorthand writers and that of novice shorthand writers ............................................. 168
7.7 DISCUSSION ..................................................................................................................... 169
8 CONCLUSION ....................................................................................................................... 171
CHAPTER 8 INTRODUCTION ............................................................................................... 172
8.1 RESEARCH WORK SUMMARY ............................................. 172
8.2 CONTRIBUTION ............................................. 174
8.3 FUTURE WORK ............................................. 175
8.3.1 Improvement upon the overall system ............................................. 175
8.3.2 Application of the presented system to the real life problems ............................................. 177
8.4 DISSEMINATION ............................................................................................................... 177
REFERENCES................................................................................................................................. 180
APPENDIX .............................................
FIGURE 1.1: SCOPE OF THE THESIS ............................................. 6
FIGURE 1.2: A HIGH LEVEL VIEW OF THE SCOPE OF THE RECOGNITION ENGINE AND THE TRANSCRIPTION ENGINE ............................................. 9
FIGURE 2.1: ILLUSTRATION OF TEXT ENTRY USING SHARK SYSTEM (A) THE WORD “QUICK” IS WRITTEN USING ATOMIK KEYBOARD LAYOUT (B) THE WORD “QUICK” IS WRITTEN WITHOUT USING A TEMPLATE KEYBOARD ............................................. 19
FIGURE 2.2: BASIC CONSONANTS OF PITMAN’S SHORTHAND AS ILLUSTRATED IN [OJ95] ......................................................................................................................................... 21
FIGURE 2.3: /W/, /Y/, /H/ CONSONANTS OF PITMAN’S SHORTHAND ............................................. 21
FIGURE 2.4: VOWELS, DIPHTHONGS AND DIPHONES OF PITMAN’S SHORTHAND ............................................. 21
FIGURE 2.5: ILLUSTRATION OF VOCALIZED OUTLINES ............................................. 22
FIGURE 2.6: (A) BASIC NOTATIONS OF PITMAN’S SHORTHAND (B) THE WORD “PLAY” IS WRITTEN PHONETICALLY USING BASIC NOTATIONS (C) THE WORD “PLAY” IS WRITTEN USING A SPECIAL RULE OF PITMAN’S SHORTHAND ............................................. 22
FIGURE 2.7: (A) SAMPLES OF SHORT FORMS (B) SAMPLES OF PHRASES ............................................. 23
FIGURE 2.8: A SAMPLE HMM MODEL FOR A SINGLE OUTLINE OF PITMAN’S SHORTHAND; AT EACH STATE i, THE PROBABILITY βi OF A PARTICULAR STROKE si BEING OF TYPE ti IS OBSERVED ............................................. 27
FIGURE 2.9: AN INDIVIDUAL CELL A OF A NEURAL NETWORK, MODELLED FOR THE CLASSIFICATION OF HANDWRITTEN PITMAN’S SHORTHAND IN [LQ90] ............................................. 28
FIGURE 2.10: ILLUSTRATION OF STROKE DEPENDENCIES IN PITMAN’S SHORTHAND (A) VOWEL DEPENDENCY (B) POSITIONAL DEPENDENCY OF THE FIRST CONSONANT PRIMITIVE ............................................. 30
FIGURE 2.11: C IS CONDITIONALLY INDEPENDENT OF W GIVEN R .................................... 31
FIGURE 2.12: S IS CONDITIONALLY DEPENDENT ON R GIVEN OBSERVED DATA W ............................................. 31
FIGURE 2.13: ILLUSTRATION OF THE BAYES BALL ALGORITHM [SR98]. IF THERE IS NO FLOW OF A BALL FROM A TO B IN A GRAPH, A AND B ARE CONDITIONALLY INDEPENDENT GIVEN A SET OF OBSERVED OR HIDDEN VARIABLES X AND VICE VERSA. ...................................................................................................................................... 32
FIGURE 3.1: AN ABSTRACT VIEW OF THE WHOLE SYSTEM ............................................. 42
FIGURE 3.2: DETAILED VIEW OF A VOCALIZED OUTLINE INTERPRETER ............................................. 44
FIGURE 3.3: ILLUSTRATION OF SAMPLE WORDS IN NORMAL ENGLISH AND PITMAN’S SHORTHAND ............................................. 45
FIGURE 3.4: SAMPLE NEIGHBOURHOODS PREDEFINED IN THE NEAREST NEIGHBOURHOOD QUERY APPROACH ......................................................................... 47
FIGURE 3.5: SAMPLE OUTPUT PRODUCED BY THE NEAREST NEIGHBOURHOOD QUERY ...................................................................................................................................... 48
FIGURE 3.6: SAMPLE OF PHONEME TRANSLATION OF A DOUBLE LENGTH STROKE.................................................................................................................................................... 51
FIGURE 3.7: (A) SAMPLE INPUT OF PHONEME ORDERING PROCESS (B) SAMPLE OUTPUT OF PHONEME ORDERING PROCESS ............................................................. 52
FIGURE 3.8: SAMPLE ELEMENT OF A PHONETIC LEXICON IN A HASH TABLE ............................................. 54
FIGURE 3.9: SAMPLE COLLECTED OUTLINES ............................................. 55
FIGURE 3.10: THE DISTRIBUTION OF HOMOPHONES IN DIFFERENT SIZED PHONETIC LEXICONS ............................................. 55
FIGURE 3.11: ILLUSTRATION OF THE INCIDENCE OF PHONEME VARIATION DUE TO CONFUSION BETWEEN A CIRCLE AND A HOOK ............................................. 59
FIGURE 3.12: ILLUSTRATION OF THE INCIDENCE OF PHONEME VARIATION DUE TO LENGTH CONFUSION ............................................. 60
FIGURE 4.1: AN ABSTRACT VIEW OF THE WHOLE SYSTEM ............................................. 63
FIGURE 4.2: ILLUSTRATION OF BAYESIAN NETWORK BASED WORD TRANSCRIPTION ................................................................................................................... 64
FIGURE 4.4 ILLUSTRATION OF THREE PAIRS OF SIMILAR OUTLINES GROUPED IN AN OUTLINE MODEL.................................................................................................................... 66
FIGURE 4.5 LIFE CYCLE OF OUTLINE MODELS ............................................. 67
FIGURE 4.6 ILLUSTRATION OF DIFFERENT CHRONOLOGICAL WRITING ORDER OF NORMAL ENGLISH AND PITMAN’S SHORTHAND ............................................. 68
FIGURE 4.7 ILLUSTRATION OF UNIQUE NODES OF AN OUTLINE MODEL ............................................. 69
FIGURE 4.8 ILLUSTRATION OF STEP BY STEP CREATION OF OUTLINE MODELS ............................................. 71
FIGURE 4.9 SAMPLE TRAINING DATA FOR THE WORD “BAKE” PROCESSED BY THE RECOGNITION ENGINE; THE ITALIC TEXT ON THE RIGHT EXPLAINS WHAT EACH LINE OF DATA REPRESENTS ............................................. 72
FIGURE 4.10 ILLUSTRATION OF CONDITIONAL DEPENDENCY OF VARIABLES IN AN OUTLINE MODEL USING THE BAYES BALL ALGORITHM [SR98]; IF THERE IS NO FLOW OF A BALL FROM A TO B IN A GRAPH, A AND B ARE CONDITIONALLY INDEPENDENT GIVEN A SET OF OBSERVED OR HIDDEN VARIABLES X AND VICE VERSA ............................................. 74
FIGURE 4.11 ILLUSTRATION OF OUTLINE MODEL SELECTION STRATEGIES................... 86
FIGURE 4.12: SAMPLES OF THE STROKE COMBINATION DATA SET ............................................. 87
FIGURE 4.13: TWO DIFFERENT SHORTHAND OUTLINES FOR THE WORD “AFTER”; (A) THE WORD “AFTER” IS WRITTEN ACCORDING TO THE DIRECT CONVERSION OF PHONEMES INTO PRIMITIVES (B) THE WORD “AFTER” IS WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMAN’S SHORTHAND ............................................. 88
FIGURE 4.14: SCREEN SHOT OF OUTLINES WRITTEN BY WRITER A ............................................. 89
FIGURE 4.15: EVALUATION OF THE VOCALISED OUTLINE IDENTIFICATION OF THE RECOGNITION ENGINE ............................................. 90
FIGURE 4.16: EVALUATION OF THE SEGMENTATION ACCURACY OF THE RECOGNITION ENGINE ............................................. 92
FIGURE 4.17: EVALUATION OF THE CLASSIFICATION ACCURACY OF THE RECOGNITION ENGINE ............................................. 93
FIGURE 4.18: ILLUSTRATION OF A RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND TRANSCRIPTION ACCURACY OF THE SINGLE CONSONANT DATA SET ............................................. 95
FIGURE 4.19: COMPARISON OF THE HANDWRITING OF TWO WRITERS ............................................. 96
FIGURE 4.20: ILLUSTRATION OF THE WORD TRANSCRIPTION ACCURACY OF THE SINGLE CONSONANT DATA SET ............................................. 96
FIGURE 4.21: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON WITH THE CLASSIFICATION OR VOWEL ERRORS OF THE SINGLE CONSONANT DATA SET ............................................. 98
FIGURE 4.22: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS INFLUENCING THE ACCURACY OF A RESULT LIST (SINGLE CONSONANT DATA SET) ........................................................................................................................................... 99
FIGURE 4.23: ILLUSTRATION OF THE RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND TRANSCRIPTION ACCURACY OF THE STROKE-COMBINATION DATA SET ............................................. 100
FIGURE 4.24: ILLUSTRATION OF THE WORD TRANSCRIPTION ACCURACY OF THE STROKE-COMBINATION DATA SET ................................................................................ 101
FIGURE 4.25: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON WITH THE CLASSIFICATION/VOWEL ERRORS OF THE STROKE COMBINATION DATA SET............................................................................................................................... 102
FIGURE 4.26: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS INFLUENCING THE ACCURACY OF A RESULT LIST (STROKE-COMBINATION DATA SET) ............................................................................................................................. 103
FIGURE 4.27: RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND TRANSCRIPTION ACCURACY OF THE SPECIAL-RULE DATA SET ........................ 104
FIGURE 4.28: EVALUATION OF THE WORD TRANSCRIPTION ACCURACY OF THE SPECIAL-RULE DATA SET ................................................................................................. 105
FIGURE 4.29: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON WITH CLASSIFICATION OR VOWEL ERRORS OF THE SPECIAL-RULE DATA SET.................................................................................................................................................. 106
FIGURE 4.30: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS INFLUENCING THE ACCURACY OF A RESULT LIST (SPECIAL-RULE DATA SET).................................................................................................................................................. 107
FIGURE 5.1: (A) SAMPLE ENTRIES OF A CONVENTIONAL PITMAN’S SHORTHAND DICTIONARY AVAILABLE IN BOOK FORMAT (B) SAMPLE ENTRIES OF AN ELECTRONIC PITMAN’S SHORTHAND LEXICON ............................................. 112
FIGURE 5.2: SAMPLE KEYS OF THE ELECTRONIC PITMAN’S SHORTHAND LEXICON; VOWELS ARE UNDERLINED ............................................................................................. 115
FIGURE 5.3: SAMPLE ENTRIES OF THE ELECTRONIC PITMAN’S SHORTHAND LEXICON ............................................. 116
FIGURE 5.4: ILLUSTRATION OF THE CONVERSION PROCEDURE ............................................. 119
FIGURE 5.5: ILLUSTRATION OF THE USE OF A DOT PRIMITIVE FOR THE SOUND COM AT THE BEGINNING OF A WORD ............................................. 123
FIGURE 5.6: ILLUSTRATION OF THE USE OF NEGATIVE PREFIX IR- IN A VOCALISED OUTLINE ............................................. 124
FIGURE 5.7: ILLUSTRATION OF THE USE OF PL HOOK IN A VOCALISED OUTLINE ............................................. 125
FIGURE 5.8: ILLUSTRATION OF A ONE SYLLABLE HALF-LENGTH OUTLINE ............................................. 126
FIGURE 5.9: ILLUSTRATION OF THE OMISSION OF THE SYLLABLE –TER IN A VOCALISED OUTLINE ......................................................................................................... 126
FIGURE 5.10: ILLUSTRATION OF INCOMPATIBLE PRIMITIVE PAIRS FOR DOUBLING.................................................................................................................................................. 127
FIGURE 5.11: SAMPLE ENTRIES OF A MACHINE-READABLE PITMAN’S SHORTHAND LEXICON ............................................. 128
FIGURE 5.12: AVERAGE ACCURACIES OF DIFFERENT SIZES OF MACHINE-READABLE PITMAN’S SHORTHAND LEXICONS.......................................................... 129
FIGURE 5.13: TWO DIFFERENT OUTLINES FOR THE WORD “WEATHER”; (A) THE WORD “WEATHER IS WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMAN’S SHORTHAND; (B) THE WORD “WEATHER” IS NOT WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMAN’S SHORTHAND ....... 131
FIGURE 5.14: (A) SHORTHAND OUTLINE FOR THE WORD “FACTOR”; (B) SHORTHAND OUTLINE FOR THE WORD “FURTHER”................................................ 131
FIGURE 5.15: TWO DIFFERENT SHORTHAND OUTLINES FOR THE WORD “UNION” . 132FIGURE 5.16: TWO DIFFERENT OUTLINES FOR THE WORD “LANDLORD” .................. 132FIGURE 5.17: TWO DIFFERENT OUTLINES FOR THE WORD “ENVIRONMENT”........... 133FIGURE 5.18: THE DISTRIBUTION OF DIFFERENT CATEGORIES OF ERRORS IN
ELECTRONIC PITMAN’S SHORTHAND LEXICONS OF DIFFERENT SIZES ........... 133FIGURE 5.19: THE DISTRIBUTION OF UNIQUENESS OF THE ELECTRONIC PITMAN’S
SHORTHAND LEXICONS..................................................................................................... 134FIGURE 6.1: SAMPLES OF PITMAN’S SHORTHAND OUTLINES WRITTEN IN THREE
DIFFERENT POSITIONS; (A) OUTLINES WRITTEN INCLUDING VOWEL NOTATIONS, (B) OUTLINES WRITTEN WITHOUT VOWEL NOTATIONS ................ 140
FIGURE 6.2: ILLUSTRATION OF THE HANDWRITTEN PITMAN’S SHORTHAND PHRASE LEVEL TRANSCRIPTION PROCESS................................................................................ 141
FIGURE 6.3: AN ABSTRACT VIEW OF THE OBJECT MODEL “MICROSOFT.INK”.......... 144FIGURE 6.4: SCREEN SHOTS OF THE RECOGNITION RESULTS PRODUCED BY THE
“RECOGNISERCONTEXT” API .......................................................................................... 145FIGURE 6.5: PERFORMANCE OF THE CONTEXTUAL REJECTION STRATEGY ........... 146FIGURE 7.1: FRONT-END AND BACK-END ARCHITECTURE OF THE SYSTEM ............ 149FIGURE 7.2: ILLUSTRATION OF INTERACTIONS BETWEEN USER INTERFACES AND
BACK-END ENGINES OF THE SYSTEM ......................................................................... 150FIGURE 7.3 ILLUSTRATION OF THE TABLET PC PLATFORM APIS PRESENTED AT
[REF] ........................................................................................................................................ 152FIGURE 7.4: ILLUSTRATION OF THE HIGH LEVEL RELATIONSHIP OF OBJECT
MODELS OF THE TABLET PC PLATFORM APIS ........................................................... 154FIGURE 7.5: HOME PAGE OF THE TRAINING DATA COLLECTOR ................................... 155FIGURE 7.6: SAMPLE DATA ENTRY PAGE OF THE TRAINING DATA COLLECTOR GUI
.................................................................................................................................................. 156FIGURE 7.7: SCREEN SHOT OF THE DEVELOPER GRAPHICAL INTERFACE .............. 158FIGURE 7.8: THE FIRST VERSION OF THE COLLABORATOR’S TABLET PC INTERFACE
FOR THE HANDWRITTEN PITMAN’S SHORTHAND RECOGNITION SYSTEM ...... 160FIGURE 7.9: THE LATEST VERSION OF THE COLLABORATOR’S TABLET PC
INTERFACE FOR PITMAN’S SHORTHAND RECOGNITION SYSTEM...................... 161FIGURE 7.10: SCREENSHOT OF A NOTE-PAD LAYOUT OF THE END-USER
INTERFACE OF THIS RESEARCH.................................................................................... 163FIGURE 7.11: SCREENSHOT OF AN ALTERNATIVE LAYOUT OF THE END-USER
INTERFACE OF THIS RESEARCH.................................................................................... 164FIGURE 7.12: THUMBNAILS OF THE FOUR GUIS EVALUATED IN THE EXPERIMENT165
FIGURE 7.13: THE GENERAL DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED PROTOTYPES .... 166
FIGURE 7.14: THE DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED PROTOTYPES IN THE CASE OF SPEED WRITING .... 167
FIGURE 7.15: THE DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED PROTOTYPES IN THE CASE OF A SMALL AMOUNT OF TEXT ENTRY INTO HANDHELD DEVICES .... 168
FIGURE 7.16: THE COMPARISON OF THE MOST FAVOURITE GUI OF EXPERIENCED SHORTHAND WRITERS AND THAT OF NOVICE SHORTHAND WRITERS .... 169
1 Linguistic Post Processing of Handwritten Pitman’s Shorthand
Chapter 1 Introduction
Recently, there has been a dramatic growth in the use of handheld devices as powerful
appliances to collect and distribute information efficiently. Examples are provided by
companies and organizations worldwide who are implementing mobile business solutions to
accelerate business cycles, increase productivity and reduce operating costs by the use of
mobile phones, tablet PCs, pocket PCs and Personal Digital Assistants (PDAs). Current
handheld computers are applicable to daily business procedures; however, the ultimate
usefulness of these handheld devices depends on a solution to a serious bottleneck: textual
information needs to be entered as quickly and accurately as possible, similar to using a full-size
keyboard. Computers continue to get smaller and thinner, with the thinnest tablet PC,
recently launched by NEC, merely 1 cm thick and weighing less than 1 kg at the time of
writing. The transformation of a standard "QWERTY" keyboard into these compact devices
has not been so effective; miniature keyboards make text entry very slow at less than 10
words per minute (wpm) [Mt98].
This bottleneck has been a major concern for manufacturers of handheld devices and
decades of research and development have been invested in inventing a feasible means of
text entry into mobile devices, resulting in commercial systems with four main types of text
input methods: (a) on-screen keyboards, (b) handwriting recognition systems, (c) gesture
based text entry systems, and (d) speech recognition systems. The existing systems meet the
fundamental requirement of inputting text into handheld devices, but a practical solution
for rapid text input into such devices has yet to be found.
This dissertation presents work on the research, design, implementation and evaluation of
techniques that facilitate rapid text entry into a pen based computer, approximately at the
same rate as speech (i.e., more than 100 words per minute). It is based on Pitman’s
Shorthand, which is a speed-writing mechanism widely practiced in the real time reporting
community.
This chapter gives an overview of the linguistic post processing system of a handwritten
Pitman’s Shorthand recognizer. It mainly highlights the motivation and scope of the work.
It also outlines the general objectives of the thesis and draws attention to the author’s
contribution to achieve each objective. A synopsis of the thesis that explains the structure of
the dissertation along with a brief summary for each chapter is given at the end of the
chapter.
1.1 Background
1.1.1 Collaboration
Research in this thesis has been carried out in close cooperation with Nanyang
Technological University (NTU) in Singapore to the extent that a team from NTU
contributed to the research and development of low level classification of handwritten ink
data, and a team from the University of Nottingham contributed to transliteration of
classified primitives into English words. The collaboration has been a great success with
several workshops carried out at NTU annually as well as with a series of co-authored
publications [HHL+04a], [HHL+04b], [HHL+04c], [YLH+04a], [YLH+04b], [HHL+05a],
[HHL+05b], [HHL+05c], [YLH+05a], [YLH+05b], [YLH+05c]. In addition, concurrent
development of the two engines (i.e., recognition and transcription engines) has not been
difficult, mainly due to the accessibility of the classified data of the recognition engine since
the start of the project. This is because the collaborator has already carried out extensive
research on the low level segmentation and classification of handwritten Pitman’s Shorthand
outlines for over two decades and the collaborator’s contribution to this research is, in fact,
improving an existing recognition engine rather than developing a completely new one.
Previous work done by our collaborator can be referenced in [Lg84], [LD84], [LDB84],
[LDB85], [LD86], [LD87], [QL89], [LQ89], [Lg89], [Lg90], [LQ90], [QL91], [NL92],
[LQ92], [QL93]. The transcription engine and the work described in this thesis are, however,
new.
1.1.2 Motivation
The major motive behind this research has been to investigate the linguistic post processing
of handwritten Pitman’s Shorthand as a rapid means of text entry on handheld devices and
an evaluation of the overall performance via a tablet PC based demo system. This involves
data pre-processing, lexicon preparation, word level interpretation, phrase level
interpretation and the development of a Graphical User Interface (GUI). No earlier work
fully presents a handwritten Pitman’s Shorthand recognizer for handheld devices with a
complete GUI.
One factor that favours the automatic recognition of handwritten Pitman's Shorthand is
that the notation itself is simple and fast to write. Pitman's Shorthand records speech
phonetically and comprises simple notations for 24 consonants, 12 vowels and 4 diphthongs.
It defines 90 of the most frequently used words as shortforms (i.e., single simple pen strokes
invented to improve writing speed), and these 90 shortforms account for over 37% of the
most commonly used English words [Lg90].
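The coverage contribution of shortforms can be illustrated with a toy computation; the corpus and shortform set below are invented placeholders, not the real 90 shortforms or a real frequency list:

```python
from collections import Counter

# Hypothetical mini-corpus and toy shortform subset, for illustration only;
# Pitman's Shorthand defines 90 shortforms for frequent words [Lg90].
corpus = "the quick brown fox and the lazy dog and the fox".split()
shortforms = {"the", "and", "of", "to", "a"}

counts = Counter(corpus)
total = sum(counts.values())
covered = sum(n for word, n in counts.items() if word in shortforms)
coverage = covered / total  # fraction of running words written as shortforms

print(f"shortforms cover {coverage:.0%} of running words")
```

Even a handful of very frequent words covers a large share of running text, which is why 90 shortforms can account for over a third of commonly used English words.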
Given the need for simultaneous verbatim transcription of speech, the automatic
recognition of handwritten Pitman's Shorthand should be seen not as an option, but as a
necessity for mobile rapid note-takers. Regardless of the portability and efficiency of
handheld devices, today's mobile rapid note-takers (e.g., stenographers) still retain the
traditional way of writing shorthand with a paper notepad and a pencil, as their tablet PCs or
Personal Digital Assistants (PDAs) are not productive enough to record speech in real
time.
In addition, having a cooperative research network provides a firm foundation on which this
research can be based. The linguistic post processing of handwritten Pitman’s Shorthand
can be taken as a further step of expanding what is already possible with a Pitman’s
Shorthand classifier, as reported in the literature. The classifier performs noise reduction,
outline segmentation and the classification of pattern primitives into related categories. It is
a low level processing tool and its output is fed directly to the transcription engine.
Finally, hardware and technical viability played an important role in the successful
development of the whole research. Compared to the time of the previous research,
handheld devices have become more easily accessible, more powerful and cheaper. A
number of mobile PC and tablet PC development tool kits have become
available and these factors have strengthened the feasibility of the research.
1.1.3 Scope
From a handwriting recognition perspective, this research relates to online recognition¹.
It includes a minimal study of the low level processing of handwritten scripts, together with
in-depth research into the transliteration of shorthand primitives into orthographic English words.
This incorporates theories and techniques of pattern recognition, natural language processing
and mobile PC applications. Figure 1.1 illustrates a high level view of the scope of the
thesis.
¹ In online recognition, the input is in the form of successive stroke points collected in
time order, whereas in offline recognition the input is a digital image of a
handwritten word.
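The footnote's distinction can be made concrete with a small sketch of an online ink representation; the type names and fields below are assumptions for illustration, not the collaborator's actual data format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InkPoint:
    """One sampled pen position; online recognisers receive these in time order."""
    x: float
    y: float
    t: float  # timestamp in milliseconds (field names are illustrative)

# A stroke is the point sequence between pen-down and pen-up; an outline is a
# list of strokes. An offline image would discard the timing entirely.
Stroke = List[InkPoint]

def duration(stroke: Stroke) -> float:
    """Writing time of a stroke, available online but lost in an offline image."""
    return stroke[-1].t - stroke[0].t

stroke = [InkPoint(0.0, 0.0, 0.0), InkPoint(3.0, 4.0, 16.0), InkPoint(6.0, 8.0, 33.0)]
print(duration(stroke))  # 33.0
```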
Figure 1.1: Scope of the thesis
Three areas have been investigated in the field of pattern recognition. The first one is
concerned with setting protocols to interrelate a linguistic post processor with a low level
classification engine. Without the successful integration of these two engines, work in this
thesis would not have been feasible. The second one consists of defining a network model
that not only best represents the natural ambiguity of handwritten Pitman’s Shorthand, but
also produces promising output for a written word. The third area is focused on investigating
relevant word rejection strategies in which the interpretation cost has been taken into
account, mainly in terms of its search time and storage requirements.
In the field of natural language processing, a substantial amount of work has been done in
the construction of a shorthand lexicon that is used to support word level transcription. This
mainly includes the application of rule based algorithms to simulate instinctive knowledge
gained from learning Pitman’s Shorthand and the creation of a shorthand dictionary based on
this knowledge. In addition, a survey on the impact of statistical language modelling in
handwriting recognition has been carried out in relation to phrase level transcription.
[Figure 1.1 diagram labels: natural language processing (syntactic knowledge, lexical semantic knowledge); pattern recognition (handwriting recognition, online handwriting recognition); pen based PC applications (tablet PC applications); statistical language model; handheld device applications; legend marking the scope of the thesis.]
In the field of mobile PC applications, three types of end user interfaces have been
developed in this research; (1) a Training Data Collector, (2) an Advanced User Controller
and (3) a Final User Interface. By using the Training Data Collector, a vast amount of
training data can be collected effectively, and by using the Advanced User Controller, a
developer can have deep insight into the structure of the system, thereby enabling him/her to
make changes to the low level parameter settings. Similarly, by using the Final User
Interface, a user can have a front-end view of the system and can practice real time
shorthand input into handheld devices. The development of the interfaces includes the
application of pen based APIs, analysis of parameters of the transcription engine, collection
of training and testing data, and evaluation of the overall system performance.
1.2 Brief Overview
1.2.1 General Objectives and Contributions
The aim of this research is to propose and evaluate a set of techniques to significantly
improve the transcription accuracy of a handwritten Pitman’s Shorthand recognizer and
deliver a commercially viable and functional prototype. In order to enable the reader to gain
a brief overview of this research, the following questions and answers are provided in which
the questions represent general objectives of the research and the answers highlight the
author’s contribution to achieve the objectives.
A set of questions relating to the system integration and configuration:
How effectively has the Pitman’s Shorthand linguistic post processor integrated with
the collaborator’s low level recognition engine under the condition of developing the
two engines in different countries?
The solution includes an extensive collaboration between the two teams including
the author’s annual visits to the partner’s institution, setting protocols on the data
flow and modification of components between the two systems, concurrent
evaluation of the whole system on both sites, and co-authoring the publication of
progress reports.
What are the tasks of the recognition engine and the transcription engine in general?
A high level view of the tasks of the recognition and transcription engines is shown
in Figure 1.2. The white boxes at the top of Figure 1.2 represent processes of the
recognition engine and the shaded boxes represent tasks taken by the transcription
engine. The sample input outlines in Figure 1.2 illustrate the functions of the
recognition and transcription engines.
A set of questions relating to the linguistic post processing:
To what extent is the linguistic post processor based on the previous work?
The linguistic post processor is based on the recognition engine developed by our
research collaborator, and the recognition engine has inherited the majority of the
previous work reported in the literature. Apart from the recognition components, the
remaining transcription components are based on approaches completely different
from those reported in the literature. Reasons for using the new approaches
are discussed in detail in Chapter 3.
Figure 1.2: A high level view of the scope of the recognition engine and the transcription engine
What new approaches are there in this linguistic post processing research?
The significant new approaches of this thesis are:
[Figure 1.2 diagram: coordinates of a pen → data collection → segmentation → classification → word level transcription → phrase level transcription → tablet PC based graphical user interface. For each sample input outline, a segment yields three possible types; candidate words such as worn/warm/storm and sudden/welcome/seldom are resolved to the result "Warm Welcome". The legend distinguishes processes of the collaborator's recognition engine from those of the author's transcription engine.]
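As a minimal sketch of the Figure 1.2 pipeline, the stage names below follow the figure while every function body, type and sample value is an invented placeholder (the real recognition and transcription engines are far more involved):

```python
from typing import List, Tuple

Point = Tuple[float, float]  # one (x, y) pen coordinate

def segment(points: List[Point]) -> List[List[Point]]:
    # Recognition engine stage: split the coordinate stream into stroke segments
    # (toy placeholder: treat the whole stream as one segment).
    return [points]

def classify(segments: List[List[Point]]) -> List[List[str]]:
    # Recognition engine stage: propose several possible primitive types per segment.
    return [["type1", "type2", "type3"] for _ in segments]

def transcribe_word(classified: List[List[str]]) -> List[str]:
    # Transcription engine stage: candidate English words for one outline.
    return ["worn", "warm", "storm"]

def transcribe_phrase(candidates_per_outline: List[List[str]]) -> List[str]:
    # Transcription engine stage: pick one word per outline using context
    # (toy placeholder: always take the second candidate).
    return [words[1] for words in candidates_per_outline]

outline1 = segment([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
candidates = [transcribe_word(classify(outline1)), ["sudden", "welcome", "seldom"]]
print(" ".join(transcribe_phrase(candidates)))  # warm welcome
```

The first two stages stand in for the collaborator's recognition engine and the last two for the author's transcription engine, matching the white and shaded boxes of the figure.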
(a) An electronic version of a shorthand dictionary has been successfully created
using rule based algorithms. No comparable Pitman's Shorthand dictionary in
electronic format existed previously.
(b) Bayesian Network based outline models have been proposed with the aid of the
electronic shorthand dictionary and training data. These outline models accurately
represent the distribution of the natural parameters of handwritten Pitman's
Shorthand and increase word transcription accuracy.
(c) A complete framework for the online recognition of handwritten Pitman's
Shorthand is reported in this thesis, whereas most of the work in the literature
emphasized only the initial segmentation and classification of shorthand primitives.
(d) The first tablet PC based demo system has been produced. This allows a future
researcher to have deep insight into the performance of the recognition and
transcription engines via functional interfaces. It also enables an end user to input
shorthand into a handheld device.
A set of questions relating to the development of a mobile PC application:
For what types of handheld devices is the system intended to be applicable?
The system is intended to be applicable for any pen based mobile device in which
the use of a traditional “QWERTY” keyboard is impractical. Experiments and
evaluations of this thesis are based on tablet PCs with Microsoft Windows XP
Tablet PC Edition 2005.
In what kind of domain/scenario is the application intended to be used?
The application is intended to be used:
(a) as a rapid text input system on handheld devices
e.g., typing a text message on a mobile phone, inputting rich text information into
PDA.
(b) as a useful tool for stenographers in a real time verbatim written transcription of
the spoken word
e.g., taking a memo in a meeting, taking verbatim legal records of speeches in a
court, providing real time subtitling services for the deaf and hard of hearing
community.
(c) as a real time language translation tool.
With additional configuration, the system can be applicable as a real time language
transliteration tool for an international traveller. For example, using shorthand a
person can write his/her question with English phonetics and immediately the
question is translated into, say, Japanese. This can resolve language barriers for
international travellers, enabling them to access essential information. The language
translation feature needs an additional installation and configuration of third party
software and it is not included in the scope of this thesis.
A set of questions relating to training and testing of the overall system:
What kind of people are involved in the training and testing of the overall system?
In order to evaluate a realistic performance of the whole system, the training and
testing involve writers with different levels of skills in Pitman’s Shorthand, different
genders and ages.
How is the whole system performance evaluated?
The overall system performance is evaluated under different criteria such as
unconstrained writing, independent users, different speeds of writing and different
levels of tidiness. The evaluation process also involves a list of practical concerns
such as usability, learning curve, and the popularity/commercial viability of the system.
1.3 Synopsis of the Dissertation
The research in this thesis combines theories and techniques from the fields of pattern
recognition, natural language processing and mobile PC application. It aims for a
commercially viable and functional prototype with a set of techniques that significantly
improve the transcription accuracy of a handwritten Pitman’s Shorthand recognizer.
This chapter (Chapter 1) presents the motivation, scope, and background of the research. It
introduces the three main problem areas relating to the themes of the thesis, major objectives
and contributions.
Chapter 2 reviews key concepts in the areas of Pitman’s Shorthand recognition, pattern
recognition and natural language processing. The focus in Pitman’s Shorthand recognition is
on the evaluation of existing text entry methods into handheld devices, the study of Pitman’s
Shorthand, and the review of existing approaches applied to the automatic recognition of
handwritten Pitman’s Shorthand problems. The focus in pattern recognition is on the
analysis of the capabilities of commonly used graphical models to resolve natural
ambiguities of handwriting. Finally, the focus in natural language processing is on the
review of the Viterbi algorithm and statistical language modelling techniques used to
enhance the solution to the phrase level transcription problem.
Chapter 3 reports on a prototype that implements the architecture designs described in the
literature. In particular, it expounds the phonetic based transcription of handwritten
Pitman’s Shorthand outlines and presents the problems that need resolving. This chapter
primarily discusses whether the conventional phonetic based transliteration method is
efficient for the purpose of the thesis.
Chapter 4 presents the main architecture and design of a novel primitive based transcription
approach. Ambiguities of handwritten Pitman’s Shorthand, in particular, stroke variations
and vowel omissions are resolved by introducing Bayesian Network based shorthand outline
models. The word interpretation includes outline models creation, belief propagation,
Bayesian Network based learning and model selection. The conceptual solution is shown to
improve the solution to the word level transcription problem.
Chapter 5 focuses on the generation of a novel machine-readable Pitman’s Shorthand
lexicon, which is repeatedly applied to the primitive based transcription of handwritten
Pitman’s Shorthand. The rule based conversion of a phonetic representation of a word into a
Pitman’s Shorthand representation is presented. This involves the creation of an electronic
Pitman’s Shorthand lexicon with the application of the writing rules of Pitman’s Shorthand,
plus the evaluation of the proposed methods with different sizes of lexicons.
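A rule based phoneme-to-primitive conversion of the kind Chapter 5 describes can be sketched as follows; the phoneme symbols, primitive names and rule table are invented for illustration and do not reproduce the thesis's actual rules or lexicon format:

```python
# Toy rule table: consonant phonemes map to stroke primitives. The light/heavy
# pairing mirrors the Pitman convention that, e.g., /P/ and /B/ share a stroke
# shape and differ only in line thickness (Section 2.2); the names are invented.
RULES = {
    "p": "light-downstroke", "b": "heavy-downstroke",
    "f": "light-curve", "v": "heavy-curve",
    "t": "light-vertical", "d": "heavy-vertical",
}
VOWELS = {"a", "e", "i", "o", "u"}

def phonemes_to_primitives(phonemes):
    """Keep consonant strokes; mark vowels as optional, since they may be omitted."""
    out = []
    for ph in phonemes:
        if ph in VOWELS:
            out.append(f"optional-vowel({ph})")
        elif ph in RULES:
            out.append(RULES[ph])
        else:
            raise KeyError(f"no rule for phoneme {ph!r}")
    return out

print(phonemes_to_primitives(["p", "a", "d"]))
# ['light-downstroke', 'optional-vowel(a)', 'heavy-vertical']
```

The real conversion applies many more writing rules (hooks, halving, doubling, position), but the shape is the same: a deterministic, rule driven mapping from a phonetic spelling to a primitive sequence.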
Chapter 6 proposes a Viterbi algorithm based framework to resolve the Pitman’s Shorthand
specific phrase level transcription problem. The framework comprises Pitman’s Shorthand
related contextual knowledge. Experimental results demonstrate the practical benefits of the
proposed framework.
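A minimal Viterbi decoder of the general kind Chapter 6 builds on can be sketched as below; the candidate words, scores and bigram probabilities are invented placeholders, and the real framework adds Pitman's Shorthand specific contextual knowledge:

```python
import math

def viterbi(candidates, bigram_logp, cand_logp):
    """Pick one word per outline, maximising candidate score plus bigram score.

    candidates: list of candidate-word lists, one list per outline.
    Unknown bigrams are backed off to a tiny floor probability.
    """
    best = {w: cand_logp[w] for w in candidates[0]}
    back = [{} for _ in candidates]
    for i in range(1, len(candidates)):
        new_best = {}
        for w in candidates[i]:
            prev, score = max(
                ((p, best[p] + bigram_logp.get((p, w), math.log(1e-6))) for p in best),
                key=lambda t: t[1],
            )
            new_best[w] = score + cand_logp[w]
            back[i][w] = prev
        best = new_best
    # Trace back the best path from the highest-scoring final word.
    last = max(best, key=best.get)
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy example mirroring Figure 1.2: two outlines, three candidates each.
cands = [["worn", "warm", "storm"], ["sudden", "welcome", "seldom"]]
cand_logp = {w: math.log(1 / 3) for row in cands for w in row}
bigram_logp = {("warm", "welcome"): math.log(0.5)}  # invented probability
print(viterbi(cands, bigram_logp, cand_logp))  # ['warm', 'welcome']
```

The single favoured bigram is enough to pull the decoder to "warm welcome", which is the role contextual knowledge plays in phrase level transcription.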
Chapter 7 documents the roles of the graphical user interfaces of this research, which are
designed for the developer's authoring environment, the experimental user's authoring
environment and the end-user's authoring environment. Experimental results substantiate
the feasibility of the proposed interfaces.
This thesis supports the argument that the development of an automatic handwritten
Pitman’s Shorthand interpreter is feasible and useful. Chapter 8 highlights the argument by
reviewing the dissertation’s key points, linking the results to the general objectives,
highlighting the contributions and outlining prospective future work.
2 Background to the Automatic Recognition of
Handwritten Pitman’s Shorthand
Chapter 2 Introduction
This chapter provides background information on the computer aided recognition and
interpretation of handwritten Pitman's Shorthand. It comprises seven sections, outlined
as follows:
- Evaluation of existing text input systems for handheld devices.
- A brief overview of Pitman’s Shorthand.
- An overview of the automatic recognition of handwritten Pitman’s Shorthand.
- Handwriting recognition algorithms to improve word level transliteration.
- Natural language processing algorithms to improve phrase level transliteration.
- Pen interface application programming interfaces (APIs).
- Summary.
2.1 Evaluation of Existing Text Input Systems for Handheld
Devices
This section briefly reviews text entry on current handheld devices and then evaluates the
capabilities of the available text entry methods, in particular by comparing them with those
of a handwritten Pitman's Shorthand recognizer. Methods evaluated include:
- On screen keyboard.
- Cursive handwriting recognition system.
- Gesture recognition system.
- Speech recognition system.
2.1.1 On-screen Keyboards vs. a Handwritten Pitman’s Shorthand
Recognizer
An on-screen keyboard is a virtual keyboard displayed on the flat display panel of a device
where text is entered by tapping a stylus on it serially, for instance, IBM’s Touchboard for
Windows. The method provides an adequate means of interaction with computers; however,
it requires constant visual attention since keys are not physically sensitive to fingers. From
the aspect of human computer interaction, the use of handwritten data entry has been shown
to be more natural for entering text into handheld devices [Win05]. However, practical use
for one system over another still relies on the purpose of use and/or individual user
preference. If a user prefers a handwritten recognizer to an on-screen keyboard in general, a
commercially viable handwritten Pitman’s Shorthand recognizer may be of great interest to
the user.
2.1.2 A Cursive Handwriting Recognizer vs. a Handwritten Pitman’s
Shorthand Recognizer
A cursive handwriting recognition engine built into Microsoft Windows XP Tablet PC
Edition 2005 [Win05] is a well known handwriting recognizer at the time of writing. It is capable of
interpreting cursive script; however, efficiency of the system is restricted by the limited
speed of normal cursive writing (i.e., less than 40 words per minute). Using handwritten
data not in the form of normal longhand is likely to be a solution to overcome the slow input
problem. According to [Lg90], a Pitman’s Shorthand writing system is an alternative to a
longhand writing system that can be practiced at nearly the same rate as speech (i.e., about
120-180 words per minute (wpm)).
On the whole, Pitman’s Shorthand presents a number of strengths that facilitate very rapid
writing, but it also presents a drawback in that it has a long learning curve, which includes
memorizing new phonetic symbols and pronouncing words based on a number of rules.
Having said that, there are millions of Pitman's Shorthand writers who have received
training in its use [Lg90], and most of them remark that it is worth learning despite a little
frustration at the time of learning. Therefore, the automatic recognition of
handwritten Pitman's Shorthand is intended to benefit a particular group of
stenographers, plus some interested users who are dedicated to achieving fast data entry using
handheld devices.
2.1.3 Gesture Based Text Entry Systems vs. a Handwritten Pitman’s
Shorthand Recognizer
In general, gesture based text entry systems provide a virtual keyboard that enables users to
make input gestures. A well known gesture based text entry system at the time of writing is
SHARK [ZK03], developed by IBM. It allows a user to input gestures with the aid of a
virtual, template keyboard and gradually trains the user to be capable of inputting gestures
without using the keyboard. In this way, SHARK eliminates the constant visual attention
required for a virtual keyboard and produces fast data input. To give the reader a
clear view of a gesture based text entry system, word entry into the SHARK system is
shown in Figure 2.1. Figure 2.1 (a) illustrates input for the word "quick" using a virtual
keyboard, and Figure 2.1 (b) illustrates input for the same word “quick” without using the
virtual keyboard.
On the whole, gesture based text entry systems facilitate a faster data input compared to
normal cursive handwriting recognizers; however, memorizing gestures of a substantial
number of words results in a very steep learning curve.
In general, Pitman’s Shorthand recognition is similar to gesture recognition since both
interpret a series of lines into words and produce a fast data input. However, there is no
need to memorize every gesture of a word using a handwritten Pitman’s Shorthand
recognizer since the construction of Pitman’s outlines is based on a set of phonetic rules.
Figure 2.1: Illustration of text entry using SHARK system (a) The word “quick” is written using ATOMIK keyboard layout (b) The word “quick” is written without using a
template keyboard
2.1.4 Speech Recognition Systems vs. a Handwritten Pitman’s
Shorthand Recognizer
In terms of efficiency and operational cost, speech recognition systems seem the most
outstanding compared to other data input methods because users can speak naturally as well
as rapidly (around 100-120 words per minute) using speech recognition systems. An
example is given in the real time subtitling of TV programs, where speech is automatically
transcribed into text and the cost of manual transcription is reduced. A primary negative aspect of
speech recognition systems is that data must be spoken. On some occasions, it is not always
feasible to input data via voice, for instance, an automatic transcription of a noisy debate
using a speech recognition system is considerably difficult unless it is feasible to encourage
speakers to use microphones. Therefore, this research proposes that it is reasonable to
develop a system that facilitates an alternative means to record speech without using speech
input.
2.2 Pitman’s Shorthand: a Brief Overview
Pitman's Shorthand was first presented by Sir Isaac Pitman in 1837 and it has two forms:
New Era and Pitman 2000. Research in this thesis is based on the latter, as Pitman 2000
is a modified version of New Era and offers more accurate transcription as well as a faster
learning curve.
Words are written as they are pronounced, and the main feature of Pitman’s Shorthand is the simplicity of its notations. There are 24 consonants, 12 vowels and 4 diphthongs in Pitman’s Shorthand. The skeleton of a shorthand outline is formed by a combination of consonant strokes, and the writing of vowels is optional. This means it is essential to write the consonant strokes of a word, but vowel notations can be omitted when the writing needs to be fast. There is no standard rule for the omission of vowels; it varies widely with a writer’s experience or individual inclination.
Due to the phonetic based formation of words, Pitman’s Shorthand is easily adaptable to multiple languages (15 languages to date). It is practiced as a speech-recording medium in the real time reporting community at a practical rate of about 120-180 words per minute [Lg90]. It is widely used in offices in the UK and is also taught in 74 other countries [Lg90].
Figure 2.2 illustrates 21 of the 24 basic Pitman’s consonants in three easily remembered diagrams. To understand the notation, consider the left-most stroke in Figure 2.2 (a): the notations for the phonemes /P/ and /B/ are the same down-stroke written with different line thicknesses. Similarly, Figure 2.2 (b) gives the corresponding notations for the phonemes /F/ and /V/.
Figure 2.2: Basic Consonants of Pitman’s Shorthand as illustrated in [Oj95]
In addition to the 21 consonants in Figure 2.2, there are three further consonants in Pitman’s Shorthand: /W/, /Y/ and /H/. These are formed using hooks and upstrokes, as shown in Figure 2.3. Vowels and diphthongs are simple pen strokes, illustrated in Figure 2.4.
Figure 2.3: /W/, /Y/, /H/ consonants of Pitman’s Shorthand
Figure 2.4: Vowels, diphthongs and diphones of Pitman’s Shorthand
Words are constructed with consonant and vowel notations in Pitman’s Shorthand and a
script containing both consonants and vowels is called a vocalized outline. Samples of
vocalized outlines, including notations of vowels, diphones and diphthongs are illustrated in
Figure 2.5.
Figure 2.5: Illustration of vocalized outlines
By using the basic notations illustrated in Figure 2.2, Figure 2.3 and Figure 2.4, a person can start writing a shorthand outline that is phonetically correct, but not in complete accordance with the special rules of Pitman’s Shorthand. The special rules comprise 20 definitions, invented for speed enhancement, which must be learned thoroughly by anyone who wants to become a professional Pitman’s Shorthand writer. Details of the special rules can be found in [Oj95]; one of them is given as an example here. In the example (Figure 2.6), the word “play”, comprising three phonemes (/P/, /L/ and /Ā/), can be written phonetically using the basic notations of Pitman’s Shorthand as shown in Figure 2.6 (b). However, one of the special rules of Pitman’s Shorthand reads: “if a phoneme /P/ is followed by a phoneme /L/, the notation for /L/ is transformed into a small hook and the small hook is attached to the beginning of the /P/ stroke.” Therefore, the word “play” should be written in the form of Figure 2.6 (c) rather than Figure 2.6 (b), although the form in (b) is phonetically correct.
Figure 2.6: (a) Basic notations of Pitman’s Shorthand (b) The word “play” is written phonetically using basic notations (c) The word “play” is written using a special rule
of Pitman’s Shorthand
(Figure 2.5 shows vocalized outlines, with vowel, diphone and diphthong notations marked, for the words “bait”, “go”, “radio” and “time”; Figure 2.6 shows the outlines for the word “play”.)
In addition to vocalized outlines, other important components of Pitman’s Shorthand are
short forms and phrases. In general, short forms and phrases account for over 40% of the
most commonly used English words [Lg90] and are key attributes for facilitating the
outstanding speed of Pitman’s Shorthand. Examples of short forms and phrases are depicted
in Figure 2.7.
Figure 2.7: (a) Samples of short forms (b) Samples of phrases
2.3 Automatic Recognition of Handwritten Pitman’s Shorthand: an
Overview
The first investigation into the feasibility of using handwritten Pitman’s Shorthand for verbatim transcription as an aid for the deaf was reported by Brooks and Newell [BN81], [Bc85], [BN85].
Concurrent work [Lg84] investigated this idea in more detail and further work [LD86]
evaluated the enormous potential of the online recognition of handwritten Pitman’s
Shorthand for the real time recording of speech (e.g., verbatim reporting of meetings and
court proceedings). In this approach, four main studies were carried out: (a) detection of consonant boundaries within a whole outline; (b) classification of segmented consonant strokes; (c) evaluation of the confusion between normal-length strokes and half-length or double-length strokes; and (d) evaluation of various inclinations of horizontal and vertical strokes. In addition,
different classification algorithms were used to classify vocalized outlines and short-forms in
this approach. The best classification rate reported at the time was 14.5%.
In the early 1990s, extensive research was carried out to improve the recognition of vocalised outlines and short-forms. Leedham [Lg90] reported that using contextual knowledge was the most feasible means of improving the recognition of short-forms, whose recognition was based on a template-matching algorithm. In this approach, transliteration was carried out by first sorting classified pattern primitives into correct linguistic order, then converting primitives into phonemes using a set of production rules, and finally converting phonemes into orthographic English words. The concept of a machinography (that is, how to modify the original Pitman’s notations to be ideally suited to machine recognition) was also addressed in this work.
In later work [LQ90], the basic notations of Pitman’s Shorthand were categorized into 89 basic features and incorporated into a neural network. This approach could correct initial classification errors and achieved a classification rate of 94.5%.
Concurrently, Leedham and Qiao [LQ90] carried out another experiment to evaluate classification performance using a fuzzy classifier. In this approach, 90% correct classification was achieved through interaction between the segmentation and classification processes. Initial classification errors were also corrected using knowledge of legal primitive pairs.
In 1993, Qiao and Leedham [QL93] took another innovative approach to classify segmented
primitives. Their new method allowed communication between bottom up processes (i.e.,
segmentation based classification) and top down processes (i.e., holistic classification) via an
interactive heuristic (IH) search schema. They reported that locating a boundary between
features without first recognizing a whole outline was difficult. The performance of their
work was 84% correct segmentation and 58% correct classification.
In the early 2000s, another research group [NB02], [KSN+03], [SKN+04], [KSN04] started investigating the off-line automatic recognition of handwritten Pitman’s Shorthand. This group concentrated more on the linguistic post-processing of classified primitives into orthographic English words. Similar to Leedham’s approach, phonetic based transcription using the same concept of vowel ordering was implemented. The incidence of homophones (outlines that are written identically but represent different words) was addressed in their work, and the filtering of homophones using domain based and context based rejection strategies was investigated. They noted that an ordinary phonetic dictionary was not adequate for generating text, and that a modified dictionary, designed specifically for the recognition of Pitman’s Shorthand, was necessary. On the whole, a major limitation of their work was an impractical assumption about homophones: only two homophones per word were considered.
In summary, previous research mainly emphasized the low-level segmentation and classification of shorthand primitives, with little work reported on the back-end transliteration. This thesis proposes that further extensive research is required to improve word level as well as phrase level transcription. To achieve this goal, it is first necessary to make a thorough evaluation of recent popular handwriting recognition algorithms and natural language processing algorithms.
2.4 Handwriting Recognition Algorithms to Improve a Word Level
Transliteration
Work in this thesis considers the fundamental problem of interpreting shorthand strokes of
digital ink as text. Here, features extracted from a shorthand outline already give a
reasonable separation of strokes and provide the related identity of the strokes. It is
necessary to take the context of strokes into account to achieve a promising interpretation; however, dealing with spatial context can easily become computationally intensive [BSH04]. For optimum text interpretation, it is practical to balance context against the low level ink information of the strokes.
In the field of handwriting recognition, a common approach to handling such variables (e.g., context and observed ink information) is to embed them in a probabilistic model and discriminate between them based on the resulting probabilities. Graphical models are considered here. Graphical models are a marriage between probability theory and graph theory [Jm99]. They come in two kinds: undirected and directed models. Undirected models have simple definitions of independence, whereas directed models have a more complicated notion of independence [Mk98]. The word recognition of handwritten shorthand involves considerable uncertainty and complexity, and directed models are better suited to representing the features of shorthand and the interdependencies between them. Popular directed graphical models are Hidden Markov Models (HMMs), Neural Networks and Bayesian Networks. In general, these models belong to the same family: for example, an HMM is a kind of dynamic Bayesian Network, and a Neural Network can be framed as a kind of input/output HMM. The primary difference between them is the way variables are structured (i.e., the topology) and the way interdependencies between variables are handled.
2.4.1 Hidden Markov Models (HMMs)
Hidden Markov Models represent hidden and observed states in terms of state variables, which can have complex interdependencies [Mk98]. A well-known tutorial on HMMs [Rl89] states that “The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities.”
A sample HMM model, representing an outline of Pitman’s Shorthand is illustrated in
Figure 2.8. In the figure, an outline is divided into several slices - each slice represents a
segmented classified primitive, containing one discrete hidden node and one discrete
observed node.
Figure 2.8: A sample HMM model for a single outline of Pitman’s Shorthand. At each state i, the probability βi that a particular stroke Si is of type Ti is observed.
There are several kinds of HMMs depending on network topology: HMMs with a mixture of
Gaussian output, input-output HMMs and factorial HMMs. Details of these algorithms can
be found in the literature [Mk01], [Rl89].
In the field of pattern recognition many systems have applied HMMs – examples include the
representation of utterances as HMMs for speech recognition [Sa04], [MS04]; the
representation of facial images (combinations of hair, forehead, eyes, nose and mouth) as
HMMs for face recognition [HSS02], [KKL03]; the representation of words as HMM for
handwriting recognition [GB04], [HLB00]; the representation of human motion as HMMs
for gesture recognition [CFH03], [KP01]; the representation of pen-gestures (e.g., writing
pressure and smoothness of a line) as HMMs for signature recognition [JBS05], [YWP95].
Generally, HMMs work extremely well for certain types of applications; however, the Markov assumption itself, i.e., that the probability of being in a given state at time t depends only on the state at time t-1, is not always appropriate for problems in which dependencies extend across earlier states [Rl89].
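As a concrete illustration of the HMM machinery described above, the following sketch implements the standard forward algorithm (the probability of an observation sequence summed over all hidden state paths) for a toy two-state model. The state names ("thin"/"thick" strokes), the observation symbols and all probabilities are invented for illustration and are not taken from this thesis.

```python
# Forward algorithm for a toy HMM over hypothetical shorthand stroke types.
states = ["thin", "thick"]
start_p = {"thin": 0.6, "thick": 0.4}
trans_p = {"thin": {"thin": 0.7, "thick": 0.3},
           "thick": {"thin": 0.4, "thick": 0.6}}
emit_p = {"thin": {"down": 0.8, "curve": 0.2},
          "thick": {"down": 0.3, "curve": 0.7}}

def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability of an observation sequence under the HMM."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

seq_prob = forward(["down", "curve"], states, start_p, trans_p, emit_p)  # 0.228
```

The recursion keeps, for each state, the summed probability of all paths ending there; this is the quantity a recogniser would compare across candidate word models.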
2.4.2 Neural Networks
Neural Networks are inspired by the structure of the brain and are designed to mimic networks of biological neurons [Ri93]. They consist of neurons (i.e., variables) connected via weighted links, where each weight specifies the strength of the connection between one node and another. The use of Neural Networks has been demonstrated in several pattern recognition applications [Ri93]. Like HMMs, Neural Networks have been devised in different forms, including single-layer linear networks, threshold networks, multilayer networks and multilayer networks with learning.
Figure 2.9: An individual cell of the Neural Network modelled for the classification of handwritten Pitman’s Shorthand in [LQ90]
A Multilayer Neural Network designed for the transcription of handwritten Pitman’s
Shorthand presented in previous research [LQ90] is illustrated in Figure 2.9. In that
network, there are 20 layers and each layer (i.e., each segment) consists of 89 nodes,
representing the 89 basic Pitman’s primitives. Only one node from each layer is capable of
activating the next layer and the activation is based on the competition among the nodes. A
major drawback of this model is an unnecessary consideration of a wide range of primitives
in each layer. In fact, by using the context of an outline and a shorthand dictionary, the number of nodes required for each layer can be reduced.
2.4.3 Bayesian Networks
Word level transcription in this thesis mainly applies the Bayesian Network architecture. A Bayesian Network [Pj88] is a directed acyclic graph in which each node represents a mutually exclusive and collectively exhaustive set of random variables, and the links between nodes signify probabilistic dependencies between them. Bayesian Networks have been a remarkable tool in the domain of handwriting recognition for their outstanding ability to model natural ambiguity (e.g., to model complex stroke relationships). In on-line handwriting recognition, stroke relationships are usually robust against geometric variation and are important for discriminating characters of similar shape [CK04].
In Pitman’s Shorthand, stroke relationships mean the occurrence of vowel notations and their positions in a vocalized outline, and the starting position of the first consonant stroke, i.e., whether it is written above, on or below a base line. An example of vowel dependency and an example of positional dependency of the first consonant stroke are illustrated in Figure 2.10 (a) and (b) respectively. As shown in Figure 2.10 (a), a dot vowel written at two different locations (i.e., the beginning and end of a stroke) represents two different words in Pitman’s Shorthand. Similarly, two identical outlines written at two different starting positions (i.e., above and below a base line) represent two different words in Figure 2.10 (b).
Figure 2.10: Illustration of stroke dependencies in Pitman’s Shorthand (a) vowel dependency (b) positional dependency of the first consonant primitive
A summary of Bayesian Networks is given in this chapter, and the implementation of the network for the word level transcription of handwritten Pitman’s Shorthand is discussed in detail in Chapter 4. In general, the basic concepts of Bayesian Networks are discussed under the following headings:
Conditional independence.
Inference.
Learning.
2.4.3.1 Conditional Independence
In a Bayesian Network, an edge between nodes defines a dependency between variables. For example, consider the event “grass is wet” and the possible causes “cloudy” and “rain”. If cloudy (C) becomes independent of wet grass (W) once rain (R) is observed, the conditional independence between cloudy (C) and wet grass (W) can be indicated by a chain of arrows, as shown in Figure 2.11.
(In Figure 2.10 (a), the outlines for “aid” and “eat” differ only in whether the dot vowel is written at the end or in the middle of the stroke; in (b), the outlines for “bath” and “bathe” differ only in whether the first consonant /B/ is written above or on the base line.)
Figure 2.11: C is conditionally independent of W given R
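The chain structure of Figure 2.11 can be checked numerically: in a network C → R → W, the probability of W given both C and R does not depend on C. A minimal sketch by enumeration, with illustrative probabilities that are not taken from the thesis:

```python
# Numerical check of conditional independence on the chain C -> R -> W.
# All probabilities are illustrative.
P_C = {True: 0.5, False: 0.5}
P_R_given_C = {True: {True: 0.8, False: 0.2},   # P(R | C)
               False: {True: 0.1, False: 0.9}}
P_W_given_R = {True: {True: 0.9, False: 0.1},   # P(W | R)
               False: {True: 0.2, False: 0.8}}

def joint(c, r, w):
    """Joint probability factorised along the chain."""
    return P_C[c] * P_R_given_C[c][r] * P_W_given_R[r][w]

def p_w_given(c, r):
    """P(W=True | C=c, R=r), computed from the joint distribution."""
    return joint(c, r, True) / (joint(c, r, True) + joint(c, r, False))

# Once R is known, C adds no information about W:
assert abs(p_w_given(True, True) - p_w_given(False, True)) < 1e-12
```

The factors for C cancel in the ratio, so P(W | C, R) collapses to P(W | R), which is exactly the conditional independence the figure expresses.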
Another type of dependency in a Bayesian Network is “explaining away” [WH93], in which each variable competes to “explain” the observed data. For example, consider the event “grass is wet” and the possible causes “rain” and “water sprinkler”. In contrast to the case in Figure 2.11, Figure 2.12 illustrates that two independent nodes, sprinkler (S) and rain (R), become conditionally dependent once the data wet grass (W) is observed. The converging arrows towards wet grass (W) in Figure 2.12 indicate that if the grass is wet while it is raining, the probability that the sprinkler is on automatically becomes lower, and vice versa.
Figure 2.12: S is conditionally dependent on R given the observed data W
Therefore, whether a node is observed or hidden strongly influences the conditional dependencies between variables in a Bayesian Network. Using the Bayes ball algorithm [Sr98], conditional independence between variables can be easily determined from the information about which nodes are hidden or observed. The Bayes ball algorithm is illustrated in Figure 2.13.
CPT of the W node (Figure 2.12):

S R | P(W=T) | P(W=F)
T T |  0.98  |  0.02
T F |  0.95  |  0.05
F T |  0.94  |  0.06
F F |  0.00  |  1.00
Figure 2.13: Illustration of the Bayes Ball algorithm [Sr98]. If there is no flow of a ball from A to B in a graph, A and B are conditionally independent given a set of observed
or hidden variables X and vice versa.
In addition, every node in the Bayesian Network needs to be specified with a Conditional
Probability Distribution (CPD) and a table holding these distribution values is called a
“Conditional Probability Table” (CPT). A sample CPT of a W node is shown in Figure
2.12. The table indicates the likelihood of grass getting wet with regard to whether a
sprinkler is on and/or whether it has rained.
2.4.3.2 Inference
Inference in a Bayesian Network involves computing the probability distribution of a node given the values of some other nodes. In other words, it is the process of finding the likelihood of an explanation given the evidence and the prior probabilities. One of the reasons
Bayesian Networks are useful is because they permit a more efficient inference procedure
[Ja99]. Inference can be categorized into two types: exact and approximate.
Exact inference procedures are useful when a network structure is not too complex;
however, approximate inference procedures work better in practice when a model becomes
computationally complicated such as models with a repetitive structure or large clusters.
Examples of exact inference algorithms include a local message passing algorithm [Pj88],
[PS91] and a junction tree algorithm [HD96], [CDL+99]. Popular approximate inference
methods include Monte Carlo sampling methods [MD98], variational techniques [SJJ96],
[JGJ+98], [JJ98], and loopy belief propagation [WF99], [Wy00], [FW00].
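A minimal instance of exact inference by enumeration on a simplified two-cause version of the sprinkler network, in which S and R are treated as independent root causes of W. The CPT of W uses the values shown in Figure 2.12; the priors assumed for S and R are illustrative only.

```python
# Exact inference by enumeration: P(R=True | W=True) on a simplified
# sprinkler network.  Priors on S and R are illustrative assumptions;
# the CPT of W follows Figure 2.12.
P_S = {True: 0.3, False: 0.7}
P_R = {True: 0.5, False: 0.5}
P_W = {(True, True): 0.98, (True, False): 0.95,
       (False, True): 0.94, (False, False): 0.0}   # P(W=True | S, R)

def posterior_r_given_w():
    """P(R=True | W=True), summing the joint over the hidden node S."""
    num = sum(P_S[s] * P_R[True] * P_W[(s, True)] for s in (True, False))
    den = sum(P_S[s] * P_R[r] * P_W[(s, r)]
              for s in (True, False) for r in (True, False))
    return num / den
```

Observing wet grass raises the belief in rain above its assumed prior of 0.5, which is the behaviour exact inference is meant to deliver; for larger, loopier networks the approximate methods cited above replace this brute-force enumeration.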
2.4.3.3 Learning
“Learning” in Bayesian Networks often refers to learning parameters of a network as well as
the structure of the network.
In brief, parameter learning is an estimation of a conditional probability table (CPT) of each
node in a network based on a number of training samples. Here, the learning methods vary
widely depending on attributes of training samples i.e., whether they are (fully or partially)
observed, or whether they are (fully or partially) hidden. In general, there are three common
types of parameter learning methods – Maximum Likelihood (ML), Maximum a Posterior
(MAP) and Expectation Maximization (EM).
In ML learning, the goal is to find the maximum likelihood of training data given N cases,
which are assumed to be independent. Assume that D = (D1, …, DM) is a training data set
which contains M cases, the maximum (optimal) likelihood of each node α can be denoted
as
\hat{\alpha} = \arg\max_{\alpha} P(D \mid \alpha)   (2.1)
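For fully observed discrete data, the maximisation in equation (2.1) reduces to counting configurations and normalising. A sketch for the W node with a hypothetical training set (the cases below are invented for illustration):

```python
from collections import Counter

# ML estimation of P(W=True | S, R) from fully observed cases:
# count each configuration and normalise.  Training data is hypothetical.
cases = [  # (sprinkler, rain, wet)
    (True, False, True), (True, False, True), (True, False, False),
    (False, True, True), (False, True, True), (False, False, False),
]
n_parent = Counter((s, r) for s, r, w in cases)       # N(S=s, R=r)
n_joint = Counter((s, r) for s, r, w in cases if w)   # N(W=T, S=s, R=r)

def p_wet_ml(s, r):
    """ML estimate of P(W=True | S=s, R=r)."""
    return n_joint[(s, r)] / n_parent[(s, r)]
```

For example, the configuration (S=T, R=F) occurs three times with W true twice, giving an estimate of 2/3.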
In MAP learning, Maximum a Posteriori (MAP) estimation assumes the existence of a prior p(β) over the parameters β [Ja99]. Because the algorithm is based on “counting”, a parameter never seen in the training samples would otherwise receive zero probability; the use of a Dirichlet prior prevents this. For the wet grass example in Figure 2.12, the MAP estimate for the wet grass node, including the Dirichlet prior, can be denoted as:
P_{MAP}(W=w \mid S=s, R=r) = \frac{N(W=w, S=s, R=r) + \alpha}{N(S=s, R=r) + \beta}   (2.2)
where N(·) is the number of times the corresponding configuration occurs in the training set, and α and β are uniform Dirichlet prior counts, used when a particular parameter is not seen in the training set. In general, MAP is used when there is a small number of training cases compared to the number of parameters [Mk01]; however, it is still important that the counts are based on sufficient statistics to achieve an optimal estimate.
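The Dirichlet-smoothed estimate of equation (2.2) can be sketched as follows; the pseudo-count values alpha and beta and the toy training cases are illustrative assumptions, not values used in the thesis.

```python
# MAP estimate of P(W=True | S=s, R=r) with uniform Dirichlet
# pseudo-counts, per equation (2.2).  alpha, beta and the training
# cases are illustrative.
def p_wet_map(cases, s, r, alpha=1.0, beta=2.0):
    n_joint = sum(1 for cs, cr, cw in cases if (cs, cr, cw) == (s, r, True))
    n_parent = sum(1 for cs, cr, _ in cases if (cs, cr) == (s, r))
    return (n_joint + alpha) / (n_parent + beta)

cases = [(True, False, True), (True, False, True),
         (True, False, False), (False, True, True)]
seen = p_wet_map(cases, True, False)     # (2 + 1) / (3 + 2) = 0.6
unseen = p_wet_map(cases, False, False)  # (0 + 1) / (0 + 2) = 0.5, not zero
```

The unseen parent configuration (S=F, R=F) still receives a non-zero estimate, which is precisely the zero-probability problem the Dirichlet prior addresses.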
Expectation Maximization (EM) is mainly used when variables are partially observable, i.e., the network contains some hidden nodes. It computes the expected values of the hidden nodes using an inference algorithm (the E step), and then treats these expected values as though they were observed when re-estimating the parameters (the M step) [Mk01]. When using EM, it is important to know the structure of the model in advance, as this is the key to identifying the hidden nodes. For the wet grass example in Figure 2.12, the EM estimate for the W node can be denoted as
P_{EM}(W=w \mid S=s, R=r) = \frac{E(W=w, S=s, R=r)}{E(S=s, R=r)}   (2.3)
where E(…) is the number of times corresponding parameters are expected to occur.
According to Murphy [Mk01], E(…) is computed as follows
E(e) = \sum_{m} I(e \mid D_m)\, P(e \mid D_m)   (2.4)
where I(e|Dm) is an indicator function which is 1 if a parameter e occurs in training case m,
and 0 otherwise.
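The expected count of equation (2.4) is a simple indicator-weighted sum. In the sketch below the per-case posteriors P(e | D_m) are supplied directly as illustrative numbers; in practice they would be produced by an inference pass over each partially observed training case.

```python
# Expected count E(e) per equation (2.4): an indicator-weighted sum of
# per-case posteriors.  The indicator and posterior values are illustrative.
def expected_count(indicator, posterior):
    """indicator[m] = I(e | D_m); posterior[m] = P(e | D_m)."""
    return sum(i * p for i, p in zip(indicator, posterior))

e = expected_count([1, 0, 1, 1], [0.9, 0.8, 0.5, 0.25])  # 0.9 + 0.5 + 0.25 = 1.65
```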
2.5 Natural Language Processing Algorithms for Handwritten
Phrase Recognition
This section presents natural language processing algorithms relating to the field of
handwriting recognition. In particular, it focuses on the role of statistical language modelling
algorithms in handwritten sentence recognition systems.
2.5.1 Statistical Language Modelling
[MS99] stated that the major purpose behind statistical language modelling is to capture a
language’s regularities via statistical inference on its corpus. According to the literature
[QAC05], the concept of applying statistical language models to automatic text transcription
was initiated by speech recognition research. The concept was then later adapted to
handwriting recognition problems, resulting in several handwriting recognition engines built
with statistical language modelling techniques. For instance, recent work [PVM+03],
[Ms01], [QAC05] and [MB01], [ZB04], [VBB04] applied statistical language modelling
techniques to resolve the problems of online and offline handwritten sentence recognition,
respectively, and the work [QAC05] achieved up to 90.4% word recognition accuracy.
In general, the most commonly used statistical language models in the field of handwriting
recognition are n-gram models, which are denoted as follows by [QAC05]:
p(W) = \prod_{i=1}^{n} p(w_i \mid w_{i-n+1}^{i-1})   (2.5)

where p(W) is the probability of a word sequence given by a statistical language model, and p(w_i \mid w_{i-n+1}^{i-1}) is estimated from the frequency with which the sequence w_{i-n+1}^{i} occurs in a corpus.
By applying a statistical language model, [QAC05] proposed a solution to online
handwritten sentence recognition as follows:
\hat{W} = \arg\max_{W} P(S \mid W)\, p(W)   (2.6)
where Ŵ is the most likely word sequence for a written sentence (out of the candidate sequences W), S is the given handwritten sentence to recognise, P(S|W) is the probability of the written sentence S given a word sequence W, and p(W) is the statistical language model’s probability for the sequence W. This work identifies the most likely word sequence for a written sentence by finding the best path in a word graph (i.e., a graphical model of a sentence’s candidate words) using a Viterbi search algorithm [QAC05].
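A minimal bigram (n = 2) instance of equation (2.5) can be built by counting relative frequencies in a corpus. The three sentences below are invented for illustration and are not the thesis's training data.

```python
from collections import Counter

# Bigram language model per equation (2.5), built from a toy corpus.
corpus = [["i", "am", "not"], ["i", "am", "here"], ["you", "are", "not"]]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1])
                  for sent in corpus for i in range(len(sent) - 1))

def p_bigram(prev, word):
    """Relative-frequency estimate of p(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sequence(words):
    """p(W) approximated as p(w1) times the product of bigram probabilities."""
    p = unigrams[words[0]] / sum(unigrams.values())
    for prev, word in zip(words, words[1:]):
        p *= p_bigram(prev, word)
    return p
```

A real system would add smoothing for unseen bigrams (analogous to the Dirichlet prior of equation (2.2)); this sketch keeps only the raw relative-frequency estimate.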
2.5.2 Viterbi Algorithm
The Viterbi algorithm provides an efficient way of finding the most likely state sequence, in the Maximum a Posteriori (MAP) probability sense, of a process that is assumed to be a finite-state discrete-time Markov process [Ml00]. Here, finite state means that the number of states in the model is limited; discrete-time means that it takes the same unit of time to move from any state to an adjacent state; and the Markov property means that (assuming a first order Markov process) the probability of being in state c_k at time k, given all states up to k-1, depends only on the previous state c_{k-1} at time k-1. [Ml00] formulates the first order Markov process as follows:
p(c_k \mid c_{k-1}, \ldots, c_0) = p(c_k \mid c_{k-1})   (2.7)
Overall, a Markov process can be of any order; the nth order Markov process is defined as:

p(c_k \mid c_{k-1}, \ldots, c_0) = p(c_k \mid c_{k-1}, \ldots, c_{k-n})   (2.8)
In order to clarify the Viterbi algorithm’s role in handwriting recognition, consider the formulation (2.9), proposed by [Ml00] for handwritten word recognition, in which the process is assumed to be a first order Markov process:

g_c(Z) = \sum_{i=1}^{n} \log p(z_i \mid c_i) + \log\left[ p(c_1)\, P(c_2 \mid c_1) \cdots P(c_n \mid c_{n-1}) \right]   (2.9)

where g_c(Z) is the maximum posterior probability of the character sequence conditioned on the candidate character sequence C = c_1, c_2, \ldots, c_n, and z_i is the feature vector for the ith character.
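The maximisation over candidate sequences in equation (2.9) can be carried out efficiently by the Viterbi algorithm in log space. The sketch below uses hypothetical candidate characters, emission scores and transition scores, not values from any real recogniser.

```python
import math

# Viterbi search maximising equation (2.9) in log space over candidate
# characters.  All characters and probabilities are hypothetical.
def viterbi(emit_logp, start_logp, trans_logp):
    """emit_logp[i][c] = log p(z_i | c); returns the best character sequence."""
    best = {c: (start_logp[c] + emit_logp[0][c], [c]) for c in start_logp}
    for scores in emit_logp[1:]:
        step = {}
        for c in scores:
            lp, path = max((best[p][0] + trans_logp[p][c] + scores[c], best[p][1])
                           for p in best)
            step[c] = (lp, path + [c])
        best = step
    return max(best.values())[1]

start_logp = {"a": math.log(0.6), "b": math.log(0.4)}
trans_logp = {"a": {"a": math.log(0.3), "b": math.log(0.7)},
              "b": {"a": math.log(0.5), "b": math.log(0.5)}}
emit_logp = [{"a": math.log(0.9), "b": math.log(0.1)},
             {"a": math.log(0.2), "b": math.log(0.8)}]
decoded = viterbi(emit_logp, start_logp, trans_logp)  # ["a", "b"]
```

Working in log space turns the product in equation (2.9) into a sum and avoids numerical underflow for long sequences, which is why the formulation above is already stated in logarithms.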
2.6 Pen Application Program Interfaces (APIs)
With the rapid growth in popularity of pen based computing in recent years, a number of pen based application program interfaces (APIs) have become widely available.
popular APIs for collecting, manipulating and recognizing digital ink is Microsoft Tablet PC
platform SDK APIs [Tab04], which mainly supports the Microsoft Tablet PC platform. The
APIs include functions to manipulate low level ink data as well as higher segment-level,
stroke-level, word-level and phrase-level recognition. Some of the stroke-level APIs are not
directly applicable to the current research as Pitman’s Shorthand is not included in the
supported languages. Nevertheless, other APIs are highly useful for the development of ink
input and text output user interfaces. Implementation of these APIs for the overall
recognition and transcription of handwritten Pitman’s Shorthand is discussed in detail in
Chapter 7.
2.7 Summary
This chapter presented a literature review of systems and techniques relating to computer
aided recognition and transcription of handwritten Pitman’s Shorthand. The commercial
viability of the handwritten Pitman’s Shorthand recogniser is evaluated in comparison to the
functionalities of handheld devices’ existing text entry systems. The chapter presents basic
information on Pitman’s Shorthand, which is vital to enable the reader to easily follow the
thesis’ discussions, and it also provides brief reviews of decades of previous work on the
automatic recognition of handwritten Pitman’s Shorthand. A number of graphical models
applied to the pattern recognition field were discussed, with a thorough algorithm review of
the Bayesian Network’s architecture, mainly from the aspect of the algorithm’s efficiency in
handling handwritten Pitman’s Shorthand word recognition problems. The role of statistical
language models in the recognition of handwritten sentences has also been addressed,
together with a review of the Viterbi algorithm. The chapter also highlighted tablet PC
related application program interfaces (APIs) that are essential for the development of a
commercially viable prototype handwritten Pitman’s Shorthand recogniser.
3 Evaluation of phonetic based transcription of vocalised
handwritten Pitman’s outlines
Chapter 3 Introduction
The previous chapter reviewed the performance of existing work carried out on the
automatic recognition of handwritten Pitman’s Shorthand and presented an overview of
popular pattern recognition algorithms that can be used to improve the performance of word
level and phrase level recognition. Before taking the next step to advanced word and phrase
recognition, this chapter first presents a preliminary experiment, carried out to verify
whether existing transliteration methods, proposed in the literature, are efficient enough for
the purpose of this project.
In particular, the primary goal of this preliminary assessment is to ensure whether it is
practical to convert segmented portions of shorthand outlines into phonetic values prior to a
text translation. Perhaps direct translation of segmented primitives of shorthand outlines
into English words is more efficient, however, such an attempt has never been reported
throughout two decades of previous work. It has been shown that a primitive to text
translation approach is robust against stroke variation [CK04] and the approach is applied in
several commercial handwriting recognisers [Hn97], [LY97], [HV93]. Taking into
consideration the transcription accuracy achieved by existing systems, this research does not
assume that phonetic based transcription is the only possible way to transliterate
handwritten Pitman's Shorthand. In addition, the direct translation of primitives into words
was not feasible at the time of previous work because there was no electronic Pitman's
Shorthand lexicon that enabled primitives to be mapped directly to related words. If such a
lexicon existed, direct translation of primitives into text would become feasible. It is
proposed in this research that it is
reasonable to create an electronic Pitman’s Shorthand lexicon and analyse a primitive-to-text
translation approach. However, a careful appraisal of conventional methods is performed
before implementing a new algorithm. Therefore, this chapter first analyses the
advantages and disadvantages of phonetic based translation via experimental results, and
then presents a discussion of why a primitive based transcription approach is preferable to
a phonetic based one.
In general, appraisal of existing methods can be carried out easily if the existing systems
serve the purpose of the assessment directly. However, this is not the case in the current
assessment (i.e., assessment of conventional phonetic based transcription methods). There
are two reasons for this: firstly, previous work by [LQ90], [QL93], [LD86] mainly
emphasises low level pattern classification and the work presents just logical procedures of a
linguistic post processor with no detailed implementation for phonetic based word
translation. Secondly, the work by [NB02], [KSN+03], [SKN+04] emphasises offline
recognition, and those systems do not fit the objectives of the current experiment. As a
result, this chapter presents a prototype of a linguistic post processor that includes the
conventional idea of phonetic based word translation, plus novel pattern tuning algorithms
which are effective in dealing with the shape variations of handwritten Pitman's Shorthand.
3.1 System Overview
In order to assist the reader to get a clear understanding of the whole framework, an
overview of the transcription engine in combination with our collaborator’s recognition
engine is given (Figure 3.1). Ink data is collected by the recognition engine whose role is to
first differentiate between vocalized outlines and short-forms. It then segments a vocalized
outline into the most relevant fragments by detecting dominant points along the outline. The
segmented primitives are then processed through a neural network classifier, and a ranked
list of pattern primitives, along with each of their related categories, is produced at the end of
the classification process. Short-forms are recognized separately from vocalized outlines
using a Template Matching Algorithm. Unlike the vocalized outline recognizer, the short-
form recognizer immediately produces a ranked list of candidate words for a given short-form.
Detailed descriptions of the collaborator's recognition engine can be found in recent
publications [YLH+04a], [YLH+04b], [YLH+05a].
Figure 3.1: An abstract view of the whole system
The role of the transcription engine is to find the best candidate word for a given vocalized
outline or short-form. It includes two major stages: word level transcription and phrase level
transcription. At the word level, short-forms are not taken into account since they have
already been interpreted into the most likely words by the recognition engine. Vocalized
outlines are transliterated into sets of English characters by two processes: pre-processing
and word recognition. These two processes are the primary components of the system
presented in this chapter. The pre-processor performs the setting up of essential lexical
knowledge relating to handwritten Pitman’s Shorthand. The word recognizer then takes a
ranked list of classified primitives, which are forwarded from the recognition engine as
input, and produces a ranked list of candidate words as output.
After word recognition, candidate words of either a vocalized outline or a short-form are put
through a phrase level processor and the word with the highest contextual probability is
chosen as a correct representation for an input outline. The phrase level transcription is not
studied in this chapter since the primary purpose of the preliminary experiment is to analyse
word recognition performance.
3.2 Transcription of Vocalized Outlines Based on a Phonetic
Approach
A detailed view of a phonetic based vocalized outline interpreter is illustrated in Figure 3.2
and it consists of the following modules:
- Lexicon preparation: converts a phonetic lexicon into a hash table such that similar
sounding words are indexed under the same key in order to cope with phonetic rules
of Pitman’s Shorthand.
- Nearest Neighbourhood Query: slightly adjusts segmented features of an input
shorthand outline in order to cope with shape-variation in handwriting.
- Feature to phoneme conversion: converts geometrical features of shorthand outlines
into phonetic representation in order to match with a phonetic lexicon.
- Phoneme ordering: reorders resultant phonemes, produced by a “Feature to
phoneme conversion” process into a linguistic sequence in order to match with a
phonetic lexicon.
- Lexicon lookup: matches a series of phonemes with a phonetic lexicon to find
related English words.
Figure 3.2: Detailed view of a vocalized outline interpreter
3.3 Lexicon Preparation
The primary purpose of the lexicon preparation is to convert a phonetic dictionary into a
hash table data structure and categorise words with similar pronunciations under the same
key. Here, words with similar pronunciations mean words with either identical phonemes or
similar phonemes. For instance, the words “bet” and “pet” have similar pronunciations
because they contain similar phonemes, the only difference being the voiced consonant /B/
versus the unvoiced consonant /P/.
A major benefit of keeping similar sounding words under the same key is to reduce the
search complexity to O(1). In addition, it enables the retrieval of a list of ambiguous words
for an input outline by a single lookup because the creation of a hash table for a lexicon is
based on the hypothesis: “words with similar pronunciations resemble one another in
Pitman’s Shorthand”. One may question why similar sounding words resemble one another
in Pitman’s Shorthand since this assumption is not true in normal English. In normal
alphabetical handwriting, two similar sounding words do not exactly need to look alike. An
example is given with the words “tail” and “tale” in (Figure 3.3); the two words sound alike,
but their scripts are dissimilar enough not to be confused.
Figure 3.3: Illustration of sample words in normal English and Pitman’s Shorthand
In contrast to normal English, similar sounding words do look alike or are identical in
Pitman’s Shorthand. This is due to the special rule of Pitman’s Shorthand invented for
speed improvement purposes i.e., a pair of voiced and unvoiced consonants are written in the
same stroke with different line thicknesses. An example is given with the words “tail” and
“tale” again (Figure 3.3): the two words sound alike and their scripts look identical in
Pitman’s Shorthand. Therefore, keeping similar sounding words under the same root
directly affects search performance and an algorithm for the lexicon organisation is
presented below:
N: numbers of words contained in a phonetic lexicon
Xi: ith phonetic index of the phonetic lexicon
Yi: word data relating to Xi
table: a hash table used to store data of the phonetic lexicon
key: a phonetic key
value: word data to which a specified key is mapped in table
Initialisation
table = a hash table
Lexicon organisation
For i = 0 to N
key = Xi
Yi = getWordData(Xi)
//convert unvoiced consonants into voiced consonants
key = tuneToVoicedConsonants(key)
//if a phonetic key already exists
if (table.containsKey(key))
value = table.get(key)
value += Yi
end
else
value = Yi
end
table.put(key,value)
end
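The lexicon organisation procedure above can be sketched in a few lines of Python. This is a minimal sketch only: the `UNVOICED_TO_VOICED` mapping and the ARPAbet-style phoneme symbols are simplified assumptions standing in for the full phoneme tables used by the system.

```python
# Sketch of the lexicon preparation step: words whose phonetic keys collapse
# to the same voiced-consonant form are grouped under one hash-table index.
# The mapping below is a simplified assumption, not the thesis's full table.
UNVOICED_TO_VOICED = {"P": "B", "T": "D", "K": "G", "F": "V",
                      "S": "Z", "SH": "ZH", "TH": "DH"}

def tune_to_voiced_consonants(phonemes):
    """Replace each unvoiced consonant with its voiced counterpart."""
    return tuple(UNVOICED_TO_VOICED.get(p, p) for p in phonemes)

def build_lexicon(entries):
    """entries: iterable of (phoneme_sequence, word) pairs from a phonetic dictionary."""
    table = {}
    for phonemes, word in entries:
        key = tune_to_voiced_consonants(phonemes)
        table.setdefault(key, []).append(word)  # expected O(1) insert/lookup
    return table

# "bat", "pat", "bad" and "pad" all collapse to the key ("B", "AE", "D"),
# mirroring the single /B Ă T/ index of Figure 3.8.
lexicon = build_lexicon([
    (("B", "AE", "T"), "bat"),
    (("P", "AE", "T"), "pat"),
    (("B", "AE", "D"), "bad"),
    (("P", "AE", "D"), "pad"),
])
```

Rebuilding the table with `build_lexicon` over a new word list also covers the lexicon-update case described in the following paragraph.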
The lexicon preparation takes place when the transcription engine is run for the first time
and does not repeat when input outlines are transcribed in real time. If any modification of a
lexicon is required, such as a change of word-list or a change of user’s domain, the existing
hash table can be updated by repeating the “lexicon preparation” procedure.
Once the lexicon data is ready, the next process, denoted "Nearest Neighbourhood Query",
is invoked.
3.4 Nearest Neighbourhood Query (NNQ)
[Figure content: consonant neighbourhoods such as {F, V}, {P, B}, {S, Z}, {TH, th} and {T, D}; vowel neighbourhoods distinguished by position (at the beginning, in the middle, or at the end of an outline); a circle neighbourhood (closed circles, unclosed circles, hooks)]
Figure 3.4: Sample neighbourhoods predefined in the Nearest Neighbourhood Query Approach
The Nearest Neighbourhood Query (NNQ) is, in fact, a heuristic approach in which
misclassified pen strokes are adjusted according to the degree of similarity to other strokes.
Primitives with similar geometric features are predefined in the same neighbourhood, and the
system comprises seven neighbourhoods: four relate to vertical and horizontal strokes, one
to circular primitives, and the remaining two to dot and dash vowel primitives.
Here, similarity means having similar angular structure for stroke primitives, having similar
shape for circular primitives or having similar location and shape for vowel primitives.
Samples of the predefined neighbourhoods are illustrated in Figure 3.4 and the Nearest
Neighbourhood Query algorithm is presented as follows:
{N1, N2, .., N7}: a collection of seven neighbourhoods
O: an input handwritten outline
I : number of segments of an input outline, O
Si: ith segment of an input outline, O
Pattern: a pattern category of Si
Xi: a resultant vector, containing a set of primitives that are similar to Si
R: an output vector, containing a set of Xi where (i = 1, 2,.., I)
M: a matrix, containing a number of outlines that are similar to O
Initialization
Initialize N1, N2, N3, N4, N5, N6, N7
X = a new vector
R = a new vector
Stroke adjustment
for i = 1 to I
//assign the ith segment of an input outline as a pattern category
Pattern = Si
for j = 1 to 7
//if the jth neighbourhood contains the value of Pattern
if (Pattern ∈ Nj)
//get all the elements of Nj excluding Pattern
Xi = Nj \ {Pattern}
end
end
R += Xi
end
M = createMatrix(R)
return M //output of the NNQ algorithm
Figure 3.5: Sample output produced by the Nearest Neighbourhood Query
The output of NNQ is a matrix of primitives, whereby each row represents a particular
shorthand outline that is similar to an input pattern and each column represents a certain
segment of the shorthand outline. A pictorial presentation of NNQ is given in Figure 3.5 in
which sample input and output of the algorithm can be clearly seen. Once the NNQ process
is completed, the next process, "Feature to Phoneme Conversion", is invoked.
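The NNQ procedure can be sketched as follows. The neighbourhood contents are simplified placeholders for the seven predefined neighbourhoods, and the original classification is kept alongside its neighbours so that the output matrix also covers the input outline — a slight simplification of the algorithm described above.

```python
from itertools import product

# Sketch of the Nearest Neighbourhood Query: each classified segment is
# expanded into the set of geometrically similar primitives, and the
# per-segment alternatives are combined into a matrix of candidate outlines.
# The neighbourhood contents below are illustrative assumptions.
NEIGHBOURHOODS = [
    {"F", "V"}, {"P", "B"}, {"S", "Z"}, {"TH", "th"}, {"T", "D"},
    {"dot_vowel", "dash_vowel"},
    {"closed_circle", "open_circle", "hook"},
]

def nnq(segments):
    per_segment = []
    for seg in segments:
        alternatives = {seg}
        for neighbourhood in NEIGHBOURHOODS:
            if seg in neighbourhood:
                alternatives |= neighbourhood
        per_segment.append(sorted(alternatives))
    # each row of the matrix is one outline similar to the input
    return [list(row) for row in product(*per_segment)]
```

For example, `nnq(["P", "T"])` yields four candidate outlines (/P/ or /B/ followed by /T/ or /D/). The matrix grows combinatorially with outline length, so in practice only top-ranked alternatives would be expanded.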
3.5 Feature to Phoneme Conversion
This process converts segmented portions of a shorthand outline (e.g., loops, hooks or
strokes) into a phonetic representation using a set of production rules. According to our
study, approximately 20% of segmented portions of shorthand outlines, either forwarded
from the recognition engine or produced by the NNQ, can be directly converted into basic
Pitman’s phonemes. The remaining 80% need knowledge of adjacent primitives to be
translated into phonetic values based on a number of production rules. Similar to the work
by Leedham [Lg90], the production rules are applied with respect to a relationship between
an individual primitive and its adjacent primitives. Unlike Leedham’s approach, rules are
applied in the order of priority in this novel system. Basically, there are five production rules
introduced in this new system and they can be stated in a descending priority order as
follows:
1. Feature Detection (FD)
2. Length Detection (LD)
3. Primitive Combination (PC)
4. Primitive Combination and Reverse Ordering (PCRO)
5. Direct Translation (DT)
To clarify the first two rules, consider the two examples described below; to clarify the
last three rules, refer to the examples in Table 3-2. In addition, the basic notations of
Pitman's Shorthand relating to each rule are listed in Table 3-1.
Table 3-1: Relationship between the production rules and basic Pitman phonemes
Rule Pitman phonemes
FD SES, ZES circles, ST, STER loop, N, F, V, SHUN hook, suffix –SHIP hook,
suffix –ING/INGS dot
LD MD, ND, suffix –MENT, half length strokes, double length strokes
PC W, Y, H
PCRO PL, BL, etc., PR, BR, etc., FR, VR, etc., and FL, VL, etc.
DT All consonants except Y, W and H
Example 1: Application of Feature Detection (FD) Rule
STER large loop: Pitman uses a large loop to indicate the sound of /STER/ in the middle or
at the end of an outline. For this case, one of the FD rules reads: “IF a stroke or curve
primitive is followed by a large circular loop primitive in the middle or at the end of an
outline, THEN the loop appends phonemes of /STER/ to the preceding phoneme.”
Example 2: Application of Length Detection (LD) Rule
Double length curves: Normal Curve-primitives are doubled in length to represent the
addition of the syllables -TER, -DER, -THER and -TURE in Pitman’s Shorthand. For this
case, one of the LD rules reads: "IF a curve primitive is doubled in length, THEN the
double-length curve inserts the phonemes /TER/, /DER/, /THER/ or /TURE/ after the
phoneme of the curve." To understand this principle clearly, consider the example in Figure
3.6.
Figure 3.6: Sample of phoneme translation of a double length stroke
As shown in the reference section of Figure 3.6, a normal downward curve represents a
phoneme /F/ in Pitman’s Shorthand, however, when the curve is doubled in length, it
represents the sound /F/ plus additional sounds of /TER/, /DER/, /THER/ or /TURE/.
Therefore, a candidate list for the word “after” contains four different pronunciations at the
end of phoneme conversion (Figure 3.6).
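The priority ordering of the production rules can be illustrated with a short sketch. The rule predicates below are hypothetical placeholders (only an FD-style and a DT-style rule are shown), not the system's actual rule base.

```python
# Illustrative sketch of priority-ordered rule application for feature-to-
# phoneme conversion. Each rule is a (match, translate) pair; the concrete
# predicates here are hypothetical placeholders, not the thesis's rule base.
def fd_match(prev, cur):       # Feature Detection: e.g. a large loop after a stroke
    return cur == "large_loop" and prev is not None

def fd_translate(prev, cur):
    return ["STER"]

def dt_match(prev, cur):       # Direct Translation: a lone known primitive
    return cur == "horizontal_stroke"

def dt_translate(prev, cur):
    return ["G/K"]

RULES = [  # descending priority: FD, LD, PC, PCRO, DT (only FD and DT sketched)
    (fd_match, fd_translate),
    (dt_match, dt_translate),
]

def to_phonemes(primitives):
    phonemes = []
    prev = None
    for cur in primitives:
        for match, translate in RULES:
            if match(prev, cur):
                phonemes.extend(translate(prev, cur))
                break  # first (highest-priority) matching rule wins
        # unmatched primitives are simply skipped in this sketch
        prev = cur
    return phonemes
```

The `break` statement is what enforces the descending priority: once a higher-priority rule fires for a primitive, lower-priority rules are not consulted.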
3.6 Phoneme Ordering
The primary task of “Phoneme ordering” is reordering resultant phonemes, produced by the
“Feature to Phoneme Conversion” process. The reordering is required due to a special
writing order of Pitman’s Shorthand i.e., consonants of a word are always written first and
vowel notations are written only after the completion of a whole consonant kernel. In online
handwriting recognition, pen data is collected in time order, so vowel primitives are always
tagged behind consonant primitives regardless of the linguistic order in our system. To
obtain correctly ordered phonemes, vowels need to be inserted
among consonants. Leedham [Lg90] proposed the same strategy to sort phonemes according
to the linguistic order and this process is denoted as “Phoneme ordering”.
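The reordering can be sketched as an insertion of tagged vowels among the consonants. The (vowel, position) encoding used here is an assumption made for this sketch; the system itself derives vowel positions from dominant-point and pen-sequence information.

```python
# Illustrative sketch of phoneme ordering: consonant phonemes arrive in
# written (time) order, vowels are tagged afterwards with the index of the
# consonant they precede. The (vowel, position) encoding is an assumption.
def order_phonemes(consonants, tagged_vowels):
    """consonants: consonant phonemes in written order.
    tagged_vowels: (vowel, i) pairs meaning 'insert before consonant i'
    (i == len(consonants) appends the vowel at the end of the word)."""
    slots = [[] for _ in range(len(consonants) + 1)]
    for vowel, i in tagged_vowels:
        slots[i].append(vowel)
    ordered = []
    for i, consonant in enumerate(consonants):
        ordered.extend(slots[i])
        ordered.append(consonant)
    ordered.extend(slots[len(consonants)])
    return ordered

# e.g. a vowel /A/ tagged to precede the first consonant:
# order_phonemes(["F", "TER"], [("A", 0)]) -> ["A", "F", "TER"]
```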
An example of "phoneme ordering" is given in Figure 3.7, in which the sample inputs are
taken directly from the outputs of the "Feature to Phoneme Conversion" process demonstrated
in Figure 3.6. As shown in Figure 3.7(a), the vowel /A/ is detected last although it is the
first phoneme in the word "after". The system uses dominant point information and sequence
information of the ink data to place vowels at their correct positions. After the phonemes
have been sorted into the correct order, they are matched against a phonetic lexicon in the
next process, called "lexicon lookup". A list of orthographic English words that best
represent the input shorthand outline is then produced at the end of the search.
Figure 3.7: (a) Sample input of phoneme ordering process (b) sample output of phoneme ordering process
[Figure content: (a) inputs /TER/+/A/, /DER/+/A/, /THER/+/A/ and /TURE/+/A/; (b) outputs /A/+/TER/, /A/+/DER/, /A/+/THER/ and /A/+/TURE/]
Table 3-2: Phoneme translation using the PC, PCRO or DT rules

(a) Word "word" – primitives classified by the recognition engine: (1) a small anti-clockwise hook and (2) an upward diagonal stroke, which combine to give /W/; (3) a /D/ or /T/ consonant; (4) an /AW/ vowel.
Translation is based on the rule of "primitive combination" (PC). The rule applied to this example is: "IF an upward diagonal stroke is preceded by a small anti-clockwise hook, THEN the combination of these two primitives denotes the phoneme /W/."

(b) Word "printed" – primitives: (1) a small hook and (2) a straight downward stroke, which combine to give /PR/ or /BR/; (3) an /N/ curve; (4) a /T/ or /D/ consonant; (5) an /Ē/ vowel; (6) an /Ā/ vowel.
Translation is based on the rule of "primitive combination and reverse ordering" (PCRO). The rule applied here is: "IF a small hook is followed by a straight downward stroke, the small hook is converted into the phoneme /R/ and swapped with the succeeding phoneme."

(c) Word "go" – primitives: (1) a horizontal stroke, giving the /G/ or /K/ consonant; (2) an /Ō/ vowel.
Translation is based on the rule of "direct translation" (DT). The rule applied to this example is: "IF a horizontal stroke is written from left to right, THEN the stroke directly denotes the phoneme /G/ or /K/."
3.7 Experimental Results
The preliminary experiment described in this chapter comprises two main studies: firstly, a
statistical analysis of homophones (words which sound alike, and hence look alike in
Pitman's Shorthand, but have different spellings) in a phonetic lexicon and, secondly, a
performance evaluation of the word level transcription of the system prototype.
3.7.1 Data Set
For the statistical analysis of a phonetic lexicon, a list of the 5000 most frequently used
English words, extracted from the Brown Corpus, is used. Based on this word list, a hash
table is created with a series of phonemes as the key for each group of words. Here, the
phonetic keys are extracted from the CMU phonetic dictionary (Figure 3.8 gives a pictorial
representation of the hash table).
Index      Words
/B Ă T/    Bat, Pat, Bad, Pad
Figure 3.8: Sample element of a phonetic lexicon in a hash table
For an analysis of word transcription performance, 432 Pitman outlines were collected,
written with different levels of tidiness on a WACOM ART II Tablet by three writers. Each
writer wrote a sample sentence, consisting of 28 vocalized outlines and 20 short-forms, three
times. Here, the sample sentence covers the whole range of shorthand primitives, and the
selected words are contained in the 5000 most frequently used English words of the general
domain. Samples of the collected data are illustrated in Figure 3.9.
Figure 3.9: Sample collected outlines
3.7.2 Analysis of a Phonetic Lexicon
The goal of this experiment is to estimate an approximate number of candidate words
(homophones) for each input outline by using a phonetic lexicon, and to evaluate which
vocabulary level has the highest ambiguity and which has the least. Here, vocabulary level
means the set of words known and used by a writer, and it corresponds to the number of
words contained in a lexicon. Statistics obtained from this study are intended to estimate
the preliminary accuracy of the post-processing of handwritten Pitman's Shorthand with
respect to different levels of writers' vocabulary.
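The three uniqueness measurements of this experiment can be sketched as one function over the hashed lexicon. The four-word entry list and the phoneme sets below are stand-ins for the 5000-word Brown Corpus list and the CMU phoneme inventory.

```python
# Illustrative sketch of the lexicon ambiguity analysis: count what fraction
# of hash-table indices map to exactly one word under different key
# reductions. The word list and phoneme sets are simplified assumptions.
UNVOICED_TO_VOICED = {"P": "B", "T": "D", "K": "G", "F": "V", "S": "Z"}
VOWELS = {"AE", "EH", "IY", "OW", "UW"}

def uniqueness(entries, collapse_voicing=False, drop_vowels=False):
    """Return the percentage of lexicon indices holding exactly one word."""
    table = {}
    for phonemes, word in entries:
        key = phonemes
        if collapse_voicing:   # simulate undetected line thickness
            key = tuple(UNVOICED_TO_VOICED.get(p, p) for p in key)
        if drop_vowels:        # simulate omitted vowel notations
            key = tuple(p for p in key if p not in VOWELS)
        table.setdefault(key, []).append(word)
    unique = sum(1 for words in table.values() if len(words) == 1)
    return 100.0 * unique / len(table)

entries = [
    (("B", "AE", "T"), "bat"),
    (("P", "AE", "T"), "pat"),
    (("B", "IY", "T"), "beat"),
    (("B", "AE", "D"), "bad"),
]
```

Here `uniqueness(entries)` simulates perfect recognition, `collapse_voicing=True` the line-thickness test, and `drop_vowels=True` the vowel-ambiguity test of Figure 3.10.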
[Chart: X-axis – lexicon size in number of words (100 to 5000); Y-axis – unique outlines in % (40 to 100); series – uniqueness of outlines for perfect recognition, uniqueness of outlines given line thickness ambiguity, uniqueness of outlines given vowel ambiguity]
Figure 3.10: The distribution of homophones in different sized phonetic lexicons
Figure 3.10 illustrates experimental results obtained from different sizes of phonetic lexicons
up to 5000 words. The X-axis of the graph represents different sizes of lexicon, and words
extracted for these lexicons are sorted according to their frequency of usage. This means that a
lexicon of size 100 represents the first hundred most commonly used words in English; a
lexicon of size 300 represents the first 300 most commonly used words and so on. The first
test simulates how an input of Pitman’s outline can be uniquely identified by a lexicon in the
presence of perfect segmentation and recognition. According to the test, 97% of the 5000
most frequently used English words have a unique representation. The maximum ambiguity
is 3 potential words per index and the average is 1.02 potential words per index.
Therefore, a transcription accuracy of at least 97% can be estimated if there are no errors in
the low level segmentation and classification of shorthand outlines.
The second test (Figure 3.10) estimates the transcription performance in the presence of
unclear thickness of a pen-stroke. This is the most common case experienced in the
recognition of Pitman shorthand, as most digitizers are unable to detect the thickness of a
pen-stroke, even though Pitman represents pairs of similar sounding consonants by the same
strokes and differentiates between voiced and unvoiced sounds by thick and thin lines. It should also be
noted that regardless of the input technology, writers do not make a clear distinction between
thick and thin strokes. According to this test, ambiguity of a lexicon of 5000 words increases
by about 9% if there is no distinction between voiced and unvoiced consonants. The
transcription accuracy here is expected to be at least 87%.
The third test in Figure 3.10 predicts the transcription performance in the presence of
ambiguous vowel notations. This is an important consideration in the recognition of Pitman
shorthand, since vowels are occasionally omitted when writing Pitman's Shorthand and the
omitted positions vary with the writer's experience and individual inclination. If the
unpredictable omission of vowels in an outline is handled by excluding vowels from the
lexicon and matching without vowel components, the resulting lexicon has only about 56%
unique indices.
3.7.3 Performance Evaluation of the Word Level Transcription
The goal of this experiment is to evaluate the word transcription performance of our
proposed framework under the following criteria:
- in the presence of shape variation and position confusion due to speed writing or different users' writing;
- in the presence of segmentation and classification errors due to misclassification or hardware constraints; and
- in the presence of abnormal outlines due to inconsistent writing.
Table 3-3: Experimental results of the phonetic based word translation

Description                                      Transcription accuracy (vocalised outlines)
Overall                                          84%
In the presence of vowel omission or confusion   0%
In the presence of inconsistent writing          0%
In the presence of classification error          100%
As shown in Table 3-3, the best rate achieved by the vocalized outline interpreter is 84%.
An error rate of 12% is due to inconsistent writing, i.e., outlines which are comprehensible
to human readers but are not consistent with the writing rules of Pitman shorthand. An
interesting phenomenon observed in this experiment is that 48% of perfect transcriptions
occur in the presence of recognition errors. This shows that the approximate pattern
matching technique applied in NNQ is capable of dealing with classification errors. A
primary limitation of this system, accounting for 40% of the error rate, is the inability to
correctly transcribe outlines with hidden or omitted vowels.
Both accuracy and error rates reported throughout this experiment are based on the number of
outlines and can be denoted as follows:

a = (c / t) × 100    (3.1)

where a is the word transcription accuracy, c is the total number of correctly interpreted
outlines, and t is the total number of handwritten outlines.

e = ((t − c) / t) × 100    (3.2)

where e is the error rate, t is the total number of handwritten outlines, and c is the total
number of correctly interpreted outlines.
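Equations (3.1) and (3.2) can be checked directly with a few lines of Python; the counts used below are hypothetical, chosen only for illustration.

```python
def accuracy(c, t):
    """Equation (3.1): percentage of correctly interpreted outlines."""
    return c / t * 100

def error_rate(c, t):
    """Equation (3.2): percentage of incorrectly interpreted outlines."""
    return (t - c) / t * 100

# e.g. 3 of 4 outlines correct: accuracy(3, 4) -> 75.0, error_rate(3, 4) -> 25.0
# The two rates are complementary and sum to 100.
```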
3.8 Discussion
On the whole, a primary advantage of phonetic based transcription of vocalised outlines is
its ability to build on existing language models (i.e., phonetic models), which define large
vocabularies with probability distributions over sequences of phonemes. Another
distinct advantage is that a machine performs the same logical procedures as a human
interpreter to transcribe Pitman’s Shorthand outlines and this makes the machine
transcription concept easy to follow.
In terms of disadvantages, the performance of phonetic based transcription falls dramatically in
the presence of omitted vowels in vocalised outlines. According to our statistical analysis of
a phonetic dictionary (Figure 3.10), transcription of vocalised outlines without vowel
components is estimated at merely 56% correctness. In addition, the special writing rules of
Pitman's Shorthand raise ambiguity in phonetic based transcription: a number of rules
invented to improve writing speed allow primitives that differ only slightly in size, length,
thickness or inclination to denote different sounds. In general, it is practical to express
accurate size, length, or
inclination of a stroke for a printed script; however it is less practical for handwriting,
especially if the script is written at speed. The following examples illustrate the variation of
pronunciations with the minor differences between geometric features.
Example 1: Appearance variation
As shown in Figure 3.11, standard notations for consonants /T/ and /L/ are a vertical stroke
and a curve respectively, however if /T/ is immediately followed by consonant /L/ or if
there is a non-stressed vowel between /T/ and /L/, Pitman uses a combination of a small
hook and a vertical stroke to indicate the sound of /TL/ or /T+silent_vowel+L/.
On the other hand, an outline with a small circle followed by a vertical stroke
stands for the sound /ST/ and it can be easily confused with an outline of /TL/ or
/T+silent_Vowel+L/ if the circle at the beginning is not clearly written. According to
experimental results, approximately 45% of small hooks are recognised as circles.
Therefore, a direct conversion of primitives, which are prone to minor recognition errors,
into phonemes can lead to completely different interpretations.
Figure 3.11: Illustration of the incidence of phoneme variation due to confusion between a circle and a hook
Basic Pitman’s notations
/T/
/L/
/S/
/ST/
/TL/ or/T+silent_vowel+L/
Handwritten outline
3. Evaluation of phonetic based transcription of vocalised handwritten Pitman’s outlines
61
Example 2: Length variation
In Pitman’s Shorthand, curves of different lengths represent different phonemes; however,
length is not clearly shown in some outlines while writing at speed. As shown in Figure
3.12, a sample outline of the word "shatter" can be wrongly interpreted as /SH Ă/ instead of
/SH Ă T ER/ if the curve /SH/ is not recognised as a long curve.
Figure 3.12: Illustration of the incidence of phoneme variation due to length confusion
Examples 1 and 2 demonstrate that converting inaccurate handwritten primitives into
phonemes allows unnecessary candidates to appear at an early stage and subsequently affects
the transcription performance. Cho & Kim [CK04] proposed that stroke relationships are
usually robust against geometric variations and important for discriminating characters of
similar shapes in on-line handwriting recognition. It is, therefore, more appropriate to retain
stroke information of an outline rather than changing it into phonemes.
After a thorough evaluation of the advantages and disadvantages of a phonetic based
transcription approach, it has been concluded that the remaining work of this thesis will be
based on a novel transcription method that retains low level stroke information, denoted the
"primitive based transcription approach".
4 Bayesian Network Based Word Transcription
Chapter 4 Introduction
The previous chapter reviewed the advantages and disadvantages of a phonetic based
transliteration of handwritten Pitman's Shorthand and concluded that departing from the
conventional phonetic approach is appealing. This chapter discusses the
novel approach, implemented specifically for this research to improve word transcription
accuracy by using a primitive to text transliteration approach. In this new approach, a
Bayesian Network representation is applied to model the ambiguities and stroke
dependencies of handwritten Pitman's Shorthand outlines.
First of all, an overview of the whole system is given, thereby enabling the reader to get a
clear understanding of the role of the word transcription processes. Following the overview,
a detailed description of a Bayesian Network word recogniser is given under the following
topics.
Summary: a brief description of each process included in a Bayesian Network based
word recogniser.
Life cycle: explanation of a life cycle of Bayesian Network models that represent
handwritten Pitman’s Shorthand outlines.
Network architecture: description of an outline model’s architecture, including
attributes (nodes) and relationship between nodes (topology).
Inference: propagation of the likelihood of an attribute of an outline model based on
other attributes of the model, in which the propagated values are used for N-best
word selection.
Training (Learning): training outline models with a collection of training data in
order to enable the system to cope with the natural ambiguity of handwriting in
Pitman’s Shorthand.
Model selection: selection of N-best outline models for a given input outline centred
on knowledge based rejection strategies.
Experiment: performance evaluation of the Bayesian Network based word
recogniser
4.1 System Overview
Figure 4.1: An abstract view of the whole system. (Components: input; collaborator’s
recognition engine, containing a vocalised outline recogniser (segmentation engine using
dominant point detection, Neural Network classifier; output: a ranked list of primitives) and
a short-form recogniser (template matching engine; output: a list of words); transcription
engine, containing a vocalised outline interpreter (pre-processing, primitive based word
transcription; output: a ranked list of words), a short-form interpreter and phrase level
transcription; output text; Internet.)

An overview of the whole system is given in Figure 4.1; the diagram is nearly identical to
the one illustrated in the previous chapter. A major difference between the two frameworks
is the change to a vocalised outline interpreter (shaded box) in the new framework, where
text is interpreted directly from primitive attributes instead of phonetic attributes as in the
old framework. A summary of the processes included in the new vocalised outline
interpreter is presented as follows.
4.2 Summary of Bayesian Network Based Word Transcription
Figure 4.2: Illustration of Bayesian Network based word transcription. (The vocalised
outline recogniser passes a ranked list of primitives to the transcription engine; within the
Bayesian Network based vocalised outline interpreter, pre-processing covers lexicon
construction and training, producing a shorthand lexicon and Bayesian Network based
outline models, while word interpretation maps an input outline to output word(s) for
phrase level transcription, which produces a ranked list of words.)
The shaded box in Figure 4.2 highlights the role of Bayesian Network based vocalised
outline transcription. It comprises two major processes: preprocessing and word
interpretation. Preprocessing takes place when the transcription engine is first set up, and
it is skipped during real time transcription of shorthand outlines unless the lexical data
needs to be modified. A primary function of preprocessing is to automatically convert a
phonetic lexicon into a Pitman’s Shorthand lexicon such that different combinations of a
series of geometric patterns represent different keys, with each
key mapping to one, or more than one, word. This approach (the creation of the Pitman’s
Shorthand lexicon) is distinct from previous work and a full description of the lexicon
creation is given in a separate chapter, Chapter 5. Another important function of
preprocessing is to create Bayesian Network based outline models where user independent
handwritten data and lexicon information are embedded in hierarchical probabilistic
structures.
The next process, which takes place immediately after the preprocessing, is word
interpretation. A primary function of the word interpretation is to produce a ranked list of N-
best words based on a confidence score of the low level recognition plus a belief of nodes of
an outline model. After the word interpretation, the N-best words are then forwarded to the
next process, called a phrase level interpreter to produce the final word(s) for a given input
outline.
4.3 Life Cycle of Outline Models
Outline models are the primary components of a vocalised outline interpreter and this
section describes the life cycle of outline models throughout the word transcription process.
Firstly, a precise description of outline models is given: a collection of outline models
represents a dictionary, and the number of outline models, generated in the word interpreter
is not the same as the number of words of a dictionary. This is because each outline model
is designed to represent one, or more than one, word in order to cope with hardware
limitations or ambiguities of handwritten Pitman’s Shorthand. An example of a hardware
limitation is the passive digitisers of Personal Digital Assistants (PDAs) that are incapable of
detecting accurate line thicknesses. This limitation makes a shorthand recogniser fail to
distinguish between two similar outlines with different line thicknesses, e.g., outlines for the
words “pays” and “bays”. As a result, grouping similar outlines under the same
model enables the system to easily find potential candidate words for a given outline and
improves the search performance. Here, “similar outlines” stands for “words with the same
series of geometric features (of a consonant kernel) regardless of different line thicknesses
and different vowel positions”. Samples of similar outlines are illustrated in Figure 4.3.
Figure 4.3: Illustration of three pairs of similar outlines (“pays”/“bays”, “oak”/“go”, “airs”/“erase”)
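The grouping of similar outlines under one model can be sketched as a dictionary keyed by the consonant-kernel feature sequence, as below; the kernel feature names are hypothetical placeholders for illustration, not the thesis’s actual primitive types.

```python
# Sketch: group words whose outlines share the same consonant-kernel
# feature sequence (ignoring line thickness and vowel position) under
# a single model key. Feature names here are illustrative only.
def group_similar_outlines(lexicon):
    """lexicon: dict mapping word -> tuple of kernel features."""
    models = {}
    for word, kernel in lexicon.items():
        models.setdefault(kernel, []).append(word)
    return models

# "pays" and "bays" differ only in line thickness, so they share a kernel
# and end up under the same outline model.
lexicon = {
    "pays": ("down-straight", "s-circle"),
    "bays": ("down-straight", "s-circle"),
    "oak":  ("k-horizontal",),
    "go":   ("k-horizontal",),
}
models = group_similar_outlines(lexicon)
```

One model key can therefore stand for several candidate words, which is exactly what lets the interpreter retrieve all candidates for an ambiguous outline in a single lookup.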
In terms of the life cycle, outline models are firstly created with the use of a shorthand
lexicon and secondly updated with a set of training data. The models are then saved as a
knowledge source for word interpretation until any changes are required. Examples of
changes include expanding the word list of an existing dictionary or altering a user domain.
In response to the change of a user domain, outline models are created, edited or removed
according to the user’s preference, defined at a domain set up process. Note that vocabulary
(i.e., the word list of a dictionary) has a huge impact on word transcription performance and
outline models should be associated with a dictionary of a corresponding domain. Figure
4.4 illustrates the life cycle of outline models.
In real time word interpretation, a series of classified primitives of an input outline are
matched with outline models, and the model with the highest posterior probability is taken as
a correct representation for a written outline.
Figure 4.4: Life cycle of outline models (a new outline model is created from the shorthand
lexicon, existing models are updated with training data, and models are removed at a new
domain set up)
4.4 Outline Model Architecture
An outline model is formed by concatenating the basic geometric features of a shorthand
outline, produced by the low level recognition engine, in chronological order. Note that
chronological writing order in Pitman’s Shorthand is not synonymous with that in normal
English. The difference between them is illustrated in Figure 4.5: the chronological writing
order of the word “beat” in normal English is b, e, a, t, whereas the writing order changes to
b, t, e, a in Pitman’s Shorthand.
Vowels are always written last no matter how words are pronounced in Pitman’s Shorthand
and this makes the automatic transliteration of handwritten Pitman’s Shorthand distinct from
the transcription of handwritten English. According to the study in Chapter 3, reordering
vowels to their corresponding positions was found to be ineffective when vowel variables
are missing from an outline. To improve upon existing systems, it was argued, one should
seek a more parsimonious solution that also leads to better text interpretation
performance. Thus, this research proposes a novel network model, denoted as an outline
model, which represents the inherently complex features of handwritten Pitman’s Shorthand.
Figure 4.5: Illustration of chronological writing order of normal English and Pitman’s Shorthand
4.4.1 Nodes of an Outline Model
The structure of an outline model is based on a Bayesian Network representation [Pj88] in
which the model is retained in a hierarchical structure with each node corresponding to a
primitive variable or a conditional variable, and each link signifying probabilistic
dependency between nodes. Similar to a network architecture designed by Xiao in the
domain of signature verification [XL02], our outline model creates the following four types
of nodes, depending on the relationship between one node and another.
1. Root node: A root node corresponds to an outline O and it represents one, or more than
one, word. It contains N child nodes {P1, P2,.. PN} where Pi corresponds to a collection of
primitives which represents the ith segment of the outline O.
2. Unique node: A unique node corresponds to a particular segment of a shorthand outline
and it represents one and only one pattern. It appears while an outline model O is created
with a shorthand lexicon at the beginning, and it remains or disappears while O is updated
with training data. The definition of a unique node is: “if a particular segment (node) of an
outline model relates to one and only one type of geometric feature after it has been updated
with a shorthand lexicon as well as training data, the node is considered to be independent of
other nodes and linked directly to a root node.” Figure 4.6 (a) and (b) illustrate occurrence
of unique nodes in two cases.
Figure 4.6: Illustration of unique nodes of an outline model. (a) Occurrence of unique nodes
after O has been created with a Pitman’s Shorthand lexicon: since features of O in the
lexicon are genuinely accurate, every segment (node) of O is related to one and only one
pattern, resulting in a unique node for every segment. (b) Occurrence of unique nodes after
O has been updated with lexicon and training data: since there is more than one possibility
in the first and third segments of
3. Virtual node: A virtual node corresponds to a certain segment of a shorthand outline and
represents a conditional variable that allows the embedding of multiple possibilities of a
consonant-segment in an outline model O. It appears when two or more primitives compete
to represent a particular node of O during the training process, but it never appears while O
is created with a shorthand lexicon at the beginning. The definition of a virtual node reads: “if
a particular primitive (e.g., P1 in Figure 4.6 (b)) is dependent on another primitive (e.g., P2 in
Figure 4.6 (b)) and there is an optional relationship between them (i.e., either at most one or
none of them can be true at the same time), we can assume that there is a mechanism that
controls the values of P1 and P2, resulting in a virtual node V1 as shown in Figure 4.6 (b).”
4. Hidden node: A hidden node corresponds to a certain portion of a shorthand outline and
represents a conditional variable that allows the embedding of hidden vowel primitives in an
outline model. An interesting assumption in relation to the creation of a hidden node is that
it appears from the time when outline models are created with a shorthand lexicon, although
the lexicon provides accurate vowel information at that time. This reflects the major purpose
of hidden nodes, i.e., to identify missing vowel components, randomly omitted by
writers according to their experience or preference. The definition of hidden nodes
reads: “if a particular primitive (e.g., P4 in Figure 4.6 (b)) appears or disappears from time to
time and the variation does not adhere to any rule, we can assume that there is a hidden
mechanism that controls the value of P4 or P5, resulting in a hidden node H1.”
In order to demonstrate how an outline model is created with the use of four types of node,
the step by step creation of an outline model for the word “bake” is given (Figure 4.7).
1. Firstly, a root node of an outline model is generated with the word “bake”.
2. The root node then creates N number of child nodes using a shorthand lexicon such
that each consonant primitive of a word in the lexicon turns into a unique node and
each vowel primitive turns into a hidden node, where N is the number of primitives
of the word.
3. The outline model is then updated with a number of training samples, resulting in
additional leaf nodes and virtual nodes.
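The three steps above can be sketched in miniature as follows, using the type numbers of the “bake” example from Figure 4.7; the Node class and its fields are illustrative assumptions, not the thesis’s implementation.

```python
# Sketch of outline-model creation for "bake": consonant primitives from
# the lexicon become unique nodes, vowels become hidden nodes, and
# training alternatives turn unique nodes into virtual nodes whose
# competing patterns become leaf children.
class Node:
    def __init__(self, kind, types):
        self.kind = kind          # "unique", "virtual" or "hidden"
        self.types = set(types)   # competing primitive type numbers

def create_model(consonants, vowels):
    """Step 1-2: build nodes from a lexicon entry."""
    nodes = [Node("unique", [t]) for t in consonants]
    nodes += [Node("hidden", [t]) for t in vowels]
    return nodes

def update_model(nodes, observed_types):
    """Step 3: merge alternatives observed in a training sample."""
    for node, types in zip(nodes, observed_types):
        node.types |= set(types)
        if node.kind == "unique" and len(node.types) > 1:
            node.kind = "virtual"   # now has competing leaf children
    return nodes

# Lexicon entry for "bake": consonant types 4 and 7, vowel type 91.
model = create_model(consonants=[4, 7], vowels=[91])
# Training data 1 offers alternatives 4/1, 7/6 and vowel 92.
model = update_model(model, observed_types=[[4, 1], [7, 6], [92]])
```

After the update, both consonant nodes have become virtual nodes with two possibilities each, mirroring Step 3 of Figure 4.7.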
Figure 4.7: Illustration of step by step creation of outline models for the word “bake”
(legend: R = root node, U = unique node, V = virtual node, H = hidden node, L = leaf node).
Step 1: creation of a root node R for the word “bake”. Step 2: creation of unique nodes and a
hidden node from the lexicon entry, whose strokes correspond to type numbers 4, 7 and 91.
Step 3: update with training data 1 (type numbers 4 or 1, 7 or 6, 91 or 92), which turns the
unique nodes into virtual nodes with leaf node children, giving two possibilities for the first
segment. Step 4: update with training data 2 (type numbers 4 and 5), adding a further leaf to
the second segment (possibilities 4 or 1, 7, 6 or 5, 91 or 92).
Figure 4.8: Sample training data for the word “bake” processed by the recognition engine; the italic text on the right explains what each line of data represents
In addition, a detailed explanation of the training data applied in the creation of an outline
model is given in Figure 4.8, which illustrates training data 1 for the word “bake”, depicted in
Figure 4.7. The second and third lines of data (Figure 4.8) indicate that there are two
possible pattern categories associated with the first segment of the word “bake”: type 4 and
type 1. Here, type 4 is equal to an existing pattern of the shorthand lexicon and type 1 is a new
pattern observed by the recognition engine. In order to update an existing outline model
with this new observation, the existing unique node (Figure 4.7, Step 3) is first transformed
into a virtual node and then attached to two leaf nodes, resulting in a virtual node
with two children. Similarly, according to the sixth line of data (Figure 4.8), a vowel
primitive (type 92) classified by the recognition engine differs from the one defined in the
lexicon (type 91), resulting in a hidden node with two children.
(Input: a Pitman’s outline for the word “bake”)
WStart                 (1st line: word start)
S1, 0, 64, 4, 0.56     (2nd line: segment number, start coordinate, end coordinate, primitive type, probability)
S1, 0, 64, 1, 0.44     (3rd line)
S2, 64, 137, 7, 0.88   (4th line)
S2, 64, 137, 6, 0.12   (5th line)
V1, 0, 64, 2, 1, 92    (6th line: vowel number, start coordinate, end coordinate, sequence, position, type)
WEnd                   (7th line: word end)
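A minimal parser for records in this layout might look as follows, assuming the comma-separated fields shown in Figure 4.8; the function and variable names are illustrative, not taken from the thesis.

```python
# Sketch: parse one word record in the Figure 4.8 layout. Consonant
# lines start with "S" (segment, start, end, primitive type, probability);
# vowel lines start with "V"; WStart/WEnd delimit the record.
def parse_training_record(lines):
    segments, vowels = {}, []
    for line in lines:
        line = line.strip()
        if line in ("WStart", "WEnd"):
            continue
        fields = [f.strip() for f in line.split(",")]
        if fields[0].startswith("S"):
            seg, _start, _end, ptype, prob = fields
            # several lines per segment = competing primitive types
            segments.setdefault(seg, []).append((int(ptype), float(prob)))
        elif fields[0].startswith("V"):
            vowels.append(tuple(int(f) for f in fields[1:]))
    return segments, vowels

record = [
    "WStart",
    "S1, 0, 64, 4, 0.56",
    "S1, 0, 64, 1, 0.44",
    "S2, 64, 137, 7, 0.88",
    "S2, 64, 137, 6, 0.12",
    "V1, 0, 64, 2, 1, 92",
    "WEnd",
]
segments, vowels = parse_training_record(record)
```

Grouping the repeated `S1`/`S2` lines per segment makes the competing pattern categories (type 4 vs. type 1, type 7 vs. type 6) directly available for the node updates described above.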
4.4.2 Relationships between Nodes
Relationships between nodes of a Bayesian Network are indicated by drawing arcs from
cause variables to their immediate effects [Hd99]. An arc signifies a cause-effect
relationship and encodes a conditional probability distribution (CPD), indicating to what
extent one variable is likely to affect another. In addition, the level of dependency between
nodes has a significant effect on computational expense: the stronger the relationship
between nodes, the bigger the conditional probability table grows, and vice versa.
Before determining the dependency of nodes of an outline model, this research takes into
account the following two extreme situations:
If nodes are extremely independent of each other, they become d-separated given an
evidence node, thereby making a network model unable to cope with abnormal
circumstances. For example, variables A and B, which are usually dependent on
each other, may be disconnected due to an occurrence of rare evidence, E.
On the other hand, if nodes are precisely connected to each other with conditional
probability distributions for all possible cases, it becomes computationally inflexible
to obtain a reliable estimation.
Taking into account the drawbacks of the above two extreme situations, the outline models
for this research are designed with the following practical hypothesis:
1. each node Pi is independent of its non-descendants Pj
2. each node Pi is independent of its descendants Di given a parent of Di
3. leaf nodes {L1, L2,…, Ln} are independent of each other unless they share the same
parent Xj
Alternatively, the conditional dependency between variables of an outline model can be
presented using the Bayes Ball algorithm [Sr98], as illustrated in Figure 4.9.
Figure 4.9 Illustration of conditional dependency of variables in an outline model using the Bayes Ball algorithm [Sr98].
If there is no flow of a ball from A to B in a graph, A and B are conditionally independent given a set of observed or hidden variables X and vice versa.
4.5 Inference
The inference process of a Bayesian Network involves updating the probability of nodes
given some evidence and prior probabilities [XL02]. This is called finding the belief of a node x,
denoted as BEL(x). In our case, evidence of nodes is given by a lexicon, training data or
user input. A primary use of BEL(x) is to find the likelihood of outline models, with which
the N-best models for a given shorthand outline are selected.
Among a variety of belief updating algorithms that support Bayesian Networks, this
work directly applies the “message passing” algorithm developed by Pearl [Pj88] – the belief
of every node in the network is taken as the product of π and λ messages, where π is a
message received from each of its parents (if any) and λ is a message received from each of
its children (if any). Alternatively, the π and λ messages of each node of an outline model are
denoted as πX(U), a message that node X receives from its parent U, and λYj(X), a message
that node X receives from its child Yj. Note that an outline model is a tree structure where
every node has one and only one parent (except the root node, which has no parent) and has
N children (Y1, Y2, .., YN).
4.5.1 Message Initialization
According to Pearl’s algorithm [Pj88], the nodes of a Bayesian Network need to be assigned
initial beliefs before messages are propagated through the network. In general, the assignment
of initial belief (prior probability) of a node varies widely from one application to another,
depending on statistical information on variables as well as previous experience of a
developer working on a similar problem.
In this work, message initialisation varies depending on the type of node. The initialisation
of π and λ messages for different types of node of an outline model is presented as follows.
Root node: A root node is the topmost one in an outline model and does not have any parent;
therefore its π message is set to 0.5 assuming that there is an equal chance of taking a TRUE
or FALSE value for this node. Its λ message is set to 1 assuming that there is a TRUE
relationship from its child nodes.
Unique node: A unique node does not have any descendants and is linked directly to a root
node. Its π message is set to 1 assuming that there is a TRUE relationship from its parent
(root node) and its λ message is set to 1, stating that the primitive associated with this node
appears in both lexicon and training data.
Virtual node: A virtual node is a judgemental node holding a true relationship from its
parent (i.e., π = 1) and an optional relationship to its children (i.e.,
λ = P(Child_Nodes | observation)).
Hidden node: Similar to a virtual node, a hidden node holds a true relationship from its
parent (i.e., π = 1) and an optional relationship to its children (i.e.,
λ = P(Child_Nodes | observation)).
Leaf node: A leaf node (not including a “unique node”) holds an optional relationship from
a virtual node or a hidden node and its π message is set to P(Child_Nodes|observation). It
does not have any children and its λ message is set to a confidence score of the node
obtained from training data.
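The per-node-type initialisation above can be sketched as a small lookup; storing π and λ as plain numbers in a dictionary is an assumption for illustration only, not the thesis’s data structure.

```python
# Sketch of the message initialisation rules: root (pi 0.5, lambda 1),
# unique (1, 1), virtual/hidden (pi 1, lambda from P(children|obs)),
# leaf (pi from P(children|obs), lambda from a training confidence).
def init_messages(node_kind, p_children_given_obs=None, confidence=None):
    if node_kind == "root":
        return {"pi": 0.5, "lambda": 1.0}
    if node_kind == "unique":
        return {"pi": 1.0, "lambda": 1.0}
    if node_kind in ("virtual", "hidden"):
        return {"pi": 1.0, "lambda": p_children_given_obs}
    if node_kind == "leaf":
        return {"pi": p_children_given_obs, "lambda": confidence}
    raise ValueError("unknown node kind: " + node_kind)

# e.g. a leaf node whose pattern was observed with probability 0.44
# and trained with confidence 0.9:
leaf = init_messages("leaf", p_children_given_obs=0.44, confidence=0.9)
```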
On the whole, our message initialisation strategy is similar to the one implemented by Xiao
and Leedham [XL02] for signature verification. Nonetheless, the estimated values are
different in this work in accordance with the characteristics of handwritten Pitman’s
Shorthand.
4.5.2 Belief Updating
Belief of a node in an outline model is calculated by the formula presented by Xiao [XL02]
denoted as
BEL(x) = β λ(x) π(x) (4.1)
where β is a normalization factor, λ(x) is a combined message received from all the children
of node X and π(x) is a combined message received from all the parents of node X.
Depending on the type of node, λ(x) is calculated differently. If it is a Root Node, λ(x) can
be defined by the formula presented by Pearl [Pj88]:
λ(x) = ∏j λYj(x) (4.2)
where λYj(x) is a message that node X receives from its child node Yj.
If a node is a unique node, λ(x) is defined as:
λ(x) = 1 (4.3)
If it is a virtual node, λ(x) is defined as:
λ(x) = ∏j λYj(x) if a child Yj of the virtual node X is true, and λ(x) = 0.001 otherwise (4.4)

If a node is a hidden node, λ(x) is defined as:

λ(x) = ∏j λYj(x) if a child Yj of the hidden node X is true, and λ(x) = 0.1 otherwise (4.5)

In equations 4.4 and 4.5, the values 0.001 and 0.1 are predefined probabilities, used when
none of the child nodes of X is likely to be true. The selection of these confidence scores is
based on several experimental results, testing different thresholds between 0 and 1.

If a node is a leaf node, but not a unique node, λ(x) is defined as:

λ(x) = (1.0, 1.0) if the value of node X is assigned by a lexicon, and λ(x) = (a, b) otherwise (4.6)
where a and b are normalised recognition and training probabilities for a corresponding
node. The next section, “Learning of Outline Models” explains how a and b are calculated
using training data.
4.6 Learning of Outline Models
In general, learning in a Bayesian Network often refers to the learning of structure of a
model and its parameters, or learning either one of them [Mk01]. In this work, learning
refers to the learning of parameters of an outline model. Structure learning is not of concern
here since there is no direct interaction between the low level segmentation engine (of the
collaborator) and the network modelling engine (of this research) to enable a dynamic
change of a basic model layout.
Parameter learning of an outline model includes finding an optimal maximum likelihood of a
node based on a set of training parameters and assigning the likelihood value as belief of the
node. There are various learning algorithms [Hd99], [Mk01] that support Bayesian
Networks and the selection of an appropriate one is based on two factors: structure of the
network (whether it is known or unknown) and evidence of nodes (whether they are fully or
partially observed). With full details of these two factors, an appropriate learning algorithm
for a particular Bayesian Network can be identified using Murphy’s decision table
(Table 4-1), shown below.
Table 4-1 Murphy’s decision table [Mk01]
Structure \ Observability | Full         | Partial
Known                     | Closed form  | EM
Unknown                   | Local search | Structural EM
The table indicates which algorithm is likely to be the most effective under which
circumstances. For example, the Expectation Maximization (EM) algorithm is likely to be
the most suitable one for a Bayesian Network whose structure is known in advance and
whose parameters are partially observed. With reference to this table, parameter learning of
an outline model is discussed in two parts: “learning of consonant primitives” and “learning
of vowel primitives”.
4.6.1 Learning of Consonant Primitives
The maximum likelihood estimates (MLE) learning method is used to find estimates of
consonant primitives of an outline model since the structure of the model is known and its
consonant parameters are fully observed in training data. The structure (states of an outline
model) is known in advance because an outline model is initially constructed from a lexicon
entry with clear information on the number of segments (states) of each shorthand outline.
In addition, the consonant primitives are always observed in training data because
stenographers never omit consonant primitives of a vocalised outline. Murphy denotes MLE
as a closed form in his table (Table 4-1).
The basic idea behind the MLE method is to maximise the likelihood of training data D,
which contains M cases (believed to be independent) [Hd99]. Assuming that
Di={N(i,1),N(i,2),..N(i,j)} is the ith sample of training data D, j is the number of consonant
primitives of a word and N(i,1)={N(i,1,1),N(i,1,2),..,N(i,1,k)} represents a set of possible consonant
primitives of the node N(i,1), a pair of MLE values (a,b) for each consonant node N(i,j,k) can be
calculated using formula 4.7 and formula 4.8 when a training sample Di is fed into an outline
model. In general, a represents the likelihood of a consonant node N(i,j) to be recognised as a
primitive type N(i,j,k) and b is the likelihood of consonant primitive N(i,j,k) to be associated
with node N(i,j). The calculation of a is denoted as:
a = (1/M) ∑i=1..M α(N(i,j,k) | Di, P) (4.7)

where α is the recognition/classification accuracy of a consonant node given training data
Di and a parent node P.
The calculation of b is denoted as:
b = (1/M) ∑i=1..M β(P | Di) (4.8)

where β is a conditional variable which is 1 if a consonant primitive of training data Di has
a true relationship with its corresponding parent node P and 0 otherwise.
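A sketch of equations 4.7 and 4.8 for a single consonant node follows, assuming each training case supplies the classifier accuracy (α) and the 0/1 relationship flag (β) described above; the sample values are invented for illustration.

```python
# Sketch of the MLE estimates (a, b) of equations 4.7 and 4.8 for one
# consonant node over M training cases: a averages the recognition
# accuracy alpha, b averages the 0/1 relationship indicator beta.
def mle_estimates(samples):
    """samples: list of (alpha, beta) pairs, one per training case."""
    M = len(samples)
    a = sum(alpha for alpha, _ in samples) / M
    b = sum(beta for _, beta in samples) / M
    return a, b

# e.g. three cases with accuracies 0.56, 0.88, 0.90 where the primitive
# matched its parent node in the first two cases only:
a, b = mle_estimates([(0.56, 1), (0.88, 1), (0.90, 0)])
```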
In addition, the value b is saved in a history file to be used to create new outline models that
do not have training data; the history file creation is presented below.
Initialization
D: a collection of training data
Di: ith sample of the training data, D
L: a primitive lexicon
Li: an element of L which holds an equal word value as Di
N: number of consonant primitives contained in Di
j: an index identifier
Ni,j: a pattern representing the jth consonant primitive of Di
Li,j: a pattern representing the jth consonant primitive of Li
b : probability of Ni,j to have a relationship with Li,j
History updating
If (Ni,j != Li,j and b > 0)
    Save b
The above two lines of pseudo code indicate that if a pattern Ni,j observed in training data is
not the same as the one defined in the lexicon, and if there is evidence confirming a
relationship between Ni,j and Li,j (i.e., the value of b is greater than zero), the system creates a
history file and stores the value b as the probability of Li,j being recognised as Ni,j. Later
in the training process, the history file is retrieved to construct new outline models for words
which do not have any training samples.
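The history-updating rule can be sketched as follows, with a plain dictionary standing in for the thesis’s history file; the key layout (lexicon pattern, observed pattern) is an assumption for illustration.

```python
# Sketch: store b whenever the observed primitive differs from the
# lexicon entry but has supporting evidence (b > 0). The history dict
# maps (lexicon pattern, observed pattern) -> probability b.
def update_history(history, observed, lexical, b):
    if observed != lexical and b > 0:
        history[(lexical, observed)] = b
    return history

# Lexicon type 4 was recognised as type 1 with probability 0.44:
history = update_history({}, observed=1, lexical=4, b=0.44)
```

New outline models for words without training samples can then look up (lexical, observed) pairs in this history to inherit plausible alternative patterns.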
In brief, the use of the history file has a significant benefit to the training of shorthand
outline models, particularly for words which do not have sufficient training data. This is
mainly because Pitman’s Shorthand is no longer a skill in wide demand nowadays, and the
collection of thousands or millions of training samples is infeasible in terms of access to
experienced writers.
4.6.2 Learning of Vowel Primitives
Finally, the hardest part of the learning process will be addressed, where there are
hidden/missing vowel variables. The problem is that vowel primitives are rarely written by
Pitman’s Shorthand writers and omitted positions vary widely depending on individual
preference and context. When having such non-linear distribution of hidden variables, the
Expectation Maximization (EM) algorithm is shown to be effective to find a (locally)
optimal maximum likelihood of a node [Mk01]. Thus, the EM algorithm is applied for
learning of vowel primitives here. The basic idea behind the EM algorithm in our learning
process is that if we know the vowel values from a lexicon, the probability distribution of
hidden vowels in an outline model can be estimated after learning in the M step. Then, in the
E step, these estimated values can be treated as though they were observed.
On the whole, the EM learning of a vowel (hidden) node is denoted as:
PEM(V = TRUE | O = TRUE) = E(V = TRUE) / E(O = TRUE) (4.9)
where V is a vowel node, O is an outline model and E(…) is the number of times a
corresponding parameter is expected to occur. According to Murphy [Mk01], E(...) is
computed as follows
E(e) = ∑m I(e | Dm) P(e | Dm) (4.10)
where D is a set of training data, I(e|Dm) is an indicator function which is 1 if an event e
occurs in training case m, and 0 otherwise.
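Equations 4.9 and 4.10 can be combined in a short sketch, assuming each training case contributes an (indicator, probability) pair as defined above; the numeric values are illustrative only.

```python
# Sketch of equations 4.9 and 4.10: expected counts E(.) accumulated
# over training cases, then the EM estimate for a vowel node V given
# that the outline O is true.
def expected_count(cases):
    """cases: list of (indicator, probability) pairs per eq. 4.10."""
    return sum(i * p for i, p in cases)

def p_em(vowel_cases, outline_cases):
    """EM estimate P(V=TRUE | O=TRUE) per eq. 4.9."""
    return expected_count(vowel_cases) / expected_count(outline_cases)

# e.g. the vowel was present in two of three cases (with estimated
# probabilities 0.9 and 0.8), while the outline was present in all three:
p = p_em(vowel_cases=[(1, 0.9), (1, 0.8), (0, 0.7)],
         outline_cases=[(1, 1.0), (1, 1.0), (1, 1.0)])
```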
4.7 Model Selection
Model selection in a Bayesian Network is concerned with measuring the degree to which a
network structure (equivalence class) fits the prior knowledge and data [Hd99]. This work
determines the fitness of a particular outline model to a given input outline via a relative
posterior probability. Assuming that Oi is the ith outline model and P1, P2,.., Pn are input primitives
classified by our collaborator’s recognition engine, a posterior probability (fitness) of the
outline model Oi given a set of input primitives can be defined as:
P(Oi| P1, P2,.., Pn) = P(Oi)P(P1, P2,.., Pn|Oi) (4.11)
According to equation 4.11, posterior probability of an outline model is calculated based
upon a prior probability of an outline model in combination with the likelihood of input
primitives which belong to the given outline model. Alternatively, equation 4.11 can be
denoted in terms of the belief of a node as follows:

P(Oi | P1, P2, .., Pn) = BEL(x | Oi) ∏j BEL(Nj | Pj) (4.12)
where j = (1,..,n), Oi is the ith outline model, x is a root node of Oi, BEL(…) is the belief of a
node, Pj is an input primitive and Nj is a child node of the root node x.
To find the N-best outline models for a given input, models with top N posterior
probabilities are chosen. However using the posterior probabilities alone to find the best
models is not computationally efficient. The problem is that the number of outline models
increases along with the number of words contained in a lexicon and calculating the
posterior probability of thousands of outline models in real time word transcription is
infeasible, mainly in terms of operational time. Therefore, three unigram-based rejection
strategies are applied in our system in order to reduce model selection time.
The first rejection strategy – number of consonant primitives (NCP) of an input outline is
used as a first level filter to reject outline models that are not relevant to a given input. The
approach is denoted as “NCP filter” and the algorithm is denoted as:
O NCP(k) = O NCP(i) \ O NCP(i ≠ k) (4.13)
where O NCP(i) is a collection of outline models relating to any NCP, and k is the NCP of an
input outline. Example 1 (below) clarifies the concept behind NCP filter.
Example 1
Assuming that k= 2, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained in the
system and NCP of O1, O2, O3, O4, O5, O6 are 2, 2, 6, 3, 5 and 2 respectively, O NCP(2) is
calculated using formula 4.13 as follows:
O NCP(k) = O NCP(i) \ O NCP(i ≠ k)
O NCP(2) = { O1, O2, O3, O4, O5, O6} \ { O3, O4, O5}
= { O1, O2, O6}
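The NCP filter amounts to a simple set difference. A minimal sketch reproducing Example 1 (the model names and NCP values are the illustrative ones above, not a real lexicon):

```python
# Sketch of the NCP filter (formula 4.13), reproducing Example 1.

def ncp_filter(models, k):
    """Keep only the outline models whose NCP equals the input outline's NCP, k."""
    return [name for name, ncp in models.items() if ncp == k]

# Illustrative models with NCP values taken from Example 1.
models = {"O1": 2, "O2": 2, "O3": 6, "O4": 3, "O5": 5, "O6": 2}
print(ncp_filter(models, 2))  # ['O1', 'O2', 'O6']
```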
In the second rejection strategy, outline models are discriminated by the pair of primitives
appearing at the first and last (consonant) segments of an outline. This approach is denoted
as “F&L filter” and the algorithm is denoted as:
O F(k),L(j) = OF(i),L(i) \ OF(i ≠ k),L(i ≠ j) (4.14)
where O F(i),L(i) is the set of outline models whose first and last segments relate to any type of
primitive, and k and j are the types of the first and last segments of an input outline respectively.
Example 2 below demonstrates the concept behind F&L filter.
Example 2
Assuming that k= 5, j = 6, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained
in the system and (F(i),L(i)) of O1, O2, O3, O4, O5, O6 are (3,2), (5,5), (5,6), (1,2), (5,6) and
(5,2) respectively, O F(5),L(6) is calculated as:
O F(k),L(j) = OF(i),L(i) \ OF(i ≠ k),L(i ≠ j)
O F(5),L(6) = { O1, O2, O3, O4, O5, O6} \ { O1, O2, O4, O6}
= { O3, O5}
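Formula 4.14 can be sketched in the same way; the fragment below reproduces Example 2, with the (first, last) segment types stored per model (illustrative values only):

```python
# Sketch of the F&L filter (formula 4.14), reproducing Example 2.

def fl_filter(models, k, j):
    """Keep models whose first segment type is k and whose last segment type is j."""
    return [name for name, (first, last) in models.items()
            if first == k and last == j]

# Illustrative (F(i), L(i)) pairs taken from Example 2.
models = {"O1": (3, 2), "O2": (5, 5), "O3": (5, 6),
          "O4": (1, 2), "O5": (5, 6), "O6": (5, 2)}
print(fl_filter(models, 5, 6))  # ['O3', 'O5']
```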
The idea behind formula 4.14 is based on an interesting phenomenon: wrongly spelled
English words are sometimes comprehensible to a reader as long as the first and the last
letters of the words are clearly indicated. For example, you may understand the following
sentence even though it contains a number of spelling errors: “Wornlgy seplled Egnlish
words are sitll leiglbe to a reader as lnog as the frist and lsat ltteers of the words are crroect.”
In other words, the first and last letters of a word provide heuristics for word identification
in English. Similarly, the outline model selection in our work can be based on evidence of
the first and last primitives of an outline, given that the first and last segments of an outline
are always written in Pitman’s Shorthand. According to our study
done on 10 samples of shorthand notes, handwritten by professional shorthand writers, it is
observed that the first and last primitives of a vocalized outline are always written in
Pitman’s Shorthand. Therefore, the second rejection strategy (formula 4.14) is based on the
first and last primitives of a shorthand outline.
In the third rejection strategy, outline models are selected depending on the existence of
circular primitives in an input outline. The approach is referred to as the “C filter” and the
algorithm is denoted as:
O C(k) = OC(i) \ OC(i ≠ k) (4.15)
where OC(i) is a set of outline models, k is a conditional variable which is TRUE if an input
outline contains circular primitives and FALSE otherwise. Example 3 below demonstrates
the concept behind C filter.
Example 3
Assuming that k= TRUE, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained
in the system and C(i) of O1, O2, O3, O4, O5, O6 are TRUE, FALSE, TRUE, FALSE, FALSE,
TRUE respectively, O C(TRUE) is calculated as:
O C(k) = OC(i) \ OC(i ≠ k)
O C(TRUE) = { O1, O2, O3, O4, O5, O6} \ { O2, O4, O5}
= {O1, O3, O6}
Formula 4.15 checks for the existence of circular primitives in outline models and splits the
outline models into two main groups: those containing circular primitives and those not
containing circular primitives. In general, this rejection strategy performs well, given the
reliable accuracy of the collaborator’s recognition engine at detecting the circular primitives
of an outline (if there are any).
Overall, the model selection strategies carried out in this work are applied in left-to-right
rejection order, as illustrated in Figure 4.10. After the C filter, the posterior probabilities of
the remaining outline models are calculated using formula 4.12, from which the N-best
candidate outline models for a given input are chosen.
Figure 4.10 Illustration of outline model selection strategies
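Putting the pieces together, the left-to-right rejection order can be sketched as a small pipeline. All model attributes, words and scores below are invented for illustration, and the stored score merely stands in for the posterior of formula 4.12:

```python
# Hedged sketch of the overall selection pipeline: NCP filter -> F&L filter ->
# C filter -> posterior-probability ranking. All data here is hypothetical.

def select_n_best(models, ncp, first, last, has_circle, n):
    """Apply the three rejection filters, then rank the survivors by score."""
    survivors = [m for m in models
                 if m["ncp"] == ncp                        # NCP filter
                 and (m["first"], m["last"]) == (first, last)  # F&L filter
                 and m["circle"] == has_circle]            # C filter
    survivors.sort(key=lambda m: m["score"], reverse=True)
    return [m["word"] for m in survivors[:n]]

models = [
    {"word": "pay",   "ncp": 1, "first": 5, "last": 5, "circle": False, "score": 0.7},
    {"word": "pays",  "ncp": 1, "first": 5, "last": 5, "circle": True,  "score": 0.6},
    {"word": "space", "ncp": 2, "first": 9, "last": 5, "circle": True,  "score": 0.5},
]
print(select_n_best(models, ncp=1, first=5, last=5, has_circle=False, n=3))  # ['pay']
```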
4.8 Experimental Results
A primary goal of the experiments carried out in this chapter is to evaluate the transcription
accuracy of the Bayesian Network based word interpreter under the following criteria:
- In the presence of shape variation and position confusion of pen strokes due to natural handwriting.
- In the presence of segmentation and classification errors due to hardware constraints and limitations of the recognition engine.
- In the presence of missing vowel primitives that are randomly omitted among outlines by experienced Pitman’s Shorthand writers.
- In the presence of incorrect shorthand outlines, written by inexperienced shorthand writers.
4.8.1 Data Set
Three types of data sets are evaluated in the experiments of this chapter, and they can be
outlined as follows:
- Single-consonant data set: This data set contains outlines whose skeletons have one and only one consonant stroke, for instance, the shorthand outlines for the words “bay”, “pea” and “pat”. In general, various groups of homophones (i.e., outlines that look similar but have different representations) are contained in this data set, as the outlines are similar with only minor differences of line thickness, vowel position and inclination.
- Stroke-combination data set: This data set contains outlines whose skeletons have two or more consonant strokes, written according to the normal rules of Pitman’s Shorthand, i.e., the phonemes of the words are directly converted into Pitman’s primitives without applying any of the special rules of Pitman’s Shorthand invented for speed enhancement purposes. The data set covers the whole range of possible stroke combinations, and sample outlines of the data set are illustrated in Figure 4.11.
Figure 4.11: Samples of the stroke combination data set (outlines for “bar”, “making”, “rare”, “escape” and “machine”)

- Special-rule data set: This data set contains words written according to the special rules of Pitman’s Shorthand. For instance, instead of writing the word “after” by combining primitives for the phonemes /F/, /T/, /R/ and vowels as in Figure 4.12(a), Pitman uses a doubled-length /F/ curve to express the word “after”, as in Figure 4.12(b). In general, this data set contains inconsistent outlines, written without following the corresponding special rules of Pitman’s Shorthand by (inexperienced) shorthand writers who have not digested the complete rules of Pitman’s Shorthand.
Figure 4.12: Two different shorthand outlines for the word “after”; (a) the word “after” written by direct conversion of phonemes into primitives (the incorrect outline); (b) the word “after” written according to the double-length rule of Pitman’s Shorthand (the correct outline)
Table 4-2: Details of the data collection for the three data sets
Data set name                Number of words    Writer ID    Number of times
Single-consonant data set    135                Writer A     2
Single-consonant data set    135                Writer B     1
Stroke-combination data set  192                Writer A     1
Stroke-combination data set  192                Writer B     1
Stroke-combination data set  192                Writer C     1
Special-rule data set        87                 Writer A     2
Special-rule data set        87                 Writer D     2
Special-rule data set        87                 Writer E     1
In total, 1416 outlines were collected for the three data sets.
Table 4-2 provides details of the collected data. The data was collected using a tablet PC
with an electromagnetic digitizer of resolution 1000 ppi, and five writers were involved in the
data collection. The three data sets cover the whole range of shorthand primitives, and the
word list contains the 5000 most frequently used English words of the general domain. 45%
of the data is included in a training data set, and samples of the collected data are illustrated
in Figure 4.13.
Figure 4.13: Screen shot of outlines written by Writer A
4.8.2 Evaluation of the Recognition Engine
Before the evaluation of the word transcription performance of the Bayesian Network based
interpreter, this section firstly evaluates the accuracy of the recognition engine in order to
relate it to the overall word transcription performance. The study is categorized into three
groups namely: (1) analysis of the vocalized outline identification, (2) analysis of the outline
segmentation, and (3) analysis of the primitive classification. The studies are carried out
using the whole data sets, and the experimental results are discussed as follows.
Firstly, the accuracy of the vocalised outline identification is discussed. To clarify what is
meant by vocalised outline identification: it is the process of determining whether a written
outline is a short-form or a phonetically written outline. As shown in Figure 4.14, the accuracy
of the vocalised outline identification varies from writer to writer, or even from time to time
for the same writer. For instance, consider the accuracy of the vocalised outline
identification for writer A on the single-consonant data set, where there is a difference of
approximately 62% between the first and second writings. The study finds that a major
reason for this difference is that writer A omitted most of the vowels when writing the
single-consonant data set the first time, whereas the writer indicated at least one vowel for
most of the words the second time. It is therefore concluded that the indication of at least
one vowel per outline is critical for obtaining high vocalised outline identification accuracy.
Any words that are not recognised as vocalised outlines are marked as short-forms by the
recognition engine. For example, 73% of the data written by writer A for the single-consonant
data set is marked as short-forms by the recognition engine although the outlines are, in fact,
vocalised outlines. On the whole, the average vocalised outline identification accuracy over
the whole data sets is 69%.
Figure 4.14: Evaluation of the vocalised outline identification of the recognition engine
Secondly, the segmentation accuracy of the recognition engine is discussed. In general, the
segmentation accuracy varies across data sets. As shown in Figure 4.15, the
single-consonant data set has about 72% correctly segmented outlines, whereas the
stroke-combination data set has only about 21% correctly segmented outlines on average.
The results are reasonable, since the single-consonant data set contains outlines with only one
consonant stroke and hence shows the higher segmentation accuracy. For the analysis of the
segmentation accuracy of different writers on the same data set, consider the results of the
special-rule data set, where the segmentation accuracy of outlines written by writer E is
higher than that of writer A. Statistics show that writer A has no previous experience
of using a pen-based text entry system, whereas writer E has previous experience of using
pen-based text entry systems on handheld devices. In addition, statistics show that writer A
prefers writing small scripts on a tablet, in a similar manner to writing on conventional
paper, whereas writer E produces larger scripts with flexible pen movements on the digitizer.
Therefore, it is observed that a writer’s previous experience of using pen-based text entry
systems has an influence on the segmentation performance of the recognition engine. The
average segmentation accuracy over all data sets is 36%. The segmentation accuracy
presented in Figure 4.15 is based on the number of correctly detected vocalised outlines and
is formulated as follows:
s = ((t − y) / t) × 100 (4.16)
where s is segmentation accuracy, t is the total number of written words and y is the total
number of outlines that are recognised as short-forms instead of vocalised outlines.
Figure 4.15: Evaluation of the segmentation accuracy of the recognition engine
Thirdly, the classification accuracy of the recognition engine is discussed. As shown in Figure
4.16, the average classification accuracy of the stroke-combination data set is lower than that
of the single-consonant data set or the special-rule data set. Statistics show that the
classification accuracy is influenced by several factors, including the tidiness of the
handwriting, hardware limitations and limitations of the algorithms applied in the recognition
engine. On average, the classification accuracy over the whole data sets is 77%, where the
classification accuracy is based on the total number of outlines that are recognised as
vocalised outlines as well as being correctly segmented. The formula is defined as:
c = ((t − x) / t) × 100 (4.17)
where c is the classification accuracy, t is the total number of written words, and x is the total
number of outlines that are recognised as vocalised outlines as well as being correctly segmented.
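As reconstructed here, formulas 4.16 and 4.17 share the same percentage-of-total structure; a minimal sketch (the counts below are illustrative, not taken from the experiments):

```python
# Sketch of the two accuracy measures, as reconstructed from formulas 4.16
# and 4.17. The counts used in the example calls are invented.

def segmentation_accuracy(t, y):
    """Formula 4.16: s = ((t - y) / t) * 100, where y counts the outlines
    recognised as short-forms instead of vocalised outlines."""
    return (t - y) / t * 100

def classification_accuracy(t, x):
    """Formula 4.17 (reconstructed with the same structure as 4.16)."""
    return (t - x) / t * 100

print(segmentation_accuracy(100, 75))    # 25.0
print(classification_accuracy(100, 25))  # 75.0
```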
Figure 4.16: Evaluation of the classification accuracy of the recognition engine
4.8.3 Evaluation of the Word Transcription Accuracy
Experiments carried out in this section are categorized into three groups, namely:
- Analysis of word transcription accuracy using the single-consonant data set.
- Analysis of word transcription accuracy using the stroke-combination data set.
- Analysis of word transcription accuracy using the special-rule data set.
Each group comprises four graphs discussing the experimental results from different aspects,
outlined as follows:
- Recognition accuracy vs. transcription accuracy: the graph illustrates the influence of the performance of the recognition engine on the transcription engine. It applies
two types of data in order to discuss the theme: firstly, data with any kind of errors
of the recognition engine, and secondly, (filtered) data with no vocalised outline
identification or segmentation errors of the recognition engine.
- Accuracy of the end result: the graph illustrates the accuracy of the result list for an input outline according to three measures: firstly, the rate at which a correct word appears in the result list; secondly, the rate at which the correct word appears in the top five of the result list; and thirdly, the rate at which the correct word appears at the topmost position of the result list. Note that the accuracies illustrated in this graph are based on data with no vocalised outline identification or segmentation errors, as the correction of these errors is not included in the scope of this research.
- Correction accuracy vs. classification/vowel errors: the graph illustrates the correction accuracy of the Bayesian Network based word interpreter in relation to classification and vowel omission errors. Similarly, the results reported in this graph are based on data with no vocalised outline identification or segmentation errors.
- Factors influencing the accuracy of a result list: the graph illustrates the average distribution of the factors that prevent a correct word from appearing at the topmost position of the result list. Similarly, the results reported in this graph are based on data with no vocalised outline identification or segmentation errors.
4.8.4 Analysis of Word Transcription Accuracy Using the Single
Consonant Data Set
4.8.4.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As shown in Figure 4.17, the accuracy of the recognition engine, specifically the accuracy of
the vocalised outline identification and segmentation, has a huge impact on the accuracy of
the transcription engine. The study finds that approximately 73% of the outlines written by
writer A the first time are not recognised as vocalised outlines, and this directly reduces the
transcription accuracy (to less than 20%). As discussed, the inadequacy of the vocalised
outline identification of the recognition engine is mainly caused by the omission of vowels
among outlines; therefore, the indication of at least one vowel per vocalised outline is also
encouraged in this research in order to achieve high transcription accuracy.
Figure 4.17: Illustration of the relationship between recognition accuracy and transcription accuracy of the single consonant data set
4.8.4.2 Analysis of the Accuracy of a Result List
As shown in Figure 4.19, approximately 93% of the input data are interpreted with a result list
containing a correct word; 72% of the data are interpreted with a correct word appearing in
the top five group of the result list; and 40% of the data are interpreted with a correct word
appearing at the topmost position of the result list, on average.
An interesting phenomenon here is that although writer A has an intermediate level of skill in
Pitman’s Shorthand and writer B is an inexperienced shorthand writer, the outlines written by
writer B are transcribed more accurately than those of writer A. The study finds that this is
because the handwriting of writer B is more legible, with more informative pen strokes, than
that of writer A, as compared in Figure 4.18. In relation to this finding, it is remarked that the
writing of legible scripts is encouraged in this research in order to obtain high recognition
and transcription accuracy.
Figure 4.18: Comparison of the handwriting of two writers (outlines for the words “night”, “nod”, “note” and “nut”, written by writer B and writer A)
Figure 4.19: Illustration of the word transcription accuracy of the single consonant data set
4.8.4.3 Analysis of the Correction Accuracy vs. the Classification/Vowel
Errors
The graph in Figure 4.20 illustrates the classification and vowel errors in comparison with the
correction accuracy, where the correction accuracy indicates how many of the classification
and vowel errors are covered by the transcription engine respectively. Here, the
classification error, the vowel error and the correction accuracy are formulated as follows:
c = (e / t) × 100 (4.18)
where c is the classification error, e is the number of words having a classification error and t
is the total number of input words.
v = (f / t) × 100 (4.19)
where v is the vowel error, f is the number of words having omitted vowels and t is the total
number of input words.
a = (b / t) × 100 (4.20)
where a is the correction accuracy, b is the total number of words interpreted correctly by
the transcription engine in the presence of classification/vowel errors, and t is the total
number of words having classification errors or vowel errors respectively.
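Formulas 4.18–4.20 are simple percentage ratios; a minimal sketch with invented counts (note that t in formula 4.20 denotes only the erroneous words, not all written words):

```python
# Sketch of the error and correction measures (formulas 4.18-4.20).
# All counts in the example call are invented for illustration.

def classification_error(e, t):
    """Formula 4.18: c = (e / t) * 100, e = words with a classification error."""
    return e / t * 100

def vowel_error(f, t):
    """Formula 4.19: v = (f / t) * 100, f = words with omitted vowels."""
    return f / t * 100

def correction_accuracy(b, t):
    """Formula 4.20: a = (b / t) * 100, b = erroneous words still interpreted
    correctly, t = number of words having classification (or vowel) errors."""
    return b / t * 100

# E.g. 3 of 4 words with classification errors corrected by the interpreter.
print(correction_accuracy(3, 4))  # 75.0
```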
On average, the correction rate for the classification error is 76% and the correction rate for
the vowel error is 55%. This indicates that the Bayesian Network based outline models,
implemented in the transcription engine are capable of coping with the classification and
vowel errors.
Figure 4.20: Illustration of the correction accuracy in comparison with the classification or vowel errors of the single consonant data set
4.8.4.4 Analysis of Factors Influencing the Accuracy of a Result List
As illustrated in Figure 4.21, the major factor (49%) preventing the correct word for an input
outline from appearing at the topmost position of the result list is the similarity of the input
outline to other outlines. This is generally true for the single-consonant data set, as the
outlines included in this data set are very similar, with only minor differences of line thickness
and vowel position. The other factors preventing a correct word from appearing at the topmost
position of the result list are classification errors (31%), vowel errors (3%) and a combination
of similarity to other outlines, classification errors and vowel errors (17%).
Figure 4.21: Illustration of an average distribution of factors influencing the accuracy of a result list (single consonant data set)
4.8.5 Analysis of Word Transcription Accuracy Using Stroke-
combination Data Set
This section analyses the word transcription accuracy for outlines containing two or more
strokes. The primary purpose of the study is to evaluate the transcription accuracy in the
presence of stroke combinations. As for the single-consonant data set, four types of graphs
are discussed as follows.
4.8.5.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As shown in Figure 4.22, the average transcription accuracy of the filtered stroke-combination
data set (filtered data contains no vocalised outline identification or segmentation errors) is
97%, a value similar to the accuracy achieved by the single-consonant data set (Figure 4.17).
However, in terms of unfiltered data (data containing any kind of recognition errors), the
overall transcription accuracy of the stroke-combination data set decreases by 27% compared
to that of the single-consonant data set. The study finds that this is mainly due to the increase
in segmentation errors; in relation to this finding, it is concluded that reliable outline
segmentation is important for the overall transcription accuracy in the case of words with two
or more consonant strokes.
Figure 4.22: Illustration of the relationship between recognition accuracy and transcription accuracy of the stroke-combination data set
4.8.5.2 Analysis of the Accuracy of a Result List
According to the experimental results (Figure 4.23), approximately 97% of the input data are
transcribed with a result list containing a correct word; 97% of the input data are transcribed
with a correct word appearing in the top five of the result list; and 55% of the input data are
transcribed with a correct word appearing at the topmost position of the result list. On the
whole, the transcription accuracy of the stroke-combination data set increases by 25%
compared to the single-consonant data set.
An interesting phenomenon here is that although writer C’s data is not included in the
training data set, 96% of the writer’s outlines are transcribed with a correct word appearing
in the top five of the result list. This indicates that the history-based learning algorithm
implemented in the Bayesian Network models can effectively cope with unseen patterns that
are not included in a training data set.
Figure 4.23: Illustration of the word transcription accuracy of the stroke-combination data set
4.8.5.3 Analysis of the Correction Accuracy vs. the Classification/Vowel
Errors
The graph in Figure 4.24 illustrates the classification and vowel errors in comparison with the
correction accuracy for the stroke-combination data set. The classification error, the vowel
error and the correction accuracy are calculated by applying formulas 4.18, 4.19 and 4.20
respectively. As shown in Figure 4.24, none of the 3% of classification errors of writer C
are corrected, because these errors involve patterns deviating completely from the original
patterns. For instance, if the orientation of an input pattern is completely different from its
original form, the transcription engine does not cover this kind of error.
Note that writers rarely omit vowels in this data set compared to the single-consonant data
set. Writers of this data set are encouraged to indicate at least one vowel per outline,
mainly in order to avoid the rejection of substantial data at the recognition stage by the
vocalised outline detector.
Figure 4.24: Illustration of the correction accuracy in comparison with the classification/vowel errors of the stroke combination data set
4.8.5.4 Analysis of Factors Influencing the Accuracy of a Result List
The graph in Figure 4.25 illustrates the factors preventing the correct word for an input outline
from appearing at the topmost position of the result list, where 11% is due to the similarity of
the input outline to other outlines and the remaining 89% is due to a combination of
classification errors, vowel errors and similarity of the input outline to other outlines. An
interesting phenomenon here is that ambiguity due to the similarity of an input outline to
other outlines decreases by 38% compared to the single-consonant data set. This indicates
that outlines become less ambiguous when they contain two or more consonant strokes.
Figure 4.25: Illustration of an average distribution of factors influencing the accuracy of a result list (stroke-combination data set)
4.8.6 Analysis of Word Transcription Accuracy for the Special-rule Data
Set
This section analyses the word transcription accuracy for outlines written according to the
special rules of Pitman’s Shorthand. The primary purpose of the study is to evaluate the
transcription performance in the presence of the application of the special rules of Pitman’s
Shorthand, where consistency between patterns written by different writers becomes a
critical concern. As for the single-consonant and stroke-combination data sets, four types of
graphs are discussed as follows.
4.8.6.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As illustrated in Figure 4.26, the transcription accuracy of filtered data (data with no
vocalised outline identification or segmentation errors) reaches up to 100%; however, the
transcription accuracy of unfiltered data (data containing any kind of recognition errors)
falls to as low as 1% for writer D.
The study finds that this is mainly due to writer D’s preference for writing outlines without
vowel components, as well as the writing of incorrect outlines that do not fully follow the
special rules of Pitman’s Shorthand. In relation to this finding, it is remarked that the
writing of consistent outlines in accordance with the special rules of Pitman’s Shorthand is
encouraged in this research in order to obtain high transcription accuracy.
Figure 4.26: Relationship between recognition accuracy and transcription accuracy of the special-rule data set
4.8.6.2 Analysis of the Accuracy of the Result List
According to the experimental results (Figure 4.27), approximately 85% of the input data are
transcribed with a result list containing a correct word; 85% of the input data are transcribed
with a correct word appearing in the top five of the result list; and 80% of the input data are
transcribed with a correct word appearing at the topmost position of the result list. An
interesting phenomenon here is that the special-rule data set achieves the highest average
transcription accuracy (80%) in terms of a correct word appearing at the topmost position of
the result list.
Figure 4.27: Evaluation of the word transcription accuracy of the special-rule data set
4.8.6.3 Analysis of the Correction Accuracy vs. the Classification/Vowel
Errors
The graph in Figure 4.28 illustrates the classification and vowel errors in comparison with the
correction accuracy for the special-rule data set. The classification errors, the vowel errors
and the correction accuracy are calculated by applying formulas 4.18, 4.19 and 4.20
respectively. Experimental results show that there are no classification or vowel errors for
the outlines written by writer A and writer D (i.e., the data from their first writings), and this
directly affects the overall transcription accuracy, which reaches up to 100% in some cases,
as illustrated in Figure 4.27. On average, 100% of the vowel errors and 67% of the
classification errors are covered by the transcription engine.
[Figure 4.28 content: per writer (A, A, E, D, D): classification errors, successful transcription in the presence of classification errors, vowel errors, and successful transcription in the presence of vowel errors.]
Figure 4.28: Illustration of the correction accuracy in comparison with classification or vowel errors of the special-rule data set
4.8.6.4 Analysis of Factors Influencing the Accuracy of a Result List
The graph (Figure 4.29) illustrates the factors that prevent the correct word for an input
outline from appearing at the topmost position of a result list: 23% of cases are due to the
similarity of an input outline to other outlines, 8% are due to classification errors, and the
remaining 69% are due to a combination of classification errors, vowel errors and the
similarity of input outlines to other outlines.
[Figure 4.29 content: 23% due to similarity to other outlines; 8% due to classification errors; 69% due to a combination of similarity to other outlines, classification errors and vowel errors.]
Figure 4.29: Illustration of an average distribution of factors influencing the accuracy of a result list (special-rule data set)
4.9 Summary and Discussion
In summary, this chapter proposes a novel primitive based text translation approach that
interprets handwritten Pitman's outlines as English words, using Bayesian Network based
outline models. Ambiguities of pen strokes and interactions between the strokes of written
shorthand outlines are modelled in the Bayesian Network outline models. The word
interpretation comprises network modelling, belief propagation, Bayesian learning and
model selection. Experimental results of the new framework are presented, following a full
description of the Bayesian Network based transcription algorithms.
An evaluation of the recognition engine is presented in combination with the experimental
results of the Bayesian Network based word interpreter, based on three data sets, namely:
the single-consonant data set, the stroke-combination data set and the special rule data set.
Overall, a primary issue discussed in relation to the performance of the recognition engine is
the requirement to indicate at least one vowel in an outline, in order to avoid cases where
outlines are mistaken for short-forms instead of vocalised outlines. In terms of the feasibility
of enforcing the restriction on shorthand writers, approximately 80% of inexperienced
Pitman's Shorthand writers find the restriction easy to adapt to; however, approximately
60% of professional Pitman's Shorthand writers find it impractical, especially for speed
writing. Therefore, a further improvement to the algorithms of the vocalised outline
identifier is recommended, so that the indication of vowels in an outline is no longer
mandatory.
From the aspect of the performance of the Bayesian Network based word interpreter, the
average transcription accuracies for the three (filtered6) data sets are 91% for a correct word
appearing in a result list, 85% for a correct word appearing in the top five of a result list and
58% for a correct word appearing at the topmost position of a result list. Overall, the
accuracy of 91% is satisfactory; however, the accuracy of a correct word appearing at the
topmost position of a result list (58%) indicates that the homophones in a result list need to
be resolved with the application of contextual information. A resolution to this problem is
discussed in detail in Chapter 6, which covers phrase level transcription.
From the aspect of a relationship between the features of a writer and the transcription
accuracy, the study finds that gender and age of writers do not have significant influence on
the performance of the recognition and transcription systems. However, the study finds that
a writer’s skill in Pitman’s Shorthand and a writer’s previous experience in using pen based
text entry systems are related to the overall transcription accuracy. Another consideration
is that the writers in the current experiments are right-handed, and further analysis of the
transcription performance with left-handed writers is recommended. In addition, the writers
in the current experiments use British English, and further analysis of the transcription
performance with writers who use American English remains an open challenge. Since
Pitman's Shorthand is written phonetically, outlines written according to British English and
American English differ, especially in their vowel notations.
6 Filtered data does not contain any vocalized outline identification error or segmentation error.
As a final discussion, a comparison of the performance of the conventional phonetic based
transcription and that of the novel primitive based transcription is given. Table 4-3 presents
the accuracies of the two approaches, where the results are based on the data set used in the
experiments on the phonetic based transcription approach, presented in Chapter 3.
Table 4-3: Transcription accuracies of the phonetic based transcription and the primitive based transcription approaches

Average transcription accuracy                    Primitive based   Phonetic based
Overall                                           93%               84%
In the presence of vowel omission or confusion    100%              0%
In the presence of inconsistent writing           0%                0%
In the presence of classification error           100%              100%
As shown in Table 4-3, the average transcription accuracy of the primitive based
transcription approach is 9 percentage points higher than that of the phonetic based
transcription approach. The study finds that this is mainly due to the increased correction
accuracy for vowel errors in the novel framework. Overall, the performance of the proposed
framework is promising, but it must be improved in the areas discussed for it to become a
commercially viable system.
5 Generation of a machine-readable Pitman's Shorthand lexicon
Chapter 5 Introduction
The previous chapter presented a novel solution for interpreting handwritten Pitman's
Shorthand outlines using Bayesian Network algorithms, in which the geometrical features of
the outlines are translated directly into English word(s). On the whole, the solution was
found to be efficient, mainly owing to the use of a machine-readable Pitman's Shorthand
dictionary that maps shorthand outlines to corresponding English word(s). Based on a
thorough literature review carried out in this research, no other machine-readable
(electronic) Pitman's Shorthand lexicon has ever been designed; the lexicon developed for
this research is therefore the first of its kind. This may be a major reason why none of the
previous work (within the same framework) attempted to analyse the performance of the
direct transcription of geometric primitives into English words.
Specifically, this chapter presents full details of the creation of the electronic Pitman’s
Shorthand lexicon, developed for this research, under the following four sections:
1. Overview: overview of the rule based creation of the electronic Pitman’s Shorthand
lexicon and discussion on general advantages and disadvantages of rule based
approaches.
2. Structure: description of the lexicon structure in terms of feature set, key and lexicon
as a whole.
3. Rules: description of rules, applied in our system to conform to the writing rules of
Pitman’s Shorthand.
4. Experimental results: evaluation of the electronic Pitman’s Shorthand lexicon,
mainly in terms of accuracy and homophone distribution.
5.1 Overview
Firstly, in order to clarify precisely what is meant by a Pitman’s Shorthand lexicon, sample
entries of a conventional Pitman’s Shorthand dictionary (available in book format) and
sample entries of an electronic Pitman’s Shorthand dictionary are illustrated (Figure 5.1).
Figure 5.1: (a) sample entries of a conventional Pitman’s Shorthand dictionary available in book format (b) sample entries of an electronic Pitman’s Shorthand
lexicon
A primary objective of the research presented in this chapter is to create an electronic
Pitman's Shorthand lexicon (Figure 5.1(b)), based on the concept of Figure 5.1(a). A major
difference between them is the relationship between keys and elements: each key (each
word) is related to one and only one shorthand outline in the conventional lexicon, whereas
each key (each shorthand outline) is related to one or more words in the electronic lexicon.
The latter layout is preferred in this work because
it directly relates to ambiguities of handwritten Pitman’s Shorthand e.g., line thickness
ambiguities.
5.1.1 Rule-based Creation of the Electronic Pitman’s Shorthand Lexicon
The creation of the electronic Pitman’s Shorthand lexicon is based on the following four
basic steps, which are taken by stenographers while learning Pitman’s Shorthand:
1. Gain prior knowledge of English pronunciation.
2. Memorise notations of Pitman’s Shorthand.
3. Memorise writing rules of Pitman’s Shorthand.
4. Write Pitman’s Shorthand outlines, using the above learned knowledge.
In order to instruct a machine to generate the electronic Pitman’s Shorthand lexicon
automatically, the above four steps are restructured as below:
1. Set up a phonetic lexicon with a series of phonemes as a key for word identification.
2. Define notations of Pitman's Shorthand in terms of low level geometric features; for
instance, the consonant W is defined as a combination of an anti-clockwise hook
and an upward stroke.
3. Define conversion rules that conform to the writing rules of Pitman’s Shorthand.
4. Generate a series of geometric features for a given word using the phonetic lexicon
and conversion rules.
The Pitman’s Shorthand lexicon is created using a set of conversion rules. When rule-based
algorithms are introduced in the field of handwriting recognition, one may argue that rules
are static and incapable of coping with natural ambiguities [Sy94], [Lr89]. Here, it is
important to realise that the rules reported in this chapter are applied only to static lexical
data, not to handwritten data; due to the use of this static lexical data, accuracy of the
Pitman’s Shorthand lexicon becomes reliable. Like other rule-based approaches [FF93],
[Mm03], training is not required for the creation of the shorthand lexicon and rules can be
refined easily as needed if the lexicon is to be altered.
5.2 Structure of the Electronic Pitman’s Shorthand Lexicon
5.2.1 Feature Set
The electronic Pitman’s Shorthand lexicon includes 31 features representing phonemes of
a word. The features are represented in numerical form and are shown in Table 5-1.
Table 5-1: Features of the electronic Pitman's Shorthand lexicon (the Pattern column of shorthand glyphs is omitted here)

Type  Description             Type  Description
1     /T/ or /D/              17    Large unclosed circle
2     /F/ or /V/              18    Large closed circle
3     /th/ or /TH/            19    Small unclosed loop
4     /P/ or /B/              20    Small closed loop
5     /M/                     21    Large unclosed loop
6     /N/ or /NG/             22    Large closed loop
7     /K/ or /G/              23    Small hook
8     /SH/ or /ZH/            24    Large hook
9     /CH/ or /J/             25    Vowel
10    /R/ (downward)          26    Vowel
11    /L/ (upward)            27    Diphthong
12    /S/ or /Z/              28    Diphthong
13    /R/ (upward)            29    Diphthong
14    /L/ (upward)            30    Diphthong
15    Small unclosed circle   31    Diphones
16    Small closed circle
5.2.2 Key
A key corresponds to a vocalised shorthand outline and relates to one or more words. It
is composed of consonants and vowels such that the consonant primitives of a vocalised
outline are listed first in chronological order, with the series of vowel primitives following
at the end. A major reason for keeping the vowel primitives at the end of a key is to cope
with the special writing order of Pitman's Shorthand, i.e., consonants are always written
first, with vowels placed around the consonant kernel later. Sample keys are given in
Figure 5.2, where each key comprises two major components: one containing the consonant
primitives and the other containing the vowel primitives. Both components are arranged in
chronological order, such that the primitive at the end of the first component corresponds to
the last consonant of a vocalised outline and the primitive at the beginning of the second
component corresponds to the first vowel of the vocalised outline.
Word    Pronunciation    Phonemes in chronological writing order    Key
Famous  /F Ā M Ŭ S/      /F M S Ā Ŭ/                                2+5+16+91+92
Yellow  /Y Ě L Ō W/      /Y L Ě Ō/                                  23+13+11+91+92

Figure 5.2: Sample keys of the electronic Pitman's Shorthand lexicon; vowels are underlined
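The key layout just described can be sketched in Python; the phoneme spellings and code tables below are reduced to the single "famous" example of Figure 5.2 and are illustrative assumptions, not the thesis's actual feature extractor.

```python
# Sketch of key construction: consonant primitives in chronological
# writing order, vowel primitives appended at the end. The code tables
# cover only the "famous" example and are assumed values.
CONSONANT_CODES = {"F": 2, "M": 5, "S": 16}
VOWEL_CODES = {"AY": 91, "UH": 92}

def make_key(consonants, vowels):
    codes = [CONSONANT_CODES[c] for c in consonants] + [VOWEL_CODES[v] for v in vowels]
    return "+".join(str(c) for c in codes)

# "famous": consonants /F M S/ first, then the vowels
print(make_key(["F", "M", "S"], ["AY", "UH"]))  # -> 2+5+16+91+92
```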
5.2.3 Lexicon Layout
A Pitman’s Shorthand lexicon is a hash-table with each key mapping to one or more than
one word where words with the same key contain the same series of similar consonant
primitives. Here, “similar consonant primitives” stands for primitives of the same type with
different line thicknesses or lengths. Sample entries of the Pitman’s Shorthand lexicon are
illustrated in Figure 5.3.
Sample  Key     Word
1       2+91    fee, father, further, after
2       4+91    pays, bays

Figure 5.3: Sample entries of the electronic Pitman's Shorthand lexicon
Sample one in Figure 5.3 presents a lexicon entry for the words “fee”, “father”, “further” and
“after”. The example indicates that words with similar geometric features of different
lengths belong to the same key. In order to recognise length variation of the words, consider
the sample Pitman’s Shorthand outlines given in Figure 5.3.
Similarly, sample two in Figure 5.3 presents a lexicon entry for the words “pays” and
“bays”. The example indicates that words with similar geometric features of different
thicknesses belong to the same key. Consider the sample outlines illustrated in Figure 5.3 to
identify line thickness difference between the two words.
On the whole, the Pitman’s Shorthand lexicon is created as follows:
P: a phonetic lexicon
N: number of words contained in P
Wi: the i-th word of the phonetic lexicon P
Vi: phonetic representation of Wi
Si: series of geometric features of Vi
table: a hash table holding the Pitman's Shorthand lexicon
key: a key representing a vocalised shorthand outline
value: the word value to which a specified key is mapped in table

Initialisation
    table = a new hash table

Lexicon organisation
    for i = 1 to N
        // convert phonemes of a word into geometric features
        Si = convertPhonemeToShorthand(Vi)
        key = Si
        if table.containsKey(key)        // the key already exists
            value = table.get(key) + Wi
        else
            value = Wi
        end
        table.put(key, value)
    end
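A runnable Python sketch of the lexicon-organisation loop above; convert_phoneme_to_shorthand is a stub lookup standing in for the 46-rule conversion procedure of Section 5.3, with primitive codes taken from the samples in Figure 5.3.

```python
# Sketch of the lexicon-organisation loop: words whose phonemes convert
# to the same primitive series share one key.
def convert_phoneme_to_shorthand(phonemes):
    # Stub standing in for the 46-rule conversion; assumed phoneme strings.
    stub = {"P EY Z": "4+91", "B EY Z": "4+91", "F IY": "2+91"}
    return stub[phonemes]

def build_lexicon(phonetic_lexicon):
    """phonetic_lexicon: list of (word, phonetic_representation) pairs."""
    lexicon = {}  # hash table: primitive key -> list of words
    for word, phonemes in phonetic_lexicon:
        key = convert_phoneme_to_shorthand(phonemes)
        lexicon.setdefault(key, []).append(word)  # append if key exists
    return lexicon

lex = build_lexicon([("pays", "P EY Z"), ("bays", "B EY Z"), ("fee", "F IY")])
print(lex)  # -> {'4+91': ['pays', 'bays'], '2+91': ['fee']}
```

As in the pseudocode, homophone-like outlines ("pays", "bays") collapse onto one key.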
5.3 Conversion Procedure
This section presents full details of a conversion procedure that is used to transform
phonemes of a word into a series of geometric features. Assuming that x is a set of
phonemes for a particular word and y is a shorthand representation for the word, a
conversion procedure can be defined as
y = convertPhonemeToShorthand(x) (5.1)
For instance, if x is a set of phonemes /T Ō D Ā / (for the word “today”), then y is produced
by invoking the conversion procedure as follows:
y = convertPhonemeToShorthand(/T Ō D Ā /)
y = 1+ 1+ 91+ 92
In total, the conversion procedure comprises 46 rules, conforming to the writing rules of
Pitman’s Shorthand 2000, defined in [Oj95]. In order to produce a primitive representation
for a given set of phonemes, the 46 rules are applied in ascending priority order as follows:
Priority 1: 1st rule – 17th rule
Priority 2: 18th rule – 32nd rule
Priority 3: 33rd rule – 36th rule
Priority 4: 37th rule – 39th rule
Priority 5: 40th rule – 41st rule
Priority 6: 42nd rule – 43rd rule
Priority 7: 44th rule
Priority 8: 45th rule – 46th rule
For instance, application of the 2nd rule must follow the completion of the 1st rule and
similarly, application of the 18th rule must follow the completion of the 17th rule. With the
aid of a diagram (Figure 5.4), data flow in the conversion procedure can be followed.
Figure 5.4: Illustration of the conversion procedure
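The priority ordering can be illustrated with a minimal sketch in which each rule is a string rewrite applied strictly after all higher-priority rules; the two sample rewrites and the primitive names are simplified assumptions, not the actual 46 rules.

```python
# Sketch of the ascending priority order: each rule rewrites the working
# phoneme string strictly after all higher-priority rules have run.
RULES_IN_PRIORITY_ORDER = [
    ("ING S", "ings-loop"),  # e.g. the INGS rule (16) runs at priority 1
    ("S", "s-circle"),       # e.g. the S or Z stroke rule (18) at priority 2
]

def convert(phonemes):
    for pattern, replacement in RULES_IN_PRIORITY_ORDER:
        phonemes = phonemes.replace(pattern, replacement)
    return phonemes

print(convert("R ING S"))  # -> R ings-loop  (the INGS rule fires first)
print(convert("R S"))      # -> R s-circle
```

Running the rules in the opposite order would let the bare-S rewrite consume the S of INGS first, which is why the priority order matters.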
5.3.1 The Importance of Algorithms of the Presented Rules
The automatic generation of an electronic Pitman’s Shorthand lexicon is, in fact, the
replication of a human’s power of recalling a set of writing rules and producing shorthand
outlines for corresponding English words. This process seems trivial; however, the
fidelity with which the system replicates the ability of a human writer depends entirely on
the quality of the rules implemented in it. In reality, the implementation of these rules is
deeply complex, as it involves the consideration of several indefinite factors, such as the use
of different notations for the same phoneme, depending on the conformability of pen
movements; the use of various notations for the same pronunciation, depending on whether
the phonemes appear at the beginning, in the middle or at the end of a word; and so on. On
the whole, each rule replicates corresponding criteria on which stenographers base their
[Figure 5.4 data flow: the input phonemes /F Ā M Ŭ S/ (the word "famous") pass through Rule 1, Rule 2, ..., Rule 17, Rules 18 to 32 and Rules 45 to 46, producing the primitive output 2+5+16+91+92.]
ability to produce shorthand outlines, and it is important to clarify the concept behind each
rule to enable the reader to understand how the complex writing rules of Pitman’s Shorthand
are embedded in the system of this research.
5.3.2 Description of Rules
Table 5-2: Summary of the 46 rules applied in the creation of the Pitman's Shorthand lexicon

Rule   Description
1      Verification of a vocalised outline
2      Diphthong U
3      CON or COM at the beginning of a word
4      WH
5      Negative prefix IL-, IM-, IN-, IR-, UN-
6      PL, BL, TL, DL, CHL, JL, KL, GL, used consonantly at the beginning, in the middle or at the end of a word
7      FL, VL, ThL, ML, NL, SHL, used consonantly at the beginning of a word
8      SPR, STR, SKR at the beginning of a word
9      STER in the middle or at the end of a word
10     CON, COM, CUM, COG in the middle of a word
11     SES, SEZ, ZES, ZEZ at the end of a word
12     Past tense ED
13     ST at the beginning, in the middle or at the end of a word
14     Half length stroke for one syllable words
15     ING
16     INGS
17     Suffix -SHIP
18     S or Z stroke
19     Suffix -MENT
20     Suffix -MENTAL
21     Suffix -MENTALLY
22     Double length stroke
23     MD, ND
24     FR, VR, Thr, THR, SHR, ZHR, MR, NR, used consonantly at the beginning of a word
25+26  PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR, SHR, ZHR, MR, NR, used syllabically at the beginning, in the middle or at the end of a word
27     SKR, SGR
28     KW, GW
29     PL, BL, TL, DL, CHL, JL, KL, GL, used syllabically in the middle or at the end of a word
30     FL, VL, THL, ML, NL, SHL, used syllabically in the middle or at the end of a word
31     S followed by H
32     S+vowel+hookR, ST+vowel+hookR
33     Downward L
34     F or V hook at the end of a word
35     F or V hook in the middle of a word
36     SHUN
37     N hook
38     Upward L
39     Half length stroke in a word of two or more syllables
40     Suffix -LY
41     Upward R and downward R
42     Dash H
43     Reversed FR, VR, Thr, THR
44     P, B, T, D, K, G, M, N, NG, F, V, Th, TH, R, CH, JH, SH, S, Z, ZH, H
45     Vowel extraction and appending
46     Vowel conversion
Table 5-2 presents a summary of the forty-six rules with a list of phonemes relating to each
of them. In order to avoid information overload, algorithms of just five rules are presented
in this section, and the remaining rules can be referenced in detail in Appendix A.
In general, the rules are discussed here in three aspects: complexity, objective and strategy.
The complexity of a rule corresponds to either direct conversion or indirect conversion.
Direct conversion directly converts phonemes into geometric features, whereas indirect
conversion performs the unusual conversion of phonemes into geometric features with
respect to the special writing rules of Pitman’s Shorthand, invented for speed improvement
purposes. In addition, the objective of a rule corresponds to the major role of the rule, and
the strategy of a rule corresponds to a programming procedure of the rule.
Description of the 3rd Rule (CON and COM at the beginning of a word)
Complexity: indirect conversion
Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 5.5.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
Figure 5.5: Illustration of the use of a dot primitive for the sound COM at the beginning of a word
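The strategy of the 3rd rule can be sketched directly; the sound-group segmentation and the "dot" primitive code below are illustrative assumptions.

```python
# Sketch of the 3rd rule: CON or COM at the beginning of a word becomes
# a dot primitive, unless the word ends in ING, S, Z, T or D.
BLOCKED_ENDINGS = ("ING", "S", "Z", "T", "D")

def apply_rule_3(sounds):
    """sounds: list of sound groups, e.g. ["COM", "M", "ENS"] for "commence"."""
    if sounds and sounds[0] in ("CON", "COM") and sounds[-1] not in BLOCKED_ENDINGS:
        return ["dot"] + sounds[1:]
    return sounds

print(apply_rule_3(["COM", "M", "ENS"]))  # -> ['dot', 'M', 'ENS']
print(apply_rule_3(["CON", "S"]))         # -> ['CON', 'S'] (blocked ending)
```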
Description of the 5th Rule (Negative prefix IL-, IM-, IN-, IR-, UN-)
Complexity: indirect conversion
Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitman’s Shorthand outline containing
the prefix IR- is illustrated in Figure 5.6.
Strategy:
1. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
2. Check whether the word representation of an input matches any element of the list.
3. If it does and the prefix is IL-, convert the sound IL- into an upward stroke L,
followed by a dot primitive and another extra upward stroke L.
4. If it does and the prefix is IM-, convert the sound IM- into a curve M, followed by a
dot primitive and another extra curve M.
5. If it does and the prefix is IR-, convert the sound IR- into a downward curve R,
followed by a dot primitive and another extra downward curve R.
6. If it does and the prefix is IN-, convert the sound IN- into a curve N, followed by a
dot primitive and another extra curve N.
7. If it does and the prefix is UN-, convert the sound UN- into a curve N, followed by a
dash primitive and another extra curve N.
[Figure 5.5 content: the word "commence", pronounced /K Ŏ M Ě N S/, is written with the Pitman's Shorthand notations /COM/ /M/ /NS/ /Ě/.]
In addition, the 5th rule states that a consonant /D/ following the prefix IN- and UN- is not
allowed to be omitted. This is to avoid a conflict with the ND writing rule of Pitman’s
Shorthand, in which the consonant /D/ following /N/ is omitted. Details of the ND rule can
be found in Appendix B.
Figure 5.6: Illustration of the use of negative prefix IR- in a vocalised outline
Description of the 6th Rule (PL, BL, ..., GL, used consonantly at the beginning, in the middle
or at the end of a word)
Complexity: indirect conversion
Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it does not immediately follow /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitman’s Shorthand outline containing the sound /P L/ at the
beginning of a word is illustrated in Figure 5.7.
Strategy:
1. If /N/ comes before /T L/ or /D L/, hook L is not used.
2. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used;
[Figure 5.6 content: the word "irregular", pronounced / Ĭ R Ě G Y U L Ă/, is written with the notations /R/ /R/ /G/ /L/ /R/ and the vowels /Ĭ/ /Ě/ /Ă/ /U/.]
3. Otherwise, replace the phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and
/G L/ of an input with a, b, c, d, e, f, g and h respectively, where
a = hook + P stroke
b = hook + B stroke
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
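The substitution step of the 6th rule maps naturally onto a dictionary lookup; in this sketch the guard conditions of steps 1 and 2 are collapsed into a single hook_allowed flag, and the primitive names are assumed.

```python
# Sketch of the 6th rule: pairs such as /P L/ are replaced by a small
# hook followed by the matching consonant stroke. The /N/-before-/TL,DL/
# and same-syllable guards (steps 1 and 2) are reduced to hook_allowed.
HOOK_L = {
    "P L": "hook+P", "B L": "hook+B", "T L": "hook+T", "D L": "hook+D",
    "CH L": "hook+CH", "J L": "hook+J", "K L": "hook+K", "G L": "hook+G",
}

def apply_rule_6(phonemes, hook_allowed=True):
    if not hook_allowed:  # guard conditions failed: keep the curve L form
        return phonemes
    for pair, primitives in HOOK_L.items():
        phonemes = phonemes.replace(pair, primitives)
    return phonemes

print(apply_rule_6("P L EY"))  # -> hook+P EY  (the word "play")
```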
Figure 5.7: Illustration of the use of PL hook in a vocalised outline
Description of the 14th rule (Half length stroke for one syllable words)
Complexity: indirect conversion
Objective: to omit /T/ or /D/ at the end of one syllable words. This relates to the half-length
writing rule of Pitman’s Shorthand and a sample (one-syllable) half-length outline is
illustrated in Figure 5.8.
Strategy: if a word is a one-syllable word and if there are consonants in the word other than
just /R/ and /T/ or /T/, then the /T/ or /D/ at the end of the word is omitted, provided that /T/
does not follow a voiced consonant and /D/ does not follow an unvoiced consonant.
[Figure 5.7 content: the word "play", pronounced /P L Ā/, is written with the notations /P/ /L/ /PL/ /Ā/.]
Figure 5.8: Illustration of a one syllable half-length outline
Description of the 22nd rule (Double length stroke)
Complexity: indirect conversion
Objective: to omit the syllables -TER, -DER, -THER and -TURE of a word according to the
double-length rule of Pitman's Shorthand (the rule is described in Appendix B). A sample
Pitman's Shorthand outline containing the syllable -TER is
illustrated in Figure 5.9.
Strategy: if an input contains the syllable /TER/, /DER/, /THER/ or /TURE/ in the middle or
at the end, and if the syllable is not surrounded by incompatible neighbouring primitives, the
syllable is removed from the input phonemes. Samples of incompatible neighbouring
primitives are given in Figure 5.10.
Figure 5.9: Illustration of the omission of the syllable –TER in a vocalised outline
[Figure 5.9 content: the word "after", pronounced / Ă F T E R/, is written with the notations /F+TER/ /F/ /Ă/.]
[Figure 5.8 content: the word "coat", pronounced /K Ō T/, is written with the notations /K/ /T/ /Ō/.]
Figure 5.10: illustration of incompatible primitive pairs for doubling
5.4 Experimental Results
Experiments carried out in this chapter are categorised into two main studies: firstly, an
analysis of the accuracy of the novel machine-readable Pitman's Shorthand lexicon and,
secondly, an analysis of the distribution of homophones (outlines which look similar but
have different representations) in the novel lexicon.
5.4.1 Data Set
In order to analyse the accuracy of the machine-readable Pitman's Shorthand lexicon, 1253
commonly used English words are chosen, covering the whole range of the writing rules of
Pitman's Shorthand except the rules for currency notation and punctuation.
In order to analyse the distribution of homophones in the machine-readable Pitman's
Shorthand lexicon, the 5000 most frequently used English words, extracted from the Brown
Corpus, are used. Based on this word list, a hash table is created with a series of primitives
as the key for each group of words, where a primitive key is automatically generated by the
transcription engine from the phonetic representation of a word. A pictorial representation
of the electronic Pitman's Shorthand lexicon is presented in Figure 5.11.
[Figure 5.10 content: primitive pairs that cannot be represented by doubling: /F K/, /V K/, /F G/, /V G/.]
[Figure 5.11 content: keys (primitive series) each mapping to a group of words such as "May", "Maid", "Made" and "Bat", "Pat", "Bad", "Pad".]
Figure 5.11: Sample entries of a machine-readable Pitman's Shorthand lexicon
5.4.2 Analysis of the Accuracy of a Machine Readable Pitman’s
Shorthand Lexicon
A primary goal of an experiment carried out in this section is to evaluate the accuracy of a
machine-readable Pitman’s Shorthand lexicon, where the formulation of the lexicon
accuracy is defined as:
a = ((t - e) / t) * 100    (5.2)
where a is the accuracy of an electronic Pitman’s Shorthand lexicon, t is the total number of
words included in a testing data set and e is the total number of incorrectly generated words
whose primitive representations do not match with patterns defined in an original Pitman’s
Shorthand dictionary (available in book format).
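Formula 5.2 is easy to check numerically; the error count e = 15 below is not reported directly in the text but is inferred from the stated 98.8% accuracy of the 1253-word data set, so it is an assumption.

```python
# Lexicon accuracy per formula 5.2: a = ((t - e) / t) * 100, where t is
# the total number of test words and e the incorrectly generated words.
def lexicon_accuracy(t, e):
    return (t - e) / t * 100

# Assumed e = 15, consistent with the reported 98.8% accuracy at t = 1253.
print(round(lexicon_accuracy(1253, 15), 1))  # -> 98.8
```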
[Figure 5.12 content: accuracy in percentage (70 to 100%) for lexicons containing 100, 300, 500, 700, 900, 1100 and 1253 words.]
Figure 5.12: Average accuracies of different sizes of machine-readable Pitman's Shorthand lexicons
The graph (Figure 5.12) illustrates the accuracies of machine-readable Pitman's lexicons of
different sizes, where the word list of each lexicon is chosen randomly from a data set of
1253 words without duplicates. The study finds that the accuracy of the 1253-word lexicon
is 98.8% and the average accuracy across lexicons of different sizes is 99%. The average
error rate is approximately 1%, and the errors are categorised into the following four groups.
1. Errors due to an ambiguity of the writing rules of Pitman's Shorthand.
2. Errors due to different phonetic representations in the applied phonetic dictionary.
3. Errors due to derivative or compound words.
4. Errors due to limitations of machine compatible scripts.
In order to clarify the four types of errors, consider the following four examples in which
each example provides a sample erroneous shorthand outline with a corresponding
explanation for each type of error.
Example 1: Errors due to ambiguity of the writing rules of Pitman’s Shorthand
In order to clarify errors due to an ambiguity of the writing rules of Pitman’s Shorthand,
consider one of the rules of Pitman’s Shorthand which reads: “Straight strokes are doubled
in length to represent the sounds of –TER, -DER, -THER, and –TURE when they follow
another stroke” [Oj95]. According to this rule, the transcription engine generates a
shorthand representation for the word "weather" as in Figure 5.13(a), in which the sound
-THER is added via a double-length stroke. However, the typical Pitman's Shorthand
dictionary (available in book format) defines the word “weather” as in Figure 5.13 (b) in
which the sound –THER is not written according to the double-length rule. The study finds
that this is because the typical Pitman’s Shorthand lexicon applies another rule of Pitman’s
Shorthand which reads: “A straight stroke is not doubled if the doubling would produce two
strokes of unequal length without an angle" [Oj95]. To determine whether the word
"weather" relates to the first rule or the second rule, consider two other outlines (Figure
5.14), defined in the typical Pitman's Shorthand dictionary. Between the two words,
the typical dictionary defines that doubling is not allowed for the word “factor” as the curve
before the straight stroke will produce two strokes of unequal length if the straight stroke is
doubled (case a); however, it defines that doubling is allowed for the word “further” since
the word complies with the double length rule of Pitman’s Shorthand (case b). On the
whole, the transcription engine assumes that the word “weather” belongs to the case (b)
rather than case (a) since doubling does not produce two strokes of unequal length if the
straight stroke is doubled to add the sound –THER. As a result, a shorthand outline for the
word “weather” generated by the transcription engine is different from the one defined in the
typical Pitman’s Shorthand dictionary and hence the error. Overall, a primary cause of error
in this case is due to ambiguity of the rules of Pitman’s Shorthand.
Figure 5.13: Two different outlines for the word “weather”; (a) the word “weather” is written according to the double-length rule of Pitman’s Shorthand; (b) the word “weather” is not written according to the double-length rule of Pitman’s Shorthand
Figure 5.14: (a) Shorthand outline for the word “factor”; (b) shorthand outline for the word “further”
Example 2: Errors due to different phonetic representations of an applied phonetic
dictionary
In order to clarify errors due to different phonetic representations of a phonetic dictionary,
consider a phonetic representation of the word “union”. According to the applied phonetic
dictionary of this research (i.e., CMU phonetic dictionary), the word “union” is composed as
/Y Ō N Y Ĭ N/, whereas the word is composed as /Y Ō N Ĭ N/ according to the typical
Pitman’s Shorthand dictionary (available in book format). Note that there is an extra
phoneme /Y/ in the first composition and due to this difference, a shorthand representation
of the word “union” defined by the transcription engine is different from the one defined by
the typical Pitman’s Shorthand dictionary as illustrated (Figure 5.15). In general, Pitman’s
Shorthand outlines are generated according to phonetic representations of an applied
dictionary and therefore accuracy of the phonetic dictionary is critical in this research.
Figure 5.15: Two different shorthand outlines for the word “union”
Example 3: Errors due to derivative or compound words
In order to clarify errors due to derivative or compound words, consider the compound word
“landlord”. According to the typical Pitman’s Shorthand dictionary (available in book
format), the word “landlord” is regarded as a composition of the two words “land” and
“lord”; however, according to the transcription engine, the word “landlord” is regarded as a
single word. As a result, shorthand outline representations for the word
“landlord”, generated by the transcription engine and the typical Pitman’s Shorthand
dictionary become different as illustrated (Figure 5.16), mainly due to one of the rules of
Pitman’s Shorthand that reads: “a small hook written inside the end of a curved stroke adds
the final sound N” [Oj95]. Since the transcription engine does not regard that the phoneme
/N/ at the end of the word “land” is a final phoneme, the N hook is not applied for the word
“landlord” and hence the error. In general, efficiency of the transcription engine in
identifying any derivatives and compound words relies on the information available in an
applied phonetic dictionary.
Figure 5.16: Two different outlines for the word “landlord”
Example 4: Errors due to the limitation of machine compatible scripts
In order to clarify errors due to the limitations of machine compatible scripts, consider a
shorthand representation for the word “environment”, which is defined either as Figure 5.17
(a) or Figure 5.17 (b) by the typical Pitman’s Shorthand dictionary. Note that both of the
scripts are valid in this case according to the rule of Pitman’s Shorthand that reads: “the
suffix –MENT is represented by or , whichever is convenient.” [Oj95] In order to
reduce the ambiguity for computer aided transcription, the transcription engine restricts
the writing of the suffix –MENT to one form only, and the inability to generate an
alternative form is counted as an error in the current research.
Figure 5.17: Two different outlines for the word “environment”
According to Figure 5.18, the distribution of the four error categories is: 13% errors due to
ambiguity of the writing rules of Pitman’s Shorthand; 27% errors due to different phonetic
representations of an applied phonetic dictionary; 20% errors due to derivative or compound
words; and 40% errors due to limitations of machine compatible scripts.
Figure 5.18: The distribution of different categories of errors in different sizes of electronic Pitman’s Shorthand lexicon
On the whole, the graph (Figure 5.18) illustrates the distribution of the four types of
errors discovered in the current experiment, where the largest category of error (40%) is
due to the limitation of machine compatible scripts.
5.4.3 Analysis of the Distribution of Homophones in Machine-readable
Pitman’s Shorthand Lexicons
A primary goal of the experiment carried out in this section is to estimate the average
number of candidate words (homophones) mapping to each key of a machine-readable
Pitman’s Shorthand lexicon and to evaluate the distribution of homophones in lexicons of
different sizes.
Figure 5.19: The distribution of uniqueness of the electronic Pitman’s Shorthand lexicons, plotted as the percentage of unique outlines against lexicon sizes from 100 to 5000 words under three conditions: in the presence of clear line thickness and complete vowel information; in the presence of line thickness ambiguity; and in the presence of vowel ambiguity
Figure 5.19 illustrates the results of experiments carried out on electronic Pitman’s
Shorthand lexicons of up to 5000 words. The X-axis of the graph represents the lexicon
size, with words sorted according to their frequency of usage in each lexicon. The Y-axis
represents the uniqueness of an electronic Pitman’s Shorthand lexicon, where the
uniqueness is defined as follows:
u = (a / t) × 100    (5.3)
where u is the uniqueness of a lexicon, t is the total number of keys contained in the
lexicon, and a is the total number of keys having one and only one corresponding English
word in the lexicon.
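Equation (5.3) can be sketched in code as follows. This is a minimal Python illustration; the dictionary `demo`, its keys and its candidate word lists are hypothetical examples rather than entries of the actual lexicon:

```python
def uniqueness(lexicon):
    """Uniqueness u = (a / t) * 100 from equation (5.3).

    `lexicon` maps a shorthand key to its list of candidate English words.
    """
    t = len(lexicon)  # total number of keys in the lexicon
    # keys with one and only one corresponding English word
    a = sum(1 for words in lexicon.values() if len(words) == 1)
    return a / t * 100

# Hypothetical lexicon: 3 of the 4 keys are unambiguous.
demo = {
    "key1": ["weather"],
    "key2": ["factor"],
    "key3": ["union", "onion"],
    "key4": ["further"],
}
```

With `demo`, the uniqueness is 75%, since three of the four keys map to exactly one word.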
The first test (Figure 5.19) illustrates uniqueness of lexicons in the presence of clear
distinction between line thicknesses as well as in the presence of complete vowel
information. The study finds that uniqueness of the lexicon of 5000 most frequently used
English words is 96%. The maximum ambiguity is 4 candidate words per key and an
average ambiguity is 1.02 potential words per key.
The second test (Figure 5.19) illustrates uniqueness of lexicons in the presence of line
thickness ambiguity. According to experimental results, uniqueness of the lexicon of 5000
words drops by about 5% if there is no distinction between thick and thin strokes. The
maximum ambiguity is 5 candidate words per key and an average ambiguity is 1.05 potential
words per key.
Finally, the third test (Figure 5.19) illustrates uniqueness of lexicons in the presence of
vowel ambiguity. The study finds that uniqueness of the lexicon of 5000 words is
approximately 71% when vowel primitives are not included in the keys of a lexicon. The
maximum ambiguity is 15 candidate words per key and an average ambiguity is 1.22
potential words per key.
5.5 Discussion
On the whole, this chapter presents the creation of a novel machine-readable Pitman’s
Shorthand lexicon in order to facilitate the direct translation of geometrical features of
shorthand outlines into English words. Experimental results present accuracies of different
sizes of electronic Pitman’s Shorthand lexicon as well as the distribution of homophones in
the novel lexicon.
In relation to the accuracies of the electronic Pitman’s Shorthand lexicons, an average
accuracy of 99% is satisfactory; however, further correction of the errors caused by the
applied phonetic dictionary can be achieved with the use of a more appropriate dictionary.
As for the correction of errors relating to the rules of Pitman’s Shorthand, the use of
machine compatible scripts is recommended, although this requires further assessment of
user acceptability.
In relation to the analysis of the uniqueness of an electronic Pitman’s Shorthand lexicon,
the experimental results can be used to decide which type of electronic Pitman’s Shorthand
lexicon is appropriate for the computer aided transcription of handwritten Pitman’s
Shorthand. According to the experimental results, a lexicon with low uniqueness is more
robust against line thickness ambiguity or vowel ambiguity, whereas the lexicon with the
highest uniqueness is the least robust against natural ambiguity. Taking into consideration
the impracticability of restricting natural ambiguity in handwriting recognition, it is
recommended that either a lexicon with line thickness ambiguity, a lexicon with vowel
ambiguity, or a combination of both be used for the real time transcription of handwritten
Pitman’s Shorthand.
6 Phrase level transcription of online handwritten
Pitman’s Shorthand outlines
Chapter 6 Introduction
This chapter focuses on solutions to the phrase level recognition of online handwritten
Pitman’s Shorthand outlines. The primary aims of this chapter are, first, to investigate a
contextual method that can effectively reduce the homophone ambiguities appearing in the
result list of a corresponding handwritten Pitman’s Shorthand outline; and second, to
propose a phrase level recognition framework that produces the most likely word sequence
for a written phrase using the Viterbi algorithm.
Unlike the research carried out in the previous chapters of the thesis, each of which
pursued the single goal of finding a novel solution to a given problem, the research in this
chapter is carried out with multiple goals: to investigate conceptual algorithms for
implementing a handwritten Pitman’s Shorthand phrase recogniser, and to consider the
possibility of applying existing Application Program Interfaces (APIs) [Mic04] to the
problem of handwritten Pitman’s Shorthand phrase recognition. A major bottleneck of the
integration is gaining access to the APIs’ hidden functions so that the Pitman’s Shorthand
recogniser’s candidate English words can be input into the APIs.
This chapter presents detailed studies carried out to meet the main objectives mentioned
above, and it is categorised into the following four sections:
- Contextual rejection strategy: presents an effective novel word rejection strategy,
implemented using the critical contextual knowledge that people use when reading
handwritten shorthand notes.
- Phrase level recognition algorithm: proposes a conceptual solution for finding the most
likely word sequences for a handwritten Pitman’s Shorthand phrase with the use of
the Viterbi algorithm and statistical language modelling techniques.
- Integration with APIs: discusses major difficulties in the integration of a Pitman’s
Shorthand phrase recogniser with APIs of the Microsoft Tablet PC Platform
Software Development Kit [Tab04], and highlights the potential benefits of
successfully integrating the two components.
- Experimental result: evaluates the performance of the new contextual rejection
strategy proposed in this chapter.
6.1 Contextual Rejection Strategy
Chapter 4 mentioned that the major factor preventing a correct word from appearing in the
topmost position of a result list for a given Pitman’s Shorthand outline is the similarity of
input outlines to other outlines. This research denotes this problem as the homophone
ambiguity, and further resolution to this problem is discussed in relation to the word
rejection strategies here.
Several word rejection strategies [GKM+97], [PP02], [MAG+02] have been applied in the
field of handwritten word recognition. Their reliability is related to their capability not to
accept false word hypotheses and not to reject true word hypotheses [Ka04]. Common
rejection thresholds reported in the literature are the class-threshold (e.g., [QAC05], which
rejects words according to their grammatical nature), the domain-threshold (e.g., [NB02],
which rejects words according to a user domain), the lexicon-threshold (e.g., [ESS+98],
which rejects words according to a lexicon’s confidence score) and the recogniser-threshold
(e.g., [PP02], which rejects words according to the confidence scores produced by a Hidden
Markov Model-based on-line handwriting recogniser).
In addition to the rejection strategies mentioned above, a critical contextual knowledge that
needs to be put into practice for rejecting homophones of handwritten Pitman’s Shorthand
outlines is the shorthand outlines’ position. In Pitman’s Shorthand, the outlines’ correct
positioning is highly critical [Oj95], as it provides a primary clue for retrieving vowel
information even though vowels are omitted in an outline. In general, an outline’s position is
determined by the first pen-stroke in Pitman’s Shorthand, such that if an outline’s first stroke
is written above the writing line, it is considered to be written in the first position; if the first
stroke is written on the writing line, it is considered to be written in the second position; and
if the first stroke is written through the writing line, it is considered to be written in the third
position. Samples of three Pitman’s Shorthand outlines written in three different positions
are illustrated in Figure 6.1(a). As illustrated in Figure 6.1(b), although the three
outlines comprise exactly the same consonant stroke, their corresponding English words can
be easily identified by the differences between the outlines’ positions.
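The positioning rule described above can be sketched as follows. This is a minimal Python illustration under assumed screen coordinates (y increasing downwards); the function name, baseline value and tolerance are hypothetical choices, not part of the actual recogniser:

```python
def outline_position(stroke_ys, baseline, tol=2.0):
    """Classify an outline's written position from its first pen-stroke.

    `stroke_ys` holds the y-coordinates of the first stroke (y grows
    downwards), `baseline` is the writing line's y value and `tol` is an
    assumed tolerance.  Returns 1, 2 or 3 following the rule in the text.
    """
    bottom = max(stroke_ys)  # lowest point of the first stroke
    if bottom < baseline - tol:
        return 1  # first position: stroke written above the writing line
    if bottom <= baseline + tol:
        return 2  # second position: stroke resting on the writing line
    return 3      # third position: stroke written through the writing line
```

For instance, with a writing line at y = 50, a stroke spanning y = 10–20 would be classified as first position, y = 30–50 as second, and y = 40–70 as third.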
Figure 6.1: Samples of Pitman’s Shorthand outlines written in three different positions; (a) outlines written including vowel notations, (b) outlines written without vowel notations
Overall, stenographers apply the outlines’ position as a primary clue to identify the most
relevant words for a particular shorthand outline. However, this knowledge has never been
applied to solve the problem of homophone ambiguity in machine-based transcriptions.
Based on this observation, the contextual rejection strategy proposed in this chapter is
defined as:
W P(k) = WP(i) \ WP(i ≠ k) (6.1)
where WP(i) is a list of candidate words for an input shorthand outline (written in different
positions) and k is an input outline’s written position, which is 1 for the first position, 2 for
the second position and 3 for the third position. In order to clarify the algorithm, consider
Example 1 given below.
Example 1
Assuming that k = 1, W = {W1, W2, W3, W4, W5, W6} is a set of candidate words for a given
shorthand outline and P(i) of W1, W2, W3, W4, W5, W6 are 2, 1, 1, 3, 2 and 1 respectively,
then W P(k) is calculated as:
W P(k) = WP(i) \ WP(i ≠ k)
W P(k) = { W1, W2, W3, W4, W5, W6} \ { W1, W4, W5}
= {W2, W3, W6}
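Equation (6.1) and Example 1 can be sketched in code as follows. This is a minimal Python illustration in which the candidate list `W` and position map `P` reproduce the example's data, while the function name and data structures are hypothetical:

```python
def reject_by_position(candidates, positions, k):
    """Equation (6.1): remove candidate words whose written position
    differs from the input outline's position k (1, 2 or 3)."""
    return [w for w in candidates if positions[w] == k]

# Data from Example 1 in the text.
W = ["W1", "W2", "W3", "W4", "W5", "W6"]
P = {"W1": 2, "W2": 1, "W3": 1, "W4": 3, "W5": 2, "W6": 1}
```

For k = 1 this keeps W2, W3 and W6, matching the result of Example 1.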
6.2 Handwritten Pitman’s Shorthand Phrase Recognition
Figure 6.2: Illustration of the handwritten Pitman’s Shorthand phrase level transcription process
The structure of the phrase level transcription process for handwritten Pitman’s Shorthand
is illustrated in Figure 6.2, which shows outlines passing through the short-form and
Bayesian Network based vocalised outline interpreters, the ordered word lists, the Pitman’s
Shorthand specific contextual word rejection, the filtered word lists, and finally the word
graph and language model that produce the output phrase (e.g., “The cat sat on the mat.”).
The framework is based on the architecture of the online
handwritten English sentence recognition [QAC05]. In brief, every handwritten Pitman’s
Shorthand outline is given as an input to the short-form interpreter or the Bayesian Network
based vocalised outline interpreter, and a ranked list of candidate words for each outline is
produced at the end of the word recognition process. The candidate words are then validated
by the contextual rejection strategy, which removes irrelevant words from the candidate lists
before forwarding them to the phrase level recogniser. The phrase level recogniser then
creates a word graph with the incoming word lists such that each node represents a candidate
word’s likelihood, and each edge represents the transitional probability (i.e., language model
probability) between node n1 and node n2. The phrase level recogniser then finds the most
likely sequence of words for a given input phrase by applying the Viterbi algorithm to the
word graph.
Based on the algorithm defined in the online handwritten English sentence recognition
[QAC05], the most likely sequence of words Ŵ for a handwritten Pitman’s Shorthand
phrase is defined as:
Ŵ = argmax_W p(P|W) p(W)    (6.2)
where W is a candidate word sequence, P is the handwritten Pitman’s Shorthand phrase,
p(P|W) is the probability of the handwritten phrase P conditioned on the given sequence of
words W, and p(W) is the prior probability of the sequence W.
In detail, p(P|W) is evaluated by the confidence score of the Bayesian Network based word
interpreter and p(W) is given by a statistical language model. In other words, the efficiency
of finding the best sequence of words for a given input phrase depends on the confidence
score of the handwritten word recogniser plus the confidence score of the applied statistical
language model.
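The search for Ŵ can be sketched as a Viterbi pass over the word graph. This is a minimal Python illustration, not the actual implementation; the emission scores and bigram probabilities shown are hypothetical toy values:

```python
import math

def viterbi_phrase(word_lists, emit, bigram, floor=1e-9):
    """Most likely word sequence through per-outline candidate lists.

    `emit[w]` stands in for the word recogniser's confidence p(outline|w)
    and `bigram[(u, w)]` for the language model's p(w|u); unseen bigrams
    fall back to `floor`.  Scores are combined in log space.
    """
    scores = {w: math.log(emit[w]) for w in word_lists[0]}
    back = [{w: None for w in word_lists[0]}]
    for layer in word_lists[1:]:
        new_scores, new_back = {}, {}
        for w in layer:
            # best predecessor for w under recogniser + language model scores
            prev, s = max(
                ((u, scores[u] + math.log(bigram.get((u, w), floor)))
                 for u in scores),
                key=lambda t: t[1],
            )
            new_scores[w] = s + math.log(emit[w])
            new_back[w] = prev
        scores, back = new_scores, back + [new_back]
    best = max(scores, key=scores.get)
    path = [best]
    for layer in reversed(back[1:]):  # trace the best path backwards
        path.insert(0, layer[path[0]])
    return path

# Toy phrase of two outlines with hypothetical scores.
lists = [["the", "a"], ["cat", "bat"]]
emit = {"the": 0.6, "a": 0.4, "cat": 0.5, "bat": 0.5}
lm = {("the", "cat"): 0.5, ("the", "bat"): 0.1,
      ("a", "cat"): 0.1, ("a", "bat"): 0.1}
```

Each node's score combines the recogniser confidence with the language model transition, mirroring the product p(P|W) p(W) of equation (6.2) in log space.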
This chapter focuses on the statistical language models’ impact on phrase level recognition,
because a language model’s quality can directly affect the overall word recognition
accuracy. For instance, [MB01] showed that a bi-gram language model outperforms a
unigram language model in offline handwritten sentence recognition. Similarly, work by
[ZB04] showed that a tri-gram model increases word recognition accuracy by 6.8%
compared to a bi-gram model. Again, work by [QAC05] showed that using bi-gram and
tri-gram models for online handwritten sentence recognition results in only a 0.1%
difference in word recognition accuracy. These findings show that it is critical to choose
an appropriate statistical language model in order to obtain a promising overall result.
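A statistical language model of the kind compared above can be sketched as follows. This is a minimal Python illustration of an unsmoothed bi-gram model estimated from a hypothetical toy corpus; real systems add smoothing and train on far larger data:

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood bi-gram estimates p(w|u) = c(u, w) / c(u)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        for u, w in zip(words, words[1:]):
            unigrams[u] += 1      # count of u as a bigram's first word
            bigrams[(u, w)] += 1  # count of the pair (u, w)
    return {(u, w): c / unigrams[u] for (u, w), c in bigrams.items()}

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
```

Here p(cat|the) = 1.0 and p(sat|cat) = 0.5; such conditional probabilities supply the edge weights of the word graph.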
Considering a statistical language model’s quality, this chapter proposes the use of the
statistical language model embedded in the Microsoft handwriting recognition APIs [Mic04],
which has been thoroughly trained with millions of words, dictionaries and grammars of
various languages for the development of a commercially viable system.
6.3 The Integration of a Pitman’s Shorthand Phrase Recogniser
with APIs
This section presents a feasibility study of the integration of a Pitman’s Shorthand phrase
recogniser with Microsoft handwriting recognition APIs, in order to take advantage of
existing statistical language models embedded in the APIs. In order to discuss the specific
API relating to this study, consider the object model illustrated in Figure 6.3.
Figure 6.3: An abstract view of the object model “Microsoft.Ink”
Figure 6.3 illustrates an object model for the class “Microsoft.Ink”, whose child objects
(e.g., Recognizer, RecognizerContext, WordList, RecognizerGuide, RecognitionResult,
RecognitionAlternates, Strokes, Stroke and Gesture) facilitate automatic handwriting
recognition. A specific object relating to the current study is the “RecogniserContext”
object, which enables ink recognition and the retrieval of the recognition result and its
alternatives.
In order to clarify the “RecogniserContext’s” efficiency, consider the examples illustrated in
Figure 6.4. Figure 6.4 (a) illustrates the recognition results for a written phrase produced by
the “RecogniserContext” API, Figure 6.4 (b) illustrates the change in recognition results
upon a new word’s arrival, Figure 6.4 (c) illustrates the change in the recognition results
when the API’s context is limited to a full file path’s name, and Figure 6.4 (d) illustrates the
change in recognition results when the API’s context is limited to an e-mail address’
username.
Figure 6.4: Screen shots of the recognition results produced by the “RecogniserContext” API
In total, approximately 40 kinds of input scopes can be defined in relation to the API’s
context. A major bottleneck in integrating the handwritten Pitman’s Shorthand recogniser
into this powerful API is the API’s lack of a function that takes a written phrase’s
candidate words as inputs and produces a ranked list of candidate phrases as an output.
Overall, the investigation of a solution to provide this function would be rewarding and
is open to future work of this research.
6.4 Experimental Results
A small experiment is carried out to evaluate the performance of the contextual rejection
strategy proposed in this chapter. The data set includes 700 phrases,
which were automatically generated using the word lists gained from the experiments of the
Bayesian Network based word transcription (Chapter 4). A primary goal of this experiment
is to analyse the accuracy of irrelevant words’ removal from the result lists based on the
shorthand outlines’ position information, which defines rejection accuracy as:
a = 100, if the correct word appears in the result list after applying the rejection strategy; a = 0, otherwise.    (6.3)
Figure 6.5: Performance of the contextual rejection strategy
The rejection accuracy of 700 phrases is illustrated in Figure 6.5. The study finds that the
contextual rejection strategy correctly filtered 98% of the phrases, and that inaccurate
position writing, practised primarily by inexperienced writers, caused the 2% error rate. The
findings show that the contextual rejection strategy proposed in this chapter is highly reliable
in conjunction with accurate position writing.
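The per-phrase scoring of equation (6.3) and the overall figure reported above can be sketched as follows. This is a minimal Python illustration, and the boolean result list is a hypothetical structure rather than the actual experimental data:

```python
def rejection_accuracy(results):
    """Mean of the per-phrase scores defined by equation (6.3).

    `results` holds one boolean per phrase: True if the correct word is
    still present in the result list after applying the rejection strategy.
    """
    scores = [100 if ok else 0 for ok in results]  # a = 100 or a = 0
    return sum(scores) / len(scores)
```

For example, 686 correctly filtered phrases out of 700 would give an overall accuracy of 98%.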
6.5 Discussion
This chapter presents Pitman’s Shorthand specific contextual knowledge to reduce
handwritten Pitman’s Shorthand’s homophone ambiguities. Theoretical algorithms to
resolve the problem of handwritten Pitman’s Shorthand phrase recognition are proposed
with the use of the Viterbi algorithm and language models. In relation to the use of
adequate statistical language models to enhance the phrase level recognition performance,
a feasibility study of integrating the phrase level recogniser with existing handwriting
recognition APIs is carried out. The study highlights the APIs’ efficiency and proposes the
potential benefits of successfully integrating the two components.
Overall, this chapter has addressed solutions to the online handwritten Pitman’s Shorthand
phrase recognition problem; however, the framework is not fully implemented in this
research, mainly because research into this problem is no longer new and established
frameworks are already available in the market. Compared to phrase level transcription,
the investigation of novel solutions to the word level transcription problems of
handwritten Pitman’s Shorthand has received greater emphasis in this research, as the
state of the art of the latter needs more extensive research in order to produce a
commercially viable handwritten Pitman’s Shorthand recogniser.
7 Graphical User Interfaces of the Handwritten Pitman’s
Shorthand Recognition System
Chapter 7 Introduction
Previous chapters presented full details of the back-end interpretation of handwritten
Pitman’s Shorthand outlines, whereas this chapter presents the research and development of
front-end user interfaces, via which users of the handwritten Pitman’s Shorthand recogniser
interact with the back-end programs. The primary objective of the chapter is to demonstrate
the commercial viability of the end result of this research with a series of well-designed
prototypes. Figure 7.1 depicts the front-end and back-end layers of the system.
Figure 7.1: Front-end and back-end architecture of the system
The chapter includes six main topics, outlined as follows:
1. Overview: a description of interactions between user interfaces and back-end
engines, including clarification of an applied programming environment for each
front-end and back-end program.
2. Pen based Application Program Interfaces (APIs): a brief description of Microsoft
tablet PC APIs, in particular ink APIs which are primarily used in the development
of Graphical User Interfaces (GUIs) of the system.
3. Training data collection GUIs: description of data collection GUIs with which a
large collection of handwritten Pitman’s Shorthand outlines is collected for training
purposes.
4. Developer GUI: description of a low level parameter setting interface, mainly
implemented for system developers.
5. End-user GUIs: description and comparison of end-user interfaces of this research
which enable handwritten Pitman’s Shorthand data entry into tablet PCs.
6. Experiment: evaluation of user feedback on the presented prototypes, particularly
from the perspective of potential users’ acceptability.
7.1 Overview
Figure 7.2: Illustration of interactions between user interfaces and back-end engines of the system
The figure shows the process flow: the ink collector GUI (Visual C#) saves raw ink
coordinates to data file 1; the collaborator’s recognition engine (Visual C++) converts
them into classified data in data file 2; the transcription engine (Java) produces a ranked
word list (e.g., “play 0.322, bay 0.112, clay 0.001”) in data file 3; and the result GUI
(Visual C#) displays the output. The legend distinguishes file storage, processes, process
flow and read/write access.
A brief overview of the interactions between front-end interfaces and back-end engines is
illustrated in Figure 7.2. As shown, handwritten ink data collected by the ink collector
GUI is first saved in a data file (data file 1 in Figure 7.2). The data file is then
processed by the collaborator’s recognition engine, where a series of ink coordinates is
transformed into a ranked list of words or primitives (data file 2 in Figure 7.2). Then, on
arrival of the classified data, the transcription engine is invoked and produces a ranked
list of n-best words for the corresponding classified data (data file 3 in Figure 7.2).
Once the transcription result is ready, the front-end GUI retrieves the result file and
displays the best text representation for the written outline.
As discussed, the primary means of data transmission between components of the current
system is file access. In this way, the front-end and back-end programs can be developed
independently, without either having to wait for the completion of the other, thereby
making the concurrent development of several components of the system in two countries
productive. In addition, the current system includes more than one programming
environment, and data files are, in fact, the primary media of communication between
programs of the different environments. The graphical user interfaces presented in this
chapter are implemented using tablet PCs with Microsoft Windows XP Tablet PC Edition.
The detailed programming environments included in the current system development are:
- Microsoft Visual C#, used in the development of the front-end user interfaces.
- Microsoft Visual C++, used in the development of the collaborator’s recognition engine.
- Sun Java (J2SDK), used in the development of the transcription algorithms.
7.2 Ink Data Collection in this Research
This section presents an essential description of the ink collection procedure carried out
in this research. The description is linked to the Microsoft Tablet PC platform APIs
[Mic04] so that the (interested) reader can test a simple ink collection program
practically. In other words, the ink collection procedure presented here is applicable not only to the
collection of online handwritten Pitman’s Shorthand data, but also to the collection of any
kind of ink data needed for various purposes.
In general, the Tablet PC platform APIs relating to the ink collection can be divided into
three distinct groups: ink collection APIs, ink data management APIs and ink recognition
APIs. A pictorial presentation of how these APIs work together, at a high level, is provided
at the MSDN library [Abo04] and the illustration is replicated here (Figure 7.3) as a
reference for discussion.
Figure 7.3: Illustration of the Tablet PC platform APIs presented at [Abo04]
According to the pictorial presentation of the Tablet PC platform APIs (Figure 7.3), the ink
collection procedure of this research relates to the utilization of Pen APIs (i.e., the first stage
of Figure 7.3). Here, clarification is made as to why the APIs of the other stages (i.e.,
stages 2, 3 and 4 of Figure 7.3) are not applicable to the current research regardless of
their efficient ink manipulation and recognition capabilities. This is because the Tablet PC platform APIs
support the processing of only fifteen handwritten languages at the time of writing, and
Pitman’s Shorthand is not one of them. In brief, only the ink collection APIs are applicable
to the current research, and the remaining functions of ink manipulation and ink recognition
are covered by the recognition engine and the transcription engine of this research
respectively.
Figure 7.4 depicts the high-level relationship of the object models of the Tablet PC APIs,
in which the hierarchy of the ink collection object, namely "InkCollector", is highlighted.
The primary function of this object is to capture the series of ink coordinates and
timestamps of a pen-stroke. In general, any handwriting data written on a handheld device
with a Microsoft Tablet PC platform can be collected with an "InkCollector" object.
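As an illustration of the kind of data such an object delivers, the sketch below models a pen-stroke as a timestamped point sequence in plain Java. The InkPoint and InkStroke names are hypothetical stand-ins for illustration only; they are not Tablet PC API types.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins for the data an ink collection object yields per
// pen-stroke: a sequence of (x, y) digitiser coordinates, each with a timestamp.
class InkPoint {
    final int x, y;
    final long timestampMs;
    InkPoint(int x, int y, long timestampMs) {
        this.x = x;
        this.y = y;
        this.timestampMs = timestampMs;
    }
}

class InkStroke {
    private final List<InkPoint> points = new ArrayList<>();

    // Called once per packet while the pen is down.
    void addPacket(int x, int y, long timestampMs) {
        points.add(new InkPoint(x, y, timestampMs));
    }

    int size() {
        return points.size();
    }

    // Duration of the stroke from the first to the last packet.
    long durationMs() {
        if (points.isEmpty()) return 0;
        return points.get(points.size() - 1).timestampMs - points.get(0).timestampMs;
    }
}
```

A stroke built from such packets is what the recognition engine later segments and classifies into Pitman's Shorthand primitives.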
Figure 7.4: Illustration of the high-level relationship of object models of the Tablet PC platform APIs
(The figure groups the objects into two areas: ink collection and display, covering InkTablets, IInkTablet, InkCollector, InkDisp, IInkCursors, IInkCursor, InkDrawingAttributes, IInkCursorButtons, IInkCursorButton, INKEDLib, InkEdit, PenInputPanel, InkOverlay, InkPicture, InkRenderer, InkRectangle and InkTransform; and ink data recognition.)
7.3 General Training Data Collection Tool
Figure 7.5: Home page of the training data collector
The interfaces of this research (Figure 7.5 & Figure 7.6) are designed specifically for the
collection of a large amount of handwriting data for system training purposes. They have
served as the primary data collection tool in this research, and they can also be applied as
a general data collection tool for other kinds of handwriting recognition systems.
A primary purpose of the interfaces in this research is to collect and organise training data
effectively, as well as to give volunteers (shorthand writers) a user-friendly experience of
entering Pitman's Shorthand outlines into a tablet PC. It should be mentioned that Pitman's
Shorthand was once widely practised as a speech recording mechanism, but it has more
recently ceased to be a popular writing system. Therefore, volunteers for the
training data collection process can be of various ages and domains. In addition, tablet
PCs are fairly new devices for the general population at the time of writing, and no more
than 20% of the volunteers of this research had previous experience of using pen-based
computers. Taking these factors into account, an important criterion is set for the layout
of the training data collector: the functions of the interfaces should be kept as simple as
possible, and the appearance of the interfaces must be suitable for volunteers of various
ages and domains.
Figure 7.6: Sample data entry page of the training data collector GUI
In general, the training data collector collects two types of data: writer data and ink data.
The writer data is intended for the evaluation of the overall system performance and the ink
data is intended for the training of the transcription engine. As illustrated (Figure 7.5), the
home page of the training data collector collects the following writer information:
Name: intended for automatic naming of training data folders and files.
Gender: intended for evaluating whether the transcription accuracy varies between
female writers and male writers.
Skill in Pitman’s Shorthand: intended for evaluating whether the transcription
accuracy varies depending on the skill of a writer in Pitman’s Shorthand, where the
skill is categorised into three levels: professional, intermediate and inexperienced.
First language of a writer: intended for evaluating whether the transcription
accuracy is influenced by a writer's skill in English pronunciation. Since Pitman's
Shorthand is based on phonetics, non-native English speakers may find pronunciation
more difficult than native speakers and may therefore produce less accurate shorthand
outlines.
Previous experience in pen-based data entry system: intended for evaluating
whether the transcription accuracy is influenced by a writer’s previous experience in
using a pen-based text entry system.
Tidiness of handwriting: intended for evaluating whether the transcription accuracy
is influenced by the tidiness of a user’s handwriting, where the tidiness is
categorised into three levels here: “very neat and tidy”, “legible enough to others but
not very tidy”, and “legible to me but not to others”.
Domain of a writer: intended for evaluating whether the transcription accuracy
varies depending on the change of domains.
Way of writing: intended for evaluating whether the transcription accuracy varies
depending on whether the writer is left-handed or right-handed.
7.4 Developer Graphical User Interface
Figure 7.7: Screen shot of the developer Graphical Interface
The developer GUI (Figure 7.7) provides advanced settings of the system, and its
functions are particularly intended for system developers. Since it is a gateway for
controlling parameters of the recognition and transcription engines, it has been of great
benefit to the current research and development. Moreover, it is also intended to benefit
future system developers whose work builds on this research. The functions included in
this interface are:
A change of lexicon: a file dialogue is provided to specify the location of a new
lexicon. Domain-specific knowledge is critical for the transcription of Pitman's
Shorthand [NB02], and this function therefore enables switching to a lexicon
appropriate for a particular domain.
Definition of training data set: a text area is provided to enter a list of words that are
to be collected for training purposes. Since Pitman's Shorthand is no longer a
popular writing system, databases of handwritten Pitman's Shorthand outlines,
designed for training a handwriting recogniser, are not available at the time
of writing. As a result, the current research includes collection of several training
data sets.
Execution of the recognition engine (classification): a control dialogue is provided
to execute the recognition engine independently. A primary purpose of this function
is to convert a series of ink data files (e.g., data file 1 in Figure 7.2) into a series of
classified data files (e.g., data file 2 in Figure 7.2) in a batch by running the
recognition engine separately. As mentioned, the ink collector and the recognition
engine (Figure 7.2) are capable of running independently, with data files created in
between. During the training data collection of this research, the ink collector was
run separately, mainly to reduce frustration for volunteers who input hundreds of
shorthand outlines into the system in a limited amount of time.
Parameter setting: a control dialogue is provided to adjust the parameters used in the
training of the Bayesian Network based shorthand outline models. This interface is
essential for training the transcription engine, because it is impractical to obtain
multiple training samples for every word of a dictionary, even with the large
collection of training data in this research. With the parameter setting interface, the
Bayesian Network based outline models can be trained with training data, history
data, or both. In addition, the interface enables the specification of a preferred
training data set for training the transcription engine.
7.5 Shorthand Data Entry Graphical User Interfaces
The shorthand data entry GUIs presented in this section are the first graphical user
interfaces ever implemented to facilitate handwritten Pitman's Shorthand entry into tablet
PCs. The interfaces are discussed below in comparison with the collaborator's interfaces,
which were developed concurrently.
Firstly, a screenshot of an end-user interface proposed by the collaborator is illustrated
(Figure 7.8), where box 3 facilitates handwritten ink entry into the system, box 1 provides
segmentation and classification results of a written script, and box 2 provides a list of
n-best words for the written script. Similarly, another (most recent) version of the collaborator's
interface is presented (Figure 7.9) where the components are basically the same as the
previous version (Figure 7.8) with additional writing areas and new features for adjusting the
parameters of the recognition engine. On the whole, the end-user interfaces suggested by the
collaborator are mainly beneficial to system developers since they emphasise the back-end
view of the recognition and transcription engines.
Figure 7.8: The first version of the collaborator’s tablet PC interface for the handwritten Pitman’s Shorthand recognition system
Figure 7.9: The latest version of the collaborator’s tablet PC interface for Pitman’s Shorthand recognition system
Unlike the interfaces developed by the collaborator, end-user interfaces of this thesis put
emphasis on the usability issues including user friendliness, commercial viability and
completeness of the system. From the aspect of user friendliness, the research interfaces are
designed to look similar to a conventional shorthand note-pad. In this way, primary users
(stenographers) of the system are expected to get used to the interfaces quickly, thereby
enabling a short learning curve.
In creating a note-pad like interface, the pen-input area (writing area) becomes a critical
concern, i.e., whether a square writing box should be designed for
the writing of a single word or multiple words. In general, allowing multiple words in a
single writing area suffers from word-boundary ambiguities; on the other hand, providing
N writing areas for N words wastes screen space. In this research, the collaborator's
recognition engine encourages the writing of exactly one word in each writing area in
order to reduce word-boundary ambiguities. As a result, the end-user interfaces of this
research likewise encourage the writing of N words in N writing areas.
Regardless of the use of several writing boxes in the interface, the original goal (i.e., to
create a note pad like interface) is achieved by connecting the boxes with faded borders as
illustrated (Figure 7.10).
In addition, the dimensions of a writing area are discussed in relation to the creation of a
note-pad like interface. The Pitman's Shorthand note-pads commonly used by
stenographers are roughly A5 size (210 mm x 149 mm) with approximately 8 mm line
intervals [Lg90]. Taking the ratio of the size of a note-pad to the size of a tablet PC's 15"
digitiser (e.g., 1024 x 768 pixels), the 8 mm line interval corresponds to approximately
30 pixels. Based on these measurements, each individual writing box of the interface is
set at 100 x 60 pixels with a line interval of 30 pixels. The solution appears practical but
requires further assessment of user acceptability.
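The scaling above can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted in the text (768-pixel screen height, 210 mm pad height); note that the exact result for 8 mm is 29 pixels, i.e. approximately 30 as stated.

```java
// Scale a physical note-pad measurement (in mm) onto the digitiser (in pixels),
// using the ratio of the screen height (768 px) to the note-pad height (210 mm).
class PadScaling {
    static final double SCREEN_HEIGHT_PX = 768.0;
    static final double PAD_HEIGHT_MM = 210.0;

    static long mmToPixels(double mm) {
        // 8 mm * 768 / 210 = 29.26 px, which rounds to 29 (approximately 30).
        return Math.round(mm * SCREEN_HEIGHT_PX / PAD_HEIGHT_MM);
    }
}
```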
From the aspect of commercial viability, the overall presentation of the interface is
designed to look good in addition to functioning reliably. On the whole, users are
provided with a choice of two layouts for interacting with the final interfaces of the
system. The first (Figure 7.10) is designed for rapid note taking and resembles a
conventional shorthand note-pad. The second (Figure 7.11) is designed for general text
entry and resembles the handwriting recogniser of Microsoft Windows XP Tablet PC
Edition.
From the aspect of completeness, the final interfaces of this research act as a gateway to
every component of the system, including a data entry GUI with toolboxes for text editing
and parameter setting, the developer GUI (Figure 7.7), the training data collection GUI
(Figure 7.5), and a back-end view of the system (similar to the one proposed by the
collaborator, Figure 7.9). Despite the involvement of these multiple components, a simple
appearance is achieved by hiding the advanced control components behind show/hide
functions, as illustrated (Figure 7.10 & Figure 7.11).
Figure 7.10: Screenshot of a note-pad layout of the end-user interface of this research
Figure 7.11: Screenshot of an alternative layout of the end-user interface of this research
7.6 Experimental Results
A small experiment is carried out to analyse user feedback on the graphical user
interfaces (GUIs) presented in this chapter. The ultimate goal of the experiment is to
determine the most feasible GUI for the automatic handwritten Pitman's Shorthand
recogniser, i.e., the one that can be presented as a commercially viable prototype.
On the whole, four GUIs are evaluated in the experiment. Two of them were developed
by the collaborating research team and the other two were developed in the research and
development of this thesis. To help the reader easily distinguish the four GUIs, thumbnail
views are presented in Figure 7.12. The experiment was conducted with 20 participants
with different levels of skill in Pitman's Shorthand (from those with no background
knowledge in Pitman's Shorthand up to those with professional-level skill).
Figure 7.12: Thumbnails of the four GUIs evaluated in the experiment
In general, experiments carried out in this chapter are categorised into four groups as
follows:
The general distribution of user fondness for the presented prototypes.
The distribution of user fondness for the presented prototypes in the case of speed
writing.
The distribution of user fondness for the presented prototypes in the case of general
text entry into handheld devices.
The comparison of the most favourite GUI of experienced shorthand writers and that
of novice shorthand writers.
7.6.1 Analysis of the General Distribution of User Fondness for the
Presented Prototypes
Figure 7.13: The general distribution of user fondness for the presented prototypes
Figure 7.13 illustrates the level of user fondness for the four prototypes, where the X-axis
represents the level of user preference for a specific prototype over the others and the
Y-axis represents the percentage of users. The experimental results show that prototype 4
is the most favourite GUI for 60% of users, and prototype 1 is the least favourite GUI for
95% of users.
7.6.2 Analysis of the Distribution of User Fondness for the Presented
Prototypes in the Case of Speed Writing
Figure 7.14: The distribution of user fondness for the presented prototypes in the case of speed writing
Figure 7.14 illustrates the level of user fondness for the four prototypes, especially in
relation to the need for rapid writing, for instance in the real-time recording of speech.
An interesting phenomenon here is that prototype 3 becomes the most preferred GUI,
overtaking prototype 4, which is the most favourite GUI in the general case (Figure 7.13).
This finding shows that the majority of users regard a notepad-like interface as more
appropriate for rapid note-taking.
7.6.3 Analysis of the Distribution of User Fondness for the Presented
Prototypes in the case of a Small Amount of Text Entry into
Handheld Devices
Figure 7.15: The distribution of user fondness for the presented prototypes in the case of a small amount of text entry into handheld devices
Figure 7.15 illustrates the level of user fondness for the four prototypes, particularly in
relation to entering a small amount of textual information into handheld devices, for
instance a person's name into the name field of an address book. In contrast to the case in
Figure 7.14, the study finds that prototype 4 becomes the most favourite GUI, mainly
because of the small amount of screen space taken by the shorthand recogniser while
other applications run at the same time.
7.6.4 The Comparison of the Most Favourite GUI of Experienced
Shorthand Writers and that of Novice Shorthand Writers
Figure 7.16: The comparison of the most favourite GUI of experienced shorthand writers and that of novice shorthand writers
Finally, a comparison of the most favourite GUI of experienced shorthand writers and that of
novice shorthand writers (for the general purpose of use) is given in Figure 7.16. The study
finds that 100% of experienced shorthand writers prefer prototype 3 over the others, whereas
the majority of novice writers (80%) prefer prototype 4 over the others.
7.7 Discussion
This chapter presents the research and development findings for prototypes of the
automatic handwritten Pitman's Shorthand recogniser. It takes a step towards
commercialisation of the product by showing what can be done with these prototypes.
According to the experimental results, prototype 3 and prototype 4, developed by the
research in this thesis, are preferred to the other two prototypes, which were developed by
the collaborating research team. In addition, the study finds that the preference between
prototype 3 and prototype 4 varies depending on the purpose of use. Taking these
findings into consideration, the end-user interface of the system
is finally designed as an integration of prototype 3 and prototype 4, so that users are
provided with a choice of two layouts (i.e., prototype 3 or prototype 4) for interacting
with the automatic handwritten Pitman's Shorthand recogniser on tablet PCs.
8 Conclusion
Chapter 8 Introduction
This chapter presents the summary and conclusions of the research carried out in this
thesis and is divided into the following four sections:
Research work summary: presents a summary of the whole thesis by highlighting
the key objectives of each chapter in combination with an overall evaluation of the
work carried out in each chapter.
Contribution: draws attention to the major contributions made by the research and
development in meeting the overall objectives of the thesis, outlined in Chapter 1.
Future work: presents further research directions that may be taken in order to
improve upon the presented approaches for a commercially viable system.
Dissemination: presents a list of papers (progress reports of the findings of this
research) that have been presented and published in pattern recognition specific
journals and conference proceedings.
8.1 Research Work Summary
The overall aim of the research presented in this thesis was to investigate novel lexicon
organisation and contextual methods that could improve the state of the art in online
handwritten Pitman's Shorthand recognition.
Chapter 1 introduced the research of the thesis by highlighting the motivating need for
new lexical post-processing methods to enhance the quality of text interpretation of
online handwritten Pitman's Shorthand outlines. The chapter also highlighted the need
for a functional, user-friendly graphical user interface that facilitates rapid text entry into
pen-based handheld devices using handwritten Pitman's Shorthand.
A thorough literature review was carried out in Chapter 2, which surveyed currently
available text entry systems for handheld devices and described commonly used pattern
recognition and natural language processing algorithms applied to handwriting
recognition problems.
Chapter 3 investigated the efficiency of a conventional phonetic-based word transcription
approach, in which the primitives of a shorthand script are first converted into a phonetic
representation, which is then interpreted into corresponding English words with the use
of a phonetic dictionary. It was shown that the approach is not robust against the
ambiguities of Pitman's Shorthand, in particular the random omission of vowels among
outlines. This led to the development of a novel Bayesian Network based word
transcription algorithm that aims to enhance the solution using a primitive-based
transcription approach (Chapter 4). In the new approach, the primitives of a shorthand
script are converted directly into orthographic English word(s), without being
transformed into phonemes, with the use of a Pitman's Shorthand lexicon. It was shown
that the new primitive-based approach outperforms the conventional phonetic-based
method.
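To make the contrast concrete, the toy sketch below scores lexicon entries directly against observed primitives. It is a drastically simplified stand-in (independent per-primitive match probabilities) for the thesis's Bayesian Network model, and the lexicon entries, primitive labels and probabilities are invented for illustration:

```java
import java.util.List;
import java.util.Map;

// Toy primitive-based word scoring (NOT the thesis's Bayesian Network model):
// each lexicon entry lists the primitives expected for a word, and a candidate
// is scored by the probability of the observed primitives given that entry,
// assuming independent per-primitive match/confusion probabilities.
class PrimitiveScorer {
    static final double P_MATCH = 0.9;   // observed primitive equals the expected one
    static final double P_CONFUSE = 0.1; // observed primitive was misclassified

    static double score(List<String> observed, List<String> expected) {
        if (observed.size() != expected.size()) return 0.0; // length mismatch: prune
        double p = 1.0;
        for (int i = 0; i < observed.size(); i++)
            p *= observed.get(i).equals(expected.get(i)) ? P_MATCH : P_CONFUSE;
        return p;
    }

    // Return the lexicon word whose expected primitives best explain the observation.
    static String bestWord(List<String> observed, Map<String, List<String>> lexicon) {
        String best = null;
        double bestP = -1.0;
        for (Map.Entry<String, List<String>> e : lexicon.entrySet()) {
            double p = score(observed, e.getValue());
            if (p > bestP) { bestP = p; best = e.getKey(); }
        }
        return best;
    }
}
```

The point of the sketch is that no phoneme string is ever produced: the observed primitives are matched against primitive sequences taken straight from the lexicon.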
In relation to the primitive-to-text transcription approach, Chapter 5 presented the
automatic generation of a novel machine-readable Pitman's Shorthand lexicon, an
essential component facilitating the primitive-based transcription of the Bayesian
Network based word recogniser. The lexicon was shown to be a very effective
mechanism for automatically generating Pitman's Shorthand representations for any
given word.
Following extensive research into the word-level transcription of handwritten Pitman's
Shorthand outlines, Chapter 6 proposed new contextual methods to enhance the solution
quality of the phrase-level transcription problem. It was shown that applying the well
known Viterbi algorithm in combination with Pitman's Shorthand specific contextual
knowledge is more effective than the other contextual methods of the same framework.
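The phrase-level decoding idea can be sketched as a minimal Viterbi decoder over n-best word candidates. This is illustrative only: the candidate scores, bigram probabilities and the smoothing constant below are invented, not the thesis's trained models.

```java
import java.util.*;

// Minimal Viterbi sketch for phrase-level transcription: each writing position
// has n-best word candidates with recognition probabilities, and a bigram model
// supplies transition probabilities; the decoder returns the word sequence with
// the highest combined probability.
class PhraseViterbi {
    static final double UNSEEN_BIGRAM = 1e-6; // floor probability for unseen word pairs

    static List<String> decode(List<Map<String, Double>> candidates,
                               Map<String, Map<String, Double>> bigram) {
        List<Map<String, Double>> best = new ArrayList<>(); // best path score per word
        List<Map<String, String>> back = new ArrayList<>(); // back-pointers
        best.add(new HashMap<>(candidates.get(0)));
        back.add(new HashMap<>());
        for (int t = 1; t < candidates.size(); t++) {
            Map<String, Double> cur = new HashMap<>();
            Map<String, String> bp = new HashMap<>();
            for (Map.Entry<String, Double> w : candidates.get(t).entrySet()) {
                double bestP = -1.0;
                String bestPrev = null;
                for (Map.Entry<String, Double> prev : best.get(t - 1).entrySet()) {
                    double trans = bigram
                        .getOrDefault(prev.getKey(), Collections.<String, Double>emptyMap())
                        .getOrDefault(w.getKey(), UNSEEN_BIGRAM);
                    double p = prev.getValue() * trans * w.getValue();
                    if (p > bestP) { bestP = p; bestPrev = prev.getKey(); }
                }
                cur.put(w.getKey(), bestP);
                bp.put(w.getKey(), bestPrev);
            }
            best.add(cur);
            back.add(bp);
        }
        // Pick the best final word and backtrack through the pointers.
        int last = candidates.size() - 1;
        String word = null;
        double top = -1.0;
        for (Map.Entry<String, Double> e : best.get(last).entrySet())
            if (e.getValue() > top) { top = e.getValue(); word = e.getKey(); }
        LinkedList<String> out = new LinkedList<>();
        out.add(word);
        for (int t = last; t > 0; t--) { word = back.get(t).get(word); out.addFirst(word); }
        return out;
    }
}
```

Even when the recogniser ranks a wrong word first at some position, a strong bigram transition can pull the correct word through, which is precisely the benefit of contextual decoding described above.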
Finally, prototypes of end-user graphical user interfaces (GUIs), designed to demonstrate
the real-time recognition of handwritten Pitman's Shorthand on a tablet PC, are presented
in Chapter 7. This includes an evaluation of the user-friendliness of the prototypes as
well as the selection of a final GUI for the whole system based on experimental results.
8.2 Contribution
A number of original contributions have been drawn from the thesis and they are identified
as follow:
For the first time, an investigation into the integration of a low-level online
handwritten Pitman's Shorthand recogniser with a high-level linguistic post-processor
is presented. It is shown that the integration yields higher-quality results than the
work reported in the literature within the same framework.
The concept of phonetic-based interpretation of segmented portions of handwritten
Pitman's Shorthand outlines into English words, reported in the literature, is applied
to the linguistic post-processing of the handwritten Pitman's Shorthand problem. The
appraisal delivers a valuable finding that highlights the need to investigate a novel
means of interpreting handwritten Pitman's Shorthand, using an approach different
from existing concepts.
For the first time, the Bayesian Network representation is applied to the modelling
of handwritten Pitman’s Shorthand outlines. A series of experiments are carried out
to analyse the transcription performance of the Bayesian Network based word
interpreter. The findings show that the Bayesian Network representation is robust
against stroke variation and highly effective for handling major ambiguities of
handwritten Pitman’s Shorthand (i.e., classification errors and vowel errors).
For the first time, a machine readable Pitman’s Shorthand lexicon is generated. The
findings show that the capability of the lexicon (i.e., ability to produce an accurate
Pitman’s Shorthand representation for a corresponding word) plays an important
role in producing high quality solutions.
The application of Pitman’s Shorthand specific contextual methods in combination
with a Viterbi algorithm is proposed for the phrase level transcription problem. The
algorithm shows promise towards achieving the best quality solution to the phrase
level transcription problems of handwritten Pitman’s Shorthand.
A complete yet compact testing data set (which covers the whole range of rules of
Pitman's Shorthand) is proposed. The dataset is suitable for use as a quality
benchmark dataset in the literature.
For the first time, end-user graphical user interfaces enabling Pitman's Shorthand data
entry into tablet PCs have been developed. It is shown that the final interface of the
system is ready to be introduced as a commercially viable prototype.
8.3 Future Work
Whilst this thesis presents several new methodologies to improve the state of the art in
the machine transcription of online handwritten Pitman's Shorthand, several research
questions have arisen. Some of these are identified below.
8.3.1 Improvement upon the Overall System
Further research in close collaboration with the current collaborating research team is
encouraged, to improve the overall system into a commercially viable product. A major
reward of such cooperative research would be the removal of the limitations of the
current system, in particular the recovery of segmentation errors of the recognition engine
by allowing interactive processing between the recognition engine and the transcription
engine. In the current Bayesian Network model, modelling segmentation ambiguities is
infeasible, mainly because of the lack of real-time interaction with the low-level
segmentation process of the recognition engine. With an interactive supply of low-level
segmentation data, Bayesian Network based stroke models for Pitman's Shorthand
notations could be added alongside the existing shorthand outline models. In this way,
segmentation ambiguities could be embedded in the probabilistic models and recovered
in the lexical post-processing stage. Overall, it may be worthwhile exploring a solution to
segmentation errors, which are a critical issue in the recognition of natural handwriting.
In addition, further investigation into the transcription of the punctuation and currency
notations of Pitman's Shorthand is worthwhile in order to complete the functionality of
the Pitman's Shorthand recognition system. As with short-forms, the number of
punctuation and currency symbols is limited, and the use of a Template Matching
algorithm to interpret these symbols is therefore promising. In fact, a Template Matching
algorithm has already been established in the current recognition engine to recognise
short-forms. Once the transcription engine receives punctuation data, a rigorous analysis
of the performance of the phrase-level interpreter should be carried out in order to
identify the effect of punctuation on phrase-level transcription performance.
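Nearest-template classification over a small, closed symbol set can be as simple as the sketch below. It is illustrative only: the feature vectors and symbol names are invented, and the collaborator's engine is not reproduced here.

```java
import java.util.Map;

// Nearest-template classification over a fixed symbol set: each template is a
// feature vector, and an input symbol is labelled with the template at the
// smallest Euclidean distance. This is practical only when the symbol set is
// small and closed, as with punctuation and currency notations.
class TemplateMatcher {
    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    static String classify(double[] input, Map<String, double[]> templates) {
        String best = null;
        double bestD = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> t : templates.entrySet()) {
            double d = distance(input, t.getValue());
            if (d < bestD) { bestD = d; best = t.getKey(); }
        }
        return best;
    }
}
```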
Finally, the full integration of the contextual methods with the Viterbi algorithm
(proposed in Chapter 6) should be implemented by applying an appropriate,
professionally developed statistical language model available on the market. With a
strong statistical language model in combination with the effective Pitman's Shorthand
specific contextual methods, the overall performance of the system should be
considerably better than the other benchmark instances found in the current literature.
8.3.2 Application of the Presented System to Real-Life Problems
Although this thesis focuses on the automatic recognition of handwritten Pitman's
Shorthand as a rapid means of text entry into a tablet PC, it is worthwhile investigating
the applicability of the system to a variety of real-life problems where speed writing is
critical. For instance, it would be interesting to analyse whether the system could benefit
international travellers as a language translation tool running on their personal digital
assistants (PDAs) or mobile phones. Details of this idea were proposed in Chapter 1. In
addition, ideas previously stated in the literature, such as applying the Pitman's Shorthand
recognition system to the real-time subtitling of TV programmes and the real-time
transcription of lectures and meetings, need to be thoroughly studied in terms of
feasibility. Finally, in order to realise the original aim of the thesis (i.e., to establish the
system as a popular rapid text entry method for portable handheld devices), it is
worthwhile formulating innovative and attractive training methods through which general
users can become more interested in Pitman's Shorthand. This may include the
implementation of Pitman's Shorthand related educational games, the invention of
shortcut methods for learning Pitman's Shorthand, and so on.
8.4 Dissemination
The research carried out in this thesis has been disseminated in pattern recognition specific
international journals and conference proceedings. The following provides a list of papers
that have been published or submitted throughout the research.
1. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Knowledge based
transcription of Pitman’s handwritten shorthand using word frequency and context,
Proceedings of the 7th IEEE International Conference on Development and Application
Systems, pp. 508-512, Suceava, Romania, 27-29 May 2004.
2. Ma Yang, Graham Leedham, Colin Higgins, & Swe Myo Htwe, Segmentation and
recognition of vocalized outlines in Pitman shorthand, Proceedings of the 17th International
Conference on Pattern Recognition, Vol. I, ISBN 0-7695-2128-2, pp. 441-444, Cambridge,
UK, 23-26 August 2004.
3. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Post Processing of
Handwriting Pitman’s Shorthand using Unigram and Heuristic Approaches, Published in
Lecture Notes in Computer Science: Document Analysis Systems VI, 3163, Springer-
Verlag, pp. 332-336, Proceedings of the IAPR workshop on document analysis systems,
University of Florence, Italy, 8-10 September 2004.
4. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, Segmentation and
recognition of phonetic features in handwritten Pitman shorthand, Pattern Recognition,
August 2004, Accepted and in press.
5. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Evaluation of Feature
Sets in the Post Processing of Handwritten Pitman’s Shorthand, Proceedings of the 9th
International Workshop on Frontiers in Handwriting Recognition, ISBN 0-7695-2187-8, pp.
359-364, Kokubunji, Tokyo, Japan, 26-29 October 2004.
6. Swe Myo Htwe, Colin Higgins & Graham Leedham, Post-processing of handwritten
Phonetic Pitman’s Shorthand using a Bayesian Network built on geometric attributes, In
Pattern Recognition and Image Analysis, Lecture Notes in Computer Science 3687,
Springer, Sameer Singh, Maneesha Singh, Chid Apte, Petra Perner (Eds.), pp. 569-579,
2005.
7. Swe Myo Htwe, Colin Higgins & Graham Leedham, Transliteration of online
handwritten phonetic Pitman’s Shorthand with the use of a Bayesian Network, Proceedings
of the 8th International Conference on Document Analysis and Recognition, Vol. 2, pp.
1090-1094, Seoul, Korea, 29 August - 1 September 2005.
8. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, On-line recognition of
Pitman’s Shorthand for fast mobile text entry, Proceedings of the 3rd IEEE International
Conference on Information Technology and Applications, pp. 686-691, Sydney, Australia,
4-7 July 2005.
Chapter 9 Appendix A
Description of the 46 Rules Applied to the Automatic Generation of
a Machine-readable Pitman’s Shorthand Lexicon
The 1st Rule
Complexity: direct conversion
Objective: to verify if an input (i.e., a set of phonemes) corresponds to a vocalised outline,
containing both consonants and vowels
Strategy: check whether the input contains any consonants; if it does, the input is passed to
the 2nd rule; otherwise the program discards the current input and fetches the next one.
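The strategy above can be sketched as a simple filter. This is a minimal Python sketch, assuming the input is a list of phoneme strings; the vowel symbols are illustrative, not the thesis’s actual encoding.

```python
# Rule 1 sketch: accept an input only if its phoneme list contains at
# least one consonant; vowel-only inputs are skipped.
VOWELS = {"A", "E", "I", "O", "U", "AH", "AW", "EE", "OO", "OW", "OY"}

def has_consonant(phonemes):
    """Return True if any phoneme is not a vowel symbol."""
    return any(p not in VOWELS for p in phonemes)

def filter_outlines(inputs):
    """Keep inputs that qualify for the 2nd rule; drop the rest."""
    return [p for p in inputs if has_consonant(p)]
```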
The 2nd Rule
Complexity: indirect conversion.
Objective: to convert a combination of /Y/, /CH/, /JH/ or /ZH/ and a vowel into a diphthong
symbol ( ). To clarify the objective, consider the word “refuse” (/R Ĭ F Y
U Z/): instead of the consonant /Y/ being written as , it is combined with the adjacent vowel
/U/ and written as a diphthong (Figure 9.1).
Strategy: convert a combination of /Y/, /ZH/, /JH/ or /CH/ and /U/ or /AH/ into a diphthong
feature .
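As a sketch of this merge, the following assumes phonemes arrive as a list of strings and writes the combined pair as a single illustrative diphthong token (not the thesis’s actual symbol):

```python
def apply_diphthong_rule(phonemes):
    """Merge /Y/, /ZH/, /JH/ or /CH/ followed by /U/ or /AH/ into a
    single diphthong feature (written here as 'DIPH-<pair>')."""
    out, i = [], 0
    while i < len(phonemes):
        if (phonemes[i] in {"Y", "ZH", "JH", "CH"}
                and i + 1 < len(phonemes)
                and phonemes[i + 1] in {"U", "AH"}):
            out.append("DIPH-" + phonemes[i] + phonemes[i + 1])
            i += 2  # consume both phonemes of the merged pair
        else:
            out.append(phonemes[i])
            i += 1
    return out
```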
[Figure content: Pitman’s Shorthand outline for the word “refuse”; reference notation /R/ /F/ /S/ /Y/ (diphthong); pronunciation /R Ĭ F Y U Z/.]
Figure 9.1: Illustration of the use of diphthong feature in a vocalised outline
The 3rd Rule
Complexity: indirect conversion
Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 9.2.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
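A minimal sketch of this strategy, assuming the prefix arrives as a single token and using an illustrative primitive name:

```python
def apply_con_com_rule(phonemes):
    """If the word starts with CON or COM and does not end in ING, S, Z,
    T or D, replace the leading CON/COM with a dot primitive."""
    if (phonemes[:1] and phonemes[0] in {"CON", "COM"}
            and phonemes[-1] not in {"ING", "S", "Z", "T", "D"}):
        return ["DOT"] + phonemes[1:]
    return phonemes
```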
Figure 9.2: Illustration of the use of a dot primitive for the sound COM at the beginning of a word
The 4th Rule
Complexity: direct conversion
Objective: to convert the sound WH of a word into a large hook. A sample Pitman’s
Shorthand outline containing the sound WH is illustrated in Figure 9.3.
Strategy:
1. First, save words containing the sound WH in a list. In the current system, the list
contains 321 words, which are extracted from a lexicon of 99,281 words.
2. Check whether the word representation of the input matches any element of the list.
3. If it does, the sound WH of the input is converted into a large hook; otherwise do
nothing.
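The list-lookup strategy above can be sketched as follows. In the thesis the list holds 321 words drawn from a 99,281-word lexicon; the three words and the primitive name below are illustrative only.

```python
WH_WORDS = {"where", "while", "whale"}  # illustrative subset of the 321-word list

def apply_wh_rule(word, phonemes):
    """If the word is in the WH list, replace its W-H onset with a
    large-hook primitive; otherwise leave the phonemes unchanged."""
    if word in WH_WORDS and phonemes[:2] == ["W", "H"]:
        return ["LARGE_HOOK_WH"] + phonemes[2:]
    return phonemes
```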
[Figure 9.2 content: Pitman’s outline for the word “commence”; reference notation /COM/ /M/ /NS/ /Ě/; pronunciation /K Ŏ M Ě N S/.]
Figure 9.3: Illustration of the use of WH hook in a vocalised outline
The 5th Rule
Complexity: indirect conversion
Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitman’s Shorthand outline containing
the prefix IR- is illustrated in Figure 9.4.
Strategy:
1. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
2. Check whether the word representation of the input matches any element of the list.
3. If it does and the prefix is IL-, convert the sound IL- into an upward stroke L,
followed by a dot primitive and another upward stroke L.
4. If it does and the prefix is IM-, convert the sound IM- into a curve M, followed by a
dot primitive and another curve M.
5. If it does and the prefix is IR-, convert the sound IR- into a downward curve R,
followed by a dot primitive and another downward curve R.
6. If it does and the prefix is IN-, convert the sound IN- into a curve N, followed by a
dot primitive and another curve N.
7. If it does and the prefix is UN-, convert the sound UN- into a curve N, followed by a
dash primitive and another curve N.
[Figure 9.3 content: Pitman’s outline for the word “where”; reference notation /WH/ /R/ /Ā/; pronunciation /W H Ā R/.]
In addition, the 5th rule states that a consonant /D/ following the prefixes IN- and UN- may
not be omitted. This avoids a conflict with the ND writing rule of Pitman’s Shorthand, in
which the consonant /D/ following /N/ is omitted. Details of the ND rule are given in
Appendix B.
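The prefix expansions above amount to a table lookup. This sketch assumes each prefix is tokenised as a single phoneme; the primitive names are illustrative.

```python
# Rule 5 sketch: expand a negative prefix into a doubled stroke with a
# dot (or, for UN-, a dash) between the two strokes.
NEG_PREFIX = {
    "IL": ["UP_L", "DOT", "UP_L"],
    "IM": ["CURVE_M", "DOT", "CURVE_M"],
    "IR": ["DOWN_R", "DOT", "DOWN_R"],
    "IN": ["CURVE_N", "DOT", "CURVE_N"],
    "UN": ["CURVE_N", "DASH", "CURVE_N"],
}

def apply_negative_prefix(word, phonemes):
    """If the word carries a listed negative prefix, replace the prefix
    token with its primitive expansion."""
    for prefix, expansion in NEG_PREFIX.items():
        if word.upper().startswith(prefix) and phonemes[:1] == [prefix]:
            return expansion + phonemes[1:]
    return phonemes
```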
Figure 9.4: Illustration of the use of negative prefix IR- in a vocalised outline
The 6th Rule
Complexity: indirect conversion
Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it is not immediately following /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitman’s Shorthand outline containing the sound /P L/ at the
beginning of a word is illustrated in Figure 9.5.
Strategy:
1. If /N/ comes before /T L/ or /D L/, hook L is not used.
2. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used.
3. Otherwise, replace the phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and
/G L/ of the input with a, b, c, d, e, f, g and h respectively, where
[Figure 9.4 content: Pitman’s Shorthand outline for the word “irregular”; reference notation /R/ /R/ /G/ /L/ /R/ with vowels /Ĭ/ /Ě/ /Ă/ /U/ marked at the start, middle and end; pronunciation /Ĭ R Ě G Y U L Ă/.]
a = hook + P stroke
b = hook + B stroke
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
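The mapping can be sketched as a table-driven scan; primitive names are illustrative, and the syllable test of step 2 is omitted here for brevity. The 7th rule’s large-hook pairs follow the same pattern with a different table.

```python
HOOK_L = {  # Rule 6: initial-hook consonant pairs -> small hook L + stroke
    ("P", "L"): ["HOOK_L", "STROKE_P"], ("B", "L"): ["HOOK_L", "STROKE_B"],
    ("T", "L"): ["HOOK_L", "STROKE_T"], ("D", "L"): ["HOOK_L", "STROKE_D"],
    ("CH", "L"): ["HOOK_L", "STROKE_CH"], ("J", "L"): ["HOOK_L", "STROKE_J"],
    ("K", "L"): ["HOOK_L", "STROKE_K"], ("G", "L"): ["HOOK_L", "STROKE_G"],
}

def apply_hook_l(phonemes):
    out, i = [], 0
    while i < len(phonemes) - 1:
        pair = (phonemes[i], phonemes[i + 1])
        # Step 1: no hook when /N/ immediately precedes /T L/ or /D L/.
        blocked = pair in {("T", "L"), ("D", "L")} and out[-1:] == ["N"]
        if pair in HOOK_L and not blocked:
            out.extend(HOOK_L[pair])
            i += 2
        else:
            out.append(phonemes[i])
            i += 1
    out.extend(phonemes[i:])
    return out
```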
Figure 9.5: Illustration of the use of PL hook in a vocalised outline
The 7th Rule
Complexity: indirect conversion
Objective: to convert a pair of consonants FL, VL, ThL, ML, NL, SHL at the beginning of a
word into a series of a large hook L, followed by a corresponding consonant primitive. A
sample Pitman’s Shorthand outline containing the sound /FL/ at the beginning of a word is
illustrated in Figure 9.6.
Strategy: replace phonemes /FL/, /VL/, /ThL/, /ML/, /NL/ and /SHL/ with a, b, c, d, e and f
respectively, where
a = large hook L + F stroke
b = large hook L + V stroke
c = large hook L + Th stroke
d = large hook L + M stroke
[Figure 9.5 content: Pitman’s Shorthand outline for the word “play”; reference notation /P/ /L/ /PL/ /Ā/; pronunciation /P L Ā/.]
e = large hook L + N stroke
f = large hook L + SH stroke
Figure 9.6: Illustration of the use of FL hook at the beginning of a vocalised outline
The 8th Rule
Complexity: indirect conversion
Objective: to convert a series of consonants SPR, STR, SKR at the beginning of a word into
a series of circle S followed by P, T or K stroke respectively. Note that a consonant R is
omitted in this case. A sample Pitman’s Shorthand outline containing the sound /SPR/ at the
beginning of a word is illustrated in Figure 9.7.
Strategy: replace phonemes /SPR/, /STR/ and /SKR/ at the beginning of input phonemes
with a, b and c respectively, where
a = circle S + stroke P
b = circle S + stroke T
c = circle S + stroke K
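This word-initial substitution can be sketched directly; the primitive names are illustrative, and the dropped /R/ follows the rule’s note.

```python
SPR_MAP = {  # Rule 8: initial SPR/STR/SKR -> circle S + stroke, /R/ omitted
    ("S", "P", "R"): ["CIRCLE_S", "STROKE_P"],
    ("S", "T", "R"): ["CIRCLE_S", "STROKE_T"],
    ("S", "K", "R"): ["CIRCLE_S", "STROKE_K"],
}

def apply_spr_rule(phonemes):
    """Replace an initial SPR/STR/SKR cluster with circle S plus the
    matching stroke."""
    head = tuple(phonemes[:3])
    if head in SPR_MAP:
        return SPR_MAP[head] + phonemes[3:]
    return phonemes
```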
[Figure 9.6 content: Pitman’s Shorthand outline for the word “flow”; reference notation /F/ /L/ /FL/ /Ō/; pronunciation /F L Ō/.]
[Figure 9.7 content: Pitman’s Shorthand outline for the word “spray”; reference notation /S/ /P/ /R/ /PR/ /SPR/ /Ā/; pronunciation /S P R Ā/.]
Figure 9.7: Illustration of the use of SPR stroke in a vocalised outline
The 9th Rule
Complexity: indirect conversion
Objective: to convert the sound /STER/ in the middle or at the end of a word into a large
loop. A sample Pitman’s Shorthand outline containing the sound /STER/ at the end of a
word is illustrated in Figure 9.8.
Strategy:
1. If the sound /STER/ appears at the beginning of a word, it is not converted into a
large loop;
2. If the sound /STER/ is followed by a consonant /N/ at the end of a word, it is not
converted into a large loop;
3. otherwise, replace the sound /STER/ of a word with a large loop.
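The three steps above can be sketched as follows, assuming /STER/ arrives as a single token and using an illustrative primitive name:

```python
def apply_ster_rule(phonemes):
    """Replace a medial or final /STER/ with a large loop, except when it
    starts the word or is followed by a final /N/ (steps 1-2)."""
    out = list(phonemes)
    for i, p in enumerate(out):
        if p != "STER":
            continue
        if i == 0:                   # step 1: word-initial STER is kept
            continue
        if out[i + 1:] == ["N"]:     # step 2: STER before a final /N/
            continue
        out[i] = "LARGE_LOOP"        # step 3: otherwise use the loop
    return out
```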
Figure 9.8: Illustration of the use of STER loop in a vocalised outline
The 10th Rule
Complexity: indirect conversion
Objective: to omit primitives of the sound CON, COM, CUM or COG in the middle of a
word. A sample Pitman’s Shorthand outline containing the sound /CON/ in the middle of a
word is illustrated in Figure 9.9.
Strategy:
1. If CON, COM, CUM or COG is the only sound of a word, it is not omitted;
[Figure 9.8 content: Pitman’s Shorthand outline for the word “master”; reference notation /M/ /S/ /T/ /R/ /STER/ /AH/; pronunciation /M AH S T Ă/.]
2. otherwise, omit primitives of the sound CON, COM, CUM or COG.
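The omission can be sketched as a positional filter, assuming each target sound is a single token:

```python
def apply_medial_con_rule(phonemes):
    """Omit CON/COM/CUM/COG in the middle of a word; keep it when it is
    the word's only sound (step 1)."""
    targets = {"CON", "COM", "CUM", "COG"}
    if len(phonemes) == 1:
        return phonemes
    # Keep targets only at the word edges; drop them in medial position.
    return [p for i, p in enumerate(phonemes)
            if not (p in targets and 0 < i < len(phonemes) - 1)]
```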
Figure 9.9: Illustration of omission of the sound CON in the middle of a vocalised outline
The 11th Rule
Complexity: indirect conversion
Objective: to convert the sound SES, SEZ, ZES or ZEZ at the end of a word into a large
circle. A sample Pitman’s Shorthand outline containing the sound SEZ at the end of a word
is illustrated in Figure 9.10.
Strategy: if a word ends with the sound SES, SEZ, ZES or ZEZ, replace the sound with
a large circle.
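As a sketch, assuming the final sound is tokenised as one phoneme and using an illustrative primitive name:

```python
def apply_ses_rule(phonemes):
    """Replace a final SES/SEZ/ZES/ZEZ sound with a large circle."""
    if phonemes and phonemes[-1] in {"SES", "SEZ", "ZES", "ZEZ"}:
        return phonemes[:-1] + ["LARGE_CIRCLE"]
    return phonemes
```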
Figure 9.10: Illustration of the use of SEZ circle in a vocalised outline
The 12th Rule
Complexity: indirect conversion
[Figure 9.9 content: Pitman’s Shorthand outline for the word “reconsider”; reference notation /R/ /K/ /N/ /S/ /D/ /Ē/ /Ĭ/; pronunciation /R Ē K Ŏ N S Ĭ D Ă/.]
[Figure 9.10 content: Pitman’s Shorthand outline for the word “bases”; reference notation /B/ /SEZ/ /Ā/; pronunciation /B Ā S Ē Z/.]
Objective: to convert the sound /ED/ that marks the past tense of a verb into a disjoined stroke T
or stroke D. A sample Pitman’s Shorthand outline containing a disjoined /ED/ at the end is
illustrated in Figure 9.11.
Strategy: if a word ends with /T/ or /D/ and a vowel comes before the /T/ or /D/, then
replace the final /vowel+T/ or /vowel+D/ with a T or D stroke respectively.
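A minimal sketch of this rule, with an illustrative vowel set and primitive names:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # illustrative vowel symbols

def apply_past_tense_rule(phonemes):
    """Replace a final vowel+T or vowel+D with a disjoined T or D stroke."""
    if (len(phonemes) >= 2 and phonemes[-1] in {"T", "D"}
            and phonemes[-2] in VOWELS):
        return phonemes[:-2] + ["DISJOINED_" + phonemes[-1]]
    return phonemes
```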
Figure 9.11: Illustration of the use of a disjoined /ED/ in a vocalised outline
The 13th Rule
Complexity: indirect conversion
Objective: to convert the sound /ST/ at the beginning, in the middle or at the end of a word
into a shallow loop. A sample Pitman’s Shorthand outline containing the sound /ST/ at the
beginning of a word is illustrated in Figure 9.12.
Strategy:
1. if a word begins or ends with a vowel, /ST/ at the beginning or at the end of the
word is not converted into a shallow loop;
2. if the sound /ST/ is immediately followed by the sound /SHUN/, it is not converted
into a shallow loop;
3. if /S/ and /T/ belong to two different syllables, /ST/ is not converted into a shallow
loop;
4. if /ST/ comes before /NTS/ or /NDS/ at the end of a word, it is not converted into a
shallow loop;
5. if /ST/ is followed by /R/, it is not converted into a shallow loop;
[Figure 9.11 content: Pitman’s Shorthand outline for the word “dated”; reference notation /D/ /T/ /Ā/ /ED/; pronunciation /D Ā T Ĭ D/.]
6. otherwise, replace /ST/ of an input with a shallow loop.
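The exception tests above can be sketched as a guarded scan. This assumes a list-of-tokens input with illustrative symbols; step 3 needs syllabification data and is omitted, and the reading of step 1 as “an /ST/ next to a word-edge vowel” is an interpretation, not the thesis’s exact formulation.

```python
def apply_st_loop(phonemes):
    """Replace an /S T/ pair with a shallow loop unless an exception
    of the 13th rule applies (syllable test of step 3 omitted)."""
    VOWELS = {"A", "E", "I", "O", "U"}  # illustrative vowel symbols
    n = len(phonemes)
    out, i = [], 0
    while i < n:
        if phonemes[i] == "S" and i + 1 < n and phonemes[i + 1] == "T":
            after = phonemes[i + 2:]
            blocked = (
                (i <= 1 and phonemes[0] in VOWELS)            # step 1, start
                or (len(after) <= 1 and phonemes[-1] in VOWELS)  # step 1, end
                or after[:1] == ["SHUN"]                      # step 2
                or after in (["NTS"], ["NDS"])                # step 4
                or after[:1] == ["R"]                         # step 5
            )
            if not blocked:
                out.append("SHALLOW_LOOP")
                i += 2
                continue
        out.append(phonemes[i])
        i += 1
    return out
```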
Figure 9.12: Illustration of the use of ST loop in a vocalised outline
The 14th Rule
Complexity: indirect conversion
Objective: to omit /T/ or /D/ at the end of one-syllable words. This relates to the half-length
writing rule of Pitman’s Shorthand, and a sample (one-syllable) half-length outline is
illustrated (Figure 9.13).
Strategy: if a word has one syllable and contains consonants other than just /R/ and /T/ or
/D/, then the /T/ or /D/ at the end of the word is omitted, provided that /T/ does not follow a
voiced consonant and /D/ does not follow an unvoiced consonant.
Figure 9.13: Illustration of a one syllable half-length outline
The 15th Rule
Complexity: indirect conversion
Objective: to convert the suffix ING into a dot primitive. A sample Pitman’s Shorthand
outline containing the suffix ING is illustrated in Figure 9.14.
Pitman’s Shorthand outline for the word “coat”
Reference
Pitman’s Shorthand notations
/K/ /T/ /Ō/
Pronunciation for the word “coat”
/K Ō T/
(Figure content) Pitman’s Shorthand outline for the word “stock”: notations /S/ /T/ /ST/ /K/ /Ŏ/; pronunciation /S T Ŏ K/.
Strategy: if an input ends with the sound ING, convert the sound ING into a dot primitive.
Figure 9.14: Illustration of the use of suffix ING in a vocalised outline
The 16th rule
Complexity: indirect conversion
Objective: to convert the suffix INGS into a dash primitive. A sample Pitman’s Shorthand
outline containing the suffix INGS is illustrated in Figure 9.15.
Strategy: if an input ends with the sound /INGS/, convert the sound /INGS/ into a dash.
Figure 9.15: Illustration of the use of the suffix INGS in a vocalised outline
The 17th rule
Complexity: indirect conversion
Objective: to convert the suffix -SHIP into a SH stroke. A sample Pitman’s Shorthand
outline containing the suffix SHIP is illustrated in Figure 9.16.
Strategy: if a word ends with the sound /SHIP/, then convert the sound /SHIP/ into a stroke
SH.
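Rules 15 to 17 share one shape: a fixed substitution of a trailing suffix sound by a primitive. A single table therefore covers all three in a sketch (the primitive labels here are illustrative only, not the thesis's internal encoding):

```python
# Suffix-to-primitive table for rules 15-17 (labels are illustrative).
SUFFIX_PRIMITIVES = {"ING": ["dot"], "INGS": ["dash"], "SHIP": ["stroke SH"]}

def apply_suffix_rules(phonemes):
    """Rules 15-17: rewrite a trailing suffix sound via the table above."""
    if phonemes and phonemes[-1] in SUFFIX_PRIMITIVES:
        return phonemes[:-1] + SUFFIX_PRIMITIVES[phonemes[-1]]
    return phonemes
```

For "coping" the trailing /ING/ becomes a dot primitive, as in Figure 9.14.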
(Figure content) Pitman’s Shorthand outline for the word “takings”: notations /T/ /K/ /Ā/ /INGZ/; pronunciation /T Ā K Ĭ NG Z/.
(Figure content) Pitman’s Shorthand outline for the word “coping”: notations /K/ /P/ /Ō/ /ING/; pronunciation /K Ō P Ĭ NG/.
Figure 9.16: Illustration of the use of the suffix SHIP in a vocalised outline
The 18th rule
Complexity: direct conversion
Objective: to convert the consonant /S/ or /Z/ into a downward stroke. A sample Pitman’s
Shorthand outline containing the stroke Z is illustrated in Figure 9.17.
Strategy:
1. if a vowel comes before /S/ at the beginning of input phonemes, convert /S/ at the
beginning into a stroke S.
2. If input phonemes contain /S+vowel/, /S+vowel+S/, /S+vowel+ past tense D/ or
/S+vowel+ING/ at the end, convert /S/ at the end into a stroke S.
3. If input phonemes contain /S+vowel+S/ or /S+vowel+Z/ at the beginning, convert
/S/ at the beginning into a stroke S.
4. If a word starts with /Z/, convert /Z/ into a stroke Z.
Figure 9.17: Illustration of the use of stroke Z in a vocalised outline
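The four conditions above can be sketched as a predicate over the phoneme list; the single-letter vowel set is a simplifying assumption of this sketch:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def s_or_z_as_stroke(ph):
    """Return True when the conditions above call for a full stroke S or Z
    rather than a circle; ph is a list of phoneme strings."""
    if ph[:1] == ["Z"]:
        return True                                   # condition 4
    if len(ph) >= 2 and ph[0] in VOWELS and ph[1] == "S":
        return True                                   # condition 1
    if len(ph) >= 3 and ph[0] == "S" and ph[1] in VOWELS and ph[2] in ("S", "Z"):
        return True                                   # condition 3
    if len(ph) >= 2 and ph[-2] == "S" and ph[-1] in VOWELS:
        return True                                   # condition 2: /S+vowel/ at end
    if len(ph) >= 3 and ph[-3] == "S" and ph[-2] in VOWELS and ph[-1] in ("S", "D", "ING"):
        return True                                   # condition 2: /S+vowel+.../ at end
    return False
```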
(Figure content) Pitman’s Shorthand outline for the word “busy”: notations /B/ /Z/ /Ĭ/ /Ĭ/; pronunciation /B Ĭ Z Ĭ/.
(Figure content) Pitman’s Shorthand outline for the word “scholarship”: notations /S/ /K/ /L/ /R/ /AW/ /Ă/ /SHIP/; pronunciation /S K AW L Ă R SH Ĭ P/.
The 19th rule
Complexity: indirect conversion
Objective: to convert the suffix –MENT into a stroke N. A sample Pitman’s Shorthand
outline containing the suffix –MENT is illustrated in Figure 9.18.
Strategy: if the sound /MENT/ appears at the end of a word and is preceded by a straight
upstroke, /N/, /ST/ or /S/, it is converted into stroke N.
Figure 9.18: Illustration of the use of the suffix –MENT in a vocalised outline
The 20th rule
Complexity: indirect conversion
Objective: to convert the suffix –MENTAL into a series of a stroke N followed by a
downward L. A sample Pitman’s Shorthand outline containing the suffix –MENTAL is
illustrated in Figure 9.19.
Strategy: if an input contains the sound /MENTAL/ at the end, convert the sound
/MENTAL/ into a combination of a stroke N and a downward L.
(Figure content) Pitman’s Shorthand outline for the word “experimental”: notations /P/ /S/ /P/ /R/ /MENTAL/ /Ě/ /Ě/; pronunciation /Ě P Ě R Ĭ M Ă N T Ě L/.
(Figure content) Pitman’s Shorthand outline for the word “apartment”: notations /P/ /RT/ /MENT/ /Ă/ /AH/; pronunciation /Ă P AH R T M Ă N T/.
Figure 9.19: Illustration of the use of the suffix –MENTAL in a vocalised outline
The 21st rule
Complexity: indirect conversion
Objective: to convert the suffix –MENTALLY into a series of stroke N followed by
downward L and a vowel Ē. In fact, primitive representations of the suffix –MENTAL and
the suffix –MENTALLY are very similar. Figure 9.20 illustrates a sample Pitman’s
Shorthand outline containing the suffix –MENTALLY.
Strategy: if input phonemes contain the sound /MENTALLY/ at the end, convert the sound
/MENTALLY/ into a series of a stroke N, followed by a downward L and vowel Ē.
Figure 9.20: Illustration of the use of the suffix –MENTALLY in a vocalised outline
The 22nd rule
Complexity: indirect conversion
Objective: to omit the syllables –TER, -DER, -THER and -TURE of a word according to the
double-length rule of Pitman’s Shorthand (description of the rule can be referenced in
appendix B). A sample Pitman’s Shorthand outline containing the syllable -TER is
illustrated in Figure 9.21.
Strategy: if an input contains the syllable /TER/, /DER/, /THER/ or /TURE/ in the middle or
at the end, and if the syllable is not surrounded by incompatible neighbouring primitives, the
syllable is removed from the input phonemes. Samples of incompatible neighbouring
primitives are given (Figure 9.22).
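One way to sketch this rule is to delegate the neighbour-compatibility test (the /F K/-style pairs of Figure 9.22) to a caller-supplied check, since it depends on the already-converted primitives around the syllable; the default that accepts every position is an assumption of this sketch:

```python
DOUBLING_SYLLABLES = ("TER", "DER", "THER", "TURE")

def apply_rule_22(phonemes, compatible=lambda i: True):
    """Remove the first doubling syllable found in the middle or at the end
    of the input, unless the compatibility check over neighbouring
    primitives rejects that position."""
    for i, p in enumerate(phonemes):
        if i > 0 and p in DOUBLING_SYLLABLES and compatible(i):
            return phonemes[:i] + phonemes[i + 1:]
    return phonemes
```

For "after" (/AH F TER/) the syllable /TER/ is removed, matching the double-length outline of Figure 9.21.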
(Figure content) Pitman’s Shorthand outline for the word “experimentally”: notations /P/ /S/ /P/ /R/ /MENTAL/ /Ě/ /Ě/; pronunciation /Ě P Ě R Ĭ M Ă N T Ě L/.
Figure 9.21: Illustration of the omission of the syllable –TER in a vocalised outline
Figure 9.22: Illustration of incompatible primitive pairs for doubling
The 23rd rule
Complexity: indirect conversion
Objective: to omit the consonant /D/, following the consonant /M/ or /N/ of a word
according to the MD and ND writing rules of Pitman’s Shorthand (description of the rules
can be referenced in appendix B). A sample Pitman’s Shorthand outline containing the
sound /MD/ is given in Figure 9.23.
Strategy:
1. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by the sound
/SHUN/, the consonant /D/ is not omitted;
2. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by the sound /N/,
/NS/ or /NT/ at the end of a word, the consonant /D/ is not omitted;
3. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by a vowel at the end
of a word, the consonant /D/ is not omitted;
4. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by vowel+/S/ or
vowel+/Z/ at the end of a word, the consonant /D/ is not omitted;
(Figure content) Primitive pairs that cannot be represented by doubling: /F K/, /V K/, /F G/, /V G/.
(Figure content) Pitman’s Shorthand outline for the word “after”: notations /FTER/ /F/ /Ă/; pronunciation /AH F T Ă R/.
5. otherwise, omit a consonant /D/, following /M/ or /N/.
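The five conditions can be sketched as an ordered test over the phonemes that follow the /D/; the simplified vowel set is an assumption of this sketch:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def omit_d_after_m_or_n(phonemes, i):
    """phonemes[i] is /M/ or /N/ and phonemes[i+1] is /D/; return True when
    the /D/ is omitted (condition 5), False when conditions 1-4 block it."""
    rest = phonemes[i + 2:]                  # phonemes after the /D/
    if rest[:1] == ["SHUN"]:
        return False                         # condition 1
    if rest in (["N"], ["N", "S"], ["N", "T"]):
        return False                         # condition 2 (word-final)
    if len(rest) == 1 and rest[0] in VOWELS:
        return False                         # condition 3 (word-final vowel)
    if len(rest) == 2 and rest[0] in VOWELS and rest[1] in ("S", "Z"):
        return False                         # condition 4
    return True                              # condition 5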
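The five conditions can be sketched as an ordered test over the phonemes that follow the /D/; the simplified vowel set is an assumption of this sketch:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def omit_d_after_m_or_n(phonemes, i):
    """phonemes[i] is /M/ or /N/ and phonemes[i+1] is /D/; return True when
    the /D/ is omitted (condition 5), False when conditions 1-4 block it."""
    rest = phonemes[i + 2:]                  # phonemes after the /D/
    if rest[:1] == ["SHUN"]:
        return False                         # condition 1
    if rest in (["N"], ["N", "S"], ["N", "T"]):
        return False                         # condition 2 (word-final)
    if len(rest) == 1 and rest[0] in VOWELS:
        return False                         # condition 3 (word-final vowel)
    if len(rest) == 2 and rest[0] in VOWELS and rest[1] in ("S", "Z"):
        return False                         # condition 4
    return True                              # condition 5
```

For "madam" (consonant skeleton M, D, M) the /D/ after /M/ is omitted, matching the halved stroke M of Figure 9.23.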
Figure 9.23: Illustration of the occurrence of the sound MD in a vocalised outline.
The 24th rule
Complexity: indirect conversion
Objective: to convert a pair of consonants FR, VR, Thr, THR, SHR, ZHR, MR or NR at the
beginning of a word into a series of a small hook followed by a corresponding consonant
primitive. A sample Pitman’s Shorthand outline containing the sound /FR/ at the beginning
of a word is illustrated in Figure 9.24.
Strategy: if an input contains the sound /FR/, /VR/, /Thr/, /THR/, /SHR/, /ZHR/, /MR/ or
/NR/ at the beginning, then replace phonemes /FR/, /VR/, /Thr/, /THR/, /SHR/, /ZHR/, /MR/
or /NR/ with a, b, c, d, e, f, g and h respectively, where
a = small hook + stroke F
b = small hook + stroke V
c = small hook + stroke Th
d = small hook + stroke TH
e = small hook + stroke SH
f = small hook + stroke ZH
g = small hook + stroke M
h = small hook + stroke N
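The a-h substitutions above form a fixed table, so a dictionary keeps them in one place in a sketch (the stroke names here are labels only, not the thesis's internal encoding):

```python
# Initial hooked-R substitution table for rule 24 (labels are illustrative).
INITIAL_HOOK_R = {
    "FR": ("small hook", "stroke F"),
    "VR": ("small hook", "stroke V"),
    "Thr": ("small hook", "stroke Th"),
    "THR": ("small hook", "stroke TH"),
    "SHR": ("small hook", "stroke SH"),
    "ZHR": ("small hook", "stroke ZH"),
    "MR": ("small hook", "stroke M"),
    "NR": ("small hook", "stroke N"),
}

def apply_rule_24(phonemes):
    """If the input begins with one of the hooked sounds, expand it into the
    small hook plus stroke pair; otherwise return the input unchanged."""
    if phonemes and phonemes[0] in INITIAL_HOOK_R:
        return list(INITIAL_HOOK_R[phonemes[0]]) + phonemes[1:]
    return phonemes
```

Rules 25 and 26 follow the same pattern with a sixteen-entry table and without the word-initial restriction.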
(Figure content) Pitman’s Shorthand outline for the word “madam”: notations /MD/ /M/ /Ă/ /Ă/; pronunciation /M Ă D Ă M/.
Figure 9.24: Illustration of the use of FR hook at the beginning of a vocalised outline
The 25th and 26th rules
Complexity: indirect conversion
Objective: to convert the syllable PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR,
SHR, ZHR, MR, NR at the beginning, in the middle or at the end of a word into a series of a
small hook followed by a corresponding consonant primitive. A sample Pitman’s Shorthand
outline containing the syllable PR is illustrated in Figure 9.25.
Strategy: replace the syllable /PR/, /BR/, /TR/, /DR/, /CHR/, /JR/, /KR/, /GR/, /FR/, /VR/,
/Thr/, /THR/, /SHR/, /ZHR/, /MR/ and /NR/ with a, b, c, d, e, f, g, h, i, j, k, l, m, n, o and p
respectively, where
a = small hook + stroke P
b = small hook + stroke B
c = small hook + stroke T
d = small hook + stroke D
e = small hook + stroke CH
f = small hook + stroke J
g = small hook + stroke K
h = small hook + stroke G
i = small hook + stroke F
j = small hook + stroke V
k = small hook + stroke Th
(Figure content) Pitman’s Shorthand outline for the word “free”: notations /F/ /R/ /FR/ /Ē/; pronunciation /F R Ē/.
l = small hook + stroke TH
m = small hook + stroke SH
n = small hook + stroke ZH
o = small hook + stroke M
p = small hook + stroke N
Figure 9.25: Illustration of occurrence of the syllable PR in a vocalised outline
The 27th rule
Complexity: indirect conversion
Objective: to omit the consonant /R/ in the sound /SKR/ or /SGR/. A sample Pitman’s
outline containing the sound /SKR/ is illustrated in Figure 9.26.
Strategy: if an input contains the sound /SKR/ or /SGR/, and if the sound /SGR/ is not at the
beginning of the input, then replace /SKR/ or /SGR/ with a or b respectively, where
a = circle S+ stroke K
b = circle S + stroke G
(Figure content) Pitman’s Shorthand outline for the word “describe”: notations /D/ /S/ /K/ /R/ /B/ /SKR/ /Ě/ /I/; pronunciation /D Ě S K R I B/.
(Figure content) Pitman’s Shorthand outline for the word “paper”: notations /P/ /R/ /PER/ /Ā/; pronunciation /P Ā P Ă R/.
Figure 9.26: Illustration of occurrence of the sound SKR in a vocalised outline
The 28th rule
Complexity: indirect conversion
Objective: to convert the sound /KW/ or /GW/ of a word into a series of a large hook
followed by a corresponding consonant primitive. A sample Pitman’s Shorthand outline
containing the sound /KW/ is illustrated in Figure 9.27.
Strategy: replace the sound /KW/ and /GW/ with a and b respectively, where
a = large hook + stroke K
b = large hook + stroke G
Figure 9.27: Illustration of occurrence of the sound KW in a vocalised outline
The 29th rule
Complexity: indirect conversion
Objective: to convert the syllable PL, BL, TL, DL, CHL, JL, KL or GL in the middle or at
the end of a word into a series of a small hook followed by a corresponding consonant
primitive.
Strategy: replace the syllable PL, BL, TL, DL, CHL, JL, KL, and GL with a, b, c, d, e, f, g
and h respectively, where
a = small hook + P stroke
b = small hook + B stroke
c = small hook + T stroke
(Figure content) Pitman’s Shorthand outline for the word “quick”: notations /K/ /W/ /KW/ /Ĭ/; pronunciation /K W Ĭ K/.
d = small hook + D stroke
e = small hook + CH stroke
f = small hook + J stroke
g = small hook + K stroke
h = small hook + G stroke
The 30th rule
Complexity: indirect conversion
Objective: to convert the syllable FL, VL, THL, ML, NL or SHL in the middle or at the end
of a word into a series of a large hook followed by a corresponding consonant primitive. A
sample outline containing the syllable FL is illustrated in Figure 9.28.
Strategy: replace the syllable FL, VL, THL, ML, NL and SHL with a, b, c, d, e and f
respectively, where
a = large hook + stroke F
b = large hook + stroke V
c = large hook + stroke TH
d = large hook + stroke M
e = large hook + stroke N
f = large hook + stroke SH
Figure 9.28: Illustration of the use of FL hook in a vocalised outline
(Figure content) Pitman’s Shorthand outline for the word “flow”: notations /F/ /L/ /F U L/ /Ĭ/; pronunciation /F U L F Ĭ L/.
The 31st rule
Complexity: indirect conversion
Objective: to convert a pair of consonants /S/ and /H/ into a circle S. A sample Pitman’s
Shorthand outline containing a series of /S/ and /H/ is illustrated in Figure 9.29.
Strategy: replace the sound /S H/ of input phonemes with a circle S.
Figure 9.29: Illustration of occurrence of /S/ followed by /H/ in a vocalised outline
The 32nd rule
Complexity: indirect conversion
Objective: to omit a small hook R when there is a series of S+vowel+hookR or
ST+vowel+hookR. A sample Pitman’s Shorthand outline containing a series of
S+vowel+hookR is illustrated in Figure 9.30.
Strategy: if an input contains /S+vowel+hookR/ or /ST+vowel+hookR/, omit the hook R.
Figure 9.30: Illustration of the occurrence of a series of S+vowel+hookR in a vocalised outline
(Figure content) Pitman’s Shorthand outline for the word “supper”: notations /S/ /P/ /R/ /PR/ /S PER/ /Ŭ/; pronunciation /S Ŭ P Ĭ R/.
(Figure content) Pitman’s Shorthand outline for the word “racehorse”: notations /R/ /H/ /S/ /R/ /Ā/ /AW/; pronunciation /R Ā S H S AW R S/.
The 33rd rule
Complexity: direct conversion
Objective: to convert the consonant /L/ into a downward stroke. A sample Pitman’s
Shorthand outline containing the consonant /L/ is illustrated in Figure 9.31.
Strategy:
1. if /L/ follows a hook N, it is not converted into a downward stroke.
2. if /L/ is not at the beginning of a word and follows the stroke /N/ or /NG/, it is
converted into a downward stroke.
Figure 9.31: Illustration of the use of a downward stroke L in a vocalised outline
The 34th rule
Complexity: indirect conversion
Objective: to convert the consonant /F/ or /V/ at the end of a word into a small hook. A
sample Pitman’s Shorthand outline containing a hook F is illustrated in Figure 9.32.
Strategy:
1. if /F/ or /V/ is the only consonant of an input, it is not converted into a small hook.
2. if /F/ or /V/ is followed by the sound -ING, -INGS, T, D, S or Z and if it is not
preceded by the consonant /L/, it is converted into a small hook.
(Figure content) Pitman’s Shorthand outline for the word “only”: notations /N/ /L/ (down) /Ō/ /Ē/; pronunciation /Ō N L Ē/.
Figure 9.32: Illustration of the use of hook F in a vocalised outline
The 35th rule
Complexity: indirect conversion
Objective: to convert a consonant /F/ or /V/ in the middle of a word into a small hook. A
sample Pitman’s Shorthand outline containing the hook F in the middle is illustrated in
Figure 9.33.
Strategy:
1. if /F/ or /V/ is at the beginning of an input, it is not converted into a small hook.
2. If /F/ or /V/ is in the middle of an input, and if neighbouring consonants of /F/ or /V/
are two straight downward strokes, or a combination of a straight downward stroke
and a curve (Th, S or Z), the consonant /F/ or /V/ is converted into a small hook.
Figure 9.33: Illustration of the use of hook V in the middle of a vocalised outline
The 36th rule
Complexity: indirect conversion
(Figure content) Pitman’s Shorthand outline for the word “divide”: notations /D/ /V/ /D V/ /D/ /Ĭ/ /I/; pronunciation /D Ĭ V I D/.
(Figure content) Pitman’s Shorthand outline for the word “rough”: notations /R/ /F/ /R F/ /Ŭ/; pronunciation /R Ŭ F/.
Objective: to convert the sound SHUN in the middle or at the end of a word into a small or
large hook. A sample Pitman’s Shorthand outline containing a large SHUN hook at the end
is illustrated in Figure 9.34.
Strategy:
1. if the sound SHUN appears at the beginning of a word, it is not converted into a
hook;
2. if the sound SHUN is preceded by a circle S or Z, it is converted into a small hook;
3. otherwise, the sound SHUN in the middle or at the end of an input is converted into
a large hook.
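The three cases reduce to a small function in a sketch; this assumes /SHUN/ arrives as a single phoneme token, as in the notation above, and that circle S or Z context is approximated by the preceding phoneme being /S/ or /Z/:

```python
def shun_primitive(phonemes, i):
    """Return the primitive for /SHUN/ at index i, or None if it is not
    converted into a hook. Implements conditions 1-3 above."""
    if i == 0:
        return None                     # condition 1: initial SHUN is not hooked
    if phonemes[i - 1] in ("S", "Z"):
        return "small hook"             # condition 2: after circle S or Z
    return "large hook"                 # condition 3: middle or final SHUN
```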
Figure 9.34: Illustration of the use of a large SHUN hook in a vocalised outline
The 37th rule
Complexity: indirect conversion
Objective: to convert the consonant /N/ of a word either into a small hook or a circle. A
sample Pitman’s outline containing a hook N at the end is illustrated in Figure 9.35.
Strategy:
1. if /N/ is at the beginning of an input, it is not converted into a small hook.
2. If /N S/ appears at the end of an input and is preceded by a curve stroke, the
consonant /N/ is not converted into a small hook.
3. If /N/ is immediately following /S/ or /Z/, it is not converted into a small hook.
4. If /N/ is at the end of an input, it is converted into a small hook.
(Figure content) Pitman’s Shorthand outline for the word “attention”: notations /T/ /N/ /SHUN/ /Ă/ /Ě/; pronunciation /Ă T Ĕ N SH Ĭ N/.
5. If /N Z/ appears at the end of an input and is preceded by a curve stroke, the sound
/N Z/ is converted into a series of a small hook followed by a small circle.
6. If /N Z/ or /N S/ appears at the end of an input and is preceded by a straight stroke,
the sound /N Z/ or /N S/ is converted into a small circle.
7. If the sound /N SES/ or /N ZES/ appears at the end of an input and is preceded by a
straight stroke, the sound /N SES/ or /N ZES/ is converted into a large circle.
8. If the sound /N STER/ or /N ST/ appears at the end of an input and is preceded by a
straight stroke, the sound /N STER/ or /N ST/ is converted into a large loop or small
loop respectively.
9. If the sound /N T S/ or /N D S/ appears at the end of an input and is preceded by a
straight stroke, the sound /N T S/ or /N D Z/ is converted into a small circle.
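The nine conditions above can be sketched as a single mapping from the word-final /N .../ cluster to a primitive sequence; whether the preceding stroke is a curve or a straight stroke is taken as a boolean input, since computing it from the outline is outside this sketch:

```python
def n_ending_primitive(phonemes, i, preceded_by_curve):
    """Map a word-final cluster starting at the /N/ at index i to primitives,
    following conditions 1-9 above; None means /N/ keeps its stroke form."""
    tail = phonemes[i:]
    if i == 0:
        return None                                        # condition 1
    if phonemes[i - 1] in ("S", "Z"):
        return None                                        # condition 3
    if tail == ["N"]:
        return ["small hook"]                              # condition 4
    if tail in (["N", "S"], ["N", "Z"]):
        if preceded_by_curve:
            # condition 2: curve + final /N S/ keeps stroke N;
            # condition 5: curve + final /N Z/ becomes hook + circle.
            return ["small hook", "small circle"] if tail[1] == "Z" else None
        return ["small circle"]                            # condition 6
    if not preceded_by_curve:
        if tail in (["N", "SES"], ["N", "ZES"]):
            return ["large circle"]                        # condition 7
        if tail == ["N", "STER"]:
            return ["large loop"]                          # condition 8
        if tail == ["N", "ST"]:
            return ["small loop"]                          # condition 8
        if tail in (["N", "T", "S"], ["N", "D", "S"]):
            return ["small circle"]                        # condition 9
    return None
```

For "alone" the final /N/ becomes a small hook, as in Figure 9.35.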
Figure 9.35: Illustration of the use of hook N at the end of a vocalised outline
The 38th rule
Complexity: direct conversion
Objective: to convert the consonant /L/ of a word into an upward stroke. A sample Pitman’s
Shorthand outline containing the consonant /L/ is illustrated in Figure 9.36.
Strategy: replace consonant /L/ with an upward stroke L.
(Figure content) Pitman’s Shorthand outline for the word “alone”: notations /L/ /N/ /N/ /Ă/ /Ō/; pronunciation /Ă L Ō N/.
Figure 9.36: Illustration of the use of an upward stroke L in a vocalised outline
The 39th rule
Complexity: indirect conversion
Objective: to omit the consonant /D/ or /T/ in a word of two or more syllables according to
the half-length rule of Pitman’s Shorthand (description of the rule can be referenced in
appendix B). A sample Pitman’s Shorthand outline containing the omission of /D/ and /T/ is
illustrated in Figure 9.37.
Strategy:
1. if neighbouring consonants of /T/ or /D/ are incompatible, the consonant /T/ or /D/ is
not omitted. A list of incompatible neighbouring consonants in relation to the
omission of /T/ or /D/ is given (Figure 9.38).
2. otherwise, /T/ or /D/ is omitted.
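Condition 1's incompatible series (listed in Figure 9.38) can be written as sets of alternatives per position and matched against the tail of the consonant sequence:

```python
# Incompatible neighbour series from Figure 9.38, one set of alternatives
# per position (e.g. F/V + K/G + T/D).
INCOMPATIBLE = [
    [{"F", "V"}, {"K", "G"}, {"T", "D"}],
    [{"K", "G"}, {"K", "G"}, {"T", "D"}],
    [{"L"}, {"K", "G"}, {"T", "D"}],
    [{"R"}, {"T", "D"}],
    [{"S"}, {"T"}],
]

def halving_allowed(consonants):
    """True if the trailing /T/ or /D/ may be omitted under the half-length
    rule, i.e. the tail of the sequence matches no incompatible series."""
    for series in INCOMPATIBLE:
        tail = consonants[-len(series):]
        if len(tail) == len(series) and all(c in s for c, s in zip(tail, series)):
            return False
    return True
```

For "deduct" the final K, T tail matches no series, so the /T/ is omitted, as in Figure 9.37.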
Figure 9.37: Illustration of omission of T or D in a vocalised outline
(Figure content) Pitman’s Shorthand outline for the word “deduct”: notations /D/ /D D/ /K/ /K T/ /Ĕ/ /Ŭ/; pronunciation /D Ĕ D Ŭ K T/.
(Figure content) Pitman’s Shorthand outline for the word “mail”: notations /M/ /L/ /Ā/; pronunciation /M Ā L/.
Figure 9.38: Illustration of incompatible combination of primitives for halving
The 40th rule
Complexity: indirect conversion
Objective: to convert the suffix –LY into a vowel primitive according to the –LY rule of
Pitman’s Shorthand (description of the rule can be referenced in appendix B). A sample
Pitman’s Shorthand outline containing the suffix –LY is illustrated in Figure 9.39.
Strategy: if the input contains the suffix –LY, replace the suffix –LY with a vowel Ĭ.
Figure 9.39: Illustration of the use of suffix –LY in a vocalised outline
The 41st rule
Complexity: direct conversion
Objective: to convert the consonant /R/ into an upward or downward stroke. Sample
Pitman’s Shorthand outlines containing an upward or downward R are illustrated in Figure
9.40.
Strategy:
(Figure content) Pitman’s Shorthand outline for the word “solely”: notations /S/ /L/ /Ō/ /Ĭ/; pronunciation /S Ō L L Ĭ/.
(Figure content) Series of primitives that cannot be represented by halving: F/V + K/G + T/D; K/G + K/G + T/D; L + K/G + T/D; R + T/D; S + T.
1. if /R/ appears at the beginning of an input, it is converted into an upward stroke.
2. If /R/ is followed by a sounded vowel at the end of an input, it is converted into an
upward stroke.
3. If /R/ appears at the end of an input, it is converted into a downward stroke.
4. If /R/ is preceded by a vowel at the beginning of an input, it is converted into a
downward stroke.
5. If /R/ is followed by a circle S or SES at the end of an input, it is converted into a
downward R.
6. If /R/ is followed by /M/, it is converted into a downward stroke.
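The six conditions can be sketched as an ordered test; the simplified vowel set is an assumption, and circle S or SES context is approximated here by the phoneme tokens /S/ and /SES/:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def r_direction(phonemes, i):
    """Choose 'up' or 'down' for /R/ at index i, trying conditions 1-6 in the
    order listed above; the final default is an assumption of this sketch."""
    before, after = phonemes[:i], phonemes[i + 1:]
    if not before:
        return "up"                               # condition 1: word-initial R
    if len(after) == 1 and after[0] in VOWELS:
        return "up"                               # condition 2: final sounded vowel
    if not after:
        return "down"                             # condition 3: word-final R
    if i == 1 and before[0] in VOWELS:
        return "down"                             # condition 4: vowel + R at start
    if after in (["S"], ["SES"]):
        return "down"                             # condition 5: circle S / SES
    if after[:1] == ["M"]:
        return "down"                             # condition 6
    return "up"                                   # default (assumption)
```

For the Figure 9.40 examples, "rail" takes the upward stroke (condition 1) and "erase" the downward stroke (condition 4).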
Figure 9.40: Illustration of the use of an upward or downward R in vocalised outlines
The 42nd rule
Complexity: indirect conversion
Objective: to convert the consonant /H/ at the beginning of a word into a dash primitive. A
sample Pitman’s Shorthand outline containing a dash H at the beginning is illustrated in
Figure 9.41.
Strategy: if /H/ appears at the beginning of an input and is followed by /M/, /L/ or downward
R, it is converted into a dash primitive.
(Figure content) Pitman’s Shorthand outlines for the words “rail” and “erase”: notations /R/ /L/ /R/ /S/ /Ĕ/ /Ā/; pronunciations /R Ā L/, /Ĕ R Ā S/.
Figure 9.41: Illustration of the use of a dash H in a vocalised outline
The 43rd rule
Complexity: indirect conversion
Objective: to reverse an orientation of initially hooked FR, VR, Thr or THR according to the
“reverse rule” of Pitman’s Shorthand (description of the rule can be referenced in appendix
B). A sample Pitman’s Shorthand outline containing a reversed VR hook is illustrated in
Figure 9.42.
Strategy: if a series of /small_hook + F/, /small_hook + V/, /small_hook + Thr/ or
/small_hook + THR/ is followed by an upstroke or a horizontal stroke, it is converted into
a series of /small hook + R/, /small hook + R/, /small hook + stroke S/ or
/small hook + stroke Z/ respectively.
Figure 9.42: Illustration of the use of reverse VR hook in a vocalised outline
The 44th rule
Complexity: direct conversion
(Figure content) Pitman’s Shorthand outline for the word “cover”: notations /K/ /V R/ /V R/ (reversed) /Ŭ/; pronunciation /K Ŭ V Ĭ R/.
(Figure content) Pitman’s Shorthand outline for the word “home”: notations /H/ /H/ /M/ /Ō/; pronunciation /H Ō M/.
Objective: to convert consonants that have not been converted into geometric features into
their corresponding primitives.
Strategy:
1. replace the consonant /P/ with stroke P
2. replace the consonant /B/ with stroke B
3. replace the consonant /T/ with stroke T
4. replace the consonant /D/ with stroke D
5. replace the consonant /K/ with stroke K
6. replace the consonant /G/ with stroke G
7. replace the consonant /M/ with stroke M
8. replace the consonant /N/ with stroke N
9. replace the consonant /NG/ with stroke NG
10. replace the consonant /F/ with stroke F
11. replace the consonant /V/ with stroke V
12. replace the consonant /Th/ with stroke Th
13. replace the consonant /TH/ with stroke TH
14. replace the consonant /W/ with a series of small hook followed by upward R
15. replace the consonant /Y/ with a series of small hook followed by upward R
16. replace the consonant /CH/ with stroke CH
17. replace the consonant /JH/ with stroke JH
18. replace the consonant /SH/ with stroke SH
19. replace the consonant /S/ with a small circle
20. replace the consonant /Z/ with a small circle
21. replace the consonant /ZH/ with a stroke ZH
22. replace the consonant /H/ with a series of small circle, followed by stroke R
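This default mapping is again a fixed table; the following sketch transcribes the twenty-two substitutions above into a dictionary and applies it to whatever consonants the earlier rules left unconverted (the primitive labels are illustrative, not the thesis's internal encoding):

```python
# Default consonant-to-primitive table for rule 44 (labels are illustrative).
DEFAULT_PRIMITIVES = {
    "P": "stroke P", "B": "stroke B", "T": "stroke T", "D": "stroke D",
    "K": "stroke K", "G": "stroke G", "M": "stroke M", "N": "stroke N",
    "NG": "stroke NG", "F": "stroke F", "V": "stroke V", "Th": "stroke Th",
    "TH": "stroke TH", "CH": "stroke CH", "JH": "stroke JH", "SH": "stroke SH",
    "ZH": "stroke ZH", "S": "small circle", "Z": "small circle",
    "W": "small hook + upward R", "Y": "small hook + upward R",
    "H": "small circle + stroke R",
}

def apply_rule_44(consonants):
    """Replace every remaining consonant with its default primitive; anything
    already converted by rules 13-43 is passed through unchanged."""
    return [DEFAULT_PRIMITIVES.get(c, c) for c in consonants]
```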
The 45th rule: extract vowels of a word and append them to the end of consonant primitives.
The 46th rule: convert vowels into their corresponding geometric primitives.
Appendix B
Certain Rules of Pitman’s Shorthand Mentioned in the Discussion
on the Automatic Generation of a Machine-readable Pitman’s
Shorthand Lexicon
Rule’s name and description of the rule as stated in Pitman’s Shorthand [Oj95]:

MD, ND: Strokes M and N are halved and thickened to add the following sound of D.
Double-length strokes: All curved strokes are doubled in length to represent the addition of the syllables –TER, –DER, –THER and –TURE.
Half-length strokes: In words of two or more syllables a stroke is generally halved to indicate the following sound T or D.
Suffix –LY: The suffix –LY is represented by upward L and the third-place Ĭ vowel.
Reversed FR, VR, Thr, THR: The initially hooked FR, VR, Thr and THR are always reversed when immediately following upstrokes and horizontals.