Lexicon Organisation and
Contextual Methods for Online
Handwritten Pitman’s
Shorthand Recognition
by Swe Myo Htwe, BSc
Thesis submitted to the University of Nottingham for the degree of
Doctor of Philosophy
School of Computer Science and Information Technology
December 2006
To my parents and fiancé
Abstract
This research investigates innovations for the computer transcription of handwritten Pitman’s
Shorthand as a rapid means of text entry (up to 100 words per minute) into today’s pen-based
handheld devices.
Two mathematical models are developed in this work. The first model deals with high-level
phonetic-based translation, while the second model is specifically concerned with low-level
primitive-based translation. Both models are closely related to the lexicon organisation and
contextual processing for online handwritten Pitman's Shorthand recognition.
A number of research issues that arise from interpreting handwritten Pitman's Shorthand strokes
of digital ink as text are addressed, including: (a) a feasibility study into improving a conventional
phonetic-based transliteration approach to advance word recognition; (b) an investigation into
new Bayesian Network modelling of strokes and their relationships in order to solve the problem
of geometric variations and vowel ambiguities of handwritten Pitman's Shorthand; (c) the generation
of a new machine-readable Pitman's Shorthand lexicon to facilitate the direct transcription of
geometric features of Pitman's Shorthand into English text; (d) an analysis of the impact of
statistical language modelling on handwritten phrase recognition; and (e) a discussion of the
graphical user interface issues in relation to the development of a commercial prototype from the
frame of reference of this research.
The research has been carried out in close cooperation with Nanyang Technological University
(NTU) in Singapore. The system is currently undergoing a final evaluation in terms of its
recognition accuracy, as well as its potential to be introduced as a commercially viable fast text
input system.
Acknowledgements
I would like to take this opportunity to express my sincere gratitude to my supervisor, Dr.
Colin Higgins, for his valuable guidance and constant support from the day I stepped
into the School of Computer Science of the University of Nottingham until the
successful completion of this research.
My sincere gratitude also goes to Professor Graham Leedham for his dedicated guidance and
genuine assistance in maintaining the close collaboration between the two participating teams
of this research. My deepest thanks also go to Ma Yang for her heartfelt contribution and
her immediate responses during the critical time of this collaborative research.
My sincere thanks also go to Ms. Joyce Cox for her kind and professional help in
proofreading the English of this thesis. Also, from the bottom of my heart, I am very
grateful to all the participants who helped with the experiments of this research. Many
thanks also go to my colleagues in the LTR Research group for their warm friendship that
made me feel at home in our LTR lab.
Also, my endless thanks go to my uncle, Dr. Kyin Win, for supporting me financially and
emotionally to make my dream of undertaking doctoral research come true. My
heartfelt thanks also go to the International Office and the School of Computer Science of
the University of Nottingham for their enormous financial support for the development of
this research.
Also, my sincere love and thanks to my parents, fiancé, and all my friends in Nottingham for
supporting me financially, emotionally and spiritually during the difficult days of my long
residence in Nottingham.
Last but not least, my sincere thanks to all the members of the School of Computer Science
of the University of Nottingham for all their help and advice, given to me when I needed it
most.
Thank you all, Swe Myo Htwe.
Table of contents
ABSTRACT …………………………………………………………………………………………..II
ACKNOWLEDGEMENTS…………………………………………………………………………III
TABLE OF CONTENTS …………………………………………………………………………...IV
LIST OF FIGURES ………………………………………………………………………………
LIST OF TABLES ………………………………………………………………………………….
1 LINGUISTIC POST PROCESSING OF HANDWRITTEN PITMAN’S SHORTHAND .. 1
CHAPTER 1 INTRODUCTION ................................................................................................... 2
1.1 BACKGROUND ............................................. 3
1.1.1 Collaboration ............................................. 3
1.1.2 Motivation ............................................. 4
1.1.3 Scope ............................................. 5
1.2 BRIEF OVERVIEW ............................................. 7
1.2.1 General Objectives and Contributions ............................................. 7
1.3 SYNOPSIS OF THE DISSERTATION ....................................................................................... 12
2 BACKGROUND TO THE AUTOMATIC RECOGNITION OF HANDWRITTEN PITMAN’S SHORTHAND ............................................................................................................... 15
CHAPTER 2 INTRODUCTION ................................................................................................. 16
2.1 EVALUATION OF EXISTING TEXT INPUT SYSTEMS FOR HANDHELD DEVICES ............................................. 16
2.1.1 On-screen keyboards vs. a handwritten Pitman’s Shorthand recognizer ............................................. 17
2.1.2 A cursive handwriting recognizer vs. a handwritten Pitman’s Shorthand recognizer ............................................. 17
2.1.3 Gesture based text entry systems vs. a handwritten Pitman’s Shorthand recognizer ............................................. 18
2.1.4 Speech recognition systems vs. a handwritten Pitman’s Shorthand recognizer ............................................. 19
2.2 PITMAN’S SHORTHAND: A BRIEF OVERVIEW ............................................. 20
2.3 AUTOMATIC RECOGNITION OF HANDWRITTEN PITMAN’S SHORTHAND: AN OVERVIEW ............................................. 23
2.4 HANDWRITING RECOGNITION ALGORITHMS TO IMPROVE A WORD LEVEL TRANSLITERATION ............................................. 26
2.4.1 Hidden Markov Models (HMMs) ............................................. 27
2.4.2 Neural Networks ............................................. 28
2.4.3 Bayesian Networks ............................................. 29
2.4.3.1 Conditional independence ............................................. 30
2.4.3.2 Inference ............................................. 32
2.4.3.3 Learning ............................................. 33
2.5 NATURAL LANGUAGE PROCESSING ALGORITHMS FOR HANDWRITTEN PHRASE RECOGNITION ............................................. 35
2.5.1 Statistical language modeling ............................................. 35
2.5.2 Viterbi algorithm ............................................. 36
2.6 PEN APPLICATION PROGRAM INTERFACES (APIS) ............................................. 37
2.7 SUMMARY ............................................. 38
3 EVALUATION OF PHONETIC BASED TRANSCRIPTION OF VOCALISED HANDWRITTEN PITMAN’S OUTLINES..................................................................................... 39
CHAPTER 3 INTRODUCTION ................................................................................................. 40
3.1 SYSTEM OVERVIEW ............................................. 41
3.2 TRANSCRIPTION OF VOCALIZED OUTLINES BASED ON A PHONETIC APPROACH ............................................. 43
3.3 LEXICON PREPARATION ............................................. 44
3.4 NEAREST NEIGHBOURHOOD QUERY (NNQ) ............................................. 47
3.5 FEATURE TO PHONEME CONVERSION ............................................. 49
3.6 PHONEME ORDERING ............................................. 51
3.7 EXPERIMENTAL RESULTS ............................................. 54
3.7.1 Data Set ............................................. 54
3.7.2 Analysis of a phonetic lexicon ............................................. 55
3.7.3 Performance evaluation of the word level transcription ............................................. 57
3.8 DISCUSSION ............................................. 58
4 BAYESIAN NETWORK BASED WORD TRANSCRIPTION ........................................... 61
CHAPTER 4 INTRODUCTION ................................................................................................. 62
4.1 SYSTEM OVERVIEW ............................................. 63
4.2 SUMMARY OF BAYESIAN NETWORK BASED WORD TRANSCRIPTION ............................................. 64
4.3 LIFE CYCLE OF OUTLINE MODELS ............................................. 65
4.4 OUTLINE MODEL ARCHITECTURE ............................................. 67
4.4.1 Nodes of an outline model ............................................. 68
4.4.2 Relationships between nodes ............................................. 73
4.5 INFERENCE ............................................. 74
4.5.1 Message Initialization ............................................. 75
4.5.2 Belief Updating ............................................. 76
4.6 LEARNING OF OUTLINE MODELS ............................................. 78
4.6.1 Learning of consonant primitives ............................................. 79
4.6.2 Learning of vowel primitives ............................................. 81
4.7 MODEL SELECTION ............................................. 82
4.8 EXPERIMENTAL RESULTS ............................................. 86
4.8.1 Data set ............................................. 87
4.8.2 Evaluation of the recognition engine ............................................. 89
4.8.3 Evaluation of the word transcription accuracy ............................................. 93
4.8.4 Analysis of word transcription accuracy using the single consonant data set ............................................. 94
4.8.4.1 Analysis of the recognition accuracy vs. the transcription accuracy ............................................. 94
4.8.4.2 Analysis of the accuracy of a result list ............................................. 95
4.8.4.3 Analysis of the correction accuracy vs. the classification/vowel errors ............................................. 97
4.8.4.4 Analysis of factors influencing the accuracy of a result list ............................................. 98
4.8.5 Analysis of word transcription accuracy using the stroke-combination data set ............................................. 99
4.8.5.1 Analysis of the recognition accuracy vs. the transcription accuracy ............................................. 99
4.8.5.2 Analysis of the accuracy of a result list ............................................. 100
4.8.5.3 Analysis of the correction accuracy vs. the classification/vowel errors ............................................. 101
4.8.5.4 Analysis of factors influencing the accuracy of a result list ............................................. 102
4.8.6 Analysis of word transcription accuracy for the special-rule data set ............................................. 103
4.8.6.1 Analysis of the recognition accuracy vs. the transcription accuracy ............................................. 103
4.8.6.2 Analysis of the accuracy of the result list ............................................. 104
4.8.6.3 Analysis of the correction accuracy vs. the classification/vowel errors ............................................. 105
4.8.6.4 Analysis of factors influencing the accuracy of a result list ............................................. 106
4.9 DISCUSSION ..................................................................................................................... 107
5 GENERATION OF A MACHINE-READABLE PITMAN’S SHORTHAND LEXICON ............................................. 110
CHAPTER 5 INTRODUCTION ............................................................................................... 111
5.1 OVERVIEW ............................................. 112
5.1.1 Rule-based creation of the electronic Pitman’s Shorthand lexicon ............................................. 113
5.2 STRUCTURE OF THE ELECTRONIC PITMAN’S SHORTHAND LEXICON ............................................. 114
5.2.1 Feature set ............................................. 114
5.2.2 Key ............................................. 115
5.2.3 Lexicon layout ............................................. 116
5.3 CONVERSION PROCEDURE ............................................. 118
5.3.1 The importance of algorithms of the presented rules ............................................. 119
5.3.2 Description of Rules ............................................. 120
5.4 EXPERIMENTAL RESULTS ............................................. 127
5.4.1 Data set ............................................. 127
5.4.2 Analysis of the accuracy of a machine readable Pitman’s Shorthand lexicon ............................................. 128
5.4.3 Analysis of the distribution of homophones in machine-readable Pitman’s Shorthand lexicons ............................................. 134
5.5 DISCUSSION ..................................................................................................................... 136
6 PHRASE LEVEL TRANSCRIPTION OF ONLINE HANDWRITTEN PITMAN’S SHORTHAND OUTLINES ............................................................................................................ 137
CHAPTER 6 INTRODUCTION ............................................................................................... 138
6.1 CONTEXTUAL REJECTION STRATEGY ............................................. 139
6.2 HANDWRITTEN PITMAN’S SHORTHAND PHRASE RECOGNITION ............................................. 141
6.3 THE INTEGRATION OF A PITMAN’S SHORTHAND PHRASE RECOGNISER WITH APIS ............................................. 143
6.4 EXPERIMENTAL RESULTS ............................................. 146
6.5 DISCUSSION ............................................. 146
7 GRAPHICAL USER INTERFACES OF THE HANDWRITTEN PITMAN’S SHORTHAND RECOGNITION SYSTEM................................................................................... 148
CHAPTER 7 INTRODUCTION ............................................................................................... 149
7.1 OVERVIEW ............................................. 150
7.2 INK DATA COLLECTION IN THIS RESEARCH ............................................. 151
7.3 GENERAL TRAINING DATA COLLECTION TOOL ............................................. 155
7.4 DEVELOPER GRAPHICAL USER INTERFACE ............................................. 158
7.5 SHORTHAND DATA ENTRY GRAPHICAL USER INTERFACES ............................................. 159
7.6 EXPERIMENTAL RESULTS ............................................. 164
7.6.1 Analysis of the general distribution of user fondness for the presented prototypes ............................................. 166
7.6.2 Analysis of the distribution of user fondness for the presented prototypes in the case of speed writing ............................................. 167
7.6.3 Analysis of the distribution of user fondness for the presented prototypes in the case of a small amount of text entry into handheld devices ............................................. 167
7.6.4 The comparison of the most favourite GUI of experienced shorthand writers and that of novice shorthand writers ............................................. 168
7.7 DISCUSSION ..................................................................................................................... 169
8 CONCLUSION ....................................................................................................................... 171
CHAPTER 8 INTRODUCTION ............................................................................................... 172
8.1 RESEARCH WORK SUMMARY ............................................. 172
8.2 CONTRIBUTION ............................................. 174
8.3 FUTURE WORK ............................................. 175
8.3.1 Improvement upon the overall system ............................................. 175
8.3.2 Application of the presented system to the real life problems ............................................. 177
8.4 DISSEMINATION ............................................................................................................... 177
REFERENCES................................................................................................................................. 180
APPENDIX .............................................
FIGURE 1.1: SCOPE OF THE THESIS ............................................. 6
FIGURE 1.2: A HIGH LEVEL VIEW OF THE SCOPE OF THE RECOGNITION ENGINE AND THE TRANSCRIPTION ENGINE ............................................. 9
FIGURE 2.1: ILLUSTRATION OF TEXT ENTRY USING SHARK SYSTEM (A) THE WORD “QUICK” IS WRITTEN USING ATOMIK KEYBOARD LAYOUT (B) THE WORD “QUICK” IS WRITTEN WITHOUT USING A TEMPLATE KEYBOARD ............................................. 19
FIGURE 2.2: BASIC CONSONANTS OF PITMAN’S SHORTHAND AS ILLUSTRATED IN [OJ95] ......................................................................................................................................... 21
FIGURE 2.3: /W/, /Y/, /H/ CONSONANTS OF PITMAN’S SHORTHAND ............................................. 21
FIGURE 2.4: VOWELS, DIPHTHONGS AND DIPHONES OF PITMAN’S SHORTHAND ............................................. 21
FIGURE 2.5: ILLUSTRATION OF VOCALIZED OUTLINES ............................................. 22
FIGURE 2.6: (A) BASIC NOTATIONS OF PITMAN’S SHORTHAND (B) THE WORD “PLAY” IS WRITTEN PHONETICALLY USING BASIC NOTATIONS (C) THE WORD “PLAY” IS WRITTEN USING A SPECIAL RULE OF PITMAN’S SHORTHAND ............................................. 22
FIGURE 2.7: (A) SAMPLES OF SHORT FORMS (B) SAMPLES OF PHRASES ............................................. 23
FIGURE 2.8: A SAMPLE HMM MODEL FOR A SINGLE OUTLINE OF PITMAN’S SHORTHAND; AT EACH STATE i, THE PROBABILITY βi OF A PARTICULAR STROKE si BEING OF TYPE ti IS OBSERVED ............................................. 27
FIGURE 2.9: AN INDIVIDUAL CELL A OF A NEURAL NETWORK, MODELLED FOR THE CLASSIFICATION OF HANDWRITTEN PITMAN’S SHORTHAND IN [LQ90] ............................................. 28
FIGURE 2.10: ILLUSTRATION OF STROKE DEPENDENCIES IN PITMAN’S SHORTHAND (A) VOWEL DEPENDENCY (B) POSITIONAL DEPENDENCY OF THE FIRST CONSONANT PRIMITIVE ............................................. 30
FIGURE 2.11: C IS CONDITIONALLY INDEPENDENT OF W GIVEN R .................................... 31
FIGURE 2.12: S IS CONDITIONALLY DEPENDENT ON R GIVEN OBSERVED DATA W ............................................. 31
FIGURE 2.13: ILLUSTRATION OF THE BAYES BALL ALGORITHM [SR98]. IF THERE IS NO FLOW OF A BALL FROM A TO B IN A GRAPH, A AND B ARE CONDITIONALLY INDEPENDENT GIVEN A SET OF OBSERVED OR HIDDEN VARIABLES X AND VICE VERSA. ...................................................................................................................................... 32
FIGURE 3.1: AN ABSTRACT VIEW OF THE WHOLE SYSTEM ............................................. 42
FIGURE 3.2: DETAILED VIEW OF A VOCALIZED OUTLINE INTERPRETER ............................................. 44
FIGURE 3.3: ILLUSTRATION OF SAMPLE WORDS IN NORMAL ENGLISH AND PITMAN’S SHORTHAND ............................................. 45
FIGURE 3.4: SAMPLE NEIGHBOURHOODS PREDEFINED IN THE NEAREST NEIGHBOURHOOD QUERY APPROACH ......................................................................... 47
FIGURE 3.5: SAMPLE OUTPUT PRODUCED BY THE NEAREST NEIGHBOURHOOD QUERY ...................................................................................................................................... 48
FIGURE 3.6: SAMPLE OF PHONEME TRANSLATION OF A DOUBLE LENGTH STROKE.................................................................................................................................................... 51
FIGURE 3.7: (A) SAMPLE INPUT OF PHONEME ORDERING PROCESS (B) SAMPLE OUTPUT OF PHONEME ORDERING PROCESS ............................................................. 52
FIGURE 3.8: SAMPLE ELEMENT OF A PHONETIC LEXICON IN A HASH TABLE ............................................. 54
FIGURE 3.9: SAMPLE COLLECTED OUTLINES ............................................. 55
FIGURE 3.10: THE DISTRIBUTION OF HOMOPHONES IN DIFFERENT SIZED PHONETIC LEXICONS ............................................. 55
FIGURE 3.11: ILLUSTRATION OF THE INCIDENCE OF PHONEME VARIATION DUE TO CONFUSION BETWEEN A CIRCLE AND A HOOK ............................................. 59
FIGURE 3.12: ILLUSTRATION OF THE INCIDENCE OF PHONEME VARIATION DUE TO LENGTH CONFUSION ............................................. 60
FIGURE 4.1: AN ABSTRACT VIEW OF THE WHOLE SYSTEM ............................................. 63
FIGURE 4.2: ILLUSTRATION OF BAYESIAN NETWORK BASED WORD TRANSCRIPTION ................................................................................................................... 64
FIGURE 4.4 ILLUSTRATION OF THREE PAIRS OF SIMILAR OUTLINES GROUPED IN AN OUTLINE MODEL.................................................................................................................... 66
FIGURE 4.5 LIFE CYCLE OF OUTLINE MODELS ............................................. 67
FIGURE 4.6 ILLUSTRATION OF DIFFERENT CHRONOLOGICAL WRITING ORDER OF NORMAL ENGLISH AND PITMAN’S SHORTHAND ............................................. 68
FIGURE 4.7 ILLUSTRATION OF UNIQUE NODES OF AN OUTLINE MODEL ............................................. 69
FIGURE 4.8 ILLUSTRATION OF STEP BY STEP CREATION OF OUTLINE MODELS ............................................. 71
FIGURE 4.9 SAMPLE TRAINING DATA FOR THE WORD “BAKE” PROCESSED BY THE RECOGNITION ENGINE; THE ITALIC TEXT ON THE RIGHT EXPLAINS WHAT EACH LINE OF DATA REPRESENTS ............................................. 72
FIGURE 4.10 ILLUSTRATION OF CONDITIONAL DEPENDENCY OF VARIABLES IN AN OUTLINE MODEL USING THE BAYES BALL ALGORITHM [SR98]; IF THERE IS NO FLOW OF A BALL FROM A TO B IN A GRAPH, A AND B ARE CONDITIONALLY INDEPENDENT GIVEN A SET OF OBSERVED OR HIDDEN VARIABLES X AND VICE VERSA ............................................. 74
FIGURE 4.11 ILLUSTRATION OF OUTLINE MODEL SELECTION STRATEGIES................... 86
FIGURE 4.12: SAMPLES OF THE STROKE COMBINATION DATA SET ............................................. 87
FIGURE 4.13: TWO DIFFERENT SHORTHAND OUTLINES FOR THE WORD “AFTER”; (A) THE WORD “AFTER” IS WRITTEN ACCORDING TO THE DIRECT CONVERSION OF PHONEMES INTO PRIMITIVES (B) THE WORD “AFTER” IS WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMAN’S SHORTHAND ............................................. 88
FIGURE 4.14: SCREEN SHOT OF OUTLINES WRITTEN BY WRITER A ............................................. 89
FIGURE 4.15: EVALUATION OF THE VOCALISED OUTLINE IDENTIFICATION OF THE RECOGNITION ENGINE ............................................. 90
FIGURE 4.16: EVALUATION OF THE SEGMENTATION ACCURACY OF THE RECOGNITION ENGINE ............................................. 92
FIGURE 4.17: EVALUATION OF THE CLASSIFICATION ACCURACY OF THE RECOGNITION ENGINE ............................................. 93
FIGURE 4.18: ILLUSTRATION OF A RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND TRANSCRIPTION ACCURACY OF THE SINGLE CONSONANT DATA SET ............................................. 95
FIGURE 4.19: COMPARISON OF THE HANDWRITING OF TWO WRITERS ............................................. 96
FIGURE 4.20: ILLUSTRATION OF THE WORD TRANSCRIPTION ACCURACY OF THE SINGLE CONSONANT DATA SET ............................................. 96
FIGURE 4.21: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON WITH THE CLASSIFICATION OR VOWEL ERRORS OF THE SINGLE CONSONANT DATA SET ............................................. 98
FIGURE 4.22: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS INFLUENCING THE ACCURACY OF A RESULT LIST (SINGLE CONSONANT DATA SET) ........................................................................................................................................... 99
FIGURE 4.23: ILLUSTRATION OF THE RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND TRANSCRIPTION ACCURACY OF THE STROKE-COMBINATION DATA SET ............................................. 100
FIGURE 4.24: ILLUSTRATION OF THE WORD TRANSCRIPTION ACCURACY OF THE STROKE-COMBINATION DATA SET ................................................................................ 101
FIGURE 4.25: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON WITH THE CLASSIFICATION/VOWEL ERRORS OF THE STROKE COMBINATION DATA SET............................................................................................................................... 102
FIGURE 4.26: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS INFLUENCING THE ACCURACY OF A RESULT LIST (STROKE-COMBINATION DATA SET) ............................................................................................................................. 103
FIGURE 4.27: RELATIONSHIP BETWEEN RECOGNITION ACCURACY AND TRANSCRIPTION ACCURACY OF THE SPECIAL-RULE DATA SET ........................ 104
FIGURE 4.28: EVALUATION OF THE WORD TRANSCRIPTION ACCURACY OF THE SPECIAL-RULE DATA SET ................................................................................................. 105
FIGURE 4.29: ILLUSTRATION OF THE CORRECTION ACCURACY IN COMPARISON WITH CLASSIFICATION OR VOWEL ERRORS OF THE SPECIAL-RULE DATA SET.................................................................................................................................................. 106
FIGURE 4.30: ILLUSTRATION OF AN AVERAGE DISTRIBUTION OF FACTORS INFLUENCING THE ACCURACY OF A RESULT LIST (SPECIAL-RULE DATA SET).................................................................................................................................................. 107
FIGURE 5.1: (A) SAMPLE ENTRIES OF A CONVENTIONAL PITMAN’S SHORTHAND DICTIONARY AVAILABLE IN BOOK FORMAT (B) SAMPLE ENTRIES OF AN ELECTRONIC PITMAN’S SHORTHAND LEXICON ............................................. 112
FIGURE 5.2: SAMPLE KEYS OF THE ELECTRONIC PITMAN’S SHORTHAND LEXICON; VOWELS ARE UNDERLINED ............................................................................................. 115
FIGURE 5.3: SAMPLE ENTRIES OF THE ELECTRONIC PITMAN’S SHORTHAND LEXICON ............................................. 116
FIGURE 5.4: ILLUSTRATION OF THE CONVERSION PROCEDURE ............................................. 119
FIGURE 5.5: ILLUSTRATION OF THE USE OF A DOT PRIMITIVE FOR THE SOUND COM AT THE BEGINNING OF A WORD ............................................. 123
FIGURE 5.6: ILLUSTRATION OF THE USE OF NEGATIVE PREFIX IR- IN A VOCALISED OUTLINE ............................................. 124
FIGURE 5.7: ILLUSTRATION OF THE USE OF PL HOOK IN A VOCALISED OUTLINE ............................................. 125
FIGURE 5.8: ILLUSTRATION OF A ONE SYLLABLE HALF-LENGTH OUTLINE ............................................. 126
FIGURE 5.9: ILLUSTRATION OF THE OMISSION OF THE SYLLABLE –TER IN A VOCALISED OUTLINE ......................................................................................................... 126
FIGURE 5.10: ILLUSTRATION OF INCOMPATIBLE PRIMITIVE PAIRS FOR DOUBLING.................................................................................................................................................. 127
FIGURE 5.11: SAMPLE ENTRIES OF A MACHINE-READABLE PITMAN’S SHORTHAND LEXICON ............................................. 128
FIGURE 5.12: AVERAGE ACCURACIES OF DIFFERENT SIZES OF MACHINE-READABLE PITMAN’S SHORTHAND LEXICONS.......................................................... 129
FIGURE 5.13: TWO DIFFERENT OUTLINES FOR THE WORD “WEATHER”; (A) THE WORD “WEATHER IS WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMAN’S SHORTHAND; (B) THE WORD “WEATHER” IS NOT WRITTEN ACCORDING TO THE DOUBLE-LENGTH RULE OF PITMAN’S SHORTHAND ....... 131
FIGURE 5.14: (A) SHORTHAND OUTLINE FOR THE WORD “FACTOR”; (B) SHORTHAND OUTLINE FOR THE WORD “FURTHER”................................................ 131
FIGURE 5.15: TWO DIFFERENT SHORTHAND OUTLINES FOR THE WORD “UNION” . 132FIGURE 5.16: TWO DIFFERENT OUTLINES FOR THE WORD “LANDLORD” .................. 132FIGURE 5.17: TWO DIFFERENT OUTLINES FOR THE WORD “ENVIRONMENT”........... 133FIGURE 5.18: THE DISTRIBUTION OF DIFFERENT CATEGORIES OF ERRORS IN
ELECTRONIC PITMAN’S SHORTHAND LEXICONS OF DIFFERENT SIZES ........... 133FIGURE 5.19: THE DISTRIBUTION OF UNIQUENESS OF THE ELECTRONIC PITMAN’S
SHORTHAND LEXICONS..................................................................................................... 134FIGURE 6.1: SAMPLES OF PITMAN’S SHORTHAND OUTLINES WRITTEN IN THREE
DIFFERENT POSITIONS; (A) OUTLINES WRITTEN INCLUDING VOWEL NOTATIONS, (B) OUTLINES WRITTEN WITHOUT VOWEL NOTATIONS ................ 140
FIGURE 6.2: ILLUSTRATION OF THE HANDWRITTEN PITMAN’S SHORTHAND PHRASE LEVEL TRANSCRIPTION PROCESS................................................................................ 141
FIGURE 6.3: AN ABSTRACT VIEW OF THE OBJECT MODEL “MICROSOFT.INK”.......... 144FIGURE 6.4: SCREEN SHOTS OF THE RECOGNITION RESULTS PRODUCED BY THE
“RECOGNISERCONTEXT” API .......................................................................................... 145FIGURE 6.5: PERFORMANCE OF THE CONTEXTUAL REJECTION STRATEGY ........... 146FIGURE 7.1: FRONT-END AND BACK-END ARCHITECTURE OF THE SYSTEM ............ 149FIGURE 7.2: ILLUSTRATION OF INTERACTIONS BETWEEN USER INTERFACES AND
BACK-END ENGINES OF THE SYSTEM ......................................................................... 150FIGURE 7.3 ILLUSTRATION OF THE TABLET PC PLATFORM APIS PRESENTED AT
[REF] ........................................................................................................................................ 152FIGURE 7.4: ILLUSTRATION OF THE HIGH LEVEL RELATIONSHIP OF OBJECT
MODELS OF THE TABLET PC PLATFORM APIS ........................................................... 154FIGURE 7.5: HOME PAGE OF THE TRAINING DATA COLLECTOR ................................... 155FIGURE 7.6: SAMPLE DATA ENTRY PAGE OF THE TRAINING DATA COLLECTOR GUI
.................................................................................................................................................. 156FIGURE 7.7: SCREEN SHOT OF THE DEVELOPER GRAPHICAL INTERFACE .............. 158FIGURE 7.8: THE FIRST VERSION OF THE COLLABORATOR’S TABLET PC INTERFACE
FOR THE HANDWRITTEN PITMAN’S SHORTHAND RECOGNITION SYSTEM ...... 160FIGURE 7.9: THE LATEST VERSION OF THE COLLABORATOR’S TABLET PC
INTERFACE FOR PITMAN’S SHORTHAND RECOGNITION SYSTEM...................... 161FIGURE 7.10: SCREENSHOT OF A NOTE-PAD LAYOUT OF THE END-USER
INTERFACE OF THIS RESEARCH.................................................................................... 163FIGURE 7.11: SCREENSHOT OF AN ALTERNATIVE LAYOUT OF THE END-USER
INTERFACE OF THIS RESEARCH.................................................................................... 164FIGURE 7.12: THUMBNAILS OF THE FOUR GUIS EVALUATED IN THE EXPERIMENT165
FIGURE 7.13: THE GENERAL DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED PROTOTYPES .... 166
FIGURE 7.14: THE DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED PROTOTYPES IN THE CASE OF SPEED WRITING .... 167
FIGURE 7.15: THE DISTRIBUTION OF USER FONDNESS FOR THE PRESENTED PROTOTYPES IN THE CASE OF A SMALL AMOUNT OF TEXT ENTRY INTO HANDHELD DEVICES .... 168
FIGURE 7.16: THE COMPARISON OF THE MOST FAVOURITE GUI OF EXPERIENCED SHORTHAND WRITERS AND THAT OF NOVICE SHORTHAND WRITERS .... 169
1 Linguistic Post Processing of Handwritten Pitman’s Shorthand
Chapter 1 Introduction
Recently, there has been a dramatic growth in the use of handheld devices as powerful
appliances to collect and distribute information efficiently. Examples are provided by
companies and organizations worldwide who are implementing mobile business solutions to
accelerate business cycles, increase productivity and reduce operating costs by the use of
mobile phones, tablet PCs, pocket PCs and Personal Digital Assistants (PDAs). Current
handheld computers are applicable to daily business procedures; however, the ultimate
usefulness of these handheld devices depends on a solution to a serious bottleneck: textual
information needs to be entered as quickly and accurately as possible, similar to using a full-size
keyboard. Computers continue to get smaller and thinner, with the thinnest tablet PC,
recently launched by NEC, merely 1 cm thick and weighing less than 1 kg at the time of
writing. The transformation of a standard "QWERTY" keyboard into these compact devices
has not been so effective; miniature keyboards make text entry very slow at less than 10
words per minute (wpm) [Mt98].
This bottleneck has been a major concern for manufacturers of handheld devices and
decades of research and development have been invested in inventing a feasible means of
text entry into mobile devices, resulting in commercial systems with four main types of text
input methods: (a) on-screen keyboards, (b) handwriting recognition systems, (c) gesture
based text entry systems, and (d) speech recognition systems. The existing systems meet the
fundamental requirement of inputting text into handheld devices, but a practical solution
for rapid text input into such devices has yet to be found.
This dissertation presents work on the research, design, implementation and evaluation of
techniques that facilitate rapid text entry into a pen based computer, approximately at the
same rate as speech (i.e., more than 100 words per minute). It is based on Pitman’s
Shorthand, which is a speed-writing mechanism widely practiced in the real time reporting
community.
This chapter gives an overview of the linguistic post processing system of a handwritten
Pitman’s Shorthand recognizer. It mainly highlights the motivation and scope of the work.
It also outlines the general objectives of the thesis and draws attention to the author’s
contribution to achieve each objective. A synopsis of the thesis that explains the structure of
the dissertation along with a brief summary for each chapter is given at the end of the
chapter.
1.1 Background
1.1.1 Collaboration
Research in this thesis has been carried out in close cooperation with Nanyang
Technological University (NTU) in Singapore to the extent that a team from NTU
contributed to the research and development of low level classification of handwritten ink
data, and a team from the University of Nottingham contributed to transliteration of
classified primitives into English words. The collaboration has been a great success with
several workshops carried out at NTU annually as well as with a series of co-authored
publications [HHL+04a], [HHL+04b], [HHL+04c], [YLH+04a], [YLH+04b], [HHL+05a],
[HHL+05b], [HHL+05c], [YLH+05a], [YLH+05b], [YLH+05c]. In addition, concurrent
development of the two engines (i.e., recognition and transcription engines) has not been
difficult, mainly due to the accessibility of the classified data of the recognition engine since
the start of the project. This is because the collaborator has already carried out extensive
research on the low level segmentation and classification of handwritten Pitman’s Shorthand
outlines for over two decades and the collaborator’s contribution to this research is, in fact,
improving an existing recognition engine rather than developing a completely new one.
Previous work done by our collaborator can be referenced in [Lg84], [LD84], [LDB84],
[LDB85], [LD86], [LD87], [QL89], [LQ89], [Lg89], [Lg90], [LQ90], [QL91], [NL92],
[LQ92], [QL93]. The transcription engine and the work described in this thesis are, however,
new.
1.1.2 Motivation
The major motive behind this research has been to investigate the linguistic post processing
of handwritten Pitman’s Shorthand as a rapid means of text entry on handheld devices and
an evaluation of the overall performance via a tablet PC based demo system. This involves
data pre-processing, lexicon preparation, word level interpretation, phrase level
interpretation and the development of a Graphical User Interface (GUI). No earlier work
fully presents a handwritten Pitman’s Shorthand recognizer for handheld devices with a
complete GUI.
One factor that favours the automatic recognition of handwritten Pitman's Shorthand is
that the notation itself is simple and fast to write. Pitman's Shorthand records speech
phonetically and comprises simple notations for 24 consonants, 12 vowels and 4 diphthongs.
It defines 90 of the most frequently used words as shortforms (i.e., single simple pen strokes
invented to improve writing speed), and these 90 shortforms account for over 37% of the
most commonly used English words [Lg90].
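The coverage contribution of shortforms can be illustrated with a toy computation; the corpus and shortform set below are invented placeholders, not the real 90 shortforms or a real frequency list:

```python
from collections import Counter

# Hypothetical mini-corpus and toy shortform subset, for illustration only;
# Pitman's Shorthand defines 90 shortforms for frequent words [Lg90].
corpus = "the quick brown fox and the lazy dog and the fox".split()
shortforms = {"the", "and", "of", "to", "a"}

counts = Counter(corpus)
total = sum(counts.values())
covered = sum(n for word, n in counts.items() if word in shortforms)
coverage = covered / total  # fraction of running words written as shortforms

print(f"shortforms cover {coverage:.0%} of running words")
```

Even a handful of very frequent words covers a large share of running text, which is why 90 shortforms can account for over a third of commonly used English words.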
Given the need for simultaneous verbatim transcription of speech, the automatic
recognition of handwritten Pitman's Shorthand should be seen not as an option, but as a
necessity for mobile rapid note-takers. Regardless of the portability and efficiency of
handheld devices, today's mobile rapid note-takers (e.g., stenographers) still retain the
traditional way of writing shorthand with a paper notepad and a pencil, as their tablet PCs or
Personal Digital Assistants (PDAs) are not productive enough to record speech in real
time.
In addition, having a cooperative research network provides a firm foundation on which this
research can be based. The linguistic post processing of handwritten Pitman’s Shorthand
can be taken as a further step of expanding what is already possible with a Pitman’s
Shorthand classifier, as reported in the literature. The classifier performs noise reduction,
outline segmentation and the classification of pattern primitives into related categories. It is
a low level processing tool and its output is fed directly to the transcription engine.
Finally, hardware and technical viability played an important role in the successful
development of the whole research. Compared to the time of the previous research,
handheld devices have become more easily accessible, more powerful and cheaper. A
number of mobile PC and tablet PC development tool kits have become
available and these factors have strengthened the feasibility of the research.
1.1.3 Scope
From a handwriting recognition perspective, this research relates to online recognition¹.
It includes a minimal study of the low level processing of handwritten scripts, together with
in-depth research into the transliteration of shorthand primitives into orthographic English words.
This incorporates theories and techniques of pattern recognition, natural language processing
and mobile PC applications. Figure 1.1 illustrates a high level view of the scope of the
thesis.
¹ In online recognition, the input is in the form of successive stroke points collected in
time order, whereas in offline recognition the input is a digital image of a
handwritten word.
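The footnote's distinction can be made concrete with a small sketch of an online ink representation; the type names and fields below are assumptions for illustration, not the collaborator's actual data format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InkPoint:
    """One sampled pen position; online recognisers receive these in time order."""
    x: float
    y: float
    t: float  # timestamp in milliseconds (field names are illustrative)

# A stroke is the point sequence between pen-down and pen-up; an outline is a
# list of strokes. An offline image would discard the timing entirely.
Stroke = List[InkPoint]

def duration(stroke: Stroke) -> float:
    """Writing time of a stroke, available online but lost in an offline image."""
    return stroke[-1].t - stroke[0].t

stroke = [InkPoint(0.0, 0.0, 0.0), InkPoint(3.0, 4.0, 16.0), InkPoint(6.0, 8.0, 33.0)]
print(duration(stroke))  # 33.0
```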
Figure 1.1: Scope of the thesis
Three areas have been investigated in the field of pattern recognition. The first one is
concerned with setting protocols to interrelate a linguistic post processor with a low level
classification engine. Without the successful integration of these two engines, work in this
thesis would not have been feasible. The second one consists of defining a network model
that not only best represents the natural ambiguity of handwritten Pitman’s Shorthand, but
also produces promising output for a written word. The third area is focused on investigating
relevant word rejection strategies in which the interpretation cost has been taken into
account, mainly in terms of its search time and storage requirements.
In the field of natural language processing, a substantial amount of work has been done in
the construction of a shorthand lexicon that is used to support word level transcription. This
mainly includes the application of rule based algorithms to simulate instinctive knowledge
gained from learning Pitman’s Shorthand and the creation of a shorthand dictionary based on
this knowledge. In addition, a survey on the impact of statistical language modelling in
handwriting recognition has been carried out in relation to phrase level transcription.
[Figure 1.1 diagram labels: natural language processing (syntactic knowledge, lexical semantic knowledge); pattern recognition (handwriting recognition, online handwriting recognition); pen based PC applications (tablet PC applications); statistical language model; handheld device applications; legend marking the scope of the thesis.]
In the field of mobile PC applications, three types of end user interfaces have been
developed in this research; (1) a Training Data Collector, (2) an Advanced User Controller
and (3) a Final User Interface. By using the Training Data Collector, a vast amount of
training data can be collected effectively, and by using the Advanced User Controller, a
developer can have deep insight into the structure of the system, thereby enabling him/her to
make changes to the low level parameter settings. Similarly, by using the Final User
Interface, a user can have a front-end view of the system and can practice real time
shorthand input into handheld devices. The development of the interfaces includes the
application of pen based APIs, analysis of parameters of the transcription engine, collection
of training and testing data, and evaluation of the overall system performance.
1.2 Brief Overview
1.2.1 General Objectives and Contributions
The aim of this research is to propose and evaluate a set of techniques to significantly
improve the transcription accuracy of a handwritten Pitman’s Shorthand recognizer and
deliver a commercially viable and functional prototype. In order to enable the reader to gain
a brief overview of this research, the following questions and answers are provided in which
the questions represent general objectives of the research and the answers highlight the
author’s contribution to achieve the objectives.
A set of questions relating to the system integration and configuration:
How effectively has the Pitman’s Shorthand linguistic post processor integrated with
the collaborator’s low level recognition engine under the condition of developing the
two engines in different countries?
The solution includes an extensive collaboration between the two teams including
the author’s annual visits to the partner’s institution, setting protocols on the data
flow and modification of components between the two systems, concurrent
evaluation of the whole system on both sites, and co-authoring the publication of
progress reports.
What are the tasks of the recognition engine and the transcription engine in general?
A high level view of the tasks of the recognition and transcription engines is shown
in Figure 1.2. The white boxes at the top of Figure 1.2 represent processes of the
recognition engine and the shaded boxes represent tasks taken by the transcription
engine. The sample input outlines in Figure 1.2 illustrate the functions of the
recognition and transcription engines.
A set of questions relating to the linguistic post processing:
To what extent is the linguistic post processor based on the previous work?
The linguistic post processor is based on the recognition engine developed by our
research collaborator, and the recognition engine has inherited the majority of the
previous work reported in the literature. Apart from the recognition components, the
remaining transcription components are based on approaches completely different
from those reported in the literature. Reasons for using the new approaches
are discussed in detail in Chapter 3.
Figure 1.2: A high level view of the scope of the recognition engine and the transcription engine
What new approaches are there in this linguistic post processing research?
The significant new approaches of this thesis are:
[Figure 1.2 diagram: coordinates of a pen → data collection → segmentation → classification → word level transcription → phrase level transcription → tablet PC based graphical user interface. For each sample input outline, a segment yields three possible types; candidate words such as worn/warm/storm and sudden/welcome/seldom are resolved to the result "Warm Welcome". The legend distinguishes processes of the collaborator's recognition engine from those of the author's transcription engine.]
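As a minimal sketch of the Figure 1.2 pipeline, the stage names below follow the figure while every function body, type and sample value is an invented placeholder (the real recognition and transcription engines are far more involved):

```python
from typing import List, Tuple

Point = Tuple[float, float]  # one (x, y) pen coordinate

def segment(points: List[Point]) -> List[List[Point]]:
    # Recognition engine stage: split the coordinate stream into stroke segments
    # (toy placeholder: treat the whole stream as one segment).
    return [points]

def classify(segments: List[List[Point]]) -> List[List[str]]:
    # Recognition engine stage: propose several possible primitive types per segment.
    return [["type1", "type2", "type3"] for _ in segments]

def transcribe_word(classified: List[List[str]]) -> List[str]:
    # Transcription engine stage: candidate English words for one outline.
    return ["worn", "warm", "storm"]

def transcribe_phrase(candidates_per_outline: List[List[str]]) -> List[str]:
    # Transcription engine stage: pick one word per outline using context
    # (toy placeholder: always take the second candidate).
    return [words[1] for words in candidates_per_outline]

outline1 = segment([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
candidates = [transcribe_word(classify(outline1)), ["sudden", "welcome", "seldom"]]
print(" ".join(transcribe_phrase(candidates)))  # warm welcome
```

The first two stages stand in for the collaborator's recognition engine and the last two for the author's transcription engine, matching the white and shaded boxes of the figure.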
(a) An electronic version of a shorthand dictionary has been successfully created
using rule based algorithms. No comparable Pitman's Shorthand dictionary in
electronic format existed previously.
(b) Bayesian Network based outline models have been proposed with the aid of the
electronic shorthand dictionary and training data. These outline models accurately
represent the distribution of the natural parameters of handwritten Pitman's
Shorthand and increase word transcription accuracy.
(c) A complete framework for the online recognition of handwritten Pitman's
Shorthand is reported in this thesis, whereas most of the work in the literature
emphasized only the initial segmentation and classification of shorthand primitives.
(d) The first tablet PC based demo system has been produced. This allows a future
researcher to have deep insight into the performance of the recognition and
transcription engines via functional interfaces. It also enables an end user to input
shorthand into a handheld device.
A set of questions relating to the development of a mobile PC application:
For what types of handheld devices is the system intended to be applicable?
The system is intended to be applicable for any pen based mobile device in which
the use of a traditional “QWERTY” keyboard is impractical. Experiments and
evaluations of this thesis are based on tablet PCs with Microsoft Windows XP
Tablet PC Edition 2005.
In what kind of domain/scenario is the application intended to be used?
The application is intended to be used:
(a) as a rapid text input system on handheld devices
e.g., typing a text message on a mobile phone, inputting rich text information into
PDA.
(b) as a useful tool for stenographers in a real time verbatim written transcription of
the spoken word
e.g., taking a memo in a meeting, taking verbatim legal records of speeches in a
court, providing real time subtitling services for the deaf and hard of hearing
community.
(c) as a real time language translation tool.
With additional configuration, the system can be applicable as a real time language
transliteration tool for an international traveller. For example, using shorthand a
person can write his/her question with English phonetics and immediately the
question is translated into, say, Japanese. This can resolve language barriers for
international travellers, enabling them to access essential information. The language
translation feature needs an additional installation and configuration of third party
software and it is not included in the scope of this thesis.
A set of questions relating to training and testing of the overall system:
What kind of people are involved in the training and testing of the overall system?
In order to evaluate a realistic performance of the whole system, the training and
testing involve writers with different levels of skills in Pitman’s Shorthand, different
genders and ages.
How is the whole system performance evaluated?
The overall system performance is evaluated under different criteria such as
unconstrained writing, independent users, different speeds of writing and different
levels of tidiness. The evaluation process also involves a list of practical concerns
such as usability, learning curve, and the popularity/commercial viability of the system.
1.3 Synopsis of the Dissertation
The research in this thesis combines theories and techniques from the fields of pattern
recognition, natural language processing and mobile PC application. It aims for a
commercially viable and functional prototype with a set of techniques that significantly
improve the transcription accuracy of a handwritten Pitman’s Shorthand recognizer.
This chapter (Chapter 1) presents the motivation, scope, and background of the research. It
introduces the three main problem areas relating to the themes of the thesis, major objectives
and contributions.
Chapter 2 reviews key concepts in the areas of Pitman’s Shorthand recognition, pattern
recognition and natural language processing. The focus in Pitman’s Shorthand recognition is
on the evaluation of existing text entry methods into handheld devices, the study of Pitman’s
Shorthand, and the review of existing approaches applied to the automatic recognition of
handwritten Pitman’s Shorthand problems. The focus in pattern recognition is on the
analysis of the capabilities of commonly used graphical models to resolve natural
ambiguities of handwriting. Finally, the focus in natural language processing is on the
review of the Viterbi algorithm and statistical language modelling techniques used to
enhance the solution to the phrase level transcription problem.
Chapter 3 reports on a prototype that implements the architecture designs described in the
literature. In particular, it expounds the phonetic based transcription of handwritten
Pitman’s Shorthand outlines and presents the problems that need resolving. This chapter
primarily discusses whether the conventional phonetic based transliteration method is
efficient for the purpose of the thesis.
Chapter 4 presents the main architecture and design of a novel primitive based transcription
approach. Ambiguities of handwritten Pitman’s Shorthand, in particular, stroke variations
and vowel omissions are resolved by introducing Bayesian Network based shorthand outline
models. The word interpretation includes outline models creation, belief propagation,
Bayesian Network based learning and model selection. The conceptual solution is shown to
improve the solution to the word level transcription problem.
Chapter 5 focuses on the generation of a novel machine-readable Pitman’s Shorthand
lexicon, which is repeatedly applied to the primitive based transcription of handwritten
Pitman’s Shorthand. The rule based conversion of a phonetic representation of a word into a
Pitman’s Shorthand representation is presented. This involves the creation of an electronic
Pitman’s Shorthand lexicon with the application of the writing rules of Pitman’s Shorthand,
plus the evaluation of the proposed methods with different sizes of lexicons.
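A rule based phoneme-to-primitive conversion of the kind Chapter 5 describes can be sketched as follows; the phoneme symbols, primitive names and rule table are invented for illustration and do not reproduce the thesis's actual rules or lexicon format:

```python
# Toy rule table: consonant phonemes map to stroke primitives. The light/heavy
# pairing mirrors the Pitman convention that, e.g., /P/ and /B/ share a stroke
# shape and differ only in line thickness (Section 2.2); the names are invented.
RULES = {
    "p": "light-downstroke", "b": "heavy-downstroke",
    "f": "light-curve", "v": "heavy-curve",
    "t": "light-vertical", "d": "heavy-vertical",
}
VOWELS = {"a", "e", "i", "o", "u"}

def phonemes_to_primitives(phonemes):
    """Keep consonant strokes; mark vowels as optional, since they may be omitted."""
    out = []
    for ph in phonemes:
        if ph in VOWELS:
            out.append(f"optional-vowel({ph})")
        elif ph in RULES:
            out.append(RULES[ph])
        else:
            raise KeyError(f"no rule for phoneme {ph!r}")
    return out

print(phonemes_to_primitives(["p", "a", "d"]))
# ['light-downstroke', 'optional-vowel(a)', 'heavy-vertical']
```

The real conversion applies many more writing rules (hooks, halving, doubling, position), but the shape is the same: a deterministic, rule driven mapping from a phonetic spelling to a primitive sequence.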
Chapter 6 proposes a Viterbi algorithm based framework to resolve the Pitman’s Shorthand
specific phrase level transcription problem. The framework comprises Pitman’s Shorthand
related contextual knowledge. Experimental results demonstrate the practical benefits of the
proposed framework.
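A minimal Viterbi decoder of the general kind Chapter 6 builds on can be sketched as below; the candidate words, scores and bigram probabilities are invented placeholders, and the real framework adds Pitman's Shorthand specific contextual knowledge:

```python
import math

def viterbi(candidates, bigram_logp, cand_logp):
    """Pick one word per outline, maximising candidate score plus bigram score.

    candidates: list of candidate-word lists, one list per outline.
    Unknown bigrams are backed off to a tiny floor probability.
    """
    best = {w: cand_logp[w] for w in candidates[0]}
    back = [{} for _ in candidates]
    for i in range(1, len(candidates)):
        new_best = {}
        for w in candidates[i]:
            prev, score = max(
                ((p, best[p] + bigram_logp.get((p, w), math.log(1e-6))) for p in best),
                key=lambda t: t[1],
            )
            new_best[w] = score + cand_logp[w]
            back[i][w] = prev
        best = new_best
    # Trace back the best path from the highest-scoring final word.
    last = max(best, key=best.get)
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy example mirroring Figure 1.2: two outlines, three candidates each.
cands = [["worn", "warm", "storm"], ["sudden", "welcome", "seldom"]]
cand_logp = {w: math.log(1 / 3) for row in cands for w in row}
bigram_logp = {("warm", "welcome"): math.log(0.5)}  # invented probability
print(viterbi(cands, bigram_logp, cand_logp))  # ['warm', 'welcome']
```

The single favoured bigram is enough to pull the decoder to "warm welcome", which is the role contextual knowledge plays in phrase level transcription.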
Chapter 7 documents the roles of the graphical user interfaces of this research, which are
designed for the developer's authoring environment, the experimental user's authoring
environment and the end-user's authoring environment. Experimental results substantiate
the feasibility of the proposed interfaces.
This thesis supports the argument that the development of an automatic handwritten
Pitman’s Shorthand interpreter is feasible and useful. Chapter 8 highlights the argument by
reviewing the dissertation’s key points, linking the results to the general objectives,
highlighting the contributions and outlining prospective future work.
2 Background to the Automatic Recognition of
Handwritten Pitman’s Shorthand
Chapter 2 Introduction
This chapter provides background information on the computer aided recognition and
interpretation of handwritten Pitman's Shorthand. It comprises seven sections, outlined
as follows:
- Evaluation of existing text input systems for handheld devices.
- A brief overview of Pitman’s Shorthand.
- An overview of the automatic recognition of handwritten Pitman’s Shorthand.
- Handwriting recognition algorithms to improve word level transliteration.
- Natural language processing algorithms to improve phrase level transliteration.
- Pen interface application programming interfaces (APIs).
- Summary.
2.1 Evaluation of Existing Text Input Systems for Handheld
Devices
This section briefly reviews text entry on current handheld devices and then evaluates the
capabilities of the available text entry methods, in particular by comparing them with those
of a handwritten Pitman's Shorthand recognizer. Methods evaluated include:
- On screen keyboard.
- Cursive handwriting recognition system.
- Gesture recognition system.
- Speech recognition system.
2.1.1 On-screen Keyboards vs. a Handwritten Pitman’s Shorthand
Recognizer
An on-screen keyboard is a virtual keyboard displayed on the flat display panel of a device
where text is entered by tapping a stylus on it serially, for instance, IBM’s Touchboard for
Windows. The method provides an adequate means of interaction with computers; however,
it requires constant visual attention since keys are not physically sensitive to fingers. From
the aspect of human computer interaction, the use of handwritten data entry has been shown
to be more natural for entering text into handheld devices [Win05]. However, practical use
for one system over another still relies on the purpose of use and/or individual user
preference. If a user prefers a handwritten recognizer to an on-screen keyboard in general, a
commercially viable handwritten Pitman’s Shorthand recognizer may be of great interest to
the user.
2.1.2 A Cursive Handwriting Recognizer vs. a Handwritten Pitman’s
Shorthand Recognizer
A cursive handwriting recognition engine built into Microsoft Windows XP Tablet PC
Edition 2005 [Win05] is a well known handwriting recognizer at the time of writing. It is capable of
interpreting cursive script; however, efficiency of the system is restricted by the limited
speed of normal cursive writing (i.e., less than 40 words per minute). Using handwritten
data not in the form of normal longhand is likely to be a solution to overcome the slow input
problem. According to [Lg90], a Pitman’s Shorthand writing system is an alternative to a
longhand writing system that can be practiced at nearly the same rate as speech (i.e., about
120-180 words per minute (wpm)).
On the whole, Pitman’s Shorthand presents a number of strengths that facilitate very rapid
writing, but it also presents a drawback in that it has a long learning curve, which includes
memorizing new phonetic symbols and pronouncing words based on a number of rules.
Having said that, there are millions of Pitman's Shorthand writers who have received
training in its use [Lg90], and most of them remark that it is worth learning despite a little
frustration at the time of learning. Therefore, the automatic recognition of
handwritten Pitman's Shorthand is intended to benefit a particular group of
stenographers, plus some interested users who are dedicated to achieving fast data entry using
handheld devices.
2.1.3 Gesture Based Text Entry Systems vs. a Handwritten Pitman’s
Shorthand Recognizer
In general, gesture based text entry systems provide a virtual keyboard that enables users to
make input gestures. A well known gesture based text entry system at the time of writing is
SHARK [ZK03], developed by IBM. It allows a user to input gestures with the aid of a
virtual, template keyboard and gradually trains the user to be capable of inputting gestures
without using the keyboard. In this way, SHARK eliminates the constant visual attention
required for a virtual keyboard and produces fast data input. To give the reader a
clear view of a gesture based text entry system, word entry into the SHARK system is
shown in Figure 2.1. Figure 2.1 (a) illustrates input for the word "quick" using a virtual
keyboard, and Figure 2.1 (b) illustrates input for the same word “quick” without using the
virtual keyboard.
On the whole, gesture based text entry systems facilitate a faster data input compared to
normal cursive handwriting recognizers; however, memorizing gestures of a substantial
number of words results in a very steep learning curve.
In general, Pitman’s Shorthand recognition is similar to gesture recognition since both
interpret a series of lines into words and produce a fast data input. However, there is no
need to memorize every gesture of a word using a handwritten Pitman’s Shorthand
recognizer since the construction of Pitman’s outlines is based on a set of phonetic rules.
Figure 2.1: Illustration of text entry using SHARK system (a) The word “quick” is written using ATOMIK keyboard layout (b) The word “quick” is written without using a
template keyboard
2.1.4 Speech Recognition Systems vs. a Handwritten Pitman’s
Shorthand Recognizer
In terms of efficiency and operational cost, speech recognition systems seem the most
outstanding compared to other data input methods because users can speak naturally as well
as rapidly (around 100-120 words per minute) using speech recognition systems. An
example is given in the real time subtitling of TV programs, where speech is automatically
transcribed into text and the cost of manual transcription is reduced. A primary negative aspect of
speech recognition systems is that data must be spoken. On some occasions, it is not always
feasible to input data via voice, for instance, an automatic transcription of a noisy debate
using a speech recognition system is considerably difficult unless it is feasible to encourage
speakers to use microphones. Therefore, this research proposes that it is reasonable to
develop a system that facilitates an alternative means to record speech without using speech
input.
2.2 Pitman’s Shorthand: a Brief Overview
Pitman's Shorthand was first presented by Sir Isaac Pitman in 1837 and it has two forms:
New Era and Pitman 2000. Research in this thesis is based on the latter, as Pitman 2000
is a modified version of New Era and offers more accurate transcription as well as a faster
learning curve.
Words are written as they are pronounced, and the main feature of Pitman’s Shorthand is the simplicity of its notations. There are 24 consonants, 12 vowels and 4 diphthongs in Pitman’s Shorthand. The skeleton of a shorthand outline is formed by a combination of consonant strokes, and the writing of vowels is optional. This means it is essential to write the consonant strokes of a word, but vowel notations can be omitted when the writing needs to be fast. There is no standard rule for the omission of vowels; it varies widely with a writer’s experience or individual inclination.
Due to the phonetic based formation of words, Pitman’s Shorthand is easily adaptable to multiple languages (15 languages to date). It is practiced as a speech-recording medium in the real time reporting community at a practical rate of about 120-180 words per minute [Lg90]. It is widely used in offices in the UK and is also taught in 74 other countries [Lg90].
Figure 2.2 illustrates 21 of the 24 basic Pitman’s consonants in three easily remembered diagrams. To understand the notation, consider the left-most stroke in Figure 2.2 (a): the notations for the phonemes /P/ and /B/ are the same down-stroke written with different line thicknesses. Similarly, Figure 2.2 (b) gives the corresponding notations for the phonemes /F/ and /V/.
Figure 2.2: Basic Consonants of Pitman’s Shorthand as illustrated in [Oj95]
In addition to the 21 consonants in Figure 2.2, there are three further consonants in Pitman’s Shorthand: /W/, /Y/ and /H/. These are formed using hooks and upstrokes, as shown in Figure 2.3. Vowels and diphthongs are simple pen strokes, illustrated in Figure 2.4.
Figure 2.3: /W/, /Y/, /H/ consonants of Pitman’s Shorthand
Figure 2.4: Vowels, diphthongs and diphones of Pitman’s Shorthand
Words are constructed with consonant and vowel notations in Pitman’s Shorthand and a
script containing both consonants and vowels is called a vocalized outline. Samples of
vocalized outlines, including notations of vowels, diphones and diphthongs are illustrated in
Figure 2.5.
Figure 2.5: Illustration of vocalized outlines
By using the basic notations illustrated in Figure 2.2, Figure 2.3 and Figure 2.4, a person can start writing a shorthand outline that is phonetically correct, but not in complete accordance with the special rules of Pitman’s Shorthand. The special rules comprise 20 definitions, invented for speed enhancement, which must be learned thoroughly by anyone who wants to become a professional Pitman’s Shorthand writer. Details of the special rules can be found in [Oj95]; one of them is given as an example here. In the example (Figure 2.6), the word “play”, comprising three phonemes (/P/, /L/ and /Ā/), can be written phonetically using the basic notations of Pitman’s Shorthand as shown in Figure 2.6 (b). However, one of the special rules of Pitman’s Shorthand reads: “if a phoneme /P/ is followed by a phoneme /L/, the notation for /L/ is transformed into a small hook and the small hook is attached to the beginning of the /P/ stroke.” Therefore, the word “play” should be written in the form of Figure 2.6 (c) rather than Figure 2.6 (b), although the form in (b) is phonetically correct.
Figure 2.6: (a) Basic notations of Pitman’s Shorthand (b) The word “play” is written phonetically using basic notations (c) The word “play” is written using a special rule
of Pitman’s Shorthand
(Figure 2.5 shows vocalized outlines, with vowel, diphone and diphthong notations marked, for the words “bait”, “go”, “radio” and “time”; Figure 2.6 shows the outlines for the word “play”.)
In addition to vocalized outlines, other important components of Pitman’s Shorthand are
short forms and phrases. In general, short forms and phrases account for over 40% of the
most commonly used English words [Lg90] and are key attributes for facilitating the
outstanding speed of Pitman’s Shorthand. Examples of short forms and phrases are depicted
in Figure 2.7.
Figure 2.7: (a) Samples of short forms (b) Samples of phrases
2.3 Automatic Recognition of Handwritten Pitman’s Shorthand: an
Overview
The first investigation into the feasibility of using handwritten Pitman’s Shorthand for verbatim transcription as an aid for the deaf was reported by Brooks and Newell [BN81], [Bc85], [BN85].
Concurrent work [Lg84] investigated this idea in more detail and further work [LD86]
evaluated the enormous potential of the online recognition of handwritten Pitman’s
Shorthand for the real time recording of speech (e.g., verbatim reporting of meetings and
court proceedings). In this approach, four main studies were carried out: (a) detection of consonant boundaries within a whole outline; (b) classification of segmented consonant strokes; (c) evaluation of the confusion between normal-length strokes and half-length or double-length strokes; and (d) evaluation of various inclinations of horizontal and vertical strokes. In addition,
different classification algorithms were used to classify vocalized outlines and short-forms in
this approach. The best classification rate reported at the time was 14.5%.
In the early 1990s, extensive research was carried out to improve the recognition of vocalised outlines and short-forms. Leedham [Lg90] reported that using contextual knowledge was the most feasible means of improving the recognition of short-forms, whose recognition was based on a template-matching algorithm. In this approach, transliteration was carried out by first sorting classified pattern primitives into correct linguistic order, then converting primitives into phonemes using a set of production rules, and finally converting phonemes into orthographic English words. The concept of a machinography (that is, how to modify the original Pitman’s notations to be ideally suited to machine recognition) was also addressed in this work.
In later work [LQ90], the basic notations of Pitman’s Shorthand were categorized into 89 basic features and incorporated into a neural network. This approach could correct initial classification errors and achieved a classification rate of 94.5%.
Concurrently, Leedham and Qiao [LQ90] carried out another experiment to evaluate classification performance using a fuzzy classifier. In this approach, 90% correct classification was achieved through interaction between the segmentation and classification processes. Initial classification errors were also corrected using knowledge of legal primitive pairs.
In 1993, Qiao and Leedham [QL93] took another innovative approach to classify segmented
primitives. Their new method allowed communication between bottom up processes (i.e.,
segmentation based classification) and top down processes (i.e., holistic classification) via an
interactive heuristic (IH) search schema. They reported that locating a boundary between
features without first recognizing a whole outline was difficult. The performance of their
work was 84% correct segmentation and 58% correct classification.
In the early 2000s, another research group [NB02], [KSN+03], [SKN+04], [KSN04] started investigating the off-line automatic recognition of handwritten Pitman’s Shorthand. This group concentrated more on the linguistic post-processing of classified primitives into orthographic English words. Similar to Leedham’s approach, phonetic based transcription using the same concept of vowel ordering was implemented. The incidence of homophones (outlines that are written identically but represent different words) was addressed in their work, and the filtering of homophones using domain based and context based rejection strategies was investigated. They noted that an ordinary phonetic dictionary was not adequate for generating text, and that a modified dictionary, designed specifically for the recognition of Pitman’s Shorthand, was necessary. On the whole, a major limitation of their work was an impractical assumption about homophones: only two homophones per word were considered.
In summary, previous research mainly emphasized the low-level segmentation and classification of shorthand primitives, with little work reported on the back-end transliteration. This thesis proposes that further extensive research is required to improve word level as well as phrase level transcription. To achieve this goal, it is first necessary to make a thorough evaluation of recent popular handwriting recognition algorithms and natural language processing algorithms.
2.4 Handwriting Recognition Algorithms to Improve a Word Level
Transliteration
Work in this thesis considers the fundamental problem of interpreting shorthand strokes of
digital ink as text. Here, features extracted from a shorthand outline already give a
reasonable separation of strokes and provide the related identity of the strokes. It is
necessary to take the context of strokes into account to achieve a promising interpretation; however, dealing with spatial context can easily become computationally intensive [BSH04]. For optimum text interpretation, it is practical to balance context against the low level ink information of the strokes.
In the field of handwriting recognition, a common approach to handling such variables (e.g., context and observed ink information) is to embed them in a probabilistic model and discriminate between them based on the resulting probabilities. Graphical models are considered here. Graphical models are a marriage between probability theory and graph theory [Jm99]. They come in two kinds: undirected and directed models. Undirected models have simple definitions of independence, whereas directed models have a more complicated notion of independence [Mk98]. The word recognition of handwritten shorthand involves considerable uncertainty and complexity, and directed models are better suited to representing the features of shorthand and the interdependencies between them. Popular directed graphical models are Hidden Markov Models (HMMs), Neural Networks and Bayesian Networks. In general, these models belong to the same family: for example, an HMM is a kind of dynamic Bayesian Network, and a Neural Network can be framed as a kind of input/output HMM. The primary difference between them is the way variables are structured (i.e., the topology) and the way interdependencies between variables are handled.
2.4.1 Hidden Markov Models (HMMs)
Hidden Markov Models represent hidden and observed states in terms of state variables, which can have complex interdependencies [Mk98]. A well-known tutorial on HMMs [Rl89] states that “The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities.”
A sample HMM model, representing an outline of Pitman’s Shorthand is illustrated in
Figure 2.8. In the figure, an outline is divided into several slices - each slice represents a
segmented classified primitive, containing one discrete hidden node and one discrete
observed node.
Figure 2.8: A sample HMM model for a single outline of Pitman’s Shorthand. At each state i, the probability βi that a particular stroke Si is of type Ti is observed.
There are several kinds of HMMs depending on network topology: HMMs with a mixture of
Gaussian output, input-output HMMs and factorial HMMs. Details of these algorithms can
be found in the literature [Mk01], [Rl89].
In the field of pattern recognition many systems have applied HMMs – examples include the
representation of utterances as HMMs for speech recognition [Sa04], [MS04]; the
representation of facial images (combinations of hair, forehead, eyes, nose and mouth) as
HMMs for face recognition [HSS02], [KKL03]; the representation of words as HMM for
handwriting recognition [GB04], [HLB00]; the representation of human motion as HMMs
for gesture recognition [CFH03], [KP01]; the representation of pen-gestures (e.g., writing
pressure and smoothness of a line) as HMMs for signature recognition [JBS05], [YWP95].
Generally, HMMs work extremely well for certain types of applications; however, the Markov assumption itself, i.e., that the probability of being in a given state at time t depends only on the state at time t-1, is not always appropriate for problems in which dependencies extend across earlier states [Rl89].
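As a concrete illustration of the HMM machinery described above, the following sketch implements the standard forward algorithm (the probability of an observation sequence summed over all hidden state paths) for a toy two-state model. The state names ("thin"/"thick" strokes), the observation symbols and all probabilities are invented for illustration and are not taken from this thesis.

```python
# Forward algorithm for a toy HMM over hypothetical shorthand stroke types.
states = ["thin", "thick"]
start_p = {"thin": 0.6, "thick": 0.4}
trans_p = {"thin": {"thin": 0.7, "thick": 0.3},
           "thick": {"thin": 0.4, "thick": 0.6}}
emit_p = {"thin": {"down": 0.8, "curve": 0.2},
          "thick": {"down": 0.3, "curve": 0.7}}

def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability of an observation sequence under the HMM."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

seq_prob = forward(["down", "curve"], states, start_p, trans_p, emit_p)  # 0.228
```

The recursion keeps, for each state, the summed probability of all paths ending there; this is the quantity a recogniser would compare across candidate word models.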
2.4.2 Neural Networks
Neural Networks are inspired by the structure of the brain and are designed to mimic networks of biological neurons [Ri93]. They consist of neurons (i.e., variables) connected via weighted links, where each weight specifies the strength of the connection between one node and another. The use of Neural Networks has been demonstrated in several pattern recognition applications [Ri93]. Like HMMs, Neural Networks have been devised in different forms, including single-layer linear networks, threshold networks, multilayer networks and multilayer networks with learning.
Figure 2.9: An individual cell of the Neural Network modelled for the classification of handwritten Pitman’s Shorthand in [LQ90]
A Multilayer Neural Network designed for the transcription of handwritten Pitman’s
Shorthand presented in previous research [LQ90] is illustrated in Figure 2.9. In that
network, there are 20 layers and each layer (i.e., each segment) consists of 89 nodes,
representing the 89 basic Pitman’s primitives. Only one node from each layer is capable of
activating the next layer and the activation is based on the competition among the nodes. A
major drawback of this model is an unnecessary consideration of a wide range of primitives
in each layer. In fact, by using the context of an outline and a shorthand dictionary, the number of nodes required for each layer can be reduced.
2.4.3 Bayesian Networks
Word level transcription in this thesis mainly applies the Bayesian Network architecture. A Bayesian Network [Pj88] is a directed acyclic graph in which each node represents a mutually exclusive and collectively exhaustive set of random variables, and the links between nodes signify probabilistic dependencies between them. Bayesian Networks have been a remarkable tool in the domain of handwriting recognition for their outstanding ability to model natural ambiguity (e.g., to model complex stroke relationships). In on-line handwriting recognition, stroke relationships are usually robust against geometric variation and are important for discriminating characters of similar shape [CK04].
In Pitman’s Shorthand, stroke relationships mean the occurrence of vowel notations and their positions in a vocalized outline, and the starting position of the first consonant stroke, i.e., whether it is written above, on or below a base line. An example of vowel dependency and an example of positional dependency of the first consonant stroke are illustrated in Figure 2.10 (a) and (b) respectively. As shown in Figure 2.10 (a), a dot vowel written at two different locations (i.e., the beginning and end of a stroke) represents two different words in Pitman’s Shorthand. Similarly, two identical outlines written at two different starting positions (i.e., above and below a base line) represent two different words in Figure 2.10 (b).
Figure 2.10: Illustration of stroke dependencies in Pitman’s Shorthand (a) vowel dependency (b) positional dependency of the first consonant primitive
A summary of Bayesian Networks is given in this chapter, and the implementation of the network for the word level transcription of handwritten Pitman’s Shorthand is discussed in detail in Chapter 4. In general, the basic concepts of Bayesian Networks are discussed under the following headings:
Conditional independence.
Inference.
Learning.
2.4.3.1 Conditional Independence
In a Bayesian Network, an edge between nodes defines a dependency between variables. For example, consider the event “grass is wet” and the possible causes “cloudy” and “rain”. If cloudy (C) becomes independent of wet grass (W) once rain (R) is observed, the conditional independence between cloudy (C) and wet grass (W) can be indicated by a chain of arrows, as shown in Figure 2.11.
(In Figure 2.10 (a), the outlines for “aid” and “eat” differ only in whether the dot vowel is written at the end or in the middle of the stroke; in (b), the outlines for “bath” and “bathe” differ only in whether the first consonant /B/ is written above or on the base line.)
Figure 2.11: C is conditionally independent of W given R
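The chain structure of Figure 2.11 can be checked numerically: in a network C → R → W, the probability of W given both C and R does not depend on C. A minimal sketch by enumeration, with illustrative probabilities that are not taken from the thesis:

```python
# Numerical check of conditional independence on the chain C -> R -> W.
# All probabilities are illustrative.
P_C = {True: 0.5, False: 0.5}
P_R_given_C = {True: {True: 0.8, False: 0.2},   # P(R | C)
               False: {True: 0.1, False: 0.9}}
P_W_given_R = {True: {True: 0.9, False: 0.1},   # P(W | R)
               False: {True: 0.2, False: 0.8}}

def joint(c, r, w):
    """Joint probability factorised along the chain."""
    return P_C[c] * P_R_given_C[c][r] * P_W_given_R[r][w]

def p_w_given(c, r):
    """P(W=True | C=c, R=r), computed from the joint distribution."""
    return joint(c, r, True) / (joint(c, r, True) + joint(c, r, False))

# Once R is known, C adds no information about W:
assert abs(p_w_given(True, True) - p_w_given(False, True)) < 1e-12
```

The factors for C cancel in the ratio, so P(W | C, R) collapses to P(W | R), which is exactly the conditional independence the figure expresses.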
Another type of dependency in a Bayesian Network is “explaining away” [WH93], in which each variable competes to “explain” the observed data. For example, consider the event “grass is wet” and the possible causes “rain” and “water sprinkler”. In contrast to the case in Figure 2.11, Figure 2.12 illustrates that two independent nodes, sprinkler (S) and rain (R), become conditionally dependent once the data wet grass (W) is observed. The converging arrows towards wet grass (W) in Figure 2.12 indicate that if the grass is wet while it is raining, the probability that the sprinkler is on automatically becomes lower, and vice versa.
Figure 2.12: S is conditionally dependent on R given the observed data W
Therefore, whether a node is observed or hidden strongly influences the conditional dependencies between variables in a Bayesian Network. Using the Bayes ball algorithm [Sr98], conditional independence between variables can be easily determined from the information about which nodes are hidden or observed. The Bayes ball algorithm is illustrated in Figure 2.13.
CPT of the W node (Figure 2.12):

S R | P(W=T) | P(W=F)
T T |  0.98  |  0.02
T F |  0.95  |  0.05
F T |  0.94  |  0.06
F F |  0.00  |  1.00
Figure 2.13: Illustration of the Bayes Ball algorithm [Sr98]. If there is no flow of a ball from A to B in a graph, A and B are conditionally independent given a set of observed
or hidden variables X and vice versa.
In addition, every node in the Bayesian Network needs to be specified with a Conditional
Probability Distribution (CPD) and a table holding these distribution values is called a
“Conditional Probability Table” (CPT). A sample CPT of a W node is shown in Figure
2.12. The table indicates the likelihood of grass getting wet with regard to whether a
sprinkler is on and/or whether it has rained.
2.4.3.2 Inference
Inference in a Bayesian Network involves computing the probability distribution of a node given the values of some other nodes. In other words, it is the process of finding the likelihood of an explanation given the evidence and the prior probabilities. One of the reasons
Bayesian Networks are useful is because they permit a more efficient inference procedure
[Ja99]. Inference can be categorized into two types: exact and approximate.
Exact inference procedures are useful when a network structure is not too complex;
however, approximate inference procedures work better in practice when a model becomes
computationally complicated such as models with a repetitive structure or large clusters.
Examples of exact inference algorithms include a local message passing algorithm [Pj88],
[PS91] and a junction tree algorithm [HD96], [CDL+99]. Popular approximate inference
methods include Monte Carlo sampling methods [MD98], variational techniques [SJJ96],
[JGJ+98], [JJ98], and loopy belief propagation [WF99], [Wy00], [FW00].
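A minimal instance of exact inference by enumeration on a simplified two-cause version of the sprinkler network, in which S and R are treated as independent root causes of W. The CPT of W uses the values shown in Figure 2.12; the priors assumed for S and R are illustrative only.

```python
# Exact inference by enumeration: P(R=True | W=True) on a simplified
# sprinkler network.  Priors on S and R are illustrative assumptions;
# the CPT of W follows Figure 2.12.
P_S = {True: 0.3, False: 0.7}
P_R = {True: 0.5, False: 0.5}
P_W = {(True, True): 0.98, (True, False): 0.95,
       (False, True): 0.94, (False, False): 0.0}   # P(W=True | S, R)

def posterior_r_given_w():
    """P(R=True | W=True), summing the joint over the hidden node S."""
    num = sum(P_S[s] * P_R[True] * P_W[(s, True)] for s in (True, False))
    den = sum(P_S[s] * P_R[r] * P_W[(s, r)]
              for s in (True, False) for r in (True, False))
    return num / den
```

Observing wet grass raises the belief in rain above its assumed prior of 0.5, which is the behaviour exact inference is meant to deliver; for larger, loopier networks the approximate methods cited above replace this brute-force enumeration.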
2.4.3.3 Learning
“Learning” in Bayesian Networks often refers to learning parameters of a network as well as
the structure of the network.
In brief, parameter learning is an estimation of a conditional probability table (CPT) of each
node in a network based on a number of training samples. Here, the learning methods vary
widely depending on attributes of training samples i.e., whether they are (fully or partially)
observed, or whether they are (fully or partially) hidden. In general, there are three common
types of parameter learning methods – Maximum Likelihood (ML), Maximum a Posterior
(MAP) and Expectation Maximization (EM).
In ML learning, the goal is to find the maximum likelihood of training data given N cases,
which are assumed to be independent. Assume that D = (D1, …, DM) is a training data set
which contains M cases, the maximum (optimal) likelihood of each node α can be denoted
as
\hat{\alpha} = \arg\max_{\alpha} P(D \mid \alpha)   (2.1)
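For fully observed discrete data, the maximisation in equation (2.1) reduces to counting configurations and normalising. A sketch for the W node with a hypothetical training set (the cases below are invented for illustration):

```python
from collections import Counter

# ML estimation of P(W=True | S, R) from fully observed cases:
# count each configuration and normalise.  Training data is hypothetical.
cases = [  # (sprinkler, rain, wet)
    (True, False, True), (True, False, True), (True, False, False),
    (False, True, True), (False, True, True), (False, False, False),
]
n_parent = Counter((s, r) for s, r, w in cases)       # N(S=s, R=r)
n_joint = Counter((s, r) for s, r, w in cases if w)   # N(W=T, S=s, R=r)

def p_wet_ml(s, r):
    """ML estimate of P(W=True | S=s, R=r)."""
    return n_joint[(s, r)] / n_parent[(s, r)]
```

For example, the configuration (S=T, R=F) occurs three times with W true twice, giving an estimate of 2/3.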
In MAP learning, Maximum a Posteriori (MAP) estimation assumes the existence of a prior p(β) over the parameters β [Ja99]. Because the algorithm is based on “counting”, a parameter never seen in the training samples would otherwise receive zero probability; the use of a Dirichlet prior prevents this. For the wet grass example in Figure 2.12, the MAP estimate for the wet grass node, including the Dirichlet prior, can be denoted as:
P_{MAP}(W=w \mid S=s, R=r) = \frac{N(W=w, S=s, R=r) + \alpha}{N(S=s, R=r) + \beta}   (2.2)
where N(·) is the number of times the corresponding configuration occurs in the training set, and α and β are uniform Dirichlet prior counts, used when a particular parameter is not seen in the training set. In general, MAP is used when there is a small number of training cases compared to the number of parameters [Mk01]; however, it is still important that the counts are based on sufficient statistics to achieve an optimal estimate.
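The Dirichlet-smoothed estimate of equation (2.2) can be sketched as follows; the pseudo-count values alpha and beta and the toy training cases are illustrative assumptions, not values used in the thesis.

```python
# MAP estimate of P(W=True | S=s, R=r) with uniform Dirichlet
# pseudo-counts, per equation (2.2).  alpha, beta and the training
# cases are illustrative.
def p_wet_map(cases, s, r, alpha=1.0, beta=2.0):
    n_joint = sum(1 for cs, cr, cw in cases if (cs, cr, cw) == (s, r, True))
    n_parent = sum(1 for cs, cr, _ in cases if (cs, cr) == (s, r))
    return (n_joint + alpha) / (n_parent + beta)

cases = [(True, False, True), (True, False, True),
         (True, False, False), (False, True, True)]
seen = p_wet_map(cases, True, False)     # (2 + 1) / (3 + 2) = 0.6
unseen = p_wet_map(cases, False, False)  # (0 + 1) / (0 + 2) = 0.5, not zero
```

The unseen parent configuration (S=F, R=F) still receives a non-zero estimate, which is precisely the zero-probability problem the Dirichlet prior addresses.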
Expectation Maximization (EM) is mainly used when variables are partially observable, i.e., the network contains some hidden nodes. It computes the expected values of the hidden nodes using an inference algorithm (the E step), and then treats these expected values as though they were observed when re-estimating the parameters (the M step) [Mk01]. When using EM, it is important to know the structure of the model in advance, as this is the key to identifying the hidden nodes. For the wet grass example in Figure 2.12, the EM estimate for the W node can be denoted as
P_{EM}(W=w \mid S=s, R=r) = \frac{E(W=w, S=s, R=r)}{E(S=s, R=r)}   (2.3)
where E(…) is the number of times corresponding parameters are expected to occur.
According to Murphy [Mk01], E(…) is computed as follows
E(e) = \sum_{m} I(e \mid D_m)\, P(e \mid D_m)   (2.4)
where I(e|Dm) is an indicator function which is 1 if a parameter e occurs in training case m,
and 0 otherwise.
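The expected count of equation (2.4) is a simple indicator-weighted sum. In the sketch below the per-case posteriors P(e | D_m) are supplied directly as illustrative numbers; in practice they would be produced by an inference pass over each partially observed training case.

```python
# Expected count E(e) per equation (2.4): an indicator-weighted sum of
# per-case posteriors.  The indicator and posterior values are illustrative.
def expected_count(indicator, posterior):
    """indicator[m] = I(e | D_m); posterior[m] = P(e | D_m)."""
    return sum(i * p for i, p in zip(indicator, posterior))

e = expected_count([1, 0, 1, 1], [0.9, 0.8, 0.5, 0.25])  # 0.9 + 0.5 + 0.25 = 1.65
```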
2.5 Natural Language Processing Algorithms for Handwritten
Phrase Recognition
This section presents natural language processing algorithms relating to the field of
handwriting recognition. In particular, it focuses on the role of statistical language modelling
algorithms in handwritten sentence recognition systems.
2.5.1 Statistical Language Modelling
[MS99] stated that the major purpose behind statistical language modelling is to capture a
language’s regularities via statistical inference on its corpus. According to the literature
[QAC05], the concept of applying statistical language models to automatic text transcription
was initiated by speech recognition research. The concept was then later adapted to
handwriting recognition problems, resulting in several handwriting recognition engines built
with statistical language modelling techniques. For instance, recent work [PVM+03],
[Ms01], [QAC05] and [MB01], [ZB04], [VBB04] applied statistical language modelling
techniques to resolve the problems of online and offline handwritten sentence recognition,
respectively, and the work [QAC05] achieved up to 90.4% word recognition accuracy.
In general, the most commonly used statistical language models in the field of handwriting
recognition are n-gram models, which are denoted as follows by [QAC05]:
p(W) = \prod_{i=1}^{n} p(w_i \mid w_{i-n+1}^{i-1})   (2.5)

where p(W) is the probability of a word sequence given by a statistical language model, and p(w_i \mid w_{i-n+1}^{i-1}) is estimated from the frequency with which the sequence w_{i-n+1}^{i} occurs in a corpus.
By applying a statistical language model, [QAC05] proposed a solution to online
handwritten sentence recognition as follows:
\hat{W} = \arg\max_{W} P(S \mid W)\, p(W)   (2.6)
where Ŵ is the most likely word sequence for a written sentence (out of the candidate sequences W), S is the given handwritten sentence to recognise, P(S|W) is the probability of the written sentence S given a word sequence W, and p(W) is the statistical language model’s probability for the sequence W. This work identifies the most likely word sequence for a written sentence by finding the best path in a word graph (i.e., a graphical model of a sentence’s candidate words) using a Viterbi search algorithm [QAC05].
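A minimal bigram (n = 2) instance of equation (2.5) can be built by counting relative frequencies in a corpus. The three sentences below are invented for illustration and are not the thesis's training data.

```python
from collections import Counter

# Bigram language model per equation (2.5), built from a toy corpus.
corpus = [["i", "am", "not"], ["i", "am", "here"], ["you", "are", "not"]]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1])
                  for sent in corpus for i in range(len(sent) - 1))

def p_bigram(prev, word):
    """Relative-frequency estimate of p(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sequence(words):
    """p(W) approximated as p(w1) times the product of bigram probabilities."""
    p = unigrams[words[0]] / sum(unigrams.values())
    for prev, word in zip(words, words[1:]):
        p *= p_bigram(prev, word)
    return p
```

A real system would add smoothing for unseen bigrams (analogous to the Dirichlet prior of equation (2.2)); this sketch keeps only the raw relative-frequency estimate.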
2.5.2 Viterbi Algorithm
The Viterbi algorithm provides an efficient way of finding the most likely state sequence, in the Maximum a Posteriori (MAP) probability sense, of a process that is assumed to be a finite-state discrete-time Markov process [Ml00]. Here, finite state means that the number of states in the model is limited; discrete-time means that it takes the same unit of time to move from any state to an adjacent state; and the Markov property means that (assuming a first order Markov process) the probability of being in state c_k at time k, given all states up to k-1, depends only on the previous state c_{k-1} at time k-1. [Ml00] formulates the first order Markov process as follows:
p(c_k \mid c_{k-1}, \ldots, c_0) = p(c_k \mid c_{k-1})   (2.7)
Overall, a Markov process can be of any order; the nth order Markov process is defined as:

p(c_k \mid c_{k-1}, \ldots, c_0) = p(c_k \mid c_{k-1}, \ldots, c_{k-n})   (2.8)
In order to clarify the Viterbi algorithm’s role in handwriting recognition, consider the formulation (2.9), proposed by [Ml00] for handwritten word recognition, in which the process is assumed to be a first order Markov process:

g_c(Z) = \sum_{i=1}^{n} \log p(z_i \mid c_i) + \log\left[ p(c_1)\, P(c_2 \mid c_1) \cdots P(c_n \mid c_{n-1}) \right]   (2.9)

where g_c(Z) is the maximum posterior probability of the character sequence conditioned on the candidate character sequence C = c_1, c_2, \ldots, c_n, and z_i is the feature vector for the ith character.
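The maximisation over candidate sequences in equation (2.9) can be carried out efficiently by the Viterbi algorithm in log space. The sketch below uses hypothetical candidate characters, emission scores and transition scores, not values from any real recogniser.

```python
import math

# Viterbi search maximising equation (2.9) in log space over candidate
# characters.  All characters and probabilities are hypothetical.
def viterbi(emit_logp, start_logp, trans_logp):
    """emit_logp[i][c] = log p(z_i | c); returns the best character sequence."""
    best = {c: (start_logp[c] + emit_logp[0][c], [c]) for c in start_logp}
    for scores in emit_logp[1:]:
        step = {}
        for c in scores:
            lp, path = max((best[p][0] + trans_logp[p][c] + scores[c], best[p][1])
                           for p in best)
            step[c] = (lp, path + [c])
        best = step
    return max(best.values())[1]

start_logp = {"a": math.log(0.6), "b": math.log(0.4)}
trans_logp = {"a": {"a": math.log(0.3), "b": math.log(0.7)},
              "b": {"a": math.log(0.5), "b": math.log(0.5)}}
emit_logp = [{"a": math.log(0.9), "b": math.log(0.1)},
             {"a": math.log(0.2), "b": math.log(0.8)}]
decoded = viterbi(emit_logp, start_logp, trans_logp)  # ["a", "b"]
```

Working in log space turns the product in equation (2.9) into a sum and avoids numerical underflow for long sequences, which is why the formulation above is already stated in logarithms.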
2.6 Pen Application Program Interfaces (APIs)
With the rapid growth in popularity of pen based computing in recent years, a number of pen based application program interfaces (APIs) have become widely available.
popular APIs for collecting, manipulating and recognizing digital ink is Microsoft Tablet PC
platform SDK APIs [Tab04], which mainly supports the Microsoft Tablet PC platform. The
APIs include functions to manipulate low level ink data as well as higher segment-level,
stroke-level, word-level and phrase-level recognition. Some of the stroke-level APIs are not
directly applicable to the current research as Pitman’s Shorthand is not included in the
supported languages. Nevertheless, other APIs are highly useful for the development of ink
input and text output user interfaces. Implementation of these APIs for the overall
recognition and transcription of handwritten Pitman’s Shorthand is discussed in detail in
Chapter 7.
2.7 Summary
This chapter presented a literature review of systems and techniques relating to computer
aided recognition and transcription of handwritten Pitman’s Shorthand. The commercial
viability of the handwritten Pitman’s Shorthand recogniser is evaluated in comparison to the
functionalities of handheld devices’ existing text entry systems. The chapter presents basic
information on Pitman’s Shorthand, which is vital to enable the reader to easily follow the
thesis’ discussions, and it also provides brief reviews of decades of previous work on the
automatic recognition of handwritten Pitman’s Shorthand. A number of graphical models
applied to the pattern recognition field were discussed, with a thorough algorithm review of
the Bayesian Network’s architecture, mainly from the aspect of the algorithm’s efficiency in
handling handwritten Pitman’s Shorthand word recognition problems. The role of statistical
language models in the recognition of handwritten sentences has also been addressed,
together with a review of the Viterbi algorithm. The chapter also highlighted tablet PC
related application program interfaces (APIs) that are essential for the development of a
commercially viable prototype handwritten Pitman’s Shorthand recogniser.
3 Evaluation of phonetic based transcription of vocalised
handwritten Pitman’s outlines
Chapter 3 Introduction
The previous chapter reviewed the performance of existing work carried out on the
automatic recognition of handwritten Pitman’s Shorthand and presented an overview of
popular pattern recognition algorithms that can be used to improve the performance of word
level and phrase level recognition. Before taking the next step to advanced word and phrase
recognition, this chapter first presents a preliminary experiment, carried out to verify
whether existing transliteration methods, proposed in the literature, are efficient enough for
the purpose of this project.
In particular, the primary goal of this preliminary assessment is to ensure whether it is
practical to convert segmented portions of shorthand outlines into phonetic values prior to a
text translation. Perhaps direct translation of segmented primitives of shorthand outlines
into English words is more efficient, however, such an attempt has never been reported
throughout two decades of previous work. It has been shown that a primitive to text
translation approach is robust against stroke variation [CK04] and the approach is applied in
several commercial handwriting recognisers [Hn97], [LY97], [HV93]. Taking into
consideration the transcription accuracy achieved by existing systems, this research does not
assume that phonetic based transcription is the only possible way to transliterate
handwritten Pitman's Shorthand. In addition, the direct translation of primitives into words
was not feasible at the time of previous work because there was no electronic Pitman's
Shorthand lexicon that enabled primitives to be mapped directly to related words. If such a
lexicon existed, direct translation of primitives into text would become feasible. It is
proposed in this research that it is
reasonable to create an electronic Pitman’s Shorthand lexicon and analyse a primitive-to-text
translation approach. However, a careful appraisal of conventional methods is performed
before implementing a new algorithm. Therefore, this chapter first analyses the
advantages and disadvantages of phonetic based translation via experimental results, and
then presents a discussion of why a primitive based transcription approach is preferable to
a phonetic based one.
In general, appraisal of existing methods can be carried out easily if the existing systems
serve the purpose of the assessment directly. However, this is not the case in the current
assessment (i.e., assessment of conventional phonetic based transcription methods). There
are two reasons for this: firstly, previous work by [LQ90], [QL93], [LD86] mainly
emphasises low level pattern classification and the work presents just logical procedures of a
linguistic post processor with no detailed implementation for phonetic based word
translation. Secondly, the work by [NB02], [KSN+03], [SKN+04] emphasises offline
recognition, and those systems do not fit the objectives of the current experiment. As a
result, this chapter presents a prototype of a linguistic post processor that includes the
conventional idea of phonetic based word translation, plus novel pattern tuning algorithms
which are effective in dealing with the shape variations of handwritten Pitman's Shorthand.
3.1 System Overview
In order to assist the reader to get a clear understanding of the whole framework, an
overview of the transcription engine in combination with our collaborator’s recognition
engine is given (Figure 3.1). Ink data is collected by the recognition engine whose role is to
first differentiate between vocalized outlines and short-forms. It then segments a vocalized
outline into the most relevant fragments by detecting dominant points along the outline. The
segmented primitives are then processed through a neural network classifier, and a ranked
list of pattern primitives, along with each of their related categories, is produced at the end of
the classification process. Short-forms are recognized separately from vocalized outlines
using a Template Matching Algorithm. Unlike the vocalized outline recognizer, the short-
form recognizer immediately produces a ranked list of candidate words for a given short-form.
Detailed descriptions of the collaborator's recognition engine can be found in recent
publications [YLH+04a], [YLH+04b], [YLH+05a].
Figure 3.1: An abstract view of the whole system
The role of the transcription engine is to find the best candidate word for a given vocalized
outline or short-form. It includes two major stages: word level transcription and phrase level
transcription. At the word level, short-forms are not taken into account since they have
already been interpreted into the most likely words by the recognition engine. Vocalized
outlines are transliterated into sets of English characters by two processes: pre-processing
and word recognition. These two processes are the primary components of the system
presented in this chapter. The pre-processor performs the setting up of essential lexical
knowledge relating to handwritten Pitman’s Shorthand. The word recognizer then takes a
ranked list of classified primitives, which are forwarded from the recognition engine as
input, and produces a ranked list of candidate words as output.
After word recognition, candidate words of either a vocalized outline or a short-form are put
through a phrase level processor and the word with the highest contextual probability is
chosen as a correct representation for an input outline. The phrase level transcription is not
studied in this chapter since the primary purpose of the preliminary experiment is to analyse
word recognition performance.
3.2 Transcription of Vocalized Outlines Based on a Phonetic
Approach
A detailed view of a phonetic based vocalized outline interpreter is illustrated in Figure 3.2
and it consists of the following modules:
- Lexicon preparation: converts a phonetic lexicon into a hash table such that similar
sounding words are indexed under the same key in order to cope with phonetic rules
of Pitman’s Shorthand.
- Nearest Neighbourhood Query: slightly adjusts segmented features of an input
shorthand outline in order to cope with shape-variation in handwriting.
- Feature to phoneme conversion: converts geometrical features of shorthand outlines
into phonetic representation in order to match with a phonetic lexicon.
- Phoneme ordering: reorders resultant phonemes, produced by a “Feature to
phoneme conversion” process into a linguistic sequence in order to match with a
phonetic lexicon.
- Lexicon lookup: matches a series of phonemes with a phonetic lexicon to find
related English words.
Figure 3.2: Detailed view of a vocalized outline interpreter
3.3 Lexicon Preparation
The primary purpose of the lexicon preparation is to convert a phonetic dictionary into a
hash table data structure and categorise words with similar pronunciations under the same
key. Here, words with similar pronunciations mean words with either identical phonemes or
similar phonemes. For instance, the words “bet” and “pet” have similar pronunciations
because they contain similar phonemes, the only difference being the voiced consonant /B/
versus the unvoiced consonant /P/.
A major benefit of keeping similar sounding words under the same key is to reduce the
search complexity to O(1). In addition, it enables the retrieval of a list of ambiguous words
for an input outline by a single lookup because the creation of a hash table for a lexicon is
based on the hypothesis: “words with similar pronunciations resemble one another in
Pitman’s Shorthand”. One may question why similar sounding words resemble one another
in Pitman’s Shorthand since this assumption is not true in normal English. In normal
alphabetical handwriting, two similar sounding words do not exactly need to look alike. An
example is given with the words “tail” and “tale” in (Figure 3.3); the two words sound alike,
but their scripts are dissimilar enough not to be confused.
Figure 3.3: Illustration of sample words in normal English and Pitman’s Shorthand
In contrast to normal English, similar sounding words do look alike or are identical in
Pitman’s Shorthand. This is due to the special rule of Pitman’s Shorthand invented for
speed improvement purposes i.e., a pair of voiced and unvoiced consonants are written in the
same stroke with different line thicknesses. An example is given with the words “tail” and
“tale” again (Figure 3.3): the two words sound alike and their scripts look identical in
Pitman’s Shorthand. Therefore, keeping similar sounding words under the same root
directly affects search performance and an algorithm for the lexicon organisation is
presented below:
N: numbers of words contained in a phonetic lexicon
Xi: ith phonetic index of the phonetic lexicon
Yi: word data relating to Xi
table: a hash table used to store data of the phonetic lexicon
key: a phonetic key
value: word data to which a specified key is mapped in table
Initialisation
table = a hash table
Lexicon organisation
For i = 0 to N
key = Xi
Yi = getWordData(Xi)
//convert unvoiced consonants into voiced consonants
key = tuneToVoicedConsonants(key)
//if a phonetic key already exists
if (table.containsKey(key))
value = table.get(key)
value += Yi
end
else
value = Yi
end
table.put(key,value)
end
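The lexicon organisation procedure above can be sketched in a few lines of Python. This is a minimal sketch only: the `UNVOICED_TO_VOICED` mapping and the ARPAbet-style phoneme symbols are simplified assumptions standing in for the full phoneme tables used by the system.

```python
# Sketch of the lexicon preparation step: words whose phonetic keys collapse
# to the same voiced-consonant form are grouped under one hash-table index.
# The mapping below is a simplified assumption, not the thesis's full table.
UNVOICED_TO_VOICED = {"P": "B", "T": "D", "K": "G", "F": "V",
                      "S": "Z", "SH": "ZH", "TH": "DH"}

def tune_to_voiced_consonants(phonemes):
    """Replace each unvoiced consonant with its voiced counterpart."""
    return tuple(UNVOICED_TO_VOICED.get(p, p) for p in phonemes)

def build_lexicon(entries):
    """entries: iterable of (phoneme_sequence, word) pairs from a phonetic dictionary."""
    table = {}
    for phonemes, word in entries:
        key = tune_to_voiced_consonants(phonemes)
        table.setdefault(key, []).append(word)  # expected O(1) insert/lookup
    return table

# "bat", "pat", "bad" and "pad" all collapse to the key ("B", "AE", "D"),
# mirroring the single /B Ă T/ index of Figure 3.8.
lexicon = build_lexicon([
    (("B", "AE", "T"), "bat"),
    (("P", "AE", "T"), "pat"),
    (("B", "AE", "D"), "bad"),
    (("P", "AE", "D"), "pad"),
])
```

Rebuilding the table with `build_lexicon` over a new word list also covers the lexicon-update case described in the following paragraph.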
The lexicon preparation takes place when the transcription engine is run for the first time
and does not repeat when input outlines are transcribed in real time. If any modification of a
lexicon is required, such as a change of word-list or a change of user’s domain, the existing
hash table can be updated by repeating the “lexicon preparation” procedure.
Once the lexicon data is ready, the next process, denoted "Nearest Neighbourhood Query",
is invoked.
3.4 Nearest Neighbourhood Query (NNQ)
[Figure content: consonant neighbourhoods such as {F, V}, {P, B}, {S, Z}, {TH, th} and {T, D}; vowel neighbourhoods distinguished by position (at the beginning, in the middle, or at the end of an outline); a circle neighbourhood (closed circles, unclosed circles, hooks)]
Figure 3.4: Sample neighbourhoods predefined in the Nearest Neighbourhood Query Approach
The Nearest Neighbourhood Query (NNQ) is, in fact, a heuristic approach in which
misclassified pen strokes are adjusted according to the degree of similarity to other strokes.
Primitives with similar geometric features are predefined in the same neighbourhood, and the
system comprises seven neighbourhoods: four relate to vertical and horizontal strokes, one
to circular primitives, and the remaining two to dot and dash vowel primitives.
Here, similarity means having similar angular structure for stroke primitives, having similar
shape for circular primitives or having similar location and shape for vowel primitives.
Samples of the predefined neighbourhoods are illustrated in Figure 3.4 and the Nearest
Neighbourhood Query algorithm is presented as follows:
{N1, N2, .., N7}: a collection of seven neighbourhoods
O: an input handwritten outline
I : number of segments of an input outline, O
Si: ith segment of an input outline, O
Pattern: a pattern category of Si
Xi: a resultant vector, containing a set of primitives that are similar to Si
R: an output vector, containing a set of Xi where (i = 1, 2,.., I)
M: a matrix, containing a number of outlines that are similar to O
Initialization
Initialize N1, N2, N3, N4, N5, N6, N7
X = a new vector
R = a new vector
Stroke adjustment
for i = 1 to I
//assign the ith segment of an input outline as a pattern category
Pattern = Si
for j = 1 to 7
//if the jth neighbourhood contains the value of Pattern
if (Pattern ∈ Nj)
//get all the elements of Nj excluding Pattern
Xi = Nj \ {Pattern}
end
end
R += Xi
end
M = createMatrix(R)
return M //output of the NNQ algorithm
Figure 3.5: Sample output produced by the Nearest Neighbourhood Query
The output of NNQ is a matrix of primitives, whereby each row represents a particular
shorthand outline that is similar to an input pattern and each column represents a certain
segment of the shorthand outline. A pictorial presentation of NNQ is given in Figure 3.5 in
which sample input and output of the algorithm can be clearly seen. Once the NNQ process
is completed, the next process, "Feature to Phoneme Conversion", is invoked.
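The NNQ procedure can be sketched as follows. The neighbourhood contents are simplified placeholders for the seven predefined neighbourhoods, and the original classification is kept alongside its neighbours so that the output matrix also covers the input outline — a slight simplification of the algorithm described above.

```python
from itertools import product

# Sketch of the Nearest Neighbourhood Query: each classified segment is
# expanded into the set of geometrically similar primitives, and the
# per-segment alternatives are combined into a matrix of candidate outlines.
# The neighbourhood contents below are illustrative assumptions.
NEIGHBOURHOODS = [
    {"F", "V"}, {"P", "B"}, {"S", "Z"}, {"TH", "th"}, {"T", "D"},
    {"dot_vowel", "dash_vowel"},
    {"closed_circle", "open_circle", "hook"},
]

def nnq(segments):
    per_segment = []
    for seg in segments:
        alternatives = {seg}
        for neighbourhood in NEIGHBOURHOODS:
            if seg in neighbourhood:
                alternatives |= neighbourhood
        per_segment.append(sorted(alternatives))
    # each row of the matrix is one outline similar to the input
    return [list(row) for row in product(*per_segment)]
```

For example, `nnq(["P", "T"])` yields four candidate outlines (/P/ or /B/ followed by /T/ or /D/). The matrix grows combinatorially with outline length, so in practice only top-ranked alternatives would be expanded.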
3.5 Feature to Phoneme Conversion
This process converts segmented portions of a shorthand outline (e.g., loops, hooks or
strokes) into a phonetic representation using a set of production rules. According to our
study, approximately 20% of segmented portions of shorthand outlines, either forwarded
from the recognition engine or produced by the NNQ, can be directly converted into basic
Pitman’s phonemes. The remaining 80% need knowledge of adjacent primitives to be
translated into phonetic values based on a number of production rules. Similar to the work
by Leedham [Lg90], the production rules are applied with respect to a relationship between
an individual primitive and its adjacent primitives. Unlike Leedham’s approach, rules are
applied in the order of priority in this novel system. Basically, there are five production rules
introduced in this new system and they can be stated in a descending priority order as
follows:
1. Feature Detection (FD)
2. Length Detection (LD)
3. Primitive Combination (PC)
4. Primitive Combination and Reverse Ordering (PCRO)
5. Direct Translation (DT)
To clarify the first two rules, consider the two examples described below; to clarify the
last three rules, refer to the examples in Table 3-2. In addition, the basic notations of
Pitman's Shorthand relating to each rule are listed in Table 3-1.
Table 3-1: Relationship between the production rules and basic Pitman phonemes
Rule Pitman phonemes
FD SES, ZES circles, ST, STER loop, N, F, V, SHUN hook, suffix –SHIP hook,
suffix –ING/INGS dot
LD MD, ND, suffix –MENT, half length strokes, double length strokes
PC W, Y, H
PCRO PL, BL, etc., PR, BR, etc., FR, VR, etc., and FL, VL, etc.
DT All consonants except Y, W and H
Example 1: Application of Feature Detection (FD) Rule
STER large loop: Pitman uses a large loop to indicate the sound of /STER/ in the middle or
at the end of an outline. For this case, one of the FD rules reads: “IF a stroke or curve
primitive is followed by a large circular loop primitive in the middle or at the end of an
outline, THEN the loop appends phonemes of /STER/ to the preceding phoneme.”
Example 2: Application of Length Detection (LD) Rule
Double length curves: Normal Curve-primitives are doubled in length to represent the
addition of the syllables -TER, -DER, -THER and -TURE in Pitman’s Shorthand. For this
case, one of the LD rules reads: "IF a curve primitive is doubled in length, THEN the
double-length curve inserts the phonemes /TER/, /DER/, /THER/ or /TURE/ after the
phoneme of the curve." To understand this principle clearly, consider the example in Figure
3.6.
Figure 3.6: Sample of phoneme translation of a double length stroke
As shown in the reference section of Figure 3.6, a normal downward curve represents a
phoneme /F/ in Pitman’s Shorthand, however, when the curve is doubled in length, it
represents the sound /F/ plus additional sounds of /TER/, /DER/, /THER/ or /TURE/.
Therefore, a candidate list for the word “after” contains four different pronunciations at the
end of phoneme conversion (Figure 3.6).
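The priority ordering of the production rules can be illustrated with a short sketch. The rule predicates below are hypothetical placeholders (only an FD-style and a DT-style rule are shown), not the system's actual rule base.

```python
# Illustrative sketch of priority-ordered rule application for feature-to-
# phoneme conversion. Each rule is a (match, translate) pair; the concrete
# predicates here are hypothetical placeholders, not the thesis's rule base.
def fd_match(prev, cur):       # Feature Detection: e.g. a large loop after a stroke
    return cur == "large_loop" and prev is not None

def fd_translate(prev, cur):
    return ["STER"]

def dt_match(prev, cur):       # Direct Translation: a lone known primitive
    return cur == "horizontal_stroke"

def dt_translate(prev, cur):
    return ["G/K"]

RULES = [  # descending priority: FD, LD, PC, PCRO, DT (only FD and DT sketched)
    (fd_match, fd_translate),
    (dt_match, dt_translate),
]

def to_phonemes(primitives):
    phonemes = []
    prev = None
    for cur in primitives:
        for match, translate in RULES:
            if match(prev, cur):
                phonemes.extend(translate(prev, cur))
                break  # first (highest-priority) matching rule wins
        # unmatched primitives are simply skipped in this sketch
        prev = cur
    return phonemes
```

The `break` statement is what enforces the descending priority: once a higher-priority rule fires for a primitive, lower-priority rules are not consulted.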
3.6 Phoneme Ordering
The primary task of “Phoneme ordering” is reordering resultant phonemes, produced by the
“Feature to Phoneme Conversion” process. The reordering is required due to a special
writing order of Pitman’s Shorthand i.e., consonants of a word are always written first and
vowel notations are written only after the completion of a whole consonant kernel. In online
handwriting recognition, pen data is collected in time order, so vowel primitives are always
tagged behind consonant primitives regardless of the linguistic order in our system. To
obtain correctly ordered phonemes, vowels need to be inserted
among consonants. Leedham [Lg90] proposed the same strategy to sort phonemes according
to the linguistic order and this process is denoted as “Phoneme ordering”.
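The reordering can be sketched as an insertion of tagged vowels among the consonants. The (vowel, position) encoding used here is an assumption made for this sketch; the system itself derives vowel positions from dominant-point and pen-sequence information.

```python
# Illustrative sketch of phoneme ordering: consonant phonemes arrive in
# written (time) order, vowels are tagged afterwards with the index of the
# consonant they precede. The (vowel, position) encoding is an assumption.
def order_phonemes(consonants, tagged_vowels):
    """consonants: consonant phonemes in written order.
    tagged_vowels: (vowel, i) pairs meaning 'insert before consonant i'
    (i == len(consonants) appends the vowel at the end of the word)."""
    slots = [[] for _ in range(len(consonants) + 1)]
    for vowel, i in tagged_vowels:
        slots[i].append(vowel)
    ordered = []
    for i, consonant in enumerate(consonants):
        ordered.extend(slots[i])
        ordered.append(consonant)
    ordered.extend(slots[len(consonants)])
    return ordered

# e.g. a vowel /A/ tagged to precede the first consonant:
# order_phonemes(["F", "TER"], [("A", 0)]) -> ["A", "F", "TER"]
```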
An example of "phoneme ordering" is given in Figure 3.7, in which the sample inputs are
taken directly from the outputs of the "Feature to Phoneme Conversion" process demonstrated
in Figure 3.6. As shown in Figure 3.7(a), the vowel /A/ is detected last although it is the
first phoneme in the word "after". The system uses dominant point information and sequence
information of the ink data to place vowels at their correct positions. After the phonemes
have been sorted into the correct order, they are matched against a phonetic lexicon in the
next process, called "lexicon lookup". A list of orthographic English words that best
represent the input shorthand outline is then produced at the end of the search.
Figure 3.7: (a) Sample input of phoneme ordering process (b) sample output of phoneme ordering process
[Figure content: (a) inputs /TER/+/A/, /DER/+/A/, /THER/+/A/ and /TURE/+/A/; (b) outputs /A/+/TER/, /A/+/DER/, /A/+/THER/ and /A/+/TURE/]
Table 3-2: Phoneme translation using the PC, PCRO or DT rules

(a) Word "word" – primitives classified by the recognition engine: (1) a small anti-clockwise hook and (2) an upward diagonal stroke, which combine to give /W/; (3) a /D/ or /T/ consonant; (4) an /AW/ vowel.
Translation is based on the rule of "primitive combination" (PC). The rule applied to this example is: "IF an upward diagonal stroke is preceded by a small anti-clockwise hook, THEN the combination of these two primitives denotes the phoneme /W/."

(b) Word "printed" – primitives: (1) a small hook and (2) a straight downward stroke, which combine to give /PR/ or /BR/; (3) an /N/ curve; (4) a /T/ or /D/ consonant; (5) an /Ē/ vowel; (6) an /Ā/ vowel.
Translation is based on the rule of "primitive combination and reverse ordering" (PCRO). The rule applied here is: "IF a small hook is followed by a straight downward stroke, the small hook is converted into the phoneme /R/ and swapped with the succeeding phoneme."

(c) Word "go" – primitives: (1) a horizontal stroke, giving the /G/ or /K/ consonant; (2) an /Ō/ vowel.
Translation is based on the rule of "direct translation" (DT). The rule applied to this example is: "IF a horizontal stroke is written from left to right, THEN the stroke directly denotes the phoneme /G/ or /K/."
3.7 Experimental Results
The preliminary experiment described in this chapter comprises two main studies: firstly, a
statistical analysis of homophones (words which sound alike, and hence look alike in
Pitman's Shorthand, but have different spellings) in a phonetic lexicon and, secondly, a
performance evaluation of the word level transcription of the system prototype.
3.7.1 Data Set
For the statistical analysis of a phonetic lexicon, a list of the 5000 most frequently used
English words, extracted from the Brown Corpus, is used. Based on this word list, a hash
table is created with a series of phonemes as the key for each group of words. Here, the
phonetic keys are extracted from the CMU phonetic dictionary (Figure 3.8 gives a pictorial
representation of the hash table).
Index      Words
/B Ă T/    Bat, Pat, Bad, Pad
Figure 3.8: Sample element of a phonetic lexicon in a hash table
For an analysis of word transcription performance, 432 Pitman outlines were collected,
written with different levels of tidiness on a WACOM ART II Tablet by three writers. Each
writer wrote a sample sentence, consisting of 28 vocalized outlines and 20 short-forms, three
times. Here, the sample sentence covers the whole range of shorthand primitives, and the
selected words are contained in the 5000 most frequently used English words of the general
domain. Samples of the collected data are illustrated in Figure 3.9.
Figure 3.9: Sample collected outlines
3.7.2 Analysis of a Phonetic Lexicon
The goal of this experiment is to estimate an approximate number of candidate words
(homophones) for each input outline by using a phonetic lexicon, and to evaluate which
vocabulary level has the highest ambiguity and which has the least. Here, vocabulary level
means the set of words known and used by a writer, and it corresponds to the number of
words contained in a lexicon. Statistics obtained from this study are intended to estimate
the preliminary accuracy of the post-processing of handwritten Pitman's Shorthand with
respect to different levels of writers' vocabulary.
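The three uniqueness measurements of this experiment can be sketched as one function over the hashed lexicon. The four-word entry list and the phoneme sets below are stand-ins for the 5000-word Brown Corpus list and the CMU phoneme inventory.

```python
# Illustrative sketch of the lexicon ambiguity analysis: count what fraction
# of hash-table indices map to exactly one word under different key
# reductions. The word list and phoneme sets are simplified assumptions.
UNVOICED_TO_VOICED = {"P": "B", "T": "D", "K": "G", "F": "V", "S": "Z"}
VOWELS = {"AE", "EH", "IY", "OW", "UW"}

def uniqueness(entries, collapse_voicing=False, drop_vowels=False):
    """Return the percentage of lexicon indices holding exactly one word."""
    table = {}
    for phonemes, word in entries:
        key = phonemes
        if collapse_voicing:   # simulate undetected line thickness
            key = tuple(UNVOICED_TO_VOICED.get(p, p) for p in key)
        if drop_vowels:        # simulate omitted vowel notations
            key = tuple(p for p in key if p not in VOWELS)
        table.setdefault(key, []).append(word)
    unique = sum(1 for words in table.values() if len(words) == 1)
    return 100.0 * unique / len(table)

entries = [
    (("B", "AE", "T"), "bat"),
    (("P", "AE", "T"), "pat"),
    (("B", "IY", "T"), "beat"),
    (("B", "AE", "D"), "bad"),
]
```

Here `uniqueness(entries)` simulates perfect recognition, `collapse_voicing=True` the line-thickness test, and `drop_vowels=True` the vowel-ambiguity test of Figure 3.10.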
[Chart: X-axis – lexicon size in number of words (100 to 5000); Y-axis – unique outlines in % (40 to 100); series – uniqueness of outlines for perfect recognition, uniqueness of outlines given line thickness ambiguity, uniqueness of outlines given vowel ambiguity]
Figure 3.10: The distribution of homophones in different sized phonetic lexicons
Figure 3.10 illustrates experimental results obtained from different sizes of phonetic lexicons
up to 5000 words. The X-axis of the graph represents different sizes of lexicon, and words
extracted for these lexicons are sorted according to their frequency of usage. This means that a
lexicon of size 100 represents the first hundred most commonly used words in English; a
lexicon of size 300 represents the first 300 most commonly used words and so on. The first
test simulates how an input of Pitman’s outline can be uniquely identified by a lexicon in the
presence of perfect segmentation and recognition. According to the test, 97% of the 5000
most frequently used English words have a unique representation. The maximum ambiguity
is 3 potential words per index and the average is 1.02 potential words per index.
Therefore, a transcription accuracy of at least 97% can be estimated if there are no errors in
the low level segmentation and classification of shorthand outlines.
The second test (Figure 3.10) estimates the transcription performance in the presence of
unclear thickness of a pen-stroke. This is the most common case experienced in the
recognition of Pitman shorthand, as most digitizers are unable to detect the thickness of a
pen-stroke, even though Pitman represents pairs of similar sounding consonants by the same
strokes and differentiates between voiced and unvoiced sounds by thick and thin lines. It should also be
noted that regardless of the input technology, writers do not make a clear distinction between
thick and thin strokes. According to this test, ambiguity of a lexicon of 5000 words increases
by about 9% if there is no distinction between voiced and unvoiced consonants. The
transcription accuracy here is expected to be at least 87%.
The third test in Figure 3.10 predicts the transcription performance in the presence of
ambiguous vowel notations. This is an important consideration in the recognition of Pitman
shorthand, since vowels are occasionally omitted when writing Pitman's Shorthand and the
omitted positions vary with the writer's experience and individual inclination. If the
unpredictable omission of vowels in an outline is handled by excluding vowels from the
lexicon and matching without vowel components, the resulting lexicon has only about 56%
unique indices.
3.7.3 Performance Evaluation of the Word Level Transcription
The goal of this experiment is to evaluate the word transcription performance of our
proposed framework under the following criteria:
- in the presence of shape variation and position confusion due to speed writing or different users' writing;
- in the presence of segmentation and classification errors due to misclassification or hardware constraints; and
- in the presence of abnormal outlines due to inconsistent writing.
Table 3-3: Experimental results of the phonetic based word translation

Description                                      Transcription accuracy (vocalised outlines)
Overall                                          84%
In the presence of vowel omission or confusion   0%
In the presence of inconsistent writing          0%
In the presence of classification error          100%
As shown in Table 3-3, the best rate achieved by the vocalized outline interpreter is 84%.
An error rate of 12% is due to inconsistent writing, i.e., outlines which are comprehensible
to human readers but are not consistent with the writing rules of Pitman shorthand. An
interesting phenomenon observed in this experiment is that 48% of perfect transcriptions
occur in the presence of recognition errors. This shows that the approximate pattern
matching technique applied in NNQ is capable of dealing with classification errors. A
primary limitation of this system, accounting for 40% of the error rate, is the inability to
correctly transcribe outlines with hidden or omitted vowels.
Both accuracy and error rates reported throughout this experiment are based on the number of
outlines and can be denoted as follows:

a = (c / t) × 100    (3.1)

where a is the word transcription accuracy, c is the total number of correctly interpreted
outlines, and t is the total number of handwritten outlines.

e = ((t − c) / t) × 100    (3.2)

where e is the error rate, t is the total number of handwritten outlines, and c is the total
number of correctly interpreted outlines.
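Equations (3.1) and (3.2) can be checked directly with a few lines of Python; the counts used below are hypothetical, chosen only for illustration.

```python
def accuracy(c, t):
    """Equation (3.1): percentage of correctly interpreted outlines."""
    return c / t * 100

def error_rate(c, t):
    """Equation (3.2): percentage of incorrectly interpreted outlines."""
    return (t - c) / t * 100

# e.g. 3 of 4 outlines correct: accuracy(3, 4) -> 75.0, error_rate(3, 4) -> 25.0
# The two rates are complementary and sum to 100.
```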
3.8 Discussion
On the whole, a primary advantage of phonetic based transcription of vocalised outlines is
its ability to build on existing language models (i.e., phonetic models), which define large
vocabularies with probability distributions over sequences of phonemes. Another
distinct advantage is that a machine performs the same logical procedures as a human
interpreter to transcribe Pitman’s Shorthand outlines and this makes the machine
transcription concept easy to follow.
In terms of disadvantages, the performance of phonetic based transcription falls dramatically in
the presence of omitted vowels in vocalised outlines. According to our statistical analysis of
a phonetic dictionary (Figure 3.10), transcription of vocalised outlines without vowel
components is estimated at merely 56% correctness. In addition, the special writing rules of
Pitman's Shorthand raise ambiguity in phonetic based transcription: a number of rules
invented to improve writing speed allow primitives that differ only slightly in size, length,
thickness or inclination to denote different sounds. In general, it is practical to express
accurate size, length, or
inclination of a stroke for a printed script; however it is less practical for handwriting,
especially if the script is written at speed. The following examples illustrate the variation of
pronunciations with the minor differences between geometric features.
Example 1: Appearance variation
As shown in Figure 3.11, standard notations for consonants /T/ and /L/ are a vertical stroke
and a curve respectively, however if /T/ is immediately followed by consonant /L/ or if
there is a non-stressed vowel between /T/ and /L/, Pitman uses a combination of a small
hook and a vertical stroke to indicate the sound of /TL/ or /T+silent_vowel+L/.
On the other hand, an outline with a small circle followed by a vertical stroke
stands for the sound /ST/ and it can be easily confused with an outline of /TL/ or
/T+silent_Vowel+L/ if the circle at the beginning is not clearly written. According to
experimental results, approximately 45% of small hooks are recognised as circles.
Therefore, a direct conversion of primitives, which are prone to minor recognition errors,
into phonemes can lead to completely different interpretations.
Figure 3.11: Illustration of the incidence of phoneme variation due to confusion between a circle and a hook
Basic Pitman’s notations
/T/
/L/
/S/
/ST/
/TL/ or/T+silent_vowel+L/
Handwritten outline
3. Evaluation of phonetic based transcription of vocalised handwritten Pitman’s outlines
61
Example 2: Length variation
In Pitman’s Shorthand, curves of different lengths represent different phonemes; however,
length is not clearly shown in some outlines while writing at speed. As shown in Figure
3.12, a sample outline of the word "shatter" can be wrongly interpreted as /SH Ă/ instead of
/SH Ă T ER/ if the curve /SH/ is not recognised as a long curve.
Figure 3.12: Illustration of the incidence of phoneme variation due to length confusion
Examples 1 and 2 demonstrate that converting inaccurate handwritten primitives into
phonemes allows unnecessary candidates to appear at an early stage and subsequently affects
the transcription performance. Cho & Kim [CK04] proposed that stroke relationships are
usually robust against geometric variations and important for discriminating characters of
similar shapes in on-line handwriting recognition. It is, therefore, more appropriate to retain
stroke information of an outline rather than changing it into phonemes.
After a thorough evaluation of the advantages and disadvantages of a phonetic based
transcription approach, it has been concluded that the remaining work of this thesis will be
based on a novel transcription method that retains low level stroke information, denoted the
"primitive based transcription approach".
4 Bayesian Network Based Word Transcription
Chapter 4 Introduction
The previous chapter reviewed the advantages and disadvantages of a phonetic based
transliteration of handwritten Pitman's Shorthand and concluded that departing from the
conventional phonetic approach is appealing. This chapter discusses the
novel approach, implemented specifically for this research to improve word transcription
accuracy by using a primitive to text transliteration approach. In this new approach, a
Bayesian Network representation is applied to model the ambiguities and stroke
dependencies of handwritten Pitman's Shorthand outlines.
First of all, an overview of the whole system is given, thereby enabling the reader to get a
clear understanding of the role of the word transcription processes. Following the overview,
a detailed description of a Bayesian Network word recogniser is given under the following
topics.
Summary: a brief description of each process included in a Bayesian Network based
word recogniser.
Life cycle: explanation of a life cycle of Bayesian Network models that represent
handwritten Pitman’s Shorthand outlines.
Network architecture: description of an outline model’s architecture, including
attributes (nodes) and relationship between nodes (topology).
Inference: propagation of the likelihood of an attribute of an outline model based on
other attributes of the model, in which the propagated values are used for N-best
word selection.
Training (Learning): training outline models with a collection of training data in
order to enable the system to cope with the natural ambiguity of handwriting in
Pitman’s Shorthand.
Model selection: selection of N-best outline models for a given input outline centred
on knowledge based rejection strategies.
Experiment: performance evaluation of the Bayesian Network based word
recogniser
4.1 System Overview
Figure 4.1: An abstract view of the whole system. (Components: input; collaborator’s
recognition engine, containing a vocalised outline recogniser (segmentation engine using
dominant point detection, Neural Network classifier; output: a ranked list of primitives) and
a short-form recogniser (template matching engine; output: a list of words); transcription
engine, containing a vocalised outline interpreter (pre-processing, primitive based word
transcription; output: a ranked list of words), a short-form interpreter and phrase level
transcription; output text; Internet.)

An overview of the whole system is given in Figure 4.1; the diagram is nearly identical to
the one illustrated in the previous chapter. A major difference between the two frameworks
is the change to a vocalised outline interpreter (shaded box) in the new framework, where
text is interpreted directly from primitive attributes instead of phonetic attributes as in the
old framework. A summary of the processes included in the new vocalised outline
interpreter is presented as follows.
4.2 Summary of Bayesian Network Based Word Transcription
Figure 4.2: Illustration of Bayesian Network based word transcription. (The vocalised
outline recogniser passes a ranked list of primitives to the transcription engine; within the
Bayesian Network based vocalised outline interpreter, pre-processing covers lexicon
construction and training, producing a shorthand lexicon and Bayesian Network based
outline models, while word interpretation maps an input outline to output word(s) for
phrase level transcription, which produces a ranked list of words.)
The shaded box in Figure 4.2 highlights the role of Bayesian Network based vocalised
outline transcription. It comprises two major processes: preprocessing and word
interpretation. Preprocessing takes place when the transcription engine is first set up, and
it is skipped during real time transcription of shorthand outlines unless the lexical data
needs to be modified. A primary function of preprocessing is to automatically convert a
phonetic lexicon into a Pitman’s Shorthand lexicon such that different combinations of a
series of geometric patterns represent different keys, with each
key mapping to one, or more than one, word. This approach (the creation of the Pitman’s
Shorthand lexicon) is distinct from previous work and a full description of the lexicon
creation is given in a separate chapter, Chapter 5. Another important function of
preprocessing is to create Bayesian Network based outline models where user independent
handwritten data and lexicon information are embedded in hierarchical probabilistic
structures.
The next process, which takes place immediately after the preprocessing, is word
interpretation. A primary function of the word interpretation is to produce a ranked list of N-
best words based on a confidence score of the low level recognition plus a belief of nodes of
an outline model. After the word interpretation, the N-best words are then forwarded to the
next process, called a phrase level interpreter to produce the final word(s) for a given input
outline.
4.3 Life Cycle of Outline Models
Outline models are the primary components of a vocalised outline interpreter and this
section describes the life cycle of outline models throughout the word transcription process.
Firstly, a precise description of outline models is given: a collection of outline models
represents a dictionary, and the number of outline models, generated in the word interpreter
is not the same as the number of words of a dictionary. This is because each outline model
is designed to represent one, or more than one, word in order to cope with hardware
limitations or ambiguities of handwritten Pitman’s Shorthand. An example of a hardware
limitation is the passive digitisers of Personal Digital Assistants (PDAs) that are incapable of
detecting accurate line thicknesses. This limitation makes a shorthand recogniser fail to
distinguish between two similar outlines with different line thicknesses, e.g., outlines for the
words “pays” and “bays”. As a result, grouping similar outlines under the same
model enables the system to easily find potential candidate words for a given outline and
improves the search performance. Here, “similar outlines” stands for “words with the same
series of geometric features (of a consonant kernel) regardless of different line thicknesses
and different vowel positions”. Samples of similar outlines are illustrated in Figure 4.3.
Figure 4.3: Illustration of three pairs of similar outlines (“pays”/“bays”, “oak”/“go”, “airs”/“erase”)
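The grouping of similar outlines under one model can be sketched as a dictionary keyed by the consonant-kernel feature sequence, as below; the kernel feature names are hypothetical placeholders for illustration, not the thesis’s actual primitive types.

```python
# Sketch: group words whose outlines share the same consonant-kernel
# feature sequence (ignoring line thickness and vowel position) under
# a single model key. Feature names here are illustrative only.
def group_similar_outlines(lexicon):
    """lexicon: dict mapping word -> tuple of kernel features."""
    models = {}
    for word, kernel in lexicon.items():
        models.setdefault(kernel, []).append(word)
    return models

# "pays" and "bays" differ only in line thickness, so they share a kernel
# and end up under the same outline model.
lexicon = {
    "pays": ("down-straight", "s-circle"),
    "bays": ("down-straight", "s-circle"),
    "oak":  ("k-horizontal",),
    "go":   ("k-horizontal",),
}
models = group_similar_outlines(lexicon)
```

One model key can therefore stand for several candidate words, which is exactly what lets the interpreter retrieve all candidates for an ambiguous outline in a single lookup.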
In terms of the life cycle, outline models are firstly created with the use of a shorthand
lexicon and secondly updated with a set of training data. The models are then saved as a
knowledge source for word interpretation until any changes are required. Examples of
changes include expanding the word list of an existing dictionary or altering a user domain.
In response to the change of a user domain, outline models are created, edited or removed
according to the user’s preference, defined at a domain set up process. Note that vocabulary
(i.e., the word list of a dictionary) has a huge impact on word transcription performance and
outline models should be associated with a dictionary of a corresponding domain. Figure
4.4 illustrates the life cycle of outline models.
In real time word interpretation, a series of classified primitives of an input outline are
matched with outline models, and the model with the highest posterior probability is taken as
a correct representation for a written outline.
Figure 4.4: Life cycle of outline models (a new outline model is created from the shorthand
lexicon, existing models are updated with training data, and models are removed at a new
domain set up)
4.4 Outline Model Architecture
An outline model is formed by concatenating the basic geometric features of a shorthand
outline, produced by the low level recognition engine, in chronological order. Note that
chronological writing order in Pitman’s Shorthand is not synonymous with that in normal
English. The difference between them is illustrated in Figure 4.5: the chronological writing
order of the word “beat” in normal English is b, e, a, t, whereas the writing order changes to
b, t, e, a in Pitman’s Shorthand.
Vowels are always written last no matter how words are pronounced in Pitman’s Shorthand
and this makes the automatic transliteration of handwritten Pitman’s Shorthand distinct from
the transcription of handwritten English. According to the study in Chapter 3, reordering
vowels to their corresponding positions was found to be ineffective when vowel variables
are missing from an outline. To improve upon existing systems, it was argued, one should
seek a more parsimonious solution that also leads to better text interpretation
performance. Thus, this research proposes a novel network model, denoted as an outline
model, which represents the inherently complex features of handwritten Pitman’s Shorthand.
Figure 4.5: Illustration of chronological writing order of normal English and Pitman’s Shorthand
4.4.1 Nodes of an Outline Model
The structure of an outline model is based on a Bayesian Network representation [Pj88] in
which the model is retained in a hierarchical structure with each node corresponding to a
primitive variable or a conditional variable, and each link signifying probabilistic
dependency between nodes. Similar to a network architecture designed by Xiao in the
domain of signature verification [XL02], our outline model creates the following four types
of nodes, depending on the relationship between one node and another.
1. Root node: A root node corresponds to an outline O and it represents one, or more than
one, word. It contains N child nodes {P1, P2,.. PN} where Pi corresponds to a collection of
primitives which represents the ith segment of the outline O.
2. Unique node: A unique node corresponds to a particular segment of a shorthand outline
and it represents one and only one pattern. It appears while an outline model O is created
with a shorthand lexicon at the beginning, and it remains or disappears while O is updated
with training data. The definition of a unique node is: “if a particular segment (node) of an
outline model relates to one and only one type of geometric feature after it has been updated
with a shorthand lexicon as well as training data, the node is considered to be independent of
other nodes and linked directly to a root node.” Figure 4.6 (a) and (b) illustrate occurrence
of unique nodes in two cases.
Figure 4.6: Illustration of unique nodes of an outline model. (a) Occurrence of unique nodes
after O has been created with a Pitman’s Shorthand lexicon: since features of O in the
lexicon are genuinely accurate, every segment (node) of O is related to one and only one
pattern, resulting in a unique node for every segment. (b) Occurrence of unique nodes after
O has been updated with lexicon and training data: since there is more than one possibility
in the first and third segments of
3. Virtual node: A virtual node corresponds to a certain segment of a shorthand outline and
represents a conditional variable that allows the embedding of multiple possibilities of a
consonant-segment in an outline model O. It appears when two or more primitives compete
to represent a particular node of O during the training process, but it never appears while O
is created with a shorthand lexicon at the beginning. The definition of a virtual node reads: “if
a particular primitive (e.g., P1 in Figure 4.6 (b)) is dependent on another primitive (e.g., P2 in
Figure 4.6 (b)) and there is an optional relationship between them (i.e., either at most one or
none of them can be true at the same time), we can assume that there is a mechanism that
controls the values of P1 and P2, resulting in a virtual node V1 as shown in Figure 4.6 (b).”
4. Hidden node: A hidden node corresponds to a certain portion of a shorthand outline and
represents a conditional variable that allows the embedding of hidden vowel primitives in an
outline model. An interesting assumption in relation to the creation of a hidden node is that
it appears from the time when outline models are created with a shorthand lexicon, although
the lexicon provides accurate vowel information at that time. This reflects the major purpose
of hidden nodes, i.e., to identify missing vowel components, randomly omitted by
writers according to their experience or preference. The definition of hidden nodes
reads: “if a particular primitive (e.g., P4 in Figure 4.6 (b)) appears or disappears from time to
time and the variation does not adhere to any rule, we can assume that there is a hidden
mechanism that controls the value of P4 or P5, resulting in a hidden node H1.”
In order to demonstrate how an outline model is created with the use of four types of node,
the step by step creation of an outline model for the word “bake” is given (Figure 4.7).
1. Firstly, a root node of an outline model is generated with the word “bake”.
2. The root node then creates N number of child nodes using a shorthand lexicon such
that each consonant primitive of a word in the lexicon turns into a unique node and
each vowel primitive turns into a hidden node, where N is the number of primitives
of the word.
3. The outline model is then updated with a number of training samples, resulting in
additional leaf nodes and virtual nodes.
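The three steps above can be sketched in miniature as follows, using the type numbers of the “bake” example from Figure 4.7; the Node class and its fields are illustrative assumptions, not the thesis’s implementation.

```python
# Sketch of outline-model creation for "bake": consonant primitives from
# the lexicon become unique nodes, vowels become hidden nodes, and
# training alternatives turn unique nodes into virtual nodes whose
# competing patterns become leaf children.
class Node:
    def __init__(self, kind, types):
        self.kind = kind          # "unique", "virtual" or "hidden"
        self.types = set(types)   # competing primitive type numbers

def create_model(consonants, vowels):
    """Step 1-2: build nodes from a lexicon entry."""
    nodes = [Node("unique", [t]) for t in consonants]
    nodes += [Node("hidden", [t]) for t in vowels]
    return nodes

def update_model(nodes, observed_types):
    """Step 3: merge alternatives observed in a training sample."""
    for node, types in zip(nodes, observed_types):
        node.types |= set(types)
        if node.kind == "unique" and len(node.types) > 1:
            node.kind = "virtual"   # now has competing leaf children
    return nodes

# Lexicon entry for "bake": consonant types 4 and 7, vowel type 91.
model = create_model(consonants=[4, 7], vowels=[91])
# Training data 1 offers alternatives 4/1, 7/6 and vowel 92.
model = update_model(model, observed_types=[[4, 1], [7, 6], [92]])
```

After the update, both consonant nodes have become virtual nodes with two possibilities each, mirroring Step 3 of Figure 4.7.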
Figure 4.7: Illustration of step by step creation of outline models for the word “bake”
(legend: R = root node, U = unique node, V = virtual node, H = hidden node, L = leaf node).
Step 1: creation of a root node R for the word “bake”. Step 2: creation of unique nodes and a
hidden node from the lexicon entry, whose strokes correspond to type numbers 4, 7 and 91.
Step 3: update with training data 1 (type numbers 4 or 1, 7 or 6, 91 or 92), which turns the
unique nodes into virtual nodes with leaf node children, giving two possibilities for the first
segment. Step 4: update with training data 2 (type numbers 4 and 5), adding a further leaf to
the second segment (possibilities 4 or 1, 7, 6 or 5, 91 or 92).
Figure 4.8: Sample training data for the word “bake” processed by the recognition engine; the italic text on the right explains what each line of data represents
In addition, a detailed explanation of the training data applied in the creation of an outline
model is given in Figure 4.8, which illustrates training data 1 for the word “bake”, depicted in
Figure 4.7. The second and third lines of data (Figure 4.8) indicate that there are two
possible pattern categories associated with the first segment of the word “bake”: type 4 and
type 1. Here, type 4 is equal to an existing pattern of the shorthand lexicon and type 1 is a new
pattern observed by the recognition engine. In order to update an existing outline model
with this new observation, the existing unique node (Figure 4.7, Step 3) is first transformed
into a virtual node and then attached to two leaf nodes, resulting in a virtual node
with two children. Similarly, according to the sixth line of data (Figure 4.8), a vowel
primitive (type 92) classified by the recognition engine differs from the one defined in the
lexicon (type 91), resulting in a hidden node with two children.
(Input: a Pitman’s outline for the word “bake”)
WStart                 (1st line: word start)
S1, 0, 64, 4, 0.56     (2nd line: segment number, start coordinate, end coordinate, primitive type, probability)
S1, 0, 64, 1, 0.44     (3rd line)
S2, 64, 137, 7, 0.88   (4th line)
S2, 64, 137, 6, 0.12   (5th line)
V1, 0, 64, 2, 1, 92    (6th line: vowel number, start coordinate, end coordinate, sequence, position, type)
WEnd                   (7th line: word end)
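A minimal parser for records in this layout might look as follows, assuming the comma-separated fields shown in Figure 4.8; the function and variable names are illustrative, not taken from the thesis.

```python
# Sketch: parse one word record in the Figure 4.8 layout. Consonant
# lines start with "S" (segment, start, end, primitive type, probability);
# vowel lines start with "V"; WStart/WEnd delimit the record.
def parse_training_record(lines):
    segments, vowels = {}, []
    for line in lines:
        line = line.strip()
        if line in ("WStart", "WEnd"):
            continue
        fields = [f.strip() for f in line.split(",")]
        if fields[0].startswith("S"):
            seg, _start, _end, ptype, prob = fields
            # several lines per segment = competing primitive types
            segments.setdefault(seg, []).append((int(ptype), float(prob)))
        elif fields[0].startswith("V"):
            vowels.append(tuple(int(f) for f in fields[1:]))
    return segments, vowels

record = [
    "WStart",
    "S1, 0, 64, 4, 0.56",
    "S1, 0, 64, 1, 0.44",
    "S2, 64, 137, 7, 0.88",
    "S2, 64, 137, 6, 0.12",
    "V1, 0, 64, 2, 1, 92",
    "WEnd",
]
segments, vowels = parse_training_record(record)
```

Grouping the repeated `S1`/`S2` lines per segment makes the competing pattern categories (type 4 vs. type 1, type 7 vs. type 6) directly available for the node updates described above.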
4.4.2 Relationships between Nodes
Relationships between nodes of a Bayesian Network are indicated by drawing arcs from
cause variables to their immediate effects [Hd99]. An arc signifies a cause-effect
relationship and encodes a conditional probability distribution (CPD), indicating to what
extent one variable is likely to affect another. In addition, the level of dependency between
nodes has a significant effect on computational expense: the stronger the relationship
between nodes, the bigger the conditional probability table grows, and vice versa.
Before determining the dependency of nodes of an outline model, this research takes into
account the following two extreme situations:
If nodes are extremely independent of each other, they become d-separated given an
evidence node, thereby making a network model unable to cope with abnormal
circumstances. For example, variables A and B, which are usually dependent on
each other, may be disconnected due to an occurrence of rare evidence, E.
On the other hand, if nodes are precisely connected to each other with conditional
probability distributions for all possible cases, it becomes computationally inflexible
to obtain a reliable estimation.
Taking into account the drawbacks of the above two extreme situations, the outline models
for this research are designed with the following practical hypothesis:
1. each node Pi is independent of its non-descendants Pj
2. each node Pi is independent of its descendants Di given a parent of Di
3. leaf nodes {L1, L2,…, Ln} are independent of each other unless they share the same
parent Xj
Alternatively, the conditional dependency between variables of an outline model can be
presented using the Bayes Ball algorithm [Sr98], as illustrated in Figure 4.9.
Figure 4.9 Illustration of conditional dependency of variables in an outline model using the Bayes Ball algorithm [Sr98].
If there is no flow of a ball from A to B in a graph, A and B are conditionally independent given a set of observed or hidden variables X and vice versa.
4.5 Inference
The inference process of a Bayesian Network involves updating the probability of nodes
given some evidence and prior probabilities [XL02]. This is called finding the belief of a node x,
denoted as BEL(x). In our case, evidence of nodes is given by a lexicon, training data or
user input. A primary use of BEL(x) is to find the likelihood of outline models, with which
the N-best models for a given shorthand outline are selected.
Among a variety of belief updating algorithms that support Bayesian Networks, this
work directly applies the “message passing” algorithm developed by Pearl [Pj88] – the belief
of every node in the network is taken as the product of π and λ messages, where π is a
message received from each of its parents (if any) and λ is a message received from each of
its children (if any). Alternatively, the π and λ messages of each node of an outline model are
denoted as πX(U), a message that node X receives from its parent U, and λYj(X), a message
that node X receives from its child Yj. Note that an outline model is a tree structure where
every node has one and only one parent (except the root node, which has no parent) and has
N children (Y1, Y2, .., YN).
4.5.1 Message Initialization
According to Pearl’s algorithm [Pj88], the nodes of a Bayesian Network need to be assigned
initial beliefs before messages are propagated through the network. In general, the assignment
of initial belief (prior probability) of a node varies widely from one application to another,
depending on statistical information on variables as well as previous experience of a
developer working on a similar problem.
In this work, message initialisation varies depending on the type of node. The initialisation
of π and λ messages for different types of node of an outline model is presented as follows.
Root node: A root node is the topmost one in an outline model and does not have any parent;
therefore its π message is set to 0.5 assuming that there is an equal chance of taking a TRUE
or FALSE value for this node. Its λ message is set to 1 assuming that there is a TRUE
relationship from its child nodes.
Unique node: A unique node does not have any descendants and is linked directly to a root
node. Its π message is set to 1 assuming that there is a TRUE relationship from its parent
(root node) and its λ message is set to 1, stating that the primitive associated with this node
appears in both lexicon and training data.
Virtual node: A virtual node is a judgemental node holding a true relationship from its
parent (i.e., π = 1) and an optional relationship to its children (i.e.,
λ = P(Child_Nodes | observation)).
Hidden node: Similar to a virtual node, a hidden node holds a true relationship from its
parent (i.e., π = 1) and an optional relationship to its children (i.e.,
λ = P(Child_Nodes | observation)).
Leaf node: A leaf node (not including a “unique node”) holds an optional relationship from
a virtual node or a hidden node and its π message is set to P(Child_Nodes|observation). It
does not have any children and its λ message is set to a confidence score of the node
obtained from training data.
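The per-node-type initialisation above can be sketched as a small lookup; storing π and λ as plain numbers in a dictionary is an assumption for illustration only, not the thesis’s data structure.

```python
# Sketch of the message initialisation rules: root (pi 0.5, lambda 1),
# unique (1, 1), virtual/hidden (pi 1, lambda from P(children|obs)),
# leaf (pi from P(children|obs), lambda from a training confidence).
def init_messages(node_kind, p_children_given_obs=None, confidence=None):
    if node_kind == "root":
        return {"pi": 0.5, "lambda": 1.0}
    if node_kind == "unique":
        return {"pi": 1.0, "lambda": 1.0}
    if node_kind in ("virtual", "hidden"):
        return {"pi": 1.0, "lambda": p_children_given_obs}
    if node_kind == "leaf":
        return {"pi": p_children_given_obs, "lambda": confidence}
    raise ValueError("unknown node kind: " + node_kind)

# e.g. a leaf node whose pattern was observed with probability 0.44
# and trained with confidence 0.9:
leaf = init_messages("leaf", p_children_given_obs=0.44, confidence=0.9)
```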
On the whole, our message initialisation strategy is similar to the one implemented by Xiao
and Leedham [XL02] for signature verification. Nonetheless, the estimated values are
different in this work in accordance with the characteristics of handwritten Pitman’s
Shorthand.
4.5.2 Belief Updating
Belief of a node in an outline model is calculated by the formula presented by Xiao [XL02]
denoted as
BEL(x) = β λ(x) π(x) (4.1)
where β is a normalization factor, λ(x) is a combined message received from all the children
of node X and π(x) is a combined message received from all the parents of node X.
Depending on the type of node, λ(x) is calculated differently. If it is a Root Node, λ(x) can
be defined by the formula presented by Pearl [Pj88]:
λ(x) = ∏j λYj(x) (4.2)
where λYj(x) is a message that node X receives from its child node Yj.
If a node is a unique node, λ(x) is defined as:
λ(x) = 1 (4.3)
If it is a virtual node, λ(x) is defined as:
λ(x) = ∏j λYj(x) if a child Yj of the virtual node X is true, and λ(x) = 0.001 otherwise (4.4)

If a node is a hidden node, λ(x) is defined as:

λ(x) = ∏j λYj(x) if a child Yj of the hidden node X is true, and λ(x) = 0.1 otherwise (4.5)

In equations 4.4 and 4.5, the values 0.001 and 0.1 are predefined probabilities, used when
none of the child nodes of X is likely to be true. The selection of these confidence scores is
based on several experimental results, testing different thresholds between 0 and 1.

If a node is a leaf node, but not a unique node, λ(x) is defined as:

λ(x) = (1.0, 1.0) if the value of node X is assigned by a lexicon, and λ(x) = (a, b) otherwise (4.6)
where a and b are normalised recognition and training probabilities for a corresponding
node. The next section, “Learning of Outline Models” explains how a and b are calculated
using training data.
4.6 Learning of Outline Models
In general, learning in a Bayesian Network often refers to the learning of structure of a
model and its parameters, or learning either one of them [Mk01]. In this work, learning
refers to the learning of parameters of an outline model. Structure learning is not of concern
here since there is no direct interaction between the low level segmentation engine (of the
collaborator) and the network modelling engine (of this research) to enable a dynamic
change of a basic model layout.
Parameter learning of an outline model includes finding an optimal maximum likelihood of a
node based on a set of training parameters and assigning the likelihood value as belief of the
node. There are various learning algorithms [Hd99], [Mk01] that support Bayesian
Networks and the selection of an appropriate one is based on two factors: structure of the
network (whether it is known or unknown) and evidence of nodes (whether they are fully or
partially observed). With full details of these two factors, an appropriate learning algorithm
for a particular Bayesian Network can be identified using Murphy’s decision table
(Table 4-1), shown below.
Table 4-1 Murphy’s decision table [Mk01]
Structure \ Observability | Full         | Partial
Known                     | Closed form  | EM
Unknown                   | Local search | Structural EM
The table indicates which algorithm is likely to be the most effective under which
circumstances. For example, the Expectation Maximization (EM) algorithm is likely to be
the most suitable one for a Bayesian Network whose structure is known in advance and
whose parameters are partially observed. With reference to this table, parameter learning of
an outline model is discussed in two parts: “learning of consonant primitives” and “learning
of vowel primitives”.
4.6.1 Learning of Consonant Primitives
The maximum likelihood estimates (MLE) learning method is used to find estimates of
consonant primitives of an outline model since the structure of the model is known and its
consonant parameters are fully observed in training data. The structure (states of an outline
model) is known in advance because an outline model is initially constructed from a lexicon
entry with clear information on the number of segments (states) of each shorthand outline.
In addition, the consonant primitives are always observed in training data because
stenographers never omit consonant primitives of a vocalised outline. Murphy denotes MLE
as a closed form in his table (Table 4-1).
The basic idea behind the MLE method is to maximise the likelihood of training data D,
which contains M cases (believed to be independent) [Hd99]. Assuming that
Di={N(i,1),N(i,2),..N(i,j)} is the ith sample of training data D, j is the number of consonant
primitives of a word and N(i,1)={N(i,1,1),N(i,1,2),..,N(i,1,k)} represents a set of possible consonant
primitives of the node N(i,1), a pair of MLE values (a,b) for each consonant node N(i,j,k) can be
calculated using formula 4.7 and formula 4.8 when a training sample Di is fed into an outline
model. In general, a represents the likelihood of a consonant node N(i,j) to be recognised as a
primitive type N(i,j,k) and b is the likelihood of consonant primitive N(i,j,k) to be associated
with node N(i,j). The calculation of a is denoted as:
a = (1/M) ∑i=1..M α(N(i,j,k) | Di, P) (4.7)

where α is the recognition/classification accuracy of a consonant node given training data
Di and a parent node P.
The calculation of b is denoted as:
b = (1/M) ∑i=1..M β(P | Di) (4.8)

where β is a conditional variable which is 1 if a consonant primitive of training data Di has
a true relationship with its corresponding parent node P and 0 otherwise.
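A sketch of equations 4.7 and 4.8 for a single consonant node follows, assuming each training case supplies the classifier accuracy (α) and the 0/1 relationship flag (β) described above; the sample values are invented for illustration.

```python
# Sketch of the MLE estimates (a, b) of equations 4.7 and 4.8 for one
# consonant node over M training cases: a averages the recognition
# accuracy alpha, b averages the 0/1 relationship indicator beta.
def mle_estimates(samples):
    """samples: list of (alpha, beta) pairs, one per training case."""
    M = len(samples)
    a = sum(alpha for alpha, _ in samples) / M
    b = sum(beta for _, beta in samples) / M
    return a, b

# e.g. three cases with accuracies 0.56, 0.88, 0.90 where the primitive
# matched its parent node in the first two cases only:
a, b = mle_estimates([(0.56, 1), (0.88, 1), (0.90, 0)])
```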
In addition, the value b is saved in a history file to be used to create new outline models that
do not have training data; the history file creation is presented below.
Initialization
D: a collection of training data
Di: ith sample of the training data, D
L: a primitive lexicon
Li: an element of L which holds an equal word value as Di
N: number of consonant primitives contained in Di
j: an index identifier
Ni,j: a pattern representing the jth consonant primitive of Di
Li,j: a pattern representing the jth consonant primitive of Li
b : probability of Ni,j to have a relationship with Li,j
History updating
If (Ni,j != Li,j and b > 0)
    Save b
The above two lines of pseudo code indicate that if a pattern Ni,j observed in training data is
not the same as the one defined in the lexicon, and if there is evidence confirming a
relationship between Ni,j and Li,j (i.e., the value of b is greater than zero), the system creates a
history file and stores the value b as the probability of Li,j being recognised as Ni,j. Later
in the training process, the history file is retrieved to construct new outline models for words
which do not have any training samples.
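The history-updating rule can be sketched as follows, with a plain dictionary standing in for the thesis’s history file; the key layout (lexicon pattern, observed pattern) is an assumption for illustration.

```python
# Sketch: store b whenever the observed primitive differs from the
# lexicon entry but has supporting evidence (b > 0). The history dict
# maps (lexicon pattern, observed pattern) -> probability b.
def update_history(history, observed, lexical, b):
    if observed != lexical and b > 0:
        history[(lexical, observed)] = b
    return history

# Lexicon type 4 was recognised as type 1 with probability 0.44:
history = update_history({}, observed=1, lexical=4, b=0.44)
```

New outline models for words without training samples can then look up (lexical, observed) pairs in this history to inherit plausible alternative patterns.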
In brief, the use of the history file has a significant benefit to the training of shorthand
outline models, particularly for words which do not have sufficient training data. This is
mainly because Pitman’s Shorthand is no longer a skill in wide demand nowadays, and the
collection of thousands or millions of training samples is infeasible in terms of access to
experienced writers.
4.6.2 Learning of Vowel Primitives
Finally, the hardest part of the learning process will be addressed, where there are
hidden/missing vowel variables. The problem is that vowel primitives are rarely written by
Pitman’s Shorthand writers and omitted positions vary widely depending on individual
preference and context. When having such non-linear distribution of hidden variables, the
Expectation Maximization (EM) algorithm is shown to be effective to find a (locally)
optimal maximum likelihood of a node [Mk01]. Thus, the EM algorithm is applied for
learning of vowel primitives here. The basic idea behind the EM algorithm in our learning
process is that if we know the vowel values from a lexicon, the probability distribution of
hidden vowels in an outline model can be estimated after learning in the M step. Then, in the
E step, these estimated values can be treated as though they were observed.
On the whole, the EM learning of a vowel (hidden) node is denoted as:
PEM(V = TRUE | O = TRUE) = E(V = TRUE) / E(O = TRUE) (4.9)
where V is a vowel node, O is an outline model and E(…) is the number of times a
corresponding parameter is expected to occur. According to Murphy [Mk01], E(...) is
computed as follows
E(e) = ∑m I(e | Dm) P(e | Dm) (4.10)
where D is a set of training data, I(e|Dm) is an indicator function which is 1 if an event e
occurs in training case m, and 0 otherwise.
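Equations 4.9 and 4.10 can be combined in a short sketch, assuming each training case contributes an (indicator, probability) pair as defined above; the numeric values are illustrative only.

```python
# Sketch of equations 4.9 and 4.10: expected counts E(.) accumulated
# over training cases, then the EM estimate for a vowel node V given
# that the outline O is true.
def expected_count(cases):
    """cases: list of (indicator, probability) pairs per eq. 4.10."""
    return sum(i * p for i, p in cases)

def p_em(vowel_cases, outline_cases):
    """EM estimate P(V=TRUE | O=TRUE) per eq. 4.9."""
    return expected_count(vowel_cases) / expected_count(outline_cases)

# e.g. the vowel was present in two of three cases (with estimated
# probabilities 0.9 and 0.8), while the outline was present in all three:
p = p_em(vowel_cases=[(1, 0.9), (1, 0.8), (0, 0.7)],
         outline_cases=[(1, 1.0), (1, 1.0), (1, 1.0)])
```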
4.7 Model Selection
Model selection in a Bayesian Network is concerned with measuring the degree to which a
network structure (equivalence class) fits the prior knowledge and data [Hd99]. This work
determines the fitness of a particular outline model to a given input outline via a relative
posterior probability. Assuming that Oi is the ith outline model and P1, P2,.., Pn are input primitives
classified by our collaborator’s recognition engine, a posterior probability (fitness) of the
outline model Oi given a set of input primitives can be defined as:
P(Oi| P1, P2,.., Pn) = P(Oi)P(P1, P2,.., Pn|Oi) (4.11)
According to equation 4.11, posterior probability of an outline model is calculated based
upon a prior probability of an outline model in combination with the likelihood of input
primitives which belong to the given outline model. Alternatively, equation 4.11 can be
denoted in terms of the belief of a node as follows:

P(Oi | P1, P2, .., Pn) = BEL(x | Oi) ∏j BEL(Nj | Pj) (4.12)
where j = (1,..,n), Oi is the ith outline model, x is a root node of Oi, BEL(…) is the belief of a
node, Pj is an input primitive and Nj is a child node of the root node x.
To find the N-best outline models for a given input, models with top N posterior
probabilities are chosen. However using the posterior probabilities alone to find the best
models is not computationally efficient. The problem is that the number of outline models
increases along with the number of words contained in a lexicon and calculating the
posterior probability of thousands of outline models in real time word transcription is
infeasible, mainly in terms of operational time. Therefore, three unigram-based rejection
strategies are applied in our system in order to reduce model selection time.
The first rejection strategy – number of consonant primitives (NCP) of an input outline is
used as a first level filter to reject outline models that are not relevant to a given input. The
approach is denoted as “NCP filter” and the algorithm is denoted as:
O NCP(k) = O NCP(i) \ O NCP(i ≠ k) (4.13)
where O NCP(i) is a collection of outline models relating to any NCP, and k is the NCP of an
input outline. Example 1 (below) clarifies the concept behind NCP filter.
Example 1
Assuming that k= 2, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained in the
system and NCP of O1, O2, O3, O4, O5, O6 are 2, 2, 6, 3, 5 and 2 respectively, O NCP(2) is
calculated using formula 4.13 as follows:
O NCP(k) = O NCP(i) \ O NCP(i ≠ k)
O NCP(2) = { O1, O2, O3, O4, O5, O6} \ { O3, O4, O5}
= { O1, O2, O6}
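The NCP filter amounts to a simple set difference. A minimal sketch reproducing Example 1 (the model names and NCP values are the illustrative ones above, not a real lexicon):

```python
# Sketch of the NCP filter (formula 4.13), reproducing Example 1.

def ncp_filter(models, k):
    """Keep only the outline models whose NCP equals the input outline's NCP, k."""
    return [name for name, ncp in models.items() if ncp == k]

# Illustrative models with NCP values taken from Example 1.
models = {"O1": 2, "O2": 2, "O3": 6, "O4": 3, "O5": 5, "O6": 2}
print(ncp_filter(models, 2))  # ['O1', 'O2', 'O6']
```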
In the second rejection strategy, outline models are discriminated by the pair of primitives
appearing at the first and last (consonant) segments of an outline. This approach is denoted
as “F&L filter” and the algorithm is denoted as:
O F(k),L(j) = OF(i),L(i) \ OF(i ≠ k),L(i ≠ j) (4.14)
where O F(i),L(i) is the set of outline models whose first and last segments relate to any type of
primitive, and k and j are the types of the first and last segments of an input outline respectively.
Example 2 below demonstrates the concept behind F&L filter.
Example 2
Assuming that k= 5, j = 6, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained
in the system and (F(i),L(i)) of O1, O2, O3, O4, O5, O6 are (3,2), (5,5), (5,6), (1,2), (5,6) and
(5,2) respectively, O F(5),L(6) is calculated as:
O F(k),L(j) = OF(i),L(i) \ OF(i ≠ k),L(i ≠ j)
O F(5),L(6) = { O1, O2, O3, O4, O5, O6} \ { O1, O2, O4, O6}
= { O3, O5}
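Formula 4.14 can be sketched in the same way; the fragment below reproduces Example 2, with the (first, last) segment types stored per model (illustrative values only):

```python
# Sketch of the F&L filter (formula 4.14), reproducing Example 2.

def fl_filter(models, k, j):
    """Keep models whose first segment type is k and whose last segment type is j."""
    return [name for name, (first, last) in models.items()
            if first == k and last == j]

# Illustrative (F(i), L(i)) pairs taken from Example 2.
models = {"O1": (3, 2), "O2": (5, 5), "O3": (5, 6),
          "O4": (1, 2), "O5": (5, 6), "O6": (5, 2)}
print(fl_filter(models, 5, 6))  # ['O3', 'O5']
```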
The idea behind formula 4.14 is based on an interesting phenomenon: wrongly spelled
English words are sometimes comprehensible to a reader as long as the first and the last
letters of the words are clearly indicated. For example, you may understand the following
sentence even though it contains a number of spelling errors: “Wornlgy seplled Egnlish
words are sitll leiglbe to a reader as lnog as the frist and lsat ltteers of the words are crroect.”
In other words, the first and last letters of a word provide heuristics for word identification
in English. Similarly, the outline model selection in our work can be based on evidence of
the first and last primitives of an outline, given that the first and last segments of an outline
are always written in Pitman’s Shorthand. According to our study
done on 10 samples of shorthand notes, handwritten by professional shorthand writers, it is
observed that the first and last primitives of a vocalized outline are always written in
Pitman’s Shorthand. Therefore, the second rejection strategy (formula 4.14) is based on the
first and last primitives of a shorthand outline.
In the third rejection strategy, outline models are selected depending on the existence of
circular primitives in an input outline. The approach is referred to as the “C filter” and the
algorithm is denoted as:
O C(k) = OC(i) \ OC(i ≠ k) (4.15)
where OC(i) is a set of outline models, k is a conditional variable which is TRUE if an input
outline contains circular primitives and FALSE otherwise. Example 3 below demonstrates
the concept behind C filter.
Example 3
Assuming that k= TRUE, O = {O1, O2, O3, O4, O5, O6} is a set of outline models contained
in the system and C(i) of O1, O2, O3, O4, O5, O6 are TRUE, FALSE, TRUE, FALSE, FALSE,
TRUE respectively, O C(TRUE) is calculated as:
O C(k) = OC(i) \ OC(i ≠ k)
O C(TRUE) = { O1, O2, O3, O4, O5, O6} \ { O2, O4, O5}
= {O1, O3, O6}
Formula 4.15 checks for the existence of circular primitives in outline models and splits the
outline models into two main groups: those containing circular primitives and those not
containing circular primitives. In general, this rejection strategy performs well, given the
reliable accuracy of the collaborator’s recognition engine at detecting the circular primitives
of an outline (if there are any).
Overall, the model selection strategies carried out in this work are applied in left-to-right
rejection order, as illustrated in Figure 4.10. After the C filter, the posterior probabilities of
the remaining outline models are calculated using formula 4.12, from which the N-best
candidate outline models for a given input are chosen.
Figure 4.10 Illustration of outline model selection strategies
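Putting the pieces together, the left-to-right rejection order can be sketched as a small pipeline. All model attributes, words and scores below are invented for illustration, and the stored score merely stands in for the posterior of formula 4.12:

```python
# Hedged sketch of the overall selection pipeline: NCP filter -> F&L filter ->
# C filter -> posterior-probability ranking. All data here is hypothetical.

def select_n_best(models, ncp, first, last, has_circle, n):
    """Apply the three rejection filters, then rank the survivors by score."""
    survivors = [m for m in models
                 if m["ncp"] == ncp                        # NCP filter
                 and (m["first"], m["last"]) == (first, last)  # F&L filter
                 and m["circle"] == has_circle]            # C filter
    survivors.sort(key=lambda m: m["score"], reverse=True)
    return [m["word"] for m in survivors[:n]]

models = [
    {"word": "pay",   "ncp": 1, "first": 5, "last": 5, "circle": False, "score": 0.7},
    {"word": "pays",  "ncp": 1, "first": 5, "last": 5, "circle": True,  "score": 0.6},
    {"word": "space", "ncp": 2, "first": 9, "last": 5, "circle": True,  "score": 0.5},
]
print(select_n_best(models, ncp=1, first=5, last=5, has_circle=False, n=3))  # ['pay']
```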
4.8 Experimental Results
A primary goal of the experiments carried out in this chapter is to evaluate the transcription
accuracy of the Bayesian Network based word interpreter under the following criteria:
- In the presence of shape variation and position confusion of pen strokes due to natural handwriting.
- In the presence of segmentation and classification errors due to hardware constraints and limitations of the recognition engine.
- In the presence of missing vowel primitives that are randomly omitted among outlines by experienced Pitman’s Shorthand writers.
- In the presence of incorrect shorthand outlines, written by inexperienced shorthand writers.
4.8.1 Data Set
Three types of data sets are evaluated in the experiments of this chapter, and they can be
outlined as follows:
- Single-consonant data set: This data set contains outlines whose skeletons have one and only one consonant stroke, for instance, the shorthand outlines for the words “bay”, “pea” and “pat”. In general, various groups of homophones (i.e., outlines that look similar but have different representations) are contained in this data set, as the outlines are similar with only minor differences of line thickness, vowel position and inclination.
- Stroke-combination data set: This data set contains outlines whose skeletons have two or more consonant strokes, written according to the normal rules of Pitman’s Shorthand, i.e., the phonemes of the words are directly converted into Pitman’s primitives without applying any of the special rules of Pitman’s Shorthand invented for speed enhancement purposes. The data set covers the whole range of possible stroke combinations, and sample outlines of the data set are illustrated in Figure 4.11.
Figure 4.11: Samples of the stroke combination data set (outlines for “bar”, “making”, “rare”, “escape” and “machine”)

- Special-rule data set: This data set contains words written according to the special rules of Pitman’s Shorthand. For instance, instead of writing the word “after” by combining primitives for the phonemes /F/, /T/, /R/ and vowels as in Figure 4.12(a), Pitman uses a doubled-length /F/ curve to express the word “after”, as in Figure 4.12(b). In general, this data set contains inconsistent outlines, written without following the corresponding special rules of Pitman’s Shorthand by (inexperienced) shorthand writers who have not digested the complete rules of Pitman’s Shorthand.
Figure 4.12: Two different shorthand outlines for the word “after”; (a) the word “after” written by direct conversion of phonemes into primitives (the incorrect outline); (b) the word “after” written according to the double-length rule of Pitman’s Shorthand (the correct outline)
Table 4-2: Details of the data collection for the three data sets
Data set name                Number of words    Writer ID    Number of times
Single-consonant data set    135                Writer A     2
Single-consonant data set    135                Writer B     1
Stroke-combination data set  192                Writer A     1
Stroke-combination data set  192                Writer B     1
Stroke-combination data set  192                Writer C     1
Special-rule data set        87                 Writer A     2
Special-rule data set        87                 Writer D     2
Special-rule data set        87                 Writer E     1
In total, 1416 outlines were collected for the three data sets.
Table 4-2 provides details of the collected data. The data was collected using a tablet PC
with an electromagnetic digitizer of resolution 1000 ppi, and five writers were involved in the
data collection. The three data sets cover the whole range of shorthand primitives, and the
word list contains the 5000 most frequently used English words of the general domain. 45%
of the data is included in a training data set, and samples of the collected data are illustrated
in Figure 4.13.
Figure 4.13: Screen shot of outlines written by Writer A
4.8.2 Evaluation of the Recognition Engine
Before the evaluation of the word transcription performance of the Bayesian Network based
interpreter, this section firstly evaluates the accuracy of the recognition engine in order to
relate it to the overall word transcription performance. The study is categorized into three
groups namely: (1) analysis of the vocalized outline identification, (2) analysis of the outline
segmentation, and (3) analysis of the primitive classification. The studies are carried out
using the whole data sets, and the experimental results are discussed as follows.
Firstly, the accuracy of the vocalised outline identification is discussed. To clarify what is
meant by vocalised outline identification: it is the process of determining whether a written
outline is a short-form or a phonetically written outline. As shown in Figure 4.14, the accuracy
of the vocalised outline identification varies from writer to writer, or even from time to time
for the same writer. For instance, consider the accuracy of the vocalised outline
identification for writer A on the single-consonant data set, where there is a difference of
approximately 62% between the first and second writings. The study finds that a major
reason for this difference is that writer A omitted most of the vowels when writing the
single-consonant data set the first time, whereas the writer indicated at least one vowel for
most of the words the second time. It is therefore concluded that the indication of at least
one vowel per outline is critical for obtaining high vocalised outline identification accuracy.
Any words that are not recognised as vocalised outlines are marked as short-forms by the
recognition engine. For example, 73% of the data written by writer A for the single-consonant
data set is marked as short-forms by the recognition engine although the outlines are, in fact,
vocalised outlines. On the whole, the average vocalised outline identification accuracy over
the whole data sets is 69%.
Figure 4.14: Evaluation of the vocalised outline identification of the recognition engine
Secondly, the segmentation accuracy of the recognition engine is discussed. In general, the
segmentation accuracy varies across data sets. As shown in Figure 4.15, the
single-consonant data set has about 72% correctly segmented outlines, whereas the
stroke-combination data set has only about 21% correctly segmented outlines on average.
The results are reasonable, since the single-consonant data set contains outlines with only one
consonant stroke and hence shows the higher segmentation accuracy. For the analysis of the
segmentation accuracy of different writers on the same data set, consider the results of the
special-rule data set, where the segmentation accuracy of outlines written by writer E is
higher than that of writer A. Statistics show that writer A has no previous experience
of using a pen-based text entry system, whereas writer E has previous experience of using
pen-based text entry systems on handheld devices. In addition, statistics show that writer A
prefers writing small scripts on a tablet, in a similar manner to writing on conventional
paper, whereas writer E produces larger scripts with flexible pen movements on the digitizer.
Therefore, it is observed that a writer’s previous experience of using pen-based text entry
systems has an influence on the segmentation performance of the recognition engine. The
average segmentation accuracy over all data sets is 36%. The segmentation accuracy
presented in Figure 4.15 is based on the number of correctly detected vocalised outlines and
is formulated as follows:
s = ((t − y) / t) × 100 (4.16)
where s is segmentation accuracy, t is the total number of written words and y is the total
number of outlines that are recognised as short-forms instead of vocalised outlines.
Figure 4.15: Evaluation of the segmentation accuracy of the recognition engine
Thirdly, the classification accuracy of the recognition engine is discussed. As shown in Figure
4.16, the average classification accuracy of the stroke-combination data set is lower than that
of the single-consonant data set or the special-rule data set. Statistics show that the
classification accuracy is influenced by several factors, including the tidiness of the
handwriting, hardware limitations and limitations of the algorithms applied in the recognition
engine. On average, the classification accuracy over the whole data sets is 77%, where the
classification accuracy is based on the total number of outlines that are recognised as
vocalised outlines as well as being correctly segmented. The formula is defined as:
c = ((t − x) / t) × 100 (4.17)
where c is the classification accuracy, t is the total number of written words, and x is the total
number of outlines that are recognised as vocalised outlines as well as being correctly segmented.
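As reconstructed here, formulas 4.16 and 4.17 share the same percentage-of-total structure; a minimal sketch (the counts below are illustrative, not taken from the experiments):

```python
# Sketch of the two accuracy measures, as reconstructed from formulas 4.16
# and 4.17. The counts used in the example calls are invented.

def segmentation_accuracy(t, y):
    """Formula 4.16: s = ((t - y) / t) * 100, where y counts the outlines
    recognised as short-forms instead of vocalised outlines."""
    return (t - y) / t * 100

def classification_accuracy(t, x):
    """Formula 4.17 (reconstructed with the same structure as 4.16)."""
    return (t - x) / t * 100

print(segmentation_accuracy(100, 75))    # 25.0
print(classification_accuracy(100, 25))  # 75.0
```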
Figure 4.16: Evaluation of the classification accuracy of the recognition engine
4.8.3 Evaluation of the Word Transcription Accuracy
Experiments carried out in this section are categorized into three groups, namely:
- Analysis of word transcription accuracy using the single-consonant data set.
- Analysis of word transcription accuracy using the stroke-combination data set.
- Analysis of word transcription accuracy using the special-rule data set.
Each group comprises four graphs discussing the experimental results from different aspects,
outlined as follows:
- Recognition accuracy vs. transcription accuracy: the graph illustrates the influence of the performance of the recognition engine on the transcription engine. It applies
two types of data in order to discuss the theme: firstly, data with any kind of errors
of the recognition engine, and secondly, (filtered) data with no vocalised outline
identification or segmentation errors of the recognition engine.
- Accuracy of the end result: the graph illustrates the accuracy of the result list for an input outline according to three measures: firstly, the rate at which a correct word appears in the result list; secondly, the rate at which the correct word appears in the top five of the result list; and thirdly, the rate at which the correct word appears at the topmost position of the result list. Note that the accuracies illustrated in this graph are based on data with no vocalised outline identification or segmentation errors, as the correction of these errors is not included in the scope of this research.
- Correction accuracy vs. classification/vowel errors: the graph illustrates the correction accuracy of the Bayesian Network based word interpreter in relation to classification and vowel omission errors. Similarly, the results reported in this graph are based on data with no vocalised outline identification or segmentation errors.
- Factors influencing the accuracy of a result list: the graph illustrates the average distribution of the factors that prevent a correct word from appearing at the topmost position of the result list. Similarly, the results reported in this graph are based on data with no vocalised outline identification or segmentation errors.
4.8.4 Analysis of Word Transcription Accuracy Using the Single
Consonant Data Set
4.8.4.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As shown in Figure 4.17, the accuracy of the recognition engine, specifically the accuracy of
the vocalised outline identification and segmentation, has a huge impact on the accuracy of
the transcription engine. The study finds that approximately 73% of the outlines written by
writer A the first time are not recognised as vocalised outlines, and this directly reduces the
transcription accuracy (to less than 20%). As discussed, the inadequacy of the vocalised
outline identification of the recognition engine is mainly caused by the omission of vowels
among outlines; therefore, the indication of at least one vowel per vocalised outline is also
encouraged in this research in order to achieve high transcription accuracy.
Figure 4.17: Illustration of the relationship between recognition accuracy and transcription accuracy of the single consonant data set
4.8.4.2 Analysis of the Accuracy of a Result List
As shown in Figure 4.19, approximately 93% of the input data are interpreted with a result list
containing a correct word; 72% of the data are interpreted with a correct word appearing in
the top five group of the result list; and 40% of the data are interpreted with a correct word
appearing at the topmost position of the result list, on average.
An interesting phenomenon here is that although writer A has an intermediate level of skill in
Pitman’s Shorthand and writer B is an inexperienced shorthand writer, the outlines written by
writer B are transcribed more accurately than those of writer A. The study finds that this is
because the handwriting of writer B is more legible, with more informative pen strokes, than
that of writer A, as compared in Figure 4.18. In relation to this finding, it is remarked that the
writing of legible scripts is encouraged in this research in order to obtain high recognition
and transcription accuracy.
Figure 4.18: Comparison of the handwriting of two writers (outlines for the words “night”, “nod”, “note” and “nut”, written by writer B and writer A)
Figure 4.19: Illustration of the word transcription accuracy of the single consonant data set
4.8.4.3 Analysis of the Correction Accuracy vs. the Classification/Vowel
Errors
The graph in Figure 4.20 illustrates the classification and vowel errors in comparison with the
correction accuracy, where the correction accuracy indicates how many of the classification
and vowel errors are covered by the transcription engine respectively. Here, the
classification error, the vowel error and the correction accuracy are formulated as follows:
c = (e / t) × 100 (4.18)
where c is the classification error, e is the number of words having a classification error and t
is the total number of input words.
v = (f / t) × 100 (4.19)
where v is the vowel error, f is the number of words having omitted vowels and t is the total
number of input words.
a = (b / t) × 100 (4.20)
where a is the correction accuracy, b is the total number of words interpreted correctly by
the transcription engine in the presence of classification/vowel errors, and t is the total
number of words having classification errors or vowel errors respectively.
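Formulas 4.18–4.20 are simple percentage ratios; a minimal sketch with invented counts (note that t in formula 4.20 denotes only the erroneous words, not all written words):

```python
# Sketch of the error and correction measures (formulas 4.18-4.20).
# All counts in the example call are invented for illustration.

def classification_error(e, t):
    """Formula 4.18: c = (e / t) * 100, e = words with a classification error."""
    return e / t * 100

def vowel_error(f, t):
    """Formula 4.19: v = (f / t) * 100, f = words with omitted vowels."""
    return f / t * 100

def correction_accuracy(b, t):
    """Formula 4.20: a = (b / t) * 100, b = erroneous words still interpreted
    correctly, t = number of words having classification (or vowel) errors."""
    return b / t * 100

# E.g. 3 of 4 words with classification errors corrected by the interpreter.
print(correction_accuracy(3, 4))  # 75.0
```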
On average, the correction rate for the classification error is 76% and the correction rate for
the vowel error is 55%. This indicates that the Bayesian Network based outline models,
implemented in the transcription engine are capable of coping with the classification and
vowel errors.
Figure 4.20: Illustration of the correction accuracy in comparison with the classification or vowel errors of the single consonant data set
4.8.4.4 Analysis of Factors Influencing the Accuracy of a Result List
As illustrated in Figure 4.21, the major factor (49%) preventing the correct word for an input
outline from appearing at the topmost position of the result list is the similarity of the input
outline to other outlines. This is generally true for the single-consonant data set, as the
outlines included in this data set are very similar, with only minor differences of line thickness
and vowel position. The other factors preventing a correct word from appearing at the topmost
position of the result list are classification errors (31%), vowel errors (3%) and a combination
of similarity to other outlines, classification errors and vowel errors (17%).
Figure 4.21: Illustration of an average distribution of factors influencing the accuracy of a result list (single consonant data set)
4.8.5 Analysis of Word Transcription Accuracy Using Stroke-
combination Data Set
This section analyses the word transcription accuracy for outlines containing two or more
strokes. The primary purpose of the study is to evaluate the transcription accuracy in the
presence of stroke combinations. As for the single-consonant data set, four types of graphs
are discussed as follows.
4.8.5.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As shown in Figure 4.22, the average transcription accuracy of the filtered stroke-combination
data set (filtered data contains no vocalised outline identification or segmentation errors) is
97%, a value similar to the accuracy achieved by the single-consonant data set (Figure 4.17).
However, in terms of unfiltered data (data containing any kind of recognition errors), the
overall transcription accuracy of the stroke-combination data set decreases by 27% compared
to that of the single-consonant data set. The study finds that this is mainly due to the increase
in segmentation errors; in relation to this finding, it is concluded that reliable outline
segmentation is important for the overall transcription accuracy in the case of words with two
or more consonant strokes.
Figure 4.22: Illustration of the relationship between recognition accuracy and transcription accuracy of the stroke-combination data set
4.8.5.2 Analysis of the Accuracy of a Result List
According to the experimental results (Figure 4.23), approximately 97% of the input data are
transcribed with a result list containing a correct word; 97% of the input data are transcribed
with a correct word appearing in the top five of the result list; and 55% of the input data are
transcribed with a correct word appearing at the topmost position of the result list. On the
whole, the transcription accuracy of the stroke-combination data set increases by 25%
compared to the single-consonant data set.
An interesting phenomenon here is that although writer C’s data is not included in the
training data set, 96% of the writer’s outlines are transcribed with a correct word appearing
in the top five of the result list. This indicates that the history-based learning algorithm
implemented in the Bayesian Network models can effectively cope with unseen patterns that
are not included in a training data set.
Figure 4.23: Illustration of the word transcription accuracy of the stroke-combination data set
4.8.5.3 Analysis of the Correction Accuracy vs. the Classification/Vowel
Errors
The graph in Figure 4.24 illustrates the classification and vowel errors in comparison with the
correction accuracy for the stroke-combination data set. The classification error, the vowel
error and the correction accuracy are calculated by applying formulas 4.18, 4.19 and 4.20
respectively. As shown in Figure 4.24, none of the 3% of classification errors of writer C
are corrected, because these errors involve patterns deviating completely from the original
patterns. For instance, if the orientation of an input pattern is completely different from its
original form, the transcription engine does not cover this kind of error.
Note that writers rarely omit vowels in this data set compared to the single-consonant data
set. Writers of this data set are encouraged to indicate at least one vowel per outline,
mainly in order to avoid the rejection of substantial data at the recognition stage by the
vocalised outline detector.
Figure 4.24: Illustration of the correction accuracy in comparison with the classification/vowel errors of the stroke combination data set
4.8.5.4 Analysis of Factors Influencing the Accuracy of a Result List
The graph in Figure 4.25 illustrates the factors preventing the correct word for an input outline
from appearing at the topmost position of the result list, where 11% is due to the similarity of
the input outline to other outlines and the remaining 89% is due to a combination of
classification errors, vowel errors and similarity of the input outline to other outlines. An
interesting phenomenon here is that ambiguity due to the similarity of an input outline to
other outlines decreases by 38% compared to the single-consonant data set. This indicates
that outlines become less ambiguous when they contain two or more consonant strokes.
Figure 4.25: Illustration of an average distribution of factors influencing the accuracy of a result list (stroke-combination data set)
4.8.6 Analysis of Word Transcription Accuracy for the Special-rule Data
Set
This section analyses the word transcription accuracy for outlines written according to the
special rules of Pitman’s Shorthand. The primary purpose of the study is to evaluate the
transcription performance in the presence of the application of the special rules of Pitman’s
Shorthand, where consistency between patterns written by different writers becomes a
critical concern. As for the single-consonant and stroke-combination data sets, four types of
graphs are discussed as follows.
4.8.6.1 Analysis of the Recognition Accuracy vs. the Transcription Accuracy
As illustrated in Figure 4.26, the transcription accuracy of filtered data (data with no
vocalised outline identification or segmentation errors) reaches up to 100%; however, the
transcription accuracy of unfiltered data (data containing any kind of recognition errors)
falls to as low as 1% for writer D.
The study finds that this is mainly due to writer D’s preference for writing outlines without
vowel components, as well as the writing of incorrect outlines that do not fully follow the
special rules of Pitman’s Shorthand. In relation to this finding, it is remarked that the
writing of consistent outlines in accordance with the special rules of Pitman’s Shorthand is
encouraged in this research in order to obtain high transcription accuracy.
Figure 4.26: Relationship between recognition accuracy and transcription accuracy of the special-rule data set
4.8.6.2 Analysis of the Accuracy of the Result List
According to the experimental results (Figure 4.27), approximately 85% of the input data are
transcribed with a result list containing a correct word; 85% of the input data are transcribed
with a correct word appearing in the top five of the result list; and 80% of the input data are
transcribed with a correct word appearing at the topmost position of the result list. An
interesting phenomenon here is that the special-rule data set achieves the highest average
transcription accuracy (80%) in terms of a correct word appearing at the topmost position of
the result list.
Figure 4.27: Evaluation of the word transcription accuracy of the special-rule data set
4.8.6.3 Analysis of the Correction Accuracy vs. the Classification/Vowel
Errors
The graph in Figure 4.28 illustrates the classification and vowel errors in comparison with the
correction accuracy for the special-rule data set. The classification errors, the vowel errors
and the correction accuracy are calculated by applying formulas 4.18, 4.19 and 4.20
respectively. Experimental results show that there are no classification or vowel errors for
the outlines written by writer A and writer D (i.e., the data from their first writings), and this
directly affects the overall transcription accuracy, which reaches up to 100% in some cases,
as illustrated in Figure 4.27. On average, 100% of the vowel errors and 67% of the
classification errors are covered by the transcription engine.
[Figure 4.28 content: per writer (A, A, E, D, D): classification errors, successful transcription in the presence of classification errors, vowel errors, and successful transcription in the presence of vowel errors.]
Figure 4.28: Illustration of the correction accuracy in comparison with classification or vowel errors of the special-rule data set
4.8.6.4 Analysis of Factors Influencing the Accuracy of a Result List
The graph (Figure 4.29) illustrates the factors that prevent the correct word for an input
outline from appearing at the topmost position of a result list: 23% of cases are due to the
similarity of an input outline to other outlines, 8% are due to classification errors, and the
remaining 69% are due to a combination of classification errors, vowel errors and the
similarity of input outlines to other outlines.
[Figure 4.29 content: 23% due to similarity to other outlines; 8% due to classification errors; 69% due to a combination of similarity to other outlines, classification errors and vowel errors.]
Figure 4.29: Illustration of an average distribution of factors influencing the accuracy of a result list (special-rule data set)
4.9 Summary and Discussion
In summary, this chapter proposes a novel primitive based text translation approach that
interprets handwritten Pitman's outlines as English words, using Bayesian Network based
outline models. Ambiguities of pen strokes and interactions between the strokes of written
shorthand outlines are modelled in the Bayesian Network outline models. The word
interpretation comprises network modelling, belief propagation, Bayesian learning and
model selection. Experimental results of the new framework are presented, following a full
description of the Bayesian Network based transcription algorithms.
An evaluation of the recognition engine is presented in combination with the experimental
results of the Bayesian Network based word interpreter, based on three data sets, namely:
the single-consonant data set, the stroke-combination data set and the special rule data set.
Overall, a primary issue discussed in relation to the performance of the recognition engine is
the requirement to indicate at least one vowel in an outline, in order to avoid cases where
outlines are mistaken for short-forms instead of vocalised outlines. In terms of the feasibility
of enforcing the restriction on shorthand writers, approximately 80% of inexperienced
Pitman's Shorthand writers find the restriction easy to adapt to; however, approximately
60% of professional Pitman's Shorthand writers find it impractical, especially for speed
writing. Therefore, a further improvement to the algorithms of the vocalised outline
identifier is recommended, so that the indication of vowels in an outline is no longer
mandatory.
From the aspect of the performance of the Bayesian Network based word interpreter, the
average transcription accuracies for the three (filtered6) data sets are 91% for a correct word
appearing in a result list, 85% for a correct word appearing in the top five of a result list and
58% for a correct word appearing at the topmost position of a result list. Overall, the
accuracy of 91% is satisfactory; however, the accuracy of a correct word appearing at the
topmost position of a result list (58%) indicates that the homophones in a result list need to
be resolved with the application of contextual information. A resolution to this problem is
discussed in detail in Chapter 6, which covers phrase level transcription.
From the aspect of a relationship between the features of a writer and the transcription
accuracy, the study finds that gender and age of writers do not have significant influence on
the performance of the recognition and transcription systems. However, the study finds that
a writer’s skill in Pitman’s Shorthand and a writer’s previous experience in using pen based
text entry systems are related to the overall transcription accuracy. Another consideration
is that the writers in the current experiments are right-handed, and further analysis of the
transcription performance with left-handed writers is recommended. In addition, the writers
in the current experiments use British English, and further analysis of the transcription
performance with writers who use American English remains an open challenge. Since
Pitman's Shorthand is written phonetically, outlines written according to British English and
American English differ, especially in their vowel notations.
6 Filtered data does not contain any vocalized outline identification error or segmentation error.
As a final discussion, a comparison of the performance of the conventional phonetic based
transcription and that of the novel primitive based transcription is given. Table 4-3 presents
the accuracies of the two approaches, where the results are based on the data set used in the
experiments on the phonetic based transcription approach, presented in Chapter 3.
Table 4-3: Transcription accuracies of the phonetic based transcription and the primitive based transcription approaches

Average transcription accuracy                    Primitive based   Phonetic based
Overall                                           93%               84%
In the presence of vowel omission or confusion    100%              0%
In the presence of inconsistent writing           0%                0%
In the presence of classification error           100%              100%
As shown in Table 4-3, the average transcription accuracy of the primitive based
transcription approach is 9 percentage points higher than that of the phonetic based
transcription approach. The study finds that this is mainly due to the increased correction
accuracy for vowel errors in the novel framework. Overall, the performance of the proposed
framework is promising, but it must be improved in the areas discussed for it to become a
commercially viable system.
5 Generation of a machine-readable Pitman's Shorthand lexicon
Chapter 5 Introduction
The previous chapter presented a novel solution for interpreting handwritten Pitman's
Shorthand outlines using Bayesian Network algorithms, in which the geometrical features of
the outlines are translated directly into English word(s). On the whole, the solution was
found to be efficient, mainly owing to the use of a machine-readable Pitman's Shorthand
dictionary that maps shorthand outlines to corresponding English word(s). Based on a
thorough literature review carried out in this research, no other machine-readable
(electronic) Pitman's Shorthand lexicon has ever been designed; the lexicon developed for
this research is therefore the first of its kind. This may be a major reason why none of the
previous work (within the same framework) attempted to analyse the performance of the
direct transcription of geometric primitives into English words.
Specifically, this chapter presents full details of the creation of the electronic Pitman’s
Shorthand lexicon, developed for this research, under the following four sections:
1. Overview: overview of the rule based creation of the electronic Pitman’s Shorthand
lexicon and discussion on general advantages and disadvantages of rule based
approaches.
2. Structure: description of the lexicon structure in terms of feature set, key and lexicon
as a whole.
3. Rules: description of rules, applied in our system to conform to the writing rules of
Pitman’s Shorthand.
4. Experimental results: evaluation of the electronic Pitman’s Shorthand lexicon,
mainly in terms of accuracy and homophone distribution.
5.1 Overview
Firstly, in order to clarify precisely what is meant by a Pitman’s Shorthand lexicon, sample
entries of a conventional Pitman’s Shorthand dictionary (available in book format) and
sample entries of an electronic Pitman’s Shorthand dictionary are illustrated (Figure 5.1).
Figure 5.1: (a) sample entries of a conventional Pitman’s Shorthand dictionary available in book format (b) sample entries of an electronic Pitman’s Shorthand
lexicon
A primary objective of the research presented in this chapter is to create an electronic
Pitman's Shorthand lexicon (Figure 5.1(b)), based on the concept of Figure 5.1(a). A major
difference between them is the relationship between keys and elements: each key (each
word) is related to one and only one shorthand outline in the conventional lexicon, whereas
each key (each shorthand outline) is related to one or more words in the electronic lexicon.
The latter layout is preferred in this work because
it directly relates to ambiguities of handwritten Pitman’s Shorthand e.g., line thickness
ambiguities.
5.1.1 Rule-based Creation of the Electronic Pitman’s Shorthand Lexicon
The creation of the electronic Pitman’s Shorthand lexicon is based on the following four
basic steps, which are taken by stenographers while learning Pitman’s Shorthand:
1. Gain prior knowledge of English pronunciation.
2. Memorise notations of Pitman’s Shorthand.
3. Memorise writing rules of Pitman’s Shorthand.
4. Write Pitman’s Shorthand outlines, using the above learned knowledge.
In order to instruct a machine to generate the electronic Pitman’s Shorthand lexicon
automatically, the above four steps are restructured as below:
1. Set up a phonetic lexicon with a series of phonemes as a key for word identification.
2. Define notations of Pitman's Shorthand in terms of low level geometric features; for
instance, the consonant W is defined as a combination of an anti-clockwise hook
and an upward stroke.
3. Define conversion rules that conform to the writing rules of Pitman’s Shorthand.
4. Generate a series of geometric features for a given word using the phonetic lexicon
and conversion rules.
The Pitman’s Shorthand lexicon is created using a set of conversion rules. When rule-based
algorithms are introduced in the field of handwriting recognition, one may argue that rules
are static and incapable of coping with natural ambiguities [Sy94], [Lr89]. Here, it is
important to realise that the rules reported in this chapter are applied only to static lexical
data, not to handwritten data; due to the use of this static lexical data, accuracy of the
Pitman’s Shorthand lexicon becomes reliable. Like other rule-based approaches [FF93],
[Mm03], training is not required for the creation of the shorthand lexicon and rules can be
refined easily as needed if the lexicon is to be altered.
5.2 Structure of the Electronic Pitman’s Shorthand Lexicon
5.2.1 Feature Set
The electronic Pitman’s Shorthand lexicon includes 31 features representing phonemes of
a word. The features are represented in numerical form and are shown in Table 5-1.
Table 5-1: Features of the electronic Pitman's Shorthand lexicon (the Pattern column of shorthand glyphs is omitted here)

Type  Description             Type  Description
1     /T/ or /D/              17    Large unclosed circle
2     /F/ or /V/              18    Large closed circle
3     /th/ or /TH/            19    Small unclosed loop
4     /P/ or /B/              20    Small closed loop
5     /M/                     21    Large unclosed loop
6     /N/ or /NG/             22    Large closed loop
7     /K/ or /G/              23    Small hook
8     /SH/ or /ZH/            24    Large hook
9     /CH/ or /J/             25    Vowel
10    /R/ (downward)          26    Vowel
11    /L/ (upward)            27    Diphthong
12    /S/ or /Z/              28    Diphthong
13    /R/ (upward)            29    Diphthong
14    /L/ (upward)            30    Diphthong
15    Small unclosed circle   31    Diphones
16    Small closed circle
5.2.2 Key
A key corresponds to a vocalised shorthand outline and relates to one or more words. It
is composed of consonants and vowels such that the consonant primitives of a vocalised
outline are listed first in chronological order, with the series of vowel primitives following
at the end. A major reason for keeping the vowel primitives at the end of a key is to cope
with the special writing order of Pitman's Shorthand, i.e., consonants are always written
first, with vowels placed around the consonant kernel later. Sample keys are given in
Figure 5.2, where each key comprises two major components: one containing the consonant
primitives and the other containing the vowel primitives. Both components are arranged in
chronological order, such that the primitive at the end of the first component corresponds to
the last consonant of a vocalised outline and the primitive at the beginning of the second
component corresponds to the first vowel of the vocalised outline.
Word    Pronunciation    Phonemes in chronological writing order    Key
Famous  /F Ā M Ŭ S/      /F M S Ā Ŭ/                                2+5+16+91+92
Yellow  /Y Ě L Ō W/      /Y L Ě Ō/                                  23+13+11+91+92

Figure 5.2: Sample keys of the electronic Pitman's Shorthand lexicon; vowels are underlined
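The key layout just described can be sketched in Python; the phoneme spellings and code tables below are reduced to the single "famous" example of Figure 5.2 and are illustrative assumptions, not the thesis's actual feature extractor.

```python
# Sketch of key construction: consonant primitives in chronological
# writing order, vowel primitives appended at the end. The code tables
# cover only the "famous" example and are assumed values.
CONSONANT_CODES = {"F": 2, "M": 5, "S": 16}
VOWEL_CODES = {"AY": 91, "UH": 92}

def make_key(consonants, vowels):
    codes = [CONSONANT_CODES[c] for c in consonants] + [VOWEL_CODES[v] for v in vowels]
    return "+".join(str(c) for c in codes)

# "famous": consonants /F M S/ first, then the vowels
print(make_key(["F", "M", "S"], ["AY", "UH"]))  # -> 2+5+16+91+92
```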
5.2.3 Lexicon Layout
A Pitman’s Shorthand lexicon is a hash-table with each key mapping to one or more than
one word where words with the same key contain the same series of similar consonant
primitives. Here, “similar consonant primitives” stands for primitives of the same type with
different line thicknesses or lengths. Sample entries of the Pitman’s Shorthand lexicon are
illustrated in Figure 5.3.
Sample  Key     Word
1       2+91    fee, father, further, after
2       4+91    pays, bays

Figure 5.3: Sample entries of the electronic Pitman's Shorthand lexicon
Sample one in Figure 5.3 presents a lexicon entry for the words “fee”, “father”, “further” and
“after”. The example indicates that words with similar geometric features of different
lengths belong to the same key. In order to recognise length variation of the words, consider
the sample Pitman’s Shorthand outlines given in Figure 5.3.
Similarly, sample two in Figure 5.3 presents a lexicon entry for the words “pays” and
“bays”. The example indicates that words with similar geometric features of different
thicknesses belong to the same key. Consider the sample outlines illustrated in Figure 5.3 to
identify line thickness difference between the two words.
On the whole, the Pitman’s Shorthand lexicon is created as follows:
P: a phonetic lexicon
N: number of words contained in P
Wi: the i-th word of the phonetic lexicon P
Vi: phonetic representation of Wi
Si: series of geometric features of Vi
table: a hash table holding the Pitman's Shorthand lexicon
key: a key representing a vocalised shorthand outline
value: the word value to which a specified key is mapped in table

Initialisation
    table = a new hash table

Lexicon organisation
    for i = 1 to N
        // convert phonemes of a word into geometric features
        Si = convertPhonemeToShorthand(Vi)
        key = Si
        if table.containsKey(key)        // the key already exists
            value = table.get(key) + Wi
        else
            value = Wi
        end
        table.put(key, value)
    end
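A runnable Python sketch of the lexicon-organisation loop above; convert_phoneme_to_shorthand is a stub lookup standing in for the 46-rule conversion procedure of Section 5.3, with primitive codes taken from the samples in Figure 5.3.

```python
# Sketch of the lexicon-organisation loop: words whose phonemes convert
# to the same primitive series share one key.
def convert_phoneme_to_shorthand(phonemes):
    # Stub standing in for the 46-rule conversion; assumed phoneme strings.
    stub = {"P EY Z": "4+91", "B EY Z": "4+91", "F IY": "2+91"}
    return stub[phonemes]

def build_lexicon(phonetic_lexicon):
    """phonetic_lexicon: list of (word, phonetic_representation) pairs."""
    lexicon = {}  # hash table: primitive key -> list of words
    for word, phonemes in phonetic_lexicon:
        key = convert_phoneme_to_shorthand(phonemes)
        lexicon.setdefault(key, []).append(word)  # append if key exists
    return lexicon

lex = build_lexicon([("pays", "P EY Z"), ("bays", "B EY Z"), ("fee", "F IY")])
print(lex)  # -> {'4+91': ['pays', 'bays'], '2+91': ['fee']}
```

As in the pseudocode, homophone-like outlines ("pays", "bays") collapse onto one key.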
5.3 Conversion Procedure
This section presents full details of a conversion procedure that is used to transform
phonemes of a word into a series of geometric features. Assuming that x is a set of
phonemes for a particular word and y is a shorthand representation for the word, a
conversion procedure can be defined as
y = convertPhonemeToShorthand(x) (5.1)
For instance, if x is a set of phonemes /T Ō D Ā / (for the word “today”), then y is produced
by invoking the conversion procedure as follows:
y = convertPhonemeToShorthand(/T Ō D Ā /)
y = 1+ 1+ 91+ 92
In total, the conversion procedure comprises 46 rules, conforming to the writing rules of
Pitman’s Shorthand 2000, defined in [Oj95]. In order to produce a primitive representation
for a given set of phonemes, the 46 rules are applied in ascending priority order as follows:
Priority 1: 1st rule – 17th rule
Priority 2: 18th rule – 32nd rule
Priority 3: 33rd rule – 36th rule
Priority 4: 37th rule – 39th rule
Priority 5: 40th rule – 41st rule
Priority 6: 42nd rule – 43rd rule
Priority 7: 44th rule
Priority 8: 45th rule – 46th rule
For instance, application of the 2nd rule must follow the completion of the 1st rule and
similarly, application of the 18th rule must follow the completion of the 17th rule. With the
aid of a diagram (Figure 5.4), data flow in the conversion procedure can be followed.
Figure 5.4: Illustration of the conversion procedure
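The priority ordering can be illustrated with a minimal sketch in which each rule is a string rewrite applied strictly after all higher-priority rules; the two sample rewrites and the primitive names are simplified assumptions, not the actual 46 rules.

```python
# Sketch of the ascending priority order: each rule rewrites the working
# phoneme string strictly after all higher-priority rules have run.
RULES_IN_PRIORITY_ORDER = [
    ("ING S", "ings-loop"),  # e.g. the INGS rule (16) runs at priority 1
    ("S", "s-circle"),       # e.g. the S or Z stroke rule (18) at priority 2
]

def convert(phonemes):
    for pattern, replacement in RULES_IN_PRIORITY_ORDER:
        phonemes = phonemes.replace(pattern, replacement)
    return phonemes

print(convert("R ING S"))  # -> R ings-loop  (the INGS rule fires first)
print(convert("R S"))      # -> R s-circle
```

Running the rules in the opposite order would let the bare-S rewrite consume the S of INGS first, which is why the priority order matters.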
5.3.1 The Importance of Algorithms of the Presented Rules
The automatic generation of an electronic Pitman’s Shorthand lexicon is, in fact, the
replication of a human’s power of recalling a set of writing rules and producing shorthand
outlines for corresponding English words. This process seems trivial; however, the
fidelity with which the system replicates the ability of a human writer depends entirely on
the quality of the rules implemented in it. In reality, the implementation of these rules is
deeply complex, as it involves the consideration of several indefinite factors, such as the use
of different notations for the same phoneme, depending on the conformability of pen
movements; the use of various notations for the same pronunciation, depending on whether
the phonemes appear at the beginning, in the middle or at the end of a word; and so on. On
the whole, each rule replicates corresponding criteria on which stenographers base their
[Figure 5.4 data flow: the input phonemes /F Ā M Ŭ S/ (the word "famous") pass through Rule 1, Rule 2, ..., Rule 17, Rules 18 to 32 and Rules 45 to 46, producing the primitive output 2+5+16+91+92.]
ability to produce shorthand outlines, and it is important to clarify the concept behind each
rule to enable the reader to understand how the complex writing rules of Pitman’s Shorthand
are embedded in the system of this research.
5.3.2 Description of Rules
Table 5-2: Summary of the 46 rules applied in the creation of the Pitman's Shorthand lexicon

Rule   Description
1      Verification of a vocalised outline
2      Diphthong U
3      CON or COM at the beginning of a word
4      WH
5      Negative prefix IL-, IM-, IN-, IR-, UN-
6      PL, BL, TL, DL, CHL, JL, KL, GL, used consonantly at the beginning, in the middle or at the end of a word
7      FL, VL, ThL, ML, NL, SHL, used consonantly at the beginning of a word
8      SPR, STR, SKR at the beginning of a word
9      STER in the middle or at the end of a word
10     CON, COM, CUM, COG in the middle of a word
11     SES, SEZ, ZES, ZEZ at the end of a word
12     Past tense ED
13     ST at the beginning, in the middle or at the end of a word
14     Half length stroke for one syllable words
15     ING
16     INGS
17     Suffix -SHIP
18     S or Z stroke
19     Suffix -MENT
20     Suffix -MENTAL
21     Suffix -MENTALLY
22     Double length stroke
23     MD, ND
24     FR, VR, Thr, THR, SHR, ZHR, MR, NR, used consonantly at the beginning of a word
25+26  PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR, SHR, ZHR, MR, NR, used syllabically at the beginning, in the middle or at the end of a word
27     SKR, SGR
28     KW, GW
29     PL, BL, TL, DL, CHL, JL, KL, GL, used syllabically in the middle or at the end of a word
30     FL, VL, THL, ML, NL, SHL, used syllabically in the middle or at the end of a word
31     S followed by H
32     S+vowel+hookR, ST+vowel+hookR
33     Downward L
34     F or V hook at the end of a word
35     F or V hook in the middle of a word
36     SHUN
37     N hook
38     Upward L
39     Half length stroke in a word of two or more syllables
40     Suffix -LY
41     Upward R and downward R
42     Dash H
43     Reversed FR, VR, Thr, THR
44     P, B, T, D, K, G, M, N, NG, F, V, Th, TH, R, CH, JH, SH, S, Z, ZH, H
45     Vowel extraction and appending
46     Vowel conversion
Table 5-2 presents a summary of the forty-six rules with a list of phonemes relating to each
of them. In order to avoid information overload, algorithms of just five rules are presented
in this section, and the remaining rules can be referenced in detail in Appendix A.
In general, the rules are discussed here in three aspects: complexity, objective and strategy.
The complexity of a rule corresponds to either direct conversion or indirect conversion.
Direct conversion directly converts phonemes into geometric features, whereas indirect
conversion performs the unusual conversion of phonemes into geometric features with
respect to the special writing rules of Pitman’s Shorthand, invented for speed improvement
purposes. In addition, the objective of a rule corresponds to the major role of the rule, and
the strategy of a rule corresponds to a programming procedure of the rule.
Description of the 3rd Rule (CON and COM at the beginning of a word)
Complexity: indirect conversion
Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 5.5.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
Figure 5.5: Illustration of the use of a dot primitive for the sound COM at the beginning of a word
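The strategy of the 3rd rule can be sketched directly; the sound-group segmentation and the "dot" primitive code below are illustrative assumptions.

```python
# Sketch of the 3rd rule: CON or COM at the beginning of a word becomes
# a dot primitive, unless the word ends in ING, S, Z, T or D.
BLOCKED_ENDINGS = ("ING", "S", "Z", "T", "D")

def apply_rule_3(sounds):
    """sounds: list of sound groups, e.g. ["COM", "M", "ENS"] for "commence"."""
    if sounds and sounds[0] in ("CON", "COM") and sounds[-1] not in BLOCKED_ENDINGS:
        return ["dot"] + sounds[1:]
    return sounds

print(apply_rule_3(["COM", "M", "ENS"]))  # -> ['dot', 'M', 'ENS']
print(apply_rule_3(["CON", "S"]))         # -> ['CON', 'S'] (blocked ending)
```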
Description of the 5th Rule (Negative prefix IL-, IM-, IN-, IR-, UN-)
Complexity: indirect conversion
Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitman’s Shorthand outline containing
the prefix IR- is illustrated in Figure 5.6.
Strategy:
1. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
2. Check whether the word representation of an input matches any element of the list.
3. If it does and the prefix is IL-, convert the sound IL- into an upward stroke L,
followed by a dot primitive and another extra upward stroke L.
4. If it does and the prefix is IM-, convert the sound IM- into a curve M, followed by a
dot primitive and another extra curve M.
5. If it does and the prefix is IR-, convert the sound IR- into a downward curve R,
followed by a dot primitive and another extra downward curve R.
6. If it does and the prefix is IN-, convert the sound IN- into a curve N, followed by a
dot primitive and another extra curve N.
7. If it does and the prefix is UN-, convert the sound UN- into a curve N, followed by a
dash primitive and another extra curve N.
[Figure 5.5 content: the word "commence", pronounced /K Ŏ M Ě N S/, is written with the Pitman's Shorthand notations /COM/ /M/ /NS/ /Ě/.]
In addition, the 5th rule states that a consonant /D/ following the prefix IN- and UN- is not
allowed to be omitted. This is to avoid a conflict with the ND writing rule of Pitman’s
Shorthand, in which the consonant /D/ following /N/ is omitted. Details of the ND rule can
be found in Appendix B.
Figure 5.6: Illustration of the use of negative prefix IR- in a vocalised outline
Description of the 6th Rule (PL, BL, ..., GL, used consonantly at the beginning, in the middle
or at the end of a word)
Complexity: indirect conversion
Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it does not immediately follow /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitman’s Shorthand outline containing the sound /P L/ at the
beginning of a word is illustrated in Figure 5.7.
Strategy:
1. If /N/ comes before /T L/ or /D L/, hook L is not used.
2. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used;
[Figure 5.6 content: the word "irregular", pronounced / Ĭ R Ě G Y U L Ă/, is written with the notations /R/ /R/ /G/ /L/ /R/ and the vowels /Ĭ/ /Ě/ /Ă/ /U/.]
3. Otherwise, replace the phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and
/G L/ of an input with a, b, c, d, e, f, g and h respectively, where
a = hook + P stroke
b = hook + B stroke
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
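The substitution step of the 6th rule maps naturally onto a dictionary lookup; in this sketch the guard conditions of steps 1 and 2 are collapsed into a single hook_allowed flag, and the primitive names are assumed.

```python
# Sketch of the 6th rule: pairs such as /P L/ are replaced by a small
# hook followed by the matching consonant stroke. The /N/-before-/TL,DL/
# and same-syllable guards (steps 1 and 2) are reduced to hook_allowed.
HOOK_L = {
    "P L": "hook+P", "B L": "hook+B", "T L": "hook+T", "D L": "hook+D",
    "CH L": "hook+CH", "J L": "hook+J", "K L": "hook+K", "G L": "hook+G",
}

def apply_rule_6(phonemes, hook_allowed=True):
    if not hook_allowed:  # guard conditions failed: keep the curve L form
        return phonemes
    for pair, primitives in HOOK_L.items():
        phonemes = phonemes.replace(pair, primitives)
    return phonemes

print(apply_rule_6("P L EY"))  # -> hook+P EY  (the word "play")
```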
Figure 5.7: Illustration of the use of PL hook in a vocalised outline
Description of the 14th rule (Half length stroke for one syllable words)
Complexity: indirect conversion
Objective: to omit /T/ or /D/ at the end of one syllable words. This relates to the half-length
writing rule of Pitman’s Shorthand and a sample (one-syllable) half-length outline is
illustrated in Figure 5.8.
Strategy: if a word is a one-syllable word and if there are consonants in the word other than
just /R/ and /T/ or /T/, then the /T/ or /D/ at the end of the word is omitted, provided that /T/
does not follow a voiced consonant and /D/ does not follow an unvoiced consonant.
[Figure 5.7 content: the word "play", pronounced /P L Ā/, is written with the notations /P/ /L/ /PL/ /Ā/.]
Figure 5.8: Illustration of a one syllable half-length outline
Description of the 22nd rule (Double length stroke)
Complexity: indirect conversion
Objective: to omit the syllables -TER, -DER, -THER and -TURE of a word according to the
double-length rule of Pitman's Shorthand (the rule is described in Appendix B). A sample
Pitman's Shorthand outline containing the syllable -TER is
illustrated in Figure 5.9.
Strategy: if an input contains the syllable /TER/, /DER/, /THER/ or /TURE/ in the middle or
at the end, and if the syllable is not surrounded by incompatible neighbouring primitives, the
syllable is removed from the input phonemes. Samples of incompatible neighbouring
primitives are given in Figure 5.10.
Figure 5.9: Illustration of the omission of the syllable –TER in a vocalised outline
[Figure 5.9 content: the word "after", pronounced / Ă F T E R/, is written with the notations /F+TER/ /F/ /Ă/.]
[Figure 5.8 content: the word "coat", pronounced /K Ō T/, is written with the notations /K/ /T/ /Ō/.]
Figure 5.10: illustration of incompatible primitive pairs for doubling
5.4 Experimental Results
Experiments carried out in this chapter are categorised into two main studies: firstly, an
analysis of the accuracy of the novel machine-readable Pitman's Shorthand lexicon and,
secondly, an analysis of the distribution of homophones (outlines which look similar but
have different representations) in the novel lexicon.
5.4.1 Data Set
In order to analyse the accuracy of the machine-readable Pitman's Shorthand lexicon, 1253
commonly used English words are chosen, covering the whole range of the writing rules of
Pitman's Shorthand except the rules for currency notation and punctuation.
In order to analyse the distribution of homophones in the machine-readable Pitman's
Shorthand lexicon, the 5000 most frequently used English words, extracted from the Brown
Corpus, are used. Based on this word list, a hash table is created with a series of primitives
as the key for each group of words, where a primitive key is automatically generated by the
transcription engine from the phonetic representation of a word. A pictorial representation
of the electronic Pitman's Shorthand lexicon is presented in Figure 5.11.
[Figure 5.10 content: primitive pairs that cannot be represented by doubling: /F K/, /V K/, /F G/, /V G/.]
[Figure 5.11 content: keys (primitive series) each mapping to a group of words such as "May", "Maid", "Made" and "Bat", "Pat", "Bad", "Pad".]
Figure 5.11: Sample entries of a machine-readable Pitman's Shorthand lexicon
5.4.2 Analysis of the Accuracy of a Machine Readable Pitman’s
Shorthand Lexicon
A primary goal of an experiment carried out in this section is to evaluate the accuracy of a
machine-readable Pitman’s Shorthand lexicon, where the formulation of the lexicon
accuracy is defined as:
a = ((t - e) / t) * 100    (5.2)
where a is the accuracy of an electronic Pitman’s Shorthand lexicon, t is the total number of
words included in a testing data set and e is the total number of incorrectly generated words
whose primitive representations do not match with patterns defined in an original Pitman’s
Shorthand dictionary (available in book format).
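Formula 5.2 is easy to check numerically; the error count e = 15 below is not reported directly in the text but is inferred from the stated 98.8% accuracy of the 1253-word data set, so it is an assumption.

```python
# Lexicon accuracy per formula 5.2: a = ((t - e) / t) * 100, where t is
# the total number of test words and e the incorrectly generated words.
def lexicon_accuracy(t, e):
    return (t - e) / t * 100

# Assumed e = 15, consistent with the reported 98.8% accuracy at t = 1253.
print(round(lexicon_accuracy(1253, 15), 1))  # -> 98.8
```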
[Figure 5.12 content: accuracy in percentage (70 to 100%) for lexicons containing 100, 300, 500, 700, 900, 1100 and 1253 words.]
Figure 5.12: Average accuracies of different sizes of machine-readable Pitman's Shorthand lexicons
The graph (Figure 5.12) illustrates the accuracies of machine-readable Pitman's lexicons of
different sizes, where the word list of each lexicon is chosen randomly from a data set of
1253 words without duplicates. The study finds that the accuracy of the 1253-word lexicon
is 98.8% and the average accuracy across lexicons of different sizes is 99%. The average
error rate is approximately 1%, and the errors are categorised into the following four groups.
1. Errors due to an ambiguity of the writing rules of Pitman's Shorthand.
2. Errors due to different phonetic representations in the applied phonetic dictionary.
3. Errors due to derivative or compound words.
4. Errors due to limitations of machine compatible scripts.
In order to clarify the four types of errors, consider the following four examples in which
each example provides a sample erroneous shorthand outline with a corresponding
explanation for each type of error.
Example 1: Errors due to ambiguity of the writing rules of Pitman’s Shorthand
In order to clarify errors due to an ambiguity of the writing rules of Pitman’s Shorthand,
consider one of the rules of Pitman’s Shorthand which reads: “Straight strokes are doubled
in length to represent the sounds of –TER, -DER, -THER, and –TURE when they follow
another stroke” [Oj95]. According to this rule, the transcription engine generates a
shorthand representation for the word "weather" as in Figure 5.13(a), in which the sound
-THER is added via a double-length stroke. However, the typical Pitman's Shorthand
dictionary (available in book format) defines the word “weather” as in Figure 5.13 (b) in
which the sound –THER is not written according to the double-length rule. The study finds
that this is because the typical Pitman’s Shorthand lexicon applies another rule of Pitman’s
Shorthand which reads: “A straight stroke is not doubled if the doubling would produce two
strokes of unequal length without an angle" [Oj95]. To determine whether the word
"weather" relates to the first rule or the second rule, consider two other outlines (Figure
5.14), defined in the typical Pitman's Shorthand dictionary. Between the two words,
the typical dictionary defines that doubling is not allowed for the word “factor” as the curve
before the straight stroke will produce two strokes of unequal length if the straight stroke is
doubled (case a); however, it defines that doubling is allowed for the word “further” since
the word complies with the double length rule of Pitman’s Shorthand (case b). On the
whole, the transcription engine assumes that the word “weather” belongs to the case (b)
rather than case (a) since doubling does not produce two strokes of unequal length if the
straight stroke is doubled to add the sound –THER. As a result, a shorthand outline for the
word “weather” generated by the transcription engine is different from the one defined in the
typical Pitman’s Shorthand dictionary and hence the error. Overall, a primary cause of error
in this case is due to ambiguity of the rules of Pitman’s Shorthand.
Figure 5.13: Two different outlines for the word “weather”; (a) the word “weather” is written according to the double-length rule of Pitman’s Shorthand; (b) the word “weather” is not written according to the double-length rule of Pitman’s Shorthand
Figure 5.14: (a) Shorthand outline for the word “factor”; (b) shorthand outline for the word “further”
Example 2: Errors due to different phonetic representations of an applied phonetic
dictionary
In order to clarify errors due to different phonetic representations of a phonetic dictionary,
consider a phonetic representation of the word “union”. According to the applied phonetic
dictionary of this research (i.e., CMU phonetic dictionary), the word “union” is composed as
/Y Ō N Y Ĭ N/, whereas the word is composed as /Y Ō N Ĭ N/ according to the typical
Pitman’s Shorthand dictionary (available in book format). Note that there is an extra
phoneme /Y/ in the first composition and due to this difference, a shorthand representation
of the word “union” defined by the transcription engine is different from the one defined by
the typical Pitman’s Shorthand dictionary as illustrated (Figure 5.15). In general, Pitman’s
Shorthand outlines are generated according to phonetic representations of an applied
dictionary and therefore accuracy of the phonetic dictionary is critical in this research.
Figure 5.15: Two different shorthand outlines for the word “union”
Example 3: Errors due to derivative or compound words
In order to clarify errors due to derivative or compound words, consider the compound word
“landlord”. According to the typical Pitman’s Shorthand dictionary (available in book
format), the word “landlord” is regarded as a composition of the two words “land” and
“lord”; however, according to the transcription engine, the word “landlord” is regarded as a
single word. As a result, shorthand outline representations for the word
“landlord”, generated by the transcription engine and the typical Pitman’s Shorthand
dictionary become different as illustrated (Figure 5.16), mainly due to one of the rules of
Pitman’s Shorthand that reads: “a small hook written inside the end of a curved stroke adds
the final sound N” [Oj95]. Since the transcription engine does not regard that the phoneme
/N/ at the end of the word “land” is a final phoneme, the N hook is not applied for the word
“landlord” and hence the error. In general, efficiency of the transcription engine in
identifying any derivatives and compound words relies on the information available in an
applied phonetic dictionary.
Figure 5.16: Two different outlines for the word “landlord”
Example 4: Errors due to the limitation of machine compatible scripts
In order to clarify errors due to the limitations of machine compatible scripts, consider a
shorthand representation for the word “environment”, which is defined either as Figure 5.17
(a) or Figure 5.17 (b) by the typical Pitman’s Shorthand dictionary. Note that both of the
scripts are valid in this case according to the rule of Pitman’s Shorthand that reads: “the
suffix –MENT is represented by or , whichever is convenient.” [Oj95] In order to
reduce the ambiguity for computer aided transcription, the transcription engine restricts
the writing of the suffix –MENT to one form only, and the inability to generate an
alternative form is counted as an error in the current research.
Figure 5.17: Two different outlines for the word “environment”
According to Figure 5.18, the distribution of the four error categories is: 13% errors due to
ambiguity of the writing rules of Pitman’s Shorthand; 27% errors due to different phonetic
representations of an applied phonetic dictionary; 20% errors due to derivative or compound
words; and 40% errors due to limitations of machine compatible scripts.
Figure 5.18: The distribution of different categories of errors in different sizes of electronic Pitman’s Shorthand lexicon
On the whole, the graph (Figure 5.18) illustrates the distribution of the four types of
errors discovered in the current experiment, where the largest category of error (40%) is
due to the limitation of machine compatible scripts.
5.4.3 Analysis of the Distribution of Homophones in Machine-readable
Pitman’s Shorthand Lexicons
A primary goal of the experiment carried out in this section is to estimate the average
number of candidate words (homophones) mapping to each key of a machine-readable
Pitman’s Shorthand lexicon and to evaluate the distribution of homophones in lexicons of
different sizes.
Figure 5.19: The distribution of uniqueness of the electronic Pitman’s Shorthand lexicons, plotted as the percentage of unique outlines against lexicon sizes from 100 to 5000 words under three conditions: in the presence of clear line thickness and complete vowel information; in the presence of line thickness ambiguity; and in the presence of vowel ambiguity
Figure 5.19 illustrates the results of experiments carried out on electronic Pitman’s
Shorthand lexicons of up to 5000 words. The X-axis of the graph represents the lexicon
size, with words sorted according to their frequency of usage in each lexicon. The Y-axis
represents the uniqueness of an electronic Pitman’s Shorthand lexicon, where the
uniqueness is defined as follows:
u = (a / t) × 100    (5.3)
where u is the uniqueness of a lexicon, t is the total number of keys contained in the
lexicon, and a is the total number of keys having one and only one corresponding English
word in the lexicon.
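Equation (5.3) can be sketched in code as follows. This is a minimal Python illustration; the dictionary `demo`, its keys and its candidate word lists are hypothetical examples rather than entries of the actual lexicon:

```python
def uniqueness(lexicon):
    """Uniqueness u = (a / t) * 100 from equation (5.3).

    `lexicon` maps a shorthand key to its list of candidate English words.
    """
    t = len(lexicon)  # total number of keys in the lexicon
    # keys with one and only one corresponding English word
    a = sum(1 for words in lexicon.values() if len(words) == 1)
    return a / t * 100

# Hypothetical lexicon: 3 of the 4 keys are unambiguous.
demo = {
    "key1": ["weather"],
    "key2": ["factor"],
    "key3": ["union", "onion"],
    "key4": ["further"],
}
```

With `demo`, the uniqueness is 75%, since three of the four keys map to exactly one word.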
The first test (Figure 5.19) illustrates uniqueness of lexicons in the presence of clear
distinction between line thicknesses as well as in the presence of complete vowel
information. The study finds that uniqueness of the lexicon of 5000 most frequently used
English words is 96%. The maximum ambiguity is 4 candidate words per key and an
average ambiguity is 1.02 potential words per key.
The second test (Figure 5.19) illustrates uniqueness of lexicons in the presence of line
thickness ambiguity. According to experimental results, uniqueness of the lexicon of 5000
words drops by about 5% if there is no distinction between thick and thin strokes. The
maximum ambiguity is 5 candidate words per key and an average ambiguity is 1.05 potential
words per key.
Finally, the third test (Figure 5.19) illustrates uniqueness of lexicons in the presence of
vowel ambiguity. The study finds that uniqueness of the lexicon of 5000 words is
approximately 71% when vowel primitives are not included in the keys of a lexicon. The
maximum ambiguity is 15 candidate words per key and an average ambiguity is 1.22
potential words per key.
5.5 Discussion
On the whole, this chapter presents the creation of a novel machine-readable Pitman’s
Shorthand lexicon in order to facilitate the direct translation of geometrical features of
shorthand outlines into English words. Experimental results present accuracies of different
sizes of electronic Pitman’s Shorthand lexicon as well as the distribution of homophones in
the novel lexicon.
In relation to the accuracies of the electronic Pitman’s Shorthand lexicons, an average
accuracy of 99% is satisfactory; however, further correction of the errors caused by the
applied phonetic dictionary can be achieved with the use of a more appropriate dictionary.
As for the correction of errors relating to the rules of Pitman’s Shorthand, the use of
machine compatible scripts is recommended, although this requires further assessment of
user acceptability.
In relation to the analysis of the uniqueness of an electronic Pitman’s Shorthand lexicon,
the experimental results can be used to decide which type of electronic Pitman’s Shorthand
lexicon is appropriate for the computer aided transcription of handwritten Pitman’s
Shorthand. According to the experimental results, a lexicon with low uniqueness is more
robust against line thickness ambiguity or vowel ambiguity, whereas the lexicon with the
highest uniqueness is the least robust against natural ambiguity. Taking into consideration
the impracticability of restricting natural ambiguity in handwriting recognition, it is
recommended that either a lexicon with line thickness ambiguity, a lexicon with vowel
ambiguity, or a combination of both be used for the real time transcription of handwritten
Pitman’s Shorthand.
6 Phrase level transcription of online handwritten
Pitman’s Shorthand outlines
Chapter 6 Introduction
This chapter focuses on solutions to the phrase level recognition of online handwritten
Pitman’s Shorthand outlines. The primary aims of this chapter are, first, to investigate a
contextual method that can effectively reduce the homophone ambiguities appearing in the
result list of a corresponding handwritten Pitman’s Shorthand outline; and second, to
propose a phrase level recognition framework that produces the most likely word sequence
for a written phrase using the Viterbi algorithm.
Unlike the research carried out in the previous chapters of the thesis, each of which
pursued the single goal of finding a novel solution to a given problem, the research in this
chapter is carried out with multiple goals: to investigate conceptual algorithms for
implementing a handwritten Pitman’s Shorthand phrase recogniser, and to consider the
possibility of applying existing Application Program Interfaces (APIs) [Mic04] to the
problem of handwritten Pitman’s Shorthand phrase recognition. A major bottleneck of the
integration is gaining access to the APIs’ hidden functions so that the Pitman’s Shorthand
recogniser’s candidate English words can be input into the APIs.
This chapter presents detailed studies carried out to meet the main objectives mentioned
above, and it is categorised into the following four sections:
- Contextual rejection strategy: presents an effective novel word rejection strategy,
implemented using the critical contextual knowledge that people use when reading
handwritten shorthand notes.
- Phrase level recognition algorithm: proposes a conceptual solution for finding the most
likely word sequences for a handwritten Pitman’s Shorthand phrase with the use of
the Viterbi algorithm and statistical language modelling techniques.
- Integration with APIs: discusses major difficulties in the integration of a Pitman’s
Shorthand phrase recogniser with APIs of the Microsoft Tablet PC Platform
Software Development Kit [Tab04], and highlights the potential benefits of
successfully integrating the two components.
- Experimental result: evaluates the performance of the new contextual rejection
strategy proposed in this chapter.
6.1 Contextual Rejection Strategy
Chapter 4 mentioned that the major factor preventing a correct word from appearing in the
topmost position of a result list for a given Pitman’s Shorthand outline is the similarity of
input outlines to other outlines. This research denotes this problem as the homophone
ambiguity, and further resolution to this problem is discussed in relation to the word
rejection strategies here.
Several word rejection strategies [GKM+97], [PP02], [MAG+02] have been applied in the
field of handwritten word recognition. Their reliability is related to their capability not to
accept false word hypotheses and not to reject true word hypotheses [Ka04]. Common
rejection thresholds reported in the literature are the class-threshold (e.g., [QAC05], which
rejects words according to their grammatical nature), the domain-threshold (e.g., [NB02],
which rejects words according to a user domain), the lexicon-threshold (e.g., [ESS+98],
which rejects words according to a lexicon’s confidence score) and the recogniser-threshold
(e.g., [PP02], which rejects words according to the confidence scores produced by a Hidden
Markov Model-based on-line handwriting recogniser).
In addition to the rejection strategies mentioned above, a critical contextual knowledge that
needs to be put into practice for rejecting homophones of handwritten Pitman’s Shorthand
outlines is the shorthand outlines’ position. In Pitman’s Shorthand, the outlines’ correct
positioning is highly critical [Oj95], as it provides a primary clue for retrieving vowel
information even though vowels are omitted in an outline. In general, an outline’s position is
determined by the first pen-stroke in Pitman’s Shorthand, such that if an outline’s first stroke
is written above the writing line, it is considered to be written in the first position; if the first
stroke is written on the writing line, it is considered to be written in the second position; and
if the first stroke is written through the writing line, it is considered to be written in the third
position. Samples of three Pitman’s Shorthand outlines written in three different positions
are illustrated in Figure 6.1(a). As illustrated in Figure 6.1(b), although the three
outlines comprise exactly the same consonant stroke, their corresponding English words can
be easily identified by the differences between the outlines’ positions.
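The positioning rule described above can be sketched as follows. This is a minimal Python illustration under assumed screen coordinates (y increasing downwards); the function name, baseline value and tolerance are hypothetical choices, not part of the actual recogniser:

```python
def outline_position(stroke_ys, baseline, tol=2.0):
    """Classify an outline's written position from its first pen-stroke.

    `stroke_ys` holds the y-coordinates of the first stroke (y grows
    downwards), `baseline` is the writing line's y value and `tol` is an
    assumed tolerance.  Returns 1, 2 or 3 following the rule in the text.
    """
    bottom = max(stroke_ys)  # lowest point of the first stroke
    if bottom < baseline - tol:
        return 1  # first position: stroke written above the writing line
    if bottom <= baseline + tol:
        return 2  # second position: stroke resting on the writing line
    return 3      # third position: stroke written through the writing line
```

For instance, with a writing line at y = 50, a stroke spanning y = 10–20 would be classified as first position, y = 30–50 as second, and y = 40–70 as third.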
Figure 6.1: Samples of Pitman’s Shorthand outlines written in three different positions; (a) outlines written including vowel notations, (b) outlines written without vowel notations
Overall, stenographers apply the outlines’ position as a primary clue to identify the most
relevant words for a particular shorthand outline. However, this knowledge has never been
applied to solve the problem of homophone ambiguity in machine-based transcriptions.
Based on this observation, the contextual rejection strategy proposed in this chapter is
defined as:
W P(k) = WP(i) \ WP(i ≠ k) (6.1)
where WP(i) is a list of candidate words for an input shorthand outline (written in different
positions) and k is an input outline’s written position, which is 1 for the first position, 2 for
the second position and 3 for the third position. In order to clarify the algorithm, consider
Example 1 given below.
Example 1
Assuming that k = 1, W = {W1, W2, W3, W4, W5, W6} is a set of candidate words for a given
shorthand outline and P(i) of W1, W2, W3, W4, W5, W6 are 2, 1, 1, 3, 2 and 1 respectively,
then W P(k) is calculated as:
W P(k) = WP(i) \ WP(i ≠ k)
W P(k) = { W1, W2, W3, W4, W5, W6} \ { W1, W4, W5}
= {W2, W3, W6}
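Equation (6.1) and Example 1 can be sketched in code as follows. This is a minimal Python illustration in which the candidate list `W` and position map `P` reproduce the example's data, while the function name and data structures are hypothetical:

```python
def reject_by_position(candidates, positions, k):
    """Equation (6.1): remove candidate words whose written position
    differs from the input outline's position k (1, 2 or 3)."""
    return [w for w in candidates if positions[w] == k]

# Data from Example 1 in the text.
W = ["W1", "W2", "W3", "W4", "W5", "W6"]
P = {"W1": 2, "W2": 1, "W3": 1, "W4": 3, "W5": 2, "W6": 1}
```

For k = 1 this keeps W2, W3 and W6, matching the result of Example 1.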
6.2 Handwritten Pitman’s Shorthand Phrase Recognition
Figure 6.2: Illustration of the handwritten Pitman’s Shorthand phrase level transcription process
The structure of the phrase level transcription process for handwritten Pitman’s Shorthand
is illustrated in Figure 6.2, which shows outlines passing through the short-form and
Bayesian Network based vocalised outline interpreters, the ordered word lists, the Pitman’s
Shorthand specific contextual word rejection, the filtered word lists, and finally the word
graph and language model that produce the output phrase (e.g., “The cat sat on the mat.”).
The framework is based on the architecture of the online
handwritten English sentence recognition [QAC05]. In brief, every handwritten Pitman’s
Shorthand outline is given as an input to the short-form interpreter or the Bayesian Network
based vocalised outline interpreter, and a ranked list of candidate words for each outline is
produced at the end of the word recognition process. The candidate words are then validated
by the contextual rejection strategy, which removes irrelevant words from the candidate lists
before forwarding them to the phrase level recogniser. The phrase level recogniser then
creates a word graph with the incoming word lists such that each node represents a candidate
word’s likelihood, and each edge represents the transitional probability (i.e., language model
probability) between node n1 and node n2. The phrase level recogniser then finds the most
likely sequence of words for a given input phrase by applying the Viterbi algorithm to the
word graph.
Based on the algorithm defined in the online handwritten English sentence recognition
[QAC05], the most likely sequence of words Ŵ for a handwritten Pitman’s Shorthand
phrase is defined as:
Ŵ = argmax_W p(P|W) p(W)    (6.2)
where W is a candidate word sequence, P is the handwritten Pitman’s Shorthand phrase,
p(P|W) is the probability of the handwritten phrase P conditioned on the given sequence of
words W, and p(W) is the prior probability of the sequence W.
In detail, p(P|W) is evaluated by the confidence score of the Bayesian Network based word
interpreter and p(W) is given by a statistical language model. In other words, the efficiency
of finding the best sequence of words for a given input phrase depends on the confidence
score of the handwritten word recogniser plus the confidence score of the applied statistical
language model.
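The search for Ŵ can be sketched as a Viterbi pass over the word graph. This is a minimal Python illustration, not the actual implementation; the emission scores and bigram probabilities shown are hypothetical toy values:

```python
import math

def viterbi_phrase(word_lists, emit, bigram, floor=1e-9):
    """Most likely word sequence through per-outline candidate lists.

    `emit[w]` stands in for the word recogniser's confidence p(outline|w)
    and `bigram[(u, w)]` for the language model's p(w|u); unseen bigrams
    fall back to `floor`.  Scores are combined in log space.
    """
    scores = {w: math.log(emit[w]) for w in word_lists[0]}
    back = [{w: None for w in word_lists[0]}]
    for layer in word_lists[1:]:
        new_scores, new_back = {}, {}
        for w in layer:
            # best predecessor for w under recogniser + language model scores
            prev, s = max(
                ((u, scores[u] + math.log(bigram.get((u, w), floor)))
                 for u in scores),
                key=lambda t: t[1],
            )
            new_scores[w] = s + math.log(emit[w])
            new_back[w] = prev
        scores, back = new_scores, back + [new_back]
    best = max(scores, key=scores.get)
    path = [best]
    for layer in reversed(back[1:]):  # trace the best path backwards
        path.insert(0, layer[path[0]])
    return path

# Toy phrase of two outlines with hypothetical scores.
lists = [["the", "a"], ["cat", "bat"]]
emit = {"the": 0.6, "a": 0.4, "cat": 0.5, "bat": 0.5}
lm = {("the", "cat"): 0.5, ("the", "bat"): 0.1,
      ("a", "cat"): 0.1, ("a", "bat"): 0.1}
```

Each node's score combines the recogniser confidence with the language model transition, mirroring the product p(P|W) p(W) of equation (6.2) in log space.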
This chapter focuses on the statistical language models’ impact on phrase level recognition,
because a language model’s quality can directly affect the overall word recognition
accuracy. For instance, [MB01] showed that a bi-gram language model outperforms a
unigram language model in offline handwritten sentence recognition. Similarly, work by
[ZB04] showed that a tri-gram model increases word recognition accuracy by 6.8%
compared to a bi-gram model. Again, work by [QAC05] showed that using bi-gram and
tri-gram models for online handwritten sentence recognition results in only a 0.1%
difference in word recognition accuracy. These findings show that it is critical to choose
an appropriate statistical language model in order to obtain a promising overall result.
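A statistical language model of the kind compared above can be sketched as follows. This is a minimal Python illustration of an unsmoothed bi-gram model estimated from a hypothetical toy corpus; real systems add smoothing and train on far larger data:

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood bi-gram estimates p(w|u) = c(u, w) / c(u)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        for u, w in zip(words, words[1:]):
            unigrams[u] += 1      # count of u as a bigram's first word
            bigrams[(u, w)] += 1  # count of the pair (u, w)
    return {(u, w): c / unigrams[u] for (u, w), c in bigrams.items()}

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
```

Here p(cat|the) = 1.0 and p(sat|cat) = 0.5; such conditional probabilities supply the edge weights of the word graph.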
Considering a statistical language model’s quality, this chapter proposes the use of the
statistical language model embedded in the Microsoft handwriting recognition APIs [Mic04],
which has been thoroughly trained with millions of words, dictionaries and grammars of
various languages for the development of a commercially viable system.
6.3 The Integration of a Pitman’s Shorthand Phrase Recogniser
with APIs
This section presents a feasibility study of the integration of a Pitman’s Shorthand phrase
recogniser with Microsoft handwriting recognition APIs, in order to take advantage of
existing statistical language models embedded in the APIs. In order to discuss the specific
API relating to this study, consider the object model illustrated in Figure 6.3.
Figure 6.3: An abstract view of the object model “Microsoft.Ink”
Figure 6.3 illustrates an object model for the class “Microsoft.Ink”, whose child objects
(e.g., Recognizer, RecognizerContext, WordList, RecognizerGuide, RecognitionResult,
RecognitionAlternates, Strokes, Stroke and Gesture) facilitate automatic handwriting
recognition. A specific object relating to the current study is the “RecogniserContext”
object, which enables ink recognition and the retrieval of the recognition result and its
alternatives.
In order to clarify the “RecogniserContext’s” efficiency, consider the examples illustrated in
Figure 6.4. Figure 6.4 (a) illustrates the recognition results for a written phrase produced by
the “RecogniserContext” API, Figure 6.4 (b) illustrates the change in recognition results
upon a new word’s arrival, Figure 6.4 (c) illustrates the change in the recognition results
when the API’s context is limited to a full file path’s name, and Figure 6.4 (d) illustrates the
change in recognition results when the API’s context is limited to an e-mail address’
username.
Figure 6.4: Screen shots of the recognition results produced by the “RecogniserContext” API
In total, approximately 40 kinds of input scopes can be defined in relation to the API’s
context. A major bottleneck in integrating the handwritten Pitman’s Shorthand recogniser
into this powerful API is the API’s lack of a function that takes a written phrase’s
candidate words as inputs and produces a ranked list of candidate phrases as an output.
Overall, the investigation of a solution to provide this function would be rewarding and
is open to future work of this research.
6.4 Experimental Results
A small experiment is carried out to evaluate the performance of the contextual rejection
strategy proposed in this chapter. The data set includes 700 phrases,
which were automatically generated using the word lists gained from the experiments of the
Bayesian Network based word transcription (Chapter 4). A primary goal of this experiment
is to analyse the accuracy of irrelevant words’ removal from the result lists based on the
shorthand outlines’ position information, which defines rejection accuracy as:
a = 100, if the correct word appears in the result list after applying the rejection strategy; a = 0, otherwise.    (6.3)
Figure 6.5: Performance of the contextual rejection strategy
The rejection accuracy of 700 phrases is illustrated in Figure 6.5. The study finds that the
contextual rejection strategy correctly filtered 98% of the phrases, and that inaccurate
position writing, practised primarily by inexperienced writers, caused the 2% error rate. The
findings show that the contextual rejection strategy proposed in this chapter is highly reliable
in conjunction with accurate position writing.
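The per-phrase scoring of equation (6.3) and the overall figure reported above can be sketched as follows. This is a minimal Python illustration, and the boolean result list is a hypothetical structure rather than the actual experimental data:

```python
def rejection_accuracy(results):
    """Mean of the per-phrase scores defined by equation (6.3).

    `results` holds one boolean per phrase: True if the correct word is
    still present in the result list after applying the rejection strategy.
    """
    scores = [100 if ok else 0 for ok in results]  # a = 100 or a = 0
    return sum(scores) / len(scores)
```

For example, 686 correctly filtered phrases out of 700 would give an overall accuracy of 98%.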
6.5 Discussion
This chapter presents Pitman’s Shorthand specific contextual knowledge to reduce
handwritten Pitman’s Shorthand’s homophone ambiguities. Theoretical algorithms to
resolve the problem of handwritten Pitman’s Shorthand phrase recognition are proposed
with the use of the Viterbi algorithm and language models. In relation to the use of
adequate statistical language models to enhance the phrase level recognition performance,
a feasibility study of integrating the phrase level recogniser with existing handwriting
recognition APIs is carried out. The study highlights the APIs’ efficiency and proposes the
potential benefits of successfully integrating the two components.
Overall, this chapter has addressed solutions to the online handwritten Pitman’s Shorthand
phrase recognition problem; however, the framework is not fully implemented in this
research, mainly because research into this problem is no longer new and established
frameworks are already available in the market. Compared to phrase level transcription,
the investigation of novel solutions to the word level transcription problems of
handwritten Pitman’s Shorthand has received greater emphasis in this research, as the
state of the art of the latter needs more extensive research in order to produce a
commercially viable handwritten Pitman’s Shorthand recogniser.
7 Graphical User Interfaces of the Handwritten Pitman’s
Shorthand Recognition System
Chapter 7 Introduction
Previous chapters presented full details of the back-end interpretation of handwritten
Pitman’s Shorthand outlines, whereas this chapter presents the research and development of
front-end user interfaces, via which users of the handwritten Pitman’s Shorthand recogniser
interact with the back-end programs. The primary objective of the chapter is to demonstrate
the commercial viability of the end result of this research with a series of well-designed
prototypes. Figure 7.1 depicts the front-end and back-end layers of the system.
Figure 7.1: Front-end and back-end architecture of the system
The chapter includes six main topics, outlined as follows:
1. Overview: a description of interactions between user interfaces and back-end
engines, including clarification of an applied programming environment for each
front-end and back-end program.
2. Pen based Application Program Interfaces (APIs): a brief description of Microsoft
tablet PC APIs, in particular ink APIs which are primarily used in the development
of Graphical User Interfaces (GUIs) of the system.
3. Training data collection GUIs: description of data collection GUIs with which a
large collection of handwritten Pitman’s Shorthand outlines is collected for training
purposes.
4. Developer GUI: description of a low level parameter setting interface, mainly
implemented for system developers.
5. End-user GUIs: description and comparison of end-user interfaces of this research
which enable handwritten Pitman’s Shorthand data entry into tablet PCs.
6. Experiment: evaluation of user feedback on the presented prototypes, particularly
from the perspective of potential users’ acceptability.
7.1 Overview
Figure 7.2: Illustration of interactions between user interfaces and back-end engines of the system
The figure shows the process flow: the ink collector GUI (Visual C#) saves raw ink
coordinates to data file 1; the collaborator’s recognition engine (Visual C++) converts
them into classified data in data file 2; the transcription engine (Java) produces a ranked
word list (e.g., “play 0.322, bay 0.112, clay 0.001”) in data file 3; and the result GUI
(Visual C#) displays the output. The legend distinguishes file storage, processes, process
flow and read/write access.
A brief overview of the interactions between front-end interfaces and back-end engines is
illustrated in Figure 7.2. As shown, handwritten ink data collected by the ink collector
GUI is first saved in a data file (data file 1 in Figure 7.2). The data file is then
processed by the collaborator’s recognition engine, where a series of ink coordinates is
transformed into a ranked list of words or primitives (data file 2 in Figure 7.2). Then, on
arrival of the classified data, the transcription engine is invoked and produces a ranked
list of n-best words for the corresponding classified data (data file 3 in Figure 7.2).
Once the transcription result is ready, the front-end GUI retrieves the result file and
displays the best text representation for the written outline.
As discussed, the primary means of data transmission between components of the current
system is file access. In this way, the front-end and back-end programs can be developed
independently, without either having to wait for the completion of the other, thereby
making the concurrent development of several components of the system in two countries
productive. In addition, the current system includes more than one programming
environment, and data files are, in fact, the primary media of communication between
programs of the different environments. The graphical user interfaces presented in this
chapter are implemented using tablet PCs with Microsoft Windows XP Tablet PC Edition.
The detailed programming environments included in the current system development are:
- Microsoft Visual C#, used in the development of the front-end user interfaces.
- Microsoft Visual C++, used in the development of the collaborator’s recognition engine.
- Sun Java (J2SDK), used in the development of the transcription algorithms.
7.2 Ink Data Collection in this Research
This section presents an essential description of the ink collection procedure carried out
in this research. The description is linked to the Microsoft Tablet PC platform APIs
[Mic04] so that the (interested) reader can test a simple ink collection program
practically. In other words, the ink collection procedure presented here is applicable not only to the
collection of online handwritten Pitman’s Shorthand data, but also to the collection of any
kind of ink data needed for various purposes.
In general, the Tablet PC platform APIs relating to the ink collection can be divided into
three distinct groups: ink collection APIs, ink data management APIs and ink recognition
APIs. A pictorial presentation of how these APIs work together, at a high level, is provided
at the MSDN library [Abo04] and the illustration is replicated here (Figure 7.3) as a
reference for discussion.
Figure 7.3: Illustration of the Tablet PC platform APIs presented at [Abo04]
According to the pictorial presentation of the Tablet PC platform APIs (Figure 7.3), the ink
collection procedure of this research relates to the utilization of Pen APIs (i.e., the first stage
of Figure 7.3). Here, clarification is made as to why the APIs of the other stages (i.e.,
stages 2, 3 and 4 of Figure 7.3) are not applicable to the current research regardless of
their efficient ink manipulation and recognition capabilities. This is because the Tablet PC platform APIs
support the processing of only fifteen handwritten languages at the time of writing, and
Pitman’s Shorthand is not one of them. In brief, only the ink collection APIs are applicable
to the current research, and the remaining functions of ink manipulation and ink recognition
are covered by the recognition engine and the transcription engine of this research
respectively.
Figure 7.4 depicts the high-level relationship of the object models of the Tablet PC APIs,
in which the hierarchy of the ink collection object, namely "InkCollector", is highlighted.
The primary function of this object is to capture the series of ink coordinates and
timestamps of a pen-stroke. In general, any handwriting data written on a handheld device
with a Microsoft Tablet PC platform can be collected with an "InkCollector" object.
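As an illustration of the kind of data such an object delivers, the sketch below models a pen-stroke as a timestamped point sequence in plain Java. The InkPoint and InkStroke names are hypothetical stand-ins for illustration only; they are not Tablet PC API types.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins for the data an ink collection object yields per
// pen-stroke: a sequence of (x, y) digitiser coordinates, each with a timestamp.
class InkPoint {
    final int x, y;
    final long timestampMs;
    InkPoint(int x, int y, long timestampMs) {
        this.x = x;
        this.y = y;
        this.timestampMs = timestampMs;
    }
}

class InkStroke {
    private final List<InkPoint> points = new ArrayList<>();

    // Called once per packet while the pen is down.
    void addPacket(int x, int y, long timestampMs) {
        points.add(new InkPoint(x, y, timestampMs));
    }

    int size() {
        return points.size();
    }

    // Duration of the stroke from the first to the last packet.
    long durationMs() {
        if (points.isEmpty()) return 0;
        return points.get(points.size() - 1).timestampMs - points.get(0).timestampMs;
    }
}
```

A stroke built from such packets is what the recognition engine later segments and classifies into Pitman's Shorthand primitives.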
Figure 7.4: Illustration of the high-level relationship of object models of the Tablet PC platform APIs
(The figure groups the objects into two areas: ink collection and display, covering InkTablets, IInkTablet, InkCollector, InkDisp, IInkCursors, IInkCursor, InkDrawingAttributes, IInkCursorButtons, IInkCursorButton, INKEDLib, InkEdit, PenInputPanel, InkOverlay, InkPicture, InkRenderer, InkRectangle and InkTransform; and ink data recognition.)
7.3 General Training Data Collection Tool
Figure 7.5: Home page of the training data collector
The interfaces of this research (Figure 7.5 & Figure 7.6) are designed specifically for the
collection of a large amount of handwriting data for system training purposes. They have
served as the primary data collection tool in this research, and they can also be applied as
a general data collection tool for other kinds of handwriting recognition systems.
A primary purpose of the interfaces in this research is to collect and organise training data
effectively, as well as to give volunteers (shorthand writers) a user-friendly experience of
entering Pitman's Shorthand outlines into a tablet PC. It should be mentioned that Pitman's
Shorthand was once widely practised as a speech recording mechanism, but it has more
recently ceased to be a popular writing system. Therefore, volunteers for the
training data collection process can be of various ages and domains. In addition, tablet
PCs are fairly new devices for the general population at the time of writing, and no more
than 20% of the volunteers of this research had previous experience of using pen-based
computers. Taking these factors into account, an important criterion is set for the layout
of the training data collector: the functions of the interfaces should be kept as simple as
possible, and the appearance of the interfaces must be suitable for volunteers of various
ages and domains.
Figure 7.6: Sample data entry page of the training data collector GUI
In general, the training data collector collects two types of data: writer data and ink data.
The writer data is intended for the evaluation of the overall system performance and the ink
data is intended for the training of the transcription engine. As illustrated (Figure 7.5), the
home page of the training data collector collects the following writer information:
Name: intended for automatic naming of training data folders and files.
Gender: intended for evaluating whether the transcription accuracy varies between
female writers and male writers.
Skill in Pitman’s Shorthand: intended for evaluating whether the transcription
accuracy varies depending on the skill of a writer in Pitman’s Shorthand, where the
skill is categorised into three levels: professional, intermediate and inexperienced.
First language of a writer: intended for evaluating whether the transcription
accuracy is influenced by a writer's skill in English pronunciation. Since Pitman's
Shorthand is based on phonetics, non-native English speakers may find pronunciation
more difficult than native speakers and may therefore produce less accurate shorthand
outlines.
Previous experience in pen-based data entry system: intended for evaluating
whether the transcription accuracy is influenced by a writer’s previous experience in
using a pen-based text entry system.
Tidiness of handwriting: intended for evaluating whether the transcription accuracy
is influenced by the tidiness of a user’s handwriting, where the tidiness is
categorised into three levels here: “very neat and tidy”, “legible enough to others but
not very tidy”, and “legible to me but not to others”.
Domain of a writer: intended for evaluating whether the transcription accuracy
varies depending on the change of domains.
Way of writing: intended for evaluating whether the transcription accuracy varies
depending on whether the writer is left-handed or right-handed.
7.4 Developer Graphical User Interface
Figure 7.7: Screen shot of the developer Graphical Interface
The developer GUI (Figure 7.7) provides advanced settings of the system, and its
functions are particularly intended for system developers. Since it is a gateway for
controlling parameters of the recognition and transcription engines, it has been of great
benefit to the current research and development. Moreover, it is also intended to benefit
future system developers whose work builds on this research. The functions included in
this interface are:
A change of lexicon: a file dialogue is provided to specify the location of a new
lexicon. Domain-specific knowledge is critical for the transcription of Pitman's
Shorthand [NB02], and this function therefore enables switching to a lexicon
appropriate for a particular domain.
Definition of training data set: a text area is provided to enter a list of words that are
to be collected for training purposes. Since Pitman's Shorthand is no longer a
popular writing system, databases of handwritten Pitman's Shorthand outlines,
designed for training a handwriting recogniser, are not available at the time
of writing. As a result, the current research includes collection of several training
data sets.
Execution of the recognition engine (classification): a control dialogue is provided
to execute the recognition engine independently. A primary purpose of this function
is to convert a series of ink data files (e.g., data file 1 in Figure 7.2) into a series of
classified data files (e.g., data file 2 in Figure 7.2) in a batch by running the
recognition engine separately. As mentioned, the ink collector and the recognition
engine (Figure 7.2) are capable of running independently, with data files created in
between. During the training data collection of this research, the ink collector was
run separately, mainly to reduce frustration for volunteers who input hundreds of
shorthand outlines into the system in a limited amount of time.
Parameter setting: a control dialogue is provided to adjust the parameters used in the
training of the Bayesian Network based shorthand outline models. This interface is
essential for training the transcription engine, because it is impractical to obtain
multiple training samples for every word of a dictionary, even with the large
collection of training data in this research. With the parameter setting interface, the
Bayesian Network based outline models can be trained with training data, history
data, or both. In addition, the interface enables the specification of a preferred
training data set for training the transcription engine.
7.5 Shorthand Data Entry Graphical User Interfaces
The shorthand data entry GUIs presented in this section are the first graphical user
interfaces ever implemented to facilitate handwritten Pitman's Shorthand entry into tablet
PCs. The interfaces are discussed below in comparison with the collaborator's interfaces,
which were developed concurrently.
Firstly, a screenshot of an end-user interface proposed by the collaborator is illustrated
(Figure 7.8), where box 3 facilitates handwritten ink entry into the system, box 1 provides
segmentation and classification results of a written script, and box 2 provides a list of
n-best words for the written script. Similarly, another (most recent) version of the collaborator's
interface is presented (Figure 7.9) where the components are basically the same as the
previous version (Figure 7.8) with additional writing areas and new features for adjusting the
parameters of the recognition engine. On the whole, the end-user interfaces suggested by the
collaborator are mainly beneficial to system developers since they emphasise the back-end
view of the recognition and transcription engines.
Figure 7.8: The first version of the collaborator’s tablet PC interface for the handwritten Pitman’s Shorthand recognition system
Figure 7.9: The latest version of the collaborator’s tablet PC interface for Pitman’s Shorthand recognition system
Unlike the interfaces developed by the collaborator, end-user interfaces of this thesis put
emphasis on the usability issues including user friendliness, commercial viability and
completeness of the system. From the aspect of user friendliness, the research interfaces are
designed to look similar to a conventional shorthand note-pad. In this way, primary users
(stenographers) of the system are expected to get used to the interfaces quickly, thereby
enabling a short learning curve.
In creating a note-pad like interface, the pen-input area (writing area) becomes a critical
concern, i.e., whether a square writing box should be designed for
the writing of a single word or multiple words. In general, allowing multiple words in a
single writing area suffers from word-boundary ambiguities; on the other hand, providing
N writing areas for N words wastes screen space. In this research, the collaborator's
recognition engine encourages the writing of exactly one word in each writing area in
order to reduce word-boundary ambiguities. As a result, the end-user interfaces of this
research likewise encourage the writing of N words in N writing areas.
Regardless of the use of several writing boxes in the interface, the original goal (i.e., to
create a note pad like interface) is achieved by connecting the boxes with faded borders as
illustrated (Figure 7.10).
In addition, the dimensions of a writing area are discussed in relation to the creation of a
note-pad like interface. The Pitman's Shorthand note-pads commonly used by
stenographers are roughly A5 size (210 mm x 149 mm) with approximately 8 mm line
intervals [Lg90]. Taking the ratio of the size of a note-pad to the size of a tablet PC's 15"
digitiser (e.g., 1024 x 768 pixels), the 8 mm line interval corresponds to approximately
30 pixels. Based on these measurements, each individual writing box of the interface is
set at 100 x 60 pixels with a line interval of 30 pixels. The solution appears practical but
requires further assessment of user acceptability.
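The scaling above can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted in the text (768-pixel screen height, 210 mm pad height); note that the exact result for 8 mm is 29 pixels, i.e. approximately 30 as stated.

```java
// Scale a physical note-pad measurement (in mm) onto the digitiser (in pixels),
// using the ratio of the screen height (768 px) to the note-pad height (210 mm).
class PadScaling {
    static final double SCREEN_HEIGHT_PX = 768.0;
    static final double PAD_HEIGHT_MM = 210.0;

    static long mmToPixels(double mm) {
        // 8 mm * 768 / 210 = 29.26 px, which rounds to 29 (approximately 30).
        return Math.round(mm * SCREEN_HEIGHT_PX / PAD_HEIGHT_MM);
    }
}
```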
From the aspect of commercial viability, the overall presentation of the interface is
designed to look good in addition to functioning reliably. On the whole, users are
provided with a choice of two layouts for interacting with the final interfaces of the
system. The first (Figure 7.10) is designed for rapid note taking and resembles a
conventional shorthand note-pad. The second (Figure 7.11) is designed for general text
entry and resembles the handwriting recogniser of Microsoft Windows XP Tablet PC
Edition.
From the aspect of completeness, the final interfaces of this research act as a gateway to
every component of the system, including a data entry GUI with toolboxes for text editing
and parameter setting, the developer GUI (Figure 7.7), the training data collection GUI
(Figure 7.5), and a back-end view of the system (similar to the one proposed by the
collaborator, Figure 7.9). Despite the involvement of these multiple components, a simple
appearance is achieved by hiding the advanced control components behind show/hide
functions, as illustrated (Figure 7.10 & Figure 7.11).
Figure 7.10: Screenshot of a note-pad layout of the end-user interface of this research
Figure 7.11: Screenshot of an alternative layout of the end-user interface of this research
7.6 Experimental Results
A small experiment is carried out to analyse user feedback on the graphical user
interfaces (GUIs) presented in this chapter. The ultimate goal of the experiment is to
determine the most feasible GUI for the automatic handwritten Pitman's Shorthand
recogniser, i.e., the one that can be presented as a commercially viable prototype.
On the whole, four GUIs are evaluated in the experiment. Two of them were developed
by the collaborating research team and the other two were developed in the research and
development of this thesis. To help the reader easily distinguish the four GUIs, thumbnail
views are presented in Figure 7.12. The experiment was conducted with 20 participants
with different levels of skill in Pitman's Shorthand (from those with no background
knowledge in Pitman's Shorthand up to those with professional-level skill).
Figure 7.12: Thumbnails of the four GUIs evaluated in the experiment
In general, experiments carried out in this chapter are categorised into four groups as
follows:
The general distribution of user fondness for the presented prototypes.
The distribution of user fondness for the presented prototypes in the case of speed
writing.
The distribution of user fondness for the presented prototypes in the case of general
text entry into handheld devices.
The comparison of the most favourite GUI of experienced shorthand writers and that
of novice shorthand writers.
7.6.1 Analysis of the General Distribution of User Fondness for the
Presented Prototypes
Figure 7.13: The general distribution of user fondness for the presented prototypes
Figure 7.13 illustrates the level of user fondness for the four prototypes, where the X-axis
represents the level of user preference for a specific prototype over the others and the
Y-axis represents the percentage of users. The experimental results show that prototype 4
is the most favourite GUI for 60% of users, and prototype 1 is the least favourite GUI for
95% of users.
7.6.2 Analysis of the Distribution of User Fondness for the Presented
Prototypes in the Case of Speed Writing
Figure 7.14: The distribution of user fondness for the presented prototypes in the case of speed writing
Figure 7.14 illustrates the level of user fondness for the four prototypes, especially in
relation to the need for rapid writing, for instance in the real-time recording of speech.
An interesting phenomenon here is that prototype 3 becomes the most preferred GUI,
overtaking prototype 4, which is the most favourite GUI in the general case (Figure 7.13).
This finding shows that the majority of users regard a notepad-like interface as more
appropriate for rapid note-taking.
7.6.3 Analysis of the Distribution of User Fondness for the Presented
Prototypes in the case of a Small Amount of Text Entry into
Handheld Devices
Figure 7.15: The distribution of user fondness for the presented prototypes in the case of a small amount of text entry into handheld devices
Figure 7.15 illustrates the level of user fondness for the four prototypes, particularly in
relation to entering a small amount of textual information into handheld devices, for
instance a person's name into the name field of an address book. In contrast to the case in
Figure 7.14, the study finds that prototype 4 becomes the most favourite GUI, mainly
because of the small amount of screen space taken by the shorthand recogniser while
other applications run at the same time.
7.6.4 The Comparison of the Most Favourite GUI of Experienced
Shorthand Writers and that of Novice Shorthand Writers
Figure 7.16: The comparison of the most favourite GUI of experienced shorthand writers and that of novice shorthand writers
Finally, a comparison of the most favourite GUI of experienced shorthand writers and that of
novice shorthand writers (for the general purpose of use) is given in Figure 7.16. The study
finds that 100% of experienced shorthand writers prefer prototype 3 over the others, whereas
the majority of novice writers (80%) prefer prototype 4 over the others.
7.7 Discussion
This chapter presents the research and development findings for prototypes of the
automatic handwritten Pitman's Shorthand recogniser. It takes a step towards
commercialisation of the product by showing what can be done with these prototypes.
According to the experimental results, prototype 3 and prototype 4, developed by the
research in this thesis, are preferred to the other two prototypes, which were developed by
the collaborating research team. In addition, the study finds that the preference between
prototype 3 and prototype 4 varies depending on the purpose of use. Taking these
findings into consideration, the end-user interface of the system
is finally designed as an integration of prototype 3 and prototype 4, so that users are
provided with a choice of two layouts (i.e., prototype 3 or prototype 4) for interacting
with the automatic handwritten Pitman's Shorthand recogniser on tablet PCs.
8 Conclusion
Chapter 8 Introduction
This chapter presents the summary and conclusions of the research carried out in this
thesis and is divided into the following four sections:
Research work summary: presents a summary of the whole thesis by highlighting
the key objectives of each chapter in combination with an overall evaluation of the
work carried out in each chapter.
Contribution: draws attention to the major contributions made by the research and
development in meeting the overall objectives of the thesis, outlined in Chapter 1.
Future work: presents further research directions that may be taken in order to
improve upon the presented approaches for a commercially viable system.
Dissemination: presents a list of papers (progress reports of the findings of this
research) that have been presented and published in pattern recognition specific
journals and conference proceedings.
8.1 Research Work Summary
The overall aim of the research presented in this thesis was to investigate novel lexicon
organisation and contextual methods that could improve the state of the art in online
handwritten Pitman's Shorthand recognition.
Chapter 1 introduced the research of the thesis by highlighting the motivating need for
new lexical post-processing methods to enhance the quality of text interpretation of
online handwritten Pitman's Shorthand outlines. The chapter also highlighted the need
for a functional, user-friendly graphical user interface that facilitates rapid text entry into
pen-based handheld devices using handwritten Pitman's Shorthand.
A thorough literature review was carried out in Chapter 2, which surveyed currently
available text entry systems for handheld devices and described commonly used pattern
recognition and natural language processing algorithms applied to handwriting
recognition problems.
Chapter 3 investigated the efficiency of a conventional phonetic-based word transcription
approach, in which the primitives of a shorthand script are first converted into a phonetic
representation, which is then interpreted into corresponding English words with the use
of a phonetic dictionary. It was shown that the approach is not robust against the
ambiguities of Pitman's Shorthand, in particular the random omission of vowels among
outlines. This led to the development of a novel Bayesian Network based word
transcription algorithm that aims to enhance the solution using a primitive-based
transcription approach (Chapter 4). In the new approach, the primitives of a shorthand
script are converted directly into orthographic English word(s), without being
transformed into phonemes, with the use of a Pitman's Shorthand lexicon. It was shown
that the new primitive-based approach outperforms the conventional phonetic-based
method.
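To make the contrast concrete, the toy sketch below scores lexicon entries directly against observed primitives. It is a drastically simplified stand-in (independent per-primitive match probabilities) for the thesis's Bayesian Network model, and the lexicon entries, primitive labels and probabilities are invented for illustration:

```java
import java.util.List;
import java.util.Map;

// Toy primitive-based word scoring (NOT the thesis's Bayesian Network model):
// each lexicon entry lists the primitives expected for a word, and a candidate
// is scored by the probability of the observed primitives given that entry,
// assuming independent per-primitive match/confusion probabilities.
class PrimitiveScorer {
    static final double P_MATCH = 0.9;   // observed primitive equals the expected one
    static final double P_CONFUSE = 0.1; // observed primitive was misclassified

    static double score(List<String> observed, List<String> expected) {
        if (observed.size() != expected.size()) return 0.0; // length mismatch: prune
        double p = 1.0;
        for (int i = 0; i < observed.size(); i++)
            p *= observed.get(i).equals(expected.get(i)) ? P_MATCH : P_CONFUSE;
        return p;
    }

    // Return the lexicon word whose expected primitives best explain the observation.
    static String bestWord(List<String> observed, Map<String, List<String>> lexicon) {
        String best = null;
        double bestP = -1.0;
        for (Map.Entry<String, List<String>> e : lexicon.entrySet()) {
            double p = score(observed, e.getValue());
            if (p > bestP) { bestP = p; best = e.getKey(); }
        }
        return best;
    }
}
```

The point of the sketch is that no phoneme string is ever produced: the observed primitives are matched against primitive sequences taken straight from the lexicon.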
In relation to the primitive-to-text transcription approach, Chapter 5 presented the
automatic generation of a novel machine-readable Pitman's Shorthand lexicon, an
essential component facilitating the primitive-based transcription of the Bayesian
Network based word recogniser. The lexicon was shown to be a very effective
mechanism for automatically generating Pitman's Shorthand representations for any
given word.
Following extensive research into the word-level transcription of handwritten Pitman's
Shorthand outlines, Chapter 6 proposed new contextual methods to enhance the solution
quality of the phrase-level transcription problem. It was shown that applying the well
known Viterbi algorithm in combination with Pitman's Shorthand specific contextual
knowledge is more effective than the other contextual methods of the same framework.
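The phrase-level decoding idea can be sketched as a minimal Viterbi decoder over n-best word candidates. This is illustrative only: the candidate scores, bigram probabilities and the smoothing constant below are invented, not the thesis's trained models.

```java
import java.util.*;

// Minimal Viterbi sketch for phrase-level transcription: each writing position
// has n-best word candidates with recognition probabilities, and a bigram model
// supplies transition probabilities; the decoder returns the word sequence with
// the highest combined probability.
class PhraseViterbi {
    static final double UNSEEN_BIGRAM = 1e-6; // floor probability for unseen word pairs

    static List<String> decode(List<Map<String, Double>> candidates,
                               Map<String, Map<String, Double>> bigram) {
        List<Map<String, Double>> best = new ArrayList<>(); // best path score per word
        List<Map<String, String>> back = new ArrayList<>(); // back-pointers
        best.add(new HashMap<>(candidates.get(0)));
        back.add(new HashMap<>());
        for (int t = 1; t < candidates.size(); t++) {
            Map<String, Double> cur = new HashMap<>();
            Map<String, String> bp = new HashMap<>();
            for (Map.Entry<String, Double> w : candidates.get(t).entrySet()) {
                double bestP = -1.0;
                String bestPrev = null;
                for (Map.Entry<String, Double> prev : best.get(t - 1).entrySet()) {
                    double trans = bigram
                        .getOrDefault(prev.getKey(), Collections.<String, Double>emptyMap())
                        .getOrDefault(w.getKey(), UNSEEN_BIGRAM);
                    double p = prev.getValue() * trans * w.getValue();
                    if (p > bestP) { bestP = p; bestPrev = prev.getKey(); }
                }
                cur.put(w.getKey(), bestP);
                bp.put(w.getKey(), bestPrev);
            }
            best.add(cur);
            back.add(bp);
        }
        // Pick the best final word and backtrack through the pointers.
        int last = candidates.size() - 1;
        String word = null;
        double top = -1.0;
        for (Map.Entry<String, Double> e : best.get(last).entrySet())
            if (e.getValue() > top) { top = e.getValue(); word = e.getKey(); }
        LinkedList<String> out = new LinkedList<>();
        out.add(word);
        for (int t = last; t > 0; t--) { word = back.get(t).get(word); out.addFirst(word); }
        return out;
    }
}
```

Even when the recogniser ranks a wrong word first at some position, a strong bigram transition can pull the correct word through, which is precisely the benefit of contextual decoding described above.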
Finally, prototypes of end-user graphical user interfaces (GUIs), designed to demonstrate
the real-time recognition of handwritten Pitman's Shorthand on a tablet PC, are presented
in Chapter 7. This includes an evaluation of the user-friendliness of the prototypes as
well as the selection of a final GUI for the whole system based on experimental results.
8.2 Contribution
A number of original contributions have been drawn from the thesis and they are identified
as follow:
For the first time, an investigation into the integration of a low-level online
handwritten Pitman's Shorthand recogniser with a high-level linguistic post-processor
is presented. It is shown that the integration yields higher-quality results than the
work reported in the literature within the same framework.
The concept of phonetic-based interpretation of segmented portions of handwritten
Pitman's Shorthand outlines into English words, reported in the literature, is applied
to the linguistic post-processing of the handwritten Pitman's Shorthand problem. The
appraisal delivers a valuable finding that highlights the need to investigate a novel
means of interpreting handwritten Pitman's Shorthand, using an approach different
from existing concepts.
For the first time, the Bayesian Network representation is applied to the modelling
of handwritten Pitman’s Shorthand outlines. A series of experiments are carried out
to analyse the transcription performance of the Bayesian Network based word
interpreter. The findings show that the Bayesian Network representation is robust
against stroke variation and highly effective for handling major ambiguities of
handwritten Pitman’s Shorthand (i.e., classification errors and vowel errors).
For the first time, a machine readable Pitman’s Shorthand lexicon is generated. The
findings show that the capability of the lexicon (i.e., ability to produce an accurate
Pitman’s Shorthand representation for a corresponding word) plays an important
role in producing high quality solutions.
The application of Pitman’s Shorthand specific contextual methods in combination
with a Viterbi algorithm is proposed for the phrase level transcription problem. The
algorithm shows promise towards achieving the best quality solution to the phrase
level transcription problems of handwritten Pitman’s Shorthand.
A complete yet compact testing data set (which covers the whole range of rules of
Pitman's Shorthand) is proposed. The dataset is suitable for use as a quality
benchmark dataset in the literature.
For the first time, end-user graphical user interfaces enabling Pitman's Shorthand data
entry into tablet PCs have been developed. It is shown that the final interface of the
system is ready to be introduced as a commercially viable prototype.
8.3 Future Work
Whilst this thesis presents several new methodologies to improve the state of the art in
the machine transcription of online handwritten Pitman's Shorthand, several research
questions have arisen. Some of these are identified below.
8.3.1 Improvement upon the Overall System
Further research in close collaboration with the current collaborating research team is
encouraged, to improve the overall system into a commercially viable product. A major
reward of such cooperative research would be the removal of the limitations of the
current system, in particular the recovery of segmentation errors of the recognition engine
by allowing interactive processing between the recognition engine and the transcription
engine. In the current Bayesian Network model, modelling segmentation ambiguities is
infeasible, mainly because of the lack of real-time interaction with the low-level
segmentation process of the recognition engine. With an interactive supply of low-level
segmentation data, Bayesian Network based stroke models for Pitman's Shorthand
notations could be added alongside the existing shorthand outline models. In this way,
segmentation ambiguities could be embedded in the probabilistic models and recovered
in the lexical post-processing stage. Overall, it may be worthwhile exploring a solution to
segmentation errors, which are a critical issue in the recognition of natural handwriting.
In addition, further investigation into the transcription of the punctuation and currency
notations of Pitman's Shorthand is worthwhile in order to complete the functionality of
the Pitman's Shorthand recognition system. As with short-forms, the number of
punctuation and currency symbols is limited, and the use of a Template Matching
algorithm to interpret these symbols is therefore promising. In fact, a Template Matching
algorithm has already been established in the current recognition engine to recognise
short-forms. Once the transcription engine receives punctuation data, a rigorous analysis
of the performance of the phrase-level interpreter should be carried out in order to
identify the effect of punctuation on phrase-level transcription performance.
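Nearest-template classification over a small, closed symbol set can be as simple as the sketch below. It is illustrative only: the feature vectors and symbol names are invented, and the collaborator's engine is not reproduced here.

```java
import java.util.Map;

// Nearest-template classification over a fixed symbol set: each template is a
// feature vector, and an input symbol is labelled with the template at the
// smallest Euclidean distance. This is practical only when the symbol set is
// small and closed, as with punctuation and currency notations.
class TemplateMatcher {
    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    static String classify(double[] input, Map<String, double[]> templates) {
        String best = null;
        double bestD = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> t : templates.entrySet()) {
            double d = distance(input, t.getValue());
            if (d < bestD) { bestD = d; best = t.getKey(); }
        }
        return best;
    }
}
```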
Finally, the full integration of the contextual methods with the Viterbi algorithm
(proposed in Chapter 6) should be implemented by applying an appropriate,
professionally developed statistical language model available on the market. With a
strong statistical language model in combination with the effective Pitman's Shorthand
specific contextual methods, the overall performance of the system should be
considerably better than the other benchmark instances found in the current literature.
8.3.2 Application of the Presented System to Real-Life Problems
Although this thesis focuses on the automatic recognition of handwritten Pitman's
Shorthand as a rapid means of text entry into a tablet PC, it is worthwhile investigating
the applicability of the system to a variety of real-life problems where speed writing is
critical. For instance, it would be interesting to analyse whether the system could benefit
international travellers as a language translation tool running on their personal digital
assistants (PDAs) or mobile phones. Details of this idea were proposed in Chapter 1. In
addition, ideas previously stated in the literature, such as applying the Pitman's Shorthand
recognition system to the real-time subtitling of TV programmes and the real-time
transcription of lectures and meetings, need to be thoroughly studied in terms of
feasibility. Finally, in order to realise the original aim of the thesis (i.e., to establish the
system as a popular rapid text entry method for portable handheld devices), it is
worthwhile formulating innovative and attractive training methods through which general
users can become more interested in Pitman's Shorthand. This may include the
implementation of Pitman's Shorthand related educational games, the invention of
shortcut methods for learning Pitman's Shorthand, and so on.
8.4 Dissemination
The research carried out in this thesis has been disseminated in pattern recognition specific
international journals and conference proceedings. The following provides a list of papers
that have been published or submitted throughout the research.
1. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Knowledge based
transcription of Pitman’s handwritten shorthand using word frequency and context,
Proceedings of the 7th IEEE International Conference on Development and Application
Systems, pp. 508-512, Suceava, Romania, 27-29 May 2004.
2. Ma Yang, Graham Leedham, Colin Higgins, & Swe Myo Htwe, Segmentation and
recognition of vocalized outlines in Pitman shorthand, Proceedings of the 17th International
Conference on Pattern Recognition, Vol. I, ISBN 0-7695-2128-2, pp. 441-444, Cambridge,
UK, 23-26 August 2004.
3. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Post Processing of
Handwriting Pitman’s Shorthand using Unigram and Heuristic Approaches, Published in
Lecture Notes in Computer Science: Document Analysis Systems VI, 3163, Springer-
Verlag, pp. 332-336, Proceedings of the IAPR workshop on document analysis systems,
University of Florence, Italy, 8-10 September 2004.
4. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, Segmentation and
recognition of phonetic features in handwritten Pitman shorthand, Pattern Recognition,
August 2004, Accepted and in press.
5. Swe Myo Htwe, Colin Higgins, Graham Leedham & Ma Yang, Evaluation of Feature
Sets in the Post Processing of Handwritten Pitman’s Shorthand, Proceedings of the 9th
International Workshop on Frontiers in Handwriting Recognition, ISBN 0-7695-2187-8, pp.
359-364, Kokubunji, Tokyo, Japan, 26-29 October 2004.
6. Swe Myo Htwe, Colin Higgins & Graham Leedham, Post-processing of handwritten
Phonetic Pitman’s Shorthand using a Bayesian Network built on geometric attributes, In
Pattern Recognition and Image Analysis, Lecture Notes in Computer Science 3687,
Springer, Sameer Singh, Maneesha Singh, Chid Apte, Petra Perner (Eds.), pp. 569-579,
2005.
7. Swe Myo Htwe, Colin Higgins & Graham Leedham, Transliteration of online
handwritten phonetic Pitman’s Shorthand with the use of a Bayesian Network, Proceedings
of the 8th International Conference on Document Analysis and Recognition, Vol. 2, pp.
1090-1094, Seoul, Korea, 29 August - 1 September 2005.
8. Ma Yang, Graham Leedham, Colin Higgins & Swe Myo Htwe, On-line recognition of
Pitman’s Shorthand for fast mobile text entry, Proceedings of the 3rd IEEE International
Conference on Information Technology and Applications, pp. 686-691, Sydney, Australia,
4-7 July 2005.
Chapter 9 Appendix A
Description of the 46 Rules Applied to the Automatic Generation of
a Machine-readable Pitman’s Shorthand Lexicon
The 1st Rule
Complexity: direct conversion
Objective: to verify if an input (i.e., a set of phonemes) corresponds to a vocalised outline,
containing both consonants and vowels
Strategy: check whether the input contains any consonants; if it does, the input is passed to
the 2nd rule; otherwise the program discards the current input and fetches the next one.
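The strategy above can be sketched as a simple filter. This is a minimal Python sketch, assuming the input is a list of phoneme strings; the vowel symbols are illustrative, not the thesis’s actual encoding.

```python
# Rule 1 sketch: accept an input only if its phoneme list contains at
# least one consonant; vowel-only inputs are skipped.
VOWELS = {"A", "E", "I", "O", "U", "AH", "AW", "EE", "OO", "OW", "OY"}

def has_consonant(phonemes):
    """Return True if any phoneme is not a vowel symbol."""
    return any(p not in VOWELS for p in phonemes)

def filter_outlines(inputs):
    """Keep inputs that qualify for the 2nd rule; drop the rest."""
    return [p for p in inputs if has_consonant(p)]
```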
The 2nd Rule
Complexity: indirect conversion.
Objective: to convert a combination of /Y/, /CH/, /JH/ or /ZH/ and a vowel into a diphthong
symbol ( ). To clarify the objective, consider the word “refuse” (/R Ĭ F Y
U Z/): instead of the consonant /Y/ being written as , it is combined with the adjacent vowel
/U/ and written as a diphthong (Figure 9.1).
Strategy: convert a combination of /Y/, /ZH/, /JH/ or /CH/ and /U/ or /AH/ into a diphthong
feature .
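As a sketch of this merge, the following assumes phonemes arrive as a list of strings and writes the combined pair as a single illustrative diphthong token (not the thesis’s actual symbol):

```python
def apply_diphthong_rule(phonemes):
    """Merge /Y/, /ZH/, /JH/ or /CH/ followed by /U/ or /AH/ into a
    single diphthong feature (written here as 'DIPH-<pair>')."""
    out, i = [], 0
    while i < len(phonemes):
        if (phonemes[i] in {"Y", "ZH", "JH", "CH"}
                and i + 1 < len(phonemes)
                and phonemes[i + 1] in {"U", "AH"}):
            out.append("DIPH-" + phonemes[i] + phonemes[i + 1])
            i += 2  # consume both phonemes of the merged pair
        else:
            out.append(phonemes[i])
            i += 1
    return out
```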
[Figure content: Pitman’s Shorthand outline for the word “refuse”; reference notation /R/ /F/ /S/ /Y/ (diphthong); pronunciation /R Ĭ F Y U Z/.]
Figure 9.1: Illustration of the use of diphthong feature in a vocalised outline
The 3rd Rule
Complexity: indirect conversion
Objective: to convert the sounds CON and COM at the beginning of a word into a dot
primitive. A sample outline containing the sound COM at the beginning is illustrated in
Figure 9.2.
Strategy: if a word starts with the sound CON or COM, and if the sound CON or COM is
not followed by the sound ING, S, Z, T or D at the end of the word, convert the sound CON
or COM into a dot primitive.
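A minimal sketch of this strategy, assuming the prefix arrives as a single token and using an illustrative primitive name:

```python
def apply_con_com_rule(phonemes):
    """If the word starts with CON or COM and does not end in ING, S, Z,
    T or D, replace the leading CON/COM with a dot primitive."""
    if (phonemes[:1] and phonemes[0] in {"CON", "COM"}
            and phonemes[-1] not in {"ING", "S", "Z", "T", "D"}):
        return ["DOT"] + phonemes[1:]
    return phonemes
```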
Figure 9.2: Illustration of the use of a dot primitive for the sound COM at the beginning of a word
The 4th Rule
Complexity: direct conversion
Objective: to convert the sound WH of a word into a large hook. A sample Pitman’s
Shorthand outline containing the sound WH is illustrated in Figure 9.3.
Strategy:
1. First, save words containing the sound WH in a list. In the current system, the list
contains 321 words, which are extracted from a lexicon of 99,281 words.
2. Check whether the word representation of the input matches any element of the list.
3. If it does, the sound WH of the input is converted into a large hook; otherwise do
nothing.
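The list-lookup strategy above can be sketched as follows. In the thesis the list holds 321 words drawn from a 99,281-word lexicon; the three words and the primitive name below are illustrative only.

```python
WH_WORDS = {"where", "while", "whale"}  # illustrative subset of the 321-word list

def apply_wh_rule(word, phonemes):
    """If the word is in the WH list, replace its W-H onset with a
    large-hook primitive; otherwise leave the phonemes unchanged."""
    if word in WH_WORDS and phonemes[:2] == ["W", "H"]:
        return ["LARGE_HOOK_WH"] + phonemes[2:]
    return phonemes
```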
[Figure 9.2 content: Pitman’s outline for the word “commence”; reference notation /COM/ /M/ /NS/ /Ě/; pronunciation /K Ŏ M Ě N S/.]
Figure 9.3: Illustration of the use of WH hook in a vocalised outline
The 5th Rule
Complexity: indirect conversion
Objective: to convert the sound IL-, IM-, IN-, IR- or UN-, negative prefix of a word, into a
series of consonant and vowel primitives. A sample Pitman’s Shorthand outline containing
the prefix IR- is illustrated in Figure 9.4.
Strategy:
1. Save words containing the prefix IL-, IM-, IN-, IR- or UN- in a list.
2. Check whether the word representation of the input matches any element of the list.
3. If it does and the prefix is IL-, convert the sound IL- into an upward stroke L,
followed by a dot primitive and another upward stroke L.
4. If it does and the prefix is IM-, convert the sound IM- into a curve M, followed by a
dot primitive and another curve M.
5. If it does and the prefix is IR-, convert the sound IR- into a downward curve R,
followed by a dot primitive and another downward curve R.
6. If it does and the prefix is IN-, convert the sound IN- into a curve N, followed by a
dot primitive and another curve N.
7. If it does and the prefix is UN-, convert the sound UN- into a curve N, followed by a
dash primitive and another curve N.
[Figure 9.3 content: Pitman’s outline for the word “where”; reference notation /WH/ /R/ /Ā/; pronunciation /W H Ā R/.]
In addition, the 5th rule states that a consonant /D/ following the prefixes IN- and UN- may
not be omitted. This avoids a conflict with the ND writing rule of Pitman’s Shorthand, in
which the consonant /D/ following /N/ is omitted. Details of the ND rule are given in
Appendix B.
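The prefix expansions above amount to a table lookup. This sketch assumes each prefix is tokenised as a single phoneme; the primitive names are illustrative.

```python
# Rule 5 sketch: expand a negative prefix into a doubled stroke with a
# dot (or, for UN-, a dash) between the two strokes.
NEG_PREFIX = {
    "IL": ["UP_L", "DOT", "UP_L"],
    "IM": ["CURVE_M", "DOT", "CURVE_M"],
    "IR": ["DOWN_R", "DOT", "DOWN_R"],
    "IN": ["CURVE_N", "DOT", "CURVE_N"],
    "UN": ["CURVE_N", "DASH", "CURVE_N"],
}

def apply_negative_prefix(word, phonemes):
    """If the word carries a listed negative prefix, replace the prefix
    token with its primitive expansion."""
    for prefix, expansion in NEG_PREFIX.items():
        if word.upper().startswith(prefix) and phonemes[:1] == [prefix]:
            return expansion + phonemes[1:]
    return phonemes
```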
Figure 9.4: Illustration of the use of negative prefix IR- in a vocalised outline
The 6th Rule
Complexity: indirect conversion
Objective: to convert a pair of consonants PL, BL, TL, DL, CHL, JL, KL or GL at the
beginning, in the middle or at the end of a word into a series of a small hook L followed by a
corresponding consonant primitive. Note that the consonant L is written as an upward or
downward curve (instead of a hook) when it is not immediately following /P/, /B/, /T/, /D/,
/CH/, /J/, /K/ or /G/. A sample Pitman’s Shorthand outline containing the sound /P L/ at the
beginning of a word is illustrated in Figure 9.5.
Strategy:
1. If /N/ comes before /T L/ or /D L/, hook L is not used.
2. If /T/ or /D/ does not appear in the same syllable as /L/, hook L is not used.
3. Otherwise, replace the phonemes /P L/, /B L/, /T L/, /D L/, /CH L/, /J L/, /K L/ and
/G L/ of the input with a, b, c, d, e, f, g and h respectively, where
[Figure 9.4 content: Pitman’s Shorthand outline for the word “irregular”; reference notation /R/ /R/ /G/ /L/ /R/ with vowels /Ĭ/ /Ě/ /Ă/ /U/ marked at the start, middle and end; pronunciation /Ĭ R Ě G Y U L Ă/.]
a = hook + P stroke
b = hook + B stroke
c = hook + T stroke
d = hook + D stroke
e = hook + CH stroke
f = hook + J stroke
g = hook + K stroke
h = hook + G stroke.
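The mapping can be sketched as a table-driven scan; primitive names are illustrative, and the syllable test of step 2 is omitted here for brevity. The 7th rule’s large-hook pairs follow the same pattern with a different table.

```python
HOOK_L = {  # Rule 6: initial-hook consonant pairs -> small hook L + stroke
    ("P", "L"): ["HOOK_L", "STROKE_P"], ("B", "L"): ["HOOK_L", "STROKE_B"],
    ("T", "L"): ["HOOK_L", "STROKE_T"], ("D", "L"): ["HOOK_L", "STROKE_D"],
    ("CH", "L"): ["HOOK_L", "STROKE_CH"], ("J", "L"): ["HOOK_L", "STROKE_J"],
    ("K", "L"): ["HOOK_L", "STROKE_K"], ("G", "L"): ["HOOK_L", "STROKE_G"],
}

def apply_hook_l(phonemes):
    out, i = [], 0
    while i < len(phonemes) - 1:
        pair = (phonemes[i], phonemes[i + 1])
        # Step 1: no hook when /N/ immediately precedes /T L/ or /D L/.
        blocked = pair in {("T", "L"), ("D", "L")} and out[-1:] == ["N"]
        if pair in HOOK_L and not blocked:
            out.extend(HOOK_L[pair])
            i += 2
        else:
            out.append(phonemes[i])
            i += 1
    out.extend(phonemes[i:])
    return out
```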
Figure 9.5: Illustration of the use of PL hook in a vocalised outline
The 7th Rule
Complexity: indirect conversion
Objective: to convert a pair of consonants FL, VL, ThL, ML, NL, SHL at the beginning of a
word into a series of a large hook L, followed by a corresponding consonant primitive. A
sample Pitman’s Shorthand outline containing the sound /FL/ at the beginning of a word is
illustrated in Figure 9.6.
Strategy: replace phonemes /FL/, /VL/, /ThL/, /ML/, /NL/ and /SHL/ with a, b, c, d, e and f
respectively, where
a = large hook L + F stroke
b = large hook L + V stroke
c = large hook L + Th stroke
d = large hook L + M stroke
[Figure 9.5 content: Pitman’s Shorthand outline for the word “play”; reference notation /P/ /L/ /PL/ /Ā/; pronunciation /P L Ā/.]
e = large hook L + N stroke
f = large hook L + SH stroke
Figure 9.6: Illustration of the use of FL hook at the beginning of a vocalised outline
The 8th Rule
Complexity: indirect conversion
Objective: to convert a series of consonants SPR, STR, SKR at the beginning of a word into
a series of circle S followed by P, T or K stroke respectively. Note that a consonant R is
omitted in this case. A sample Pitman’s Shorthand outline containing the sound /SPR/ at the
beginning of a word is illustrated in Figure 9.7.
Strategy: replace phonemes /SPR/, /STR/ and /SKR/ at the beginning of input phonemes
with a, b and c respectively, where
a = circle S + stroke P
b = circle S + stroke T
c = circle S + stroke K
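This word-initial substitution can be sketched directly; the primitive names are illustrative, and the dropped /R/ follows the rule’s note.

```python
SPR_MAP = {  # Rule 8: initial SPR/STR/SKR -> circle S + stroke, /R/ omitted
    ("S", "P", "R"): ["CIRCLE_S", "STROKE_P"],
    ("S", "T", "R"): ["CIRCLE_S", "STROKE_T"],
    ("S", "K", "R"): ["CIRCLE_S", "STROKE_K"],
}

def apply_spr_rule(phonemes):
    """Replace an initial SPR/STR/SKR cluster with circle S plus the
    matching stroke."""
    head = tuple(phonemes[:3])
    if head in SPR_MAP:
        return SPR_MAP[head] + phonemes[3:]
    return phonemes
```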
[Figure 9.6 content: Pitman’s Shorthand outline for the word “flow”; reference notation /F/ /L/ /FL/ /Ō/; pronunciation /F L Ō/.]
[Figure 9.7 content: Pitman’s Shorthand outline for the word “spray”; reference notation /S/ /P/ /R/ /PR/ /SPR/ /Ā/; pronunciation /S P R Ā/.]
Figure 9.7: Illustration of the use of SPR stroke in a vocalised outline
The 9th Rule
Complexity: indirect conversion
Objective: to convert the sound /STER/ in the middle or at the end of a word into a large
loop. A sample Pitman’s Shorthand outline containing the sound /STER/ at the end of a
word is illustrated in Figure 9.8.
Strategy:
1. If the sound /STER/ appears at the beginning of a word, it is not converted into a
large loop;
2. If the sound /STER/ is followed by a consonant /N/ at the end of a word, it is not
converted into a large loop;
3. otherwise, replace the sound /STER/ of a word with a large loop.
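The three steps above can be sketched as follows, assuming /STER/ arrives as a single token and using an illustrative primitive name:

```python
def apply_ster_rule(phonemes):
    """Replace a medial or final /STER/ with a large loop, except when it
    starts the word or is followed by a final /N/ (steps 1-2)."""
    out = list(phonemes)
    for i, p in enumerate(out):
        if p != "STER":
            continue
        if i == 0:                   # step 1: word-initial STER is kept
            continue
        if out[i + 1:] == ["N"]:     # step 2: STER before a final /N/
            continue
        out[i] = "LARGE_LOOP"        # step 3: otherwise use the loop
    return out
```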
Figure 9.8: Illustration of the use of STER loop in a vocalised outline
The 10th Rule
Complexity: indirect conversion
Objective: to omit primitives of the sound CON, COM, CUM or COG in the middle of a
word. A sample Pitman’s Shorthand outline containing the sound /CON/ in the middle of a
word is illustrated in Figure 9.9.
Strategy:
1. If CON, COM, CUM or COG is the only sound of a word, it is not omitted;
[Figure 9.8 content: Pitman’s Shorthand outline for the word “master”; reference notation /M/ /S/ /T/ /R/ /STER/ /AH/; pronunciation /M AH S T Ă/.]
2. otherwise, omit primitives of the sound CON, COM, CUM or COG.
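The omission can be sketched as a positional filter, assuming each target sound is a single token:

```python
def apply_medial_con_rule(phonemes):
    """Omit CON/COM/CUM/COG in the middle of a word; keep it when it is
    the word's only sound (step 1)."""
    targets = {"CON", "COM", "CUM", "COG"}
    if len(phonemes) == 1:
        return phonemes
    # Keep targets only at the word edges; drop them in medial position.
    return [p for i, p in enumerate(phonemes)
            if not (p in targets and 0 < i < len(phonemes) - 1)]
```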
Figure 9.9: Illustration of omission of the sound CON in the middle of a vocalised outline
The 11th Rule
Complexity: indirect conversion
Objective: to convert the sound SES, SEZ, ZES or ZEZ at the end of a word into a large
circle. A sample Pitman’s Shorthand outline containing the sound SEZ at the end of a word
is illustrated in Figure 9.10.
Strategy: if a word ends with the sound SES, SEZ, ZES or ZEZ, replace the sound with
a large circle.
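As a sketch, assuming the final sound is tokenised as one phoneme and using an illustrative primitive name:

```python
def apply_ses_rule(phonemes):
    """Replace a final SES/SEZ/ZES/ZEZ sound with a large circle."""
    if phonemes and phonemes[-1] in {"SES", "SEZ", "ZES", "ZEZ"}:
        return phonemes[:-1] + ["LARGE_CIRCLE"]
    return phonemes
```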
Figure 9.10: Illustration of the use of SEZ circle in a vocalised outline
The 12th Rule
Complexity: indirect conversion
[Figure 9.9 content: Pitman’s Shorthand outline for the word “reconsider”; reference notation /R/ /K/ /N/ /S/ /D/ /Ē/ /Ĭ/; pronunciation /R Ē K Ŏ N S Ĭ D Ă/.]
[Figure 9.10 content: Pitman’s Shorthand outline for the word “bases”; reference notation /B/ /SEZ/ /Ā/; pronunciation /B Ā S Ē Z/.]
Objective: to convert the sound /ED/ that marks the past tense of a verb into a disjoined stroke T
or stroke D. A sample Pitman’s Shorthand outline containing a disjoined /ED/ at the end is
illustrated in Figure 9.11.
Strategy: if a word ends with /T/ or /D/ and a vowel comes before the /T/ or /D/, then
replace the final /vowel+T/ or /vowel+D/ with a T or D stroke respectively.
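A minimal sketch of this rule, with an illustrative vowel set and primitive names:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # illustrative vowel symbols

def apply_past_tense_rule(phonemes):
    """Replace a final vowel+T or vowel+D with a disjoined T or D stroke."""
    if (len(phonemes) >= 2 and phonemes[-1] in {"T", "D"}
            and phonemes[-2] in VOWELS):
        return phonemes[:-2] + ["DISJOINED_" + phonemes[-1]]
    return phonemes
```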
Figure 9.11: Illustration of the use of a disjoined /ED/ in a vocalised outline
The 13th Rule
Complexity: indirect conversion
Objective: to convert the sound /ST/ at the beginning, in the middle or at the end of a word
into a shallow loop. A sample Pitman’s Shorthand outline containing the sound /ST/ at the
beginning of a word is illustrated in Figure 9.12.
Strategy:
1. if a word begins or ends with a vowel, /ST/ at the beginning or at the end of the
word is not converted into a shallow loop;
2. if the sound /ST/ is immediately followed by the sound /SHUN/, it is not converted
into a shallow loop;
3. if /S/ and /T/ belong to two different syllables, /ST/ is not converted into a shallow
loop;
4. if /ST/ comes before /NTS/ or /NDS/ at the end of a word, it is not converted into a
shallow loop;
5. if /ST/ is followed by /R/, it is not converted into a shallow loop;
[Figure 9.11 content: Pitman’s Shorthand outline for the word “dated”; reference notation /D/ /T/ /Ā/ /ED/; pronunciation /D Ā T Ĭ D/.]
6. otherwise, replace /ST/ of an input with a shallow loop.
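The exception tests above can be sketched as a guarded scan. This assumes a list-of-tokens input with illustrative symbols; step 3 needs syllabification data and is omitted, and the reading of step 1 as “an /ST/ next to a word-edge vowel” is an interpretation, not the thesis’s exact formulation.

```python
def apply_st_loop(phonemes):
    """Replace an /S T/ pair with a shallow loop unless an exception
    of the 13th rule applies (syllable test of step 3 omitted)."""
    VOWELS = {"A", "E", "I", "O", "U"}  # illustrative vowel symbols
    n = len(phonemes)
    out, i = [], 0
    while i < n:
        if phonemes[i] == "S" and i + 1 < n and phonemes[i + 1] == "T":
            after = phonemes[i + 2:]
            blocked = (
                (i <= 1 and phonemes[0] in VOWELS)            # step 1, start
                or (len(after) <= 1 and phonemes[-1] in VOWELS)  # step 1, end
                or after[:1] == ["SHUN"]                      # step 2
                or after in (["NTS"], ["NDS"])                # step 4
                or after[:1] == ["R"]                         # step 5
            )
            if not blocked:
                out.append("SHALLOW_LOOP")
                i += 2
                continue
        out.append(phonemes[i])
        i += 1
    return out
```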
Figure 9.12: Illustration of the use of ST loop in a vocalised outline
The 14th Rule
Complexity: indirect conversion
Objective: to omit /T/ or /D/ at the end of one-syllable words. This relates to the half-length
writing rule of Pitman’s Shorthand, and a sample (one-syllable) half-length outline is
illustrated (Figure 9.13).
Strategy: if a word has one syllable and contains consonants other than just /R/ and /T/ or
/D/, then the /T/ or /D/ at the end of the word is omitted, provided that /T/ does not follow a
voiced consonant and /D/ does not follow an unvoiced consonant.
Figure 9.13: Illustration of a one syllable half-length outline
The 15th Rule
Complexity: indirect conversion
Objective: to convert the suffix ING into a dot primitive. A sample Pitman’s Shorthand
outline containing the suffix ING is illustrated in Figure 9.14.
Pitman’s Shorthand outline for the word “coat”
Reference
Pitman’s Shorthand notations
/K/ /T/ /Ō/
Pronunciation for the word “coat”
/K Ō T/
(Figure content) Pitman’s Shorthand outline for the word “stock”: notations /S/ /T/ /ST/ /K/ /Ŏ/; pronunciation /S T Ŏ K/.
Strategy: if an input ends with the sound ING, convert the sound ING into a dot primitive.
Figure 9.14: Illustration of the use of suffix ING in a vocalised outline
The 16th rule
Complexity: indirect conversion
Objective: to convert the suffix INGS into a dash primitive. A sample Pitman’s Shorthand
outline containing the suffix INGS is illustrated in Figure 9.15.
Strategy: if an input ends with the sound /INGS/, convert the sound /INGS/ into a dash.
Figure 9.15: Illustration of the use of the suffix INGS in a vocalised outline
The 17th rule
Complexity: indirect conversion
Objective: to convert the suffix -SHIP into a SH stroke. A sample Pitman’s Shorthand
outline containing the suffix SHIP is illustrated in Figure 9.16.
Strategy: if a word ends with the sound /SHIP/, then convert the sound /SHIP/ into a stroke
SH.
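Rules 15 to 17 share one shape: a fixed substitution of a trailing suffix sound by a primitive. A single table therefore covers all three in a sketch (the primitive labels here are illustrative only, not the thesis's internal encoding):

```python
# Suffix-to-primitive table for rules 15-17 (labels are illustrative).
SUFFIX_PRIMITIVES = {"ING": ["dot"], "INGS": ["dash"], "SHIP": ["stroke SH"]}

def apply_suffix_rules(phonemes):
    """Rules 15-17: rewrite a trailing suffix sound via the table above."""
    if phonemes and phonemes[-1] in SUFFIX_PRIMITIVES:
        return phonemes[:-1] + SUFFIX_PRIMITIVES[phonemes[-1]]
    return phonemes
```

For "coping" the trailing /ING/ becomes a dot primitive, as in Figure 9.14.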
(Figure content) Pitman’s Shorthand outline for the word “takings”: notations /T/ /K/ /Ā/ /INGZ/; pronunciation /T Ā K Ĭ NG Z/.
(Figure content) Pitman’s Shorthand outline for the word “coping”: notations /K/ /P/ /Ō/ /ING/; pronunciation /K Ō P Ĭ NG/.
Figure 9.16: Illustration of the use of the suffix SHIP in a vocalised outline
The 18th rule
Complexity: direct conversion
Objective: to convert the consonant /S/ or /Z/ into a downward stroke. A sample Pitman’s
Shorthand outline containing the stroke Z is illustrated in Figure 9.17.
Strategy:
1. if a vowel comes before /S/ at the beginning of input phonemes, convert /S/ at the
beginning into a stroke S.
2. If input phonemes contain /S+vowel/, /S+vowel+S/, /S+vowel+ past tense D/ or
/S+vowel+ING/ at the end, convert /S/ at the end into a stroke S.
3. If input phonemes contain /S+vowel+S/ or /S+vowel+Z/ at the beginning, convert
/S/ at the beginning into a stroke S.
4. If a word starts with /Z/, convert /Z/ into a stroke Z.
Figure 9.17: Illustration of the use of stroke Z in a vocalised outline
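The four conditions above can be sketched as a predicate over the phoneme list; the single-letter vowel set is a simplifying assumption of this sketch:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def s_or_z_as_stroke(ph):
    """Return True when the conditions above call for a full stroke S or Z
    rather than a circle; ph is a list of phoneme strings."""
    if ph[:1] == ["Z"]:
        return True                                   # condition 4
    if len(ph) >= 2 and ph[0] in VOWELS and ph[1] == "S":
        return True                                   # condition 1
    if len(ph) >= 3 and ph[0] == "S" and ph[1] in VOWELS and ph[2] in ("S", "Z"):
        return True                                   # condition 3
    if len(ph) >= 2 and ph[-2] == "S" and ph[-1] in VOWELS:
        return True                                   # condition 2: /S+vowel/ at end
    if len(ph) >= 3 and ph[-3] == "S" and ph[-2] in VOWELS and ph[-1] in ("S", "D", "ING"):
        return True                                   # condition 2: /S+vowel+.../ at end
    return False
```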
(Figure content) Pitman’s Shorthand outline for the word “busy”: notations /B/ /Z/ /Ĭ/ /Ĭ/; pronunciation /B Ĭ Z Ĭ/.
(Figure content) Pitman’s Shorthand outline for the word “scholarship”: notations /S/ /K/ /L/ /R/ /AW/ /Ă/ /SHIP/; pronunciation /S K AW L Ă R SH Ĭ P/.
The 19th rule
Complexity: indirect conversion
Objective: to convert the suffix –MENT into a stroke N. A sample Pitman’s Shorthand
outline containing the suffix –MENT is illustrated in Figure 9.18.
Strategy: if the sound /MENT/ appears at the end of a word and is preceded by a straight
upstroke, /N/, /ST/ or /S/, it is converted into stroke N.
Figure 9.18: Illustration of the use of the suffix –MENT in a vocalised outline
The 20th rule
Complexity: indirect conversion
Objective: to convert the suffix –MENTAL into a series of a stroke N followed by a
downward L. A sample Pitman’s Shorthand outline containing the suffix –MENTAL is
illustrated in Figure 9.19.
Strategy: if an input contains the sound /MENTAL/ at the end, convert the sound
/MENTAL/ into a combination of a stroke N and a downward L.
(Figure content) Pitman’s Shorthand outline for the word “experimental”: notations /P/ /S/ /P/ /R/ /MENTAL/ /Ě/ /Ě/; pronunciation /Ě P Ě R Ĭ M Ă N T Ě L/.
(Figure content) Pitman’s Shorthand outline for the word “apartment”: notations /P/ /RT/ /MENT/ /Ă/ /AH/; pronunciation /Ă P AH R T M Ă N T/.
Figure 9.19: Illustration of the use of the suffix –MENTAL in a vocalised outline
The 21st rule
Complexity: indirect conversion
Objective: to convert the suffix –MENTALLY into a series of stroke N followed by
downward L and a vowel Ē. In fact, primitive representations of the suffix –MENTAL and
the suffix –MENTALLY are very similar. Figure 9.20 illustrates a sample Pitman’s
Shorthand outline containing the suffix –MENTALLY.
Strategy: if input phonemes contain the sound /MENTALLY/ at the end, convert the sound
/MENTALLY/ into a series of a stroke N, followed by a downward L and vowel Ē.
Figure 9.20: Illustration of the use of the suffix –MENTALLY in a vocalised outline
The 22nd rule
Complexity: indirect conversion
Objective: to omit the syllables –TER, -DER, -THER and -TURE of a word according to the
double-length rule of Pitman’s Shorthand (description of the rule can be referenced in
appendix B). A sample Pitman’s Shorthand outline containing the syllable -TER is
illustrated in Figure 9.21.
Strategy: if an input contains the syllable /TER/, /DER/, /THER/ or /TURE/ in the middle or
at the end, and if the syllable is not surrounded by incompatible neighbouring primitives, the
syllable is removed from the input phonemes. Samples of incompatible neighbouring
primitives are given (Figure 9.22).
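One way to sketch this rule is to delegate the neighbour-compatibility test (the /F K/-style pairs of Figure 9.22) to a caller-supplied check, since it depends on the already-converted primitives around the syllable; the default that accepts every position is an assumption of this sketch:

```python
DOUBLING_SYLLABLES = ("TER", "DER", "THER", "TURE")

def apply_rule_22(phonemes, compatible=lambda i: True):
    """Remove the first doubling syllable found in the middle or at the end
    of the input, unless the compatibility check over neighbouring
    primitives rejects that position."""
    for i, p in enumerate(phonemes):
        if i > 0 and p in DOUBLING_SYLLABLES and compatible(i):
            return phonemes[:i] + phonemes[i + 1:]
    return phonemes
```

For "after" (/AH F TER/) the syllable /TER/ is removed, matching the double-length outline of Figure 9.21.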
(Figure content) Pitman’s Shorthand outline for the word “experimentally”: notations /P/ /S/ /P/ /R/ /MENTAL/ /Ě/ /Ě/; pronunciation /Ě P Ě R Ĭ M Ă N T Ě L/.
Figure 9.21: Illustration of the omission of the syllable –TER in a vocalised outline
Figure 9.22: Illustration of incompatible primitive pairs for doubling
The 23rd rule
Complexity: indirect conversion
Objective: to omit the consonant /D/, following the consonant /M/ or /N/ of a word
according to the MD and ND writing rules of Pitman’s Shorthand (description of the rules
can be referenced in appendix B). A sample Pitman’s Shorthand outline containing the
sound /MD/ is given in Figure 9.23.
Strategy:
1. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by the sound
/SHUN/, the consonant /D/ is not omitted;
2. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by the sound /N/,
/NS/ or /NT/ at the end of a word, the consonant /D/ is not omitted;
3. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by a vowel at the end
of a word, the consonant /D/ is not omitted;
4. if a series of consonants /M/ and /D/ or /N/ and /D/ is followed by vowel+/S/ or
vowel+/Z/ at the end of a word, the consonant /D/ is not omitted;
(Figure content) Primitive pairs that cannot be represented by doubling: /F K/, /V K/, /F G/, /V G/.
(Figure content) Pitman’s Shorthand outline for the word “after”: notations /FTER/ /F/ /Ă/; pronunciation /AH F T Ă R/.
5. otherwise, omit a consonant /D/, following /M/ or /N/.
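The five conditions can be sketched as an ordered test over the phonemes that follow the /D/; the simplified vowel set is an assumption of this sketch:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def omit_d_after_m_or_n(phonemes, i):
    """phonemes[i] is /M/ or /N/ and phonemes[i+1] is /D/; return True when
    the /D/ is omitted (condition 5), False when conditions 1-4 block it."""
    rest = phonemes[i + 2:]                  # phonemes after the /D/
    if rest[:1] == ["SHUN"]:
        return False                         # condition 1
    if rest in (["N"], ["N", "S"], ["N", "T"]):
        return False                         # condition 2 (word-final)
    if len(rest) == 1 and rest[0] in VOWELS:
        return False                         # condition 3 (word-final vowel)
    if len(rest) == 2 and rest[0] in VOWELS and rest[1] in ("S", "Z"):
        return False                         # condition 4
    return True                              # condition 5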
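The five conditions can be sketched as an ordered test over the phonemes that follow the /D/; the simplified vowel set is an assumption of this sketch:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def omit_d_after_m_or_n(phonemes, i):
    """phonemes[i] is /M/ or /N/ and phonemes[i+1] is /D/; return True when
    the /D/ is omitted (condition 5), False when conditions 1-4 block it."""
    rest = phonemes[i + 2:]                  # phonemes after the /D/
    if rest[:1] == ["SHUN"]:
        return False                         # condition 1
    if rest in (["N"], ["N", "S"], ["N", "T"]):
        return False                         # condition 2 (word-final)
    if len(rest) == 1 and rest[0] in VOWELS:
        return False                         # condition 3 (word-final vowel)
    if len(rest) == 2 and rest[0] in VOWELS and rest[1] in ("S", "Z"):
        return False                         # condition 4
    return True                              # condition 5
```

For "madam" (consonant skeleton M, D, M) the /D/ after /M/ is omitted, matching the halved stroke M of Figure 9.23.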
Figure 9.23: Illustration of the occurrence of the sound MD in a vocalised outline.
The 24th rule
Complexity: indirect conversion
Objective: to convert a pair of consonants FR, VR, Thr, THR, SHR, ZHR, MR or NR at the
beginning of a word into a series of a small hook followed by a corresponding consonant
primitive. A sample Pitman’s Shorthand outline containing the sound /FR/ at the beginning
of a word is illustrated in Figure 9.24.
Strategy: if an input contains the sound /FR/, /VR/, /Thr/, /THR/, /SHR/, /ZHR/, /MR/ or
/NR/ at the beginning, then replace phonemes /FR/, /VR/, /Thr/, /THR/, /SHR/, /ZHR/, /MR/
or /NR/ with a, b, c, d, e, f, g and h respectively, where
a = small hook + stroke F
b = small hook + stroke V
c = small hook + stroke Th
d = small hook + stroke TH
e = small hook + stroke SH
f = small hook + stroke ZH
g = small hook + stroke M
h = small hook + stroke N
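The a-h substitutions above form a fixed table, so a dictionary keeps them in one place in a sketch (the stroke names here are labels only, not the thesis's internal encoding):

```python
# Initial hooked-R substitution table for rule 24 (labels are illustrative).
INITIAL_HOOK_R = {
    "FR": ("small hook", "stroke F"),
    "VR": ("small hook", "stroke V"),
    "Thr": ("small hook", "stroke Th"),
    "THR": ("small hook", "stroke TH"),
    "SHR": ("small hook", "stroke SH"),
    "ZHR": ("small hook", "stroke ZH"),
    "MR": ("small hook", "stroke M"),
    "NR": ("small hook", "stroke N"),
}

def apply_rule_24(phonemes):
    """If the input begins with one of the hooked sounds, expand it into the
    small hook plus stroke pair; otherwise return the input unchanged."""
    if phonemes and phonemes[0] in INITIAL_HOOK_R:
        return list(INITIAL_HOOK_R[phonemes[0]]) + phonemes[1:]
    return phonemes
```

Rules 25 and 26 follow the same pattern with a sixteen-entry table and without the word-initial restriction.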
(Figure content) Pitman’s Shorthand outline for the word “madam”: notations /MD/ /M/ /Ă/ /Ă/; pronunciation /M Ă D Ă M/.
Figure 9.24: Illustration of the use of FR hook at the beginning of a vocalised outline
The 25th and 26th rules
Complexity: indirect conversion
Objective: to convert the syllable PR, BR, TR, DR, CHR, JR, KR, GR, FR, VR, Thr, THR,
SHR, ZHR, MR, NR at the beginning, in the middle or at the end of a word into a series of a
small hook followed by a corresponding consonant primitive. A sample Pitman’s Shorthand
outline containing the syllable PR is illustrated in Figure 9.25.
Strategy: replace the syllable /PR/, /BR/, /TR/, /DR/, /CHR/, /JR/, /KR/, /GR/, /FR/, /VR/,
/Thr/, /THR/, /SHR/, /ZHR/, /MR/ and /NR/ with a, b, c, d, e, f, g, h, i, j, k, l, m, n, o and p
respectively, where
a = small hook + stroke P
b = small hook + stroke B
c = small hook + stroke T
d = small hook + stroke D
e = small hook + stroke CH
f = small hook + stroke J
g = small hook + stroke K
h = small hook + stroke G
i = small hook + stroke F
j = small hook + stroke V
k = small hook + stroke Th
(Figure content) Pitman’s Shorthand outline for the word “free”: notations /F/ /R/ /FR/ /Ē/; pronunciation /F R Ē/.
l = small hook + stroke TH
m = small hook + stroke SH
n = small hook + stroke ZH
o = small hook + stroke M
p = small hook + stroke N
Figure 9.25: Illustration of occurrence of the syllable PR in a vocalised outline
The 27th rule
Complexity: indirect conversion
Objective: to omit the consonant /R/ in the sound /SKR/ or /SGR/. A sample Pitman’s
outline containing the sound /SKR/ is illustrated in Figure 9.26.
Strategy: if an input contains the sound /SKR/ or /SGR/, and if the sound /SGR/ is not at the
beginning of the input, then replace /SKR/ or /SGR/ with a or b respectively, where
a = circle S+ stroke K
b = circle S + stroke G
(Figure content) Pitman’s Shorthand outline for the word “describe”: notations /D/ /S/ /K/ /R/ /B/ /SKR/ /Ě/ /I/; pronunciation /D Ě S K R I B/.
(Figure content) Pitman’s Shorthand outline for the word “paper”: notations /P/ /R/ /PER/ /Ā/; pronunciation /P Ā P Ă R/.
Figure 9.26: Illustration of occurrence of the sound SKR in a vocalised outline
The 28th rule
Complexity: indirect conversion
Objective: to convert the sound /KW/ or /GW/ of a word into a series of a large hook
followed by a corresponding consonant primitive. A sample Pitman’s Shorthand outline
containing the sound /KW/ is illustrated in Figure 9.27.
Strategy: replace the sound /KW/ and /GW/ with a and b respectively, where
a = large hook + stroke K
b = large hook + stroke G
Figure 9.27: Illustration of occurrence of the sound KW in a vocalised outline
The 29th rule
Complexity: indirect conversion
Objective: to convert the syllable PL, BL, TL, DL, CHL, JL, KL or GL in the middle or at
the end of a word into a series of a small hook followed by a corresponding consonant
primitive.
Strategy: replace the syllable PL, BL, TL, DL, CHL, JL, KL, and GL with a, b, c, d, e, f, g
and h respectively, where
a = small hook + P stroke
b = small hook + B stroke
c = small hook + T stroke
(Figure content) Pitman’s Shorthand outline for the word “quick”: notations /K/ /W/ /KW/ /Ĭ/; pronunciation /K W Ĭ K/.
d = small hook + D stroke
e = small hook + CH stroke
f = small hook + J stroke
g = small hook + K stroke
h = small hook + G stroke
The 30th rule
Complexity: indirect conversion
Objective: to convert the syllable FL, VL, THL, ML, NL or SHL in the middle or at the end
of a word into a series of a large hook followed by a corresponding consonant primitive. A
sample outline containing the syllable FL is illustrated in Figure 9.28.
Strategy: replace the syllable FL, VL, THL, ML, NL and SHL with a, b, c, d, e and f
respectively, where
a = large hook + stroke F
b = large hook + stroke V
c = large hook + stroke TH
d = large hook + stroke M
e = large hook + stroke N
f = large hook + stroke SH
Figure 9.28: Illustration of the use of FL hook in a vocalised outline
(Figure content) Pitman’s Shorthand outline for the word “flow”: notations /F/ /L/ /F U L/ /Ĭ/; pronunciation /F U L F Ĭ L/.
The 31st rule
Complexity: indirect conversion
Objective: to convert a pair of consonants /S/ and /H/ into a circle S. A sample Pitman’s
Shorthand outline containing a series of /S/ and /H/ is illustrated in Figure 9.29.
Strategy: replace the sound /S H/ of input phonemes with a circle S.
Figure 9.29: Illustration of occurrence of /S/ followed by /H/ in a vocalised outline
The 32nd rule
Complexity: indirect conversion
Objective: to omit a small hook R when there is a series of S+vowel+hookR or
ST+vowel+hookR. A sample Pitman’s Shorthand outline containing a series of
S+vowel+hookR is illustrated in Figure 9.30.
Strategy: if an input contains /S+vowel+hookR/ or /ST+vowel+hookR/, omit the hook R.
Figure 9.30: Illustration of the occurrence of a series of S+vowel+hookR in a vocalised outline
(Figure content) Pitman’s Shorthand outline for the word “supper”: notations /S/ /P/ /R/ /PR/ /S PER/ /Ŭ/; pronunciation /S Ŭ P Ĭ R/.
(Figure content) Pitman’s Shorthand outline for the word “racehorse”: notations /R/ /H/ /S/ /R/ /Ā/ /AW/; pronunciation /R Ā S H S AW R S/.
The 33rd rule
Complexity: direct conversion
Objective: to convert the consonant /L/ into a downward stroke. A sample Pitman’s
Shorthand outline containing the consonant /L/ is illustrated in Figure 9.31.
Strategy:
1. if /L/ follows a hook N, it is not converted into a downward stroke.
2. if /L/ is not at the beginning of a word and follows the stroke /N/ or /NG/, it is
converted into a downward stroke.
Figure 9.31: Illustration of the use of a downward stroke L in a vocalised outline
The 34th rule
Complexity: indirect conversion
Objective: to convert the consonant /F/ or /V/ at the end of a word into a small hook. A
sample Pitman’s Shorthand outline containing a hook F is illustrated in Figure 9.32.
Strategy:
1. if /F/ or /V/ is the only consonant of an input, it is not converted into a small hook.
2. if /F/ or /V/ is followed by the sound -ING, -INGS, T, D, S or Z and if it is not
preceded by the consonant /L/, it is converted into a small hook.
(Figure content) Pitman’s Shorthand outline for the word “only”: notations /N/ /L/ (down) /Ō/ /Ē/; pronunciation /Ō N L Ē/.
Figure 9.32: Illustration of the use of hook F in a vocalised outline
The 35th rule
Complexity: indirect conversion
Objective: to convert a consonant /F/ or /V/ in the middle of a word into a small hook. A
sample Pitman’s Shorthand outline containing the hook F in the middle is illustrated in
Figure 9.33.
Strategy:
1. if /F/ or /V/ is at the beginning of an input, it is not converted into a small hook.
2. If /F/ or /V/ is in the middle of an input, and if neighbouring consonants of /F/ or /V/
are two straight downward strokes, or a combination of a straight downward stroke
and a curve (Th, S or Z), the consonant /F/ or /V/ is converted into a small hook.
Figure 9.33: Illustration of the use of hook V in the middle of a vocalised outline
The 36th rule
Complexity: indirect conversion
(Figure content) Pitman’s Shorthand outline for the word “divide”: notations /D/ /V/ /D V/ /D/ /Ĭ/ /I/; pronunciation /D Ĭ V I D/.
(Figure content) Pitman’s Shorthand outline for the word “rough”: notations /R/ /F/ /R F/ /Ŭ/; pronunciation /R Ŭ F/.
Objective: to convert the sound SHUN in the middle or at the end of a word into a small or
large hook. A sample Pitman’s Shorthand outline containing a large SHUN hook at the end
is illustrated in Figure 9.34.
Strategy:
1. if the sound SHUN appears at the beginning of a word, it is not converted into a
hook;
2. if the sound SHUN is preceded by a circle S or Z, it is converted into a small hook;
3. otherwise, the sound SHUN in the middle or at the end of an input is converted into
a large hook.
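The three cases reduce to a small function in a sketch; this assumes /SHUN/ arrives as a single phoneme token, as in the notation above, and that circle S or Z context is approximated by the preceding phoneme being /S/ or /Z/:

```python
def shun_primitive(phonemes, i):
    """Return the primitive for /SHUN/ at index i, or None if it is not
    converted into a hook. Implements conditions 1-3 above."""
    if i == 0:
        return None                     # condition 1: initial SHUN is not hooked
    if phonemes[i - 1] in ("S", "Z"):
        return "small hook"             # condition 2: after circle S or Z
    return "large hook"                 # condition 3: middle or final SHUN
```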
Figure 9.34: Illustration of the use of a large SHUN hook in a vocalised outline
The 37th rule
Complexity: indirect conversion
Objective: to convert the consonant /N/ of a word either into a small hook or a circle. A
sample Pitman’s outline containing a hook N at the end is illustrated in Figure 9.35.
Strategy:
1. if /N/ is at the beginning of an input, it is not converted into a small hook.
2. If /N S/ appears at the end of an input and is preceded by a curve stroke, the
consonant /N/ is not converted into a small hook.
3. If /N/ is immediately following /S/ or /Z/, it is not converted into a small hook.
4. If /N/ is at the end of an input, it is converted into a small hook.
(Figure content) Pitman’s Shorthand outline for the word “attention”: notations /T/ /N/ /SHUN/ /Ă/ /Ě/; pronunciation /Ă T Ĕ N SH Ĭ N/.
5. If /N Z/ appears at the end of an input and is preceded by a curve stroke, the sound
/N Z/ is converted into a series of a small hook followed by a small circle.
6. If /N Z/ or /N S/ appears at the end of an input and is preceded by a straight stroke,
the sound /N Z/ or /N S/ is converted into a small circle.
7. If the sound /N SES/ or /N ZES/ appears at the end of an input and is preceded by a
straight stroke, the sound /N SES/ or /N ZES/ is converted into a large circle.
8. If the sound /N STER/ or /N ST/ appears at the end of an input and is preceded by a
straight stroke, the sound /N STER/ or /N ST/ is converted into a large loop or small
loop respectively.
9. If the sound /N T S/ or /N D S/ appears at the end of an input and is preceded by a
straight stroke, the sound /N T S/ or /N D Z/ is converted into a small circle.
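The nine conditions above can be sketched as a single mapping from the word-final /N .../ cluster to a primitive sequence; whether the preceding stroke is a curve or a straight stroke is taken as a boolean input, since computing it from the outline is outside this sketch:

```python
def n_ending_primitive(phonemes, i, preceded_by_curve):
    """Map a word-final cluster starting at the /N/ at index i to primitives,
    following conditions 1-9 above; None means /N/ keeps its stroke form."""
    tail = phonemes[i:]
    if i == 0:
        return None                                        # condition 1
    if phonemes[i - 1] in ("S", "Z"):
        return None                                        # condition 3
    if tail == ["N"]:
        return ["small hook"]                              # condition 4
    if tail in (["N", "S"], ["N", "Z"]):
        if preceded_by_curve:
            # condition 2: curve + final /N S/ keeps stroke N;
            # condition 5: curve + final /N Z/ becomes hook + circle.
            return ["small hook", "small circle"] if tail[1] == "Z" else None
        return ["small circle"]                            # condition 6
    if not preceded_by_curve:
        if tail in (["N", "SES"], ["N", "ZES"]):
            return ["large circle"]                        # condition 7
        if tail == ["N", "STER"]:
            return ["large loop"]                          # condition 8
        if tail == ["N", "ST"]:
            return ["small loop"]                          # condition 8
        if tail in (["N", "T", "S"], ["N", "D", "S"]):
            return ["small circle"]                        # condition 9
    return None
```

For "alone" the final /N/ becomes a small hook, as in Figure 9.35.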
Figure 9.35: Illustration of the use of hook N at the end of a vocalised outline
The 38th rule
Complexity: direct conversion
Objective: to convert the consonant /L/ of a word into an upward stroke. A sample Pitman’s
Shorthand outline containing the consonant /L/ is illustrated in Figure 9.36.
Strategy: replace consonant /L/ with an upward stroke L.
(Figure content) Pitman’s Shorthand outline for the word “alone”: notations /L/ /N/ /N/ /Ă/ /Ō/; pronunciation /Ă L Ō N/.
Figure 9.36: Illustration of the use of an upward stroke L in a vocalised outline
The 39th rule
Complexity: indirect conversion
Objective: to omit the consonant /D/ or /T/ in a word of two or more syllables according to
the half-length rule of Pitman’s Shorthand (description of the rule can be referenced in
appendix B). A sample Pitman’s Shorthand outline containing the omission of /D/ and /T/ is
illustrated in Figure 9.37.
Strategy:
1. if neighbouring consonants of /T/ or /D/ are incompatible, the consonant /T/ or /D/ is
not omitted. A list of incompatible neighbouring consonants in relation to the
omission of /T/ or /D/ is given (Figure 9.38).
2. otherwise, /T/ or /D/ is omitted.
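Condition 1's incompatible series (listed in Figure 9.38) can be written as sets of alternatives per position and matched against the tail of the consonant sequence:

```python
# Incompatible neighbour series from Figure 9.38, one set of alternatives
# per position (e.g. F/V + K/G + T/D).
INCOMPATIBLE = [
    [{"F", "V"}, {"K", "G"}, {"T", "D"}],
    [{"K", "G"}, {"K", "G"}, {"T", "D"}],
    [{"L"}, {"K", "G"}, {"T", "D"}],
    [{"R"}, {"T", "D"}],
    [{"S"}, {"T"}],
]

def halving_allowed(consonants):
    """True if the trailing /T/ or /D/ may be omitted under the half-length
    rule, i.e. the tail of the sequence matches no incompatible series."""
    for series in INCOMPATIBLE:
        tail = consonants[-len(series):]
        if len(tail) == len(series) and all(c in s for c, s in zip(tail, series)):
            return False
    return True
```

For "deduct" the final K, T tail matches no series, so the /T/ is omitted, as in Figure 9.37.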
Figure 9.37: Illustration of omission of T or D in a vocalised outline
(Figure content) Pitman’s Shorthand outline for the word “deduct”: notations /D/ /D D/ /K/ /K T/ /Ĕ/ /Ŭ/; pronunciation /D Ĕ D Ŭ K T/.
(Figure content) Pitman’s Shorthand outline for the word “mail”: notations /M/ /L/ /Ā/; pronunciation /M Ā L/.
Figure 9.38: Illustration of incompatible combination of primitives for halving
The 40th rule
Complexity: indirect conversion
Objective: to convert the suffix –LY into a vowel primitive according to the –LY rule of
Pitman’s Shorthand (description of the rule can be referenced in appendix B). A sample
Pitman’s Shorthand outline containing the suffix –LY is illustrated in Figure 9.39.
Strategy: if the input contains the suffix –LY, replace the suffix –LY with a vowel Ĭ.
Figure 9.39: Illustration of the use of suffix –LY in a vocalised outline
The 41st rule
Complexity: direct conversion
Objective: to convert the consonant /R/ into an upward or downward stroke. Sample
Pitman’s Shorthand outlines containing an upward or downward R are illustrated in Figure
9.40.
Strategy:
(Figure content) Pitman’s Shorthand outline for the word “solely”: notations /S/ /L/ /Ō/ /Ĭ/; pronunciation /S Ō L L Ĭ/.
(Figure content) Series of primitives that cannot be represented by halving: F/V + K/G + T/D; K/G + K/G + T/D; L + K/G + T/D; R + T/D; S + T.
1. if /R/ appears at the beginning of an input, it is converted into an upward stroke.
2. If /R/ is followed by a sounded vowel at the end of an input, it is converted into an
upward stroke.
3. If /R/ appears at the end of an input, it is converted into a downward stroke.
4. If /R/ is preceded by a vowel at the beginning of an input, it is converted into a
downward stroke.
5. If /R/ is followed by a circle S or SES at the end of an input, it is converted into a
downward R.
6. If /R/ is followed by /M/, it is converted into a downward stroke.
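The six conditions can be sketched as an ordered test; the simplified vowel set is an assumption, and circle S or SES context is approximated here by the phoneme tokens /S/ and /SES/:

```python
VOWELS = {"A", "E", "I", "O", "U"}  # simplified vowel set (assumption)

def r_direction(phonemes, i):
    """Choose 'up' or 'down' for /R/ at index i, trying conditions 1-6 in the
    order listed above; the final default is an assumption of this sketch."""
    before, after = phonemes[:i], phonemes[i + 1:]
    if not before:
        return "up"                               # condition 1: word-initial R
    if len(after) == 1 and after[0] in VOWELS:
        return "up"                               # condition 2: final sounded vowel
    if not after:
        return "down"                             # condition 3: word-final R
    if i == 1 and before[0] in VOWELS:
        return "down"                             # condition 4: vowel + R at start
    if after in (["S"], ["SES"]):
        return "down"                             # condition 5: circle S / SES
    if after[:1] == ["M"]:
        return "down"                             # condition 6
    return "up"                                   # default (assumption)
```

For the Figure 9.40 examples, "rail" takes the upward stroke (condition 1) and "erase" the downward stroke (condition 4).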
Figure 9.40: Illustration of the use of an upward or downward R in vocalised outlines
The 42nd rule
Complexity: indirect conversion
Objective: to convert the consonant /H/ at the beginning of a word into a dash primitive. A
sample Pitman’s Shorthand outline containing a dash H at the beginning is illustrated in
Figure 9.41.
Strategy: if /H/ appears at the beginning of an input and is followed by /M/, /L/ or downward
R, it is converted into a dash primitive.
(Figure content) Pitman’s Shorthand outlines for the words “rail” and “erase”: notations /R/ /L/ /R/ /S/ /Ĕ/ /Ā/; pronunciations /R Ā L/, /Ĕ R Ā S/.
Figure 9.41: Illustration of the use of a dash H in a vocalised outline
The 43rd rule
Complexity: indirect conversion
Objective: to reverse an orientation of initially hooked FR, VR, Thr or THR according to the
“reverse rule” of Pitman’s Shorthand (description of the rule can be referenced in appendix
B). A sample Pitman’s Shorthand outline containing a reversed VR hook is illustrated in
Figure 9.42.
Strategy: if a series of /small_hook + F/, /small_hook + V/, /small_hook + Thr/ or
/small_hook + THR/ is followed by an upstroke or a horizontal stroke, it is converted into
a series of /small hook + R/, /small hook + R/, /small hook + stroke S/ or
/small hook + stroke Z/ respectively.
Figure 9.42: Illustration of the use of reverse VR hook in a vocalised outline
The 44th rule
Complexity: direct conversion
(Figure content) Pitman’s Shorthand outline for the word “cover”: notations /K/ /V R/ /V R/ (reversed) /Ŭ/; pronunciation /K Ŭ V Ĭ R/.
(Figure content) Pitman’s Shorthand outline for the word “home”: notations /H/ /H/ /M/ /Ō/; pronunciation /H Ō M/.
Objective: to convert consonants that have not been converted into geometric features into
their corresponding primitives.
Strategy:
1. replace the consonant /P/ with stroke P
2. replace the consonant /B/ with stroke B
3. replace the consonant /T/ with stroke T
4. replace the consonant /D/ with stroke D
5. replace the consonant /K/ with stroke K
6. replace the consonant /G/ with stroke G
7. replace the consonant /M/ with stroke M
8. replace the consonant /N/ with stroke N
9. replace the consonant /NG/ with stroke NG
10. replace the consonant /F/ with stroke F
11. replace the consonant /V/ with stroke V
12. replace the consonant /Th/ with stroke Th
13. replace the consonant /TH/ with stroke TH
14. replace the consonant /W/ with a series of small hook followed by upward R
15. replace the consonant /Y/ with a series of small hook followed by upward R
16. replace the consonant /CH/ with stroke CH
17. replace the consonant /JH/ with stroke JH
18. replace the consonant /SH/ with stroke SH
19. replace the consonant /S/ with a small circle
20. replace the consonant /Z/ with a small circle
21. replace the consonant /ZH/ with a stroke ZH
22. replace the consonant /H/ with a series of small circle, followed by stroke R
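This default mapping is again a fixed table; the following sketch transcribes the twenty-two substitutions above into a dictionary and applies it to whatever consonants the earlier rules left unconverted (the primitive labels are illustrative, not the thesis's internal encoding):

```python
# Default consonant-to-primitive table for rule 44 (labels are illustrative).
DEFAULT_PRIMITIVES = {
    "P": "stroke P", "B": "stroke B", "T": "stroke T", "D": "stroke D",
    "K": "stroke K", "G": "stroke G", "M": "stroke M", "N": "stroke N",
    "NG": "stroke NG", "F": "stroke F", "V": "stroke V", "Th": "stroke Th",
    "TH": "stroke TH", "CH": "stroke CH", "JH": "stroke JH", "SH": "stroke SH",
    "ZH": "stroke ZH", "S": "small circle", "Z": "small circle",
    "W": "small hook + upward R", "Y": "small hook + upward R",
    "H": "small circle + stroke R",
}

def apply_rule_44(consonants):
    """Replace every remaining consonant with its default primitive; anything
    already converted by rules 13-43 is passed through unchanged."""
    return [DEFAULT_PRIMITIVES.get(c, c) for c in consonants]
```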
The 45th rule: extract vowels of a word and append them to the end of consonant primitives.
The 46th rule: convert vowels into their corresponding geometric primitives.
Appendix B
Certain Rules of Pitman’s Shorthand Mentioned in the Discussion
on the Automatic Generation of a Machine-readable Pitman’s
Shorthand Lexicon
Rule’s name and description of the rule as stated in Pitman’s Shorthand [Oj95]:

MD, ND: Strokes M and N are halved and thickened to add the following sound of D.
Double-length strokes: All curved strokes are doubled in length to represent the addition of the syllables –TER, –DER, –THER and –TURE.
Half-length strokes: In words of two or more syllables a stroke is generally halved to indicate the following sound T or D.
Suffix –LY: The suffix –LY is represented by upward L and the third-place Ĭ vowel.
Reversed FR, VR, Thr, THR: The initially hooked FR, VR, Thr and THR are always reversed when immediately following upstrokes and horizontals.