
Pentelligence: Combining Pen Tip Motion and Writing Sounds for Handwritten Digit Recognition

Maximilian Schrapel, University of Hannover, Human-Computer Interaction, Hannover, Germany, [email protected]

Max-Ludwig Stadler, University of Hannover, Human-Computer Interaction, Hannover, Germany

Michael Rohs, University of Hannover, Human-Computer Interaction, Hannover, Germany, [email protected]

ABSTRACT
Digital pens emit ink on paper and digitize handwriting. The range of the pen is typically limited to a special writing surface on which the pen's tip is tracked. We present Pentelligence, a pen for handwritten digit recognition that operates on regular paper and does not require a separate tracking device. It senses the pen tip's motions and sound emissions when stroking. Pen motions and writing sounds exhibit complementary properties. Combining both types of sensor data substantially improves the recognition rate. Hilbert envelopes of the writing sounds and mean-filtered motion data are fed to neural networks for majority voting. The results on a dataset of 9408 handwritten digits taken from 26 individuals show that motion+sound outperforms single-sensor approaches at an accuracy of 78.4% for 10 test users. Retraining the networks for a single writer on a dataset of 2120 samples increased the precision to 100% for single handwritten digits at an overall accuracy of 98.3%.

ACM Classification Keywords
H.5.2. Information Interfaces and Presentation: User Interfaces – input devices and strategies.

Author Keywords
Digital pen; handwriting recognition; digit recognition; sound emissions; writing sound; writing motion; neural networks

INTRODUCTION
Traditional input methods like software keyboards are not suitable for small devices such as smart watches. An issue of direct touch input is the "fat finger" problem [30], i.e., the occlusion of the target area under the finger. For precise input such as writing or drawing, capacitive touch pens and inductive stylus pens help to overcome some of these issues [7]. In note-taking tasks users often still prefer analog documents [27]. This indicates a gap between the analog and the digital world. Digital or smart pens that operate on paper have the potential to bridge both worlds. There are various products and technical approaches on the market.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI 2018, April 21–26, 2018, Montréal, QC, Canada.
© 2018 ACM ISBN 978-1-4503-5620-6/18/04 ...$15.00.
https://doi.org/10.1145/3173574.3173705

Figure 1. Handheld Pentelligence prototype while writing. The housing was removed for this picture to show the position of the inner hardware.

The Livescribe 3 pen by Anoto [1] uses paper with a micro dot pattern and an infrared camera to detect strokes. Based on ultrasound and infrared light, the Inkling pen by Wacom [10] can operate together with a receiver box on any A4 paper sheet. Both systems can digitize handwriting and sketches. Digital pens are useful in many application scenarios. For instance, students can easily share and recall their notes from a lecture [19], or patient charts can be directly digitized in hospital work [31]. But the need for special paper (as with the Anoto pen) or constraints in writing space with receiver boxes (as with the Wacom Inkling pen) can be problematic in some situations.

In this paper we present Pentelligence, a novel digital pen which uses a microphone and an inertial measurement unit to capture audio and motion data for handwritten digit recognition. Our prototype does not differ substantially from ordinary pens in appearance, weight, and size. It combines the strengths of audio and motion data to achieve high recognition rates on our dataset of about 9400 digit samples. The dataset was taken from 26 individuals and classified by deep neural networks with majority voting.

Up to now, research has focused on approaches with sensors such as cameras, motion, or audio for handwriting recognition, but to the best of our knowledge we are the first to unite audio and motion on pens. We show that a combination of motion and audio can achieve better results on digits than single-sensor


Figure 2. Detailed view of the digital pen prototype in original size. The USB connector of the cable extends the prototype. 1 = micro-USB jack, 2 = USB/UART converter, 3 = microcontroller, 4 = microphone with amplifier, 5 = inertial measurement unit, 6 = write sensor.

approaches. Furthermore, the recognition is not limited to a special writing surface or to predefined single- or multi-stroke trajectories, which is important for usability given the individual writing style of every human. We demonstrate that a high accuracy for a target user can be reached when classifiers are retrained with samples of this user. We also investigated the acceptability of the pen prototype and point out avenues for future research.

RELATED WORK
There has been extensive work in the field of handwriting recognition and gesture input for over five decades [15, 18]. There are several camera-based approaches, which are sensitive to lighting conditions [3, 8, 14, 20, 21, 22]. Our contribution is related to combining the strengths of motion and sound emissions from pens to achieve high recognition rates. Hence this section is divided into motion and audio data for handwriting and pattern recognition.

Motion Data for Handwriting Recognition
Capturing the motion of a pen tip is a well-explored topic [5, 6, 11, 29]. Applying inertial sensors for handwriting and gesture detection does not require an external reference [5]. For this purpose various methods and classifiers have been evaluated. Typically, characters are modeled as trajectories. For instance, Choi et al. [11] used a triaxial accelerometer with principal component analysis (PCA) to reduce the feature vector size and computation time for a hidden Markov model (HMM) classifier. They also applied the dynamic time warping algorithm (DTW) to distinguish the Arabic numerals '0' and '6' despite their similarity. They reached 100% and 90.8% recognition rates for writer-dependent and writer-independent data, respectively.

Wang and Chuang [29] proposed a trajectory recognition algorithm for an accelerometer-based pen. Time- and frequency-domain features were extracted from an accelerometer and reduced by kernel-based class separability (KBCS) and linear discriminant analysis (LDA). A probabilistic neural network (PNN) on their data set achieved an overall recognition rate of 98% for handwritten digits and 98.75% on predefined gestures.

Bashir and Kempf [6] presented a novel pen device with an inertial measurement unit combined with pressure sensors for password entry. The data set contained ten different letters, numbers, and symbols. DTW was applied to each input channel and to the combination of all input channels. They showed that the combination of input channels achieves recognition rates of over 99% with a greatly reduced response time of less than 500 ms. Hence, sensing motion and pressure promises good results for handwriting recognition.

Acoustic Data for Handwriting Recognition
Pen or finger strokes provide a very rich basis for acoustic features that can be applied to handwriting or gesture recognition. Harrison and Hudson [12] used a stethoscope to detect distinct multi-part gestures – composed of taps, lines, and circles – on different surfaces. Hwang et al. [13] presented a low-cost pressure-estimating touch pen, which captures the audio generated by the pen tip when stroking a touch screen. For this purpose the audio data was filtered and analyzed in the frequency domain.

Li [16] proposed to use envelopes of audio signatures for handwriting authentication. He demonstrated that Hilbert envelopes of writing sounds, which are caused by the friction between the pen tip and paper, achieve a recognition rate of over 75% with a straightforward multi-layer back-propagation neural network. Li and Hammond [17] applied mean amplitude and Mel-frequency cepstral coefficient (MFCC) features for template matching with DTW. They achieved an accuracy of 80% for 26 English characters, if constrained to a particular drawing order.

Seniuk and Blostein [25] focused on pen acoustic emissions by taping a microphone to the midpoint of the pen shaft. Two data sets consisting of 26 words and the cursive lowercase alphabet from a single writer were recorded. Three classification algorithms were evaluated: signal subtraction, peak comparison, and scale-space representation. Their preliminary results show that classification reaches 95% on words with peak comparison and 70% on letters with signal subtraction.

In contrast to related work, we apply both sound and motion data from pens to combine the strengths of each sensor for digit classification. To recognize the different writing styles of each digit, majority-voting neural networks are trained on a dataset recorded from 21 people and then individualized on samples of a single target user to achieve high accuracies. We focus on digits, as a misclassification cannot be subsequently corrected in the same way as words.

The rest of the paper is structured as follows. First we present the hardware of our prototype. Then we show how the sensor data is preprocessed for the classifiers. Furthermore, we introduce the topology of the neural networks and describe the dataset we captured. Subsequently, recognition results of different classifiers are presented, both on all test users and individualized on a single writer. In addition, a questionnaire assesses the usability of our prototype. Finally, we discuss the results and limitations and point out avenues for future research.


[Figure 3 shows the sensors (motion, sound, write) feeding the microcontroller (sampling & framing), followed by USB/UART transmission.]

Figure 3. Simplified block diagram of the hardware prototype.

HARDWARE PROTOTYPE
Pentelligence is a lightweight digital pen with nearly the same look and feel as a regular ball pen. Figure 2 shows a detailed overview of the Pentelligence prototype. Thanks to miniaturized integrated circuits, the pen is 12 cm long and 12 mm in diameter and weighs only 11 g. The 1 mm thick housing is printed on a Lulzbot TAZ 3D printer in ABS material. The ball point together with the ink cartridge and spring are taken from a conventional low-cost pen. The ATmega328p microcontroller gathers audio and motion data as well as the force exerted on the pen tip and sends them in frames to the computer via USB. The microcontroller runs at 16 MHz and contains an Arduino bootloader for easy programming.

Sensors
The segmentation of single strokes is done by detecting pressure on the pen tip. To this end a contact sensor is mounted on the spring to determine whether the pen tip touches the writing surface. Moreover, by analyzing the binary contact information first, computational overhead and unintended input can be avoided. The analog omnidirectional microphone MM34202 by DBUnlimited is placed directly on the printed circuit board to avoid scratching sounds from the ink cartridge on its housing. This MEMS microphone has a flat frequency response up to 16 kHz, a high signal-to-noise ratio (SNR) of 58 dB, and a sensitivity of -42 dB. The audio signal is amplified with an OPA344 rail-to-rail op-amp by Texas Instruments and measured with a 10-bit analog-to-digital converter. The inertial measurement unit BMI160 by Bosch Sensortec, consisting of an accelerometer and a gyroscope with 16-bit values on each axis, is placed as close to the pen tip as possible to optimally detect pen tip motion. All parts are surface-mounted and mainly developed for smartphones and wearables.

Communication
The data of all sensors is sent to a computer, which is connected via USB, for further processing. Wired communication was chosen for the prototype because of its reliability and simplicity. In the first iteration of the prototype we aimed to exclude potential problems with wireless transmissions. The FT232RL USB-UART converter by FTDI in our implementation achieves a data rate of 2 MBaud without errors. All sensor values are transmitted in a frame starting with an 8-bit sequence number for synchronization and frame loss detection at the receiver. In total each frame consists of 124 data bits. The data rate is 7100 frames per second, which corresponds to 880.4 kbps.
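As a minimal sketch of the receiver side, the 8-bit sequence number can be used to detect lost frames. The paper does not detail the payload layout beyond the sequence number, so the hypothetical helper below only tracks sequence continuity:

```python
# Sketch of frame-loss detection via the 8-bit sequence number that
# starts each frame. The remaining 116 payload bits are not parsed here.

def count_lost_frames(seq_numbers):
    """Count frames lost in a stream of received 8-bit sequence numbers."""
    lost = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        gap = (cur - prev) % 256  # the 8-bit counter wraps at 255
        lost += gap - 1           # a gap of 1 means no frame was lost
    return lost

print(count_lost_frames([253, 254, 255, 0, 2]))  # -> 1 (frame 1 was lost)
```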

SIGNAL PREPROCESSING
A universal asynchronous receiver-transmitter (UART) splits all data to be sent into single bytes with a start and stop bit [23]. Hence the incoming data stream first has to be synchronized to the baud rate, and all measured audio and motion values have to be re-assembled. These values are then stored in a buffer for further processing.

Write Sensor
Between individual strokes, either within or between symbols, there is a brief period during which the pen tip has no contact with the surface. The contact and no-contact states can be detected with the write sensor in order to segment the data stream into strokes. A preliminary test with two participants showed that 700 ms is an appropriate threshold for segmenting the sequence into digits. To exemplify the impact of sequence segmentation, the low-pass filtered audio stream of a handwritten '7' with middle stroke is analyzed in Figure 5. The red lines beside marker 5 represent the short period of approximately 100 ms when the pen tip is lifted to write the middle stroke. Moreover, when the pen tip is set down on a surface, a high audio peak can be observed (markers 1 and 6). This information could also be utilized for segmenting the sequence, but it is less reliable because grabbing a pen or putting it on the table would generate similar acoustic emissions.
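A minimal segmentation sketch in Python, assuming the binary write sensor delivers timestamped contact samples; the 700 ms pause threshold is taken from the preliminary test above, and the function name is hypothetical:

```python
PAUSE_MS = 700  # pause threshold between digits (preliminary test)

def segment_digits(samples):
    """Group pen-down timestamps (t_ms, contact) into digits;
    pen-up pauses longer than PAUSE_MS end the current digit."""
    digits, current, last_contact = [], [], None
    for t, contact in samples:
        if not contact:
            continue
        # A sufficiently long pen-up pause closes the current digit.
        if current and t - last_contact > PAUSE_MS:
            digits.append(current)
            current = []
        current.append(t)
        last_contact = t
    if current:
        digits.append(current)
    return digits
```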

[Figure 5 shows the amplitude (mV) of the audio sequence over time (ms) with markers 1–8, and its FFT with amplitude (dB) over frequency (Hz) up to the Nyquist frequency.]

Figure 5. Smoothed audio sequence of a handwritten digit '7' and related Hanning-windowed FFT recorded at 44.1 kHz sampling rate. 1 = sound emission peak of pen tip touching the surface, 2 = trajectory of top stroke, 3 = end of top stroke, 4 = stroke downwards, 5 = pen tip lifted and repositioned, 6 = sound emission peak like 1, 7 = trajectory of middle stroke, 8 = pen tip is lifted and digit is finished.


[Figure 4 shows the pipeline: data acquisition on the prototype (audio, motion, write), preprocessing (synchronization, sequence trimming, downsampling, Hilbert envelope, average filtering), and classification (neural networks with majority voting).]

Figure 4. Block diagram of signal processing, partitioned into data acquisition, preprocessing, and classification.

Audio Processing
Further analysis of the smoothed audio sequence shows that every stroke has characteristic acoustic features. At marker 3 in Figure 5 the pen tip is not lifted from the surface, but the stroke direction changes, which results in high signal peaks. During every stroke the acoustic emissions seem to be modulated by a resonance frequency of the prototype and surface. To determine the sampling rate required to acquire most of the acoustic information of handwritten digits, it is necessary to analyze the raw audio stream in the frequency domain.

For that purpose Figure 5 exemplifies the signal components by a fast Fourier transform (FFT) calculated with the audio editing tool Audacity and a Hanning window. It can be seen that the information of a handwritten digit '7' lies below 2 kHz. This frequency distribution is caused by the structure of normal paper and the speed of stroking the pen tip. Consequently, a sample rate of 7.1 kHz is sufficient to cover the frequencies that are relevant for classification.

Noise is another relevant aspect for pattern recognition with audio. Li [16] proposed to use the normalized Hilbert envelope of writing sounds as the feature space. We compared this approach to an FFT representation (with a Hamming window) and to the raw audio data of each symbol. In an initial evaluation we recorded 40 samples of each digit from a single writer. All recorded raw sound emissions of each digit were then compared to all others with FastDTW [24]. FastDTW provides a linear-time, accurate approximation of dynamic time warping. The procedure was repeated for Hilbert envelopes and FFT. The resulting distances were averaged and compared to each other by normalizing the highest distance to 100%.

Dissimilarity comparison (normalized)

Method      Result (%)
Hilbert     100
FFT         72
Raw audio   71

Table 1. Comparison of written digit dissimilarities from sound emissions. The FastDTW distances of all pairs of different symbols are averaged and normalized to the method with the highest distance. Higher percentages mean better distinguishability.

Higher distances correspond to a better discriminability between the digits, which implies higher accuracies for the neural network classifiers. Table 1 shows that the normalized Hilbert envelopes clearly outperform the FFT and raw audio approaches for a single writer. The recorded sound emissions contain noise related to the surface as well as other sources such as resonances of the prototype itself. Li [16] stated that the movement of a pen tip on paper imposes an amplitude modulation effect on carriers such as resonances of the prototype or characteristics of the surface. Hilbert envelopes help to overcome such issues by smoothing the signal curve. We aim to provide high recognition rates without special paper. Hence features related to the structure of the paper have to be removed in preprocessing.
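The dissimilarity comparison can be sketched as follows, assuming the fastdtw Python package as an implementation of FastDTW [24]; the pairing over samples is simplified compared to the per-symbol averaging described above:

```python
import numpy as np
from fastdtw import fastdtw          # linear-time DTW approximation [24]
from scipy.signal import hilbert

def envelope(signal):
    """Normalized Hilbert envelope of a raw audio signal."""
    env = np.abs(hilbert(signal))
    return env / env.max()

def mean_pairwise_distance(samples):
    """Average FastDTW distance over all pairs of samples."""
    dists = [fastdtw(a, b)[0]
             for i, a in enumerate(samples)
             for b in samples[i + 1:]]
    return float(np.mean(dists))
```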

After applying the Hilbert envelope to the incoming signal, it has to be stretched to the number of input neurons because of the varying input length. Moreover, the signal amplitude has to be normalized to the interval [0, 1].
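A sketch of this preparation step, assuming linear interpolation for the stretching (the method is not specified in the paper); the input size of 3500 corresponds to the audio network's input layer in Table 2:

```python
import numpy as np

def prepare_audio_features(env, n_inputs=3500):
    """Stretch a variable-length envelope to n_inputs values in [0, 1]."""
    x_old = np.linspace(0.0, 1.0, num=len(env))
    x_new = np.linspace(0.0, 1.0, num=n_inputs)
    stretched = np.interp(x_new, x_old, env)  # linear interpolation
    return stretched / stretched.max()        # normalize amplitude to [0, 1]
```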

Motion Data Processing
The accelerometer and gyroscope values of each axis are downsampled and averaged to 150 Hz. Due to the relatively slow motion of the pen tip, stretching the data to 150 input neurons is adequate for classification. For comparison, Krishnan et al. [15] used a sampling rate of 100 Hz and achieved recognition accuracies of over 86% for 5 gestures with an AdaBoost classifier. Wang and Chuang [29] achieved accuracies of 97% for digits on the 100 Hz data of their accelerometer-based pen with a probabilistic neural network classifier.
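A block-averaging sketch of this downsampling step; the raw input rate is assumed here to equal the 7100 Hz frame rate, which the paper does not state explicitly:

```python
import numpy as np

def downsample_mean(axis_values, rate_in=7100, rate_out=150):
    """Average consecutive blocks of samples to reduce the sampling rate."""
    block = rate_in // rate_out              # samples per output value
    n = (len(axis_values) // block) * block  # drop the ragged tail
    return np.asarray(axis_values[:n], dtype=float).reshape(-1, block).mean(axis=1)
```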

CLASSIFICATION
Handwriting is very complex and shows strong individual characteristics [32]. We also observed that single participants of our study wrote the same digits in more than one way. For instance, the digit '5' visualized in Figure 6 was often written in two styles by the same participants. Hence classification with template matching approaches may suffer under different trajectories and cannot detect both styles in a single step. Moreover, the growing number of symbol variations would rapidly increase the classification time. Another important aspect is the challenging feature selection for audio, since the microphone provides very noisy data. Motion can also be challenging due to rotation variance and the individual style of holding the pen. Thus we decided to investigate a neural network approach. Furthermore, this provides the opportunity to individualize for every user by retraining the classifier after collecting a few samples of each digit, without an extensive feature preselection. The only limitation is that the time between multiple strokes of a single digit must not exceed the threshold between digits.

Figure 6. Left: different observed writing styles of '5' written by a single person. Middle and right: typical trajectories of '1' and '9' written by different persons.

Dataset
We collected a dataset to capture digit samples in various writing styles. 23 male and 3 female volunteers (average 26.5 years, SD 8.0) were recruited, three of whom were left-handed.

The participants were instructed to repeatedly write the same digit on 21.0 cm × 29.7 cm white squared paper with a box size of 11 mm × 14 mm and 15 boxes per row. In sync with a soft click sound that occurred every 2 s, the participants wrote the same digit into a new box until two rows were collected. The study was conducted in a quiet environment and the click sounds were soft enough not to interfere with the pen's sound capturing. The digits from '0' to '9' were presented in different random orders to different participants in order to counterbalance order effects due to fatigue or frustration. It has been shown that frustration influences the pressure and speed at which a user moves the pen tip [4]. Furthermore, differences in pressure lead to different sound emissions on the surface [13].

The captured training set consists of 6169 handwritten samples from 21 of the 26 individuals, including two left-handed male and two female participants. For the test set, five right-handed male participants who had already conducted the study for the training data were asked to do the study again, and additionally one female and four male volunteers were recruited. In summary, the test set consists of 3239 handwritten digit samples. For the individualization of the classifiers, one right-handed male volunteer who had already given samples for the training and test sets conducted the study a third time. The individualization set contains 2120 samples with 212 samples of each digit, divided into 90 training and 122 test samples. At the end of the study every participant was handed a short questionnaire to give feedback on the usability of the pen.

Neural Network Configuration
Finding an optimal setup for deep neural networks is challenging, because many parameters have to be considered. We built classifiers for motion data with binary write sensor information, for audio alone and in combination with write sensor data, as well as for all sensors together in one neural network. Determining the optimal number of hidden layers and neurons for all classifiers is a very important aspect. If the complexity of the networks is too low to model the characteristics of the data set, the error rates will be high. If the complexity is too high, long training and recall times are the consequence. Hence many rules of thumb and optimization algorithms have been proposed over the last two decades [28]. Preliminary tests on different network topologies according to [28] with our dataset showed that a continuously decreasing number of hidden neurons (Table 2) achieves high recognition rates. Furthermore, the fourth hidden dense layer for audio features is relevant to model the characteristics of each digit.

Neural network topologies

Layer     Motion & Write   Audio   Audio & Write   All
Input     1050             3500    3650            4400
Hidden 1  1050             3500    3650            4400
Hidden 2  790              2802    2952            3442
Hidden 3  530              2104    2254            2484
Hidden 4  –                1406    1556            1406
Output    10               10      10              10

Table 2. Applied number of layers and neurons of the neural networks using motion with write information, audio alone and together with write information, and all sensors combined.

Besides a well-performing network topology (Table 2), overfitting of the classifier on the training set has to be considered. When the networks adapt too much to the training data set, they lose accuracy on unknown samples. To avoid overfitting, we apply dropout, a technique that temporarily removes a set of randomly selected neurons and their connections [26].
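For illustration, the audio network from Table 2 could be expressed in Keras [9] roughly as follows; activations, optimizer, and loss are not reported in the paper and are assumptions here:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_audio_net(dropout=0.25):
    """Audio network per Table 2: 3500-3500-2802-2104-1406-10."""
    model = Sequential()
    model.add(Dense(3500, activation='relu', input_dim=3500))  # hidden 1
    model.add(Dropout(dropout))
    model.add(Dense(2802, activation='relu'))                  # hidden 2
    model.add(Dropout(dropout))
    model.add(Dense(2104, activation='relu'))                  # hidden 3
    model.add(Dropout(dropout))
    model.add(Dense(1406, activation='relu'))                  # hidden 4
    model.add(Dropout(dropout))
    model.add(Dense(10, activation='softmax'))                 # ten digit classes
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```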

Achieving high generalization in one neural network is challenging due to the different writing styles observed in our dataset. The complexity of the optimal topology is possibly too high to be sufficiently trained given our limited set of training samples. Alpaydin proposed to train multiple neural networks independently and to apply a voting scheme, which increases generalization significantly [2]. Thus we examine majority-voting networks with the same topology and equal voting weight. For this purpose the training set is split into subsets of equal size, depending on the number of participating networks, and the accuracy is compared to single neural network classifiers.
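A sketch of equal-weight majority voting over such an ensemble, assuming each member returns per-class probabilities for a batch of samples:

```python
import numpy as np

def majority_vote(models, x):
    """Return the digit predicted by most ensemble members per sample."""
    votes = np.stack([m.predict(x).argmax(axis=1) for m in models])
    # Per sample, pick the most frequent predicted class among members.
    return np.array([np.bincount(v, minlength=10).argmax() for v in votes.T])
```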

Individualization
To achieve higher accuracies, the classifiers are optimized to the individual writing style of a target user by applying a second training step. The individualization dataset from a single writer is utilized to evaluate this approach. The neural networks previously trained on the training dataset, for audio as well as motion with binary write sensor data, are retrained, and the overall and single-digit accuracies are examined both alone and in combination of both majority-voting classifiers.
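A minimal sketch of this second training step, assuming Keras models pre-trained on the multi-writer set; the 50 epochs match the retraining reported in the Results section, while the batch size is an assumption:

```python
def individualize(models, x_user, y_user, epochs=50, batch_size=32):
    """Continue training each ensemble member on one writer's samples."""
    for m in models:
        # Weights learned on the multi-writer training set are the start point.
        m.fit(x_user, y_user, epochs=epochs, batch_size=batch_size, verbose=0)
    return models
```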

RESULTS
In this section we present the results from examining dropout rates on our dataset, followed by the accuracies of single and majority-voting neural networks based on Keras [9] and Python 2.7. Furthermore, the recognition rates of individualized classifiers for a single writer are given. Additionally, we show the results of a short questionnaire about the prototype.

[Figure 7 comprises three 10×10 relative confusion matrices (true digit vs. predicted digit, values in %), one each for the audio, motion & write, and all-sensor classifiers.]

Figure 7. Relative confusion matrices of classifiers trained with audio, motion & write information, and all sensors combined.

Dropout
The first point of interest is the dropout rate for the neural networks of our approach. Table 3 shows that the highest accuracy is achieved with a dropout rate of 25% for acoustic emissions from handwritten digits. This result was also confirmed by the other neural networks from Table 2. Reducing or increasing the percentage of randomly omitted neurons per layer during training iterations leads to lower accuracies. Hence the further examinations use a dropout rate of 25% in the training procedure of all classifiers.

Dropout   Accuracy
0%        0.70
10%       0.74
25%       0.79
50%       0.76

Table 3. Comparison of dropout rates for single neural networks trained with audio data. In every training epoch a percentage of randomly chosen neurons from each layer is omitted to avoid overfitting.

Classifiers
With the dropout rate fixed at 25%, we examine the influence of the number of nets on the accuracy. For this purpose a single neural network is compared to 4, 5, and 8 majority-voting classifiers with the same setup.

Nets   Precision   Recall
1      0.58        0.57
4      0.61        0.60
5      0.60        0.59
8      0.52        0.52

Table 4. Precision and recall of classifiers with different numbers of voting neural networks in the same topology.

The results in Table 4 show that majority voting with four audio classifiers outperforms single neural networks. All other topologies from Table 2 validated these findings. Moreover, a one-way ANOVA confirmed statistically that single networks achieve lower accuracies than four majority classifiers on our dataset (F(3,4) = 7.11, p < 0.01). Hence, the further examination is based on majority voting among four networks.

To observe the individual performance of each test user on the different classifiers, Figure 8 shows the mean precisions over all digits. Test users 1 to 5 conducted the study twice to generate samples for the training and test datasets. Users 6 to 10 are completely unknown to the classifiers and achieve lower accuracies on all neural networks. User 9 was the only left-handed person in our test dataset and shows the lowest recognition rates of all users. Moreover, only for this user does audio perform better than motion. For all other users, motion data achieves the highest precision. Write sensor information together with audio performs slightly better on known writing styles than audio alone. For the completely new writing styles of users 6 to 10, audio alone shows better results on the right-handed persons, reflecting that our training dataset contains fewer left-handed individuals. Networks with all sensors achieve a slightly higher precision than audio alone when the writing styles are known to the classifiers.

For further comparison of the different classifiers, the accuracy on individual digits of our dataset has to be investigated. Figure 7 compares the confusion matrices of (1) audio data alone, (2) motion with write sensor data, and (3) audio + motion + write sensor data. Audio alone achieves an overall accuracy of 58.1%, all sensors combined reach 60.6%, and motion with write sensor data outperforms the other classifiers at 79.2%. Handwritten '0's are often misclassified as '6' by networks with motion and write information, while audio classifiers confuse these less frequently when the digit is actually a '0'. A particular observation is that motion with write data classifiers have a lower accuracy on the digits '5' and '9' than on others. A '9' is often classified as a '3', while vice versa 95% of all samples are predicted correctly. For this specific case, audio networks have a higher accuracy. A similar phenomenon can be found for the digits '5' and '7'. The probability of predicting a '5' when it is actually a '7' is 16.3% for the motion classifier, while the audio classifier has a probability of only 10.8%.

From the findings in Figures 7 and 8 it can be stated that using all sensors in one majority-voting classifier is not as effective as audio or motion with write information alone. The networks could not exploit the strengths of each sensor to achieve higher accuracies than the motion classifiers. Hence the identified strengths have to be generalized and applied differently.

For this purpose, motion first classifies an unknown digit, and when the audio networks have a higher accuracy for that outcome, they verify the result. For instance, if a '3' was predicted by the motion classifier, the audio networks check the decision by determining that it is not actually a '9'. This exchange of accuracies in the confusion matrices theoretically increases the accuracy on handwritten '9's to 76.2% while decreasing the accuracy on '3's to 78.3%. Compared to the confusion matrix of the motion with write information classifier in Figure 7, the overall accuracy theoretically decreases to about 78.86%, with the advantage of a better precision on handwritten '9's at the cost of accuracy on '3's. Re-evaluating whether a predicted '6' is actually a '0' can also increase the overall accuracy to 79.5% at the cost of a decreased precision on the digit '6'.

To generalize this exchange rule, let $Mat_M$ be the confusion matrix for motion with write information and $Mat_A$ the confusion matrix for audio data. Further, $t$ denotes the true digit and $f$ the falsely predicted digit for the exchange. If the rows and columns of the confusion matrices in Figure 7 are normalized to the interval [0, 1], then the distance $d^g_{t,f}$ in Equation 1 is the gain in accuracy for the target digit, and $d^l_{t,f}$ in Equation 2 is the loss in accuracy for the falsely predicted digit. If the exchange with the audio classifier increases the mean accuracy of the true and falsely predicted digits (Equation 3), then the re-validation rule can be applied. To keep the classifier from re-evaluating too much, a threshold $\varepsilon \geq 0$ can be applied, and the rule is utilized only once per digit so as not to decrease the accuracy on single exchange digits too much. Furthermore, by permitting negative thresholds for the digits that the motion classifiers recognize worst, the confusion can be attenuated.

[Figure 8 plots the mean precision (0 to 1) of digits per test user (IDs 1–10) for the sensor combinations Audio, Audio & Write, Motion & Write, and All.]

Figure 8. Mean precision of digits for all test users with various classifiers. The majority-voting neural networks were trained with different sensor combinations. Users 1 to 5 are known while 6 to 10 are completely new to the classifiers.

[Figure 9 shows a 10×10 relative confusion matrix (true digit vs. predicted digit, values in %) for the combined classifier.]

Figure 9. Confusion matrix of the combined classifier. First the samples were predicted with motion and write sensor data, then the outcome was validated by the audio classifier.


$d^g_{t,f} = Mat_M[t,f] - (Mat_M[t,f] \cdot (1 - Mat_A[t,t]))$ (1)

$d^l_{t,f} = Mat_M[f,f] - (Mat_M[f,f] \cdot (1 - Mat_A[f,t]))$ (2)

$d^g_{t,f} - d^l_{t,f} \geq \varepsilon$ (3)

If the presented rule is applied with a threshold $\varepsilon = 0.03$ on the confusion matrices in Figure 7, the classifier re-evaluates the result of the motion networks on audio data between the digits '0' and '6', '5' and '3', and '1' and '4' or '6' and '4'. To increase the accuracies of the most confused digits '5' and '9', negative thresholds are permitted for these symbols. Hence the exchange rule is applied on '0' to '6', '3' to '9', and '5' to '7'.
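The rule itself translates directly into code; a sketch assuming row/column-normalized 10×10 NumPy confusion matrices mat_m (motion & write) and mat_a (audio):

```python
def should_revalidate(mat_m, mat_a, t, f, eps=0.03):
    """Decide whether a digit f predicted by motion should be re-checked
    against candidate digit t by the audio classifier (Equations 1-3)."""
    gain = mat_m[t, f] - mat_m[t, f] * (1.0 - mat_a[t, t])  # Eq. 1
    loss = mat_m[f, f] - mat_m[f, f] * (1.0 - mat_a[f, t])  # Eq. 2
    return gain - loss >= eps                                # Eq. 3
```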

The theoretical combination of motion and audio classifiers cannot predict the real results, because the statistical distribution of the test samples is unknown. Hence the resulting combined classifier of all sensors was verified on our test dataset. The theoretical rule for combining the strengths of motion and audio is confirmed by the confusion matrix in Figure 9. Captured audio and pen tip motion provide complementary features, which can be combined to increase the accuracy on single handwritten digits. In order to achieve accuracies applicable to human-computer interaction, individualization of the classifiers must be considered.

Individualization
The classifiers were retrained with 90 samples of each digit from a single writer. For this purpose 50 training iterations were chosen. The previously identified confusions between the digits '0' and '6', '3' and '9', and '5' and '7' were handled by the individualized combined classifier.

[Figure 10 comprises three 10×10 relative confusion matrices for the audio, motion & write, and combined classifiers after retraining on the individual writer's data.]

Figure 10. Relative confusion matrices of audio, motion & write information, and combined classifiers with the re-validation rule for a single writer.


The results in Figure 10 for a single right-handed writer confirm that audio and pen tip motion provide complementary features that can be utilized to achieve high recognition rates. While the classifier based on motion data is not able to reliably distinguish '0' and '6', the classifier trained on audio data does not show strong confusion between these digits. The audio classifier achieved an overall accuracy of 88.58%, while motion with write information reaches 94.61%. The combination of both classifiers outperforms the single classifiers at a recognition rate of 98.33%. Moreover, all test samples of '0' to '4' and '8' are predicted 100% correctly, which corresponds to 722 test samples. The worst result of 92.4% on handwritten digit '6' is related to the re-validation by the audio classifier and confusion with digit '0'. Due to the very low confusion of the audio networks between a predicted '3' and an actual '9', the combined classifier achieves a recognition rate of approximately 96.2% on digit '9'.

Qualitative Results
After completion of the study, each participant was handed a short questionnaire. The results are shown in Figure 11. We wanted to find out whether people are worried about samples of their handwriting being taken. Taking such samples was necessary to generate the initial dataset. Only 16.7% of all surveyed participants were worried about privacy issues, and 54.2% strongly stated that they do not mind the storage of audio and motion data of their handwriting.

[Figure 11 shows the distribution of answers on a five-point Likert scale (Strongly Disagree to Strongly Agree) to three statements: "The cable did not bother me.", "I take handwritten notes in my everyday life.", and "I'm not worried about the data collection of my handwriting."]

Figure 11. Results of the questionnaire of the 24 participants in the study, handed out after completing the data acquisition.


We also asked the participants whether they take notes on paper in their everyday life, because this indicates the applicability of our prototype in comparison to other digital pens with special writing surfaces on which the pen tip is tracked. Approximately 8.3% of the participants do not take any notes at all, while 20.8% stated that note-taking is not part of their daily life. On the other hand, 16.7% strongly agree and 37.5% agree that they take notes in their everyday life.

For further improvement of our prototype we also asked whether the wired connection bothered the participants. About half (45.8%) of the participants stated that the cable should be removed so as not to disturb them while writing. About a third (37.5%) of the participants were not bothered by the cable.

DISCUSSION
As a result of the questionnaire, the cable of the hardware prototype should be replaced by a wireless connection. Our aim is to avoid any constraints on writing space and to maximize the freedom of writing movements, and for that purpose cables are not helpful. Furthermore, 16.7% were worried about the collection of samples of their handwriting, which may be related to concerns about reconstructing their signature and faking their identity.

Reviewing the results of the classifiers, it can be stated that our dataset provides a rich base of different handwriting samples. We observed overfitting of the neural classifiers, which was reduced by applying dropout during the training process. Moreover, not all individual writing styles could be represented by a single classifier. Applying majority voting with four neural networks achieved the highest precision for every sensor combination.

Classifiers trained on audio data show more confusion between digits than classifiers trained on motion data. This is possibly due to fewer features than provided by the 6-axis motion sensors. Moreover, motion features are more reliable than audio data. Motion classifiers show less confusion across all digits but achieve lower accuracy when, for instance, the input is predicted as '6' but was actually a '0'. This is because of the similar trajectories and writing styles of both digits. In that specific case audio provides complementary features which can be utilized to improve the accuracy.


When all sensors are applied together in a classifier with majority voting, the recognition rates do not outperform the motion classifiers. The confusion of the acoustic emissions across all digits strongly influences the classifier and decreases the accuracy. It seems that surface- and pressure-related writing sounds are very individual and show a large fluctuation over all test users, as shown in Figure 8. Hence the input weights of audio features could be decreased to achieve a stronger focus on motion data.

To overcome such challenges in combining the strengths of each sensor, a re-validation rule was presented in Equations 1 to 3, which evaluates the result of the motion data by classifying the audio stream. The results show that the accuracy on digits can be increased by utilizing the complementary audio features. For this purpose the accuracy on the confused digits had to be decreased, but there are also cases, such as the digits '0' and '6', where the overall accuracy can be increased compared to the motion classifiers. In case of loud ambient noise, the audio power level when no input is performed could be utilized to prevent the re-validation of the result of the motion classifiers.

The achieved accuracy of 78.38% is not high enough to use the prototype in practice. It can also be stated that the mean precision on digits varies between the test users. For a single left-handed test user the precision decreased below 50% on all classifiers, whereby motion data with binary write information achieved a lower precision than audio. The different writing styles and ways of holding the pen strongly influence the classification results. Not all individual styles can be recognized equally well by our classifiers.

Hence an individualization of the classifiers was applied. For this purpose a single writer wrote 90 samples of each digit and the networks were retrained for 50 epochs on the individual dataset. The confusion of the digits '0' and '6' by the motion classifiers increased, while acoustic emissions overcame this problem. Combining the complementary features by the previously defined rules increased the overall accuracy on handwritten digits for a single writer to 98.33%. This also confirms that the individual writing style of handwritten digits from a single writer can be generalized even when there is variance in writing single digits. Networks that have been trained on different writers can be used to bootstrap classifiers for a specific user. Hence the number of handwritten digits and the computing time for training the classifiers on a target user can be reduced. Moreover, this procedure only needs to be performed once per user.

The results on the classifiers refer to a quiet environment, but in practice the effects of ambient noise must be taken into account. The Hilbert transformation cannot overcome such issues, but the housing of the prototype attenuates ambient noise. The measured amplitudes of the microphone can be utilized to detect ambient noise: if the calculated energy level of the audio signal is above a threshold value while the binary write sensor detects no input, the classifier should not re-validate the result of the motion data when an input is detected. Hence unreliable results of audio classification can be avoided, which still corresponds to an overall accuracy of 79.2% and 94.6% for all test users and for the target user after individualization, respectively. Moreover, this also implies that motion and audio data have to be classified separately in real applications.

All the presented results relate to digits. In contrast to recognizing handwritten words, it is not possible to correct single digits afterwards. Especially when this pen is used for dialing tasks or adding phone numbers to contacts, a high accuracy has to be achieved. The evaluated classifiers could be extended to characters or bigrams. The re-evaluation of motion data with writing sounds is also promising for handwritten characters such as 'D' and 'P' due to their high similarity.

CONCLUSION AND FUTURE WORK
We reported the design of a digital pen for common writing surfaces, such as paper, that does not require a special pattern and is not confined to a particular area or drawing order. The designed pen does not strongly differ from regular pens in appearance (except for a wired USB connection) and is built from cheap and robust components. The evaluation based on a corpus of digit data shows that the pen achieves high recognition rates. The combination of motion and audio data performs substantially better than motion data alone or audio data alone. It turns out that these types of sensor data have complementary characteristics.

It was also shown that majority-voting neural networks, each with the same topology, provide a more accurate classification than single networks. Furthermore, we found that combining the strengths of audio and motion features achieves better results when motion data is classified first and the result is then evaluated by classifiers trained on the sound emissions of the pen tip. Retraining the classifiers for a single writer on a dataset of 900 samples turned out to achieve very high recognition rates of 98.33%. The survey also indicated that users accept the recording of their handwriting for individualizing the neural networks.

In future research the wired connection should be replaced by a wireless implementation, which was also an outcome of the questionnaire. To provide a wide field of application, the dataset has to be extended to letters and gestures. Deep neural networks with a larger amount of training data and different topologies could be investigated to achieve higher recognition rates on all test users before individualization, as could the impact of different surfaces and background noise.

A live application such as phone number dialing, a calculator, or numeric password entry could be implemented to evaluate the user experience. Writer identification is also a research topic in which this pen could be applied. Future research will focus on letters, bigrams, and words with subsequent auto-correction to cover a wide spectrum of handwriting recognition.

