The Effects of Part-of-Speech Tagsets on Tagger Performance A thesis presented by Andrew MacKinlay to The Department of Computer Science and Software Engineering in partial fulfillment of the requirements for the degree of Bachelor of Science with Honours University of Melbourne Melbourne, Australia October 2005

The Effects of Part-of-Speech Tagsets on Tagger Performance

A thesis presented

by

Andrew MacKinlay

to

The Department of Computer Science and Software Engineering

in partial fulfillment of the requirements

for the degree of

Bachelor of Science with Honours

University of Melbourne

Melbourne, Australia

October 2005

©2005 - Andrew MacKinlay

All rights reserved.

Thesis advisor(s): Timothy Baldwin, Steven Bird

Author: Andrew MacKinlay

The Effects of Part-of-Speech Tagsets on Tagger Performance

Abstract

In natural language processing (NLP), a crucial subsystem in a wide range of applications is a part-of-speech (POS) tagger, which labels (or classifies) unannotated words of natural language with part-of-speech labels corresponding to categories such as noun, verb or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabelled data.

Previous work has tended to focus on applying new algorithms to the problem or adding hand-tuned features to assist in classifying difficult instances. Using these methods, a number of distinct approaches have plateaued to similar accuracy figures of 96.9 ± 0.3%.

Here we approach the problem of improving accuracy in POS tagging from a unique angle. We use a representative set of tagging algorithms and attempt to optimise performance by modifying the inventory of tags (or tagset) used in the pre-labelled training data. We modify tagsets by systematically mapping the tags of the training data to a new tagset. Our aim is to produce a tagset which is more conducive to automatic POS tagging by more accurately reflecting the underlying linguistic distinctions which should be encoded in a tagset.

The mappings are reversible, enabling the original tags to be trivially recovered, which facilitates comparison with previous work and between competing mappings. We explore two different broad sources of these mappings. Our primary focus is on using linguistic insight to determine potentially useful distinctions which we can then evaluate empirically. We also evaluate an alternative data-driven approach for extracting patterns of regularity in a tagged corpus.

Our experiments indicate the approach is not as successful as we had predicted. Our most successful mappings were data-driven, which give improvements of approximately 0.01% in token level accuracy over the development set using specific taggers, with increments of 0.03% over the test set. We show a wide range of linguistically motivated modifications which cause a performance decrement, while the best linguistic approaches maintain performance approximately over the development data and produce up to 0.05% improvement over the development data. Our results lead us to believe that this line of research is unlikely to provide significant gains over conventional approaches to POS tagging.

Contents

Title Page
Abstract
Table of Contents
Citations to Previously Published Work
Acknowledgments

1 Background
  1.1 Introduction
  1.2 Parts of Speech
    1.2.1 Terminology
    1.2.2 POSs Defined
    1.2.3 POS Tagging
    1.2.4 Natural Language Corpora

2 Literature Review
  2.1 Overview of Tagging Algorithms
  2.2 Using Linguistic Insight to Optimise NLP Applications
  2.3 Linguistic Resources for Modifying the Tagset
    2.3.1 The Brown Corpus Tagset
    2.3.2 The Penn Treebank Tagset
    2.3.3 Other Tagsets in Use
  2.4 Algorithms for POS Tagging
    2.4.1 POS Tagging with Transformation-Based Learning
    2.4.2 Support Vector Machine-based POS Tagging
    2.4.3 Maximum Entropy POS Tagging

3 Methodology
  3.1 General Outline
  3.2 Motivation
  3.3 Experimental Setup
    3.3.1 Data Sampling
    3.3.2 A Data-Driven Alternative
    3.3.3 Evaluation Metrics

4 Experimental Evaluation
  4.1 Benchmark and Baseline
  4.2 Clustering
  4.3 Linguistically Motivated Modifications
    4.3.1 Notational Conventions
    4.3.2 Syntactically-Conditioned Modifications of Closed Classes
    4.3.3 Syntactically-Conditioned Modifications of Open Classes
    4.3.4 Lexically-Conditioned Modifications of Closed Classes
    4.3.5 Lexically-Conditioned Modifications of Open Classes
  4.4 Overall Summary

5 Conclusion
  5.1 Discussion
  5.2 Further Work

A The Penn Tagset

B Complete Results
  B.1
  B.2 rb–deg[ld] Mapping
  B.3 rb–deg[s] Mapping
  B.4 vb–cop[s] + rb–deg[s] Mapping
  B.5 vb–cop[lm] + rb–deg[s] Mapping
  B.6
  B.7
  B.8 in–sub[s] Mapping
  B.9
  B.10
  B.11 nn–ms[l] + dt–num[l] Mapping
  B.12 vb–rp[s] Mapping
  B.13
  B.14 in–rp[l] Mapping
  B.15 in–rp[l] + in–sub[s] Mapping
  B.16
  B.17
  B.18
  B.19 vb–rp[ld] Mapping
  B.20
  B.21
  B.22
  B.23 vb–tr[s] Mapping
  B.24
  B.25 to:in Mapping
  B.26
  B.27
  B.28 vb–inf[lm] Mapping
  B.29
  B.30
  B.31
  B.32
  B.33
  B.34
  B.35
  B.36
  B.37
  B.38
  B.39
  B.40
  B.41
  B.42
  B.43
  B.44 rp(cl–c) Mapping
  B.45
  B.46 in(cl–c,s) Mapping
  B.47
  B.48
  B.49
  B.50
  B.51
  B.52
  B.53
  B.54
  B.55
  B.56
  B.57 rbr(cl–c,s) Mapping
  B.58
  B.59 in(cl–c) Mapping
  B.60 rb–loc[lm] Mapping
  B.61
  B.62
  B.63
  B.64
  B.65 rb–loc[s] Mapping
  B.66
  B.67
  B.68 vb–cop[s] Mapping
  B.69 dt–num[l] + jj–num[l] Mapping
  B.70 to:in + in–sub[s] Mapping
  B.71 vb–cop[lm] Mapping
  B.72 in/rp/rb[l] Mapping
  B.73

Citations to Previously Published Work

Portions of Chapter 3, Chapter 4 and Chapter 5 are to appear in the following paper:

Andrew MacKinlay and Timothy Baldwin (Forthcoming). POS Tagging with a More Informative Tagset. In Proceedings of the Australasian Language Technology Workshop 2005, Sydney, Australia.

Acknowledgments

Kate, and all my other friends for putting up with my absence over the year.

My secondary supervisor Steven for providing occasional but very helpful support whenever it was needed.

And most importantly my primary supervisor Tim, who was extremely generous with his time and assistance, for being a highly effective source of inspiration and information as well as an editor and proofreader.

Chapter 1

Background

1.1 Introduction

Part-of-speech (POS) tagging is a well-studied problem in natural language processing, in which the aim, given a natural language text, is to label each word in that sample with a POS tag such as noun, verb or adjective. In one of the earlier successful approaches to POS tagging in the framework which is now mainstream in NLP, Church (1988) determined that it was possible to achieve high accuracy in POS tagging using an impoverished feature set derived from no more than two words in the immediate local context.

In the following decade or so, subsequent work generally involved applying new algorithms to essentially the same task (Brill 1995; Ratnaparkhi 1996; Daelemans et al. 1996; Nakagawa et al. 2001), with these diverse approaches all settling on a similar set of features to those used by Church, and never using anything more extensive than highly localised context. There was very little successful novel feature engineering – in general variants on this small set of features are tacitly regarded as optimal for tractable POS tagging. A wide range of approaches rapidly asymptoted to a “glass ceiling” of performance, achieving accuracy of 96.8 ± 0.2% over the standard dataset used for the task, the Penn Treebank (Marcus et al. 1993).

More recently approaches have been targeted more on reducing the running time of an existing approach (Gimenez and Marquez 2003; Ngai and Florian 2001) with a concomitant further reduction in focus on improving the feature set. The only departure from this was by Toutanova and Manning (2000), who added a selection of language-specific, hand-tuned features, achieving improvements in accuracy of approximately 0.3%, bringing accuracy to just above the top end of the range quoted above.

There has been very little improvement in accuracy in POS tagging in the last decade – diverse approaches have reached a similar plateau. The application of different algorithms to the task has already been extensively explored, and feature engineering is unlikely to help much more than has already been shown to be possible. This motivates the need for a new approach to the problem, which is the focus of this thesis.

This thesis reports an investigation into an alternative approach to POS tagging. Rather than the application of a new algorithm or addition of new features, it is an examination of the effect of modifying the tagset – the inventory of tags assigned by the tagger. We create new tagsets by systematically mapping words tagged in the original tagset to a new, finer-grained tagset, which is designed to more accurately reflect the underlying linguistic distinctions which should be made explicit in a well-designed tagset.

This finer-grained tagset will provide more specific contextual information with which to determine the identity of surrounding tags. We empirically evaluate whether this additional information facilitates more accurate tagging. This evaluation stage is facilitated by the use of reversible mappings which enable the original tags to be recovered so that comparison between different mappings and with previous work can be easily achieved. Thus we are explicitly discarding any increased linguistic utility of the new distinctions (although this may be valuable) and examining the effect of these new distinctions on accuracy. The primary focus is to investigate a wide range of mappings designed using linguistic knowledge, however we also demonstrate an alternative approach where we attempt to use machine learning techniques to infer salient groupings from a tagged corpus of text.
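The idea of a reversible mapping can be illustrated with a minimal sketch. The refined tag RB-DEG, the word list and the function names below are hypothetical and chosen purely for illustration; they are not the mappings evaluated in this thesis. The only property the sketch is meant to show is that every refined tag maps back to exactly one original tag, so that accuracy can still be measured against the original tagset.

```python
# Hypothetical illustration: the refined tag name and the word list are
# invented for this sketch and are not taken from the thesis.
DEGREE_ADVERBS = {"very", "quite", "too", "extremely"}  # assumed word list
REFINED_PREFIX = "RB-"  # no original Penn tag starts with this string


def refine(word, tag):
    """Map an original tag to a finer-grained tag where the rule applies."""
    if tag == "RB" and word.lower() in DEGREE_ADVERBS:
        return REFINED_PREFIX + "DEG"
    return tag


def recover(tag):
    """Invert refine(): collapse refined adverb tags back to RB."""
    if tag.startswith(REFINED_PREFIX):
        return "RB"
    return tag


tagged = [("very", "RB"), ("quickly", "RB"), ("cats", "NNS"), (")", "-RRB-")]
refined = [(word, refine(word, tag)) for word, tag in tagged]
restored = [(word, recover(tag)) for word, tag in refined]
print(refined)
print(restored == tagged)
```

Because recovery looks only at the tag, a tagger could be trained and run on the refined tags and its output converted back before scoring.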

To conduct a thorough investigation it is necessary to have a small and well-defined focus. The sole source of gold-standard data is the Penn Treebank corpus – we use this as a case study to investigate the extent to which performance is affected by a finer-grained tagset. We leave it as an open question how applicable our findings are to different tagsets and languages, as this is a question for further empirical evaluation.

This thesis is structured as follows: The remainder of this chapter introduces background material and terminology. Chapter 2 contains a broad survey of previous work on POS tagging and information about various alternative tagsets used for different natural language corpora. Chapter 3 explains the general setup of the experiments performed and evaluation metrics, and the motivation for attempting to improve performance in POS tagging, as well as the method for extracting useful groupings of lexical items from the data. Chapter 4 gives detailed explanations of the mappings used in various experiments along with summaries of results for each. Finally, in Chapter 5 we discuss the significance of the results demonstrated and outline opportunities for further work.

1.2 Parts of Speech

1.2.1 Terminology

At this point it will be useful to define some terms relevant to both linguistics and to computational linguistics, taking definitions mostly from Crystal (1987). Syntax essentially refers to the arrangement of words in a sentence and how this arrangement influences the semantic relationships between words. Most theories of syntax posit a hierarchical underlying structure for a sentence: at the base level there are words which combine with each other to form phrases, and these phrases recursively combine with each other to form sentences. This captures our intuition that in a sentence like The cat watched the fantastic performers there is a closer relationship between, for example, fantastic and performers than there is between fantastic and cat. Syntacticians therefore often represent sentences in terms of parse trees, such as that in Figure 1.1.

(Sentence (NounPhrase (Determiner the) (Noun cat)) (VerbPhrase (Verb watched) (NounPhrase (Determiner the) (Adjective fantastic) (Noun performers))))

Figure 1.1: An example of a syntactic parse tree

Morphology is another linguistic term which refers to how words are built up from smaller units of meaning known as morphemes. A word such as cats could be segmented into two smaller meaning-bearing units: cat which has the meaning associated with its dictionary definition and the suffix –s indicating plural. Both of these segments contribute some component of meaning to the overall meaning of cats, however these segments cannot be divided up further into any smaller units which are associated with an intrinsic meaning, and for this reason they are generally accepted as being morphemes. Similarly, a word such as dispassionately can be segmented as dis– + passion + –ate + –ly.

Morphology is often classified into two sub-types: inflectional and derivational. Inflectional morphology is concerned with how words inflect in a particular grammatical context. So, the distinction between cat and cats is determined by how the word is used in a particular instance – in this case whether the intended referent is singular or plural. The underlying meaning of the word has changed very little. Other inflectional suffixes in English include the past tense suffix for verbs, in play + –ed. Meanwhile derivational morphology is typified by affixes of the kind seen above in dispassionately. Each of the affixes causes a change in meaning and creates a new word with a substantially different meaning which often performs an entirely different syntactic function.

It will also be necessary to introduce some terminological distinctions from computational linguistics literature. This thesis deals with words of natural language wherein the term word is often not sufficiently precise for our purposes. For the purposes of dividing up text into units, it will suffice to accept, unless mentioned otherwise, the definition of word as understood by literate speakers of English: an orthographic unit comprised of a string of alphabetical or numeric characters separated from other words by whitespace or punctuation marks. We will abstract away from the large number of complications to this simplistic definition. Similarly, we will also oversimplify in defining a sentence as an orthographic unit terminated by a full stop, exclamation mark or question mark which is not being used for another purpose such as an abbreviation.[1]

[1] The task of classifying a full stop as a genuine or spurious sentence boundary is in fact an area of NLP study in itself (e.g. Mikheev 2000, Ratnaparkhi 1996), however we will assume that this classification task has already taken place.

Additionally, we will adopt the following more specific terminology as defined by Jurafsky and Martin (2000). The term wordform refers to a word as it appears in the text including attached inflectional morphemes, while lemma refers to an abstract uninflected word stem of the type we might find in a dictionary. Thus in the sentence in Figure 1.1 the set of lemmas is {the, cat, watch, fantastic, performer} and the set of wordforms is {the, cat, watched, fantastic, performers}. The term types, referring to the distinct wordforms in some text, is often contrasted with tokens, the running wordforms in a sample of text. The example sentence contains 6 tokens (the, cat, watched, the, fantastic and performers) and 5 types (the, cat, watched, fantastic and performers).
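The distinction between tokens, types and lemmas can be checked mechanically for the example sentence. The following sketch assumes whitespace tokenisation, case-folding (so that The and the count as one type) and a hand-written lemma table covering only this sentence; a real system would need a proper tokeniser and lemmatiser.

```python
from collections import Counter

sentence = "The cat watched the fantastic performers"

# Tokens are the running wordforms; types are the distinct wordforms.
tokens = [word.lower() for word in sentence.split()]
types = set(tokens)

# Hand-written lemma table for this sentence only (an assumption made for
# illustration); wordforms not listed are treated as their own lemma.
LEMMA_TABLE = {"watched": "watch", "performers": "performer"}
lemmas = {LEMMA_TABLE.get(wordform, wordform) for wordform in types}

counts = Counter(tokens)
print(len(tokens), len(types))  # 6 5
print(sorted(lemmas))
print(counts["the"])  # 2
```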

1.2.2 POSs Defined

The term part of speech has been used since the beginnings of grammatical study to denote broad categories of words in natural language. As discussed in Crystal (1987), traditional grammars usually recognise eight to ten POSs: noun, verb, adjective, pronoun, preposition, conjunction, adverb, interjection, and possibly article and participle. These POSs were generally defined by broad semantic criteria – e.g. a noun is “a person, place or thing”, a verb is “an action” and an adjective is “a quality of something”.

As noted by Crystal (1987), linguists have more recently identified a number of shortcomings of such an approach. While such definitions work acceptably for particular members of the class such as the noun chair, it is trivial to find examples which we would like to group with a particular POS but which do not satisfy the semantic criteria of these simplistic definitions. It is difficult to argue that idea is a thing, or that to wish is an action, and yet it is clearly desirable to group these with nouns and verbs respectively. Additionally, for some word classes such as articles (a, an and the), it is not at all clear what semantic criteria could define the class. Essentially there is a mismatch between the semantic criteria being used to define the POSs and the underlying syntactic distinctions they are meant to reflect.

Modern linguistic approaches often use the term word class rather than part of speech to avoid a term with too many pre-existing associations. Recognising the shortcomings of the semantic definitions of POSs, they tend to identify word classes by formal criteria. One such criterion is based on syntax: word classes should be comprised of words which tend to occur in a particular position relative to other words in a sentence. For example, the fact that in English chair and idea can both be substituted for X in I like the X is evidence that they belong in the same word class. Another useful criterion is morphological: since chair and idea can both have the plural morpheme –s appended to them as a suffix, there is stronger justification for grouping them together.

These criteria can be used with reasonable effectiveness in many languages to identify salient groupings of words. However, as often occurs in natural language, the picture is complicated somewhat by less canonical examples. A word like sheep does not occur with the plural suffix –s even when the intended meaning is plural, so there is a case on morphological grounds for placing it in a different word class to chair. Looking at syntactic criteria, most adjectives can freely occur preceding nouns or following be, but some adjectives such as asleep can only occur in the latter construction: the man is asleep versus *the asleep man.[2] These so-called predicative adjectives could be grouped with other adjectives or placed in a class of their own. Accounting for every irregularity in a language in this way potentially leads to thousands of word classes, however ignoring too many of the distinctions evident from these criteria obscures patterns of syntactic regularity. It is thus an open question how many of these morphological and syntactic differences we wish to account for when we divide up the lexicon of the language into word classes. This is a point we return to in Section 1.2.3.

[2] The * here is a shorthand commonly seen in linguistics literature to denote a sentence which is ungrammatical, i.e. not a valid sample of the language.

It is usual to recognise two broad categories of word classes: open classes and closed classes. Open classes have a number of characteristics: they freely admit new members, and the words contained in them tend to make up a large percentage of the lexicon of the language and be readily modified by morphological processes. In most languages including English, classes like nouns, verbs and adjectives tend to be open classes. Prototypical closed classes have an opposing set of characteristics: they contain a small number of lexical entries (even though they make up a large percentage of tokens in running text), do not readily admit new members and contain words which allow little or no morphological modification.

1.2.3 POS Tagging

Broadly, POS tagging is a classification task in NLP where the input is an unlabelled sample of text (or corpus) and each word token contained in it is assigned a POS. Here the term “POS” is more similar to what we described as “word classes” above than the POS from traditional grammars, however in our use of the term we follow the general practice in NLP literature.

There are important differences between the POSs in POS tagging and the word classes of modern linguists beyond those already mentioned. In NLP, POS tags are assigned to tokens in a sample of text, not to abstract lexical entries. An important consequence is that it is not always clear what the POS of a particular token is. A large number of wordforms in English are ambiguous: the string of characters constituting the word does not uniquely determine the actual POS. So a noun like chair could also be a verb in a sentence like I would like to chair the meeting. As we shall see in Section 2.1, there are a number of strategies to overcome this ambiguity: we usually assign the tag to a token based on past evidence of the most likely tag for the wordform, augmenting this with information from surrounding words.

Another difference is that word classes such as “noun” and “verb” tend to be associated with a lemma rather than a wordform, however this thesis follows the standard approach in NLP where the POSs we assign distinguish between different wordforms corresponding to the same lemma. Thus, as will be discussed more extensively in Section 2.3, a plural noun like cats is tagged differently from a singular noun like cat, and a past tense verb like ate is distinguished from the present tense eat.

The focus on wordforms is the main reason why the size of even small computational tagsets (i.e. the inventory of tags distinguished in a given POS tagging task) tends to be somewhat larger than the set of broad word classes used by linguists; each distinction corresponding to an inflectional suffix is marked. Additionally, a tagset can be designed to distinguish any number of lexical, morphological or syntactic differences between wordforms. It is the differences here which account for much of the variation between tagsets. This thesis investigates the extent to which such distinctions in tagsets are useful, primarily from the standpoint of tagging accuracy.

POS tagging is important since in some form it is a component of many applications in NLP and related domains. Tasks such as information retrieval, parsing and corpus-based linguistics can all depend to some degree on POS tagging. The emphasis on downstream applications will be relevant later when we look at evaluation metrics (Section 3.3.3) but in general, we consider POS tagging in isolation.

1.2.4 Natural Language Corpora

Up until this point, the definition of “correct POS tag” has been left unspecified. This thesis follows the NLP mainstream of corpus-based POS-tagging. This presupposes the existence of a “gold-standard” training corpus pre-annotated with what we will accept as correct tags. Essentially the task of POS-tagging becomes a task of specialised machine learning: the corpus is divided up into training and test data, and, using the gold-standard tags on the training data, a tagger is trained to assign POS tags. Evaluation is conducted by running the trained tagger over the held-out test data and comparing the results with the gold-standard.
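The comparison against the gold standard amounts to counting agreements between the tagger output and the gold-standard tags. The sketch below is an illustration only: the function name is hypothetical and the predicted sequence is invented, with one deliberate error (VBD in place of VBN).

```python
def token_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Gold-standard tags for a ten-token sentence and an invented tagger output.
gold = ["DT", "NNP", "CC", "NNP", "VBP", "VBN", "IN", "NNP", ",", "NNP"]
predicted = ["DT", "NNP", "CC", "NNP", "VBP", "VBD", "IN", "NNP", ",", "NNP"]

print(token_accuracy(predicted, gold))  # 0.9
```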

There are a number of tagged corpora in existence, however of particular relevance is the corpus which is the primary focus of this paper, the Penn Treebank (Marcus et al. 1993), with tagged sentences such as:

Both  Clarcor  and  Anderson  are  based  in  Rockford  ,  Ill
DT    NNP      CC   NNP       VBP  VBN    IN  NNP       ,  NNP

As well as being the de facto standard for a range of NLP tasks including POS tagging, it is significant because it is also annotated with syntactic parse trees, as shown in Figure 1.2.

(S (NP-SBJ (DT Both) (NNP Clarcor) (CC and) (NNP Anderson)) (VP (VBP are) (VP (VBN based) (PP-LOC-CLR (IN in) (NP (NP (NNP Rockford)) (, ,) (NP (NNP Ill)))))))

Figure 1.2: A parse tree annotation from the Penn Treebank

Chapter 2

Literature Review

2.1 Overview of Tagging Algorithms

A large proportion of word types in English are unambiguous in terms of POS tags. According to DeRose (1988), roughly 88.5% of word types receive only one tag in the Brown corpus. However the tagging problem is not quite as easy as figures such as this might tend to suggest. More frequent words tend, on average, to have a larger number of possible tags. DeRose also points out that a given character string uniquely determines the POS for only 60% of tokens in the Brown corpus. Thus we would expect that for roughly 40% of tokens which a tagger encounters in unseen data, we will need some way to disambiguate the possible tags. In the investigation phase of this project we calculated the average number of possible tags in sections 23 and 24 of the Penn Treebank WSJ Corpus to be 2.2 tags per word token, when the tagger was trained on sections 0 to 22.
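The difference between ambiguity measured over types and over tokens can be made concrete with a small sketch. The six-token corpus below is invented for illustration and has no connection to the Brown or WSJ figures quoted above; the function name is likewise hypothetical.

```python
from collections import defaultdict


def ambiguity_stats(tagged_tokens):
    """Return (share of unambiguous types, share of unambiguous tokens)."""
    if not tagged_tokens:
        raise ValueError("cannot compute statistics for an empty corpus")
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word].add(tag)
    # A type is unambiguous if it was only ever seen with one tag.
    unambiguous_types = sum(len(tags) == 1 for tags in tags_by_word.values())
    # A token is unambiguous if its wordform is an unambiguous type.
    unambiguous_tokens = sum(
        len(tags_by_word[word]) == 1 for word, _ in tagged_tokens
    )
    return (
        unambiguous_types / len(tags_by_word),
        unambiguous_tokens / len(tagged_tokens),
    )


# Invented toy corpus: "can" is frequent and ambiguous, the rest are not.
corpus = [
    ("the", "DT"), ("can", "NN"), ("can", "MD"),
    ("rust", "VB"), ("the", "DT"), ("can", "MD"),
]
type_share, token_share = ambiguity_stats(corpus)
print(type_share, token_share)
```

In this toy corpus two of the three types are unambiguous but only half of the tokens are, because the ambiguous type is also the most frequent one.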

A problem which is perhaps even more challenging, if less frequent, is that of unknown words. Words will inevitably appear in the test data which were not in the training data. Approximately 2.5% of tokens in sections 23 and 24 of the WSJ corpus were not observed anywhere else in the corpus. Many taggers deal with unknown words effectively by using features derived from the character strings at the beginning or end of the word to determine the correct tag.

The simplest and most obvious method for dealing with tag ambiguity yields surprisingly accurate performance. A probability distribution is built based on the tags observed for each word. In the test data, each word is assigned the tag with the highest probability according to the distribution. This is known as unigram tagging as it assigns a tag based on a word unigram – a sequence of exactly one word. According to Charniak et al. (1993) such an approach (with some smoothing[1]) gives an accuracy of 91.5% on the Brown corpus. The results from preliminary investigation here suggest something similar: an unsmoothed unigram tagger trained on sections 0 to 22 of the WSJ corpus achieves 91.5% accuracy on sections 23 and 24.
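An unsmoothed unigram tagger of the kind just described can be sketched in a few lines. The function names, the toy training data and the fallback tag for unseen words are assumptions made for illustration; they are not taken from any of the taggers evaluated here.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_tokens):
    """Map each word to the tag it most often received in training."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_unigram(model, words, default="NN"):
    """Tag each word independently of its context.

    `default` is an assumed fallback for words not seen in training.
    """
    return [(word, model.get(word, default)) for word in words]

# Toy training data (hypothetical): "can" is seen twice as MD, once as NN.
training = [
    ("the", "DT"), ("can", "MD"), ("can", "MD"),
    ("can", "NN"), ("rusted", "VBD"),
]
model = train_unigram(training)
tagged = tag_unigram(model, ["the", "can", "rusted", "zebra"])
```

Because each decision ignores the neighbouring words, can receives MD here even when it directly follows the.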

[1] Smoothing is the reassignment of probability mass to deal sensibly with low-frequency word types – simply assigning their probability of a tag given a word to be the observed frequency seen in the training data means that no allowance will be made for a word receiving a tag it did not receive in the training data, but this is actually a very real possibility, especially for low-frequency words.

However the most significant aspect is how modern state-of-the-art taggers achieve higher accuracy. Almost all taggers since Church (1988) have used a similar observation: nearby tokens can be highly informative about the POS of a particular token. A determiner such as the is highly likely to be followed by an adjective or noun, while it is almost never followed by a verb or a modal. For example, the word can is ambiguous between noun and modal verb (among others). So if a word sequence such as the can is observed, a tagger can predict that can should be tagged as a noun rather than as a modal which would be predicted by the unigram method. Taggers which use features derived from the local context are described by Church (1988), Brill (1995), Ratnaparkhi (1996), Brants (2000) and Gimenez and Marquez (2004), inter alia.

2.2 Using Linguistic Insight to Optimise NLP Applications

There are a number of prior examples in the literature showing that in NLP, informed linguistic insight can be more effective at improving performance in a particular task than applying a new algorithm.

The approach taken by Klein and Manning (2003) is perhaps the most informative. The task in question was parsing, i.e. assigning the correct parse tree to a set of sentences. While it is a separate task from POS-tagging, there are obvious parallels between the two. Rather than learning how to assign a POS label from a text annotated with tags, in Treebank-style parsing a grammar is induced from a training set annotated with a parse tree for each sentence. In the testing phase, a parser, instead of selecting the most likely tag from the tagset, determines the most likely parse tree for each sentence from those allowed by the induced grammar.

Klein and Manning looked only at unlexicalized parsing, where the lexical identity of a word is used only to determine the possible POSs. They demonstrated that substantial increases in parsing accuracy were possible by using a finer-grained set of annotations for nodes in the tree. The original Treebank annotations were split into subcategories based on nearby features in the parse tree, either from descendants of the node or from features of parents or siblings. The authors used linguistic insight to identify potentially useful categorial distinctions, not present in the original annotation, which they could introduce to improve the accuracy. For example, each node was annotated with the basic node label of the parent, which captured the fact that the distribution of lexical items for a given POS is often dependent on the category of the parent node of the token. Applying a suite of similar splits on the node labels, they improved accuracy from 77.77% to 86.32%.
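The parent-annotation split described above can be illustrated with a short sketch. The nested-tuple tree representation, the function name and the caret separator are assumptions chosen for this example; it is not Klein and Manning's implementation.

```python
def annotate_parents(tree, parent=None):
    """Return a copy of `tree` in which every non-root node label is
    suffixed with the basic (unannotated) label of its parent.

    A tree is a tuple (label, child, ...); a leaf word is a plain string.
    """
    if isinstance(tree, str):
        return tree  # words themselves are left unchanged
    label, *children = tree
    new_label = label if parent is None else f"{label}^{parent}"
    # Children see the original label, not the annotated one.
    return (new_label, *(annotate_parents(child, label) for child in children))

# Fragment of the example sentence, in the assumed tuple format.
tree = (
    "S",
    ("NP", ("DT", "Both"), ("NNP", "Clarcor")),
    ("VP", ("VBP", "are")),
)
annotated = annotate_parents(tree)
```

After annotation a determiner under a noun phrase carries the label DT^NP, so a grammar induced from the annotated trees can treat it differently from a DT appearing under some other parent.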

A different but related method was used by Toutanova and Manning (2000), who applied linguistically relevant observations to POS tagging. Here, rather than modifying category labels and applying an existing algorithm, the authors modified the algorithm to utilise features derived from the data which were designed to encode linguistically relevant information, and in particular information which could help to distinguish between easily confusable POSs.

For example, in English, there is a systematic ambiguity between past tense and past participle forms of regular verbs (which receive the tags VBD and VBN respectively in the WSJ corpus). So while an irregular verb like speak has past tense spoke and past participle spoken, regular English verbs (the majority) show no morphological distinction between the two, such as talk, which has talked in both cases. The lack of overt wordform distinctions between these two classes means that it is a significant source of error for most taggers – 6.9% of the total error in the baseline model used by Toutanova and Manning. Based on the knowledge that past participles in English occur after the auxiliary verbs have and be, the authors added a feature to the tagger which was activated if either of these two words occurs in a (relatively large) preceding context window, and thus reduced the number of VBD/VBN errors by 12.3%.
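A feature of this kind might look roughly as follows. This is a sketch only: the window size of 8 tokens, the list of inflected forms and the function name are assumptions for illustration, since the exact definition used by Toutanova and Manning is not given here.

```python
# Inflected forms of "have" and "be" (an assumed list; contracted
# forms such as 's and 've are not covered in this sketch).
HAVE_BE_FORMS = {
    "have", "has", "had", "having",
    "be", "am", "is", "are", "was", "were", "been", "being",
}

def aux_in_preceding_window(tokens, i, window=8):
    """True if a form of have/be occurs within `window` tokens before
    position i. The window size is a hypothetical choice."""
    start = max(0, i - window)
    return any(tok.lower() in HAVE_BE_FORMS for tok in tokens[start:i])

participle_context = "She has often talked about it".split()
past_context = "She talked loudly".split()

fires_for_participle = aux_in_preceding_window(participle_context, 3)
fires_for_past = aux_in_preceding_window(past_context, 1)
```

The feature fires for talked in the first sentence, where a VBN reading is appropriate, but not in the second, where talked is a simple past tense form.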

By adding a suite of similar hand-engineered features, Toutanova and Manning increased the accuracy from the baseline of 96.72% to a final figure of 96.86%, and they note that “even when the accuracy figures for corpus-based POS taggers start to look extremely similar, it is still possible to move performance levels up” (Toutanova and Manning 2000:69). This observation justifies the goal of this thesis, which is to improve tagger performance using linguistic insight.

2.3 Linguistic Resources for Modifying the Tagset

We noted in Section 2.2 that a finer-grained set of category labels can markedly improve performance in the related NLP application of parsing, and that there is potential to improve POS tagging performance by adding new linguistically motivated features to the tagger. These suggest that it may be possible to apply an analogous version of Klein and Manning’s method to POS tagging. If we alter the tagset to encode more subtle distinctions within the present word classes, providing more useful information to disambiguate surrounding words, we could potentially increase tagging accuracy.

The inspiration for these modifications can come from a number of sources. One obvious set of resources is the various comprehensive descriptive grammars of English, such as Huddleston and Pullum (2002) or Quirk et al. (1985). Grammars such as these provide thorough descriptions of various linguistic phenomena such as salient syntactic groupings of words. Yielding a different but overlapping set of possible modifications are the tagsets used in various other corpora. Some of these tagsets, along with the Penn Treebank tagset itself, are examined below.

It will be useful in the discussion to introduce some points noted by Leech (1997) about the distinction between linguistic and computational requirements which influence the design of tagsets. The linguistic quality of a tagset is determined by the extent to which each tag denotes a set of words with a unique set of syntactic properties in common with each other, while its computational tractability[2] is determined by how easy it is to determine the tag for a particular word, and how much each tag aids in the disambiguation of surrounding words. The extreme cases of tagsets with either one tag per word, or one tag for all words, are examples of tagsets which are highly tractable in computational terms, but of very little use linguistically, which perhaps serves to indicate that these requirements sometimes conflict. This is a point to which we will return in Section 3.1.

[2] We will use this term as defined by Leech hereafter.


2.3.1 The Brown Corpus Tagset

The Brown Corpus (Francis and Kucera 1979), as one of the earliest digital corpora, was annotated with a tagset which, according to Sampson (1987), was directly or indirectly the basis of most of the tagsets in existence today. The tagset has 87 simple tags, but 186 compound tags, which are discussed below. There are several reasons why even the number of simple tags is much larger than the eight traditional POSs discussed above. Since most verbs show different syntactic distributions depending on how they are inflected, tokens from the major word classes can each be assigned one of several different POSs, depending on how they are inflected. For example, verb tokens can appear in one of five different categories:

• VB for bare uninflected verb (present, imperative or infinitive) – e.g. take

• VBZ for present tense, third person singular – e.g. takes

• VBD for simple past tense – e.g. took

• VBN for past participle – e.g. taken

• VBG for present participle – e.g. taking

Similarly nouns are divided into common and proper nouns, each of which is further divided into singular and plural. Additionally there are variants of each for when the token occurs with the possessive marker ’s. Adjectives are also divided into absolute (e.g. tall), comparative (e.g. taller) and superlative (e.g. tallest), while similar distinctions are observed for adverbs. These open classes, along with the cardinal and ordinal numbers, account for 23 of the tags in the tagset. The remainder are used on the various closed classes, including:

• Tags for all of the inflected forms of the auxiliary verbs be, have and do (similar to the inflected forms listed above, but with the addition of negated versions of the past and present forms, so that is/BEZ is distinguished from isn’t/BEZ*)

• Tags for various articles and determiners, such as a(n), the, this, each

• Tags for different wh-words, such as what, who, when etc.

• Tags for prepositions: in, at, from, over etc.

• Tags for various punctuation marks

along with several other closed classes, such as pronouns and modal verbs.

The tagset is still more complex, however, when we consider that there are two ways of forming complex tags. First, two tags joined by a plus symbol indicate that the second item is a contracted item appended to the host word. For example John’ll would be tagged NP+MD for “Proper noun” and “modal verb”. The many possible combinations in which this can occur bring the number of tags to 186. Additionally, there are a number of tags denoting various additional pieces of information about word tokens, which are joined with a hyphen to the primary tag: FW denotes a foreign word, TL denotes a title, HL denotes a headline and NC denotes part of a multi-word phrase. In an extreme case L’Arcade receives the tag FW-AT+NN-TL (where AT denotes article). These extra tags bring the total number of simple and complex tags to 472.
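The two ways of building complex Brown tags can be made concrete by decomposing a tag string. The following is a minimal sketch: the function name and the returned dictionary are assumptions for illustration, and only the four hyphenated markers named above are recognised.

```python
# Hyphen-joined markers mentioned in the text.
MODIFIERS = {"FW", "TL", "HL", "NC"}

def parse_brown_tag(tag):
    """Split a Brown-style tag into hyphenated markers and the
    plus-joined component tags of the word itself."""
    parts = tag.split("-")
    modifiers = [p for p in parts if p in MODIFIERS]
    # Rejoin anything that is not a known marker before splitting on "+".
    core = "-".join(p for p in parts if p not in MODIFIERS)
    return {"modifiers": modifiers, "components": core.split("+")}

extreme = parse_brown_tag("FW-AT+NN-TL")   # the tag given for L'Arcade
contracted = parse_brown_tag("NP+MD")      # the tag given for John'll
negated = parse_brown_tag("BEZ*")          # the tag given for isn't
```

Decomposing tags in this way makes it apparent how a modest inventory of simple tags multiplies into several hundred distinct tag strings.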


2.3.2 The Penn Treebank Tagset

The tagset for the Penn Treebank (Marcus et al. 1993), which we reproduce in Appendix A, is based on the original Brown tagset[3]; however, its comparatively small size is a deliberate design decision. The corpus differs from earlier corpora in two ways: it is designed more for NLP and the sentences contained are not only tagged with POSs but are also parsed. In this spirit Marcus et al. set out to create “A Simplified POS Tagset for English” to alleviate problems of sparse data when used probabilistically, and thus increase its computational tractability (as defined above). There are two primary ways this simplification is achieved. One is by avoiding compound tags. For example, while the contracted auxiliaries ’ll and ’s and the contracted adverb n’t would all be indicated with compound tags on the host word in the Brown corpus (e.g. she’ll/PPS+MD), in the Penn Treebank these clitics[4] are split from the main noun or verb in the tokenisation stage, and treated as standalone tokens with their own tags; the possessive ’s is treated similarly. This avoids having a separate tag for verb forms when they are negated or followed by an auxiliary, and for nouns when they are used in a possessive construction. Thus, aren’t is tagged the same way as are not, making the syntactic similarity of the two apparent.

The other means by which Marcus et al. simplified the tagset, and the one which is more relevant here, was with the notion of “recoverability”: if the distinctions between several tags could be recovered from some other available information, the information was not included in the tag.

The most obvious type of information which can be recovered is the lexical identity of a word. Thus the Penn Treebank attempts to avoid classes with just a single lexical item, and, unlike the Brown tagset and several other related tagsets, the Penn Treebank tagset reserves no special tags for the auxiliary verbs be, have and do – they are treated just like any other verb. Other lexically recoverable distinctions which were removed include those between various articles and determiners, and between reflexive and personal pronouns (e.g. the distinction between myself and me).

The other category of information available in the Penn Treebank stems from the fact that it was designed as a parsed corpus, meaning that there is abundant syntactic information available. Again, Marcus et al. set out to design the tagset to remove some of the redundancy present in the combination of syntactic structure and POS tags. The clearest example of this is with the IN category. In traditional grammars, there is a division between subordinating conjunctions and prepositions. Subordinating conjunctions such as that, because, since, etc., introduce subordinate clauses that are structurally similar to main clauses – they have a subject, verb and (sometimes) object, and can often stand alone as a main clause, as in He left [because he was angry], where he was angry is a subordinate clause. Meanwhile prepositions such as in, to, on attach to noun phrases, as in at the zoo.[5] The Penn Treebank IN tag conflates both of these categories despite their differing syntactic functions.

[3] Note that the Penn Treebank actually contains a subset of the Brown corpus parsed and tagged in Penn Treebank format, however we will ignore this from now on – the Brown corpus refers to the original described in Francis and Kucera (1979).

[4] These are items which have a status between words and affixes (Manning and Schutze 1999). It is perhaps this intermediate status which explains their variable treatment between different tagsets.

[5] More modern grammars such as Huddleston and Pullum (2002) acknowledge but disagree with this traditional division, preferring to group most of the subordinating conjunctions with prepositions, apart from that, whether and conditional if which they leave as subordinators. The reasons are too complex to explain in detail here, but perhaps hint at some justification for Marcus et al.’s decision to conflate the two classes. Nonetheless, it is still significant that in Huddleston and Pullum (2002) a distinction is made.

However Marcus et al. (1993:315) stress that all of this information is available to users of the corpus by looking at these additional sources:

... the lexical and syntactic recoverability inherent in the POS-tagged version of the Penn Treebank corpus allows end users to employ a much richer tagset ... if the need arises.

Clearly the tagset was not designed to differentiate all possible distributional differences, but in most work on POS-tagging the tagset is used in unaltered form; this additional information is, in the case of syntactic information, not used at all, and in the case of lexical information is only used as a side-effect of the taggers utilising features derived from lexical context, as will be explained in more detail below. It seems that the possibility of making explicit certain syntactic regularities within the coarse Penn Treebank word classes is never considered.

There are a few other points worth noting regarding the Treebank tagset. One is that in certain situations the tagset stipulates that certain words should be tagged differently depending on syntactic context, as opposed to the Brown tags, which are more invariant for each lexical item. Thus in the Penn Treebank, one receives the tag NN (common noun) when it is the head of a noun phrase (as in the one you mentioned), rather than the tag CD (cardinal number) which it receives in a prenominal position. This concern with syntactic function has led to the creation of one new category not in the Brown tagset: present tense verbs with a non-third person singular subject (e.g. they run) are tagged VBP, even though – except for be – they are morphologically identical to the infinitival form (they like to run), which is tagged VB. Returning to Leech’s distinction between linguistic quality and computational tractability, it is clear that modifications such as these make the tagset more linguistically useful, while they probably slightly decrease computational tractability. In certain contexts a VB/VBP distinction could help disambiguate surrounding tags, but this is more than likely offset by the difficulty in distinguishing between POSs for which there is a systematic ambiguity for a given orthographic form.

Marcus et al. note that further simplifications are possible, however they did not “pursue [their] strategy of tagset reduction to its logical conclusion” (Marcus et al. 1993:315), and it is not entirely clear why. There are a number of aspects of the tagset which do not seem to conform to the stated objectives. For example, VBP (present tense, non-third person singular) and VBD (past tense) for the purposes of POS-tagging show almost identical syntactic distributions, and the difference between them is almost always lexically recoverable (the exceptions being a handful of irregular verbs like cut). There is therefore a strong argument, using the criteria established, for conflating the two. A tag which is more notable for not conforming to these criteria is the idiosyncratic Penn Treebank tag TO, which is a tag solely used to tag the token to. The word to is used both as a preposition (cf. IN for other prepositions), as in to the zoo, and to introduce an infinitive (e.g. they like to run), so it contravenes two of Marcus et al.’s conditions: it is a tag for a single lexical item (one of only two in the Treebank tagset excluding punctuation, the other being EX for existential there), and it fails to differentiate between two syntactically different uses. Some of these observations will be relevant in Chapter 4.
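Both kinds of modification discussed in this section, reintroducing lexically recoverable distinctions and conflating tags, amount to a mapping over (word, tag) pairs. The sketch below illustrates the idea only: the merged label VBPD, the suffixed labels such as VBP-BE and the lists of auxiliary forms are hypothetical and are not the mappings evaluated later in the thesis.

```python
# Hypothetical conflation of VBP and VBD into one merged label.
CONFLATE = {"VBP": "VBPD", "VBD": "VBPD"}

# Assumed lists of full (non-clitic) auxiliary forms.
AUX_FORMS = {
    "be": {"be", "am", "is", "are", "was", "were", "been", "being"},
    "have": {"have", "has", "had", "having"},
    "do": {"do", "does", "did", "doing", "done"},
}

def map_tag(word, tag):
    """Map one (word, tag) pair into the modified tagset."""
    if tag.startswith("VB"):
        # Lexically recoverable split: give auxiliaries their own tags.
        for lemma, forms in AUX_FORMS.items():
            if word.lower() in forms:
                return f"{tag}-{lemma.upper()}"
    # Otherwise conflate where a merged label is defined.
    return CONFLATE.get(tag, tag)

def map_corpus(tagged_tokens):
    """Apply the mapping to every (word, tag) pair in a corpus."""
    return [(word, map_tag(word, tag)) for word, tag in tagged_tokens]

original = [("They", "PRP"), ("are", "VBP"), ("run", "VBP"), ("ran", "VBD")]
mapped = map_corpus(original)
```

Since a mapping of this kind is applied to the training data before a tagger is trained, any tagger that learns from (word, tag) pairs can be used unchanged.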

All of the modifications mentioned bring the tagset size to a comparatively small 36 tags for lexical items and a further 12 for punctuation, symbols etc. The one factor complicating this is provision for tags indicating ambiguity. Ambiguous words are tagged with all possible tags separated by a ‘|’ (e.g. JJ|VBN), giving a theoretically very large maximum tagset size. Over the 1.1 million tokens of the WSJ corpus, 36 ambiguous tags occur, and they are used on just 147 tokens in the corpus, which is so infrequent that any probabilistic method will have almost no chance of learning their correct application.

2.3.3 Other Tagsets in Use

It is worthwhile to briefly examine some other tagsets for purposes of comparison and as a possible source to inform modifications to the tagset. The differences between the Brown and Penn tagsets have already hinted that there is more than one possible way to divide up English words into POSs; the following should make that clearer still. Sampson (1987) summarises several alternative tagsets.

The Lancaster-Oslo/Bergen Corpus of British English (LOB corpus hereafter; Johansson et al. 1978) was developed jointly by British and Norwegian researchers as a British English corpus parallel to the Brown corpus. While there is an equivalent sample of texts, the tagsets differ substantially between the two. The LOB tagset, like the later Penn Treebank tagset, requires that cliticised modals, auxiliaries and negation particles such as ’ll and n’t be split off from the host verb or (pro)noun, and tagged equivalently to the corresponding expanded words. However, unlike the Penn Treebank, the possessive ’s is left attached to the host noun and a $ is appended to the tag. Furthermore, the Brown corpus “extra information” tags for headlines, foreign words etc. are abandoned.

Thus, apart from roughly 20 possessive tags such as NN$ which are arguably compound tags, the LOB corpus is composed of atomic tags. While its tagset size of 132 is far smaller than the corresponding 471-tag Brown corpus, the fact that these are all basic tags makes it clear that it actually makes finer grained grammatical distinctions than the Brown corpus, with only 87 basic tags (including possessive tags).

A further extension of the LOB tagset is referred to by Sampson (1987) as the “Lancaster” tagset, due to the affinities of the researchers involved. It introduces a number of new distinctions not recognised in the LOB tagset, such as for adjectives which can only be used attributively,[6] and adjectives which are purely predicative.[7]

[6] As defined in Section 1.2.2, these are adjectives which can only appear in a prenominal position such as utter: He is an utter fool versus *His foolishness is utter.

[7] Those which can only be complements of verbs such as be – for example asleep: She is asleep versus *There was an asleep woman.

Some later tagsets that are clearly related to the Lancaster tagsets are the CLAWS tagsets used by the CLAWS tagging system described in Garside (1987). The CLAWS5 (C5) tagset which is used for the British National Corpus (Garside et al. 1997) is relatively small but at 61 tags including punctuation still contains more tags than the 48 in the Penn Treebank. Unlike the Penn Treebank, the C5 tagset distinguishes prepositions from subordinating conjunctions (which are in turn distinguished from that), of from the other prepositions, and reflexive pronouns from other pronouns. However there are several cases where the C5 tagset conflates categories which are distinct in the Penn Treebank: in the C5 system, comparative and superlative adverbs (like more/most in a more/most representative sample) are grouped with the other adverbs as AV0 (cf. RB, RBR and RBS in the Penn Treebank), and there is only a single class for singular and plural proper nouns (versus NNP/NNPS).

A later development discussed in Garside et al. (1997) is the C7 tagset. At 146 tags, it is more similar in magnitude to the aforementioned Lancaster tagset than anything else we have discussed here. There is a corresponding similarity in the distinctions made, and while there are a number of differences, they are often quite subtle and not relevant here. A brief summary of the distributions of tags across the different tagsets is given in Table 2.1, which lists by tagset the number of distinctions made in each broad POS category. A few caveats must be mentioned with regard to the table. Obviously it is a simplification of a large amount of information, and as such omits many of the intricacies of the distinctions between the tagsets. One aspect of this is that the broad POS categories do not exactly line up between the different tagsets, so if one tagset makes more distinctions in a particular category, this does not necessarily mean that it is making these distinctions over exactly the same set of words. For example, tomorrow is tagged as a noun in the Penn Treebank, but as an RT “nominal adverb of time” in the C7 tagset.

                            Brown   LOB   Lancaster   C5    C7   Penn
Articles                        1     2           2    1     2      3†
Determiners[8]                  6     6          10    2    10
wh-determiners                  1     1           2    1     3      2
Prepositions                    1     1           5    2     4      2†
Subordinating Conjunctions      1     1           6    2     5
Coordinating Conjunctions       1     1           3    1     2      1
Pronouns                        5    12          12    3    16      1
wh-Pronouns                     2     4           6    1     3      1
Adverbs                         4     5          10    1    11      3
wh-Adverbs                      2     1           4    1     2      1
Particles                       1     1           2    1     2      1
Cardinal Numbers                1     3           4    1     4      1
Nouns                           4    16          25    3    15      2
Proper Nouns                    2     2           5    1     7      2
General Verbs                   5     5           7    6     8      6†
Auxiliary verbs                16    16          16   18    21
Modal verbs                     1     1           1    1     2      1
Adjectives                      4     6           9    3     4      3
Punctuation                     6    13          12    3    12     12
Other                          12    10          27   10    15     17
Total[9]                       75   105         166   61   146     48

† A single Penn figure covers this row and the row below it.

Table 2.1: Distinctions made in broad POS categories for different tagsets

[8] The determiner category as we use it here includes possessive determiners such as their.

[9] The figures for the Brown and LOB tagsets differ from those quoted in the text as they do not include possessive tags for words ending in ’s – i.e. tags for nouns ending in $.

2.4 Algorithms for POS Tagging

It remains now to examine the algorithms which are frequently used for POS-tagging, focussing especially on the implementations of those algorithms which are used here. Of particular relevance are the exact features which are used by each tagger for the disambiguation of ambiguous and unknown words, so these will be treated in some detail.

2.4.1 POS Tagging with Transformation-Based Learning

The transformation-based learning paradigm as applied to POS tagging was first described in Brill (1995). Like the other taggers described here, it is a machine learning technique which takes a tagged corpus as input from which it can learn how to correctly tag a test sample. The tagger learns from the training data a set of rules for assigning tags (based on various features which will be explained below) which gives the fewest possible errors, and applies them to test data.

The algorithm relies on the fact that for the task it is trivial to devise a very simple tagging mechanism which achieves quite reasonable results, such as the unigram most-likely-tag method mentioned above. Such a method can be used to create an initial annotation of the text. Of course while such an annotation will have a large percentage of tags correct, there will still be a substantial proportion which are incorrectly tagged. The TBL algorithm aims to correct these errors by successively applying rules which correct such errors based on contextual and wordform-derived information. To use Brill’s example, suppose we have an incorrectly tagged sequence such as the following (where the tag MD on can is incorrect):

The/DT can/MD rusted/VBD [10]

the fact that can is preceded by a word with the tag DT is strong evidence that it should be tagged as a noun (NN) even though most occurrences of can in the training corpus would be tagged as modals (MD). So, this error will be corrected if we apply a rule like:

Change the tag to NN if the current tag is MD and the previous tag is DT

The linguistic relevance of this rule is also transparent: the syntactic slot following a determiner is more likely to be a noun than a modal verb from what we know about the structure of noun phrases in English. However, consider a sentence like:

This/DT can/MD be/VB difficult/JJ

[10] Penn Treebank tags have been used here. As a reminder, DT denotes determiner, MD denotes modal verb and VBD denotes past tense verb.


In this case, the correct tag of MD would be erroneously changed to NN by the above rule. So the utility of a particular rule is increased by previously existing errors it corrects and reduced by new errors it introduces in its application. As we shall see below, the TBL algorithm takes this into account by scoring the rules based on the difference between the number of errors removed and the number of errors introduced.
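The scoring just described can be illustrated on the two example sentences. This is a minimal sketch assuming a flat list of tags and a single rule shape (change one tag to another when the previous tag matches); the function names are illustrative, and for simplicity the rule is tested against the tags as they stood before any change was made.

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the preceding tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def rule_score(current, gold, from_tag, to_tag, prev_tag):
    """Errors removed minus errors introduced by applying the rule."""
    proposed = apply_rule(current, from_tag, to_tag, prev_tag)
    triples = list(zip(current, proposed, gold))
    good = sum(c != g and p == g for c, p, g in triples)
    bad = sum(c == g and p != g for c, p, g in triples)
    return good - bad

# "The can rusted" followed by "This can be difficult".
current = ["DT", "MD", "VBD", "DT", "MD", "VB", "JJ"]
gold = ["DT", "NN", "VBD", "DT", "MD", "VB", "JJ"]

score_both = rule_score(current, gold, "MD", "NN", "DT")
score_first_only = rule_score(current[:3], gold[:3], "MD", "NN", "DT")
```

Over both sentences the rule repairs one error and introduces another, for a net score of zero, whereas on the first sentence alone it scores one.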

As described in Brill (1995), the training module of a TBL tagger has access to a set of tagged gold-standard training data for reference. As shown in Algorithm 1, the first step is applying the initial-state annotation (usually the unigram most-likely tag, as mentioned above) to the untagged training data. From here, the tagger generates a set of transformational rules of the kind mentioned above. These are generated from a set of rule templates, which are derived from POS-based and lexicalized (i.e. wordform-based) features including:

• Change the tag to B when the current tag is A and:

1. The preceding/following word is tagged T

2. The word two before/after is tagged T

3. One of the two preceding/following words is tagged T

4. The preceding/following word is tagged T and the following word is tagged S

5. The preceding/following word is W

6. The current word is W and the preceding/following word is X

7. The current word is W and the preceding/following word is tagged T

8. The current word is W

where S and T range over the possible POSs and W and X range over all word types in the corpus, and words separated by a ‘/’ represent different versions of the rule rather than disjunctive possibilities in the one rule.

In addition to learning rules conditioned by these contextual features, there is also a phase of the algorithm to deal with unknown words. The details are not relevant here; it will suffice to note that it operates in a similar way but with different rule templates based on word-initial or word-final character strings. When the tagger is run over test data, after the initial annotation, these lexical rules can then be applied to correct errors in unknown words, and finally the contextual rules can be applied.

This approach produces results very close to those which use far more mathematically complex algorithms. Brill reports an overall accuracy of 96.6% on the Penn Treebank WSJ corpus using 950,000 words of training data (split as 600,000 for learning contextual rules and 350,000 for learning rules for unknown words). The only drawback of this scheme is the long training time required, since in each iteration the counts for each possible rule must be regenerated, as previous rule applications will probably have changed the score of that rule since counts were last generated.

One of the most successful approaches to deal with this problem was that devised by Ngai and Florian (2001), which vastly reduces the training time with no reductions in accuracy. The basic idea is to avoid repetition by generating and storing good and bad counts for each rule r once at the beginning, and updating the counts only if they are modified by the application of another rule.[11] According to Ngai and Florian’s figures, these optimisations reduce the running time over the Penn Treebank WSJ corpus from 5880 minutes using Brill’s 1995 tagger to 17 minutes using their own “FastTBL” algorithm. This is the TBL implementation used here.

Algorithm 1 Algorithm to train a transformation-based learning tagger, where S denotes the samples in the training corpus, and for each sample s ∈ S, C[s] denotes the posited classification, C_init denotes the initial classification and T[s] denotes the true classification

    for all s ∈ S do
        C[s] = C_init[s]    // apply initial-state rule to corpus
    end for
    L ← ∅    // list of rules
    repeat
        v_max ← 0
        for all p ∈ P do    // P the set of rule templates
            R ← all instantiations of p
            for all r ∈ R do
                good(r) ← |{s | s ∈ S ∧ C[s] ≠ T[s] ∧ C[r(s)] = T[s]}|
                bad(r) ← |{s | s ∈ S ∧ C[s] = T[s] ∧ C[r(s)] ≠ T[s]}|
                v(r) ← good(r) − bad(r)
                if v(r) > v_max then
                    r_best ← r
                    v_max ← v(r)
                end if
            end for
        end for
        for all s ∈ S do
            C[s] ← C[r_best(s)]    // apply rule to corpus
        end for
        append r_best to L    // add rule to list
    until v_max < 3
    output L
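The training loop of Algorithm 1 can be sketched in Python as follows. The sketch makes several simplifying assumptions: it uses a single rule template (change tag A to B when the previous tag is T), it treats the corpus as one flat tag sequence, and it checks the score threshold before applying a rule rather than after. It is the naive version which recomputes every count in each iteration, not the FastTBL optimisation.

```python
from itertools import product

def apply_rule(tags, rule):
    """Apply (a, b, t): change a to b wherever the previous tag is t."""
    a, b, t = rule
    return [
        b if i > 0 and tags[i] == a and tags[i - 1] == t else tags[i]
        for i in range(len(tags))
    ]

def train_tbl(initial, gold, threshold=3):
    """Greedily learn an ordered rule list; returns (rules, final tags)."""
    current = list(initial)
    tagset = sorted(set(initial) | set(gold))
    rules = []
    while True:
        best_rule, best_score = None, 0
        # Instantiate the single template over all tag triples.
        for rule in product(tagset, repeat=3):
            if rule[0] == rule[1]:
                continue  # changing a tag to itself does nothing
            proposed = apply_rule(current, rule)
            triples = list(zip(current, proposed, gold))
            good = sum(c != g and p == g for c, p, g in triples)
            bad = sum(c == g and p != g for c, p, g in triples)
            if good - bad > best_score:
                best_rule, best_score = rule, good - bad
        # Stop once no rule reaches the score threshold.
        if best_rule is None or best_score < threshold:
            break
        current = apply_rule(current, best_rule)
        rules.append(best_rule)
    return rules, current

# Toy corpus (hypothetical): "The can rusted" three times over.
initial = ["DT", "MD", "VBD"] * 3
gold = ["DT", "NN", "VBD"] * 3
rules, final_tags = train_tbl(initial, gold)
```

Each pass of the outer loop rescans the corpus for every candidate rule, which is exactly the repeated counting that makes unoptimised training slow on a corpus of realistic size.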

2.4.2 Support Vector Machine-based POS Tagging

Support vector machines (SVMs) were first applied to POS tagging in Nakagawa et al.(2001). An SVM is a binary classification algorithm based on a geometric interpretation ofthe feature values for each instance. As detailed in Cristianini and Shawe-Taylor (2000),given a set of training instances each consisting of a vector of binary or numeric featurevalues and a true classification y ∈ {−1, 1}, an SVM learns a classification functionf(x) which can be used to classify a test instance with feature vector x. In binaryclassification problems, the classification rule is then sgn (f(x)). The classification rulef(x) is dependent on what is known as the kernel function, which effectively maps the

11 This includes not only tags which are directly affected by r, but also those tags on which r depends, i.e. those in the context window used by the rule.


data into a higher dimensional feature space, allowing the correct classification of instances which have non-linearly separable feature values in the original feature space.

Gimenez and Marquez (2004), whose SVM-based tagger SVMTool-1.2.2 we utilise here, extend binary support vector machines to cover multiclass classification using a strategy known as one-per-class binarisation: an SVM is constructed for each POS which contains ambiguous lexical items (reportedly 34 for the Penn Treebank), and in the classification stage, the most confident prediction from all of the SVMs is selected as the tag for the word.
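The selection step of one-per-class binarisation can be illustrated with a minimal sketch. It assumes linear decision functions with made-up weights and feature names; none of these values come from SVMTool, and no kernel is applied.

```python
def decision_value(weights, bias, active_features):
    """Linear decision function f(x) over a set of active binary features."""
    return bias + sum(weights.get(f, 0.0) for f in active_features)


def predict_tag(classifiers, active_features):
    """Score the instance with every per-tag classifier and keep the most confident."""
    scores = {
        tag: decision_value(weights, bias, active_features)
        for tag, (weights, bias) in classifiers.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical per-tag classifiers: (feature weights, bias).
classifiers = {
    "NN": ({"prev_pos=DT": 1.2, "suffix=ed": -0.8}, -0.1),
    "VBN": ({"prev_pos=VBP": 1.5, "suffix=ed": 0.9}, -0.3),
    "JJ": ({"prev_pos=DT": 0.6, "suffix=ed": 0.4}, -0.2),
}
tag, scores = predict_tag(classifiers, {"prev_pos=VBP", "suffix=ed"})
```

Each binary classifier answers "is this token of my class or not", and the tag whose classifier returns the largest decision value wins.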

The contextual features used in Gimenez and Marquez’s tagger include unigrams, bigrams and trigrams of words and POSs, derived from the tokens appearing in a context window of 2 tokens on either side of the target. POS features for ambiguous words which have not yet been tagged can be replaced with ambiguity classes. These nominal features are binarised to act as input to the SVM in the usual way: a nominal feature with k possible values is represented by k binary features, each of which is true when the original feature takes one particular value and false otherwise. Thus we have features along the lines of preceding POS bigram is (DT,JJ). The accuracy reported for SVMTool is 97.16% for all tokens and 89.01% for unknown tokens.
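The binarisation of a nominal feature described above can be sketched as follows. The feature name and the list of possible values are illustrative assumptions only.

```python
def binarise(feature_name, value, possible_values):
    """Expand one nominal feature with k possible values into k binary features."""
    return {f"{feature_name}={v}": (v == value) for v in possible_values}


# Hypothetical value inventory for the "preceding POS bigram" feature.
bigram_values = ["DT,JJ", "DT,NN", "IN,DT"]
binary_features = binarise("prev_pos_bigram", "DT,JJ", bigram_values)
```

Exactly one of the k binary features is true for any given instance, here the one corresponding to the bigram (DT,JJ).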

2.4.3 Maximum Entropy POS Tagging

A successful approach to POS tagging in the Maximum Entropy (ME) framework is described in Ratnaparkhi (1996). This is a probabilistic approach to classification tasks based on the Principle of Maximum Entropy, which states that when choosing between a number of different probabilistic models for a set of data, the most valid model is the one which makes the fewest arbitrary assumptions about the nature of the data.

The ME approach was shown to be effective on a range of NLP problems by Ratnaparkhi (1998). It uses a set of binary features¹² and pairs each of these with each possible output label. In the training stage each feature-label pair is assigned a weighting (using any of a number of algorithms) so that the entropy of the observed data is maximised, and in the classification stage these weightings are multiplied with the corresponding feature to estimate the probability of a label for a particular instance. Finally, the highest probability label is selected.
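The classification stage can be sketched as below, assuming the multiplicative form in which the weights of all active feature-label pairs are multiplied together and then normalised over the labels. The weights and feature names are invented for illustration and are not taken from any trained model.

```python
def me_probabilities(weights, active_features, labels):
    """Multiply the weights of the active (feature, label) pairs and normalise."""
    scores = {}
    for label in labels:
        score = 1.0
        for feature in active_features:
            # Pairs without a learned weight are assumed neutral (weight 1.0).
            score *= weights.get((feature, label), 1.0)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical weights for a token such as "based" following a VBZ.
weights = {
    ("word=based", "VBN"): 3.0,
    ("word=based", "JJ"): 1.5,
    ("prev_pos=VBZ", "VBN"): 2.0,
}
probs = me_probabilities(weights, ["word=based", "prev_pos=VBZ"], ["VBN", "JJ", "VBD"])
best_label = max(probs, key=probs.get)
```

The label with the highest normalised score is the one the tagger would output for the instance.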

The features in this case are similar to the features used by the taggers described in Sections 2.4.1 and 2.4.2, depending on tokens in a context window of size five. In Ratnaparkhi (1996) the discriminating elements are the current word and the two words on either side, and the two preceding POSs which have already been assigned.

Toutanova and Manning (2000) adapt Ratnaparkhi’s approach by removing the lexical features based on preceding words and, as we saw in Section 2.2, by adding a number of other hand-tuned features derived from a larger context window to assist in disambiguation of problematic words. We identify their tagger, which uses the Improved Iterative Scaling algorithm (Malouf 2002) for parameter estimation, by StanME. These optimisations bring the accuracy for all/unknown words from the baseline of 96.76%/84.5% to 96.86%/86.91%.

12 In ME literature such as Ratnaparkhi (1998) the term contextual predicate (CP) corresponds to what is called feature in most machine-learning literature, with feature reserved for a distinct but related structure dependent on CPs. For consistency we retain the more widespread usage of feature throughout.


Chapter 3

Methodology

3.1 General Outline

Previous approaches to corpus-based POS-tagging have tended to focus on applying new algorithms to an existing task (Brill 1995; Ratnaparkhi 1996; Daelemans et al. 1996; Nakagawa et al. 2001) or, in a few cases, on increasing accuracy by adding additional features to the common core of features used by almost all taggers (Toutanova and Manning 2000). This previous work has for the most part used the Penn Treebank in unaltered form as the source of training and test data. In contrast, here we take an alternative approach: we keep the algorithms and features fixed and investigate the effect of treating the tagset as variable.

We investigate whether we can improve the information available to POS taggers by introducing a finer-grained and therefore more informative tagset. If we subdivide the tags in a linguistically sensible way, the extra information provided may help the tagger make syntactic generalisations which are not apparent either from the coarse POS tags or from the sparsely populated lexical feature vector. Effectively, we are attempting to create a tagset which is more computationally tractable, so the tagger can more effectively disambiguate ambiguous and unknown words using POS-based contextual information.

In creating a finer-grained tagset, we are almost inevitably increasing its linguistic utility; we do not plan to introduce distinctions which have no basis in linguistic reality. We have already mentioned in Section 2.3 that there is often tension between the requirements of linguistic utility and computational tractability. This work is thus an investigation into whether they always conflict, i.e. whether there is some set of linguistically useful modifications which also increase computational tractability.

However, it is clear that a more complex tagset could make the baseline tagging task more difficult due to a potentially increased number of tags for a given word. This means that any performance improvements achieved using a modified tagset could be obscured by decreases in performance caused by increased ambiguity elsewhere. Therefore, to enable comparison with previous work we will adopt the architecture shown in Figure 3.1. For a given set of tagset modifications, the testing cycle is as follows: we map the tag of each token in the training data appropriately to a particular new version of the tagset, run the trained tagger over a test corpus, and for purposes of comparison map the finer-grained POS-tags back to the original Penn Treebank tags before evaluating performance. This method means that any increased linguistic utility of the mapped tags is discarded


Figure 3.1: The experimental architecture

before evaluation, but for the purposes of this experiment the linguistic utility is a means for improving tagger performance rather than an end in itself.

To facilitate the final stage of mapping the tags back to original Penn tags, we place certain restrictions on allowable modifications: the mapping function must either be injective from the old to the new tags (i.e. each tag in the new tagset corresponds to at most one in the original tagset), or any distinctions which are collapsed must be unambiguously recoverable from the wordform so the equivalent tags from the original tagset can be determined reliably.

3.2 Motivation

It is worth asking here why a small performance improvement is worth striving for. By NLP standards, accuracy of ∼97.0% seems astoundingly high, raising the question of whether there is any point in attempting to raise this figure by a few fractions of a percent. However, according to word-by-word evaluation metrics, POS tagging is actually quite a simple task – as noted by Charniak et al. (1993), the unigram-based most-likely tag (MLT) baseline for the task is around 91%.

In getting a clearer understanding of POS tagger evaluation, it is worth considering that POS tagging is generally a pre-processing phase in NLP, which acts as input to a second stage such as sentence-level parsing. If we look at sentence-level accuracy (i.e. the proportion of sentences in which all tokens are correctly tagged) the POS tagging task seems harder – with an average sentence length of ∼24 words and assuming errors occur independently we would expect a tagger which gives 97% accuracy over word tokens to achieve 49% at the sentence level, while the corresponding figure for a tagger performing at 98% is 62% of sentences tagged correctly.
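Under the independence assumption, the expected sentence-level accuracy is simply the token-level accuracy raised to the power of the sentence length. The sketch below uses a length of exactly 24 tokens, which is an assumption: the text gives the average only approximately, and with exactly 24 tokens the 97% case comes out slightly above 48%, so the quoted 49% presumably reflects the unrounded average length.

```python
def expected_sentence_accuracy(token_accuracy, sentence_length):
    """Probability that every token in a sentence is tagged correctly,
    assuming tagging errors occur independently of one another."""
    return token_accuracy ** sentence_length


acc_97 = expected_sentence_accuracy(0.97, 24)
acc_98 = expected_sentence_accuracy(0.98, 24)
gain = acc_98 - acc_97  # sentence-level gain from one point of token accuracy
```

A one-point gain in token accuracy therefore translates into a gain of well over ten points at the sentence level.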

There are a number of reasons to believe there is still room for improvement in tagging


accuracy. As noted in Brill and Wu (1998), there is a high degree of complementarity in errors made by maximum entropy and TBL-based taggers (among others), suggesting that even though these taggers use similar contextual features, the differences in the way these features are combined result in errors over different words. This implies that at least some of the time, there is sufficient information available, but that the different underlying algorithms fail to apply it correctly in certain cases.

3.3 Experimental Setup

The tag-mapping module is implemented in Python using the Natural Language Toolkit (Loper and Bird 2002) and can conditionally map the POS tag of any token in the Penn Treebank dependent on a conjunctive or disjunctive combination of lexical, syntactic or collocational features.

The lexical features, as we have already seen, are those dependent on the wordform itself. The typical use of these is to supply a list of wordforms mapped to a new POS. Syntactic features are those derived from the parsed annotation of the corpus; more specifically, they are dependent on the phrasal categories of nearby nodes: parent, grandparent, immediately preceding or following siblings or all preceding or following siblings. Collocational features are based on relative sentential position with no regard to tree structure. In this case, they are determined by the two preceding and following POS unigrams.

It is worth emphasising that using extra information sources, including syntactic information derived from possibly very distant elements in the sentence, does not represent a departure from the mainstream methods of POS-tagging in NLP using only local context. These features are purely being used to filter the POS tags in a preprocessing stage in ways we hope more accurately reflect underlying patterns of regularity. While we are modifying the data so that it is annotated in a way which is hopefully more conducive to accurate tagging, the tagging algorithms themselves remain unchanged – this method does not give us an unfair advantage by using syntactic information in the tagging process itself.

To illustrate the extraction of feature values, let us consider the parse tree previously shown in Section 1.2.4 and repeated in Figure 3.2 for convenience. The values of these features for each of the tokens are shown in Table 3.1. The only lexical feature is Wordform; syntactic features are all of the Sib and Par features, and the collocational features are the NextPOS and PrevPOS features. Note that the preterminal POS label of the target token is ignored, while the POS labels of adjacent tokens are treated like any other non-terminal node. This was the most sensible way to avoid redundancy in the features.

We could use these features, for example, to map tags to create a class corresponding to transitive prepositions which we could denote IN-TR, as distinct from subordinators (discussed in Section 2.3.2). The condition for this mapping would be [POS=IN & RightSib=NP] (making use of a syntactic feature in this case). With lexical features we might create a new class BEP representing the verb to be in the finite base form (i.e. present tense, non-third person singular) which is equivalent to the C5 tag VBB¹

1 This would in general be used in conjunction with a suite of corresponding modifications to create


(S
  (NP-SBJ (DT Both) (NNP Clarcor) (CC and) (NNP Anderson))
  (VP (VBP are)
    (VP (VBN based)
      (PP-LOC-CLR (IN in)
        (NP
          (NP (NNP Rockford))
          (–CM– ,)
          (NP (NNP Ill)))))))

Figure 3.2: A parse tree annotation from the Penn Treebank repeated (with minor notational changes) from Section 1.2.4 for convenience

using the condition [POS=VBP & Wordform ∈ {am, are}]. If we applied both of these mappings, the sentence shown in Figure 3.2 would be rendered by the tag mapping module as shown in Figure 3.3. Sentences with modified tags such as this would be used as training data.
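The two example conditions can be expressed as a small mapping function. This is a minimal sketch and not the thesis's actual NLTK-based module: the dictionary representation of a token and the function name `map_tag` are assumptions, the feature values are those listed for are, based and in in Table 3.1, and the new tag name BEP follows the running text.

```python
def map_tag(token):
    """Return the (possibly finer-grained) tag for one token, given its features."""
    pos = token["POS"]
    # Syntactic condition: [POS=IN & RightSib=NP] -> transitive preposition.
    if pos == "IN" and token.get("RightSib") == "NP":
        return "IN-TR"
    # Lexical condition: [POS=VBP & Wordform in {am, are}] -> finite base form of "to be".
    if pos == "VBP" and token["Wordform"] in {"am", "are"}:
        return "BEP"
    return pos  # every other token keeps its original Penn tag


tokens = [
    {"Wordform": "are", "POS": "VBP", "RightSib": "VP"},
    {"Wordform": "based", "POS": "VBN", "RightSib": "PP-LOC-CLR"},
    {"Wordform": "in", "POS": "IN", "RightSib": "NP"},
]
mapped = [map_tag(t) for t in tokens]
```

Only the tokens whose features satisfy a condition are remapped; all others pass through unchanged.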

Both  Clarcor  and  Anderson  are  based  in     Rockford  ,  Ill
DT    NNP      CC   NNP       BEB  VBN    IN-TR  NNP       ,  NNP

Figure 3.3: A sample sentence with some mapped tags (in bold)

There is of course a corresponding inverse mapping module enabling the tags of the test data to be mapped back to the original tags for comparison with the gold-standard. As described above, this ensures our results are not penalised by the possibly increased difficulty of assigning tags in a more complex tagset. At the same time, we will still be able to see any changes which, due to the more specific contextual information, have improved accuracy over the original tags.

An example of the experimental process is illustrated in Figure 3.4. To use the gold standard sentence shown as test data, the tags would first be stripped off. Following this, a tagger trained on training data of the sort exemplified by Figure 3.3 might assign tags as seen in the second row of Figure 3.4. These would be mapped back to the original Penn Treebank tagset using the inverse mapping module, producing the tagged version shown on the final line. Comparison with the gold-standard would reveal one tagging error in the sentence.
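The inverse mapping and comparison step for the sentence in Figure 3.4 can be sketched as follows. The lookup table `INVERSE` is an illustrative assumption covering only the new tags mentioned in this section; it is not the thesis's inverse mapping module.

```python
# Hypothetical inverse mapping: each new tag corresponds to one original Penn tag.
INVERSE = {"IN-TR": "IN", "BEZ": "VBZ", "BEP": "VBP"}


def unmap_tag(tag):
    """Map a finer-grained tag back to its original Penn Treebank tag."""
    return INVERSE.get(tag, tag)


words = ["IBM", "is", "based", "in", "Armonk", "N.Y"]
gold = ["NNP", "VBZ", "VBN", "IN", "NNP", "NNP"]
predicted = ["NNP", "BEZ", "JJ", "IN-TR", "NNP", "NNP"]

restored = [unmap_tag(t) for t in predicted]
errors = [(w, g, r) for w, g, r in zip(words, gold, restored) if g != r]
accuracy = 1 - len(errors) / len(gold)
```

After unmapping, BEZ and IN-TR count as correct, and the only remaining mismatch is based, tagged JJ rather than VBN.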

other tags for the verb to be, such as BEZ for is, conditioned in a similar way


Wordform     are    based         in          Rockford     ,       Ill
POS          VBP    VBN           IN          NNP          –CM–    NNP
Par          VP     VP            PP-LOC-CLR  NP           NP      NP
GrandPar     S      VP            VP          NP           NP      NP
LeftSib      –      –             –           –            NNP     –CM–
AllLeftSib   ∅      ∅             ∅           ∅            {NNP}   {–CM–}
RightSib     VP     PP-LOC-CLR    NP          –CM–         NNP     –
AllRightSib  {VP}   {PP-LOC-CLR}  {NP}        {–CM–,NNP}   {NNP}   ∅
PrevPOS      NNP    VBP           VBN         IN           NNP     –CM–
PrevPOS2     CC     NNP           VBP         VBN          IN      NNP
NextPOS      VBP    VBN           IN          NNP          –CM–    –
NextPOS2     VBN    IN            NNP         –CM–         –       –

Table 3.1: Feature values for a selection of tokens given the parse tree in Figure 3.2, with the tag for comma listed as –CM– for clarity

Gold-standard sentence with original Penn tags:

    IBM  is    based  in     Armonk  N.Y
    NNP  VBZ   VBN    IN     NNP     NNP

The sentence as it might be tagged by our trained tagger:

    IBM  is    based  in     Armonk  N.Y
    NNP  BEZ²  JJ     IN-TR  NNP     NNP

The test sentence with tags mapped back to original Penn tagset:

    IBM  is    based  in     Armonk  N.Y
    NNP  VBZ   JJ     IN     NNP     NNP

Figure 3.4: An illustration of the testing process, showing the gold-standard sentence, the sentence as it might be tagged by a tagger trained on data such as that in Figure 3.3, and the tagged sentence with its tags converted back to the original tagset, revealing one error (based tagged JJ rather than VBN).

3.3.1 Data Sampling

An initial round of experimentation with the corpus divided into training, test and development sets as most commonly seen in the literature (i.e. sections 0–20 as a training set and sections 21–23 as development/test sets) revealed that the results were quite sensitive to the choice of development set. Since we were aiming for possibly quite small increments in accuracy, this meant there was a risk of overfitting to the data since any changes in accuracy could have been due to peculiarities of the particular data set – even with 200K words of test data, a global change of 0.05% corresponds to only 100 words, or much less over a specific POS.

To alleviate this problem we used five-fold cross-validation: the data is divided into five partitions and a complete training/test cycle involves five iterations. In each iteration, a different partition is held out as test data and the remainder are used to train the POS tagger. Five-fold cross-validation over sections 0 to 22 of the Penn Treebank WSJ corpus effectively gives a development set of one million words, the same size as the entire training/development corpus.

2 See footnote 1


Ratnaparkhi (1996) noted the presence of inter-annotator inconsistencies in the Penn Treebank, observing sharp changes in the distribution of certain POSs at boundaries which correspond to changes in annotator. To avoid effects caused by these discrepancies masking any meaningful corpus-wide generalisations, we also departed from the usual strategy of constructing the data partitions from contiguous sections of the corpus. Rather, we divided the corpus into units of five sentences each and assigned the first sentence of each group to the first partition, the second to the second partition and so on. As noted by Manning and Schutze (1999:208) (albeit for the distinct but related task of n-gram language modelling), this method tends to inflate performance figures; however, we are purely looking for differences in accuracy relative to the baseline, so this is not a problem.
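The interleaved partitioning and the resulting cross-validation cycle can be sketched as below. The function names are illustrative, and sentences are represented as opaque items; the actual corpus reading and tagger training are omitted.

```python
def round_robin_partitions(sentences, k=5):
    """Assign sentence i to partition i mod k, so each group of k consecutive
    sentences contributes one sentence to every partition."""
    partitions = [[] for _ in range(k)]
    for i, sentence in enumerate(sentences):
        partitions[i % k].append(sentence)
    return partitions


def cross_validation_folds(partitions):
    """Yield (train, test) pairs, holding out each partition once."""
    for held_out, test in enumerate(partitions):
        train = [s for j, part in enumerate(partitions) if j != held_out for s in part]
        yield train, test


sentences = list(range(12))  # stand-ins for twelve consecutive corpus sentences
partitions = round_robin_partitions(sentences)
folds = list(cross_validation_folds(partitions))
```

Every sentence is used as test data exactly once across the five iterations, and each partition samples the whole corpus rather than one contiguous stretch.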

3.3.2 A Data-Driven Alternative

Our primary goal here is to apply linguistic intuition to the task of tagset modification. We also pursue an alternative, more data-driven line of investigation: we investigate whether in a separate stage to training the taggers, we can use machine learning techniques to determine useful subdivisions (or clusters in machine-learning terms) in the tagsets corresponding to patterns of syntactic regularity. The approach is almost identical to the primary approach described above, but replaces linguistic insight with machine learning techniques.

We defined a range of features which could help in determining patterns of syntactic regularity. Some of the features were syntactic, often corresponding to layers of annotation used by Klein and Manning (2003): phrasal categories of the parent, grandparent, left sibling and right sibling, and binary-valued features for whether a given preterminal corresponds to a phrasal head,³ or whether it is the only element in its phrase. There was also a set of collocational features corresponding more closely to the features available to the tagger, based on the two preceding and two following POS tags.

The approach we took was to conflate the nominal feature values extracted for each token by word type and construct a frequency distribution of the values of each feature for each word type. For example if we had as input sentences the sentences shown in Figures 3.2 and 3.4 (naturally we would have access to the original unmodified gold-standard parse tree which for Figure 3.4 is not shown), and we were using the features Par and PrecPOS, we would construct a number of frequency distributions, including the following for based/VBN:

    Par:      VP   2
    PrecPOS:  VBZ  1
              VBP  1

The frequency distribution for each feature with n non-zero values was then converted into a set of n numeric features for the word type using maximum likelihood estimation. So if we specify f syntactic or collocational features, the result of this is f groups of features for each type, derived from f probability distributions composed of individual

3 A phrasal head in English is the rightmost non-terminal in a phrase with a category which corresponds to the phrasal category, e.g. the rightmost NN in an NP. These are regarded by linguists as being particularly significant components of a phrase.


features with values corresponding to the relative frequencies of these particular values for the feature. For the above example, the corresponding set of derived features and values for based/VBN would be:

    Par is VP    PrecPOS is VBZ    PrecPOS is VBP
    1.0          0.5               0.5

This method of combining feature values was the most principled way we could find of capturing a large amount of distributional information manageably. These feature values were then used as input for the implementation of the EM algorithm⁴ in the Weka toolkit (Witten and Frank 2000).
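The conversion from per-token feature values to per-type relative frequencies can be sketched as follows. The triple-based input format and the function name are assumptions made for illustration; the observations reproduce the based/VBN example above.

```python
from collections import Counter, defaultdict


def distributional_features(observations):
    """Turn (word_type, feature, value) observations into relative-frequency
    features per word type (maximum likelihood estimates)."""
    counts = defaultdict(Counter)
    for word_type, feature, value in observations:
        counts[(word_type, feature)][value] += 1

    features = defaultdict(dict)
    for (word_type, feature), distribution in counts.items():
        total = sum(distribution.values())
        for value, n in distribution.items():
            features[word_type][f"{feature} is {value}"] = n / total
    return features


# One observation set per occurrence of based/VBN (Figures 3.2 and 3.4).
observations = [
    ("based/VBN", "Par", "VP"), ("based/VBN", "PrecPOS", "VBP"),
    ("based/VBN", "Par", "VP"), ("based/VBN", "PrecPOS", "VBZ"),
]
derived = distributional_features(observations)["based/VBN"]
```

Each nominal feature contributes one numeric feature per observed value, and the values within a feature group sum to one.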

3.3.3 Evaluation Metrics

Perhaps the most intuitively obvious method of evaluation in classification tasks is the accuracy: the fraction of samples (in this case word tokens) which the algorithm classifies correctly according to the gold standard – i.e. the number of correct classifications divided by the total number of samples. For comparison with previous work we use the global token-level accuracy metric since it is the most widely-used metric in tagging research. The token-level accuracy over unknown words (i.e. those which did not appear in the training data) is also crucial since this is a major source of tagging errors – in our baseline with an unmodified tagset, just 2.4% of the tokens in the test data were unknown but they contributed 11–13% of errors. Another metric we will use which is less widespread is the sentence-level accuracy metric described in Section 3.2. As noted, it is valuable as a reflection of likely problems in downstream applications.

However, in a classification problem it is also often instructive to look at performance over individual classes, since two classifiers which produce identical global accuracy figures could in fact have vastly different distributions of errors between different classes. In this case, accuracy is often less interesting, as the figures tend to be dominated by the large number of true negatives. If we instead focus on a particular target class and denote the number of true positives (i.e. those samples correctly classified with the target label) by pt, false negatives (erroneously classified into a non-target class) by nf, and true negatives and false positives by nt and pf respectively, we can gauge the number of true positives relative to false positives and false negatives using precision (P), recall (R) and F-score (F), which are defined by:

\[ P = \frac{p_t}{p_t + p_f}, \qquad R = \frac{p_t}{p_t + n_f}, \qquad F = \frac{2PR}{P + R} \]

The precision corresponds to the fraction of positive classifications which were actually correct, and the recall corresponds to the fraction of true samples which were classified positively, while the F-score combines the two using the harmonic mean, which heavily penalises a system with a low score for either P or R. Where it seems relevant, we will look at precision, recall and F-score over individual POSs to highlight points of interest observable from variations in results which are not evident from the broad metrics discussed above.
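The per-class metrics can be computed directly from aligned gold and predicted tag sequences. The sketch below is illustrative: the function name and the toy tag sequences are assumptions, and the convention of returning 0.0 when a denominator is zero is a choice made here, not one stated in the text.

```python
def precision_recall_f(gold, predicted, target):
    """Precision, recall and F-score for one target POS."""
    pairs = list(zip(gold, predicted))
    pt = sum(g == target and p == target for g, p in pairs)  # true positives
    pf = sum(g != target and p == target for g, p in pairs)  # false positives
    nf = sum(g == target and p != target for g, p in pairs)  # false negatives
    precision = pt / (pt + pf) if pt + pf else 0.0
    recall = pt / (pt + nf) if pt + nf else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["VBN", "VBN", "JJ", "VBN", "NN"]
predicted = ["VBN", "JJ", "VBN", "VBN", "NN"]
p, r, f = precision_recall_f(gold, predicted, "VBN")
```

True negatives play no part in any of the three figures, which is what makes them more informative than accuracy for a single class.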

4 In fact the implementation included a wrapper which ran the algorithm multiple times, iteratively increasing the number of clusters until the log-likelihood decreased, avoiding arbitrary selection of the number of clusters.


Baseline

As well as showing the benchmark of accuracy achievable with an unmodified tagset, for a point of comparison we will apply a suite of naively conceived modifications to act as a baseline indicating how far it is possible to get with a simplistic approach. The idea is borrowed from POS induction, which involves determining word clusters (i.e. POSs) from unannotated data. The task here is similar except that we are looking for patterns of regularity within a particular POS, so the baseline used by Clark (2003) may be informative. To subdivide a POS into n subclasses, we assign each of the (n − 1) most frequently seen word tokens from the class into (n − 1) separate new classes and the remainder to a final subclass. For example, it can be determined empirically that the most frequently observed token with class PRP is it. So, for n = 2 over the PRP class we create a mapping (using lexical features) assigning it to one class and all other members of PRP to another (or equivalently leave them in the original).
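The frequency-based subdivision can be sketched as follows. The function name, the toy token list and the naming scheme for the new subclasses (PRP-1 and so on) are assumptions for illustration; wordforms are compared exactly as they appear, with no case folding.

```python
from collections import Counter


def naive_subdivision(tagged_tokens, pos, n):
    """Map each of the (n - 1) most frequent wordforms of a POS to its own new
    subclass; all other members keep the original tag as the final subclass."""
    counts = Counter(word for word, tag in tagged_tokens if tag == pos)
    frequent = [word for word, _ in counts.most_common(n - 1)]
    return {word: f"{pos}-{i + 1}" for i, word in enumerate(frequent)}


def apply_subdivision(tagged_tokens, pos, mapping):
    """Retag the tokens of the given POS according to the wordform mapping."""
    return [
        (word, mapping.get(word, tag) if tag == pos else tag)
        for word, tag in tagged_tokens
    ]


corpus = [("it", "PRP"), ("he", "PRP"), ("it", "PRP"), ("they", "PRP"), ("dog", "NN")]
mapping = naive_subdivision(corpus, "PRP", n=2)
retagged = apply_subdivision(corpus, "PRP", mapping)
```

With n = 2, only the single most frequent wordform is split off, mirroring the PRP example with it.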

There are a number of reasons why we might expect such a baseline to show reasonable performance. Often the most frequent members of the class will show a large degree of syntactic irregularity compared to other members of their class. For example in the determiner class (DT), the two most frequent lexical items are the and a, which correspond to what grammarians call articles and are unique among the determiners in being unable to stand alone as a grammatical noun phrase, as is demonstrated by oppositions such as I like that versus *I like the. Additionally, these most frequent lexical items are those which will be most resistant to problems of data sparseness which we often introduce in creating new categories.


Chapter 4

Experimental Evaluation

There is a very large range of potential modifications to any tagset; it was therefore necessary to adopt an approach which involved tentatively testing as wide a range of modifications as possible but only running a complete test cycle on the most promising modifications. This enabled a broad search space initially but also kept the investigation manageable in terms of CPU time.

We discuss several broad groups of modifications below: the baseline in Section 4.1, clustering in Section 4.2 and linguistically motivated modifications described in Section 4.3. For each of these, we adopted the following incremental prototyping architecture. We selected fnTBL (Ngai and Florian 2001) as our first stage prototyping tool for a set of tagset modifications, as it can complete a five-fold cross-validation test-cycle in under two hours. Any modification which had a large negative impact on performance at this stage was generally not investigated further, since the taggers use similar features, and we were attempting to find universally useful distinctions. The SVM tagger SVMTool 1.2.2 (Gimenez and Marquez 2004), with a turnaround of under seven hours, was used in subsequent experimentation. Only the Stanford NLP Maximum Entropy tagger (Toutanova and Manning 2000) (StanME hereafter) had a prohibitive training time, so for practical reasons was used minimally, for benchmarking and later-stage testing.

We have attempted to summarise as broad a range of modifications as possible within the space constraints, meaning that in some cases there were one or more subtly different variants of a mapping we cover here for which we do not report results, instead reporting only on the most successful variant. Additionally, a number of (generally less successful) modifications are not reported here.

In all of the tables in the results section we quote figures for accuracy over all tokens to five significant figures. Even though these figures are derived from over one million tokens, the comparatively large number of significant figures probably implies an unjustified degree of confidence in these numbers; however, we quote this many figures in order to make visible the comparatively small changes in performance we were expecting. The final digit in particular should not be interpreted as a confident reflection of performance but rather as a rough indication. Other figures over which we expect more variability and which are derived from a smaller number of samples are given to four significant figures.

In each table, as well as showing the token and sentence-level accuracy figures, we also show a selection of changes in F-score over individual POSs relative to the benchmark. To keep the number of figures manageable, we determined the most significant changes using


                 TBL      SVM      MaxEnt
All Tokens       96.842   96.852   97.056
Unknown Tokens   81.94    84.62    87.34
Sentences        51.77    50.72    53.72

Table 4.1: Accuracy (%) of off-the-shelf taggers with default parameters

the paired t-test (Witten and Frank 2000), which estimates the statistical significance of differences between two tests using the figures for each fold of cross-validation. In each table, we show the two most significant negative F-score changes, and the two most significant positive changes, excluding uninteresting POSs: punctuation marks, as well as FW, which occurs so infrequently that it is almost impossible to tag correctly, meaning that any changes are almost certainly due to noise, despite the confidence of the paired t-test.
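A paired t-statistic over per-fold figures can be sketched as below. This assumes the standard textbook form of the paired t-test; the per-fold F-scores are invented for illustration, and the exact variant used in the experiments may differ.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t-statistic for the per-fold differences scores_a[i] - scores_b[i]."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample stdev / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Hypothetical F-scores for one POS over the five cross-validation folds.
benchmark = [92.1, 92.4, 92.0, 92.3, 92.2]
modified = [92.5, 92.6, 92.3, 92.8, 92.4]
t = paired_t_statistic(modified, benchmark)
```

With five folds there are four degrees of freedom, so a two-sided test at the 5% level requires |t| to exceed roughly 2.776. Because the test is paired, a small but consistent change across folds can register as highly significant, which is why a rare class such as FW can score well despite being noise.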

4.1 Benchmark and Baseline

As a benchmark, in Table 4.1, we show the global accuracy figures for each of our taggers as run with default or recommended parameter settings using the experimental setup described in Section 3.3 (i.e. five-fold cross-validation over sections 0–21 of the Penn Treebank) with no mapping function applied to any of the tags. F-scores over specific POSs from the same experiment are shown in Table 4.2, for reference in subsequent tables which quote changes relative to these.

In Table 4.3 we show the results of our baseline test (as described in Section 3.3.3) of naive frequency-based clustering over a selection of POSs judged to potentially benefit from such mappings, with further testing using the more CPU-intensive taggers conducted for the most successful modifications, as described above. Closed classes are intuitively more likely to exhibit syntactic irregularities that could be captured by such a method; however, a notable exception is the open class RB. Of all of the open classes, RB stands out as a disparate class (a “dustbin class” according to Crystal (1987:92)) often used to contain words which do not obviously belong in any other class. It is thus composed of wordforms with widely differing syntactic functions such as not, as and home, and should be an ideal candidate for modification in order to tease apart these distinct functions. We will make use of this fact more extensively in Section 4.3 but for the moment we simply note this as a motivating factor for further investigation of naive methods over the RB class. In addition, we show results for the closed classes IN, PRP and DT.

These results are instructive for a number of reasons. The first point to note is that even from these few data points it is clear that there is no simple correlation (either positive or negative) between the number of new classes introduced and the accuracy. There seems to be an optimal number of classes in many cases which produces performance very close to the benchmark, and having more or fewer classes degrades performance. The fact that the best of the modifications produces very similar performance to the baseline is a point of interest we will return to later.


        TBL            SVM            MaxEnt
POS     All    Unk     All    Unk     All    Unk
EX      97.38  –       97.26  –       97.05  –
IN      98.36  –       98.43  –       98.21  48.48
JJ      92.22  80.84   91.66  76.01   92.31  82.44
JJR     88.41  41.44   87.55  34.86   87.46  58.65
JJS     95.46  70.33   93.26  73.87   94.77  78.35
NN      96.12  73.75   96.13  71.79   96.48  78.99
NNP     96.29  89.65   97.01  88.57   96.96  92.35
NNPS    62.62  19.65   65.6   20.78   59.44  44.62
NNS     97.47  88.79   97.73  85.77   98.07  91.32
PDT     76.48  –       75.18  –       68.82  –
RB      92.8   85.63   92.57  82.29   92.32  88.62
RBR     71.86  –       70.47  –       68.37  –
RBS     86.04  –       78.55  –       82.58  –
RP      75.72  –       77.04  –       76.37  –
VB      95.65  82.27   95.42  79.71   96.27  88.41
VBD     95.46  75.25   95.06  72.25   96.32  82.34
VBG     93.28  86.42   91.79  79.21   93.08  87.76
VBN     89.56  75.54   88.84  74.00   90.33  80.25
VBP     93.06  44.44   92.97  55.46   94.16  68.31
VBZ     97.61  75.22   97.03  66.75   97.78  83.35

Table 4.2: Benchmark F-Score (%) over 1,047K tokens of text, for selected POSs

4.2 Clustering

This round of experimentation concerns the clustering approach described in Section 3.3.2. Two different sets of features were used as input to the clustering algorithm in order to determine slightly different patterns of regularity. One trial used collocational information derived from preceding and following POSs only. Since there is no more information here than what is already available to the taggers, this effectively acts as a preprocessing stage to determine distributional similarities that the taggers had not noted explicitly. The alternative approach used the same collocational information and added to this the syntactic information, such as parent and sibling node labels from the parse tree annotation of the corpus, in an attempt to capture deeper syntactic regularities which could facilitate accurate tagging.

In each case, the clustering algorithm was run over a range of intermediate-sized classes: DT, IN, JJR, JJS, PRP, RB, RBR, RBS and RP, ignoring classes containing just a handful of items (such as EX) or a large number of items (such as VBN).

A qualitative examination of the output was used to select candidates for further testing. A possible modification could be rejected for a number of reasons. One reason was if it reflected transparently an easily reproducible pattern of syntactic regularity but as an imperfect version of a manually created mapping used elsewhere. For example, the


             Accuracy                Most Significant F-score Changes from Benchmark (%)
POS:n  Alg   All     Unk    Sent     Negative                      Positive
Bench  TB    96.842  81.94  51.77    –             –               –             –
       SV    96.852  84.62  50.72    –             –               –             –
       ME    97.056  87.34  53.72    –             –               –             –
IN:2   TB    96.823  81.40  51.41    JJ_U:−1.75    VBP_A:−0.20     RP_A:+0.38    RB_U:+1.75
IN:3   TB    96.817  81.78  51.28    JJ_A:−0.10    VBN_A:−0.26     VBZ_A:+0.10   VBG_U:+1.21
IN:4   TB    96.819  81.57  51.55    NNS_U:−0.44   JJS_A:−0.26     PDT_A:+0.12   NNPS_U:+23.60
DT:2   TB    96.806  81.61  51.13    RB_A:−0.09    NNP_A:−0.05     PRP$_A:+0.01  VBG_U:+1.66
DT:3   TB    96.813  81.78  51.38    PDT_A:−2.07   RB_A:−0.11      VBG_U:+1.47   VBG_A:+0.12
DT:4   TB    96.806  81.64  51.28    CD_A:−0.05    JJ_U:−1.12      VBG_U:+1.40   VBD_U:+1.96
PRP:2  TB    96.839  81.81  51.69    VB_A:−0.12    EX_A:−0.26      JJR_A:+0.40   VBG_A:+0.18
       SV    96.851  84.60  50.67    VBN_A:−0.04   NNS_U:−0.17     VB_A:+0.02    CC_A:+0.01
       ME    97.050  87.32  53.63    VBN_A:−0.05   NN_U:−0.13      VBD_U:+0.36   VB_A:+0.02
PRP:3  TB    96.830  81.95  51.57    CD_U:−0.44    CD_A:−0.04      NNP_U:+0.35   RB_U:+1.95
PRP:4  TB    96.838  81.71  51.60    VBD_U:−1.74   CD_U:−0.32      PRP_A:+0.01   VBP_A:+0.13
RB:2   TB    96.834  81.67  51.49    VBN_A:−0.15   JJ_A:−0.04      RBR_A:+0.91   RP_A:+0.60
RB:3   TB    96.843  81.73  51.72    VB_U:−0.75    NNPS_A:−1.71    VBG_A:+0.20   POS_A:+0.04
       SV    96.855  84.67  50.71    VB_A:−0.01    CD_A:−0.00      VBP_A:+0.04   NN_A:+0.01
       ME    97.056  87.28  53.72    DT_A:−0.00    JJR_A:−0.03     NNS_A:+0.01   VB_A:+0.02
RB:4   TB    96.831  81.56  51.62    WDT_A:−0.17   VBN_U:−2.36     RP_A:+0.32    VBZ_A:+0.07

Table 4.3: Accuracy (%) over all tokens (All), unknown tokens (Unk) and sentences(Sent), and largest changes in F-score over specific POSs (subscript A and U denote alland unknown tokens, respectively) for modifications described in Section 4.1 with naivelysubdivided POSs

clustering for PRP using all features approximately reproduced the distinction between nominative pronouns such as he, accusative pronouns such as them and reflexive pronouns such as ourselves, with some irregularities such as dividing the reflexive pronouns into several different clusters. Additionally, a clustering would be ignored if it produced only one cluster or, in the case of larger classes such as RB, if there was no clear pattern of regularity in the data and the observed distinctions seemed largely random.

As described above, we selected the more promising results on the basis of experimentation with fnTBL for further investigation. The results are shown in Table 4.4, where a clustering within a POS is identified by the POS followed, in parentheses, by “cl” (for “clustering”) and one or more identifiers for the feature set(s) used, where “c” denotes “collocational” and “s” denotes “syntactic”.

Some interesting points should be noted about the clustering results. The performance of the RP cluster² is very similar to that of the benchmark; however, closer examination reveals that the clusters correspond to the frequent lexical items in RP versus the infrequent items, many of which occur only once in the data and are likely to be annotation errors. Thus, the clustering is little more than a form of data cleaning, and it is probably unsurprising that the results are very similar to the benchmark. Additionally, the slight performance improvements which the in(cl–c) clustering produces in some cases are probably due to overfitting, a result of using a set of features very similar to those available to the tagger and extracting these features from the testing data; the clusters produced show no discernible patterns of regularity. To show that there was genuine improvement we could test this using the held-out test data (Sections 22-23 of the WSJ), which until now has been unused; however, we will assume the result is not significant.

² Empirically we found that in this case the same cluster was produced both by using collocational features only and by using both collocational and syntactic features.

Mapping      Alg  Accuracy (%)           Most Significant F-score Changes from Benchmark
                  All     Unk    Sent    Negative                    Positive
Benchmark    TB   96.842  81.94  51.77   –             –             –             –
             SV   96.852  84.62  50.72   –             –             –             –
             ME   97.056  87.34  53.72   –             –             –             –
in(cl–c)     TB   96.850  82.00  51.82   VB_U:−1.18    VB_A:−0.04    MD_A:+0.01    RB_A:+0.08
             SV   96.855  84.61  50.74   JJ_A:−0.01    VBN_A:−0.02   VBZ_A:+0.01   CC_A:+0.02
             ME   97.050  87.32  53.59   VBN_A:−0.04   JJS_A:−0.30   NNPS_U:+3.26  CC_A:+0.01
rp(cl–c)¹    TB   96.840  81.65  51.73   VBN_U:−3.62   VBN_A:−0.16   PRP_A:+0.01   VBZ_A:+0.08
             SV   96.852  84.65  50.73   CC_A:−0.00    RB_A:−0.02    VBD_U:+0.62   WDT_A:+0.03
             ME   97.053  87.30  53.68   NN_U:−0.25    NNPS_A:−0.38  CD_A:+0.00    VBD_U:+0.41
rbr(cl–c,s)  TB   96.822  81.77  51.41   NNP_A:−0.04   VBN_A:−0.07   VBG_A:+0.17   VBG_U:+2.19
in(cl–c,s)   TB   96.831  81.79  51.52   RB_A:−0.09    CD_U:−0.33    POS_A:+0.03   VBD_A:+0.07
             SV   96.865  84.64  50.90   RB_U:−0.24    NNPS_U:−3.23  RB_A:+0.05    IN_A:+0.06
             ME   97.065  87.32  53.78   WDT_A:−0.28   JJR_A:−0.08   CC_A:+0.01    RB_A:+0.11

Table 4.4: Accuracy figures over all tokens (All), unknown tokens (Unk) and sentences (Sent), and largest changes in F-score over specific POSs (subscripts A and U, written _A and _U, denote all and unknown tokens, respectively) for modifications described in Section 4.2

4.3 Linguistically Motivated Modifications

We show results here for a wide range of linguistically motivated modifications of the type outlined in Section 3.3.

4.3.1 Notational Conventions

We use the following conventions for the mnemonic names used to identify the different modifications to the tagset. Each “modification” refers to a mapping applied only to a specific POS. Where two or more mappings are used in combination, they are shown joined by a “+”. For modifications which simply subdivide an existing tag (the majority), the name of the modification is composed of two parts: the first shows the original POS and is separated by a long dash (“–”) from the second, which is an abbreviation reflecting the new distinction being made. Groupings of closely related modifications are identified by the affected POSs separated by a “/”. Mappings which (partially) collapse a distinction are identified by the original class and the target class separated by a colon. In cases where a similar distinction is conditioned in different tests by different types of features, these are disambiguated by a code in square brackets appended to the end of the name indicating the type of features, where [s] denotes “syntactic” and [l] denotes “lexical”. Additionally, lexical features may be further specified as [lm] or [ld], for “manually created” or “data-extracted” respectively, referring to the source of information used to create the lexical mappings, which is explained more fully in Section 4.3.5. For example, later we will see a tagset modification identified by vb–cop[lm]+rb–deg[s], indicating a manually created lexical mapping of the VB class in conjunction with a syntactically conditioned modification of the RB class.
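The naming conventions above are regular enough to be decoded mechanically. The following is a minimal sketch of such a decoder; the function name, the dictionary keys and the wording of the feature descriptions are hypothetical, and it handles only the conventions of this section (not, for instance, the POS:n names used for the naive subdivisions).

```python
import re

EN_DASH = "\u2013"  # the "long dash" used in the mnemonic names
FEATURE_CODES = {
    "s": "syntactic",
    "l": "lexical",
    "lm": "lexical (manually created)",
    "ld": "lexical (data-extracted)",
}
# A name body, optionally followed by a feature code in square brackets.
NAME_RE = re.compile(r"^(?P<body>[^\[\]]+)(?:\[(?P<code>[a-z]+)\])?$")


def parse_modification(name):
    """Split a combined name on '+' and describe each component mapping."""
    components = []
    for piece in name.split("+"):
        match = NAME_RE.match(piece.strip())
        if match is None:
            raise ValueError(f"unrecognised modification name: {piece!r}")
        body = match.group("body").replace("-", EN_DASH)  # tolerate plain hyphens
        code = match.group("code")
        if EN_DASH in body:    # subdivision of an existing tag
            pos, label = body.split(EN_DASH, 1)
            kind = "subdivide"
        elif ":" in body:      # (partial) collapse into a target class
            pos, label = body.split(":", 1)
            kind = "collapse"
        elif "/" in body:      # grouping of closely related modifications
            pos, label, kind = body, None, "group"
        else:
            raise ValueError(f"unrecognised modification name: {piece!r}")
        components.append({
            "kind": kind,
            "pos": pos.upper(),
            "label": label,
            "features": FEATURE_CODES.get(code),
        })
    return components
```

For the example name vb–cop[lm]+rb–deg[s], this yields two components: a manually created lexical subdivision of VB and a syntactically conditioned subdivision of RB.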

4.3.2 Syntactically-Conditioned Modifications of Closed Classes

Here we consider mappings which are conditioned on syntactic features and which affect words in closed classes. One obvious candidate modification is reversing the idiosyncratic conflation of prepositions such as in (which take NP (noun phrase) complements) and subordinating conjunctions such as because (which take S (clause) complements), i.e. the IN tag as described in Section 2.3.2. This could have been achieved lexically, by extracting a list of lexemes which frequently act as subordinators in the training data and mapping the tags of the tokens accordingly. However, the most successful and principled approach was to use syntactic features for each token and thus decide on a token-by-token basis. This captures the fact that certain words are ambiguous (using only word unigrams) between the two. For example, before can act as a preposition in He left before her and as a subordinating conjunction in He left before she got angry. We let the tagger resolve such ambiguities as appropriate. Two syntactic features were used to determine whether a given IN token is a subordinating conjunction (as distinct from a preposition): an SBAR parent node or an S immediate right sibling.³ This modification is designated in–sub[s].

A similar modification in this domain was another reversal of a Penn Treebank idiosyncrasy: the conflation, mentioned in Section 2.3.2, of the preposition to, which heads a prepositional phrase as in She went to university, with the infinitival to, which heads a non-finite verb phrase as in She likes to teach. According to Marcus et al.’s criteria, it would make more sense to disambiguate these distinct uses of to and group the prepositional use with the other prepositions under IN. The prepositional use can be distinguished by a right sibling with category NP or QP, while tokens which introduce an infinitive are left with the original tag TO. This modification, identified by to:in, was also run in combination with in–sub[s], since both were designed to label prepositions more consistently.
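The two syntactic conditions just described can be sketched over Penn-style bracketed trees as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the tiny bracket parser, the function names and the new tag name IN-SUB are all hypothetical, function tags such as -TMP are simply stripped before comparison, and both mappings are applied in a single pass.

```python
def parse_tree(text):
    """Parse one bracketed tree into nested lists [label, child, ...]; a leaf is [tag, word]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is the opening bracket
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return [label] + children

    return read_node()


def category(label):
    """Strip function tags and indices, e.g. NP-SBJ-1 -> NP (assumption)."""
    return label if label.startswith("-") else label.split("-")[0]


def remap(node, parent=None, right_siblings=()):
    """Return (word, tag) pairs with the in-sub[s] and to:in mappings applied."""
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):  # preterminal node
        tag = label
        if tag == "IN" and (parent == "SBAR" or right_siblings[:1] == ("S",)):
            tag = "IN-SUB"  # hypothetical name for the subordinator class
        elif tag == "TO" and any(s in ("NP", "QP") for s in right_siblings):
            tag = "IN"      # prepositional "to" joins the other prepositions
        return [(children[0], tag)]
    pairs = []
    for i, child in enumerate(children):
        rights = tuple(category(c[0]) for c in children[i + 1:])
        pairs.extend(remap(child, category(label), rights))
    return pairs
```

Applied to a tree for He left before she got angry in which before sits under SBAR, the token is relabelled IN-SUB, whereas in He left before her it keeps the tag IN.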

Both of these modifications are reversals of characteristics of the Penn tagset which were introduced with a particular intention in mind but are clear examples of its idiosyncrasies. As is clear from Table 4.5, neither of them achieved the objective of the experiment. However, the results for in–sub[s] in particular suggest that there is an argument for introducing this distinction in other NLP applications and retaining the mapped tags in the output, since while we do not gain anything in terms of performance, we also do

³ These are theoretically equivalent but were both included for robustness.

Mapping            Alg  Accuracy (%)           Most Significant F-score Changes from Benchmark
                        All     Unk    Sent    Negative                    Positive
Benchmark          TB   96.842  81.94  51.77   –             –             –             –
                   SV   96.852  84.62  50.72   –             –             –             –
                   ME   97.056  87.34  53.72   –             –             –             –
in–sub[s]          TB   96.842  81.76  51.63   NNP_A:−0.05   NNPS_A:−1.74  JJR_A:+0.25   VBG_A:+0.20
                   SV   96.855  84.65  50.77   RP_A:−0.49    EX_A:−0.10    IN_A:+0.02    NNP_A:+0.01
                   ME   97.048  87.29  53.64   WDT_A:−0.79   DT_A:−0.02    JJ_A:+0.04    NNS_A:+0.01
to:in              TB   96.833  81.67  51.60   IN_A:−0.01    NNPS_A:−0.82  VBZ_A:+0.10   VBG_U:+1.38
                   SV   96.855  84.51  50.72   VB_U:−5.48    JJ_A:−0.02    VBP_A:+0.05   VBP_U:+13.07
to:in + in–sub[s]  TB   96.834  81.88  51.54   VBD_U:−2.67   VBN_U:−2.59   WDT_A:+0.22   VBG_U:+1.68
                   SV   96.846  84.42  50.64   VB_U:−7.10    NN_U:−0.43    PRP$_A:+0.02  NNS_U:+0.16

Table 4.5: Accuracy figures over all tokens (All), unknown tokens (Unk) and sentences (Sent), and most significant changes in F-score over specific POSs (subscripts A and U, written _A and _U, denote all and unknown tokens, respectively) for modifications described in Section 4.3.2

not lose anything, and it is potentially useful in downstream applications. The to:in modification induces a small performance penalty but does at least result in a more consistent set of labels. Of course, neither of these modifications would be of much value if an analysis of errors in the finer-grained tags (i.e. without mapping the tags back to the original tagset) revealed that the classification was doing no better than chance. For example, if the tagger was not doing particularly well at distinguishing between to/TO and to/IN, then the distinction could introduce more problems than it solves through noise in the data.

4.3.3 Syntactically-Conditioned Modifications of Open Classes

The mappings we consider now are again syntactically conditioned but apply to open classes such as RB. One modification is based on the observation that in the baseline taggers, 5.8-6.4% of tagging errors were due to a gold-standard JJ (adjective) being tagged VBN (verb past participle) or vice versa, with a further 1.9-2.0% of errors due to the corresponding JJ/VBG (verb present participle) confusion. These distinctions are notoriously difficult to make in certain cases: in an isolated sentence such as She was offended, it is not possible even for a human to determine whether offended should be tagged as JJ or VBN; similarly, we cannot discriminate between JJ and VBG as the correct tag for entertaining in They were entertaining. However, we should be able to assist in discrimination by utilising the linguistic tests for distinguishing between the two:⁴ adjectives can be modified by “degree adverbs” such as very, while verbs cannot. Thus, the presence of a degree adverb should indicate unequivocally that the head word is an adjective. In practice there is no clear boundary between degree adverbs and the more common verb-modifying adverbs, so the approach we took here was to allow ambiguity of degree adverb membership and condition the tag mapping on syntactic features for each token: an RB with either an RB or JJ as its right sibling, or an ADJP (adjective phrase)

⁴ These are also recommended in the Penn Treebank tagging guidelines (Santorini 1990).

as parent was mapped to a degree adverb. This modification is denoted rb–deg[s].

In line with the observation mentioned in Section 4.1 that the adverb (RB) class stands out as particularly disparate, we attempted some similar modifications to different subsets of the RB class. The rb–loc[s] modification maps tokens tagged RB which have ADVP-LOC or ADVP-DIR as parent to a new class. Empirically these tend to be words such as here, home and abroad, which show quite different syntactic distributions from other adverbs (for example, they can occur as complement to be, as in He is abroad); however, due to the labelling conventions this also captures other adverbs such as locally, which show a distribution more like that of the larger class of non-locative adverbs. There is also a natural parallel in the temporal domain for adverbs such as now and sometimes, which we map to a new class when the tokens occur with ADVP-TMP as parent; we denote this mapping rb–tmp[s].
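The three RB mappings (rb–deg[s], rb–loc[s] and rb–tmp[s]) each test a small amount of tree context, which the sketch below assumes has already been extracted into a per-token record. The Token class, the function name, the short modification keys and the new tag names (RB-DEG, RB-LOC, RB-TMP) are hypothetical, and the right sibling is assumed to mean the immediate one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    word: str
    tag: str
    parent: str                   # full label of the parent node, e.g. "ADVP-LOC"
    right_sibling: Optional[str]  # label of the immediate right sibling, if any


def map_rb(token, modification):
    """Return the (possibly remapped) tag under one of 'deg', 'loc' or 'tmp'."""
    if modification not in ("deg", "loc", "tmp"):
        raise ValueError(f"unknown modification: {modification!r}")
    if token.tag != "RB":
        return token.tag
    parts = token.parent.split("-")
    cat, function_tags = parts[0], set(parts[1:])
    if modification == "deg":
        # Degree adverb: modifies a following RB or JJ, or sits inside an ADJP.
        if token.right_sibling in ("RB", "JJ") or cat == "ADJP":
            return "RB-DEG"
    elif modification == "loc":
        if cat == "ADVP" and function_tags & {"LOC", "DIR"}:
            return "RB-LOC"
    elif modification == "tmp":
        if cat == "ADVP" and "TMP" in function_tags:
            return "RB-TMP"
    return "RB"
```

Note that a token such as locally under an ADVP-LOC parent is also mapped to RB-LOC here, which mirrors the over-capture described in the text.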

Another series of modifications of this type relies on the fact that verbs tend to select for certain types of complements, so knowledge of characteristics of the verb can assist in disambiguating problematic words in the immediately following context. All of the modifications of this type are in fact composed of a suite of six modifications, one for each of the original Penn verb tags (VB, VBP, VBZ, VBD, VBN and VBG), all of which must be mapped to a new tag. There were four modifications of this type, which are described below.

The first such modification represents an alternative approach to making the JJ/VBG and JJ/VBN distinctions mentioned above and is again inspired by a test used by linguists (and also recommended by Santorini (1990)) to distinguish between these problematic cases. Adjectives can occur after be as well as after other so-called copular verbs such as seem, become and appear. Thus, while crushed in He was crushed is ambiguous (without context) even to a human annotator, in He seemed crushed it is unambiguously an adjective (JJ). This motivates the vb–cop[s] modification, which is conditioned on the label of any right sibling containing –PRD, the Treebank notation for a predicate complement, i.e. the argument of a copular verb.

One set of problematic distinctions (noted by authors such as Toutanova and Manning (2000)) involves words which are ambiguous between verb particles (RP), prepositions (IN) and adverbs (RB).⁵ For example, in could be an RP in She cashed in her shares, an IN in She stood in the hallway and an RB in She went in. Toutanova and Manning reduced this ambiguity using a lexically conditioned feature based on particular verb tokens; in this case, however, we evaluate the utility of syntactic features for the same purpose. In vb–rp[s], a verb token is mapped to a new tag when one of its right siblings is PRT (the Treebank annotation for the parent of a verb particle). This could help in making the difficult RP/IN and RP/RB distinctions, since it will tend to explicitly annotate verbs which we could expect to precede particles, and this information could in turn be used to determine the identity of a subsequent ambiguous token.

The vb–inf[s] modification is another syntactically conditioned mapping inspired by a lexical feature used by Toutanova and Manning (2000), where a feature was added

⁵ Huddleston and Pullum (2002) hold that the words we are referring to which are tagged as RB in the Penn Treebank should instead be treated as prepositions without noun phrase complements, for a range of reasons including the systematic ambiguity mentioned above and the fact that the phrases headed by these putative adverbs are distributed like prepositional phrases rather than adverb phrases.

Mapping                Alg  Accuracy (%)           Most Significant F-score Changes from Benchmark
                            All     Unk    Sent    Negative                    Positive
Benchmark              TB   96.842  81.94  51.77   –             –             –             –
                       SV   96.852  84.62  50.72   –             –             –             –
rb–deg[s]              TB   96.818  81.66  51.36   CD_A:−0.63    JJR_A:−4.15   POS_A:+0.41   WP_A:+0.66
                       SV   96.847  84.72  50.63   NNP_U:−0.05   VBG_A:−0.09   VBD_U:+2.73   NNPS_A:+0.17
rb–loc[s]              TB   96.841  81.73  51.59   NNS_U:−1.17   VBD_U:−3.26   RB_A:+0.09    VBG_U:+1.87
                       SV   96.851  84.66  50.66   NN_A:−0.01    VBZ_U:−0.71   WDT_A:+0.04   VB_A:+0.02
rb–tmp[s]              TB   96.807  81.92  51.29   CD_A:−0.03    VBN_A:−0.21   VBG_U:+2.15   RB_U:+4.05
vb–cop[s] + rb–deg[s]  TB   96.822  81.72  51.43   NN_U:−1.14    RB_A:−0.13    VBG_U:+2.30   VBZ_A:+0.17
vb–cop[s]              TB   96.839  81.78  51.62   VBP_A:−0.21   NNPS_A:−0.81  VBG_U:+1.74   VBZ_A:+0.20
                       SV   96.846  84.70  50.70   VBN_U:−3.02   POS_A:−0.03   RB_A:+0.10    JJ_U:+0.29
vb–rp[s]               TB   96.824  81.71  51.59   VB_A:−0.21    VBN_A:−0.34   VBG_U:+2.04   VBZ_A:+0.12
vb–tr[s]               TB   96.776  81.52  50.89   VBD_U:−4.95   VBP_A:−0.65   DT_A:+0.02    VBZ_A:+0.07

Table 4.6: Overall accuracy figures and most significant changes in F-scores over specific POSs for modifications described in Section 4.3.3 (see Table 4.5 for explanation of symbols)

which was sensitive to the presence of the verbs do, let, make or help, which frequently take bare infinitival complements (i.e. a verb with tag VB, the bare, uninflected form, not preceded by to). This is designed to help with the problematic VB/VBP distinction made in the Penn Treebank, which we described in Section 2.3.2. For example, we should be able to correctly tag find as VBP in They find it difficult and as VB in They do find it difficult. Our strategy here differs slightly: we map verb tokens with an immediate right sibling S, which corresponds to the parent of infinitival verb phrases, to investigate whether it helps the tagger make the distinction.

The final syntactically-conditioned modification of verbs is vb–tr[s], reflecting the distinction between transitive and intransitive verbs. Transitive verbs such as kill have a complement corresponding to an object (in English, this is usually a noun following the verb), which is prototypically the thing acted upon. Intransitive verbs such as die lack such an argument. This mapping considers verb tokens as transitive if they have an NP as one of their right siblings. The mapping was not aimed at one particular class of ambiguity; however, it was a promising modification as it would create new classes composed of large numbers of token instances.
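The four verb modifications share one shape: each of the six Penn verb tags is mapped to a new tag when a condition on the verb's right siblings holds. A minimal sketch of that shared shape is given below, assuming the right-sibling labels have already been read off the parse tree; the function names, the short condition keys and the suffixed tag names (e.g. VBD-COP) are hypothetical.

```python
VERB_TAGS = {"VB", "VBP", "VBZ", "VBD", "VBN", "VBG"}


def category(label):
    """Syntactic category without function tags, e.g. ADJP-PRD -> ADJP."""
    return label.split("-")[0]


def has_function_tag(label, function_tag):
    return function_tag in label.split("-")[1:]


# One condition over the list of right-sibling labels per modification.
CONDITIONS = {
    "cop": lambda sibs: any(has_function_tag(s, "PRD") for s in sibs),  # predicate complement
    "rp": lambda sibs: any(category(s) == "PRT" for s in sibs),         # verb particle
    "inf": lambda sibs: bool(sibs) and category(sibs[0]) == "S",        # immediate right sibling S
    "tr": lambda sibs: any(category(s) == "NP" for s in sibs),          # object NP
}


def map_verb(tag, right_siblings, modification):
    """Map a verb tag to a new tag when the modification's condition holds."""
    if tag in VERB_TAGS and CONDITIONS[modification](right_siblings):
        return f"{tag}-{modification.upper()}"
    return tag
```

Because the condition is applied uniformly to all six verb tags, each modification adds up to six new tags to the tagset.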

As we can see in Table 4.6, many of these modifications were not as successful as we had hoped. Some of the lack of success of the rb–deg[s] modification can be attributed to inconsistencies in the data (examples such as very/RB alarmed/VBN directly contradict the Treebank tagging guidelines), as well as the fact that there are too few instances of JJ/VBN ambiguity following a degree adverb in the data for this extra information to help. A more successful modification targeted at the same problem was vb–cop[s], which even produced some significant changes over the targeted POSs, although some of these, such as VBN_U for SVMTool, were decreases rather than increases in F-score. rb–loc[s] produced some noticeable changes in F-score, although these are difficult to account for. The least successful by far was vb–tr[s], suggesting that verb valency is very

Mapping                  Alg  Accuracy (%)           Most Significant F-score Changes from Benchmark
                              All     Unk    Sent    Negative                    Positive
Benchmark                TB   96.842  81.94  51.77   –             –             –             –
                         SV   96.852  84.62  50.72   –             –             –             –
in–rp[l]                 TB   96.818  81.52  51.43   VBN_U:−1.32   VBD_U:−2.64   VBG_A:+0.07   VBZ_A:+0.10
in–rp[l] + in–sub[s]     TB   96.832  81.59  51.51   NN_U:−1.72    JJ_U:−1.36    VBD_A:+0.11   VBG_U:+1.21
                         SV   96.851  84.63  50.73   NNP_U:−0.06   PDT_A:−0.85   JJR_U:+3.42   NNS_U:+0.12
in/rp/rb[l]              TB   96.812  81.68  51.42   RB_A:−0.08    NNP_A:−0.05   VBZ_U:+3.89   VBZ_A:+0.13
in/rp/rb[l] + in–sub[s]  TB   96.801  81.57  51.15   NN_U:−1.42    RB_A:−0.10    VBZ_U:+4.65   VBZ_A:+0.16
dt–num[l] + jj–num[l]    TB   96.810  81.59  51.37   VBP_A:−0.12   NNS_U:−0.82   VBD_A:+0.05   CC_A:+0.05

Table 4.7: Overall accuracy figures and most significant changes in F-scores over specific POSs for modifications described in Section 4.3.4 (see Table 4.5 for explanation of symbols)

unpredictable and/or the extra information it provides is not useful for disambiguation in problematic cases, perhaps explaining why none of the tagsets discussed in Section 2.3 makes such a distinction.

4.3.4 Lexically-Conditioned Modifications of Closed Classes

Here we move on to lexical mappings, i.e. those which have a list of words in their condition set, focussing first on closed class words. One trial in this category was designed to increase computational tractability with little reference to linguistic motivation. It concerns the ambiguity between IN and RP (particle). As we have already noted in Section 4.3.3, these POSs are notoriously difficult to distinguish, since many words such as on are systematically ambiguous between the two. However, there are many members of IN which have no homographs (distinct lexemes with the same spelling) in the RP class. If we determine the ambiguous types in a preprocessing pass over the training data⁶ and map the ambiguous members of IN to a new class, we are explicitly indicating to the tagger whether or not a word is ambiguous between the two POSs, and could thereby improve performance for these particular words. This mapping is designated in–rp[l]. As shown in Table 4.7, in another trial this modification was used in conjunction with the in–sub[s] modification described in Section 4.3.2, reflecting our intuition that since these two modifications dealt with the same POSs there might be some interaction between the two. It is clear from Table 4.7 that the two modifications in combination performed better than the in–rp[l] modification alone.

A related trial concerns the similar ambiguity of certain members of RB. A large number of word types are unambiguously RB; however, there is a subset of types which

⁶ A highly rigorous approach would determine these figures from the training data only. However, our use of cross-validation means that such an approach would require a separate preprocessing phase for each fold. Since the feature vectors we were examining were well populated, extracting counts from the development data as well as the training data was unlikely to cause overfitting, so we adopted this slightly less rigorous approach to avoid complexity of implementation.

are ambiguous between RB and IN, and a further subset of these containing types for which the correct tag could be any of RB, IN or RP. We thus divide RBs into those which are unambiguous (leaving them unchanged), those which are ambiguous between RB and IN, and those which exhibit a three-way ambiguity between the relevant classes. Similarly, INs are divided into classes for types showing no ambiguity, RB/IN ambiguity and three-way ambiguity. This suite of closely related modifications is designated in/rp/rb[l] according to the conventions noted in Section 4.3.1.
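Both the in–rp[l] mapping and the in/rp/rb[l] suite depend on a preprocessing pass that records which of the relevant tags each word type has been seen with. The sketch below shows one way this could be done; the function names and the new tag names (IN-AMB and the -2WAY/-3WAY suffixes) are hypothetical, word types are lowercased by assumption, and types that are ambiguous only between RB and RP are left unchanged since the text does not describe a class for them.

```python
from collections import defaultdict


def observed_tags(tagged_tokens, classes):
    """Map each lowercased word type to the subset of `classes` it occurs with."""
    seen = defaultdict(set)
    for word, tag in tagged_tokens:
        if tag in classes:
            seen[word.lower()].add(tag)
    return seen


def map_in_rp(tagged_tokens):
    """in-rp[l]: IN tokens whose type is also attested as RP get a new tag."""
    seen = observed_tags(tagged_tokens, {"IN", "RP"})
    return [
        (word, "IN-AMB" if tag == "IN" and "RP" in seen[word.lower()] else tag)
        for word, tag in tagged_tokens
    ]


def map_in_rp_rb(tagged_tokens):
    """in/rp/rb[l]: split IN and RB by two-way (RB/IN) or three-way ambiguity."""
    seen = observed_tags(tagged_tokens, {"IN", "RP", "RB"})
    mapped = []
    for word, tag in tagged_tokens:
        tags = seen.get(word.lower(), set())
        if tag in ("IN", "RB") and {"IN", "RB"} <= tags:
            tag += "-3WAY" if "RP" in tags else "-2WAY"
        mapped.append((word, tag))
    return mapped
```

Both functions take a list of (word, tag) pairs, since the data is traversed twice: once to collect the ambiguity classes and once to apply the mapping.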

Another modification in this domain is inspired by tagsets such as C7, which make distinctions within the determiners and articles according to whether they indicate that the following noun phrase is singular (e.g. a or this) or plural (e.g. these), or do not specify the number of the noun phrase which they introduce (e.g. the). Related to this are words such as many, few and countless, which are tagged JJ⁷ in the Penn Treebank but, like these determiners, explicitly indicate the number of the noun phrase they occur in (which in this case is plural). The modifications are designated dt–num[l] and jj–num[l].

One of the most interesting aspects of the results shown in Table 4.7 is that the in–rp[l] modification performed better when combined with the in–sub[s] modification. We do not have a convincing explanation for the outcome of the interaction between these rules. While it does not support our hypothesis that we can improve performance over the benchmark using carefully selected modifications, it does indicate that in certain situations (i.e. with the in–rp[l] modification as starting point) the addition of a motivated distinction to the tagset can improve accuracy. Interestingly, the addition of the in–sub[s] distinction had the opposite effect for the in/rp/rb[l] modification. Finally, we note that the dt–num[l] modification reduced performance significantly over NNS (plural common noun), one of the POSs where we would naively expect better performance.

4.3.5 Lexically-Conditioned Modifications of Open Classes

Outlined here are some mappings of open classes based on lexical features. A large number of these are variants of the modifications listed in Section 4.3.3. They are intended to reflect similar patterns of syntactic regularity, but instead of conditioning the mappings using syntactic features on a token-by-token basis, a pre-prepared word list was used. These word lists could be generated in one of two ways. They could either be manually created from grammars such as Huddleston and Pullum (2002), or generated in a preprocessing pass over the data, using similar syntactic features to extract counts for each word type and then using these counts to create a list of word types whose relative frequency of occurrence in the particular syntactic context exceeded an arbitrarily chosen threshold.
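The data-extracted variant amounts to a relative-frequency filter over word types. A minimal sketch follows, assuming the syntactic test has already been applied to each token to give a boolean; the function name, the default threshold of 0.5 and the optional minimum-count parameter are assumptions for illustration, since the text says only that the threshold was arbitrarily chosen.

```python
from collections import Counter


def extract_word_list(observations, threshold=0.5, min_count=1):
    """Return word types whose relative frequency in the context exceeds threshold.

    observations is an iterable of (word, in_context) pairs, where in_context
    says whether that token occurred in the syntactic context of interest.
    """
    total = Counter()
    in_context_count = Counter()
    for word, in_context in observations:
        total[word] += 1
        if in_context:
            in_context_count[word] += 1
    return {
        word
        for word, n in total.items()
        if n >= min_count and in_context_count[word] / n > threshold
    }
```

A mapping such as rb–deg[ld] would then relabel every RB token whose word type appears in the resulting list, regardless of the syntactic context of the individual token.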

Manually generated lists were applied to the following mappings (recall from Section 4.3.1 that [lm] is shorthand for “manually created lexical mapping”):

• The locative adverb mapping, rb–loc[lm]

• The mapping for verbs which select for infinitives, denoted vb–inf[lm] (these were derived from those used for the related feature by Toutanova and Manning (2000))

⁷ These are technically open class words and thus should appear in Section 4.3.5; however, it was natural to group this modification with the determiner mapping due to their similar functions.

Mapping                 Alg  Accuracy (%)           Most Significant F-score Changes from Benchmark
                             All     Unk    Sent    Negative                    Positive
Benchmark               TB   96.842  81.94  51.77   –             –             –             –
                        SV   96.852  84.62  50.72   –             –             –             –
rb–loc[lm]              TB   96.826  81.74  51.53   NNS_U:−0.84   VBP_A:−0.18   RP_A:+0.25    VBZ_A:+0.14
vb–inf[lm]              TB   96.842  81.89  51.61   CD_A:−0.04    CD_U:−0.39    RB_U:+2.08    RB_A:+0.10
                        SV   96.848  84.59  50.66   VBN_A:−0.07   VBZ_U:−1.08   POS_A:+0.02   NNPS_A:+0.10
vb–cop[lm]              TB   96.827  81.73  51.45   NNPS_A:−1.16  VBP_A:−0.16   WDT_A:+0.13   RP_A:+0.30
vb–cop[lm] + rb–deg[s]  TB   96.821  81.90  51.43   CD_A:−0.03    WDT_A:−0.14   MD_A:+0.02    VBP_U:+14.12
rb–deg[ld]              TB   96.825  81.44  51.51   VBN_U:−2.78   PDT_A:−1.16   VBG_A:+0.07   VBZ_A:+0.09
vb–rp[ld]               TB   96.823  81.69  51.46   VBP_A:−0.19   VBD_A:−0.07   DT_A:+0.02    POS_A:+0.02
                        SV   96.852  84.60  50.71   NNS_U:−0.19   VBG_A:−0.05   VBP_A:+0.03   IN_A:+0.00
nn–ms[l] + dt–num[l]    TB   96.782  81.11  51.05   NN_U:−2.91    JJ_A:−0.32    CC_A:+0.03    VBG_U:+1.05
                        SV   96.858  84.41  50.73   NN_U:−2.16    VBZ_U:−2.94   RBS_A:+0.42   CD_A:+0.01

Table 4.8: Overall accuracy figures and most significant changes in F-scores over specific POSs for modifications described in Section 4.3.5 (see Table 4.5 for explanation of symbols)

• The copular verb mapping, denoted vb–cop[lm]

Lists generated from the data were used in the following variants:

• The degree adverb mapping, denoted rb–deg[ld]

• The mapping for verbs which are frequently followed by particles, denoted vb–rp[ld]

Another lexical modification used external sources of information to create the word list. The LinGO ERG (Copestake and Flickinger 2000) is a freely available precision grammar with a lexicon which encodes very fine-grained syntactic and semantic distinctions between wordforms. One of these features is noun countability: the distinction between nouns which can be pluralised (the majority), such as chair, which becomes chairs, and those which cannot, such as inspiration. We can extract from the lexicon a list of non-count nouns and use this to create a mapping for the NN class which separates them from count nouns, designated nn–ms[l]. Similarly, we can also extract a (much smaller) list of nouns such as armaments which only occur in the plural. This modification was combined with the dt–num[l] mapping described in Section 4.3.4, since we would expect these two mappings, which both deal with number in noun phrases, to interact with each other somewhat. We might expect this mapping to reduce confusions between NN and NNS.

The results are shown in Table 4.8. There are again several modifications which maintain benchmark performance with either fnTBL or SVMTool. fnTBL showed no net change with the vb–inf[lm] modification; however, the POSs most affected are somewhat surprising: the most significant changes were over RBs. Examination of the output suggests that this is because of fewer RB/JJ confusions, which are perhaps distributed differently near the targeted words. Another puzzling F-score change is over NN in nn–ms[l] + dt–num[l]: while we might have hoped to tag NN more accurately with this modification, performance in fact dropped appreciably.

                       TBL                        SVM                        MaxEnt
                       All      Unk     Sent      All      Unk     Sent      All      Unk     Sent
Benchmark              96.842   81.94   51.77     96.852   84.62   50.72     97.056   87.34   53.72
Freq-based PRP:2       96.839   81.81   51.69     96.851   84.60   50.67     97.048   87.40*  53.51
Freq-based RB:3        96.843   81.73   51.72     96.855   84.67   50.71     97.056   87.28   53.72
Clust rp(cl–c)         96.840   81.65   51.73     96.852   84.65   50.73     97.053   87.30   53.68
Clust in(cl–c,s)       96.831   81.79   51.52     96.865*  84.64   50.90*    97.065*  87.32   53.78*
Clust in(cl–c)         96.850*  82.00*  51.82*    96.855   84.61   50.74     97.050   87.32   53.59
in–sub[s]              96.842   81.76   51.63     96.855   84.65   50.77     97.050   87.37   53.51
vb–inf[lm]             96.842   81.89   51.61     96.848   84.59   50.66     –        –       –
vb–cop[s]              96.839   81.78   51.62     96.846   84.70*  50.70     –        –       –
in–rp[l] + in–sub[s]   96.832   81.59   51.51     96.851   84.63   50.73     –        –       –

Table 4.9: Accuracy (%) of the best-performing or most motivated tag modifications for each of the broad methods discussed, with the highest accuracy figure in each column marked with an asterisk

                   TBL                     SVM
                   All     Unk     Sent    All     Unk     Sent
Benchmark          96.68   83.71   49.52   96.75   87.23   49.76
Clust in(cl–c,s)   96.68   83.59   49.91   96.78   87.38   50.04
in–sub[s]          96.70   84.07   50.00   96.77   87.32   49.94
vb–inf[lm]         96.73   84.10   49.94   96.75   87.26   49.74

Table 4.10: Accuracy (%) of selected tag modifications from Table 4.9 over the held-out 129K-token test set of sections 22 and 23 of the WSJ corpus

4.4 Overall Summary

In Table 4.9, we evaluate some of the most promising modifications from each of the broad categories. Over the training/development data, the clustering modifications in general achieve higher accuracy than our linguistically motivated modifications. We show results for selected modifications over the test set in Table 4.10 to determine the extent of overfitting. Time constraints prevented testing with StanME; however, we would expect from previous results that it would follow a similar pattern to SVMTool. These figures come from a much smaller dataset of only 129K tokens and are therefore a less accurate reflection of performance. The linguistic modifications, if anything, seem more successful over this data; however, the size of the change over the comparatively small dataset is such that it is unlikely to be statistically significant. In general it seems that the linguistic modifications are less data-dependent, while the data-driven modifications may have a slight tendency to overfit. Nonetheless, the results for SVMTool suggest that there may be some genuine improvement from the clustering modifications, and the clustering modification we show provides the best performance over the test set (96.78% with SVMTool).
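To give a rough sense of scale for the remark about statistical significance, the sketch below computes the binomial standard error of a token-level accuracy figure. This is only a back-of-envelope check and not the significance test used in this work: it assumes tokens are tagged independently, and because all taggers are scored on the same tokens, a paired test over their disagreements would be more appropriate. The function name is hypothetical.

```python
import math


def accuracy_standard_error(accuracy_pct, n_tokens):
    """Binomial standard error of an accuracy figure, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_tokens)


# SVMTool benchmark over the 129K-token test set (Table 4.10).
test_se = accuracy_standard_error(96.75, 129_000)

# fnTBL benchmark over the 1,047K-token training/development data.
dev_se = accuracy_standard_error(96.842, 1_047_000)
```

Under these assumptions the standard error over the test set is a little under 0.05 percentage points, so the differences of 0.00 to 0.05 points in the All columns of Table 4.10 fall within roughly one standard error of the benchmark.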

Chapter 5

Conclusion

5.1 Discussion

This thesis has detailed a thorough investigation into the possibility of improving POS tagging accuracy by subdividing the tagset in ways which make it more informative. We have shown the effect of three alternative types of modification: naive, linguistically motivated and data-driven.

Our results in general have not supported the hypothesis that it is possible to achieve significant performance improvements in POS tagging over the Penn Treebank by utilising a finer-grained tagset. Even with a diverse range of modifications to both closed and open classes, we have not found a mapping which led to statistically significant performance improvements, while we were able to come up with a number of modifications which led to noticeable accuracy reductions.

These results could be seen to support the intuitions of Marcus et al. (1993) about the desirability of a coarse tagset to avoid the detrimental effects of data sparseness on accuracy. It seems that the linguistic constructions in which a particular modification might be useful are not frequent enough to improve performance, and that if these modifications do have any positive effect on accuracy, it is more than counteracted by the effect of the more sparsely populated feature vectors.

There are several obvious reasons for the difficulty in improving performance here. It is the most difficult 3% of tokens which we are attempting to tag correctly. Among these are words which probably cannot be tagged correctly with a small context window, words for which humans would have difficulty agreeing on a tag, and words which are tagged incorrectly in the gold standard (a fact which was explored in Ratnaparkhi (1996)). This thesis lends weight to the argument that the 97.0% “glass ceiling” in tagger accuracy probably has as much to do with the estimated 3% error rate quoted in the Penn Treebank documentation as with a lack of specific contextual information.

However, despite this, there are still reasons to believe that there is room for improvement. As mentioned in Section 3.2, the observation of Brill and Wu (1998) that there is a high degree of complementarity in the errors made by taggers including maximum entropy and TBL suggests that, at least some of the time, sufficient information is available and the problem lies in applying it correctly. Given this, the lack of success so far in applying linguistic intuition was surprising. While the highest-performing modification was the linguistically motivated reintroduction of subordinators, accuracy in this best case was not significantly different from using an unmodified tagset. However, the worst of the linguistically motivated modifications resulted in markedly lower accuracy than the benchmark. Even modifications targeted at addressing a specific confusion (such as rb-deg) reduced overall performance and failed to produce noticeable changes in F-scores over the affected POSs.

The clustering was not designed on a particularly firm theoretical basis; rather, we attempted it as a comparison with the linguistically motivated methods. Despite this, it has produced some intra-POS clusters which, at least over the development set, improve performance. As the results in Table 4.10 show, there is some overfitting occurring, but there are still appreciable performance improvements.

We have evaluated a range of naive, data-driven and linguistically informed approaches to tagset modification. Results from modifications in all of the areas ranged from appreciable performance deterioration to approximately constant performance, with just one modification achieving a noticeable improvement over the test and development data. The clustering clearly has the potential to produce substantial improvements when the feature values are derived from the development data, while the linguistic modifications have not produced such results. However, it seems from Table 4.9 and Table 4.10 that the linguistic modifications are less data-dependent in that they perform equally well over unseen data.

5.2 Further Work

A number of opportunities exist for future research. It is possible that some of the modifications we added which kept performance at an approximately constant level (most notably the in–sub[s] modification) would actually result in better performance in downstream applications such as chunk parsing. Evaluating modifications extrinsically within applications was beyond the scope of the investigation but remains an open possibility.

Another area of potential research in this domain would be to follow more closely the model established by the approach to parsing of Klein and Manning (2003). If we were restricted to unlexicalised tagging, it is possible that new distinctions in the tagset could be far more productive, since the baseline tagger would have more impoverished information. However, such an approach would only be useful if an unlexicalised tagger could be shown to have other advantages over a lexicalised tagger, such as shorter times for training or tagging, or the ability to be trained from smaller data sets.

A possible method for avoiding the aforementioned problems of data sparseness is to use a two-tiered classification of POS tags. We conducted a preliminary investigation of this, systematically adding delimiters to newly created tags, and adding contextual features to the tagger (in this case SVMTool) dependent on the portion of the POS tag preceding or following the delimiter. Multiple levels of classification of POS tags are used successfully in the CLAWS tagging system (Garside et al. 1997) but do not appear to have been applied to the Penn Treebank. This method should give the taggers access to the more densely populated coarse-tag features when necessary, while keeping the subtler distinctions we have added available when they are useful. The first stages of investigation did not produce particularly promising results; however, far more extensive experimentation is possible.
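The two-tiered idea can be illustrated with a minimal sketch. It assumes a hypothetical delimiter character `~` (the delimiter actually used is not stated here) and hypothetical feature names; it is not the SVMTool feature configuration used in the experiments.

```python
# Hypothetical delimiter separating the coarse tag from the added subclass.
DELIM = "~"


def split_tag(tag, delim=DELIM):
    """Split 'RB~DEG' into ('RB', 'DEG'); unmodified tags give (tag, None)."""
    coarse, sep, fine = tag.partition(delim)
    return coarse, (fine if sep else None)


def context_tag_features(prev_tags, delim=DELIM):
    """Build coarse and fine features from the tags preceding the focus word.

    prev_tags is ordered left to right, so the last element is the tag
    immediately before the focus word (offset 1).
    """
    feats = {}
    for offset, tag in enumerate(reversed(prev_tags), start=1):
        coarse, fine = split_tag(tag, delim)
        # The coarse feature is always present, so it stays densely populated.
        feats[f"tag-{offset}:coarse"] = coarse
        # The fine feature only exists where a subdivision was introduced.
        if fine is not None:
            feats[f"tag-{offset}:fine"] = fine
    return feats
```

With this layout a learner can fall back on the coarse feature wherever the finer distinction is too sparse to be informative.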


Bibliography

Brants, Thorsten. 2000. TnT - a statistical part-of-speech tagger. In Proceedings of the 6th Applied Natural Language Processing Conference, 224–231, Seattle, USA.

Brill, Eric. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21.543–65.

——, and Jun Wu. 1998. Classifier combination for improved lexical disambiguation. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 191–195, Montreal, Canada.

Charniak, Eugene, Curtis Hendrickson, Neil Jacobson, and Mike Perkowitz. 1993. Equations for part-of-speech tagging. In Proceedings of the National Conference on Artificial Intelligence, 784–789, Washington, USA.

Church, Kenneth. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the 2nd Conference on Applied Natural Language Processing, 136–143, Austin, USA.

Clark, Alexander. 2003. Combining distributional and morphological information for part of speech induction. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, 59–66, Budapest, Hungary.

Copestake, Ann, and Dan Flickinger. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of the Second Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece.

Cristianini, Nello, and John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge, UK: Cambridge University Press.

Crystal, David. 1987. The Cambridge Encyclopedia of Language. Cambridge, UK: Cambridge University Press.

Daelemans, Walter, Jakob Zavrel, Peter Berck, and Steven Gillis. 1996. MBT: A memory-based part of speech tagger-generator. In Fourth Workshop on Very Large Corpora, Copenhagen, Denmark.


DeRose, Steven J. 1988. Grammatical category disambiguation by statistical optimization. Computational Linguistics 14.31–39.

Francis, W. N., and H. Kučera, 1979. Brown Corpus Manual: Manual of Information to accompany A Standard Corpus of Present-Day Edited American English for use with Digital Computers. Brown University, Providence, USA.

Garside, Roger. 1987. The CLAWS tagging system. In A Computational Analysis of English, ed. by Roger Garside, Geoffrey Leech, and Geoffrey Sampson, chapter 2. Essex, England: Longman Group UK.

——, Geoffrey Leech, and Anthony McEnery (eds.) 1997. Corpus Annotation: Linguistic Information from Computer Text Corpora. New York, USA: Addison Wesley Longman Ltd.

Giménez, Jesús, and Lluís Màrquez. 2003. Fast and accurate part-of-speech tagging: The SVM approach revisited. In Proceedings of the International Conference on Recent Advances on Natural Language Processing, Borovets, Bulgaria.

——, and ——. 2004. SVMTool: A general POS tagger generator based on support vector machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal.

Huddleston, Rodney, and Geoffrey K. Pullum (eds.) 2002. The Cambridge Grammar of the English Language. Cambridge, UK: Cambridge University Press.

Johansson, Stig, Geoffrey Leech, and Helen Goodluck. 1978. Manual Of Information To Accompany The Lancaster-Oslo/Bergen Corpus Of British English, For Use With Digital Computers. Oslo, Norway: Department of English, University of Oslo.

Jurafsky, Daniel, and James Martin. 2000. Speech and Language Processing. Prentice-Hall Series in Artificial Intelligence. Upper Saddle River, USA: Prentice-Hall.

Klein, Dan, and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 423–430, Sapporo, Japan.

Leech, Geoffrey. 1997. Grammatical tagging. In Corpus Annotation: Linguistic Information from Computer Text Corpora, ed. by Roger Garside, Geoffrey Leech, and Anthony McEnery, chapter 2. New York, USA: Addison Wesley Longman Ltd.

Loper, Edward, and Steven Bird. 2002. NLTK: The natural language toolkit. In Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, 62–69, Philadelphia, USA.

Malouf, Robert. 2002. A comparison of algorithms for maximum entropy parameter estimation. In Proc. of the 6th Conference on Natural Language Learning (CoNLL-2002), 49–55, Taipei, Taiwan.


Manning, Christopher D., and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, USA: The MIT Press.

Marcus, Mitchell, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19.313–330.

Mikheev, Andrei. 2000. Tagging sentence boundaries. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics, 264–271, Seattle, USA.

Nakagawa, Tetsuji, Taku Kudoh, and Yuji Matsumoto. 2001. Unknown word guessing and part-of-speech tagging using support vector machines. In Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 325–331, Tokyo, Japan.

Ngai, Grace, and Radu Florian. 2001. Transformation-based learning in the fast lane. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics, 40–7, Pittsburgh, USA.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. New York, USA: Longman.

Ratnaparkhi, Adwait. 1996. A maximum entropy part-of-speech tagger. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 133–142, Philadelphia, USA.

——, 1998. Maximum Entropy Models for Natural Language Ambiguity Resolution. University of Pennsylvania dissertation.

Sampson, Geoffrey. 1987. Appendix B: Alternative grammatical coding systems. In A Computational Analysis of English, ed. by Roger Garside, Geoffrey Leech, and Geoffrey Sampson. Essex, England: Longman Group UK.

Santorini, Beatrice, 1990. Part-of-Speech Tagging Guidelines for the Penn Treebank Project, 2nd printing, 3rd edition.

Toutanova, Kristina, and Christopher D. Manning. 2000. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora, 63–70, Hong Kong, China.

Witten, Ian H., and Eibe Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco, USA: Morgan Kaufmann.


Appendix A

The Penn Tagset

We reproduce the Penn Tagset in full here, providing a range of examples selected to indicate the scope of word types in each class. Note that many of the examples are ambiguous with one or more other word types apart from the class under which they are listed.

Tag   Description                                   Examples
$     dollar                                        $ -$ –$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$
“     opening quotation mark                        ‘ “
”     closing quotation mark                        ’ ”
(     opening parenthesis                           ( [ {
)     closing parenthesis                           ) ] }
,     comma                                         ,
–     dash                                          –
.     sentence terminator                           . ! ?
:     colon or ellipsis                             : ; ...
CC    conjunction, coordinating                     and but or ...
CD    numeral, cardinal                             ten-thirty forty-three 25 one-tenth million
DT    determiner                                    all an another any both each many neither some that the these
EX    existential there                             there
FW    foreign word                                  ich jeux habeas alai je jour ...
IN    preposition or conjunction, subordinating     in out inside on by below within for until into than whether if because before that ...
JJ    adjective or numeral, ordinal                 third alarmed English resilient different financial early big average ...
JJR   adjective, comparative                        earlier greater shaper fewer older more less cleaner creamier ...
JJS   adjective, superlative                        latest most largest oldest loudest least dirtiest nearest ...
LS    list item marker                              A A. B B. C C. First One one two three ...
MD    modal auxiliary                               can could may might must need shall should will would ...
NN    noun, common, singular or mass                director yield exercise chairman cigarette percentage rate growth milk book ...
NNP   noun, proper, singular                        Lorillard Pacific McDermott Indianapolis January Rothschild Frederick Japan Tuesday ...
NNPS  noun, proper, plural                          Asians Cabernets States Airlines Democrats Protestants Rothschilds ...
NNS   noun, common, plural                          filters men workers ratepayers units people rights counterparts capsules quantities ...
PDT   pre-determiner                                all both half many quite such sure this
POS   genitive marker                               ’ ’s
PRP   pronoun, personal                             hers herself him it I me myself ours ourselves he theirs them we ...
PRP$  pronoun, possessive                           her his mine my our their thy your
RB    adverb                                        heavily far not perhaps again still here often increasingly very relatively also then on over in ...
RBR   adverb, comparative                           more further earlier better closer less later harder ...
RBS   adverb, superlative                           most best hardest least ...
RP    particle                                      in on off up across even about along through ...
SYM   symbol                                        % & ’ ” ”. ) ). * + ,. < = > @
TO    “to” as preposition or infinitive marker      to
UH    interjection                                  oh yes well heck quackw wow hey ...
VB    verb, base form                               be have make oversee treat prove remain seem refund get work offer share ...
VBD   verb, past tense                              were dumped poured had contracted was made upheld favored said became took gave ...
VBG   verb, present participle or gerund            being doing trying increasing running reducing predicting ...
VBN   verb, past participle                         been taken become based considered broken gotten managed surfaced given studied sold ...
VBP   verb, present tense, not 3rd person singular  am are have allow offer argue invest talk mention seem ...
VBZ   verb, present tense, 3rd person singular      is has does appears requires follows describes says makes ...
WDT   Wh-determiner                                 that what whatever which whichever
WP    Wh-pronoun                                    what who whom ...
WP$   Wh-pronoun, possessive                        whose
WRB   Wh-adverb                                     how however when where whereby why ...


Appendix B

Complete Results

Here we show the results for all of the trials we attempted, with mappings referred to in the body of the thesis named explicitly by the heading, and the conditions under which the tags were mapped listed before the results. We show precision, recall and F-score over individual POSs, and accuracy over the global metrics. The column headed “F chg” denotes the change in F-score or accuracy (as appropriate) relative to the benchmark figures. Finally, the column headed “p(F chg)” shows the statistical significance of the change according to the paired t-test evaluated over the five cross-validation folds.
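As a rough illustration of how such figures can be computed, the sketch below derives per-tag precision, recall and F-score from aligned gold and predicted tag sequences, and the paired t statistic over per-fold scores. The tabulated F values are consistent with the harmonic mean of precision and recall (for example, 85.25 and 93.14 give 89.02), so that form is assumed here. The function names are hypothetical, and converting the t statistic to a p-value (a t distribution with four degrees of freedom for five folds) is left out because it needs a statistics library.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def per_tag_scores(gold, pred):
    """Return {tag: (precision, recall, f_score)} for aligned tag sequences."""
    correct, gold_n, pred_n = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        gold_n[g] += 1
        pred_n[p] += 1
        if g == p:
            correct[g] += 1
    scores = {}
    for tag in gold_n:
        prec = correct[tag] / pred_n[tag] if pred_n[tag] else 0.0
        rec = correct[tag] / gold_n[tag]
        # Balanced F-score: harmonic mean of precision and recall.
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (prec, rec, f)
    return scores


def paired_t(baseline, modified):
    """Paired t statistic over per-fold scores (undefined if all differences are equal)."""
    diffs = [m - b for b, m in zip(baseline, modified)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

Calling `paired_t` with the five per-fold F-scores of the benchmark and of a modified tagset gives the statistic whose significance is reported in the final column.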


B.1

(Pos = RB & Wd ∈ {absolutely, admirably, all, ...<153 omitted>..., wickedly, wildly, wonderfully}) ⇒ (Pos ← RB–JJ)

TBL True Prec Rec F(0.5) Fchg p(Fchg) TrueU PrecU RecU F(0.5)U FchgU p(FchgU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.63 99.37 99.50 0.02 0.281 1 0.00 99.50 – –
CD 40132 99.34 99.44 99.39 -0.02 0.138 3413 98.75 96.98 99.39 -0.23 0.217
DT 90066 99.34 99.49 99.41 0.01 0.362 2 0.00 99.41 – –
EX 951 95.43 98.84 97.11 -0.16 0.393 0 97.11 – –
FW 238 58.82 46.22 51.76 0.06 0.993 67 0.00 0.00 51.76 – –
IN 108456 98.16 98.69 98.43 -0.00 0.938 22 0.00 0.00 98.43 – –
JJ 67085 91.97 91.19 91.58 -0.09 0.348 4581 78.87 72.84 91.58 -0.36 0.524
JJR 3621 84.59 90.80 87.59 0.04 0.892 85 77.27 20.00 87.59 -8.85 0.178
JJS 2129 85.25 93.14 89.02 -4.54 0.066 54 70.18 74.07 89.02 -2.44 0.186
MD 10743 99.65 99.80 99.73 0.01 0.469 5 0.00 0.00 99.73 – –
NN 146173 96.17 96.02 96.10 -0.03 0.310 3934 74.50 68.84 96.10 -0.32 0.622
NNP 100926 96.58 97.37 96.97 -0.05 0.367 6075 83.30 94.24 96.97 -0.16 0.766
NNPS 2917 63.04 66.20 64.58 -1.55 0.188 234 34.68 18.38 64.58 15.61 0.914
NNS 65922 97.60 97.80 97.70 -0.03 0.206 2353 83.65 86.10 97.70 -1.06 0.072
PDT 397 70.51 80.10 75.00 -0.24 0.686 0 75.00 – –
POS 9529 98.94 99.50 99.22 -0.01 0.798 0 99.22 – –
PRP 19164 99.75 99.36 99.55 0.00 0.390 11 0.00 99.55 – –
PRP$ 9173 99.38 99.91 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.24 90.76 92.47 -0.11 0.073 516 85.71 80.23 92.47 0.72 0.470
RBR 1905 75.96 65.51 70.35 -0.18 0.918 7 0.00 0.00 70.35 – –
RBS 486 61.30 36.83 46.02 -41.42 0.070 2 0.00 46.02 – –
RP 2879 78.39 75.72 77.03 -0.01 0.975 0 77.03 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.16 95.34 95.25 -0.18 0.199 570 82.06 77.02 95.25 -0.32 0.737
VBD 32941 95.71 94.48 95.09 0.03 0.572 426 80.35 64.32 95.09 -1.11 0.244
VBG 16321 91.39 92.41 91.90 0.11 0.051 924 70.83 93.29 91.90 1.66 0.104
VBN 22177 86.90 90.90 88.86 0.02 0.771 716 65.90 83.66 88.86 -0.38 0.446
VBP 13819 93.54 92.26 92.90 -0.08 0.498 131 63.81 51.15 92.90 2.38 0.781
VBZ 23816 97.73 96.52 97.12 0.10 0.025 467 88.26 56.32 97.12 3.00 0.307
WDT 4745 96.61 95.62 96.11 0.03 0.663 4 0.00 96.11 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.21 -1.17 0.031
TOKENS 1044667 96.800 -0.04 0.046 24622 81.73 -0.26 0.445


B.2 rb–deg[ld] Mapping

(Pos = RB & Wd ∈ {absolutely, all, any, ...<33 omitted>..., utterly, very, wildly}) ⇒ (Pos ← RB–DEG)

TBL True Prec Rec F(0.5) Fchg p(Fchg) TrueU PrecU RecU F(0.5)U FchgU p(FchgU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.91 99.89 -0.02 0.208 0 99.89 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.39 99.50 0.02 0.332 1 0.00 99.50 – –
CD 40132 99.35 99.45 99.40 -0.01 0.649 3413 98.84 97.07 99.40 -0.13 0.466
DT 90066 99.31 99.48 99.40 -0.01 0.744 2 0.00 99.40 – –
EX 951 95.33 98.84 97.06 -0.21 0.241 0 97.06 – –
FW 238 55.50 46.64 50.68 -2.02 0.495 67 0.00 50.68 – –
IN 108456 98.15 98.70 98.42 -0.00 0.862 22 0.00 0.00 98.42 – –
JJ 67085 91.85 91.34 91.59 -0.08 0.197 4581 77.18 73.02 91.59 -1.28 0.157
JJR 3621 84.90 90.83 87.77 0.25 0.406 85 71.43 23.53 87.77 1.54 0.865
JJS 2129 87.91 96.62 92.06 -1.29 0.854 54 75.47 74.07 92.06 1.21 0.194
MD 10743 99.67 99.80 99.73 0.01 0.423 5 0.00 99.73 – –
NN 146173 96.19 96.07 96.13 -0.00 0.927 3934 74.11 69.06 96.13 -0.40 0.643
NNP 100926 96.71 97.28 96.99 -0.02 0.392 6075 83.64 93.10 96.99 -0.51 0.155
NNPS 2917 63.50 66.85 65.13 -0.72 0.215 234 39.32 19.66 65.13 26.14 0.957
NNS 65922 97.58 97.87 97.72 -0.01 0.765 2353 82.66 88.10 97.72 -0.56 0.317
PDT 397 71.30 77.58 74.31 -1.16 0.071 0 74.31 – –
POS 9529 98.93 99.52 99.22 0.00 0.978 0 99.22 – –
PRP 19164 99.74 99.37 99.55 0.00 0.379 11 0.00 99.55 – –
PRP$ 9173 99.39 99.91 99.65 0.01 0.178 1 0.00 99.65 – –
RB 33806 94.13 91.06 92.57 -0.00 0.947 516 83.27 79.07 92.57 -1.43 0.317
RBR 1905 75.97 66.56 70.96 0.69 0.470 7 0.00 0.00 70.96 – –
RBS 486 85.40 48.15 61.58 -21.61 0.259 2 0.00 61.58 – –
RP 2879 79.06 75.69 77.34 0.38 0.313 0 77.34 – –
SYM 59 78.12 84.75 81.30 -0.45 0.856 1 0.00 0.00 81.30 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.38 95.33 95.36 -0.07 0.386 570 83.40 75.79 95.36 -0.37 0.354
VBD 32941 95.68 94.47 95.07 0.01 0.826 426 82.52 59.86 95.07 -3.96 0.102
VBG 16321 91.62 92.10 91.86 0.07 0.085 924 70.82 91.67 91.86 0.88 0.270
VBN 22177 86.99 90.64 88.78 -0.06 0.569 716 65.82 79.33 88.78 -2.78 0.069
VBP 13819 93.60 92.16 92.87 -0.11 0.220 131 64.71 50.38 92.87 2.15 0.658
VBZ 23816 97.66 96.57 97.12 0.09 0.049 467 86.93 56.96 97.12 3.10 0.300
WDT 4745 96.42 95.45 95.93 -0.16 0.098 4 0.00 95.93 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.51 -0.60 0.207
TOKENS 1044667 96.825 -0.02 0.399 24622 81.44 -0.61 0.156
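The lexical rule that heads this section (an RB token whose word form is in a fixed list is relabelled RB–DEG) can be sketched as a simple pass over tagged sentences. This is only an illustration: the word set below contains just the six words shown in the rule, since the remaining entries are elided, and matching on the lower-cased word form is an assumption.

```python
# Illustrative subset only: the rule elides most of the word list.
DEGREE_WORDS = {"absolutely", "all", "any", "utterly", "very", "wildly"}


def map_rb_deg(tagged_sentence, words=DEGREE_WORDS, new_tag="RB-DEG"):
    """Relabel RB tokens whose word form is in `words`; leave others unchanged.

    tagged_sentence is a list of (word, tag) pairs. Lower-casing the word
    before lookup is an assumption, not something the rule specifies.
    """
    return [
        (word, new_tag if tag == "RB" and word.lower() in words else tag)
        for word, tag in tagged_sentence
    ]
```

For example, `map_rb_deg([("very", "RB"), ("quickly", "RB")])` relabels only the first token, and a word such as "all" tagged DT is left alone because the rule is conditioned on the original POS.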


B.3 rb–deg[s] Mapping

(Pos = RB & SibR = RB) ⇒ (Pos ← RB–DEG)
(Pos = RB & SibR = JJ) ⇒ (Pos ← RB–DEG)
(Pos = RB & Par = ADJP) ⇒ (Pos ← RB–DEG)

SVM True Prec Rec F(0.5) Fchg p(Fchg) TrueU PrecU RecU F(0.5)U FchgU p(FchgU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.75 99.85 – – 0 99.85 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –
CC 26227 99.67 99.41 99.54 0.00 0.799 1 0.00 99.54 – –
CD 40132 99.28 99.57 99.43 0.01 0.230 3413 99.35 98.51 99.43 – –
DT 90066 99.46 99.30 99.38 0.01 0.295 2 0.00 99.38 – –
EX 951 95.18 99.58 97.33 -0.05 0.700 0 97.33 – –
FW 238 65.00 32.77 43.58 1.30 0.265 67 0.00 0.00 43.58 – –
IN 108456 97.81 98.94 98.37 0.00 0.427 22 0.00 98.37 – –
JJ 67085 92.24 92.18 92.21 -0.01 0.848 4581 79.92 82.25 92.21 0.28 0.081
JJR 3621 85.03 92.10 88.43 0.02 0.904 85 87.50 24.71 88.43 -7.02 0.196
JJS 2129 95.19 95.77 95.48 0.03 0.374 54 86.84 61.11 95.48 2.00 0.374
MD 10743 99.70 99.78 99.74 -0.00 0.978 5 0.00 99.74 – –
NN 146173 97.16 95.10 96.12 0.01 0.278 3934 81.01 67.77 96.12 0.07 0.212
NNP 100926 94.62 97.96 96.26 -0.03 0.084 6075 82.40 98.17 96.26 -0.05 0.049
NNPS 2917 54.76 73.40 62.72 0.17 0.000 234 53.85 11.97 62.72 -0.35 0.374
NNS 65922 97.73 97.20 97.46 -0.00 0.792 2353 87.27 90.35 97.46 -0.01 0.959
PDT 397 69.07 84.38 75.96 -0.68 0.234 0 75.96 – –
POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –
PRP 19164 99.85 99.33 99.59 -0.00 0.374 11 0.00 0.00 99.59 – –
PRP$ 9173 99.38 99.96 99.67 -0.01 0.374 1 0.00 99.67 – –
RB 33806 95.79 89.87 92.73 -0.07 0.515 516 91.63 80.62 92.73 0.17 0.713
RBR 1905 76.92 67.87 72.11 0.35 0.374 7 0.00 0.00 72.11 – –
RBS 486 87.79 84.36 86.04 0.00 0.914 2 0.00 0.00 86.04 – –
RP 2879 75.34 75.89 75.62 -0.13 0.462 0 75.62 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 70.37 38.00 49.35 0.76 0.880 17 33.33 5.88 49.35 – –
VB 29021 96.70 94.64 95.66 0.01 0.290 570 85.88 78.95 95.66 0.00 0.919
VBD 32941 94.97 95.94 95.46 -0.00 0.768 426 82.45 72.77 95.46 2.73 0.006
VBG 16321 92.28 94.14 93.20 -0.09 0.056 924 82.34 91.34 93.20 0.22 0.145
VBN 22177 88.25 90.87 89.54 -0.02 0.588 716 79.13 73.60 89.54 0.96 0.282
VBP 13819 93.95 92.18 93.06 -0.00 0.982 131 74.14 32.82 93.06 2.38 0.552
VBZ 23816 98.82 96.41 97.60 -0.00 0.792 467 90.12 64.45 97.60 -0.08 0.706
WDT 4745 97.20 96.42 96.80 0.03 0.071 4 0.00 96.80 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.91 99.96 0.02 0.374 1 0.00 99.96 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.63 -0.18 0.264
TOKENS 1044667 96.847 -0.01 0.500 24622 84.72 0.12 0.092


TBL True Prec Rec F(0.5) Fchg p(Fchg) TrueU PrecU RecU F(0.5)U FchgU p(FchgU)
# 1 100.00 100.00 100.00 – – 0 100.00 – –
$ 342 100.00 100.00 100.00 0.01 0.178 0 100.00 – –
” 399 100.00 100.00 100.00 0.09 0.030 0 100.00 – –
, 2571 99.92 99.96 99.94 -0.05 0.080 0 99.94 – –
-LRB- 52 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 55 100.00 100.00 100.00 – – 0 100.00 – –
. 1959 100.00 100.00 100.00 – – 0 100.00 – –
: 293 100.00 100.00 100.00 0.03 0.206 0 100.00 – –
CC 1141 98.96 99.65 99.30 -0.18 0.306 0 99.30 – –
CD 1439 99.02 98.54 98.78 -0.63 0.089 116 99.07 92.24 98.78 -2.59 0.162
DT 4076 99.31 99.34 99.33 -0.08 0.560 0 99.33 – –
EX 49 96.08 100.00 98.00 0.76 0.524 0 98.00 – –
FW 2 33.33 50.00 40.00 – – 1 0.00 40.00 – –
IN 4952 97.98 98.87 98.42 -0.00 0.969 1 0.00 0.00 98.42 – –
JJ 2992 91.76 91.58 91.67 0.01 0.939 250 84.23 74.80 91.67 4.24 0.344
JJR 157 83.12 84.71 83.91 -4.15 0.120 3 0.00 83.91 – –
JJS 93 92.39 91.40 91.89 -1.46 0.603 3 100.00 66.67 91.89 – –
MD 413 99.76 100.00 99.88 0.16 0.411 0 99.88 – –
NN 6268 95.93 95.95 95.94 -0.20 0.232 162 72.61 70.37 95.94 -0.44 0.792
NNP 5202 97.33 97.44 97.39 0.38 0.073 384 88.83 95.31 97.39 3.83 0.021
NNPS 95 53.33 67.37 59.53 -9.25 0.182 6 28.57 33.33 59.53 – –
NNS 3004 97.71 97.97 97.84 0.11 0.581 115 83.87 90.43 97.84 1.47 0.684
PDT 9 43.75 77.78 56.00 – – 0 56.00 – –
POS 403 99.50 99.75 99.63 0.41 0.021 0 99.63 – –
PRP 954 99.58 99.16 99.37 -0.18 0.446 0 99.37 – –
PRP$ 409 98.55 99.76 99.15 -0.49 0.159 0 99.15 – –
RB 1490 94.09 91.81 92.93 0.39 0.439 21 80.00 95.24 92.93 5.67 0.267
RBR 86 68.92 59.30 63.75 -9.54 0.129 1 100.00 100.00 63.75 – –
RBS 19 66.67 63.16 64.86 – – 0 64.86 – –
RP 140 83.47 72.14 77.39 0.46 0.796 0 77.39 – –
SYM 1 0.00 0.00 0.00 – – 0 0.00 – –
TO 1028 99.71 100.00 99.85 -0.14 0.209 0 99.85 – –
UH 1 0.00 0.00 0.00 – – 0 0.00 – –
VB 1196 94.22 96.74 95.46 0.04 0.872 28 85.71 85.71 95.46 7.53 0.447
VBD 1548 95.77 93.54 94.64 -0.44 0.203 20 91.67 55.00 94.64 -4.85 0.556
VBG 764 92.20 91.23 91.71 -0.09 0.789 42 65.00 92.86 91.71 -3.46 0.376
VBN 1031 87.89 90.79 89.31 0.54 0.586 44 70.59 81.82 89.31 2.41 0.837
VBP 727 95.26 91.20 93.18 0.23 0.772 16 76.92 62.50 93.18 24.35 0.249
VBZ 1133 98.57 97.18 97.87 0.87 0.147 20 100.00 55.00 97.87 – –
WDT 204 98.01 96.57 97.28 1.24 0.643 1 0.00 97.28 – –
WP 141 100.00 100.00 100.00 0.66 0.002 0 100.00 – –
WP$ 6 100.00 100.00 100.00 – – 0 100.00 – –
WRB 92 100.00 100.00 100.00 0.06 0.213 0 100.00 – –
“ 409 100.00 100.00 100.00 0.01 0.374 0 100.00 – –
SENT 1963 50.53 -2.48 0.000
TOKENS 47356 96.833 -0.01 0.921 1234 83.79 2.26 0.149
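The syntactically conditioned rules for this mapping depend on the treebank structure rather than the word form. The sketch below shows one way they might be applied, assuming that SibR denotes the label of the immediate right sibling and Par the label of the parent node, and that conditions are checked against the original (unmapped) labels. The nested-tuple tree encoding is hypothetical and chosen only to keep the example self-contained.

```python
def relabel_rb(tree, parent=None, right_sib=None):
    """Apply the three RB -> RB-DEG rules to a tree of nested tuples.

    A preterminal is (pos, word); any other node is (label, child, ...).
    `parent` and `right_sib` are the labels of the node's parent and
    immediate right sibling (None if absent).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Preterminal: test the rule conditions on the original labels.
        if label == "RB" and (right_sib in ("RB", "JJ") or parent == "ADJP"):
            label = "RB-DEG"
        return (label, children[0])
    new_children = []
    for i, child in enumerate(children):
        sib = children[i + 1][0] if i + 1 < len(children) else None
        new_children.append(relabel_rb(child, parent=label, right_sib=sib))
    return (label, *new_children)
```

Under these assumptions, "very" in (ADJP (RB very) (JJ good)) is relabelled, while a lone adverb under ADVP keeps its RB tag.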


B.4 vb–cop[s] + rb–deg[s] Mapping

(Pos = VB & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–C)
(Pos = VB & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–C)
and similarly for other VB.*
(Pos = RB & SibR = RB) ⇒ (Pos ← RB–DEG)
(Pos = RB & SibR = JJ) ⇒ (Pos ← RB–DEG)
(Pos = RB & Par = ADJP) ⇒ (Pos ← RB–DEG)

TBL True Prec Rec F(0.5) Fchg p(Fchg) TrueU PrecU RecU F(0.5)U FchgU p(FchgU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.63 99.37 99.50 0.02 0.442 1 0.00 99.50 – –
CD 40132 99.31 99.45 99.38 -0.03 0.143 3413 98.37 97.36 99.38 -0.22 0.140
DT 90066 99.32 99.48 99.40 -0.00 0.571 2 0.00 99.40 – –
EX 951 95.53 98.95 97.21 -0.05 0.889 0 97.21 – –
FW 238 54.73 46.22 50.11 -3.13 0.231 67 20.00 1.49 50.11 – –
IN 108456 98.15 98.69 98.42 -0.01 0.633 22 6.67 4.55 98.42 – –
JJ 67085 91.94 91.25 91.60 -0.07 0.132 4581 78.70 72.34 91.60 -0.83 0.075
JJR 3621 84.88 90.42 87.56 0.02 0.967 85 71.43 29.41 87.56 19.52 0.527
JJS 2129 91.58 96.10 93.79 0.57 0.793 54 70.00 77.78 93.79 -0.26 0.374
MD 10743 99.65 99.81 99.73 0.01 0.418 5 0.00 99.73 – –
NN 146173 96.21 95.99 96.10 -0.03 0.282 3934 73.97 68.20 96.10 -1.14 0.048
NNP 100926 96.60 97.35 96.97 -0.04 0.149 6075 83.47 94.12 96.97 -0.10 0.669
NNPS 2917 63.08 66.23 64.62 -1.50 0.100 234 36.00 19.23 64.62 20.65 0.339
NNS 65922 97.46 98.01 97.74 0.01 0.864 2353 82.32 88.27 97.74 -0.67 0.435
PDT 397 71.53 77.83 74.55 -0.84 0.430 0 0.00 74.55 – –
POS 9529 98.90 99.53 99.22 -0.00 0.717 0 99.22 – –
PRP 19164 99.74 99.36 99.55 -0.00 0.737 11 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.23 90.74 92.45 -0.13 0.057 516 87.21 79.26 92.45 0.91 0.426
RBR 1905 74.43 66.93 70.48 0.01 0.951 7 40.00 28.57 70.48 – –
RBS 486 86.34 68.93 76.66 -2.41 0.542 2 0.00 76.66 – –
RP 2879 78.31 76.10 77.19 0.19 0.627 0 0.00 77.19 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.27 95.31 95.29 -0.14 0.155 570 83.24 74.91 95.29 -1.07 0.352
VBD 32941 95.65 94.53 95.09 0.03 0.539 426 85.30 62.68 95.09 0.01 0.968
VBG 16321 91.43 92.37 91.90 0.11 0.302 924 71.69 93.18 91.90 2.30 0.057
VBN 22177 86.92 90.60 88.72 -0.13 0.123 716 67.39 82.26 88.72 0.11 0.896
VBP 13819 93.85 92.15 92.99 0.02 0.705 131 67.26 58.02 92.99 12.32 0.112
VBZ 23816 98.00 96.40 97.19 0.17 0.027 467 86.75 56.10 97.19 2.08 0.516
WDT 4745 96.31 95.62 95.96 -0.13 0.341 4 0.00 95.96 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.43 -0.76 0.106
TOKENS 1044667 96.822 -0.02 0.148 24622 81.72 -0.27 0.174


B.5 vb–cop[lm] + rb–deg[s] Mapping

(Pos = VB & Wd ∈ {appear, become, feel, look, remain, seem, smell, sound}) ⇒ (Pos ← V–J)
(Pos = VBD & Wd ∈ {appeared, became, felt, looked, remained, seemed, smelled, smelt, sounded}) ⇒ (Pos ← V–JD)
and similarly for other VB.*
(Pos = RB & SibR = RB) ⇒ (Pos ← RB–DEG)
(Pos = RB & SibR = JJ) ⇒ (Pos ← RB–DEG)
(Pos = RB & Par = ADJP) ⇒ (Pos ← RB–DEG)

TBL True Prec Rec F(0.5) Fchg p(Fchg) TrueU PrecU RecU F(0.5)U FchgU p(FchgU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.39 99.50 0.02 0.340 1 0.00 0.00 99.50 – –
CD 40132 99.33 99.42 99.37 -0.03 0.030 3413 98.60 97.19 99.37 -0.19 0.167
DT 90066 99.33 99.49 99.41 0.01 0.508 2 0.00 99.41 – –
EX 951 95.43 98.74 97.05 -0.21 0.276 0 97.05 – –
FW 238 55.12 47.48 51.02 -1.38 0.682 67 0.00 0.00 51.02 – –
IN 108456 98.11 98.72 98.41 -0.01 0.182 22 0.00 0.00 98.41 – –
JJ 67085 91.89 91.52 91.71 0.05 0.397 4581 78.74 74.29 91.71 0.57 0.444
JJR 3621 84.84 90.44 87.56 0.01 0.979 85 82.61 22.35 87.56 – –
JJS 2129 91.36 88.35 89.83 -3.68 0.192 54 73.21 75.93 89.83 0.91 0.817
MD 10743 99.67 99.81 99.74 0.02 0.139 5 0.00 99.74 – –
NN 146173 96.22 96.05 96.14 0.01 0.707 3934 74.15 68.99 96.14 -0.43 0.574
NNP 100926 96.77 97.28 97.03 0.01 0.523 6075 84.29 93.35 97.03 0.02 0.976
NNPS 2917 63.84 65.96 64.88 -1.10 0.349 234 35.44 11.97 64.88 -13.90 0.640
NNS 65922 97.48 97.92 97.70 -0.03 0.239 2353 81.98 88.57 97.70 -0.72 0.201
PDT 397 70.40 79.09 74.50 -0.91 0.449 0 0.00 74.50 – –
POS 9529 98.89 99.55 99.22 0.00 0.983 0 99.22 – –
PRP 19164 99.75 99.36 99.56 0.01 0.184 11 0.00 99.56 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.22 90.92 92.54 -0.04 0.676 516 83.96 82.17 92.54 0.93 0.497
RBR 1905 74.78 66.30 70.28 -0.27 0.424 7 25.00 14.29 70.28 – –
RBS 486 61.47 70.58 65.71 -16.35 0.248 2 0.00 65.71 – –
RP 2879 78.54 75.89 77.19 0.20 0.403 0 77.19 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.19 46.00 56.79 -1.52 0.374 17 0.00 56.79 – –
VB 29021 95.38 95.23 95.30 -0.13 0.136 570 84.11 76.14 95.30 0.27 0.882
VBD 32941 95.65 94.44 95.04 -0.02 0.543 426 82.13 61.50 95.04 -2.65 0.335
VBG 16321 91.54 92.36 91.95 0.17 0.190 924 70.78 91.23 91.95 0.64 0.527
VBN 22177 86.97 90.50 88.70 -0.15 0.134 716 66.22 82.68 88.70 -0.63 0.664
VBP 13819 93.70 92.16 92.93 -0.05 0.648 131 70.75 57.25 92.93 14.12 0.050
VBZ 23816 97.67 96.45 97.05 0.03 0.497 467 86.29 55.25 97.05 0.91 0.647
WDT 4745 96.62 95.30 95.96 -0.14 0.057 4 0.00 95.96 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.43 -0.76 0.057
TOKENS 1044667 96.821 -0.02 0.017 24622 81.90 -0.05 0.911


B.6

(Pos = IN & Wd = than) ⇒ (Pos ← IN–CMP)
(Pos = IN & Wd ∈ {’til, although, as, ...<9 omitted>..., whereas, whether, while}) ⇒ (Pos ← IN–SUB)
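A lexical mapping of this kind can be read as a per-token rewrite of the training tags. The following is a minimal sketch only: the function names are hypothetical, the word set is abbreviated because the source elides nine of its members, case-insensitive matching is an assumption, and ASCII hyphens stand in for the dashes used in the tag names above.

```python
from typing import List, Tuple

# Abbreviated, illustrative word list: the source elides nine members,
# so this set is NOT the full inventory used in the thesis.
SUBORDINATORS = {"'til", "although", "as", "whereas", "whether", "while"}


def map_in_tag(word: str, tag: str) -> str:
    """Return the remapped tag for one (word, tag) pair.

    Only tokens tagged IN are affected; everything else passes through.
    Lower-casing the word before lookup is an assumption of this sketch.
    """
    if tag != "IN":
        return tag
    lowered = word.lower()
    if lowered == "than":
        return "IN-CMP"
    if lowered in SUBORDINATORS:
        return "IN-SUB"
    return tag


def map_corpus(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply the mapping to every token of a tagged sentence."""
    return [(word, map_in_tag(word, tag)) for word, tag in tagged]


sentence = [("more", "JJR"), ("than", "IN"), ("half", "NN"), ("of", "IN")]
remapped = map_corpus(sentence)
```

Mapping in the reverse direction (collapsing IN-CMP and IN-SUB back to IN) would be needed before scoring against the original tagset.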

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.55 99.40 99.48 -0.00 0.832 1 0.00 0.00 99.48 – –
CD 40132 99.34 99.43 99.39 -0.02 0.394 3413 98.72 96.92 99.39 -0.27 0.335
DT 90066 99.33 99.45 99.39 -0.01 0.537 2 0.00 99.39 – –
EX 951 95.42 98.63 97.00 -0.27 0.555 0 0.00 97.00 – –
FW 238 52.27 48.32 50.22 -2.93 0.145 67 100.00 1.49 50.22 – –
IN 108456 98.16 98.71 98.44 0.01 0.413 22 0.00 0.00 98.44 – –
JJ 67085 91.89 91.28 91.59 -0.08 0.128 4581 79.45 72.23 91.59 -0.45 0.589
JJR 3621 84.77 90.20 87.40 -0.17 0.336 85 70.59 14.12 87.40 – –
JJS 2129 90.95 96.29 93.54 0.31 0.814 54 71.93 75.93 93.54 – –
MD 10743 99.69 99.76 99.73 0.01 0.366 5 0.00 99.73 – –
NN 146173 96.17 96.04 96.10 -0.03 0.236 3934 73.71 69.70 96.10 -0.19 0.760
NNP 100926 96.63 97.34 96.98 -0.03 0.197 6075 83.24 94.32 96.98 -0.15 0.413
NNPS 2917 64.69 66.06 65.37 -0.36 0.473 234 47.37 11.54 65.37 -10.70 0.703
NNS 65922 97.57 97.92 97.74 0.01 0.413 2353 83.49 88.10 97.74 -0.04 0.878
PDT 397 69.87 78.84 74.08 -1.46 0.361 0 0.00 74.08 – –
POS 9529 98.95 99.50 99.22 0.00 0.995 0 99.22 – –
PRP 19164 99.73 99.38 99.55 0.00 0.371 11 0.00 99.55 – –
PRP$ 9173 99.39 99.91 99.65 0.01 0.178 1 0.00 99.65 – –
RB 33806 94.15 90.87 92.48 -0.10 0.223 516 83.60 81.01 92.48 -0.01 0.964
RBR 1905 75.45 65.98 70.40 -0.10 0.953 7 0.00 0.00 70.40 – –
RBS 486 86.10 65.02 74.09 -5.68 0.527 2 0.00 74.09 – –
RP 2879 78.51 75.62 77.03 -0.01 0.895 0 77.03 – –
SYM 59 80.95 86.44 83.61 2.38 0.178 1 0.00 83.61 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 73.33 44.00 55.00 -4.63 0.196 17 0.00 55.00 – –
VB 29021 95.42 95.29 95.36 -0.07 0.346 570 82.22 75.44 95.36 -1.29 0.159
VBD 32941 95.54 94.64 95.09 0.03 0.726 426 84.62 64.55 95.09 1.36 0.171
VBG 16321 91.34 92.30 91.81 0.02 0.835 924 69.02 93.07 91.81 0.07 0.939
VBN 22177 87.19 90.34 88.74 -0.11 0.304 716 66.44 81.01 88.74 -1.36 0.164
VBP 13819 93.52 92.42 92.96 -0.01 0.970 131 62.04 51.15 92.96 1.09 0.974
VBZ 23816 97.75 96.48 97.11 0.09 0.167 467 87.54 55.67 97.11 1.96 0.447
WDT 4745 96.47 96.06 96.26 0.18 0.059 4 0.00 96.26 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.54 -0.53 0.313
TOKENS 1044667 96.826 -0.02 0.405 24622 81.77 -0.21 0.397
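As a reading aid for the Prec, Rec and F(0.5) columns, the sketch below assumes that F(0.5) is the weighted harmonic mean of precision and recall with weight α = 0.5, i.e. the balanced F-score. That reading is an assumption, although it agrees to within about 0.01 with rows such as NN above (96.17, 96.04, 96.10). Small differences in other rows may come from rounding or from averaging over folds. The function name is hypothetical.

```python
def f_score(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    With alpha = 0.5 this reduces to 2PR / (P + R). Inputs may be
    percentages or fractions, provided both use the same scale.
    Returns 0.0 when either input is zero, to avoid dividing by zero.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# NN row of the table above: Prec = 96.17, Rec = 96.04.
nn_f = f_score(96.17, 96.04)
```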


B.7

(Pos = TO & SibR = NP) ⇒ (Pos ← TO–IN)
(Pos = TO & Par = QP) ⇒ (Pos ← TO–Q)
(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.39 99.50 0.02 0.474 1 0.00 0.00 99.50 – –
CD 40132 99.29 99.49 99.39 -0.02 0.385 3413 98.26 97.36 99.39 -0.28 0.222
DT 90066 99.27 99.50 99.39 -0.02 0.287 2 0.00 99.39 – –
EX 951 95.53 98.84 97.16 -0.11 0.794 0 97.16 – –
FW 238 50.88 48.32 49.57 -4.18 0.049 67 25.00 1.49 49.57 – –
IN 108456 98.23 98.62 98.43 -0.00 0.938 22 0.00 0.00 98.43 – –
JJ 67085 91.98 91.26 91.62 -0.05 0.302 4581 77.78 72.95 91.62 -0.96 0.087
JJR 3621 85.08 89.89 87.42 -0.15 0.599 85 77.78 24.71 87.42 – –
JJS 2129 90.92 94.08 92.47 -0.84 0.823 54 70.18 74.07 92.47 -2.44 0.195
MD 10743 99.66 99.76 99.71 -0.01 0.479 5 0.00 99.71 – –
NN 146173 96.25 96.02 96.14 0.01 0.643 3934 74.63 69.09 96.14 -0.05 0.982
NNP 100926 96.75 97.24 97.00 -0.02 0.473 6075 84.23 92.76 97.00 -0.32 0.341
NNPS 2917 62.58 66.85 64.64 -1.46 0.203 234 30.82 20.94 64.64 20.01 0.743
NNS 65922 97.54 97.90 97.72 -0.01 0.725 2353 82.32 87.63 97.72 -1.02 0.255
PDT 397 70.75 78.59 74.46 -0.95 0.320 0 74.46 – –
POS 9529 98.92 99.51 99.21 -0.01 0.549 0 99.21 – –
PRP 19164 99.73 99.35 99.54 -0.01 0.107 11 0.00 0.00 99.54 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.03 91.16 92.57 -0.00 0.945 516 82.87 81.59 92.57 -0.08 0.998
RBR 1905 74.46 66.72 70.38 -0.14 0.869 7 25.00 14.29 70.38 – –
RBS 486 77.40 66.26 71.40 -9.11 0.434 2 0.00 71.40 – –
RP 2879 78.23 76.76 77.49 0.58 0.331 0 0.00 77.49 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 69.12 47.00 55.95 -2.98 0.374 17 0.00 0.00 55.95 – –
VB 29021 95.34 95.47 95.40 -0.02 0.606 570 82.16 75.96 95.40 -0.96 0.316
VBD 32941 95.71 94.58 95.14 0.08 0.031 426 81.44 63.85 95.14 -0.93 0.479
VBG 16321 91.43 92.24 91.84 0.04 0.668 924 69.89 91.45 91.84 0.03 0.933
VBN 22177 87.02 90.70 88.82 -0.02 0.863 716 66.20 78.77 88.82 -2.79 0.185
VBP 13819 93.66 92.18 92.92 -0.06 0.491 131 59.46 50.38 92.92 -1.65 0.670
VBZ 23816 97.68 96.62 97.14 0.12 0.236 467 87.13 56.53 97.14 2.72 0.358
WDT 4745 96.27 96.27 96.27 0.19 0.217 4 0.00 96.27 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.51 -0.60 0.029
TOKENS 1044667 96.831 -0.01 0.359 24622 81.46 -0.59 0.153


B.8 in–sub[s] Mapping

(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
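Unlike the lexical mappings, this mapping conditions on the parse tree around each token. The sketch below is illustrative only. It assumes that SibR is the category of the constituent immediately to the right of the token and that Par is the category of its parent. The `Leaf` class and its field names are hypothetical, and ASCII hyphens replace the dashes in the tag names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    """One token with just enough tree context for the rule (hypothetical)."""
    word: str
    pos: str
    parent: Optional[str] = None         # assumed meaning of Par
    right_sibling: Optional[str] = None  # assumed meaning of SibR


def map_in_sub(leaf: Leaf) -> str:
    """Retag IN as IN-SUB when it introduces a clause.

    Fires if the right sibling is S or the parent is SBAR, mirroring
    the two rules above; any other token keeps its original tag.
    """
    if leaf.pos == "IN" and (leaf.right_sibling == "S" or leaf.parent == "SBAR"):
        return "IN-SUB"
    return leaf.pos


because = Leaf("because", "IN", parent="SBAR", right_sibling="S")
of = Leaf("of", "IN", parent="PP", right_sibling="NP")
```

Because the tree context is only available in the treebank, a rule like this can relabel training data but cannot be applied directly to raw text at tagging time.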

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.75 99.85 – – 0 99.85 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.98 99.94 99.96 – – 0 99.96 – –
CC 26227 99.70 99.37 99.54 0.00 0.912 1 0.00 99.54 – –
CD 40132 99.27 99.57 99.42 0.00 0.988 3413 99.35 98.51 99.42 – –
DT 90066 99.41 99.34 99.38 0.00 0.656 2 0.00 99.38 – –
EX 951 95.08 99.58 97.28 -0.10 0.179 0 97.28 – –
FW 238 62.40 32.77 42.98 -0.10 0.953 67 0.00 0.00 42.98 – –
IN 108456 97.96 98.80 98.38 0.02 0.194 22 0.00 98.38 – –
JJ 67085 92.35 92.08 92.21 -0.00 0.764 4581 79.68 82.10 92.21 0.04 0.682
JJR 3621 84.71 92.29 88.34 -0.08 0.576 85 88.89 28.24 88.34 3.42 0.962
JJS 2129 95.19 95.77 95.48 0.03 0.660 54 86.49 59.26 95.48 – –
MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –
NN 146173 97.17 95.09 96.12 0.00 0.670 3934 81.14 67.69 96.12 0.09 0.272
NNP 100926 94.69 97.97 96.30 0.01 0.192 6075 82.45 98.16 96.30 -0.03 0.305
NNPS 2917 54.81 73.19 62.68 0.11 0.650 234 54.00 11.54 62.68 -3.23 0.374
NNS 65922 97.73 97.22 97.47 0.01 0.440 2353 87.32 90.44 97.47 0.07 0.284
PDT 397 70.02 84.13 76.43 -0.07 0.746 0 76.43 – –
POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –
PRP 19164 99.85 99.33 99.59 0.00 0.983 11 0.00 0.00 99.59 – –
PRP$ 9173 99.39 99.96 99.67 0.00 0.987 1 0.00 99.67 – –
RB 33806 95.10 90.63 92.81 0.01 0.766 516 90.30 81.20 92.81 -0.14 0.374
RBR 1905 77.41 66.56 71.58 -0.40 0.204 7 0.00 71.58 – –
RBS 486 87.98 84.36 86.13 0.11 0.797 2 0.00 0.00 86.13 – –
RP 2879 75.05 75.65 75.35 -0.49 0.132 0 75.35 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –
VB 29021 96.74 94.61 95.66 0.01 0.277 570 86.35 78.77 95.66 0.14 0.711
VBD 32941 95.01 95.94 95.47 0.02 0.466 426 80.59 71.13 95.47 0.41 0.401
VBG 16321 92.37 94.16 93.26 -0.03 0.453 924 82.05 91.02 93.26 -0.13 0.557
VBN 22177 88.24 90.90 89.55 -0.01 0.899 716 78.53 73.04 89.55 0.19 0.623
VBP 13819 93.98 92.24 93.10 0.04 0.219 131 72.88 32.82 93.10 1.84 0.921
VBZ 23816 98.84 96.41 97.61 0.00 0.825 467 89.91 64.88 97.61 0.21 0.848
WDT 4745 96.41 96.88 96.65 -0.13 0.288 4 0.00 96.65 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.77 0.11 0.380
TOKENS 1044667 96.855 0.00 0.003 24622 84.65 0.03 0.424


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.62 99.39 99.51 0.03 0.271 1 0.00 99.51 – –
CD 40132 99.31 99.49 99.40 -0.01 0.653 3413 98.55 97.60 99.40 -0.01 0.920
DT 90066 99.26 99.49 99.38 -0.03 0.100 2 0.00 99.38 – –
EX 951 95.43 98.84 97.11 -0.16 0.598 0 97.11 – –
FW 238 54.85 47.48 50.90 -1.61 0.288 67 0.00 0.00 50.90 – –
IN 108456 98.22 98.63 98.42 -0.00 0.867 22 0.00 0.00 98.42 – –
JJ 67085 92.03 91.23 91.63 -0.04 0.605 4581 79.44 71.93 91.63 -0.68 0.533
JJR 3621 84.82 90.91 87.76 0.25 0.091 85 70.00 24.71 87.76 4.76 0.981
JJS 2129 94.27 95.82 95.04 1.91 0.393 54 72.73 74.07 95.04 -0.65 0.517
MD 10743 99.67 99.78 99.72 0.00 0.688 5 0.00 99.72 – –
NN 146173 96.22 96.02 96.12 -0.01 0.846 3934 73.71 69.14 96.12 -0.60 0.578
NNP 100926 96.62 97.32 96.97 -0.05 0.041 6075 83.62 94.01 96.97 -0.07 0.650
NNPS 2917 63.28 65.68 64.46 -1.74 0.079 234 31.82 11.97 64.46 -16.30 0.483
NNS 65922 97.52 97.92 97.72 -0.01 0.803 2353 82.65 89.08 97.72 -0.03 0.915
PDT 397 70.64 80.60 75.29 0.16 0.833 0 75.29 – –
POS 9529 98.95 99.54 99.24 0.02 0.168 0 99.24 – –
PRP 19164 99.73 99.36 99.55 -0.00 0.736 11 0.00 0.00 99.55 – –
PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –
RB 33806 94.00 91.16 92.56 -0.02 0.559 516 84.65 81.20 92.56 0.72 0.545
RBR 1905 75.80 66.25 70.70 0.32 0.657 7 33.33 14.29 70.70 – –
RBS 486 87.39 81.28 84.22 7.22 0.409 2 0.00 84.22 – –
RP 2879 78.51 76.24 77.36 0.41 0.186 0 77.36 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.49 95.37 95.43 0.01 0.877 570 86.06 75.79 95.43 1.11 0.468
VBD 32941 95.65 94.60 95.12 0.06 0.204 426 82.21 62.91 95.12 -1.35 0.375
VBG 16321 91.61 92.34 91.97 0.20 0.075 924 69.92 93.07 91.97 0.81 0.125
VBN 22177 87.17 90.55 88.83 -0.01 0.885 716 66.21 81.01 88.83 -1.54 0.411
VBP 13819 93.49 92.20 92.84 -0.14 0.240 131 54.87 47.33 92.84 -8.37 0.406
VBZ 23816 97.68 96.59 97.13 0.11 0.180 467 86.45 57.39 97.13 3.34 0.178
WDT 4745 96.30 96.04 96.17 0.08 0.286 4 0.00 96.17 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.63 -0.36 0.313
TOKENS 1044667 96.842 0.00 0.992 24622 81.76 -0.22 0.652


MAX True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.97 99.57 99.77 0.00 0.983 0 99.77 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.75 99.31 99.53 -0.01 0.034 1 0.00 99.53 – –
CD 40132 99.35 99.48 99.41 0.01 0.109 3413 99.21 98.80 99.41 0.03 0.591
DT 90066 99.34 99.35 99.35 -0.02 0.014 2 0.00 99.35 – –
EX 951 96.10 98.42 97.25 0.20 0.239 0 97.25 – –
FW 238 75.00 27.73 40.49 -2.00 0.338 67 100.00 5.97 40.49 – –
IN 108456 97.84 98.50 98.17 -0.05 0.015 22 62.50 22.73 98.17 – –
JJ 67085 91.86 92.84 92.35 0.04 0.069 4581 82.47 82.45 92.35 0.03 0.826
JJR 3621 84.82 90.11 87.39 -0.08 0.096 85 81.25 45.88 87.39 0.00 0.756
JJS 2129 94.65 94.65 94.65 -0.13 0.285 54 88.10 68.52 94.65 -1.62 0.374
MD 10743 99.75 99.56 99.66 -0.00 0.618 5 0.00 99.66 – –
NN 146173 96.36 96.59 96.48 -0.00 0.953 3934 79.56 78.09 96.48 -0.22 0.321
NNP 100926 96.20 97.72 96.96 -0.00 0.457 6075 89.33 95.62 96.96 0.01 0.774
NNPS 2917 65.83 54.10 59.39 -0.08 0.739 234 56.76 35.90 59.39 -1.44 0.740
NNS 65922 97.66 98.51 98.08 0.01 0.042 2353 90.09 92.69 98.08 0.06 0.609
PDT 397 73.86 65.49 69.43 0.88 0.316 0 69.43 – –
POS 9529 98.40 99.63 99.01 -0.02 0.104 0 99.01 – –
PRP 19164 99.85 99.17 99.51 0.01 0.420 11 0.00 99.51 – –
PRP$ 9173 99.15 99.95 99.54 0.01 0.374 1 0.00 99.54 – –
RB 33806 93.68 91.00 92.32 0.01 0.708 516 92.36 84.30 92.32 -0.53 0.112
RBR 1905 75.19 62.52 68.27 -0.14 0.525 7 0.00 0.00 68.27 – –
RBS 486 83.84 80.04 81.89 -0.83 0.049 2 0.00 81.89 – –
RP 2879 79.73 73.36 76.41 0.05 0.908 0 76.41 – –
SYM 59 80.43 62.71 70.48 – – 1 100.00 100.00 70.48 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 84.85 28.00 42.11 2.92 0.374 17 0.00 42.11 – –
VB 29021 97.46 95.10 96.26 -0.01 0.518 570 90.56 85.79 96.26 -0.34 0.220
VBD 32941 96.10 96.55 96.32 -0.00 0.929 426 84.44 80.28 96.32 -0.03 0.889
VBG 16321 94.17 91.94 93.04 -0.04 0.235 924 85.48 89.83 93.04 -0.19 0.274
VBN 22177 91.04 89.66 90.35 0.02 0.717 716 80.94 79.47 90.35 -0.07 0.744
VBP 13819 94.68 93.69 94.18 0.03 0.396 131 74.31 61.83 94.18 -1.19 0.378
VBZ 23816 98.87 96.72 97.79 0.00 0.827 467 89.93 78.37 97.79 0.48 0.309
WDT 4745 94.29 96.48 95.38 -0.79 0.007 4 0.00 95.38 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 53.64 -0.16 0.287
TOKENS 1044667 97.048 -0.01 0.136 24622 87.29 -0.05 0.336


B.9

(Pos = DT & Wd ∈ {a, an, another, each, every, that, this}) ⇒ (Pos ← DT–1)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.67 99.39 99.53 0.05 0.128 1 0.00 0.00 99.53 – –
CD 40132 99.33 99.44 99.39 -0.02 0.397 3413 98.95 97.10 99.39 -0.06 0.740
DT 90066 99.33 99.50 99.41 0.01 0.062 2 0.00 99.41 – –
EX 951 95.43 98.74 97.05 -0.21 0.276 0 97.05 – –
FW 238 56.99 46.22 51.04 -1.33 0.791 67 57.14 5.97 51.04 – –
IN 108456 98.20 98.69 98.44 0.02 0.131 22 0.00 0.00 98.44 – –
JJ 67085 91.89 91.17 91.53 -0.15 0.178 4581 78.76 72.06 91.53 -0.99 0.286
JJR 3621 84.62 90.25 87.34 -0.23 0.291 85 73.53 29.41 87.34 20.52 0.453
JJS 2129 94.27 88.16 91.12 -2.30 0.362 54 71.43 74.07 91.12 -1.55 0.374
MD 10743 99.67 99.77 99.72 0.00 0.996 5 0.00 99.72 – –
NN 146173 96.20 96.00 96.10 -0.03 0.269 3934 74.45 69.19 96.10 -0.09 0.921
NNP 100926 96.66 97.30 96.98 -0.03 0.363 6075 83.72 93.84 96.98 -0.09 0.797
NNPS 2917 62.81 66.82 64.75 -1.30 0.164 234 32.14 19.23 64.75 15.81 0.543
NNS 65922 97.59 97.91 97.75 0.02 0.497 2353 83.79 87.89 97.75 0.03 0.975
PDT 397 71.00 78.34 74.49 -0.91 0.474 0 0.00 74.49 – –
POS 9529 98.93 99.52 99.22 0.00 0.986 0 99.22 – –
PRP 19164 99.74 99.35 99.55 -0.01 0.568 11 0.00 99.55 – –
PRP$ 9173 99.36 99.91 99.64 -0.01 0.606 1 0.00 99.64 – –
RB 33806 94.03 91.02 92.50 -0.08 0.035 516 83.95 79.07 92.50 -1.04 0.423
RBR 1905 74.70 65.88 70.01 -0.65 0.461 7 20.00 14.29 70.01 – –
RBS 486 64.86 83.54 73.02 -7.04 0.368 2 0.00 73.02 – –
RP 2879 79.10 75.30 77.15 0.14 0.461 0 77.15 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.10 95.54 95.32 -0.11 0.030 570 79.60 76.67 95.32 -2.01 0.087
VBD 32941 95.55 94.66 95.10 0.04 0.685 426 78.55 63.62 95.10 -2.70 0.184
VBG 16321 91.39 92.11 91.75 -0.05 0.448 924 70.42 92.75 91.75 1.07 0.097
VBN 22177 87.13 90.48 88.77 -0.08 0.690 716 66.29 81.01 88.77 -1.48 0.333
VBP 13819 93.80 92.20 92.99 0.03 0.702 131 64.55 54.20 92.99 6.24 0.336
VBZ 23816 97.64 96.62 97.13 0.11 0.161 467 86.62 58.24 97.13 4.35 0.321
WDT 4745 96.60 95.79 96.19 0.11 0.496 4 0.00 96.19 – –
WP 2604 99.05 99.62 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.43 -0.76 0.084
TOKENS 1044667 96.815 -0.03 0.057 24622 81.70 -0.29 0.367


B.10

(Pos = JJ & WdCs = [A-Z]§*[a-z]§* & Pos1Before = (?!ABSENT).*) ⇒ (Pos ← JJP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.90 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.57 99.40 99.48 0.01 0.431 1 0.00 99.48 – –
CD 40132 99.34 99.47 99.40 -0.00 0.870 3413 98.49 97.54 99.40 -0.07 0.744
DT 90066 99.33 99.47 99.40 0.00 0.996 2 0.00 99.40 – –
EX 951 95.33 98.84 97.06 -0.21 0.231 0 97.06 – –
FW 238 58.43 43.70 50.00 -3.35 0.248 67 33.33 1.49 50.00 – –
IN 108456 98.16 98.70 98.43 0.00 0.669 22 0.00 0.00 98.43 – –
JJ 67085 92.17 91.18 91.67 0.01 0.725 4581 79.30 71.58 91.67 -1.02 0.143
JJR 3621 85.04 90.39 87.63 0.09 0.500 85 80.00 14.12 87.63 – –
JJS 2129 88.21 92.39 90.25 -3.22 0.501 54 71.93 75.93 90.25 – –
MD 10743 99.67 99.78 99.72 0.00 0.688 5 0.00 99.72 – –
NN 146173 96.29 96.00 96.15 0.02 0.619 3934 74.85 68.99 96.15 0.02 0.999
NNP 100926 96.53 97.48 97.00 -0.02 0.309 6075 83.40 94.29 97.00 -0.06 0.766
NNPS 2917 64.39 66.03 65.20 -0.61 0.249 234 42.42 11.97 65.20 -10.17 0.755
NNS 65922 97.40 97.93 97.67 -0.06 0.000 2353 80.64 88.48 97.67 -1.62 0.011
PDT 397 70.76 79.85 75.03 -0.20 0.374 0 75.03 – –
POS 9529 98.91 99.57 99.24 0.02 0.489 0 99.24 – –
PRP 19164 99.72 99.37 99.55 -0.01 0.774 11 14.29 9.09 99.55 – –
PRP$ 9173 99.38 99.92 99.65 0.01 0.178 1 0.00 99.65 – –
RB 33806 94.16 91.06 92.58 0.01 0.741 516 85.54 80.23 92.58 0.62 0.757
RBR 1905 75.57 66.56 70.78 0.43 0.630 7 0.00 0.00 70.78 – –
RBS 486 66.24 52.88 58.81 -25.13 0.185 2 0.00 58.81 – –
RP 2879 79.21 75.58 77.36 0.40 0.082 0 77.36 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.32 95.42 95.37 -0.05 0.224 570 81.85 77.54 95.37 -0.09 0.753
VBD 32941 95.63 94.56 95.09 0.03 0.497 426 80.53 64.08 95.09 -1.22 0.333
VBG 16321 91.51 92.37 91.93 0.15 0.319 924 70.58 92.42 91.93 1.04 0.179
VBN 22177 87.00 90.63 88.78 -0.06 0.437 716 65.05 80.59 88.78 -2.72 0.155
VBP 13819 93.62 92.30 92.96 -0.01 0.683 131 58.56 49.62 92.96 -3.14 0.448
VBZ 23816 97.63 96.41 97.02 -0.01 0.880 467 82.68 54.18 97.02 -1.94 0.559
WDT 4745 96.59 95.62 96.10 0.01 0.731 4 0.00 96.10 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.49 -0.64 0.239
TOKENS 1044667 96.829 -0.01 0.544 24622 81.61 -0.41 0.179


B.11 nn–ms[l] + dt–num[l] Mapping

(Pos = DT & Wd ∈ {a, another, each, every, little, many, much}) ⇒ (Pos ← DT–1)
(Pos = DT & Wd ∈ {these, those}) ⇒ (Pos ← DT–P)
(Pos = NNS & Wd ∈ {acrobatics, adenoids, alms, ...<66 omitted>..., tweezers, vicissitudes, waterworks}) ⇒ (Pos ← NNS–P)
(Pos = NN & Wd ∈ {abalone, abandon, abandonment, ...<9266 omitted>..., zirconium, zoning, zoology}) ⇒ (Pos ← NN–M)
(Pos = JJ & Wd ∈ {countless, few, many, numerous, several}) ⇒ (Pos ← JJ–P)

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.74 99.84 -0.01 0.374 0 99.84 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –
CC 26227 99.72 99.33 99.53 -0.01 0.455 1 0.00 99.53 – –
CD 40132 99.29 99.57 99.43 0.01 0.009 3413 99.38 98.54 99.43 0.03 0.178
DT 90066 99.46 99.31 99.39 0.01 0.285 2 0.00 99.39 – –
EX 951 94.89 99.58 97.18 -0.21 0.107 0 97.18 – –
FW 238 62.10 32.35 42.54 -1.10 0.688 67 25.00 1.49 42.54 – –
IN 108456 97.82 98.92 98.37 0.00 0.856 22 0.00 0.00 98.37 – –
JJ 67085 92.13 92.24 92.19 -0.03 0.318 4581 77.42 84.48 92.19 -0.06 0.596
JJR 3621 84.71 92.74 88.54 0.15 0.199 85 87.10 31.76 88.54 12.33 0.298
JJS 2129 95.12 96.05 95.58 0.13 0.347 54 81.40 64.81 95.58 2.61 0.332
MD 10743 99.71 99.78 99.74 0.00 0.718 5 0.00 99.74 – –
NN 146173 97.30 94.96 96.12 0.00 0.932 3934 86.14 62.07 96.12 -2.16 0.002
NNP 100926 94.73 97.99 96.34 0.04 0.179 6075 82.41 98.39 96.34 0.05 0.389
NNPS 2917 55.11 72.81 62.74 0.19 0.548 234 56.00 11.97 62.74 0.35 0.972
NNS 65922 97.69 97.31 97.50 0.04 0.194 2353 86.55 90.78 97.50 -0.20 0.386
PDT 397 70.91 82.87 76.42 -0.08 0.840 0 76.42 – –
POS 9529 98.73 99.65 99.19 -0.01 0.374 0 99.19 – –
PRP 19164 99.86 99.33 99.59 -0.00 0.982 11 0.00 0.00 99.59 – –
PRP$ 9173 99.38 99.97 99.67 0.00 0.980 1 0.00 99.67 – –
RB 33806 95.22 90.55 92.83 0.03 0.099 516 89.68 82.56 92.83 0.41 0.659
RBR 1905 78.16 66.88 72.08 0.30 0.563 7 0.00 72.08 – –
RBS 486 88.55 84.36 86.41 0.42 0.017 2 0.00 0.00 86.41 – –
RP 2879 75.75 75.72 75.73 0.02 0.950 0 75.73 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 78.72 37.00 50.34 2.78 0.238 17 33.33 5.88 50.34 – –
VB 29021 96.68 94.68 95.67 0.01 0.587 570 83.15 80.53 95.67 -0.55 0.585
VBD 32941 95.02 95.90 95.46 0.00 0.744 426 80.00 70.42 95.46 -0.46 0.519
VBG 16321 92.10 94.20 93.14 -0.15 0.064 924 80.04 93.29 93.14 -0.30 0.631
VBN 22177 88.18 90.93 89.53 -0.03 0.795 716 76.87 73.32 89.53 -0.65 0.298
VBP 13819 94.07 92.08 93.07 0.01 0.836 131 66.15 32.82 93.07 -1.28 0.624
VBZ 23816 98.84 96.33 97.57 -0.04 0.223 467 89.44 61.67 97.57 -2.94 0.032
WDT 4745 97.01 96.44 96.72 -0.05 0.580 4 0.00 96.72 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.73 0.03 0.822
TOKENS 1044667 96.858 0.01 0.252 24622 84.41 -0.25 0.015


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.91 99.83 99.87 -0.04 0.070 0 99.87 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.64 99.38 99.51 0.03 0.285 1 0.00 99.51 – –
CD 40132 99.30 99.46 99.38 -0.03 0.111 3413 98.51 96.92 99.38 -0.37 0.089
DT 90066 99.36 99.44 99.40 -0.01 0.638 2 0.00 99.40 – –
EX 951 95.53 98.84 97.16 -0.11 0.447 0 97.16 – –
FW 238 51.60 47.48 49.45 -4.41 0.417 67 50.00 1.49 49.45 – –
IN 108456 98.14 98.69 98.41 -0.01 0.436 22 0.00 0.00 98.41 – –
JJ 67085 91.68 91.06 91.37 -0.32 0.021 4581 76.07 73.70 91.37 -1.51 0.183
JJR 3621 85.08 89.95 87.45 -0.11 0.556 85 90.00 10.59 87.45 – –
JJS 2129 91.69 95.91 93.76 0.53 0.805 54 71.43 74.07 93.76 -1.55 0.374
MD 10743 99.69 99.76 99.73 0.01 0.366 5 0.00 99.73 – –
NN 146173 96.27 95.78 96.02 -0.11 0.026 3934 74.96 65.12 96.02 -2.91 0.005
NNP 100926 96.44 97.41 96.92 -0.09 0.120 6075 82.46 94.01 96.92 -0.81 0.102
NNPS 2917 64.04 66.37 65.19 -0.63 0.118 234 40.00 17.95 65.19 19.25 0.302
NNS 65922 97.56 97.84 97.70 -0.03 0.467 2353 83.48 87.00 97.70 -0.66 0.324
PDT 397 71.99 78.34 75.03 -0.20 0.688 0 75.03 – –
POS 9529 98.89 99.51 99.20 -0.02 0.536 0 99.20 – –
PRP 19164 99.75 99.36 99.55 0.00 0.627 11 0.00 99.55 – –
PRP$ 9173 99.37 99.90 99.64 -0.01 0.629 1 0.00 99.64 – –
RB 33806 93.89 91.29 92.57 -0.00 0.934 516 81.91 81.59 92.57 -0.66 0.670
RBR 1905 75.19 66.82 70.76 0.41 0.547 7 0.00 0.00 70.76 – –
RBS 486 85.90 68.93 76.48 -2.63 0.534 2 0.00 76.48 – –
RP 2879 78.59 75.48 77.00 -0.05 0.807 0 77.00 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.13 95.52 95.32 -0.11 0.347 570 81.32 77.89 95.32 -0.17 0.817
VBD 32941 95.49 94.47 94.97 -0.09 0.361 426 79.35 63.15 94.97 -2.66 0.299
VBG 16321 91.19 92.37 91.78 -0.02 0.845 924 70.83 91.99 91.78 1.05 0.214
VBN 22177 86.93 90.31 88.59 -0.28 0.231 716 66.47 80.59 88.59 -1.56 0.149
VBP 13819 93.79 92.09 92.93 -0.04 0.180 131 64.00 48.85 92.93 -0.09 0.872
VBZ 23816 97.70 96.38 97.04 0.01 0.839 467 87.99 53.32 97.04 -0.53 0.948
WDT 4745 96.39 95.55 95.97 -0.13 0.331 4 0.00 95.97 – –
WP 2604 98.90 99.77 99.33 -0.02 0.872 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.05 -1.49 0.007
TOKENS 1044667 96.782 -0.06 0.013 24622 81.11 -1.02 0.002


B.12 vb–rp[s] Mapping

(Pos = VB & SibAllR = PRT) ⇒ (Pos ← VB–RP)
(Pos = VBG & SibAllR = PRT) ⇒ (Pos ← VBG–RP)
(Pos = VBD & SibAllR = PRT) ⇒ (Pos ← VBD–RP)
(Pos = VBN & SibAllR = PRT) ⇒ (Pos ← VBN–RP)
(Pos = VBP & SibAllR = PRT) ⇒ (Pos ← VBP–RP)
(Pos = VBZ & SibAllR = PRT) ⇒ (Pos ← VBZ–RP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.37 99.49 0.02 0.535 1 0.00 99.49 – –
CD 40132 99.30 99.46 99.38 -0.03 0.146 3413 98.40 97.25 99.38 -0.26 0.221
DT 90066 99.34 99.48 99.41 0.01 0.389 2 0.00 99.41 – –
EX 951 95.53 98.95 97.21 -0.05 0.374 0 97.21 – –
FW 238 56.31 48.74 52.25 1.01 0.794 67 62.50 7.46 52.25 – –
IN 108456 98.12 98.69 98.41 -0.02 0.178 22 0.00 0.00 98.41 – –
JJ 67085 92.08 91.23 91.65 -0.01 0.871 4581 78.74 72.91 91.65 -0.40 0.563
JJR 3621 85.16 90.17 87.59 0.05 0.814 85 68.18 17.65 87.59 – –
JJS 2129 90.93 96.05 93.42 0.18 0.844 54 71.93 75.93 93.42 – –
MD 10743 99.68 99.82 99.75 0.04 0.077 5 0.00 99.75 – –
NN 146173 96.18 96.04 96.11 -0.02 0.579 3934 74.00 69.47 96.11 -0.17 0.856
NNP 100926 96.70 97.37 97.03 0.02 0.519 6075 83.98 93.68 97.03 -0.01 0.996
NNPS 2917 63.51 66.88 65.15 -0.68 0.558 234 34.91 15.81 65.15 4.74 0.965
NNS 65922 97.51 97.93 97.72 -0.01 0.680 2353 82.56 87.34 97.72 -1.03 0.181
PDT 397 70.87 77.83 74.19 -1.31 0.071 0 74.19 – –
POS 9529 98.92 99.52 99.22 -0.01 0.747 0 99.22 – –
PRP 19164 99.73 99.35 99.54 -0.01 0.203 11 0.00 0.00 99.54 – –
PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –
RB 33806 94.01 91.17 92.57 -0.01 0.890 516 83.20 82.56 92.57 0.71 0.474
RBR 1905 75.18 67.24 70.99 0.73 0.412 7 50.00 14.29 70.99 – –
RBS 486 85.68 65.23 74.07 -5.71 0.527 2 0.00 74.07 – –
RP 2879 78.48 74.75 76.57 -0.61 0.530 0 76.57 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 73.44 47.00 57.32 -0.61 0.374 17 0.00 0.00 57.32 – –
VB 29021 95.27 95.17 95.22 -0.21 0.023 570 84.06 74.91 95.22 -0.61 0.444
VBD 32941 95.57 94.36 94.96 -0.11 0.134 426 79.70 61.74 94.96 -3.70 0.058
VBG 16321 91.78 92.24 92.01 0.23 0.070 924 71.56 92.86 92.01 2.04 0.023
VBN 22177 86.74 90.40 88.53 -0.34 0.046 716 65.35 78.77 88.53 -3.47 0.074
VBP 13819 93.50 92.26 92.88 -0.10 0.234 131 61.95 53.44 92.88 3.45 0.728
VBZ 23816 97.75 96.55 97.15 0.12 0.012 467 86.44 58.67 97.15 4.71 0.175
WDT 4745 96.61 95.43 96.01 -0.08 0.457 4 0.00 96.01 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.59 -0.45 0.489
TOKENS 1044667 96.824 -0.02 0.465 24622 81.71 -0.28 0.437


B.13

(Pos = JJ & WdCs =re [A-Z]§*[a-z]§* & Pos1Before =re (?!ABSENT)(?¡‘)(?!LRB)(?! ).*) ⇒ (Pos ← JJP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.91 99.89 -0.02 0.208 0 99.89 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.64 99.35 99.49 0.02 0.548 1 0.00 99.49 – –
CD 40132 99.37 99.38 99.38 -0.03 0.037 3413 99.16 96.45 99.38 -0.30 0.193
DT 90066 99.30 99.48 99.39 -0.01 0.187 2 0.00 99.39 – –
EX 951 95.43 98.84 97.11 -0.16 0.393 0 97.11 – –
FW 238 54.97 44.12 48.95 -5.38 0.172 67 20.00 1.49 48.95 – –
IN 108456 98.20 98.69 98.44 0.01 0.253 22 0.00 0.00 98.44 – –
JJ 67085 92.28 91.03 91.65 -0.01 0.913 4581 80.78 71.21 91.65 -0.42 0.516
JJR 3621 84.92 90.39 87.57 0.03 0.892 85 76.00 22.35 87.57 -0.91 0.374
JJS 2129 88.49 96.38 92.27 -1.06 0.842 54 70.69 75.93 92.27 -0.89 0.374
MD 10743 99.68 99.80 99.74 0.03 0.071 5 0.00 99.74 – –
NN 146173 96.17 96.05 96.11 -0.02 0.388 3934 73.78 69.67 96.11 -0.16 0.820
NNP 100926 96.51 97.44 96.97 -0.04 0.309 6075 82.74 94.88 96.97 -0.19 0.522
NNPS 2917 63.62 66.23 64.90 -1.07 0.366 234 30.34 11.54 64.90 -19.54 0.251
NNS 65922 97.55 97.88 97.72 -0.01 0.629 2353 83.40 88.19 97.72 -0.05 0.762
PDT 397 70.20 80.10 74.82 -0.47 0.283 0 74.82 – –
POS 9529 98.91 99.50 99.20 -0.02 0.071 0 99.20 – –
PRP 19164 99.73 99.37 99.55 -0.00 0.638 11 0.00 0.00 99.55 – –
PRP$ 9173 99.40 99.90 99.65 0.01 0.378 1 0.00 99.65 – –
RB 33806 94.00 91.16 92.56 -0.02 0.570 516 84.31 81.20 92.56 0.52 0.740
RBR 1905 75.36 66.61 70.72 0.34 0.462 7 0.00 0.00 70.72 – –
RBS 486 84.72 52.47 64.80 -17.50 0.266 2 0.00 64.80 – –
RP 2879 79.35 75.55 77.40 0.47 0.212 0 77.40 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 73.02 46.00 56.44 -2.13 0.225 17 0.00 0.00 56.44 – –
VB 29021 95.34 95.38 95.36 -0.07 0.247 570 81.32 75.61 95.36 -1.69 0.071
VBD 32941 95.54 94.51 95.02 -0.04 0.222 426 80.42 62.68 95.02 -2.50 0.055
VBG 16321 91.24 92.27 91.75 -0.04 0.480 924 69.32 92.21 91.75 -0.08 0.959
VBN 22177 86.92 90.59 88.72 -0.13 0.089 716 65.89 82.54 88.72 -0.98 0.324
VBP 13819 93.66 92.22 92.93 -0.04 0.543 131 57.85 53.44 92.93 0.17 0.883
VBZ 23816 97.65 96.52 97.08 0.06 0.137 467 87.13 56.53 97.08 2.72 0.345
WDT 4745 96.62 95.70 96.16 0.07 0.441 4 0.00 96.16 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.40 -0.82 0.144
TOKENS 1044667 96.821 -0.02 0.297 24622 81.70 -0.30 0.213


B.14 in–rp[l] Mapping

(Pos = RP & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN)
(Pos = IN & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.69 99.37 99.53 0.06 0.115 1 0.00 99.53 – –CD 40132 99.30 99.41 99.36 -0.05 0.099 3413 98.36 96.78 99.36 -0.52 0.104DT 90066 99.33 99.49 99.41 0.01 0.441 2 0.00 99.41 – –EX 951 95.04 98.74 96.85 -0.42 0.399 0 96.85 – –FW 238 58.33 47.06 52.09 0.70 0.903 67 75.00 4.48 52.09 – –IN 108456 98.17 98.69 98.43 0.00 0.885 22 0.00 0.00 98.43 – –JJ 67085 91.83 91.27 91.55 -0.12 0.071 4581 77.72 72.17 91.55 -1.55 0.144JJR 3621 84.67 90.64 87.56 0.01 0.975 85 81.25 30.59 87.56 27.49 0.374JJS 2129 91.54 90.00 90.76 -2.68 0.346 54 71.93 75.93 90.76 – –MD 10743 99.66 99.80 99.73 0.01 0.422 5 0.00 0.00 99.73 – –NN 146173 96.23 96.00 96.12 -0.01 0.648 3934 73.83 68.86 96.12 -0.73 0.315NNP 100926 96.74 97.26 97.00 -0.02 0.494 6075 84.70 92.66 97.00 -0.08 0.838NNPS 2917 63.48 66.61 65.01 -0.91 0.186 234 39.10 26.07 65.01 50.54 0.194NNS 65922 97.49 97.95 97.72 -0.01 0.733 2353 82.93 88.14 97.72 -0.36 0.580PDT 397 71.01 79.60 75.06 -0.16 0.687 0 75.06 – –POS 9529 98.95 99.54 99.24 0.02 0.285 0 99.24 – –PRP 19164 99.72 99.38 99.55 0.00 0.783 11 0.00 99.55 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.15 91.00 92.55 -0.03 0.685 516 84.84 80.23 92.55 0.21 0.856RBR 1905 75.21 65.46 70.00 -0.68 0.420 7 25.00 14.29 70.00 – –RBS 486 64.96 70.58 67.65 -13.87 0.292 2 0.00 67.65 – –RP 2879 78.83 75.55 77.16 0.14 0.756 0 77.16 – –SYM 59 79.03 83.05 80.99 -0.83 0.374 1 0.00 0.00 80.99 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 73.77 45.00 55.90 -3.07 0.374 17 0.00 55.90 – –VB 29021 95.37 95.44 95.40 -0.02 0.673 570 82.33 76.84 95.40 -0.27 0.660VBD 32941 95.70 94.48 95.09 0.03 0.717 426 79.76 
62.91 95.09 -2.64 0.040VBG 16321 91.40 92.32 91.86 0.07 0.062 924 69.48 93.40 91.86 0.60 0.321VBN 22177 86.76 90.87 88.77 -0.08 0.290 716 64.96 83.38 88.77 -1.32 0.039VBP 13819 93.74 92.26 92.99 0.02 0.796 131 62.50 53.44 92.99 3.88 0.648VBZ 23816 97.71 96.54 97.12 0.10 0.031 467 88.20 57.60 97.12 4.40 0.225WDT 4745 96.53 95.55 96.04 -0.05 0.497 4 0.00 96.04 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.43 -0.75 0.050TOKENS 1044667 96.818 -0.02 0.077 24622 81.52 -0.52 0.126
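The F(0.5) values in the table appear consistent with the balanced harmonic mean of the Prec and Rec columns. That reading is an assumption checked against a few rows, not a definition taken from this appendix. A minimal sketch, with `f_score` as a hypothetical helper name:

```python
def f_score(precision, recall):
    """Balanced harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # CC row of the table above: Prec 99.69, Rec 99.37, reported F 99.53.
    print(round(f_score(99.69, 99.37), 2))
    # VBZ row of the table above: Prec 97.71, Rec 96.54, reported F 97.12.
    print(round(f_score(97.71, 96.54), 2))
```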


B.15 in–rp[l] + in–sub[s] Mapping

(Pos = RP & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN)
(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
(Pos = IN & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RP)
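When structural and lexical rules are combined, a token can satisfy more than one condition, so rule order matters. The sketch below assumes the rules are tried in the order listed and that the first match wins. It also assumes that `Par` is the label of the parent node and `SibR` the label of the sibling to the right. None of these assumptions is confirmed in this appendix. The `Token` class, `PARTICLE_WORDS`, `RULES` and `map_token` are hypothetical names, and the word set holds only the six words printed in the rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Token:
    word: str
    pos: str
    parent: Optional[str] = None         # assumed meaning of Par
    right_sibling: Optional[str] = None  # assumed meaning of SibR


# Only the words printed in the rule; the 14 omitted words are left out.
PARTICLE_WORDS = {"about", "across", "along", "up", "upon", "with"}

Rule = Tuple[Callable[[Token], bool], str]

# Rules in the order listed above; assumption: the first match wins.
RULES: List[Rule] = [
    (lambda t: t.pos == "RP" and t.word.lower() in PARTICLE_WORDS, "RP-IN"),
    (lambda t: t.pos == "IN" and t.right_sibling == "S", "IN-SUB"),
    (lambda t: t.pos == "IN" and t.parent == "SBAR", "IN-SUB"),
    (lambda t: t.pos == "IN" and t.word.lower() in PARTICLE_WORDS, "IN-RP"),
]


def map_token(token: Token) -> str:
    """Return the tag chosen by the first matching rule, else the old tag."""
    for condition, new_tag in RULES:
        if condition(token):
            return new_tag
    return token.pos


if __name__ == "__main__":
    print(map_token(Token("with", "IN", parent="SBAR")))
    print(map_token(Token("with", "IN", parent="PP", right_sibling="NP")))
```

Under this ordering, a preposition such as "with" that sits under SBAR is tagged IN-SUB rather than IN-RP, because the structural rule is reached first.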

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.74 99.84 -0.01 0.374 0 99.84 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –CC 26227 99.69 99.39 99.54 0.01 0.766 1 0.00 99.54 – –CD 40132 99.28 99.56 99.42 -0.00 0.616 3413 99.35 98.51 99.42 – –DT 90066 99.41 99.34 99.37 0.00 0.926 2 0.00 99.37 – –EX 951 95.27 99.58 97.38 0.00 0.957 0 97.38 – –FW 238 63.41 32.77 43.21 0.46 0.836 67 0.00 0.00 43.21 – –IN 108456 97.98 98.78 98.38 0.01 0.464 22 0.00 0.00 98.38 – –JJ 67085 92.36 92.06 92.21 -0.01 0.700 4581 79.67 81.95 92.21 -0.06 0.391JJR 3621 84.81 92.21 88.36 -0.06 0.758 85 88.89 28.24 88.36 3.42 0.374JJS 2129 95.19 95.77 95.48 0.03 0.577 54 88.89 59.26 95.48 1.11 0.374MD 10743 99.68 99.79 99.73 -0.00 0.374 5 0.00 99.73 – –NN 146173 97.15 95.09 96.11 -0.00 0.633 3934 80.97 67.82 96.11 0.09 0.455NNP 100926 94.67 97.97 96.29 -0.00 0.747 6075 82.41 98.16 96.29 -0.06 0.043NNPS 2917 54.87 73.19 62.72 0.16 0.552 234 53.85 11.97 62.72 -0.35 0.992NNS 65922 97.72 97.21 97.47 0.00 0.861 2353 87.49 90.35 97.47 0.12 0.043PDT 397 69.54 83.38 75.83 -0.85 0.056 0 75.83 – –POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –PRP 19164 99.85 99.33 99.59 -0.00 0.374 11 0.00 0.00 99.59 – –PRP$ 9173 99.38 99.96 99.67 -0.01 0.374 1 0.00 99.67 – –RB 33806 95.08 90.63 92.81 0.01 0.843 516 90.11 81.20 92.81 -0.24 0.182RBR 1905 77.27 66.93 71.73 -0.18 0.721 7 0.00 71.73 – –RBS 486 87.96 84.16 86.01 -0.03 0.909 2 0.00 0.00 86.01 – –RP 2879 75.08 75.96 75.52 -0.27 0.195 0 75.52 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.74 94.60 95.66 0.00 0.671 570 86.02 78.77 95.66 -0.04 0.747VBD 32941 95.00 95.96 95.48 0.02 0.398 426 80.65 70.42 95.48 -0.08 
0.893VBG 16321 92.35 94.14 93.24 -0.04 0.324 924 82.23 91.13 93.24 0.04 0.947VBN 22177 88.28 90.89 89.57 0.01 0.859 716 78.51 73.46 89.57 0.48 0.450VBP 13819 93.95 92.20 93.06 0.00 0.787 131 76.36 32.06 93.06 1.61 0.996VBZ 23816 98.84 96.40 97.61 -0.00 0.986 467 89.85 64.45 97.61 -0.21 0.593WDT 4745 96.37 96.88 96.63 -0.15 0.236 4 0.00 96.63 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.73 0.02 0.922TOKENS 1044667 96.851 -0.00 0.725 24622 84.63 0.00 0.935


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.56 99.39 99.47 -0.01 0.508 1 0.00 0.00 99.47 – –CD 40132 99.33 99.49 99.41 0.00 0.899 3413 98.72 97.45 99.41 0.01 0.990DT 90066 99.28 99.47 99.37 -0.03 0.230 2 0.00 99.37 – –EX 951 95.13 98.53 96.80 -0.48 0.251 0 96.80 – –FW 238 60.23 44.54 51.21 -1.01 0.680 67 50.00 1.49 51.21 – –IN 108456 98.23 98.63 98.43 0.00 0.938 22 0.00 0.00 98.43 – –JJ 67085 91.86 91.35 91.61 -0.06 0.268 4581 78.07 72.12 91.61 -1.36 0.090JJR 3621 84.69 90.78 87.63 0.09 0.785 85 70.00 24.71 87.63 – –JJS 2129 94.58 91.73 93.13 -0.13 0.981 54 79.55 64.81 93.13 -3.31 0.180MD 10743 99.67 99.80 99.73 0.02 0.330 5 0.00 99.73 – –NN 146173 96.20 95.96 96.08 -0.05 0.123 3934 73.49 67.84 96.08 -1.72 0.030NNP 100926 96.61 97.42 97.01 -0.01 0.757 6075 83.23 94.86 97.01 0.11 0.686NNPS 2917 64.10 66.16 65.11 -0.74 0.210 234 44.44 17.09 65.11 18.83 0.350NNS 65922 97.62 97.88 97.75 0.02 0.498 2353 84.15 87.55 97.75 0.06 0.868PDT 397 70.54 79.60 74.79 -0.51 0.462 0 74.79 – –POS 9529 98.94 99.54 99.24 0.02 0.417 0 99.24 – –PRP 19164 99.74 99.37 99.55 0.00 0.379 11 0.00 99.55 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.04 91.03 92.51 -0.07 0.317 516 82.64 81.20 92.51 -0.46 0.895RBR 1905 75.80 65.93 70.52 0.07 0.907 7 20.00 14.29 70.52 – –RBS 486 73.85 82.51 77.94 -0.78 0.926 2 0.00 77.94 – –RP 2879 78.21 76.03 77.10 0.08 0.821 0 0.00 77.10 – –SYM 59 79.37 84.75 81.97 0.37 0.823 1 0.00 0.00 81.97 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 69.12 47.00 55.95 -2.98 0.374 17 0.00 0.00 55.95 – –VB 29021 95.39 95.41 95.40 -0.02 0.639 570 85.32 75.44 95.40 0.46 0.894VBD 32941 95.68 94.66 95.17 0.11 0.112 426 
79.59 63.15 95.17 -2.54 0.095VBG 16321 91.58 92.02 91.80 0.01 0.918 924 71.11 91.88 91.80 1.21 0.083VBN 22177 87.26 90.53 88.87 0.03 0.809 716 65.86 80.03 88.87 -2.36 0.149VBP 13819 93.59 92.21 92.90 -0.08 0.601 131 59.09 49.62 92.90 -2.74 0.855VBZ 23816 97.66 96.59 97.12 0.10 0.200 467 84.29 56.32 97.12 1.15 0.408WDT 4745 96.11 96.23 96.17 0.08 0.641 4 0.00 96.17 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.51 -0.59 0.131TOKENS 1044667 96.832 -0.01 0.597 24622 81.59 -0.44 0.321


B.16

(Pos = VB & SibAllR = PRT) ⇒ (Pos ← VB–RP)
(Pos = VBG & SibAllR = PRT) ⇒ (Pos ← VBG–RP)
and similarly for other VB.*
(Pos = RP & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN)
(Pos = IN & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.62 99.36 99.49 0.01 0.654 1 0.00 99.49 – –CD 40132 99.33 99.42 99.37 -0.04 0.029 3413 98.78 96.95 99.37 -0.23 0.312DT 90066 99.32 99.49 99.41 0.00 0.613 2 0.00 99.41 – –EX 951 95.14 98.74 96.90 -0.37 0.460 0 96.90 – –FW 238 60.00 45.38 51.67 -0.11 0.938 67 50.00 1.49 51.67 – –IN 108456 98.13 98.69 98.41 -0.02 0.217 22 0.00 0.00 98.41 – –JJ 67085 91.90 91.35 91.62 -0.04 0.348 4581 77.80 73.65 91.62 -0.46 0.575JJR 3621 84.66 91.02 87.73 0.21 0.263 85 70.97 25.88 87.73 – –JJS 2129 90.85 96.52 93.60 0.37 0.775 54 71.93 75.93 93.60 – –MD 10743 99.67 99.80 99.73 0.02 0.237 5 0.00 99.73 – –NN 146173 96.19 96.00 96.10 -0.03 0.192 3934 74.23 68.76 96.10 -0.55 0.165NNP 100926 96.66 97.28 96.97 -0.04 0.389 6075 83.82 93.58 96.97 -0.15 0.539NNPS 2917 63.70 66.88 65.25 -0.53 0.360 234 43.09 22.65 65.25 42.89 0.013NNS 65922 97.57 97.84 97.70 -0.02 0.479 2353 83.50 86.70 97.70 -0.81 0.038PDT 397 70.76 79.85 75.03 -0.20 0.653 0 75.03 – –POS 9529 98.92 99.53 99.22 0.00 0.988 0 99.22 – –PRP 19164 99.74 99.36 99.55 -0.00 0.990 11 0.00 99.55 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 93.93 91.12 92.50 -0.08 0.088 516 84.19 79.46 92.50 -0.66 0.653RBR 1905 75.67 65.46 70.19 -0.40 0.555 7 0.00 0.00 70.19 – –RBS 486 87.22 64.61 74.23 -5.50 0.526 2 0.00 74.23 – –RP 2879 78.01 73.71 75.80 -1.61 0.049 0 0.00 75.80 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 75.41 46.00 57.14 -0.91 0.374 17 0.00 57.14 – –VB 29021 95.29 95.22 95.25 -0.18 0.119 570 83.72 76.67 95.25 0.41 0.921VBD 32941 95.60 94.28 94.94 -0.13 0.080 426 86.42 61.27 94.94 -0.76 0.748VBG 
16321 91.60 91.99 91.80 0.00 0.948 924 70.45 91.34 91.80 0.43 0.563VBN 22177 86.61 90.39 88.46 -0.42 0.027 716 65.32 81.28 88.46 -2.12 0.117VBP 13819 93.31 92.14 92.72 -0.27 0.011 131 63.96 54.20 92.72 5.80 0.282VBZ 23816 97.61 96.56 97.08 0.06 0.496 467 80.60 57.82 97.08 0.87 0.783WDT 4745 96.49 95.60 96.04 -0.05 0.574 4 0.00 96.04 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.25 -1.11 0.025TOKENS 1044667 96.802 -0.04 0.045 24622 81.65 -0.35 0.098


B.17

(Pos = RP & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN–RB)
(Pos = RP & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RP–RB)
(Pos = RB & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RB–IN–RP)
(Pos = RB & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RB–RP)
(Pos = RB & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← IN–RB)
(Pos = IN & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RB–RP)
(Pos = IN & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← IN–RB)


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.62 99.38 99.50 0.02 0.354 1 0.00 99.50 – –CD 40132 99.32 99.46 99.39 -0.02 0.295 3413 98.49 97.19 99.39 -0.25 0.305DT 90066 99.35 99.46 99.41 0.01 0.677 2 0.00 99.41 – –EX 951 94.93 98.53 96.70 -0.58 0.178 0 96.70 – –FW 238 57.73 47.06 51.85 0.23 0.958 67 100.00 1.49 51.85 – –IN 108456 95.77 98.96 97.34 -1.10 0.000 22 0.00 0.00 97.34 – –JJ 67085 91.88 91.25 91.57 -0.10 0.246 4581 77.94 73.02 91.57 -0.81 0.204JJR 3621 84.41 90.89 87.53 -0.02 0.930 85 60.61 23.53 87.53 -2.77 0.796JJS 2129 88.57 96.48 92.36 -0.97 0.864 54 71.93 75.93 92.36 – –MD 10743 99.67 99.81 99.74 0.02 0.225 5 0.00 99.74 – –NN 146173 96.23 95.96 96.10 -0.03 0.156 3934 73.93 67.92 96.10 -1.38 0.054NNP 100926 96.77 97.23 97.00 -0.02 0.697 6075 84.64 92.95 97.00 0.04 0.867NNPS 2917 63.08 66.61 64.80 -1.22 0.098 234 32.82 18.38 64.80 13.39 0.811NNS 65922 97.46 97.97 97.71 -0.02 0.335 2353 80.93 89.12 97.71 -1.09 0.068PDT 397 70.82 80.10 75.18 0.00 0.955 0 75.18 – –POS 9529 98.94 99.53 99.23 0.01 0.615 0 99.23 – –PRP 19164 99.72 99.37 99.54 -0.01 0.594 11 0.00 99.54 – –PRP$ 9173 99.37 99.90 99.64 -0.01 0.694 1 0.00 99.64 – –RB 33806 94.39 83.02 88.34 -4.57 0.000 516 82.74 80.81 88.34 -0.64 0.594RBR 1905 76.09 65.67 70.50 0.04 0.954 7 0.00 0.00 70.50 – –RBS 486 84.44 52.47 64.72 -17.61 0.264 2 0.00 64.72 – –RP 2879 78.36 76.73 77.54 0.64 0.314 0 77.54 – –SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.41 95.37 95.39 -0.04 0.659 570 83.49 77.19 95.39 0.64 0.695VBD 32941 95.62 94.47 95.04 -0.02 0.297 426 83.54 61.97 95.04 -1.51 0.305VBG 
16321 91.37 92.40 91.88 0.10 0.359 924 70.16 92.86 91.88 0.90 0.371VBN 22177 86.98 90.48 88.69 -0.16 0.014 716 66.47 79.47 88.69 -2.18 0.075VBP 13819 93.40 92.31 92.85 -0.13 0.234 131 53.17 51.15 92.85 -5.99 0.431VBZ 23816 97.70 96.62 97.16 0.14 0.021 467 87.25 57.17 97.16 3.49 0.208WDT 4745 96.43 96.10 96.26 0.18 0.156 4 0.00 96.26 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 49.20 -5.05 0.001TOKENS 1044667 96.585 -0.27 0.000 24622 81.49 -0.55 0.036


B.18

(Pos = RP & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN–RB)
(Pos = RP & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RP–RB)
(Pos = RB & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RB–IN–RP)
(Pos = RB & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RB–RP)
(Pos = RB & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← IN–RB)
(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
(Pos = IN & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RB–RP)
(Pos = IN & Par =re (?!SBAR) & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under} & SibR =re (?!S)) ⇒ (Pos ← IN–RB)
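The last rule above uses the `=re` operator with the patterns `(?!SBAR)` and `(?!S)`, which look like regular-expression negative lookaheads. The sketch below shows how such patterns behave when matched from the start of a node label with Python's `re` module. Whether the original tool matches from the start of the label, and how it treats a missing sibling, are assumptions. The helper name `label_matches` is hypothetical.

```python
import re

# Negative lookaheads as printed in the rule.
NOT_SBAR = re.compile(r"(?!SBAR)")
NOT_S = re.compile(r"(?!S)")


def label_matches(pattern, label):
    """True if the pattern matches at the start of the label.

    Assumption: a missing label (None) is treated as the empty string,
    so a negative lookahead succeeds on it.
    """
    return pattern.match(label or "") is not None


if __name__ == "__main__":
    for label in ["PP", "SBAR", "S", "NP", None]:
        print(label, label_matches(NOT_SBAR, label), label_matches(NOT_S, label))
```

Note that under this prefix-matching reading `(?!S)` also rejects any label beginning with S, such as SBAR, not only the bare label S.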


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.67 99.33 99.50 0.02 0.525 1 0.00 0.00 99.50 – –CD 40132 99.29 99.42 99.36 -0.05 0.061 3413 98.51 96.84 99.36 -0.42 0.048DT 90066 99.29 99.47 99.38 -0.02 0.095 2 0.00 99.38 – –EX 951 95.13 98.53 96.80 -0.48 0.182 0 96.80 – –FW 238 57.95 47.48 52.19 0.89 0.756 67 0.00 0.00 52.19 – –IN 108456 95.78 98.90 97.32 -1.13 0.000 22 0.00 0.00 97.32 – –JJ 67085 91.99 91.25 91.62 -0.04 0.489 4581 78.48 72.34 91.62 -0.96 0.381JJR 3621 84.52 91.36 87.80 0.29 0.378 85 72.22 30.59 87.80 23.27 0.374JJS 2129 91.54 96.10 93.77 0.55 0.803 54 71.93 75.93 93.77 – –MD 10743 99.67 99.78 99.72 0.00 0.688 5 0.00 99.72 – –NN 146173 96.19 95.97 96.08 -0.05 0.091 3934 73.31 68.15 96.08 -1.60 0.013NNP 100926 96.69 97.28 96.98 -0.03 0.124 6075 83.96 94.09 96.98 0.19 0.329NNPS 2917 63.07 66.44 64.71 -1.36 0.212 234 32.69 14.53 64.71 -3.18 0.845NNS 65922 97.56 97.90 97.73 0.00 0.794 2353 83.04 87.80 97.73 -0.48 0.283PDT 397 70.82 80.10 75.18 0.00 0.905 0 75.18 – –POS 9529 98.94 99.54 99.24 0.02 0.071 0 99.24 – –PRP 19164 99.74 99.37 99.56 0.01 0.180 11 0.00 99.56 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.44 82.93 88.31 -4.60 0.000 516 82.90 80.81 88.31 -0.55 0.760RBR 1905 76.48 65.56 70.60 0.19 0.827 7 25.00 14.29 70.60 – –RBS 486 86.46 68.31 76.32 -2.84 0.531 2 0.00 76.32 – –RP 2879 77.88 76.94 77.41 0.47 0.410 0 77.41 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.30 95.27 95.28 -0.15 0.040 570 84.41 75.96 95.28 0.32 0.731VBD 32941 95.59 94.64 95.11 0.05 0.352 426 80.91 62.68 
95.11 -2.24 0.283VBG 16321 91.51 92.22 91.86 0.07 0.130 924 70.48 91.99 91.86 0.76 0.332VBN 22177 87.09 90.65 88.84 -0.00 0.985 716 66.06 81.56 88.84 -1.36 0.430VBP 13819 93.24 92.08 92.66 -0.34 0.044 131 49.61 48.85 92.66 -11.24 0.016VBZ 23816 97.62 96.63 97.12 0.10 0.113 467 85.53 58.24 97.12 3.81 0.334WDT 4745 95.94 96.23 96.09 -0.00 0.994 4 0.00 96.09 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 49.23 -5.00 0.000TOKENS 1044667 96.585 -0.27 0.000 24622 81.52 -0.52 0.167


B.19 vb–rp[ld] Mapping

(Pos = VB & Wd ∈ {bail, beef, blot, ...<76 omitted>..., smooth, soak, speed}) ⇒ (Pos ← VB–RP)
(Pos = VBG & Wd ∈ {bailing, beefing, blotting, ...<76 omitted>..., smoothing, soaking, speeding}) ⇒ (Pos ← VBG–RP)
and similarly for other VB.*

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.73 99.33 99.53 -0.00 0.178 1 0.00 99.53 – –CD 40132 99.28 99.57 99.43 0.00 0.208 3413 99.35 98.51 99.43 – –DT 90066 99.46 99.29 99.38 0.00 0.421 2 0.00 99.38 – –EX 951 95.18 99.58 97.33 -0.05 0.374 0 97.33 – –FW 238 64.17 32.35 43.02 – – 67 0.00 0.00 43.02 – –IN 108456 97.84 98.91 98.37 0.00 0.022 22 0.00 98.37 – –JJ 67085 92.37 92.08 92.23 0.01 0.329 4581 79.64 82.12 92.23 0.02 0.639JJR 3621 84.90 92.21 88.40 -0.01 0.925 85 88.89 28.24 88.40 3.42 0.374JJS 2129 95.15 95.77 95.46 0.00 0.915 54 84.21 59.26 95.46 -1.09 0.374MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.16 95.10 96.12 0.00 0.327 3934 81.02 67.69 96.12 0.02 0.875NNP 100926 94.68 97.95 96.29 -0.01 0.099 6075 82.48 98.12 96.29 -0.02 0.581NNPS 2917 54.78 73.12 62.63 0.03 0.767 234 53.85 11.97 62.63 -0.35 0.374NNS 65922 97.71 97.21 97.46 -0.01 0.329 2353 86.95 90.35 97.46 -0.19 0.019PDT 397 69.87 84.13 76.34 -0.18 0.374 0 76.34 – –POS 9529 98.74 99.65 99.20 – – 0 99.20 – –PRP 19164 99.85 99.34 99.59 0.00 0.374 11 0.00 0.00 99.59 – –PRP$ 9173 99.40 99.96 99.68 0.01 0.374 1 0.00 99.68 – –RB 33806 95.18 90.54 92.80 -0.00 0.815 516 90.30 81.20 92.80 -0.14 0.374RBR 1905 77.39 67.19 71.93 0.09 0.379 7 0.00 71.93 – –RBS 486 87.98 84.36 86.13 0.11 0.625 2 0.00 0.00 86.13 – –RP 2879 75.59 75.82 75.71 -0.02 0.858 0 75.71 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.74 94.63 95.68 0.02 0.096 570 86.07 79.12 95.68 0.22 0.099VBD 32941 95.04 95.87 95.45 -0.01 0.642 426 81.77 69.48 95.45 -0.16 0.859VBG 16321 92.34 94.15 93.23 -0.05 0.075 924 
82.31 91.13 93.23 0.09 0.594VBN 22177 88.15 90.96 89.53 -0.03 0.251 716 78.04 73.46 89.53 0.19 0.837VBP 13819 94.01 92.18 93.09 0.03 0.054 131 75.44 32.82 93.09 2.93 0.459VBZ 23816 98.82 96.37 97.58 -0.03 0.099 467 89.33 62.74 97.58 -2.00 0.089WDT 4745 97.19 96.38 96.78 0.01 0.639 4 0.00 96.78 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.71 -0.01 0.913TOKENS 1044667 96.852 -0.00 0.899 24622 84.60 -0.03 0.130


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.61 99.41 99.51 0.03 0.278 1 0.00 99.51 – –CD 40132 99.35 99.46 99.40 -0.00 0.884 3413 98.84 97.28 99.40 -0.03 0.844DT 90066 99.35 99.49 99.42 0.02 0.104 2 0.00 99.42 – –EX 951 95.43 98.84 97.11 -0.16 0.208 0 97.11 – –FW 238 55.56 46.22 50.46 -2.46 0.268 67 0.00 50.46 – –IN 108456 98.15 98.71 98.43 0.00 0.870 22 0.00 0.00 98.43 – –JJ 67085 92.05 91.32 91.68 0.03 0.340 4581 78.71 73.37 91.68 -0.09 0.915JJR 3621 85.14 90.06 87.53 -0.02 0.924 85 70.00 16.47 87.53 -23.51 0.520JJS 2129 88.11 92.58 90.29 -3.18 0.507 54 71.93 75.93 90.29 – –MD 10743 99.67 99.79 99.73 0.01 0.581 5 0.00 99.73 – –NN 146173 96.21 96.01 96.11 -0.02 0.547 3934 73.99 68.76 96.11 -0.71 0.062NNP 100926 96.68 97.34 97.01 -0.01 0.792 6075 84.13 93.71 97.01 0.11 0.775NNPS 2917 63.78 65.75 64.75 -1.29 0.244 234 38.27 13.25 64.75 -5.28 0.853NNS 65922 97.46 97.97 97.72 -0.01 0.792 2353 81.60 88.61 97.72 -0.94 0.248PDT 397 71.04 79.09 74.85 -0.43 0.346 0 74.85 – –POS 9529 98.94 99.55 99.24 0.02 0.097 0 99.24 – –PRP 19164 99.74 99.37 99.56 0.01 0.493 11 0.00 99.56 – –PRP$ 9173 99.39 99.91 99.65 0.01 0.178 1 0.00 99.65 – –RB 33806 94.07 91.26 92.65 0.08 0.170 516 83.63 82.17 92.65 0.73 0.492RBR 1905 75.15 67.30 71.01 0.76 0.405 7 0.00 0.00 71.01 – –RBS 486 66.75 52.47 58.76 -25.20 0.186 2 0.00 58.76 – –RP 2879 78.33 76.35 77.33 0.37 0.661 0 0.00 77.33 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.54 95.26 95.40 -0.03 0.537 570 85.08 74.04 95.40 -0.67 0.426VBD 32941 95.56 94.43 94.99 -0.07 0.061 426 81.73 61.97 94.99 
-2.43 0.214VBG 16321 91.55 92.11 91.83 0.04 0.484 924 69.75 92.32 91.83 0.32 0.545VBN 22177 86.76 90.53 88.60 -0.26 0.134 716 65.55 79.47 88.60 -2.92 0.064VBP 13819 93.41 92.18 92.80 -0.19 0.012 131 53.03 53.44 92.80 -4.02 0.369VBZ 23816 97.77 96.50 97.13 0.11 0.192 467 87.58 55.89 97.13 2.22 0.612WDT 4745 96.71 95.49 96.10 0.01 0.917 4 0.00 96.10 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.46 -0.69 0.215TOKENS 1044667 96.823 -0.02 0.303 24622 81.69 -0.31 0.172


B.20

(Pos = VB & Wd ∈ {auction, back, bail, ...<187 omitted>..., speak, speed, spell}) ⇒ (Pos ← VB–RP)
(Pos = VBG & Wd ∈ {auctioning, backing, bailing, ...<187 omitted>..., speaking, speeding, spelling}) ⇒ (Pos ← VBG–RP)
(Pos = VBD & Wd ∈ {ate, auctioned, backed, ...<187 omitted>..., sped, spelt, spoke}) ⇒ (Pos ← VBD–RP)
(Pos = VBN & Wd ∈ {auctioned, backed, bailed, ...<187 omitted>..., sped, spelt, spoken}) ⇒ (Pos ← VBN–RP)
(Pos = VBP & Wd ∈ {auction, back, bail, ...<187 omitted>..., speak, speed, spell}) ⇒ (Pos ← VBP–RP)
(Pos = VBZ & Wd ∈ {auctions, backs, bails, ...<187 omitted>..., speaks, speeds, spells}) ⇒ (Pos ← VBZ–RP)


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –” 7620 99.91 99.88 99.89 -0.01 0.374 0 99.89 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.56 99.40 99.48 0.00 0.646 1 0.00 99.48 – –CD 40132 99.33 99.48 99.40 -0.00 0.858 3413 98.87 97.30 99.40 0.00 0.993DT 90066 99.36 99.48 99.42 0.02 0.095 2 0.00 99.42 – –EX 951 95.63 98.95 97.26 0.00 0.954 0 97.26 – –FW 238 60.00 45.38 51.67 -0.11 0.994 67 33.33 1.49 51.67 – –IN 108456 98.15 98.69 98.42 -0.01 0.487 22 0.00 0.00 98.42 – –JJ 67085 91.64 91.37 91.50 -0.17 0.103 4581 77.68 72.78 91.50 -1.14 0.286JJR 3621 85.02 90.89 87.85 0.35 0.253 85 75.00 21.18 87.85 – –JJS 2129 90.84 92.20 91.52 -1.87 0.715 54 76.47 72.22 91.52 0.56 0.954MD 10743 99.65 99.80 99.73 0.01 0.680 5 0.00 99.73 – –NN 146173 96.18 95.95 96.06 -0.07 0.021 3934 73.89 68.84 96.06 -0.72 0.021NNP 100926 96.69 97.28 96.98 -0.03 0.305 6075 84.12 93.40 96.98 -0.06 0.909NNPS 2917 63.18 66.58 64.83 -1.17 0.267 234 36.55 22.65 64.83 34.60 0.645NNS 65922 97.47 97.89 97.68 -0.04 0.255 2353 82.51 87.59 97.68 -0.93 0.172PDT 397 70.11 78.59 74.11 -1.42 0.222 0 0.00 74.11 – –POS 9529 98.83 99.52 99.17 -0.05 0.074 0 99.17 – –PRP 19164 99.73 99.36 99.55 -0.01 0.473 11 0.00 0.00 99.55 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.14 90.82 92.45 -0.13 0.114 516 84.91 78.49 92.45 -0.88 0.146RBR 1905 76.24 66.88 71.25 1.11 0.033 7 0.00 0.00 71.25 – –RBS 486 70.58 65.64 68.02 -13.41 0.382 2 0.00 68.02 – –RP 2879 78.76 75.72 77.21 0.22 0.686 0 77.21 – –SYM 59 79.03 83.05 80.99 -0.83 0.374 1 0.00 0.00 80.99 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 94.94 95.07 95.00 -0.44 0.010 570 81.59 75.44 95.00 -1.65 0.207VBD 32941 95.75 94.30 95.02 -0.04 
0.368 426 83.33 62.21 95.02 -1.40 0.443VBG 16321 91.43 91.96 91.70 -0.11 0.293 924 70.28 91.88 91.70 0.55 0.411VBN 22177 86.77 90.61 88.65 -0.21 0.078 716 65.65 80.87 88.65 -2.08 0.083VBP 13819 93.60 91.93 92.76 -0.22 0.019 131 69.70 52.67 92.76 8.18 0.036VBZ 23816 97.58 96.46 97.02 -0.01 0.867 467 84.18 56.96 97.02 1.78 0.594WDT 4745 96.56 95.17 95.86 -0.24 0.105 4 0.00 95.86 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.14 -1.31 0.066TOKENS 1044667 96.782 -0.06 0.046 24622 81.52 -0.51 0.229


B.21

(Pos = VB & Wd ∈ {bail, beef, blot, ...<76 omitted>..., smooth, soak, speed}) ⇒ (Pos ← VB–RP)
(Pos = VBG & Wd ∈ {bailing, beefing, blotting, ...<76 omitted>..., smoothing, soaking, speeding}) ⇒ (Pos ← VBG–RP)
(Pos = VBD & Wd ∈ {bailed, beefed, blotted, ...<76 omitted>..., smoothed, soaked, sped}) ⇒ (Pos ← VBD–RP)
(Pos = VBN & Wd ∈ {bailed, beefed, blotted, ...<76 omitted>..., smoothed, soaked, sped}) ⇒ (Pos ← VBN–RP)
(Pos = VBP & Wd ∈ {bail, beef, blot, ...<76 omitted>..., smooth, soak, speed}) ⇒ (Pos ← VBP–RP)
(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
(Pos = IN & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RP)
(Pos = RP & Wd ∈ {about, across, along, ...<14 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN)
(Pos = VBZ & Wd ∈ {bails, beefs, blots, ...<76 omitted>..., smooths, soaks, speeds}) ⇒ (Pos ← VBZ–RP)


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.64 99.36 99.50 0.02 0.339 1 0.00 99.50 – –CD 40132 99.33 99.45 99.39 -0.02 0.358 3413 98.66 97.07 99.39 -0.22 0.308DT 90066 99.28 99.47 99.37 -0.03 0.188 2 0.00 99.37 – –EX 951 95.13 98.53 96.80 -0.48 0.251 0 96.80 – –FW 238 57.67 45.80 51.05 -1.31 0.512 67 100.00 1.49 51.05 – –IN 108456 98.22 98.64 98.43 -0.00 0.942 22 0.00 0.00 98.43 – –JJ 67085 91.77 91.38 91.57 -0.09 0.022 4581 78.00 73.00 91.57 -0.79 0.217JJR 3621 84.60 90.89 87.63 0.09 0.715 85 66.67 30.59 87.63 20.29 0.471JJS 2129 91.60 92.25 91.93 -1.43 0.352 54 69.64 72.22 91.93 -4.01 0.374MD 10743 99.67 99.80 99.73 0.01 0.418 5 0.00 99.73 – –NN 146173 96.22 95.97 96.09 -0.04 0.146 3934 74.54 68.38 96.09 -0.65 0.051NNP 100926 96.66 97.30 96.98 -0.03 0.427 6075 83.73 93.50 96.98 -0.25 0.648NNPS 2917 62.84 66.78 64.75 -1.30 0.247 234 33.77 22.22 64.75 28.99 0.448NNS 65922 97.59 97.83 97.71 -0.02 0.772 2353 83.37 86.91 97.71 -0.78 0.493PDT 397 70.42 80.35 75.06 -0.16 0.771 0 75.06 – –POS 9529 98.99 99.53 99.26 0.04 0.131 0 99.26 – –PRP 19164 99.72 99.38 99.55 0.00 0.994 11 0.00 0.00 99.55 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.05 90.98 92.49 -0.09 0.004 516 84.22 79.65 92.49 -0.51 0.699RBR 1905 75.85 65.93 70.54 0.10 0.884 7 33.33 14.29 70.54 – –RBS 486 71.64 70.16 70.89 -9.75 0.373 2 0.00 70.89 – –RP 2879 78.58 75.20 76.85 -0.24 0.724 0 0.00 76.85 – –SYM 59 80.65 84.75 82.64 1.20 0.374 1 100.00 100.00 82.64 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.20 95.26 95.23 -0.21 0.034 570 80.56 76.32 95.23 -1.67 0.185VBD 32941 95.58 94.58 95.08 0.02 0.767 426 
82.12 63.62 95.08 -0.77 0.427VBG 16321 91.42 92.12 91.77 -0.03 0.846 924 70.38 92.32 91.77 0.83 0.075VBN 22177 87.11 90.49 88.77 -0.08 0.173 716 66.25 81.98 88.77 -0.98 0.169VBP 13819 93.52 92.13 92.82 -0.16 0.176 131 65.69 51.15 92.82 3.69 0.366VBZ 23816 97.69 96.66 97.17 0.15 0.135 467 88.89 58.24 97.17 5.43 0.182WDT 4745 96.06 96.14 96.10 0.02 0.938 4 0.00 96.10 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.26 -1.08 0.054TOKENS 1044667 96.806 -0.04 0.052 24622 81.59 -0.44 0.262


B.22

(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
(Pos = IN & Wd ∈ {after, before, since, until} & SibR =re (?!S)) ⇒ (Pos ← IN–TMP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.62 99.37 99.49 0.02 0.381 1 0.00 0.00 99.49 – –CD 40132 99.30 99.46 99.38 -0.02 0.450 3413 98.28 97.22 99.38 -0.34 0.272DT 90066 99.28 99.50 99.39 -0.01 0.303 2 0.00 99.39 – –EX 951 95.53 98.84 97.16 -0.11 0.793 0 97.16 – –FW 238 52.83 47.06 49.78 -3.78 0.191 67 75.00 4.48 49.78 – –IN 108456 98.23 98.62 98.42 -0.01 0.805 22 0.00 0.00 98.42 – –JJ 67085 91.94 91.19 91.56 -0.10 0.112 4581 78.85 72.04 91.56 -0.95 0.096JJR 3621 84.85 90.47 87.57 0.02 0.951 85 78.79 30.59 87.57 26.40 0.384JJS 2129 94.23 92.02 93.11 -0.16 0.987 54 71.93 75.93 93.11 – –MD 10743 99.67 99.77 99.72 0.00 0.979 5 0.00 99.72 – –NN 146173 96.26 96.00 96.13 -0.00 0.991 3934 75.19 68.81 96.13 0.10 0.826NNP 100926 96.59 97.34 96.96 -0.05 0.041 6075 83.02 94.80 96.96 -0.06 0.781NNPS 2917 63.96 66.13 65.03 -0.88 0.393 234 39.24 13.25 65.03 -4.67 0.806NNS 65922 97.58 97.92 97.75 0.02 0.246 2353 83.34 88.02 97.75 -0.18 0.593PDT 397 70.70 80.86 75.44 0.35 0.231 0 75.44 – –POS 9529 98.93 99.55 99.24 0.02 0.463 0 99.24 – –PRP 19164 99.74 99.36 99.55 -0.00 0.996 11 0.00 99.55 – –PRP$ 9173 99.38 99.89 99.64 -0.01 0.374 1 0.00 99.64 – –RB 33806 93.98 90.97 92.45 -0.14 0.081 516 84.93 80.81 92.45 0.64 0.572RBR 1905 75.51 66.19 70.55 0.10 0.831 7 25.00 14.29 70.55 – –RBS 486 74.26 82.51 78.17 -0.49 0.950 2 0.00 78.17 – –RP 2879 78.17 76.14 77.14 0.13 0.656 0 77.14 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.18 95.43 95.30 -0.13 0.189 570 81.63 77.19 95.30 -0.45 0.743VBD 32941 95.61 94.62 95.12 0.06 0.518 426 83.12 62.44 95.12 -1.30 
0.402VBG 16321 91.46 92.24 91.85 0.06 0.387 924 70.95 91.99 91.85 1.14 0.090VBN 22177 87.19 90.63 88.87 0.04 0.726 716 66.48 82.54 88.87 -0.49 0.716VBP 13819 93.60 92.31 92.95 -0.02 0.506 131 61.74 54.20 92.95 4.08 0.633VBZ 23816 97.69 96.60 97.14 0.12 0.074 467 87.33 56.10 97.14 2.34 0.314WDT 4745 96.29 96.19 96.24 0.15 0.189 4 0.00 96.24 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.49 -0.64 0.233TOKENS 1044667 96.827 -0.02 0.452 24622 81.85 -0.11 0.640
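As a reading aid for the columns: the F(0.5) values in these tables are consistent with the balanced harmonic mean of Prec and Rec, for example the CC row (99.62, 99.37, 99.49) and the JJ row (91.94, 91.19, 91.56) above. The sketch below assumes the weighted form F = 1/(α/P + (1−α)/R) with α = 0.5. The function name `f_alpha` is illustrative and is not taken from the thesis.

```python
def f_alpha(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (both in percent).

    Assumption: alpha = 0.5 weights precision and recall equally,
    which reduces to the familiar 2PR / (P + R).
    """
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Prec and Rec copied from the CC and JJ rows of the table above.
cc = round(f_alpha(99.62, 99.37), 2)
jj = round(f_alpha(91.94, 91.19), 2)
print(cc, jj)  # 99.49 91.56
```

Because the tabulated Prec and Rec are themselves rounded, recomputed values may differ from the printed F(0.5) in the last digit for some rows.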


B.23 vb–tr[s] Mapping

(Pos = VB & SibAllR = NP) ⇒ (Pos ← VB–TR)
(Pos = VBG & SibAllR = NP) ⇒ (Pos ← VBG–TR)
(Pos = VBD & SibAllR = NP) ⇒ (Pos ← VBD–TR)
(Pos = VBN & SibAllR = NP) ⇒ (Pos ← VBN–TR)
(Pos = VBP & SibAllR = NP) ⇒ (Pos ← VBP–TR)
(Pos = VBZ & SibAllR = NP) ⇒ (Pos ← VBZ–TR)
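The vb–tr rules above all share one shape: a verb tag gains a TR suffix when the SibAllR condition matches NP. A minimal sketch of applying them to a single token follows. It assumes that `SibAllR = NP` holds when any right sibling of the token is labelled NP, and that each token arrives as a dict with hypothetical keys `Wd`, `Pos` and `SibAllR`; neither assumption is confirmed by this appendix. The code writes the new tags with an ASCII hyphen.

```python
VERB_TAGS = ("VB", "VBG", "VBD", "VBN", "VBP", "VBZ")


def map_vb_tr(token: dict) -> str:
    """Return the tag for one token after applying the vb-tr rules.

    Assumption: token["SibAllR"] lists the labels of all right
    siblings, and the rule fires if "NP" is among them.
    """
    pos = token["Pos"]
    if pos in VERB_TAGS and "NP" in token.get("SibAllR", ()):
        return pos + "-TR"
    return pos  # every other token keeps its original tag


tokens = [
    {"Wd": "bought", "Pos": "VBD", "SibAllR": ["NP"]},
    {"Wd": "slept", "Pos": "VBD", "SibAllR": []},
    {"Wd": "book", "Pos": "NN", "SibAllR": ["NP"]},
]
print([map_vb_tr(t) for t in tokens])  # ['VBD-TR', 'VBD', 'NN']
```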

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.90 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.64 99.37 99.51 0.03 0.316 1 0.00 0.00 99.51 – –CD 40132 99.34 99.44 99.39 -0.02 0.427 3413 98.84 97.04 99.39 -0.15 0.398DT 90066 99.34 99.50 99.42 0.02 0.283 2 0.00 99.42 – –EX 951 95.43 98.74 97.05 -0.21 0.266 0 97.05 – –FW 238 53.77 47.90 50.67 -2.06 0.061 67 25.00 1.49 50.67 – –IN 108456 98.18 98.67 98.42 -0.00 0.789 22 0.00 0.00 98.42 – –JJ 67085 91.48 91.53 91.50 -0.17 0.064 4581 78.07 73.98 91.50 -0.06 0.933JJR 3621 84.62 90.58 87.50 -0.05 0.827 85 63.89 27.06 87.50 9.05 0.622JJS 2129 93.91 92.02 92.95 -0.33 0.975 54 71.43 74.07 92.95 -1.55 0.374MD 10743 99.64 99.80 99.72 0.00 0.979 5 0.00 99.72 – –NN 146173 96.08 96.10 96.09 -0.04 0.268 3934 73.59 69.22 96.09 -0.63 0.290NNP 100926 96.62 97.34 96.98 -0.04 0.323 6075 83.59 94.52 96.98 0.17 0.646NNPS 2917 63.07 66.34 64.66 -1.43 0.215 234 31.78 14.53 64.66 -4.03 0.789NNS 65922 97.30 98.01 97.65 -0.07 0.123 2353 80.97 87.89 97.65 -1.73 0.019PDT 397 70.67 80.10 75.09 -0.12 0.813 0 75.09 – –POS 9529 98.87 99.56 99.22 -0.00 0.831 0 99.22 – –PRP 19164 99.75 99.35 99.55 -0.00 0.961 11 0.00 0.00 99.55 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 93.94 91.19 92.54 -0.03 0.440 516 83.50 82.36 92.54 0.77 0.536RBR 1905 75.46 66.19 70.53 0.07 0.887 7 0.00 0.00 70.53 – –RBS 486 74.39 81.28 77.68 -1.11 0.916 2 0.00 77.68 – –RP 2879 78.57 75.89 77.21 0.21 0.672 0 0.00 77.21 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.04 94.96 95.00 -0.45 0.014 570 81.14 74.74 95.00 -2.38 0.166VBD 32941 95.48 94.28 94.88 -0.19 0.026 426 82.30 58.92 
94.88 -4.95 0.005VBG 16321 91.98 91.50 91.74 -0.06 0.734 924 72.11 85.06 91.74 -1.46 0.526VBN 22177 87.56 88.87 88.21 -0.70 0.018 716 67.93 75.14 88.21 -3.58 0.082VBP 13819 93.62 91.14 92.36 -0.65 0.007 131 68.60 45.04 92.36 -1.96 0.487VBZ 23816 98.11 96.10 97.09 0.07 0.278 467 89.12 54.39 97.09 1.20 0.758WDT 4745 96.46 95.36 95.91 -0.19 0.018 4 0.00 95.91 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.89 -1.79 0.003TOKENS 1044667 96.776 -0.07 0.008 24622 81.52 -0.52 0.189


B.24

(Pos = TO & Wd = to & SibR = NP) ⇒ (Pos ← IN)

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.76 99.32 99.54 0.01 0.298 1 0.00 99.54 – –CD 40132 99.28 99.56 99.42 -0.00 0.700 3413 99.35 98.48 99.42 -0.02 0.374DT 90066 99.46 99.32 99.39 0.01 0.119 2 0.00 99.39 – –EX 951 95.18 99.58 97.33 -0.05 0.374 0 97.33 – –FW 238 63.25 31.09 41.69 -3.08 0.437 67 0.00 0.00 41.69 – –IN 108456 97.84 98.92 98.38 0.02 0.040 22 0.00 98.38 – –JJ 67085 92.35 92.05 92.20 -0.02 0.005 4581 79.48 82.19 92.20 -0.04 0.684JJR 3621 84.85 92.21 88.38 -0.03 0.707 85 88.46 27.06 88.38 0.00 0.744JJS 2129 95.19 95.68 95.43 -0.03 0.374 54 88.89 59.26 95.43 1.11 0.374MD 10743 99.71 99.79 99.75 0.01 0.178 5 0.00 99.75 – –NN 146173 97.15 95.11 96.12 0.00 0.772 3934 80.45 67.87 96.12 -0.16 0.160NNP 100926 94.69 97.95 96.29 -0.00 0.410 6075 82.47 97.98 96.29 -0.10 0.211NNPS 2917 54.74 73.19 62.64 0.03 0.718 234 53.85 11.97 62.64 -0.35 0.374NNS 65922 97.74 97.21 97.47 0.01 0.297 2353 87.45 90.35 97.47 0.10 0.350PDT 397 69.92 84.89 76.68 0.25 0.552 0 76.68 – –POS 9529 98.74 99.65 99.20 – – 0 99.20 – –PRP 19164 99.85 99.33 99.59 -0.00 0.374 11 0.00 0.00 99.59 – –PRP$ 9173 99.40 99.96 99.68 0.01 0.374 1 0.00 99.68 – –RB 33806 95.21 90.50 92.80 -0.00 0.847 516 90.30 81.20 92.80 -0.14 0.233RBR 1905 77.58 67.03 71.92 0.08 0.697 7 0.00 71.92 – –RBS 486 87.77 84.16 85.92 -0.14 0.374 2 0.00 0.00 85.92 – –RP 2879 75.64 75.62 75.63 -0.12 0.247 0 75.63 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 75.00 36.00 48.65 -0.68 0.374 17 33.33 5.88 48.65 – –VB 29021 96.80 94.41 95.59 -0.07 0.069 570 86.45 70.53 95.59 -5.57 0.033VBD 32941 95.01 95.91 95.46 0.00 0.997 426 80.32 70.89 95.46 0.08 0.864VBG 16321 
92.41 94.23 93.31 0.03 0.197 924 82.14 91.56 93.31 0.20 0.237VBN 22177 88.17 91.00 89.56 0.00 0.960 716 78.36 73.32 89.56 0.28 0.216VBP 13819 93.98 92.24 93.10 0.05 0.002 131 73.53 38.17 93.10 13.07 0.005VBZ 23816 98.83 96.43 97.61 0.01 0.484 467 89.47 65.52 97.61 0.57 0.318WDT 4745 96.97 96.56 96.77 -0.00 0.927 4 0.00 96.77 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.70 -0.03 0.791TOKENS 1044667 96.854 0.00 0.752 24622 84.51 -0.13 0.097


TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.88 99.90 99.89 -0.02 0.208 0 99.89 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.56 99.41 99.48 0.00 0.644 1 0.00 99.48 – –CD 40132 99.34 99.45 99.39 -0.02 0.450 3413 98.69 97.16 99.39 -0.16 0.173DT 90066 99.34 99.48 99.41 0.01 0.476 2 0.00 99.41 – –EX 951 95.33 98.74 97.00 -0.26 0.216 0 97.00 – –FW 238 58.19 43.28 49.64 -4.05 0.090 67 0.00 0.00 49.64 – –IN 108456 98.15 98.67 98.41 -0.01 0.334 22 9.52 9.09 98.41 – –JJ 67085 92.00 91.26 91.63 -0.03 0.549 4581 78.14 73.26 91.63 -0.52 0.517JJR 3621 84.84 90.39 87.53 -0.03 0.892 85 78.57 12.94 87.53 – –JJS 2129 88.45 96.38 92.25 -1.09 0.840 54 73.21 75.93 92.25 0.91 0.374MD 10743 99.67 99.77 99.72 0.00 0.990 5 0.00 99.72 – –NN 146173 96.19 96.04 96.12 -0.01 0.456 3934 73.79 68.71 96.12 -0.87 0.046NNP 100926 96.71 97.27 96.99 -0.03 0.449 6075 84.55 93.15 96.99 0.09 0.839NNPS 2917 62.55 67.74 65.04 -0.85 0.364 234 37.31 32.05 65.04 65.95 0.142NNS 65922 97.63 97.91 97.77 0.04 0.340 2353 84.23 87.63 97.77 0.15 0.936PDT 397 70.34 78.84 74.35 -1.10 0.177 0 74.35 – –POS 9529 98.97 99.52 99.24 0.02 0.404 0 99.24 – –PRP 19164 99.72 99.38 99.55 0.00 0.989 11 0.00 0.00 99.55 – –PRP$ 9173 99.40 99.90 99.65 0.01 0.178 1 0.00 99.65 – –RB 33806 94.08 91.19 92.61 0.04 0.464 516 84.80 82.17 92.61 1.42 0.367RBR 1905 75.92 66.19 70.72 0.36 0.586 7 0.00 0.00 70.72 – –RBS 486 84.85 51.85 64.37 -18.06 0.260 2 0.00 64.37 – –RP 2879 78.91 75.76 77.30 0.33 0.438 0 77.30 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 99.99 99.99 -0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.36 95.36 95.36 -0.07 0.119 570 84.33 74.56 95.36 -0.71 0.447VBD 32941 95.67 94.62 95.15 0.09 0.062 426 84.23 62.68 95.15 -0.53 0.605VBG 
16321 91.48 92.35 91.91 0.13 0.061 924 70.45 92.64 91.91 1.04 0.233VBN 22177 87.11 90.59 88.82 -0.02 0.791 716 65.73 81.98 88.82 -1.41 0.060VBP 13819 93.50 92.29 92.89 -0.08 0.213 131 53.17 51.15 92.89 -5.99 0.309VBZ 23816 97.77 96.58 97.17 0.15 0.029 467 86.82 57.82 97.17 3.98 0.130WDT 4745 96.41 95.62 96.01 -0.08 0.151 4 0.00 96.01 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.57 -0.48 0.329TOKENS 1044667 96.833 -0.01 0.677 24622 81.72 -0.27 0.377


B.25 to:in Mapping

(Pos = TO & Wd = to & SibR = NP) ⇒ (Pos ← IN)
(Pos = TO & Par = QP & Wd = to) ⇒ (Pos ← IN)

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.76 99.33 99.54 0.01 0.297 1 0.00 99.54 – –CD 40132 99.28 99.56 99.42 -0.00 0.991 3413 99.35 98.48 99.42 -0.02 0.374DT 90066 99.46 99.32 99.39 0.01 0.171 2 0.00 99.39 – –EX 951 95.18 99.58 97.33 -0.05 0.374 0 97.33 – –FW 238 63.79 31.09 41.81 -2.81 0.500 67 0.00 0.00 41.81 – –IN 108456 97.85 98.92 98.38 0.02 0.058 22 0.00 98.38 – –JJ 67085 92.36 92.05 92.20 -0.02 0.062 4581 79.47 82.21 92.20 -0.03 0.707JJR 3621 84.82 92.13 88.32 -0.10 0.228 85 88.46 27.06 88.32 0.00 0.744JJS 2129 95.19 95.68 95.43 -0.03 0.374 54 88.89 59.26 95.43 1.11 0.374MD 10743 99.71 99.79 99.75 0.01 0.178 5 0.00 99.75 – –NN 146173 97.15 95.12 96.12 0.01 0.457 3934 80.47 67.87 96.12 -0.15 0.182NNP 100926 94.68 97.95 96.29 -0.00 0.120 6075 82.46 97.98 96.29 -0.11 0.130NNPS 2917 54.74 73.19 62.64 0.03 0.718 234 53.85 11.97 62.64 -0.35 0.374NNS 65922 97.74 97.21 97.47 0.01 0.320 2353 87.45 90.35 97.47 0.10 0.350PDT 397 69.92 84.89 76.68 0.25 0.552 0 76.68 – –POS 9529 98.74 99.65 99.20 – – 0 99.20 – –PRP 19164 99.85 99.33 99.59 -0.00 0.374 11 0.00 0.00 99.59 – –PRP$ 9173 99.40 99.96 99.68 0.01 0.374 1 0.00 99.68 – –RB 33806 95.21 90.51 92.80 -0.00 0.913 516 90.30 81.20 92.80 -0.14 0.233RBR 1905 77.51 66.93 71.83 -0.04 0.797 7 0.00 71.83 – –RBS 486 87.77 84.16 85.92 -0.14 0.374 2 0.00 0.00 85.92 – –RP 2879 75.60 75.65 75.63 -0.13 0.177 0 75.63 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 75.00 36.00 48.65 -0.68 0.374 17 33.33 5.88 48.65 – –VB 29021 96.81 94.41 95.60 -0.06 0.111 570 86.64 70.53 95.60 -5.48 0.034VBD 32941 95.01 95.91 95.46 -0.00 0.935 426 80.32 70.89 95.46 0.08 0.864VBG 
16321 92.41 94.22 93.31 0.03 0.235 924 82.12 91.45 93.31 0.13 0.498VBN 22177 88.17 91.00 89.56 0.00 0.956 716 78.36 73.32 89.56 0.28 0.216VBP 13819 93.98 92.25 93.11 0.05 0.012 131 73.53 38.17 93.11 13.07 0.005VBZ 23816 98.83 96.43 97.61 0.00 0.602 467 89.47 65.52 97.61 0.57 0.318WDT 4745 97.01 96.54 96.78 0.01 0.785 4 0.00 96.78 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.72 0.00 0.973TOKENS 1044667 96.855 0.00 0.659 24622 84.51 -0.13 0.077


TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.67 99.36 99.52 0.04 0.286 1 0.00 99.52 – –CD 40132 99.31 99.47 99.39 -0.01 0.160 3413 98.60 97.30 99.39 -0.13 0.367DT 90066 99.33 99.49 99.41 0.01 0.318 2 0.00 99.41 – –EX 951 95.33 98.84 97.06 -0.21 0.228 0 97.06 – –FW 238 59.90 48.32 53.49 3.39 0.442 67 57.14 5.97 53.49 – –IN 108456 98.13 98.69 98.41 -0.01 0.139 22 10.53 9.09 98.41 – –JJ 67085 92.06 91.24 91.65 -0.01 0.851 4581 78.59 72.67 91.65 -0.66 0.527JJR 3621 85.01 90.06 87.46 -0.10 0.747 85 75.00 14.12 87.46 – –JJS 2129 91.40 89.90 90.65 -2.80 0.401 54 73.21 75.93 90.65 0.91 0.374MD 10743 99.67 99.77 99.72 0.00 0.609 5 0.00 99.72 – –NN 146173 96.21 96.04 96.12 -0.01 0.735 3934 73.80 68.81 96.12 -0.79 0.467NNP 100926 96.70 97.33 97.01 0.00 0.981 6075 84.04 93.71 97.01 0.05 0.929NNPS 2917 63.06 67.19 65.06 -0.82 0.153 234 36.59 19.23 65.06 21.32 0.243NNS 65922 97.62 97.86 97.74 0.01 0.773 2353 83.53 87.29 97.74 -0.46 0.366PDT 397 70.54 79.60 74.79 -0.51 0.407 0 74.79 – –POS 9529 98.94 99.52 99.23 0.01 0.626 0 99.23 – –PRP 19164 99.72 99.37 99.55 -0.01 0.630 11 0.00 0.00 99.55 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 94.02 91.22 92.60 0.03 0.608 516 83.01 82.36 92.60 0.48 0.661RBR 1905 75.31 67.09 70.96 0.69 0.366 7 0.00 0.00 70.96 – –RBS 486 65.08 70.16 67.52 -14.04 0.291 2 0.00 67.52 – –RP 2879 78.89 75.69 77.26 0.28 0.414 0 77.26 – –SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –TO 24551 99.98 100.00 99.99 -0.00 0.178 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.50 95.28 95.39 -0.04 0.570 570 84.57 74.04 95.39 -0.95 0.352VBD 32941 95.63 94.62 95.12 0.06 0.162 426 80.47 64.79 95.12 -0.65 0.716VBG 16321 
91.53 92.29 91.91 0.12 0.247 924 70.86 92.64 91.91 1.38 0.016VBN 22177 86.96 90.65 88.77 -0.08 0.481 716 65.92 82.40 88.77 -1.02 0.669VBP 13819 93.52 92.42 92.97 -0.00 0.988 131 58.20 54.20 92.97 1.20 0.988VBZ 23816 97.73 96.52 97.12 0.10 0.114 467 83.92 55.89 97.12 0.51 0.787WDT 4745 96.52 95.81 96.16 0.08 0.602 4 0.00 96.16 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.60 -0.42 0.322TOKENS 1044667 96.833 -0.01 0.552 24622 81.67 -0.33 0.535


B.26

(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.70 99.37 99.54 0.00 0.912 1 0.00 99.54 – –CD 40132 99.27 99.57 99.42 0.00 0.988 3413 99.35 98.51 99.42 – –DT 90066 99.41 99.34 99.38 0.00 0.656 2 0.00 99.38 – –EX 951 95.08 99.58 97.28 -0.10 0.179 0 97.28 – –FW 238 62.40 32.77 42.98 -0.10 0.953 67 0.00 0.00 42.98 – –IN 108456 97.96 98.80 98.38 0.02 0.194 22 0.00 98.38 – –JJ 67085 92.35 92.08 92.21 -0.00 0.764 4581 79.68 82.10 92.21 0.04 0.682JJR 3621 84.71 92.29 88.34 -0.08 0.576 85 88.89 28.24 88.34 3.42 0.962JJS 2129 95.19 95.77 95.48 0.03 0.660 54 86.49 59.26 95.48 – –MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.17 95.09 96.12 0.00 0.670 3934 81.14 67.69 96.12 0.09 0.272NNP 100926 94.69 97.97 96.30 0.01 0.192 6075 82.45 98.16 96.30 -0.03 0.305NNPS 2917 54.81 73.19 62.68 0.11 0.650 234 54.00 11.54 62.68 -3.23 0.374NNS 65922 97.73 97.22 97.47 0.01 0.440 2353 87.32 90.44 97.47 0.07 0.284PDT 397 70.02 84.13 76.43 -0.07 0.746 0 76.43 – –POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –PRP 19164 99.85 99.33 99.59 0.00 0.983 11 0.00 0.00 99.59 – –PRP$ 9173 99.39 99.96 99.67 0.00 0.987 1 0.00 99.67 – –RB 33806 95.10 90.63 92.81 0.01 0.766 516 90.30 81.20 92.81 -0.14 0.374RBR 1905 77.41 66.56 71.58 -0.40 0.204 7 0.00 71.58 – –RBS 486 87.98 84.36 86.13 0.11 0.797 2 0.00 0.00 86.13 – –RP 2879 75.05 75.65 75.35 -0.49 0.132 0 75.35 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.74 94.61 95.66 0.01 0.277 570 86.35 78.77 95.66 0.14 0.711VBD 32941 95.01 95.94 95.47 0.02 0.466 426 80.59 71.13 95.47 0.41 0.401VBG 16321 92.37 94.16 93.26 -0.03 0.453 
924 82.05 91.02 93.26 -0.13 0.557VBN 22177 88.24 90.90 89.55 -0.01 0.899 716 78.53 73.04 89.55 0.19 0.623VBP 13819 93.98 92.24 93.10 0.04 0.219 131 72.88 32.82 93.10 1.84 0.921VBZ 23816 98.84 96.41 97.61 0.00 0.825 467 89.91 64.88 97.61 0.21 0.848WDT 4745 96.41 96.88 96.65 -0.13 0.288 4 0.00 96.65 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.77 0.11 0.380TOKENS 1044667 96.855 0.00 0.003 24622 84.65 0.03 0.424


B.27

(Pos = RB & SibR = RB) ⇒ (Pos ← RB–DEG)
(Pos = RB & SibR = JJ) ⇒ (Pos ← RB–DEG)
(Pos = RB & Par = ADJP) ⇒ (Pos ← RB–DEG)
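The three RB rules above depend on tree context rather than on the word itself. The sketch below shows one way the context features could be read off a parse tree before the rules are applied. It assumes that `Par` is the label of the node directly above the tagged token and that `SibR` is the label of its immediate right sibling; these readings of the feature names, the nested-tuple tree encoding, and the function names are illustrative assumptions, not the thesis implementation.

```python
def leaves_with_context(node, parent=None, right=None):
    """Yield (pos, word, Par, SibR) for every leaf, left to right.

    A leaf is (pos, word); an internal node is (label, [children]).
    """
    label, rest = node
    if isinstance(rest, str):  # leaf: rest is the word form
        yield label, rest, parent, right
        return
    for i, child in enumerate(rest):
        # Label of the next child, or None for the last child.
        sib = rest[i + 1][0] if i + 1 < len(rest) else None
        yield from leaves_with_context(child, label, sib)


def map_rb_deg(pos, par, sib_r):
    """Apply the three RB rules; any one matching condition suffices."""
    if pos == "RB" and (sib_r in ("RB", "JJ") or par == "ADJP"):
        return "RB-DEG"
    return pos


tree = ("S", [
    ("NP", [("PRP", "It")]),
    ("VP", [
        ("VBZ", "is"),
        ("ADJP", [("RB", "very"), ("JJ", "cold")]),
        ("ADVP", [("RB", "today")]),
    ]),
])

mapped = [(w, map_rb_deg(p, par, sib))
          for p, w, par, sib in leaves_with_context(tree)]
print(mapped)
```

In this toy tree only "very" is retagged: it is an RB inside an ADJP with a JJ to its right, while "today" sits alone in an ADVP and keeps the plain RB tag.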

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –CC 26227 99.67 99.41 99.54 0.00 0.804 1 0.00 99.54 – –CD 40132 99.28 99.56 99.42 0.00 0.375 3413 99.35 98.51 99.42 – –DT 90066 99.46 99.29 99.38 0.00 0.698 2 0.00 99.38 – –EX 951 94.98 99.58 97.23 -0.15 0.345 0 97.23 – –FW 238 64.46 32.77 43.45 1.02 0.374 67 0.00 0.00 43.45 – –IN 108456 97.81 98.93 98.37 0.00 0.554 22 0.00 98.37 – –JJ 67085 92.23 92.18 92.21 -0.01 0.789 4581 79.75 82.21 92.21 0.15 0.316JJR 3621 84.79 92.38 88.42 0.01 0.926 85 88.46 27.06 88.42 – –JJS 2129 95.19 95.77 95.48 0.03 0.374 54 86.84 61.11 95.48 2.00 0.374MD 10743 99.70 99.78 99.74 -0.00 0.978 5 0.00 99.74 – –NN 146173 97.16 95.09 96.12 0.00 0.494 3934 81.16 67.77 96.12 0.16 0.250NNP 100926 94.63 97.96 96.27 -0.03 0.163 6075 82.49 98.16 96.27 -0.00 0.888NNPS 2917 54.77 73.40 62.73 0.18 0.087 234 53.85 11.97 62.73 -0.35 0.374NNS 65922 97.73 97.20 97.47 -0.00 0.896 2353 87.25 90.48 97.47 0.05 0.645PDT 397 69.34 84.89 76.33 -0.20 0.866 0 76.33 – –POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –PRP 19164 99.85 99.33 99.59 -0.00 0.374 11 0.00 0.00 99.59 – –PRP$ 9173 99.38 99.96 99.67 -0.01 0.374 1 0.00 99.67 – –RB 33806 95.76 89.91 92.74 -0.06 0.545 516 91.81 80.43 92.74 0.14 0.759RBR 1905 77.33 67.14 71.87 0.02 0.961 7 0.00 0.00 71.87 – –RBS 486 87.79 84.36 86.04 0.00 0.914 2 0.00 0.00 86.04 – –RP 2879 75.38 75.93 75.65 -0.09 0.631 0 75.65 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.00 38.00 50.67 3.44 0.227 17 33.33 5.88 50.67 – –VB 29021 96.74 94.62 95.67 0.02 0.002 570 86.56 79.12 95.67 0.50 0.041VBD 32941 95.00 95.94 95.46 0.00 0.698 426 81.12 71.60 95.46 1.08 0.014VBG 16321 
92.34 94.18 93.25 -0.03 0.290 924 82.10 91.34 93.25 0.07 0.374VBN 22177 88.20 90.95 89.56 -0.00 0.980 716 78.81 73.74 89.56 0.86 0.050VBP 13819 93.99 92.16 93.06 0.00 0.951 131 73.68 32.06 93.06 0.53 0.849VBZ 23816 98.82 96.41 97.60 -0.01 0.681 467 89.82 64.24 97.60 -0.41 0.274WDT 4745 97.20 96.42 96.80 0.03 0.071 4 0.00 96.80 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.91 99.96 0.02 0.374 1 0.00 99.96 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.64 -0.14 0.414TOKENS 1044667 96.848 -0.00 0.613 24622 84.70 0.10 0.100


B.28 vb–inf[lm] Mapping

(Pos = VB & Wd ∈ {do, help, let, make}) ⇒ (Pos ← VB–I)
(Pos = VBG & Wd ∈ {doing, helping, letting, making}) ⇒ (Pos ← VBG–I)
(Pos = VBD & Wd ∈ {did, helped, let, made}) ⇒ (Pos ← VBD–I)
(Pos = VBN & Wd ∈ {done, helped, let, made}) ⇒ (Pos ← VBN–I)
(Pos = VBP & Wd ∈ {do, help, let, make}) ⇒ (Pos ← VBP–I)
(Pos = VBZ & Wd ∈ {does, helps, lets, makes}) ⇒ (Pos ← VBZ–I)
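Unlike the context-based mappings, the rules above need only the word form and its tag, so they reduce to a table lookup. A minimal sketch, assuming exact (case-sensitive) matching of word forms; whether the thesis normalises case before matching is not stated here, and the names `LEXICAL_FORMS` and `map_vb_inf` are hypothetical.

```python
# Word lists copied from the six rules above, keyed by the original tag.
LEXICAL_FORMS = {
    "VB": {"do", "help", "let", "make"},
    "VBG": {"doing", "helping", "letting", "making"},
    "VBD": {"did", "helped", "let", "made"},
    "VBN": {"done", "helped", "let", "made"},
    "VBP": {"do", "help", "let", "make"},
    "VBZ": {"does", "helps", "lets", "makes"},
}


def map_vb_inf(word: str, pos: str) -> str:
    """Append the I suffix when (word, pos) matches one of the rules."""
    if word in LEXICAL_FORMS.get(pos, ()):
        return pos + "-I"
    return pos


print(map_vb_inf("made", "VBD"))  # VBD-I
print(map_vb_inf("made", "VBN"))  # VBN-I
print(map_vb_inf("ran", "VBD"))   # VBD
```

Keying the lookup on the tag as well as the word matters because forms such as "let" and "made" are ambiguous between tags, and each must map to the suffixed version of its own tag.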

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.73 99.33 99.53 -0.00 0.374 1 0.00 99.53 – –CD 40132 99.27 99.56 99.42 -0.00 0.178 3413 99.35 98.48 99.42 -0.02 0.374DT 90066 99.47 99.29 99.38 0.00 0.264 2 0.00 99.38 – –EX 951 95.37 99.58 97.43 0.05 0.374 0 97.43 – –FW 238 64.17 32.35 43.02 – – 67 0.00 0.00 43.02 – –IN 108456 97.83 98.91 98.37 0.00 0.247 22 0.00 98.37 – –JJ 67085 92.37 92.05 92.21 -0.01 0.507 4581 79.64 82.03 92.21 -0.03 0.440JJR 3621 84.95 92.27 88.46 0.05 0.634 85 88.00 25.88 88.46 -3.48 0.374JJS 2129 95.15 95.77 95.46 0.00 0.944 54 86.49 59.26 95.46 – –MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.18 95.07 96.11 -0.00 0.290 3934 81.02 67.59 96.11 -0.07 0.255NNP 100926 94.68 97.96 96.29 -0.01 0.261 6075 82.47 98.11 96.29 -0.04 0.153NNPS 2917 54.77 73.26 62.68 0.10 0.148 234 51.92 11.54 62.68 -3.91 0.282NNS 65922 97.70 97.21 97.46 -0.01 0.258 2353 87.24 90.40 97.46 0.00 0.996PDT 397 69.94 84.38 76.48 – – 0 76.48 – –POS 9529 98.77 99.65 99.21 0.02 0.207 0 99.21 – –PRP 19164 99.85 99.34 99.59 0.00 0.597 11 0.00 0.00 99.59 – –PRP$ 9173 99.40 99.96 99.68 0.01 0.625 1 0.00 99.68 – –RB 33806 95.17 90.54 92.80 -0.00 0.852 516 90.32 81.40 92.80 – –RBR 1905 77.53 67.19 71.99 0.18 0.468 7 0.00 71.99 – –RBS 486 87.96 84.16 86.01 -0.03 0.882 2 0.00 0.00 86.01 – –RP 2879 75.57 75.76 75.66 -0.08 0.227 0 75.66 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.70 94.65 95.66 0.01 0.789 570 85.74 79.12 95.66 0.04 0.813VBD 32941 94.93 95.93 95.43 -0.03 0.118 426 80.70 70.66 95.43 0.13 0.700VBG 16321 92.35 94.19 93.26 -0.02 0.323 924 82.02 
91.34 93.26 0.02 0.944VBN 22177 88.19 90.85 89.50 -0.07 0.044 716 78.41 73.04 89.50 0.12 0.616VBP 13819 93.96 92.16 93.05 -0.01 0.586 131 72.88 32.82 93.05 1.84 0.213VBZ 23816 98.81 96.35 97.56 -0.04 0.103 467 89.22 63.81 97.56 -1.08 0.072WDT 4745 97.18 96.46 96.82 0.05 0.412 4 0.00 96.82 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.66 -0.11 0.370TOKENS 1044667 96.848 -0.00 0.276 24622 84.59 -0.04 0.356


TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.57 99.36 99.47 -0.01 0.210 1 0.00 99.47 – –CD 40132 99.32 99.42 99.37 -0.04 0.024 3413 98.71 96.69 99.37 -0.39 0.061DT 90066 99.32 99.47 99.40 -0.01 0.464 2 0.00 99.40 – –EX 951 95.53 98.84 97.16 -0.11 0.374 0 97.16 – –FW 238 55.88 47.90 51.58 -0.29 0.894 67 80.00 5.97 51.58 – –IN 108456 98.14 98.70 98.42 -0.01 0.245 22 0.00 0.00 98.42 – –JJ 67085 92.07 91.34 91.70 0.05 0.409 4581 78.79 72.98 91.70 -0.32 0.500JJR 3621 85.05 90.36 87.63 0.09 0.785 85 78.26 21.18 87.63 – –JJS 2129 94.36 92.02 93.17 -0.09 0.488 54 70.69 75.93 93.17 -0.89 0.374MD 10743 99.69 99.79 99.74 0.02 0.227 5 0.00 99.74 – –NN 146173 96.20 96.05 96.13 -0.00 0.931 3934 73.69 69.12 96.13 -0.64 0.220NNP 100926 96.68 97.35 97.02 0.00 0.937 6075 84.35 93.98 97.02 0.38 0.194NNPS 2917 63.37 66.85 65.07 -0.82 0.470 234 35.25 20.94 65.07 26.44 0.691NNS 65922 97.54 97.91 97.72 -0.00 0.384 2353 82.99 88.31 97.72 -0.23 0.595PDT 397 70.89 80.35 75.32 0.20 0.664 0 75.32 – –POS 9529 98.91 99.54 99.22 0.00 0.999 0 99.22 – –PRP 19164 99.74 99.35 99.55 -0.01 0.178 11 0.00 99.55 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 94.12 91.25 92.66 0.10 0.035 516 85.09 82.95 92.66 2.08 0.051RBR 1905 75.49 66.77 70.86 0.55 0.517 7 0.00 0.00 70.86 – –RBS 486 74.40 83.13 78.52 -0.04 0.824 2 0.00 78.52 – –RP 2879 79.16 75.48 77.28 0.30 0.192 0 77.28 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 73.44 47.00 57.32 -0.61 0.374 17 0.00 0.00 57.32 – –VB 29021 95.56 95.44 95.50 0.08 0.310 570 85.71 75.79 95.50 0.93 0.586VBD 32941 95.63 94.37 94.99 -0.07 0.134 426 81.29 62.21 94.99 -2.45 
0.156VBG 16321 91.77 92.10 91.93 0.15 0.147 924 71.09 92.86 91.93 1.66 0.101VBN 22177 86.78 90.69 88.70 -0.16 0.072 716 66.89 82.12 88.70 -0.37 0.794VBP 13819 93.79 92.12 92.95 -0.02 0.857 131 59.17 54.20 92.95 2.00 0.889VBZ 23816 97.65 96.47 97.06 0.03 0.524 467 87.25 55.67 97.06 1.83 0.582WDT 4745 96.59 95.51 96.05 -0.04 0.580 4 0.00 96.05 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.61 -0.41 0.123TOKENS 1044667 96.842 0.00 0.969 24622 81.89 -0.06 0.746


B.29

(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.69 99.37 99.53 -0.00 0.853 1 0.00 99.53 – –CD 40132 99.27 99.57 99.42 0.00 0.988 3413 99.35 98.51 99.42 – –DT 90066 99.41 99.34 99.38 0.00 0.656 2 0.00 99.38 – –EX 951 95.18 99.58 97.33 -0.05 0.374 0 97.33 – –FW 238 62.40 32.77 42.98 -0.10 0.953 67 0.00 0.00 42.98 – –IN 108456 97.96 98.80 98.38 0.02 0.170 22 0.00 98.38 – –JJ 67085 92.36 92.07 92.22 -0.00 0.856 4581 79.64 82.08 92.22 0.00 0.983JJR 3621 84.71 92.29 88.34 -0.08 0.576 85 88.89 28.24 88.34 3.42 0.962JJS 2129 95.19 95.77 95.48 0.03 0.660 54 86.49 59.26 95.48 – –MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.17 95.09 96.12 0.00 0.603 3934 81.15 67.64 96.12 0.05 0.735NNP 100926 94.69 97.97 96.30 0.01 0.345 6075 82.45 98.16 96.30 -0.03 0.305NNPS 2917 54.81 73.19 62.68 0.11 0.650 234 54.00 11.54 62.68 -3.23 0.374NNS 65922 97.73 97.22 97.47 0.01 0.577 2353 87.32 90.44 97.47 0.07 0.284PDT 397 70.02 84.13 76.43 -0.07 0.746 0 76.43 – –POS 9529 98.74 99.65 99.20 – – 0 99.20 – –PRP 19164 99.85 99.33 99.59 0.00 0.983 11 0.00 0.00 99.59 – –PRP$ 9173 99.39 99.96 99.67 0.00 0.987 1 0.00 99.67 – –RB 33806 95.09 90.63 92.81 0.01 0.741 516 90.30 81.20 92.81 -0.14 0.374RBR 1905 77.41 66.56 71.58 -0.40 0.204 7 0.00 71.58 – –RBS 486 87.98 84.36 86.13 0.11 0.797 2 0.00 0.00 86.13 – –RP 2879 75.05 75.65 75.35 -0.49 0.130 0 75.35 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.73 94.61 95.66 0.00 0.437 570 86.18 78.77 95.66 0.05 0.979VBD 32941 95.02 95.94 95.47 0.02 0.444 426 80.37 71.13 95.47 0.29 0.549VBG 16321 92.36 94.16 93.25 -0.03 0.384 924 
82.05 91.02 93.25 -0.13 0.557VBN 22177 88.24 90.91 89.56 -0.00 0.922 716 78.53 73.04 89.56 0.19 0.623VBP 13819 93.99 92.24 93.11 0.05 0.202 131 72.88 32.82 93.11 1.84 0.921VBZ 23816 98.84 96.40 97.61 -0.00 0.989 467 89.91 64.88 97.61 0.21 0.848WDT 4745 96.41 96.88 96.65 -0.13 0.288 4 0.00 96.65 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.77 0.10 0.374TOKENS 1044667 96.854 0.00 0.057 24622 84.64 0.01 0.760


B.30

(Pos = VB & SibR = S) ⇒ (Pos ← VB–I)
(Pos = VBG & SibR = S) ⇒ (Pos ← VBG–I)
(Pos = VBD & SibR = S) ⇒ (Pos ← VBD–I)
(Pos = VBN & SibR = S) ⇒ (Pos ← VBN–I)
(Pos = VBP & SibR = S) ⇒ (Pos ← VBP–I)
(Pos = VBZ & SibR = S) ⇒ (Pos ← VBZ–I)

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.60 99.39 99.49 0.02 0.415 1 0.00 0.00 99.49 – –CD 40132 99.35 99.46 99.41 -0.00 0.917 3413 98.87 97.10 99.41 -0.10 0.524DT 90066 99.32 99.48 99.40 -0.01 0.612 2 0.00 99.40 – –EX 951 95.33 98.74 97.00 -0.26 0.131 0 97.00 – –FW 238 54.90 47.06 50.68 -2.04 0.377 67 0.00 0.00 50.68 – –IN 108456 98.14 98.70 98.42 -0.01 0.725 22 0.00 0.00 98.42 – –JJ 67085 91.87 91.22 91.54 -0.13 0.044 4581 78.67 72.12 91.54 -1.00 0.060JJR 3621 84.77 90.36 87.47 -0.08 0.804 85 68.00 20.00 87.47 – –JJS 2129 94.26 91.83 93.03 -0.25 0.994 54 69.64 72.22 93.03 -4.01 0.097MD 10743 99.65 99.79 99.72 0.00 0.979 5 0.00 99.72 – –NN 146173 96.17 96.02 96.09 -0.04 0.207 3934 74.20 68.86 96.09 -0.50 0.506NNP 100926 96.66 97.30 96.98 -0.04 0.205 6075 83.84 93.93 96.98 0.03 0.944NNPS 2917 64.29 65.48 64.88 -1.10 0.092 234 35.19 8.12 64.88 – –NNS 65922 97.41 97.93 97.67 -0.06 0.031 2353 81.31 89.29 97.67 -0.76 0.054PDT 397 70.72 79.09 74.67 -0.67 0.186 0 74.67 – –POS 9529 98.94 99.51 99.22 0.00 0.994 0 99.22 – –PRP 19164 99.77 99.36 99.57 0.02 0.064 11 0.00 99.57 – –PRP$ 9173 99.40 99.90 99.65 0.01 0.178 1 0.00 99.65 – –RB 33806 93.98 91.20 92.57 -0.01 0.869 516 83.20 82.56 92.57 0.71 0.469RBR 1905 75.58 66.30 70.64 0.23 0.702 7 0.00 0.00 70.64 – –RBS 486 74.40 83.13 78.52 -0.04 0.986 2 0.00 78.52 – –RP 2879 79.30 75.44 77.32 0.36 0.344 0 77.32 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.20 95.16 95.18 -0.25 0.000 570 84.89 74.91 95.18 -0.15 0.703VBD 32941 95.75 94.17 94.95 -0.11 0.118 426 85.00 63.85 94.95 0.93 0.370VBG 16321 
91.56 92.09 91.82 0.03 0.693 924 70.48 92.75 91.82 1.12 0.120VBN 22177 86.70 90.40 88.51 -0.37 0.018 716 66.09 80.31 88.51 -2.02 0.097VBP 13819 93.15 91.95 92.55 -0.46 0.004 131 48.61 53.44 92.55 -8.21 0.125VBZ 23816 97.65 96.50 97.07 0.05 0.203 467 84.14 55.67 97.07 0.39 0.926WDT 4745 96.50 95.17 95.83 -0.27 0.052 4 0.00 95.83 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 99.87 99.87 99.87 -0.06 0.374 1 0.00 0.00 99.87 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.14 -1.32 0.015TOKENS 1044667 96.798 -0.05 0.034 24622 81.62 -0.40 0.207


B.31

(Pos = DT & Wd = the) ⇒ (Pos ← DT–0)
(Pos = DT & Wd = a) ⇒ (Pos ← DT–1)
(Pos = DT & Wd = The) ⇒ (Pos ← DT–2)

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.63 99.38 99.50 0.02 0.349 1 0.00 0.00 99.50 – –CD 40132 99.33 99.46 99.39 -0.01 0.416 3413 98.78 97.25 99.39 -0.07 0.619DT 90066 99.35 99.47 99.41 0.01 0.602 2 0.00 99.41 – –EX 951 95.63 98.84 97.21 -0.05 0.550 0 97.21 – –FW 238 59.26 47.06 52.46 1.41 0.762 67 0.00 52.46 – –IN 108456 98.13 98.69 98.41 -0.02 0.102 22 0.00 0.00 98.41 – –JJ 67085 91.87 91.26 91.57 -0.10 0.090 4581 78.83 72.10 91.57 -0.92 0.164JJR 3621 84.67 90.78 87.62 0.08 0.782 85 75.76 29.41 87.62 21.54 0.425JJS 2129 88.34 92.86 90.54 -2.91 0.180 54 71.93 75.93 90.54 – –MD 10743 99.66 99.77 99.71 -0.00 0.637 5 0.00 99.71 – –NN 146173 96.23 95.98 96.11 -0.02 0.508 3934 74.51 68.89 96.11 -0.28 0.646NNP 100926 96.61 97.36 96.98 -0.03 0.185 6075 82.94 94.58 96.98 -0.21 0.429NNPS 2917 63.33 67.26 65.24 -0.56 0.602 234 34.65 18.80 65.24 17.31 0.915NNS 65922 97.58 97.86 97.72 -0.01 0.738 2353 84.18 87.08 97.72 -0.19 0.685PDT 397 70.25 77.33 73.62 -2.07 0.063 0 0.00 73.62 – –POS 9529 98.94 99.53 99.23 0.01 0.578 0 99.23 – –PRP 19164 99.72 99.36 99.54 -0.01 0.369 11 0.00 0.00 99.54 – –PRP$ 9173 99.37 99.90 99.64 -0.01 0.629 1 0.00 99.64 – –RB 33806 94.06 90.94 92.47 -0.11 0.064 516 84.96 81.01 92.47 0.78 0.628RBR 1905 75.88 65.41 70.26 -0.31 0.592 7 20.00 14.29 70.26 – –RBS 486 68.24 53.50 59.98 -23.65 0.180 2 0.00 59.98 – –RP 2879 78.91 74.99 76.90 -0.18 0.481 0 76.90 – –SYM 59 78.12 84.75 81.30 -0.45 0.374 1 0.00 81.30 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.17 95.36 95.26 -0.17 0.072 570 79.52 74.91 95.26 -3.22 0.075VBD 32941 95.55 94.82 95.18 0.13 0.275 426 84.50 
65.26 95.18 1.93 0.362VBG 16321 91.49 92.32 91.91 0.12 0.080 924 70.98 92.64 91.91 1.47 0.121VBN 22177 87.36 90.42 88.86 0.03 0.846 716 67.66 82.40 88.86 0.41 0.657VBP 13819 93.67 92.28 92.97 -0.00 0.994 131 66.36 54.20 92.97 7.58 0.461VBZ 23816 97.71 96.50 97.10 0.08 0.319 467 86.91 55.46 97.10 1.44 0.706WDT 4745 96.53 95.51 96.02 -0.07 0.521 4 0.00 96.02 – –WP 2604 99.05 99.62 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.38 -0.84 0.067TOKENS 1044667 96.813 -0.03 0.023 24622 81.78 -0.19 0.547
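The Prec, Rec and F(0.5) columns in these tables appear to be related by the balanced F-measure, i.e. the weighted harmonic mean of precision and recall with a weight of 0.5. This reading is an inference from the tabulated values, not a definition given in this appendix. A minimal sketch that checks it against a few rows of the table above (the function name `f_measure` and the parameter `alpha` are illustrative):

```python
def f_measure(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    With alpha = 0.5 this reduces to 2PR / (P + R). Treating the F(0.5)
    column as this quantity is an assumption based on the table values.
    """
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# (tag, Prec, Rec, reported F(0.5)) copied from rows of the table above.
ROWS = [
    ("FW", 59.26, 47.06, 52.46),
    ("NNPS", 63.33, 67.26, 65.24),
    ("RB", 94.06, 90.94, 92.47),
]

if __name__ == "__main__":
    for tag, prec, rec, reported in ROWS:
        computed = f_measure(prec, rec)
        print(f"{tag}: computed {computed:.2f}, reported {reported:.2f}")
```

Because the tabulated precision and recall are themselves rounded to two decimal places, a recomputed value can differ from the reported one in the last digit for some rows.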


B.32

(Pos = IN & Wd = of) ⇒ (Pos ← IN–0)
(Pos = IN & Wd = in) ⇒ (Pos ← IN–1)
(Pos = IN & Wd = for) ⇒ (Pos ← IN–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.61 99.37 99.49 0.01 0.660 1 0.00 0.00 99.49 – –CD 40132 99.33 99.48 99.40 -0.00 0.917 3413 98.64 97.39 99.40 -0.07 0.655DT 90066 99.30 99.47 99.38 -0.02 0.179 2 0.00 99.38 – –EX 951 95.04 98.63 96.80 -0.47 0.316 0 96.80 – –FW 238 54.11 47.06 50.34 -2.70 0.496 67 75.00 4.48 50.34 – –IN 108456 98.16 98.67 98.42 -0.01 0.613 22 11.11 9.09 98.42 – –JJ 67085 91.96 91.26 91.61 -0.06 0.460 4581 78.30 72.56 91.61 -0.91 0.228JJR 3621 84.79 90.36 87.49 -0.07 0.546 85 62.16 27.06 87.49 8.15 0.952JJS 2129 94.39 91.69 93.02 -0.26 0.138 54 70.00 77.78 93.02 -0.26 0.374MD 10743 99.69 99.78 99.73 0.02 0.291 5 0.00 99.73 – –NN 146173 96.21 95.97 96.09 -0.04 0.271 3934 73.96 68.02 96.09 -1.28 0.318NNP 100926 96.65 97.36 97.00 -0.01 0.758 6075 83.81 93.96 97.00 0.03 0.984NNPS 2917 62.92 67.12 64.95 -0.99 0.245 234 35.61 20.09 64.95 23.60 0.099NNS 65922 97.60 97.82 97.71 -0.02 0.628 2353 83.65 87.21 97.71 -0.44 0.086PDT 397 70.98 80.10 75.27 0.12 0.212 0 75.27 – –POS 9529 98.94 99.51 99.22 0.00 0.993 0 99.22 – –PRP 19164 99.73 99.36 99.55 -0.01 0.593 11 0.00 99.55 – –PRP$ 9173 99.37 99.90 99.64 -0.01 0.374 1 0.00 99.64 – –RB 33806 93.96 91.18 92.55 -0.03 0.549 516 84.40 81.78 92.55 0.94 0.424RBR 1905 74.85 66.40 70.38 -0.14 0.890 7 20.00 14.29 70.38 – –RBS 486 73.42 83.54 78.15 -0.51 0.413 2 0.00 78.15 – –RP 2879 78.25 75.34 76.77 -0.36 0.192 0 76.77 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.39 95.41 95.40 -0.02 0.664 570 81.00 77.02 95.40 -0.94 0.465VBD 32941 95.59 94.56 95.07 0.01 0.860 426 83.54 61.97 95.07 -1.51 
0.500VBG 16321 91.37 92.28 91.82 0.03 0.780 924 69.87 92.86 91.82 0.67 0.414VBN 22177 87.16 90.40 88.75 -0.09 0.521 716 66.40 80.59 88.75 -1.62 0.283VBP 13819 93.50 92.24 92.86 -0.11 0.197 131 55.46 50.38 92.86 -4.80 0.569VBZ 23816 97.66 96.45 97.05 0.03 0.433 467 87.07 54.82 97.05 0.79 0.744WDT 4745 96.52 95.41 95.96 -0.13 0.197 4 0.00 95.96 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.55 -0.52 0.159TOKENS 1044667 96.819 -0.02 0.225 24622 81.57 -0.46 0.321
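The mapping rules listed for B.32 can be read as a lookup on (tag, word) pairs that is applied to the training data. The sketch below illustrates that reading. The names `RULES`, `refine_tags` and `collapse_tags` are hypothetical, matching is assumed to be case-sensitive, and the step that maps refined tags back to the original tagset before scoring is an assumption rather than something stated in this appendix.

```python
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, tag) pairs

# The three B.32 rules as a lookup: (original tag, word) -> refined tag.
# ASCII hyphens are used in the new tag names purely for convenience.
RULES: Dict[Tuple[str, str], str] = {
    ("IN", "of"): "IN-0",
    ("IN", "in"): "IN-1",
    ("IN", "for"): "IN-2",
}

# Inverse lookup used to restore the original tagset (an assumed step).
INVERSE: Dict[str, str] = {new: old for (old, _word), new in RULES.items()}


def refine_tags(sentence: TaggedSentence) -> TaggedSentence:
    """Apply the mapping rules; tokens that match no rule keep their tag."""
    return [(word, RULES.get((tag, word), tag)) for word, tag in sentence]


def collapse_tags(sentence: TaggedSentence) -> TaggedSentence:
    """Map refined tags back to the tags they were derived from."""
    return [(word, INVERSE.get(tag, tag)) for word, tag in sentence]


if __name__ == "__main__":
    example = [
        ("The", "DT"), ("price", "NN"), ("of", "IN"), ("oil", "NN"),
        ("rose", "VBD"), ("in", "IN"), ("May", "NNP"), ("because", "IN"),
    ]
    refined = refine_tags(example)
    print(refined)
    print(collapse_tags(refined) == example)
```

Only the listed words receive a new tag; every other IN token, such as "because" in the example, keeps the original label.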


B.33

(Pos = WRB & Wd = when) ⇒ (Pos ← WRB–0)
(Pos = WRB & Wd = where) ⇒ (Pos ← WRB–1)
(Pos = WRB & Wd = how) ⇒ (Pos ← WRB–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.56 99.40 99.48 0.00 0.374 1 0.00 0.00 99.48 – –CD 40132 99.35 99.42 99.38 -0.03 0.347 3413 98.66 96.75 99.38 -0.39 0.185DT 90066 99.34 99.47 99.41 0.00 0.070 2 0.00 99.41 – –EX 951 95.53 98.84 97.16 -0.11 0.374 0 97.16 – –FW 238 57.89 46.22 51.40 -0.64 0.828 67 33.33 1.49 51.40 – –IN 108456 98.18 98.70 98.44 0.01 0.470 22 0.00 0.00 98.44 – –JJ 67085 92.13 91.20 91.67 0.01 0.906 4581 79.82 71.99 91.67 -0.41 0.666JJR 3621 84.87 90.64 87.66 0.13 0.611 85 68.97 23.53 87.66 0.65 0.884JJS 2129 91.57 92.30 91.93 -1.42 0.355 54 71.93 75.93 91.93 – –MD 10743 99.67 99.78 99.72 0.00 0.605 5 0.00 99.72 – –NN 146173 96.21 96.07 96.14 0.01 0.679 3934 74.33 70.13 96.14 0.53 0.108NNP 100926 96.67 97.37 97.02 0.00 0.944 6075 83.82 94.83 97.02 0.47 0.117NNPS 2917 63.43 66.95 65.14 -0.70 0.433 234 39.26 22.65 65.14 38.25 0.552NNS 65922 97.61 97.82 97.71 -0.01 0.526 2353 83.83 87.04 97.71 -0.42 0.279PDT 397 71.01 79.60 75.06 -0.16 0.374 0 75.06 – –POS 9529 98.95 99.50 99.22 0.00 0.997 0 99.22 – –PRP 19164 99.72 99.38 99.55 0.00 0.783 11 0.00 99.55 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 94.06 91.27 92.64 0.08 0.138 516 84.40 81.78 92.64 0.94 0.354RBR 1905 75.65 66.88 70.99 0.74 0.248 7 33.33 14.29 70.99 – –RBS 486 71.82 69.75 70.77 -9.90 0.369 2 0.00 70.77 – –RP 2879 78.71 76.03 77.35 0.40 0.006 0 77.35 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.36 95.43 95.39 -0.04 0.673 570 81.82 75.79 95.39 -1.28 0.393VBD 32941 95.64 94.47 95.05 -0.01 0.831 426 81.63 63.62 95.05 -1.03 0.193VBG 16321 91.72 
92.16 91.94 0.16 0.022 924 70.48 91.23 91.94 0.40 0.377VBN 22177 86.90 90.70 88.76 -0.09 0.430 716 65.28 81.15 88.76 -2.23 0.106VBP 13819 93.52 92.46 92.98 0.02 0.726 131 60.33 55.73 92.98 4.46 0.697VBZ 23816 97.58 96.54 97.06 0.03 0.597 467 86.08 56.96 97.06 2.70 0.492WDT 4745 96.49 95.57 96.03 -0.06 0.633 4 0.00 96.03 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.63 -0.37 0.162TOKENS 1044667 96.840 -0.00 0.714 24622 81.93 -0.02 0.933


B.34

(Pos = WP & Wd = who) ⇒ (Pos ← WP–0)
(Pos = WP & Wd = what) ⇒ (Pos ← WP–1)
(Pos = WP & Wd = What) ⇒ (Pos ← WP–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.63 99.36 99.50 0.02 0.531 1 0.00 99.50 – –CD 40132 99.35 99.45 99.40 -0.01 0.526 3413 98.78 96.89 99.40 -0.26 0.154DT 90066 99.32 99.49 99.40 0.00 0.987 2 0.00 99.40 – –EX 951 95.53 98.84 97.16 -0.11 0.568 0 97.16 – –FW 238 54.68 46.64 50.34 -2.69 0.307 67 42.86 4.48 50.34 – –IN 108456 98.17 98.68 98.42 -0.00 0.849 22 0.00 0.00 98.42 – –JJ 67085 91.89 91.26 91.58 -0.09 0.145 4581 78.61 71.01 91.58 -1.84 0.133JJR 3621 84.87 90.91 87.79 0.27 0.211 85 72.22 30.59 87.79 23.27 0.374JJS 2129 90.73 88.77 89.74 -3.77 0.204 54 71.93 75.93 89.74 – –MD 10743 99.67 99.78 99.72 0.00 0.605 5 0.00 99.72 – –NN 146173 96.20 96.04 96.12 -0.01 0.665 3934 73.13 69.75 96.12 -0.54 0.417NNP 100926 96.67 97.26 96.96 -0.05 0.053 6075 83.85 93.56 96.96 -0.15 0.541NNPS 2917 63.78 66.71 65.21 -0.59 0.415 234 40.78 17.95 65.21 19.96 0.222NNS 65922 97.59 97.99 97.79 0.06 0.006 2353 83.20 88.82 97.79 0.18 0.526PDT 397 70.69 79.60 74.88 -0.39 0.561 0 74.88 – –POS 9529 98.94 99.50 99.22 -0.01 0.796 0 99.22 – –PRP 19164 99.74 99.36 99.55 -0.00 0.369 11 0.00 99.55 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 94.19 91.10 92.62 0.05 0.318 516 84.43 79.84 92.62 -0.27 0.818RBR 1905 75.63 66.61 70.83 0.51 0.432 7 33.33 14.29 70.83 – –RBS 486 61.12 67.28 64.05 -18.45 0.262 2 0.00 64.05 – –RP 2879 78.44 75.96 77.18 0.18 0.411 0 77.18 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.49 95.40 95.44 0.02 0.713 570 83.88 75.79 95.44 -0.10 0.830VBD 32941 95.42 94.47 94.94 -0.12 0.094 426 80.06 63.15 94.94 -2.28 0.338VBG 16321 
91.40 92.31 91.85 0.07 0.193 924 70.18 92.21 91.85 0.62 0.246VBN 22177 86.98 90.27 88.60 -0.27 0.075 716 64.60 81.56 88.60 -2.58 0.030VBP 13819 93.49 92.26 92.87 -0.11 0.253 131 58.06 54.96 92.87 1.82 0.651VBZ 23816 97.75 96.57 97.16 0.14 0.037 467 87.88 55.89 97.16 2.35 0.183WDT 4745 96.63 95.62 96.12 0.04 0.075 4 0.00 96.12 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.35 -0.91 0.063TOKENS 1044667 96.815 -0.03 0.129 24622 81.52 -0.51 0.170


B.35

(Pos = WDT & Wd = which) ⇒ (Pos ← WDT–0)
(Pos = WDT & Wd = that) ⇒ (Pos ← WDT–1)
(Pos = WDT & Wd = what) ⇒ (Pos ← WDT–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.68 99.36 99.52 0.04 0.262 1 0.00 99.52 – –CD 40132 99.29 99.48 99.38 -0.02 0.237 3413 98.26 97.51 99.38 -0.20 0.321DT 90066 99.32 99.49 99.41 0.01 0.644 2 0.00 99.41 – –EX 951 95.33 98.84 97.06 -0.21 0.266 0 97.06 – –FW 238 56.28 47.06 51.26 -0.92 0.777 67 11.11 1.49 51.26 – –IN 108456 98.15 98.71 98.43 0.00 0.759 22 0.00 0.00 98.43 – –JJ 67085 92.00 91.28 91.64 -0.03 0.609 4581 78.74 72.10 91.64 -0.97 0.182JJR 3621 84.95 90.42 87.60 0.06 0.779 85 83.33 23.53 87.60 5.26 0.779JJS 2129 94.44 91.83 93.12 -0.15 0.988 54 71.43 74.07 93.12 -1.55 0.374MD 10743 99.67 99.80 99.73 0.02 0.288 5 0.00 99.73 – –NN 146173 96.25 96.02 96.14 0.01 0.820 3934 74.26 69.01 96.14 -0.34 0.719NNP 100926 96.62 97.34 96.98 -0.04 0.334 6075 83.13 94.16 96.98 -0.30 0.363NNPS 2917 63.07 66.27 64.63 -1.48 0.258 234 30.38 10.26 64.63 -26.20 0.435NNS 65922 97.57 97.87 97.72 -0.01 0.806 2353 83.12 87.25 97.72 -0.74 0.236PDT 397 70.71 77.83 74.10 -1.43 0.077 0 74.10 – –POS 9529 98.94 99.52 99.23 0.01 0.703 0 99.23 – –PRP 19164 99.74 99.35 99.55 -0.01 0.471 11 0.00 99.55 – –PRP$ 9173 99.38 99.91 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.10 91.13 92.59 0.02 0.457 516 83.63 81.20 92.59 0.13 0.815RBR 1905 75.34 66.72 70.77 0.42 0.588 7 33.33 14.29 70.77 – –RBS 486 73.72 83.13 78.14 -0.52 0.954 2 0.00 78.14 – –RP 2879 78.66 75.41 77.00 -0.06 0.882 0 77.00 – –SYM 59 77.78 83.05 80.33 -1.64 0.374 1 0.00 0.00 80.33 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.37 95.42 95.40 -0.03 0.721 570 82.71 77.19 95.40 0.18 0.991VBD 32941 95.69 94.52 95.10 0.04 0.148 
426 81.01 64.08 95.10 -0.96 0.554VBG 16321 91.48 92.41 91.94 0.16 0.005 924 71.20 92.32 91.94 1.50 0.028VBN 22177 86.98 90.61 88.75 -0.09 0.404 716 66.21 81.84 88.75 -1.08 0.476VBP 13819 93.71 92.08 92.89 -0.09 0.326 131 61.11 50.38 92.89 -0.42 0.759VBZ 23816 97.65 96.54 97.09 0.07 0.017 467 87.54 54.18 97.09 0.27 0.893WDT 4745 96.76 95.55 96.15 0.07 0.490 4 0.00 96.15 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.59 -0.44 0.409TOKENS 1044667 96.837 -0.00 0.779 24622 81.62 -0.40 0.308


B.36

(Pos = PRP & Wd = it) ⇒ (Pos ← PRP–0)
(Pos = PRP & Wd = he) ⇒ (Pos ← PRP–1)
(Pos = PRP & Wd = they) ⇒ (Pos ← PRP–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.69 99.37 99.53 0.05 0.219 1 0.00 99.53 – –CD 40132 99.34 99.43 99.39 -0.02 0.152 3413 98.74 96.81 99.39 -0.32 0.029DT 90066 99.30 99.49 99.40 -0.00 0.767 2 0.00 99.40 – –EX 951 95.43 98.84 97.11 -0.16 0.407 0 97.11 – –FW 238 54.41 46.64 50.23 -2.91 0.064 67 50.00 4.48 50.23 – –IN 108456 98.16 98.69 98.42 -0.00 0.820 22 0.00 0.00 98.42 – –JJ 67085 92.04 91.21 91.62 -0.04 0.440 4581 79.11 72.23 91.62 -0.66 0.302JJR 3621 84.75 90.83 87.68 0.15 0.457 85 68.42 30.59 87.68 21.27 0.374JJS 2129 94.27 95.77 95.01 1.88 0.382 54 73.08 70.37 95.01 -2.95 0.374MD 10743 99.66 99.80 99.73 0.01 0.583 5 0.00 0.00 99.73 – –NN 146173 96.21 96.01 96.11 -0.02 0.349 3934 74.00 68.94 96.11 -0.57 0.157NNP 100926 96.63 97.33 96.98 -0.03 0.335 6075 83.45 94.39 96.98 0.02 0.972NNPS 2917 63.56 66.37 64.93 -1.02 0.187 234 37.25 16.24 64.93 8.85 0.621NNS 65922 97.53 97.95 97.74 0.01 0.591 2353 83.49 88.36 97.74 0.10 0.678PDT 397 70.78 78.09 74.25 -1.23 0.178 0 74.25 – –POS 9529 98.99 99.45 99.22 -0.00 0.992 0 99.22 – –PRP 19164 99.75 99.37 99.56 0.01 0.099 11 0.00 99.56 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 93.95 91.17 92.54 -0.04 0.302 516 82.94 81.01 92.54 -0.40 0.650RBR 1905 75.51 66.04 70.46 -0.02 0.993 7 0.00 0.00 70.46 – –RBS 486 87.80 81.48 84.53 7.61 0.357 2 0.00 84.53 – –RP 2879 78.75 75.82 77.26 0.28 0.300 0 77.26 – –SYM 59 77.78 83.05 80.33 -1.64 0.374 1 0.00 0.00 80.33 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.27 95.50 95.38 -0.04 0.440 570 82.24 77.19 95.38 -0.09 0.821VBD 32941 95.65 94.54 95.09 0.03 0.456 426 81.46 62.91 95.09 
-1.74 0.010VBG 16321 91.57 92.14 91.85 0.06 0.542 924 69.79 91.77 91.85 0.10 0.920VBN 22177 86.99 90.47 88.69 -0.16 0.225 716 65.41 79.75 88.69 -2.89 0.058VBP 13819 93.88 92.31 93.09 0.13 0.060 131 70.00 53.44 93.09 9.27 0.175VBZ 23816 97.63 96.54 97.09 0.06 0.136 467 87.00 55.89 97.09 1.95 0.592WDT 4745 96.55 95.55 96.05 -0.04 0.409 4 0.00 96.05 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.60 -0.43 0.329TOKENS 1044667 96.838 -0.00 0.816 24622 81.71 -0.28 0.051


B.37

(Pos = DT & Wd = the) ⇒ (Pos ← DT–0)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.91 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.63 99.38 99.51 0.03 0.267 1 0.00 99.51 – –CD 40132 99.35 99.42 99.38 -0.02 0.427 3413 98.86 96.84 99.38 -0.24 0.299DT 90066 99.33 99.48 99.40 0.00 0.934 2 0.00 99.40 – –EX 951 95.43 98.84 97.11 -0.16 0.393 0 97.11 – –FW 238 56.25 45.38 50.23 -2.90 0.381 67 20.00 1.49 50.23 – –IN 108456 98.19 98.69 98.44 0.01 0.345 22 0.00 0.00 98.44 – –JJ 67085 91.90 91.22 91.56 -0.11 0.079 4581 78.20 72.89 91.56 -0.74 0.310JJR 3621 84.64 90.39 87.42 -0.15 0.146 85 65.38 20.00 87.42 – –JJS 2129 91.64 88.54 90.06 -3.43 0.213 54 70.69 75.93 90.06 -0.89 0.374MD 10743 99.70 99.75 99.73 0.01 0.366 5 0.00 99.73 – –NN 146173 96.20 96.00 96.10 -0.03 0.459 3934 73.72 68.73 96.10 -0.90 0.280NNP 100926 96.61 97.32 96.97 -0.05 0.054 6075 83.49 93.88 96.97 -0.22 0.447NNPS 2917 63.67 66.10 64.86 -1.13 0.241 234 38.75 13.25 64.86 -4.98 0.566NNS 65922 97.54 97.88 97.71 -0.02 0.481 2353 83.26 87.08 97.71 -0.75 0.237PDT 397 70.41 77.33 73.71 -1.95 0.135 0 0.00 73.71 – –POS 9529 98.90 99.52 99.21 -0.01 0.694 0 99.21 – –PRP 19164 99.73 99.37 99.55 0.00 0.609 11 0.00 99.55 – –PRP$ 9173 99.42 99.88 99.65 0.01 0.178 1 0.00 99.65 – –RB 33806 93.98 91.05 92.49 -0.09 0.038 516 85.37 81.40 92.49 1.26 0.480RBR 1905 75.39 65.93 70.34 -0.18 0.835 7 50.00 14.29 70.34 – –RBS 486 62.03 71.60 66.48 -15.37 0.257 2 0.00 66.48 – –RP 2879 78.67 75.06 76.82 -0.29 0.393 0 76.82 – –SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.19 46.00 56.79 -1.52 0.374 17 0.00 56.79 – –VB 29021 95.24 95.44 95.34 -0.09 0.086 570 79.81 75.61 95.34 -2.57 0.230VBD 32941 95.71 94.51 95.11 0.05 0.456 426 85.08 
62.91 95.11 0.12 0.857VBG 16321 91.28 92.37 91.82 0.02 0.763 924 70.96 93.07 91.82 1.66 0.014VBN 22177 87.04 90.52 88.74 -0.10 0.274 716 66.37 82.40 88.74 -0.65 0.361VBP 13819 93.74 92.31 93.02 0.06 0.375 131 62.50 53.44 93.02 3.88 0.733VBZ 23816 97.59 96.58 97.08 0.06 0.494 467 85.16 56.53 97.08 1.80 0.638WDT 4745 96.29 95.74 96.02 -0.08 0.431 4 0.00 96.02 – –WP 2604 99.05 99.62 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.13 -1.34 0.026TOKENS 1044667 96.806 -0.04 0.069 24622 81.61 -0.41 0.296


B.38

(Pos = IN & Wd = of) ⇒ (Pos ← IN–0)
(Pos = IN & Wd = in) ⇒ (Pos ← IN–1)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.90 99.90 -0.01 0.374 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.59 99.38 99.49 0.01 0.715 1 0.00 99.49 – –CD 40132 99.31 99.43 99.37 -0.04 0.098 3413 98.66 96.92 99.37 -0.30 0.039DT 90066 99.33 99.47 99.40 -0.00 0.768 2 0.00 99.40 – –EX 951 95.23 98.63 96.90 -0.37 0.372 0 96.90 – –FW 238 53.28 51.26 52.25 1.00 0.861 67 71.43 7.46 52.25 – –IN 108456 98.19 98.68 98.43 0.01 0.730 22 0.00 0.00 98.43 – –JJ 67085 91.84 91.29 91.56 -0.10 0.016 4581 78.83 73.15 91.56 -0.17 0.729JJR 3621 84.84 90.44 87.56 0.01 0.998 85 73.91 20.00 87.56 – –JJS 2129 94.30 91.69 92.97 -0.30 0.983 54 73.21 75.93 92.97 0.91 0.374MD 10743 99.67 99.79 99.73 0.01 0.466 5 0.00 99.73 – –NN 146173 96.22 95.96 96.09 -0.04 0.236 3934 74.19 68.91 96.09 -0.46 0.390NNP 100926 96.64 97.31 96.97 -0.04 0.084 6075 83.97 93.28 96.97 -0.21 0.641NNPS 2917 63.96 66.06 64.99 -0.93 0.314 234 38.89 20.94 64.99 31.01 0.638NNS 65922 97.48 97.97 97.72 -0.00 0.867 2353 82.59 88.91 97.72 -0.16 0.592PDT 397 71.00 78.34 74.49 -0.91 0.199 0 74.49 – –POS 9529 98.89 99.55 99.22 0.00 0.971 0 99.22 – –PRP 19164 99.71 99.37 99.54 -0.01 0.541 11 12.50 9.09 99.54 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 94.04 91.13 92.57 -0.01 0.914 516 83.80 81.20 92.57 0.23 0.812RBR 1905 75.69 66.51 70.80 0.47 0.566 7 0.00 0.00 70.80 – –RBS 486 73.36 82.72 77.76 -1.01 0.908 2 0.00 77.76 – –RP 2879 78.83 75.93 77.35 0.40 0.251 0 77.35 – –SYM 59 79.03 83.05 80.99 -0.83 0.374 1 0.00 0.00 80.99 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 73.77 45.00 55.90 -3.07 0.374 17 0.00 55.90 – –VB 29021 95.40 95.30 95.35 -0.08 0.322 570 82.67 76.14 95.35 -0.55 0.169VBD 32941 95.55 94.51 95.03 -0.03 0.447 426 85.90 62.91 
95.03 0.52 0.629VBG 16321 91.56 92.20 91.88 0.09 0.235 924 70.72 92.53 91.88 1.21 0.067VBN 22177 86.93 90.35 88.61 -0.26 0.020 716 65.26 82.12 88.61 -1.73 0.124VBP 13819 93.47 92.35 92.91 -0.06 0.499 131 59.13 51.91 92.91 -0.32 0.799VBZ 23816 97.77 96.48 97.12 0.10 0.196 467 90.00 55.89 97.12 3.30 0.264WDT 4745 96.43 95.70 96.07 -0.02 0.857 4 0.00 96.07 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.28 -1.04 0.147TOKENS 1044667 96.817 -0.03 0.240 24622 81.78 -0.20 0.463


B.39

(Pos = DT & Wd = the) ⇒ (Pos ← DT–0)
(Pos = DT & Wd = a) ⇒ (Pos ← DT–1)
(Pos = DT & Wd = this) ⇒ (Pos ← DT–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.63 99.38 99.50 0.03 0.300 1 0.00 0.00 99.50 – –CD 40132 99.31 99.42 99.36 -0.05 0.013 3413 98.45 96.89 99.36 -0.42 0.096DT 90066 99.33 99.47 99.40 -0.00 0.875 2 0.00 0.00 99.40 – –EX 951 95.43 98.84 97.11 -0.16 0.393 0 97.11 – –FW 238 53.39 49.58 51.42 -0.61 0.707 67 75.00 4.48 51.42 – –IN 108456 98.15 98.69 98.42 -0.01 0.573 22 11.11 4.55 98.42 – –JJ 67085 91.91 91.20 91.55 -0.12 0.093 4581 78.57 72.04 91.55 -1.12 0.053JJR 3621 84.62 90.86 87.63 0.09 0.655 85 64.86 28.24 87.63 12.86 0.579JJS 2129 88.43 96.52 92.30 -1.03 0.853 54 71.93 75.93 92.30 – –MD 10743 99.69 99.77 99.73 0.01 0.211 5 0.00 99.73 – –NN 146173 96.20 95.97 96.09 -0.05 0.080 3934 74.14 68.43 96.09 -0.86 0.233NNP 100926 96.60 97.35 96.97 -0.05 0.069 6075 83.26 94.75 96.97 0.08 0.729NNPS 2917 63.67 66.10 64.86 -1.13 0.354 234 31.82 11.97 64.86 -16.30 0.224NNS 65922 97.56 97.87 97.72 -0.01 0.786 2353 83.52 87.63 97.72 -0.28 0.576PDT 397 70.09 79.09 74.32 -1.14 0.277 0 0.00 74.32 – –POS 9529 98.93 99.54 99.23 0.01 0.464 0 99.23 – –PRP 19164 99.75 99.36 99.56 0.01 0.706 11 0.00 99.56 – –PRP$ 9173 99.37 99.90 99.64 -0.01 0.629 1 0.00 99.64 – –RB 33806 94.06 91.05 92.53 -0.05 0.177 516 84.79 81.01 92.53 0.68 0.386RBR 1905 75.48 65.93 70.38 -0.13 0.723 7 0.00 0.00 70.38 – –RBS 486 85.62 51.44 64.27 -18.18 0.262 2 0.00 64.27 – –RP 2879 78.53 75.48 76.97 -0.09 0.924 0 76.97 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 72.73 48.00 57.83 0.28 0.374 17 33.33 5.88 57.83 – –VB 29021 95.27 95.41 95.34 -0.09 0.260 570 80.45 75.79 95.34 -2.08 0.344VBD 32941 95.55 94.60 95.07 0.01 0.888 426 82.32 
66.67 95.07 1.96 0.148VBG 16321 91.28 92.22 91.75 -0.05 0.523 924 70.95 92.53 91.75 1.40 0.165VBN 22177 87.11 90.35 88.70 -0.15 0.453 716 67.05 80.45 88.70 -1.16 0.454VBP 13819 93.69 92.24 92.96 -0.02 0.843 131 62.39 51.91 92.96 2.17 0.627VBZ 23816 97.67 96.48 97.07 0.05 0.367 467 86.10 54.39 97.07 -0.13 0.960WDT 4745 96.32 95.53 95.93 -0.17 0.233 4 0.00 95.93 – –WP 2604 99.05 99.58 99.31 -0.04 0.178 0 99.31 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 99.96 99.87 99.91 -0.02 0.374 1 0.00 0.00 99.91 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.28 -1.04 0.134TOKENS 1044667 96.806 -0.04 0.200 24622 81.64 -0.37 0.295


B.40

(Pos = IN & Wd = of) ⇒ (Pos ← IN–0)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.62 99.39 99.50 0.03 0.313 1 0.00 99.50 – –CD 40132 99.35 99.42 99.38 -0.02 0.272 3413 98.74 96.78 99.38 -0.33 0.185DT 90066 99.31 99.49 99.40 -0.00 0.667 2 0.00 99.40 – –EX 951 95.33 98.74 97.00 -0.26 0.216 0 97.00 – –FW 238 53.64 49.58 51.53 -0.39 0.682 67 33.33 1.49 51.53 – –IN 108456 98.20 98.67 98.43 0.01 0.489 22 0.00 0.00 98.43 – –JJ 67085 91.94 91.24 91.58 -0.08 0.100 4581 77.61 71.97 91.58 -1.75 0.018JJR 3621 84.57 90.94 87.64 0.10 0.718 85 70.27 30.59 87.64 22.26 0.366JJS 2129 91.49 95.96 93.67 0.45 0.822 54 70.69 75.93 93.67 -0.89 0.374MD 10743 99.67 99.77 99.72 0.00 0.374 5 0.00 99.72 – –NN 146173 96.16 96.04 96.10 -0.03 0.245 3934 73.26 69.09 96.10 -0.94 0.076NNP 100926 96.63 97.37 97.00 -0.02 0.439 6075 83.31 93.93 97.00 -0.30 0.453NNPS 2917 64.38 65.86 65.11 -0.75 0.054 234 48.21 11.54 65.11 -10.39 0.376NNS 65922 97.54 97.90 97.72 -0.01 0.723 2353 82.73 87.76 97.72 -0.69 0.190PDT 397 70.58 80.35 75.15 -0.04 0.912 0 75.15 – –POS 9529 98.98 99.48 99.23 0.01 0.715 0 99.23 – –PRP 19164 99.73 99.38 99.55 0.00 0.380 11 0.00 99.55 – –PRP$ 9173 99.37 99.90 99.64 -0.01 0.374 1 0.00 99.64 – –RB 33806 94.04 91.15 92.58 0.00 0.989 516 85.77 81.78 92.58 1.75 0.087RBR 1905 75.75 65.93 70.50 0.04 0.956 7 20.00 14.29 70.50 – –RBS 486 86.05 68.52 76.29 -2.88 0.528 2 0.00 76.29 – –RP 2879 78.50 76.21 77.34 0.38 0.196 0 77.34 – –SYM 59 79.03 83.05 80.99 -0.83 0.374 1 0.00 0.00 80.99 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.57 95.25 95.41 -0.01 0.867 570 85.14 74.39 95.41 -0.39 0.763VBD 32941 95.57 94.39 94.98 -0.09 0.155 426 83.55 60.80 
94.98 -2.59 0.113VBG 16321 91.46 92.34 91.90 0.11 0.236 924 70.16 92.64 91.90 0.81 0.303VBN 22177 86.98 90.23 88.58 -0.29 0.101 716 65.51 79.05 88.58 -3.19 0.033VBP 13819 93.38 92.18 92.78 -0.20 0.023 131 56.52 49.62 92.78 -4.72 0.102VBZ 23816 97.63 96.51 97.07 0.04 0.436 467 87.46 56.75 97.07 3.11 0.233WDT 4745 96.61 95.62 96.11 0.03 0.880 4 0.00 96.11 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.41 -0.80 0.084TOKENS 1044667 96.823 -0.02 0.297 24622 81.40 -0.66 0.012


B.41

(Pos = PRP & Wd = it) ⇒ (Pos ← PRP–0)
(Pos = PRP & Wd = he) ⇒ (Pos ← PRP–1)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.61 99.38 99.50 0.02 0.560 1 0.00 0.00 99.50 – –CD 40132 99.30 99.43 99.36 -0.04 0.113 3413 98.51 96.81 99.36 -0.44 0.086DT 90066 99.34 99.48 99.41 0.01 0.579 2 0.00 99.41 – –EX 951 95.33 98.84 97.06 -0.21 0.241 0 97.06 – –FW 238 55.56 48.32 51.69 -0.09 0.768 67 66.67 5.97 51.69 – –IN 108456 98.18 98.69 98.43 0.01 0.669 22 0.00 0.00 98.43 – –JJ 67085 92.04 91.20 91.62 -0.04 0.123 4581 79.04 72.54 91.62 -0.48 0.240JJR 3621 84.76 90.80 87.68 0.15 0.353 85 65.22 17.65 87.68 – –JJS 2129 88.53 96.43 92.31 -1.01 0.868 54 71.93 75.93 92.31 – –MD 10743 99.70 99.77 99.73 0.02 0.332 5 0.00 99.73 – –NN 146173 96.16 96.09 96.12 -0.01 0.825 3934 74.48 69.42 96.12 0.10 0.893NNP 100926 96.71 97.32 97.01 -0.00 0.864 6075 84.18 94.14 97.01 0.35 0.118NNPS 2917 63.29 67.09 65.14 -0.71 0.342 234 37.86 22.65 65.14 36.40 0.169NNS 65922 97.54 97.89 97.71 -0.01 0.562 2353 83.01 88.23 97.71 -0.27 0.358PDT 397 70.50 78.84 74.44 -0.99 0.413 0 0.00 74.44 – –POS 9529 98.99 99.44 99.21 -0.01 0.843 0 99.21 – –PRP 19164 99.75 99.37 99.56 0.01 0.232 11 0.00 99.56 – –PRP$ 9173 99.37 99.91 99.64 0.00 0.374 1 0.00 99.64 – –RB 33806 94.06 91.19 92.60 0.03 0.383 516 84.48 83.33 92.60 1.95 0.030RBR 1905 75.83 66.04 70.59 0.17 0.869 7 0.00 0.00 70.59 – –RBS 486 85.23 52.26 64.80 -17.51 0.266 2 0.00 64.80 – –RP 2879 79.12 75.41 77.22 0.23 0.622 0 77.22 – –SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 72.73 48.00 57.83 0.28 0.374 17 33.33 5.88 57.83 – –VB 29021 95.39 95.39 95.39 -0.04 0.451 570 84.45 77.19 95.39 1.19 0.653VBD 32941 95.68 94.42 95.05 -0.02 0.824 426 82.42 63.85 
95.05 -0.41 0.748VBG 16321 91.69 91.98 91.84 0.05 0.553 924 70.75 91.88 91.84 0.93 0.170VBN 22177 86.83 90.73 88.74 -0.11 0.536 716 66.25 81.70 88.74 -1.13 0.455VBP 13819 93.67 92.25 92.96 -0.01 0.857 131 62.83 54.20 92.96 4.93 0.514VBZ 23816 97.57 96.52 97.04 0.02 0.763 467 86.80 56.32 97.04 2.34 0.202WDT 4745 96.66 95.81 96.23 0.15 0.222 4 0.00 96.23 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.57 -0.49 0.104TOKENS 1044667 96.830 -0.01 0.371 24622 81.95 0.00 0.962


B.42

(Pos = PRP & Wd = it) ⇒ (Pos ← PRP–0)

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –CC 26227 99.75 99.34 99.54 0.01 0.143 1 0.00 99.54 – –CD 40132 99.28 99.57 99.42 – – 3413 99.35 98.51 99.42 – –DT 90066 99.46 99.28 99.37 -0.00 0.618 2 0.00 99.37 – –EX 951 95.27 99.58 97.38 – – 0 97.38 – –FW 238 64.17 32.35 43.02 – – 67 0.00 0.00 43.02 – –IN 108456 97.82 98.91 98.36 -0.00 0.512 22 0.00 98.36 – –JJ 67085 92.37 92.07 92.22 0.00 0.673 4581 79.67 82.12 92.22 0.04 0.356JJR 3621 84.86 92.24 88.39 -0.02 0.697 85 88.46 27.06 88.39 – –JJS 2129 95.19 95.77 95.48 0.03 0.374 54 86.49 59.26 95.48 – –MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.17 95.08 96.12 0.00 0.978 3934 81.07 67.69 96.12 0.04 0.777NNP 100926 94.69 97.96 96.29 0.00 0.972 6075 82.53 98.14 96.29 0.02 0.323NNPS 2917 54.73 73.19 62.63 0.02 0.570 234 54.00 11.54 62.63 -3.23 0.374NNS 65922 97.71 97.20 97.45 -0.01 0.072 2353 86.99 90.35 97.45 -0.17 0.033PDT 397 70.00 84.63 76.62 0.18 0.374 0 76.62 – –POS 9529 98.76 99.65 99.21 0.01 0.178 0 99.21 – –PRP 19164 99.85 99.33 99.59 – – 11 0.00 0.00 99.59 – –PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –RB 33806 95.19 90.54 92.80 0.00 0.713 516 90.32 81.40 92.80 – –RBR 1905 77.38 66.98 71.81 -0.08 0.617 7 0.00 71.81 – –RBS 486 87.98 84.36 86.13 0.11 0.374 2 0.00 0.00 86.13 – –RP 2879 75.59 75.72 75.66 -0.09 0.191 0 75.66 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.73 94.65 95.68 0.02 0.172 570 85.90 79.12 95.68 0.13 0.606VBD 32941 94.97 95.91 95.44 -0.02 0.037 426 79.52 70.19 95.44 -0.91 0.103VBG 16321 92.35 94.22 93.27 -0.01 0.674 924 82.06 91.56 93.27 0.15 0.418VBN 22177 
88.19 90.90 89.52 -0.04 0.011 716 78.58 72.77 89.52 0.03 0.970VBP 13819 94.03 92.16 93.09 0.03 0.220 131 72.41 32.06 93.09 0.00 0.908VBZ 23816 98.83 96.37 97.58 -0.02 0.168 467 89.88 62.74 97.58 -1.76 0.198WDT 4745 97.15 96.38 96.76 -0.01 0.374 4 0.00 96.76 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.67 -0.09 0.197TOKENS 1044667 96.851 -0.00 0.400 24622 84.60 -0.03 0.289


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.91 99.92 99.91 0.01 0.374 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.69 99.37 99.53 0.05 0.140 1 0.00 99.53 – –CD 40132 99.32 99.47 99.40 -0.01 0.634 3413 98.49 97.25 99.40 -0.22 0.334DT 90066 99.31 99.49 99.40 -0.00 0.945 2 0.00 99.40 – –EX 951 95.24 98.84 97.01 -0.26 0.134 0 97.01 – –FW 238 55.50 46.64 50.68 -2.02 0.206 67 10.00 1.49 50.68 – –IN 108456 98.17 98.70 98.44 0.01 0.418 22 5.00 4.55 98.44 – –JJ 67085 92.04 91.31 91.67 0.01 0.827 4581 78.95 72.63 91.67 -0.47 0.442JJR 3621 84.78 91.25 87.90 0.40 0.104 85 70.59 28.24 87.90 15.70 0.633JJS 2129 90.81 92.34 91.57 -1.81 0.723 54 74.55 75.93 91.57 1.83 0.374MD 10743 99.67 99.78 99.73 0.01 0.178 5 0.00 99.73 – –NN 146173 96.25 96.04 96.14 0.01 0.667 3934 74.05 68.76 96.14 -0.67 0.282NNP 100926 96.65 97.33 96.99 -0.02 0.472 6075 83.76 94.34 96.99 0.19 0.576NNPS 2917 63.92 66.37 65.12 -0.73 0.261 234 40.22 15.81 65.12 9.24 0.941NNS 65922 97.60 97.89 97.75 0.02 0.215 2353 83.85 87.17 97.75 -0.34 0.395PDT 397 71.00 78.34 74.49 -0.91 0.201 0 74.49 – –POS 9529 98.94 99.49 99.21 -0.01 0.807 0 99.21 – –PRP 19164 99.73 99.37 99.55 -0.00 0.980 11 0.00 99.55 – –PRP$ 9173 99.39 99.89 99.64 -0.00 0.994 1 0.00 99.64 – –RB 33806 94.04 91.11 92.55 -0.03 0.451 516 84.18 80.43 92.55 -0.04 0.990RBR 1905 76.59 65.93 70.86 0.55 0.328 7 25.00 14.29 70.86 – –RBS 486 71.11 65.84 68.38 -12.95 0.394 2 0.00 68.38 – –RP 2879 79.10 75.72 77.37 0.43 0.317 0 77.37 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.27 95.35 95.31 -0.12 0.077 570 82.40 77.19 95.31 0.00 0.950VBD 32941 95.66 94.47 95.06 0.00 0.968 426 81.04 62.21 95.06 -2.58 
0.148VBG 16321 91.60 92.31 91.96 0.18 0.033 924 70.78 91.77 91.96 0.90 0.199VBN 22177 87.00 90.73 88.83 -0.01 0.851 716 66.70 82.82 88.83 -0.15 0.836VBP 13819 93.57 92.42 93.00 0.03 0.597 131 62.93 55.73 93.00 6.58 0.228VBZ 23816 97.57 96.62 97.09 0.07 0.296 467 83.95 58.24 97.09 3.03 0.352WDT 4745 96.62 95.70 96.16 0.07 0.492 4 0.00 96.16 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.69 -0.26 0.653TOKENS 1044667 96.839 -0.00 0.906 24622 81.81 -0.16 0.602

MAX  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.97 99.57 99.77 – – 0 99.77 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.74 99.33 99.53 -0.01 0.177 1 0.00 99.53 – –CD 40132 99.34 99.46 99.40 -0.00 0.308 3413 99.15 98.77 99.40 -0.01 0.678DT 90066 99.42 99.31 99.36 -0.00 0.295 2 0.00 99.36 – –EX 951 95.61 98.53 97.05 – – 0 97.05 – –FW 238 70.83 28.57 40.72 -1.45 0.558 67 100.00 5.97 40.72 – –IN 108456 97.72 98.70 98.21 -0.01 0.287 22 72.73 36.36 98.21 – –JJ 67085 91.81 92.80 92.30 -0.01 0.633 4581 82.45 82.36 92.30 -0.03 0.802JJR 3621 84.87 90.17 87.44 -0.02 0.701 85 80.85 44.71 87.44 -1.83 0.374JJS 2129 94.51 94.69 94.60 -0.18 0.281 54 90.00 66.67 94.60 -2.24 0.178MD 10743 99.74 99.56 99.65 -0.01 0.178 5 0.00 99.65 – –NN 146173 96.38 96.58 96.48 -0.00 0.967 3934 79.72 78.06 96.48 -0.13 0.057NNP 100926 96.20 97.71 96.95 -0.01 0.098 6075 89.35 95.60 96.95 0.02 0.710NNPS 2917 65.75 54.17 59.40 -0.06 0.867 234 57.72 36.75 59.40 0.65 0.635NNS 65922 97.64 98.50 98.07 0.00 0.811 2353 90.02 92.73 98.07 0.04 0.754PDT 397 73.93 64.99 69.17 0.51 0.228 0 69.17 – –POS 9529 98.43 99.63 99.03 -0.01 0.626 0 99.03 – –PRP 19164 99.86 99.14 99.50 -0.00 0.999 11 0.00 99.50 – –PRP$ 9173 99.11 99.95 99.53 -0.01 0.705 1 0.00 99.53 – –RB 33806 93.79 90.84 92.29 -0.03 0.155 516 92.95 84.30 92.29 -0.23 0.182RBR 1905 75.49 62.73 68.52 0.22 0.419 7 0.00 0.00 68.52 – –RBS 486 84.13 79.63 81.82 -0.92 0.185 2 0.00 81.82 – –RP 2879 79.64 73.22 76.29 -0.10 0.516 0 76.29 – –SYM 59 80.43 62.71 70.48 – – 1 100.00 100.00 70.48 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 84.85 28.00 42.11 2.92 0.374 17 0.00 42.11 – –VB 29021 97.50 95.11 96.29 0.02 0.020 570 90.64 86.67 96.29 0.23 0.380VBD 32941 96.02 96.59 96.31 -0.02 0.202 426 84.35 80.99 96.31 0.36 
0.207VBG 16321 94.13 92.00 93.06 -0.03 0.327 924 85.42 90.04 93.06 -0.11 0.392VBN 22177 91.08 89.51 90.29 -0.05 0.018 716 81.00 79.19 90.29 -0.21 0.590VBP 13819 94.63 93.74 94.18 0.03 0.373 131 73.87 62.60 94.18 -0.80 0.374VBZ 23816 98.86 96.74 97.79 0.00 0.417 467 89.51 78.59 97.79 0.41 0.284WDT 4745 96.34 95.93 96.14 -0.00 0.957 4 0.00 96.14 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 53.63 -0.17 0.155TOKENS 1044667 97.050 -0.01 0.065 24622 87.32 -0.02 0.527
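As a reading aid for the Prec, Rec and F(0.5) columns, the following is a minimal sketch of how per-tag figures of this kind can be computed from aligned gold and predicted tag sequences. It assumes that F(0.5) denotes the balanced F-measure (the harmonic mean of precision and recall); that reading is consistent with rows such as CC in the table above, where 99.74 and 99.33 give 99.53, but it is an assumption here. The function name `per_tag_scores` is hypothetical and is not taken from the thesis.

```python
from collections import Counter


def per_tag_scores(gold, predicted):
    """Return {tag: (precision, recall, F)} as percentages.

    gold and predicted are equal-length sequences of tag strings.
    F is the harmonic mean of precision and recall (an assumed
    reading of the F(0.5) column).
    """
    correct, gold_count, pred_count = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            correct[g] += 1

    scores = {}
    for tag in gold_count:
        # Precision is undefined when the tag is never predicted; report 0.
        prec = correct[tag] / pred_count[tag] if pred_count[tag] else 0.0
        rec = correct[tag] / gold_count[tag]
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (100 * prec, 100 * rec, 100 * f)
    return scores


if __name__ == "__main__":
    gold = ["DT", "NN", "VB", "NN"]
    predicted = ["DT", "NN", "NN", "NN"]
    for tag, (p, r, f) in sorted(per_tag_scores(gold, predicted).items()):
        print(f"{tag:4s} {p:6.2f} {r:6.2f} {f:6.2f}")
```

The columns suffixed with U would follow from the same computation restricted to tokens unseen in training, assuming that is what the suffix denotes.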

B.43

(Pos = DT & Wd ∈ {either, half, many, neither}) ⇒ (Pos ← DT–0)
(Pos = DT & Wd = all) ⇒ (Pos ← DT–1)
(Pos = DT & Wd = no) ⇒ (Pos ← DT–2)
(Pos = DT & Wd = both) ⇒ (Pos ← DT–3)
(Pos = DT & Wd ∈ {any, every, some, these, those}) ⇒ (Pos ← DT–4)
(Pos = DT & Wd ∈ {del, la, le, nary, them}) ⇒ (Pos ← DT–5)
(Pos = DT & Wd ∈ {a, an, another, each, that, the, this}) ⇒ (Pos ← DT–6)
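Word-conditioned rules of this form can be applied to a tagged corpus with a simple lookup. The sketch below is illustrative only: the names `DT_RULES`, `map_tags` and `collapse` are hypothetical, the ASCII hyphen in `DT-0` stands in for the dash used in the rules, and matching on the lowercased word is an assumption. The `collapse` step assumes that refined tags are mapped back to the original tag before scoring, which the per-tag rows (DT rather than DT–0) suggest but do not confirm.

```python
# Hypothetical encoding of the determiner rules: refined tag -> word set.
DT_RULES = {
    "DT-0": {"either", "half", "many", "neither"},
    "DT-1": {"all"},
    "DT-2": {"no"},
    "DT-3": {"both"},
    "DT-4": {"any", "every", "some", "these", "those"},
    "DT-5": {"del", "la", "le", "nary", "them"},
    "DT-6": {"a", "an", "another", "each", "that", "the", "this"},
}

# Invert to word -> refined tag for constant-time lookup.
WORD_TO_TAG = {word: tag for tag, words in DT_RULES.items() for word in words}


def map_tags(tagged, source_tag="DT", lookup=WORD_TO_TAG):
    """Refine source_tag tokens; leave every other token unchanged.

    Assumption: words are matched case-insensitively.
    """
    mapped = []
    for word, tag in tagged:
        if tag == source_tag:
            tag = lookup.get(word.lower(), tag)
        mapped.append((word, tag))
    return mapped


def collapse(tag, source_tag="DT", refined=frozenset(DT_RULES)):
    """Map a refined tag back to its source tag (assumed evaluation step)."""
    return source_tag if tag in refined else tag


if __name__ == "__main__":
    sentence = [("The", "DT"), ("all", "DT"), ("that", "IN"), ("dog", "NN")]
    refined = map_tags(sentence)
    print(refined)
    print([collapse(tag) for _, tag in refined])
```

Only tokens already tagged DT are touched, so a word such as "that" tagged IN keeps its original tag.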

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.55 99.41 99.48 0.01 0.571 1 0.00 99.48 – –CD 40132 99.32 99.42 99.37 -0.04 0.035 3413 98.57 96.66 99.37 -0.48 0.034DT 90066 99.33 99.48 99.40 0.00 0.962 2 0.00 99.40 – –EX 951 95.33 98.84 97.06 -0.21 0.095 0 97.06 – –FW 238 54.68 46.64 50.34 -2.69 0.584 67 50.00 4.48 50.34 – –IN 108456 98.17 98.69 98.43 0.01 0.656 22 0.00 0.00 98.43 – –JJ 67085 92.03 91.22 91.62 -0.04 0.204 4581 79.04 71.93 91.62 -0.92 0.162JJR 3621 84.85 90.36 87.52 -0.03 0.880 85 81.25 15.29 87.52 -26.16 0.343JJS 2129 91.65 92.30 91.97 -1.38 0.373 54 71.93 75.93 91.97 – –MD 10743 99.67 99.79 99.73 0.01 0.466 5 0.00 99.73 – –NN 146173 96.25 96.01 96.13 -0.00 0.905 3934 74.05 69.22 96.13 -0.32 0.476NNP 100926 96.60 97.38 96.99 -0.02 0.475 6075 83.69 94.63 96.99 0.29 0.388NNPS 2917 63.99 66.27 65.11 -0.75 0.384 234 42.20 19.66 65.11 29.08 0.326NNS 65922 97.46 97.93 97.69 -0.04 0.099 2353 82.45 88.44 97.69 -0.50 0.144PDT 397 71.05 80.35 75.41 0.31 0.637 0 75.41 – –POS 9529 98.91 99.56 99.23 0.01 0.577 0 99.23 – –PRP 19164 99.76 99.35 99.56 0.01 0.647 11 0.00 99.56 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 93.99 91.21 92.58 0.00 0.950 516 83.23 81.78 92.58 0.25 0.779RBR 1905 75.38 65.72 70.22 -0.36 0.368 7 0.00 0.00 70.22 – –RBS 486 72.00 70.37 71.18 -9.39 0.381 2 0.00 71.18 – –RP 2879 79.08 76.17 77.60 0.72 0.165 0 77.60 – –SYM 59 80.95 86.44 83.61 2.38 0.178 1 0.00 83.61 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.36 95.32 95.34 -0.09 0.204 570 82.79 75.09 95.34 -1.20 0.159VBD 32941 95.57 94.56 95.06 -0.00 0.977 426 81.85 62.44 95.06 -1.95 0.301VBG 
16321 91.75 92.24 91.99 0.22 0.040 924 71.42 92.75 91.99 1.88 0.033VBN 22177 87.00 90.55 88.74 -0.11 0.327 716 66.40 81.70 88.74 -1.00 0.524VBP 13819 93.61 92.29 92.94 -0.03 0.683 131 57.26 51.15 92.94 -2.58 0.706VBZ 23816 97.74 96.45 97.09 0.07 0.232 467 86.21 53.53 97.09 -1.05 0.677WDT 4745 96.57 95.47 96.02 -0.08 0.092 4 0.00 96.02 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.51 -0.61 0.112TOKENS 1044667 96.828 -0.01 0.145 24622 81.74 -0.25 0.033

B.44 rp(cl–c) Mapping

(Pos = RP & Wd ∈ {back, down, in, off, on, out, over, up}) ⇒ (Pos ← RP–0)

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.74 99.84 -0.01 0.374 0 99.84 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –CC 26227 99.73 99.33 99.53 -0.00 0.178 1 0.00 99.53 – –CD 40132 99.27 99.57 99.42 -0.00 0.374 3413 99.35 98.51 99.42 – –DT 90066 99.46 99.28 99.37 0.00 0.699 2 0.00 99.37 – –EX 951 95.27 99.58 97.38 – – 0 97.38 – –FW 238 64.17 32.35 43.02 – – 67 0.00 0.00 43.02 – –IN 108456 97.82 98.91 98.36 -0.00 0.749 22 0.00 98.36 – –JJ 67085 92.36 92.06 92.21 -0.01 0.603 4581 79.64 82.06 92.21 -0.02 0.826JJR 3621 84.88 92.27 88.42 0.01 0.761 85 88.89 28.24 88.42 3.42 0.374JJS 2129 95.24 95.77 95.50 0.05 0.180 54 88.89 59.26 95.50 1.11 0.374MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.17 95.08 96.12 0.00 0.869 3934 81.11 67.67 96.12 0.05 0.559NNP 100926 94.70 97.95 96.30 0.00 0.380 6075 82.52 98.14 96.30 0.01 0.608NNPS 2917 54.77 73.26 62.68 0.10 0.395 234 53.85 11.97 62.68 -0.35 0.374NNS 65922 97.72 97.22 97.47 0.00 0.291 2353 87.25 90.44 97.47 0.03 0.574PDT 397 70.00 84.63 76.62 0.18 0.374 0 76.62 – –POS 9529 98.74 99.65 99.20 – – 0 99.20 – –PRP 19164 99.85 99.33 99.59 – – 11 0.00 0.00 99.59 – –PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –RB 33806 95.15 90.53 92.78 -0.02 0.254 516 90.13 81.40 92.78 -0.10 0.374RBR 1905 77.42 67.14 71.91 0.07 0.688 7 0.00 71.91 – –RBS 486 87.98 84.36 86.13 0.11 0.374 2 0.00 0.00 86.13 – –RP 2879 75.90 75.69 75.79 0.09 0.607 0 75.79 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.71 94.63 95.66 0.00 0.667 570 86.18 78.77 95.66 0.05 0.627VBD 32941 94.97 95.93 95.45 -0.01 0.681 426 80.64 71.36 95.45 0.62 0.138VBG 16321 92.36 94.22 93.28 0.00 0.921 924 81.96 91.45 
93.28 0.03 0.882VBN 22177 88.20 90.93 89.55 -0.01 0.622 716 78.88 73.04 89.55 0.41 0.390VBP 13819 94.01 92.13 93.06 -0.00 0.938 131 72.41 32.06 93.06 0.00 0.995VBZ 23816 98.83 96.42 97.61 0.00 0.608 467 89.85 64.45 97.61 -0.21 0.374WDT 4745 97.20 96.42 96.80 0.03 0.070 4 0.00 96.80 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.73 0.02 0.754TOKENS 1044667 96.852 0.00 0.994 24622 84.65 0.03 0.071

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.69 99.36 99.53 0.05 0.204 1 0.00 99.53 – –CD 40132 99.35 99.44 99.39 -0.01 0.605 3413 98.87 97.10 99.39 -0.10 0.577DT 90066 99.30 99.50 99.40 -0.00 0.675 2 0.00 99.40 – –EX 951 95.43 98.84 97.11 -0.16 0.407 0 97.11 – –FW 238 54.11 47.06 50.34 -2.70 0.407 67 0.00 0.00 50.34 – –IN 108456 98.19 98.69 98.44 0.01 0.519 22 0.00 0.00 98.44 – –JJ 67085 91.98 91.26 91.62 -0.04 0.390 4581 78.92 72.47 91.62 -0.60 0.251JJR 3621 84.90 90.56 87.64 0.10 0.682 85 84.21 18.82 87.64 – –JJS 2129 94.23 95.82 95.02 1.89 0.392 54 71.93 75.93 95.02 – –MD 10743 99.64 99.79 99.71 -0.00 0.641 5 0.00 0.00 99.71 – –NN 146173 96.24 96.02 96.13 0.00 0.838 3934 74.17 69.06 96.13 -0.36 0.777NNP 100926 96.67 97.30 96.98 -0.03 0.307 6075 83.83 93.71 96.98 -0.08 0.710NNPS 2917 63.14 66.37 64.72 -1.35 0.128 234 31.63 13.25 64.72 -10.13 0.204NNS 65922 97.53 97.98 97.75 0.03 0.446 2353 82.51 89.21 97.75 -0.05 0.877PDT 397 70.80 80.60 75.38 0.27 0.240 0 75.38 – –POS 9529 98.96 99.50 99.23 0.01 0.757 0 99.23 – –PRP 19164 99.73 99.38 99.56 0.01 0.178 11 0.00 99.56 – –PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –RB 33806 94.07 91.08 92.55 -0.03 0.556 516 83.40 82.75 92.55 0.95 0.377RBR 1905 75.75 66.09 70.59 0.17 0.777 7 50.00 14.29 70.59 – –RBS 486 87.39 81.28 84.22 7.22 0.410 2 0.00 84.22 – –RP 2879 78.99 75.62 77.27 0.29 0.187 0 0.00 77.27 – –SYM 59 79.03 83.05 80.99 -0.83 0.374 1 0.00 0.00 80.99 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 68.57 48.00 56.47 -2.08 0.374 17 25.00 5.88 56.47 – –VB 29021 95.30 95.34 95.32 -0.11 0.151 570 81.87 75.26 95.32 -1.61 0.155VBD 32941 95.72 94.43 95.07 0.01 0.843 426 81.65 62.68 
95.07 -1.85 0.403VBG 16321 91.20 92.49 91.84 0.05 0.568 924 69.63 92.53 91.84 0.32 0.603VBN 22177 86.90 90.56 88.69 -0.16 0.051 716 64.98 79.05 88.69 -3.62 0.008VBP 13819 93.69 92.23 92.95 -0.02 0.774 131 63.11 49.62 92.95 0.17 0.947VBZ 23816 97.69 96.53 97.11 0.08 0.073 467 85.29 55.89 97.11 1.16 0.654WDT 4745 96.66 95.85 96.25 0.17 0.235 4 0.00 96.25 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.73 -0.18 0.631TOKENS 1044667 96.840 -0.00 0.904 24622 81.65 -0.36 0.294

MAX  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.97 99.57 99.77 – – 0 99.77 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.74 99.33 99.54 -0.00 0.716 1 0.00 99.54 – –CD 40132 99.35 99.47 99.41 0.00 0.101 3413 99.26 98.80 99.41 0.06 0.178DT 90066 99.41 99.31 99.36 -0.00 0.194 2 0.00 99.36 – –EX 951 95.71 98.53 97.10 0.05 0.719 0 97.10 – –FW 238 73.68 29.41 42.04 1.75 0.203 67 100.00 5.97 42.04 – –IN 108456 97.72 98.71 98.21 -0.00 0.414 22 72.73 36.36 98.21 – –JJ 67085 91.82 92.82 92.31 0.00 0.806 4581 82.50 82.34 92.31 -0.02 0.826JJR 3621 84.90 90.20 87.47 0.01 0.931 85 80.85 44.71 87.47 -1.83 0.374JJS 2129 94.48 94.93 94.70 -0.07 0.533 54 86.05 68.52 94.70 -2.63 0.202MD 10743 99.74 99.57 99.66 -0.00 0.374 5 0.00 99.66 – –NN 146173 96.37 96.58 96.48 0.00 0.983 3934 79.49 78.11 96.48 -0.25 0.105NNP 100926 96.21 97.70 96.95 -0.01 0.243 6075 89.36 95.52 96.95 -0.02 0.710NNPS 2917 65.58 53.96 59.21 -0.38 0.172 234 57.62 37.18 59.21 1.29 0.587NNS 65922 97.64 98.49 98.07 -0.00 0.991 2353 90.12 92.69 98.07 0.08 0.630PDT 397 74.28 64.74 69.18 0.53 0.159 0 69.18 – –POS 9529 98.45 99.63 99.04 0.01 0.374 0 99.04 – –PRP 19164 99.85 99.15 99.50 0.00 0.374 11 0.00 99.50 – –PRP$ 9173 99.12 99.95 99.53 – – 1 0.00 99.53 – –RB 33806 93.75 90.91 92.30 -0.01 0.554 516 92.77 84.50 92.30 -0.20 0.179RBR 1905 75.44 62.57 68.41 0.06 0.870 7 0.00 0.00 68.41 – –RBS 486 84.68 79.63 82.08 -0.60 0.268 2 0.00 82.08 – –RP 2879 80.60 72.59 76.39 0.02 0.992 0 76.39 – –SYM 59 80.43 62.71 70.48 – – 1 100.00 100.00 70.48 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 84.85 28.00 42.11 2.92 0.374 17 0.00 42.11 – –VB 29021 97.49 95.10 96.28 0.01 0.402 570 90.93 86.14 96.28 0.07 0.727VBD 32941 96.03 96.60 96.32 -0.01 0.569 426 84.18 81.22 96.32 0.41 0.076VBG 
16321 94.17 91.99 93.06 -0.02 0.262 924 85.63 90.26 93.06 0.13 0.127VBN 22177 91.10 89.55 90.32 -0.02 0.552 716 81.00 79.19 90.32 -0.21 0.369VBP 13819 94.61 93.65 94.13 -0.03 0.205 131 73.21 62.60 94.13 -1.20 0.310VBZ 23816 98.86 96.74 97.79 0.00 0.766 467 89.22 77.94 97.79 -0.18 0.858WDT 4745 96.34 95.91 96.12 -0.01 0.667 4 0.00 96.12 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 53.68 -0.08 0.590TOKENS 1044667 97.053 -0.00 0.633 24622 87.30 -0.04 0.158

B.45

(Pos = RP & Wd ∈ {across, along, back, ...<4 omitted>..., out, over, up}) ⇒ (Pos ← RP–0)

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.57 99.39 99.48 0.00 0.807 1 0.00 99.48 – –CD 40132 99.29 99.45 99.37 -0.03 0.104 3413 98.28 97.07 99.37 -0.41 0.127DT 90066 99.32 99.49 99.40 0.00 0.785 2 0.00 99.40 – –EX 951 95.53 98.95 97.21 -0.05 0.558 0 97.21 – –FW 238 55.02 48.32 51.45 -0.54 0.779 67 100.00 2.99 51.45 – –IN 108456 98.17 98.68 98.42 -0.00 0.747 22 0.00 0.00 98.42 – –JJ 67085 91.91 91.29 91.60 -0.07 0.142 4581 78.34 71.93 91.60 -1.34 0.015JJR 3621 84.83 90.94 87.78 0.26 0.333 85 69.23 21.18 87.78 -6.97 0.958JJS 2129 94.32 92.06 93.18 -0.09 0.412 54 70.69 75.93 93.18 -0.89 0.374MD 10743 99.70 99.75 99.73 0.01 0.471 5 0.00 99.73 – –NN 146173 96.24 96.03 96.14 0.01 0.779 3934 73.89 69.62 96.14 -0.13 0.829NNP 100926 96.64 97.35 96.99 -0.02 0.069 6075 83.58 94.17 96.99 -0.01 0.927NNPS 2917 63.68 66.64 65.13 -0.73 0.448 234 39.00 16.67 65.13 12.39 0.675NNS 65922 97.59 97.89 97.74 0.01 0.690 2353 83.64 87.55 97.74 -0.26 0.438PDT 397 70.73 80.35 75.24 0.08 0.823 0 75.24 – –POS 9529 98.94 99.55 99.24 0.02 0.288 0 99.24 – –PRP 19164 99.73 99.36 99.55 -0.00 0.374 11 0.00 0.00 99.55 – –PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –RB 33806 94.13 90.97 92.52 -0.06 0.401 516 85.66 82.17 92.52 1.92 0.032RBR 1905 76.22 66.46 71.00 0.75 0.264 7 0.00 0.00 71.00 – –RBS 486 74.49 82.92 78.48 -0.09 0.712 2 0.00 78.48 – –RP 2879 78.77 75.41 77.05 0.01 0.962 0 77.05 – –SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.53 95.41 95.47 0.04 0.364 570 83.46 76.14 95.47 -0.10 0.811VBD 32941 95.69 94.39 95.04 -0.03 0.267 426 82.28 64.32 
95.04 -0.07 0.957VBG 16321 91.70 92.18 91.94 0.16 0.192 924 71.43 91.45 91.94 1.26 0.052VBN 22177 86.74 90.74 88.70 -0.16 0.022 716 65.70 82.12 88.70 -1.36 0.051VBP 13819 93.46 92.34 92.90 -0.08 0.313 131 63.55 51.91 92.90 3.03 0.659VBZ 23816 97.68 96.52 97.10 0.07 0.271 467 89.26 56.96 97.10 4.18 0.263WDT 4745 96.58 95.70 96.14 0.05 0.641 4 0.00 96.14 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.64 -0.34 0.333TOKENS 1044667 96.835 -0.01 0.536 24622 81.75 -0.24 0.385

B.46 in(cl–c,s) Mapping

(Pos = IN & Wd ∈ {@, a, and, ...<12 omitted>..., v., vs., which}) ⇒ (Pos ← IN–0)
(Pos = IN & Wd = than) ⇒ (Pos ← IN–1)
(Pos = IN & Wd ∈ {’til, ago, albeit, ...<13 omitted>..., whereas, whether, while}) ⇒ (Pos ← IN–2)

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.96 99.94 99.95 -0.01 0.374 0 99.95 – –CC 26227 99.74 99.37 99.55 0.02 0.137 1 0.00 99.55 – –CD 40132 99.27 99.57 99.42 -0.00 0.651 3413 99.35 98.48 99.42 -0.02 0.374DT 90066 99.44 99.36 99.40 0.02 0.025 2 0.00 99.40 – –EX 951 94.98 99.58 97.23 -0.15 0.442 0 97.23 – –FW 238 62.79 34.03 44.14 2.62 0.381 67 0.00 0.00 44.14 – –IN 108456 97.93 98.92 98.43 0.06 0.003 22 0.00 98.43 – –JJ 67085 92.37 92.08 92.23 0.01 0.479 4581 79.53 82.12 92.23 -0.04 0.577JJR 3621 84.73 92.24 88.32 -0.10 0.450 85 88.89 28.24 88.32 3.42 0.374JJS 2129 95.10 95.82 95.46 0.00 0.856 54 84.62 61.11 95.46 0.91 0.594MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –NN 146173 97.18 95.09 96.12 0.01 0.357 3934 81.30 67.64 96.12 0.13 0.468NNP 100926 94.69 97.96 96.30 0.00 0.752 6075 82.50 98.11 96.30 -0.02 0.413NNPS 2917 54.76 73.12 62.62 0.01 0.967 234 54.00 11.54 62.62 -3.23 0.374NNS 65922 97.73 97.21 97.47 0.00 0.428 2353 87.23 90.31 97.47 -0.05 0.558PDT 397 69.87 84.13 76.34 -0.18 0.374 0 76.34 – –POS 9529 98.76 99.65 99.21 0.01 0.178 0 99.21 – –PRP 19164 99.85 99.33 99.59 -0.00 0.374 11 0.00 0.00 99.59 – –PRP$ 9173 99.38 99.96 99.67 -0.01 0.374 1 0.00 99.67 – –RB 33806 95.31 90.50 92.84 0.05 0.019 516 90.11 81.20 92.84 -0.24 0.181RBR 1905 77.39 66.82 71.72 -0.20 0.603 7 0.00 71.72 – –RBS 486 87.96 84.16 86.01 -0.03 0.374 2 0.00 0.00 86.01 – –RP 2879 75.42 76.00 75.71 -0.02 0.934 0 75.71 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.71 94.64 95.66 0.01 0.498 570 85.74 79.12 95.66 0.04 0.914VBD 32941 95.03 95.91 95.47 0.01 0.196 426 81.67 71.13 95.47 1.04 0.132VBG 16321 
92.38 94.22 93.29 0.01 0.586 924 81.91 91.13 93.29 -0.17 0.406VBN 22177 88.21 90.95 89.56 0.01 0.894 716 78.74 73.46 89.56 0.62 0.306VBP 13819 93.98 92.16 93.06 0.00 0.936 131 71.93 31.30 93.06 -1.86 0.605VBZ 23816 98.82 96.45 97.62 0.02 0.097 467 89.38 64.88 97.62 -0.04 0.829WDT 4745 96.81 96.69 96.75 -0.02 0.832 4 0.00 96.75 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.90 0.37 0.096TOKENS 1044667 96.865 0.01 0.024 24622 84.64 0.01 0.774

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.63 99.34 99.49 0.01 0.687 1 0.00 99.49 – –CD 40132 99.34 99.42 99.38 -0.03 0.357 3413 98.74 96.78 99.38 -0.33 0.200DT 90066 99.31 99.47 99.39 -0.01 0.556 2 0.00 99.39 – –EX 951 95.52 98.63 97.05 -0.22 0.626 0 0.00 97.05 – –FW 238 60.48 42.44 49.88 -3.59 0.282 67 0.00 49.88 – –IN 108456 98.18 98.71 98.44 0.02 0.153 22 16.67 4.55 98.44 – –JJ 67085 92.05 91.22 91.63 -0.03 0.564 4581 79.30 72.23 91.63 -0.55 0.499JJR 3621 84.60 90.89 87.63 0.09 0.634 85 68.42 30.59 87.63 21.27 0.387JJS 2129 94.29 88.45 91.27 -2.13 0.389 54 71.93 75.93 91.27 – –MD 10743 99.67 99.81 99.74 0.02 0.141 5 0.00 99.74 – –NN 146173 96.19 96.06 96.12 -0.01 0.826 3934 74.24 69.29 96.12 -0.15 0.805NNP 100926 96.66 97.36 97.01 -0.01 0.595 6075 83.87 94.09 97.01 0.14 0.280NNPS 2917 64.15 66.20 65.16 -0.67 0.382 234 31.25 8.55 65.16 -35.40 0.278NNS 65922 97.51 97.93 97.72 -0.01 0.724 2353 81.71 89.59 97.72 -0.35 0.640PDT 397 70.44 79.85 74.85 -0.43 0.431 0 74.85 – –POS 9529 98.96 99.54 99.25 0.03 0.137 0 99.25 – –PRP 19164 99.74 99.37 99.56 0.01 0.180 11 0.00 99.56 – –PRP$ 9173 99.39 99.91 99.65 0.01 0.178 1 0.00 99.65 – –RB 33806 94.11 90.92 92.49 -0.09 0.110 516 84.19 79.46 92.49 -0.66 0.472RBR 1905 75.74 65.88 70.47 -0.01 0.991 7 20.00 14.29 70.47 – –RBS 486 65.06 83.54 73.15 -6.87 0.381 2 0.00 73.15 – –RP 2879 78.85 75.89 77.35 0.39 0.198 0 77.35 – –SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –VB 29021 95.37 95.39 95.38 -0.05 0.480 570 83.94 77.02 95.38 0.78 0.762VBD 32941 95.57 94.69 95.13 0.07 0.123 426 78.47 65.02 95.13 -1.57 0.427VBG 16321 
91.35 92.22 91.78 -0.02 0.444 924 69.84 92.21 91.78 0.34 0.675VBN 22177 87.15 90.56 88.82 -0.02 0.847 716 67.13 81.56 88.82 -0.49 0.677VBP 13819 93.66 92.18 92.91 -0.06 0.626 131 57.39 50.38 92.91 -3.25 0.607VBZ 23816 97.69 96.54 97.11 0.09 0.147 467 87.25 55.67 97.11 1.83 0.347WDT 4745 96.30 96.04 96.17 0.08 0.468 4 0.00 96.17 – –WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.52 -0.59 0.149TOKENS 1044667 96.831 -0.01 0.411 24622 81.79 -0.18 0.444

MAX  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.97 99.57 99.77 – – 0 99.77 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.74 99.36 99.55 0.01 0.034 1 0.00 99.55 – –CD 40132 99.34 99.49 99.41 0.01 0.157 3413 99.21 98.86 99.41 0.06 0.284DT 90066 99.41 99.31 99.36 -0.01 0.207 2 0.00 99.36 – –EX 951 96.10 98.42 97.25 0.20 0.239 0 97.25 – –FW 238 71.84 31.09 43.40 5.04 0.046 67 80.00 5.97 43.40 – –IN 108456 97.80 98.68 98.24 0.02 0.131 22 71.43 22.73 98.24 – –JJ 67085 91.85 92.84 92.34 0.04 0.287 4581 82.37 82.54 92.34 0.02 0.718JJR 3621 84.78 90.17 87.39 -0.08 0.133 85 80.85 44.71 87.39 -1.83 0.374JJS 2129 94.39 94.83 94.61 -0.17 0.280 54 84.44 70.37 94.61 -2.02 0.208MD 10743 99.73 99.58 99.66 -0.00 0.378 5 0.00 99.66 – –NN 146173 96.38 96.58 96.48 0.00 0.830 3934 79.53 78.01 96.48 -0.29 0.183NNP 100926 96.23 97.70 96.96 -0.00 0.912 6075 89.44 95.47 96.96 0.00 0.906NNPS 2917 65.47 54.20 59.30 -0.22 0.521 234 56.08 35.47 59.30 -2.61 0.271NNS 65922 97.66 98.52 98.09 0.02 0.076 2353 90.15 92.99 98.09 0.25 0.170PDT 397 73.79 65.24 69.25 0.63 0.247 0 69.25 – –POS 9529 98.45 99.62 99.04 – – 0 99.04 – –PRP 19164 99.86 99.16 99.51 0.01 0.071 11 0.00 99.51 – –PRP$ 9173 99.13 99.95 99.54 0.01 0.374 1 0.00 99.54 – –RB 33806 93.85 91.03 92.42 0.11 0.015 516 92.75 84.30 92.42 -0.33 0.159RBR 1905 75.30 62.57 68.35 -0.03 0.936 7 0.00 0.00 68.35 – –RBS 486 84.31 79.63 81.90 -0.81 0.325 2 0.00 81.90 – –RP 2879 79.75 73.19 76.33 -0.06 0.724 0 76.33 – –SYM 59 80.85 64.41 71.70 1.73 0.374 1 100.00 100.00 71.70 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 84.85 28.00 42.11 2.92 0.374 17 0.00 42.11 – –VB 29021 97.47 95.08 96.26 -0.01 0.398 570 90.93 86.14 96.26 0.07 0.919VBD 32941 96.05 96.62 96.34 0.01 0.533 426 84.31 80.75 96.34 0.19 
0.691VBG 16321 94.16 92.01 93.07 -0.01 0.543 924 85.54 90.26 93.07 0.08 0.500VBN 22177 91.14 89.58 90.35 0.02 0.572 716 81.44 79.05 90.35 -0.03 0.876VBP 13819 94.65 93.80 94.22 0.07 0.065 131 73.45 63.36 94.22 -0.41 0.565VBZ 23816 98.86 96.77 97.80 0.02 0.233 467 90.39 78.59 97.80 0.87 0.446WDT 4745 95.85 95.89 95.87 -0.28 0.038 4 0.00 95.87 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 53.78 0.10 0.578TOKENS 1044667 97.065 0.01 0.248 24622 87.32 -0.01 0.797

B.47

(Pos = PRP & Wd ∈ {I, he, she, they, we}) ⇒ (Pos ← PRP–N)
(Pos = PRP & Wd ∈ {her, him, me, them, us}) ⇒ (Pos ← PRP–A)
(Pos = PRP & Wd ∈ {herself, himself, itself, myself, ourselves, themselves, yourself, yourselves}) ⇒ (Pos ← PRP–RX)

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.90 99.92 99.91 – – 0 99.91 – –, 53640 100.00 99.99 99.99 – – 0 99.99 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 100.00 99.94 99.97 – – 0 99.97 – –CC 26227 99.57 99.38 99.47 -0.00 0.587 1 0.00 99.47 – –CD 40132 99.35 99.47 99.41 0.00 0.908 3413 98.87 97.19 99.41 -0.06 0.786DT 90066 99.31 99.48 99.40 -0.01 0.709 2 0.00 99.40 – –EX 951 95.53 98.95 97.21 -0.05 0.374 0 97.21 – –FW 238 55.78 46.64 50.80 -1.80 0.458 67 100.00 1.49 50.80 – –IN 108456 98.16 98.67 98.42 -0.01 0.522 22 0.00 0.00 98.42 – –JJ 67085 92.01 91.15 91.57 -0.09 0.274 4581 79.85 71.38 91.57 -0.83 0.064JJR 3621 84.81 91.11 87.84 0.34 0.047 85 66.67 30.59 87.84 20.29 0.412JJS 2129 91.66 92.34 92.00 -1.35 0.375 54 71.93 75.93 92.00 – –MD 10743 99.69 99.75 99.72 0.00 0.612 5 0.00 99.72 – –NN 146173 96.14 96.07 96.11 -0.03 0.459 3934 73.75 69.78 96.11 -0.11 0.807NNP 100926 96.63 97.31 96.97 -0.05 0.255 6075 83.29 94.12 96.97 -0.22 0.312NNPS 2917 63.35 65.55 64.43 -1.78 0.033 234 25.64 8.55 64.43 -38.30 0.059NNS 65922 97.50 97.98 97.74 0.01 0.637 2353 82.13 88.48 97.74 -0.68 0.351PDT 397 71.23 78.59 74.73 -0.59 0.290 0 74.73 – –POS 9529 98.94 99.44 99.19 -0.03 0.185 0 99.19 – –PRP 19164 99.72 99.37 99.55 -0.01 0.591 11 0.00 0.00 99.55 – –PRP$ 9173 99.40 99.88 99.64 -0.00 0.994 1 0.00 99.64 – –RB 33806 94.09 90.81 92.42 -0.17 0.123 516 84.87 80.43 92.42 0.36 0.780RBR 1905 76.12 66.09 70.75 0.39 0.032 7 0.00 0.00 70.75 – –RBS 486 72.15 70.37 71.25 -9.29 0.380 2 0.00 71.25 – –RP 2879 78.62 75.76 77.16 0.15 0.348 0 77.16 – –SYM 59 77.94 89.83 83.46 2.20 0.482 1 100.00 100.00 83.46 – –TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –UH 100 74.19 46.00 56.79 -1.52 0.374 17 0.00 56.79 – –VB 29021 95.41 95.44 95.43 0.00 0.925 570 82.33 76.84 95.43 -0.27 0.730VBD 32941 95.62 94.62 95.11 0.06 0.110 426 84.23 
62.68 95.11 -0.53 0.831VBG 16321 91.45 92.08 91.76 -0.03 0.567 924 69.85 91.77 91.76 0.15 0.809VBN 22177 87.08 90.45 88.73 -0.12 0.155 716 66.40 81.70 88.73 -1.00 0.616VBP 13819 93.56 92.26 92.90 -0.07 0.514 131 60.53 52.67 92.90 1.56 0.582VBZ 23816 97.75 96.53 97.13 0.11 0.047 467 86.93 56.96 97.13 3.10 0.340WDT 4745 96.50 95.79 96.14 0.05 0.731 4 0.00 96.14 – –WP 2604 99.08 99.62 99.35 – – 0 99.35 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 51.33 -0.94 0.117TOKENS 1044667 96.817 -0.03 0.203 24622 81.67 -0.33 0.371

B.48

(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
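Unlike the word-list mappings, these two rules condition on the parse tree around the token. The sketch below shows one way such rules could be evaluated. It assumes that SibR means the label of the token's immediate right sibling and that Par means the label of its parent node. The nested-tuple tree encoding and the function name `refine_in` are hypothetical, and the ASCII hyphen in `IN-SUB` stands in for the dash used in the rules.

```python
def is_leaf(node):
    """A leaf is a (pos_tag, word) pair; internal nodes are (label, child, ...)."""
    return len(node) == 2 and isinstance(node[1], str)


def refine_in(tree):
    """Return [(word, tag)] with IN refined to IN-SUB.

    IN becomes IN-SUB when its right sibling is labelled S or its parent
    is labelled SBAR (assumed reading of SibR and Par).
    """
    tagged = []

    def walk(node):
        label, children = node[0], node[1:]
        for i, child in enumerate(children):
            if not is_leaf(child):
                walk(child)
                continue
            pos, word = child
            # Label of the node immediately to the right, if there is one.
            right = children[i + 1][0] if i + 1 < len(children) else None
            if pos == "IN" and (right == "S" or label == "SBAR"):
                pos = "IN-SUB"
            tagged.append((word, pos))

    walk(tree)
    return tagged


if __name__ == "__main__":
    tree = ("S",
            ("NP", ("PRP", "He")),
            ("VP", ("VBD", "left"),
             ("SBAR", ("IN", "because"),
              ("S", ("NP", ("PRP", "it")), ("VP", ("VBD", "rained")))),
             ("PP", ("IN", "in"), ("NP", ("NN", "haste")))))
    print(refine_in(tree))
```

In the example, "because" is relabelled since it sits under SBAR with an S to its right, while the preposition "in" under PP keeps the tag IN.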

SVM  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –” 7620 99.95 99.75 99.85 – – 0 99.85 – –, 53640 100.00 99.99 100.00 – – 0 100.00 – –-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –. 43373 100.00 100.00 100.00 – – 0 100.00 – –: 5335 99.98 99.94 99.96 – – 0 99.96 – –CC 26227 99.69 99.37 99.53 -0.01 0.676 1 0.00 99.53 – –CD 40132 99.27 99.57 99.42 0.00 0.997 3413 99.35 98.51 99.42 – –DT 90066 99.42 99.33 99.38 0.00 0.835 2 0.00 99.38 – –EX 951 94.98 99.58 97.23 -0.15 0.071 0 97.23 – –FW 238 62.10 32.35 42.54 -1.10 0.787 67 0.00 0.00 42.54 – –IN 108456 97.96 98.80 98.38 0.02 0.223 22 0.00 98.38 – –JJ 67085 92.37 92.06 92.21 -0.00 0.641 4581 79.80 82.10 92.21 0.11 0.308JJR 3621 84.75 92.24 88.34 -0.08 0.468 85 88.46 27.06 88.34 0.00 0.744JJS 2129 95.15 95.77 95.46 0.00 0.944 54 84.21 59.26 95.46 -1.09 0.374MD 10743 99.68 99.79 99.73 -0.00 0.374 5 0.00 99.73 – –NN 146173 97.17 95.09 96.11 -0.00 0.985 3934 81.05 67.74 96.11 0.08 0.549NNP 100926 94.68 97.96 96.29 -0.00 0.649 6075 82.48 98.14 96.29 -0.01 0.456NNPS 2917 54.78 73.09 62.62 0.01 0.980 234 53.85 11.97 62.62 -0.35 0.374NNS 65922 97.72 97.23 97.47 0.00 0.555 2353 87.36 90.48 97.47 0.11 0.032PDT 397 70.15 84.63 76.71 0.30 0.199 0 76.71 – –POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –PRP 19164 99.85 99.33 99.59 0.00 0.983 11 0.00 0.00 99.59 – –PRP$ 9173 99.39 99.96 99.67 0.00 0.987 1 0.00 99.67 – –RB 33806 95.10 90.63 92.81 0.01 0.773 516 90.32 81.40 92.81 – –RBR 1905 77.50 66.72 71.71 -0.22 0.453 7 0.00 71.71 – –RBS 486 87.98 84.36 86.13 0.11 0.374 2 0.00 0.00 86.13 – –RP 2879 75.05 75.65 75.35 -0.49 0.145 0 75.35 – –SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –TO 24551 99.99 100.00 99.99 – – 0 99.99 – –UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –VB 29021 96.72 94.61 95.66 0.00 0.885 570 86.51 78.77 95.66 0.24 0.363VBD 32941 94.98 95.95 95.46 0.00 0.956 426 80.11 70.89 95.46 -0.04 0.988VBG 16321 92.35 94.15 93.24 
-0.04 0.184 924 82.20 91.45 93.24 0.19 0.402VBN 22177 88.21 90.89 89.53 -0.03 0.267 716 78.73 72.91 89.53 0.22 0.707VBP 13819 93.95 92.22 93.08 0.02 0.415 131 72.88 32.82 93.08 1.84 0.821VBZ 23816 98.84 96.39 97.60 -0.01 0.749 467 89.68 65.10 97.60 0.29 0.788WDT 4745 96.39 96.88 96.64 -0.14 0.195 4 0.00 96.64 – –WP 2604 98.90 100.00 99.45 – – 0 99.45 – –WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –SENT 43766 50.74 0.04 0.668TOKENS 1044667 96.851 -0.00 0.165 24622 84.67 0.06 0.112

B.49

(Pos = NNP & Par = NP-TMP) ⇒ (Pos ← NNP–TMP)

TBL  True  Prec  Rec  F(0.5)  Fchg  p(F)  TrueU  PrecU  RecU  F(0.5)U  FchgU  p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.62 99.38 99.50 0.02 0.199 1 0.00 0.00 99.50 – –
CD 40132 99.32 99.46 99.39 -0.02 0.559 3413 98.37 97.28 99.39 -0.26 0.348
DT 90066 99.33 99.48 99.40 0.00 0.811 2 0.00 99.40 – –
EX 951 95.53 98.95 97.21 -0.05 0.583 0 97.21 – –
FW 238 53.02 47.90 50.33 -2.71 0.274 67 0.00 0.00 50.33 – –
IN 108456 98.17 98.70 98.44 0.01 0.324 22 16.67 4.55 98.44 – –
JJ 67085 91.94 91.21 91.57 -0.09 0.167 4581 77.75 72.32 91.57 -1.42 0.059
JJR 3621 84.99 90.72 87.76 0.25 0.241 85 76.19 18.82 87.76 – –
JJS 2129 88.15 92.63 90.33 -3.13 0.505 54 72.73 74.07 90.33 -0.65 0.517
MD 10743 99.65 99.80 99.72 0.00 0.809 5 0.00 0.00 99.72 – –
NN 146173 96.27 95.96 96.11 -0.02 0.387 3934 75.02 68.48 96.11 -0.26 0.650
NNP 100926 96.68 97.30 96.99 -0.03 0.180 6075 83.91 93.51 96.99 -0.13 0.645
NNPS 2917 63.51 66.23 64.84 -1.16 0.312 234 33.63 16.24 64.84 5.40 0.806
NNS 65922 97.50 97.94 97.72 -0.01 0.809 2353 82.01 88.91 97.72 -0.52 0.348
PDT 397 70.92 79.85 75.12 -0.08 0.628 0 75.12 – –
POS 9529 98.91 99.50 99.20 -0.02 0.568 0 99.20 – –
PRP 19164 99.73 99.37 99.55 -0.00 0.989 11 0.00 99.55 – –
PRP$ 9173 99.38 99.91 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.06 91.06 92.54 -0.04 0.512 516 83.14 83.14 92.54 1.03 0.526
RBR 1905 76.17 66.46 70.98 0.73 0.434 7 0.00 0.00 70.98 – –
RBS 486 67.37 52.67 59.12 -24.73 0.188 2 0.00 59.12 – –
RP 2879 78.74 75.89 77.29 0.32 0.124 0 77.29 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.22 95.40 95.31 -0.12 0.143 570 81.26 74.56 95.31 -2.44 0.172
VBD 32941 95.64 94.62 95.13 0.07 0.124 426 83.54 64.32 95.13 0.59 0.450
VBG 16321 91.32 92.42 91.87 0.08 0.385 924 70.50 92.32 91.87 0.93 0.351
VBN 22177 87.08 90.72 88.86 0.03 0.717 716 65.38 80.73 88.86 -2.37 0.294
VBP 13819 93.33 92.29 92.80 -0.18 0.076 131 57.80 48.09 92.80 -5.34 0.494
VBZ 23816 97.78 96.56 97.16 0.14 0.038 467 90.33 58.03 97.16 5.86 0.136
WDT 4745 96.67 95.93 96.30 0.22 0.091 4 0.00 96.30 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.49 -0.65 0.273
TOKENS 1044667 96.820 -0.02 0.327 24622 81.59 -0.44 0.099
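The F(0.5) column in these tables appears to be the weighted harmonic mean of precision and recall with weight 0.5 on each, which is the balanced F-score. This reading is an inference from the numbers rather than something stated in this appendix: the FW row (Prec 53.02, Rec 47.90, F 50.33) and the NNPS row (63.51, 66.23, 64.84) of the table above both fit it. A minimal sketch, in which the function name `f_measure` and the parameter name `alpha` are illustrative:

```python
def f_measure(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    alpha is the weight on precision; alpha = 0.5 gives the balanced
    F-score 2PR / (P + R). Inputs and output use the same scale
    (percentages here, as in the tables).
    """
    if precision == 0.0 or recall == 0.0:
        # The harmonic mean is taken to be zero when either term is zero.
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Values taken from the FW and NNPS rows of the table above.
fw = f_measure(53.02, 47.90)
nnps = f_measure(63.51, 66.23)
print(f"FW: {fw:.2f}  NNPS: {nnps:.2f}")
```

Rows that report 100.00 for both precision and recall return 100.00 under the same formula, which matches the punctuation tags in the tables.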

B.50

(Pos = VB & Wd ∈ {do, help, let, make}) ⇒ (Pos ← VB–I)
(Pos = VBG & Wd ∈ {doing, helping, letting, making}) ⇒ (Pos ← VBG–I)
(Pos = VBD & Wd ∈ {did, helped, let, made}) ⇒ (Pos ← VBD–I)
(Pos = VBN & Wd ∈ {done, helped, let, made}) ⇒ (Pos ← VBN–I)
(Pos = VBP & Wd ∈ {do, help, let, make}) ⇒ (Pos ← VBP–I)
(Pos = VBZ & Wd ∈ {does, helps, lets, makes}) ⇒ (Pos ← VBZ–I)
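Each rule in this mapping rewrites a token's tag when its original tag and its word form both match. The sketch below shows one way such rules could be applied to a list of (word, tag) pairs. It is an illustration only, not the thesis's implementation: the names `RULES` and `remap_tags` are hypothetical, ASCII hyphens stand in for the dash in the new tag names, and lower-casing the word before lookup is an assumption.

```python
from typing import Dict, FrozenSet, List, Tuple

# Rules transcribed from mapping B.50: original tag -> (trigger words, new tag).
# ASCII hyphens replace the dash printed in the tag names (an assumption).
RULES: Dict[str, Tuple[FrozenSet[str], str]] = {
    "VB": (frozenset({"do", "help", "let", "make"}), "VB-I"),
    "VBG": (frozenset({"doing", "helping", "letting", "making"}), "VBG-I"),
    "VBD": (frozenset({"did", "helped", "let", "made"}), "VBD-I"),
    "VBN": (frozenset({"done", "helped", "let", "made"}), "VBN-I"),
    "VBP": (frozenset({"do", "help", "let", "make"}), "VBP-I"),
    "VBZ": (frozenset({"does", "helps", "lets", "makes"}), "VBZ-I"),
}


def remap_tags(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return a copy of the tagged tokens with the B.50 rules applied."""
    remapped = []
    for word, tag in tagged:
        rule = RULES.get(tag)
        # Lower-casing before lookup is an assumption; the rules do not say
        # whether matching is case-sensitive.
        if rule is not None and word.lower() in rule[0]:
            tag = rule[1]
        remapped.append((word, tag))
    return remapped


sentence = [("They", "PRP"), ("make", "VBP"), ("him", "PRP"), ("run", "VB")]
print(remap_tags(sentence))
```

Only tokens whose tag and word both match a rule are changed, so the mapping refines the tagset without altering any other training label.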

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.75 99.85 – – 0 99.85 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.98 99.94 99.96 – – 0 99.96 – –
CC 26227 99.73 99.33 99.53 -0.00 0.374 1 0.00 99.53 – –
CD 40132 99.27 99.56 99.42 -0.00 0.178 3413 99.35 98.48 99.42 -0.02 0.374
DT 90066 99.47 99.28 99.37 -0.00 0.999 2 0.00 99.37 – –
EX 951 95.18 99.58 97.33 -0.05 0.374 0 97.33 – –
FW 238 64.17 32.35 43.02 – – 67 0.00 0.00 43.02 – –
IN 108456 97.83 98.91 98.37 0.00 0.912 22 0.00 98.37 – –
JJ 67085 92.37 92.04 92.20 -0.01 0.027 4581 79.64 82.03 92.20 -0.03 0.527
JJR 3621 84.78 92.29 88.38 -0.04 0.076 85 88.46 27.06 88.38 – –
JJS 2129 95.14 95.73 95.43 -0.02 0.374 54 86.49 59.26 95.43 – –
MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –
NN 146173 97.16 95.08 96.11 -0.00 0.222 3934 81.06 67.67 96.11 0.02 0.701
NNP 100926 94.67 97.96 96.29 -0.01 0.162 6075 82.51 98.12 96.29 -0.01 0.374
NNPS 2917 54.64 73.26 62.60 -0.04 0.436 234 52.94 11.54 62.60 -3.57 0.322
NNS 65922 97.72 97.19 97.45 -0.01 0.114 2353 87.28 90.44 97.45 0.05 0.674
PDT 397 69.94 84.38 76.48 – – 0 76.48 – –
POS 9529 98.76 99.65 99.21 0.01 0.178 0 99.21 – –
PRP 19164 99.85 99.33 99.59 – – 11 0.00 0.00 99.59 – –
PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –
RB 33806 95.15 90.54 92.79 -0.01 0.150 516 90.32 81.40 92.79 – –
RBR 1905 77.48 66.82 71.76 -0.14 0.071 7 0.00 71.76 – –
RBS 486 87.77 84.16 85.92 -0.14 0.374 2 0.00 0.00 85.92 – –
RP 2879 75.75 75.72 75.73 0.02 0.517 0 75.73 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –
VB 29021 96.71 94.59 95.64 -0.02 0.247 570 85.88 78.95 95.64 – –
VBD 32941 94.93 95.95 95.44 -0.02 0.176 426 80.53 70.89 95.44 0.21 0.199
VBG 16321 92.34 94.19 93.26 -0.03 0.199 924 82.02 91.34 93.26 0.02 0.944
VBN 22177 88.23 90.86 89.52 -0.04 0.067 716 78.53 73.04 89.52 0.19 0.374
VBP 13819 93.91 92.11 93.00 -0.06 0.051 131 72.41 32.06 93.00 – –
VBZ 23816 98.80 96.38 97.58 -0.03 0.132 467 89.29 64.24 97.58 -0.66 0.245
WDT 4745 97.15 96.44 96.80 0.02 0.731 4 0.00 96.80 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.58 -0.27 0.030
TOKENS 1044667 96.845 -0.01 0.030 24622 84.62 -0.01 0.374

B.51

(Pos = DT & Wd ∈ {a, another, each, every, little, many, much}) ⇒ (Pos ← DT–1)
(Pos = DT & Wd ∈ {these, those}) ⇒ (Pos ← DT–P)
(Pos = NNS & Wd ∈ {acrobatics, adenoids, alms,...<66 omitted>...,tweezers, vicissitudes, waterworks}) ⇒ (Pos ← NNS–P)
(Pos = NN & Wd ∈ {abaci, aback, abaft,...<32532 omitted>...,zydeco, zygotic, zymurgy}) ⇒ (Pos ← NN–M)
(Pos = JJ & Wd ∈ {countless, few, many, numerous, several}) ⇒ (Pos ← JJ–P)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.86 99.86 -0.05 0.080 0 99.86 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.63 99.38 99.51 0.03 0.380 1 0.00 99.51 – –
CD 40132 99.35 99.44 99.39 -0.02 0.428 3413 98.84 96.95 99.39 -0.20 0.294
DT 90066 99.35 99.43 99.39 -0.01 0.260 2 0.00 99.39 – –
EX 951 95.53 98.84 97.16 -0.11 0.447 0 97.16 – –
FW 238 53.47 45.38 49.09 -5.11 0.240 67 22.22 2.99 49.09 – –
IN 108456 98.11 98.70 98.41 -0.02 0.460 22 0.00 0.00 98.41 – –
JJ 67085 91.64 91.32 91.48 -0.20 0.019 4581 77.80 74.37 91.48 0.05 0.955
JJR 3621 84.79 90.67 87.63 0.09 0.609 85 68.75 25.88 87.63 – –
JJS 2129 94.27 91.97 93.11 -0.16 0.990 54 73.21 75.93 93.11 0.91 0.374
MD 10743 99.67 99.79 99.73 0.01 0.210 5 0.00 99.73 – –
NN 146173 96.26 95.79 96.02 -0.11 0.014 3934 75.56 66.47 96.02 -1.48 0.143
NNP 100926 96.66 97.26 96.96 -0.06 0.165 6075 83.78 93.71 96.96 -0.11 0.784
NNPS 2917 63.11 67.57 65.26 -0.51 0.507 234 41.50 26.07 65.26 54.10 0.192
NNS 65922 97.44 97.89 97.67 -0.06 0.054 2353 82.57 87.59 97.67 -0.89 0.047
PDT 397 71.46 77.58 74.40 -1.04 0.093 0 0.00 74.40 – –
POS 9529 98.90 99.51 99.20 -0.02 0.567 0 99.20 – –
PRP 19164 99.73 99.35 99.54 -0.01 0.209 11 0.00 0.00 99.54 – –
PRP$ 9173 99.37 99.91 99.64 0.00 0.995 1 0.00 99.64 – –
RB 33806 93.92 91.14 92.51 -0.07 0.032 516 83.37 81.59 92.51 0.21 0.815
RBR 1905 75.28 66.35 70.54 0.09 0.881 7 33.33 14.29 70.54 – –
RBS 486 74.31 82.72 78.29 -0.34 0.974 2 0.00 78.29 – –
RP 2879 78.97 75.41 77.15 0.14 0.488 0 77.15 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.19 95.46 95.32 -0.11 0.370 570 81.78 77.19 95.32 -0.36 0.774
VBD 32941 95.43 94.41 94.92 -0.15 0.081 426 78.64 62.21 94.92 -3.86 0.103
VBG 16321 91.27 92.46 91.86 0.07 0.492 924 71.63 93.18 91.86 2.26 0.042
VBN 22177 86.91 90.38 88.61 -0.26 0.114 716 66.82 82.96 88.61 0.02 0.986
VBP 13819 93.84 91.99 92.90 -0.07 0.335 131 61.74 54.20 92.90 4.08 0.527
VBZ 23816 97.70 96.34 97.01 -0.01 0.878 467 84.85 53.96 97.01 -1.17 0.829
WDT 4745 96.48 95.28 95.88 -0.22 0.095 4 0.00 95.88 – –
WP 2604 98.90 99.85 99.37 0.02 0.589 0 99.37 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.99 -1.60 0.033
TOKENS 1044667 96.783 -0.06 0.031 24622 81.70 -0.30 0.508

B.52

(Pos = DT & Wd ∈ {a, another, each, every, little, many, much}) ⇒ (Pos ← DT–1)
(Pos = DT & Wd ∈ {these, those}) ⇒ (Pos ← DT–P)
(Pos = NN & Wd ∈ {abaci, aback, abaft,...<37563 omitted>...,zydeco, zygotic, zymurgy}) ⇒ (Pos ← NN–M)
(Pos = JJ & Wd ∈ {countless, few, many, numerous, several}) ⇒ (Pos ← JJ–P)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.86 99.86 -0.05 0.080 0 99.86 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.70 99.38 99.54 0.06 0.104 1 0.00 99.54 – –
CD 40132 99.30 99.41 99.36 -0.05 0.198 3413 98.48 96.69 99.36 -0.51 0.177
DT 90066 99.37 99.43 99.40 -0.00 0.816 2 0.00 99.40 – –
EX 951 95.24 98.84 97.01 -0.26 0.134 0 97.01 – –
FW 238 51.57 48.32 49.89 -3.56 0.209 67 40.00 5.97 49.89 – –
IN 108456 98.10 98.71 98.41 -0.02 0.315 22 0.00 0.00 98.41 – –
JJ 67085 91.51 91.28 91.40 -0.29 0.001 4581 77.21 74.20 91.40 -0.44 0.618
JJR 3621 84.74 90.20 87.38 -0.19 0.025 85 75.00 21.18 87.38 – –
JJS 2129 94.55 92.06 93.29 0.03 0.658 54 73.21 75.93 93.29 0.91 0.374
MD 10743 99.70 99.77 99.73 0.02 0.291 5 0.00 99.73 – –
NN 146173 96.28 95.82 96.05 -0.09 0.007 3934 74.93 67.01 96.05 -1.45 0.009
NNP 100926 96.65 97.33 96.99 -0.03 0.448 6075 84.20 94.09 96.99 0.34 0.314
NNPS 2917 63.29 67.09 65.14 -0.71 0.498 234 34.19 17.09 65.14 9.69 0.710
NNS 65922 97.53 97.81 97.67 -0.06 0.203 2353 82.37 87.59 97.67 -1.01 0.260
PDT 397 71.63 77.58 74.49 -0.92 0.310 0 74.49 – –
POS 9529 98.93 99.52 99.22 0.00 0.983 0 99.22 – –
PRP 19164 99.71 99.38 99.55 -0.01 0.574 11 0.00 99.55 – –
PRP$ 9173 99.37 99.90 99.64 -0.01 0.629 1 0.00 99.64 – –
RB 33806 93.96 91.22 92.57 -0.00 0.964 516 83.37 84.50 92.57 1.98 0.164
RBR 1905 74.91 66.46 70.43 -0.06 0.929 7 0.00 0.00 70.43 – –
RBS 486 74.63 83.54 78.83 0.36 0.629 2 0.00 78.83 – –
RP 2879 79.21 74.92 77.01 -0.05 0.868 0 77.01 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –
UH 100 73.44 47.00 57.32 -0.61 0.374 17 0.00 0.00 57.32 – –
VB 29021 95.03 95.55 95.29 -0.14 0.126 570 78.64 76.84 95.29 -2.48 0.114
VBD 32941 95.48 94.28 94.88 -0.19 0.118 426 79.82 62.21 94.88 -3.23 0.154
VBG 16321 91.58 92.51 92.04 0.27 0.084 924 74.20 92.75 92.04 4.08 0.008
VBN 22177 86.81 90.12 88.43 -0.46 0.077 716 65.82 79.61 88.43 -2.63 0.184
VBP 13819 93.87 92.16 93.00 0.04 0.650 131 67.27 56.49 93.00 10.73 0.003
VBZ 23816 97.65 96.44 97.04 0.02 0.662 467 84.90 54.18 97.04 -0.91 0.922
WDT 4745 96.50 95.38 95.94 -0.15 0.277 4 0.00 95.94 – –
WP 2604 98.90 99.85 99.37 0.02 0.589 0 99.37 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.90 -1.78 0.002
TOKENS 1044667 96.785 -0.06 0.001 24622 81.67 -0.34 0.192

B.53

(Pos = RB & Wd = n’t) ⇒ (Pos ← RB–1)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.55 99.39 99.47 -0.00 0.374 1 0.00 99.47 – –
CD 40132 99.31 99.45 99.38 -0.03 0.332 3413 98.46 97.25 99.38 -0.23 0.262
DT 90066 99.33 99.47 99.40 0.00 0.937 2 0.00 99.40 – –
EX 951 95.53 98.84 97.16 -0.11 0.374 0 97.16 – –
FW 238 55.17 47.06 50.79 -1.81 0.524 67 60.00 4.48 50.79 – –
IN 108456 98.17 98.69 98.43 0.00 0.658 22 0.00 0.00 98.43 – –
JJ 67085 91.92 91.32 91.62 -0.04 0.144 4581 78.52 73.50 91.62 -0.11 0.779
JJR 3621 84.97 90.56 87.67 0.14 0.555 85 70.59 14.12 87.67 – –
JJS 2129 94.58 93.52 94.05 0.85 0.378 54 71.93 75.93 94.05 – –
MD 10743 99.67 99.77 99.72 0.00 0.762 5 0.00 99.72 – –
NN 146173 96.20 96.02 96.11 -0.02 0.467 3934 73.86 69.04 96.11 -0.58 0.341
NNP 100926 96.74 97.26 97.00 -0.02 0.405 6075 84.71 92.72 97.00 -0.04 0.901
NNPS 2917 63.28 66.47 64.84 -1.16 0.225 234 38.05 18.38 64.84 19.27 0.836
NNS 65922 97.46 97.93 97.69 -0.03 0.395 2353 81.76 88.02 97.69 -1.16 0.310
PDT 397 70.88 79.09 74.76 -0.55 0.203 0 74.76 – –
POS 9529 98.89 99.59 99.24 0.02 0.458 0 99.24 – –
PRP 19164 99.75 99.36 99.55 0.00 0.817 11 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.11 91.18 92.62 0.05 0.217 516 83.70 81.59 92.62 0.41 0.743
RBR 1905 76.00 66.82 71.12 0.91 0.180 7 0.00 0.00 71.12 – –
RBS 486 79.14 83.54 81.28 3.48 0.343 2 0.00 81.28 – –
RP 2879 78.88 76.17 77.50 0.60 0.127 0 77.50 – –
SYM 59 79.03 83.05 80.99 -0.83 0.374 1 0.00 0.00 80.99 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.50 95.36 95.43 0.01 0.775 570 84.38 75.79 95.43 0.18 0.956
VBD 32941 95.66 94.46 95.05 -0.01 0.771 426 81.46 62.91 95.05 -1.74 0.412
VBG 16321 91.46 92.15 91.81 0.01 0.837 924 69.87 92.86 91.81 0.67 0.270
VBN 22177 86.88 90.59 88.70 -0.15 0.042 716 66.74 82.68 88.70 -0.19 0.687
VBP 13819 93.59 92.16 92.87 -0.11 0.219 131 57.39 50.38 92.87 -3.25 0.657
VBZ 23816 97.63 96.54 97.08 0.05 0.318 467 82.21 57.39 97.08 1.26 0.626
WDT 4745 96.68 95.66 96.17 0.08 0.572 4 0.00 96.17 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.49 -0.63 0.186
TOKENS 1044667 96.834 -0.01 0.490 24622 81.67 -0.33 0.399

B.54

(Pos = RB & Wd = n’t) ⇒ (Pos ← RB–1)
(Pos = RB & Wd = also) ⇒ (Pos ← RB–2)

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 99.99 99.98 99.98 -0.01 0.374 1 0.00 0.00 99.98 – –
” 7620 99.95 99.75 99.85 – – 0 99.85 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.98 99.94 99.96 – – 0 99.96 – –
CC 26227 99.74 99.33 99.53 -0.00 0.617 1 0.00 99.53 – –
CD 40132 99.27 99.56 99.42 -0.00 0.178 3413 99.35 98.51 99.42 – –
DT 90066 99.47 99.28 99.37 -0.00 0.990 2 0.00 99.37 – –
EX 951 95.37 99.58 97.43 0.05 0.374 0 97.43 – –
FW 238 64.17 32.35 43.02 – – 67 0.00 0.00 43.02 – –
IN 108456 97.82 98.91 98.36 -0.00 0.302 22 0.00 98.36 – –
JJ 67085 92.36 92.07 92.22 0.00 0.988 4581 79.69 82.21 92.22 0.11 0.176
JJR 3621 84.89 92.32 88.45 0.05 0.344 85 88.89 28.24 88.45 3.42 0.374
JJS 2129 95.19 95.77 95.48 0.03 0.374 54 86.49 59.26 95.48 – –
MD 10743 99.69 99.78 99.73 -0.00 0.374 5 0.00 99.73 – –
NN 146173 97.18 95.09 96.12 0.01 0.001 3934 81.21 67.69 96.12 0.13 0.087
NNP 100926 94.69 97.95 96.29 -0.00 0.673 6075 82.54 98.12 96.29 0.02 0.374
NNPS 2917 54.74 73.23 62.65 0.05 0.525 234 53.85 11.97 62.65 -0.35 0.374
NNS 65922 97.72 97.21 97.47 0.00 0.733 2353 87.27 90.35 97.47 -0.01 0.969
PDT 397 69.94 84.38 76.48 – – 0 76.48 – –
POS 9529 98.74 99.65 99.20 – – 0 99.20 – –
PRP 19164 99.85 99.33 99.59 – – 11 0.00 0.00 99.59 – –
PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –
RB 33806 95.19 90.55 92.82 0.02 0.260 516 90.30 81.20 92.82 -0.14 0.374
RBR 1905 77.56 67.14 71.98 0.16 0.109 7 0.00 71.98 – –
RBS 486 87.98 84.36 86.13 0.11 0.374 2 0.00 0.00 86.13 – –
RP 2879 75.69 75.72 75.71 -0.02 0.374 0 75.71 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –
VB 29021 96.70 94.61 95.64 -0.01 0.117 570 86.04 78.95 95.64 0.09 0.709
VBD 32941 95.03 95.91 95.47 0.01 0.599 426 80.75 70.89 95.47 0.33 0.648
VBG 16321 92.36 94.22 93.28 0.00 0.917 924 82.10 91.34 93.28 0.07 0.374
VBN 22177 88.21 90.97 89.57 0.01 0.596 716 78.59 73.32 89.57 0.43 0.353
VBP 13819 93.98 92.24 93.10 0.04 0.080 131 72.88 32.82 93.10 1.84 0.374
VBZ 23816 98.84 96.41 97.61 -0.00 0.984 467 89.58 64.45 97.61 -0.33 0.418
WDT 4745 97.17 96.33 96.75 -0.02 0.590 4 0.00 96.75 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.71 -0.02 0.759
TOKENS 1044667 96.855 0.00 0.366 24622 84.67 0.05 0.145

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.68 99.36 99.52 0.04 0.230 1 0.00 99.52 – –
CD 40132 99.33 99.46 99.40 -0.01 0.653 3413 98.72 97.33 99.40 -0.06 0.738
DT 90066 99.32 99.51 99.41 0.01 0.056 2 0.00 99.41 – –
EX 951 95.43 98.84 97.11 -0.16 0.407 0 97.11 – –
FW 238 51.83 47.48 49.56 -4.20 0.272 67 50.00 1.49 49.56 – –
IN 108456 98.17 98.69 98.43 0.01 0.647 22 0.00 0.00 98.43 – –
JJ 67085 92.12 91.17 91.64 -0.02 0.472 4581 79.25 71.62 91.64 -1.01 0.137
JJR 3621 84.81 90.94 87.77 0.25 0.200 85 81.25 30.59 87.77 27.49 0.374
JJS 2129 90.84 96.38 93.53 0.29 0.791 54 71.93 75.93 93.53 – –
MD 10743 99.67 99.80 99.74 0.02 0.226 5 0.00 99.74 – –
NN 146173 96.17 96.06 96.12 -0.01 0.676 3934 73.95 68.99 96.12 -0.56 0.352
NNP 100926 96.68 97.34 97.01 -0.01 0.783 6075 83.81 93.76 97.01 -0.07 0.793
NNPS 2917 62.72 66.34 64.48 -1.71 0.129 234 33.80 20.51 64.48 22.87 0.803
NNS 65922 97.56 97.93 97.75 0.02 0.379 2353 82.69 89.50 97.75 0.22 0.527
PDT 397 70.76 79.85 75.03 -0.20 0.691 0 75.03 – –
POS 9529 98.99 99.54 99.26 0.04 0.015 0 99.26 – –
PRP 19164 99.74 99.38 99.56 0.01 0.255 11 0.00 99.56 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.13 91.09 92.59 0.01 0.660 516 84.57 81.78 92.59 1.04 0.464
RBR 1905 75.64 66.35 70.69 0.31 0.673 7 25.00 14.29 70.69 – –
RBS 486 87.47 64.61 74.32 -5.39 0.530 2 0.00 74.32 – –
RP 2879 78.83 75.65 77.21 0.21 0.343 0 77.21 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.43 95.27 95.35 -0.08 0.148 570 83.59 75.09 95.35 -0.75 0.129
VBD 32941 95.82 94.43 95.12 0.06 0.181 426 82.42 63.85 95.12 -0.41 0.783
VBG 16321 91.64 92.33 91.98 0.20 0.025 924 69.80 92.32 91.98 0.36 0.605
VBN 22177 86.78 90.94 88.81 -0.03 0.445 716 65.64 80.59 88.81 -2.23 0.231
VBP 13819 93.44 92.21 92.82 -0.16 0.228 131 58.72 48.85 92.82 -3.84 0.435
VBZ 23816 97.73 96.63 97.18 0.16 0.027 467 89.44 58.03 97.18 5.45 0.121
WDT 4745 96.64 95.66 96.14 0.06 0.661 4 0.00 96.14 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 99.96 99.87 99.91 -0.02 0.374 1 0.00 0.00 99.91 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.72 -0.20 0.413
TOKENS 1044667 96.843 0.00 0.876 24622 81.73 -0.26 0.191

MAX True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.97 99.57 99.77 – – 0 99.77 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.75 99.33 99.54 -0.00 0.374 1 0.00 99.54 – –
CD 40132 99.34 99.47 99.40 0.00 0.999 3413 99.21 98.77 99.40 0.01 0.577
DT 90066 99.42 99.31 99.36 -0.00 0.070 2 0.00 99.36 – –
EX 951 95.71 98.53 97.10 0.05 0.583 0 97.10 – –
FW 238 73.68 29.41 42.04 1.75 0.203 67 100.00 5.97 42.04 – –
IN 108456 97.73 98.70 98.21 -0.00 0.415 22 72.73 36.36 98.21 – –
JJ 67085 91.83 92.80 92.31 0.00 0.953 4581 82.55 82.19 92.31 -0.08 0.528
JJR 3621 84.83 90.20 87.43 -0.03 0.101 85 81.25 45.88 87.43 0.00 0.888
JJS 2129 94.71 94.97 94.84 0.07 0.235 54 90.24 68.52 94.84 -0.58 0.374
MD 10743 99.74 99.57 99.66 -0.00 0.374 5 0.00 99.66 – –
NN 146173 96.37 96.60 96.48 0.01 0.568 3934 79.38 78.29 96.48 -0.20 0.276
NNP 100926 96.22 97.71 96.96 0.00 0.772 6075 89.43 95.51 96.96 0.01 0.613
NNPS 2917 65.65 53.79 59.13 -0.51 0.172 234 56.76 35.90 59.13 -1.44 0.546
NNS 65922 97.65 98.51 98.08 0.01 0.145 2353 90.02 92.78 98.08 0.07 0.558
PDT 397 73.99 64.48 68.91 0.13 0.667 0 68.91 – –
POS 9529 98.46 99.62 99.04 0.01 0.374 0 99.04 – –
PRP 19164 99.85 99.17 99.51 0.01 0.209 11 0.00 99.51 – –
PRP$ 9173 99.13 99.95 99.54 0.01 0.374 1 0.00 99.54 – –
RB 33806 93.77 90.85 92.29 -0.03 0.289 516 92.77 84.50 92.29 -0.20 0.374
RBR 1905 75.36 62.47 68.31 -0.08 0.808 7 0.00 0.00 68.31 – –
RBS 486 85.00 80.45 82.66 0.11 0.471 2 0.00 82.66 – –
RP 2879 79.64 73.12 76.24 -0.17 0.365 0 76.24 – –
SYM 59 80.43 62.71 70.48 – – 1 100.00 100.00 70.48 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 84.85 28.00 42.11 2.92 0.374 17 0.00 42.11 – –
VB 29021 97.50 95.11 96.29 0.02 0.141 570 90.28 86.32 96.29 -0.18 0.474
VBD 32941 96.03 96.61 96.32 -0.00 0.626 426 83.66 80.52 96.32 -0.34 0.178
VBG 16321 94.16 92.01 93.07 -0.01 0.659 924 85.51 90.04 93.07 -0.05 0.671
VBN 22177 91.14 89.54 90.33 0.00 0.946 716 81.14 79.33 90.33 -0.04 0.851
VBP 13819 94.59 93.73 94.16 -0.00 0.915 131 73.64 61.83 94.16 -1.60 0.131
VBZ 23816 98.85 96.74 97.79 0.00 0.805 467 89.46 78.16 97.79 0.09 0.809
WDT 4745 96.30 95.95 96.13 -0.01 0.570 4 0.00 96.13 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 53.72 -0.02 0.892
TOKENS 1044667 97.056 -0.00 0.972 24622 87.28 -0.06 0.067

B.55

(Pos = RB & Wd = n’t) ⇒ (Pos ← RB–1)
(Pos = RB & Wd = also) ⇒ (Pos ← RB–2)
(Pos = RB & Wd = not) ⇒ (Pos ← RB–3)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.37 99.49 0.01 0.633 1 0.00 0.00 99.49 – –
CD 40132 99.33 99.44 99.38 -0.03 0.290 3413 98.78 97.04 99.38 -0.18 0.255
DT 90066 99.32 99.47 99.40 -0.00 0.714 2 0.00 99.40 – –
EX 951 95.33 98.84 97.06 -0.21 0.095 0 97.06 – –
FW 238 58.47 44.96 50.83 -1.74 0.596 67 50.00 1.49 50.83 – –
IN 108456 98.15 98.69 98.42 -0.01 0.403 22 0.00 0.00 98.42 – –
JJ 67085 91.86 91.32 91.59 -0.08 0.083 4581 78.46 72.34 91.59 -0.97 0.156
JJR 3621 84.90 90.80 87.75 0.23 0.478 85 65.38 20.00 87.75 -12.14 0.165
JJS 2129 94.17 95.58 94.87 1.73 0.420 54 71.43 74.07 94.87 -1.55 0.374
MD 10743 99.67 99.77 99.72 0.00 0.983 5 0.00 99.72 – –
NN 146173 96.20 96.01 96.10 -0.03 0.490 3934 73.24 69.22 96.10 -0.86 0.271
NNP 100926 96.69 97.27 96.98 -0.04 0.175 6075 84.38 93.43 96.98 0.12 0.422
NNPS 2917 63.19 66.27 64.69 -1.39 0.205 234 33.00 14.10 64.69 -4.90 0.771
NNS 65922 97.49 97.94 97.72 -0.01 0.688 2353 82.38 88.44 97.72 -0.54 0.435
PDT 397 70.69 79.60 74.88 -0.39 0.424 0 74.88 – –
POS 9529 98.92 99.51 99.21 -0.01 0.643 0 99.21 – –
PRP 19164 99.74 99.35 99.55 -0.00 0.743 11 0.00 99.55 – –
PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –
RB 33806 94.07 91.07 92.55 -0.03 0.557 516 84.10 81.01 92.55 0.28 0.806
RBR 1905 76.06 66.56 71.00 0.74 0.298 7 0.00 0.00 71.00 – –
RBS 486 86.62 81.28 83.86 6.76 0.446 2 0.00 83.86 – –
RP 2879 78.86 75.79 77.29 0.32 0.198 0 77.29 – –
SYM 59 80.95 86.44 83.61 2.38 0.178 1 0.00 83.61 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.69 95.26 95.47 0.05 0.416 570 85.97 75.26 95.47 0.69 0.764
VBD 32941 95.66 94.48 95.07 0.00 0.957 426 78.65 63.15 95.07 -3.04 0.228
VBG 16321 91.45 92.30 91.87 0.09 0.387 924 69.54 92.64 91.87 0.29 0.759
VBN 22177 86.87 90.63 88.71 -0.14 0.383 716 65.49 80.59 88.71 -2.36 0.062
VBP 13819 93.49 92.24 92.86 -0.12 0.079 131 58.54 54.96 92.86 2.22 0.826
VBZ 23816 97.71 96.48 97.09 0.07 0.196 467 87.67 56.32 97.09 2.74 0.471
WDT 4745 96.44 95.41 95.92 -0.17 0.014 4 0.00 95.92 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.62 -0.39 0.377
TOKENS 1044667 96.831 -0.01 0.500 24622 81.56 -0.47 0.143

B.56

(Pos = RB & Wd ∈ {n’t, not}) ⇒ (Pos ← RB–1)
(Pos = RB & Wd = also) ⇒ (Pos ← RB–2)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –
” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.63 99.36 99.49 0.01 0.649 1 0.00 99.49 – –
CD 40132 99.32 99.45 99.38 -0.03 0.112 3413 98.60 97.25 99.38 -0.16 0.292
DT 90066 99.33 99.49 99.41 0.01 0.262 2 0.00 99.41 – –
EX 951 95.33 98.84 97.06 -0.21 0.241 0 97.06 – –
FW 238 57.07 49.16 52.82 2.11 0.602 67 75.00 4.48 52.82 – –
IN 108456 98.17 98.68 98.42 -0.00 0.882 22 0.00 0.00 98.42 – –
JJ 67085 91.85 91.32 91.58 -0.09 0.305 4581 77.12 73.74 91.58 -0.82 0.392
JJR 3621 84.52 90.75 87.52 -0.03 0.832 85 69.23 21.18 87.52 -6.97 0.676
JJS 2129 94.19 88.30 91.15 -2.26 0.368 54 71.43 74.07 91.15 -1.55 0.374
MD 10743 99.66 99.80 99.73 0.01 0.297 5 0.00 0.00 99.73 – –
NN 146173 96.26 96.02 96.14 0.01 0.694 3934 74.98 68.33 96.14 -0.40 0.358
NNP 100926 96.67 97.28 96.97 -0.04 0.161 6075 83.92 92.38 96.97 -0.70 0.068
NNPS 2917 63.34 66.34 64.80 -1.22 0.067 234 31.48 14.53 64.80 -4.31 0.641
NNS 65922 97.51 97.85 97.68 -0.05 0.254 2353 81.28 86.91 97.68 -2.06 0.022
PDT 397 70.63 79.35 74.73 -0.59 0.413 0 74.73 – –
POS 9529 98.89 99.50 99.19 -0.03 0.193 0 99.19 – –
PRP 19164 99.74 99.36 99.55 -0.00 0.997 11 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.16 90.97 92.53 -0.04 0.658 516 84.80 80.04 92.53 0.07 0.903
RBR 1905 75.74 65.88 70.47 -0.01 0.976 7 0.00 0.00 70.47 – –
RBS 486 65.06 83.13 72.99 -7.08 0.370 2 0.00 72.99 – –
RP 2879 78.73 75.86 77.27 0.29 0.017 0 77.27 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.42 95.37 95.40 -0.03 0.578 570 81.20 75.79 95.40 -1.64 0.056
VBD 32941 95.46 94.68 95.07 0.01 0.811 426 80.00 62.91 95.07 -2.52 0.028
VBG 16321 91.56 92.27 91.92 0.13 0.161 924 71.21 92.10 91.92 1.40 0.213
VBN 22177 87.11 90.48 88.76 -0.08 0.407 716 65.35 80.59 88.76 -2.48 0.037
VBP 13819 93.38 92.24 92.81 -0.17 0.018 131 52.21 54.20 92.81 -4.11 0.289
VBZ 23816 97.69 96.60 97.14 0.12 0.064 467 87.50 56.96 97.14 3.37 0.426
WDT 4745 96.53 95.62 96.07 -0.02 0.887 4 0.00 96.07 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.44 -0.73 0.135
TOKENS 1044667 96.820 -0.02 0.252 24622 81.29 -0.79 0.066

B.57 rbr(cl–c,s) Mapping

(Pos = RBR & Wd = more) ⇒ (Pos ← RBR–0)
(Pos = RBR & Wd ∈ {better, faster, further, less}) ⇒ (Pos ← RBR–1)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.56 99.37 99.47 -0.01 0.182 1 0.00 99.47 – –
CD 40132 99.35 99.43 99.39 -0.02 0.370 3413 98.72 96.98 99.39 -0.24 0.201
DT 90066 99.31 99.47 99.39 -0.01 0.348 2 0.00 99.39 – –
EX 951 95.43 98.84 97.11 -0.16 0.393 0 97.11 – –
FW 238 58.33 47.06 52.09 0.70 0.924 67 57.14 5.97 52.09 – –
IN 108456 98.16 98.69 98.43 -0.00 0.937 22 0.00 0.00 98.43 – –
JJ 67085 91.91 91.24 91.57 -0.10 0.281 4581 78.03 72.95 91.57 -0.80 0.416
JJR 3621 84.73 90.69 87.61 0.07 0.886 85 68.00 20.00 87.61 -11.34 0.829
JJS 2129 87.76 96.62 91.97 -1.38 0.834 54 71.43 74.07 91.97 -1.55 0.374
MD 10743 99.67 99.77 99.72 0.00 0.377 5 0.00 99.72 – –
NN 146173 96.21 96.02 96.11 -0.02 0.556 3934 74.25 68.91 96.11 -0.42 0.621
NNP 100926 96.61 97.34 96.97 -0.04 0.036 6075 83.31 94.01 96.97 -0.26 0.354
NNPS 2917 64.16 66.34 65.23 -0.57 0.368 234 43.24 13.68 65.23 0.00 0.775
NNS 65922 97.52 97.98 97.75 0.02 0.436 2353 83.34 87.59 97.75 -0.41 0.504
PDT 397 70.95 80.60 75.47 0.39 0.505 0 75.47 – –
POS 9529 98.94 99.52 99.23 0.01 0.745 0 99.23 – –
PRP 19164 99.75 99.37 99.56 0.01 0.212 11 0.00 99.56 – –
PRP$ 9173 99.40 99.91 99.66 0.02 0.071 1 0.00 99.66 – –
RB 33806 94.16 90.97 92.54 -0.04 0.644 516 85.07 80.62 92.54 0.60 0.605
RBR 1905 75.67 66.46 70.77 0.42 0.746 7 0.00 0.00 70.77 – –
RBS 486 85.71 48.15 61.66 -21.50 0.262 2 0.00 61.66 – –
RP 2879 78.77 75.65 77.18 0.18 0.352 0 77.18 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.37 95.30 95.33 -0.09 0.253 570 81.80 76.49 95.33 -0.82 0.418
VBD 32941 95.55 94.54 95.05 -0.02 0.756 426 82.87 63.62 95.05 -0.38 0.605
VBG 16321 91.59 92.31 91.95 0.17 0.035 924 71.61 93.07 91.95 2.19 0.026
VBN 22177 87.08 90.54 88.77 -0.07 0.181 716 67.34 83.24 88.77 0.61 0.528
VBP 13819 93.56 92.30 92.93 -0.05 0.229 131 63.96 54.20 92.93 5.80 0.235
VBZ 23816 97.79 96.46 97.12 0.10 0.107 467 87.93 54.60 97.12 0.93 0.770
WDT 4745 96.62 95.72 96.17 0.08 0.134 4 0.00 96.17 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.41 -0.78 0.217
TOKENS 1044667 96.822 -0.02 0.459 24622 81.77 -0.21 0.615

B.58

(Pos = RBR & Wd ∈ {less, more}) ⇒ (Pos ← RBR–ML)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)

# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.38 99.50 0.02 0.427 1 0.00 99.50 – –
CD 40132 99.32 99.44 99.38 -0.02 0.428 3413 98.78 97.30 99.38 -0.04 0.867
DT 90066 99.34 99.49 99.42 0.01 0.141 2 0.00 99.42 – –
EX 951 95.53 98.95 97.21 -0.05 0.583 0 97.21 – –
FW 238 53.18 49.16 51.09 -1.24 0.497 67 20.00 1.49 51.09 – –
IN 108456 98.17 98.70 98.43 0.01 0.626 22 0.00 0.00 98.43 – –
JJ 67085 91.99 91.31 91.65 -0.01 0.813 4581 78.43 73.17 91.65 -0.40 0.618
JJR 3621 84.33 91.25 87.65 0.12 0.782 85 86.67 15.29 87.65 – –
JJS 2129 91.38 92.06 91.72 -1.65 0.709 54 71.70 70.37 91.72 -3.85 0.416
MD 10743 99.66 99.82 99.74 0.02 0.090 5 0.00 99.74 – –
NN 146173 96.22 96.02 96.12 -0.01 0.778 3934 74.66 69.04 96.12 -0.07 0.948
NNP 100926 96.67 97.35 97.01 -0.01 0.810 6075 83.61 94.47 97.01 0.16 0.652
NNPS 2917 63.51 66.47 64.96 -0.98 0.306 234 33.01 14.53 64.96 -2.89 0.896
NNS 65922 97.58 97.87 97.72 -0.00 0.913 2353 83.82 87.38 97.72 -0.24 0.692
PDT 397 70.76 79.85 75.03 -0.20 0.817 0 75.03 – –
POS 9529 98.96 99.49 99.22 -0.00 0.986 0 99.22 – –
PRP 19164 99.73 99.37 99.55 -0.00 0.990 11 0.00 99.55 – –
PRP$ 9173 99.39 99.91 99.65 0.01 0.178 1 0.00 99.65 – –
RB 33806 94.13 91.15 92.62 0.05 0.315 516 84.99 81.20 92.62 0.92 0.465
RBR 1905 76.79 65.30 70.58 0.15 0.921 7 0.00 0.00 70.58 – –
RBS 486 71.43 68.93 70.16 -10.69 0.389 2 0.00 70.16 – –
RP 2879 78.75 76.07 77.39 0.44 0.180 0 77.39 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –
UH 100 73.44 47.00 57.32 -0.61 0.374 17 0.00 0.00 57.32 – –
VB 29021 95.28 95.39 95.34 -0.09 0.118 570 82.27 74.91 95.34 -1.62 0.412
VBD 32941 95.67 94.41 95.03 -0.03 0.730 426 83.65 61.27 95.03 -2.10 0.214
VBG 16321 91.38 92.35 91.87 0.08 0.346 924 70.78 92.53 91.87 1.26 0.122
VBN 22177 86.91 90.44 88.64 -0.22 0.312 716 66.44 81.56 88.64 -1.05 0.484
VBP 13819 93.43 92.28 92.85 -0.13 0.098 131 57.14 51.91 92.85 -1.92 0.780
VBZ 23816 97.77 96.49 97.12 0.10 0.044 467 87.71 56.53 97.12 2.99 0.342
WDT 4745 96.64 95.68 96.16 0.07 0.031 4 0.00 96.16 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.51 -0.59 0.292
TOKENS 1044667 96.829 -0.01 0.500 24622 81.83 -0.14 0.710

B.59 in(cl–c) Mapping

(Pos = IN & Wd ∈ {@, albeit, although,...<14 omitted>...,via, vs., whereas}) ⇒ (Pos ← IN–0)
(Pos = IN & Wd ∈ {are, complicated, including, once, then, till, which}) ⇒ (Pos ← IN–1)
(Pos = IN & Wd ∈ {’til, a, aka,...<10 omitted>...,to, towards, underneath}) ⇒ (Pos ← IN–2)

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.75 99.85 – – 0 99.85 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.98 99.94 99.96 – – 0 99.96 – –
CC 26227 99.74 99.37 99.55 0.02 0.046 1 0.00 99.55 – –
CD 40132 99.27 99.57 99.42 -0.00 0.374 3413 99.35 98.51 99.42 – –
DT 90066 99.46 99.29 99.37 0.00 0.743 2 0.00 99.37 – –
EX 951 95.08 99.58 97.28 -0.10 0.178 0 97.28 – –
FW 238 64.75 33.19 43.89 2.03 0.181 67 0.00 0.00 43.89 – –
IN 108456 97.86 98.90 98.38 0.01 0.237 22 0.00 98.38 – –
JJ 67085 92.37 92.04 92.20 -0.01 0.040 4581 79.53 82.03 92.20 -0.10 0.172
JJR 3621 84.91 92.32 88.46 0.06 0.369 85 88.46 27.06 88.46 – –
JJS 2129 95.14 95.73 95.43 -0.02 0.685 54 86.49 59.26 95.43 – –
MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –
NN 146173 97.17 95.08 96.11 -0.00 0.779 3934 80.99 67.67 96.11 -0.02 0.870
NNP 100926 94.69 97.97 96.30 0.00 0.566 6075 82.55 98.09 96.30 0.00 0.832
NNPS 2917 54.83 73.23 62.70 0.14 0.160 234 54.90 11.97 62.70 – –
NNS 65922 97.72 97.22 97.47 0.00 0.315 2353 87.32 90.44 97.47 0.07 0.237
PDT 397 69.94 84.38 76.48 – – 0 76.48 – –
POS 9529 98.74 99.65 99.20 – – 0 99.20 – –
PRP 19164 99.85 99.33 99.59 – – 11 0.00 0.00 99.59 – –
PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –
RB 33806 95.16 90.62 92.83 0.04 0.120 516 90.13 81.40 92.83 -0.10 0.374
RBR 1905 77.50 67.09 71.92 0.08 0.667 7 0.00 71.92 – –
RBS 486 87.77 84.16 85.92 -0.14 0.585 2 0.00 0.00 85.92 – –
RP 2879 75.67 75.72 75.69 -0.03 0.454 0 75.69 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 76.60 36.00 48.98 – – 17 33.33 5.88 48.98 – –
VB 29021 96.70 94.63 95.65 0.00 0.951 570 85.88 78.95 95.65 – –
VBD 32941 95.00 95.90 95.45 -0.01 0.273 426 80.48 70.66 95.45 0.00 0.959
VBG 16321 92.35 94.20 93.27 -0.01 0.308 924 81.99 91.13 93.27 -0.12 0.183
VBN 22177 88.18 90.95 89.54 -0.02 0.048 716 78.50 72.91 89.54 0.07 0.583
VBP 13819 94.01 92.13 93.06 -0.00 0.957 131 73.68 32.06 93.06 0.53 0.859
VBZ 23816 98.84 96.42 97.62 0.01 0.099 467 89.91 64.88 97.62 0.21 0.374
WDT 4745 97.20 96.44 96.82 0.04 0.370 4 0.00 96.82 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.74 0.04 0.594
TOKENS 1044667 96.855 0.00 0.389 24622 84.61 -0.01 0.380


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.40 99.51 0.03 0.258 1 0.00 99.51 – –
CD 40132 99.33 99.45 99.39 -0.01 0.268 3413 98.69 97.10 99.39 -0.19 0.274
DT 90066 99.36 99.47 99.41 0.01 0.124 2 0.00 99.41 – –
EX 951 95.43 98.84 97.11 -0.16 0.407 0 97.11 – –
FW 238 56.86 48.74 52.49 1.46 0.725 67 0.00 52.49 – –
IN 108456 98.16 98.73 98.44 0.02 0.117 22 0.00 0.00 98.44 – –
JJ 67085 92.08 91.27 91.67 0.01 0.462 4581 79.53 72.93 91.67 0.10 0.820
JJR 3621 84.91 90.61 87.67 0.14 0.538 85 68.42 30.59 87.67 21.27 0.435
JJS 2129 94.18 91.97 93.06 -0.21 0.997 54 71.93 75.93 93.06 – –
MD 10743 99.71 99.75 99.73 0.01 0.071 5 0.00 99.73 – –
NN 146173 96.23 96.03 96.13 0.00 0.780 3934 74.48 69.80 96.13 0.39 0.151
NNP 100926 96.66 97.38 97.02 0.00 0.925 6075 83.41 94.68 97.02 0.14 0.460
NNPS 2917 64.17 66.99 65.55 -0.08 0.835 234 50.00 14.53 65.55 8.36 0.638
NNS 65922 97.53 97.88 97.70 -0.03 0.212 2353 83.10 87.55 97.70 -0.59 0.336
PDT 397 71.00 78.34 74.49 -0.91 0.248 0 0.00 74.49 – –
POS 9529 98.94 99.52 99.23 0.01 0.834 0 99.23 – –
PRP 19164 99.73 99.36 99.55 -0.00 0.615 11 0.00 99.55 – –
PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –
RB 33806 94.19 91.15 92.64 0.08 0.002 516 84.85 81.40 92.64 0.96 0.453
RBR 1905 75.22 66.93 70.83 0.51 0.509 7 25.00 14.29 70.83 – –
RBS 486 74.21 82.30 78.05 -0.64 0.932 2 0.00 78.05 – –
RP 2879 78.71 76.28 77.47 0.56 0.103 0 77.47 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 72.73 48.00 57.83 0.28 0.374 17 50.00 5.88 57.83 – –
VB 29021 95.42 95.34 95.38 -0.04 0.210 570 81.58 76.14 95.38 -1.18 0.167
VBD 32941 95.77 94.51 95.13 0.08 0.163 426 84.49 62.68 95.13 -0.39 0.795
VBG 16321 91.37 92.37 91.86 0.08 0.408 924 70.49 93.07 91.86 1.28 0.233
VBN 22177 86.96 90.77 88.82 -0.02 0.887 716 66.82 80.73 88.82 -1.20 0.442
VBP 13819 93.47 92.43 92.95 -0.03 0.654 131 60.53 52.67 92.95 1.56 0.737
VBZ 23816 97.71 96.38 97.04 0.02 0.434 467 86.01 52.68 97.04 -2.12 0.477
WDT 4745 96.63 95.57 96.10 0.01 0.882 4 0.00 96.10 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.82 -0.01 0.980
TOKENS 1044667 96.850 0.01 0.600 24622 82.00 0.07 0.721


MAX True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.97 99.57 99.77 – – 0 99.77 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.76 99.35 99.55 0.01 0.025 1 0.00 99.55 – –
CD 40132 99.35 99.47 99.41 0.00 0.374 3413 99.26 98.77 99.41 0.04 0.353
DT 90066 99.41 99.31 99.36 -0.00 0.390 2 0.00 99.36 – –
EX 951 95.61 98.42 96.99 -0.05 0.374 0 96.99 – –
FW 238 71.29 30.25 42.48 2.81 0.238 67 80.00 5.97 42.48 – –
IN 108456 97.78 98.66 98.22 0.00 0.683 22 83.33 22.73 98.22 – –
JJ 67085 91.81 92.80 92.30 -0.01 0.587 4581 82.41 82.43 92.30 -0.02 0.831
JJR 3621 84.91 90.11 87.43 -0.03 0.503 85 81.25 45.88 87.43 – –
JJS 2129 94.34 94.65 94.49 -0.30 0.012 54 84.44 70.37 94.49 -2.02 0.208
MD 10743 99.74 99.57 99.66 -0.00 0.374 5 0.00 99.66 – –
NN 146173 96.36 96.59 96.47 -0.00 0.679 3934 79.70 78.04 96.47 -0.17 0.083
NNP 100926 96.18 97.72 96.94 -0.02 0.097 6075 89.38 95.60 96.94 0.04 0.436
NNPS 2917 65.83 53.75 59.18 -0.43 0.278 234 59.46 37.61 59.18 3.26 0.093
NNS 65922 97.64 98.50 98.07 0.00 0.776 2353 90.02 92.73 98.07 0.04 0.732
PDT 397 74.28 64.74 69.18 0.53 0.354 0 69.18 – –
POS 9529 98.45 99.63 99.04 0.01 0.374 0 99.04 – –
PRP 19164 99.85 99.16 99.50 0.00 0.374 11 0.00 99.50 – –
PRP$ 9173 99.13 99.95 99.54 0.01 0.374 1 0.00 99.54 – –
RB 33806 93.68 90.95 92.30 -0.02 0.546 516 92.95 84.30 92.30 -0.23 0.182
RBR 1905 75.24 62.52 68.29 -0.11 0.390 7 0.00 0.00 68.29 – –
RBS 486 83.55 79.42 81.43 -1.38 0.058 2 0.00 81.43 – –
RP 2879 79.86 73.39 76.49 0.15 0.236 0 76.49 – –
SYM 59 80.43 62.71 70.48 – – 1 100.00 100.00 70.48 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 84.38 27.00 40.91 – – 17 0.00 40.91 – –
VB 29021 97.50 95.08 96.28 0.00 0.805 570 90.74 85.96 96.28 -0.14 0.682
VBD 32941 96.02 96.63 96.32 -0.00 0.660 426 83.70 80.75 96.32 -0.17 0.374
VBG 16321 94.21 91.94 93.06 -0.03 0.240 924 85.68 90.04 93.06 0.05 0.374
VBN 22177 91.10 89.50 90.29 -0.04 0.005 716 80.80 79.33 90.29 -0.25 0.237
VBP 13819 94.60 93.65 94.12 -0.04 0.205 131 71.93 62.60 94.12 -2.01 0.037
VBZ 23816 98.86 96.73 97.78 -0.00 0.889 467 89.49 78.37 97.78 0.25 0.483
WDT 4745 96.23 95.83 96.03 -0.11 0.078 4 0.00 96.03 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 53.59 -0.24 0.012
TOKENS 1044667 97.050 -0.01 0.006 24622 87.32 -0.02 0.357


B.60 rb–loc[lm] Mapping

(Pos = RB & Wd ∈ {aboard, about, above, ...<106 omitted>..., westwards, whence, where}) ⇒ (Pos ← RB–LOC)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.91 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.63 99.36 99.49 0.02 0.602 1 0.00 0.00 99.49 – –
CD 40132 99.34 99.42 99.38 -0.03 0.278 3413 98.92 96.86 99.38 -0.20 0.236
DT 90066 99.33 99.49 99.41 0.01 0.539 2 0.00 99.41 – –
EX 951 95.53 98.84 97.16 -0.11 0.568 0 97.16 – –
FW 238 55.72 47.06 51.03 -1.37 0.570 67 50.00 1.49 51.03 – –
IN 108456 98.19 98.66 98.42 -0.00 0.859 22 0.00 0.00 98.42 – –
JJ 67085 92.06 91.25 91.65 -0.01 0.882 4581 78.91 72.52 91.65 -0.57 0.596
JJR 3621 84.70 90.36 87.44 -0.12 0.598 85 66.67 14.12 87.44 -33.16 0.226
JJS 2129 88.20 92.67 90.38 -3.09 0.165 54 71.43 74.07 90.38 -1.55 0.374
MD 10743 99.67 99.78 99.72 0.00 0.737 5 0.00 99.72 – –
NN 146173 96.18 96.06 96.12 -0.01 0.675 3934 73.58 70.01 96.12 -0.06 0.943
NNP 100926 96.68 97.31 97.00 -0.02 0.683 6075 84.13 93.83 97.00 0.17 0.579
NNPS 2917 64.54 66.20 65.36 -0.37 0.317 234 44.87 14.96 65.36 7.97 0.658
NNS 65922 97.54 97.92 97.73 0.00 0.898 2353 82.42 87.85 97.73 -0.84 0.026
PDT 397 69.98 78.09 73.81 -1.82 0.165 0 0.00 73.81 – –
POS 9529 98.94 99.52 99.23 0.01 0.781 0 99.23 – –
PRP 19164 99.72 99.38 99.55 0.00 0.981 11 0.00 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.07 91.11 92.57 -0.01 0.884 516 83.64 80.23 92.57 -0.48 0.767
RBR 1905 75.87 66.35 70.79 0.45 0.443 7 0.00 0.00 70.79 – –
RBS 486 67.45 52.88 59.28 -24.53 0.175 2 0.00 59.28 – –
RP 2879 78.12 76.38 77.24 0.25 0.191 0 77.24 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.45 95.35 95.40 -0.03 0.713 570 81.68 75.09 95.40 -1.84 0.208
VBD 32941 95.71 94.50 95.10 0.04 0.488 426 82.87 63.62 95.10 -0.38 0.921
VBG 16321 91.48 92.30 91.89 0.10 0.246 924 70.41 92.97 91.89 1.16 0.370
VBN 22177 86.86 90.73 88.75 -0.10 0.381 716 65.98 81.01 88.75 -1.73 0.214
VBP 13819 93.35 92.26 92.80 -0.18 0.043 131 55.91 54.20 92.80 -0.76 0.821
VBZ 23816 97.72 96.62 97.16 0.14 0.094 467 88.70 57.17 97.16 4.16 0.269
WDT 4745 96.53 95.70 96.12 0.03 0.870 4 0.00 96.12 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.53 -0.56 0.279
TOKENS 1044667 96.826 -0.02 0.462 24622 81.74 -0.24 0.390


B.61

(Pos = VB & SibAllR =re NP(.*)?) ⇒ (Pos ← VB–TR)
(Pos = VBG & SibAllR =re NP(.*)?) ⇒ (Pos ← VBG–TR)
(Pos = VBD & SibAllR =re NP(.*)?) ⇒ (Pos ← VBD–TR)
(Pos = VBN & SibAllR =re NP(.*)?) ⇒ (Pos ← VBN–TR)
(Pos = VBP & SibAllR =re NP(.*)?) ⇒ (Pos ← VBP–TR)
(Pos = VBZ & SibAllR =re NP(.*)?) ⇒ (Pos ← VBZ–TR)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.67 99.39 99.53 0.05 0.123 1 0.00 99.53 – –
CD 40132 99.29 99.50 99.40 -0.01 0.147 3413 98.38 97.57 99.40 -0.11 0.361
DT 90066 99.33 99.50 99.41 0.01 0.160 2 0.00 99.41 – –
EX 951 95.82 98.84 97.31 0.05 0.667 0 97.31 – –
FW 238 61.78 40.76 49.11 -5.06 0.001 67 0.00 0.00 49.11 – –
IN 108456 98.14 98.67 98.40 -0.02 0.249 22 0.00 0.00 98.40 – –
JJ 67085 91.34 91.61 91.47 -0.20 0.042 4581 78.33 73.87 91.47 0.03 0.967
JJR 3621 84.70 90.53 87.52 -0.03 0.928 85 64.71 12.94 87.52 – –
JJS 2129 90.33 88.68 89.50 -4.03 0.175 54 71.43 74.07 89.50 -1.55 0.374
MD 10743 99.64 99.77 99.70 -0.01 0.306 5 0.00 0.00 99.70 – –
NN 146173 96.03 96.11 96.07 -0.06 0.102 3934 73.80 69.75 96.07 -0.09 0.877
NNP 100926 96.50 97.41 96.95 -0.07 0.080 6075 82.26 94.90 96.95 -0.50 0.279
NNPS 2917 62.93 66.16 64.51 -1.67 0.098 234 32.11 14.96 64.51 -1.79 0.668
NNS 65922 97.39 97.94 97.67 -0.06 0.021 2353 82.51 87.04 97.67 -1.23 0.050
PDT 397 70.25 79.09 74.41 -1.02 0.224 0 74.41 – –
POS 9529 98.94 99.52 99.23 0.01 0.750 0 99.23 – –
PRP 19164 99.76 99.35 99.55 0.00 0.847 11 0.00 0.00 99.55 – –
PRP$ 9173 99.39 99.91 99.65 0.01 0.178 1 0.00 99.65 – –
RB 33806 94.04 90.92 92.45 -0.13 0.243 516 84.69 80.43 92.45 0.26 0.854
RBR 1905 75.91 65.98 70.60 0.18 0.799 7 0.00 0.00 70.60 – –
RBS 486 60.72 65.84 63.18 -19.57 0.245 2 0.00 63.18 – –
RP 2879 78.44 74.92 76.64 -0.53 0.462 0 76.64 – –
SYM 59 75.76 84.75 80.00 -2.04 0.208 1 0.00 0.00 80.00 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 94.90 94.93 94.91 -0.54 0.004 570 83.83 74.56 94.91 -0.99 0.294
VBD 32941 95.59 93.81 94.69 -0.39 0.031 426 79.87 55.87 94.69 -9.00 0.011
VBG 16321 91.98 91.02 91.50 -0.32 0.074 924 71.44 82.03 91.50 -3.58 0.009
VBN 22177 87.31 88.76 88.03 -0.91 0.001 716 68.24 72.91 88.03 -4.75 0.003
VBP 13819 93.60 90.85 92.21 -0.82 0.001 131 61.86 45.80 92.21 -5.10 0.138
VBZ 23816 97.91 96.17 97.03 0.01 0.904 467 89.56 56.96 97.03 4.32 0.399
WDT 4745 96.37 95.09 95.73 -0.38 0.067 4 0.00 95.73 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 99.96 99.87 99.91 -0.02 0.374 1 0.00 0.00 99.91 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.35 -2.85 0.003
TOKENS 1044667 96.730 -0.12 0.003 24622 81.39 -0.67 0.145


B.62

(Pos = RB & Par =re ADVPLOC(?:[A-Z]+)?) ⇒ (Pos ← RB–LOC)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.99 99.99 0.01 0.374 1 100.00 100.00 99.99 – –
” 7620 99.87 99.92 99.90 -0.01 0.178 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.68 99.36 99.52 0.04 0.187 1 0.00 99.52 – –
CD 40132 99.30 99.45 99.38 -0.03 0.132 3413 98.37 97.36 99.38 -0.22 0.276
DT 90066 99.34 99.49 99.42 0.02 0.115 2 0.00 99.42 – –
EX 951 94.95 98.84 96.86 -0.42 0.237 0 96.86 – –
FW 238 53.49 48.32 50.77 -1.85 0.422 67 100.00 1.49 50.77 – –
IN 108456 98.15 98.71 98.43 0.00 0.817 22 11.11 4.55 98.43 – –
JJ 67085 92.01 91.30 91.65 -0.01 0.810 4581 78.31 73.06 91.65 -0.55 0.337
JJR 3621 84.96 90.47 87.63 0.09 0.814 85 77.78 24.71 87.63 7.57 0.717
JJS 2129 90.85 92.30 91.57 -1.81 0.727 54 74.55 75.93 91.57 1.83 0.374
MD 10743 99.67 99.78 99.72 0.00 0.759 5 0.00 99.72 – –
NN 146173 96.20 96.00 96.10 -0.03 0.478 3934 74.54 68.40 96.10 -0.62 0.501
NNP 100926 96.65 97.41 97.03 0.01 0.663 6075 83.68 94.72 97.03 0.33 0.331
NNPS 2917 63.44 66.51 64.94 -1.01 0.216 234 39.13 15.38 64.94 6.29 0.908
NNS 65922 97.60 97.82 97.71 -0.02 0.377 2353 83.51 87.17 97.71 -0.55 0.104
PDT 397 70.88 79.09 74.76 -0.55 0.470 0 74.76 – –
POS 9529 98.90 99.49 99.19 -0.03 0.149 0 99.19 – –
PRP 19164 99.75 99.36 99.56 0.01 0.494 11 0.00 99.56 – –
PRP$ 9173 99.38 99.91 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.15 91.14 92.62 0.05 0.271 516 85.54 81.40 92.62 1.36 0.333
RBR 1905 75.53 66.77 70.88 0.58 0.338 7 25.00 14.29 70.88 – –
RBS 486 70.64 65.84 68.16 -13.23 0.388 2 0.00 68.16 – –
RP 2879 78.55 75.69 77.09 0.06 0.924 0 77.09 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.50 95.34 95.42 -0.00 0.968 570 83.88 75.79 95.42 -0.10 0.838
VBD 32941 95.63 94.59 95.11 0.05 0.406 426 83.23 62.91 95.11 -0.82 0.326
VBG 16321 91.45 92.12 91.79 -0.01 0.875 924 70.11 92.64 91.79 0.76 0.088
VBN 22177 87.03 90.63 88.79 -0.05 0.691 716 66.63 81.70 88.79 -0.82 0.677
VBP 13819 93.58 92.31 92.94 -0.03 0.554 131 58.93 50.38 92.94 -2.06 0.701
VBZ 23816 97.66 96.51 97.08 0.06 0.173 467 86.62 55.46 97.08 1.30 0.476
WDT 4745 96.57 95.49 96.03 -0.06 0.078 4 0.00 96.03 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.57 -0.49 0.493
TOKENS 1044667 96.831 -0.01 0.673 24622 81.85 -0.11 0.747


B.63

(Pos = DT & Wd ∈ {a, another, each, every, little, many, much}) ⇒ (Pos ← DT–1)
(Pos = DT & Wd ∈ {these, those}) ⇒ (Pos ← DT–P)
(Pos = NNS & Wd ∈ {acrobatics, adenoids, alms, ...<66 omitted>..., tweezers, vicissitudes, waterworks}) ⇒ (Pos ← NNS–P)
(Pos = NN & Wd ∈ {abaci, aback, abaft, ...<32532 omitted>..., zydeco, zygotic, zymurgy}) ⇒ (Pos ← NN–M)
(Pos = JJ & Wd ∈ {countless, few, many, numerous, several}) ⇒ (Pos ← JJ–P)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.91 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.57 99.40 99.48 0.01 0.379 1 0.00 0.00 99.48 – –
CD 40132 99.35 99.44 99.39 -0.02 0.410 3413 98.81 96.98 99.39 -0.20 0.307
DT 90066 99.37 99.45 99.41 0.01 0.465 2 0.00 99.41 – –
EX 951 95.53 98.84 97.16 -0.11 0.568 0 97.16 – –
FW 238 54.50 48.32 51.22 -0.98 0.782 67 50.00 1.49 51.22 – –
IN 108456 98.11 98.71 98.41 -0.01 0.280 22 0.00 0.00 98.41 – –
JJ 67085 91.84 91.27 91.56 -0.11 0.143 4581 78.55 73.00 91.56 -0.45 0.477
JJR 3621 84.35 90.80 87.46 -0.10 0.517 85 50.00 16.47 87.46 -28.92 0.507
JJS 2129 94.17 95.63 94.90 1.76 0.413 54 72.22 72.22 94.90 -2.24 0.496
MD 10743 99.67 99.81 99.74 0.02 0.225 5 0.00 99.74 – –
NN 146173 96.23 95.93 96.08 -0.05 0.227 3934 74.74 68.91 96.08 -0.11 0.863
NNP 100926 96.57 97.35 96.96 -0.06 0.051 6075 83.02 94.52 96.96 -0.19 0.434
NNPS 2917 63.61 66.64 65.09 -0.78 0.411 234 34.15 11.97 65.09 -14.72 0.618
NNS 65922 97.51 97.90 97.70 -0.03 0.475 2353 82.98 87.63 97.70 -0.61 0.342
PDT 397 70.75 78.59 74.46 -0.95 0.375 0 0.00 74.46 – –
POS 9529 98.94 99.52 99.23 0.01 0.690 0 99.23 – –
PRP 19164 99.75 99.35 99.55 -0.00 0.966 11 0.00 99.55 – –
PRP$ 9173 99.37 99.91 99.64 0.00 0.995 1 0.00 99.64 – –
RB 33806 94.03 91.06 92.52 -0.05 0.654 516 84.27 81.01 92.52 0.38 0.782
RBR 1905 75.95 65.83 70.53 0.08 0.782 7 0.00 0.00 70.53 – –
RBS 486 86.98 81.07 83.92 6.83 0.436 2 0.00 83.92 – –
RP 2879 79.19 75.10 77.09 0.06 0.676 0 77.09 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.44 95.40 95.42 -0.01 0.820 570 81.92 76.32 95.42 -0.87 0.508
VBD 32941 95.54 94.56 95.05 -0.01 0.755 426 81.54 62.21 95.05 -2.32 0.078
VBG 16321 91.34 92.20 91.77 -0.03 0.757 924 72.63 91.34 91.77 2.16 0.099
VBN 22177 87.09 90.44 88.74 -0.11 0.145 716 66.06 80.45 88.74 -1.97 0.231
VBP 13819 93.55 92.37 92.95 -0.02 0.793 131 54.69 53.44 92.95 -2.54 0.700
VBZ 23816 97.70 96.36 97.02 -0.00 0.975 467 84.59 52.89 97.02 -2.50 0.485
WDT 4745 96.44 95.24 95.83 -0.27 0.080 4 0.00 95.83 – –
WP 2604 99.01 99.62 99.31 -0.04 0.498 0 99.31 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.44 -0.73 0.117
TOKENS 1044667 96.818 -0.03 0.115 24622 81.66 -0.35 0.433


B.64

(Pos = RB & Par = ADVP-TMP) ⇒ (Pos ← RB–TMP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.56 99.37 99.47 -0.01 0.284 1 0.00 99.47 – –
CD 40132 99.29 99.46 99.38 -0.03 0.047 3413 98.20 97.33 99.38 -0.32 0.073
DT 90066 99.28 99.49 99.39 -0.02 0.258 2 0.00 99.39 – –
EX 951 95.53 98.95 97.21 -0.05 0.374 0 97.21 – –
FW 238 51.82 47.90 49.78 -3.77 0.381 67 33.33 1.49 49.78 – –
IN 108456 98.11 98.72 98.41 -0.01 0.265 22 0.00 0.00 98.41 – –
JJ 67085 91.93 91.17 91.55 -0.12 0.155 4581 79.23 72.01 91.55 -0.74 0.335
JJR 3621 84.76 90.91 87.73 0.21 0.436 85 74.29 30.59 87.73 24.30 0.414
JJS 2129 88.30 92.53 90.37 -3.10 0.518 54 71.93 75.93 90.37 – –
MD 10743 99.67 99.80 99.73 0.02 0.330 5 0.00 99.73 – –
NN 146173 96.14 96.09 96.11 -0.02 0.524 3934 73.87 69.90 96.11 0.06 0.916
NNP 100926 96.68 97.29 96.99 -0.03 0.267 6075 83.85 94.19 96.99 0.17 0.593
NNPS 2917 63.04 66.20 64.58 -1.55 0.064 234 31.68 13.68 64.58 -8.06 0.583
NNS 65922 97.52 97.91 97.72 -0.01 0.704 2353 82.30 88.74 97.72 -0.43 0.482
PDT 397 70.94 78.09 74.34 -1.11 0.150 0 0.00 74.34 – –
POS 9529 98.91 99.49 99.20 -0.02 0.180 0 99.20 – –
PRP 19164 99.73 99.37 99.55 0.00 0.379 11 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.50 90.68 92.55 -0.02 0.521 516 90.32 81.40 92.55 4.05 0.004
RBR 1905 74.39 67.09 70.55 0.11 0.875 7 20.00 14.29 70.55 – –
RBS 486 66.93 53.29 59.34 -24.46 0.191 2 0.00 59.34 – –
RP 2879 78.86 75.79 77.29 0.32 0.095 0 77.29 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.35 95.41 95.38 -0.05 0.125 570 82.06 77.02 95.38 -0.32 0.689
VBD 32941 95.59 94.54 95.06 -0.00 0.976 426 80.95 63.85 95.06 -1.19 0.469
VBG 16321 91.83 92.17 92.00 0.22 0.020 924 72.22 91.99 92.00 2.15 0.016
VBN 22177 86.92 90.45 88.65 -0.21 0.062 716 66.06 81.28 88.65 -1.51 0.270
VBP 13819 93.59 92.18 92.88 -0.10 0.283 131 65.45 54.96 92.88 7.73 0.325
VBZ 23816 97.77 96.46 97.11 0.09 0.190 467 90.39 54.39 97.11 1.74 0.559
WDT 4745 96.70 95.64 96.16 0.08 0.407 4 0.00 96.16 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.29 -1.02 0.113
TOKENS 1044667 96.807 -0.04 0.170 24622 81.92 -0.02 0.942


B.65 rb–loc[s] Mapping

(Pos = RB & Par =re ADVP(?:LOC|DIR)(?:[A-Z]+)?) ⇒ (Pos ← RB–LOC)

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.74 99.84 -0.01 0.374 0 99.84 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.94 99.94 99.94 -0.02 0.178 0 99.94 – –
CC 26227 99.73 99.34 99.53 0.00 0.998 1 0.00 99.53 – –
CD 40132 99.28 99.56 99.42 -0.00 0.374 3413 99.35 98.51 99.42 – –
DT 90066 99.46 99.28 99.37 -0.00 0.856 2 0.00 99.37 – –
EX 951 95.18 99.58 97.33 -0.05 0.907 0 97.33 – –
FW 238 63.93 32.77 43.33 0.74 0.587 67 0.00 0.00 43.33 – –
IN 108456 97.82 98.93 98.37 0.00 0.430 22 0.00 98.37 – –
JJ 67085 92.23 92.15 92.19 -0.03 0.154 4581 79.69 82.14 92.19 0.07 0.548
JJR 3621 85.06 92.29 88.53 0.14 0.105 85 88.89 28.24 88.53 3.42 0.374
JJS 2129 95.23 95.73 95.48 0.02 0.374 54 88.89 59.26 95.48 1.11 0.374
MD 10743 99.69 99.79 99.74 – – 5 0.00 99.74 – –
NN 146173 97.15 95.09 96.11 -0.01 0.031 3934 81.19 67.69 96.11 0.11 0.211
NNP 100926 94.69 97.95 96.29 -0.00 0.209 6075 82.45 98.12 96.29 -0.05 0.320
NNPS 2917 54.72 73.26 62.65 0.05 0.643 234 52.94 11.54 62.65 -3.57 0.322
NNS 65922 97.72 97.21 97.46 -0.00 0.730 2353 87.10 90.40 97.46 -0.08 0.367
PDT 397 69.67 83.88 76.11 -0.48 0.297 0 76.11 – –
POS 9529 98.75 99.65 99.20 0.01 0.374 0 99.20 – –
PRP 19164 99.85 99.33 99.59 – – 11 0.00 0.00 99.59 – –
PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –
RB 33806 95.58 90.13 92.77 -0.03 0.634 516 90.71 81.40 92.77 0.20 0.179
RBR 1905 77.43 67.35 72.04 0.24 0.139 7 0.00 72.04 – –
RBS 486 87.66 84.77 86.19 0.17 0.472 2 0.00 0.00 86.19 – –
RP 2879 74.59 76.97 75.76 0.05 0.940 0 75.76 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 76.09 35.00 47.95 -2.11 0.374 17 33.33 5.88 47.95 – –
VB 29021 96.74 94.63 95.68 0.02 0.097 570 86.04 78.95 95.68 0.09 0.553
VBD 32941 95.00 95.94 95.46 0.01 0.821 426 81.07 71.36 95.46 0.87 0.217
VBG 16321 92.38 94.17 93.26 -0.02 0.207 924 82.12 91.45 93.26 0.13 0.408
VBN 22177 88.24 90.94 89.57 0.01 0.786 716 78.92 73.18 89.57 0.53 0.155
VBP 13819 94.04 92.18 93.10 0.04 0.105 131 74.14 32.82 93.10 2.38 0.447
VBZ 23816 98.83 96.41 97.61 0.00 0.996 467 90.03 63.81 97.61 -0.71 0.075
WDT 4745 97.26 96.38 96.81 0.04 0.101 4 0.00 96.81 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.66 -0.12 0.137
TOKENS 1044667 96.851 -0.00 0.521 24622 84.66 0.04 0.340


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.64 99.37 99.50 0.03 0.341 1 0.00 99.50 – –
CD 40132 99.34 99.48 99.41 -0.00 0.953 3413 98.81 97.39 99.41 0.02 0.961
DT 90066 99.32 99.48 99.40 -0.00 0.858 2 0.00 99.40 – –
EX 951 94.95 98.84 96.86 -0.42 0.223 0 96.86 – –
FW 238 51.61 47.06 49.23 -4.84 0.077 67 0.00 49.23 – –
IN 108456 98.12 98.72 98.42 -0.01 0.617 22 5.56 4.55 98.42 – –
JJ 67085 91.93 91.40 91.66 0.00 0.966 4581 78.53 73.15 91.66 -0.35 0.670
JJR 3621 84.89 90.61 87.66 0.12 0.553 85 72.22 15.29 87.66 -27.59 0.501
JJS 2129 94.80 93.33 94.06 0.86 0.372 54 79.59 72.22 94.06 2.51 0.179
MD 10743 99.66 99.81 99.73 0.02 0.330 5 0.00 0.00 99.73 – –
NN 146173 96.21 96.05 96.13 0.00 0.944 3934 74.25 69.65 96.13 0.13 0.858
NNP 100926 96.72 97.27 96.99 -0.02 0.186 6075 84.22 93.33 96.99 -0.03 0.930
NNPS 2917 63.74 65.51 64.62 -1.50 0.160 234 27.63 8.97 64.62 -34.80 0.105
NNS 65922 97.48 97.96 97.72 -0.01 0.662 2353 81.28 88.57 97.72 -1.17 0.010
PDT 397 70.98 80.10 75.27 0.12 0.733 0 75.27 – –
POS 9529 98.96 99.53 99.24 0.02 0.231 0 99.24 – –
PRP 19164 99.72 99.38 99.55 0.00 0.972 11 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.37 91.00 92.65 0.09 0.109 516 84.03 81.59 92.65 0.61 0.438
RBR 1905 75.75 66.56 70.86 0.55 0.329 7 0.00 0.00 70.86 – –
RBS 486 78.83 83.54 81.12 3.27 0.395 2 0.00 81.12 – –
RP 2879 77.34 76.59 76.96 -0.10 0.844 0 76.96 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.32 95.34 95.33 -0.10 0.208 570 84.21 75.79 95.33 0.09 0.987
VBD 32941 95.63 94.47 95.05 -0.01 0.695 426 81.76 61.03 95.05 -3.26 0.086
VBG 16321 91.67 92.24 91.95 0.17 0.125 924 71.16 93.18 91.95 1.87 0.014
VBN 22177 87.11 90.47 88.76 -0.09 0.252 716 66.17 81.15 88.76 -1.50 0.115
VBP 13819 93.63 92.14 92.88 -0.10 0.383 131 55.74 51.91 92.88 -3.08 0.784
VBZ 23816 97.72 96.53 97.12 0.10 0.172 467 86.29 55.25 97.12 0.91 0.838
WDT 4745 96.50 95.81 96.15 0.06 0.513 4 0.00 0.00 96.15 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.59 -0.44 0.156
TOKENS 1044667 96.841 -0.00 0.945 24622 81.73 -0.26 0.398


B.66

(Pos = NN & Par =re NP-TMP(?:-[A-Z]+)?) ⇒ (Pos ← NN–TMP)
(Pos = NNP & Par =re NP-TMP(?:-[A-Z]+)?) ⇒ (Pos ← NNP–TMP)
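A rule conditioned on the parent label could be sketched in Python as below. The regular expression is copied from the rule as printed. Everything else is an assumption made for illustration: the helper `retag_by_parent` is hypothetical, the `=re` operator is assumed to mean a full-string match, and the new tags are written with ASCII hyphens.

```python
import re

# Pattern as printed in the rule above. Requiring the whole parent label to
# match (fullmatch) is an assumption about what "=re" means.
NP_TMP = re.compile(r"NP-TMP(?:-[A-Z]+)?")

# Source tag -> temporal variant, one entry per rule line.
RETAG = {"NN": "NN-TMP", "NNP": "NNP-TMP"}


def retag_by_parent(tag: str, parent_label: str) -> str:
    """Return the temporal variant of tag if the parent constituent is NP-TMP."""
    if tag in RETAG and NP_TMP.fullmatch(parent_label):
        return RETAG[tag]
    return tag


print(retag_by_parent("NN", "NP-TMP"))       # parent matches, tag is mapped
print(retag_by_parent("NNP", "NP-TMP-CLR"))  # extra function tag still matches
print(retag_by_parent("NN", "NP-SBJ"))       # parent does not match, tag kept
```

Since the parent label comes from the treebank annotation, a rule like this can only be applied to parsed training data, which is consistent with the mapping being applied before the tagger is trained.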

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.90 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.63 99.39 99.51 0.03 0.185 1 0.00 99.51 – –
CD 40132 99.35 99.46 99.40 -0.00 0.729 3413 98.72 97.28 99.40 -0.09 0.512
DT 90066 99.33 99.48 99.41 0.01 0.640 2 0.00 99.41 – –
EX 951 95.33 98.84 97.06 -0.21 0.231 0 97.06 – –
FW 238 57.29 46.22 51.16 -1.10 0.545 67 0.00 0.00 51.16 – –
IN 108456 98.16 98.71 98.44 0.01 0.255 22 0.00 0.00 98.44 – –
JJ 67085 92.03 91.22 91.62 -0.04 0.407 4581 78.89 72.30 91.62 -0.74 0.187
JJR 3621 84.75 90.58 87.57 0.03 0.967 85 73.08 22.35 87.57 -1.80 0.975
JJS 2129 91.66 95.96 93.76 0.54 0.804 54 71.93 75.93 93.76 – –
MD 10743 99.65 99.80 99.73 0.01 0.633 5 0.00 0.00 99.73 – –
NN 146173 96.14 96.06 96.10 -0.03 0.107 3934 73.71 69.42 96.10 -0.40 0.590
NNP 100926 96.67 97.32 96.99 -0.02 0.268 6075 83.69 93.83 96.99 -0.11 0.640
NNPS 2917 64.25 66.61 65.41 -0.29 0.439 234 42.86 15.38 65.41 8.96 0.786
NNS 65922 97.52 97.87 97.69 -0.03 0.575 2353 82.88 87.46 97.69 -0.76 0.257
PDT 397 71.17 78.34 74.58 -0.79 0.321 0 0.00 74.58 – –
POS 9529 98.92 99.52 99.22 -0.01 0.838 0 0.00 99.22 – –
PRP 19164 99.71 99.37 99.54 -0.01 0.513 11 14.29 9.09 99.54 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.07 91.15 92.59 0.01 0.811 516 84.10 81.98 92.59 0.89 0.360
RBR 1905 75.61 66.40 70.71 0.34 0.547 7 33.33 14.29 70.71 – –
RBS 486 85.86 68.72 76.34 -2.81 0.530 2 0.00 76.34 – –
RP 2879 78.57 75.79 77.16 0.15 0.365 0 77.16 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.59 95.26 95.42 -0.00 0.999 570 83.90 74.04 95.42 -1.32 0.384
VBD 32941 95.67 94.50 95.08 0.02 0.779 426 83.33 62.21 95.08 -1.40 0.540
VBG 16321 91.50 92.21 91.85 0.06 0.545 924 69.71 92.64 91.85 0.43 0.242
VBN 22177 86.94 90.72 88.79 -0.05 0.699 716 66.41 83.38 88.79 -0.10 0.991
VBP 13819 93.57 92.28 92.92 -0.05 0.589 131 58.47 52.67 92.92 -0.07 0.874
VBZ 23816 97.65 96.48 97.06 0.04 0.635 467 85.06 56.10 97.06 1.29 0.645
WDT 4745 96.70 95.66 96.18 0.09 0.399 4 0.00 96.18 – –
WP 2604 99.05 99.58 99.31 -0.04 0.178 0 0.00 99.31 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.59 -0.44 0.141
TOKENS 1044667 96.836 -0.01 0.568 24622 81.69 -0.31 0.342


B.67

(Pos = NN & Wd ∈ {afternoon, evening, midsummer, ...<13 omitted>..., winter, wintertime, yesterday}) ⇒ (Pos ← NN–TMP)
(Pos = NNP & Wd ∈ {advent, apr, apr., ...<62 omitted>..., wed., wednesday, xmas}) ⇒ (Pos ← NNP–TMP)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.57 99.39 99.48 0.01 0.379 1 0.00 99.48 – –
CD 40132 99.32 99.42 99.37 -0.04 0.030 3413 98.74 96.54 99.37 -0.46 0.058
DT 90066 99.33 99.49 99.41 0.01 0.179 2 0.00 99.41 – –
EX 951 95.33 98.84 97.06 -0.21 0.241 0 97.06 – –
FW 238 56.86 48.74 52.49 1.46 0.784 67 50.00 4.48 52.49 – –
IN 108456 98.19 98.71 98.45 0.02 0.095 22 0.00 0.00 98.45 – –
JJ 67085 91.86 91.26 91.56 -0.11 0.009 4581 78.43 72.54 91.56 -0.85 0.055
JJR 3621 84.93 90.86 87.79 0.28 0.153 85 73.91 20.00 87.79 – –
JJS 2129 90.80 92.20 91.49 -1.89 0.716 54 73.21 75.93 91.49 0.91 0.374
MD 10743 99.67 99.78 99.72 0.00 0.694 5 0.00 99.72 – –
NN 146173 96.25 96.04 96.14 0.01 0.515 3934 74.29 69.27 96.14 -0.13 0.819
NNP 100926 96.72 97.29 97.00 -0.01 0.322 6075 84.21 94.17 97.00 0.39 0.088
NNPS 2917 63.61 66.58 65.06 -0.83 0.329 234 37.21 20.51 65.06 27.27 0.510
NNS 65922 97.60 97.98 97.79 0.06 0.015 2353 83.97 89.08 97.79 0.80 0.166
PDT 397 70.45 78.09 74.07 -1.47 0.228 0 0.00 74.07 – –
POS 9529 98.97 99.51 99.24 0.02 0.642 0 99.24 – –
PRP 19164 99.74 99.36 99.55 -0.00 0.991 11 0.00 99.55 – –
PRP$ 9173 99.38 99.91 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.23 91.03 92.61 0.03 0.478 516 85.80 80.81 92.61 1.14 0.446
RBR 1905 76.28 66.67 71.15 0.96 0.203 7 33.33 14.29 71.15 – –
RBS 486 70.33 65.84 68.01 -13.42 0.386 2 0.00 68.01 – –
RP 2879 78.73 75.72 77.20 0.20 0.576 0 77.20 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.21 95.34 95.28 -0.16 0.093 570 80.00 75.79 95.28 -2.35 0.022
VBD 32941 95.65 94.39 95.02 -0.04 0.064 426 83.38 64.79 95.02 0.92 0.307
VBG 16321 91.49 92.31 91.90 0.11 0.140 924 70.85 92.32 91.90 1.21 0.285
VBN 22177 86.75 90.69 88.68 -0.18 0.092 716 66.21 81.28 88.68 -1.39 0.290
VBP 13819 93.49 92.34 92.91 -0.06 0.420 131 60.98 57.25 92.91 6.48 0.376
VBZ 23816 97.76 96.62 97.19 0.17 0.036 467 88.49 57.60 97.19 4.53 0.266
WDT 4745 96.60 95.72 96.16 0.07 0.526 4 0.00 96.16 – –
WP 2604 99.08 99.58 99.33 -0.02 0.374 0 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.48 -0.66 0.167
TOKENS 1044667 96.831 -0.01 0.521 24622 81.93 -0.01 0.958


B.68 vb–cop[s] Mapping

(Pos = VB & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–C)
(Pos = VBG & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–CG)
(Pos = VBD & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–CD)
(Pos = VBN & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–CN)
(Pos = VBP & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–CP)
(Pos = VBZ & SibAllR ∈ {ADJP-PRD, ADJP-PRD-TPC, ADJP-TPC-PRD, ...<47 omitted>..., UCP-LOC-PRD, UCP-PRD, UCP-PRD-LOC}) ⇒ (Pos ← V–CZ)


SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.74 99.84 -0.01 0.374 0 99.84 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.98 99.94 99.96 – – 0 99.96 – –
CC 26227 99.72 99.34 99.53 -0.00 0.765 1 0.00 99.53 – –
CD 40132 99.27 99.57 99.42 -0.00 0.621 3413 99.38 98.51 99.42 0.01 0.374
DT 90066 99.46 99.28 99.37 -0.00 0.407 2 0.00 99.37 – –
EX 951 95.56 99.58 97.53 0.15 0.224 0 97.53 – –
FW 238 63.64 32.35 42.90 -0.28 0.686 67 0.00 0.00 42.90 – –
IN 108456 97.83 98.92 98.37 0.01 0.519 22 0.00 98.37 – –
JJ 67085 92.33 92.20 92.26 0.05 0.029 4581 79.35 82.89 92.26 0.29 0.001
JJR 3621 85.10 92.10 88.46 0.06 0.367 85 91.67 25.88 88.46 -2.59 0.599
JJS 2129 95.27 95.63 95.45 -0.00 0.897 54 88.89 59.26 95.45 1.11 0.374
MD 10743 99.75 99.74 99.74 0.00 0.771 5 0.00 99.74 – –
NN 146173 97.15 95.11 96.12 0.00 0.513 3934 81.27 67.95 96.12 0.36 0.079
NNP 100926 94.70 97.93 96.29 -0.01 0.268 6075 82.56 98.14 96.29 0.04 0.230
NNPS 2917 54.49 73.23 62.48 -0.21 0.678 234 55.10 11.54 62.48 -2.89 0.486
NNS 65922 97.68 97.18 97.43 -0.04 0.164 2353 87.33 90.48 97.43 0.09 0.413
PDT 397 70.21 84.89 76.85 0.48 0.460 0 76.85 – –
POS 9529 98.68 99.66 99.17 -0.03 0.037 0 99.17 – –
PRP 19164 99.85 99.34 99.60 0.01 0.178 11 0.00 0.00 99.60 – –
PRP$ 9173 99.39 99.96 99.67 – – 1 0.00 99.67 – –
RB 33806 95.19 90.71 92.90 0.10 0.021 516 90.15 81.59 92.90 0.03 0.830
RBR 1905 77.62 67.72 72.33 0.65 0.179 7 0.00 72.33 – –
RBS 486 87.45 84.57 85.98 -0.07 0.696 2 0.00 0.00 85.98 – –
RP 2879 76.19 75.79 75.99 0.35 0.045 0 75.99 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 74.47 35.00 47.62 -2.78 0.317 17 33.33 5.88 47.62 – –
VB 29021 96.70 94.51 95.59 -0.07 0.073 570 86.90 79.12 95.59 0.68 0.040
VBD 32941 94.76 96.07 95.41 -0.05 0.099 426 81.52 70.42 95.41 0.42 0.430
VBG 16321 92.39 94.09 93.23 -0.05 0.345 924 82.50 91.34 93.23 0.32 0.199
VBN 22177 88.57 90.14 89.35 -0.23 0.055 716 79.00 68.30 89.35 -3.02 0.029
VBP 13819 93.84 92.24 93.03 -0.03 0.514 131 72.88 32.82 93.03 1.84 0.374
VBZ 23816 98.73 96.26 97.48 -0.13 0.040 467 89.58 64.45 97.48 -0.33 0.184
WDT 4745 97.19 96.35 96.77 -0.00 0.958 4 0.00 96.77 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.70 -0.04 0.720
TOKENS 1044667 96.846 -0.01 0.081 24622 84.70 0.09 0.012
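The F(0.5) values in the SVM table above appear consistent with the balanced F-score, that is, the harmonic mean of precision and recall obtained when the weighting parameter is 0.5. The sketch below works under that assumption; the function name `f_score` and the parameter name `alpha` are illustrative and not taken from the thesis. It recomputes the F value for the JJR and UH rows from their tabulated Prec and Rec figures.

```python
def f_score(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    With alpha = 0.5 this reduces to 2PR / (P + R).
    Assumption: this is what the F(0.5) column reports.
    """
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Prec and Rec copied from the JJR and UH rows of the SVM table.
f_jjr = f_score(85.10, 92.10)
f_uh = f_score(74.47, 35.00)

print(round(f_jjr, 2))  # tabulated F(0.5) for JJR is 88.46
print(round(f_uh, 2))   # tabulated F(0.5) for UH is 47.62
```

Because the tabulated precision and recall are themselves rounded to two decimals, a recomputed value can differ from the printed one in the last digit for some rows.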


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.87 99.93 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.64 99.37 99.51 0.03 0.374 1 0.00 99.51 – –
CD 40132 99.35 99.43 99.39 -0.02 0.476 3413 98.86 96.89 99.39 -0.21 0.379
DT 90066 99.34 99.49 99.41 0.01 0.179 2 0.00 99.41 – –
EX 951 95.82 98.84 97.31 0.05 0.417 0 97.31 – –
FW 238 54.41 46.64 50.23 -2.91 0.369 67 0.00 0.00 50.23 – –
IN 108456 98.17 98.71 98.44 0.02 0.286 22 0.00 0.00 98.44 – –
JJ 67085 91.94 91.30 91.62 -0.05 0.550 4581 78.34 72.80 91.62 -0.72 0.488
JJR 3621 84.73 90.75 87.64 0.10 0.659 85 63.16 14.12 87.64 – –
JJS 2129 94.46 92.11 93.27 0.01 0.872 54 74.55 75.93 93.27 1.83 0.374
MD 10743 99.67 99.80 99.73 0.02 0.331 5 0.00 99.73 – –
NN 146173 96.17 96.03 96.10 -0.03 0.459 3934 73.57 69.40 96.10 -0.51 0.593
NNP 100926 96.72 97.30 97.01 -0.00 0.930 6075 84.45 93.40 97.01 0.15 0.626
NNPS 2917 63.57 66.64 65.07 -0.81 0.356 234 32.00 13.68 65.07 -7.78 0.773
NNS 65922 97.47 98.00 97.74 0.01 0.751 2353 82.45 88.65 97.74 -0.38 0.447
PDT 397 71.56 79.85 75.48 0.40 0.271 0 75.48 – –
POS 9529 98.93 99.51 99.22 -0.01 0.714 0 99.22 – –
PRP 19164 99.73 99.36 99.55 -0.01 0.616 11 0.00 99.55 – –
PRP$ 9173 99.39 99.90 99.65 0.01 0.374 1 0.00 99.65 – –
RB 33806 94.05 91.21 92.61 0.04 0.419 516 83.00 81.40 92.61 -0.12 0.880
RBR 1905 75.74 66.04 70.56 0.12 0.876 7 0.00 0.00 70.56 – –
RBS 486 74.63 82.92 78.56 0.01 0.906 2 0.00 78.56 – –
RP 2879 79.17 75.76 77.42 0.49 0.226 0 77.42 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.56 95.27 95.41 -0.01 0.813 570 86.41 74.74 95.41 0.55 0.365
VBD 32941 95.49 94.55 95.02 -0.05 0.470 426 79.71 64.55 95.02 -1.27 0.698
VBG 16321 91.74 92.25 91.99 0.22 0.124 924 71.37 92.53 91.99 1.74 0.028
VBN 22177 86.98 90.49 88.70 -0.15 0.450 716 66.23 84.08 88.70 0.12 0.885
VBP 13819 93.43 92.13 92.78 -0.21 0.029 131 61.86 55.73 92.78 5.72 0.604
VBZ 23816 98.00 96.46 97.22 0.20 0.009 467 89.60 57.17 97.22 4.57 0.161
WDT 4745 96.53 95.57 96.05 -0.04 0.735 4 0.00 96.05 – –
WP 2604 99.05 99.62 99.33 -0.02 0.374 0 0.00 99.33 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.62 -0.39 0.411
TOKENS 1044667 96.839 -0.00 0.859 24622 81.78 -0.20 0.557


B.69 dt–num[l] + jj–num[l] Mapping

(Pos = DT & Wd ∈ {a, another, each, every, little, many, much}) ⇒ (Pos ← DT–1)
(Pos = DT & Wd ∈ {these, those}) ⇒ (Pos ← DT–P)
(Pos = JJ & Wd ∈ {countless, few, many, numerous, several}) ⇒ (Pos ← JJ–P)
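The lexical rules above can be read as a lookup from an (original tag, word) pair to a refined tag. The following is a minimal sketch of that reading, not the thesis implementation. It assumes tokens are `(word, tag)` pairs, that word matching is case-insensitive (the rules do not say), and that ASCII hyphens are acceptable in the new tag names; `RULES` and `apply_lexical_mapping` are hypothetical names.

```python
from typing import FrozenSet, List, Tuple

# (original tag, word set, new tag), transcribed from the three rules above.
# Tag names use ASCII hyphens (DT-1 for DT–1); this is a convenience assumption.
RULES: List[Tuple[str, FrozenSet[str], str]] = [
    ("DT", frozenset({"a", "another", "each", "every", "little", "many", "much"}), "DT-1"),
    ("DT", frozenset({"these", "those"}), "DT-P"),
    ("JJ", frozenset({"countless", "few", "many", "numerous", "several"}), "JJ-P"),
]


def apply_lexical_mapping(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Retag each token whose original tag and word match a rule."""
    mapped = []
    for word, tag in tokens:
        new_tag = tag
        for old_tag, words, refined in RULES:
            # Assumption: matching ignores case.
            if tag == old_tag and word.lower() in words:
                new_tag = refined
                break
        mapped.append((word, new_tag))
    return mapped


sentence = [("These", "DT"), ("many", "JJ"), ("books", "NNS"), ("the", "DT")]
mapped = apply_lexical_mapping(sentence)
print(mapped)
```

Tokens that match no rule, such as the determiner "the" here, keep their original tag.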

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.90 99.90 -0.01 0.374 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.67 99.37 99.52 0.05 0.120 1 0.00 99.52 – –
CD 40132 99.34 99.43 99.38 -0.02 0.377 3413 98.63 97.04 99.38 -0.25 0.349
DT 90066 99.33 99.48 99.41 0.00 0.763 2 0.00 99.41 – –
EX 951 95.24 98.84 97.01 -0.26 0.134 0 97.01 – –
FW 238 60.89 45.80 52.28 1.06 0.772 67 100.00 1.49 52.28 – –
IN 108456 98.15 98.71 98.43 0.00 0.934 22 0.00 0.00 98.43 – –
JJ 67085 92.03 91.06 91.55 -0.13 0.175 4581 79.58 71.80 91.55 -0.69 0.486
JJR 3621 84.44 90.53 87.38 -0.19 0.372 85 60.87 16.47 87.38 -25.63 0.409
JJS 2129 88.41 96.38 92.22 -1.11 0.832 54 71.93 75.93 92.22 – –
MD 10743 99.66 99.80 99.73 0.01 0.418 5 0.00 99.73 – –
NN 146173 96.18 96.00 96.09 -0.04 0.333 3934 74.14 69.17 96.09 -0.31 0.746
NNP 100926 96.62 97.33 96.97 -0.04 0.193 6075 83.47 93.99 96.97 -0.17 0.512
NNPS 2917 62.93 66.99 64.90 -1.08 0.356 234 33.56 20.94 64.90 24.11 0.686
NNS 65922 97.49 97.89 97.69 -0.04 0.137 2353 82.56 87.72 97.69 -0.82 0.129
PDT 397 71.20 79.09 74.94 -0.32 0.807 0 74.94 – –
POS 9529 98.94 99.52 99.23 0.01 0.694 0 99.23 – –
PRP 19164 99.68 99.38 99.53 -0.02 0.187 11 18.18 18.18 99.53 – –
PRP$ 9173 99.38 99.90 99.64 0.00 0.986 1 0.00 99.64 – –
RB 33806 94.04 91.06 92.53 -0.05 0.332 516 84.34 81.40 92.53 0.66 0.616
RBR 1905 75.24 65.88 70.25 -0.32 0.455 7 0.00 0.00 70.25 – –
RBS 486 84.85 51.85 64.37 -18.06 0.263 2 0.00 64.37 – –
RP 2879 79.06 75.16 77.07 0.03 0.923 0 77.07 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.06 95.52 95.29 -0.14 0.204 570 79.53 77.02 95.29 -1.83 0.253
VBD 32941 95.55 94.68 95.11 0.05 0.373 426 83.33 63.38 95.11 -0.35 0.948
VBG 16321 91.50 92.12 91.81 0.02 0.835 924 69.96 91.99 91.81 0.34 0.668
VBN 22177 87.20 90.49 88.81 -0.03 0.823 716 67.16 82.54 88.81 0.07 0.877
VBP 13819 93.67 92.05 92.85 -0.12 0.079 131 62.73 52.67 92.85 3.24 0.823
VBZ 23816 97.71 96.43 97.06 0.04 0.487 467 84.93 53.10 97.06 -2.10 0.663
WDT 4745 96.51 95.64 96.07 -0.02 0.849 4 0.00 96.07 – –
WP 2604 99.05 99.58 99.31 -0.04 0.178 0 99.31 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.37 -0.88 0.196
TOKENS 1044667 96.810 -0.03 0.258 24622 81.59 -0.43 0.370


B.70 to:in + in–sub[s] Mapping

(Pos = TO & Wd = to & SibR = NP) ⇒ (Pos ← IN)
(Pos = TO & Par = QP & Wd = to) ⇒ (Pos ← IN)
(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)
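Unlike purely lexical mappings, the rules above also test the syntactic context of a token. The sketch below illustrates one way to apply them, under several assumptions that the rules themselves do not confirm: `Par` is taken to be the label of the parent constituent, `SibR` the label of the immediately following sibling, labels are compared by exact string match, and every rule is tested against the original tag only, so a TO retagged as IN is not then considered for IN–SUB. The `Token` class and `map_tag` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    word: str
    pos: str                # original Penn Treebank tag
    par: Optional[str]      # parent constituent label (assumed meaning of Par)
    sib_r: Optional[str]    # right sibling label (assumed meaning of SibR)


def map_tag(tok: Token) -> str:
    """Return the mapped tag for one token, testing the original tag only."""
    if tok.pos == "TO" and tok.word == "to":
        # Prepositional "to": followed by an NP, or inside a QP.
        if tok.sib_r == "NP" or tok.par == "QP":
            return "IN"
    if tok.pos == "IN":
        # Subordinating use: introduces a clause.
        if tok.sib_r == "S" or tok.par == "SBAR":
            return "IN-SUB"  # ASCII hyphen used for IN–SUB
    return tok.pos


examples = [
    Token("to", "TO", "PP", "NP"),         # "to the store"
    Token("to", "TO", "VP", "VP"),         # infinitival "to go"
    Token("because", "IN", "SBAR", "S"),   # subordinator
    Token("of", "IN", "PP", "NP"),         # ordinary preposition
]
mapped_tags = [map_tag(t) for t in examples]
print(mapped_tags)
```

Infinitival "to" and ordinary prepositions fall through both tests and keep their original tags.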

SVM True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.95 99.74 99.84 -0.01 0.374 0 99.84 – –
, 53640 100.00 99.99 100.00 – – 0 100.00 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 99.98 99.94 99.96 – – 0 99.96 – –
CC 26227 99.70 99.37 99.53 0.00 0.987 1 0.00 99.53 – –
CD 40132 99.26 99.57 99.42 -0.01 0.190 3413 99.35 98.48 99.42 -0.02 0.374
DT 90066 99.42 99.34 99.38 0.00 0.741 2 0.00 99.38 – –
EX 951 95.18 99.58 97.33 -0.05 0.374 0 97.33 – –
FW 238 64.10 31.51 42.25 -1.77 0.745 67 0.00 0.00 42.25 – –
IN 108456 97.96 98.79 98.37 0.01 0.747 22 0.00 98.37 – –
JJ 67085 92.35 92.02 92.18 -0.04 0.105 4581 79.36 82.17 92.18 -0.13 0.517
JJR 3621 84.97 92.27 88.47 0.07 0.606 85 88.89 28.24 88.47 3.42 0.962
JJS 2129 95.24 95.82 95.53 0.07 0.281 54 88.89 59.26 95.53 1.11 0.374
MD 10743 99.71 99.79 99.75 0.01 0.178 5 0.00 99.75 – –
NN 146173 97.12 95.12 96.11 -0.00 0.622 3934 80.19 67.72 96.11 -0.43 0.029
NNP 100926 94.67 97.95 96.28 -0.01 0.239 6075 82.37 98.01 96.28 -0.15 0.046
NNPS 2917 54.72 73.19 62.62 0.00 0.966 234 55.10 11.54 62.62 -2.89 0.402
NNS 65922 97.73 97.22 97.48 0.01 0.093 2353 87.40 90.52 97.48 0.16 0.004
PDT 397 69.87 84.13 76.34 -0.18 0.374 0 76.34 – –
POS 9529 98.74 99.65 99.20 0.00 0.995 0 99.20 – –
PRP 19164 99.85 99.35 99.60 0.01 0.071 11 0.00 0.00 99.60 – –
PRP$ 9173 99.44 99.96 99.70 0.02 0.016 1 0.00 99.70 – –
RB 33806 95.03 90.65 92.79 -0.01 0.726 516 90.30 81.20 92.79 -0.14 0.233
RBR 1905 77.72 67.03 71.98 0.17 0.648 7 0.00 71.98 – –
RBS 486 88.01 84.57 86.25 0.24 0.525 2 0.00 0.00 86.25 – –
RP 2879 75.16 75.55 75.35 -0.49 0.094 0 75.35 – –
SYM 59 78.18 72.88 75.44 – – 1 0.00 75.44 – –
TO 24551 99.99 100.00 99.99 – – 0 99.99 – –
UH 100 74.47 35.00 47.62 -2.78 0.317 17 33.33 5.88 47.62 – –
VB 29021 96.85 94.35 95.58 -0.08 0.171 570 86.83 68.25 95.58 -7.10 0.025
VBD 32941 95.00 95.90 95.45 -0.01 0.407 426 79.84 70.66 95.45 -0.37 0.531
VBG 16321 92.40 94.16 93.27 -0.01 0.833 924 82.12 91.45 93.27 0.13 0.667
VBN 22177 88.14 90.95 89.53 -0.04 0.462 716 78.14 72.91 89.53 -0.14 0.844
VBP 13819 93.95 92.26 93.10 0.04 0.159 131 72.31 35.88 93.10 7.91 0.307
VBZ 23816 98.83 96.43 97.61 0.00 0.697 467 89.71 65.31 97.61 0.49 0.369
WDT 4745 96.39 96.90 96.65 -0.13 0.333 4 0.00 96.65 – –
WP 2604 98.90 100.00 99.45 – – 0 99.45 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 50.64 -0.14 0.449
TOKENS 1044667 96.846 -0.01 0.251 24622 84.42 -0.24 0.015


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.91 99.91 99.91 -0.00 0.374 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.60 99.40 99.50 0.02 0.445 1 0.00 0.00 99.50 – –
CD 40132 99.33 99.47 99.40 -0.01 0.648 3413 98.69 97.10 99.40 -0.19 0.361
DT 90066 99.31 99.48 99.40 -0.01 0.726 2 0.00 99.40 – –
EX 951 95.63 98.84 97.21 -0.05 0.917 0 97.21 – –
FW 238 50.88 48.32 49.57 -4.18 0.206 67 100.00 1.49 49.57 – –
IN 108456 98.20 98.63 98.41 -0.02 0.370 22 0.00 0.00 98.41 – –
JJ 67085 92.03 91.26 91.64 -0.02 0.735 4581 78.81 72.65 91.64 -0.54 0.505
JJR 3621 85.02 89.98 87.43 -0.14 0.512 85 75.00 21.18 87.43 -5.26 0.374
JJS 2129 91.74 92.34 92.04 -1.30 0.395 54 71.93 75.93 92.04 – –
MD 10743 99.66 99.78 99.72 0.00 0.979 5 0.00 99.72 – –
NN 146173 96.14 96.08 96.11 -0.02 0.485 3934 74.01 69.57 96.11 -0.09 0.902
NNP 100926 96.75 97.29 97.02 0.01 0.860 6075 84.36 93.94 97.02 0.37 0.235
NNPS 2917 63.61 66.40 64.98 -0.95 0.259 234 34.69 14.53 64.98 -1.43 0.842
NNS 65922 97.55 97.92 97.74 0.01 0.747 2353 82.21 88.99 97.74 -0.35 0.311
PDT 397 70.15 81.11 75.23 0.07 0.958 0 75.23 – –
POS 9529 98.95 99.54 99.24 0.02 0.094 0 99.24 – –
PRP 19164 99.73 99.37 99.55 0.00 0.748 11 0.00 99.55 – –
PRP$ 9173 99.38 99.93 99.66 0.02 0.205 1 0.00 99.66 – –
RB 33806 94.06 91.05 92.53 -0.05 0.364 516 84.17 81.40 92.53 0.56 0.358
RBR 1905 74.44 66.35 70.16 -0.44 0.626 7 0.00 0.00 70.16 – –
RBS 486 72.27 70.78 71.52 -8.95 0.388 2 0.00 71.52 – –
RP 2879 78.59 76.38 77.47 0.55 0.126 0 77.47 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.98 99.99 99.99 -0.00 0.178 0 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.48 95.34 95.41 -0.01 0.822 570 84.39 74.91 95.41 -0.43 0.639
VBD 32941 95.68 94.58 95.12 0.06 0.138 426 81.68 61.74 95.12 -2.67 0.068
VBG 16321 91.49 92.10 91.79 -0.00 0.970 924 70.80 93.40 91.79 1.68 0.027
VBN 22177 87.01 90.63 88.78 -0.06 0.546 716 65.67 79.89 88.78 -2.59 0.142
VBP 13819 93.77 92.33 93.05 0.08 0.343 131 66.36 54.20 93.05 7.58 0.388
VBZ 23816 97.68 96.63 97.15 0.13 0.097 467 85.17 57.82 97.15 3.18 0.320
WDT 4745 96.37 96.23 96.30 0.22 0.036 4 0.00 96.30 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 99.91 99.87 99.89 -0.04 0.374 1 0.00 0.00 99.89 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.54 -0.53 0.186
TOKENS 1044667 96.834 -0.01 0.344 24622 81.88 -0.08 0.779


B.71 vb–cop[lm] Mapping

(Pos = VBZ & Wd ∈ {appears, becomes, feels, looks, remains, seems, smells, sounds}) ⇒ (Pos ← V–JZ)
(Pos = VB & Wd ∈ {appear, become, feel, look, remain, seem, smell, sound}) ⇒ (Pos ← V–J)
(Pos = VBP & Wd ∈ {appear, become, feel, look, remain, seem, smell, sound}) ⇒ (Pos ← V–JP)
(Pos = VBN & Wd ∈ {appeared, become, felt, looked, remained, seemed, smelled, smelt, sounded}) ⇒ (Pos ← V–JN)
(Pos = VBD & Wd ∈ {appeared, became, felt, looked, remained, seemed, smelled, smelt, sounded}) ⇒ (Pos ← V–JD)

TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.56 99.40 99.48 0.00 0.778 1 0.00 0.00 99.48 – –
CD 40132 99.35 99.45 99.40 -0.01 0.621 3413 98.99 97.22 99.40 0.02 0.937
DT 90066 99.33 99.48 99.40 -0.00 0.961 2 0.00 99.40 – –
EX 951 95.33 98.84 97.06 -0.21 0.231 0 97.06 – –
FW 238 56.15 44.12 49.41 -4.49 0.329 67 0.00 49.41 – –
IN 108456 98.16 98.69 98.42 -0.00 0.714 22 7.14 4.55 98.42 – –
JJ 67085 91.91 91.36 91.63 -0.03 0.666 4581 78.34 73.35 91.63 -0.33 0.556
JJR 3621 84.72 90.78 87.64 0.11 0.710 85 73.33 25.88 87.64 – –
JJS 2129 88.36 96.67 92.33 -1.00 0.875 54 70.00 77.78 92.33 -0.26 0.374
MD 10743 99.66 99.77 99.71 -0.00 0.626 5 0.00 99.71 – –
NN 146173 96.25 96.01 96.13 0.00 0.907 3934 74.72 68.73 96.13 -0.26 0.685
NNP 100926 96.70 97.31 97.00 -0.01 0.586 6075 84.11 93.42 97.00 -0.05 0.855
NNPS 2917 63.95 65.75 64.84 -1.16 0.012 234 36.07 9.40 64.84 -28.22 0.098
NNS 65922 97.47 97.94 97.71 -0.02 0.380 2353 80.54 89.50 97.71 -1.15 0.046
PDT 397 70.47 79.35 74.64 -0.71 0.231 0 74.64 – –
POS 9529 98.89 99.52 99.20 -0.02 0.310 0 99.20 – –
PRP 19164 99.74 99.36 99.55 -0.00 0.993 11 0.00 99.55 – –
PRP$ 9173 99.38 99.90 99.64 – – 1 0.00 99.64 – –
RB 33806 94.08 90.96 92.49 -0.09 0.367 516 84.35 80.43 92.49 0.06 0.991
RBR 1905 75.71 66.09 70.57 0.14 0.836 7 0.00 0.00 70.57 – –
RBS 486 85.96 51.65 64.52 -17.86 0.265 2 0.00 64.52 – –
RP 2879 79.23 75.41 77.27 0.30 0.137 0 77.27 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.19 46.00 56.79 -1.52 0.374 17 0.00 56.79 – –
VB 29021 95.50 95.16 95.33 -0.10 0.147 570 85.63 75.26 95.33 0.51 0.704
VBD 32941 95.60 94.55 95.07 0.01 0.804 426 82.57 63.38 95.07 -0.74 0.469
VBG 16321 91.24 92.38 91.81 0.01 0.863 924 69.63 92.53 91.81 0.32 0.723
VBN 22177 87.04 90.76 88.86 0.02 0.552 716 66.21 81.01 88.86 -1.54 0.205
VBP 13819 93.37 92.28 92.82 -0.16 0.033 131 63.06 53.44 92.82 4.31 0.652
VBZ 23816 97.66 96.59 97.12 0.10 0.192 467 85.52 54.39 97.12 -0.39 0.894
WDT 4745 96.66 95.76 96.21 0.13 0.170 4 0.00 96.21 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.45 -0.71 0.037
TOKENS 1044667 96.827 -0.02 0.113 24622 81.73 -0.26 0.423


B.72 in/rp/rb[l] Mapping

(Pos = RP & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN–RB)
(Pos = RP & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RP–RB)
(Pos = RB & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RB–IN–RP)
(Pos = RB & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RB–RP)
(Pos = RB & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← RB–IN)
(Pos = IN & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RB–RP)
(Pos = IN & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← IN–RB)


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.88 99.92 99.90 -0.01 0.373 0 99.90 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.62 99.37 99.50 0.02 0.430 1 0.00 0.00 99.50 – –
CD 40132 99.33 99.41 99.37 -0.04 0.119 3413 98.86 96.69 99.37 -0.32 0.227
DT 90066 99.33 99.46 99.40 -0.01 0.725 2 0.00 99.40 – –
EX 951 94.84 98.53 96.65 -0.63 0.152 0 96.65 – –
FW 238 56.31 48.74 52.25 1.01 0.633 67 80.00 5.97 52.25 – –
IN 108456 98.15 98.70 98.42 -0.00 0.796 22 0.00 0.00 98.42 – –
JJ 67085 91.93 91.21 91.57 -0.10 0.124 4581 78.75 72.50 91.57 -0.68 0.175
JJR 3621 84.58 90.61 87.49 -0.06 0.697 85 81.25 30.59 87.49 27.49 0.353
JJS 2129 90.76 93.14 91.93 -1.42 0.786 54 70.69 75.93 91.93 -0.89 0.374
MD 10743 99.65 99.79 99.72 0.00 0.975 5 0.00 0.00 99.72 – –
NN 146173 96.20 95.97 96.09 -0.04 0.145 3934 74.03 69.06 96.09 -0.45 0.280
NNP 100926 96.63 97.30 96.97 -0.05 0.059 6075 83.29 94.06 96.97 -0.25 0.170
NNPS 2917 63.00 66.85 64.87 -1.11 0.242 234 33.02 14.96 64.87 -0.92 0.913
NNS 65922 97.61 97.87 97.74 0.01 0.714 2353 83.90 87.25 97.74 -0.26 0.333
PDT 397 70.58 80.35 75.15 -0.04 0.989 0 75.15 – –
POS 9529 98.93 99.53 99.23 0.01 0.744 0 99.23 – –
PRP 19164 99.73 99.36 99.55 -0.00 0.369 11 0.00 99.55 – –
PRP$ 9173 99.38 99.90 99.64 0.00 0.986 1 0.00 99.64 – –
RB 33806 94.09 90.97 92.50 -0.08 0.021 516 83.43 81.01 92.50 -0.11 0.935
RBR 1905 75.42 65.72 70.24 -0.33 0.623 7 20.00 14.29 70.24 – –
RBS 486 73.00 65.64 69.12 -12.00 0.396 2 0.00 69.12 – –
RP 2879 78.04 76.66 77.34 0.39 0.345 0 77.34 – –
SYM 59 80.33 83.05 81.67 – – 1 0.00 81.67 – –
TO 24551 99.98 100.00 99.99 – – 0 0.00 99.99 – –
UH 100 74.60 47.00 57.67 – – 17 0.00 57.67 – –
VB 29021 95.21 95.43 95.32 -0.11 0.176 570 83.53 75.61 95.32 -0.42 0.611
VBD 32941 95.68 94.53 95.10 0.04 0.414 426 82.28 64.32 95.10 -0.07 0.954
VBG 16321 91.37 92.36 91.86 0.07 0.542 924 70.25 93.29 91.86 1.19 0.267
VBN 22177 87.01 90.62 88.78 -0.07 0.458 716 66.36 80.73 88.78 -1.57 0.209
VBP 13819 93.72 92.10 92.90 -0.07 0.543 131 55.12 53.44 92.90 -2.16 0.771
VBZ 23816 97.73 96.59 97.16 0.13 0.056 467 88.12 57.17 97.16 3.89 0.246
WDT 4745 96.42 95.81 96.11 0.02 0.903 4 0.00 96.11 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.42 -0.78 0.146
TOKENS 1044667 96.812 -0.03 0.124 24622 81.68 -0.32 0.150


B.73

(Pos = TO & SibR = NP) ⇒ (Pos ← TO–IN)
(Pos = TO & Par = QP) ⇒ (Pos ← TO–Q)
(Pos = RP & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RP–IN–RB)
(Pos = RP & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RP–RB)
(Pos = RB & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← RB–IN–RP)
(Pos = RB & Wd ∈ {ahead, apart, aside, ...<5 omitted>..., open, together, yet}) ⇒ (Pos ← RB–RP)
(Pos = RB & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← RB–IN)
(Pos = IN & Wd ∈ {about, across, along, ...<13 omitted>..., up, upon, with}) ⇒ (Pos ← IN–RB–RP)
(Pos = IN & Wd ∈ {’til, aboard, above, ...<23 omitted>..., then, though, under}) ⇒ (Pos ← IN–RB)
(Pos = IN & SibR = S) ⇒ (Pos ← IN–SUB)
(Pos = IN & Par = SBAR) ⇒ (Pos ← IN–SUB)


TBL True Prec Rec F(0.5) Fchg p(F) TrueU PrecU RecU F(0.5)U FchgU p(FU)
# 158 100.00 100.00 100.00 – – 0 100.00 – –
$ 8103 100.00 99.98 99.99 – – 1 0.00 99.99 – –
” 7620 99.90 99.92 99.91 – – 0 99.91 – –
, 53640 100.00 99.99 99.99 – – 0 99.99 – –
-LRB- 1489 100.00 100.00 100.00 – – 0 100.00 – –
-RRB- 1505 100.00 100.00 100.00 – – 0 100.00 – –
. 43373 100.00 100.00 100.00 – – 0 100.00 – –
: 5335 100.00 99.94 99.97 – – 0 99.97 – –
CC 26227 99.61 99.36 99.48 0.01 0.765 1 0.00 99.48 – –
CD 40132 99.33 99.40 99.37 -0.04 0.093 3413 98.80 96.45 99.37 -0.47 0.085
DT 90066 99.33 99.46 99.40 -0.01 0.667 2 0.00 99.40 – –
EX 951 94.74 98.53 96.60 -0.68 0.136 0 96.60 – –
FW 238 57.89 46.22 51.40 -0.64 0.579 67 0.00 51.40 – –
IN 108456 98.16 98.69 98.43 -0.00 0.951 22 0.00 0.00 98.43 – –
JJ 67085 91.87 91.29 91.58 -0.09 0.135 4581 78.10 72.84 91.58 -0.84 0.136
JJR 3621 84.62 90.83 87.61 0.07 0.696 85 80.00 23.53 87.61 4.31 0.783
JJS 2129 91.64 91.17 91.41 -1.99 0.544 54 71.93 75.93 91.41 – –
MD 10743 99.67 99.80 99.73 0.01 0.299 5 0.00 99.73 – –
NN 146173 96.16 95.97 96.07 -0.06 0.033 3934 72.96 68.71 96.07 -1.42 0.016
NNP 100926 96.65 97.31 96.98 -0.04 0.162 6075 83.61 94.11 96.98 -0.02 0.928
NNPS 2917 63.29 66.13 64.68 -1.41 0.149 234 37.17 17.95 64.68 16.50 0.499
NNS 65922 97.58 97.90 97.74 0.01 0.551 2353 84.04 86.83 97.74 -0.42 0.253
PDT 397 70.51 80.10 75.00 -0.24 0.599 0 75.00 – –
POS 9529 98.95 99.54 99.24 0.02 0.519 0 99.24 – –
PRP 19164 99.71 99.34 99.52 -0.03 0.032 11 0.00 99.52 – –
PRP$ 9173 99.33 99.89 99.61 -0.03 0.110 1 0.00 99.61 – –
RB 33806 94.08 90.94 92.48 -0.10 0.021 516 85.13 81.01 92.48 0.88 0.498
RBR 1905 75.99 65.30 70.24 -0.33 0.798 7 0.00 0.00 70.24 – –
RBS 486 68.19 70.58 69.36 -11.70 0.330 2 0.00 69.36 – –
RP 2879 78.05 76.69 77.37 0.42 0.487 0 77.37 – –
SYM 59 80.65 84.75 82.64 1.20 0.374 1 0.00 82.64 – –
TO 24551 99.99 100.00 99.99 0.00 0.374 0 99.99 – –
UH 100 74.19 46.00 56.79 -1.52 0.374 17 0.00 56.79 – –
VB 29021 95.29 95.36 95.33 -0.10 0.339 570 84.87 75.79 95.33 0.46 0.817
VBD 32941 95.55 94.45 95.00 -0.07 0.300 426 83.84 64.55 95.00 0.96 0.461
VBG 16321 91.40 92.06 91.73 -0.07 0.293 924 70.04 92.10 91.73 0.45 0.550
VBN 22177 86.98 90.40 88.66 -0.20 0.119 716 66.40 80.03 88.66 -1.93 0.264
VBP 13819 93.43 92.15 92.79 -0.20 0.116 131 52.67 52.67 92.79 -5.03 0.421
VBZ 23816 97.79 96.59 97.18 0.16 0.007 467 88.24 57.82 97.18 4.65 0.274
WDT 4745 96.34 95.97 96.16 0.07 0.628 4 0.00 96.16 – –
WP 2604 99.08 99.62 99.35 – – 0 99.35 – –
WP$ 183 100.00 100.00 100.00 – – 0 100.00 – –
WRB 2322 100.00 99.87 99.94 – – 1 0.00 99.94 – –
“ 7811 100.00 99.99 99.99 – – 1 0.00 99.99 – –
SENT 43766 51.15 -1.30 0.033
TOKENS 1044667 96.801 -0.04 0.079 24622 81.57 -0.46 0.094
